linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/8] Immediate values for fast branches
@ 2007-06-15 20:23 Mathieu Desnoyers
  2007-06-15 20:23 ` [patch 1/8] Immediate Value - Architecture Independent Code Mathieu Desnoyers
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2007-06-15 20:23 UTC (permalink / raw)
  To: akpm, linux-kernel

Hi,

This is the latest release of the previously called "Conditional Calls"
infrastructure, turned into a static variable synchronized with the load
immediate instructions whenever it is changed. Important changes were made
following the advice received (more details in the specific changelogs). Please
feel free to have a look at this improved version.

It applies at the end of the 2.6.22-rc4-mm2 patch series.

Mathieu


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [patch 1/8] Immediate Value - Architecture Independent Code
  2007-06-15 20:23 [patch 0/8] Immediate values for fast branches Mathieu Desnoyers
@ 2007-06-15 20:23 ` Mathieu Desnoyers
  2007-06-15 20:23 ` [patch 2/8] Immediate Values - Non Optimized Architectures Mathieu Desnoyers
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2007-06-15 20:23 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: immediate-values-architecture-independent-code.patch --]
[-- Type: text/plain, Size: 17737 bytes --]

Immediate values (previously known as conditional calls) are used as a
fast condition used in an if() statement to compile in a block meant to be
dynamically enabled at runtime. When it is disabled, it has a very small
footprint: it loads an immediate value, fed to the branch. In the disabled
state, the branch skips the whole block (if the block consists in a function
call, it will skip the argument setup and the call itself).

It can be used to compile code in the kernel that is seldomly meant to be
dynamically activated. It's the case of CPU specific workarounds, profiling,
tracing, etc.

There is a generic immediate() version, which uses standard global variables,
and optimized per architecture immediate() implementations, which use a load
immediate to remove a data cache hit. When the immediate() functionnality is
disabled in the kernel, it falls back to global variables.

It adds a new rodata section "__immediate" to place the pointers to the enable
value. immediate() activation functions sits in kernel/immediate.c.

Immediate values refer to the memory address of a previously declared integer.
This integer hold the information about the state of the immediate values
associated, and must be accessed through the API found in linux/immediate.h.

The module_mutex is exported from kernel/module.c so other builtin objects can
lock it. At module load time, each immediate value is checked to see if it must
be enabled. It would be the case if the variable they refer to is exported from
another module and already enabled.

In their current implementation, immediate values should not be used to store
data not meant to be either 0 or !0. The goal of the optimized implementations
is to get the smallest instruction, with lowest impact on the normal function
behavior. Also, since the i386 architecture implementation depends on the 0 vs
!0 variable caracteristic, it is recommended to only use these variables as a
condition for a if() statement.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/asm-generic/vmlinux.lds.h |    8 +
 include/linux/immediate.h         |  116 ++++++++++++++++++++++
 include/linux/module.h            |   10 +
 kernel/Makefile                   |    1 
 kernel/immediate.c                |  197 ++++++++++++++++++++++++++++++++++++++
 kernel/module.c                   |   22 +++-
 6 files changed, 351 insertions(+), 3 deletions(-)

Index: linux-2.6-lttng/include/linux/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/immediate.h	2007-06-15 15:58:53.000000000 -0400
@@ -0,0 +1,116 @@
+#ifndef _LINUX_IMMEDIATE_H
+#define _LINUX_IMMEDIATE_H
+
+/*
+ * Immediate values, can be set at runtime. Only set to 0 or 1.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef __KERNEL__
+
+struct module;
+
+/* Always access this type with the provided functions. */
+typedef struct { int value; } immediate_t;
+
+struct __immediate {
+		/* Identifier variable (int *) of the immediate value */
+	immediate_t *var;
+		/*
+		 * Pointer to the memory location that holds the value (data or
+ 		 * immediate value within the load immediate instruction).
+		 */
+	void *enable;
+		/* Immediate value flags, see the list below */
+	int flags;
+};
+
+/*
+ * Immediate value flags : selects the mechanism used to set the immediate
+ * value. This is primarily used at reentrancy-unfriendly sites.
+ *
+ * On an architecture that has optimized immediate values implemented,
+ * the IF_OPTIMIZED flags distinguishes between optimized and non-optimized
+ * immediate value statements: typically, reentrancy-unfriendly sites should
+ * declare their immediate values without the IF_OPTIMIZED flag.
+ */
+#define IF_OPTIMIZED		(1 << 0) /* Use optimized immediate */
+#define IF_LOCKDEP		(1 << 1) /* Can trigger lockdep at patch site */
+#define _IF_NR			2
+
+/*
+ * In order to support embedded systems with read-only memory for the text
+ * segment, the choice to disable the "optimized" immediate values is left as a
+ * config option even if the architecture has the optimized flavor.
+ *
+ * This include scheme is used to support both the generic and optimized version
+ * at the same time : if a _immediate is declared with the IF_OPTIMIZED flags
+ * unset, it will use the generic version. This is useful when we must place
+ * immediates in locations that present specific reentrancy issues, such as
+ * some trap handlers, in the lockdep code and some of the scheduler code. The
+ * optimized version, when it uses the i386 mechanism to insure correct
+ * cross-cpu code modification, can trigger a trap, which will call into lockdep
+ * and might have other side-effects.
+ */
+
+#ifdef CONFIG_IMMEDIATE
+
+#include <asm/immediate.h>		/* optimized immediate flavor */
+
+/*
+ * Generic immediate flavor always available.
+ *
+ * Note : the empty asm volatile with read constraint is used here instead of a
+ * "used" attribute to fix a gcc 4.1.x bug.
+ * Quoting Jeremy Fitzhardinge <jeremy@goop.org> :
+ * "There's a gcc bug which ignores the attribute for local-scope static
+ * variables: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29299"
+ */
+#define immediate_generic(flags, var)					\
+	({								\
+		static const struct __immediate __immediate_info	\
+			__attribute__((section("__immediate"))) =	\
+			{ &(var), NULL, (flags) & ~IF_OPTIMIZED } ;	\
+		asm volatile ( "" : : "i" (&__immediate_info));		\
+		((var).value);						\
+	})
+
+extern void immediate_arm(immediate_t *var);
+extern void immediate_disarm(immediate_t *var);
+extern int immediate_list(void);
+extern void module_immediate_setup(struct module *mod);
+extern void __immediate_update(immediate_t *var, int value);
+#else /* !CONFIG_IMMEDIATE */
+
+#include <asm-generic/immediate.h>	/* fallback on generic immediate */
+
+#define immediate_generic(flags, var)	(unlikely((var).value))
+
+static inline void immediate_arm(immediate_t *var)
+{
+	var->value = 1;
+}
+
+static inline void immediate_disarm(immediate_t *var)
+{
+	var->value = 0;
+}
+static inline void module_immediate_setup(struct module *mod) { }
+static inline void __immediate_update(immediate_t *var, int value)
+{
+	var->value = value;
+}
+#endif /* CONFIG_IMMEDIATE */
+
+/* immediate_query : Returns 1 if enabled, 0 if disabled or not present */
+static inline int immediate_query(immediate_t *var)
+{
+	return var->value;
+}
+
+#endif /* __KERNEL__ */
+#endif
Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-06-15 15:58:28.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-06-15 16:13:50.000000000 -0400
@@ -122,6 +122,13 @@
 		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
 	}								\
 									\
+	/* Immediate values: pointers */				\
+	__immediate : AT(ADDR(__immediate) - LOAD_OFFSET) {		\
+		VMLINUX_SYMBOL(__start___immediate) = .;		\
+		*(__immediate)						\
+		VMLINUX_SYMBOL(__stop___immediate) = .;			\
+	}								\
+									\
 	/* Kernel symbol table: strings */				\
         __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) {	\
 		*(__ksymtab_strings)					\
@@ -266,4 +273,3 @@
   	*(.initcall6s.init)						\
   	*(.initcall7.init)						\
   	*(.initcall7s.init)
-
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2007-06-15 15:58:28.000000000 -0400
+++ linux-2.6-lttng/include/linux/module.h	2007-06-15 16:13:49.000000000 -0400
@@ -15,6 +15,7 @@
 #include <linux/stringify.h>
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
+#include <linux/immediate.h>
 #include <asm/local.h>
 
 #include <asm/module.h>
@@ -67,6 +68,10 @@
 /* Archs provide a method of finding the correct exception table. */
 struct exception_table_entry;
 
+/* Protects the list of modules. */
+extern struct mutex module_mutex;
+extern struct list_head modules;
+
 const struct exception_table_entry *
 search_extable(const struct exception_table_entry *first,
 	       const struct exception_table_entry *last,
@@ -354,6 +359,11 @@
 	/* The command line arguments (may be mangled).  People like
 	   keeping pointers to this stuff */
 	char *args;
+
+#ifdef CONFIG_IMMEDIATE
+	const struct __immediate *immediates;
+	unsigned int num_immediates;
+#endif
 };
 #ifndef MODULE_ARCH_INIT
 #define MODULE_ARCH_INIT {}
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2007-06-15 15:58:28.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c	2007-06-15 16:13:49.000000000 -0400
@@ -32,6 +32,7 @@
 #include <linux/cpu.h>
 #include <linux/moduleparam.h>
 #include <linux/errno.h>
+#include <linux/immediate.h>
 #include <linux/err.h>
 #include <linux/vermagic.h>
 #include <linux/notifier.h>
@@ -65,8 +66,8 @@
 static DEFINE_SPINLOCK(modlist_lock);
 
 /* List of modules, protected by module_mutex AND modlist_lock */
-static DEFINE_MUTEX(module_mutex);
-static LIST_HEAD(modules);
+DEFINE_MUTEX(module_mutex);
+LIST_HEAD(modules);
 
 static BLOCKING_NOTIFIER_HEAD(module_notify_list);
 
@@ -1578,6 +1579,9 @@
 	unsigned int unusedcrcindex;
 	unsigned int unusedgplindex;
 	unsigned int unusedgplcrcindex;
+	unsigned int immediateindex = 0;
+	unsigned int markersindex = 0;
+	unsigned int markersstringsindex = 0;
 	struct module *mod;
 	long err = 0;
 	void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
@@ -1674,6 +1678,9 @@
 #ifdef ARCH_UNWIND_SECTION_NAME
 	unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
 #endif
+#ifdef CONFIG_IMMEDIATE
+	immediateindex = find_sec(hdr, sechdrs, secstrings, "__immediate");
+#endif
 
 	/* Don't keep modinfo section */
 	sechdrs[infoindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
@@ -1684,6 +1691,8 @@
 #endif
 	if (unwindex)
 		sechdrs[unwindex].sh_flags |= SHF_ALLOC;
+	if (immediateindex)
+		sechdrs[immediateindex].sh_flags |= SHF_ALLOC;
 
 	/* Check module struct version now, before we try to use module. */
 	if (!check_modstruct_version(sechdrs, versindex, mod)) {
@@ -1824,6 +1833,13 @@
 	mod->gpl_future_syms = (void *)sechdrs[gplfutureindex].sh_addr;
 	if (gplfuturecrcindex)
 		mod->gpl_future_crcs = (void *)sechdrs[gplfuturecrcindex].sh_addr;
+#ifdef CONFIG_IMMEDIATE
+	if (immediateindex) {
+		mod->immediates = (void *)sechdrs[immediateindex].sh_addr;
+		mod->num_immediates =
+			sechdrs[immediateindex].sh_size / sizeof(*mod->immediates);
+	}
+#endif
 
 	mod->unused_syms = (void *)sechdrs[unusedindex].sh_addr;
 	if (unusedcrcindex)
@@ -1883,6 +1899,8 @@
 
 	add_kallsyms(mod, sechdrs, symindex, strindex, secstrings);
 
+	module_immediate_setup(mod);
+
 	err = module_finalize(hdr, sechdrs, mod);
 	if (err < 0)
 		goto cleanup;
Index: linux-2.6-lttng/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/kernel/Makefile	2007-06-15 15:58:28.000000000 -0400
+++ linux-2.6-lttng/kernel/Makefile	2007-06-15 16:13:50.000000000 -0400
@@ -56,6 +56,7 @@
 obj-$(CONFIG_UTS_NS) += utsname.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_IMMEDIATE) += immediate.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/immediate.c	2007-06-15 15:58:53.000000000 -0400
@@ -0,0 +1,197 @@
+/*
+ * Copyright (C) 2007 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/immediate.h>
+
+extern const struct __immediate __start___immediate[];
+extern const struct __immediate __stop___immediate[];
+
+/*
+ * modules_mutex nests inside immediate_mutex. immediate_mutex protects builtin
+ * immediates and module immediates.
+ */
+DEFINE_MUTEX(immediate_mutex);
+
+/*
+ * Sets a range of immediates to a enabled state : set the enable bit.
+ */
+static void _immediate_update_range(
+	const struct __immediate *begin, const struct __immediate *end)
+{
+	const struct __immediate *iter;
+	int enable;
+
+	for (iter = begin; iter < end; iter++) {
+		if (!(iter->flags & IF_OPTIMIZED))
+			continue;
+		enable = immediate_query(iter->var);
+		if (enable != IMMEDIATE_OPTIMIZED_ENABLE(iter->enable))
+			immediate_optimized_set_enable(iter->enable, enable);
+	}
+}
+
+#ifdef CONFIG_MODULES
+/*
+ * Setup the immediate according to the variable upon which it depends.  Called
+ * by load_module with module_mutex held. This mutex protects against concurrent
+ * modifications to modules'immediates. Therefore, since
+ * module_immediate_setup() does not modify builtin immediates, it does not need
+ * to take the immediate_mutex.
+ */
+void module_immediate_setup(struct module *mod)
+{
+	_immediate_update_range(mod->immediates,
+		mod->immediates+mod->num_immediates);
+}
+#endif
+
+/*
+ * Provides a listing of the immediates present in the kernel with their type
+ * (optimized or generic) and state (enabled or disabled).
+ */
+static int _immediate_list_range(const struct __immediate *begin,
+	const struct __immediate *end)
+{
+	const struct __immediate *iter;
+	int found = 0;
+
+	for (iter = begin; iter < end; iter++) {
+		printk("variable %p \n", iter->var);
+		if (iter->flags & IF_OPTIMIZED)
+			printk("  enable %u optimized\n",
+				IMMEDIATE_OPTIMIZED_ENABLE(iter->enable));
+		else
+			printk("  enable %u generic\n",
+				immediate_query(iter->var));
+		found++;
+	}
+	return found;
+}
+
+#ifdef CONFIG_MODULES
+static inline void __immediate_update_modules(immediate_t *var, int value)
+{
+	struct module *mod;
+
+	list_for_each_entry(mod, &modules, list) {
+		if (mod->taints)
+			continue;
+		_immediate_update_range(mod->immediates,
+			mod->immediates+mod->num_immediates);
+	}
+}
+#else
+static inline void __immediate_update_modules(immediate_t *var, int value) { }
+#endif
+
+/*
+ * Calls _immediate_update_range for the core immediates and modules immediates.
+ */
+void __immediate_update(immediate_t *var, int value)
+{
+
+	var->value = value;
+	/* Core kernel immediates */
+	_immediate_update_range(__start___immediate, __stop___immediate);
+	/* immediates in modules. */
+	__immediate_update_modules(var, value);
+}
+
+#ifdef CONFIG_MODULES
+/*
+ * Takes module_mutex.
+ */
+void _immediate_update(immediate_t *var, int value)
+{
+	mutex_lock(&module_mutex);
+	__immediate_update(var, value);
+	mutex_unlock(&module_mutex);
+}
+#else
+void _immediate_update(immediate_t *var, int value)
+{
+	__immediate_update(var, value);
+}
+#endif
+
+/* immediate enabling/disabling use the immediate_mutex to synchronize. */
+void immediate_arm(immediate_t *var)
+{
+	mutex_lock(&immediate_mutex);
+	_immediate_update(var, 1);
+	mutex_unlock(&immediate_mutex);
+}
+EXPORT_SYMBOL_GPL(immediate_arm);
+
+/* immediate enabling/disabling use the immediate_mutex to synchronize. */
+void immediate_disarm(immediate_t *var)
+{
+	mutex_lock(&immediate_mutex);
+	_immediate_update(var, 0);
+	mutex_unlock(&immediate_mutex);
+}
+EXPORT_SYMBOL_GPL(immediate_disarm);
+
+#ifdef CONFIG_MODULES
+static inline int immediate_list_modules(void)
+{
+	int found = 0;
+	struct module *mod;
+
+	printk("Listing module immediate values\n");
+	mutex_lock(&module_mutex);
+	list_for_each_entry(mod, &modules, list) {
+		if (!mod->taints) {
+			printk("Listing immediate values for module %s\n",
+				mod->name);
+			found += _immediate_list_range(mod->immediates,
+				mod->immediates+mod->num_immediates);
+		}
+	}
+	mutex_unlock(&module_mutex);
+	return found;
+}
+#else
+static inline int immediate_list_modules(void)
+{
+	return 0;
+}
+#endif
+
+/*
+ * Calls _immediate_list_range for the core and module immediates.
+ * Cond call listing uses the module_mutex to synchronize.
+ * Takes the immediate mutex to protect against builtin immediate modification
+ * and takes the module_mutex to protect against module list modification.
+ * TODO : should output this listing to a procfs file.
+ */
+int immediate_list(void)
+{
+	int found = 0;
+
+	mutex_lock(&immediate_mutex);
+	/* Core kernel immediates */
+	printk("Listing kernel immediate values\n");
+	found += _immediate_list_range(__start___immediate, __stop___immediate);
+	/* immediates in modules. */
+	found += immediate_list_modules();
+	mutex_unlock(&immediate_mutex);
+	return found;
+}
+EXPORT_SYMBOL_GPL(immediate_list);

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [patch 2/8] Immediate Values - Non Optimized Architectures
  2007-06-15 20:23 [patch 0/8] Immediate values for fast branches Mathieu Desnoyers
  2007-06-15 20:23 ` [patch 1/8] Immediate Value - Architecture Independent Code Mathieu Desnoyers
@ 2007-06-15 20:23 ` Mathieu Desnoyers
  2007-06-15 20:23 ` [patch 3/8] Immediate Value - Add kconfig menus Mathieu Desnoyers
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2007-06-15 20:23 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: immediate-values-non-optimized-architectures.patch --]
[-- Type: text/plain, Size: 9601 bytes --]

Architecture agnostic, generic, version of the immediate values. It uses a
global variable to mimic the immediate values.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/asm-alpha/immediate.h     |    1 +
 include/asm-arm/immediate.h       |    1 +
 include/asm-arm26/immediate.h     |    1 +
 include/asm-cris/immediate.h      |    1 +
 include/asm-frv/immediate.h       |    1 +
 include/asm-generic/immediate.h   |   14 ++++++++++++++
 include/asm-h8300/immediate.h     |    1 +
 include/asm-i386/immediate.h      |    1 +
 include/asm-ia64/immediate.h      |    1 +
 include/asm-m32r/immediate.h      |    1 +
 include/asm-m68k/immediate.h      |    1 +
 include/asm-m68knommu/immediate.h |    1 +
 include/asm-mips/immediate.h      |    1 +
 include/asm-parisc/immediate.h    |    1 +
 include/asm-powerpc/immediate.h   |    1 +
 include/asm-ppc/immediate.h       |    1 +
 include/asm-s390/immediate.h      |    1 +
 include/asm-sh/immediate.h        |    1 +
 include/asm-sh64/immediate.h      |    1 +
 include/asm-sparc/immediate.h     |    1 +
 include/asm-sparc64/immediate.h   |    1 +
 include/asm-um/immediate.h        |    1 +
 include/asm-v850/immediate.h      |    1 +
 include/asm-x86_64/immediate.h    |    1 +
 include/asm-xtensa/immediate.h    |    1 +
 25 files changed, 38 insertions(+)

Index: linux-2.6-lttng/include/asm-generic/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-generic/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1,14 @@
+#ifndef _ASM_GENERIC_IMMEDIATE_H
+#define _ASM_GENERIC_IMMEDIATE_H
+
+/* Default flags, used by _immediate() */
+#define IF_DEFAULT			0
+
+/* Fallback on the generic immediate, since no optimized version is available */
+#define immediate_optimized		immediate_generic
+#define _immediate(flags, var)		immediate_generic(flags, var)
+
+/* immediate with default behavior */
+#define immediate(var)		_immediate(IF_DEFAULT, var)
+
+#endif /* _ASM_GENERIC_IMMEDIATE_H */
Index: linux-2.6-lttng/include/asm-alpha/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-alpha/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-arm/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-arm/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-arm26/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-arm26/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-cris/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-cris/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-frv/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-frv/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-h8300/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-h8300/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-ia64/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-ia64/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-m32r/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-m32r/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-m68k/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-m68k/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-m68knommu/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-m68knommu/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-mips/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-mips/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-parisc/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-parisc/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-ppc/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-ppc/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-s390/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-s390/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-sh/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-sh/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-sh64/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-sh64/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-sparc/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-sparc/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-sparc64/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-sparc64/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-um/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-um/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-v850/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-v850/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-x86_64/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-x86_64/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-xtensa/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-xtensa/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-i386/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-i386/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>
Index: linux-2.6-lttng/include/asm-powerpc/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-powerpc/immediate.h	2007-06-15 16:13:55.000000000 -0400
@@ -0,0 +1 @@
+#include <asm-generic/immediate.h>

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [patch 3/8] Immediate Value - Add kconfig menus
  2007-06-15 20:23 [patch 0/8] Immediate values for fast branches Mathieu Desnoyers
  2007-06-15 20:23 ` [patch 1/8] Immediate Value - Architecture Independent Code Mathieu Desnoyers
  2007-06-15 20:23 ` [patch 2/8] Immediate Values - Non Optimized Architectures Mathieu Desnoyers
@ 2007-06-15 20:23 ` Mathieu Desnoyers
  2007-06-15 20:23 ` [patch 4/8] Immediate Value - i386 Optimization Mathieu Desnoyers
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2007-06-15 20:23 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers, Adrian Bunk, Andi Kleen

[-- Attachment #1: immediate-values-kconfig-menus.patch --]
[-- Type: text/plain, Size: 14249 bytes --]

Immediate values provide a way to compile in kernels features that can be
enabled dynamically, with a very small footprint when disabled.

This patch:

Add Kconfig menus for the marker code.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
---

 arch/alpha/Kconfig       |    6 ++++++
 arch/arm/Kconfig         |    6 ++++++
 arch/arm26/Kconfig       |    6 ++++++
 arch/avr32/Kconfig.debug |    7 +++++++
 arch/cris/Kconfig        |    6 ++++++
 arch/frv/Kconfig         |    6 ++++++
 arch/h8300/Kconfig       |    6 ++++++
 arch/i386/Kconfig        |    2 ++
 arch/ia64/Kconfig        |    3 +++
 arch/m32r/Kconfig        |    6 ++++++
 arch/m68k/Kconfig        |    6 ++++++
 arch/m68knommu/Kconfig   |    6 ++++++
 arch/mips/Kconfig        |    6 ++++++
 arch/parisc/Kconfig      |    6 ++++++
 arch/powerpc/Kconfig     |    3 +++
 arch/ppc/Kconfig         |    6 ++++++
 arch/s390/Kconfig        |    2 ++
 arch/sh/Kconfig          |    6 ++++++
 arch/sh64/Kconfig        |    6 ++++++
 arch/sparc/Kconfig       |    2 ++
 arch/sparc64/Kconfig     |    3 +++
 arch/um/Kconfig          |    6 ++++++
 arch/v850/Kconfig        |    6 ++++++
 arch/x86_64/Kconfig      |    3 +++
 arch/xtensa/Kconfig      |    6 ++++++
 kernel/Kconfig.immediate |    9 +++++++++
 26 files changed, 136 insertions(+)

Index: linux-2.6-lttng/arch/alpha/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/alpha/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/alpha/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -653,6 +653,12 @@
 
 source "arch/alpha/oprofile/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/alpha/Kconfig.debug"
 
 # DUMMY_CONSOLE may be defined in drivers/video/console/Kconfig
Index: linux-2.6-lttng/arch/arm/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/arm/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/arm/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -1046,6 +1046,12 @@
 
 source "arch/arm/oprofile/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/arm/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/arm26/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/arm26/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/arm26/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -244,6 +244,12 @@
 
 source "drivers/usb/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/arm26/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/cris/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/cris/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/cris/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -198,6 +198,12 @@
 
 source "drivers/usb/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/cris/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/frv/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/frv/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/frv/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -375,6 +375,12 @@
 
 source "fs/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/frv/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/h8300/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/h8300/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/h8300/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -223,6 +223,12 @@
 
 source "fs/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/h8300/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/i386/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/i386/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/i386/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -1235,6 +1235,8 @@
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
 
+source "kernel/Kconfig.immediate"
+
 endif # INSTRUMENTATION
 
 source "arch/i386/Kconfig.debug"
Index: linux-2.6-lttng/arch/ia64/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/ia64/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/ia64/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -593,6 +593,9 @@
 	  a probepoint and specifies the callback.  Kprobes is useful
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
+
+source "kernel/Kconfig.immediate"
+
 endmenu
 
 source "arch/ia64/Kconfig.debug"
Index: linux-2.6-lttng/arch/m32r/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/m32r/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/m32r/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -408,6 +408,12 @@
 
 source "arch/m32r/oprofile/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/m32r/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/m68k/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/m68k/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/m68k/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -679,6 +679,12 @@
 
 source "fs/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/m68k/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/m68knommu/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/m68knommu/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/m68knommu/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -668,6 +668,12 @@
 
 source "fs/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/m68knommu/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/mips/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/mips/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/mips/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -1957,6 +1957,12 @@
 
 source "arch/mips/oprofile/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/mips/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/parisc/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/parisc/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/parisc/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -269,6 +269,12 @@
 
 source "arch/parisc/oprofile/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/parisc/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/powerpc/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/powerpc/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/powerpc/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -905,6 +905,9 @@
 	  a probepoint and specifies the callback.  Kprobes is useful
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
+
+source "kernel/Kconfig.immediate"
+
 endmenu
 
 source "arch/powerpc/Kconfig.debug"
Index: linux-2.6-lttng/arch/ppc/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/ppc/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/ppc/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -1451,8 +1451,14 @@
 
 source "lib/Kconfig"
 
+menu "Instrumentation Support"
+
 source "arch/powerpc/oprofile/Kconfig"
 
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/ppc/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/s390/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/s390/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/s390/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -547,6 +547,8 @@
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
 
+source "kernel/Kconfig.immediate"
+
 endmenu
 
 source "arch/s390/Kconfig.debug"
Index: linux-2.6-lttng/arch/sh/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/sh/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/sh/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -721,6 +721,12 @@
 
 source "arch/sh/oprofile/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/sh/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/sh64/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/sh64/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/sh64/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -284,6 +284,12 @@
 
 source "arch/sh64/oprofile/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/sh64/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/sparc/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/sparc/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/sparc/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -313,6 +313,8 @@
 
 source "arch/sparc/oprofile/Kconfig"
 
+source "kernel/Kconfig.immediate"
+
 endmenu
 
 source "arch/sparc/Kconfig.debug"
Index: linux-2.6-lttng/arch/sparc64/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/sparc64/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/sparc64/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -438,6 +438,9 @@
 	  a probepoint and specifies the callback.  Kprobes is useful
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
+
+source "kernel/Kconfig.immediate"
+
 endmenu
 
 source "arch/sparc64/Kconfig.debug"
Index: linux-2.6-lttng/arch/um/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/um/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/um/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -335,4 +335,10 @@
 	bool
 	default n
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/um/Kconfig.debug"
Index: linux-2.6-lttng/arch/v850/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/v850/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/v850/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -331,6 +331,12 @@
 
 source "drivers/usb/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/v850/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/x86_64/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/x86_64/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/x86_64/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -792,6 +792,9 @@
 	  a probepoint and specifies the callback.  Kprobes is useful
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
+
+source "kernel/Kconfig.immediate"
+
 endmenu
 
 source "arch/x86_64/Kconfig.debug"
Index: linux-2.6-lttng/arch/xtensa/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/xtensa/Kconfig	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/xtensa/Kconfig	2007-06-15 16:14:02.000000000 -0400
@@ -251,6 +251,12 @@
 	  provide one yourself.
 endmenu
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 source "arch/xtensa/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/kernel/Kconfig.immediate
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/Kconfig.immediate	2007-06-15 16:14:02.000000000 -0400
@@ -0,0 +1,9 @@
+# Immediate values configuration
+
+config IMMEDIATE
+	bool "Use self-modifying code to provide fast immediate values"
+	help
+	  Provides a way to use immediate values acting as global values to
+	  dynamically enable kernel features while having a very small
+	  footprint when disabled. You may want to disable this feature if you
+	  run your kernel code on a read-only rom/flash.
Index: linux-2.6-lttng/arch/avr32/Kconfig.debug
===================================================================
--- linux-2.6-lttng.orig/arch/avr32/Kconfig.debug	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/avr32/Kconfig.debug	2007-06-15 16:14:02.000000000 -0400
@@ -6,6 +6,9 @@
 
 source "lib/Kconfig.debug"
 
+menu "Instrumentation Support"
+	depends on EXPERIMENTAL
+
 config KPROBES
 	bool "Kprobes"
 	depends on DEBUG_KERNEL
@@ -16,4 +19,8 @@
           for kernel debugging, non-intrusive instrumentation and testing.
           If in doubt, say "N".
 
+source "kernel/Kconfig.immediate"
+
+endmenu
+
 endmenu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [patch 4/8] Immediate Value - i386 Optimization
  2007-06-15 20:23 [patch 0/8] Immediate values for fast branches Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2007-06-15 20:23 ` [patch 3/8] Immediate Value - Add kconfig menus Mathieu Desnoyers
@ 2007-06-15 20:23 ` Mathieu Desnoyers
  2007-06-15 22:02   ` Chuck Ebbert
  2007-06-15 20:23 ` [patch 5/8] Immediate Value - PowerPC Optimization Mathieu Desnoyers
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2007-06-15 20:23 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: immediate-values-i386-optimization.patch --]
[-- Type: text/plain, Size: 10940 bytes --]

i386 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used for the branch test.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 arch/i386/kernel/Makefile    |    1 
 arch/i386/kernel/immediate.c |  177 +++++++++++++++++++++++++++++++++++++++++++
 include/asm-i386/immediate.h |   72 +++++++++++++++++
 3 files changed, 249 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/include/asm-i386/immediate.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-i386/immediate.h	2007-06-15 16:13:55.000000000 -0400
+++ linux-2.6-lttng/include/asm-i386/immediate.h	2007-06-15 16:14:04.000000000 -0400
@@ -1 +1,71 @@
-#include <asm-generic/immediate.h>
+#ifndef _ASM_I386_IMMEDIATE_H
+#define _ASM_I386_IMMEDIATE_H
+
+/*
+ * Immediate values. i386 architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#define IF_DEFAULT (IF_OPTIMIZED | IF_LOCKDEP)
+
+/*
+ * Optimized version of the immediate. Passing the flags as a pointer to
+ * the inline assembly to trick it into putting the flags value as third
+ * parameter in the structure.
+ */
+#define immediate_optimized(flags, var)					\
+	({								\
+		int condition;						\
+		asm (	".section __immediate, \"a\", @progbits;\n\t"	\
+					".long %1, 0f, %2;\n\t"		\
+					".previous;\n\t"		\
+					"0:\n\t"			\
+					"movl %3,%0;\n\t"		\
+				: "=r" (condition)			\
+				: "m" (var),				\
+				  "m" (*(char*)flags),			\
+				  "i" (0));				\
+		(condition);						\
+	})
+
+/*
+ * immediate macro selecting the generic or optimized version of immediate,
+ * depending on the flags specified. It is a macro because we need to pass the
+ * name to immediate_optimized() and immediate_generic() so they can declare a
+ * static variable with it.
+ */
+#define _immediate(flags, var)						\
+({									\
+	(((flags) & IF_LOCKDEP) && ((flags) & IF_OPTIMIZED)) ?		\
+		immediate_optimized(flags, var) :			\
+		immediate_generic(flags, var);				\
+})
+
+/* immediate with default behavior */
+#define immediate(var)	_immediate(IF_DEFAULT, var)
+
+/*
+ * Architecture dependant immediate information, used internally for immediate
+ * activation.
+ */
+
+/*
+ * Offset of the immediate value from the start of the movl instruction, in
+ * bytes. We point to the first lower byte of the 4 bytes immediate value. Only
+ * changing one byte makes sure we do an atomic memory write, independently of
+ * the alignment of the 4 bytes in the load immediate instruction.
+ */
+#define IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 1
+#define IMMEDIATE_OPTIMIZED_ENABLE_TYPE unsigned char
+/* Dereference enable as lvalue from a pointer to its instruction */
+#define IMMEDIATE_OPTIMIZED_ENABLE(a)					\
+	(*(IMMEDIATE_OPTIMIZED_ENABLE_TYPE*)				\
+		((char*)(a)+IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET))
+
+extern int immediate_optimized_set_enable(void *address, char enable);
+
+#endif /* _ASM_I386_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/i386/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/arch/i386/kernel/Makefile	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/i386/kernel/Makefile	2007-06-15 16:14:04.000000000 -0400
@@ -35,6 +35,7 @@
 obj-y				+= sysenter.o vsyscall.o
 obj-$(CONFIG_ACPI_SRAT) 	+= srat.o
 obj-$(CONFIG_EFI) 		+= efi.o efi_stub.o
+obj-$(CONFIG_IMMEDIATE)		+= immediate.o
 obj-$(CONFIG_DOUBLEFAULT) 	+= doublefault.o
 obj-$(CONFIG_SERIAL_8250)	+= legacy_serial.o
 obj-$(CONFIG_VM86)		+= vm86.o
Index: linux-2.6-lttng/arch/i386/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/arch/i386/kernel/immediate.c	2007-06-15 16:14:04.000000000 -0400
@@ -0,0 +1,177 @@
+/*
+ * Immediate Value - i386 architecture specific code.
+ *
+ * Rationale
+ *
+ * Required because of :
+ * - Erratum 49 fix for Intel PIII.
+ * - Still present on newer processors : Intel Core 2 Duo Processor for Intel
+ *   Centrino Duo Processor Technology Specification Update, AH33.
+ *   Unsynchronized Cross-Modifying Code Operations Can Cause Unexpected
+ *   Instruction Execution Results.
+ *
+ * Permits immediate value modification by XMC with correct serialization.
+ *
+ * Reentrant for NMI and trap handler instrumentation. Permits XMC to a
+ * location that has preemption enabled because it involves no temporary or
+ * reused data structure.
+ *
+ * Quoting Richard J Moore, source of the information motivating this
+ * implementation which differs from the one proposed by Intel which is not
+ * suitable for kernel context (does not support NMI and would require disabling
+ * interrupts on every CPU for a long period) :
+ *
+ * "There is another issue to consider when looking into using probes other
+ * then int3:
+ *
+ * Intel erratum 54 - Unsynchronized Cross-modifying code - refers to the
+ * practice of modifying code on one processor where another has prefetched
+ * the unmodified version of the code. Intel states that unpredictable general
+ * protection faults may result if a synchronizing instruction (iret, int,
+ * int3, cpuid, etc ) is not executed on the second processor before it
+ * executes the pre-fetched out-of-date copy of the instruction.
+ *
+ * When we became aware of this I had a long discussion with Intel's
+ * microarchitecture guys. It turns out that the reason for this erratum
+ * (which incidentally Intel does not intend to fix) is because the trace
+ * cache - the stream of micorops resulting from instruction interpretation -
+ * cannot guaranteed to be valid. Reading between the lines I assume this
+ * issue arises because of optimization done in the trace cache, where it is
+ * no longer possible to identify the original instruction boundaries. If the
+ * CPU discoverers that the trace cache has been invalidated because of
+ * unsynchronized cross-modification then instruction execution will be
+ * aborted with a GPF. Further discussion with Intel revealed that replacing
+ * the first opcode byte with an int3 would not be subject to this erratum.
+ *
+ * So, is cmpxchg reliable? One has to guarantee more than mere atomicity."
+ *
+ * Overall design
+ *
+ * The algorithm proposed by Intel applies not so well in kernel context: it
+ * would imply disabling interrupts and looping on every CPUs while modifying
+ * the code and would not support instrumentation of code called from interrupt
+ * sources that cannot be disabled.
+ *
+ * Therefore, we use a different algorithm to respect Intel's erratum (see the
+ * quoted discussion above). We make sure that no CPU sees an out-of-date copy
+ * of a pre-fetched instruction by 1 - using a breakpoint, which skips the
+ * instruction that is going to be modified, 2 - issuing an IPI to every CPU to
+ * execute a sync_core(), to make sure that even when the breakpoint is removed,
+ * no cpu could possibly still have the out-of-date copy of the instruction,
+ * modify the now unused 2nd byte of the instruction, and then put back the
+ * original 1st byte of the instruction.
+ *
+ * It has exactly the same intent as the algorithm proposed by Intel, but
+ * it has less side-effects, scales better and supports NMI, SMI and MCE.
+ *
+ * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ */
+
+#include <linux/notifier.h>
+#include <linux/preempt.h>
+#include <linux/smp.h>
+#include <linux/notifier.h>
+#include <linux/module.h>
+#include <linux/immediate.h>
+#include <linux/kdebug.h>
+
+#include <asm/cacheflush.h>
+
+#define BREAKPOINT_INSTRUCTION  0xcc
+#define BREAKPOINT_INS_LEN 1
+
+static long target_eip;
+
+static void immediate_synchronize_core(void *info)
+{
+	sync_core();	/* use cpuid to stop speculative execution */
+}
+
+/*
+ * The eip value points right after the breakpoint instruction, in the second
+ * byte of the movb. Incrementing it of 1 byte makes the code resume right after
+ * the movb instruction, effectively skipping this instruction.
+ *
+ * We simply skip the 2 bytes load immediate here, leaving the register in an
+ * undefined state. We don't care about the content (0 or !0), because we are
+ * changing the value 0->1 or 1->0. This small window of undefined value
+ * doesn't matter.
+ */
+static int immediate_notifier(struct notifier_block *nb,
+	unsigned long val, void *data)
+{
+	enum die_val die_val = (enum die_val) val;
+	struct die_args *args = data;
+
+	if (!args->regs || user_mode_vm(args->regs))
+		return NOTIFY_DONE;
+
+	if (die_val == DIE_INT3	&& args->regs->eip == target_eip) {
+		args->regs->eip += 1; /* Skip the next byte of load immediate */
+		return NOTIFY_STOP;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block immediate_notify = {
+	.notifier_call = immediate_notifier,
+	.priority = 0x7fffffff,	/* we need to be notified first */
+};
+
+/*
+ * The address is not aligned. We can only change 1 byte of the value
+ * atomically.
+ * Must be called with immediate_mutex held.
+ */
+int immediate_optimized_set_enable(void *address, char enable)
+{
+	char saved_byte;
+	int ret;
+	char *dest = address;
+
+	if (!(enable ^ dest[1])) /* Must be a state change 0<->1 to execute */
+		return 0;
+
+#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_PAGEALLOC)
+	/* Make sure this page is writable */
+	change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_EXEC);
+	global_flush_tlb();
+#endif
+	target_eip = (long)address + BREAKPOINT_INS_LEN;
+	/* register_die_notifier has memory barriers */
+	register_die_notifier(&immediate_notify);
+	saved_byte = *dest;
+	*dest = BREAKPOINT_INSTRUCTION;
+	wmb();
+	/*
+	 * Execute serializing instruction on each CPU.
+	 * Acts as a memory barrier.
+	 */
+	ret = on_each_cpu(immediate_synchronize_core, NULL, 1, 1);
+	BUG_ON(ret != 0);
+
+	dest[1] = enable;
+	wmb();
+	*dest = saved_byte;
+		/*
+		 * Wait for all int3 handlers to end
+		 * (interrupts are disabled in int3).
+		 * This CPU is clearly not in a int3 handler,
+		 * because int3 handler is not preemptible and
+		 * there cannot be any more int3 handler called
+		 * for this site, because we placed the original
+		 * instruction back.
+		 * synchronize_sched has memory barriers.
+		 */
+	synchronize_sched();
+	unregister_die_notifier(&immediate_notify);
+	/* unregister_die_notifier has memory barriers */
+	target_eip = 0;
+#if defined(CONFIG_DEBUG_RODATA)
+	/* Set this page back to RX */
+	change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_RX);
+	global_flush_tlb();
+#endif
+	return 0;
+}
+EXPORT_SYMBOL_GPL(immediate_optimized_set_enable);

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [patch 5/8] Immediate Value - PowerPC Optimization
  2007-06-15 20:23 [patch 0/8] Immediate values for fast branches Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2007-06-15 20:23 ` [patch 4/8] Immediate Value - i386 Optimization Mathieu Desnoyers
@ 2007-06-15 20:23 ` Mathieu Desnoyers
  2007-06-15 20:23 ` [patch 6/8] Immediate Value - Documentation Mathieu Desnoyers
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2007-06-15 20:23 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: immediate-values-powerpc-optimization.patch --]
[-- Type: text/plain, Size: 4748 bytes --]

PowerPC optimization of the immediate values which uses a li instruction,
patched with an immediate value 0 or 1 to set the immediate value.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 arch/powerpc/kernel/Makefile    |    1 
 arch/powerpc/kernel/immediate.c |   34 +++++++++++++++++++
 include/asm-powerpc/immediate.h |   69 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 103 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/include/asm-powerpc/immediate.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-powerpc/immediate.h	2007-06-15 16:13:55.000000000 -0400
+++ linux-2.6-lttng/include/asm-powerpc/immediate.h	2007-06-15 16:14:04.000000000 -0400
@@ -1 +1,68 @@
-#include <asm-generic/immediate.h>
+#ifndef _ASM_POWERPC_IMMEDIATE_H
+#define _ASM_POWERPC_IMMEDIATE_H
+
+/*
+ * Immediate values. PowerPC architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm/asm-compat.h>
+
+#define IF_DEFAULT (IF_OPTIMIZED | IF_LOCKDEP)
+
+/* Optimized version of the immediate */
+#define immediate_optimized(flags, var)					\
+	({								\
+		char condition;						\
+		asm (	".section __immediate, \"a\", @progbits;\n\t"	\
+					PPC_LONG "%1, 0f, %2;\n\t"	\
+					".previous;\n\t"		\
+					".align 4\n\t"			\
+					"0:\n\t"			\
+					"li %0,%3;\n\t"			\
+				: "=r" (condition)			\
+				: "i" (&var),				\
+				  "i" (flags),				\
+				  "i" (0));				\
+		(condition);						\
+	})
+
+/*
+ * immediate macro selecting the generic or optimized version of immediate,
+ * depending on the flags specified. It is a macro because we need to pass the
+ * name to immediate_optimized() and immediate_generic() so they can declare a
+ * static variable with it.
+ */
+#define _immediate(flags, var)						\
+({									\
+	((flags) & IF_OPTIMIZED) ?					\
+		immediate_optimized(flags, var) :			\
+		immediate_generic(flags, var);				\
+})
+
+/* immediate with default behavior */
+#define immediate(var)	 _immediate(IF_DEFAULT, var)
+
+/*
+ * Architecture dependant immediate information, used internally for immediate
+ * activation.
+ */
+
+/*
+ * Offset of the immediate value from the start of the addi instruction (result
+ * of the li mnemonic), in bytes.
+ */
+#define IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 2
+#define IMMEDIATE_OPTIMIZED_ENABLE_TYPE unsigned short
+/* Dereference enable as lvalue from a pointer to its instruction */
+#define IMMEDIATE_OPTIMIZED_ENABLE(a)					\
+	(*(IMMEDIATE_OPTIMIZED_ENABLE_TYPE*)				\
+		((char*)(a)+IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET))
+
+extern int immediate_optimized_set_enable(void *address, char enable);
+
+#endif /* _ASM_POWERPC_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/powerpc/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/arch/powerpc/kernel/Makefile	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/powerpc/kernel/Makefile	2007-06-15 16:14:04.000000000 -0400
@@ -96,3 +96,4 @@
 
 extra-$(CONFIG_PPC_FPU)		+= fpu.o
 extra-$(CONFIG_PPC64)		+= entry_64.o
+obj-$(CONFIG_IMMEDIATE)		+= immediate.o
Index: linux-2.6-lttng/arch/powerpc/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/arch/powerpc/kernel/immediate.c	2007-06-15 16:14:04.000000000 -0400
@@ -0,0 +1,34 @@
+/*
+ * Powerpc optimized immediate values enabling/disabling.
+ *
+ * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ */
+
+#include <linux/module.h>
+#include <linux/immediate.h>
+#include <linux/string.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+/*
+ * The address is aligned on 4 bytes boundary: the 4 bytes instruction we are
+ * changing fits within one page.
+ */
+int immediate_optimized_set_enable(void *address, char enable)
+{
+	char newi[IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET+1];
+	int size = IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET
+			+ sizeof(IMMEDIATE_OPTIMIZED_ENABLE_TYPE);
+
+#if defined(CONFIG_DEBUG_PAGEALLOC)
+	/* Make sure this page is writable */
+	change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_EXEC);
+	global_flush_tlb();
+#endif
+	memcpy(newi, address, size);
+	IMMEDIATE_OPTIMIZED_ENABLE(&newi[0]) = enable;
+	memcpy(address, newi, size);
+	flush_icache_range((unsigned long)address, size);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(immediate_optimized_set_enable);

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [patch 6/8] Immediate Value - Documentation
  2007-06-15 20:23 [patch 0/8] Immediate values for fast branches Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2007-06-15 20:23 ` [patch 5/8] Immediate Value - PowerPC Optimization Mathieu Desnoyers
@ 2007-06-15 20:23 ` Mathieu Desnoyers
  2007-06-15 20:23 ` [patch 7/8] F00F bug fixup for i386 - use immediate values Mathieu Desnoyers
  2007-06-15 20:23 ` [patch 8/8] Scheduler profiling - Use " Mathieu Desnoyers
  7 siblings, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2007-06-15 20:23 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: immediate-values-documentation.patch --]
[-- Type: text/plain, Size: 4142 bytes --]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 Documentation/immediate.txt |  103 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)

Index: linux-2.6-lttng/Documentation/immediate.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/Documentation/immediate.txt	2007-06-15 16:14:05.000000000 -0400
@@ -0,0 +1,103 @@
+		        Using the Immediate Values
+
+			    Mathieu Desnoyers
+
+
+This document introduces Immediate Values and their use.
+
+* Purpose of immediate values
+
+An immediate value is used to compile into the kernel a branch that is disabled
+at compile-time that has almost no measurable performance impact on the kernel.
+Then, at runtime, it can be enabled dynamically.
+
+It can be used to compile code in the kernel that is seldomly meant to be
+dynamically activated. It's the case of CPU specific workarounds, profiling,
+tracing, etc.
+
+This infrastructure is specialized in supporting dynamic patching of the values
+in the instruction stream when multiple CPUs are running without disturbing the
+normal system behavior.
+
+
+* Usage
+
+In order to use the macro immediate, you should include linux/immediate.h.
+
+#include <linux/immediate.h>
+
+immediate_t __read_mostly this_immediate;
+EXPORT_SYMBOL(this_immediate);
+
+
+Add, in your code :
+
+if (unlikely(immediate(this_immediate))) {
+	some code...
+}
+
+And then, use:
+
+Use immediate_arm(&this_immediate) to activate the immediate value.
+
+Use immediate_disarm(&this_immediate) to deactivate the immediate value.
+
+Use immediate_query(&this_immediate) to query the immediate value state.
+
+The immediate mechanism supports inserting multiple instances of the same
+immediate. Immediate values can be put in inline functions, inlined static
+functions, and unrolled loops.
+
+
+* Optimization for a given architecture
+
+One can implement optimized immediate values for a given architecture by
+replacing asm-$ARCH/immediate.h.
+
+The IF_* flags can be used to control the type of immediate value. See the
+include/linux/immediate.h header for the list of flags. They can be specified as
+the first parameter of the _immediate() macro, as in the following example,
+which uses flags to declare a immediate that always uses the generic version of
+the immediates. It can be useful to use this when immediates are placed in
+kernel code presenting particular reentrancy challenges.
+
+if (unlikely(_immediate(IF_DEFAULT & ~IF_OPTIMIZED, this_immediate))) {
+	some code...
+}
+
+
+* Performance improvement
+
+Result of a small test comparing:
+
+1 - Branch depending on a cache miss (has to fetch in memory, caused by a 128
+    bytes stride)). This is the test that is likely to look like what
+    side-effect the original profile_hit code was causing, under the
+    assumption that the kernel is already using L1 and L2 caches at
+    their full capacity and that a supplementary data load would cause
+    cache trashing.
+2 - Branch depending on L1 cache hit. Just for comparison.
+3 - Branch depending on a load immediate in the instruction stream.
+
+It has been compiled with gcc -O2. Tests done on a 3GHz P4.
+
+In the first test series, the branch is not taken:
+
+number of tests : 1000
+number of branches per test : 81920
+memory hit cycles per iteration (mean) : 48.252
+L1 cache hit cycles per iteration (mean) : 16.1693
+instruction stream based test, cycles per iteration (mean) : 16.0432
+
+
+In the second test series, the branch is taken and an integer is
+incremented within the block:
+
+number of tests : 1000
+number of branches per test : 81920
+memory hit cycles per iteration (mean) : 48.2691
+L1 cache hit cycles per iteration (mean) : 16.396
+instruction stream based test, cycles per iteration (mean) : 16.0441
+
+Therefore, the memory fetch based test seems to be 200% slower than the
+load immediate based test.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [patch 7/8] F00F bug fixup for i386 - use immediate values
  2007-06-15 20:23 [patch 0/8] Immediate values for fast branches Mathieu Desnoyers
                   ` (5 preceding siblings ...)
  2007-06-15 20:23 ` [patch 6/8] Immediate Value - Documentation Mathieu Desnoyers
@ 2007-06-15 20:23 ` Mathieu Desnoyers
  2007-06-15 20:23 ` [patch 8/8] Scheduler profiling - Use " Mathieu Desnoyers
  7 siblings, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2007-06-15 20:23 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: f00f-bug-use-immediate-values.patch --]
[-- Type: text/plain, Size: 2755 bytes --]

Use the faster immediate values for F00F bug handling in do_page_fault.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

---
 arch/i386/kernel/traps.c     |    4 ++++
 arch/i386/mm/fault.c         |    3 ++-
 include/asm-i386/processor.h |    3 +++
 3 files changed, 9 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/arch/i386/kernel/traps.c
===================================================================
--- linux-2.6-lttng.orig/arch/i386/kernel/traps.c	2007-06-15 16:13:50.000000000 -0400
+++ linux-2.6-lttng/arch/i386/kernel/traps.c	2007-06-15 16:14:06.000000000 -0400
@@ -31,6 +31,7 @@
 #include <linux/uaccess.h>
 #include <linux/nmi.h>
 #include <linux/bug.h>
+#include <linux/immediate.h>
 
 #ifdef CONFIG_EISA
 #include <linux/ioport.h>
@@ -1149,6 +1150,8 @@
 #endif /* CONFIG_MATH_EMULATION */
 
 #ifdef CONFIG_X86_F00F_BUG
+immediate_t __read_mostly f00f_bug_fix;
+
 void __init trap_init_f00f_bug(void)
 {
 	__set_fixmap(FIX_F00F_IDT, __pa(&idt_table), PAGE_KERNEL_RO);
@@ -1159,6 +1162,7 @@
 	 */
 	idt_descr.address = fix_to_virt(FIX_F00F_IDT);
 	load_idt(&idt_descr);
+	immediate_arm(&f00f_bug_fix);
 }
 #endif
 
Index: linux-2.6-lttng/arch/i386/mm/fault.c
===================================================================
--- linux-2.6-lttng.orig/arch/i386/mm/fault.c	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/arch/i386/mm/fault.c	2007-06-15 16:14:06.000000000 -0400
@@ -25,6 +25,7 @@
 #include <linux/kprobes.h>
 #include <linux/uaccess.h>
 #include <linux/kdebug.h>
+#include <linux/immediate.h>
 
 #include <asm/system.h>
 #include <asm/desc.h>
@@ -477,7 +478,7 @@
 	/*
 	 * Pentium F0 0F C7 C8 bug workaround.
 	 */
-	if (boot_cpu_data.f00f_bug) {
+	if (unlikely(immediate(f00f_bug_fix))) {
 		unsigned long nr;
 		
 		nr = (address - idt_descr.address) >> 3;
Index: linux-2.6-lttng/include/asm-i386/processor.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-i386/processor.h	2007-06-15 16:13:51.000000000 -0400
+++ linux-2.6-lttng/include/asm-i386/processor.h	2007-06-15 16:14:06.000000000 -0400
@@ -21,6 +21,7 @@
 #include <asm/percpu.h>
 #include <linux/cpumask.h>
 #include <linux/init.h>
+#include <linux/immediate.h>
 #include <asm/processor-flags.h>
 
 /* flag for disabling the tsc */
@@ -102,6 +103,8 @@
 extern struct tss_struct doublefault_tss;
 DECLARE_PER_CPU(struct tss_struct, init_tss);
 
+extern immediate_t __read_mostly f00f_bug_fix;
+
 #ifdef CONFIG_SMP
 extern struct cpuinfo_x86 cpu_data[];
 #define current_cpu_data cpu_data[smp_processor_id()]

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [patch 8/8] Scheduler profiling - Use immediate values
  2007-06-15 20:23 [patch 0/8] Immediate values for fast branches Mathieu Desnoyers
                   ` (6 preceding siblings ...)
  2007-06-15 20:23 ` [patch 7/8] F00F bug fixup for i386 - use immediate values Mathieu Desnoyers
@ 2007-06-15 20:23 ` Mathieu Desnoyers
  7 siblings, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2007-06-15 20:23 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: profiling-use-immediate-values.patch --]
[-- Type: text/plain, Size: 6560 bytes --]

Use immediate values with lower d-cache hit in optimized version as a
condition for scheduler profiling call.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

---
 drivers/kvm/svm.c       |    2 +-
 drivers/kvm/vmx.c       |    2 +-
 include/linux/profile.h |    9 +++------
 kernel/profile.c        |   37 +++++++++++++++++++++++++------------
 kernel/sched.c          |    3 ++-
 5 files changed, 32 insertions(+), 21 deletions(-)

Index: linux-2.6-lttng/kernel/profile.c
===================================================================
--- linux-2.6-lttng.orig/kernel/profile.c	2007-06-15 16:13:50.000000000 -0400
+++ linux-2.6-lttng/kernel/profile.c	2007-06-15 16:14:07.000000000 -0400
@@ -23,6 +23,7 @@
 #include <linux/profile.h>
 #include <linux/highmem.h>
 #include <linux/mutex.h>
+#include <linux/immediate.h>
 #include <asm/sections.h>
 #include <asm/semaphore.h>
 #include <asm/irq_regs.h>
@@ -42,9 +43,6 @@
 static atomic_t *prof_buffer;
 static unsigned long prof_len, prof_shift;
 
-int prof_on __read_mostly;
-EXPORT_SYMBOL_GPL(prof_on);
-
 static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct profile_hit *[2], cpu_profile_hits);
@@ -52,6 +50,12 @@
 static DEFINE_MUTEX(profile_flip_mutex);
 #endif /* CONFIG_SMP */
 
+/* Immediate values */
+immediate_t __read_mostly sleep_profiling, sched_profiling, kvm_profiling,
+	cpu_profiling;
+EXPORT_SYMBOL_GPL(kvm_profiling);
+EXPORT_SYMBOL_GPL(cpu_profiling);
+
 static int __init profile_setup(char * str)
 {
 	static char __initdata schedstr[] = "schedule";
@@ -60,7 +64,7 @@
 	int par;
 
 	if (!strncmp(str, sleepstr, strlen(sleepstr))) {
-		prof_on = SLEEP_PROFILING;
+		immediate_arm(&sleep_profiling);
 		if (str[strlen(sleepstr)] == ',')
 			str += strlen(sleepstr) + 1;
 		if (get_option(&str, &par))
@@ -69,7 +73,7 @@
 			"kernel sleep profiling enabled (shift: %ld)\n",
 			prof_shift);
 	} else if (!strncmp(str, schedstr, strlen(schedstr))) {
-		prof_on = SCHED_PROFILING;
+		immediate_arm(&sched_profiling);
 		if (str[strlen(schedstr)] == ',')
 			str += strlen(schedstr) + 1;
 		if (get_option(&str, &par))
@@ -78,7 +82,7 @@
 			"kernel schedule profiling enabled (shift: %ld)\n",
 			prof_shift);
 	} else if (!strncmp(str, kvmstr, strlen(kvmstr))) {
-		prof_on = KVM_PROFILING;
+		immediate_arm(&kvm_profiling);
 		if (str[strlen(kvmstr)] == ',')
 			str += strlen(kvmstr) + 1;
 		if (get_option(&str, &par))
@@ -88,7 +92,7 @@
 			prof_shift);
 	} else if (get_option(&str, &par)) {
 		prof_shift = par;
-		prof_on = CPU_PROFILING;
+		immediate_arm(&cpu_profiling);
 		printk(KERN_INFO "kernel profiling enabled (shift: %ld)\n",
 			prof_shift);
 	}
@@ -99,7 +103,10 @@
 
 void __init profile_init(void)
 {
-	if (!prof_on) 
+	if (!immediate_query(&sleep_profiling) &&
+		!immediate_query(&sched_profiling) &&
+		!immediate_query(&kvm_profiling) &&
+		!immediate_query(&cpu_profiling))
 		return;
  
 	/* only text is profiled */
@@ -288,7 +295,7 @@
 	int i, j, cpu;
 	struct profile_hit *hits;
 
-	if (prof_on != type || !prof_buffer)
+	if (!prof_buffer)
 		return;
 	pc = min((pc - (unsigned long)_stext) >> prof_shift, prof_len - 1);
 	i = primary = (pc & (NR_PROFILE_GRP - 1)) << PROFILE_GRPSHIFT;
@@ -398,7 +405,7 @@
 {
 	unsigned long pc;
 
-	if (prof_on != type || !prof_buffer)
+	if (!prof_buffer)
 		return;
 	pc = ((unsigned long)__pc - (unsigned long)_stext) >> prof_shift;
 	atomic_add(nr_hits, &prof_buffer[min(pc, prof_len - 1)]);
@@ -555,7 +562,10 @@
 	}
 	return 0;
 out_cleanup:
-	prof_on = 0;
+	immediate_disarm(&sleep_profiling);
+	immediate_disarm(&sched_profiling);
+	immediate_disarm(&kvm_profiling);
+	immediate_disarm(&cpu_profiling);
 	smp_mb();
 	on_each_cpu(profile_nop, NULL, 0, 1);
 	for_each_online_cpu(cpu) {
@@ -582,7 +592,10 @@
 {
 	struct proc_dir_entry *entry;
 
-	if (!prof_on)
+	if (!immediate_query(&sleep_profiling) &&
+		!immediate_query(&sched_profiling) &&
+		!immediate_query(&kvm_profiling) &&
+		!immediate_query(&cpu_profiling))
 		return 0;
 	if (create_hash_tables())
 		return -1;
Index: linux-2.6-lttng/include/linux/profile.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/profile.h	2007-06-15 16:13:50.000000000 -0400
+++ linux-2.6-lttng/include/linux/profile.h	2007-06-15 16:14:07.000000000 -0400
@@ -10,7 +10,8 @@
 
 #include <asm/errno.h>
 
-extern int prof_on __read_mostly;
+extern immediate_t __read_mostly sleep_profiling, sched_profiling, kvm_profiling,
+		cpu_profiling;
 
 #define CPU_PROFILING	1
 #define SCHED_PROFILING	2
@@ -35,11 +36,7 @@
  */
 static inline void profile_hit(int type, void *ip)
 {
-	/*
-	 * Speedup for the common (no profiling enabled) case:
-	 */
-	if (unlikely(prof_on == type))
-		profile_hits(type, ip, 1);
+	profile_hits(type, ip, 1);
 }
 
 #ifdef CONFIG_PROC_FS
Index: linux-2.6-lttng/kernel/sched.c
===================================================================
--- linux-2.6-lttng.orig/kernel/sched.c	2007-06-15 16:13:50.000000000 -0400
+++ linux-2.6-lttng/kernel/sched.c	2007-06-15 16:14:07.000000000 -0400
@@ -2998,7 +2998,8 @@
 			print_irqtrace_events(prev);
 		dump_stack();
 	}
-	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
+	if (unlikely(immediate(sched_profiling)))
+		profile_hit(SCHED_PROFILING, __builtin_return_address(0));
 
 	/*
 	 * The idle thread is not allowed to schedule!
Index: linux-2.6-lttng/drivers/kvm/svm.c
===================================================================
--- linux-2.6-lttng.orig/drivers/kvm/svm.c	2007-06-15 16:13:50.000000000 -0400
+++ linux-2.6-lttng/drivers/kvm/svm.c	2007-06-15 16:14:07.000000000 -0400
@@ -1647,7 +1647,7 @@
 	/*
 	 * Profile KVM exit RIPs:
 	 */
-	if (unlikely(prof_on == KVM_PROFILING))
+	if (unlikely(immediate(kvm_profiling)))
 		profile_hit(KVM_PROFILING,
 			(void *)(unsigned long)vcpu->svm->vmcb->save.rip);
 
Index: linux-2.6-lttng/drivers/kvm/vmx.c
===================================================================
--- linux-2.6-lttng.orig/drivers/kvm/vmx.c	2007-06-15 16:13:50.000000000 -0400
+++ linux-2.6-lttng/drivers/kvm/vmx.c	2007-06-15 16:14:07.000000000 -0400
@@ -2131,7 +2131,7 @@
 	/*
 	 * Profile KVM exit RIPs:
 	 */
-	if (unlikely(prof_on == KVM_PROFILING))
+	if (unlikely(immediate(kvm_profiling)))
 		profile_hit(KVM_PROFILING, (void *)vmcs_readl(GUEST_RIP));
 
 	vcpu->launched = 1;

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 4/8] Immediate Value - i386 Optimization
  2007-06-15 20:23 ` [patch 4/8] Immediate Value - i386 Optimization Mathieu Desnoyers
@ 2007-06-15 22:02   ` Chuck Ebbert
  2007-06-17 17:50     ` Mathieu Desnoyers
  2007-06-18 14:57     ` [patch 4/8] Immediate Value - i386 Optimization; kprobes Mathieu Desnoyers
  0 siblings, 2 replies; 24+ messages in thread
From: Chuck Ebbert @ 2007-06-15 22:02 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel

On 06/15/2007 04:23 PM, Mathieu Desnoyers wrote:
> i386 optimization of the immediate values which uses a movl with code patching
> to set/unset the value used to populate the register used for the branch test.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> ---
>  arch/i386/kernel/Makefile    |    1 
>  arch/i386/kernel/immediate.c |  177 +++++++++++++++++++++++++++++++++++++++++++
>  include/asm-i386/immediate.h |   72 +++++++++++++++++
>  3 files changed, 249 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6-lttng/include/asm-i386/immediate.h
> ===================================================================
> --- linux-2.6-lttng.orig/include/asm-i386/immediate.h	2007-06-15 16:13:55.000000000 -0400
> +++ linux-2.6-lttng/include/asm-i386/immediate.h	2007-06-15 16:14:04.000000000 -0400
> @@ -1 +1,71 @@
> -#include <asm-generic/immediate.h>
> +#ifndef _ASM_I386_IMMEDIATE_H
> +#define _ASM_I386_IMMEDIATE_H
> +
> +/*
> + * Immediate values. i386 architecture optimizations.
> + *
> + * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> + *
> + * This file is released under the GPLv2.
> + * See the file COPYING for more details.
> + */
> +
> +#define IF_DEFAULT (IF_OPTIMIZED | IF_LOCKDEP)
> +
> +/*
> + * Optimized version of the immediate. Passing the flags as a pointer to
> + * the inline assembly to trick it into putting the flags value as third
> + * parameter in the structure.
> + */
> +#define immediate_optimized(flags, var)					\
> +	({								\
> +		int condition;						\
> +		asm (	".section __immediate, \"a\", @progbits;\n\t"	\
> +					".long %1, 0f, %2;\n\t"		\
> +					".previous;\n\t"		\
> +					"0:\n\t"			\
> +					"movl %3,%0;\n\t"		\
> +				: "=r" (condition)			\
> +				: "m" (var),				\
> +				  "m" (*(char*)flags),			\
> +				  "i" (0));				\
> +		(condition);						\

Unnecessary parens.

> +	})
> +
> +/*
> + * immediate macro selecting the generic or optimized version of immediate,
> + * depending on the flags specified. It is a macro because we need to pass the
> + * name to immediate_optimized() and immediate_generic() so they can declare a
> + * static variable with it.
> + */
> +#define _immediate(flags, var)						\
> +({									\
> +	(((flags) & IF_LOCKDEP) && ((flags) & IF_OPTIMIZED)) ?		\
> +		immediate_optimized(flags, var) :			\
> +		immediate_generic(flags, var);				\
> +})
> +
> +/* immediate with default behavior */
> +#define immediate(var)	_immediate(IF_DEFAULT, var)
> +
> +/*
> + * Architecture dependant immediate information, used internally for immediate
> + * activation.
> + */
> +
> +/*
> + * Offset of the immediate value from the start of the movl instruction, in
> + * bytes. We point to the first lower byte of the 4 bytes immediate value. Only
> + * changing one byte makes sure we do an atomic memory write, independently of
> + * the alignment of the 4 bytes in the load immediate instruction.
> + */
> +#define IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 1
> +#define IMMEDIATE_OPTIMIZED_ENABLE_TYPE unsigned char
> +/* Dereference enable as lvalue from a pointer to its instruction */
> +#define IMMEDIATE_OPTIMIZED_ENABLE(a)					\
> +	(*(IMMEDIATE_OPTIMIZED_ENABLE_TYPE*)				\
> +		((char*)(a)+IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET))
> +
> +extern int immediate_optimized_set_enable(void *address, char enable);
> +
> +#endif /* _ASM_I386_IMMEDIATE_H */
> Index: linux-2.6-lttng/arch/i386/kernel/Makefile
> ===================================================================
> --- linux-2.6-lttng.orig/arch/i386/kernel/Makefile	2007-06-15 16:13:51.000000000 -0400
> +++ linux-2.6-lttng/arch/i386/kernel/Makefile	2007-06-15 16:14:04.000000000 -0400
> @@ -35,6 +35,7 @@
>  obj-y				+= sysenter.o vsyscall.o
>  obj-$(CONFIG_ACPI_SRAT) 	+= srat.o
>  obj-$(CONFIG_EFI) 		+= efi.o efi_stub.o
> +obj-$(CONFIG_IMMEDIATE)		+= immediate.o
>  obj-$(CONFIG_DOUBLEFAULT) 	+= doublefault.o
>  obj-$(CONFIG_SERIAL_8250)	+= legacy_serial.o
>  obj-$(CONFIG_VM86)		+= vm86.o
> Index: linux-2.6-lttng/arch/i386/kernel/immediate.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6-lttng/arch/i386/kernel/immediate.c	2007-06-15 16:14:04.000000000 -0400
> @@ -0,0 +1,177 @@
> +/*
> + * Immediate Value - i386 architecture specific code.
> + *
> + * Rationale
> + *
> + * Required because of :
> + * - Erratum 49 fix for Intel PIII.
> + * - Still present on newer processors : Intel Core 2 Duo Processor for Intel
> + *   Centrino Duo Processor Technology Specification Update, AH33.
> + *   Unsynchronized Cross-Modifying Code Operations Can Cause Unexpected
> + *   Instruction Execution Results.
> + *
> + * Permits immediate value modification by XMC with correct serialization.
> + *
> + * Reentrant for NMI and trap handler instrumentation. Permits XMC to a
> + * location that has preemption enabled because it involves no temporary or
> + * reused data structure.
> + *
> + * Quoting Richard J Moore, source of the information motivating this
> + * implementation which differs from the one proposed by Intel which is not
> + * suitable for kernel context (does not support NMI and would require disabling
> + * interrupts on every CPU for a long period) :
> + *
> + * "There is another issue to consider when looking into using probes other
> + * then int3:
> + *
> + * Intel erratum 54 - Unsynchronized Cross-modifying code - refers to the
> + * practice of modifying code on one processor where another has prefetched
> + * the unmodified version of the code. Intel states that unpredictable general
> + * protection faults may result if a synchronizing instruction (iret, int,
> + * int3, cpuid, etc ) is not executed on the second processor before it
> + * executes the pre-fetched out-of-date copy of the instruction.
> + *
> + * When we became aware of this I had a long discussion with Intel's
> + * microarchitecture guys. It turns out that the reason for this erratum
> + * (which incidentally Intel does not intend to fix) is because the trace
> + * cache - the stream of micorops resulting from instruction interpretation -
> + * cannot guaranteed to be valid. Reading between the lines I assume this
> + * issue arises because of optimization done in the trace cache, where it is
> + * no longer possible to identify the original instruction boundaries. If the
> + * CPU discoverers that the trace cache has been invalidated because of
> + * unsynchronized cross-modification then instruction execution will be
> + * aborted with a GPF. Further discussion with Intel revealed that replacing
> + * the first opcode byte with an int3 would not be subject to this erratum.
> + *
> + * So, is cmpxchg reliable? One has to guarantee more than mere atomicity."
> + *
> + * Overall design
> + *
> + * The algorithm proposed by Intel applies not so well in kernel context: it
> + * would imply disabling interrupts and looping on every CPUs while modifying
> + * the code and would not support instrumentation of code called from interrupt
> + * sources that cannot be disabled.
> + *
> + * Therefore, we use a different algorithm to respect Intel's erratum (see the
> + * quoted discussion above). We make sure that no CPU sees an out-of-date copy
> + * of a pre-fetched instruction by 1 - using a breakpoint, which skips the
> + * instruction that is going to be modified, 2 - issuing an IPI to every CPU to
> + * execute a sync_core(), to make sure that even when the breakpoint is removed,
> + * no cpu could possibly still have the out-of-date copy of the instruction,
> + * modify the now unused 2nd byte of the instruction, and then put back the
> + * original 1st byte of the instruction.
> + *
> + * It has exactly the same intent as the algorithm proposed by Intel, but
> + * it has less side-effects, scales better and supports NMI, SMI and MCE.

Algorithm looks plausible, was it really tested?

> + *
> + * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> + */
> +
> +#include <linux/notifier.h>
> +#include <linux/preempt.h>
> +#include <linux/smp.h>
> +#include <linux/notifier.h>
> +#include <linux/module.h>
> +#include <linux/immediate.h>
> +#include <linux/kdebug.h>
> +
> +#include <asm/cacheflush.h>
> +
> +#define BREAKPOINT_INSTRUCTION  0xcc
> +#define BREAKPOINT_INS_LEN 1
> +
> +static long target_eip;
> +
> +static void immediate_synchronize_core(void *info)
> +{
> +	sync_core();	/* use cpuid to stop speculative execution */
> +}
> +
> +/*
> + * The eip value points right after the breakpoint instruction, in the second
> + * byte of the movb. Incrementing it of 1 byte makes the code resume right after
> + * the movb instruction, effectively skipping this instruction.
> + *
> + * We simply skip the 2 bytes load immediate here, leaving the register in an
> + * undefined state. We don't care about the content (0 or !0), because we are
> + * changing the value 0->1 or 1->0. This small window of undefined value
> + * doesn't matter.
> + */
> +static int immediate_notifier(struct notifier_block *nb,
> +	unsigned long val, void *data)
> +{
> +	enum die_val die_val = (enum die_val) val;
> +	struct die_args *args = data;
> +
> +	if (!args->regs || user_mode_vm(args->regs))
> +		return NOTIFY_DONE;
> +
> +	if (die_val == DIE_INT3	&& args->regs->eip == target_eip) {
> +		args->regs->eip += 1; /* Skip the next byte of load immediate */

<nitpick>
		args->regs->eip++;

> +		return NOTIFY_STOP;
> +	}
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block immediate_notify = {
> +	.notifier_call = immediate_notifier,
> +	.priority = 0x7fffffff,	/* we need to be notified first */
> +};
> +
> +/*
> + * The address is not aligned. We can only change 1 byte of the value
> + * atomically.
> + * Must be called with immediate_mutex held.
> + */
> +int immediate_optimized_set_enable(void *address, char enable)
> +{
> +	char saved_byte;
> +	int ret;
> +	char *dest = address;
> +
> +	if (!(enable ^ dest[1])) /* Must be a state change 0<->1 to execute */
> +		return 0;
> +
> +#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_PAGEALLOC)
> +	/* Make sure this page is writable */
> +	change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_EXEC);
> +	global_flush_tlb();
> +#endif

Can't we have a macro or inline to do this, and the setting back
to read-only? kprobes also has the same ugly #if blocks...

Hmm, what happens if you race with kprobes setting a probe on
the same page? Couldn't it unexpectedly become read-only?

> +	target_eip = (long)address + BREAKPOINT_INS_LEN;
> +	/* register_die_notifier has memory barriers */
> +	register_die_notifier(&immediate_notify);
> +	saved_byte = *dest;
> +	*dest = BREAKPOINT_INSTRUCTION;
> +	wmb();
> +	/*
> +	 * Execute serializing instruction on each CPU.
> +	 * Acts as a memory barrier.
> +	 */
> +	ret = on_each_cpu(immediate_synchronize_core, NULL, 1, 1);
> +	BUG_ON(ret != 0);
> +
> +	dest[1] = enable;
> +	wmb();
> +	*dest = saved_byte;
> +		/*
> +		 * Wait for all int3 handlers to end
> +		 * (interrupts are disabled in int3).
> +		 * This CPU is clearly not in a int3 handler,
> +		 * because int3 handler is not preemptible and
> +		 * there cannot be any more int3 handler called
> +		 * for this site, because we placed the original
> +		 * instruction back.
> +		 * synchronize_sched has memory barriers.
> +		 */
> +	synchronize_sched();
> +	unregister_die_notifier(&immediate_notify);
> +	/* unregister_die_notifier has memory barriers */
> +	target_eip = 0;
> +#if defined(CONFIG_DEBUG_RODATA)
> +	/* Set this page back to RX */
> +	change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_RX);
> +	global_flush_tlb();
> +#endif
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(immediate_optimized_set_enable);
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 4/8] Immediate Value - i386 Optimization
  2007-06-15 22:02   ` Chuck Ebbert
@ 2007-06-17 17:50     ` Mathieu Desnoyers
  2007-06-18 14:57     ` [patch 4/8] Immediate Value - i386 Optimization; kprobes Mathieu Desnoyers
  1 sibling, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2007-06-17 17:50 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: akpm, linux-kernel

* Chuck Ebbert (cebbert@redhat.com) wrote:
> On 06/15/2007 04:23 PM, Mathieu Desnoyers wrote:
> > i386 optimization of the immediate values which uses a movl with code patching
> > to set/unset the value used to populate the register used for the branch test.
> > 
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > ---
> >  arch/i386/kernel/Makefile    |    1 
> >  arch/i386/kernel/immediate.c |  177 +++++++++++++++++++++++++++++++++++++++++++
> >  include/asm-i386/immediate.h |   72 +++++++++++++++++
> >  3 files changed, 249 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6-lttng/include/asm-i386/immediate.h
> > ===================================================================
> > --- linux-2.6-lttng.orig/include/asm-i386/immediate.h	2007-06-15 16:13:55.000000000 -0400
> > +++ linux-2.6-lttng/include/asm-i386/immediate.h	2007-06-15 16:14:04.000000000 -0400
> > @@ -1 +1,71 @@
> > -#include <asm-generic/immediate.h>
> > +#ifndef _ASM_I386_IMMEDIATE_H
> > +#define _ASM_I386_IMMEDIATE_H
> > +
> > +/*
> > + * Immediate values. i386 architecture optimizations.
> > + *
> > + * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > + *
> > + * This file is released under the GPLv2.
> > + * See the file COPYING for more details.
> > + */
> > +
> > +#define IF_DEFAULT (IF_OPTIMIZED | IF_LOCKDEP)
> > +
> > +/*
> > + * Optimized version of the immediate. Passing the flags as a pointer to
> > + * the inline assembly to trick it into putting the flags value as third
> > + * parameter in the structure.
> > + */
> > +#define immediate_optimized(flags, var)					\
> > +	({								\
> > +		int condition;						\
> > +		asm (	".section __immediate, \"a\", @progbits;\n\t"	\
> > +					".long %1, 0f, %2;\n\t"		\
> > +					".previous;\n\t"		\
> > +					"0:\n\t"			\
> > +					"movl %3,%0;\n\t"		\
> > +				: "=r" (condition)			\
> > +				: "m" (var),				\
> > +				  "m" (*(char*)flags),			\
> > +				  "i" (0));				\
> > +		(condition);						\
> 
Hi Chuck,

> Unnecessary parens.
> 

ok, fixed in each immediate.h.

> > +	})
> > +
> > +/*
> > + * immediate macro selecting the generic or optimized version of immediate,
> > + * depending on the flags specified. It is a macro because we need to pass the
> > + * name to immediate_optimized() and immediate_generic() so they can declare a
> > + * static variable with it.
> > + */
> > +#define _immediate(flags, var)						\
> > +({									\
> > +	(((flags) & IF_LOCKDEP) && ((flags) & IF_OPTIMIZED)) ?		\
> > +		immediate_optimized(flags, var) :			\
> > +		immediate_generic(flags, var);				\
> > +})
> > +
> > +/* immediate with default behavior */
> > +#define immediate(var)	_immediate(IF_DEFAULT, var)
> > +
> > +/*
> > + * Architecture dependant immediate information, used internally for immediate
> > + * activation.
> > + */
> > +
> > +/*
> > + * Offset of the immediate value from the start of the movl instruction, in
> > + * bytes. We point to the first lower byte of the 4 bytes immediate value. Only
> > + * changing one byte makes sure we do an atomic memory write, independently of
> > + * the alignment of the 4 bytes in the load immediate instruction.
> > + */
> > +#define IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 1
> > +#define IMMEDIATE_OPTIMIZED_ENABLE_TYPE unsigned char
> > +/* Dereference enable as lvalue from a pointer to its instruction */
> > +#define IMMEDIATE_OPTIMIZED_ENABLE(a)					\
> > +	(*(IMMEDIATE_OPTIMIZED_ENABLE_TYPE*)				\
> > +		((char*)(a)+IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET))
> > +
> > +extern int immediate_optimized_set_enable(void *address, char enable);
> > +
> > +#endif /* _ASM_I386_IMMEDIATE_H */
> > Index: linux-2.6-lttng/arch/i386/kernel/Makefile
> > ===================================================================
> > --- linux-2.6-lttng.orig/arch/i386/kernel/Makefile	2007-06-15 16:13:51.000000000 -0400
> > +++ linux-2.6-lttng/arch/i386/kernel/Makefile	2007-06-15 16:14:04.000000000 -0400
> > @@ -35,6 +35,7 @@
> >  obj-y				+= sysenter.o vsyscall.o
> >  obj-$(CONFIG_ACPI_SRAT) 	+= srat.o
> >  obj-$(CONFIG_EFI) 		+= efi.o efi_stub.o
> > +obj-$(CONFIG_IMMEDIATE)		+= immediate.o
> >  obj-$(CONFIG_DOUBLEFAULT) 	+= doublefault.o
> >  obj-$(CONFIG_SERIAL_8250)	+= legacy_serial.o
> >  obj-$(CONFIG_VM86)		+= vm86.o
> > Index: linux-2.6-lttng/arch/i386/kernel/immediate.c
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6-lttng/arch/i386/kernel/immediate.c	2007-06-15 16:14:04.000000000 -0400
> > @@ -0,0 +1,177 @@
> > +/*
> > + * Immediate Value - i386 architecture specific code.
> > + *
> > + * Rationale
> > + *
> > + * Required because of :
> > + * - Erratum 49 fix for Intel PIII.
> > + * - Still present on newer processors : Intel Core 2 Duo Processor for Intel
> > + *   Centrino Duo Processor Technology Specification Update, AH33.
> > + *   Unsynchronized Cross-Modifying Code Operations Can Cause Unexpected
> > + *   Instruction Execution Results.
> > + *
> > + * Permits immediate value modification by XMC with correct serialization.
> > + *
> > + * Reentrant for NMI and trap handler instrumentation. Permits XMC to a
> > + * location that has preemption enabled because it involves no temporary or
> > + * reused data structure.
> > + *
> > + * Quoting Richard J Moore, source of the information motivating this
> > + * implementation which differs from the one proposed by Intel which is not
> > + * suitable for kernel context (does not support NMI and would require disabling
> > + * interrupts on every CPU for a long period) :
> > + *
> > + * "There is another issue to consider when looking into using probes other
> > + * then int3:
> > + *
> > + * Intel erratum 54 - Unsynchronized Cross-modifying code - refers to the
> > + * practice of modifying code on one processor where another has prefetched
> > + * the unmodified version of the code. Intel states that unpredictable general
> > + * protection faults may result if a synchronizing instruction (iret, int,
> > + * int3, cpuid, etc ) is not executed on the second processor before it
> > + * executes the pre-fetched out-of-date copy of the instruction.
> > + *
> > + * When we became aware of this I had a long discussion with Intel's
> > + * microarchitecture guys. It turns out that the reason for this erratum
> > + * (which incidentally Intel does not intend to fix) is because the trace
> > + * cache - the stream of micorops resulting from instruction interpretation -
> > + * cannot guaranteed to be valid. Reading between the lines I assume this
> > + * issue arises because of optimization done in the trace cache, where it is
> > + * no longer possible to identify the original instruction boundaries. If the
> > + * CPU discoverers that the trace cache has been invalidated because of
> > + * unsynchronized cross-modification then instruction execution will be
> > + * aborted with a GPF. Further discussion with Intel revealed that replacing
> > + * the first opcode byte with an int3 would not be subject to this erratum.
> > + *
> > + * So, is cmpxchg reliable? One has to guarantee more than mere atomicity."
> > + *
> > + * Overall design
> > + *
> > + * The algorithm proposed by Intel applies not so well in kernel context: it
> > + * would imply disabling interrupts and looping on every CPUs while modifying
> > + * the code and would not support instrumentation of code called from interrupt
> > + * sources that cannot be disabled.
> > + *
> > + * Therefore, we use a different algorithm to respect Intel's erratum (see the
> > + * quoted discussion above). We make sure that no CPU sees an out-of-date copy
> > + * of a pre-fetched instruction by 1 - using a breakpoint, which skips the
> > + * instruction that is going to be modified, 2 - issuing an IPI to every CPU to
> > + * execute a sync_core(), to make sure that even when the breakpoint is removed,
> > + * no cpu could possibly still have the out-of-date copy of the instruction,
> > + * modify the now unused 2nd byte of the instruction, and then put back the
> > + * original 1st byte of the instruction.
> > + *
> > + * It has exactly the same intent as the algorithm proposed by Intel, but
> > + * it has less side-effects, scales better and supports NMI, SMI and MCE.
> 
> Algorithm looks plausible, was it really tested?
> 

Yes, but I can't reproduce the erroneous condition on my setup. I just
realized that I did not test it after my last changes. Therefore, see
the following comments inline for the upcoming fixes :

> > + *
> > + * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > + */
> > +
> > +#include <linux/notifier.h>
> > +#include <linux/preempt.h>
> > +#include <linux/smp.h>
> > +#include <linux/notifier.h>
> > +#include <linux/module.h>
> > +#include <linux/immediate.h>
> > +#include <linux/kdebug.h>
> > +
> > +#include <asm/cacheflush.h>
> > +
> > +#define BREAKPOINT_INSTRUCTION  0xcc
> > +#define BREAKPOINT_INS_LEN 1
> > +
> > +static long target_eip;
> > +
> > +static void immediate_synchronize_core(void *info)
> > +{
> > +	sync_core();	/* use cpuid to stop speculative execution */
> > +}
> > +
> > +/*
> > + * The eip value points right after the breakpoint instruction, in the second
> > + * byte of the movb. Incrementing it of 1 byte makes the code resume right after
> > + * the movb instruction, effectively skipping this instruction.
> > + *
> > + * We simply skip the 2 bytes load immediate here, leaving the register in an
> > + * undefined state. We don't care about the content (0 or !0), because we are
> > + * changing the value 0->1 or 1->0. This small window of undefined value
> > + * doesn't matter.
> > + */
> > +static int immediate_notifier(struct notifier_block *nb,
> > +	unsigned long val, void *data)
> > +{
> > +	enum die_val die_val = (enum die_val) val;
> > +	struct die_args *args = data;
> > +
> > +	if (!args->regs || user_mode_vm(args->regs))
> > +		return NOTIFY_DONE;
> > +
> > +	if (die_val == DIE_INT3	&& args->regs->eip == target_eip) {
> > +		args->regs->eip += 1; /* Skip the next byte of load immediate */
> 
> <nitpick>
> 		args->regs->eip++;
> 

Should now be args->regs->eip += 4; because I jump over the last 4 bytes
of the 5 bytes long movl instead of jumping over the last byte of the 2 bytes
movb I used previously.

I also have to use the int3 handler installed by CONFIG_KPROBES when
CONFIG_IMMEDIATE is used so I do not depend on KPROBES for i386. It will
also be fixed in the next release.

> > +		return NOTIFY_STOP;
> > +	}
> > +	return NOTIFY_DONE;
> > +}
> > +
> > +static struct notifier_block immediate_notify = {
> > +	.notifier_call = immediate_notifier,
> > +	.priority = 0x7fffffff,	/* we need to be notified first */
> > +};
> > +
> > +/*
> > + * The address is not aligned. We can only change 1 byte of the value
> > + * atomically.
> > + * Must be called with immediate_mutex held.
> > + */
> > +int immediate_optimized_set_enable(void *address, char enable)
> > +{
> > +	char saved_byte;
> > +	int ret;
> > +	char *dest = address;
> > +
> > +	if (!(enable ^ dest[1])) /* Must be a state change 0<->1 to execute */
> > +		return 0;
> > +
> > +#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_PAGEALLOC)
> > +	/* Make sure this page is writable */
> > +	change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_EXEC);
> > +	global_flush_tlb();
> > +#endif
> 
> Can't we have a macro or inline to do this, and the setting back
> to read-only? kprobes also has the same ugly #if blocks...
> 
I guess it would be better to share a common macro between kprobes and
immediate for this.

> Hmm, what happens if you race with kprobes setting a probe on
> the same page? Couldn't it unexpectedly become read-only?
> 

Sure it can. Is there any spinlock already there that could be used by
different objects and would fit this purpose?

> > +	target_eip = (long)address + BREAKPOINT_INS_LEN;
> > +	/* register_die_notifier has memory barriers */
> > +	register_die_notifier(&immediate_notify);
> > +	saved_byte = *dest;
> > +	*dest = BREAKPOINT_INSTRUCTION;
> > +	wmb();
> > +	/*
> > +	 * Execute serializing instruction on each CPU.
> > +	 * Acts as a memory barrier.
> > +	 */
> > +	ret = on_each_cpu(immediate_synchronize_core, NULL, 1, 1);
> > +	BUG_ON(ret != 0);
> > +
> > +	dest[1] = enable;
> > +	wmb();
> > +	*dest = saved_byte;
> > +		/*
> > +		 * Wait for all int3 handlers to end
> > +		 * (interrupts are disabled in int3).
> > +		 * This CPU is clearly not in a int3 handler,
> > +		 * because int3 handler is not preemptible and
> > +		 * there cannot be any more int3 handler called
> > +		 * for this site, because we placed the original
> > +		 * instruction back.
> > +		 * synchronize_sched has memory barriers.
> > +		 */
> > +	synchronize_sched();
> > +	unregister_die_notifier(&immediate_notify);
> > +	/* unregister_die_notifier has memory barriers */
> > +	target_eip = 0;
> > +#if defined(CONFIG_DEBUG_RODATA)
> > +	/* Set this page back to RX */
> > +	change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_RX);
> > +	global_flush_tlb();
> > +#endif
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(immediate_optimized_set_enable);
> > 
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 4/8] Immediate Value - i386 Optimization; kprobes
  2007-06-15 22:02   ` Chuck Ebbert
  2007-06-17 17:50     ` Mathieu Desnoyers
@ 2007-06-18 14:57     ` Mathieu Desnoyers
  2007-06-18 18:44       ` Chuck Ebbert
  1 sibling, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2007-06-18 14:57 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: akpm, linux-kernel, prasanna, ananth

* Chuck Ebbert (cebbert@redhat.com) wrote:
> > +		return NOTIFY_STOP;
> > +	}
> > +	return NOTIFY_DONE;
> > +}
> > +
> > +static struct notifier_block immediate_notify = {
> > +	.notifier_call = immediate_notifier,
> > +	.priority = 0x7fffffff,	/* we need to be notified first */
> > +};
> > +
> > +/*
> > + * The address is not aligned. We can only change 1 byte of the value
> > + * atomically.
> > + * Must be called with immediate_mutex held.
> > + */
> > +int immediate_optimized_set_enable(void *address, char enable)
> > +{
> > +	char saved_byte;
> > +	int ret;
> > +	char *dest = address;
> > +
> > +	if (!(enable ^ dest[1])) /* Must be a state change 0<->1 to execute */
> > +		return 0;
> > +
> > +#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_PAGEALLOC)
> > +	/* Make sure this page is writable */
> > +	change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_EXEC);
> > +	global_flush_tlb();
> > +#endif
> 
> Can't we have a macro or inline to do this, and the setting back
> to read-only? kprobes also has the same ugly #if blocks...
> 
> Hmm, what happens if you race with kprobes setting a probe on
> the same page? Couldn't it unexpectedly become read-only?
> 

Hi Chuck,

I am looking more closely at kprobes; a few comments while we are here:

1 - Why is kprobe_count an atomic_t variable instead of a simple
integer? It is always protected by the kprobe_mutex anyway. All this
synchronization seems redundant.

2 - I wonder where is the equivalent of my snippet in kprobes code:
> > +#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_PAGEALLOC)
> > + /* Make sure this page is writable */
> > + change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_EXEC);
> > + global_flush_tlb();
> > +#endif

I fancy it's done by the kprobe_page_fault handler, but I do not see
clearly how writing the breakpoint from arch_arm_kprobe() in
non-writeable memory is done.

In any case, I would like not to use that kind of approach; I prefer to
set the memory page to RWX before doing the memory write so I do not
depend on the page fault handler (remember that I instrument the page
fault handler itself...).

Maybe we could use a shared "text_mutex" between kprobes and
immediate values to insure mutual exclusion for .text modification.
However, we would still have the following coherency issue when an
immediate value and a kprobe share the same address:

1- enable immediate value
2- put a kprobe at the immediate value load instruction address
3- disable immediate value
4- remove kprobe

The kprobe removal would put back the load immediate instruction and
would not touch the loaded value (which is ok). However, the instruction
copy kept by kprobes would not be in sync with the immediate value
state:

Scenario 1: kprobes int3 handler first:

1- enable immediate value
2- put a kprobe at the immediate value load instruction address

-> int3 triggered
kprobe handler runs. Single-steps the "enabled" state.

3- disable immediate value

-> int3 triggered
kprobe handler runs. Single-steps the "enabled" state. This state is
wrong.

4- remove kprobe


Scenario 2: immediate value int3 handler first:

1- enable immediate value
2- put a kprobe at the immediate value load instruction address

-> int3 triggered
kprobe handler runs. Single-steps the "enabled" state.

3- disable immediate value
  -> int3 triggered (while we disable)
  While we disable, the immediate value int3 handler is executed first. It
  would cause the kprobe handler not to be called, and no "missing"
  counter would be incremented.

kprobe handler runs. Single-steps the "enabled" state. This state is
wrong.

4- remove kprobe


Since I don't want to depend on kprobes to put the breakpoint, because
of its reentrancy limitations (see all the __probes functions), It would
be good to figure out a mutual exclusion mechanism between immediate
values and kprobes. Maybe we could forbid kprobes to insert probes on
addresses present in the immediate values tables ? Or better: if we
detect this scenario, we could put the kprobe breakpoint at the
instruction following the "movl".

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 4/8] Immediate Value - i386 Optimization; kprobes
  2007-06-18 14:57     ` [patch 4/8] Immediate Value - i386 Optimization; kprobes Mathieu Desnoyers
@ 2007-06-18 18:44       ` Chuck Ebbert
  2007-06-18 18:56         ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Chuck Ebbert @ 2007-06-18 18:44 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel, prasanna, ananth

On 06/18/2007 10:57 AM, Mathieu Desnoyers wrote:
> * Chuck Ebbert (cebbert@redhat.com) wrote:
>>> +		return NOTIFY_STOP;
>>> +	}
>>> +	return NOTIFY_DONE;
>>> +}
>>> +
>>> +static struct notifier_block immediate_notify = {
>>> +	.notifier_call = immediate_notifier,
>>> +	.priority = 0x7fffffff,	/* we need to be notified first */
>>> +};
>>> +
>>> +/*
>>> + * The address is not aligned. We can only change 1 byte of the value
>>> + * atomically.
>>> + * Must be called with immediate_mutex held.
>>> + */
>>> +int immediate_optimized_set_enable(void *address, char enable)
>>> +{
>>> +	char saved_byte;
>>> +	int ret;
>>> +	char *dest = address;
>>> +
>>> +	if (!(enable ^ dest[1])) /* Must be a state change 0<->1 to execute */
>>> +		return 0;
>>> +
>>> +#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_PAGEALLOC)
>>> +	/* Make sure this page is writable */
>>> +	change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_EXEC);
>>> +	global_flush_tlb();
>>> +#endif
>> Can't we have a macro or inline to do this, and the setting back
>> to read-only? kprobes also has the same ugly #if blocks...
>>
>> Hmm, what happens if you race with kprobes setting a probe on
>> the same page? Couldn't it unexpectedly become read-only?
>>
> 
> Hi Chuck,
> 
> I am looking more closely at kprobes; a few comments while we are here:
> 
> 1 - Why is kprobe_count an atomic_t variable instead of a simple
> integer? It is always protected by the kprobe_mutex anyway. All this
> synchronization seems redundant.
> 
> 2 - I wonder where is the equivalent of my snippet in kprobes code:
>>> +#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_PAGEALLOC)
>>> + /* Make sure this page is writable */
>>> + change_page_attr(virt_to_page(address), 1, PAGE_KERNEL_EXEC);
>>> + global_flush_tlb();
>>> +#endif
> 
> I fancy it's done by the kprobe_page_fault handler, but I do not see
> clearly how writing the breakpoint from arch_arm_kprobe() in
> non-writeable memory is done.

Looks like it's not merged yet:

http://lkml.org/lkml/2007/6/7/2

This needs to go in before 2.6.22-final

> 
> In any case, I would like not to use that kind of approach; I prefer to
> set the memory page to RWX before doing the memory write so I do not
> depend on the page fault handler (remember that I instrument the page
> fault handler itself...).
> 
> Maybe we could use a shared "text_mutex" between kprobes and
> immediate values to insure mutual exclusion for .text modification.
> However, we would still have the following coherency issue when an
> immediate value and a kprobe share the same address:
> 
> 1- enable immediate value
> 2- put a kprobe at the immediate value load instruction address
> 3- disable immediate value
> 4- remove kprobe
> 
> The kprobe removal would put back the load immediate instruction and
> would not touch the loaded value (which is ok). However, the instruction
> copy kept by kprobes would not be in sync with the immediate value
> state:
> 
> Scenario 1: kprobes int3 handler first:
> 
> 1- enable immediate value
> 2- put a kprobe at the immediate value load instruction address
> 
> -> int3 triggered
> kprobe handler runs. Single-steps the "enabled" state.
> 
> 3- disable immediate value
> 
> -> int3 triggered
> kprobe handler runs. Single-steps the "enabled" state. This state is
> wrong.
> 
> 4- remove kprobe
> 
> 
> Scenario 2: immediate value int3 handler first:
> 
> 1- enable immediate value
> 2- put a kprobe at the immediate value load instruction address
> 
> -> int3 triggered
> kprobe handler runs. Single-steps the "enabled" state.
> 
> 3- disable immediate value
>   -> int3 triggered (while we disable)
>   While we disable, the immediate value int3 handler is executed first. It
>   would cause the kprobe handler not to be called, and no "missing"
>   counter would be incremented.
> 
> kprobe handler runs. Single-steps the "enabled" state. This state is
> wrong.
> 
> 4- remove kprobe
> 
> 
> Since I don't want to depend on kprobes to put the breakpoint, because
> of its reentrancy limitations (see all the __probes functions), It would
> be good to figure out a mutual exclusion mechanism between immediate
> values and kprobes. Maybe we could forbid kprobes to insert probes on
> addresses present in the immediate values tables ? Or better: if we
> detect this scenario, we could put the kprobe breakpoint at the
> instruction following the "movl".
> 

That's up to you and the kprobes people, I guess...



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 4/8] Immediate Value - i386 Optimization; kprobes
  2007-06-18 18:44       ` Chuck Ebbert
@ 2007-06-18 18:56         ` Andrew Morton
  2007-06-18 19:27           ` Mathieu Desnoyers
  2007-06-18 19:32           ` Andi Kleen
  0 siblings, 2 replies; 24+ messages in thread
From: Andrew Morton @ 2007-06-18 18:56 UTC (permalink / raw)
  To: Chuck Ebbert, Andi Kleen
  Cc: Mathieu Desnoyers, linux-kernel, prasanna, ananth

On Mon, 18 Jun 2007 14:44:57 -0400
Chuck Ebbert <cebbert@redhat.com> wrote:

> > I fancy it's done by the kprobe_page_fault handler, but I do not see
> > clearly how writing the breakpoint from arch_arm_kprobe() in
> > non-writeable memory is done.
> 
> Looks like it's not merged yet:
> 
> http://lkml.org/lkml/2007/6/7/2
> 
> This needs to go in before 2.6.22-final

Andi, I'll include the below two patches in the next batch, OK?



From: "S. P. Prasanna" <prasanna@in.ibm.com>

Fix the problem of page protection introduced by CONFIG_DEBUG_RODATA for
x86_64 architecture.  As per Andi Kleen's suggestion, the kernel text pages
are marked writeable only for a short duration to insert or remove the
breakpoints.

Signed-off-by: Prasanna S P<prasanna@in.ibm.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86_64/kernel/kprobes.c |   26 ++++++++++++++++++++++++++
 arch/x86_64/mm/init.c        |    6 +++++-
 include/asm-x86_64/kprobes.h |   10 ++++++++++
 3 files changed, 41 insertions(+), 1 deletion(-)

diff -puN arch/x86_64/kernel/kprobes.c~kprobes-x86_64-fix-for-mark-ro-data arch/x86_64/kernel/kprobes.c
--- a/arch/x86_64/kernel/kprobes.c~kprobes-x86_64-fix-for-mark-ro-data
+++ a/arch/x86_64/kernel/kprobes.c
@@ -209,16 +209,42 @@ static void __kprobes arch_copy_kprobe(s
 
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
+	unsigned long addr = (unsigned long)p->addr;
+	int page_readonly = 0;
+
+	if (kernel_readonly_text(addr)) {
+		change_page_attr_addr(addr, 1, PAGE_KERNEL_EXEC);
+		global_flush_tlb();
+		page_readonly = 1;
+	}
 	*p->addr = BREAKPOINT_INSTRUCTION;
 	flush_icache_range((unsigned long) p->addr,
 			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
+	if (page_readonly) {
+		change_page_attr_addr(addr, 1, PAGE_KERNEL_RO);
+		global_flush_tlb();
+	}
 }
 
 void __kprobes arch_disarm_kprobe(struct kprobe *p)
 {
+	unsigned long addr = (unsigned long)p->addr;
+	int page_readonly = 0;
+
+	if (kernel_readonly_text(addr)) {
+		change_page_attr_addr(addr, 1, PAGE_KERNEL_EXEC);
+		global_flush_tlb();
+		page_readonly = 1;
+	}
+
 	*p->addr = p->opcode;
 	flush_icache_range((unsigned long) p->addr,
 			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
+
+	if (page_readonly) {
+		change_page_attr_addr(addr, 1, PAGE_KERNEL_RO);
+		global_flush_tlb();
+	}
 }
 
 void __kprobes arch_remove_kprobe(struct kprobe *p)
diff -puN arch/x86_64/mm/init.c~kprobes-x86_64-fix-for-mark-ro-data arch/x86_64/mm/init.c
--- a/arch/x86_64/mm/init.c~kprobes-x86_64-fix-for-mark-ro-data
+++ a/arch/x86_64/mm/init.c
@@ -48,6 +48,7 @@
 #define Dprintk(x...)
 #endif
 
+int kernel_text_is_ro;
 const struct dma_mapping_ops* dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
@@ -600,10 +601,13 @@ void mark_rodata_ro(void)
 {
 	unsigned long start = (unsigned long)_stext, end;
 
+	kernel_text_is_ro = 1;
 #ifdef CONFIG_HOTPLUG_CPU
 	/* It must still be possible to apply SMP alternatives. */
-	if (num_possible_cpus() > 1)
+	if (num_possible_cpus() > 1) {
 		start = (unsigned long)_etext;
+		kernel_text_is_ro = 0;
+	}
 #endif
 	end = (unsigned long)__end_rodata;
 	start = (start + PAGE_SIZE - 1) & PAGE_MASK;
diff -puN include/asm-x86_64/kprobes.h~kprobes-x86_64-fix-for-mark-ro-data include/asm-x86_64/kprobes.h
--- a/include/asm-x86_64/kprobes.h~kprobes-x86_64-fix-for-mark-ro-data
+++ a/include/asm-x86_64/kprobes.h
@@ -26,6 +26,7 @@
 #include <linux/types.h>
 #include <linux/ptrace.h>
 #include <linux/percpu.h>
+#include <asm-generic/sections.h>
 
 #define  __ARCH_WANT_KPROBES_INSN_SLOT
 
@@ -88,4 +89,13 @@ extern int kprobe_handler(struct pt_regs
 
 extern int kprobe_exceptions_notify(struct notifier_block *self,
 				    unsigned long val, void *data);
+extern int kernel_text_is_ro;
+static inline int kernel_readonly_text(unsigned long address)
+{
+	if (kernel_text_is_ro && ((address >= (unsigned long)_stext)
+					&& (address < (unsigned long) _etext)))
+		return 1;
+
+	return 0;
+}
 #endif				/* _ASM_KPROBES_H */
_



From: "S. P. Prasanna" <prasanna@in.ibm.com>

Fix the problem of page protection introduced by CONFIG_DEBUG_RODATA. 
CONFIG_DEBUG_RODATA marks the text pages as read-only, hence kprobes is
unable to insert breakpoints in the kernel text.  This patch overrides the
page protection when adding or removing a probe for the i386 architecture.

Signed-off-by: Prasanna S P<prasanna@in.ibm.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/i386/kernel/kprobes.c |   26 ++++++++++++++++++++++++++
 arch/i386/mm/init.c        |    2 ++
 include/asm-i386/kprobes.h |   12 ++++++++++++
 include/asm-i386/pgtable.h |    2 ++
 4 files changed, 42 insertions(+)

diff -puN arch/i386/kernel/kprobes.c~kprobes-i386-fix-for-mark-ro-data arch/i386/kernel/kprobes.c
--- a/arch/i386/kernel/kprobes.c~kprobes-i386-fix-for-mark-ro-data
+++ a/arch/i386/kernel/kprobes.c
@@ -169,16 +169,42 @@ int __kprobes arch_prepare_kprobe(struct
 
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
+	unsigned long addr = (unsigned long) p->addr;
+	int page_readonly = 0;
+
+	if (kernel_readonly_text(addr)) {
+		page_readonly = 1;
+		change_page_attr(virt_to_page(addr), 1, PAGE_KERNEL_RWX);
+		global_flush_tlb();
+	}
+
 	*p->addr = BREAKPOINT_INSTRUCTION;
 	flush_icache_range((unsigned long) p->addr,
 			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
+
+	if (page_readonly) {
+		change_page_attr(virt_to_page(addr), 1, PAGE_KERNEL_RX);
+		global_flush_tlb();
+	}
 }
 
 void __kprobes arch_disarm_kprobe(struct kprobe *p)
 {
+	unsigned long addr = (unsigned long) p->addr;
+	int page_readonly = 0;
+
+	if (kernel_readonly_text(addr)) {
+		page_readonly = 1;
+		change_page_attr(virt_to_page(addr), 1, PAGE_KERNEL_RWX);
+		global_flush_tlb();
+	}
 	*p->addr = p->opcode;
 	flush_icache_range((unsigned long) p->addr,
 			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
+	if (page_readonly) {
+		change_page_attr(virt_to_page(addr), 1, PAGE_KERNEL_RX);
+		global_flush_tlb();
+	}
 }
 
 void __kprobes arch_remove_kprobe(struct kprobe *p)
diff -puN arch/i386/mm/init.c~kprobes-i386-fix-for-mark-ro-data arch/i386/mm/init.c
--- a/arch/i386/mm/init.c~kprobes-i386-fix-for-mark-ro-data
+++ a/arch/i386/mm/init.c
@@ -45,6 +45,7 @@
 #include <asm/sections.h>
 #include <asm/paravirt.h>
 
+int kernel_text_is_ro;
 unsigned int __VMALLOC_RESERVE = 128 << 20;
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
@@ -808,6 +809,7 @@ void mark_rodata_ro(void)
 		change_page_attr(virt_to_page(start),
 		                 size >> PAGE_SHIFT, PAGE_KERNEL_RX);
 		printk("Write protecting the kernel text: %luk\n", size >> 10);
+		kernel_text_is_ro = 1;
 	}
 
 	start += size;
diff -puN include/asm-i386/kprobes.h~kprobes-i386-fix-for-mark-ro-data include/asm-i386/kprobes.h
--- a/include/asm-i386/kprobes.h~kprobes-i386-fix-for-mark-ro-data
+++ a/include/asm-i386/kprobes.h
@@ -26,6 +26,8 @@
  */
 #include <linux/types.h>
 #include <linux/ptrace.h>
+#include <linux/pfn.h>
+#include <asm-generic/sections.h>
 
 #define  __ARCH_WANT_KPROBES_INSN_SLOT
 
@@ -90,4 +92,14 @@ static inline void restore_interrupts(st
 
 extern int kprobe_exceptions_notify(struct notifier_block *self,
 				    unsigned long val, void *data);
+extern int kernel_text_is_ro;
+static inline int kernel_readonly_text(unsigned long address)
+{
+
+	if (kernel_text_is_ro && ((address >= PFN_ALIGN(_text))
+				&& (address < PFN_ALIGN(_etext))))
+		return 1;
+
+	return 0;
+}
 #endif				/* _ASM_KPROBES_H */
diff -puN include/asm-i386/pgtable.h~kprobes-i386-fix-for-mark-ro-data include/asm-i386/pgtable.h
--- a/include/asm-i386/pgtable.h~kprobes-i386-fix-for-mark-ro-data
+++ a/include/asm-i386/pgtable.h
@@ -159,6 +159,7 @@ void paging_init(void);
 extern unsigned long long __PAGE_KERNEL, __PAGE_KERNEL_EXEC;
 #define __PAGE_KERNEL_RO		(__PAGE_KERNEL & ~_PAGE_RW)
 #define __PAGE_KERNEL_RX		(__PAGE_KERNEL_EXEC & ~_PAGE_RW)
+#define __PAGE_KERNEL_RWX		(__PAGE_KERNEL_EXEC | _PAGE_RW)
 #define __PAGE_KERNEL_NOCACHE		(__PAGE_KERNEL | _PAGE_PCD)
 #define __PAGE_KERNEL_LARGE		(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
@@ -167,6 +168,7 @@ extern unsigned long long __PAGE_KERNEL,
 #define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO)
 #define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC)
 #define PAGE_KERNEL_RX		__pgprot(__PAGE_KERNEL_RX)
+#define PAGE_KERNEL_RWX		__pgprot(__PAGE_KERNEL_RWX)
 #define PAGE_KERNEL_NOCACHE	__pgprot(__PAGE_KERNEL_NOCACHE)
 #define PAGE_KERNEL_LARGE	__pgprot(__PAGE_KERNEL_LARGE)
 #define PAGE_KERNEL_LARGE_EXEC	__pgprot(__PAGE_KERNEL_LARGE_EXEC)
_


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 4/8] Immediate Value - i386 Optimization; kprobes
  2007-06-18 18:56         ` Andrew Morton
@ 2007-06-18 19:27           ` Mathieu Desnoyers
  2007-06-18 19:32           ` Andi Kleen
  1 sibling, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2007-06-18 19:27 UTC (permalink / raw)
  To: Andrew Morton, prasanna; +Cc: Chuck Ebbert, Andi Kleen, linux-kernel, ananth

Hi Prasanna,

Please see comments below about the patch.

* Andrew Morton (akpm@linux-foundation.org) wrote:
> On Mon, 18 Jun 2007 14:44:57 -0400
> Chuck Ebbert <cebbert@redhat.com> wrote:
> 
> > > I fancy it's done by the kprobe_page_fault handler, but I do not see
> > > clearly how writing the breakpoint from arch_arm_kprobe() in
> > > non-writeable memory is done.
> > 
> > Looks like it's not merged yet:
> > 
> > http://lkml.org/lkml/2007/6/7/2
> > 
> > This needs to go in before 2.6.22-final
> 
> Andi, I'll include the below two patches in the next batch, OK?
> 
> 
> 
> From: "S. P. Prasanna" <prasanna@in.ibm.com>
> 
> Fix the problem of page protection introduced by CONFIG_DEBUG_RODATA for
> x86_64 architecture.  As per Andi Kleen's suggestion, the kernel text pages
> are marked writeable only for a short duration to insert or remove the
> breakpoints.
> 
> Signed-off-by: Prasanna S P<prasanna@in.ibm.com>
> Acked-by: Jim Keniston <jkenisto@us.ibm.com>
> Cc: Andi Kleen <ak@suse.de>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  arch/x86_64/kernel/kprobes.c |   26 ++++++++++++++++++++++++++
>  arch/x86_64/mm/init.c        |    6 +++++-
>  include/asm-x86_64/kprobes.h |   10 ++++++++++
>  3 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff -puN arch/x86_64/kernel/kprobes.c~kprobes-x86_64-fix-for-mark-ro-data arch/x86_64/kernel/kprobes.c
> --- a/arch/x86_64/kernel/kprobes.c~kprobes-x86_64-fix-for-mark-ro-data
> +++ a/arch/x86_64/kernel/kprobes.c
> @@ -209,16 +209,42 @@ static void __kprobes arch_copy_kprobe(s
>  
>  void __kprobes arch_arm_kprobe(struct kprobe *p)
>  {
> +	unsigned long addr = (unsigned long)p->addr;
> +	int page_readonly = 0;
> +
> +	if (kernel_readonly_text(addr)) {
> +		change_page_attr_addr(addr, 1, PAGE_KERNEL_EXEC);
> +		global_flush_tlb();
> +		page_readonly = 1;
> +	}
>  	*p->addr = BREAKPOINT_INSTRUCTION;
>  	flush_icache_range((unsigned long) p->addr,
>  			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
> +	if (page_readonly) {
> +		change_page_attr_addr(addr, 1, PAGE_KERNEL_RO);
> +		global_flush_tlb();
> +	}
>  }
>  
>  void __kprobes arch_disarm_kprobe(struct kprobe *p)
>  {
> +	unsigned long addr = (unsigned long)p->addr;
> +	int page_readonly = 0;
> +
> +	if (kernel_readonly_text(addr)) {
> +		change_page_attr_addr(addr, 1, PAGE_KERNEL_EXEC);
> +		global_flush_tlb();
> +		page_readonly = 1;
> +	}
> +
>  	*p->addr = p->opcode;
>  	flush_icache_range((unsigned long) p->addr,
>  			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
> +
> +	if (page_readonly) {
> +		change_page_attr_addr(addr, 1, PAGE_KERNEL_RO);
> +		global_flush_tlb();
> +	}
>  }
>  
>  void __kprobes arch_remove_kprobe(struct kprobe *p)
> diff -puN arch/x86_64/mm/init.c~kprobes-x86_64-fix-for-mark-ro-data arch/x86_64/mm/init.c
> --- a/arch/x86_64/mm/init.c~kprobes-x86_64-fix-for-mark-ro-data
> +++ a/arch/x86_64/mm/init.c
> @@ -48,6 +48,7 @@
>  #define Dprintk(x...)
>  #endif
>  
> +int kernel_text_is_ro;
>  const struct dma_mapping_ops* dma_ops;
>  EXPORT_SYMBOL(dma_ops);
>  
> @@ -600,10 +601,13 @@ void mark_rodata_ro(void)
>  {
>  	unsigned long start = (unsigned long)_stext, end;
>  
> +	kernel_text_is_ro = 1;
>  #ifdef CONFIG_HOTPLUG_CPU
>  	/* It must still be possible to apply SMP alternatives. */
> -	if (num_possible_cpus() > 1)
> +	if (num_possible_cpus() > 1) {
>  		start = (unsigned long)_etext;
> +		kernel_text_is_ro = 0;
> +	}
>  #endif
>  	end = (unsigned long)__end_rodata;
>  	start = (start + PAGE_SIZE - 1) & PAGE_MASK;
> diff -puN include/asm-x86_64/kprobes.h~kprobes-x86_64-fix-for-mark-ro-data include/asm-x86_64/kprobes.h
> --- a/include/asm-x86_64/kprobes.h~kprobes-x86_64-fix-for-mark-ro-data
> +++ a/include/asm-x86_64/kprobes.h
> @@ -26,6 +26,7 @@
>  #include <linux/types.h>
>  #include <linux/ptrace.h>
>  #include <linux/percpu.h>
> +#include <asm-generic/sections.h>
>  
>  #define  __ARCH_WANT_KPROBES_INSN_SLOT
>  
> @@ -88,4 +89,13 @@ extern int kprobe_handler(struct pt_regs
>  
>  extern int kprobe_exceptions_notify(struct notifier_block *self,
>  				    unsigned long val, void *data);
> +extern int kernel_text_is_ro;
> +static inline int kernel_readonly_text(unsigned long address)
> +{
> +	if (kernel_text_is_ro && ((address >= (unsigned long)_stext)
> +					&& (address < (unsigned long) _etext)))
> +		return 1;
> +
> +	return 0;
> +}
>  #endif				/* _ASM_KPROBES_H */
> _
> 
> 
> 
> From: "S. P. Prasanna" <prasanna@in.ibm.com>
> 
> Fix the problem of page protection introduced by CONFIG_DEBUG_RODATA. 
> CONFIG_DEBUG_RODATA marks the text pages as read-only, hence kprobes is
> unable to insert breakpoints in the kernel text.  This patch overrides the
> page protection when adding or removing a probe for the i386 architecture.
> 
> Signed-off-by: Prasanna S P<prasanna@in.ibm.com>
> Acked-by: Jim Keniston <jkenisto@us.ibm.com>
> Cc: Andi Kleen <ak@suse.de>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  arch/i386/kernel/kprobes.c |   26 ++++++++++++++++++++++++++
>  arch/i386/mm/init.c        |    2 ++
>  include/asm-i386/kprobes.h |   12 ++++++++++++
>  include/asm-i386/pgtable.h |    2 ++
>  4 files changed, 42 insertions(+)
> 
> diff -puN arch/i386/kernel/kprobes.c~kprobes-i386-fix-for-mark-ro-data arch/i386/kernel/kprobes.c
> --- a/arch/i386/kernel/kprobes.c~kprobes-i386-fix-for-mark-ro-data
> +++ a/arch/i386/kernel/kprobes.c
> @@ -169,16 +169,42 @@ int __kprobes arch_prepare_kprobe(struct
>  
>  void __kprobes arch_arm_kprobe(struct kprobe *p)
>  {
> +	unsigned long addr = (unsigned long) p->addr;
> +	int page_readonly = 0;
> +
> +	if (kernel_readonly_text(addr)) {
> +		page_readonly = 1;
> +		change_page_attr(virt_to_page(addr), 1, PAGE_KERNEL_RWX);
> +		global_flush_tlb();
> +	}
> +
>  	*p->addr = BREAKPOINT_INSTRUCTION;
>  	flush_icache_range((unsigned long) p->addr,
>  			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));

Something is weird here:

A- flush_icache_range does nothing on i386. However, I guess it's there
so the code is easier to port to other architectures.
B- since this code is meant to be eventually ported in some way, we have
to be aware that sizeof(kprobe_opcode_t) may differ from 1 byte and that
the instruction can be across a page boundary. However, I do not see
this case dealt properly.

So either we make this function architecture specific (by removing the
flush_icache_range), or we turn this into an architecture independant
function, but we make sure that we change the flags of every potential
pages affected.

> +
> +	if (page_readonly) {
> +		change_page_attr(virt_to_page(addr), 1, PAGE_KERNEL_RX);
> +		global_flush_tlb();
> +	}
>  }
>  

Something like :

int ro_text = kernel_readonly_text(addr);

if (ro_text) {
  ...
}
...
if (ro_text) {
  ...
}

Seems nicer than setting a boolean in the middle of the function,
doesn't it ?


>  void __kprobes arch_disarm_kprobe(struct kprobe *p)
>  {
> +	unsigned long addr = (unsigned long) p->addr;
> +	int page_readonly = 0;
> +
> +	if (kernel_readonly_text(addr)) {
> +		page_readonly = 1;
> +		change_page_attr(virt_to_page(addr), 1, PAGE_KERNEL_RWX);
> +		global_flush_tlb();
> +	}
>  	*p->addr = p->opcode;
>  	flush_icache_range((unsigned long) p->addr,
>  			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
> +	if (page_readonly) {
> +		change_page_attr(virt_to_page(addr), 1, PAGE_KERNEL_RX);
> +		global_flush_tlb();
> +	}
>  }
>  
>  void __kprobes arch_remove_kprobe(struct kprobe *p)
> diff -puN arch/i386/mm/init.c~kprobes-i386-fix-for-mark-ro-data arch/i386/mm/init.c
> --- a/arch/i386/mm/init.c~kprobes-i386-fix-for-mark-ro-data
> +++ a/arch/i386/mm/init.c
> @@ -45,6 +45,7 @@
>  #include <asm/sections.h>
>  #include <asm/paravirt.h>
>  
> +int kernel_text_is_ro;
>  unsigned int __VMALLOC_RESERVE = 128 << 20;
>  
>  DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
> @@ -808,6 +809,7 @@ void mark_rodata_ro(void)
>  		change_page_attr(virt_to_page(start),
>  		                 size >> PAGE_SHIFT, PAGE_KERNEL_RX);
>  		printk("Write protecting the kernel text: %luk\n", size >> 10);
> +		kernel_text_is_ro = 1;
>  	}
>  
>  	start += size;
> diff -puN include/asm-i386/kprobes.h~kprobes-i386-fix-for-mark-ro-data include/asm-i386/kprobes.h
> --- a/include/asm-i386/kprobes.h~kprobes-i386-fix-for-mark-ro-data
> +++ a/include/asm-i386/kprobes.h
> @@ -26,6 +26,8 @@
>   */
>  #include <linux/types.h>
>  #include <linux/ptrace.h>
> +#include <linux/pfn.h>
> +#include <asm-generic/sections.h>
>  
>  #define  __ARCH_WANT_KPROBES_INSN_SLOT
>  
> @@ -90,4 +92,14 @@ static inline void restore_interrupts(st
>  
>  extern int kprobe_exceptions_notify(struct notifier_block *self,
>  				    unsigned long val, void *data);
> +extern int kernel_text_is_ro;
> +static inline int kernel_readonly_text(unsigned long address)
> +{
> +
> +	if (kernel_text_is_ro && ((address >= PFN_ALIGN(_text))
> +				&& (address < PFN_ALIGN(_etext))))
> +		return 1;
> +
> +	return 0;
> +}

I see here that it does not check for module text addresses, which is
correct. module.c does not currently implement read-only pages for
module's text section. Is it planned ?

>  #endif				/* _ASM_KPROBES_H */
> diff -puN include/asm-i386/pgtable.h~kprobes-i386-fix-for-mark-ro-data include/asm-i386/pgtable.h
> --- a/include/asm-i386/pgtable.h~kprobes-i386-fix-for-mark-ro-data
> +++ a/include/asm-i386/pgtable.h
> @@ -159,6 +159,7 @@ void paging_init(void);
>  extern unsigned long long __PAGE_KERNEL, __PAGE_KERNEL_EXEC;
>  #define __PAGE_KERNEL_RO		(__PAGE_KERNEL & ~_PAGE_RW)
>  #define __PAGE_KERNEL_RX		(__PAGE_KERNEL_EXEC & ~_PAGE_RW)
> +#define __PAGE_KERNEL_RWX		(__PAGE_KERNEL_EXEC | _PAGE_RW)
>  #define __PAGE_KERNEL_NOCACHE		(__PAGE_KERNEL | _PAGE_PCD)
>  #define __PAGE_KERNEL_LARGE		(__PAGE_KERNEL | _PAGE_PSE)
>  #define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
> @@ -167,6 +168,7 @@ extern unsigned long long __PAGE_KERNEL,
>  #define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO)
>  #define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC)
>  #define PAGE_KERNEL_RX		__pgprot(__PAGE_KERNEL_RX)
> +#define PAGE_KERNEL_RWX		__pgprot(__PAGE_KERNEL_RWX)
>  #define PAGE_KERNEL_NOCACHE	__pgprot(__PAGE_KERNEL_NOCACHE)
>  #define PAGE_KERNEL_LARGE	__pgprot(__PAGE_KERNEL_LARGE)
>  #define PAGE_KERNEL_LARGE_EXEC	__pgprot(__PAGE_KERNEL_LARGE_EXEC)
> _
> 
There is a new define for i386, but not for x86_64 ? I doubt it is
required anyway, because __PAGE_KERNEL_EXEC implies
(__PAGE_KERNEL_RX | _PAGE_RW).

Mathieu


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 4/8] Immediate Value - i386 Optimization; kprobes
  2007-06-18 18:56         ` Andrew Morton
  2007-06-18 19:27           ` Mathieu Desnoyers
@ 2007-06-18 19:32           ` Andi Kleen
  2007-06-18 20:16             ` Chuck Ebbert
  2007-06-19 10:06             ` [patch 1/2] kprobes i386 quick fix mark-ro-data S. P. Prasanna
  1 sibling, 2 replies; 24+ messages in thread
From: Andi Kleen @ 2007-06-18 19:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Chuck Ebbert, Mathieu Desnoyers, linux-kernel, prasanna, ananth

On Monday 18 June 2007 20:56:32 Andrew Morton wrote:
> On Mon, 18 Jun 2007 14:44:57 -0400
> Chuck Ebbert <cebbert@redhat.com> wrote:
> 
> > > I fancy it's done by the kprobe_page_fault handler, but I do not see
> > > clearly how writing the breakpoint from arch_arm_kprobe() in
> > > non-writeable memory is done.
> > 
> > Looks like it's not merged yet:
> > 
> > http://lkml.org/lkml/2007/6/7/2
> > 
> > This needs to go in before 2.6.22-final
> 
> Andi, I'll include the below two patches in the next batch, OK?

It won't work reliably unless some of the c_p_a() fixes get in first.

> 
> 
> 
> +extern int kernel_text_is_ro;

No externs in .c files


I also don't like kernel_text_is_read_only() much, it would
be better to just lookup_address() it and check the write flag.

But for 2.6.22 as a quick fix it might be better to just
make KPROBES dependent on !DEBUG_RODATA. That would be a one liner.

-Andi

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 4/8] Immediate Value - i386 Optimization; kprobes
  2007-06-18 19:32           ` Andi Kleen
@ 2007-06-18 20:16             ` Chuck Ebbert
  2007-06-19 10:06             ` [patch 1/2] kprobes i386 quick fix mark-ro-data S. P. Prasanna
  1 sibling, 0 replies; 24+ messages in thread
From: Chuck Ebbert @ 2007-06-18 20:16 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Mathieu Desnoyers, linux-kernel, prasanna, ananth

On 06/18/2007 03:32 PM, Andi Kleen wrote:
> 
> But for 2.6.22 as a quick fix it might be better to just
> make KPROBES dependent on !DEBUG_RODATA. That would be a one liner.

Yeah, that's probably best for now.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 1/2] kprobes i386 quick fix mark-ro-data
  2007-06-18 19:32           ` Andi Kleen
  2007-06-18 20:16             ` Chuck Ebbert
@ 2007-06-19 10:06             ` S. P. Prasanna
  2007-06-19 10:08               ` [patch 2/2] kprobes x86_64 " S. P. Prasanna
  2007-06-19 16:47               ` [patch 1/2] kprobes i386 " Andi Kleen
  1 sibling, 2 replies; 24+ messages in thread
From: S. P. Prasanna @ 2007-06-19 10:06 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Chuck Ebbert, Mathieu Desnoyers, linux-kernel,
	ananth, mathieu.desnoyers

On Mon, Jun 18, 2007 at 09:32:56PM +0200, Andi Kleen wrote:
> On Monday 18 June 2007 20:56:32 Andrew Morton wrote:
> > On Mon, 18 Jun 2007 14:44:57 -0400
> > Chuck Ebbert <cebbert@redhat.com> wrote:
> > 
> > > > I fancy it's done by the kprobe_page_fault handler, but I do not see
> > > > clearly how writing the breakpoint from arch_arm_kprobe() in
> > > > non-writeable memory is done.
> > > 
> > > Looks like it's not merged yet:
> > > 
> > > http://lkml.org/lkml/2007/6/7/2
> > > 
> > > This needs to go in before 2.6.22-final
> > 
> > Andi, I'll include the below two patches in the next batch, OK?
> 
> It won't work reliably unless some of the c_p_a() fixes get in first.
> 
> > 
> > 
> > 
> > +extern int kernel_text_is_ro;
> 
> No externs in .c files
Yes.
> 
> 
> I also don't like kernel_text_is_read_only() much, it would
> be better to just lookup_address() it and check the write flag.

Yes, I will look into this approach.
> 
> But for 2.6.22 as a quick fix it might be better to just
> make KPROBES dependent on !DEBUG_RODATA. That would be a one liner.
> 

Please find the quick fix as per your suggestion below.

Thanks
Prasanna

This patch is a quick fix to enable kprobes only if DEBUG_RODATA is
not configured. Since DEBUG_RODATA markes the kernel text pages as read-only.

Signed-off-by: Prasanna S P. <prasanna@in.ibm.com>


 arch/i386/Kconfig |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff -puN arch/i386/Kconfig~kprobes-quick-fix-mark-ro-data-i386 arch/i386/Kconfig
--- linux-2.6.22-rc5/arch/i386/Kconfig~kprobes-quick-fix-mark-ro-data-i386	2007-06-19 14:55:31.000000000 +0530
+++ linux-2.6.22-rc5-prasanna/arch/i386/Kconfig	2007-06-19 14:55:31.000000000 +0530
@@ -1212,6 +1212,8 @@ source "drivers/Kconfig"
 
 source "fs/Kconfig"
 
+source "arch/i386/Kconfig.debug"
+
 menu "Instrumentation Support"
 	depends on EXPERIMENTAL
 
@@ -1219,7 +1221,7 @@ source "arch/i386/oprofile/Kconfig"
 
 config KPROBES
 	bool "Kprobes (EXPERIMENTAL)"
-	depends on KALLSYMS && EXPERIMENTAL && MODULES
+	depends on KALLSYMS && EXPERIMENTAL && MODULES && !DEBUG_RODATA
 	help
 	  Kprobes allows you to trap at almost any kernel address and
 	  execute a callback function.  register_kprobe() establishes
@@ -1228,8 +1230,6 @@ config KPROBES
 	  If in doubt, say "N".
 endmenu
 
-source "arch/i386/Kconfig.debug"
-
 source "security/Kconfig"
 
 source "crypto/Kconfig"

_
-- 
Prasanna S.P.
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-41776329

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 2/2] kprobes x86_64 quick fix mark-ro-data
  2007-06-19 10:06             ` [patch 1/2] kprobes i386 quick fix mark-ro-data S. P. Prasanna
@ 2007-06-19 10:08               ` S. P. Prasanna
  2007-06-19 13:21                 ` Arjan van de Ven
  2007-06-19 16:47               ` [patch 1/2] kprobes i386 " Andi Kleen
  1 sibling, 1 reply; 24+ messages in thread
From: S. P. Prasanna @ 2007-06-19 10:08 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Chuck Ebbert, Mathieu Desnoyers, linux-kernel,
	ananth, mathieu.desnoyers

This patch is a quick fix for x86_64 to enable kprobes only if DEBUG_RODATA is
not configured. Since DEBUG_RODATA markes the kernel text pages as read-only.

Signed-off-by: Prasanna S P. <prasanna@in.ibm.com>


 arch/x86_64/Kconfig |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff -puN arch/x86_64/Kconfig~kprobes-quick-fix-mark-ro-data-x86_64 arch/x86_64/Kconfig
--- linux-2.6.22-rc5/arch/x86_64/Kconfig~kprobes-quick-fix-mark-ro-data-x86_64	2007-06-19 14:55:56.000000000 +0530
+++ linux-2.6.22-rc5-prasanna/arch/x86_64/Kconfig	2007-06-19 14:55:56.000000000 +0530
@@ -764,6 +764,8 @@ source "drivers/firmware/Kconfig"
 
 source fs/Kconfig
 
+source "arch/x86_64/Kconfig.debug"
+
 menu "Instrumentation Support"
         depends on EXPERIMENTAL
 
@@ -771,7 +773,7 @@ source "arch/x86_64/oprofile/Kconfig"
 
 config KPROBES
 	bool "Kprobes (EXPERIMENTAL)"
-	depends on KALLSYMS && EXPERIMENTAL && MODULES
+	depends on KALLSYMS && EXPERIMENTAL && MODULES && !DEBUG_RODATA
 	help
 	  Kprobes allows you to trap at almost any kernel address and
 	  execute a callback function.  register_kprobe() establishes
@@ -780,8 +782,6 @@ config KPROBES
 	  If in doubt, say "N".
 endmenu
 
-source "arch/x86_64/Kconfig.debug"
-
 source "security/Kconfig"
 
 source "crypto/Kconfig"

_
-- 
Prasanna S.P.
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-41776329

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 2/2] kprobes x86_64 quick fix mark-ro-data
  2007-06-19 10:08               ` [patch 2/2] kprobes x86_64 " S. P. Prasanna
@ 2007-06-19 13:21                 ` Arjan van de Ven
  2007-06-19 13:30                   ` Mathieu Desnoyers
  0 siblings, 1 reply; 24+ messages in thread
From: Arjan van de Ven @ 2007-06-19 13:21 UTC (permalink / raw)
  To: prasanna
  Cc: Andi Kleen, Andrew Morton, Chuck Ebbert, Mathieu Desnoyers,
	linux-kernel, ananth

On Tue, 2007-06-19 at 15:38 +0530, S. P. Prasanna wrote:
> This patch is a quick fix for x86_64 to enable kprobes only if DEBUG_RODATA is
> not configured. Since DEBUG_RODATA markes the kernel text pages as read-only.


it does??

I don't seem to be able to find this in the source code..... 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 2/2] kprobes x86_64 quick fix mark-ro-data
  2007-06-19 13:21                 ` Arjan van de Ven
@ 2007-06-19 13:30                   ` Mathieu Desnoyers
  2007-06-19 13:44                     ` Arjan van de Ven
  0 siblings, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2007-06-19 13:30 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: prasanna, Andi Kleen, Andrew Morton, Chuck Ebbert, linux-kernel, ananth

* Arjan van de Ven (arjan@infradead.org) wrote:
> On Tue, 2007-06-19 at 15:38 +0530, S. P. Prasanna wrote:
> > This patch is a quick fix for x86_64 to enable kprobes only if DEBUG_RODATA is
> > not configured. Since DEBUG_RODATA markes the kernel text pages as read-only.
> 
> 
> it does??
> 
> I don't seem to be able to find this in the source code..... 
> 

See arch/x86_64/mm/init.c:mark_rodata_ro().

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 2/2] kprobes x86_64 quick fix mark-ro-data
  2007-06-19 13:30                   ` Mathieu Desnoyers
@ 2007-06-19 13:44                     ` Arjan van de Ven
  2007-06-19 14:31                       ` S. P. Prasanna
  0 siblings, 1 reply; 24+ messages in thread
From: Arjan van de Ven @ 2007-06-19 13:44 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: prasanna, Andi Kleen, Andrew Morton, Chuck Ebbert, linux-kernel, ananth

On Tue, 2007-06-19 at 09:30 -0400, Mathieu Desnoyers wrote:
> * Arjan van de Ven (arjan@infradead.org) wrote:
> > On Tue, 2007-06-19 at 15:38 +0530, S. P. Prasanna wrote:
> > > This patch is a quick fix for x86_64 to enable kprobes only if DEBUG_RODATA is
> > > not configured. Since DEBUG_RODATA markes the kernel text pages as read-only.
> > 
> > 
> > it does??
> > 
> > I don't seem to be able to find this in the source code..... 
> > 

> See arch/x86_64/mm/init.c:mark_rodata_ro().

eh woops


PATCH] x86: tighten kernel image page access rights
author
Jan Beulich <jbeulich@novell.com>

Wed, 2 May 2007 17:27:10 +0000
(19:27 +0200)
committer
Andi Kleen <andi@basil.nowhere.org>

Wed, 2 May 2007 17:27:10 +0000
(19:27 +0200)
commit
6fb14755a676282a4e6caa05a08c92db8e45cfff


changed it to include text (even though Andi vetoed that before when I
asked for it on grounds of breaking kprobes)... sounds this really wants
to be a 2nd config option to seperatedly do code and data.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 2/2] kprobes x86_64 quick fix mark-ro-data
  2007-06-19 13:44                     ` Arjan van de Ven
@ 2007-06-19 14:31                       ` S. P. Prasanna
  0 siblings, 0 replies; 24+ messages in thread
From: S. P. Prasanna @ 2007-06-19 14:31 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Mathieu Desnoyers, Andi Kleen, Andrew Morton, Chuck Ebbert,
	linux-kernel, ananth

On Tue, Jun 19, 2007 at 06:44:30AM -0700, Arjan van de Ven wrote:
> On Tue, 2007-06-19 at 09:30 -0400, Mathieu Desnoyers wrote:
> > * Arjan van de Ven (arjan@infradead.org) wrote:
> > > On Tue, 2007-06-19 at 15:38 +0530, S. P. Prasanna wrote:
> > > > This patch is a quick fix for x86_64 to enable kprobes only if DEBUG_RODATA is
> > > > not configured. Since DEBUG_RODATA markes the kernel text pages as read-only.
> > > 
> > > 
> > > it does??
> > > 
> > > I don't seem to be able to find this in the source code..... 
> > > 
> 
> > See arch/x86_64/mm/init.c:mark_rodata_ro().
> 
> eh woops
> 
> 
> PATCH] x86: tighten kernel image page access rights
> author
> Jan Beulich <jbeulich@novell.com>
> 
> Wed, 2 May 2007 17:27:10 +0000
> (19:27 +0200)
> committer
> Andi Kleen <andi@basil.nowhere.org>
> 
> Wed, 2 May 2007 17:27:10 +0000
> (19:27 +0200)
> commit
> 6fb14755a676282a4e6caa05a08c92db8e45cfff
> 
> 
> changed it to include text (even though Andi vetoed that before when I
> asked for it on grounds of breaking kprobes)... sounds this really wants
> to be a 2nd config option to seperatedly do code and data.

Something like having a seperate config option and a routine
to mark kernel text as read execute only.
And call mark_rwtext_ro() if CONFIG_DEBUG_ROTEXT is enabled
below in mark_rodata_ro().

/* this code is for i386 architecture*/
static inline void mark_rwtext_ro(void)
{
	unsigned long start = PFN_ALIGN(_text);
	unsigned long size = PFN_ALIGN(_etext) - start;

#ifdef CONFIG_HOTPLUG_CPU
	/* It must still be possible to apply SMP alternatives. */
	if (num_possible_cpus() <= 1)
#endif
	{
		change_page_attr(virt_to_page(start),
		                 size >> PAGE_SHIFT, PAGE_KERNEL_RX);
		printk("Write protecting the kernel text: %luk\n", size >> 10);
	}
	/*
	 * global_flush_tlb() will be called after marking the data as readonly.
	 */
}

#ifdef CONFIG_DEBUG_RODATA

void mark_rodata_ro(void)
{
....

#ifdef CONFIG_DEBUG_ROTEXT
	mark_rwtext_ro();
#endif
....

}

Thanks
Prasanna
-- 
Prasanna S.P.
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-41776329

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch 1/2] kprobes i386 quick fix mark-ro-data
  2007-06-19 10:06             ` [patch 1/2] kprobes i386 quick fix mark-ro-data S. P. Prasanna
  2007-06-19 10:08               ` [patch 2/2] kprobes x86_64 " S. P. Prasanna
@ 2007-06-19 16:47               ` Andi Kleen
  1 sibling, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2007-06-19 16:47 UTC (permalink / raw)
  To: prasanna
  Cc: Andrew Morton, Chuck Ebbert, Mathieu Desnoyers, linux-kernel, ananth


> Please find the quick fix as per your suggestion below.

I've already done it myself. But I ended up doing it the other
way around -- making DEBUG_RODATA dependent on !KPROBES
because I figured KPROBES is more important than DEBUG_RODATA
and it is less confusing for the user this way.

-Andi

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2007-06-19 16:49 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-06-15 20:23 [patch 0/8] Immediate values for fast branches Mathieu Desnoyers
2007-06-15 20:23 ` [patch 1/8] Immediate Value - Architecture Independent Code Mathieu Desnoyers
2007-06-15 20:23 ` [patch 2/8] Immediate Values - Non Optimized Architectures Mathieu Desnoyers
2007-06-15 20:23 ` [patch 3/8] Immediate Value - Add kconfig menus Mathieu Desnoyers
2007-06-15 20:23 ` [patch 4/8] Immediate Value - i386 Optimization Mathieu Desnoyers
2007-06-15 22:02   ` Chuck Ebbert
2007-06-17 17:50     ` Mathieu Desnoyers
2007-06-18 14:57     ` [patch 4/8] Immediate Value - i386 Optimization; kprobes Mathieu Desnoyers
2007-06-18 18:44       ` Chuck Ebbert
2007-06-18 18:56         ` Andrew Morton
2007-06-18 19:27           ` Mathieu Desnoyers
2007-06-18 19:32           ` Andi Kleen
2007-06-18 20:16             ` Chuck Ebbert
2007-06-19 10:06             ` [patch 1/2] kprobes i386 quick fix mark-ro-data S. P. Prasanna
2007-06-19 10:08               ` [patch 2/2] kprobes x86_64 " S. P. Prasanna
2007-06-19 13:21                 ` Arjan van de Ven
2007-06-19 13:30                   ` Mathieu Desnoyers
2007-06-19 13:44                     ` Arjan van de Ven
2007-06-19 14:31                       ` S. P. Prasanna
2007-06-19 16:47               ` [patch 1/2] kprobes i386 " Andi Kleen
2007-06-15 20:23 ` [patch 5/8] Immediate Value - PowerPC Optimization Mathieu Desnoyers
2007-06-15 20:23 ` [patch 6/8] Immediate Value - Documentation Mathieu Desnoyers
2007-06-15 20:23 ` [patch 7/8] F00F bug fixup for i386 - use immediate values Mathieu Desnoyers
2007-06-15 20:23 ` [patch 8/8] Scheduler profiling - Use " Mathieu Desnoyers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).