* [PATCH 1/4] kmemcheck v4
@ 2008-02-14 19:59 Vegard Nossum
  2008-02-14 20:01 ` [PATCH 2/4] " Vegard Nossum
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Vegard Nossum @ 2008-02-14 19:59 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Daniel Walker, Ingo Molnar, Richard Knutsson, Andi Kleen,
	Pekka Enberg, Christoph Lameter

Greetings,

Here is the fourth revision of kmemcheck. It seems that we are slowly
converging towards something usable :-)

General description: kmemcheck is a patch to the linux kernel that
detects use of uninitialized memory. It does this by trapping every
read and write to memory that was allocated dynamically (e.g. using
kmalloc()). If a memory address is read that has not previously been
written to, a message is printed to the kernel log.
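
To illustrate, here is a minimal (hypothetical) test module -- all names
are made up -- that provokes exactly this kind of report:

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>

static int __init kmemcheck_demo_init(void)
{
	char *buf = kmalloc(16, GFP_KERNEL);

	if (!buf)
		return -ENOMEM;

	buf[0] = 'x';	/* only byte 0 is ever written */

	/* Byte 1 was never initialized; this read triggers a warning. */
	printk(KERN_INFO "kmemcheck-demo: buf[1] = %d\n", buf[1]);

	kfree(buf);
	return 0;
}

static void __exit kmemcheck_demo_exit(void)
{
}

module_init(kmemcheck_demo_init);
module_exit(kmemcheck_demo_exit);
MODULE_LICENSE("GPL");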

Changes since v3:
- More clean-ups. Hopefully the SLUB bits are clearer now.
- Don't print directly from the page fault handler. Instead, we save all
   errors on a ring queue and print them from a helper kernel thread.
- Some experimental support for graceful bit-field handling.
- Preliminary support for use-after-free detection.
- I've also separated the patch into logical chunks.

On my machine, the kmemcheck-enabled kernel now boots into a full
graphical desktop. As expected, it is much, much slower than the vanilla
kernel, but still surprisingly usable. Unfortunately, the kernel
usually freezes hard after a couple of hours for unknown reasons --
ideas and/or patches are welcome ;-)

The patches apply to v2.6.25-rc1.


Kind regards,
Vegard Nossum


 From 0fcca4341b6b1b277d936558aa3cab0f212bad9b Mon Sep 17 00:00:00 2001
From: Vegard Nossum <vegard.nossum@gmail.com>
Date: Thu, 14 Feb 2008 19:10:40 +0100
Subject: [PATCH] kmemcheck: add the core kmemcheck changes

General description: kmemcheck is a patch to the linux kernel that
detects use of uninitialized memory. It does this by trapping every
read and write to memory that was allocated dynamically (e.g. using
kmalloc()). If a memory address is read that has not previously been
written to, a message is printed to the kernel log.

Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
---
  Documentation/kmemcheck.txt    |   73 ++++
  arch/x86/Kconfig.debug         |   35 ++
  arch/x86/kernel/Makefile       |    2 +
  arch/x86/kernel/kmemcheck_32.c |  781 ++++++++++++++++++++++++++++++++++++++++
  include/asm-x86/kmemcheck.h    |    3 +
  include/asm-x86/kmemcheck_32.h |   22 ++
  include/asm-x86/pgtable.h      |    4 +-
  include/asm-x86/pgtable_32.h   |    1 +
  include/linux/gfp.h            |    3 +-
  include/linux/kmemcheck.h      |   17 +
  include/linux/page-flags.h     |    7 +
  11 files changed, 945 insertions(+), 3 deletions(-)
  create mode 100644 Documentation/kmemcheck.txt
  create mode 100644 arch/x86/kernel/kmemcheck_32.c
  create mode 100644 include/asm-x86/kmemcheck.h
  create mode 100644 include/asm-x86/kmemcheck_32.h
  create mode 100644 include/linux/kmemcheck.h

diff --git a/Documentation/kmemcheck.txt b/Documentation/kmemcheck.txt
new file mode 100644
index 0000000..d234571
--- /dev/null
+++ b/Documentation/kmemcheck.txt
@@ -0,0 +1,73 @@
+Technical description
+=====================
+
+kmemcheck works by marking memory pages non-present. This means that whenever
+somebody attempts to access the page, a page fault is generated. The page
+fault handler notices that the page was in fact only hidden, and so it calls
+on the kmemcheck code to make further investigations.
+
+When the investigations are completed, kmemcheck "shows" the page by marking
+it present (as it would be under normal circumstances). This way, the
+interrupted code can continue as usual.
+
+But after the instruction has been executed, we should hide the page again, so
+that we can catch the next access too! Now kmemcheck makes use of a debugging
+feature of the processor, namely single-stepping. When the processor has
+finished the one instruction that generated the memory access, a debug
+exception is raised. From here, we simply hide the page again and continue
+execution, this time with the single-stepping feature turned off.
+
+
+Changes to the memory allocator (SLUB)
+======================================
+
+kmemcheck requires some assistance from the memory allocator in order to work.
+The memory allocator needs to
+
+1. Request twice as much memory as would normally be needed. The bottom half
+   of the memory is what the user actually sees and uses; the upper half
+   contains the so-called shadow memory, which stores the status of each byte
+   in the bottom half, e.g. initialized or uninitialized.
+2. Tell kmemcheck which parts of memory should be marked uninitialized.
+   There are actually a few more states, such as "not yet allocated" and
+   "recently freed".
+
+If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
+memory that can take page faults because of kmemcheck.
+
+If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
+request memory with the __GFP_NOTRACK flag. This does not prevent the page
+faults from occurring, but it marks the object in question as being
+initialized so that no warnings will ever be produced for this object.
+
+
+Problems
+========
+
+The most prominent problem seems to be that of bit-fields. kmemcheck can only
+track memory with byte granularity. Therefore, when gcc generates code to
+access only one bit in a bit-field, there is really no way for kmemcheck to
+know which of the other bits will be used or thrown away. Consequently,
+there may be bogus warnings for bit-field accesses. There is some experimental
+support to detect this automatically, though it is probably better to work
+around this by explicitly initializing whole bit-fields at once.
+
+Some allocations are used for DMA. As DMA doesn't go through the paging
+mechanism, we have absolutely no way to detect DMA writes. This means that
+spurious warnings may be seen on access to DMA memory. DMA allocations should
+be annotated with the __GFP_NOTRACK flag or allocated from caches marked
+SLAB_NOTRACK to work around this problem.
+
+
+Future enhancements
+===================
+
+There is already some preliminary support for catching use-after-free errors.
+What still needs to be done is delaying kfree() so that memory is not
+reallocated immediately after freeing it. [Suggested by Pekka Enberg.]
+
+It should be possible to allow SMP systems by duplicating the page tables for
+each processor in the system. This is probably extremely difficult, however.
+[Suggested by Ingo Molnar.]
+
+Support for instruction set extensions like XMM, SSE2, etc.
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 864affc..f373c0e 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -134,6 +134,41 @@ config IOMMU_LEAK
  	  Add a simple leak tracer to the IOMMU code. This is useful when you
  	  are debugging a buggy device driver that leaks IOMMU mappings.

+config KMEMCHECK
+	bool "kmemcheck: trap use of uninitialized memory"
+	depends on M386 && !X86_GENERIC && !SMP
+	depends on !CC_OPTIMIZE_FOR_SIZE
+	depends on !DEBUG_PAGEALLOC && SLUB
+	select DEBUG_INFO
+	select FRAME_POINTER
+	select STACKTRACE
+	default n
+	help
+	  This option enables tracing of dynamically allocated kernel memory
+	  to see if memory is used before it has been given an initial value.
+	  Be aware that this requires half of your memory for bookkeeping and
+	  will insert extra code at *every* read and write to tracked memory,
+	  thus slowing down the kernel (user-space code is unaffected).
+
+config KMEMCHECK_PARTIAL_OK
+	bool "kmemcheck: allow partially uninitialized memory"
+	depends on KMEMCHECK
+	default y
+	help
+	  This option works around certain GCC optimizations that produce
+	  32-bit reads from 16-bit variables where the upper 16 bits are
+	  thrown away afterwards. This may of course also hide some real
+	  bugs.
+
+config KMEMCHECK_BITOPS_OK
+	bool "kmemcheck: allow bit-field manipulation"
+	depends on KMEMCHECK
+	default n
+	help
+	  This option silences warnings that would be generated for bit-field
+	  accesses where not all the bits are initialized at the same time.
+	  This may also hide some real bugs.
+
  #
  # IO delay types:
  #
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 76ec0f8..f302a8a 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -78,6 +78,8 @@ endif
  obj-$(CONFIG_SCx200)		+= scx200.o
  scx200-y			+= scx200_32.o

+obj-$(CONFIG_KMEMCHECK)		+= kmemcheck_32.o
+
  ###
  # 64 bit specific files
  ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/kmemcheck_32.c b/arch/x86/kernel/kmemcheck_32.c
new file mode 100644
index 0000000..0863ce2
--- /dev/null
+++ b/arch/x86/kernel/kmemcheck_32.c
@@ -0,0 +1,781 @@
+/**
+ * kmemcheck - a heavyweight memory checker
+ * Copyright (C) 2007, 2008  Vegard Nossum <vegardno@ifi.uio.no>
+ * (With a lot of help from Ingo Molnar and Pekka Enberg.)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2) as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kallsyms.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/page-flags.h>
+#include <linux/stacktrace.h>
+
+#include <asm/cacheflush.h>
+#include <asm/kdebug.h>
+#include <asm/kmemcheck.h>
+#include <asm/pgtable.h>
+#include <asm/string.h>
+#include <asm/tlbflush.h>
+
+enum shadow {
+	SHADOW_UNALLOCATED,
+	SHADOW_UNINITIALIZED,
+	SHADOW_INITIALIZED,
+	SHADOW_FREED,
+};
+
+struct kmemcheck_error {
+	/* Kind of access that caused the error */
+	enum shadow		state;
+	/* Address and size of the erroneous read */
+	uint32_t		address;
+	unsigned int		size;
+
+	struct pt_regs		regs;
+	struct stack_trace	trace;
+	unsigned long		trace_entries[32];
+};
+
+/*
+ * Create a ring queue of errors to output. We can't call printk() directly
+ * from the kmemcheck traps, since this may call the console drivers and
+ * result in a recursive fault.
+ */
+static struct kmemcheck_error error_fifo[32];
+static unsigned int error_count;
+static unsigned int error_rd;
+static unsigned int error_wr;
+
+static struct task_struct *kmemcheck_thread;
+
+static struct kmemcheck_error *
+error_next_wr(void)
+{
+	struct kmemcheck_error *e;
+
+	if (error_count == ARRAY_SIZE(error_fifo))
+		return NULL;
+
+	e = &error_fifo[error_wr];
+	if (++error_wr == ARRAY_SIZE(error_fifo))
+		error_wr = 0;
+	++error_count;
+	return e;
+}
+
+static struct kmemcheck_error *
+error_next_rd(void)
+{
+	struct kmemcheck_error *e;
+
+	if (error_count == 0)
+		return NULL;
+
+	e = &error_fifo[error_rd];
+	if (++error_rd == ARRAY_SIZE(error_fifo))
+		error_rd = 0;
+	--error_count;
+	return e;
+}
+
+/*
+ * Save the context of the error.
+ */
+static void
+error_save(enum shadow state, uint32_t address, unsigned int size,
+	struct pt_regs *regs)
+{
+	static uint32_t prev_ip;
+
+	struct kmemcheck_error *e;
+
+	/* Don't report several adjacent errors from the same EIP. */
+	if (regs->ip == prev_ip)
+		return;
+	prev_ip = regs->ip;
+
+	e = error_next_wr();
+	if (!e)
+		return;
+
+	e->state = state;
+	e->address = address;
+	e->size = size;
+
+	/* Save regs */
+	memcpy(&e->regs, regs, sizeof(*regs));
+
+	/* Save stack trace */
+	e->trace.nr_entries = 0;
+	e->trace.entries = e->trace_entries;
+	e->trace.max_entries = ARRAY_SIZE(e->trace_entries);
+	e->trace.skip = 4;
+	save_stack_trace(&e->trace);
+
+	if (kmemcheck_thread)
+		wake_up_process(kmemcheck_thread);
+}
+
+static void
+error_recall(void)
+{
+	static const char *desc[] = {
+		[SHADOW_UNALLOCATED]	= "unallocated",
+		[SHADOW_UNINITIALIZED]	= "uninitialized",
+		[SHADOW_INITIALIZED]	= "initialized",
+		[SHADOW_FREED]		= "freed",
+	};
+
+	struct kmemcheck_error *e;
+
+	e = error_next_rd();
+	if (!e)
+		return;
+
+	printk(KERN_ALERT "kmemcheck: Caught %d-bit read from %s memory\n",
+		e->size, desc[e->state]);
+	printk(KERN_ALERT "=> address %08x\n", e->address);
+
+	__show_registers(&e->regs, 1);
+	print_stack_trace(&e->trace, 0);
+}
+
+/*
+ * The error reporter thread.
+ */
+static int
+kmemcheck_thread_run(void *data)
+{
+	while (true) {
+		while (error_count > 0)
+			error_recall();
+
+		/* Sleep */
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule();
+	}
+}
+
+static int
+init(void)
+{
+	struct task_struct *t;
+
+	printk(KERN_INFO "kmemcheck: \"Bugs, beware!\"\n");
+
+	t = kthread_create(&kmemcheck_thread_run, NULL,
+		"kmemcheck");
+	if (IS_ERR(t)) {
+		printk(KERN_ERR "kmemcheck: Couldn't start output thread\n");
+		return PTR_ERR(t);
+	}
+
+	kmemcheck_thread = t;
+	wake_up_process(kmemcheck_thread);
+	return 0;
+}
+
+core_initcall(init);
+
+/*
+ * Return the shadow address for the given address. Returns NULL if the
+ * address is not tracked.
+ */
+static void *
+address_get_shadow(unsigned long address)
+{
+	struct page *page;
+	struct page *head;
+
+	if (address < PAGE_OFFSET)
+		return NULL;
+	page = virt_to_page(address);
+	if (!page)
+		return NULL;
+	head = compound_head(page);
+	if (!head)
+		return NULL;
+	if (!PageSlab(head))
+		return NULL;
+	if (!PageTracked(head))
+		return NULL;
+	return (void *) address + (PAGE_SIZE << (compound_order(head) - 1));
+}
+
+static int
+show_addr(uint32_t addr)
+{
+	pte_t *pte;
+	int level;
+
+	if (!address_get_shadow(addr))
+		return 0;
+
+	pte = lookup_address(addr, &level);
+	BUG_ON(!pte);
+	BUG_ON(level != PG_LEVEL_4K);
+
+	pte->pte_low |= _PAGE_PRESENT;
+	__flush_tlb_one(addr);
+	return 1;
+}
+
+static int
+hide_addr(uint32_t addr)
+{
+	pte_t *pte;
+	int level;
+
+	if (!address_get_shadow(addr))
+		return 0;
+
+	pte = lookup_address(addr, &level);
+	BUG_ON(!pte);
+	BUG_ON(level != PG_LEVEL_4K);
+
+	pte->pte_low &= ~_PAGE_PRESENT;
+	__flush_tlb_one(addr);
+	return 1;
+}
+
+DEFINE_PER_CPU(bool, kmemcheck_busy) = false;
+DEFINE_PER_CPU(uint32_t, kmemcheck_addr1) = 0;
+DEFINE_PER_CPU(uint32_t, kmemcheck_addr2) = 0;
+DEFINE_PER_CPU(uint32_t, kmemcheck_reg_flags) = 0;
+
+DEFINE_PER_CPU(int, kmemcheck_num) = 0;
+DEFINE_PER_CPU(int, kmemcheck_balance) = 0;
+
+/*
+ * Called from the #PF handler.
+ */
+void
+kmemcheck_show(struct pt_regs *regs)
+{
+	int n;
+
+	BUG_ON(!irqs_disabled());
+
+	if (__get_cpu_var(kmemcheck_balance) != 0) {
+		oops_in_progress = 1;
+		panic("kmemcheck: extra #PF");
+	}
+
+	++__get_cpu_var(kmemcheck_num);
+
+	BUG_ON(!__get_cpu_var(kmemcheck_addr1)
+		&& !__get_cpu_var(kmemcheck_addr2));
+
+	n = 0;
+	n += show_addr(__get_cpu_var(kmemcheck_addr1));
+	n += show_addr(__get_cpu_var(kmemcheck_addr2));
+
+	/* None of the addresses actually belonged to kmemcheck. Note that
+	 * this is not an error. */
+	if (n == 0)
+		return;
+
+	++__get_cpu_var(kmemcheck_balance);
+	++__get_cpu_var(kmemcheck_num);
+
+	/*
+	 * The IF needs to be cleared as well, so that the faulting
+	 * instruction can run "uninterrupted". Otherwise, we might take
+	 * an interrupt and start executing that before we've had a chance
+	 * to hide the page again.
+	 *
+	 * NOTE: In the rare case of multiple faults, we must not override
+	 * the original flags:
+	 */
+	if (!(regs->flags & TF_MASK))
+		__get_cpu_var(kmemcheck_reg_flags) = regs->flags;
+
+	regs->flags |= TF_MASK;
+	regs->flags &= ~IF_MASK;
+}
+
+/*
+ * Called from the #DB handler.
+ */
+void
+kmemcheck_hide(struct pt_regs *regs)
+{
+	BUG_ON(!irqs_disabled());
+
+	--__get_cpu_var(kmemcheck_balance);
+	if (unlikely(__get_cpu_var(kmemcheck_balance) != 0)) {
+		oops_in_progress = 1;
+		panic("kmemcheck: extra #DB");
+	}
+
+	hide_addr(__get_cpu_var(kmemcheck_addr1));
+	hide_addr(__get_cpu_var(kmemcheck_addr2));
+	__get_cpu_var(kmemcheck_addr1) = 0;
+	__get_cpu_var(kmemcheck_addr2) = 0;
+
+	if (!(__get_cpu_var(kmemcheck_reg_flags) & TF_MASK))
+		regs->flags &= ~TF_MASK;
+	if (__get_cpu_var(kmemcheck_reg_flags) & IF_MASK)
+		regs->flags |= IF_MASK;
+}
+
+void
+kmemcheck_prepare(struct pt_regs *regs)
+{
+	/*
+	 * Detect and handle recursive pagefaults:
+	 */
+	if (__get_cpu_var(kmemcheck_balance) > 0) {
+		panic_timeout++;
+		/*
+		 * We can have multi-address faults from accesses like:
+		 *
+		 *          rep movsb %ds:(%esi),%es:(%edi)
+		 *
+		 * So when we detect this kind of recursion, we simply hide
+		 * the currently in-progress addresses again and let the new
+		 * fault be handled from the beginning.
+		 */
+		kmemcheck_hide(regs);
+	}
+}
+
+void
+kmemcheck_show_pages(struct page *p, unsigned int n)
+{
+	unsigned int i;
+	struct page *head;
+
+	head = compound_head(p);
+	BUG_ON(!head);
+
+	ClearPageTracked(head);
+
+	for (i = 0; i < n; ++i) {
+		unsigned long address;
+		pte_t *pte;
+		int level;
+
+		address = (unsigned long) page_address(&p[i]);
+		pte = lookup_address(address, &level);
+		BUG_ON(!pte);
+		BUG_ON(level != PG_LEVEL_4K);
+
+		pte->pte_low |= _PAGE_PRESENT;
+		pte->pte_low &= ~_PAGE_HIDDEN;
+		__flush_tlb_one(address);
+	}
+}
+
+void
+kmemcheck_hide_pages(struct page *p, unsigned int n)
+{
+	unsigned int i;
+	struct page *head;
+
+	head = compound_head(p);
+	BUG_ON(!head);
+
+	SetPageTracked(head);
+
+	for (i = 0; i < n; ++i) {
+		unsigned long address;
+		pte_t *pte;
+		int level;
+
+		address = (unsigned long) page_address(&p[i]);
+		pte = lookup_address(address, &level);
+		BUG_ON(!pte);
+		BUG_ON(level != PG_LEVEL_4K);
+
+		pte->pte_low &= ~_PAGE_PRESENT;
+		pte->pte_low |= _PAGE_HIDDEN;
+		__flush_tlb_one(address);
+	}
+}
+
+static void
+mark_shadow(void *address, unsigned int n, enum shadow status)
+{
+	void *shadow;
+
+	shadow = address_get_shadow((unsigned long) address);
+	if (!shadow)
+		return;
+	__memset(shadow, status, n);
+}
+
+void
+kmemcheck_mark_unallocated(void *address, unsigned int n)
+{
+	mark_shadow(address, n, SHADOW_UNALLOCATED);
+}
+
+void
+kmemcheck_mark_uninitialized(void *address, unsigned int n)
+{
+	mark_shadow(address, n, SHADOW_UNINITIALIZED);
+}
+
+/*
+ * Fill the shadow memory of the given address such that the memory at that
+ * address is marked as being initialized.
+ */
+void
+kmemcheck_mark_initialized(void *address, unsigned int n)
+{
+	mark_shadow(address, n, SHADOW_INITIALIZED);
+}
+
+void
+kmemcheck_mark_freed(void *address, unsigned int n)
+{
+	mark_shadow(address, n, SHADOW_FREED);
+}
+
+void
+kmemcheck_mark_unallocated_pages(struct page *p, unsigned int n)
+{
+	unsigned int i;
+
+	for (i = 0; i < n; ++i)
+		kmemcheck_mark_unallocated(page_address(&p[i]), PAGE_SIZE);
+}
+
+void
+kmemcheck_mark_uninitialized_pages(struct page *p, unsigned int n)
+{
+	unsigned int i;
+
+	for (i = 0; i < n; ++i)
+		kmemcheck_mark_uninitialized(page_address(&p[i]), PAGE_SIZE);
+}
+
+static bool
+opcode_is_prefix(uint8_t b)
+{
+	return
+		/* Group 1 */
+		b == 0xf0 || b == 0xf2 || b == 0xf3
+		/* Group 2 */
+		|| b == 0x2e || b == 0x36 || b == 0x3e || b == 0x26
+		|| b == 0x64 || b == 0x65 || b == 0x2e || b == 0x3e
+		/* Group 3 */
+		|| b == 0x66
+		/* Group 4 */
+		|| b == 0x67;
+}
+
+/* This is a VERY crude opcode decoder. We only need to find the size of the
+ * load/store that caused our #PF and this should work for all the opcodes
+ * that we care about. Moreover, the ones who invented this instruction set
+ * should be shot. */
+static unsigned int
+opcode_get_size(const uint8_t *op)
+{
+	/* Default operand size */
+	int operand_size_override = 32;
+
+	/* prefixes */
+	for (; opcode_is_prefix(*op); ++op) {
+		if (*op == 0x66)
+			operand_size_override = 16;
+	}
+
+	/* escape opcode */
+	if (*op == 0x0f) {
+		++op;
+
+		if (*op == 0xb6)
+			return operand_size_override >> 1;
+		if (*op == 0xb7)
+			return 16;
+	}
+
+	return (*op & 1) ? operand_size_override : 8;
+}
+
+static const uint8_t *
+opcode_get_primary(const uint8_t *op)
+{
+	/* skip prefixes */
+	for (; opcode_is_prefix(*op); ++op);
+	return op;
+}
+
+static inline enum shadow
+test(void *shadow, unsigned int size)
+{
+	uint8_t *x;
+
+	x = shadow;
+
+#ifdef CONFIG_KMEMCHECK_PARTIAL_OK
+	/*
+	 * Make sure _some_ bytes are initialized. Gcc frequently generates
+	 * code to access neighboring bytes.
+	 */
+	switch (size) {
+	case 32:
+		if (x[3] == SHADOW_INITIALIZED)
+			return x[3];
+		if (x[2] == SHADOW_INITIALIZED)
+			return x[2];
+	case 16:
+		if (x[1] == SHADOW_INITIALIZED)
+			return x[1];
+	case 8:
+		if (x[0] == SHADOW_INITIALIZED)
+			return x[0];
+	}
+#else
+	switch (size) {
+	case 32:
+		if (x[3] != SHADOW_INITIALIZED)
+			return x[3];
+		if (x[2] != SHADOW_INITIALIZED)
+			return x[2];
+	case 16:
+		if (x[1] != SHADOW_INITIALIZED)
+			return x[1];
+	case 8:
+		if (x[0] != SHADOW_INITIALIZED)
+			return x[0];
+	}
+#endif
+
+	return x[0];
+}
+
+static inline void
+set(void *shadow, unsigned int size)
+{
+	uint8_t *x;
+
+	x = shadow;
+
+	switch (size) {
+	case 32:
+		x[3] = SHADOW_INITIALIZED;
+		x[2] = SHADOW_INITIALIZED;
+	case 16:
+		x[1] = SHADOW_INITIALIZED;
+	case 8:
+		x[0] = SHADOW_INITIALIZED;
+	}
+
+	return;
+}
+
+static void
+kmemcheck_read(struct pt_regs *regs, uint32_t address, unsigned int size)
+{
+	void *shadow;
+	enum shadow status;
+
+	shadow = address_get_shadow(address);
+	if (!shadow)
+		return;
+
+	status = test(shadow, size);
+	if (status == SHADOW_INITIALIZED)
+		return;
+
+	/* Don't warn about it again. */
+	set(shadow, size);
+
+	oops_in_progress = 1;
+	error_save(status, address, size, regs);
+}
+
+static void
+kmemcheck_write(struct pt_regs *regs, uint32_t address, unsigned int size)
+{
+	void *shadow;
+
+	shadow = address_get_shadow(address);
+	if (!shadow)
+		return;
+	set(shadow, size);
+}
+
+void
+kmemcheck_access(struct pt_regs *regs,
+	unsigned long fallback_address, enum kmemcheck_method fallback_method)
+{
+	const uint8_t *insn;
+	const uint8_t *insn_primary;
+	unsigned int size;
+
+	if (__get_cpu_var(kmemcheck_busy)) {
+		oops_in_progress = 1;
+		panic("kmemcheck: recursive fault");
+	}
+
+	__get_cpu_var(kmemcheck_busy) = true;
+
+	insn = (const uint8_t *) regs->ip;
+	insn_primary = opcode_get_primary(insn);
+
+	size = opcode_get_size(insn);
+
+	switch (insn_primary[0]) {
+#ifdef CONFIG_KMEMCHECK_BITOPS_OK
+		/* AND, OR, XOR */
+		/*
+		 * Unfortunately, these instructions have to be excluded from
+		 * our regular checking since they access only some (and not
+		 * all) bits. This clears out "bogus" bitfield-access warnings.
+		 */
+	case 0x80:
+	case 0x81:
+	case 0x82:
+	case 0x83:
+		switch ((insn_primary[1] >> 3) & 7) {
+			/* OR */
+		case 1:
+			/* AND */
+		case 4:
+			/* XOR */
+		case 6:
+			kmemcheck_write(regs, fallback_address, size);
+			__get_cpu_var(kmemcheck_addr1) = fallback_address;
+			__get_cpu_var(kmemcheck_addr2) = 0;
+			__get_cpu_var(kmemcheck_busy) = false;
+			return;
+
+			/* ADD */
+		case 0:
+			/* ADC */
+		case 2:
+			/* SBB */
+		case 3:
+			/* SUB */
+		case 5:
+			/* CMP */
+		case 7:
+			break;
+		}
+		break;
+#endif
+
+		/* MOVS, MOVSB, MOVSW, MOVSD */
+	case 0xa4:
+	case 0xa5:
+		/* These instructions are special because they take two
+		 * addresses, but we only get one page fault. */
+		kmemcheck_read(regs, regs->si, size);
+		kmemcheck_write(regs, regs->di, size);
+		__get_cpu_var(kmemcheck_addr1) = regs->si;
+		__get_cpu_var(kmemcheck_addr2) = regs->di;
+		__get_cpu_var(kmemcheck_busy) = false;
+		return;
+
+		/* CMPS, CMPSB, CMPSW, CMPSD */
+	case 0xa6:
+	case 0xa7:
+		kmemcheck_read(regs, regs->si, size);
+		kmemcheck_read(regs, regs->di, size);
+		__get_cpu_var(kmemcheck_addr1) = regs->si;
+		__get_cpu_var(kmemcheck_addr2) = regs->di;
+		__get_cpu_var(kmemcheck_busy) = false;
+		return;
+	}
+
+	/* If the opcode isn't special in any way, we use the data from the
+	 * page fault handler to determine the address and type of memory
+	 * access. */
+	switch (fallback_method) {
+	case KMEMCHECK_READ:
+		kmemcheck_read(regs, fallback_address, size);
+		__get_cpu_var(kmemcheck_addr1) = fallback_address;
+		__get_cpu_var(kmemcheck_addr2) = 0;
+		__get_cpu_var(kmemcheck_busy) = false;
+		return;
+	case KMEMCHECK_WRITE:
+		kmemcheck_write(regs, fallback_address, size);
+		__get_cpu_var(kmemcheck_addr1) = fallback_address;
+		__get_cpu_var(kmemcheck_addr2) = 0;
+		__get_cpu_var(kmemcheck_busy) = false;
+		return;
+	}
+}
+
+/*
+ * A faster implementation of memset() for when tracking is enabled and the
+ * whole memory area lies within a single page.
+ */
+static void
+memset_one_page(unsigned long s, int c, size_t n)
+{
+	void *x;
+	unsigned long flags;
+
+	x = address_get_shadow(s);
+	if (!x) {
+		/* The page isn't being tracked. */
+		__memset((void *) s, c, n);
+		return;
+	}
+
+	/* While the page in question is unguarded, nobody else must be
+	 * able to change it, so disable interrupts. */
+	local_irq_save(flags);
+
+	show_addr(s);
+	__memset((void *) s, c, n);
+	__memset((void *) x, SHADOW_INITIALIZED, n);
+	hide_addr(s);
+
+	local_irq_restore(flags);
+}
+
+/*
+ * A faster implementation of memset() for when tracking is enabled. We cannot
+ * assume that all pages within the range are tracked, so the operation has to
+ * be split into page-sized (or smaller, for the ends) chunks.
+ */
+void
+kmemcheck_memset(unsigned long s, int c, size_t n)
+{
+	unsigned long a_page, a_offset;
+	unsigned long b_page, b_offset;
+	unsigned long i;
+
+	if (!n)
+		return;
+
+	if (!slab_is_available()) {
+		__memset((void *) s, c, n);
+		return;
+	}
+
+	a_page = s & PAGE_MASK;
+	b_page = (s + n) & PAGE_MASK;
+
+	if (a_page == b_page) {
+		/* The entire area is within the same page. Good, we only
+		 * need one memset(). */
+		memset_one_page(s, c, n);
+		return;
+	}
+
+	a_offset = s & ~PAGE_MASK;
+	b_offset = (s + n) & ~PAGE_MASK;
+
+	/* Fill the head, body, and tail of the memory area. */
+	if (a_offset < PAGE_SIZE)
+		memset_one_page(s, c, PAGE_SIZE - a_offset);
+	for (i = a_page + PAGE_SIZE; i < b_page; i += PAGE_SIZE)
+		memset_one_page(i, c, PAGE_SIZE);
+	if (b_offset > 0)
+		memset_one_page(b_page, c, b_offset);
+}
+
+EXPORT_SYMBOL(kmemcheck_memset);
diff --git a/include/asm-x86/kmemcheck.h b/include/asm-x86/kmemcheck.h
new file mode 100644
index 0000000..11de35a
--- /dev/null
+++ b/include/asm-x86/kmemcheck.h
@@ -0,0 +1,3 @@
+#ifdef CONFIG_X86_32
+# include "kmemcheck_32.h"
+#endif
diff --git a/include/asm-x86/kmemcheck_32.h b/include/asm-x86/kmemcheck_32.h
new file mode 100644
index 0000000..295e256
--- /dev/null
+++ b/include/asm-x86/kmemcheck_32.h
@@ -0,0 +1,22 @@
+#ifndef ASM_X86_KMEMCHECK_32_H
+#define ASM_X86_KMEMCHECK_32_H
+
+#include <linux/percpu.h>
+#include <asm/pgtable.h>
+
+enum kmemcheck_method {
+	KMEMCHECK_READ,
+	KMEMCHECK_WRITE,
+};
+
+#ifdef CONFIG_KMEMCHECK
+void kmemcheck_prepare(struct pt_regs *regs);
+
+void kmemcheck_show(struct pt_regs *regs);
+void kmemcheck_hide(struct pt_regs *regs);
+
+void kmemcheck_access(struct pt_regs *regs,
+	unsigned long address, enum kmemcheck_method method);
+#endif
+
+#endif
diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h
index 174b877..eb64bbb 100644
--- a/include/asm-x86/pgtable.h
+++ b/include/asm-x86/pgtable.h
@@ -17,8 +17,8 @@
  #define _PAGE_BIT_GLOBAL	8	/* Global TLB entry PPro+ */
  #define _PAGE_BIT_UNUSED1	9	/* available for programmer */
  #define _PAGE_BIT_UNUSED2	10
-#define _PAGE_BIT_UNUSED3	11
  #define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
+#define _PAGE_BIT_HIDDEN	11
  #define _PAGE_BIT_NX           63       /* No execute: only valid after cpuid check */

  /*
@@ -37,9 +37,9 @@
  #define _PAGE_GLOBAL	(_AC(1, L)<<_PAGE_BIT_GLOBAL)	/* Global TLB entry */
  #define _PAGE_UNUSED1	(_AC(1, L)<<_PAGE_BIT_UNUSED1)
  #define _PAGE_UNUSED2	(_AC(1, L)<<_PAGE_BIT_UNUSED2)
-#define _PAGE_UNUSED3	(_AC(1, L)<<_PAGE_BIT_UNUSED3)
  #define _PAGE_PAT	(_AC(1, L)<<_PAGE_BIT_PAT)
  #define _PAGE_PAT_LARGE (_AC(1, L)<<_PAGE_BIT_PAT_LARGE)
+#define _PAGE_HIDDEN	(_AC(1, L)<<_PAGE_BIT_HIDDEN)

  #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
  #define _PAGE_NX	(_AC(1, ULL) << _PAGE_BIT_NX)
diff --git a/include/asm-x86/pgtable_32.h b/include/asm-x86/pgtable_32.h
index a842c72..6830703 100644
--- a/include/asm-x86/pgtable_32.h
+++ b/include/asm-x86/pgtable_32.h
@@ -87,6 +87,7 @@ void paging_init(void);
  extern unsigned long pg0[];

  #define pte_present(x)	((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE))
+#define pte_hidden(x)	((x).pte_low & (_PAGE_HIDDEN))

  /* To avoid harmful races, pmd_none(x) should check only the lower when PAE */
  #define pmd_none(x)	(!(unsigned long)pmd_val(x))
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0c6ce51..2138d64 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -50,8 +50,9 @@ struct vm_area_struct;
  #define __GFP_THISNODE	((__force gfp_t)0x40000u)/* No fallback, no policies */
  #define __GFP_RECLAIMABLE ((__force gfp_t)0x80000u) /* Page is reclaimable */
  #define __GFP_MOVABLE	((__force gfp_t)0x100000u)  /* Page is movable */
+#define __GFP_NOTRACK	((__force gfp_t)0x200000u)  /* Don't track with kmemcheck */

-#define __GFP_BITS_SHIFT 21	/* Room for 21 __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 22	/* Room for 22 __GFP_FOO bits */
  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))

  /* This equals 0, but use constants in case they ever change */
diff --git a/include/linux/kmemcheck.h b/include/linux/kmemcheck.h
new file mode 100644
index 0000000..407bc5c
--- /dev/null
+++ b/include/linux/kmemcheck.h
@@ -0,0 +1,17 @@
+#ifndef LINUX_KMEMCHECK_H
+#define LINUX_KMEMCHECK_H
+
+#ifdef CONFIG_KMEMCHECK
+void kmemcheck_show_pages(struct page *p, unsigned int n);
+void kmemcheck_hide_pages(struct page *p, unsigned int n);
+
+void kmemcheck_mark_unallocated(void *address, unsigned int n);
+void kmemcheck_mark_uninitialized(void *address, unsigned int n);
+void kmemcheck_mark_initialized(void *address, unsigned int n);
+void kmemcheck_mark_freed(void *address, unsigned int n);
+
+void kmemcheck_mark_unallocated_pages(struct page *p, unsigned int n);
+void kmemcheck_mark_uninitialized_pages(struct page *p, unsigned int n);
+#endif
+
+#endif
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index bbad43f..1593859 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -90,6 +90,8 @@
  #define PG_reclaim		17	/* To be reclaimed asap */
  #define PG_buddy		19	/* Page is free, on buddy lists */

+#define PG_tracked		20	/* Page is tracked by kmemcheck */
+
  /* PG_readahead is only used for file reads; PG_reclaim is only for writes */
  #define PG_readahead		PG_reclaim /* Reminder to do async read-ahead */

@@ -296,6 +298,11 @@ static inline void __ClearPageTail(struct page *page)
  #define SetPageUncached(page)	set_bit(PG_uncached, &(page)->flags)
  #define ClearPageUncached(page)	clear_bit(PG_uncached, &(page)->flags)

+#define PageTracked(page)	test_bit(PG_tracked, &(page)->flags)
+#define SetPageTracked(page)	set_bit(PG_tracked, &(page)->flags)
+#define ClearPageTracked(page)	clear_bit(PG_tracked, &(page)->flags)
+
+
  struct page;	/* forward declaration */

  extern void cancel_dirty_page(struct page *page, unsigned int account_size);
-- 
1.5.3.8


* [PATCH 2/4] kmemcheck v4
  2008-02-14 19:59 [PATCH 1/4] kmemcheck v4 Vegard Nossum
@ 2008-02-14 20:01 ` Vegard Nossum
  2008-02-14 20:02 ` [PATCH 3/4] " Vegard Nossum
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Vegard Nossum @ 2008-02-14 20:01 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Daniel Walker, Ingo Molnar, Richard Knutsson, Andi Kleen,
	Pekka Enberg, Christoph Lameter

 From 5cbf7d1cb81b4f8bb4d80c027b74cdf0f08aaaff Mon Sep 17 00:00:00 2001
From: Vegard Nossum <vegard.nossum@gmail.com>
Date: Thu, 14 Feb 2008 19:20:51 +0100
Subject: [PATCH] kmemcheck: add the x86 hooks

The hooks that we modify are:
- Page fault handler (to handle kmemcheck faults)
- Debug exception handler (to hide pages after single-stepping
   the instruction that caused the page fault)

Also redefine memset() to use the optimized version if kmemcheck
is enabled.
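
As a purely illustrative example (struct foo is made up), ordinary code
like the following now goes through kmemcheck_memset(), which fills the
data and marks its shadow as initialized in one pass instead of
repeatedly faulting while the area is written:

	struct foo *f = kmalloc(sizeof(*f), GFP_KERNEL);

	if (f)
		memset(f, 0, sizeof(*f));	/* expands to kmemcheck_memset() */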

Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
---
  arch/x86/kernel/cpu/common.c |    7 +++++++
  arch/x86/kernel/entry_32.S   |    8 ++++----
  arch/x86/kernel/traps_32.c   |   22 +++++++++++++++++++++-
  arch/x86/mm/fault.c          |   21 +++++++++++++++++----
  include/asm-x86/string_32.h  |    8 ++++++++
  5 files changed, 57 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f86a3c4..83cd359 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -634,6 +634,13 @@ void __init early_cpu_init(void)
  	nexgen_init_cpu();
  	umc_init_cpu();
  	early_cpu_detect();
+
+#ifdef CONFIG_KMEMCHECK
+	/*
+	 * We need 4K granular PTEs for kmemcheck:
+	 */
+	setup_clear_cpu_cap(X86_FEATURE_PSE);
+#endif
  }

  /* Make sure %fs is initialized properly in idle threads */
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 824e21b..30b94e7 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -289,7 +289,7 @@ ENTRY(ia32_sysenter_target)
  	CFI_DEF_CFA esp, 0
  	CFI_REGISTER esp, ebp
  	movl TSS_sysenter_sp0(%esp),%esp
-sysenter_past_esp:
+ENTRY(sysenter_past_esp)
  	/*
  	 * No need to follow this irqs on/off section: the syscall
  	 * disabled irqs and here we enable it straight after entry:
@@ -767,7 +767,7 @@ label:						\
  	CFI_ADJUST_CFA_OFFSET 4;		\
  	CFI_REL_OFFSET eip, 0

-KPROBE_ENTRY(debug)
+KPROBE_ENTRY(x86_debug)
  	RING0_INT_FRAME
  	cmpl $ia32_sysenter_target,(%esp)
  	jne debug_stack_correct
@@ -781,7 +781,7 @@ debug_stack_correct:
  	call do_debug
  	jmp ret_from_exception
  	CFI_ENDPROC
-KPROBE_END(debug)
+KPROBE_END(x86_debug)

  /*
   * NMI is doubly nasty. It can happen _while_ we're handling
@@ -835,7 +835,7 @@ nmi_debug_stack_check:
  	/* We have a RING0_INT_FRAME here */
  	cmpw $__KERNEL_CS,16(%esp)
  	jne nmi_stack_correct
-	cmpl $debug,(%esp)
+	cmpl $x86_debug,(%esp)
  	jb nmi_stack_correct
  	cmpl $debug_esp_fix_insn,(%esp)
  	ja nmi_stack_correct
diff --git a/arch/x86/kernel/traps_32.c b/arch/x86/kernel/traps_32.c
index b22c01e..1299d38 100644
--- a/arch/x86/kernel/traps_32.c
+++ b/arch/x86/kernel/traps_32.c
@@ -56,6 +56,7 @@
  #include <asm/arch_hooks.h>
  #include <linux/kdebug.h>
  #include <asm/stacktrace.h>
+#include <asm/kmemcheck.h>

  #include <linux/module.h>

@@ -841,6 +842,10 @@ void __kprobes do_int3(struct pt_regs *regs, long error_code)
  }
  #endif

+extern void ia32_sysenter_target(void);
+extern void sysenter_past_esp(void);
+extern void x86_debug(void);
+
  /*
   * Our handling of the processor debug registers is non-trivial.
   * We do not clear them on entry and exit from the kernel. Therefore
@@ -872,6 +877,20 @@ void __kprobes do_debug(struct pt_regs * regs, long error_code)

  	get_debugreg(condition, 6);

+#ifdef CONFIG_KMEMCHECK
+	/* Catch kmemcheck conditions first of all! */
+	if (condition & DR_STEP) {
+		if (!(regs->flags & VM_MASK) && !user_mode(regs) &&
+			((void *)regs->ip != system_call) &&
+			((void *)regs->ip != x86_debug) &&
+			((void *)regs->ip != ia32_sysenter_target) &&
+			((void *)regs->ip != sysenter_past_esp)) {
+			kmemcheck_hide(regs);
+			return;
+		}
+	}
+#endif
+
  	/*
  	 * The processor cleared BTF, so don't mark that we need it set.
  	 */
@@ -881,6 +900,7 @@ void __kprobes do_debug(struct pt_regs * regs, long error_code)
  	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
  					SIGTRAP) == NOTIFY_STOP)
  		return;
+
  	/* It's safe to allow irq's after DR6 has been saved */
  	if (regs->flags & X86_EFLAGS_IF)
  		local_irq_enable();
@@ -1154,7 +1174,7 @@ void __init trap_init(void)
  #endif

  	set_trap_gate(0,&divide_error);
-	set_intr_gate(1,&debug);
+	set_intr_gate(1,&x86_debug);
  	set_intr_gate(2,&nmi);
  	set_system_intr_gate(3, &int3); /* int3/4 can be called from all */
  	set_system_gate(4,&overflow);
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 621afb6..2886cf9 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -33,6 +33,7 @@
  #include <asm/smp.h>
  #include <asm/tlbflush.h>
  #include <asm/proto.h>
+#include <asm/kmemcheck.h>
  #include <asm-generic/sections.h>

  /*
@@ -493,7 +494,8 @@ static int spurious_fault(unsigned long address,
   *
   * This assumes no large pages in there.
   */
-static int vmalloc_fault(unsigned long address)
+static int vmalloc_fault(struct pt_regs *regs, unsigned long address,
+	unsigned long error_code)
  {
  #ifdef CONFIG_X86_32
  	unsigned long pgd_paddr;
@@ -511,8 +513,19 @@ static int vmalloc_fault(unsigned long address)
  	if (!pmd_k)
  		return -1;
  	pte_k = pte_offset_kernel(pmd_k, address);
-	if (!pte_present(*pte_k))
-		return -1;
+	if (!pte_present(*pte_k)) {
+		if (!pte_hidden(*pte_k))
+			return -1;
+
+#ifdef CONFIG_KMEMCHECK
+		kmemcheck_prepare(regs);
+		if (error_code & 2)
+			kmemcheck_access(regs, address, KMEMCHECK_WRITE);
+		else
+			kmemcheck_access(regs, address, KMEMCHECK_READ);
+		kmemcheck_show(regs);
+#endif
+	}
  	return 0;
  #else
  	pgd_t *pgd, *pgd_ref;
@@ -623,7 +636,7 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
  	if (unlikely(address >= TASK_SIZE64)) {
  #endif
  		if (!(error_code & (PF_RSVD|PF_USER|PF_PROT)) &&
-		    vmalloc_fault(address) >= 0)
+		    vmalloc_fault(regs, address, error_code) >= 0)
  			return;

  		/* Can handle a stale RO->RW TLB */
diff --git a/include/asm-x86/string_32.h b/include/asm-x86/string_32.h
index c5d13a8..546cea0 100644
--- a/include/asm-x86/string_32.h
+++ b/include/asm-x86/string_32.h
@@ -262,6 +262,14 @@ __asm__  __volatile__( \
   __constant_c_x_memset((s),(0x01010101UL*(unsigned char)(c)),(count)) : \
   __memset((s),(c),(count)))

+/* If kmemcheck is enabled, our best bet is a custom memset() that disables
+ * checking in order to save a whole lot of (unnecessary) page faults. */
+#ifdef CONFIG_KMEMCHECK
+void kmemcheck_memset(unsigned long s, int c, size_t n);
+#undef memset
+#define memset(s, c, n) kmemcheck_memset((unsigned long) (s), (c), (n))
+#endif
+
  /*
   * find the first occurrence of byte 'c', or 1 past the area if none
   */
-- 
1.5.3.8




* [PATCH 3/4] kmemcheck v4
  2008-02-14 19:59 [PATCH 1/4] kmemcheck v4 Vegard Nossum
  2008-02-14 20:01 ` [PATCH 2/4] " Vegard Nossum
@ 2008-02-14 20:02 ` Vegard Nossum
  2008-02-14 20:18   ` Pekka Enberg
  2008-02-14 20:02 ` [PATCH 4/4] " Vegard Nossum
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Vegard Nossum @ 2008-02-14 20:02 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Daniel Walker, Ingo Molnar, Richard Knutsson, Andi Kleen,
	Pekka Enberg, Christoph Lameter

 From 4ce1c09e38b2402dc04f0246916f3c23abe8f3e1 Mon Sep 17 00:00:00 2001
From: Vegard Nossum <vegard.nossum@gmail.com>
Date: Thu, 14 Feb 2008 19:25:39 +0100
Subject: [PATCH] kmemcheck: make SLUB use kmemcheck

With kmemcheck enabled, SLUB needs to do this:

1. Request twice as much memory as would normally be needed. The bottom half
    of the memory is what the user actually sees and uses; the upper half
    contains the so-called shadow memory, which stores the status of each byte
    in the bottom half, e.g. initialized or uninitialized.
2. Tell kmemcheck which parts of memory should be marked uninitialized.
    There are actually a few more states, such as "not yet allocated" and
    "recently freed".

If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
memory that can take page faults because of kmemcheck.

If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
request memory with the __GFP_NOTRACK flag. This does not prevent the page
faults from occurring, but it marks the object in question as being
initialized so that no warnings will ever be produced for this object.
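
For illustration only (the cache and variable names below are made up),
the two opt-out mechanisms look roughly like this from a caller's point
of view:

#include <linux/init.h>
#include <linux/slab.h>

struct foo {
	int a;
	int b;
};

static struct kmem_cache *foo_cachep;

static int __init foo_setup(void)
{
	void *obj;

	/* A cache that is never tracked: its objects never take
	 * kmemcheck page faults at all. */
	foo_cachep = kmem_cache_create("foo_cache", sizeof(struct foo), 0,
			SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_NOTRACK, NULL);

	/* A single allocation from an ordinary (tracked) cache: it still
	 * takes page faults, but is marked initialized up front, so it
	 * never produces any warnings. */
	obj = kmalloc(128, GFP_KERNEL | __GFP_NOTRACK);
	if (!obj)
		return -ENOMEM;

	kfree(obj);
	return 0;
}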

Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
---
  include/linux/slab.h |    1 +
  kernel/fork.c        |   15 ++++---
  mm/slub.c            |  113 +++++++++++++++++++++++++++++++++++++++++---------
  3 files changed, 102 insertions(+), 27 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index f62caaa..35cc185 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -28,6 +28,7 @@
  #define SLAB_DESTROY_BY_RCU	0x00080000UL	/* Defer freeing slabs to RCU */
  #define SLAB_MEM_SPREAD		0x00100000UL	/* Spread some memory over cpuset */
  #define SLAB_TRACE		0x00200000UL	/* Trace allocations and frees */
+#define SLAB_NOTRACK		0x00400000UL	/* Don't track use of uninitialized memory */

  /* The following flags affect the page allocator grouping pages by mobility */
  #define SLAB_RECLAIM_ACCOUNT	0x00020000UL		/* Objects are reclaimable */
diff --git a/kernel/fork.c b/kernel/fork.c
index 4363a4e..1541016 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -141,7 +141,7 @@ void __init fork_init(unsigned long mempages)
  	/* create a slab on which task_structs can be allocated */
  	task_struct_cachep =
  		kmem_cache_create("task_struct", sizeof(struct task_struct),
-			ARCH_MIN_TASKALIGN, SLAB_PANIC, NULL);
+			ARCH_MIN_TASKALIGN, SLAB_PANIC | SLAB_NOTRACK, NULL);
  #endif

  	/*
@@ -1547,23 +1547,24 @@ void __init proc_caches_init(void)
  {
  	sighand_cachep = kmem_cache_create("sighand_cache",
  			sizeof(struct sighand_struct), 0,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_DESTROY_BY_RCU,
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_DESTROY_BY_RCU
+			|SLAB_NOTRACK,
  			sighand_ctor);
  	signal_cachep = kmem_cache_create("signal_cache",
  			sizeof(struct signal_struct), 0,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_NOTRACK, NULL);
  	files_cachep = kmem_cache_create("files_cache",
  			sizeof(struct files_struct), 0,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_NOTRACK, NULL);
  	fs_cachep = kmem_cache_create("fs_cache",
  			sizeof(struct fs_struct), 0,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_NOTRACK, NULL);
  	vm_area_cachep = kmem_cache_create("vm_area_struct",
  			sizeof(struct vm_area_struct), 0,
-			SLAB_PANIC, NULL);
+			SLAB_PANIC|SLAB_NOTRACK, NULL);
  	mm_cachep = kmem_cache_create("mm_struct",
  			sizeof(struct mm_struct), ARCH_MIN_MMSTRUCT_ALIGN,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_NOTRACK, NULL);
  }

  /*
diff --git a/mm/slub.c b/mm/slub.c
index e2989ae..c9787df 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -21,6 +21,7 @@
  #include <linux/ctype.h>
  #include <linux/kallsyms.h>
  #include <linux/memory.h>
+#include <linux/kmemcheck.h>

  /*
   * Lock order:
@@ -198,7 +199,7 @@ static inline void ClearSlabDebug(struct page *page)
  		SLAB_TRACE | SLAB_DESTROY_BY_RCU)

  #define SLUB_MERGE_SAME (SLAB_DEBUG_FREE | SLAB_RECLAIM_ACCOUNT | \
-		SLAB_CACHE_DMA)
+		SLAB_CACHE_DMA | SLAB_NOTRACK)

  #ifndef ARCH_KMALLOC_MINALIGN
  #define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
@@ -1077,9 +1078,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
  {
  	struct page *page;
  	int pages = 1 << s->order;
-
-	if (s->order)
-		flags |= __GFP_COMP;
+	int order = s->order;

  	if (s->flags & SLAB_CACHE_DMA)
  		flags |= SLUB_DMA;
@@ -1087,14 +1086,50 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
  	if (s->flags & SLAB_RECLAIM_ACCOUNT)
  		flags |= __GFP_RECLAIMABLE;

+#ifdef CONFIG_KMEMCHECK
+	/*
+	 * With kmemcheck enabled, we actually allocate twice as much. The
+	 * upper half of the allocation is used as our shadow memory where
+	 * the status (e.g. initialized/uninitialized) of each byte is
+	 * stored.
+	 */
+	if (!(s->flags & SLAB_NOTRACK)) {
+		pages += pages;
+		order += 1;
+	}
+#endif
+
+	if (order)
+		flags |= __GFP_COMP;
+
  	if (node == -1)
-		page = alloc_pages(flags, s->order);
+		page = alloc_pages(flags, order);
  	else
-		page = alloc_pages_node(node, flags, s->order);
+		page = alloc_pages_node(node, flags, order);

  	if (!page)
  		return NULL;

+#ifdef CONFIG_KMEMCHECK
+	if (!(s->flags & SLAB_NOTRACK)) {
+		/*
+		 * Mark it as non-present for the MMU so that our accesses to
+		 * this memory will trigger a page fault and let us analyze
+		 * the memory accesses.
+		 */
+		kmemcheck_hide_pages(page, pages >> 1);
+
+		/*
+		 * Objects from caches that have a constructor don't get
+		 * cleared when they're allocated, so we need to do it here.
+		 */
+		if (s->ctor)
+			kmemcheck_mark_uninitialized_pages(page, pages >> 1);
+		else
+			kmemcheck_mark_unallocated_pages(page, pages >> 1);
+	}
+#endif
+
  	mod_zone_page_state(page_zone(page),
  		(s->flags & SLAB_RECLAIM_ACCOUNT) ?
  		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
@@ -1159,6 +1194,7 @@ out:
  static void __free_slab(struct kmem_cache *s, struct page *page)
  {
  	int pages = 1 << s->order;
+	int order = s->order;

  	if (unlikely(SlabDebug(page))) {
  		void *p;
@@ -1169,13 +1205,21 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
  		ClearSlabDebug(page);
  	}

+#ifdef CONFIG_KMEMCHECK
+	if (!(s->flags & SLAB_NOTRACK)) {
+		kmemcheck_show_pages(page, pages);
+		pages += pages;
+		order += 1;
+	}
+#endif
+
  	mod_zone_page_state(page_zone(page),
  		(s->flags & SLAB_RECLAIM_ACCOUNT) ?
  		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
  		-pages);

  	page->mapping = NULL;
-	__free_pages(page, s->order);
+	__free_pages(page, order);
  }

  static void rcu_free_slab(struct rcu_head *h)
@@ -1656,6 +1700,28 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
  	if (unlikely((gfpflags & __GFP_ZERO) && object))
  		memset(object, 0, c->objsize);

+#ifdef CONFIG_KMEMCHECK
+	if (!(gfpflags & __GFP_ZERO) && !(s->flags & SLAB_NOTRACK)) {
+		if (gfpflags & __GFP_NOTRACK) {
+			/*
+			 * Allow notracked objects to be allocated from
+			 * tracked caches. Note however that these objects
+			 * will still get page faults on access, they just
+			 * won't ever be flagged as uninitialized. If page
+			 * faults are not acceptable, the slab cache itself
+			 * should be marked NOTRACK.
+			 */
+			kmemcheck_mark_initialized(object, s->objsize);
+		} else if (!s->ctor) {
+			/*
+			 * New objects should be marked uninitialized before
+			 * they're returned to the caller.
+			 */
+			kmemcheck_mark_uninitialized(object, s->objsize);
+		}
+	}
+#endif
+
  	return object;
  }

@@ -1767,10 +1833,19 @@ static __always_inline void slab_free(struct kmem_cache *s,
  {
  	void **object = (void *)x;
  	struct kmem_cache_cpu *c;
-
  #ifdef SLUB_FASTPATH
  	void **freelist;
+#else
+	unsigned long flags;
+
+#endif
+
+#ifdef CONFIG_KMEMCHECK
+	if (!s->ctor)
+		kmemcheck_mark_freed(object, s->objsize);
+#endif

+#ifdef SLUB_FASTPATH
  	c = get_cpu_slab(s, raw_smp_processor_id());
  	debug_check_no_locks_freed(object, s->objsize);
  	do {
@@ -1795,8 +1870,6 @@ static __always_inline void slab_free(struct kmem_cache *s,
  		stat(c, FREE_FASTPATH);
  	} while (cmpxchg_local(&c->freelist, freelist, object) != freelist);
  #else
-	unsigned long flags;
-
  	local_irq_save(flags);
  	debug_check_no_locks_freed(object, s->objsize);
  	c = get_cpu_slab(s, smp_processor_id());
@@ -2527,12 +2600,10 @@ static int __init setup_slub_nomerge(char *str)
  __setup("slub_nomerge", setup_slub_nomerge);

  static struct kmem_cache *create_kmalloc_cache(struct kmem_cache *s,
-		const char *name, int size, gfp_t gfp_flags)
+	const char *name, int size, gfp_t gfp_flags, unsigned int flags)
  {
-	unsigned int flags = 0;
-
  	if (gfp_flags & SLUB_DMA)
-		flags = SLAB_CACHE_DMA;
+		flags |= SLAB_CACHE_DMA;

  	down_write(&slub_lock);
  	if (!kmem_cache_open(s, gfp_flags, name, size, ARCH_KMALLOC_MINALIGN,
@@ -2595,7 +2666,8 @@ static noinline struct kmem_cache *dma_kmalloc_cache(int index, gfp_t flags)

  	if (!s || !text || !kmem_cache_open(s, flags, text,
  			realsize, ARCH_KMALLOC_MINALIGN,
-			SLAB_CACHE_DMA|__SYSFS_ADD_DEFERRED, NULL)) {
+			SLAB_CACHE_DMA|SLAB_NOTRACK|__SYSFS_ADD_DEFERRED,
+			NULL)) {
  		kfree(s);
  		kfree(text);
  		goto unlock_out;
@@ -2979,7 +3051,7 @@ void __init kmem_cache_init(void)
  	 * kmem_cache_open for slab_state == DOWN.
  	 */
  	create_kmalloc_cache(&kmalloc_caches[0], "kmem_cache_node",
-		sizeof(struct kmem_cache_node), GFP_KERNEL);
+		sizeof(struct kmem_cache_node), GFP_KERNEL, 0);
  	kmalloc_caches[0].refcount = -1;
  	caches++;

@@ -2992,18 +3064,18 @@ void __init kmem_cache_init(void)
  	/* Caches that are not of the two-to-the-power-of size */
  	if (KMALLOC_MIN_SIZE <= 64) {
  		create_kmalloc_cache(&kmalloc_caches[1],
-				"kmalloc-96", 96, GFP_KERNEL);
+				"kmalloc-96", 96, GFP_KERNEL, 0);
  		caches++;
  	}
  	if (KMALLOC_MIN_SIZE <= 128) {
  		create_kmalloc_cache(&kmalloc_caches[2],
-				"kmalloc-192", 192, GFP_KERNEL);
+				"kmalloc-192", 192, GFP_KERNEL, 0);
  		caches++;
  	}

  	for (i = KMALLOC_SHIFT_LOW; i < PAGE_SHIFT; i++) {
  		create_kmalloc_cache(&kmalloc_caches[i],
-			"kmalloc", 1 << i, GFP_KERNEL);
+			"kmalloc", 1 << i, GFP_KERNEL, 0);
  		caches++;
  	}

@@ -3040,7 +3112,6 @@ void __init kmem_cache_init(void)
  	kmem_size = sizeof(struct kmem_cache);
  #endif

-
  	printk(KERN_INFO
  		"SLUB: Genslabs=%d, HWalign=%d, Order=%d-%d, MinObjects=%d,"
  		" CPUs=%d, Nodes=%d\n",
@@ -4230,6 +4301,8 @@ static char *create_unique_id(struct kmem_cache *s)
  		*p++ = 'a';
  	if (s->flags & SLAB_DEBUG_FREE)
  		*p++ = 'F';
+	if (!(s->flags & SLAB_NOTRACK))
+		*p++ = 't';
  	if (p != name + 1)
  		*p++ = '-';
  	p += sprintf(p, "%07d", s->size);
-- 
1.5.3.8




* [PATCH 4/4] kmemcheck v4
  2008-02-14 19:59 [PATCH 1/4] kmemcheck v4 Vegard Nossum
  2008-02-14 20:01 ` [PATCH 2/4] " Vegard Nossum
  2008-02-14 20:02 ` [PATCH 3/4] " Vegard Nossum
@ 2008-02-14 20:02 ` Vegard Nossum
  2008-02-14 20:12   ` Pekka Enberg
  2008-02-14 21:49   ` Andi Kleen
  2008-02-14 20:45 ` [PATCH 1/4] " Pekka Enberg
  2008-02-14 21:05 ` Randy Dunlap
  4 siblings, 2 replies; 11+ messages in thread
From: Vegard Nossum @ 2008-02-14 20:02 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Daniel Walker, Ingo Molnar, Richard Knutsson, Andi Kleen,
	Pekka Enberg, Christoph Lameter

 From f65bd157b88d3ae9a75737cdff5d6f27920af43d Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Thu, 14 Feb 2008 20:53:42 +0100
Subject: [PATCH] kmemcheck: Fix-up (some bogus) reports

Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
---
  include/asm-generic/siginfo.h |    8 ++++++++
  include/linux/fs.h            |    4 ++--
  include/linux/netdevice.h     |    4 ++--
  include/linux/skbuff.h        |    6 +++++-
  include/net/inet_sock.h       |    3 ++-
  include/net/tcp.h             |   11 +++++++++++
  init/do_mounts.c              |    8 ++++++--
  kernel/signal.c               |   12 ++++++++++++
  net/core/skbuff.c             |    6 ++++++
  net/ipv4/tcp_output.c         |    4 ++++
  10 files changed, 58 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h
index 8786e01..b70cd97 100644
--- a/include/asm-generic/siginfo.h
+++ b/include/asm-generic/siginfo.h
@@ -278,11 +278,19 @@ void do_schedule_next_timer(struct siginfo *info);

  static inline void copy_siginfo(struct siginfo *to, struct siginfo *from)
  {
+#ifdef CONFIG_KMEMCHECK
+	memcpy(to, from, sizeof(*to));
+#else
+	/*
+	 * Optimization, only copy up to the size of the largest known
+	 * union member:
+	 */
  	if (from->si_code < 0)
  		memcpy(to, from, sizeof(*to));
  	else
  		/* _sigchld is currently the largest know union member */
  		memcpy(to, from, __ARCH_SI_PREAMBLE_SIZE + sizeof(from->_sifields._sigchld));
+#endif
  }

  #endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 18cfbf7..e038fb0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -875,8 +875,8 @@ struct file_lock {
  	struct pid *fl_nspid;
  	wait_queue_head_t fl_wait;
  	struct file *fl_file;
-	unsigned char fl_flags;
-	unsigned char fl_type;
+	unsigned int fl_flags;
+	unsigned int fl_type;
  	loff_t fl_start;
  	loff_t fl_end;

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 047d432..d4cc6ee 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -193,8 +193,8 @@ struct dev_addr_list
  {
  	struct dev_addr_list	*next;
  	u8			da_addr[MAX_ADDR_LEN];
-	u8			da_addrlen;
-	u8			da_synced;
+	unsigned int		da_addrlen;
+	unsigned int		da_synced;
  	int			da_users;
  	int			da_gusers;
  };
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 412672a..7bdb37f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1294,7 +1294,11 @@ static inline void __skb_queue_purge(struct sk_buff_head *list)
  static inline struct sk_buff *__dev_alloc_skb(unsigned int length,
  					      gfp_t gfp_mask)
  {
-	struct sk_buff *skb = alloc_skb(length + NET_SKB_PAD, gfp_mask);
+	struct sk_buff *skb;
+#ifdef CONFIG_KMEMCHECK
+	gfp_mask |= __GFP_ZERO;
+#endif
+	skb = alloc_skb(length + NET_SKB_PAD, gfp_mask);
  	if (likely(skb))
  		skb_reserve(skb, NET_SKB_PAD);
  	return skb;
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 70013c5..a575171 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -73,7 +73,8 @@ struct inet_request_sock {
  				sack_ok	   : 1,
  				wscale_ok  : 1,
  				ecn_ok	   : 1,
-				acked	   : 1;
+				acked	   : 1,
+				__filler   : 3;
  	struct ip_options	*opt;
  };

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de4ea3..4d522f6 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -951,6 +951,17 @@ static inline void tcp_openreq_init(struct request_sock *req,
  	tcp_rsk(req)->rcv_isn = TCP_SKB_CB(skb)->seq;
  	req->mss = rx_opt->mss_clamp;
  	req->ts_recent = rx_opt->saw_tstamp ? rx_opt->rcv_tsval : 0;
+#ifdef CONFIG_KMEMCHECK
+	/* bitfield init */
+	ireq->snd_wscale =
+	ireq->rcv_wscale =
+	ireq->tstamp_ok =
+	ireq->sack_ok =
+	ireq->wscale_ok =
+	ireq->ecn_ok =
+	ireq->acked =
+	ireq->__filler = 0;
+#endif
  	ireq->tstamp_ok = rx_opt->tstamp_ok;
  	ireq->sack_ok = rx_opt->sack_ok;
  	ireq->snd_wscale = rx_opt->snd_wscale;
diff --git a/init/do_mounts.c b/init/do_mounts.c
index f865731..87b1b0f 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -201,9 +201,13 @@ static int __init do_mount_root(char *name, char *fs, int flags, void *data)
  	return 0;
  }

+#if PAGE_SIZE < PATH_MAX
+# error increase the fs_names allocation size here
+#endif
+
  void __init mount_block_root(char *name, int flags)
  {
-	char *fs_names = __getname();
+	char *fs_names = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
  	char *p;
  #ifdef CONFIG_BLOCK
  	char b[BDEVNAME_SIZE];
@@ -251,7 +255,7 @@ retry:
  #endif
  	panic("VFS: Unable to mount root fs on %s", b);
  out:
-	putname(fs_names);
+	free_pages((unsigned long)fs_names, 1);
  }

  #ifdef CONFIG_ROOT_NFS
diff --git a/kernel/signal.c b/kernel/signal.c
index 2c1f08d..2cf5164 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -691,6 +691,12 @@ static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
  		list_add_tail(&q->list, &signals->list);
  		switch ((unsigned long) info) {
  		case (unsigned long) SEND_SIG_NOINFO:
+			/*
+			 * Make sure we always have a fully initialized
+			 * siginfo struct:
+			 */
+			memset(&q->info, 0, sizeof(q->info));
+
  			q->info.si_signo = sig;
  			q->info.si_errno = 0;
  			q->info.si_code = SI_USER;
@@ -698,6 +704,12 @@ static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
  			q->info.si_uid = current->uid;
  			break;
  		case (unsigned long) SEND_SIG_PRIV:
+			/*
+			 * Make sure we always have a fully initialized
+			 * siginfo struct:
+			 */
+			memset(&q->info, 0, sizeof(q->info));
+
  			q->info.si_signo = sig;
  			q->info.si_errno = 0;
  			q->info.si_code = SI_KERNEL;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4e35422..855b6fa 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -223,6 +223,9 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
  		struct sk_buff *child = skb + 1;
  		atomic_t *fclone_ref = (atomic_t *) (child + 1);

+#ifdef CONFIG_KMEMCHECK
+		memset(child, 0, offsetof(struct sk_buff, tail));
+#endif
  		skb->fclone = SKB_FCLONE_ORIG;
  		atomic_set(fclone_ref, 1);

@@ -255,6 +258,9 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
  	int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
  	struct sk_buff *skb;

+#ifdef CONFIG_KMEMCHECK
+	gfp_mask |= __GFP_ZERO;
+#endif
  	skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, 0, node);
  	if (likely(skb)) {
  		skb_reserve(skb, NET_SKB_PAD);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ed750f9..47bb63b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -333,6 +333,10 @@ static inline void TCP_ECN_send(struct sock *sk, struct sk_buff *skb,
  static void tcp_init_nondata_skb(struct sk_buff *skb, u32 seq, u8 flags)
  {
  	skb->csum = 0;
+	skb->local_df = skb->cloned = skb->ip_summed = skb->nohdr =
+		skb->nfctinfo = 0;
+	skb->pkt_type = skb->fclone = skb->ipvs_property = skb->peeked =
+		skb->nf_trace = 0;

  	TCP_SKB_CB(skb)->flags = flags;
  	TCP_SKB_CB(skb)->sacked = 0;
-- 
1.5.3.8



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] kmemcheck v4
  2008-02-14 20:02 ` [PATCH 4/4] " Vegard Nossum
@ 2008-02-14 20:12   ` Pekka Enberg
  2008-02-14 20:31     ` Vegard Nossum
  2008-02-14 21:49   ` Andi Kleen
  1 sibling, 1 reply; 11+ messages in thread
From: Pekka Enberg @ 2008-02-14 20:12 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Linux Kernel Mailing List, Daniel Walker, Ingo Molnar,
	Richard Knutsson, Andi Kleen, Christoph Lameter

Hi,

Vegard Nossum wrote:
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 412672a..7bdb37f 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1294,7 +1294,11 @@ static inline void __skb_queue_purge(struct sk_buff_head *list)
>  static inline struct sk_buff *__dev_alloc_skb(unsigned int length,
>                            gfp_t gfp_mask)
>  {
> -    struct sk_buff *skb = alloc_skb(length + NET_SKB_PAD, gfp_mask);
> +    struct sk_buff *skb;
> +#ifdef CONFIG_KMEMCHECK
> +    gfp_mask |= __GFP_ZERO;
> +#endif

Use __GFP_NOTRACK here (no need to wrap it in CONFIG_KMEMCHECK either).

> +    skb = alloc_skb(length + NET_SKB_PAD, gfp_mask);
>      if (likely(skb))
>          skb_reserve(skb, NET_SKB_PAD);
>      return skb;
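
A rough sketch of the suggested variant, assuming __GFP_NOTRACK expands
to 0 on !CONFIG_KMEMCHECK kernels so that no ifdef is needed at the call
site:

    static inline struct sk_buff *__dev_alloc_skb(unsigned int length,
                                                  gfp_t gfp_mask)
    {
        /* __GFP_NOTRACK asks kmemcheck not to track this allocation;
         * assumed to be 0 when kmemcheck is compiled out */
        struct sk_buff *skb = alloc_skb(length + NET_SKB_PAD,
                                        gfp_mask | __GFP_NOTRACK);

        if (likely(skb))
            skb_reserve(skb, NET_SKB_PAD);
        return skb;
    }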

> diff --git a/init/do_mounts.c b/init/do_mounts.c
> index f865731..87b1b0f 100644
> --- a/init/do_mounts.c
> +++ b/init/do_mounts.c
> @@ -201,9 +201,13 @@ static int __init do_mount_root(char *name, char *fs, int flags, void *data)
>      return 0;
>  }
> 
> +#if PAGE_SIZE < PATH_MAX
> +# error increase the fs_names allocation size here
> +#endif
> +
>  void __init mount_block_root(char *name, int flags)
>  {
> -    char *fs_names = __getname();
> +    char *fs_names = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
>      char *p;
>  #ifdef CONFIG_BLOCK
>      char b[BDEVNAME_SIZE];
> @@ -251,7 +255,7 @@ retry:
>  #endif
>      panic("VFS: Unable to mount root fs on %s", b);
>  out:
> -    putname(fs_names);
> +    free_pages((unsigned long)fs_names, 1);

As discussed before, I don't think kmemcheck should be complaining about 
this (even though this is a potential bug). Have you tried with the 
current patches to see if it still triggers? Could have been one of the 
kmemcheck bugs, no?

> @@ -255,6 +258,9 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
>      int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
>      struct sk_buff *skb;
> 
> +#ifdef CONFIG_KMEMCHECK
> +    gfp_mask |= __GFP_ZERO;
> +#endif

__GFP_NOTRACK here

			Pekka

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/4] kmemcheck v4
  2008-02-14 20:02 ` [PATCH 3/4] " Vegard Nossum
@ 2008-02-14 20:18   ` Pekka Enberg
  0 siblings, 0 replies; 11+ messages in thread
From: Pekka Enberg @ 2008-02-14 20:18 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Linux Kernel Mailing List, Daniel Walker, Ingo Molnar,
	Richard Knutsson, Andi Kleen, Christoph Lameter

Vegard Nossum wrote:
>  From 4ce1c09e38b2402dc04f0246916f3c23abe8f3e1 Mon Sep 17 00:00:00 2001
> From: Vegard Nossum <vegard.nossum@gmail.com>
> Date: Thu, 14 Feb 2008 19:25:39 +0100
> Subject: [PATCH] kmemcheck: make SLUB use kmemcheck
> 
> With kmemcheck enabled, SLUB needs to do this:
> 
> 1. Request twice as much memory as would normally be needed. The bottom half
>    of the memory is what the user actually sees and uses; the upper half
>    contains the so-called shadow memory, which stores the status of each byte
>    in the bottom half, e.g. initialized or uninitialized.
> 2. Tell kmemcheck which parts of memory that should be marked uninitialized.
>    There are actually a few more states, such as "not yet allocated" and
>    "recently freed".
> 
> If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
> memory that can take page faults because of kmemcheck.
> 
> If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
> request memory with the __GFP_NOTRACK flag. This does not prevent the page
> faults from occurring, however, but marks the object in question as being
> initialized so that no warnings will ever be produced for this object.
> 
> Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>

Looks good to me.

Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
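
A minimal sketch of the shadow lookup implied by the quoted changelog
(illustrative only; the helper name and exact layout are assumptions, the
real kmemcheck code is different):

    /* one status byte per byte of real memory */
    enum shadow {
        SHADOW_UNALLOCATED,
        SHADOW_UNINITIALIZED,
        SHADOW_INITIALIZED,
        SHADOW_FREED,
    };

    /* With the allocation doubled, every byte in the lower (user-visible)
     * half has one status byte at the same offset in the upper half: */
    static inline unsigned char *shadow_of(void *base, size_t half_size,
                                           void *addr)
    {
        size_t offset = (char *)addr - (char *)base;

        return (unsigned char *)base + half_size + offset;
    }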

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] kmemcheck v4
  2008-02-14 20:12   ` Pekka Enberg
@ 2008-02-14 20:31     ` Vegard Nossum
  0 siblings, 0 replies; 11+ messages in thread
From: Vegard Nossum @ 2008-02-14 20:31 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Linux Kernel Mailing List, Daniel Walker, Ingo Molnar,
	Richard Knutsson, Andi Kleen, Christoph Lameter

On 2/14/08, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> Hi,
>
>
>  Vegard Nossum wrote:
>  > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>  > index 412672a..7bdb37f 100644
>  > --- a/include/linux/skbuff.h
>  > +++ b/include/linux/skbuff.h
>  > @@ -1294,7 +1294,11 @@ static inline void __skb_queue_purge(struct
>  > sk_buff_head *list)
>  >  static inline struct sk_buff *__dev_alloc_skb(unsigned int length,
>  >                            gfp_t gfp_mask)
>  >  {
>  > -    struct sk_buff *skb = alloc_skb(length + NET_SKB_PAD, gfp_mask);
>  > +    struct sk_buff *skb;
>  > +#ifdef CONFIG_KMEMCHECK
>  > +    gfp_mask |= __GFP_ZERO;
>  > +#endif
>
>
> Use __GFP_NOTRACK here (no need to wrap it in CONFIG_KMEMCHECK either).
>
>  > +    skb = alloc_skb(length + NET_SKB_PAD, gfp_mask);
>
> >      if (likely(skb))
>  >          skb_reserve(skb, NET_SKB_PAD);
>  >      return skb;
>
>
> > diff --git a/init/do_mounts.c b/init/do_mounts.c
>  > index f865731..87b1b0f 100644
>  > --- a/init/do_mounts.c
>  > +++ b/init/do_mounts.c
>  > @@ -201,9 +201,13 @@ static int __init do_mount_root(char *name, char
>  > *fs, int flags, void *data)
>  >      return 0;
>  >  }
>  >
>  > +#if PAGE_SIZE < PATH_MAX
>  > +# error increase the fs_names allocation size here
>  > +#endif
>  > +
>  >  void __init mount_block_root(char *name, int flags)
>  >  {
>  > -    char *fs_names = __getname();
>  > +    char *fs_names = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
>  >      char *p;
>  >  #ifdef CONFIG_BLOCK
>  >      char b[BDEVNAME_SIZE];
>  > @@ -251,7 +255,7 @@ retry:
>  >  #endif
>  >      panic("VFS: Unable to mount root fs on %s", b);
>  >  out:
>  > -    putname(fs_names);
>  > +    free_pages((unsigned long)fs_names, 1);
>
>
> As discussed before, I don't think kmemcheck should be complaining about
>  this (even though this is a potential bug). Have you tried with the
>  current patches to see if it still triggers? Could have been one of the
>  kmemcheck bugs, no?
>
>
>  > @@ -255,6 +258,9 @@ struct sk_buff *__netdev_alloc_skb(struct net_device
>  > *dev,
>  >      int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
>  >      struct sk_buff *skb;
>  >
>  > +#ifdef CONFIG_KMEMCHECK
>  > +    gfp_mask |= __GFP_ZERO;
>  > +#endif
>
>
> __GFP_NOTRACK here
>
>
>                         Pekka
>

Will fix those, thanks!


Vegard

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] kmemcheck v4
  2008-02-14 19:59 [PATCH 1/4] kmemcheck v4 Vegard Nossum
                   ` (2 preceding siblings ...)
  2008-02-14 20:02 ` [PATCH 4/4] " Vegard Nossum
@ 2008-02-14 20:45 ` Pekka Enberg
  2008-02-14 21:05 ` Randy Dunlap
  4 siblings, 0 replies; 11+ messages in thread
From: Pekka Enberg @ 2008-02-14 20:45 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Linux Kernel Mailing List, Daniel Walker, Ingo Molnar,
	Richard Knutsson, Andi Kleen, Christoph Lameter

Vegard Nossum wrote:

> +DEFINE_PER_CPU(bool, kmemcheck_busy) = false;
> +DEFINE_PER_CPU(uint32_t, kmemcheck_addr1) = 0;
> +DEFINE_PER_CPU(uint32_t, kmemcheck_addr2) = 0;
> +DEFINE_PER_CPU(uint32_t, kmemcheck_reg_flags) = 0;
> +
> +DEFINE_PER_CPU(int, kmemcheck_num) = 0;
> +DEFINE_PER_CPU(int, kmemcheck_balance) = 0;

No need to explicitly initialize these, they're automatically zeroed.
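
That is, since per-CPU (static) storage starts out zeroed, the
declarations could simply read:

    DEFINE_PER_CPU(bool, kmemcheck_busy);
    DEFINE_PER_CPU(uint32_t, kmemcheck_addr1);
    DEFINE_PER_CPU(uint32_t, kmemcheck_addr2);
    DEFINE_PER_CPU(uint32_t, kmemcheck_reg_flags);

    DEFINE_PER_CPU(int, kmemcheck_num);
    DEFINE_PER_CPU(int, kmemcheck_balance);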

> +
> +/*
> + * Called from the #PF handler.
> + */
> +void
> +kmemcheck_show(struct pt_regs *regs)
> +{
> +    int n;
> +
> +    BUG_ON(!irqs_disabled());
> +
> +    if (__get_cpu_var(kmemcheck_balance) != 0) {
> +        oops_in_progress = 1;

Why?

> +        panic("kmemcheck: extra #PF");
> +    }
> +

> +/*
> + * Called from the #DB handler.
> + */
> +void
> +kmemcheck_hide(struct pt_regs *regs)
> +{
> +    BUG_ON(!irqs_disabled());
> +
> +    --__get_cpu_var(kmemcheck_balance);
> +    if (unlikely(__get_cpu_var(kmemcheck_balance) != 0)) {
> +        oops_in_progress = 1;

ditto

> +        panic("kmemcheck: extra #DB");
> +    }
> +
> +static void
> +kmemcheck_read(struct pt_regs *regs, uint32_t address, unsigned int size)
> +{
> +    void *shadow;
> +    enum shadow status;
> +
> +    shadow = address_get_shadow(address);
> +    if (!shadow)
> +        return;
> +
> +    status = test(shadow, size);
> +    if (status == SHADOW_INITIALIZED)
> +        return;
> +
> +    /* Don't warn about it again. */
> +    set(shadow, size);
> +
> +    oops_in_progress = 1;

I don't see you setting that to zero anywhere. What's this doing here 
anyway?

> +    error_save(status, address, size, regs);
> +}

> +
> +/*
> + * A faster implementation of memset() when tracking is enabled. We cannot
> + * assume that all pages within the range are tracked, so copying has to be
> + * split into page-sized (or smaller, for the ends) chunks.
> + */
> +void
> +kmemcheck_memset(unsigned long s, int c, size_t n)
> +{
> +    unsigned long a_page, a_offset;
> +    unsigned long b_page, b_offset;

Uhm, s/a_page/start_page/g, s/b_page/end_page/g and same for the offsets 
too.
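
For context, a minimal sketch of the page-sized chunking that the quoted
comment describes (illustrative only, with a hypothetical per-page helper;
not the actual kmemcheck implementation):

    /* hypothetical helper: handle n bytes that all lie within one page,
     * updating the shadow bytes only if the page is tracked */
    static void memset_one_page(unsigned long addr, int c, size_t n);

    void kmemcheck_memset(unsigned long s, int c, size_t n)
    {
        while (n > 0) {
            /* bytes left until the next page boundary */
            size_t chunk = PAGE_SIZE - (s & (PAGE_SIZE - 1));

            if (chunk > n)
                chunk = n;

            memset_one_page(s, c, chunk);
            s += chunk;
            n -= chunk;
        }
    }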

> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 0c6ce51..2138d64 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -50,8 +50,9 @@ struct vm_area_struct;
>  #define __GFP_THISNODE    ((__force gfp_t)0x40000u)  /* No fallback, no policies */
>  #define __GFP_RECLAIMABLE ((__force gfp_t)0x80000u) /* Page is reclaimable */
>  #define __GFP_MOVABLE    ((__force gfp_t)0x100000u)  /* Page is movable */
> +#define __GFP_NOTRACK    ((__force gfp_t)0x200000u)  /* Don't track with kmemcheck */

I think we can set __GFP_NOTRACK to zero to eliminate all dead code when 
  CONFIG_KMEMCHECK is disabled?
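
A sketch of what that could look like, so callers never need an ifdef:

    #ifdef CONFIG_KMEMCHECK
    #define __GFP_NOTRACK  ((__force gfp_t)0x200000u)  /* Don't track with kmemcheck */
    #else
    #define __GFP_NOTRACK  ((__force gfp_t)0)
    #endif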

> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index bbad43f..1593859 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -90,6 +90,8 @@
>  #define PG_reclaim        17    /* To be reclaimed asap */
>  #define PG_buddy        19    /* Page is free, on buddy lists */
> 
> +#define PG_tracked        20    /* Page is tracked by kmemcheck */
> +

I think we should re-use PG_owner_priv_1 for this (see PG_pinned, for 
example).
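
That is, roughly (a sketch, assuming the flag bit is otherwise unused on
pages handed to kmemcheck):

    #define PG_pinned      PG_owner_priv_1  /* Xen pinned pagetable */
    #define PG_tracked     PG_owner_priv_1  /* Page is tracked by kmemcheck */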

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] kmemcheck v4
  2008-02-14 19:59 [PATCH 1/4] kmemcheck v4 Vegard Nossum
                   ` (3 preceding siblings ...)
  2008-02-14 20:45 ` [PATCH 1/4] " Pekka Enberg
@ 2008-02-14 21:05 ` Randy Dunlap
  4 siblings, 0 replies; 11+ messages in thread
From: Randy Dunlap @ 2008-02-14 21:05 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Linux Kernel Mailing List, Daniel Walker, Ingo Molnar,
	Richard Knutsson, Andi Kleen, Pekka Enberg, Christoph Lameter

On Thu, 14 Feb 2008 20:59:46 +0100 Vegard Nossum wrote:

> Greetings,
> 

Just a few doc comments (below).

> 
>  From 0fcca4341b6b1b277d936558aa3cab0f212bad9b Mon Sep 17 00:00:00 2001
> From: Vegard Nossum <vegard.nossum@gmail.com>
> Date: Thu, 14 Feb 2008 19:10:40 +0100
> Subject: [PATCH] kmemcheck: add the core kmemcheck changes
> 
> General description: kmemcheck is a patch to the linux kernel that
> detects use of uninitialized memory. It does this by trapping every
> read and write to memory that was allocated dynamically (e.g. using
> kmalloc()). If a memory address is read that has not previously been
> written to, a message is printed to the kernel log.
> 
> Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
> ---
>   Documentation/kmemcheck.txt    |   73 ++++
>   arch/x86/Kconfig.debug         |   35 ++
>   arch/x86/kernel/Makefile       |    2 +
>   arch/x86/kernel/kmemcheck_32.c |  781 ++++++++++++++++++++++++++++++++++++++++
>   include/asm-x86/kmemcheck.h    |    3 +
>   include/asm-x86/kmemcheck_32.h |   22 ++
>   include/asm-x86/pgtable.h      |    4 +-
>   include/asm-x86/pgtable_32.h   |    1 +
>   include/linux/gfp.h            |    3 +-
>   include/linux/kmemcheck.h      |   17 +
>   include/linux/page-flags.h     |    7 +
>   11 files changed, 945 insertions(+), 3 deletions(-)
>   create mode 100644 Documentation/kmemcheck.txt
>   create mode 100644 arch/x86/kernel/kmemcheck_32.c
>   create mode 100644 include/asm-x86/kmemcheck.h
>   create mode 100644 include/asm-x86/kmemcheck_32.h
>   create mode 100644 include/linux/kmemcheck.h
> 
> diff --git a/Documentation/kmemcheck.txt b/Documentation/kmemcheck.txt
> new file mode 100644
> index 0000000..d234571
> --- /dev/null
> +++ b/Documentation/kmemcheck.txt
> @@ -0,0 +1,73 @@

> +
> +Changes to the memory allocator (SLUB)
> +======================================
> +
> +kmemcheck requires some assistance from the memory allocator in order to work.
> +The memory allocator needs to
> +
> +1. Request twice as much memory as would normally be needed. The bottom half
> +   of the memory is what the user actually sees and uses; the upper half
> +   contains the so-called shadow memory, which stores the status of each byte
> +   in the bottom half, e.g. initialized or uninitialized.
> +2. Tell kmemcheck which parts of memory that should be marked uninitialized.

Drop "that".

> +   There are actually a few more states, such as "not yet allocated" and
> +   "recently freed".
> +
> +If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
> +memory that can take page faults because of kmemcheck.
> +
> +If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
> +request memory with the __GFP_NOTRACK flag. This does not prevent the page
> +faults from occuring, however, but marks the object in question as being

               occurring,

> +initialized so that no warnings will ever be produced for this object.
> +
> +
> +Problems
> +========
> +
> +The most prominent problem seems to be that of bit-fields. kmemcheck can only
> +track memory with byte granularity. Therefore, when gcc generates code to
> +access only one bit in a bit-field, there is really no way for kmemcheck to
> +know which of the other bits that will be used or thrown away. Consequently,

Drop "that".

> +there may be bogus warnings for bit-field accesses. There is some experimental
> +support to detect this automatically, though it is probably better to work
> +around this by explicitly initializing whole bit-fields at once.
> +
> +Some allocations are used for DMA. As DMA doesn't go through the paging
> +mechanism, we have absolutely no way to detect DMA writes. This means that
> +spurious warnings may be seen on access to DMA memory. DMA allocations should
> +be annotated with the __GFP_NOTRACK flag or allocated from caches marked
> +SLAB_NOTRACK to work around this problem.
> +
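
For reference, the annotation described above would look roughly like
this for a cache whose objects are filled by DMA (a sketch; the cache
name, object size and call site are made up):

    struct kmem_cache *dma_buf_cache;

    dma_buf_cache = kmem_cache_create("dma_buf_cache", 512, 0,
                                      SLAB_NOTRACK, NULL);

    /* or, for a one-off allocation: */
    void *buf = kmalloc(512, GFP_KERNEL | __GFP_NOTRACK);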

---
~Randy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] kmemcheck v4
  2008-02-14 20:02 ` [PATCH 4/4] " Vegard Nossum
  2008-02-14 20:12   ` Pekka Enberg
@ 2008-02-14 21:49   ` Andi Kleen
  2008-02-15 19:18     ` Vegard Nossum
  1 sibling, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2008-02-14 21:49 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Linux Kernel Mailing List, Daniel Walker, Ingo Molnar,
	Richard Knutsson, Andi Kleen, Pekka Enberg, Christoph Lameter


The ifdefs are quite ugly. I would recommend defining standard
functions (kmemcheck_init_zero or similar, plus a dedicated __GFP flag) that
can be used without ifdefs and easily nop'ed out on !KMEMCHECK kernels.

-Andi
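
A sketch of the kind of helper being suggested (the name
kmemcheck_init_zero is Andi's; how it would actually be wired up is an
assumption):

    /* e.g. in <linux/kmemcheck.h> */
    #ifdef CONFIG_KMEMCHECK
    static inline void kmemcheck_init_zero(void *ptr, size_t size)
    {
        memset(ptr, 0, size);
    }
    #else
    static inline void kmemcheck_init_zero(void *ptr, size_t size)
    {
        /* compiles away entirely on !KMEMCHECK kernels */
    }
    #endif

    /* call sites then need no ifdef, e.g. in __alloc_skb(): */
    kmemcheck_init_zero(child, offsetof(struct sk_buff, tail));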

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] kmemcheck v4
  2008-02-14 21:49   ` Andi Kleen
@ 2008-02-15 19:18     ` Vegard Nossum
  0 siblings, 0 replies; 11+ messages in thread
From: Vegard Nossum @ 2008-02-15 19:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linux Kernel Mailing List, Daniel Walker, Ingo Molnar,
	Richard Knutsson, Pekka Enberg, Christoph Lameter

On Thu, Feb 14, 2008 at 10:49 PM, Andi Kleen <andi@firstfloor.org> wrote:
>  The ifdefs are quite ugly. I would recommend to define standard
>  functions (kmemcheck_init_zero or similar and an own __GFP flag) that can
>  be used without ifdef and easily nop'ed out on !KMEMCHECK kernels.

Yes, they are. We do in fact have a flag for this purpose,
__GFP_NOTRACK (or SLAB_NOTRACK for entire slab caches). I have now
changed some of the hunks to use this method instead of ifdefs. Thanks
for pointing this out.

Also note that this patch is not meant to be a part of kmemcheck
itself; it simply silences some of the warnings (some of them bogus).
Preferably, each of these call-sites should be carefully audited
and "fixed" independently of kmemcheck itself. In fact, the patch
should probably be split: one part for the bogus warnings (where the
fixes are no-ops if kmemcheck is disabled), and the real errors in
patches of their own.

There is currently a huge problem with bit-fields. When bit-fields are
initialized, gcc may use the AND/OR instructions to initialize one
field at a time (depending on what the C code looks like, of course).
This is seen by kmemcheck as (1) a full 8/16/32-bit read followed by
(2) a full 8/16/32-bit write. Therefore the first initialization of (access
to) a bit-field variable will generally cause a warning from kmemcheck,
even though this is perfectly legal from the compiler's point of view.
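
Concretely, a sketch of the pattern that trips kmemcheck (field names are
made up):

    struct flags {
        unsigned int    a : 1,
                        b : 1,
                        c : 1;
    };

    struct flags *f = kmalloc(sizeof(*f), GFP_KERNEL);

    /* gcc may compile this single-bit store as: load the whole
     * (still uninitialized) word, OR in the bit, store it back.
     * The load is what kmemcheck warns about. */
    f->a = 1;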

Ingo's approach so far has been to initialize all the bit-field
variables at once. This is probably what the author of the original
code wants to do anyway, though not in all cases. I admit that I am
not confident enough with kernel code myself to decide what the
appropriate fix would be in most cases of kmemcheck warnings.
Therefore, take this last patch with a grain of salt (for now).
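
Continuing the sketch above, the workaround Ingo has been using is to
assign every member of the bit-field (plus any filler padding it out to a
whole word) at once, as the tcp_openreq_init() hunk in this patch does;
gcc can then usually emit one plain store of a constant instead of a
read-modify-write, although whether it actually does is up to the
compiler:

    f->a =
    f->b =
    f->c = 0;

    /* ...after which the individual fields can be set normally */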

Thank you.


Vegard

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2008-02-15 19:19 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-14 19:59 [PATCH 1/4] kmemcheck v4 Vegard Nossum
2008-02-14 20:01 ` [PATCH 2/4] " Vegard Nossum
2008-02-14 20:02 ` [PATCH 3/4] " Vegard Nossum
2008-02-14 20:18   ` Pekka Enberg
2008-02-14 20:02 ` [PATCH 4/4] " Vegard Nossum
2008-02-14 20:12   ` Pekka Enberg
2008-02-14 20:31     ` Vegard Nossum
2008-02-14 21:49   ` Andi Kleen
2008-02-15 19:18     ` Vegard Nossum
2008-02-14 20:45 ` [PATCH 1/4] " Pekka Enberg
2008-02-14 21:05 ` Randy Dunlap
