* [PATCH v4] mm: per-thread vma caching
@ 2014-02-27 21:48 ` Davidlohr Bueso
  0 siblings, 0 replies; 42+ messages in thread
From: Davidlohr Bueso @ 2014-02-27 21:48 UTC (permalink / raw)
  To: Andrew Morton, Ingo Molnar
  Cc: Linus Torvalds, Peter Zijlstra, Michel Lespinasse, Mel Gorman,
	Rik van Riel, KOSAKI Motohiro, Davidlohr Bueso, aswin,
	scott.norton, linux-mm, linux-kernel

From: Davidlohr Bueso <davidlohr@hp.com>

This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random, so
further comparisons with other approaches were needed. There are two things
to consider here: the cache hit rate and the latency of find_vma().
Improving the hit rate does not necessarily translate into finding the
vma any faster, as the overhead of any fancy caching scheme can be too
high to be worthwhile.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by up
to 250%, for workloads with good locality. On the other hand, this simple
scheme is pretty much useless for workloads with poor locality. Analyzing
ebizzy runs shows that, no matter how many threads are running, the
mmap_cache hit rate is less than 2%, and in many situations below 1%.

The proposed approach is to replace this scheme with a small per-thread cache,
maximizing hit rates at a very low maintenance cost. Invalidations are
performed by simply bumping up a 32-bit sequence number. The only expensive
operation is in the rare case of a seq number overflow, where all caches that
share the same address space are flushed. Upon a miss, the proposed replacement
policy is based on the page number that contains the virtual address in
question. Concretely, the following results are seen on an 80 core, 8 socket
x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread scheme
improves on the ~50% baseline hit rate by just adding a few more slots to the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current approach
as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
approach always shows nearly perfect hit rates, while the baseline hit rate
is just about non-existent. The cycle counts fluctuate anywhere from ~60 to
~116 billion for the baseline scheme, but this approach reduces them
considerably. For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
---
Changes from v3 (http://lkml.org/lkml/2014/2/26/637):
- Fixed an invalidation occurrence for nommu
- Made nommu use 4 slots instead of limiting it to one.

Please note that kgdb, nommu and unicore32 arch are *untested*. Thanks.

 arch/unicore32/include/asm/mmu_context.h |  2 +-
 fs/exec.c                                |  4 +-
 fs/proc/task_mmu.c                       |  2 +-
 include/linux/mm_types.h                 |  4 +-
 include/linux/sched.h                    |  4 ++
 include/linux/vmacache.h                 | 40 ++++++++++++++
 kernel/debug/debug_core.c                | 13 ++++-
 kernel/fork.c                            |  5 +-
 mm/Makefile                              |  2 +-
 mm/mmap.c                                | 54 +++++++++---------
 mm/nommu.c                               | 23 +++++---
 mm/vmacache.c                            | 94 ++++++++++++++++++++++++++++++++
 12 files changed, 203 insertions(+), 44 deletions(-)
 create mode 100644 include/linux/vmacache.h
 create mode 100644 mm/vmacache.c

diff --git a/arch/unicore32/include/asm/mmu_context.h b/arch/unicore32/include/asm/mmu_context.h
index fb5e4c6..2dcd037 100644
--- a/arch/unicore32/include/asm/mmu_context.h
+++ b/arch/unicore32/include/asm/mmu_context.h
@@ -73,7 +73,7 @@ do { \
 		else \
 			mm->mmap = NULL; \
 		rb_erase(&high_vma->vm_rb, &mm->mm_rb); \
-		mm->mmap_cache = NULL; \
+		vmacache_invalidate(mm); \
 		mm->map_count--; \
 		remove_vma(high_vma); \
 	} \
diff --git a/fs/exec.c b/fs/exec.c
index 3d78fcc..3fb63b5 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -820,7 +820,7 @@ EXPORT_SYMBOL(read_code);
 static int exec_mmap(struct mm_struct *mm)
 {
 	struct task_struct *tsk;
-	struct mm_struct * old_mm, *active_mm;
+	struct mm_struct *old_mm, *active_mm;
 
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
@@ -846,6 +846,8 @@ static int exec_mmap(struct mm_struct *mm)
 	tsk->mm = mm;
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
+	tsk->mm->vmacache_seqnum = 0;
+	vmacache_flush(tsk);
 	task_unlock(tsk);
 	if (old_mm) {
 		up_read(&old_mm->mmap_sem);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index fb52b54..231c836 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -152,7 +152,7 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 
 	/*
 	 * We remember last_addr rather than next_addr to hit with
-	 * mmap_cache most of the time. We have zero last_addr at
+	 * vmacache most of the time. We have zero last_addr at
 	 * the beginning and also after lseek. We will have -1 last_addr
 	 * after the end of the vmas.
 	 */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 290901a..2b58d19 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -342,9 +342,9 @@ struct mm_rss_stat {
 
 struct kioctx_table;
 struct mm_struct {
-	struct vm_area_struct * mmap;		/* list of VMAs */
+	struct vm_area_struct *mmap;		/* list of VMAs */
 	struct rb_root mm_rb;
-	struct vm_area_struct * mmap_cache;	/* last find_vma result */
+	u32 vmacache_seqnum;                   /* per-thread vmacache */
 #ifdef CONFIG_MMU
 	unsigned long (*get_unmapped_area) (struct file *filp,
 				unsigned long addr, unsigned long len,
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a781dec..7754ab0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -23,6 +23,7 @@ struct sched_param {
 #include <linux/errno.h>
 #include <linux/nodemask.h>
 #include <linux/mm_types.h>
+#include <linux/vmacache.h>
 #include <linux/preempt_mask.h>
 
 #include <asm/page.h>
@@ -1228,6 +1229,9 @@ struct task_struct {
 #ifdef CONFIG_COMPAT_BRK
 	unsigned brk_randomized:1;
 #endif
+	/* per-thread vma caching */
+	u32 vmacache_seqnum;
+	struct vm_area_struct *vmacache[VMACACHE_SIZE];
 #if defined(SPLIT_RSS_COUNTING)
 	struct task_rss_stat	rss_stat;
 #endif
diff --git a/include/linux/vmacache.h b/include/linux/vmacache.h
new file mode 100644
index 0000000..40e4eb8
--- /dev/null
+++ b/include/linux/vmacache.h
@@ -0,0 +1,40 @@
+#ifndef __LINUX_VMACACHE_H
+#define __LINUX_VMACACHE_H
+
+#include <linux/mm.h>
+
+#define VMACACHE_BITS 2
+#define VMACACHE_SIZE (1U << VMACACHE_BITS)
+#define VMACACHE_MASK (VMACACHE_SIZE - 1)
+/*
+ * Hash based on the page number. Provides a good hit rate for
+ * workloads with good locality and those with random accesses as well.
+ */
+#define VMACACHE_HASH(addr) ((addr >> PAGE_SHIFT) & VMACACHE_MASK)
+
+#define vmacache_flush(tsk)					 \
+	do {							 \
+		memset(tsk->vmacache, 0, sizeof(tsk->vmacache)); \
+	} while (0)
+
+extern void vmacache_flush_all(struct mm_struct *mm);
+extern void vmacache_update(unsigned long addr, struct vm_area_struct *newvma);
+extern struct vm_area_struct *vmacache_find(struct mm_struct *mm,
+						    unsigned long addr);
+
+#ifndef CONFIG_MMU
+extern struct vm_area_struct *vmacache_find_exact(struct mm_struct *mm,
+						  unsigned long start,
+						  unsigned long end);
+#endif
+
+static inline void vmacache_invalidate(struct mm_struct *mm)
+{
+	mm->vmacache_seqnum++;
+
+	/* deal with overflows */
+	if (unlikely(mm->vmacache_seqnum == 0))
+		vmacache_flush_all(mm);
+}
+
+#endif /* __LINUX_VMACACHE_H */
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 334b398..7f1a97a 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -224,10 +224,17 @@ static void kgdb_flush_swbreak_addr(unsigned long addr)
 	if (!CACHE_FLUSH_IS_SAFE)
 		return;
 
-	if (current->mm && current->mm->mmap_cache) {
-		flush_cache_range(current->mm->mmap_cache,
-				  addr, addr + BREAK_INSTR_SIZE);
+	if (current->mm) {
+		int i;
+
+		for (i = 0; i < VMACACHE_SIZE; i++) {
+			if (!current->vmacache[i])
+				continue;
+			flush_cache_range(current->vmacache[i],
+					  addr, addr + BREAK_INSTR_SIZE);
+		}
 	}
+
 	/* Force flush instruction cache if it was outside the mm */
 	flush_icache_range(addr, addr + BREAK_INSTR_SIZE);
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index a17621c..523bce5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -363,7 +363,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 
 	mm->locked_vm = 0;
 	mm->mmap = NULL;
-	mm->mmap_cache = NULL;
+	mm->vmacache_seqnum = 0;
 	mm->map_count = 0;
 	cpumask_clear(mm_cpumask(mm));
 	mm->mm_rb = RB_ROOT;
@@ -833,6 +833,9 @@ static struct mm_struct *dup_mm(struct task_struct *tsk)
 	if (mm->binfmt && !try_module_get(mm->binfmt->module))
 		goto free_pt;
 
+	/* initialize the new vmacache entries */
+	vmacache_flush(tsk);
+
 	return mm;
 
 free_pt:
diff --git a/mm/Makefile b/mm/Makefile
index 310c90a..ad6638f 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -17,7 +17,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
 			   util.o mmzone.o vmstat.o backing-dev.o \
 			   mm_init.o mmu_context.o percpu.o slab_common.o \
 			   compaction.o balloon_compaction.o \
-			   interval_tree.o list_lru.o $(mmu-y)
+			   vmacache.o interval_tree.o list_lru.o $(mmu-y)
 
 obj-y += init-mm.o
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 20ff0c3..47329e1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -681,8 +681,9 @@ __vma_unlink(struct mm_struct *mm, struct vm_area_struct *vma,
 	prev->vm_next = next = vma->vm_next;
 	if (next)
 		next->vm_prev = prev;
-	if (mm->mmap_cache == vma)
-		mm->mmap_cache = prev;
+
+	/* Kill the cache */
+	vmacache_invalidate(mm);
 }
 
 /*
@@ -1989,34 +1990,33 @@ EXPORT_SYMBOL(get_unmapped_area);
 /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
 struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
 {
-	struct vm_area_struct *vma = NULL;
+	struct rb_node *rb_node;
+	struct vm_area_struct *vma;
 
 	/* Check the cache first. */
-	/* (Cache hit rate is typically around 35%.) */
-	vma = ACCESS_ONCE(mm->mmap_cache);
-	if (!(vma && vma->vm_end > addr && vma->vm_start <= addr)) {
-		struct rb_node *rb_node;
+	vma = vmacache_find(mm, addr);
+	if (likely(vma))
+		return vma;
 
-		rb_node = mm->mm_rb.rb_node;
-		vma = NULL;
+	rb_node = mm->mm_rb.rb_node;
+	vma = NULL;
 
-		while (rb_node) {
-			struct vm_area_struct *vma_tmp;
-
-			vma_tmp = rb_entry(rb_node,
-					   struct vm_area_struct, vm_rb);
-
-			if (vma_tmp->vm_end > addr) {
-				vma = vma_tmp;
-				if (vma_tmp->vm_start <= addr)
-					break;
-				rb_node = rb_node->rb_left;
-			} else
-				rb_node = rb_node->rb_right;
-		}
-		if (vma)
-			mm->mmap_cache = vma;
+	while (rb_node) {
+		struct vm_area_struct *tmp;
+
+		tmp = rb_entry(rb_node, struct vm_area_struct, vm_rb);
+
+		if (tmp->vm_end > addr) {
+			vma = tmp;
+			if (tmp->vm_start <= addr)
+				break;
+			rb_node = rb_node->rb_left;
+		} else
+			rb_node = rb_node->rb_right;
 	}
+
+	if (vma)
+		vmacache_update(addr, vma);
 	return vma;
 }
 
@@ -2388,7 +2388,9 @@ detach_vmas_to_be_unmapped(struct mm_struct *mm, struct vm_area_struct *vma,
 	} else
 		mm->highest_vm_end = prev ? prev->vm_end : 0;
 	tail_vma->vm_next = NULL;
-	mm->mmap_cache = NULL;		/* Kill the cache. */
+
+	/* Kill the cache */
+	vmacache_invalidate(mm);
 }
 
 /*
diff --git a/mm/nommu.c b/mm/nommu.c
index 8740213..95c2bd9 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -768,16 +768,23 @@ static void add_vma_to_mm(struct mm_struct *mm, struct vm_area_struct *vma)
  */
 static void delete_vma_from_mm(struct vm_area_struct *vma)
 {
+	int i;
 	struct address_space *mapping;
 	struct mm_struct *mm = vma->vm_mm;
+	struct task_struct *curr = current;
 
 	kenter("%p", vma);
 
 	protect_vma(vma, 0);
 
 	mm->map_count--;
-	if (mm->mmap_cache == vma)
-		mm->mmap_cache = NULL;
+	for (i = 0; i < VMACACHE_SIZE; i++) {
+		/* if the vma is cached, invalidate the entire cache */
+		if (curr->vmacache[i] == vma) {
+			vmacache_invalidate(mm);
+			break;
+		}
+	}
 
 	/* remove the VMA from the mapping */
 	if (vma->vm_file) {
@@ -825,8 +832,8 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
 	struct vm_area_struct *vma;
 
 	/* check the cache first */
-	vma = ACCESS_ONCE(mm->mmap_cache);
-	if (vma && vma->vm_start <= addr && vma->vm_end > addr)
+	vma = vmacache_find(mm, addr);
+	if (likely(vma))
 		return vma;
 
 	/* trawl the list (there may be multiple mappings in which addr
@@ -835,7 +842,7 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
 		if (vma->vm_start > addr)
 			return NULL;
 		if (vma->vm_end > addr) {
-			mm->mmap_cache = vma;
+			vmacache_update(addr, vma);
 			return vma;
 		}
 	}
@@ -874,8 +881,8 @@ static struct vm_area_struct *find_vma_exact(struct mm_struct *mm,
 	unsigned long end = addr + len;
 
 	/* check the cache first */
-	vma = mm->mmap_cache;
-	if (vma && vma->vm_start == addr && vma->vm_end == end)
+	vma = vmacache_find_exact(mm, addr, end);
+	if (vma)
 		return vma;
 
 	/* trawl the list (there may be multiple mappings in which addr
@@ -886,7 +893,7 @@ static struct vm_area_struct *find_vma_exact(struct mm_struct *mm,
 		if (vma->vm_start > addr)
 			return NULL;
 		if (vma->vm_end == end) {
-			mm->mmap_cache = vma;
+			vmacache_update(addr, vma);
 			return vma;
 		}
 	}
diff --git a/mm/vmacache.c b/mm/vmacache.c
new file mode 100644
index 0000000..91cd694
--- /dev/null
+++ b/mm/vmacache.c
@@ -0,0 +1,94 @@
+/*
+ * Copyright (C) 2014 Davidlohr Bueso.
+ */
+#include <linux/sched.h>
+#include <linux/vmacache.h>
+
+/*
+ * Flush vma caches for threads that share a given mm.
+ *
+ * The operation is safe because the caller holds the mmap_sem
+ * exclusively and other threads accessing the vma cache will
+ * have mmap_sem held at least for read, so no extra locking
+ * is required to maintain the vma cache.
+ */
+void vmacache_flush_all(struct mm_struct *mm)
+{
+	struct task_struct *g, *p;
+
+	rcu_read_lock();
+	for_each_process_thread(g, p) {
+		/*
+		 * Only flush the vmacache pointers as the
+		 * mm seqnum is already set and curr's will
+		 * be set upon invalidation when the next
+		 * lookup is done.
+		 */
+		if (mm == p->mm)
+			vmacache_flush(p);
+	}
+	rcu_read_unlock();
+}
+
+void vmacache_update(unsigned long addr, struct vm_area_struct *newvma)
+{
+	int idx = VMACACHE_HASH(addr);
+	current->vmacache[idx] = newvma;
+}
+
+static bool vmacache_valid(struct mm_struct *mm)
+{
+	struct task_struct *curr = current;
+
+	if (mm != curr->mm)
+		return false;
+
+	if (mm->vmacache_seqnum != curr->vmacache_seqnum) {
+		/*
+		 * First attempt will always be invalid, initialize
+		 * the new cache for this task here.
+		 */
+		curr->vmacache_seqnum = mm->vmacache_seqnum;
+		vmacache_flush(curr);
+		return false;
+	}
+	return true;
+}
+
+struct vm_area_struct *vmacache_find(struct mm_struct *mm, unsigned long addr)
+{
+	int i;
+
+	if (!vmacache_valid(mm))
+		return NULL;
+
+	for (i = 0; i < VMACACHE_SIZE; i++) {
+		struct vm_area_struct *vma = current->vmacache[i];
+
+		if (vma && vma->vm_start <= addr && vma->vm_end > addr)
+			return vma;
+	}
+
+	return NULL;
+}
+
+#ifndef CONFIG_MMU
+struct vm_area_struct *vmacache_find_exact(struct mm_struct *mm,
+					   unsigned long start,
+					   unsigned long end)
+{
+	int i;
+
+	if (!vmacache_valid(mm))
+		return NULL;
+
+	for (i = 0; i < VMACACHE_SIZE; i++) {
+		struct vm_area_struct *vma = current->vmacache[i];
+
+		if (vma && vma->vm_start == start && vma->vm_end == end)
+			return vma;
+	}
+
+	return NULL;
+}
+#endif
-- 
1.8.1.4


* Re: [PATCH v4] mm: per-thread vma caching
  2014-02-27 21:48 ` Davidlohr Bueso
@ 2014-02-28  4:39   ` Davidlohr Bueso
  -1 siblings, 0 replies; 42+ messages in thread
From: Davidlohr Bueso @ 2014-02-28  4:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Michel Lespinasse,
	Mel Gorman, Rik van Riel, KOSAKI Motohiro, aswin, scott.norton,
	linux-mm, linux-kernel

On Thu, 2014-02-27 at 13:48 -0800, Davidlohr Bueso wrote:
> From: Davidlohr Bueso <davidlohr@hp.com>
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 8740213..95c2bd9 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -768,16 +768,23 @@ static void add_vma_to_mm(struct mm_struct *mm, struct vm_area_struct *vma)
>   */
>  static void delete_vma_from_mm(struct vm_area_struct *vma)
>  {
> +	int i;
>  	struct address_space *mapping;
>  	struct mm_struct *mm = vma->vm_mm;
> +	struct task_struct *curr = current;
>  
>  	kenter("%p", vma);
>  
>  	protect_vma(vma, 0);
>  
>  	mm->map_count--;
> -	if (mm->mmap_cache == vma)
> -		mm->mmap_cache = NULL;
> +	for (i = 0; i < VMACACHE_SIZE; i++) {
> +		/* if the vma is cached, invalidate the entire cache */
> +		if (curr->vmacache[i] == vma) {
> +			vmacache_invalidate(mm);

*sigh* this should be curr->mm. 

Andrew, if there is no more feedback, do you want me to send another
patch for this, or would you prefer to fix it up yourself in -mm? Assuming
you'll take it, of course.

Thanks,
Davidlohr



* Re: [PATCH v4] mm: per-thread vma caching
  2014-02-27 21:48 ` Davidlohr Bueso
@ 2014-03-04  0:00 ` Andrew Morton
  2014-03-04  0:18     ` Davidlohr Bueso
  -1 siblings, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2014-03-04  0:00 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Michel Lespinasse,
	Mel Gorman, Rik van Riel, KOSAKI Motohiro, aswin, scott.norton,
	linux-mm, linux-kernel

On Thu, 27 Feb 2014 13:48:24 -0800 Davidlohr Bueso <davidlohr@hp.com> wrote:

> From: Davidlohr Bueso <davidlohr@hp.com>
> 
> This patch is a continuation of efforts trying to optimize find_vma(),
> avoiding potentially expensive rbtree walks to locate a vma upon faults.
> The original approach (https://lkml.org/lkml/2013/11/1/410), where the
> largest vma was also cached, ended up being too specific and random, thus
> further comparison with other approaches were needed. There are two things
> to consider when dealing with this, the cache hit rate and the latency of
> find_vma(). Improving the hit-rate does not necessarily translate in finding
> the vma any faster, as the overhead of any fancy caching schemes can be too
> high to consider.
> 
> We currently cache the last used vma for the whole address space, which
> provides a nice optimization, reducing the total cycles in find_vma() by up
> to 250%, for workloads with good locality. On the other hand, this simple
> scheme is pretty much useless for workloads with poor locality. Analyzing
> ebizzy runs shows that, no matter how many threads are running, the
> mmap_cache hit rate is less than 2%, and in many situations below 1%.
> 
> The proposed approach is to replace this scheme with a small per-thread cache,
> maximizing hit rates at a very low maintenance cost. Invalidations are
> performed by simply bumping up a 32-bit sequence number. The only expensive
> operation is in the rare case of a seq number overflow, where all caches that
> share the same address space are flushed. Upon a miss, the proposed replacement
> policy is based on the page number that contains the virtual address in
> question. Concretely, the following results are seen on an 80 core, 8 socket
> x86-64 box:
> 
> ...
> 
> 2) Kernel build: This one is already pretty good with the current approach
> as we're dealing with good locality.
> 
> +----------------+----------+------------------+
> | caching scheme | hit-rate | cycles (billion) |
> +----------------+----------+------------------+
> | baseline       | 75.28%   | 11.03            |
> | patched        | 88.09%   | 9.31             |
> +----------------+----------+------------------+

What is the "cycles" number here?  I'd like to believe we sped up kernel
builds by 10% ;)

Were any overall run time improvements observable?

> ...
>
> @@ -1228,6 +1229,9 @@ struct task_struct {
>  #ifdef CONFIG_COMPAT_BRK
>  	unsigned brk_randomized:1;
>  #endif
> +	/* per-thread vma caching */
> +	u32 vmacache_seqnum;
> +	struct vm_area_struct *vmacache[VMACACHE_SIZE];

So these are implicitly locked by being per-thread.

> +static inline void vmacache_invalidate(struct mm_struct *mm)
> +{
> +	mm->vmacache_seqnum++;
> +
> +	/* deal with overflows */
> +	if (unlikely(mm->vmacache_seqnum == 0))
> +		vmacache_flush_all(mm);
> +}

What's the locking rule for mm->vmacache_seqnum?

>
> ...
>


* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-04  0:00 ` Andrew Morton
@ 2014-03-04  0:18     ` Davidlohr Bueso
  0 siblings, 0 replies; 42+ messages in thread
From: Davidlohr Bueso @ 2014-03-04  0:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Michel Lespinasse,
	Mel Gorman, Rik van Riel, KOSAKI Motohiro, aswin, scott.norton,
	linux-mm, linux-kernel

On Mon, 2014-03-03 at 16:00 -0800, Andrew Morton wrote:
> On Thu, 27 Feb 2014 13:48:24 -0800 Davidlohr Bueso <davidlohr@hp.com> wrote:
> 
> > From: Davidlohr Bueso <davidlohr@hp.com>
> > 
> > This patch is a continuation of efforts trying to optimize find_vma(),
> > avoiding potentially expensive rbtree walks to locate a vma upon faults.
> > The original approach (https://lkml.org/lkml/2013/11/1/410), where the
> > largest vma was also cached, ended up being too specific and random, thus
> > further comparison with other approaches were needed. There are two things
> > to consider when dealing with this, the cache hit rate and the latency of
> > find_vma(). Improving the hit-rate does not necessarily translate in finding
> > the vma any faster, as the overhead of any fancy caching schemes can be too
> > high to consider.
> > 
> > We currently cache the last used vma for the whole address space, which
> > provides a nice optimization, reducing the total cycles in find_vma() by up
> > to 250%, for workloads with good locality. On the other hand, this simple
> > scheme is pretty much useless for workloads with poor locality. Analyzing
> > ebizzy runs shows that, no matter how many threads are running, the
> > mmap_cache hit rate is less than 2%, and in many situations below 1%.
> > 
> > The proposed approach is to replace this scheme with a small per-thread cache,
> > maximizing hit rates at a very low maintenance cost. Invalidations are
> > performed by simply bumping up a 32-bit sequence number. The only expensive
> > operation is in the rare case of a seq number overflow, where all caches that
> > share the same address space are flushed. Upon a miss, the proposed replacement
> > policy is based on the page number that contains the virtual address in
> > question. Concretely, the following results are seen on an 80 core, 8 socket
> > x86-64 box:
> > 
> > ...
> > 
> > 2) Kernel build: This one is already pretty good with the current approach
> > as we're dealing with good locality.
> > 
> > +----------------+----------+------------------+
> > | caching scheme | hit-rate | cycles (billion) |
> > +----------------+----------+------------------+
> > | baseline       | 75.28%   | 11.03            |
> > | patched        | 88.09%   | 9.31             |
> > +----------------+----------+------------------+
> 
> What is the "cycles" number here?  I'd like to believe we sped up kernel
> builds by 10% ;)
> 
> Were any overall run time improvements observable?

Weeell, not too much (I wouldn't normally go measuring cycles if I could
use a benchmark instead ;). As discussed a while back, all of this occurs
under the mmap_sem anyway, so while we do optimize find_vma() in more
workloads than before, it doesn't translate into better benchmark
throughput :( The same happens if we get rid of caching entirely and just
rely on rbtree walks: the cost of find_vma() goes way up, but it really
doesn't hurt from a user perspective. Fwiw, in ebizzy perf traces
find_vma() goes from ~7% to ~0.4%.

> 
> > ...
> >
> > @@ -1228,6 +1229,9 @@ struct task_struct {
> >  #ifdef CONFIG_COMPAT_BRK
> >  	unsigned brk_randomized:1;
> >  #endif
> > +	/* per-thread vma caching */
> > +	u32 vmacache_seqnum;
> > +	struct vm_area_struct *vmacache[VMACACHE_SIZE];
> 
> So these are implicitly locked by being per-thread.

Yes.

> > +static inline void vmacache_invalidate(struct mm_struct *mm)
> > +{
> > +	mm->vmacache_seqnum++;
> > +
> > +	/* deal with overflows */
> > +	if (unlikely(mm->vmacache_seqnum == 0))
> > +		vmacache_flush_all(mm);
> > +}
> 
> What's the locking rule for mm->vmacache_seqnum?

Invalidations occur under the mmap_sem (writing), just like
mm->mmap_cache did.
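
IOW, callers already hold mmap_sem for write whenever they bump the
seqnum. A minimal sketch of the pattern (illustrative only; the function
name is made up and not from the patch):

	static void example_unmap_path(struct mm_struct *mm)
	{
		down_write(&mm->mmap_sem);
		/* ... unlink or otherwise modify vmas ... */
		vmacache_invalidate(mm);	/* seqnum bump, mmap_sem held for write */
		up_write(&mm->mmap_sem);
	}

Lookups (vmacache_find() and friends) run with mmap_sem held at least
for read, so no extra locking is needed on that side.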

Thanks,
Davidlohr



* Re: [PATCH v4] mm: per-thread vma caching
  2014-02-27 21:48 ` Davidlohr Bueso
@ 2014-03-04  0:40 ` Andrew Morton
  2014-03-04  0:59     ` Davidlohr Bueso
  -1 siblings, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2014-03-04  0:40 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Michel Lespinasse,
	Mel Gorman, Rik van Riel, KOSAKI Motohiro, aswin, scott.norton,
	linux-mm, linux-kernel

On Thu, 27 Feb 2014 13:48:24 -0800 Davidlohr Bueso <davidlohr@hp.com> wrote:

> From: Davidlohr Bueso <davidlohr@hp.com>
> 
> This patch is a continuation of efforts trying to optimize find_vma(),
> avoiding potentially expensive rbtree walks to locate a vma upon faults.
> The original approach (https://lkml.org/lkml/2013/11/1/410), where the
> largest vma was also cached, ended up being too specific and random, thus
> further comparison with other approaches were needed. There are two things
> to consider when dealing with this, the cache hit rate and the latency of
> find_vma(). Improving the hit-rate does not necessarily translate in finding
> the vma any faster, as the overhead of any fancy caching schemes can be too
> high to consider.
> 
> We currently cache the last used vma for the whole address space, which
> provides a nice optimization, reducing the total cycles in find_vma() by up
> to 250%, for workloads with good locality. On the other hand, this simple
> scheme is pretty much useless for workloads with poor locality. Analyzing
> ebizzy runs shows that, no matter how many threads are running, the
> mmap_cache hit rate is less than 2%, and in many situations below 1%.
> 
> The proposed approach is to replace this scheme with a small per-thread cache,
> maximizing hit rates at a very low maintenance cost. Invalidations are
> performed by simply bumping up a 32-bit sequence number. The only expensive
> operation is in the rare case of a seq number overflow, where all caches that
> share the same address space are flushed. Upon a miss, the proposed replacement
> policy is based on the page number that contains the virtual address in
> question. Concretely, the following results are seen on an 80 core, 8 socket
> x86-64 box:

A second look...

> Please note that kgdb, nommu and unicore32 arch are *untested*. Thanks.

I build tested nommu, fwiw.

>
> ...
>
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -342,9 +342,9 @@ struct mm_rss_stat {
>  
>  struct kioctx_table;
>  struct mm_struct {
> -	struct vm_area_struct * mmap;		/* list of VMAs */
> +	struct vm_area_struct *mmap;		/* list of VMAs */
>  	struct rb_root mm_rb;
> -	struct vm_area_struct * mmap_cache;	/* last find_vma result */
> +	u32 vmacache_seqnum;                   /* per-thread vmacache */

nitpick: in kernelese this is typically "per-task".  If it was in the
mm_struct then it would be "per process".  And I guess if it was in the
thread_struct it would be "per thread", but that isn't a distinction
I've seen made.

> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -23,6 +23,7 @@ struct sched_param {
>  #include <linux/errno.h>
>  #include <linux/nodemask.h>
>  #include <linux/mm_types.h>
> +#include <linux/vmacache.h>

This might be awkward - vmacache.h drags in mm.h and we have had tangly
problems with these major header files in the past.  I'd be inclined to
remove this inclusion and just forward-declare vm_area_struct, but we
still need VMACACHE_SIZE, sigh.  Wait and see what happens, I guess.
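
(Rough sketch of what I mean, untested; the constants would have to live
somewhere sched.h can see without pulling in mm.h:

	/* visible to sched.h without including mm.h */
	struct vm_area_struct;		/* incomplete type is enough for an array of pointers */

	#define VMACACHE_BITS	2
	#define VMACACHE_SIZE	(1U << VMACACHE_BITS)

and then the task_struct member

	struct vm_area_struct *vmacache[VMACACHE_SIZE];

only needs the forward declaration.)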

>
> ...
>
> --- /dev/null
> +++ b/include/linux/vmacache.h
> @@ -0,0 +1,40 @@
> +#ifndef __LINUX_VMACACHE_H
> +#define __LINUX_VMACACHE_H
> +
> +#include <linux/mm.h>
> +
> +#define VMACACHE_BITS 2
> +#define VMACACHE_SIZE (1U << VMACACHE_BITS)
> +#define VMACACHE_MASK (VMACACHE_SIZE - 1)
> +/*
> + * Hash based on the page number. Provides a good hit rate for
> + * workloads with good locality and those with random accesses as well.
> + */
> +#define VMACACHE_HASH(addr) ((addr >> PAGE_SHIFT) & VMACACHE_MASK)
> +
> +#define vmacache_flush(tsk)					 \
> +	do {							 \
> +		memset(tsk->vmacache, 0, sizeof(tsk->vmacache)); \
> +	} while (0)

There's no particular reason to implement this in cpp.  Using C is
typesafer and nicer.  But then we get into header file issues again. 
More sigh
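
Something like the sketch below is what I have in mind (untested, and it
assumes struct task_struct is fully defined wherever this header is
included, which is of course the problem):

	static inline void vmacache_flush(struct task_struct *tsk)
	{
		memset(tsk->vmacache, 0, sizeof(tsk->vmacache));
	}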

> +extern void vmacache_flush_all(struct mm_struct *mm);
> +extern void vmacache_update(unsigned long addr, struct vm_area_struct *newvma);
> +extern struct vm_area_struct *vmacache_find(struct mm_struct *mm,
> +						    unsigned long addr);
> +
> +#ifndef CONFIG_MMU
> +extern struct vm_area_struct *vmacache_find_exact(struct mm_struct *mm,
> +						  unsigned long start,
> +						  unsigned long end);
> +#endif

We often omit the ifdefs in this case.  It means that a compile-time
error becomes a link-time error but that's a small cost for unmucking
the header files.  It doesn't matter much in vmacache.h, but some
headers would become a complete maze of ifdefs otherwise.

>
> ...
>
> --- /dev/null
> +++ b/mm/vmacache.c
> @@ -0,0 +1,94 @@
> +/*
> + * Copyright (C) 2014 Davidlohr Bueso.
> + */
> +#include <linux/sched.h>
> +#include <linux/vmacache.h>
> +
> +/*
> + * Flush vma caches for threads that share a given mm.
> + *
> + * The operation is safe because the caller holds the mmap_sem
> + * exclusively and other threads accessing the vma cache will
> + * have mmap_sem held at least for read, so no extra locking
> + * is required to maintain the vma cache.
> + */

Ah, there are our locking rules.

>
> ...
>
> +static bool vmacache_valid(struct mm_struct *mm)
> +{
> +	struct task_struct *curr = current;
> +
> +	if (mm != curr->mm)
> +		return false;

What's going on here?  Handling a task poking around in someone else's
mm?  I'm thinking "__access_remote_vm", but I don't know what you were
thinking ;) An explanatory comment would be revealing.

> +	if (mm->vmacache_seqnum != curr->vmacache_seqnum) {
> +		/*
> +		 * First attempt will always be invalid, initialize
> +		 * the new cache for this task here.
> +		 */
> +		curr->vmacache_seqnum = mm->vmacache_seqnum;
> +		vmacache_flush(curr);
> +		return false;
> +	}
> +	return true;
> +}
> +
>
> ...
>


* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-04  0:40 ` Andrew Morton
@ 2014-03-04  0:59     ` Davidlohr Bueso
  0 siblings, 0 replies; 42+ messages in thread
From: Davidlohr Bueso @ 2014-03-04  0:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Michel Lespinasse,
	Mel Gorman, Rik van Riel, KOSAKI Motohiro, aswin, scott.norton,
	linux-mm, linux-kernel

On Mon, 2014-03-03 at 16:40 -0800, Andrew Morton wrote:
> On Thu, 27 Feb 2014 13:48:24 -0800 Davidlohr Bueso <davidlohr@hp.com> wrote:
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -342,9 +342,9 @@ struct mm_rss_stat {
> >  
> >  struct kioctx_table;
> >  struct mm_struct {
> > -	struct vm_area_struct * mmap;		/* list of VMAs */
> > +	struct vm_area_struct *mmap;		/* list of VMAs */
> >  	struct rb_root mm_rb;
> > -	struct vm_area_struct * mmap_cache;	/* last find_vma result */
> > +	u32 vmacache_seqnum;                   /* per-thread vmacache */
> 
> nitpick: in kernelese this is typically "per-task".  If it was in the
> mm_struct then it would be "per process".  And I guess if it was in the
> thread_struct it would be "per thread", but that isn't a distinction
> I've seen made.

Sure, I am referring to per-task, subject title as well. My mind just
treats them as synonyms in this context. My bad.

> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -23,6 +23,7 @@ struct sched_param {
> >  #include <linux/errno.h>
> >  #include <linux/nodemask.h>
> >  #include <linux/mm_types.h>
> > +#include <linux/vmacache.h>
> 
> This might be awkward - vmacache.h drags in mm.h and we have had tangly
> problems with these major header files in the past.  I'd be inclined to
> remove this inclusion and just forward-declare vm_area_struct, but we
> still need VMACACHE_SIZE, sigh.  Wait and see what happens, I guess.

Yeah, I wasn't sure what to do about that and was expecting it to come
up in the review process. Let me know if you want me to change/update
this.

> > ...
> >
> > --- /dev/null
> > +++ b/include/linux/vmacache.h
> > @@ -0,0 +1,40 @@
> > +#ifndef __LINUX_VMACACHE_H
> > +#define __LINUX_VMACACHE_H
> > +
> > +#include <linux/mm.h>
> > +
> > +#define VMACACHE_BITS 2
> > +#define VMACACHE_SIZE (1U << VMACACHE_BITS)
> > +#define VMACACHE_MASK (VMACACHE_SIZE - 1)
> > +/*
> > + * Hash based on the page number. Provides a good hit rate for
> > + * workloads with good locality and those with random accesses as well.
> > + */
> > +#define VMACACHE_HASH(addr) ((addr >> PAGE_SHIFT) & VMACACHE_MASK)
> > +
> > +#define vmacache_flush(tsk)					 \
> > +	do {							 \
> > +		memset(tsk->vmacache, 0, sizeof(tsk->vmacache)); \
> > +	} while (0)
> 
> There's no particular reason to implement this in cpp.  Using C is
> typesafer and nicer.  But then we get into header file issues again. 
> More sigh

Yep, I ran into that issue when trying to make it an inline function.

> 
> > +extern void vmacache_flush_all(struct mm_struct *mm);
> > +extern void vmacache_update(unsigned long addr, struct vm_area_struct *newvma);
> > +extern struct vm_area_struct *vmacache_find(struct mm_struct *mm,
> > +						    unsigned long addr);
> > +
> > +#ifndef CONFIG_MMU
> > +extern struct vm_area_struct *vmacache_find_exact(struct mm_struct *mm,
> > +						  unsigned long start,
> > +						  unsigned long end);
> > +#endif
> 
> We often omit the ifdefs in this case.  It means that a compile-time
> error becomes a link-time error but that's a small cost for unmucking
> the header files.  It doesn't matter much in vmacache.h, but some
> headers would become a complete maze of ifdefs otherwise.

Ok.

> >...
> >
> > +static bool vmacache_valid(struct mm_struct *mm)
> > +{
> > +	struct task_struct *curr = current;
> > +
> > +	if (mm != curr->mm)
> > +		return false;
> 
> What's going on here?  Handling a task poking around in someone else's
> mm?  I'm thinking "__access_remote_vm", but I don't know what you were
> thinking ;) An explanatory comment would be revealing.

I don't understand the doubt here. Seems like a pretty obvious thing to
check -- yes it's probably unlikely but we certainly don't want to be
validating the cache on an mm that's not ours... or are you saying it's
redundant??

And no, we don't want __access_remote_vm() here.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-04  0:59     ` Davidlohr Bueso
  (?)
@ 2014-03-04  1:23     ` Andrew Morton
  2014-03-04  2:42         ` Davidlohr Bueso
  -1 siblings, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2014-03-04  1:23 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Michel Lespinasse,
	Mel Gorman, Rik van Riel, KOSAKI Motohiro, aswin, scott.norton,
	linux-mm, linux-kernel

On Mon, 03 Mar 2014 16:59:38 -0800 Davidlohr Bueso <davidlohr@hp.com> wrote:

> > >...
> > >
> > > +static bool vmacache_valid(struct mm_struct *mm)
> > > +{
> > > +	struct task_struct *curr = current;
> > > +
> > > +	if (mm != curr->mm)
> > > +		return false;
> > 
> > What's going on here?  Handling a task poking around in someone else's
> > mm?  I'm thinking "__access_remote_vm", but I don't know what you were
> > thinking ;) An explanatory comment would be revealing.
> 
> I don't understand the doubt here. Seems like a pretty obvious thing to
> check -- yes it's probably unlikely but we certainly don't want to be
> validating the cache on an mm that's not ours... or are you saying it's
> redundant??

Well it has to be here for a reason and I'm wondering what that reason
is.  If nobody comes here with a foreign mm then let's remove it.  Or
perhaps stick a WARN_ON_ONCE() in there to detect the unexpected.  If
there _is_ a real reason, let's write that down.

> And no, we don't want __access_remote_vm() here.

__access_remote_vm doesn't look at the vma cache, so scrub that
explanation.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-04  1:23     ` Andrew Morton
@ 2014-03-04  2:42         ` Davidlohr Bueso
  0 siblings, 0 replies; 42+ messages in thread
From: Davidlohr Bueso @ 2014-03-04  2:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Michel Lespinasse,
	Mel Gorman, Rik van Riel, KOSAKI Motohiro, aswin, scott.norton,
	linux-mm, linux-kernel

On Mon, 2014-03-03 at 17:23 -0800, Andrew Morton wrote:
> On Mon, 03 Mar 2014 16:59:38 -0800 Davidlohr Bueso <davidlohr@hp.com> wrote:
> 
> > > >...
> > > >
> > > > +static bool vmacache_valid(struct mm_struct *mm)
> > > > +{
> > > > +	struct task_struct *curr = current;
> > > > +
> > > > +	if (mm != curr->mm)
> > > > +		return false;
> > > 
> > > What's going on here?  Handling a task poking around in someone else's
> > > mm?  I'm thinking "__access_remote_vm", but I don't know what you were
> > > thinking ;) An explanatory comment would be revealing.
> > 
> > I don't understand the doubt here. Seems like a pretty obvious thing to
> > check -- yes it's probably unlikely but we certainly don't want to be
> > validating the cache on an mm that's not ours... or are you saying it's
> > redundant??
> 
> Well it has to be here for a reason and I'm wondering what that reason
> is.  If nobody comes here with a foreign mm then let's remove it.

find_vma() can be called by concurrent threads sharing the mm->mmap_sem
for reading, thus this check needs to be there.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-04  2:42         ` Davidlohr Bueso
  (?)
@ 2014-03-04  3:12         ` Andrew Morton
  2014-03-04  3:13             ` Davidlohr Bueso
  -1 siblings, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2014-03-04  3:12 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Michel Lespinasse,
	Mel Gorman, Rik van Riel, KOSAKI Motohiro, aswin, scott.norton,
	linux-mm, linux-kernel

On Mon, 03 Mar 2014 18:42:33 -0800 Davidlohr Bueso <davidlohr@hp.com> wrote:

> On Mon, 2014-03-03 at 17:23 -0800, Andrew Morton wrote:
> > On Mon, 03 Mar 2014 16:59:38 -0800 Davidlohr Bueso <davidlohr@hp.com> wrote:
> > 
> > > > >...
> > > > >
> > > > > +static bool vmacache_valid(struct mm_struct *mm)
> > > > > +{
> > > > > +	struct task_struct *curr = current;
> > > > > +
> > > > > +	if (mm != curr->mm)
> > > > > +		return false;
> > > > 
> > > > What's going on here?  Handling a task poking around in someone else's
> > > > mm?  I'm thinking "__access_remote_vm", but I don't know what you were
> > > > thinking ;) An explanatory comment would be revealing.
> > > 
> > > I don't understand the doubt here. Seems like a pretty obvious thing to
> > > check -- yes it's probably unlikely but we certainly don't want to be
> > > validating the cache on an mm that's not ours... or are you saying it's
> > > redundant??
> > 
> > Well it has to be here for a reason and I'm wondering what that reason
> > is.  If nobody comes here with a foreign mm then let's remove it.
> 
> find_vma() can be called by concurrent threads sharing the mm->mmap_sem
> for reading, thus this check needs to be there.

Confused.  If the threads share mm->mmap_sem then they share mm and the
test will always be false?


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-04  3:12         ` Andrew Morton
@ 2014-03-04  3:13             ` Davidlohr Bueso
  0 siblings, 0 replies; 42+ messages in thread
From: Davidlohr Bueso @ 2014-03-04  3:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Michel Lespinasse,
	Mel Gorman, Rik van Riel, KOSAKI Motohiro, aswin, scott.norton,
	linux-mm, linux-kernel

On Mon, 2014-03-03 at 19:12 -0800, Andrew Morton wrote:
> On Mon, 03 Mar 2014 18:42:33 -0800 Davidlohr Bueso <davidlohr@hp.com> wrote:
> 
> > On Mon, 2014-03-03 at 17:23 -0800, Andrew Morton wrote:
> > > On Mon, 03 Mar 2014 16:59:38 -0800 Davidlohr Bueso <davidlohr@hp.com> wrote:
> > > 
> > > > > >...
> > > > > >
> > > > > > +static bool vmacache_valid(struct mm_struct *mm)
> > > > > > +{
> > > > > > +	struct task_struct *curr = current;
> > > > > > +
> > > > > > +	if (mm != curr->mm)
> > > > > > +		return false;
> > > > > 
> > > > > What's going on here?  Handling a task poking around in someone else's
> > > > > mm?  I'm thinking "__access_remote_vm", but I don't know what you were
> > > > > thinking ;) An explanatory comment would be revealing.
> > > > 
> > > > I don't understand the doubt here. Seems like a pretty obvious thing to
> > > > check -- yes it's probably unlikely but we certainly don't want to be
> > > > validating the cache on an mm that's not ours... or are you saying it's
> > > > redundant??
> > > 
> > > Well it has to be here for a reason and I'm wondering what that reason
> > > is.  If nobody comes here with a foreign mm then let's remove it.
> > 
> > find_vma() can be called by concurrent threads sharing the mm->mmap_sem
> > for reading, thus this check needs to be there.
> 
> Confused.  If the threads share mm->mmap_sem then they share mm and the
> test will always be false?

Yes, I shortly realized that was silly... but I can say for sure it can
happen and a quick qemu run confirms it. So I see your point as to
asking why we need it, so now I'm looking for an explanation in the
code.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-04  3:13             ` Davidlohr Bueso
  (?)
@ 2014-03-04  3:26             ` Andrew Morton
  -1 siblings, 0 replies; 42+ messages in thread
From: Andrew Morton @ 2014-03-04  3:26 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Michel Lespinasse,
	Mel Gorman, Rik van Riel, KOSAKI Motohiro, aswin, scott.norton,
	linux-mm, linux-kernel

On Mon, 03 Mar 2014 19:13:30 -0800 Davidlohr Bueso <davidlohr@hp.com> wrote:

> On Mon, 2014-03-03 at 19:12 -0800, Andrew Morton wrote:
> > On Mon, 03 Mar 2014 18:42:33 -0800 Davidlohr Bueso <davidlohr@hp.com> wrote:
> > 
> > > On Mon, 2014-03-03 at 17:23 -0800, Andrew Morton wrote:
> > > > On Mon, 03 Mar 2014 16:59:38 -0800 Davidlohr Bueso <davidlohr@hp.com> wrote:
> > > > 
> > > > > > >...
> > > > > > >
> > > > > > > +static bool vmacache_valid(struct mm_struct *mm)
> > > > > > > +{
> > > > > > > +	struct task_struct *curr = current;
> > > > > > > +
> > > > > > > +	if (mm != curr->mm)
> > > > > > > +		return false;
> > > > > > 
> > > > > > What's going on here?  Handling a task poking around in someone else's
> > > > > > mm?  I'm thinking "__access_remote_vm", but I don't know what you were
> > > > > > thinking ;) An explanatory comment would be revealing.
> > > > > 
> > > > > I don't understand the doubt here. Seems like a pretty obvious thing to
> > > > > check -- yes it's probably unlikely but we certainly don't want to be
> > > > > validating the cache on an mm that's not ours... or are you saying it's
> > > > > redundant??
> > > > 
> > > > Well it has to be here for a reason and I'm wondering what that reason
> > > > is.  If nobody comes here with a foreign mm then let's remove it.
> > > 
> > > find_vma() can be called by concurrent threads sharing the mm->mmap_sem
> > > for reading, thus this check needs to be there.
> > 
> > Confused.  If the threads share mm->mmap_sem then they share mm and the
> > test will always be false?
> 
> Yes, I shortly realized that was silly... but I can say for sure it can
> happen and a quick qemu run confirms it. So I see your point as to
> asking why we need it, so now I'm looking for an explanation in the
> code.

Great, please do.  We may well find that we have buggy (or at least
inefficient) callers, which we can fix.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-04  3:13             ` Davidlohr Bueso
@ 2014-03-04  3:26               ` Linus Torvalds
  -1 siblings, 0 replies; 42+ messages in thread
From: Linus Torvalds @ 2014-03-04  3:26 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Andrew Morton, Ingo Molnar, Peter Zijlstra, Michel Lespinasse,
	Mel Gorman, Rik van Riel, KOSAKI Motohiro, Chandramouleeswaran,
	Aswin, Norton, Scott J, linux-mm, Linux Kernel Mailing List

On Mon, Mar 3, 2014 at 7:13 PM, Davidlohr Bueso <davidlohr@hp.com> wrote:
>
> Yes, I shortly realized that was silly... but I can say for sure it can
> happen and a quick qemu run confirms it. So I see your point as to
> asking why we need it, so now I'm looking for an explanation in the
> code.

We definitely *do* have users.

One example would be ptrace -> access_process_vm -> __access_remote_vm
-> get_user_pages() -> find_extend_vma() -> find_vma_prev -> find_vma.

                    Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-04  3:26               ` Linus Torvalds
@ 2014-03-04  5:32                 ` Davidlohr Bueso
  -1 siblings, 0 replies; 42+ messages in thread
From: Davidlohr Bueso @ 2014-03-04  5:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Ingo Molnar, Peter Zijlstra, Michel Lespinasse,
	Mel Gorman, Rik van Riel, KOSAKI Motohiro, Chandramouleeswaran,
	Aswin, Norton, Scott J, linux-mm, Linux Kernel Mailing List

On Mon, 2014-03-03 at 19:26 -0800, Linus Torvalds wrote:
> On Mon, Mar 3, 2014 at 7:13 PM, Davidlohr Bueso <davidlohr@hp.com> wrote:
> >
> > Yes, I shortly realized that was silly... but I can say for sure it can
> > happen and a quick qemu run confirms it. So I see your point as to
> > asking why we need it, so now I'm looking for an explanation in the
> > code.
> 
> We definitely *do* have users.
> 
> One example would be ptrace -> access_process_vm -> __access_remote_vm
> -> get_user_pages() -> find_extend_vma() -> find_vma_prev -> find_vma.

And:

[    4.274542] Call Trace:
[    4.274747]  [<ffffffff81809525>] dump_stack+0x46/0x58
[    4.275069]  [<ffffffff811331ee>] vmacache_find+0xae/0xc0
[    4.275425]  [<ffffffff8113c840>] find_vma+0x20/0x80
[    4.275625]  [<ffffffff8113e5cb>] find_extend_vma+0x2b/0x90
[    4.275982]  [<ffffffff81138a09>] __get_user_pages+0x99/0x5a0
[    4.276427]  [<ffffffff81137b0b>] ? follow_page_mask+0x32b/0x400
[    4.276671]  [<ffffffff81138fc2>] get_user_pages+0x52/0x60
[    4.276886]  [<ffffffff81167dc3>] copy_strings.isra.20+0x1a3/0x2f0
[    4.277239]  [<ffffffff81167f4d>] copy_strings_kernel+0x3d/0x50
[    4.277472]  [<ffffffff811b3688>] load_script+0x1e8/0x280
[    4.277692]  [<ffffffff81167d0a>] ? copy_strings.isra.20+0xea/0x2f0
[    4.277931]  [<ffffffff81167ff7>] search_binary_handler+0x97/0x1d0
[    4.278288]  [<ffffffff811694bf>] do_execve_common.isra.28+0x4ef/0x650
[    4.278544]  [<ffffffff81169638>] do_execve+0x18/0x20
[    4.278754]  [<ffffffff8116984e>] SyS_execve+0x2e/0x40
[    4.278960]  [<ffffffff8181b549>] stub_execve+0x69/0xa0



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-04  0:59     ` Davidlohr Bueso
@ 2014-03-06 22:56       ` Andrew Morton
  -1 siblings, 0 replies; 42+ messages in thread
From: Andrew Morton @ 2014-03-06 22:56 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Michel Lespinasse,
	Mel Gorman, Rik van Riel, KOSAKI Motohiro, aswin, scott.norton,
	linux-mm, linux-kernel

On Mon, 03 Mar 2014 16:59:38 -0800 Davidlohr Bueso <davidlohr@hp.com> wrote:

> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -23,6 +23,7 @@ struct sched_param {
> > >  #include <linux/errno.h>
> > >  #include <linux/nodemask.h>
> > >  #include <linux/mm_types.h>
> > > +#include <linux/vmacache.h>
> > 
> > This might be awkward - vmacache.h drags in mm.h and we have had tangly
> > problems with these major header files in the past.  I'd be inclined to
> > remove this inclusion and just forward-declare vm_area_struct, but we
> > still need VMACACHE_SIZE, sigh.  Wait and see what happens, I guess.
> 
> Yeah, I wasn't sure what to do about that and was expecting it to come
> up in the review process. Let me know if you want me to change/update
> this.

OK, so the include graph has blown up in our faces.

This is what I came up with.  Haven't tested it a lot yet.  Thoughts?


From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-per-thread-vma-caching-fix-3

Attempt to untangle header files.

Prior to this patch:

mm.h does not require sched.h
sched.h does not require mm.h
sched.h requires vmacache.h
vmacache.h requires mm.h

After this patch:

mm.h still does not require sched.h
sched.h still does not require mm.h
sched.h does not require vmacache.h
mm.h does not require vmacache.h
vmacache.h requires (and includes) mm.h
vmacache.h requires (and includes) sched.h

To do all this, the three "#define VMACACHE_foo" lines were moved to
sched.h.

The inclusions of sched.h and mm.h into vmacache.h are actually
unrequired because the .c files include mm.h and sched.h directly, but
I put them in there for cargo-cult reasons.

vmacache_flush() no longer needs to be implemented in cpp - make it so.

Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/unicore32/include/asm/mmu_context.h |    2 ++
 fs/exec.c                                |    1 +
 fs/proc/task_mmu.c                       |    1 +
 include/linux/sched.h                    |    5 ++++-
 include/linux/vmacache.h                 |   12 +++++-------
 kernel/debug/debug_core.c                |    1 +
 kernel/fork.c                            |    2 ++
 mm/mmap.c                                |    1 +
 mm/nommu.c                               |    1 +
 mm/vmacache.c                            |    1 +
 10 files changed, 19 insertions(+), 8 deletions(-)

diff -puN arch/unicore32/include/asm/mmu_context.h~mm-per-thread-vma-caching-fix-3 arch/unicore32/include/asm/mmu_context.h
--- a/arch/unicore32/include/asm/mmu_context.h~mm-per-thread-vma-caching-fix-3
+++ a/arch/unicore32/include/asm/mmu_context.h
@@ -14,6 +14,8 @@
 
 #include <linux/compiler.h>
 #include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/vmacache.h>
 #include <linux/io.h>
 
 #include <asm/cacheflush.h>
diff -puN fs/exec.c~mm-per-thread-vma-caching-fix-3 fs/exec.c
--- a/fs/exec.c~mm-per-thread-vma-caching-fix-3
+++ a/fs/exec.c
@@ -26,6 +26,7 @@
 #include <linux/file.h>
 #include <linux/fdtable.h>
 #include <linux/mm.h>
+#include <linux/vmacache.h>
 #include <linux/stat.h>
 #include <linux/fcntl.h>
 #include <linux/swap.h>
diff -puN fs/proc/task_mmu.c~mm-per-thread-vma-caching-fix-3 fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~mm-per-thread-vma-caching-fix-3
+++ a/fs/proc/task_mmu.c
@@ -1,4 +1,5 @@
 #include <linux/mm.h>
+#include <linux/vmacache.h>
 #include <linux/hugetlb.h>
 #include <linux/huge_mm.h>
 #include <linux/mount.h>
diff -puN include/linux/mm_types.h~mm-per-thread-vma-caching-fix-3 include/linux/mm_types.h
diff -puN include/linux/sched.h~mm-per-thread-vma-caching-fix-3 include/linux/sched.h
--- a/include/linux/sched.h~mm-per-thread-vma-caching-fix-3
+++ a/include/linux/sched.h
@@ -23,7 +23,6 @@ struct sched_param {
 #include <linux/errno.h>
 #include <linux/nodemask.h>
 #include <linux/mm_types.h>
-#include <linux/vmacache.h>
 #include <linux/preempt_mask.h>
 
 #include <asm/page.h>
@@ -131,6 +130,10 @@ struct perf_event_context;
 struct blk_plug;
 struct filename;
 
+#define VMACACHE_BITS 2
+#define VMACACHE_SIZE (1U << VMACACHE_BITS)
+#define VMACACHE_MASK (VMACACHE_SIZE - 1)
+
 /*
  * List of flags we want to share for kernel threads,
  * if only because they are not used by them anyway.
diff -puN include/linux/vmacache.h~mm-per-thread-vma-caching-fix-3 include/linux/vmacache.h
--- a/include/linux/vmacache.h~mm-per-thread-vma-caching-fix-3
+++ a/include/linux/vmacache.h
@@ -1,21 +1,19 @@
 #ifndef __LINUX_VMACACHE_H
 #define __LINUX_VMACACHE_H
 
+#include <linux/sched.h>
 #include <linux/mm.h>
 
-#define VMACACHE_BITS 2
-#define VMACACHE_SIZE (1U << VMACACHE_BITS)
-#define VMACACHE_MASK (VMACACHE_SIZE - 1)
 /*
  * Hash based on the page number. Provides a good hit rate for
  * workloads with good locality and those with random accesses as well.
  */
 #define VMACACHE_HASH(addr) ((addr >> PAGE_SHIFT) & VMACACHE_MASK)
 
-#define vmacache_flush(tsk)					 \
-	do {							 \
-		memset(tsk->vmacache, 0, sizeof(tsk->vmacache)); \
-	} while (0)
+static inline void vmacache_flush(struct task_struct *tsk)
+{
+	memset(tsk->vmacache, 0, sizeof(tsk->vmacache));
+}
 
 extern void vmacache_flush_all(struct mm_struct *mm);
 extern void vmacache_update(unsigned long addr, struct vm_area_struct *newvma);
diff -puN kernel/debug/debug_core.c~mm-per-thread-vma-caching-fix-3 kernel/debug/debug_core.c
--- a/kernel/debug/debug_core.c~mm-per-thread-vma-caching-fix-3
+++ a/kernel/debug/debug_core.c
@@ -49,6 +49,7 @@
 #include <linux/pid.h>
 #include <linux/smp.h>
 #include <linux/mm.h>
+#include <linux/vmacache.h>
 #include <linux/rcupdate.h>
 
 #include <asm/cacheflush.h>
diff -puN kernel/fork.c~mm-per-thread-vma-caching-fix-3 kernel/fork.c
--- a/kernel/fork.c~mm-per-thread-vma-caching-fix-3
+++ a/kernel/fork.c
@@ -28,6 +28,8 @@
 #include <linux/mman.h>
 #include <linux/mmu_notifier.h>
 #include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/vmacache.h>
 #include <linux/nsproxy.h>
 #include <linux/capability.h>
 #include <linux/cpu.h>
diff -puN mm/Makefile~mm-per-thread-vma-caching-fix-3 mm/Makefile
diff -puN mm/mmap.c~mm-per-thread-vma-caching-fix-3 mm/mmap.c
--- a/mm/mmap.c~mm-per-thread-vma-caching-fix-3
+++ a/mm/mmap.c
@@ -10,6 +10,7 @@
 #include <linux/slab.h>
 #include <linux/backing-dev.h>
 #include <linux/mm.h>
+#include <linux/vmacache.h>
 #include <linux/shm.h>
 #include <linux/mman.h>
 #include <linux/pagemap.h>
diff -puN mm/nommu.c~mm-per-thread-vma-caching-fix-3 mm/nommu.c
--- a/mm/nommu.c~mm-per-thread-vma-caching-fix-3
+++ a/mm/nommu.c
@@ -15,6 +15,7 @@
 
 #include <linux/export.h>
 #include <linux/mm.h>
+#include <linux/vmacache.h>
 #include <linux/mman.h>
 #include <linux/swap.h>
 #include <linux/file.h>
diff -puN mm/vmacache.c~mm-per-thread-vma-caching-fix-3 mm/vmacache.c
--- a/mm/vmacache.c~mm-per-thread-vma-caching-fix-3
+++ a/mm/vmacache.c
@@ -2,6 +2,7 @@
  * Copyright (C) 2014 Davidlohr Bueso.
  */
 #include <linux/sched.h>
+#include <linux/mm.h>
 #include <linux/vmacache.h>
 
 /*
_


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
       [not found]   ` <CA+55aFw88xiY+o5FE6VtHNkpUZDK3FPt31oCpNsgn1BH7wAPZw@mail.gmail.com>
@ 2014-03-08 19:57     ` Oleg Nesterov
       [not found]     ` <20140308194405.GA32403@redhat.com>
  1 sibling, 0 replies; 42+ messages in thread
From: Oleg Nesterov @ 2014-03-08 19:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Davidlohr Bueso, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Michel Lespinasse, Mel Gorman, Rik van Riel, KOSAKI Motohiro,
	Davidlohr Bueso, linux-kernel

looks like I removed lkml somehow, resend...

On 03/08, Linus Torvalds wrote:
>
> On Sat, Mar 8, 2014 at 10:40 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > It seems this should be moved into copy_mm(), CLONE_VM needs to invalidate
> > ->vmacache too.
>
> No it doesn't. CLONE_VM doesn't change any of the vma lists,

Sure. But another thread or CLONE_VM task can do vmacache_invalidate(),
hit vmacache_seqnum == 0 and call vmacache_flush_all() to solve the
problem with potential overflow.
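
The invalidation helper is not quoted anywhere in this thread, but from the
behaviour described above (bump the mm's seqnum; on a wrap to zero, flush
every cache sharing the mm) it is presumably something like:

	static inline void vmacache_invalidate(struct mm_struct *mm)
	{
		mm->vmacache_seqnum++;

		/* rare 32-bit overflow: flush all caches that share this mm */
		if (unlikely(mm->vmacache_seqnum == 0))
			vmacache_flush_all(mm);
	}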

> so
> there's no reason to invalidate anything.
>
> > dup_task_struct() also copies vmacache_seqnum/vmacache, but the new thread
> > is not yet visible to vmacache_flush_all().
>
> So either the new task struct will share the mm, in which case the
> cache entries are fine

Not if they should be flushed by vmacache_flush_all() above.

Oleg.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
       [not found]     ` <20140308194405.GA32403@redhat.com>
@ 2014-03-08 20:02       ` Linus Torvalds
  2014-03-09  3:22         ` Davidlohr Bueso
  2014-03-09 12:57         ` Oleg Nesterov
  0 siblings, 2 replies; 42+ messages in thread
From: Linus Torvalds @ 2014-03-08 20:02 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Davidlohr Bueso, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Michel Lespinasse, Mel Gorman, Rik van Riel, KOSAKI Motohiro,
	Davidlohr Bueso, Linux Kernel Mailing List

On Sat, Mar 8, 2014 at 11:44 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Sure. But another thread or CLONE_VM task can do vmacache_invalidate(),
> hit vmacache_seqnum == 0 and call vmacache_flush_all() to solve the
> problem with potential overflow.

How?

Any invalidation is supposed to hold the mm semaphore for writing. And
we should have it for reading.

That said, maybe we don't. Maybe we only get it in the dup_mm() path,
I didn't check. In that case, we should probably either get it, or do
some silly memory barrier thing ("check that the sequence number
didn't change between copying the cache and exposing the new thread").

            Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-08 20:02       ` Linus Torvalds
@ 2014-03-09  3:22         ` Davidlohr Bueso
  2014-03-09 12:57         ` Oleg Nesterov
  1 sibling, 0 replies; 42+ messages in thread
From: Davidlohr Bueso @ 2014-03-09  3:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Michel Lespinasse, Mel Gorman, Rik van Riel, KOSAKI Motohiro,
	Davidlohr Bueso, Linux Kernel Mailing List

On Sat, 2014-03-08 at 12:02 -0800, Linus Torvalds wrote:
> On Sat, Mar 8, 2014 at 11:44 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Sure. But another thread or CLONE_VM task can do vmacache_invalidate(),
> > hit vmacache_seqnum == 0 and call vmacache_flush_all() to solve the
> > problem with potential overflow.
> 
> How?
> 
> Any invalidation is supposed to hold the mm semaphore for writing. And
> we should have it for reading.

Yes, invalidations are always with the write lock held. In any case it's
a good candidate to use verify_mm_writelocked(), even if it's only under
debug environments.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-08 20:02       ` Linus Torvalds
  2014-03-09  3:22         ` Davidlohr Bueso
@ 2014-03-09 12:57         ` Oleg Nesterov
  2014-03-09 15:57           ` Linus Torvalds
  1 sibling, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2014-03-09 12:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Davidlohr Bueso, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Michel Lespinasse, Mel Gorman, Rik van Riel, KOSAKI Motohiro,
	Davidlohr Bueso, Linux Kernel Mailing List

On 03/08, Linus Torvalds wrote:
>
> On Sat, Mar 8, 2014 at 11:44 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Sure. But another thread or CLONE_VM task can do vmacache_invalidate(),
> > hit vmacache_seqnum == 0 and call vmacache_flush_all() to solve the
> > problem with potential overflow.
>
> How?
>
> Any invalidation is supposed to hold the mm semaphore for writing.

Yes,

> And
> we should have it for reading.

No, dup_task_struct() is obviously lockless. And the new child is not yet
visible to for_each_process_thread().

clone(CLONE_VM) can create a thread with the corrupted vmacache.




OK. Suppose we have a task T1 which has the valid vmacache,
T1->vmacache_seqnum == T1->mm->vmacache_seqnum == 0. Suppose it sleeps a lot.

Suppose that its subthread T2 does a lot munmap's, finally mm->vmacache_seqnum
becomes zero again and T2 calls vmacache_flush_all().

T1 wakes up and does clone(CLONE_VM). The new thread T3 gets the copy
of T2's ->vmacache_seqnum and ->vmacache[].

T2 continues, vmacache_flush_all() finds T1 and does vmacache_flush(T1).

But the new thread T3 is not on the list yet, vmacache_flush_all() can't
find it.

So T3 will run with vmacache_valid() == T (till the next invalidate(mm)
of course) but its ->vmacache[] points to nowhere.

Oleg.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-09 12:57         ` Oleg Nesterov
@ 2014-03-09 15:57           ` Linus Torvalds
  2014-03-09 17:09             ` Oleg Nesterov
  0 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2014-03-09 15:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Davidlohr Bueso, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Michel Lespinasse, Mel Gorman, Rik van Riel, KOSAKI Motohiro,
	Davidlohr Bueso, Linux Kernel Mailing List

On Sun, Mar 9, 2014 at 5:57 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> No, dup_task_struct() is obviously lockless. And the new child is not yet
> visible to for_each_process_thread().

Ok, then the simple approach is to just do

    /* Did we miss an invalidate event? */
    if (mm->seqcount < tsk->seqcount)
        clear_vma_cache();

after making the new thread visible.

Then the "race" becomes one of "we cannot have 4 billion mmap/munmap
events in other threads while we're setting up a new thread", which I
think is fine.

               Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-09 15:57           ` Linus Torvalds
@ 2014-03-09 17:09             ` Oleg Nesterov
  2014-03-09 17:16               ` Linus Torvalds
  0 siblings, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2014-03-09 17:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Davidlohr Bueso, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Michel Lespinasse, Mel Gorman, Rik van Riel, KOSAKI Motohiro,
	Davidlohr Bueso, Linux Kernel Mailing List

On 03/09, Linus Torvalds wrote:
>
> On Sun, Mar 9, 2014 at 5:57 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > No, dup_task_struct() is obviously lockless. And the new child is not yet
> > visible to for_each_process_thread().
>
> Ok, then the simple approach is to just do
>
>     /* Did we miss an invalidate event? */
>     if (mm->seqcount < tsk->seqcount)
>         clear_vma_cache();
>
> after making the new thread visible.
>
> Then the "race" becomes one of "we cannot have 4 billion mmap/munmap
> events in other threads while we're setting up a new thread",

But it's not the "while we're setting up a new thread", it is "since
vmacache_valid() was called list time". And the cloning task can just
sleep(A_LOT) and then do CLONE_VM.

Of course, of course, this race is pute theoretical anyway. But imho
makes sense to fix anyway, and the natural/trivial approach is just to
move vmacache_flush(tsk) from dup_mm() to copy_mm(), right after the
"if (!oldmm)" check.

Oleg.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-09 17:09             ` Oleg Nesterov
@ 2014-03-09 17:16               ` Linus Torvalds
  2014-03-10 19:56                 ` [PATCH -next] mm,vmacache: also flush cache for VM_CLONE Davidlohr Bueso
  0 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2014-03-09 17:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Davidlohr Bueso, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Michel Lespinasse, Mel Gorman, Rik van Riel, KOSAKI Motohiro,
	Davidlohr Bueso, Linux Kernel Mailing List

On Sun, Mar 9, 2014 at 10:09 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> But it's not the "while we're setting up a new thread", it is "since
> vmacache_valid() was called last time". And the cloning task can just
> sleep(A_LOT) and then do CLONE_VM.
>
> Of course, of course, this race is purely theoretical anyway. But imho it
> makes sense to fix anyway, and the natural/trivial approach is just to
> move vmacache_flush(tsk) from dup_mm() to copy_mm(), right after the
> "if (!oldmm)" check.

Fair enough, you've convinced me it's subtle enough that we do not
want to play games..

          Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH -next] mm,vmacache: also flush cache for VM_CLONE
  2014-03-09 17:16               ` Linus Torvalds
@ 2014-03-10 19:56                 ` Davidlohr Bueso
  2014-03-13 14:59                   ` Oleg Nesterov
  0 siblings, 1 reply; 42+ messages in thread
From: Davidlohr Bueso @ 2014-03-10 19:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Michel Lespinasse, Mel Gorman, Rik van Riel, KOSAKI Motohiro,
	Davidlohr Bueso, Linux Kernel Mailing List

Oleg found that there is a potential race if we don't flush the task
for threads (VM_CLONE):

"Suppose we have a task T1 which has the valid vmacache,
T1->vmacache_seqnum == T1->mm->vmacache_seqnum == 0. Suppose it sleeps a lot.

Suppose that its subthread T2 does a lot munmap's, finally mm->vmacache_seqnum
becomes zero again and T2 calls vmacache_flush_all().

T1 wakes up and does clone(CLONE_VM). The new thread T3 gets the copy
of T2's ->vmacache_seqnum and ->vmacache[].

T2 continues, vmacache_flush_all() finds T1 and does vmacache_flush(T1).

But the new thread T3 is not on the list yet, vmacache_flush_all() can't
find it.

So T3 will run with vmacache_valid() == T (till the next invalidate(mm)
of course) but its ->vmacache[] points to nowhere."

Address this by moving the flush call into copy_mm(), instead of only
having it in dup_mm().

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 kernel/fork.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 3e02737..45b6241 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -841,9 +841,6 @@ static struct mm_struct *dup_mm(struct task_struct *tsk)
 	if (mm->binfmt && !try_module_get(mm->binfmt->module))
 		goto free_pt;
 
-	/* initialize the new vmacache entries */
-	vmacache_flush(tsk);
-
 	return mm;
 
 free_pt:
@@ -887,6 +884,9 @@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
 	if (!oldmm)
 		return 0;
 
+	/* initialize the new vmacache entries */
+	vmacache_flush(tsk);
+
 	if (clone_flags & CLONE_VM) {
 		atomic_inc(&oldmm->mm_users);
 		mm = oldmm;
-- 
1.8.1.4
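
With the patch above applied, the resulting copy_mm() flow looks roughly
like this (simplified sketch only, locking and error paths omitted), so
both the CLONE_VM and the dup_mm() paths now start with an empty
vmacache:

	static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
	{
		struct mm_struct *mm, *oldmm = current->mm;

		if (!oldmm)
			return 0;		/* kernel thread */

		/* initialize the new vmacache entries */
		vmacache_flush(tsk);

		if (clone_flags & CLONE_VM) {
			atomic_inc(&oldmm->mm_users);	/* new thread shares the mm */
			mm = oldmm;
			goto good_mm;
		}

		mm = dup_mm(tsk);		/* fork(): copy the address space */
		if (!mm)
			return -ENOMEM;
	good_mm:
		tsk->mm = mm;
		tsk->active_mm = mm;
		return 0;
	}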




^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH -next] mm,vmacache: also flush cache for VM_CLONE
  2014-03-10 19:56                 ` [PATCH -next] mm,vmacache: also flush cache for VM_CLONE Davidlohr Bueso
@ 2014-03-13 14:59                   ` Oleg Nesterov
  2014-03-13 15:32                     ` Oleg Nesterov
       [not found]                     ` <CA+55aFyNd7L+G3hFauJPxUOengK-_o2G-SFmVooPZ-sE6xBj=g@mail.gmail.com>
  0 siblings, 2 replies; 42+ messages in thread
From: Oleg Nesterov @ 2014-03-13 14:59 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Michel Lespinasse, Mel Gorman, Rik van Riel, KOSAKI Motohiro,
	Davidlohr Bueso, Linux Kernel Mailing List

Sorry for the delay, I was distracted...

On 03/10, Davidlohr Bueso wrote:
>
> @@ -841,9 +841,6 @@ static struct mm_struct *dup_mm(struct task_struct *tsk)
>  	if (mm->binfmt && !try_module_get(mm->binfmt->module))
>  		goto free_pt;
>
> -	/* initialize the new vmacache entries */
> -	vmacache_flush(tsk);
> -
>  	return mm;
>
>  free_pt:
> @@ -887,6 +884,9 @@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
>  	if (!oldmm)
>  		return 0;
>
> +	/* initialize the new vmacache entries */
> +	vmacache_flush(tsk);
> +
>  	if (clone_flags & CLONE_VM) {
>  		atomic_inc(&oldmm->mm_users);
>  		mm = oldmm;

Yes. But it seems that use_mm() and unuse_mm() should invalidate vmacache too.

Suppose that a kernel thread T does, say,

	use_mm(foreign_mm);
	get_user(...);
	unuse_mm();

This can trigger a fault and populate T->vmacache[]. If this code is called
again vmacache_find() can use the stale entries.

Or, assuming that only a kernel thread can do use_mm(), we can change
vmacache_valid() to also check !PF_KTHREAD.

Hmm. Another problem is that use_mm() doesn't take ->mmap_sem and thus
it can race with vmacache_flush_all()...


Finally. Shouldn't vmacache_update() check current->mm == mm as well?
What if access_remote_vm/get_user_pages trigger find_vma() ??? Unless
I missed something this is not theoretical at all and can lead to the
corrupted vmacache, no?
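
As a minimal illustration of the concern (hypothetical snippet, not
kernel code; foreign_mm/addr are placeholders, and it assumes find_vma()
calls vmacache_update() unconditionally):

	/* ptrace()/access_process_vm() style remote access */
	struct vm_area_struct *vma;

	down_read(&foreign_mm->mmap_sem);
	vma = find_vma(foreign_mm, addr);	/* may call vmacache_update(addr, vma) */
	up_read(&foreign_mm->mmap_sem);

	/*
	 * Without a current->mm == mm check, a vma belonging to foreign_mm
	 * now sits in current->vmacache[], even though it has nothing to do
	 * with current's own address space.
	 */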

Oleg.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -next] mm,vmacache: also flush cache for VM_CLONE
  2014-03-13 14:59                   ` Oleg Nesterov
@ 2014-03-13 15:32                     ` Oleg Nesterov
  2014-03-13 19:04                       ` Davidlohr Bueso
       [not found]                     ` <CA+55aFyNd7L+G3hFauJPxUOengK-_o2G-SFmVooPZ-sE6xBj=g@mail.gmail.com>
  1 sibling, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2014-03-13 15:32 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Michel Lespinasse, Mel Gorman, Rik van Riel, KOSAKI Motohiro,
	Davidlohr Bueso, Linux Kernel Mailing List

On 03/13, Oleg Nesterov wrote:
>
> Yes. But it seems that use_mm() and unuse_mm() should invalidate vmacache too.
>
> Suppose that a kernel thread T does, say,
>
> 	use_mm(foreign_mm);
> 	get_user(...);
> 	unuse_mm();
>
> This can trigger a fault and populate T->vmacache[]. If this code is called
> again vmacache_find() can use the stale entries.
>
> Or, assuming that only a kernel thread can do use_mm(), we can change
> vmacache_valid() to also check !PF_KTHREAD.

Yes, I think we should check PF_KTHREAD, because

> Hmm. Another problem is that use_mm() doesn't take ->mmap_sem and thus
> it can race with vmacache_flush_all()...

this also closes this race. use_mm() users should not use vmacache at all.

> Finally. Shouldn't vmacache_update() check current->mm == mm as well?
> What if access_remote_vm/get_user_pages trigger find_vma() ??? Unless
> I missed something this is not theoretical at all and can lead to the
> corrupted vmacache, no?

Looks like a real problem or I am totally confused. I think we need
something like below (uncompiled).

Oleg.

--- x/mm/vmacache.c
+++ x/mm/vmacache.c
@@ -30,20 +30,25 @@ void vmacache_flush_all(struct mm_struct
 	rcu_read_unlock();
 }
 
+static bool vmacache_valid_mm(struct mm_struct *mm)
+{
+	return current->mm == mm && !(current->flags & PF_KTHREAD);
+}
+
 void vmacache_update(unsigned long addr, struct vm_area_struct *newvma)
 {
-	int idx = VMACACHE_HASH(addr);
-	current->vmacache[idx] = newvma;
+	if (vmacache_valid_mm(newvma->vm_mm))
+		current->vmacache[VMACACHE_HASH(addr)] = newvma;
 }
 
 static bool vmacache_valid(struct mm_struct *mm)
 {
 	struct task_struct *curr = current;
 
-	if (mm != curr->mm)
+	if (!vmacache_valid_mm(mm))
 		return false;
 
 	if (mm->vmacache_seqnum != curr->vmacache_seqnum) {
 		/*
 		 * First attempt will always be invalid, initialize
 		 * the new cache for this task here.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -next] mm,vmacache: also flush cache for VM_CLONE
       [not found]                     ` <CA+55aFyNd7L+G3hFauJPxUOengK-_o2G-SFmVooPZ-sE6xBj=g@mail.gmail.com>
@ 2014-03-13 16:36                       ` Oleg Nesterov
  2014-03-13 18:27                         ` async_pf.c && use_mm() (Was: mm,vmacache: also flush cache for VM_CLONE) Oleg Nesterov
  0 siblings, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2014-03-13 16:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Peter Zijlstra, Davidlohr Bueso,
	Davidlohr Bueso, KOSAKI Motohiro, Rik van Riel, Mel Gorman,
	Andrew Morton, Michel Lespinasse, Ingo Molnar

On 03/13, Linus Torvalds wrote:
>
> On Mar 13, 2014 8:11 AM, "Oleg Nesterov" <oleg@redhat.com> wrote:
> >
> > Suppose that a kernel thread T does, say,
> >
> >         use_mm(foreign_mm);
> >         get_user(...);
> >         unuse_mm();
>
> That would be a major bug. Kernel threads cannot access user memory.

Unless a kernel thread does use_mm() ;)

> Has
> somebody added anything that crazy?

Hmm. aio no longer uses use_mm()... But there are other users:

	drivers/usb/gadget/inode.c  582   use_mm(mm);
	drivers/vhost/vhost.c       211   use_mm(dev->mm);
	virt/kvm/async_pf.c          68   use_mm(mm);

And yes, they do copy_to/from_user().

Hmm, but at first glance async_pf_execute() doesn't need use_mm() at all.
And perhaps other callers can use get_user_pages() too.

> The kernel thread "use_mm" is to avoid unnecessary context switches of the
> tlb when switching to a kernel thread, exactly *because* a kernel thread is
> never supposed to access user space, so it does not care what user memory is
> attached.

It seems that you are talking about switch_mm-like things or I misunderstood.
use_mm() actually changes ->mm, not only ->active_mm.
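
For reference, use_mm() at the time looks roughly like this (paraphrased
from mm/mmu_context.c, slightly simplified):

	void use_mm(struct mm_struct *mm)
	{
		struct mm_struct *active_mm;
		struct task_struct *tsk = current;

		task_lock(tsk);
		active_mm = tsk->active_mm;
		if (active_mm != mm) {
			atomic_inc(&mm->mm_count);
			tsk->active_mm = mm;
		}
		tsk->mm = mm;			/* really installs the foreign mm */
		switch_mm(active_mm, mm, tsk);
		task_unlock(tsk);

		if (active_mm != mm)
			mmdrop(active_mm);
	}

so any fault taken while it is in use goes through the normal
find_vma()/vmacache path with current->mm == mm.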

> So I object very much to making kernel threads special in this context,
> unless the "special" bit is some VM_BUG_ON() or similar.

See above. Perhaps we can kill use_mm() (personally I don't think we should),
but until then vmacache needs this check afaics.

Oleg.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* async_pf.c && use_mm() (Was: mm,vmacache: also flush cache for VM_CLONE)
  2014-03-13 16:36                       ` Oleg Nesterov
@ 2014-03-13 18:27                         ` Oleg Nesterov
       [not found]                           ` <CA+55aFwqTbsYCyPf6_i6RmBkPHpEhJjiRfZm6_1_yPa_kUkYiQ@mail.gmail.com>
  0 siblings, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2014-03-13 18:27 UTC (permalink / raw)
  To: Linus Torvalds, Gleb Natapov
  Cc: Linux Kernel Mailing List, Peter Zijlstra, Davidlohr Bueso,
	Davidlohr Bueso, KOSAKI Motohiro, Rik van Riel, Mel Gorman,
	Andrew Morton, Michel Lespinasse, Ingo Molnar

On 03/13, Oleg Nesterov wrote:
>
> Hmm, but at first glance async_pf_execute() doesn't need use_mm() at all.

Seriously, why does it need use_mm()? get_user_pages(mm => apf->mm) should work.
Perhaps there is some kvm magic?

But actually I am writing this email because mmdrop() doesn't look right,
or I missed something. I think that kvm_setup_async_pf() should increment
->mm_users and async_pf_execute() needs mmput()?
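
(To spell out the distinction, as a sketch of the usual conventions and
not a quote of actual kernel code:

	atomic_inc(&mm->mm_users);	/* pins the mappings themselves */
	/* get_user_pages(..., mm, ...) is safe against exit_mmap() here */
	mmput(mm);

	atomic_inc(&mm->mm_count);	/* pins only the struct mm_struct */
	/* exit_mmap() may already have torn down the vmas/page tables */
	mmdrop(mm);

i.e. holding only an mm_count reference does not keep the mappings
alive, which is what makes the mm_count-only reference in async_pf look
suspicious.)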

Oleg.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH -next] mm,vmacache: also flush cache for VM_CLONE
  2014-03-13 15:32                     ` Oleg Nesterov
@ 2014-03-13 19:04                       ` Davidlohr Bueso
  0 siblings, 0 replies; 42+ messages in thread
From: Davidlohr Bueso @ 2014-03-13 19:04 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Michel Lespinasse, Mel Gorman, Rik van Riel, KOSAKI Motohiro,
	Davidlohr Bueso, Linux Kernel Mailing List

On Thu, 2014-03-13 at 16:32 +0100, Oleg Nesterov wrote:
> On 03/13, Oleg Nesterov wrote:
> >
> > Yes. But it seems that use_mm() and unuse_mm() should invalidate vmacache too.
> >
> > Suppose that a kernel thread T does, say,
> >
> > 	use_mm(foreign_mm);
> > 	get_user(...);
> > 	unuse_mm();
> >
> > This can trigger a fault and populate T->vmacache[]. If this code is called
> > again vmacache_find() can use the stale entries.
> >
> > Or, assuming that only a kernel thread can do use_mm(), we can change
> > vmacache_valid() to also check !PF_KTHREAD.
> 
> Yes, I think we should check PF_KTHREAD, because
> 
> > Hmm. Another problem is that use_mm() doesn't take ->mmap_sem and thus
> > it can race with vmacache_flush_all()...
> 
> this also closes this race. use_mm() users should not use vmacache at all.
> 
> > Finally. Shouldn't vmacache_update() check current->mm == mm as well?
> > What if access_remote_vm/get_user_pages trigger find_vma() ??? Unless
> > I missed something this is not theoretical at all and can lead to the
> > corrupted vmacache, no?
> 
> Looks like a real problem or I am totally confused. I think we need
> something like below (uncompiled).

Thanks for looking into this, Oleg. I was actually chasing a bug
triggered by trinity where we have a stale cache and vmacache_find() is
returning a bogus vma structure even when vma->vm_mm != mm:

https://lkml.org/lkml/2014/3/9/201
https://lkml.org/lkml/2014/3/11/563

So it just might be a real problem.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: async_pf.c && use_mm() (Was: mm,vmacache: also flush cache for VM_CLONE)
       [not found]                           ` <CA+55aFwqTbsYCyPf6_i6RmBkPHpEhJjiRfZm6_1_yPa_kUkYiQ@mail.gmail.com>
@ 2014-03-13 21:44                             ` Linus Torvalds
  2014-03-14 18:23                               ` Oleg Nesterov
  0 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2014-03-13 21:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linux Kernel Mailing List, Gleb Natapov, Peter Zijlstra,
	Davidlohr Bueso, Davidlohr Bueso, KOSAKI Motohiro, Rik van Riel,
	Andrew Morton, Mel Gorman, Michel Lespinasse, Ingo Molnar

On Thu, Mar 13, 2014 at 12:14 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Maybe it uses use_mm just to increment the usage count. Which is bogus, and
> it should just "get" it instead, but whatever.
>
> On my phone, so I can't check the details.

Ok, no longer on my phone, and no, it clearly does the reference count with a

    atomic_inc(&work->mm->mm_count);

separately. The use_mm/unuse_mm seems entirely specious.

Maybe it has some historical meaning to it.

               Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-04  3:26               ` Linus Torvalds
@ 2014-03-14  3:05                 ` Li Zefan
  -1 siblings, 0 replies; 42+ messages in thread
From: Li Zefan @ 2014-03-14  3:05 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Michel Lespinasse, Mel Gorman, Rik van Riel, KOSAKI Motohiro,
	Chandramouleeswaran, Aswin, Norton, Scott J, linux-mm,
	Linux Kernel Mailing List

Hi Davidlohr,

On 2014/3/4 11:26, Linus Torvalds wrote:
> On Mon, Mar 3, 2014 at 7:13 PM, Davidlohr Bueso <davidlohr@hp.com> wrote:
>>
>> Yes, I shortly realized that was silly... but I can say for sure it can
>> happen and a quick qemu run confirms it. So I see your point as to
>> asking why we need it, so now I'm looking for an explanation in the
>> code.
> 
> We definitely *do* have users.
> 
> One example would be ptrace -> access_process_vm -> __access_remote_vm
> -> get_user_pages() -> find_extend_vma() -> find_vma_prev -> find_vma.
> 

I saw this oops on 3.14.0-rc5-next-20140307; was it possibly caused by
your patch? I don't know how it was triggered.

[ 6072.026715] BUG: unable to handle kernel NULL pointer dereference at 00000000000007f8
[ 6072.026729] IP: [<ffffffff811a0189>] follow_page_mask+0x69/0x620
[ 6072.026742] PGD c1975f067 PUD c19479067 PMD 0
[ 6072.026749] Oops: 0000 [#1] SMP
[ 6072.026852] CPU: 2 PID: 13445 Comm: ps Not tainted 3.14.0-rc5-next-20140307-0.1-default+ #4
[ 6072.026863] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285          /BC11BTSA              , BIOS CTSAV036 04/27/2011
[ 6072.026872] task: ffff88061d8848a0 ti: ffff880618854000 task.ti: ffff880618854000
[ 6072.026880] RIP: 0010:[<ffffffff811a0189>]  [<ffffffff811a0189>] follow_page_mask+0x69/0x620
[ 6072.026889] RSP: 0018:ffff880618855c18  EFLAGS: 00010206
[ 6072.026895] RAX: 00000000000000ff RBX: ffffffffffffffea RCX: ffff880618855d0c
[ 6072.026902] RDX: 0000000000000000 RSI: 00007fff0a474cc7 RDI: ffff88061aef8f00
[ 6072.026909] RBP: ffff880618855c88 R08: 0000000000000002 R09: 0000000000000000
[ 6072.026916] R10: 0000000000000000 R11: 0000000000003485 R12: 00007fff0a474cc7
[ 6072.026924] R13: 0000000000000016 R14: ffff88061aef8f00 R15: ffff880c1c842508
[ 6072.026932] FS:  00007f4687701700(0000) GS:ffff880c26a00000(0000) knlGS:0000000000000000
[ 6072.026940] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6072.026947] CR2: 00000000000007f8 CR3: 0000000c184ee000 CR4: 00000000000007e0
[ 6072.026955] Stack:
[ 6072.026959]  ffff880618855c48 ffff880618855d0c 0000000018855c58 0000000000000246
[ 6072.026969]  0000000000000000 0000000000000752 ffffffff817c975c 0000000000000000
[ 6072.026980]  ffff880618855c88 0000000000000016 ffff880c1c842508 ffff88061d8848a0
[ 6072.026989] Call Trace:
[ 6072.026998]  [<ffffffff811a4b14>] __get_user_pages+0x204/0x5a0
[ 6072.027007]  [<ffffffff811a4f62>] get_user_pages+0x52/0x60
[ 6072.027015]  [<ffffffff811a5088>] __access_remote_vm+0x118/0x1f0
[ 6072.027023]  [<ffffffff811a51bb>] access_process_vm+0x5b/0x80
[ 6072.027033]  [<ffffffff812675a7>] proc_pid_cmdline+0x77/0x120
[ 6072.027041]  [<ffffffff81267da2>] proc_info_read+0xa2/0xe0
[ 6072.027050]  [<ffffffff811f439d>] vfs_read+0xad/0x1a0
[ 6072.027057]  [<ffffffff811f45b5>] SyS_read+0x65/0xb0
[ 6072.027066]  [<ffffffff8159ba12>] system_call_fastpath+0x16/0x1b
[ 6072.027072] Code: f4 4c 89 f7 89 45 a4 e8 36 0e eb ff 48 3d 00 f0 ff ff 48 89 c3 0f 86 d7 00 00 00 4c 89 e0 49 8b 56 40 48 c1 e8 27 25 ff 01 00 00 <48> 8b 0c c2 48 85 c9 75 3e 41 83 e5 08 74 1b 49 8b 87 90 00 00
[ 6072.027134] RIP  [<ffffffff811a0189>] follow_page_mask+0x69/0x620
[ 6072.027142]  RSP <ffff880618855c18>
[ 6072.027146] CR2: 00000000000007f8
[ 6072.134516] ---[ end trace 8d006e01f05d1ba8 ]---



^ permalink raw reply	[flat|nested] 42+ messages in thread


* Re: [PATCH v4] mm: per-thread vma caching
  2014-03-14  3:05                 ` Li Zefan
@ 2014-03-14  4:43                   ` Andrew Morton
  -1 siblings, 0 replies; 42+ messages in thread
From: Andrew Morton @ 2014-03-14  4:43 UTC (permalink / raw)
  To: Li Zefan
  Cc: Davidlohr Bueso, Linus Torvalds, Ingo Molnar, Peter Zijlstra,
	Michel Lespinasse, Mel Gorman, Rik van Riel, KOSAKI Motohiro,
	Chandramouleeswaran, Aswin, Norton, Scott J, linux-mm,
	Linux Kernel Mailing List

On Fri, 14 Mar 2014 11:05:51 +0800 Li Zefan <lizefan@huawei.com> wrote:

> Hi Davidlohr,
> 
> On 2014/3/4 11:26, Linus Torvalds wrote:
> > On Mon, Mar 3, 2014 at 7:13 PM, Davidlohr Bueso <davidlohr@hp.com> wrote:
> >>
> >> Yes, I shortly realized that was silly... but I can say for sure it can
> >> happen and a quick qemu run confirms it. So I see your point as to
> >> asking why we need it, so now I'm looking for an explanation in the
> >> code.
> > 
> > We definitely *do* have users.
> > 
> > One example would be ptrace -> access_process_vm -> __access_remote_vm
> > -> get_user_pages() -> find_extend_vma() -> find_vma_prev -> find_vma.
> > 
> 
> I saw this oops on 3.14.0-rc5-next-20140307; was it possibly caused by
> your patch? I don't know how it was triggered.
> 
> ...
>
> [ 6072.027007]  [<ffffffff811a4f62>] get_user_pages+0x52/0x60
> [ 6072.027015]  [<ffffffff811a5088>] __access_remote_vm+0x118/0x1f0
> [ 6072.027023]  [<ffffffff811a51bb>] access_process_vm+0x5b/0x80
> [ 6072.027033]  [<ffffffff812675a7>] proc_pid_cmdline+0x77/0x120
> [ 6072.027041]  [<ffffffff81267da2>] proc_info_read+0xa2/0xe0
> [ 6072.027050]  [<ffffffff811f439d>] vfs_read+0xad/0x1a0
> [ 6072.027057]  [<ffffffff811f45b5>] SyS_read+0x65/0xb0
> [ 6072.027066]  [<ffffffff8159ba12>] system_call_fastpath+0x16/0x1b
> [ 6072.027072] Code: f4 4c 89 f7 89 45 a4 e8 36 0e eb ff 48 3d 00 f0 ff ff 48 89 c3 0f 86 d7 00 00 00 4c 89 e0 49 8b 56 40 48 c1 e8 27 25 ff 01 00 00 <48> 8b 0c c2 48 85 c9 75 3e 41 83 e5 08 74 1b 49 8b 87 90 00 00
> [ 6072.027134] RIP  [<ffffffff811a0189>] follow_page_mask+0x69/0x620
> [ 6072.027142]  RSP <ffff880618855c18>
> [ 6072.027146] CR2: 00000000000007f8

Yep.  Please grab whichever of

mm-per-thread-vma-caching-fix-3.patch
mm-per-thread-vma-caching-fix-4.patch
mm-per-thread-vma-caching-fix-5.patch
mm-per-thread-vma-caching-fix-6-checkpatch-fixes.patch
mm-per-thread-vma-caching-fix-6-fix.patch

that you don't have, from http://ozlabs.org/~akpm/mmots/broken-out/

^ permalink raw reply	[flat|nested] 42+ messages in thread


* Re: async_pf.c && use_mm() (Was: mm,vmacache: also flush cache for VM_CLONE)
  2014-03-13 21:44                             ` Linus Torvalds
@ 2014-03-14 18:23                               ` Oleg Nesterov
  0 siblings, 0 replies; 42+ messages in thread
From: Oleg Nesterov @ 2014-03-14 18:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Gleb Natapov, Peter Zijlstra,
	Davidlohr Bueso, Davidlohr Bueso, KOSAKI Motohiro, Rik van Riel,
	Andrew Morton, Mel Gorman, Michel Lespinasse, Ingo Molnar

On 03/13, Linus Torvalds wrote:
>
> Ok, no longer on my phone, and no, it clearly does the reference count with a
>
>     atomic_inc(&work->mm->mm_count);
>
> separately. The use_mm/unuse_mm seems entirely specious.

Yes, it really looks as if we can simply remove it.

But once again, with or without use_mm() it seems that the refcounting
is buggy. get_user_pages() is simply wrong if ->mm_users == 0 and
exit_mmap/etc was already called (or in progress).

So I think we need something like below, but I can't test this change
or audit other (potential) users of kvm_async_pf->mm.

Perhaps this is not a bug and somehow it is guaranteed that, say,
kvm_clear_async_pf_completion_queue() must always be called before the
caller of kvm_setup_async_pf() can exit? I don't know, but in that case
we do not need any accounting and this should be documented.

Gleb, what do you think?

Oleg.

--- x/virt/kvm/async_pf.c
+++ x/virt/kvm/async_pf.c
@@ -65,11 +65,9 @@ static void async_pf_execute(struct work_struct *work)
 
 	might_sleep();
 
-	use_mm(mm);
 	down_read(&mm->mmap_sem);
 	get_user_pages(current, mm, addr, 1, 1, 0, NULL, NULL);
 	up_read(&mm->mmap_sem);
-	unuse_mm(mm);
 
 	spin_lock(&vcpu->async_pf.lock);
 	list_add_tail(&apf->link, &vcpu->async_pf.done);
@@ -85,7 +83,7 @@ static void async_pf_execute(struct work_struct *work)
 	if (waitqueue_active(&vcpu->wq))
 		wake_up_interruptible(&vcpu->wq);
 
-	mmdrop(mm);
+	mmput(mm);
 	kvm_put_kvm(vcpu->kvm);
 }
 
@@ -98,7 +96,7 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 				   typeof(*work), queue);
 		list_del(&work->queue);
 		if (cancel_work_sync(&work->work)) {
-			mmdrop(work->mm);
+			mmput(work->mm);
 			kvm_put_kvm(vcpu->kvm); /* == work->vcpu->kvm */
 			kmem_cache_free(async_pf_cache, work);
 		}
@@ -162,7 +160,7 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
 	work->addr = gfn_to_hva(vcpu->kvm, gfn);
 	work->arch = *arch;
 	work->mm = current->mm;
-	atomic_inc(&work->mm->mm_count);
+	atomic_inc(&work->mm->mm_users);
 	kvm_get_kvm(work->vcpu->kvm);
 
 	/* this can't really happen otherwise gfn_to_pfn_async
@@ -180,7 +178,7 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
 	return 1;
 retry_sync:
 	kvm_put_kvm(work->vcpu->kvm);
-	mmdrop(work->mm);
+	mmput(work->mm);
 	kmem_cache_free(async_pf_cache, work);
 	return 0;
 }


^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2014-03-14 18:24 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-27 21:48 [PATCH v4] mm: per-thread vma caching Davidlohr Bueso
2014-02-27 21:48 ` Davidlohr Bueso
2014-02-28  4:39 ` Davidlohr Bueso
2014-02-28  4:39   ` Davidlohr Bueso
2014-03-04  0:00 ` Andrew Morton
2014-03-04  0:18   ` Davidlohr Bueso
2014-03-04  0:18     ` Davidlohr Bueso
2014-03-04  0:40 ` Andrew Morton
2014-03-04  0:59   ` Davidlohr Bueso
2014-03-04  0:59     ` Davidlohr Bueso
2014-03-04  1:23     ` Andrew Morton
2014-03-04  2:42       ` Davidlohr Bueso
2014-03-04  2:42         ` Davidlohr Bueso
2014-03-04  3:12         ` Andrew Morton
2014-03-04  3:13           ` Davidlohr Bueso
2014-03-04  3:13             ` Davidlohr Bueso
2014-03-04  3:26             ` Andrew Morton
2014-03-04  3:26             ` Linus Torvalds
2014-03-04  3:26               ` Linus Torvalds
2014-03-04  5:32               ` Davidlohr Bueso
2014-03-04  5:32                 ` Davidlohr Bueso
2014-03-14  3:05               ` Li Zefan
2014-03-14  3:05                 ` Li Zefan
2014-03-14  4:43                 ` Andrew Morton
2014-03-14  4:43                   ` Andrew Morton
2014-03-06 22:56     ` Andrew Morton
2014-03-06 22:56       ` Andrew Morton
     [not found] ` <20140308184040.GA29602@redhat.com>
     [not found]   ` <CA+55aFw88xiY+o5FE6VtHNkpUZDK3FPt31oCpNsgn1BH7wAPZw@mail.gmail.com>
2014-03-08 19:57     ` Oleg Nesterov
     [not found]     ` <20140308194405.GA32403@redhat.com>
2014-03-08 20:02       ` Linus Torvalds
2014-03-09  3:22         ` Davidlohr Bueso
2014-03-09 12:57         ` Oleg Nesterov
2014-03-09 15:57           ` Linus Torvalds
2014-03-09 17:09             ` Oleg Nesterov
2014-03-09 17:16               ` Linus Torvalds
2014-03-10 19:56                 ` [PATCH -next] mm,vmacache: also flush cache for VM_CLONE Davidlohr Bueso
2014-03-13 14:59                   ` Oleg Nesterov
2014-03-13 15:32                     ` Oleg Nesterov
2014-03-13 19:04                       ` Davidlohr Bueso
     [not found]                     ` <CA+55aFyNd7L+G3hFauJPxUOengK-_o2G-SFmVooPZ-sE6xBj=g@mail.gmail.com>
2014-03-13 16:36                       ` Oleg Nesterov
2014-03-13 18:27                         ` async_pf.c && use_mm() (Was: mm,vmacache: also flush cache for VM_CLONE) Oleg Nesterov
     [not found]                           ` <CA+55aFwqTbsYCyPf6_i6RmBkPHpEhJjiRfZm6_1_yPa_kUkYiQ@mail.gmail.com>
2014-03-13 21:44                             ` Linus Torvalds
2014-03-14 18:23                               ` Oleg Nesterov
