linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Memory policy corruption fixes -stable
@ 2012-10-09 16:58 Mel Gorman
  2012-10-09 16:58 ` [PATCH 1/5] revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages" Mel Gorman
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Mel Gorman @ 2012-10-09 16:58 UTC (permalink / raw)
  To: Stable
  Cc: Andi Kleen, Andrew Morton, KOSAKI Motohiro, Dave Jones,
	Christoph Lameter, Hugh Dickins, LKML, Linux-MM, Mel Gorman

This is a backport of the series "Memory policy corruption fixes V2". This
should apply to 3.6-stable, 3.5-stable, 3.4-stable and 3.0-stable without
any difficulty.  It will not apply cleanly to 3.2 but just drop the "revert"
patch and the rest of the series should apply.

I tested 3.6-stable and 3.0-stable with just the revert and trinity breaks
as expected for the mempolicy tests. Applying the full series in both case
allowed trinity to complete successfully. Andi Kleen reported previously
that the series fixed a database performance regression[1].

[1] https://lkml.org/lkml/2012/8/22/585

 include/linux/mempolicy.h |    2 +-
 mm/mempolicy.c            |  137 +++++++++++++++++++++++++++++----------------
 2 files changed, 89 insertions(+), 50 deletions(-)

-- 
1.7.9.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/5] revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"
  2012-10-09 16:58 [PATCH 0/5] Memory policy corruption fixes -stable Mel Gorman
@ 2012-10-09 16:58 ` Mel Gorman
  2012-10-09 16:58 ` [PATCH 2/5] mempolicy: remove mempolicy sharing Mel Gorman
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Mel Gorman @ 2012-10-09 16:58 UTC (permalink / raw)
  To: Stable
  Cc: Andi Kleen, Andrew Morton, KOSAKI Motohiro, Dave Jones,
	Christoph Lameter, Hugh Dickins, LKML, Linux-MM, Mel Gorman

From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>

commit 8d34694c1abf29df1f3c7317936b7e3e2e308d9b upstream.

Commit 05f144a0d5c2 ("mm: mempolicy: Let vma_merge and vma_split handle
vma->vm_policy linkages") removed vma->vm_policy updates code but it is
the purpose of mbind_range().  Now, mbind_range() is virtually a no-op
and while it does not allow memory corruption it is not the right fix.
This patch is a revert.

[mgorman@suse.de: Edited changelog]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/mempolicy.c |   41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4ada3be..92daa26 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -607,6 +607,27 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
 	return first;
 }
 
+/* Apply policy to a single VMA */
+static int policy_vma(struct vm_area_struct *vma, struct mempolicy *new)
+{
+	int err = 0;
+	struct mempolicy *old = vma->vm_policy;
+
+	pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
+		 vma->vm_start, vma->vm_end, vma->vm_pgoff,
+		 vma->vm_ops, vma->vm_file,
+		 vma->vm_ops ? vma->vm_ops->set_policy : NULL);
+
+	if (vma->vm_ops && vma->vm_ops->set_policy)
+		err = vma->vm_ops->set_policy(vma, new);
+	if (!err) {
+		mpol_get(new);
+		vma->vm_policy = new;
+		mpol_put(old);
+	}
+	return err;
+}
+
 /* Step 2: apply policy to a range and do splits. */
 static int mbind_range(struct mm_struct *mm, unsigned long start,
 		       unsigned long end, struct mempolicy *new_pol)
@@ -655,23 +676,9 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 			if (err)
 				goto out;
 		}
-
-		/*
-		 * Apply policy to a single VMA. The reference counting of
-		 * policy for vma_policy linkages has already been handled by
-		 * vma_merge and split_vma as necessary. If this is a shared
-		 * policy then ->set_policy will increment the reference count
-		 * for an sp node.
-		 */
-		pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
-			vma->vm_start, vma->vm_end, vma->vm_pgoff,
-			vma->vm_ops, vma->vm_file,
-			vma->vm_ops ? vma->vm_ops->set_policy : NULL);
-		if (vma->vm_ops && vma->vm_ops->set_policy) {
-			err = vma->vm_ops->set_policy(vma, new_pol);
-			if (err)
-				goto out;
-		}
+		err = policy_vma(vma, new_pol);
+		if (err)
+			goto out;
 	}
 
  out:
-- 
1.7.9.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/5] mempolicy: remove mempolicy sharing
  2012-10-09 16:58 [PATCH 0/5] Memory policy corruption fixes -stable Mel Gorman
  2012-10-09 16:58 ` [PATCH 1/5] revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages" Mel Gorman
@ 2012-10-09 16:58 ` Mel Gorman
  2012-10-09 16:58 ` [PATCH 3/5] mempolicy: fix a race in shared_policy_replace() Mel Gorman
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Mel Gorman @ 2012-10-09 16:58 UTC (permalink / raw)
  To: Stable
  Cc: Andi Kleen, Andrew Morton, KOSAKI Motohiro, Dave Jones,
	Christoph Lameter, Hugh Dickins, LKML, Linux-MM, Mel Gorman

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

commit 869833f2c5c6e4dd09a5378cfc665ffb4615e5d2 upstream.

Dave Jones' system call fuzz testing tool "trinity" triggered the
following bug error with slab debugging enabled

    =============================================================================
    BUG numa_policy (Not tainted): Poison overwritten
    -----------------------------------------------------------------------------

    INFO: 0xffff880146498250-0xffff880146498250. First byte 0x6a instead of 0x6b
    INFO: Allocated in mpol_new+0xa3/0x140 age=46310 cpu=6 pid=32154
     __slab_alloc+0x3d3/0x445
     kmem_cache_alloc+0x29d/0x2b0
     mpol_new+0xa3/0x140
     sys_mbind+0x142/0x620
     system_call_fastpath+0x16/0x1b

    INFO: Freed in __mpol_put+0x27/0x30 age=46268 cpu=6 pid=32154
     __slab_free+0x2e/0x1de
     kmem_cache_free+0x25a/0x260
     __mpol_put+0x27/0x30
     remove_vma+0x68/0x90
     exit_mmap+0x118/0x140
     mmput+0x73/0x110
     exit_mm+0x108/0x130
     do_exit+0x162/0xb90
     do_group_exit+0x4f/0xc0
     sys_exit_group+0x17/0x20
     system_call_fastpath+0x16/0x1b

    INFO: Slab 0xffffea0005192600 objects=27 used=27 fp=0x          (null) flags=0x20000000004080
    INFO: Object 0xffff880146498250 @offset=592 fp=0xffff88014649b9d0

The problem is that the structure is being prematurely freed due to a
reference count imbalance. In the following case mbind(addr, len) should
replace the memory policies of both vma1 and vma2 and thus they will
become to share the same mempolicy and the new mempolicy will have the
MPOL_F_SHARED flag.

  +-------------------+-------------------+
  |     vma1          |     vma2(shmem)   |
  +-------------------+-------------------+
  |                                       |
 addr                                 addr+len

alloc_pages_vma() uses get_vma_policy() and mpol_cond_put() pair for
maintaining the mempolicy reference count.  The current rule is that
get_vma_policy() only increments refcount for shmem VMA and
mpol_conf_put() only decrements refcount if the policy has
MPOL_F_SHARED.

In above case, vma1 is not shmem vma and vma->policy has MPOL_F_SHARED!
The reference count will be decreased even though was not increased
whenever alloc_page_vma() is called.  This has been broken since commit
[52cd3b07: mempolicy: rework mempolicy Reference Counting] in 2008.

There is another serious bug with the sharing of memory policies.
Currently, mempolicy rebind logic (it is called from cpuset rebinding)
ignores a refcount of mempolicy and override it forcibly.  Thus, any
mempolicy sharing may cause mempolicy corruption.  The bug was
introduced by commit [68860ec1: cpusets: automatic numa mempolicy
rebinding].

Ideally, the shared policy handling would be rewritten to either
properly handle COW of the policy structures or at least reference count
MPOL_F_SHARED based exclusively on information within the policy.
However, this patch takes the easier approach of disabling any policy
sharing between VMAs.  Each new range allocated with sp_alloc will
allocate a new policy, set the reference count to 1 and drop the
reference count of the old policy.  This increases the memory footprint
but is not expected to be a major problem as mbind() is unlikely to be
used for fine-grained ranges.  It is also inefficient because it means
we allocate a new policy even in cases where mbind_range() could use the
new_policy passed to it.  However, it is more straight-forward and the
change should be invisible to the user.

[mgorman@suse.de: Edited changelog]
Reported-by: Dave Jones <davej@redhat.com>,
Cc: Christoph Lameter <cl@linux.com>,
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/mempolicy.c |   52 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 92daa26..f0728ae 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -607,24 +607,39 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
 	return first;
 }
 
-/* Apply policy to a single VMA */
-static int policy_vma(struct vm_area_struct *vma, struct mempolicy *new)
+/*
+ * Apply policy to a single VMA
+ * This must be called with the mmap_sem held for writing.
+ */
+static int vma_replace_policy(struct vm_area_struct *vma,
+						struct mempolicy *pol)
 {
-	int err = 0;
-	struct mempolicy *old = vma->vm_policy;
+	int err;
+	struct mempolicy *old;
+	struct mempolicy *new;
 
 	pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
 		 vma->vm_start, vma->vm_end, vma->vm_pgoff,
 		 vma->vm_ops, vma->vm_file,
 		 vma->vm_ops ? vma->vm_ops->set_policy : NULL);
 
-	if (vma->vm_ops && vma->vm_ops->set_policy)
+	new = mpol_dup(pol);
+	if (IS_ERR(new))
+		return PTR_ERR(new);
+
+	if (vma->vm_ops && vma->vm_ops->set_policy) {
 		err = vma->vm_ops->set_policy(vma, new);
-	if (!err) {
-		mpol_get(new);
-		vma->vm_policy = new;
-		mpol_put(old);
+		if (err)
+			goto err_out;
 	}
+
+	old = vma->vm_policy;
+	vma->vm_policy = new; /* protected by mmap_sem */
+	mpol_put(old);
+
+	return 0;
+ err_out:
+	mpol_put(new);
 	return err;
 }
 
@@ -676,7 +691,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 			if (err)
 				goto out;
 		}
-		err = policy_vma(vma, new_pol);
+		err = vma_replace_policy(vma, new_pol);
 		if (err)
 			goto out;
 	}
@@ -2153,15 +2168,24 @@ static void sp_delete(struct shared_policy *sp, struct sp_node *n)
 static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
 				struct mempolicy *pol)
 {
-	struct sp_node *n = kmem_cache_alloc(sn_cache, GFP_KERNEL);
+	struct sp_node *n;
+	struct mempolicy *newpol;
 
+	n = kmem_cache_alloc(sn_cache, GFP_KERNEL);
 	if (!n)
 		return NULL;
+
+	newpol = mpol_dup(pol);
+	if (IS_ERR(newpol)) {
+		kmem_cache_free(sn_cache, n);
+		return NULL;
+	}
+	newpol->flags |= MPOL_F_SHARED;
+
 	n->start = start;
 	n->end = end;
-	mpol_get(pol);
-	pol->flags |= MPOL_F_SHARED;	/* for unref */
-	n->policy = pol;
+	n->policy = newpol;
+
 	return n;
 }
 
-- 
1.7.9.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/5] mempolicy: fix a race in shared_policy_replace()
  2012-10-09 16:58 [PATCH 0/5] Memory policy corruption fixes -stable Mel Gorman
  2012-10-09 16:58 ` [PATCH 1/5] revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages" Mel Gorman
  2012-10-09 16:58 ` [PATCH 2/5] mempolicy: remove mempolicy sharing Mel Gorman
@ 2012-10-09 16:58 ` Mel Gorman
  2012-10-09 16:58 ` [PATCH 4/5] mempolicy: fix refcount leak in mpol_set_shared_policy() Mel Gorman
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Mel Gorman @ 2012-10-09 16:58 UTC (permalink / raw)
  To: Stable
  Cc: Andi Kleen, Andrew Morton, KOSAKI Motohiro, Dave Jones,
	Christoph Lameter, Hugh Dickins, LKML, Linux-MM, Mel Gorman

commit b22d127a39ddd10d93deee3d96e643657ad53a49 upstream.

shared_policy_replace() use of sp_alloc() is unsafe.  1) sp_node cannot
be dereferenced if sp->lock is not held and 2) another thread can modify
sp_node between spin_unlock for allocating a new sp node and next
spin_lock.  The bug was introduced before 2.6.12-rc2.

Kosaki's original patch for this problem was to allocate an sp node and
policy within shared_policy_replace and initialise it when the lock is
reacquired.  I was not keen on this approach because it partially
duplicates sp_alloc().  As the paths were sp->lock is taken are not that
performance critical this patch converts sp->lock to sp->mutex so it can
sleep when calling sp_alloc().

[kosaki.motohiro@jp.fujitsu.com: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/mempolicy.h |    2 +-
 mm/mempolicy.c            |   37 ++++++++++++++++---------------------
 2 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 95b738c..df08254 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -188,7 +188,7 @@ struct sp_node {
 
 struct shared_policy {
 	struct rb_root root;
-	spinlock_t lock;
+	struct mutex mutex;
 };
 
 void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f0728ae..b2f12ec 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2083,7 +2083,7 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
  */
 
 /* lookup first element intersecting start-end */
-/* Caller holds sp->lock */
+/* Caller holds sp->mutex */
 static struct sp_node *
 sp_lookup(struct shared_policy *sp, unsigned long start, unsigned long end)
 {
@@ -2147,13 +2147,13 @@ mpol_shared_policy_lookup(struct shared_policy *sp, unsigned long idx)
 
 	if (!sp->root.rb_node)
 		return NULL;
-	spin_lock(&sp->lock);
+	mutex_lock(&sp->mutex);
 	sn = sp_lookup(sp, idx, idx+1);
 	if (sn) {
 		mpol_get(sn->policy);
 		pol = sn->policy;
 	}
-	spin_unlock(&sp->lock);
+	mutex_unlock(&sp->mutex);
 	return pol;
 }
 
@@ -2193,10 +2193,10 @@ static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
 static int shared_policy_replace(struct shared_policy *sp, unsigned long start,
 				 unsigned long end, struct sp_node *new)
 {
-	struct sp_node *n, *new2 = NULL;
+	struct sp_node *n;
+	int ret = 0;
 
-restart:
-	spin_lock(&sp->lock);
+	mutex_lock(&sp->mutex);
 	n = sp_lookup(sp, start, end);
 	/* Take care of old policies in the same range. */
 	while (n && n->start < end) {
@@ -2209,16 +2209,14 @@ restart:
 		} else {
 			/* Old policy spanning whole new range. */
 			if (n->end > end) {
+				struct sp_node *new2;
+				new2 = sp_alloc(end, n->end, n->policy);
 				if (!new2) {
-					spin_unlock(&sp->lock);
-					new2 = sp_alloc(end, n->end, n->policy);
-					if (!new2)
-						return -ENOMEM;
-					goto restart;
+					ret = -ENOMEM;
+					goto out;
 				}
 				n->end = start;
 				sp_insert(sp, new2);
-				new2 = NULL;
 				break;
 			} else
 				n->end = start;
@@ -2229,12 +2227,9 @@ restart:
 	}
 	if (new)
 		sp_insert(sp, new);
-	spin_unlock(&sp->lock);
-	if (new2) {
-		mpol_put(new2->policy);
-		kmem_cache_free(sn_cache, new2);
-	}
-	return 0;
+out:
+	mutex_unlock(&sp->mutex);
+	return ret;
 }
 
 /**
@@ -2252,7 +2247,7 @@ void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol)
 	int ret;
 
 	sp->root = RB_ROOT;		/* empty tree == default mempolicy */
-	spin_lock_init(&sp->lock);
+	mutex_init(&sp->mutex);
 
 	if (mpol) {
 		struct vm_area_struct pvma;
@@ -2318,7 +2313,7 @@ void mpol_free_shared_policy(struct shared_policy *p)
 
 	if (!p->root.rb_node)
 		return;
-	spin_lock(&p->lock);
+	mutex_lock(&p->mutex);
 	next = rb_first(&p->root);
 	while (next) {
 		n = rb_entry(next, struct sp_node, nd);
@@ -2327,7 +2322,7 @@ void mpol_free_shared_policy(struct shared_policy *p)
 		mpol_put(n->policy);
 		kmem_cache_free(sn_cache, n);
 	}
-	spin_unlock(&p->lock);
+	mutex_unlock(&p->mutex);
 }
 
 /* assumes fs == KERNEL_DS */
-- 
1.7.9.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/5] mempolicy: fix refcount leak in mpol_set_shared_policy()
  2012-10-09 16:58 [PATCH 0/5] Memory policy corruption fixes -stable Mel Gorman
                   ` (2 preceding siblings ...)
  2012-10-09 16:58 ` [PATCH 3/5] mempolicy: fix a race in shared_policy_replace() Mel Gorman
@ 2012-10-09 16:58 ` Mel Gorman
  2012-10-09 16:58 ` [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma() Mel Gorman
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Mel Gorman @ 2012-10-09 16:58 UTC (permalink / raw)
  To: Stable
  Cc: Andi Kleen, Andrew Morton, KOSAKI Motohiro, Dave Jones,
	Christoph Lameter, Hugh Dickins, LKML, Linux-MM, Mel Gorman

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

commit 63f74ca21f1fad36d075e063f06dcc6d39fe86b2 upstream.

When shared_policy_replace() fails to allocate new->policy is not freed
correctly by mpol_set_shared_policy().  The problem is that shared
mempolicy code directly call kmem_cache_free() in multiple places where
it is easy to make a mistake.

This patch creates an sp_free wrapper function and uses it. The bug was
introduced pre-git age (IOW, before 2.6.12-rc2).

[mgorman@suse.de: Editted changelog]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/mempolicy.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b2f12ec..1763418 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2157,12 +2157,17 @@ mpol_shared_policy_lookup(struct shared_policy *sp, unsigned long idx)
 	return pol;
 }
 
+static void sp_free(struct sp_node *n)
+{
+	mpol_put(n->policy);
+	kmem_cache_free(sn_cache, n);
+}
+
 static void sp_delete(struct shared_policy *sp, struct sp_node *n)
 {
 	pr_debug("deleting %lx-l%lx\n", n->start, n->end);
 	rb_erase(&n->nd, &sp->root);
-	mpol_put(n->policy);
-	kmem_cache_free(sn_cache, n);
+	sp_free(n);
 }
 
 static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
@@ -2301,7 +2306,7 @@ int mpol_set_shared_policy(struct shared_policy *info,
 	}
 	err = shared_policy_replace(info, vma->vm_pgoff, vma->vm_pgoff+sz, new);
 	if (err && new)
-		kmem_cache_free(sn_cache, new);
+		sp_free(new);
 	return err;
 }
 
@@ -2318,9 +2323,7 @@ void mpol_free_shared_policy(struct shared_policy *p)
 	while (next) {
 		n = rb_entry(next, struct sp_node, nd);
 		next = rb_next(&n->nd);
-		rb_erase(&n->nd, &p->root);
-		mpol_put(n->policy);
-		kmem_cache_free(sn_cache, n);
+		sp_delete(p, n);
 	}
 	mutex_unlock(&p->mutex);
 }
-- 
1.7.9.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
  2012-10-09 16:58 [PATCH 0/5] Memory policy corruption fixes -stable Mel Gorman
                   ` (3 preceding siblings ...)
  2012-10-09 16:58 ` [PATCH 4/5] mempolicy: fix refcount leak in mpol_set_shared_policy() Mel Gorman
@ 2012-10-09 16:58 ` Mel Gorman
  2012-12-04 12:54   ` Tommi Rantala
  2012-10-10  0:47 ` [PATCH 0/5] Memory policy corruption fixes -stable Greg KH
  2012-10-14  9:13 ` Ben Hutchings
  6 siblings, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2012-10-09 16:58 UTC (permalink / raw)
  To: Stable
  Cc: Andi Kleen, Andrew Morton, KOSAKI Motohiro, Dave Jones,
	Christoph Lameter, Hugh Dickins, LKML, Linux-MM, Mel Gorman

commit 00442ad04a5eac08a98255697c510e708f6082e2 upstream.

Commit cc9a6c877661 ("cpuset: mm: reduce large amounts of memory barrier
related damage v3") introduced a potential memory corruption.
shmem_alloc_page() uses a pseudo vma and it has one significant unique
combination, vma->vm_ops=NULL and vma->policy->flags & MPOL_F_SHARED.

get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
and mpol_cond_put() DOES decrease a policy ref when a policy has
MPOL_F_SHARED.  Therefore, when a cpuset update race occurs,
alloc_pages_vma() falls in 'goto retry_cpuset' path, decrements the
reference count and frees the policy prematurely.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/mempolicy.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1763418..3d64b36 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1552,8 +1552,18 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
 									addr);
 			if (vpol)
 				pol = vpol;
-		} else if (vma->vm_policy)
+		} else if (vma->vm_policy) {
 			pol = vma->vm_policy;
+
+			/*
+			 * shmem_alloc_page() passes MPOL_F_SHARED policy with
+			 * a pseudo vma whose vma->vm_ops=NULL. Take a reference
+			 * count on these policies which will be dropped by
+			 * mpol_cond_put() later
+			 */
+			if (mpol_needs_cond_ref(pol))
+				mpol_get(pol);
+		}
 	}
 	if (!pol)
 		pol = &default_policy;
-- 
1.7.9.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/5] Memory policy corruption fixes -stable
  2012-10-09 16:58 [PATCH 0/5] Memory policy corruption fixes -stable Mel Gorman
                   ` (4 preceding siblings ...)
  2012-10-09 16:58 ` [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma() Mel Gorman
@ 2012-10-10  0:47 ` Greg KH
  2012-10-14  9:13 ` Ben Hutchings
  6 siblings, 0 replies; 22+ messages in thread
From: Greg KH @ 2012-10-10  0:47 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Stable, Andi Kleen, Andrew Morton, KOSAKI Motohiro, Dave Jones,
	Christoph Lameter, Hugh Dickins, LKML, Linux-MM

On Tue, Oct 09, 2012 at 05:58:36PM +0100, Mel Gorman wrote:
> This is a backport of the series "Memory policy corruption fixes V2". This
> should apply to 3.6-stable, 3.5-stable, 3.4-stable and 3.0-stable without
> any difficulty.  It will not apply cleanly to 3.2 but just drop the "revert"
> patch and the rest of the series should apply.
> 
> I tested 3.6-stable and 3.0-stable with just the revert and trinity breaks
> as expected for the mempolicy tests. Applying the full series in both case
> allowed trinity to complete successfully. Andi Kleen reported previously
> that the series fixed a database performance regression[1].
> 
> [1] https://lkml.org/lkml/2012/8/22/585
> 
>  include/linux/mempolicy.h |    2 +-
>  mm/mempolicy.c            |  137 +++++++++++++++++++++++++++++----------------
>  2 files changed, 89 insertions(+), 50 deletions(-)

Looks good, thanks, now all queued up.

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/5] Memory policy corruption fixes -stable
  2012-10-09 16:58 [PATCH 0/5] Memory policy corruption fixes -stable Mel Gorman
                   ` (5 preceding siblings ...)
  2012-10-10  0:47 ` [PATCH 0/5] Memory policy corruption fixes -stable Greg KH
@ 2012-10-14  9:13 ` Ben Hutchings
  6 siblings, 0 replies; 22+ messages in thread
From: Ben Hutchings @ 2012-10-14  9:13 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Stable, Andi Kleen, Andrew Morton, KOSAKI Motohiro, Dave Jones,
	Christoph Lameter, Hugh Dickins, LKML, Linux-MM

[-- Attachment #1: Type: text/plain, Size: 1024 bytes --]

On Tue, 2012-10-09 at 17:58 +0100, Mel Gorman wrote:
> This is a backport of the series "Memory policy corruption fixes V2". This
> should apply to 3.6-stable, 3.5-stable, 3.4-stable and 3.0-stable without
> any difficulty.  It will not apply cleanly to 3.2 but just drop the "revert"
> patch and the rest of the series should apply.
> 
> I tested 3.6-stable and 3.0-stable with just the revert and trinity breaks
> as expected for the mempolicy tests. Applying the full series in both case
> allowed trinity to complete successfully. Andi Kleen reported previously
> that the series fixed a database performance regression[1].
> 
> [1] https://lkml.org/lkml/2012/8/22/585
> 
>  include/linux/mempolicy.h |    2 +-
>  mm/mempolicy.c            |  137 +++++++++++++++++++++++++++++----------------
>  2 files changed, 89 insertions(+), 50 deletions(-)

I've queued up patches 2-5 for 3.2, thanks.

Ben.

-- 
Ben Hutchings
Always try to do things in chronological order;
it's less confusing that way.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
  2012-10-09 16:58 ` [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma() Mel Gorman
@ 2012-12-04 12:54   ` Tommi Rantala
  2012-12-04 14:15     ` Mel Gorman
  0 siblings, 1 reply; 22+ messages in thread
From: Tommi Rantala @ 2012-12-04 12:54 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Stable, Andi Kleen, Andrew Morton, KOSAKI Motohiro, Dave Jones,
	Christoph Lameter, Hugh Dickins, LKML, Linux-MM

2012/10/9 Mel Gorman <mgorman@suse.de>:
> commit 00442ad04a5eac08a98255697c510e708f6082e2 upstream.
>
> Commit cc9a6c877661 ("cpuset: mm: reduce large amounts of memory barrier
> related damage v3") introduced a potential memory corruption.
> shmem_alloc_page() uses a pseudo vma and it has one significant unique
> combination, vma->vm_ops=NULL and vma->policy->flags & MPOL_F_SHARED.
>
> get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
> and mpol_cond_put() DOES decrease a policy ref when a policy has
> MPOL_F_SHARED.  Therefore, when a cpuset update race occurs,
> alloc_pages_vma() falls in 'goto retry_cpuset' path, decrements the
> reference count and frees the policy prematurely.

Hello,

kmemleak is complaining about memory leaks that point to the mbind()
syscall. I've seen this only in v3.7-rcX, so I bisected this, and
found that this patch is the first mainline commit where I'm able to
reproduce it with Trinity.

$ ./trinity -q -C32 -c mbind -N100000
Trinity v1.1pre  Dave Jones <davej@redhat.com> 2012
[2823] Marking 64-bit syscall 237 (mbind) as enabled
[2823] Marking 32-bit syscall 274 (mbind) as enabled
Enabling syscall mbind
Initial random seed from time of day: 829620642 (0x317301a2)
[2824] Watchdog is alive
[2823] Started watchdog process, PID is 2824
[2825] Main thread is alive.
375 sockets created based on info from socket cachefile.
Generating file descriptors
Added 24 filenames from /dev
Added 23893 filenames from /proc
Added 8415 filenames from /sys
[2825] Random reseed: 4210789068 (0xfafb8acc)
[watchdog] 1060 iterations. [F:996 S:63]
[watchdog] 6588 iterations. [F:6119 S:468]
[watchdog] 12405 iterations. [F:11509 S:894]
[watchdog] 18163 iterations. [F:16850 S:1311]
[watchdog] 24001 iterations. [F:22260 S:1741]
[watchdog] 30122 iterations. [F:27935 S:2184]
[watchdog] 36074 iterations. [F:33465 S:2605]
[watchdog] 42042 iterations. [F:38971 S:3069]
[watchdog] 47949 iterations. [F:44419 S:3527]
[watchdog] 53873 iterations. [F:49908 S:3961]
[watchdog] 59719 iterations. [F:55345 S:4369]
[watchdog] 65583 iterations. [F:60787 S:4790]
[watchdog] 71690 iterations. [F:66451 S:5233]
[watchdog] 77755 iterations. [F:72070 S:5681]
[watchdog] 83850 iterations. [F:77714 S:6134]
[watchdog] 89877 iterations. [F:83296 S:6582]
[watchdog] 95890 iterations. [F:88851 S:7042]
[2825] Bailing main loop. Exit reason: Reached maximum syscall count
[2824] Watchdog exiting

Ran 100017 syscalls. Successes: 7355  Failures: 92665

# echo scan > /sys/kernel/debug/kmemleak
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88002c8b1060 (size 24):
  comm "trinity-child13", pid 2141, jiffies 4294861068 (age 1585.092s)
  hex dump (first 24 bytes):
    02 00 00 00 01 00 03 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00                          ........
  backtrace:
    [<ffffffff81a53481>] kmemleak_alloc+0x21/0x50
    [<ffffffff81196116>] kmem_cache_alloc+0x96/0x220
    [<ffffffff8118fd02>] __mpol_dup+0x22/0x190
    [<ffffffff8118feb8>] sp_alloc+0x48/0xa0
    [<ffffffff81190960>] mpol_set_shared_policy+0x40/0xd0
    [<ffffffff8115f1f8>] shmem_set_policy+0x28/0x30
    [<ffffffff811902c1>] mbind_range+0x1a1/0x210
    [<ffffffff811904fc>] do_mbind+0x1cc/0x2d0
    [<ffffffff811906a2>] sys_mbind+0xa2/0xb0
    [<ffffffff81a924a9>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff88003d83f168 (size 24):
  comm "trinity-child10", pid 2686, jiffies 4295117470 (age 1328.725s)
  hex dump (first 24 bytes):
    01 00 00 00 01 00 03 00 00 00 00 00 00 00 00 00  ................
    01 00 00 00 00 00 00 00                          ........
  backtrace:
    [<ffffffff81a53481>] kmemleak_alloc+0x21/0x50
    [<ffffffff81196116>] kmem_cache_alloc+0x96/0x220
    [<ffffffff8118fd02>] __mpol_dup+0x22/0x190
    [<ffffffff8118feb8>] sp_alloc+0x48/0xa0
    [<ffffffff81190960>] mpol_set_shared_policy+0x40/0xd0
    [<ffffffff8115f1f8>] shmem_set_policy+0x28/0x30
    [<ffffffff811902c1>] mbind_range+0x1a1/0x210
    [<ffffffff811904fc>] do_mbind+0x1cc/0x2d0
    [<ffffffff811906a2>] sys_mbind+0xa2/0xb0
    [<ffffffff81a924a9>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff
#

Since the patch is touching the reference counting, I suppose the
finding could be legit.

Tommi

> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Christoph Lameter <cl@linux.com>
> Cc: Josh Boyer <jwboyer@gmail.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  mm/mempolicy.c |   12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 1763418..3d64b36 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1552,8 +1552,18 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
>                                                                         addr);
>                         if (vpol)
>                                 pol = vpol;
> -               } else if (vma->vm_policy)
> +               } else if (vma->vm_policy) {
>                         pol = vma->vm_policy;
> +
> +                       /*
> +                        * shmem_alloc_page() passes MPOL_F_SHARED policy with
> +                        * a pseudo vma whose vma->vm_ops=NULL. Take a reference
> +                        * count on these policies which will be dropped by
> +                        * mpol_cond_put() later
> +                        */
> +                       if (mpol_needs_cond_ref(pol))
> +                               mpol_get(pol);
> +               }
>         }
>         if (!pol)
>                 pol = &default_policy;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
  2012-12-04 12:54   ` Tommi Rantala
@ 2012-12-04 14:15     ` Mel Gorman
  2012-12-05  5:11       ` Hugh Dickins
  0 siblings, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2012-12-04 14:15 UTC (permalink / raw)
  To: Tommi Rantala
  Cc: Stable, Andi Kleen, Andrew Morton, KOSAKI Motohiro, Dave Jones,
	Christoph Lameter, Hugh Dickins, LKML, Linux-MM

On Tue, Dec 04, 2012 at 02:54:08PM +0200, Tommi Rantala wrote:
> 2012/10/9 Mel Gorman <mgorman@suse.de>:
> > commit 00442ad04a5eac08a98255697c510e708f6082e2 upstream.
> >
> > Commit cc9a6c877661 ("cpuset: mm: reduce large amounts of memory barrier
> > related damage v3") introduced a potential memory corruption.
> > shmem_alloc_page() uses a pseudo vma and it has one significant unique
> > combination, vma->vm_ops=NULL and vma->policy->flags & MPOL_F_SHARED.
> >
> > get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
> > and mpol_cond_put() DOES decrease a policy ref when a policy has
> > MPOL_F_SHARED.  Therefore, when a cpuset update race occurs,
> > alloc_pages_vma() falls in 'goto retry_cpuset' path, decrements the
> > reference count and frees the policy prematurely.
> 
> Hello,
> 
> kmemleak is complaining about memory leaks that point to the mbind()
> syscall. I've seen this only in v3.7-rcX, so I bisected this, and
> found that this patch is the first mainline commit where I'm able to
> reproduce it with Trinity.
> 

Uncool.

I'm writing this from an airport so am not in the position to test properly
but at a glance I'm not seeing what drops the reference count taken by
mpol_shared_policy_lookup() in all cases.  vm_ops->get_policy() probably
gets it right but what about shmem_alloc_page() and shmem_swapin()?

This patch is only compile tested. If the reference counts are dropped
somewhere I did not spot quickly then it'll cause a use-after-free bug
instead but is worth trying anyway.

diff --git a/mm/shmem.c b/mm/shmem.c
index 89341b6..6229a43 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -912,6 +912,7 @@ static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
 {
 	struct mempolicy mpol, *spol;
 	struct vm_area_struct pvma;
+	struct page *page;
 
 	spol = mpol_cond_copy(&mpol,
 			mpol_shared_policy_lookup(&info->policy, index));
@@ -922,13 +923,19 @@ static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
 	pvma.vm_pgoff = index + info->vfs_inode.i_ino;
 	pvma.vm_ops = NULL;
 	pvma.vm_policy = spol;
-	return swapin_readahead(swap, gfp, &pvma, 0);
+	page = swapin_readahead(swap, gfp, &pvma, 0);
+
+	/* Drop reference taken by mpol_shared_policy_lookup() */
+	mpol_cond_put(pvma.vm_policy);
+
+	return page;
 }
 
 static struct page *shmem_alloc_page(gfp_t gfp,
 			struct shmem_inode_info *info, pgoff_t index)
 {
 	struct vm_area_struct pvma;
+	struct page *page;
 
 	/* Create a pseudo vma that just contains the policy */
 	pvma.vm_start = 0;
@@ -940,7 +947,12 @@ static struct page *shmem_alloc_page(gfp_t gfp,
 	/*
 	 * alloc_page_vma() will drop the shared policy reference
 	 */
-	return alloc_page_vma(gfp, &pvma, 0);
+	page = alloc_page_vma(gfp, &pvma, 0);
+
+	/* Drop reference taken by mpol_shared_policy_lookup() */
+	mpol_cond_put(pvma.vm_policy);
+
+	return page;
 }
 #else /* !CONFIG_NUMA */
 #ifdef CONFIG_TMPFS



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
  2012-12-04 14:15     ` Mel Gorman
@ 2012-12-05  5:11       ` Hugh Dickins
  2012-12-05  6:28         ` Hugh Dickins
  0 siblings, 1 reply; 22+ messages in thread
From: Hugh Dickins @ 2012-12-05  5:11 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Tommi Rantala, Stable, Andi Kleen, Andrew Morton,
	KOSAKI Motohiro, Dave Jones, Christoph Lameter, LKML, Linux-MM

On Tue, 4 Dec 2012, Mel Gorman wrote:
> On Tue, Dec 04, 2012 at 02:54:08PM +0200, Tommi Rantala wrote:
> > 2012/10/9 Mel Gorman <mgorman@suse.de>:
> > > commit 00442ad04a5eac08a98255697c510e708f6082e2 upstream.
> > >
> > > Commit cc9a6c877661 ("cpuset: mm: reduce large amounts of memory barrier
> > > related damage v3") introduced a potential memory corruption.
> > > shmem_alloc_page() uses a pseudo vma and it has one significant unique
> > > combination, vma->vm_ops=NULL and vma->policy->flags & MPOL_F_SHARED.
> > >
> > > get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
> > > and mpol_cond_put() DOES decrease a policy ref when a policy has
> > > MPOL_F_SHARED.  Therefore, when a cpuset update race occurs,
> > > alloc_pages_vma() falls in 'goto retry_cpuset' path, decrements the
> > > reference count and frees the policy prematurely.
> > 
> > Hello,
> > 
> > kmemleak is complaining about memory leaks that point to the mbind()
> > syscall. I've seen this only in v3.7-rcX, so I bisected this, and
> > found that this patch is the first mainline commit where I'm able to
> > reproduce it with Trinity.
> > 
> 
> Uncool.
> 
> I'm writing this from an airport so am not in the position to test properly
> but at a glance I'm not seeing what drops the reference count taken by
> mpol_shared_policy_lookup() in all cases.  vm_ops->get_policy() probably
> gets it right but what about shmem_alloc_page() and shmem_swapin()?
> 
> This patch is only compile tested. If the reference counts are dropped
> somewhere I did not spot quickly then it'll cause a use-after-free bug
> instead but is worth trying anyway.
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 89341b6..6229a43 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -912,6 +912,7 @@ static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
>  {
>  	struct mempolicy mpol, *spol;
>  	struct vm_area_struct pvma;
> +	struct page *page;
>  
>  	spol = mpol_cond_copy(&mpol,
>  			mpol_shared_policy_lookup(&info->policy, index));
> @@ -922,13 +923,19 @@ static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
>  	pvma.vm_pgoff = index + info->vfs_inode.i_ino;
>  	pvma.vm_ops = NULL;
>  	pvma.vm_policy = spol;
> -	return swapin_readahead(swap, gfp, &pvma, 0);
> +	page = swapin_readahead(swap, gfp, &pvma, 0);
> +
> +	/* Drop reference taken by mpol_shared_policy_lookup() */
> +	mpol_cond_put(pvma.vm_policy);
> +
> +	return page;
>  }
>  
>  static struct page *shmem_alloc_page(gfp_t gfp,
>  			struct shmem_inode_info *info, pgoff_t index)
>  {
>  	struct vm_area_struct pvma;
> +	struct page *page;
>  
>  	/* Create a pseudo vma that just contains the policy */
>  	pvma.vm_start = 0;
> @@ -940,7 +947,12 @@ static struct page *shmem_alloc_page(gfp_t gfp,
>  	/*
>  	 * alloc_page_vma() will drop the shared policy reference
>  	 */
> -	return alloc_page_vma(gfp, &pvma, 0);
> +	page = alloc_page_vma(gfp, &pvma, 0);
> +
> +	/* Drop reference taken by mpol_shared_policy_lookup() */
> +	mpol_cond_put(pvma.vm_policy);
> +
> +	return page;
>  }
>  #else /* !CONFIG_NUMA */
>  #ifdef CONFIG_TMPFS

Thank you, Tommi and Mel.  Easy enough for me to reproduce without
kmemleak and trinity, by mounting a tmpfs with mpol= and keeping an
eye on numa_policy in /proc/slabinfo while building a tree there.

Yes, your patch fixes it Mel, but I prefer it as below, with a couple
of mods: removing the no longer true comment, and leaving shmem_swapin()
alone with just a comment.  It appears to be the job of the rather weird
mpol_cond_copy() to drop the reference on the original mempolicy, and
clear MPOL_F_SHARED so the copy won't need one (it's trying to cope with
the fact that swapin_readahead will make an unknown number of calls to
alloc_page_vma).  So I'd rather not add another mpol_cond_put there,
whose cond will never be met.

I don't much like the result, but that's because it's adding further
cruft on top of the onstack pseudo-vma stuff: more impetus for me to
revisit the alloc_page_mpol() patch I worked on years ago, but gave
up when I couldn't understand the mpol refcounting: hopefully I'll
find that Kosaki's changes have made it all clearer now.

Please consent to the addition of your signoff: thanks!


[PATCH] tmpfs: fix shared mempolicy leak

From: Mel Gorman <mgorman@suse.de>

Commit 00442ad04a5e ("mempolicy: fix a memory corruption by refcount
imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
on expecting alloc_page_vma() to drop the refcount it had acquired.
This deserves a rework: but for now fix the leak in shmem_alloc_page().

Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Awaiting-Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
---

 mm/shmem.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

--- 3.7-rc8/mm/shmem.c	2012-11-16 19:26:56.388459961 -0800
+++ linux/mm/shmem.c	2012-12-04 20:00:44.556241603 -0800
@@ -922,13 +922,17 @@ static struct page *shmem_swapin(swp_ent
 	pvma.vm_pgoff = index + info->vfs_inode.i_ino;
 	pvma.vm_ops = NULL;
 	pvma.vm_policy = spol;
+
 	return swapin_readahead(swap, gfp, &pvma, 0);
+
+	/* mpol_cond_copy already dropped ref from mpol_shared_policy_lookup */
 }
 
 static struct page *shmem_alloc_page(gfp_t gfp,
 			struct shmem_inode_info *info, pgoff_t index)
 {
 	struct vm_area_struct pvma;
+	struct page *page;
 
 	/* Create a pseudo vma that just contains the policy */
 	pvma.vm_start = 0;
@@ -937,10 +941,12 @@ static struct page *shmem_alloc_page(gfp
 	pvma.vm_ops = NULL;
 	pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);
 
-	/*
-	 * alloc_page_vma() will drop the shared policy reference
-	 */
-	return alloc_page_vma(gfp, &pvma, 0);
+	page = alloc_page_vma(gfp, &pvma, 0);
+
+	/* Drop reference taken by mpol_shared_policy_lookup() */
+	mpol_cond_put(pvma.vm_policy);
+
+	return page;
 }
 #else /* !CONFIG_NUMA */
 #ifdef CONFIG_TMPFS

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
  2012-12-05  5:11       ` Hugh Dickins
@ 2012-12-05  6:28         ` Hugh Dickins
  2012-12-05  7:24           ` [PATCH] tmpfs: fix shared mempolicy leak Hugh Dickins
  0 siblings, 1 reply; 22+ messages in thread
From: Hugh Dickins @ 2012-12-05  6:28 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Tommi Rantala, Stable, Andi Kleen, Andrew Morton,
	KOSAKI Motohiro, Dave Jones, Christoph Lameter, LKML, Linux-MM

On Tue, 4 Dec 2012, Hugh Dickins wrote:
> 
> Yes, your patch fixes it Mel, but I prefer it as below, with a couple
> of mods: removing the no longer true comment, and leaving shmem_swapin()
> alone with just a comment.  It appears to be the job of the rather weird
> mpol_cond_copy() to drop the reference on the original mempolicy, and
> clear MPOL_F_SHARED so the copy won't need one (it's trying to cope with
> the fact that swapin_readahead will make an unknown number of calls to
> alloc_page_vma).  So I'd rather not add another mpol_cond_put there,
> whose cond will never be met.

Hold on, ignore that patch for now, I think I had my priorities
upside down: it would be better for shmem_swapin() to behave as
you proposed, and we delete the mpol_cond_copy() weirdness instead.

Your 00442ad04a5e changed alloc_pages_vma() to keep its refcounting
in balance, so it now does not matter that swapin_readahead() makes
an unknown number of calls to it: we should simply take a reference
before and drop it after, just as you do in shmem_alloc_page().

I'd still like to revisit alloc_page_vma(), and its refcount
manipulations do now appear redundant; but changing that is not
something I want to get into in a last minute rush.  But getting rid
of mpol_cond_copy() should be safe and clear, I'll test that out now
and reply with an updated patch (or else admit I got confused).

Hugh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] tmpfs: fix shared mempolicy leak
  2012-12-05  6:28         ` Hugh Dickins
@ 2012-12-05  7:24           ` Hugh Dickins
  2012-12-05  9:52             ` Mel Gorman
  0 siblings, 1 reply; 22+ messages in thread
From: Hugh Dickins @ 2012-12-05  7:24 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Tommi Rantala, Stable, Andi Kleen, Andrew Morton,
	KOSAKI Motohiro, Dave Jones, Christoph Lameter, LKML, Linux-MM

From: Mel Gorman <mgorman@suse.de>

Commit 00442ad04a5e ("mempolicy: fix a memory corruption by refcount
imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
on expecting alloc_page_vma() to drop the refcount it had acquired.
This deserves a rework: but for now fix the leak in shmem_alloc_page().

Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use
the same refcounting there as in shmem_alloc_page(), delete its onstack
mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() -
those were invented to let swapin_readahead() make an unknown number of
calls to alloc_pages_vma() with one mempolicy; but since 00442ad04a5e,
alloc_pages_vma() has kept refcount in balance, so now no problem.

Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Awaiting-signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
---

 include/linux/mempolicy.h |   16 ----------------
 mm/mempolicy.c            |   22 ----------------------
 mm/shmem.c                |   26 ++++++++++++++++----------
 3 files changed, 16 insertions(+), 48 deletions(-)

--- 3.7-rc8/include/linux/mempolicy.h	2012-10-14 16:16:57.637308925 -0700
+++ linux/include/linux/mempolicy.h	2012-12-04 22:38:21.812178829 -0800
@@ -82,16 +82,6 @@ static inline void mpol_cond_put(struct
 		__mpol_put(pol);
 }
 
-extern struct mempolicy *__mpol_cond_copy(struct mempolicy *tompol,
-					  struct mempolicy *frompol);
-static inline struct mempolicy *mpol_cond_copy(struct mempolicy *tompol,
-						struct mempolicy *frompol)
-{
-	if (!frompol)
-		return frompol;
-	return __mpol_cond_copy(tompol, frompol);
-}
-
 extern struct mempolicy *__mpol_dup(struct mempolicy *pol);
 static inline struct mempolicy *mpol_dup(struct mempolicy *pol)
 {
@@ -215,12 +205,6 @@ static inline void mpol_cond_put(struct
 {
 }
 
-static inline struct mempolicy *mpol_cond_copy(struct mempolicy *to,
-						struct mempolicy *from)
-{
-	return from;
-}
-
 static inline void mpol_get(struct mempolicy *pol)
 {
 }
--- 3078/mm/mempolicy.c	2012-10-20 20:56:24.675917367 -0700
+++ 3078X/mm/mempolicy.c	2012-12-04 22:33:31.516171929 -0800
@@ -2037,28 +2037,6 @@ struct mempolicy *__mpol_dup(struct memp
 	return new;
 }
 
-/*
- * If *frompol needs [has] an extra ref, copy *frompol to *tompol ,
- * eliminate the * MPOL_F_* flags that require conditional ref and
- * [NOTE!!!] drop the extra ref.  Not safe to reference *frompol directly
- * after return.  Use the returned value.
- *
- * Allows use of a mempolicy for, e.g., multiple allocations with a single
- * policy lookup, even if the policy needs/has extra ref on lookup.
- * shmem_readahead needs this.
- */
-struct mempolicy *__mpol_cond_copy(struct mempolicy *tompol,
-						struct mempolicy *frompol)
-{
-	if (!mpol_needs_cond_ref(frompol))
-		return frompol;
-
-	*tompol = *frompol;
-	tompol->flags &= ~MPOL_F_SHARED;	/* copy doesn't need unref */
-	__mpol_put(frompol);
-	return tompol;
-}
-
 /* Slow path of a mempolicy comparison */
 bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
 {
--- 3078/mm/shmem.c	2012-11-16 19:26:56.388459961 -0800
+++ 3078X/mm/shmem.c	2012-12-04 22:32:35.328170594 -0800
@@ -910,25 +910,29 @@ static struct mempolicy *shmem_get_sbmpo
 static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
 			struct shmem_inode_info *info, pgoff_t index)
 {
-	struct mempolicy mpol, *spol;
 	struct vm_area_struct pvma;
-
-	spol = mpol_cond_copy(&mpol,
-			mpol_shared_policy_lookup(&info->policy, index));
+	struct page *page;
 
 	/* Create a pseudo vma that just contains the policy */
 	pvma.vm_start = 0;
 	/* Bias interleave by inode number to distribute better across nodes */
 	pvma.vm_pgoff = index + info->vfs_inode.i_ino;
 	pvma.vm_ops = NULL;
-	pvma.vm_policy = spol;
-	return swapin_readahead(swap, gfp, &pvma, 0);
+	pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);
+
+	page = swapin_readahead(swap, gfp, &pvma, 0);
+
+	/* Drop reference taken by mpol_shared_policy_lookup() */
+	mpol_cond_put(pvma.vm_policy);
+
+	return page;
 }
 
 static struct page *shmem_alloc_page(gfp_t gfp,
 			struct shmem_inode_info *info, pgoff_t index)
 {
 	struct vm_area_struct pvma;
+	struct page *page;
 
 	/* Create a pseudo vma that just contains the policy */
 	pvma.vm_start = 0;
@@ -937,10 +941,12 @@ static struct page *shmem_alloc_page(gfp
 	pvma.vm_ops = NULL;
 	pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);
 
-	/*
-	 * alloc_page_vma() will drop the shared policy reference
-	 */
-	return alloc_page_vma(gfp, &pvma, 0);
+	page = alloc_page_vma(gfp, &pvma, 0);
+
+	/* Drop reference taken by mpol_shared_policy_lookup() */
+	mpol_cond_put(pvma.vm_policy);
+
+	return page;
 }
 #else /* !CONFIG_NUMA */
 #ifdef CONFIG_TMPFS

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tmpfs: fix shared mempolicy leak
  2012-12-05  7:24           ` [PATCH] tmpfs: fix shared mempolicy leak Hugh Dickins
@ 2012-12-05  9:52             ` Mel Gorman
  2012-12-05 20:25               ` Tommi Rantala
  0 siblings, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2012-12-05  9:52 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Tommi Rantala, Stable, Andi Kleen, Andrew Morton,
	KOSAKI Motohiro, Dave Jones, Christoph Lameter, LKML, Linux-MM

On Tue, Dec 04, 2012 at 11:24:30PM -0800, Hugh Dickins wrote:
> From: Mel Gorman <mgorman@suse.de>
> 
> Commit 00442ad04a5e ("mempolicy: fix a memory corruption by refcount
> imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
> refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
> on expecting alloc_page_vma() to drop the refcount it had acquired.
> This deserves a rework: but for now fix the leak in shmem_alloc_page().
> 
> Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use
> the same refcounting there as in shmem_alloc_page(), delete its onstack
> mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() -
> those were invented to let swapin_readahead() make an unknown number of
> calls to alloc_pages_vma() with one mempolicy; but since 00442ad04a5e,
> alloc_pages_vma() has kept refcount in balance, so now no problem.
> 

Agreed. Anything that reduces the complexity of the mempolicy ref counting
is worthwhile even if it's only by a small bit.

> Reported-by: Tommi Rantala <tt.rantala@gmail.com>
> Awaiting-signed-off-by: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: stable@vger.kernel.org

Thanks Hugh for turning gibber into a patch!

Signed-off-by: Mel Gorman <mgorman@suse.de>

Tommi, just in case, can you confirm this fixes the problem for you please?

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tmpfs: fix shared mempolicy leak
  2012-12-05  9:52             ` Mel Gorman
@ 2012-12-05 20:25               ` Tommi Rantala
  2012-12-05 21:59                 ` Hugh Dickins
  0 siblings, 1 reply; 22+ messages in thread
From: Tommi Rantala @ 2012-12-05 20:25 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Hugh Dickins, Stable, Andi Kleen, Andrew Morton, KOSAKI Motohiro,
	Dave Jones, Christoph Lameter, LKML, Linux-MM

2012/12/5 Mel Gorman <mgorman@suse.de>:
> On Tue, Dec 04, 2012 at 11:24:30PM -0800, Hugh Dickins wrote:
>> From: Mel Gorman <mgorman@suse.de>
>>
>> Commit 00442ad04a5e ("mempolicy: fix a memory corruption by refcount
>> imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
>> refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
>> on expecting alloc_page_vma() to drop the refcount it had acquired.
>> This deserves a rework: but for now fix the leak in shmem_alloc_page().
>
> Thanks Hugh for turning gibber into a patch!
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
>
> Tommi, just in case, can you confirm this fixes the problem for you please?

Confirmed! No more complaints from kmemleak.

Thanks,
Tommi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tmpfs: fix shared mempolicy leak
  2012-12-05 20:25               ` Tommi Rantala
@ 2012-12-05 21:59                 ` Hugh Dickins
  2012-12-05 22:01                   ` Hugh Dickins
  0 siblings, 1 reply; 22+ messages in thread
From: Hugh Dickins @ 2012-12-05 21:59 UTC (permalink / raw)
  To: Tommi Rantala
  Cc: Mel Gorman, Andi Kleen, Andrew Morton, KOSAKI Motohiro,
	Dave Jones, Christoph Lameter, LKML, Linux-MM

On Wed, 5 Dec 2012, Tommi Rantala wrote:
> 2012/12/5 Mel Gorman <mgorman@suse.de>:
> > On Tue, Dec 04, 2012 at 11:24:30PM -0800, Hugh Dickins wrote:
> >> From: Mel Gorman <mgorman@suse.de>
> >>
> >> Commit 00442ad04a5e ("mempolicy: fix a memory corruption by refcount
> >> imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
> >> refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
> >> on expecting alloc_page_vma() to drop the refcount it had acquired.
> >> This deserves a rework: but for now fix the leak in shmem_alloc_page().
> >
> > Thanks Hugh for turning gibber into a patch!
> >
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> >
> > Tommi, just in case, can you confirm this fixes the problem for you please?
> 
> Confirmed! No more complaints from kmemleak.

Great, thanks.  I'll update the tags and send straight to Linus now.

Hugh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] tmpfs: fix shared mempolicy leak
  2012-12-05 21:59                 ` Hugh Dickins
@ 2012-12-05 22:01                   ` Hugh Dickins
  0 siblings, 0 replies; 22+ messages in thread
From: Hugh Dickins @ 2012-12-05 22:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tommi Rantala, Mel Gorman, Andi Kleen, Andrew Morton,
	KOSAKI Motohiro, Dave Jones, Christoph Lameter, linux-kernel,
	linux-mm

From: Mel Gorman <mgorman@suse.de>

This fixes a regression in 3.7-rc, which has since gone into stable.

Commit 00442ad04a5e ("mempolicy: fix a memory corruption by refcount
imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
on expecting alloc_page_vma() to drop the refcount it had acquired.
This deserves a rework: but for now fix the leak in shmem_alloc_page().

Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use
the same refcounting there as in shmem_alloc_page(), delete its onstack
mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() -
those were invented to let swapin_readahead() make an unknown number of
calls to alloc_pages_vma() with one mempolicy; but since 00442ad04a5e,
alloc_pages_vma() has kept refcount in balance, so now no problem.

Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
---

 include/linux/mempolicy.h |   16 ----------------
 mm/mempolicy.c            |   22 ----------------------
 mm/shmem.c                |   26 ++++++++++++++++----------
 3 files changed, 16 insertions(+), 48 deletions(-)

--- 3.7-rc8/include/linux/mempolicy.h	2012-10-14 16:16:57.637308925 -0700
+++ linux/include/linux/mempolicy.h	2012-12-04 22:38:21.812178829 -0800
@@ -82,16 +82,6 @@ static inline void mpol_cond_put(struct
 		__mpol_put(pol);
 }
 
-extern struct mempolicy *__mpol_cond_copy(struct mempolicy *tompol,
-					  struct mempolicy *frompol);
-static inline struct mempolicy *mpol_cond_copy(struct mempolicy *tompol,
-						struct mempolicy *frompol)
-{
-	if (!frompol)
-		return frompol;
-	return __mpol_cond_copy(tompol, frompol);
-}
-
 extern struct mempolicy *__mpol_dup(struct mempolicy *pol);
 static inline struct mempolicy *mpol_dup(struct mempolicy *pol)
 {
@@ -215,12 +205,6 @@ static inline void mpol_cond_put(struct
 {
 }
 
-static inline struct mempolicy *mpol_cond_copy(struct mempolicy *to,
-						struct mempolicy *from)
-{
-	return from;
-}
-
 static inline void mpol_get(struct mempolicy *pol)
 {
 }
--- 3078/mm/mempolicy.c	2012-10-20 20:56:24.675917367 -0700
+++ 3078X/mm/mempolicy.c	2012-12-04 22:33:31.516171929 -0800
@@ -2037,28 +2037,6 @@ struct mempolicy *__mpol_dup(struct memp
 	return new;
 }
 
-/*
- * If *frompol needs [has] an extra ref, copy *frompol to *tompol ,
- * eliminate the * MPOL_F_* flags that require conditional ref and
- * [NOTE!!!] drop the extra ref.  Not safe to reference *frompol directly
- * after return.  Use the returned value.
- *
- * Allows use of a mempolicy for, e.g., multiple allocations with a single
- * policy lookup, even if the policy needs/has extra ref on lookup.
- * shmem_readahead needs this.
- */
-struct mempolicy *__mpol_cond_copy(struct mempolicy *tompol,
-						struct mempolicy *frompol)
-{
-	if (!mpol_needs_cond_ref(frompol))
-		return frompol;
-
-	*tompol = *frompol;
-	tompol->flags &= ~MPOL_F_SHARED;	/* copy doesn't need unref */
-	__mpol_put(frompol);
-	return tompol;
-}
-
 /* Slow path of a mempolicy comparison */
 bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
 {
--- 3078/mm/shmem.c	2012-11-16 19:26:56.388459961 -0800
+++ 3078X/mm/shmem.c	2012-12-04 22:32:35.328170594 -0800
@@ -910,25 +910,29 @@ static struct mempolicy *shmem_get_sbmpo
 static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
 			struct shmem_inode_info *info, pgoff_t index)
 {
-	struct mempolicy mpol, *spol;
 	struct vm_area_struct pvma;
-
-	spol = mpol_cond_copy(&mpol,
-			mpol_shared_policy_lookup(&info->policy, index));
+	struct page *page;
 
 	/* Create a pseudo vma that just contains the policy */
 	pvma.vm_start = 0;
 	/* Bias interleave by inode number to distribute better across nodes */
 	pvma.vm_pgoff = index + info->vfs_inode.i_ino;
 	pvma.vm_ops = NULL;
-	pvma.vm_policy = spol;
-	return swapin_readahead(swap, gfp, &pvma, 0);
+	pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);
+
+	page = swapin_readahead(swap, gfp, &pvma, 0);
+
+	/* Drop reference taken by mpol_shared_policy_lookup() */
+	mpol_cond_put(pvma.vm_policy);
+
+	return page;
 }
 
 static struct page *shmem_alloc_page(gfp_t gfp,
 			struct shmem_inode_info *info, pgoff_t index)
 {
 	struct vm_area_struct pvma;
+	struct page *page;
 
 	/* Create a pseudo vma that just contains the policy */
 	pvma.vm_start = 0;
@@ -937,10 +941,12 @@ static struct page *shmem_alloc_page(gfp
 	pvma.vm_ops = NULL;
 	pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);
 
-	/*
-	 * alloc_page_vma() will drop the shared policy reference
-	 */
-	return alloc_page_vma(gfp, &pvma, 0);
+	page = alloc_page_vma(gfp, &pvma, 0);
+
+	/* Drop reference taken by mpol_shared_policy_lookup() */
+	mpol_cond_put(pvma.vm_policy);
+
+	return page;
 }
 #else /* !CONFIG_NUMA */
 #ifdef CONFIG_TMPFS

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
  2012-08-21  7:26     ` Mel Gorman
  2012-08-21 15:37       ` Christoph Lameter
@ 2012-09-07 23:06       ` KOSAKI Motohiro
  1 sibling, 0 replies; 22+ messages in thread
From: KOSAKI Motohiro @ 2012-09-07 23:06 UTC (permalink / raw)
  To: mgorman
  Cc: cl, akpm, kosaki.motohiro, davej, ben, ak, hughd, linux-kernel, linux-mm

> mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
> 
> [cc9a6c87: cpuset: mm: reduce large amounts of memory barrier related damage
> v3] introduced a potential memory corruption. shmem_alloc_page() uses a
> pseudo vma and it has one significant unique combination, vma->vm_ops=NULL
> and vma->policy->flags & MPOL_F_SHARED.
> 
> get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL and
> mpol_cond_put() DOES decrease a policy ref when a policy has MPOL_F_SHARED.
> Therefore, when a cpuset update race occurs, alloc_pages_vma() falls in 'goto
> retry_cpuset' path, decrements the reference count and frees the policy
> prematurely.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  mm/mempolicy.c |   12 +++++++++++-
>  1 files changed, 11 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 45f9825..9842ef5 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1552,8 +1552,18 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
>  									addr);
>  			if (vpol)
>  				pol = vpol;
> -		} else if (vma->vm_policy)
> +		} else if (vma->vm_policy) {
>  			pol = vma->vm_policy;
> +
> +			/*
> +			 * shmem_alloc_page() passes MPOL_F_SHARED policy with
> +			 * a pseudo vma whose vma->vm_ops=NULL. Take a reference
> +			 * count on these policies which will be dropped by
> +			 * mpol_cond_put() later
> +			 */
> +			if (mpol_needs_cond_ref(pol))
> +				mpol_get(pol);
> +		}

Ok, looks sene change. thank you.


Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

>  	}
>  	if (!pol)
>  		pol = &default_policy;
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
  2012-08-21  7:26     ` Mel Gorman
@ 2012-08-21 15:37       ` Christoph Lameter
  2012-09-07 23:06       ` KOSAKI Motohiro
  1 sibling, 0 replies; 22+ messages in thread
From: Christoph Lameter @ 2012-08-21 15:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, KOSAKI Motohiro, Dave Jones, Ben Hutchings,
	Andi Kleen, Hugh Dickins, LKML, Linux-MM

On Tue, 21 Aug 2012, Mel Gorman wrote:

> mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()

Reviewed-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
  2012-08-20 19:51   ` Christoph Lameter
@ 2012-08-21  7:26     ` Mel Gorman
  2012-08-21 15:37       ` Christoph Lameter
  2012-09-07 23:06       ` KOSAKI Motohiro
  0 siblings, 2 replies; 22+ messages in thread
From: Mel Gorman @ 2012-08-21  7:26 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, KOSAKI Motohiro, Dave Jones, Ben Hutchings,
	Andi Kleen, Hugh Dickins, LKML, Linux-MM

On Mon, Aug 20, 2012 at 07:51:10PM +0000, Christoph Lameter wrote:
> On Mon, 20 Aug 2012, Mel Gorman wrote:
> 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 45f9825..82e872f 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -1545,15 +1545,28 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
> >  		struct vm_area_struct *vma, unsigned long addr)
> >  {
> >  	struct mempolicy *pol = task->mempolicy;
> > +	int got_ref;
> 
> New variable. Need to set it to zero?
> 

Not needed at all, I was meant to get rid of it. Ben had pointed out this
exact problem with the initialisation.

> >
> >  	if (vma) {
> >  		if (vma->vm_ops && vma->vm_ops->get_policy) {
> >  			struct mempolicy *vpol = vma->vm_ops->get_policy(vma,
> >  									addr);
> > -			if (vpol)
> > +			if (vpol) {
> >  				pol = vpol;
> > -		} else if (vma->vm_policy)
> > +				got_ref = 1;
> 
> Set the new variable. But it was not initialzed before. So now its 1 or
> undefined?
> 

It's not even needed because the next block is the code that originally
cared about the value of got_ref.

> > +			}
> > +		} else if (vma->vm_policy) {
> >  			pol = vma->vm_policy;
> > +
> > +			/*
> > +			 * shmem_alloc_page() passes MPOL_F_SHARED policy with
> > +			 * a pseudo vma whose vma->vm_ops=NULL. Take a reference
> > +			 * count on these policies which will be dropped by
> > +			 * mpol_cond_put() later
> > +			 */
> > +			if (mpol_needs_cond_ref(pol))
> > +				mpol_get(pol);
> > +		}
> >  	}
> >  	if (!pol)
> >  		pol = &default_policy;
> >
> 
> I do not see any use of got_ref. Can we get rid of the variable?
> 

Yes, here is a correct version of the patch. Thanks Christoph.

---8<---
mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()

[cc9a6c87: cpuset: mm: reduce large amounts of memory barrier related damage
v3] introduced a potential memory corruption. shmem_alloc_page() uses a
pseudo vma and it has one significant unique combination, vma->vm_ops=NULL
and vma->policy->flags & MPOL_F_SHARED.

get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL and
mpol_cond_put() DOES decrease a policy ref when a policy has MPOL_F_SHARED.
Therefore, when a cpuset update race occurs, alloc_pages_vma() falls in 'goto
retry_cpuset' path, decrements the reference count and frees the policy
prematurely.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/mempolicy.c |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 45f9825..9842ef5 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1552,8 +1552,18 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
 									addr);
 			if (vpol)
 				pol = vpol;
-		} else if (vma->vm_policy)
+		} else if (vma->vm_policy) {
 			pol = vma->vm_policy;
+
+			/*
+			 * shmem_alloc_page() passes MPOL_F_SHARED policy with
+			 * a pseudo vma whose vma->vm_ops=NULL. Take a reference
+			 * count on these policies which will be dropped by
+			 * mpol_cond_put() later
+			 */
+			if (mpol_needs_cond_ref(pol))
+				mpol_get(pol);
+		}
 	}
 	if (!pol)
 		pol = &default_policy;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
  2012-08-20 16:36 ` [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma() Mel Gorman
@ 2012-08-20 19:51   ` Christoph Lameter
  2012-08-21  7:26     ` Mel Gorman
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Lameter @ 2012-08-20 19:51 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, KOSAKI Motohiro, Dave Jones, Ben Hutchings,
	Andi Kleen, Hugh Dickins, LKML, Linux-MM

On Mon, 20 Aug 2012, Mel Gorman wrote:

> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 45f9825..82e872f 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1545,15 +1545,28 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
>  		struct vm_area_struct *vma, unsigned long addr)
>  {
>  	struct mempolicy *pol = task->mempolicy;
> +	int got_ref;

New variable. Need to set it to zero?

>
>  	if (vma) {
>  		if (vma->vm_ops && vma->vm_ops->get_policy) {
>  			struct mempolicy *vpol = vma->vm_ops->get_policy(vma,
>  									addr);
> -			if (vpol)
> +			if (vpol) {
>  				pol = vpol;
> -		} else if (vma->vm_policy)
> +				got_ref = 1;

Set the new variable. But it was not initialzed before. So now its 1 or
undefined?

> +			}
> +		} else if (vma->vm_policy) {
>  			pol = vma->vm_policy;
> +
> +			/*
> +			 * shmem_alloc_page() passes MPOL_F_SHARED policy with
> +			 * a pseudo vma whose vma->vm_ops=NULL. Take a reference
> +			 * count on these policies which will be dropped by
> +			 * mpol_cond_put() later
> +			 */
> +			if (mpol_needs_cond_ref(pol))
> +				mpol_get(pol);
> +		}
>  	}
>  	if (!pol)
>  		pol = &default_policy;
>

I do not see any use of got_ref. Can we get rid of the variable?



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
  2012-08-20 16:36 [PATCH 0/5] Memory policy corruption fixes V2 Mel Gorman
@ 2012-08-20 16:36 ` Mel Gorman
  2012-08-20 19:51   ` Christoph Lameter
  0 siblings, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2012-08-20 16:36 UTC (permalink / raw)
  To: Andrew Morton, KOSAKI Motohiro
  Cc: Dave Jones, Christoph Lameter, Ben Hutchings, Andi Kleen,
	Hugh Dickins, LKML, Linux-MM, Mel Gorman

[cc9a6c87: cpuset: mm: reduce large amounts of memory barrier related damage
v3] introduced a potential memory corruption. shmem_alloc_page() uses a
pseudo vma and it has one significant unique combination, vma->vm_ops=NULL
and vma->policy->flags & MPOL_F_SHARED.

get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL and
mpol_cond_put() DOES decrease a policy ref when a policy has MPOL_F_SHARED.
Therefore, when a cpuset update race occurs, alloc_pages_vma() falls in 'goto
retry_cpuset' path, decrements the reference count and frees the policy
prematurely.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/mempolicy.c |   17 +++++++++++++++--
 1 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 45f9825..82e872f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1545,15 +1545,28 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
 		struct vm_area_struct *vma, unsigned long addr)
 {
 	struct mempolicy *pol = task->mempolicy;
+	int got_ref;
 
 	if (vma) {
 		if (vma->vm_ops && vma->vm_ops->get_policy) {
 			struct mempolicy *vpol = vma->vm_ops->get_policy(vma,
 									addr);
-			if (vpol)
+			if (vpol) {
 				pol = vpol;
-		} else if (vma->vm_policy)
+				got_ref = 1;
+			}
+		} else if (vma->vm_policy) {
 			pol = vma->vm_policy;
+
+			/*
+			 * shmem_alloc_page() passes MPOL_F_SHARED policy with
+			 * a pseudo vma whose vma->vm_ops=NULL. Take a reference
+			 * count on these policies which will be dropped by
+			 * mpol_cond_put() later
+			 */
+			if (mpol_needs_cond_ref(pol))
+				mpol_get(pol);
+		}
 	}
 	if (!pol)
 		pol = &default_policy;
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2012-12-05 22:01 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-09 16:58 [PATCH 0/5] Memory policy corruption fixes -stable Mel Gorman
2012-10-09 16:58 ` [PATCH 1/5] revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages" Mel Gorman
2012-10-09 16:58 ` [PATCH 2/5] mempolicy: remove mempolicy sharing Mel Gorman
2012-10-09 16:58 ` [PATCH 3/5] mempolicy: fix a race in shared_policy_replace() Mel Gorman
2012-10-09 16:58 ` [PATCH 4/5] mempolicy: fix refcount leak in mpol_set_shared_policy() Mel Gorman
2012-10-09 16:58 ` [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma() Mel Gorman
2012-12-04 12:54   ` Tommi Rantala
2012-12-04 14:15     ` Mel Gorman
2012-12-05  5:11       ` Hugh Dickins
2012-12-05  6:28         ` Hugh Dickins
2012-12-05  7:24           ` [PATCH] tmpfs: fix shared mempolicy leak Hugh Dickins
2012-12-05  9:52             ` Mel Gorman
2012-12-05 20:25               ` Tommi Rantala
2012-12-05 21:59                 ` Hugh Dickins
2012-12-05 22:01                   ` Hugh Dickins
2012-10-10  0:47 ` [PATCH 0/5] Memory policy corruption fixes -stable Greg KH
2012-10-14  9:13 ` Ben Hutchings
  -- strict thread matches above, loose matches on Subject: below --
2012-08-20 16:36 [PATCH 0/5] Memory policy corruption fixes V2 Mel Gorman
2012-08-20 16:36 ` [PATCH 5/5] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma() Mel Gorman
2012-08-20 19:51   ` Christoph Lameter
2012-08-21  7:26     ` Mel Gorman
2012-08-21 15:37       ` Christoph Lameter
2012-09-07 23:06       ` KOSAKI Motohiro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).