All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30  9:02 ` kosaki.motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: kosaki.motohiro @ 2012-05-30  9:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Dave Jones, Mel Gorman, Linus Torvalds,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Hi Linus and all

Sorry for the long long delay.

Maybe I've finished a code auditing of mempolicy.c. It was hard work, multiple
mistakes hided real issues. :-/

Yes, my code auditing is too slow as usual, I know. But probably a slow work is 
better than do nothing, maybe.

Sadly, I've found multiple long living bugs and almost patches in this area were
 committed with no review.  That's sad. Therefore, I'd like to maintain this area 
a while (hopefully short) until known breakage will be fixed or finding another
volunteer. The purpose is not developing new feature. It is preventing more
corruption until known issue is fixed. (Of course, if anyone want to take the role,
I'm really happy and drop patch 6/6.

Here are some bandaid fixing patch set for mempolicy. It may fix all known uservisible
corruptions. But, I know, we have a lot of remained work. Below are unfixed issues.
(because of, they are broken by design. I can't fix them w/o design change)

1) mpol mount option of tmpfs doesn't work correctly

When not using =static mount option modifier, a nodemask is affected cpuset restriction 
of *inode creation* thread and ignore page allocation thread.

2) cpuset rebinding doesn't work correctly if shmem are shared from multiple processes.

mbind(2) (and set_mempolicy(2)) stores pre-calculated mask, say, 
(mbind-mask & cpuset-mask & online-nodes-mask) into struct mempolicy.
and the mempolicy attaches vma and shmem regions. But, of course, 
cpuset-mask depend on a thread. Thus we can't calculate final node mask
on ahead.

3) mbind(2) doesn't work correctly if shmem are shared from multiple processes.

The same of (2). mbind(2) stores final nodemask, but another thread have another
final node mask. The calculation is incorrect.


So, I think we should reconsider about shared mempolicy completely. If my remember 
is correct, Hugh said he worry about this feature because nobody seems
to use and lots bug about 4 years ago. As usually he is, he is correct. now, I'm 
convinced nobody uses this feature. Current  all of my patches are trivial and 
we can easily reproduce kernel memory corruption. And it don't work proper 
functionality (see above todo list). Thus, I'd like to hear his opinion. if he 
still dislike shared mempolicy feature, I'm ok to remove it entirely instead of 
applying current bandaid patch set.

Thank you.


-----------------------------------------------
KOSAKI Motohiro (6):
  Revert "mm: mempolicy: Let vma_merge and vma_split handle
    vma->vm_policy linkages"
  mempolicy: Kill all mempolicy sharing
  mempolicy: fix a race in shared_policy_replace()
  mempolicy: fix refcount leak in mpol_set_shared_policy()
  mempolicy: fix a memory corruption by refcount imbalance in
    alloc_pages_vma()
  MAINTAINERS: Added MEMPOLICY entry

 MAINTAINERS    |    7 +++
 mm/mempolicy.c |  151 ++++++++++++++++++++++++++++++++++++++++----------------
 mm/shmem.c     |    9 ++--
 3 files changed, 120 insertions(+), 47 deletions(-)


^ permalink raw reply	[flat|nested] 82+ messages in thread

* [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30  9:02 ` kosaki.motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: kosaki.motohiro @ 2012-05-30  9:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Dave Jones, Mel Gorman, Linus Torvalds,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Hi Linus and all

Sorry for the long long delay.

Maybe I've finished a code auditing of mempolicy.c. It was hard work, multiple
mistakes hided real issues. :-/

Yes, my code auditing is too slow as usual, I know. But probably a slow work is 
better than do nothing, maybe.

Sadly, I've found multiple long living bugs and almost patches in this area were
 committed with no review.  That's sad. Therefore, I'd like to maintain this area 
a while (hopefully short) until known breakage will be fixed or finding another
volunteer. The purpose is not developing new feature. It is preventing more
corruption until known issue is fixed. (Of course, if anyone want to take the role,
I'm really happy and drop patch 6/6.

Here are some bandaid fixing patch set for mempolicy. It may fix all known uservisible
corruptions. But, I know, we have a lot of remained work. Below are unfixed issues.
(because of, they are broken by design. I can't fix them w/o design change)

1) mpol mount option of tmpfs doesn't work correctly

When not using =static mount option modifier, a nodemask is affected cpuset restriction 
of *inode creation* thread and ignore page allocation thread.

2) cpuset rebinding doesn't work correctly if shmem are shared from multiple processes.

mbind(2) (and set_mempolicy(2)) stores pre-calculated mask, say, 
(mbind-mask & cpuset-mask & online-nodes-mask) into struct mempolicy.
and the mempolicy attaches vma and shmem regions. But, of course, 
cpuset-mask depend on a thread. Thus we can't calculate final node mask
on ahead.

3) mbind(2) doesn't work correctly if shmem are shared from multiple processes.

The same of (2). mbind(2) stores final nodemask, but another thread have another
final node mask. The calculation is incorrect.


So, I think we should reconsider about shared mempolicy completely. If my remember 
is correct, Hugh said he worry about this feature because nobody seems
to use and lots bug about 4 years ago. As usually he is, he is correct. now, I'm 
convinced nobody uses this feature. Current  all of my patches are trivial and 
we can easily reproduce kernel memory corruption. And it don't work proper 
functionality (see above todo list). Thus, I'd like to hear his opinion. if he 
still dislike shared mempolicy feature, I'm ok to remove it entirely instead of 
applying current bandaid patch set.

Thank you.


-----------------------------------------------
KOSAKI Motohiro (6):
  Revert "mm: mempolicy: Let vma_merge and vma_split handle
    vma->vm_policy linkages"
  mempolicy: Kill all mempolicy sharing
  mempolicy: fix a race in shared_policy_replace()
  mempolicy: fix refcount leak in mpol_set_shared_policy()
  mempolicy: fix a memory corruption by refcount imbalance in
    alloc_pages_vma()
  MAINTAINERS: Added MEMPOLICY entry

 MAINTAINERS    |    7 +++
 mm/mempolicy.c |  151 ++++++++++++++++++++++++++++++++++++++++----------------
 mm/shmem.c     |    9 ++--
 3 files changed, 120 insertions(+), 47 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [PATCH 1/6] Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"
  2012-05-30  9:02 ` kosaki.motohiro
@ 2012-05-30  9:02   ` kosaki.motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: kosaki.motohiro @ 2012-05-30  9:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Dave Jones, Mel Gorman, Linus Torvalds,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro, Andrew Morton,
	KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>

commit 05f144a0d5 removed vma->vm_policy updates code and it is a purpose of
mbind_range(). Now, mbind_range() is virtually no-op. no-op function don't
makes any bugs, I agree. but maybe it is not right fix.

Cc: Dave Jones <davej@redhat.com>,
Cc: Mel Gorman <mgorman@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>,
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/mempolicy.c |   41 ++++++++++++++++++++++++-----------------
 1 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f15c1b2..0a60def 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -607,6 +607,27 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
 	return first;
 }
 
+/* Apply policy to a single VMA */
+static int policy_vma(struct vm_area_struct *vma, struct mempolicy *new)
+{
+	int err = 0;
+	struct mempolicy *old = vma->vm_policy;
+
+	pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
+		 vma->vm_start, vma->vm_end, vma->vm_pgoff,
+		 vma->vm_ops, vma->vm_file,
+		 vma->vm_ops ? vma->vm_ops->set_policy : NULL);
+
+	if (vma->vm_ops && vma->vm_ops->set_policy)
+		err = vma->vm_ops->set_policy(vma, new);
+	if (!err) {
+		mpol_get(new);
+		vma->vm_policy = new;
+		mpol_put(old);
+	}
+	return err;
+}
+
 /* Step 2: apply policy to a range and do splits. */
 static int mbind_range(struct mm_struct *mm, unsigned long start,
 		       unsigned long end, struct mempolicy *new_pol)
@@ -655,23 +676,9 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 			if (err)
 				goto out;
 		}
-
-		/*
-		 * Apply policy to a single VMA. The reference counting of
-		 * policy for vma_policy linkages has already been handled by
-		 * vma_merge and split_vma as necessary. If this is a shared
-		 * policy then ->set_policy will increment the reference count
-		 * for an sp node.
-		 */
-		pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
-			vma->vm_start, vma->vm_end, vma->vm_pgoff,
-			vma->vm_ops, vma->vm_file,
-			vma->vm_ops ? vma->vm_ops->set_policy : NULL);
-		if (vma->vm_ops && vma->vm_ops->set_policy) {
-			err = vma->vm_ops->set_policy(vma, new_pol);
-			if (err)
-				goto out;
-		}
+		err = policy_vma(vma, new_pol);
+		if (err)
+			goto out;
 	}
 
  out:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 1/6] Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"
@ 2012-05-30  9:02   ` kosaki.motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: kosaki.motohiro @ 2012-05-30  9:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Dave Jones, Mel Gorman, Linus Torvalds,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro, Andrew Morton,
	KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>

commit 05f144a0d5 removed vma->vm_policy updates code and it is a purpose of
mbind_range(). Now, mbind_range() is virtually no-op. no-op function don't
makes any bugs, I agree. but maybe it is not right fix.

Cc: Dave Jones <davej@redhat.com>,
Cc: Mel Gorman <mgorman@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>,
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/mempolicy.c |   41 ++++++++++++++++++++++++-----------------
 1 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f15c1b2..0a60def 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -607,6 +607,27 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
 	return first;
 }
 
+/* Apply policy to a single VMA */
+static int policy_vma(struct vm_area_struct *vma, struct mempolicy *new)
+{
+	int err = 0;
+	struct mempolicy *old = vma->vm_policy;
+
+	pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
+		 vma->vm_start, vma->vm_end, vma->vm_pgoff,
+		 vma->vm_ops, vma->vm_file,
+		 vma->vm_ops ? vma->vm_ops->set_policy : NULL);
+
+	if (vma->vm_ops && vma->vm_ops->set_policy)
+		err = vma->vm_ops->set_policy(vma, new);
+	if (!err) {
+		mpol_get(new);
+		vma->vm_policy = new;
+		mpol_put(old);
+	}
+	return err;
+}
+
 /* Step 2: apply policy to a range and do splits. */
 static int mbind_range(struct mm_struct *mm, unsigned long start,
 		       unsigned long end, struct mempolicy *new_pol)
@@ -655,23 +676,9 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 			if (err)
 				goto out;
 		}
-
-		/*
-		 * Apply policy to a single VMA. The reference counting of
-		 * policy for vma_policy linkages has already been handled by
-		 * vma_merge and split_vma as necessary. If this is a shared
-		 * policy then ->set_policy will increment the reference count
-		 * for an sp node.
-		 */
-		pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
-			vma->vm_start, vma->vm_end, vma->vm_pgoff,
-			vma->vm_ops, vma->vm_file,
-			vma->vm_ops ? vma->vm_ops->set_policy : NULL);
-		if (vma->vm_ops && vma->vm_ops->set_policy) {
-			err = vma->vm_ops->set_policy(vma, new_pol);
-			if (err)
-				goto out;
-		}
+		err = policy_vma(vma, new_pol);
+		if (err)
+			goto out;
 	}
 
  out:
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 2/6] mempolicy: Kill all mempolicy sharing
  2012-05-30  9:02 ` kosaki.motohiro
@ 2012-05-30  9:02   ` kosaki.motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: kosaki.motohiro @ 2012-05-30  9:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Dave Jones, Mel Gorman, Linus Torvalds,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro, Andrew Morton

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Dave Jones' system call fuzz testing tool "trinity" triggered the following
bug error with slab debugging enabled

[ 7613.229315] =============================================================================
[ 7613.229955] BUG numa_policy (Not tainted): Poison overwritten
[ 7613.230560] -----------------------------------------------------------------------------
[ 7613.230560]
[ 7613.231834] INFO: 0xffff880146498250-0xffff880146498250. First byte 0x6a instead of 0x6b
[ 7613.232518] INFO: Allocated in mpol_new+0xa3/0x140 age=46310 cpu=6 pid=32154
[ 7613.233188]  __slab_alloc+0x3d3/0x445
[ 7613.233877]  kmem_cache_alloc+0x29d/0x2b0
[ 7613.234564]  mpol_new+0xa3/0x140
[ 7613.235236]  sys_mbind+0x142/0x620
[ 7613.235929]  system_call_fastpath+0x16/0x1b
[ 7613.236640] INFO: Freed in __mpol_put+0x27/0x30 age=46268 cpu=6 pid=32154
[ 7613.237354]  __slab_free+0x2e/0x1de
[ 7613.238080]  kmem_cache_free+0x25a/0x260
[ 7613.238799]  __mpol_put+0x27/0x30
[ 7613.239515]  remove_vma+0x68/0x90
[ 7613.240223]  exit_mmap+0x118/0x140
[ 7613.240939]  mmput+0x73/0x110
[ 7613.241651]  exit_mm+0x108/0x130
[ 7613.242367]  do_exit+0x162/0xb90
[ 7613.243074]  do_group_exit+0x4f/0xc0
[ 7613.243790]  sys_exit_group+0x17/0x20
[ 7613.244507]  system_call_fastpath+0x16/0x1b
[ 7613.245212] INFO: Slab 0xffffea0005192600 objects=27 used=27 fp=0x          (null) flags=0x20000000004080
[ 7613.246000] INFO: Object 0xffff880146498250 @offset=592 fp=0xffff88014649b9d0

The problem was created by a reference count imbalance. Example, In following case,
mbind(addr, len) try to replace mempolicies of both vma1 and vma2 and thus they will
become to share the same mempolicy, _and_ the new mempolicy has MPOL_F_SHARED flag.

  +-------------------+-------------------+
  |     vma1          |     vma2(shmem)   |
  +-------------------+-------------------+
  |                                       |
 addr                                 addr+len

Look at alloc_pages_vma(), it uses get_vma_policy() and mpol_cond_put() pair
for maintaining mempolicy refcount. The current rule is, get_vma_policy() does
NOT increment refcount if the policy is not attached shmem vma and mpol_cond_put()
DOES decrement refcount if mpol has MPOL_F_SHARED.

See? In above case, vma1 is not shmem vma and vma->policy has MPOL_F_SHARED! then,
refcount will be decreased even though was not increased whenever alloc_page_vma()
is called. As you know, mere mbind(MPOL_MF_MOVE) calls alloc_page_vma().

Oh, Oh my god.. Who can imagine alloc_pages_vma() was broken!? It is one of Linux
memory management central code! This bug was introduced by commit 52cd3b0740
(mempolicy: rework mempolicy Reference Counting) at 2008. I.e. it was living 4
years!

More unfortunately mempolicy has one another serious broken. Currently,
mempolicy rebind logic (it is called from cpuset rebinding) ignore a refcount
of mempolicy and override it forcibly. Thus, any mempolicy sharing may
cause mempolicy corruption. The bug was introduced by commit 68860ec10b
(cpusets: automatic numa mempolicy rebinding) at 2005!

Maybe, we need to rewrite MPOL_F_SHARED and mempolicy rebinding code at all.
But at first step, to disable any policy sharing hide the issue from a userland.

Reported-by: Dave Jones <davej@redhat.com>,
Cc: Mel Gorman <mgorman@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>,
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/mempolicy.c |   49 ++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 0a60def..9505cb9 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -607,24 +607,38 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
 	return first;
 }
 
-/* Apply policy to a single VMA */
-static int policy_vma(struct vm_area_struct *vma, struct mempolicy *new)
+/*
+ * Apply policy to a single VMA
+ * This must be called with the mmap_sem held for writing.
+ */
+static int policy_vma(struct vm_area_struct *vma, struct mempolicy *pol)
 {
-	int err = 0;
-	struct mempolicy *old = vma->vm_policy;
+	int err;
+	struct mempolicy *old;
+	struct mempolicy *new;
 
 	pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
 		 vma->vm_start, vma->vm_end, vma->vm_pgoff,
 		 vma->vm_ops, vma->vm_file,
 		 vma->vm_ops ? vma->vm_ops->set_policy : NULL);
 
-	if (vma->vm_ops && vma->vm_ops->set_policy)
+	new = mpol_dup(pol);
+	if (IS_ERR(new))
+		return PTR_ERR(new);
+
+	if (vma->vm_ops && vma->vm_ops->set_policy) {
 		err = vma->vm_ops->set_policy(vma, new);
-	if (!err) {
-		mpol_get(new);
-		vma->vm_policy = new;
-		mpol_put(old);
+		if (err)
+			goto err_out;
 	}
+
+	old = vma->vm_policy;
+	vma->vm_policy = new; /* protected by mmap_sem */
+	mpol_put(old);
+
+	return 0;
+ err_out:
+	mpol_put(new);
 	return err;
 }
 
@@ -2147,15 +2161,24 @@ static void sp_delete(struct shared_policy *sp, struct sp_node *n)
 static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
 				struct mempolicy *pol)
 {
-	struct sp_node *n = kmem_cache_alloc(sn_cache, GFP_KERNEL);
+	struct sp_node *n;
+	struct mempolicy *newpol;
 
+	n = kmem_cache_alloc(sn_cache, GFP_KERNEL);
 	if (!n)
 		return NULL;
+
+	newpol = mpol_dup(pol);
+	if (IS_ERR(newpol)) {
+		kmem_cache_free(sn_cache, n);
+		return NULL;
+	}
+	newpol->flags |= MPOL_F_SHARED;
+
 	n->start = start;
 	n->end = end;
-	mpol_get(pol);
-	pol->flags |= MPOL_F_SHARED;	/* for unref */
-	n->policy = pol;
+	n->policy = newpol;
+
 	return n;
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 2/6] mempolicy: Kill all mempolicy sharing
@ 2012-05-30  9:02   ` kosaki.motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: kosaki.motohiro @ 2012-05-30  9:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Dave Jones, Mel Gorman, Linus Torvalds,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro, Andrew Morton

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Dave Jones' system call fuzz testing tool "trinity" triggered the following
bug error with slab debugging enabled

[ 7613.229315] =============================================================================
[ 7613.229955] BUG numa_policy (Not tainted): Poison overwritten
[ 7613.230560] -----------------------------------------------------------------------------
[ 7613.230560]
[ 7613.231834] INFO: 0xffff880146498250-0xffff880146498250. First byte 0x6a instead of 0x6b
[ 7613.232518] INFO: Allocated in mpol_new+0xa3/0x140 age=46310 cpu=6 pid=32154
[ 7613.233188]  __slab_alloc+0x3d3/0x445
[ 7613.233877]  kmem_cache_alloc+0x29d/0x2b0
[ 7613.234564]  mpol_new+0xa3/0x140
[ 7613.235236]  sys_mbind+0x142/0x620
[ 7613.235929]  system_call_fastpath+0x16/0x1b
[ 7613.236640] INFO: Freed in __mpol_put+0x27/0x30 age=46268 cpu=6 pid=32154
[ 7613.237354]  __slab_free+0x2e/0x1de
[ 7613.238080]  kmem_cache_free+0x25a/0x260
[ 7613.238799]  __mpol_put+0x27/0x30
[ 7613.239515]  remove_vma+0x68/0x90
[ 7613.240223]  exit_mmap+0x118/0x140
[ 7613.240939]  mmput+0x73/0x110
[ 7613.241651]  exit_mm+0x108/0x130
[ 7613.242367]  do_exit+0x162/0xb90
[ 7613.243074]  do_group_exit+0x4f/0xc0
[ 7613.243790]  sys_exit_group+0x17/0x20
[ 7613.244507]  system_call_fastpath+0x16/0x1b
[ 7613.245212] INFO: Slab 0xffffea0005192600 objects=27 used=27 fp=0x          (null) flags=0x20000000004080
[ 7613.246000] INFO: Object 0xffff880146498250 @offset=592 fp=0xffff88014649b9d0

The problem was created by a reference count imbalance. Example, In following case,
mbind(addr, len) try to replace mempolicies of both vma1 and vma2 and thus they will
become to share the same mempolicy, _and_ the new mempolicy has MPOL_F_SHARED flag.

  +-------------------+-------------------+
  |     vma1          |     vma2(shmem)   |
  +-------------------+-------------------+
  |                                       |
 addr                                 addr+len

Look at alloc_pages_vma(), it uses get_vma_policy() and mpol_cond_put() pair
for maintaining mempolicy refcount. The current rule is, get_vma_policy() does
NOT increment refcount if the policy is not attached shmem vma and mpol_cond_put()
DOES decrement refcount if mpol has MPOL_F_SHARED.

See? In above case, vma1 is not shmem vma and vma->policy has MPOL_F_SHARED! then,
refcount will be decreased even though was not increased whenever alloc_page_vma()
is called. As you know, mere mbind(MPOL_MF_MOVE) calls alloc_page_vma().

Oh, Oh my god.. Who can imagine alloc_pages_vma() was broken!? It is one of Linux
memory management central code! This bug was introduced by commit 52cd3b0740
(mempolicy: rework mempolicy Reference Counting) at 2008. I.e. it was living 4
years!

More unfortunately mempolicy has one another serious broken. Currently,
mempolicy rebind logic (it is called from cpuset rebinding) ignore a refcount
of mempolicy and override it forcibly. Thus, any mempolicy sharing may
cause mempolicy corruption. The bug was introduced by commit 68860ec10b
(cpusets: automatic numa mempolicy rebinding) at 2005!

Maybe, we need to rewrite MPOL_F_SHARED and mempolicy rebinding code at all.
But at first step, to disable any policy sharing hide the issue from a userland.

Reported-by: Dave Jones <davej@redhat.com>,
Cc: Mel Gorman <mgorman@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>,
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/mempolicy.c |   49 ++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 0a60def..9505cb9 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -607,24 +607,38 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
 	return first;
 }
 
-/* Apply policy to a single VMA */
-static int policy_vma(struct vm_area_struct *vma, struct mempolicy *new)
+/*
+ * Apply policy to a single VMA
+ * This must be called with the mmap_sem held for writing.
+ */
+static int policy_vma(struct vm_area_struct *vma, struct mempolicy *pol)
 {
-	int err = 0;
-	struct mempolicy *old = vma->vm_policy;
+	int err;
+	struct mempolicy *old;
+	struct mempolicy *new;
 
 	pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
 		 vma->vm_start, vma->vm_end, vma->vm_pgoff,
 		 vma->vm_ops, vma->vm_file,
 		 vma->vm_ops ? vma->vm_ops->set_policy : NULL);
 
-	if (vma->vm_ops && vma->vm_ops->set_policy)
+	new = mpol_dup(pol);
+	if (IS_ERR(new))
+		return PTR_ERR(new);
+
+	if (vma->vm_ops && vma->vm_ops->set_policy) {
 		err = vma->vm_ops->set_policy(vma, new);
-	if (!err) {
-		mpol_get(new);
-		vma->vm_policy = new;
-		mpol_put(old);
+		if (err)
+			goto err_out;
 	}
+
+	old = vma->vm_policy;
+	vma->vm_policy = new; /* protected by mmap_sem */
+	mpol_put(old);
+
+	return 0;
+ err_out:
+	mpol_put(new);
 	return err;
 }
 
@@ -2147,15 +2161,24 @@ static void sp_delete(struct shared_policy *sp, struct sp_node *n)
 static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
 				struct mempolicy *pol)
 {
-	struct sp_node *n = kmem_cache_alloc(sn_cache, GFP_KERNEL);
+	struct sp_node *n;
+	struct mempolicy *newpol;
 
+	n = kmem_cache_alloc(sn_cache, GFP_KERNEL);
 	if (!n)
 		return NULL;
+
+	newpol = mpol_dup(pol);
+	if (IS_ERR(newpol)) {
+		kmem_cache_free(sn_cache, n);
+		return NULL;
+	}
+	newpol->flags |= MPOL_F_SHARED;
+
 	n->start = start;
 	n->end = end;
-	mpol_get(pol);
-	pol->flags |= MPOL_F_SHARED;	/* for unref */
-	n->policy = pol;
+	n->policy = newpol;
+
 	return n;
 }
 
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 3/6] mempolicy: fix a race in shared_policy_replace()
  2012-05-30  9:02 ` kosaki.motohiro
@ 2012-05-30  9:02   ` kosaki.motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: kosaki.motohiro @ 2012-05-30  9:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Dave Jones, Mel Gorman, Linus Torvalds,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro, Andrew Morton

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

shared_policy_replace() uses sp_alloc() wrongly. 1) sp_node can't be dereferenced when
not holding sp->lock and 2) another thread can modify sp_node between spin_unlock for
restarting and next spin_lock, we shouldn't believe n->end and n->policy were not
modified.

This patch fixes them.

Note: The bug was introduced pre-git age (IOW, before 2.6.12-rc2). I believe nobody
uses this feature in production systems.

Cc: Dave Jones <davej@redhat.com>,
Cc: Mel Gorman <mgorman@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>,
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/mempolicy.c |   55 ++++++++++++++++++++++++++++++++++++++-----------------
 1 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 9505cb9..d97d2db 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2158,6 +2158,14 @@ static void sp_delete(struct shared_policy *sp, struct sp_node *n)
 	kmem_cache_free(sn_cache, n);
 }
 
+static void sp_node_init(struct sp_node *node, unsigned long start,
+			 unsigned long end, struct mempolicy *pol)
+{
+	node->start = start;
+	node->end = end;
+	node->policy = pol;
+}
+
 static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
 				struct mempolicy *pol)
 {
@@ -2174,10 +2182,7 @@ static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
 		return NULL;
 	}
 	newpol->flags |= MPOL_F_SHARED;
-
-	n->start = start;
-	n->end = end;
-	n->policy = newpol;
+	sp_node_init(n, start, end, newpol);
 
 	return n;
 }
@@ -2186,7 +2191,9 @@ static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
 static int shared_policy_replace(struct shared_policy *sp, unsigned long start,
 				 unsigned long end, struct sp_node *new)
 {
+	int err;
 	struct sp_node *n, *new2 = NULL;
+	struct mempolicy *new2_pol = NULL;
 
 restart:
 	spin_lock(&sp->lock);
@@ -2202,16 +2209,16 @@ restart:
 		} else {
 			/* Old policy spanning whole new range. */
 			if (n->end > end) {
-				if (!new2) {
-					spin_unlock(&sp->lock);
-					new2 = sp_alloc(end, n->end, n->policy);
-					if (!new2)
-						return -ENOMEM;
-					goto restart;
-				}
-				n->end = start;
+				if (!new2)
+					goto alloc_new2;
+
+				*new2_pol = *n->policy;
+				atomic_set(&new2_pol->refcnt, 1);
+				sp_node_init(new2, n->end, end, new2_pol);
 				sp_insert(sp, new2);
+				n->end = start;
 				new2 = NULL;
+				new2_pol = NULL;
 				break;
 			} else
 				n->end = start;
@@ -2223,11 +2230,25 @@ restart:
 	if (new)
 		sp_insert(sp, new);
 	spin_unlock(&sp->lock);
-	if (new2) {
-		mpol_put(new2->policy);
-		kmem_cache_free(sn_cache, new2);
-	}
-	return 0;
+	err = 0;
+
+ err_out:
+	if (new2_pol)
+		mpol_put(new2_pol);
+        if (new2)
+                kmem_cache_free(sn_cache, new2);
+	return err;
+
+ alloc_new2:
+	spin_unlock(&sp->lock);
+	err = -ENOMEM;
+	new2 = kmem_cache_alloc(sn_cache, GFP_KERNEL);
+	if (!new2)
+		goto err_out;
+	new2_pol = kmem_cache_alloc(policy_cache, GFP_KERNEL);
+	if (!new2_pol)
+		goto err_out;
+	goto restart;
 }
 
 /**
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 3/6] mempolicy: fix a race in shared_policy_replace()
@ 2012-05-30  9:02   ` kosaki.motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: kosaki.motohiro @ 2012-05-30  9:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Dave Jones, Mel Gorman, Linus Torvalds,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro, Andrew Morton

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

shared_policy_replace() uses sp_alloc() wrongly. 1) sp_node can't be dereferenced when
not holding sp->lock and 2) another thread can modify sp_node between spin_unlock for
restarting and next spin_lock, we shouldn't believe n->end and n->policy were not
modified.

This patch fixes them.

Note: The bug was introduced pre-git age (IOW, before 2.6.12-rc2). I believe nobody
uses this feature in production systems.

Cc: Dave Jones <davej@redhat.com>,
Cc: Mel Gorman <mgorman@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>,
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/mempolicy.c |   55 ++++++++++++++++++++++++++++++++++++++-----------------
 1 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 9505cb9..d97d2db 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2158,6 +2158,14 @@ static void sp_delete(struct shared_policy *sp, struct sp_node *n)
 	kmem_cache_free(sn_cache, n);
 }
 
+static void sp_node_init(struct sp_node *node, unsigned long start,
+			 unsigned long end, struct mempolicy *pol)
+{
+	node->start = start;
+	node->end = end;
+	node->policy = pol;
+}
+
 static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
 				struct mempolicy *pol)
 {
@@ -2174,10 +2182,7 @@ static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
 		return NULL;
 	}
 	newpol->flags |= MPOL_F_SHARED;
-
-	n->start = start;
-	n->end = end;
-	n->policy = newpol;
+	sp_node_init(n, start, end, newpol);
 
 	return n;
 }
@@ -2186,7 +2191,9 @@ static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
 static int shared_policy_replace(struct shared_policy *sp, unsigned long start,
 				 unsigned long end, struct sp_node *new)
 {
+	int err;
 	struct sp_node *n, *new2 = NULL;
+	struct mempolicy *new2_pol = NULL;
 
 restart:
 	spin_lock(&sp->lock);
@@ -2202,16 +2209,16 @@ restart:
 		} else {
 			/* Old policy spanning whole new range. */
 			if (n->end > end) {
-				if (!new2) {
-					spin_unlock(&sp->lock);
-					new2 = sp_alloc(end, n->end, n->policy);
-					if (!new2)
-						return -ENOMEM;
-					goto restart;
-				}
-				n->end = start;
+				if (!new2)
+					goto alloc_new2;
+
+				*new2_pol = *n->policy;
+				atomic_set(&new2_pol->refcnt, 1);
+				sp_node_init(new2, n->end, end, new2_pol);
 				sp_insert(sp, new2);
+				n->end = start;
 				new2 = NULL;
+				new2_pol = NULL;
 				break;
 			} else
 				n->end = start;
@@ -2223,11 +2230,25 @@ restart:
 	if (new)
 		sp_insert(sp, new);
 	spin_unlock(&sp->lock);
-	if (new2) {
-		mpol_put(new2->policy);
-		kmem_cache_free(sn_cache, new2);
-	}
-	return 0;
+	err = 0;
+
+ err_out:
+	if (new2_pol)
+		mpol_put(new2_pol);
+        if (new2)
+                kmem_cache_free(sn_cache, new2);
+	return err;
+
+ alloc_new2:
+	spin_unlock(&sp->lock);
+	err = -ENOMEM;
+	new2 = kmem_cache_alloc(sn_cache, GFP_KERNEL);
+	if (!new2)
+		goto err_out;
+	new2_pol = kmem_cache_alloc(policy_cache, GFP_KERNEL);
+	if (!new2_pol)
+		goto err_out;
+	goto restart;
 }
 
 /**
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 4/6] mempolicy: fix refcount leak in mpol_set_shared_policy()
  2012-05-30  9:02 ` kosaki.motohiro
@ 2012-05-30  9:02   ` kosaki.motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: kosaki.motohiro @ 2012-05-30  9:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Dave Jones, Mel Gorman, Linus Torvalds,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro, Andrew Morton

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

When shared_policy_replace() failure, new->policy is not freed correctly.
The real problem is, shared mempolicy codes directly call kmem_cache_free()
in multiple place. It can be easily mistake.

This patch creates proper wrapper function and uses it. The bug was introduced
pre-git age (IOW, before 2.6.12-rc2).

Cc: Dave Jones <davej@redhat.com>,
Cc: Mel Gorman <mgorman@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>,
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/mempolicy.c |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d97d2db..7fb7d51 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2150,12 +2150,17 @@ mpol_shared_policy_lookup(struct shared_policy *sp, unsigned long idx)
 	return pol;
 }
 
+static void sp_free(struct sp_node *n)
+{
+	mpol_put(n->policy);
+	kmem_cache_free(sn_cache, n);
+}
+
 static void sp_delete(struct shared_policy *sp, struct sp_node *n)
 {
 	pr_debug("deleting %lx-l%lx\n", n->start, n->end);
 	rb_erase(&n->nd, &sp->root);
-	mpol_put(n->policy);
-	kmem_cache_free(sn_cache, n);
+	sp_free(n);
 }
 
 static void sp_node_init(struct sp_node *node, unsigned long start,
@@ -2320,7 +2325,7 @@ int mpol_set_shared_policy(struct shared_policy *info,
 	}
 	err = shared_policy_replace(info, vma->vm_pgoff, vma->vm_pgoff+sz, new);
 	if (err && new)
-		kmem_cache_free(sn_cache, new);
+		sp_free(new);
 	return err;
 }
 
@@ -2337,9 +2342,7 @@ void mpol_free_shared_policy(struct shared_policy *p)
 	while (next) {
 		n = rb_entry(next, struct sp_node, nd);
 		next = rb_next(&n->nd);
-		rb_erase(&n->nd, &p->root);
-		mpol_put(n->policy);
-		kmem_cache_free(sn_cache, n);
+		sp_delete(p, n);
 	}
 	spin_unlock(&p->lock);
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 4/6] mempolicy: fix refcount leak in mpol_set_shared_policy()
@ 2012-05-30  9:02   ` kosaki.motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: kosaki.motohiro @ 2012-05-30  9:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Dave Jones, Mel Gorman, Linus Torvalds,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro, Andrew Morton

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

When shared_policy_replace() failure, new->policy is not freed correctly.
The real problem is, shared mempolicy codes directly call kmem_cache_free()
in multiple place. It can be easily mistake.

This patch creates proper wrapper function and uses it. The bug was introduced
pre-git age (IOW, before 2.6.12-rc2).

Cc: Dave Jones <davej@redhat.com>,
Cc: Mel Gorman <mgorman@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>,
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/mempolicy.c |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d97d2db..7fb7d51 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2150,12 +2150,17 @@ mpol_shared_policy_lookup(struct shared_policy *sp, unsigned long idx)
 	return pol;
 }
 
+static void sp_free(struct sp_node *n)
+{
+	mpol_put(n->policy);
+	kmem_cache_free(sn_cache, n);
+}
+
 static void sp_delete(struct shared_policy *sp, struct sp_node *n)
 {
 	pr_debug("deleting %lx-l%lx\n", n->start, n->end);
 	rb_erase(&n->nd, &sp->root);
-	mpol_put(n->policy);
-	kmem_cache_free(sn_cache, n);
+	sp_free(n);
 }
 
 static void sp_node_init(struct sp_node *node, unsigned long start,
@@ -2320,7 +2325,7 @@ int mpol_set_shared_policy(struct shared_policy *info,
 	}
 	err = shared_policy_replace(info, vma->vm_pgoff, vma->vm_pgoff+sz, new);
 	if (err && new)
-		kmem_cache_free(sn_cache, new);
+		sp_free(new);
 	return err;
 }
 
@@ -2337,9 +2342,7 @@ void mpol_free_shared_policy(struct shared_policy *p)
 	while (next) {
 		n = rb_entry(next, struct sp_node, nd);
 		next = rb_next(&n->nd);
-		rb_erase(&n->nd, &p->root);
-		mpol_put(n->policy);
-		kmem_cache_free(sn_cache, n);
+		sp_delete(p, n);
 	}
 	spin_unlock(&p->lock);
 }
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 5/6] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
  2012-05-30  9:02 ` kosaki.motohiro
@ 2012-05-30  9:02   ` kosaki.motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: kosaki.motohiro @ 2012-05-30  9:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Dave Jones, Mel Gorman, Linus Torvalds,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro, Andrew Morton,
	Miao Xie, Peter Zijlstra

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

commit cc9a6c8776 (cpuset: mm: reduce large amounts of memory barrier related
damage v3) introduced a memory corruption.

shmem_alloc_page() passes pseudo vma and it has one significant unique
combination, vma->vm_ops=NULL and (vma->policy->flags & MPOL_F_SHARED).

Now, get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
and mpol_cond_put() DOES decrease a policy ref when a policy has MPOL_F_SHARED.
Therefore, when cpuset race is happen and alloc_pages_vma() fall in
'goto retry_cpuset' path, a policy refcount will be decreased too much and
therefore it will make memory corruption.

This patch fixes it.

Cc: Dave Jones <davej@redhat.com>,
Cc: Mel Gorman <mgorman@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>,
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/mempolicy.c |   13 ++++++++++++-
 mm/shmem.c     |    9 +++++----
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7fb7d51..0da0969 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1544,18 +1544,29 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
 		struct vm_area_struct *vma, unsigned long addr)
 {
 	struct mempolicy *pol = task->mempolicy;
+	int got_ref;
 
 	if (vma) {
 		if (vma->vm_ops && vma->vm_ops->get_policy) {
 			struct mempolicy *vpol = vma->vm_ops->get_policy(vma,
 									addr);
-			if (vpol)
+			if (vpol) {
 				pol = vpol;
+				got_ref = 1;
+			}
 		} else if (vma->vm_policy)
 			pol = vma->vm_policy;
 	}
 	if (!pol)
 		pol = &default_policy;
+
+	/*
+	 * shmem_alloc_page() passes MPOL_F_SHARED policy with vma->vm_ops=NULL.
+	 * Thus, we need to take additional ref for avoiding refcount imbalance.
+	 */
+	if (!got_ref && mpol_needs_cond_ref(pol))
+		mpol_get(pol);
+
 	return pol;
 }
 
diff --git a/mm/shmem.c b/mm/shmem.c
index d576b84..eb5f1eb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -919,6 +919,7 @@ static struct page *shmem_alloc_page(gfp_t gfp,
 			struct shmem_inode_info *info, pgoff_t index)
 {
 	struct vm_area_struct pvma;
+	struct page *page;
 
 	/* Create a pseudo vma that just contains the policy */
 	pvma.vm_start = 0;
@@ -926,10 +927,10 @@ static struct page *shmem_alloc_page(gfp_t gfp,
 	pvma.vm_ops = NULL;
 	pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);
 
-	/*
-	 * alloc_page_vma() will drop the shared policy reference
-	 */
-	return alloc_page_vma(gfp, &pvma, 0);
+	page = alloc_page_vma(gfp, &pvma, 0);
+
+	mpol_put(pvma.vm_policy);
+	return page;
 }
 #else /* !CONFIG_NUMA */
 #ifdef CONFIG_TMPFS
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 5/6] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
@ 2012-05-30  9:02   ` kosaki.motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: kosaki.motohiro @ 2012-05-30  9:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Dave Jones, Mel Gorman, Linus Torvalds,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro, Andrew Morton,
	Miao Xie, Peter Zijlstra

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

commit cc9a6c8776 (cpuset: mm: reduce large amounts of memory barrier related
damage v3) introduced a memory corruption.

shmem_alloc_page() passes pseudo vma and it has one significant unique
combination, vma->vm_ops=NULL and (vma->policy->flags & MPOL_F_SHARED).

Now, get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
and mpol_cond_put() DOES decrease a policy ref when a policy has MPOL_F_SHARED.
Therefore, when cpuset race is happen and alloc_pages_vma() fall in
'goto retry_cpuset' path, a policy refcount will be decreased too much and
therefore it will make memory corruption.

This patch fixes it.

Cc: Dave Jones <davej@redhat.com>,
Cc: Mel Gorman <mgorman@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>,
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/mempolicy.c |   13 ++++++++++++-
 mm/shmem.c     |    9 +++++----
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7fb7d51..0da0969 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1544,18 +1544,29 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
 		struct vm_area_struct *vma, unsigned long addr)
 {
 	struct mempolicy *pol = task->mempolicy;
+	int got_ref;
 
 	if (vma) {
 		if (vma->vm_ops && vma->vm_ops->get_policy) {
 			struct mempolicy *vpol = vma->vm_ops->get_policy(vma,
 									addr);
-			if (vpol)
+			if (vpol) {
 				pol = vpol;
+				got_ref = 1;
+			}
 		} else if (vma->vm_policy)
 			pol = vma->vm_policy;
 	}
 	if (!pol)
 		pol = &default_policy;
+
+	/*
+	 * shmem_alloc_page() passes MPOL_F_SHARED policy with vma->vm_ops=NULL.
+	 * Thus, we need to take additional ref for avoiding refcount imbalance.
+	 */
+	if (!got_ref && mpol_needs_cond_ref(pol))
+		mpol_get(pol);
+
 	return pol;
 }
 
diff --git a/mm/shmem.c b/mm/shmem.c
index d576b84..eb5f1eb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -919,6 +919,7 @@ static struct page *shmem_alloc_page(gfp_t gfp,
 			struct shmem_inode_info *info, pgoff_t index)
 {
 	struct vm_area_struct pvma;
+	struct page *page;
 
 	/* Create a pseudo vma that just contains the policy */
 	pvma.vm_start = 0;
@@ -926,10 +927,10 @@ static struct page *shmem_alloc_page(gfp_t gfp,
 	pvma.vm_ops = NULL;
 	pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);
 
-	/*
-	 * alloc_page_vma() will drop the shared policy reference
-	 */
-	return alloc_page_vma(gfp, &pvma, 0);
+	page = alloc_page_vma(gfp, &pvma, 0);
+
+	mpol_put(pvma.vm_policy);
+	return page;
 }
 #else /* !CONFIG_NUMA */
 #ifdef CONFIG_TMPFS
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 6/6] MAINTAINERS: Added MEMPOLICY entry
  2012-05-30  9:02 ` kosaki.motohiro
@ 2012-05-30  9:02   ` kosaki.motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: kosaki.motohiro @ 2012-05-30  9:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Dave Jones, Mel Gorman, Linus Torvalds,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 MAINTAINERS |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a246490..6f4a8e2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4467,6 +4467,13 @@ F:	drivers/mtd/
 F:	include/linux/mtd/
 F:	include/mtd/
 
+MEMPOLICY
+M:	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
+L:	linux-mm@kvack.org
+S:	Maintained
+F:	include/linux/mempolicy.h
+F:	mm/mempolicy.c
+
 MICROBLAZE ARCHITECTURE
 M:	Michal Simek <monstr@monstr.eu>
 L:	microblaze-uclinux@itee.uq.edu.au (moderated for non-subscribers)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 6/6] MAINTAINERS: Added MEMPOLICY entry
@ 2012-05-30  9:02   ` kosaki.motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: kosaki.motohiro @ 2012-05-30  9:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Dave Jones, Mel Gorman, Linus Torvalds,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 MAINTAINERS |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a246490..6f4a8e2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4467,6 +4467,13 @@ F:	drivers/mtd/
 F:	include/linux/mtd/
 F:	include/mtd/
 
+MEMPOLICY
+M:	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
+L:	linux-mm@kvack.org
+S:	Maintained
+F:	include/linux/mempolicy.h
+F:	mm/mempolicy.c
+
 MICROBLAZE ARCHITECTURE
 M:	Michal Simek <monstr@monstr.eu>
 L:	microblaze-uclinux@itee.uq.edu.au (moderated for non-subscribers)
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30  9:02 ` kosaki.motohiro
@ 2012-05-30 18:26   ` Linus Torvalds
  -1 siblings, 0 replies; 82+ messages in thread
From: Linus Torvalds @ 2012-05-30 18:26 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro

On Wed, May 30, 2012 at 2:02 AM,  <kosaki.motohiro@gmail.com> wrote:
>
> So, I think we should reconsider about shared mempolicy completely.

Quite frankly, I'd prefer that approach. The code is subtle and
horribly bug-fraught, and I absolutely detest the way it looks too.
Reading your patches was actually somewhat painful.

If we could just remove the support for it entirely, that would be
*much* preferable to continue working with this code.

Could we just try that removal, and see if anybody screams?

                  Linus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 18:26   ` Linus Torvalds
  0 siblings, 0 replies; 82+ messages in thread
From: Linus Torvalds @ 2012-05-30 18:26 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Christoph Lameter, stable, hughd, KOSAKI Motohiro

On Wed, May 30, 2012 at 2:02 AM,  <kosaki.motohiro@gmail.com> wrote:
>
> So, I think we should reconsider about shared mempolicy completely.

Quite frankly, I'd prefer that approach. The code is subtle and
horribly bug-fraught, and I absolutely detest the way it looks too.
Reading your patches was actually somewhat painful.

If we could just remove the support for it entirely, that would be
*much* preferable to continue working with this code.

Could we just try that removal, and see if anybody screams?

                  Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 18:26   ` Linus Torvalds
@ 2012-05-30 18:34     ` Christoph Lameter
  -1 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-05-30 18:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: kosaki.motohiro, linux-kernel, linux-mm, Andrew Morton,
	Dave Jones, Mel Gorman, stable, hughd, sivanich, KOSAKI Motohiro,
	andi

On Wed, 30 May 2012, Linus Torvalds wrote:

> On Wed, May 30, 2012 at 2:02 AM,  <kosaki.motohiro@gmail.com> wrote:
> >
> > So, I think we should reconsider about shared mempolicy completely.
>
> Quite frankly, I'd prefer that approach. The code is subtle and
> horribly bug-fraught, and I absolutely detest the way it looks too.
> Reading your patches was actually somewhat painful.

It is so bad mostly because the integration of shared memory policies with
cpusets is not really working. Using either in isolation is ok especially
shared mempolicies do not play well with cpusets.

> If we could just remove the support for it entirely, that would be
> *much* preferable to continue working with this code.

Well shm support needs memory policies to spread data across nodes etc.
AFAICT support was put in due to requirements to support large database
vendors (oracle). Andi?

Its not going to be easy to remove.


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 18:34     ` Christoph Lameter
  0 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-05-30 18:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: kosaki.motohiro, linux-kernel, linux-mm, Andrew Morton,
	Dave Jones, Mel Gorman, stable, hughd, sivanich, KOSAKI Motohiro,
	andi

On Wed, 30 May 2012, Linus Torvalds wrote:

> On Wed, May 30, 2012 at 2:02 AM,  <kosaki.motohiro@gmail.com> wrote:
> >
> > So, I think we should reconsider about shared mempolicy completely.
>
> Quite frankly, I'd prefer that approach. The code is subtle and
> horribly bug-fraught, and I absolutely detest the way it looks too.
> Reading your patches was actually somewhat painful.

It is so bad mostly because the integration of shared memory policies with
cpusets is not really working. Using either in isolation is ok especially
shared mempolicies do not play well with cpusets.

> If we could just remove the support for it entirely, that would be
> *much* preferable to continue working with this code.

Well shm support needs memory policies to spread data across nodes etc.
AFAICT support was put in due to requirements to support large database
vendors (oracle). Andi?

Its not going to be easy to remove.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 18:34     ` Christoph Lameter
@ 2012-05-30 18:46       ` Andi Kleen
  -1 siblings, 0 replies; 82+ messages in thread
From: Andi Kleen @ 2012-05-30 18:46 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Linus Torvalds, kosaki.motohiro, linux-kernel, linux-mm,
	Andrew Morton, Dave Jones, Mel Gorman, stable, hughd, sivanich,
	KOSAKI Motohiro, andi

On Wed, May 30, 2012 at 01:34:21PM -0500, Christoph Lameter wrote:
> On Wed, 30 May 2012, Linus Torvalds wrote:
> 
> > On Wed, May 30, 2012 at 2:02 AM,  <kosaki.motohiro@gmail.com> wrote:
> > >
> > > So, I think we should reconsider about shared mempolicy completely.
> >
> > Quite frankly, I'd prefer that approach. The code is subtle and
> > horribly bug-fraught, and I absolutely detest the way it looks too.
> > Reading your patches was actually somewhat painful.
> 
> It is so bad mostly because the integration of shared memory policies with
> cpusets is not really working. Using either in isolation is ok especially
> shared mempolicies do not play well with cpusets.

Yes the cpusets did some horrible things.

I always regretted that cpusets were no done with custom node lists.
That would have been much cleaner and also likely faster than what we have.

> > If we could just remove the support for it entirely, that would be
> > *much* preferable to continue working with this code.
> 
> Well shm support needs memory policies to spread data across nodes etc.
> AFAICT support was put in due to requirements to support large database
> vendors (oracle). Andi?

Yes we need shared policy for the big databases.

Maybe we could stop supporting cpusets with that though. Not sure they
really use that.

> Its not going to be easy to remove.

Shared policies? I don't think you can remove them.
cpusets+shared policy? maybe, but still will be hard. 

-Andi

> 

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 18:46       ` Andi Kleen
  0 siblings, 0 replies; 82+ messages in thread
From: Andi Kleen @ 2012-05-30 18:46 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Linus Torvalds, kosaki.motohiro, linux-kernel, linux-mm,
	Andrew Morton, Dave Jones, Mel Gorman, stable, hughd, sivanich,
	KOSAKI Motohiro, andi

On Wed, May 30, 2012 at 01:34:21PM -0500, Christoph Lameter wrote:
> On Wed, 30 May 2012, Linus Torvalds wrote:
> 
> > On Wed, May 30, 2012 at 2:02 AM,  <kosaki.motohiro@gmail.com> wrote:
> > >
> > > So, I think we should reconsider about shared mempolicy completely.
> >
> > Quite frankly, I'd prefer that approach. The code is subtle and
> > horribly bug-fraught, and I absolutely detest the way it looks too.
> > Reading your patches was actually somewhat painful.
> 
> It is so bad mostly because the integration of shared memory policies with
> cpusets is not really working. Using either in isolation is ok especially
> shared mempolicies do not play well with cpusets.

Yes the cpusets did some horrible things.

I always regretted that cpusets were no done with custom node lists.
That would have been much cleaner and also likely faster than what we have.

> > If we could just remove the support for it entirely, that would be
> > *much* preferable to continue working with this code.
> 
> Well shm support needs memory policies to spread data across nodes etc.
> AFAICT support was put in due to requirements to support large database
> vendors (oracle). Andi?

Yes we need shared policy for the big databases.

Maybe we could stop supporting cpusets with that though. Not sure they
really use that.

> Its not going to be easy to remove.

Shared policies? I don't think you can remove them.
cpusets+shared policy? maybe, but still will be hard. 

-Andi

> 

-- 
ak@linux.intel.com -- Speaking for myself only.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 18:46       ` Andi Kleen
@ 2012-05-30 18:50         ` Christoph Lameter
  -1 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-05-30 18:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, kosaki.motohiro, linux-kernel, linux-mm,
	Andrew Morton, Dave Jones, Mel Gorman, stable, hughd, sivanich,
	KOSAKI Motohiro

On Wed, 30 May 2012, Andi Kleen wrote:

> I always regretted that cpusets were no done with custom node lists.
> That would have been much cleaner and also likely faster than what we have.

Could shared memory policies ignore cpuset constraints?

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 18:50         ` Christoph Lameter
  0 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-05-30 18:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, kosaki.motohiro, linux-kernel, linux-mm,
	Andrew Morton, Dave Jones, Mel Gorman, stable, hughd, sivanich,
	KOSAKI Motohiro

On Wed, 30 May 2012, Andi Kleen wrote:

> I always regretted that cpusets were no done with custom node lists.
> That would have been much cleaner and also likely faster than what we have.

Could shared memory policies ignore cpuset constraints?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 18:34     ` Christoph Lameter
@ 2012-05-30 19:03       ` Linus Torvalds
  -1 siblings, 0 replies; 82+ messages in thread
From: Linus Torvalds @ 2012-05-30 19:03 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: kosaki.motohiro, linux-kernel, linux-mm, Andrew Morton,
	Dave Jones, Mel Gorman, stable, hughd, sivanich, KOSAKI Motohiro,
	andi

On Wed, May 30, 2012 at 11:34 AM, Christoph Lameter <cl@linux.com> wrote:
>
> Well shm support needs memory policies to spread data across nodes etc.
> AFAICT support was put in due to requirements to support large database
> vendors (oracle). Andi?
>
> Its not going to be easy to remove.

Ok. So can the people involved with this please review this patch-set
and comment on this one? I think it needs a bit of language editing
for the commit commentary, and I'd like it to see a *lot * of ack's
from the relevant vm people.

Please?

                      Linus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 19:03       ` Linus Torvalds
  0 siblings, 0 replies; 82+ messages in thread
From: Linus Torvalds @ 2012-05-30 19:03 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: kosaki.motohiro, linux-kernel, linux-mm, Andrew Morton,
	Dave Jones, Mel Gorman, stable, hughd, sivanich, KOSAKI Motohiro,
	andi

On Wed, May 30, 2012 at 11:34 AM, Christoph Lameter <cl@linux.com> wrote:
>
> Well shm support needs memory policies to spread data across nodes etc.
> AFAICT support was put in due to requirements to support large database
> vendors (oracle). Andi?
>
> Its not going to be easy to remove.

Ok. So can the people involved with this please review this patch-set
and comment on this one? I think it needs a bit of language editing
for the commit commentary, and I'd like it to see a *lot * of ack's
from the relevant vm people.

Please?

                      Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 1/6] Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"
  2012-05-30  9:02   ` kosaki.motohiro
@ 2012-05-30 19:17     ` Christoph Lameter
  -1 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-05-30 19:17 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, stable, hughd, Andrew Morton, KOSAKI Motohiro

On Wed, 30 May 2012, kosaki.motohiro@gmail.com wrote:

> From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>
> commit 05f144a0d5 removed vma->vm_policy updates code and it is a purpose of
> mbind_range(). Now, mbind_range() is virtually no-op. no-op function don't
> makes any bugs, I agree. but maybe it is not right fix.

I dont really understand the changelog. But to restore the policy_vma() is
the right thing to do since there are potential multiple use cases where
we want to apply a policy to a vma.

Proposed new changelog:

Commit 05f144a0d5 folded policy_vma() into mbind_range(). There are
other use cases of policy_vma(*) though and so revert a piece of
that commit in order to have a policy_vma() function again.

> @@ -655,23 +676,9 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>  			if (err)
>  				goto out;
>  		}
> -
> -		/*
> -		 * Apply policy to a single VMA. The reference counting of
> -		 * policy for vma_policy linkages has already been handled by
> -		 * vma_merge and split_vma as necessary. If this is a shared
> -		 * policy then ->set_policy will increment the reference count
> -		 * for an sp node.
> -		 */

You are dropping the nice comments by Mel that explain the refcounting.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 1/6] Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"
@ 2012-05-30 19:17     ` Christoph Lameter
  0 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-05-30 19:17 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, stable, hughd, Andrew Morton, KOSAKI Motohiro

On Wed, 30 May 2012, kosaki.motohiro@gmail.com wrote:

> From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>
> commit 05f144a0d5 removed vma->vm_policy updates code and it is a purpose of
> mbind_range(). Now, mbind_range() is virtually no-op. no-op function don't
> makes any bugs, I agree. but maybe it is not right fix.

I dont really understand the changelog. But to restore the policy_vma() is
the right thing to do since there are potential multiple use cases where
we want to apply a policy to a vma.

Proposed new changelog:

Commit 05f144a0d5 folded policy_vma() into mbind_range(). There are
other use cases of policy_vma(*) though and so revert a piece of
that commit in order to have a policy_vma() function again.

> @@ -655,23 +676,9 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>  			if (err)
>  				goto out;
>  		}
> -
> -		/*
> -		 * Apply policy to a single VMA. The reference counting of
> -		 * policy for vma_policy linkages has already been handled by
> -		 * vma_merge and split_vma as necessary. If this is a shared
> -		 * policy then ->set_policy will increment the reference count
> -		 * for an sp node.
> -		 */

You are dropping the nice comments by Mel that explain the refcounting.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 6/6] MAINTAINERS: Added MEMPOLICY entry
  2012-05-30  9:02   ` kosaki.motohiro
@ 2012-05-30 19:27     ` Christoph Lameter
  -1 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-05-30 19:27 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, stable, hughd, KOSAKI Motohiro, andi

Glad that someone is willing to take this on. Shared memory policies are
an ugly thing.

Adding Andi. He needs to be cced on future postings of this patchset.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 6/6] MAINTAINERS: Added MEMPOLICY entry
@ 2012-05-30 19:27     ` Christoph Lameter
  0 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-05-30 19:27 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, stable, hughd, KOSAKI Motohiro, andi

Glad that someone is willing to take this on. Shared memory policies are
an ugly thing.

Adding Andi. He needs to be cced on future postings of this patchset.

Acked-by: Christoph Lameter <cl@linux.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 18:50         ` Christoph Lameter
@ 2012-05-30 19:32           ` Andi Kleen
  -1 siblings, 0 replies; 82+ messages in thread
From: Andi Kleen @ 2012-05-30 19:32 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, Linus Torvalds, kosaki.motohiro, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich, KOSAKI Motohiro

On Wed, May 30, 2012 at 01:50:02PM -0500, Christoph Lameter wrote:
> On Wed, 30 May 2012, Andi Kleen wrote:
> 
> > I always regretted that cpusets were no done with custom node lists.
> > That would have been much cleaner and also likely faster than what we have.
> 
> Could shared memory policies ignore cpuset constraints?

Only if noone uses cpusets as a "security" mechanism, just for a "soft policy"
Even with soft policy you could well break someone's setup.

Maybe there are some better ways to do that now with memcg, not fully sure.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 19:32           ` Andi Kleen
  0 siblings, 0 replies; 82+ messages in thread
From: Andi Kleen @ 2012-05-30 19:32 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, Linus Torvalds, kosaki.motohiro, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich, KOSAKI Motohiro

On Wed, May 30, 2012 at 01:50:02PM -0500, Christoph Lameter wrote:
> On Wed, 30 May 2012, Andi Kleen wrote:
> 
> > I always regretted that cpusets were no done with custom node lists.
> > That would have been much cleaner and also likely faster than what we have.
> 
> Could shared memory policies ignore cpuset constraints?

Only if noone uses cpusets as a "security" mechanism, just for a "soft policy"
Even with soft policy you could well break someone's setup.

Maybe there are some better ways to do that now with memcg, not fully sure.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 1/6] Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"
  2012-05-30 19:17     ` Christoph Lameter
  (?)
@ 2012-05-30 19:39       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 19:39 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, stable, hughd, Andrew Morton

On Wed, May 30, 2012 at 3:17 PM, Christoph Lameter <cl@linux.com> wrote:
> On Wed, 30 May 2012, kosaki.motohiro@gmail.com wrote:
>
>> From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>>
>> commit 05f144a0d5 removed vma->vm_policy updates code and it is a purpose of
>> mbind_range(). Now, mbind_range() is virtually no-op. no-op function don't
>> makes any bugs, I agree. but maybe it is not right fix.
>
> I dont really understand the changelog. But to restore the policy_vma() is
> the right thing to do since there are potential multiple use cases where
> we want to apply a policy to a vma.
>
> Proposed new changelog:
>
> Commit 05f144a0d5 folded policy_vma() into mbind_range(). There are
> other use cases of policy_vma(*) though and so revert a piece of
> that commit in order to have a policy_vma() function again.
>
>> @@ -655,23 +676,9 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>>                       if (err)
>>                               goto out;
>>               }
>> -
>> -             /*
>> -              * Apply policy to a single VMA. The reference counting of
>> -              * policy for vma_policy linkages has already been handled by
>> -              * vma_merge and split_vma as necessary. If this is a shared
>> -              * policy then ->set_policy will increment the reference count
>> -              * for an sp node.
>> -              */
>
> You are dropping the nice comments by Mel that explain the refcounting.

Because this is not strictly correct. 1) vma_merge() and split_vma() don't
care mempolicy refcount. They only dup and drop it. 2) This mpol_get() is
for vma attaching. This function don't need to care sp_node internal.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 1/6] Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"
@ 2012-05-30 19:39       ` KOSAKI Motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 19:39 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, stable, hughd, Andrew Morton

On Wed, May 30, 2012 at 3:17 PM, Christoph Lameter <cl@linux.com> wrote:
> On Wed, 30 May 2012, kosaki.motohiro@gmail.com wrote:
>
>> From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>>
>> commit 05f144a0d5 removed vma->vm_policy updates code and it is a purpose of
>> mbind_range(). Now, mbind_range() is virtually no-op. no-op function don't
>> makes any bugs, I agree. but maybe it is not right fix.
>
> I dont really understand the changelog. But to restore the policy_vma() is
> the right thing to do since there are potential multiple use cases where
> we want to apply a policy to a vma.
>
> Proposed new changelog:
>
> Commit 05f144a0d5 folded policy_vma() into mbind_range(). There are
> other use cases of policy_vma(*) though and so revert a piece of
> that commit in order to have a policy_vma() function again.
>
>> @@ -655,23 +676,9 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>> � � � � � � � � � � � if (err)
>> � � � � � � � � � � � � � � � goto out;
>> � � � � � � � }
>> -
>> - � � � � � � /*
>> - � � � � � � �* Apply policy to a single VMA. The reference counting of
>> - � � � � � � �* policy for vma_policy linkages has already been handled by
>> - � � � � � � �* vma_merge and split_vma as necessary. If this is a shared
>> - � � � � � � �* policy then ->set_policy will increment the reference count
>> - � � � � � � �* for an sp node.
>> - � � � � � � �*/
>
> You are dropping the nice comments by Mel that explain the refcounting.

Because this is not strictly correct. 1) vma_merge() and split_vma() don't
care mempolicy refcount. They only dup and drop it. 2) This mpol_get() is
for vma attaching. This function don't need to care sp_node internal.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 1/6] Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"
@ 2012-05-30 19:39       ` KOSAKI Motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 19:39 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, stable, hughd, Andrew Morton

On Wed, May 30, 2012 at 3:17 PM, Christoph Lameter <cl@linux.com> wrote:
> On Wed, 30 May 2012, kosaki.motohiro@gmail.com wrote:
>
>> From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
>>
>> commit 05f144a0d5 removed vma->vm_policy updates code and it is a purpose of
>> mbind_range(). Now, mbind_range() is virtually no-op. no-op function don't
>> makes any bugs, I agree. but maybe it is not right fix.
>
> I dont really understand the changelog. But to restore the policy_vma() is
> the right thing to do since there are potential multiple use cases where
> we want to apply a policy to a vma.
>
> Proposed new changelog:
>
> Commit 05f144a0d5 folded policy_vma() into mbind_range(). There are
> other use cases of policy_vma(*) though and so revert a piece of
> that commit in order to have a policy_vma() function again.
>
>> @@ -655,23 +676,9 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>>                       if (err)
>>                               goto out;
>>               }
>> -
>> -             /*
>> -              * Apply policy to a single VMA. The reference counting of
>> -              * policy for vma_policy linkages has already been handled by
>> -              * vma_merge and split_vma as necessary. If this is a shared
>> -              * policy then ->set_policy will increment the reference count
>> -              * for an sp node.
>> -              */
>
> You are dropping the nice comments by Mel that explain the refcounting.

Because this is not strictly correct. 1) vma_merge() and split_vma() don't
care mempolicy refcount. They only dup and drop it. 2) This mpol_get() is
for vma attaching. This function don't need to care sp_node internal.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 2/6] mempolicy: Kill all mempolicy sharing
  2012-05-30  9:02   ` kosaki.motohiro
@ 2012-05-30 19:41     ` Christoph Lameter
  -1 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-05-30 19:41 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, stable, hughd, KOSAKI Motohiro, Andrew Morton,
	andi

On Wed, 30 May 2012, kosaki.motohiro@gmail.com wrote:

> refcount will be decreased even though was not increased whenever alloc_page_vma()
> is called. As you know, mere mbind(MPOL_MF_MOVE) calls alloc_page_vma().

Most of these issues are about memory migration and shared memory. If we
exempt shared memory from memory migration (after all that shared memory
has its own distinct memory policies already!) then a lot of these issues
wont arise.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 2/6] mempolicy: Kill all mempolicy sharing
@ 2012-05-30 19:41     ` Christoph Lameter
  0 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-05-30 19:41 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, stable, hughd, KOSAKI Motohiro, Andrew Morton,
	andi

On Wed, 30 May 2012, kosaki.motohiro@gmail.com wrote:

> refcount will be decreased even though was not increased whenever alloc_page_vma()
> is called. As you know, mere mbind(MPOL_MF_MOVE) calls alloc_page_vma().

Most of these issues are about memory migration and shared memory. If we
exempt shared memory from memory migration (after all that shared memory
has its own distinct memory policies already!) then a lot of these issues
wont arise.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 19:32           ` Andi Kleen
@ 2012-05-30 19:42             ` Christoph Lameter
  -1 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-05-30 19:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, kosaki.motohiro, linux-kernel, linux-mm,
	Andrew Morton, Dave Jones, Mel Gorman, stable, hughd, sivanich,
	KOSAKI Motohiro

On Wed, 30 May 2012, Andi Kleen wrote:

> On Wed, May 30, 2012 at 01:50:02PM -0500, Christoph Lameter wrote:
> > On Wed, 30 May 2012, Andi Kleen wrote:
> >
> > > I always regretted that cpusets were no done with custom node lists.
> > > That would have been much cleaner and also likely faster than what we have.
> >
> > Could shared memory policies ignore cpuset constraints?
>
> Only if noone uses cpusets as a "security" mechanism, just for a "soft policy"
> Even with soft policy you could well break someone's setup.

Well at least lets exempt shared memory from memory migration and memory
policy updates. That seems to be causing many of these issues.



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 19:42             ` Christoph Lameter
  0 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-05-30 19:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, kosaki.motohiro, linux-kernel, linux-mm,
	Andrew Morton, Dave Jones, Mel Gorman, stable, hughd, sivanich,
	KOSAKI Motohiro

On Wed, 30 May 2012, Andi Kleen wrote:

> On Wed, May 30, 2012 at 01:50:02PM -0500, Christoph Lameter wrote:
> > On Wed, 30 May 2012, Andi Kleen wrote:
> >
> > > I always regretted that cpusets were no done with custom node lists.
> > > That would have been much cleaner and also likely faster than what we have.
> >
> > Could shared memory policies ignore cpuset constraints?
>
> Only if noone uses cpusets as a "security" mechanism, just for a "soft policy"
> Even with soft policy you could well break someone's setup.

Well at least lets exempt shared memory from memory migration and memory
policy updates. That seems to be causing many of these issues.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 1/6] Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"
  2012-05-30 19:39       ` KOSAKI Motohiro
@ 2012-05-30 19:44         ` Christoph Lameter
  -1 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-05-30 19:44 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, stable, hughd, Andrew Morton

On Wed, 30 May 2012, KOSAKI Motohiro wrote:

> > You are dropping the nice comments by Mel that explain the refcounting.
>
> Because this is not strictly correct. 1) vma_merge() and split_vma() don't
> care mempolicy refcount. They only dup and drop it. 2) This mpol_get() is
> for vma attaching. This function don't need to care sp_node internal.

Ok then say so in the changelog.



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 1/6] Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"
@ 2012-05-30 19:44         ` Christoph Lameter
  0 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-05-30 19:44 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, stable, hughd, Andrew Morton

On Wed, 30 May 2012, KOSAKI Motohiro wrote:

> > You are dropping the nice comments by Mel that explain the refcounting.
>
> Because this is not strictly correct. 1) vma_merge() and split_vma() don't
> care mempolicy refcount. They only dup and drop it. 2) This mpol_get() is
> for vma attaching. This function don't need to care sp_node internal.

Ok then say so in the changelog.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 2/6] mempolicy: Kill all mempolicy sharing
  2012-05-30 19:41     ` Christoph Lameter
@ 2012-05-30 19:46       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 19:46 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, stable, hughd, Andrew Morton, andi

On Wed, May 30, 2012 at 3:41 PM, Christoph Lameter <cl@linux.com> wrote:
> On Wed, 30 May 2012, kosaki.motohiro@gmail.com wrote:
>
>> refcount will be decreased even though was not increased whenever alloc_page_vma()
>> is called. As you know, mere mbind(MPOL_MF_MOVE) calls alloc_page_vma().
>
> Most of these issues are about memory migration and shared memory. If we
> exempt shared memory from memory migration (after all that shared memory
> has its own distinct memory policies already!) then a lot of these issues
> wont arise.

The final point is, to make proper cow for struct mempolicy. but until
fixing cpuset
bindings issue, we can't use mempolicy sharing anyway.

Moreover, now most core piece of mempolicy life cycle, say split_vma()
and dup_mmap(), don't use mempolicy sharing. Only mbind() does.

Thus, this patch don't increase normal workload memory usage.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 2/6] mempolicy: Kill all mempolicy sharing
@ 2012-05-30 19:46       ` KOSAKI Motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 19:46 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, stable, hughd, Andrew Morton, andi

On Wed, May 30, 2012 at 3:41 PM, Christoph Lameter <cl@linux.com> wrote:
> On Wed, 30 May 2012, kosaki.motohiro@gmail.com wrote:
>
>> refcount will be decreased even though was not increased whenever alloc_page_vma()
>> is called. As you know, mere mbind(MPOL_MF_MOVE) calls alloc_page_vma().
>
> Most of these issues are about memory migration and shared memory. If we
> exempt shared memory from memory migration (after all that shared memory
> has its own distinct memory policies already!) then a lot of these issues
> wont arise.

The final point is, to make proper cow for struct mempolicy. but until
fixing cpuset
bindings issue, we can't use mempolicy sharing anyway.

Moreover, now most core piece of mempolicy life cycle, say split_vma()
and dup_mmap(), don't use mempolicy sharing. Only mbind() does.

Thus, this patch don't increase normal workload memory usage.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 2/6] mempolicy: Kill all mempolicy sharing
  2012-05-30 19:41     ` Christoph Lameter
@ 2012-05-30 19:48       ` Andi Kleen
  -1 siblings, 0 replies; 82+ messages in thread
From: Andi Kleen @ 2012-05-30 19:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: kosaki.motohiro, linux-kernel, linux-mm, Andrew Morton,
	Dave Jones, Mel Gorman, Linus Torvalds, stable, hughd,
	KOSAKI Motohiro, Andrew Morton, andi

On Wed, May 30, 2012 at 02:41:22PM -0500, Christoph Lameter wrote:
> On Wed, 30 May 2012, kosaki.motohiro@gmail.com wrote:
> 
> > refcount will be decreased even though was not increased whenever alloc_page_vma()
> > is called. As you know, mere mbind(MPOL_MF_MOVE) calls alloc_page_vma().
> 
> Most of these issues are about memory migration and shared memory. If we
> exempt shared memory from memory migration (after all that shared memory
> has its own distinct memory policies already!) then a lot of these issues
> wont arise.

Soft memory offlining needs migration. It's fairly important that this
works: on the database systems most memory is in shared memory and they
have a lot of memory, so predictive failure analysis and soft offlining
helps a lot.

Classic migration is probably not too important here, but they pretty
much rely on the same low level mechanism.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 2/6] mempolicy: Kill all mempolicy sharing
@ 2012-05-30 19:48       ` Andi Kleen
  0 siblings, 0 replies; 82+ messages in thread
From: Andi Kleen @ 2012-05-30 19:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: kosaki.motohiro, linux-kernel, linux-mm, Andrew Morton,
	Dave Jones, Mel Gorman, Linus Torvalds, stable, hughd,
	KOSAKI Motohiro, Andrew Morton, andi

On Wed, May 30, 2012 at 02:41:22PM -0500, Christoph Lameter wrote:
> On Wed, 30 May 2012, kosaki.motohiro@gmail.com wrote:
> 
> > refcount will be decreased even though was not increased whenever alloc_page_vma()
> > is called. As you know, mere mbind(MPOL_MF_MOVE) calls alloc_page_vma().
> 
> Most of these issues are about memory migration and shared memory. If we
> exempt shared memory from memory migration (after all that shared memory
> has its own distinct memory policies already!) then a lot of these issues
> wont arise.

Soft memory offlining needs migration. It's fairly important that this
works: on the database systems most memory is in shared memory and they
have a lot of memory, so predictive failure analysis and soft offlining
helps a lot.

Classic migration is probably not too important here, but they pretty
much rely on the same low level mechanism.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 19:42             ` Christoph Lameter
@ 2012-05-30 19:52               ` Andi Kleen
  -1 siblings, 0 replies; 82+ messages in thread
From: Andi Kleen @ 2012-05-30 19:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, Linus Torvalds, kosaki.motohiro, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich, KOSAKI Motohiro

On Wed, May 30, 2012 at 02:42:42PM -0500, Christoph Lameter wrote:
> On Wed, 30 May 2012, Andi Kleen wrote:
> 
> > On Wed, May 30, 2012 at 01:50:02PM -0500, Christoph Lameter wrote:
> > > On Wed, 30 May 2012, Andi Kleen wrote:
> > >
> > > > I always regretted that cpusets were no done with custom node lists.
> > > > That would have been much cleaner and also likely faster than what we have.
> > >
> > > Could shared memory policies ignore cpuset constraints?
> >
> > Only if noone uses cpusets as a "security" mechanism, just for a "soft policy"
> > Even with soft policy you could well break someone's setup.
> 
> Well at least lets exempt shared memory from memory migration and memory
> policy updates. That seems to be causing many of these issues.

Migration on the page level is needed for the memory error handling.

Updates: you mean not allowing to set the policy when there are already
multiple mappers? I could see that causing some unexpected behaviour. Presumably
a standard database will only set it at the beginning, but I don't know
if that would work for all users.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 19:52               ` Andi Kleen
  0 siblings, 0 replies; 82+ messages in thread
From: Andi Kleen @ 2012-05-30 19:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, Linus Torvalds, kosaki.motohiro, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich, KOSAKI Motohiro

On Wed, May 30, 2012 at 02:42:42PM -0500, Christoph Lameter wrote:
> On Wed, 30 May 2012, Andi Kleen wrote:
> 
> > On Wed, May 30, 2012 at 01:50:02PM -0500, Christoph Lameter wrote:
> > > On Wed, 30 May 2012, Andi Kleen wrote:
> > >
> > > > I always regretted that cpusets were no done with custom node lists.
> > > > That would have been much cleaner and also likely faster than what we have.
> > >
> > > Could shared memory policies ignore cpuset constraints?
> >
> > Only if noone uses cpusets as a "security" mechanism, just for a "soft policy"
> > Even with soft policy you could well break someone's setup.
> 
> Well at least lets exempt shared memory from memory migration and memory
> policy updates. That seems to be causing many of these issues.

Migration on the page level is needed for the memory error handling.

Updates: you mean not allowing to set the policy when there are already
multiple mappers? I could see that causing some unexpected behaviour. Presumably
a standard database will only set it at the beginning, but I don't know
if that would work for all users.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 19:42             ` Christoph Lameter
@ 2012-05-30 19:54               ` KOSAKI Motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 19:54 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, Linus Torvalds, linux-kernel, linux-mm,
	Andrew Morton, Dave Jones, Mel Gorman, stable, hughd, sivanich

On Wed, May 30, 2012 at 3:42 PM, Christoph Lameter <cl@linux.com> wrote:
> On Wed, 30 May 2012, Andi Kleen wrote:
>
>> On Wed, May 30, 2012 at 01:50:02PM -0500, Christoph Lameter wrote:
>> > On Wed, 30 May 2012, Andi Kleen wrote:
>> >
>> > > I always regretted that cpusets were no done with custom node lists.
>> > > That would have been much cleaner and also likely faster than what we have.
>> >
>> > Could shared memory policies ignore cpuset constraints?
>>
>> Only if noone uses cpusets as a "security" mechanism, just for a "soft policy"
>> Even with soft policy you could well break someone's setup.
>
> Well at least lets exempt shared memory from memory migration and memory
> policy updates. That seems to be causing many of these issues.

Yes, that's right direction, I think. Currently, shmem_set_policy() can't handle
nonlinear mapping. vma -> file offset transration is not so easy work
and I doubt
we should do.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 19:54               ` KOSAKI Motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 19:54 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, Linus Torvalds, linux-kernel, linux-mm,
	Andrew Morton, Dave Jones, Mel Gorman, stable, hughd, sivanich

On Wed, May 30, 2012 at 3:42 PM, Christoph Lameter <cl@linux.com> wrote:
> On Wed, 30 May 2012, Andi Kleen wrote:
>
>> On Wed, May 30, 2012 at 01:50:02PM -0500, Christoph Lameter wrote:
>> > On Wed, 30 May 2012, Andi Kleen wrote:
>> >
>> > > I always regretted that cpusets were no done with custom node lists.
>> > > That would have been much cleaner and also likely faster than what we have.
>> >
>> > Could shared memory policies ignore cpuset constraints?
>>
>> Only if noone uses cpusets as a "security" mechanism, just for a "soft policy"
>> Even with soft policy you could well break someone's setup.
>
> Well at least lets exempt shared memory from memory migration and memory
> policy updates. That seems to be causing many of these issues.

Yes, that's right direction, I think. Currently, shmem_set_policy() can't handle
nonlinear mapping. vma -> file offset transration is not so easy work
and I doubt
we should do.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 19:52               ` Andi Kleen
@ 2012-05-30 20:00                 ` KOSAKI Motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 20:00 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, Linus Torvalds, linux-kernel, linux-mm,
	Andrew Morton, Dave Jones, Mel Gorman, stable, hughd, sivanich

On Wed, May 30, 2012 at 3:52 PM, Andi Kleen <andi@firstfloor.org> wrote:
> On Wed, May 30, 2012 at 02:42:42PM -0500, Christoph Lameter wrote:
>> On Wed, 30 May 2012, Andi Kleen wrote:
>>
>> > On Wed, May 30, 2012 at 01:50:02PM -0500, Christoph Lameter wrote:
>> > > On Wed, 30 May 2012, Andi Kleen wrote:
>> > >
>> > > > I always regretted that cpusets were no done with custom node lists.
>> > > > That would have been much cleaner and also likely faster than what we have.
>> > >
>> > > Could shared memory policies ignore cpuset constraints?
>> >
>> > Only if noone uses cpusets as a "security" mechanism, just for a "soft policy"
>> > Even with soft policy you could well break someone's setup.
>>
>> Well at least lets exempt shared memory from memory migration and memory
>> policy updates. That seems to be causing many of these issues.
>
> Migration on the page level is needed for the memory error handling.
>
> Updates: you mean not allowing to set the policy when there are already
> multiple mappers? I could see that causing some unexpected behaviour. Presumably
> a standard database will only set it at the beginning, but I don't know
> if that would work for all users.

We don't need to kill migration core. We only need to kill that mbind(2) updates
vma->policy of shmem.

page migration for hwpoison is harmless. Because of, an attacker can't
inject hwpoison
intentntionally on production environment (HWPOISON_INJECTION=N).

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 20:00                 ` KOSAKI Motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 20:00 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, Linus Torvalds, linux-kernel, linux-mm,
	Andrew Morton, Dave Jones, Mel Gorman, stable, hughd, sivanich

On Wed, May 30, 2012 at 3:52 PM, Andi Kleen <andi@firstfloor.org> wrote:
> On Wed, May 30, 2012 at 02:42:42PM -0500, Christoph Lameter wrote:
>> On Wed, 30 May 2012, Andi Kleen wrote:
>>
>> > On Wed, May 30, 2012 at 01:50:02PM -0500, Christoph Lameter wrote:
>> > > On Wed, 30 May 2012, Andi Kleen wrote:
>> > >
>> > > > I always regretted that cpusets were no done with custom node lists.
>> > > > That would have been much cleaner and also likely faster than what we have.
>> > >
>> > > Could shared memory policies ignore cpuset constraints?
>> >
>> > Only if noone uses cpusets as a "security" mechanism, just for a "soft policy"
>> > Even with soft policy you could well break someone's setup.
>>
>> Well at least lets exempt shared memory from memory migration and memory
>> policy updates. That seems to be causing many of these issues.
>
> Migration on the page level is needed for the memory error handling.
>
> Updates: you mean not allowing to set the policy when there are already
> multiple mappers? I could see that causing some unexpected behaviour. Presumably
> a standard database will only set it at the beginning, but I don't know
> if that would work for all users.

We don't need to kill migration core. We only need to kill that mbind(2) updates
vma->policy of shmem.

page migration for hwpoison is harmless. Because of, an attacker can't
inject hwpoison
intentntionally on production environment (HWPOISON_INJECTION=N).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 18:26   ` Linus Torvalds
  (?)
@ 2012-05-30 20:10     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 20:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Christoph Lameter, stable, hughd

On Wed, May 30, 2012 at 2:26 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, May 30, 2012 at 2:02 AM,  <kosaki.motohiro@gmail.com> wrote:
>>
>> So, I think we should reconsider about shared mempolicy completely.
>
> Quite frankly, I'd prefer that approach. The code is subtle and
> horribly bug-fraught, and I absolutely detest the way it looks too.
> Reading your patches was actually somewhat painful.

Oh, very sorry. I made effort to make smallest and simplest patches.
But I couldn't I do better. Current MPOL_F_SHARED is cra^H^H^H
complex. ;-)


> If we could just remove the support for it entirely, that would be
> *much* preferable to continue working with this code.
>
> Could we just try that removal, and see if anybody screams?

I'm keeping neutral a while and hope to hear other developers opinion.

Thank you.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 20:10     ` KOSAKI Motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 20:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Christoph Lameter, stable, hughd

On Wed, May 30, 2012 at 2:26 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, May 30, 2012 at 2:02 AM, �<kosaki.motohiro@gmail.com> wrote:
>>
>> So, I think we should reconsider about shared mempolicy completely.
>
> Quite frankly, I'd prefer that approach. The code is subtle and
> horribly bug-fraught, and I absolutely detest the way it looks too.
> Reading your patches was actually somewhat painful.

Oh, very sorry. I made effort to make smallest and simplest patches.
But I couldn't I do better. Current MPOL_F_SHARED is cra^H^H^H
complex. ;-)


> If we could just remove the support for it entirely, that would be
> *much* preferable to continue working with this code.
>
> Could we just try that removal, and see if anybody screams?

I'm keeping neutral a while and hope to hear other developers opinion.

Thank you.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 20:10     ` KOSAKI Motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 20:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Christoph Lameter, stable, hughd

On Wed, May 30, 2012 at 2:26 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, May 30, 2012 at 2:02 AM,  <kosaki.motohiro@gmail.com> wrote:
>>
>> So, I think we should reconsider about shared mempolicy completely.
>
> Quite frankly, I'd prefer that approach. The code is subtle and
> horribly bug-fraught, and I absolutely detest the way it looks too.
> Reading your patches was actually somewhat painful.

Oh, very sorry. I made effort to make smallest and simplest patches.
But I couldn't I do better. Current MPOL_F_SHARED is cra^H^H^H
complex. ;-)


> If we could just remove the support for it entirely, that would be
> *much* preferable to continue working with this code.
>
> Could we just try that removal, and see if anybody screams?

I'm keeping neutral a while and hope to hear other developers opinion.

Thank you.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 19:54               ` KOSAKI Motohiro
@ 2012-05-30 20:10                 ` Andi Kleen
  -1 siblings, 0 replies; 82+ messages in thread
From: Andi Kleen @ 2012-05-30 20:10 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Andi Kleen, Linus Torvalds, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich

> Yes, that's right direction, I think. Currently, shmem_set_policy() can't handle
> nonlinear mapping.

I've been mulling for some time to just remove non linear mappings.
AFAIK they were only useful on 32bit and are obsolete and could be
emulated with VMAs instead.

-Andi

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 20:10                 ` Andi Kleen
  0 siblings, 0 replies; 82+ messages in thread
From: Andi Kleen @ 2012-05-30 20:10 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Andi Kleen, Linus Torvalds, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich

> Yes, that's right direction, I think. Currently, shmem_set_policy() can't handle
> nonlinear mapping.

I've been mulling for some time to just remove non linear mappings.
AFAIK they were only useful on 32bit and are obsolete and could be
emulated with VMAs instead.

-Andi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 20:10                 ` Andi Kleen
@ 2012-05-30 20:16                   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 20:16 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, Linus Torvalds, linux-kernel, linux-mm,
	Andrew Morton, Dave Jones, Mel Gorman, stable, hughd, sivanich

On Wed, May 30, 2012 at 4:10 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> Yes, that's right direction, I think. Currently, shmem_set_policy() can't handle
>> nonlinear mapping.
>
> I've been mulling for some time to just remove non linear mappings.
> AFAIK they were only useful on 32bit and are obsolete and could be
> emulated with VMAs instead.

I agree. It is only userful on 32bit and current enterprise users don't use
32bit anymore. So, I don't think emulated by vmas cause user visible issue.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 20:16                   ` KOSAKI Motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 20:16 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, Linus Torvalds, linux-kernel, linux-mm,
	Andrew Morton, Dave Jones, Mel Gorman, stable, hughd, sivanich

On Wed, May 30, 2012 at 4:10 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> Yes, that's right direction, I think. Currently, shmem_set_policy() can't handle
>> nonlinear mapping.
>
> I've been mulling for some time to just remove non linear mappings.
> AFAIK they were only useful on 32bit and are obsolete and could be
> emulated with VMAs instead.

I agree. It is only userful on 32bit and current enterprise users don't use
32bit anymore. So, I don't think emulated by vmas cause user visible issue.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 2/6] mempolicy: Kill all mempolicy sharing
  2012-05-30  9:02   ` kosaki.motohiro
@ 2012-05-30 20:31     ` Andi Kleen
  -1 siblings, 0 replies; 82+ messages in thread
From: Andi Kleen @ 2012-05-30 20:31 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, Christoph Lameter, stable, hughd,
	KOSAKI Motohiro, Andrew Morton

kosaki.motohiro@gmail.com writes:

> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
> Dave Jones' system call fuzz testing tool "trinity" triggered the following
> bug error with slab debugging enabled

We have to fix it properly sorry. There are users who benefit from it
and just disabling it is not gonna fly.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 2/6] mempolicy: Kill all mempolicy sharing
@ 2012-05-30 20:31     ` Andi Kleen
  0 siblings, 0 replies; 82+ messages in thread
From: Andi Kleen @ 2012-05-30 20:31 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, Christoph Lameter, stable, hughd,
	KOSAKI Motohiro, Andrew Morton

kosaki.motohiro@gmail.com writes:

> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
> Dave Jones' system call fuzz testing tool "trinity" triggered the following
> bug error with slab debugging enabled

We have to fix it properly sorry. There are users who benefit from it
and just disabling it is not gonna fly.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 5/6] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
  2012-05-30  9:02   ` kosaki.motohiro
@ 2012-05-30 20:37     ` Andi Kleen
  -1 siblings, 0 replies; 82+ messages in thread
From: Andi Kleen @ 2012-05-30 20:37 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, Christoph Lameter, stable, hughd,
	KOSAKI Motohiro, Andrew Morton, Miao Xie, Peter Zijlstra

kosaki.motohiro@gmail.com writes:

> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
> commit cc9a6c8776 (cpuset: mm: reduce large amounts of memory barrier related
> damage v3) introduced a memory corruption.
>
> shmem_alloc_page() passes pseudo vma and it has one significant unique
> combination, vma->vm_ops=NULL and (vma->policy->flags & MPOL_F_SHARED).
>
> Now, get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
> and mpol_cond_put() DOES decrease a policy ref when a policy has MPOL_F_SHARED.
> Therefore, when cpuset race is happen and alloc_pages_vma() fall in
> 'goto retry_cpuset' path, a policy refcount will be decreased too much and
> therefore it will make memory corruption.
>
> This patch fixes it.

Looks good.

Acked-by: Andi Kleen <ak@linux.intel.com>

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 5/6] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()
@ 2012-05-30 20:37     ` Andi Kleen
  0 siblings, 0 replies; 82+ messages in thread
From: Andi Kleen @ 2012-05-30 20:37 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	Linus Torvalds, Christoph Lameter, stable, hughd,
	KOSAKI Motohiro, Andrew Morton, Miao Xie, Peter Zijlstra

kosaki.motohiro@gmail.com writes:

> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
> commit cc9a6c8776 (cpuset: mm: reduce large amounts of memory barrier related
> damage v3) introduced a memory corruption.
>
> shmem_alloc_page() passes pseudo vma and it has one significant unique
> combination, vma->vm_ops=NULL and (vma->policy->flags & MPOL_F_SHARED).
>
> Now, get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
> and mpol_cond_put() DOES decrease a policy ref when a policy has MPOL_F_SHARED.
> Therefore, when cpuset race is happen and alloc_pages_vma() fall in
> 'goto retry_cpuset' path, a policy refcount will be decreased too much and
> therefore it will make memory corruption.
>
> This patch fixes it.

Looks good.

Acked-by: Andi Kleen <ak@linux.intel.com>

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 2/6] mempolicy: Kill all mempolicy sharing
  2012-05-30 20:31     ` Andi Kleen
@ 2012-05-30 20:58       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 20:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: kosaki.motohiro, linux-kernel, linux-mm, Andrew Morton,
	Dave Jones, Mel Gorman, Linus Torvalds, Christoph Lameter,
	stable, hughd, KOSAKI Motohiro, Andrew Morton

(5/30/12 4:31 PM), Andi Kleen wrote:
> kosaki.motohiro@gmail.com writes:
>
>> From: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
>>
>> Dave Jones' system call fuzz testing tool "trinity" triggered the following
>> bug error with slab debugging enabled
>
> We have to fix it properly sorry. There are users who benefit from it
> and just disabling it is not gonna fly.

Why? This patch doesn't make any user visible behavior change. I haven't
caught what does your "proper" mean.




^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 2/6] mempolicy: Kill all mempolicy sharing
@ 2012-05-30 20:58       ` KOSAKI Motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 20:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: kosaki.motohiro, linux-kernel, linux-mm, Andrew Morton,
	Dave Jones, Mel Gorman, Linus Torvalds, Christoph Lameter,
	stable, hughd, KOSAKI Motohiro, Andrew Morton

(5/30/12 4:31 PM), Andi Kleen wrote:
> kosaki.motohiro@gmail.com writes:
>
>> From: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
>>
>> Dave Jones' system call fuzz testing tool "trinity" triggered the following
>> bug error with slab debugging enabled
>
> We have to fix it properly sorry. There are users who benefit from it
> and just disabling it is not gonna fly.

Why? This patch doesn't make any user visible behavior change. I haven't
caught what does your "proper" mean.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 20:00                 ` KOSAKI Motohiro
@ 2012-05-30 21:22                   ` Ben Hutchings
  -1 siblings, 0 replies; 82+ messages in thread
From: Ben Hutchings @ 2012-05-30 21:22 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andi Kleen, Christoph Lameter, Linus Torvalds, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich

On Wed, May 30, 2012 at 04:00:55PM -0400, KOSAKI Motohiro wrote:
> On Wed, May 30, 2012 at 3:52 PM, Andi Kleen <andi@firstfloor.org> wrote:
> > On Wed, May 30, 2012 at 02:42:42PM -0500, Christoph Lameter wrote:
> >> On Wed, 30 May 2012, Andi Kleen wrote:
> >>
> >> > On Wed, May 30, 2012 at 01:50:02PM -0500, Christoph Lameter wrote:
> >> > > On Wed, 30 May 2012, Andi Kleen wrote:
> >> > >
> >> > > > I always regretted that cpusets were no done with custom node lists.
> >> > > > That would have been much cleaner and also likely faster than what we have.
> >> > >
> >> > > Could shared memory policies ignore cpuset constraints?
> >> >
> >> > Only if noone uses cpusets as a "security" mechanism, just for a "soft policy"
> >> > Even with soft policy you could well break someone's setup.
> >>
> >> Well at least lets exempt shared memory from memory migration and memory
> >> policy updates. That seems to be causing many of these issues.
> >
> > Migration on the page level is needed for the memory error handling.
> >
> > Updates: you mean not allowing to set the policy when there are already
> > multiple mappers? I could see that causing some unexpected behaviour. Presumably
> > a standard database will only set it at the beginning, but I don't know
> > if that would work for all users.
> 
> We don't need to kill migration core. We only need to kill that mbind(2) updates
> vma->policy of shmem.
[...]

So should I (and Greg) drop 'mm: mempolicy: Let vma_merge and
vma_split handle vma->vm_policy linkages' from the pending stable
releases?  Or is that OK as an interim fix until these changes go
into mainline?
 
Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 21:22                   ` Ben Hutchings
  0 siblings, 0 replies; 82+ messages in thread
From: Ben Hutchings @ 2012-05-30 21:22 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andi Kleen, Christoph Lameter, Linus Torvalds, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich

On Wed, May 30, 2012 at 04:00:55PM -0400, KOSAKI Motohiro wrote:
> On Wed, May 30, 2012 at 3:52 PM, Andi Kleen <andi@firstfloor.org> wrote:
> > On Wed, May 30, 2012 at 02:42:42PM -0500, Christoph Lameter wrote:
> >> On Wed, 30 May 2012, Andi Kleen wrote:
> >>
> >> > On Wed, May 30, 2012 at 01:50:02PM -0500, Christoph Lameter wrote:
> >> > > On Wed, 30 May 2012, Andi Kleen wrote:
> >> > >
> >> > > > I always regretted that cpusets were no done with custom node lists.
> >> > > > That would have been much cleaner and also likely faster than what we have.
> >> > >
> >> > > Could shared memory policies ignore cpuset constraints?
> >> >
> >> > Only if noone uses cpusets as a "security" mechanism, just for a "soft policy"
> >> > Even with soft policy you could well break someone's setup.
> >>
> >> Well at least lets exempt shared memory from memory migration and memory
> >> policy updates. That seems to be causing many of these issues.
> >
> > Migration on the page level is needed for the memory error handling.
> >
> > Updates: you mean not allowing to set the policy when there are already
> > multiple mappers? I could see that causing some unexpected behaviour. Presumably
> > a standard database will only set it at the beginning, but I don't know
> > if that would work for all users.
> 
> We don't need to kill migration core. We only need to kill that mbind(2) updates
> vma->policy of shmem.
[...]

So should I (and Greg) drop 'mm: mempolicy: Let vma_merge and
vma_split handle vma->vm_policy linkages' from the pending stable
releases?  Or is that OK as an interim fix until these changes go
into mainline?
 
Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 21:22                   ` Ben Hutchings
@ 2012-05-30 21:25                     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 21:25 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: KOSAKI Motohiro, Andi Kleen, Christoph Lameter, Linus Torvalds,
	linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	stable, hughd, sivanich

(5/30/12 5:22 PM), Ben Hutchings wrote:
> On Wed, May 30, 2012 at 04:00:55PM -0400, KOSAKI Motohiro wrote:
>> On Wed, May 30, 2012 at 3:52 PM, Andi Kleen<andi@firstfloor.org>  wrote:
>>> On Wed, May 30, 2012 at 02:42:42PM -0500, Christoph Lameter wrote:
>>>> On Wed, 30 May 2012, Andi Kleen wrote:
>>>>
>>>>> On Wed, May 30, 2012 at 01:50:02PM -0500, Christoph Lameter wrote:
>>>>>> On Wed, 30 May 2012, Andi Kleen wrote:
>>>>>>
>>>>>>> I always regretted that cpusets were no done with custom node lists.
>>>>>>> That would have been much cleaner and also likely faster than what we have.
>>>>>>
>>>>>> Could shared memory policies ignore cpuset constraints?
>>>>>
>>>>> Only if noone uses cpusets as a "security" mechanism, just for a "soft policy"
>>>>> Even with soft policy you could well break someone's setup.
>>>>
>>>> Well at least lets exempt shared memory from memory migration and memory
>>>> policy updates. That seems to be causing many of these issues.
>>>
>>> Migration on the page level is needed for the memory error handling.
>>>
>>> Updates: you mean not allowing to set the policy when there are already
>>> multiple mappers? I could see that causing some unexpected behaviour. Presumably
>>> a standard database will only set it at the beginning, but I don't know
>>> if that would work for all users.
>>
>> We don't need to kill migration core. We only need to kill that mbind(2) updates
>> vma->policy of shmem.
> [...]
>
> So should I (and Greg) drop 'mm: mempolicy: Let vma_merge and
> vma_split handle vma->vm_policy linkages' from the pending stable
> releases?  Or is that OK as an interim fix until these changes go
> into mainline?

Please drop. It screw up mbind(2).





^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-05-30 21:25                     ` KOSAKI Motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-30 21:25 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: KOSAKI Motohiro, Andi Kleen, Christoph Lameter, Linus Torvalds,
	linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	stable, hughd, sivanich

(5/30/12 5:22 PM), Ben Hutchings wrote:
> On Wed, May 30, 2012 at 04:00:55PM -0400, KOSAKI Motohiro wrote:
>> On Wed, May 30, 2012 at 3:52 PM, Andi Kleen<andi@firstfloor.org>  wrote:
>>> On Wed, May 30, 2012 at 02:42:42PM -0500, Christoph Lameter wrote:
>>>> On Wed, 30 May 2012, Andi Kleen wrote:
>>>>
>>>>> On Wed, May 30, 2012 at 01:50:02PM -0500, Christoph Lameter wrote:
>>>>>> On Wed, 30 May 2012, Andi Kleen wrote:
>>>>>>
>>>>>>> I always regretted that cpusets were no done with custom node lists.
>>>>>>> That would have been much cleaner and also likely faster than what we have.
>>>>>>
>>>>>> Could shared memory policies ignore cpuset constraints?
>>>>>
>>>>> Only if noone uses cpusets as a "security" mechanism, just for a "soft policy"
>>>>> Even with soft policy you could well break someone's setup.
>>>>
>>>> Well at least lets exempt shared memory from memory migration and memory
>>>> policy updates. That seems to be causing many of these issues.
>>>
>>> Migration on the page level is needed for the memory error handling.
>>>
>>> Updates: you mean not allowing to set the policy when there are already
>>> multiple mappers? I could see that causing some unexpected behaviour. Presumably
>>> a standard database will only set it at the beginning, but I don't know
>>> if that would work for all users.
>>
>> We don't need to kill migration core. We only need to kill that mbind(2) updates
>> vma->policy of shmem.
> [...]
>
> So should I (and Greg) drop 'mm: mempolicy: Let vma_merge and
> vma_split handle vma->vm_policy linkages' from the pending stable
> releases?  Or is that OK as an interim fix until these changes go
> into mainline?

Please drop. It screw up mbind(2).




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 1/6] Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"
  2012-05-30 19:17     ` Christoph Lameter
@ 2012-05-31  6:49       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-31  6:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, linux-kernel, linux-mm, Andrew Morton,
	Dave Jones, Mel Gorman, Linus Torvalds, stable, hughd,
	Andrew Morton, KOSAKI Motohiro

(5/30/12 3:17 PM), Christoph Lameter wrote:
> On Wed, 30 May 2012, kosaki.motohiro@gmail.com wrote:
>
>> From: KOSAKI Motohiro<kosaki.motohiro@gmail.com>
>>
>> commit 05f144a0d5 removed vma->vm_policy updates code and it is a purpose of
>> mbind_range(). Now, mbind_range() is virtually no-op. no-op function don't
>> makes any bugs, I agree. but maybe it is not right fix.
>
> I dont really understand the changelog. But to restore the policy_vma() is
> the right thing to do since there are potential multiple use cases where
> we want to apply a policy to a vma.
>
> Proposed new changelog:
>
> Commit 05f144a0d5 folded policy_vma() into mbind_range(). There are
> other use cases of policy_vma(*) though and so revert a piece of
> that commit in order to have a policy_vma() function again.

sorry, I overlooked this. Commit 05f144a0d5 don't work neither regular vma
nor shmem vma. thus I can't take this proposal. sorry.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 1/6] Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"
@ 2012-05-31  6:49       ` KOSAKI Motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-05-31  6:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, linux-kernel, linux-mm, Andrew Morton,
	Dave Jones, Mel Gorman, Linus Torvalds, stable, hughd,
	Andrew Morton, KOSAKI Motohiro

(5/30/12 3:17 PM), Christoph Lameter wrote:
> On Wed, 30 May 2012, kosaki.motohiro@gmail.com wrote:
>
>> From: KOSAKI Motohiro<kosaki.motohiro@gmail.com>
>>
>> commit 05f144a0d5 removed vma->vm_policy updates code and it is a purpose of
>> mbind_range(). Now, mbind_range() is virtually no-op. no-op function don't
>> makes any bugs, I agree. but maybe it is not right fix.
>
> I dont really understand the changelog. But to restore the policy_vma() is
> the right thing to do since there are potential multiple use cases where
> we want to apply a policy to a vma.
>
> Proposed new changelog:
>
> Commit 05f144a0d5 folded policy_vma() into mbind_range(). There are
> other use cases of policy_vma(*) though and so revert a piece of
> that commit in order to have a policy_vma() function again.

sorry, I overlooked this. Commit 05f144a0d5 don't work neither regular vma
nor shmem vma. thus I can't take this proposal. sorry.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-05-30 20:16                   ` KOSAKI Motohiro
@ 2012-06-01  0:45                     ` david
  -1 siblings, 0 replies; 82+ messages in thread
From: david @ 2012-06-01  0:45 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andi Kleen, Christoph Lameter, Linus Torvalds, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich

On Wed, 30 May 2012, KOSAKI Motohiro wrote:

> On Wed, May 30, 2012 at 4:10 PM, Andi Kleen <andi@firstfloor.org> wrote:
>>> Yes, that's right direction, I think. Currently, shmem_set_policy() can't handle
>>> nonlinear mapping.
>>
>> I've been mulling for some time to just remove non linear mappings.
>> AFAIK they were only useful on 32bit and are obsolete and could be
>> emulated with VMAs instead.
>
> I agree. It is only userful on 32bit and current enterprise users don't use
> 32bit anymore. So, I don't think emulated by vmas cause user visible issue.

I wish this was true, there are a lot of systems out there still running 
32 bit linux, even on 64 bit capible hardware. This is especially true in 
enterprises where they have either homegrown or proprietary software that 
isn't 64 bit clean.

David Lang

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-06-01  0:45                     ` david
  0 siblings, 0 replies; 82+ messages in thread
From: david @ 2012-06-01  0:45 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andi Kleen, Christoph Lameter, Linus Torvalds, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich

On Wed, 30 May 2012, KOSAKI Motohiro wrote:

> On Wed, May 30, 2012 at 4:10 PM, Andi Kleen <andi@firstfloor.org> wrote:
>>> Yes, that's right direction, I think. Currently, shmem_set_policy() can't handle
>>> nonlinear mapping.
>>
>> I've been mulling for some time to just remove non linear mappings.
>> AFAIK they were only useful on 32bit and are obsolete and could be
>> emulated with VMAs instead.
>
> I agree. It is only userful on 32bit and current enterprise users don't use
> 32bit anymore. So, I don't think emulated by vmas cause user visible issue.

I wish this was true, there are a lot of systems out there still running 
32 bit linux, even on 64 bit capible hardware. This is especially true in 
enterprises where they have either homegrown or proprietary software that 
isn't 64 bit clean.

David Lang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-06-01  0:45                     ` david
@ 2012-06-01 13:51                       ` Christoph Lameter
  -1 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-06-01 13:51 UTC (permalink / raw)
  To: david
  Cc: KOSAKI Motohiro, Andi Kleen, Linus Torvalds, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich

On Thu, 31 May 2012, david@lang.hm wrote:

> On Wed, 30 May 2012, KOSAKI Motohiro wrote:
>
> > On Wed, May 30, 2012 at 4:10 PM, Andi Kleen <andi@firstfloor.org> wrote:
> > > > Yes, that's right direction, I think. Currently, shmem_set_policy()
> > > > can't handle
> > > > nonlinear mapping.
> > >
> > > I've been mulling for some time to just remove non linear mappings.
> > > AFAIK they were only useful on 32bit and are obsolete and could be
> > > emulated with VMAs instead.
> >
> > I agree. It is only userful on 32bit and current enterprise users don't use
> > 32bit anymore. So, I don't think emulated by vmas cause user visible issue.
>
> I wish this was true, there are a lot of systems out there still running 32
> bit linux, even on 64 bit capible hardware. This is especially true in
> enterprises where they have either homegrown or proprietary software that
> isn't 64 bit clean.

32 bit binaries (and entire distros) run fine under 64 bit kernels.


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-06-01 13:51                       ` Christoph Lameter
  0 siblings, 0 replies; 82+ messages in thread
From: Christoph Lameter @ 2012-06-01 13:51 UTC (permalink / raw)
  To: david
  Cc: KOSAKI Motohiro, Andi Kleen, Linus Torvalds, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich

On Thu, 31 May 2012, david@lang.hm wrote:

> On Wed, 30 May 2012, KOSAKI Motohiro wrote:
>
> > On Wed, May 30, 2012 at 4:10 PM, Andi Kleen <andi@firstfloor.org> wrote:
> > > > Yes, that's right direction, I think. Currently, shmem_set_policy()
> > > > can't handle
> > > > nonlinear mapping.
> > >
> > > I've been mulling for some time to just remove non linear mappings.
> > > AFAIK they were only useful on 32bit and are obsolete and could be
> > > emulated with VMAs instead.
> >
> > I agree. It is only userful on 32bit and current enterprise users don't use
> > 32bit anymore. So, I don't think emulated by vmas cause user visible issue.
>
> I wish this was true, there are a lot of systems out there still running 32
> bit linux, even on 64 bit capible hardware. This is especially true in
> enterprises where they have either homegrown or proprietary software that
> isn't 64 bit clean.

32 bit binaries (and entire distros) run fine under 64 bit kernels.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-06-01 13:51                       ` Christoph Lameter
@ 2012-06-01 19:31                         ` david
  -1 siblings, 0 replies; 82+ messages in thread
From: david @ 2012-06-01 19:31 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, Andi Kleen, Linus Torvalds, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich

On Fri, 1 Jun 2012, Christoph Lameter wrote:

> Subject: Re: [PATCH 0/6] mempolicy memory corruption fixlet
> 
> On Thu, 31 May 2012, david@lang.hm wrote:
>
>> On Wed, 30 May 2012, KOSAKI Motohiro wrote:
>>
>>> On Wed, May 30, 2012 at 4:10 PM, Andi Kleen <andi@firstfloor.org> wrote:
>>>>> Yes, that's right direction, I think. Currently, shmem_set_policy()
>>>>> can't handle
>>>>> nonlinear mapping.
>>>>
>>>> I've been mulling for some time to just remove non linear mappings.
>>>> AFAIK they were only useful on 32bit and are obsolete and could be
>>>> emulated with VMAs instead.
>>>
>>> I agree. It is only userful on 32bit and current enterprise users don't use
>>> 32bit anymore. So, I don't think emulated by vmas cause user visible issue.
>>
>> I wish this was true, there are a lot of systems out there still running 32
>> bit linux, even on 64 bit capible hardware. This is especially true in
>> enterprises where they have either homegrown or proprietary software that
>> isn't 64 bit clean.
>
> 32 bit binaries (and entire distros) run fine under 64 bit kernels.

unfortunantly, not quite 100% of the time. It's very good, but the 
automount bug a month or so ago is an example of how you can run into rare 
problems. Many "enterprise" systems are not willing to risk it.

David Lang

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-06-01 19:31                         ` david
  0 siblings, 0 replies; 82+ messages in thread
From: david @ 2012-06-01 19:31 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, Andi Kleen, Linus Torvalds, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich

On Fri, 1 Jun 2012, Christoph Lameter wrote:

> Subject: Re: [PATCH 0/6] mempolicy memory corruption fixlet
> 
> On Thu, 31 May 2012, david@lang.hm wrote:
>
>> On Wed, 30 May 2012, KOSAKI Motohiro wrote:
>>
>>> On Wed, May 30, 2012 at 4:10 PM, Andi Kleen <andi@firstfloor.org> wrote:
>>>>> Yes, that's right direction, I think. Currently, shmem_set_policy()
>>>>> can't handle
>>>>> nonlinear mapping.
>>>>
>>>> I've been mulling for some time to just remove non linear mappings.
>>>> AFAIK they were only useful on 32bit and are obsolete and could be
>>>> emulated with VMAs instead.
>>>
>>> I agree. It is only userful on 32bit and current enterprise users don't use
>>> 32bit anymore. So, I don't think emulated by vmas cause user visible issue.
>>
>> I wish this was true, there are a lot of systems out there still running 32
>> bit linux, even on 64 bit capible hardware. This is especially true in
>> enterprises where they have either homegrown or proprietary software that
>> isn't 64 bit clean.
>
> 32 bit binaries (and entire distros) run fine under 64 bit kernels.

unfortunantly, not quite 100% of the time. It's very good, but the 
automount bug a month or so ago is an example of how you can run into rare 
problems. Many "enterprise" systems are not willing to risk it.

David Lang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-06-01 19:31                         ` david
@ 2012-06-01 19:37                           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-06-01 19:37 UTC (permalink / raw)
  To: david
  Cc: Christoph Lameter, Andi Kleen, Linus Torvalds, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich

>>>>>> Yes, that's right direction, I think. Currently, shmem_set_policy()
>>>>>> can't handle
>>>>>> nonlinear mapping.
>>>>>
>>>>> I've been mulling for some time to just remove non linear mappings.
>>>>> AFAIK they were only useful on 32bit and are obsolete and could be
>>>>> emulated with VMAs instead.
>>>>
>>>>
>>>> I agree. It is only userful on 32bit and current enterprise users don't
>>>> use
>>>> 32bit anymore. So, I don't think emulated by vmas cause user visible
>>>> issue.
>>>
>>>
>>> I wish this was true, there are a lot of systems out there still running
>>> 32
>>> bit linux, even on 64 bit capible hardware. This is especially true in
>>> enterprises where they have either homegrown or proprietary software that
>>> isn't 64 bit clean.
>>
>> 32 bit binaries (and entire distros) run fine under 64 bit kernels.
>
> unfortunantly, not quite 100% of the time. It's very good, but the automount
> bug a month or so ago is an example of how you can run into rare problems.
> Many "enterprise" systems are not willing to risk it.

Then we can remove a feature safely. Risk not taker continue to uses old distro
kernel. And some years after, distros discontinue to support 32bit. And then,
problem will vanish automatically.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-06-01 19:37                           ` KOSAKI Motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-06-01 19:37 UTC (permalink / raw)
  To: david
  Cc: Christoph Lameter, Andi Kleen, Linus Torvalds, linux-kernel,
	linux-mm, Andrew Morton, Dave Jones, Mel Gorman, stable, hughd,
	sivanich

>>>>>> Yes, that's right direction, I think. Currently, shmem_set_policy()
>>>>>> can't handle
>>>>>> nonlinear mapping.
>>>>>
>>>>> I've been mulling for some time to just remove non linear mappings.
>>>>> AFAIK they were only useful on 32bit and are obsolete and could be
>>>>> emulated with VMAs instead.
>>>>
>>>>
>>>> I agree. It is only userful on 32bit and current enterprise users don't
>>>> use
>>>> 32bit anymore. So, I don't think emulated by vmas cause user visible
>>>> issue.
>>>
>>>
>>> I wish this was true, there are a lot of systems out there still running
>>> 32
>>> bit linux, even on 64 bit capible hardware. This is especially true in
>>> enterprises where they have either homegrown or proprietary software that
>>> isn't 64 bit clean.
>>
>> 32 bit binaries (and entire distros) run fine under 64 bit kernels.
>
> unfortunantly, not quite 100% of the time. It's very good, but the automount
> bug a month or so ago is an example of how you can run into rare problems.
> Many "enterprise" systems are not willing to risk it.

Then we can remove a feature safely. Risk not taker continue to uses old distro
kernel. And some years after, distros discontinue to support 32bit. And then,
problem will vanish automatically.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-06-01 19:37                           ` KOSAKI Motohiro
@ 2012-06-05 19:02                             ` Linus Torvalds
  -1 siblings, 0 replies; 82+ messages in thread
From: Linus Torvalds @ 2012-06-05 19:02 UTC (permalink / raw)
  To: KOSAKI Motohiro, Andrew Morton
  Cc: david, Christoph Lameter, Andi Kleen, linux-kernel, linux-mm,
	Andrew Morton, Dave Jones, Mel Gorman, stable, hughd, sivanich

I'm coming back to this email thread, because I didn't apply the
series due to all the ongoing discussion and hoping that somebody
would put changelog fixes and ack notices etc together.

I'd also really like to know that the people who saw the problem that
caused the current single patch (that this series reverts) would test
the whole series. Maybe that happened and I didn't notice it in the
threads, but I don't think so.

In fact, right now I'm assuming that the series will eventually come
to me through Andrew. Andrew, correct?

                       Linus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-06-05 19:02                             ` Linus Torvalds
  0 siblings, 0 replies; 82+ messages in thread
From: Linus Torvalds @ 2012-06-05 19:02 UTC (permalink / raw)
  To: KOSAKI Motohiro, Andrew Morton
  Cc: david, Christoph Lameter, Andi Kleen, linux-kernel, linux-mm,
	Andrew Morton, Dave Jones, Mel Gorman, stable, hughd, sivanich

I'm coming back to this email thread, because I didn't apply the
series due to all the ongoing discussion and hoping that somebody
would put changelog fixes and ack notices etc together.

I'd also really like to know that the people who saw the problem that
caused the current single patch (that this series reverts) would test
the whole series. Maybe that happened and I didn't notice it in the
threads, but I don't think so.

In fact, right now I'm assuming that the series will eventually come
to me through Andrew. Andrew, correct?

                       Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-06-05 19:02                             ` Linus Torvalds
@ 2012-06-05 19:17                               ` Andrew Morton
  -1 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2012-06-05 19:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: KOSAKI Motohiro, david, Christoph Lameter, Andi Kleen,
	linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	stable, hughd, sivanich

On Tue, 5 Jun 2012 12:02:25 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> I'm coming back to this email thread, because I didn't apply the
> series due to all the ongoing discussion and hoping that somebody
> would put changelog fixes and ack notices etc together.
> 
> I'd also really like to know that the people who saw the problem that
> caused the current single patch (that this series reverts) would test
> the whole series. Maybe that happened and I didn't notice it in the
> threads, but I don't think so.
> 
> In fact, right now I'm assuming that the series will eventually come
> to me through Andrew. Andrew, correct?
> 

yup.

I expect there will be a v2 series (at least).  It's unclear what
we'll be doing with [2/6]: whether the patch will be reworked, or
whether Andi misunderstood its effects?

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-06-05 19:17                               ` Andrew Morton
  0 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2012-06-05 19:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: KOSAKI Motohiro, david, Christoph Lameter, Andi Kleen,
	linux-kernel, linux-mm, Andrew Morton, Dave Jones, Mel Gorman,
	stable, hughd, sivanich

On Tue, 5 Jun 2012 12:02:25 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> I'm coming back to this email thread, because I didn't apply the
> series due to all the ongoing discussion and hoping that somebody
> would put changelog fixes and ack notices etc together.
> 
> I'd also really like to know that the people who saw the problem that
> caused the current single patch (that this series reverts) would test
> the whole series. Maybe that happened and I didn't notice it in the
> threads, but I don't think so.
> 
> In fact, right now I'm assuming that the series will eventually come
> to me through Andrew. Andrew, correct?
> 

yup.

I expect there will be a v2 series (at least).  It's unclear what
we'll be doing with [2/6]: whether the patch will be reworked, or
whether Andi misunderstood its effects?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
  2012-06-05 19:17                               ` Andrew Morton
@ 2012-06-06 19:10                                 ` KOSAKI Motohiro
  -1 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-06-06 19:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, KOSAKI Motohiro, david, Christoph Lameter,
	Andi Kleen, linux-kernel, linux-mm, Andrew Morton, Dave Jones,
	Mel Gorman, stable, hughd, sivanich

(6/5/12 3:17 PM), Andrew Morton wrote:
> On Tue, 5 Jun 2012 12:02:25 -0700
> Linus Torvalds<torvalds@linux-foundation.org>  wrote:
>
>> I'm coming back to this email thread, because I didn't apply the
>> series due to all the ongoing discussion and hoping that somebody
>> would put changelog fixes and ack notices etc together.
>>
>> I'd also really like to know that the people who saw the problem that
>> caused the current single patch (that this series reverts) would test
>> the whole series. Maybe that happened and I didn't notice it in the
>> threads, but I don't think so.

I'm not surprised this. If many people are interesting to review this area,
mempolicy wouldn't have break so a lot.


>> In fact, right now I'm assuming that the series will eventually come
>> to me through Andrew. Andrew, correct?
>
> yup.
>
> I expect there will be a v2 series (at least).  It's unclear what
> we'll be doing with [2/6]: whether the patch will be reworked, or
> whether Andi misunderstood its effects?

Maybe because Andi didn't join bug fix works in this area for several years?


Currently, mbind(2) is completely broken. A primary role of mbind(2) is to
update memory policy of some vmas and Mel's fix remvoed it. Then, mbind is
almostly no-op. it's a regression.

I'm not clear which point you seems unclear. So, let's repeat a description of
[2/6].

There are two problem now, alloc_pages_vma() has strong and wrong assumption.
vma->policy never have MPOL_F_SHARED and shared_policy->policy must have it.
And, cpusets rebinding ignore mpol->refcnt and updates it forcibly.

The final point is to implement cow. But for it, we need rewrite mpol->rebind
family completely. It doesn't fit for 3.5 timeframe.


The downside of patch [2/6] is very small. because,

A memplicy is only shared three cases, 1) mbind() updates multiple
vmas 2) mbind() updates shmem vma 3) A shared policy splits into two regions
by a part region update.

All of them are rare. Because nobody hit kernel panic until now. Then I don't
think my patch increase memory footprint.



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/6] mempolicy memory corruption fixlet
@ 2012-06-06 19:10                                 ` KOSAKI Motohiro
  0 siblings, 0 replies; 82+ messages in thread
From: KOSAKI Motohiro @ 2012-06-06 19:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, KOSAKI Motohiro, david, Christoph Lameter,
	Andi Kleen, linux-kernel, linux-mm, Andrew Morton, Dave Jones,
	Mel Gorman, stable, hughd, sivanich

(6/5/12 3:17 PM), Andrew Morton wrote:
> On Tue, 5 Jun 2012 12:02:25 -0700
> Linus Torvalds<torvalds@linux-foundation.org>  wrote:
>
>> I'm coming back to this email thread, because I didn't apply the
>> series due to all the ongoing discussion and hoping that somebody
>> would put changelog fixes and ack notices etc together.
>>
>> I'd also really like to know that the people who saw the problem that
>> caused the current single patch (that this series reverts) would test
>> the whole series. Maybe that happened and I didn't notice it in the
>> threads, but I don't think so.

I'm not surprised this. If many people are interesting to review this area,
mempolicy wouldn't have break so a lot.


>> In fact, right now I'm assuming that the series will eventually come
>> to me through Andrew. Andrew, correct?
>
> yup.
>
> I expect there will be a v2 series (at least).  It's unclear what
> we'll be doing with [2/6]: whether the patch will be reworked, or
> whether Andi misunderstood its effects?

Maybe because Andi didn't join bug fix works in this area for several years?


Currently, mbind(2) is completely broken. A primary role of mbind(2) is to
update memory policy of some vmas and Mel's fix remvoed it. Then, mbind is
almostly no-op. it's a regression.

I'm not clear which point you seems unclear. So, let's repeat a description of
[2/6].

There are two problem now, alloc_pages_vma() has strong and wrong assumption.
vma->policy never have MPOL_F_SHARED and shared_policy->policy must have it.
And, cpusets rebinding ignore mpol->refcnt and updates it forcibly.

The final point is to implement cow. But for it, we need rewrite mpol->rebind
family completely. It doesn't fit for 3.5 timeframe.


The downside of patch [2/6] is very small. because,

A memplicy is only shared three cases, 1) mbind() updates multiple
vmas 2) mbind() updates shmem vma 3) A shared policy splits into two regions
by a part region update.

All of them are rare. Because nobody hit kernel panic until now. Then I don't
think my patch increase memory footprint.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

end of thread, other threads:[~2012-06-06 19:11 UTC | newest]

Thread overview: 82+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-30  9:02 [PATCH 0/6] mempolicy memory corruption fixlet kosaki.motohiro
2012-05-30  9:02 ` kosaki.motohiro
2012-05-30  9:02 ` [PATCH 1/6] Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages" kosaki.motohiro
2012-05-30  9:02   ` kosaki.motohiro
2012-05-30 19:17   ` Christoph Lameter
2012-05-30 19:17     ` Christoph Lameter
2012-05-30 19:39     ` KOSAKI Motohiro
2012-05-30 19:39       ` KOSAKI Motohiro
2012-05-30 19:39       ` KOSAKI Motohiro
2012-05-30 19:44       ` Christoph Lameter
2012-05-30 19:44         ` Christoph Lameter
2012-05-31  6:49     ` KOSAKI Motohiro
2012-05-31  6:49       ` KOSAKI Motohiro
2012-05-30  9:02 ` [PATCH 2/6] mempolicy: Kill all mempolicy sharing kosaki.motohiro
2012-05-30  9:02   ` kosaki.motohiro
2012-05-30 19:41   ` Christoph Lameter
2012-05-30 19:41     ` Christoph Lameter
2012-05-30 19:46     ` KOSAKI Motohiro
2012-05-30 19:46       ` KOSAKI Motohiro
2012-05-30 19:48     ` Andi Kleen
2012-05-30 19:48       ` Andi Kleen
2012-05-30 20:31   ` Andi Kleen
2012-05-30 20:31     ` Andi Kleen
2012-05-30 20:58     ` KOSAKI Motohiro
2012-05-30 20:58       ` KOSAKI Motohiro
2012-05-30  9:02 ` [PATCH 3/6] mempolicy: fix a race in shared_policy_replace() kosaki.motohiro
2012-05-30  9:02   ` kosaki.motohiro
2012-05-30  9:02 ` [PATCH 4/6] mempolicy: fix refcount leak in mpol_set_shared_policy() kosaki.motohiro
2012-05-30  9:02   ` kosaki.motohiro
2012-05-30  9:02 ` [PATCH 5/6] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma() kosaki.motohiro
2012-05-30  9:02   ` kosaki.motohiro
2012-05-30 20:37   ` Andi Kleen
2012-05-30 20:37     ` Andi Kleen
2012-05-30  9:02 ` [PATCH 6/6] MAINTAINERS: Added MEMPOLICY entry kosaki.motohiro
2012-05-30  9:02   ` kosaki.motohiro
2012-05-30 19:27   ` Christoph Lameter
2012-05-30 19:27     ` Christoph Lameter
2012-05-30 18:26 ` [PATCH 0/6] mempolicy memory corruption fixlet Linus Torvalds
2012-05-30 18:26   ` Linus Torvalds
2012-05-30 18:34   ` Christoph Lameter
2012-05-30 18:34     ` Christoph Lameter
2012-05-30 18:46     ` Andi Kleen
2012-05-30 18:46       ` Andi Kleen
2012-05-30 18:50       ` Christoph Lameter
2012-05-30 18:50         ` Christoph Lameter
2012-05-30 19:32         ` Andi Kleen
2012-05-30 19:32           ` Andi Kleen
2012-05-30 19:42           ` Christoph Lameter
2012-05-30 19:42             ` Christoph Lameter
2012-05-30 19:52             ` Andi Kleen
2012-05-30 19:52               ` Andi Kleen
2012-05-30 20:00               ` KOSAKI Motohiro
2012-05-30 20:00                 ` KOSAKI Motohiro
2012-05-30 21:22                 ` Ben Hutchings
2012-05-30 21:22                   ` Ben Hutchings
2012-05-30 21:25                   ` KOSAKI Motohiro
2012-05-30 21:25                     ` KOSAKI Motohiro
2012-05-30 19:54             ` KOSAKI Motohiro
2012-05-30 19:54               ` KOSAKI Motohiro
2012-05-30 20:10               ` Andi Kleen
2012-05-30 20:10                 ` Andi Kleen
2012-05-30 20:16                 ` KOSAKI Motohiro
2012-05-30 20:16                   ` KOSAKI Motohiro
2012-06-01  0:45                   ` david
2012-06-01  0:45                     ` david
2012-06-01 13:51                     ` Christoph Lameter
2012-06-01 13:51                       ` Christoph Lameter
2012-06-01 19:31                       ` david
2012-06-01 19:31                         ` david
2012-06-01 19:37                         ` KOSAKI Motohiro
2012-06-01 19:37                           ` KOSAKI Motohiro
2012-06-05 19:02                           ` Linus Torvalds
2012-06-05 19:02                             ` Linus Torvalds
2012-06-05 19:17                             ` Andrew Morton
2012-06-05 19:17                               ` Andrew Morton
2012-06-06 19:10                               ` KOSAKI Motohiro
2012-06-06 19:10                                 ` KOSAKI Motohiro
2012-05-30 19:03     ` Linus Torvalds
2012-05-30 19:03       ` Linus Torvalds
2012-05-30 20:10   ` KOSAKI Motohiro
2012-05-30 20:10     ` KOSAKI Motohiro
2012-05-30 20:10     ` KOSAKI Motohiro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.