All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
@ 2013-01-02 10:01 ` Hugh Dickins
  0 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2013-01-02 10:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro,
	Christoph Lameter, David Rientjes, Mel Gorman, Ingo Molnar,
	Peter Zijlstra, linux-kernel, linux-mm

Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
mempolicy testing.  Very nasty.  Reading /proc/mounts, /proc/pid/mounts
or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
worse.  "mpol=prefer" and "mpol=prefer:Node" are equally toxic.

Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
when commit e17f74af351c "mempolicy: don't call mpol_set_nodemask()
when no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
With slab poisoning, you can then rely on mpol_to_str() to set the bit
for node 0x6b6b, probably in the next page above the caller's stack.

mpol_parse_str() is only called from shmem_parse_options(): no_context
is always true, so call it unused for now, and remove !no_context code.
Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might
expect.  Then mpol_to_str() can ignore its no_context argument also,
the mpol being appropriately initialized whether contextualized or not.
Rename its no_context unused too, and let subsequent patch remove them
(that's not needed for stable backporting, which would involve rejects).

I don't understand why MPOL_LOCAL is described as a pseudo-policy:
it's a reasonable policy which suffers from a confusing implementation
in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
much more robust if MPOL_LOCAL were recognized in switch statements
throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
empty) nodes mask like everyone else, instead of its preferred_node
variant (I presume an optimization from the days before MPOL_LOCAL).
But that would take me too long to get right and fully tested.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
---

 mm/mempolicy.c |   64 +++++++++++++++++++----------------------------
 1 file changed, 26 insertions(+), 38 deletions(-)

--- 3.8-rc1/mm/mempolicy.c	2012-12-22 09:43:27.636015582 -0800
+++ linux/mm/mempolicy.c	2013-01-01 23:44:10.715017466 -0800
@@ -2595,8 +2595,7 @@ void numa_default_policy(void)
  */
 
 /*
- * "local" is pseudo-policy:  MPOL_PREFERRED with MPOL_F_LOCAL flag
- * Used only for mpol_parse_str() and mpol_to_str()
+ * "local" is implemented internally by MPOL_PREFERRED with MPOL_F_LOCAL flag.
  */
 static const char * const policy_modes[] =
 {
@@ -2610,28 +2609,21 @@ static const char * const policy_modes[]
 
 #ifdef CONFIG_TMPFS
 /**
- * mpol_parse_str - parse string to mempolicy
+ * mpol_parse_str - parse string to mempolicy, for tmpfs mpol mount option.
  * @str:  string containing mempolicy to parse
  * @mpol:  pointer to struct mempolicy pointer, returned on success.
- * @no_context:  flag whether to "contextualize" the mempolicy
+ * @unused:  redundant argument, to be removed later.
  *
  * Format of input:
  *	<mode>[=<flags>][:<nodelist>]
  *
- * if @no_context is true, save the input nodemask in w.user_nodemask in
- * the returned mempolicy.  This will be used to "clone" the mempolicy in
- * a specific context [cpuset] at a later time.  Used to parse tmpfs mpol
- * mount option.  Note that if 'static' or 'relative' mode flags were
- * specified, the input nodemask will already have been saved.  Saving
- * it again is redundant, but safe.
- *
  * On success, returns 0, else 1
  */
-int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
+int mpol_parse_str(char *str, struct mempolicy **mpol, int unused)
 {
 	struct mempolicy *new = NULL;
 	unsigned short mode;
-	unsigned short uninitialized_var(mode_flags);
+	unsigned short mode_flags;
 	nodemask_t nodes;
 	char *nodelist = strchr(str, ':');
 	char *flags = strchr(str, '=');
@@ -2719,24 +2711,23 @@ int mpol_parse_str(char *str, struct mem
 	if (IS_ERR(new))
 		goto out;
 
-	if (no_context) {
-		/* save for contextualization */
-		new->w.user_nodemask = nodes;
-	} else {
-		int ret;
-		NODEMASK_SCRATCH(scratch);
-		if (scratch) {
-			task_lock(current);
-			ret = mpol_set_nodemask(new, &nodes, scratch);
-			task_unlock(current);
-		} else
-			ret = -ENOMEM;
-		NODEMASK_SCRATCH_FREE(scratch);
-		if (ret) {
-			mpol_put(new);
-			goto out;
-		}
-	}
+	/*
+	 * Save nodes for mpol_to_str() to show the tmpfs mount options
+	 * for /proc/mounts, /proc/pid/mounts and /proc/pid/mountinfo.
+	 */
+	if (mode != MPOL_PREFERRED)
+		new->v.nodes = nodes;
+	else if (nodelist)
+		new->v.preferred_node = first_node(nodes);
+	else
+		new->flags |= MPOL_F_LOCAL;
+
+	/*
+	 * Save nodes for contextualization: this will be used to "clone"
+	 * the mempolicy in a specific context [cpuset] at a later time.
+	 */
+	new->w.user_nodemask = nodes;
+
 	err = 0;
 
 out:
@@ -2756,13 +2747,13 @@ out:
  * @buffer:  to contain formatted mempolicy string
  * @maxlen:  length of @buffer
  * @pol:  pointer to mempolicy to be formatted
- * @no_context:  "context free" mempolicy - use nodemask in w.user_nodemask
+ * @unused:  redundant argument, to be removed later.
  *
  * Convert a mempolicy into a string.
  * Returns the number of characters in buffer (if positive)
  * or an error (negative)
  */
-int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int no_context)
+int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int unused)
 {
 	char *p = buffer;
 	int l;
@@ -2788,7 +2779,7 @@ int mpol_to_str(char *buffer, int maxlen
 	case MPOL_PREFERRED:
 		nodes_clear(nodes);
 		if (flags & MPOL_F_LOCAL)
-			mode = MPOL_LOCAL;	/* pseudo-policy */
+			mode = MPOL_LOCAL;
 		else
 			node_set(pol->v.preferred_node, nodes);
 		break;
@@ -2796,10 +2787,7 @@ int mpol_to_str(char *buffer, int maxlen
 	case MPOL_BIND:
 		/* Fall through */
 	case MPOL_INTERLEAVE:
-		if (no_context)
-			nodes = pol->w.user_nodemask;
-		else
-			nodes = pol->v.nodes;
+		nodes = pol->v.nodes;
 		break;
 
 	default:

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
@ 2013-01-02 10:01 ` Hugh Dickins
  0 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2013-01-02 10:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro,
	Christoph Lameter, David Rientjes, Mel Gorman, Ingo Molnar,
	Peter Zijlstra, linux-kernel, linux-mm

Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
mempolicy testing.  Very nasty.  Reading /proc/mounts, /proc/pid/mounts
or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
worse.  "mpol=prefer" and "mpol=prefer:Node" are equally toxic.

Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
when commit e17f74af351c "mempolicy: don't call mpol_set_nodemask()
when no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
With slab poisoning, you can then rely on mpol_to_str() to set the bit
for node 0x6b6b, probably in the next page above the caller's stack.

mpol_parse_str() is only called from shmem_parse_options(): no_context
is always true, so call it unused for now, and remove !no_context code.
Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might
expect.  Then mpol_to_str() can ignore its no_context argument also,
the mpol being appropriately initialized whether contextualized or not.
Rename its no_context unused too, and let subsequent patch remove them
(that's not needed for stable backporting, which would involve rejects).

I don't understand why MPOL_LOCAL is described as a pseudo-policy:
it's a reasonable policy which suffers from a confusing implementation
in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
much more robust if MPOL_LOCAL were recognized in switch statements
throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
empty) nodes mask like everyone else, instead of its preferred_node
variant (I presume an optimization from the days before MPOL_LOCAL).
But that would take me too long to get right and fully tested.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
---

 mm/mempolicy.c |   64 +++++++++++++++++++----------------------------
 1 file changed, 26 insertions(+), 38 deletions(-)

--- 3.8-rc1/mm/mempolicy.c	2012-12-22 09:43:27.636015582 -0800
+++ linux/mm/mempolicy.c	2013-01-01 23:44:10.715017466 -0800
@@ -2595,8 +2595,7 @@ void numa_default_policy(void)
  */
 
 /*
- * "local" is pseudo-policy:  MPOL_PREFERRED with MPOL_F_LOCAL flag
- * Used only for mpol_parse_str() and mpol_to_str()
+ * "local" is implemented internally by MPOL_PREFERRED with MPOL_F_LOCAL flag.
  */
 static const char * const policy_modes[] =
 {
@@ -2610,28 +2609,21 @@ static const char * const policy_modes[]
 
 #ifdef CONFIG_TMPFS
 /**
- * mpol_parse_str - parse string to mempolicy
+ * mpol_parse_str - parse string to mempolicy, for tmpfs mpol mount option.
  * @str:  string containing mempolicy to parse
  * @mpol:  pointer to struct mempolicy pointer, returned on success.
- * @no_context:  flag whether to "contextualize" the mempolicy
+ * @unused:  redundant argument, to be removed later.
  *
  * Format of input:
  *	<mode>[=<flags>][:<nodelist>]
  *
- * if @no_context is true, save the input nodemask in w.user_nodemask in
- * the returned mempolicy.  This will be used to "clone" the mempolicy in
- * a specific context [cpuset] at a later time.  Used to parse tmpfs mpol
- * mount option.  Note that if 'static' or 'relative' mode flags were
- * specified, the input nodemask will already have been saved.  Saving
- * it again is redundant, but safe.
- *
  * On success, returns 0, else 1
  */
-int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
+int mpol_parse_str(char *str, struct mempolicy **mpol, int unused)
 {
 	struct mempolicy *new = NULL;
 	unsigned short mode;
-	unsigned short uninitialized_var(mode_flags);
+	unsigned short mode_flags;
 	nodemask_t nodes;
 	char *nodelist = strchr(str, ':');
 	char *flags = strchr(str, '=');
@@ -2719,24 +2711,23 @@ int mpol_parse_str(char *str, struct mem
 	if (IS_ERR(new))
 		goto out;
 
-	if (no_context) {
-		/* save for contextualization */
-		new->w.user_nodemask = nodes;
-	} else {
-		int ret;
-		NODEMASK_SCRATCH(scratch);
-		if (scratch) {
-			task_lock(current);
-			ret = mpol_set_nodemask(new, &nodes, scratch);
-			task_unlock(current);
-		} else
-			ret = -ENOMEM;
-		NODEMASK_SCRATCH_FREE(scratch);
-		if (ret) {
-			mpol_put(new);
-			goto out;
-		}
-	}
+	/*
+	 * Save nodes for mpol_to_str() to show the tmpfs mount options
+	 * for /proc/mounts, /proc/pid/mounts and /proc/pid/mountinfo.
+	 */
+	if (mode != MPOL_PREFERRED)
+		new->v.nodes = nodes;
+	else if (nodelist)
+		new->v.preferred_node = first_node(nodes);
+	else
+		new->flags |= MPOL_F_LOCAL;
+
+	/*
+	 * Save nodes for contextualization: this will be used to "clone"
+	 * the mempolicy in a specific context [cpuset] at a later time.
+	 */
+	new->w.user_nodemask = nodes;
+
 	err = 0;
 
 out:
@@ -2756,13 +2747,13 @@ out:
  * @buffer:  to contain formatted mempolicy string
  * @maxlen:  length of @buffer
  * @pol:  pointer to mempolicy to be formatted
- * @no_context:  "context free" mempolicy - use nodemask in w.user_nodemask
+ * @unused:  redundant argument, to be removed later.
  *
  * Convert a mempolicy into a string.
  * Returns the number of characters in buffer (if positive)
  * or an error (negative)
  */
-int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int no_context)
+int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int unused)
 {
 	char *p = buffer;
 	int l;
@@ -2788,7 +2779,7 @@ int mpol_to_str(char *buffer, int maxlen
 	case MPOL_PREFERRED:
 		nodes_clear(nodes);
 		if (flags & MPOL_F_LOCAL)
-			mode = MPOL_LOCAL;	/* pseudo-policy */
+			mode = MPOL_LOCAL;
 		else
 			node_set(pol->v.preferred_node, nodes);
 		break;
@@ -2796,10 +2787,7 @@ int mpol_to_str(char *buffer, int maxlen
 	case MPOL_BIND:
 		/* Fall through */
 	case MPOL_INTERLEAVE:
-		if (no_context)
-			nodes = pol->w.user_nodemask;
-		else
-			nodes = pol->v.nodes;
+		nodes = pol->v.nodes;
 		break;
 
 	default:

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/2] mempolicy: remove arg from mpol_parse_str, mpol_to_str
  2013-01-02 10:01 ` Hugh Dickins
@ 2013-01-02 10:04   ` Hugh Dickins
  -1 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2013-01-02 10:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro,
	Christoph Lameter, David Rientjes, Mel Gorman, Ingo Molnar,
	Peter Zijlstra, linux-kernel, linux-mm

Remove the unused argument (formerly no_context) from mpol_parse_str()
and from mpol_to_str().

Signed-off-by: Hugh Dickins <hughd@google.com>
---

 fs/proc/task_mmu.c        |    2 +-
 include/linux/mempolicy.h |   11 ++++-------
 mm/mempolicy.c            |    6 ++----
 mm/shmem.c                |    4 ++--
 4 files changed, 9 insertions(+), 14 deletions(-)

--- 3.8-rc1+/fs/proc/task_mmu.c	2012-12-22 09:43:26.916015565 -0800
+++ linux/fs/proc/task_mmu.c	2013-01-01 23:26:30.174992261 -0800
@@ -1278,7 +1278,7 @@ static int show_numa_map(struct seq_file
 	walk.mm = mm;
 
 	pol = get_vma_policy(task, vma, vma->vm_start);
-	mpol_to_str(buffer, sizeof(buffer), pol, 0);
+	mpol_to_str(buffer, sizeof(buffer), pol);
 	mpol_cond_put(pol);
 
 	seq_printf(m, "%08lx %s", vma->vm_start, buffer);
--- 3.8-rc1+/include/linux/mempolicy.h	2012-12-22 09:43:27.172015571 -0800
+++ linux/include/linux/mempolicy.h	2013-01-01 23:26:30.174992261 -0800
@@ -165,11 +165,10 @@ int do_migrate_pages(struct mm_struct *m
 
 
 #ifdef CONFIG_TMPFS
-extern int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context);
+extern int mpol_parse_str(char *str, struct mempolicy **mpol);
 #endif
 
-extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol,
-			int no_context);
+extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol);
 
 /* Check if a vma is migratable */
 static inline int vma_migratable(struct vm_area_struct *vma)
@@ -296,15 +295,13 @@ static inline void check_highest_zone(in
 }
 
 #ifdef CONFIG_TMPFS
-static inline int mpol_parse_str(char *str, struct mempolicy **mpol,
-				int no_context)
+static inline int mpol_parse_str(char *str, struct mempolicy **mpol)
 {
 	return 1;	/* error */
 }
 #endif
 
-static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol,
-				int no_context)
+static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
 {
 	return 0;
 }
--- 3.8-rc1+/mm/mempolicy.c	2013-01-01 23:44:10.715017466 -0800
+++ linux/mm/mempolicy.c	2013-01-01 23:47:34.223022303 -0800
@@ -2612,14 +2612,13 @@ static const char * const policy_modes[]
  * mpol_parse_str - parse string to mempolicy, for tmpfs mpol mount option.
  * @str:  string containing mempolicy to parse
  * @mpol:  pointer to struct mempolicy pointer, returned on success.
- * @unused:  redundant argument, to be removed later.
  *
  * Format of input:
  *	<mode>[=<flags>][:<nodelist>]
  *
  * On success, returns 0, else 1
  */
-int mpol_parse_str(char *str, struct mempolicy **mpol, int unused)
+int mpol_parse_str(char *str, struct mempolicy **mpol)
 {
 	struct mempolicy *new = NULL;
 	unsigned short mode;
@@ -2747,13 +2746,12 @@ out:
  * @buffer:  to contain formatted mempolicy string
  * @maxlen:  length of @buffer
  * @pol:  pointer to mempolicy to be formatted
- * @unused:  redundant argument, to be removed later.
  *
  * Convert a mempolicy into a string.
  * Returns the number of characters in buffer (if positive)
  * or an error (negative)
  */
-int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int unused)
+int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
 {
 	char *p = buffer;
 	int l;
--- 3.8-rc1+/mm/shmem.c	2012-12-22 09:43:27.660015583 -0800
+++ linux/mm/shmem.c	2013-01-01 23:26:30.174992261 -0800
@@ -889,7 +889,7 @@ static void shmem_show_mpol(struct seq_f
 	if (!mpol || mpol->mode == MPOL_DEFAULT)
 		return;		/* show nothing */
 
-	mpol_to_str(buffer, sizeof(buffer), mpol, 1);
+	mpol_to_str(buffer, sizeof(buffer), mpol);
 
 	seq_printf(seq, ",mpol=%s", buffer);
 }
@@ -2463,7 +2463,7 @@ static int shmem_parse_options(char *opt
 			if (!gid_valid(sbinfo->gid))
 				goto bad_val;
 		} else if (!strcmp(this_char,"mpol")) {
-			if (mpol_parse_str(value, &sbinfo->mpol, 1))
+			if (mpol_parse_str(value, &sbinfo->mpol))
 				goto bad_val;
 		} else {
 			printk(KERN_ERR "tmpfs: Bad mount option %s\n",

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/2] mempolicy: remove arg from mpol_parse_str, mpol_to_str
@ 2013-01-02 10:04   ` Hugh Dickins
  0 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2013-01-02 10:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro,
	Christoph Lameter, David Rientjes, Mel Gorman, Ingo Molnar,
	Peter Zijlstra, linux-kernel, linux-mm

Remove the unused argument (formerly no_context) from mpol_parse_str()
and from mpol_to_str().

Signed-off-by: Hugh Dickins <hughd@google.com>
---

 fs/proc/task_mmu.c        |    2 +-
 include/linux/mempolicy.h |   11 ++++-------
 mm/mempolicy.c            |    6 ++----
 mm/shmem.c                |    4 ++--
 4 files changed, 9 insertions(+), 14 deletions(-)

--- 3.8-rc1+/fs/proc/task_mmu.c	2012-12-22 09:43:26.916015565 -0800
+++ linux/fs/proc/task_mmu.c	2013-01-01 23:26:30.174992261 -0800
@@ -1278,7 +1278,7 @@ static int show_numa_map(struct seq_file
 	walk.mm = mm;
 
 	pol = get_vma_policy(task, vma, vma->vm_start);
-	mpol_to_str(buffer, sizeof(buffer), pol, 0);
+	mpol_to_str(buffer, sizeof(buffer), pol);
 	mpol_cond_put(pol);
 
 	seq_printf(m, "%08lx %s", vma->vm_start, buffer);
--- 3.8-rc1+/include/linux/mempolicy.h	2012-12-22 09:43:27.172015571 -0800
+++ linux/include/linux/mempolicy.h	2013-01-01 23:26:30.174992261 -0800
@@ -165,11 +165,10 @@ int do_migrate_pages(struct mm_struct *m
 
 
 #ifdef CONFIG_TMPFS
-extern int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context);
+extern int mpol_parse_str(char *str, struct mempolicy **mpol);
 #endif
 
-extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol,
-			int no_context);
+extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol);
 
 /* Check if a vma is migratable */
 static inline int vma_migratable(struct vm_area_struct *vma)
@@ -296,15 +295,13 @@ static inline void check_highest_zone(in
 }
 
 #ifdef CONFIG_TMPFS
-static inline int mpol_parse_str(char *str, struct mempolicy **mpol,
-				int no_context)
+static inline int mpol_parse_str(char *str, struct mempolicy **mpol)
 {
 	return 1;	/* error */
 }
 #endif
 
-static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol,
-				int no_context)
+static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
 {
 	return 0;
 }
--- 3.8-rc1+/mm/mempolicy.c	2013-01-01 23:44:10.715017466 -0800
+++ linux/mm/mempolicy.c	2013-01-01 23:47:34.223022303 -0800
@@ -2612,14 +2612,13 @@ static const char * const policy_modes[]
  * mpol_parse_str - parse string to mempolicy, for tmpfs mpol mount option.
  * @str:  string containing mempolicy to parse
  * @mpol:  pointer to struct mempolicy pointer, returned on success.
- * @unused:  redundant argument, to be removed later.
  *
  * Format of input:
  *	<mode>[=<flags>][:<nodelist>]
  *
  * On success, returns 0, else 1
  */
-int mpol_parse_str(char *str, struct mempolicy **mpol, int unused)
+int mpol_parse_str(char *str, struct mempolicy **mpol)
 {
 	struct mempolicy *new = NULL;
 	unsigned short mode;
@@ -2747,13 +2746,12 @@ out:
  * @buffer:  to contain formatted mempolicy string
  * @maxlen:  length of @buffer
  * @pol:  pointer to mempolicy to be formatted
- * @unused:  redundant argument, to be removed later.
  *
  * Convert a mempolicy into a string.
  * Returns the number of characters in buffer (if positive)
  * or an error (negative)
  */
-int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int unused)
+int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
 {
 	char *p = buffer;
 	int l;
--- 3.8-rc1+/mm/shmem.c	2012-12-22 09:43:27.660015583 -0800
+++ linux/mm/shmem.c	2013-01-01 23:26:30.174992261 -0800
@@ -889,7 +889,7 @@ static void shmem_show_mpol(struct seq_f
 	if (!mpol || mpol->mode == MPOL_DEFAULT)
 		return;		/* show nothing */
 
-	mpol_to_str(buffer, sizeof(buffer), mpol, 1);
+	mpol_to_str(buffer, sizeof(buffer), mpol);
 
 	seq_printf(seq, ",mpol=%s", buffer);
 }
@@ -2463,7 +2463,7 @@ static int shmem_parse_options(char *opt
 			if (!gid_valid(sbinfo->gid))
 				goto bad_val;
 		} else if (!strcmp(this_char,"mpol")) {
-			if (mpol_parse_str(value, &sbinfo->mpol, 1))
+			if (mpol_parse_str(value, &sbinfo->mpol))
 				goto bad_val;
 		} else {
 			printk(KERN_ERR "tmpfs: Bad mount option %s\n",

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
  2013-01-02 10:01 ` Hugh Dickins
@ 2013-01-02 14:32   ` Christoph Lameter
  -1 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2013-01-02 14:32 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro,
	David Rientjes, Mel Gorman, Ingo Molnar, Peter Zijlstra,
	linux-kernel, linux-mm

On Wed, 2 Jan 2013, Hugh Dickins wrote:

> Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
> when commit e17f74af351c "mempolicy: don't call mpol_set_nodemask()
> when no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
> which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
> With slab poisoning, you can then rely on mpol_to_str() to set the bit
> for node 0x6b6b, probably in the next page above the caller's stack.

Ugly. But 2.6.35 means that the patch was not included in several
enterprise linux releases.

> I don't understand why MPOL_LOCAL is described as a pseudo-policy:
> it's a reasonable policy which suffers from a confusing implementation
> in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
> much more robust if MPOL_LOCAL were recognized in switch statements
> throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
> empty) nodes mask like everyone else, instead of its preferred_node
> variant (I presume an optimization from the days before MPOL_LOCAL).
> But that would take me too long to get right and fully tested.

The current approaches to implementing NUMA scheduling are making
MPOL_LOCAL an explicit policy. See
https://patchwork.kernel.org/patch/1703641/.

Does that address the concerns?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
@ 2013-01-02 14:32   ` Christoph Lameter
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2013-01-02 14:32 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro,
	David Rientjes, Mel Gorman, Ingo Molnar, Peter Zijlstra,
	linux-kernel, linux-mm

On Wed, 2 Jan 2013, Hugh Dickins wrote:

> Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
> when commit e17f74af351c "mempolicy: don't call mpol_set_nodemask()
> when no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
> which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
> With slab poisoning, you can then rely on mpol_to_str() to set the bit
> for node 0x6b6b, probably in the next page above the caller's stack.

Ugly. But 2.6.35 means that the patch was not included in several
enterprise linux releases.

> I don't understand why MPOL_LOCAL is described as a pseudo-policy:
> it's a reasonable policy which suffers from a confusing implementation
> in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
> much more robust if MPOL_LOCAL were recognized in switch statements
> throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
> empty) nodes mask like everyone else, instead of its preferred_node
> variant (I presume an optimization from the days before MPOL_LOCAL).
> But that would take me too long to get right and fully tested.

The current approaches to implementing NUMA scheduling are making
MPOL_LOCAL an explicit policy. See
https://patchwork.kernel.org/patch/1703641/.

Does that address the concerns?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
  2013-01-02 10:01 ` Hugh Dickins
@ 2013-01-02 15:57   ` Christoph Lameter
  -1 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2013-01-02 15:57 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro,
	David Rientjes, Mel Gorman, Ingo Molnar, Peter Zijlstra,
	linux-kernel, linux-mm

On Wed, 2 Jan 2013, Hugh Dickins wrote:

> @@ -2756,13 +2747,13 @@ out:
>   * @buffer:  to contain formatted mempolicy string
>   * @maxlen:  length of @buffer
>   * @pol:  pointer to mempolicy to be formatted
> - * @no_context:  "context free" mempolicy - use nodemask in w.user_nodemask
> + * @unused:  redundant argument, to be removed later.
>   *
>   * Convert a mempolicy into a string.
>   * Returns the number of characters in buffer (if positive)
>   * or an error (negative)
>   */
> -int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int no_context)
> +int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int unused)
>  {
>  	char *p = buffer;
>  	int l;
> @@ -2796,10 +2787,7 @@ int mpol_to_str(char *buffer, int maxlen
>  	case MPOL_BIND:
>  		/* Fall through */
>  	case MPOL_INTERLEAVE:
> -		if (no_context)
> -			nodes = pol->w.user_nodemask;
> -		else
> -			nodes = pol->v.nodes;
> +		nodes = pol->v.nodes;
>  		break;
>

no_context was always true. Why is the code from the false branch kept?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
@ 2013-01-02 15:57   ` Christoph Lameter
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2013-01-02 15:57 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro,
	David Rientjes, Mel Gorman, Ingo Molnar, Peter Zijlstra,
	linux-kernel, linux-mm

On Wed, 2 Jan 2013, Hugh Dickins wrote:

> @@ -2756,13 +2747,13 @@ out:
>   * @buffer:  to contain formatted mempolicy string
>   * @maxlen:  length of @buffer
>   * @pol:  pointer to mempolicy to be formatted
> - * @no_context:  "context free" mempolicy - use nodemask in w.user_nodemask
> + * @unused:  redundant argument, to be removed later.
>   *
>   * Convert a mempolicy into a string.
>   * Returns the number of characters in buffer (if positive)
>   * or an error (negative)
>   */
> -int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int no_context)
> +int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int unused)
>  {
>  	char *p = buffer;
>  	int l;
> @@ -2796,10 +2787,7 @@ int mpol_to_str(char *buffer, int maxlen
>  	case MPOL_BIND:
>  		/* Fall through */
>  	case MPOL_INTERLEAVE:
> -		if (no_context)
> -			nodes = pol->w.user_nodemask;
> -		else
> -			nodes = pol->v.nodes;
> +		nodes = pol->v.nodes;
>  		break;
>

no_context was always true. Why is the code from the false branch kept?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
  2013-01-02 15:57   ` Christoph Lameter
@ 2013-01-02 17:24     ` Linus Torvalds
  -1 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2013-01-02 17:24 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Hugh Dickins, Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro,
	David Rientjes, Mel Gorman, Ingo Molnar, Peter Zijlstra,
	Linux Kernel Mailing List, linux-mm

On Wed, Jan 2, 2013 at 7:57 AM, Christoph Lameter <cl@linux.com> wrote:
> On Wed, 2 Jan 2013, Hugh Dickins wrote:
>
>> @@ -2796,10 +2787,7 @@ int mpol_to_str(char *buffer, int maxlen
>>       case MPOL_BIND:
>>               /* Fall through */
>>       case MPOL_INTERLEAVE:
>> -             if (no_context)
>> -                     nodes = pol->w.user_nodemask;
>> -             else
>> -                     nodes = pol->v.nodes;
>> +             nodes = pol->v.nodes;
>>               break;
>>
>
> no_context was always true. Why is the code from the false branch kept?

no_context is zero in the caller in fs/proc/task_mmu.c, and one in the
mm/shmem.c caller. So it's not always true (for mpol_parse_str() there
is only one caller, and it's always true as Hugh said).

Anyway, I do not know why Hugh took the true case, but I don't really
imagine that it matters. So I'll take these two patches, but it would
be good if you double-checked this, Hugh.

Hugh?

                   Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
@ 2013-01-02 17:24     ` Linus Torvalds
  0 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2013-01-02 17:24 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Hugh Dickins, Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro,
	David Rientjes, Mel Gorman, Ingo Molnar, Peter Zijlstra,
	Linux Kernel Mailing List, linux-mm

On Wed, Jan 2, 2013 at 7:57 AM, Christoph Lameter <cl@linux.com> wrote:
> On Wed, 2 Jan 2013, Hugh Dickins wrote:
>
>> @@ -2796,10 +2787,7 @@ int mpol_to_str(char *buffer, int maxlen
>>       case MPOL_BIND:
>>               /* Fall through */
>>       case MPOL_INTERLEAVE:
>> -             if (no_context)
>> -                     nodes = pol->w.user_nodemask;
>> -             else
>> -                     nodes = pol->v.nodes;
>> +             nodes = pol->v.nodes;
>>               break;
>>
>
> no_context was always true. Why is the code from the false branch kept?

no_context is zero in the caller in fs/proc/task_mmu.c, and one in the
mm/shmem.c caller. So it's not always true (for mpol_parse_str() there
is only one caller, and it's always true as Hugh said).

Anyway, I do not know why Hugh took the true case, but I don't really
imagine that it matters. So I'll take these two patches, but it would
be good if you double-checked this, Hugh.

Hugh?

                   Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
  2013-01-02 17:24     ` Linus Torvalds
@ 2013-01-02 17:27       ` Linus Torvalds
  -1 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2013-01-02 17:27 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Hugh Dickins, Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro,
	David Rientjes, Mel Gorman, Ingo Molnar, Peter Zijlstra,
	Linux Kernel Mailing List, linux-mm

On Wed, Jan 2, 2013 at 9:24 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Anyway, I do not know why Hugh took the true case, but I don't really
> imagine that it matters. So I'll take these two patches, but it would
> be good if you double-checked this, Hugh.

Oh, Hugh actually even mentioned it in the commit message. So never mind.

                Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
@ 2013-01-02 17:27       ` Linus Torvalds
  0 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2013-01-02 17:27 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Hugh Dickins, Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro,
	David Rientjes, Mel Gorman, Ingo Molnar, Peter Zijlstra,
	Linux Kernel Mailing List, linux-mm

On Wed, Jan 2, 2013 at 9:24 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Anyway, I do not know why Hugh took the true case, but I don't really
> imagine that it matters. So I'll take these two patches, but it would
> be good if you double-checked this, Hugh.

Oh, Hugh actually even mentioned it in the commit message. So never mind.

                Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
  2013-01-02 14:32   ` Christoph Lameter
@ 2013-01-02 18:30     ` Hugh Dickins
  -1 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2013-01-02 18:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Linus Torvalds, Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro,
	David Rientjes, Mel Gorman, Ingo Molnar, Peter Zijlstra,
	linux-kernel, linux-mm

On Wed, 2 Jan 2013, Christoph Lameter wrote:
> On Wed, 2 Jan 2013, Hugh Dickins wrote:
> 
> > Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
> > when commit e17f74af351c "mempolicy: don't call mpol_set_nodemask()
> > when no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
> > which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
> > With slab poisoning, you can then rely on mpol_to_str() to set the bit
> > for node 0x6b6b, probably in the next page above the caller's stack.
> 
> Ugly. But 2.6.35 means that the patch was not included in several
> enterprise linux releases.

Thanks, that's some relief.  I forgot to mention that a good test for
whether your particular kernel (with who knows what additional patches
applied) is affected, is to

mount -o remount,mpol=local /dev/shm # which should be a tmpfs
grep /dev/shm /proc/mounts

If that says "mpol=prefer" then you're affected and need the fix; if
it says "mpol=local" (like 2.6.34 or after this fix) then you're safe.

(Conversely, setting "mpol=prefer" shows up as "mpol=local" after the,
fix, since that's what prefer without a node specification amounts to.)

> 
> > I don't understand why MPOL_LOCAL is described as a pseudo-policy:
> > it's a reasonable policy which suffers from a confusing implementation
> > in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
> > much more robust if MPOL_LOCAL were recognized in switch statements
> > throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
> > empty) nodes mask like everyone else, instead of its preferred_node
> > variant (I presume an optimization from the days before MPOL_LOCAL).
> > But that would take me too long to get right and fully tested.
> 
> The current approaches to implementing NUMA scheduling are making
> MPOL_LOCAL an explicit policy. See
> https://patchwork.kernel.org/patch/1703641/.

It's a good step in the right direction.

> 
> Does that address the concerns?

It makes no difference to this bug, and does not go far enough to
remove all the MPOL_F_LOCAL MPOL_PREFERRED MPOL_LOCAL twistiness.

Hugh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
@ 2013-01-02 18:30     ` Hugh Dickins
  0 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2013-01-02 18:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Linus Torvalds, Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro,
	David Rientjes, Mel Gorman, Ingo Molnar, Peter Zijlstra,
	linux-kernel, linux-mm

On Wed, 2 Jan 2013, Christoph Lameter wrote:
> On Wed, 2 Jan 2013, Hugh Dickins wrote:
> 
> > Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
> > when commit e17f74af351c "mempolicy: don't call mpol_set_nodemask()
> > when no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
> > which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
> > With slab poisoning, you can then rely on mpol_to_str() to set the bit
> > for node 0x6b6b, probably in the next page above the caller's stack.
> 
> Ugly. But 2.6.35 means that the patch was not included in several
> enterprise linux releases.

Thanks, that's some relief.  I forgot to mention that a good test for
whether your particular kernel (with who knows what additional patches
applied) is affected, is to

mount -o remount,mpol=local /dev/shm # which should be a tmpfs
grep /dev/shm /proc/mounts

If that says "mpol=prefer" then you're affected and need the fix; if
it says "mpol=local" (like 2.6.34 or after this fix) then you're safe.

(Conversely, setting "mpol=prefer" shows up as "mpol=local" after the,
fix, since that's what prefer without a node specification amounts to.)

> 
> > I don't understand why MPOL_LOCAL is described as a pseudo-policy:
> > it's a reasonable policy which suffers from a confusing implementation
> > in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
> > much more robust if MPOL_LOCAL were recognized in switch statements
> > throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
> > empty) nodes mask like everyone else, instead of its preferred_node
> > variant (I presume an optimization from the days before MPOL_LOCAL).
> > But that would take me too long to get right and fully tested.
> 
> The current approaches to implementing NUMA scheduling are making
> MPOL_LOCAL an explicit policy. See
> https://patchwork.kernel.org/patch/1703641/.

It's a good step in the right direction.

> 
> Does that address the concerns?

It makes no difference to this bug, and does not go far enough to
remove all the MPOL_F_LOCAL MPOL_PREFERRED MPOL_LOCAL twistiness.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
  2013-01-02 17:24     ` Linus Torvalds
@ 2013-01-02 18:48       ` Hugh Dickins
  -1 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2013-01-02 18:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Lameter, Andrew Morton, Lee Schermerhorn,
	KOSAKI Motohiro, David Rientjes, Mel Gorman, Ingo Molnar,
	Peter Zijlstra, Linux Kernel Mailing List, linux-mm

On Wed, 2 Jan 2013, Linus Torvalds wrote:
> On Wed, Jan 2, 2013 at 7:57 AM, Christoph Lameter <cl@linux.com> wrote:
> > On Wed, 2 Jan 2013, Hugh Dickins wrote:
> >
> >> @@ -2796,10 +2787,7 @@ int mpol_to_str(char *buffer, int maxlen
> >>       case MPOL_BIND:
> >>               /* Fall through */
> >>       case MPOL_INTERLEAVE:
> >> -             if (no_context)
> >> -                     nodes = pol->w.user_nodemask;
> >> -             else
> >> -                     nodes = pol->v.nodes;
> >> +             nodes = pol->v.nodes;
> >>               break;
> >>
> >
> > no_context was always true. Why is the code from the false branch kept?
> 
> no_context is zero in the caller in fs/proc/task_mmu.c, and one in the
> mm/shmem.c caller. So it's not always true (for mpol_parse_str() there
> is only one caller, and it's always true as Hugh said).

Yes, I think Christoph was remembering the old days when mpol_to_str()
started out just for tmpfs; later /proc/pid/numa_maps extended it for
use on vmas (the "contextualized" !no_context case).

> 
> Anyway, I do not know why Hugh took the true case, but I don't really
> imagine that it matters. So I'll take these two patches, but it would
> be good if you double-checked this, Hugh.

Thanks, yes, I played with a number of ways of fixing it (and sat on my
original fix for several days, rightly guessing this an area where more
problems would emerge - only later realizing mpol=prefer:Node wrong too).

I could probably have kept mpol_to_str()'s no_context arg, and done
something with it in the MPOL_PREFERRED case; perhaps would have chosen
that if the arg had been more understandably named than "no_context";
but in the end thought removing the need for the arg was simplest.

Hugh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
@ 2013-01-02 18:48       ` Hugh Dickins
  0 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2013-01-02 18:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Lameter, Andrew Morton, Lee Schermerhorn,
	KOSAKI Motohiro, David Rientjes, Mel Gorman, Ingo Molnar,
	Peter Zijlstra, Linux Kernel Mailing List, linux-mm

On Wed, 2 Jan 2013, Linus Torvalds wrote:
> On Wed, Jan 2, 2013 at 7:57 AM, Christoph Lameter <cl@linux.com> wrote:
> > On Wed, 2 Jan 2013, Hugh Dickins wrote:
> >
> >> @@ -2796,10 +2787,7 @@ int mpol_to_str(char *buffer, int maxlen
> >>       case MPOL_BIND:
> >>               /* Fall through */
> >>       case MPOL_INTERLEAVE:
> >> -             if (no_context)
> >> -                     nodes = pol->w.user_nodemask;
> >> -             else
> >> -                     nodes = pol->v.nodes;
> >> +             nodes = pol->v.nodes;
> >>               break;
> >>
> >
> > no_context was always true. Why is the code from the false branch kept?
> 
> no_context is zero in the caller in fs/proc/task_mmu.c, and one in the
> mm/shmem.c caller. So it's not always true (for mpol_parse_str() there
> is only one caller, and it's always true as Hugh said).

Yes, I think Christoph was remembering the old days when mpol_to_str()
started out just for tmpfs; later /proc/pid/numa_maps extended it for
use on vmas (the "contextualized" !no_context case).

> 
> Anyway, I do not know why Hugh took the true case, but I don't really
> imagine that it matters. So I'll take these two patches, but it would
> be good if you double-checked this, Hugh.

Thanks, yes, I played with a number of ways of fixing it (and sat on my
original fix for several days, rightly guessing this an area where more
problems would emerge - only later realizing mpol=prefer:Node wrong too).

I could probably have kept mpol_to_str()'s no_context arg, and done
something with it in the MPOL_PREFERRED case; perhaps would have chosen
that if the arg had been more understandably named than "no_context";
but in the end thought removing the need for the arg was simplest.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
  2013-01-02 10:01 ` Hugh Dickins
@ 2013-01-02 20:38   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 20+ messages in thread
From: KOSAKI Motohiro @ 2013-01-02 20:38 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Lee Schermerhorn,
	Christoph Lameter, David Rientjes, Mel Gorman, Ingo Molnar,
	Peter Zijlstra, LKML, linux-mm

On Wed, Jan 2, 2013 at 5:01 AM, Hugh Dickins <hughd@google.com> wrote:
> Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
> mempolicy testing.  Very nasty.  Reading /proc/mounts, /proc/pid/mounts
> or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
> in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
> pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
> worse.  "mpol=prefer" and "mpol=prefer:Node" are equally toxic.
>
> Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
> when commit e17f74af351c "mempolicy: don't call mpol_set_nodemask()
> when no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
> which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
> With slab poisoning, you can then rely on mpol_to_str() to set the bit
> for node 0x6b6b, probably in the next page above the caller's stack.
>
> mpol_parse_str() is only called from shmem_parse_options(): no_context
> is always true, so call it unused for now, and remove !no_context code.
> Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might
> expect.  Then mpol_to_str() can ignore its no_context argument also,
> the mpol being appropriately initialized whether contextualized or not.
> Rename its no_context unused too, and let subsequent patch remove them
> (that's not needed for stable backporting, which would involve rejects).
>
> I don't understand why MPOL_LOCAL is described as a pseudo-policy:
> it's a reasonable policy which suffers from a confusing implementation
> in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
> much more robust if MPOL_LOCAL were recognized in switch statements
> throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
> empty) nodes mask like everyone else, instead of its preferred_node
> variant (I presume an optimization from the days before MPOL_LOCAL).
> But that would take me too long to get right and fully tested.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: stable@vger.kernel.org
> ---
>
>  mm/mempolicy.c |   64 +++++++++++++++++++----------------------------
>  1 file changed, 26 insertions(+), 38 deletions(-)
>
> --- 3.8-rc1/mm/mempolicy.c      2012-12-22 09:43:27.636015582 -0800
> +++ linux/mm/mempolicy.c        2013-01-01 23:44:10.715017466 -0800
> @@ -2595,8 +2595,7 @@ void numa_default_policy(void)
>   */
>
>  /*
> - * "local" is pseudo-policy:  MPOL_PREFERRED with MPOL_F_LOCAL flag
> - * Used only for mpol_parse_str() and mpol_to_str()
> + * "local" is implemented internally by MPOL_PREFERRED with MPOL_F_LOCAL flag.
>   */
>  static const char * const policy_modes[] =
>  {
> @@ -2610,28 +2609,21 @@ static const char * const policy_modes[]
>
>  #ifdef CONFIG_TMPFS
>  /**
> - * mpol_parse_str - parse string to mempolicy
> + * mpol_parse_str - parse string to mempolicy, for tmpfs mpol mount option.
>   * @str:  string containing mempolicy to parse
>   * @mpol:  pointer to struct mempolicy pointer, returned on success.
> - * @no_context:  flag whether to "contextualize" the mempolicy
> + * @unused:  redundant argument, to be removed later.
>   *
>   * Format of input:
>   *     <mode>[=<flags>][:<nodelist>]
>   *
> - * if @no_context is true, save the input nodemask in w.user_nodemask in
> - * the returned mempolicy.  This will be used to "clone" the mempolicy in
> - * a specific context [cpuset] at a later time.  Used to parse tmpfs mpol
> - * mount option.  Note that if 'static' or 'relative' mode flags were
> - * specified, the input nodemask will already have been saved.  Saving
> - * it again is redundant, but safe.
> - *
>   * On success, returns 0, else 1
>   */
> -int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
> +int mpol_parse_str(char *str, struct mempolicy **mpol, int unused)
>  {
>         struct mempolicy *new = NULL;
>         unsigned short mode;
> -       unsigned short uninitialized_var(mode_flags);
> +       unsigned short mode_flags;
>         nodemask_t nodes;
>         char *nodelist = strchr(str, ':');
>         char *flags = strchr(str, '=');
> @@ -2719,24 +2711,23 @@ int mpol_parse_str(char *str, struct mem
>         if (IS_ERR(new))
>                 goto out;
>
> -       if (no_context) {
> -               /* save for contextualization */
> -               new->w.user_nodemask = nodes;
> -       } else {
> -               int ret;
> -               NODEMASK_SCRATCH(scratch);
> -               if (scratch) {
> -                       task_lock(current);
> -                       ret = mpol_set_nodemask(new, &nodes, scratch);
> -                       task_unlock(current);
> -               } else
> -                       ret = -ENOMEM;
> -               NODEMASK_SCRATCH_FREE(scratch);
> -               if (ret) {
> -                       mpol_put(new);
> -                       goto out;
> -               }
> -       }
> +       /*
> +        * Save nodes for mpol_to_str() to show the tmpfs mount options
> +        * for /proc/mounts, /proc/pid/mounts and /proc/pid/mountinfo.
> +        */
> +       if (mode != MPOL_PREFERRED)
> +               new->v.nodes = nodes;
> +       else if (nodelist)
> +               new->v.preferred_node = first_node(nodes);
> +       else
> +               new->flags |= MPOL_F_LOCAL;
> +
> +       /*
> +        * Save nodes for contextualization: this will be used to "clone"
> +        * the mempolicy in a specific context [cpuset] at a later time.
> +        */
> +       new->w.user_nodemask = nodes;
> +
>         err = 0;

Ugh, Indeed.
I should have realized this mistake at my last full review.



>
>  out:
> @@ -2756,13 +2747,13 @@ out:
>   * @buffer:  to contain formatted mempolicy string
>   * @maxlen:  length of @buffer
>   * @pol:  pointer to mempolicy to be formatted
> - * @no_context:  "context free" mempolicy - use nodemask in w.user_nodemask
> + * @unused:  redundant argument, to be removed later.
>   *
>   * Convert a mempolicy into a string.
>   * Returns the number of characters in buffer (if positive)
>   * or an error (negative)
>   */
> -int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int no_context)
> +int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int unused)
>  {
>         char *p = buffer;
>         int l;
> @@ -2788,7 +2779,7 @@ int mpol_to_str(char *buffer, int maxlen
>         case MPOL_PREFERRED:
>                 nodes_clear(nodes);
>                 if (flags & MPOL_F_LOCAL)
> -                       mode = MPOL_LOCAL;      /* pseudo-policy */
> +                       mode = MPOL_LOCAL;
>                 else
>                         node_set(pol->v.preferred_node, nodes);
>                 break;
> @@ -2796,10 +2787,7 @@ int mpol_to_str(char *buffer, int maxlen
>         case MPOL_BIND:
>                 /* Fall through */
>         case MPOL_INTERLEAVE:
> -               if (no_context)
> -                       nodes = pol->w.user_nodemask;
> -               else
> -                       nodes = pol->v.nodes;
> +               nodes = pol->v.nodes;
>                 break;

Hmm. yes, tmpfs mempolicy shoule be out of cpuset contextualization.
then when no_context is true, w.user_nodemask is alywas same v.nodes.

But note, I'd like to change this to make memory hot-plug safe. then I
may resurrect
this test for distinguish before and after hot plugging.

Anyway, I have no seen any problems.

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory
@ 2013-01-02 20:38   ` KOSAKI Motohiro
  0 siblings, 0 replies; 20+ messages in thread
From: KOSAKI Motohiro @ 2013-01-02 20:38 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Lee Schermerhorn,
	Christoph Lameter, David Rientjes, Mel Gorman, Ingo Molnar,
	Peter Zijlstra, LKML, linux-mm

On Wed, Jan 2, 2013 at 5:01 AM, Hugh Dickins <hughd@google.com> wrote:
> Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
> mempolicy testing.  Very nasty.  Reading /proc/mounts, /proc/pid/mounts
> or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
> in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
> pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
> worse.  "mpol=prefer" and "mpol=prefer:Node" are equally toxic.
>
> Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
> when commit e17f74af351c "mempolicy: don't call mpol_set_nodemask()
> when no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
> which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
> With slab poisoning, you can then rely on mpol_to_str() to set the bit
> for node 0x6b6b, probably in the next page above the caller's stack.
>
> mpol_parse_str() is only called from shmem_parse_options(): no_context
> is always true, so call it unused for now, and remove !no_context code.
> Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might
> expect.  Then mpol_to_str() can ignore its no_context argument also,
> the mpol being appropriately initialized whether contextualized or not.
> Rename its no_context unused too, and let subsequent patch remove them
> (that's not needed for stable backporting, which would involve rejects).
>
> I don't understand why MPOL_LOCAL is described as a pseudo-policy:
> it's a reasonable policy which suffers from a confusing implementation
> in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
> much more robust if MPOL_LOCAL were recognized in switch statements
> throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
> empty) nodes mask like everyone else, instead of its preferred_node
> variant (I presume an optimization from the days before MPOL_LOCAL).
> But that would take me too long to get right and fully tested.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: stable@vger.kernel.org
> ---
>
>  mm/mempolicy.c |   64 +++++++++++++++++++----------------------------
>  1 file changed, 26 insertions(+), 38 deletions(-)
>
> --- 3.8-rc1/mm/mempolicy.c      2012-12-22 09:43:27.636015582 -0800
> +++ linux/mm/mempolicy.c        2013-01-01 23:44:10.715017466 -0800
> @@ -2595,8 +2595,7 @@ void numa_default_policy(void)
>   */
>
>  /*
> - * "local" is pseudo-policy:  MPOL_PREFERRED with MPOL_F_LOCAL flag
> - * Used only for mpol_parse_str() and mpol_to_str()
> + * "local" is implemented internally by MPOL_PREFERRED with MPOL_F_LOCAL flag.
>   */
>  static const char * const policy_modes[] =
>  {
> @@ -2610,28 +2609,21 @@ static const char * const policy_modes[]
>
>  #ifdef CONFIG_TMPFS
>  /**
> - * mpol_parse_str - parse string to mempolicy
> + * mpol_parse_str - parse string to mempolicy, for tmpfs mpol mount option.
>   * @str:  string containing mempolicy to parse
>   * @mpol:  pointer to struct mempolicy pointer, returned on success.
> - * @no_context:  flag whether to "contextualize" the mempolicy
> + * @unused:  redundant argument, to be removed later.
>   *
>   * Format of input:
>   *     <mode>[=<flags>][:<nodelist>]
>   *
> - * if @no_context is true, save the input nodemask in w.user_nodemask in
> - * the returned mempolicy.  This will be used to "clone" the mempolicy in
> - * a specific context [cpuset] at a later time.  Used to parse tmpfs mpol
> - * mount option.  Note that if 'static' or 'relative' mode flags were
> - * specified, the input nodemask will already have been saved.  Saving
> - * it again is redundant, but safe.
> - *
>   * On success, returns 0, else 1
>   */
> -int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
> +int mpol_parse_str(char *str, struct mempolicy **mpol, int unused)
>  {
>         struct mempolicy *new = NULL;
>         unsigned short mode;
> -       unsigned short uninitialized_var(mode_flags);
> +       unsigned short mode_flags;
>         nodemask_t nodes;
>         char *nodelist = strchr(str, ':');
>         char *flags = strchr(str, '=');
> @@ -2719,24 +2711,23 @@ int mpol_parse_str(char *str, struct mem
>         if (IS_ERR(new))
>                 goto out;
>
> -       if (no_context) {
> -               /* save for contextualization */
> -               new->w.user_nodemask = nodes;
> -       } else {
> -               int ret;
> -               NODEMASK_SCRATCH(scratch);
> -               if (scratch) {
> -                       task_lock(current);
> -                       ret = mpol_set_nodemask(new, &nodes, scratch);
> -                       task_unlock(current);
> -               } else
> -                       ret = -ENOMEM;
> -               NODEMASK_SCRATCH_FREE(scratch);
> -               if (ret) {
> -                       mpol_put(new);
> -                       goto out;
> -               }
> -       }
> +       /*
> +        * Save nodes for mpol_to_str() to show the tmpfs mount options
> +        * for /proc/mounts, /proc/pid/mounts and /proc/pid/mountinfo.
> +        */
> +       if (mode != MPOL_PREFERRED)
> +               new->v.nodes = nodes;
> +       else if (nodelist)
> +               new->v.preferred_node = first_node(nodes);
> +       else
> +               new->flags |= MPOL_F_LOCAL;
> +
> +       /*
> +        * Save nodes for contextualization: this will be used to "clone"
> +        * the mempolicy in a specific context [cpuset] at a later time.
> +        */
> +       new->w.user_nodemask = nodes;
> +
>         err = 0;

Ugh, Indeed.
I should have realized this mistake at my last full review.



>
>  out:
> @@ -2756,13 +2747,13 @@ out:
>   * @buffer:  to contain formatted mempolicy string
>   * @maxlen:  length of @buffer
>   * @pol:  pointer to mempolicy to be formatted
> - * @no_context:  "context free" mempolicy - use nodemask in w.user_nodemask
> + * @unused:  redundant argument, to be removed later.
>   *
>   * Convert a mempolicy into a string.
>   * Returns the number of characters in buffer (if positive)
>   * or an error (negative)
>   */
> -int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int no_context)
> +int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int unused)
>  {
>         char *p = buffer;
>         int l;
> @@ -2788,7 +2779,7 @@ int mpol_to_str(char *buffer, int maxlen
>         case MPOL_PREFERRED:
>                 nodes_clear(nodes);
>                 if (flags & MPOL_F_LOCAL)
> -                       mode = MPOL_LOCAL;      /* pseudo-policy */
> +                       mode = MPOL_LOCAL;
>                 else
>                         node_set(pol->v.preferred_node, nodes);
>                 break;
> @@ -2796,10 +2787,7 @@ int mpol_to_str(char *buffer, int maxlen
>         case MPOL_BIND:
>                 /* Fall through */
>         case MPOL_INTERLEAVE:
> -               if (no_context)
> -                       nodes = pol->w.user_nodemask;
> -               else
> -                       nodes = pol->v.nodes;
> +               nodes = pol->v.nodes;
>                 break;

Hmm. yes, tmpfs mempolicy shoule be out of cpuset contextualization.
then when no_context is true, w.user_nodemask is alywas same v.nodes.

But note, I'd like to change this to make memory hot-plug safe. then I
may resurrect
this test for distinguish before and after hot plugging.

Anyway, I have no seen any problems.

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] mempolicy: remove arg from mpol_parse_str, mpol_to_str
  2013-01-02 10:04   ` Hugh Dickins
@ 2013-01-02 20:39     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 20+ messages in thread
From: KOSAKI Motohiro @ 2013-01-02 20:39 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Lee Schermerhorn,
	Christoph Lameter, David Rientjes, Mel Gorman, Ingo Molnar,
	Peter Zijlstra, LKML, linux-mm

On Wed, Jan 2, 2013 at 5:04 AM, Hugh Dickins <hughd@google.com> wrote:
> Remove the unused argument (formerly no_context) from mpol_parse_str()
> and from mpol_to_str().
>
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] mempolicy: remove arg from mpol_parse_str, mpol_to_str
@ 2013-01-02 20:39     ` KOSAKI Motohiro
  0 siblings, 0 replies; 20+ messages in thread
From: KOSAKI Motohiro @ 2013-01-02 20:39 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Lee Schermerhorn,
	Christoph Lameter, David Rientjes, Mel Gorman, Ingo Molnar,
	Peter Zijlstra, LKML, linux-mm

On Wed, Jan 2, 2013 at 5:04 AM, Hugh Dickins <hughd@google.com> wrote:
> Remove the unused argument (formerly no_context) from mpol_parse_str()
> and from mpol_to_str().
>
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-01-02 20:45 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-02 10:01 [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory Hugh Dickins
2013-01-02 10:01 ` Hugh Dickins
2013-01-02 10:04 ` [PATCH 2/2] mempolicy: remove arg from mpol_parse_str, mpol_to_str Hugh Dickins
2013-01-02 10:04   ` Hugh Dickins
2013-01-02 20:39   ` KOSAKI Motohiro
2013-01-02 20:39     ` KOSAKI Motohiro
2013-01-02 14:32 ` [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corrupting memory Christoph Lameter
2013-01-02 14:32   ` Christoph Lameter
2013-01-02 18:30   ` Hugh Dickins
2013-01-02 18:30     ` Hugh Dickins
2013-01-02 15:57 ` Christoph Lameter
2013-01-02 15:57   ` Christoph Lameter
2013-01-02 17:24   ` Linus Torvalds
2013-01-02 17:24     ` Linus Torvalds
2013-01-02 17:27     ` Linus Torvalds
2013-01-02 17:27       ` Linus Torvalds
2013-01-02 18:48     ` Hugh Dickins
2013-01-02 18:48       ` Hugh Dickins
2013-01-02 20:38 ` KOSAKI Motohiro
2013-01-02 20:38   ` KOSAKI Motohiro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.