All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-05  8:28 ` Gu Zheng
  0 siblings, 0 replies; 34+ messages in thread
From: Gu Zheng @ 2014-06-05  8:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Tejun Heo, linux-mm, Cgroups, stable

When running with the kernel(3.15-rc7+), the follow bug occurs:
[ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
[ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
[ 9969.441175] INFO: lockdep is turned off.
[ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
[ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
[ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
[ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
[ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
[ 9969.974071] Call Trace:
[ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
[ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
[ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
[ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
[ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
[ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
[ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
[ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
[ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
[ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
[ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
[ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
[ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
[ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
[ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b

The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
rcu_read_lock protection region to protect the access to cpuset only in
current_cpuset_is_being_rebound(). So that we can avoid this bug.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
 kernel/cpuset.c |    8 +++++++-
 mm/mempolicy.c  |    2 --
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 3d54c41..a735402 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1188,7 +1188,13 @@ done:
 
 int current_cpuset_is_being_rebound(void)
 {
-	return task_cs(current) == cpuset_being_rebound;
+	int ret;
+
+	rcu_read_lock();
+	ret = task_cs(current) == cpuset_being_rebound;
+	rcu_read_unlock();
+
+	return ret;
 }
 
 static int update_relax_domain_level(struct cpuset *cs, s64 val)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 78e1472..b58a9d5 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2138,7 +2138,6 @@ struct mempolicy *__mpol_dup(struct mempolicy *old)
 	} else
 		*new = *old;
 
-	rcu_read_lock();
 	if (current_cpuset_is_being_rebound()) {
 		nodemask_t mems = cpuset_mems_allowed(current);
 		if (new->flags & MPOL_F_REBINDING)
@@ -2146,7 +2145,6 @@ struct mempolicy *__mpol_dup(struct mempolicy *old)
 		else
 			mpol_rebind_policy(new, &mems, MPOL_REBIND_ONCE);
 	}
-	rcu_read_unlock();
 	atomic_set(&new->refcnt, 1);
 	return new;
 }
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-05  8:28 ` Gu Zheng
  0 siblings, 0 replies; 34+ messages in thread
From: Gu Zheng @ 2014-06-05  8:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Tejun Heo, linux-mm, Cgroups, stable

When running with the kernel(3.15-rc7+), the follow bug occurs:
[ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
[ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
[ 9969.441175] INFO: lockdep is turned off.
[ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
[ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
[ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
[ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
[ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
[ 9969.974071] Call Trace:
[ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
[ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
[ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
[ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
[ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
[ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
[ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
[ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
[ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
[ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
[ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
[ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
[ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
[ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
[ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b

The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
rcu_read_lock protection region to protect the access to cpuset only in
current_cpuset_is_being_rebound(). So that we can avoid this bug.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
 kernel/cpuset.c |    8 +++++++-
 mm/mempolicy.c  |    2 --
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 3d54c41..a735402 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1188,7 +1188,13 @@ done:
 
 int current_cpuset_is_being_rebound(void)
 {
-	return task_cs(current) == cpuset_being_rebound;
+	int ret;
+
+	rcu_read_lock();
+	ret = task_cs(current) == cpuset_being_rebound;
+	rcu_read_unlock();
+
+	return ret;
 }
 
 static int update_relax_domain_level(struct cpuset *cs, s64 val)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 78e1472..b58a9d5 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2138,7 +2138,6 @@ struct mempolicy *__mpol_dup(struct mempolicy *old)
 	} else
 		*new = *old;
 
-	rcu_read_lock();
 	if (current_cpuset_is_being_rebound()) {
 		nodemask_t mems = cpuset_mems_allowed(current);
 		if (new->flags & MPOL_F_REBINDING)
@@ -2146,7 +2145,6 @@ struct mempolicy *__mpol_dup(struct mempolicy *old)
 		else
 			mpol_rebind_policy(new, &mems, MPOL_REBIND_ONCE);
 	}
-	rcu_read_unlock();
 	atomic_set(&new->refcnt, 1);
 	return new;
 }
-- 
1.7.7

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
  2014-06-05  8:28 ` Gu Zheng
@ 2014-06-05 14:18   ` Greg KH
  -1 siblings, 0 replies; 34+ messages in thread
From: Greg KH @ 2014-06-05 14:18 UTC (permalink / raw)
  To: Gu Zheng
  Cc: linux-kernel, Andrew Morton, Tejun Heo, linux-mm, Cgroups, stable

On Thu, Jun 05, 2014 at 04:28:52PM +0800, Gu Zheng wrote:
> When running with the kernel(3.15-rc7+), the follow bug occurs:
> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
> [ 9969.441175] INFO: lockdep is turned off.
> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
> [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
> [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
> [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
> [ 9969.974071] Call Trace:
> [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
> [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
> [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
> [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
> [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
> [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
> [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
> [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
> [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
> [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
> [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
> [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
> [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
> [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
> [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
> 
> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
> rcu_read_lock protection region to protect the access to cpuset only in
> current_cpuset_is_being_rebound(). So that we can avoid this bug.
> 
> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> ---
>  kernel/cpuset.c |    8 +++++++-
>  mm/mempolicy.c  |    2 --
>  2 files changed, 7 insertions(+), 3 deletions(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

</formletter>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-05 14:18   ` Greg KH
  0 siblings, 0 replies; 34+ messages in thread
From: Greg KH @ 2014-06-05 14:18 UTC (permalink / raw)
  To: Gu Zheng
  Cc: linux-kernel, Andrew Morton, Tejun Heo, linux-mm, Cgroups, stable

On Thu, Jun 05, 2014 at 04:28:52PM +0800, Gu Zheng wrote:
> When running with the kernel(3.15-rc7+), the follow bug occurs:
> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
> [ 9969.441175] INFO: lockdep is turned off.
> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
> [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
> [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
> [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
> [ 9969.974071] Call Trace:
> [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
> [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
> [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
> [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
> [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
> [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
> [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
> [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
> [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
> [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
> [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
> [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
> [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
> [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
> [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
> 
> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
> rcu_read_lock protection region to protect the access to cpuset only in
> current_cpuset_is_being_rebound(). So that we can avoid this bug.
> 
> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> ---
>  kernel/cpuset.c |    8 +++++++-
>  mm/mempolicy.c  |    2 --
>  2 files changed, 7 insertions(+), 3 deletions(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

</formletter>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
  2014-06-05  8:28 ` Gu Zheng
  (?)
@ 2014-06-05 20:23   ` Andrew Morton
  -1 siblings, 0 replies; 34+ messages in thread
From: Andrew Morton @ 2014-06-05 20:23 UTC (permalink / raw)
  To: Gu Zheng; +Cc: linux-kernel, Tejun Heo, linux-mm, Cgroups, stable

On Thu, 5 Jun 2014 16:28:52 +0800 Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:

> When running with the kernel(3.15-rc7+), the follow bug occurs:
> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
> [ 9969.441175] INFO: lockdep is turned off.
> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
> [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
> [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
> [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
> [ 9969.974071] Call Trace:
> [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
> [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
> [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
> [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
> [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
> [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
> [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
> [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
> [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
> [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
> [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
> [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
> [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
> [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
> [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
> 
> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
> rcu_read_lock protection region to protect the access to cpuset only in
> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>
> ...
>
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -1188,7 +1188,13 @@ done:
>  
>  int current_cpuset_is_being_rebound(void)
>  {
> -	return task_cs(current) == cpuset_being_rebound;
> +	int ret;
> +
> +	rcu_read_lock();
> +	ret = task_cs(current) == cpuset_being_rebound;
> +	rcu_read_unlock();
> +
> +	return ret;
>  }

Looks fishy to me.  If the rcu_read_lock() stabilizes
cpuset_being_rebound then cpuset_being_rebound can change immediately
after rcu_read_unlock() and `ret' is now wrong.

Anyway.  Tejun, this one is yours please ;)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-05 20:23   ` Andrew Morton
  0 siblings, 0 replies; 34+ messages in thread
From: Andrew Morton @ 2014-06-05 20:23 UTC (permalink / raw)
  To: Gu Zheng; +Cc: linux-kernel, Tejun Heo, linux-mm, Cgroups, stable

On Thu, 5 Jun 2014 16:28:52 +0800 Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:

> When running with the kernel(3.15-rc7+), the follow bug occurs:
> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
> [ 9969.441175] INFO: lockdep is turned off.
> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
> [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
> [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
> [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
> [ 9969.974071] Call Trace:
> [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
> [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
> [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
> [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
> [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
> [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
> [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
> [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
> [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
> [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
> [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
> [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
> [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
> [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
> [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
> 
> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
> rcu_read_lock protection region to protect the access to cpuset only in
> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>
> ...
>
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -1188,7 +1188,13 @@ done:
>  
>  int current_cpuset_is_being_rebound(void)
>  {
> -	return task_cs(current) == cpuset_being_rebound;
> +	int ret;
> +
> +	rcu_read_lock();
> +	ret = task_cs(current) == cpuset_being_rebound;
> +	rcu_read_unlock();
> +
> +	return ret;
>  }

Looks fishy to me.  If the rcu_read_lock() stabilizes
cpuset_being_rebound then cpuset_being_rebound can change immediately
after rcu_read_unlock() and `ret' is now wrong.

Anyway.  Tejun, this one is yours please ;)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-05 20:23   ` Andrew Morton
  0 siblings, 0 replies; 34+ messages in thread
From: Andrew Morton @ 2014-06-05 20:23 UTC (permalink / raw)
  To: Gu Zheng; +Cc: linux-kernel, Tejun Heo, linux-mm, Cgroups, stable

On Thu, 5 Jun 2014 16:28:52 +0800 Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:

> When running with the kernel(3.15-rc7+), the follow bug occurs:
> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
> [ 9969.441175] INFO: lockdep is turned off.
> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
> [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
> [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
> [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
> [ 9969.974071] Call Trace:
> [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
> [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
> [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
> [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
> [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
> [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
> [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
> [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
> [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
> [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
> [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
> [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
> [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
> [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
> [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
> 
> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
> rcu_read_lock protection region to protect the access to cpuset only in
> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>
> ...
>
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -1188,7 +1188,13 @@ done:
>  
>  int current_cpuset_is_being_rebound(void)
>  {
> -	return task_cs(current) == cpuset_being_rebound;
> +	int ret;
> +
> +	rcu_read_lock();
> +	ret = task_cs(current) == cpuset_being_rebound;
> +	rcu_read_unlock();
> +
> +	return ret;
>  }

Looks fishy to me.  If the rcu_read_lock() stabilizes
cpuset_being_rebound then cpuset_being_rebound can change immediately
after rcu_read_unlock() and `ret' is now wrong.

Anyway.  Tejun, this one is yours please ;)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
  2014-06-05 14:18   ` Greg KH
  (?)
@ 2014-06-06  9:34     ` Gu Zheng
  -1 siblings, 0 replies; 34+ messages in thread
From: Gu Zheng @ 2014-06-06  9:34 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, Andrew Morton, Tejun Heo, linux-mm, Cgroups, stable

Hi Greg,
On 06/05/2014 10:18 PM, Greg KH wrote:

> On Thu, Jun 05, 2014 at 04:28:52PM +0800, Gu Zheng wrote:
>> When running with the kernel(3.15-rc7+), the follow bug occurs:
>> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
>> [ 9969.441175] INFO: lockdep is turned off.
>> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
>> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
>> [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
>> [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
>> [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
>> [ 9969.974071] Call Trace:
>> [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
>> [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
>> [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
>> [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
>> [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
>> [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>> [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
>> [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>> [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
>> [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
>> [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
>> [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
>> [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
>> [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
>> [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
>> [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
>> [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
>>
>> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
>> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
>> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
>> rcu_read_lock protection region to protect the access to cpuset only in
>> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>>
>> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> ---
>>  kernel/cpuset.c |    8 +++++++-
>>  mm/mempolicy.c  |    2 --
>>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> <formletter>
> 
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
> for how to do this properly.

> 
> </formletter>

Thanks for your reminder.
I'll follow the right rules.

Regards,
Gu

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> .
> 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-06  9:34     ` Gu Zheng
  0 siblings, 0 replies; 34+ messages in thread
From: Gu Zheng @ 2014-06-06  9:34 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, Andrew Morton, Tejun Heo, linux-mm, Cgroups, stable

Hi Greg,
On 06/05/2014 10:18 PM, Greg KH wrote:

> On Thu, Jun 05, 2014 at 04:28:52PM +0800, Gu Zheng wrote:
>> When running with the kernel(3.15-rc7+), the follow bug occurs:
>> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
>> [ 9969.441175] INFO: lockdep is turned off.
>> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
>> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
>> [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
>> [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
>> [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
>> [ 9969.974071] Call Trace:
>> [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
>> [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
>> [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
>> [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
>> [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
>> [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>> [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
>> [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>> [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
>> [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
>> [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
>> [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
>> [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
>> [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
>> [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
>> [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
>> [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
>>
>> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
>> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
>> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
>> rcu_read_lock protection region to protect the access to cpuset only in
>> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>>
>> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> ---
>>  kernel/cpuset.c |    8 +++++++-
>>  mm/mempolicy.c  |    2 --
>>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> <formletter>
> 
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
> for how to do this properly.

> 
> </formletter>

Thanks for your reminder.
I'll follow the right rules.

Regards,
Gu

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> .
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-06  9:34     ` Gu Zheng
  0 siblings, 0 replies; 34+ messages in thread
From: Gu Zheng @ 2014-06-06  9:34 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, Andrew Morton, Tejun Heo,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Cgroups,
	stable-u79uwXL29TY76Z2rM5mHXA

Hi Greg,
On 06/05/2014 10:18 PM, Greg KH wrote:

> On Thu, Jun 05, 2014 at 04:28:52PM +0800, Gu Zheng wrote:
>> When running with the kernel(3.15-rc7+), the follow bug occurs:
>> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
>> [ 9969.441175] INFO: lockdep is turned off.
>> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
>> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
>> [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
>> [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
>> [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
>> [ 9969.974071] Call Trace:
>> [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
>> [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
>> [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
>> [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
>> [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
>> [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>> [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
>> [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>> [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
>> [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
>> [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
>> [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
>> [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
>> [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
>> [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
>> [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
>> [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
>>
>> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
>> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
>> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
>> rcu_read_lock protection region to protect the access to cpuset only in
>> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>>
>> Signed-off-by: Gu Zheng <guz.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
>> ---
>>  kernel/cpuset.c |    8 +++++++-
>>  mm/mempolicy.c  |    2 --
>>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> <formletter>
> 
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
> for how to do this properly.

> 
> </formletter>

Thanks for your reminder.
I'll follow the right rules.

Regards,
Gu

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a>
> .
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
  2014-06-05 20:23   ` Andrew Morton
@ 2014-06-06 10:07     ` Gu Zheng
  -1 siblings, 0 replies; 34+ messages in thread
From: Gu Zheng @ 2014-06-06 10:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Tejun Heo, linux-mm, Cgroups, stable, Li Zefan

(cc'ing Li)

Hi Andrew,
Thanks for your comment.
On 06/06/2014 04:23 AM, Andrew Morton wrote:

> On Thu, 5 Jun 2014 16:28:52 +0800 Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> 
>> When running with the kernel(3.15-rc7+), the follow bug occurs:
>> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
>> [ 9969.441175] INFO: lockdep is turned off.
>> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
>> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
>> [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
>> [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
>> [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
>> [ 9969.974071] Call Trace:
>> [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
>> [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
>> [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
>> [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
>> [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
>> [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>> [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
>> [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>> [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
>> [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
>> [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
>> [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
>> [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
>> [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
>> [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
>> [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
>> [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
>>
>> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
>> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
>> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
>> rcu_read_lock protection region to protect the access to cpuset only in
>> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>>
>> ...
>>
>> --- a/kernel/cpuset.c
>> +++ b/kernel/cpuset.c
>> @@ -1188,7 +1188,13 @@ done:
>>  
>>  int current_cpuset_is_being_rebound(void)
>>  {
>> -	return task_cs(current) == cpuset_being_rebound;
>> +	int ret;
>> +
>> +	rcu_read_lock();
>> +	ret = task_cs(current) == cpuset_being_rebound;
>> +	rcu_read_unlock();
>> +
>> +	return ret;
>>  }
> 
> Looks fishy to me.  If the rcu_read_lock() stabilizes
> cpuset_being_rebound then cpuset_being_rebound can change immediately
> after rcu_read_unlock() and `ret' is now wrong.

IMO, whether cpuset_being_rebound changed or not is immaterial here, we
just want to know whether the cpuset is being rebound at that point.

> 
> Anyway.  Tejun, this one is yours please ;)

To Tejun, Li:
Any comment? And if I misread something, please correct me?

Thanks,
Gu

> .
> 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-06 10:07     ` Gu Zheng
  0 siblings, 0 replies; 34+ messages in thread
From: Gu Zheng @ 2014-06-06 10:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Tejun Heo, linux-mm, Cgroups, stable, Li Zefan

(cc'ing Li)

Hi Andrew,
Thanks for your comment.
On 06/06/2014 04:23 AM, Andrew Morton wrote:

> On Thu, 5 Jun 2014 16:28:52 +0800 Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> 
>> When running with the kernel(3.15-rc7+), the follow bug occurs:
>> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
>> [ 9969.441175] INFO: lockdep is turned off.
>> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
>> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
>> [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
>> [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
>> [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
>> [ 9969.974071] Call Trace:
>> [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
>> [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
>> [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
>> [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
>> [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
>> [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>> [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
>> [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>> [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
>> [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
>> [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
>> [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
>> [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
>> [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
>> [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
>> [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
>> [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
>>
>> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
>> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
>> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
>> rcu_read_lock protection region to protect the access to cpuset only in
>> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>>
>> ...
>>
>> --- a/kernel/cpuset.c
>> +++ b/kernel/cpuset.c
>> @@ -1188,7 +1188,13 @@ done:
>>  
>>  int current_cpuset_is_being_rebound(void)
>>  {
>> -	return task_cs(current) == cpuset_being_rebound;
>> +	int ret;
>> +
>> +	rcu_read_lock();
>> +	ret = task_cs(current) == cpuset_being_rebound;
>> +	rcu_read_unlock();
>> +
>> +	return ret;
>>  }
> 
> Looks fishy to me.  If the rcu_read_lock() stabilizes
> cpuset_being_rebound then cpuset_being_rebound can change immediately
> after rcu_read_unlock() and `ret' is now wrong.

IMO, whether cpuset_being_rebound changed or not is immaterial here, we
just want to know whether the cpuset is being rebound at that point.

> 
> Anyway.  Tejun, this one is yours please ;)

To Tejun, Li:
Any comment? And if I misread something, please correct me?

Thanks,
Gu

> .
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
  2014-06-06 10:07     ` Gu Zheng
@ 2014-06-08 22:47       ` David Rientjes
  -1 siblings, 0 replies; 34+ messages in thread
From: David Rientjes @ 2014-06-08 22:47 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Andrew Morton, linux-kernel, Tejun Heo, linux-mm, Cgroups,
	stable, Li Zefan

On Fri, 6 Jun 2014, Gu Zheng wrote:

> >> When running with the kernel(3.15-rc7+), the follow bug occurs:
> >> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
> >> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
> >> [ 9969.441175] INFO: lockdep is turned off.
> >> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
> >> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
> >> [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
> >> [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
> >> [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
> >> [ 9969.974071] Call Trace:
> >> [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
> >> [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
> >> [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
> >> [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
> >> [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
> >> [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> >> [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
> >> [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> >> [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
> >> [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
> >> [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
> >> [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
> >> [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
> >> [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
> >> [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
> >> [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
> >> [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
> >>
> >> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
> >> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
> >> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
> >> rcu_read_lock protection region to protect the access to cpuset only in
> >> current_cpuset_is_being_rebound(). So that we can avoid this bug.
> >>
> >> ...
> >>
> >> --- a/kernel/cpuset.c
> >> +++ b/kernel/cpuset.c
> >> @@ -1188,7 +1188,13 @@ done:
> >>  
> >>  int current_cpuset_is_being_rebound(void)
> >>  {
> >> -	return task_cs(current) == cpuset_being_rebound;
> >> +	int ret;
> >> +
> >> +	rcu_read_lock();
> >> +	ret = task_cs(current) == cpuset_being_rebound;
> >> +	rcu_read_unlock();
> >> +
> >> +	return ret;
> >>  }
> > 
> > Looks fishy to me.  If the rcu_read_lock() stabilizes
> > cpuset_being_rebound then cpuset_being_rebound can change immediately
> > after rcu_read_unlock() and `ret' is now wrong.
> 
> IMO, whether cpuset_being_rebound changed or not is immaterial here, we
> just want to know whether the cpuset is being rebound at that point.
> 

I think your patch addresses the problem that you're reporting but misses 
the larger problem with cpuset.mems rebinding on fork().  When the 
forker's task_struct is duplicated (which includes ->mems_allowed) and it 
races with an update to cpuset_being_rebound in update_tasks_nodemask() 
then the task's mems_allowed doesn't get updated.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-08 22:47       ` David Rientjes
  0 siblings, 0 replies; 34+ messages in thread
From: David Rientjes @ 2014-06-08 22:47 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Andrew Morton, linux-kernel, Tejun Heo, linux-mm, Cgroups,
	stable, Li Zefan

On Fri, 6 Jun 2014, Gu Zheng wrote:

> >> When running with the kernel(3.15-rc7+), the follow bug occurs:
> >> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
> >> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
> >> [ 9969.441175] INFO: lockdep is turned off.
> >> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
> >> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
> >> [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
> >> [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
> >> [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
> >> [ 9969.974071] Call Trace:
> >> [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
> >> [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
> >> [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
> >> [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
> >> [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
> >> [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> >> [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
> >> [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
> >> [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
> >> [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
> >> [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
> >> [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
> >> [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
> >> [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
> >> [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
> >> [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
> >> [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
> >>
> >> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
> >> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
> >> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
> >> rcu_read_lock protection region to protect the access to cpuset only in
> >> current_cpuset_is_being_rebound(). So that we can avoid this bug.
> >>
> >> ...
> >>
> >> --- a/kernel/cpuset.c
> >> +++ b/kernel/cpuset.c
> >> @@ -1188,7 +1188,13 @@ done:
> >>  
> >>  int current_cpuset_is_being_rebound(void)
> >>  {
> >> -	return task_cs(current) == cpuset_being_rebound;
> >> +	int ret;
> >> +
> >> +	rcu_read_lock();
> >> +	ret = task_cs(current) == cpuset_being_rebound;
> >> +	rcu_read_unlock();
> >> +
> >> +	return ret;
> >>  }
> > 
> > Looks fishy to me.  If the rcu_read_lock() stabilizes
> > cpuset_being_rebound then cpuset_being_rebound can change immediately
> > after rcu_read_unlock() and `ret' is now wrong.
> 
> IMO, whether cpuset_being_rebound changed or not is immaterial here, we
> just want to know whether the cpuset is being rebound at that point.
> 

I think your patch addresses the problem that you're reporting but misses 
the larger problem with cpuset.mems rebinding on fork().  When the 
forker's task_struct is duplicated (which includes ->mems_allowed) and it 
races with an update to cpuset_being_rebound in update_tasks_nodemask() 
then the task's mems_allowed doesn't get updated.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
  2014-06-08 22:47       ` David Rientjes
@ 2014-06-09  8:48         ` Gu Zheng
  -1 siblings, 0 replies; 34+ messages in thread
From: Gu Zheng @ 2014-06-09  8:48 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, linux-kernel, Tejun Heo, linux-mm, Cgroups,
	stable, Li Zefan

Hi David,

On 06/09/2014 06:47 AM, David Rientjes wrote:

> On Fri, 6 Jun 2014, Gu Zheng wrote:
> 
>>>> When running with the kernel(3.15-rc7+), the follow bug occurs:
>>>> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>>>> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
>>>> [ 9969.441175] INFO: lockdep is turned off.
>>>> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
>>>> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
>>>> [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
>>>> [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
>>>> [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
>>>> [ 9969.974071] Call Trace:
>>>> [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
>>>> [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
>>>> [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
>>>> [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
>>>> [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
>>>> [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>>>> [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
>>>> [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>>>> [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
>>>> [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
>>>> [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
>>>> [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
>>>> [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
>>>> [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
>>>> [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
>>>> [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
>>>> [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
>>>>
>>>> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
>>>> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
>>>> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
>>>> rcu_read_lock protection region to protect the access to cpuset only in
>>>> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>>>>
>>>> ...
>>>>
>>>> --- a/kernel/cpuset.c
>>>> +++ b/kernel/cpuset.c
>>>> @@ -1188,7 +1188,13 @@ done:
>>>>  
>>>>  int current_cpuset_is_being_rebound(void)
>>>>  {
>>>> -	return task_cs(current) == cpuset_being_rebound;
>>>> +	int ret;
>>>> +
>>>> +	rcu_read_lock();
>>>> +	ret = task_cs(current) == cpuset_being_rebound;
>>>> +	rcu_read_unlock();
>>>> +
>>>> +	return ret;
>>>>  }
>>>
>>> Looks fishy to me.  If the rcu_read_lock() stabilizes
>>> cpuset_being_rebound then cpuset_being_rebound can change immediately
>>> after rcu_read_unlock() and `ret' is now wrong.
>>
>> IMO, whether cpuset_being_rebound changed or not is immaterial here, we
>> just want to know whether the cpuset is being rebound at that point.
>>
> 
> I think your patch addresses the problem that you're reporting but misses 
> the larger problem with cpuset.mems rebinding on fork().  When the 
> forker's task_struct is duplicated (which includes ->mems_allowed) and it 
> races with an update to cpuset_being_rebound in update_tasks_nodemask() 
> then the task's mems_allowed doesn't get updated.

Yes, you are right, this patch just wants to address the bug reported above.
The race condition you mentioned above inherently exists there, but it is yet
another issue, the rcu lock here makes no sense to it, and I think we need
additional sync-mechanisms if want to fix it.
But thinking more, though the current implementation has flaw, but I worry
about the negative effect if we really want to fix it. Or maybe the fear
is unnecessary.:) 

Thanks,
Gu 

> .
> 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-09  8:48         ` Gu Zheng
  0 siblings, 0 replies; 34+ messages in thread
From: Gu Zheng @ 2014-06-09  8:48 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, linux-kernel, Tejun Heo, linux-mm, Cgroups,
	stable, Li Zefan

Hi David,

On 06/09/2014 06:47 AM, David Rientjes wrote:

> On Fri, 6 Jun 2014, Gu Zheng wrote:
> 
>>>> When running with the kernel(3.15-rc7+), the follow bug occurs:
>>>> [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>>>> [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
>>>> [ 9969.441175] INFO: lockdep is turned off.
>>>> [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
>>>> [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
>>>> [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
>>>> [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
>>>> [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
>>>> [ 9969.974071] Call Trace:
>>>> [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
>>>> [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
>>>> [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
>>>> [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
>>>> [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
>>>> [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>>>> [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
>>>> [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
>>>> [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
>>>> [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
>>>> [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
>>>> [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
>>>> [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
>>>> [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
>>>> [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
>>>> [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
>>>> [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
>>>>
>>>> The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex)
>>>> under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(),
>>>> the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the
>>>> rcu_read_lock protection region to protect the access to cpuset only in
>>>> current_cpuset_is_being_rebound(). So that we can avoid this bug.
>>>>
>>>> ...
>>>>
>>>> --- a/kernel/cpuset.c
>>>> +++ b/kernel/cpuset.c
>>>> @@ -1188,7 +1188,13 @@ done:
>>>>  
>>>>  int current_cpuset_is_being_rebound(void)
>>>>  {
>>>> -	return task_cs(current) == cpuset_being_rebound;
>>>> +	int ret;
>>>> +
>>>> +	rcu_read_lock();
>>>> +	ret = task_cs(current) == cpuset_being_rebound;
>>>> +	rcu_read_unlock();
>>>> +
>>>> +	return ret;
>>>>  }
>>>
>>> Looks fishy to me.  If the rcu_read_lock() stabilizes
>>> cpuset_being_rebound then cpuset_being_rebound can change immediately
>>> after rcu_read_unlock() and `ret' is now wrong.
>>
>> IMO, whether cpuset_being_rebound changed or not is immaterial here, we
>> just want to know whether the cpuset is being rebound at that point.
>>
> 
> I think your patch addresses the problem that you're reporting but misses 
> the larger problem with cpuset.mems rebinding on fork().  When the 
> forker's task_struct is duplicated (which includes ->mems_allowed) and it 
> races with an update to cpuset_being_rebound in update_tasks_nodemask() 
> then the task's mems_allowed doesn't get updated.

Yes, you are right, this patch just wants to address the bug reported above.
The race condition you mentioned above inherently exists there, but it is yet
another issue, the rcu lock here makes no sense to it, and I think we need
additional sync-mechanisms if want to fix it.
But thinking more, though the current implementation has flaw, but I worry
about the negative effect if we really want to fix it. Or maybe the fear
is unnecessary.:) 

Thanks,
Gu 

> .
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
  2014-06-09  8:48         ` Gu Zheng
@ 2014-06-09  9:13           ` David Rientjes
  -1 siblings, 0 replies; 34+ messages in thread
From: David Rientjes @ 2014-06-09  9:13 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Andrew Morton, linux-kernel, Tejun Heo, linux-mm, Cgroups,
	stable, Li Zefan

On Mon, 9 Jun 2014, Gu Zheng wrote:

> > I think your patch addresses the problem that you're reporting but misses 
> > the larger problem with cpuset.mems rebinding on fork().  When the 
> > forker's task_struct is duplicated (which includes ->mems_allowed) and it 
> > races with an update to cpuset_being_rebound in update_tasks_nodemask() 
> > then the task's mems_allowed doesn't get updated.
> 
> Yes, you are right, this patch just wants to address the bug reported above.
> The race condition you mentioned above inherently exists there, but it is yet
> another issue, the rcu lock here makes no sense to it, and I think we need
> additional sync-mechanisms if want to fix it.

Yes, the rcu lock is not providing protection for any critical section 
here that requires (1) the forker's cpuset to be stored in 
cpuset_being_rebound or (2) the forked thread's cpuset to be rebound by 
the cpuset nodemask update, and no race involving the two.

> But thinking more, though the current implementation has flaw, but I worry
> about the negative effect if we really want to fix it. Or maybe the fear
> is unnecessary.:) 
> 

It needs to be slightly rewritten to work properly without negatively 
impacting the latency of fork().  Do you have the cycles to do it?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-09  9:13           ` David Rientjes
  0 siblings, 0 replies; 34+ messages in thread
From: David Rientjes @ 2014-06-09  9:13 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Andrew Morton, linux-kernel, Tejun Heo, linux-mm, Cgroups,
	stable, Li Zefan

On Mon, 9 Jun 2014, Gu Zheng wrote:

> > I think your patch addresses the problem that you're reporting but misses 
> > the larger problem with cpuset.mems rebinding on fork().  When the 
> > forker's task_struct is duplicated (which includes ->mems_allowed) and it 
> > races with an update to cpuset_being_rebound in update_tasks_nodemask() 
> > then the task's mems_allowed doesn't get updated.
> 
> Yes, you are right, this patch just wants to address the bug reported above.
> The race condition you mentioned above inherently exists there, but it is yet
> another issue, the rcu lock here makes no sense to it, and I think we need
> additional sync-mechanisms if want to fix it.

Yes, the rcu lock is not providing protection for any critical section 
here that requires (1) the forker's cpuset to be stored in 
cpuset_being_rebound or (2) the forked thread's cpuset to be rebound by 
the cpuset nodemask update, and no race involving the two.

> But thinking more, though the current implementation has flaw, but I worry
> about the negative effect if we really want to fix it. Or maybe the fear
> is unnecessary.:) 
> 

It needs to be slightly rewritten to work properly without negatively 
impacting the latency of fork().  Do you have the cycles to do it?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
  2014-06-09  9:13           ` David Rientjes
@ 2014-06-09  9:58             ` Gu Zheng
  -1 siblings, 0 replies; 34+ messages in thread
From: Gu Zheng @ 2014-06-09  9:58 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, linux-kernel, Tejun Heo, linux-mm, Cgroups,
	stable, Li Zefan

Hi David,

On 06/09/2014 05:13 PM, David Rientjes wrote:

> On Mon, 9 Jun 2014, Gu Zheng wrote:
> 
>>> I think your patch addresses the problem that you're reporting but misses 
>>> the larger problem with cpuset.mems rebinding on fork().  When the 
>>> forker's task_struct is duplicated (which includes ->mems_allowed) and it 
>>> races with an update to cpuset_being_rebound in update_tasks_nodemask() 
>>> then the task's mems_allowed doesn't get updated.
>>
>> Yes, you are right, this patch just wants to address the bug reported above.
>> The race condition you mentioned above inherently exists there, but it is yet
>> another issue, the rcu lock here makes no sense to it, and I think we need
>> additional sync-mechanisms if want to fix it.
> 
> Yes, the rcu lock is not providing protection for any critical section 
> here that requires (1) the forker's cpuset to be stored in 
> cpuset_being_rebound or (2) the forked thread's cpuset to be rebound by 
> the cpuset nodemask update, and no race involving the two.
> 
>> But thinking more, though the current implementation has flaw, but I worry
>> about the negative effect if we really want to fix it. Or maybe the fear
>> is unnecessary.:) 
>>
> 
> It needs to be slightly rewritten to work properly without negatively 
> impacting the latency of fork().  Do you have the cycles to do it?
> 

To be honest, I'm busy with other schedule. And if you(or other
guys) have proper proposal, please go ahead.

To Tejun, Li and Andrew:
Any comment? Or could you apply this *bug fix* first?

Regards,
Gu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-09  9:58             ` Gu Zheng
  0 siblings, 0 replies; 34+ messages in thread
From: Gu Zheng @ 2014-06-09  9:58 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, linux-kernel, Tejun Heo, linux-mm, Cgroups,
	stable, Li Zefan

Hi David,

On 06/09/2014 05:13 PM, David Rientjes wrote:

> On Mon, 9 Jun 2014, Gu Zheng wrote:
> 
>>> I think your patch addresses the problem that you're reporting but misses 
>>> the larger problem with cpuset.mems rebinding on fork().  When the 
>>> forker's task_struct is duplicated (which includes ->mems_allowed) and it 
>>> races with an update to cpuset_being_rebound in update_tasks_nodemask() 
>>> then the task's mems_allowed doesn't get updated.
>>
>> Yes, you are right, this patch just wants to address the bug reported above.
>> The race condition you mentioned above inherently exists there, but it is yet
>> another issue, the rcu lock here makes no sense to it, and I think we need
>> additional sync-mechanisms if want to fix it.
> 
> Yes, the rcu lock is not providing protection for any critical section 
> here that requires (1) the forker's cpuset to be stored in 
> cpuset_being_rebound or (2) the forked thread's cpuset to be rebound by 
> the cpuset nodemask update, and no race involving the two.
> 
>> But thinking more, though the current implementation has flaw, but I worry
>> about the negative effect if we really want to fix it. Or maybe the fear
>> is unnecessary.:) 
>>
> 
> It needs to be slightly rewritten to work properly without negatively 
> impacting the latency of fork().  Do you have the cycles to do it?
> 

To be honest, I'm busy with other schedule. And if you(or other
guys) have proper proposal, please go ahead.

To Tejun, Li and Andrew:
Any comment? Or could you apply this *bug fix* first?

Regards,
Gu

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
  2014-06-09  9:13           ` David Rientjes
@ 2014-06-10  2:58             ` Li Zefan
  -1 siblings, 0 replies; 34+ messages in thread
From: Li Zefan @ 2014-06-10  2:58 UTC (permalink / raw)
  To: David Rientjes
  Cc: Gu Zheng, Andrew Morton, linux-kernel, Tejun Heo, linux-mm,
	Cgroups, stable

On 2014/6/9 17:13, David Rientjes wrote:
> On Mon, 9 Jun 2014, Gu Zheng wrote:
> 
>>> I think your patch addresses the problem that you're reporting but misses 
>>> the larger problem with cpuset.mems rebinding on fork().  When the 
>>> forker's task_struct is duplicated (which includes ->mems_allowed) and it 
>>> races with an update to cpuset_being_rebound in update_tasks_nodemask() 
>>> then the task's mems_allowed doesn't get updated.
>>
>> Yes, you are right, this patch just wants to address the bug reported above.
>> The race condition you mentioned above inherently exists there, but it is yet
>> another issue, the rcu lock here makes no sense to it, and I think we need
>> additional sync-mechanisms if want to fix it.
> 
> Yes, the rcu lock is not providing protection for any critical section 
> here that requires (1) the forker's cpuset to be stored in 
> cpuset_being_rebound or (2) the forked thread's cpuset to be rebound by 
> the cpuset nodemask update, and no race involving the two.
>

Yes, this is a long-standing issue. Besides the race you described, the child
task's mems_allowed can be wrong if the cpuset's nodemask changes before the
child has been added to the cgroup's tasklist.

I remember Tejun once said he wanted to disallow task migration between
cgroups during fork, and that should fix this problem.
 
>> But thinking more, though the current implementation has flaw, but I worry
>> about the negative effect if we really want to fix it. Or maybe the fear
>> is unnecessary.:) 
>>
> 
> It needs to be slightly rewritten to work properly without negatively 
> impacting the latency of fork().  Do you have the cycles to do it?
> 

Sounds you have other idea?


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-10  2:58             ` Li Zefan
  0 siblings, 0 replies; 34+ messages in thread
From: Li Zefan @ 2014-06-10  2:58 UTC (permalink / raw)
  To: David Rientjes
  Cc: Gu Zheng, Andrew Morton, linux-kernel, Tejun Heo, linux-mm,
	Cgroups, stable

On 2014/6/9 17:13, David Rientjes wrote:
> On Mon, 9 Jun 2014, Gu Zheng wrote:
> 
>>> I think your patch addresses the problem that you're reporting but misses 
>>> the larger problem with cpuset.mems rebinding on fork().  When the 
>>> forker's task_struct is duplicated (which includes ->mems_allowed) and it 
>>> races with an update to cpuset_being_rebound in update_tasks_nodemask() 
>>> then the task's mems_allowed doesn't get updated.
>>
>> Yes, you are right, this patch just wants to address the bug reported above.
>> The race condition you mentioned above inherently exists there, but it is yet
>> another issue, the rcu lock here makes no sense to it, and I think we need
>> additional sync-mechanisms if want to fix it.
> 
> Yes, the rcu lock is not providing protection for any critical section 
> here that requires (1) the forker's cpuset to be stored in 
> cpuset_being_rebound or (2) the forked thread's cpuset to be rebound by 
> the cpuset nodemask update, and no race involving the two.
>

Yes, this is a long-standing issue. Besides the race you described, the child
task's mems_allowed can be wrong if the cpuset's nodemask changes before the
child has been added to the cgroup's tasklist.

I remember Tejun once said he wanted to disallow task migration between
cgroups during fork, and that should fix this problem.
 
>> But thinking more, though the current implementation has flaw, but I worry
>> about the negative effect if we really want to fix it. Or maybe the fear
>> is unnecessary.:) 
>>
> 
> It needs to be slightly rewritten to work properly without negatively 
> impacting the latency of fork().  Do you have the cycles to do it?
> 

Sounds you have other idea?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
  2014-06-10  2:58             ` Li Zefan
  (?)
@ 2014-06-10 22:16               ` David Rientjes
  -1 siblings, 0 replies; 34+ messages in thread
From: David Rientjes @ 2014-06-10 22:16 UTC (permalink / raw)
  To: Li Zefan
  Cc: Gu Zheng, Andrew Morton, linux-kernel, Tejun Heo, linux-mm,
	Cgroups, stable

On Tue, 10 Jun 2014, Li Zefan wrote:

> > Yes, the rcu lock is not providing protection for any critical section 
> > here that requires (1) the forker's cpuset to be stored in 
> > cpuset_being_rebound or (2) the forked thread's cpuset to be rebound by 
> > the cpuset nodemask update, and no race involving the two.
> >
> 
> Yes, this is a long-standing issue. Besides the race you described, the child
> task's mems_allowed can be wrong if the cpuset's nodemask changes before the
> child has been added to the cgroup's tasklist.
> 
> I remember Tejun once said he wanted to disallow task migration between
> cgroups during fork, and that should fix this problem.
>  

Ok, I don't want to fix it in cpusets if cgroups will eventually prevent 
it, so I need an understanding of the long term plan.  Will cgroups 
continue to allow migration during fork(), Tejun?

> > It needs to be slightly rewritten to work properly without negatively 
> > impacting the latency of fork().  Do you have the cycles to do it?
> > 
> 
> Sounds you have other idea?
> 

It wouldn't be too difficult with a cgroup post fork callback into the 
cpuset code to rebind the nodemask if it has changed, but with my above 
concern those might be yanked out eventually :)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-10 22:16               ` David Rientjes
  0 siblings, 0 replies; 34+ messages in thread
From: David Rientjes @ 2014-06-10 22:16 UTC (permalink / raw)
  To: Li Zefan
  Cc: Gu Zheng, Andrew Morton, linux-kernel, Tejun Heo, linux-mm,
	Cgroups, stable

On Tue, 10 Jun 2014, Li Zefan wrote:

> > Yes, the rcu lock is not providing protection for any critical section 
> > here that requires (1) the forker's cpuset to be stored in 
> > cpuset_being_rebound or (2) the forked thread's cpuset to be rebound by 
> > the cpuset nodemask update, and no race involving the two.
> >
> 
> Yes, this is a long-standing issue. Besides the race you described, the child
> task's mems_allowed can be wrong if the cpuset's nodemask changes before the
> child has been added to the cgroup's tasklist.
> 
> I remember Tejun once said he wanted to disallow task migration between
> cgroups during fork, and that should fix this problem.
>  

Ok, I don't want to fix it in cpusets if cgroups will eventually prevent 
it, so I need an understanding of the long term plan.  Will cgroups 
continue to allow migration during fork(), Tejun?

> > It needs to be slightly rewritten to work properly without negatively 
> > impacting the latency of fork().  Do you have the cycles to do it?
> > 
> 
> Sounds you have other idea?
> 

It wouldn't be too difficult with a cgroup post fork callback into the 
cpuset code to rebind the nodemask if it has changed, but with my above 
concern those might be yanked out eventually :)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-10 22:16               ` David Rientjes
  0 siblings, 0 replies; 34+ messages in thread
From: David Rientjes @ 2014-06-10 22:16 UTC (permalink / raw)
  To: Li Zefan
  Cc: Gu Zheng, Andrew Morton, linux-kernel, Tejun Heo,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Cgroups,
	stable-u79uwXL29TY76Z2rM5mHXA

On Tue, 10 Jun 2014, Li Zefan wrote:

> > Yes, the rcu lock is not providing protection for any critical section 
> > here that requires (1) the forker's cpuset to be stored in 
> > cpuset_being_rebound or (2) the forked thread's cpuset to be rebound by 
> > the cpuset nodemask update, and no race involving the two.
> >
> 
> Yes, this is a long-standing issue. Besides the race you described, the child
> task's mems_allowed can be wrong if the cpuset's nodemask changes before the
> child has been added to the cgroup's tasklist.
> 
> I remember Tejun once said he wanted to disallow task migration between
> cgroups during fork, and that should fix this problem.
>  

Ok, I don't want to fix it in cpusets if cgroups will eventually prevent 
it, so I need an understanding of the long term plan.  Will cgroups 
continue to allow migration during fork(), Tejun?

> > It needs to be slightly rewritten to work properly without negatively 
> > impacting the latency of fork().  Do you have the cycles to do it?
> > 
> 
> Sounds you have other idea?
> 

It wouldn't be too difficult with a cgroup post fork callback into the 
cpuset code to rebind the nodemask if it has changed, but with my above 
concern those might be yanked out eventually :)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
  2014-06-10  2:58             ` Li Zefan
@ 2014-06-20 21:01               ` Tejun Heo
  -1 siblings, 0 replies; 34+ messages in thread
From: Tejun Heo @ 2014-06-20 21:01 UTC (permalink / raw)
  To: Li Zefan
  Cc: David Rientjes, Gu Zheng, Andrew Morton, linux-kernel, linux-mm,
	Cgroups, stable

Hello, Li.

Sorry about the long delay.

On Tue, Jun 10, 2014 at 10:58:45AM +0800, Li Zefan wrote:
> Yes, this is a long-standing issue. Besides the race you described, the child
> task's mems_allowed can be wrong if the cpuset's nodemask changes before the
> child has been added to the cgroup's tasklist.
> 
> I remember Tejun once said he wanted to disallow task migration between
> cgroups during fork, and that should fix this problem.

I'm having trouble remembering but yeah enforcing stricter behavior
across fork could be beneficial.  Hmmm... the problem with making
forks exclusive against migrations is that we'll end up adding more
locking to the fork path which isn't too nice.

Hmmm... other controllers (cgroup_freezer) can reliably synchronize
the child's state to the cgroup it belongs to.  Why can't cpuset?  Is
there something fundamentally missing in the cgroup API?

> > It needs to be slightly rewritten to work properly without negatively 
> > impacting the latency of fork().  Do you have the cycles to do it?
> > 
> 
> Sounds you have other idea?

I don't think the suggested patch breaks anything more than it was
broken before and we should probably apply it for the time being.  Li?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-20 21:01               ` Tejun Heo
  0 siblings, 0 replies; 34+ messages in thread
From: Tejun Heo @ 2014-06-20 21:01 UTC (permalink / raw)
  To: Li Zefan
  Cc: David Rientjes, Gu Zheng, Andrew Morton, linux-kernel, linux-mm,
	Cgroups, stable

Hello, Li.

Sorry about the long delay.

On Tue, Jun 10, 2014 at 10:58:45AM +0800, Li Zefan wrote:
> Yes, this is a long-standing issue. Besides the race you described, the child
> task's mems_allowed can be wrong if the cpuset's nodemask changes before the
> child has been added to the cgroup's tasklist.
> 
> I remember Tejun once said he wanted to disallow task migration between
> cgroups during fork, and that should fix this problem.

I'm having trouble remembering but yeah enforcing stricter behavior
across fork could be beneficial.  Hmmm... the problem with making
forks exclusive against migrations is that we'll end up adding more
locking to the fork path which isn't too nice.

Hmmm... other controllers (cgroup_freezer) can reliably synchronize
the child's state to the cgroup it belongs to.  Why can't cpuset?  Is
there something fundamentally missing in the cgroup API?

> > It needs to be slightly rewritten to work properly without negatively 
> > impacting the latency of fork().  Do you have the cycles to do it?
> > 
> 
> Sounds you have other idea?

I don't think the suggested patch breaks anything more than it was
broken before and we should probably apply it for the time being.  Li?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
  2014-06-20 21:01               ` Tejun Heo
  (?)
@ 2014-06-24  2:28                 ` Li Zefan
  -1 siblings, 0 replies; 34+ messages in thread
From: Li Zefan @ 2014-06-24  2:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Rientjes, Gu Zheng, Andrew Morton, linux-kernel, linux-mm,
	Cgroups, stable

On 2014/6/21 5:01, Tejun Heo wrote:
> Hello, Li.
> 
> Sorry about the long delay.
> 
> On Tue, Jun 10, 2014 at 10:58:45AM +0800, Li Zefan wrote:
>> Yes, this is a long-standing issue. Besides the race you described, the child
>> task's mems_allowed can be wrong if the cpuset's nodemask changes before the
>> child has been added to the cgroup's tasklist.
>>
>> I remember Tejun once said he wanted to disallow task migration between
>> cgroups during fork, and that should fix this problem.
> 
> I'm having trouble remembering but yeah enforcing stricter behavior
> across fork could be beneficial.  Hmmm... the problem with making
> forks exclusive against migrations is that we'll end up adding more
> locking to the fork path which isn't too nice.
> 
> Hmmm... other controllers (cgroup_freezer) can reliably synchronize
> the child's state to the cgroup it belongs to.  Why can't cpuset?  Is
> there something fundamentally missing in the cgroup API?
> 

cgroup_freezer uses the fork callback. We can also do this for cpuset as
suggested by David, which adds a little bit overhead to the fork path.

David, care to send out a patch?

>>> It needs to be slightly rewritten to work properly without negatively 
>>> impacting the latency of fork().  Do you have the cycles to do it?
>>>
>>
>> Sounds you have other idea?
> 
> I don't think the suggested patch breaks anything more than it was
> broken before and we should probably apply it for the time being.  Li?
> 

Yeah, we should apply Gu Zheng's patch any way.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-24  2:28                 ` Li Zefan
  0 siblings, 0 replies; 34+ messages in thread
From: Li Zefan @ 2014-06-24  2:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Rientjes, Gu Zheng, Andrew Morton, linux-kernel, linux-mm,
	Cgroups, stable

On 2014/6/21 5:01, Tejun Heo wrote:
> Hello, Li.
> 
> Sorry about the long delay.
> 
> On Tue, Jun 10, 2014 at 10:58:45AM +0800, Li Zefan wrote:
>> Yes, this is a long-standing issue. Besides the race you described, the child
>> task's mems_allowed can be wrong if the cpuset's nodemask changes before the
>> child has been added to the cgroup's tasklist.
>>
>> I remember Tejun once said he wanted to disallow task migration between
>> cgroups during fork, and that should fix this problem.
> 
> I'm having trouble remembering but yeah enforcing stricter behavior
> across fork could be beneficial.  Hmmm... the problem with making
> forks exclusive against migrations is that we'll end up adding more
> locking to the fork path which isn't too nice.
> 
> Hmmm... other controllers (cgroup_freezer) can reliably synchronize
> the child's state to the cgroup it belongs to.  Why can't cpuset?  Is
> there something fundamentally missing in the cgroup API?
> 

cgroup_freezer uses the fork callback. We can also do this for cpuset as
suggested by David, which adds a little bit overhead to the fork path.

David, care to send out a patch?

>>> It needs to be slightly rewritten to work properly without negatively 
>>> impacting the latency of fork().  Do you have the cycles to do it?
>>>
>>
>> Sounds you have other idea?
> 
> I don't think the suggested patch breaks anything more than it was
> broken before and we should probably apply it for the time being.  Li?
> 

Yeah, we should apply Gu Zheng's patch any way.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-24  2:28                 ` Li Zefan
  0 siblings, 0 replies; 34+ messages in thread
From: Li Zefan @ 2014-06-24  2:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Rientjes, Gu Zheng, Andrew Morton, linux-kernel, linux-mm,
	Cgroups, stable

On 2014/6/21 5:01, Tejun Heo wrote:
> Hello, Li.
> 
> Sorry about the long delay.
> 
> On Tue, Jun 10, 2014 at 10:58:45AM +0800, Li Zefan wrote:
>> Yes, this is a long-standing issue. Besides the race you described, the child
>> task's mems_allowed can be wrong if the cpuset's nodemask changes before the
>> child has been added to the cgroup's tasklist.
>>
>> I remember Tejun once said he wanted to disallow task migration between
>> cgroups during fork, and that should fix this problem.
> 
> I'm having trouble remembering but yeah enforcing stricter behavior
> across fork could be beneficial.  Hmmm... the problem with making
> forks exclusive against migrations is that we'll end up adding more
> locking to the fork path which isn't too nice.
> 
> Hmmm... other controllers (cgroup_freezer) can reliably synchronize
> the child's state to the cgroup it belongs to.  Why can't cpuset?  Is
> there something fundamentally missing in the cgroup API?
> 

cgroup_freezer uses the fork callback. We can also do this for cpuset as
suggested by David, which adds a little bit overhead to the fork path.

David, care to send out a patch?

>>> It needs to be slightly rewritten to work properly without negatively 
>>> impacting the latency of fork().  Do you have the cycles to do it?
>>>
>>
>> Sounds you have other idea?
> 
> I don't think the suggested patch breaks anything more than it was
> broken before and we should probably apply it for the time being.  Li?
> 

Yeah, we should apply Gu Zheng's patch any way.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
  2014-06-24  2:28                 ` Li Zefan
@ 2014-06-24 20:58                   ` Tejun Heo
  -1 siblings, 0 replies; 34+ messages in thread
From: Tejun Heo @ 2014-06-24 20:58 UTC (permalink / raw)
  To: Li Zefan
  Cc: David Rientjes, Gu Zheng, Andrew Morton, linux-kernel, linux-mm,
	Cgroups, stable

Hello,

On Tue, Jun 24, 2014 at 10:28:12AM +0800, Li Zefan wrote:
> > I don't think the suggested patch breaks anything more than it was
> > broken before and we should probably apply it for the time being.  Li?
> 
> Yeah, we should apply Gu Zheng's patch any way.

Gu Zheng, can you please respin the patch with updated explanation on
the temporary nature of the change.  I'll apply it once Li acks it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-24 20:58                   ` Tejun Heo
  0 siblings, 0 replies; 34+ messages in thread
From: Tejun Heo @ 2014-06-24 20:58 UTC (permalink / raw)
  To: Li Zefan
  Cc: David Rientjes, Gu Zheng, Andrew Morton, linux-kernel, linux-mm,
	Cgroups, stable

Hello,

On Tue, Jun 24, 2014 at 10:28:12AM +0800, Li Zefan wrote:
> > I don't think the suggested patch breaks anything more than it was
> > broken before and we should probably apply it for the time being.  Li?
> 
> Yeah, we should apply Gu Zheng's patch any way.

Gu Zheng, can you please respin the patch with updated explanation on
the temporary nature of the change.  I'll apply it once Li acks it.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
  2014-06-24 20:58                   ` Tejun Heo
@ 2014-06-25  0:57                     ` Gu Zheng
  -1 siblings, 0 replies; 34+ messages in thread
From: Gu Zheng @ 2014-06-25  0:57 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, David Rientjes, Andrew Morton, linux-kernel, linux-mm,
	Cgroups, stable

Hi Tejun,
On 06/25/2014 04:58 AM, Tejun Heo wrote:

> Hello,
> 
> On Tue, Jun 24, 2014 at 10:28:12AM +0800, Li Zefan wrote:
>>> I don't think the suggested patch breaks anything more than it was
>>> broken before and we should probably apply it for the time being.  Li?
>>
>> Yeah, we should apply Gu Zheng's patch any way.
> 
> Gu Zheng, can you please respin the patch with updated explanation on
> the temporary nature of the change.  I'll apply it once Li acks it.

OK, I'll resend it soon.

Regards,
Gu

> 
> Thanks.
> 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm/mempolicy: fix sleeping function called from invalid context
@ 2014-06-25  0:57                     ` Gu Zheng
  0 siblings, 0 replies; 34+ messages in thread
From: Gu Zheng @ 2014-06-25  0:57 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, David Rientjes, Andrew Morton, linux-kernel, linux-mm,
	Cgroups, stable

Hi Tejun,
On 06/25/2014 04:58 AM, Tejun Heo wrote:

> Hello,
> 
> On Tue, Jun 24, 2014 at 10:28:12AM +0800, Li Zefan wrote:
>>> I don't think the suggested patch breaks anything more than it was
>>> broken before and we should probably apply it for the time being.  Li?
>>
>> Yeah, we should apply Gu Zheng's patch any way.
> 
> Gu Zheng, can you please respin the patch with updated explanation on
> the temporary nature of the change.  I'll apply it once Li acks it.

OK, I'll resend it soon.

Regards,
Gu

> 
> Thanks.
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2014-06-25  1:08 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-05  8:28 [PATCH] mm/mempolicy: fix sleeping function called from invalid context Gu Zheng
2014-06-05  8:28 ` Gu Zheng
2014-06-05 14:18 ` Greg KH
2014-06-05 14:18   ` Greg KH
2014-06-06  9:34   ` Gu Zheng
2014-06-06  9:34     ` Gu Zheng
2014-06-06  9:34     ` Gu Zheng
2014-06-05 20:23 ` Andrew Morton
2014-06-05 20:23   ` Andrew Morton
2014-06-05 20:23   ` Andrew Morton
2014-06-06 10:07   ` Gu Zheng
2014-06-06 10:07     ` Gu Zheng
2014-06-08 22:47     ` David Rientjes
2014-06-08 22:47       ` David Rientjes
2014-06-09  8:48       ` Gu Zheng
2014-06-09  8:48         ` Gu Zheng
2014-06-09  9:13         ` David Rientjes
2014-06-09  9:13           ` David Rientjes
2014-06-09  9:58           ` Gu Zheng
2014-06-09  9:58             ` Gu Zheng
2014-06-10  2:58           ` Li Zefan
2014-06-10  2:58             ` Li Zefan
2014-06-10 22:16             ` David Rientjes
2014-06-10 22:16               ` David Rientjes
2014-06-10 22:16               ` David Rientjes
2014-06-20 21:01             ` Tejun Heo
2014-06-20 21:01               ` Tejun Heo
2014-06-24  2:28               ` Li Zefan
2014-06-24  2:28                 ` Li Zefan
2014-06-24  2:28                 ` Li Zefan
2014-06-24 20:58                 ` Tejun Heo
2014-06-24 20:58                   ` Tejun Heo
2014-06-25  0:57                   ` Gu Zheng
2014-06-25  0:57                     ` Gu Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.