* Re: + cpu_hotplug-dont-affect-current-tasks-affinity.patch added to -mm tree
From: Oleg Nesterov @ 2009-07-29  2:33 UTC
  To: Andrew Morton, Ingo Molnar, Lai Jiangshan, Rusty Russell; +Cc: linux-kernel

Noticed this patch by accident...

> Subject: cpu_hotplug: don't affect current task's affinity
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>
> _cpu_down() changes the current task's affinity and then recovers it at
> the end.
>
> It has two problems:

Yes.

But can't we make a much simpler patch?

Why should we play with set_cpus_allowed() at all? We can just migrate
the caller of _cpu_down() before other tasks, right after take_cpu_down()
does __cpu_disable()->remove_cpu_from_maps().

No?

Oleg.

------------------------------------------------------------------------
Uncompiled, and we need to export move_task_off_dead_cpu(). Just to
explain what I mean.

 kernel/cpu.c |   18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -163,6 +163,7 @@ static inline void check_for_tasks(int c
 }
 
 struct take_cpu_down_param {
+	struct task_struct *caller;
 	unsigned long mod;
 	void *hcpu;
 };
@@ -181,6 +182,8 @@ static int __ref take_cpu_down(void *_pa
 	raw_notifier_call_chain(&cpu_chain, CPU_DYING | param->mod,
 				param->hcpu);
 
+	move_task_off_dead_cpu((long)param->hcpu, param->caller);
+
 	/* Force idle task to run as soon as we yield: it should
 	   immediately notice cpu is offline and die quickly. */
 	sched_idle_next();
@@ -191,10 +194,10 @@ static int __ref take_cpu_down(void *_pa
 static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 {
 	int err, nr_calls = 0;
-	cpumask_var_t old_allowed;
 	void *hcpu = (void *)(long)cpu;
 	unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;
 	struct take_cpu_down_param tcd_param = {
+		.caller = current,
 		.mod = mod,
 		.hcpu = hcpu,
 	};
@@ -205,9 +208,6 @@ static int __ref _cpu_down(unsigned int 
 	if (!cpu_online(cpu))
 		return -EINVAL;
 
-	if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL))
-		return -ENOMEM;
-
 	cpu_hotplug_begin();
 	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
 					hcpu, -1, &nr_calls);
@@ -221,11 +221,6 @@ static int __ref _cpu_down(unsigned int 
 		goto out_release;
 	}
 
-	/* Ensure that we are not runnable on dying cpu */
-	cpumask_copy(old_allowed, &current->cpus_allowed);
-	set_cpus_allowed_ptr(current,
-			     cpumask_of(cpumask_any_but(cpu_online_mask, cpu)));
-
 	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
 	if (err) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
@@ -233,7 +228,7 @@ static int __ref _cpu_down(unsigned int 
 					    hcpu) == NOTIFY_BAD)
 			BUG();
 
-		goto out_allowed;
+		goto out_release;
 	}
 	BUG_ON(cpu_online(cpu));
 
@@ -251,8 +246,6 @@ static int __ref _cpu_down(unsigned int 
 
 	check_for_tasks(cpu);
 
-out_allowed:
-	set_cpus_allowed_ptr(current, old_allowed);
 out_release:
 	cpu_hotplug_done();
 	if (!err) {
@@ -260,7 +253,6 @@ out_release:
 					    hcpu) == NOTIFY_BAD)
 			BUG();
 	}
-	free_cpumask_var(old_allowed);
 	return err;
 }
 


* Re: + cpu_hotplug-dont-affect-current-tasks-affinity.patch added to -mm tree
From: Oleg Nesterov @ 2009-07-29 21:21 UTC
  To: Andrew Morton, Ingo Molnar, Lai Jiangshan, Rusty Russell
  Cc: linux-kernel, Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

(add more CCs)

On 07/29, Oleg Nesterov wrote:
>
> But can't we make a much simpler patch?
>
> Why should we play with set_cpus_allowed() at all? We can just migrate
> the caller of _cpu_down() before other tasks, right after take_cpu_down()
> does __cpu_disable()->remove_cpu_from_maps().
>
> No?
>
> Oleg.
>
> ------------------------------------------------------------------------
> Uncompiled, and we need to export move_task_off_dead_cpu(). Just to
> explain what I mean.

OK, tested. I think this should work. But first we should fix the bug
with cpusets. I'll send the patch in a minute...

Oleg.


* [PATCH] cpusets: fix deadlock with cpu_down()->cpuset_lock()
From: Oleg Nesterov @ 2009-07-29 21:22 UTC
  To: Andrew Morton, Ingo Molnar, Lai Jiangshan, Rusty Russell
  Cc: linux-kernel, Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

I strongly believe the bug does exist, but this patch needs review
from the maintainers.

Suppose that task T bound to CPU takes callback_mutex. If cpu_down(CPU)
happens before T drops callback_mutex we have a deadlock.

stop_machine() preempts T, then migration_call(CPU_DEAD) tries to take
cpuset_lock() and hangs forever because CPU is already dead and thus
T can't be scheduled.
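
To see the shape of the hang, here is a minimal userspace analogue
(illustrative only: pthreads, not the kernel code, the kernel names are
reused just for readability). The failure mode is simply a lock whose
holder can never run again, which blocks every waiter forever:

	#include <pthread.h>
	#include <stdio.h>
	#include <unistd.h>

	static pthread_mutex_t callback_mutex = PTHREAD_MUTEX_INITIALIZER;

	/* plays the role of T: takes the lock, then is never scheduled again */
	static void *task_T(void *arg)
	{
		pthread_mutex_lock(&callback_mutex);
		pause();	/* "preempted on the dying CPU" */
		return NULL;
	}

	int main(void)
	{
		pthread_t t;

		pthread_create(&t, NULL, task_T, NULL);
		sleep(1);	/* let T take callback_mutex */

		/* plays the role of migration_call(CPU_DEAD)->cpuset_lock() */
		printf("waiter: taking callback_mutex...\n");
		pthread_mutex_lock(&callback_mutex);	/* hangs forever */
		printf("never reached\n");
		return 0;
	}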

This patch unexports cpuset_lock/cpuset_unlock and converts kernel/cpuset.c
to use these helpers instead of lock/unlock of callback_mutex. The only
reason for migration_call()->cpuset_lock() was cpuset_cpus_allowed_locked()
called by move_task_off_dead_cpu(), so this patch adds get_online_cpus() to
cpuset_lock().

IOW, with this patch migration_call(CPU_DEAD) runs without callback_mutex,
but kernel/cpuset.c always takes get_online_cpus() before callback_mutex.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 include/linux/cpuset.h |    6 ---
 kernel/sched.c         |    2 -
 kernel/cpuset.c        |   86 +++++++++++++++++++++----------------------------
 3 files changed, 37 insertions(+), 57 deletions(-)

--- CPUHP/include/linux/cpuset.h~CPU_SET_LOCK	2009-06-17 14:11:26.000000000 +0200
+++ CPUHP/include/linux/cpuset.h	2009-07-29 22:17:41.000000000 +0200
@@ -69,9 +69,6 @@ struct seq_file;
 extern void cpuset_task_status_allowed(struct seq_file *m,
 					struct task_struct *task);
 
-extern void cpuset_lock(void);
-extern void cpuset_unlock(void);
-
 extern int cpuset_mem_spread_node(void);
 
 static inline int cpuset_do_page_mem_spread(void)
@@ -157,9 +154,6 @@ static inline void cpuset_task_status_al
 {
 }
 
-static inline void cpuset_lock(void) {}
-static inline void cpuset_unlock(void) {}
-
 static inline int cpuset_mem_spread_node(void)
 {
 	return 0;
--- CPUHP/kernel/sched.c~CPU_SET_LOCK	2009-07-23 17:06:39.000000000 +0200
+++ CPUHP/kernel/sched.c	2009-07-29 22:18:33.000000000 +0200
@@ -7550,7 +7550,6 @@ migration_call(struct notifier_block *nf
 
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
-		cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
 		migrate_live_tasks(cpu);
 		rq = cpu_rq(cpu);
 		kthread_stop(rq->migration_thread);
@@ -7565,7 +7564,6 @@ migration_call(struct notifier_block *nf
 		rq->idle->sched_class = &idle_sched_class;
 		migrate_dead_tasks(cpu);
 		spin_unlock_irq(&rq->lock);
-		cpuset_unlock();
 		migrate_nr_uninterruptible(rq);
 		BUG_ON(rq->nr_running != 0);
 		calc_global_load_remove(rq);
--- CPUHP/kernel/cpuset.c~CPU_SET_LOCK	2009-06-17 14:11:26.000000000 +0200
+++ CPUHP/kernel/cpuset.c	2009-07-29 22:49:30.000000000 +0200
@@ -215,6 +215,21 @@ static struct cpuset top_cpuset = {
 
 static DEFINE_MUTEX(callback_mutex);
 
+static void cpuset_lock(void)
+{
+	/* Protect against cpu_down(), move_task_off_dead_cpu() needs
+	 * cpuset_cpus_allowed_locked()
+	 */
+	get_online_cpus();
+	mutex_lock(&callback_mutex);
+}
+
+static void cpuset_unlock(void)
+{
+	mutex_unlock(&callback_mutex);
+	put_online_cpus();
+}
+
 /*
  * cpuset_buffer_lock protects both the cpuset_name and cpuset_nodelist
  * buffers.  They are statically allocated to prevent using excess stack
@@ -890,9 +905,9 @@ static int update_cpumask(struct cpuset 
 
 	is_load_balanced = is_sched_load_balance(trialcs);
 
-	mutex_lock(&callback_mutex);
+	cpuset_lock();
 	cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
-	mutex_unlock(&callback_mutex);
+	cpuset_unlock();
 
 	/*
 	 * Scan tasks in the cpuset, and update the cpumasks of any
@@ -1093,9 +1108,9 @@ static int update_nodemask(struct cpuset
 	if (retval < 0)
 		goto done;
 
-	mutex_lock(&callback_mutex);
+	cpuset_lock();
 	cs->mems_allowed = trialcs->mems_allowed;
-	mutex_unlock(&callback_mutex);
+	cpuset_unlock();
 
 	update_tasks_nodemask(cs, &oldmem, &heap);
 
@@ -1207,9 +1222,9 @@ static int update_flag(cpuset_flagbits_t
 	spread_flag_changed = ((is_spread_slab(cs) != is_spread_slab(trialcs))
 			|| (is_spread_page(cs) != is_spread_page(trialcs)));
 
-	mutex_lock(&callback_mutex);
+	cpuset_lock();
 	cs->flags = trialcs->flags;
-	mutex_unlock(&callback_mutex);
+	cpuset_unlock();
 
 	if (!cpumask_empty(trialcs->cpus_allowed) && balance_flag_changed)
 		async_rebuild_sched_domains();
@@ -1516,9 +1531,9 @@ static int cpuset_sprintf_cpulist(char *
 {
 	int ret;
 
-	mutex_lock(&callback_mutex);
+	cpuset_lock();
 	ret = cpulist_scnprintf(page, PAGE_SIZE, cs->cpus_allowed);
-	mutex_unlock(&callback_mutex);
+	cpuset_unlock();
 
 	return ret;
 }
@@ -1527,9 +1542,9 @@ static int cpuset_sprintf_memlist(char *
 {
 	nodemask_t mask;
 
-	mutex_lock(&callback_mutex);
+	cpuset_lock();
 	mask = cs->mems_allowed;
-	mutex_unlock(&callback_mutex);
+	cpuset_unlock();
 
 	return nodelist_scnprintf(page, PAGE_SIZE, mask);
 }
@@ -1980,12 +1995,12 @@ static void scan_for_empty_cpusets(struc
 		oldmems = cp->mems_allowed;
 
 		/* Remove offline cpus and mems from this cpuset. */
-		mutex_lock(&callback_mutex);
+		cpuset_lock();
 		cpumask_and(cp->cpus_allowed, cp->cpus_allowed,
 			    cpu_online_mask);
 		nodes_and(cp->mems_allowed, cp->mems_allowed,
 						node_states[N_HIGH_MEMORY]);
-		mutex_unlock(&callback_mutex);
+		cpuset_unlock();
 
 		/* Move tasks from the empty cpuset to a parent */
 		if (cpumask_empty(cp->cpus_allowed) ||
@@ -2029,9 +2044,9 @@ static int cpuset_track_online_cpus(stru
 	}
 
 	cgroup_lock();
-	mutex_lock(&callback_mutex);
+	cpuset_lock();
 	cpumask_copy(top_cpuset.cpus_allowed, cpu_online_mask);
-	mutex_unlock(&callback_mutex);
+	cpuset_unlock();
 	scan_for_empty_cpusets(&top_cpuset);
 	ndoms = generate_sched_domains(&doms, &attr);
 	cgroup_unlock();
@@ -2055,9 +2070,9 @@ static int cpuset_track_online_nodes(str
 	switch (action) {
 	case MEM_ONLINE:
 	case MEM_OFFLINE:
-		mutex_lock(&callback_mutex);
+		cpuset_lock();
 		top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
-		mutex_unlock(&callback_mutex);
+		cpuset_unlock();
 		if (action == MEM_OFFLINE)
 			scan_for_empty_cpusets(&top_cpuset);
 		break;
@@ -2100,9 +2115,9 @@ void __init cpuset_init_smp(void)
 
 void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 {
-	mutex_lock(&callback_mutex);
+	cpuset_lock();
 	cpuset_cpus_allowed_locked(tsk, pmask);
-	mutex_unlock(&callback_mutex);
+	cpuset_unlock();
 }
 
 /**
@@ -2135,11 +2150,11 @@ nodemask_t cpuset_mems_allowed(struct ta
 {
 	nodemask_t mask;
 
-	mutex_lock(&callback_mutex);
+	cpuset_lock();
 	task_lock(tsk);
 	guarantee_online_mems(task_cs(tsk), &mask);
 	task_unlock(tsk);
-	mutex_unlock(&callback_mutex);
+	cpuset_unlock();
 
 	return mask;
 }
@@ -2252,14 +2267,14 @@ int __cpuset_node_allowed_softwall(int n
 		return 1;
 
 	/* Not hardwall and node outside mems_allowed: scan up cpusets */
-	mutex_lock(&callback_mutex);
+	cpuset_lock();
 
 	task_lock(current);
 	cs = nearest_hardwall_ancestor(task_cs(current));
 	task_unlock(current);
 
 	allowed = node_isset(node, cs->mems_allowed);
-	mutex_unlock(&callback_mutex);
+	cpuset_unlock();
 	return allowed;
 }
 
@@ -2302,33 +2317,6 @@ int __cpuset_node_allowed_hardwall(int n
 }
 
 /**
- * cpuset_lock - lock out any changes to cpuset structures
- *
- * The out of memory (oom) code needs to mutex_lock cpusets
- * from being changed while it scans the tasklist looking for a
- * task in an overlapping cpuset.  Expose callback_mutex via this
- * cpuset_lock() routine, so the oom code can lock it, before
- * locking the task list.  The tasklist_lock is a spinlock, so
- * must be taken inside callback_mutex.
- */
-
-void cpuset_lock(void)
-{
-	mutex_lock(&callback_mutex);
-}
-
-/**
- * cpuset_unlock - release lock on cpuset changes
- *
- * Undo the lock taken in a previous cpuset_lock() call.
- */
-
-void cpuset_unlock(void)
-{
-	mutex_unlock(&callback_mutex);
-}
-
-/**
  * cpuset_mem_spread_node() - On which node to begin search for a page
  *
  * If a task is marked PF_SPREAD_PAGE or PF_SPREAD_SLAB (as for


* [PATCH 0/1] cpu_hotplug: don't play with current->cpus_allowed
From: Oleg Nesterov @ 2009-07-29 21:42 UTC
  To: Andrew Morton, Ingo Molnar, Lai Jiangshan, Rusty Russell
  Cc: linux-kernel, Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

I am suggesting this patch instead of

	cpu_hotplug-dont-affect-current-tasks-affinity.patch
	http://marc.info/?l=linux-mm-commits&m=124830024011314

in -mm tree.

Of course this is subjective, but it really looks simpler to me.

Not a big deal though, I am sending it mostly because nobody
replied to my question, and I had to actually compile and test
this patch ;)

Thoughts?

Oleg.


* [PATCH 1/1] cpu_hotplug: don't play with current->cpus_allowed
From: Oleg Nesterov @ 2009-07-29 21:43 UTC
  To: Andrew Morton, Ingo Molnar, Lai Jiangshan, Rusty Russell
  Cc: linux-kernel, Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

_cpu_down() changes the current task's affinity and then recovers it at
the end. The problems are well known: we can't restore old_allowed if it
was bound to the now-dead cpu, and we can race with userspace, which
can change the cpu affinity during unplug.

_cpu_down() should not play with current->cpus_allowed at all. Instead,
take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable()
removes the dying cpu from cpu_online_mask.

Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 include/linux/sched.h |    1 +
 kernel/sched.c        |    2 +-
 kernel/cpu.c          |   19 ++++++-------------
 3 files changed, 8 insertions(+), 14 deletions(-)

--- CPUHP/include/linux/sched.h~CPU_DOWN_AFF	2009-07-23 17:06:39.000000000 +0200
+++ CPUHP/include/linux/sched.h	2009-07-29 23:27:44.000000000 +0200
@@ -1794,6 +1794,7 @@ extern void sched_clock_idle_sleep_event
 extern void sched_clock_idle_wakeup_event(u64 delta_ns);
 
 #ifdef CONFIG_HOTPLUG_CPU
+extern void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p);
 extern void idle_task_exit(void);
 #else
 static inline void idle_task_exit(void) {}
--- CPUHP/kernel/sched.c~CPU_DOWN_AFF	2009-07-29 22:18:33.000000000 +0200
+++ CPUHP/kernel/sched.c	2009-07-29 23:27:44.000000000 +0200
@@ -7118,7 +7118,7 @@ static int __migrate_task_irq(struct tas
 /*
  * Figure out where task on dead CPU should go, use force if necessary.
  */
-static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
+void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
 {
 	int dest_cpu;
 	const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(dead_cpu));
--- CPUHP/kernel/cpu.c~CPU_DOWN_AFF	2009-06-23 14:23:52.000000000 +0200
+++ CPUHP/kernel/cpu.c	2009-07-29 23:27:44.000000000 +0200
@@ -163,6 +163,7 @@ static inline void check_for_tasks(int c
 }
 
 struct take_cpu_down_param {
+	struct task_struct *caller;
 	unsigned long mod;
 	void *hcpu;
 };
@@ -171,6 +172,7 @@ struct take_cpu_down_param {
 static int __ref take_cpu_down(void *_param)
 {
 	struct take_cpu_down_param *param = _param;
+	unsigned int cpu = (unsigned long)param->hcpu;
 	int err;
 
 	/* Ensure this CPU doesn't handle any more interrupts. */
@@ -181,6 +183,8 @@ static int __ref take_cpu_down(void *_pa
 	raw_notifier_call_chain(&cpu_chain, CPU_DYING | param->mod,
 				param->hcpu);
 
+	if (task_cpu(param->caller) == cpu)
+		move_task_off_dead_cpu(cpu, param->caller);
 	/* Force idle task to run as soon as we yield: it should
 	   immediately notice cpu is offline and die quickly. */
 	sched_idle_next();
@@ -191,10 +195,10 @@ static int __ref take_cpu_down(void *_pa
 static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 {
 	int err, nr_calls = 0;
-	cpumask_var_t old_allowed;
 	void *hcpu = (void *)(long)cpu;
 	unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;
 	struct take_cpu_down_param tcd_param = {
+		.caller = current,
 		.mod = mod,
 		.hcpu = hcpu,
 	};
@@ -205,9 +209,6 @@ static int __ref _cpu_down(unsigned int 
 	if (!cpu_online(cpu))
 		return -EINVAL;
 
-	if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL))
-		return -ENOMEM;
-
 	cpu_hotplug_begin();
 	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
 					hcpu, -1, &nr_calls);
@@ -221,11 +222,6 @@ static int __ref _cpu_down(unsigned int 
 		goto out_release;
 	}
 
-	/* Ensure that we are not runnable on dying cpu */
-	cpumask_copy(old_allowed, &current->cpus_allowed);
-	set_cpus_allowed_ptr(current,
-			     cpumask_of(cpumask_any_but(cpu_online_mask, cpu)));
-
 	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
 	if (err) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
@@ -233,7 +229,7 @@ static int __ref _cpu_down(unsigned int 
 					    hcpu) == NOTIFY_BAD)
 			BUG();
 
-		goto out_allowed;
+		goto out_release;
 	}
 	BUG_ON(cpu_online(cpu));
 
@@ -251,8 +247,6 @@ static int __ref _cpu_down(unsigned int 
 
 	check_for_tasks(cpu);
 
-out_allowed:
-	set_cpus_allowed_ptr(current, old_allowed);
 out_release:
 	cpu_hotplug_done();
 	if (!err) {
@@ -260,7 +254,6 @@ out_release:
 					    hcpu) == NOTIFY_BAD)
 			BUG();
 	}
-	free_cpumask_var(old_allowed);
 	return err;
 }
 


* Re: [PATCH] cpusets: fix deadlock with cpu_down()->cpuset_lock()
From: Oleg Nesterov @ 2009-07-29 23:00 UTC
  To: Andrew Morton, Ingo Molnar, Lai Jiangshan, Rusty Russell
  Cc: linux-kernel, Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

On 07/29, Oleg Nesterov wrote:
>
> I strongly believe the bug does exist, but this patch needs review
> from the maintainers.

Yes...

> IOW, with this patch migration_call(CPU_DEAD) runs without callback_mutex,
> but kernel/cpuset.c always takes get_online_cpus() before callback_mutex.

Oh. I'm afraid this is not an option.

callback_mutex should nest under cgroup_mutex, but the cpu hotplug paths
take cgroup_mutex under cpu_hotplug->lock. Lockdep won't be happy.
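
A minimal userspace sketch of that inversion (illustrative only, the
names are mine; in the kernel it is lockdep that would report this):

	#include <pthread.h>

	static pthread_mutex_t cpu_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t cgroup_mutex     = PTHREAD_MUTEX_INITIALIZER;

	/* hotplug path: cpu_hotplug->lock, then cgroup_mutex via notifiers */
	static void *hotplug_path(void *arg)
	{
		pthread_mutex_lock(&cpu_hotplug_lock);
		pthread_mutex_lock(&cgroup_mutex);
		pthread_mutex_unlock(&cgroup_mutex);
		pthread_mutex_unlock(&cpu_hotplug_lock);
		return NULL;
	}

	/* cpuset path with this patch: cgroup_mutex, then get_online_cpus() */
	static void *cpuset_path(void *arg)
	{
		pthread_mutex_lock(&cgroup_mutex);
		pthread_mutex_lock(&cpu_hotplug_lock);
		pthread_mutex_unlock(&cpu_hotplug_lock);
		pthread_mutex_unlock(&cgroup_mutex);
		return NULL;
	}

	int main(void)
	{
		pthread_t a, b;

		/* run the two paths concurrently: the AB-BA ordering
		   above can deadlock, and that ordering is exactly
		   what lockdep complains about */
		pthread_create(&a, NULL, hotplug_path, NULL);
		pthread_create(&b, NULL, cpuset_path, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		return 0;
	}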

Oleg.


* Re: [PATCH] cpusets: fix deadlock with cpu_down()->cpuset_lock()
From: Lai Jiangshan @ 2009-07-30  1:53 UTC
  To: Oleg Nesterov
  Cc: Andrew Morton, Ingo Molnar, Rusty Russell, linux-kernel,
	Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

Oleg Nesterov wrote:
> On 07/29, Oleg Nesterov wrote:
>> I strongly believe the bug does exist, but this patch needs review
>> from the maintainers.
> 
> Yes...
> 
>> IOW, with this patch migration_call(CPU_DEAD) runs without callback_mutex,
>> but kernel/cpuset.c always takes get_online_cpus() before callback_mutex.
> 
> Oh. I'm afraid this is not an option.
> 
> callback_mutex should nest under cgroup_mutex, but the cpu hotplug paths
> take cgroup_mutex under cpu_hotplug->lock. Lockdep won't be happy.
> 
> Oleg.
> 

We have made a great effort to remove get_online_cpus() from the
cgroup_mutex critical region.

We can migrate the owner of callback_mutex in migration_call(CPU_DEAD)
at first (and then take callback_mutex and migrate the others).
It fixes this bug, but it can't help with your
"cpu_hotplug: don't play with current->cpus_allowed" patch.

Lai.


* Re: [PATCH 1/1] cpu_hotplug: don't play with current->cpus_allowed
From: Lai Jiangshan @ 2009-07-30  2:01 UTC
  To: Oleg Nesterov
  Cc: Andrew Morton, Ingo Molnar, Rusty Russell, linux-kernel,
	Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

Oleg Nesterov wrote:
> _cpu_down() changes the current task's affinity and then recovers it at
> the end. The problems are well known: we can't restore old_allowed if it
> was bound to the now-dead cpu, and we can race with userspace, which
> can change the cpu affinity during unplug.
> 
> _cpu_down() should not play with current->cpus_allowed at all. Instead,
> take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable()
> removes the dying cpu from cpu_online_mask.
> 
> Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
> 
>  include/linux/sched.h |    1 +
>  kernel/sched.c        |    2 +-
>  kernel/cpu.c          |   19 ++++++-------------
>  3 files changed, 8 insertions(+), 14 deletions(-)
> 
> --- CPUHP/include/linux/sched.h~CPU_DOWN_AFF	2009-07-23 17:06:39.000000000 +0200
> +++ CPUHP/include/linux/sched.h	2009-07-29 23:27:44.000000000 +0200
> @@ -1794,6 +1794,7 @@ extern void sched_clock_idle_sleep_event
>  extern void sched_clock_idle_wakeup_event(u64 delta_ns);
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> +extern void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p);
>  extern void idle_task_exit(void);
>  #else
>  static inline void idle_task_exit(void) {}
> --- CPUHP/kernel/sched.c~CPU_DOWN_AFF	2009-07-29 22:18:33.000000000 +0200
> +++ CPUHP/kernel/sched.c	2009-07-29 23:27:44.000000000 +0200
> @@ -7118,7 +7118,7 @@ static int __migrate_task_irq(struct tas
>  /*
>   * Figure out where task on dead CPU should go, use force if necessary.
>   */
> -static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
> +void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
>  {
>  	int dest_cpu;
>  	const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(dead_cpu));
> --- CPUHP/kernel/cpu.c~CPU_DOWN_AFF	2009-06-23 14:23:52.000000000 +0200
> +++ CPUHP/kernel/cpu.c	2009-07-29 23:27:44.000000000 +0200
> @@ -163,6 +163,7 @@ static inline void check_for_tasks(int c
>  }
>  
>  struct take_cpu_down_param {
> +	struct task_struct *caller;
>  	unsigned long mod;
>  	void *hcpu;
>  };
> @@ -171,6 +172,7 @@ struct take_cpu_down_param {
>  static int __ref take_cpu_down(void *_param)
>  {
>  	struct take_cpu_down_param *param = _param;
> +	unsigned int cpu = (unsigned long)param->hcpu;
>  	int err;
>  
>  	/* Ensure this CPU doesn't handle any more interrupts. */
> @@ -181,6 +183,8 @@ static int __ref take_cpu_down(void *_pa
>  	raw_notifier_call_chain(&cpu_chain, CPU_DYING | param->mod,
>  				param->hcpu);
>  
> +	if (task_cpu(param->caller) == cpu)
> +		move_task_off_dead_cpu(cpu, param->caller);

move_task_off_dead_cpu() calls cpuset_cpus_allowed_locked(), which
needs callback_mutex held. But we don't actually hold it here, so it
will corrupt the work of another task which holds callback_mutex.
Is that right?

Lai



* Re: [PATCH 1/1] cpu_hotplug: don't play with current->cpus_allowed
From: Oleg Nesterov @ 2009-07-30 16:32 UTC
  To: Lai Jiangshan
  Cc: Andrew Morton, Ingo Molnar, Rusty Russell, linux-kernel,
	Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

On 07/30, Lai Jiangshan wrote:
>
> Oleg Nesterov wrote:
> > _cpu_down() changes the current task's affinity and then recovers it at
> > the end. The problems are well known: we can't restore old_allowed if it
> > was bound to the now-dead cpu, and we can race with userspace, which
> > can change the cpu affinity during unplug.
> >
> > _cpu_down() should not play with current->cpus_allowed at all. Instead,
> > take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable()
> > removes the dying cpu from cpu_online_mask.
> >
> >  static int __ref take_cpu_down(void *_param)
> >  {
> >  	struct take_cpu_down_param *param = _param;
> > +	unsigned int cpu = (unsigned long)param->hcpu;
> >  	int err;
> >
> >  	/* Ensure this CPU doesn't handle any more interrupts. */
> > @@ -181,6 +183,8 @@ static int __ref take_cpu_down(void *_pa
> >  	raw_notifier_call_chain(&cpu_chain, CPU_DYING | param->mod,
> >  				param->hcpu);
> >
> > +	if (task_cpu(param->caller) == cpu)
> > +		move_task_off_dead_cpu(cpu, param->caller);
>
> move_task_off_dead_cpu() calls cpuset_cpus_allowed_locked(), which
> needs callback_mutex held. But we don't actually hold it here, so it
> will corrupt the work of another task which holds callback_mutex.
> Is that right?

Of course it is not. That is why I tried to kill cpuset_lock() first.

And I still think it must die. But I don't know how to remove it.

Oleg.


* Re: [PATCH] cpusets: fix deadlock with cpu_down()->cpuset_lock()
From: Oleg Nesterov @ 2009-07-30 17:51 UTC
  To: Lai Jiangshan
  Cc: Andrew Morton, Ingo Molnar, Rusty Russell, linux-kernel,
	Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

On 07/30, Lai Jiangshan wrote:
>
> Oleg Nesterov wrote:
> > On 07/29, Oleg Nesterov wrote:
> >> I strongly believe the bug does exist, but this patch needs review
> >> from the maintainers.
> >
> > Yes...
> >
> >> IOW, with this patch migration_call(CPU_DEAD) runs without callback_mutex,
> >> but kernel/cpuset.c always takes get_online_cpus() before callback_mutex.
> >
> > Oh. I'm afraid this is not an option.
> >
> > callback_mutex should nest under cgroup_mutex, but the cpu hotplug paths
> > take cgroup_mutex under cpu_hotplug->lock. Lockdep won't be happy.
> >
> > Oleg.
> >
>
> We have made a great effort to remove get_online_cpus() from the
> cgroup_mutex critical region.

Agreed.

> We can migrate the owner of callback_mutex in migration_call(CPU_DEAD)
> at first (and then take callback_mutex and migrate the others).

Not sure I understand how we can do this. Even if we know the owner
of callback_mutex, if we can migrate it safely without callback_mutex,
why can't we migrate other tasks without this lock?

In any case this doesn't look like a clean solution, imho. But I hardly
understand what cpuset is, can't suggest something clever.

I don't really understand why guarantee_online_cpus() needs this mutex,
and I don't understand why it has to check cs->parent.

update_cpumask() doesn't allow setting a ->cpus_allowed which does not
intersect with cpu_online_mask (unless cs is empty). This means that
guarantee_online_cpus()->cpumask_intersects() == F is only possible
when we are called from the cpu_down() path, right? But can't we just
return cpu_online_mask in this case? I mean,

	static void guarantee_online_cpus(const struct cpuset *cs,
					  struct cpumask *pmask)
	{
		if (cpumask_intersects(cs->cpus_allowed, cpu_online_mask))
			cpumask_and(pmask, cs->cpus_allowed, cpu_online_mask);
		else
			/* !!!!!!!!!!!!!!!!!!!!
			 * cpuset_track_online_cpus(CPU_DEAD)->scan_for_empty_cpusets()
			 * will fix this.
			 */
			cpumask_copy(pmask, cpu_online_mask);
	}

Most probably I missed something; I never looked into cpuset.c before.

Oleg.


* Re: [PATCH] cpusets: fix deadlock with cpu_down()->cpuset_lock()
From: Lai Jiangshan @ 2009-07-31  2:23 UTC
  To: Oleg Nesterov
  Cc: Andrew Morton, Ingo Molnar, Rusty Russell, linux-kernel,
	Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

Oleg Nesterov wrote:
> On 07/30, Lai Jiangshan wrote:
>> Oleg Nesterov wrote:
>>> On 07/29, Oleg Nesterov wrote:
>>>> I strongly believe the bug does exist, but this patch needs review
>>>> from the maintainers.
>>> Yes...
>>>
>>>> IOW, with this patch migration_call(CPU_DEAD) runs without callback_mutex,
>>>> but kernel/cpuset.c always takes get_online_cpus() before callback_mutex.
>>> Oh. I'm afraid this is not an option.
>>>
>>> callback_mutex should nest under cgroup_mutex, but the cpu hotplug paths
>>> take cgroup_mutex under cpu_hotplug->lock. Lockdep won't be happy.
>>>
>>> Oleg.
>>>
>> We have made a great effort to remove get_online_cpus() from the
>> cgroup_mutex critical region.
> 
> Agreed.
> 
>> We can migrate the owner of callback_mutex in migration_call(CPU_DEAD)
>> at first (and then take callback_mutex and migrate the others).
> 
> Not sure I understand how we can do this. Even if we know the owner
> of callback_mutex, if we can migrate it safely without callback_mutex,
> why can't we migrate other tasks without this lock?

Since we have migrated the owner, we can take callback_mutex to
migrate the others ...

> 
> In any case this doesn't look like a clean solution,

No, it's not a clean solution.

> imho. But I hardly understand what cpuset is,


> can't suggest something clever.

We can add cpuset_lock()/cpuset_unlock() around __stop_machine()
in _cpu_down().

cpuset_lock()
__stop_machine()
	......
	mutex_lock(&lock);
	# It's OK, because we don't require any other lock in this
	# critical region. It will not cause any kind of deadlock.
	......
	flush_workqueue(stop_machine_wq);
	# It's OK too, because all the work functions (chill(), stop_cpu())
	# of stop_machine_wq don't require any other lock.
	......
	mutex_unlock(&lock);
cpuset_unlock()


This fixes the bug in migration_call(), because no task can be holding
callback_mutex on the dead cpu after we add
cpuset_lock()/cpuset_unlock() around __stop_machine() in _cpu_down().

And it helps with your "cpu_hotplug: don't play with current->cpus_allowed"
patch. Am I right?

Lai


* [PATCH] cpusets: rework guarantee_online_cpus() to fix deadlock with cpu_down()
From: Oleg Nesterov @ 2009-08-01  4:42 UTC
  To: Lai Jiangshan
  Cc: Andrew Morton, Ingo Molnar, Rusty Russell, linux-kernel,
	Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

On 07/31, Lai Jiangshan wrote:
>
>
> We can add cpuset_lock()/cpuset_unlock() around __stop_machine()
> in _cpu_down().

Yes, I think this should work. CPU_DYING must not take cpuset_lock().

But. This way cpuset_lock() becomes even more subtle (and imho fragile).

What is even more important (to me ;), you didn't answer my question:
why can't we kill this awful lock??? Why can't we simplify things?


I'm almost sure I missed something. As I said I do not understand cpusets
and honestly I'd like to avoid studying it.

You forced me to make another patch; please explain what I have missed ;)

Thanks!

Oleg.

-------------------------------------------------------------------------------
[PATCH] cpusets: rework guarantee_online_cpus() to fix deadlock with cpu_down()

I strongly believe the bug does exist, but this patch needs review
from the maintainers. COMPILE TESTED.

Suppose that task T bound to CPU takes callback_mutex. If cpu_down(CPU)
happens before T drops callback_mutex we have a deadlock.

stop_machine() preempts T, then migration_call(CPU_DEAD) tries to take
cpuset_lock() and hangs forever because CPU is already dead and thus
T can't be scheduled.

This patch:

	- kills cpuset_lock() and cpuset_cpus_allowed_locked()

	- converts move_task_off_dead_cpu() to use cpuset_cpus_allowed()

	- adds cpuset->cpumask_lock, this lock is taken by
	  update_cpumask() around cpumask_copy(cs->cpus_allowed, newcpus).

	  From now on we can access cs->cpus_allowed safely either under the
	  global callback_mutex or cs->cpumask_lock.

Then we change guarantee_online_cpus(),

	- take cs->cpumask_lock instead of callback_mutex

	- do NOT scan cs->parent cpusets. If there are no online CPUs in
	  cs->cpus_allowed, we use cpu_online_mask. This is only possible
	  when we are called by cpu_down() hooks, in that case
	  cpuset_track_online_cpus(CPU_DEAD) will fix things later.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 include/linux/cpuset.h |   13 ----------
 kernel/sched.c         |    4 ---
 kernel/cpuset.c        |   61 +++++++++++++------------------------------------
 3 files changed, 18 insertions(+), 60 deletions(-)

--- CPUHP/include/linux/cpuset.h~CPU_SET_LOCK	2009-08-01 04:29:15.000000000 +0200
+++ CPUHP/include/linux/cpuset.h	2009-08-01 05:08:25.000000000 +0200
@@ -21,8 +21,6 @@ extern int number_of_cpusets;	/* How man
 extern int cpuset_init(void);
 extern void cpuset_init_smp(void);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
-extern void cpuset_cpus_allowed_locked(struct task_struct *p,
-				       struct cpumask *mask);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
 #define cpuset_current_mems_allowed (current->mems_allowed)
 void cpuset_init_current_mems_allowed(void);
@@ -69,9 +67,6 @@ struct seq_file;
 extern void cpuset_task_status_allowed(struct seq_file *m,
 					struct task_struct *task);
 
-extern void cpuset_lock(void);
-extern void cpuset_unlock(void);
-
 extern int cpuset_mem_spread_node(void);
 
 static inline int cpuset_do_page_mem_spread(void)
@@ -105,11 +100,6 @@ static inline void cpuset_cpus_allowed(s
 {
 	cpumask_copy(mask, cpu_possible_mask);
 }
-static inline void cpuset_cpus_allowed_locked(struct task_struct *p,
-					      struct cpumask *mask)
-{
-	cpumask_copy(mask, cpu_possible_mask);
-}
 
 static inline nodemask_t cpuset_mems_allowed(struct task_struct *p)
 {
@@ -157,9 +147,6 @@ static inline void cpuset_task_status_al
 {
 }
 
-static inline void cpuset_lock(void) {}
-static inline void cpuset_unlock(void) {}
-
 static inline int cpuset_mem_spread_node(void)
 {
 	return 0;
--- CPUHP/kernel/sched.c~CPU_SET_LOCK	2009-08-01 04:29:15.000000000 +0200
+++ CPUHP/kernel/sched.c	2009-08-01 05:06:36.000000000 +0200
@@ -7136,7 +7136,7 @@ again:
 
 	/* No more Mr. Nice Guy. */
 	if (dest_cpu >= nr_cpu_ids) {
-		cpuset_cpus_allowed_locked(p, &p->cpus_allowed);
+		cpuset_cpus_allowed(p, &p->cpus_allowed);
 		dest_cpu = cpumask_any_and(cpu_online_mask, &p->cpus_allowed);
 
 		/*
@@ -7550,7 +7550,6 @@ migration_call(struct notifier_block *nf
 
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
-		cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
 		migrate_live_tasks(cpu);
 		rq = cpu_rq(cpu);
 		kthread_stop(rq->migration_thread);
@@ -7565,7 +7564,6 @@ migration_call(struct notifier_block *nf
 		rq->idle->sched_class = &idle_sched_class;
 		migrate_dead_tasks(cpu);
 		spin_unlock_irq(&rq->lock);
-		cpuset_unlock();
 		migrate_nr_uninterruptible(rq);
 		BUG_ON(rq->nr_running != 0);
 		calc_global_load_remove(rq);
--- CPUHP/kernel/cpuset.c~CPU_SET_LOCK	2009-08-01 04:29:15.000000000 +0200
+++ CPUHP/kernel/cpuset.c	2009-08-01 06:29:15.000000000 +0200
@@ -92,6 +92,7 @@ struct cpuset {
 	struct cgroup_subsys_state css;
 
 	unsigned long flags;		/* "unsigned long" so bitops work */
+	spinlock_t cpumask_lock;	/* protects ->cpus_allowed */
 	cpumask_var_t cpus_allowed;	/* CPUs allowed to tasks in cpuset */
 	nodemask_t mems_allowed;	/* Memory Nodes allowed to tasks */
 
@@ -267,16 +268,23 @@ static struct file_system_type cpuset_fs
  * Call with callback_mutex held.
  */
 
-static void guarantee_online_cpus(const struct cpuset *cs,
-				  struct cpumask *pmask)
+static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
 {
-	while (cs && !cpumask_intersects(cs->cpus_allowed, cpu_online_mask))
-		cs = cs->parent;
 	if (cs)
+		spin_lock(&cs->cpumask_lock);
+	/*
+	 * cs->cpus_allowed must include online CPUs, or we are called
+	 * from cpu_down() hooks. In that case use cpu_online_mask
+	 * temporarily, until scan_for_empty_cpusets() moves us to ->parent
+	 * cpuset.
+	 */
+	if (cs && cpumask_intersects(cs->cpus_allowed, cpu_online_mask))
 		cpumask_and(pmask, cs->cpus_allowed, cpu_online_mask);
 	else
 		cpumask_copy(pmask, cpu_online_mask);
-	BUG_ON(!cpumask_intersects(pmask, cpu_online_mask));
+
+	if (cs)
+		spin_unlock(&cs->cpumask_lock);
 }
 
 /*
@@ -891,7 +899,9 @@ static int update_cpumask(struct cpuset 
 	is_load_balanced = is_sched_load_balance(trialcs);
 
 	mutex_lock(&callback_mutex);
+	spin_lock(&cs->cpumask_lock);
 	cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
+	spin_unlock(&cs->cpumask_lock);
 	mutex_unlock(&callback_mutex);
 
 	/*
@@ -1781,6 +1791,8 @@ static struct cgroup_subsys_state *cpuse
 	cs = kmalloc(sizeof(*cs), GFP_KERNEL);
 	if (!cs)
 		return ERR_PTR(-ENOMEM);
+
+	spin_lock_init(&cs->cpumask_lock);
 	if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL)) {
 		kfree(cs);
 		return ERR_PTR(-ENOMEM);
@@ -2097,20 +2109,8 @@ void __init cpuset_init_smp(void)
  * subset of cpu_online_map, even if this means going outside the
  * tasks cpuset.
  **/
-
 void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 {
-	mutex_lock(&callback_mutex);
-	cpuset_cpus_allowed_locked(tsk, pmask);
-	mutex_unlock(&callback_mutex);
-}
-
-/**
- * cpuset_cpus_allowed_locked - return cpus_allowed mask from a tasks cpuset.
- * Must be called with callback_mutex held.
- **/
-void cpuset_cpus_allowed_locked(struct task_struct *tsk, struct cpumask *pmask)
-{
 	task_lock(tsk);
 	guarantee_online_cpus(task_cs(tsk), pmask);
 	task_unlock(tsk);
@@ -2302,33 +2302,6 @@ int __cpuset_node_allowed_hardwall(int n
 }
 
 /**
- * cpuset_lock - lock out any changes to cpuset structures
- *
- * The out of memory (oom) code needs to mutex_lock cpusets
- * from being changed while it scans the tasklist looking for a
- * task in an overlapping cpuset.  Expose callback_mutex via this
- * cpuset_lock() routine, so the oom code can lock it, before
- * locking the task list.  The tasklist_lock is a spinlock, so
- * must be taken inside callback_mutex.
- */
-
-void cpuset_lock(void)
-{
-	mutex_lock(&callback_mutex);
-}
-
-/**
- * cpuset_unlock - release lock on cpuset changes
- *
- * Undo the lock taken in a previous cpuset_lock() call.
- */
-
-void cpuset_unlock(void)
-{
-	mutex_unlock(&callback_mutex);
-}
-
-/**
  * cpuset_mem_spread_node() - On which node to begin search for a page
  *
  * If a task is marked PF_SPREAD_PAGE or PF_SPREAD_SLAB (as for


* Re: [PATCH] cpusets: rework guarantee_online_cpus() to fix deadlock with cpu_down()
From: Oleg Nesterov @ 2009-08-01  5:34 UTC
  To: Lai Jiangshan
  Cc: Andrew Morton, Ingo Molnar, Rusty Russell, linux-kernel,
	Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

Sorry for the noise, I forgot to mention...

On 08/01, Oleg Nesterov wrote:
>
> 	- adds cpuset->cpumask_lock, this lock is taken by
> 	  update_cpumask() around cpumask_copy(cs->cpus_allowed, newcpus).

Most probably there are other places where ->cpus_allowed is changed;
in that case they need ->cpumask_lock too. But this is trivial, the
patch I sent just explains what I meant.

Oleg.


* Re: [PATCH] cpusets: rework guarantee_online_cpus() to fix deadlock with cpu_down()
From: Lai Jiangshan @ 2009-08-02  2:18 UTC
  To: Oleg Nesterov
  Cc: Andrew Morton, Ingo Molnar, Rusty Russell, linux-kernel,
	Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

Oleg Nesterov wrote:

> 
> 	- do NOT scan cs->parent cpusets. If there are no online CPUs in
> 	  cs->cpus_allowed, we use cpu_online_mask. This is only possible
> 	  when we are called by cpu_down() hooks, in that case
> 	  cpuset_track_online_cpus(CPU_DEAD) will fix things later.
> 


We must scan cs->parent cpusets.
A task is constrained by a cpuset, it must be constrained by
this cpuset's parent too.

cpuset_lock() is not awful at all.

Lai.



* Re: [PATCH] cpusets: rework guarantee_online_cpus() to fix deadlock with cpu_down()
From: Oleg Nesterov @ 2009-08-02  6:55 UTC
  To: Lai Jiangshan
  Cc: Andrew Morton, Ingo Molnar, Rusty Russell, linux-kernel,
	Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

On 08/02, Lai Jiangshan wrote:
>
> Oleg Nesterov wrote:
>
> >
> > 	- do NOT scan cs->parent cpusets. If there are no online CPUs in
> > 	  cs->cpus_allowed, we use cpu_online_mask. This is only possible
> > 	  when we are called by cpu_down() hooks, in that case
> > 	  cpuset_track_online_cpus(CPU_DEAD) will fix things later.
> >
>
>
> We must scan cs->parent cpusets.
> A task is constrained by a cpuset,

Yes, the task escapes its cpuset. With or without this patch.
Because cs->cpus_allowed has no online CPUs.

> it must be constrained by
> this cpuset's parent too.

It will be constrained again, after scan_for_empty_cpusets(), no?

The only difference this patch makes is that a task temporarily uses
cpu_online_mask when its cs->cpus_allowed becomes empty. Why
is this so bad? This can only happen when a CPU dies and cs becomes
empty, force majeure.

Even without this patch, the task is not actually constrained by
its cs->parent. Yes, we use ->parent->cpus_allowed. But this mask
can be changed right after guarantee_online_cpus() returns. And
since this task does not belong to the cs->parent cpuset,
update_tasks_cpumask() will not fix this task. Again, until
cpuset_track_online_cpus().

Could you explain what problem this patch introduces?

> cpuset_lock() is not awful at all.

OK, it is not awful. But surely it complicates things. And currently
it is buggy.

Oleg.


* Re: [PATCH] cpusets: rework guarantee_online_cpus() to fix deadlock with cpu_down()
From: Lai Jiangshan @ 2009-08-04  7:35 UTC
  To: Oleg Nesterov
  Cc: Andrew Morton, Ingo Molnar, Rusty Russell, linux-kernel,
	Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

Oleg Nesterov wrote:
> On 08/02, Lai Jiangshan wrote:
>> Oleg Nesterov wrote:
>>
>>> 	- do NOT scan cs->parent cpusets. If there are no online CPUs in
>>> 	  cs->cpus_allowed, we use cpu_online_mask. This is only possible
>>> 	  when we are called by cpu_down() hooks, in that case
>>> 	  cpuset_track_online_cpus(CPU_DEAD) will fix things later.
>>>
>>
>> We must scan cs->parent cpusets.
>> A task is constrained by a cpuset,
> 
> Yes, the task escapes its cpuset. With or without this patch.
> Because cs->cpus_allowed has no online CPUs.
> 
>> it must be constrained by
>> this cpuset's parent too.
> 
> It will be constrained again, after scan_for_empty_cpusets(), no?

cpuset_cpus_allowed() is not only used for CPU offline.

sched_setaffinity() also uses it. The task will not be constrained again.
Or I missed something.

Lai


* Re: [PATCH] cpusets: rework guarantee_online_cpus() to fix deadlock with cpu_down()
From: Oleg Nesterov @ 2009-08-04 16:36 UTC
  To: Lai Jiangshan
  Cc: Andrew Morton, Ingo Molnar, Rusty Russell, linux-kernel,
	Li Zefan, Miao Xie, Paul Menage, Peter Zijlstra,
	Gautham R Shenoy

On 08/04, Lai Jiangshan wrote:
>
> Oleg Nesterov wrote:
> > On 08/02, Lai Jiangshan wrote:
> >> Oleg Nesterov wrote:
> >>
> >>> 	- do NOT scan cs->parent cpusets. If there are no online CPUs in
> >>> 	  cs->cpus_allowed, we use cpu_online_mask. This is only possible
> >>> 	  when we are called by cpu_down() hooks, in that case
> >>> 	  cpuset_track_online_cpus(CPU_DEAD) will fix things later.
> >>>
> >>
> >> We must scan cs->parent cpusets.
> >> A task is constrained by a cpuset,
> >
> > Yes, the task escapes its cpuset. With or without this patch.
> > Because cs->cpus_allowed has no online CPUs.
> >
> >> it must be constrained by
> >> this cpuset's parent too.
> >
> > It will be constrained again, after scan_for_empty_cpusets(), no?
>
> cpuset_cpus_allowed() is not only used for CPU offline.
>
> sched_setaffinity() also uses it.

Sure. And it must take get_online_cpus() to avoid races with hotplug.

> The task will not be constrained again.

It will be constrained, and we don't need to scan cs->parent.

Because, since we are protected from cpu_down(), cs->cpus_allowed must
have online CPUs. update_cpumask() doesn't allow empty cpumasks.
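
To make the exclusion concrete, a userspace analogue (illustrative
only; get_online_cpus() is really a refcounted lock, modelled as a
plain rwlock here, and the names are mine): hotplug is the writer,
sched_setaffinity() a reader, so a reader that got the lock first
cannot observe a dying CPU.

	#include <pthread.h>

	static pthread_rwlock_t cpu_hotplug = PTHREAD_RWLOCK_INITIALIZER;

	static void *setaffinity_path(void *arg)
	{
		pthread_rwlock_rdlock(&cpu_hotplug);	/* get_online_cpus() */
		/* cs->cpus_allowed cannot lose its last online CPU here */
		pthread_rwlock_unlock(&cpu_hotplug);	/* put_online_cpus() */
		return NULL;
	}

	static void *cpu_down_path(void *arg)
	{
		pthread_rwlock_wrlock(&cpu_hotplug);	/* cpu_hotplug_begin() */
		/* take the cpu down, update the masks */
		pthread_rwlock_unlock(&cpu_hotplug);	/* cpu_hotplug_done() */
		return NULL;
	}

	int main(void)
	{
		pthread_t a, b;

		pthread_create(&a, NULL, setaffinity_path, NULL);
		pthread_create(&b, NULL, cpu_down_path, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		return 0;
	}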

> Or I missed something.

Or me ;)

Oleg.

