linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* CPU Hotplug broken -mm5 onwards
@ 2004-04-18 17:06 Srivatsa Vaddagiri
  2004-04-19  3:34 ` Nick Piggin
       [not found] ` <20040421023650.24b9f85a.akpm@osdl.org>
  0 siblings, 2 replies; 7+ messages in thread
From: Srivatsa Vaddagiri @ 2004-04-18 17:06 UTC (permalink / raw)
  To: rusty, Ingo Molnar, Nick Piggin; +Cc: akpm, linux-kernel, lhcs-devel

Hi,
	I found that I can't boot with CONFIG_HOTPLUG_CPU defined in both
mm5 and mm6. Debugging this revealed it to be because exec path can now require 
cpu hotplug sem (sched_migrate_task) and this has lead to a deadlock between 
flush_workqueue and __call_usermodehelper. 

flush_workqueue takes cpu hotplug sem and blocks until workqueue is flushed.
__call_usermodehelper, one of the queued work function, blocks because it
also needs cpu hotplug sem during exec.  As of result of this, exec does not 
progress and system does not boot.

I feel we can fix this by converting cpucontrol to a reader-writer semaphore or 
big-reader-lock(?). One problem with reader-writer semaphore is there does not
seem to be any down_write_interruptible, which is needed by cpu_down/up.

Comments?

BTW, I think a cpu_is_offline check is needed in sched_migrate_task, since
dest_cpu could have been downed by the time it has acquired the semaphore. 
In which case, we could end up adding the task to dead cpu's runqueue?
An alternate solution would be to put the same check in __migrate_task.



-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: CPU Hotplug broken -mm5 onwards
  2004-04-18 17:06 CPU Hotplug broken -mm5 onwards Srivatsa Vaddagiri
@ 2004-04-19  3:34 ` Nick Piggin
  2004-04-19 12:58   ` [lhcs-devel] " Srivatsa Vaddagiri
       [not found] ` <20040421023650.24b9f85a.akpm@osdl.org>
  1 sibling, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2004-04-19  3:34 UTC (permalink / raw)
  To: vatsa; +Cc: rusty, Ingo Molnar, akpm, linux-kernel, lhcs-devel

Srivatsa Vaddagiri wrote:
> Hi,
> 	I found that I can't boot with CONFIG_HOTPLUG_CPU defined in both
> mm5 and mm6. Debugging this revealed it to be because exec path can now require 
> cpu hotplug sem (sched_migrate_task) and this has lead to a deadlock between 
> flush_workqueue and __call_usermodehelper. 
> 
> flush_workqueue takes cpu hotplug sem and blocks until workqueue is flushed.
> __call_usermodehelper, one of the queued work function, blocks because it
> also needs cpu hotplug sem during exec.  As of result of this, exec does not 
> progress and system does not boot.
> 
> I feel we can fix this by converting cpucontrol to a reader-writer semaphore or 
> big-reader-lock(?). One problem with reader-writer semaphore is there does not
> seem to be any down_write_interruptible, which is needed by cpu_down/up.
> 
> Comments?
> 

You are right, but it wasn't introduced in -mm or sched-domains
patches. However, one of Ingo's recent patches does balance on
exec for SMP, not just NUMA so it will make this more common.

So, Rusty has to fix it ;)

I think a rwsem might be a good idea anyway, because
sched_migrate_task can end up being called pretty often with
balance on exec and balance on clone. The semaphore could easily
place undue serialisation on that path.

> BTW, I think a cpu_is_offline check is needed in sched_migrate_task, since
> dest_cpu could have been downed by the time it has acquired the semaphore. 
> In which case, we could end up adding the task to dead cpu's runqueue?
> An alternate solution would be to put the same check in __migrate_task.
> 

Yes you are correct.

Can we arrange some of these checks to disappear when HOTPLUG_CPU
is not set? For example, make cpu_is_offline only valid to call for
CPUs that have been online sometime, and can evaluate to 0 if
HOTPLUG_CPU is not set?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [lhcs-devel] Re: CPU Hotplug broken -mm5 onwards
  2004-04-19  3:34 ` Nick Piggin
@ 2004-04-19 12:58   ` Srivatsa Vaddagiri
  2004-04-19 22:55     ` Nick Piggin
  0 siblings, 1 reply; 7+ messages in thread
From: Srivatsa Vaddagiri @ 2004-04-19 12:58 UTC (permalink / raw)
  To: Nick Piggin; +Cc: rusty, Ingo Molnar, akpm, linux-kernel, lhcs-devel

On Mon, Apr 19, 2004 at 01:34:14PM +1000, Nick Piggin wrote:
> I think a rwsem might be a good idea anyway, because
> sched_migrate_task can end up being called pretty often with
> balance on exec and balance on clone. The semaphore could easily
> place undue serialisation on that path.

I found that r/w sem does not help here ..It can still lead to deadlocks.
One example I hit is :

cpu_up takes write lock, sends out CPU_UP_PREPARE notification. As part
of it, many do kthread_create, which uses workqueue. The work function
is never processed because keventd would be blocked on a previous 
work function, waiting for hotplug sem in exec path.

So, as Rusty said, I think we really need to consider removing
lock_cpu_hotplug from sched_migrate_task. AFAICS that lock
was needed to prevent adding tasks to dead cpus. The same 
can be accomplished by removing lock_cpu_hotplug from sched_migrate_task
and adding a cpu_is_offline check in __migrate_task.
This will eliminate all the deadlocks I have been hitting.


> Can we arrange some of these checks to disappear when HOTPLUG_CPU
> is not set? For example, make cpu_is_offline only valid to call for
> CPUs that have been online sometime, and can evaluate to 0 if
> HOTPLUG_CPU is not set?

I think this is already being done in include/linux/cpu.h

	

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [lhcs-devel] Re: CPU Hotplug broken -mm5 onwards
  2004-04-19 12:58   ` [lhcs-devel] " Srivatsa Vaddagiri
@ 2004-04-19 22:55     ` Nick Piggin
  2004-04-19 23:07       ` Rusty Russell
  0 siblings, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2004-04-19 22:55 UTC (permalink / raw)
  To: vatsa; +Cc: rusty, Ingo Molnar, akpm, linux-kernel, lhcs-devel

Srivatsa Vaddagiri wrote:
> On Mon, Apr 19, 2004 at 01:34:14PM +1000, Nick Piggin wrote:
> 
>>I think a rwsem might be a good idea anyway, because
>>sched_migrate_task can end up being called pretty often with
>>balance on exec and balance on clone. The semaphore could easily
>>place undue serialisation on that path.
> 
> 
> I found that r/w sem does not help here ..It can still lead to deadlocks.
> One example I hit is :
> 
> cpu_up takes write lock, sends out CPU_UP_PREPARE notification. As part
> of it, many do kthread_create, which uses workqueue. The work function
> is never processed because keventd would be blocked on a previous 
> work function, waiting for hotplug sem in exec path.
> 
> So, as Rusty said, I think we really need to consider removing
> lock_cpu_hotplug from sched_migrate_task. AFAICS that lock
> was needed to prevent adding tasks to dead cpus. The same 
> can be accomplished by removing lock_cpu_hotplug from sched_migrate_task
> and adding a cpu_is_offline check in __migrate_task.
> This will eliminate all the deadlocks I have been hitting.
> 

Yes this would be a better idea. Care to send Andrew a patch
against -mm?

> 
> 
>>Can we arrange some of these checks to disappear when HOTPLUG_CPU
>>is not set? For example, make cpu_is_offline only valid to call for
>>CPUs that have been online sometime, and can evaluate to 0 if
>>HOTPLUG_CPU is not set?
> 
> 
> I think this is already being done in include/linux/cpu.h
> 

Yes I see. I didn't realise the first one was under an ifdef :P
Sorry.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [lhcs-devel] Re: CPU Hotplug broken -mm5 onwards
  2004-04-19 22:55     ` Nick Piggin
@ 2004-04-19 23:07       ` Rusty Russell
  0 siblings, 0 replies; 7+ messages in thread
From: Rusty Russell @ 2004-04-19 23:07 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Srivatsa Vaddagiri, rusty, Ingo Molnar, Andrew Morton,
	lkml - Kernel Mailing List, LHCS list

On Tue, 2004-04-20 at 08:55, Nick Piggin wrote:
> > So, as Rusty said, I think we really need to consider removing
> > lock_cpu_hotplug from sched_migrate_task. AFAICS that lock
> > was needed to prevent adding tasks to dead cpus. The same 
> > can be accomplished by removing lock_cpu_hotplug from sched_migrate_task
> > and adding a cpu_is_offline check in __migrate_task.
> > This will eliminate all the deadlocks I have been hitting.
> > 
> 
> Yes this would be a better idea. Care to send Andrew a patch
> against -mm?

What surprises me is that this is a regression.  The original hotplug
code on top of Nicksched(TM) removed that lock as part of the "don't
change cpus_allowed to migrate on exec" fix.  When we pushed the patch
straight into Linus' tree, we had to do lock_cpu_hotplug because we
didn't have that code.

Obviously, it escaped somewhere...

Cheers,
Rusty.
-- 
Anyone who quotes me in their signature is an idiot -- Rusty Russell


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: CPU Hotplug broken -mm5 onwards
       [not found]   ` <20040421095939.GB10767@in.ibm.com>
@ 2004-04-21 16:44     ` Srivatsa Vaddagiri
  2004-04-22  7:03       ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Srivatsa Vaddagiri @ 2004-04-21 16:44 UTC (permalink / raw)
  To: rusty; +Cc: mingo, nickpiggin, akpm, linux-kernel, lhcs-devel

On Wed, Apr 21, 2004 at 03:29:39PM +0530, Srivatsa Vaddagiri wrote:
> On Wed, Apr 21, 2004 at 02:36:50AM -0700, Andrew Morton wrote:
> > 
> > Not sure who to aim this at, but:  poke.
> 
> I found that removing the lock in exec path has other consequences
> which I am fixing/testing. I will be sending an updated patch later today 
> (against 2.6.6-rc1-mm1) to fix all those issues.

Found that lockless migration of execing task is _extremely_ racy.
The races I hit are described below, alongwith probable solutions.

Task migration done elsewhere should be safe (?) since they either
hold the lock (sys_sched_setaffinity) or are done entirely with preemption 
disabled (load_balance).


   sched_balance_exec does:

	a. disables preemption
	b. finds new_cpu for current
	c. enables preemption 
	d. calls sched_migrate_task to migrate current to new_cpu

   and sched_migrate_task does:

	e. task_rq_lock(p)
	f. migrate_task(p, dest_cpu ..)
		(if we have to wait for migration thread)
		g. task_rq_unlock()
		h. wake_up_process(rq->migration_thread)
		i. wait_for_completion()


   Several things can happen here:

	1. new_cpu can go down after h and before migration thread has
	   got around to handle the request

	   ==> we need to add a cpu_is_offline check in __migrate_task
		
	2. new_cpu can go down between c and d or before f.

	   ===> Even though this case is automatically handled by the above 
	        change (migrate_task being called on a running task, current, 
		will delegate migration to migration thread), would it be 
	 	good practice to avoid calling migrate_task in the first place
		itself when dest_cpu is offline. This means adding another
	 	cpu_is_offline check after e in sched_migrate_task

	3. The 'current' task can get preempted _immediately_ after
	   g and when it comes back, task_cpu(p) can be dead. In 
	   which case, it is invalid to do wake_up on a non-existent migration 
	   thread.  (rq->migration_thread can be NULL).

	   ===> We should disable preemption thr' g and h

	4. Before migration thread gets around to handle the request, its cpu 
	   goes dead. This will leave unhandled migration requests in the dead 
	   cpu. 

	   ===> We need to wakeup sleeping requestors (if any) in CPU_DEAD 
	        notification.

I really wonder if we can get rid of these issues by avoiding balancing at 
exec time and instead have it balanced during load_balance ..Alternately
if this is valuable and we want to retain it, I think we still need to 
consider a read/write sem, with sched_migrate_task doing down_read_trylock.
This may eliminate the deadlock I hit between cpu_up and CPU_UP_PREPARE 
notification, which had forced me away from r/w sem.

Anyway patch below addresses the above races. Its against 2.6.6-rc2-mm1
and has been tested on a 4way Intel Pentium SMP m/c. I have added a
cpu_is_offlince check in migration_thread(). If that is true, migration
thread stop processing and just waits for kthread_stop.

Let me know what you think!



---

 linux-2.6.6-rc2-mm1-root/kernel/sched.c |   40 ++++++++++++++++++++++++++++----
 1 files changed, 36 insertions(+), 4 deletions(-)

diff -puN kernel/sched.c~remove_hotplug_lock_in_sched_migrate_task kernel/sched.c
--- linux-2.6.6-rc2-mm1/kernel/sched.c~remove_hotplug_lock_in_sched_migrate_task	2004-04-20 20:42:03.000000000 +0530
+++ linux-2.6.6-rc2-mm1-root/kernel/sched.c	2004-04-20 22:22:16.527816944 +0530
@@ -1420,16 +1420,18 @@ static void sched_migrate_task(task_t *p
 	runqueue_t *rq;
 	unsigned long flags;
 
-	lock_cpu_hotplug();
 	rq = task_rq_lock(p, &flags);
-	if (!cpu_isset(dest_cpu, p->cpus_allowed))
+	if (!cpu_isset(dest_cpu, p->cpus_allowed) ||
+			unlikely(cpu_is_offline(dest_cpu)))
 		goto out;
 
 	/* force the process onto the specified CPU */
 	if (migrate_task(p, dest_cpu, &req)) {
 		/* Need to wait for migration thread. */
+		preempt_disable();
 		task_rq_unlock(rq, &flags);
 		wake_up_process(rq->migration_thread);
+		preempt_enable();
 		wait_for_completion(&req.done);
 
 		/*
@@ -1438,13 +1440,11 @@ static void sched_migrate_task(task_t *p
 		 * the migration.
 		 */
 		tlb_migrate_prepare(current->mm);
-		unlock_cpu_hotplug();
 
 		return;
 	}
 out:
 	task_rq_unlock(rq, &flags);
-	unlock_cpu_hotplug();
 }
 
 /*
@@ -3485,6 +3485,9 @@ static void __migrate_task(struct task_s
 {
 	runqueue_t *rq_dest;
 
+	if (unlikely(cpu_is_offline(dest_cpu)))
+		return;
+
 	rq_dest = cpu_rq(dest_cpu);
 
 	double_rq_lock(this_rq(), rq_dest);
@@ -3540,6 +3543,8 @@ static int migration_thread(void * data)
 		if (list_empty(head)) {
 			spin_unlock_irq(&rq->lock);
 			schedule();
+			if (unlikely(cpu_is_offline(cpu)))
+				goto wait_to_die;
 			continue;
 		}
 		req = list_entry(head->next, migration_req_t, list);
@@ -3560,6 +3565,15 @@ static int migration_thread(void * data)
 		complete(&req->done);
 	}
 	return 0;
+wait_to_die:
+	/* Wait for kthread_stop */
+	set_current_state(TASK_INTERRUPTIBLE);
+	while (!kthread_should_stop()) {
+		schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+	}
+	__set_current_state(TASK_RUNNING);
+	return 0;
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -3630,6 +3644,8 @@ static int migration_call(struct notifie
 	struct task_struct *p;
 	struct runqueue *rq;
 	unsigned long flags;
+	struct list_head *head;
+	migration_req_t *req, *tmp;
 
 	switch (action) {
 	case CPU_UP_PREPARE:
@@ -3652,6 +3668,22 @@ static int migration_call(struct notifie
 		/* Unbind it from offline cpu so it can run.  Fall thru. */
 		kthread_bind(cpu_rq(cpu)->migration_thread,smp_processor_id());
 	case CPU_DEAD:
+		rq = cpu_rq(cpu);
+		head = &rq->migration_queue;
+		spin_lock_irq(&rq->lock);
+		list_for_each_entry_safe(req, tmp, head, list) {
+
+			BUG_ON(req->type != REQ_MOVE_TASK);
+
+			list_del_init(&req->list);
+
+			/* No need to migrate the task in the request. It would
+	 		 * have already been migrated (maybe to a different
+			 * CPU). Just wake up the requestor.
+			 */
+			complete(&req->done);
+		}
+		spin_unlock_irq(&rq->lock);
 		kthread_stop(cpu_rq(cpu)->migration_thread);
 		cpu_rq(cpu)->migration_thread = NULL;
  		BUG_ON(cpu_rq(cpu)->nr_running != 0);

_



The above patch is running fine against 2.6.6-rc2-mm1 for the last 3 hrs.

The same patch when I had tried against 2.6.6-rc1-mm1 lead to 
this BUG in include/asm-i386/mmu_context.h after running for 1 hr.

	BUG_ON(cpu_tlbstate[cpu].active_mm != next);

Stack trace was:

------------[ cut here ]------------
kernel BUG at include/asm/mmu_context.h:53!
invalid operand: 0000 [#1]
PREEMPT SMP
CPU:    0
EIP:    0060:[<c03c94a2>]    Not tainted VLI
EFLAGS: 00010087   (2.6.6-rc1-mm1)
EIP is at schedule+0x482/0x62d
eax: 00000000   ebx: d34a0a00   ecx: 00000000   edx: f73f1a80
esi: d01b2b70   edi: c170bca0   ebp: ebcbbfb0   esp: ebcbbf5c
ds: 007b   es: 007b   ss: 0068
Process kstopmachine (pid: 19332, threadinfo=ebcba000 task=d34a0850)
Stack: d34a0850 c170bca0 00000001 ebcbbf8c f73f1a80 d01b2b70 00000003 00000000
       cad73ee8 ebcba000 cad73ee0 00000292 d01b2b70 00002fcb c04c53d9 0000037b
       d34a0850 d34a0a00 ebcba000 fffffff0 cad73ed8 c013bd3f c013bd7b 00000000
Call Trace:
 [<c013bd3f>] do_stop+0x0/0x6f
 [<c013bd7b>] do_stop+0x3c/0x6f
 [<c0133d39>] kthread+0xb7/0xbd
 [<c0133c82>] kthread+0x0/0xbd
 [<c01023a1>] kernel_thread_helper+0x5/0xb


Don't know if it is related to something that is fixed in 2.6.6-rc2-mm1.

Will update this list tomorrow with how my tests are faring on 2.6.6-rc2-mm1.


	

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: CPU Hotplug broken -mm5 onwards
  2004-04-21 16:44     ` Srivatsa Vaddagiri
@ 2004-04-22  7:03       ` Andrew Morton
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2004-04-22  7:03 UTC (permalink / raw)
  To: vatsa; +Cc: rusty, mingo, nickpiggin, linux-kernel, lhcs-devel

Srivatsa Vaddagiri <vatsa@in.ibm.com> wrote:
>
> Anyway patch below addresses the above races. Its against 2.6.6-rc2-mm1
>  and has been tested on a 4way Intel Pentium SMP m/c. I have added a
>  cpu_is_offlince check in migration_thread(). If that is true, migration
>  thread stop processing and just waits for kthread_stop.

I'll take the stunned silence as agreement ;)

It needed a warning fixup:

kernel/sched.c: In function `migration_call':
kernel/sched.c:3648: warning: unused variable `tmp'
kernel/sched.c:3648: warning: unused variable `req'
kernel/sched.c:3647: warning: unused variable `head'

---

 25-akpm/kernel/sched.c |   44 ++++++++++++++++++++++++--------------------
 1 files changed, 24 insertions(+), 20 deletions(-)

diff -puN kernel/sched.c~sched-remove_hotplug_lock_in_sched_migrate_task-warnings kernel/sched.c
--- 25/kernel/sched.c~sched-remove_hotplug_lock_in_sched_migrate_task-warnings	2004-04-21 23:56:43.736745848 -0700
+++ 25-akpm/kernel/sched.c	2004-04-21 23:58:42.628671528 -0700
@@ -3339,8 +3339,6 @@ static int migration_call(struct notifie
 	struct task_struct *p;
 	struct runqueue *rq;
 	unsigned long flags;
-	struct list_head *head;
-	migration_req_t *req, *tmp;
 
 	switch (action) {
 	case CPU_UP_PREPARE:
@@ -3363,25 +3361,31 @@ static int migration_call(struct notifie
 		/* Unbind it from offline cpu so it can run.  Fall thru. */
 		kthread_bind(cpu_rq(cpu)->migration_thread,smp_processor_id());
 	case CPU_DEAD:
-		rq = cpu_rq(cpu);
-		head = &rq->migration_queue;
-		spin_lock_irq(&rq->lock);
-		list_for_each_entry_safe(req, tmp, head, list) {
-
-			BUG_ON(req->type != REQ_MOVE_TASK);
-
-			list_del_init(&req->list);
-
-			/* No need to migrate the task in the request. It would
-	 		 * have already been migrated (maybe to a different
-			 * CPU). Just wake up the requestor.
-			 */
-			complete(&req->done);
+		{
+			struct list_head *head;
+			migration_req_t *req, *tmp;
+
+			rq = cpu_rq(cpu);
+			head = &rq->migration_queue;
+			spin_lock_irq(&rq->lock);
+			list_for_each_entry_safe(req, tmp, head, list) {
+
+				BUG_ON(req->type != REQ_MOVE_TASK);
+
+				list_del_init(&req->list);
+
+				/*
+				 * No need to migrate the task in the request.
+				 * It would have already been migrated (maybe to
+				 * a different CPU). Just wake up the requestor
+				 */
+				complete(&req->done);
+			}
+			spin_unlock_irq(&rq->lock);
+			kthread_stop(cpu_rq(cpu)->migration_thread);
+			cpu_rq(cpu)->migration_thread = NULL;
+	 		BUG_ON(cpu_rq(cpu)->nr_running != 0);
 		}
-		spin_unlock_irq(&rq->lock);
-		kthread_stop(cpu_rq(cpu)->migration_thread);
-		cpu_rq(cpu)->migration_thread = NULL;
- 		BUG_ON(cpu_rq(cpu)->nr_running != 0);
  		break;
 #endif
 	}

_


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2004-04-22  7:03 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-04-18 17:06 CPU Hotplug broken -mm5 onwards Srivatsa Vaddagiri
2004-04-19  3:34 ` Nick Piggin
2004-04-19 12:58   ` [lhcs-devel] " Srivatsa Vaddagiri
2004-04-19 22:55     ` Nick Piggin
2004-04-19 23:07       ` Rusty Russell
     [not found] ` <20040421023650.24b9f85a.akpm@osdl.org>
     [not found]   ` <20040421095939.GB10767@in.ibm.com>
2004-04-21 16:44     ` Srivatsa Vaddagiri
2004-04-22  7:03       ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).