All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] cpu/hotplug: Modify lock status before making cpu hotplug callbacks
@ 2017-06-07 21:13 John Allen
  2017-06-12 14:59 ` Thomas Gleixner
  0 siblings, 1 reply; 2+ messages in thread
From: John Allen @ 2017-06-07 21:13 UTC (permalink / raw)
  To: tglx, linux-kernel; +Cc: Nathan Fontenot, Michael Bringmann, John Allen

A deadlock has been observed during a cpu hot add on powerpc machines.

The situation is as follows:
First, in _cpu_up, we take the cpu_hotplug lock and towards the end of _cpu_up,
we make the cpu hotplug callbacks, one of which is arch_update_cpu_topology.
For most other architectures, making this call while the parent thread has the
cpu_hotplug lock seems harmless as most implementations of
arch_update_cpu_topology appear to be quite simple. However, the powerpc
implementation is significantly more complex and requires us to make a call to
stop_machine which in turn attempts to take the cpu_hotplug lock in
get_online_cpus. We then deadlock as the parent thread is waiting on the child
to complete and the child is waiting on the parent to release the lock.

This solution attempts to resolve the issue by incrementing the cpu_hotplug
refcount and releasing the cpu_hotplug mutex in _cpu_up so that the subsequent
call to get_online_cpus can take the mutex while other invocations of _cpu_up
are still prevented from executing as the refcount is non-zero. From my testing,
this seems to resolve the deadlock, but I'm not sure if there is a reason that
get_online_cpus *should* be locked out at this point.

Previous RFC for an alternate solution from Michael Bringmann with additional
details about the bug can be seen here:
https://patchwork.ozlabs.org/patch/771293/

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
---
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 9ae6fbe..fc19437 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -776,6 +776,10 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 	cpu_hotplug_begin();

 	cpuhp_tasks_frozen = tasks_frozen;
+	
+	cpu_hotplug.active_writer = NULL;
+	atomic_inc(&cpu_hotplug.refcount);
+	mutex_unlock(&cpu_hotplug.lock);

 	prev_state = st->state;
 	st->target = target;
@@ -810,6 +814,9 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 		cpuhp_kick_ap_work(cpu);
 	}

+	put_online_cpus();
+
+	return ret;
 out:
 	cpu_hotplug_done();
 	return ret;
@@ -939,7 +946,14 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
 	 * responsible for bringing it up to the target state.
 	 */
 	target = min((int)target, CPUHP_BRINGUP_CPU);
+
+	cpu_hotplug.active_writer = NULL;
+	atomic_inc(&cpu_hotplug.refcount);
+	mutex_unlock(&cpu_hotplug.lock);
 	ret = cpuhp_up_callbacks(cpu, st, target);
+	put_online_cpus();
+
+	return ret;
 out:
 	cpu_hotplug_done();
 	return ret;

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [RFC] cpu/hotplug: Modify lock status before making cpu hotplug callbacks
  2017-06-07 21:13 [RFC] cpu/hotplug: Modify lock status before making cpu hotplug callbacks John Allen
@ 2017-06-12 14:59 ` Thomas Gleixner
  0 siblings, 0 replies; 2+ messages in thread
From: Thomas Gleixner @ 2017-06-12 14:59 UTC (permalink / raw)
  To: John Allen; +Cc: LKML, Nathan Fontenot, Michael Bringmann, Peter Zijlstra

On Wed, 7 Jun 2017, John Allen wrote:
> A deadlock has been observed during a cpu hot add on powerpc machines.
> 
> The situation is as follows:
> First, in _cpu_up, we take the cpu_hotplug lock and towards the end of _cpu_up,
> we make the cpu hotplug callbacks, one of which is arch_update_cpu_topology.
> For most other architectures, making this call while the parent thread has the
> cpu_hotplug lock seems harmless as most implementations of
> arch_update_cpu_topology appear to be quite simple. However, the powerpc
> implementation is significantly more complex and requires us to make a call to
> stop_machine which in turn attempts to take the cpu_hotplug lock in
> get_online_cpus. We then deadlock as the parent thread is waiting on the child
> to complete and the child is waiting on the parent to release the lock.
> 
> This solution attempts to resolve the issue by incrementing the cpu_hotplug
> refcount and releasing the cpu_hotplug mutex in _cpu_up so that the subsequent
> call to get_online_cpus can take the mutex while other invocations of _cpu_up
> are still prevented from executing as the refcount is non-zero. From my testing,
> this seems to resolve the deadlock, but I'm not sure if there is a reason that
> get_online_cpus *should* be locked out at this point.

This is horrible, really.

We have code pending in -next which reworks the hotplug lock to a per cpu
reader writer semaphore, so this hack is not going to work anymore.

See git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git smp/hotplug

This addresses the issue of calling stomp_machine() from inside a hotplug
locked region, by introducing stop_machine_cpuslocked(). That should solve
your problem nicely.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2017-06-12 14:59 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-07 21:13 [RFC] cpu/hotplug: Modify lock status before making cpu hotplug callbacks John Allen
2017-06-12 14:59 ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.