* [LOCKDEP SPLAT 4.14-rc1] Conflict between smpboot thread create completions and get_online_cpus
From: Steven Rostedt @ 2017-09-18 19:24 UTC
  To: Thomas Gleixner; +Cc: LKML, Peter Zijlstra, Ingo Molnar, H. Peter Anvin

I hit the following lockdep splat:

 ======================================================
 WARNING: possible circular locking dependency detected
 4.14.0-rc1-test+ #99 Not tainted
 ------------------------------------------------------
 watchdog/3/27 is trying to acquire lock:
  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8100ea07>] release_ds_buffers+0x27/0xb0
 
 but now in release context of a crosslock acquired at the following:
  ((complete)&self->parked){+.+.}, at: [<ffffffff810ca616>] kthread_park+0x46/0x60
 
 which lock already depends on the new lock.
 
 
 the existing dependency chain (in reverse order) is:
 
 -> #2 ((complete)&self->parked){+.+.}:
        __lock_acquire+0x10af/0x1110
        lock_acquire+0xea/0x1f0
        wait_for_completion+0x52/0x1a0
        kthread_park+0x46/0x60
        __smpboot_create_thread.part.3+0x7d/0xf0
        smpboot_register_percpu_thread_cpumask+0xb4/0x130
        spawn_ksoftirqd+0x3b/0x45
        do_one_initcall+0x43/0x171
        kernel_init_freeable+0x134/0x270
        kernel_init+0xe/0x101
        ret_from_fork+0x2a/0x40
 
 -> #1 (smpboot_threads_lock){+.+.}:
        __lock_acquire+0x10af/0x1110
        lock_acquire+0xea/0x1f0
        __mutex_lock+0x86/0x990
        mutex_lock_nested+0x1b/0x20
        smpboot_register_percpu_thread_cpumask+0x59/0x130
        spawn_ksoftirqd+0x3b/0x45
        do_one_initcall+0x43/0x171
        kernel_init_freeable+0x134/0x270
        kernel_init+0xe/0x101
        ret_from_fork+0x2a/0x40
 
 -> #0 (cpu_hotplug_lock.rw_sem){++++}:
        cpus_read_lock+0x57/0xc0
        release_ds_buffers+0x27/0xb0
        x86_release_hardware+0x8f/0xa0
        hw_perf_event_destroy+0xe/0x20
        _free_event+0xca/0x3d0
 
 other info that might help us debug this:
 
 Chain exists of:
   cpu_hotplug_lock.rw_sem --> smpboot_threads_lock --> (complete)&self->parked
 
  Possible unsafe locking scenario by crosslock:
 
        CPU0                    CPU1
        ----                    ----
   lock(smpboot_threads_lock);
   lock((complete)&self->parked);
                                lock(cpu_hotplug_lock.rw_sem);
                                unlock((complete)&self->parked);
 
  *** DEADLOCK ***
 
 1 lock held by watchdog/3/27:
  #0:  (&x->wait){....}, at: [<ffffffff810f59bd>] complete+0x1d/0x60
 
 stack backtrace:
 CPU: 3 PID: 27 Comm: watchdog/3 Not tainted 4.14.0-rc1-test+ #99
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
 Call Trace:
  dump_stack+0x86/0xcf
  print_circular_bug+0x209/0x300
  check_prev_add+0x3be/0x700
  ? trace_event_raw_event_lock+0xf0/0xf0
  lock_commit_crosslock+0x439/0x5b0
  ? lock_commit_crosslock+0x439/0x5b0
  complete+0x29/0x60
  ? __kthread_parkme+0x45/0xa0
  __kthread_parkme+0x71/0xa0
  ? smpboot_thread_fn+0x2d/0x280
  kthread_parkme+0x24/0x40
  smpboot_thread_fn+0xac/0x280
  kthread+0x152/0x190
  ? sort_range+0x30/0x30
  ? kthread_create_on_node+0x40/0x40
  ret_from_fork+0x2a/0x40
 NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.


And adding some debugging where the lockdep splat happened:

 watchdog-27      3.... 764857us : cpus_read_lock+0x5/0xc0: lock cpu
 watchdog-27      3.... 764858us : <stack trace>
  => x86_release_hardware+0x8f/0xa0
  => hw_perf_event_destroy+0xe/0x20
  => _free_event+0xca/0x3d0
  => put_event+0x14/0x20
  => perf_event_release_kernel+0x236/0x330
  => watchdog_nmi_disable+0x49/0x70
  => watchdog_disable+0x4d/0x60
  => smpboot_thread_fn+0x138/0x280
  => kthread+0x152/0x190
  => ret_from_fork+0x2a/0x40
 watchdog-27      3.... 764917us : cpus_read_unlock+0x5/0xa0: unlock cpu
 watchdog-27      3d..1 764918us : lock_commit_crosslock+0xc1/0x5b0: xhlock=132

I found that the following chain is the issue:

Creating a per-cpu smpboot hotplug thread:

	mutex_lock(&smpboot_threads_lock);
	
	tsk = kthread_create_on_cpu(smpboot_thread_fn, td, cpu,
				    ht->thread_comm);
	kthread_park(tsk);

		wait_for_completion(&kthread->parked);
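
For reference, here is a condensed sketch of that creation path as I read it
from the #1 and #2 stacks in the splat (simplified, not verbatim kernel
source). The get_online_cpus() taken around it is what gives lockdep the
cpu_hotplug_lock --> smpboot_threads_lock dependency:

	smpboot_register_percpu_thread_cpumask()
		get_online_cpus();			/* cpu_hotplug_lock.rw_sem (read) */
		mutex_lock(&smpboot_threads_lock);
		__smpboot_create_thread()
			kthread_create_on_cpu(smpboot_thread_fn, td, cpu, ht->thread_comm);
			kthread_park(tsk)
				wait_for_completion(&kthread->parked);
		mutex_unlock(&smpboot_threads_lock);
		put_online_cpus();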


In the meantime, that thread is doing:

	kthread() {
		smpboot_thread_fn() {
			watchdog_disable() {
				watchdog_nmi_disable() {
					perf_event_release_kernel() {
						put_event() {
							_free_event() {
								event->destroy() {
								(hw_perf_event_destroy())
									x86_release_hardware() {
										release_ds_buffers() {
											get_online_cpus();
											put_online_cpus();
										}
									}
								}
							}
						}
					}
				}
			}
			kthread_parkme() {
				complete(&self->parked);


Or the abridged version:

	creating a task under smpboot_threads_lock and then calling wait_for_completion()

	having a case where get_online_cpus() is taken before smpboot_threads_lock

	Then the task you are waiting for calls get_online_cpus() before doing the complete().

This creates the possibility of a deadlock, because we can have:

	TASK1		TASK2		TASK3
	-----		-----		-----
  mutex_lock(&smpboot_threads_lock);
  wait_for_completion();
  /* blocked */

		    get_online_cpus();
		    mutex_lock(&smpboot_threads_lock);
		    /* blocked */

				    get_online_cpus();
				    /* blocked */

				    /* Still needs to do: */

				    put_online_cpus();

				    complete();
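
To make that concrete, below is a minimal userspace sketch (not kernel code,
all names are illustrative) of one way this ordering can actually wedge. It
assumes a hotplug "writer" is also pending, so that a new reader of the
hotplug lock blocks behind it; the glibc writer-preference rwlock attribute
is used as a rough stand-in for that percpu-rwsem behaviour:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t hotplug;	/* stand-in for cpu_hotplug_lock */
static pthread_mutex_t smpboot = PTHREAD_MUTEX_INITIALIZER;	/* smpboot_threads_lock */
static pthread_mutex_t park_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t parked_cv = PTHREAD_COND_INITIALIZER;	/* &self->parked */
static int parked;

/* The newly created per-cpu thread (TASK3 above). */
static void *percpu_thread(void *arg)
{
	(void)arg;
	sleep(1);			/* let the hotplug "writer" queue up first */
	pthread_rwlock_rdlock(&hotplug);/* get_online_cpus(): blocks behind the
					 * pending writer, so we never park */
	pthread_rwlock_unlock(&hotplug);/* put_online_cpus() */

	pthread_mutex_lock(&park_lock);	/* complete(&self->parked) */
	parked = 1;
	pthread_cond_signal(&parked_cv);
	pthread_mutex_unlock(&park_lock);
	return NULL;
}

/* Some other task doing a CPU hotplug operation (the write side). */
static void *hotplug_writer(void *arg)
{
	(void)arg;
	pthread_rwlock_wrlock(&hotplug);/* waits for main()'s read lock */
	pthread_rwlock_unlock(&hotplug);
	return NULL;
}

/* main() plays smpboot_register_percpu_thread_cpumask() (TASK1/TASK2). */
int main(void)
{
	pthread_rwlockattr_t attr;
	pthread_t t1, t2;

	/* Make the rwlock block new readers once a writer is waiting,
	 * roughly like the kernel's percpu rwsem. */
	pthread_rwlockattr_init(&attr);
	pthread_rwlockattr_setkind_np(&attr,
			PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
	pthread_rwlock_init(&hotplug, &attr);

	pthread_rwlock_rdlock(&hotplug);	/* get_online_cpus() */
	pthread_mutex_lock(&smpboot);		/* mutex_lock(&smpboot_threads_lock) */

	pthread_create(&t1, NULL, percpu_thread, NULL);	/* kthread_create_on_cpu() */
	pthread_create(&t2, NULL, hotplug_writer, NULL);

	pthread_mutex_lock(&park_lock);		/* kthread_park(): */
	while (!parked)				/* wait_for_completion(&kthread->parked) */
		pthread_cond_wait(&parked_cv, &park_lock);	/* never returns */
	pthread_mutex_unlock(&park_lock);

	pthread_mutex_unlock(&smpboot);		/* never reached */
	pthread_rwlock_unlock(&hotplug);	/* put_online_cpus() */
	puts("no deadlock");

	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}

Built with gcc -pthread, this should hang at the pthread_cond_wait() with all
three threads stuck on each other, which is the same shape as the scenario
above.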

Sadly, this splat happens at boot up and prevents me from testing the
other locking work I'm playing with.

-- Steve

* Re: [LOCKDEP SPLAT 4.14-rc1] Conflict between smpboot thread create completions and get_online_cpus
From: Peter Zijlstra @ 2017-09-21 11:04 UTC
  To: Steven Rostedt; +Cc: Thomas Gleixner, LKML, Ingo Molnar, H. Peter Anvin

On Mon, Sep 18, 2017 at 03:24:59PM -0400, Steven Rostedt wrote:
> And adding some debugging where the lockdep splat happened:
> 
>  watchdog-27      3.... 764857us : cpus_read_lock+0x5/0xc0: lock cpu
>  watchdog-27      3.... 764858us : <stack trace>
>   => x86_release_hardware+0x8f/0xa0
>   => hw_perf_event_destroy+0xe/0x20
>   => _free_event+0xca/0x3d0
>   => put_event+0x14/0x20
>   => perf_event_release_kernel+0x236/0x330
>   => watchdog_nmi_disable+0x49/0x70
>   => watchdog_disable+0x4d/0x60
>   => smpboot_thread_fn+0x138/0x280
>   => kthread+0x152/0x190
>   => ret_from_fork+0x2a/0x40
>  watchdog-27      3.... 764917us : cpus_read_unlock+0x5/0xa0: unlock cpu
>  watchdog-27      3d..1 764918us : lock_commit_crosslock+0xc1/0x5b0: xhlock=132

Yeah, known issue. Thomas has already been prodding at the watchdog
stuff.
