All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] printk: fix one circular lockdep warning about console_lock
@ 2014-02-11  6:50 jiel
  2014-02-11 18:29 ` Kees Cook
  2014-02-11 21:19 ` Andrew Morton
  0 siblings, 2 replies; 9+ messages in thread
From: jiel @ 2014-02-11  6:50 UTC (permalink / raw)
  To: akpm, joe, tj, fweisbec, davem, keescook
  Cc: linux-kernel, Jane Li, Daniel Vetter

From: Jane Li <jiel@marvell.com>

This patch tries to fix a warning about possible circular locking
dependency.

If do in following sequence:
    enter suspend ->  resume ->  plug-out CPUx (echo 0 > cpux/online)
lockdep will show warning as following:

======================================================
[ INFO: possible circular locking dependency detected ]
3.10.0 #2 Tainted: G           O
-------------------------------------------------------
sh/1271 is trying to acquire lock:
(console_lock){+.+.+.}, at: [<c06ebf7c>] console_cpu_notify+0x20/0x2c
but task is already holding lock:
(cpu_hotplug.lock){+.+.+.}, at: [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:
-> #2 (cpu_hotplug.lock){+.+.+.}:
[<c017bb7c>] lock_acquire+0x98/0x12c
[<c06f5014>] mutex_lock_nested+0x50/0x3d8
[<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
[<c06ebfac>] _cpu_up+0x24/0x154
[<c06ec140>] cpu_up+0x64/0x84
[<c0981834>] smp_init+0x9c/0xd4
[<c0973880>] kernel_init_freeable+0x78/0x1c8
[<c06e7f40>] kernel_init+0x8/0xe4
[<c010eec8>] ret_from_fork+0x14/0x2c

-> #1 (cpu_add_remove_lock){+.+.+.}:
[<c017bb7c>] lock_acquire+0x98/0x12c
[<c06f5014>] mutex_lock_nested+0x50/0x3d8
[<c012b758>] disable_nonboot_cpus+0x8/0xe8
[<c016b83c>] suspend_devices_and_enter+0x214/0x448
[<c016bc54>] pm_suspend+0x1e4/0x284
[<c016bdcc>] try_to_suspend+0xa4/0xbc
[<c0143848>] process_one_work+0x1c4/0x4fc
[<c0143f80>] worker_thread+0x138/0x37c
[<c014aaf8>] kthread+0xa4/0xb0
[<c010eec8>] ret_from_fork+0x14/0x2c

-> #0 (console_lock){+.+.+.}:
[<c017b5d0>] __lock_acquire+0x1b38/0x1b80
[<c017bb7c>] lock_acquire+0x98/0x12c
[<c01288c4>] console_lock+0x54/0x68
[<c06ebf7c>] console_cpu_notify+0x20/0x2c
[<c01501d4>] notifier_call_chain+0x44/0x84
[<c012b448>] __cpu_notify+0x2c/0x48
[<c012b5b0>] cpu_notify_nofail+0x8/0x14
[<c06e81bc>] _cpu_down+0xf4/0x258
[<c06e8344>] cpu_down+0x24/0x40
[<c06e921c>] store_online+0x30/0x74
[<c03b7298>] dev_attr_store+0x18/0x24
[<c025fc5c>] sysfs_write_file+0x16c/0x19c
[<c0207a98>] vfs_write+0xb4/0x190
[<c0207e58>] SyS_write+0x3c/0x70
[<c010ee00>] ret_fast_syscall+0x0/0x48

Chain exists of:
   console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock

Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
lock(cpu_hotplug.lock);
                               lock(cpu_add_remove_lock);
                               lock(cpu_hotplug.lock);
lock(console_lock);
  *** DEADLOCK ***

There are three locks involved in two sequence:
a) pm suspend:
	console_lock (@suspend_console())
	cpu_add_remove_lock (@disable_nonboot_cpus())
	cpu_hotplug.lock (@_cpu_down())
b) Plug-out CPUx:
	cpu_add_remove_lock (@(cpu_down())
	cpu_hotplug.lock (@_cpu_down())
	console_lock (@console_cpu_notify()) => Lockdeps prints warning log.

There should be not real deadlock, as flag of console_suspended can
protect this.

Printk registers cpu hotplug notify function. When CPUx is plug-out/in,
always execute console_lock() and console_unlock(). This patch
modifies that with console_trylock() and console_unlock(). Then use
that instead of the unconditional console_lock/unlock pair to avoid the
warning.

Signed-off-by: Jane Li <jiel@marvell.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 kernel/printk/printk.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b1d255f..e7a0525 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1893,6 +1893,20 @@ void resume_console(void)
 }
 
 /**
+ * console_flush - flush dmesg if console isn't suspended
+ *
+ * console_unlock always flushes the dmesg buffer, so just try to
+ * grab&drop the console lock. If that fails we know that the current
+ * holder will eventually drop the console lock and so flush the dmesg
+ * buffers at the earliest possible time.
+ */
+void console_flush(void)
+{
+	if (console_trylock())
+		console_unlock();
+}
+
+/**
  * console_cpu_notify - print deferred console messages after CPU hotplug
  * @self: notifier struct
  * @action: CPU hotplug event
@@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self,
 	case CPU_DEAD:
 	case CPU_DOWN_FAILED:
 	case CPU_UP_CANCELED:
-		console_lock();
-		console_unlock();
+		console_flush();
 	}
 	return NOTIFY_OK;
 }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] printk: fix one circular lockdep warning about console_lock
  2014-02-11  6:50 [PATCH] printk: fix one circular lockdep warning about console_lock jiel
@ 2014-02-11 18:29 ` Kees Cook
  2014-02-11 21:19 ` Andrew Morton
  1 sibling, 0 replies; 9+ messages in thread
From: Kees Cook @ 2014-02-11 18:29 UTC (permalink / raw)
  To: jiel
  Cc: Andrew Morton, Joe Perches, Tejun Heo,
	Frédéric Weisbecker, David S. Miller, LKML,
	Daniel Vetter

On Mon, Feb 10, 2014 at 10:50 PM,  <jiel@marvell.com> wrote:
> From: Jane Li <jiel@marvell.com>
>
> This patch tries to fix a warning about possible circular locking
> dependency.
>
> If do in following sequence:
>     enter suspend ->  resume ->  plug-out CPUx (echo 0 > cpux/online)
> lockdep will show warning as following:
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.10.0 #2 Tainted: G           O
> -------------------------------------------------------
> sh/1271 is trying to acquire lock:
> (console_lock){+.+.+.}, at: [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> but task is already holding lock:
> (cpu_hotplug.lock){+.+.+.}, at: [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
> -> #2 (cpu_hotplug.lock){+.+.+.}:
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c06f5014>] mutex_lock_nested+0x50/0x3d8
> [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> [<c06ebfac>] _cpu_up+0x24/0x154
> [<c06ec140>] cpu_up+0x64/0x84
> [<c0981834>] smp_init+0x9c/0xd4
> [<c0973880>] kernel_init_freeable+0x78/0x1c8
> [<c06e7f40>] kernel_init+0x8/0xe4
> [<c010eec8>] ret_from_fork+0x14/0x2c
>
> -> #1 (cpu_add_remove_lock){+.+.+.}:
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c06f5014>] mutex_lock_nested+0x50/0x3d8
> [<c012b758>] disable_nonboot_cpus+0x8/0xe8
> [<c016b83c>] suspend_devices_and_enter+0x214/0x448
> [<c016bc54>] pm_suspend+0x1e4/0x284
> [<c016bdcc>] try_to_suspend+0xa4/0xbc
> [<c0143848>] process_one_work+0x1c4/0x4fc
> [<c0143f80>] worker_thread+0x138/0x37c
> [<c014aaf8>] kthread+0xa4/0xb0
> [<c010eec8>] ret_from_fork+0x14/0x2c
>
> -> #0 (console_lock){+.+.+.}:
> [<c017b5d0>] __lock_acquire+0x1b38/0x1b80
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c01288c4>] console_lock+0x54/0x68
> [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> [<c01501d4>] notifier_call_chain+0x44/0x84
> [<c012b448>] __cpu_notify+0x2c/0x48
> [<c012b5b0>] cpu_notify_nofail+0x8/0x14
> [<c06e81bc>] _cpu_down+0xf4/0x258
> [<c06e8344>] cpu_down+0x24/0x40
> [<c06e921c>] store_online+0x30/0x74
> [<c03b7298>] dev_attr_store+0x18/0x24
> [<c025fc5c>] sysfs_write_file+0x16c/0x19c
> [<c0207a98>] vfs_write+0xb4/0x190
> [<c0207e58>] SyS_write+0x3c/0x70
> [<c010ee00>] ret_fast_syscall+0x0/0x48
>
> Chain exists of:
>    console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock
>
> Possible unsafe locking scenario:
>        CPU0                    CPU1
>        ----                    ----
> lock(cpu_hotplug.lock);
>                                lock(cpu_add_remove_lock);
>                                lock(cpu_hotplug.lock);
> lock(console_lock);
>   *** DEADLOCK ***
>
> There are three locks involved in two sequence:
> a) pm suspend:
>         console_lock (@suspend_console())
>         cpu_add_remove_lock (@disable_nonboot_cpus())
>         cpu_hotplug.lock (@_cpu_down())
> b) Plug-out CPUx:
>         cpu_add_remove_lock (@(cpu_down())
>         cpu_hotplug.lock (@_cpu_down())
>         console_lock (@console_cpu_notify()) => Lockdeps prints warning log.
>
> There should be not real deadlock, as flag of console_suspended can
> protect this.
>
> Printk registers cpu hotplug notify function. When CPUx is plug-out/in,
> always execute console_lock() and console_unlock(). This patch
> modifies that with console_trylock() and console_unlock(). Then use
> that instead of the unconditional console_lock/unlock pair to avoid the
> warning.
>
> Signed-off-by: Jane Li <jiel@marvell.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  kernel/printk/printk.c |   17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index b1d255f..e7a0525 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1893,6 +1893,20 @@ void resume_console(void)
>  }
>
>  /**
> + * console_flush - flush dmesg if console isn't suspended
> + *
> + * console_unlock always flushes the dmesg buffer, so just try to
> + * grab&drop the console lock. If that fails we know that the current
> + * holder will eventually drop the console lock and so flush the dmesg
> + * buffers at the earliest possible time.
> + */
> +void console_flush(void)
> +{
> +       if (console_trylock())
> +               console_unlock();
> +}
> +
> +/**
>   * console_cpu_notify - print deferred console messages after CPU hotplug
>   * @self: notifier struct
>   * @action: CPU hotplug event
> @@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self,
>         case CPU_DEAD:
>         case CPU_DOWN_FAILED:
>         case CPU_UP_CANCELED:
> -               console_lock();
> -               console_unlock();
> +               console_flush();
>         }
>         return NOTIFY_OK;
>  }
> --
> 1.7.9.5
>

Seems reasonable to me.

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] printk: fix one circular lockdep warning about console_lock
  2014-02-11  6:50 [PATCH] printk: fix one circular lockdep warning about console_lock jiel
  2014-02-11 18:29 ` Kees Cook
@ 2014-02-11 21:19 ` Andrew Morton
  2014-02-11 21:35   ` Daniel Vetter
                     ` (3 more replies)
  1 sibling, 4 replies; 9+ messages in thread
From: Andrew Morton @ 2014-02-11 21:19 UTC (permalink / raw)
  To: jiel
  Cc: joe, tj, fweisbec, davem, keescook, linux-kernel, Daniel Vetter,
	Jan Kara

On Tue, 11 Feb 2014 14:50:00 +0800 <jiel@marvell.com> wrote:

> From: Jane Li <jiel@marvell.com>
> 
> This patch tries to fix a warning about possible circular locking
> dependency.
> 
> If do in following sequence:
>     enter suspend ->  resume ->  plug-out CPUx (echo 0 > cpux/online)
> lockdep will show warning as following:
> 
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.10.0 #2 Tainted: G           O
> -------------------------------------------------------
> sh/1271 is trying to acquire lock:
> (console_lock){+.+.+.}, at: [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> but task is already holding lock:
> (cpu_hotplug.lock){+.+.+.}, at: [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> -> #2 (cpu_hotplug.lock){+.+.+.}:
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c06f5014>] mutex_lock_nested+0x50/0x3d8
> [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> [<c06ebfac>] _cpu_up+0x24/0x154
> [<c06ec140>] cpu_up+0x64/0x84
> [<c0981834>] smp_init+0x9c/0xd4
> [<c0973880>] kernel_init_freeable+0x78/0x1c8
> [<c06e7f40>] kernel_init+0x8/0xe4
> [<c010eec8>] ret_from_fork+0x14/0x2c
> 
> -> #1 (cpu_add_remove_lock){+.+.+.}:
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c06f5014>] mutex_lock_nested+0x50/0x3d8
> [<c012b758>] disable_nonboot_cpus+0x8/0xe8
> [<c016b83c>] suspend_devices_and_enter+0x214/0x448
> [<c016bc54>] pm_suspend+0x1e4/0x284
> [<c016bdcc>] try_to_suspend+0xa4/0xbc
> [<c0143848>] process_one_work+0x1c4/0x4fc
> [<c0143f80>] worker_thread+0x138/0x37c
> [<c014aaf8>] kthread+0xa4/0xb0
> [<c010eec8>] ret_from_fork+0x14/0x2c
> 
> -> #0 (console_lock){+.+.+.}:
> [<c017b5d0>] __lock_acquire+0x1b38/0x1b80
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c01288c4>] console_lock+0x54/0x68
> [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> [<c01501d4>] notifier_call_chain+0x44/0x84
> [<c012b448>] __cpu_notify+0x2c/0x48
> [<c012b5b0>] cpu_notify_nofail+0x8/0x14
> [<c06e81bc>] _cpu_down+0xf4/0x258
> [<c06e8344>] cpu_down+0x24/0x40
> [<c06e921c>] store_online+0x30/0x74
> [<c03b7298>] dev_attr_store+0x18/0x24
> [<c025fc5c>] sysfs_write_file+0x16c/0x19c
> [<c0207a98>] vfs_write+0xb4/0x190
> [<c0207e58>] SyS_write+0x3c/0x70
> [<c010ee00>] ret_fast_syscall+0x0/0x48
> 
> Chain exists of:
>    console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock
> 
> Possible unsafe locking scenario:
>        CPU0                    CPU1
>        ----                    ----
> lock(cpu_hotplug.lock);
>                                lock(cpu_add_remove_lock);
>                                lock(cpu_hotplug.lock);
> lock(console_lock);
>   *** DEADLOCK ***

These traces hurt my brain.

> There are three locks involved in two sequence:
> a) pm suspend:
> 	console_lock (@suspend_console())
> 	cpu_add_remove_lock (@disable_nonboot_cpus())
> 	cpu_hotplug.lock (@_cpu_down())

But but but.  suspend_console() releases console_sem again.  So the
sequence is actually

 	down(&console_sem) (@suspend_console())
 	up(&console_sem) (@suspend_console())
 	cpu_add_remove_lock (@disable_nonboot_cpus())
 	cpu_hotplug.lock (@_cpu_down())

So console_sem *doesn't* nest outside cpu_add_remove_lock and
cpu_hotplug.lock.

> b) Plug-out CPUx:
> 	cpu_add_remove_lock (@(cpu_down())
> 	cpu_hotplug.lock (@_cpu_down())
> 	console_lock (@console_cpu_notify()) => Lockdeps prints warning log.
> 
> There should be not real deadlock, as flag of console_suspended can
> protect this.

console_lock() does down(&console_sem) *before* testing
console_suspended, so I don't understand this sentence - a more
detailed description would help.

> Printk registers cpu hotplug notify function. When CPUx is plug-out/in,
> always execute console_lock() and console_unlock(). This patch
> modifies that with console_trylock() and console_unlock(). Then use
> that instead of the unconditional console_lock/unlock pair to avoid the
> warning.
> 
> ...
>
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1893,6 +1893,20 @@ void resume_console(void)
>  }
>  
>  /**
> + * console_flush - flush dmesg if console isn't suspended
> + *
> + * console_unlock always flushes the dmesg buffer, so just try to
> + * grab&drop the console lock. If that fails we know that the current
> + * holder will eventually drop the console lock and so flush the dmesg
> + * buffers at the earliest possible time.
> + */

The comment should describe why we added this code, please: talk about
cpu_hotplug.lock and console_lock.

> +void console_flush(void)
> +{
> +	if (console_trylock())
> +		console_unlock();
> +}
> +
> +/**
>   * console_cpu_notify - print deferred console messages after CPU hotplug
>   * @self: notifier struct
>   * @action: CPU hotplug event
> @@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self,
>  	case CPU_DEAD:
>  	case CPU_DOWN_FAILED:
>  	case CPU_UP_CANCELED:
> -		console_lock();
> -		console_unlock();
> +		console_flush();
>  	}
>  	return NOTIFY_OK;

Well, this is a bit hacky and makes the already-far-too-complex code
even more complex.  If it is indeed the case that the deadlock cannot
really occur then let's try to find a way of suppressing the lockdep
warning without making runtime changes.

What I'm struggling with is what *should* the ranking of these locks be?
>From a conceptual high-level design standpoint, which is the
"innermost" lock?  I tend to think that it is console_lock, because
blocking CPU hotplug is a quite high-level operation.

But console_lock is such a kooky special-case in the way it is used to
control the printk corking that it is hard to take general rules and
apply them here.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] printk: fix one circular lockdep warning about console_lock
  2014-02-11 21:19 ` Andrew Morton
@ 2014-02-11 21:35   ` Daniel Vetter
  2014-02-13 10:48   ` Jan Kara
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Daniel Vetter @ 2014-02-11 21:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jane Li, Joe Perches, Tejun Heo, fweisbec, David Miller,
	Kees Cook, Linux Kernel Mailing List, Jan Kara

Aside: I've suggested this fix (but was too lazy to write the actual
patch), hence also my sob on it.

On Tue, Feb 11, 2014 at 10:19 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1893,6 +1893,20 @@ void resume_console(void)
>>  }
>>
>>  /**
>> + * console_flush - flush dmesg if console isn't suspended
>> + *
>> + * console_unlock always flushes the dmesg buffer, so just try to
>> + * grab&drop the console lock. If that fails we know that the current
>> + * holder will eventually drop the console lock and so flush the dmesg
>> + * buffers at the earliest possible time.
>> + */
>
> The comment should describe why we added this code, please: talk about
> cpu_hotplug.lock and console_lock.

I've suggested this since it's the generic way to flush out dmesg. The
printk functions also use this trick, but with the complication that
console_trylock_for_printk also drops the logbuf log. So imo this is a
general helper to flush the console buffer in odd circumstances and
not specific to the cpu lockdep issue at hand. At least my idea was
that this could be useful to untangle other issues around
console_lock.

>> +void console_flush(void)
>> +{
>> +     if (console_trylock())
>> +             console_unlock();
>> +}
>> +
>> +/**
>>   * console_cpu_notify - print deferred console messages after CPU hotplug
>>   * @self: notifier struct
>>   * @action: CPU hotplug event
>> @@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self,
>>       case CPU_DEAD:
>>       case CPU_DOWN_FAILED:
>>       case CPU_UP_CANCELED:
>> -             console_lock();
>> -             console_unlock();
>> +             console_flush();
>>       }
>>       return NOTIFY_OK;
>
> Well, this is a bit hacky and makes the already-far-too-complex code
> even more complex.  If it is indeed the case that the deadlock cannot
> really occur then let's try to find a way of suppressing the lockdep
> warning without making runtime changes.
>
> What I'm struggling with is what *should* the ranking of these locks be?
> From a conceptual high-level design standpoint, which is the
> "innermost" lock?  I tend to think that it is console_lock, because
> blocking CPU hotplug is a quite high-level operation.
>
> But console_lock is such a kooky special-case in the way it is used to
> control the printk corking that it is hard to take general rules and
> apply them here.

console_lock is one giant disaster imo, at least from my experience
with it's interactions in fbdev/drm/gpu code - due to bonghits around
how the fbcon code is structured we need to run the entire initial
modeset under the console_lock, which means if anything goes wrong we
can't even dump to net/serial console.

Hence why I've proposed the trylock "fix", prettied up in the
console_flush helper to duct-tape over the lockdep splat at hand. It
actually achieves exactly what the code sets out to do (flush dmesg as
soon as possible) without extending the reach of insane console_lock
interactions into even more code. Because that way lies madness, at
least when I look at our vain attempts to make fbcon and drm work
together properly ...

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] printk: fix one circular lockdep warning about console_lock
  2014-02-11 21:19 ` Andrew Morton
  2014-02-11 21:35   ` Daniel Vetter
@ 2014-02-13 10:48   ` Jan Kara
  2014-02-14  7:42   ` Jane Li
  2014-03-19  3:08   ` Jane Li
  3 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2014-02-13 10:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: jiel, joe, tj, fweisbec, davem, keescook, linux-kernel,
	Daniel Vetter, Jan Kara

On Tue 11-02-14 13:19:27, Andrew Morton wrote:
> On Tue, 11 Feb 2014 14:50:00 +0800 <jiel@marvell.com> wrote:
> 
> > From: Jane Li <jiel@marvell.com>
> > 
> > This patch tries to fix a warning about possible circular locking
> > dependency.
> > 
> > If do in following sequence:
> >     enter suspend ->  resume ->  plug-out CPUx (echo 0 > cpux/online)
> > lockdep will show warning as following:
> > 
> > ======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 3.10.0 #2 Tainted: G           O
> > -------------------------------------------------------
> > sh/1271 is trying to acquire lock:
> > (console_lock){+.+.+.}, at: [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> > but task is already holding lock:
> > (cpu_hotplug.lock){+.+.+.}, at: [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> > which lock already depends on the new lock.
> > 
> > the existing dependency chain (in reverse order) is:
> > -> #2 (cpu_hotplug.lock){+.+.+.}:
> > [<c017bb7c>] lock_acquire+0x98/0x12c
> > [<c06f5014>] mutex_lock_nested+0x50/0x3d8
> > [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> > [<c06ebfac>] _cpu_up+0x24/0x154
> > [<c06ec140>] cpu_up+0x64/0x84
> > [<c0981834>] smp_init+0x9c/0xd4
> > [<c0973880>] kernel_init_freeable+0x78/0x1c8
> > [<c06e7f40>] kernel_init+0x8/0xe4
> > [<c010eec8>] ret_from_fork+0x14/0x2c
> > 
> > -> #1 (cpu_add_remove_lock){+.+.+.}:
> > [<c017bb7c>] lock_acquire+0x98/0x12c
> > [<c06f5014>] mutex_lock_nested+0x50/0x3d8
> > [<c012b758>] disable_nonboot_cpus+0x8/0xe8
> > [<c016b83c>] suspend_devices_and_enter+0x214/0x448
> > [<c016bc54>] pm_suspend+0x1e4/0x284
> > [<c016bdcc>] try_to_suspend+0xa4/0xbc
> > [<c0143848>] process_one_work+0x1c4/0x4fc
> > [<c0143f80>] worker_thread+0x138/0x37c
> > [<c014aaf8>] kthread+0xa4/0xb0
> > [<c010eec8>] ret_from_fork+0x14/0x2c
> > 
> > -> #0 (console_lock){+.+.+.}:
> > [<c017b5d0>] __lock_acquire+0x1b38/0x1b80
> > [<c017bb7c>] lock_acquire+0x98/0x12c
> > [<c01288c4>] console_lock+0x54/0x68
> > [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> > [<c01501d4>] notifier_call_chain+0x44/0x84
> > [<c012b448>] __cpu_notify+0x2c/0x48
> > [<c012b5b0>] cpu_notify_nofail+0x8/0x14
> > [<c06e81bc>] _cpu_down+0xf4/0x258
> > [<c06e8344>] cpu_down+0x24/0x40
> > [<c06e921c>] store_online+0x30/0x74
> > [<c03b7298>] dev_attr_store+0x18/0x24
> > [<c025fc5c>] sysfs_write_file+0x16c/0x19c
> > [<c0207a98>] vfs_write+0xb4/0x190
> > [<c0207e58>] SyS_write+0x3c/0x70
> > [<c010ee00>] ret_fast_syscall+0x0/0x48
> > 
> > Chain exists of:
> >    console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock
> > 
> > Possible unsafe locking scenario:
> >        CPU0                    CPU1
> >        ----                    ----
> > lock(cpu_hotplug.lock);
> >                                lock(cpu_add_remove_lock);
> >                                lock(cpu_hotplug.lock);
> > lock(console_lock);
> >   *** DEADLOCK ***
> 
> These traces hurt my brain.
> 
> > There are three locks involved in two sequence:
> > a) pm suspend:
> > 	console_lock (@suspend_console())
> > 	cpu_add_remove_lock (@disable_nonboot_cpus())
> > 	cpu_hotplug.lock (@_cpu_down())
> 
> But but but.  suspend_console() releases console_sem again.  So the
> sequence is actually
> 
>  	down(&console_sem) (@suspend_console())
>  	up(&console_sem) (@suspend_console())
>  	cpu_add_remove_lock (@disable_nonboot_cpus())
>  	cpu_hotplug.lock (@_cpu_down())
> 
> So console_sem *doesn't* nest outside cpu_add_remove_lock and
> cpu_hotplug.lock.
  Exactly. My take would be that the lockdep annotation of console_sem is
just missing
  mutex_release(&console_lock_dep_map, 1, _RET_IP_);
in suspend_console() and similar counterpart in resume_console(). We are
doing the annotation by hand and apparently this got missed.

...
> > +void console_flush(void)
> > +{
> > +	if (console_trylock())
> > +		console_unlock();
> > +}
> > +
> > +/**
> >   * console_cpu_notify - print deferred console messages after CPU hotplug
> >   * @self: notifier struct
> >   * @action: CPU hotplug event
> > @@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self,
> >  	case CPU_DEAD:
> >  	case CPU_DOWN_FAILED:
> >  	case CPU_UP_CANCELED:
> > -		console_lock();
> > -		console_unlock();
> > +		console_flush();
> >  	}
> >  	return NOTIFY_OK;
> 
> Well, this is a bit hacky and makes the already-far-too-complex code
> even more complex.  If it is indeed the case that the deadlock cannot
> really occur then let's try to find a way of suppressing the lockdep
> warning without making runtime changes.
> 
> What I'm struggling with is what *should* the ranking of these locks be?
> From a conceptual high-level design standpoint, which is the
> "innermost" lock?  I tend to think that it is console_lock, because
> blocking CPU hotplug is a quite high-level operation.
> 
> But console_lock is such a kooky special-case in the way it is used to
> control the printk corking that it is hard to take general rules and
> apply them here.
  Currently I think it should be pretty much the innermost lock for
anything except for console driver special locks.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] printk: fix one circular lockdep warning about console_lock
  2014-02-11 21:19 ` Andrew Morton
  2014-02-11 21:35   ` Daniel Vetter
  2014-02-13 10:48   ` Jan Kara
@ 2014-02-14  7:42   ` Jane Li
  2014-03-19  3:08   ` Jane Li
  3 siblings, 0 replies; 9+ messages in thread
From: Jane Li @ 2014-02-14  7:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: joe, tj, fweisbec, davem, keescook, linux-kernel, Daniel Vetter,
	Jan Kara

On 02/12/2014 05:19 AM, Andrew Morton wrote:

>> There are three locks involved in two sequence:
>> a) pm suspend:
>> 	console_lock (@suspend_console())
>> 	cpu_add_remove_lock (@disable_nonboot_cpus())
>> 	cpu_hotplug.lock (@_cpu_down())
> But but but.  suspend_console() releases console_sem again.

Console_lock does not refer to console_sem but console_lock_dep_map. Its name is console_lock. Suspend_console() does not release console_lock_dep_map.

> So the
> sequence is actually
>
>   	down(&console_sem) (@suspend_console())

	   acquire(&console_lock_dep_map) (&suspend_console())

>   	up(&console_sem) (@suspend_console())
>   	cpu_add_remove_lock (@disable_nonboot_cpus())
>   	cpu_hotplug.lock (@_cpu_down())
>
> So console_sem *doesn't* nest outside cpu_add_remove_lock and
> cpu_hotplug.lock.

Add console_lock in the sequence.

>
>> b) Plug-out CPUx:
>> 	cpu_add_remove_lock (@(cpu_down())
>> 	cpu_hotplug.lock (@_cpu_down())
>> 	console_lock (@console_cpu_notify()) => Lockdeps prints warning log.
>>
>> There should be not real deadlock, as flag of console_suspended can
>> protect this.
> console_lock() does down(&console_sem) *before* testing
> console_suspended, so I don't understand this sentence - a more
> detailed description would help.

After suspend_console(), console_sem is unlocked, but console_lock_dep_map has been acquired.


Best Regards,
Jane


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] printk: fix one circular lockdep warning about console_lock
  2014-02-11 21:19 ` Andrew Morton
                     ` (2 preceding siblings ...)
  2014-02-14  7:42   ` Jane Li
@ 2014-03-19  3:08   ` Jane Li
  2014-03-19 10:00     ` Jan Kara
  3 siblings, 1 reply; 9+ messages in thread
From: Jane Li @ 2014-03-19  3:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: joe, tj, fweisbec, davem, keescook, linux-kernel, Daniel Vetter,
	Jan Kara

On 02/12/2014 05:19 AM, Andrew Morton wrote:

> On Tue, 11 Feb 2014 14:50:00 +0800<jiel@marvell.com>  wrote:
>
>> From: Jane Li<jiel@marvell.com>
>>
>> This patch tries to fix a warning about possible circular locking
>> dependency.
>>
>> If do in following sequence:
>>      enter suspend ->  resume ->  plug-out CPUx (echo 0 > cpux/online)
>> lockdep will show warning as following:
>>
>> ======================================================
>> [ INFO: possible circular locking dependency detected ]
>> 3.10.0 #2 Tainted: G           O
>> -------------------------------------------------------
>> sh/1271 is trying to acquire lock:
>> (console_lock){+.+.+.}, at: [<c06ebf7c>] console_cpu_notify+0x20/0x2c
>> but task is already holding lock:
>> (cpu_hotplug.lock){+.+.+.}, at: [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
>> which lock already depends on the new lock.
>>
>> the existing dependency chain (in reverse order) is:
>> -> #2 (cpu_hotplug.lock){+.+.+.}:
>> [<c017bb7c>] lock_acquire+0x98/0x12c
>> [<c06f5014>] mutex_lock_nested+0x50/0x3d8
>> [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
>> [<c06ebfac>] _cpu_up+0x24/0x154
>> [<c06ec140>] cpu_up+0x64/0x84
>> [<c0981834>] smp_init+0x9c/0xd4
>> [<c0973880>] kernel_init_freeable+0x78/0x1c8
>> [<c06e7f40>] kernel_init+0x8/0xe4
>> [<c010eec8>] ret_from_fork+0x14/0x2c
>>
>> -> #1 (cpu_add_remove_lock){+.+.+.}:
>> [<c017bb7c>] lock_acquire+0x98/0x12c
>> [<c06f5014>] mutex_lock_nested+0x50/0x3d8
>> [<c012b758>] disable_nonboot_cpus+0x8/0xe8
>> [<c016b83c>] suspend_devices_and_enter+0x214/0x448
>> [<c016bc54>] pm_suspend+0x1e4/0x284
>> [<c016bdcc>] try_to_suspend+0xa4/0xbc
>> [<c0143848>] process_one_work+0x1c4/0x4fc
>> [<c0143f80>] worker_thread+0x138/0x37c
>> [<c014aaf8>] kthread+0xa4/0xb0
>> [<c010eec8>] ret_from_fork+0x14/0x2c
>>
>> -> #0 (console_lock){+.+.+.}:
>> [<c017b5d0>] __lock_acquire+0x1b38/0x1b80
>> [<c017bb7c>] lock_acquire+0x98/0x12c
>> [<c01288c4>] console_lock+0x54/0x68
>> [<c06ebf7c>] console_cpu_notify+0x20/0x2c
>> [<c01501d4>] notifier_call_chain+0x44/0x84
>> [<c012b448>] __cpu_notify+0x2c/0x48
>> [<c012b5b0>] cpu_notify_nofail+0x8/0x14
>> [<c06e81bc>] _cpu_down+0xf4/0x258
>> [<c06e8344>] cpu_down+0x24/0x40
>> [<c06e921c>] store_online+0x30/0x74
>> [<c03b7298>] dev_attr_store+0x18/0x24
>> [<c025fc5c>] sysfs_write_file+0x16c/0x19c
>> [<c0207a98>] vfs_write+0xb4/0x190
>> [<c0207e58>] SyS_write+0x3c/0x70
>> [<c010ee00>] ret_fast_syscall+0x0/0x48
>>
>> Chain exists of:
>>     console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock
>>
>> Possible unsafe locking scenario:
>>         CPU0                    CPU1
>>         ----                    ----
>> lock(cpu_hotplug.lock);
>>                                 lock(cpu_add_remove_lock);
>>                                 lock(cpu_hotplug.lock);
>> lock(console_lock);
>>    *** DEADLOCK ***
> These traces hurt my brain.
>
>> There are three locks involved in two sequence:
>> a) pm suspend:
>> 	console_lock (@suspend_console())
>> 	cpu_add_remove_lock (@disable_nonboot_cpus())
>> 	cpu_hotplug.lock (@_cpu_down())
> But but but.  suspend_console() releases console_sem again.  So the
> sequence is actually
>
>   	down(&console_sem) (@suspend_console())
>   	up(&console_sem) (@suspend_console())
>   	cpu_add_remove_lock (@disable_nonboot_cpus())
>   	cpu_hotplug.lock (@_cpu_down())
>
> So console_sem *doesn't* nest outside cpu_add_remove_lock and
> cpu_hotplug.lock.

  Jan Kara and Jane have answered this question in other emails.

>> b) Plug-out CPUx:
>> 	cpu_add_remove_lock (@(cpu_down())
>> 	cpu_hotplug.lock (@_cpu_down())
>> 	console_lock (@console_cpu_notify()) => Lockdeps prints warning log.
>>
>> There should be not real deadlock, as flag of console_suspended can
>> protect this.
> console_lock() does down(&console_sem) *before* testing
> console_suspended, so I don't understand this sentence - a more
> detailed description would help.

Jane has answered this question in another email.

>> Printk registers cpu hotplug notify function. When CPUx is plug-out/in,
>> always execute console_lock() and console_unlock(). This patch
>> modifies that with console_trylock() and console_unlock(). Then use
>> that instead of the unconditional console_lock/unlock pair to avoid the
>> warning.
>>
>> ...
>>
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1893,6 +1893,20 @@ void resume_console(void)
>>   }
>>   
>>   /**
>> + * console_flush - flush dmesg if console isn't suspended
>> + *
>> + * console_unlock always flushes the dmesg buffer, so just try to
>> + * grab&drop the console lock. If that fails we know that the current
>> + * holder will eventually drop the console lock and so flush the dmesg
>> + * buffers at the earliest possible time.
>> + */
> The comment should describe why we added this code, please: talk about
> cpu_hotplug.lock and console_lock.

Daniel has answered this question in another email.

>> +void console_flush(void)
>> +{
>> +	if (console_trylock())
>> +		console_unlock();
>> +}
>> +
>> +/**
>>    * console_cpu_notify - print deferred console messages after CPU hotplug
>>    * @self: notifier struct
>>    * @action: CPU hotplug event
>> @@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self,
>>   	case CPU_DEAD:
>>   	case CPU_DOWN_FAILED:
>>   	case CPU_UP_CANCELED:
>> -		console_lock();
>> -		console_unlock();
>> +		console_flush();
>>   	}
>>   	return NOTIFY_OK;
> Well, this is a bit hacky and makes the already-far-too-complex code
> even more complex.  If it is indeed the case that the deadlock cannot
> really occur then let's try to find a way of suppressing the lockdep
> warning without making runtime changes.
>
> What I'm struggling with is what *should* the ranking of these locks be?
>  From a conceptual high-level design standpoint, which is the
> "innermost" lock?  I tend to think that it is console_lock, because
> blocking CPU hotplug is a quite high-level operation.
>
> But console_lock is such a kooky special-case in the way it is used to
> control the printk corking that it is hard to take general rules and
> apply them here.

Daniel and Jan Kara have answered this question in other emails.

Do you agree with this solution or have other comments?

Thanks!

Best Regards,
Jane

   


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] printk: fix one circular lockdep warning about console_lock
  2014-03-19  3:08   ` Jane Li
@ 2014-03-19 10:00     ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2014-03-19 10:00 UTC (permalink / raw)
  To: Jane Li
  Cc: Andrew Morton, joe, tj, fweisbec, davem, keescook, linux-kernel,
	Daniel Vetter, Jan Kara

On Wed 19-03-14 11:08:08, Jane Li wrote:
> On 02/12/2014 05:19 AM, Andrew Morton wrote:
> 
> >On Tue, 11 Feb 2014 14:50:00 +0800<jiel@marvell.com>  wrote:
> >
> >>From: Jane Li<jiel@marvell.com>
> >>
> >>This patch tries to fix a warning about possible circular locking
> >>dependency.
> >>
> >>If do in following sequence:
> >>     enter suspend ->  resume ->  plug-out CPUx (echo 0 > cpux/online)
> >>lockdep will show warning as following:
> >>
> >>======================================================
> >>[ INFO: possible circular locking dependency detected ]
> >>3.10.0 #2 Tainted: G           O
> >>-------------------------------------------------------
> >>sh/1271 is trying to acquire lock:
> >>(console_lock){+.+.+.}, at: [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> >>but task is already holding lock:
> >>(cpu_hotplug.lock){+.+.+.}, at: [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> >>which lock already depends on the new lock.
> >>
> >>the existing dependency chain (in reverse order) is:
> >>-> #2 (cpu_hotplug.lock){+.+.+.}:
> >>[<c017bb7c>] lock_acquire+0x98/0x12c
> >>[<c06f5014>] mutex_lock_nested+0x50/0x3d8
> >>[<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> >>[<c06ebfac>] _cpu_up+0x24/0x154
> >>[<c06ec140>] cpu_up+0x64/0x84
> >>[<c0981834>] smp_init+0x9c/0xd4
> >>[<c0973880>] kernel_init_freeable+0x78/0x1c8
> >>[<c06e7f40>] kernel_init+0x8/0xe4
> >>[<c010eec8>] ret_from_fork+0x14/0x2c
> >>
> >>-> #1 (cpu_add_remove_lock){+.+.+.}:
> >>[<c017bb7c>] lock_acquire+0x98/0x12c
> >>[<c06f5014>] mutex_lock_nested+0x50/0x3d8
> >>[<c012b758>] disable_nonboot_cpus+0x8/0xe8
> >>[<c016b83c>] suspend_devices_and_enter+0x214/0x448
> >>[<c016bc54>] pm_suspend+0x1e4/0x284
> >>[<c016bdcc>] try_to_suspend+0xa4/0xbc
> >>[<c0143848>] process_one_work+0x1c4/0x4fc
> >>[<c0143f80>] worker_thread+0x138/0x37c
> >>[<c014aaf8>] kthread+0xa4/0xb0
> >>[<c010eec8>] ret_from_fork+0x14/0x2c
> >>
> >>-> #0 (console_lock){+.+.+.}:
> >>[<c017b5d0>] __lock_acquire+0x1b38/0x1b80
> >>[<c017bb7c>] lock_acquire+0x98/0x12c
> >>[<c01288c4>] console_lock+0x54/0x68
> >>[<c06ebf7c>] console_cpu_notify+0x20/0x2c
> >>[<c01501d4>] notifier_call_chain+0x44/0x84
> >>[<c012b448>] __cpu_notify+0x2c/0x48
> >>[<c012b5b0>] cpu_notify_nofail+0x8/0x14
> >>[<c06e81bc>] _cpu_down+0xf4/0x258
> >>[<c06e8344>] cpu_down+0x24/0x40
> >>[<c06e921c>] store_online+0x30/0x74
> >>[<c03b7298>] dev_attr_store+0x18/0x24
> >>[<c025fc5c>] sysfs_write_file+0x16c/0x19c
> >>[<c0207a98>] vfs_write+0xb4/0x190
> >>[<c0207e58>] SyS_write+0x3c/0x70
> >>[<c010ee00>] ret_fast_syscall+0x0/0x48
> >>
> >>Chain exists of:
> >>    console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock
> >>
> >>Possible unsafe locking scenario:
> >>        CPU0                    CPU1
> >>        ----                    ----
> >>lock(cpu_hotplug.lock);
> >>                                lock(cpu_add_remove_lock);
> >>                                lock(cpu_hotplug.lock);
> >>lock(console_lock);
> >>   *** DEADLOCK ***
> >These traces hurt my brain.
> >
> >>There are three locks involved in two sequence:
> >>a) pm suspend:
> >>	console_lock (@suspend_console())
> >>	cpu_add_remove_lock (@disable_nonboot_cpus())
> >>	cpu_hotplug.lock (@_cpu_down())
> >But but but.  suspend_console() releases console_sem again.  So the
> >sequence is actually
> >
> >  	down(&console_sem) (@suspend_console())
> >  	up(&console_sem) (@suspend_console())
> >  	cpu_add_remove_lock (@disable_nonboot_cpus())
> >  	cpu_hotplug.lock (@_cpu_down())
> >
> >So console_sem *doesn't* nest outside cpu_add_remove_lock and
> >cpu_hotplug.lock.
> 
>  Jan Kara and Jane have answered this question in other emails.
> 
> >>b) Plug-out CPUx:
> >>	cpu_add_remove_lock (@(cpu_down())
> >>	cpu_hotplug.lock (@_cpu_down())
> >>	console_lock (@console_cpu_notify()) => Lockdeps prints warning log.
> >>
> >>There should be not real deadlock, as flag of console_suspended can
> >>protect this.
> >console_lock() does down(&console_sem) *before* testing
> >console_suspended, so I don't understand this sentence - a more
> >detailed description would help.
> 
> Jane has answered this question in another email.
> 
> >>Printk registers cpu hotplug notify function. When CPUx is plug-out/in,
> >>always execute console_lock() and console_unlock(). This patch
> >>modifies that with console_trylock() and console_unlock(). Then use
> >>that instead of the unconditional console_lock/unlock pair to avoid the
> >>warning.
> >>
> >>...
> >>
> >>--- a/kernel/printk/printk.c
> >>+++ b/kernel/printk/printk.c
> >>@@ -1893,6 +1893,20 @@ void resume_console(void)
> >>  }
> >>  /**
> >>+ * console_flush - flush dmesg if console isn't suspended
> >>+ *
> >>+ * console_unlock always flushes the dmesg buffer, so just try to
> >>+ * grab&drop the console lock. If that fails we know that the current
> >>+ * holder will eventually drop the console lock and so flush the dmesg
> >>+ * buffers at the earliest possible time.
> >>+ */
> >The comment should describe why we added this code, please: talk about
> >cpu_hotplug.lock and console_lock.
> 
> Daniel has answered this question in another email.
> 
> >>+void console_flush(void)
> >>+{
> >>+	if (console_trylock())
> >>+		console_unlock();
> >>+}
> >>+
> >>+/**
> >>   * console_cpu_notify - print deferred console messages after CPU hotplug
> >>   * @self: notifier struct
> >>   * @action: CPU hotplug event
> >>@@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self,
> >>  	case CPU_DEAD:
> >>  	case CPU_DOWN_FAILED:
> >>  	case CPU_UP_CANCELED:
> >>-		console_lock();
> >>-		console_unlock();
> >>+		console_flush();
> >>  	}
> >>  	return NOTIFY_OK;
> >Well, this is a bit hacky and makes the already-far-too-complex code
> >even more complex.  If it is indeed the case that the deadlock cannot
> >really occur then let's try to find a way of suppressing the lockdep
> >warning without making runtime changes.
> >
> >What I'm struggling with is what *should* the ranking of these locks be?
> > From a conceptual high-level design standpoint, which is the
> >"innermost" lock?  I tend to think that it is console_lock, because
> >blocking CPU hotplug is a quite high-level operation.
> >
> >But console_lock is such a kooky special-case in the way it is used to
> >control the printk corking that it is hard to take general rules and
> >apply them here.
> 
> Daniel and Jan Kara have answered this question in other emails.
> 
> Do you agree with this solution or have other comments?
  Umm, I disagree with the patch. What I proposed in my answer to your
patch is something like the patch below. Does it fix your problem?

								Honza

>From 91497a88c403a7f22e78fee2f69d7413c6e8209f Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 19 Mar 2014 10:56:12 +0100
Subject: [PATCH] printk: Fixup lockdep annotation in console_suspend()

Although console_suspend() releases console_sem, it doesn't tell lockdep
about it. That results in the lockdep warning about circular locking
when doing the following:
  enter suspend ->  resume ->  plug-out CPUx (echo 0 > cpux/online)

Fix the problem by telling lockdep we actually released the semaphore in
console_suspend() and acquired it again in console_resume().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/printk/printk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 4dae9cbe9259..e6ada322782b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1880,6 +1880,7 @@ void suspend_console(void)
 	console_lock();
 	console_suspended = 1;
 	up(&console_sem);
+	mutex_release(&console_lock_dep_map, 1, _RET_IP_);
 }
 
 void resume_console(void)
@@ -1887,6 +1888,7 @@ void resume_console(void)
 	if (!console_suspend_enabled)
 		return;
 	down(&console_sem);
+	mutex_acquire(&console_lock_dep_map, 0, 0, _RET_IP_);
 	console_suspended = 0;
 	console_unlock();
 }
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] printk: fix one circular lockdep warning about console_lock
       [not found] ` <532A838D.8040209@marvell.com>
@ 2014-03-20  6:15   ` Jane Li
  0 siblings, 0 replies; 9+ messages in thread
From: Jane Li @ 2014-03-20  6:15 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, joe, tj, fweisbec, davem, keescook, linux-kernel,
	Daniel Vetter, jiel@marvell.com

On 03/19/2014 06:00 PM, Jan Kara wrote:

> On Wed 19-03-14 11:08:08, Jane Li wrote:
>> On 02/12/2014 05:19 AM, Andrew Morton wrote:
>>> On Tue, 11 Feb 2014 14:50:00 +0800<jiel@marvell.com>  wrote:
>   Umm, I disagree with the patch. What I proposed in my answer to your patch is something like the patch below. Does it fix your problem?
>
> 								Honza
>
> From 91497a88c403a7f22e78fee2f69d7413c6e8209f Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Wed, 19 Mar 2014 10:56:12 +0100
> Subject: [PATCH] printk: Fixup lockdep annotation in console_suspend()
>
> Although console_suspend() releases console_sem, it doesn't tell lockdep about it. That results in the lockdep warning about circular locking when doing the following:
>   enter suspend ->  resume ->  plug-out CPUx (echo 0 > cpux/online)
>
> Fix the problem by telling lockdep we actually released the semaphore in
> console_suspend() and acquired it again in console_resume().
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  kernel/printk/printk.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 4dae9cbe9259..e6ada322782b 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1880,6 +1880,7 @@ void suspend_console(void)
>  	console_lock();
>  	console_suspended = 1;
>  	up(&console_sem);
> +	mutex_release(&console_lock_dep_map, 1, _RET_IP_);
>  }
>  
>  void resume_console(void)
> @@ -1887,6 +1888,7 @@ void resume_console(void)
>  	if (!console_suspend_enabled)
>  		return;
>  	down(&console_sem);
> +	mutex_acquire(&console_lock_dep_map, 0, 0, _RET_IP_);
>  	console_suspended = 0;
>  	console_unlock();
>  }
> --
> 1.8.1.4

Oh, yes, this new solution works well. If run same test, no circular lockdep warning occurs now. I will update patch v2.

Sorry for misunderstanding your answer before.

Thanks!

Best Regards,

Jane



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-03-20  6:16 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-11  6:50 [PATCH] printk: fix one circular lockdep warning about console_lock jiel
2014-02-11 18:29 ` Kees Cook
2014-02-11 21:19 ` Andrew Morton
2014-02-11 21:35   ` Daniel Vetter
2014-02-13 10:48   ` Jan Kara
2014-02-14  7:42   ` Jane Li
2014-03-19  3:08   ` Jane Li
2014-03-19 10:00     ` Jan Kara
     [not found] <23E65EE4544D484CBE73B588E192C6D620B93AC9EB@sc-vexch3.marvell.com>
     [not found] ` <532A838D.8040209@marvell.com>
2014-03-20  6:15   ` Jane Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.