* [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
@ 2017-11-07 21:08 Ville Syrjala
2017-11-07 22:47 ` Rafael J. Wysocki
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Ville Syrjala @ 2017-11-07 21:08 UTC (permalink / raw)
To: linux-acpi
Cc: intel-gfx, stable, Rafael J . Wysocki, Len Brown, Peter Zijlstra,
Tejun Heo, Ingo Molnar
From: Ville Syrjälä <ville.syrjala@linux.intel.com>
acpi_remove_pm_notifier() ends up calling flush_workqueue() while
holding acpi_pm_notifier_lock, and that same lock is taken by
by the work via acpi_pm_notify_handler(). This can deadlock.
To fix the problem let's split the single lock into two: one to
protect the dev->wakeup between the work vs. add/remove, and
another one to handle notifier installation vs. removal.
After commit a1d14934ea4b ("workqueue/lockdep: 'Fix' flush_work()
annotation") I was able to kill the machine (Intel Braswell)
very easily with 'powertop --auto-tune', runtime suspending i915,
and trying to wake it up via the USB keyboard. The cases when
it didn't die are presumably explained by lockdep getting disabled
by something else (cpu hotplug locking issues usually).
Fortunately I still got a lockdep report over netconsole
(trickling in very slowly), even though the machine was
otherwise practically dead:
[ 112.179806] ======================================================
[ 114.670858] WARNING: possible circular locking dependency detected
[ 117.155663] 4.13.0-rc6-bsw-bisect-00169-ga1d14934ea4b #119 Not tainted
[ 119.658101] ------------------------------------------------------
[ 121.310242] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command.
[ 121.313294] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead
[ 121.313346] xhci_hcd 0000:00:14.0: HC died; cleaning up
[ 121.313485] usb 1-6: USB disconnect, device number 3
[ 121.313501] usb 1-6.2: USB disconnect, device number 4
[ 134.747383] kworker/0:2/47 is trying to acquire lock:
[ 137.220790] (acpi_pm_notifier_lock){+.+.}, at: [<ffffffff813cafdf>] acpi_pm_notify_handler+0x2f/0x80
[ 139.721524]
[ 139.721524] but task is already holding lock:
[ 144.672922] ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720
[ 147.184450]
[ 147.184450] which lock already depends on the new lock.
[ 147.184450]
[ 154.604711]
[ 154.604711] the existing dependency chain (in reverse order) is:
[ 159.447888]
[ 159.447888] -> #2 ((&dpc->work)){+.+.}:
[ 164.183486] __lock_acquire+0x1255/0x13f0
[ 166.504313] lock_acquire+0xb5/0x210
[ 168.778973] process_one_work+0x1b9/0x720
[ 171.030316] worker_thread+0x4c/0x440
[ 173.257184] kthread+0x154/0x190
[ 175.456143] ret_from_fork+0x27/0x40
[ 177.624348]
[ 177.624348] -> #1 ("kacpi_notify"){+.+.}:
[ 181.850351] __lock_acquire+0x1255/0x13f0
[ 183.941695] lock_acquire+0xb5/0x210
[ 186.046115] flush_workqueue+0xdd/0x510
[ 190.408153] acpi_os_wait_events_complete+0x31/0x40
[ 192.625303] acpi_remove_notify_handler+0x133/0x188
[ 194.820829] acpi_remove_pm_notifier+0x56/0x90
[ 196.989068] acpi_dev_pm_detach+0x5f/0xa0
[ 199.145866] dev_pm_domain_detach+0x27/0x30
[ 201.285614] i2c_device_probe+0x100/0x210
[ 203.411118] driver_probe_device+0x23e/0x310
[ 205.522425] __driver_attach+0xa3/0xb0
[ 207.634268] bus_for_each_dev+0x69/0xa0
[ 209.714797] driver_attach+0x1e/0x20
[ 211.778258] bus_add_driver+0x1bc/0x230
[ 213.837162] driver_register+0x60/0xe0
[ 215.868162] i2c_register_driver+0x42/0x70
[ 217.869551] 0xffffffffa0172017
[ 219.863009] do_one_initcall+0x45/0x170
[ 221.843863] do_init_module+0x5f/0x204
[ 223.817915] load_module+0x225b/0x29b0
[ 225.757234] SyS_finit_module+0xc6/0xd0
[ 227.661851] do_syscall_64+0x5c/0x120
[ 229.536819] return_from_SYSCALL_64+0x0/0x7a
[ 231.392444]
[ 231.392444] -> #0 (acpi_pm_notifier_lock){+.+.}:
[ 235.124914] check_prev_add+0x44e/0x8a0
[ 237.024795] __lock_acquire+0x1255/0x13f0
[ 238.937351] lock_acquire+0xb5/0x210
[ 240.840799] __mutex_lock+0x75/0x940
[ 242.709517] mutex_lock_nested+0x1c/0x20
[ 244.551478] acpi_pm_notify_handler+0x2f/0x80
[ 246.382052] acpi_ev_notify_dispatch+0x44/0x5c
[ 248.194412] acpi_os_execute_deferred+0x14/0x30
[ 250.003925] process_one_work+0x1ec/0x720
[ 251.803191] worker_thread+0x4c/0x440
[ 253.605307] kthread+0x154/0x190
[ 255.387498] ret_from_fork+0x27/0x40
[ 257.153175]
[ 257.153175] other info that might help us debug this:
[ 257.153175]
[ 262.324392] Chain exists of:
[ 262.324392] acpi_pm_notifier_lock --> "kacpi_notify" --> (&dpc->work)
[ 262.324392]
[ 267.391997] Possible unsafe locking scenario:
[ 267.391997]
[ 270.758262] CPU0 CPU1
[ 272.431713] ---- ----
[ 274.060756] lock((&dpc->work));
[ 275.646532] lock("kacpi_notify");
[ 277.260772] lock((&dpc->work));
[ 278.839146] lock(acpi_pm_notifier_lock);
[ 280.391902]
[ 280.391902] *** DEADLOCK ***
[ 280.391902]
[ 284.986385] 2 locks held by kworker/0:2/47:
[ 286.524895] #0: ("kacpi_notify"){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720
[ 288.112927] #1: ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720
[ 289.727725]
Cc: stable@vger.kernel.org
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
drivers/acpi/device_pm.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
index fbcc73f7a099..18af71057b44 100644
--- a/drivers/acpi/device_pm.c
+++ b/drivers/acpi/device_pm.c
@@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
#ifdef CONFIG_PM
static DEFINE_MUTEX(acpi_pm_notifier_lock);
+static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
void acpi_pm_wakeup_event(struct device *dev)
{
@@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
if (!dev && !func)
return AE_BAD_PARAMETER;
- mutex_lock(&acpi_pm_notifier_lock);
+ mutex_lock(&acpi_pm_notifier_install_lock);
if (adev->wakeup.flags.notifier_present)
goto out;
- adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
- adev->wakeup.context.dev = dev;
- adev->wakeup.context.func = func;
-
status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
acpi_pm_notify_handler, NULL);
if (ACPI_FAILURE(status))
goto out;
+ mutex_lock(&acpi_pm_notifier_lock);
+ adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
+ adev->wakeup.context.dev = dev;
+ adev->wakeup.context.func = func;
adev->wakeup.flags.notifier_present = true;
+ mutex_unlock(&acpi_pm_notifier_lock);
out:
- mutex_unlock(&acpi_pm_notifier_lock);
+ mutex_unlock(&acpi_pm_notifier_install_lock);
return status;
}
@@ -472,7 +474,7 @@ acpi_status acpi_remove_pm_notifier(struct acpi_device *adev)
{
acpi_status status = AE_BAD_PARAMETER;
- mutex_lock(&acpi_pm_notifier_lock);
+ mutex_lock(&acpi_pm_notifier_install_lock);
if (!adev->wakeup.flags.notifier_present)
goto out;
@@ -483,14 +485,15 @@ acpi_status acpi_remove_pm_notifier(struct acpi_device *adev)
if (ACPI_FAILURE(status))
goto out;
+ mutex_lock(&acpi_pm_notifier_lock);
adev->wakeup.context.func = NULL;
adev->wakeup.context.dev = NULL;
wakeup_source_unregister(adev->wakeup.ws);
-
adev->wakeup.flags.notifier_present = false;
+ mutex_unlock(&acpi_pm_notifier_lock);
out:
- mutex_unlock(&acpi_pm_notifier_lock);
+ mutex_unlock(&acpi_pm_notifier_install_lock);
return status;
}
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
2017-11-07 21:08 [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock Ville Syrjala
@ 2017-11-07 22:47 ` Rafael J. Wysocki
2017-11-08 9:13 ` ✗ Fi.CI.BAT: failure for " Patchwork
2017-11-08 12:21 ` ✓ Fi.CI.BAT: success " Patchwork
2 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2017-11-07 22:47 UTC (permalink / raw)
To: Ville Syrjala
Cc: Peter Zijlstra, intel-gfx, Rafael J . Wysocki, Stable,
ACPI Devel Maling List, Tejun Heo, Ingo Molnar, Len Brown
On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
<ville.syrjala@linux.intel.com> wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> holding acpi_pm_notifier_lock, and that same lock is taken by
> by the work via acpi_pm_notify_handler(). This can deadlock.
OK, good catch!
[cut]
>
> Cc: stable@vger.kernel.org
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
> drivers/acpi/device_pm.c | 21 ++++++++++++---------
> 1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> index fbcc73f7a099..18af71057b44 100644
> --- a/drivers/acpi/device_pm.c
> +++ b/drivers/acpi/device_pm.c
> @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
>
> #ifdef CONFIG_PM
> static DEFINE_MUTEX(acpi_pm_notifier_lock);
> +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
>
> void acpi_pm_wakeup_event(struct device *dev)
> {
> @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> if (!dev && !func)
> return AE_BAD_PARAMETER;
>
> - mutex_lock(&acpi_pm_notifier_lock);
> + mutex_lock(&acpi_pm_notifier_install_lock);
>
> if (adev->wakeup.flags.notifier_present)
> goto out;
>
> - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> - adev->wakeup.context.dev = dev;
> - adev->wakeup.context.func = func;
> -
But this doesn't look good to me.
notifier_present should be checked under acpi_pm_notifier_lock.
Actually, acpi_install_notify_handler() itself need not be called
under the lock, because it does sufficient checks of its own.
So say you do
mutex_lock(&acpi_pm_notifier_lock);
if (adev->wakeup.context.dev)
goto out;
adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
adev->wakeup.context.dev = dev;
adev->wakeup.context.func = func;
mutex_unlock(&acpi_pm_notifier_lock);
> status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> acpi_pm_notify_handler, NULL);
> if (ACPI_FAILURE(status))
> goto out;
>
> + mutex_lock(&acpi_pm_notifier_lock);
And here you just set notifier_present under acpi_pm_notifier_lock.
> + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> + adev->wakeup.context.dev = dev;
> + adev->wakeup.context.func = func;
> adev->wakeup.flags.notifier_present = true;
> + mutex_unlock(&acpi_pm_notifier_lock);
>
> out:
> - mutex_unlock(&acpi_pm_notifier_lock);
> + mutex_unlock(&acpi_pm_notifier_install_lock);
> return status;
> }
Then on removal you can clear notifier_present first and drop the lock
around the acpi_remove_notify_handler() call and nothing bad will
happen.
If you call acpi_add_pm_notifier() twice in parallel, the first
instance will set context.dev and the second one will see it set and
bail out. The first instance will then do the rest.
If you call acpi_remove_pm_notifier() twice in a row, the first
instance will see notifier_present set and will clear it, so the
second one will see notifier_present unset and it will bail out.
Now, if you call acpi_remove_pm_notifier() in parallel with
acpi_add_pm_notifier(), either the former will see notifier_present
unset and bail out, or the latter will see context.dev unset and bail
out.
It doesn't look like the outer lock is needed, or have I missed anything?
Thanks,
Rafael
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
@ 2017-11-07 22:47 ` Rafael J. Wysocki
0 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2017-11-07 22:47 UTC (permalink / raw)
To: Ville Syrjala
Cc: ACPI Devel Maling List, intel-gfx, Stable, Rafael J . Wysocki,
Len Brown, Peter Zijlstra, Tejun Heo, Ingo Molnar
On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
<ville.syrjala@linux.intel.com> wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> holding acpi_pm_notifier_lock, and that same lock is taken by
> by the work via acpi_pm_notify_handler(). This can deadlock.
OK, good catch!
[cut]
>
> Cc: stable@vger.kernel.org
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
> drivers/acpi/device_pm.c | 21 ++++++++++++---------
> 1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> index fbcc73f7a099..18af71057b44 100644
> --- a/drivers/acpi/device_pm.c
> +++ b/drivers/acpi/device_pm.c
> @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
>
> #ifdef CONFIG_PM
> static DEFINE_MUTEX(acpi_pm_notifier_lock);
> +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
>
> void acpi_pm_wakeup_event(struct device *dev)
> {
> @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> if (!dev && !func)
> return AE_BAD_PARAMETER;
>
> - mutex_lock(&acpi_pm_notifier_lock);
> + mutex_lock(&acpi_pm_notifier_install_lock);
>
> if (adev->wakeup.flags.notifier_present)
> goto out;
>
> - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> - adev->wakeup.context.dev = dev;
> - adev->wakeup.context.func = func;
> -
But this doesn't look good to me.
notifier_present should be checked under acpi_pm_notifier_lock.
Actually, acpi_install_notify_handler() itself need not be called
under the lock, because it does sufficient checks of its own.
So say you do
mutex_lock(&acpi_pm_notifier_lock);
if (adev->wakeup.context.dev)
goto out;
adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
adev->wakeup.context.dev = dev;
adev->wakeup.context.func = func;
mutex_unlock(&acpi_pm_notifier_lock);
> status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> acpi_pm_notify_handler, NULL);
> if (ACPI_FAILURE(status))
> goto out;
>
> + mutex_lock(&acpi_pm_notifier_lock);
And here you just set notifier_present under acpi_pm_notifier_lock.
> + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> + adev->wakeup.context.dev = dev;
> + adev->wakeup.context.func = func;
> adev->wakeup.flags.notifier_present = true;
> + mutex_unlock(&acpi_pm_notifier_lock);
>
> out:
> - mutex_unlock(&acpi_pm_notifier_lock);
> + mutex_unlock(&acpi_pm_notifier_install_lock);
> return status;
> }
Then on removal you can clear notifier_present first and drop the lock
around the acpi_remove_notify_handler() call and nothing bad will
happen.
If you call acpi_add_pm_notifier() twice in parallel, the first
instance will set context.dev and the second one will see it set and
bail out. The first instance will then do the rest.
If you call acpi_remove_pm_notifier() twice in a row, the first
instance will see notifier_present set and will clear it, so the
second one will see notifier_present unset and it will bail out.
Now, if you call acpi_remove_pm_notifier() in parallel with
acpi_add_pm_notifier(), either the former will see notifier_present
unset and bail out, or the latter will see context.dev unset and bail
out.
It doesn't look like the outer lock is needed, or have I missed anything?
Thanks,
Rafael
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
2017-11-07 22:47 ` Rafael J. Wysocki
@ 2017-11-07 23:06 ` Rafael J. Wysocki
-1 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2017-11-07 23:06 UTC (permalink / raw)
To: Ville Syrjala
Cc: Rafael J. Wysocki, ACPI Devel Maling List, intel-gfx, Stable,
Rafael J . Wysocki, Len Brown, Peter Zijlstra, Tejun Heo,
Ingo Molnar
On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> <ville.syrjala@linux.intel.com> wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >
> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> > holding acpi_pm_notifier_lock, and that same lock is taken by
> > by the work via acpi_pm_notify_handler(). This can deadlock.
>
> OK, good catch!
>
> [cut]
>
> >
> > Cc: stable@vger.kernel.org
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Len Brown <lenb@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> > drivers/acpi/device_pm.c | 21 ++++++++++++---------
> > 1 file changed, 12 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> > index fbcc73f7a099..18af71057b44 100644
> > --- a/drivers/acpi/device_pm.c
> > +++ b/drivers/acpi/device_pm.c
> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> >
> > #ifdef CONFIG_PM
> > static DEFINE_MUTEX(acpi_pm_notifier_lock);
> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> >
> > void acpi_pm_wakeup_event(struct device *dev)
> > {
> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> > if (!dev && !func)
> > return AE_BAD_PARAMETER;
> >
> > - mutex_lock(&acpi_pm_notifier_lock);
> > + mutex_lock(&acpi_pm_notifier_install_lock);
> >
> > if (adev->wakeup.flags.notifier_present)
> > goto out;
> >
> > - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > - adev->wakeup.context.dev = dev;
> > - adev->wakeup.context.func = func;
> > -
>
> But this doesn't look good to me.
>
> notifier_present should be checked under acpi_pm_notifier_lock.
>
> Actually, acpi_install_notify_handler() itself need not be called
> under the lock, because it does sufficient checks of its own.
>
> So say you do
>
> mutex_lock(&acpi_pm_notifier_lock);
>
> if (adev->wakeup.context.dev)
> goto out;
>
> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> adev->wakeup.context.dev = dev;
> adev->wakeup.context.func = func;
>
> mutex_unlock(&acpi_pm_notifier_lock);
>
> > status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> > acpi_pm_notify_handler, NULL);
> > if (ACPI_FAILURE(status))
> > goto out;
> >
> > + mutex_lock(&acpi_pm_notifier_lock);
>
> And here you just set notifier_present under acpi_pm_notifier_lock.
>
> > + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > + adev->wakeup.context.dev = dev;
> > + adev->wakeup.context.func = func;
> > adev->wakeup.flags.notifier_present = true;
> > + mutex_unlock(&acpi_pm_notifier_lock);
> >
> > out:
> > - mutex_unlock(&acpi_pm_notifier_lock);
> > + mutex_unlock(&acpi_pm_notifier_install_lock);
> > return status;
> > }
>
> Then on removal you can clear notifier_present first and drop the lock
> around the acpi_remove_notify_handler() call and nothing bad will
> happen.
>
> If you call acpi_add_pm_notifier() twice in parallel, the first
> instance will set context.dev and the second one will see it set and
> bail out. The first instance will then do the rest.
>
> If you call acpi_remove_pm_notifier() twice in a row, the first
> instance will see notifier_present set and will clear it, so the
> second one will see notifier_present unset and it will bail out.
>
> Now, if you call acpi_remove_pm_notifier() in parallel with
> acpi_add_pm_notifier(), either the former will see notifier_present
> unset and bail out, or the latter will see context.dev unset and bail
> out.
>
> It doesn't look like the outer lock is needed, or have I missed anything?
So something like the below (totally untested) should work too, shouldn't it?
---
drivers/acpi/device_pm.c | 35 +++++++++++++++++++++++++++--------
1 file changed, 27 insertions(+), 8 deletions(-)
Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -424,6 +424,13 @@ static void acpi_pm_notify_handler(acpi_
acpi_bus_put_acpi_device(adev);
}
+static void acpi_dev_wakeup_cleanup(struct acpi_device *adev)
+{
+ adev->wakeup.context.func = NULL;
+ adev->wakeup.context.dev = NULL;
+ wakeup_source_unregister(adev->wakeup.ws);
+}
+
/**
* acpi_add_pm_notifier - Register PM notify handler for given ACPI device.
* @adev: ACPI device to add the notify handler for.
@@ -445,17 +452,24 @@ acpi_status acpi_add_pm_notifier(struct
mutex_lock(&acpi_pm_notifier_lock);
- if (adev->wakeup.flags.notifier_present)
+ if (adev->wakeup.context.dev || adev->wakeup.context.func)
goto out;
adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
adev->wakeup.context.dev = dev;
adev->wakeup.context.func = func;
+ mutex_unlock(&acpi_pm_notifier_lock);
+
status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
acpi_pm_notify_handler, NULL);
- if (ACPI_FAILURE(status))
+
+ mutex_lock(&acpi_pm_notifier_lock);
+
+ if (ACPI_FAILURE(status)) {
+ acpi_dev_wakeup_cleanup(adev);
goto out;
+ }
adev->wakeup.flags.notifier_present = true;
@@ -477,17 +491,22 @@ acpi_status acpi_remove_pm_notifier(stru
if (!adev->wakeup.flags.notifier_present)
goto out;
+ adev->wakeup.flags.notifier_present = false;
+
+ mutex_unlock(&acpi_pm_notifier_lock);
+
status = acpi_remove_notify_handler(adev->handle,
ACPI_SYSTEM_NOTIFY,
acpi_pm_notify_handler);
- if (ACPI_FAILURE(status))
- goto out;
- adev->wakeup.context.func = NULL;
- adev->wakeup.context.dev = NULL;
- wakeup_source_unregister(adev->wakeup.ws);
+ mutex_lock(&acpi_pm_notifier_lock);
- adev->wakeup.flags.notifier_present = false;
+ if (ACPI_FAILURE(status)) {
+ adev->wakeup.flags.notifier_present = true;
+ goto out;
+ }
+
+ acpi_dev_wakeup_cleanup(adev);
out:
mutex_unlock(&acpi_pm_notifier_lock);
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
@ 2017-11-07 23:06 ` Rafael J. Wysocki
0 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2017-11-07 23:06 UTC (permalink / raw)
To: Ville Syrjala
Cc: Rafael J. Wysocki, ACPI Devel Maling List, intel-gfx, Stable,
Rafael J . Wysocki, Len Brown, Peter Zijlstra, Tejun Heo,
Ingo Molnar
On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> <ville.syrjala@linux.intel.com> wrote:
> > From: Ville Syrj�l� <ville.syrjala@linux.intel.com>
> >
> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> > holding acpi_pm_notifier_lock, and that same lock is taken by
> > by the work via acpi_pm_notify_handler(). This can deadlock.
>
> OK, good catch!
>
> [cut]
>
> >
> > Cc: stable@vger.kernel.org
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Len Brown <lenb@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> > Signed-off-by: Ville Syrj�l� <ville.syrjala@linux.intel.com>
> > ---
> > drivers/acpi/device_pm.c | 21 ++++++++++++---------
> > 1 file changed, 12 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> > index fbcc73f7a099..18af71057b44 100644
> > --- a/drivers/acpi/device_pm.c
> > +++ b/drivers/acpi/device_pm.c
> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> >
> > #ifdef CONFIG_PM
> > static DEFINE_MUTEX(acpi_pm_notifier_lock);
> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> >
> > void acpi_pm_wakeup_event(struct device *dev)
> > {
> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> > if (!dev && !func)
> > return AE_BAD_PARAMETER;
> >
> > - mutex_lock(&acpi_pm_notifier_lock);
> > + mutex_lock(&acpi_pm_notifier_install_lock);
> >
> > if (adev->wakeup.flags.notifier_present)
> > goto out;
> >
> > - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > - adev->wakeup.context.dev = dev;
> > - adev->wakeup.context.func = func;
> > -
>
> But this doesn't look good to me.
>
> notifier_present should be checked under acpi_pm_notifier_lock.
>
> Actually, acpi_install_notify_handler() itself need not be called
> under the lock, because it does sufficient checks of its own.
>
> So say you do
>
> mutex_lock(&acpi_pm_notifier_lock);
>
> if (adev->wakeup.context.dev)
> goto out;
>
> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> adev->wakeup.context.dev = dev;
> adev->wakeup.context.func = func;
>
> mutex_unlock(&acpi_pm_notifier_lock);
>
> > status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> > acpi_pm_notify_handler, NULL);
> > if (ACPI_FAILURE(status))
> > goto out;
> >
> > + mutex_lock(&acpi_pm_notifier_lock);
>
> And here you just set notifier_present under acpi_pm_notifier_lock.
>
> > + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > + adev->wakeup.context.dev = dev;
> > + adev->wakeup.context.func = func;
> > adev->wakeup.flags.notifier_present = true;
> > + mutex_unlock(&acpi_pm_notifier_lock);
> >
> > out:
> > - mutex_unlock(&acpi_pm_notifier_lock);
> > + mutex_unlock(&acpi_pm_notifier_install_lock);
> > return status;
> > }
>
> Then on removal you can clear notifier_present first and drop the lock
> around the acpi_remove_notify_handler() call and nothing bad will
> happen.
>
> If you call acpi_add_pm_notifier() twice in parallel, the first
> instance will set context.dev and the second one will see it set and
> bail out. The first instance will then do the rest.
>
> If you call acpi_remove_pm_notifier() twice in a row, the first
> instance will see notifier_present set and will clear it, so the
> second one will see notifier_present unset and it will bail out.
>
> Now, if you call acpi_remove_pm_notifier() in parallel with
> acpi_add_pm_notifier(), either the former will see notifier_present
> unset and bail out, or the latter will see context.dev unset and bail
> out.
>
> It doesn't look like the outer lock is needed, or have I missed anything?
So something like the below (totally untested) should work too, shouldn't it?
---
drivers/acpi/device_pm.c | 35 +++++++++++++++++++++++++++--------
1 file changed, 27 insertions(+), 8 deletions(-)
Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -424,6 +424,13 @@ static void acpi_pm_notify_handler(acpi_
acpi_bus_put_acpi_device(adev);
}
+static void acpi_dev_wakeup_cleanup(struct acpi_device *adev)
+{
+ adev->wakeup.context.func = NULL;
+ adev->wakeup.context.dev = NULL;
+ wakeup_source_unregister(adev->wakeup.ws);
+}
+
/**
* acpi_add_pm_notifier - Register PM notify handler for given ACPI device.
* @adev: ACPI device to add the notify handler for.
@@ -445,17 +452,24 @@ acpi_status acpi_add_pm_notifier(struct
mutex_lock(&acpi_pm_notifier_lock);
- if (adev->wakeup.flags.notifier_present)
+ if (adev->wakeup.context.dev || adev->wakeup.context.func)
goto out;
adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
adev->wakeup.context.dev = dev;
adev->wakeup.context.func = func;
+ mutex_unlock(&acpi_pm_notifier_lock);
+
status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
acpi_pm_notify_handler, NULL);
- if (ACPI_FAILURE(status))
+
+ mutex_lock(&acpi_pm_notifier_lock);
+
+ if (ACPI_FAILURE(status)) {
+ acpi_dev_wakeup_cleanup(adev);
goto out;
+ }
adev->wakeup.flags.notifier_present = true;
@@ -477,17 +491,22 @@ acpi_status acpi_remove_pm_notifier(stru
if (!adev->wakeup.flags.notifier_present)
goto out;
+ adev->wakeup.flags.notifier_present = false;
+
+ mutex_unlock(&acpi_pm_notifier_lock);
+
status = acpi_remove_notify_handler(adev->handle,
ACPI_SYSTEM_NOTIFY,
acpi_pm_notify_handler);
- if (ACPI_FAILURE(status))
- goto out;
- adev->wakeup.context.func = NULL;
- adev->wakeup.context.dev = NULL;
- wakeup_source_unregister(adev->wakeup.ws);
+ mutex_lock(&acpi_pm_notifier_lock);
- adev->wakeup.flags.notifier_present = false;
+ if (ACPI_FAILURE(status)) {
+ adev->wakeup.flags.notifier_present = true;
+ goto out;
+ }
+
+ acpi_dev_wakeup_cleanup(adev);
out:
mutex_unlock(&acpi_pm_notifier_lock);
^ permalink raw reply [flat|nested] 18+ messages in thread
* ✗ Fi.CI.BAT: failure for ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
2017-11-07 21:08 [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock Ville Syrjala
2017-11-07 22:47 ` Rafael J. Wysocki
@ 2017-11-08 9:13 ` Patchwork
2017-11-08 12:21 ` ✓ Fi.CI.BAT: success " Patchwork
2 siblings, 0 replies; 18+ messages in thread
From: Patchwork @ 2017-11-08 9:13 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: intel-gfx
== Series Details ==
Series: ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
URL : https://patchwork.freedesktop.org/series/33372/
State : failure
== Summary ==
Series 33372v1 ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
https://patchwork.freedesktop.org/api/1.0/series/33372/revisions/1/mbox/
Test chamelium:
Subgroup dp-crc-fast:
fail -> PASS (fi-kbl-7500u) fdo#102514
Test gem_sync:
Subgroup basic-store-each:
pass -> FAIL (fi-glk-dsi)
Test kms_addfb_basic:
Subgroup bad-pitch-63:
pass -> INCOMPLETE (fi-glk-dsi)
Test kms_flip:
Subgroup basic-plain-flip:
incomplete -> PASS (fi-cnl-y) fdo#103339
fdo#102514 https://bugs.freedesktop.org/show_bug.cgi?id=102514
fdo#103339 https://bugs.freedesktop.org/show_bug.cgi?id=103339
fi-bdw-5557u total:289 pass:268 dwarn:0 dfail:0 fail:0 skip:21 time:442s
fi-blb-e6850 total:289 pass:223 dwarn:1 dfail:0 fail:0 skip:65 time:379s
fi-bsw-n3050 total:289 pass:243 dwarn:0 dfail:0 fail:0 skip:46 time:539s
fi-bwr-2160 total:289 pass:183 dwarn:0 dfail:0 fail:0 skip:106 time:278s
fi-bxt-dsi total:289 pass:259 dwarn:0 dfail:0 fail:0 skip:30 time:508s
fi-bxt-j4205 total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:511s
fi-byt-j1900 total:289 pass:254 dwarn:0 dfail:0 fail:0 skip:35 time:491s
fi-cnl-y total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:557s
fi-elk-e7500 total:289 pass:229 dwarn:0 dfail:0 fail:0 skip:60 time:433s
fi-gdg-551 total:289 pass:178 dwarn:1 dfail:0 fail:1 skip:109 time:262s
fi-glk-1 total:289 pass:261 dwarn:0 dfail:0 fail:0 skip:28 time:541s
fi-glk-dsi total:182 pass:158 dwarn:0 dfail:0 fail:1 skip:22
fi-hsw-4770 total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:433s
fi-hsw-4770r total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:428s
fi-ilk-650 total:289 pass:228 dwarn:0 dfail:0 fail:0 skip:61 time:425s
fi-ivb-3520m total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:476s
fi-ivb-3770 total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:461s
fi-kbl-7500u total:289 pass:264 dwarn:1 dfail:0 fail:0 skip:24 time:485s
fi-kbl-7560u total:289 pass:270 dwarn:0 dfail:0 fail:0 skip:19 time:520s
fi-kbl-7567u total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:479s
fi-kbl-r total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:531s
fi-pnv-d510 total:289 pass:222 dwarn:1 dfail:0 fail:0 skip:66 time:570s
fi-skl-6260u total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:456s
fi-skl-6600u total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:543s
fi-skl-6700hq total:289 pass:263 dwarn:0 dfail:0 fail:0 skip:26 time:565s
fi-skl-6700k total:289 pass:265 dwarn:0 dfail:0 fail:0 skip:24 time:518s
fi-skl-6770hq total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:499s
fi-skl-gvtdvm total:289 pass:266 dwarn:0 dfail:0 fail:0 skip:23 time:455s
fi-snb-2520m total:289 pass:250 dwarn:0 dfail:0 fail:0 skip:39 time:554s
fi-snb-2600 total:289 pass:249 dwarn:0 dfail:0 fail:0 skip:40 time:421s
fi-bdw-gvtdvm failed to connect after reboot
fi-byt-n2820 failed to connect after reboot
748f2c6b4046b23a623b4af3799563ef3110bb0d drm-tip: 2017y-11m-08d-07h-50m-13s UTC integration manifest
9c87551c19e0 ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7001/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
2017-11-07 22:47 ` Rafael J. Wysocki
@ 2017-11-08 9:47 ` Rafael J. Wysocki
-1 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08 9:47 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Peter Zijlstra, intel-gfx, Rafael J . Wysocki, Stable,
ACPI Devel Maling List, Tejun Heo, Ingo Molnar, Len Brown
On Tue, Nov 7, 2017 at 11:47 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> <ville.syrjala@linux.intel.com> wrote:
>> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>
>> acpi_remove_pm_notifier() ends up calling flush_workqueue() while
>> holding acpi_pm_notifier_lock, and that same lock is taken by
>> by the work via acpi_pm_notify_handler(). This can deadlock.
>
> OK, good catch!
>
> [cut]
>
>>
>> Cc: stable@vger.kernel.org
>> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Cc: Len Brown <lenb@kernel.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Tejun Heo <tj@kernel.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
>> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> ---
>> drivers/acpi/device_pm.c | 21 ++++++++++++---------
>> 1 file changed, 12 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
>> index fbcc73f7a099..18af71057b44 100644
>> --- a/drivers/acpi/device_pm.c
>> +++ b/drivers/acpi/device_pm.c
>> @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
>>
>> #ifdef CONFIG_PM
>> static DEFINE_MUTEX(acpi_pm_notifier_lock);
>> +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
>>
>> void acpi_pm_wakeup_event(struct device *dev)
>> {
>> @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
>> if (!dev && !func)
>> return AE_BAD_PARAMETER;
>>
>> - mutex_lock(&acpi_pm_notifier_lock);
>> + mutex_lock(&acpi_pm_notifier_install_lock);
>>
>> if (adev->wakeup.flags.notifier_present)
>> goto out;
>>
>> - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> - adev->wakeup.context.dev = dev;
>> - adev->wakeup.context.func = func;
>> -
>
> But this doesn't look good to me.
>
> notifier_present should be checked under acpi_pm_notifier_lock.
Well, not really, so the above is OK.
However, I still would prefer to avoid adding the outer lock.
Thanks,
Rafael
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
@ 2017-11-08 9:47 ` Rafael J. Wysocki
0 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08 9:47 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Ville Syrjala, ACPI Devel Maling List, intel-gfx, Stable,
Rafael J . Wysocki, Len Brown, Peter Zijlstra, Tejun Heo,
Ingo Molnar
On Tue, Nov 7, 2017 at 11:47 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> <ville.syrjala@linux.intel.com> wrote:
>> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>
>> acpi_remove_pm_notifier() ends up calling flush_workqueue() while
>> holding acpi_pm_notifier_lock, and that same lock is taken by
>> by the work via acpi_pm_notify_handler(). This can deadlock.
>
> OK, good catch!
>
> [cut]
>
>>
>> Cc: stable@vger.kernel.org
>> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Cc: Len Brown <lenb@kernel.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Tejun Heo <tj@kernel.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
>> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> ---
>> drivers/acpi/device_pm.c | 21 ++++++++++++---------
>> 1 file changed, 12 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
>> index fbcc73f7a099..18af71057b44 100644
>> --- a/drivers/acpi/device_pm.c
>> +++ b/drivers/acpi/device_pm.c
>> @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
>>
>> #ifdef CONFIG_PM
>> static DEFINE_MUTEX(acpi_pm_notifier_lock);
>> +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
>>
>> void acpi_pm_wakeup_event(struct device *dev)
>> {
>> @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
>> if (!dev && !func)
>> return AE_BAD_PARAMETER;
>>
>> - mutex_lock(&acpi_pm_notifier_lock);
>> + mutex_lock(&acpi_pm_notifier_install_lock);
>>
>> if (adev->wakeup.flags.notifier_present)
>> goto out;
>>
>> - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> - adev->wakeup.context.dev = dev;
>> - adev->wakeup.context.func = func;
>> -
>
> But this doesn't look good to me.
>
> notifier_present should be checked under acpi_pm_notifier_lock.
Well, not really, so the above is OK.
However, I still would prefer to avoid adding the outer lock.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 18+ messages in thread
* ✓ Fi.CI.BAT: success for ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
2017-11-07 21:08 [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock Ville Syrjala
2017-11-07 22:47 ` Rafael J. Wysocki
2017-11-08 9:13 ` ✗ Fi.CI.BAT: failure for " Patchwork
@ 2017-11-08 12:21 ` Patchwork
2 siblings, 0 replies; 18+ messages in thread
From: Patchwork @ 2017-11-08 12:21 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: intel-gfx
== Series Details ==
Series: ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
URL : https://patchwork.freedesktop.org/series/33372/
State : success
== Summary ==
Series 33372v1 ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
https://patchwork.freedesktop.org/api/1.0/series/33372/revisions/1/mbox/
Test chamelium:
Subgroup dp-crc-fast:
fail -> PASS (fi-kbl-7500u) fdo#102514
fdo#102514 https://bugs.freedesktop.org/show_bug.cgi?id=102514
fi-bdw-5557u total:289 pass:268 dwarn:0 dfail:0 fail:0 skip:21 time:442s
fi-blb-e6850 total:289 pass:223 dwarn:1 dfail:0 fail:0 skip:65 time:379s
fi-bsw-n3050 total:289 pass:243 dwarn:0 dfail:0 fail:0 skip:46 time:539s
fi-bwr-2160 total:289 pass:183 dwarn:0 dfail:0 fail:0 skip:106 time:278s
fi-bxt-dsi total:289 pass:259 dwarn:0 dfail:0 fail:0 skip:30 time:508s
fi-bxt-j4205 total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:511s
fi-byt-j1900 total:289 pass:254 dwarn:0 dfail:0 fail:0 skip:35 time:491s
fi-elk-e7500 total:289 pass:229 dwarn:0 dfail:0 fail:0 skip:60 time:433s
fi-gdg-551 total:289 pass:178 dwarn:1 dfail:0 fail:1 skip:109 time:262s
fi-glk-1 total:289 pass:261 dwarn:0 dfail:0 fail:0 skip:28 time:541s
fi-hsw-4770 total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:433s
fi-hsw-4770r total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:428s
fi-ilk-650 total:289 pass:228 dwarn:0 dfail:0 fail:0 skip:61 time:425s
fi-ivb-3520m total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:476s
fi-ivb-3770 total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:461s
fi-kbl-7500u total:289 pass:264 dwarn:1 dfail:0 fail:0 skip:24 time:485s
fi-kbl-7560u total:289 pass:270 dwarn:0 dfail:0 fail:0 skip:19 time:520s
fi-kbl-7567u total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:479s
fi-kbl-r total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:531s
fi-pnv-d510 total:289 pass:222 dwarn:1 dfail:0 fail:0 skip:66 time:570s
fi-skl-6260u total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:456s
fi-skl-6600u total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:543s
fi-skl-6700hq total:289 pass:263 dwarn:0 dfail:0 fail:0 skip:26 time:565s
fi-skl-6700k total:289 pass:265 dwarn:0 dfail:0 fail:0 skip:24 time:518s
fi-skl-6770hq total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:499s
fi-skl-gvtdvm total:289 pass:266 dwarn:0 dfail:0 fail:0 skip:23 time:455s
fi-snb-2520m total:289 pass:250 dwarn:0 dfail:0 fail:0 skip:39 time:554s
fi-snb-2600 total:289 pass:249 dwarn:0 dfail:0 fail:0 skip:40 time:421s
Blacklisted hosts:
fi-cnl-y total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:557s
fi-glk-dsi total:182 pass:158 dwarn:0 dfail:0 fail:1 skip:22
fi-bdw-gvtdvm failed to connect after reboot
fi-byt-n2820 failed to connect after reboot
748f2c6b4046b23a623b4af3799563ef3110bb0d drm-tip: 2017y-11m-08d-07h-50m-13s UTC integration manifest
9c87551c19e0 ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7001/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
2017-11-07 23:06 ` Rafael J. Wysocki
@ 2017-11-08 12:23 ` Rafael J. Wysocki
-1 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08 12:23 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Peter Zijlstra, intel-gfx, Rafael J . Wysocki,
Stable, ACPI Devel Maling List, Tejun Heo, Ingo Molnar,
Len Brown
On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
>> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
>> <ville.syrjala@linux.intel.com> wrote:
>> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> >
>> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
>> > holding acpi_pm_notifier_lock, and that same lock is taken by
>> > by the work via acpi_pm_notify_handler(). This can deadlock.
>>
>> OK, good catch!
>>
>> [cut]
>>
>> >
>> > Cc: stable@vger.kernel.org
>> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> > Cc: Len Brown <lenb@kernel.org>
>> > Cc: Peter Zijlstra <peterz@infradead.org>
>> > Cc: Tejun Heo <tj@kernel.org>
>> > Cc: Ingo Molnar <mingo@kernel.org>
>> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
>> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> > ---
>> > drivers/acpi/device_pm.c | 21 ++++++++++++---------
>> > 1 file changed, 12 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
>> > index fbcc73f7a099..18af71057b44 100644
>> > --- a/drivers/acpi/device_pm.c
>> > +++ b/drivers/acpi/device_pm.c
>> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
>> >
>> > #ifdef CONFIG_PM
>> > static DEFINE_MUTEX(acpi_pm_notifier_lock);
>> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
>> >
>> > void acpi_pm_wakeup_event(struct device *dev)
>> > {
>> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
>> > if (!dev && !func)
>> > return AE_BAD_PARAMETER;
>> >
>> > - mutex_lock(&acpi_pm_notifier_lock);
>> > + mutex_lock(&acpi_pm_notifier_install_lock);
>> >
>> > if (adev->wakeup.flags.notifier_present)
>> > goto out;
>> >
>> > - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> > - adev->wakeup.context.dev = dev;
>> > - adev->wakeup.context.func = func;
>> > -
>>
>> But this doesn't look good to me.
>>
>> notifier_present should be checked under acpi_pm_notifier_lock.
>>
>> Actually, acpi_install_notify_handler() itself need not be called
>> under the lock, because it does sufficient checks of its own.
>>
>> So say you do
>>
>> mutex_lock(&acpi_pm_notifier_lock);
>>
>> if (adev->wakeup.context.dev)
>> goto out;
>>
>> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> adev->wakeup.context.dev = dev;
>> adev->wakeup.context.func = func;
>>
>> mutex_unlock(&acpi_pm_notifier_lock);
>>
>> > status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
>> > acpi_pm_notify_handler, NULL);
>> > if (ACPI_FAILURE(status))
>> > goto out;
>> >
>> > + mutex_lock(&acpi_pm_notifier_lock);
>>
>> And here you just set notifier_present under acpi_pm_notifier_lock.
>>
>> > + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> > + adev->wakeup.context.dev = dev;
>> > + adev->wakeup.context.func = func;
>> > adev->wakeup.flags.notifier_present = true;
>> > + mutex_unlock(&acpi_pm_notifier_lock);
>> >
>> > out:
>> > - mutex_unlock(&acpi_pm_notifier_lock);
>> > + mutex_unlock(&acpi_pm_notifier_install_lock);
>> > return status;
>> > }
>>
>> Then on removal you can clear notifier_present first and drop the lock
>> around the acpi_remove_notify_handler() call and nothing bad will
>> happen.
>>
>> If you call acpi_add_pm_notifier() twice in parallel, the first
>> instance will set context.dev and the second one will see it set and
>> bail out. The first instance will then do the rest.
>>
>> If you call acpi_remove_pm_notifier() twice in a row, the first
>> instance will see notifier_present set and will clear it, so the
>> second one will see notifier_present unset and it will bail out.
>>
>> Now, if you call acpi_remove_pm_notifier() in parallel with
>> acpi_add_pm_notifier(), either the former will see notifier_present
>> unset and bail out, or the latter will see context.dev unset and bail
>> out.
>>
>> It doesn't look like the outer lock is needed, or have I missed anything?
>
> So something like the below (totally untested) should work too, shouldn't it?
There is a problem if a device is removed while acpi_add_pm_notifier()
is still in progress, in which case with my patch the
acpi_remove_pm_notifier() called from the removal path may bail out
prematurely (that doesn't seem likely to happen, but still I don't see
why it cannot happen), so I'll just use your patch. :-)
Thanks,
Rafael
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
@ 2017-11-08 12:23 ` Rafael J. Wysocki
0 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08 12:23 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Ville Syrjala, Rafael J. Wysocki, ACPI Devel Maling List,
intel-gfx, Stable, Rafael J . Wysocki, Len Brown, Peter Zijlstra,
Tejun Heo, Ingo Molnar
On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
>> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
>> <ville.syrjala@linux.intel.com> wrote:
>> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> >
>> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
>> > holding acpi_pm_notifier_lock, and that same lock is taken by
>> > by the work via acpi_pm_notify_handler(). This can deadlock.
>>
>> OK, good catch!
>>
>> [cut]
>>
>> >
>> > Cc: stable@vger.kernel.org
>> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> > Cc: Len Brown <lenb@kernel.org>
>> > Cc: Peter Zijlstra <peterz@infradead.org>
>> > Cc: Tejun Heo <tj@kernel.org>
>> > Cc: Ingo Molnar <mingo@kernel.org>
>> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
>> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> > ---
>> > drivers/acpi/device_pm.c | 21 ++++++++++++---------
>> > 1 file changed, 12 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
>> > index fbcc73f7a099..18af71057b44 100644
>> > --- a/drivers/acpi/device_pm.c
>> > +++ b/drivers/acpi/device_pm.c
>> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
>> >
>> > #ifdef CONFIG_PM
>> > static DEFINE_MUTEX(acpi_pm_notifier_lock);
>> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
>> >
>> > void acpi_pm_wakeup_event(struct device *dev)
>> > {
>> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
>> > if (!dev && !func)
>> > return AE_BAD_PARAMETER;
>> >
>> > - mutex_lock(&acpi_pm_notifier_lock);
>> > + mutex_lock(&acpi_pm_notifier_install_lock);
>> >
>> > if (adev->wakeup.flags.notifier_present)
>> > goto out;
>> >
>> > - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> > - adev->wakeup.context.dev = dev;
>> > - adev->wakeup.context.func = func;
>> > -
>>
>> But this doesn't look good to me.
>>
>> notifier_present should be checked under acpi_pm_notifier_lock.
>>
>> Actually, acpi_install_notify_handler() itself need not be called
>> under the lock, because it does sufficient checks of its own.
>>
>> So say you do
>>
>> mutex_lock(&acpi_pm_notifier_lock);
>>
>> if (adev->wakeup.context.dev)
>> goto out;
>>
>> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> adev->wakeup.context.dev = dev;
>> adev->wakeup.context.func = func;
>>
>> mutex_unlock(&acpi_pm_notifier_lock);
>>
>> > status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
>> > acpi_pm_notify_handler, NULL);
>> > if (ACPI_FAILURE(status))
>> > goto out;
>> >
>> > + mutex_lock(&acpi_pm_notifier_lock);
>>
>> And here you just set notifier_present under acpi_pm_notifier_lock.
>>
>> > + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> > + adev->wakeup.context.dev = dev;
>> > + adev->wakeup.context.func = func;
>> > adev->wakeup.flags.notifier_present = true;
>> > + mutex_unlock(&acpi_pm_notifier_lock);
>> >
>> > out:
>> > - mutex_unlock(&acpi_pm_notifier_lock);
>> > + mutex_unlock(&acpi_pm_notifier_install_lock);
>> > return status;
>> > }
>>
>> Then on removal you can clear notifier_present first and drop the lock
>> around the acpi_remove_notify_handler() call and nothing bad will
>> happen.
>>
>> If you call acpi_add_pm_notifier() twice in parallel, the first
>> instance will set context.dev and the second one will see it set and
>> bail out. The first instance will then do the rest.
>>
>> If you call acpi_remove_pm_notifier() twice in a row, the first
>> instance will see notifier_present set and will clear it, so the
>> second one will see notifier_present unset and it will bail out.
>>
>> Now, if you call acpi_remove_pm_notifier() in parallel with
>> acpi_add_pm_notifier(), either the former will see notifier_present
>> unset and bail out, or the latter will see context.dev unset and bail
>> out.
>>
>> It doesn't look like the outer lock is needed, or have I missed anything?
>
> So something like the below (totally untested) should work too, shouldn't it?
There is a problem if a device is removed while acpi_add_pm_notifier()
is still in progress, in which case with my patch the
acpi_remove_pm_notifier() called from the removal path may bail out
prematurely (that doesn't seem likely to happen, but still I don't see
why it cannot happen), so I'll just use your patch. :-)
Thanks,
Rafael
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
2017-11-08 12:23 ` Rafael J. Wysocki
@ 2017-11-08 12:31 ` Ville Syrjälä
-1 siblings, 0 replies; 18+ messages in thread
From: Ville Syrjälä @ 2017-11-08 12:31 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, ACPI Devel Maling List, intel-gfx, Stable,
Rafael J . Wysocki, Len Brown, Peter Zijlstra, Tejun Heo,
Ingo Molnar
On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
> On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> >> <ville.syrjala@linux.intel.com> wrote:
> >> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >> >
> >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> >> > holding acpi_pm_notifier_lock, and that same lock is taken by
> >> > by the work via acpi_pm_notify_handler(). This can deadlock.
> >>
> >> OK, good catch!
> >>
> >> [cut]
> >>
> >> >
> >> > Cc: stable@vger.kernel.org
> >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> > Cc: Len Brown <lenb@kernel.org>
> >> > Cc: Peter Zijlstra <peterz@infradead.org>
> >> > Cc: Tejun Heo <tj@kernel.org>
> >> > Cc: Ingo Molnar <mingo@kernel.org>
> >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> >> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >> > ---
> >> > drivers/acpi/device_pm.c | 21 ++++++++++++---------
> >> > 1 file changed, 12 insertions(+), 9 deletions(-)
> >> >
> >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> >> > index fbcc73f7a099..18af71057b44 100644
> >> > --- a/drivers/acpi/device_pm.c
> >> > +++ b/drivers/acpi/device_pm.c
> >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> >> >
> >> > #ifdef CONFIG_PM
> >> > static DEFINE_MUTEX(acpi_pm_notifier_lock);
> >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> >> >
> >> > void acpi_pm_wakeup_event(struct device *dev)
> >> > {
> >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> >> > if (!dev && !func)
> >> > return AE_BAD_PARAMETER;
> >> >
> >> > - mutex_lock(&acpi_pm_notifier_lock);
> >> > + mutex_lock(&acpi_pm_notifier_install_lock);
> >> >
> >> > if (adev->wakeup.flags.notifier_present)
> >> > goto out;
> >> >
> >> > - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> >> > - adev->wakeup.context.dev = dev;
> >> > - adev->wakeup.context.func = func;
> >> > -
> >>
> >> But this doesn't look good to me.
> >>
> >> notifier_present should be checked under acpi_pm_notifier_lock.
> >>
> >> Actually, acpi_install_notify_handler() itself need not be called
> >> under the lock, because it does sufficient checks of its own.
> >>
> >> So say you do
> >>
> >> mutex_lock(&acpi_pm_notifier_lock);
> >>
> >> if (adev->wakeup.context.dev)
> >> goto out;
> >>
> >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> >> adev->wakeup.context.dev = dev;
> >> adev->wakeup.context.func = func;
> >>
> >> mutex_unlock(&acpi_pm_notifier_lock);
> >>
> >> > status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> >> > acpi_pm_notify_handler, NULL);
> >> > if (ACPI_FAILURE(status))
> >> > goto out;
> >> >
> >> > + mutex_lock(&acpi_pm_notifier_lock);
> >>
> >> And here you just set notifier_present under acpi_pm_notifier_lock.
> >>
> >> > + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> >> > + adev->wakeup.context.dev = dev;
> >> > + adev->wakeup.context.func = func;
> >> > adev->wakeup.flags.notifier_present = true;
> >> > + mutex_unlock(&acpi_pm_notifier_lock);
> >> >
> >> > out:
> >> > - mutex_unlock(&acpi_pm_notifier_lock);
> >> > + mutex_unlock(&acpi_pm_notifier_install_lock);
> >> > return status;
> >> > }
> >>
> >> Then on removal you can clear notifier_present first and drop the lock
> >> around the acpi_remove_notify_handler() call and nothing bad will
> >> happen.
> >>
> >> If you call acpi_add_pm_notifier() twice in parallel, the first
> >> instance will set context.dev and the second one will see it set and
> >> bail out. The first instance will then do the rest.
> >>
> >> If you call acpi_remove_pm_notifier() twice in a row, the first
> >> instance will see notifier_present set and will clear it, so the
> >> second one will see notifier_present unset and it will bail out.
> >>
> >> Now, if you call acpi_remove_pm_notifier() in parallel with
> >> acpi_add_pm_notifier(), either the former will see notifier_present
> >> unset and bail out, or the latter will see context.dev unset and bail
> >> out.
> >>
> >> It doesn't look like the outer lock is needed, or have I missed anything?
> >
> > So something like the below (totally untested) should work too, shouldn't it?
>
> There is a problem if a device is removed while acpi_add_pm_notifier()
> is still in progress, in which case with my patch the
> acpi_remove_pm_notifier() called from the removal path may bail out
> prematurely (that doesn't seem likely to happen, but still I don't see
> why it cannot happen), so I'll just use your patch. :-)
OK. I was just looking at your version and was pretty much convinced
that it would work. But I'll take your word that it might not :)
--
Ville Syrjälä
Intel OTC
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
@ 2017-11-08 12:31 ` Ville Syrjälä
0 siblings, 0 replies; 18+ messages in thread
From: Ville Syrjälä @ 2017-11-08 12:31 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, ACPI Devel Maling List, intel-gfx, Stable,
Rafael J . Wysocki, Len Brown, Peter Zijlstra, Tejun Heo,
Ingo Molnar
On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
> On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> >> <ville.syrjala@linux.intel.com> wrote:
> >> > From: Ville Syrj�l� <ville.syrjala@linux.intel.com>
> >> >
> >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> >> > holding acpi_pm_notifier_lock, and that same lock is taken by
> >> > by the work via acpi_pm_notify_handler(). This can deadlock.
> >>
> >> OK, good catch!
> >>
> >> [cut]
> >>
> >> >
> >> > Cc: stable@vger.kernel.org
> >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> > Cc: Len Brown <lenb@kernel.org>
> >> > Cc: Peter Zijlstra <peterz@infradead.org>
> >> > Cc: Tejun Heo <tj@kernel.org>
> >> > Cc: Ingo Molnar <mingo@kernel.org>
> >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> >> > Signed-off-by: Ville Syrj�l� <ville.syrjala@linux.intel.com>
> >> > ---
> >> > drivers/acpi/device_pm.c | 21 ++++++++++++---------
> >> > 1 file changed, 12 insertions(+), 9 deletions(-)
> >> >
> >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> >> > index fbcc73f7a099..18af71057b44 100644
> >> > --- a/drivers/acpi/device_pm.c
> >> > +++ b/drivers/acpi/device_pm.c
> >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> >> >
> >> > #ifdef CONFIG_PM
> >> > static DEFINE_MUTEX(acpi_pm_notifier_lock);
> >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> >> >
> >> > void acpi_pm_wakeup_event(struct device *dev)
> >> > {
> >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> >> > if (!dev && !func)
> >> > return AE_BAD_PARAMETER;
> >> >
> >> > - mutex_lock(&acpi_pm_notifier_lock);
> >> > + mutex_lock(&acpi_pm_notifier_install_lock);
> >> >
> >> > if (adev->wakeup.flags.notifier_present)
> >> > goto out;
> >> >
> >> > - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> >> > - adev->wakeup.context.dev = dev;
> >> > - adev->wakeup.context.func = func;
> >> > -
> >>
> >> But this doesn't look good to me.
> >>
> >> notifier_present should be checked under acpi_pm_notifier_lock.
> >>
> >> Actually, acpi_install_notify_handler() itself need not be called
> >> under the lock, because it does sufficient checks of its own.
> >>
> >> So say you do
> >>
> >> mutex_lock(&acpi_pm_notifier_lock);
> >>
> >> if (adev->wakeup.context.dev)
> >> goto out;
> >>
> >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> >> adev->wakeup.context.dev = dev;
> >> adev->wakeup.context.func = func;
> >>
> >> mutex_unlock(&acpi_pm_notifier_lock);
> >>
> >> > status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> >> > acpi_pm_notify_handler, NULL);
> >> > if (ACPI_FAILURE(status))
> >> > goto out;
> >> >
> >> > + mutex_lock(&acpi_pm_notifier_lock);
> >>
> >> And here you just set notifier_present under acpi_pm_notifier_lock.
> >>
> >> > + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> >> > + adev->wakeup.context.dev = dev;
> >> > + adev->wakeup.context.func = func;
> >> > adev->wakeup.flags.notifier_present = true;
> >> > + mutex_unlock(&acpi_pm_notifier_lock);
> >> >
> >> > out:
> >> > - mutex_unlock(&acpi_pm_notifier_lock);
> >> > + mutex_unlock(&acpi_pm_notifier_install_lock);
> >> > return status;
> >> > }
> >>
> >> Then on removal you can clear notifier_present first and drop the lock
> >> around the acpi_remove_notify_handler() call and nothing bad will
> >> happen.
> >>
> >> If you call acpi_add_pm_notifier() twice in parallel, the first
> >> instance will set context.dev and the second one will see it set and
> >> bail out. The first instance will then do the rest.
> >>
> >> If you call acpi_remove_pm_notifier() twice in a row, the first
> >> instance will see notifier_present set and will clear it, so the
> >> second one will see notifier_present unset and it will bail out.
> >>
> >> Now, if you call acpi_remove_pm_notifier() in parallel with
> >> acpi_add_pm_notifier(), either the former will see notifier_present
> >> unset and bail out, or the latter will see context.dev unset and bail
> >> out.
> >>
> >> It doesn't look like the outer lock is needed, or have I missed anything?
> >
> > So something like the below (totally untested) should work too, shouldn't it?
>
> There is a problem if a device is removed while acpi_add_pm_notifier()
> is still in progress, in which case with my patch the
> acpi_remove_pm_notifier() called from the removal path may bail out
> prematurely (that doesn't seem likely to happen, but still I don't see
> why it cannot happen), so I'll just use your patch. :-)
OK. I was just looking at your version and was pretty much convinced
that it would work. But I'll take your word that it might not :)
--
Ville Syrj�l�
Intel OTC
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
2017-11-08 12:31 ` Ville Syrjälä
@ 2017-11-08 12:41 ` Rafael J. Wysocki
-1 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08 12:41 UTC (permalink / raw)
To: Ville Syrjälä
Cc: Rafael J. Wysocki, Peter Zijlstra, intel-gfx, Rafael J . Wysocki,
Stable, ACPI Devel Maling List, Tejun Heo, Ingo Molnar,
Len Brown
On Wednesday, November 8, 2017 1:31:22 PM CET Ville Syrjälä wrote:
> On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
> > On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> > >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> > >> <ville.syrjala@linux.intel.com> wrote:
> > >> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > >> >
> > >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> > >> > holding acpi_pm_notifier_lock, and that same lock is taken by
> > >> > by the work via acpi_pm_notify_handler(). This can deadlock.
> > >>
> > >> OK, good catch!
> > >>
> > >> [cut]
> > >>
> > >> >
> > >> > Cc: stable@vger.kernel.org
> > >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >> > Cc: Len Brown <lenb@kernel.org>
> > >> > Cc: Peter Zijlstra <peterz@infradead.org>
> > >> > Cc: Tejun Heo <tj@kernel.org>
> > >> > Cc: Ingo Molnar <mingo@kernel.org>
> > >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> > >> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > >> > ---
> > >> > drivers/acpi/device_pm.c | 21 ++++++++++++---------
> > >> > 1 file changed, 12 insertions(+), 9 deletions(-)
> > >> >
> > >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> > >> > index fbcc73f7a099..18af71057b44 100644
> > >> > --- a/drivers/acpi/device_pm.c
> > >> > +++ b/drivers/acpi/device_pm.c
> > >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> > >> >
> > >> > #ifdef CONFIG_PM
> > >> > static DEFINE_MUTEX(acpi_pm_notifier_lock);
> > >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> > >> >
> > >> > void acpi_pm_wakeup_event(struct device *dev)
> > >> > {
> > >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> > >> > if (!dev && !func)
> > >> > return AE_BAD_PARAMETER;
> > >> >
> > >> > - mutex_lock(&acpi_pm_notifier_lock);
> > >> > + mutex_lock(&acpi_pm_notifier_install_lock);
> > >> >
> > >> > if (adev->wakeup.flags.notifier_present)
> > >> > goto out;
> > >> >
> > >> > - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > >> > - adev->wakeup.context.dev = dev;
> > >> > - adev->wakeup.context.func = func;
> > >> > -
> > >>
> > >> But this doesn't look good to me.
> > >>
> > >> notifier_present should be checked under acpi_pm_notifier_lock.
> > >>
> > >> Actually, acpi_install_notify_handler() itself need not be called
> > >> under the lock, because it does sufficient checks of its own.
> > >>
> > >> So say you do
> > >>
> > >> mutex_lock(&acpi_pm_notifier_lock);
> > >>
> > >> if (adev->wakeup.context.dev)
> > >> goto out;
> > >>
> > >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > >> adev->wakeup.context.dev = dev;
> > >> adev->wakeup.context.func = func;
> > >>
> > >> mutex_unlock(&acpi_pm_notifier_lock);
> > >>
> > >> > status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> > >> > acpi_pm_notify_handler, NULL);
> > >> > if (ACPI_FAILURE(status))
> > >> > goto out;
> > >> >
> > >> > + mutex_lock(&acpi_pm_notifier_lock);
> > >>
> > >> And here you just set notifier_present under acpi_pm_notifier_lock.
> > >>
> > >> > + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > >> > + adev->wakeup.context.dev = dev;
> > >> > + adev->wakeup.context.func = func;
> > >> > adev->wakeup.flags.notifier_present = true;
> > >> > + mutex_unlock(&acpi_pm_notifier_lock);
> > >> >
> > >> > out:
> > >> > - mutex_unlock(&acpi_pm_notifier_lock);
> > >> > + mutex_unlock(&acpi_pm_notifier_install_lock);
> > >> > return status;
> > >> > }
> > >>
> > >> Then on removal you can clear notifier_present first and drop the lock
> > >> around the acpi_remove_notify_handler() call and nothing bad will
> > >> happen.
> > >>
> > >> If you call acpi_add_pm_notifier() twice in parallel, the first
> > >> instance will set context.dev and the second one will see it set and
> > >> bail out. The first instance will then do the rest.
> > >>
> > >> If you call acpi_remove_pm_notifier() twice in a row, the first
> > >> instance will see notifier_present set and will clear it, so the
> > >> second one will see notifier_present unset and it will bail out.
> > >>
> > >> Now, if you call acpi_remove_pm_notifier() in parallel with
> > >> acpi_add_pm_notifier(), either the former will see notifier_present
> > >> unset and bail out, or the latter will see context.dev unset and bail
> > >> out.
> > >>
> > >> It doesn't look like the outer lock is needed, or have I missed anything?
> > >
> > > So something like the below (totally untested) should work too, shouldn't it?
> >
> > There is a problem if a device is removed while acpi_add_pm_notifier()
> > is still in progress, in which case with my patch the
> > acpi_remove_pm_notifier() called from the removal path may bail out
> > prematurely (that doesn't seem likely to happen, but still I don't see
> > why it cannot happen), so I'll just use your patch. :-)
>
> OK. I was just looking at your version and was pretty much convinced
> that it would work. But I'll take your word that it might not :)
Well, you don't have to. :-)
The scenario I have in mind is as follows:
1. acpi_add_pm_notifier() sets context.dev and context.func and drops the
lock. notifier_present is still unset.
2. acpi_remove_pm_notifier() checks notifier_present under the lock.
It is (still) unset, so the function decides that there's nothing to do.
3. acpi_add_pm_notifier() continues with notifier installation and the
device goes away at the same time.
Thanks,
Rafael
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
@ 2017-11-08 12:41 ` Rafael J. Wysocki
0 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08 12:41 UTC (permalink / raw)
To: Ville Syrjälä
Cc: Rafael J. Wysocki, ACPI Devel Maling List, intel-gfx, Stable,
Rafael J . Wysocki, Len Brown, Peter Zijlstra, Tejun Heo,
Ingo Molnar
On Wednesday, November 8, 2017 1:31:22 PM CET Ville Syrj�l� wrote:
> On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
> > On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> > >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> > >> <ville.syrjala@linux.intel.com> wrote:
> > >> > From: Ville Syrj�l� <ville.syrjala@linux.intel.com>
> > >> >
> > >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> > >> > holding acpi_pm_notifier_lock, and that same lock is taken by
> > >> > by the work via acpi_pm_notify_handler(). This can deadlock.
> > >>
> > >> OK, good catch!
> > >>
> > >> [cut]
> > >>
> > >> >
> > >> > Cc: stable@vger.kernel.org
> > >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >> > Cc: Len Brown <lenb@kernel.org>
> > >> > Cc: Peter Zijlstra <peterz@infradead.org>
> > >> > Cc: Tejun Heo <tj@kernel.org>
> > >> > Cc: Ingo Molnar <mingo@kernel.org>
> > >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> > >> > Signed-off-by: Ville Syrj�l� <ville.syrjala@linux.intel.com>
> > >> > ---
> > >> > drivers/acpi/device_pm.c | 21 ++++++++++++---------
> > >> > 1 file changed, 12 insertions(+), 9 deletions(-)
> > >> >
> > >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> > >> > index fbcc73f7a099..18af71057b44 100644
> > >> > --- a/drivers/acpi/device_pm.c
> > >> > +++ b/drivers/acpi/device_pm.c
> > >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> > >> >
> > >> > #ifdef CONFIG_PM
> > >> > static DEFINE_MUTEX(acpi_pm_notifier_lock);
> > >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> > >> >
> > >> > void acpi_pm_wakeup_event(struct device *dev)
> > >> > {
> > >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> > >> > if (!dev && !func)
> > >> > return AE_BAD_PARAMETER;
> > >> >
> > >> > - mutex_lock(&acpi_pm_notifier_lock);
> > >> > + mutex_lock(&acpi_pm_notifier_install_lock);
> > >> >
> > >> > if (adev->wakeup.flags.notifier_present)
> > >> > goto out;
> > >> >
> > >> > - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > >> > - adev->wakeup.context.dev = dev;
> > >> > - adev->wakeup.context.func = func;
> > >> > -
> > >>
> > >> But this doesn't look good to me.
> > >>
> > >> notifier_present should be checked under acpi_pm_notifier_lock.
> > >>
> > >> Actually, acpi_install_notify_handler() itself need not be called
> > >> under the lock, because it does sufficient checks of its own.
> > >>
> > >> So say you do
> > >>
> > >> mutex_lock(&acpi_pm_notifier_lock);
> > >>
> > >> if (adev->wakeup.context.dev)
> > >> goto out;
> > >>
> > >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > >> adev->wakeup.context.dev = dev;
> > >> adev->wakeup.context.func = func;
> > >>
> > >> mutex_unlock(&acpi_pm_notifier_lock);
> > >>
> > >> > status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> > >> > acpi_pm_notify_handler, NULL);
> > >> > if (ACPI_FAILURE(status))
> > >> > goto out;
> > >> >
> > >> > + mutex_lock(&acpi_pm_notifier_lock);
> > >>
> > >> And here you just set notifier_present under acpi_pm_notifier_lock.
> > >>
> > >> > + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > >> > + adev->wakeup.context.dev = dev;
> > >> > + adev->wakeup.context.func = func;
> > >> > adev->wakeup.flags.notifier_present = true;
> > >> > + mutex_unlock(&acpi_pm_notifier_lock);
> > >> >
> > >> > out:
> > >> > - mutex_unlock(&acpi_pm_notifier_lock);
> > >> > + mutex_unlock(&acpi_pm_notifier_install_lock);
> > >> > return status;
> > >> > }
> > >>
> > >> Then on removal you can clear notifier_present first and drop the lock
> > >> around the acpi_remove_notify_handler() call and nothing bad will
> > >> happen.
> > >>
> > >> If you call acpi_add_pm_notifier() twice in parallel, the first
> > >> instance will set context.dev and the second one will see it set and
> > >> bail out. The first instance will then do the rest.
> > >>
> > >> If you call acpi_remove_pm_notifier() twice in a row, the first
> > >> instance will see notifier_present set and will clear it, so the
> > >> second one will see notifier_present unset and it will bail out.
> > >>
> > >> Now, if you call acpi_remove_pm_notifier() in parallel with
> > >> acpi_add_pm_notifier(), either the former will see notifier_present
> > >> unset and bail out, or the latter will see context.dev unset and bail
> > >> out.
> > >>
> > >> It doesn't look like the outer lock is needed, or have I missed anything?
> > >
> > > So something like the below (totally untested) should work too, shouldn't it?
> >
> > There is a problem if a device is removed while acpi_add_pm_notifier()
> > is still in progress, in which case with my patch the
> > acpi_remove_pm_notifier() called from the removal path may bail out
> > prematurely (that doesn't seem likely to happen, but still I don't see
> > why it cannot happen), so I'll just use your patch. :-)
>
> OK. I was just looking at your version and was pretty much convinced
> that it would work. But I'll take your word that it might not :)
Well, you don't have to. :-)
The scenario I have in mind is as follows:
1. acpi_add_pm_notifier() sets context.dev and context.func and drops the
lock. notifier_present is still unset.
2. acpi_remove_pm_notifier() checks notifier_present under the lock.
It is (still) unset, so the function decides that there's nothing to do.
3. acpi_add_pm_notifier() continues with notifier installation and the
device goes away at the same time.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
2017-11-08 12:41 ` Rafael J. Wysocki
(?)
@ 2017-11-08 12:55 ` Rafael J. Wysocki
-1 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08 12:55 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Ville Syrjälä,
Rafael J. Wysocki, ACPI Devel Maling List, intel-gfx, Stable,
Rafael J . Wysocki, Len Brown, Peter Zijlstra, Tejun Heo,
Ingo Molnar
On Wed, Nov 8, 2017 at 1:41 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, November 8, 2017 1:31:22 PM CET Ville Syrjälä wrote:
>> On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
>> > On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
>> > >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
>> > >> <ville.syrjala@linux.intel.com> wrote:
>> > >> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> > >> >
>> > >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
>> > >> > holding acpi_pm_notifier_lock, and that same lock is taken by
>> > >> > by the work via acpi_pm_notify_handler(). This can deadlock.
>> > >>
>> > >> OK, good catch!
>> > >>
>> > >> [cut]
>> > >>
>> > >> >
>> > >> > Cc: stable@vger.kernel.org
>> > >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> > >> > Cc: Len Brown <lenb@kernel.org>
>> > >> > Cc: Peter Zijlstra <peterz@infradead.org>
>> > >> > Cc: Tejun Heo <tj@kernel.org>
>> > >> > Cc: Ingo Molnar <mingo@kernel.org>
>> > >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
>> > >> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> > >> > ---
>> > >> > drivers/acpi/device_pm.c | 21 ++++++++++++---------
>> > >> > 1 file changed, 12 insertions(+), 9 deletions(-)
>> > >> >
>> > >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
>> > >> > index fbcc73f7a099..18af71057b44 100644
>> > >> > --- a/drivers/acpi/device_pm.c
>> > >> > +++ b/drivers/acpi/device_pm.c
>> > >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
>> > >> >
>> > >> > #ifdef CONFIG_PM
>> > >> > static DEFINE_MUTEX(acpi_pm_notifier_lock);
>> > >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
>> > >> >
>> > >> > void acpi_pm_wakeup_event(struct device *dev)
>> > >> > {
>> > >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
>> > >> > if (!dev && !func)
>> > >> > return AE_BAD_PARAMETER;
>> > >> >
>> > >> > - mutex_lock(&acpi_pm_notifier_lock);
>> > >> > + mutex_lock(&acpi_pm_notifier_install_lock);
>> > >> >
>> > >> > if (adev->wakeup.flags.notifier_present)
>> > >> > goto out;
>> > >> >
>> > >> > - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> > >> > - adev->wakeup.context.dev = dev;
>> > >> > - adev->wakeup.context.func = func;
>> > >> > -
>> > >>
>> > >> But this doesn't look good to me.
>> > >>
>> > >> notifier_present should be checked under acpi_pm_notifier_lock.
>> > >>
>> > >> Actually, acpi_install_notify_handler() itself need not be called
>> > >> under the lock, because it does sufficient checks of its own.
>> > >>
>> > >> So say you do
>> > >>
>> > >> mutex_lock(&acpi_pm_notifier_lock);
>> > >>
>> > >> if (adev->wakeup.context.dev)
>> > >> goto out;
>> > >>
>> > >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> > >> adev->wakeup.context.dev = dev;
>> > >> adev->wakeup.context.func = func;
>> > >>
>> > >> mutex_unlock(&acpi_pm_notifier_lock);
>> > >>
>> > >> > status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
>> > >> > acpi_pm_notify_handler, NULL);
>> > >> > if (ACPI_FAILURE(status))
>> > >> > goto out;
>> > >> >
>> > >> > + mutex_lock(&acpi_pm_notifier_lock);
>> > >>
>> > >> And here you just set notifier_present under acpi_pm_notifier_lock.
>> > >>
>> > >> > + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> > >> > + adev->wakeup.context.dev = dev;
>> > >> > + adev->wakeup.context.func = func;
>> > >> > adev->wakeup.flags.notifier_present = true;
>> > >> > + mutex_unlock(&acpi_pm_notifier_lock);
>> > >> >
>> > >> > out:
>> > >> > - mutex_unlock(&acpi_pm_notifier_lock);
>> > >> > + mutex_unlock(&acpi_pm_notifier_install_lock);
>> > >> > return status;
>> > >> > }
>> > >>
>> > >> Then on removal you can clear notifier_present first and drop the lock
>> > >> around the acpi_remove_notify_handler() call and nothing bad will
>> > >> happen.
>> > >>
>> > >> If you call acpi_add_pm_notifier() twice in parallel, the first
>> > >> instance will set context.dev and the second one will see it set and
>> > >> bail out. The first instance will then do the rest.
>> > >>
>> > >> If you call acpi_remove_pm_notifier() twice in a row, the first
>> > >> instance will see notifier_present set and will clear it, so the
>> > >> second one will see notifier_present unset and it will bail out.
>> > >>
>> > >> Now, if you call acpi_remove_pm_notifier() in parallel with
>> > >> acpi_add_pm_notifier(), either the former will see notifier_present
>> > >> unset and bail out, or the latter will see context.dev unset and bail
>> > >> out.
>> > >>
>> > >> It doesn't look like the outer lock is needed, or have I missed anything?
>> > >
>> > > So something like the below (totally untested) should work too, shouldn't it?
>> >
>> > There is a problem if a device is removed while acpi_add_pm_notifier()
>> > is still in progress, in which case with my patch the
>> > acpi_remove_pm_notifier() called from the removal path may bail out
>> > prematurely (that doesn't seem likely to happen, but still I don't see
>> > why it cannot happen), so I'll just use your patch. :-)
>>
>> OK. I was just looking at your version and was pretty much convinced
>> that it would work. But I'll take your word that it might not :)
>
> Well, you don't have to. :-)
>
> The scenario I have in mind is as follows:
>
> 1. acpi_add_pm_notifier() sets context.dev and context.func and drops the
> lock. notifier_present is still unset.
>
> 2. acpi_remove_pm_notifier() checks notifier_present under the lock.
> It is (still) unset, so the function decides that there's nothing to do.
>
> 3. acpi_add_pm_notifier() continues with notifier installation and the
> device goes away at the same time.
Of course, the outer lock doesn't help against
acpi_remove_pm_notifier() in the removal path running right before
acpi_add_pm_notifier(), so if there's no other mutual exclusion, we
still have a problem in there (need to check that).
Thanks,
Rafael
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
2017-11-08 12:41 ` Rafael J. Wysocki
@ 2017-11-08 13:00 ` Ville Syrjälä
-1 siblings, 0 replies; 18+ messages in thread
From: Ville Syrjälä @ 2017-11-08 13:00 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Peter Zijlstra, intel-gfx, Rafael J . Wysocki,
Stable, ACPI Devel Maling List, Tejun Heo, Ingo Molnar,
Len Brown
On Wed, Nov 08, 2017 at 01:41:06PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 8, 2017 1:31:22 PM CET Ville Syrjälä wrote:
> > On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
> > > On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> > > >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> > > >> <ville.syrjala@linux.intel.com> wrote:
> > > >> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > >> >
> > > >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> > > >> > holding acpi_pm_notifier_lock, and that same lock is taken by
> > > >> > by the work via acpi_pm_notify_handler(). This can deadlock.
> > > >>
> > > >> OK, good catch!
> > > >>
> > > >> [cut]
> > > >>
> > > >> >
> > > >> > Cc: stable@vger.kernel.org
> > > >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >> > Cc: Len Brown <lenb@kernel.org>
> > > >> > Cc: Peter Zijlstra <peterz@infradead.org>
> > > >> > Cc: Tejun Heo <tj@kernel.org>
> > > >> > Cc: Ingo Molnar <mingo@kernel.org>
> > > >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> > > >> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > >> > ---
> > > >> > drivers/acpi/device_pm.c | 21 ++++++++++++---------
> > > >> > 1 file changed, 12 insertions(+), 9 deletions(-)
> > > >> >
> > > >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> > > >> > index fbcc73f7a099..18af71057b44 100644
> > > >> > --- a/drivers/acpi/device_pm.c
> > > >> > +++ b/drivers/acpi/device_pm.c
> > > >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> > > >> >
> > > >> > #ifdef CONFIG_PM
> > > >> > static DEFINE_MUTEX(acpi_pm_notifier_lock);
> > > >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> > > >> >
> > > >> > void acpi_pm_wakeup_event(struct device *dev)
> > > >> > {
> > > >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> > > >> > if (!dev && !func)
> > > >> > return AE_BAD_PARAMETER;
> > > >> >
> > > >> > - mutex_lock(&acpi_pm_notifier_lock);
> > > >> > + mutex_lock(&acpi_pm_notifier_install_lock);
> > > >> >
> > > >> > if (adev->wakeup.flags.notifier_present)
> > > >> > goto out;
> > > >> >
> > > >> > - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> > - adev->wakeup.context.dev = dev;
> > > >> > - adev->wakeup.context.func = func;
> > > >> > -
> > > >>
> > > >> But this doesn't look good to me.
> > > >>
> > > >> notifier_present should be checked under acpi_pm_notifier_lock.
> > > >>
> > > >> Actually, acpi_install_notify_handler() itself need not be called
> > > >> under the lock, because it does sufficient checks of its own.
> > > >>
> > > >> So say you do
> > > >>
> > > >> mutex_lock(&acpi_pm_notifier_lock);
> > > >>
> > > >> if (adev->wakeup.context.dev)
> > > >> goto out;
> > > >>
> > > >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> adev->wakeup.context.dev = dev;
> > > >> adev->wakeup.context.func = func;
> > > >>
> > > >> mutex_unlock(&acpi_pm_notifier_lock);
> > > >>
> > > >> > status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> > > >> > acpi_pm_notify_handler, NULL);
> > > >> > if (ACPI_FAILURE(status))
> > > >> > goto out;
> > > >> >
> > > >> > + mutex_lock(&acpi_pm_notifier_lock);
> > > >>
> > > >> And here you just set notifier_present under acpi_pm_notifier_lock.
> > > >>
> > > >> > + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> > + adev->wakeup.context.dev = dev;
> > > >> > + adev->wakeup.context.func = func;
> > > >> > adev->wakeup.flags.notifier_present = true;
> > > >> > + mutex_unlock(&acpi_pm_notifier_lock);
> > > >> >
> > > >> > out:
> > > >> > - mutex_unlock(&acpi_pm_notifier_lock);
> > > >> > + mutex_unlock(&acpi_pm_notifier_install_lock);
> > > >> > return status;
> > > >> > }
> > > >>
> > > >> Then on removal you can clear notifier_present first and drop the lock
> > > >> around the acpi_remove_notify_handler() call and nothing bad will
> > > >> happen.
> > > >>
> > > >> If you call acpi_add_pm_notifier() twice in parallel, the first
> > > >> instance will set context.dev and the second one will see it set and
> > > >> bail out. The first instance will then do the rest.
> > > >>
> > > >> If you call acpi_remove_pm_notifier() twice in a row, the first
> > > >> instance will see notifier_present set and will clear it, so the
> > > >> second one will see notifier_present unset and it will bail out.
> > > >>
> > > >> Now, if you call acpi_remove_pm_notifier() in parallel with
> > > >> acpi_add_pm_notifier(), either the former will see notifier_present
> > > >> unset and bail out, or the latter will see context.dev unset and bail
> > > >> out.
> > > >>
> > > >> It doesn't look like the outer lock is needed, or have I missed anything?
> > > >
> > > > So something like the below (totally untested) should work too, shouldn't it?
> > >
> > > There is a problem if a device is removed while acpi_add_pm_notifier()
> > > is still in progress, in which case with my patch the
> > > acpi_remove_pm_notifier() called from the removal path may bail out
> > > prematurely (that doesn't seem likely to happen, but still I don't see
> > > why it cannot happen), so I'll just use your patch. :-)
> >
> > OK. I was just looking at your version and was pretty much convinced
> > that it would work. But I'll take your word that it might not :)
>
> Well, you don't have to. :-)
>
> The scenario I have in mind is as follows:
>
> 1. acpi_add_pm_notifier() sets context.dev and context.func and drops the
> lock. notifier_present is still unset.
>
> 2. acpi_remove_pm_notifier() checks notifier_present under the lock.
> It is (still) unset, so the function decides that there's nothing to do.
>
> 3. acpi_add_pm_notifier() continues with notifier installation and the
> device goes away at the same time.
Right. That makes sense. Though I don't know what would prevent racing
add_pm_notifier() against remove_pm_notifier() in a similar way even
with the outer lock. In that case the remove_pm_notifier() would just
have to get at the mutex first and then we'd still end up with the
notifier installed.
--
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
@ 2017-11-08 13:00 ` Ville Syrjälä
0 siblings, 0 replies; 18+ messages in thread
From: Ville Syrjälä @ 2017-11-08 13:00 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, ACPI Devel Maling List, intel-gfx, Stable,
Rafael J . Wysocki, Len Brown, Peter Zijlstra, Tejun Heo,
Ingo Molnar
On Wed, Nov 08, 2017 at 01:41:06PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 8, 2017 1:31:22 PM CET Ville Syrj�l� wrote:
> > On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
> > > On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> > > >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> > > >> <ville.syrjala@linux.intel.com> wrote:
> > > >> > From: Ville Syrj�l� <ville.syrjala@linux.intel.com>
> > > >> >
> > > >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> > > >> > holding acpi_pm_notifier_lock, and that same lock is taken by
> > > >> > by the work via acpi_pm_notify_handler(). This can deadlock.
> > > >>
> > > >> OK, good catch!
> > > >>
> > > >> [cut]
> > > >>
> > > >> >
> > > >> > Cc: stable@vger.kernel.org
> > > >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >> > Cc: Len Brown <lenb@kernel.org>
> > > >> > Cc: Peter Zijlstra <peterz@infradead.org>
> > > >> > Cc: Tejun Heo <tj@kernel.org>
> > > >> > Cc: Ingo Molnar <mingo@kernel.org>
> > > >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> > > >> > Signed-off-by: Ville Syrj�l� <ville.syrjala@linux.intel.com>
> > > >> > ---
> > > >> > drivers/acpi/device_pm.c | 21 ++++++++++++---------
> > > >> > 1 file changed, 12 insertions(+), 9 deletions(-)
> > > >> >
> > > >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> > > >> > index fbcc73f7a099..18af71057b44 100644
> > > >> > --- a/drivers/acpi/device_pm.c
> > > >> > +++ b/drivers/acpi/device_pm.c
> > > >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> > > >> >
> > > >> > #ifdef CONFIG_PM
> > > >> > static DEFINE_MUTEX(acpi_pm_notifier_lock);
> > > >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> > > >> >
> > > >> > void acpi_pm_wakeup_event(struct device *dev)
> > > >> > {
> > > >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> > > >> > if (!dev && !func)
> > > >> > return AE_BAD_PARAMETER;
> > > >> >
> > > >> > - mutex_lock(&acpi_pm_notifier_lock);
> > > >> > + mutex_lock(&acpi_pm_notifier_install_lock);
> > > >> >
> > > >> > if (adev->wakeup.flags.notifier_present)
> > > >> > goto out;
> > > >> >
> > > >> > - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> > - adev->wakeup.context.dev = dev;
> > > >> > - adev->wakeup.context.func = func;
> > > >> > -
> > > >>
> > > >> But this doesn't look good to me.
> > > >>
> > > >> notifier_present should be checked under acpi_pm_notifier_lock.
> > > >>
> > > >> Actually, acpi_install_notify_handler() itself need not be called
> > > >> under the lock, because it does sufficient checks of its own.
> > > >>
> > > >> So say you do
> > > >>
> > > >> mutex_lock(&acpi_pm_notifier_lock);
> > > >>
> > > >> if (adev->wakeup.context.dev)
> > > >> goto out;
> > > >>
> > > >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> adev->wakeup.context.dev = dev;
> > > >> adev->wakeup.context.func = func;
> > > >>
> > > >> mutex_unlock(&acpi_pm_notifier_lock);
> > > >>
> > > >> > status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> > > >> > acpi_pm_notify_handler, NULL);
> > > >> > if (ACPI_FAILURE(status))
> > > >> > goto out;
> > > >> >
> > > >> > + mutex_lock(&acpi_pm_notifier_lock);
> > > >>
> > > >> And here you just set notifier_present under acpi_pm_notifier_lock.
> > > >>
> > > >> > + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> > + adev->wakeup.context.dev = dev;
> > > >> > + adev->wakeup.context.func = func;
> > > >> > adev->wakeup.flags.notifier_present = true;
> > > >> > + mutex_unlock(&acpi_pm_notifier_lock);
> > > >> >
> > > >> > out:
> > > >> > - mutex_unlock(&acpi_pm_notifier_lock);
> > > >> > + mutex_unlock(&acpi_pm_notifier_install_lock);
> > > >> > return status;
> > > >> > }
> > > >>
> > > >> Then on removal you can clear notifier_present first and drop the lock
> > > >> around the acpi_remove_notify_handler() call and nothing bad will
> > > >> happen.
> > > >>
> > > >> If you call acpi_add_pm_notifier() twice in parallel, the first
> > > >> instance will set context.dev and the second one will see it set and
> > > >> bail out. The first instance will then do the rest.
> > > >>
> > > >> If you call acpi_remove_pm_notifier() twice in a row, the first
> > > >> instance will see notifier_present set and will clear it, so the
> > > >> second one will see notifier_present unset and it will bail out.
> > > >>
> > > >> Now, if you call acpi_remove_pm_notifier() in parallel with
> > > >> acpi_add_pm_notifier(), either the former will see notifier_present
> > > >> unset and bail out, or the latter will see context.dev unset and bail
> > > >> out.
> > > >>
> > > >> It doesn't look like the outer lock is needed, or have I missed anything?
> > > >
> > > > So something like the below (totally untested) should work too, shouldn't it?
> > >
> > > There is a problem if a device is removed while acpi_add_pm_notifier()
> > > is still in progress, in which case with my patch the
> > > acpi_remove_pm_notifier() called from the removal path may bail out
> > > prematurely (that doesn't seem likely to happen, but still I don't see
> > > why it cannot happen), so I'll just use your patch. :-)
> >
> > OK. I was just looking at your version and was pretty much convinced
> > that it would work. But I'll take your word that it might not :)
>
> Well, you don't have to. :-)
>
> The scenario I have in mind is as follows:
>
> 1. acpi_add_pm_notifier() sets context.dev and context.func and drops the
> lock. notifier_present is still unset.
>
> 2. acpi_remove_pm_notifier() checks notifier_present under the lock.
> It is (still) unset, so the function decides that there's nothing to do.
>
> 3. acpi_add_pm_notifier() continues with notifier installation and the
> device goes away at the same time.
Right. That makes sense. Though I don't know what would prevent racing
add_pm_notifier() against remove_pm_notifier() in a similar way even
with the outer lock. In that case the remove_pm_notifier() would just
have to get at the mutex first and then we'd still end up with the
notifier installed.
--
Ville Syrj�l�
Intel OTC
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2017-11-08 13:00 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-07 21:08 [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock Ville Syrjala
2017-11-07 22:47 ` Rafael J. Wysocki
2017-11-07 22:47 ` Rafael J. Wysocki
2017-11-07 23:06 ` Rafael J. Wysocki
2017-11-07 23:06 ` Rafael J. Wysocki
2017-11-08 12:23 ` Rafael J. Wysocki
2017-11-08 12:23 ` Rafael J. Wysocki
2017-11-08 12:31 ` Ville Syrjälä
2017-11-08 12:31 ` Ville Syrjälä
2017-11-08 12:41 ` Rafael J. Wysocki
2017-11-08 12:41 ` Rafael J. Wysocki
2017-11-08 12:55 ` Rafael J. Wysocki
2017-11-08 13:00 ` Ville Syrjälä
2017-11-08 13:00 ` Ville Syrjälä
2017-11-08 9:47 ` Rafael J. Wysocki
2017-11-08 9:47 ` Rafael J. Wysocki
2017-11-08 9:13 ` ✗ Fi.CI.BAT: failure for " Patchwork
2017-11-08 12:21 ` ✓ Fi.CI.BAT: success " Patchwork
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.