* [PATCH 1/2] bdi: Do not use freezable workqueue
@ 2019-10-04 10:00 Mika Westerberg
  2019-10-04 10:00 ` [PATCH 2/2] Revert "libata, freezer: avoid block device removal while system is frozen" Mika Westerberg
  2019-10-04 13:22 ` [PATCH 1/2] bdi: Do not use freezable workqueue Jens Axboe
  0 siblings, 2 replies; 4+ messages in thread
From: Mika Westerberg @ 2019-10-04 10:00 UTC (permalink / raw)
  To: Jens Axboe, Rafael J . Wysocki
  Cc: Pavel Machek, Jan Kara, Tejun Heo, Greg Kroah-Hartman,
	Sebastian Andrzej Siewior, Thomas Gleixner, AceLan Kao,
	Mika Westerberg, linux-kernel, linux-pm

A removable block device, such as an NVMe drive or an SSD connected over
Thunderbolt, can be hot-removed at any time, including while the system is
suspended. When the device is hot-removed during suspend and the system is
then resumed, the kernel first resumes devices and only afterwards thaws
userspace, including freezable workqueues. What happens in that case is
that the NVMe driver notices that the device is unplugged and removes it
from the system. This ends up calling bdi_unregister() for the gendisk,
which then schedules wb_workfn() to be run one more time.

However, since bdi_wq is still frozen at that point, the
flush_delayed_work() call in wb_shutdown() blocks forever, halting the
system resume process. The user sees this as a hang, since nothing
happens anymore.
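
To make the blocking pattern concrete, here is a minimal sketch (not part
of this patch; the example_* names are stand-ins for bdi_wq, wb->dwork and
wb_workfn()) of how flushing delayed work on a frozen freezable workqueue
ends up waiting until the workqueue is thawed:

  #include <linux/errno.h>
  #include <linux/init.h>
  #include <linux/workqueue.h>

  /* Stand-ins for bdi_wq and wb->dwork from mm/backing-dev.c */
  static struct workqueue_struct *example_wq;
  static struct delayed_work example_dwork;

  static void example_workfn(struct work_struct *work)
  {
  	/* Stand-in for wb_workfn(); does not run while example_wq is frozen */
  }

  static int __init example_init(void)
  {
  	/* WQ_FREEZABLE: the workqueue stops executing work while frozen */
  	example_wq = alloc_workqueue("example", WQ_MEM_RECLAIM | WQ_FREEZABLE |
  				     WQ_UNBOUND, 0);
  	if (!example_wq)
  		return -ENOMEM;

  	INIT_DELAYED_WORK(&example_dwork, example_workfn);
  	return 0;
  }

  static void example_shutdown(void)
  {
  	/* Queue one final run of the work, as wb_shutdown() does ... */
  	mod_delayed_work(example_wq, &example_dwork, 0);
  	/*
  	 * ... and wait for it to finish.  If example_wq is still frozen,
  	 * for example because we are called from the device resume path
  	 * before workqueues are thawed, this wait never completes.
  	 */
  	flush_delayed_work(&example_dwork);
  }

With WQ_FREEZABLE dropped from the allocation, the same flush simply runs
the work function and returns.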

Triggering sysrq-w reveals this:

  Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
  Call Trace:
   ? __schedule+0x2c5/0x630
   ? wait_for_completion+0xa4/0x120
   schedule+0x3e/0xc0
   schedule_timeout+0x1c9/0x320
   ? resched_curr+0x1f/0xd0
   ? wait_for_completion+0xa4/0x120
   wait_for_completion+0xc3/0x120
   ? wake_up_q+0x60/0x60
   __flush_work+0x131/0x1e0
   ? flush_workqueue_prep_pwqs+0x130/0x130
   bdi_unregister+0xb9/0x130
   del_gendisk+0x2d2/0x2e0
   nvme_ns_remove+0xed/0x110 [nvme_core]
   nvme_remove_namespaces+0x96/0xd0 [nvme_core]
   nvme_remove+0x5b/0x160 [nvme]
   pci_device_remove+0x36/0x90
   device_release_driver_internal+0xdf/0x1c0
   nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
   process_one_work+0x1c2/0x3f0
   worker_thread+0x48/0x3e0
   kthread+0x100/0x140
   ? current_work+0x30/0x30
   ? kthread_park+0x80/0x80
   ret_from_fork+0x35/0x40

This is not limited to NVMe; exactly the same issue can be reproduced by
hot-removing an SSD (connected over Thunderbolt) while the system is
suspended.

Prevent this from happening by removing WQ_FREEZABLE from bdi_wq.

Reported-by: AceLan Kao <acelan.kao@canonical.com>
Link: https://marc.info/?l=linux-kernel&m=138695698516487
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204385
Link: https://lore.kernel.org/lkml/20191002122136.GD2819@lahna.fi.intel.com/#t
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
I'm not too familiar with the bdi and block layer, so it may be that there
is a good reason for having a freezable bdi_wq and we need to re-think how
this could be solved.

This problem is easy to reproduce on Thunderbolt-capable systems with the
following steps:

  1. Connect NVMe or SSD over Thunderbolt
  2. Suspend the system (mem, s2idle)
  3. Detach the NVMe or SSD
  4. Resume system

 mm/backing-dev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index d9daa3e422d0..c360f6a6c844 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -239,8 +239,8 @@ static int __init default_bdi_init(void)
 {
 	int err;
 
-	bdi_wq = alloc_workqueue("writeback", WQ_MEM_RECLAIM | WQ_FREEZABLE |
-					      WQ_UNBOUND | WQ_SYSFS, 0);
+	bdi_wq = alloc_workqueue("writeback", WQ_MEM_RECLAIM | WQ_UNBOUND |
+				 WQ_SYSFS, 0);
 	if (!bdi_wq)
 		return -ENOMEM;
 
-- 
2.23.0



* [PATCH 2/2] Revert "libata, freezer: avoid block device removal while system is frozen"
  2019-10-04 10:00 [PATCH 1/2] bdi: Do not use freezable workqueue Mika Westerberg
@ 2019-10-04 10:00 ` Mika Westerberg
  2019-10-04 13:22 ` [PATCH 1/2] bdi: Do not use freezable workqueue Jens Axboe
  1 sibling, 0 replies; 4+ messages in thread
From: Mika Westerberg @ 2019-10-04 10:00 UTC (permalink / raw)
  To: Jens Axboe, Rafael J . Wysocki
  Cc: Pavel Machek, Jan Kara, Tejun Heo, Greg Kroah-Hartman,
	Sebastian Andrzej Siewior, Thomas Gleixner, AceLan Kao,
	Mika Westerberg, linux-kernel, linux-pm

This reverts commit 85fbd722ad0f5d64d1ad15888cd1eb2188bfb557.

The commit was added as a quick band-aid for a hang that happened when a
block device was removed during system suspend. Now that bdi_wq is no
longer freezable, the hang should not be possible and we can get rid of
this hack by reverting it.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/ata/libata-scsi.c | 21 ---------------------
 kernel/freezer.c          |  6 ------
 2 files changed, 27 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 76d0f9de767b..58e09ffe8b9c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4791,27 +4791,6 @@ void ata_scsi_hotplug(struct work_struct *work)
 		return;
 	}
 
-	/*
-	 * XXX - UGLY HACK
-	 *
-	 * The block layer suspend/resume path is fundamentally broken due
-	 * to freezable kthreads and workqueue and may deadlock if a block
-	 * device gets removed while resume is in progress.  I don't know
-	 * what the solution is short of removing freezable kthreads and
-	 * workqueues altogether.
-	 *
-	 * The following is an ugly hack to avoid kicking off device
-	 * removal while freezer is active.  This is a joke but does avoid
-	 * this particular deadlock scenario.
-	 *
-	 * https://bugzilla.kernel.org/show_bug.cgi?id=62801
-	 * http://marc.info/?l=linux-kernel&m=138695698516487
-	 */
-#ifdef CONFIG_FREEZER
-	while (pm_freezing)
-		msleep(10);
-#endif
-
 	DPRINTK("ENTER\n");
 	mutex_lock(&ap->scsi_scan_mutex);
 
diff --git a/kernel/freezer.c b/kernel/freezer.c
index c0738424bb43..dc520f01f99d 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -22,12 +22,6 @@ EXPORT_SYMBOL(system_freezing_cnt);
 bool pm_freezing;
 bool pm_nosig_freezing;
 
-/*
- * Temporary export for the deadlock workaround in ata_scsi_hotplug().
- * Remove once the hack becomes unnecessary.
- */
-EXPORT_SYMBOL_GPL(pm_freezing);
-
 /* protects freezing and frozen transitions */
 static DEFINE_SPINLOCK(freezer_lock);
 
-- 
2.23.0



* Re: [PATCH 1/2] bdi: Do not use freezable workqueue
  2019-10-04 10:00 [PATCH 1/2] bdi: Do not use freezable workqueue Mika Westerberg
  2019-10-04 10:00 ` [PATCH 2/2] Revert "libata, freezer: avoid block device removal while system is frozen" Mika Westerberg
@ 2019-10-04 13:22 ` Jens Axboe
  2019-10-06 15:08   ` Rafael J. Wysocki
  1 sibling, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2019-10-04 13:22 UTC (permalink / raw)
  To: Mika Westerberg, Rafael J . Wysocki
  Cc: Pavel Machek, Jan Kara, Tejun Heo, Greg Kroah-Hartman,
	Sebastian Andrzej Siewior, Thomas Gleixner, AceLan Kao,
	linux-kernel, linux-pm

On 10/4/19 4:00 AM, Mika Westerberg wrote:
> A removable block device, such as an NVMe drive or an SSD connected over
> Thunderbolt, can be hot-removed at any time, including while the system is
> suspended. When the device is hot-removed during suspend and the system is
> then resumed, the kernel first resumes devices and only afterwards thaws
> userspace, including freezable workqueues. What happens in that case is
> that the NVMe driver notices that the device is unplugged and removes it
> from the system. This ends up calling bdi_unregister() for the gendisk,
> which then schedules wb_workfn() to be run one more time.
> 
> However, since bdi_wq is still frozen at that point, the
> flush_delayed_work() call in wb_shutdown() blocks forever, halting the
> system resume process. The user sees this as a hang, since nothing
> happens anymore.
> 
> Triggering sysrq-w reveals this:
> 
>    Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
>    Call Trace:
>     ? __schedule+0x2c5/0x630
>     ? wait_for_completion+0xa4/0x120
>     schedule+0x3e/0xc0
>     schedule_timeout+0x1c9/0x320
>     ? resched_curr+0x1f/0xd0
>     ? wait_for_completion+0xa4/0x120
>     wait_for_completion+0xc3/0x120
>     ? wake_up_q+0x60/0x60
>     __flush_work+0x131/0x1e0
>     ? flush_workqueue_prep_pwqs+0x130/0x130
>     bdi_unregister+0xb9/0x130
>     del_gendisk+0x2d2/0x2e0
>     nvme_ns_remove+0xed/0x110 [nvme_core]
>     nvme_remove_namespaces+0x96/0xd0 [nvme_core]
>     nvme_remove+0x5b/0x160 [nvme]
>     pci_device_remove+0x36/0x90
>     device_release_driver_internal+0xdf/0x1c0
>     nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
>     process_one_work+0x1c2/0x3f0
>     worker_thread+0x48/0x3e0
>     kthread+0x100/0x140
>     ? current_work+0x30/0x30
>     ? kthread_park+0x80/0x80
>     ret_from_fork+0x35/0x40
> 
> This is not limited to NVMe; exactly the same issue can be reproduced by
> hot-removing an SSD (connected over Thunderbolt) while the system is
> suspended.
> 
> Prevent this from happening by removing WQ_FREEZABLE from bdi_wq.

This series looks good to me; I don't think there's a reason for
the workers to be marked freezable.

-- 
Jens Axboe



* Re: [PATCH 1/2] bdi: Do not use freezable workqueue
  2019-10-04 13:22 ` [PATCH 1/2] bdi: Do not use freezable workqueue Jens Axboe
@ 2019-10-06 15:08   ` Rafael J. Wysocki
  0 siblings, 0 replies; 4+ messages in thread
From: Rafael J. Wysocki @ 2019-10-06 15:08 UTC (permalink / raw)
  To: Jens Axboe, Mika Westerberg
  Cc: Rafael J . Wysocki, Pavel Machek, Jan Kara, Tejun Heo,
	Greg Kroah-Hartman, Sebastian Andrzej Siewior, Thomas Gleixner,
	AceLan Kao, Linux Kernel Mailing List, Linux PM

On Fri, Oct 4, 2019 at 3:22 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 10/4/19 4:00 AM, Mika Westerberg wrote:
> > A removable block device, such as an NVMe drive or an SSD connected over
> > Thunderbolt, can be hot-removed at any time, including while the system is
> > suspended. When the device is hot-removed during suspend and the system is
> > then resumed, the kernel first resumes devices and only afterwards thaws
> > userspace, including freezable workqueues. What happens in that case is
> > that the NVMe driver notices that the device is unplugged and removes it
> > from the system. This ends up calling bdi_unregister() for the gendisk,
> > which then schedules wb_workfn() to be run one more time.
> >
> > However, since bdi_wq is still frozen at that point, the
> > flush_delayed_work() call in wb_shutdown() blocks forever, halting the
> > system resume process. The user sees this as a hang, since nothing
> > happens anymore.
> >
> > Triggering sysrq-w reveals this:
> >
> >    Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
> >    Call Trace:
> >     ? __schedule+0x2c5/0x630
> >     ? wait_for_completion+0xa4/0x120
> >     schedule+0x3e/0xc0
> >     schedule_timeout+0x1c9/0x320
> >     ? resched_curr+0x1f/0xd0
> >     ? wait_for_completion+0xa4/0x120
> >     wait_for_completion+0xc3/0x120
> >     ? wake_up_q+0x60/0x60
> >     __flush_work+0x131/0x1e0
> >     ? flush_workqueue_prep_pwqs+0x130/0x130
> >     bdi_unregister+0xb9/0x130
> >     del_gendisk+0x2d2/0x2e0
> >     nvme_ns_remove+0xed/0x110 [nvme_core]
> >     nvme_remove_namespaces+0x96/0xd0 [nvme_core]
> >     nvme_remove+0x5b/0x160 [nvme]
> >     pci_device_remove+0x36/0x90
> >     device_release_driver_internal+0xdf/0x1c0
> >     nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
> >     process_one_work+0x1c2/0x3f0
> >     worker_thread+0x48/0x3e0
> >     kthread+0x100/0x140
> >     ? current_work+0x30/0x30
> >     ? kthread_park+0x80/0x80
> >     ret_from_fork+0x35/0x40
> >
> > This is not limited to NVMe; exactly the same issue can be reproduced by
> > hot-removing an SSD (connected over Thunderbolt) while the system is
> > suspended.
> >
> > Prevent this from happening by removing WQ_FREEZABLE from bdi_wq.
>
> This series looks good to me; I don't think there's a reason for
> the workers to be marked freezable.

I was a bit concerned that the original idea might have been to prevent
writes to persistent storage from occurring after the hibernation image
has been created, but if that's not the case, the series is fine from the
general power management standpoint, so

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

for both patches.
