linux-kernel.vger.kernel.org archive mirror
* System hangs if NVMe/SSD is removed during suspend
@ 2019-10-02 12:21 Mika Westerberg
  2019-10-03 16:50 ` Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Mika Westerberg @ 2019-10-02 12:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Rafael J. Wysocki, AceLan Kao, Jens Axboe, Jan Kara,
	Greg Kroah-Hartman, linux-kernel

Hi Tejun,

On a system with a Thunderbolt-connected NVMe drive or SSD, detaching
the drive while the system is suspended makes the system hang on resume
(see also https://bugzilla.kernel.org/show_bug.cgi?id=204385).

Triggering sysrq-w I see this:

[  113.093783] Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
[  113.095156] Call Trace:
[  113.096234]  ? __schedule+0x2c5/0x630
[  113.097409]  ? wait_for_completion+0xa4/0x120
[  113.098639]  schedule+0x3e/0xc0
[  113.099769]  schedule_timeout+0x1c9/0x320
[  113.100973]  ? resched_curr+0x1f/0xd0
[  113.102146]  ? wait_for_completion+0xa4/0x120
[  113.103379]  wait_for_completion+0xc3/0x120
[  113.104595]  ? wake_up_q+0x60/0x60
[  113.105749]  __flush_work+0x131/0x1e0
[  113.106925]  ? flush_workqueue_prep_pwqs+0x130/0x130
[  113.108215]  bdi_unregister+0xb9/0x130
[  113.109403]  del_gendisk+0x2d2/0x2e0
[  113.110580]  nvme_ns_remove+0xed/0x110 [nvme_core]
[  113.111853]  nvme_remove_namespaces+0x96/0xd0 [nvme_core]
[  113.113177]  nvme_remove+0x5b/0x160 [nvme]
[  113.114391]  pci_device_remove+0x36/0x90
[  113.115590]  device_release_driver_internal+0xdf/0x1c0
[  113.116893]  nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
[  113.118217]  process_one_work+0x1c2/0x3f0
[  113.119434]  worker_thread+0x48/0x3e0
[  113.120619]  kthread+0x100/0x140
[  113.121772]  ? current_work+0x30/0x30
[  113.122955]  ? kthread_park+0x80/0x80
[  113.124142]  ret_from_fork+0x35/0x40

The exact place is in wb_shutdown():

        /*
         * Drain work list and shutdown the delayed_work.  !WB_registered
         * tells wb_workfn() that @wb is dying and its work_list needs to
         * be drained no matter what.
         */
        mod_delayed_work(bdi_wq, &wb->dwork, 0);
        flush_delayed_work(&wb->dwork);

Now bdi_wq is marked WQ_FREEZABLE, and at this point we are still
resuming devices, so I think the workqueue is still frozen. As a result
the resume process ends up waiting for bdi_unregister(), which cannot
make progress because its workqueue is still frozen.
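
To illustrate the pattern, here is a minimal, self-contained sketch (a
hypothetical demo module, not the actual nvme/bdi code; all "demo_*"
names are made up) that allocates a WQ_FREEZABLE workqueue and then
flushes work on it the same way wb_shutdown() does.  If the flush runs
from a non-freezable context while workqueues are still frozen, as
happens here during resume, it blocks until the workqueue is thawed:

#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_wq;
static struct delayed_work demo_dwork;

static void demo_workfn(struct work_struct *work)
{
        pr_info("demo: work ran\n");
}

static int __init demo_init(void)
{
        /* Same kind of flags as bdi_wq: queued work does not run while frozen. */
        demo_wq = alloc_workqueue("demo_freezable",
                                  WQ_FREEZABLE | WQ_UNBOUND, 0);
        if (!demo_wq)
                return -ENOMEM;
        INIT_DELAYED_WORK(&demo_dwork, demo_workfn);
        return 0;
}

static void __exit demo_exit(void)
{
        /*
         * Same shape as wb_shutdown(): queue the work immediately and
         * then wait for it.  If demo_wq were still frozen at this point
         * (as bdi_wq is while devices are resuming), flush_delayed_work()
         * would never return.
         */
        mod_delayed_work(demo_wq, &demo_dwork, 0);
        flush_delayed_work(&demo_dwork);
        destroy_workqueue(demo_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");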

I saw you "dealt with" a similar situation for libata in commit
85fbd722ad0f ("libata, freezer: avoid block device removal while system
is frozen"), and that there was discussion around it here:

  https://marc.info/?l=linux-kernel&m=138695698516487

but from that discussion I don't see that a more generic solution was
ever implemented.

Any ideas how we should fix this properly?

I mean, nowadays Thunderbolt-connected storage is quite common, and I
suppose a situation like this may very well happen: say you have a dock
with an SSD connected and you disconnect it while the laptop is suspended.

I've been using the following hack to prevent the issue, but I'm quite
sure there is a better solution ;-)

Thanks in advance!

 drivers/ata/libata-scsi.c | 2 ++
 mm/backing-dev.c          | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 76d0f9de767b..3fea8d72f6f9 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4791,6 +4791,7 @@ void ata_scsi_hotplug(struct work_struct *work)
 		return;
 	}
 
+#if 0
 	/*
 	 * XXX - UGLY HACK
 	 *
@@ -4810,6 +4811,7 @@ void ata_scsi_hotplug(struct work_struct *work)
 #ifdef CONFIG_FREEZER
 	while (pm_freezing)
 		msleep(10);
+#endif
 #endif
 
 	DPRINTK("ENTER\n");
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index e8e89158adec..8e77711d5dd0 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -236,8 +236,8 @@ static int __init default_bdi_init(void)
 {
 	int err;
 
-	bdi_wq = alloc_workqueue("writeback", WQ_MEM_RECLAIM | WQ_FREEZABLE |
-					      WQ_UNBOUND | WQ_SYSFS, 0);
+	bdi_wq = alloc_workqueue("writeback", WQ_MEM_RECLAIM | WQ_UNBOUND |
+					      WQ_SYSFS, 0);
 	if (!bdi_wq)
 		return -ENOMEM;


* Re: System hangs if NVMe/SSD is removed during suspend
  2019-10-02 12:21 System hangs if NVMe/SSD is removed during suspend Mika Westerberg
@ 2019-10-03 16:50 ` Tejun Heo
  2019-10-04  8:03   ` Mika Westerberg
  0 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2019-10-03 16:50 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, AceLan Kao, Jens Axboe, Jan Kara,
	Greg Kroah-Hartman, linux-kernel

Hello, Mika.

On Wed, Oct 02, 2019 at 03:21:36PM +0300, Mika Westerberg wrote:
> but from that discussion I don't see that a more generic solution was
> ever implemented.
> 
> Any ideas how we should fix this properly?

Yeah, the only fix I can think of is not using freezable wq.  It's
just not a good idea and not all that difficult to avoid using.

Thanks.

-- 
tejun


* Re: System hangs if NVMe/SSD is removed during suspend
  2019-10-03 16:50 ` Tejun Heo
@ 2019-10-04  8:03   ` Mika Westerberg
  2019-10-04  9:59     ` Rafael J. Wysocki
  0 siblings, 1 reply; 8+ messages in thread
From: Mika Westerberg @ 2019-10-04  8:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Rafael J. Wysocki, AceLan Kao, Jens Axboe, Jan Kara,
	Greg Kroah-Hartman, linux-kernel

On Thu, Oct 03, 2019 at 09:50:33AM -0700, Tejun Heo wrote:
> Hello, Mika.
> 
> On Wed, Oct 02, 2019 at 03:21:36PM +0300, Mika Westerberg wrote:
> > but from that discussion I don't see that a more generic solution was
> > ever implemented.
> > 
> > Any ideas how we should fix this properly?
> 
> Yeah, the only fix I can think of is not using freezable wq.  It's
> just not a good idea and not all that difficult to avoid using.

OK, thanks.

In that case I will just make a patch that removes WQ_FREEZABLE from
bdi_wq and see what people think about it :)


* Re: System hangs if NVMe/SSD is removed during suspend
  2019-10-04  8:03   ` Mika Westerberg
@ 2019-10-04  9:59     ` Rafael J. Wysocki
  2019-10-04 11:01       ` Mika Westerberg
  0 siblings, 1 reply; 8+ messages in thread
From: Rafael J. Wysocki @ 2019-10-04  9:59 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Tejun Heo, Rafael J. Wysocki, AceLan Kao, Jens Axboe, Jan Kara,
	Greg Kroah-Hartman, linux-kernel

On Friday, October 4, 2019 10:03:40 AM CEST Mika Westerberg wrote:
> On Thu, Oct 03, 2019 at 09:50:33AM -0700, Tejun Heo wrote:
> > Hello, Mika.
> > 
> > On Wed, Oct 02, 2019 at 03:21:36PM +0300, Mika Westerberg wrote:
> > > but from that discussion I don't see that a more generic solution was
> > > ever implemented.
> > > 
> > > Any ideas how we should fix this properly?
> > 
> > Yeah, the only fix I can think of is not using freezable wq.  It's
> > just not a good idea and not all that difficult to avoid using.
> 
> OK, thanks.
> 
> In that case I will just make a patch that removes WQ_FREEZABLE from
> bdi_wq and see what people think about it :)

I guess that depends on why WQ_FREEZABLE was added to it in the first place. :-)

The reason might be to avoid writes to persistent storage after creating an
image during hibernation, since wqs remain frozen throughout the entire
hibernation including the image saving phase.

Arguably, making the wq freezable is kind of a sledgehammer approach to that
particular issue, but in principle it may prevent data corruption from
occurring, so be careful there.





* Re: System hangs if NVMe/SSD is removed during suspend
  2019-10-04  9:59     ` Rafael J. Wysocki
@ 2019-10-04 11:01       ` Mika Westerberg
  2019-10-04 13:32         ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Mika Westerberg @ 2019-10-04 11:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Tejun Heo, Rafael J. Wysocki, AceLan Kao, Jens Axboe, Jan Kara,
	Greg Kroah-Hartman, linux-kernel

On Fri, Oct 04, 2019 at 11:59:26AM +0200, Rafael J. Wysocki wrote:
> On Friday, October 4, 2019 10:03:40 AM CEST Mika Westerberg wrote:
> > On Thu, Oct 03, 2019 at 09:50:33AM -0700, Tejun Heo wrote:
> > > Hello, Mika.
> > > 
> > > On Wed, Oct 02, 2019 at 03:21:36PM +0300, Mika Westerberg wrote:
> > > > but from that discussion I don't see that a more generic solution was
> > > > ever implemented.
> > > > 
> > > > Any ideas how we should fix this properly?
> > > 
> > > Yeah, the only fix I can think of is not using freezable wq.  It's
> > > just not a good idea and not all that difficult to avoid using.
> > 
> > OK, thanks.
> > 
> > In that case I will just make a patch that removes WQ_FREEZABLE from
> > bdi_wq and see what people think about it :)
> 
> I guess that depends on why WQ_FREEZABLE was added to it in the first place. :-)
> 
> The reason might be to avoid writes to persistent storage after creating an
> image during hibernation, since wqs remain frozen throughout the entire
> hibernation including the image saving phase.

Good point.

> Arguably, making the wq freezable is kind of a sledgehammer approach to that
> particular issue, but in principle it may prevent data corruption from
> occurring, so be careful there.

I tried to find the commit that introduced the "freezing" and I think it
is this one:

  03ba3782e8dc writeback: switch to per-bdi threads for flushing data

Unfortunately from that commit it is not clear (at least to me) why it
calls set_freezable() for the bdi task. It does not look like it has
anything to do with blocking writes to storage while entering
hibernation but I may be mistaken.
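
For reference, in that old model the flusher was a freezable kthread.
A simplified sketch of such a loop looks roughly like this (this is my
paraphrase to show what set_freezable() buys you, not the actual code
from that commit):

#include <linux/kthread.h>
#include <linux/freezer.h>
#include <linux/sched.h>

static int flusher_thread(void *data)
{
        /* Opt this kthread into the freezer. */
        set_freezable();

        while (!kthread_should_stop()) {
                /*
                 * Park here when tasks are frozen for suspend or
                 * hibernation; continue the loop after thaw.
                 */
                try_to_freeze();

                /* ... do the periodic writeback work here ... */
                schedule_timeout_interruptible(5 * HZ);
        }
        return 0;
}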


* Re: System hangs if NVMe/SSD is removed during suspend
  2019-10-04 11:01       ` Mika Westerberg
@ 2019-10-04 13:32         ` Jens Axboe
  2019-10-07 10:08           ` Jan Kara
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2019-10-04 13:32 UTC (permalink / raw)
  To: Mika Westerberg, Rafael J. Wysocki
  Cc: Tejun Heo, Rafael J. Wysocki, AceLan Kao, Jan Kara,
	Greg Kroah-Hartman, linux-kernel

On 10/4/19 5:01 AM, Mika Westerberg wrote:
> On Fri, Oct 04, 2019 at 11:59:26AM +0200, Rafael J. Wysocki wrote:
>> On Friday, October 4, 2019 10:03:40 AM CEST Mika Westerberg wrote:
>>> On Thu, Oct 03, 2019 at 09:50:33AM -0700, Tejun Heo wrote:
>>>> Hello, Mika.
>>>>
>>>> On Wed, Oct 02, 2019 at 03:21:36PM +0300, Mika Westerberg wrote:
>>>>> but from that discussion I don't see that a more generic solution was
>>>>> ever implemented.
>>>>>
>>>>> Any ideas how we should fix this properly?
>>>>
>>>> Yeah, the only fix I can think of is not using freezable wq.  It's
>>>> just not a good idea and not all that difficult to avoid using.
>>>
>>> OK, thanks.
>>>
>>> In that case I will just make a patch that removes WQ_FREEZABLE from
>>> bdi_wq and see what people think about it :)
>>
>> I guess that depends on why WQ_FREEZABLE was added to it in the first place. :-)
>>
>> The reason might be to avoid writes to persistent storage after creating an
>> image during hibernation, since wqs remain frozen throughout the entire
>> hibernation including the image saving phase.
> 
> Good point.
> 
>> Arguably, making the wq freezable is kind of a sledgehammer approach to that
>> particular issue, but in principle it may prevent data corruption from
>> occurring, so be careful there.
> 
> I tried to find the commit that introduced the "freezing" and I think it
> is this one:
> 
>    03ba3782e8dc writeback: switch to per-bdi threads for flushing data
> 
> Unfortunately from that commit it is not clear (at least to me) why it
> calls set_freezable() for the bdi task. It does not look like it has
> anything to do with blocking writes to storage while entering
> hibernation but I may be mistaken.

Wow, a decade ago...

Honestly, I don't recall why these were marked freezable, and as I wrote
in the other reply, I don't think there's a good reason for that to be
the case.

-- 
Jens Axboe



* Re: System hangs if NVMe/SSD is removed during suspend
  2019-10-04 13:32         ` Jens Axboe
@ 2019-10-07 10:08           ` Jan Kara
  2019-10-09 13:22             ` Rafael J. Wysocki
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2019-10-07 10:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Mika Westerberg, Rafael J. Wysocki, Tejun Heo, Rafael J. Wysocki,
	AceLan Kao, Jan Kara, Greg Kroah-Hartman, linux-kernel

On Fri 04-10-19 07:32:40, Jens Axboe wrote:
> On 10/4/19 5:01 AM, Mika Westerberg wrote:
> > On Fri, Oct 04, 2019 at 11:59:26AM +0200, Rafael J. Wysocki wrote:
> >> On Friday, October 4, 2019 10:03:40 AM CEST Mika Westerberg wrote:
> >>> On Thu, Oct 03, 2019 at 09:50:33AM -0700, Tejun Heo wrote:
> >>>> Hello, Mika.
> >>>>
> >>>> On Wed, Oct 02, 2019 at 03:21:36PM +0300, Mika Westerberg wrote:
> >>>>> but from that discussion I don't see that a more generic solution was
> >>>>> ever implemented.
> >>>>>
> >>>>> Any ideas how we should fix this properly?
> >>>>
> >>>> Yeah, the only fix I can think of is not using freezable wq.  It's
> >>>> just not a good idea and not all that difficult to avoid using.
> >>>
> >>> OK, thanks.
> >>>
> >>> In that case I will just make a patch that removes WQ_FREEZABLE from
> >>> bdi_wq and see what people think about it :)
> >>
> >> I guess that depends on why WQ_FREEZABLE was added to it in the first place. :-)
> >>
> >> The reason might be to avoid writes to persistent storage after creating an
> >> image during hibernation, since wqs remain frozen throughout the entire
> >> hibernation including the image saving phase.
> > 
> > Good point.
> > 
> >> Arguably, making the wq freezable is kind of a sledgehammer approach to that
> >> particular issue, but in principle it may prevent data corruption from
> >> occurring, so be careful there.
> > 
> > I tried to find the commit that introduced the "freezing" and I think it
> > is this one:
> > 
> >    03ba3782e8dc writeback: switch to per-bdi threads for flushing data
> > 
> > Unfortunately from that commit it is not clear (at least to me) why it
> > calls set_freezable() for the bdi task. It does not look like it has
> > anything to do with blocking writes to storage while entering
> > hibernation but I may be mistaken.
> 
> Wow, a decade ago...
> 
> Honestly, I don't recall why these were marked freezable, and as I wrote
> in the other reply, I don't think there's a good reason for that to be
> the case.

Well, can't it happen that the flush worker gets stuck in D state
because some subsystem is already suspended, and hibernation then fails
(because AFAIK processes in uninterruptible sleep block hibernation)?

I was also somewhat worried that the hibernation image could be
inconsistent if the flush workers do something while the image is being
taken, but that does not seem to be a valid concern, as all kernel
processes get frozen before the image is taken.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: System hangs if NVMe/SSD is removed during suspend
  2019-10-07 10:08           ` Jan Kara
@ 2019-10-09 13:22             ` Rafael J. Wysocki
  0 siblings, 0 replies; 8+ messages in thread
From: Rafael J. Wysocki @ 2019-10-09 13:22 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, Mika Westerberg, Rafael J. Wysocki, Tejun Heo,
	AceLan Kao, Greg Kroah-Hartman, linux-kernel

On 10/7/2019 12:08 PM, Jan Kara wrote:
> On Fri 04-10-19 07:32:40, Jens Axboe wrote:
>> On 10/4/19 5:01 AM, Mika Westerberg wrote:
>>> On Fri, Oct 04, 2019 at 11:59:26AM +0200, Rafael J. Wysocki wrote:
>>>> On Friday, October 4, 2019 10:03:40 AM CEST Mika Westerberg wrote:
>>>>> On Thu, Oct 03, 2019 at 09:50:33AM -0700, Tejun Heo wrote:
>>>>>> Hello, Mika.
>>>>>>
>>>>>> On Wed, Oct 02, 2019 at 03:21:36PM +0300, Mika Westerberg wrote:
>>>>>>> but from that discussion I don't see that a more generic solution was
>>>>>>> ever implemented.
>>>>>>>
>>>>>>> Any ideas how we should fix this properly?
>>>>>> Yeah, the only fix I can think of is not using freezable wq.  It's
>>>>>> just not a good idea and not all that difficult to avoid using.
>>>>> OK, thanks.
>>>>>
>>>>> In that case I will just make a patch that removes WQ_FREEZABLE from
>>>>> bdi_wq and see what people think about it :)
>>>> I guess that depends on why WQ_FREEZABLE was added to it in the first place. :-)
>>>>
>>>> The reason might be to avoid writes to persistent storage after creating an
>>>> image during hibernation, since wqs remain frozen throughout the entire
>>>> hibernation including the image saving phase.
>>> Good point.
>>>
>>>> Arguably, making the wq freezable is kind of a sledgehammer approach to that
>>>> particular issue, but in principle it may prevent data corruption from
>>>> occurring, so be careful there.
>>> I tried to find the commit that introduced the "freezing" and I think it
>>> is this one:
>>>
>>>     03ba3782e8dc writeback: switch to per-bdi threads for flushing data
>>>
>>> Unfortunately from that commit it is not clear (at least to me) why it
>>> calls set_freezable() for the bdi task. It does not look like it has
>>> anything to do with blocking writes to storage while entering
>>> hibernation but I may be mistaken.
>> Wow, a decade ago...
>>
>> Honestly, I don't recall why these were marked freezable, and as I wrote
>> in the other reply, I don't think there's a good reason for that to be
>> the case.
> Well, can't it happen that the flush worker gets stuck in D state
> because some subsystem is already suspended, and hibernation then fails
> (because AFAIK processes in uninterruptible sleep block hibernation)?
>
> I was also somewhat worried that the hibernation image could be
> inconsistent if the flush workers do something while the image is being
> taken, but that does not seem to be a valid concern, as all kernel
> processes get frozen before the image is taken.

To be precise, nothing is scheduled while creating a hibernation image,
but once the image has been created, threads that are not frozen can be
scheduled again, and there are kernel threads which aren't frozen.

So the question is whether or not any of the kernel threads that are
not frozen can do anything potentially unsafe if the bdi wq is not
freezable, and I don't quite see what that might be.



