* [BUG] cpu hot-remove fails with nvme device
@ 2015-03-16 12:07 Yigal Korman
  2015-03-16 17:51 ` Keith Busch
  0 siblings, 1 reply; 3+ messages in thread
From: Yigal Korman @ 2015-03-16 12:07 UTC


Hi,

Using cpu hotplug to disable (offline) a cpu:
echo 0 > /sys/devices/system/cpu/cpu10/online
does not return.

The machine stays usable and cpu10 is marked as offline, but any test
script hangs because the "echo 0" never returns.

Bisecting pointed to this commit: a4aea5623d "NVMe: Convert to blk-mq"
Indeed, the test machine in question has an NVMe card.
Doing "rmmod nvme" and then disabling cpus works.
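
One way to keep a test script from blocking on this is to do the sysfs
write in a child process and stop waiting after a deadline. The following
is a minimal Python sketch under that assumption; the helper name
write_with_deadline, the status strings, and the 10-second default are
illustrative and not taken from any existing tool. A child that is still
stuck is left behind rather than killed, since a writer blocked inside the
kernel may not respond to signals.

```python
import subprocess
import sys

# Source of the child process: write argv[2] to the file named by argv[1].
_CHILD = (
    "import sys\n"
    "with open(sys.argv[1], 'w') as f:\n"
    "    f.write(sys.argv[2] + '\\n')\n"
)


def write_with_deadline(path, value="0", timeout_s=10.0):
    """Write `value` to `path` in a child process.

    Returns "returned" if the write finished in time, "failed" if the
    child exited with an error, and "hung" if it was still running
    after `timeout_s` seconds.
    """
    proc = subprocess.Popen([sys.executable, "-c", _CHILD, path, value])
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Do not wait any further; the caller can report the hang.
        return "hung"
    return "returned" if rc == 0 else "failed"


# Hypothetical usage on the affected machine (needs root):
#   write_with_deadline("/sys/devices/system/cpu/cpu10/online")
```

With this, a test can log "hung" and move on instead of blocking forever on
the "echo 0".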

Thanks,
Yigal


* [BUG] cpu hot-remove fails with nvme device
  2015-03-16 12:07 [BUG] cpu hot-remove fails with nvme device Yigal Korman
@ 2015-03-16 17:51 ` Keith Busch
  2015-03-16 19:38   ` Keith Busch
  0 siblings, 1 reply; 3+ messages in thread
From: Keith Busch @ 2015-03-16 17:51 UTC


On Mon, 16 Mar 2015, Yigal Korman wrote:
> Hi,
>
> Using cpu hotplug to disable (offline) a cpu:
> echo 0 > /sys/devices/system/cpu/cpu10/online
> does not return.
>
> The machine stays usable and cpu10 is marked as offline, but any test
> script hangs because the "echo 0" never returns.
>
> Bisecting pointed to this commit: a4aea5623d "NVMe: Convert to blk-mq"
> Indeed, the test machine in question has an NVMe card.
> Doing "rmmod nvme" and then disabling cpus works.

Thanks for the heads-up. Do you know if nvme is the only driver
subscribing to blk-mq in your system? The nvme driver doesn't register
with cpu notification anymore, so I'm just trying to see if this is
a generic blk-mq issue or if there is some other interaction specific
to nvme.


* [BUG] cpu hot-remove fails with nvme device
  2015-03-16 17:51 ` Keith Busch
@ 2015-03-16 19:38   ` Keith Busch
  0 siblings, 0 replies; 3+ messages in thread
From: Keith Busch @ 2015-03-16 19:38 UTC


> On Mon, 16 Mar 2015, Yigal Korman wrote:
>> Using cpu hotplug to disable (offline) a cpu:
>> echo 0 > /sys/devices/system/cpu/cpu10/online
>> does not return.

I just got back to my lab and tried your test. It's failing because the
blk-mq hot cpu notifier callback freezes the queues and waits for all
who've entered to exit (i.e., no outstanding requests). The NVMe driver
submits async admin commands that don't have a timeout, so this queue
never has zero references until the driver shuts the controller down.
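
The wait described above can be illustrated with a small userspace model.
This is only a sketch in Python, not kernel code: the class QueueModel and
its methods enter, exit and freeze_and_wait are hypothetical names, and the
timeout exists only so the demonstration terminates (the wait described
above has no such bound). One enter() that is never matched by an exit()
stands in for an async admin command without a timeout.

```python
import threading


class QueueModel:
    """Toy model of a queue whose freeze waits for all users to leave."""

    def __init__(self):
        self._cond = threading.Condition()
        self._refs = 0        # number of users currently inside the queue
        self._frozen = False  # once set, new users are refused

    def enter(self):
        """Take a reference; returns False if the queue is frozen."""
        with self._cond:
            if self._frozen:
                return False
            self._refs += 1
            return True

    def exit(self):
        """Drop a reference and wake any waiter when none remain."""
        with self._cond:
            self._refs -= 1
            if self._refs == 0:
                self._cond.notify_all()

    def freeze_and_wait(self, timeout_s):
        """Freeze, then wait until the reference count reaches zero.

        Returns True if the count reached zero, False on timeout.
        """
        with self._cond:
            self._frozen = True
            return self._cond.wait_for(lambda: self._refs == 0,
                                       timeout=timeout_s)


q = QueueModel()
q.enter()                      # outstanding command that never completes
print(q.freeze_and_wait(0.1))  # False: the reference is still held
q.exit()                       # e.g. the controller is shut down
print(q.freeze_and_wait(0.1))  # True: no references remain
```

In the model, the freeze only completes once the long-lived reference is
dropped, which mirrors why the hot-remove finishes after the driver shuts
the controller down.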

I don't have a good idea how to fix this right now. I'll look into it
unless Jens or someone else has a suggestion.
