* [LSF/MM TOPIC] Handling of managed IRQs when hotplugging CPUs
@ 2019-02-05 15:28 Hannes Reinecke
  2019-02-19  2:19 ` Ming Lei
  0 siblings, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2019-02-05 15:28 UTC (permalink / raw)
  To: lsf-pc, SCSI Mailing List, linux-block

Hi all,

this came up during discussion on the mailing list (cf thread "Question 
on handling managed IRQs when hotplugging CPUs").
The problem is that with managed IRQs and block-mq, I/O will be routed
to individual CPUs, and the response will be sent via the IRQ assigned
to that CPU.

If a CPU hotplug event now occurs while I/O is still in flight, the IRQ
will _still_ be assigned to that CPU, causing any pending interrupt to
be lost.
Hence the driver will never notice that an interrupt has happened, and 
an I/O timeout occurs.
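
To illustrate where that binding comes from, here is a rough sketch of
how a typical PCI storage driver sets up its managed vectors (not taken
from any particular driver; foo_setup_irqs() and nr_queues are made-up
names):

#include <linux/interrupt.h>
#include <linux/pci.h>

/*
 * Rough sketch only: one MSI-X vector per I/O queue, spread over the
 * CPUs by the managed-affinity code.
 */
static int foo_setup_irqs(struct pci_dev *pdev, unsigned int nr_queues)
{
	struct irq_affinity affd = { .pre_vectors = 1 }; /* admin vector */
	int nr_vecs;

	nr_vecs = pci_alloc_irq_vectors_affinity(pdev, 2, nr_queues + 1,
			PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, &affd);
	if (nr_vecs < 0)
		return nr_vecs;

	/*
	 * Each I/O vector is now affine to a fixed set of CPUs, and
	 * blk_mq_pci_map_queues() maps hctx <-> CPU along the same
	 * masks.  A request submitted on CPU N is completed via the
	 * vector bound to CPU N; if that CPU goes away, so does the
	 * only place its completion can be delivered.
	 */
	return nr_vecs;
}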

One proposal was to quiesce the device when a CPU hotplug event occurs, 
and only allow for CPU hotplugging once it's fully quiesced.
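
As a purely illustrative sketch of that idea (the foo_* names and the
hook point are invented; whether a hotplug callback runs early enough,
and what happens to I/O issued right after the drain, is part of the
question):

#include <linux/init.h>
#include <linux/list.h>
#include <linux/blk-mq.h>
#include <linux/cpuhotplug.h>

/* Stand-in driver state, for illustration only. */
struct foo_dev {
	struct request_queue	*queue;
	struct list_head	list;
};
static LIST_HEAD(foo_devices);

static int foo_cpu_going_down(unsigned int cpu)
{
	struct foo_dev *fdev;

	/*
	 * Drain: blk_mq_freeze_queue() waits until every in-flight
	 * request has completed, so nothing is left waiting for a
	 * completion on the vector of the CPU that is about to go away.
	 */
	list_for_each_entry(fdev, &foo_devices, list) {
		blk_mq_freeze_queue(fdev->queue);
		blk_mq_unfreeze_queue(fdev->queue);
	}
	return 0;
}

static int __init foo_hotplug_init(void)
{
	int ret;

	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "block/foo:offline",
				NULL, foo_cpu_going_down);
	return ret < 0 ? ret : 0;
}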

While this would work, it would introduce quite a noticeable system
stall and would actually have a rather big impact on the system.
Another possibility would be to have the driver abort the requests
itself, but this requires specific callbacks into the driver, and, of 
course, the driver having the ability to actually do so.
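
As a sketch of what such a driver callback could do, one option is to
walk the in-flight requests via blk_mq_tagset_busy_iter() and fail them
(illustrative only; the foo_* names are invented, and the iterator
callback signature has changed between kernel versions):

#include <linux/blk-mq.h>

/* Illustrative only: fail everything still in flight so the upper
 * layers see an error instead of a silent hang. */
static bool foo_abort_request(struct request *rq, void *data, bool reserved)
{
	/*
	 * A real driver would first make sure the hardware has given up
	 * on the command (e.g. after a controller reset), and might
	 * prefer blk_mq_requeue_request() over failing outright.
	 */
	blk_mq_end_request(rq, BLK_STS_IOERR);
	return true;	/* continue iterating */
}

static void foo_abort_all_inflight(struct blk_mq_tag_set *set)
{
	blk_mq_tagset_busy_iter(set, foo_abort_request, NULL);
}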

I would like to discuss at LSF/MM how these issues can be addressed best.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [LSF/MM TOPIC] Handling of managed IRQs when hotplugging CPUs
  2019-02-05 15:28 [LSF/MM TOPIC] Handling of managed IRQs when hotplugging CPUs Hannes Reinecke
@ 2019-02-19  2:19 ` Ming Lei
  2019-02-19 14:24   ` Hannes Reinecke
  2019-02-25 17:22   ` Hannes Reinecke
  0 siblings, 2 replies; 5+ messages in thread
From: Ming Lei @ 2019-02-19  2:19 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: lsf-pc, SCSI Mailing List, linux-block, Thomas Gleixner,
	Christoph Hellwig

On Tue, Feb 5, 2019 at 11:30 PM Hannes Reinecke <hare@suse.de> wrote:
>
> Hi all,
>
> this came up during discussion on the mailing list (cf thread "Question
> on handling managed IRQs when hotplugging CPUs").
> The problem is that with managed IRQs and block-mq, I/O will be routed
> to individual CPUs, and the response will be sent via the IRQ assigned
> to that CPU.
>
> If a CPU hotplug event now occurs while I/O is still in flight, the IRQ
> will _still_ be assigned to that CPU, causing any pending interrupt to
> be lost.
> Hence the driver will never notice that an interrupt has happened, and
> an I/O timeout occurs.

Lots of drivers' timeout handlers only return BLK_EH_RESET_TIMER,
so this situation can't be covered by the I/O timeout for these devices.

For example, we have seen I/O hang issues on HPSA and megaraid_sas
before when the wrong MSI vector was set on an I/O command. One such
issue on aacraid isn't even fixed yet.
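
For clarity, the BLK_EH_RESET_TIMER pattern in question looks roughly
like this (schematic only, not copied from any specific driver):

#include <linux/blk-mq.h>

/*
 * Schematic ->timeout() handler: the request timer is simply re-armed,
 * so if the completion interrupt was dropped because its CPU went away,
 * the I/O hangs forever instead of being failed or retried.
 */
static enum blk_eh_timer_return foo_timeout(struct request *rq, bool reserved)
{
	return BLK_EH_RESET_TIMER;
}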

>
> One proposal was to quiesce the device when a CPU hotplug event occurs,
> and only allow for CPU hotplugging once it's fully quiesced.

That is the original solution, but the big problem is that queue
dependencies exist: a loop/DM queue depends on the underlying device's
queue, and an NVMe I/O queue depends on its admin queue.

>
> While this would work, it would introduce quite a noticeable system
> stall and would actually have a rather big impact on the system.
> Another possibility would be to have the driver abort the requests
> itself, but this requires specific callbacks into the driver, and, of
> course, the driver having the ability to actually do so.
>
> I would like to discuss at LSF/MM how these issues can be addressed best.

One related topic is that the current static queue mapping, which does
not involve a CPU hotplug handler, may waste lots of IRQ vectors [1];
how should we deal with this problem?

[1] http://lists.infradead.org/pipermail/linux-nvme/2019-January/021961.html

Thanks,
Ming Lei


* Re: [LSF/MM TOPIC] Handling of managed IRQs when hotplugging CPUs
  2019-02-19  2:19 ` Ming Lei
@ 2019-02-19 14:24   ` Hannes Reinecke
  2019-02-19 15:14     ` Ming Lei
  2019-02-25 17:22   ` Hannes Reinecke
  1 sibling, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2019-02-19 14:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: lsf-pc, SCSI Mailing List, linux-block, Thomas Gleixner,
	Christoph Hellwig

On 2/19/19 3:19 AM, Ming Lei wrote:
> On Tue, Feb 5, 2019 at 11:30 PM Hannes Reinecke <hare@suse.de> wrote:
>>
>> Hi all,
>>
>> this came up during discussion on the mailing list (cf thread "Question
>> on handling managed IRQs when hotplugging CPUs").
>> The problem is that with managed IRQs and block-mq, I/O will be routed
>> to individual CPUs, and the response will be sent via the IRQ assigned
>> to that CPU.
>>
>> If a CPU hotplug event now occurs while I/O is still in flight, the IRQ
>> will _still_ be assigned to that CPU, causing any pending interrupt to
>> be lost.
>> Hence the driver will never notice that an interrupt has happened, and
>> an I/O timeout occurs.
> 
> Lots of drivers' timeout handlers only return BLK_EH_RESET_TIMER,
> so this situation can't be covered by the I/O timeout for these devices.
> 
> For example, we have seen I/O hang issues on HPSA and megaraid_sas
> before when the wrong MSI vector was set on an I/O command. One such
> issue on aacraid isn't even fixed yet.
> 
>>
>> One proposal was to quiesce the device when a CPU hotplug event occurs,
>> and only allow for CPU hotplugging once it's fully quiesced.
> 
> That is the original solution, but the big problem is that queue
> dependencies exist: a loop/DM queue depends on the underlying device's
> queue, and an NVMe I/O queue depends on its admin queue.
> 
>>
>> While this would work, it would introduce quite a noticeable system
>> stall and would actually have a rather big impact on the system.
>> Another possibility would be to have the driver abort the requests
>> itself, but this requires specific callbacks into the driver, and, of
>> course, the driver having the ability to actually do so.
>>
>> I would like to discuss at LSF/MM how these issues can be addressed best.
> 
> One related topic is that the current static queue mapping, which does
> not involve a CPU hotplug handler, may waste lots of IRQ vectors [1];
> how should we deal with this problem?
> 
> [1] http://lists.infradead.org/pipermail/linux-nvme/2019-January/021961.html
> 
Yes, ideally I would like to touch upon that, too.
Additionally, we have the issue raised by the mpt3sas folks [2], where
they ran into a CPU lockup when there are more CPU cores than
interrupt vectors.

[2] https://patchwork.kernel.org/cover/10811825

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [LSF/MM TOPIC] Handling of managed IRQs when hotplugging CPUs
  2019-02-19 14:24   ` Hannes Reinecke
@ 2019-02-19 15:14     ` Ming Lei
  0 siblings, 0 replies; 5+ messages in thread
From: Ming Lei @ 2019-02-19 15:14 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: lsf-pc, SCSI Mailing List, linux-block, Thomas Gleixner,
	Christoph Hellwig

On Tue, Feb 19, 2019 at 10:24 PM Hannes Reinecke <hare@suse.de> wrote:
>
> On 2/19/19 3:19 AM, Ming Lei wrote:
> > On Tue, Feb 5, 2019 at 11:30 PM Hannes Reinecke <hare@suse.de> wrote:
> >>
> >> Hi all,
> >>
> >> this came up during discussion on the mailing list (cf thread "Question
> >> on handling managed IRQs when hotplugging CPUs").
> >> The problem is that with managed IRQs and block-mq, I/O will be routed
> >> to individual CPUs, and the response will be sent via the IRQ assigned
> >> to that CPU.
> >>
> >> If a CPU hotplug event now occurs while I/O is still in flight, the IRQ
> >> will _still_ be assigned to that CPU, causing any pending interrupt to
> >> be lost.
> >> Hence the driver will never notice that an interrupt has happened, and
> >> an I/O timeout occurs.
> >
> > Lots of drivers' timeout handlers only return BLK_EH_RESET_TIMER,
> > so this situation can't be covered by the I/O timeout for these devices.
> >
> > For example, we have seen I/O hang issues on HPSA and megaraid_sas
> > before when the wrong MSI vector was set on an I/O command. One such
> > issue on aacraid isn't even fixed yet.
> >
> >>
> >> One proposal was to quiesce the device when a CPU hotplug event occurs,
> >> and only allow for CPU hotplugging once it's fully quiesced.
> >
> > That is the original solution, but the big problem is that queue
> > dependencies exist: a loop/DM queue depends on the underlying device's
> > queue, and an NVMe I/O queue depends on its admin queue.
> >
> >>
> >> While this would work, it would introduce quite a noticeable system
> >> stall and would actually have a rather big impact on the system.
> >> Another possibility would be to have the driver abort the requests
> >> itself, but this requires specific callbacks into the driver, and, of
> >> course, the driver having the ability to actually do so.
> >>
> >> I would like to discuss at LSF/MM how these issues can be addressed best.
> >
> > One related topic is that the current static queue mapping, which does
> > not involve a CPU hotplug handler, may waste lots of IRQ vectors [1];
> > how should we deal with this problem?
> >
> > [1] http://lists.infradead.org/pipermail/linux-nvme/2019-January/021961.html
> >
> Yes, ideally I would like to touch upon that, too.
> Additionally, we have the issue raised by the mpt3sas folks [2], where
> they ran into a CPU lockup when there are more CPU cores than
> interrupt vectors.

In theory, if the number of submission queues is bigger than the number
of completion queues, the soft lockup issue might be triggered. But in
reality, that often means the IRQ handler takes too much CPU time.

I have seen NVMe devices in which the queue mapping is 2:1
(2 submission : 1 completion), and IOPS may reach millions without any
soft lockup.

A threaded IRQ handler may help in this case too, at least for avoiding
the soft lockup.
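
For reference, a generic sketch of that approach (the foo_* names and
the completion helper are invented):

#include <linux/interrupt.h>

struct foo_queue;					/* driver-private queue state */
void foo_process_completions(struct foo_queue *fq);	/* hypothetical helper */

static irqreturn_t foo_irq(int irq, void *data)
{
	/* Hard IRQ part: do as little as possible, defer to the thread. */
	return IRQ_WAKE_THREAD;
}

static irqreturn_t foo_irq_thread(int irq, void *data)
{
	/* Runs in a kernel thread: it can be preempted and scheduled, so
	 * long completion storms no longer look like a stuck CPU. */
	foo_process_completions(data);
	return IRQ_HANDLED;
}

static int foo_request_irq(unsigned int irq, struct foo_queue *fq)
{
	return request_threaded_irq(irq, foo_irq, foo_irq_thread,
				    IRQF_ONESHOT, "foo-queue", fq);
}

With IRQF_ONESHOT the vector stays masked until the thread finishes, so
a completion storm gets throttled instead of monopolizing the CPU in
hard-irq context.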

Thanks,
Ming Lei


* Re: [LSF/MM TOPIC] Handling of managed IRQs when hotplugging CPUs
  2019-02-19  2:19 ` Ming Lei
  2019-02-19 14:24   ` Hannes Reinecke
@ 2019-02-25 17:22   ` Hannes Reinecke
  1 sibling, 0 replies; 5+ messages in thread
From: Hannes Reinecke @ 2019-02-25 17:22 UTC (permalink / raw)
  To: Ming Lei
  Cc: lsf-pc, SCSI Mailing List, linux-block, Thomas Gleixner,
	Christoph Hellwig

On 2/19/19 3:19 AM, Ming Lei wrote:
> On Tue, Feb 5, 2019 at 11:30 PM Hannes Reinecke <hare@suse.de> wrote:
>>
>> Hi all,
>>
>> this came up during discussion on the mailing list (cf thread "Question
>> on handling managed IRQs when hotplugging CPUs").
>> The problem is that with managed IRQs and block-mq, I/O will be routed
>> to individual CPUs, and the response will be sent via the IRQ assigned
>> to that CPU.
>>
>> If a CPU hotplug event now occurs while I/O is still in flight, the IRQ
>> will _still_ be assigned to that CPU, causing any pending interrupt to
>> be lost.
>> Hence the driver will never notice that an interrupt has happened, and
>> an I/O timeout occurs.
> 
> Lots of drivers' timeout handlers only return BLK_EH_RESET_TIMER,
> so this situation can't be covered by the I/O timeout for these devices.
> 
> For example, we have seen I/O hang issues on HPSA and megaraid_sas
> before when the wrong MSI vector was set on an I/O command. One such
> issue on aacraid isn't even fixed yet.
> 
Precisely.

>>
>> One proposal was to quiesce the device when a CPU hotplug event occurs,
>> and only allow for CPU hotplugging once it's fully quiesced.
> 
> That is the original solution, but the big problem is that queue
> dependencies exist: a loop/DM queue depends on the underlying device's
> queue, and an NVMe I/O queue depends on its admin queue.
> 
Well, obviously we would have to wait for _all_ queues to be quiesced.
And for stacked devices we will need to take the I/O stack into account, 
true.

>>
>> While this would work, it would introduce quite a noticeable system
>> stall and would actually have a rather big impact on the system.
>> Another possibility would be to have the driver abort the requests
>> itself, but this requires specific callbacks into the driver, and, of
>> course, the driver having the ability to actually do so.
>>
>> I would like to discuss at LSF/MM how these issues can be addressed best.
> 
> One related topic is that the current static queue mapping, which does
> not involve a CPU hotplug handler, may waste lots of IRQ vectors [1];
> how should we deal with this problem?
> 
> [1] http://lists.infradead.org/pipermail/linux-nvme/2019-January/021961.html
> 
Good point. Let's do it.

Cheers,

Hannes

