* Question on handling managed IRQs when hotplugging CPUs
@ 2019-01-29 11:25 John Garry
  2019-01-29 11:54 ` Hannes Reinecke
  2019-01-29 15:44 ` Keith Busch
  0 siblings, 2 replies; 26+ messages in thread
From: John Garry @ 2019-01-29 11:25 UTC (permalink / raw)
  To: tglx, Christoph Hellwig
  Cc: Marc Zyngier, axboe, Keith Busch, Peter Zijlstra,
	Michael Ellerman, Linuxarm, linux-kernel, Hannes Reinecke

Hi,

I have a question on $subject which I hope you can shed some light on.

According to commit c5cb83bb337c25 ("genirq/cpuhotplug: Handle managed 
IRQs on CPU hotplug"), if we offline the last CPU in a managed IRQ 
affinity mask, the IRQ is shutdown.

The reasoning is that this IRQ is thought to be associated with a 
specific queue on a MQ device, and the CPUs in the IRQ affinity mask are 
the same CPUs associated with the queue. So, if no CPU is using the 
queue, then no need for the IRQ.
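
For reference, the genirq side does roughly the following in the CPU 
offline path (paraphrased and simplified from kernel/irq/cpuhotplug.c, 
not verbatim):

static bool migrate_one_irq_sketch(struct irq_desc *desc)
{
	struct irq_data *d = irq_desc_get_irq_data(desc);
	const struct cpumask *affinity = irq_data_get_affinity_mask(d);

	/* No online CPU left in the affinity mask? */
	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
		/*
		 * A managed interrupt is shut down and its affinity is
		 * left untouched; it is restarted when a CPU from its
		 * mask comes back online.
		 */
		if (irqd_affinity_is_managed(d)) {
			irqd_set_managed_shutdown(d);
			irq_shutdown(desc);
			return false;
		}
		/* Non-managed interrupts get moved to an online CPU */
		affinity = cpu_online_mask;
	}

	return irq_do_set_affinity(d, affinity, false) == 0;
}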

However, how does this handle the scenario where the last CPU in the IRQ 
affinity mask is offlined while IO associated with the queue is still in 
flight?

Or what if we decide to use the queue associated with the current CPU, 
and then that CPU (being the last CPU online in the queue's IRQ 
affinity mask) goes offline and we finish the delivery from another CPU?

In these cases, when the IO completes, it would not be serviced and would 
time out.

I have actually tried this on my arm64 system and I see IO timeouts.

Thanks in advance,
John



* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-01-29 11:25 Question on handling managed IRQs when hotplugging CPUs John Garry
@ 2019-01-29 11:54 ` Hannes Reinecke
  2019-01-29 12:01   ` Thomas Gleixner
  2019-01-29 15:44 ` Keith Busch
  1 sibling, 1 reply; 26+ messages in thread
From: Hannes Reinecke @ 2019-01-29 11:54 UTC (permalink / raw)
  To: John Garry, tglx, Christoph Hellwig
  Cc: Marc Zyngier, axboe, Keith Busch, Peter Zijlstra,
	Michael Ellerman, Linuxarm, linux-kernel, SCSI Mailing List

On 1/29/19 12:25 PM, John Garry wrote:
> Hi,
> 
> I have a question on $subject which I hope you can shed some light on.
> 
> According to commit c5cb83bb337c25 ("genirq/cpuhotplug: Handle managed 
> IRQs on CPU hotplug"), if we offline the last CPU in a managed IRQ 
> affinity mask, the IRQ is shutdown.
> 
> The reasoning is that this IRQ is thought to be associated with a 
> specific queue on a MQ device, and the CPUs in the IRQ affinity mask are 
> the same CPUs associated with the queue. So, if no CPU is using the 
> queue, then no need for the IRQ.
> 
> However how does this handle scenario of last CPU in IRQ affinity mask 
> being offlined while IO associated with queue is still in flight?
> 
> Or if we make the decision to use queue associated with the current CPU, 
> and then that CPU (being the last CPU online in the queue's IRQ 
> afffinity mask) goes offline and we finish the delivery with another CPU?
> 
> In these cases, when the IO completes, it would not be serviced and 
> timeout.
> 
> I have actually tried this on my arm64 system and I see IO timeouts.
> 
That actually is a very good question, and I have been wondering about 
this for quite some time.

I find it a bit hard to envision a scenario where the IRQ affinity is 
automatically (and, more importantly, atomically!) re-routed to one of 
the other CPUs.
And even if it were, chances are that there are checks in the driver 
_preventing_ them from handling those requests, seeing that they should 
have been handled by another CPU ...

I guess the safest bet is to implement a 'cleanup' worker queue which is 
responsible for looking through all the outstanding commands (on all 
hardware queues), and then complete those for which no corresponding CPU 
/ irqhandler can be found.
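
A rough sketch of such a cleanup worker (the my_hba/my_cmd types and the 
cmd_irq_is_dead() check are made up, purely to illustrate the idea):

static void my_hba_cleanup_work(struct work_struct *work)
{
	struct my_hba *hba = container_of(work, struct my_hba,
					  cleanup_work);
	struct my_cmd *cmd, *tmp;
	unsigned long flags;

	spin_lock_irqsave(&hba->cmd_lock, flags);
	list_for_each_entry_safe(cmd, tmp, &hba->outstanding, node) {
		/* Hypothetical check: has this command's CQ/IRQ gone? */
		if (!cmd_irq_is_dead(hba, cmd))
			continue;
		list_del(&cmd->node);
		/* Complete it on behalf of the dead irqhandler */
		blk_mq_complete_request(cmd->rq);
	}
	spin_unlock_irqrestore(&hba->cmd_lock, flags);
}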

But I defer to the higher authorities here; maybe I'm totally wrong and 
it's already been taken care of.

But if there is no generic mechanism, this really is a fitting topic for 
LSF/MM, as most other drivers would be affected, too.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.com			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-01-29 11:54 ` Hannes Reinecke
@ 2019-01-29 12:01   ` Thomas Gleixner
  2019-01-29 15:27     ` John Garry
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2019-01-29 12:01 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: John Garry, Christoph Hellwig, Marc Zyngier, axboe, Keith Busch,
	Peter Zijlstra, Michael Ellerman, Linuxarm, linux-kernel,
	SCSI Mailing List

On Tue, 29 Jan 2019, Hannes Reinecke wrote:
> That actually is a very good question, and I have been wondering about this
> for quite some time.
> 
> I find it a bit hard to envision a scenario where the IRQ affinity is
> automatically (and, more importantly, atomically!) re-routed to one of the
> other CPUs.
> And even it it were, chances are that there are checks in the driver
> _preventing_ them from handling those requests, seeing that they should have
> been handled by another CPU ...
> 
> I guess the safest bet is to implement a 'cleanup' worker queue which is
> responsible of looking through all the outstanding commands (on all hardware
> queues), and then complete those for which no corresponding CPU / irqhandler
> can be found.
> 
> But I defer to the higher authorities here; maybe I'm totally wrong and it's
> already been taken care of.

TBH, I don't know. I merely was involved in the genirq side of this. But
yes, in order to make this work correctly the basic contract for the CPU
hotplug case must be:

If the last CPU which is associated with a queue (and the corresponding
interrupt) goes offline, then the subsystem/driver code has to make sure
that:

   1) No more requests can be queued on that queue

   2) All outstanding requests of that queue have been completed or
      redirected (don't know if that's possible at all) to some other queue.

That has to be done in that order obviously. Whether any of the
subsystems/drivers actually implements this, I can't tell.
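
A minimal sketch of that contract from the driver side could look like 
this (the my_hba_* types and helpers are made up; only the shape matters):

static int my_hba_queue_cpu_offline(unsigned int cpu, struct hlist_node *node)
{
	struct my_hba_queue *q = hlist_entry(node, struct my_hba_queue,
					     cpuhp_node);

	/* Hypothetical helper: is 'cpu' the last online CPU of this queue? */
	if (!my_hba_queue_loses_last_cpu(q, cpu))
		return 0;

	/* 1) No more requests can be queued on that queue */
	my_hba_stop_queue(q);

	/* 2) All outstanding requests of that queue have been completed */
	wait_event(q->drain_wq, my_hba_queue_is_empty(q));

	return 0;
}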

Thanks,

	tglx


* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-01-29 12:01   ` Thomas Gleixner
@ 2019-01-29 15:27     ` John Garry
  2019-01-29 16:27       ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: John Garry @ 2019-01-29 15:27 UTC (permalink / raw)
  To: Thomas Gleixner, Hannes Reinecke
  Cc: Christoph Hellwig, Marc Zyngier, axboe, Keith Busch,
	Peter Zijlstra, Michael Ellerman, Linuxarm, linux-kernel,
	SCSI Mailing List

Hi Hannes, Thomas,

On 29/01/2019 12:01, Thomas Gleixner wrote:
> On Tue, 29 Jan 2019, Hannes Reinecke wrote:
>> That actually is a very good question, and I have been wondering about this
>> for quite some time.
>>
>> I find it a bit hard to envision a scenario where the IRQ affinity is
>> automatically (and, more importantly, atomically!) re-routed to one of the
>> other CPUs.

Isn't this what happens today for non-managed IRQs?

>> And even it it were, chances are that there are checks in the driver
>> _preventing_ them from handling those requests, seeing that they should have
>> been handled by another CPU ...

Really? I would not think that it matters which CPU we service the 
interrupt on.

>>
>> I guess the safest bet is to implement a 'cleanup' worker queue which is
>> responsible of looking through all the outstanding commands (on all hardware
>> queues), and then complete those for which no corresponding CPU / irqhandler
>> can be found.
>>
>> But I defer to the higher authorities here; maybe I'm totally wrong and it's
>> already been taken care of.
>
> TBH, I don't know. I merily was involved in the genirq side of this. But
> yes, in order to make this work correctly the basic contract for CPU
> hotplug case must be:
>
> If the last CPU which is associated to a queue (and the corresponding
> interrupt) goes offline, then the subsytem/driver code has to make sure
> that:
>
>    1) No more requests can be queued on that queue
>
>    2) All outstanding of that queue have been completed or redirected
>       (don't know if that's possible at all) to some other queue.

This may not be possible. For the HW I deal with, we have symmetrical 
delivery and completion queues, and a command delivered on DQx will 
always complete on CQx. Each completion queue has a dedicated IRQ.

>
> That has to be done in that order obviously. Whether any of the
> subsystems/drivers actually implements this, I can't tell.

Going back to c5cb83bb337c25, it seems to me that the change was made 
with the idea that we can leave the IRQ affinity untouched and simply 
shut the IRQ down, since no interrupts should occur anyway.

However, I don't see why we can't instead keep the IRQ up, set its 
affinity to all online CPUs in the offline path, and restore the original 
affinity in the online path. The reason we set the queue affinity to 
specific CPUs is performance, but I would not say that this matters for 
handling residual IRQs.
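
In code terms, the idea would be roughly the following in the genirq 
hotplug path (a sketch only; it deliberately ignores how the original 
mask would be stashed and restored, and the vector accounting):

static bool keep_managed_irq_alive(struct irq_desc *desc)
{
	struct irq_data *d = irq_desc_get_irq_data(desc);
	const struct cpumask *affinity = irq_data_get_affinity_mask(d);

	if (!irqd_affinity_is_managed(d) ||
	    cpumask_any_and(affinity, cpu_online_mask) < nr_cpu_ids)
		return false;

	/* Last CPU of the managed mask is going away: spread the IRQ
	 * over all online CPUs instead of shutting it down. */
	return irq_do_set_affinity(d, cpu_online_mask, false) == 0;
}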

Thanks,
John

>
> Thanks,
>
> 	tglx
>
> .
>




* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-01-29 11:25 Question on handling managed IRQs when hotplugging CPUs John Garry
  2019-01-29 11:54 ` Hannes Reinecke
@ 2019-01-29 15:44 ` Keith Busch
  2019-01-29 17:12   ` John Garry
  1 sibling, 1 reply; 26+ messages in thread
From: Keith Busch @ 2019-01-29 15:44 UTC (permalink / raw)
  To: John Garry
  Cc: tglx, Christoph Hellwig, Marc Zyngier, axboe, Peter Zijlstra,
	Michael Ellerman, Linuxarm, linux-kernel, Hannes Reinecke

On Tue, Jan 29, 2019 at 03:25:48AM -0800, John Garry wrote:
> Hi,
> 
> I have a question on $subject which I hope you can shed some light on.
> 
> According to commit c5cb83bb337c25 ("genirq/cpuhotplug: Handle managed 
> IRQs on CPU hotplug"), if we offline the last CPU in a managed IRQ 
> affinity mask, the IRQ is shutdown.
> 
> The reasoning is that this IRQ is thought to be associated with a 
> specific queue on a MQ device, and the CPUs in the IRQ affinity mask are 
> the same CPUs associated with the queue. So, if no CPU is using the 
> queue, then no need for the IRQ.
> 
> However how does this handle scenario of last CPU in IRQ affinity mask 
> being offlined while IO associated with queue is still in flight?
> 
> Or if we make the decision to use queue associated with the current CPU, 
> and then that CPU (being the last CPU online in the queue's IRQ 
> afffinity mask) goes offline and we finish the delivery with another CPU?
> 
> In these cases, when the IO completes, it would not be serviced and timeout.
> 
> I have actually tried this on my arm64 system and I see IO timeouts.

Hm, we used to freeze the queues with the CPUHP_BLK_MQ_PREPARE callback,
which would reap all outstanding commands before the CPU and IRQ are
taken offline. That was removed with commit 4b855ad37194f ("blk-mq:
Create hctx for each present CPU"). It sounds like we should bring
something like that back, but make it more fine-grained for the per-CPU
context.
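
A sketch of what "something like that" could look like with the current 
hotplug API (the drain policy here is deliberately coarse and the state 
name is made up; this is not actual blk-mq code):

#include <linux/blk-mq.h>
#include <linux/cpuhotplug.h>

/* Teardown callback: drain a hardware context losing one of its CPUs */
static int blk_mq_hctx_cpu_offline_sketch(unsigned int cpu,
					  struct hlist_node *node)
{
	struct blk_mq_hw_ctx *hctx = hlist_entry(node, struct blk_mq_hw_ctx,
						 cpuhp_dead);

	if (!cpumask_test_cpu(cpu, hctx->cpumask))
		return 0;

	/*
	 * Coarse version: freeze the whole queue, which waits for all
	 * in-flight requests. A per-hctx drain would be finer grained.
	 */
	blk_mq_freeze_queue(hctx->queue);
	blk_mq_unfreeze_queue(hctx->queue);
	return 0;
}

static enum cpuhp_state blk_mq_offline_drain_state;

static int __init blk_mq_offline_drain_init(void)
{
	/* Each hctx would then be registered with
	 * cpuhp_state_add_instance_nocalls(state, &hctx->cpuhp_dead). */
	int ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
					  "block/mq:offline-drain", NULL,
					  blk_mq_hctx_cpu_offline_sketch);
	if (ret < 0)
		return ret;
	blk_mq_offline_drain_state = ret;
	return 0;
}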


* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-01-29 15:27     ` John Garry
@ 2019-01-29 16:27       ` Thomas Gleixner
  2019-01-29 17:23         ` John Garry
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2019-01-29 16:27 UTC (permalink / raw)
  To: John Garry
  Cc: Hannes Reinecke, Christoph Hellwig, Marc Zyngier, axboe,
	Keith Busch, Peter Zijlstra, Michael Ellerman, Linuxarm,
	linux-kernel, SCSI Mailing List

On Tue, 29 Jan 2019, John Garry wrote:
> On 29/01/2019 12:01, Thomas Gleixner wrote:
> > If the last CPU which is associated to a queue (and the corresponding
> > interrupt) goes offline, then the subsytem/driver code has to make sure
> > that:
> > 
> >    1) No more requests can be queued on that queue
> > 
> >    2) All outstanding of that queue have been completed or redirected
> >       (don't know if that's possible at all) to some other queue.
> 
> This may not be possible. For the HW I deal with, we have symmetrical delivery
> and completion queues, and a command delivered on DQx will always complete on
> CQx. Each completion queue has a dedicated IRQ.

So you can stop queueing on DQx and wait for all outstanding ones to come
in on CQx, right?

> > That has to be done in that order obviously. Whether any of the
> > subsystems/drivers actually implements this, I can't tell.
> 
> Going back to c5cb83bb337c25, it seems to me that the change was made with the
> idea that we can maintain the affinity for the IRQ as we're shutting it down
> as no interrupts should occur.
> 
> However I don't see why we can't instead keep the IRQ up and set the affinity
> to all online CPUs in offline path, and restore the original affinity in
> online path. The reason we set the queue affinity to specific CPUs is for
> performance, but I would not say that this matters for handling residual IRQs.

Oh yes it does. The problem, especially on x86, is that if you have a large
number of queues and you take a large number of CPUs offline, then you run
into vector space exhaustion on the remaining online CPUs.

In the worst case a single CPU on x86 has only 186 vectors available for
device interrupts. So just take a quad socket machine with 144 CPUs and two
multiqueue devices with a queue per CPU. ---> FAIL

It probably fails already with one device because there are lots of other
devices which have regular interrupts which cannot be shut down.
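
To put rough numbers on it: two such devices with one queue per CPU on a
144 CPU machine means 2 * 144 = 288 managed vectors. If those were kept
alive and spread over the remaining online CPUs instead of being shut
down, then offlining down to a single CPU would require that CPU to host
all 288 of them, well above the ~186 vectors it has for device
interrupts, and that is before counting the regular, non-managed
interrupts which also have to be migrated somewhere.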

Thanks,

	tglx



* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-01-29 15:44 ` Keith Busch
@ 2019-01-29 17:12   ` John Garry
  2019-01-29 17:20     ` Keith Busch
  0 siblings, 1 reply; 26+ messages in thread
From: John Garry @ 2019-01-29 17:12 UTC (permalink / raw)
  To: Keith Busch
  Cc: tglx, Christoph Hellwig, Marc Zyngier, axboe, Peter Zijlstra,
	Michael Ellerman, Linuxarm, linux-kernel, Hannes Reinecke

On 29/01/2019 15:44, Keith Busch wrote:
> On Tue, Jan 29, 2019 at 03:25:48AM -0800, John Garry wrote:
>> Hi,
>>
>> I have a question on $subject which I hope you can shed some light on.
>>
>> According to commit c5cb83bb337c25 ("genirq/cpuhotplug: Handle managed
>> IRQs on CPU hotplug"), if we offline the last CPU in a managed IRQ
>> affinity mask, the IRQ is shutdown.
>>
>> The reasoning is that this IRQ is thought to be associated with a
>> specific queue on a MQ device, and the CPUs in the IRQ affinity mask are
>> the same CPUs associated with the queue. So, if no CPU is using the
>> queue, then no need for the IRQ.
>>
>> However how does this handle scenario of last CPU in IRQ affinity mask
>> being offlined while IO associated with queue is still in flight?
>>
>> Or if we make the decision to use queue associated with the current CPU,
>> and then that CPU (being the last CPU online in the queue's IRQ
>> afffinity mask) goes offline and we finish the delivery with another CPU?
>>
>> In these cases, when the IO completes, it would not be serviced and timeout.
>>
>> I have actually tried this on my arm64 system and I see IO timeouts.
>
> Hm, we used to freeze the queues with CPUHP_BLK_MQ_PREPARE callback,
> which would reap all outstanding commands before the CPU and IRQ are
> taken offline. That was removed with commit 4b855ad37194f ("blk-mq:
> Create hctx for each present CPU"). It sounds like we should bring
> something like that back, but make more fine grain to the per-cpu context.
>

Seems reasonable. But we would need it to deal with drivers which only 
expose a single queue to blk-mq but use many queues internally. I think 
megaraid sas does this, for example.

I would also be slightly concerned about commands being issued from the 
driver that are unknown to blk-mq, like SCSI TMFs.

Thanks,
John

> .
>




* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-01-29 17:12   ` John Garry
@ 2019-01-29 17:20     ` Keith Busch
  2019-01-30 10:38       ` John Garry
  0 siblings, 1 reply; 26+ messages in thread
From: Keith Busch @ 2019-01-29 17:20 UTC (permalink / raw)
  To: John Garry
  Cc: tglx, Christoph Hellwig, Marc Zyngier, axboe, Peter Zijlstra,
	Michael Ellerman, Linuxarm, linux-kernel, Hannes Reinecke

On Tue, Jan 29, 2019 at 05:12:40PM +0000, John Garry wrote:
> On 29/01/2019 15:44, Keith Busch wrote:
> > 
> > Hm, we used to freeze the queues with CPUHP_BLK_MQ_PREPARE callback,
> > which would reap all outstanding commands before the CPU and IRQ are
> > taken offline. That was removed with commit 4b855ad37194f ("blk-mq:
> > Create hctx for each present CPU"). It sounds like we should bring
> > something like that back, but make more fine grain to the per-cpu context.
> > 
> 
> Seems reasonable. But we would need it to deal with drivers where they only
> expose a single queue to BLK MQ, but use many queues internally. I think
> megaraid sas does this, for example.
> 
> I would also be slightly concerned with commands being issued from the
> driver unknown to blk mq, like SCSI TMF.

I don't think either of those descriptions sound like good candidates
for using managed IRQ affinities.


* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-01-29 16:27       ` Thomas Gleixner
@ 2019-01-29 17:23         ` John Garry
  0 siblings, 0 replies; 26+ messages in thread
From: John Garry @ 2019-01-29 17:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Hannes Reinecke, Christoph Hellwig, Marc Zyngier, axboe,
	Keith Busch, Peter Zijlstra, Michael Ellerman, Linuxarm,
	linux-kernel, SCSI Mailing List

On 29/01/2019 16:27, Thomas Gleixner wrote:
> On Tue, 29 Jan 2019, John Garry wrote:
>> On 29/01/2019 12:01, Thomas Gleixner wrote:
>>> If the last CPU which is associated to a queue (and the corresponding
>>> interrupt) goes offline, then the subsytem/driver code has to make sure
>>> that:
>>>
>>>    1) No more requests can be queued on that queue
>>>
>>>    2) All outstanding of that queue have been completed or redirected
>>>       (don't know if that's possible at all) to some other queue.
>>
>> This may not be possible. For the HW I deal with, we have symmetrical delivery
>> and completion queues, and a command delivered on DQx will always complete on
>> CQx. Each completion queue has a dedicated IRQ.
>
> So you can stop queueing on DQx and wait for all outstanding ones to come
> in on CQx, right?

Right, and this sounds like what Keith Busch mentioned in his reply.

>
>>> That has to be done in that order obviously. Whether any of the
>>> subsystems/drivers actually implements this, I can't tell.
>>
>> Going back to c5cb83bb337c25, it seems to me that the change was made with the
>> idea that we can maintain the affinity for the IRQ as we're shutting it down
>> as no interrupts should occur.
>>
>> However I don't see why we can't instead keep the IRQ up and set the affinity
>> to all online CPUs in offline path, and restore the original affinity in
>> online path. The reason we set the queue affinity to specific CPUs is for
>> performance, but I would not say that this matters for handling residual IRQs.
>
> Oh yes it does. The problem is especially on x86, that if you have a large
> number of queues and you take a large number of CPUs offline, then you run
> into vector space exhaustion on the remaining online CPUs.
>
> In the worst case a single CPU on x86 has only 186 vectors available for
> device interrupts. So just take a quad socket machine with 144 CPUs and two
> multiqueue devices with a queue per cpu. ---> FAIL
>
> It probably fails already with one device because there are lots of other
> devices which have regular interrupt which cannot be shut down.

OK, understood.

Thanks,
John

>
> Thanks,
>
> 	tglx
>
>
> .
>




* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-01-29 17:20     ` Keith Busch
@ 2019-01-30 10:38       ` John Garry
  2019-01-30 12:43         ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: John Garry @ 2019-01-30 10:38 UTC (permalink / raw)
  To: Keith Busch
  Cc: tglx, Christoph Hellwig, Marc Zyngier, axboe, Peter Zijlstra,
	Michael Ellerman, Linuxarm, linux-kernel, Hannes Reinecke

On 29/01/2019 17:20, Keith Busch wrote:
> On Tue, Jan 29, 2019 at 05:12:40PM +0000, John Garry wrote:
>> On 29/01/2019 15:44, Keith Busch wrote:
>>>
>>> Hm, we used to freeze the queues with CPUHP_BLK_MQ_PREPARE callback,
>>> which would reap all outstanding commands before the CPU and IRQ are
>>> taken offline. That was removed with commit 4b855ad37194f ("blk-mq:
>>> Create hctx for each present CPU"). It sounds like we should bring
>>> something like that back, but make more fine grain to the per-cpu context.
>>>
>>
>> Seems reasonable. But we would need it to deal with drivers where they only
>> expose a single queue to BLK MQ, but use many queues internally. I think
>> megaraid sas does this, for example.
>>
>> I would also be slightly concerned with commands being issued from the
>> driver unknown to blk mq, like SCSI TMF.
>
> I don't think either of those descriptions sound like good candidates
> for using managed IRQ affinities.

I wouldn't say that this behaviour is obvious to the developer. I can't 
see anything about it in Documentation/PCI/MSI-HOWTO.txt

It also seems that this policy of relying on the upper layer to 
flush+freeze queues would cause issues if managed IRQs are used by 
drivers in other subsystems. Network controllers may have multiple 
queues and unsolicited interrupts.

Thanks,
John

>
> .
>




* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-01-30 10:38       ` John Garry
@ 2019-01-30 12:43         ` Thomas Gleixner
  2019-01-31 17:48           ` John Garry
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2019-01-30 12:43 UTC (permalink / raw)
  To: John Garry
  Cc: Keith Busch, Christoph Hellwig, Marc Zyngier, axboe,
	Peter Zijlstra, Michael Ellerman, Linuxarm, linux-kernel,
	Hannes Reinecke

On Wed, 30 Jan 2019, John Garry wrote:
> On 29/01/2019 17:20, Keith Busch wrote:
> > On Tue, Jan 29, 2019 at 05:12:40PM +0000, John Garry wrote:
> > > On 29/01/2019 15:44, Keith Busch wrote:
> > > > 
> > > > Hm, we used to freeze the queues with CPUHP_BLK_MQ_PREPARE callback,
> > > > which would reap all outstanding commands before the CPU and IRQ are
> > > > taken offline. That was removed with commit 4b855ad37194f ("blk-mq:
> > > > Create hctx for each present CPU"). It sounds like we should bring
> > > > something like that back, but make more fine grain to the per-cpu
> > > > context.
> > > > 
> > > 
> > > Seems reasonable. But we would need it to deal with drivers where they
> > > only
> > > expose a single queue to BLK MQ, but use many queues internally. I think
> > > megaraid sas does this, for example.
> > > 
> > > I would also be slightly concerned with commands being issued from the
> > > driver unknown to blk mq, like SCSI TMF.
> > 
> > I don't think either of those descriptions sound like good candidates
> > for using managed IRQ affinities.
> 
> I wouldn't say that this behaviour is obvious to the developer. I can't see
> anything in Documentation/PCI/MSI-HOWTO.txt
> 
> It also seems that this policy to rely on upper layer to flush+freeze queues
> would cause issues if managed IRQs are used by drivers in other subsystems.
> Networks controllers may have multiple queues and unsoliciated interrupts.

It doesn't matter which part is managing the flush/freeze of queues as long
as something (either common subsystem code, upper layers or the driver
itself) does it.

So for the megaraid SAS example the blk-mq layer obviously can't do
anything because it only sees a single request queue. But the driver could,
if the hardware supports it, tell the device to stop queueing
completions on the completion queue which is associated with a particular
CPU (or set of CPUs) during offline and then wait for the in-flight stuff
to be finished. If the hardware does not allow that, then managed
interrupts can't work for it.
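
For the submission side of that, the megaraid-style pattern (one blk-mq
queue, many internal reply queues selected by CPU) looks roughly like the
sketch below; the names are made up and it assumes at least one reply
queue always stays usable:

static u16 my_hba_pick_reply_queue(struct my_hba *hba)
{
	/* Spread submissions over the internal reply queues by CPU */
	u16 idx = raw_smp_processor_id() % hba->nr_reply_queues;

	/* Skip reply queues being drained because their CPUs went away */
	while (test_bit(idx, hba->reply_queue_offline))
		idx = (idx + 1) % hba->nr_reply_queues;

	return idx;
}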

Thanks,

	tglx


* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-01-30 12:43         ` Thomas Gleixner
@ 2019-01-31 17:48           ` John Garry
  2019-02-01 15:56             ` Hannes Reinecke
  0 siblings, 1 reply; 26+ messages in thread
From: John Garry @ 2019-01-31 17:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Keith Busch, Christoph Hellwig, Marc Zyngier, axboe,
	Peter Zijlstra, Michael Ellerman, Linuxarm, linux-kernel,
	Hannes Reinecke, linux-scsi, linux-block

On 30/01/2019 12:43, Thomas Gleixner wrote:
> On Wed, 30 Jan 2019, John Garry wrote:
>> On 29/01/2019 17:20, Keith Busch wrote:
>>> On Tue, Jan 29, 2019 at 05:12:40PM +0000, John Garry wrote:
>>>> On 29/01/2019 15:44, Keith Busch wrote:
>>>>>
>>>>> Hm, we used to freeze the queues with CPUHP_BLK_MQ_PREPARE callback,
>>>>> which would reap all outstanding commands before the CPU and IRQ are
>>>>> taken offline. That was removed with commit 4b855ad37194f ("blk-mq:
>>>>> Create hctx for each present CPU"). It sounds like we should bring
>>>>> something like that back, but make more fine grain to the per-cpu
>>>>> context.
>>>>>
>>>>
>>>> Seems reasonable. But we would need it to deal with drivers where they
>>>> only
>>>> expose a single queue to BLK MQ, but use many queues internally. I think
>>>> megaraid sas does this, for example.
>>>>
>>>> I would also be slightly concerned with commands being issued from the
>>>> driver unknown to blk mq, like SCSI TMF.
>>>
>>> I don't think either of those descriptions sound like good candidates
>>> for using managed IRQ affinities.
>>
>> I wouldn't say that this behaviour is obvious to the developer. I can't see
>> anything in Documentation/PCI/MSI-HOWTO.txt
>>
>> It also seems that this policy to rely on upper layer to flush+freeze queues
>> would cause issues if managed IRQs are used by drivers in other subsystems.
>> Networks controllers may have multiple queues and unsoliciated interrupts.
>
> It's doesn't matter which part is managing flush/freeze of queues as long
> as something (either common subsystem code, upper layers or the driver
> itself) does it.
>
> So for the megaraid SAS example the BLK MQ layer obviously can't do
> anything because it only sees a single request queue. But the driver could,
> if the the hardware supports it. tell the device to stop queueing
> completions on the completion queue which is associated with a particular
> CPU (or set of CPUs) during offline and then wait for the on flight stuff
> to be finished. If the hardware does not allow that, then managed
> interrupts can't work for it.
>

A rough audit of current SCSI drivers shows that these set 
PCI_IRQ_AFFINITY in some path but don't set Scsi_Host.nr_hw_queues at all:
aacraid, be2iscsi, csiostor, megaraid, mpt3sas

I don't know the specific driver details, like how to change the 
completion queue.
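
For context, the pattern those drivers follow is roughly the one below
(illustrative only, not any specific driver's code): managed affinity is
requested for the vectors, but the SCSI midlayer is never told about the
queues.

#include <linux/pci.h>
#include <scsi/scsi_host.h>

static int my_hba_setup_irqs(struct pci_dev *pdev, struct Scsi_Host *shost,
			     unsigned int max_queues)
{
	struct irq_affinity desc = { 0 };
	int nr_vecs;

	/* Spread the MSI-X vectors with managed affinity ... */
	nr_vecs = pci_alloc_irq_vectors_affinity(pdev, 1, max_queues,
						 PCI_IRQ_MSIX |
						 PCI_IRQ_AFFINITY, &desc);
	if (nr_vecs < 0)
		return nr_vecs;

	/* ... but blk-mq still only ever sees one hardware queue */
	shost->nr_hw_queues = 1;	/* or simply left unset */

	return nr_vecs;
}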

Thanks,
John

> Thanks,
>
> 	tglx
>
> .
>




* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-01-31 17:48           ` John Garry
@ 2019-02-01 15:56             ` Hannes Reinecke
  2019-02-01 21:57               ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Hannes Reinecke @ 2019-02-01 15:56 UTC (permalink / raw)
  To: John Garry, Thomas Gleixner
  Cc: Keith Busch, Christoph Hellwig, Marc Zyngier, axboe,
	Peter Zijlstra, Michael Ellerman, Linuxarm, linux-kernel,
	Hannes Reinecke, linux-scsi, linux-block

On 1/31/19 6:48 PM, John Garry wrote:
> On 30/01/2019 12:43, Thomas Gleixner wrote:
>> On Wed, 30 Jan 2019, John Garry wrote:
>>> On 29/01/2019 17:20, Keith Busch wrote:
>>>> On Tue, Jan 29, 2019 at 05:12:40PM +0000, John Garry wrote:
>>>>> On 29/01/2019 15:44, Keith Busch wrote:
>>>>>>
>>>>>> Hm, we used to freeze the queues with CPUHP_BLK_MQ_PREPARE callback,
>>>>>> which would reap all outstanding commands before the CPU and IRQ are
>>>>>> taken offline. That was removed with commit 4b855ad37194f ("blk-mq:
>>>>>> Create hctx for each present CPU"). It sounds like we should bring
>>>>>> something like that back, but make more fine grain to the per-cpu
>>>>>> context.
>>>>>>
>>>>>
>>>>> Seems reasonable. But we would need it to deal with drivers where they
>>>>> only
>>>>> expose a single queue to BLK MQ, but use many queues internally. I 
>>>>> think
>>>>> megaraid sas does this, for example.
>>>>>
>>>>> I would also be slightly concerned with commands being issued from the
>>>>> driver unknown to blk mq, like SCSI TMF.
>>>>
>>>> I don't think either of those descriptions sound like good candidates
>>>> for using managed IRQ affinities.
>>>
>>> I wouldn't say that this behaviour is obvious to the developer. I 
>>> can't see
>>> anything in Documentation/PCI/MSI-HOWTO.txt
>>>
>>> It also seems that this policy to rely on upper layer to flush+freeze 
>>> queues
>>> would cause issues if managed IRQs are used by drivers in other 
>>> subsystems.
>>> Networks controllers may have multiple queues and unsoliciated 
>>> interrupts.
>>
>> It's doesn't matter which part is managing flush/freeze of queues as long
>> as something (either common subsystem code, upper layers or the driver
>> itself) does it.
>>
>> So for the megaraid SAS example the BLK MQ layer obviously can't do
>> anything because it only sees a single request queue. But the driver 
>> could,
>> if the the hardware supports it. tell the device to stop queueing
>> completions on the completion queue which is associated with a particular
>> CPU (or set of CPUs) during offline and then wait for the on flight stuff
>> to be finished. If the hardware does not allow that, then managed
>> interrupts can't work for it.
>>
> 
> A rough audit of current SCSI drivers tells that these set 
> PCI_IRQ_AFFINITY in some path but don't set Scsi_host.nr_hw_queues at all:
> aacraid, be2iscsi, csiostor, megaraid, mpt3sas
> 
Megaraid and mpt3sas don't have that functionality (or, at least, not 
that I'm aware of).
And in general I'm not sure if the above approach is feasible.

Thing is, if we have _managed_ CPU hotplug (i.e. if the hardware provides 
some means of quiescing the CPU before hotplug) then the whole thing is 
trivial; disable the SQ and wait for all outstanding commands to complete.
Then trivially all requests are completed and the issue is resolved.
Even with today's infrastructure.

And I'm not sure if we can handle surprise CPU hotplug at all, given all 
the possible race conditions.
But then I might be wrong.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-02-01 15:56             ` Hannes Reinecke
@ 2019-02-01 21:57               ` Thomas Gleixner
  2019-02-04  7:12                 ` Hannes Reinecke
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2019-02-01 21:57 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: John Garry, Keith Busch, Christoph Hellwig, Marc Zyngier, axboe,
	Peter Zijlstra, Michael Ellerman, Linuxarm, linux-kernel,
	Hannes Reinecke, linux-scsi, linux-block

On Fri, 1 Feb 2019, Hannes Reinecke wrote:
> Thing is, if we have _managed_ CPU hotplug (ie if the hardware provides some
> means of quiescing the CPU before hotplug) then the whole thing is trivial;
> disable SQ and wait for all outstanding commands to complete.
> Then trivially all requests are completed and the issue is resolved.
> Even with todays infrastructure.
> 
> And I'm not sure if we can handle surprise CPU hotplug at all, given all the
> possible race conditions.
> But then I might be wrong.

The kernel would completely fall apart when a CPU would vanish by surprise,
i.e. uncontrolled by the kernel. Then the SCSI driver exploding would be
the least of our problems.

Thanks,

	tglx


* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-02-01 21:57               ` Thomas Gleixner
@ 2019-02-04  7:12                 ` Hannes Reinecke
  2019-02-05 13:24                   ` John Garry
  0 siblings, 1 reply; 26+ messages in thread
From: Hannes Reinecke @ 2019-02-04  7:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: John Garry, Keith Busch, Christoph Hellwig, Marc Zyngier, axboe,
	Peter Zijlstra, Michael Ellerman, Linuxarm, linux-kernel,
	Hannes Reinecke, linux-scsi, linux-block

On 2/1/19 10:57 PM, Thomas Gleixner wrote:
> On Fri, 1 Feb 2019, Hannes Reinecke wrote:
>> Thing is, if we have _managed_ CPU hotplug (ie if the hardware provides some
>> means of quiescing the CPU before hotplug) then the whole thing is trivial;
>> disable SQ and wait for all outstanding commands to complete.
>> Then trivially all requests are completed and the issue is resolved.
>> Even with todays infrastructure.
>>
>> And I'm not sure if we can handle surprise CPU hotplug at all, given all the
>> possible race conditions.
>> But then I might be wrong.
> 
> The kernel would completely fall apart when a CPU would vanish by surprise,
> i.e. uncontrolled by the kernel. Then the SCSI driver exploding would be
> the least of our problems.
> 
Hehe. As I thought.

So, as the user then has to wait for the system to declare itself 'ready 
for CPU remove', why can't we just disable the SQ and wait for all I/O to 
complete?
We can make it more fine-grained by just waiting on all outstanding I/O 
on that SQ to complete, but waiting for all I/O should be good as an 
initial try.
With that we wouldn't need to fiddle with driver internals, and could 
make it pretty generic.
And we could always add more detailed logic if the driver has the means 
for doing so.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-02-04  7:12                 ` Hannes Reinecke
@ 2019-02-05 13:24                   ` John Garry
  2019-02-05 14:52                     ` Keith Busch
  0 siblings, 1 reply; 26+ messages in thread
From: John Garry @ 2019-02-05 13:24 UTC (permalink / raw)
  To: Hannes Reinecke, Thomas Gleixner
  Cc: Keith Busch, Christoph Hellwig, Marc Zyngier, axboe,
	Peter Zijlstra, Michael Ellerman, Linuxarm, linux-kernel,
	Hannes Reinecke, linux-scsi, linux-block

On 04/02/2019 07:12, Hannes Reinecke wrote:
> On 2/1/19 10:57 PM, Thomas Gleixner wrote:
>> On Fri, 1 Feb 2019, Hannes Reinecke wrote:
>>> Thing is, if we have _managed_ CPU hotplug (ie if the hardware
>>> provides some
>>> means of quiescing the CPU before hotplug) then the whole thing is
>>> trivial;
>>> disable SQ and wait for all outstanding commands to complete.
>>> Then trivially all requests are completed and the issue is resolved.
>>> Even with todays infrastructure.
>>>
>>> And I'm not sure if we can handle surprise CPU hotplug at all, given
>>> all the
>>> possible race conditions.
>>> But then I might be wrong.
>>
>> The kernel would completely fall apart when a CPU would vanish by
>> surprise,
>> i.e. uncontrolled by the kernel. Then the SCSI driver exploding would be
>> the least of our problems.
>>
> Hehe. As I thought.

Hi Hannes,

>
> So, as the user then has to wait for the system to declars 'ready for
> CPU remove', why can't we just disable the SQ and wait for all I/O to
> complete?
> We can make it more fine-grained by just waiting on all outstanding I/O
> on that SQ to complete, but waiting for all I/O should be good as an
> initial try.
> With that we wouldn't need to fiddle with driver internals, and could
> make it pretty generic.

I don't fully understand this idea - specifically, at which layer would 
we be waiting for all the IO to complete?

> And we could always add more detailed logic if the driver has the means
> for doing so.
>

Thanks,
John

> Cheers,
>
> Hannes




* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-02-05 13:24                   ` John Garry
@ 2019-02-05 14:52                     ` Keith Busch
  2019-02-05 15:09                       ` John Garry
  2019-02-05 15:10                       ` Hannes Reinecke
  0 siblings, 2 replies; 26+ messages in thread
From: Keith Busch @ 2019-02-05 14:52 UTC (permalink / raw)
  To: John Garry
  Cc: Hannes Reinecke, Thomas Gleixner, Christoph Hellwig,
	Marc Zyngier, axboe, Peter Zijlstra, Michael Ellerman, Linuxarm,
	linux-kernel, Hannes Reinecke, linux-scsi, linux-block

On Tue, Feb 05, 2019 at 05:24:11AM -0800, John Garry wrote:
> On 04/02/2019 07:12, Hannes Reinecke wrote:
> 
> Hi Hannes,
> 
> >
> > So, as the user then has to wait for the system to declars 'ready for
> > CPU remove', why can't we just disable the SQ and wait for all I/O to
> > complete?
> > We can make it more fine-grained by just waiting on all outstanding I/O
> > on that SQ to complete, but waiting for all I/O should be good as an
> > initial try.
> > With that we wouldn't need to fiddle with driver internals, and could
> > make it pretty generic.
> 
> I don't fully understand this idea - specifically, at which layer would 
> we be waiting for all the IO to complete?

Whichever layer dispatched the IO to a CPU specific context should
be the one to wait for its completion. That should be blk-mq for most
block drivers.


* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-02-05 14:52                     ` Keith Busch
@ 2019-02-05 15:09                       ` John Garry
  2019-02-05 15:11                         ` Keith Busch
                                           ` (2 more replies)
  2019-02-05 15:10                       ` Hannes Reinecke
  1 sibling, 3 replies; 26+ messages in thread
From: John Garry @ 2019-02-05 15:09 UTC (permalink / raw)
  To: Keith Busch
  Cc: Hannes Reinecke, Thomas Gleixner, Christoph Hellwig,
	Marc Zyngier, axboe, Peter Zijlstra, Michael Ellerman, Linuxarm,
	linux-kernel, Hannes Reinecke, linux-scsi, linux-block

On 05/02/2019 14:52, Keith Busch wrote:
> On Tue, Feb 05, 2019 at 05:24:11AM -0800, John Garry wrote:
>> On 04/02/2019 07:12, Hannes Reinecke wrote:
>>
>> Hi Hannes,
>>
>>>
>>> So, as the user then has to wait for the system to declars 'ready for
>>> CPU remove', why can't we just disable the SQ and wait for all I/O to
>>> complete?
>>> We can make it more fine-grained by just waiting on all outstanding I/O
>>> on that SQ to complete, but waiting for all I/O should be good as an
>>> initial try.
>>> With that we wouldn't need to fiddle with driver internals, and could
>>> make it pretty generic.
>>
>> I don't fully understand this idea - specifically, at which layer would
>> we be waiting for all the IO to complete?
>
> Whichever layer dispatched the IO to a CPU specific context should
> be the one to wait for its completion. That should be blk-mq for most
> block drivers.

For SCSI devices, unfortunately not all IO sent to the HW originates 
from blk-mq or any other single entity.

Thanks,
John

>
> .
>




* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-02-05 14:52                     ` Keith Busch
  2019-02-05 15:09                       ` John Garry
@ 2019-02-05 15:10                       ` Hannes Reinecke
  2019-02-05 15:16                         ` Keith Busch
  1 sibling, 1 reply; 26+ messages in thread
From: Hannes Reinecke @ 2019-02-05 15:10 UTC (permalink / raw)
  To: Keith Busch, John Garry
  Cc: Hannes Reinecke, Thomas Gleixner, Christoph Hellwig,
	Marc Zyngier, axboe, Peter Zijlstra, Michael Ellerman, Linuxarm,
	linux-kernel, linux-scsi, linux-block

On 2/5/19 3:52 PM, Keith Busch wrote:
> On Tue, Feb 05, 2019 at 05:24:11AM -0800, John Garry wrote:
>> On 04/02/2019 07:12, Hannes Reinecke wrote:
>>
>> Hi Hannes,
>>
>>>
>>> So, as the user then has to wait for the system to declars 'ready for
>>> CPU remove', why can't we just disable the SQ and wait for all I/O to
>>> complete?
>>> We can make it more fine-grained by just waiting on all outstanding I/O
>>> on that SQ to complete, but waiting for all I/O should be good as an
>>> initial try.
>>> With that we wouldn't need to fiddle with driver internals, and could
>>> make it pretty generic.
>>
>> I don't fully understand this idea - specifically, at which layer would
>> we be waiting for all the IO to complete?
> 
> Whichever layer dispatched the IO to a CPU specific context should
> be the one to wait for its completion. That should be blk-mq for most
> block drivers.
> 
Indeed.
But we don't provide any mechanisms for that ATM, right?

Maybe this would be a topic fit for LSF/MM?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.com			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-02-05 15:09                       ` John Garry
@ 2019-02-05 15:11                         ` Keith Busch
  2019-02-05 15:15                         ` Hannes Reinecke
  2019-02-05 18:23                         ` Christoph Hellwig
  2 siblings, 0 replies; 26+ messages in thread
From: Keith Busch @ 2019-02-05 15:11 UTC (permalink / raw)
  To: John Garry
  Cc: Hannes Reinecke, Thomas Gleixner, Christoph Hellwig,
	Marc Zyngier, axboe, Peter Zijlstra, Michael Ellerman, Linuxarm,
	linux-kernel, Hannes Reinecke, linux-scsi, linux-block

On Tue, Feb 05, 2019 at 03:09:28PM +0000, John Garry wrote:
> On 05/02/2019 14:52, Keith Busch wrote:
> > On Tue, Feb 05, 2019 at 05:24:11AM -0800, John Garry wrote:
> > > On 04/02/2019 07:12, Hannes Reinecke wrote:
> > > 
> > > Hi Hannes,
> > > 
> > > > 
> > > > So, as the user then has to wait for the system to declars 'ready for
> > > > CPU remove', why can't we just disable the SQ and wait for all I/O to
> > > > complete?
> > > > We can make it more fine-grained by just waiting on all outstanding I/O
> > > > on that SQ to complete, but waiting for all I/O should be good as an
> > > > initial try.
> > > > With that we wouldn't need to fiddle with driver internals, and could
> > > > make it pretty generic.
> > > 
> > > I don't fully understand this idea - specifically, at which layer would
> > > we be waiting for all the IO to complete?
> > 
> > Whichever layer dispatched the IO to a CPU specific context should
> > be the one to wait for its completion. That should be blk-mq for most
> > block drivers.
> 
> For SCSI devices, unfortunately not all IO sent to the HW originates from
> blk-mq or any other single entity.

Then they'll need to register their own CPU notifiers and handle the
ones they dispatched.
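
i.e. something along these lines (a sketch; the drain helper is
hypothetical and would have to track which internal commands were
dispatched where):

#include <linux/cpuhotplug.h>

static int my_hba_internal_cmd_cpu_offline(unsigned int cpu)
{
	/* Hypothetical: wait out / fail over TMFs etc. issued via 'cpu' */
	return my_hba_drain_internal_cmds_on_cpu(cpu);
}

static int my_hba_register_cpuhp(void)
{
	/* Dynamic state: the teardown runs before the CPU is gone */
	return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "scsi/my_hba:online",
				 NULL, my_hba_internal_cmd_cpu_offline);
}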


* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-02-05 15:09                       ` John Garry
  2019-02-05 15:11                         ` Keith Busch
@ 2019-02-05 15:15                         ` Hannes Reinecke
  2019-02-05 15:27                           ` John Garry
  2019-02-05 18:23                         ` Christoph Hellwig
  2 siblings, 1 reply; 26+ messages in thread
From: Hannes Reinecke @ 2019-02-05 15:15 UTC (permalink / raw)
  To: John Garry, Keith Busch
  Cc: Thomas Gleixner, Christoph Hellwig, Marc Zyngier, axboe,
	Peter Zijlstra, Michael Ellerman, Linuxarm, linux-kernel,
	Hannes Reinecke, linux-scsi, linux-block

On 2/5/19 4:09 PM, John Garry wrote:
> On 05/02/2019 14:52, Keith Busch wrote:
>> On Tue, Feb 05, 2019 at 05:24:11AM -0800, John Garry wrote:
>>> On 04/02/2019 07:12, Hannes Reinecke wrote:
>>>
>>> Hi Hannes,
>>>
>>>>
>>>> So, as the user then has to wait for the system to declars 'ready for
>>>> CPU remove', why can't we just disable the SQ and wait for all I/O to
>>>> complete?
>>>> We can make it more fine-grained by just waiting on all outstanding I/O
>>>> on that SQ to complete, but waiting for all I/O should be good as an
>>>> initial try.
>>>> With that we wouldn't need to fiddle with driver internals, and could
>>>> make it pretty generic.
>>>
>>> I don't fully understand this idea - specifically, at which layer would
>>> we be waiting for all the IO to complete?
>>
>> Whichever layer dispatched the IO to a CPU specific context should
>> be the one to wait for its completion. That should be blk-mq for most
>> block drivers.
> 
> For SCSI devices, unfortunately not all IO sent to the HW originates 
> from blk-mq or any other single entity.
> 
No, not as such.
But each IO sent to the HW requires a unique identification (i.e. a valid 
tag). And as the tag space is managed by blk-mq (minus management 
commands, but I'm working on that currently) we can easily figure out if 
the device is busy by checking for an empty tag map.

Should be doable for most modern HBAs.
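
A sketch of such a check through the tag-set iterator (scsi_host_is_idle()
is a made-up helper; the callback signature matches the kernels around
this thread):

#include <linux/blk-mq.h>
#include <scsi/scsi_host.h>

static void count_inflight(struct request *rq, void *data, bool reserved)
{
	unsigned int *inflight = data;

	(*inflight)++;
}

/* True if no driver-owned (busy) tags remain in the host's tag map */
static bool scsi_host_is_idle(struct Scsi_Host *shost)
{
	unsigned int inflight = 0;

	blk_mq_tagset_busy_iter(&shost->tag_set, count_inflight, &inflight);
	return inflight == 0;
}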

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-02-05 15:10                       ` Hannes Reinecke
@ 2019-02-05 15:16                         ` Keith Busch
  0 siblings, 0 replies; 26+ messages in thread
From: Keith Busch @ 2019-02-05 15:16 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: John Garry, Hannes Reinecke, Thomas Gleixner, Christoph Hellwig,
	Marc Zyngier, axboe, Peter Zijlstra, Michael Ellerman, Linuxarm,
	linux-kernel, linux-scsi, linux-block

On Tue, Feb 05, 2019 at 04:10:47PM +0100, Hannes Reinecke wrote:
> On 2/5/19 3:52 PM, Keith Busch wrote:
> > Whichever layer dispatched the IO to a CPU specific context should
> > be the one to wait for its completion. That should be blk-mq for most
> > block drivers.
> > 
> Indeed.
> But we don't provide any mechanisms for that ATM, right?
> 
> Maybe this would be a topic fit for LSF/MM?

Right, there's nothing handling this now, and it sounds like it'd be a good
discussion to bring to the storage track.


* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-02-05 15:15                         ` Hannes Reinecke
@ 2019-02-05 15:27                           ` John Garry
  0 siblings, 0 replies; 26+ messages in thread
From: John Garry @ 2019-02-05 15:27 UTC (permalink / raw)
  To: Hannes Reinecke, Keith Busch
  Cc: Thomas Gleixner, Christoph Hellwig, Marc Zyngier, axboe,
	Peter Zijlstra, Michael Ellerman, Linuxarm, linux-kernel,
	Hannes Reinecke, linux-scsi, linux-block

On 05/02/2019 15:15, Hannes Reinecke wrote:
> On 2/5/19 4:09 PM, John Garry wrote:
>> On 05/02/2019 14:52, Keith Busch wrote:
>>> On Tue, Feb 05, 2019 at 05:24:11AM -0800, John Garry wrote:
>>>> On 04/02/2019 07:12, Hannes Reinecke wrote:
>>>>
>>>> Hi Hannes,
>>>>
>>>>>
>>>>> So, as the user then has to wait for the system to declars 'ready for
>>>>> CPU remove', why can't we just disable the SQ and wait for all I/O to
>>>>> complete?
>>>>> We can make it more fine-grained by just waiting on all outstanding
>>>>> I/O
>>>>> on that SQ to complete, but waiting for all I/O should be good as an
>>>>> initial try.
>>>>> With that we wouldn't need to fiddle with driver internals, and could
>>>>> make it pretty generic.
>>>>
>>>> I don't fully understand this idea - specifically, at which layer would
>>>> we be waiting for all the IO to complete?
>>>
>>> Whichever layer dispatched the IO to a CPU specific context should
>>> be the one to wait for its completion. That should be blk-mq for most
>>> block drivers.
>>
>> For SCSI devices, unfortunately not all IO sent to the HW originates
>> from blk-mq or any other single entity.
>>
> No, not as such.
> But each IO sent to the HW requires a unique identifcation (ie a valid
> tag). And as the tagspace is managed by block-mq (minus management
> commands, but I'm working on that currently) we can easily figure out if
> the device is busy by checking for an empty tag map.

That sounds like a reasonable starting solution.

Thanks,
John

>
> Should be doable for most modern HBAs.
>
> Cheers,
>
> Hannes




* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-02-05 15:09                       ` John Garry
  2019-02-05 15:11                         ` Keith Busch
  2019-02-05 15:15                         ` Hannes Reinecke
@ 2019-02-05 18:23                         ` Christoph Hellwig
  2019-02-06  9:21                           ` John Garry
  2 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2019-02-05 18:23 UTC (permalink / raw)
  To: John Garry
  Cc: Keith Busch, Hannes Reinecke, Thomas Gleixner, Christoph Hellwig,
	Marc Zyngier, axboe, Peter Zijlstra, Michael Ellerman, Linuxarm,
	linux-kernel, Hannes Reinecke, linux-scsi, linux-block

On Tue, Feb 05, 2019 at 03:09:28PM +0000, John Garry wrote:
> For SCSI devices, unfortunately not all IO sent to the HW originates from 
> blk-mq or any other single entity.

Where else would SCSI I/O originate from?


* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-02-05 18:23                         ` Christoph Hellwig
@ 2019-02-06  9:21                           ` John Garry
  2019-02-06 13:34                             ` Benjamin Block
  0 siblings, 1 reply; 26+ messages in thread
From: John Garry @ 2019-02-06  9:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Hannes Reinecke, Thomas Gleixner, Marc Zyngier,
	axboe, Peter Zijlstra, Michael Ellerman, Linuxarm, linux-kernel,
	Hannes Reinecke, linux-scsi, linux-block

On 05/02/2019 18:23, Christoph Hellwig wrote:
> On Tue, Feb 05, 2019 at 03:09:28PM +0000, John Garry wrote:
>> For SCSI devices, unfortunately not all IO sent to the HW originates from
>> blk-mq or any other single entity.
>
> Where else would SCSI I/O originate from?

Please note that I was referring to other management IO, like SAS SMP, 
TMFs, and other proprietary commands which the driver may generate for 
the HBA - https://marc.info/?l=linux-scsi&m=154831889001973&w=2 also 
discusses some of them.

Thanks,
John

>
> .
>




* Re: Question on handling managed IRQs when hotplugging CPUs
  2019-02-06  9:21                           ` John Garry
@ 2019-02-06 13:34                             ` Benjamin Block
  0 siblings, 0 replies; 26+ messages in thread
From: Benjamin Block @ 2019-02-06 13:34 UTC (permalink / raw)
  To: John Garry
  Cc: Christoph Hellwig, Keith Busch, Hannes Reinecke, Thomas Gleixner,
	Marc Zyngier, axboe, Peter Zijlstra, Michael Ellerman, Linuxarm,
	linux-kernel, Hannes Reinecke, linux-scsi, linux-block

On Wed, Feb 06, 2019 at 09:21:40AM +0000, John Garry wrote:
> On 05/02/2019 18:23, Christoph Hellwig wrote:
> > On Tue, Feb 05, 2019 at 03:09:28PM +0000, John Garry wrote:
> > > For SCSI devices, unfortunately not all IO sent to the HW originates from
> > > blk-mq or any other single entity.
> > 
> > Where else would SCSI I/O originate from?
> 
> Please note that I was referring to other management IO, like SAS SMP, TMFs,
> and other proprietary commands which the driver may generate for the HBA -
> https://marc.info/?l=linux-scsi&m=154831889001973&w=2 discusses some of them
> also.
> 

Especially the TMFs sent via SCSI EH are a bit of a pain I guess,
because they are entirely managed by the device drivers, but depending
on the device driver they might not even qualify for the problem Hannes
is seeing.

-- 
With Best Regards, Benjamin Block      /      Linux on IBM Z Kernel Development
IBM Systems & Technology Group   /  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Matthias Hartmann       /      Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294


