* NVME, isolcpus, and irq affinity
@ 2020-10-12 15:49 Chris Friesen
  2020-10-12 17:13 ` Keith Busch
  2020-10-13  0:51 ` Ming Lei
  0 siblings, 2 replies; 8+ messages in thread
From: Chris Friesen @ 2020-10-12 15:49 UTC (permalink / raw)
  To: linux-nvme

Hi,

I'm not subscribed to the list so please CC me on replies.

I've got a linux system running the RT kernel with threaded irqs.  On 
startup we affine the various irq threads to the housekeeping CPUs, but 
I recently hit a scenario where after some days of uptime we ended up 
with a number of NVME irq threads affined to application cores instead 
(not good when we're trying to run low-latency applications).

Looking at the code, it appears that the NVME driver can in some 
scenarios call nvme_setup_io_queues() after the initial setup and thus 
allocate new IRQ threads at runtime.  It appears that this will then 
call pci_alloc_irq_vectors_affinity(), which seems to determine affinity 
without any regard for things like "isolcpus" or "cset shield".
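
For reference, this is roughly the path I'm looking at (paraphrased from my 
reading of drivers/nvme/host/pci.c on a 5.x kernel, so names and details may 
differ from the exact code in any given tree):

/*
 * Paraphrased sketch, not the literal upstream code.  nvme_setup_io_queues()
 * ends up here whenever the IO queues are (re)created, e.g. after a
 * controller reset.
 */
static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
{
        struct pci_dev *pdev = to_pci_dev(dev->dev);
        struct irq_affinity affd = {
                .pre_vectors = 1,                 /* admin queue vector */
                .calc_sets = nvme_calc_irq_sets,  /* split default/read sets */
                .priv = dev,
        };

        /*
         * The per-vector CPU masks are computed below this call, and as far
         * as I can tell nothing in that path looks at isolcpus or cpusets.
         */
        return pci_alloc_irq_vectors_affinity(pdev, 1, nr_io_queues + 1,
                        PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
}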

There seem to be other reports of similar issues:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1831566

Am I worried about nothing, or is there a risk that those irq threads 
would actually need to do real work (which would cause unacceptable 
jitter in my application)?

Assuming I'm reading the code correctly, how does it make sense for the 
NVME driver to affine interrupts to CPUs which have explicitly been 
designated as "isolated"?

Thanks,

Chris




* Re: NVME, isolcpus, and irq affinity
  2020-10-12 15:49 NVME, isolcpus, and irq affinity Chris Friesen
@ 2020-10-12 17:13 ` Keith Busch
  2020-10-12 18:50   ` Chris Friesen
  2020-10-13  0:51 ` Ming Lei
  1 sibling, 1 reply; 8+ messages in thread
From: Keith Busch @ 2020-10-12 17:13 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linux-nvme

On Mon, Oct 12, 2020 at 09:49:38AM -0600, Chris Friesen wrote:
> I've got a linux system running the RT kernel with threaded irqs.  On
> startup we affine the various irq threads to the housekeeping CPUs, but I
> recently hit a scenario where after some days of uptime we ended up with a
> number of NVME irq threads affined to application cores instead (not good
> when we're trying to run low-latency applications).
> 
> Looking at the code, it appears that the NVME driver can in some scenarios
> call nvme_setup_io_queues() after the initial setup and thus allocate new
> IRQ threads at runtime.  It appears that this will then call
> pci_alloc_irq_vectors_affinity(), 

Yes, the driver will re-run interrupt setup on a controller reset.

> which seems to determine affinity without any regard for things like
> "isolcpus" or "cset shield".
>
> There seem to be other reports of similar issues:
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1831566
> 
> Am I worried about nothing, or is there a risk that those irq threads would
> actually need to do real work (which would cause unacceptable jitter in my
> application)?
> 
> Assuming I'm reading the code correctly, how does it make sense for the NVME
> driver to affine interrupts to CPUs which have explicitly been designated as
> "isolated"?

The driver allocates interrupts, but doesn't affine them. The driver
lets the kernel handle that instead.
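
To make that concrete, here is a simplified sketch (not the literal code, and
the wrapper function is just for illustration) of what happens underneath
pci_alloc_irq_vectors_affinity() when the driver sets PCI_IRQ_AFFINITY: the
PCI/MSI code asks the generic IRQ core for the per-vector masks.

#include <linux/interrupt.h>

/*
 * Sketch only: the spreading lives in kernel/irq/affinity.c.  The vectors
 * described by the irq_affinity parameter are distributed evenly across all
 * possible CPUs (NUMA aware), and each one is flagged "managed": the kernel
 * owns the mask, it follows CPU hotplug, and it cannot be overridden by
 * writing /proc/irq/<N>/smp_affinity.
 */
static struct irq_affinity_desc *example_spread(unsigned int nvecs,
                                                struct irq_affinity *affd)
{
        return irq_create_affinity_masks(nvecs, affd);
}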


* Re: NVME, isolcpus, and irq affinity
  2020-10-12 17:13 ` Keith Busch
@ 2020-10-12 18:50   ` Chris Friesen
  2020-10-12 19:05     ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Friesen @ 2020-10-12 18:50 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme

On 10/12/2020 11:13 AM, Keith Busch wrote:
> On Mon, Oct 12, 2020 at 09:49:38AM -0600, Chris Friesen wrote:

>> Assuming I'm reading the code correctly, how does it make sense for the NVME
>> driver to affine interrupts to CPUs which have explicitly been designated as
>> "isolated"?
> 
> The driver allocates interrupts, but doesn't affine them. The driver
> lets the kernel handle that instead.

Okay, thanks for the quick reply.

The interrupts in question have names of the form "nvmeXqY", where X and 
Y are integers.  Some of them are affined to individual CPUs while 
others are affined to two or more CPUs.

If no tasks on the CPUs in question do any disk I/O, is it possible these 
interrupts could still get triggered by activity instigated by tasks on 
other CPUs?  Or would they basically be idle and inert in that case?

Thanks,

Chris


* Re: NVME, isolcpus, and irq affinity
  2020-10-12 18:50   ` Chris Friesen
@ 2020-10-12 19:05     ` Keith Busch
  0 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2020-10-12 19:05 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linux-nvme

On Mon, Oct 12, 2020 at 12:50:49PM -0600, Chris Friesen wrote:
> On 10/12/2020 11:13 AM, Keith Busch wrote:
> > On Mon, Oct 12, 2020 at 09:49:38AM -0600, Chris Friesen wrote:
> 
> > > Assuming I'm reading the code correctly, how does it make sense for the NVME
> > > driver to affine interrupts to CPUs which have explicitly been designated as
> > > "isolated"?
> > 
> > The driver allocates interrupts, but doesn't affine them. The driver
> > lets the kernel handle that instead.
> 
> Okay, thanks for the quick reply.
> 
> The interrupts in question have names of the form "nvmeXqY", where X and Y
> are integers.  Some of them are affined to individual CPUs while others are
> affined to two or more CPUs.
> 
> If no tasks on the CPUs in question do any disk I/O, is it possible these
> interrupts could still get triggered by activity instigated by tasks on
> other CPUs?  Or would they basically be idle and inert in that case?

In order to wake the thread, the top half handler needs to see
completion entries on that interrupt's associated queue. If you never
use the CPUs in an interrupt's affinity mask to dispatch IO, there won't
be completions, so the thread never wakes.
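
For reference, the top half for those queues is basically just a peek at the
completion queue; from memory it looks roughly like this (details may differ
between kernel versions):

/* Sketch of the threaded-irq top half in drivers/nvme/host/pci.c. */
static irqreturn_t nvme_irq_check(int irq, void *data)
{
        struct nvme_queue *nvmeq = data;

        /*
         * nvme_cqe_pending() just checks the phase bit of the entry at the
         * current CQ head.  If nothing has been posted there, the handler
         * returns immediately and the IRQ thread is never woken.
         */
        if (nvme_cqe_pending(nvmeq))
                return IRQ_WAKE_THREAD;
        return IRQ_NONE;
}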


* Re: NVME, isolcpus, and irq affinity
  2020-10-12 15:49 NVME, isolcpus, and irq affinity Chris Friesen
  2020-10-12 17:13 ` Keith Busch
@ 2020-10-13  0:51 ` Ming Lei
  2020-10-13  6:24   ` Chris Friesen
  1 sibling, 1 reply; 8+ messages in thread
From: Ming Lei @ 2020-10-13  0:51 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linux-nvme

On Mon, Oct 12, 2020 at 11:52 PM Chris Friesen
<chris.friesen@windriver.com> wrote:
>
> Hi,
>
> I'm not subscribed to the list so please CC me on replies.
>
> I've got a linux system running the RT kernel with threaded irqs.  On
> startup we affine the various irq threads to the housekeeping CPUs, but
> I recently hit a scenario where after some days of uptime we ended up
> with a number of NVME irq threads affined to application cores instead
> (not good when we're trying to run low-latency applications).
>
> Looking at the code, it appears that the NVME driver can in some
> scenarios call nvme_setup_io_queues() after the initial setup and thus
> allocate new IRQ threads at runtime.  It appears that this will then
> call pci_alloc_irq_vectors_affinity(), which seems to determine affinity
> without any regard for things like "isolcpus" or "cset shield".
>
> There seem to be other reports of similar issues:
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1831566
>
> Am I worried about nothing, or is there a risk that those irq threads
> would actually need to do real work (which would cause unacceptable
> jitter in my application)?
>
> Assuming I'm reading the code correctly, how does it make sense for the
> NVME driver to affine interrupts to CPUs which have explicitly been
> designated as "isolated"?

You may pass 'isolcpus=managed_irq,...' for this kind of isolation; see the
details in the 'isolcpus=' section of Documentation/admin-guide/kernel-parameters.txt.

This feature was added in v5.6.
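
The interesting part of that change is in irq_do_set_affinity(): for a managed
interrupt it trims the isolated CPUs out of the mask whenever an online
housekeeping CPU is available.  Roughly like this (simplified sketch, see
kernel/irq/manage.c for the real code):

        /* Sketch of the check added by 11ea68f553e2, not the exact code. */
        if (irqd_affinity_is_managed(data) &&
            housekeeping_enabled(HK_FLAG_MANAGED_IRQ)) {
                hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);

                cpumask_and(&tmp_mask, mask, hk_mask);
                if (!cpumask_intersects(&tmp_mask, cpu_online_mask))
                        prog_mask = mask;       /* only isolated CPUs left */
                else
                        prog_mask = &tmp_mask;  /* drop the isolated CPUs */
        } else {
                prog_mask = mask;
        }

        ret = chip->irq_set_affinity(data, prog_mask, force);

If a vector's mask contains only isolated CPUs it is left alone, but then only
IO submitted from those CPUs can reach that queue anyway.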

Thanks,
Ming Lei


* Re: NVME, isolcpus, and irq affinity
  2020-10-13  0:51 ` Ming Lei
@ 2020-10-13  6:24   ` Chris Friesen
  2020-10-13  7:50     ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Friesen @ 2020-10-13  6:24 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-nvme

On 10/12/2020 6:51 PM, Ming Lei wrote:
> On Mon, Oct 12, 2020 at 11:52 PM Chris Friesen
> <chris.friesen@windriver.com> wrote:
>>
>> Hi,
>>
>> I'm not subscribed to the list so please CC me on replies.
>>
>> I've got a linux system running the RT kernel with threaded irqs.  On
>> startup we affine the various irq threads to the housekeeping CPUs, but
>> I recently hit a scenario where after some days of uptime we ended up
>> with a number of NVME irq threads affined to application cores instead
>> (not good when we're trying to run low-latency applications).
>>
>> Looking at the code, it appears that the NVME driver can in some
>> scenarios call nvme_setup_io_queues() after the initial setup and thus
>> allocate new IRQ threads at runtime.  It appears that this will then
>> call pci_alloc_irq_vectors_affinity(), which seems to determine affinity
>> without any regard for things like "isolcpus" or "cset shield".
>>
>> There seem to be other reports of similar issues:
>>
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1831566
>>
>> Am I worried about nothing, or is there a risk that those irq threads
>> would actually need to do real work (which would cause unacceptable
>> jitter in my application)?
>>
>> Assuming I'm reading the code correctly, how does it make sense for the
>> NVME driver to affine interrupts to CPUs which have explicitly been
>> designated as "isolated"?
> 
> You may pass 'isolcpus=managed_irq,...' for this kind of isolation; see the
> details in the 'isolcpus=' section of Documentation/admin-guide/kernel-parameters.txt.
> 
> This feature was added in v5.6.

I suspect that might work; unfortunately it's not available in our 
kernel, and jumping to a brand new kernel will mean a lot of additional 
validation work, so it's not something we can do on a whim.

I'm definitely looking forward to moving to something newer though.

Chris


* Re: NVME, isolcpus, and irq affinity
  2020-10-13  6:24   ` Chris Friesen
@ 2020-10-13  7:50     ` Ming Lei
  2020-10-13 15:19       ` Chris Friesen
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2020-10-13  7:50 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linux-nvme

On Tue, Oct 13, 2020 at 2:24 PM Chris Friesen
<chris.friesen@windriver.com> wrote:
>
> On 10/12/2020 6:51 PM, Ming Lei wrote:
> > On Mon, Oct 12, 2020 at 11:52 PM Chris Friesen
> > <chris.friesen@windriver.com> wrote:
> >>
> >> Hi,
> >>
> >> I'm not subscribed to the list so please CC me on replies.
> >>
> >> I've got a linux system running the RT kernel with threaded irqs.  On
> >> startup we affine the various irq threads to the housekeeping CPUs, but
> >> I recently hit a scenario where after some days of uptime we ended up
> >> with a number of NVME irq threads affined to application cores instead
> >> (not good when we're trying to run low-latency applications).
> >>
> >> Looking at the code, it appears that the NVME driver can in some
> >> scenarios call nvme_setup_io_queues() after the initial setup and thus
> >> allocate new IRQ threads at runtime.  It appears that this will then
> >> call pci_alloc_irq_vectors_affinity(), which seems to determine affinity
> >> without any regard for things like "isolcpus" or "cset shield".
> >>
> >> There seem to be other reports of similar issues:
> >>
> >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1831566
> >>
> >> Am I worried about nothing, or is there a risk that those irq threads
> >> would actually need to do real work (which would cause unacceptable
> >> jitter in my application)?
> >>
> >> Assuming I'm reading the code correctly, how does it make sense for the
> >> NVME driver to affine interrupts to CPUs which have explicitly been
> >> designated as "isolated"?
> >
> > You may pass 'isolcpus=managed_irq,...' for this kind of isolation; see the
> > details in the 'isolcpus=' section of Documentation/admin-guide/kernel-parameters.txt.
> >
> > This feature was added in v5.6.
>
> I suspect that might work; unfortunately it's not available in our

Did you look at commit 11ea68f553e2 ('genirq, sched/isolation: Isolate from
handling managed interrupts')? It is supposed to address exactly this kind
of issue.

> kernel, and jumping to a brand new kernel will mean a lot of additional
> validation work, so it's not something we can do on a whim.

You can backport that patch to your kernel.

>
> I'm definitely looking forward to moving to something newer though.

What is the 'something newer' you have in mind?


Thanks,
Ming Lei


* Re: NVME, isolcpus, and irq affinity
  2020-10-13  7:50     ` Ming Lei
@ 2020-10-13 15:19       ` Chris Friesen
  0 siblings, 0 replies; 8+ messages in thread
From: Chris Friesen @ 2020-10-13 15:19 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-nvme

On 10/13/2020 1:50 AM, Ming Lei wrote:
> On Tue, Oct 13, 2020 at 2:24 PM Chris Friesen
> <chris.friesen@windriver.com> wrote:

>> kernel and jumping to a brand new kernel will mean a lot of additional
>> validation work so it's not something we can do on a whim.
> 
> You can backport that patch to your kernel.

Right, we might end up having to do something like that.

>> I'm definitely looking forward to moving to something newer though.
> 
> What is the 'something newer' you have in mind?

Sorry, I meant something newer than what we have currently. :)

Chris

