linux-kernel.vger.kernel.org archive mirror
* Virtio-scsi multiqueue irq affinity
@ 2019-03-18  6:21 Peter Xu
  2019-03-23 17:15 ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2019-03-18  6:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Thomas Gleixner, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin

Hi, Christoph & all,

I noticed that starting from commit 0d9f0a52c8b9 ("virtio_scsi: use
virtio IRQ affinity", 2017-02-27) the virtio scsi driver is using a
new way (via irq_create_affinity_masks()) to automatically initialize
IRQ affinities for the multi-queues, which is different from all the
other virtio devices (like virtio-net, which still uses
virtqueue_set_affinity(), which is actually irq_set_affinity_hint()).

Firstly, it will definitely break some userspace programs: scripts that
want to do the bindings explicitly like before will now simply fail
with -EIO every time they echo to /proc/irq/N/smp_affinity of any of
the multi-queues (see write_irq_affinity()).
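
For illustration, the failure looks roughly like this (the IRQ number
is made up; any managed multi-queue IRQ behaves the same way):

  # cat /proc/irq/36/smp_affinity
  0c
  # echo 3 > /proc/irq/36/smp_affinity
  -bash: echo: write error: Input/output error

The write is rejected because the kernel refuses user affinity changes
for managed interrupts, so write_irq_affinity() bails out with -EIO.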

Is there any specific reason to do it the new way?  Since AFAIU we
should still allow the system admins to decide what to do for such
configurations, e.g., what if we only want to provision half of the
CPU resources to handle IRQs for a specific virtio-scsi controller?
We won't be able to achieve that with the current policy.  Or, could
this be a question for the IRQ subsystem (irq_create_affinity_masks())
in general?  Any special considerations behind the big picture?

I believe I must have missed some context here and there... but I'd
like to raise the question anyway.  Say, if the new way is preferred,
maybe it would be worth spreading the idea to the rest of the virtio
drivers that support multi-queue as well.

Thanks,

-- 
Peter Xu


* Re: Virtio-scsi multiqueue irq affinity
  2019-03-18  6:21 Virtio-scsi multiqueue irq affinity Peter Xu
@ 2019-03-23 17:15 ` Thomas Gleixner
  2019-03-25  5:02   ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2019-03-23 17:15 UTC (permalink / raw)
  To: Peter Xu
  Cc: Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin

Peter,

On Mon, 18 Mar 2019, Peter Xu wrote:
> I noticed that starting from commit 0d9f0a52c8b9 ("virtio_scsi: use
> virtio IRQ affinity", 2017-02-27) the virtio scsi driver is using a
> new way (via irq_create_affinity_masks()) to automatically initialize
> IRQ affinities for the multi-queues, which is different from all the
> other virtio devices (like virtio-net, which still uses
> virtqueue_set_affinity(), which is actually irq_set_affinity_hint()).
> 
> Firstly, it will definitely break some userspace programs: scripts that
> want to do the bindings explicitly like before will now simply fail
> with -EIO every time they echo to /proc/irq/N/smp_affinity of any of
> the multi-queues (see write_irq_affinity()).

Did it break anything? I did not see a report so far. Assumptions about
potential breakage are not really useful.

> Is there any specific reason to do it the new way?  Since AFAIU we
> should still allow the system admins to decide what to do for such
> configurations, e.g., what if we only want to provision half of the
> CPU resources to handle IRQs for a specific virtio-scsi controller?
> We won't be able to achieve that with the current policy.  Or, could
> this be a question for the IRQ subsystem (irq_create_affinity_masks())
> in general?  Any special considerations behind the big picture?

That has nothing to do with the irq subsystem. That merely provides the
mechanisms.

The reason behind this is that multi-queue devices set up queues per CPU
or, if not enough queues are available, per group of CPUs. So it does
not make sense to move the interrupt away from the CPU or the CPU group.

Aside of that in the CPU hotunplug case, interrupts used to be moved to the
online CPUs which resulted in problems for e.g. hibernation because on
large systems moving all interrupts to the boot CPU does not work due to
vector space exhaustion. Also CPU hotunplug is used for power management
purposes and there it does not make sense either to have the per cpu queues
of the offlined CPUs moved to the still online CPUs which then end up with
several queues.

The new way to deal with this is to strictly bind per CPU (per CPU group)
queues. If the CPU or the last CPU in the group goes offline the following
happens:

 1) The queue is disabled, i.e. no new requests can be queued

 2) Wait for the outstanding requests to complete

 3) Shut down the interrupt

 This avoids having multiple queues moved to the still online CPUs and also
 prevents vector space exhaustion because the shut down interrupt does not
 have to be migrated.

When the CPU (or the first in the group) comes online again:

 1) Reenable the interrupt

 2) Reenable the queue
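
A rough way to observe this from user space (the IRQ and CPU numbers are
made up; the debugfs view needs CONFIG_GENERIC_IRQ_DEBUGFS):

 # cat /proc/irq/36/effective_affinity_list
 3
 # echo 0 > /sys/devices/system/cpu/cpu3/online
 # grep MANAGED_SHUTDOWN /sys/kernel/debug/irq/irqs/36   # interrupt is shut down
 # echo 1 > /sys/devices/system/cpu/cpu3/online          # interrupt and queue come back

While the CPU is offline the interrupt simply stays shut down
(IRQD_MANAGED_SHUTDOWN) instead of being migrated somewhere else.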

Hope that helps.

Thanks,

	tglx


* Re: Virtio-scsi multiqueue irq affinity
  2019-03-23 17:15 ` Thomas Gleixner
@ 2019-03-25  5:02   ` Peter Xu
  2019-03-25  7:06     ` Ming Lei
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2019-03-25  5:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin, minlei

On Sat, Mar 23, 2019 at 06:15:59PM +0100, Thomas Gleixner wrote:
> Peter,

Hi, Thomas,

> 
> On Mon, 18 Mar 2019, Peter Xu wrote:
> > I noticed that starting from commit 0d9f0a52c8b9 ("virtio_scsi: use
> > virtio IRQ affinity", 2017-02-27) the virtio scsi driver is using a
> > new way (via irq_create_affinity_masks()) to automatically initialize
> > IRQ affinities for the multi-queues, which is different from all the
> > other virtio devices (like virtio-net, which still uses
> > virtqueue_set_affinity(), which is actually irq_set_affinity_hint()).
> > 
> > Firstly, it will definitely break some userspace programs: scripts that
> > want to do the bindings explicitly like before will now simply fail
> > with -EIO every time they echo to /proc/irq/N/smp_affinity of any of
> > the multi-queues (see write_irq_affinity()).
> 
> Did it break anything? I did not see a report so far. Assumptions about
> potential breakage are not really useful.

It broke some automation scripts, e.g. ones that bind CPUs to IRQs
before starting IO: these scripts now fail early during setup when
trying to echo into the affinity procfs file.  Actually I started to
look into this because of such script breakage reported by QEs.
Initially it was thought to be a kernel bug, but later we noticed that
it's a change in policy.

> 
> > Is there any specific reason to do it the new way?  Since AFAIU we
> > should still allow the system admins to decide what to do for such
> > configurations, e.g., what if we only want to provision half of the
> > CPU resources to handle IRQs for a specific virtio-scsi controller?
> > We won't be able to achieve that with the current policy.  Or, could
> > this be a question for the IRQ subsystem (irq_create_affinity_masks())
> > in general?  Any special considerations behind the big picture?
> 
> That has nothing to do with the irq subsystem. That merely provides the
> mechanisms.
> 
> The reason behind this is that multi-queue devices set up queues per CPU
> or, if not enough queues are available, per group of CPUs. So it does
> not make sense to move the interrupt away from the CPU or the CPU group.
> 
> Aside of that in the CPU hotunplug case, interrupts used to be moved to the
> online CPUs which resulted in problems for e.g. hibernation because on
> large systems moving all interrupts to the boot CPU does not work due to
> vector space exhaustion. Also CPU hotunplug is used for power management
> purposes and there it does not make sense either to have the per cpu queues
> of the offlined CPUs moved to the still online CPUs which then end up with
> several queues.
> 
> The new way to deal with this is to strictly bind per CPU (per CPU group)
> queues. If the CPU or the last CPU in the group goes offline the following
> happens:
> 
>  1) The queue is disabled, i.e. no new requests can be queued
> 
>  2) Wait for the outstanding requests to complete
> 
>  3) Shut down the interrupt
> 
>  This avoids having multiple queues moved to the still online CPUs and also
>  prevents vector space exhaustion because the shut down interrupt does not
>  have to be migrated.
> 
> When the CPU (or the first in the group) comes online again:
> 
>  1) Reenable the interrupt
> 
>  2) Reenable the queue
> 
> Hope that helps.

Thanks for explaining everything!  It helps a lot, and yes it makes
perfect sense to me.

If no one has reported any issue, I think either the scripts are not
checking the return code, so they fail silently but it might not
matter much (e.g., if the only thing a script wants to do is spread
the CPUs over the IRQs, it can simply skip that setup step, and the
failing echoes won't affect much), or they were simply fixed up later
on.  Now the only thing I am unsure about is whether there could be
scenarios where we may not want the default policy of spreading the
queues over the cores.

One thing I can think of is the real-time scenario where "isolcpus="
is provided, then logically we should not allow any isolated CPUs to
be bound to any of the multi-queue IRQs.  Though Ming Lei and I had a
discussion offlist before and Ming explained to me that as long as the
isolated CPUs do not generate any IO then there will be no IRQ on
those isolated (real-time) CPUs at all.  Can we guarantee that?  Now
I'm thinking whether the ideal way should be that, when multi-queue is
used with "isolcpus=" then we only spread the queues upon housekeeping
CPUs somehow?  Because AFAIU general real-time applications should not
use block IOs at all (and if not those hardware multi-queues running
upon isolated CPUs would probably be a pure waste too because they
could be always idle on the isolated cores where the real-time
application runs).

CCing Ming too.

Thanks,

-- 
Peter Xu


* Re: Virtio-scsi multiqueue irq affinity
  2019-03-25  5:02   ` Peter Xu
@ 2019-03-25  7:06     ` Ming Lei
  2019-03-25  8:53       ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2019-03-25  7:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: Thomas Gleixner, Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin, minlei

On Mon, Mar 25, 2019 at 01:02:13PM +0800, Peter Xu wrote:
> On Sat, Mar 23, 2019 at 06:15:59PM +0100, Thomas Gleixner wrote:
> > Peter,
> 
> Hi, Thomas,
> 
> > 
> > On Mon, 18 Mar 2019, Peter Xu wrote:
> > > I noticed that starting from commit 0d9f0a52c8b9 ("virtio_scsi: use
> > > virtio IRQ affinity", 2017-02-27) the virtio scsi driver is using a
> > > new way (via irq_create_affinity_masks()) to automatically initialize
> > > IRQ affinities for the multi-queues, which is different from all the
> > > other virtio devices (like virtio-net, which still uses
> > > virtqueue_set_affinity(), which is actually irq_set_affinity_hint()).
> > > 
> > > Firstly, it will definitely break some userspace programs: scripts that
> > > want to do the bindings explicitly like before will now simply fail
> > > with -EIO every time they echo to /proc/irq/N/smp_affinity of any of
> > > the multi-queues (see write_irq_affinity()).
> > 
> > Did it break anything? I did not see a report so far. Assumptions about
> > potential breakage are not really useful.
> 
> It broke some automation scripts, e.g. ones that bind CPUs to IRQs
> before starting IO: these scripts now fail early during setup when
> trying to echo into the affinity procfs file.  Actually I started to
> look into this because of such script breakage reported by QEs.
> Initially it was thought to be a kernel bug, but later we noticed that
> it's a change in policy.
> 
> > 
> > > Is there any specific reason to do it the new way?  Since AFAIU we
> > > should still allow the system admins to decide what to do for such
> > > configurations, e.g., what if we only want to provision half of the
> > > CPU resources to handle IRQs for a specific virtio-scsi controller?
> > > We won't be able to achieve that with the current policy.  Or, could
> > > this be a question for the IRQ subsystem (irq_create_affinity_masks())
> > > in general?  Any special considerations behind the big picture?
> > 
> > That has nothing to do with the irq subsystem. That merely provides the
> > mechanisms.
> > 
> > The reason behind this is that multi-queue devices set up queues per CPU
> > or, if not enough queues are available, per group of CPUs. So it does
> > not make sense to move the interrupt away from the CPU or the CPU group.
> > 
> > Aside of that in the CPU hotunplug case, interrupts used to be moved to the
> > online CPUs which resulted in problems for e.g. hibernation because on
> > large systems moving all interrupts to the boot CPU does not work due to
> > vector space exhaustion. Also CPU hotunplug is used for power management
> > purposes and there it does not make sense either to have the per cpu queues
> > of the offlined CPUs moved to the still online CPUs which then end up with
> > several queues.
> > 
> > The new way to deal with this is to strictly bind per CPU (per CPU group)
> > queues. If the CPU or the last CPU in the group goes offline the following
> > happens:
> > 
> >  1) The queue is disabled, i.e. no new requests can be queued
> > 
> >  2) Wait for the outstanding requests to complete
> > 
> >  3) Shut down the interrupt
> > 
> >  This avoids having multiple queues moved to the still online CPUs and also
> >  prevents vector space exhaustion because the shut down interrupt does not
> >  have to be migrated.
> > 
> > When the CPU (or the first in the group) comes online again:
> > 
> >  1) Reenable the interrupt
> > 
> >  2) Reenable the queue
> > 
> > Hope that helps.
> 
> Thanks for explaining everything!  It helps a lot, and yes it makes
> perfect sense to me.
> 
> If no one has reported any issue, I think either the scripts are not
> checking the return code, so they fail silently but it might not
> matter much (e.g., if the only thing a script wants to do is spread
> the CPUs over the IRQs, it can simply skip that setup step, and the
> failing echoes won't affect much), or they were simply fixed up later
> on.  Now the only thing I am unsure about is whether there could be
> scenarios where we may not want the default policy of spreading the
> queues over the cores.
> 
> One thing I can think of is the real-time scenario where "isolcpus="
> is provided, then logically we should not allow any isolated CPUs to
> be bound to any of the multi-queue IRQs.  Though Ming Lei and I had a

So far, this behaviour is made by user-space.

From my understanding, the IRQ subsystem doesn't handle "isolcpus=",
even though the Kconfig help text doesn't mention its effect on irq
affinity:

          Make sure that CPUs running critical tasks are not disturbed by
          any source of "noise" such as unbound workqueues, timers, kthreads...
          Unbound jobs get offloaded to housekeeping CPUs. This is driven by
          the "isolcpus=" boot parameter.

Yeah, some RT applications may exclude the 'isolcpus=' CPUs from some
IRQ's affinity via the /proc/irq interface, and that is no longer
possible for managed IRQs.

> discussion offlist before and Ming explained to me that as long as the
> isolated CPUs do not generate any IO then there will be no IRQ on
> those isolated (real-time) CPUs at all.  Can we guarantee that?  Now

It is only guaranteed for 1:1 mapping.

blk-mq uses managed IRQ's affinity to setup queue mapping, for example:

1) single hardware queue
- this queue's IRQ affinity includes all CPUs, then the hardware queue's
IRQ is only fired on one specific CPU for IO submitted from any CPU

2) multi hardware queue
- there are N hardware queues
- for each hardware queue i(i < N), its IRQ's affinity may include N(i) CPUs,
then IRQ for this hardware queue i is fired on one specific CPU among N(i).
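
As a side note (the device name below is just an example, not taken
from any report in this thread), the mapping blk-mq derives from the
managed IRQ affinity can be inspected from user space:

  # CPUs served by each hardware queue of a blk-mq device
  grep . /sys/block/sda/mq/*/cpu_list
  # where the corresponding interrupts are actually delivered
  grep . /proc/irq/*/effective_affinity_list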


Thanks,
Ming


* Re: Virtio-scsi multiqueue irq affinity
  2019-03-25  7:06     ` Ming Lei
@ 2019-03-25  8:53       ` Thomas Gleixner
  2019-03-25  9:43         ` Peter Xu
  2019-03-25  9:50         ` Ming Lei
  0 siblings, 2 replies; 16+ messages in thread
From: Thomas Gleixner @ 2019-03-25  8:53 UTC (permalink / raw)
  To: Ming Lei
  Cc: Peter Xu, Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin, minlei

Ming,

On Mon, 25 Mar 2019, Ming Lei wrote:
> On Mon, Mar 25, 2019 at 01:02:13PM +0800, Peter Xu wrote:
> > One thing I can think of is the real-time scenario where "isolcpus="
> > is provided, then logically we should not allow any isolated CPUs to
> > be bound to any of the multi-queue IRQs.  Though Ming Lei and I had a
> 
> So far, this behaviour is made by user-space.
> 
> From my understanding, the IRQ subsystem doesn't handle "isolcpus=",
> even though the Kconfig help text doesn't mention its effect on irq
> affinity:
> 
>           Make sure that CPUs running critical tasks are not disturbed by
>           any source of "noise" such as unbound workqueues, timers, kthreads...
>           Unbound jobs get offloaded to housekeeping CPUs. This is driven by
>           the "isolcpus=" boot parameter.

isolcpus has no effect on the interrupts. That's what 'irqaffinity=' is for.

> Yeah, some RT applications may exclude the 'isolcpus=' CPUs from some
> IRQ's affinity via the /proc/irq interface, and that is no longer
> possible for managed IRQs.
> 
> > discussion offlist before and Ming explained to me that as long as the
> > isolated CPUs do not generate any IO then there will be no IRQ on
> > those isolated (real-time) CPUs at all.  Can we guarantee that?  Now
> 
> It is only guaranteed for 1:1 mapping.
> 
> blk-mq uses managed IRQ's affinity to setup queue mapping, for example:
> 
> 1) single hardware queue
> - this queue's IRQ affinity includes all CPUs, then the hardware queue's
> IRQ is only fired on one specific CPU for IO submitted from any CPU

Right. We can special case that for single HW queue to honor the default
affinity setting. That's not hard to achieve.
 
> 2) multi hardware queue
> - there are N hardware queues
> - for each hardware queue i(i < N), its IRQ's affinity may include N(i) CPUs,
> then IRQ for this hardware queue i is fired on one specific CPU among N(i).

Correct and that's the sane case where it does not matter much, because if
your task on an isolated CPU does I/O then redirecting it through some
other CPU does not make sense. If it doesn't do I/O it won't be affected by
the dormant queue.

Thanks,

	tglx


* Re: Virtio-scsi multiqueue irq affinity
  2019-03-25  8:53       ` Thomas Gleixner
@ 2019-03-25  9:43         ` Peter Xu
  2019-03-25 13:27           ` Thomas Gleixner
  2019-03-25  9:50         ` Ming Lei
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Xu @ 2019-03-25  9:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ming Lei, Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin, minlei

On Mon, Mar 25, 2019 at 09:53:28AM +0100, Thomas Gleixner wrote:
> Ming,
> 
> On Mon, 25 Mar 2019, Ming Lei wrote:
> > On Mon, Mar 25, 2019 at 01:02:13PM +0800, Peter Xu wrote:
> > > One thing I can think of is the real-time scenario where "isolcpus="
> > > is provided, then logically we should not allow any isolated CPUs to
> > > be bound to any of the multi-queue IRQs.  Though Ming Lei and I had a
> > 
> > So far, this behaviour is made by user-space.
> > 
> > From my understanding, the IRQ subsystem doesn't handle "isolcpus=",
> > even though the Kconfig help text doesn't mention its effect on irq
> > affinity:
> > 
> >           Make sure that CPUs running critical tasks are not disturbed by
> >           any source of "noise" such as unbound workqueues, timers, kthreads...
> >           Unbound jobs get offloaded to housekeeping CPUs. This is driven by
> >           the "isolcpus=" boot parameter.
> 
> isolcpus has no effect on the interrupts. That's what 'irqaffinity=' is for.
> 
> > Yeah, some RT applications may exclude the 'isolcpus=' CPUs from some
> > IRQ's affinity via the /proc/irq interface, and that is no longer
> > possible for managed IRQs.
> > 
> > > discussion offlist before and Ming explained to me that as long as the
> > > isolated CPUs do not generate any IO then there will be no IRQ on
> > > those isolated (real-time) CPUs at all.  Can we guarantee that?  Now
> > 
> > It is only guaranteed for 1:1 mapping.
> > 
> > blk-mq uses managed IRQ's affinity to setup queue mapping, for example:
> > 
> > 1) single hardware queue
> > - this queue's IRQ affinity includes all CPUs, then the hardware queue's
> > IRQ is only fired on one specific CPU for IO submitted from any CPU
> 
> Right. We can special case that for single HW queue to honor the default
> affinity setting. That's not hard to achieve.
>  
> > 2) multi hardware queue
> > - there are N hardware queues
> > - for each hardware queue i(i < N), its IRQ's affinity may include N(i) CPUs,
> > then IRQ for this hardware queue i is fired on one specific CPU among N(i).
> 
> Correct and that's the sane case where it does not matter much, because if
> your task on an isolated CPU does I/O then redirecting it through some
> other CPU does not make sense. If it doesn't do I/O it won't be affected by
> the dormant queue.

(My thanks to both.)

Now I understand it can be guaranteed so it should not break
determinism of the real-time applications.  But again, I'm curious
whether we can specify how to spread the hardware queues of a block
controller (as I asked in my previous post) instead of the default one
(which is to spread the queues upon all the cores)?  I'll try to give
a detailed example this time: let's assume we have a
host with 2 nodes and 8 cores (Node 0 with CPUs 0-3, Node 1 with CPUs
4-7), and a SCSI controller with 4 queues.  We want to take the 2nd
node to run the real-time applications so we do isolcpus=4-7.  By
default, IIUC the hardware queues will be allocated like this:

  - queue 1: CPU 0,1
  - queue 2: CPU 2,3
  - queue 3: CPU 4,5
  - queue 4: CPU 6,7

And the IRQs of the queues will be bound to the same cpuset that the
queue is bound to.

So my previous question is: since we know that CPUs 4-7 won't generate
any IO after all (and they shouldn't), could it be possible that we
configure the system somehow to reflect a mapping like below:

  - queue 1: CPU 0
  - queue 2: CPU 1
  - queue 3: CPU 2
  - queue 4: CPU 3

Then we disallow CPUs 4-7 from generating IO and return failure if
they try to.

Again, I'm pretty uncertain on whether this case can be anything close
to useful...  It just came out of my pure curiosity.  I think it at
least has some benefits like: we will guarantee that the realtime CPUs
won't send block IO requests (which is good, because block IO could
simply break real-time determinism), and we'll save two queues from
being totally idle (so if we run non-real-time block applications on
cores 0-3 we still gain 4 hardware queues' throughput rather than 2).

Thanks,

-- 
Peter Xu


* Re: Virtio-scsi multiqueue irq affinity
  2019-03-25  8:53       ` Thomas Gleixner
  2019-03-25  9:43         ` Peter Xu
@ 2019-03-25  9:50         ` Ming Lei
  2021-05-08  7:52           ` xuyihang
  1 sibling, 1 reply; 16+ messages in thread
From: Ming Lei @ 2019-03-25  9:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Xu, Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin, minlei

On Mon, Mar 25, 2019 at 09:53:28AM +0100, Thomas Gleixner wrote:
> Ming,
> 
> On Mon, 25 Mar 2019, Ming Lei wrote:
> > On Mon, Mar 25, 2019 at 01:02:13PM +0800, Peter Xu wrote:
> > > One thing I can think of is the real-time scenario where "isolcpus="
> > > is provided, then logically we should not allow any isolated CPUs to
> > > be bound to any of the multi-queue IRQs.  Though Ming Lei and I had a
> > 
> > So far, this behaviour is made by user-space.
> > 
> > From my understanding, the IRQ subsystem doesn't handle "isolcpus=",
> > even though the Kconfig help text doesn't mention its effect on irq
> > affinity:
> > 
> >           Make sure that CPUs running critical tasks are not disturbed by
> >           any source of "noise" such as unbound workqueues, timers, kthreads...
> >           Unbound jobs get offloaded to housekeeping CPUs. This is driven by
> >           the "isolcpus=" boot parameter.
> 
> isolcpus has no effect on the interrupts. That's what 'irqaffinity=' is for.

Indeed.

irq_default_affinity is built from 'irqaffinity=', however, we don't
consider irq_default_affinity for managed IRQ affinity.

Looks like Peter wants to exclude some CPUs from the spread of managed IRQs.
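
For completeness, 'irqaffinity=' only seeds the default mask that
non-managed interrupts start with, which can be checked like this
(example values, e.g. booted with irqaffinity=0-3 on an 8-CPU machine):

  # cat /proc/irq/default_smp_affinity
  0f

Managed interrupts bypass this mask; their masks come from
irq_create_affinity_masks() spreading over the possible CPUs.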

Thanks,
Ming


* Re: Virtio-scsi multiqueue irq affinity
  2019-03-25  9:43         ` Peter Xu
@ 2019-03-25 13:27           ` Thomas Gleixner
  0 siblings, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2019-03-25 13:27 UTC (permalink / raw)
  To: Peter Xu
  Cc: Ming Lei, Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin, minlei

Peter,

On Mon, 25 Mar 2019, Peter Xu wrote:
> Now I understand it can be guaranteed so it should not break
> determinism of the real-time applications.  But again, I'm curious
> whether we can specify how to spread the hardware queues of a block
> controller (as I asked in my previous post) instead of the default one
> (which is to spread the queues upon all the cores)?  I'll try to give
> a detailed example this time: let's assume we have a
> host with 2 nodes and 8 cores (Node 0 with CPUs 0-3, Node 1 with CPUs
> 4-7), and a SCSI controller with 4 queues.  We want to take the 2nd
> node to run the real-time applications so we do isolcpus=4-7.  By
> default, IIUC the hardware queues will be allocated like this:
> 
>   - queue 1: CPU 0,1
>   - queue 2: CPU 2,3
>   - queue 3: CPU 4,5
>   - queue 4: CPU 6,7
> 
> And the IRQs of the queues will be bound to the same cpuset that the
> queue is bound to.
> 
> So my previous question is: since we know that CPUs 4-7 won't generate
> any IO after all (and they shouldn't), could it be possible that we
> configure the system somehow to reflect a mapping like below:
> 
>   - queue 1: CPU 0
>   - queue 2: CPU 1
>   - queue 3: CPU 2
>   - queue 4: CPU 3
> 
> Then we disallow CPUs 4-7 from generating IO and return failure if
> they try to.
> 
> Again, I'm pretty uncertain on whether this case can be anything close
> to useful...  It just came out of my pure curiosity.  I think it at
> least has some benefits like: we will guarantee that the realtime CPUs
> won't send block IO requests (which is good, because block IO could
> simply break real-time determinism), and we'll save two queues from
> being totally idle (so if we run non-real-time block applications on
> cores 0-3 we still gain 4 hardware queues' throughput rather than 2).

If that _IS_ useful, then the affinity spreading logic can be changed to
accommodate that. It's not really hard to do so, but we'd need a proper
use case for justification.

Thanks,

	tglx


* Re: Virtio-scsi multiqueue irq affinity
  2019-03-25  9:50         ` Ming Lei
@ 2021-05-08  7:52           ` xuyihang
  2021-05-08 12:26             ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: xuyihang @ 2021-05-08  7:52 UTC (permalink / raw)
  To: Ming Lei, Thomas Gleixner
  Cc: Peter Xu, Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin, minlei,
	liaochang1


On 2019/3/25 17:50, Ming Lei wrote:
> On Mon, Mar 25, 2019 at 09:53:28AM +0100, Thomas Gleixner wrote:
>> Ming,
>>
>> On Mon, 25 Mar 2019, Ming Lei wrote:
>>> On Mon, Mar 25, 2019 at 01:02:13PM +0800, Peter Xu wrote:
>>>> One thing I can think of is the real-time scenario where "isolcpus="
>>>> is provided, then logically we should not allow any isolated CPUs to
>>>> be bound to any of the multi-queue IRQs.  Though Ming Lei and I had a
>>> So far, this behaviour is made by user-space.
>>>
>>> From my understanding, the IRQ subsystem doesn't handle "isolcpus=",
>>> even though the Kconfig help text doesn't mention its effect on irq
>>> affinity:
>>>
>>>            Make sure that CPUs running critical tasks are not disturbed by
>>>            any source of "noise" such as unbound workqueues, timers, kthreads...
>>>            Unbound jobs get offloaded to housekeeping CPUs. This is driven by
>>>            the "isolcpus=" boot parameter.
>> isolcpus has no effect on the interrupts. That's what 'irqaffinity=' is for.
> Indeed.
>
> irq_default_affinity is built from 'irqaffinity=', however, we don't
> consider irq_default_affinity for managed IRQ affinity.
>
> Looks like Peter wants to exclude some CPUs from the spread of managed IRQs.


Hi Ming and Thomas,


We are dealing with a scenario which may need to assign a default
irqaffinity for managed IRQ.

Assume we have a full-CPU-usage RT thread bound to a specific CPU.

Meanwhile, the interrupt handler registered by a device, which is
ksoftirqd, may never have a chance to run. (And we don't want to use an
isolated CPU.)

There are a couple of ways to deal with this problem:

1. Adjust the priority of ksoftirqd or of the RT thread, so the
interrupt handler could preempt the RT thread. However, I am not sure
whether it could have side effects or not.

2. Adjust the interrupt CPU affinity or the RT thread affinity. But
managed IRQ seems designed to forbid users from manipulating interrupt
affinity.

It seems to me that managed IRQ is coupled with the user-side
application.

Would you share your thoughts about this issue please?



* Re: Virtio-scsi multiqueue irq affinity
  2021-05-08  7:52           ` xuyihang
@ 2021-05-08 12:26             ` Thomas Gleixner
  2021-05-10  3:19               ` liaochang (A)
  2021-05-10  8:48               ` xuyihang
  0 siblings, 2 replies; 16+ messages in thread
From: Thomas Gleixner @ 2021-05-08 12:26 UTC (permalink / raw)
  To: xuyihang, Ming Lei
  Cc: Peter Xu, Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin, minlei,
	liaochang1

Yihang,

On Sat, May 08 2021 at 15:52, xuyihang wrote:
>
> We are dealing with a scenario which may need to assign a default
> irqaffinity for managed IRQ.
>
> Assume we have a full-CPU-usage RT thread bound to a specific CPU.
>
> Meanwhile, the interrupt handler registered by a device, which is
> ksoftirqd, may never have a chance to run. (And we don't want to use an
> isolated CPU.)

A device cannot register an interrupt handler in ksoftirqd.

> There are a couple of ways to deal with this problem:
>
> 1. Adjust the priority of ksoftirqd or of the RT thread, so the
> interrupt handler could preempt the RT thread. However, I am not sure
> whether it could have side effects or not.
>
> 2. Adjust the interrupt CPU affinity or the RT thread affinity. But
> managed IRQ seems designed to forbid users from manipulating interrupt
> affinity.
>
> It seems to me that managed IRQ is coupled with the user-side
> application.
>
> Would you share your thoughts about this issue please?

Can you please provide a more detailed description of your system?

    - Number of CPUs

    - Kernel version
    - Is NOHZ full enabled?
    - Any isolation mechanisms enabled, and if so how are they
      configured (e.g. on the kernel command line)?

    - Number of queues in the multiqueue device
          
    - Is the RT thread issuing I/O to the multiqueue device?

Thanks,

        tglx


* Re: Virtio-scsi multiqueue irq affinity
  2021-05-08 12:26             ` Thomas Gleixner
@ 2021-05-10  3:19               ` liaochang (A)
  2021-05-10  7:54                 ` Thomas Gleixner
  2021-05-10  8:48               ` xuyihang
  1 sibling, 1 reply; 16+ messages in thread
From: liaochang (A) @ 2021-05-10  3:19 UTC (permalink / raw)
  To: Thomas Gleixner, xuyihang, Ming Lei
  Cc: Peter Xu, Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin, minlei

Hi Thomas,

On 2021/5/8 20:26, Thomas Gleixner wrote:
> Yihang,
> 
> On Sat, May 08 2021 at 15:52, xuyihang wrote:
>>
>> We are dealing with a scenario which may need to assign a default
>> irqaffinity for managed IRQ.
>>
>> Assume we have a full-CPU-usage RT thread bound to a specific CPU.
>>
>> Meanwhile, the interrupt handler registered by a device, which is
>> ksoftirqd, may never have a chance to run. (And we don't want to use an
>> isolated CPU.)
> 
> A device cannot register an interrupt handler in ksoftirqd.

I learned more about the scenario after communicating with Yihang offline:
1. We have a machine with 36 CPUs, and assign several RT threads to the last two CPUs (CPU-34, CPU-35).
2. The I/O device driver creates a single managed irq, the affinity of which includes CPU-34 and CPU-35.
3. Another regular application launches I/O operations on CPUs different from the ones the RT threads use,
   then CPU-34/35 receive the hardware interrupt and wake up ksoftirqd to deal with the real I/O work.
4. Because the priority and scheduling policy of the RT threads overwhelm the per-cpu ksoftirqd, it looks like
   ksoftirqd has no chance to run on CPU-34/35, which means the I/O processing can't finish in time
   and the application gets stuck.

> 
>> There are a couple of ways to deal with this problem:
>>
>> 1. Adjust the priority of ksoftirqd or of the RT thread, so the
>> interrupt handler could preempt the RT thread. However, I am not sure
>> whether it could have side effects or not.
>>
>> 2. Adjust the interrupt CPU affinity or the RT thread affinity. But
>> managed IRQ seems designed to forbid users from manipulating interrupt
>> affinity.
>>
>> It seems to me that managed IRQ is coupled with the user-side
>> application.
>>
>> Would you share your thoughts about this issue please?
> 
> Can you please provide a more detailed description of your system?
> 
>     - Number of CPUs
> 
>     - Kernel version
>     - Is NOHZ full enabled?
>     - Any isolation mechanisms enabled, and if so how are they
>       configured (e.g. on the kernel command line)?
> 
>     - Number of queues in the multiqueue device
>           
>     - Is the RT thread issuing I/O to the multiqueue device?
> 
> Thanks,
> 
>         tglx
> .
> 
BR,
Liao Chang


* Re: Virtio-scsi multiqueue irq affinity
  2021-05-10  3:19               ` liaochang (A)
@ 2021-05-10  7:54                 ` Thomas Gleixner
  2021-05-18  1:37                   ` liaochang (A)
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2021-05-10  7:54 UTC (permalink / raw)
  To: liaochang (A), xuyihang, Ming Lei
  Cc: Peter Xu, Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin, minlei

Liao,

On Mon, May 10 2021 at 11:19, liaochang wrote:
> 1. We have a machine with 36 CPUs, and assign several RT threads to the
> last two CPUs (CPU-34, CPU-35).

Which kind of machine? x86?

> 2. The I/O device driver creates a single managed irq, the affinity of
> which includes CPU-34 and CPU-35.

If that driver creates only a single managed interrupt, then the
possible affinity of that interrupt spans CPUs 0 - 35.

That's expected, but what is the effective affinity of that interrupt?

# cat /proc/irq/$N/effective_affinity

Also please provide the full output of

# cat /proc/interrupts

and point out which device we are talking about.

Thanks,

        tglx


* Re: Virtio-scsi multiqueue irq affinity
  2021-05-08 12:26             ` Thomas Gleixner
  2021-05-10  3:19               ` liaochang (A)
@ 2021-05-10  8:48               ` xuyihang
  2021-05-10 19:56                 ` Thomas Gleixner
  1 sibling, 1 reply; 16+ messages in thread
From: xuyihang @ 2021-05-10  8:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ming Lei
  Cc: Peter Xu, Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin, minlei,
	liaochang1

Thomas,

On 2021/5/8 20:26, Thomas Gleixner wrote:
> Yihang,
>
> On Sat, May 08 2021 at 15:52, xuyihang wrote:
>> We are dealing with a scenario which may need to assign a default
>> irqaffinity for managed IRQ.
>>
>> Assume we have a full-CPU-usage RT thread bound to a specific CPU.
>>
>> Meanwhile, the interrupt handler registered by a device, which is
>> ksoftirqd, may never have a chance to run. (And we don't want to use an
>> isolated CPU.)
> A device cannot register an interrupt handler in ksoftirqd.
>
>> There are a couple of ways to deal with this problem:
>>
>> 1. Adjust the priority of ksoftirqd or of the RT thread, so the
>> interrupt handler could preempt the RT thread. However, I am not sure
>> whether it could have side effects or not.
>>
>> 2. Adjust the interrupt CPU affinity or the RT thread affinity. But
>> managed IRQ seems designed to forbid users from manipulating interrupt
>> affinity.
>>
>> It seems to me that managed IRQ is coupled with the user-side
>> application.
>>
>> Would you share your thoughts about this issue please?
> Can you please provide a more detailed description of your system?
>
>      - Number of CPUs
It's a 4 CPU x86 VM.
>      - Kernel version
This experiment was run on linux-4.19
>      - Is NOHZ full enabled?
nohz=off
>      - Any isolation mechanisms enabled, and if so how are they
>        configured (e.g. on the kernel command line)?

Some cores are isolated via the command line (such as isolcpus=3) and
bound to RT threads; there is no other isolation configured.

>      - Number of queues in the multiqueue device

Only one queue.

[root@localhost ~]# cat /proc/interrupts | grep request
  27:       5499          0          0          0   PCI-MSI 65539-edge      virtio1-request

This environment is a virtual machine and it's a virtio device, I guess it
should not make any difference in this case.

>      - Is the RT thread issuing I/O to the multiqueue device?

The RT thread doesn't issue IO.



We simplified the reproduction procedure:

1. Start a busy-looping program with near 100% CPU usage, named print

./print 1 1 &

2. Make the program a realtime application

chrt -f -p 1 11514

3. Bind the RT process to the **managed irq** core

taskset -cpa 0 11514

4. Use dd to write to the hard drive; dd does not finish and return.

dd if=/dev/zero of=/test.img bs=1K count=1 oflag=direct,sync &

Since CPU0 is fully utilized by the RT application, and the hard drive
driver chooses CPU0 to handle its softirq, there is no chance for dd to
run.

     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
   11514 root      -2   0    2228    740    676 R 100.0   0.0 3:26.70 print


If we make some changes to this experiment:

1. If we make this RT application use less CPU time instead of 100%, the
problem disappears.

2. If we change rq_affinity to 2, in order to avoid handling the softirq
on the same core as the RT thread, the problem also disappears. However,
this approach results in about a 10%-30% random write performance
reduction compared to rq_affinity = 1, which may have better cache
utilization.

echo 2 > /sys/block/sda/queue/rq_affinity

Therefore, I want to exclude some CPUs from managed irq spreading via a
boot parameter, which is a similar approach to 11ea68f553e2 ("genirq,
sched/isolation: Isolate from handling managed interrupts").
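
For reference, the mechanism that commit introduced (in kernels newer
than the 4.19 used here) is an additional flag for the isolcpus=
parameter; the CPU number below just matches the isolcpus=3 example
above:

  isolcpus=managed_irq,domain,3

With that, managed interrupts avoid the listed CPUs as long as a
queue's affinity mask also contains housekeeping CPUs; it is best
effort rather than a hard guarantee.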


Thanks,

Yihang



* Re: Virtio-scsi multiqueue irq affinity
  2021-05-10  8:48               ` xuyihang
@ 2021-05-10 19:56                 ` Thomas Gleixner
  2021-05-11 12:38                   ` xuyihang
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2021-05-10 19:56 UTC (permalink / raw)
  To: xuyihang, Ming Lei
  Cc: Peter Xu, Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin, minlei,
	liaochang1

Yihang,

On Mon, May 10 2021 at 16:48, xuyihang wrote:
> On 2021/5/8 20:26, Thomas Gleixner wrote:
>> Can you please provide a more detailed description of your system?

>>      - Kernel version
> This experiment was run on linux-4.19

Again. Please provide reports against the most recent mainline version
and not against some randomly picked kernel variant.

> If we make some changes to this experiment:
>
> 1. If we make this RT application use less CPU time instead of 100%,
> the problem disappears.
>
> 2. If we change rq_affinity to 2, in order to avoid handling the softirq
> on the same core as the RT thread, the problem also disappears. However,
> this approach results in about a 10%-30% random write performance
> reduction compared to rq_affinity = 1, which may have better cache
> utilization.
> echo 2 > /sys/block/sda/queue/rq_affinity
>
> Therefore, I want to exclude some CPUs from managed irq spreading via a
> boot parameter,

Why has this realtime thread to run on CPU0 and cannot move to some
other CPU?

> which is a similar approach to 11ea68f553e2 ("genirq, sched/isolation:
> Isolate from handling managed interrupts").

Why can't you use the existing isolation mechanisms?

Thanks,

        tglx


* Re: Virtio-scsi multiqueue irq affinity
  2021-05-10 19:56                 ` Thomas Gleixner
@ 2021-05-11 12:38                   ` xuyihang
  0 siblings, 0 replies; 16+ messages in thread
From: xuyihang @ 2021-05-11 12:38 UTC (permalink / raw)
  To: Thomas Gleixner, Ming Lei
  Cc: Peter Xu, Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin, minlei,
	liaochang1

Hi Thomas,


The previous experiment requires a device driver that enables managed
irq, which I could not easily install on the most recent branch of the
OS.

Actually what I was asking is whether we could change the managed irq
behaviour a little bit, rather than reporting a bug.

So, to better illustrate this problem I did another test to simulate
this scenario.

This time I wrote a kernel module: in the module_init function I use
request_irq() to register an irq, the irq handler queues a work item,
and the work handler prints "work handler called" (a rough sketch of
the module follows).
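
A minimal sketch of such a test module (the names, the IRQ number and
the use of IRQF_SHARED are my own choices for illustration, not
necessarily the exact module used in this test):

  #include <linux/module.h>
  #include <linux/interrupt.h>
  #include <linux/workqueue.h>

  static int irq = 7;              /* test IRQ, matches the steps below */
  module_param(irq, int, 0444);

  /* deferred work: this is what never runs while the RT hog owns the CPU */
  static void test_work_fn(struct work_struct *work)
  {
          pr_info("work handler called\n");
  }
  static DECLARE_WORK(test_work, test_work_fn);

  /* hard interrupt handler: just queue the work, like a driver bottom half */
  static irqreturn_t test_irq_handler(int this_irq, void *dev_id)
  {
          schedule_work(&test_work);
          return IRQ_HANDLED;
  }

  static int __init test_init(void)
  {
          return request_irq(irq, test_irq_handler, IRQF_SHARED,
                             "request_irq_test", &test_work);
  }

  static void __exit test_exit(void)
  {
          free_irq(irq, &test_work);
          flush_work(&test_work);
  }

  module_init(test_init);
  module_exit(test_exit);
  MODULE_LICENSE("GPL");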


1. Register an irq for a fake new device; its handler queues the work
when the irq arrives.

/ # insmod request_irq.ko

2. Bind the irq to CPU3

/ # echo 8 > /proc/irq/7/smp_affinity

3. Start a full CPU usage RT process and bind to CPU3

./test.sh &

/ # taskset -p 8 100
pid 100's current affinity mask: f
pid 100's new affinity mask: 8

/ # chrt -f -p 1 100
pid 100's current scheduling policy: SCHED_OTHER
pid 100's current scheduling priority: 0
pid 100's new scheduling policy: SCHED_FIFO
pid 100's new scheduling priority: 1
/ # echo -1 >/proc/sys/kernel/sched_rt_runtime_us
/ # echo -1 >/proc/sys/kernel/sched_rt_period_us

/ # top

Mem: 27376K used, 73224K free, 0K shrd, 0K buff, 8368K cached
CPU0:  0.0% usr  0.0% sys  0.0% nic  100% idle  0.0% io  0.0% irq 0.0% sirq
CPU1:  0.0% usr  0.0% sys  0.0% nic  100% idle  0.0% io  0.0% irq 0.0% sirq
CPU2:  0.0% usr  0.0% sys  0.0% nic  100% idle  0.0% io  0.0% irq 0.0% sirq
CPU3:  100% usr  0.0% sys  0.0% nic  0.0% idle  0.0% io  0.0% irq 0.0% sirq
Load average: 4.00 4.00 4.00 5/62 126
   PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
   100     1 0        R     3252  3.2   3 26.3 {exe} ash ./test.sh
   126     1 0        R     3252  3.2   1  0.8 top

...

/ # echo -n trigger > /sys/kernel/debug/irq/irqs/7

From the dmesg we can tell the queued work handler is not called.


I understand the behaviour is as expected, but in practice the people
working on the RT side could be a totally different team from the device
driver one. It feels like it would be nice to have a feature to exclude
some CPUs from managed irq spreading.


On 2021/5/11 3:56, Thomas Gleixner wrote:
>
> Again. Please provide reports against the most recent mainline version
> and not against some randomly picked kernel variant.
This time I try it on the current master branch:
Linux (none) 5.12.0-next-20210506+ #3 SMP Tue May 11 14:53:58 HKT 2021 x86_64 GNU/Linux

>> If we make some changes to this experiment:
>>
>> 1. If we make this RT application use less CPU time instead of 100%,
>> the problem disappears.
>>
>> 2. If we change rq_affinity to 2, in order to avoid handling the softirq
>> on the same core as the RT thread, the problem also disappears. However,
>> this approach results in about a 10%-30% random write performance
>> reduction compared to rq_affinity = 1, which may have better cache
>> utilization.
>> echo 2 > /sys/block/sda/queue/rq_affinity
>>
>> Therefore, I want to exclude some CPUs from managed irq spreading via a
>> boot parameter,
> Why has this realtime thread to run on CPU0 and cannot move to some
> other CPU?

Yes, this realtime thread could move to another CPU, but I think it's
not so good to have to dodge the managed irq CPU. It also seems the OS
does not give much of a hint that the RT thread should not run on this
CPU. I think the kernel should be able to schedule the irq workqueue
handler a little bit, since the RT thread is more like a user
application while the driver works within kernel space.

>> which is a similar approach to 11ea68f553e2 ("genirq, sched/isolation:
>> Isolate from handling managed interrupts").
> Why can't you use the existing isolation mechanisms?

Isolating a CPU forbids other processes from utilizing it. Sometimes
the RT thread may not use up all of the CPU time, so other processes
could be scheduled onto this CPU and run for a little while.


Thanks for your time,

Yihang



* Re: Virtio-scsi multiqueue irq affinity
  2021-05-10  7:54                 ` Thomas Gleixner
@ 2021-05-18  1:37                   ` liaochang (A)
  0 siblings, 0 replies; 16+ messages in thread
From: liaochang (A) @ 2021-05-18  1:37 UTC (permalink / raw)
  To: Thomas Gleixner, xuyihang, Ming Lei
  Cc: Peter Xu, Christoph Hellwig, Jason Wang, Luiz Capitulino,
	Linux Kernel Mailing List, Michael S. Tsirkin, minlei

Thomas,

On 2021/5/10 15:54, Thomas Gleixner wrote:
> Liao,
> 
> On Mon, May 10 2021 at 11:19, liaochang wrote:
>> 1. We have a machine with 36 CPUs, and assign several RT threads to the
>> last two CPUs (CPU-34, CPU-35).
> 
> Which kind of machine? x86?
> 
>> 2. The I/O device driver creates a single managed irq, the affinity of
>> which includes CPU-34 and CPU-35.
> 
> If that driver creates only a single managed interrupt, then the
> possible affinity of that interrupt spans CPUs 0 - 35.
> 
> That's expected, but what is the effective affinity of that interrupt?
> 
> # cat /proc/irq/$N/effective_affinity
> 
> Also please provide the full output of
> 
> # cat /proc/interrupts
> 
> and point out which device we are talking about.

The mentioned managed irq is registered by the virtio-scsi driver over PCI
(on an x86 platform, a VM with 4 vCPUs), as shown below.

#lspci -vvv
...
00:04.0 SCSI storage controller: Virtio: Virtio SCSI
        Subsystem: Virtio: Device 0008
        Physical Slot: 4
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 11
        Region 0: I/O ports at c140 [size=64]
        Region 1: Memory at febd2000 (32-bit, non-prefetchable) [size=4K]
        Region 4: Memory at fe004000 (64-bit, prefetchable) [size=16K]
        Capabilities: [98] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=1 offset=00000000
                PBA: BAR=1 offset=00000800

#ls /sys/bus/pci/devices/0000:00:04.0/msi_irqs
33 34 35 36

#cat /proc/interrupts
...
 33:          0          0          0          0   PCI-MSI 65536-edge      virtio1-config
 34:          0          0          0          0   PCI-MSI 65537-edge      virtio1-control
 35:          0          0          0          0   PCI-MSI 65538-edge      virtio1-event
 36:      10637          0          0          0   PCI-MSI 65539-edge      virtio1-request

As you can see, virtio-scsi allocates four MSI-X interrupts, from 33 to 36, and the last one is supposed
to be triggered when virtqueue data is ready to receive; its interrupt handler then raises ksoftirqd
to process the I/O. If I assign a FIFO RT thread to CPU0, a simple I/O operation issued by the command
"dd if=/dev/zero of=/test.img bs=1K count=1 oflag=direct,sync" will never finish.

Although that's expected, do you think it is somewhat risky for Linux availability? In a cloud-based
environment, services from different teams may seriously affect each other because of a lack of
communication or of a good understanding of the infrastructure.

This problem arises when the RT thread and ksoftirqd are scheduled on the same CPU. Besides placing
the RT thread carefully, I also tried to set "rq_affinity" to 2, but the cost is a 10%~30% performance
degradation in some I/O benchmarks. So I wonder whether the affinity of managed irqs could be made
configurable from user space or via kernel bootargs? Thanks.

> 
> Thanks,
> 
>         tglx
> .
> 
BR,
Liao, Chang

