linux-kernel.vger.kernel.org archive mirror
From: Thomas Gleixner <tglx@linutronix.de>
To: Peter Xu <peterx@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>, Jason Wang <jasowang@redhat.com>,
	Luiz Capitulino <lcapitulino@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: Virtio-scsi multiqueue irq affinity
Date: Sat, 23 Mar 2019 18:15:59 +0100 (CET)
Message-ID: <alpine.DEB.2.21.1903231805310.1798@nanos.tec.linutronix.de>
In-Reply-To: <20190318062150.GC6654@xz-x1>

Peter,

On Mon, 18 Mar 2019, Peter Xu wrote:
> I noticed that starting from commit 0d9f0a52c8b9 ("virtio_scsi: use
> virtio IRQ affinity", 2017-02-27) the virtio-scsi driver uses a new
> way (via irq_create_affinity_masks()) to automatically initialize
> IRQ affinities for the multi-queues, which differs from all the other
> virtio devices (like virtio-net, which still uses
> virtqueue_set_affinity(), i.e. effectively irq_set_affinity_hint()).
> 
> Firstly, it will definitely break some userspace programs: scripts
> that want to set the bindings explicitly, as before, now simply fail
> with -EIO every time they echo to /proc/irq/N/smp_affinity of any of
> the multi-queues (see write_irq_affinity()).

Did it break anything? I did not see a report so far. Assumptions about
potential breakage are not really useful.
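
For completeness, the -EIO mentioned above comes from the managed-affinity
check in the procfs write handler. A paraphrased and simplified sketch, not
a verbatim copy of kernel/irq/proc.c / kernel/irq/manage.c:

/*
 * Paraphrased sketch: managed interrupts are excluded from userspace
 * affinity changes, so the write bails out with -EIO.
 */
static ssize_t write_irq_affinity(int type, struct file *file,
				  const char __user *buffer, size_t count,
				  loff_t *pos)
{
	unsigned int irq = (unsigned int)(long)PDE_DATA(file_inode(file));

	if (!irq_can_set_affinity_usr(irq) || no_irq_affinity)
		return -EIO;

	/* ... parse the cpumask from the buffer and apply it ... */
	return count;
}

bool irq_can_set_affinity_usr(unsigned int irq)
{
	struct irq_desc *desc = irq_to_desc(irq);

	/* Managed (IRQD_AFFINITY_MANAGED) interrupts are rejected here */
	return __irq_can_set_affinity(desc) &&
	       !irqd_affinity_is_managed(&desc->irq_data);
}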

> Is there any specific reason to do it the new way?  Since AFAIU we
> should still allow system admins to decide what to do for such
> configurations, e.g., what if we only want to provision half of the
> CPU resources to handle IRQs for a specific virtio-scsi controller?
> We won't be able to achieve that with the current policy.  Or could
> this be a question for the IRQ subsystem (irq_create_affinity_masks())
> in general?  Any special considerations behind the big picture?

That has nothing to do with the irq subsystem. That merely provides the
mechanisms.

The reason behind this is that multi-queue devices set up queues per CPU,
or, if not enough queues are available, per group of CPUs. So it does not
make sense to move the interrupt away from the CPU or the CPU group its
queue serves.
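
Drivers opt into that by handing an irq_affinity descriptor to the core
when allocating their vectors (virtio-scsi gets there via virtio's
find_vqs, IIRC). A rough sketch of the generic PCI-side variant; the
vector counts and the helper name are made up for illustration:

#include <linux/pci.h>
#include <linux/interrupt.h>

static int example_alloc_queue_vectors(struct pci_dev *pdev,
					unsigned int nr_queues)
{
	struct irq_affinity affd = {
		.pre_vectors = 1,	/* config/control vector, not spread */
	};
	int nvecs;

	/*
	 * PCI_IRQ_AFFINITY lets the core spread the remaining vectors over
	 * the CPUs (irq_create_affinity_masks()) and mark them managed,
	 * which is exactly why the procfs affinity write fails afterwards.
	 */
	nvecs = pci_alloc_irq_vectors_affinity(pdev, 2, nr_queues + 1,
					       PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					       &affd);
	return nvecs < 0 ? nvecs : 0;
}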

Aside from that, in the CPU hotunplug case interrupts used to be moved to
the still-online CPUs, which caused problems for e.g. hibernation, because
on large systems moving all interrupts to the boot CPU does not work due to
vector space exhaustion. CPU hotunplug is also used for power management
purposes, and there it does not make sense either to move the per-CPU
queues of the offlined CPUs to the still-online CPUs, which would then end
up servicing several queues each.

The new way to deal with this is to strictly bind the per-CPU (per CPU
group) queues. If the CPU, or the last CPU in the group, goes offline the
following happens (a rough sketch of the whole sequence follows the second
list below):

 1) The queue is disabled, i.e. no new requests can be queued

 2) Wait for the outstanding requests to complete

 3) Shut down the interrupt

 This avoids having multiple queues moved to the still-online CPUs and also
 prevents vector space exhaustion, because the shut-down interrupt does not
 have to be migrated.

When the CPU (or the first in the group) comes online again:

 1) Reenable the interrupt

 2) Reenable the queue
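
Purely as an illustration of the two sequences above; the type and helper
names below are made up, not the actual blk-mq/genirq functions:

/*
 * Illustration only: hypothetical helpers and types. The real
 * implementation lives in the blk-mq and genirq layers.
 */
static void queue_cpu_offline(struct hw_queue *q)
{
	quiesce_queue(q);		/* 1) no new requests can be queued */
	wait_for_inflight(q);		/* 2) drain outstanding requests    */
	disable_irq(q->irq);		/* 3) shut down the interrupt; no   */
					/*    migration to another CPU      */
}

static void queue_cpu_online(struct hw_queue *q)
{
	enable_irq(q->irq);		/* 1) reenable the interrupt */
	unquiesce_queue(q);		/* 2) reenable the queue     */
}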

Hope that helps.

Thanks,

	tglx

Thread overview: 16+ messages
2019-03-18  6:21 Virtio-scsi multiqueue irq affinity Peter Xu
2019-03-23 17:15 ` Thomas Gleixner [this message]
2019-03-25  5:02   ` Peter Xu
2019-03-25  7:06     ` Ming Lei
2019-03-25  8:53       ` Thomas Gleixner
2019-03-25  9:43         ` Peter Xu
2019-03-25 13:27           ` Thomas Gleixner
2019-03-25  9:50         ` Ming Lei
2021-05-08  7:52           ` xuyihang
2021-05-08 12:26             ` Thomas Gleixner
2021-05-10  3:19               ` liaochang (A)
2021-05-10  7:54                 ` Thomas Gleixner
2021-05-18  1:37                   ` liaochang (A)
2021-05-10  8:48               ` xuyihang
2021-05-10 19:56                 ` Thomas Gleixner
2021-05-11 12:38                   ` xuyihang
