Linux-NVME Archive on lore.kernel.org
From: Keith Busch <kbusch@kernel.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: sagi@grimberg.me, linux-nvme@lists.infradead.org,
	ming.lei@redhat.com, helgaas@kernel.org, tglx@linutronix.de,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCHv2 1/2] PCI/MSI: Export __pci_msix_desc_mask_irq
Date: Sat, 7 Dec 2019 06:18:21 +0900
Message-ID: <20191206211821.GA1709@redsun51.ssa.fujisawa.hgst.com>
In-Reply-To: <20191203090454.ftfu6pyz2ubxg7fk@linutronix.de>

On Tue, Dec 03, 2019 at 10:04:54AM +0100, Sebastian Andrzej Siewior wrote:
> On 2019-12-02 23:46:03 [+0100], Christoph Hellwig wrote:
> > On Tue, Dec 03, 2019 at 07:20:57AM +0900, Keith Busch wrote:
> > > Export the fast msix mask for drivers to use when the read-back is
> > > undesired.
> > 
> > As said last time calling this seems wrong as it breaks the irq_chip
> > abstraction.  But looking at the disable_irq_nosync semantics I think
> > that function should do a non-posted disable for MSI(-X)
> > interrupts.  Can you look into that?
> 
> Using disable_irq_nosync() would be the same as using IRQF_ONESHOT which
> is the preferred way.
> Keith complained that this is slow and that avoiding the read-back is
> noticeable.
> 
> The generic way would be pci_msi_mask_irq() and the difference
> 
> |                 msix_mask_irq(desc, flag);
> |                 readl(desc->mask_base);         /* Flush write to device */
> 
> would be that flush.

Right, so the solution should simply be to remove the readl(). I'm pretty
sure that's safe to do: if mask_irq() returns before the device happens
to see the mask is set and it generates an unwanted interrupt, the irq
flow handler will observe irqd_irq_disabled() and return early. We have
to handle that case anyway because the device may have sent an interrupt
message at the same time the CPU was masking it; the two races look
identical from the CPU's perspective.
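
For reference, the generic flow handler already covers exactly that
race. Roughly, paraphrasing handle_edge_irq() in kernel/irq/chip.c (not
the literal code):

    /*
     * Paraphrase of handle_edge_irq(): an interrupt message that
     * raced with the mask write finds the irq marked disabled, gets
     * recorded as pending, and never reaches the driver's handler.
     */
    raw_spin_lock(&desc->lock);
    if (irqd_irq_disabled(&desc->irq_data) || !desc->action) {
            desc->istate |= IRQS_PENDING;
            mask_ack_irq(desc);
            raw_spin_unlock(&desc->lock);
            return;
    }
    /* ... otherwise ack and run the action as usual ... */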

I can't be completely sure it's safe for everyone, though, so I'll try
to quantify the impact of the read-back on nvme with some real hardware,
because I'm starting to wonder whether this is really as important as I
initially thought.
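
Something like this quick-and-dirty timing (illustrative only, reusing
msix_mask_irq() and the mask_base flush from the hunk quoted above)
shows the cost of the flush in isolation:

    /* Illustrative sketch: time the mask write plus the readl() that
     * forces it out to the device, i.e. the part we're debating removing. */
    u64 t0 = ktime_get_ns();
    msix_mask_irq(desc, 1);
    readl(desc->mask_base);         /* flush write to device */
    pr_debug("msix mask+flush: %llu ns\n", ktime_get_ns() - t0);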

If we do two readl()'s per IO, that's pretty noticeable. It looks like
it adds about 1usec to the completion latency (plus or minus; it depends
on the platform and whether PCIe switches are involved).

But we certainly don't need this for each IO. For low-depth workloads,
we can just handle all completions in the primary handler and never mask
interrupts.

If there are lots of completions that are better handled in
nvme_irq_thread(), we can call disable_irq_nosync(). That doesn't
immediately mask the MSI-X vector because we didn't set
IRQ_DISABLE_UNLAZY: the mask won't happen until we see another
interrupt, which means the thread is already handling a lot of
completions. The ratio of MSI-X maskings to commands processed would be
quite low, so as far as I can tell the overhead is negligible. Something
like the sketch below.
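
Hypothetical names, not the actual nvme handler, just the shape of the
idea:

    /*
     * Hypothetical sketch: reap completions in hard irq context and
     * only defer to the threaded handler when there's a real backlog.
     * example_queue, example_reap_completions() and the budget are
     * made up for illustration.
     */
    static irqreturn_t example_irq(int irq, void *data)
    {
            struct example_queue *q = data;

            if (example_reap_completions(q) < EXAMPLE_THREAD_BUDGET)
                    return IRQ_HANDLED;     /* low depth: done inline */

            /*
             * Lazy disable: no MSI-X mask write happens here since
             * IRQ_DISABLE_UNLAZY isn't set; the vector is masked only
             * if another interrupt fires while the thread runs.
             */
            disable_irq_nosync(irq);
            return IRQ_WAKE_THREAD;         /* thread re-enables when done */
    }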



