Linux-NVME Archive on lore.kernel.org
From: Jens Axboe <axboe@fb.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>, Long Li <longli@microsoft.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Nadolski Edmund <edmund.nadolski@intel.com>,
	Keith Busch <kbusch@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH V3 0/2] nvme-pci: check CQ after batch submission for Microsoft device
Date: Fri, 22 Nov 2019 21:58:36 +0000
Message-ID: <9ef6c1da-99c5-14f8-edb7-af50c935ce76@fb.com>
In-Reply-To: <20191122214954.GB8700@ming.t460p>

On 11/22/19 2:49 PM, Ming Lei wrote:
> On Fri, Nov 22, 2019 at 02:04:52PM +0000, Jens Axboe wrote:
>> On 11/22/19 3:25 AM, Ming Lei wrote:
>>>> as that will still overload the one cpu that the interrupt handler was
>>>> assigned to.  A dumb fix would be a cpu mask for the threaded interrupt
>>>
>>> Actually one CPU is fast enough to handle several drives' interrupt
>>> handling. Also there is a per-queue depth limit, and the interrupt
>>> flood issue seen in networking can't be as serious for storage.
>>
>> This is true today, but it won't be true in the future. Let's aim for a
>> solution that's a little more future-proof than just "enough today", if
>> we're going to make changes in this area.
> 
> That should be a new feature for future hardware, and we don't know any
> of its performance details, so it is hard to prepare for it now. Maybe
> such hardware or such a case never comes:

Oh it'll surely come, and maybe sooner than you think. My point is that
using "one CPU is fast enough to handle several drive interrupts" is
very shortsighted, and probably not even true today.

> - A storage device has a queue depth, which limits the max number of
> in-flight requests to be handled in each queue's interrupt handler.

Only if new requests aren't also coming in and completing while you are
doing that work.
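
To make that concrete, here is a minimal, purely illustrative sketch of the
usual drain-until-empty completion loop that the queue-depth argument
implicitly relies on; all foo_* names are hypothetical placeholders, not
the driver's actual code:

#include <linux/types.h>

/* Hypothetical stand-in for a hardware completion queue. */
struct foo_cq {
	u16 head;
	/* ... phase bit, CQE ring, doorbell, ... */
};

/* Hypothetical helpers: "is a CQE waiting?" and "reap one CQE". */
static bool foo_cqe_pending(struct foo_cq *cq);
static void foo_reap_one(struct foo_cq *cq);

/*
 * Typical hardirq-context pattern: drain until empty.  The queue depth
 * bounds how many completions are outstanding at any instant, but not how
 * many this loop ends up processing: while it runs, other CPUs keep
 * submitting and the device keeps completing, so under enough load the
 * loop never observes an empty queue and this CPU never leaves hardirq
 * context.
 */
static void foo_drain_cq(struct foo_cq *cq)
{
	while (foo_cqe_pending(cq))
		foo_reap_one(cq);
}
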
> 
> - Suppose such fast hardware does come; it isn't reasonable for it
> to support an N:1 mapping (with a big N).

Very true, in fact that's already pretty damn dumb today...

> - Also, the IRQ matrix already balances the interrupt handling load; that
> is, most of the time one CPU is only responsible for handling one hw queue's
> interrupt. Even in Azure's case, where 8 CPUs are mapped to one hw queue,
> only a few CPUs are responsible for at most 2 hw queues.
> 
> So could we focus on now and fix the regression first?

As far as I could tell from the other message, it sounds like they both
have broken interrupt coalescing? That makes it harder to care, honestly...

But yes, I think we should do something about this. This really isn't a
new issue: if a core gets overloaded just doing completions from
interrupts, we should punt the work. NAPI has been doing that for ages,
and the block layer also used to have support for it, but nobody used it.
It would be a great idea to make a blk-mq-friendly version of that, with
the kinds of IOPS and latencies in mind that we see today and in the
coming years. I don't think hacking around this in the nvme driver is a
very good way to go about it.
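
For the record, here is a minimal sketch of what such a NAPI-style scheme
could look like, built on the kernel's irq_poll library (the descendant of
the old blk-iopoll code). The irq_poll_*() calls, container_of() and the
IRQ return codes are real kernel APIs; everything prefixed foo_ is a
hypothetical placeholder, not existing nvme or blk-mq code:

#include <linux/interrupt.h>
#include <linux/irq_poll.h>

#define FOO_POLL_BUDGET	64	/* max completions reaped per poll pass */

struct foo_queue {
	struct irq_poll	iop;
	/* ... CQ ring, doorbell, irq number, ... */
};

/* Hypothetical helpers supplied by the driver. */
static int foo_process_cq(struct foo_queue *q, int budget); /* returns #reaped */
static void foo_mask_irq(struct foo_queue *q);
static void foo_unmask_irq(struct foo_queue *q);

/* Poll callback: runs in softirq context, like a NAPI poll routine. */
static int foo_irqpoll(struct irq_poll *iop, int budget)
{
	struct foo_queue *q = container_of(iop, struct foo_queue, iop);
	int done = foo_process_cq(q, budget);

	if (done < budget) {
		/* Queue drained: stop polling and take interrupts again. */
		irq_poll_complete(iop);
		foo_unmask_irq(q);
	}
	return done;
}

/*
 * Hard interrupt handler: mask the source and kick the poller, so a busy
 * queue reaps at most 'budget' completions per poll pass instead of
 * monopolizing the CPU in hardirq context.
 */
static irqreturn_t foo_irq(int irq, void *data)
{
	struct foo_queue *q = data;

	foo_mask_irq(q);
	irq_poll_sched(&q->iop);
	return IRQ_HANDLED;
}

/* At queue init time:
 *	irq_poll_init(&q->iop, FOO_POLL_BUDGET, foo_irqpoll);
 */

A blk-mq friendly version would presumably hang something like this off
each hctx and only switch to it when the interrupt handler detects
sustained overload, so the common lightly-loaded case keeps completing
directly from the interrupt.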

-- 
Jens Axboe


Thread overview: 16+ messages
2019-11-14  2:59 Ming Lei
2019-11-14  2:59 ` [PATCH V3 1/2] nvme-pci: move sq/cq_poll lock initialization into nvme_init_queue Ming Lei
2019-11-14  2:59 ` [PATCH V3 2/2] nvme-pci: check CQ after batch submission for Microsoft device Ming Lei
2019-11-14  4:56   ` Keith Busch
2019-11-14  8:56     ` Ming Lei
2019-11-21  3:11 ` [PATCH V3 0/2] " Ming Lei
2019-11-21  6:14   ` Christoph Hellwig
2019-11-21  7:46     ` Ming Lei
2019-11-21 15:45       ` Keith Busch
2019-11-22  9:44         ` Ming Lei
2019-11-22  9:57           ` Christoph Hellwig
2019-11-22 10:25             ` Ming Lei
2019-11-22 14:04               ` Jens Axboe
2019-11-22 21:49                 ` Ming Lei
2019-11-22 21:58                   ` Jens Axboe [this message]
2019-11-22 22:30                     ` Ming Lei
