LKML Archive on lore.kernel.org
From: Ming Lei <ming.lei@redhat.com>
To: John Garry <john.garry@huawei.com>
Cc: Marc Zyngier <maz@kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"chenxiang (M)" <chenxiang66@hisilicon.com>,
	"bigeasy@linutronix.de" <bigeasy@linutronix.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"hare@suse.com" <hare@suse.com>, "hch@lst.de" <hch@lst.de>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"bvanassche@acm.org" <bvanassche@acm.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"mingo@redhat.com" <mingo@redhat.com>,
	Zhang Yi <yi.zhang@redhat.com>
Subject: Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt
Date: Fri, 3 Jan 2020 19:29:08 +0800
Message-ID: <20200103112908.GA20353@ming.t460p> (raw)
In-Reply-To: <2b070d25-ee35-aa1f-3254-d086c6b872b1@huawei.com>

On Fri, Jan 03, 2020 at 10:41:48AM +0000, John Garry wrote:
> On 03/01/2020 00:46, Ming Lei wrote:
> > > > > d the
> > > > > DMA API more than an architecture-specific problem.
> > > > > 
> > > > > Given that we have so far very little data, I'd hold off any conclusion.
> > > > We can start to collect latency data of dma unmapping vs nvme_irq()
> > > > on both x86 and arm64.
> > > > 
> > > > I will see if I can get a such box for collecting the latency data.
> > > To reiterate what I mentioned before about IOMMU DMA unmap on x86, a key
> > > difference is that by default it uses non-strict (lazy) mode unmap, i.e.
> > > we unmap in batches. ARM64 uses the general default, which is strict mode,
> > > i.e. every unmap results in an IOTLB flush.
> > > 
> > > In my setup, if I switch to lazy unmap (set iommu.strict=0 on the
> > > cmdline), then there is no lockup.
> > > 
> > > Are any special IOMMU setups being used for x86, like enabling strict mode?
> > > I don't know...
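The strict-vs-lazy distinction above can be illustrated with a toy model (not kernel code; the function name, batch size, and flush accounting are all hypothetical, chosen only to show why batching reduces IOTLB flush pressure):

```python
# Toy model (not kernel code): compare IOTLB flush counts for
# strict mode (flush on every unmap) vs lazy mode (flush once per
# full batch of deferred unmaps). Batch size is a made-up number.

def iotlb_flushes(num_unmaps, strict, batch_size=256):
    """Return how many IOTLB flushes are issued for num_unmaps unmaps."""
    if strict:
        # Strict mode: every unmap triggers a flush before returning.
        return num_unmaps
    # Lazy mode: a flush happens only when a batch of deferred
    # unmaps fills up (plus one for a final partial batch).
    full, partial = divmod(num_unmaps, batch_size)
    return full + (1 if partial else 0)

# 1M unmaps: strict issues 1M flushes, lazy only ~4K.
print(iotlb_flushes(1_000_000, strict=True))   # 1000000
print(iotlb_flushes(1_000_000, strict=False))  # 3907
```

This is only meant to show the order-of-magnitude gap in flush traffic; the real kernel batching is driven by IOVA ranges and timers, not a fixed count.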
> > BTW, I have run the test on one 224-core ARM64 machine with one
> > 32-hw_queue NVMe drive, and the soft lockup issue can be triggered within
> > one minute.
> > 
> > nvme_irq() often takes ~5us to complete on this machine, so there is a
> > real risk of CPU lockup when IOPS is > 200K.
> 
> Do you have a typical nvme_irq() completion time for a mid-range x86 server?

~1us.

It was measured with a bcc script, and eBPF itself may introduce some overhead.
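The lockup risk behind the ~5us vs ~1us numbers above comes down to simple arithmetic; here is a back-of-envelope sketch (the function name is made up, the figures are the ones quoted in this thread):

```python
# Back-of-envelope check using the numbers quoted in this thread:
# the fraction of one CPU consumed by completion handling when all
# of a queue's interrupts land on a single core.

def irq_cpu_fraction(iops, handler_us):
    """CPU fraction spent in the hard-IRQ handler on one core."""
    return iops * handler_us / 1_000_000  # us of handler time per second

# ARM64 box: ~5us per nvme_irq() -> the core saturates at 200K IOPS.
print(irq_cpu_fraction(200_000, 5))  # 1.0 (100% of the core)

# x86 box: ~1us per nvme_irq() -> the same load uses only ~20%.
print(irq_cpu_fraction(200_000, 1))  # 0.2
```

At 100% the core never leaves interrupt context, which is exactly the soft-lockup condition the watchdog reports.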

> 
> > 
> > The soft lockup can be triggered too if 'iommu.strict=0' is passed on the
> > kernel command line; it just takes a bit longer, by starting more IO jobs.
> > 
> > In the above test, I submit IO to one single NVMe drive from 4 CPU cores
> > via 8 or 12 jobs (iommu.strict=0), while forcing the nvme interrupt to be
> > handled on one dedicated CPU core.
> 
> Well, a problem with so many CPUs is that it does not scale (well) with MQ
> devices, like NVMe.
> 
> As the CPU count goes up, the device queue count doesn't, and we get more
> contention.

The problem is worse on ARM64 systems, which have more CPU cores, and
each single CPU core is often slower than an x86 core. Meanwhile, each
hardware interrupt has to be handled on a single target CPU.

Also, the storage device (such as NVMe) itself should be the same for
both from a performance viewpoint.

> 
> > 
> > Is there lock contention between the iommu dma map and unmap callbacks?
> 
> There would be the IOVA management, but that would be common with x86. Each
> CPU keeps an IOVA cache, and there is a central pool of cached IOVAs, so
> that reduces any contention, unless the caches are exhausted.
> 
> I think most of the contention/bottleneck is at the SMMU HW interface,
> which has a single queue interface.
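The per-CPU-cache-plus-central-pool scheme described above can be sketched as a toy model (class and attribute names are all hypothetical, and this deliberately ignores locking, magazine sizes, and range allocation in the real IOVA allocator):

```python
# Toy model (hypothetical names, not the kernel's IOVA allocator):
# per-CPU caches backed by a shared central depot, so the contended
# shared pool is touched only when a local cache misses or overflows.

class IovaCache:
    def __init__(self, ncpus, percpu_size=4):
        self.percpu = [[] for _ in range(ncpus)]  # uncontended fast path
        self.depot = list(range(1000, 2000))      # shared pool (would be locked)
        self.percpu_size = percpu_size
        self.depot_hits = 0                       # count contended allocations

    def alloc(self, cpu):
        if self.percpu[cpu]:
            return self.percpu[cpu].pop()         # hit: no shared state touched
        self.depot_hits += 1
        return self.depot.pop()                   # miss: go to the central pool

    def free(self, cpu, iova):
        if len(self.percpu[cpu]) < self.percpu_size:
            self.percpu[cpu].append(iova)         # keep it cached locally
        else:
            self.depot.append(iova)               # spill to the central pool

cache = IovaCache(ncpus=4)
# Steady-state alloc/free on one CPU stays in the per-CPU cache
# once the cold first allocation has warmed it.
a = cache.alloc(0); cache.free(0, a)
b = cache.alloc(0); cache.free(0, b)
print(cache.depot_hits)  # 1 -> only the first, cold alloc hit the depot
```

The point John makes follows from the model: contention on the shared pool only shows up when the per-CPU caches are exhausted, e.g. when frees and allocs land on different CPUs.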

I am not sure it is related to the single queue interface, given that my
test uses just a single hw queue, with several CPU cores submitting IO
and the single queue's interrupt handled on one dedicated CPU core.
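For reference, the serialization effect John suspects can be sketched with a toy model (hypothetical functions and numbers; a real SMMU command queue overlaps work in ways this ignores):

```python
# Toy model: time for N CPUs to each issue `cmds` invalidation
# commands through one serialized HW command queue, versus an
# idealized design with independent per-CPU queues.

def single_queue_time(ncpus, cmds, cost_us):
    # Every command serializes through the one HW interface,
    # so total time grows with the number of submitting CPUs.
    return ncpus * cmds * cost_us

def per_cpu_queue_time(ncpus, cmds, cost_us):
    # Idealized: queues drain fully in parallel, time stays flat.
    return cmds * cost_us

print(single_queue_time(4, 1000, 1))   # 4000 (us): scales with CPU count
print(per_cpu_queue_time(4, 1000, 1))  # 1000 (us): independent of CPU count
```

If the single queue were the bottleneck, throughput per submitter would drop as more CPUs submit; the test described above (several submitting cores, one hw queue) did not clearly show that, hence the doubt.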


Thanks,
Ming


Thread overview: 57+ messages
2019-12-06 14:35 [PATCH RFC 0/1] Threaded handler uses irq affinity for when the interrupt is managed John Garry
2019-12-06 14:35 ` [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt John Garry
2019-12-06 15:22   ` Marc Zyngier
2019-12-06 16:16     ` John Garry
2019-12-07  8:03   ` Ming Lei
2019-12-09 14:30     ` John Garry
2019-12-09 15:09       ` Hannes Reinecke
2019-12-09 15:17         ` Marc Zyngier
2019-12-09 15:25           ` Hannes Reinecke
2019-12-09 15:36             ` Marc Zyngier
2019-12-09 15:49           ` Qais Yousef
2019-12-09 15:55             ` Marc Zyngier
2019-12-10  1:43       ` Ming Lei
2019-12-10  9:45         ` John Garry
2019-12-10 10:06           ` Ming Lei
2019-12-10 10:28           ` Marc Zyngier
2019-12-10 10:59             ` John Garry
2019-12-10 11:36               ` Marc Zyngier
2019-12-10 12:05                 ` John Garry
2019-12-10 18:32                   ` Marc Zyngier
2019-12-11  9:41                     ` John Garry
2019-12-13 10:07                       ` John Garry
2019-12-13 10:31                         ` Marc Zyngier
2019-12-13 12:08                           ` John Garry
2019-12-14 10:59                             ` Marc Zyngier
2019-12-11 17:09         ` John Garry
2019-12-12 22:38           ` Ming Lei
2019-12-13 11:12             ` John Garry
2019-12-13 13:18               ` Ming Lei
2019-12-13 15:43                 ` John Garry
2019-12-13 17:12                   ` Ming Lei
2019-12-13 17:50                     ` John Garry
2019-12-14 13:56                   ` Marc Zyngier
2019-12-16 10:47                     ` John Garry
2019-12-16 11:40                       ` Marc Zyngier
2019-12-16 14:17                         ` John Garry
2019-12-16 18:00                           ` Marc Zyngier
2019-12-16 18:50                             ` John Garry
2019-12-20 11:30                               ` John Garry
2019-12-20 14:43                                 ` Marc Zyngier
2019-12-20 15:38                                   ` John Garry
2019-12-20 16:16                                     ` Marc Zyngier
2019-12-20 23:31                                     ` Ming Lei
2019-12-23  9:07                                       ` Marc Zyngier
2019-12-23 10:26                                         ` John Garry
2019-12-23 10:47                                           ` Marc Zyngier
2019-12-23 11:35                                             ` John Garry
2019-12-24  1:59                                             ` Ming Lei
2019-12-24 11:20                                               ` Marc Zyngier
2019-12-25  0:48                                                 ` Ming Lei
2020-01-02 10:35                                                   ` John Garry
2020-01-03  0:46                                                     ` Ming Lei
2020-01-03 10:41                                                       ` John Garry
2020-01-03 11:29                                                         ` Ming Lei [this message]
2020-01-03 11:50                                                           ` John Garry
2020-01-04 12:03                                                             ` Ming Lei
2020-05-30  7:46 ` [tip: irq/core] irqchip/gic-v3-its: Balance initial LPI affinity across CPUs tip-bot2 for Marc Zyngier
