From: Jean-Philippe Brucker <jean-philippe@linaro.org>
To: John Garry <john.garry@huawei.com>
Cc: Will Deacon <will@kernel.org>, Ming Lei <ming.lei@redhat.com>,
	iommu@lists.linux-foundation.org, Marc Zyngier <maz@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>
Subject: Re: arm-smmu-v3 high cpu usage for NVMe
Date: Thu, 19 Mar 2020 19:43:49 +0100
Message-ID: <20200319184349.GA1697676@myrica>
In-Reply-To: <c6ab8020-dc06-0c7d-7a41-e792d90f97ba@huawei.com>

On Thu, Mar 19, 2020 at 12:54:59PM +0000, John Garry wrote:
> Hi Will,
> 
> > 
> > On Thu, Jan 02, 2020 at 05:44:39PM +0000, John Garry wrote:
> > > And for the overall system, we have:
> > > 
> > >    PerfTop:   85864 irqs/sec  kernel:89.6%  exact:  0.0% lost: 0/34434 drop:
> > > 0/40116 [4000Hz cycles],  (all, 96 CPUs)
> > > --------------------------------------------------------------------------------------------------------------------------
> > > 
> > >      27.43%  [kernel]          [k] arm_smmu_cmdq_issue_cmdlist
> > >      11.71%  [kernel]          [k] _raw_spin_unlock_irqrestore
> > >       6.35%  [kernel]          [k] _raw_spin_unlock_irq
> > >       2.65%  [kernel]          [k] get_user_pages_fast
> > >       2.03%  [kernel]          [k] __slab_free
> > >       1.55%  [kernel]          [k] tick_nohz_idle_exit
> > >       1.47%  [kernel]          [k] arm_lpae_map
> > >       1.39%  [kernel]          [k] __fget
> > >       1.14%  [kernel]          [k] __lock_text_start
> > >       1.09%  [kernel]          [k] _raw_spin_lock
> > >       1.08%  [kernel]          [k] bio_release_pages.part.42
> > >       1.03%  [kernel]          [k] __sbitmap_get_word
> > >       0.97%  [kernel]          [k] arm_smmu_atc_inv_domain.constprop.42
> > >       0.91%  [kernel]          [k] fput_many
> > >       0.88%  [kernel]          [k] __arm_lpae_map
> > > 
> > > One thing to note is that we still spend an appreciable amount of time in
> > > arm_smmu_atc_inv_domain(), which is disappointing considering it should
> > > effectively be a no-op.
> > > 
> > > As for arm_smmu_cmdq_issue_cmdlist(), I do note that during the testing our
> > > batch size is 1, so we're not seeing the real benefit of the batching. I
> > > can't help but think that we could improve this code to try to combine CMD
> > > SYNCs for small batches.
> > > 
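To make the combining concrete: a producer that has just queued a small
batch could skip appending its own CMD_SYNC whenever a later producer
already has one queued behind it, since a CMD_SYNC completes everything
inserted before it, and simply wait on that one instead. Below is a toy
userspace model of that idea only; every name is made up and a plain
mutex stands in for the driver's lock-free queue, so treat it as a
sketch of the combining, not the cmdq code:

#include <pthread.h>
#include <stdio.h>

#define QUEUE_LEN	1024

enum cmd { CMD_INV, CMD_SYNC };

static enum cmd queue[QUEUE_LEN];
static int prod;
static int last_sync = -1;	/* index of the most recently queued SYNC */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Queue 'ncmds' commands; return the index of the SYNC covering them. */
static int issue_batch(int ncmds)
{
	int end, sync_idx;

	pthread_mutex_lock(&lock);
	for (int i = 0; i < ncmds; i++)
		queue[prod++] = CMD_INV;
	end = prod;
	pthread_mutex_unlock(&lock);

	/*
	 * Another thread may insert its own commands plus a SYNC here.
	 * A SYNC completes everything queued before it, so such a SYNC
	 * covers our batch too and we don't need one of our own.
	 */

	pthread_mutex_lock(&lock);
	if (last_sync >= end) {
		sync_idx = last_sync;		/* piggy-back on it */
	} else {
		sync_idx = prod;
		queue[prod++] = CMD_SYNC;
		last_sync = sync_idx;
	}
	pthread_mutex_unlock(&lock);

	return sync_idx;	/* caller polls this entry for completion */
}

static void *worker(void *unused)
{
	printf("batch of 1 covered by SYNC at slot %d\n", issue_batch(1));
	return unused;
}

int main(void)
{
	pthread_t threads[4];

	for (int i = 0; i < 4; i++)
		pthread_create(&threads[i], NULL, worker, NULL);
	for (int i = 0; i < 4; i++)
		pthread_join(threads[i], NULL);
	return 0;
}

The window between the two locked sections is where a neighbour's
CMD_SYNC can land and be reused; in the real lock-free queue that window
is the norm rather than the exception, which is what would make the idea
attractive at batch size 1.
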
> > > Anyway, let me know your thoughts or any questions. I'll have a look for
> > > other possible bottlenecks if I get a chance.
> > 
> > Did you ever get any more information on this? I don't have any SMMUv3
> > hardware any more, so I can't really dig into this myself.
> 
> I'm only getting back to look at this now, as SMMU performance is a bit of a
> hot topic again for us.
> 
> So one thing we are doing that seems to help performance is this series
> from Marc:
> 
> https://lore.kernel.org/lkml/9171c554-50d2-142b-96ae-1357952fce52@huawei.com/T/#mee5562d1efd6aaeb8d2682bdb6807fe7b5d7f56d
> 
> So that just spreads the per-CPU load of the NVMe interrupt handling
> (where the DMA unmapping happens); I'd say it really just side-steps any
> SMMU issue.
> 
> Going back to the SMMU, I wanted to run eBPF and perf annotate to help
> profile this, but was having no luck getting them to work properly. I'll
> look at this again now.

Could you also try the upcoming ATS changes currently in Will's tree?
They won't improve your numbers, but it'd be good to check that they
don't make things worse.

I've run a bunch of netperf instances on multiple cores and collected
SMMU usage (on a TaiShan 2280). I'm getting the following ratio pretty
consistently.

- 6.07% arm_smmu_iotlb_sync
   - 5.74% arm_smmu_tlb_inv_range
        5.09% arm_smmu_cmdq_issue_cmdlist
        0.28% __pi_memset
        0.08% __pi_memcpy
        0.08% arm_smmu_atc_inv_domain.constprop.37
        0.07% arm_smmu_cmdq_build_cmd
        0.01% arm_smmu_cmdq_batch_add
     0.31% __pi_memset

So arm_smmu_atc_inv_domain() takes about 1.4% of arm_smmu_iotlb_sync()
when ATS is not used. According to the annotations, the load from the
atomic_read() that checks whether the domain uses ATS accounts for 77%
of the samples in arm_smmu_atc_inv_domain() (265 of 345 samples), so I'm
not sure there is much room for optimization there.
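
(For reference, the check in question is the early-out at the top of
arm_smmu_atc_inv_domain(). From memory of the current driver, trimmed,
so treat this as a sketch rather than an exact quote:

static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
				   int ssid, unsigned long iova, size_t size)
{
	...
	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
		return 0;

	/*
	 * Order against a concurrent arm_smmu_enable_ats() so that a
	 * newly added ATS master is never missed. This atomic_read()
	 * is the load the annotation points at.
	 */
	smp_mb();
	if (!atomic_read(&smmu_domain->nr_ats_masters))
		return 0;
	...
}

That smp_mb() + atomic_read() pair is the cost of avoiding a lock on the
non-ATS invalidation path, so one hot load here is more or less the
intended trade-off.)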

Thanks,
Jean

