From: John Garry <john.garry@huawei.com>
To: Will Deacon <will@kernel.org>, <iommu@lists.linux-foundation.org>
Cc: Vijay Kilary <vkilari@codeaurora.org>,
	Jean-Philippe Brucker <jean-philippe.brucker@arm.com>,
	Jon Masters <jcm@redhat.com>, Jan Glauber <jglauber@marvell.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Jayachandran Chandrasekharan Nair <jnair@marvell.com>,
	Robin Murphy <robin.murphy@arm.com>
Subject: Re: [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue
Date: Wed, 24 Jul 2019 10:58:26 +0100	[thread overview]
Message-ID: <c8dcc53f-8afa-0966-dcfd-ca79b099893f@huawei.com> (raw)
In-Reply-To: <20190711171927.28803-1-will@kernel.org>

On 11/07/2019 18:19, Will Deacon wrote:
> Hi everyone,
>
> This is a significant rework of the RFC I previously posted here:
>
>   https://lkml.kernel.org/r/20190611134603.4253-1-will.deacon@arm.com
>
> But this time, it looks like it might actually be worthwhile according
> to my perf profiles, where __iommu_unmap() falls a long way down the
> profile for a multi-threaded netperf run. I'm still relying on others to
> confirm this is useful, however.
>
> Some of the changes since last time are:
>
>   * Support for constructing and submitting a list of commands in the
>     driver
>
>   * Numerous changes to the IOMMU and io-pgtable APIs so that we can
>     submit commands in batches
>
>   * Removal of cmpxchg() from cmdq_shared_lock() fast-path
>
>   * Code restructuring and cleanups
>
> This currently applies against my iommu/devel branch that Joerg has pulled
> for 5.3. If you want to test it out, I've put everything here:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq
>
> Feedback welcome. I appreciate that we're in the merge window, but I
> wanted to get this on the list for people to look at as an RFC.
>
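
Before getting to the numbers, this is roughly how I read the new unmap
path after the iommu_iotlb_gather changes. This is an approximate sketch
based on the patches rather than a copy of the final code, so treat the
exact names and signatures as my reading of the series:

	struct iommu_iotlb_gather gather;
	size_t unmapped;

	/* start = ULONG_MAX, i.e. nothing gathered yet */
	iommu_iotlb_gather_init(&gather);

	/*
	 * The driver's ->unmap() now records the IOVA range and page
	 * size into 'gather' via iommu_iotlb_gather_add_page() instead
	 * of issuing a TLBI plus CMD_SYNC per page.
	 */
	unmapped = __iommu_unmap(domain, iova, size, &gather);

	/*
	 * One deferred sync: for SMMUv3 this builds the list of TLBI
	 * commands and inserts them into the command queue as a single
	 * batch, followed by one CMD_SYNC.
	 */
	iommu_tlb_sync(domain, &gather);

So for an unmap-heavy workload the command queue should see one batched
insertion per unmap rather than one per page, which is where the
contention reduction should come from.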

I tested storage performance on this series, which I think is a better 
scenario to test than network performance, since the latter is generally 
limited by the network link speed.
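
For reference, the kind of fio job that drives this sort of IOPS load
looks something like the below; the device path and parameters are
purely illustrative, not necessarily what I ran:

	fio --name=randread --filename=/dev/nvme0n1 --direct=1 \
	    --rw=randread --bs=4k --ioengine=libaio --iodepth=128 \
	    --numjobs=8 --runtime=60 --time_based --group_reporting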

Results:

Baseline performance (will/iommu/devel, commit 9e6ea59f3):
  8x SAS disks (D05)   839K IOPS
  1x NVMe (D05)        454K IOPS
  1x NVMe (D06)        442K IOPS

Patchset performance (will/iommu/cmdq):
  8x SAS disks (D05)   835K IOPS
  1x NVMe (D05)        472K IOPS
  1x NVMe (D06)        459K IOPS

So we see a bit of an NVMe boost, but about the same result for the 8x 
SAS disks. With the IOMMU disabled, the 8x disk setup achieves about 
918K IOPS, so the test is not limited by the storage medium.

The D06 is somewhat memory-starved, which may account for its generally 
lower NVMe performance.

John

> Cheers,
>
> Will
>
> --->8
>
> Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
> Cc: Jan Glauber <jglauber@marvell.com>
> Cc: Jon Masters <jcm@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Zhen Lei <thunder.leizhen@huawei.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Vijay Kilary <vkilari@codeaurora.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
>
> Will Deacon (19):
>   iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops
>   iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync()
>   iommu/io-pgtable: Rename iommu_gather_ops to iommu_flush_ops
>   iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes
>   iommu: Introduce iommu_iotlb_gather_add_page()
>   iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync()
>   iommu/io-pgtable: Introduce tlb_flush_walk() and tlb_flush_leaf()
>   iommu/io-pgtable: Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in
>     drivers
>   iommu/io-pgtable-arm: Call ->tlb_flush_walk() and ->tlb_flush_leaf()
>   iommu/io-pgtable: Replace ->tlb_add_flush() with ->tlb_add_page()
>   iommu/io-pgtable: Remove unused ->tlb_sync() callback
>   iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->unmap()
>   iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->tlb_add_page()
>   iommu/arm-smmu-v3: Separate s/w and h/w views of prod and cons indexes
>   iommu/arm-smmu-v3: Drop unused 'q' argument from Q_OVF macro
>   iommu/arm-smmu-v3: Move low-level queue fields out of arm_smmu_queue
>   iommu/arm-smmu-v3: Operate directly on low-level queue where possible
>   iommu/arm-smmu-v3: Reduce contention during command-queue insertion
>   iommu/arm-smmu-v3: Defer TLB invalidation until ->iotlb_sync()
>
>  drivers/gpu/drm/panfrost/panfrost_mmu.c |  24 +-
>  drivers/iommu/amd_iommu.c               |  11 +-
>  drivers/iommu/arm-smmu-v3.c             | 856 ++++++++++++++++++++++++--------
>  drivers/iommu/arm-smmu.c                | 103 +++-
>  drivers/iommu/dma-iommu.c               |   9 +-
>  drivers/iommu/exynos-iommu.c            |   3 +-
>  drivers/iommu/intel-iommu.c             |   3 +-
>  drivers/iommu/io-pgtable-arm-v7s.c      |  57 +--
>  drivers/iommu/io-pgtable-arm.c          |  48 +-
>  drivers/iommu/iommu.c                   |  24 +-
>  drivers/iommu/ipmmu-vmsa.c              |  28 +-
>  drivers/iommu/msm_iommu.c               |  42 +-
>  drivers/iommu/mtk_iommu.c               |  45 +-
>  drivers/iommu/mtk_iommu_v1.c            |   3 +-
>  drivers/iommu/omap-iommu.c              |   2 +-
>  drivers/iommu/qcom_iommu.c              |  44 +-
>  drivers/iommu/rockchip-iommu.c          |   2 +-
>  drivers/iommu/s390-iommu.c              |   3 +-
>  drivers/iommu/tegra-gart.c              |  12 +-
>  drivers/iommu/tegra-smmu.c              |   2 +-
>  drivers/vfio/vfio_iommu_type1.c         |  27 +-
>  include/linux/io-pgtable.h              |  57 ++-
>  include/linux/iommu.h                   |  92 +++-
>  23 files changed, 1090 insertions(+), 407 deletions(-)
>



