From: Xingang Wang <wangxingang5@huawei.com>
To: Eric Auger <eric.auger@redhat.com>, <eric.auger.pro@gmail.com>,
	<jean-philippe@linaro.org>, <iommu@lists.linux-foundation.org>,
	<linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>,
	<kvmarm@lists.cs.columbia.edu>, <will@kernel.org>,
	<maz@kernel.org>, <robin.murphy@arm.com>, <joro@8bytes.org>,
	<alex.williamson@redhat.com>, <tn@semihalf.com>,
	<zhukeqian1@huawei.com>
Cc: <jacob.jun.pan@linux.intel.com>, <yi.l.liu@intel.com>,
	<zhangfei.gao@linaro.org>, <zhangfei.gao@gmail.com>,
	<vivek.gautam@arm.com>, <shameerali.kolothum.thodi@huawei.com>,
	<yuzenghui@huawei.com>, <nicoleotsuka@gmail.com>,
	<lushenming@huawei.com>, <vsethi@nvidia.com>,
	<chenxiang66@hisilicon.com>, <vdumpa@nvidia.com>,
	<jiangkunkun@huawei.com>
Subject: Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
Date: Wed, 14 Apr 2021 10:36:11 +0800	[thread overview]
Message-ID: <55930e46-0a45-0d43-b34e-432cf332b42c@huawei.com> (raw)
In-Reply-To: <20210411111228.14386-1-eric.auger@redhat.com>

Hi Eric, Jean-Philippe

On 2021/4/11 19:12, Eric Auger wrote:
> SMMUv3 Nested Stage Setup (IOMMU part)
> 
> This series brings the IOMMU part of HW nested paging support
> in the SMMUv3. The VFIO part is submitted separately.
> 
> This is based on Jean-Philippe's
> [PATCH v14 00/10] iommu: I/O page faults for SMMUv3
> https://www.spinics.net/lists/arm-kernel/msg886518.html
> (including the patches that were not pulled for 5.13)
> 
> The IOMMU API is extended to support 2 new API functionalities:
> 1) pass the guest stage 1 configuration
> 2) pass stage 1 MSI bindings
> 
> Then those capabilities get implemented in the SMMUv3 driver.
> 
> The virtualizer passes information through the VFIO user API,
> which cascades it to the iommu subsystem. This allows the guest
> to own stage 1 tables and context descriptors (so-called PASID
> table) while the host owns stage 2 tables and main configuration
> structures (STE).
> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> v5.12-rc6-jean-iopf-14-2stage-v15
> (including the VFIO part in its last version: v13)
> 

I am testing the performance of an accelerator with and without SVA/vSVA,
and found that there is a potential performance loss for SVA/vSVA.

I use a network and computing encryption device (SEC), and send a 1MB
request 10000 times.
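
(For reference, the throughput number is computed roughly as in the sketch
below; sec_send_request() is a hypothetical stand-in for the actual SEC
user-space submission call, so only the timing and MB/s arithmetic is meant
to be taken literally.)

#include <stddef.h>
#include <time.h>

#define REQ_SIZE  (1UL << 20)   /* 1MB per request */
#define NR_REQS   10000

/* Hypothetical placeholder for the real SEC submission/completion path. */
extern int sec_send_request(void *buf, size_t len);

static double measure_mb_per_sec(void *buf)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < NR_REQS; i++)
		sec_send_request(buf, REQ_SIZE);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double secs = (t1.tv_sec - t0.tv_sec) +
		      (t1.tv_nsec - t0.tv_nsec) / 1e9;

	/* total bytes / 2^20 / elapsed seconds = MB/s */
	return (double)NR_REQS * REQ_SIZE / (1UL << 20) / secs;
}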

I trigger the mm faults before sending the requests, so there should be no IOPF.
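
(A minimal sketch of that pre-faulting step, assuming the request buffer is
ordinary anonymous memory: touching every page once makes the mm populate the
PTEs up front, and since SVA shares the CPU page tables with the SMMU, the
device should then not trigger any I/O page fault. mlock() would be an
alternative.)

#include <stddef.h>
#include <unistd.h>

/* Write to every page of the buffer before it is filled with the request
 * payload, so all PTEs are populated and no IOPF is taken later. */
static void prefault(void *buf, size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	volatile char *p = buf;

	for (size_t off = 0; off < len; off += page)
		p[off] = 0;
}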

Here's what I got:

physical scenario:
performance:              SVA: 9MB/s        NOSVA: 9MB/s
tlb_miss:                 SVA: 302,651      NOSVA: 1,223
trans_table_walk_access:  SVA: 302,276      NOSVA: 1,237

VM scenario:
performance:              vSVA: 9MB/s       NOvSVA: 6MB/s   (about 30~40% loss)
tlb_miss:                 vSVA: 4,423,897   NOvSVA: 1,907
trans_table_walk_access:  vSVA: 61,928,430  NOvSVA: 21,948
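
(For reference, tlb_miss and trans_table_walk_access above are the SMMUv3
PMCG PMU events of the same name; they can be collected with something like
the command below, where <instance> stands for the platform-specific PMCG
name under /sys/bus/event_source/devices/ and ./sec_test is the hypothetical
test program:)

perf stat -a \
    -e smmuv3_pmcg_<instance>/tlb_miss/ \
    -e smmuv3_pmcg_<instance>/trans_table_walk_access/ \
    -- ./sec_test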

In the physical scenario, there is almost no performance loss, but the
stage 1 tlb_miss and trans_table_walk_access counts for SVA are quite
high compared to NOSVA.

In the VM scenario, there is about a 30~40% performance loss. This is
because the two-stage tlb_miss and trans_table_walk_access counts are
even higher and hurt performance.

I compared the page table building procedures of SVA and NOSVA, and
found that NOSVA uses 2MB mappings as far as possible, while SVA uses
only 4KB mappings.

I retested with huge pages, and huge pages solve this problem: the
performance of SVA/vSVA is almost the same as NOSVA.
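
(A minimal sketch of backing the request buffer with 2MB huge pages via
hugetlbfs, assuming enough huge pages have been reserved through
/proc/sys/vm/nr_hugepages; transparent huge pages via madvise(MADV_HUGEPAGE)
would be an alternative. With SVA the shared page tables then contain 2MB
block mappings, which is what brings tlb_miss/trans_table_walk_access back
down.)

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#define HUGE_2MB  (2UL * 1024 * 1024)

static void *alloc_huge(size_t len)
{
	/* Round the length up to a 2MB boundary and back it with hugetlb
	 * pages so the (shared) page tables can use block mappings. */
	size_t aligned = (len + HUGE_2MB - 1) & ~(HUGE_2MB - 1);
	void *buf = mmap(NULL, aligned, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap(MAP_HUGETLB)");
		return NULL;
	}
	return buf;
}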

I am wondering whether you have any other solution for the performance
loss of vSVA, or any other method to reduce the tlb_miss and
trans_table_walk_access counts.

Thanks

Xingang

.

Thread overview: 66+ messages
2021-04-11 11:12 [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
2021-04-11 11:12 ` [PATCH v15 01/12] iommu: Introduce attach/detach_pasid_table API Eric Auger
2021-04-11 11:12 ` [PATCH v15 02/12] iommu: Introduce bind/unbind_guest_msi Eric Auger
2021-04-11 11:12 ` [PATCH v15 03/12] iommu/smmuv3: Allow s1 and s2 configs to coexist Eric Auger
2021-04-11 11:12 ` [PATCH v15 04/12] iommu/smmuv3: Get prepared for nested stage support Eric Auger
2021-04-11 11:12 ` [PATCH v15 05/12] iommu/smmuv3: Implement attach/detach_pasid_table Eric Auger
2021-04-11 11:12 ` [PATCH v15 06/12] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs Eric Auger
2021-04-11 11:12 ` [PATCH v15 07/12] iommu/smmuv3: Implement cache_invalidate Eric Auger
2021-05-14  3:09   ` Kunkun Jiang
2021-05-21  6:48   ` Kunkun Jiang
2021-04-11 11:12 ` [PATCH v15 08/12] dma-iommu: Implement NESTED_MSI cookie Eric Auger
2021-04-11 11:12 ` [PATCH v15 09/12] iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement Eric Auger
2021-04-11 11:12 ` [PATCH v15 10/12] iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions Eric Auger
2021-04-11 11:12 ` [PATCH v15 11/12] iommu/smmuv3: Implement bind/unbind_guest_msi Eric Auger
2021-04-11 11:12 ` [PATCH v15 12/12] iommu/smmuv3: report additional recoverable faults Eric Auger
2021-04-14  2:36 ` [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part) Xingang Wang [this message]
2021-04-14  6:56   ` Shameerali Kolothum Thodi
2021-04-14  7:08     ` Xingang Wang
2021-04-21 11:28 ` Vivek Kumar Gautam
2021-04-23 18:21 ` Sumit Gupta
2021-09-27 21:17 ` Krishna Reddy
2021-09-28  6:25   ` Eric Auger
