From: "chenxiang (M)" <chenxiang66@hisilicon.com>
To: Lu Baolu <baolu.lu@linux.intel.com>,
	Georgi Djakov <djakov@kernel.org>,
	Georgi Djakov <quic_c_gdjako@quicinc.com>, <will@kernel.org>,
	<robin.murphy@arm.com>
Cc: <isaacm@codeaurora.org>, <linux-kernel@vger.kernel.org>,
	<iommu@lists.linux-foundation.org>, <pratikp@codeaurora.org>,
	<linux-arm-kernel@lists.infradead.org>,
	"linuxarm@huawei.com" <linuxarm@huawei.com>
Subject: Re: [PATCH v7 00/15] Optimizing iommu_[map/unmap] performance
Date: Thu, 15 Jul 2021 09:51:13 +0800	[thread overview]
Message-ID: <edfff22f-251d-e07d-fdbc-0eb00c821f15@hisilicon.com> (raw)
In-Reply-To: <4d466ea9-2c1a-2e19-af5b-0434441ee7cb@linux.intel.com>



On 2021/7/15 9:23, Lu Baolu wrote:
> On 7/14/21 10:24 PM, Georgi Djakov wrote:
>> On 16.06.21 16:38, Georgi Djakov wrote:
>>> When unmapping a buffer from an IOMMU domain, the IOMMU framework
>>> unmaps the buffer at a granule of the largest page size that is
>>> supported by the IOMMU hardware and fits within the buffer. For every
>>> block that is unmapped, the IOMMU framework calls into the IOMMU
>>> driver, and then into the io-pgtable framework, which walks the page
>>> tables to find the entry that corresponds to the IOVA and then unmaps
>>> it.
>>>
>>> This can be suboptimal in scenarios where a buffer, or a piece of a
>>> buffer, can be split into several contiguous page blocks of the same
>>> size. For example, consider an IOMMU that supports 4 KB, 2 MB, and
>>> 1 GB page blocks, and a 4 MB buffer being unmapped at IOVA 0. The
>>> current call flow results in 4 indirect calls and 2 page table walks
>>> to unmap 2 entries that sit next to each other in the page tables,
>>> when both entries could have been unmapped in one shot by clearing
>>> both page table entries in the same call.
>>>
>>> The same optimization is applicable to mapping buffers as well, so
>>> these patches add a set of callbacks, unmap_pages and map_pages, to
>>> the io-pgtable code and IOMMU drivers. They unmap or map an IOVA
>>> range that consists of a number of pages of the same IOMMU-supported
>>> page size, allowing multiple page table entries to be manipulated
>>> within the same set of indirect calls. The reason for introducing
>>> these callbacks is to give other IOMMU drivers and io-pgtable formats
>>> time to switch over, so that the transition to the new approach can
>>> be done piecemeal.
>>
>> Hi Will,
>>
>> Did you get a chance to look at this patchset? Most patches are already
>> acked/reviewed, and the whole series still applies cleanly on rc1.
>
> I also have the ops->[un]map_pages implementation for the Intel IOMMU
> driver. I will post them once the iommu/core part gets applied.

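(To make the cover letter's 4 MB example concrete, using my own arithmetic:
with a 4 KB/2 MB/1 GB page-size bitmap, a 4 MB buffer at IOVA 0 is covered
by two 2 MB entries. Today that costs two full iommu core -> driver ->
io-pgtable call chains, i.e. 4 indirect calls and 2 table walks; with the
new ops a single call such as unmap_pages(..., iova, SZ_2M, 2, ...) can
clear both entries together.)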
I have also implemented these callbacks for ARM SMMUv3 on top of this
series and used dma_map_benchmark to measure map/unmap latency, with
invocations like the sketch below. As the results show, the optimization
improves map/unmap latency considerably. I plan to post the ARM SMMUv3
implementation once this series is applied.
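For reference, the numbers were gathered with invocations along these
lines (a sketch only: the -s duration and the device bound to the
dma_map_benchmark driver are assumptions on my side; -t is the thread
count and -g the granule in 4K pages, matching the tables):

    # 1 thread, 64K per map (g=16), run for 20 seconds
    ./dma_map_benchmark -t 1 -g 16 -s 20
    # 10 threads, 1M per map (g=256)
    ./dma_map_benchmark -t 10 -g 256 -s 20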

t = 1 (1 thread); each cell is map/unmap latency:
                    before opt (us)   after opt (us)
g=1(4K size)        0.1/1.3          0.1/0.8
g=2(8K size)        0.2/1.5          0.2/0.9
g=4(16K size)       0.3/1.9          0.1/1.1
g=8(32K size)       0.5/2.7          0.2/1.4
g=16(64K size)      1.0/4.5          0.2/2.0
g=32(128K size)     1.8/7.9          0.2/3.3
g=64(256K size)     3.7/14.8         0.4/6.1
g=128(512K size)    7.1/14.7         0.5/10.4
g=256(1M size)      14.0/53.9        0.8/19.3
g=512(2M size)      0.2/0.9          0.2/0.9
g=1024(4M size)     0.5/1.5          0.4/1.0

t = 10 (10 threads); each cell is map/unmap latency:
                    before opt (us)   after opt (us)
g=1(4K size)        0.3/7.0          0.1/5.8
g=2(8K size)        0.4/6.7          0.3/6.0
g=4(16K size)       0.5/6.3          0.3/5.6
g=8(32K size)       0.5/8.3          0.2/6.3
g=16(64K size)      1.0/17.3         0.3/12.4
g=32(128K size)     1.8/36.0         0.2/24.2
g=64(256K size)     4.3/67.2         1.2/46.4
g=128(512K size)    7.8/93.7         1.3/94.2
g=256(1M size)      14.7/280.8       1.8/191.5
g=512(2M size)      3.6/3.2          1.5/2.5
g=1024(4M size)     2.0/3.1          1.8/2.6

(Latencies drop sharply at g=512 and above, presumably because 2M-aligned
ranges are mapped with single 2MB block entries, so only a handful of PTEs
need touching with or without the optimization.)
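For anyone who has not read the series: the callbacks in question look
roughly like this on the io-pgtable side (paraphrased from patches 01
and 03; treat the exact signatures as approximate):

    /* struct io_pgtable_ops gains batched variants of map/unmap:
     * pgsize is one page size from the format's supported bitmap,
     * pgcount is how many consecutive entries of that size to touch.
     */
    int (*map_pages)(struct io_pgtable_ops *ops, unsigned long iova,
                     phys_addr_t paddr, size_t pgsize, size_t pgcount,
                     int prot, gfp_t gfp, size_t *mapped);
    size_t (*unmap_pages)(struct io_pgtable_ops *ops, unsigned long iova,
                          size_t pgsize, size_t pgcount,
                          struct iommu_iotlb_gather *gather);

The iommu_ops callbacks in patches 02 and 04 mirror these, taking a
struct iommu_domain instead of the io-pgtable ops.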


>
> Best regards,
> baolu



Thread overview (75+ messages; cross-post duplicates collapsed):
2021-06-16 13:38 [PATCH v7 00/15] Optimizing iommu_[map/unmap] performance Georgi Djakov
2021-06-16 13:38 ` [PATCH v7 01/15] iommu/io-pgtable: Introduce unmap_pages() as a page table op Georgi Djakov
2021-06-16 13:38 ` [PATCH v7 02/15] iommu: Add an unmap_pages() op for IOMMU drivers Georgi Djakov
2021-06-16 13:38 ` [PATCH v7 03/15] iommu/io-pgtable: Introduce map_pages() as a page table op Georgi Djakov
2021-06-16 13:38 ` [PATCH v7 04/15] iommu: Add a map_pages() op for IOMMU drivers Georgi Djakov
2021-06-16 13:38 ` [PATCH v7 05/15] iommu: Use bitmap to calculate page size in iommu_pgsize() Georgi Djakov
2021-06-17  7:16   ` Lu Baolu
2021-06-16 13:38 ` [PATCH v7 06/15] iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts Georgi Djakov
2021-06-17  7:17   ` Lu Baolu
2021-06-16 13:38 ` [PATCH v7 07/15] iommu: Hook up '->unmap_pages' driver callback Georgi Djakov
2021-06-17  7:18   ` Lu Baolu
2021-06-16 13:38 ` [PATCH v7 08/15] iommu: Add support for the map_pages() callback Georgi Djakov
2021-06-17  7:18   ` Lu Baolu
2021-06-16 13:38 ` [PATCH v7 09/15] iommu/io-pgtable-arm: Prepare PTE methods for handling multiple entries Georgi Djakov
2021-06-16 13:38 ` [PATCH v7 10/15] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages() Georgi Djakov
2021-07-15  9:31   ` Kunkun Jiang
2021-06-16 13:38 ` [PATCH v7 11/15] iommu/io-pgtable-arm: Implement arm_lpae_map_pages() Georgi Djakov
2021-06-16 13:38 ` [PATCH v7 12/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages() Georgi Djakov
2021-06-16 13:38 ` [PATCH v7 13/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_map_pages() Georgi Djakov
2021-06-16 13:38 ` [PATCH v7 14/15] iommu/arm-smmu: Implement the unmap_pages() IOMMU driver callback Georgi Djakov
2021-06-16 13:38 ` [PATCH v7 15/15] iommu/arm-smmu: Implement the map_pages() " Georgi Djakov
2021-07-14 14:24 ` [PATCH v7 00/15] Optimizing iommu_[map/unmap] performance Georgi Djakov
2021-07-15  1:23   ` Lu Baolu
2021-07-15  1:51     ` chenxiang (M) [this message]
2021-07-26 10:37 ` Joerg Roedel
