linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] iommu: reduce spinlock contention on fast path
@ 2019-11-21  0:13 Cong Wang
  2019-11-21  0:13 ` [PATCH 1/3] iommu: match the original algorithm Cong Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Cong Wang @ 2019-11-21  0:13 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, joro, Cong Wang

This patchset contains three small optimizations for the global spinlock
contention in IOVA cache. Our memcache perf test shows this reduced its
p999 latency down by 45% on AMD when IOMMU is enabled.

Cong Wang (3):
  iommu: match the original algorithm
  iommu: optimize iova_magazine_free_pfns()
  iommu: avoid taking iova_rbtree_lock twice

---
 drivers/iommu/iova.c | 44 +++++++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 19 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/3] iommu: match the original algorithm
  2019-11-21  0:13 [PATCH 0/3] iommu: reduce spinlock contention on fast path Cong Wang
@ 2019-11-21  0:13 ` Cong Wang
  2019-11-27 18:00   ` John Garry
  2019-11-21  0:13 ` [PATCH 2/3] iommu: optimize iova_magazine_free_pfns() Cong Wang
  2019-11-21  0:13 ` [PATCH 3/3] iommu: avoid taking iova_rbtree_lock twice Cong Wang
  2 siblings, 1 reply; 7+ messages in thread
From: Cong Wang @ 2019-11-21  0:13 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, joro, Cong Wang

The IOVA cache algorithm implemented in IOMMU code does not
exactly match the original algorithm described in the paper.

Particularly, it doesn't need to free the loaded empty magazine
when trying to put it back to global depot.

This patch makes it exactly match the original algorithm.

Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 drivers/iommu/iova.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 41c605b0058f..92f72a85e62a 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -900,7 +900,7 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 
 	if (!iova_magazine_full(cpu_rcache->loaded)) {
 		can_insert = true;
-	} else if (!iova_magazine_full(cpu_rcache->prev)) {
+	} else if (iova_magazine_empty(cpu_rcache->prev)) {
 		swap(cpu_rcache->prev, cpu_rcache->loaded);
 		can_insert = true;
 	} else {
@@ -909,8 +909,9 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 		if (new_mag) {
 			spin_lock(&rcache->lock);
 			if (rcache->depot_size < MAX_GLOBAL_MAGS) {
-				rcache->depot[rcache->depot_size++] =
-						cpu_rcache->loaded;
+				swap(rcache->depot[rcache->depot_size], cpu_rcache->prev);
+				swap(cpu_rcache->prev, cpu_rcache->loaded);
+				rcache->depot_size++;
 			} else {
 				mag_to_free = cpu_rcache->loaded;
 			}
@@ -963,14 +964,15 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
 
 	if (!iova_magazine_empty(cpu_rcache->loaded)) {
 		has_pfn = true;
-	} else if (!iova_magazine_empty(cpu_rcache->prev)) {
+	} else if (iova_magazine_full(cpu_rcache->prev)) {
 		swap(cpu_rcache->prev, cpu_rcache->loaded);
 		has_pfn = true;
 	} else {
 		spin_lock(&rcache->lock);
 		if (rcache->depot_size > 0) {
-			iova_magazine_free(cpu_rcache->loaded);
-			cpu_rcache->loaded = rcache->depot[--rcache->depot_size];
+			swap(rcache->depot[rcache->depot_size - 1], cpu_rcache->prev);
+			swap(cpu_rcache->prev, cpu_rcache->loaded);
+			rcache->depot_size--;
 			has_pfn = true;
 		}
 		spin_unlock(&rcache->lock);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] iommu: optimize iova_magazine_free_pfns()
  2019-11-21  0:13 [PATCH 0/3] iommu: reduce spinlock contention on fast path Cong Wang
  2019-11-21  0:13 ` [PATCH 1/3] iommu: match the original algorithm Cong Wang
@ 2019-11-21  0:13 ` Cong Wang
  2019-11-21  0:13 ` [PATCH 3/3] iommu: avoid taking iova_rbtree_lock twice Cong Wang
  2 siblings, 0 replies; 7+ messages in thread
From: Cong Wang @ 2019-11-21  0:13 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, joro, Cong Wang

If the maganize is empty, iova_magazine_free_pfns() should
be a nop, however it misses the case of mag->size==0. So we
should just call iova_magazine_empty().

This should reduce the contention on iovad->iova_rbtree_lock
a little bit.

Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 drivers/iommu/iova.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 92f72a85e62a..b82c6f1cbfc2 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -797,13 +797,23 @@ static void iova_magazine_free(struct iova_magazine *mag)
 	kfree(mag);
 }
 
+static bool iova_magazine_full(struct iova_magazine *mag)
+{
+	return (mag && mag->size == IOVA_MAG_SIZE);
+}
+
+static bool iova_magazine_empty(struct iova_magazine *mag)
+{
+	return (!mag || mag->size == 0);
+}
+
 static void
 iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)
 {
 	unsigned long flags;
 	int i;
 
-	if (!mag)
+	if (iova_magazine_empty(mag))
 		return;
 
 	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
@@ -820,16 +830,6 @@ iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)
 	mag->size = 0;
 }
 
-static bool iova_magazine_full(struct iova_magazine *mag)
-{
-	return (mag && mag->size == IOVA_MAG_SIZE);
-}
-
-static bool iova_magazine_empty(struct iova_magazine *mag)
-{
-	return (!mag || mag->size == 0);
-}
-
 static unsigned long iova_magazine_pop(struct iova_magazine *mag,
 				       unsigned long limit_pfn)
 {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] iommu: avoid taking iova_rbtree_lock twice
  2019-11-21  0:13 [PATCH 0/3] iommu: reduce spinlock contention on fast path Cong Wang
  2019-11-21  0:13 ` [PATCH 1/3] iommu: match the original algorithm Cong Wang
  2019-11-21  0:13 ` [PATCH 2/3] iommu: optimize iova_magazine_free_pfns() Cong Wang
@ 2019-11-21  0:13 ` Cong Wang
  2 siblings, 0 replies; 7+ messages in thread
From: Cong Wang @ 2019-11-21  0:13 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, joro, Cong Wang

Both find_iova() and __free_iova() take iova_rbtree_lock,
there is no reason to take and release it twice inside
free_iova().

Fold them into the critical section by calling the unlock
versions instead.

Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 drivers/iommu/iova.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b82c6f1cbfc2..ba6bd33f2b16 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -390,10 +390,14 @@ EXPORT_SYMBOL_GPL(__free_iova);
 void
 free_iova(struct iova_domain *iovad, unsigned long pfn)
 {
-	struct iova *iova = find_iova(iovad, pfn);
+	unsigned long flags;
+	struct iova *iova;
 
+	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+	iova = private_find_iova(iovad, pfn);
 	if (iova)
-		__free_iova(iovad, iova);
+		private_free_iova(iovad, iova);
+	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
 
 }
 EXPORT_SYMBOL_GPL(free_iova);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] iommu: match the original algorithm
  2019-11-21  0:13 ` [PATCH 1/3] iommu: match the original algorithm Cong Wang
@ 2019-11-27 18:00   ` John Garry
  2019-11-27 18:38     ` Qian Cai
  2019-11-28 19:30     ` Cong Wang
  0 siblings, 2 replies; 7+ messages in thread
From: John Garry @ 2019-11-27 18:00 UTC (permalink / raw)
  To: Cong Wang, iommu; +Cc: linux-kernel

On 21/11/2019 00:13, Cong Wang wrote:
> The IOVA cache algorithm implemented in IOMMU code does not
> exactly match the original algorithm described in the paper.
> 
> Particularly, it doesn't need to free the loaded empty magazine
> when trying to put it back to global depot.
> 
> This patch makes it exactly match the original algorithm.
> 

I haven't gone into the details, but this patch alone is giving this:

root@(none)$ [  123.857024] kmemleak: 8 new suspected memory leaks (see 
/sys/kernel/debug/kmemleak)

root@(none)$ cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff002339843000 (size 2048):
   comm "swapper/0", pid 1, jiffies 4294898165 (age 122.688s)
   hex dump (first 32 bytes):
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
   backtrace:
     [<000000001d2710bf>] kmem_cache_alloc+0x188/0x260
     [<00000000cc229a78>] init_iova_domain+0x1e8/0x2a8
     [<000000002646fc92>] iommu_setup_dma_ops+0x200/0x710
     [<00000000acc5fe46>] arch_setup_dma_ops+0x80/0x128
     [<00000000994e1e43>] acpi_dma_configure+0x11c/0x140
     [<00000000effe9374>] pci_dma_configure+0xe0/0x108
     [<00000000f614ae1e>] really_probe+0x210/0x548
     [<0000000087884b1b>] driver_probe_device+0x7c/0x148
     [<0000000010af2936>] device_driver_attach+0x94/0xa0
     [<00000000c92b2971>] __driver_attach+0xa4/0x110
     [<00000000c873500f>] bus_for_each_dev+0xe8/0x158
     [<00000000c7d0e008>] driver_attach+0x30/0x40
     [<000000003cf39ba8>] bus_add_driver+0x234/0x2f0
     [<0000000043830a45>] driver_register+0xbc/0x1d0
     [<00000000c8a41162>] __pci_register_driver+0xb0/0xc8
     [<00000000e562eeec>] sas_v3_pci_driver_init+0x20/0x28
unreferenced object 0xffff002339844000 (size 2048):
   comm "swapper/0", pid 1, jiffies 4294898165 (age 122.688s)

[snip]

And I don't feel like continuing until it's resolved....

Thanks,
John

> Cc: Joerg Roedel <joro@8bytes.org>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>   drivers/iommu/iova.c | 14 ++++++++------
>   1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 41c605b0058f..92f72a85e62a 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -900,7 +900,7 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
>   
>   	if (!iova_magazine_full(cpu_rcache->loaded)) {
>   		can_insert = true;
> -	} else if (!iova_magazine_full(cpu_rcache->prev)) {
> +	} else if (iova_magazine_empty(cpu_rcache->prev)) {
>   		swap(cpu_rcache->prev, cpu_rcache->loaded);
>   		can_insert = true;
>   	} else {
> @@ -909,8 +909,9 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
>   		if (new_mag) {
>   			spin_lock(&rcache->lock);
>   			if (rcache->depot_size < MAX_GLOBAL_MAGS) {
> -				rcache->depot[rcache->depot_size++] =
> -						cpu_rcache->loaded;
> +				swap(rcache->depot[rcache->depot_size], cpu_rcache->prev);
> +				swap(cpu_rcache->prev, cpu_rcache->loaded);
> +				rcache->depot_size++;
>   			} else {
>   				mag_to_free = cpu_rcache->loaded;
>   			}
> @@ -963,14 +964,15 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
>   
>   	if (!iova_magazine_empty(cpu_rcache->loaded)) {
>   		has_pfn = true;
> -	} else if (!iova_magazine_empty(cpu_rcache->prev)) {
> +	} else if (iova_magazine_full(cpu_rcache->prev)) {
>   		swap(cpu_rcache->prev, cpu_rcache->loaded);
>   		has_pfn = true;
>   	} else {
>   		spin_lock(&rcache->lock);
>   		if (rcache->depot_size > 0) {
> -			iova_magazine_free(cpu_rcache->loaded);
> -			cpu_rcache->loaded = rcache->depot[--rcache->depot_size];
> +			swap(rcache->depot[rcache->depot_size - 1], cpu_rcache->prev);
> +			swap(cpu_rcache->prev, cpu_rcache->loaded);
> +			rcache->depot_size--;
>   			has_pfn = true;
>   		}
>   		spin_unlock(&rcache->lock);
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] iommu: match the original algorithm
  2019-11-27 18:00   ` John Garry
@ 2019-11-27 18:38     ` Qian Cai
  2019-11-28 19:30     ` Cong Wang
  1 sibling, 0 replies; 7+ messages in thread
From: Qian Cai @ 2019-11-27 18:38 UTC (permalink / raw)
  To: John Garry; +Cc: Cong Wang, iommu, linux-kernel



> On Nov 27, 2019, at 1:01 PM, John Garry <john.garry@huawei.com> wrote:
> 
> I haven't gone into the details, but this patch alone is giving this:
> 
> root@(none)$ [  123.857024] kmemleak: 8 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> 
> root@(none)$ cat /sys/kernel/debug/kmemleak
> unreferenced object 0xffff002339843000 (size 2048):
>  comm "swapper/0", pid 1, jiffies 4294898165 (age 122.688s)
>  hex dump (first 32 bytes):
>    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>  backtrace:
>    [<000000001d2710bf>] kmem_cache_alloc+0x188/0x260
>    [<00000000cc229a78>] init_iova_domain+0x1e8/0x2a8
>    [<000000002646fc92>] iommu_setup_dma_ops+0x200/0x710
>    [<00000000acc5fe46>] arch_setup_dma_ops+0x80/0x128
>    [<00000000994e1e43>] acpi_dma_configure+0x11c/0x140
>    [<00000000effe9374>] pci_dma_configure+0xe0/0x108
>    [<00000000f614ae1e>] really_probe+0x210/0x548
>    [<0000000087884b1b>] driver_probe_device+0x7c/0x148
>    [<0000000010af2936>] device_driver_attach+0x94/0xa0
>    [<00000000c92b2971>] __driver_attach+0xa4/0x110
>    [<00000000c873500f>] bus_for_each_dev+0xe8/0x158
>    [<00000000c7d0e008>] driver_attach+0x30/0x40
>    [<000000003cf39ba8>] bus_add_driver+0x234/0x2f0
>    [<0000000043830a45>] driver_register+0xbc/0x1d0
>    [<00000000c8a41162>] __pci_register_driver+0xb0/0xc8
>    [<00000000e562eeec>] sas_v3_pci_driver_init+0x20/0x28
> unreferenced object 0xffff002339844000 (size 2048):
>  comm "swapper/0", pid 1, jiffies 4294898165 (age 122.688s)
> 
> [snip]
> 
> And I don't feel like continuing until it's resolved....

Thanks for talking a hit by this before me. It is frustrating that people tend not to test their patches properly  with things like kmemleak.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] iommu: match the original algorithm
  2019-11-27 18:00   ` John Garry
  2019-11-27 18:38     ` Qian Cai
@ 2019-11-28 19:30     ` Cong Wang
  1 sibling, 0 replies; 7+ messages in thread
From: Cong Wang @ 2019-11-28 19:30 UTC (permalink / raw)
  To: John Garry; +Cc: iommu, LKML

On Wed, Nov 27, 2019 at 10:01 AM John Garry <john.garry@huawei.com> wrote:
>
> On 21/11/2019 00:13, Cong Wang wrote:
> > The IOVA cache algorithm implemented in IOMMU code does not
> > exactly match the original algorithm described in the paper.
> >
> > Particularly, it doesn't need to free the loaded empty magazine
> > when trying to put it back to global depot.
> >
> > This patch makes it exactly match the original algorithm.
> >
>
> I haven't gone into the details, but this patch alone is giving this:
>
> root@(none)$ [  123.857024] kmemleak: 8 new suspected memory leaks (see
> /sys/kernel/debug/kmemleak)

Ah, thanks for catching it! I see what I missed, I should pre-allocate those
empty entries in order to make it work correctly.

I didn't catch this because this was tested on a production machine where
we can't afford CONFIG_DEBUG_KMEMLEAK=y for obvious performance
concerns.

Anyway, I will fix this and send v2.

Thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-11-28 19:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-21  0:13 [PATCH 0/3] iommu: reduce spinlock contention on fast path Cong Wang
2019-11-21  0:13 ` [PATCH 1/3] iommu: match the original algorithm Cong Wang
2019-11-27 18:00   ` John Garry
2019-11-27 18:38     ` Qian Cai
2019-11-28 19:30     ` Cong Wang
2019-11-21  0:13 ` [PATCH 2/3] iommu: optimize iova_magazine_free_pfns() Cong Wang
2019-11-21  0:13 ` [PATCH 3/3] iommu: avoid taking iova_rbtree_lock twice Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).