IOMMU Archive on lore.kernel.org
* [PATCH v2 0/2] iommu/iova: enhance the rcache optimization
@ 2019-08-15 12:11 Zhen Lei
  2019-08-15 12:11 ` [PATCH v2 1/2] iommu/iova: introduce iova_magazine_compact_pfns() Zhen Lei
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Zhen Lei @ 2019-08-15 12:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Jean-Philippe Brucker, John Garry,
	Robin Murphy, Will Deacon, Joerg Roedel, iommu, Omer Peleg,
	Adam Morrison, Shaohua Li, Ben Serebrin, David Woodhouse,
	linux-arm-kernel, linux-kernel

v1 --> v2
1. I did not change the patches, but added this cover letter.
2. Added a batch of reviewers based on
   9257b4a206fc ("iommu/iova: introduce per-cpu caching to iova allocation")
3. I described the problem I met in patch 2, but I hope the brief description
   below helps people understand it quickly.
   Suppose there are six rcache sizes, and each size can hold at most 10000
   IOVAs.
   --------------------------------------------
   |  4K   |  8K  | 16K  |  32K | 64K  | 128K |
   --------------------------------------------
   | 10000 | 9000 | 8500 | 8600 | 9200 | 7000 |
   --------------------------------------------
   As the table shows, the rcache as a whole buffers too many IOVAs. Now
   consider the worst case: suppose we need 20000 4K IOVAs at one time. 10000
   of them can be allocated from the rcache, but the other 10000 must be
   allocated from the RB tree through alloc_iova(). However, the RB tree
   currently holds at least (9000 + 8500 + 8600 + 9200 + 7000) = 42300 nodes,
   so traversing it is slow. In my test scenario, 4K IOVAs are used
   frequently, but the other sizes are not. Similarly, when the 20000 4K IOVAs
   are freed in succession, the first 10000 can be buffered quickly, but the
   other 10000 cannot.
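
The arithmetic in item 3 can be checked with a small user-space sketch. The
figures are the illustrative ones from the table, not measurements, and the
helper names are hypothetical; nothing here is kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SIZES 6

/* Cached IOVAs per size class (4K .. 128K), taken from the table above. */
static const unsigned long cached[NR_SIZES] = {
	10000, 9000, 8500, 8600, 9200, 7000
};

/*
 * IOVAs cached for every size class except 'skip'. Each of them is still
 * a node in the RB tree, so this is a lower bound on the tree size.
 */
static unsigned long nodes_held_by_other_sizes(size_t skip)
{
	unsigned long sum = 0;
	size_t i;

	assert(skip < NR_SIZES);
	for (i = 0; i < NR_SIZES; i++)
		if (i != skip)
			sum += cached[i];
	return sum;
}

/* Requests of one size class that miss the rcache and go to the RB tree. */
static unsigned long rbtree_fallbacks(size_t idx, unsigned long nr_requests)
{
	assert(idx < NR_SIZES);
	return nr_requests > cached[idx] ? nr_requests - cached[idx] : 0;
}
```

With index 0 standing for the 4K class, a burst of 20000 requests leaves
10000 allocations that have to walk a tree of at least 42300 nodes.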

Zhen Lei (2):
  iommu/iova: introduce iova_magazine_compact_pfns()
  iommu/iova: enhance the rcache optimization

 drivers/iommu/iova.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++----
 include/linux/iova.h |   1 +
 2 files changed, 95 insertions(+), 6 deletions(-)

-- 
1.8.3


_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


* [PATCH v2 1/2] iommu/iova: introduce iova_magazine_compact_pfns()
  2019-08-15 12:11 [PATCH v2 0/2] iommu/iova: enhance the rcache optimization Zhen Lei
@ 2019-08-15 12:11 ` Zhen Lei
  2019-08-15 12:11 ` [PATCH v2 2/2] iommu/iova: enhance the rcache optimization Zhen Lei
  2019-08-23  8:15 ` [PATCH v2 0/2] " Leizhen (ThunderTown)
  2 siblings, 0 replies; 4+ messages in thread
From: Zhen Lei @ 2019-08-15 12:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Jean-Philippe Brucker, John Garry,
	Robin Murphy, Will Deacon, Joerg Roedel, iommu, Omer Peleg,
	Adam Morrison, Shaohua Li, Ben Serebrin, David Woodhouse,
	linux-arm-kernel, linux-kernel

iova_magazine_free_pfns() can only free the whole magazine buffer. Add
iova_magazine_compact_pfns() to support freeing part of it.
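
As a rough user-space model of the idea (not the kernel code itself), the
sketch below shrinks a magazine to a new size and expresses "free everything"
as the special case newsize == 0. The struct, the counter and release_pfn()
are hypothetical stand-ins; in the kernel the release step removes the IOVA
from the RB tree under iova_rbtree_lock.

```c
#include <assert.h>

#define MAG_SIZE 128	/* same value as IOVA_MAG_SIZE, redefined for this model */

struct mag {
	unsigned long size;
	unsigned long pfns[MAG_SIZE];
};

/* Hypothetical stand-in for the RB-tree removal; it only counts calls. */
static unsigned long nr_released;

static void release_pfn(unsigned long pfn)
{
	(void)pfn;
	nr_released++;
}

/* Release entries [newsize, size) and keep the first 'newsize' entries. */
static void mag_compact(struct mag *mag, unsigned long newsize)
{
	unsigned long i;

	if (!mag || mag->size <= newsize)
		return;

	for (i = newsize; i < mag->size; i++)
		release_pfn(mag->pfns[i]);

	mag->size = newsize;
}

/* Freeing the whole magazine is compaction down to zero entries. */
static void mag_free_all(struct mag *mag)
{
	mag_compact(mag, 0);
}
```

Compacting to a size that is not smaller than the current one is a no-op,
which matches the early return added by this patch.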

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/iova.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 3e1a8a6755723a9..4b7a9efa0ef40af 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -795,18 +795,19 @@ static void iova_magazine_free(struct iova_magazine *mag)
 	kfree(mag);
 }
 
-static void
-iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)
+static void iova_magazine_compact_pfns(struct iova_magazine *mag,
+				       struct iova_domain *iovad,
+				       unsigned long newsize)
 {
 	unsigned long flags;
 	int i;
 
-	if (!mag)
+	if (!mag || mag->size <= newsize)
 		return;
 
 	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
 
-	for (i = 0 ; i < mag->size; ++i) {
+	for (i = newsize; i < mag->size; ++i) {
 		struct iova *iova = private_find_iova(iovad, mag->pfns[i]);
 
 		BUG_ON(!iova);
@@ -815,7 +816,13 @@ static void iova_magazine_free(struct iova_magazine *mag)
 
 	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
 
-	mag->size = 0;
+	mag->size = newsize;
+}
+
+static void
+iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)
+{
+	iova_magazine_compact_pfns(mag, iovad, 0);
 }
 
 static bool iova_magazine_full(struct iova_magazine *mag)
-- 
1.8.3



* [PATCH v2 2/2] iommu/iova: enhance the rcache optimization
  2019-08-15 12:11 [PATCH v2 0/2] iommu/iova: enhance the rcache optimization Zhen Lei
  2019-08-15 12:11 ` [PATCH v2 1/2] iommu/iova: introduce iova_magazine_compact_pfns() Zhen Lei
@ 2019-08-15 12:11 ` Zhen Lei
  2019-08-23  8:15 ` [PATCH v2 0/2] " Leizhen (ThunderTown)
  2 siblings, 0 replies; 4+ messages in thread
From: Zhen Lei @ 2019-08-15 12:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Jean-Philippe Brucker, John Garry,
	Robin Murphy, Will Deacon, Joerg Roedel, iommu, Omer Peleg,
	Adam Morrison, Shaohua Li, Ben Serebrin, David Woodhouse,
	linux-arm-kernel, linux-kernel

The rcache method caches freed IOVAs to improve the performance of IOVA
allocation and release. This usually works well, but performance may
degrade in some special scenarios.

For example, IOVA_RANGE_CACHE_MAX_SIZE is currently 6, and each size has
MAX_GLOBAL_MAGS=32 shareable depot magazines, plus two magazines per CPU
(cpu_rcaches->loaded and cpu_rcaches->prev). In an extreme case, the
rcache can hold up to ((num_possible_cpus() * 2 + 32) * 128 * 6) IOVAs,
which is very large. The worst case happens when the depot magazines of a
certain size (usually 4K) are full: further free_iova_fast() calls then
cause iova_magazine_free_pfns() to be called. As said above, too many
IOVAs are buffered, so the RB tree is very large, and both
iova_magazine_free_pfns()-->private_find_iova() and the allocation path
taken on a cache miss, alloc_iova()-->__alloc_and_insert_iova_range(),
spend too much time. In addition, the current rcache method has no
cleanup operation, so the number of buffered IOVAs only increases and
never decreases.
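
To give a feel for the upper bound above, here is a small user-space sketch
of the formula. The constants are redefined locally with the values quoted in
this message, MAGS_PER_CPU is a name introduced only for this example, and
the CPU count is passed in as a parameter instead of num_possible_cpus().

```c
#include <assert.h>

#define IOVA_RANGE_CACHE_MAX_SIZE 6	/* number of cached size classes */
#define MAX_GLOBAL_MAGS 32		/* depot magazines per size class */
#define IOVA_MAG_SIZE 128		/* IOVAs per magazine */
#define MAGS_PER_CPU 2			/* loaded + prev (name is illustrative) */

/* Worst-case number of IOVAs the rcache can hold for 'nr_cpus' CPUs. */
static unsigned long max_cached_iovas(unsigned long nr_cpus)
{
	assert(nr_cpus > 0);
	return (nr_cpus * MAGS_PER_CPU + MAX_GLOBAL_MAGS) *
	       IOVA_MAG_SIZE * IOVA_RANGE_CACHE_MAX_SIZE;
}
```

For a hypothetical 128-CPU machine, max_cached_iovas(128) evaluates to
(128 * 2 + 32) * 128 * 6 = 221184 IOVAs, all of which stay in the RB tree.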

In my FIO stress test scenario, performance drops by about 35% and does
not recover even when the test cases are re-executed.
Jobs: 21 (f=21): [2.3% done] [8887M/0K /s] [2170K/0 iops]
Jobs: 21 (f=21): [2.3% done] [8902M/0K /s] [2173K/0 iops]
Jobs: 21 (f=21): [2.3% done] [6010M/0K /s] [1467K/0 iops]
Jobs: 21 (f=21): [2.3% done] [5397M/0K /s] [1318K/0 iops]

Therefore, I add hit statistics to the rcache. When the above case
happens, the IOVAs that were not hit are released.
Jobs: 21 (f=21): [100.0% done] [10324M/0K /s] [2520K/0 iops]
Jobs: 21 (f=21): [100.0% done] [10290M/0K /s] [2512K/0 iops]
Jobs: 21 (f=21): [100.0% done] [10035M/0K /s] [2450K/0 iops]
Jobs: 21 (f=21): [100.0% done] [10214M/0K /s] [2494K/0 iops]
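
The per-CPU part of this policy can be modeled in user space roughly as
follows. This is a simplified sketch that tracks only magazine sizes, not
the PFNs themselves; struct cpu_cache and compact_cpu_cache() are
hypothetical names, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define MAG_SIZE 128	/* same value as IOVA_MAG_SIZE, redefined for this model */

struct cpu_cache {
	bool prev_mag_hit;		/* prev magazine was swapped in since last pass */
	unsigned long nr_hit;		/* allocations served from this cache */
	unsigned long loaded_size;	/* entries in the loaded magazine */
	unsigned long prev_size;	/* entries in the prev magazine */
};

/* Apply one compaction pass and return how many entries were released. */
static unsigned long compact_cpu_cache(struct cpu_cache *c)
{
	unsigned long released = 0;

	/* The prev magazine was never needed: drop all of it. */
	if (!c->prev_mag_hit) {
		released += c->prev_size;
		c->prev_size = 0;
	}

	/* Fewer hits than one magazine: keep only as many entries as were hit. */
	if (c->nr_hit < MAG_SIZE && c->loaded_size > c->nr_hit) {
		released += c->loaded_size - c->nr_hit;
		c->loaded_size = c->nr_hit;
	}

	/* Start a fresh statistics window. */
	c->nr_hit = 0;
	c->prev_mag_hit = false;
	return released;
}
```

An idle cache is emptied completely, while a cache that served at least one
magazine's worth of hits and used its prev magazine is left untouched.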

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/iova.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/iova.h |  1 +
 2 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 4b7a9efa0ef40af..f3828f4add25375 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -23,6 +23,8 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
 				     unsigned long limit_pfn);
 static void init_iova_rcaches(struct iova_domain *iovad);
 static void free_iova_rcaches(struct iova_domain *iovad);
+static void iova_compact_rcache(struct iova_domain *iovad,
+				struct iova_rcache *curr_rcache);
 static void fq_destroy_all_entries(struct iova_domain *iovad);
 static void fq_flush_timeout(struct timer_list *t);
 
@@ -781,6 +783,8 @@ struct iova_magazine {
 
 struct iova_cpu_rcache {
 	spinlock_t lock;
+	bool prev_mag_hit;
+	unsigned long nr_hit;
 	struct iova_magazine *loaded;
 	struct iova_magazine *prev;
 };
@@ -934,6 +938,7 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 	if (mag_to_free) {
 		iova_magazine_free_pfns(mag_to_free, iovad);
 		iova_magazine_free(mag_to_free);
+		iova_compact_rcache(iovad, rcache);
 	}
 
 	return can_insert;
@@ -971,18 +976,22 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
 	} else if (!iova_magazine_empty(cpu_rcache->prev)) {
 		swap(cpu_rcache->prev, cpu_rcache->loaded);
 		has_pfn = true;
+		cpu_rcache->prev_mag_hit = true;
 	} else {
 		spin_lock(&rcache->lock);
 		if (rcache->depot_size > 0) {
 			iova_magazine_free(cpu_rcache->loaded);
 			cpu_rcache->loaded = rcache->depot[--rcache->depot_size];
 			has_pfn = true;
+			rcache->depot_mags_hit = true;
 		}
 		spin_unlock(&rcache->lock);
 	}
 
-	if (has_pfn)
+	if (has_pfn) {
+		cpu_rcache->nr_hit++;
 		iova_pfn = iova_magazine_pop(cpu_rcache->loaded, limit_pfn);
+	}
 
 	spin_unlock_irqrestore(&cpu_rcache->lock, flags);
 
@@ -1049,5 +1058,77 @@ void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad)
 	}
 }
 
+static void iova_compact_percpu_mags(struct iova_domain *iovad,
+				     struct iova_rcache *rcache)
+{
+	unsigned int cpu;
+
+	for_each_possible_cpu(cpu) {
+		unsigned long flags;
+		struct iova_cpu_rcache *cpu_rcache;
+
+		cpu_rcache = per_cpu_ptr(rcache->cpu_rcaches, cpu);
+
+		spin_lock_irqsave(&cpu_rcache->lock, flags);
+		if (!cpu_rcache->prev_mag_hit)
+			iova_magazine_free_pfns(cpu_rcache->prev, iovad);
+
+		if (cpu_rcache->nr_hit < IOVA_MAG_SIZE)
+			iova_magazine_compact_pfns(cpu_rcache->loaded,
+						   iovad,
+						   cpu_rcache->nr_hit);
+
+		cpu_rcache->nr_hit = 0;
+		cpu_rcache->prev_mag_hit = false;
+		spin_unlock_irqrestore(&cpu_rcache->lock, flags);
+	}
+}
+
+static void iova_compact_depot_mags(struct iova_domain *iovad,
+				    struct iova_rcache *rcache)
+{
+	int i;
+	unsigned long depot_size;
+	struct iova_magazine *depot[MAX_GLOBAL_MAGS];
+
+	spin_lock(&rcache->lock);
+	if (!rcache->depot_size || rcache->depot_mags_hit) {
+		spin_unlock(&rcache->lock);
+		return;
+	}
+
+	depot_size = rcache->depot_size;
+	for (i = 0; i < depot_size; i++)
+		depot[i] = rcache->depot[i];
+	rcache->depot_size = 0;
+	rcache->depot_mags_hit = false;
+	spin_unlock(&rcache->lock);
+
+	for (i = 0; i < depot_size; i++) {
+		iova_magazine_free_pfns(depot[i], iovad);
+		iova_magazine_free(depot[i]);
+	}
+}
+
+static void iova_compact_rcache(struct iova_domain *iovad,
+				struct iova_rcache *curr_rcache)
+{
+	int i;
+	struct iova_rcache *rcache;
+
+	for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; i++) {
+		rcache = &iovad->rcaches[i];
+
+		/*
+		 * Skip the current rcache; it may be reused immediately.
+		 */
+		if (rcache == curr_rcache)
+			continue;
+
+		iova_compact_percpu_mags(iovad, rcache);
+		iova_compact_depot_mags(iovad, rcache);
+	}
+}
+
 MODULE_AUTHOR("Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>");
 MODULE_LICENSE("GPL");
diff --git a/include/linux/iova.h b/include/linux/iova.h
index a0637abffee88b0..44f35b2641b736c 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -30,6 +30,7 @@ struct iova {
 
 struct iova_rcache {
 	spinlock_t lock;
+	bool depot_mags_hit;
 	unsigned long depot_size;
 	struct iova_magazine *depot[MAX_GLOBAL_MAGS];
 	struct iova_cpu_rcache __percpu *cpu_rcaches;
-- 
1.8.3



* Re: [PATCH v2 0/2] iommu/iova: enhance the rcache optimization
  2019-08-15 12:11 [PATCH v2 0/2] iommu/iova: enhance the rcache optimization Zhen Lei
  2019-08-15 12:11 ` [PATCH v2 1/2] iommu/iova: introduce iova_magazine_compact_pfns() Zhen Lei
  2019-08-15 12:11 ` [PATCH v2 2/2] iommu/iova: enhance the rcache optimization Zhen Lei
@ 2019-08-23  8:15 ` " Leizhen (ThunderTown)
  2 siblings, 0 replies; 4+ messages in thread
From: Leizhen (ThunderTown) @ 2019-08-23  8:15 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Jean-Philippe Brucker, John Garry,
	Robin Murphy, Will Deacon, Joerg Roedel, iommu, Omer Peleg,
	Adam Morrison, Shaohua Li, Ben Serebrin, David Woodhouse,
	linux-arm-kernel, linux-kernel

Hi all,
  Can anyone help review it?


