From: John Garry <john.garry@huawei.com>
To: Cong Wang <xiyou.wangcong@gmail.com>, <iommu@lists.linux-foundation.org>
Cc: <linux-kernel@vger.kernel.org>
Subject: Re: [Patch v2 1/3] iommu: match the original algorithm
Date: Fri, 29 Nov 2019 14:43:23 +0000
Message-ID: <d0f58734-0c1e-af9d-3437-31cf6c8a86f9@huawei.com>
In-Reply-To: <20191129004855.18506-2-xiyou.wangcong@gmail.com>

On 29/11/2019 00:48, Cong Wang wrote:
> The IOVA cache algorithm implemented in IOMMU code does not
> exactly match the original algorithm described in the paper.

Which paper?

> Particularly, it doesn't need to free the loaded empty magazine
> when trying to put it back to the global depot. To make it work, we
> have to pre-allocate magazines in the depot and only recycle them
> when all of them are full.
>
> Before this patch, rcache->depot[] contains either full or
> freed entries; after this patch, it contains either full or
> empty (but allocated) entries.

I *quickly* tested this patch and got a small performance gain.

>
> Cc: Joerg Roedel <joro@8bytes.org>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  drivers/iommu/iova.c | 45 +++++++++++++++++++++++++++-----------------
>  1 file changed, 28 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 41c605b0058f..cb473ddce4cf 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -862,12 +862,16 @@ static void init_iova_rcaches(struct iova_domain *iovad)
>  	struct iova_cpu_rcache *cpu_rcache;
>  	struct iova_rcache *rcache;
>  	unsigned int cpu;
> -	int i;
> +	int i, j;
>
>  	for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
>  		rcache = &iovad->rcaches[i];
>  		spin_lock_init(&rcache->lock);
>  		rcache->depot_size = 0;
> +		for (j = 0; j < MAX_GLOBAL_MAGS; ++j) {
> +			rcache->depot[j] = iova_magazine_alloc(GFP_KERNEL);
> +			WARN_ON(!rcache->depot[j]);
> +		}
>  		rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), cache_line_size());
>  		if (WARN_ON(!rcache->cpu_rcaches))
>  			continue;
> @@ -900,24 +904,30 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
>
>  	if (!iova_magazine_full(cpu_rcache->loaded)) {
>  		can_insert = true;
> -	} else if (!iova_magazine_full(cpu_rcache->prev)) {
> +	} else if (iova_magazine_empty(cpu_rcache->prev)) {

Is this change strictly necessary?
>  		swap(cpu_rcache->prev, cpu_rcache->loaded);
>  		can_insert = true;
>  	} else {
> -		struct iova_magazine *new_mag = iova_magazine_alloc(GFP_ATOMIC);
> +		spin_lock(&rcache->lock);
> +		if (rcache->depot_size < MAX_GLOBAL_MAGS) {
> +			swap(rcache->depot[rcache->depot_size], cpu_rcache->prev);
> +			swap(cpu_rcache->prev, cpu_rcache->loaded);
> +			rcache->depot_size++;
> +			can_insert = true;
> +		} else {
> +			mag_to_free = cpu_rcache->loaded;
> +		}
> +		spin_unlock(&rcache->lock);
> +
> +		if (mag_to_free) {
> +			struct iova_magazine *new_mag = iova_magazine_alloc(GFP_ATOMIC);
>
> -		if (new_mag) {
> -			spin_lock(&rcache->lock);
> -			if (rcache->depot_size < MAX_GLOBAL_MAGS) {
> -				rcache->depot[rcache->depot_size++] =
> -					cpu_rcache->loaded;
> +			if (new_mag) {
> +				cpu_rcache->loaded = new_mag;
> +				can_insert = true;
>  			} else {
> -				mag_to_free = cpu_rcache->loaded;
> +				mag_to_free = NULL;
>  			}
> -			spin_unlock(&rcache->lock);
> -
> -			cpu_rcache->loaded = new_mag;
> -			can_insert = true;
>  		}
>  	}
>
> @@ -963,14 +973,15 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
>
>  	if (!iova_magazine_empty(cpu_rcache->loaded)) {
>  		has_pfn = true;
> -	} else if (!iova_magazine_empty(cpu_rcache->prev)) {
> +	} else if (iova_magazine_full(cpu_rcache->prev)) {
>  		swap(cpu_rcache->prev, cpu_rcache->loaded);
>  		has_pfn = true;
>  	} else {
>  		spin_lock(&rcache->lock);
>  		if (rcache->depot_size > 0) {
> -			iova_magazine_free(cpu_rcache->loaded);

It is good to remove this from under the lock, apart from this change.

> -			cpu_rcache->loaded = rcache->depot[--rcache->depot_size];
> +			swap(rcache->depot[rcache->depot_size - 1], cpu_rcache->prev);
> +			swap(cpu_rcache->prev, cpu_rcache->loaded);
> +			rcache->depot_size--;

I'm not sure how appropriate the name "depot_size" is any longer.

>  			has_pfn = true;
>  		}
>  		spin_unlock(&rcache->lock);
> @@ -1019,7 +1030,7 @@ static void free_iova_rcaches(struct iova_domain *iovad)
>  			iova_magazine_free(cpu_rcache->prev);
>  		}
>  		free_percpu(rcache->cpu_rcaches);
> -		for (j = 0; j < rcache->depot_size; ++j)
> +		for (j = 0; j < MAX_GLOBAL_MAGS; ++j)
>  			iova_magazine_free(rcache->depot[j]);
>  	}
>  }
>
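As an aside, for anyone following the thread without the full file in front of them: below is a rough, self-contained sketch of the depot/magazine exchange as I read it after this patch. The names (cache_insert(), cache_get(), DEPOT_MAGS and so on) are made up for illustration, there is no locking, and it is not the actual iova.c code; it only shows how both paths swap pointers with pre-allocated depot slots instead of freeing and re-allocating magazines.

/*
 * Illustrative sketch only -- simplified names, no locking, not iova.c.
 * It assumes the invariant that "prev" is always either completely full
 * or completely empty.
 */
#include <stdbool.h>

#define MAG_SIZE	128
#define DEPOT_MAGS	32	/* stands in for MAX_GLOBAL_MAGS */

struct magazine {
	unsigned long size;
	unsigned long pfns[MAG_SIZE];
};

struct depot {
	struct magazine *mags[DEPOT_MAGS];	/* all pre-allocated */
	unsigned long size;			/* slots [0, size) hold full magazines */
};

struct cpu_cache {
	struct magazine *loaded;
	struct magazine *prev;
};

static bool mag_full(struct magazine *m)  { return m->size == MAG_SIZE; }
static bool mag_empty(struct magazine *m) { return m->size == 0; }

static void swap_mag(struct magazine **a, struct magazine **b)
{
	struct magazine *t = *a; *a = *b; *b = t;
}

/* Free path: exchange a full magazine for an empty depot one, never free. */
static bool cache_insert(struct cpu_cache *cpu, struct depot *depot,
			 unsigned long pfn)
{
	if (!mag_full(cpu->loaded)) {
		/* room left in the loaded magazine */
	} else if (mag_empty(cpu->prev)) {
		swap_mag(&cpu->prev, &cpu->loaded);
	} else if (depot->size < DEPOT_MAGS) {
		/* full prev goes into the depot, an empty magazine comes back */
		swap_mag(&depot->mags[depot->size], &cpu->prev);
		swap_mag(&cpu->prev, &cpu->loaded);
		depot->size++;
	} else {
		return false;	/* depot full: caller must fall back */
	}

	cpu->loaded->pfns[cpu->loaded->size++] = pfn;
	return true;
}

/* Alloc path: mirror image -- take a full magazine, leave an empty one. */
static bool cache_get(struct cpu_cache *cpu, struct depot *depot,
		      unsigned long *pfn)
{
	if (!mag_empty(cpu->loaded)) {
		/* loaded still has entries */
	} else if (mag_full(cpu->prev)) {
		swap_mag(&cpu->prev, &cpu->loaded);
	} else if (depot->size > 0) {
		/* take a full magazine from the depot, leave the empty prev behind */
		swap_mag(&depot->mags[depot->size - 1], &cpu->prev);
		swap_mag(&cpu->prev, &cpu->loaded);
		depot->size--;
	} else {
		return false;	/* nothing cached */
	}

	*pfn = cpu->loaded->pfns[--cpu->loaded->size];
	return true;
}

If that reading is right, the only allocation left on the insert path is the GFP_ATOMIC fallback for when the depot is already full.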