From: Cong Wang <xiyou.wangcong@gmail.com>
To: John Garry <john.garry@huawei.com>
Cc: iommu@lists.linux-foundation.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [Patch v2 1/3] iommu: match the original algorithm
Date: Fri, 29 Nov 2019 21:58:34 -0800
Message-ID: <CAM_iQpXAf8obF1-CRCGc3Fb_YmNBozcyoKQC5yuP6r9Akg6HBg@mail.gmail.com>
In-Reply-To: <d0f58734-0c1e-af9d-3437-31cf6c8a86f9@huawei.com>

On Fri, Nov 29, 2019 at 6:43 AM John Garry <john.garry@huawei.com> wrote:
>
> On 29/11/2019 00:48, Cong Wang wrote:
> > The IOVA cache algorithm implemented in IOMMU code does not
> > exactly match the original algorithm described in the paper.
> >
>
> which paper?

It's in drivers/iommu/iova.c, from line 769:

 769 /*
 770  * Magazine caches for IOVA ranges.  For an introduction to magazines,
 771  * see the USENIX 2001 paper "Magazines and Vmem: Extending the Slab
 772  * Allocator to Many CPUs and Arbitrary Resources" by Bonwick and Adams.
 773  * For simplicity, we use a static magazine size and don't implement the
 774  * dynamic size tuning described in the paper.
 775  */


>
> > Particularly, it doesn't need to free the loaded empty magazine
> > when trying to put it back to global depot. To make it work, we
> > have to pre-allocate magazines in the depot and only recycle them
> > when all of them are full.
> >
> > Before this patch, rcache->depot[] contains either full or
> > freed entries, after this patch, it contains either full or
> > empty (but allocated) entries.
>
> I *quickly* tested this patch and got a small performance gain.

Thanks for testing! Seeing a bigger gain requires a different workload;
in our case, 24 memcache.parallel servers with 120 clients.


>
> >
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> > ---
> >   drivers/iommu/iova.c | 45 +++++++++++++++++++++++++++-----------------
> >   1 file changed, 28 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> > index 41c605b0058f..cb473ddce4cf 100644
> > --- a/drivers/iommu/iova.c
> > +++ b/drivers/iommu/iova.c
> > @@ -862,12 +862,16 @@ static void init_iova_rcaches(struct iova_domain *iovad)
> >       struct iova_cpu_rcache *cpu_rcache;
> >       struct iova_rcache *rcache;
> >       unsigned int cpu;
> > -     int i;
> > +     int i, j;
> >
> >       for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
> >               rcache = &iovad->rcaches[i];
> >               spin_lock_init(&rcache->lock);
> >               rcache->depot_size = 0;
> > +             for (j = 0; j < MAX_GLOBAL_MAGS; ++j) {
> > +                     rcache->depot[j] = iova_magazine_alloc(GFP_KERNEL);
> > +                     WARN_ON(!rcache->depot[j]);
> > +             }
> >               rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), cache_line_size());
> >               if (WARN_ON(!rcache->cpu_rcaches))
> >                       continue;
> > @@ -900,24 +904,30 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
> >
> >       if (!iova_magazine_full(cpu_rcache->loaded)) {
> >               can_insert = true;
> > -     } else if (!iova_magazine_full(cpu_rcache->prev)) {
> > +     } else if (iova_magazine_empty(cpu_rcache->prev)) {
>
> is this change strictly necessary?

Yes, because it is what the paper describes. But it should be
functionally the same, because cpu_rcache->prev is either full or empty.
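
To make the equivalence concrete, here is a minimal user-space sketch. It
is not kernel code: MAG_SIZE and the helper names are hypothetical, and a
magazine is reduced to its fill count. It only shows that the old and new
conditions select the same branch whenever prev is full or empty, and that
they would differ for a partially filled magazine.

```c
#include <stdbool.h>

#define MAG_SIZE 4	/* hypothetical magazine capacity, not the kernel's value */

/* A magazine is modelled by its fill count only. */
static bool mag_full(unsigned long size)  { return size == MAG_SIZE; }
static bool mag_empty(unsigned long size) { return size == 0; }

/* Insert path: old test was !full(prev), new test is empty(prev). */
static bool insert_checks_agree(unsigned long prev_size)
{
	return (!mag_full(prev_size)) == mag_empty(prev_size);
}

/* Get path: old test was !empty(prev), new test is full(prev). */
static bool get_checks_agree(unsigned long prev_size)
{
	return (!mag_empty(prev_size)) == mag_full(prev_size);
}
```

Both helpers return true for a fill count of 0 or MAG_SIZE and false for
anything in between, so the change is behavior-neutral only as long as
prev is never left partially filled.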



>
> >               swap(cpu_rcache->prev, cpu_rcache->loaded);
> >               can_insert = true;
> >       } else {
> > -             struct iova_magazine *new_mag = iova_magazine_alloc(GFP_ATOMIC);
> > +             spin_lock(&rcache->lock);
> > +             if (rcache->depot_size < MAX_GLOBAL_MAGS) {
> > +                     swap(rcache->depot[rcache->depot_size], cpu_rcache->prev);
> > +                     swap(cpu_rcache->prev, cpu_rcache->loaded);
> > +                     rcache->depot_size++;
> > +                     can_insert = true;
> > +             } else {
> > +                     mag_to_free = cpu_rcache->loaded;
> > +             }
> > +             spin_unlock(&rcache->lock);
> > +
> > +             if (mag_to_free) {
> > +                     struct iova_magazine *new_mag = iova_magazine_alloc(GFP_ATOMIC);
> >
> > -             if (new_mag) {
> > -                     spin_lock(&rcache->lock);
> > -                     if (rcache->depot_size < MAX_GLOBAL_MAGS) {
> > -                             rcache->depot[rcache->depot_size++] =
> > -                                             cpu_rcache->loaded;
> > +                     if (new_mag) {
> > +                             cpu_rcache->loaded = new_mag;
> > +                             can_insert = true;
> >                       } else {
> > -                             mag_to_free = cpu_rcache->loaded;
> > +                             mag_to_free = NULL;
> >                       }
> > -                     spin_unlock(&rcache->lock);
> > -
> > -                     cpu_rcache->loaded = new_mag;
> > -                     can_insert = true;
> >               }
> >       }
> >
> > @@ -963,14 +973,15 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
> >
> >       if (!iova_magazine_empty(cpu_rcache->loaded)) {
> >               has_pfn = true;
> > -     } else if (!iova_magazine_empty(cpu_rcache->prev)) {
> > +     } else if (iova_magazine_full(cpu_rcache->prev)) {
> >               swap(cpu_rcache->prev, cpu_rcache->loaded);
> >               has_pfn = true;
> >       } else {
> >               spin_lock(&rcache->lock);
> >               if (rcache->depot_size > 0) {
> > -                     iova_magazine_free(cpu_rcache->loaded);
>
> it is good to remove this from under the lock, apart from this change
>
> > -                     cpu_rcache->loaded = rcache->depot[--rcache->depot_size];
> > +                     swap(rcache->depot[rcache->depot_size - 1], cpu_rcache->prev);
> > +                     swap(cpu_rcache->prev, cpu_rcache->loaded);
> > +                     rcache->depot_size--;
>
> I'm not sure how appropriate the name "depot_size" is any longer.

I think it is still okay, because empty ones don't count. However, if you
have a better name, I am open to suggestions.
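
As an illustration of what depot_size counts after this patch, here is a
simplified user-space sketch of the swap-based scheme. Several things in
it are assumptions made for brevity: a single "CPU", no locking, tiny
hypothetical sizes (MAG_SIZE, MAX_GLOBAL_MAGS), hypothetical helper names,
and an insert that simply fails when the depot is full instead of
allocating a new magazine as the patch does. The point is only that
depot[0..depot_size-1] holds full magazines while the remaining slots hold
empty but allocated ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAG_SIZE        4	/* hypothetical; tiny so every path is easy to reach */
#define MAX_GLOBAL_MAGS 2	/* hypothetical depot capacity */

struct mag {
	unsigned long size;		/* number of pfns currently stored */
	unsigned long pfns[MAG_SIZE];
};

struct cache {
	struct mag *loaded, *prev;	/* the two per-CPU magazines */
	struct mag *depot[MAX_GLOBAL_MAGS];
	unsigned long depot_size;	/* counts FULL magazines only */
};

static bool mag_full(const struct mag *m)  { return m->size == MAG_SIZE; }
static bool mag_empty(const struct mag *m) { return m->size == 0; }

static void swap_mag(struct mag **a, struct mag **b)
{
	struct mag *tmp = *a;
	*a = *b;
	*b = tmp;
}

/* Pre-allocate every magazine up front (no cleanup on failure in this sketch). */
static bool cache_init(struct cache *c)
{
	c->depot_size = 0;
	c->loaded = calloc(1, sizeof(*c->loaded));
	c->prev = calloc(1, sizeof(*c->prev));
	if (!c->loaded || !c->prev)
		return false;
	for (int j = 0; j < MAX_GLOBAL_MAGS; j++) {
		c->depot[j] = calloc(1, sizeof(*c->depot[j]));
		if (!c->depot[j])
			return false;
	}
	return true;
}

/* Returns false when loaded, prev and the whole depot are full. */
static bool cache_insert(struct cache *c, unsigned long pfn)
{
	if (mag_full(c->loaded)) {
		if (mag_empty(c->prev)) {
			swap_mag(&c->prev, &c->loaded);
		} else if (c->depot_size < MAX_GLOBAL_MAGS) {
			/* full prev goes into the depot, an empty magazine comes back */
			swap_mag(&c->depot[c->depot_size], &c->prev);
			swap_mag(&c->prev, &c->loaded);
			c->depot_size++;
		} else {
			return false;	/* simplification: the patch allocates here */
		}
	}
	assert(!mag_full(c->loaded));
	c->loaded->pfns[c->loaded->size++] = pfn;
	return true;
}

/* Returns 0 when nothing is cached, so callers must not store pfn 0. */
static unsigned long cache_get(struct cache *c)
{
	if (mag_empty(c->loaded)) {
		if (mag_full(c->prev)) {
			swap_mag(&c->prev, &c->loaded);
		} else if (c->depot_size > 0) {
			/* empty prev goes into the depot, a full magazine comes back */
			swap_mag(&c->depot[c->depot_size - 1], &c->prev);
			swap_mag(&c->prev, &c->loaded);
			c->depot_size--;
		} else {
			return 0;
		}
	}
	return c->loaded->pfns[--c->loaded->size];
}
```

With these sizes, sixteen inserts succeed (loaded, prev and two depot
magazines of four entries each), a seventeenth is rejected with
depot_size at 2, and draining the cache brings depot_size back to 0
without any depot slot ever becoming NULL.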

Thanks.
