From: John Garry <john.garry@huawei.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: <iommu@lists.linux-foundation.org>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [Patch v2 1/3] iommu: match the original algorithm
Date: Mon, 2 Dec 2019 10:55:33 +0000	[thread overview]
Message-ID: <b27d0ba1-4f30-3e25-6898-26305a3d42db@huawei.com> (raw)
In-Reply-To: <CAM_iQpXAf8obF1-CRCGc3Fb_YmNBozcyoKQC5yuP6r9Akg6HBg@mail.gmail.com>

On 30/11/2019 05:58, Cong Wang wrote:
> On Fri, Nov 29, 2019 at 6:43 AM John Garry <john.garry@huawei.com> wrote:
>>
>> On 29/11/2019 00:48, Cong Wang wrote:
>>> The IOVA cache algorithm implemented in IOMMU code does not
>>> exactly match the original algorithm described in the paper.
>>>
>>
>> which paper?
> 
> It's in drivers/iommu/iova.c, from line 769:
> 
>   769 /*
>   770  * Magazine caches for IOVA ranges.  For an introduction to magazines,
>   771  * see the USENIX 2001 paper "Magazines and Vmem: Extending the Slab
>   772  * Allocator to Many CPUs and Arbitrary Resources" by Bonwick and Adams.
>   773  * For simplicity, we use a static magazine size and don't implement the
>   774  * dynamic size tuning described in the paper.
>   775  */
> 
> 
>>
>>> In particular, it doesn't need to free the loaded empty magazine
>>> when trying to put it back to the global depot. To make it work, we
>>> have to pre-allocate magazines in the depot and only recycle them
>>> when all of them are full.
>>>
>>> Before this patch, rcache->depot[] contains either full or
>>> freed entries; after this patch, it contains either full or
>>> empty (but allocated) entries.
>>
>> I *quickly* tested this patch and got a small performance gain.
> 
> Thanks for testing! It requires a different workload to see a bigger
> gain; in our case, 24 memcache.parallel servers with 120 clients.
> 

So in fact I was getting a ~10% throughput boost in my storage test.
That seems more than I would expect or hope for, so I would need to test more.

> 
>>
>>>
>>> Cc: Joerg Roedel <joro@8bytes.org>
>>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>>> ---
>>>    drivers/iommu/iova.c | 45 +++++++++++++++++++++++++++-----------------
>>>    1 file changed, 28 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index 41c605b0058f..cb473ddce4cf 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -862,12 +862,16 @@ static void init_iova_rcaches(struct iova_domain *iovad)
>>>        struct iova_cpu_rcache *cpu_rcache;
>>>        struct iova_rcache *rcache;
>>>        unsigned int cpu;
>>> -     int i;
>>> +     int i, j;
>>>
>>>        for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
>>>                rcache = &iovad->rcaches[i];
>>>                spin_lock_init(&rcache->lock);
>>>                rcache->depot_size = 0;
>>> +             for (j = 0; j < MAX_GLOBAL_MAGS; ++j) {
>>> +                     rcache->depot[j] = iova_magazine_alloc(GFP_KERNEL);
>>> +                     WARN_ON(!rcache->depot[j]);
>>> +             }
>>>                rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), cache_line_size());
>>>                if (WARN_ON(!rcache->cpu_rcaches))
>>>                        continue;
>>> @@ -900,24 +904,30 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
>>>
>>>        if (!iova_magazine_full(cpu_rcache->loaded)) {
>>>                can_insert = true;
>>> -     } else if (!iova_magazine_full(cpu_rcache->prev)) {
>>> +     } else if (iova_magazine_empty(cpu_rcache->prev)) {
>>
>> is this change strictly necessary?
> 
> Yes, because it is what is described in the paper. But it should be
> functionally the same because cpu_rcache->prev is either full or empty.

That was what I was getting at.
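
FWIW, that invariant could be made explicit with an assertion on the
insert path; purely illustrative and untested:

	/*
	 * prev only ever holds a completely full or completely empty
	 * magazine, so !iova_magazine_full(prev) and
	 * iova_magazine_empty(prev) coincide here.
	 */
	WARN_ON(!iova_magazine_full(cpu_rcache->prev) &&
		!iova_magazine_empty(cpu_rcache->prev));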

> 
> 
> 
>>
>>>                swap(cpu_rcache->prev, cpu_rcache->loaded);
>>>                can_insert = true;
>>>        } else {
>>> -             struct iova_magazine *new_mag = iova_magazine_alloc(GFP_ATOMIC);

Apart from this change, did anyone ever consider using a kmem cache for
the magazines?
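
Something like this rough sketch is what I have in mind (untested, names
hypothetical, and assuming the magazine stays a fixed-size struct):

	static struct kmem_cache *iova_magazine_cache;

	/* created once, e.g. from init_iova_rcaches() */
	iova_magazine_cache = kmem_cache_create("iova_magazine",
					sizeof(struct iova_magazine),
					0, SLAB_HWCACHE_ALIGN, NULL);

	/* iova_magazine_alloc() would then do: */
	mag = kmem_cache_zalloc(iova_magazine_cache, flags);

	/* and iova_magazine_free(): */
	kmem_cache_free(iova_magazine_cache, mag);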

>>> +             spin_lock(&rcache->lock);
>>> +             if (rcache->depot_size < MAX_GLOBAL_MAGS) {
>>> +                     swap(rcache->depot[rcache->depot_size], cpu_rcache->prev);
>>> +                     swap(cpu_rcache->prev, cpu_rcache->loaded);
>>> +                     rcache->depot_size++;
>>> +                     can_insert = true;
>>> +             } else {
>>> +                     mag_to_free = cpu_rcache->loaded;
>>> +             }
>>> +             spin_unlock(&rcache->lock);
>>> +
>>> +             if (mag_to_free) {
>>> +                     struct iova_magazine *new_mag = iova_magazine_alloc(GFP_ATOMIC);
>>>
>>> -             if (new_mag) {
>>> -                     spin_lock(&rcache->lock);
>>> -                     if (rcache->depot_size < MAX_GLOBAL_MAGS) {
>>> -                             rcache->depot[rcache->depot_size++] =
>>> -                                             cpu_rcache->loaded;
>>> +                     if (new_mag) {
>>> +                             cpu_rcache->loaded = new_mag;
>>> +                             can_insert = true;
>>>                        } else {
>>> -                             mag_to_free = cpu_rcache->loaded;
>>> +                             mag_to_free = NULL;
>>>                        }
>>> -                     spin_unlock(&rcache->lock);
>>> -
>>> -                     cpu_rcache->loaded = new_mag;
>>> -                     can_insert = true;
>>>                }
>>>        }
>>>
>>> @@ -963,14 +973,15 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
>>>
>>>        if (!iova_magazine_empty(cpu_rcache->loaded)) {
>>>                has_pfn = true;
>>> -     } else if (!iova_magazine_empty(cpu_rcache->prev)) {
>>> +     } else if (iova_magazine_full(cpu_rcache->prev)) {
>>>                swap(cpu_rcache->prev, cpu_rcache->loaded);
>>>                has_pfn = true;
>>>        } else {
>>>                spin_lock(&rcache->lock);
>>>                if (rcache->depot_size > 0) {
>>> -                     iova_magazine_free(cpu_rcache->loaded);
>>
>> it is good to remove this from under the lock, apart from this change
>>
>>> -                     cpu_rcache->loaded = rcache->depot[--rcache->depot_size];
>>> +                     swap(rcache->depot[rcache->depot_size - 1], cpu_rcache->prev);
>>> +                     swap(cpu_rcache->prev, cpu_rcache->loaded);

I wonder if not using swap() at all is neater here.
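
Something like this, perhaps (untested sketch, same net effect as the
two swap()s above):

	struct iova_magazine *mag = rcache->depot[--rcache->depot_size];

	/* park the two empty magazines and load the full one */
	rcache->depot[rcache->depot_size] = cpu_rcache->prev;
	cpu_rcache->prev = cpu_rcache->loaded;
	cpu_rcache->loaded = mag;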

>>> +                     rcache->depot_size--;
>>
>> I'm not sure how appropriate the name "depot_size" is any longer.
> 
> I think it is still okay, because the empty ones don't count. However,
> if you have better names, I am open to your suggestions.

Yeah, probably.

thanks,
John


Thread overview: 40+ messages

2019-11-29  0:48 [Patch v2 0/3] iommu: reduce spinlock contention on fast path Cong Wang
2019-11-29  0:48 ` [Patch v2 1/3] iommu: match the original algorithm Cong Wang
2019-11-29 14:43   ` John Garry
2019-11-30  5:58     ` Cong Wang
2019-12-02 10:55       ` John Garry [this message]
2019-12-03 19:26         ` Cong Wang
2019-12-02 16:58   ` Christoph Hellwig
2019-12-03 19:24     ` Cong Wang
2019-11-29  0:48 ` [Patch v2 2/3] iommu: optimize iova_magazine_free_pfns() Cong Wang
2019-11-29 13:24   ` John Garry
2019-11-30  6:02     ` Cong Wang
2019-12-02 10:02       ` John Garry
2019-12-03 19:40         ` Cong Wang
2019-12-02 16:59   ` Christoph Hellwig
2019-12-03 19:28     ` Cong Wang
2019-11-29  0:48 ` [Patch v2 3/3] iommu: avoid taking iova_rbtree_lock twice Cong Wang
2019-11-29 13:34   ` John Garry
2019-11-30  6:03     ` Cong Wang
2019-12-17  9:43 ` [Patch v2 0/3] iommu: reduce spinlock contention on fast path Joerg Roedel
2019-12-18  4:32   ` Cong Wang
