From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
To: John Garry <john.garry@huawei.com>, <robin.murphy@arm.com>,
<joro@8bytes.org>, <will@kernel.org>
Cc: <linuxarm@huawei.com>, <linux-kernel@vger.kernel.org>,
<iommu@lists.linux-foundation.org>, <xiyou.wangcong@gmail.com>
Subject: Re: [RESEND PATCH v3 3/4] iommu/iova: Flush CPU rcache for when a depot fills
Date: Wed, 9 Dec 2020 20:11:13 +0800 [thread overview]
Message-ID: <552fd9c5-d3dd-e1b3-d7e8-2a30904f22c4@huawei.com> (raw)
In-Reply-To: <851ba6cf-8f4c-74dc-3666-ee6d547999d3@huawei.com>
On 2020/12/9 19:22, John Garry wrote:
> On 09/12/2020 09:13, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2020/11/17 18:25, John Garry wrote:
>>> Leizhen reported some time ago that IOVA performance may degrade over time
>>> [0], but unfortunately his solution to fix this problem was not given
>>> attention.
>>>
>>> To summarize, the issue is that as time goes by, the CPU rcache and depot
>>> rcache continue to grow. As such, IOVA RB tree access time also continues
>>> to grow.
>>>
>>> At a certain point, a depot may become full, and some CPU rcaches may also
>>> be full when inserting another IOVA is attempted. For this scenario,
>>> currently the "loaded" CPU rcache is freed and a new one is created. This
>>> freeing means that many IOVAs in the RB tree need to be freed, which
>>> makes IO throughput performance fall off a cliff in some storage scenarios:
>>>
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6314MB/0KB/0KB /s] [1616K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [5669MB/0KB/0KB /s] [1451K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6673MB/0KB/0KB /s] [1708K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6761MB/0KB/0KB /s] [1731K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6685MB/0KB/0KB /s] [1711K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6178MB/0KB/0KB /s] [1582K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6731MB/0KB/0KB /s] [1723K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [2387MB/0KB/0KB /s] [611K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [2689MB/0KB/0KB /s] [688K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [2278MB/0KB/0KB /s] [583K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [1288MB/0KB/0KB /s] [330K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [1632MB/0KB/0KB /s] [418K/0/0 iops]
>>> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [1765MB/0KB/0KB /s] [452K/0/0 iops]
>>>
>>> And it continues in this fashion, without recovering. Note that in this
>>> example it took 16 hours of runtime for this to occur. Also note that
>>> IO throughput gradually becomes more unstable leading up to
>>> this point.
>>>
>>> This problem is only seen for non-strict mode. For strict mode, the rcaches
>>> stay quite compact.
>>>
>>> As a solution to this issue, judge that the IOVA caches have grown too big
>>> when cached magazines need to be freed, and just flush all the CPU rcaches
>>> instead.
>>>
>>> The depot rcaches, however, are not flushed, as they can be used to
>>> immediately replenish active CPUs.
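(The flush mentioned above is free_all_cpu_cached_iovas() from patch 1/4 of
this series, which walks the online CPUs and calls free_cpu_cached_iovas()
for each, returning the cached PFNs to the RB tree. A minimal sketch of that
shape, not the exact patch text:

    static void free_all_cpu_cached_iovas(struct iova_domain *iovad)
    {
            int cpu;

            for_each_online_cpu(cpu)
                    free_cpu_cached_iovas(cpu, iovad);
    }
)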
>>>
>>> In future, some IOVA compaction could be implemented to solve the
>>> instability issue, which I figure could be quite complex to implement.
>>>
>>> [0] https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leizhen@huawei.com/
>>>
>>> Analyzed-by: Zhen Lei <thunder.leizhen@huawei.com>
>>> Reported-by: Xiang Chen <chenxiang66@hisilicon.com>
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>
> Thanks for having a look
>
>>> ---
>>> drivers/iommu/iova.c | 16 ++++++----------
>>> 1 file changed, 6 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index 1f3f0f8b12e0..386005055aca 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -901,7 +901,6 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
>>> struct iova_rcache *rcache,
>>> unsigned long iova_pfn)
>>> {
>>> - struct iova_magazine *mag_to_free = NULL;
>>> struct iova_cpu_rcache *cpu_rcache;
>>> bool can_insert = false;
>>> unsigned long flags;
>>> @@ -923,13 +922,12 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
>>> if (cpu_rcache->loaded)
>>> rcache->depot[rcache->depot_size++] =
>>> cpu_rcache->loaded;
>>> - } else {
>>> - mag_to_free = cpu_rcache->loaded;
>>> + can_insert = true;
>>> + cpu_rcache->loaded = new_mag;
>>> }
>>> spin_unlock(&rcache->lock);
>>> -
>>> - cpu_rcache->loaded = new_mag;
>>> - can_insert = true;
>>> + if (!can_insert)
>>> + iova_magazine_free(new_mag);
>>> }
>>> }
>>> @@ -938,10 +936,8 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
>>> spin_unlock_irqrestore(&cpu_rcache->lock, flags);
>>> - if (mag_to_free) {
>>> - iova_magazine_free_pfns(mag_to_free, iovad);
>>> - iova_magazine_free(mag_to_free);
>> mag_to_free has been stripped out, so lock protection is not required here.
>>
>>> - }
>>> + if (!can_insert)
>>> + free_all_cpu_cached_iovas(iovad);
>> Lock protection required.
>
> But we have the per-CPU rcache locking again in free_cpu_cached_iovas() (which is called per-CPU from free_all_cpu_cached_iovas()).
>
> ok? Or some other lock you mean?
Oh, sorry, I mistook free_cpu_cached_iovas() for free_iova_rcaches().
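For reference, free_cpu_cached_iovas() already takes the per-CPU rcache lock
for each size class while returning the cached PFNs to the tree, along these
lines (a sketch from memory of the existing iova.c code, not verbatim):

    void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad)
    {
            int i;

            for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
                    struct iova_rcache *rcache = &iovad->rcaches[i];
                    struct iova_cpu_rcache *cpu_rcache =
                            per_cpu_ptr(rcache->cpu_rcaches, cpu);
                    unsigned long flags;

                    spin_lock_irqsave(&cpu_rcache->lock, flags);
                    iova_magazine_free_pfns(cpu_rcache->loaded, iovad);
                    iova_magazine_free_pfns(cpu_rcache->prev, iovad);
                    spin_unlock_irqrestore(&cpu_rcache->lock, flags);
            }
    }

so no additional locking is needed at this call site.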
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
>
> Cheers,
> John
>
>>
>>> return can_insert;
>>> }
>>>
>>
>> .
>>
>
>
> .
>
Thread overview: 20+ messages
2020-11-17 10:25 [RESEND PATCH v3 0/4] iommu/iova: Solve longterm IOVA issue John Garry
2020-11-17 10:25 ` [RESEND PATCH v3 1/4] iommu/iova: Add free_all_cpu_cached_iovas() John Garry
2020-12-09 8:58 ` Leizhen (ThunderTown)
2020-12-09 12:41 ` John Garry
2020-11-17 10:25 ` [RESEND PATCH v3 2/4] iommu/iova: Avoid double-negatives in magazine helpers John Garry
2020-12-09 9:03 ` Leizhen (ThunderTown)
2020-12-09 11:39 ` John Garry
2020-12-09 12:31 ` Leizhen (ThunderTown)
2020-11-17 10:25 ` [RESEND PATCH v3 3/4] iommu/iova: Flush CPU rcache for when a depot fills John Garry
2020-12-09 9:13 ` Leizhen (ThunderTown)
2020-12-09 11:22 ` John Garry
2020-12-09 12:11 ` Leizhen (ThunderTown) [this message]
2020-11-17 10:25 ` [RESEND PATCH v3 4/4] iommu: avoid taking iova_rbtree_lock twice John Garry
2020-12-01 15:35 ` [RESEND PATCH v3 0/4] iommu/iova: Solve longterm IOVA issue John Garry
2020-12-01 21:02 ` Will Deacon
2020-12-02 15:20 ` John Garry
2020-12-01 21:45 ` Will Deacon
2020-12-03 6:04 ` Dmitry Safonov
2020-12-03 14:54 ` John Garry
2021-01-15 11:32 ` John Garry