iommu.lists.linux-foundation.org archive mirror
From: Robin Murphy <robin.murphy@arm.com>
To: John Garry <john.garry@huawei.com>,
	joro@8bytes.org, will@kernel.org, jejb@linux.ibm.com,
	martin.petersen@oracle.com, hch@lst.de, m.szyprowski@samsung.com
Cc: iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-scsi@vger.kernel.org, linuxarm@huawei.com
Subject: Re: [PATCH 1/6] iommu: Move IOVA power-of-2 roundup into allocator
Date: Fri, 19 Mar 2021 19:20:16 +0000
Message-ID: <afc2fc05-a799-cb14-debd-d36afed8f456@arm.com>
In-Reply-To: <73d459de-b5cc-e2f5-bcd7-2ee23c8d5075@huawei.com>

On 2021-03-19 16:58, John Garry wrote:
> On 19/03/2021 16:13, Robin Murphy wrote:
>> On 2021-03-19 13:25, John Garry wrote:
>>> Move the IOVA size power-of-2 rcache roundup into the IOVA allocator.
>>>
>>> This is to eventually make it possible to be able to configure the upper
>>> limit of the IOVA rcache range.
>>>
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>> ---
>>>   drivers/iommu/dma-iommu.c |  8 ------
>>>   drivers/iommu/iova.c      | 51 ++++++++++++++++++++++++++-------------
>>>   2 files changed, 34 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index af765c813cc8..15b7270a5c2a 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -429,14 +429,6 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
>>> iommu_domain *domain,
>>>       shift = iova_shift(iovad);
>>>       iova_len = size >> shift;
>>> -    /*
>>> -     * Freeing non-power-of-two-sized allocations back into the IOVA 
>>> caches
>>> -     * will come back to bite us badly, so we have to waste a bit of 
>>> space
>>> -     * rounding up anything cacheable to make sure that can't 
>>> happen. The
>>> -     * order of the unadjusted size will still match upon freeing.
>>> -     */
>>> -    if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
>>> -        iova_len = roundup_pow_of_two(iova_len);
>>>       dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit);
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index e6e2fa85271c..e62e9e30b30c 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -179,7 +179,7 @@ iova_insert_rbtree(struct rb_root *root, struct 
>>> iova *iova,
>>>   static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>>           unsigned long size, unsigned long limit_pfn,
>>> -            struct iova *new, bool size_aligned)
>>> +            struct iova *new, bool size_aligned, bool fast)
>>>   {
>>>       struct rb_node *curr, *prev;
>>>       struct iova *curr_iova;
>>> @@ -188,6 +188,15 @@ static int __alloc_and_insert_iova_range(struct 
>>> iova_domain *iovad,
>>>       unsigned long align_mask = ~0UL;
>>>       unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn;
>>> +    /*
>>> +     * Freeing non-power-of-two-sized allocations back into the IOVA 
>>> caches
>>> +     * will come back to bite us badly, so we have to waste a bit of 
>>> space
>>> +     * rounding up anything cacheable to make sure that can't 
>>> happen. The
>>> +     * order of the unadjusted size will still match upon freeing.
>>> +     */
>>> +    if (fast && size < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
>>> +        size = roundup_pow_of_two(size);
>>
>> If this transformation is only relevant to alloc_iova_fast(), and we 
>> have to add a special parameter here to tell whether we were called 
>> from alloc_iova_fast(), doesn't it seem more sensible to just do it in 
>> alloc_iova_fast() rather than here?
> 
> We have the restriction that anything we put in the rcache needs to be a 
> power-of-2.

I was really only talking about the apparently silly structure of:

void foo(bool in_bar) {
	if (in_bar)
		//do thing
	...
}
void bar() {
	foo(true);
}

vs.:

void foo() {
	...
}
void bar() {
	//do thing
	foo();
}
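
To make the second structure concrete, here is a minimal userspace sketch in which the fast-path wrapper does the rounding itself and the general allocator takes no extra flag. All names here (alloc_range, alloc_range_fast, roundup_pow2, CACHE_MAX_ORDER) are hypothetical stand-ins rather than the kernel functions, and the value 6 is an assumption meant to mirror IOVA_RANGE_CACHE_MAX_SIZE.

```c
#include <assert.h>

/* Assumption: mirrors IOVA_RANGE_CACHE_MAX_SIZE (log2 of the cache limit). */
#define CACHE_MAX_ORDER 6

/* Smallest power of two >= n; stand-in for roundup_pow_of_two(). */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	assert(n >= 1);
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * General allocator: knows nothing about who called it.
 * For illustration it just returns the length it would reserve.
 */
static unsigned long alloc_range(unsigned long size)
{
	return size;
}

/*
 * Fast path: the cache-specific adjustment lives here, next to the
 * cache it protects, so alloc_range() needs no "fast" parameter.
 */
static unsigned long alloc_range_fast(unsigned long size)
{
	if (size < (1UL << (CACHE_MAX_ORDER - 1)))
		size = roundup_pow2(size);
	return alloc_range(size);
}
```

With these definitions a 5-page request through alloc_range_fast() reserves 8 pages, while the same request through alloc_range() reserves exactly 5, and sizes at or above the cacheable limit pass through unchanged.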

> So then we have the issue of how to dynamically increase this rcache 
> threshold. The problem is that we may have many devices associated with 
> the same domain. So, in theory, we can't rule out that, after we increase 
> the threshold, some other device will try to fast free an IOVA which 
> was allocated prior to the increase and was not rounded up.
> 
> I'm very open to better (or less bad) suggestions on how to do this ...

...but yes, regardless of exactly where it happens, rounding up or not 
is the problem for rcaches in general. I've said several times that my 
preferred approach is to not change it that dynamically at all, but 
instead treat it more like we treat the default domain type.

> I could say that we only allow this for a group with a single device, so 
> these sorts of things don't have to be worried about, but even then the 
> iommu_group internals are not readily accessible here.
> 
>>
>> But then the API itself has no strict requirement that a pfn passed to 
>> free_iova_fast() wasn't originally allocated with alloc_iova(), so 
>> arguably hiding the adjustment away makes it less clear that the 
>> responsibility is really on any caller of free_iova_fast() to make 
>> sure they don't get things wrong.
>>
> 
> alloc_iova() doesn't roundup to pow-of-2, so wouldn't it be broken to do 
> that?

Well, right now neither call rounds up, which is why iommu-dma takes 
care to avoid any issues by explicitly rounding up for itself 
beforehand. I'm just concerned that giving the impression that the API 
takes care of everything for itself will make it easier to write broken 
code in future, if that impression is in fact not entirely true.
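
As a rough illustration of why a wrong-sized entry is harmful, the sketch below assumes the rcache picks its bin from the order of the requested size (the role order_base_2() plays in the kernel). The helper names are hypothetical and this is a userspace model, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Smallest power of two >= n; stand-in for roundup_pow_of_two(). */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	assert(n >= 1);
	while (p < n)
		p <<= 1;
	return p;
}

/* ceil(log2(n)); stand-in for order_base_2(), assumed to select the bin. */
static unsigned int order_base2(unsigned long n)
{
	unsigned int order = 0;

	assert(n >= 1);
	while ((1UL << order) < n)
		order++;
	return order;
}

/*
 * True if a range of freed_len pages, once cached, could be handed back
 * to a request for wanted_len pages even though it is too short: both
 * lengths land in the same bin, but the cached range is smaller.
 */
static bool cached_range_too_small(unsigned long freed_len,
				   unsigned long wanted_len)
{
	return order_base2(freed_len) == order_base2(wanted_len) &&
	       freed_len < wanted_len;
}
```

Under that assumption, lengths 5 and 8 both map to order 3, so cached_range_too_small(5, 8) is true: an unrounded 5-page range freed into the cache could satisfy a later 8-page request. If the 5-page allocation had been rounded up to roundup_pow2(5) == 8 first, the check is false, which is the guarantee the explicit roundup in iommu-dma provides.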

I don't even think it's very likely that someone would manage to hit 
that rather wacky alloc/free pattern either way, I just know that 
getting wrong-sized things into the rcaches is an absolute sod to debug, 
so...

Robin.
