From: Robin Murphy <robin.murphy@arm.com>
To: John Garry <john.garry@huawei.com>, Will Deacon <will@kernel.org>
Cc: linux-kernel@vger.kernel.org, sakari.ailus@linux.intel.com,
	mst@redhat.com, airlied@linux.ie, gregkh@linuxfoundation.org,
	linuxarm@huawei.com, jonathanh@nvidia.com,
	iommu@lists.linux-foundation.org, thierry.reding@gmail.com,
	daniel@ffwll.ch, bingbu.cao@intel.com, digetx@gmail.com,
	mchehab@kernel.org, jasowang@redhat.com, tian.shu.qiu@intel.com
Subject: Re: [PATCH v4 2/6] iova: Allow rcache range upper limit to be flexible
Date: Mon, 2 Aug 2021 17:09:35 +0100
Message-ID: <83de3911-145d-77c8-17c1-981e4ff825d3@arm.com>
In-Reply-To: <27bb22cf-db64-0aa5-215f-2adf06b6455d@huawei.com>

On 2021-08-02 16:23, John Garry wrote:
> On 02/08/2021 16:01, Will Deacon wrote:
>> On Wed, Jul 14, 2021 at 06:36:39PM +0800, John Garry wrote:
>>> Some LLDs may request DMA mappings whose IOVA length exceeds that of the
>>> current rcache upper limit.
>>
>> What's an LLD?
>>
> 
> low-level driver
> 
> maybe I'll stick with simply "drivers"
> 
>>> This means that allocations for those IOVAs will never be cached, and
>>> always must be allocated and freed from the RB tree per DMA mapping cycle.
>>> This has a significant effect on performance, more so since commit
>>> 4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search
>>> fails"), as discussed at [0].
>>>
>>> As a first step towards allowing the rcache range upper limit to be
>>> configured, hold this value in the IOVA rcache structure, and allocate
>>> the rcaches separately.
>>>
>>> [0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leizhen@huawei.com/
>>>
>>>
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>> ---
>>>   drivers/iommu/dma-iommu.c |  2 +-
>>>   drivers/iommu/iova.c      | 23 +++++++++++++++++------
>>>   include/linux/iova.h      |  4 ++--
>>>   3 files changed, 20 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index 98ba927aee1a..4772278aa5da 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -434,7 +434,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
>>>        * rounding up anything cacheable to make sure that can't happen. The
>>>        * order of the unadjusted size will still match upon freeing.
>>>        */
>>> -    if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
>>> +    if (iova_len < (1 << (iovad->rcache_max_size - 1)))
>>>           iova_len = roundup_pow_of_two(iova_len);
>>>       dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit);
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index b6cf5f16123b..07ce73fdd8c1 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -15,6 +15,8 @@
>>>   /* The anchor node sits above the top of the usable address space */
>>>   #define IOVA_ANCHOR    ~0UL
>>> +#define IOVA_RANGE_CACHE_MAX_SIZE 6    /* log of max cached IOVA range size (in pages) */
>>
>> Is that the same as an 'order'? i.e. IOVA_RANGE_CACHE_MAX_ORDER?
> 
> Yeah, that may be better. I was just using the same name as before.
> 
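For what it's worth, the value does behave as an order. A userspace illustration, assuming the default of 6 from this patch: the sketch_* helpers are hypothetical stand-ins for the kernel's order_base_2() and roundup_pow_of_two(), and the two checks mirror, in simplified form, the round-up in iommu_dma_alloc_iova() and the log_size check that decides whether a freed range may go back into an rcache.

```c
#include <stdbool.h>

/* Mirrors the default from the patch: log2 of the max cached size, in pages */
#define IOVA_RANGE_CACHE_MAX_SIZE 6

/* ceil(log2(n)); returns 0 for n <= 1, like the kernel's order_base_2() */
static unsigned int sketch_order_base_2(unsigned long n)
{
	unsigned int order = 0;

	while ((1UL << order) < n)
		order++;
	return order;
}

/* Smallest power of two >= n (for n >= 1), like roundup_pow_of_two() */
static unsigned long sketch_roundup_pow_of_two(unsigned long n)
{
	return 1UL << sketch_order_base_2(n);
}

/* Length the iommu-dma path would allocate for a request of len pages */
static unsigned long sketch_adjusted_len(unsigned long len)
{
	if (len < (1UL << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
		len = sketch_roundup_pow_of_two(len);
	return len;
}

/* Whether a range of len pages may be returned to an rcache on free */
static bool sketch_is_cacheable(unsigned long len)
{
	return sketch_order_base_2(len) < IOVA_RANGE_CACHE_MAX_SIZE;
}
```

With the default of 6, rcaches exist for orders 0 to 5, so the largest cacheable range is 32 pages (128 KiB with 4 KiB pages); a 33-page request is neither rounded up nor cached, and always goes through the rbtree.
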
>>
>>> +
>>>   static bool iova_rcache_insert(struct iova_domain *iovad,
>>>                      unsigned long pfn,
>>>                      unsigned long size);
>>> @@ -881,7 +883,14 @@ static void init_iova_rcaches(struct iova_domain *iovad)
>>>       unsigned int cpu;
>>>       int i;
>>> -    for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
>>> +    iovad->rcache_max_size = IOVA_RANGE_CACHE_MAX_SIZE;
>>> +
>>> +    iovad->rcaches = kcalloc(iovad->rcache_max_size,
>>> +                 sizeof(*iovad->rcaches), GFP_KERNEL);
>>> +    if (!iovad->rcaches)
>>> +        return;
>>
>> Returning quietly here doesn't seem like the right thing to do. At least, I
>> don't think the rest of the functions here are checking rcaches against
>> NULL.
>>
> 
> For sure, but that is what other code which can fail here already does, like:
> 
> static void init_iova_rcaches(struct iova_domain *iovad)
> {
>      ...
> 
>      for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
>          ...
> 
>          rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), cache_line_size());
>          if (WARN_ON(!rcache->cpu_rcaches))
>              continue;
>          ...
>      }
> }
> 
> and that is not safe either.

Yeah, along with flush queues, historically this has all been
super-dodgy in terms of failure handling (or the lack thereof).
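
A minimal sketch of what safe failure handling could look like, assuming the whole init is allowed to fail: it is userspace C, calloc()/malloc() stand in for kcalloc() and __alloc_percpu(), and all sketch_* names are hypothetical. On any failure it unwinds whatever was already allocated and returns -ENOMEM, rather than leaving a half-initialised domain behind.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for struct iova_rcache / iova_domain */
struct sketch_rcache {
	void *cpu_rcaches; /* stands in for the __alloc_percpu() allocation */
};

struct sketch_domain {
	unsigned int rcache_max_size;
	struct sketch_rcache *rcaches;
};

static void sketch_free_rcaches(struct sketch_domain *d)
{
	unsigned int i;

	if (!d->rcaches)
		return;
	/* Entries never allocated are still NULL from calloc(); free(NULL) is a no-op */
	for (i = 0; i < d->rcache_max_size; i++)
		free(d->rcaches[i].cpu_rcaches);
	free(d->rcaches);
	d->rcaches = NULL;
}

/* Returns 0 on success or -ENOMEM; percpu_size must be non-zero */
static int sketch_init_rcaches(struct sketch_domain *d, unsigned int max_size,
			       size_t percpu_size)
{
	unsigned int i;

	d->rcache_max_size = max_size;
	d->rcaches = calloc(max_size, sizeof(*d->rcaches));
	if (!d->rcaches)
		return -ENOMEM;

	for (i = 0; i < max_size; i++) {
		d->rcaches[i].cpu_rcaches = malloc(percpu_size);
		if (!d->rcaches[i].cpu_rcaches) {
			/* Unwind the entries allocated so far */
			sketch_free_rcaches(d);
			return -ENOMEM;
		}
	}
	return 0;
}
```

The catch is propagation: init_iova_domain() currently returns void, so a real fix means every caller has to start checking for an error, which is the cross-subsystem churn John mentions.
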

> This issue was raised a while ago. I don't mind trying to fix it - a 
> slightly painful part is that it touches a few subsystems.

Maybe pull the rcache init out of init_iova_domain() entirely? Only
iommu-dma uses {alloc,free}_iova_fast(), so TBH the rcaches are just a
great big waste of memory for all the other IOVA domain users anyway.
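
Roughly, the split might look like the following userspace sketch. The names (sketch_init_domain(), sketch_init_rcaches(), sketch_put_domain()) are hypothetical and not part of any existing IOVA API; the point is only the shape: basic domain init allocates no rcache memory at all, and fast-path users opt in with a separate call that can report failure.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_RCACHE_MAX_ORDER 6 /* assumed default, as in the patch */

/* Hypothetical placeholder for a per-order rcache */
struct sketch_rcache {
	unsigned long nr_cached;
};

struct sketch_domain {
	unsigned long start_pfn;
	struct sketch_rcache *rcaches; /* NULL unless the fast path opted in */
};

/* Basic init: no rcache memory for users that never use the fast path */
static void sketch_init_domain(struct sketch_domain *d, unsigned long start_pfn)
{
	d->start_pfn = start_pfn;
	d->rcaches = NULL;
}

/* Opt-in init for fast-path users (iommu-dma in this discussion) */
static int sketch_init_rcaches(struct sketch_domain *d)
{
	d->rcaches = calloc(SKETCH_RCACHE_MAX_ORDER, sizeof(*d->rcaches));
	return d->rcaches ? 0 : -ENOMEM;
}

static bool sketch_has_rcaches(const struct sketch_domain *d)
{
	return d->rcaches != NULL;
}

/* Teardown is safe whether or not the rcaches were ever set up */
static void sketch_put_domain(struct sketch_domain *d)
{
	free(d->rcaches);
	d->rcaches = NULL;
}
```

Users that never call {alloc,free}_iova_fast() then pay nothing for the rcaches, and the opt-in call is a natural place to return an error instead of failing silently.
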

The other week I started pondering how much of iommu-dma only needs to 
be exposed to the IOMMU core rather than the whole kernel now; I suppose 
there's probably an equal argument to be made for some of these bits of 
the IOVA API, and this might pave the way towards some more logical 
separation, but let's get the functional side dealt with before we worry 
too much about splitting headers.

Robin.
