From: Robin Murphy <robin.murphy@arm.com>
To: "chenxiang (M)" <chenxiang66@hisilicon.com>,
	will@kernel.org, joro@8bytes.org
Cc: iommu@lists.linux-foundation.org, linuxarm@openeuler.org,
	linuxarm@huawei.com
Subject: Re: [Linuxarm] Re: [PATCH 0/4] Free cached iovas when rmmod the driver of the last device in group
Date: Tue, 8 Jun 2021 15:17:32 +0100
Message-ID: <4de7e383-e481-b3d5-7030-20cc8d7740fc@arm.com>
In-Reply-To: <9a4764fc-28bc-e4cb-10f5-e4934d24b9b5@hisilicon.com>

On 2021-06-08 08:50, chenxiang (M) wrote:
> Hi Robin,
> 
> 
> On 2021/6/7 19:23, Robin Murphy wrote:
>> On 2021-06-07 03:42, chenxiang wrote:
>>> From: Xiang Chen <chenxiang66@hisilicon.com>
>>>
>>> When the driver of the last device in the group is removed (rmmod),
>>> its cached IOVAs are no longer used, so it is better to free them to
>>> save memory. Also export the functions free_rcache_cached_iovas()
>>> and iommu_domain_to_iova().
>>
>> How common is it to use a device a significant amount, then unbind its 
>> driver and not use it for long enough to care about? Off-hand I can't 
>> think of a particularly realistic use-case which looks like that - the 
>> closest I can imagine is unbinding a default kernel driver to replace 
>> it with VFIO, but I would expect the set of devices intended for 
>> assignment to be distinct from the set of devices used by the host 
>> itself, and thus the host driver wouldn't have actually done much to 
>> generate a lot of DMA mappings in that initial period. Is my 
>> expectation there wrong?
> 
> It is possible that a user uses the driver for a while and then rmmods
> it (for example, the SAS card is the one we always use, but sometimes
> we need to compare its performance with a RAID card, so we insmod the
> RAID driver and rmmod it after the test). In that situation, the rcache
> IOVAs of the RAID card are still held even after the driver is removed
> (users usually just rmmod the driver rather than removing the device,
> which would release the group and all of its resources).

OK, so my question is how many end users are doing that on production 
systems with distro kernels, such that this feature deserves being 
supported and maintained in mainline?

>> If there is such a case, how much memory does this actually save in 
>> practice? The theoretical worst-case should be roughly 4 * 128 * 6 * 
>> sizeof(struct iova) bytes per CPU, which is around 192KB assuming 
>> 64-byte kmem cache alignment. 
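
(Spelling that worst case out, with IOVA_MAG_SIZE = 128 and the six
rcache size classes:

   4 * 128 * 6 * 64 bytes = 196608 bytes ~= 192KB per CPU

i.e. up to ~24MB across a 128-CPU machine in the absolute worst case.)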
> 
> There are 128 CPUs in our ARM64 Kunpeng920 board. In a local patch I
> added a debugfs interface, drop_rcache, for every group, which calls
> free_rcache_cached_iovas(), and we can see that it saves 1~2M of
> memory after freeing the cached IOVAs.
> 
> estuary:/$ free
>               total       used       free     shared    buffers cached
> Mem:     526776720    1336216  525440504     177664          0 177664
> -/+ buffers/cache:    1158552  525618168
> Swap:            0          0          0
> estuary:/sys/kernel/debug/iommu/iovad/iommu_domain29$ echo 1 > drop_rcache
> estuary:/sys/kernel/debug/iommu/iovad/iommu_domain29$ free
>               total       used       free     shared    buffers cached
> Mem:     526776720    1334672  525442048     177664          0 177664
> -/+ buffers/cache:    1157008  525619712
> Swap:            0          0          0
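
For reference, the sort of per-domain debugfs hook described above could
look roughly like the sketch below - the fops name and the directory
plumbing are invented for illustration, free_rcache_cached_iovas() is
the export this series proposes, and DEFINE_SIMPLE_ATTRIBUTE() and
debugfs_create_file() are the stock kernel helpers:

#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/iova.h>

/* Sketch only: a write of any value purges the domain's IOVA rcaches */
static int iova_drop_rcache_set(void *data, u64 val)
{
	struct iova_domain *iovad = data;

	/* Flush the per-CPU and depot magazines, freeing all cached IOVAs */
	free_rcache_cached_iovas(iovad);
	return 0;
}
DEFINE_SIMPLE_ATTRIBUTE(iova_drop_rcache_fops, NULL, iova_drop_rcache_set,
			"%llu\n");

/* Hooked up wherever the per-domain debugfs directory gets created */
static void iova_domain_debugfs_init(struct iova_domain *iovad,
				     struct dentry *domain_dir)
{
	debugfs_create_file("drop_rcache", 0200, domain_dir, iovad,
			    &iova_drop_rcache_fops);
}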

Erm, isn't 12KB from 4GB (per CPU) basically just noise? :/
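
For reference, that impression comes straight from the numbers above:

   1336216 kB - 1334672 kB  =  1544 kB  (~1.5MB freed in total)
   1544 kB / 128 CPUs       ~= 12KB per CPU
   526776720 kB / 128 CPUs  ~= 4GB of RAM per CPU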

Sure, it might start to look a bit more significant on a quad-core TV 
box with 1GB of RAM total, but then something like that isn't ever going 
to be driving a big multi-queue SAS adapter to churn that many DMA 
mappings in the first place.

Consider that a "memory saving" feature which only has some minimal 
benefit for niche use-cases on very large systems, but costs a constant 
amount of memory in terms of code and symbol bloat in the kernel image, 
even on actually-memory-constrained systems, is rather counterproductive 
overall. As I said before, I do like the general idea *if* there's a 
significant saving to be had from it, but what you're showing us here 
suggests that there isn't :(

Note also that on arm64 systems, commit 65688d2a05de ("arm64: cache: 
Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)") queued in linux-next is 
already likely to cut everything in half here.

> The other reason I want to free rcache IOVAs at a suitable point is
> the long-term IOVA issue[0] (which is currently unrecoverable). If we
> free the cached IOVAs when the driver is removed, or add a debugfs
> interface to do so, I think we can at least recover from it by
> rmmod-ing and re-insmod-ing the driver, or by writing to the debugfs
> drop_rcache file, even though that doesn't solve the underlying issue.

Or perhaps we could, y'know, actually solve that problem, rather than 
invent hacks to bodge around it. I've lost track of where we got to with 
that thread (sorry!), but I think it still needs someone to look into 
the relative timing of all the flush queue interactions if we're to 
really understand what's going on, and if a simple obvious fix doesn't 
fall out of that then it probably makes sense to convert the rcache 
depot to a garbage-collected list (per the original concept).
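
To illustrate what I mean by that last idea, something vaguely along
these lines - a rough sketch only, where the magazine list linkage, the
back-pointer to the domain, the trim threshold and the delay are all
invented for illustration, and iova_magazine_free_pfns() is iova.c's
existing helper for giving a magazine's PFNs back to the tree:

#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

/* Invented: magazines gain a link so the depot can be an unbounded list */
struct iova_magazine {
	struct iova_magazine *next;
	unsigned long size;
	unsigned long pfns[IOVA_MAG_SIZE];
};

struct iova_rcache {
	spinlock_t lock;
	unsigned int depot_size;
	struct iova_magazine *depot;	/* list instead of a fixed-size array */
	struct iova_domain *iovad;	/* invented back-pointer for the worker */
	struct delayed_work work;	/* periodic garbage collection */
	/* ... per-CPU loaded/prev magazines as before ... */
};

#define IOVA_DEPOT_DELAY	msecs_to_jiffies(100)	/* invented interval */

static void iova_depot_work_fn(struct work_struct *work)
{
	struct iova_rcache *rcache = container_of(work, struct iova_rcache,
						  work.work);
	struct iova_magazine *mag = NULL;
	unsigned long flags;

	/* Trim one surplus magazine per pass, down to a modest working set */
	spin_lock_irqsave(&rcache->lock, flags);
	if (rcache->depot_size > num_possible_cpus()) {
		mag = rcache->depot;
		rcache->depot = mag->next;
		rcache->depot_size--;
	}
	/* Keep ticking while there is still excess to reclaim */
	if (rcache->depot_size > num_possible_cpus())
		schedule_delayed_work(&rcache->work, IOVA_DEPOT_DELAY);
	spin_unlock_irqrestore(&rcache->lock, flags);

	if (mag) {
		iova_magazine_free_pfns(mag, rcache->iovad);
		kfree(mag);
	}
}

The insertion path would kick the work whenever it pushes a magazine
onto the depot, so an idle domain eventually drains back down on its
own without anyone having to trigger it by hand.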

Much as I think that purging the rcaches automatically to paper over 
that issue isn't a great idea, making the user do it manually is 
certainly even worse.

Robin.

> [0] 
> https://lore.kernel.org/linux-iommu/1607538189-237944-4-git-send-email-john.garry@huawei.com/ 
> 
> 
>> However it seems rather unlikely in practice to have every possible 
>> cache entry of every size used, so if saving smaller amounts of memory 
>> is a concern wouldn't you also want to explicitly drain the flush 
>> queues (16KB per CPU) and maybe look at trying to reclaim the unused 
>> pagetable pages from the domain itself - that ~192KB worth of cached 
>> IOVAs represents ~32K pages worth of IOVA space, which for an 
>> implementation like io-pgtable-arm with the 4KB granule means ~256KB 
>> worth of non-leaf pagetables left behind.
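
(Rough arithmetic behind those figures: 4 * 128 entries in each of the
six size classes, which cover 1 to 32 pages respectively, is about
4 * 128 * (1+2+4+8+16+32) ~= 32K pages of IOVA space per CPU; at 8 bytes
per 4KB-granule PTE that corresponds to roughly 256KB of pagetable pages
left installed after the mappings themselves are gone.)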
> 
> OK, we may look into those.
> 
>>
>> I'm not against the idea of having a mechanism to "compact" an idle 
>> DMA domain if there are convincing numbers to back it up, but the 
>> actual implementation would need to be better than this as well - 
>> having the IOMMU core code poking directly into the internals of the 
>> iommu-dma abstraction is not the way to go, and exporting anything to 
>> modules, which the IOMMU core is not, smells suspicious.
>>
>> Robin.
>>
>>> Xiang Chen (4):
>>>    iommu/iova: add a function to free all rcached iovas and export it
>>>    iommu/iova: use function free_rcache_cached_iovas() to free all
>>>      rcached iovas
>>>    dma-iommu: add a interface to get iova_domain from iommu domain
>>>    iommu: free cached iovas when rmmod the driver of the last device in
>>>      the group
>>>
>>>   drivers/iommu/dma-iommu.c |  7 +++++++
>>>   drivers/iommu/iommu.c     |  7 +++++++
>>>   drivers/iommu/iova.c      | 17 ++++++++++++-----
>>>   include/linux/dma-iommu.h |  6 ++++++
>>>   include/linux/iova.h      |  5 +++++
>>>   5 files changed, 37 insertions(+), 5 deletions(-)
>>>
