From: Robin Murphy <robin.murphy@arm.com>
To: Jason Gunthorpe <jgg@ziepe.ca>,
	Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: akpm@linux-foundation.org, alex.williamson@redhat.com,
	alim.akhtar@samsung.com, alyssa@rosenzweig.io,
	asahi@lists.linux.dev, baolu.lu@linux.intel.com,
	bhelgaas@google.com, cgroups@vger.kernel.org, corbet@lwn.net,
	david@redhat.com, dwmw2@infradead.org, hannes@cmpxchg.org,
	heiko@sntech.de, iommu@lists.linux.dev, jasowang@redhat.com,
	jernej.skrabec@gmail.com, jonathanh@nvidia.com, joro@8bytes.org,
	kevin.tian@intel.com, krzysztof.kozlowski@linaro.org,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-rockchip@lists.infradead.org,
	linux-samsung-soc@vger.kernel.org, linux-sunxi@lists.linux.dev,
	linux-tegra@vger.kernel.org, lizefan.x@bytedance.com,
	marcan@marcan.st, mhiramat@kernel.org, mst@redhat.com,
	m.szyprowski@samsung.com, netdev@vger.kernel.org,
	paulmck@kernel.org, rdunlap@infradead.org, samuel@sholland.org,
	suravee.suthikulpanit@amd.com, sven@svenpeter.dev,
	thierry.reding@gmail.com, tj@kernel.org,
	tomas.mudrunka@gmail.com, vdumpa@nvidia.com,
	virtualization@lists.linux.dev, wens@csie.org, will@kernel.org,
	yu-cheng.yu@intel.com
Subject: Re: [PATCH 08/16] iommu/fsl: use page allocation function provided by iommu-pages.h
Date: Wed, 29 Nov 2023 16:48:43 +0000	[thread overview]
Message-ID: <52de3aca-41b1-471e-8f87-1a77de547510@arm.com> (raw)
In-Reply-To: <20231128235037.GC1312390@ziepe.ca>

On 28/11/2023 11:50 pm, Jason Gunthorpe wrote:
> On Tue, Nov 28, 2023 at 06:00:13PM -0500, Pasha Tatashin wrote:
>> On Tue, Nov 28, 2023 at 5:53 PM Robin Murphy <robin.murphy@arm.com> wrote:
>>>
>>> On 2023-11-28 8:49 pm, Pasha Tatashin wrote:
>>>> Convert iommu/fsl_pamu.c to use the new page allocation functions
>>>> provided in iommu-pages.h.
>>>
>>> Again, this is not a pagetable. This thing doesn't even *have* pagetables.
>>>
>>> Similar to patches #1 and #2 where you're lumping in configuration
>>> tables which belong to the IOMMU driver itself, as opposed to pagetables
>>> which effectively belong to an IOMMU domain's user. But then there are
>>> still drivers where you're *not* accounting similar configuration
>>> structures, so I really struggle to see how this metric is useful when
>>> it's so completely inconsistent in what it's counting :/
>>
>> The whole IOMMU subsystem allocates a significant amount of kernel
>> locked memory that we want to at least observe. The new field in
>> vmstat does just that: it reports ALL buddy allocator memory that
>> IOMMU allocates. However, for accounting purposes, I agree, we need to
>> do better, and separate at least iommu pagetables from the rest.
>>
>> We can separate the metric into two:
>> iommu pagetable only
>> iommu everything
>>
>> or into three:
>> iommu pagetable only
>> iommu dma
>> iommu everything
>>
>> What do you think?
> 
> I think I said this at LPC - if you want to have fine-grained
> accounting of memory by owner, you need to go talk to the cgroup people
> and come up with something generic. Adding ever more open-coded finer
> category breakdowns just for iommu doesn't make a lot of sense.
> 
> You can make some argument that the pagetable memory should be counted
> because kvm counts its shadow memory, but I wouldn't go into further
> detail than that with hand coded counters..

Right, pagetable memory is interesting since it's something that any 
random kernel user can indirectly allocate via iommu_domain_alloc() and 
iommu_map(), and some of those users may even be doing so on behalf of 
userspace. I have no objection to accounting and potentially applying 
limits to *that*.
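To illustrate the point, here is a hypothetical sketch (not code from the patch series) of how an arbitrary kernel user ends up allocating IOMMU pagetable memory indirectly through the core API. The function and device names are invented for illustration; exact prototypes vary by kernel version (for instance, iommu_map() only gained its gfp_t argument in v6.3):

```c
/* Hypothetical example: an ordinary driver causing pagetable allocation. */
#include <linux/iommu.h>

static int example_map_buffer(struct device *dev, phys_addr_t paddr,
			      unsigned long iova, size_t size)
{
	struct iommu_domain *domain;
	int ret;

	/* Allocating a domain may already allocate top-level pagetables. */
	domain = iommu_domain_alloc(dev->bus);
	if (!domain)
		return -ENOMEM;

	ret = iommu_attach_device(domain, dev);
	if (ret)
		goto out_free;

	/*
	 * Each mapping can force the IOMMU driver to allocate intermediate
	 * pagetable pages on the caller's behalf - this is the memory that
	 * is interesting to account, and potentially limit.
	 */
	ret = iommu_map(domain, iova, paddr, size,
			IOMMU_READ | IOMMU_WRITE, GFP_KERNEL);
	if (ret) {
		iommu_detach_device(domain, dev);
		goto out_free;
	}
	return 0;

out_free:
	iommu_domain_free(domain);
	return ret;
}
```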

Beyond that, though, there is nothing special about "the IOMMU 
subsystem". The amount of memory an IOMMU driver needs to allocate for 
itself in order to function is not of interest beyond curiosity, it just 
is what it is; limiting it would only break the IOMMU, and if a user 
thinks it's "too much", the only actionable thing that might help is to 
physically remove devices from the system. Similar for DMA buffers; it 
might be intriguing to account those, but it's not really an actionable 
metric - in the overwhelming majority of cases you can't simply tell a 
driver to allocate less than what it needs. And that is of course 
assuming we were to account *all* DMA buffers, since whether they 
happen to have an IOMMU translation or not is irrelevant (we'd have 
already accounted the pagetables as pagetables if so).

I bet "the networking subsystem" also consumes significant memory on the 
same kind of big systems where IOMMU pagetables would be of any concern. 
I believe some of the "serious" NICs can easily run up 
hundreds of megabytes if not gigabytes worth of queues, SKB pools, etc. 
- would you propose accounting those too?

Thanks,
Robin.
