From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>, akpm@linux-foundation.org, alex.williamson@redhat.com, alim.akhtar@samsung.com, alyssa@rosenzweig.io, asahi@lists.linux.dev, baolu.lu@linux.intel.com, bhelgaas@google.com, cgroups@vger.kernel.org, corbet@lwn.net, david@redhat.com, dwmw2@infradead.org, hannes@cmpxchg.org, heiko@sntech.de, iommu@lists.linux.dev, jasowang@redhat.com, jernej.skrabec@gmail.com, jonathanh@nvidia.com, joro@8bytes.org, kevin.tian@intel.com, krzysztof.kozlowski@linaro.org, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-rockchip@lists.infradead.org, linux-samsung-soc@vger.kernel.org, linux-sunxi@lists.linux.dev, linux-tegra@vger.kernel.org, lizefan.x@bytedance.com, marcan@marcan.st, mhiramat@kernel.org, mst@redhat.com, m.szyprowski@samsung.com, netdev@vger.kernel.org, paulmck@kernel.org, rdunlap@infradead.org, samuel@sholland.org, suravee.suthikulpanit@amd.com, sven@svenpeter.dev, thierry.reding@gmail.com, tj@kernel.org, tomas.mudrunka@gmail.com, vdumpa@nvidia.com, virtualization@lists.linux.dev, wens@csie.org, will@kernel.org, yu-cheng.yu@intel.com
Subject: Re: [PATCH 08/16] iommu/fsl: use page allocation function provided by iommu-pages.h
Date: Wed, 29 Nov 2023 14:45:03 -0500
Message-ID: <CA+CK2bCcfS1Fo8RvTeGXj_ejPRX9--sh5Jz8nzhkZnut4juDmg@mail.gmail.com> (raw)
In-Reply-To: <52de3aca-41b1-471e-8f87-1a77de547510@arm.com>

> >> We can separate the metric into two:
> >> iommu pagetable only
> >> iommu everything
> >>
> >> or into three:
> >> iommu pagetable only
> >> iommu dma
> >> iommu everything
> >>
> >> What do you think?
> >
> > I think I said this at LPC - if you want to have fine grained
> > accounting of memory by owner you need to go talk to the cgroup people
> > and come up with something generic.
> > Adding ever open coded finer category breakdowns just for iommu
> > doesn't make alot of sense.
> >
> > You can make some argument that the pagetable memory should be counted
> > because kvm counts it's shadow memory, but I wouldn't go into further
> > detail than that with hand coded counters..
>
> Right, pagetable memory is interesting since it's something that any
> random kernel user can indirectly allocate via iommu_domain_alloc() and
> iommu_map(), and some of those users may even be doing so on behalf of
> userspace. I have no objection to accounting and potentially applying
> limits to *that*.

Yes, in the next version I will separate pagetable-only accounting from
the rest, for the limits.

> Beyond that, though, there is nothing special about "the IOMMU
> subsystem". The amount of memory an IOMMU driver needs to allocate for
> itself in order to function is not of interest beyond curiosity, it just
> is what it is; limiting it would only break the IOMMU, and if a user

Agreed about the memory the IOMMU allocates for itself; that should be
small, and if it is not, we should at least show where the memory is
going.

> thinks it's "too much", the only actionable thing that might help is to
> physically remove devices from the system. Similar for DMA buffers; it
> might be intriguing to account those, but it's not really an actionable
> metric - in the overwhelming majority of cases you can't simply tell a
> driver to allocate less than what it needs. And that is of course
> assuming if we were to account *all* DMA buffers, since whether they
> happen to have an IOMMU translation or not is irrelevant (we'd have
> already accounted the pagetables as pagetables if so).

DMA mappings should be observable (they do not have to be limited). At
the very least, that can help explain kernel memory overhead anomalies
on production systems.
> I bet "the networking subsystem" also consumes significant memory on the

It does, and GPU drivers may also consume a significant amount of memory.

> same kind of big systems where IOMMU pagetables would be of any concern.
> I believe some of the "serious" NICs can easily run up
> hundreds of megabytes if not gigabytes worth of queues, SKB pools, etc.
> - would you propose accounting those too?

Yes. Any kind of kernel memory that grows in proportion to the workload
should be accountable: someone is using those resources, compared to an
idle system, and that someone should be charged.

Pasha