From: David Hildenbrand <david@redhat.com>
To: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
Matthew Wilcox <willy@infradead.org>
Cc: "jean-philippe@linaro.org" <jean-philippe@linaro.org>,
"kevin.tian@intel.com" <kevin.tian@intel.com>,
"chensihang (A)" <chensihang1@hisilicon.com>,
"jgg@ziepe.ca" <jgg@ziepe.ca>,
"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"iommu@lists.linux-foundation.org"
<iommu@lists.linux-foundation.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
"zhangfei.gao@linaro.org" <zhangfei.gao@linaro.org>,
Andrew Morton <akpm@linux-foundation.org>,
"Liguozhu (Kenneth)" <liguozhu@hisilicon.com>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>
Subject: Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin
Date: Mon, 8 Feb 2021 09:21:42 +0100
Message-ID: <beb4dfb5-e9d2-a76c-f965-28cff5e4658b@redhat.com>
In-Reply-To: <b4e2acc237e44ffe916135e96ad3ef20@hisilicon.com>

On 08.02.21 03:27, Song Bao Hua (Barry Song) wrote:
>
>
>> -----Original Message-----
>> From: owner-linux-mm@kvack.org [mailto:owner-linux-mm@kvack.org] On Behalf Of
>> Matthew Wilcox
>> Sent: Monday, February 8, 2021 2:31 PM
>> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
>> Cc: Wangzhou (B) <wangzhou1@hisilicon.com>; linux-kernel@vger.kernel.org;
>> iommu@lists.linux-foundation.org; linux-mm@kvack.org;
>> linux-arm-kernel@lists.infradead.org; linux-api@vger.kernel.org; Andrew
>> Morton <akpm@linux-foundation.org>; Alexander Viro <viro@zeniv.linux.org.uk>;
>> gregkh@linuxfoundation.org; jgg@ziepe.ca; kevin.tian@intel.com;
>> jean-philippe@linaro.org; eric.auger@redhat.com; Liguozhu (Kenneth)
>> <liguozhu@hisilicon.com>; zhangfei.gao@linaro.org; chensihang (A)
>> <chensihang1@hisilicon.com>
>> Subject: Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory
>> pin
>>
>> On Sun, Feb 07, 2021 at 10:24:28PM +0000, Song Bao Hua (Barry Song) wrote:
>>>>> In high-performance I/O cases, accelerators might want to perform
>>>>> I/O on a memory without IO page faults which can result in dramatically
>>>>> increased latency. Current memory related APIs could not achieve this
>>>>> requirement, e.g. mlock can only avoid memory to swap to backup device,
>>>>> page migration can still trigger IO page fault.
>>>>
>>>> Well ... we have two requirements. The application wants to not take
>>>> page faults. The system wants to move the application to a different
>>>> NUMA node in order to optimise overall performance. Why should the
>>>> application's desires take precedence over the kernel's desires? And why
>>>> should it be done this way rather than by the sysadmin using numactl to
>>>> lock the application to a particular node?
>>>
>>> NUMA balancer is just one of many reasons for page migration. Even one
>>> simple alloc_pages() can cause memory migration in just single NUMA
>>> node or UMA system.
>>>
>>> The other reasons for page migration include but are not limited to:
>>> * memory move due to CMA
>>> * memory move due to huge pages creation
>>>
>>> Hardly we can ask users to disable the COMPACTION, CMA and Huge Page
>>> in the whole system.
>>
>> You're dodging the question. Should the CMA allocation fail because
>> another application is using SVA?
>>
>> I would say no.
>
> I would say no as well.
>
> While IOMMU is enabled, CMA almost has one user only: IOMMU driver
> as other drivers will depend on iommu to use non-contiguous memory
> though they are still calling dma_alloc_coherent().
>
> In the iommu driver, dma_alloc_coherent() is called during initialization
> and there is no new allocation afterwards. So it wouldn't cause
> runtime impact on SVA performance. Even if there are new allocations,
> CMA will fall back to general alloc_pages(), and iommu drivers
> mostly allocate small amounts of memory for command queues.
>
> So I would say general compound pages, huge pages, especially
> transparent huge pages, would be bigger concerns than CMA for
> internal page migration within one NUMA.
>
> Not like CMA, general alloc_pages() can get memory by moving
> pages other than those pinned.
>
> And there is no guarantee we can always bind the memory of
> SVA applications to single one NUMA, so NUMA balancing is
> still a concern.
>
> But I agree we need a way to make CMA succeed while the userspace
> pages are pinned. Since pinning has gone viral in many drivers, I
> assume there is a way to handle this. Otherwise, APIs like
> V4L2_MEMORY_USERPTR[1] will possibly make CMA fail, as there
> is no guarantee that userspace will allocate unmovable memory
> and there is no guarantee that the fallback path, alloc_pages(), can
> succeed while allocating big memory.
>
Long-term pinnings cannot go onto CMA-reserved memory, and there is
similar work to also fix ZONE_MOVABLE in that regard:

https://lkml.kernel.org/r/20210125194751.1275316-1-pasha.tatashin@soleen.com

This is one of the reasons I detest using long-term pinning of pages
where it could be avoided. Take VFIO and RDMA as an example: these things
currently can't work without them.

What I read here: "DMA performance will be affected severely". That does
not sound like a compelling argument to me for long-term pinnings.
Please find another way to achieve the same goal without long-term
pinnings controlled by user space - e.g., controlling when migration
actually happens.

For example, CMA/alloc_contig_range()/memory unplug are corner cases
that happen rarely; you shouldn't have to worry about them messing with
your DMA performance.

--
Thanks,
David / dhildenb