linux-kernel.vger.kernel.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
	Matthew Wilcox <willy@infradead.org>
Cc: "Wangzhou (B)" <wangzhou1@hisilicon.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"jgg@ziepe.ca" <jgg@ziepe.ca>,
	"kevin.tian@intel.com" <kevin.tian@intel.com>,
	"jean-philippe@linaro.org" <jean-philippe@linaro.org>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"Liguozhu (Kenneth)" <liguozhu@hisilicon.com>,
	"zhangfei.gao@linaro.org" <zhangfei.gao@linaro.org>,
	"chensihang (A)" <chensihang1@hisilicon.com>
Subject: Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin
Date: Mon, 8 Feb 2021 11:37:24 +0100
Message-ID: <bbe18536-7048-d790-11bf-0b0742a59926@redhat.com>
In-Reply-To: <fdee54b5ab91453d93d2f775ca2532d2@hisilicon.com>

On 08.02.21 11:13, Song Bao Hua (Barry Song) wrote:
> 
> 
>> -----Original Message-----
>> From: owner-linux-mm@kvack.org [mailto:owner-linux-mm@kvack.org] On Behalf Of
>> David Hildenbrand
>> Sent: Monday, February 8, 2021 9:22 PM
>> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>; Matthew Wilcox
>> <willy@infradead.org>
>> Cc: Wangzhou (B) <wangzhou1@hisilicon.com>; linux-kernel@vger.kernel.org;
>> iommu@lists.linux-foundation.org; linux-mm@kvack.org;
>> linux-arm-kernel@lists.infradead.org; linux-api@vger.kernel.org; Andrew
>> Morton <akpm@linux-foundation.org>; Alexander Viro <viro@zeniv.linux.org.uk>;
>> gregkh@linuxfoundation.org; jgg@ziepe.ca; kevin.tian@intel.com;
>> jean-philippe@linaro.org; eric.auger@redhat.com; Liguozhu (Kenneth)
>> <liguozhu@hisilicon.com>; zhangfei.gao@linaro.org; chensihang (A)
>> <chensihang1@hisilicon.com>
>> Subject: Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory
>> pin
>>
>> On 08.02.21 03:27, Song Bao Hua (Barry Song) wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: owner-linux-mm@kvack.org [mailto:owner-linux-mm@kvack.org] On Behalf Of
>>>> Matthew Wilcox
>>>> Sent: Monday, February 8, 2021 2:31 PM
>>>> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
>>>> Cc: Wangzhou (B) <wangzhou1@hisilicon.com>; linux-kernel@vger.kernel.org;
>>>> iommu@lists.linux-foundation.org; linux-mm@kvack.org;
>>>> linux-arm-kernel@lists.infradead.org; linux-api@vger.kernel.org; Andrew
>>>> Morton <akpm@linux-foundation.org>; Alexander Viro
>>>> <viro@zeniv.linux.org.uk>;
>>>> gregkh@linuxfoundation.org; jgg@ziepe.ca; kevin.tian@intel.com;
>>>> jean-philippe@linaro.org; eric.auger@redhat.com; Liguozhu (Kenneth)
>>>> <liguozhu@hisilicon.com>; zhangfei.gao@linaro.org; chensihang (A)
>>>> <chensihang1@hisilicon.com>
>>>> Subject: Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory
>>>> pin
>>>>
>>>> On Sun, Feb 07, 2021 at 10:24:28PM +0000, Song Bao Hua (Barry Song) wrote:
>>>>>>> In high-performance I/O cases, accelerators might want to perform
>>>>>>> I/O on memory without IO page faults, which can dramatically
>>>>>>> increase latency. Current memory-related APIs cannot meet this
>>>>>>> requirement: e.g. mlock only prevents memory from being swapped out
>>>>>>> to a backing device, but page migration can still trigger IO page faults.
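To make the mlock limitation concrete, here is a minimal userspace sketch. The helper name alloc_locked is hypothetical and not part of the proposed mempinfd interface. mlock() keeps the buffer resident, but nothing in it prevents the kernel from migrating the underlying pages.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map an anonymous buffer and try to mlock() it.
 * mlock() keeps the pages resident (they are not swapped out), but the
 * kernel may still migrate them (compaction, THP, NUMA balancing). A
 * migration changes the physical page behind the virtual address, which
 * a device sharing the process page tables via SVA would see as an IO
 * page fault. */
void *alloc_locked(size_t len, int *locked)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* May fail (ENOMEM/EPERM) if RLIMIT_MEMLOCK is too small;
     * report the outcome instead of treating it as fatal. */
    *locked = (mlock(buf, len) == 0);
    return buf;
}
```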
>>>>>>
>>>>>> Well ... we have two requirements.  The application wants to not take
>>>>>> page faults.  The system wants to move the application to a different
>>>>>> NUMA node in order to optimise overall performance.  Why should the
>>>>>> application's desires take precedence over the kernel's desires?  And why
>>>>>> should it be done this way rather than by the sysadmin using numactl to
>>>>>> lock the application to a particular node?
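For reference, the node binding that `numactl --membind` sets up can be sketched with the raw set_mempolicy(2) syscall and MPOL_BIND. This is an illustration under assumptions: the helper bind_memory_to_node is hypothetical, and it only handles node IDs that fit in one unsigned long. Binding restricts which node new pages come from. By itself it does not stop compaction or huge page creation from migrating pages within that node, which is the objection raised in the reply below.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/mempolicy.h>   /* MPOL_BIND */
#include <sys/syscall.h>       /* SYS_set_mempolicy */
#include <unistd.h>            /* syscall() */

/* Hypothetical helper: bind all future page allocations of the calling
 * thread to a single NUMA node. Returns 0 on success or -errno. */
int bind_memory_to_node(unsigned int node)
{
    unsigned long mask;

    /* This sketch only supports a single-word node mask. */
    if (node >= 8 * sizeof(mask))
        return -EINVAL;
    mask = 1UL << node;

    /* Pass one extra bit: to our understanding the kernel reads only
     * maxnode - 1 bits of the mask (numactl passes size + 1 as well). */
    if (syscall(SYS_set_mempolicy, MPOL_BIND, &mask,
                8 * sizeof(mask) + 1) != 0)
        return -errno;
    return 0;
}
```

In a container or on a kernel without CONFIG_NUMA the call may fail (for example with EPERM or ENOSYS), so callers should treat a failure as "binding not available" rather than as fatal.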
>>>>>
>>>>> The NUMA balancer is just one of many reasons for page migration. Even a
>>>>> simple alloc_pages() can cause memory migration within a single NUMA
>>>>> node or on a UMA system.
>>>>>
>>>>> The other reasons for page migration include but are not limited to:
>>>>> * memory move due to CMA
>>>>> * memory move due to huge pages creation
>>>>>
>>>>> We can hardly ask users to disable COMPACTION, CMA and huge pages
>>>>> for the whole system.
>>>>
>>>> You're dodging the question.  Should the CMA allocation fail because
>>>> another application is using SVA?
>>>>
>>>> I would say no.
>>>
>>> I would say no as well.
>>>
>>> While the IOMMU is enabled, CMA has almost only one user: the IOMMU
>>> driver, as other drivers depend on the IOMMU to use non-contiguous
>>> memory even though they still call dma_alloc_coherent().
>>>
>>> In the IOMMU driver, dma_alloc_coherent() is called during initialization
>>> and there are no new allocations afterwards, so it wouldn't have a
>>> runtime impact on SVA performance. Even if there are new allocations,
>>> CMA will fall back to the general alloc_pages(), and IOMMU drivers
>>> mostly allocate small amounts of memory for command queues.
>>>
>>> So I would say general compound pages and huge pages, especially
>>> transparent huge pages, would be bigger concerns than CMA for
>>> internal page migration within one NUMA node.
>>>
>>> Unlike CMA, the general alloc_pages() can get memory by moving
>>> pages other than the pinned ones.
>>>
>>> And there is no guarantee that we can always bind the memory of
>>> SVA applications to a single NUMA node, so NUMA balancing is
>>> still a concern.
>>>
>>> But I agree we need a way to make CMA succeed while the userspace
>>> pages are pinned. Since pinning has become pervasive in many drivers, I
>>> assume there is a way to handle this. Otherwise, APIs like
>>> V4L2_MEMORY_USERPTR[1] could make CMA fail, as there is no
>>> guarantee that userspace will allocate unmovable memory and no
>>> guarantee that the fallback path, alloc_pages(), can succeed when
>>> allocating large amounts of memory.
>>>
>>
>> Long term pinnings cannot go onto CMA-reserved memory, and there is
>> similar work to also fix ZONE_MOVABLE in that regard.
>>
>> https://lkml.kernel.org/r/20210125194751.1275316-1-pasha.tatashin@soleen.com
>>
>> This is one of the reasons I detest long-term pinning of pages where it
>> could be avoided. Take VFIO and RDMA as examples: they currently
>> can't work without it.
>>
>> What I read here: "DMA performance will be affected severely". That does
>> not sound like a compelling argument to me for long term pinnings.
>> Please find another way to achieve the same goal without long term
>> pinnings controlled by user space - e.g., controlling when migration
>> actually happens.
>>
>> For example, CMA/alloc_contig_range()/memory unplug are corner cases
>> that happen rarely, you shouldn't have to worry about them messing with
>> your DMA performance.
> 
> I agree CMA/alloc_contig_range()/memory unplug would be corner cases;
> the major cases would be THP and NUMA balancing. We could totally
> disable them, but it seems unreasonable to do that only because there
> is a process using SVA in the system.

Can't you use huge pages in your application that uses SVA and prevent 
THP/NUMA balancing from kicking in?
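A minimal sketch of that suggestion, under stated assumptions: the helper name alloc_sva_buffer is hypothetical. It relies on MAP_HUGETLB, which needs huge pages reserved (e.g. via vm.nr_hugepages). When none are available, it falls back to normal pages with MADV_NOHUGEPAGE so that khugepaged does not later collapse the range into a THP. As far as we understand, hugetlb mappings are also skipped by the automatic NUMA balancing scanner, but treat that as an assumption to verify against the running kernel.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: allocate a buffer intended for SVA DMA.
 * Preferred path: hugetlb pages (not THP, so not subject to THP
 * creation/collapse). Fallback path: normal anonymous pages opted
 * out of THP for this range only. For the hugetlb path, len should be
 * a multiple of the huge page size (the kernel rounds it up). */
void *alloc_sva_buffer(size_t len, int *used_hugetlb)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (buf != MAP_FAILED) {
        *used_hugetlb = 1;
        return buf;
    }

    /* No huge pages reserved (mmap fails, typically with ENOMEM):
     * fall back to normal pages. */
    *used_hugetlb = 0;
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Keep khugepaged from collapsing this range into a THP later.
     * Return value ignored: this fails with EINVAL on kernels built
     * without THP support, where there is nothing to opt out of. */
    madvise(buf, len, MADV_NOHUGEPAGE);
    return buf;
}
```

This confines the decision to the SVA application's own buffers instead of disabling THP or NUMA balancing system-wide.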

-- 
Thanks,

David / dhildenb


Thread overview: 33+ messages
2021-02-07  8:18 [RFC PATCH v3 0/2] mempinfd: Add new syscall to provide memory pin Zhou Wang
2021-02-07  8:18 ` [RFC PATCH v3 1/2] " Zhou Wang
2021-02-07 21:34   ` Matthew Wilcox
2021-02-07 22:24     ` Song Bao Hua (Barry Song)
2021-02-08  1:30       ` Matthew Wilcox
2021-02-08  2:27         ` Song Bao Hua (Barry Song)
2021-02-08  8:21           ` David Hildenbrand
2021-02-08 10:13             ` Song Bao Hua (Barry Song)
2021-02-08 10:37               ` David Hildenbrand [this message]
2021-02-08 20:52                 ` Song Bao Hua (Barry Song)
2021-02-08  2:18       ` David Rientjes
2021-02-08  5:34         ` Song Bao Hua (Barry Song)
2021-02-09  9:02     ` Zhou Wang
2021-02-07 21:51   ` Arnd Bergmann
2021-02-09  9:27     ` Zhou Wang
2021-02-07 22:02   ` Andy Lutomirski
2021-02-09  9:17     ` Zhou Wang
2021-02-09  9:37       ` Greg KH
2021-02-09 11:58         ` Zhou Wang
2021-02-09 12:01           ` Greg KH
2021-02-09 12:20             ` Zhou Wang
2021-02-10 18:50               ` Matthew Wilcox
2021-02-08  8:14   ` David Hildenbrand
2021-02-08 18:33     ` Jason Gunthorpe
2021-02-08 20:35       ` Song Bao Hua (Barry Song)
2021-02-08 21:30         ` Jason Gunthorpe
2021-02-09  3:01           ` Song Bao Hua (Barry Song)
2021-02-09 13:53             ` Jason Gunthorpe
2021-02-09 22:22               ` Song Bao Hua (Barry Song)
2021-02-10 18:04                 ` Jason Gunthorpe
2021-02-10 21:39                   ` Song Bao Hua (Barry Song)
2021-02-11 10:28                     ` David Hildenbrand
2021-02-07  8:18 ` [RFC PATCH v3 2/2] selftests/vm: add mempinfd test Zhou Wang
