dri-devel Archive on lore.kernel.org
 help / color / Atom feed
From: "Christian König" <christian.koenig@amd.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	Doug Ledford <dledford@redhat.com>,
	"Vetter, Daniel" <daniel.vetter@intel.com>,
	"Xiong, Jianxin" <jianxin.xiong@intel.com>
Subject: Re: [RFC PATCH v2 0/3] RDMA: add dma-buf support
Date: Wed, 1 Jul 2020 14:55:50 +0200
Message-ID: <34077a9f-7924-fbb3-04d9-cd20243f815c@amd.com> (raw)
In-Reply-To: <20200701123904.GM25301@ziepe.ca>

Am 01.07.20 um 14:39 schrieb Jason Gunthorpe:
> On Wed, Jul 01, 2020 at 11:03:06AM +0200, Christian König wrote:
>> Am 30.06.20 um 20:46 schrieb Xiong, Jianxin:
>>>> From: Jason Gunthorpe <jgg@ziepe.ca>
>>>> Sent: Tuesday, June 30, 2020 10:35 AM
>>>> To: Xiong, Jianxin <jianxin.xiong@intel.com>
>>>> Cc: linux-rdma@vger.kernel.org; Doug Ledford <dledford@redhat.com>; Sumit Semwal <sumit.semwal@linaro.org>; Leon Romanovsky
>>>> <leon@kernel.org>; Vetter, Daniel <daniel.vetter@intel.com>; Christian Koenig <christian.koenig@amd.com>
>>>> Subject: Re: [RFC PATCH v2 0/3] RDMA: add dma-buf support
>>>> On Tue, Jun 30, 2020 at 05:21:33PM +0000, Xiong, Jianxin wrote:
>>>>>>> Heterogeneous Memory Management (HMM) utilizes
>>>>>>> mmu_interval_notifier and ZONE_DEVICE to support shared virtual
>>>>>>> address space and page migration between system memory and device
>>>>>>> memory. HMM doesn't support pinning device memory because pages
>>>>>>> located on device must be able to migrate to system memory when
>>>>>>> accessed by CPU. Peer-to-peer access is possible if the peer can
>>>>>>> handle page fault. For RDMA, that means the NIC must support on-demand paging.
>>>>>> peer-peer access is currently not possible with hmm_range_fault().
>>>>> Currently hmm_range_fault() always sets the cpu access flag and device
>>>>> private pages are migrated to the system RAM in the fault handler.
>>>>> However, it's possible to have a modified code flow to keep the device
>>>>> private page info for use with peer to peer access.
>>>> Sort of, but only within the same device, RDMA or anything else generic can't reach inside a DEVICE_PRIVATE and extract anything useful.
>>> But pfn is supposed to be all that is needed.
>>>>>> So.. this patch doesn't really do anything new? We could just make a MR against the DMA buf mmap and get to the same place?
>>>>> That's right, the patch alone is just half of the story. The
>>>>> functionality depends on availability of dma-buf exporter that can pin
>>>>> the device memory.
>>>> Well, what do you want to happen here? The RDMA parts are reasonable, but I don't want to add new functionality without a purpose - the
>>>> other parts need to be settled out first.
>>> At the RDMA side, we mainly want to check if the changes are acceptable. For example,
>>> the part about adding 'fd' to the device ops and the ioctl interface. All the previous
>>> comments are very helpful for us to refine the patch so that we can be ready when
>>> GPU side support becomes available.
>>>> The need for the dynamic mapping support for even the current DMA Buf hacky P2P users is really too bad. Can you get any GPU driver to
>>>> support non-dynamic mapping?
>>> We are working on direct direction.
>>>>>>> migrate to system RAM. This is due to the lack of knowledge about
>>>>>>> whether the importer can perform peer-to-peer access and the lack
>>>>>>> of resource limit control measure for GPU. For the first part, the
>>>>>>> latest dma-buf driver has a peer-to-peer flag for the importer,
>>>>>>> but the flag is currently tied to dynamic mapping support, which
>>>>>>> requires on-demand paging support from the NIC to work.
>>>>>> ODP for DMA buf?
>>>>> Right.
>>>> Hum. This is not actually so hard to do. The whole dma buf proposal would make a lot more sense if the 'dma buf MR' had to be the
>>>> dynamic kind and the driver had to provide the faulting. It would not be so hard to change mlx5 to be able to work like this, perhaps. (the
>>>> locking might be a bit tricky though)
>>> The main issue is that not all NICs support ODP.
>> You don't need on-demand paging support from the NIC for dynamic mapping to
>> work.
>> All you need is the ability to stop wait for ongoing accesses to end and
>> make sure that new ones grab a new mapping.
> Swap and flush isn't a general HW ability either..
> I'm unclear how this could be useful, it is guarenteed to corrupt
> in-progress writes?
> Did you mean pause, swap and resume? That's ODP.

Yes, something like this. And good to know, never heard of ODP.

On the GPU side we can pipeline things, e.g. you can program the 
hardware that page tables are changed at a certain point in time.

So what we do is when we get a notification that a buffer will move 
around is to mark this buffer in our structures as invalid and return a 
fence to so that the exporter is able to wait for ongoing stuff to finish.

The actual move then happens only after the ongoing operations on the 
GPU are finished and on the next operation we grab the new location of 
the buffer and re-program the page tables to it.

This way all the CPU does is really just planning asynchronous page 
table changes which are executed on the GPU later on.

You can of course do it synchronized as well, but this would hurt our 
performance pretty badly.


> Jason

dri-devel mailing list

  reply index

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1593451903-30959-1-git-send-email-jianxin.xiong@intel.com>
     [not found] ` <20200629185152.GD25301@ziepe.ca>
     [not found]   ` <MW3PR11MB4555A99038FA0CFC3ED80D3DE56F0@MW3PR11MB4555.namprd11.prod.outlook.com>
     [not found]     ` <20200630173435.GK25301@ziepe.ca>
2020-06-30 18:46       ` Xiong, Jianxin
2020-06-30 19:17         ` Jason Gunthorpe
2020-06-30 20:08           ` Xiong, Jianxin
2020-07-02 12:27             ` Jason Gunthorpe
2020-07-01  9:03         ` Christian König
2020-07-01 12:07           ` Daniel Vetter
2020-07-01 12:14             ` Daniel Vetter
2020-07-01 12:39           ` Jason Gunthorpe
2020-07-01 12:55             ` Christian König [this message]
2020-07-01 15:42               ` Daniel Vetter
2020-07-01 17:15                 ` Jason Gunthorpe
2020-07-02 13:10                   ` Daniel Vetter
2020-07-02 13:29                     ` Jason Gunthorpe
2020-07-02 14:50                       ` Christian König
2020-07-02 18:15                         ` Daniel Vetter
2020-07-03 12:03                           ` Jason Gunthorpe
2020-07-03 12:52                             ` Daniel Vetter
2020-07-03 13:14                               ` Jason Gunthorpe
2020-07-03 13:21                                 ` Christian König
2020-07-07 21:58                                   ` Xiong, Jianxin
2020-07-08  9:38                                     ` Christian König
2020-07-08  9:49                                       ` Daniel Vetter
2020-07-08 14:20                                         ` Christian König
2020-07-08 14:33                                           ` Alex Deucher
2020-06-30 18:56 ` Xiong, Jianxin
     [not found] ` <1593451903-30959-2-git-send-email-jianxin.xiong@intel.com>
2020-06-30 19:04   ` [RFC PATCH v2 1/3] RDMA/umem: Support importing dma-buf as user memory region Xiong, Jianxin
     [not found] ` <1593451903-30959-3-git-send-email-jianxin.xiong@intel.com>
2020-06-30 19:04   ` [RFC PATCH v2 2/3] RDMA/core: Expand the driver method 'reg_user_mr' to support dma-buf Xiong, Jianxin
     [not found] ` <1593451903-30959-4-git-send-email-jianxin.xiong@intel.com>
2020-06-30 19:05   ` [RFC PATCH v2 3/3] RDMA/uverbs: Add uverbs command for dma-buf based MR registration Xiong, Jianxin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=34077a9f-7924-fbb3-04d9-cd20243f815c@amd.com \
    --to=christian.koenig@amd.com \
    --cc=daniel.vetter@intel.com \
    --cc=dledford@redhat.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jgg@ziepe.ca \
    --cc=jianxin.xiong@intel.com \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

dri-devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/dri-devel/0 dri-devel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dri-devel dri-devel/ https://lore.kernel.org/dri-devel \
	public-inbox-index dri-devel

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git