Date: Thu, 5 Jan 2017 18:23:52 -0500
From: Jerome Glisse
To: Jason Gunthorpe
Cc: Jerome Glisse, "Deucher, Alexander", "'linux-kernel@vger.kernel.org'",
    "'linux-rdma@vger.kernel.org'", "'linux-nvdimm@lists.01.org'",
    "'Linux-media@vger.kernel.org'", "'dri-devel@lists.freedesktop.org'",
    "'linux-pci@vger.kernel.org'", "Kuehling, Felix",
    "Sagalovitch, Serguei", "Blinzer, Paul", "Koenig, Christian",
    "Suthikulpanit, Suravee", "Sander, Ben", hch@infradead.org,
    david1.zhou@amd.com, qiang.yu@amd.com
Subject: Re: Enabling peer to peer device transactions for PCIe devices
Message-ID: <20170105232352.GB6426@redhat.com>
In-Reply-To: <20170105224215.GA3855@obsidianresearch.com>
References: <20170105183927.GA5324@gmail.com>
 <20170105190113.GA12587@obsidianresearch.com>
 <20170105195424.GB2166@redhat.com>
 <20170105200719.GB31047@obsidianresearch.com>
 <20170105201935.GC2166@redhat.com>
 <20170105224215.GA3855@obsidianresearch.com>

On Thu, Jan 05, 2017 at 03:42:15PM -0700, Jason Gunthorpe wrote:
> On Thu, Jan 05, 2017 at 03:19:36PM -0500, Jerome Glisse wrote:
>
> > > Always having a VMA changes the discussion - the question is how to
> > > create a VMA that represents IO device memory, and how do DMA
> > > consumers extract the correct information from that VMA to pass to
> > > the kernel DMA API so it can set up peer-peer DMA.
> >
> > Well my point is that it can't be. In the HMM case, inside a single
> > VMA you
> [..]
>
> > In the GPUDirect case the idea is that you have a specific device VMA
> > that you map for peer to peer.
>
> [..]
>
> I still don't understand what you are driving at - you've said in both
> cases a user VMA exists.

In the former case no, there is no VMA directly, but if you want one then
the device can provide one. Such a VMA is useless though, as CPU access is
not expected.

> From my perspective in RDMA, all I want is a core kernel flow to
> convert a '__user *' into a scatter list of DMA addresses, that works
> no matter what is backing that VMA, be it HMM, a 'hidden' GPU object,
> or struct page memory.
>
> A '__user *' pointer is the only way to setup a RDMA MR, and I see no
> reason to have another API at this time.
>
> The details of how to translate to a scatter list are a MM subject,
> and the MM folks need to get
>
> I just don't care if that routine works at a page level, or a whole
> VMA level, or some combination of both, that is up to the MM team to
> figure out :)

And that is what I am trying to get across. There are two cases here: what
exists on today's hardware, things like GPUDirect, which work at the VMA
level, versus where some new hardware is going, where we want to do things
at the page level. Each requires a different API at a different level.
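
To make the starting point concrete, the struct page case that exists today
looks more or less like this (just a sketch of the usual get_user_pages()
path, with a made up helper name, not any driver's exact code):

    #include <linux/mm.h>
    #include <linux/scatterlist.h>
    #include <linux/dma-mapping.h>
    #include <linux/slab.h>

    /*
     * Sketch only: roughly what an RDMA driver does today to turn a
     * '__user *' range into DMA addresses.  Error handling and page
     * unpinning are left out.
     */
    static int sketch_map_user_range(struct device *dev, unsigned long uaddr,
                                     size_t len, struct sg_table *sgt)
    {
            unsigned int npages = DIV_ROUND_UP(offset_in_page(uaddr) + len,
                                               PAGE_SIZE);
            struct page **pages;
            int pinned;

            pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);
            if (!pages)
                    return -ENOMEM;

            /* Only works because there is a struct page behind the VMA. */
            pinned = get_user_pages_fast(uaddr & PAGE_MASK, npages, 1, pages);
            if (pinned <= 0)
                    return pinned ? pinned : -EFAULT;

            sg_alloc_table_from_pages(sgt, pages, pinned,
                                      offset_in_page(uaddr), len, GFP_KERNEL);

            return dma_map_sg(dev, sgt->sgl, sgt->orig_nents,
                              DMA_BIDIRECTIONAL);
    }

The whole question is what replaces get_user_pages_fast() here once the VMA
is a device VMA with no struct page behind it, and whether the answer is
the same for the HMM case and for the GPU object case.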
What I was trying to get across is that no matter what level you consider,
in the end you still need something at the DMA API level. And the two
different use cases (device VMA or regular VMA) mean two different APIs for
the device driver.

> > a page level. Expectation here is that the GPU userspace exposes a
> > special API to allow RDMA to directly happen on a GPU object allocated
> > through the GPU specific API (ie it is not regular memory and it is
> > not accessible by the CPU).
>
> So, how do you identify these GPU objects? How do you expect RDMA to
> convert them to scatter lists? How will ODP work?

No ODP on those. If you want a VMA, the GPU device driver can provide one.
GPU objects are disjoint from regular memory (memory coming from some form
of mmap). They are created through ioctl and in many cases are never
exposed to the CPU. They only exist inside the GPU driver realm.
Nonetheless there are use cases where exchanging those objects across
computers over a network makes sense. I am not an end user here :)

> > > We have MMU notifiers to handle this today in RDMA. Async RDMA MR
> > > Invalidate like you see in the above out of tree patches is totally
> > > crazy and shouldn't be in mainline. Use ODP capable RDMA hardware.
> >
> > Well there is still a large base of hardware that does not have such
> > a feature and some people would like to be able to keep using it.
>
> Hopefully someone will figure out how to do that without the crazy
> async MR invalidation.

Personally I don't care too much about this old hardware and thus I am
fine with not supporting it. The open source userspace is playing catchup
and doing features for old hardware probably does not make sense.

Cheers,
Jérôme
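
P.S.: to make the GPU object case a bit more concrete, the kind of
userspace flow I have in mind is roughly the following. The ioctl, the
struct and the registration call below are invented names, none of these
entry points exist today, they only illustrate registering by handle
rather than by '__user *' pointer:

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <infiniband/verbs.h>

    /* Hypothetical GPU driver ioctl argument: the object has no CPU
     * mapping, all userspace ever sees is an opaque handle. */
    struct fake_gpu_alloc {
            uint64_t size;
            uint32_t handle;
    };

    void example(struct ibv_pd *pd)
    {
            int gpu_fd = open("/dev/dri/renderD128", O_RDWR);
            struct fake_gpu_alloc args = { .size = 1 << 20 };
            struct ibv_mr *mr;

            /* Object lives in VRAM only, created through the GPU
             * specific API (FAKE_GPU_ALLOC is a made up ioctl). */
            ioctl(gpu_fd, FAKE_GPU_ALLOC, &args);

            /* RDMA side: hand (fd, handle) to the RDMA stack so it can
             * ask the GPU driver for bus addresses directly, no
             * get_user_pages involved (fake_ibv_reg_gpu_mr is made up). */
            mr = fake_ibv_reg_gpu_mr(pd, gpu_fd, args.handle, args.size,
                                     IBV_ACCESS_LOCAL_WRITE |
                                     IBV_ACCESS_REMOTE_READ);
    }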