From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 18 Apr 2017 12:00:20 -0600
From: Jason Gunthorpe
To: Dan Williams
Cc: Benjamin Herrenschmidt, Logan Gunthorpe, Bjorn Helgaas,
	Christoph Hellwig, Sagi Grimberg, "James E.J. Bottomley",
	"Martin K. Petersen", Jens Axboe, Steve Wise, Stephen Bates,
	Max Gurtovoy, Keith Busch, linux-pci@vger.kernel.org,
	linux-scsi, linux-nvme@lists.infradead.org,
	linux-rdma@vger.kernel.org, linux-nvdimm,
	"linux-kernel@vger.kernel.org", Jerome Glisse
Subject: Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
Message-ID: <20170418180020.GE7181@obsidianresearch.com>
References: <1492381396.25766.43.camel@kernel.crashing.org>
	<20170418164557.GA7181@obsidianresearch.com>

On Tue, Apr 18, 2017 at 10:27:47AM -0700, Dan Williams wrote:

> > FWIW, RDMA probably wouldn't want to use a p2mem device either, we
> > already have APIs that map BAR memory to user space, and would like
> > to keep using them. An 'enable P2P for BAR' helper function sounds
> > better to me.
>
> ...and I think it's not a helper function as much as asking the bus
> provider "can these two devices DMA to each other".

What I mean is that I could write this in an RDMA driver:

    /* Allow the memory in BAR 1 to be the target of P2P transactions */
    pci_enable_p2p_bar(dev, 1);

and not require anything else. (A rough sketch of what such a helper
might look like is at the end of this mail.)

> The "helper" is the DMA API redirecting through a software IOMMU
> that handles bus address translation differently than it would
> handle host memory DMA mapping.

Not sure; until we see what arches actually need to do here it is hard
to design common helpers.

Here are a few obvious things that arches will need to implement to
support this broadly:

- Virtualization might need to do a hypervisor call to get the right
  translation, or consult some hypervisor-specific description table.

- Anything using IOMMUs for virtualization will need to set up IOMMU
  permissions to allow the P2P flow, which might require translation
  to an address cookie.

- Fail if the PCI devices are in different domains, or set up hardware
  to do completer ID (bus/device/function) translation.

- All platforms can succeed if the PCI devices are under the same
  'segment', but where segments begin is somewhat platform-specific
  knowledge. (This is the 'same switch' idea Logan has talked about;
  a sketch of such a test is also at the end of this mail.)

So, we can eventually design helpers for various common scenarios, but
until we see what arch code actually needs to do it seems premature.
Much of this seems to involve interaction with some kind of hardware,
or consultation of some kind of currently platform-specific data, so
I'm not sure what a software IOMMU would be doing?

The main thing to agree on is that this code belongs under dma ops and
that arches have to support struct-page-mapped BAR addresses in their
dma ops inputs. Is that reasonable?
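
To be concrete about the helper idea, here is a minimal sketch of the
shape I have in mind. pci_enable_p2p_bar() does not exist in the
kernel today; the body below is a stub, and only the pci_resource_*()
accessors and dev_info() are real kernel API:

#include <linux/pci.h>

/*
 * Hypothetical helper -- nothing like this exists today.  Mark one
 * memory BAR of a device as a permitted target of peer-to-peer
 * transactions.  A real implementation would record the bus address
 * range somewhere the arch dma ops can find it, and call into an
 * arch/platform hook to program IOMMUs, check segments, and so on.
 */
int pci_enable_p2p_bar(struct pci_dev *pdev, int bar)
{
	resource_size_t start = pci_resource_start(pdev, bar);
	resource_size_t len = pci_resource_len(pdev, bar);

	if (!len || !(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
		return -EINVAL;

	/* arch/platform hook would go here */

	dev_info(&pdev->dev, "BAR %d [%pa, +%pa] enabled for P2P\n",
		 bar, &start, &len);
	return 0;
}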
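
And for the 'same segment / same switch' test, something along these
lines. p2p_same_segment() is made up, but pci_domain_nr() and
pci_upstream_bridge() are real, and walking the upstream chains is
all this sketch does:

/*
 * Made-up check: are two PCI devices in the same domain, and do they
 * share a common upstream bridge (i.e. hang off the same switch)?
 * Devices under different root ports share no upstream bridge, so
 * they fail this test.
 */
static bool p2p_same_segment(struct pci_dev *a, struct pci_dev *b)
{
	struct pci_dev *up_a, *up_b;

	if (pci_domain_nr(a->bus) != pci_domain_nr(b->bus))
		return false;

	for (up_a = pci_upstream_bridge(a); up_a;
	     up_a = pci_upstream_bridge(up_a))
		for (up_b = pci_upstream_bridge(b); up_b;
		     up_b = pci_upstream_bridge(up_b))
			if (up_a == up_b)
				return true;

	return false;
}

Jason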