Date: Tue, 18 Apr 2017 15:03:39 -0600
From: Jason Gunthorpe
To: Dan Williams
Cc: Logan Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas, Christoph Hellwig, Sagi Grimberg, "James E.J. Bottomley", "Martin K. Petersen", Jens Axboe, Steve Wise, Stephen Bates, Max Gurtovoy, Keith Busch, linux-pci@vger.kernel.org, linux-scsi, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm, linux-kernel@vger.kernel.org, Jerome Glisse
Subject: Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
Message-ID: <20170418210339.GA24257@obsidianresearch.com>
References: <1492381396.25766.43.camel@kernel.crashing.org> <20170418164557.GA7181@obsidianresearch.com> <20170418190138.GH7181@obsidianresearch.com>

On Tue, Apr 18, 2017 at 12:48:35PM -0700, Dan Williams wrote:
> > Yes, I noticed this problem too and that makes sense. It just means
> > every dma_ops will probably need to be modified to either support p2p
> > pages or fail on them. Though, the only real difficulty there is that
> > it will be a lot of work.
>
> I don't think you need to go touch all dma_ops, I think you can just
> arrange for devices that are going to do dma to get redirected to a
> p2p aware provider of operations that overrides the system default
> dma_ops. I.e. just touch get_dma_ops().
I don't follow: when does get_dma_ops() return a p2p aware provider? It
has no way to know whether the DMA is going to involve p2p, since
get_dma_ops() is called with the device initiating the DMA. So you'd
always return the P2P shim on any system that has registered P2P
memory?

Even so, how does this shim work? dma_ops are not really intended to be
stacked. How would we make unmap work, for instance? What happens when
the underlying iommu dma_ops natively understand p2p and don't want the
shim? I think this opens an even bigger can of worms.

Let's find a strategy to safely push this into dma_ops instead. What
about something more incremental like this:

- dma_ops set map_sg_p2p == map_sg when they are updated to support
  p2p; until then, DMA on P2P pages fails for those ops.
- Once all ops support p2p, we drop the 'if' and the ops->map_sg call
  and just call map_sg_p2p unconditionally.
- For now, the scatterlist maintains a bit, set as pages are added,
  indicating whether p2p memory might be present in the list.
- Unmap is the same for p2p and non-p2p; the underlying ops driver has
  to make it work.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0977317c6835c2..505ed7d502053d 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -103,6 +103,9 @@ struct dma_map_ops {
 	int (*map_sg)(struct device *dev, struct scatterlist *sg,
 		      int nents, enum dma_data_direction dir,
 		      unsigned long attrs);
+	int (*map_sg_p2p)(struct device *dev, struct scatterlist *sg,
+			  int nents, enum dma_data_direction dir,
+			  unsigned long attrs);
 	void (*unmap_sg)(struct device *dev,
 			 struct scatterlist *sg, int nents,
 			 enum dma_data_direction dir,
@@ -244,7 +247,15 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 	for_each_sg(sg, s, nents, i)
 		kmemcheck_mark_initialized(sg_virt(s), s->length);
 	BUG_ON(!valid_dma_direction(dir));
-	ents = ops->map_sg(dev, sg, nents, dir, attrs);
+
+	if (sg_has_p2p(sg)) {
+		if (ops->map_sg_p2p)
+			ents = ops->map_sg_p2p(dev, sg, nents, dir, attrs);
+		else
+			return 0;
+	} else
+		ents = ops->map_sg(dev, sg, nents, dir, attrs);
+
 	BUG_ON(ents < 0);
 	debug_dma_map_sg(dev, sg, nents, ents, dir);