From: Logan Gunthorpe
To: Jason Gunthorpe
Cc: Christoph Hellwig, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, Jens Axboe, Bjorn Helgaas, Dan Williams, Sagi Grimberg, Keith Busch, Stephen Bates
Subject: Re: [RFC PATCH 00/28] Removing struct page from P2PDMA
Date: Wed, 26 Jun 2019 14:45:38 -0600
Message-ID: <8a0a08c3-a537-bff6-0852-a5f337a70688@deltatee.com>
In-Reply-To: <20190626202107.GA5850@ziepe.ca>
X-Mailing-List: linux-pci@vger.kernel.org

On 2019-06-26 2:21 p.m., Jason Gunthorpe wrote:
> On Wed, Jun 26, 2019 at 12:31:08PM -0600, Logan Gunthorpe wrote:
>>> we have a hole behind len where we could store flag. Preferably
>>> optionally based on a P2P or other magic memory types config
>>> option so that 32-bit systems with 32-bit phys_addr_t actually
>>> benefit from the smaller and better packing structure.
>>
>> That seems sensible. The one thing that's unclear though is how to get
>> the PCI Bus address when appropriate. Can we pass that in instead of the
>> phys_addr with an appropriate flag? Or will we need to pass the actual
>> physical address and then, at the map step, the driver has to some how
>> lookup the PCI device to figure out the bus offset?
>
> I agree with CH, if we go down this path it is a layering violation
> for the thing injecting bio's into the block stack to know what struct
> device they egress&dma map on just to be able to do the dma_map up
> front.

Not sure I agree with this statement.
The p2pdma code already *must* know and access the pci_dev of the DMA
device before it submits the IO, to know whether it's valid to allocate
and use P2P memory at all. This is why the submitting driver has a lot
of the information needed to map this memory that the mapping driver
does not.

> So we must be able to go from this new phys_addr_t&flags to some BAR
> information during dma_map.
>
> For instance we could use a small hash table of the upper phys addr
> bits, or an interval tree, to do the lookup.

Yes, if we're going to take a hard stance on this. But an interval tree
(or similar) lookup is a lot more work for the CPU, and it may not be
strictly necessary if we could just pass better information down from
the submitting driver to the mapping driver.

> The bar info would give the exporting struct device and any other info
> we need to make the iommu mapping.

Well, the IOMMU mapping is the normal thing the mapping driver will
always do. We'd really just need the submitting driver to, when
appropriate, inform the mapping driver that this is a PCI bus address
and not to call dma_map_xxx(). Then, for special mappings for the CMB
like Christoph is talking about, it's simply a matter of doing a range
compare on the PCI bus address and converting the bus address to a BAR
and offset.

Logan
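
To make the "range compare" above a bit more concrete, here is a rough
userspace sketch, not kernel code. struct bar_window, bus_addr_to_bar()
and the table layout are hypothetical names made up for this example;
nothing like them exists in the posted series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one BAR as seen on the PCI bus. */
struct bar_window {
	int      bar;        /* BAR index */
	uint64_t bus_start;  /* first PCI bus address covered by the BAR */
	uint64_t len;        /* size of the BAR in bytes */
};

/*
 * Range-compare a PCI bus address against a small table of BARs.
 * On a hit, store the BAR index and the offset into that BAR and
 * return true; return false if no BAR covers the address.
 */
static bool bus_addr_to_bar(const struct bar_window *tbl, size_t n,
			    uint64_t bus_addr, int *bar, uint64_t *offset)
{
	for (size_t i = 0; i < n; i++) {
		/* Subtract before comparing so the check cannot overflow. */
		if (bus_addr >= tbl[i].bus_start &&
		    bus_addr - tbl[i].bus_start < tbl[i].len) {
			*bar = tbl[i].bar;
			*offset = bus_addr - tbl[i].bus_start;
			return true;
		}
	}
	return false;
}
```

A device only has a handful of BARs, so a linear scan like this is
probably cheap compared to a hash table or interval tree keyed on
physical addresses, but that is an assumption and not something that
has been measured.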