Date: Thu, 3 May 2018 09:59:54 -0600
From: Logan Gunthorpe <logang@deltatee.com>
To: Christian König <christian.koenig@amd.com>, linux-kernel@vger.kernel.org,
 linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org,
 linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org,
 linux-block@vger.kernel.org
Cc: Stephen Bates <sbates@raithlin.com>, Christoph Hellwig <hch@lst.de>,
 Jens Axboe <axboe@kernel.dk>, Keith Busch <keith.busch@intel.com>,
 Sagi Grimberg <sagi@grimberg.me>, Bjorn Helgaas <bhelgaas@google.com>,
 Jason Gunthorpe <jgg@mellanox.com>, Max Gurtovoy <maxg@mellanox.com>,
 Dan Williams <dan.j.williams@intel.com>, Jérôme Glisse <jglisse@redhat.com>,
 Benjamin Herrenschmidt <benh@kernel.crashing.org>,
 Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH v4 00/14] Copy Offload in NVMe Fabrics with P2P PCI Memory
In-Reply-To: <3e4e0126-f444-8d88-6793-b5eb97c61f76@amd.com>

On 03/05/18 03:05 AM, Christian König wrote:
> Ok, I'm still missing the big picture here. First question is what is
> the P2PDMA provider?

Well, there's some pretty good documentation in the patchset for this, but
in short, a provider is a device that provides some kind of P2P resource
(i.e. BAR memory, or perhaps a doorbell register -- only memory is
supported at this time).

> Second question is how to you want to handle things when device are not
> behind the same root port (which is perfectly possible in the cases I
> deal with)?

I think we need to implement a whitelist. If both root ports are in the
whitelist and are on the same bus, then we return a larger distance
instead of -1.

> Third question why multiple clients? That feels a bit like you are
> pushing something special to your use case into the common PCI
> subsystem. Something which usually isn't a good idea.

No, I think this will be pretty standard. In the simple general case you
are going to have one provider and at least two clients (one which writes
the memory and one which reads it). However, one client is likely, but not
necessarily, the same device as the provider.

In the NVMe-oF case, we might have N clients: 1 RDMA device and N-1 block
devices. The code doesn't care which device provides the memory as it
could be the RDMA device or one/all of the block devices (or, in theory, a
completely separate device with P2P-able memory).
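
To make the provider/client split a bit more concrete, the two sides look
roughly like the sketch below. Take the exact names and signatures with a
grain of salt -- this is from memory, so treat helpers like
pci_p2pdma_add_client()/pci_p2pmem_find() (and the BAR number) as
illustrative and see the documentation patch in the series for the real
interface:

#include <linux/list.h>
#include <linux/pci.h>
#include <linux/pci-p2pdma.h>

/*
 * Provider side (sketch only): a driver with a suitable BAR exposes it
 * as P2P memory and publishes it so orchestrators can find it.
 */
static int example_register_p2p_provider(struct pci_dev *pdev)
{
        int ret;

        /* Expose all of BAR 4 (illustrative choice) as p2pdma memory */
        ret = pci_p2pdma_add_resource(pdev, 4, pci_resource_len(pdev, 4), 0);
        if (ret)
                return ret;

        /* Let orchestrators (e.g. nvmet) pick this device automatically */
        pci_p2pmem_publish(pdev, true);

        return 0;
}

/*
 * Client side (sketch only): the orchestrator collects every device that
 * will read or write the memory, then asks for a provider that
 * pci_p2pdma_distance() says is usable by all of them.
 */
static struct pci_dev *example_pick_p2p_provider(struct device *rdma_dev,
                                                 struct device *nvme_dev)
{
        LIST_HEAD(clients);
        struct pci_dev *provider = NULL;

        /* Every device that will touch the buffers is a client */
        if (pci_p2pdma_add_client(&clients, rdma_dev) ||
            pci_p2pdma_add_client(&clients, nvme_dev))
                goto out;

        /* Returns a published provider usable by all clients, or NULL */
        provider = pci_p2pmem_find(&clients);
out:
        pci_p2pdma_client_list_free(&clients);
        return provider;
}

The memory itself then comes from pci_alloc_p2pmem() on whichever provider
that search returns.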
However, the code does require that all devices involved are accessible per
pci_p2pdma_distance() or it won't use P2P transactions.

I could also imagine other use cases: e.g. an RDMA NIC sends data to a GPU
for processing and then sends the data to an NVMe device for storage (or
vice versa). In this case we have 3 clients and one provider.

> As far as I can see we need a function which return the distance between
> a initiator and target device. This function then returns -1 if the
> transaction can't be made and a positive value otherwise.

If you need to make a simpler convenience function for your use case, I'm
not against it.

> We also need to give the direction of the transaction and have a
> whitelist root complex PCI-IDs which can handle P2P transactions from
> different ports for a certain DMA direction.

Yes. In the NVMe-oF case we need all devices to be able to DMA in both
directions, so we did not need the DMA direction. But I can see this being
useful once we add the whitelist.

Logan
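
P.S. For the whitelist, roughly what I'm picturing is sketched below. Every
name in it is made up and it glosses over where exactly it would hook into
pci_p2pdma_distance(), but it shows the behaviour I described above:
devices under different root ports get -1 unless both root ports are
whitelisted and on the same bus, in which case we hand back an arbitrarily
large distance.

#include <linux/pci.h>

/*
 * Hypothetical whitelist of root complexes known to route P2P TLPs
 * between their root ports (no real IDs here -- placeholder only).
 */
static const struct pci_device_id example_p2p_root_whitelist[] = {
        /* { PCI_DEVICE(vendor, device) } entries would go here */
        { }
};

static struct pci_dev *example_find_root_port(struct pci_dev *pdev)
{
        struct pci_dev *up;

        /* Walk up to the topmost bridge above the device */
        while ((up = pci_upstream_bridge(pdev)))
                pdev = up;

        if (pci_is_pcie(pdev) && pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT)
                return pdev;

        return NULL;
}

/*
 * Only reached once the normal upstream walk has failed to find a common
 * port: return -1 as today, unless both root ports are whitelisted and
 * sit on the same root bus, in which case return a large distance.
 */
static int example_cross_root_port_distance(struct pci_dev *a,
                                            struct pci_dev *b)
{
        struct pci_dev *ra = example_find_root_port(a);
        struct pci_dev *rb = example_find_root_port(b);

        if (!ra || !rb || ra->bus->number != rb->bus->number)
                return -1;

        if (!pci_match_id(example_p2p_root_whitelist, ra) ||
            !pci_match_id(example_p2p_root_whitelist, rb))
                return -1;

        return 16;      /* arbitrary "worse than any switch path" value */
}

The simpler pairwise convenience function you're asking for would then just
be a thin wrapper that builds a single-entry client list and calls
pci_p2pdma_distance().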