From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v4 00/14] Copy Offload in NVMe Fabrics with P2P PCI Memory
From: Christian König <christian.koenig@amd.com>
Date: Thu, 3 May 2018 19:29:11 +0200
Message-ID: <38d866cf-f7b4-7118-d737-5a5dcd9f3784@amd.com>
References: <20180423233046.21476-1-logang@deltatee.com> <805645c1-ea40-2e57-88eb-5dd34e579b2e@deltatee.com> <3e4e0126-f444-8d88-6793-b5eb97c61f76@amd.com>
To: Logan Gunthorpe, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org
Cc: Stephen Bates, Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Bjorn Helgaas, Jason Gunthorpe, Max Gurtovoy, Dan Williams, Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed

On 03.05.2018 at 17:59, Logan Gunthorpe wrote:
> On 03/05/18 03:05 AM, Christian König wrote:
>> Second question is how do you want to handle things when devices are not
>> behind the same root port (which is perfectly possible in the cases I
>> deal with)?
> I think we need to implement a whitelist. If both root ports are in the
> whitelist and are on the same bus then we return a larger distance
> instead of -1.

Sounds good.

>> Third question: why multiple clients? That feels a bit like you are
>> pushing something special to your use case into the common PCI
>> subsystem. Something which usually isn't a good idea.
> No, I think this will be pretty standard. In the simple general case you
> are going to have one provider and at least two clients (one which
> writes the memory and one which reads it). However, one client is
> likely, but not necessarily, the same as the provider.

OK, that is the point where I'm stuck. Why do we need that in one
function call in the PCIe subsystem?

The problem, at least with GPUs, is that we seriously don't have that
information here, because the PCI subsystem might not be aware of all the
interconnections.

For example, it isn't uncommon to put multiple GPUs on one board. To the
PCI subsystem that looks like separate devices, but in reality all GPUs
are interconnected and can access each other's memory directly without
going over the PCIe bus.

I seriously don't want to model that in the PCI subsystem, but rather in
the driver. That's why it feels like a mistake to me to push all that
into the PCI function.

> In the NVMeof case, we might have N clients: 1 RDMA device and N-1 block
> devices. The code doesn't care which device provides the memory as it
> could be the RDMA device or one/all of the block devices (or, in theory,
> a completely separate device with P2P-able memory). However, it does
> require that all devices involved are accessible per
> pci_p2pdma_distance() or it won't use P2P transactions.
>
> I could also imagine other use cases: e.g. an RDMA NIC sends data to a
> GPU for processing and then sends the data to an NVMe device for storage
> (or vice-versa). In this case we have 3 clients and one provider.

Why can't we model that as two separate transactions?

E.g. one from the RDMA NIC to the GPU memory, and another one from the
GPU memory to the NVMe device.

That would also match how I get this information from userspace.

>> As far as I can see we need a function which returns the distance between
>> an initiator and a target device. This function then returns -1 if the
>> transaction can't be made and a positive value otherwise.
> If you need to make a simpler convenience function for your use case I'm
> not against it.

Yeah, same for me. If Bjorn is OK with those specialized NVM functions,
then I'm fine with that as well.

I think it would just be more convenient if we could come up with
functions which can handle all use cases, because there still seem to be
a lot of similarities.

>
>> We also need to give the direction of the transaction and have a
>> whitelist of root complex PCI-IDs which can handle P2P transactions from
>> different ports for a certain DMA direction.
> Yes. In the NVMeof case we need all devices to be able to DMA in both
> directions so we did not need the DMA direction. But I can see this
> being useful once we add the whitelist.

OK, I agree that can be added later on. For simplicity let's assume for
now we always do bidirectional transfers.

Thanks for the explanation,
Christian.
>
> Logan
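
The two API shapes discussed in this thread (a pairwise initiator/target
distance versus a single call covering one provider and several clients)
can be illustrated with a small user-space sketch. This is a toy model
under stated assumptions, not kernel code: struct toy_node,
toy_p2p_distance() and toy_p2p_distance_many() are hypothetical names
invented here, and the hop-counting rule is an assumption for
illustration, not the behaviour of pci_p2pdma_distance() in the patch
set. It only shows that a multi-client check could be layered on top of
a pairwise function that returns -1 when a transaction can't be made.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a PCI topology node; not a kernel structure. */
struct toy_node {
	const struct toy_node *upstream; /* NULL for the root complex */
	bool p2p_whitelisted;            /* only consulted on a root complex */
};

/* Number of hops from a node up to the root of its tree. */
static int toy_depth(const struct toy_node *n)
{
	int d = 0;

	while (n->upstream) {
		n = n->upstream;
		d++;
	}
	return d;
}

/*
 * Pairwise distance between an initiator and a target: the hop count
 * through their closest common upstream node. Returns -1 if the two
 * devices share no upstream node, or if the only common node is a root
 * complex that is not whitelisted (assumed rule for this sketch).
 */
static int toy_p2p_distance(const struct toy_node *a, const struct toy_node *b)
{
	int da = toy_depth(a), db = toy_depth(b), hops = 0;

	/* Bring both nodes to the same depth first. */
	while (da > db) { a = a->upstream; da--; hops++; }
	while (db > da) { b = b->upstream; db--; hops++; }

	/* Walk up in lockstep until the paths meet. */
	while (a != b) {
		if (!a->upstream || !b->upstream)
			return -1; /* separate trees: no common node */
		a = a->upstream;
		b = b->upstream;
		hops += 2;
	}

	/* a is now the common node; crossing a root complex needs the whitelist. */
	if (!a->upstream && !a->p2p_whitelisted)
		return -1;
	return hops;
}

/*
 * Multi-client variant built purely from pairwise calls: sum of the
 * distances from every client to the provider, or -1 if any single
 * client cannot reach the provider.
 */
static int toy_p2p_distance_many(const struct toy_node *provider,
				 const struct toy_node *const *clients,
				 size_t num_clients)
{
	int total = 0;

	for (size_t i = 0; i < num_clients; i++) {
		int d = toy_p2p_distance(clients[i], provider);

		if (d < 0)
			return -1;
		total += d;
	}
	return total;
}
```

As a usage example, take a root complex with two root ports, a switch
with an NVMe device and a NIC below the first port, and a GPU directly
below the second port. In this model the NIC/NVMe distance is 2 (through
the switch), while the NIC/GPU distance is -1 until the root complex is
marked as whitelisted, after which it becomes the larger value 5.
Interconnects that the PCI topology cannot see, such as multiple GPUs
linked on one board, are deliberately outside this model; per the
discussion above they would be handled in the driver.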