From mboxrd@z Thu Jan 1 00:00:00 1970
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
Date: Wed, 12 Apr 2017 15:22:12 +1000
Message-ID: <1491974532.7236.43.camel@kernel.crashing.org>
In-Reply-To: <1490911959-5146-1-git-send-email-logang@deltatee.com>
References: <1490911959-5146-1-git-send-email-logang@deltatee.com>
To: Logan Gunthorpe, Christoph Hellwig, Sagi Grimberg,
    "James E.J. Bottomley", "Martin K. Petersen", Jens Axboe, Steve Wise,
    Stephen Bates, Max Gurtovoy, Dan Williams, Keith Busch, Jason Gunthorpe
Cc: linux-pci@vger.kernel.org, linux-scsi@vger.kernel.org,
    linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org,
    linux-nvdimm@ml01.01.org, linux-kernel@vger.kernel.org
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Thu, 2017-03-30 at 16:12 -0600, Logan Gunthorpe wrote:
> Hello,
>
> As discussed at LSF/MM we'd like to present our work to enable
> copy offload support in NVMe fabrics RDMA targets. We'd appreciate
> some review and feedback from the community on our direction.
> This series is not intended to go upstream at this point.
>
> The concept here is to use memory that's exposed on a PCI BAR as
> data buffers in the NVMe target code such that data can be transferred
> from an RDMA NIC to the special memory and then directly to an NVMe
> device, avoiding system memory entirely. The upside of this is better
> QoS for applications running on the CPU utilizing memory, and lower
> PCI bandwidth required to the CPU (such that systems could be designed
> with fewer lanes connected to the CPU). However, the trade-off is
> presently a reduction in overall throughput (largely due to hardware
> issues that would certainly improve in the future).

Another issue of course is that not all systems support P2P
between host bridges :-) (Though almost all switches can enable it).

> Due to these trade-offs we've designed the system to only enable using
> the PCI memory in cases where the NIC, NVMe devices and memory are all
> behind the same PCI switch.

Ok. I suppose that's a reasonable starting point. I haven't looked
at the patches in detail yet, but it would be nice if that policy was in
a well-isolated component so it can potentially be affected by
arch/platform code.
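To make that concrete, the check such a component would perform could look
roughly like the sketch below: walk pci_upstream_bridge() to find the switch
upstream port above each device and compare. The helper names are made up
here, the series may well do this differently, and this is exactly the sort
of decision arch/platform code might want to relax or veto.

#include <linux/pci.h>

/* Illustrative only: find the upstream port of the PCIe switch the
 * device sits behind, or NULL if it hangs directly off a root port
 * (or is not PCIe at all). */
static struct pci_dev *p2p_switch_port(struct pci_dev *pdev)
{
	struct pci_dev *up = pci_upstream_bridge(pdev);

	while (up) {
		if (pci_is_pcie(up) &&
		    pci_pcie_type(up) == PCI_EXP_TYPE_UPSTREAM)
			return up;
		up = pci_upstream_bridge(up);
	}
	return NULL;
}

/* Only allow P2P when both devices resolve to the same switch. */
static bool p2p_same_switch(struct pci_dev *a, struct pci_dev *b)
{
	struct pci_dev *sw = p2p_switch_port(a);

	return sw && sw == p2p_switch_port(b);
}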
Do you handle funky address translation too ? IE. the fact that the PCI
addresses aren't the same as the CPU physical addresses for a BAR ?
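For reference, the distinction in question is between a BAR's CPU physical
address and its address as seen on the PCI bus; these are identical on most
x86 machines but not on several other platforms, and anything that hands a
peer device an address to DMA to must use the bus address. A trivial sketch
(ignoring IOMMUs, which complicate this further):

#include <linux/pci.h>

/* On many x86 boxes these two values are the same, but on e.g. powerpc
 * or arm64 the BAR may live at a different address on the bus than in
 * the CPU physical address map. */
static void show_bar_translation(struct pci_dev *pdev, int bar)
{
	phys_addr_t cpu_addr = pci_resource_start(pdev, bar);
	dma_addr_t bus_addr = pci_bus_address(pdev, bar);

	dev_info(&pdev->dev, "BAR%d: cpu %pap, bus %pad\n",
		 bar, &cpu_addr, &bus_addr);
}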
> This will mean many setups that could likely
> work well will not be supported so that we can be more confident it
> will work and not place any responsibility on the user to understand
> their topology. (We've chosen to go this route based on feedback we
> received at LSF).
>
> In order to enable this functionality we introduce a new p2pmem device
> which can be instantiated by PCI drivers. The device will register some
> PCI memory as ZONE_DEVICE and provide a genalloc-based allocator for
> users of these devices to get buffers.

I don't completely understand this. Is this actual memory on the PCI
bus ? Where does it come from ? Or are you just trying to create struct
pages that cover your PCIe DMA target ?
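As I read the cover letter, "register some PCI memory as ZONE_DEVICE" plus a
genalloc pool would have roughly the shape sketched below. This is a guess at
what such code looks like, not the patchset's actual implementation; error
unwinding and the teardown path are omitted.

#include <linux/pci.h>
#include <linux/mm.h>
#include <linux/err.h>
#include <linux/memremap.h>
#include <linux/genalloc.h>
#include <linux/percpu-refcount.h>

static struct percpu_ref p2pmem_ref;

static void p2pmem_ref_release(struct percpu_ref *ref)
{
	/* all page references dropped; safe to tear the region down */
}

/* Create struct pages covering one BAR of the donating device and put a
 * genalloc pool on top so peers can carve buffers out of it. */
static struct gen_pool *register_bar_as_p2pmem(struct pci_dev *pdev, int bar)
{
	struct resource *res = &pdev->resource[bar];
	struct gen_pool *pool;
	void *addr;

	if (percpu_ref_init(&p2pmem_ref, p2pmem_ref_release, 0, GFP_KERNEL))
		return NULL;

	/* This is the step that gives the BAR ZONE_DEVICE struct pages. */
	addr = devm_memremap_pages(&pdev->dev, res, &p2pmem_ref, NULL);
	if (IS_ERR(addr))
		return NULL;

	pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev));
	if (!pool)
		return NULL;

	/* The pool hands out kernel virtual addresses within the BAR. */
	if (gen_pool_add_virt(pool, (unsigned long)addr, res->start,
			      resource_size(res), dev_to_node(&pdev->dev)))
		return NULL;

	return pool;
}

Users would then presumably grab buffers with gen_pool_alloc() and hand the
corresponding pages/addresses to the block and RDMA layers.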
> We give an example of enabling
> p2p memory with the cxgb4 driver, however currently these devices have
> some hardware issues that prevent their use, so we will likely be
> dropping this patch in the future. Ideally, we'd want to enable this
> functionality with NVMe CMB buffers, however we don't have any hardware
> with this feature at this time.

So correct me if I'm wrong: you are trying to create struct pages that
map a PCIe BAR, right ? I'm trying to understand how that interacts with
what Jerome is doing for HMM.

The reason is that HMM currently creates the struct pages with
"fake" PFNs pointing to a hole in the address space rather than
covering the actual PCIe memory of the GPU. He does that to deal with
the fact that some GPUs have a smaller aperture on PCIe than their
total memory.

However, I have asked him to only apply that policy if the aperture is
indeed smaller, and if not, create struct pages that directly cover the
PCIe BAR of the GPU instead, which will work better on systems or
architectures that don't have a "pinhole" window limitation.

However, he was under the impression that this was going to collide with
what you guys are doing, so I'm trying to understand how.

> In nvmet-rdma, we attempt to get an appropriate p2pmem device at
> queue creation time and if a suitable one is found we will use it for
> all the (non-inlined) memory in the queue. An 'allow_p2pmem' configfs
> attribute is also created which is required to be set before any p2pmem
> is attempted.
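The selection step described above presumably reduces to something like the
following at queue-creation time. Every name in this sketch is invented for
illustration rather than taken from the RFC; the point is only that p2pmem is
both opt-in via configfs and topology-checked, with a silent fallback to
system memory.

#include <linux/device.h>

struct p2pmem_dev;

struct example_queue {
	bool		   allow_p2pmem;	/* mirrored from configfs */
	struct device	  *rdma_device;		/* the NIC */
	struct device	  *nvme_device;		/* the backing NVMe device */
	struct p2pmem_dev *p2pmem;		/* NULL => system memory */
};

/* hypothetical lookup: returns a provider behind the same switch as
 * both devices, or NULL if there is none */
struct p2pmem_dev *p2pmem_find_compatible(struct device *a, struct device *b);

static void example_pick_p2pmem(struct example_queue *queue)
{
	/* configfs gate: nothing is attempted unless the admin opted in */
	if (!queue->allow_p2pmem)
		return;

	/* NULL here simply means "fall back to regular system memory" */
	queue->p2pmem = p2pmem_find_compatible(queue->rdma_device,
					       queue->nvme_device);
}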
> This patchset also includes a more controversial patch which provides an
> interface for userspace to obtain p2pmem buffers through an mmap call on
> a cdev. This enables userspace to fairly easily use p2pmem with RDMA and
> O_DIRECT interfaces. However, the user would be entirely responsible for
> knowing what they're doing and inspecting sysfs to understand the PCI
> topology, and only using it in sane situations.
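From the userspace side, the flow being described would presumably look like
the sketch below: map a buffer from the p2pmem cdev, then point an O_DIRECT
transfer (or an RDMA memory registration) at it. The device node name is a
guess, and whether this exact pattern works depends on how the cdev ends up
implementing mmap.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	const size_t len = 2 * 1024 * 1024;
	int p2p = open("/dev/p2pmem0", O_RDWR);		/* hypothetical node */
	int nvme = open("/dev/nvme0n1", O_RDWR | O_DIRECT);
	void *buf;

	if (p2p < 0 || nvme < 0)
		return 1;

	/* the mapping is backed by PCI BAR memory, not system RAM */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, p2p, 0);
	if (buf == MAP_FAILED)
		return 1;

	/* O_DIRECT read lands directly in the peer memory */
	if (pread(nvme, buf, len, 0) < 0)
		perror("pread");

	munmap(buf, len);
	close(nvme);
	close(p2p);
	return 0;
}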
> Thanks,
>
> Logan
>
>
> Logan Gunthorpe (6):
>   Introduce Peer-to-Peer memory (p2pmem) device
>   nvmet: Use p2pmem in nvme target
>   scatterlist: Modify SG copy functions to support io memory.
>   nvmet: Be careful about using iomem accesses when dealing with p2pmem
>   p2pmem: Support device removal
>   p2pmem: Added char device user interface
>
> Steve Wise (2):
>   cxgb4: setup pcie memory window 4 and create p2pmem region
>   p2pmem: Add debugfs "stats" file
>
>  drivers/memory/Kconfig                          |   5 +
>  drivers/memory/Makefile                         |   2 +
>  drivers/memory/p2pmem.c                         | 697 ++++++++++++++++++++++++
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |   3 +
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  97 +++-
>  drivers/net/ethernet/chelsio/cxgb4/t4_regs.h    |   5 +
>  drivers/nvme/target/configfs.c                  |  31 ++
>  drivers/nvme/target/core.c                      |  18 +-
>  drivers/nvme/target/fabrics-cmd.c               |  28 +-
>  drivers/nvme/target/nvmet.h                     |   2 +
>  drivers/nvme/target/rdma.c                      | 183 +++++--
>  drivers/scsi/scsi_debug.c                       |   7 +-
>  include/linux/p2pmem.h                          | 120 ++++
>  include/linux/scatterlist.h                     |   7 +-
>  lib/scatterlist.c                               |  64 ++-
>  15 files changed, 1189 insertions(+), 80 deletions(-)
>  create mode 100644 drivers/memory/p2pmem.c
>  create mode 100644 include/linux/p2pmem.h
>
> --
> 2.1.4