From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <1493019397.3171.118.camel@oracle.com>
Subject: Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Knut Omang <knut.omang@oracle.com>
To: Benjamin Herrenschmidt, Logan Gunthorpe, Dan Williams
Cc: Bjorn Helgaas, Jason Gunthorpe, Christoph Hellwig, Sagi Grimberg,
 "James E.J. Bottomley", "Martin K. Petersen", Jens Axboe, Steve Wise,
 Stephen Bates, Max Gurtovoy, Keith Busch, linux-pci@vger.kernel.org,
 linux-scsi, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
 linux-nvdimm, "linux-kernel@vger.kernel.org", Jerome Glisse
Date: Mon, 24 Apr 2017 09:36:37 +0200
In-Reply-To: <1492381907.25766.49.camel@kernel.crashing.org>
References: <1490911959-5146-1-git-send-email-logang@deltatee.com>
 <1491974532.7236.43.camel@kernel.crashing.org>
 <5ac22496-56ec-025d-f153-140001d2a7f9@deltatee.com>
 <1492034124.7236.77.camel@kernel.crashing.org>
 <81888a1e-eb0d-cbbc-dc66-0a09c32e4ea2@deltatee.com>
 <20170413232631.GB24910@bhelgaas-glaptop.roam.corp.google.com>
 <20170414041656.GA30694@obsidianresearch.com>
 <1492169849.25766.3.camel@kernel.crashing.org>
 <630c1c63-ff17-1116-e069-2b8f93e50fa2@deltatee.com>
 <20170414190452.GA15679@bhelgaas-glaptop.roam.corp.google.com>
 <1492207643.25766.18.camel@kernel.crashing.org>
 <1492311719.25766.37.camel@kernel.crashing.org>
 <5e43818e-8c6b-8be8-23ff-b798633d2a73@deltatee.com>
 <1492381907.25766.49.camel@kernel.crashing.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8BIT
Mime-Version: 1.0
X-Mailer: Evolution 3.20.5 (3.20.5-1.fc24)

Petersen" , Jens Axboe , Steve Wise , Stephen Bates , Max Gurtovoy , Keith Busch , linux-pci@vger.kernel.org, linux-scsi , linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm , "linux-kernel@vger.kernel.org" , Jerome Glisse Date: Mon, 24 Apr 2017 09:36:37 +0200 In-Reply-To: <1492381907.25766.49.camel@kernel.crashing.org> References: <1490911959-5146-1-git-send-email-logang@deltatee.com> <1491974532.7236.43.camel@kernel.crashing.org> <5ac22496-56ec-025d-f153-140001d2a7f9@deltatee.com> <1492034124.7236.77.camel@kernel.crashing.org> <81888a1e-eb0d-cbbc-dc66-0a09c32e4ea2@deltatee.com> <20170413232631.GB24910@bhelgaas-glaptop.roam.corp.google.com> <20170414041656.GA30694@obsidianresearch.com> <1492169849.25766.3.camel@kernel.crashing.org> <630c1c63-ff17-1116-e069-2b8f93e50fa2@deltatee.com> <20170414190452.GA15679@bhelgaas-glaptop.roam.corp.google.com> <1492207643.25766.18.camel@kernel.crashing.org> <1492311719.25766.37.camel@kernel.crashing.org> <5e43818e-8c6b-8be8-23ff-b798633d2a73@deltatee.com> <1492381907.25766.49.camel@kernel.crashing.org> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.20.5 (3.20.5-1.fc24) Mime-Version: 1.0 Content-Transfer-Encoding: 8BIT X-Source-IP: userv0021.oracle.com [156.151.31.71] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 2017-04-17 at 08:31 +1000, Benjamin Herrenschmidt wrote: > On Sun, 2017-04-16 at 10:34 -0600, Logan Gunthorpe wrote: > >  > > On 16/04/17 09:53 AM, Dan Williams wrote: > > > ZONE_DEVICE allows you to redirect via get_dev_pagemap() to retrieve > > > context about the physical address in question. I'm thinking you can > > > hang bus address translation data off of that structure. This seems > > > vaguely similar to what HMM is doing. > >  > > Thanks! I didn't realize you had the infrastructure to look up a device > > from a pfn/page. That would really come in handy for us. > > It does indeed. I won't be able to play with that much for a few weeks > (see my other email) so if you're going to tackle this while I'm away, > can you work with Jerome to make sure you don't conflict with HMM ? > > I really want a way for HMM to be able to layout struct pages over the > GPU BARs rather than in "allocated free space" for the case where the > BAR is big enough to cover all of the GPU memory. > > In general, I'd like a simple & generic way for any driver to ask the > core to layout DMA'ble struct pages over BAR space. I an not convinced > this requires a "p2mem device" to be created on top of this though but > that's a different discussion. > > Of course the actual ability to perform the DMA mapping will be subject > to various restrictions that will have to be implemented in the actual > "dma_ops override" backend. We can have generic code to handle the case > where devices reside on the same domain, which can deal with switch > configuration etc... we will need to have iommu specific code to handle > the case going through the fabric.  > > Virtualization is a separate can of worms due to how qemu completely > fakes the MMIO space, we can look into that later. My first reflex when reading this thread was to think that this whole domain lends it self excellently to testing via Qemu. Could it be that doing this in  the opposite direction might be a safer approach in the long run even though  (significant) more work up-front? Eg. start by fixing/providing/documenting suitable model(s)  for testing this in Qemu, then implement the patch set based  on those models? 
Thanks,
Knut

> 
> Cheers,
> Ben.