From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 8 May 2018 13:34:07 -0600
From: Alex Williamson <alex.williamson@redhat.com>
To: Logan Gunthorpe
Cc: Christian König, Bjorn Helgaas, linux-kernel@vger.kernel.org,
 linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org,
 linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org,
 linux-block@vger.kernel.org, Stephen Bates, Christoph Hellwig,
 Jens Axboe, Keith Busch, Sagi Grimberg, Bjorn Helgaas, Jason Gunthorpe,
 Max Gurtovoy, Dan Williams, Jérôme Glisse, Benjamin Herrenschmidt
Subject: Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches
Message-ID: <20180508133407.57a46902@w520.home>
In-Reply-To: <3584a6ac-95c7-5d23-1859-aee30605776e@deltatee.com>
References: <20180423233046.21476-1-logang@deltatee.com>
 <20180423233046.21476-5-logang@deltatee.com>
 <20180507231306.GG161390@bhelgaas-glaptop.roam.corp.google.com>
 <0b4183ef-e720-204b-9e85-b9eaf7a4136a@deltatee.com>
 <3584a6ac-95c7-5d23-1859-aee30605776e@deltatee.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8

On Tue, 8 May 2018 13:13:40 -0600
Logan Gunthorpe <logang@deltatee.com> wrote:

> On 08/05/18 10:50 AM, Christian König wrote:
> > E.g. transactions are initially sent to the root complex for
> > translation, that's for sure. But at least for AMD GPUs the root complex
> > answers with the translated address which is then cached in the device.
> >
> > So further transactions for the same address range then go directly to
> > the destination.
>
> Sounds like you are referring to Address Translation Services (ATS).
> This is quite separate from ACS and, to my knowledge, isn't widely
> supported by switch hardware.

They are not so unrelated; see the ACS Direct Translated P2P
capability, which in fact must be implemented by switch downstream
ports implementing ACS and works specifically with ATS.  This appears to
be the way the PCI SIG would intend for P2P to occur within an
IOMMU-managed topology: routing pre-translated DMA directly between peer
devices while requiring non-translated requests to bounce through the
IOMMU.  Really, what's the value of having an I/O virtual address space
provided by an IOMMU if we're going to allow physical DMA between
downstream devices?  Couldn't we just turn off the IOMMU altogether?  Of
course ATS is not without holes itself, basically that we trust the
endpoint's implementation of ATS implicitly.  Thanks,

Alex
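
For readers following the thread, the routing rule described above can be sketched as a small model. This is an illustration only, not kernel code: the two bit values mirror the kernel's PCI_ACS_RR and PCI_ACS_DT macros, while route_p2p_request() and its "direct"/"upstream" return strings are hypothetical names invented for this sketch, and the model ignores every other ACS control bit.

```python
# Bit positions mirror PCI_ACS_RR / PCI_ACS_DT from the kernel's
# include/uapi/linux/pci_regs.h. Everything else here is a simplified,
# hypothetical model of a switch downstream port, not real kernel logic.
PCI_ACS_RR = 0x0004  # ACS P2P Request Redirect
PCI_ACS_DT = 0x0040  # ACS Direct Translated P2P


def route_p2p_request(acs_ctrl: int, translated: bool) -> str:
    """Decide where a peer-to-peer memory request is sent.

    acs_ctrl   -- ACS control bits enabled on the downstream port
    translated -- True if the request carries an ATS-translated address
    """
    # Pre-translated (ATS) requests may go straight to the peer when
    # Direct Translated P2P is enabled, regardless of Request Redirect.
    if translated and (acs_ctrl & PCI_ACS_DT):
        return "direct"
    # Everything else is subject to Request Redirect, which sends the
    # request upstream so the IOMMU can translate and validate it.
    if acs_ctrl & PCI_ACS_RR:
        return "upstream"
    # With no redirect enabled, the switch routes peer to peer directly.
    return "direct"
```

With both bits set, only ATS-translated requests bypass the root complex; untranslated requests still bounce through the IOMMU. With neither bit set, which corresponds to clearing the ACS P2P flags as the subject line proposes, every peer-to-peer request is routed directly.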