From: Dan Williams
Date: Tue, 8 May 2018 16:00:33 -0700
Subject: Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches
In-Reply-To: <20180508163206.7d3bf383@w520.home>
To: Alex Williamson
Cc: Jens Axboe, Keith Busch, linux-nvdimm@lists.01.org,
    linux-rdma@vger.kernel.org, linux-pci@vger.kernel.org,
    Christoph Hellwig, linux-kernel@vger.kernel.org,
    linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
    Jérôme Glisse, Jason Gunthorpe, Bjorn Helgaas,
    Benjamin Herrenschmidt, Bjorn Helgaas, Max Gurtovoy,
    Christian König

On Tue, May 8, 2018 at 3:32 PM, Alex Williamson wrote:
> On Tue, 8 May 2018 16:10:19 -0600
> Logan Gunthorpe wrote:
>
>> On 08/05/18 04:03 PM, Alex Williamson wrote:
>> > If IOMMU grouping implies device assignment (because nobody else uses
>> > it to the same extent as device assignment) then the build-time option
>> > falls to pieces, we need a single kernel that can do both. I think we
>> > need to get more clever about allowing the user to specify exactly at
>> > which points in the topology they want to disable isolation. Thanks,
>>
>>
>> Yeah, so based on the discussion I'm leaning toward just having a
>> command line option that takes a list of BDFs and disables ACS for them.
>> (Essentially as Dan has suggested.) This avoids the shotgun.
>>
>> Then, the pci_p2pdma_distance command needs to check that ACS is
>> disabled for all bridges between the two devices. If this is not the
>> case, it returns -1. Future work can check if the EP has ATS support, in
>> which case it has to check for the ACS direct translated bit.
>>
>> A user then needs to either disable the IOMMU and/or add the command
>> line option to disable ACS for the specific downstream ports in the PCI
>> hierarchy. This means the IOMMU groups will be less granular but
>> presumably the person adding the command line argument understands this.
>>
>> We may also want to do some work so that there's informative dmesgs on
>> which BDFs need to be specified on the command line so it's not so
>> difficult for the user to figure out.
>
> I'd advise caution with a user supplied BDF approach, we have no
> guaranteed persistence for a device's PCI address. Adding a device
> might renumber the buses, replacing a device with one that consumes
> more/less bus numbers can renumber the buses, motherboard firmware
> updates could renumber the buses, pci=assign-buses can renumber the
> buses, etc.
> This is why the VT-d spec makes use of device paths when
> describing PCI hierarchies, firmware can't know what bus number will be
> assigned to a device, but it does know the base bus number and the path
> of devfns needed to get to it. I don't know how we come up with an
> option that's easy enough for a user to understand, but reasonably
> robust against hardware changes. Thanks,

True, but at the same time this feature is for "users with custom
hardware designed for purpose"; I assume they would be willing to take
on the bus renumbering risk. It's already the case that
/sys/bus/pci/drivers//bind takes BDF, which is why it seemed reasonable
to make a similar interface for the command line. Ideally we could later
get something into ACPI or other platform firmware to arrange for
bridges to disable ACS by default if we see p2p becoming a
common-off-the-shelf feature. I.e. a BIOS switch to enable p2p in a
given PCI-E sub-domain.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
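Logan's proposal has pci_p2pdma_distance check every bridge between the two devices and return -1 if ACS is still enabled on any of them. The following is a minimal Python model of that check under stated assumptions, not the kernel implementation: `PciNode`, `acs_p2p_redirect` and `p2p_distance_sketch()` are hypothetical names, the topology is a plain parent-pointer tree, and the returned "distance" is simply the number of links on the path. The sketch finds the closest common upstream node, collects every bridge on the way up from one device and back down to the other, and returns -1 if any of them would redirect peer-to-peer requests or if the devices share no upstream node at all.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can be dict keys
class PciNode:
    """Hypothetical model of a PCI function or bridge port (not a kernel struct)."""
    name: str
    parent: Optional["PciNode"] = None
    is_bridge: bool = False
    acs_p2p_redirect: bool = False  # True: this port would redirect P2P requests upstream


def upstream_chain(dev: PciNode) -> List[PciNode]:
    """Return dev followed by each of its ancestors, ending at the root."""
    chain: List[PciNode] = []
    node: Optional[PciNode] = dev
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def p2p_distance_sketch(a: PciNode, b: PciNode) -> int:
    """Return the number of links between a and b, or -1 if the devices share
    no upstream node or a bridge on the path still has ACS redirection on."""
    chain_a = upstream_chain(a)
    chain_b = upstream_chain(b)
    index_in_b: Dict[PciNode, int] = {node: j for j, node in enumerate(chain_b)}

    for i, node in enumerate(chain_a):
        j = index_in_b.get(node)
        if j is None:
            continue
        # chain_a is ordered from a upward, so the first hit is the closest
        # common ancestor. The path crosses chain_a[1..i] (up to and including
        # that ancestor) and chain_b[1..j-1] (back down toward b).
        path = chain_a[1:i + 1] + chain_b[1:j]
        if any(n.is_bridge and n.acs_p2p_redirect for n in path):
            return -1
        return i + j  # links from a up to the ancestor plus links down to b
    return -1  # different hierarchies: no common upstream node


if __name__ == "__main__":
    root = PciNode("root_port", is_bridge=True)
    usp = PciNode("switch_upstream", parent=root, is_bridge=True)
    dsp1 = PciNode("switch_downstream_1", parent=usp, is_bridge=True, acs_p2p_redirect=True)
    dsp2 = PciNode("switch_downstream_2", parent=usp, is_bridge=True)
    nvme = PciNode("nvme", parent=dsp1)
    rnic = PciNode("rnic", parent=dsp2)
    print(p2p_distance_sketch(nvme, rnic))  # -1: dsp1 still redirects
    dsp1.acs_p2p_redirect = False
    print(p2p_distance_sketch(nvme, rnic))  # 4: nvme-dsp1-usp-dsp2-rnic
```

Checking every bridge on the path, including the shared upstream port, is deliberately conservative; a real implementation would read the ACS capability of each port rather than a single boolean.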
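Alex's objection to user-supplied BDFs, and the VT-d style of naming a device by a base bus number plus the path of devfns leading to it, can be illustrated with a small model. The following Python sketch is an illustration under stated assumptions, not the DMAR table format or any kernel interface: `parse_bdf()`, `devfn()`, `resolve_path()` and the `topology` dictionary (each bridge's location mapped to its secondary bus) are hypothetical. It shows that the same device path keeps identifying the same slot after the buses are renumbered, while the BDF that path resolves to changes.

```python
import re
from typing import Dict, List, Tuple

# Accepts "dddd:bb:dd.f" or "bb:dd.f" (segment defaults to 0); device <= 0x1f, function <= 7.
BDF_RE = re.compile(r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([01][0-9a-fA-F])\.([0-7])$")

# Hypothetical topology map: (bus, devfn) of each bridge -> that bridge's secondary bus.
Topology = Dict[Tuple[int, int], int]


def parse_bdf(text: str) -> Tuple[int, int, int, int]:
    """Parse a BDF string into (segment, bus, device, function)."""
    m = BDF_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a BDF: {text!r}")
    seg, bus, dev, fn = m.groups()
    return (int(seg, 16) if seg else 0, int(bus, 16), int(dev, 16), int(fn))


def devfn(device: int, function: int) -> int:
    """Pack a device (0-31) and function (0-7) into one 8-bit devfn value."""
    return (device << 3) | function


def resolve_path(topology: Topology, base_bus: int, path: List[int]) -> Tuple[int, int]:
    """Follow a device path (devfn hops starting on base_bus) through the
    bridges in `topology` and return the (bus, devfn) it currently names."""
    if not path:
        raise ValueError("empty device path")
    bus = base_bus
    for hop in path[:-1]:
        if (bus, hop) not in topology:
            raise LookupError(f"no bridge at {bus:02x}:{hop >> 3:02x}.{hop & 7}")
        bus = topology[(bus, hop)]  # descend to the bridge's secondary bus
    return bus, path[-1]


if __name__ == "__main__":
    # Root port 00:1c.0 -> switch upstream port -> downstream port 02.0 -> endpoint 00.0
    path = [devfn(0x1C, 0), devfn(0, 0), devfn(2, 0), devfn(0, 0)]
    before = {(0x00, devfn(0x1C, 0)): 0x01, (0x01, devfn(0, 0)): 0x02, (0x02, devfn(2, 0)): 0x03}
    # Same hardware after an added card consumes buses 01-02 ahead of this root port
    after = {(0x00, devfn(0x1C, 0)): 0x03, (0x03, devfn(0, 0)): 0x04, (0x04, devfn(2, 0)): 0x05}
    for name, topo in (("before", before), ("after", after)):
        bus, df = resolve_path(topo, 0x00, path)
        print(f"{name}: {bus:02x}:{df >> 3:02x}.{df & 7}")  # 03:00.0, then 05:00.0
```

An option keyed on such paths would survive the renumbering cases Alex lists, at the cost of being harder for a user to write than a BDF, which is the trade-off Dan is willing to accept for purpose-built systems.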