* Re: [Bug 55211] pci_disable_link_state PCIE_LINK_STATE_L0S no longer disables ASPM for ath5k
       [not found] ` <20130317155023.5B2EB11FB81@bugzilla.kernel.org>
@ 2013-03-17 17:19   ` Yinghai Lu
  2013-03-18 17:37     ` [PATCH] PCI: Remove not needed check in disable aspm link Yinghai Lu
  0 siblings, 1 reply; 71+ messages in thread
From: Yinghai Lu @ 2013-03-17 17:19 UTC (permalink / raw)
  To: bugzilla-daemon, Roman Yepishev, Bjorn Helgaas
  Cc: Taku Izumi, Linux Kernel Mailing List, linux-pci, NetDev

[-- Attachment #1: Type: text/plain, Size: 1038 bytes --]

On Sun, Mar 17, 2013 at 8:50 AM,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=55211

> --- Comment #9 from Roman Yepishev <roman.yepishev@gmail.com>  2013-03-17 15:50:23 ---
> Re-tested on two laptops - AOA150, ath5k device got ASPM disabled and on a
> Lenovo E420 I got ASPM disabled for iwlwifi-driven Intel Corporation Centrino
> Wireless-N 1000 [8086:0084].
>
> Also I found http://article.gmane.org/gmane.linux.kernel.pci/20640 where a
> question was raised why pci_disable_link_state stopped doing anything for
> iwlwifi devices too.
>

Good.

Did you test the first patch or the second patch?

Please test only the second patch on the affected platforms.

The old logic is quite strange: the boot path and the hotplug path see
different aspm_disabled settings, because aspm_disabled could be set
after the PCI root bus scan.
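
The asymmetry can be modeled in user space. This is a minimal sketch, not kernel code: `toy_aspm_disabled`, `struct toy_dev`, `toy_no_aspm()` and `toy_scan_device()` are hypothetical names that only loosely mirror aspm_disabled and pcie_no_aspm(). It shows that a device scanned before the global flag is set observes a different value than one scanned afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the global aspm_disabled flag (assumption, not kernel code). */
static bool toy_aspm_disabled;

struct toy_dev {
	bool saw_aspm_disabled;	/* flag value observed when the device was scanned */
};

/* Hypothetical model of pcie_no_aspm(): it only sets the global flag. */
void toy_no_aspm(void)
{
	toy_aspm_disabled = true;
}

/* Hypothetical model of scanning one device: record the flag as seen now. */
void toy_scan_device(struct toy_dev *dev)
{
	assert(dev != NULL);
	dev->saw_aspm_disabled = toy_aspm_disabled;
}

/*
 * Old ordering: the root bus is scanned first and the flag is set afterwards.
 * Returns true when the boot-time device and the hot-added device observed
 * different flag values.
 */
bool toy_old_ordering_is_inconsistent(void)
{
	struct toy_dev boot_dev, hotplug_dev;

	toy_aspm_disabled = false;
	toy_scan_device(&boot_dev);	/* boot path: flag still clear */
	toy_no_aspm();			/* e.g. _OSC control was not granted */
	toy_scan_device(&hotplug_dev);	/* hotplug path: flag now set */

	return boot_dev.saw_aspm_disabled != hotplug_dev.saw_aspm_disabled;
}
```

In this model the function returns true, which is the boot/hotplug mismatch described above.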

The second patch does not restore the old logic; it just removes the
unneeded aspm_disabled check from the disable path.

So the second patch is the right fix, but it needs more testing.
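
To make the effect of the second patch concrete, here is a minimal user-space sketch under stated assumptions: `struct toy_link`, the `TOY_STATE_*` bits and both helper functions are hypothetical and only imitate the shape of the early return in __pci_disable_link_state(). The `force` argument of the real helper is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical link-state bits, standing in for PCIE_LINK_STATE_L0S/L1. */
#define TOY_STATE_L0S 1
#define TOY_STATE_L1  2

struct toy_link {
	int aspm_enabled;	/* bitmask of link states currently enabled */
};

/* Models the pre-patch helper: return early when ASPM is globally disabled. */
void toy_disable_link_state_old(struct toy_link *link, int state,
				bool aspm_disabled)
{
	assert(link != NULL);
	if (aspm_disabled)
		return;		/* the driver's request is silently ignored */
	link->aspm_enabled &= ~state;
}

/* Models the patched helper: the request is honored regardless of the flag. */
void toy_disable_link_state_new(struct toy_link *link, int state)
{
	assert(link != NULL);
	link->aspm_enabled &= ~state;
}
```

With the flag set, the old variant leaves L0s enabled even though the caller asked for it to be disabled, while the new variant clears it.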

Thanks

Yinghai

[-- Attachment #2: pci_acpi_osc_aspm_fix.patch --]
[-- Type: application/octet-stream, Size: 3533 bytes --]

Subject: [PATCH] PCI, ACPI: Delay pcie_no_aspm() calling

Roman reported ath5k does not work anymore on 3.8.
Bisected to
| commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6
| Author: Taku Izumi <izumi.taku@jp.fujitsu.com>
| Date:   Tue Oct 30 15:27:13 2012 +0900
|
|    PCI/ACPI: Request _OSC control before scanning PCI root bus
|
|    This patch moves up the code block to request _OSC control in order to
|    separate ACPI work and PCI work in acpi_pci_root_add().

This makes pci_disable_link_state() stop working, because aspm_disabled
is now set before the PCI root bus is scanned, so the disable request is
skipped in quirks and in pcie_aspm_sanity_check().

Restore the old logic by delaying the call to pcie_no_aspm() until after
the scan.

https://bugzilla.kernel.org/show_bug.cgi?id=55211
http://article.gmane.org/gmane.linux.kernel.pci/20640

Need it for 3.8 stable.

Reported-by: Roman Yepishev <roman.yepishev@gmail.com>
Bisected-by: Roman Yepishev <roman.yepishev@gmail.com>
Tested-by: Roman Yepishev <roman.yepishev@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org

---
 drivers/acpi/pci_root.c |   28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

Index: linux-2.6/drivers/acpi/pci_root.c
===================================================================
--- linux-2.6.orig/drivers/acpi/pci_root.c
+++ linux-2.6/drivers/acpi/pci_root.c
@@ -415,7 +415,9 @@ static int acpi_pci_root_add(struct acpi
 	struct acpi_pci_root *root;
 	struct acpi_pci_driver *driver;
 	u32 flags, base_flags;
-	bool is_osc_granted = false;
+	/* -1: not even tried, 0: tried but failed, 1: tried and successful */
+	int osc_support_query_state = -1;
+	int osc_control_set_state = -1;
 
 	root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
 	if (!root)
@@ -488,11 +490,12 @@ static int acpi_pci_root_add(struct acpi
 	if (flags != base_flags) {
 		status = acpi_pci_osc_support(root, flags);
 		if (ACPI_FAILURE(status)) {
+			osc_support_query_state = 0;
 			dev_info(&device->dev, "ACPI _OSC support "
-				"notification failed, disabling PCIe ASPM\n");
-			pcie_no_aspm();
+				"notification failed, PCIe ASPM will be disabled\n");
 			flags = base_flags;
-		}
+		} else
+			osc_support_query_state = 1;
 	}
 	if (!pcie_ports_disabled
 	    && (flags & ACPI_PCIE_REQ_SUPPORT) == ACPI_PCIE_REQ_SUPPORT) {
@@ -514,11 +517,11 @@ static int acpi_pci_root_add(struct acpi
 		status = acpi_pci_osc_control_set(device->handle, &flags,
 				       OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL);
 		if (ACPI_SUCCESS(status)) {
-			is_osc_granted = true;
+			osc_control_set_state = 1;
 			dev_info(&device->dev,
 				"ACPI _OSC control (0x%02x) granted\n", flags);
 		} else {
-			is_osc_granted = false;
+			osc_control_set_state = 0;
 			dev_info(&device->dev,
 				"ACPI _OSC request failed (%s), "
 				"returned control mask: 0x%02x\n",
@@ -555,13 +558,16 @@ static int acpi_pci_root_add(struct acpi
 	}
 
 	/* ASPM setting */
-	if (is_osc_granted) {
+	if (osc_support_query_state == 0) {
+		dev_info(&device->dev, "ACPI _OSC support notification failed, PCIe ASPM disabled\n");
+		pcie_no_aspm();
+	}
+	if (osc_control_set_state == 1) {
 		if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
 			pcie_clear_aspm(root->bus);
-	} else {
-		pr_info("ACPI _OSC control for PCIe not granted, "
-			"disabling ASPM\n");
-		pcie_no_aspm();
+	} else if (osc_control_set_state == 0) {
+		dev_info(&device->dev, "ACPI _OSC control not granted, PCIe ASPM disabled\n");
+		pcie_no_aspm();
 	}
 
 	pci_acpi_add_bus_pm_notifier(device, root->bus);

[-- Attachment #3: disable_aspm_remove_not_needed_check.patch --]
[-- Type: application/octet-stream, Size: 3040 bytes --]

Subject: [PATCH] PCI: Remove not needed check in disable aspm link

Roman reported ath5k does not work anymore on 3.8.
Bisected to
| commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6
| Author: Taku Izumi <izumi.taku@jp.fujitsu.com>
| Date:   Tue Oct 30 15:27:13 2012 +0900
|
|    PCI/ACPI: Request _OSC control before scanning PCI root bus
|
|    This patch moves up the code block to request _OSC control in order to
|    separate ACPI work and PCI work in acpi_pci_root_add().

This makes pci_disable_link_state() stop working, because aspm_disabled
is now set before the PCI root bus is scanned, so the disable request is
skipped in quirks and in pcie_aspm_sanity_check().

Actually, we don't need to check aspm_disabled in the link-disable path,
because the link state checks that follow already protect us.

https://bugzilla.kernel.org/show_bug.cgi?id=55211
http://article.gmane.org/gmane.linux.kernel.pci/20640

Need it for 3.8 stable.

Reported-by: Roman Yepishev <roman.yepishev@gmail.com>
Bisected-by: Roman Yepishev <roman.yepishev@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org

---
 drivers/pci/pcie/aspm.c |   21 ++++-----------------
 1 file changed, 4 insertions(+), 17 deletions(-)

Index: linux-2.6/drivers/pci/pcie/aspm.c
===================================================================
--- linux-2.6.orig/drivers/pci/pcie/aspm.c
+++ linux-2.6/drivers/pci/pcie/aspm.c
@@ -493,15 +493,6 @@ static int pcie_aspm_sanity_check(struct
 			return -EINVAL;
 
 		/*
-		 * If ASPM is disabled then we're not going to change
-		 * the BIOS state. It's safe to continue even if it's a
-		 * pre-1.1 device
-		 */
-
-		if (aspm_disabled)
-			continue;
-
-		/*
 		 * Disable ASPM for pre-1.1 PCIe device, we follow MS to use
 		 * RBER bit to determine if a function is 1.1 version device
 		 */
@@ -718,15 +709,11 @@ void pcie_aspm_powersave_config_link(str
  * pci_disable_link_state - disable pci device's link state, so the link will
  * never enter specific states
  */
-static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem,
-				     bool force)
+static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
 {
 	struct pci_dev *parent = pdev->bus->self;
 	struct pcie_link_state *link;
 
-	if (aspm_disabled && !force)
-		return;
-
 	if (!pci_is_pcie(pdev))
 		return;
 
@@ -757,13 +744,13 @@ static void __pci_disable_link_state(str
 
 void pci_disable_link_state_locked(struct pci_dev *pdev, int state)
 {
-	__pci_disable_link_state(pdev, state, false, false);
+	__pci_disable_link_state(pdev, state, false);
 }
 EXPORT_SYMBOL(pci_disable_link_state_locked);
 
 void pci_disable_link_state(struct pci_dev *pdev, int state)
 {
-	__pci_disable_link_state(pdev, state, true, false);
+	__pci_disable_link_state(pdev, state, true);
 }
 EXPORT_SYMBOL(pci_disable_link_state);
 
@@ -781,7 +768,7 @@ void pcie_clear_aspm(struct pci_bus *bus
 		__pci_disable_link_state(child, PCIE_LINK_STATE_L0S |
 					 PCIE_LINK_STATE_L1 |
 					 PCIE_LINK_STATE_CLKPM,
-					 false, true);
+					 false);
 	}
 }
 


* [PATCH] PCI: Remove not needed check in disable aspm link
  2013-03-17 17:19   ` [Bug 55211] pci_disable_link_state PCIE_LINK_STATE_L0S no longer disables ASPM for ath5k Yinghai Lu
@ 2013-03-18 17:37     ` Yinghai Lu
  2013-03-27 22:56       ` Bjorn Helgaas
  0 siblings, 1 reply; 71+ messages in thread
From: Yinghai Lu @ 2013-03-18 17:37 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: linux-pci, linux-acpi, linux-kernel, Yinghai Lu, Taku Izumi,
	Kenji Kaneshige, stable

Roman reported ath5k does not work anymore on 3.8.
Bisected to
| commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6
| Author: Taku Izumi <izumi.taku@jp.fujitsu.com>
| Date:   Tue Oct 30 15:27:13 2012 +0900
|
|    PCI/ACPI: Request _OSC control before scanning PCI root bus
|
|    This patch moves up the code block to request _OSC control in order to
|    separate ACPI work and PCI work in acpi_pci_root_add().

This makes pci_disable_link_state() stop working, because aspm_disabled
is now set before the PCI root bus is scanned, so the disable request is
skipped in quirks and in pcie_aspm_sanity_check().

We could revert to the old logic, but that would again leave the boot path
and the hotplug path with different aspm_disabled settings.

Actually, we don't need to check aspm_disabled in the link-disable path,
because the link state checks that follow already protect us.

https://bugzilla.kernel.org/show_bug.cgi?id=55211
http://article.gmane.org/gmane.linux.kernel.pci/20640

Need it for 3.8 stable.

Reported-by: Roman Yepishev <roman.yepishev@gmail.com>
Bisected-by: Roman Yepishev <roman.yepishev@gmail.com>
Tested-by: Roman Yepishev <roman.yepishev@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Cc: stable@kernel.org

---
 drivers/pci/pcie/aspm.c |   21 ++++-----------------
 1 file changed, 4 insertions(+), 17 deletions(-)

Index: linux-2.6/drivers/pci/pcie/aspm.c
===================================================================
--- linux-2.6.orig/drivers/pci/pcie/aspm.c
+++ linux-2.6/drivers/pci/pcie/aspm.c
@@ -493,15 +493,6 @@ static int pcie_aspm_sanity_check(struct
 			return -EINVAL;
 
 		/*
-		 * If ASPM is disabled then we're not going to change
-		 * the BIOS state. It's safe to continue even if it's a
-		 * pre-1.1 device
-		 */
-
-		if (aspm_disabled)
-			continue;
-
-		/*
 		 * Disable ASPM for pre-1.1 PCIe device, we follow MS to use
 		 * RBER bit to determine if a function is 1.1 version device
 		 */
@@ -718,15 +709,11 @@ void pcie_aspm_powersave_config_link(str
  * pci_disable_link_state - disable pci device's link state, so the link will
  * never enter specific states
  */
-static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem,
-				     bool force)
+static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
 {
 	struct pci_dev *parent = pdev->bus->self;
 	struct pcie_link_state *link;
 
-	if (aspm_disabled && !force)
-		return;
-
 	if (!pci_is_pcie(pdev))
 		return;
 
@@ -757,13 +744,13 @@ static void __pci_disable_link_state(str
 
 void pci_disable_link_state_locked(struct pci_dev *pdev, int state)
 {
-	__pci_disable_link_state(pdev, state, false, false);
+	__pci_disable_link_state(pdev, state, false);
 }
 EXPORT_SYMBOL(pci_disable_link_state_locked);
 
 void pci_disable_link_state(struct pci_dev *pdev, int state)
 {
-	__pci_disable_link_state(pdev, state, true, false);
+	__pci_disable_link_state(pdev, state, true);
 }
 EXPORT_SYMBOL(pci_disable_link_state);
 
@@ -781,7 +768,7 @@ void pcie_clear_aspm(struct pci_bus *bus
 		__pci_disable_link_state(child, PCIE_LINK_STATE_L0S |
 					 PCIE_LINK_STATE_L1 |
 					 PCIE_LINK_STATE_CLKPM,
-					 false, true);
+					 false);
 	}
 }
 


* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-03-18 17:37     ` [PATCH] PCI: Remove not needed check in disable aspm link Yinghai Lu
@ 2013-03-27 22:56       ` Bjorn Helgaas
  2013-03-28  7:41         ` Yinghai Lu
  0 siblings, 1 reply; 71+ messages in thread
From: Bjorn Helgaas @ 2013-03-27 22:56 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Rafael J. Wysocki, linux-pci, linux-acpi, linux-kernel,
	Taku Izumi, Kenji Kaneshige, stable

On Mon, Mar 18, 2013 at 11:37 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> Roman reported ath5k does not work anymore on 3.8.
> Bisected to
> | commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6
> | Author: Taku Izumi <izumi.taku@jp.fujitsu.com>
> | Date:   Tue Oct 30 15:27:13 2012 +0900
> |
> |    PCI/ACPI: Request _OSC control before scanning PCI root bus
> |
> |    This patch moves up the code block to request _OSC control in order to
> |    separate ACPI work and PCI work in acpi_pci_root_add().
>
> This makes pci_disable_link_state() stop working, because aspm_disabled
> is now set before the PCI root bus is scanned, so the disable request is
> skipped in quirks and in pcie_aspm_sanity_check().
>
> We could revert to the old logic, but that would again leave the boot path
> and the hotplug path with different aspm_disabled settings.
>
> Actually, we don't need to check aspm_disabled in the link-disable path,
> because the link state checks that follow already protect us.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=55211
> http://article.gmane.org/gmane.linux.kernel.pci/20640
>
> Need it for 3.8 stable.
>
> Reported-by: Roman Yepishev <roman.yepishev@gmail.com>
> Bisected-by: Roman Yepishev <roman.yepishev@gmail.com>
> Tested-by: Roman Yepishev <roman.yepishev@gmail.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
> Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
> Cc: stable@kernel.org
>
> ---
>  drivers/pci/pcie/aspm.c |   21 ++++-----------------
>  1 file changed, 4 insertions(+), 17 deletions(-)
>
> Index: linux-2.6/drivers/pci/pcie/aspm.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/pcie/aspm.c
> +++ linux-2.6/drivers/pci/pcie/aspm.c
> @@ -493,15 +493,6 @@ static int pcie_aspm_sanity_check(struct
>                         return -EINVAL;
>
>                 /*
> -                * If ASPM is disabled then we're not going to change
> -                * the BIOS state. It's safe to continue even if it's a
> -                * pre-1.1 device
> -                */
> -
> -               if (aspm_disabled)
> -                       continue;
> -
> -               /*
>                  * Disable ASPM for pre-1.1 PCIe device, we follow MS to use
>                  * RBER bit to determine if a function is 1.1 version device
>                  */
> @@ -718,15 +709,11 @@ void pcie_aspm_powersave_config_link(str
>   * pci_disable_link_state - disable pci device's link state, so the link will
>   * never enter specific states
>   */
> -static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem,
> -                                    bool force)
> +static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>  {
>         struct pci_dev *parent = pdev->bus->self;
>         struct pcie_link_state *link;
>
> -       if (aspm_disabled && !force)
> -               return;
> -
>         if (!pci_is_pcie(pdev))
>                 return;
>
> @@ -757,13 +744,13 @@ static void __pci_disable_link_state(str
>
>  void pci_disable_link_state_locked(struct pci_dev *pdev, int state)
>  {
> -       __pci_disable_link_state(pdev, state, false, false);
> +       __pci_disable_link_state(pdev, state, false);
>  }
>  EXPORT_SYMBOL(pci_disable_link_state_locked);
>
>  void pci_disable_link_state(struct pci_dev *pdev, int state)
>  {
> -       __pci_disable_link_state(pdev, state, true, false);
> +       __pci_disable_link_state(pdev, state, true);
>  }
>  EXPORT_SYMBOL(pci_disable_link_state);
>
> @@ -781,7 +768,7 @@ void pcie_clear_aspm(struct pci_bus *bus
>                 __pci_disable_link_state(child, PCIE_LINK_STATE_L0S |
>                                          PCIE_LINK_STATE_L1 |
>                                          PCIE_LINK_STATE_CLKPM,
> -                                        false, true);
> +                                        false);
>         }
>  }
>

This might fix the problem, but the code is still a mess.  In
acpi_pci_root_add(), why do we have the following?

    acpi_pci_root_add

      acpi_pci_osc_support
      if (flags != base_flags)
        pcie_no_aspm
      if (...)
        acpi_pci_osc_control_set
        if (ACPI_SUCCESS)
          is_osc_granted = true

      pci_acpi_scan_root

      if (is_osc_granted)
        if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
          pcie_clear_aspm
      else
        pcie_no_aspm

Why can't we set all the ASPM flags *first*, before calling
pci_acpi_scan_root()?  That way we could just do the correct ASPM
setup as we discover devices during enumeration, rather than trying to
fix things up afterwards.  I suspect pcie_clear_aspm() is broken
anyway, because it looks like it only touches one level of the
hierarchy, without recursively descending it.
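
The one-level concern can be illustrated with a minimal user-space sketch. Everything here is an assumption for illustration: `struct toy_bus` and the `toy_*` functions are hypothetical, and the model collapses devices and buses into one node type. It does not show what pcie_clear_aspm() really does; it only contrasts a one-level walk with a recursive one.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical node: one bus with up to TOY_MAX_CHILDREN subordinate buses. */
struct toy_bus {
	int aspm_enabled;	/* nonzero while firmware ASPM setup remains */
	struct toy_bus *child[TOY_MAX_CHILDREN];	/* unused slots are NULL */
};

/* One-level walk: clears only the direct children of 'bus'. */
void toy_clear_one_level(struct toy_bus *bus)
{
	assert(bus != NULL);
	for (size_t i = 0; i < TOY_MAX_CHILDREN; i++)
		if (bus->child[i])
			bus->child[i]->aspm_enabled = 0;
}

/* Recursive walk: clears every bus below 'bus'. */
void toy_clear_recursive(struct toy_bus *bus)
{
	assert(bus != NULL);
	for (size_t i = 0; i < TOY_MAX_CHILDREN; i++) {
		if (!bus->child[i])
			continue;
		bus->child[i]->aspm_enabled = 0;
		toy_clear_recursive(bus->child[i]);
	}
}

/* Counts the buses below 'bus' that still have ASPM enabled. */
int toy_count_enabled(const struct toy_bus *bus)
{
	int n = 0;

	assert(bus != NULL);
	for (size_t i = 0; i < TOY_MAX_CHILDREN; i++) {
		if (!bus->child[i])
			continue;
		n += bus->child[i]->aspm_enabled != 0;
		n += toy_count_enabled(bus->child[i]);
	}
	return n;
}
```

On a three-level toy tree (root, one bridge, one leaf), the one-level walk leaves the leaf untouched, while the recursive walk clears it as well.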

But Taku went to some trouble in 8c33f51d to introduce is_osc_granted
to remember this until after pci_acpi_scan_root(), so presumably
there's some reason for this.  Do you remember why, Taku?

Bjorn


* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-03-27 22:56       ` Bjorn Helgaas
@ 2013-03-28  7:41         ` Yinghai Lu
  2013-03-28 12:46           ` Bjorn Helgaas
  0 siblings, 1 reply; 71+ messages in thread
From: Yinghai Lu @ 2013-03-28  7:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, linux-pci, linux-acpi, linux-kernel,
	Taku Izumi, Kenji Kaneshige, stable

On Wed, Mar 27, 2013 at 3:56 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
> Why can't we set all the ASPM flags *first*, before calling
> pci_acpi_scan_root()?  That way we could just do the correct ASPM
> setup as we discover devices during enumeration, rather than trying to
> fix things up afterwards.  I suspect pcie_clear_aspm() is broken
> anyway, because it looks like it only touches one level of the
> hierarchy, without recursively descending it.

Yes, we can clean up the ASPM stop/clear handling,
and that should be for 3.10, right?

But this patch should be safe for 3.9 and stable.

Thanks

Yinghai


* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-03-28  7:41         ` Yinghai Lu
@ 2013-03-28 12:46           ` Bjorn Helgaas
  2013-03-28 20:21             ` Yinghai Lu
  2013-03-28 20:24             ` Yinghai Lu
  0 siblings, 2 replies; 71+ messages in thread
From: Bjorn Helgaas @ 2013-03-28 12:46 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Rafael J. Wysocki, linux-pci, linux-acpi, linux-kernel,
	Taku Izumi, Kenji Kaneshige, stable

On Thu, Mar 28, 2013 at 1:41 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, Mar 27, 2013 at 3:56 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>
>> Why can't we set all the ASPM flags *first*, before calling
>> pci_acpi_scan_root()?  That way we could just do the correct ASPM
>> setup as we discover devices during enumeration, rather than trying to
>> fix things up afterwards.  I suspect pcie_clear_aspm() is broken
>> anyway, because it looks like it only touches one level of the
>> hierarchy, without recursively descending it.
>
> Yes, we can clean up the ASPM stop/clear handling,
> and that should be for 3.10, right?
>
> But this patch should be safe for 3.9 and stable.

This patch might be *safe*, but it (and the changelog) are completely
unintelligible.

The problem with applying an unintelligible stop-gap patch is that it
becomes forever part of the changelog, and it's a huge waste of time
to everybody who tries to understand the history later.  That's why I
think it's worth spending some time to make a good patch now.

Bjorn


* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-03-28 12:46           ` Bjorn Helgaas
@ 2013-03-28 20:21             ` Yinghai Lu
  2013-03-28 20:24             ` Yinghai Lu
  1 sibling, 0 replies; 71+ messages in thread
From: Yinghai Lu @ 2013-03-28 20:21 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, linux-pci, linux-acpi, linux-kernel,
	Taku Izumi, Kenji Kaneshige, stable

[-- Attachment #1: Type: text/plain, Size: 519 bytes --]

On Thu, Mar 28, 2013 at 5:46 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:

> This patch might be *safe*, but it (and the changelog) are completely
> unintelligible.
>
> The problem with applying an unintelligible stop-gap patch is that it
> becomes forever part of the changelog, and it's a huge waste of time
> to everybody who tries to understand the history later.  That's why I
> think it's worth spending some time to make a good patch now.

OK, please check whether the attached patch does what you want.

Thanks

Yinghai

[-- Attachment #2: disable_aspm.patch --]
[-- Type: application/octet-stream, Size: 8303 bytes --]

Subject: [PATCH] PCI: Remove not needed check in disable aspm link

Roman reported ath5k does not work anymore on 3.8.
Bisected to
| commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6
| Author: Taku Izumi <izumi.taku@jp.fujitsu.com>
| Date:   Tue Oct 30 15:27:13 2012 +0900
|
|    PCI/ACPI: Request _OSC control before scanning PCI root bus
|
|    This patch moves up the code block to request _OSC control in order to
|    separate ACPI work and PCI work in acpi_pci_root_add().

This makes pci_disable_link_state() stop working, because aspm_disabled
is now set before the PCI root bus is scanned, so the disable request is
skipped in quirks and in pcie_aspm_sanity_check().

We could revert to the old logic, but that would again leave the boot path
and the hotplug path with different aspm_disabled settings.

Actually, we don't need to check aspm_disabled in the link-disable path,
because the link state checks that follow already protect us.

https://bugzilla.kernel.org/show_bug.cgi?id=55211
http://article.gmane.org/gmane.linux.kernel.pci/20640

Need it for 3.8 stable.

-v2: more cleanup
	1. Remove aspm_support_enabled: if ASPM is compiled in, support is
		there, so even if the user passes pcie_aspm=off, link_state
		still gets allocated, and we have a chance to disable ASPM
		on devices that the BIOS configured incorrectly.
	2. Move the pcie_no_aspm() call for the FADT disable case to before
		scanning, as requested by Bjorn.

Reported-by: Roman Yepishev <roman.yepishev@gmail.com>
Bisected-by: Roman Yepishev <roman.yepishev@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>

---
 drivers/acpi/pci_root.c  |   25 +++++++++---------------
 drivers/pci/pcie/aspm.c  |   48 ++---------------------------------------------
 include/linux/pci-aspm.h |    4 ---
 include/linux/pci.h      |    2 -
 4 files changed, 14 insertions(+), 65 deletions(-)

Index: linux-2.6/drivers/acpi/pci_root.c
===================================================================
--- linux-2.6.orig/drivers/acpi/pci_root.c
+++ linux-2.6/drivers/acpi/pci_root.c
@@ -415,7 +415,6 @@ static int acpi_pci_root_add(struct acpi
 	struct acpi_pci_root *root;
 	struct acpi_pci_driver *driver;
 	u32 flags, base_flags;
-	bool is_osc_granted = false;
 
 	root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
 	if (!root)
@@ -494,6 +493,11 @@ static int acpi_pci_root_add(struct acpi
 			flags = base_flags;
 		}
 	}
+
+	/* ASPM setting */
+	if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
+		pcie_no_aspm();
+
 	if (!pcie_ports_disabled
 	    && (flags & ACPI_PCIE_REQ_SUPPORT) == ACPI_PCIE_REQ_SUPPORT) {
 		flags = OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL
@@ -513,16 +517,17 @@ static int acpi_pci_root_add(struct acpi
 
 		status = acpi_pci_osc_control_set(device->handle, &flags,
 				       OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL);
-		if (ACPI_SUCCESS(status)) {
-			is_osc_granted = true;
+		if (ACPI_SUCCESS(status))
 			dev_info(&device->dev,
 				"ACPI _OSC control (0x%02x) granted\n", flags);
-		} else {
-			is_osc_granted = false;
+		else {
 			dev_info(&device->dev,
 				"ACPI _OSC request failed (%s), "
 				"returned control mask: 0x%02x\n",
 				acpi_format_exception(status), flags);
+			pr_info("ACPI _OSC control for PCIe not granted, "
+				"disabling ASPM\n");
+			pcie_no_aspm();
 		}
 	} else {
 		dev_info(&device->dev,
@@ -554,16 +559,6 @@ static int acpi_pci_root_add(struct acpi
 		goto out_del_root;
 	}
 
-	/* ASPM setting */
-	if (is_osc_granted) {
-		if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
-			pcie_clear_aspm(root->bus);
-	} else {
-		pr_info("ACPI _OSC control for PCIe not granted, "
-			"disabling ASPM\n");
-		pcie_no_aspm();
-	}
-
 	pci_acpi_add_bus_pm_notifier(device, root->bus);
 	if (device->wakeup.flags.run_wake)
 		device_set_run_wake(root->bus->bridge, true);
Index: linux-2.6/drivers/pci/pcie/aspm.c
===================================================================
--- linux-2.6.orig/drivers/pci/pcie/aspm.c
+++ linux-2.6/drivers/pci/pcie/aspm.c
@@ -69,7 +69,6 @@ struct pcie_link_state {
 };
 
 static int aspm_disabled, aspm_force;
-static bool aspm_support_enabled = true;
 static DEFINE_MUTEX(aspm_lock);
 static LIST_HEAD(link_list);
 
@@ -493,15 +492,6 @@ static int pcie_aspm_sanity_check(struct
 			return -EINVAL;
 
 		/*
-		 * If ASPM is disabled then we're not going to change
-		 * the BIOS state. It's safe to continue even if it's a
-		 * pre-1.1 device
-		 */
-
-		if (aspm_disabled)
-			continue;
-
-		/*
 		 * Disable ASPM for pre-1.1 PCIe device, we follow MS to use
 		 * RBER bit to determine if a function is 1.1 version device
 		 */
@@ -556,9 +546,6 @@ void pcie_aspm_init_link_state(struct pc
 	struct pcie_link_state *link;
 	int blacklist = !!pcie_aspm_sanity_check(pdev);
 
-	if (!aspm_support_enabled)
-		return;
-
 	if (!pci_is_pcie(pdev) || pdev->link_state)
 		return;
 	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT &&
@@ -718,15 +705,11 @@ void pcie_aspm_powersave_config_link(str
  * pci_disable_link_state - disable pci device's link state, so the link will
  * never enter specific states
  */
-static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem,
-				     bool force)
+static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
 {
 	struct pci_dev *parent = pdev->bus->self;
 	struct pcie_link_state *link;
 
-	if (aspm_disabled && !force)
-		return;
-
 	if (!pci_is_pcie(pdev))
 		return;
 
@@ -757,34 +740,16 @@ static void __pci_disable_link_state(str
 
 void pci_disable_link_state_locked(struct pci_dev *pdev, int state)
 {
-	__pci_disable_link_state(pdev, state, false, false);
+	__pci_disable_link_state(pdev, state, false);
 }
 EXPORT_SYMBOL(pci_disable_link_state_locked);
 
 void pci_disable_link_state(struct pci_dev *pdev, int state)
 {
-	__pci_disable_link_state(pdev, state, true, false);
+	__pci_disable_link_state(pdev, state, true);
 }
 EXPORT_SYMBOL(pci_disable_link_state);
 
-void pcie_clear_aspm(struct pci_bus *bus)
-{
-	struct pci_dev *child;
-
-	if (aspm_force)
-		return;
-
-	/*
-	 * Clear any ASPM setup that the firmware has carried out on this bus
-	 */
-	list_for_each_entry(child, &bus->devices, bus_list) {
-		__pci_disable_link_state(child, PCIE_LINK_STATE_L0S |
-					 PCIE_LINK_STATE_L1 |
-					 PCIE_LINK_STATE_CLKPM,
-					 false, true);
-	}
-}
-
 static int pcie_aspm_set_policy(const char *val, struct kernel_param *kp)
 {
 	int i;
@@ -944,7 +909,6 @@ static int __init pcie_aspm_disable(char
 	if (!strcmp(str, "off")) {
 		aspm_policy = POLICY_DEFAULT;
 		aspm_disabled = 1;
-		aspm_support_enabled = false;
 		printk(KERN_INFO "PCIe ASPM is disabled\n");
 	} else if (!strcmp(str, "force")) {
 		aspm_force = 1;
@@ -980,9 +944,3 @@ int pcie_aspm_enabled(void)
        return !aspm_disabled;
 }
 EXPORT_SYMBOL(pcie_aspm_enabled);
-
-bool pcie_aspm_support_enabled(void)
-{
-	return aspm_support_enabled;
-}
-EXPORT_SYMBOL(pcie_aspm_support_enabled);
Index: linux-2.6/include/linux/pci-aspm.h
===================================================================
--- linux-2.6.orig/include/linux/pci-aspm.h
+++ linux-2.6/include/linux/pci-aspm.h
@@ -29,7 +29,6 @@ extern void pcie_aspm_pm_state_change(st
 extern void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
 extern void pci_disable_link_state(struct pci_dev *pdev, int state);
 extern void pci_disable_link_state_locked(struct pci_dev *pdev, int state);
-extern void pcie_clear_aspm(struct pci_bus *bus);
 extern void pcie_no_aspm(void);
 #else
 static inline void pcie_aspm_init_link_state(struct pci_dev *pdev)
@@ -47,9 +46,6 @@ static inline void pcie_aspm_powersave_c
 static inline void pci_disable_link_state(struct pci_dev *pdev, int state)
 {
 }
-static inline void pcie_clear_aspm(struct pci_bus *bus)
-{
-}
 static inline void pcie_no_aspm(void)
 {
 }
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -1168,7 +1168,7 @@ static inline int pcie_aspm_enabled(void
 static inline bool pcie_aspm_support_enabled(void) { return false; }
 #else
 extern int pcie_aspm_enabled(void);
-extern bool pcie_aspm_support_enabled(void);
+static inline bool pcie_aspm_support_enabled(void) { return true; }
 #endif
 
 #ifdef CONFIG_PCIEAER


* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-03-28 12:46           ` Bjorn Helgaas
  2013-03-28 20:21             ` Yinghai Lu
@ 2013-03-28 20:24             ` Yinghai Lu
  2013-03-28 20:24               ` Yinghai Lu
  1 sibling, 1 reply; 71+ messages in thread
From: Yinghai Lu @ 2013-03-28 20:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Roman Yepishev
  Cc: Rafael J. Wysocki, linux-pci, linux-acpi, linux-kernel,
	Taku Izumi, Kenji Kaneshige

Resending with Roman added to To.

On Thu, Mar 28, 2013 at 5:46 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> This patch might be *safe*, but it (and the changelog) are completely
> unintelligible.
>
> The problem with applying an unintelligible stop-gap patch is that it
> becomes forever part of the changelog, and it's a huge waste of time
> to everybody who tries to understand the history later.  That's why I
> think it's worth spending some time to make a good patch now.

Please check if attached patch is doing what you want.

Thanks

Yinghai


* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-03-28 20:24             ` Yinghai Lu
@ 2013-03-28 20:24               ` Yinghai Lu
  2013-03-29  3:22                 ` Bjorn Helgaas
  0 siblings, 1 reply; 71+ messages in thread
From: Yinghai Lu @ 2013-03-28 20:24 UTC (permalink / raw)
  To: Bjorn Helgaas, Roman Yepishev
  Cc: Rafael J. Wysocki, linux-pci, linux-acpi, linux-kernel,
	Taku Izumi, Kenji Kaneshige

[-- Attachment #1: Type: text/plain, Size: 663 bytes --]

patch for Roman

On Thu, Mar 28, 2013 at 1:24 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> resending with adding To Roman.
>
> On Thu, Mar 28, 2013 at 5:46 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> This patch might be *safe*, but it (and the changelog) are completely
>> unintelligible.
>>
>> The problem with applying an unintelligible stop-gap patch is that it
>> becomes forever part of the changelog, and it's a huge waste of time
>> to everybody who tries to understand the history later.  That's why I
>> think it's worth spending some time to make a good patch now.
>
> Please check if attached patch is doing what you want.
>
> Thanks
>
> Yinghai

[-- Attachment #2: disable_aspm.patch --]
[-- Type: application/octet-stream, Size: 8303 bytes --]

Subject: [PATCH] PCI: Remove not needed check in disable aspm link

Roman reported ath5k does not work anymore on 3.8.
Bisected to
| commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6
| Author: Taku Izumi <izumi.taku@jp.fujitsu.com>
| Date:   Tue Oct 30 15:27:13 2012 +0900
|
|    PCI/ACPI: Request _OSC control before scanning PCI root bus
|
|    This patch moves up the code block to request _OSC control in order to
|    separate ACPI work and PCI work in acpi_pci_root_add().

That commit makes pci_disable_link_state() stop working, because
aspm_disabled is now set before the PCI root bus is scanned.
The disable is then skipped in quirks and in pcie_aspm_sanity_check().

We could revert to the old logic, but then the boot path and the hotplug
path would again see different aspm_disabled settings.

Actually, we don't need to check aspm_disabled when disabling a link
state, as the link state checks that follow already protect us.

https://bugzilla.kernel.org/show_bug.cgi?id=55211
http://article.gmane.org/gmane.linux.kernel.pci/20640

Need it for 3.8 stable.

-v2: more cleanup
	1. Remove aspm_support_enabled: if ASPM is compiled in, support is
		there, so even if the user passes aspm=off, link_state still
		gets allocated, and we have a chance to disable ASPM on
		devices where the BIOS set it up incorrectly.
	2. Move the pcie_no_aspm() call for the FADT disable case to before
		scanning, as requested by Bjorn.

Reported-by: Roman Yepishev <roman.yepishev@gmail.com>
Bisected-by: Roman Yepishev <roman.yepishev@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>

---
 drivers/acpi/pci_root.c  |   25 +++++++++---------------
 drivers/pci/pcie/aspm.c  |   48 ++---------------------------------------------
 include/linux/pci-aspm.h |    4 ---
 include/linux/pci.h      |    2 -
 4 files changed, 14 insertions(+), 65 deletions(-)

Index: linux-2.6/drivers/acpi/pci_root.c
===================================================================
--- linux-2.6.orig/drivers/acpi/pci_root.c
+++ linux-2.6/drivers/acpi/pci_root.c
@@ -415,7 +415,6 @@ static int acpi_pci_root_add(struct acpi
 	struct acpi_pci_root *root;
 	struct acpi_pci_driver *driver;
 	u32 flags, base_flags;
-	bool is_osc_granted = false;
 
 	root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
 	if (!root)
@@ -494,6 +493,11 @@ static int acpi_pci_root_add(struct acpi
 			flags = base_flags;
 		}
 	}
+
+	/* ASPM setting */
+	if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
+		pcie_no_aspm();
+
 	if (!pcie_ports_disabled
 	    && (flags & ACPI_PCIE_REQ_SUPPORT) == ACPI_PCIE_REQ_SUPPORT) {
 		flags = OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL
@@ -513,16 +517,17 @@ static int acpi_pci_root_add(struct acpi
 
 		status = acpi_pci_osc_control_set(device->handle, &flags,
 				       OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL);
-		if (ACPI_SUCCESS(status)) {
-			is_osc_granted = true;
+		if (ACPI_SUCCESS(status))
 			dev_info(&device->dev,
 				"ACPI _OSC control (0x%02x) granted\n", flags);
-		} else {
-			is_osc_granted = false;
+		else {
 			dev_info(&device->dev,
 				"ACPI _OSC request failed (%s), "
 				"returned control mask: 0x%02x\n",
 				acpi_format_exception(status), flags);
+			pr_info("ACPI _OSC control for PCIe not granted, "
+				"disabling ASPM\n");
+			pcie_no_aspm();
 		}
 	} else {
 		dev_info(&device->dev,
@@ -554,16 +559,6 @@ static int acpi_pci_root_add(struct acpi
 		goto out_del_root;
 	}
 
-	/* ASPM setting */
-	if (is_osc_granted) {
-		if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
-			pcie_clear_aspm(root->bus);
-	} else {
-		pr_info("ACPI _OSC control for PCIe not granted, "
-			"disabling ASPM\n");
-		pcie_no_aspm();
-	}
-
 	pci_acpi_add_bus_pm_notifier(device, root->bus);
 	if (device->wakeup.flags.run_wake)
 		device_set_run_wake(root->bus->bridge, true);
Index: linux-2.6/drivers/pci/pcie/aspm.c
===================================================================
--- linux-2.6.orig/drivers/pci/pcie/aspm.c
+++ linux-2.6/drivers/pci/pcie/aspm.c
@@ -69,7 +69,6 @@ struct pcie_link_state {
 };
 
 static int aspm_disabled, aspm_force;
-static bool aspm_support_enabled = true;
 static DEFINE_MUTEX(aspm_lock);
 static LIST_HEAD(link_list);
 
@@ -493,15 +492,6 @@ static int pcie_aspm_sanity_check(struct
 			return -EINVAL;
 
 		/*
-		 * If ASPM is disabled then we're not going to change
-		 * the BIOS state. It's safe to continue even if it's a
-		 * pre-1.1 device
-		 */
-
-		if (aspm_disabled)
-			continue;
-
-		/*
 		 * Disable ASPM for pre-1.1 PCIe device, we follow MS to use
 		 * RBER bit to determine if a function is 1.1 version device
 		 */
@@ -556,9 +546,6 @@ void pcie_aspm_init_link_state(struct pc
 	struct pcie_link_state *link;
 	int blacklist = !!pcie_aspm_sanity_check(pdev);
 
-	if (!aspm_support_enabled)
-		return;
-
 	if (!pci_is_pcie(pdev) || pdev->link_state)
 		return;
 	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT &&
@@ -718,15 +705,11 @@ void pcie_aspm_powersave_config_link(str
  * pci_disable_link_state - disable pci device's link state, so the link will
  * never enter specific states
  */
-static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem,
-				     bool force)
+static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
 {
 	struct pci_dev *parent = pdev->bus->self;
 	struct pcie_link_state *link;
 
-	if (aspm_disabled && !force)
-		return;
-
 	if (!pci_is_pcie(pdev))
 		return;
 
@@ -757,34 +740,16 @@ static void __pci_disable_link_state(str
 
 void pci_disable_link_state_locked(struct pci_dev *pdev, int state)
 {
-	__pci_disable_link_state(pdev, state, false, false);
+	__pci_disable_link_state(pdev, state, false);
 }
 EXPORT_SYMBOL(pci_disable_link_state_locked);
 
 void pci_disable_link_state(struct pci_dev *pdev, int state)
 {
-	__pci_disable_link_state(pdev, state, true, false);
+	__pci_disable_link_state(pdev, state, true);
 }
 EXPORT_SYMBOL(pci_disable_link_state);
 
-void pcie_clear_aspm(struct pci_bus *bus)
-{
-	struct pci_dev *child;
-
-	if (aspm_force)
-		return;
-
-	/*
-	 * Clear any ASPM setup that the firmware has carried out on this bus
-	 */
-	list_for_each_entry(child, &bus->devices, bus_list) {
-		__pci_disable_link_state(child, PCIE_LINK_STATE_L0S |
-					 PCIE_LINK_STATE_L1 |
-					 PCIE_LINK_STATE_CLKPM,
-					 false, true);
-	}
-}
-
 static int pcie_aspm_set_policy(const char *val, struct kernel_param *kp)
 {
 	int i;
@@ -944,7 +909,6 @@ static int __init pcie_aspm_disable(char
 	if (!strcmp(str, "off")) {
 		aspm_policy = POLICY_DEFAULT;
 		aspm_disabled = 1;
-		aspm_support_enabled = false;
 		printk(KERN_INFO "PCIe ASPM is disabled\n");
 	} else if (!strcmp(str, "force")) {
 		aspm_force = 1;
@@ -980,9 +944,3 @@ int pcie_aspm_enabled(void)
        return !aspm_disabled;
 }
 EXPORT_SYMBOL(pcie_aspm_enabled);
-
-bool pcie_aspm_support_enabled(void)
-{
-	return aspm_support_enabled;
-}
-EXPORT_SYMBOL(pcie_aspm_support_enabled);
Index: linux-2.6/include/linux/pci-aspm.h
===================================================================
--- linux-2.6.orig/include/linux/pci-aspm.h
+++ linux-2.6/include/linux/pci-aspm.h
@@ -29,7 +29,6 @@ extern void pcie_aspm_pm_state_change(st
 extern void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
 extern void pci_disable_link_state(struct pci_dev *pdev, int state);
 extern void pci_disable_link_state_locked(struct pci_dev *pdev, int state);
-extern void pcie_clear_aspm(struct pci_bus *bus);
 extern void pcie_no_aspm(void);
 #else
 static inline void pcie_aspm_init_link_state(struct pci_dev *pdev)
@@ -47,9 +46,6 @@ static inline void pcie_aspm_powersave_c
 static inline void pci_disable_link_state(struct pci_dev *pdev, int state)
 {
 }
-static inline void pcie_clear_aspm(struct pci_bus *bus)
-{
-}
 static inline void pcie_no_aspm(void)
 {
 }
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -1168,7 +1168,7 @@ static inline int pcie_aspm_enabled(void
 static inline bool pcie_aspm_support_enabled(void) { return false; }
 #else
 extern int pcie_aspm_enabled(void);
-extern bool pcie_aspm_support_enabled(void);
+static inline bool pcie_aspm_support_enabled(void) { return true; }
 #endif
 
 #ifdef CONFIG_PCIEAER

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-03-28 20:24               ` Yinghai Lu
@ 2013-03-29  3:22                 ` Bjorn Helgaas
  2013-03-29  5:59                   ` Yinghai Lu
  2013-03-29 18:11                   ` Roman Yepishev
  0 siblings, 2 replies; 71+ messages in thread
From: Bjorn Helgaas @ 2013-03-29  3:22 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Roman Yepishev, Rafael J. Wysocki, linux-pci, linux-acpi,
	linux-kernel, Taku Izumi, Kenji Kaneshige, Matthew Garrett,
	e1000-devel

[+cc Matthew]
[+cc e1000-devel@lists.sourceforge.net for suspected 82575/82598 regression]

On Thu, Mar 28, 2013 at 01:24:55PM -0700, Yinghai Lu wrote:
> patch for Roman
> 
> On Thu, Mar 28, 2013 at 1:24 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > resending with adding To Roman.
> >
> > On Thu, Mar 28, 2013 at 5:46 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >> This patch might be *safe*, but it (and the changelog) are completely
> >> unintelligible.
> >>
> >> The problem with applying an unintelligible stop-gap patch is that it
> >> becomes forever part of the changelog, and it's a huge waste of time
> >> to everybody who tries to understand the history later.  That's why I
> >> think it's worth spending some time to make a good patch now.
> >
> > Please check if attached patch is doing what you want.

Patch inlined below for convenience.

> Subject: [PATCH] PCI: Remove not needed check in disable aspm link
> 
> Roman reported ath5k does not work anymore on 3.8.
> Bisected to
> | commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6
> | Author: Taku Izumi <izumi.taku@jp.fujitsu.com>
> | Date:   Tue Oct 30 15:27:13 2012 +0900
> |
> |    PCI/ACPI: Request _OSC control before scanning PCI root bus
> |
> |    This patch moves up the code block to request _OSC control in order to
> |    separate ACPI work and PCI work in acpi_pci_root_add().
> 
> That commit makes pci_disable_link_state() stop working, because
> aspm_disabled is now set before the PCI root bus is scanned.
> The disable is then skipped in quirks and in pcie_aspm_sanity_check().

I think this regression has nothing to do with pci_disable_link_state().

When aspm_disabled is set, pci_disable_link_state() doesn't do anything.

In both 3.7 and 3.8, aspm_disabled is set in acpi_pci_root_add() before
any driver probe routines are run, so it looks like calling
pci_disable_link_state() from a driver had no effect even in 3.7.  This
is a problem, of course, but not the one Roman is seeing, because ath5k
calls pci_disable_link_state() from the driver probe routine.
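
As a rough illustration of why such a call has no effect, the guard can be
modelled as below.  This is a toy sketch, not kernel code: struct toy_link,
toy_aspm_disabled and toy_disable_link_state() are hypothetical names, and
only the early return mirrors the "if (aspm_disabled && !force) return;"
test visible in the quoted patch.

```c
#include <stdbool.h>

#define TOY_STATE_L0S 0x1u
#define TOY_STATE_L1  0x2u

/* Hypothetical stand-in for a link's ASPM bookkeeping. */
struct toy_link {
	unsigned int enabled_states;	/* bitmask of states still allowed */
};

/* Hypothetical stand-in for the global aspm_disabled flag. */
static bool toy_aspm_disabled;

/* Clears the requested states unless the global flag short-circuits the call. */
static void toy_disable_link_state(struct toy_link *link, unsigned int state)
{
	if (toy_aspm_disabled)
		return;			/* the request is silently dropped */

	link->enabled_states &= ~state;
}
```

With toy_aspm_disabled set, a call such as
toy_disable_link_state(&link, TOY_STATE_L0S) leaves enabled_states untouched,
which is the "doesn't do anything" behavior described above.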

There are also PCI_FIXUP_FINAL quirks for 82575 and 82598 NICs that call
pci_disable_link_state().  In 3.7, these quirks are run before
aspm_disabled is set, but 8c33f51d moved the pcie_no_aspm() call up
before we start scanning the bus, so in 3.8, aspm_disabled is set
*before* we run them.  I think that means 8c33f51d broke all these
quirks.  That's also a problem, of course, but this isn't the one Roman
is seeing either.
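
The ordering argument can be sketched as a small model.  It assumes a
platform on which pcie_no_aspm() does get called (for example because _OSC
control is not granted); enum toy_order and toy_quirk_effective() are
hypothetical names used only for illustration, and the model reduces the
quirk to a single "did the disable request go through" flag.

```c
#include <stdbool.h>

/* Hypothetical labels for the two orderings discussed above. */
enum toy_order {
	TOY_ORDER_3_7,	/* bus scan (quirks) first, then pcie_no_aspm() */
	TOY_ORDER_3_8,	/* pcie_no_aspm() first, then bus scan (quirks) */
};

/* Returns true if a FIXUP_FINAL-style quirk managed to disable ASPM. */
static bool toy_quirk_effective(enum toy_order order)
{
	bool aspm_disabled = false;
	bool quirk_applied = false;

	if (order == TOY_ORDER_3_8)
		aspm_disabled = true;	/* flag set before scanning */

	/* Bus scan: the quirk's disable request is dropped if the flag is set. */
	if (!aspm_disabled)
		quirk_applied = true;

	if (order == TOY_ORDER_3_7)
		aspm_disabled = true;	/* flag set only after scanning */

	(void)aspm_disabled;		/* final value is irrelevant to the result */
	return quirk_applied;
}
```

Under these assumptions the quirk takes effect with the 3.7 ordering and is
lost with the 3.8 ordering.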

I think the problem Roman is seeing happens when
pcie_aspm_init_link_state() calls pcie_aspm_sanity_check() during device
enumeration.  In 3.8, the fact that aspm_disabled is already set by the
time we get here means we skip the check for pre-1.1 PCIe devices, and
I think *this* is what Roman is seeing.
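
A simplified model of that check, for illustration only: struct toy_child
and toy_sanity_check() are hypothetical, the has_rber field stands in for
the RBER capability bit mentioned in the code comment, and details of the
real function (such as any force override) are deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device below the link being checked. */
struct toy_child {
	bool is_pcie;
	bool has_rber;	/* false models a pre-1.1 PCIe device */
};

/* Returns 0 if the link passes, -EINVAL if ASPM should be blacklisted. */
static int toy_sanity_check(const struct toy_child *children, size_t n,
			    bool aspm_disabled)
{
	for (size_t i = 0; i < n; i++) {
		if (!children[i].is_pcie)
			return -EINVAL;

		if (aspm_disabled)
			continue;	/* skips the pre-1.1 test below */

		if (!children[i].has_rber)
			return -EINVAL;	/* pre-1.1 device: blacklist the link */
	}
	return 0;
}
```

In this model a pre-1.1 child is blacklisted when aspm_disabled is clear,
but passes the check unchanged when aspm_disabled is already set, which
matches the 3.8 behavior suspected above.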

I suspect the following hunk of your patch is enough to fix things for
Roman:

> --- linux-2.6.orig/drivers/pci/pcie/aspm.c
> +++ linux-2.6/drivers/pci/pcie/aspm.c
> @@ -493,15 +492,6 @@ static int pcie_aspm_sanity_check(struct
>  			return -EINVAL;
>  
>  		/*
> -		 * If ASPM is disabled then we're not going to change
> -		 * the BIOS state. It's safe to continue even if it's a
> -		 * pre-1.1 device
> -		 */
> -
> -		if (aspm_disabled)
> -			continue;
> -
> -		/*
>  		 * Disable ASPM for pre-1.1 PCIe device, we follow MS to use
>  		 * RBER bit to determine if a function is 1.1 version device
>  		 */

However, this test was added by Matthew in c9651e70, and I can't remove
it unless we have an explanation of why removing it will not reintroduce
the bug he was fixing.

This code is such a terrible mess that it's not surprising at all that
we have all these issues.  But there's too much to untangle in v3.9; all
we can hope for is to fix the regressions in v3.9 and clean it up later.

> We could revert to the old logic, but then the boot path and the hotplug
> path would again see different aspm_disabled settings.
> 
> Actually, we don't need to check aspm_disabled when disabling a link
> state, as the link state checks that follow already protect us.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=55211
> http://article.gmane.org/gmane.linux.kernel.pci/20640
> 
> Need it for 3.8 stable.
> 
> -v2: more cleanup
> 	1. Remove aspm_support_enabled: if ASPM is compiled in, support is
> 		there, so even if the user passes aspm=off, link_state still
> 		gets allocated, and we have a chance to disable ASPM on
> 		devices where the BIOS set it up incorrectly.
> 	2. Move the pcie_no_aspm() call for the FADT disable case to before
> 		scanning, as requested by Bjorn.
> 
> Reported-by: Roman Yepishev <roman.yepishev@gmail.com>
> Bisected-by: Roman Yepishev <roman.yepishev@gmail.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
> Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
> 
> ---
>  drivers/acpi/pci_root.c  |   25 +++++++++---------------
>  drivers/pci/pcie/aspm.c  |   48 ++---------------------------------------------
>  include/linux/pci-aspm.h |    4 ---
>  include/linux/pci.h      |    2 -
>  4 files changed, 14 insertions(+), 65 deletions(-)
> 
> Index: linux-2.6/drivers/acpi/pci_root.c
> ===================================================================
> --- linux-2.6.orig/drivers/acpi/pci_root.c
> +++ linux-2.6/drivers/acpi/pci_root.c
> @@ -415,7 +415,6 @@ static int acpi_pci_root_add(struct acpi
>  	struct acpi_pci_root *root;
>  	struct acpi_pci_driver *driver;
>  	u32 flags, base_flags;
> -	bool is_osc_granted = false;
>  
>  	root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
>  	if (!root)
> @@ -494,6 +493,11 @@ static int acpi_pci_root_add(struct acpi
>  			flags = base_flags;
>  		}
>  	}
> +
> +	/* ASPM setting */
> +	if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
> +		pcie_no_aspm();
> +
>  	if (!pcie_ports_disabled
>  	    && (flags & ACPI_PCIE_REQ_SUPPORT) == ACPI_PCIE_REQ_SUPPORT) {
>  		flags = OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL
> @@ -513,16 +517,17 @@ static int acpi_pci_root_add(struct acpi
>  
>  		status = acpi_pci_osc_control_set(device->handle, &flags,
>  				       OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL);
> -		if (ACPI_SUCCESS(status)) {
> -			is_osc_granted = true;
> +		if (ACPI_SUCCESS(status))
>  			dev_info(&device->dev,
>  				"ACPI _OSC control (0x%02x) granted\n", flags);
> -		} else {
> -			is_osc_granted = false;
> +		else {
>  			dev_info(&device->dev,
>  				"ACPI _OSC request failed (%s), "
>  				"returned control mask: 0x%02x\n",
>  				acpi_format_exception(status), flags);
> +			pr_info("ACPI _OSC control for PCIe not granted, "
> +				"disabling ASPM\n");
> +			pcie_no_aspm();
>  		}
>  	} else {
>  		dev_info(&device->dev,
> @@ -554,16 +559,6 @@ static int acpi_pci_root_add(struct acpi
>  		goto out_del_root;
>  	}
>  
> -	/* ASPM setting */
> -	if (is_osc_granted) {
> -		if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
> -			pcie_clear_aspm(root->bus);
> -	} else {
> -		pr_info("ACPI _OSC control for PCIe not granted, "
> -			"disabling ASPM\n");
> -		pcie_no_aspm();
> -	}
> -
>  	pci_acpi_add_bus_pm_notifier(device, root->bus);
>  	if (device->wakeup.flags.run_wake)
>  		device_set_run_wake(root->bus->bridge, true);
> Index: linux-2.6/drivers/pci/pcie/aspm.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/pcie/aspm.c
> +++ linux-2.6/drivers/pci/pcie/aspm.c
> @@ -69,7 +69,6 @@ struct pcie_link_state {
>  };
>  
>  static int aspm_disabled, aspm_force;
> -static bool aspm_support_enabled = true;
>  static DEFINE_MUTEX(aspm_lock);
>  static LIST_HEAD(link_list);
>  
> @@ -493,15 +492,6 @@ static int pcie_aspm_sanity_check(struct
>  			return -EINVAL;
>  
>  		/*
> -		 * If ASPM is disabled then we're not going to change
> -		 * the BIOS state. It's safe to continue even if it's a
> -		 * pre-1.1 device
> -		 */
> -
> -		if (aspm_disabled)
> -			continue;
> -
> -		/*
>  		 * Disable ASPM for pre-1.1 PCIe device, we follow MS to use
>  		 * RBER bit to determine if a function is 1.1 version device
>  		 */
> @@ -556,9 +546,6 @@ void pcie_aspm_init_link_state(struct pc
>  	struct pcie_link_state *link;
>  	int blacklist = !!pcie_aspm_sanity_check(pdev);
>  
> -	if (!aspm_support_enabled)
> -		return;
> -
>  	if (!pci_is_pcie(pdev) || pdev->link_state)
>  		return;
>  	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT &&
> @@ -718,15 +705,11 @@ void pcie_aspm_powersave_config_link(str
>   * pci_disable_link_state - disable pci device's link state, so the link will
>   * never enter specific states
>   */
> -static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem,
> -				     bool force)
> +static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>  {
>  	struct pci_dev *parent = pdev->bus->self;
>  	struct pcie_link_state *link;
>  
> -	if (aspm_disabled && !force)
> -		return;
> -
>  	if (!pci_is_pcie(pdev))
>  		return;
>  
> @@ -757,34 +740,16 @@ static void __pci_disable_link_state(str
>  
>  void pci_disable_link_state_locked(struct pci_dev *pdev, int state)
>  {
> -	__pci_disable_link_state(pdev, state, false, false);
> +	__pci_disable_link_state(pdev, state, false);
>  }
>  EXPORT_SYMBOL(pci_disable_link_state_locked);
>  
>  void pci_disable_link_state(struct pci_dev *pdev, int state)
>  {
> -	__pci_disable_link_state(pdev, state, true, false);
> +	__pci_disable_link_state(pdev, state, true);
>  }
>  EXPORT_SYMBOL(pci_disable_link_state);
>  
> -void pcie_clear_aspm(struct pci_bus *bus)
> -{
> -	struct pci_dev *child;
> -
> -	if (aspm_force)
> -		return;
> -
> -	/*
> -	 * Clear any ASPM setup that the firmware has carried out on this bus
> -	 */
> -	list_for_each_entry(child, &bus->devices, bus_list) {
> -		__pci_disable_link_state(child, PCIE_LINK_STATE_L0S |
> -					 PCIE_LINK_STATE_L1 |
> -					 PCIE_LINK_STATE_CLKPM,
> -					 false, true);
> -	}
> -}
> -
>  static int pcie_aspm_set_policy(const char *val, struct kernel_param *kp)
>  {
>  	int i;
> @@ -944,7 +909,6 @@ static int __init pcie_aspm_disable(char
>  	if (!strcmp(str, "off")) {
>  		aspm_policy = POLICY_DEFAULT;
>  		aspm_disabled = 1;
> -		aspm_support_enabled = false;
>  		printk(KERN_INFO "PCIe ASPM is disabled\n");
>  	} else if (!strcmp(str, "force")) {
>  		aspm_force = 1;
> @@ -980,9 +944,3 @@ int pcie_aspm_enabled(void)
>         return !aspm_disabled;
>  }
>  EXPORT_SYMBOL(pcie_aspm_enabled);
> -
> -bool pcie_aspm_support_enabled(void)
> -{
> -	return aspm_support_enabled;
> -}
> -EXPORT_SYMBOL(pcie_aspm_support_enabled);
> Index: linux-2.6/include/linux/pci-aspm.h
> ===================================================================
> --- linux-2.6.orig/include/linux/pci-aspm.h
> +++ linux-2.6/include/linux/pci-aspm.h
> @@ -29,7 +29,6 @@ extern void pcie_aspm_pm_state_change(st
>  extern void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
>  extern void pci_disable_link_state(struct pci_dev *pdev, int state);
>  extern void pci_disable_link_state_locked(struct pci_dev *pdev, int state);
> -extern void pcie_clear_aspm(struct pci_bus *bus);
>  extern void pcie_no_aspm(void);
>  #else
>  static inline void pcie_aspm_init_link_state(struct pci_dev *pdev)
> @@ -47,9 +46,6 @@ static inline void pcie_aspm_powersave_c
>  static inline void pci_disable_link_state(struct pci_dev *pdev, int state)
>  {
>  }
> -static inline void pcie_clear_aspm(struct pci_bus *bus)
> -{
> -}
>  static inline void pcie_no_aspm(void)
>  {
>  }
> Index: linux-2.6/include/linux/pci.h
> ===================================================================
> --- linux-2.6.orig/include/linux/pci.h
> +++ linux-2.6/include/linux/pci.h
> @@ -1168,7 +1168,7 @@ static inline int pcie_aspm_enabled(void
>  static inline bool pcie_aspm_support_enabled(void) { return false; }
>  #else
>  extern int pcie_aspm_enabled(void);
> -extern bool pcie_aspm_support_enabled(void);
> +static inline bool pcie_aspm_support_enabled(void) { return true; }
>  #endif
>  
>  #ifdef CONFIG_PCIEAER

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-03-29  3:22                 ` Bjorn Helgaas
@ 2013-03-29  5:59                   ` Yinghai Lu
  2013-03-29 12:24                       ` Bjorn Helgaas
  2013-03-29 18:11                   ` Roman Yepishev
  1 sibling, 1 reply; 71+ messages in thread
From: Yinghai Lu @ 2013-03-29  5:59 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Matthew Garrett, e1000-devel, linux-pci, Kenji Kaneshige,
	linux-kernel, Rafael J. Wysocki, linux-acpi, Roman Yepishev,
	Taku Izumi


[-- Attachment #1.1: Type: text/plain, Size: 4475 bytes --]

On Thu, Mar 28, 2013 at 8:22 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:

> [+cc Matthew]
> [+cc e1000-devel@lists.sourceforge.net for suspected 82575/82598
> regression]
>
> On Thu, Mar 28, 2013 at 01:24:55PM -0700, Yinghai Lu wrote:
> > patch for Roman
> >
> > On Thu, Mar 28, 2013 at 1:24 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > > resending with adding To Roman.
> > >
> > > On Thu, Mar 28, 2013 at 5:46 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > >> This patch might be *safe*, but it (and the changelog) are completely
> > >> unintelligible.
> > >>
> > >> The problem with applying an unintelligible stop-gap patch is that it
> > >> becomes forever part of the changelog, and it's a huge waste of time
> > >> to everybody who tries to understand the history later.  That's why I
> > >> think it's worth spending some time to make a good patch now.
> > >
> > > Please check if attached patch is doing what you want.
>
> Patch inlined below for convenience.
>
> > Subject: [PATCH] PCI: Remove not needed check in disable aspm link
> >
> > Roman reported ath5k does not work anymore on 3.8.
> > Bisected to
> > | commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6
> > | Author: Taku Izumi <izumi.taku@jp.fujitsu.com>
> > | Date:   Tue Oct 30 15:27:13 2012 +0900
> > |
> > |    PCI/ACPI: Request _OSC control before scanning PCI root bus
> > |
> > |    This patch moves up the code block to request _OSC control in order to
> > |    separate ACPI work and PCI work in acpi_pci_root_add().
> >
> > That commit makes pci_disable_link_state() stop working, because
> > aspm_disabled is now set before the PCI root bus is scanned.
> > The disable is then skipped in quirks and in pcie_aspm_sanity_check().
>
> I think this regression has nothing to do with pci_disable_link_state().
>
> When aspm_disabled is set, pci_disable_link_state() doesn't do anything.
>
> In both 3.7 and 3.8, aspm_disabled is set in acpi_pci_root_add() before
> any driver probe routines are run, so it looks like calling
> pci_disable_link_state() from a driver had no effect even in 3.7.  This
> is a problem, of course, but not the one Roman is seeing, because ath5k
> calls pci_disable_link_state() from the driver probe routine.
>
> There are also PCI_FIXUP_FINAL quirks for 82575 and 82598 NICs that call
> pci_disable_link_state().  In 3.7, these quirks are run before
> aspm_disabled is set, but 8c33f51d moved the pcie_no_aspm() call up
> before we start scanning the bus, so in 3.8, aspm_disabled is set
> *before* we run them.  I think that means 8c33f51d broke all these
> quirks.  That's also a problem, of course, but this isn't the one Roman
> is seeing either.
>
> I think the problem Roman is seeing happens when
> pcie_aspm_init_link_state() calls pcie_aspm_sanity_check() during device
> enumeration.  In 3.8, the fact that aspm_disabled is already set by the
> time we get here means we skip the check for pre-1.1 PCIe devices, and
> I think *this* is what Roman is seeing.
>
> I suspect the following hunk of your patch is enough to fix things for
> Roman:
>
> > --- linux-2.6.orig/drivers/pci/pcie/aspm.c
> > +++ linux-2.6/drivers/pci/pcie/aspm.c
> > @@ -493,15 +492,6 @@ static int pcie_aspm_sanity_check(struct
> >                       return -EINVAL;
> >
> >               /*
> > -              * If ASPM is disabled then we're not going to change
> > -              * the BIOS state. It's safe to continue even if it's a
> > -              * pre-1.1 device
> > -              */
> > -
> > -             if (aspm_disabled)
> > -                     continue;
> > -
> > -             /*
> >                * Disable ASPM for pre-1.1 PCIe device, we follow MS to use
> >                * RBER bit to determine if a function is 1.1 version device
> >                */
>
> However, this test was added by Matthew in c9651e70, and I can't remove
> it unless we have an explanation of why removing it will not reintroduce
> the bug he was fixing.
>
> This code is such a terrible mess that it's not surprising at all that
> we have all these issues.  But there's too much to untangle in v3.9; all
> we can hope for is to fix the regressions in v3.9 and clean it up later.
>

v1 fixes the quirks path and the pcie_aspm_sanity_check() path.
v2 goes further: even if the user passes "aspm=off", those quirks and
the ASPM disabling done in drivers will still work, and it also calls
pcie_no_aspm() early for the FADT disable path.

So now you want half of v1, and do not want to fix the quirk path.
Is my understanding right?

Yinghai

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-03-29  5:59                   ` Yinghai Lu
@ 2013-03-29 12:24                       ` Bjorn Helgaas
  0 siblings, 0 replies; 71+ messages in thread
From: Bjorn Helgaas @ 2013-03-29 12:24 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Roman Yepishev, Rafael J. Wysocki, linux-pci, linux-acpi,
	linux-kernel, Taku Izumi, Kenji Kaneshige, Matthew Garrett,
	e1000-devel

On Thu, Mar 28, 2013 at 11:59 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>
> On Thu, Mar 28, 2013 at 8:22 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>
>> [+cc Matthew]
>> [+cc e1000-devel@lists.sourceforge.net for suspected 82575/82598
>> regression]
>>
>> On Thu, Mar 28, 2013 at 01:24:55PM -0700, Yinghai Lu wrote:
>> > patch for Roman
>> >
>> > On Thu, Mar 28, 2013 at 1:24 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> > > resending with adding To Roman.
>> > >
>> > > On Thu, Mar 28, 2013 at 5:46 AM, Bjorn Helgaas <bhelgaas@google.com>
>> > > wrote:
>> > >> This patch might be *safe*, but it (and the changelog) are completely
>> > >> unintelligible.
>> > >>
>> > >> The problem with applying an unintelligible stop-gap patch is that it
>> > >> becomes forever part of the changelog, and it's a huge waste of time
>> > >> to everybody who tries to understand the history later.  That's why I
>> > >> think it's worth spending some time to make a good patch now.
>> > >
>> > > Please check if attached patch is doing what you want.
>>
>> Patch inlined below for convenience.
>>
>> > Subject: [PATCH] PCI: Remove not needed check in disable aspm link
>> >
>> > Roman reported ath5k does not work anymore on 3.8.
>> > Bisected to
>> > | commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6
>> > | Author: Taku Izumi <izumi.taku@jp.fujitsu.com>
>> > | Date:   Tue Oct 30 15:27:13 2012 +0900
>> > |
>> > |    PCI/ACPI: Request _OSC control before scanning PCI root bus
>> > |
>> > |    This patch moves up the code block to request _OSC control in order to
>> > |    separate ACPI work and PCI work in acpi_pci_root_add().
>> >
>> > That commit makes pci_disable_link_state() stop working, because
>> > aspm_disabled is now set before the PCI root bus is scanned.
>> > The disable is then skipped in quirks and in pcie_aspm_sanity_check().
>>
>> I think this regression has nothing to do with pci_disable_link_state().
>>
>> When aspm_disabled is set, pci_disable_link_state() doesn't do anything.
>>
>> In both 3.7 and 3.8, aspm_disabled is set in acpi_pci_root_add() before
>> any driver probe routines are run, so it looks like calling
>> pci_disable_link_state() from a driver had no effect even in 3.7.  This
>> is a problem, of course, but not the one Roman is seeing, because ath5k
>> calls pci_disable_link_state() from the driver probe routine.
>>
>> There are also PCI_FIXUP_FINAL quirks for 82575 and 82598 NICs that call
>> pci_disable_link_state().  In 3.7, these quirks are run before
>> aspm_disabled is set, but 8c33f51d moved the pcie_no_aspm() call up
>> before we start scanning the bus, so in 3.8, aspm_disabled is set
>> *before* we run them.  I think that means 8c33f51d broke all these
>> quirks.  That's also a problem, of course, but this isn't the one Roman
>> is seeing either.
>>
>> I think the problem Roman is seeing happens when
>> pcie_aspm_init_link_state() calls pcie_aspm_sanity_check() during device
>> enumeration.  In 3.8, the fact that aspm_disabled is already set by the
>> time we get here means we skip the check for pre-1.1 PCIe devices, and
>> I think *this* is what Roman is seeing.
>>
>> I suspect the following hunk of your patch is enough to fix things for
>> Roman:
>>
>> > --- linux-2.6.orig/drivers/pci/pcie/aspm.c
>> > +++ linux-2.6/drivers/pci/pcie/aspm.c
>> > @@ -493,15 +492,6 @@ static int pcie_aspm_sanity_check(struct
>> >                       return -EINVAL;
>> >
>> >               /*
>> > -              * If ASPM is disabled then we're not going to change
>> > -              * the BIOS state. It's safe to continue even if it's a
>> > -              * pre-1.1 device
>> > -              */
>> > -
>> > -             if (aspm_disabled)
>> > -                     continue;
>> > -
>> > -             /*
>> >                * Disable ASPM for pre-1.1 PCIe device, we follow MS to use
>> >                * RBER bit to determine if a function is 1.1 version device
>> >                */
>>
>> However, this test was added by Matthew in c9651e70, and I can't remove
>> it unless we have an explanation of why removing it will not reintroduce
>> the bug he was fixing.
>>
>> This code is such a terrible mess that it's not surprising at all that
>> we have all these issues.  But there's too much to untangle in v3.9; all
>> we can hope for is to fix the regressions in v3.9 and clean it up later.
>
>
> v1 fixes the quirks path and the pcie_aspm_sanity_check() path.
> v2 goes further: even if the user passes "aspm=off", those quirks and
> the ASPM disabling done in drivers will still work, and it also calls
> pcie_no_aspm() early for the FADT disable path.
>
> So now you want half of v1, and do not want to fix the quirk path.
> Is my understanding right?

What I want is a patch that fixes the regression and doesn't break
anything else, along with a changelog that makes it obvious that we're
doing the right thing.  I don't know what that looks like yet.  None
of your patches so far is even close.

Half of your v1 patch (removing the pcie_aspm_sanity_check() test)
*might* be the right thing, but only if you can clearly explain why
that will not reintroduce the bug Matthew fixed with c9651e70.

I think we also need to fix the PCI_FIXUP_FINAL quirk regression, but
that's a separate issue and should be a separate patch.
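
The effect of the test discussed above can be sketched as a small
user-space model. This is only an illustration under stated assumptions:
struct model_dev and model_sanity_check() are hypothetical names, and the
RBER and aspm_force conditions are a simplified reading of the loop in
pcie_aspm_sanity_check(), not the kernel code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only what the sketch needs. */
struct model_dev {
	bool is_pcie;
	bool has_rber;	/* Role-Based Error Reporting bit, set on 1.1+ functions */
};

/*
 * Simplified model of the loop in pcie_aspm_sanity_check().
 * Returns -EINVAL when the link should be blacklisted for ASPM, else 0.
 */
static int model_sanity_check(const struct model_dev *devs, size_t n,
			      bool aspm_disabled, bool aspm_force)
{
	size_t i;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* A non-PCIe function in the slot blacklists the whole link. */
		if (!devs[i].is_pcie)
			return -EINVAL;

		/* The test under discussion: skip the pre-1.1 check entirely. */
		if (aspm_disabled)
			continue;

		/* Pre-1.1 device (no RBER bit): blacklist unless forced. */
		if (!devs[i].has_rber && !aspm_force)
			return -EINVAL;
	}
	return 0;
}
```

With aspm_disabled already set during enumeration, a pre-1.1 device is no
longer blacklisted in this model, which matches the behavior change
described for 3.8.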

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-03-29 12:24                       ` Bjorn Helgaas
@ 2013-03-29 18:02                         ` Yinghai Lu
  -1 siblings, 0 replies; 71+ messages in thread
From: Yinghai Lu @ 2013-03-29 18:02 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Roman Yepishev, Rafael J. Wysocki, linux-pci, linux-acpi,
	linux-kernel, Taku Izumi, Kenji Kaneshige, Matthew Garrett,
	e1000-devel

On Fri, Mar 29, 2013 at 5:24 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
> Half of your v1 patch (removing the pcie_aspm_sanity_check() test)
> *might* be the right thing, but only if you can clearly explain why
> that will not reintroduce the bug Matthew fixed with c9651e70.
>
> I think we also need to fix the PCI_FIXUP_FINAL quirk regression, but
> that's a separate issue and should be a separate patch.


The first commit from Matthew,

 0ae5eaf10     PCI: ignore pre-1.1 ASPM quirking when ASPM is disabled
    Right now we won't touch ASPM state if ASPM is disabled, except in the case
    where we find a device that appears to be too old to reliably support ASPM.
    Right now we'll clear it in that case, which is almost certainly the wrong
    thing to do

tried to leave ASPM untouched on all pre-1.1 devices, and it caused a lot of
regressions.

So the second commit,

cdb0f9a1ad2e ASPM: Fix pcie devices with non-pcie children
    Since 3.2.12 and 3.3, some systems are failing to boot with a BUG_ON.
    Some other systems using the pata_jmicron driver fail to boot because no
    disks are detected.  Passing pcie_aspm=force on the kernel command line
    works around it.

moved the aspm_disabled check down.

But ath5k and other pre-1.1 devices really do need ASPM disabled, which means
changing their hardware setting.

So the right solution is to drop the pcie_aspm_sanity_check() change that was
in -v2. That should keep both sides happy, as the quirks and the ath5k driver
call pci_disable_link_state() explicitly.

In -v2, we already removed pcie_clear_aspm(), which was calling
__pci_disable_link_state().
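
As a rough illustration of the point above, the sketch below contrasts the
gate at the top of __pci_disable_link_state() with the ungated form the
patch proposes. It is a user-space model under stated assumptions: struct
model_link, the MODEL_STATE_* constants and both functions are hypothetical
names, and the real function also handles locking and link lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_STATE_L0S	1
#define MODEL_STATE_L1	2

/* Hypothetical stand-in for a link: bitmask of ASPM states still allowed. */
struct model_link {
	int enabled_states;
};

/* Current behavior: the global flag can turn an explicit request into a no-op. */
static void model_disable_gated(struct model_link *link, int state,
				bool aspm_disabled, bool force)
{
	assert(link != NULL);

	if (aspm_disabled && !force)
		return;
	link->enabled_states &= ~state;
}

/* Proposed behavior: an explicit request from a quirk or driver is honored. */
static void model_disable_ungated(struct model_link *link, int state)
{
	assert(link != NULL);

	link->enabled_states &= ~state;
}
```

In this model a driver asking for L0s to be disabled gets what it asked
for regardless of the global flag, which is the behavior the explicit
callers rely on.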


Please check attached -v3.


Thanks

Yinghai

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-03-29 18:02                         ` Yinghai Lu
@ 2013-03-29 18:04                           ` Yinghai Lu
  -1 siblings, 0 replies; 71+ messages in thread
From: Yinghai Lu @ 2013-03-29 18:04 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Roman Yepishev, Rafael J. Wysocki, linux-pci, linux-acpi,
	linux-kernel, Taku Izumi, Kenji Kaneshige, Matthew Garrett,
	e1000-devel

[-- Attachment #1: Type: text/plain, Size: 1789 bytes --]

Attached -v3 again.

On Fri, Mar 29, 2013 at 11:02 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Mar 29, 2013 at 5:24 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>
>> Half of your v1 patch (removing the pcie_aspm_sanity_check() test)
>> *might* be the right thing, but only if you can clearly explain why
>> that will not reintroduce the bug Matthew fixed with c9651e70.
>>
>> I think we also need to fix the PCI_FIXUP_FINAL quirk regression, but
>> that's a separate issue and should be a separate patch.
>
>
> The first commit from Matthew,
>
>  0ae5eaf10     PCI: ignore pre-1.1 ASPM quirking when ASPM is disabled
>     Right now we won't touch ASPM state if ASPM is disabled, except in the case
>     where we find a device that appears to be too old to reliably support ASPM.
>     Right now we'll clear it in that case, which is almost certainly the wrong
>     thing to do
>
> tried to leave ASPM untouched on all pre-1.1 devices, and it caused a lot of
> regressions.
>
> So the second commit,
>
> cdb0f9a1ad2e ASPM: Fix pcie devices with non-pcie children
>     Since 3.2.12 and 3.3, some systems are failing to boot with a BUG_ON.
>     Some other systems using the pata_jmicron driver fail to boot because no
>     disks are detected.  Passing pcie_aspm=force on the kernel command line
>     works around it.
>
> moved the aspm_disabled check down.
>
> But ath5k and other pre-1.1 devices really do need ASPM disabled, which means
> changing their hardware setting.
>
> So the right solution is to drop the pcie_aspm_sanity_check() change that was
> in -v2. That should keep both sides happy, as the quirks and the ath5k driver
> call pci_disable_link_state() explicitly.
>
> In -v2, we already removed pcie_clear_aspm(), which was calling
> __pci_disable_link_state().
>
>
> Please check attached -v3.
>
>
> Thanks
>
> Yinghai

[-- Attachment #2: disable_aspm_3.patch --]
[-- Type: application/octet-stream, Size: 8221 bytes --]

Subject: [PATCH] PCI: Remove not needed check in disable aspm link

Roman reported ath5k does not work anymore on 3.8.
Bisected to
| commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6
| Author: Taku Izumi <izumi.taku@jp.fujitsu.com>
| Date:   Tue Oct 30 15:27:13 2012 +0900
|
|    PCI/ACPI: Request _OSC control before scanning PCI root bus
|
|    This patch moves up the code block to request _OSC control in order to
|    separate ACPI work and PCI work in acpi_pci_root_add().

It makes pcie_aspm_sanity_check() stop working, because aspm_disabled
is now set before PCI root bus scanning.

We could revert to the old logic, but then the boot path and the hotplug
path would again see different aspm_disabled settings.

Actually we don't need to check aspm_disabled in pci_disable_link_state(),
as we already have protection from the link state checking,
and pci_disable_link_state() is only called explicitly by quirks
and drivers.

That keeps the logic in pcie_aspm_sanity_check() from commits:
 0ae5eaf10     PCI: ignore pre-1.1 ASPM quirking when ASPM is disabled
 cdb0f9a1a     ASPM: Fix pcie devices with non-pcie children
working, i.e. ASPM on pre-1.1 devices is still left untouched.

https://bugzilla.kernel.org/show_bug.cgi?id=55211
http://article.gmane.org/gmane.linux.kernel.pci/20640

Need it for 3.8 stable.

-v2: more cleanup
	1. remove aspm_support_enabled: if ASPM is compiled in, support is there,
		so even if the user passes pcie_aspm=off, link_state still gets
		allocated and we have a chance to disable ASPM on devices
		that the BIOS set up incorrectly.
	2. move the pcie_no_aspm() call for the FADT disabling case before
		scanning, as requested by Bjorn.
-v3: remove the change in pcie_aspm_sanity_check()

Reported-by: Roman Yepishev <roman.yepishev@gmail.com>
Bisected-by: Roman Yepishev <roman.yepishev@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>

---
 drivers/acpi/pci_root.c  |   25 +++++++++---------------
 drivers/pci/pcie/aspm.c  |   48 ++---------------------------------------------
 include/linux/pci-aspm.h |    4 ---
 include/linux/pci.h      |    2 -
 4 files changed, 14 insertions(+), 65 deletions(-)

Index: linux-2.6/drivers/acpi/pci_root.c
===================================================================
--- linux-2.6.orig/drivers/acpi/pci_root.c
+++ linux-2.6/drivers/acpi/pci_root.c
@@ -415,7 +415,6 @@ static int acpi_pci_root_add(struct acpi
 	struct acpi_pci_root *root;
 	struct acpi_pci_driver *driver;
 	u32 flags, base_flags;
-	bool is_osc_granted = false;
 
 	root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
 	if (!root)
@@ -494,6 +493,11 @@ static int acpi_pci_root_add(struct acpi
 			flags = base_flags;
 		}
 	}
+
+	/* ASPM setting */
+	if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
+		pcie_no_aspm();
+
 	if (!pcie_ports_disabled
 	    && (flags & ACPI_PCIE_REQ_SUPPORT) == ACPI_PCIE_REQ_SUPPORT) {
 		flags = OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL
@@ -513,16 +517,17 @@ static int acpi_pci_root_add(struct acpi
 
 		status = acpi_pci_osc_control_set(device->handle, &flags,
 				       OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL);
-		if (ACPI_SUCCESS(status)) {
-			is_osc_granted = true;
+		if (ACPI_SUCCESS(status))
 			dev_info(&device->dev,
 				"ACPI _OSC control (0x%02x) granted\n", flags);
-		} else {
-			is_osc_granted = false;
+		else {
 			dev_info(&device->dev,
 				"ACPI _OSC request failed (%s), "
 				"returned control mask: 0x%02x\n",
 				acpi_format_exception(status), flags);
+			pr_info("ACPI _OSC control for PCIe not granted, "
+				"disabling ASPM\n");
+			pcie_no_aspm();
 		}
 	} else {
 		dev_info(&device->dev,
@@ -554,16 +559,6 @@ static int acpi_pci_root_add(struct acpi
 		goto out_del_root;
 	}
 
-	/* ASPM setting */
-	if (is_osc_granted) {
-		if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
-			pcie_clear_aspm(root->bus);
-	} else {
-		pr_info("ACPI _OSC control for PCIe not granted, "
-			"disabling ASPM\n");
-		pcie_no_aspm();
-	}
-
 	pci_acpi_add_bus_pm_notifier(device, root->bus);
 	if (device->wakeup.flags.run_wake)
 		device_set_run_wake(root->bus->bridge, true);
Index: linux-2.6/drivers/pci/pcie/aspm.c
===================================================================
--- linux-2.6.orig/drivers/pci/pcie/aspm.c
+++ linux-2.6/drivers/pci/pcie/aspm.c
@@ -69,7 +69,6 @@ struct pcie_link_state {
 };
 
 static int aspm_disabled, aspm_force;
-static bool aspm_support_enabled = true;
 static DEFINE_MUTEX(aspm_lock);
 static LIST_HEAD(link_list);
 
@@ -556,9 +546,6 @@ void pcie_aspm_init_link_state(struct pc
 	struct pcie_link_state *link;
 	int blacklist = !!pcie_aspm_sanity_check(pdev);
 
-	if (!aspm_support_enabled)
-		return;
-
 	if (!pci_is_pcie(pdev) || pdev->link_state)
 		return;
 	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT &&
@@ -718,15 +705,11 @@ void pcie_aspm_powersave_config_link(str
  * pci_disable_link_state - disable pci device's link state, so the link will
  * never enter specific states
  */
-static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem,
-				     bool force)
+static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
 {
 	struct pci_dev *parent = pdev->bus->self;
 	struct pcie_link_state *link;
 
-	if (aspm_disabled && !force)
-		return;
-
 	if (!pci_is_pcie(pdev))
 		return;
 
@@ -757,34 +740,16 @@ static void __pci_disable_link_state(str
 
 void pci_disable_link_state_locked(struct pci_dev *pdev, int state)
 {
-	__pci_disable_link_state(pdev, state, false, false);
+	__pci_disable_link_state(pdev, state, false);
 }
 EXPORT_SYMBOL(pci_disable_link_state_locked);
 
 void pci_disable_link_state(struct pci_dev *pdev, int state)
 {
-	__pci_disable_link_state(pdev, state, true, false);
+	__pci_disable_link_state(pdev, state, true);
 }
 EXPORT_SYMBOL(pci_disable_link_state);
 
-void pcie_clear_aspm(struct pci_bus *bus)
-{
-	struct pci_dev *child;
-
-	if (aspm_force)
-		return;
-
-	/*
-	 * Clear any ASPM setup that the firmware has carried out on this bus
-	 */
-	list_for_each_entry(child, &bus->devices, bus_list) {
-		__pci_disable_link_state(child, PCIE_LINK_STATE_L0S |
-					 PCIE_LINK_STATE_L1 |
-					 PCIE_LINK_STATE_CLKPM,
-					 false, true);
-	}
-}
-
 static int pcie_aspm_set_policy(const char *val, struct kernel_param *kp)
 {
 	int i;
@@ -944,7 +909,6 @@ static int __init pcie_aspm_disable(char
 	if (!strcmp(str, "off")) {
 		aspm_policy = POLICY_DEFAULT;
 		aspm_disabled = 1;
-		aspm_support_enabled = false;
 		printk(KERN_INFO "PCIe ASPM is disabled\n");
 	} else if (!strcmp(str, "force")) {
 		aspm_force = 1;
@@ -980,9 +944,3 @@ int pcie_aspm_enabled(void)
        return !aspm_disabled;
 }
 EXPORT_SYMBOL(pcie_aspm_enabled);
-
-bool pcie_aspm_support_enabled(void)
-{
-	return aspm_support_enabled;
-}
-EXPORT_SYMBOL(pcie_aspm_support_enabled);
Index: linux-2.6/include/linux/pci-aspm.h
===================================================================
--- linux-2.6.orig/include/linux/pci-aspm.h
+++ linux-2.6/include/linux/pci-aspm.h
@@ -29,7 +29,6 @@ extern void pcie_aspm_pm_state_change(st
 extern void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
 extern void pci_disable_link_state(struct pci_dev *pdev, int state);
 extern void pci_disable_link_state_locked(struct pci_dev *pdev, int state);
-extern void pcie_clear_aspm(struct pci_bus *bus);
 extern void pcie_no_aspm(void);
 #else
 static inline void pcie_aspm_init_link_state(struct pci_dev *pdev)
@@ -47,9 +46,6 @@ static inline void pcie_aspm_powersave_c
 static inline void pci_disable_link_state(struct pci_dev *pdev, int state)
 {
 }
-static inline void pcie_clear_aspm(struct pci_bus *bus)
-{
-}
 static inline void pcie_no_aspm(void)
 {
 }
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -1168,7 +1168,7 @@ static inline int pcie_aspm_enabled(void
 static inline bool pcie_aspm_support_enabled(void) { return false; }
 #else
 extern int pcie_aspm_enabled(void);
-extern bool pcie_aspm_support_enabled(void);
+static inline bool pcie_aspm_support_enabled(void) { return true; }
 #endif
 
 #ifdef CONFIG_PCIEAER

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-03-29  3:22                 ` Bjorn Helgaas
  2013-03-29  5:59                   ` Yinghai Lu
@ 2013-03-29 18:11                   ` Roman Yepishev
  1 sibling, 0 replies; 71+ messages in thread
From: Roman Yepishev @ 2013-03-29 18:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yinghai Lu, Rafael J. Wysocki, linux-pci, linux-acpi,
	linux-kernel, Taku Izumi, Kenji Kaneshige, Matthew Garrett,
	e1000-devel

On Fri, Mar 29, 2013 at 5:22 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc Matthew]
> [+cc e1000-devel@lists.sourceforge.net for suspected 82575/82598 regression]
>
> I think this regression has nothing to do with pci_disable_link_state().
>
> ...
>
> There are also PCI_FIXUP_FINAL quirks for 82575 and 82598 NICs that call
> pci_disable_link_state().  In 3.7, these quirks are run before
> aspm_disabled is set, but 8c33f51d moved the pcie_no_aspm() call up
> before we start scanning the bus, so in 3.8, aspm_disabled is set
> *before* we run them.  I think that means 8c33f51d broke all these
> quirks.  That's also a problem, of course, but this isn't the one Roman
> is seeing either.
I have to say that my iwlwifi device (Intel Corporation Centrino
Wireless-N 1000 [Condor Peak] [8086:0084]) also appears to be affected:
it calls pci_disable_link_state() too, and in current 3.8 that call does
nothing. It looks like this affects something when the system is
resumed from suspend, but I have not been able to confirm that yet.
(There is a thread about removing that code because it did not appear to
work: http://thread.gmane.org/gmane.linux.kernel.pci/20628/focus=20640)

> I think the problem Roman is seeing happens when
> pcie_aspm_init_link_state() calls pcie_aspm_sanity_check() during device
> enumeration.  In 3.8, the fact that aspm_disabled is already set by the
> time we get here means we skip the check for pre-1.1 PCIe devices, and
> I think *this* is what Roman is seeing.
>
> I suspect the following hunk of your patch is enough to fix things for
> Roman:
>
>> --- linux-2.6.orig/drivers/pci/pcie/aspm.c
>> +++ linux-2.6/drivers/pci/pcie/aspm.c
>> @@ -493,15 +492,6 @@ static int pcie_aspm_sanity_check(struct
>>                       return -EINVAL;
>>
>>               /*
>> -              * If ASPM is disabled then we're not going to change
>> -              * the BIOS state. It's safe to continue even if it's a
>> -              * pre-1.1 device
>> -              */
>> -
>> -             if (aspm_disabled)
>> -                     continue;
>> -
>> -             /*
>>                * Disable ASPM for pre-1.1 PCIe device, we follow MS to use
>>                * RBER bit to determine if a function is 1.1 version device
>>                */
>
> However, this test was added by Matthew in c9651e70, and I can't remove
> it unless we have an explanation of why removing it will not reintroduce
> the bug he was fixing.
>
> This code is such a terrible mess that it's not surprising at all that
> we have all these issues.  But there's too much to untangle in v3.9; all
> we can hope for is to fix the regressions in v3.9 and clean it up later.
>
I have removed the check and indeed it allowed ASPM to become disabled
during the ath5k driver load.
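
A toy user-space model of the hunk quoted above, only to make the control
flow concrete. Everything named toy_* is hypothetical and is not kernel
code, and the way a blacklisted link ends up with ASPM off is heavily
simplified (here it is just cleared at link init):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a PCIe function; NOT the kernel's struct pci_dev. */
struct toy_dev {
	bool pre_1_1;		/* device looks pre-1.1 (no RBER bit) */
	bool aspm_enabled;	/* ASPM state left behind by the BIOS */
};

/*
 * Models the loop in the quoted hunk.  Returns true if the link passes
 * the check.  With skip_if_disabled set, the "if (aspm_disabled)
 * continue;" test is present, as in 3.8.
 */
static bool toy_sanity_check(const struct toy_dev *devs, size_t n,
			     bool aspm_disabled, bool skip_if_disabled)
{
	assert(devs != NULL);
	for (size_t i = 0; i < n; i++) {
		if (skip_if_disabled && aspm_disabled)
			continue;	/* pre-1.1 test never runs */
		if (devs[i].pre_1_1)
			return false;	/* link gets blacklisted */
	}
	return true;
}

/* Simplified link init: a blacklisted link gets its ASPM state cleared. */
static void toy_init_link(struct toy_dev *devs, size_t n,
			  bool aspm_disabled, bool skip_if_disabled)
{
	if (toy_sanity_check(devs, n, aspm_disabled, skip_if_disabled))
		return;
	for (size_t i = 0; i < n; i++)
		devs[i].aspm_enabled = false;
}
```

In this model, a pre-1.1 device keeps its BIOS-enabled ASPM state only when
aspm_disabled is already set at enumeration time and the early "continue"
is still in place. Dropping the "continue", or setting aspm_disabled only
after enumeration, clears it again.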

--
Regards, Roman Yepishev

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-03-29 18:04                           ` Yinghai Lu
@ 2013-04-01 23:52                             ` Bjorn Helgaas
  -1 siblings, 0 replies; 71+ messages in thread
From: Bjorn Helgaas @ 2013-04-01 23:52 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Roman Yepishev, Rafael J. Wysocki, linux-pci, linux-acpi,
	linux-kernel, Taku Izumi, Kenji Kaneshige, Matthew Garrett,
	e1000-devel

On Fri, Mar 29, 2013 at 11:04:48AM -0700, Yinghai Lu wrote:
> attached -v3 again
> 
> On Fri, Mar 29, 2013 at 11:02 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Fri, Mar 29, 2013 at 5:24 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >>
> >> Half of your v1 patch (removing the pcie_aspm_sanity_check() test)
> >> *might* be the right thing, but only if you can clearly explain why
> >> that will not reintroduce the bug Matthew fixed with c9651e70.
> >>
> >> I think we also need to fix the PCI_FIXUP_FINAL quirk regression, but
> >> that's a separate issue and should be a separate patch.
> >
> >
> > First commit from Matthew
> >  0ae5eaf10     PCI: ignore pre-1.1 ASPM quirking when ASPM is disabled
> >     Right now we won't touch ASPM state if ASPM is disabled, except in the case
> >     where we find a device that appears to be too old to reliably support ASPM.
> >     Right now we'll clear it in that case, which is almost certainly the wrong
> >     thing to do
> >
> > That tried to not touch pre-1.1 ASPM at all, and it caused lots of regressions.
> >
> > So second commit
> >
> > cdb0f9a1ad2e ASPM: Fix pcie devices with non-pcie children
> >     Since 3.2.12 and 3.3, some systems are failing to boot with a BUG_ON.
> >     Some other systems using the pata_jmicron driver fail to boot because no
> >     disks are detected.  Passing pcie_aspm=force on the kernel command line
> >     works around it.
> >
> > moved the aspm_disabled check down.
> >
> > But ath5k etc. (pre-1.1) really need ASPM disabled, which means changing
> > their hw setting.
> >
> > So the right solution would be dropping pcie_aspm_sanity_check()
> > change -in v2 should make all both happy, as quirk and disable that in
> > driver for ath5 are calling
> > pcie_disable_aspm_state explicitly.
> >
> > In v2, we already removed pcie_clear_aspm() that is calling
> > pcie_disable_aspm_state.
> >
> >
> > Please check attached -v3.

It's getting late in the v3.9 cycle already, and while your v3 patch
probably fixes Roman's problem, I can't convince myself that it is
safe in general.

I think the safest thing to do at this point is to revert 8c33f51df
("PCI/ACPI: Request _OSC control before scanning PCI root bus") with a
patch like the one below.

That does mean the booting path and hotplug paths will be different (we set
aspm_disabled after boot but before hotplug), but it was that way for a
long time before 8c33f51df.  I think it's more important to fix this recent
ath5k regression than to fix a long-standing hotplug bug that nobody ever
complained about.
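
The ordering issue can be sketched with a small user-space model. This is
an illustration under stated assumptions, not kernel code: all toy_* names
are hypothetical, _OSC control is assumed to be refused (so pcie_no_aspm()
would be called), and the disable helper is modeled as a no-op once
aspm_disabled is set, which is the behavior described earlier in this
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's aspm_disabled flag (hypothetical model). */
static bool toy_aspm_disabled;

struct toy_link {
	bool l0s_enabled;	/* BIOS left L0s enabled on this link */
};

/* Models pcie_no_aspm(): from now on the OS must not touch ASPM. */
static void toy_no_aspm(void)
{
	toy_aspm_disabled = true;
}

/* Models a quirk or driver asking for L0s to be turned off. */
static void toy_disable_link_state(struct toy_link *link)
{
	assert(link != NULL);
	if (toy_aspm_disabled)
		return;		/* request is silently ignored */
	link->l0s_enabled = false;
}

enum toy_order {
	TOY_SCAN_THEN_OSC,	/* 3.7, and again after the revert */
	TOY_OSC_THEN_SCAN,	/* 3.8 with 8c33f51df applied */
};

/* Returns whether L0s is still enabled once root bridge setup is done. */
static bool toy_root_add(enum toy_order order)
{
	struct toy_link link = { .l0s_enabled = true };

	toy_aspm_disabled = false;
	if (order == TOY_OSC_THEN_SCAN)
		toy_no_aspm();
	toy_disable_link_state(&link);	/* quirk runs during the scan */
	if (order == TOY_SCAN_THEN_OSC)
		toy_no_aspm();
	return link.l0s_enabled;
}
```

With TOY_SCAN_THEN_OSC the quirk's request takes effect and L0s ends up
off; with TOY_OSC_THEN_SCAN the same request is ignored and L0s stays on.
A later hotplug scan would behave like the second case under either
ordering, since the flag is already set by then.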

Obviously, I think we should fix the hotplug bug and clean up the ASPM
mess, too.  But we need to do that when we have more time to do it right
and test it.

Bjorn


commit 96e5d01cd536458435ef0678d9fa3dc542afb41f
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Mon Apr 1 15:47:39 2013 -0600

    Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
    
    This reverts commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6.
    
    Conflicts:
    	drivers/acpi/pci_root.c
    
    Reference: https://bugzilla.kernel.org/show_bug.cgi?id=55211
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 0ac546d..c740364 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -415,7 +415,6 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	struct acpi_pci_root *root;
 	struct acpi_pci_driver *driver;
 	u32 flags, base_flags;
-	bool is_osc_granted = false;
 
 	root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
 	if (!root)
@@ -476,6 +475,30 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	flags = base_flags = OSC_PCI_SEGMENT_GROUPS_SUPPORT;
 	acpi_pci_osc_support(root, flags);
 
+	/*
+	 * TBD: Need PCI interface for enumeration/configuration of roots.
+	 */
+
+	mutex_lock(&acpi_pci_root_lock);
+	list_add_tail(&root->node, &acpi_pci_roots);
+	mutex_unlock(&acpi_pci_root_lock);
+
+	/*
+	 * Scan the Root Bridge
+	 * --------------------
+	 * Must do this prior to any attempt to bind the root device, as the
+	 * PCI namespace does not get created until this call is made (and
+	 * thus the root bridge's pci_dev does not exist).
+	 */
+	root->bus = pci_acpi_scan_root(root);
+	if (!root->bus) {
+		printk(KERN_ERR PREFIX
+			    "Bus %04x:%02x not present in PCI namespace\n",
+			    root->segment, (unsigned int)root->secondary.start);
+		result = -ENODEV;
+		goto out_del_root;
+	}
+
 	/* Indicate support for various _OSC capabilities. */
 	if (pci_ext_cfg_avail())
 		flags |= OSC_EXT_PCI_CONFIG_SUPPORT;
@@ -494,6 +517,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
 			flags = base_flags;
 		}
 	}
+
 	if (!pcie_ports_disabled
 	    && (flags & ACPI_PCIE_REQ_SUPPORT) == ACPI_PCIE_REQ_SUPPORT) {
 		flags = OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL
@@ -514,54 +538,28 @@ static int acpi_pci_root_add(struct acpi_device *device,
 		status = acpi_pci_osc_control_set(device->handle, &flags,
 				       OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL);
 		if (ACPI_SUCCESS(status)) {
-			is_osc_granted = true;
 			dev_info(&device->dev,
 				"ACPI _OSC control (0x%02x) granted\n", flags);
+			if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM) {
+				/*
+				 * We have ASPM control, but the FADT indicates
+				 * that it's unsupported. Clear it.
+				 */
+				pcie_clear_aspm(root->bus);
+			}
 		} else {
-			is_osc_granted = false;
 			dev_info(&device->dev,
 				"ACPI _OSC request failed (%s), "
 				"returned control mask: 0x%02x\n",
 				acpi_format_exception(status), flags);
+			pr_info("ACPI _OSC control for PCIe not granted, "
+				"disabling ASPM\n");
+			pcie_no_aspm();
 		}
 	} else {
 		dev_info(&device->dev,
-			"Unable to request _OSC control "
-			"(_OSC support mask: 0x%02x)\n", flags);
-	}
-
-	/*
-	 * TBD: Need PCI interface for enumeration/configuration of roots.
-	 */
-
-	mutex_lock(&acpi_pci_root_lock);
-	list_add_tail(&root->node, &acpi_pci_roots);
-	mutex_unlock(&acpi_pci_root_lock);
-
-	/*
-	 * Scan the Root Bridge
-	 * --------------------
-	 * Must do this prior to any attempt to bind the root device, as the
-	 * PCI namespace does not get created until this call is made (and 
-	 * thus the root bridge's pci_dev does not exist).
-	 */
-	root->bus = pci_acpi_scan_root(root);
-	if (!root->bus) {
-		printk(KERN_ERR PREFIX
-			    "Bus %04x:%02x not present in PCI namespace\n",
-			    root->segment, (unsigned int)root->secondary.start);
-		result = -ENODEV;
-		goto out_del_root;
-	}
-
-	/* ASPM setting */
-	if (is_osc_granted) {
-		if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
-			pcie_clear_aspm(root->bus);
-	} else {
-		pr_info("ACPI _OSC control for PCIe not granted, "
-			"disabling ASPM\n");
-		pcie_no_aspm();
+			 "Unable to request _OSC control "
+			 "(_OSC support mask: 0x%02x)\n", flags);
 	}
 
 	pci_acpi_add_bus_pm_notifier(device, root->bus);

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-04-01 23:52                             ` Bjorn Helgaas
@ 2013-04-02  0:03                               ` Yinghai Lu
  -1 siblings, 0 replies; 71+ messages in thread
From: Yinghai Lu @ 2013-04-02  0:03 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Roman Yepishev, Rafael J. Wysocki, linux-pci, linux-acpi,
	linux-kernel, Taku Izumi, Kenji Kaneshige, Matthew Garrett,
	e1000-devel

On Mon, Apr 1, 2013 at 4:52 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Mar 29, 2013 at 11:04:48AM -0700, Yinghai Lu wrote:
>> attached -v3 again
>>
>> > Please check attached -v3.
>
> It's getting late in the v3.9 cycle already, and while your v3 patch
> probably fixes Roman's problem, I can't convince myself that it is
> safe in general.
>
> I think the safest thing to do at this point is to revert 8c33f51df
> ("PCI/ACPI: Request _OSC control before scanning PCI root bus") with a
> patch like the one below.

Agreed.

>
> That does mean the booting path and hotplug paths will be different (we set
> aspm_disabled after boot but before hotplug), but it was that way for a
> long time before 8c33f51df.  I think it's more important to fix this recent
> ath5k regression than to fix a long-standing hotplug bug that nobody ever
> complained about.
>
> Obviously, I think we should fix the hotplug bug and clean up the ASPM
> mess, too.  But we need to do that when we have more time to do it right
> and test it.

Sure.

>
> Bjorn
>
>
> commit 96e5d01cd536458435ef0678d9fa3dc542afb41f
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Mon Apr 1 15:47:39 2013 -0600
>
>     Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
>
>     This reverts commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6.
>
>     Conflicts:
>         drivers/acpi/pci_root.c
>
>     Reference: https://bugzilla.kernel.org/show_bug.cgi?id=55211
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Acked-by: Yinghai Lu <yinghai@kernel.org>

Stable needs this revert too.

Yinghai

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-04-01 23:52                             ` Bjorn Helgaas
@ 2013-04-02  0:10                               ` Rafael J. Wysocki
  -1 siblings, 0 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2013-04-02  0:10 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yinghai Lu, Roman Yepishev, linux-pci, linux-acpi, linux-kernel,
	Taku Izumi, Kenji Kaneshige, Matthew Garrett, e1000-devel

On Monday, April 01, 2013 05:52:56 PM Bjorn Helgaas wrote:
> On Fri, Mar 29, 2013 at 11:04:48AM -0700, Yinghai Lu wrote:
> > attached -v3 again
> > 
> > On Fri, Mar 29, 2013 at 11:02 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> > > On Fri, Mar 29, 2013 at 5:24 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > >>
> > >> Half of your v1 patch (removing the pcie_aspm_sanity_check() test)
> > >> *might* be the right thing, but only if you can clearly explain why
> > >> that will not reintroduce the bug Matthew fixed with c9651e70.
> > >>
> > >> I think we also need to fix the PCI_FIXUP_FINAL quirk regression, but
> > >> that's a separate issue and should be a separate patch.
> > >
> > >
> > > First commit from Matthew
> > >  0ae5eaf10     PCI: ignore pre-1.1 ASPM quirking when ASPM is disabled
> > >     Right now we won't touch ASPM state if ASPM is disabled, except in the case
> > >     where we find a device that appears to be too old to reliably support ASPM.
> > >     Right now we'll clear it in that case, which is almost certainly the wrong
> > >     thing to do
> > >
> > > That tried to not touch pre-1.1 ASPM at all, and it caused lots of regressions.
> > >
> > > So second commit
> > >
> > > cdb0f9a1ad2e ASPM: Fix pcie devices with non-pcie children
> > >     Since 3.2.12 and 3.3, some systems are failing to boot with a BUG_ON.
> > >     Some other systems using the pata_jmicron driver fail to boot because no
> > >     disks are detected.  Passing pcie_aspm=force on the kernel command line
> > >     works around it.
> > >
> > > moved the aspm_disabled check down.
> > >
> > > But ath5k etc. (pre-1.1) really need ASPM disabled, which means changing
> > > their hw setting.
> > >
> > > So the right solution would be dropping pcie_aspm_sanity_check()
> > > change -in v2 should make all both happy, as quirk and disable that in
> > > driver for ath5 are calling
> > > pcie_disable_aspm_state explicitly.
> > >
> > > In v2, we already removed pcie_clear_aspm() that is calling
> > > pcie_disable_aspm_state.
> > >
> > >
> > > Please check attached -v3.
> 
> It's getting late in the v3.9 cycle already, and while your v3 patch
> probably fixes Roman's problem, I can't convince myself that it is
> safe in general.
> 
> I think the safest thing to do at this point is to revert 8c33f51df
> ("PCI/ACPI: Request _OSC control before scanning PCI root bus") with a
> patch like the one below.
> 
> That does mean the booting path and hotplug paths will be different (we set
> aspm_disabled after boot but before hotplug), but it was that way for a
> long time before 8c33f51df.  I think it's more important to fix this recent
> ath5k regression than to fix a long-standing hotplug bug that nobody ever
> complained about.
> 
> Obviously, I think we should fix the hotplug bug and clean up the ASPM
> mess, too.  But we need to do that when we have more time to do it right
> and test it.
> 
> Bjorn
> 
> 
> commit 96e5d01cd536458435ef0678d9fa3dc542afb41f
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Mon Apr 1 15:47:39 2013 -0600
> 
>     Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
>     
>     This reverts commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6.
>     
>     Conflicts:
>     	drivers/acpi/pci_root.c
>     
>     Reference: https://bugzilla.kernel.org/show_bug.cgi?id=55211
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 0ac546d..c740364 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -415,7 +415,6 @@ static int acpi_pci_root_add(struct acpi_device *device,
>  	struct acpi_pci_root *root;
>  	struct acpi_pci_driver *driver;
>  	u32 flags, base_flags;
> -	bool is_osc_granted = false;
>  
>  	root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
>  	if (!root)
> @@ -476,6 +475,30 @@ static int acpi_pci_root_add(struct acpi_device *device,
>  	flags = base_flags = OSC_PCI_SEGMENT_GROUPS_SUPPORT;
>  	acpi_pci_osc_support(root, flags);
>  
> +	/*
> +	 * TBD: Need PCI interface for enumeration/configuration of roots.
> +	 */
> +
> +	mutex_lock(&acpi_pci_root_lock);
> +	list_add_tail(&root->node, &acpi_pci_roots);
> +	mutex_unlock(&acpi_pci_root_lock);
> +
> +	/*
> +	 * Scan the Root Bridge
> +	 * --------------------
> +	 * Must do this prior to any attempt to bind the root device, as the
> +	 * PCI namespace does not get created until this call is made (and
> +	 * thus the root bridge's pci_dev does not exist).
> +	 */
> +	root->bus = pci_acpi_scan_root(root);
> +	if (!root->bus) {
> +		printk(KERN_ERR PREFIX
> +			    "Bus %04x:%02x not present in PCI namespace\n",
> +			    root->segment, (unsigned int)root->secondary.start);
> +		result = -ENODEV;
> +		goto out_del_root;
> +	}
> +
>  	/* Indicate support for various _OSC capabilities. */
>  	if (pci_ext_cfg_avail())
>  		flags |= OSC_EXT_PCI_CONFIG_SUPPORT;
> @@ -494,6 +517,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
>  			flags = base_flags;
>  		}
>  	}
> +
>  	if (!pcie_ports_disabled
>  	    && (flags & ACPI_PCIE_REQ_SUPPORT) == ACPI_PCIE_REQ_SUPPORT) {
>  		flags = OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL
> @@ -514,54 +538,28 @@ static int acpi_pci_root_add(struct acpi_device *device,
>  		status = acpi_pci_osc_control_set(device->handle, &flags,
>  				       OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL);
>  		if (ACPI_SUCCESS(status)) {
> -			is_osc_granted = true;
>  			dev_info(&device->dev,
>  				"ACPI _OSC control (0x%02x) granted\n", flags);
> +			if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM) {
> +				/*
> +				 * We have ASPM control, but the FADT indicates
> +				 * that it's unsupported. Clear it.
> +				 */
> +				pcie_clear_aspm(root->bus);
> +			}
>  		} else {
> -			is_osc_granted = false;
>  			dev_info(&device->dev,
>  				"ACPI _OSC request failed (%s), "
>  				"returned control mask: 0x%02x\n",
>  				acpi_format_exception(status), flags);
> +			pr_info("ACPI _OSC control for PCIe not granted, "
> +				"disabling ASPM\n");
> +			pcie_no_aspm();
>  		}
>  	} else {
>  		dev_info(&device->dev,
> -			"Unable to request _OSC control "
> -			"(_OSC support mask: 0x%02x)\n", flags);
> -	}
> -
> -	/*
> -	 * TBD: Need PCI interface for enumeration/configuration of roots.
> -	 */
> -
> -	mutex_lock(&acpi_pci_root_lock);
> -	list_add_tail(&root->node, &acpi_pci_roots);
> -	mutex_unlock(&acpi_pci_root_lock);
> -
> -	/*
> -	 * Scan the Root Bridge
> -	 * --------------------
> -	 * Must do this prior to any attempt to bind the root device, as the
> -	 * PCI namespace does not get created until this call is made (and 
> -	 * thus the root bridge's pci_dev does not exist).
> -	 */
> -	root->bus = pci_acpi_scan_root(root);
> -	if (!root->bus) {
> -		printk(KERN_ERR PREFIX
> -			    "Bus %04x:%02x not present in PCI namespace\n",
> -			    root->segment, (unsigned int)root->secondary.start);
> -		result = -ENODEV;
> -		goto out_del_root;
> -	}
> -
> -	/* ASPM setting */
> -	if (is_osc_granted) {
> -		if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
> -			pcie_clear_aspm(root->bus);
> -	} else {
> -		pr_info("ACPI _OSC control for PCIe not granted, "
> -			"disabling ASPM\n");
> -		pcie_no_aspm();
> +			 "Unable to request _OSC control "
> +			 "(_OSC support mask: 0x%02x)\n", flags);
>  	}
>  
>  	pci_acpi_add_bus_pm_notifier(device, root->bus);
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 71+ messages in thread


> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 0ac546d..c740364 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -415,7 +415,6 @@ static int acpi_pci_root_add(struct acpi_device *device,
>  	struct acpi_pci_root *root;
>  	struct acpi_pci_driver *driver;
>  	u32 flags, base_flags;
> -	bool is_osc_granted = false;
>  
>  	root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
>  	if (!root)
> @@ -476,6 +475,30 @@ static int acpi_pci_root_add(struct acpi_device *device,
>  	flags = base_flags = OSC_PCI_SEGMENT_GROUPS_SUPPORT;
>  	acpi_pci_osc_support(root, flags);
>  
> +	/*
> +	 * TBD: Need PCI interface for enumeration/configuration of roots.
> +	 */
> +
> +	mutex_lock(&acpi_pci_root_lock);
> +	list_add_tail(&root->node, &acpi_pci_roots);
> +	mutex_unlock(&acpi_pci_root_lock);
> +
> +	/*
> +	 * Scan the Root Bridge
> +	 * --------------------
> +	 * Must do this prior to any attempt to bind the root device, as the
> +	 * PCI namespace does not get created until this call is made (and
> +	 * thus the root bridge's pci_dev does not exist).
> +	 */
> +	root->bus = pci_acpi_scan_root(root);
> +	if (!root->bus) {
> +		printk(KERN_ERR PREFIX
> +			    "Bus %04x:%02x not present in PCI namespace\n",
> +			    root->segment, (unsigned int)root->secondary.start);
> +		result = -ENODEV;
> +		goto out_del_root;
> +	}
> +
>  	/* Indicate support for various _OSC capabilities. */
>  	if (pci_ext_cfg_avail())
>  		flags |= OSC_EXT_PCI_CONFIG_SUPPORT;
> @@ -494,6 +517,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
>  			flags = base_flags;
>  		}
>  	}
> +
>  	if (!pcie_ports_disabled
>  	    && (flags & ACPI_PCIE_REQ_SUPPORT) == ACPI_PCIE_REQ_SUPPORT) {
>  		flags = OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL
> @@ -514,54 +538,28 @@ static int acpi_pci_root_add(struct acpi_device *device,
>  		status = acpi_pci_osc_control_set(device->handle, &flags,
>  				       OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL);
>  		if (ACPI_SUCCESS(status)) {
> -			is_osc_granted = true;
>  			dev_info(&device->dev,
>  				"ACPI _OSC control (0x%02x) granted\n", flags);
> +			if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM) {
> +				/*
> +				 * We have ASPM control, but the FADT indicates
> +				 * that it's unsupported. Clear it.
> +				 */
> +				pcie_clear_aspm(root->bus);
> +			}
>  		} else {
> -			is_osc_granted = false;
>  			dev_info(&device->dev,
>  				"ACPI _OSC request failed (%s), "
>  				"returned control mask: 0x%02x\n",
>  				acpi_format_exception(status), flags);
> +			pr_info("ACPI _OSC control for PCIe not granted, "
> +				"disabling ASPM\n");
> +			pcie_no_aspm();
>  		}
>  	} else {
>  		dev_info(&device->dev,
> -			"Unable to request _OSC control "
> -			"(_OSC support mask: 0x%02x)\n", flags);
> -	}
> -
> -	/*
> -	 * TBD: Need PCI interface for enumeration/configuration of roots.
> -	 */
> -
> -	mutex_lock(&acpi_pci_root_lock);
> -	list_add_tail(&root->node, &acpi_pci_roots);
> -	mutex_unlock(&acpi_pci_root_lock);
> -
> -	/*
> -	 * Scan the Root Bridge
> -	 * --------------------
> -	 * Must do this prior to any attempt to bind the root device, as the
> -	 * PCI namespace does not get created until this call is made (and 
> -	 * thus the root bridge's pci_dev does not exist).
> -	 */
> -	root->bus = pci_acpi_scan_root(root);
> -	if (!root->bus) {
> -		printk(KERN_ERR PREFIX
> -			    "Bus %04x:%02x not present in PCI namespace\n",
> -			    root->segment, (unsigned int)root->secondary.start);
> -		result = -ENODEV;
> -		goto out_del_root;
> -	}
> -
> -	/* ASPM setting */
> -	if (is_osc_granted) {
> -		if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
> -			pcie_clear_aspm(root->bus);
> -	} else {
> -		pr_info("ACPI _OSC control for PCIe not granted, "
> -			"disabling ASPM\n");
> -		pcie_no_aspm();
> +			 "Unable to request _OSC control "
> +			 "(_OSC support mask: 0x%02x)\n", flags);
>  	}
>  
>  	pci_acpi_add_bus_pm_notifier(device, root->bus);
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-04-02  0:03                               ` Yinghai Lu
@ 2013-04-02 20:10                                 ` Bjorn Helgaas
  -1 siblings, 0 replies; 71+ messages in thread
From: Bjorn Helgaas @ 2013-04-02 20:10 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Roman Yepishev, Rafael J. Wysocki, linux-pci, linux-acpi,
	linux-kernel, Taku Izumi, Kenji Kaneshige, Matthew Garrett,
	e1000-devel

On Mon, Apr 1, 2013 at 6:03 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Apr 1, 2013 at 4:52 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Fri, Mar 29, 2013 at 11:04:48AM -0700, Yinghai Lu wrote:
>>> attatched -v3 again
>>>
>>> > Please check attached -v3.
>>
>> It's getting late in the v3.9 cycle already, and while your v3 patch
>> probably fixes Roman's problem, I can't convince myself that it is
>> safe in general.
>>
>> I think the safest thing to do at this point is to revert 8c33f51df
>> ("PCI/ACPI: Request _OSC control before scanning PCI root bus") with a
>> patch like the one below.
>
> Agreed.
>
>>
>> That does mean the booting path and hotplug paths will be different (we set
>> aspm_disabled after boot but before hotplug), but it was that way for a
>> long time before 8c33f51df.  I think it's more important to fix this recent
>> ath5k regression than to fix a long-standing hotplug bug that nobody ever
>> complained about.
>>
>> Obviously, I think we should fix the hotplug bug and clean up the ASPM
>> mess, too.  But we need to do that when we have more time to do it right
>> and test it.
>
> Sure.
>
>>
>> Bjorn
>>
>>
>> commit 96e5d01cd536458435ef0678d9fa3dc542afb41f
>> Author: Bjorn Helgaas <bhelgaas@google.com>
>> Date:   Mon Apr 1 15:47:39 2013 -0600
>>
>>     Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
>>
>>     This reverts commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6.
>>
>>     Conflicts:
>>         drivers/acpi/pci_root.c
>>
>>     Reference: https://bugzilla.kernel.org/show_bug.cgi?id=55211
>>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>
> Acked-by: Yinghai Lu <yinghai@kernel.org>
>
> stable need this reverting too.

I updated the changelog and added this to my for-linus branch, headed for v3.9.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-04-02 20:10                                 ` Bjorn Helgaas
@ 2013-06-12  6:20                                   ` Yinghai Lu
  -1 siblings, 0 replies; 71+ messages in thread
From: Yinghai Lu @ 2013-06-12  6:20 UTC (permalink / raw)
  To: Bjorn Helgaas, Jiang Liu
  Cc: Roman Yepishev, Rafael J. Wysocki, linux-pci, linux-acpi,
	linux-kernel, Taku Izumi, Kenji Kaneshige, Matthew Garrett,
	e1000-devel

On Tue, Apr 2, 2013 at 1:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Mon, Apr 1, 2013 at 6:03 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> commit 96e5d01cd536458435ef0678d9fa3dc542afb41f
>>> Author: Bjorn Helgaas <bhelgaas@google.com>
>>> Date:   Mon Apr 1 15:47:39 2013 -0600
>>>
>>>     Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
>>>
>>>     This reverts commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6.
>>>
>>>     Conflicts:
>>>         drivers/acpi/pci_root.c
>>>
>>>     Reference: https://bugzilla.kernel.org/show_bug.cgi?id=55211
>>>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>>
>> Acked-by: Yinghai Lu <yinghai@kernel.org>
>>
>> stable need this reverting too.
>
> I updated the changelog and added this to my for-linus branch, headed for v3.9.

We may need to revert this revert for 3.10.

Today I noticed that acpiphp jumps in before pciehp on one setup.

In the current code, acpi_pci_root_add does the following:
  1. pci_acpi_scan_root
         ==> pci devices enumeration and bus scanning.
             ==> pci_alloc_child_bus
                 ==> pcibios_add_bus
                     ==> acpi_pci_add_bus
                         ==> acpiphp_enumerate_slots
                             ==> ...==> register_slot
                                  ==> device_is_managed_by_native_pciehp
                                        ==> check osc_set with OSC_PCI_EXPRESS_NATIVE_HP_CONTROL
  2. _OSC set request

So the acpiphp hotplug slot always gets registered first.
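
The ordering described above can be sketched with a small toy model. This is not kernel code: every name here (toy_root, toy_request_osc, toy_scan_root, toy_pciehp_probe, toy_root_add) is hypothetical, and the only thing it assumes is what the call path says, namely that the acpiphp slot registration during the scan consults the _OSC native hotplug bit, which has not been requested yet at that point.

```c
#include <stdbool.h>

/* Toy model only: all names are hypothetical, none exist in the kernel. */
enum hp_owner { HP_NONE, HP_ACPIPHP, HP_PCIEHP };

struct toy_root {
	bool native_hp_granted;	/* stands in for the OSC_PCI_EXPRESS_NATIVE_HP_CONTROL bit */
	enum hp_owner slot_owner;
};

/* Stands in for the _OSC request; assume firmware grants native hotplug. */
static void toy_request_osc(struct toy_root *root)
{
	root->native_hp_granted = true;
}

/*
 * Stands in for the scan path: acpiphp registers the slot during the
 * scan unless the slot is already known to be managed by native pciehp.
 */
static void toy_scan_root(struct toy_root *root)
{
	if (!root->native_hp_granted)
		root->slot_owner = HP_ACPIPHP;
}

/* Stands in for pciehp binding later: it only takes an unowned slot. */
static void toy_pciehp_probe(struct toy_root *root)
{
	if (root->native_hp_granted && root->slot_owner == HP_NONE)
		root->slot_owner = HP_PCIEHP;
}

/* Runs the two steps in either order and reports who owns the slot. */
static enum hp_owner toy_root_add(bool osc_before_scan)
{
	struct toy_root root = { false, HP_NONE };

	if (osc_before_scan)
		toy_request_osc(&root);
	toy_scan_root(&root);
	if (!osc_before_scan)
		toy_request_osc(&root);
	toy_pciehp_probe(&root);

	return root.slot_owner;
}
```

In this model, toy_root_add(false) (scan first, as in the current code) leaves the slot with acpiphp, while toy_root_add(true) (_OSC first) leaves it with pciehp.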

So we need to do one of:
A. revert the _OSC revert
B. move pcibios_add_bus down to pci_bus_add_devices(),
    as acpiphp and the acpi pci slot driver are a kind of driver for pci_bus
C. A+B

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-12  6:20                                   ` Yinghai Lu
@ 2013-06-12 17:05                                     ` Bjorn Helgaas
  -1 siblings, 0 replies; 71+ messages in thread
From: Bjorn Helgaas @ 2013-06-12 17:05 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jiang Liu, Roman Yepishev, Rafael J. Wysocki, linux-pci,
	linux-acpi, linux-kernel, Taku Izumi, Kenji Kaneshige,
	Matthew Garrett, e1000-devel

On Wed, Jun 12, 2013 at 12:20 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, Apr 2, 2013 at 1:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Mon, Apr 1, 2013 at 6:03 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>>> commit 96e5d01cd536458435ef0678d9fa3dc542afb41f
>>>> Author: Bjorn Helgaas <bhelgaas@google.com>
>>>> Date:   Mon Apr 1 15:47:39 2013 -0600
>>>>
>>>>     Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
>>>>
>>>>     This reverts commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6.
>>>>
>>>>     Conflicts:
>>>>         drivers/acpi/pci_root.c
>>>>
>>>>     Reference: https://bugzilla.kernel.org/show_bug.cgi?id=55211
>>>>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>>>
>>> Acked-by: Yinghai Lu <yinghai@kernel.org>
>>>
>>> stable need this reverting too.
>>
>> I updated the changelog and added this to my for-linus branch, headed for v3.9.
>
> We may need to revert this reverting for 3.10.
>
> Today noticed that acpiphp jump in before pciehp on one setup.
>
> current code from acpi_pci_root_add we have
>   1. pci_acpi_scan_root
>          ==> pci devices enumeration and bus scanning.
>              ==> pci_alloc_child_bus
>                  ==> pcibios_add_bus
>                      ==> acpi_pci_add_bus
>                          ==> acpiphp_enumerate_slots
>                              ==> ...==> register_slot
>                                   ==> device_is_managed_by_native_pciehp
>                                         ==> check osc_set with
> OSC_PCI_EXPRESS_NATIVE_HP_CONTROL
>    2. _OSC set request
>
> so we always have acpiphp hotplug slot registered at first.
>
> so either we need to
> A. revert reverting about _OSC
> B. move pcibios_add_bus down to pci_bus_add_devices()
>     as acpiphp and apci pci slot driver are some kind of drivers for pci_bus
> C. A+B

It doesn't surprise me at all that there are problems in the _OSC code
and the acpiphp/pciehp interaction.  That whole area is a complete
disaster.  It'd really be nice if somebody stepped up and reworked it
so it makes sense.

But this report is useless to me.  I don't have time to work out what
the problem is and how it affects users and come up with a fix.

My advice is to simplify the path first, and worry about fixing the
bug afterwards.  We've already done several iterations of fiddling
with things, and I think all we're doing is playing "whack-a-mole" and
pushing the bugs around from one place to another.

Bjorn

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-12 17:05                                     ` Bjorn Helgaas
  (?)
@ 2013-06-12 19:41                                     ` Yinghai Lu
  2013-06-13  3:50                                       ` Bjorn Helgaas
                                                         ` (2 more replies)
  -1 siblings, 3 replies; 71+ messages in thread
From: Yinghai Lu @ 2013-06-12 19:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jiang Liu, Roman Yepishev, Rafael J. Wysocki, linux-pci,
	linux-acpi, linux-kernel, Linus Torvalds, Andrew Morton,
	Greg Kroah-Hartman

[-- Attachment #1: Type: text/plain, Size: 2038 bytes --]

On Wed, Jun 12, 2013 at 10:05 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> current code from acpi_pci_root_add we have
>>   1. pci_acpi_scan_root
>>          ==> pci devices enumeration and bus scanning.
>>              ==> pci_alloc_child_bus
>>                  ==> pcibios_add_bus
>>                      ==> acpi_pci_add_bus
>>                          ==> acpiphp_enumerate_slots
>>                              ==> ...==> register_slot
>>                                   ==> device_is_managed_by_native_pciehp
>>                                         ==> check osc_set with
>> OSC_PCI_EXPRESS_NATIVE_HP_CONTROL
>>    2. _OSC set request
>>
>> so we always have acpiphp hotplug slot registered at first.
>>
>> so either we need to
>> A. revert reverting about _OSC
>> B. move pcibios_add_bus down to pci_bus_add_devices()
>>     as acpiphp and apci pci slot driver are some kind of drivers for pci_bus
>> C. A+B
>
> It doesn't surprise me at all that there are problems in the _OSC code
> and the acpiphp/pciehp interaction.  That whole area is a complete
> disaster.  It'd really be nice if somebody stepped up and reworked it
> so it makes sense.
>
> But this report is useless to me.  I don't have time to work out what
> the problem is and how it affects users and come up with a fix.

Effect: without a fix, users cannot use PCIe native hotplug if their
system's firmware supports both acpiphp and pciehp.
To make it worse, acpiphp now has to be built in, so they have no
way to blacklist acpiphp in config.

>
> My advice is to simplify the path first, and worry about fixing the
> bug afterwards.  We've already done several iterations of fiddling
> with things, and I think all we're doing is playing "whack-a-mole" and
> pushing the bugs around from one place to another.

We need to address the regression first.
My suggestion is: revert the revert and apply my -v3 version, which fixes
the regression that Roman Yepishev hit.

Please check the two attached patches; I hope they save you some time.

Yinghai

[-- Attachment #2: revert_revert_osc_change_linus.patch --]
[-- Type: application/octet-stream, Size: 5303 bytes --]

Subject: [PATCH] ACPI, PCI: Revert reverting of "PCI/ACPI: Request _OSC control before scanning PCI root bus"

In
| commit b8178f130e25c1bdac1c33e0996f1ff6e20ec08e
| Author: Bjorn Helgaas <bhelgaas@google.com>
| Date:   Mon Apr 1 15:47:39 2013 -0600
|
|    Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
|
|    This reverts commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6.
the "OSC set early" change was reverted for v3.9.

Now we have a problem on v3.10, as it contains
       - Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe)
       - Make acpiphp builtin only, not modular (Jiang Liu)
so acpiphp gets registered through pcibios_add_bus very early.
We cannot enable the PCIe native hotplug driver any more if the system
supports both acpiphp and pciehp.

Calling path: in acpi_pci_root_add we have
1. pci_acpi_scan_root
     ==> pci devices enumeration and bus scanning.
       ==> pci_alloc_child_bus
         ==> pcibios_add_bus
           ==> acpi_pci_add_bus
             ==> acpiphp_enumerate_slots
               ==> ...==> register_slot
                 ==> device_is_managed_by_native_pciehp
                   ==> check osc_set with OSC_PCI_EXPRESS_NATIVE_HP_CONTROL
2. _OSC set request
So the acpiphp hotplug slot always gets registered first, as _OSC is
not set yet.

We need "OSC set early" before pci_acpi_scan_root again.

The point: later on, aspm, pme, pciehp and aer support rely on the values
in the root's osc_support_set/osc_control_set.

For v3.10, let's put "OSC control set early" back,
as it now has a user in pci_acpi_scan_root.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index e427dc5..207d773 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -382,6 +382,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	int result;
 	struct acpi_pci_root *root;
 	u32 flags, base_flags;
+	bool is_osc_granted = false;
 
 	root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
 	if (!root)
@@ -442,30 +443,6 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	flags = base_flags = OSC_PCI_SEGMENT_GROUPS_SUPPORT;
 	acpi_pci_osc_support(root, flags);
 
-	/*
-	 * TBD: Need PCI interface for enumeration/configuration of roots.
-	 */
-
-	mutex_lock(&acpi_pci_root_lock);
-	list_add_tail(&root->node, &acpi_pci_roots);
-	mutex_unlock(&acpi_pci_root_lock);
-
-	/*
-	 * Scan the Root Bridge
-	 * --------------------
-	 * Must do this prior to any attempt to bind the root device, as the
-	 * PCI namespace does not get created until this call is made (and
-	 * thus the root bridge's pci_dev does not exist).
-	 */
-	root->bus = pci_acpi_scan_root(root);
-	if (!root->bus) {
-		printk(KERN_ERR PREFIX
-			    "Bus %04x:%02x not present in PCI namespace\n",
-			    root->segment, (unsigned int)root->secondary.start);
-		result = -ENODEV;
-		goto out_del_root;
-	}
-
 	/* Indicate support for various _OSC capabilities. */
 	if (pci_ext_cfg_avail())
 		flags |= OSC_EXT_PCI_CONFIG_SUPPORT;
@@ -484,7 +461,6 @@ static int acpi_pci_root_add(struct acpi_device *device,
 			flags = base_flags;
 		}
 	}
-
 	if (!pcie_ports_disabled
 	    && (flags & ACPI_PCIE_REQ_SUPPORT) == ACPI_PCIE_REQ_SUPPORT) {
 		flags = OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL
@@ -505,28 +481,54 @@ static int acpi_pci_root_add(struct acpi_device *device,
 		status = acpi_pci_osc_control_set(device->handle, &flags,
 				       OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL);
 		if (ACPI_SUCCESS(status)) {
+			is_osc_granted = true;
 			dev_info(&device->dev,
 				"ACPI _OSC control (0x%02x) granted\n", flags);
-			if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM) {
-				/*
-				 * We have ASPM control, but the FADT indicates
-				 * that it's unsupported. Clear it.
-				 */
-				pcie_clear_aspm(root->bus);
-			}
 		} else {
+			is_osc_granted = false;
 			dev_info(&device->dev,
 				"ACPI _OSC request failed (%s), "
 				"returned control mask: 0x%02x\n",
 				acpi_format_exception(status), flags);
-			pr_info("ACPI _OSC control for PCIe not granted, "
-				"disabling ASPM\n");
-			pcie_no_aspm();
 		}
 	} else {
 		dev_info(&device->dev,
-			 "Unable to request _OSC control "
-			 "(_OSC support mask: 0x%02x)\n", flags);
+			"Unable to request _OSC control "
+			"(_OSC support mask: 0x%02x)\n", flags);
+	}
+
+	/*
+	 * TBD: Need PCI interface for enumeration/configuration of roots.
+	 */
+
+	mutex_lock(&acpi_pci_root_lock);
+	list_add_tail(&root->node, &acpi_pci_roots);
+	mutex_unlock(&acpi_pci_root_lock);
+
+	/*
+	 * Scan the Root Bridge
+	 * --------------------
+	 * Must do this prior to any attempt to bind the root device, as the
+	 * PCI namespace does not get created until this call is made (and 
+	 * thus the root bridge's pci_dev does not exist).
+	 */
+	root->bus = pci_acpi_scan_root(root);
+	if (!root->bus) {
+		printk(KERN_ERR PREFIX
+			    "Bus %04x:%02x not present in PCI namespace\n",
+			    root->segment, (unsigned int)root->secondary.start);
+		result = -ENODEV;
+		goto out_del_root;
+	}
+
+	/* ASPM setting */
+	if (is_osc_granted) {
+		if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
+			pcie_clear_aspm(root->bus);
+	} else {
+		pr_info("ACPI _OSC control for PCIe not granted, "
+			"disabling ASPM\n");
+		pcie_no_aspm();
 	}
 
 	pci_acpi_add_bus_pm_notifier(device, root->bus);

[-- Attachment #3: disable_aspm_3_linus.patch --]
[-- Type: application/octet-stream, Size: 8145 bytes --]

Subject: [PATCH] PCI: Remove not needed check in disable aspm link

Roman reported ath5k does not work anymore on 3.8.
Bisected to
| commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6
| Author: Taku Izumi <izumi.taku@jp.fujitsu.com>
| Date:   Tue Oct 30 15:27:13 2012 +0900
|
|    PCI/ACPI: Request _OSC control before scanning PCI root bus
|
|    This patch moves up the code block to request _OSC control in order to
|    separate ACPI work and PCI work in acpi_pci_root_add().

It makes pcie_aspm_sanity_check() stop working, because aspm_disabled
is now set before PCI root bus scanning.

Bjorn reverted that for v3.9, which again leaves the boot path and the
hotplug path with different aspm_disabled settings.
To fix acpiphp and pciehp loading, we need to put "OSC set early" back.

For the regression that Roman hit, we actually don't need to check
aspm_disabled in pci_disable_link_state, as we already have
protection through the link state checking.
And pci_disable_link_state is only called directly from quirks and
drivers.

That keeps the logic in pcie_aspm_sanity_check() from commits:
 0ae5eaf10     PCI: ignore pre-1.1 ASPM quirking when ASPM is disabled
 cdb0f9a1a     ASPM: Fix pcie devices with non-pcie children
working, i.e. we still do not touch pre-1.1 ASPM devices.

https://bugzilla.kernel.org/show_bug.cgi?id=55211
http://article.gmane.org/gmane.linux.kernel.pci/20640

Also remove aspm_support_enabled. If ASPM is compiled in, link_state
still gets allocated even when the user passes pcie_aspm=off, so we still
have a chance to disable ASPM on devices that the BIOS set up wrongly.

As Bjorn suggested, move the FADT disabling handling earlier.

-v4: update changelog accordingly for v3.10

Reported-by: Roman Yepishev <roman.yepishev@gmail.com>
Bisected-by: Roman Yepishev <roman.yepishev@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>

---
 drivers/acpi/pci_root.c  |   24 +++++++++---------------
 drivers/pci/pcie/aspm.c  |   39 +++------------------------------------
 include/linux/pci-aspm.h |    4 ----
 include/linux/pci.h      |    2 +-
 4 files changed, 13 insertions(+), 56 deletions(-)

Index: linux-2.6/drivers/acpi/pci_root.c
===================================================================
--- linux-2.6.orig/drivers/acpi/pci_root.c
+++ linux-2.6/drivers/acpi/pci_root.c
@@ -382,7 +382,6 @@ static int acpi_pci_root_add(struct acpi
 	int result;
 	struct acpi_pci_root *root;
 	u32 flags, base_flags;
-	bool is_osc_granted = false;
 
 	root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
 	if (!root)
@@ -461,6 +460,11 @@ static int acpi_pci_root_add(struct acpi
 			flags = base_flags;
 		}
 	}
+
+	/* ASPM setting */
+	if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
+		pcie_no_aspm();
+
 	if (!pcie_ports_disabled
 	    && (flags & ACPI_PCIE_REQ_SUPPORT) == ACPI_PCIE_REQ_SUPPORT) {
 		flags = OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL
@@ -480,16 +484,16 @@ static int acpi_pci_root_add(struct acpi
 
 		status = acpi_pci_osc_control_set(device->handle, &flags,
 				       OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL);
-		if (ACPI_SUCCESS(status)) {
-			is_osc_granted = true;
+		if (ACPI_SUCCESS(status))
 			dev_info(&device->dev,
 				"ACPI _OSC control (0x%02x) granted\n", flags);
-		} else {
-			is_osc_granted = false;
+		else {
 			dev_info(&device->dev,
 				"ACPI _OSC request failed (%s), "
 				"returned control mask: 0x%02x\n",
 				acpi_format_exception(status), flags);
+			pr_info("ACPI _OSC control for PCIe not granted, disabling ASPM\n");
+			pcie_no_aspm();
 		}
 	} else {
 		dev_info(&device->dev,
@@ -521,16 +525,6 @@ static int acpi_pci_root_add(struct acpi
 		goto out_del_root;
 	}
 
-	/* ASPM setting */
-	if (is_osc_granted) {
-		if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
-			pcie_clear_aspm(root->bus);
-	} else {
-		pr_info("ACPI _OSC control for PCIe not granted, "
-			"disabling ASPM\n");
-		pcie_no_aspm();
-	}
-
 	pci_acpi_add_bus_pm_notifier(device, root->bus);
 	if (device->wakeup.flags.run_wake)
 		device_set_run_wake(root->bus->bridge, true);
Index: linux-2.6/drivers/pci/pcie/aspm.c
===================================================================
--- linux-2.6.orig/drivers/pci/pcie/aspm.c
+++ linux-2.6/drivers/pci/pcie/aspm.c
@@ -69,7 +69,6 @@ struct pcie_link_state {
 };
 
 static int aspm_disabled, aspm_force;
-static bool aspm_support_enabled = true;
 static DEFINE_MUTEX(aspm_lock);
 static LIST_HEAD(link_list);
 
@@ -556,9 +555,6 @@ void pcie_aspm_init_link_state(struct pc
 	struct pcie_link_state *link;
 	int blacklist = !!pcie_aspm_sanity_check(pdev);
 
-	if (!aspm_support_enabled)
-		return;
-
 	if (!pci_is_pcie(pdev) || pdev->link_state)
 		return;
 	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT &&
@@ -718,15 +714,11 @@ void pcie_aspm_powersave_config_link(str
  * pci_disable_link_state - disable pci device's link state, so the link will
  * never enter specific states
  */
-static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem,
-				     bool force)
+static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
 {
 	struct pci_dev *parent = pdev->bus->self;
 	struct pcie_link_state *link;
 
-	if (aspm_disabled && !force)
-		return;
-
 	if (!pci_is_pcie(pdev))
 		return;
 
@@ -757,34 +749,16 @@ static void __pci_disable_link_state(str
 
 void pci_disable_link_state_locked(struct pci_dev *pdev, int state)
 {
-	__pci_disable_link_state(pdev, state, false, false);
+	__pci_disable_link_state(pdev, state, false);
 }
 EXPORT_SYMBOL(pci_disable_link_state_locked);
 
 void pci_disable_link_state(struct pci_dev *pdev, int state)
 {
-	__pci_disable_link_state(pdev, state, true, false);
+	__pci_disable_link_state(pdev, state, true);
 }
 EXPORT_SYMBOL(pci_disable_link_state);
 
-void pcie_clear_aspm(struct pci_bus *bus)
-{
-	struct pci_dev *child;
-
-	if (aspm_force)
-		return;
-
-	/*
-	 * Clear any ASPM setup that the firmware has carried out on this bus
-	 */
-	list_for_each_entry(child, &bus->devices, bus_list) {
-		__pci_disable_link_state(child, PCIE_LINK_STATE_L0S |
-					 PCIE_LINK_STATE_L1 |
-					 PCIE_LINK_STATE_CLKPM,
-					 false, true);
-	}
-}
-
 static int pcie_aspm_set_policy(const char *val, struct kernel_param *kp)
 {
 	int i;
@@ -944,7 +918,6 @@ static int __init pcie_aspm_disable(char
 	if (!strcmp(str, "off")) {
 		aspm_policy = POLICY_DEFAULT;
 		aspm_disabled = 1;
-		aspm_support_enabled = false;
 		printk(KERN_INFO "PCIe ASPM is disabled\n");
 	} else if (!strcmp(str, "force")) {
 		aspm_force = 1;
@@ -980,9 +953,3 @@ int pcie_aspm_enabled(void)
        return !aspm_disabled;
 }
 EXPORT_SYMBOL(pcie_aspm_enabled);
-
-bool pcie_aspm_support_enabled(void)
-{
-	return aspm_support_enabled;
-}
-EXPORT_SYMBOL(pcie_aspm_support_enabled);
Index: linux-2.6/include/linux/pci-aspm.h
===================================================================
--- linux-2.6.orig/include/linux/pci-aspm.h
+++ linux-2.6/include/linux/pci-aspm.h
@@ -29,7 +29,6 @@ void pcie_aspm_pm_state_change(struct pc
 void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
 void pci_disable_link_state(struct pci_dev *pdev, int state);
 void pci_disable_link_state_locked(struct pci_dev *pdev, int state);
-void pcie_clear_aspm(struct pci_bus *bus);
 void pcie_no_aspm(void);
 #else
 static inline void pcie_aspm_init_link_state(struct pci_dev *pdev)
@@ -47,9 +46,6 @@ static inline void pcie_aspm_powersave_c
 static inline void pci_disable_link_state(struct pci_dev *pdev, int state)
 {
 }
-static inline void pcie_clear_aspm(struct pci_bus *bus)
-{
-}
 static inline void pcie_no_aspm(void)
 {
 }
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -1186,7 +1186,7 @@ static inline int pcie_aspm_enabled(void
 static inline bool pcie_aspm_support_enabled(void) { return false; }
 #else
 int pcie_aspm_enabled(void);
-bool pcie_aspm_support_enabled(void);
+static inline bool pcie_aspm_support_enabled(void) { return true; }
 #endif
 
 #ifdef CONFIG_PCIEAER

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-12 19:41                                     ` Yinghai Lu
@ 2013-06-13  3:50                                       ` Bjorn Helgaas
  2013-06-13  4:11                                           ` Jiang Liu (Gerry)
  2013-06-13  5:47                                         ` Yinghai Lu
  2013-06-14 14:11                                       ` Bjorn Helgaas
  2014-06-14 21:21                                       ` [PATCH RFC 0/4] PCI: pciehp: Fix Command Completion handling Bjorn Helgaas
  2 siblings, 2 replies; 71+ messages in thread
From: Bjorn Helgaas @ 2013-06-13  3:50 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jiang Liu, Roman Yepishev, Rafael J. Wysocki, linux-pci,
	linux-acpi, linux-kernel, Linus Torvalds, Andrew Morton,
	Greg Kroah-Hartman

On Wed, Jun 12, 2013 at 1:41 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, Jun 12, 2013 at 10:05 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> current code from acpi_pci_root_add we have
>>>   1. pci_acpi_scan_root
>>>          ==> pci devices enumeration and bus scanning.
>>>              ==> pci_alloc_child_bus
>>>                  ==> pcibios_add_bus
>>>                      ==> acpi_pci_add_bus
>>>                          ==> acpiphp_enumerate_slots
>>>                              ==> ...==> register_slot
>>>                                   ==> device_is_managed_by_native_pciehp
>>>                                         ==> check osc_set with
>>> OSC_PCI_EXPRESS_NATIVE_HP_CONTROL
>>>    2. _OSC set request
>>>
>>> so we always have acpiphp hotplug slot registered at first.
>>>
>>> so either we need to
>>> A. revert reverting about _OSC
>>> B. move pcibios_add_bus down to pci_bus_add_devices()
>>>     as acpiphp and apci pci slot driver are some kind of drivers for pci_bus
>>> C. A+B
>>
>> It doesn't surprise me at all that there are problems in the _OSC code
>> and the acpiphp/pciehp interaction.  That whole area is a complete
>> disaster.  It'd really be nice if somebody stepped up and reworked it
>> so it makes sense.
>>
>> But this report is useless to me.  I don't have time to work out what
>> the problem is and how it affects users and come up with a fix.
>
> effects: without fix the problem, user can not use pcie native hotplug
> if their system's firmware support acpihp and pciehp.
> And make it worse, that acpiphp have to be built-in, so they have no
> way to blacklist acpiphp in config.
>
>>
>> My advice is to simplify the path first, and worry about fixing the
>> bug afterwards.  We've already done several iterations of fiddling
>> with things, and I think all we're doing is playing "whack-a-mole" and
>> pushing the bugs around from one place to another.
>
> We need to address regression at first.
> my suggestion is : revert the reverting and apply my -v3 version that will fix
> regression that Roman Yepishev met.
>
> please check attached two patches, hope it could save your some time.

OK, you're right.  It's not reasonable to do anything more than a
minimal fix when we're at -rc5.

Sigh.  I'll spend tomorrow trying to understand your patches and write
changelogs for you.

I think you're saying that in systems that support both acpiphp and
pciehp, we should be using pciehp, but we currently use acpiphp.  If
so, that's certainly a bug.  How serious is it?  Is it a disaster if
we use acpiphp until we can resolve this cleanly?  Are there a lot of
systems that claim to support acpiphp but it doesn't actually work?

Bjorn

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-13  3:50                                       ` Bjorn Helgaas
@ 2013-06-13  4:11                                           ` Jiang Liu (Gerry)
  2013-06-13  5:47                                         ` Yinghai Lu
  1 sibling, 0 replies; 71+ messages in thread
From: Jiang Liu (Gerry) @ 2013-06-13  4:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yinghai Lu, Roman Yepishev, Rafael J. Wysocki, linux-pci,
	linux-acpi, linux-kernel, Linus Torvalds, Andrew Morton,
	Greg Kroah-Hartman

Hi Bjorn,
     I'm working on several acpiphp-related bugfixes, and I feel some
are material for 3.10 too. Actually we have identified four bugs
related to dock station support on a Sony VAIO VPCZ23A4R laptop.
I will try to send out a patchset to address these bugs tonight.
It seems we really need to rethink acpiphp and pciehp now.
Regards!
Gerry
On 2013/6/13 11:50, Bjorn Helgaas wrote:
> On Wed, Jun 12, 2013 at 1:41 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Wed, Jun 12, 2013 at 10:05 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>>> current code from acpi_pci_root_add we have
>>>>    1. pci_acpi_scan_root
>>>>           ==> pci devices enumeration and bus scanning.
>>>>               ==> pci_alloc_child_bus
>>>>                   ==> pcibios_add_bus
>>>>                       ==> acpi_pci_add_bus
>>>>                           ==> acpiphp_enumerate_slots
>>>>                               ==> ...==> register_slot
>>>>                                    ==> device_is_managed_by_native_pciehp
>>>>                                          ==> check osc_set with
>>>> OSC_PCI_EXPRESS_NATIVE_HP_CONTROL
>>>>     2. _OSC set request
>>>>
>>>> so we always have acpiphp hotplug slot registered at first.
>>>>
>>>> so either we need to
>>>> A. revert reverting about _OSC
>>>> B. move pcibios_add_bus down to pci_bus_add_devices()
>>>>      as acpiphp and apci pci slot driver are some kind of drivers for pci_bus
>>>> C. A+B
>>>
>>> It doesn't surprise me at all that there are problems in the _OSC code
>>> and the acpiphp/pciehp interaction.  That whole area is a complete
>>> disaster.  It'd really be nice if somebody stepped up and reworked it
>>> so it makes sense.
>>>
>>> But this report is useless to me.  I don't have time to work out what
>>> the problem is and how it affects users and come up with a fix.
>>
>> effects: without fix the problem, user can not use pcie native hotplug
>> if their system's firmware support acpihp and pciehp.
>> And make it worse, that acpiphp have to be built-in, so they have no
>> way to blacklist acpiphp in config.
>>
>>>
>>> My advice is to simplify the path first, and worry about fixing the
>>> bug afterwards.  We've already done several iterations of fiddling
>>> with things, and I think all we're doing is playing "whack-a-mole" and
>>> pushing the bugs around from one place to another.
>>
>> We need to address regression at first.
>> my suggestion is : revert the reverting and apply my -v3 version that will fix
>> regression that Roman Yepishev met.
>>
>> please check attached two patches, hope it could save your some time.
>
> OK, you're right.  It's not reasonable to do anything more than a
> minimal fix when we're at -rc5.
>
> Sigh.  I'll spend tomorrow trying to understand your patches and write
> changelogs for you.
>
> I think you're saying that in systems that support both acpiphp and
> pciehp, we should be using pciehp, but we currently use acpiphp.  If
> so, that's certainly a bug.  How serious is it?  Is it a disaster if
> we use acpiphp until we can resolve this cleanly?  Are there a lot of
> systems that claim to support acpiphp but it doesn't actually work?
>
> Bjorn
>
> .
>



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-13  3:50                                       ` Bjorn Helgaas
  2013-06-13  4:11                                           ` Jiang Liu (Gerry)
@ 2013-06-13  5:47                                         ` Yinghai Lu
  2013-06-13 12:04                                           ` Rafael J. Wysocki
  1 sibling, 1 reply; 71+ messages in thread
From: Yinghai Lu @ 2013-06-13  5:47 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jiang Liu, Roman Yepishev, Rafael J. Wysocki, linux-pci,
	linux-acpi, linux-kernel, Linus Torvalds, Andrew Morton,
	Greg Kroah-Hartman

On Wed, Jun 12, 2013 at 8:50 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Wed, Jun 12, 2013 at 1:41 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>
> I think you're saying that in systems that support both acpiphp and
> pciehp, we should be using pciehp, but we currently use acpiphp.  If
> so, that's certainly a bug.  How serious is it?  Is it a disaster if
> we use acpiphp until we can resolve this cleanly?  Are there a lot of
> systems that claim to support acpiphp but it doesn't actually work?

Not sure. Making acpiphp work needs more expertise in the BIOS.
Normally the BIOS vendor leaves the work half done there, and the
OEM or system vendor needs someone to make it work ....
You would not want to read the ASL code in the DSDT to help them out.
That is not something that we can control.

Yinghai

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-13  5:47                                         ` Yinghai Lu
@ 2013-06-13 12:04                                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 71+ messages in thread
From: Rafael J. Wysocki @ 2013-06-13 12:04 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Jiang Liu, Roman Yepishev, linux-pci, linux-acpi,
	linux-kernel, Linus Torvalds, Andrew Morton, Greg Kroah-Hartman

On Wednesday, June 12, 2013 10:47:08 PM Yinghai Lu wrote:
> On Wed, Jun 12, 2013 at 8:50 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > On Wed, Jun 12, 2013 at 1:41 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> >
> > I think you're saying that in systems that support both acpiphp and
> > pciehp, we should be using pciehp, but we currently use acpiphp.  If
> > so, that's certainly a bug.  How serious is it?  Is it a disaster if
> > we use acpiphp until we can resolve this cleanly?  Are there a lot of
> > systems that claim to support acpiphp but it doesn't actually work?
> 
> No sure. To make acpiphp would need more expertise in bios.
> Normally BIOS vendor would have half done work there, and will need
> OEM or system vendor have someone to make it work ....
> You would not want to read asl code in DSDT to help them out.
> That is not something that we can control.

However, pciehp may simply not work by itself on those systems.

It's pretty much like saying "Oh, _CRS may be screwed up, so let's just ignore
it", which isn't overly smart.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-13  4:11                                           ` Jiang Liu (Gerry)
  (?)
@ 2013-06-13 13:57                                           ` Bjorn Helgaas
  -1 siblings, 0 replies; 71+ messages in thread
From: Bjorn Helgaas @ 2013-06-13 13:57 UTC (permalink / raw)
  To: Jiang Liu (Gerry)
  Cc: Yinghai Lu, Roman Yepishev, Rafael J. Wysocki, linux-pci,
	linux-acpi, linux-kernel, Linus Torvalds, Andrew Morton,
	Greg Kroah-Hartman

On Wed, Jun 12, 2013 at 10:11 PM, Jiang Liu (Gerry)
<jiang.liu@huawei.com> wrote:
> Hi Bjorn,
>     I'm working on several acpiphp related bugfixes, and feel some
> are materials for 3.10 too. Actually we have identified four bugs
> related to dock station support on Sony VAIO VPCZ23A4R laptop.
> I will try to send out patchset to address these bugs tonight.
> Seems we really need to rethink about acpiphp and pciehp now.

We certainly need more rework in acpiphp and pciehp.  But unless it's
an obvious fix for a serious regression, I'm doubtful about putting it
in 3.10.  rc6 is imminent and it's not the time to be putting
significant changes in.

I don't know the details of the dock issues, but you might not need to
be in a huge rush about them.

Bjorn

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-12 19:41                                     ` Yinghai Lu
  2013-06-13  3:50                                       ` Bjorn Helgaas
@ 2013-06-14 14:11                                       ` Bjorn Helgaas
  2013-06-14 16:17                                         ` Yinghai Lu
  2014-06-14 21:21                                       ` [PATCH RFC 0/4] PCI: pciehp: Fix Command Completion handling Bjorn Helgaas
  2 siblings, 1 reply; 71+ messages in thread
From: Bjorn Helgaas @ 2013-06-14 14:11 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jiang Liu, Roman Yepishev, Rafael J. Wysocki, linux-pci,
	linux-acpi, linux-kernel, Linus Torvalds, Andrew Morton,
	Greg Kroah-Hartman

On Wed, Jun 12, 2013 at 12:41:42PM -0700, Yinghai Lu wrote:
> On Wed, Jun 12, 2013 at 10:05 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >> current code from acpi_pci_root_add we have
> >>   1. pci_acpi_scan_root
> >>          ==> pci devices enumeration and bus scanning.
> >>              ==> pci_alloc_child_bus
> >>                  ==> pcibios_add_bus
> >>                      ==> acpi_pci_add_bus
> >>                          ==> acpiphp_enumerate_slots
> >>                              ==> ...==> register_slot
> >>                                   ==> device_is_managed_by_native_pciehp
> >>                                         ==> check osc_set with
> >> OSC_PCI_EXPRESS_NATIVE_HP_CONTROL
> >>    2. _OSC set request
> >>
> >> so we always have acpiphp hotplug slot registered at first.
> >>
> >> so either we need to
> >> A. revert reverting about _OSC
> >> B. move pcibios_add_bus down to pci_bus_add_devices()
> >>     as acpiphp and apci pci slot driver are some kind of drivers for pci_bus
> >> C. A+B
> >
> > It doesn't surprise me at all that there are problems in the _OSC code
> > and the acpiphp/pciehp interaction.  That whole area is a complete
> > disaster.  It'd really be nice if somebody stepped up and reworked it
> > so it makes sense.
> >
> > But this report is useless to me.  I don't have time to work out what
> > the problem is and how it affects users and come up with a fix.
> 
> effects: without fix the problem, user can not use pcie native hotplug
> if their system's firmware support acpihp and pciehp.
> And make it worse, that acpiphp have to be built-in, so they have no
> way to blacklist acpiphp in config.
> 
> >
> > My advice is to simplify the path first, and worry about fixing the
> > bug afterwards.  We've already done several iterations of fiddling
> > with things, and I think all we're doing is playing "whack-a-mole" and
> > pushing the bugs around from one place to another.
> 
> We need to address regression at first.
> my suggestion is : revert the reverting and apply my -v3 version that will fix
> regression that Roman Yepishev met.
> 
> please check attached two patches, hope it could save your some time.

Here are some of my notes from trying to sort this out, in chronological
order:

    29594404 v3.7
      Bus scanned before requesting _OSC control
      pre-1.1 ath5k has ASPM disabled (works fine)

    8c33f51d "request _OSC control before scanning bus"

    19f949f5 v3.8
      _OSC control requested before scanning bus
      Now pre-1.1 ath5k has ASPM enabled and doesn't work
      https://bugzilla.kernel.org/show_bug.cgi?id=55211 opened

    b8178f13 "revert 'request _OSC control before scanning bus' (8c33f51d)"
      Bus now scanned before requesting _OSC control (as in v3.7)

    c1be5a5b v3.9
      pciehp claims slots first, even when both pciehp & acpiphp are
      built-in, because pciehp module_init precedes acpiphp module_init
      in link order

    6037a803 "Convert acpiphp to be builtin only"
      This also adds "acpiphp.disable" boot option

    3b63aaa7 "Do not use ACPI PCI subdriver mechanism"
      Now acpiphp claims slots first because we call
      acpiphp_enumerate_slots() from pcibios_add_bus() during PCI device
      enumeration.  This happens before pciehp, which still uses
      module_init.

    f722406f v3.10-rc1

    ........ "Revert reverting of 'request _OSC control before scanning bus' (b8178f13)"
      _OSC control requested before scanning bus (as in v3.8)
      pre-1.1 ath5k probably has ASPM enabled and doesn't work

    ........ "Remove not needed check in disable aspm link"
      Now pci_disable_link_state() unconditionally disables ASPM,
      even when BIOS hasn't given us ASPM control


1) The problem you're trying to fix is that when both acpiphp and
pciehp are supported for the same slot, acpiphp claims the slot first
and pciehp will not claim it.  I think this problem was introduced by
3b63aaa7, which was merged after v3.9.  Therefore, v3.9 should work
correctly, and this regression appeared in v3.10-rc1.

2) As you say, acpiphp cannot be a module, so the user would have to
rebuild the kernel to remove it.  However, 6037a803 *did* add a
"acpiphp.disable" boot option, so that should be a workaround that
allows pciehp to claim the slot.
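
Points 1 and 2 can be sketched as a small model in plain C. This is not
kernel code: model_slot, model_claim() and model_boot() are hypothetical
names, and the only behaviour modelled is "the first driver to register
keeps the slot", with acpiphp registering during enumeration and pciehp
registering later unless acpiphp is disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical owner of a hotplug slot in this model. */
enum model_owner { OWNER_NONE, OWNER_ACPIPHP, OWNER_PCIEHP };

struct model_slot {
	enum model_owner owner;
};

/* The first driver to claim a slot keeps it; later claims fail. */
static bool model_claim(struct model_slot *slot, enum model_owner driver)
{
	assert(slot != NULL);
	if (slot->owner != OWNER_NONE)
		return false;
	slot->owner = driver;
	return true;
}

/*
 * Models the ordering described for v3.10-rc1: acpiphp registers its
 * slots during PCI enumeration, pciehp registers later from its init
 * call. The acpiphp_disabled flag stands in for "acpiphp.disable".
 */
static enum model_owner model_boot(bool acpiphp_disabled)
{
	struct model_slot slot = { .owner = OWNER_NONE };

	if (!acpiphp_disabled)
		model_claim(&slot, OWNER_ACPIPHP);	/* during enumeration */
	model_claim(&slot, OWNER_PCIEHP);		/* later, at driver init */
	return slot.owner;
}
```

In this model model_boot(false) ends with acpiphp owning the slot and
model_boot(true) ends with pciehp owning it. It deliberately uses one
slot and one global switch, so it says nothing about systems that need
different drivers for different slots.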

3) I think your "revert reverting" patch gets us back to the same
situation we had after 8c33f51d, i.e., Roman's pre-1.1 ath5k device
will have ASPM enabled and won't work.  I don't want to leave the tree
in this broken state, even though you intend to fix it in the next
patch.  If you can reorder your patches so the ASPM fix is first, that
would be better.

4) Your "Remove not needed check in disable aspm link" patch makes
pci_disable_link_state() disable ASPM even when the OS doesn't have
permission to control ASPM.  I think this is a mistake.  I proposed a
similar change in [1], but Rafael and Matthew thought it was too
risky, and I agree.

Bjorn

[1] https://lkml.kernel.org/r/20130510225257.GA10847@google.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-14 14:11                                       ` Bjorn Helgaas
@ 2013-06-14 16:17                                         ` Yinghai Lu
  2013-06-14 16:33                                           ` Bjorn Helgaas
  0 siblings, 1 reply; 71+ messages in thread
From: Yinghai Lu @ 2013-06-14 16:17 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jiang Liu, Roman Yepishev, Rafael J. Wysocki, linux-pci,
	linux-acpi, linux-kernel, Linus Torvalds, Andrew Morton,
	Greg Kroah-Hartman

On Fri, Jun 14, 2013 at 7:11 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Here are some of my notes from trying to sort this out, in chronological
> order:
>
>     29594404 v3.7
>       Bus scanned before requesting _OSC control
>       pre-1.1 ath5k has ASPM disabled (works fine)
>
>     8c33f51d "request _OSC control before scanning bus"
>
>     19f949f5 v3.8
>       _OSC control requested before scanning bus
>       Now pre-1.1 ath5k has ASPM enabled and doesn't work
>       https://bugzilla.kernel.org/show_bug.cgi?id=55211 opened
>
>     b8178f13 "revert 'request _OSC control before scanning bus' (8c33f51d)"
>       Bus now scanned before requesting _OSC control (as in v3.7)
>
>     c1be5a5b v3.9
>       pciehp claims slots first, even when both pciehp & acpiphp are
>       built-in, because pciehp module_init precedes acpiphp module_init
>       in link order
>
>     6037a803 "Convert acpiphp to be builtin only"
>       This also adds "acpiphp.disable" boot option
>
>     3b63aaa7 "Do not use ACPI PCI subdriver mechanism"
>       Now acpiphp claims slots first because we call
>       acpiphp_enumerate_slots() from pcibios_add_bus() during PCI device
>       enumeration.  This happens before pciehp, which still uses
>       module_init.
>
>     f722406f v3.10-rc1
>
>     ........ "Revert reverting of 'request _OSC control before scanning bus' (b8178f13)"
>       _OSC control requested before scanning bus (as in v3.8)
>       pre-1.1 ath5k probably has ASPM enabled and doesn't work
>
>     ........ "Remove not needed check in disable aspm link"
>       Now pci_disable_link_state() unconditionally disables ASPM,
>       even when BIOS hasn't given us ASPM control
>
>
> 1) The problem you're trying to fix is that when both acpiphp and
> pciehp are supported for the same slot, acpiphp claims the slot first
> and pciehp will not claim it.  I think this problem was introduced by
> 3b63aaa7, which was merged after v3.9.  Therefore, v3.9 should work
> correctly, and this regression appeared in v3.10-rc1.
>
> 2) As you say, acpiphp cannot be a module, so the user would have to
> rebuild the kernel to remove it.  However, 6037a803 *did* add a
> "acpiphp.disable" boot option, so that should be a workaround that
> allows pciehp to claim the slot.

What about a system where some slots need to be handled by acpiphp
and others need to be handled by pciehp?

For example: a laptop with a dock that needs acpiphp and also a
PCI Express card that needs pciehp.

>
> 3) I think your "revert reverting" patch gets us back to the same
> situation we had after 8c33f51d, i.e., Roman's pre-1.1 ath5k device
> will have ASPM enabled and won't work.  I don't want to leave the tree
> in this broken state, even though you intend to fix it in the next
> patch.  If you can reorder your patches so the ASPM fix is first, that
> would be better.

yes.

We could apply your patch in [1] first, then revert the revert,
and not touch pcie_clear_aspm for now.

>
> 4) Your "Remove not needed check in disable aspm link" patch makes
> pci_disable_link_state() disable ASPM even when the OS doesn't have
> permission to control ASPM.  I think this is a mistake.  I proposed a
> similar change in [1], but Rafael and Matthew thought it was too
> risky, and I agree.

Before all those changes, and in the current state, the quirk disables
ASPM before _OSC support and control are set. That is, pci_acpi_scan_root
allocates all the link state structs and the quirk calls
pci_disable_link_state; later, if _OSC support or control cannot be set,
pcie_no_aspm is called and that blocks all further ASPM operations.

Is that risky too? Why can the boot path quirk do that, but the driver
and hot-add quirk paths cannot?

Or could we have another pci_disable_link_state that always works, for
the quirk path only?
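
The ordering difference can be sketched as a small model in plain C.
This is not kernel code: model_dev and the model_*() helpers are
hypothetical names, and the force argument only mirrors the "force"
parameter of the old __pci_disable_link_state() shown in the patch. It
assumes firmware left ASPM enabled and that _OSC control is not granted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device whose ASPM was enabled by firmware. */
struct model_dev {
	bool aspm_on;
};

/* Models the global flag that pcie_no_aspm() sets. */
static bool aspm_disabled;

static void model_no_aspm(void)
{
	aspm_disabled = true;
}

/* Models the disable call with the aspm_disabled check still present. */
static void model_disable_link_state(struct model_dev *dev, bool force)
{
	assert(dev != NULL);
	if (aspm_disabled && !force)
		return;		/* request is silently ignored */
	dev->aspm_on = false;
}

/* Boot path: the quirk runs during the bus scan, before _OSC fails. */
static bool boot_path_aspm_on(void)
{
	struct model_dev dev = { .aspm_on = true };

	aspm_disabled = false;
	model_disable_link_state(&dev, false);	/* quirk during scan */
	model_no_aspm();			/* _OSC not granted */
	return dev.aspm_on;
}

/* Driver or hot-add path: the call arrives after pcie_no_aspm(). */
static bool late_path_aspm_on(bool force)
{
	struct model_dev dev = { .aspm_on = true };

	aspm_disabled = false;
	model_no_aspm();			/* _OSC not granted */
	model_disable_link_state(&dev, force);
	return dev.aspm_on;
}
```

In this model the boot path ends with ASPM off, the late path without
force leaves ASPM on, and the late path with force turns it off. The
force case only illustrates the "always works" variant asked about
above; whether that is safe without ASPM control is the open question.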

Yinghai

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-14 16:17                                         ` Yinghai Lu
@ 2013-06-14 16:33                                           ` Bjorn Helgaas
  2013-06-14 16:57                                             ` Yinghai Lu
  0 siblings, 1 reply; 71+ messages in thread
From: Bjorn Helgaas @ 2013-06-14 16:33 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jiang Liu, Roman Yepishev, Rafael J. Wysocki, linux-pci,
	linux-acpi, linux-kernel, Linus Torvalds, Andrew Morton,
	Greg Kroah-Hartman

On Fri, Jun 14, 2013 at 10:17 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Jun 14, 2013 at 7:11 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> Here are some of my notes from trying to sort this out, in chronological
>> order:
>>
>>     29594404 v3.7
>>       Bus scanned before requesting _OSC control
>>       pre-1.1 ath5k has ASPM disabled (works fine)
>>
>>     8c33f51d "request _OSC control before scanning bus"
>>
>>     19f949f5 v3.8
>>       _OSC control requested before scanning bus
>>       Now pre-1.1 ath5k has ASPM enabled and doesn't work
>>       https://bugzilla.kernel.org/show_bug.cgi?id=55211 opened
>>
>>     b8178f13 "revert 'request _OSC control before scanning bus' (8c33f51d)"
>>       Bus now scanned before requesting _OSC control (as in v3.7)
>>
>>     c1be5a5b v3.9
>>       pciehp claims slots first, even when both pciehp & acpiphp are
>>       built-in, because pciehp module_init precedes acpiphp module_init
>>       in link order
>>
>>     6037a803 "Convert acpiphp to be builtin only"
>>       This also adds "acpiphp.disable" boot option
>>
>>     3b63aaa7 "Do not use ACPI PCI subdriver mechanism"
>>       Now acpiphp claims slots first because we call
>>       acpiphp_enumerate_slots() from pcibios_add_bus() during PCI device
>>       enumeration.  This happens before pciehp, which still uses
>>       module_init.
>>
>>     f722406f v3.10-rc1
>>
>>     ........ "Revert reverting of 'request _OSC control before scanning bus' (b8178f13)"
>>       _OSC control requested before scanning bus (as in v3.8)
>>       pre-1.1 ath5k probably has ASPM enabled and doesn't work
>>
>>     ........ "Remove not needed check in disable aspm link"
>>       Now pci_disable_link_state() unconditionally disables ASPM,
>>       even when BIOS hasn't given us ASPM control
>>
>>
>> 1) The problem you're trying to fix is that when both acpiphp and
>> pciehp are supported for the same slot, acpiphp claims the slot first
>> and pciehp will not claim it.  I think this problem was introduced by
>> 3b63aaa7, which was merged after v3.9.  Therefore, v3.9 should work
>> correctly, and this regression appeared in v3.10-rc1.
>>
>> 2) As you say, acpiphp cannot be a module, so the user would have to
>> rebuild the kernel to remove it.  However, 6037a803 *did* add a
>> "acpiphp.disable" boot option, so that should be a workaround that
>> allows pciehp to claim the slot.
>
> How about the same system that some slots need to be handled by acpiphp
> and some others need to be handled by pciehp ?
>
> for example: laptop that have dock that will need acpiphp, and also have
> pci express card that need pciehp.
>
>>
>> 3) I think your "revert reverting" patch gets us back to the same
>> situation we had after 8c33f51d, i.e., Roman's pre-1.1 ath5k device
>> will have ASPM enabled and won't work.  I don't want to leave the tree
>> in this broken state, even though you intend to fix it in the next
>> patch.  If you can reorder your patches so the ASPM fix is first, that
>> would be better.
>
> yes.
>
> We could apply your patch in [1] at first, and revert the reverting.
> and do not touch pcie_clear_aspm now.
>
>>
>> 4) Your "Remove not needed check in disable aspm link" patch makes
>> pci_disable_link_state() disable ASPM even when the OS doesn't have
>> permission to control ASPM.  I think this is a mistake.  I proposed a
>> similar change in [1], but Rafael and Matthew thought it was too
>> risky, and I agree.
>
> before all those changes, and in current state:
> quirk disable aspm is before _osc support and control are set.

Can you please refer to specific function names?  I can't read your mind.

You might be referring to quirk_disable_aspm_l0s().  This is a
pci_fixup_final quirk that calls pci_disable_link_state().  In the
current tree, we enumerate devices before requesting _OSC control.
However, pci_fixup_final quirks are not run until the
pci_apply_final_quirks() fs_initcall, which is after we request _OSC
control.

As far as I can tell, we never call pci_disable_link_state() before
calling pcie_no_aspm().

> aka in pci_acpi_scan_root will allocate all link state struct, and quirk
> call pci_disable_link_state, and later will _osc support or control can
> not be set, pcie_no_aspm is called, can will block all aspm operation.
>
> That is risky too?, why booting path quirk could do that, but driver
> and hot-add quirk path can not do that ?
>
> or we can have another pci_disable_link_state always work on quirk path only?
>
> Yinghai

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-14 16:33                                           ` Bjorn Helgaas
@ 2013-06-14 16:57                                             ` Yinghai Lu
  2013-06-14 17:44                                               ` Bjorn Helgaas
  0 siblings, 1 reply; 71+ messages in thread
From: Yinghai Lu @ 2013-06-14 16:57 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jiang Liu, Roman Yepishev, Rafael J. Wysocki, linux-pci,
	linux-acpi, linux-kernel, Linus Torvalds, Andrew Morton,
	Greg Kroah-Hartman

On Fri, Jun 14, 2013 at 9:33 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Jun 14, 2013 at 10:17 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>
> Can you please refer to specific function names?  I can't read your mind.
>
> You might be referring to quirk_disable_aspm_l0s().  This is a
> pci_fixup_final quirk that calls pci_disable_link_state().  In the
> current tree, we enumerate devices before requesting _OSC control.
> However, pci_fixup_final quirks are not run until the
> pci_apply_final_quirks() fs_initcall, which is after we request _OSC
> control.
>
> As far as I can tell, we never call pci_disable_link_state() before
> calling pcie_no_aspm().

OK, you are right, it is not pci_disable_link_state.

It is pcie_aspm_init_link_state ==> pcie_aspm_sanity_check in the boot path
that disables ASPM.  It has "if (aspm_disabled)" in it, and that causes
the difference.

Yinghai

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-14 16:57                                             ` Yinghai Lu
@ 2013-06-14 17:44                                               ` Bjorn Helgaas
  2013-06-14 18:26                                                 ` Yinghai Lu
  0 siblings, 1 reply; 71+ messages in thread
From: Bjorn Helgaas @ 2013-06-14 17:44 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jiang Liu, Roman Yepishev, Rafael J. Wysocki, linux-pci,
	linux-acpi, linux-kernel, Linus Torvalds, Andrew Morton,
	Greg Kroah-Hartman

On Fri, Jun 14, 2013 at 10:57 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Jun 14, 2013 at 9:33 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Fri, Jun 14, 2013 at 10:17 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>>
>> Can you please refer to specific function names?  I can't read your mind.
>>
>> You might be referring to quirk_disable_aspm_l0s().  This is a
>> pci_fixup_final quirk that calls pci_disable_link_state().  In the
>> current tree, we enumerate devices before requesting _OSC control.
>> However, pci_fixup_final quirks are not run until the
>> pci_apply_final_quirks() fs_initcall, which is after we request _OSC
>> control.
>>
>> As far as I can tell, we never call pci_disable_link_state() before
>> calling pcie_no_aspm().
>
> ok, you are right, that is not pci_disable_link_state.
>
> It is pcie_aspm_init_link_state ==> pcie_aspm_sanity_check in booting path
> that disable aspm.  It has  "if (aspm_disabled)" in it, and it cause
> the difference.

Yes, I agree, the pcie_aspm_init_link_state() path uses aspm_disabled
before we set it:

    acpi_pci_root_add
      pci_acpi_scan_root
        pci_scan_child_bus
          pci_scan_slot
            pcie_aspm_init_link_state
              pcie_aspm_sanity_check
                if (aspm_disabled)              # used before set
                  ...
      acpi_pci_osc_control_set
      pcie_no_aspm
        aspm_disabled = 1                       # set

That might mean we do some ASPM configuration during enumeration (in
pci_scan_slot()) even though the BIOS hasn't given us permission.  It
looks like we did that even in v3.7, since we did the enumeration
before the _OSC there as well.  That looks like a bug to me.

I don't think the fact that we have been doing ASPM config during
enumeration before _OSC is an argument for dropping the check in
pci_disable_link_state().
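
The ordering problem in the call tree above can be reproduced with a toy
model.  This is a minimal sketch, not kernel code: every toy_* name is
hypothetical, the flag only stands in for aspm_disabled, and the "disable
ASPM on a pre-1.1 device" step is a simplification of what the enumeration
path does.  It assumes a machine where the _OSC request fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the global flag; not the real kernel variable. */
static bool aspm_disabled;

struct toy_link {
	bool pre_1_1;        /* device predates PCIe 1.1 */
	bool aspm_enabled;   /* state left behind by firmware */
	bool touched_by_os;  /* did the OS rewrite the ASPM config? */
};

/* Enumeration-time step: back off only if the flag is already set. */
static void toy_init_link_state(struct toy_link *link)
{
	assert(link != NULL);
	if (aspm_disabled)
		return;                 /* leave firmware state alone */
	if (link->pre_1_1) {
		link->aspm_enabled = false;
		link->touched_by_os = true;
	}
}

/* Models a failed _OSC request: ASPM is off-limits from here on. */
static void toy_no_aspm(void)
{
	aspm_disabled = true;
}

/* v3.7-style order: scan the bus first, request _OSC afterwards. */
static void toy_scan_then_osc(struct toy_link *link)
{
	aspm_disabled = false;
	toy_init_link_state(link);      /* flag used before it is set */
	toy_no_aspm();
}

/* v3.8-style order: request _OSC first, scan the bus afterwards. */
static void toy_osc_then_scan(struct toy_link *link)
{
	aspm_disabled = false;
	toy_no_aspm();
	toy_init_link_state(link);      /* flag already set, nothing done */
}
```

With the scan-first order the toy OS rewrites the ASPM state of a pre-1.1
link although it was never granted control; with the _OSC-first order the
firmware setting survives, which matches the v3.7 and v3.8 entries in the
notes quoted earlier.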

Bjorn

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-14 17:44                                               ` Bjorn Helgaas
@ 2013-06-14 18:26                                                 ` Yinghai Lu
  2013-06-14 21:26                                                   ` Bjorn Helgaas
  0 siblings, 1 reply; 71+ messages in thread
From: Yinghai Lu @ 2013-06-14 18:26 UTC (permalink / raw)
  To: Bjorn Helgaas, Matthew Garrett
  Cc: Jiang Liu, Roman Yepishev, Rafael J. Wysocki, linux-pci,
	linux-acpi, linux-kernel, Linus Torvalds, Andrew Morton,
	Greg Kroah-Hartman

On Fri, Jun 14, 2013 at 10:44 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Jun 14, 2013 at 10:57 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Fri, Jun 14, 2013 at 9:33 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> On Fri, Jun 14, 2013 at 10:17 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>>>
>>> Can you please refer to specific function names?  I can't read your mind.
>>>
>>> You might be referring to quirk_disable_aspm_l0s().  This is a
>>> pci_fixup_final quirk that calls pci_disable_link_state().  In the
>>> current tree, we enumerate devices before requesting _OSC control.
>>> However, pci_fixup_final quirks are not run until the
>>> pci_apply_final_quirks() fs_initcall, which is after we request _OSC
>>> control.
>>>
>>> As far as I can tell, we never call pci_disable_link_state() before
>>> calling pcie_no_aspm().
>>
>> ok, you are right, that is not pci_disable_link_state.
>>
>> It is pcie_aspm_init_link_state ==> pcie_aspm_sanity_check in booting path
>> that disable aspm.  It has  "if (aspm_disabled)" in it, and it cause
>> the difference.
>
> Yes, I agree, the pcie_aspm_init_link_state() path uses aspm_disabled
> before we set it:
>
>     acpi_pci_root_add
>       pci_acpi_scan_root
>         pci_scan_child_bus
>           pci_scan_slot
>             pcie_aspm_init_link_state
>               pcie_aspm_sanity_check
>                 if (aspm_disabled)              # used before set
>                   ...
>       acpi_pci_osc_control_set
>       pcie_no_aspm
>         aspm_disabled = 1                       # set
>
> That might mean we do some ASPM configuration during enumeration (in
> pci_scan_slot()) even though the BIOS hasn't given us permission.  It
> looks like we did that even in v3.7, since we did the enumeration
> before the _OSC there as well.  That looks like a bug to me.

Agreed.  That means these commits from Matthew Garrett

commit 4949be16822e92a18ea0cc1616319926628092ee
Author: Matthew Garrett <mjg@redhat.com>
Date:   Tue Mar 6 13:41:49 2012 -0500

    PCI: ignore pre-1.1 ASPM quirking when ASPM is disabled

commit c9651e70ad0aa499814817cbf3cc1d0b806ed3a1
Author: Matthew Garrett <mjg@redhat.com>
Date:   Tue Mar 27 10:17:41 2012 -0400

    ASPM: Fix pcie devices with non-pcie children

will only work when the user specifies "pcie_aspm=off" on the boot command line.

(Roman's system should have a problem when he boots the current Linus tree
with "pcie_aspm=off", as nothing will disable ASPM for the offending PCI
devices.)

To close the hole that Matthew's commits miss, we should move the _OSC
support/control setup ahead of the bus scan.

Roman's system will then have to fail, as the BIOS enables ASPM for pre-1.1
PCIe devices and does not hand over control to the OS, so the OS cannot
disable the ASPM link state for them.

To work around the problem on Roman's system, we could add pcie_aspm=force_off

so we will have
pcie_aspm=off
pcie_aspm=force
pcie_aspm=force_off

What a mess!

>
> I don't think the fact that we have been doing ASPM config during
> enumeration before _OSC is an argument for dropping the check in
> pci_disable_link_state().

Agreed.

Yinghai

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-14 18:26                                                 ` Yinghai Lu
@ 2013-06-14 21:26                                                   ` Bjorn Helgaas
  2013-06-14 21:30                                                       ` Matthew Garrett
  2013-06-14 22:17                                                     ` Yinghai Lu
  0 siblings, 2 replies; 71+ messages in thread
From: Bjorn Helgaas @ 2013-06-14 21:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matthew Garrett, Jiang Liu, Roman Yepishev, Rafael J. Wysocki,
	linux-pci, linux-acpi, linux-kernel, Linus Torvalds,
	Andrew Morton, Greg Kroah-Hartman, Maxim Levitsky,
	Jussi Kivilinna

[+cc Maxim, Jussi]

On Fri, Jun 14, 2013 at 12:26 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Jun 14, 2013 at 10:44 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Fri, Jun 14, 2013 at 10:57 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> On Fri, Jun 14, 2013 at 9:33 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>>> On Fri, Jun 14, 2013 at 10:17 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>>>>
>>>> Can you please refer to specific function names?  I can't read your mind.
>>>>
>>>> You might be referring to quirk_disable_aspm_l0s().  This is a
>>>> pci_fixup_final quirk that calls pci_disable_link_state().  In the
>>>> current tree, we enumerate devices before requesting _OSC control.
>>>> However, pci_fixup_final quirks are not run until the
>>>> pci_apply_final_quirks() fs_initcall, which is after we request _OSC
>>>> control.
>>>>
>>>> As far as I can tell, we never call pci_disable_link_state() before
>>>> calling pcie_no_aspm().
>>>
>>> ok, you are right, that is not pci_disable_link_state.
>>>
>>> It is pcie_aspm_init_link_state ==> pcie_aspm_sanity_check in booting path
>>> that disable aspm.  It has  "if (aspm_disabled)" in it, and it cause
>>> the difference.
>>
>> Yes, I agree, the pcie_aspm_init_link_state() path uses aspm_disabled
>> before we set it:
>>
>>     acpi_pci_root_add
>>       pci_acpi_scan_root
>>         pci_scan_child_bus
>>           pci_scan_slot
>>             pcie_aspm_init_link_state
>>               pcie_aspm_sanity_check
>>                 if (aspm_disabled)              # used before set
>>                   ...
>>       acpi_pci_osc_control_set
>>       pcie_no_aspm
>>         aspm_disabled = 1                       # set
>>
>> That might mean we do some ASPM configuration during enumeration (in
>> pci_scan_slot()) even though the BIOS hasn't given us permission.  It
>> looks like we did that even in v3.7, since we did the enumeration
>> before the _OSC there as well.  That looks like a bug to me.
>
> agreed. that means commits from Matthew Garrett
>
> commit 4949be16822e92a18ea0cc1616319926628092ee
> Author: Matthew Garrett <mjg@redhat.com>
> Date:   Tue Mar 6 13:41:49 2012 -0500
>
>     PCI: ignore pre-1.1 ASPM quirking when ASPM is disabled
>
> commit c9651e70ad0aa499814817cbf3cc1d0b806ed3a1
> Author: Matthew Garrett <mjg@redhat.com>
> Date:   Tue Mar 27 10:17:41 2012 -0400
>
>     ASPM: Fix pcie devices with non-pcie children
>
> will only works when the user specify "aspm=off" in boot command line.
>
> (Roman's should have problem when he boot current linus tree with
> "aspm=off", as no one will disable aspm for the offending pci devices)
>
> To close the hole that Matthew' commits miss, that we should move _OSC
> support/control set ahead.
>
> For Roman's system it will have to fail, as BIOS enable prep-1.1 pcie devices
> aspm, and do not handle over control to OS, so os can not disable aspm link
> state for it.
>
> To workaround the problem in Roman's system, we can add pcie_aspm=force_off
>
> so we will have
> pcie_aspm=off
> pcie_aspm=force
> pcie_aspm=force_off
>
> What a mess!

Yeah, this is a huge mess.  It makes my head hurt.  I don't think it's
reasonable to add more flags because that will make my head hurt even
more.

If I understand correctly, on Roman's system (the Acer Aspire One
AOA150 netbook mentioned in
https://bugzilla.kernel.org/show_bug.cgi?id=55211):

  - The BIOS leaves ASPM enabled for the ath5k device (03:00.0)
  - The BIOS does not allow the OS to manage ASPM (via _OSC)
  - The ath5k device does not work correctly with ASPM enabled
  - The ath5k driver calls pci_disable_link_state(), but we do not
disable ASPM because we don't have permission from the BIOS
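
The last point can be illustrated with a small model of the gate.  This is
only a sketch under stated assumptions: toy_disable_link_state() and the
TOY_ASPM_* masks are hypothetical names, and the os_has_aspm_control
argument stands in for whatever the kernel records after _OSC.  It is not
the real pci_disable_link_state().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical link-state bits, for illustration only. */
#define TOY_ASPM_L0S 0x1u
#define TOY_ASPM_L1  0x2u

struct toy_dev {
	unsigned int aspm_enabled;  /* link states the firmware left on */
};

/*
 * Returns true if the request was honoured.  Without ASPM control the
 * request is dropped and the firmware configuration stays in place.
 */
static bool toy_disable_link_state(struct toy_dev *dev, unsigned int states,
				   bool os_has_aspm_control)
{
	assert(dev != NULL);
	if (!os_has_aspm_control)
		return false;
	dev->aspm_enabled &= ~states;
	return true;
}
```

A driver asking for TOY_ASPM_L0S to be cleared gets its way only when
control was granted; otherwise the call is a no-op, which is the situation
described for the ath5k device here.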

This is basically the case I investigated in bz 57331 [1], and my
conclusion was that Windows behaves the same way, i.e., Windows also
leaves ASPM enabled in this situation.

It would be interesting to know whether that device on Roman's machine
works under Windows and what the ASPM configuration there is.  When
Maxim added the pci_disable_link_state() to ath5k with 6ccf15a1, he
did say that the Windows driver disabled L0s [2], but I don't know
what machine that was or what its _OSC method said.

At the time of 6ccf15a1, Linux evaluated _OSC but did not call
pcie_no_aspm() when it failed, so the pci_disable_link_state() in
ath5k actually *did* disable ASPM.

I did find the Atheros Windows driver for the AOA150 on the Acer
website [3], and the .INF file has several interesting mentions of
ASPM, but I don't know what they mean.

Bjorn

[1] https://bugzilla.kernel.org/show_bug.cgi?id=57331#c5

[2] https://lists.ath5k.org/pipermail/ath5k-devel/2010-June/003842.html

[3] http://global-download.acer.com/GDFiles/Driver/Wireless%20LAN/WLAN_Atheros_7.6.0.224_XPx86_A.zip?acerid=633639843308102758&Step1=NETBOOK,%20CHROMEBOOK&Step2=ASPIRE%20ONE&Step3=AOA150&OS=ALL&LC=en&BC=ACER&SC=PA_6

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-14 21:26                                                   ` Bjorn Helgaas
  2013-06-14 21:30                                                       ` Matthew Garrett
@ 2013-06-14 21:30                                                       ` Matthew Garrett
  1 sibling, 0 replies; 71+ messages in thread
From: Matthew Garrett @ 2013-06-14 21:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yinghai Lu, Jiang Liu, Roman Yepishev, Rafael J. Wysocki,
	linux-pci, linux-acpi, linux-kernel, Linus Torvalds,
	Andrew Morton, Greg Kroah-Hartman, Maxim Levitsky,
	Jussi Kivilinna

On Fri, 2013-06-14 at 15:26 -0600, Bjorn Helgaas wrote:

> I did find the Atheros Windows driver for the AOA150 on the Acer
> website [3], and the .INF file has several interesting mentions of
> ASPM, but I don't know what they mean.

They're not the standard functions, so it's possible that the Windows
driver for this hardware disables ASPM via its own register writes.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-14 21:26                                                   ` Bjorn Helgaas
  2013-06-14 21:30                                                       ` Matthew Garrett
@ 2013-06-14 22:17                                                     ` Yinghai Lu
  2013-06-14 22:27                                                         ` Matthew Garrett
  1 sibling, 1 reply; 71+ messages in thread
From: Yinghai Lu @ 2013-06-14 22:17 UTC (permalink / raw)
  To: Bjorn Helgaas, Roman Yepishev
  Cc: Matthew Garrett, Jiang Liu, Rafael J. Wysocki, linux-pci,
	linux-acpi, linux-kernel, Linus Torvalds, Andrew Morton,
	Greg Kroah-Hartman, Maxim Levitsky, Jussi Kivilinna

[-- Attachment #1: Type: text/plain, Size: 1808 bytes --]

On Fri, Jun 14, 2013 at 2:26 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc Maxim, Jussi]
>
> Yeah, this is a huge mess.  It makes my head hurt.  I don't think it's
> reasonable to add more flags because that will make my head hurt even
> more.
>
> If I understand correctly, on Roman's system (the Acer Aspire One
> AOA150 netbook mentioned in
> https://bugzilla.kernel.org/show_bug.cgi?id=55211):
>
>   - The BIOS leaves ASPM enabled for the ath5k device (03:00.0)
>   - The BIOS does not allow the OS to manage ASPM (via _OSC)
>   - The ath5k device does not work correctly with ASPM enabled
>   - The ath5k driver calls pci_disable_link_state(), but we do not
> disable ASPM because we don't have permission from the BIOS

It looks like Matthew Garrett's patches are causing the problem:

 commit 4949be16822e92a18ea0cc1616319926628092ee
 Author: Matthew Garrett <mjg@redhat.com>
 Date:   Tue Mar 6 13:41:49 2012 -0500

     PCI: ignore pre-1.1 ASPM quirking when ASPM is disabled

 commit c9651e70ad0aa499814817cbf3cc1d0b806ed3a1
 Author: Matthew Garrett <mjg@redhat.com>
 Date:   Tue Mar 27 10:17:41 2012 -0400

     ASPM: Fix pcie devices with non-pcie children

After those two patches, if aspm_disabled is set early via _OSC, the ASPM
registers of pre-1.1 devices will be touched even if aspm_force is not
specified.

pcie_aspm_init_link_state will go all the way to
 pcie_config_aspm_path ==> pcie_config_aspm_link

In that path, aspm_disabled is not checked anywhere.

BTW, when ASPM is not disabled, even though the link is allocated, it is
blacklisted, so it never gets touched.
Matthew's patch is not needed in either case.

I suspect that ASPM on Roman's system could be enabled by that path
instead of by the BIOS.

Roman, can you please test these two patches on top of Linus' tree on your system?

Thanks

Yinghai

[-- Attachment #2: revert_revert_osc_change_linus.patch --]
[-- Type: application/octet-stream, Size: 5303 bytes --]

Subject: [PATCH] ACPI, PCI: Revert reverting of "PCI/ACPI: Request _OSC control before scanning PCI root bus"

In
| commit b8178f130e25c1bdac1c33e0996f1ff6e20ec08e
| Author: Bjorn Helgaas <bhelgaas@google.com>
| Date:   Mon Apr 1 15:47:39 2013 -0600
|
|    Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
|
|    This reverts commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6.
"OSC set early" was reverted for v3.9.

Now we have a problem on v3.10, as it has
       - Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe)
       - Make acpiphp builtin only, not modular (Jiang Liu)
so acpiphp gets registered via pcibios_add_bus very early.
We cannot enable the PCIe native hotplug driver any more if the system
supports both acpiphp and pciehp.

Calling path: in acpi_pci_root_add we have
1. pci_acpi_scan_root
     ==> pci devices enumeration and bus scanning.
       ==> pci_alloc_child_bus
         ==> pcibios_add_bus
           ==> acpi_pci_add_bus
             ==> acpiphp_enumerate_slots
               ==> ...==> register_slot
                 ==> device_is_managed_by_native_pciehp
                   ==> check osc_set with OSC_PCI_EXPRESS_NATIVE_HP_CONTROL
2. _OSC set request
so the acpiphp hotplug slot always gets registered first, as _OSC is
not set yet.

We need "OSC set early", before pci_acpi_scan_root, again.

The point: ASPM, PME, pciehp, and AER support later rely on the values in
the root's osc_support_set/osc_control_set.

For v3.10, let's put "OSC control set early" back,
as we have a user in pci_acpi_scan_root.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index e427dc5..207d773 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -382,6 +382,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	int result;
 	struct acpi_pci_root *root;
 	u32 flags, base_flags;
+	bool is_osc_granted = false;
 
 	root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
 	if (!root)
@@ -442,30 +443,6 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	flags = base_flags = OSC_PCI_SEGMENT_GROUPS_SUPPORT;
 	acpi_pci_osc_support(root, flags);
 
-	/*
-	 * TBD: Need PCI interface for enumeration/configuration of roots.
-	 */
-
-	mutex_lock(&acpi_pci_root_lock);
-	list_add_tail(&root->node, &acpi_pci_roots);
-	mutex_unlock(&acpi_pci_root_lock);
-
-	/*
-	 * Scan the Root Bridge
-	 * --------------------
-	 * Must do this prior to any attempt to bind the root device, as the
-	 * PCI namespace does not get created until this call is made (and
-	 * thus the root bridge's pci_dev does not exist).
-	 */
-	root->bus = pci_acpi_scan_root(root);
-	if (!root->bus) {
-		printk(KERN_ERR PREFIX
-			    "Bus %04x:%02x not present in PCI namespace\n",
-			    root->segment, (unsigned int)root->secondary.start);
-		result = -ENODEV;
-		goto out_del_root;
-	}
-
 	/* Indicate support for various _OSC capabilities. */
 	if (pci_ext_cfg_avail())
 		flags |= OSC_EXT_PCI_CONFIG_SUPPORT;
@@ -484,7 +461,6 @@ static int acpi_pci_root_add(struct acpi_device *device,
 			flags = base_flags;
 		}
 	}
-
 	if (!pcie_ports_disabled
 	    && (flags & ACPI_PCIE_REQ_SUPPORT) == ACPI_PCIE_REQ_SUPPORT) {
 		flags = OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL
@@ -505,28 +481,54 @@ static int acpi_pci_root_add(struct acpi_device *device,
 		status = acpi_pci_osc_control_set(device->handle, &flags,
 				       OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL);
 		if (ACPI_SUCCESS(status)) {
+			is_osc_granted = true;
 			dev_info(&device->dev,
 				"ACPI _OSC control (0x%02x) granted\n", flags);
-			if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM) {
-				/*
-				 * We have ASPM control, but the FADT indicates
-				 * that it's unsupported. Clear it.
-				 */
-				pcie_clear_aspm(root->bus);
-			}
 		} else {
+			is_osc_granted = false;
 			dev_info(&device->dev,
 				"ACPI _OSC request failed (%s), "
 				"returned control mask: 0x%02x\n",
 				acpi_format_exception(status), flags);
-			pr_info("ACPI _OSC control for PCIe not granted, "
-				"disabling ASPM\n");
-			pcie_no_aspm();
 		}
 	} else {
 		dev_info(&device->dev,
-			 "Unable to request _OSC control "
-			 "(_OSC support mask: 0x%02x)\n", flags);
+			"Unable to request _OSC control "
+			"(_OSC support mask: 0x%02x)\n", flags);
+	}
+
+	/*
+	 * TBD: Need PCI interface for enumeration/configuration of roots.
+	 */
+
+	mutex_lock(&acpi_pci_root_lock);
+	list_add_tail(&root->node, &acpi_pci_roots);
+	mutex_unlock(&acpi_pci_root_lock);
+
+	/*
+	 * Scan the Root Bridge
+	 * --------------------
+	 * Must do this prior to any attempt to bind the root device, as the
+	 * PCI namespace does not get created until this call is made (and 
+	 * thus the root bridge's pci_dev does not exist).
+	 */
+	root->bus = pci_acpi_scan_root(root);
+	if (!root->bus) {
+		printk(KERN_ERR PREFIX
+			    "Bus %04x:%02x not present in PCI namespace\n",
+			    root->segment, (unsigned int)root->secondary.start);
+		result = -ENODEV;
+		goto out_del_root;
+	}
+
+	/* ASPM setting */
+	if (is_osc_granted) {
+		if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM)
+			pcie_clear_aspm(root->bus);
+	} else {
+		pr_info("ACPI _OSC control for PCIe not granted, "
+			"disabling ASPM\n");
+		pcie_no_aspm();
 	}
 
 	pci_acpi_add_bus_pm_notifier(device, root->bus);

[-- Attachment #3: revert_matthew_aspm_disabled.patch --]
[-- Type: application/octet-stream, Size: 583 bytes --]

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 403a443..b795b43 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -493,15 +493,6 @@ static int pcie_aspm_sanity_check(struct pci_dev *pdev)
 			return -EINVAL;
 
 		/*
-		 * If ASPM is disabled then we're not going to change
-		 * the BIOS state. It's safe to continue even if it's a
-		 * pre-1.1 device
-		 */
-
-		if (aspm_disabled)
-			continue;
-
-		/*
 		 * Disable ASPM for pre-1.1 PCIe device, we follow MS to use
 		 * RBER bit to determine if a function is 1.1 version device
 		 */

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-14 22:17                                                     ` Yinghai Lu
  2013-06-14 22:27                                                         ` Matthew Garrett
@ 2013-06-14 22:27                                                         ` Matthew Garrett
  0 siblings, 0 replies; 71+ messages in thread
From: Matthew Garrett @ 2013-06-14 22:27 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Roman Yepishev, Jiang Liu, Rafael J. Wysocki,
	linux-pci, linux-acpi, linux-kernel, Linus Torvalds,
	Andrew Morton, Greg Kroah-Hartman, Maxim Levitsky,
	Jussi Kivilinna

On Fri, 2013-06-14 at 15:17 -0700, Yinghai Lu wrote:

> after those two patches, it aspm_disabled is set, via _osc early,
> pre-1.1 devices aspm register will be touched even aspm_force is not specified.

I don't follow. We were previously automatically disabling ASPM on
pre-1.1 devices even if _OSC didn't give us control. I've confirmed that
this was the wrong thing for us to be doing, and my patch changed the
behaviour such that if the firmware enables ASPM on a pre-1.1 device and
refuses to give us control via _OSC we will leave ASPM enabled.
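
The behaviour described above can be sketched as a per-child check.  This
is a simplified model, not the kernel's pcie_aspm_sanity_check(): the
toy_* types are hypothetical, has_rber stands for the RBER bit that the
quoted patch comment says distinguishes 1.1 devices, and what the caller
does with a rejected link is outside the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_child {
	bool is_pcie;   /* false for a conventional PCI function */
	bool has_rber;  /* role-based error reporting, set on 1.1+ devices */
};

struct toy_policy {
	bool aspm_disabled;  /* OS was not given ASPM control */
	bool aspm_force;     /* user asked to skip the pre-1.1 check */
};

/* Returns 0 if the link passes, -EINVAL if it should be rejected. */
static int toy_sanity_check(const struct toy_child *children, size_t n,
			    const struct toy_policy *p)
{
	assert(children != NULL && p != NULL);
	for (size_t i = 0; i < n; i++) {
		/* A non-PCIe child always rejects the link. */
		if (!children[i].is_pcie)
			return -EINVAL;
		/* Without ASPM control, skip the pre-1.1 test entirely. */
		if (p->aspm_disabled)
			continue;
		/* Pre-1.1 device: reject unless the user forces ASPM. */
		if (!children[i].has_rber && !p->aspm_force)
			return -EINVAL;
	}
	return 0;
}
```

In this model a pre-1.1 child is rejected only when the OS has ASPM
control and no force option was given; once control is refused, the check
passes without looking at the device version, so the firmware setting is
left as it is.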

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
@ 2013-06-14 22:27                                                         ` Matthew Garrett
  0 siblings, 0 replies; 71+ messages in thread
From: Matthew Garrett @ 2013-06-14 22:27 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Roman Yepishev, Jiang Liu, Rafael J. Wysocki,
	linux-pci, linux-acpi, linux-kernel, Linus Torvalds,
	Andrew Morton, Greg Kroah-Hartman, Maxim Levitsky,
	Jussi Kivilinna

T24gRnJpLCAyMDEzLTA2LTE0IGF0IDE1OjE3IC0wNzAwLCBZaW5naGFpIEx1IHdyb3RlOg0KDQo+
IGFmdGVyIHRob3NlIHR3byBwYXRjaGVzLCBpdCBhc3BtX2Rpc2FibGVkIGlzIHNldCwgdmlhIF9v
c2MgZWFybHksDQo+IHByZS0xLjEgZGV2aWNlcyBhc3BtIHJlZ2lzdGVyIHdpbGwgYmUgdG91Y2hl
ZCBldmVuIGFzcG1fZm9yY2UgaXMgbm90IHNwZWNpZmllZC4NCg0KSSBkb24ndCBmb2xsb3cuIFdl
IHdlcmUgcHJldmlvdXNseSBhdXRvbWF0aWNhbGx5IGRpc2FibGluZyBBU1BNIG9uDQpwcmUtMS4x
IGRldmljZXMgZXZlbiBpZiBfT1NDIGRpZG4ndCBnaXZlIHVzIGNvbnRyb2wuIEkndmUgY29uZmly
bWVkIHRoYXQNCnRoaXMgd2FzIHRoZSB3cm9uZyB0aGluZyBmb3IgdXMgdG8gYmUgZG9pbmcsIGFu
ZCBteSBwYXRjaCBjaGFuZ2VkIHRoZQ0KYmVoYXZpb3VyIHN1Y2ggdGhhdCBpZiB0aGUgZmlybXdh
cmUgZW5hYmxlcyBBU1BNIG9uIGEgcHJlLTEuMSBkZXZpY2UgYW5kDQpyZWZ1c2VzIHRvIGdpdmUg
dXMgY29udHJvbCB2aWEgX09TQyB3ZSB3aWxsIGxlYXZlIEFTUE0gZW5hYmxlZC4NCg0KLS0gDQpN
YXR0aGV3IEdhcnJldHQgfCBtamc1OUBzcmNmLnVjYW0ub3JnDQo=

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-14 22:27                                                         ` Matthew Garrett
  (?)
  (?)
@ 2013-06-14 22:40                                                         ` Yinghai Lu
  2013-06-14 22:48                                                             ` Matthew Garrett
  -1 siblings, 1 reply; 71+ messages in thread
From: Yinghai Lu @ 2013-06-14 22:40 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Bjorn Helgaas, Roman Yepishev, Jiang Liu, Rafael J. Wysocki,
	linux-pci, linux-acpi, linux-kernel, Linus Torvalds,
	Andrew Morton, Greg Kroah-Hartman, Maxim Levitsky,
	Jussi Kivilinna

On Fri, Jun 14, 2013 at 3:27 PM, Matthew Garrett
<matthew.garrett@nebula.com> wrote:
> On Fri, 2013-06-14 at 15:17 -0700, Yinghai Lu wrote:
>
>> after those two patches, it aspm_disabled is set, via _osc early,
>> pre-1.1 devices aspm register will be touched even aspm_force is not specified.
>
> I don't follow. We were previously automatically disabling ASPM on
> pre-1.1 devices even if _OSC didn't give us control.

I don't think so. We just moved the _OSC support/control setting before the
PCI scan in v3.8 and reverted that in v3.9.

> I've confirmed that
> this was the wrong thing for us to be doing, and my patch changed the
> behaviour such that if the firmware enables ASPM on a pre-1.1 device and
> refuses to give us control via _OSC we will leave ASPM enabled.

Not sure. aspm_disabled should be false on the boot path when that function
is called, unless you pass pcie_aspm=off.

Yinghai

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-14 22:40                                                         ` Yinghai Lu
  2013-06-14 22:48                                                             ` Matthew Garrett
@ 2013-06-14 22:48                                                             ` Matthew Garrett
  0 siblings, 0 replies; 71+ messages in thread
From: Matthew Garrett @ 2013-06-14 22:48 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Roman Yepishev, Jiang Liu, Rafael J. Wysocki,
	linux-pci, linux-acpi, linux-kernel, Linus Torvalds,
	Andrew Morton, Greg Kroah-Hartman, Maxim Levitsky,
	Jussi Kivilinna

On Fri, 2013-06-14 at 15:40 -0700, Yinghai Lu wrote:
> On Fri, Jun 14, 2013 at 3:27 PM, Matthew Garrett
> <matthew.garrett@nebula.com> wrote:
> > On Fri, 2013-06-14 at 15:17 -0700, Yinghai Lu wrote:
> >
> >> after those two patches, it aspm_disabled is set, via _osc early,
> >> pre-1.1 devices aspm register will be touched even aspm_force is not specified.
> >
> > I don't follow. We were previously automatically disabling ASPM on
> > pre-1.1 devices even if _OSC didn't give us control.
> 
> I don't think so, we just moved _OSC support/control setting before pci scan
> in 3.8 and revert that in v3.9.

Right, sorry, I don't mean _OSC, I mean the FADT flag. We were
previously automatically disabling ASPM on pre-1.1 devices even if the
FADT flag was set.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] PCI: Remove not needed check in disable aspm link
  2013-06-14 22:48                                                             ` Matthew Garrett
  (?)
  (?)
@ 2013-06-14 23:00                                                             ` Yinghai Lu
  -1 siblings, 0 replies; 71+ messages in thread
From: Yinghai Lu @ 2013-06-14 23:00 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Bjorn Helgaas, Roman Yepishev, Jiang Liu, Rafael J. Wysocki,
	linux-pci, linux-acpi, linux-kernel, Linus Torvalds,
	Andrew Morton, Greg Kroah-Hartman, Maxim Levitsky,
	Jussi Kivilinna

On Fri, Jun 14, 2013 at 3:48 PM, Matthew Garrett
<matthew.garrett@nebula.com> wrote:
> On Fri, 2013-06-14 at 15:40 -0700, Yinghai Lu wrote:
>> On Fri, Jun 14, 2013 at 3:27 PM, Matthew Garrett
>> <matthew.garrett@nebula.com> wrote:
>> > On Fri, 2013-06-14 at 15:17 -0700, Yinghai Lu wrote:
>> >
>> >> after those two patches, it aspm_disabled is set, via _osc early,
>> >> pre-1.1 devices aspm register will be touched even aspm_force is not specified.
>> >
>> > I don't follow. We were previously automatically disabling ASPM on
>> > pre-1.1 devices even if _OSC didn't give us control.
>>
>> I don't think so, we just moved _OSC support/control setting before pci scan
>> in 3.8 and revert that in v3.9.
>
> Right, sorry, I don't mean _OSC, I mean the FADT flag. We were
> previously automatically disabling ASPM on pre-1.1 devices even if the
> FADT flag was set.

So in that case aspm_disabled never gets set.
Boot path: pcie_aspm_init_link_state() will not touch ASPM on pre-1.1 devices.

Later, the FADT check causes pcie_clear_aspm() to be called. That calls
__pci_disable_link_state(), and because of the following lines in
pcie_config_aspm_link()
        /* Nothing to do if the link is already in the requested state */
        state &= (link->aspm_capable & ~link->aspm_disable);
        if (link->aspm_enabled == state)
                return;
ASPM on pre-1.1 devices still does not get touched.

Maybe I'm missing something.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 71+ messages in thread
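
The early-return check quoted in the message above can be modelled in
userspace. This is only a sketch: struct link_model, the MODEL_ASPM_* values
and config_would_touch_hw() are hypothetical names invented here, and the
sketch makes no claim about which field values the kernel actually holds for
a pre-1.1 device. It only shows that when the masked request equals the
recorded enabled state, the function returns before any register write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values, used by this sketch only. */
#define MODEL_ASPM_L0S 0x1u
#define MODEL_ASPM_L1  0x2u

/* Hypothetical stand-in for the three fields the quoted lines use. */
struct link_model {
	uint32_t aspm_capable;	/* states the link can support */
	uint32_t aspm_disable;	/* states that have been disabled */
	uint32_t aspm_enabled;	/* states currently recorded as enabled */
};

/*
 * Returns true if a request for "state" would get past the early
 * return, i.e. the ASPM registers would be written.
 */
static bool config_would_touch_hw(const struct link_model *link, uint32_t state)
{
	assert(link != NULL);

	/* Same masking as the quoted lines. */
	state &= (link->aspm_capable & ~link->aspm_disable);

	/* Nothing to do if the link is already in the requested state. */
	return link->aspm_enabled != state;
}
```

With all three fields zero, a request for state 0 returns false (no write).
With aspm_capable and aspm_enabled both set to L0s|L1, the same request
returns true, because the recorded state differs from the masked request.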

* [PATCH RFC 0/4] PCI: pciehp: Fix Command Completion handling
  2013-06-12 19:41                                     ` Yinghai Lu
  2013-06-13  3:50                                       ` Bjorn Helgaas
  2013-06-14 14:11                                       ` Bjorn Helgaas
@ 2014-06-14 21:21                                       ` Bjorn Helgaas
  2014-06-14 21:21                                         ` [PATCH RFC 1/4] PCI: pciehp: Make pcie_wait_cmd() self-contained Bjorn Helgaas
                                                           ` (5 more replies)
  2 siblings, 6 replies; 71+ messages in thread
From: Bjorn Helgaas @ 2014-06-14 21:21 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Jan C. Nordholz, Kenji Kaneshige, Rajat Jain, linux-pci

Yinghai has been working on pciehp timeouts related to a hardware
erratum in Intel, AMD, and Nvidia hotplug controllers.  This affects
the way we wait for command completion on those controllers.

I had some suggestions about how to change pciehp to make this work
better in general, without having to check for specific vendors.  We
need something that works well on hardware that conforms to the spec,
as well as the stuff that doesn't.

I haven't heard anything for a while, so I wrote up these patches to
make my proposals concrete.  Unfortunately, I can't easily test any of
this, so I'm posting these for comment and possible testing if anybody
is ambitious.

The Intel erratum is CF118, described here:
http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
---

Bjorn Helgaas (4):
      PCI: pciehp: Make pcie_wait_cmd() self-contained
      PCI: pciehp: Wait for hotplug command completion lazily
      PCI: pciehp: Compute timeout from hotplug command start time
      PCI: pciehp: Remove assumptions about which commands cause completion events


 drivers/pci/hotplug/pciehp.h     |    2 +
 drivers/pci/hotplug/pciehp_hpc.c |   91 +++++++++++++++++---------------------
 2 files changed, 42 insertions(+), 51 deletions(-)

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH RFC 1/4] PCI: pciehp: Make pcie_wait_cmd() self-contained
  2014-06-14 21:21                                       ` [PATCH RFC 0/4] PCI: pciehp: Fix Command Completion handling Bjorn Helgaas
@ 2014-06-14 21:21                                         ` Bjorn Helgaas
  2014-06-14 21:21                                         ` [PATCH RFC 2/4] PCI: pciehp: Wait for hotplug command completion lazily Bjorn Helgaas
                                                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 71+ messages in thread
From: Bjorn Helgaas @ 2014-06-14 21:21 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Jan C. Nordholz, Kenji Kaneshige, Rajat Jain, linux-pci

pcie_wait_cmd() waits for the controller to finish a hotplug command.  Move
the associated logic (to determine whether waiting is required and whether
we're using interrupts or polling) from pcie_write_cmd() to
pcie_wait_cmd().

No functional change.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/hotplug/pciehp.h     |    1 +
 drivers/pci/hotplug/pciehp_hpc.c |   30 ++++++++++++++----------------
 2 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 8e9012dca450..f7bc886c20be 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -92,6 +92,7 @@ struct controller {
 	struct slot *slot;
 	wait_queue_head_t queue;	/* sleep & wake process */
 	u32 slot_cap;
+	u32 slot_ctrl;
 	struct timer_list poll_timer;
 	unsigned int cmd_busy:1;
 	unsigned int no_cmd_complete:1;
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 056841651a80..0e76e9d9d134 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -129,16 +129,24 @@ static int pcie_poll_cmd(struct controller *ctrl)
 	return 0;	/* timeout */
 }
 
-static void pcie_wait_cmd(struct controller *ctrl, int poll)
+static void pcie_wait_cmd(struct controller *ctrl)
 {
 	unsigned int msecs = pciehp_poll_mode ? 2500 : 1000;
 	unsigned long timeout = msecs_to_jiffies(msecs);
 	int rc;
 
-	if (poll)
-		rc = pcie_poll_cmd(ctrl);
-	else
+	/*
+	 * If the controller does not generate notifications for command
+	 * completions, we never need to wait between writes.
+	 */
+	if (ctrl->no_cmd_complete)
+		return;
+
+	if (ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE &&
+	    ctrl->slot_ctrl & PCI_EXP_SLTCTL_CCIE)
 		rc = wait_event_timeout(ctrl->queue, !ctrl->cmd_busy, timeout);
+	else
+		rc = pcie_poll_cmd(ctrl);
 	if (!rc)
 		ctrl_dbg(ctrl, "Command not completed in 1000 msec\n");
 }
@@ -187,22 +195,12 @@ static void pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
 	ctrl->cmd_busy = 1;
 	smp_mb();
 	pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
+	ctrl->slot_ctrl = slot_ctrl;
 
 	/*
 	 * Wait for command completion.
 	 */
-	if (!ctrl->no_cmd_complete) {
-		int poll = 0;
-		/*
-		 * if hotplug interrupt is not enabled or command
-		 * completed interrupt is not enabled, we need to poll
-		 * command completed event.
-		 */
-		if (!(slot_ctrl & PCI_EXP_SLTCTL_HPIE) ||
-		    !(slot_ctrl & PCI_EXP_SLTCTL_CCIE))
-			poll = 1;
-		pcie_wait_cmd(ctrl, poll);
-	}
+	pcie_wait_cmd(ctrl);
 	mutex_unlock(&ctrl->ctrl_lock);
 }
 


^ permalink raw reply related	[flat|nested] 71+ messages in thread
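
The decision that patch 1/4 moves into pcie_wait_cmd() can be sketched as a
pure function. This is a userspace model, not the driver code: struct
ctrl_model, enum wait_mode and choose_wait_mode() are hypothetical names, and
the two bit values are assumed to match PCI_EXP_SLTCTL_CCIE and
PCI_EXP_SLTCTL_HPIE in the kernel's pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match PCI_EXP_SLTCTL_CCIE / PCI_EXP_SLTCTL_HPIE. */
#define SLTCTL_CCIE 0x0010	/* Command Completed Interrupt Enable */
#define SLTCTL_HPIE 0x0020	/* Hot-Plug Interrupt Enable */

enum wait_mode { WAIT_NONE, WAIT_IRQ, WAIT_POLL };

/* Hypothetical stand-in for the two controller fields the patch consults. */
struct ctrl_model {
	bool no_cmd_complete;	/* controller never signals completion */
	uint16_t slot_ctrl;	/* last value written to Slot Control */
};

static enum wait_mode choose_wait_mode(const struct ctrl_model *ctrl)
{
	assert(ctrl != NULL);

	/* No completion notifications: never wait between writes. */
	if (ctrl->no_cmd_complete)
		return WAIT_NONE;

	/* Both interrupt enables set: sleep until the interrupt wakes us. */
	if ((ctrl->slot_ctrl & SLTCTL_HPIE) && (ctrl->slot_ctrl & SLTCTL_CCIE))
		return WAIT_IRQ;

	/* Otherwise fall back to polling Slot Status. */
	return WAIT_POLL;
}
```

Caching the last Slot Control value in the controller structure is what lets
the wait routine make this choice without the caller passing a poll flag.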

* [PATCH RFC 2/4] PCI: pciehp: Wait for hotplug command completion lazily
  2014-06-14 21:21                                       ` [PATCH RFC 0/4] PCI: pciehp: Fix Command Completion handling Bjorn Helgaas
  2014-06-14 21:21                                         ` [PATCH RFC 1/4] PCI: pciehp: Make pcie_wait_cmd() self-contained Bjorn Helgaas
@ 2014-06-14 21:21                                         ` Bjorn Helgaas
  2015-05-29 22:45                                           ` Alex Williamson
  2014-06-14 21:21                                         ` [PATCH RFC 3/4] PCI: pciehp: Compute timeout from hotplug command start time Bjorn Helgaas
                                                           ` (3 subsequent siblings)
  5 siblings, 1 reply; 71+ messages in thread
From: Bjorn Helgaas @ 2014-06-14 21:21 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Jan C. Nordholz, Kenji Kaneshige, Rajat Jain, linux-pci

Previously we issued a hotplug command and waited for it to complete.  But
there's no need to wait until we're ready to issue the *next* command.  The
next command will probably be much later, so the first one may have already
completed and we may not have to actually wait at all.

Because of hardware errata, some controllers generate command completion
events for some commands but not others.  In the case of Intel CF118 (see
spec update reference), the controller indicates command completion only
for Slot Control writes that change the value of the following bits:

  Power Controller Control
  Power Indicator Control
  Attention Indicator Control
  Electromechanical Interlock Control

Changes to other bits, e.g., the interrupt enable bits, do not cause the
Command Completed bit to be set.  Controllers from AMD and Nvidia are
reported to have similar errata.

These errata cause timeouts when pcie_enable_notification() enables
interrupts.  Previously that timeout occurred at boot-time.  With this
change, the timeout occurs later, when we change the state of the slot
power, indicators, or interlock.  This speeds up boot but causes a timeout
at the first hotplug event on the slot.  Subsequent events don't timeout
because only the first (boot-time) hotplug command updates Slot Control
without touching the power/indicator/interlock controls.

Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/hotplug/pciehp_hpc.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 0e76e9d9d134..f44fdb5b0b08 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -165,6 +165,9 @@ static void pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
 
 	mutex_lock(&ctrl->ctrl_lock);
 
+	/* Wait for any previous command that might still be in progress */
+	pcie_wait_cmd(ctrl);
+
 	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
 	if (slot_status & PCI_EXP_SLTSTA_CC) {
 		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
@@ -197,10 +200,6 @@ static void pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
 	pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
 	ctrl->slot_ctrl = slot_ctrl;
 
-	/*
-	 * Wait for command completion.
-	 */
-	pcie_wait_cmd(ctrl);
 	mutex_unlock(&ctrl->ctrl_lock);
 }
 


^ permalink raw reply related	[flat|nested] 71+ messages in thread
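
The erratum behaviour described in patch 2/4 can be expressed as a small
predicate: does a Slot Control write change any of the four fields that make
an affected controller report Command Completed? This is a sketch under
assumptions. cf118_write_completes() is a hypothetical helper, the bit values
are assumed to match the PCI_EXP_SLTCTL_* definitions in pci_regs.h, and
treating the interlock control as an ordinary changed bit is a
simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel's PCI_EXP_SLTCTL_* values. */
#define SLTCTL_CCIE 0x0010	/* Command Completed Interrupt Enable */
#define SLTCTL_HPIE 0x0020	/* Hot-Plug Interrupt Enable */
#define SLTCTL_AIC  0x00c0	/* Attention Indicator Control */
#define SLTCTL_PIC  0x0300	/* Power Indicator Control */
#define SLTCTL_PCC  0x0400	/* Power Controller Control */
#define SLTCTL_EIC  0x0800	/* Electromechanical Interlock Control */

/* Fields whose change produces a completion on an affected controller. */
#define CF118_CC_BITS (SLTCTL_PCC | SLTCTL_PIC | SLTCTL_AIC | SLTCTL_EIC)

static bool cf118_write_completes(uint16_t old_ctrl, uint16_t new_ctrl)
{
	/* Sanity check: the interrupt enables are outside the mask. */
	assert((CF118_CC_BITS & (SLTCTL_CCIE | SLTCTL_HPIE)) == 0);

	return ((old_ctrl ^ new_ctrl) & CF118_CC_BITS) != 0;
}
```

A boot-time write that only sets interrupt enable bits returns false, which
is the case that times out. A later write that flips Power Controller Control
returns true.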

* [PATCH RFC 3/4] PCI: pciehp: Compute timeout from hotplug command start time
  2014-06-14 21:21                                       ` [PATCH RFC 0/4] PCI: pciehp: Fix Command Completion handling Bjorn Helgaas
  2014-06-14 21:21                                         ` [PATCH RFC 1/4] PCI: pciehp: Make pcie_wait_cmd() self-contained Bjorn Helgaas
  2014-06-14 21:21                                         ` [PATCH RFC 2/4] PCI: pciehp: Wait for hotplug command completion lazily Bjorn Helgaas
@ 2014-06-14 21:21                                         ` Bjorn Helgaas
  2014-06-15  2:18                                           ` Yinghai Lu
  2014-06-14 21:21                                         ` [PATCH RFC 4/4] PCI: pciehp: Remove assumptions about which commands cause completion events Bjorn Helgaas
                                                           ` (2 subsequent siblings)
  5 siblings, 1 reply; 71+ messages in thread
From: Bjorn Helgaas @ 2014-06-14 21:21 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Jan C. Nordholz, Kenji Kaneshige, Rajat Jain, linux-pci

If we issue a hotplug command, go do something else, then come back and
wait for the command to complete, we don't have to wait the whole timeout
period, because some of it elapsed while we were doing something else.

Keep track of the time we issued the command, and wait only until the
timeout period from that point has elapsed.

For controllers with errata like Intel CF118, we previously timed out
before issuing the second hotplug command:

  At time T1 (during boot):
    - Write DLLSCE, ABPE, PDCE, etc. to Slot Control
  At time T2 (hotplug event):
    - Wait for command completion (CC) in Slot Status
    - Timeout because CC is never set in Slot Status
    - Write PCC, PIC, etc. to Slot Control

With this change, we wait until T1 + 1 second instead of T2 + 1 second.
If the hotplug event is more than 1 second after the boot-time
initialization, we won't wait for the timeout at all.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/hotplug/pciehp.h     |    1 +
 drivers/pci/hotplug/pciehp_hpc.c |   23 +++++++++++++++++++----
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index f7bc886c20be..c496258cd9a7 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -94,6 +94,7 @@ struct controller {
 	u32 slot_cap;
 	u32 slot_ctrl;
 	struct timer_list poll_timer;
+	unsigned long cmd_started;	/* jiffies */
 	unsigned int cmd_busy:1;
 	unsigned int no_cmd_complete:1;
 	unsigned int link_active_reporting:1;
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index f44fdb5b0b08..18ac24d84f9b 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -104,11 +104,10 @@ static inline void pciehp_free_irq(struct controller *ctrl)
 		free_irq(ctrl->pcie->irq, ctrl);
 }
 
-static int pcie_poll_cmd(struct controller *ctrl)
+static int pcie_poll_cmd(struct controller *ctrl, int timeout)
 {
 	struct pci_dev *pdev = ctrl_dev(ctrl);
 	u16 slot_status;
-	int timeout = 1000;
 
 	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
 	if (slot_status & PCI_EXP_SLTSTA_CC) {
@@ -132,7 +131,9 @@ static int pcie_poll_cmd(struct controller *ctrl)
 static void pcie_wait_cmd(struct controller *ctrl)
 {
 	unsigned int msecs = pciehp_poll_mode ? 2500 : 1000;
-	unsigned long timeout = msecs_to_jiffies(msecs);
+	unsigned long duration = msecs_to_jiffies(msecs);
+	unsigned long cmd_timeout = ctrl->cmd_started + duration;
+	unsigned long now, timeout;
 	int rc;
 
 	/*
@@ -142,11 +143,24 @@ static void pcie_wait_cmd(struct controller *ctrl)
 	if (ctrl->no_cmd_complete)
 		return;
 
+	if (!ctrl->cmd_busy)
+		return;
+
+	/*
+	 * Even if the command has already timed out, we want to call
+	 * pcie_poll_cmd() so it can clear PCI_EXP_SLTSTA_CC.
+	 */
+	now = jiffies;
+	if (time_before_eq(cmd_timeout, now))
+		timeout = 1;
+	else
+		timeout = cmd_timeout - now;
+
 	if (ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE &&
 	    ctrl->slot_ctrl & PCI_EXP_SLTCTL_CCIE)
 		rc = wait_event_timeout(ctrl->queue, !ctrl->cmd_busy, timeout);
 	else
-		rc = pcie_poll_cmd(ctrl);
+		rc = pcie_poll_cmd(ctrl, timeout);
 	if (!rc)
 		ctrl_dbg(ctrl, "Command not completed in 1000 msec\n");
 }
@@ -198,6 +212,7 @@ static void pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
 	ctrl->cmd_busy = 1;
 	smp_mb();
 	pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
+	ctrl->cmd_started = jiffies;
 	ctrl->slot_ctrl = slot_ctrl;
 
 	mutex_unlock(&ctrl->ctrl_lock);


^ permalink raw reply related	[flat|nested] 71+ messages in thread
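
The timeout arithmetic in patch 3/4 can be checked in isolation. The sketch
below models jiffies as a free-running 32-bit counter; tick_before_eq() and
remaining_ticks() are hypothetical names, and the signed-difference
comparison mirrors the usual wraparound-safe idiom rather than quoting the
kernel macro. The signed cast assumes a two's-complement target.

```c
#include <assert.h>
#include <stdint.h>

/* Wraparound-safe "a is at or before b" for a 32-bit tick counter. */
static int tick_before_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) >= 0;
}

/*
 * How long to wait now for a command issued at cmd_started, given a
 * total budget of "duration" ticks measured from the issue time.
 */
static uint32_t remaining_ticks(uint32_t cmd_started, uint32_t duration,
				uint32_t now)
{
	uint32_t deadline = cmd_started + duration;

	assert(duration != 0);

	/*
	 * Deadline already passed: still wait one tick so the caller
	 * gets a chance to observe and clear the completion bit.
	 */
	if (tick_before_eq(deadline, now))
		return 1;

	return deadline - now;
}
```

For a command issued at tick 1000 with a 250-tick budget, a wait that starts
at tick 1100 gets 150 ticks, and one that starts at tick 5000 gets the
minimum of 1. The subtraction stays correct when the counter wraps between
issue and wait.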

* [PATCH RFC 4/4] PCI: pciehp: Remove assumptions about which commands cause completion events
  2014-06-14 21:21                                       ` [PATCH RFC 0/4] PCI: pciehp: Fix Command Completion handling Bjorn Helgaas
                                                           ` (2 preceding siblings ...)
  2014-06-14 21:21                                         ` [PATCH RFC 3/4] PCI: pciehp: Compute timeout from hotplug command start time Bjorn Helgaas
@ 2014-06-14 21:21                                         ` Bjorn Helgaas
  2014-06-17  3:25                                           ` Rajat Jain
  2014-06-16  1:26                                         ` [PATCH RFC 0/4] PCI: pciehp: Fix Command Completion handling Rajat Jain
  2014-08-15 22:05                                         ` Yinghai Lu
  5 siblings, 1 reply; 71+ messages in thread
From: Bjorn Helgaas @ 2014-06-14 21:21 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Jan C. Nordholz, Kenji Kaneshige, Rajat Jain, linux-pci

We use incorrect logic to decide whether a PCIe hotplug controller
generates command completion events.

5808639bfa98 ("pciehp: fix slow probing") assumed that the Slot Status
"Command Completed" bit was set only for commands affecting slot power,
indicators, or electromechanical interlock.  That assumption is false: per
sec. 6.7.3.2 of PCIe spec r3.0, a write targeting any portion of the Slot
Control register is a command, and (if command completed events are
supported) software must wait for a command to complete before issuing the
next command.

5808639bfa98 was to fix boot-time timeouts (see bugzilla below) on a Lenovo
Thinkpad R61 with an Intel hotplug controller.  The controller probably has
the Intel CF118 erratum, which means it doesn't report Command Completed
unless the Slot Control power, indicator, or interlock bits are changed.
This causes a timeout because pciehp always waits for Command Complete (if
supported), regardless of which bits are changed.

Remove the incorrect logic because the timeouts have been addressed
differently by these changes:

  PCI: pciehp: Wait for hotplug command completion lazily
  PCI: pciehp: Compute timeout from hotplug command start time

Link: https://bugzilla.kernel.org/show_bug.cgi?id=10751
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/hotplug/pciehp_hpc.c |   39 ++++++++------------------------------
 1 file changed, 8 insertions(+), 31 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 18ac24d84f9b..1f70de5359fb 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -161,8 +161,10 @@ static void pcie_wait_cmd(struct controller *ctrl)
 		rc = wait_event_timeout(ctrl->queue, !ctrl->cmd_busy, timeout);
 	else
 		rc = pcie_poll_cmd(ctrl, timeout);
+
 	if (!rc)
-		ctrl_dbg(ctrl, "Command not completed in 1000 msec\n");
+		ctrl_info(ctrl, "Timeout on hotplug command %#010x\n",
+			  ctrl->slot_ctrl);
 }
 
 /**
@@ -174,7 +176,6 @@ static void pcie_wait_cmd(struct controller *ctrl)
 static void pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
 {
 	struct pci_dev *pdev = ctrl_dev(ctrl);
-	u16 slot_status;
 	u16 slot_ctrl;
 
 	mutex_lock(&ctrl->ctrl_lock);
@@ -182,30 +183,6 @@ static void pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
 	/* Wait for any previous command that might still be in progress */
 	pcie_wait_cmd(ctrl);
 
-	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
-	if (slot_status & PCI_EXP_SLTSTA_CC) {
-		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
-					   PCI_EXP_SLTSTA_CC);
-		if (!ctrl->no_cmd_complete) {
-			/*
-			 * After 1 sec and CMD_COMPLETED still not set, just
-			 * proceed forward to issue the next command according
-			 * to spec. Just print out the error message.
-			 */
-			ctrl_dbg(ctrl, "CMD_COMPLETED not clear after 1 sec\n");
-		} else if (!NO_CMD_CMPL(ctrl)) {
-			/*
-			 * This controller seems to notify of command completed
-			 * event even though it supports none of power
-			 * controller, attention led, power led and EMI.
-			 */
-			ctrl_dbg(ctrl, "Unexpected CMD_COMPLETED. Need to wait for command completed event\n");
-			ctrl->no_cmd_complete = 0;
-		} else {
-			ctrl_dbg(ctrl, "Unexpected CMD_COMPLETED. Maybe the controller is broken\n");
-		}
-	}
-
 	pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
 	slot_ctrl &= ~mask;
 	slot_ctrl |= (cmd & mask);
@@ -785,14 +762,14 @@ struct controller *pcie_init(struct pcie_device *dev)
 	mutex_init(&ctrl->ctrl_lock);
 	init_waitqueue_head(&ctrl->queue);
 	dbg_ctrl(ctrl);
+
 	/*
 	 * Controller doesn't notify of command completion if the "No
-	 * Command Completed Support" bit is set in Slot Capability
-	 * register or the controller supports none of power
-	 * controller, attention led, power led and EMI.
+	 * Command Completed Support" bit is set in Slot Capabilities.
+	 * If set, it means the controller can accept hotplug commands
+	 * with no delay between them.
 	 */
-	if (NO_CMD_CMPL(ctrl) ||
-	    !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
+	if (NO_CMD_CMPL(ctrl))
 		ctrl->no_cmd_complete = 1;
 
 	/* Check if Data Link Layer Link Active Reporting is implemented */


^ permalink raw reply related	[flat|nested] 71+ messages in thread
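
The change to the no_cmd_complete decision in patch 4/4 can be compared side
by side. This is a sketch: no_cmd_complete_old() and no_cmd_complete_new()
are hypothetical names, the Slot Capabilities bit values are assumed to match
PCI_EXP_SLTCAP_* in pci_regs.h, and the runtime "Unexpected CMD_COMPLETED"
correction that the patch also removes is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel's PCI_EXP_SLTCAP_* values. */
#define SLTCAP_PCP  0x00000002u	/* Power Controller Present */
#define SLTCAP_AIP  0x00000008u	/* Attention Indicator Present */
#define SLTCAP_PIP  0x00000010u	/* Power Indicator Present */
#define SLTCAP_EIP  0x00020000u	/* Electromechanical Interlock Present */
#define SLTCAP_NCCS 0x00040000u	/* No Command Completed Support */

#define SLTCAP_FEATURES (SLTCAP_PCP | SLTCAP_AIP | SLTCAP_PIP | SLTCAP_EIP)

/* Before the patch: also assume no completions if no feature is present. */
static bool no_cmd_complete_old(uint32_t slot_cap)
{
	/* Sanity check: NCCS is not one of the feature bits. */
	assert((SLTCAP_NCCS & SLTCAP_FEATURES) == 0);

	if (slot_cap & SLTCAP_NCCS)
		return true;
	return (slot_cap & SLTCAP_FEATURES) == 0;
}

/* After the patch: only the NCCS capability bit decides. */
static bool no_cmd_complete_new(uint32_t slot_cap)
{
	return (slot_cap & SLTCAP_NCCS) != 0;
}
```

The two versions differ only for a slot that lacks NCCS and implements none
of the four features: the old logic skipped the wait, while the new logic
waits, as the spec text cited in the commit message requires.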

* Re: [PATCH RFC 3/4] PCI: pciehp: Compute timeout from hotplug command start time
  2014-06-14 21:21                                         ` [PATCH RFC 3/4] PCI: pciehp: Compute timeout from hotplug command start time Bjorn Helgaas
@ 2014-06-15  2:18                                           ` Yinghai Lu
  2014-06-17  0:13                                             ` Bjorn Helgaas
  0 siblings, 1 reply; 71+ messages in thread
From: Yinghai Lu @ 2014-06-15  2:18 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Jan C. Nordholz, Kenji Kaneshige, Rajat Jain, linux-pci

On Sat, Jun 14, 2014 at 2:21 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index f44fdb5b0b08..18ac24d84f9b 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -142,11 +143,24 @@ static void pcie_wait_cmd(struct controller *ctrl)
>         if (ctrl->no_cmd_complete)
>                 return;
>
> +       if (!ctrl->cmd_busy)
> +               return;

Can you move those two lines to second patch ?

Other than that, for all 4 patches

Acked-by: Yinghai Lu <yinghai@kernel.org>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC 0/4] PCI: pciehp: Fix Command Completion handling
  2014-06-14 21:21                                       ` [PATCH RFC 0/4] PCI: pciehp: Fix Command Completion handling Bjorn Helgaas
                                                           ` (3 preceding siblings ...)
  2014-06-14 21:21                                         ` [PATCH RFC 4/4] PCI: pciehp: Remove assumptions about which commands cause completion events Bjorn Helgaas
@ 2014-06-16  1:26                                         ` Rajat Jain
  2014-08-15 22:05                                         ` Yinghai Lu
  5 siblings, 0 replies; 71+ messages in thread
From: Rajat Jain @ 2014-06-16  1:26 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Yinghai Lu, Jan C. Nordholz, Kenji Kaneshige, linux-pci

Hi Bjorn,

>
> I haven't heard anything for a while, so I wrote up these patches to
> make my proposals concrete.  Unfortunately, I can't easily test any of
> this, so I'm posting these for comment and possible testing if anybody
> is ambitious.
>

I'll test them out on my machines tomorrow.

Thanks,

Rajat

* Re: [PATCH RFC 3/4] PCI: pciehp: Compute timeout from hotplug command start time
  2014-06-15  2:18                                           ` Yinghai Lu
@ 2014-06-17  0:13                                             ` Bjorn Helgaas
  2014-06-17 17:33                                               ` Yinghai Lu
  0 siblings, 1 reply; 71+ messages in thread
From: Bjorn Helgaas @ 2014-06-17  0:13 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Jan C. Nordholz, Kenji Kaneshige, Rajat Jain, linux-pci

On Sat, Jun 14, 2014 at 8:18 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Sat, Jun 14, 2014 at 2:21 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>> index f44fdb5b0b08..18ac24d84f9b 100644
>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -142,11 +143,24 @@ static void pcie_wait_cmd(struct controller *ctrl)
>>         if (ctrl->no_cmd_complete)
>>                 return;
>>
>> +       if (!ctrl->cmd_busy)
>> +               return;
>
> Can you move those two lines to second patch ?

Yeah, those lines don't really fit in this patch.  I don't think they
really fit in the second patch ("Wait for hotplug command completion
lazily") either, since that is simply moving the wait from after
issuing a command to before issuing the next command.

I moved them to the first patch ("Make pcie_wait_cmd()
self-contained") since that is more concerned with cleaning up
pcie_wait_cmd().  It could even be a separate patch; it would have
made sense to check cmd_busy before calling pcie_wait_cmd().  But it
seemed like overkill to make another patch just for that.

Anyway, I put them all at
http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/?h=pci/hotplug
.  I'll add the results of Rajat's testing (assuming that goes well).
It would be good if you could make sure they do what you expect on
your hardware, too.  If you do test it, let me know what sort of
controller you tested on (vendor/device ID).

I think you will see a "Timeout on hotplug command %#010x (issued %u
msec ago)" message.  I don't think there will be any actual slowdown,
but I hope the message doesn't cause users too much heartburn.
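
For readers following along, the idea in the patch title ("Compute
timeout from hotplug command start time") can be sketched in userspace
as below.  This is only an illustration under stated assumptions: it
uses plain millisecond counters instead of jiffies, and the names
CMD_TIMEOUT_MS and remaining_wait_ms() are hypothetical, not taken from
pciehp.  The 1000 msec budget matches the "Command not completed in
1000 msec" message the series removes.

```c
#include <stdint.h>

/* Hypothetical model: times are milliseconds on a monotonic counter. */
#define CMD_TIMEOUT_MS 1000u

/*
 * Return how much longer we may wait for a command that was issued at
 * cmd_started_ms, as seen at now_ms.  The budget is measured from the
 * time the command was issued, not from the time we start waiting, so
 * a command issued long ago needs no further wait (returns 0).
 */
uint32_t remaining_wait_ms(uint32_t cmd_started_ms, uint32_t now_ms)
{
	/* Unsigned subtraction stays correct across counter wraparound. */
	uint32_t elapsed = now_ms - cmd_started_ms;

	if (elapsed >= CMD_TIMEOUT_MS)
		return 0;
	return CMD_TIMEOUT_MS - elapsed;
}
```

With this model, a wait that starts 300 msec after the command was
issued is allowed another 700 msec, and a wait that starts more than a
second later returns immediately.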

> Other than that, for all 4 patches
>
> Acked-by: Yinghai Lu <yinghai@kernel.org>

* Re: [PATCH RFC 4/4] PCI: pciehp: Remove assumptions about which commands cause completion events
  2014-06-14 21:21                                         ` [PATCH RFC 4/4] PCI: pciehp: Remove assumptions about which commands cause completion events Bjorn Helgaas
@ 2014-06-17  3:25                                           ` Rajat Jain
  0 siblings, 0 replies; 71+ messages in thread
From: Rajat Jain @ 2014-06-17  3:25 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Yinghai Lu, Jan C. Nordholz, Kenji Kaneshige, linux-pci

Hello,

I tested all 4 patches and they work great on my system (one that
sets Command Completed on all writes to the Slot Control register).

Tested-by: Rajat Jain <rajatxjain@gmail.com>

02:01.0 PCI bridge: Integrated Device Technology, Inc. Device 807a (rev 02) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Bus: primary=02, secondary=90, subordinate=9f, sec-latency=0
        I/O behind bridge: 00002000-00002fff
        Memory behind bridge: 88000000-8bffffff
        Prefetchable memory behind bridge: 00000000b0200000-00000000b03fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Express (v2) Downstream Port (Slot+), MSI 00
                DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #1, Speed 5GT/s, Width x4, ASPM L0s L1, Latency L0 <4us, L1 <4us
                        ClockPM- Surprise+ LLActRep+ BwNot+
                LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise-
                        Slot #1, PowerLimit 0.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt+ HPIrq+ LinkChg+
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
                        Changed: MRL- PresDet- LinkState-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [c0] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fff41740  Data: 0001
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [200 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=4
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=04 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
                        Port Arbitration Table <?>
        Capabilities: [320 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [330 v1] #12
        Kernel driver in use: pcieport


Before these 4 patches, there was no timeout on my system, but there
was a spurious error message:

[   26.324838] pciehp 0000:02:00.0:pcie24: Unexpected CMD_COMPLETED. Need to wait for command completed event

That message is gone with these patches.


Thanks!

Rajat

On Sat, Jun 14, 2014 at 2:21 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> We use incorrect logic to decide whether a PCIe hotplug controller
> generates command completion events.
>
> 5808639bfa98 ("pciehp: fix slow probing") assumed that the Slot Status
> "Command Completed" bit was set only for commands affecting slot power,
> indicators, or electromechanical interlock.  That assumption is false: per
> sec. 6.7.3.2 of PCIe spec r3.0, a write targeting any portion of the Slot
> Control register is a command, and (if command completed events are
> supported) software must wait for a command to complete before issuing the
> next command.
>
> 5808639bfa98 was to fix boot-time timeouts (see bugzilla below) on a Lenovo
> Thinkpad R61 with an Intel hotplug controller.  The controller probably has
> the Intel CF118 erratum, which means it doesn't report Command Completed
> unless the Slot Control power, indicator, or interlock bits are changed.
> This causes a timeout because pciehp always waits for Command Complete (if
> supported), regardless of which bits are changed.
>
> Remove the incorrect logic because the timeouts have been addressed
> differently by these changes:
>
>   PCI: pciehp: Wait for hotplug command completion lazily
>   PCI: pciehp: Compute timeout from hotplug command start time
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=10751
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/hotplug/pciehp_hpc.c |   39 ++++++++------------------------------
>  1 file changed, 8 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 18ac24d84f9b..1f70de5359fb 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -161,8 +161,10 @@ static void pcie_wait_cmd(struct controller *ctrl)
>                 rc = wait_event_timeout(ctrl->queue, !ctrl->cmd_busy, timeout);
>         else
>                 rc = pcie_poll_cmd(ctrl, timeout);
> +
>         if (!rc)
> -               ctrl_dbg(ctrl, "Command not completed in 1000 msec\n");
> +               ctrl_info(ctrl, "Timeout on hotplug command %#010x\n",
> +                         ctrl->slot_ctrl);
>  }
>
>  /**
> @@ -174,7 +176,6 @@ static void pcie_wait_cmd(struct controller *ctrl)
>  static void pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
>  {
>         struct pci_dev *pdev = ctrl_dev(ctrl);
> -       u16 slot_status;
>         u16 slot_ctrl;
>
>         mutex_lock(&ctrl->ctrl_lock);
> @@ -182,30 +183,6 @@ static void pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
>         /* Wait for any previous command that might still be in progress */
>         pcie_wait_cmd(ctrl);
>
> -       pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
> -       if (slot_status & PCI_EXP_SLTSTA_CC) {
> -               pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> -                                          PCI_EXP_SLTSTA_CC);
> -               if (!ctrl->no_cmd_complete) {
> -                       /*
> -                        * After 1 sec and CMD_COMPLETED still not set, just
> -                        * proceed forward to issue the next command according
> -                        * to spec. Just print out the error message.
> -                        */
> -                       ctrl_dbg(ctrl, "CMD_COMPLETED not clear after 1 sec\n");
> -               } else if (!NO_CMD_CMPL(ctrl)) {
> -                       /*
> -                        * This controller seems to notify of command completed
> -                        * event even though it supports none of power
> -                        * controller, attention led, power led and EMI.
> -                        */
> -                       ctrl_dbg(ctrl, "Unexpected CMD_COMPLETED. Need to wait for command completed event\n");
> -                       ctrl->no_cmd_complete = 0;
> -               } else {
> -                       ctrl_dbg(ctrl, "Unexpected CMD_COMPLETED. Maybe the controller is broken\n");
> -               }
> -       }
> -
>         pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
>         slot_ctrl &= ~mask;
>         slot_ctrl |= (cmd & mask);
> @@ -785,14 +762,14 @@ struct controller *pcie_init(struct pcie_device *dev)
>         mutex_init(&ctrl->ctrl_lock);
>         init_waitqueue_head(&ctrl->queue);
>         dbg_ctrl(ctrl);
> +
>         /*
>          * Controller doesn't notify of command completion if the "No
> -        * Command Completed Support" bit is set in Slot Capability
> -        * register or the controller supports none of power
> -        * controller, attention led, power led and EMI.
> +        * Command Completed Support" bit is set in Slot Capabilities.
> +        * If set, it means the controller can accept hotplug commands
> +        * with no delay between them.
>          */
> -       if (NO_CMD_CMPL(ctrl) ||
> -           !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
> +       if (NO_CMD_CMPL(ctrl))
>                 ctrl->no_cmd_complete = 1;
>
>         /* Check if Data Link Layer Link Active Reporting is implemented */
>

* Re: [PATCH RFC 3/4] PCI: pciehp: Compute timeout from hotplug command start time
  2014-06-17  0:13                                             ` Bjorn Helgaas
@ 2014-06-17 17:33                                               ` Yinghai Lu
  0 siblings, 0 replies; 71+ messages in thread
From: Yinghai Lu @ 2014-06-17 17:33 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Jan C. Nordholz, Kenji Kaneshige, Rajat Jain, linux-pci

On Mon, Jun 16, 2014 at 5:13 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Sat, Jun 14, 2014 at 8:18 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Sat, Jun 14, 2014 at 2:21 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>>> index f44fdb5b0b08..18ac24d84f9b 100644
>>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>>> @@ -142,11 +143,24 @@ static void pcie_wait_cmd(struct controller *ctrl)
>>>         if (ctrl->no_cmd_complete)
>>>                 return;
>>>
>>> +       if (!ctrl->cmd_busy)
>>> +               return;
>>
>> Can you move those two lines to second patch ?
>
> I moved them to the first patch ("Make pcie_wait_cmd()
> self-contained") since that is more concerned with cleaning up
> pcie_wait_cmd().  It could even be a separate patch; it would have
> made sense to check cmd_busy before calling pcie_wait_cmd().  But it
> seemed like overkill to make another patch just for that.

Agreed.

* Re: [PATCH RFC 0/4] PCI: pciehp: Fix Command Completion handling
  2014-06-14 21:21                                       ` [PATCH RFC 0/4] PCI: pciehp: Fix Command Completion handling Bjorn Helgaas
                                                           ` (4 preceding siblings ...)
  2014-06-16  1:26                                         ` [PATCH RFC 0/4] PCI: pciehp: Fix Command Completion handling Rajat Jain
@ 2014-08-15 22:05                                         ` Yinghai Lu
  2014-08-15 23:35                                           ` Bjorn Helgaas
  5 siblings, 1 reply; 71+ messages in thread
From: Yinghai Lu @ 2014-08-15 22:05 UTC (permalink / raw)
  To: Bjorn Helgaas, Kenji Kaneshige; +Cc: Jan C. Nordholz, Rajat Jain, linux-pci

[-- Attachment #1: Type: text/plain, Size: 4394 bytes --]

On Sat, Jun 14, 2014 at 2:21 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Yinghai has been working on pciehp timeouts related to a hardware
> erratum in Intel, AMD, and Nvidia hotplug controllers.  This affects
> the way we wait for command completion on those controllers.
>
> I had some suggestions about how to change pciehp to make this work
> better in general, without having to check for specific vendors.  We
> need something that works well on hardware that conforms to the spec,
> as well as the stuff that doesn't.
>
> I haven't heard anything for a while, so I wrote up these patches to
> make my proposals concrete.  Unfortunately, I can't easily test any of
> this, so I'm posting these for comment and possible testing if anybody
> is ambitious.
>
> The Intel erratum is CF118, described here:
> http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
> ---
>
> Bjorn Helgaas (4):
>       PCI: pciehp: Make pcie_wait_cmd() self-contained
>       PCI: pciehp: Wait for hotplug command completion lazily
>       PCI: pciehp: Compute timeout from hotplug command start time
>       PCI: pciehp: Remove assumptions about which commands cause completion events
>
>
>  drivers/pci/hotplug/pciehp.h     |    2 +
>  drivers/pci/hotplug/pciehp_hpc.c |   91 +++++++++++++++++---------------------
>  2 files changed, 42 insertions(+), 51 deletions(-)

Looks like we missed something. With the latest kernel I still see the
1s delay per slot.

After adding more debug printout patches, I got the following:

[   67.476898] calling  pcied_init+0x0/0x74 @ 1
[   67.477114] pciehp 0000:00:02.0:pcie04: Hotplug Controller:
[   67.477115] pciehp 0000:00:02.0:pcie04:   Seg/Bus/Dev/Func/IRQ : 0000:00:02.0 IRQ 58
[   67.477117] pciehp 0000:00:02.0:pcie04:   Vendor ID            : 0x8086
[   67.477118] pciehp 0000:00:02.0:pcie04:   Device ID            : 0x2f04
[   67.477119] pciehp 0000:00:02.0:pcie04:   Subsystem ID         : 0x0000
[   67.477120] pciehp 0000:00:02.0:pcie04:   Subsystem Vendor ID  : 0x8086
[   67.477121] pciehp 0000:00:02.0:pcie04:   PCIe Cap offset      : 0x90
[   67.477124] pciehp 0000:00:02.0:pcie04:   PCI resource [13]     : [io  0x5000-0x5fff]
[   67.477125] pciehp 0000:00:02.0:pcie04:   PCI resource [14]     : [mem 0x98000000-0x9bffffff]
[   67.477127] pciehp 0000:00:02.0:pcie04:   PCI resource [15]     : [mem 0x381800000000-0x381bffffffff 64bit pref]
[   67.477128] pciehp 0000:00:02.0:pcie04: Slot Capabilities      : 0x00088cdb
[   67.477129] pciehp 0000:00:02.0:pcie04:   Physical Slot Number : 1
[   67.477130] pciehp 0000:00:02.0:pcie04:   Attention Button     : yes
[   67.477131] pciehp 0000:00:02.0:pcie04:   Power Controller     : yes
[   67.477132] pciehp 0000:00:02.0:pcie04:   MRL Sensor           :  no
[   67.477132] pciehp 0000:00:02.0:pcie04:   Attention Indicator  : yes
[   67.477133] pciehp 0000:00:02.0:pcie04:   Power Indicator      : yes
[   67.477134] pciehp 0000:00:02.0:pcie04:   Hot-Plug Surprise    :  no
[   67.477135] pciehp 0000:00:02.0:pcie04:   EMI Present          :  no
[   67.477136] pciehp 0000:00:02.0:pcie04:   Command Completed    : yes
[   67.477137] pciehp 0000:00:02.0:pcie04: Slot Status            : 0x0010
[   67.477138] pciehp 0000:00:02.0:pcie04: Slot Control           : 0x07cb
[   67.477140] pciehp 0000:00:02.0:pcie04: Link Active Reporting supported
[   67.477144] pciehp 0000:00:02.0:pcie04: pcie_disable_notification: SLOTCTRL a8 write cmd 0
[   67.477145] pciehp 0000:00:02.0:pcie04: Slot #1 AttnBtn+ AttnInd+ PwrInd+ PwrCtrl+ MRL- Interlock- NoCompl- LLActRep+
[   67.479926] pciehp 0000:00:02.0:pcie04: Registering domain:bus:dev=0000:01:00 sun=1
[   67.479975] pci_bus 0000:01: dev 00, created physical slot 1
[   67.480041] pci_hotplug: __pci_hp_register: Added slot 1 to the list
[   69.078753] pciehp 0000:00:02.0:pcie04: Timeout on hotplug command 0x000007c0 (issued 1604 msec ago)
[   69.078758] pciehp 0000:00:02.0:pcie04: pcie_enable_notification: SLOTCTRL a8 write cmd 1031
[   69.078763] pciehp 0000:00:02.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 17f1
[   69.078765] pciehp 0000:00:02.0:pcie04: service driver pciehp loaded

So the log shows pcie_disable_notification() followed by
pcie_enable_notification().

pcie_enable_notification() will wait 1s.

I wonder if we can just remove the pcie_disable_notification() call from
pciehp_hpc.c::pcie_init() altogether.
Thanks

Yinghai

[-- Attachment #2: correct_timeout_print.patch --]
[-- Type: text/x-patch, Size: 1182 bytes --]

Subject: [PATCH] pci: get exact timeout in pciehp

1. pcie_poll_cmd() takes msecs, not jiffies.
2. The debug printout should measure elapsed time up to the current
   jiffies, not the value sampled before wait_event_timeout().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/hotplug/pciehp_hpc.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
@@ -160,7 +160,7 @@ static void pcie_wait_cmd(struct control
 	    ctrl->slot_ctrl & PCI_EXP_SLTCTL_CCIE)
 		rc = wait_event_timeout(ctrl->queue, !ctrl->cmd_busy, timeout);
 	else
-		rc = pcie_poll_cmd(ctrl, timeout);
+		rc = pcie_poll_cmd(ctrl, jiffies_to_msecs(timeout));
 
 	/*
 	 * Controllers with errata like Intel CF118 don't generate
@@ -173,7 +173,7 @@ static void pcie_wait_cmd(struct control
 	if (!rc)
 		ctrl_info(ctrl, "Timeout on hotplug command %#010x (issued %u msec ago)\n",
 			  ctrl->slot_ctrl,
-			  jiffies_to_msecs(now - ctrl->cmd_started));
+			  jiffies_to_msecs(jiffies - ctrl->cmd_started));
 }
 
 /**

[-- Attachment #3: pciehp_debug_again.patch --]
[-- Type: text/x-patch, Size: 2109 bytes --]

Subject: [PATCH] pci: more debug printout for pcie_write_cmd for pciehp

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/hotplug/pciehp_hpc.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
@@ -422,9 +422,9 @@ void pciehp_set_attention_status(struct
 	default:
 		return;
 	}
+	pcie_write_cmd(ctrl, slot_cmd, PCI_EXP_SLTCTL_AIC);
 	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
 		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd);
-	pcie_write_cmd(ctrl, slot_cmd, PCI_EXP_SLTCTL_AIC);
 }
 
 void pciehp_green_led_on(struct slot *slot)
@@ -602,6 +602,8 @@ void pcie_enable_notification(struct con
 		PCI_EXP_SLTCTL_DLLSCE);
 
 	pcie_write_cmd(ctrl, cmd, mask);
+	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
+		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, cmd);
 }
 
 static void pcie_disable_notification(struct controller *ctrl)
@@ -613,6 +615,8 @@ static void pcie_disable_notification(st
 		PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE |
 		PCI_EXP_SLTCTL_DLLSCE);
 	pcie_write_cmd(ctrl, 0, mask);
+	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
+		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
 }
 
 /*
@@ -640,6 +644,8 @@ int pciehp_reset_slot(struct slot *slot,
 	stat_mask |= PCI_EXP_SLTSTA_DLLSC;
 
 	pcie_write_cmd(ctrl, 0, ctrl_mask);
+	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
+		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
 	if (pciehp_poll_mode)
 		del_timer_sync(&ctrl->poll_timer);
 
@@ -647,6 +653,8 @@ int pciehp_reset_slot(struct slot *slot,
 
 	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, stat_mask);
 	pcie_write_cmd(ctrl, ctrl_mask, ctrl_mask);
+	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
+		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
 	if (pciehp_poll_mode)
 		int_poll_timeout(ctrl->poll_timer.data);
 

* Re: [PATCH RFC 0/4] PCI: pciehp: Fix Command Completion handling
  2014-08-15 22:05                                         ` Yinghai Lu
@ 2014-08-15 23:35                                           ` Bjorn Helgaas
  0 siblings, 0 replies; 71+ messages in thread
From: Bjorn Helgaas @ 2014-08-15 23:35 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Kenji Kaneshige, Jan C. Nordholz, Rajat Jain, linux-pci

On Fri, Aug 15, 2014 at 03:05:39PM -0700, Yinghai Lu wrote:
> On Sat, Jun 14, 2014 at 2:21 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > Yinghai has been working on pciehp timeouts related to a hardware
> > erratum in Intel, AMD, and Nvidia hotplug controllers.  This affects
> > the way we wait for command completion on those controllers.
> >
> > I had some suggestions about how to change pciehp to make this work
> > better in general, without having to check for specific vendors.  We
> > need something that works well on hardware that conforms to the spec,
> > as well as the stuff that doesn't.
> >
> > I haven't heard anything for a while, so I wrote up these patches to
> > make my proposals concrete.  Unfortunately, I can't easily test any of
> > this, so I'm posting these for comment and possible testing if anybody
> > is ambitious.
> >
> > The Intel erratum is CF118, described here:
> > http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
> > ---
> >
> > Bjorn Helgaas (4):
> >       PCI: pciehp: Make pcie_wait_cmd() self-contained
> >       PCI: pciehp: Wait for hotplug command completion lazily
> >       PCI: pciehp: Compute timeout from hotplug command start time
> >       PCI: pciehp: Remove assumptions about which commands cause completion events
> >
> >
> >  drivers/pci/hotplug/pciehp.h     |    2 +
> >  drivers/pci/hotplug/pciehp_hpc.c |   91 +++++++++++++++++---------------------
> >  2 files changed, 42 insertions(+), 51 deletions(-)
> 
> Looks like we missed something. With last kernel I still saw the 1s
> delay per slot.
> 
> After adding more debug printout patches, I got following:
> 
> [   67.476898] calling  pcied_init+0x0/0x74 @ 1
> [   67.477114] pciehp 0000:00:02.0:pcie04: Hotplug Controller:
> [   67.477115] pciehp 0000:00:02.0:pcie04:   Seg/Bus/Dev/Func/IRQ : 0000:00:02.0 IRQ 58
> [   67.477117] pciehp 0000:00:02.0:pcie04:   Vendor ID            : 0x8086
> [   67.477118] pciehp 0000:00:02.0:pcie04:   Device ID            : 0x2f04
> [   67.477119] pciehp 0000:00:02.0:pcie04:   Subsystem ID         : 0x0000
> [   67.477120] pciehp 0000:00:02.0:pcie04:   Subsystem Vendor ID  : 0x8086
> [   67.477121] pciehp 0000:00:02.0:pcie04:   PCIe Cap offset      : 0x90
> [   67.477124] pciehp 0000:00:02.0:pcie04:   PCI resource [13]     : [io  0x5000-0x5fff]
> [   67.477125] pciehp 0000:00:02.0:pcie04:   PCI resource [14]     : [mem 0x98000000-0x9bffffff]
> [   67.477127] pciehp 0000:00:02.0:pcie04:   PCI resource [15]     : [mem 0x381800000000-0x381bffffffff 64bit pref]
> [   67.477128] pciehp 0000:00:02.0:pcie04: Slot Capabilities      : 0x00088cdb
> [   67.477129] pciehp 0000:00:02.0:pcie04:   Physical Slot Number : 1
> [   67.477130] pciehp 0000:00:02.0:pcie04:   Attention Button     : yes
> [   67.477131] pciehp 0000:00:02.0:pcie04:   Power Controller     : yes
> [   67.477132] pciehp 0000:00:02.0:pcie04:   MRL Sensor           :  no
> [   67.477132] pciehp 0000:00:02.0:pcie04:   Attention Indicator  : yes
> [   67.477133] pciehp 0000:00:02.0:pcie04:   Power Indicator      : yes
> [   67.477134] pciehp 0000:00:02.0:pcie04:   Hot-Plug Surprise    :  no
> [   67.477135] pciehp 0000:00:02.0:pcie04:   EMI Present          :  no
> [   67.477136] pciehp 0000:00:02.0:pcie04:   Command Completed    : yes
> [   67.477137] pciehp 0000:00:02.0:pcie04: Slot Status            : 0x0010
> [   67.477138] pciehp 0000:00:02.0:pcie04: Slot Control           : 0x07cb
> [   67.477140] pciehp 0000:00:02.0:pcie04: Link Active Reporting supported
> [   67.477144] pciehp 0000:00:02.0:pcie04: pcie_disable_notification: SLOTCTRL a8 write cmd 0
> [   67.477145] pciehp 0000:00:02.0:pcie04: Slot #1 AttnBtn+ AttnInd+ PwrInd+ PwrCtrl+ MRL- Interlock- NoCompl- LLActRep+
> [   67.479926] pciehp 0000:00:02.0:pcie04: Registering domain:bus:dev=0000:01:00 sun=1
> [   67.479975] pci_bus 0000:01: dev 00, created physical slot 1
> [   67.480041] pci_hotplug: __pci_hp_register: Added slot 1 to the list
> [   69.078753] pciehp 0000:00:02.0:pcie04: Timeout on hotplug command 0x000007c0 (issued 1604 msec ago)
> [   69.078758] pciehp 0000:00:02.0:pcie04: pcie_enable_notification: SLOTCTRL a8 write cmd 1031
> [   69.078763] pciehp 0000:00:02.0:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 17f1
> [   69.078765] pciehp 0000:00:02.0:pcie04: service driver pciehp loaded
> 
> so there are pcie_disable_notification and pcie_enable_notification.
> 
> pcie_enable_notification will wait 1s.
> 
> wonder if we can just remove pcie_disable_notification calling from
> pciehp_hpc.c::pcie_init()  at all.

Yes, I agree.  I think it looks safe to drop the
pcie_disable_notification() call from pcie_init().

Bjorn

* Re: [PATCH RFC 2/4] PCI: pciehp: Wait for hotplug command completion lazily
  2014-06-14 21:21                                         ` [PATCH RFC 2/4] PCI: pciehp: Wait for hotplug command completion lazily Bjorn Helgaas
@ 2015-05-29 22:45                                           ` Alex Williamson
  2015-06-01 21:43                                             ` Bjorn Helgaas
  0 siblings, 1 reply; 71+ messages in thread
From: Alex Williamson @ 2015-05-29 22:45 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yinghai Lu, Jan C. Nordholz, Kenji Kaneshige, Rajat Jain,
	linux-pci, Myron Stowe


Sorry, I'm digging up an old patch here and this RFC was the only copy
I could find in my mailbox.

On Sat, 2014-06-14 at 15:21 -0600, Bjorn Helgaas wrote:
> Previously we issued a hotplug command and waited for it to complete.  But
> there's no need to wait until we're ready to issue the *next* command.  The
> next command will probably be much later, so the first one may have already
> completed and we may not have to actually wait at all.

I'm seeing a regression as a result of this patch.  Consider the
following function:

int pciehp_reset_slot(struct slot *slot, int probe)
{
	...
	pcie_write_cmd(ctrl, 0, ctrl_mask);
	...
	pci_reset_bridge_secondary_bus(ctrl->pcie->port);

Our command write is clearing bits that control whether the slot has
presence detection and link layer state change enabled.  These are the
things that we're trying to clear to prevent a secondary bus reset from
turning into a surprise hotplug.  If we don't wait for the command to
complete, what state are we in when we initiate the secondary bus reset?
On my system, it's as if we never issued the write command at all: the
previous behavior, where the hotplug controller detects the link down as
a surprise hotplug, returns.  So I'm afraid we can't defer the wait until
we're ready to issue the *next* command, because waiting is our only
indication that the current command has completed.

I thought maybe we just need a separate flush command for cases like
this, but then I started looking at the cases where we use this
function:

pciehp_set_attention_status
pciehp_green_led_on
pciehp_green_led_off
pciehp_green_led_blink
pciehp_power_on_slot
pciehp_power_off_slot
pcie_enable_notification
pcie_disable_notification

The slot power ones concern me in the same way as the reset issue.  In
the 'on' case, we immediately call link enable, so the slot had better
have acknowledged the power on command.  In the 'off' case, it seems
pretty bad to complete the 'off' function without knowing whether the
slot is actually off yet.  For notifications, I can imagine that the
caller of those functions is going to want on or off at the completion
of those functions, not some in-between state.  Even in the case of the
LED functions, where do we want the error to occur: at the time the
command is issued, or some time in the future when the next unrelated
command comes through?

So, rather than posting a new patch adding a flush everywhere that it
seems a little scary, should we consider scrapping this patch altogether
as unsafe?  We need to wait, not only to issue the next command, but to
have any sort of acknowledgment that the current command has completed.

> 
> Because of hardware errata, some controllers generate command completion
> events for some commands but not others.  In the case of Intel CF118 (see
> spec update reference), the controller indicates command completion only
> for Slot Control writes that change the value of the following bits:
> 
>   Power Controller Control
>   Power Indicator Control
>   Attention Indicator Control
>   Electromechanical Interlock Control
> 
> Changes to other bits, e.g., the interrupt enable bits, do not cause the
> Command Completed bit to be set.  Controllers from AMD and Nvidia are
> reported to have similar errata.
> 
> These errata cause timeouts when pcie_enable_notification() enables
> interrupts.  Previously that timeout occurred at boot-time.  With this
> change, the timeout occurs later, when we change the state of the slot
> power, indicators, or interlock.  This speeds up boot but causes a timeout
> at the first hotplug event on the slot.  Subsequent events don't time out
> because only the first (boot-time) hotplug command updates Slot Control
> without touching the power/indicator/interlock controls.

It sounds like we need some sort of completion mask to handle devices
like this, instead of effectively removing the write-barrier for the
general case.  Thanks,

Alex

> 
> Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/hotplug/pciehp_hpc.c |    7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 0e76e9d9d134..f44fdb5b0b08 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -165,6 +165,9 @@ static void pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
>  
>  	mutex_lock(&ctrl->ctrl_lock);
>  
> +	/* Wait for any previous command that might still be in progress */
> +	pcie_wait_cmd(ctrl);
> +
>  	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
>  	if (slot_status & PCI_EXP_SLTSTA_CC) {
>  		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> @@ -197,10 +200,6 @@ static void pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
>  	pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
>  	ctrl->slot_ctrl = slot_ctrl;
>  
> -	/*
> -	 * Wait for command completion.
> -	 */
> -	pcie_wait_cmd(ctrl);
>  	mutex_unlock(&ctrl->ctrl_lock);
>  }
>  
> 

* Re: [PATCH RFC 2/4] PCI: pciehp: Wait for hotplug command completion lazily
  2015-05-29 22:45                                           ` Alex Williamson
@ 2015-06-01 21:43                                             ` Bjorn Helgaas
  2015-06-01 22:02                                               ` Alex Williamson
  0 siblings, 1 reply; 71+ messages in thread
From: Bjorn Helgaas @ 2015-06-01 21:43 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yinghai Lu, Jan C. Nordholz, Kenji Kaneshige, Rajat Jain,
	linux-pci, Myron Stowe

On Fri, May 29, 2015 at 04:45:34PM -0600, Alex Williamson wrote:
> 
> Sorry, I'm digging up an old patch here and this RFC was the only copy
> I could find in my mailbox.
> 
> On Sat, 2014-06-14 at 15:21 -0600, Bjorn Helgaas wrote:
> > Previously we issued a hotplug command and waited for it to complete.  But
> > there's no need to wait until we're ready to issue the *next* command.  The
> > next command will probably be much later, so the first one may have already
> > completed and we may not have to actually wait at all.
> 
> I'm seeing a regression as a result of this patch.  Consider the
> following function:
> 
> int pciehp_reset_slot(struct slot *slot, int probe)
> {
> 	...
> 	pcie_write_cmd(ctrl, 0, ctrl_mask);
> 	...
> 	pci_reset_bridge_secondary_bus(ctrl->pcie->port);
> 
> Our command write is clearing bits that control whether the slot has
> presence detection and link layer state change enabled.  These are the
> things that we're trying to clear to prevent a secondary bus reset from
> turning into a surprise hotplug.  If we don't wait for the command to
> complete, what state are we in as we initiate the secondary bus reset?
> On my system, it's as if we never issued the write command at all: the
> previous behavior, where the hotplug controller detects the link down as
> a surprise hotplug, returns.  So I'm afraid we do need to wait until
> we're ready to issue the *next* command, because it's our only
> indication that the current command has completed.

I think that's a bug.  If we do something that depends on a command
completion, we'll have to wait for it.

> pciehp_set_attention_status
> pciehp_green_led_on
> pciehp_green_led_off
> pciehp_green_led_blink
> pciehp_power_on_slot
> pciehp_power_off_slot
> pcie_enable_notification
> pcie_disable_notification
> 
> The slot power ones concern me the same as the reset issue.  In the 'on'
> case, we immediately call link enable, so the slot better have
> acknowledged the power on command.  In the 'off' case, it seems pretty
> bad to complete the 'off' function, but we don't know if the slot is
> actually off yet.  

I agree.

> For notifications, I can imagine that the caller of
> those functions is going to want on or off at the completion of those
> functions, not some in-between state.  

These control interrupt generation by the hotplug controller in the switch.
When disabling interrupt generation, I think we should wait for completion
before we call free_irq(); waiting should prevent spurious interrupts.
When enabling interrupt generation, we've already hooked up the ISR before
we enable interrupt generation, so I don't see a need to wait.

> Even in the case of the LED
> functions, where do we want the error to occur: at the time the command
> is issued, or maybe some time in the future when the next unrelated
> command comes through?

The only error here is the "Timeout on hotplug command" message, and we
don't do anything other than print the message, so I'm not too worried
about this one.

> So, rather than posting a new patch adding a flush everywhere that it
> seems a little scary, should we consider scrapping this patch altogether
> as unsafe?  We need to wait, not only to issue the next command, but to
> have any sort of acknowledgment that the current command has completed.

I definitely screwed up by assuming that completion was only useful for
pacing writes to the command register.  I think lazy checking is fine for
command pacing, but we certainly need to know about completions for other
things, too.

If we added calls to pcie_wait_cmd() in these places:

  pciehp_power_on_slot
  pciehp_power_off_slot
  pcie_disable_notification
  pciehp_reset_slot

do you think that would be enough?

> > Because of hardware errata, some controllers generate command completion
> > events for some commands but not others.  In the case of Intel CF118 (see
> > spec update reference), the controller indicates command completion only
> > for Slot Control writes that change the value of the following bits:
> > 
> >   Power Controller Control
> >   Power Indicator Control
> >   Attention Indicator Control
> >   Electromechanical Interlock Control
> > 
> > Changes to other bits, e.g., the interrupt enable bits, do not cause the
> > Command Completed bit to be set.  Controllers from AMD and Nvidia are
> > reported to have similar errata.
> > 
> > These errata cause timeouts when pcie_enable_notification() enables
> > interrupts.  Previously that timeout occurred at boot-time.  With this
> > change, the timeout occurs later, when we change the state of the slot
> > power, indicators, or interlock.  This speeds up boot but causes a timeout
> > at the first hotplug event on the slot.  Subsequent events don't time out
> > because only the first (boot-time) hotplug command updates Slot Control
> > without touching the power/indicator/interlock controls.
> 
> It sounds like we need some sort of completion mask to handle devices
> like this, instead of effectively removing the write-barrier for the
> general case.  Thanks,

You mean some quirk-based solution, where we know certain devices don't
indicate command completion for certain events?  That seems hard because I
think there are many devices that have this erratum.  It was difficult to
extract this out of Intel, and I'm pretty sure other vendors have the same
issue.

Bjorn


* Re: [PATCH RFC 2/4] PCI: pciehp: Wait for hotplug command completion lazily
  2015-06-01 21:43                                             ` Bjorn Helgaas
@ 2015-06-01 22:02                                               ` Alex Williamson
  2015-06-01 22:12                                                 ` Bjorn Helgaas
  0 siblings, 1 reply; 71+ messages in thread
From: Alex Williamson @ 2015-06-01 22:02 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yinghai Lu, Jan C. Nordholz, Kenji Kaneshige, Rajat Jain,
	linux-pci, Myron Stowe

On Mon, 2015-06-01 at 16:43 -0500, Bjorn Helgaas wrote:
> On Fri, May 29, 2015 at 04:45:34PM -0600, Alex Williamson wrote:
> > 
> > Sorry, I'm digging up an old patch here and this RFC was the only copy
> > I could find in my mailbox.
> > 
> > On Sat, 2014-06-14 at 15:21 -0600, Bjorn Helgaas wrote:
> > > Previously we issued a hotplug command and waited for it to complete.  But
> > > there's no need to wait until we're ready to issue the *next* command.  The
> > > next command will probably be much later, so the first one may have already
> > > completed and we may not have to actually wait at all.
> > 
> > I'm seeing a regression as a result of this patch.  Consider the
> > following function:
> > 
> > int pciehp_reset_slot(struct slot *slot, int probe)
> > {
> > 	...
> > 	pcie_write_cmd(ctrl, 0, ctrl_mask);
> > 	...
> > 	pci_reset_bridge_secondary_bus(ctrl->pcie->port);
> > 
> > Our command write is clearing bits that control whether the slot has
> > presence detection and link layer state change enabled.  These are the
> > things that we're trying to clear to prevent a secondary bus reset from
> > turning into a surprise hotplug.  If we don't wait for the command to
> > complete, what state are we in as we initiate the secondary bus reset?
> > On my system, it's as if we never issued the write command at all: the
> > previous behavior, where the hotplug controller detects the link down as
> > a surprise hotplug, returns.  So I'm afraid we do need to wait until
> > we're ready to issue the *next* command, because it's our only
> > indication that the current command has completed.
> 
> I think that's a bug.  If we do something that depends on a command
> completion, we'll have to wait for it.
> 
> > pciehp_set_attention_status
> > pciehp_green_led_on
> > pciehp_green_led_off
> > pciehp_green_led_blink
> > pciehp_power_on_slot
> > pciehp_power_off_slot
> > pcie_enable_notification
> > pcie_disable_notification
> > 
> > The slot power ones concern me the same as the reset issue.  In the 'on'
> > case, we immediately call link enable, so the slot better have
> > acknowledged the power on command.  In the 'off' case, it seems pretty
> > bad to complete the 'off' function, but we don't know if the slot is
> > actually off yet.  
> 
> I agree.
> 
> > For notifications, I can imagine that the caller of
> > those functions is going to want on or off at the completion of those
> > functions, not some in-between state.  
> 
> These control interrupt generation by the hotplug controller in the switch.
> When disabling interrupt generation, I think we should wait for completion
> before we call free_irq(); waiting should prevent spurious interrupts.
> When enabling interrupt generation, we've already hooked up the ISR before
> we enable interrupt generation, so I don't see a need to wait.
> 
> > Even in the case of the LED
> > functions, where do we want the error to occur: at the time the command
> > is issued, or maybe some time in the future when the next unrelated
> > command comes through?
> 
> The only error here is the "Timeout on hotplug command" message, and we
> don't do anything other than print the message, so I'm not too worried
> about this one.
> 
> > So, rather than posting a new patch adding a flush everywhere that it
> > seems a little scary, should we consider scrapping this patch altogether
> > as unsafe?  We need to wait, not only to issue the next command, but to
> > have any sort of acknowledgment that the current command has completed.
> 
> I definitely screwed up by assuming that completion was only useful for
> pacing writes to the command register.  I think lazy checking is fine for
> command pacing, but we certainly need to know about completions for other
> things, too.
> 
> If we added calls to pcie_wait_cmd() in these places:
> 
>   pciehp_power_on_slot
>   pciehp_power_off_slot
>   pcie_disable_notification
>   pciehp_reset_slot
> 
> do you think that would be enough?

I think that would solve the problem, but the API becomes very difficult
to use correctly if the programmer just needs to know that a
pcie_wait_cmd() is necessary following a pcie_write_cmd() if you really
want to be sure it's synchronous.  pcie_write_cmd() should probably
incorporate the "safe" behavior and some new _nowait version should
handle the special cases where we don't need to wait.
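
One possible shape for that split, sketched as a user-space model
rather than a patch.  The names sim_ctrl, sim_do_write_cmd,
sim_write_cmd and sim_write_cmd_nowait are hypothetical, and
sim_wait_cmd() only stands in for pcie_wait_cmd(); real code would poll
or sleep on the Command Completed bit instead of flipping a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sim_ctrl {
	uint16_t slot_ctrl;	/* last value written to the modelled Slot Control */
	bool cmd_busy;		/* a command was posted and not yet waited on */
	unsigned int waits;	/* number of times we actually had to wait */
};

/* Stand-in for pcie_wait_cmd(): a no-op unless a command is in flight. */
static void sim_wait_cmd(struct sim_ctrl *c)
{
	if (!c->cmd_busy)
		return;
	c->waits++;
	c->cmd_busy = false;
}

/* Common helper: pace against the previous command, write, optionally wait. */
static void sim_do_write_cmd(struct sim_ctrl *c, uint16_t cmd, uint16_t mask,
			     bool wait)
{
	assert((cmd & ~mask) == 0);	/* cmd may only set bits covered by mask */
	sim_wait_cmd(c);
	c->slot_ctrl = (uint16_t)((c->slot_ctrl & ~mask) | cmd);
	c->cmd_busy = true;
	if (wait)
		sim_wait_cmd(c);
}

/* Safe default: the command has completed by the time this returns. */
static void sim_write_cmd(struct sim_ctrl *c, uint16_t cmd, uint16_t mask)
{
	sim_do_write_cmd(c, cmd, mask, true);
}

/* Opt-in variant for callers that do not depend on completion. */
static void sim_write_cmd_nowait(struct sim_ctrl *c, uint16_t cmd, uint16_t mask)
{
	sim_do_write_cmd(c, cmd, mask, false);
}
```

The point of the layout is that forgetting to think about completion
gives the synchronous behavior; a caller has to pick the _nowait name
explicitly to get the lazy one.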

> > > Because of hardware errata, some controllers generate command completion
> > > events for some commands but not others.  In the case of Intel CF118 (see
> > > spec update reference), the controller indicates command completion only
> > > for Slot Control writes that change the value of the following bits:
> > > 
> > >   Power Controller Control
> > >   Power Indicator Control
> > >   Attention Indicator Control
> > >   Electromechanical Interlock Control
> > > 
> > > Changes to other bits, e.g., the interrupt enable bits, do not cause the
> > > Command Completed bit to be set.  Controllers from AMD and Nvidia are
> > > reported to have similar errata.
> > > 
> > > These errata cause timeouts when pcie_enable_notification() enables
> > > interrupts.  Previously that timeout occurred at boot-time.  With this
> > > change, the timeout occurs later, when we change the state of the slot
> > > power, indicators, or interlock.  This speeds up boot but causes a timeout
> > > at the first hotplug event on the slot.  Subsequent events don't time out
> > > because only the first (boot-time) hotplug command updates Slot Control
> > > without touching the power/indicator/interlock controls.
> > 
> > It sounds like we need some sort of completion mask to handle devices
> > like this, instead of effectively removing the write-barrier for the
> > general case.  Thanks,
> 
> You mean some quirk-based solution, where we know certain devices don't
> indicate command completion for certain events?  That seems hard because I
> think there are many devices that have this erratum.  It was difficult to
> extract this out of Intel, and I'm pretty sure other vendors have the same
> issue.

Yeah, I was thinking of a bitmap where we could quirk non-completion of
devices, but maybe it's not worth it.  Thanks,

Alex
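
A rough sketch of what such a bitmap could look like, again as a
user-space model.  struct sim_quirk and sim_write_completes() are
hypothetical, and the SIM_SLTCTL_* values are assumed to mirror the
Slot Control bit layout; they should be checked against pci_regs.h
before being relied on.  The CF118 mask lists the four fields named in
the erratum description quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed Slot Control bit layout (verify against pci_regs.h). */
#define SIM_SLTCTL_HPIE 0x0020u	/* hot-plug interrupt enable */
#define SIM_SLTCTL_AIC  0x00c0u	/* attention indicator control */
#define SIM_SLTCTL_PIC  0x0300u	/* power indicator control */
#define SIM_SLTCTL_PCC  0x0400u	/* power controller control */
#define SIM_SLTCTL_EIC  0x0800u	/* electromechanical interlock control */

/* Fields whose change produces a completion on a CF118-style controller. */
#define SIM_CC_MASK_CF118 \
	(SIM_SLTCTL_AIC | SIM_SLTCTL_PIC | SIM_SLTCTL_PCC | SIM_SLTCTL_EIC)

struct sim_quirk {
	bool partial;		/* false: every Slot Control write completes */
	uint16_t cc_mask;	/* only consulted when partial is true */
};

/*
 * Should we expect a Command Completed event for a write that takes the
 * modelled Slot Control register from old_val to new_val?
 */
static bool sim_write_completes(const struct sim_quirk *q,
				uint16_t old_val, uint16_t new_val)
{
	assert(q != NULL);
	if (!q->partial)
		return true;
	return ((old_val ^ new_val) & q->cc_mask) != 0;
}
```

A write path could use the answer to skip the wait (and the timeout
message) for writes the controller is known not to acknowledge.  It
does nothing for controllers that have the erratum but are missing from
the quirk table, which is the objection raised above.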



* Re: [PATCH RFC 2/4] PCI: pciehp: Wait for hotplug command completion lazily
  2015-06-01 22:02                                               ` Alex Williamson
@ 2015-06-01 22:12                                                 ` Bjorn Helgaas
  0 siblings, 0 replies; 71+ messages in thread
From: Bjorn Helgaas @ 2015-06-01 22:12 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yinghai Lu, Jan C. Nordholz, Kenji Kaneshige, Rajat Jain,
	linux-pci, Myron Stowe

On Mon, Jun 01, 2015 at 04:02:59PM -0600, Alex Williamson wrote:
> On Mon, 2015-06-01 at 16:43 -0500, Bjorn Helgaas wrote:
> > If we added calls to pcie_wait_cmd() in these places:
> > 
> >   pciehp_power_on_slot
> >   pciehp_power_off_slot
> >   pcie_disable_notification
> >   pciehp_reset_slot
> > 
> > do you think that would be enough?
> 
> I think that would solve the problem, but the API becomes very difficult
> to use correctly if the programmer just needs to know that a
> pcie_wait_cmd() is necessary following a pcie_write_cmd() if you really
> want to be sure it's synchronous.  pcie_write_cmd() should probably
> incorporate the "safe" behavior and some new _nowait version should
> > handle the special cases where we don't need to wait.

I don't think of pcie_write_cmd() as an API, since it's static to
pciehp_hpc.c, but I do like your idea.

Bjorn

