linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [Bug 216877] New: Regression in PCI powermanagement breaks resume after suspend
       [not found] <ba98ad17-53b0-1f76-70f7-10d37d279f9d@witt.link>
@ 2023-01-04 15:02 ` Bjorn Helgaas
  2023-01-04 15:37   ` Thomas Witt
  0 siblings, 1 reply; 6+ messages in thread
From: Bjorn Helgaas @ 2023-01-04 15:02 UTC (permalink / raw)
  To: Thomas Witt; +Cc: Vidya Sagar, Thomas Witt, linux-kernel, linux-pci

On Wed, Jan 04, 2023 at 09:44:25AM +0100, Thomas Witt wrote:
> On 04/01/2023 01:30, Bjorn Helgaas wrote:
> > On Mon, Jan 02, 2023 at 11:15:16AM -0600, Bjorn Helgaas wrote:
> > > On Mon, Jan 02, 2023 at 02:02:51PM +0000, bugzilla-daemon@kernel.org wrote:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=216877
> > > > 
> > > >              Bug ID: 216877
> > > >             Summary: Regression in PCI powermanagement breaks resume after
> > > >                      suspend
> > > >      Kernel Version: 6.0.0-rc1

BTW, if the bisect is correct, I think the regression actually is in
v6.1-rc1, where 5e85eba6f50d ("PCI/ASPM: Refactor L1 PM Substates
Control Register programming") appeared. 

> > > > Created attachment 303512
> > > >    --> https://bugzilla.kernel.org/attachment.cgi?id=303512&action=edit
> > > > output of git bisect log
> > > > 
> > > > After commit 5e85eba6f50dc288c22083a7e213152bcc4b8208 "PCI/ASPM:
> > > > Refactor L1 PM Substates Control Register programming" my Laptop
> > > > does not resume PCI devices back from suspend.
> > 
> > Thomas, could you try the debug patch below on top of v6.2-rc1?
> 
> Thank you for that patch Bjorn, but as far as I can see it does not change
> anything.

Thanks for testing it.  Maybe Vidya will have more ideas.  The patch
below (based on v6.2-rc1) would revert 5e85eba6f50d and 4ff116d0d5fd.
If 5e85eba6f50d is the culprit, it should fix the regression.  It
would also potentially break L1 substates after resume, so we'd like
to avoid reverting it if possible.

But the "Unable to change power state from D3hot to D0, device
inaccessible" symptom suggests that the device is still in D3, which
would be more like a wakeup issue than an ASPM issue. 

Your bisect log said 3e347969a577 ("PCI/PM: Reduce D3hot delay with
usleep_range()") was "good", but it would be worth double-checking,
e.g., see if reverting it from v6.2-rc1 makes any difference.

Bjorn

commit 61de2691d549 ("Revert "PCI/ASPM: Refactor L1 PM Substates Control Register programming"")
parent 1b929c02afd3
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Wed Jan 4 08:38:53 2023 -0600

    Revert "PCI/ASPM: Refactor L1 PM Substates Control Register programming"
    
    This reverts 5e85eba6f50d ("PCI/ASPM: Refactor L1 PM Substates Control
    Register programming") and 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates
    Capability for suspend/resume").

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index fba95486caaf..5641786bd020 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1665,7 +1665,6 @@ int pci_save_state(struct pci_dev *dev)
 		return i;
 
 	pci_save_ltr_state(dev);
-	pci_save_aspm_l1ss_state(dev);
 	pci_save_dpc_state(dev);
 	pci_save_aer_state(dev);
 	pci_save_ptm_state(dev);
@@ -1772,7 +1771,6 @@ void pci_restore_state(struct pci_dev *dev)
 	 * LTR itself (in the PCIe capability).
 	 */
 	pci_restore_ltr_state(dev);
-	pci_restore_aspm_l1ss_state(dev);
 
 	pci_restore_pcie_state(dev);
 	pci_restore_pasid_state(dev);
@@ -3465,11 +3463,6 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
 	if (error)
 		pci_err(dev, "unable to allocate suspend buffer for LTR\n");
 
-	error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
-					    2 * sizeof(u32));
-	if (error)
-		pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
-
 	pci_allocate_vc_save_buffers(dev);
 }
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9ed3b5550043..9049d07d3aae 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -566,14 +566,10 @@ bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
 void pcie_aspm_init_link_state(struct pci_dev *pdev);
 void pcie_aspm_exit_link_state(struct pci_dev *pdev);
 void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
-void pci_save_aspm_l1ss_state(struct pci_dev *dev);
-void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
 #else
 static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
 static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
 static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
-static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
-static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
 #endif
 
 #ifdef CONFIG_PCIE_ECRC
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 53a1fa306e1e..4b4184563a92 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -470,31 +470,6 @@ static void pci_clear_and_set_dword(struct pci_dev *pdev, int pos,
 	pci_write_config_dword(pdev, pos, val);
 }
 
-static void aspm_program_l1ss(struct pci_dev *dev, u32 ctl1, u32 ctl2)
-{
-	u16 l1ss = dev->l1ss;
-	u32 l1_2_enable;
-
-	/*
-	 * Per PCIe r6.0, sec 5.5.4, T_POWER_ON in PCI_L1SS_CTL2 must be
-	 * programmed prior to setting the L1.2 enable bits in PCI_L1SS_CTL1.
-	 */
-	pci_write_config_dword(dev, l1ss + PCI_L1SS_CTL2, ctl2);
-
-	/*
-	 * In addition, Common_Mode_Restore_Time and LTR_L1.2_THRESHOLD in
-	 * PCI_L1SS_CTL1 must be programmed *before* setting the L1.2
-	 * enable bits, even though they're all in PCI_L1SS_CTL1.
-	 */
-	l1_2_enable = ctl1 & PCI_L1SS_CTL1_L1_2_MASK;
-	ctl1 &= ~PCI_L1SS_CTL1_L1_2_MASK;
-
-	pci_write_config_dword(dev, l1ss + PCI_L1SS_CTL1, ctl1);
-	if (l1_2_enable)
-		pci_write_config_dword(dev, l1ss + PCI_L1SS_CTL1,
-				       ctl1 | l1_2_enable);
-}
-
 /* Calculate L1.2 PM substate timing parameters */
 static void aspm_calc_l1ss_info(struct pcie_link_state *link,
 				u32 parent_l1ss_cap, u32 child_l1ss_cap)
@@ -504,6 +479,7 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
 	u32 t_common_mode, t_power_on, l1_2_threshold, scale, value;
 	u32 ctl1 = 0, ctl2 = 0;
 	u32 pctl1, pctl2, cctl1, cctl2;
+	u32 pl1_2_enables, cl1_2_enables;
 
 	if (!(link->aspm_support & ASPM_STATE_L1_2_MASK))
 		return;
@@ -552,21 +528,39 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
 	    ctl2 == pctl2 && ctl2 == cctl2)
 		return;
 
-	pctl1 &= ~(PCI_L1SS_CTL1_CM_RESTORE_TIME |
-		   PCI_L1SS_CTL1_LTR_L12_TH_VALUE |
-		   PCI_L1SS_CTL1_LTR_L12_TH_SCALE);
-	pctl1 |= (ctl1 & (PCI_L1SS_CTL1_CM_RESTORE_TIME |
-			  PCI_L1SS_CTL1_LTR_L12_TH_VALUE |
-			  PCI_L1SS_CTL1_LTR_L12_TH_SCALE));
-	aspm_program_l1ss(parent, pctl1, ctl2);
+	/* Disable L1.2 while updating.  See PCIe r5.0, sec 5.5.4, 7.8.3.3 */
+	pl1_2_enables = pctl1 & PCI_L1SS_CTL1_L1_2_MASK;
+	cl1_2_enables = cctl1 & PCI_L1SS_CTL1_L1_2_MASK;
 
-	cctl1 &= ~(PCI_L1SS_CTL1_CM_RESTORE_TIME |
-		   PCI_L1SS_CTL1_LTR_L12_TH_VALUE |
-		   PCI_L1SS_CTL1_LTR_L12_TH_SCALE);
-	cctl1 |= (ctl1 & (PCI_L1SS_CTL1_CM_RESTORE_TIME |
-			  PCI_L1SS_CTL1_LTR_L12_TH_VALUE |
-			  PCI_L1SS_CTL1_LTR_L12_TH_SCALE));
-	aspm_program_l1ss(child, cctl1, ctl2);
+	if (pl1_2_enables || cl1_2_enables) {
+		pci_clear_and_set_dword(child, child->l1ss + PCI_L1SS_CTL1,
+					PCI_L1SS_CTL1_L1_2_MASK, 0);
+		pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
+					PCI_L1SS_CTL1_L1_2_MASK, 0);
+	}
+
+	/* Program T_POWER_ON times in both ports */
+	pci_write_config_dword(parent, parent->l1ss + PCI_L1SS_CTL2, ctl2);
+	pci_write_config_dword(child, child->l1ss + PCI_L1SS_CTL2, ctl2);
+
+	/* Program Common_Mode_Restore_Time in upstream device */
+	pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
+				PCI_L1SS_CTL1_CM_RESTORE_TIME, ctl1);
+
+	/* Program LTR_L1.2_THRESHOLD time in both ports */
+	pci_clear_and_set_dword(parent,	parent->l1ss + PCI_L1SS_CTL1,
+				PCI_L1SS_CTL1_LTR_L12_TH_VALUE |
+				PCI_L1SS_CTL1_LTR_L12_TH_SCALE, ctl1);
+	pci_clear_and_set_dword(child, child->l1ss + PCI_L1SS_CTL1,
+				PCI_L1SS_CTL1_LTR_L12_TH_VALUE |
+				PCI_L1SS_CTL1_LTR_L12_TH_SCALE, ctl1);
+
+	if (pl1_2_enables || cl1_2_enables) {
+		pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1, 0,
+					pl1_2_enables);
+		pci_clear_and_set_dword(child, child->l1ss + PCI_L1SS_CTL1, 0,
+					cl1_2_enables);
+	}
 }
 
 static void aspm_l1ss_init(struct pcie_link_state *link)
@@ -757,43 +751,6 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
 				PCI_L1SS_CTL1_L1SS_MASK, val);
 }
 
-void pci_save_aspm_l1ss_state(struct pci_dev *dev)
-{
-	struct pci_cap_saved_state *save_state;
-	u16 l1ss = dev->l1ss;
-	u32 *cap;
-
-	if (!l1ss)
-		return;
-
-	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
-	if (!save_state)
-		return;
-
-	cap = (u32 *)&save_state->cap.data[0];
-	pci_read_config_dword(dev, l1ss + PCI_L1SS_CTL2, cap++);
-	pci_read_config_dword(dev, l1ss + PCI_L1SS_CTL1, cap++);
-}
-
-void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
-{
-	struct pci_cap_saved_state *save_state;
-	u32 *cap, ctl1, ctl2;
-	u16 l1ss = dev->l1ss;
-
-	if (!l1ss)
-		return;
-
-	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
-	if (!save_state)
-		return;
-
-	cap = (u32 *)&save_state->cap.data[0];
-	ctl2 = *cap++;
-	ctl1 = *cap;
-	aspm_program_l1ss(dev, ctl1, ctl2);
-}
-
 static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
 {
 	pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Bug 216877] New: Regression in PCI powermanagement breaks resume after suspend
  2023-01-04 15:02 ` [Bug 216877] New: Regression in PCI powermanagement breaks resume after suspend Bjorn Helgaas
@ 2023-01-04 15:37   ` Thomas Witt
  2023-01-26 19:24     ` Thomas Witt
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Witt @ 2023-01-04 15:37 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Vidya Sagar, linux-kernel, linux-pci

On 04/01/2023 16:02, Bjorn Helgaas wrote:
> On Wed, Jan 04, 2023 at 09:44:25AM +0100, Thomas Witt wrote:
>> On 04/01/2023 01:30, Bjorn Helgaas wrote:
>>> On Mon, Jan 02, 2023 at 11:15:16AM -0600, Bjorn Helgaas wrote:
>>>> On Mon, Jan 02, 2023 at 02:02:51PM +0000, bugzilla-daemon@kernel.org wrote:
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=216877
>>>>>
>>>>>               Bug ID: 216877
>>>>>              Summary: Regression in PCI powermanagement breaks resume after
>>>>>                       suspend
>>>>>       Kernel Version: 6.0.0-rc1
> 
> BTW, if the bisect is correct, I think the regression actually is in
> v6.1-rc1, where 5e85eba6f50d ("PCI/ASPM: Refactor L1 PM Substates
> Control Register programming") appeared.
> 
>>>>> Created attachment 303512
>>>>>     --> https://bugzilla.kernel.org/attachment.cgi?id=303512&action=edit
>>>>> output of git bisect log
>>>>>
>>>>> After commit 5e85eba6f50dc288c22083a7e213152bcc4b8208 "PCI/ASPM:
>>>>> Refactor L1 PM Substates Control Register programming" my Laptop
>>>>> does not resume PCI devices back from suspend.
>>>
>>> Thomas, could you try the debug patch below on top of v6.2-rc1?
>>
>> Thank you for that patch Bjorn, but as far as I can see it does not change
>> anything.
> 
> Thanks for testing it.  Maybe Vidya will have more ideas.  The patch
> below (based on v6.2-rc1) would revert 5e85eba6f50d and 4ff116d0d5fd.
> If 5e85eba6f50d is the culprit, it should fix the regression.  It
> would also potentially break L1 substates after resume, so we'd like
> to avoid reverting it if possible.
> 
> But the "Unable to change power state from D3hot to D0, device
> inaccessible" symptom suggests that the device is still in D3, which
> would be more like a wakeup issue than an ASPM issue.
> 
> Your bisect log said 3e347969a577 ("PCI/PM: Reduce D3hot delay with
> usleep_range()") was "good", but it would be worth double-checking,
> e.g., see if reverting it from v6.2-rc1 makes any difference.
> 
> Bjorn
> 
> commit 61de2691d549 ("Revert "PCI/ASPM: Refactor L1 PM Substates Control Register programming"")
> parent 1b929c02afd3
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Wed Jan 4 08:38:53 2023 -0600
> 
>      Revert "PCI/ASPM: Refactor L1 PM Substates Control Register programming"

With this patch on top of 6.2-rc1 suspend/resume works and my PCI 
devices come back online.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Bug 216877] New: Regression in PCI powermanagement breaks resume after suspend
  2023-01-04 15:37   ` Thomas Witt
@ 2023-01-26 19:24     ` Thomas Witt
  2023-02-02 20:49       ` Bjorn Helgaas
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Witt @ 2023-01-26 19:24 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Vidya Sagar, linux-kernel, linux-pci

On 04/01/2023 16:37, Thomas Witt wrote:
> On 04/01/2023 16:02, Bjorn Helgaas wrote:
>> Thanks for testing it.  Maybe Vidya will have more ideas.  The patch
>> below (based on v6.2-rc1) would revert 5e85eba6f50d and 4ff116d0d5fd.
>> If 5e85eba6f50d is the culprit, it should fix the regression.  It
>> would also potentially break L1 substates after resume, so we'd like
>> to avoid reverting it if possible.
>>
>> But the "Unable to change power state from D3hot to D0, device
>> inaccessible" symptom suggests that the device is still in D3, which
>> would be more like a wakeup issue than an ASPM issue.
>>
>> Your bisect log said 3e347969a577 ("PCI/PM: Reduce D3hot delay with
>> usleep_range()") was "good", but it would be worth double-checking,
>> e.g., see if reverting it from v6.2-rc1 makes any difference.
>>
>> Bjorn
>>
>> commit 61de2691d549 ("Revert "PCI/ASPM: Refactor L1 PM Substates 
>> Control Register programming"")
>> parent 1b929c02afd3
>> Author: Bjorn Helgaas <bhelgaas@google.com>
>> Date:   Wed Jan 4 08:38:53 2023 -0600
>>
>>      Revert "PCI/ASPM: Refactor L1 PM Substates Control Register 
>> programming"
> 
> With this patch on top of 6.2-rc1 suspend/resume works and my PCI 
> devices come back online.
> 

Hello Bjorn, hello Vidya,

do you have an Idea what went wrong in that commit to cause my PCI 
devices to not return from D3?

BR
Thomas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Bug 216877] New: Regression in PCI powermanagement breaks resume after suspend
  2023-01-26 19:24     ` Thomas Witt
@ 2023-02-02 20:49       ` Bjorn Helgaas
  0 siblings, 0 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2023-02-02 20:49 UTC (permalink / raw)
  To: Thomas Witt; +Cc: Vidya Sagar, linux-kernel, linux-pci

On Thu, Jan 26, 2023 at 08:24:23PM +0100, Thomas Witt wrote:
> On 04/01/2023 16:37, Thomas Witt wrote:
> > On 04/01/2023 16:02, Bjorn Helgaas wrote:
> > > Thanks for testing it.  Maybe Vidya will have more ideas.  The patch
> > > below (based on v6.2-rc1) would revert 5e85eba6f50d and 4ff116d0d5fd.
> > > If 5e85eba6f50d is the culprit, it should fix the regression.  It
> > > would also potentially break L1 substates after resume, so we'd like
> > > to avoid reverting it if possible.
> > > 
> > > But the "Unable to change power state from D3hot to D0, device
> > > inaccessible" symptom suggests that the device is still in D3, which
> > > would be more like a wakeup issue than an ASPM issue.
> > > 
> > > Your bisect log said 3e347969a577 ("PCI/PM: Reduce D3hot delay with
> > > usleep_range()") was "good", but it would be worth double-checking,
> > > e.g., see if reverting it from v6.2-rc1 makes any difference.
> > > 
> > > Bjorn
> > > 
> > > commit 61de2691d549 ("Revert "PCI/ASPM: Refactor L1 PM Substates
> > > Control Register programming"")
> > > parent 1b929c02afd3
> > > Author: Bjorn Helgaas <bhelgaas@google.com>
> > > Date:   Wed Jan 4 08:38:53 2023 -0600
> > > 
> > >      Revert "PCI/ASPM: Refactor L1 PM Substates Control Register
> > > programming"
> > 
> > With this patch on top of 6.2-rc1 suspend/resume works and my PCI
> > devices come back online.
> 
> do you have an Idea what went wrong in that commit to cause my PCI devices
> to not return from D3?

I do not have any ideas yet.

I suspect maybe we should plan to revert 5e85eba6f50d ("PCI/ASPM:
Refactor L1 PM Substates Control Register programming") and
4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume") for now.

It's odd that they both appeared in v6.1, and we haven't seen a flood
of reports about this, so maybe there's something unusual about your
machine, Thomas.

I'll post a revert, and we can do that for v6.2 if we don't come up
with any other ideas.

Bjorn

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Bug 216877] New: Regression in PCI powermanagement breaks resume after suspend
  2023-01-02 17:15 ` Bjorn Helgaas
@ 2023-01-04  0:30   ` Bjorn Helgaas
  0 siblings, 0 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2023-01-04  0:30 UTC (permalink / raw)
  To: Vidya Sagar; +Cc: Thomas Witt, linux-kernel, linux-pci

On Mon, Jan 02, 2023 at 11:15:16AM -0600, Bjorn Helgaas wrote:
> On Mon, Jan 02, 2023 at 02:02:51PM +0000, bugzilla-daemon@kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=216877
> > 
> >             Bug ID: 216877
> >            Summary: Regression in PCI powermanagement breaks resume after
> >                     suspend
> >     Kernel Version: 6.0.0-rc1
> > 
> > Created attachment 303512
> >   --> https://bugzilla.kernel.org/attachment.cgi?id=303512&action=edit
> > output of git bisect log
> > 
> > After commit 5e85eba6f50dc288c22083a7e213152bcc4b8208 "PCI/ASPM:
> > Refactor L1 PM Substates Control Register programming" my Laptop
> > does not resume PCI devices back from suspend.

Thomas, could you try the debug patch below on top of v6.2-rc1?

Prior to 5e85eba6f50d, aspm_calc_l1ss_info() did these config writes:

    if (enables)
 a    child  clear L1SS_CTL1  PCIPM_L1_2 ASPM_L1_2	# disable L1.2
 b    parent clear L1SS_CTL1  PCIPM_L1_2 ASPM_L1_2	# disable L1.2
    /* T_POWER_ON in both */
 c  parent write L1SS_CTL2  T_POWER_ON
 d  child  write L1SS_CTL2  T_POWER_ON
    /* Common_Mode_Restore_Time in parent (rsvd in child) */
 e  parent write L1SS_CTL1  CM_RESTORE_TIME | LTR_L12_TH # clear CM_REST_TIME 
    /* LTR_L1.2_THRESHOLD in both */
 f  parent write L1SS_CTL1  CM_RESTORE_TIME | LTR_L12_TH # clear LTR_L2_TH
 g  child  write L1SS_CTL1  CM_RESTORE_TIME | LTR_L12_TH # CM_REST_TIME rsvd?
    if (enables)
 h    parent write L1SS_CTL1  PCIPM_L1_2 ASPM_L1_2
 i    child  write L1SS_CTL1  PCIPM_L1_2 ASPM_L1_2

After 5e85eba6f50d, we do these:

 A  parent write L1SS_CTL2  T_POWER_ON
 B  parent write L1SS_CTL1  (CM_RESTORE_TIME | LTR_L12_TH) & ~(PCIPM_L1_2 ASPM_L1_2)
    if (enable)
 C    parent write L1SS_CTL1  PCIPM_L1_2 ASPM_L1_2

 D  child  write L1SS_CTL2  T_POWER_ON
 E  child  write L1SS_CTL1  (CM_RESTORE_TIME | LTR_L12_TH) & ~(PCIPM_L1_2 ASPM_L1_2)
    if (enable)
 F    child  write L1SS_CTL1  PCIPM_L1_2 ASPM_L1_2

Notes:

  - Before 5e85eba6f50d, we disable L1.2 at (a,b) before writing
    T_POWER_ON, CM_RESTORE_TIME, and LTR_L12_TH at (c,d,e,f,g).

    After 5e85eba6f50d, we write T_POWER_ON, CM_RESTORE_TIME, and
    LTR_L12_TH at (A,B) without disabling L1.2.  Sec 5.5.4 suggests
    this may be a problem.

  - Even before 5e85eba6f50d, we write CM_REST_TIME for the child at
    (g), which is reserved per spec.

  - Before 5e85eba6f50d, the write at (e) inserts the new CMRT value,
    but ORs in the LTR_L12_TH values without clearing what was there
    before.  The write at (f) inserts LTR_L12_TH correctly, so the
    result is probably correct, but it's messy.  I think 5e85eba6f50d
    does this better.


commit 98248ea220c8 ("debug https://bugzilla.kernel.org/show_bug.cgi?id=216877")
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Tue Jan 3 18:15:11 2023 -0600

    debug https://bugzilla.kernel.org/show_bug.cgi?id=216877
    

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 53a1fa306e1e..398c817858ac 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -552,6 +552,11 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
 	    ctl2 == pctl2 && ctl2 == cctl2)
 		return;
 
+	pci_clear_and_set_dword(child, child->l1ss + PCI_L1SS_CTL1,
+				PCI_L1SS_CTL1_L1_2_MASK, 0);
+	pci_clear_and_set_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
+				PCI_L1SS_CTL1_L1_2_MASK, 0);
+
 	pctl1 &= ~(PCI_L1SS_CTL1_CM_RESTORE_TIME |
 		   PCI_L1SS_CTL1_LTR_L12_TH_VALUE |
 		   PCI_L1SS_CTL1_LTR_L12_TH_SCALE);

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Bug 216877] New: Regression in PCI powermanagement breaks resume after suspend
       [not found] <bug-216877-41252@https.bugzilla.kernel.org/>
@ 2023-01-02 17:15 ` Bjorn Helgaas
  2023-01-04  0:30   ` Bjorn Helgaas
  0 siblings, 1 reply; 6+ messages in thread
From: Bjorn Helgaas @ 2023-01-02 17:15 UTC (permalink / raw)
  To: Vidya Sagar; +Cc: Thomas Witt, linux-kernel, linux-pci

Thank you very much for your report and all the work of bisecting,
Thomas!  If you have a chance, can you collect the "sudo lspci -vv"
output (the one attached doesn't include the ASPM info, probably
because lspci wasn't run as root), and also a complete dmesg log from
before the suspend?

On Mon, Jan 02, 2023 at 02:02:51PM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216877
> 
>             Bug ID: 216877
>            Summary: Regression in PCI powermanagement breaks resume after
>                     suspend
>     Kernel Version: 6.0.0-rc1
> 
> Created attachment 303512
>   --> https://bugzilla.kernel.org/attachment.cgi?id=303512&action=edit
> output of git bisect log
> 
> After commit 5e85eba6f50dc288c22083a7e213152bcc4b8208 "PCI/ASPM:
> Refactor L1 PM Substates Control Register programming" my Laptop
> does not resume PCI devices back from suspend.
> 
> My Laptop is a Tuxedo Infinitybook S 14 v5, as far as I can tell
> they use a Clevo L140CU Mainboard.
> 
> The main symptom is:
> iwlwifi 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible
> nvme 0000:03:00.0: Unable to change power state from D3hot to D0, device inaccessible
> 
> after that, the level of interaction I still have with the laptop
> varies, but It cannot run dmesg and it cannot do a clean reboot. The
> issue occurs on every suspend/resume cycle.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-02-02 20:50 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <ba98ad17-53b0-1f76-70f7-10d37d279f9d@witt.link>
2023-01-04 15:02 ` [Bug 216877] New: Regression in PCI powermanagement breaks resume after suspend Bjorn Helgaas
2023-01-04 15:37   ` Thomas Witt
2023-01-26 19:24     ` Thomas Witt
2023-02-02 20:49       ` Bjorn Helgaas
     [not found] <bug-216877-41252@https.bugzilla.kernel.org/>
2023-01-02 17:15 ` Bjorn Helgaas
2023-01-04  0:30   ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).