linux-pci.vger.kernel.org archive mirror
* [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
@ 2022-07-05  6:00 Vidya Sagar
  2022-07-13 17:59 ` Vidya Sagar
  2022-07-15 10:38 ` Ben Chuang
  0 siblings, 2 replies; 24+ messages in thread
From: Vidya Sagar @ 2022-07-05  6:00 UTC (permalink / raw)
  To: bhelgaas, lorenzo.pieralisi, refactormyself, kw, rajatja, kenny,
	treding, jonathanh, abhsahu, sagupta
  Cc: benchuanggli, linux-pci, linux-kernel, kthota, mmaddireddy,
	vidyas, sagar.tv

Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
saved and restored during suspend/resume leading to L1 Substates
configuration being lost post-resume.

Save the L1 Substates control registers so that the configuration is
retained post-resume.

Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
---
Hi,
Kenneth R. Crudup <kenny@panix.com>, could you please verify this patch
on your laptop (Dell XPS 13) one last time?
IMHO, the regression observed on your laptop with an older version of the patch
could be due to an old, buggy BIOS version on the laptop.

Thanks,
Vidya Sagar

 drivers/pci/pci.c       |  7 +++++++
 drivers/pci/pci.h       |  4 ++++
 drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index cfaf40a540a8..aca05880aaa3 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
 		return i;
 
 	pci_save_ltr_state(dev);
+	pci_save_aspm_l1ss_state(dev);
 	pci_save_dpc_state(dev);
 	pci_save_aer_state(dev);
 	pci_save_ptm_state(dev);
@@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
 	 * LTR itself (in the PCIe capability).
 	 */
 	pci_restore_ltr_state(dev);
+	pci_restore_aspm_l1ss_state(dev);
 
 	pci_restore_pcie_state(dev);
 	pci_restore_pasid_state(dev);
@@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
 	if (error)
 		pci_err(dev, "unable to allocate suspend buffer for LTR\n");
 
+	error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
+					    2 * sizeof(u32));
+	if (error)
+		pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
+
 	pci_allocate_vc_save_buffers(dev);
 }
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index e10cdec6c56e..92d8c92662a4 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
 void pcie_aspm_exit_link_state(struct pci_dev *pdev);
 void pcie_aspm_pm_state_change(struct pci_dev *pdev);
 void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
+void pci_save_aspm_l1ss_state(struct pci_dev *dev);
+void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
 #else
 static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
 static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
 static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
 static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
+static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
+static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
 #endif
 
 #ifdef CONFIG_PCIE_ECRC
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index a96b7424c9bc..2c29fdd20059 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
 				PCI_L1SS_CTL1_L1SS_MASK, val);
 }
 
+void pci_save_aspm_l1ss_state(struct pci_dev *dev)
+{
+	int aspm_l1ss;
+	struct pci_cap_saved_state *save_state;
+	u32 *cap;
+
+	if (!pci_is_pcie(dev))
+		return;
+
+	aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
+	if (!aspm_l1ss)
+		return;
+
+	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
+	if (!save_state)
+		return;
+
+	cap = (u32 *)&save_state->cap.data[0];
+	pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
+	pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
+}
+
+void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
+{
+	int aspm_l1ss;
+	struct pci_cap_saved_state *save_state;
+	u32 *cap;
+
+	if (!pci_is_pcie(dev))
+		return;
+
+	aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
+	if (!aspm_l1ss)
+		return;
+
+	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
+	if (!save_state)
+		return;
+
+	cap = (u32 *)&save_state->cap.data[0];
+	pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
+	pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
+}
+
 static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
 {
 	pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
-- 
2.17.1
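
The save/restore pattern in the patch above can be modelled outside the
kernel. What follows is a minimal userspace sketch, not kernel code:
struct fake_dev, cfg_read32(), cfg_write32(), lose_power(), the capability
offset 0x110 and the register values are all hypothetical stand-ins chosen
for illustration. Only the CTL1/CTL2 offsets and the CTL2-then-CTL1 order
are taken from the patch and from pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register offsets within the L1 PM Substates extended capability;
 * these match PCI_L1SS_CTL1 and PCI_L1SS_CTL2 in pci_regs.h. */
#define L1SS_CTL1 0x08
#define L1SS_CTL2 0x0c

/* Hypothetical stand-in for a device: a byte array as config space
 * plus a two-dword buffer like the one the patch allocates. */
struct fake_dev {
	uint8_t cfg[4096];
	uint32_t saved[2];
};

uint32_t cfg_read32(const struct fake_dev *d, unsigned int off)
{
	uint32_t val;

	assert(off + sizeof(val) <= sizeof(d->cfg));
	memcpy(&val, &d->cfg[off], sizeof(val));
	return val;
}

void cfg_write32(struct fake_dev *d, unsigned int off, uint32_t val)
{
	assert(off + sizeof(val) <= sizeof(d->cfg));
	memcpy(&d->cfg[off], &val, sizeof(val));
}

/* Same order as the patch: CTL2 first, then CTL1. */
void save_l1ss(struct fake_dev *d, unsigned int cap)
{
	d->saved[0] = cfg_read32(d, cap + L1SS_CTL2);
	d->saved[1] = cfg_read32(d, cap + L1SS_CTL1);
}

void restore_l1ss(struct fake_dev *d, unsigned int cap)
{
	cfg_write32(d, cap + L1SS_CTL2, d->saved[0]);
	cfg_write32(d, cap + L1SS_CTL1, d->saved[1]);
}

/* Model a suspend/resume cycle in which the device loses power:
 * both registers fall back to illustrative (assumed) reset values. */
void lose_power(struct fake_dev *d, unsigned int cap)
{
	cfg_write32(d, cap + L1SS_CTL1, 0x00000000);
	cfg_write32(d, cap + L1SS_CTL2, 0x00000028);
}

/* Returns 1 if the configured values survive the simulated cycle. */
int l1ss_roundtrip_ok(void)
{
	static struct fake_dev d;        /* zero-initialised */
	const unsigned int cap = 0x110;  /* assumed capability offset */

	cfg_write32(&d, cap + L1SS_CTL1, 0x8003000f);
	cfg_write32(&d, cap + L1SS_CTL2, 0x000000fa);

	save_l1ss(&d, cap);
	lose_power(&d, cap);
	restore_l1ss(&d, cap);

	return cfg_read32(&d, cap + L1SS_CTL1) == 0x8003000f &&
	       cfg_read32(&d, cap + L1SS_CTL2) == 0x000000fa;
}
```

Calling l1ss_roundtrip_ok() from a test harness should return 1. Dropping
the save_l1ss()/restore_l1ss() pair leaves the reset values in place, which
is the failure mode the commit message describes.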



* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-07-05  6:00 [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume Vidya Sagar
@ 2022-07-13 17:59 ` Vidya Sagar
  2022-07-13 18:16   ` Bjorn Helgaas
  2022-07-15 10:38 ` Ben Chuang
  1 sibling, 1 reply; 24+ messages in thread
From: Vidya Sagar @ 2022-07-13 17:59 UTC (permalink / raw)
  To: bhelgaas, lorenzo.pieralisi, refactormyself, kw, rajatja, kenny,
	treding, jonathanh, abhsahu, sagupta
  Cc: benchuanggli, linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv

Hi,
@Kenneth, could you please verify it on your laptop one last time?

@Bjorn, could you please review this change in the meantime?

Thanks,
Vidya Sagar



* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-07-13 17:59 ` Vidya Sagar
@ 2022-07-13 18:16   ` Bjorn Helgaas
  2022-07-14  4:20     ` Kai-Heng Feng
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2022-07-13 18:16 UTC (permalink / raw)
  To: Vidya Sagar
  Cc: bhelgaas, lorenzo.pieralisi, refactormyself, kw, rajatja, kenny,
	treding, jonathanh, abhsahu, sagupta, benchuanggli, linux-pci,
	linux-kernel, kthota, mmaddireddy, sagar.tv, Kai-Heng Feng

[+cc Kai-Heng]

On Wed, Jul 13, 2022 at 11:29:42PM +0530, Vidya Sagar wrote:
> Hi,
> @Kenneth, could you please verify it on your laptop one last time?
> 
> @Bjorn, could you please review this change in the meantime?

Seems like this may be related to Kai-Heng's patch:
https://lore.kernel.org/r/20220509073639.2048236-1-kai.heng.feng@canonical.com
since he specifically mentioned L1SS.

I applied Kai-Heng's patch for v5.20 yesterday, but I haven't worked
out the connection to this patch.  But if you want Kenneth to test
this, it should probably be on top of Kai-Heng's patch so we're
testing something close to the eventual result.



* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-07-13 18:16   ` Bjorn Helgaas
@ 2022-07-14  4:20     ` Kai-Heng Feng
  0 siblings, 0 replies; 24+ messages in thread
From: Kai-Heng Feng @ 2022-07-14  4:20 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Vidya Sagar, bhelgaas, lorenzo.pieralisi, refactormyself, kw,
	rajatja, kenny, treding, jonathanh, abhsahu, sagupta,
	benchuanggli, linux-pci, linux-kernel, kthota, mmaddireddy,
	sagar.tv

On Thu, Jul 14, 2022 at 2:16 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Kai-Heng]
>
> On Wed, Jul 13, 2022 at 11:29:42PM +0530, Vidya Sagar wrote:
> > Hi,
> > @Kenneth, could you please verify it on your laptop one last time?
> >
> > @Bjorn, could you please review this change in the meantime?
>
> Seems like this may be related to Kai-Heng's patch:
> https://lore.kernel.org/r/20220509073639.2048236-1-kai.heng.feng@canonical.com
> since he specifically mentioned L1SS.

Yes, this patch is also needed for L1SS restore to succeed on system resume.

Kai-Heng



* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-07-05  6:00 [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume Vidya Sagar
  2022-07-13 17:59 ` Vidya Sagar
@ 2022-07-15 10:38 ` Ben Chuang
  2022-07-22  7:31   ` Kai-Heng Feng
  1 sibling, 1 reply; 24+ messages in thread
From: Ben Chuang @ 2022-07-15 10:38 UTC (permalink / raw)
  To: Vidya Sagar
  Cc: bhelgaas, lorenzo.pieralisi, refactormyself, kw, rajatja, kenny,
	treding, jonathanh, abhsahu, sagupta, linux-pci,
	Linux Kernel Mailing List, kthota, mmaddireddy, sagar.tv

On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
>
> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> saved and restored during suspend/resume leading to L1 Substates
> configuration being lost post-resume.
>
> Save the L1 Substates control registers so that the configuration is
> retained post-resume.
>
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>

Hi Vidya,

I tested this patch on kernel v5.19-rc6.
The test device is a GL9755 card reader controller on an Intel i5-10210U RVP.
With this patch, L1SS is restored after suspend/resume.

The test results are as follows:

After Boot:
#lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
        Capabilities: [110 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
                L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
                           T_CommonMode=0us LTR1.2_Threshold=3145728ns
                L1SubCtl2: T_PwrOn=3100us


After suspend/resume without this patch:
#lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
        Capabilities: [110 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=10us


After suspend/resume with this patch:
#lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
        Capabilities: [110 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
                L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
                           T_CommonMode=0us LTR1.2_Threshold=3145728ns
                L1SubCtl2: T_PwrOn=3100us
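
The L1SubCtl1 and L1SubCtl2 fields shown by lspci are decoded from the two
raw registers that the patch saves. Below is a minimal sketch of that
decoding, assuming the field layout of the PCIe L1 PM Substates capability;
the mask values mirror the PCI_L1SS_CTL1_* and PCI_L1SS_CTL2_* definitions
in pci_regs.h, while the function names are made up for this example. The
raw register values in the self-check were not part of the test report;
they are reconstructed so that they decode to the values printed in it.

```c
#include <assert.h>
#include <stdint.h>

#define L1SS_CTL1_PCIPM_L1_2   0x00000001u
#define L1SS_CTL1_PCIPM_L1_1   0x00000002u
#define L1SS_CTL1_ASPM_L1_2    0x00000004u
#define L1SS_CTL1_ASPM_L1_1    0x00000008u
#define L1SS_CTL1_ENABLE_MASK  0x0000000fu
#define L1SS_CTL1_LTR_TH_VALUE 0x03ff0000u
#define L1SS_CTL1_LTR_TH_SCALE 0xe0000000u
#define L1SS_CTL2_PWR_ON_SCALE 0x00000003u
#define L1SS_CTL2_PWR_ON_VALUE 0x000000f8u

/* T_POWER_ON in microseconds, or -1 for the reserved scale encoding. */
long l1ss_t_pwr_on_us(uint32_t ctl2)
{
	static const long scale_us[] = { 2, 10, 100, -1 };
	long scale = scale_us[ctl2 & L1SS_CTL2_PWR_ON_SCALE];
	long value = (ctl2 & L1SS_CTL2_PWR_ON_VALUE) >> 3;

	return scale < 0 ? -1 : value * scale;
}

/* LTR_L1.2_THRESHOLD in nanoseconds: value * 32^scale.
 * Scale encodings above 5 are reserved and yield -1. */
long long l1ss_ltr_threshold_ns(uint32_t ctl1)
{
	unsigned int scale = (ctl1 & L1SS_CTL1_LTR_TH_SCALE) >> 29;
	long long value = (ctl1 & L1SS_CTL1_LTR_TH_VALUE) >> 16;

	if (scale > 5)
		return -1;
	return value << (5 * scale);
}

/* Checks the decoders against the values in the lspci output. */
void l1ss_decode_selfcheck(void)
{
	/* All four substates enabled, threshold value 3 with scale 4. */
	const uint32_t ctl1_configured = 0x8003000fu;
	/* T_POWER_ON value 31 with scale 2 (100us units). */
	const uint32_t ctl2_configured = 0x000000fau;
	/* T_POWER_ON value 5 with scale 0 (2us units). */
	const uint32_t ctl2_after_loss = 0x00000028u;

	assert((ctl1_configured & L1SS_CTL1_ENABLE_MASK) ==
	       (L1SS_CTL1_PCIPM_L1_2 | L1SS_CTL1_PCIPM_L1_1 |
		L1SS_CTL1_ASPM_L1_2 | L1SS_CTL1_ASPM_L1_1));
	assert(l1ss_ltr_threshold_ns(ctl1_configured) == 3145728);
	assert(l1ss_t_pwr_on_us(ctl2_configured) == 3100);
	assert(l1ss_t_pwr_on_us(ctl2_after_loss) == 10);
}
```

Read this way, T_PwrOn=10us and LTR1.2_Threshold=0ns in the unpatched
result are simply what small or zeroed register fields decode to, which is
consistent with the control registers having lost their configuration.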


Tested-by: Ben Chuang <benchuanggli@gmail.com>

Best regards,
Ben Chuang




* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-07-15 10:38 ` Ben Chuang
@ 2022-07-22  7:31   ` Kai-Heng Feng
  2022-07-22  9:41     ` Lukasz Majczak
  0 siblings, 1 reply; 24+ messages in thread
From: Kai-Heng Feng @ 2022-07-22  7:31 UTC (permalink / raw)
  To: Ben Chuang
  Cc: Vidya Sagar, bhelgaas, lorenzo.pieralisi, refactormyself, kw,
	rajatja, kenny, treding, jonathanh, abhsahu, sagupta, linux-pci,
	Linux Kernel Mailing List, kthota, mmaddireddy, sagar.tv

On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
>
> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
> >
> > Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> > saved and restored during suspend/resume leading to L1 Substates
> > configuration being lost post-resume.
> >
> > Save the L1 Substates control registers so that the configuration is
> > retained post-resume.
> >
> > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
>
> Hi Vidya,
>
> I tested this patch on kernel v5.19-rc6.
> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
> This patch can restore L1SS after suspend/resume.
>
> The test results are as follows:
>
> After Boot:
> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>         Capabilities: [110 v1] L1 PM Substates
>                 L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>                           PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
>                 L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>                            T_CommonMode=0us LTR1.2_Threshold=3145728ns
>                 L1SubCtl2: T_PwrOn=3100us
>
>
> After suspend/resume without this patch.
> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>         Capabilities: [110 v1] L1 PM Substates
>                 L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>                           PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
>                 L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>                            T_CommonMode=0us LTR1.2_Threshold=0ns
>                 L1SubCtl2: T_PwrOn=10us
>
>
> After suspend/resume with this patch.
> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>         Capabilities: [110 v1] L1 PM Substates
>                 L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>                           PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
>                 L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>                            T_CommonMode=0us LTR1.2_Threshold=3145728ns
>                 L1SubCtl2: T_PwrOn=3100us
>
>
> Tested-by: Ben Chuang <benchuanggli@gmail.com>

Forgot to add mine:
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>

>
> Best regards,
> Ben Chuang
>
>


* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-07-22  7:31   ` Kai-Heng Feng
@ 2022-07-22  9:41     ` Lukasz Majczak
  2022-07-22 17:42       ` Bjorn Helgaas
  0 siblings, 1 reply; 24+ messages in thread
From: Lukasz Majczak @ 2022-07-22  9:41 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Ben Chuang, Vidya Sagar, bhelgaas, lorenzo.pieralisi,
	refactormyself, kw, rajatja, kenny, treding, jonathanh, abhsahu,
	sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv

On Fri, Jul 22, 2022 at 09:31 Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
>
> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
> >
> > On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
> > >
> > > Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> > > saved and restored during suspend/resume leading to L1 Substates
> > > configuration being lost post-resume.
> > >
> > > Save the L1 Substates control registers so that the configuration is
> > > retained post-resume.
> > >
> > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > > Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
> >
> > Hi Vidya,
> >
> > I tested this patch on kernel v5.19-rc6.
> > The test device is GL9755 card reader controller on Intel i5-10210U RVP.
> > This patch can restore L1SS after suspend/resume.
> >
> > The test results are as follows:
> >
> > After Boot:
> > #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >         Capabilities: [110 v1] L1 PM Substates
> >                 L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >                           PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> >                 L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >                            T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >                 L1SubCtl2: T_PwrOn=3100us
> >
> >
> > After suspend/resume without this patch.
> > #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >         Capabilities: [110 v1] L1 PM Substates
> >                 L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >                           PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> >                 L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >                            T_CommonMode=0us LTR1.2_Threshold=0ns
> >                 L1SubCtl2: T_PwrOn=10us
> >
> >
> > After suspend/resume with this patch.
> > #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >         Capabilities: [110 v1] L1 PM Substates
> >                 L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >                           PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> >                 L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >                            T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >                 L1SubCtl2: T_PwrOn=3100us
> >
> >
> > Tested-by: Ben Chuang <benchuanggli@gmail.com>
>
> Forgot to add mine:
> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>
> >
> > Best regards,
> > Ben Chuang
> >
> >

Hi,

With this patch (and also the patch mentioned at
https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
applied on 5.10 (chromeos-5.10), I am observing problems with my WiFi
card after suspend/resume: it looks like all communication over PCI
fails. Logs (dmesg, plus lspci -vvv before and after suspend/resume)
are at https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3

I played with this code a little, and it looks like the
pci_write_config_dword() to PCI_L1SS_CTL1 is what breaks it (I don't
know why; I am not a PCI expert).

Best regards,
Lukasz
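
One thing the report above may point at is the order in which the two
control registers are written back. As far as the PCIe specification
goes (r5.0, sec 5.5.4), T_POWER_ON, Common_Mode_Restore_Time and
LTR_L1.2_THRESHOLD are expected to be programmed before either L1.2
enable bit is set. The following is a minimal user-space sketch of a
restore sequence that respects that rule. It is not code from this
patch, the thread does not establish that ordering is what breaks the
WiFi card here, and struct fake_dev, cfg_read(), cfg_write() and
l1ss_restore_ordered() are hypothetical stand-ins for a device and its
config accessors.

```c
#include <stdint.h>

#define L1SS_CTL1           0x08        /* offsets within the L1SS capability */
#define L1SS_CTL2           0x0c
#define L1SS_CTL1_L1_2_MASK 0x00000005u /* PCI-PM L1.2 | ASPM L1.2 enables */

/* Hypothetical stand-in for a device: 16 dwords of capability space. */
struct fake_dev {
	uint32_t regs[16];
	int ctl2_written_with_l12_on;	/* set if the ordering rule is violated */
};

static uint32_t cfg_read(struct fake_dev *d, int off)
{
	return d->regs[off / 4];
}

static void cfg_write(struct fake_dev *d, int off, uint32_t val)
{
	/* Record any CTL2 write that happens while L1.2 is still enabled. */
	if (off == L1SS_CTL2 && (d->regs[L1SS_CTL1 / 4] & L1SS_CTL1_L1_2_MASK))
		d->ctl2_written_with_l12_on = 1;
	d->regs[off / 4] = val;
}

/* saved[0] = CTL2, saved[1] = CTL1, the same order the patch stores them. */
static void l1ss_restore_ordered(struct fake_dev *d, const uint32_t saved[2])
{
	uint32_t ctl1 = cfg_read(d, L1SS_CTL1);

	/* 1. Make sure L1.2 is off before touching the timing fields. */
	cfg_write(d, L1SS_CTL1, ctl1 & ~L1SS_CTL1_L1_2_MASK);
	/* 2. Restore T_POWER_ON. */
	cfg_write(d, L1SS_CTL2, saved[0]);
	/* 3. Restore thresholds and other CTL1 fields with L1.2 still off. */
	cfg_write(d, L1SS_CTL1, saved[1] & ~L1SS_CTL1_L1_2_MASK);
	/* 4. Finally restore the saved enable bits. */
	cfg_write(d, L1SS_CTL1, saved[1]);
}
```

The design choice is simply to split the CTL1 write in two, so the
enable bits only come back once CTL2 and the threshold fields already
hold their saved values.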


* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-07-22  9:41     ` Lukasz Majczak
@ 2022-07-22 17:42       ` Bjorn Helgaas
  2022-07-23 17:03         ` Vidya Sagar
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2022-07-22 17:42 UTC (permalink / raw)
  To: Lukasz Majczak
  Cc: Kai-Heng Feng, Ben Chuang, Vidya Sagar, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, rajatja, kenny, treding,
	jonathanh, abhsahu, sagupta, linux-pci,
	Linux Kernel Mailing List, kthota, mmaddireddy, sagar.tv

On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
> On Fri, Jul 22, 2022 at 09:31 Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> > On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
> > > On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
> > > >
> > > > Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> > > > saved and restored during suspend/resume leading to L1 Substates
> > > > configuration being lost post-resume.
> > > >
> > > > Save the L1 Substates control registers so that the configuration is
> > > > retained post-resume.
> > > >
> > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > > > Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
> > >
> > > Hi Vidya,
> > >
> > > I tested this patch on kernel v5.19-rc6.
> > > The test device is GL9755 card reader controller on Intel i5-10210U RVP.
> > > This patch can restore L1SS after suspend/resume.
> > >
> > > The test results are as follows:
> > >
> > > After Boot:
> > > #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> > >         Capabilities: [110 v1] L1 PM Substates
> > >                 L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > >                           PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> > >                 L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > >                            T_CommonMode=0us LTR1.2_Threshold=3145728ns
> > >                 L1SubCtl2: T_PwrOn=3100us
> > >
> > >
> > > After suspend/resume without this patch.
> > > #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> > >         Capabilities: [110 v1] L1 PM Substates
> > >                 L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > >                           PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> > >                 L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> > >                            T_CommonMode=0us LTR1.2_Threshold=0ns
> > >                 L1SubCtl2: T_PwrOn=10us
> > >
> > >
> > > After suspend/resume with this patch.
> > > #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> > >         Capabilities: [110 v1] L1 PM Substates
> > >                 L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > >                           PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> > >                 L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > >                            T_CommonMode=0us LTR1.2_Threshold=3145728ns
> > >                 L1SubCtl2: T_PwrOn=3100us
> > >
> > >
> > > Tested-by: Ben Chuang <benchuanggli@gmail.com>
> >
> > Forgot to add mine:
> > Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> >
> > >
> > > Best regards,
> > > Ben Chuang
> > >
> > >
> 
> Hi,
> 
> With this patch (and also the patch mentioned at
> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
> applied on 5.10 (chromeos-5.10), I am observing problems with my WiFi
> card after suspend/resume: it looks like all communication over PCI
> fails. Logs (dmesg, plus lspci -vvv before and after suspend/resume)
> are at https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
> 
> I played with this code a little, and it looks like the
> pci_write_config_dword() to PCI_L1SS_CTL1 is what breaks it (I don't
> know why; I am not a PCI expert).

Thanks a lot for testing this!  I'm not quite sure what to make of the
results since v5.10 is fairly old (Dec 2020) and I don't know what
other changes are in chromeos-5.10.

Random observations, no analysis below.  This from your dmesg
certainly looks like PCI reads failing and returning ~0:

  Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
  iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
  iwlwifi 0000:01:00.0: Device gone - attempting removal
  Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.

And then we re-enumerate 01:00.0 and it looks like it may have been
reset (BAR is 0):

  pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
  pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]

lspci diffs from before/after suspend:

   00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
     Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
  -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
  +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
  -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
  +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
  -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
  +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
  -       Capabilities: [150 v0] Null
  -       Capabilities: [200 v1] L1 PM Substates
  -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
  -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
  -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
  -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
  -               L1SubCtl2: T_PwrOn=60us

The DevSta differences might be BIOS bugs, probably not relevant.
Interesting that ASPM is disabled, maybe didn't get enabled after
re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
disappeared.

   01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
		  LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
  -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
  +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
	  Capabilities: [154 v1] L1 PM Substates
		  L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			    PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
  -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
  -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
  +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
  +                          T_CommonMode=0us LTR1.2_Threshold=0ns

Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
get reinitialized after re-enumeration?  Looks like we didn't restore
L1SubCtl1.

Bjorn
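
The 0xffffffff values in the iwlwifi dump above are what a config or
MMIO read typically returns when the device does not respond. A minimal
user-space sketch of that check follows; read_looks_dead() and
dump_looks_dead() are hypothetical helpers, and newer kernels provide a
similar single-value test as PCI_POSSIBLE_ERROR(). All ones can also be
a legitimate register value, so this is a hint rather than proof.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if a single 32-bit read came back as all ones. */
static bool read_looks_dead(uint32_t val)
{
	return val == UINT32_MAX;
}

/* True if every dword of a non-empty register dump is all ones. */
static bool dump_looks_dead(const uint32_t *dump, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!read_looks_dead(dump[i]))
			return false;
	return n > 0;
}
```

Checking a whole dump rather than one register lowers the chance of
mistaking a genuine all-ones value for a dead device.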


* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-07-22 17:42       ` Bjorn Helgaas
@ 2022-07-23 17:03         ` Vidya Sagar
  2022-07-25 22:50           ` Rajat Jain
  0 siblings, 1 reply; 24+ messages in thread
From: Vidya Sagar @ 2022-07-23 17:03 UTC (permalink / raw)
  To: Bjorn Helgaas, Lukasz Majczak
  Cc: Kai-Heng Feng, Ben Chuang, bhelgaas, lorenzo.pieralisi,
	refactormyself, kw, rajatja, kenny, treding, jonathanh, abhsahu,
	sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv

I agree with Bjorn's observations.
The fact that the L1SS capability itself disappeared from the root port
after resume suggests that something is wrong with the BIOS.
Could you please check from that perspective?

Thanks,
Vidya Sagar


On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
> On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
>> On Fri, Jul 22, 2022 at 09:31 Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
>>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
>>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>>
>>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
>>>>> saved and restored during suspend/resume leading to L1 Substates
>>>>> configuration being lost post-resume.
>>>>>
>>>>> Save the L1 Substates control registers so that the configuration is
>>>>> retained post-resume.
>>>>>
>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>>> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
>>>>
>>>> Hi Vidya,
>>>>
>>>> I tested this patch on kernel v5.19-rc6.
>>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
>>>> This patch can restore L1SS after suspend/resume.
>>>>
>>>> The test results are as follows:
>>>>
>>>> After Boot:
>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>          Capabilities: [110 v1] L1 PM Substates
>>>>                  L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>                            PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
>>>>                  L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>                             T_CommonMode=0us LTR1.2_Threshold=3145728ns
>>>>                  L1SubCtl2: T_PwrOn=3100us
>>>>
>>>>
>>>> After suspend/resume without this patch.
>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>          Capabilities: [110 v1] L1 PM Substates
>>>>                  L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>                            PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
>>>>                  L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>>>>                             T_CommonMode=0us LTR1.2_Threshold=0ns
>>>>                  L1SubCtl2: T_PwrOn=10us
>>>>
>>>>
>>>> After suspend/resume with this patch.
>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>          Capabilities: [110 v1] L1 PM Substates
>>>>                  L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>                            PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
>>>>                  L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>                             T_CommonMode=0us LTR1.2_Threshold=3145728ns
>>>>                  L1SubCtl2: T_PwrOn=3100us
>>>>
>>>>
>>>> Tested-by: Ben Chuang <benchuanggli@gmail.com>
>>>
>>> Forgot to add mine:
>>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>>
>>>>
>>>> Best regards,
>>>> Ben Chuang
>>>>
>>>>
>>>>> ---
>>>>> Hi,
>>>>> Kenneth R. Crudup <kenny@panix.com>, Could you please verify this patch
>>>>> on your laptop (Dell XPS 13) one last time?
>>>>> IMHO, the regression observed on your laptop with an old version of the patch
>>>>> could be due to a buggy old version BIOS in the laptop.
>>>>>
>>>>> Thanks,
>>>>> Vidya Sagar
>>>>>
>>>>>   drivers/pci/pci.c       |  7 +++++++
>>>>>   drivers/pci/pci.h       |  4 ++++
>>>>>   drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
>>>>>   3 files changed, 55 insertions(+)
>>>>>
>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>> index cfaf40a540a8..aca05880aaa3 100644
>>>>> --- a/drivers/pci/pci.c
>>>>> +++ b/drivers/pci/pci.c
>>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
>>>>>                  return i;
>>>>>
>>>>>          pci_save_ltr_state(dev);
>>>>> +       pci_save_aspm_l1ss_state(dev);
>>>>>          pci_save_dpc_state(dev);
>>>>>          pci_save_aer_state(dev);
>>>>>          pci_save_ptm_state(dev);
>>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
>>>>>           * LTR itself (in the PCIe capability).
>>>>>           */
>>>>>          pci_restore_ltr_state(dev);
>>>>> +       pci_restore_aspm_l1ss_state(dev);
>>>>>
>>>>>          pci_restore_pcie_state(dev);
>>>>>          pci_restore_pasid_state(dev);
>>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
>>>>>          if (error)
>>>>>                  pci_err(dev, "unable to allocate suspend buffer for LTR\n");
>>>>>
>>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
>>>>> +                                           2 * sizeof(u32));
>>>>> +       if (error)
>>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
>>>>> +
>>>>>          pci_allocate_vc_save_buffers(dev);
>>>>>   }
>>>>>
>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>>> index e10cdec6c56e..92d8c92662a4 100644
>>>>> --- a/drivers/pci/pci.h
>>>>> +++ b/drivers/pci/pci.h
>>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
>>>>>   void pcie_aspm_exit_link_state(struct pci_dev *pdev);
>>>>>   void pcie_aspm_pm_state_change(struct pci_dev *pdev);
>>>>>   void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
>>>>>   #else
>>>>>   static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
>>>>>   static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
>>>>>   static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
>>>>>   static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
>>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
>>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
>>>>>   #endif
>>>>>
>>>>>   #ifdef CONFIG_PCIE_ECRC
>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>>> index a96b7424c9bc..2c29fdd20059 100644
>>>>> --- a/drivers/pci/pcie/aspm.c
>>>>> +++ b/drivers/pci/pcie/aspm.c
>>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
>>>>>                                  PCI_L1SS_CTL1_L1SS_MASK, val);
>>>>>   }
>>>>>
>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
>>>>> +{
>>>>> +       int aspm_l1ss;
>>>>> +       struct pci_cap_saved_state *save_state;
>>>>> +       u32 *cap;
>>>>> +
>>>>> +       if (!pci_is_pcie(dev))
>>>>> +               return;
>>>>> +
>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
>>>>> +       if (!aspm_l1ss)
>>>>> +               return;
>>>>> +
>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
>>>>> +       if (!save_state)
>>>>> +               return;
>>>>> +
>>>>> +       cap = (u32 *)&save_state->cap.data[0];
>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
>>>>> +}
>>>>> +
>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
>>>>> +{
>>>>> +       int aspm_l1ss;
>>>>> +       struct pci_cap_saved_state *save_state;
>>>>> +       u32 *cap;
>>>>> +
>>>>> +       if (!pci_is_pcie(dev))
>>>>> +               return;
>>>>> +
>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
>>>>> +       if (!aspm_l1ss)
>>>>> +               return;
>>>>> +
>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
>>>>> +       if (!save_state)
>>>>> +               return;
>>>>> +
>>>>> +       cap = (u32 *)&save_state->cap.data[0];
>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
>>>>> +}
>>>>> +
>>>>>   static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
>>>>>   {
>>>>>          pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
>>>>> --
>>>>> 2.17.1
>>>>>
>>
>> Hi,
>>
>> With this patch (and also the one mentioned at
>> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
>> applied on 5.10 (chromeos-5.10), I am observing problems after
>> suspend/resume with my WiFi card: it looks like all communication
>> via PCI fails. I am attaching logs (dmesg, and lspci -vvv before and
>> after suspend/resume): https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
>>
>> I played a little bit with this code and it looks like the
>> pci_write_config_dword() to PCI_L1SS_CTL1 breaks it (I don't know
>> why, I am not a PCI expert).
> 
> Thanks a lot for testing this!  I'm not quite sure what to make of the
> results since v5.10 is fairly old (Dec 2020) and I don't know what
> other changes are in chromeos-5.10.
> 
> Random observations, no analysis below.  This from your dmesg
> certainly looks like PCI reads failing and returning ~0:
> 
>    Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
>    iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
>    iwlwifi 0000:01:00.0: Device gone - attempting removal
>    Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
> 
> And then we re-enumerate 01:00.0 and it looks like it may have been
> reset (BAR is 0):
> 
>    pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
>    pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> 
> lspci diffs from before/after suspend:
> 
>     00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
>       Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
>    -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
>    +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>    -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
>    +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>    -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
>    +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
>    -       Capabilities: [150 v0] Null
>    -       Capabilities: [200 v1] L1 PM Substates
>    -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>    -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
>    -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>    -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
>    -               L1SubCtl2: T_PwrOn=60us
> 
> The DevSta differences might be BIOS bugs, probably not relevant.
> Interesting that ASPM is disabled, maybe didn't get enabled after
> re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
> disappeared.
> 
>     01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
>                    LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
>    -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>    +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>            Capabilities: [154 v1] L1 PM Substates
>                    L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>                              PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
>    -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>    -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
>    +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>    +                          T_CommonMode=0us LTR1.2_Threshold=0ns
> 
> Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
> get reinitialized after re-enumeration?  Looks like we didn't restore
> L1SubCtl1.
> 
> Bjorn
> 
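
The all-ones pattern in the dmesg excerpt quoted above is what a PCI read
usually returns when the device does not respond. Below is a minimal sketch
in plain C of that check. Both helper names are made up for illustration and
are not kernel APIs, and a register may legitimately hold ~0, so this is only
a heuristic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): a 32-bit read that comes
 * back as all ones is the usual sign that the device did not respond.
 * A register may legitimately contain ~0, so treat this as a hint.
 */
static bool read_looks_failed(uint32_t val)
{
	return val == UINT32_MAX;
}

/* Hypothetical helper: true only if every dword in a non-empty dump is ~0. */
static bool dump_looks_dead(const uint32_t *dump, size_t count)
{
	size_t i;

	if (count == 0)
		return false;

	for (i = 0; i < count; i++) {
		if (!read_looks_failed(dump[i]))
			return false;
	}
	return true;
}
```

Applied to the eight dwords in the iwlwifi dump above, dump_looks_dead()
would return true, which fits the "Device gone" message that follows it.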


* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-07-23 17:03         ` Vidya Sagar
@ 2022-07-25 22:50           ` Rajat Jain
  2022-07-26  7:20             ` Lukasz Majczak
  0 siblings, 1 reply; 24+ messages in thread
From: Rajat Jain @ 2022-07-25 22:50 UTC (permalink / raw)
  To: Vidya Sagar
  Cc: Bjorn Helgaas, Lukasz Majczak, Kai-Heng Feng, Ben Chuang,
	bhelgaas, lorenzo.pieralisi, refactormyself, kw, kenny, treding,
	jonathanh, abhsahu, sagupta, linux-pci,
	Linux Kernel Mailing List, kthota, mmaddireddy, sagar.tv

Hello,

On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@nvidia.com> wrote:
>
> Agree with Bjorn's observations.
> The fact that the L1SS capability registers themselves disappeared in
> the root port post resume indicates that there seems to be something
> wrong with the BIOS itself.
> Could you please check from that perspective?

ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
is a shallower sleep state that preserves more state than, e.g., S3
(suspend-to-RAM). When we use S0ix, the BIOS does not come into the
picture at all, i.e. after the kernel runs its suspend routines, it
just puts the CPU into the S0ix state. So I do not think there is a
BIOS angle to this.
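
One way to confirm which suspend flavor a machine uses is to look at the
bracketed entry in /sys/power/mem_sleep (for example "[s2idle] deep").
Here is a minimal sketch of parsing such a line in C. The function name
selected_sleep_mode() is made up for illustration. Note that s2idle being
selected does not by itself prove that the SoC actually reached S0ix.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the bracketed (currently selected) entry
 * of a mem_sleep-style line such as "[s2idle] deep" into out.
 * Returns 0 on success, -1 if there is no bracketed entry or it does
 * not fit into out (including the terminating NUL).
 */
static int selected_sleep_mode(const char *line, char *out, size_t outlen)
{
	const char *start = strchr(line, '[');
	const char *end;
	size_t len;

	if (!start)
		return -1;

	start++;			/* skip the '[' itself */
	end = strchr(start, ']');
	if (!end)
		return -1;

	len = (size_t)(end - start);
	if (len + 1 > outlen)
		return -1;

	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

The input line would typically come from reading /sys/power/mem_sleep with
fopen() and fgets(). A result of "s2idle" matches the suspend-to-idle case
described above, while "deep" normally means S3.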


>
> Thanks,
> Vidya Sagar
>
>
> On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
> > On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
> >> On Fri, Jul 22, 2022 at 09:31, Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> >>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
> >>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>>
> >>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> >>>>> saved and restored during suspend/resume leading to L1 Substates
> >>>>> configuration being lost post-resume.
> >>>>>
> >>>>> Save the L1 Substates control registers so that the configuration is
> >>>>> retained post-resume.
> >>>>>
> >>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> >>>>> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
> >>>>
> >>>> Hi Vidya,
> >>>>
> >>>> I tested this patch on kernel v5.19-rc6.
> >>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
> >>>> This patch can restore L1SS after suspend/resume.
> >>>>
> >>>> The test results are as follows:
> >>>>
> >>>> After Boot:
> >>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>          Capabilities: [110 v1] L1 PM Substates
> >>>>                  L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>                            PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> >>>>                  L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>                             T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>                  L1SubCtl2: T_PwrOn=3100us
> >>>>
> >>>>
> >>>> After suspend/resume without this patch.
> >>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>          Capabilities: [110 v1] L1 PM Substates
> >>>>                  L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>                            PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> >>>>                  L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>                             T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>                  L1SubCtl2: T_PwrOn=10us
> >>>>
> >>>>
> >>>> After suspend/resume with this patch.
> >>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>          Capabilities: [110 v1] L1 PM Substates
> >>>>                  L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>                            PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> >>>>                  L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>                             T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>                  L1SubCtl2: T_PwrOn=3100us
> >>>>
> >>>>
> >>>> Tested-by: Ben Chuang <benchuanggli@gmail.com>
> >>>
> >>> Forgot to add mine:
> >>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> >>>
> >>>>
> >>>> Best regards,
> >>>> Ben Chuang
> >>>>
> >>>>
> >>>>> ---
> >>>>> Hi,
> >>>>> Kenneth R. Crudup <kenny@panix.com>, Could you please verify this patch
> >>>>> on your laptop (Dell XPS 13) one last time?
> >>>>> IMHO, the regression observed on your laptop with an old version of the patch
> >>>>> could be due to a buggy old version BIOS in the laptop.
> >>>>>
> >>>>> Thanks,
> >>>>> Vidya Sagar
> >>>>>
> >>>>>   drivers/pci/pci.c       |  7 +++++++
> >>>>>   drivers/pci/pci.h       |  4 ++++
> >>>>>   drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
> >>>>>   3 files changed, 55 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >>>>> index cfaf40a540a8..aca05880aaa3 100644
> >>>>> --- a/drivers/pci/pci.c
> >>>>> +++ b/drivers/pci/pci.c
> >>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
> >>>>>                  return i;
> >>>>>
> >>>>>          pci_save_ltr_state(dev);
> >>>>> +       pci_save_aspm_l1ss_state(dev);
> >>>>>          pci_save_dpc_state(dev);
> >>>>>          pci_save_aer_state(dev);
> >>>>>          pci_save_ptm_state(dev);
> >>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
> >>>>>           * LTR itself (in the PCIe capability).
> >>>>>           */
> >>>>>          pci_restore_ltr_state(dev);
> >>>>> +       pci_restore_aspm_l1ss_state(dev);
> >>>>>
> >>>>>          pci_restore_pcie_state(dev);
> >>>>>          pci_restore_pasid_state(dev);
> >>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
> >>>>>          if (error)
> >>>>>                  pci_err(dev, "unable to allocate suspend buffer for LTR\n");
> >>>>>
> >>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
> >>>>> +                                           2 * sizeof(u32));
> >>>>> +       if (error)
> >>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
> >>>>> +
> >>>>>          pci_allocate_vc_save_buffers(dev);
> >>>>>   }
> >>>>>
> >>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> >>>>> index e10cdec6c56e..92d8c92662a4 100644
> >>>>> --- a/drivers/pci/pci.h
> >>>>> +++ b/drivers/pci/pci.h
> >>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
> >>>>>   void pcie_aspm_exit_link_state(struct pci_dev *pdev);
> >>>>>   void pcie_aspm_pm_state_change(struct pci_dev *pdev);
> >>>>>   void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
> >>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
> >>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>   #else
> >>>>>   static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
> >>>>>   static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
> >>>>>   static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
> >>>>>   static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
> >>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>   #endif
> >>>>>
> >>>>>   #ifdef CONFIG_PCIE_ECRC
> >>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> >>>>> index a96b7424c9bc..2c29fdd20059 100644
> >>>>> --- a/drivers/pci/pcie/aspm.c
> >>>>> +++ b/drivers/pci/pcie/aspm.c
> >>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> >>>>>                                  PCI_L1SS_CTL1_L1SS_MASK, val);
> >>>>>   }
> >>>>>
> >>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
> >>>>> +{
> >>>>> +       int aspm_l1ss;
> >>>>> +       struct pci_cap_saved_state *save_state;
> >>>>> +       u32 *cap;
> >>>>> +
> >>>>> +       if (!pci_is_pcie(dev))
> >>>>> +               return;
> >>>>> +
> >>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>> +       if (!aspm_l1ss)
> >>>>> +               return;
> >>>>> +
> >>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>> +       if (!save_state)
> >>>>> +               return;
> >>>>> +
> >>>>> +       cap = (u32 *)&save_state->cap.data[0];
> >>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
> >>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
> >>>>> +}
> >>>>> +
> >>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
> >>>>> +{
> >>>>> +       int aspm_l1ss;
> >>>>> +       struct pci_cap_saved_state *save_state;
> >>>>> +       u32 *cap;
> >>>>> +
> >>>>> +       if (!pci_is_pcie(dev))
> >>>>> +               return;
> >>>>> +
> >>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>> +       if (!aspm_l1ss)
> >>>>> +               return;
> >>>>> +
> >>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>> +       if (!save_state)
> >>>>> +               return;
> >>>>> +
> >>>>> +       cap = (u32 *)&save_state->cap.data[0];
> >>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> >>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> >>>>> +}
> >>>>> +
> >>>>>   static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
> >>>>>   {
> >>>>>          pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
> >>>>> --
> >>>>> 2.17.1
> >>>>>
> >>
> >> Hi,
> >>
> >> With this patch (and also the one mentioned at
> >> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
> >> applied on 5.10 (chromeos-5.10), I am observing problems after
> >> suspend/resume with my WiFi card: it looks like all communication
> >> via PCI fails. I am attaching logs (dmesg, and lspci -vvv before and
> >> after suspend/resume): https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
> >>
> >> I played a little bit with this code and it looks like the
> >> pci_write_config_dword() to PCI_L1SS_CTL1 breaks it (I don't know
> >> why, I am not a PCI expert).
> >
> > Thanks a lot for testing this!  I'm not quite sure what to make of the
> > results since v5.10 is fairly old (Dec 2020) and I don't know what
> > other changes are in chromeos-5.10.

Lukasz: I assume you are running this on Atlas and are seeing this bug
when upreving it to the 5.10 kernel. Can you please try it on a newer
Intel platform that already runs the latest upstream kernel and see
if this can be reproduced there too?
Note that the wifi PCI device is different on newer Intel platforms,
but the platform design is similar enough that I suspect we should see
a similar bug on those too. The other option is to try the latest
upstream kernel on Atlas. Perhaps if we just care about wifi (and
ignore bringing up the graphics stack and GUI), it may come up
far enough to try this patch?
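
When comparing platforms, it can help to decode the raw L1SS control
registers rather than rely only on the lspci text. Below is a rough sketch
in C that decodes the LTR L1.2 threshold and T_POWER_ON fields. The mask
values are written from memory of include/uapi/linux/pci_regs.h and should
be checked against your tree. The l1ss_* helper names are made up for
illustration and are not kernel functions.

```c
#include <stdint.h>

/* Field masks as recalled from include/uapi/linux/pci_regs.h (verify). */
#define PCI_L1SS_CTL1_L1SS_MASK		0x0000000f
#define PCI_L1SS_CTL1_LTR_L12_TH_VALUE	0x03ff0000
#define PCI_L1SS_CTL1_LTR_L12_TH_SCALE	0xe0000000
#define PCI_L1SS_CTL2_T_PWR_ON_SCALE	0x00000003
#define PCI_L1SS_CTL2_T_PWR_ON_VALUE	0x000000f8

/* LTR_L1.2_THRESHOLD in ns: value * 32^scale; scales above 5 are reserved. */
static uint64_t l1ss_ltr_threshold_ns(uint32_t ctl1)
{
	uint32_t value = (ctl1 & PCI_L1SS_CTL1_LTR_L12_TH_VALUE) >> 16;
	uint32_t scale = (ctl1 & PCI_L1SS_CTL1_LTR_L12_TH_SCALE) >> 29;

	if (scale > 5)
		return 0;
	return (uint64_t)value << (5 * scale);	/* 32^scale == 2^(5*scale) */
}

/* T_POWER_ON in us: value * {2, 10, 100}; scale 3 is reserved (gives 0). */
static uint32_t l1ss_t_power_on_us(uint32_t ctl2)
{
	static const uint32_t scale_us[] = { 2, 10, 100, 0 };
	uint32_t value = (ctl2 & PCI_L1SS_CTL2_T_PWR_ON_VALUE) >> 3;

	return value * scale_us[ctl2 & PCI_L1SS_CTL2_T_PWR_ON_SCALE];
}

/* Nonzero if any of the four L1 substate enable bits is set. */
static int l1ss_any_enabled(uint32_t ctl1)
{
	return (ctl1 & PCI_L1SS_CTL1_L1SS_MASK) != 0;
}
```

As an illustration, a hypothetical CTL1 value of 0x8003000f (value 3,
scale 4, all four enables set) decodes to 3145728 ns, and a hypothetical
CTL2 value of 0xfa (value 31, scale 2) decodes to 3100 us. These are
encodings chosen to reproduce numbers seen in the lspci output in this
thread, not values read from the actual devices.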

Thanks,

Rajat


> >
> > Random observations, no analysis below.  This from your dmesg
> > certainly looks like PCI reads failing and returning ~0:
> >
> >    Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
> >    iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> >    iwlwifi 0000:01:00.0: Device gone - attempting removal
> >    Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
> >
> > And then we re-enumerate 01:00.0 and it looks like it may have been
> > reset (BAR is 0):
> >
> >    pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
> >    pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> >
> > lspci diffs from before/after suspend:
> >
> >     00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
> >       Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
> >    -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
> >    +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >    -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >    +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> >    -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
> >    +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
> >    -       Capabilities: [150 v0] Null
> >    -       Capabilities: [200 v1] L1 PM Substates
> >    -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >    -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> >    -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >    -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
> >    -               L1SubCtl2: T_PwrOn=60us
> >
> > The DevSta differences might be BIOS bugs, probably not relevant.
> > Interesting that ASPM is disabled, maybe didn't get enabled after
> > re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
> > disappeared.
> >
> >     01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
> >                    LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >    -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >    +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >            Capabilities: [154 v1] L1 PM Substates
> >                    L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >                              PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
> >    -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >    -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
> >    +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >    +                          T_CommonMode=0us LTR1.2_Threshold=0ns
> >
> > Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
> > get reinitialized after re-enumeration?  Looks like we didn't restore
> > L1SubCtl1.
> >
> > Bjorn
> >


* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-07-25 22:50           ` Rajat Jain
@ 2022-07-26  7:20             ` Lukasz Majczak
  2022-07-29  9:39               ` Lukasz Majczak
  0 siblings, 1 reply; 24+ messages in thread
From: Lukasz Majczak @ 2022-07-26  7:20 UTC (permalink / raw)
  To: Rajat Jain
  Cc: Vidya Sagar, Bjorn Helgaas, Kai-Heng Feng, Ben Chuang, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, kenny, treding, jonathanh,
	abhsahu, sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv

On Tue, Jul 26, 2022 at 00:51, Rajat Jain <rajatja@google.com> wrote:
>
> Hello,
>
> On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@nvidia.com> wrote:
> >
> > Agree with Bjorn's observations.
> > The fact that the L1SS capability registers themselves disappeared in
> > the root port post resume indicates that there seems to be something
> > wrong with the BIOS itself.
> > Could you please check from that perspective?
>
> ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
> is a shallower sleep state that preserves more state than, e.g., S3
> (suspend-to-RAM). When we use S0ix, the BIOS does not come into the
> picture at all, i.e. after the kernel runs its suspend routines, it
> just puts the CPU into the S0ix state. So I do not think there is a
> BIOS angle to this.
>
>
> >
> > Thanks,
> > Vidya Sagar
> >
> >
> > On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
> > > On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
> > >> On Fri, Jul 22, 2022 at 09:31, Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> > >>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
> > >>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
> > >>>>>
> > >>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> > >>>>> saved and restored during suspend/resume leading to L1 Substates
> > >>>>> configuration being lost post-resume.
> > >>>>>
> > >>>>> Save the L1 Substates control registers so that the configuration is
> > >>>>> retained post-resume.
> > >>>>>
> > >>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > >>>>> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
> > >>>>
> > >>>> Hi Vidya,
> > >>>>
> > >>>> I tested this patch on kernel v5.19-rc6.
> > >>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
> > >>>> This patch can restore L1SS after suspend/resume.
> > >>>>
> > >>>> The test results are as follows:
> > >>>>
> > >>>> After Boot:
> > >>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> > >>>>          Capabilities: [110 v1] L1 PM Substates
> > >>>>                  L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > >>>>                            PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> > >>>>                  L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > >>>>                             T_CommonMode=0us LTR1.2_Threshold=3145728ns
> > >>>>                  L1SubCtl2: T_PwrOn=3100us
> > >>>>
> > >>>>
> > >>>> After suspend/resume without this patch.
> > >>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> > >>>>          Capabilities: [110 v1] L1 PM Substates
> > >>>>                  L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > >>>>                            PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> > >>>>                  L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> > >>>>                             T_CommonMode=0us LTR1.2_Threshold=0ns
> > >>>>                  L1SubCtl2: T_PwrOn=10us
> > >>>>
> > >>>>
> > >>>> After suspend/resume with this patch.
> > >>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> > >>>>          Capabilities: [110 v1] L1 PM Substates
> > >>>>                  L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > >>>>                            PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> > >>>>                  L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > >>>>                             T_CommonMode=0us LTR1.2_Threshold=3145728ns
> > >>>>                  L1SubCtl2: T_PwrOn=3100us
> > >>>>
> > >>>>
> > >>>> Tested-by: Ben Chuang <benchuanggli@gmail.com>
> > >>>
> > >>> Forgot to add mine:
> > >>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > >>>
> > >>>>
> > >>>> Best regards,
> > >>>> Ben Chuang
> > >>>>
> > >>>>
> > >>>>> ---
> > >>>>> Hi,
> > >>>>> Kenneth R. Crudup <kenny@panix.com>, Could you please verify this patch
> > >>>>> on your laptop (Dell XPS 13) one last time?
> > >>>>> IMHO, the regression observed on your laptop with an old version of the patch
> > >>>>> could be due to a buggy old version BIOS in the laptop.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Vidya Sagar
> > >>>>>
> > >>>>>   drivers/pci/pci.c       |  7 +++++++
> > >>>>>   drivers/pci/pci.h       |  4 ++++
> > >>>>>   drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
> > >>>>>   3 files changed, 55 insertions(+)
> > >>>>>
> > >>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > >>>>> index cfaf40a540a8..aca05880aaa3 100644
> > >>>>> --- a/drivers/pci/pci.c
> > >>>>> +++ b/drivers/pci/pci.c
> > >>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
> > >>>>>                  return i;
> > >>>>>
> > >>>>>          pci_save_ltr_state(dev);
> > >>>>> +       pci_save_aspm_l1ss_state(dev);
> > >>>>>          pci_save_dpc_state(dev);
> > >>>>>          pci_save_aer_state(dev);
> > >>>>>          pci_save_ptm_state(dev);
> > >>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
> > >>>>>           * LTR itself (in the PCIe capability).
> > >>>>>           */
> > >>>>>          pci_restore_ltr_state(dev);
> > >>>>> +       pci_restore_aspm_l1ss_state(dev);
> > >>>>>
> > >>>>>          pci_restore_pcie_state(dev);
> > >>>>>          pci_restore_pasid_state(dev);
> > >>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
> > >>>>>          if (error)
> > >>>>>                  pci_err(dev, "unable to allocate suspend buffer for LTR\n");
> > >>>>>
> > >>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
> > >>>>> +                                           2 * sizeof(u32));
> > >>>>> +       if (error)
> > >>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
> > >>>>> +
> > >>>>>          pci_allocate_vc_save_buffers(dev);
> > >>>>>   }
> > >>>>>
> > >>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > >>>>> index e10cdec6c56e..92d8c92662a4 100644
> > >>>>> --- a/drivers/pci/pci.h
> > >>>>> +++ b/drivers/pci/pci.h
> > >>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
> > >>>>>   void pcie_aspm_exit_link_state(struct pci_dev *pdev);
> > >>>>>   void pcie_aspm_pm_state_change(struct pci_dev *pdev);
> > >>>>>   void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
> > >>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
> > >>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
> > >>>>>   #else
> > >>>>>   static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
> > >>>>>   static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
> > >>>>>   static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
> > >>>>>   static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
> > >>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
> > >>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
> > >>>>>   #endif
> > >>>>>
> > >>>>>   #ifdef CONFIG_PCIE_ECRC
> > >>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > >>>>> index a96b7424c9bc..2c29fdd20059 100644
> > >>>>> --- a/drivers/pci/pcie/aspm.c
> > >>>>> +++ b/drivers/pci/pcie/aspm.c
> > >>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> > >>>>>                                  PCI_L1SS_CTL1_L1SS_MASK, val);
> > >>>>>   }
> > >>>>>
> > >>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
> > >>>>> +{
> > >>>>> +       int aspm_l1ss;
> > >>>>> +       struct pci_cap_saved_state *save_state;
> > >>>>> +       u32 *cap;
> > >>>>> +
> > >>>>> +       if (!pci_is_pcie(dev))
> > >>>>> +               return;
> > >>>>> +
> > >>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> > >>>>> +       if (!aspm_l1ss)
> > >>>>> +               return;
> > >>>>> +
> > >>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> > >>>>> +       if (!save_state)
> > >>>>> +               return;
> > >>>>> +
> > >>>>> +       cap = (u32 *)&save_state->cap.data[0];
> > >>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
> > >>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
> > >>>>> +}
> > >>>>> +
> > >>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
> > >>>>> +{
> > >>>>> +       int aspm_l1ss;
> > >>>>> +       struct pci_cap_saved_state *save_state;
> > >>>>> +       u32 *cap;
> > >>>>> +
> > >>>>> +       if (!pci_is_pcie(dev))
> > >>>>> +               return;
> > >>>>> +
> > >>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> > >>>>> +       if (!aspm_l1ss)
> > >>>>> +               return;
> > >>>>> +
> > >>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> > >>>>> +       if (!save_state)
> > >>>>> +               return;
> > >>>>> +
> > >>>>> +       cap = (u32 *)&save_state->cap.data[0];
> > >>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> > >>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> > >>>>> +}
> > >>>>> +
> > >>>>>   static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
> > >>>>>   {
> > >>>>>          pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
> > >>>>> --
> > >>>>> 2.17.1
> > >>>>>
> > >>
> > >> Hi,
> > >>
> > >> With this patch (and also the one mentioned at
> > >> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
> > >> applied on 5.10 (chromeos-5.10) I am observing problems after
> > >> suspend/resume with my WiFi card - it looks like all communication
> > >> over PCI fails. Attaching logs (dmesg, lspci -vvv before and after
> > >> suspend/resume): https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
> > >>
> > >> I played a little bit with this code and it looks like the
> > >> pci_write_config_dword() to PCI_L1SS_CTL1 breaks it (I don't know
> > >> why, I'm not a PCI expert).
> > >
> > > Thanks a lot for testing this!  I'm not quite sure what to make of the
> > > results since v5.10 is fairly old (Dec 2020) and I don't know what
> > > other changes are in chromeos-5.10.
>
> Lukasz: I assume you are running this on Atlas and are seeing this bug
> when uprev'ing it to the 5.10 kernel. Can you please try it on a newer
> Intel platform that already has the latest upstream kernel running
> and see if this can be reproduced there too?
> Note that the wifi PCI device is different on newer Intel platforms,
> but the platform design is similar enough that I suspect we should see
> a similar bug on those too. The other option is to try the latest
> upstream kernel on Atlas. Perhaps if we just care about wifi (and
> ignore bringing up the graphics stack and GUI), it may come up
> far enough to try this patch?
>
> Thanks,
>
> Rajat
>
>
> > >
> > > Random observations, no analysis below.  This from your dmesg
> > > certainly looks like PCI reads failing and returning ~0:
> > >
> > >    Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
> > >    iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> > >    iwlwifi 0000:01:00.0: Device gone - attempting removal
> > >    Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
> > >
> > > And then we re-enumerate 01:00.0 and it looks like it may have been
> > > reset (BAR is 0):
> > >
> > >    pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
> > >    pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> > >
> > > lspci diffs from before/after suspend:
> > >
> > >     00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
> > >       Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
> > >    -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
> > >    +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> > >    -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> > >    +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> > >    -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
> > >    +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
> > >    -       Capabilities: [150 v0] Null
> > >    -       Capabilities: [200 v1] L1 PM Substates
> > >    -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > >    -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> > >    -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > >    -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
> > >    -               L1SubCtl2: T_PwrOn=60us
> > >
> > > The DevSta differences might be BIOS bugs, probably not relevant.
> > > Interesting that ASPM is disabled, maybe didn't get enabled after
> > > re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
> > > disappeared.
> > >
> > >     01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
> > >                    LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> > >    -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> > >    +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > >            Capabilities: [154 v1] L1 PM Substates
> > >                    L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > >                              PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
> > >    -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > >    -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
> > >    +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> > >    +                          T_CommonMode=0us LTR1.2_Threshold=0ns
> > >
> > > Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
> > > get reinitialized after re-enumeration?  Looks like we didn't restore
> > > L1SubCtl1.
> > >
> > > Bjorn
> > >

Hi,

Thank you all for the responses and input! As Rajat mentioned, I'm
using a Chromebook, but not Atlas (Amberlake); in this case it is
Babymega (Apollolake). I will try to load the most recent kernel and
give it another try.

Best regards,
Lukasz

* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-07-26  7:20             ` Lukasz Majczak
@ 2022-07-29  9:39               ` Lukasz Majczak
  2022-07-29 14:35                 ` Vidya Sagar
  0 siblings, 1 reply; 24+ messages in thread
From: Lukasz Majczak @ 2022-07-29  9:39 UTC (permalink / raw)
  To: Rajat Jain
  Cc: Vidya Sagar, Bjorn Helgaas, Kai-Heng Feng, Ben Chuang, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, kenny, treding, jonathanh,
	abhsahu, sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv

On Tue, Jul 26, 2022 at 09:20 Lukasz Majczak <lma@semihalf.com> wrote:
>
> On Tue, Jul 26, 2022 at 00:51 Rajat Jain <rajatja@google.com> wrote:
> >
> > Hello,
> >
> > On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@nvidia.com> wrote:
> > >
> > > Agree with Bjorn's observations.
> > > The fact that the L1SS capability registers themselves disappeared in
> > > the root port post resume indicates that there seems to be something
> > > wrong with the BIOS itself.
> > > Could you please check from that perspective?
> >
> > ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
> > is a shallower sleep state that preserves more state than, e.g., S3
> > (suspend-to-RAM). When we use S0ix, the BIOS does not come into the
> > picture at all, i.e. after the kernel runs its suspend routines, it
> > just puts the CPU into the S0ix state. So I do not think there is a
> > BIOS angle to this.
> >
> >
> > >
> > > Thanks,
> > > Vidya Sagar
> > >
> > >
> > > On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
> > > >
> > > > On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
> > > >> On Fri, Jul 22, 2022 at 09:31 Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> > > >>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
> > > >>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
> > > >>>>>
> > > >>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> > > >>>>> saved and restored during suspend/resume leading to L1 Substates
> > > >>>>> configuration being lost post-resume.
> > > >>>>>
> > > >>>>> Save the L1 Substates control registers so that the configuration is
> > > >>>>> retained post-resume.
> > > >>>>>
> > > >>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > > >>>>> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
> > > >>>>
> > > >>>> Hi Vidya,
> > > >>>>
> > > >>>> I tested this patch on kernel v5.19-rc6.
> > > >>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
> > > >>>> This patch can restore L1SS after suspend/resume.
> > > >>>>
> > > >>>> The test results are as follows:
> > > >>>>
> > > >>>> After Boot:
> > > >>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> > > >>>>          Capabilities: [110 v1] L1 PM Substates
> > > >>>>                  L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> > > >>>> ASPM_L1.1+ L1_PM_Substates+
> > > >>>>                            PortCommonModeRestoreTime=255us
> > > >>>> PortTPowerOnTime=3100us
> > > >>>>                  L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > > >>>>                             T_CommonMode=0us LTR1.2_Threshold=3145728ns
> > > >>>>                  L1SubCtl2: T_PwrOn=3100us
> > > >>>>
> > > >>>>
> > > >>>> After suspend/resume without this patch.
> > > >>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> > > >>>>          Capabilities: [110 v1] L1 PM Substates
> > > >>>>                  L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> > > >>>> ASPM_L1.1+ L1_PM_Substates+
> > > >>>>                            PortCommonModeRestoreTime=255us
> > > >>>> PortTPowerOnTime=3100us
> > > >>>>                  L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> > > >>>>                             T_CommonMode=0us LTR1.2_Threshold=0ns
> > > >>>>                  L1SubCtl2: T_PwrOn=10us
> > > >>>>
> > > >>>>
> > > >>>> After suspend/resume with this patch.
> > > >>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> > > >>>>          Capabilities: [110 v1] L1 PM Substates
> > > >>>>                  L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> > > >>>> ASPM_L1.1+ L1_PM_Substates+
> > > >>>>                            PortCommonModeRestoreTime=255us
> > > >>>> PortTPowerOnTime=3100us
> > > >>>>                  L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > > >>>>                             T_CommonMode=0us LTR1.2_Threshold=3145728ns
> > > >>>>                  L1SubCtl2: T_PwrOn=3100us
> > > >>>>
> > > >>>>
> > > >>>> Tested-by: Ben Chuang <benchuanggli@gmail.com>
> > > >>>
> > > >>> Forgot to add mine:
> > > >>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > > >>>
> > > >>>>
> > > >>>> Best regards,
> > > >>>> Ben Chuang
> > > >>>>
> > > >>>>
> > > >>>>> ---
> > > >>>>> Hi,
> > > >>>>> Kenneth R. Crudup <kenny@panix.com>, Could you please verify this patch
> > > >>>>> on your laptop (Dell XPS 13) one last time?
> > > >>>>> IMHO, the regression observed on your laptop with an old version of the patch
> > > >>>>> could be due to a buggy old version BIOS in the laptop.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Vidya Sagar
> > > >>>>>
> > > >>>>>   drivers/pci/pci.c       |  7 +++++++
> > > >>>>>   drivers/pci/pci.h       |  4 ++++
> > > >>>>>   drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
> > > >>>>>   3 files changed, 55 insertions(+)
> > > >>>>>
> > > >>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > >>>>> index cfaf40a540a8..aca05880aaa3 100644
> > > >>>>> --- a/drivers/pci/pci.c
> > > >>>>> +++ b/drivers/pci/pci.c
> > > >>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
> > > >>>>>                  return i;
> > > >>>>>
> > > >>>>>          pci_save_ltr_state(dev);
> > > >>>>> +       pci_save_aspm_l1ss_state(dev);
> > > >>>>>          pci_save_dpc_state(dev);
> > > >>>>>          pci_save_aer_state(dev);
> > > >>>>>          pci_save_ptm_state(dev);
> > > >>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
> > > >>>>>           * LTR itself (in the PCIe capability).
> > > >>>>>           */
> > > >>>>>          pci_restore_ltr_state(dev);
> > > >>>>> +       pci_restore_aspm_l1ss_state(dev);
> > > >>>>>
> > > >>>>>          pci_restore_pcie_state(dev);
> > > >>>>>          pci_restore_pasid_state(dev);
> > > >>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
> > > >>>>>          if (error)
> > > >>>>>                  pci_err(dev, "unable to allocate suspend buffer for LTR\n");
> > > >>>>>
> > > >>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
> > > >>>>> +                                           2 * sizeof(u32));
> > > >>>>> +       if (error)
> > > >>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
> > > >>>>> +
> > > >>>>>          pci_allocate_vc_save_buffers(dev);
> > > >>>>>   }
> > > >>>>>
> > > >>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > >>>>> index e10cdec6c56e..92d8c92662a4 100644
> > > >>>>> --- a/drivers/pci/pci.h
> > > >>>>> +++ b/drivers/pci/pci.h
> > > >>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
> > > >>>>>   void pcie_aspm_exit_link_state(struct pci_dev *pdev);
> > > >>>>>   void pcie_aspm_pm_state_change(struct pci_dev *pdev);
> > > >>>>>   void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
> > > >>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
> > > >>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
> > > >>>>>   #else
> > > >>>>>   static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
> > > >>>>>   static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
> > > >>>>>   static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
> > > >>>>>   static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
> > > >>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
> > > >>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
> > > >>>>>   #endif
> > > >>>>>
> > > >>>>>   #ifdef CONFIG_PCIE_ECRC
> > > >>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > >>>>> index a96b7424c9bc..2c29fdd20059 100644
> > > >>>>> --- a/drivers/pci/pcie/aspm.c
> > > >>>>> +++ b/drivers/pci/pcie/aspm.c
> > > >>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> > > >>>>>                                  PCI_L1SS_CTL1_L1SS_MASK, val);
> > > >>>>>   }
> > > >>>>>
> > > >>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
> > > >>>>> +{
> > > >>>>> +       int aspm_l1ss;
> > > >>>>> +       struct pci_cap_saved_state *save_state;
> > > >>>>> +       u32 *cap;
> > > >>>>> +
> > > >>>>> +       if (!pci_is_pcie(dev))
> > > >>>>> +               return;
> > > >>>>> +
> > > >>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> > > >>>>> +       if (!aspm_l1ss)
> > > >>>>> +               return;
> > > >>>>> +
> > > >>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> > > >>>>> +       if (!save_state)
> > > >>>>> +               return;
> > > >>>>> +
> > > >>>>> +       cap = (u32 *)&save_state->cap.data[0];
> > > >>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
> > > >>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
> > > >>>>> +}
> > > >>>>> +
> > > >>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
> > > >>>>> +{
> > > >>>>> +       int aspm_l1ss;
> > > >>>>> +       struct pci_cap_saved_state *save_state;
> > > >>>>> +       u32 *cap;
> > > >>>>> +
> > > >>>>> +       if (!pci_is_pcie(dev))
> > > >>>>> +               return;
> > > >>>>> +
> > > >>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> > > >>>>> +       if (!aspm_l1ss)
> > > >>>>> +               return;
> > > >>>>> +
> > > >>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> > > >>>>> +       if (!save_state)
> > > >>>>> +               return;
> > > >>>>> +
> > > >>>>> +       cap = (u32 *)&save_state->cap.data[0];
> > > >>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> > > >>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> > > >>>>> +}
> > > >>>>> +
> > > >>>>>   static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
> > > >>>>>   {
> > > >>>>>          pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
> > > >>>>> --
> > > >>>>> 2.17.1
> > > >>>>>
> > > >>
> > > >> Hi,
> > > >>
> > > >> With this patch (and also the one mentioned at
> > > >> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
> > > >> applied on 5.10 (chromeos-5.10) I am observing problems after
> > > >> suspend/resume with my WiFi card - it looks like all communication
> > > >> over PCI fails. Attaching logs (dmesg, lspci -vvv before and after
> > > >> suspend/resume): https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
> > > >>
> > > >> I played a little bit with this code and it looks like the
> > > >> pci_write_config_dword() to PCI_L1SS_CTL1 breaks it (I don't know
> > > >> why, I'm not a PCI expert).
> > > >
> > > > Thanks a lot for testing this!  I'm not quite sure what to make of the
> > > > results since v5.10 is fairly old (Dec 2020) and I don't know what
> > > > other changes are in chromeos-5.10.
> >
> > Lukasz: I assume you are running this on Atlas and are seeing this bug
> > when uprev'ing it to the 5.10 kernel. Can you please try it on a newer
> > Intel platform that already has the latest upstream kernel running
> > and see if this can be reproduced there too?
> > Note that the wifi PCI device is different on newer Intel platforms,
> > but the platform design is similar enough that I suspect we should see
> > a similar bug on those too. The other option is to try the latest
> > upstream kernel on Atlas. Perhaps if we just care about wifi (and
> > ignore bringing up the graphics stack and GUI), it may come up
> > far enough to try this patch?
> >
> > Thanks,
> >
> > Rajat
> >
> >
> > > >
> > > > Random observations, no analysis below.  This from your dmesg
> > > > certainly looks like PCI reads failing and returning ~0:
> > > >
> > > >    Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
> > > >    iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> > > >    iwlwifi 0000:01:00.0: Device gone - attempting removal
> > > >    Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
> > > >
> > > > And then we re-enumerate 01:00.0 and it looks like it may have been
> > > > reset (BAR is 0):
> > > >
> > > >    pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
> > > >    pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> > > >
> > > > lspci diffs from before/after suspend:
> > > >
> > > >     00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
> > > >       Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
> > > >    -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
> > > >    +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> > > >    -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> > > >    +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> > > >    -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
> > > >    +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
> > > >    -       Capabilities: [150 v0] Null
> > > >    -       Capabilities: [200 v1] L1 PM Substates
> > > >    -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > > >    -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> > > >    -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > > >    -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
> > > >    -               L1SubCtl2: T_PwrOn=60us
> > > >
> > > > The DevSta differences might be BIOS bugs, probably not relevant.
> > > > Interesting that ASPM is disabled, maybe didn't get enabled after
> > > > re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
> > > > disappeared.
> > > >
> > > >     01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
> > > >                    LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> > > >    -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> > > >    +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > >            Capabilities: [154 v1] L1 PM Substates
> > > >                    L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > > >                              PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
> > > >    -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > > >    -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
> > > >    +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> > > >    +                          T_CommonMode=0us LTR1.2_Threshold=0ns
> > > >
> > > > Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
> > > > get reinitialized after re-enumeration?  Looks like we didn't restore
> > > > L1SubCtl1.
> > > >
> > > > Bjorn
> > > >
>
> Hi,
>
> Thank you all for the responses and input! As Rajat mentioned, I'm
> using a Chromebook, but not Atlas (Amberlake); in this case it is
> Babymega (Apollolake). I will try to load the most recent kernel and
> give it another try.
>
> Best regards,
> Lukasz

Hi,

I have applied this patch on top of v5.19-rc7 (chromeos) and I'm
still getting the same results:
https://gist.github.com/semihalf-majczak-lukasz/4b716704c21a3758d6711b2030ea34b9

Best regards,
Lukasz

* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-07-29  9:39               ` Lukasz Majczak
@ 2022-07-29 14:35                 ` Vidya Sagar
  2022-08-03 12:04                   ` Lukasz Majczak
  0 siblings, 1 reply; 24+ messages in thread
From: Vidya Sagar @ 2022-07-29 14:35 UTC (permalink / raw)
  To: Lukasz Majczak, Rajat Jain
  Cc: Bjorn Helgaas, Kai-Heng Feng, Ben Chuang, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, kenny, treding, jonathanh,
	abhsahu, sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv

Hi Lukasz,
Thanks for sharing your observations.

Could you please also share the output of 'sudo lspci -vvvv' before and
after a suspend-resume cycle with the latest linux-next?
Does the L1SS capability still disappear after resume?

Thanks,
Vidya Sagar
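
For anyone reproducing this, a minimal sketch of the before/after
comparison requested above. The helper names l1ss_blocks and
l1ss_compare and the file names before.txt and after.txt are
hypothetical and not part of pciutils; the grep filter is the same one
used earlier in this thread. It assumes a grep that supports -A and a
diff that supports -u.

```shell
# Hypothetical helper: reduce a saved `lspci -vvv` capture (a file) to its
# "L1 PM Substates" capability blocks.
l1ss_blocks() {
    grep -A5 "L1 PM Substates" "$1"
}

# Hypothetical helper: compare the L1SS blocks of two saved captures.
# Prints a unified diff and returns non-zero when they differ, including
# the case where the capability is missing from one capture.
l1ss_compare() {
    tmp_before=$(mktemp) && tmp_after=$(mktemp) || return 2
    l1ss_blocks "$1" > "$tmp_before"
    l1ss_blocks "$2" > "$tmp_after"
    diff -u "$tmp_before" "$tmp_after"
    rc=$?
    rm -f "$tmp_before" "$tmp_after"
    return $rc
}

# Intended use on the test machine:
#   sudo lspci -vvv > before.txt
#   ... suspend and resume ...
#   sudo lspci -vvv > after.txt
#   l1ss_compare before.txt after.txt
```

An empty diff would suggest the L1SS control registers survived the
cycle; a block that is present only in before.txt would match the
"capability disappeared" symptom reported for the root port.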

On 7/29/2022 3:09 PM, Lukasz Majczak wrote:
> 
> On Tue, Jul 26, 2022 at 09:20 Lukasz Majczak <lma@semihalf.com> wrote:
>>
>> On Tue, Jul 26, 2022 at 00:51 Rajat Jain <rajatja@google.com> wrote:
>>>
>>> Hello,
>>>
>>> On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>
>>>> Agree with Bjorn's observations.
>>>> The fact that the L1SS capability registers themselves disappeared in
>>>> the root port post resume indicates that there seems to be something
>>>> wrong with the BIOS itself.
>>>> Could you please check from that perspective?
>>>
>>> ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
>>> is a shallower sleep state that preserves more state than, e.g., S3
>>> (suspend-to-RAM). When we use S0ix, the BIOS does not come into the
>>> picture at all, i.e. after the kernel runs its suspend routines, it
>>> just puts the CPU into the S0ix state. So I do not think there is a
>>> BIOS angle to this.
>>>
>>>
>>>>
>>>> Thanks,
>>>> Vidya Sagar
>>>>
>>>>
>>>> On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
>>>>>
>>>>> On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
>>>>>> On Fri, Jul 22, 2022 at 09:31 Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
>>>>>>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
>>>>>>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>>>>>>
>>>>>>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
>>>>>>>>> saved and restored during suspend/resume leading to L1 Substates
>>>>>>>>> configuration being lost post-resume.
>>>>>>>>>
>>>>>>>>> Save the L1 Substates control registers so that the configuration is
>>>>>>>>> retained post-resume.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>>>>>>> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
>>>>>>>>
>>>>>>>> Hi Vidya,
>>>>>>>>
>>>>>>>> I tested this patch on kernel v5.19-rc6.
>>>>>>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
>>>>>>>> This patch can restore L1SS after suspend/resume.
>>>>>>>>
>>>>>>>> The test results are as follows:
>>>>>>>>
>>>>>>>> After Boot:
>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>           Capabilities: [110 v1] L1 PM Substates
>>>>>>>>                   L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>                             PortCommonModeRestoreTime=255us
>>>>>>>> PortTPowerOnTime=3100us
>>>>>>>>                   L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>                              T_CommonMode=0us LTR1.2_Threshold=3145728ns
>>>>>>>>                   L1SubCtl2: T_PwrOn=3100us
>>>>>>>>
>>>>>>>>
>>>>>>>> After suspend/resume without this patch.
>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>           Capabilities: [110 v1] L1 PM Substates
>>>>>>>>                   L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>                             PortCommonModeRestoreTime=255us
>>>>>>>> PortTPowerOnTime=3100us
>>>>>>>>                   L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>>>>>>>>                              T_CommonMode=0us LTR1.2_Threshold=0ns
>>>>>>>>                   L1SubCtl2: T_PwrOn=10us
>>>>>>>>
>>>>>>>>
>>>>>>>> After suspend/resume with this patch.
>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>           Capabilities: [110 v1] L1 PM Substates
>>>>>>>>                   L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>                             PortCommonModeRestoreTime=255us
>>>>>>>> PortTPowerOnTime=3100us
>>>>>>>>                   L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>                              T_CommonMode=0us LTR1.2_Threshold=3145728ns
>>>>>>>>                   L1SubCtl2: T_PwrOn=3100us
>>>>>>>>
>>>>>>>>
>>>>>>>> Tested-by: Ben Chuang <benchuanggli@gmail.com>
>>>>>>>
>>>>>>> Forgot to add mine:
>>>>>>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>>>>>>
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Ben Chuang
>>>>>>>>
>>>>>>>>
>>>>>>>>> ---
>>>>>>>>> Hi,
>>>>>>>>> Kenneth R. Crudup <kenny@panix.com>, Could you please verify this patch
>>>>>>>>> on your laptop (Dell XPS 13) one last time?
>>>>>>>>> IMHO, the regression observed on your laptop with an old version of the patch
>>>>>>>>> could be due to a buggy old version BIOS in the laptop.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vidya Sagar
>>>>>>>>>
>>>>>>>>>    drivers/pci/pci.c       |  7 +++++++
>>>>>>>>>    drivers/pci/pci.h       |  4 ++++
>>>>>>>>>    drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>    3 files changed, 55 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>>>>> index cfaf40a540a8..aca05880aaa3 100644
>>>>>>>>> --- a/drivers/pci/pci.c
>>>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
>>>>>>>>>                   return i;
>>>>>>>>>
>>>>>>>>>           pci_save_ltr_state(dev);
>>>>>>>>> +       pci_save_aspm_l1ss_state(dev);
>>>>>>>>>           pci_save_dpc_state(dev);
>>>>>>>>>           pci_save_aer_state(dev);
>>>>>>>>>           pci_save_ptm_state(dev);
>>>>>>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
>>>>>>>>>            * LTR itself (in the PCIe capability).
>>>>>>>>>            */
>>>>>>>>>           pci_restore_ltr_state(dev);
>>>>>>>>> +       pci_restore_aspm_l1ss_state(dev);
>>>>>>>>>
>>>>>>>>>           pci_restore_pcie_state(dev);
>>>>>>>>>           pci_restore_pasid_state(dev);
>>>>>>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
>>>>>>>>>           if (error)
>>>>>>>>>                   pci_err(dev, "unable to allocate suspend buffer for LTR\n");
>>>>>>>>>
>>>>>>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
>>>>>>>>> +                                           2 * sizeof(u32));
>>>>>>>>> +       if (error)
>>>>>>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
>>>>>>>>> +
>>>>>>>>>           pci_allocate_vc_save_buffers(dev);
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>>>>>>> index e10cdec6c56e..92d8c92662a4 100644
>>>>>>>>> --- a/drivers/pci/pci.h
>>>>>>>>> +++ b/drivers/pci/pci.h
>>>>>>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
>>>>>>>>>    void pcie_aspm_exit_link_state(struct pci_dev *pdev);
>>>>>>>>>    void pcie_aspm_pm_state_change(struct pci_dev *pdev);
>>>>>>>>>    void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
>>>>>>>>>    #else
>>>>>>>>>    static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
>>>>>>>>>    static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
>>>>>>>>>    static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
>>>>>>>>>    static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
>>>>>>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
>>>>>>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
>>>>>>>>>    #endif
>>>>>>>>>
>>>>>>>>>    #ifdef CONFIG_PCIE_ECRC
>>>>>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>>>>>>> index a96b7424c9bc..2c29fdd20059 100644
>>>>>>>>> --- a/drivers/pci/pcie/aspm.c
>>>>>>>>> +++ b/drivers/pci/pcie/aspm.c
>>>>>>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
>>>>>>>>>                                   PCI_L1SS_CTL1_L1SS_MASK, val);
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
>>>>>>>>> +{
>>>>>>>>> +       int aspm_l1ss;
>>>>>>>>> +       struct pci_cap_saved_state *save_state;
>>>>>>>>> +       u32 *cap;
>>>>>>>>> +
>>>>>>>>> +       if (!pci_is_pcie(dev))
>>>>>>>>> +               return;
>>>>>>>>> +
>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>> +       if (!aspm_l1ss)
>>>>>>>>> +               return;
>>>>>>>>> +
>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>> +       if (!save_state)
>>>>>>>>> +               return;
>>>>>>>>> +
>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
>>>>>>>>> +{
>>>>>>>>> +       int aspm_l1ss;
>>>>>>>>> +       struct pci_cap_saved_state *save_state;
>>>>>>>>> +       u32 *cap;
>>>>>>>>> +
>>>>>>>>> +       if (!pci_is_pcie(dev))
>>>>>>>>> +               return;
>>>>>>>>> +
>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>> +       if (!aspm_l1ss)
>>>>>>>>> +               return;
>>>>>>>>> +
>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>> +       if (!save_state)
>>>>>>>>> +               return;
>>>>>>>>> +
>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>    static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
>>>>>>>>>    {
>>>>>>>>>           pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
>>>>>>>>> --
>>>>>>>>> 2.17.1
>>>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> With this patch (and also mentioned
>>>>>> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
>>>>>> applied on 5.10 (chromeos-5.10) I am observing problems after
>>>>>> suspend/resume with my WiFi card - it looks like whole communication
>>>>>> via PCI fails. Attaching logs (dmesg, lspci -vvv before suspend/resume
>>>>>> and after) https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
>>>>>>
>>>>>> I played a little bit with this code and it looks like the
>>>>>> pci_write_config_dword() to the PCI_L1SS_CTL1 breaks it (don't know
>>>>>> why, not a PCI expert).
>>>>>
>>>>> Thanks a lot for testing this!  I'm not quite sure what to make of the
>>>>> results since v5.10 is fairly old (Dec 2020) and I don't know what
>>>>> other changes are in chromeos-5.10.
>>>
>>> Lukasz: I assume you are running this on Atlas and are seeing this bug
>>> when upreving it to the 5.10 kernel. Can you please try it on a newer
>>> Intel platform that already has the latest upstream kernel running
>>> and see if this can be reproduced there too?
>>> Note that the wifi PCI device is different on newer Intel platforms,
>>> but the platform design is similar enough that I suspect we should see
>>> a similar bug on those too. The other option is to try the latest
>>> upstream kernel on Atlas. Perhaps if we just care about wifi (and
>>> ignore bringing up the graphics stack and GUI), it may come up
>>> far enough to try this patch?
>>>
>>> Thanks,
>>>
>>> Rajat
>>>
>>>
>>>>>
>>>>> Random observations, no analysis below.  This from your dmesg
>>>>> certainly looks like PCI reads failing and returning ~0:
>>>>>
>>>>>     Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
>>>>>     iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
>>>>>     iwlwifi 0000:01:00.0: Device gone - attempting removal
>>>>>     Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
>>>>>
>>>>> And then we re-enumerate 01:00.0 and it looks like it may have been
>>>>> reset (BAR is 0):
>>>>>
>>>>>     pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
>>>>>     pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
>>>>>
>>>>> lspci diffs from before/after suspend:
>>>>>
>>>>>      00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
>>>>>        Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
>>>>>     -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
>>>>>     +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>     -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
>>>>>     +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>>>>>     -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
>>>>>     +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
>>>>>     -       Capabilities: [150 v0] Null
>>>>>     -       Capabilities: [200 v1] L1 PM Substates
>>>>>     -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>>     -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
>>>>>     -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>     -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
>>>>>     -               L1SubCtl2: T_PwrOn=60us
>>>>>
>>>>> The DevSta differences might be BIOS bugs, probably not relevant.
>>>>> Interesting that ASPM is disabled, maybe didn't get enabled after
>>>>> re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
>>>>> disappeared.
>>>>>
>>>>>      01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
>>>>>                     LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
>>>>>     -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>     +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>>>             Capabilities: [154 v1] L1 PM Substates
>>>>>                     L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>>                               PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
>>>>>     -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>     -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
>>>>>     +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>>>>>     +                          T_CommonMode=0us LTR1.2_Threshold=0ns
>>>>>
>>>>> Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
>>>>> get reinitialized after re-enumeration?  Looks like we didn't restore
>>>>> L1SubCtl1.
>>>>>
>>>>> Bjorn
>>>>>
>>
>> Hi,
>>
>> Thank you all for the response and input! As Rajat mentioned, I'm using
>> a Chromebook, though not Atlas (Amber Lake); in this case it is Babymega
>> (Apollo Lake). I will try to load the most recent kernel and give it
>> another try.
>>
>> Best regards,
>> Lukasz
> 
> Hi,
> 
> I have applied this patch on top of v5.19-rc7 (chromeos) and I'm
> still getting the same results:
> https://gist.github.com/semihalf-majczak-lukasz/4b716704c21a3758d6711b2030ea34b9
> 
> Best regards,
> Lukasz
> 


* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-07-29 14:35                 ` Vidya Sagar
@ 2022-08-03 12:04                   ` Lukasz Majczak
  2022-08-03 12:55                     ` Vidya Sagar
  0 siblings, 1 reply; 24+ messages in thread
From: Lukasz Majczak @ 2022-08-03 12:04 UTC (permalink / raw)
  To: Vidya Sagar
  Cc: Rajat Jain, Bjorn Helgaas, Kai-Heng Feng, Ben Chuang, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, kenny, treding, jonathanh,
	abhsahu, sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv

On Fri, 29 Jul 2022 at 16:36, Vidya Sagar <vidyas@nvidia.com> wrote:
>
> Hi Lukasz,
> Thanks for sharing your observations.
>
> Could you please also share the output of 'sudo lspci -vvvv' before and
> after a suspend-resume cycle with the latest linux-next?
> Do we still see the L1SS capabilities disappearing after resume?
>
> Thanks,
> Vidya Sagar
>
> On 7/29/2022 3:09 PM, Lukasz Majczak wrote:
> >
> > On Tue, 26 Jul 2022 at 09:20, Lukasz Majczak <lma@semihalf.com> wrote:
> >>
> >> On Tue, 26 Jul 2022 at 00:51, Rajat Jain <rajatja@google.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>
> >>>> Agree with Bjorn's observations.
> >>>> The fact that the L1SS capability registers themselves disappeared in
> >>>> the root port post resume indicates that there seems to be something
> >>>> wrong with the BIOS itself.
> >>>> Could you please check from that perspective?
> >>>
> >>> ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
> >>> is a shallower sleep state that preserves more state than, e.g., S3
> >>> (suspend-to-RAM). When we use S0ix, the BIOS does not come into the
> >>> picture at all, i.e. after the kernel runs its suspend routines, it just
> >>> puts the CPU into the S0ix state. So I do not think there is a BIOS
> >>> angle to this.
> >>>
> >>>
> >>>>
> >>>> Thanks,
> >>>> Vidya Sagar
> >>>>
> >>>>
> >>>> On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
> >>>>>
> >>>>> On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
> >>>>>> On Fri, 22 Jul 2022 at 09:31, Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> >>>>>>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
> >>>>>>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> >>>>>>>>> saved and restored during suspend/resume leading to L1 Substates
> >>>>>>>>> configuration being lost post-resume.
> >>>>>>>>>
> >>>>>>>>> Save the L1 Substates control registers so that the configuration is
> >>>>>>>>> retained post-resume.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> >>>>>>>>> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
> >>>>>>>>
> >>>>>>>> Hi Vidya,
> >>>>>>>>
> >>>>>>>> I tested this patch on kernel v5.19-rc6.
> >>>>>>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
> >>>>>>>> This patch can restore L1SS after suspend/resume.
> >>>>>>>>
> >>>>>>>> The test results are as follows:
> >>>>>>>>
> >>>>>>>> After Boot:
> >>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>           Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>                   L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>                             PortCommonModeRestoreTime=255us
> >>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>                   L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>                              T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>>>>>                   L1SubCtl2: T_PwrOn=3100us
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> After suspend/resume without this patch.
> >>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>           Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>                   L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>                             PortCommonModeRestoreTime=255us
> >>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>                   L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>>>>>                              T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>>>>>                   L1SubCtl2: T_PwrOn=10us
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> After suspend/resume with this patch.
> >>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>           Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>                   L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>                             PortCommonModeRestoreTime=255us
> >>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>                   L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>                              T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>>>>>                   L1SubCtl2: T_PwrOn=3100us
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Tested-by: Ben Chuang <benchuanggli@gmail.com>
> >>>>>>>
> >>>>>>> Forgot to add mine:
> >>>>>>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>> Ben Chuang
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> ---
> >>>>>>>>> Hi,
> >>>>>>>>> Kenneth R. Crudup <kenny@panix.com>, Could you please verify this patch
> >>>>>>>>> on your laptop (Dell XPS 13) one last time?
> >>>>>>>>> IMHO, the regression observed on your laptop with an old version of the patch
> >>>>>>>>> could be due to a buggy old version BIOS in the laptop.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Vidya Sagar
> >>>>>>>>>
> >>>>>>>>>    drivers/pci/pci.c       |  7 +++++++
> >>>>>>>>>    drivers/pci/pci.h       |  4 ++++
> >>>>>>>>>    drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>>    3 files changed, 55 insertions(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >>>>>>>>> index cfaf40a540a8..aca05880aaa3 100644
> >>>>>>>>> --- a/drivers/pci/pci.c
> >>>>>>>>> +++ b/drivers/pci/pci.c
> >>>>>>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
> >>>>>>>>>                   return i;
> >>>>>>>>>
> >>>>>>>>>           pci_save_ltr_state(dev);
> >>>>>>>>> +       pci_save_aspm_l1ss_state(dev);
> >>>>>>>>>           pci_save_dpc_state(dev);
> >>>>>>>>>           pci_save_aer_state(dev);
> >>>>>>>>>           pci_save_ptm_state(dev);
> >>>>>>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
> >>>>>>>>>            * LTR itself (in the PCIe capability).
> >>>>>>>>>            */
> >>>>>>>>>           pci_restore_ltr_state(dev);
> >>>>>>>>> +       pci_restore_aspm_l1ss_state(dev);
> >>>>>>>>>
> >>>>>>>>>           pci_restore_pcie_state(dev);
> >>>>>>>>>           pci_restore_pasid_state(dev);
> >>>>>>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
> >>>>>>>>>           if (error)
> >>>>>>>>>                   pci_err(dev, "unable to allocate suspend buffer for LTR\n");
> >>>>>>>>>
> >>>>>>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
> >>>>>>>>> +                                           2 * sizeof(u32));
> >>>>>>>>> +       if (error)
> >>>>>>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
> >>>>>>>>> +
> >>>>>>>>>           pci_allocate_vc_save_buffers(dev);
> >>>>>>>>>    }
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> >>>>>>>>> index e10cdec6c56e..92d8c92662a4 100644
> >>>>>>>>> --- a/drivers/pci/pci.h
> >>>>>>>>> +++ b/drivers/pci/pci.h
> >>>>>>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
> >>>>>>>>>    void pcie_aspm_exit_link_state(struct pci_dev *pdev);
> >>>>>>>>>    void pcie_aspm_pm_state_change(struct pci_dev *pdev);
> >>>>>>>>>    void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
> >>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>>>>>    #else
> >>>>>>>>>    static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
> >>>>>>>>>    static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
> >>>>>>>>>    static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
> >>>>>>>>>    static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
> >>>>>>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>>>>>    #endif
> >>>>>>>>>
> >>>>>>>>>    #ifdef CONFIG_PCIE_ECRC
> >>>>>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> >>>>>>>>> index a96b7424c9bc..2c29fdd20059 100644
> >>>>>>>>> --- a/drivers/pci/pcie/aspm.c
> >>>>>>>>> +++ b/drivers/pci/pcie/aspm.c
> >>>>>>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> >>>>>>>>>                                   PCI_L1SS_CTL1_L1SS_MASK, val);
> >>>>>>>>>    }
> >>>>>>>>>
> >>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
> >>>>>>>>> +{
> >>>>>>>>> +       int aspm_l1ss;
> >>>>>>>>> +       struct pci_cap_saved_state *save_state;
> >>>>>>>>> +       u32 *cap;
> >>>>>>>>> +
> >>>>>>>>> +       if (!pci_is_pcie(dev))
> >>>>>>>>> +               return;
> >>>>>>>>> +
> >>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>> +       if (!aspm_l1ss)
> >>>>>>>>> +               return;
> >>>>>>>>> +
> >>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>> +       if (!save_state)
> >>>>>>>>> +               return;
> >>>>>>>>> +
> >>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
> >>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
> >>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
> >>>>>>>>> +{
> >>>>>>>>> +       int aspm_l1ss;
> >>>>>>>>> +       struct pci_cap_saved_state *save_state;
> >>>>>>>>> +       u32 *cap;
> >>>>>>>>> +
> >>>>>>>>> +       if (!pci_is_pcie(dev))
> >>>>>>>>> +               return;
> >>>>>>>>> +
> >>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>> +       if (!aspm_l1ss)
> >>>>>>>>> +               return;
> >>>>>>>>> +
> >>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>> +       if (!save_state)
> >>>>>>>>> +               return;
> >>>>>>>>> +
> >>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
> >>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> >>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>>    static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
> >>>>>>>>>    {
> >>>>>>>>>           pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
> >>>>>>>>> --
> >>>>>>>>> 2.17.1
> >>>>>>>>>
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> With this patch (and also mentioned
> >>>>>> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
> >>>>>> applied on 5.10 (chromeos-5.10) I am observing problems after
> >>>>>> suspend/resume with my WiFi card - it looks like whole communication
> >>>>>> via PCI fails. Attaching logs (dmesg, lspci -vvv before suspend/resume
> >>>>>> and after) https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
> >>>>>>
> >>>>>> I played a little bit with this code and it looks like the
> >>>>>> pci_write_config_dword() to the PCI_L1SS_CTL1 breaks it (don't know
> >>>>>> why, not a PCI expert).
> >>>>>
> >>>>> Thanks a lot for testing this!  I'm not quite sure what to make of the
> >>>>> results since v5.10 is fairly old (Dec 2020) and I don't know what
> >>>>> other changes are in chromeos-5.10.
> >>>
> >>> Lukasz: I assume you are running this on Atlas and are seeing this bug
> >>> when upreving it to the 5.10 kernel. Can you please try it on a newer
> >>> Intel platform that already has the latest upstream kernel running
> >>> and see if this can be reproduced there too?
> >>> Note that the wifi PCI device is different on newer Intel platforms,
> >>> but the platform design is similar enough that I suspect we should see
> >>> a similar bug on those too. The other option is to try the latest
> >>> upstream kernel on Atlas. Perhaps if we just care about wifi (and
> >>> ignore bringing up the graphics stack and GUI), it may come up
> >>> far enough to try this patch?
> >>>
> >>> Thanks,
> >>>
> >>> Rajat
> >>>
> >>>
> >>>>>
> >>>>> Random observations, no analysis below.  This from your dmesg
> >>>>> certainly looks like PCI reads failing and returning ~0:
> >>>>>
> >>>>>     Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
> >>>>>     iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> >>>>>     iwlwifi 0000:01:00.0: Device gone - attempting removal
> >>>>>     Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
> >>>>>
> >>>>> And then we re-enumerate 01:00.0 and it looks like it may have been
> >>>>> reset (BAR is 0):
> >>>>>
> >>>>>     pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
> >>>>>     pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> >>>>>
> >>>>> lspci diffs from before/after suspend:
> >>>>>
> >>>>>      00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
> >>>>>        Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
> >>>>>     -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
> >>>>>     +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>     -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>     +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>     -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
> >>>>>     +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
> >>>>>     -       Capabilities: [150 v0] Null
> >>>>>     -       Capabilities: [200 v1] L1 PM Substates
> >>>>>     -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>     -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> >>>>>     -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>     -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
> >>>>>     -               L1SubCtl2: T_PwrOn=60us
> >>>>>
> >>>>> The DevSta differences might be BIOS bugs, probably not relevant.
> >>>>> Interesting that ASPM is disabled, maybe didn't get enabled after
> >>>>> re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
> >>>>> disappeared.
> >>>>>
> >>>>>      01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
> >>>>>                     LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>     -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>     +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >>>>>             Capabilities: [154 v1] L1 PM Substates
> >>>>>                     L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>                               PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
> >>>>>     -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>     -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
> >>>>>     +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>>     +                          T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>>
> >>>>> Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
> >>>>> get reinitialized after re-enumeration?  Looks like we didn't restore
> >>>>> L1SubCtl1.
> >>>>>
> >>>>> Bjorn
> >>>>>
> >>
> >> Hi,
> >>
> >> Thank you all for the response and input! As Rajat mentioned, I'm using
> >> a Chromebook, though not Atlas (Amber Lake); in this case it is Babymega
> >> (Apollo Lake). I will try to load the most recent kernel and give it
> >> another try.
> >>
> >> Best regards,
> >> Lukasz
> >
> > Hi,
> >
> > I have applied this patch on top of v5.19-rc7 (chromeos) and I'm
> > still getting the same results:
> > https://gist.github.com/semihalf-majczak-lukasz/4b716704c21a3758d6711b2030ea34b9
> >
> > Best regards,
> > Lukasz
> >
Hi Vidya,

Sorry for the long delay. I have retested your patch on top of
linux-next/master (next-20220802), and the results for my device remain
the same.
Here are the logs (lspci -vvv before suspend, lspci -vvv after resume, and dmesg):
https://gist.github.com/semihalf-majczak-lukasz/c7bfd811359f23278034056a8002b3ef
Let me know if you need any more logs and/or tests.
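
For anyone comparing the before/after dumps by hand, here is a minimal
sketch of how the L1SubCtl1 fields that lspci prints can be decoded from
the raw dword. The struct, the function names and the L1SS_CTL1_* macro
names are made up for illustration; the mask values are meant to mirror
the PCI_L1SS_CTL1_* definitions in include/uapi/linux/pci_regs.h. The raw
values in the self-test are reconstructed from the decoded lspci fields
quoted in this thread (other value/scale pairs can encode the same
threshold), they are not taken from the logs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror PCI_L1SS_CTL1_* in include/uapi/linux/pci_regs.h */
#define L1SS_CTL1_ENABLE_MASK	0x0000000fu	/* PCI-PM/ASPM L1.1/L1.2 enables */
#define L1SS_CTL1_CM_RESTORE	0x0000ff00u	/* Common_Mode_Restore_Time (us) */
#define L1SS_CTL1_LTR_TH_VALUE	0x03ff0000u	/* LTR_L1.2_THRESHOLD_Value */
#define L1SS_CTL1_LTR_TH_SCALE	0xe0000000u	/* LTR_L1.2_THRESHOLD_Scale */

/* Hypothetical container for the decoded fields */
struct l1ss_ctl1 {
	unsigned int enables;		/* low four enable bits */
	unsigned int t_common_us;	/* T_CommonMode in microseconds */
	int ltr_valid;			/* 0 if the scale encoding is reserved */
	uint64_t ltr_threshold_ns;	/* LTR1.2_Threshold in nanoseconds */
};

static struct l1ss_ctl1 l1ss_decode_ctl1(uint32_t ctl1)
{
	struct l1ss_ctl1 d;
	uint32_t value = (ctl1 & L1SS_CTL1_LTR_TH_VALUE) >> 16;
	uint32_t scale = (ctl1 & L1SS_CTL1_LTR_TH_SCALE) >> 29;

	d.enables = ctl1 & L1SS_CTL1_ENABLE_MASK;
	d.t_common_us = (ctl1 & L1SS_CTL1_CM_RESTORE) >> 8;

	/* Scale n means units of 32^n ns; encodings 6 and 7 are reserved */
	d.ltr_valid = scale <= 5;
	d.ltr_threshold_ns = d.ltr_valid ? (uint64_t)value << (5 * scale) : 0;
	return d;
}

/* Usage example with raw values reconstructed from the quoted lspci output */
static void l1ss_ctl1_selftest(void)
{
	/* 01:00.0 before suspend: all enables set, threshold 98304ns */
	struct l1ss_ctl1 before = l1ss_decode_ctl1(0x6003000fu);
	/* 01:00.0 after resume: everything cleared */
	struct l1ss_ctl1 after = l1ss_decode_ctl1(0x00000000u);

	assert(before.enables == 0xf);
	assert(before.t_common_us == 0);
	assert(before.ltr_threshold_ns == 98304);
	assert(after.enables == 0);
	assert(after.ltr_threshold_ns == 0);
}
```

A dword that reads back as all ones decodes with ltr_valid == 0, which is
one quick way to tell a failed config read apart from a register that was
merely reset.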

Best regards,
Lukasz


* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-08-03 12:04                   ` Lukasz Majczak
@ 2022-08-03 12:55                     ` Vidya Sagar
  2022-08-08 14:07                       ` Lukasz Majczak
  0 siblings, 1 reply; 24+ messages in thread
From: Vidya Sagar @ 2022-08-03 12:55 UTC (permalink / raw)
  To: Lukasz Majczak
  Cc: Rajat Jain, Bjorn Helgaas, Kai-Heng Feng, Ben Chuang, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, kenny, treding, jonathanh,
	abhsahu, sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv

Thanks Lukasz for the logs.
I still see that the L1SS capability in the root port (00:14.0) disappears
after resume.
I still don't understand how this patch can make the capability register
itself disappear. Honestly, I still see this as a HW issue.
Bjorn, could you please shed some light on this?
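
To check, independently of lspci, whether the capability is really missing
from the extended capability list, something like the sketch below could be
run over a raw 4K config-space dump of 00:14.0. It is a userspace
approximation of what pci_find_ext_capability() does, not the kernel code
itself; the helper names are hypothetical. The 0x150 (Null) and 0x200 (L1SS)
offsets in the self-test come from the lspci diff quoted in this thread,
while the capability placed at 0x100 is an assumption made only to build a
complete chain.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE	4096	/* PCIe extended config space */
#define EXT_CAP_START	0x100	/* first extended capability header */
#define EXT_CAP_ID_L1SS	0x1e	/* L1 PM Substates */

/* Config space is little-endian; read and write one dword of the dump */
static uint32_t cfg_read32(const uint8_t *cfg, unsigned int off)
{
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

static void cfg_write32(uint8_t *cfg, unsigned int off, uint32_t val)
{
	cfg[off] = val & 0xff;
	cfg[off + 1] = (val >> 8) & 0xff;
	cfg[off + 2] = (val >> 16) & 0xff;
	cfg[off + 3] = (val >> 24) & 0xff;
}

/* Header layout: ID in bits 15:0, version in 19:16, next offset in 31:20 */
static uint32_t ext_cap_hdr(unsigned int id, unsigned int ver, unsigned int next)
{
	return (uint32_t)id | (uint32_t)ver << 16 | (uint32_t)next << 20;
}

/* Return the offset of cap_id, or 0 if it is absent or the dump is unreadable */
static unsigned int find_ext_cap(const uint8_t *cfg, unsigned int cap_id)
{
	unsigned int pos = EXT_CAP_START;
	int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8;	/* bound against loops */

	while (ttl-- > 0) {
		uint32_t hdr;

		if (pos < EXT_CAP_START || pos > CFG_SPACE_SIZE - 4)
			break;
		hdr = cfg_read32(cfg, pos);
		/* 0 means no extended caps; all ones means the read failed */
		if (hdr == 0 || hdr == 0xffffffffu)
			break;
		if ((hdr & 0xffff) == cap_id)
			return pos;
		pos = (hdr >> 20) & 0xffc;
	}
	return 0;
}

/* Usage example: a chain that loses its tail, as in the lspci diff */
static void l1ss_cap_walk_selftest(void)
{
	uint8_t cfg[CFG_SPACE_SIZE];

	memset(cfg, 0, sizeof(cfg));
	cfg_write32(cfg, 0x100, ext_cap_hdr(0x0001, 1, 0x150)); /* assumed cap */
	cfg_write32(cfg, 0x150, ext_cap_hdr(0x0000, 0, 0x200)); /* Null */
	cfg_write32(cfg, 0x200, ext_cap_hdr(EXT_CAP_ID_L1SS, 1, 0));
	assert(find_ext_cap(cfg, EXT_CAP_ID_L1SS) == 0x200);

	/* Same dump with the chain cut after the first capability */
	cfg_write32(cfg, 0x100, ext_cap_hdr(0x0001, 1, 0));
	assert(find_ext_cap(cfg, EXT_CAP_ID_L1SS) == 0);
}
```

If the walk stops finding 0x1e only after resume, it would also be worth
dumping the headers at 0x150 and 0x200 directly, to see whether the
registers are still there and only the link to them changed.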

Thanks,
Vidya Sagar

On 8/3/2022 5:34 PM, Lukasz Majczak wrote:
> 
> On Fri, 29 Jul 2022 at 16:36, Vidya Sagar <vidyas@nvidia.com> wrote:
>>
>> Hi Lukasz,
>> Thanks for sharing your observations.
>>
>> Could you please also share the output of 'sudo lspci -vvvv' before and
>> after a suspend-resume cycle with the latest linux-next?
>> Do we still see the L1SS capabilities disappearing after resume?
>>
>> Thanks,
>> Vidya Sagar
>>
>> On 7/29/2022 3:09 PM, Lukasz Majczak wrote:
>>>
>>> On Tue, 26 Jul 2022 at 09:20, Lukasz Majczak <lma@semihalf.com> wrote:
>>>>
>>>> On Tue, 26 Jul 2022 at 00:51, Rajat Jain <rajatja@google.com> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>>>
>>>>>> Agree with Bjorn's observations.
>>>>>> The fact that the L1SS capability registers themselves disappeared in
>>>>>> the root port post resume indicates that there seems to be something
>>>>>> wrong with the BIOS itself.
>>>>>> Could you please check from that perspective?
>>>>>
>>>>> ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
>>>>> is a shallower sleep state that preserves more state than, e.g., S3
>>>>> (suspend-to-RAM). When we use S0ix, the BIOS does not come into the
>>>>> picture at all, i.e. after the kernel runs its suspend routines, it just
>>>>> puts the CPU into the S0ix state. So I do not think there is a BIOS
>>>>> angle to this.
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Vidya Sagar
>>>>>>
>>>>>>
>>>>>> On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
>>>>>>>
>>>>>>> On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
>>>>>>>> On Fri, 22 Jul 2022 at 09:31, Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
>>>>>>>>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
>>>>>>>>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
>>>>>>>>>>> saved and restored during suspend/resume leading to L1 Substates
>>>>>>>>>>> configuration being lost post-resume.
>>>>>>>>>>>
>>>>>>>>>>> Save the L1 Substates control registers so that the configuration is
>>>>>>>>>>> retained post-resume.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>>>>>>>>> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
>>>>>>>>>>
>>>>>>>>>> Hi Vidya,
>>>>>>>>>>
>>>>>>>>>> I tested this patch on kernel v5.19-rc6.
>>>>>>>>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
>>>>>>>>>> This patch can restore L1SS after suspend/resume.
>>>>>>>>>>
>>>>>>>>>> The test results are as follows:
>>>>>>>>>>
>>>>>>>>>> After Boot:
>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>>>            Capabilities: [110 v1] L1 PM Substates
>>>>>>>>>>                    L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>                              PortCommonModeRestoreTime=255us
>>>>>>>>>> PortTPowerOnTime=3100us
>>>>>>>>>>                    L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>>>                               T_CommonMode=0us LTR1.2_Threshold=3145728ns
>>>>>>>>>>                    L1SubCtl2: T_PwrOn=3100us
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> After suspend/resume without this patch.
>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>>>            Capabilities: [110 v1] L1 PM Substates
>>>>>>>>>>                    L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>                              PortCommonModeRestoreTime=255us
>>>>>>>>>> PortTPowerOnTime=3100us
>>>>>>>>>>                    L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>>>>>>>>>>                               T_CommonMode=0us LTR1.2_Threshold=0ns
>>>>>>>>>>                    L1SubCtl2: T_PwrOn=10us
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> After suspend/resume with this patch.
>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>>>            Capabilities: [110 v1] L1 PM Substates
>>>>>>>>>>                    L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>                              PortCommonModeRestoreTime=255us
>>>>>>>>>> PortTPowerOnTime=3100us
>>>>>>>>>>                    L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>>>                               T_CommonMode=0us LTR1.2_Threshold=3145728ns
>>>>>>>>>>                    L1SubCtl2: T_PwrOn=3100us
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Tested-by: Ben Chuang <benchuanggli@gmail.com>
>>>>>>>>>
>>>>>>>>> Forgot to add mine:
>>>>>>>>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Ben Chuang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> ---
>>>>>>>>>>> Hi,
>>>>>>>>>>> Kenneth R. Crudup <kenny@panix.com>, Could you please verify this patch
>>>>>>>>>>> on your laptop (Dell XPS 13) one last time?
>>>>>>>>>>> IMHO, the regression observed on your laptop with an old version of the patch
>>>>>>>>>>> could be due to a buggy old version BIOS in the laptop.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vidya Sagar
>>>>>>>>>>>
>>>>>>>>>>>     drivers/pci/pci.c       |  7 +++++++
>>>>>>>>>>>     drivers/pci/pci.h       |  4 ++++
>>>>>>>>>>>     drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>     3 files changed, 55 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>>>>>>> index cfaf40a540a8..aca05880aaa3 100644
>>>>>>>>>>> --- a/drivers/pci/pci.c
>>>>>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>>>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
>>>>>>>>>>>                    return i;
>>>>>>>>>>>
>>>>>>>>>>>            pci_save_ltr_state(dev);
>>>>>>>>>>> +       pci_save_aspm_l1ss_state(dev);
>>>>>>>>>>>            pci_save_dpc_state(dev);
>>>>>>>>>>>            pci_save_aer_state(dev);
>>>>>>>>>>>            pci_save_ptm_state(dev);
>>>>>>>>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
>>>>>>>>>>>             * LTR itself (in the PCIe capability).
>>>>>>>>>>>             */
>>>>>>>>>>>            pci_restore_ltr_state(dev);
>>>>>>>>>>> +       pci_restore_aspm_l1ss_state(dev);
>>>>>>>>>>>
>>>>>>>>>>>            pci_restore_pcie_state(dev);
>>>>>>>>>>>            pci_restore_pasid_state(dev);
>>>>>>>>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
>>>>>>>>>>>            if (error)
>>>>>>>>>>>                    pci_err(dev, "unable to allocate suspend buffer for LTR\n");
>>>>>>>>>>>
>>>>>>>>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
>>>>>>>>>>> +                                           2 * sizeof(u32));
>>>>>>>>>>> +       if (error)
>>>>>>>>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
>>>>>>>>>>> +
>>>>>>>>>>>            pci_allocate_vc_save_buffers(dev);
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>>>>>>>>> index e10cdec6c56e..92d8c92662a4 100644
>>>>>>>>>>> --- a/drivers/pci/pci.h
>>>>>>>>>>> +++ b/drivers/pci/pci.h
>>>>>>>>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
>>>>>>>>>>>     void pcie_aspm_exit_link_state(struct pci_dev *pdev);
>>>>>>>>>>>     void pcie_aspm_pm_state_change(struct pci_dev *pdev);
>>>>>>>>>>>     void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
>>>>>>>>>>>     #else
>>>>>>>>>>>     static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
>>>>>>>>>>>     static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
>>>>>>>>>>>     static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
>>>>>>>>>>>     static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
>>>>>>>>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
>>>>>>>>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
>>>>>>>>>>>     #endif
>>>>>>>>>>>
>>>>>>>>>>>     #ifdef CONFIG_PCIE_ECRC
>>>>>>>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>>>>>>>>> index a96b7424c9bc..2c29fdd20059 100644
>>>>>>>>>>> --- a/drivers/pci/pcie/aspm.c
>>>>>>>>>>> +++ b/drivers/pci/pcie/aspm.c
>>>>>>>>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
>>>>>>>>>>>                                    PCI_L1SS_CTL1_L1SS_MASK, val);
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
>>>>>>>>>>> +{
>>>>>>>>>>> +       int aspm_l1ss;
>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
>>>>>>>>>>> +       u32 *cap;
>>>>>>>>>>> +
>>>>>>>>>>> +       if (!pci_is_pcie(dev))
>>>>>>>>>>> +               return;
>>>>>>>>>>> +
>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>> +       if (!aspm_l1ss)
>>>>>>>>>>> +               return;
>>>>>>>>>>> +
>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>> +       if (!save_state)
>>>>>>>>>>> +               return;
>>>>>>>>>>> +
>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
>>>>>>>>>>> +{
>>>>>>>>>>> +       int aspm_l1ss;
>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
>>>>>>>>>>> +       u32 *cap;
>>>>>>>>>>> +
>>>>>>>>>>> +       if (!pci_is_pcie(dev))
>>>>>>>>>>> +               return;
>>>>>>>>>>> +
>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>> +       if (!aspm_l1ss)
>>>>>>>>>>> +               return;
>>>>>>>>>>> +
>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>> +       if (!save_state)
>>>>>>>>>>> +               return;
>>>>>>>>>>> +
>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>>     static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
>>>>>>>>>>>     {
>>>>>>>>>>>            pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
>>>>>>>>>>> --
>>>>>>>>>>> 2.17.1
>>>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> With this patch (and also mentioned
>>>>>>>> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
>>>>>>>> applied on 5.10 (chromeos-5.10) I am observing problems after
>>>>>>>> suspend/resume with my WiFi card - it looks like whole communication
>>>>>>>> via PCI fails. Attaching logs (dmesg, lspci -vvv before suspend/resume
>>>>>>>> and after) https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
>>>>>>>>
>>>>>>>> I played a little bit with this code and it looks like the
>>>>>>>> pci_write_config_dword() to the PCI_L1SS_CTL1 breaks it (don't know
>>>>>>>> why, not a PCI expert).
>>>>>>>
>>>>>>> Thanks a lot for testing this!  I'm not quite sure what to make of the
>>>>>>> results since v5.10 is fairly old (Dec 2020) and I don't know what
>>>>>>> other changes are in chromeos-5.10.
>>>>>
>>>>> Lukasz: I assume you are running this on Atlas and are seeing this bug
>>>>> when uprev'ving it to 5.10 kernel. Can you please try it on a newer
>>>>> Intel platform that has the latest upstream kernel running already
>>>>> and see if this can be reproduced there too?
>>>>> Note that the wifi PCI device is different on newer Intel platforms,
>>>>> but platform design is similar enough that I suspect we should see
>>>>> similar bug on those too. The other option is to try the latest
>>>>> upstream kernel on Atlas. Perhaps if we just care about wifi (and
>>>>> ignore bringing up the graphics stack and GUI), it may come up
>>>>> sufficiently enough to try this patch?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Rajat
>>>>>
>>>>>
>>>>>>>
>>>>>>> Random observations, no analysis below.  This from your dmesg
>>>>>>> certainly looks like PCI reads failing and returning ~0:
>>>>>>>
>>>>>>>      Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
>>>>>>>      iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
>>>>>>>      iwlwifi 0000:01:00.0: Device gone - attempting removal
>>>>>>>      Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
>>>>>>>
>>>>>>> And then we re-enumerate 01:00.0 and it looks like it may have been
>>>>>>> reset (BAR is 0):
>>>>>>>
>>>>>>>      pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
>>>>>>>      pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
>>>>>>>
>>>>>>> lspci diffs from before/after suspend:
>>>>>>>
>>>>>>>       00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
>>>>>>>         Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
>>>>>>>      -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
>>>>>>>      +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>      -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
>>>>>>>      +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>>>>>>>      -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
>>>>>>>      +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
>>>>>>>      -       Capabilities: [150 v0] Null
>>>>>>>      -       Capabilities: [200 v1] L1 PM Substates
>>>>>>>      -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>>>>      -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
>>>>>>>      -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>      -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
>>>>>>>      -               L1SubCtl2: T_PwrOn=60us
>>>>>>>
>>>>>>> The DevSta differences might be BIOS bugs, probably not relevant.
>>>>>>> Interesting that ASPM is disabled, maybe didn't get enabled after
>>>>>>> re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
>>>>>>> disappeared.
>>>>>>>
>>>>>>>       01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
>>>>>>>                      LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
>>>>>>>      -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>      +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>>>>>              Capabilities: [154 v1] L1 PM Substates
>>>>>>>                      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>>>>                                PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
>>>>>>>      -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>      -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
>>>>>>>      +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>>>>>>>      +                          T_CommonMode=0us LTR1.2_Threshold=0ns
>>>>>>>
>>>>>>> Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
>>>>>>> get reinitialized after re-enumeration?  Looks like we didn't restore
>>>>>>> L1SubCtl1.
>>>>>>>
>>>>>>> Bjorn
>>>>>>>
>>>>
>>>> Hi,
>>>>
>>>> Thank you all for the response and input! As Rajat mentioned I'm using
>>>> chromebook - but not Atlas (Amberlake) - in this case it is Babymega
>>>> (Apollolake)  - I will try to load most recent kernel and give it a
>>>> try once again.
>>>>
>>>> Best regards,
>>>> Lukasz
>>>
>>> Hi,
>>>
>>>    I have applied this patch on top of v5.19-rc7 (chromeos) and I'm
>>> still getting same results:
>>> https://gist.github.com/semihalf-majczak-lukasz/4b716704c21a3758d6711b2030ea34b9
>>>
>>> Best regards,
>>> Lukasz
>>>
> Hi Vidya,
> 
> Sorry for the long delay, I have retested your patch on top of
> linux-next/master (next-20220802) - the results for my device remain
> the same.
> Here are the logs (lspci -vvv before suspend, lspci -vvv after resume and dmesg)
> https://gist.github.com/semihalf-majczak-lukasz/c7bfd811359f23278034056a8002b3ef
> Let me know if you need any more logs and/or tests.
> 
> Best regards,
> Lukasz
> 

* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-08-03 12:55                     ` Vidya Sagar
@ 2022-08-08 14:07                       ` Lukasz Majczak
  2022-08-08 16:16                         ` Vidya Sagar
  0 siblings, 1 reply; 24+ messages in thread
From: Lukasz Majczak @ 2022-08-08 14:07 UTC (permalink / raw)
  To: Vidya Sagar
  Cc: Rajat Jain, Bjorn Helgaas, Kai-Heng Feng, Ben Chuang, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, kenny, treding, jonathanh,
	abhsahu, sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv

On Wed, 3 Aug 2022 at 14:55, Vidya Sagar <vidyas@nvidia.com> wrote:
>
> Thanks Lukasz for the logs.
> I still see that the L1SS capability in the root port (00:14.0) disappeared
> after resume.
> I still don't understand how this patch can make the capability register
> itself disappear. Honestly, I still see this as a HW issue.
> Bjorn, could you please throw some light on this?
>
> Thanks,
> Vidya Sagar
>
> On 8/3/2022 5:34 PM, Lukasz Majczak wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Fri, 29 Jul 2022 at 16:36, Vidya Sagar <vidyas@nvidia.com> wrote:
> >>
> >> Hi Lukasz,
> >> Thanks for sharing your observations.
> >>
> >> Could you please also share the output of 'sudo lspci -vvvv' before and
> >> after suspend-resume cycle with the latest linux-next?
> >> Do we still see the L1SS capabilities disappearing after resume?
> >>
> >> Thanks,
> >> Vidya Sagar
> >>
> >> On 7/29/2022 3:09 PM, Lukasz Majczak wrote:
> >>>
> >>> On Tue, 26 Jul 2022 at 09:20, Lukasz Majczak <lma@semihalf.com> wrote:
> >>>>
> >>>> On Tue, 26 Jul 2022 at 00:51, Rajat Jain <rajatja@google.com> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>>>
> >>>>>> Agree with Bjorn's observations.
> >>>>>> The fact that the L1SS capability registers themselves disappeared in
> >>>>>> the root port post resume indicates that there seems to be something
> >>>>>> wrong with the BIOS itself.
> >>>>>> Could you please check from that perspective?
> >>>>>
> >>>>> ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
> >>>>> is a shallower sleep state that preserves more state than, for e.g. S3
> >>>>> (suspend-to-RAM). When we use S0ix, then BIOS does not come in picture
> >>>>> at all. i.e. after the kernel runs its suspend routines, it just puts
> >>>>> the CPU into S0ix state. So I do not think there is a BIOS angle to
> >>>>> this.
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Vidya Sagar
> >>>>>>
> >>>>>>
> >>>>>> On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
> >>>>>>>
> >>>>>>> On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
> >>>>>>>> On Fri, 22 Jul 2022 at 09:31, Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> >>>>>>>>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
> >>>>>>>>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> >>>>>>>>>>> saved and restored during suspend/resume leading to L1 Substates
> >>>>>>>>>>> configuration being lost post-resume.
> >>>>>>>>>>>
> >>>>>>>>>>> Save the L1 Substates control registers so that the configuration is
> >>>>>>>>>>> retained post-resume.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> >>>>>>>>>>> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
> >>>>>>>>>>
> >>>>>>>>>> Hi Vidya,
> >>>>>>>>>>
> >>>>>>>>>> I tested this patch on kernel v5.19-rc6.
> >>>>>>>>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
> >>>>>>>>>> This patch can restore L1SS after suspend/resume.
> >>>>>>>>>>
> >>>>>>>>>> The test results are as follows:
> >>>>>>>>>>
> >>>>>>>>>> After Boot:
> >>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>            Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>                    L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>                              PortCommonModeRestoreTime=255us
> >>>>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>>>                    L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>                               T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>>>>>>>                    L1SubCtl2: T_PwrOn=3100us
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> After suspend/resume without this patch.
> >>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>            Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>                    L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>                              PortCommonModeRestoreTime=255us
> >>>>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>>>                    L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>>>>>>>                               T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>>>>>>>                    L1SubCtl2: T_PwrOn=10us
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> After suspend/resume with this patch.
> >>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>            Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>                    L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>                              PortCommonModeRestoreTime=255us
> >>>>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>>>                    L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>                               T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>>>>>>>                    L1SubCtl2: T_PwrOn=3100us
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Tested-by: Ben Chuang <benchuanggli@gmail.com>
> >>>>>>>>>
> >>>>>>>>> Forgot to add mine:
> >>>>>>>>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Ben Chuang
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> ---
> >>>>>>>>>>> Hi,
> >>>>>>>>>>> Kenneth R. Crudup <kenny@panix.com>, Could you please verify this patch
> >>>>>>>>>>> on your laptop (Dell XPS 13) one last time?
> >>>>>>>>>>> IMHO, the regression observed on your laptop with an old version of the patch
> >>>>>>>>>>> could be due to a buggy old version BIOS in the laptop.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Vidya Sagar
> >>>>>>>>>>>
> >>>>>>>>>>>     drivers/pci/pci.c       |  7 +++++++
> >>>>>>>>>>>     drivers/pci/pci.h       |  4 ++++
> >>>>>>>>>>>     drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>>>>     3 files changed, 55 insertions(+)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >>>>>>>>>>> index cfaf40a540a8..aca05880aaa3 100644
> >>>>>>>>>>> --- a/drivers/pci/pci.c
> >>>>>>>>>>> +++ b/drivers/pci/pci.c
> >>>>>>>>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
> >>>>>>>>>>>                    return i;
> >>>>>>>>>>>
> >>>>>>>>>>>            pci_save_ltr_state(dev);
> >>>>>>>>>>> +       pci_save_aspm_l1ss_state(dev);
> >>>>>>>>>>>            pci_save_dpc_state(dev);
> >>>>>>>>>>>            pci_save_aer_state(dev);
> >>>>>>>>>>>            pci_save_ptm_state(dev);
> >>>>>>>>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
> >>>>>>>>>>>             * LTR itself (in the PCIe capability).
> >>>>>>>>>>>             */
> >>>>>>>>>>>            pci_restore_ltr_state(dev);
> >>>>>>>>>>> +       pci_restore_aspm_l1ss_state(dev);
> >>>>>>>>>>>
> >>>>>>>>>>>            pci_restore_pcie_state(dev);
> >>>>>>>>>>>            pci_restore_pasid_state(dev);
> >>>>>>>>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
> >>>>>>>>>>>            if (error)
> >>>>>>>>>>>                    pci_err(dev, "unable to allocate suspend buffer for LTR\n");
> >>>>>>>>>>>
> >>>>>>>>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
> >>>>>>>>>>> +                                           2 * sizeof(u32));
> >>>>>>>>>>> +       if (error)
> >>>>>>>>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
> >>>>>>>>>>> +
> >>>>>>>>>>>            pci_allocate_vc_save_buffers(dev);
> >>>>>>>>>>>     }
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> >>>>>>>>>>> index e10cdec6c56e..92d8c92662a4 100644
> >>>>>>>>>>> --- a/drivers/pci/pci.h
> >>>>>>>>>>> +++ b/drivers/pci/pci.h
> >>>>>>>>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
> >>>>>>>>>>>     void pcie_aspm_exit_link_state(struct pci_dev *pdev);
> >>>>>>>>>>>     void pcie_aspm_pm_state_change(struct pci_dev *pdev);
> >>>>>>>>>>>     void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
> >>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>>>>>>>     #else
> >>>>>>>>>>>     static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
> >>>>>>>>>>>     static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
> >>>>>>>>>>>     static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
> >>>>>>>>>>>     static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
> >>>>>>>>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>>>>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>>>>>>>     #endif
> >>>>>>>>>>>
> >>>>>>>>>>>     #ifdef CONFIG_PCIE_ECRC
> >>>>>>>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> >>>>>>>>>>> index a96b7424c9bc..2c29fdd20059 100644
> >>>>>>>>>>> --- a/drivers/pci/pcie/aspm.c
> >>>>>>>>>>> +++ b/drivers/pci/pcie/aspm.c
> >>>>>>>>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> >>>>>>>>>>>                                    PCI_L1SS_CTL1_L1SS_MASK, val);
> >>>>>>>>>>>     }
> >>>>>>>>>>>
> >>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
> >>>>>>>>>>> +{
> >>>>>>>>>>> +       int aspm_l1ss;
> >>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
> >>>>>>>>>>> +       u32 *cap;
> >>>>>>>>>>> +
> >>>>>>>>>>> +       if (!pci_is_pcie(dev))
> >>>>>>>>>>> +               return;
> >>>>>>>>>>> +
> >>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>> +       if (!aspm_l1ss)
> >>>>>>>>>>> +               return;
> >>>>>>>>>>> +
> >>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>> +       if (!save_state)
> >>>>>>>>>>> +               return;
> >>>>>>>>>>> +
> >>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
> >>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
> >>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
> >>>>>>>>>>> +{
> >>>>>>>>>>> +       int aspm_l1ss;
> >>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
> >>>>>>>>>>> +       u32 *cap;
> >>>>>>>>>>> +
> >>>>>>>>>>> +       if (!pci_is_pcie(dev))
> >>>>>>>>>>> +               return;
> >>>>>>>>>>> +
> >>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>> +       if (!aspm_l1ss)
> >>>>>>>>>>> +               return;
> >>>>>>>>>>> +
> >>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>> +       if (!save_state)
> >>>>>>>>>>> +               return;
> >>>>>>>>>>> +
> >>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
> >>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> >>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>>     static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
> >>>>>>>>>>>     {
> >>>>>>>>>>>            pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
> >>>>>>>>>>> --
> >>>>>>>>>>> 2.17.1
> >>>>>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> With this patch (and also mentioned
> >>>>>>>> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
> >>>>>>>> applied on 5.10 (chromeos-5.10) I am observing problems after
> >>>>>>>> suspend/resume with my WiFi card - it looks like whole communication
> >>>>>>>> via PCI fails. Attaching logs (dmesg, lspci -vvv before suspend/resume
> >>>>>>>> and after) https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
> >>>>>>>>
> >>>>>>>> I played a little bit with this code and it looks like the
> >>>>>>>> pci_write_config_dword() to the PCI_L1SS_CTL1 breaks it (don't know
> >>>>>>>> why, not a PCI expert).
> >>>>>>>
> >>>>>>> Thanks a lot for testing this!  I'm not quite sure what to make of the
> >>>>>>> results since v5.10 is fairly old (Dec 2020) and I don't know what
> >>>>>>> other changes are in chromeos-5.10.
> >>>>>
> >>>>> Lukasz: I assume you are running this on Atlas and are seeing this bug
> >>>>> when uprev'ving it to 5.10 kernel. Can you please try it on a newer
> >>>>> Intel platform that has the latest upstream kernel running already
> >>>>> and see if this can be reproduced there too?
> >>>>> Note that the wifi PCI device is different on newer Intel platforms,
> >>>>> but platform design is similar enough that I suspect we should see
> >>>>> similar bug on those too. The other option is to try the latest
> >>>>> upstream kernel on Atlas. Perhaps if we just care about wifi (and
> >>>>> ignore bringing up the graphics stack and GUI), it may come up
> >>>>> sufficiently enough to try this patch?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Rajat
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>> Random observations, no analysis below.  This from your dmesg
> >>>>>>> certainly looks like PCI reads failing and returning ~0:
> >>>>>>>
> >>>>>>>      Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
> >>>>>>>      iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> >>>>>>>      iwlwifi 0000:01:00.0: Device gone - attempting removal
> >>>>>>>      Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
> >>>>>>>
> >>>>>>> And then we re-enumerate 01:00.0 and it looks like it may have been
> >>>>>>> reset (BAR is 0):
> >>>>>>>
> >>>>>>>      pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
> >>>>>>>      pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> >>>>>>>
> >>>>>>> lspci diffs from before/after suspend:
> >>>>>>>
> >>>>>>>       00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
> >>>>>>>         Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
> >>>>>>>      -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
> >>>>>>>      +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>      -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>      +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>      -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
> >>>>>>>      +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
> >>>>>>>      -       Capabilities: [150 v0] Null
> >>>>>>>      -       Capabilities: [200 v1] L1 PM Substates
> >>>>>>>      -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>      -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> >>>>>>>      -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>      -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
> >>>>>>>      -               L1SubCtl2: T_PwrOn=60us
> >>>>>>>
> >>>>>>> The DevSta differences might be BIOS bugs, probably not relevant.
> >>>>>>> Interesting that ASPM is disabled, maybe didn't get enabled after
> >>>>>>> re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
> >>>>>>> disappeared.
> >>>>>>>
> >>>>>>>       01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
> >>>>>>>                      LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>      -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>      +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >>>>>>>              Capabilities: [154 v1] L1 PM Substates
> >>>>>>>                      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>                                PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
> >>>>>>>      -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>      -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
> >>>>>>>      +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>>>>      +                          T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>>>>
> >>>>>>> Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
> >>>>>>> get reinitialized after re-enumeration?  Looks like we didn't restore
> >>>>>>> L1SubCtl1.
> >>>>>>>
> >>>>>>> Bjorn
> >>>>>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> Thank you all for the response and input! As Rajat mentioned I'm using
> >>>> chromebook - but not Atlas (Amberlake) - in this case it is Babymega
> >>>> (Apollolake)  - I will try to load most recent kernel and give it a
> >>>> try once again.
> >>>>
> >>>> Best regards,
> >>>> Lukasz
> >>>
> >>> Hi,
> >>>
> >>>    I have applied this patch on top of v5.19-rc7 (chromeos) and I'm
> >>> still getting same results:
> >>> https://gist.github.com/semihalf-majczak-lukasz/4b716704c21a3758d6711b2030ea34b9
> >>>
> >>> Best regards,
> >>> Lukasz
> >>>
> > Hi Vidya,
> >
> > Sorry for the long delay, I have retested your patch on top of
> > linux-next/master (next-20220802) - the results for my device remain
> > the same.
> > Here are the logs (lspci -vvv before suspend, lspci -vvv after resume and dmesg)
> > https://gist.github.com/semihalf-majczak-lukasz/c7bfd811359f23278034056a8002b3ef
> > Let me know if you need any more logs and/or tests.
> >
> > Best regards,
> > Lukasz
> >
Hi Vidya,

After your last email, I re-tested my setup and, even without your
patch, the capability register disappears. So it looks like there is,
in fact, some problem in my setup, and your patch just brings it to
the surface, because after resume it tries to write to a register that
is no longer present. I'm very sorry for the confusion; I did not
notice that at the very beginning.
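
For anyone following along, here is a minimal user-space sketch (not
kernel code) of what "capability no longer present" means for the
restore path. The quoted patch looks the capability up with
pci_find_ext_capability() and returns early when it is not found; the
sketch models that lookup by walking the extended capability list of a
mock 4 KiB config space. The helper names (cfg_read32, cfg_write32,
find_ext_cap, restore_l1ss) and the byte-array config space are
assumptions made for illustration only; the ID and register offsets
mirror PCI_EXT_CAP_ID_L1SS, PCI_L1SS_CTL1 and PCI_L1SS_CTL2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE   4096
#define EXT_CAP_START    0x100  /* extended capabilities start here */
#define EXT_CAP_ID_L1SS  0x1E   /* mirrors PCI_EXT_CAP_ID_L1SS */
#define L1SS_CTL1        0x08   /* mirrors PCI_L1SS_CTL1 */
#define L1SS_CTL2        0x0C   /* mirrors PCI_L1SS_CTL2 */

/* Hypothetical accessors for a mock config space held in memory. */
static uint32_t cfg_read32(const uint8_t *cfg, int off)
{
	uint32_t v;

	memcpy(&v, cfg + off, sizeof(v));
	return v;
}

static void cfg_write32(uint8_t *cfg, int off, uint32_t v)
{
	memcpy(cfg + off, &v, sizeof(v));
}

/*
 * Walk the extended capability list. Each header holds the capability
 * ID in bits 15:0 and the next offset in bits 31:20.
 * Returns the capability offset, or 0 if it is not in the list.
 */
static int find_ext_cap(const uint8_t *cfg, uint16_t id)
{
	int pos = EXT_CAP_START;
	int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8; /* loop guard */

	while (ttl-- > 0) {
		uint32_t hdr = cfg_read32(cfg, pos);

		/* 0 means an empty list; ~0 means the read failed. */
		if (hdr == 0 || hdr == 0xFFFFFFFFu)
			return 0;
		if ((hdr & 0xFFFF) == id)
			return pos;
		pos = (hdr >> 20) & 0xFFC;
		if (pos < EXT_CAP_START)
			return 0;
	}
	return 0;
}

/*
 * Write back saved CTL2 then CTL1 (same order as the quoted patch),
 * but only if the capability is still visible.
 * Returns 0 on success, -1 if the capability was not found.
 */
static int restore_l1ss(uint8_t *cfg, const uint32_t saved[2])
{
	int l1ss = find_ext_cap(cfg, EXT_CAP_ID_L1SS);

	if (!l1ss)
		return -1;

	cfg_write32(cfg, l1ss + L1SS_CTL2, saved[0]);
	cfg_write32(cfg, l1ss + L1SS_CTL1, saved[1]);
	return 0;
}
```

With a list laid out like the root port in the logs (a Null capability
at 0x150 chaining to L1SS at 0x200), find_ext_cap() returns 0x200
before suspend. If the chain is cut short after resume, it returns 0
and restore_l1ss() writes nothing.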

Best regards,
Lukasz

* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-08-08 14:07                       ` Lukasz Majczak
@ 2022-08-08 16:16                         ` Vidya Sagar
  2022-08-23 14:55                           ` Kai-Heng Feng
  0 siblings, 1 reply; 24+ messages in thread
From: Vidya Sagar @ 2022-08-08 16:16 UTC (permalink / raw)
  To: Lukasz Majczak, Bjorn Helgaas
  Cc: Rajat Jain, Kai-Heng Feng, Ben Chuang, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, kenny, treding, jonathanh,
	abhsahu, sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv

Thanks Lukasz for the update.
I think this confirms that there is no issue with the patch as such.
Bjorn, could you please define the next step for this patch?

Thanks,
Vidya Sagar

On 8/8/2022 7:37 PM, Lukasz Majczak wrote:
> On Wed, Aug 3, 2022 at 2:55 PM Vidya Sagar <vidyas@nvidia.com> wrote:
>>
>> Thanks Lukasz for the logs.
>> I still see that the L1SS capability in the root port (00:14.0)
>> disappeared after resume.
>> I still don't understand how this patch can make the capability register
>> itself disappear. Honestly, I still see this as a HW issue.
>> Bjorn, could you please throw some light on this?
>>
>> Thanks,
>> Vidya Sagar
>>
>> On 8/3/2022 5:34 PM, Lukasz Majczak wrote:
>>> On Fri, Jul 29, 2022 at 4:36 PM Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>
>>>> Hi Lukasz,
>>>> Thanks for sharing your observations.
>>>>
>>>> Could you please also share the output of 'sudo lspci -vvvv' before and
>>>> after suspend-resume cycle with the latest linux-next?
>>>> Do we still see the L1SS capabilities getting disappeared post resume?
>>>>
>>>> Thanks,
>>>> Vidya Sagar
>>>>
>>>> On 7/29/2022 3:09 PM, Lukasz Majczak wrote:
>>>>> On Tue, Jul 26, 2022 at 9:20 AM Lukasz Majczak <lma@semihalf.com> wrote:
>>>>>>
>>>>>> On Tue, Jul 26, 2022 at 12:51 AM Rajat Jain <rajatja@google.com> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>>>>>
>>>>>>>> Agree with Bjorn's observations.
>>>>>>>> The fact that the L1SS capability registers themselves disappeared in
>>>>>>>> the root port post resume indicates that there seems to be something
>>>>>>>> wrong with the BIOS itself.
>>>>>>>> Could you please check from that perspective?
>>>>>>>
>>>>>>> ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
>>>>>>> is a shallower sleep state that preserves more state than, for e.g. S3
>>>>>>> (suspend-to-RAM). When we use S0ix, then BIOS does not come in picture
>>>>>>> at all. i.e. after the kernel runs its suspend routines, it just puts
>>>>>>> the CPU into S0ix state. So I do not think there is a BIOS angle to
>>>>>>> this.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vidya Sagar
>>>>>>>>
>>>>>>>>
>>>>>>>> On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
>>>>>>>>> On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
>>>>>>>>>> On Fri, Jul 22, 2022 at 9:31 AM Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
>>>>>>>>>>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
>>>>>>>>>>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
>>>>>>>>>>>>> saved and restored during suspend/resume leading to L1 Substates
>>>>>>>>>>>>> configuration being lost post-resume.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Save the L1 Substates control registers so that the configuration is
>>>>>>>>>>>>> retained post-resume.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>>>>>>>>>>> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Vidya,
>>>>>>>>>>>>
>>>>>>>>>>>> I tested this patch on kernel v5.19-rc6.
>>>>>>>>>>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
>>>>>>>>>>>> This patch can restore L1SS after suspend/resume.
>>>>>>>>>>>>
>>>>>>>>>>>> The test results are as follows:
>>>>>>>>>>>>
>>>>>>>>>>>> After Boot:
>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>>>>>             Capabilities: [110 v1] L1 PM Substates
>>>>>>>>>>>>                     L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
>>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>>>                               PortCommonModeRestoreTime=255us
>>>>>>>>>>>> PortTPowerOnTime=3100us
>>>>>>>>>>>>                     L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>>>>>                                T_CommonMode=0us LTR1.2_Threshold=3145728ns
>>>>>>>>>>>>                     L1SubCtl2: T_PwrOn=3100us
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> After suspend/resume without this patch.
>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>>>>>             Capabilities: [110 v1] L1 PM Substates
>>>>>>>>>>>>                     L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
>>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>>>                               PortCommonModeRestoreTime=255us
>>>>>>>>>>>> PortTPowerOnTime=3100us
>>>>>>>>>>>>                     L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>>>>>>>>>>>>                                T_CommonMode=0us LTR1.2_Threshold=0ns
>>>>>>>>>>>>                     L1SubCtl2: T_PwrOn=10us
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> After suspend/resume with this patch.
>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>>>>>             Capabilities: [110 v1] L1 PM Substates
>>>>>>>>>>>>                     L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
>>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>>>                               PortCommonModeRestoreTime=255us
>>>>>>>>>>>> PortTPowerOnTime=3100us
>>>>>>>>>>>>                     L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>>>>>                                T_CommonMode=0us LTR1.2_Threshold=3145728ns
>>>>>>>>>>>>                     L1SubCtl2: T_PwrOn=3100us
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Tested-by: Ben Chuang <benchuanggli@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> Forgot to add mine:
>>>>>>>>>>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Ben Chuang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> Kenneth R. Crudup <kenny@panix.com>, Could you please verify this patch
>>>>>>>>>>>>> on your laptop (Dell XPS 13) one last time?
>>>>>>>>>>>>> IMHO, the regression observed on your laptop with an old version of the patch
>>>>>>>>>>>>> could be due to a buggy old version BIOS in the laptop.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Vidya Sagar
>>>>>>>>>>>>>
>>>>>>>>>>>>>      drivers/pci/pci.c       |  7 +++++++
>>>>>>>>>>>>>      drivers/pci/pci.h       |  4 ++++
>>>>>>>>>>>>>      drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>      3 files changed, 55 insertions(+)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>>>>>>>>> index cfaf40a540a8..aca05880aaa3 100644
>>>>>>>>>>>>> --- a/drivers/pci/pci.c
>>>>>>>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>>>>>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
>>>>>>>>>>>>>                     return i;
>>>>>>>>>>>>>
>>>>>>>>>>>>>             pci_save_ltr_state(dev);
>>>>>>>>>>>>> +       pci_save_aspm_l1ss_state(dev);
>>>>>>>>>>>>>             pci_save_dpc_state(dev);
>>>>>>>>>>>>>             pci_save_aer_state(dev);
>>>>>>>>>>>>>             pci_save_ptm_state(dev);
>>>>>>>>>>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
>>>>>>>>>>>>>              * LTR itself (in the PCIe capability).
>>>>>>>>>>>>>              */
>>>>>>>>>>>>>             pci_restore_ltr_state(dev);
>>>>>>>>>>>>> +       pci_restore_aspm_l1ss_state(dev);
>>>>>>>>>>>>>
>>>>>>>>>>>>>             pci_restore_pcie_state(dev);
>>>>>>>>>>>>>             pci_restore_pasid_state(dev);
>>>>>>>>>>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
>>>>>>>>>>>>>             if (error)
>>>>>>>>>>>>>                     pci_err(dev, "unable to allocate suspend buffer for LTR\n");
>>>>>>>>>>>>>
>>>>>>>>>>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
>>>>>>>>>>>>> +                                           2 * sizeof(u32));
>>>>>>>>>>>>> +       if (error)
>>>>>>>>>>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
>>>>>>>>>>>>> +
>>>>>>>>>>>>>             pci_allocate_vc_save_buffers(dev);
>>>>>>>>>>>>>      }
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>>>>>>>>>>> index e10cdec6c56e..92d8c92662a4 100644
>>>>>>>>>>>>> --- a/drivers/pci/pci.h
>>>>>>>>>>>>> +++ b/drivers/pci/pci.h
>>>>>>>>>>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
>>>>>>>>>>>>>      void pcie_aspm_exit_link_state(struct pci_dev *pdev);
>>>>>>>>>>>>>      void pcie_aspm_pm_state_change(struct pci_dev *pdev);
>>>>>>>>>>>>>      void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
>>>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
>>>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
>>>>>>>>>>>>>      #else
>>>>>>>>>>>>>      static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
>>>>>>>>>>>>>      static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
>>>>>>>>>>>>>      static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
>>>>>>>>>>>>>      static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
>>>>>>>>>>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
>>>>>>>>>>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
>>>>>>>>>>>>>      #endif
>>>>>>>>>>>>>
>>>>>>>>>>>>>      #ifdef CONFIG_PCIE_ECRC
>>>>>>>>>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>>>>>>>>>>> index a96b7424c9bc..2c29fdd20059 100644
>>>>>>>>>>>>> --- a/drivers/pci/pcie/aspm.c
>>>>>>>>>>>>> +++ b/drivers/pci/pcie/aspm.c
>>>>>>>>>>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
>>>>>>>>>>>>>                                     PCI_L1SS_CTL1_L1SS_MASK, val);
>>>>>>>>>>>>>      }
>>>>>>>>>>>>>
>>>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       int aspm_l1ss;
>>>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
>>>>>>>>>>>>> +       u32 *cap;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       if (!pci_is_pcie(dev))
>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>>>> +       if (!aspm_l1ss)
>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>>>> +       if (!save_state)
>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
>>>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
>>>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       int aspm_l1ss;
>>>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
>>>>>>>>>>>>> +       u32 *cap;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       if (!pci_is_pcie(dev))
>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>>>> +       if (!aspm_l1ss)
>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>>>> +       if (!save_state)
>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
>>>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
>>>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>>      static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
>>>>>>>>>>>>>      {
>>>>>>>>>>>>>             pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
>>>>>>>>>>>>> --
>>>>>>>>>>>>> 2.17.1
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> With this patch (and also mentioned
>>>>>>>>>> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
>>>>>>>>>> applied on 5.10 (chromeos-5.10) I am observing problems after
>>>>>>>>>> suspend/resume with my WiFi card - it looks like whole communication
>>>>>>>>>> via PCI fails. Attaching logs (dmesg, lspci -vvv before suspend/resume
>>>>>>>>>> and after) https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
>>>>>>>>>>
>>>>>>>>>> I played a little bit with this code and it looks like the
>>>>>>>>>> pci_write_config_dword() to the PCI_L1SS_CTL1 breaks it (don't know
>>>>>>>>>> why, not a PCI expert).
>>>>>>>>>
>>>>>>>>> Thanks a lot for testing this!  I'm not quite sure what to make of the
>>>>>>>>> results since v5.10 is fairly old (Dec 2020) and I don't know what
>>>>>>>>> other changes are in chromeos-5.10.
>>>>>>>
>>>>>>> Lukasz: I assume you are running this on Atlas and are seeing this bug
>>>>>>> when uprev'ving it to 5.10 kernel. Can you please try it on a newer
>>>>>>> Intel platform that have the latest upstream kernel running already
>>>>>>> and see if this can be reproduced there too?
>>>>>>> Note that the wifi PCI device is different on newer Intel platforms,
>>>>>>> but platform design is similar enough that I suspect we should see
>>>>>>> similar bug on those too. The other option is to try the latest
>>>>>>> upstream kernel on Atlas. Perhaps if we just care about wifi (and
>>>>>>> ignore bringing up the graphics stack and GUI), it may come up
>>>>>>> sufficiently enough to try this patch?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Rajat
>>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Random observations, no analysis below.  This from your dmesg
>>>>>>>>> certainly looks like PCI reads failing and returning ~0:
>>>>>>>>>
>>>>>>>>>       Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
>>>>>>>>>       iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
>>>>>>>>>       iwlwifi 0000:01:00.0: Device gone - attempting removal
>>>>>>>>>       Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
>>>>>>>>>
>>>>>>>>> And then we re-enumerate 01:00.0 and it looks like it may have been
>>>>>>>>> reset (BAR is 0):
>>>>>>>>>
>>>>>>>>>       pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
>>>>>>>>>       pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
>>>>>>>>>
>>>>>>>>> lspci diffs from before/after suspend:
>>>>>>>>>
>>>>>>>>>        00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
>>>>>>>>>          Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
>>>>>>>>>       -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
>>>>>>>>>       +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>>>       -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
>>>>>>>>>       +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>>>>>>>>>       -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
>>>>>>>>>       +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
>>>>>>>>>       -       Capabilities: [150 v0] Null
>>>>>>>>>       -       Capabilities: [200 v1] L1 PM Substates
>>>>>>>>>       -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>       -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
>>>>>>>>>       -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>>       -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
>>>>>>>>>       -               L1SubCtl2: T_PwrOn=60us
>>>>>>>>>
>>>>>>>>> The DevSta differences might be BIOS bugs, probably not relevant.
>>>>>>>>> Interesting that ASPM is disabled, maybe didn't get enabled after
>>>>>>>>> re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
>>>>>>>>> disappeared.
>>>>>>>>>
>>>>>>>>>        01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
>>>>>>>>>                       LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
>>>>>>>>>       -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>>>       +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>>>>>>>               Capabilities: [154 v1] L1 PM Substates
>>>>>>>>>                       L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>                                 PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
>>>>>>>>>       -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>>       -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
>>>>>>>>>       +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>>>>>>>>>       +                          T_CommonMode=0us LTR1.2_Threshold=0ns
>>>>>>>>>
>>>>>>>>> Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
>>>>>>>>> get reinitialized after re-enumeration?  Looks like we didn't restore
>>>>>>>>> L1SubCtl1.
>>>>>>>>>
>>>>>>>>> Bjorn
>>>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thank you all for the response and input! As Rajat mentioned I'm using
>>>>>> chromebook - but not Atlas (Amberlake) - in this case it is Babymega
>>>>>> (Apollolake)  - I will try to load most recent kernel and give it a
>>>>>> try once again.
>>>>>>
>>>>>> Best regards,
>>>>>> Lukasz
>>>>>
>>>>> Hi,
>>>>>
>>>>>     I have applied this patch on top of v5.19-rc7 (chromeos) and I'm
>>>>> still getting same results:
>>>>> https://gist.github.com/semihalf-majczak-lukasz/4b716704c21a3758d6711b2030ea34b9
>>>>>
>>>>> Best regards,
>>>>> Lukasz
>>>>>
>>> Hi Vidya,
>>>
>>> Sorry for the long delay, I have retested your patch on top of
>>> linux-next/master (next-20220802) - the results for my device remain
>>> the same.
>>> Here are the logs (lspci -vvv before suspend, lspci -vvv after resume and dmesg)
>>> https://gist.github.com/semihalf-majczak-lukasz/c7bfd811359f23278034056a8002b3ef
>>> Let me know if you need any more logs and/or tests.
>>>
>>> Best regards,
>>> Lukasz
>>>
> Hi Vidya,
> 
> After your last email, I re-tested my setup, and the capability
> register also disappears without your patch. So it looks like there
> is, in fact, some problem in my setup, and your patch just brings it
> to the surface: after resume it tries to write to a register that is
> no longer present. I'm very sorry for the confusion here; I did not
> notice that at the very beginning.
> 
> Best regards,
> Lukasz
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-08-08 16:16                         ` Vidya Sagar
@ 2022-08-23 14:55                           ` Kai-Heng Feng
  2022-08-25 23:01                             ` Bjorn Helgaas
  2022-08-26 13:00                             ` Vidya Sagar
  0 siblings, 2 replies; 24+ messages in thread
From: Kai-Heng Feng @ 2022-08-23 14:55 UTC (permalink / raw)
  To: Vidya Sagar
  Cc: Lukasz Majczak, Bjorn Helgaas, Rajat Jain, Ben Chuang, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, kenny, treding, jonathanh,
	abhsahu, sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv

Hi Vidya,

On Tue, Aug 9, 2022 at 12:17 AM Vidya Sagar <vidyas@nvidia.com> wrote:
>
> Thanks Lukasz for the update.
> I think this confirms that there is no issue with the patch as such.
> Bjorn, could you please define the next step for this patch?

I think the L1SS cap went away _after_ the L1SS registers were restored,
since your patch already checks the cap before doing any write:
+       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
+       if (!aspm_l1ss)
+               return;

That means it's more likely to be caused by the following change:
+       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
+       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);

So is it possible to clear PCI_L1SS_CTL1 before setting PCI_L1SS_CTL2,
like what aspm_calc_l1ss_info() does?
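
As an illustration of that ordering, here is a minimal user-space
sketch, untested on real hardware and not a proposed kernel change. It
assumes a mocked config space: struct mock_dev, mock_read32(),
mock_write32() and restore_l1ss_ordered() are hypothetical names, with
the mock accessors standing in for pci_read_config_dword() and
pci_write_config_dword(). The register offsets and the L1.2 enable mask
are the values from include/uapi/linux/pci_regs.h.

```c
#include <stdint.h>
#include <string.h>

/* Offsets within the L1SS capability and the L1.2 enable bits,
 * as defined in include/uapi/linux/pci_regs.h. */
#define PCI_L1SS_CTL1            0x08
#define PCI_L1SS_CTL2            0x0c
#define PCI_L1SS_CTL1_L1_2_MASK  0x00000005

#define MOCK_LOG_MAX 8

/* Hypothetical stand-in for a device: config space plus a write log. */
struct mock_dev {
	uint8_t  cfg[4096];
	int      l1ss;                  /* L1SS capability offset, 0 if absent */
	int      nlog;                  /* number of logged writes */
	int      log_off[MOCK_LOG_MAX]; /* offset of each write, in order */
	uint32_t log_val[MOCK_LOG_MAX]; /* value of each write, in order */
};

static uint32_t mock_read32(struct mock_dev *d, int off)
{
	uint32_t v;

	memcpy(&v, d->cfg + off, sizeof(v));
	return v;
}

static void mock_write32(struct mock_dev *d, int off, uint32_t v)
{
	memcpy(d->cfg + off, &v, sizeof(v));
	if (d->nlog < MOCK_LOG_MAX) {
		d->log_off[d->nlog] = off;
		d->log_val[d->nlog] = v;
		d->nlog++;
	}
}

/* Restore CTL1/CTL2 with L1.2 disabled while the timing fields change. */
static void restore_l1ss_ordered(struct mock_dev *d, uint32_t saved_ctl1,
				 uint32_t saved_ctl2)
{
	uint32_t ctl1;

	if (!d->l1ss)
		return;

	/* 1. Clear the L1.2 enables before touching anything else. */
	ctl1 = mock_read32(d, d->l1ss + PCI_L1SS_CTL1);
	mock_write32(d, d->l1ss + PCI_L1SS_CTL1,
		     ctl1 & ~PCI_L1SS_CTL1_L1_2_MASK);

	/* 2. Restore CTL2 (T_POWER_ON). */
	mock_write32(d, d->l1ss + PCI_L1SS_CTL2, saved_ctl2);

	/* 3. Restore the rest of CTL1, L1.2 enables still clear. */
	mock_write32(d, d->l1ss + PCI_L1SS_CTL1,
		     saved_ctl1 & ~PCI_L1SS_CTL1_L1_2_MASK);

	/* 4. Re-enable L1.2 only if it was enabled when saved. */
	if (saved_ctl1 & PCI_L1SS_CTL1_L1_2_MASK)
		mock_write32(d, d->l1ss + PCI_L1SS_CTL1, saved_ctl1);
}
```

In the patch the saved values would come from save_state->cap.data,
where CTL2 is stored first and CTL1 second. Note that, as far as I can
tell, aspm_calc_l1ss_info() disables L1.2 on both ends of the link
before reprogramming, whereas this sketch only handles a single device.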

Kai-Heng

>
> Thanks,
> Vidya Sagar
>
> On 8/8/2022 7:37 PM, Lukasz Majczak wrote:
> > On Wed, Aug 3, 2022 at 2:55 PM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>
> >> Thanks Lukasz for the logs.
> >> I still see that the L1SS capability in the root port (00:14.0)
> >> disappeared after resume.
> >> I still don't understand how this patch can make the capability register
> >> itself disappear. Honestly, I still see this as a HW issue.
> >> Bjorn, could you please throw some light on this?
> >>
> >> Thanks,
> >> Vidya Sagar
> >>
> >> On 8/3/2022 5:34 PM, Lukasz Majczak wrote:
> >>> On Fri, Jul 29, 2022 at 4:36 PM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>
> >>>> Hi Lukasz,
> >>>> Thanks for sharing your observations.
> >>>>
> >>>> Could you please also share the output of 'sudo lspci -vvvv' before and
> >>>> after suspend-resume cycle with the latest linux-next?
> >>>> Do we still see the L1SS capabilities getting disappeared post resume?
> >>>>
> >>>> Thanks,
> >>>> Vidya Sagar
> >>>>
> >>>> On 7/29/2022 3:09 PM, Lukasz Majczak wrote:
> >>>>> On Tue, Jul 26, 2022 at 9:20 AM Lukasz Majczak <lma@semihalf.com> wrote:
> >>>>>>
> >>>>>> On Tue, Jul 26, 2022 at 12:51 AM Rajat Jain <rajatja@google.com> wrote:
> >>>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>>>>>
> >>>>>>>> Agree with Bjorn's observations.
> >>>>>>>> The fact that the L1SS capability registers themselves disappeared in
> >>>>>>>> the root port post resume indicates that there seems to be something
> >>>>>>>> wrong with the BIOS itself.
> >>>>>>>> Could you please check from that perspective?
> >>>>>>>
> >>>>>>> ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
> >>>>>>> is a shallower sleep state that preserves more state than, for e.g. S3
> >>>>>>> (suspend-to-RAM). When we use S0ix, then BIOS does not come in picture
> >>>>>>> at all. i.e. after the kernel runs its suspend routines, it just puts
> >>>>>>> the CPU into S0ix state. So I do not think there is a BIOS angle to
> >>>>>>> this.
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Vidya Sagar
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
> >>>>>>>>> On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
> >>>>>>>>>> On Fri, Jul 22, 2022 at 9:31 AM Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> >>>>>>>>>>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
> >>>>>>>>>>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> >>>>>>>>>>>>> saved and restored during suspend/resume leading to L1 Substates
> >>>>>>>>>>>>> configuration being lost post-resume.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Save the L1 Substates control registers so that the configuration is
> >>>>>>>>>>>>> retained post-resume.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> >>>>>>>>>>>>> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Vidya,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I tested this patch on kernel v5.19-rc6.
> >>>>>>>>>>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
> >>>>>>>>>>>> This patch can restore L1SS after suspend/resume.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The test results are as follows:
> >>>>>>>>>>>>
> >>>>>>>>>>>> After Boot:
> >>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>>>             Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>>>                     L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>                               PortCommonModeRestoreTime=255us
> >>>>>>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>>>>>                     L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>>>                                T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>>>>>>>>>                     L1SubCtl2: T_PwrOn=3100us
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> After suspend/resume without this patch.
> >>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>>>             Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>>>                     L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>                               PortCommonModeRestoreTime=255us
> >>>>>>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>>>>>                     L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>>>>>>>>>                                T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>>>>>>>>>                     L1SubCtl2: T_PwrOn=10us
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> After suspend/resume with this patch.
> >>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>>>             Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>>>                     L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>                               PortCommonModeRestoreTime=255us
> >>>>>>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>>>>>                     L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>>>                                T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>>>>>>>>>                     L1SubCtl2: T_PwrOn=3100us
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Tested-by: Ben Chuang <benchuanggli@gmail.com>
> >>>>>>>>>>>
> >>>>>>>>>>> Forgot to add mine:
> >>>>>>>>>>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best regards,
> >>>>>>>>>>>> Ben Chuang
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> ---
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>> Kenneth R. Crudup <kenny@panix.com>, Could you please verify this patch
> >>>>>>>>>>>>> on your laptop (Dell XPS 13) one last time?
> >>>>>>>>>>>>> IMHO, the regression observed on your laptop with an old version of the patch
> >>>>>>>>>>>>> could be due to a buggy old version BIOS in the laptop.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Vidya Sagar
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>      drivers/pci/pci.c       |  7 +++++++
> >>>>>>>>>>>>>      drivers/pci/pci.h       |  4 ++++
> >>>>>>>>>>>>>      drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>>>>>>      3 files changed, 55 insertions(+)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >>>>>>>>>>>>> index cfaf40a540a8..aca05880aaa3 100644
> >>>>>>>>>>>>> --- a/drivers/pci/pci.c
> >>>>>>>>>>>>> +++ b/drivers/pci/pci.c
> >>>>>>>>>>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
> >>>>>>>>>>>>>                     return i;
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>             pci_save_ltr_state(dev);
> >>>>>>>>>>>>> +       pci_save_aspm_l1ss_state(dev);
> >>>>>>>>>>>>>             pci_save_dpc_state(dev);
> >>>>>>>>>>>>>             pci_save_aer_state(dev);
> >>>>>>>>>>>>>             pci_save_ptm_state(dev);
> >>>>>>>>>>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
> >>>>>>>>>>>>>              * LTR itself (in the PCIe capability).
> >>>>>>>>>>>>>              */
> >>>>>>>>>>>>>             pci_restore_ltr_state(dev);
> >>>>>>>>>>>>> +       pci_restore_aspm_l1ss_state(dev);
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>             pci_restore_pcie_state(dev);
> >>>>>>>>>>>>>             pci_restore_pasid_state(dev);
> >>>>>>>>>>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
> >>>>>>>>>>>>>             if (error)
> >>>>>>>>>>>>>                     pci_err(dev, "unable to allocate suspend buffer for LTR\n");
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
> >>>>>>>>>>>>> +                                           2 * sizeof(u32));
> >>>>>>>>>>>>> +       if (error)
> >>>>>>>>>>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>>             pci_allocate_vc_save_buffers(dev);
> >>>>>>>>>>>>>      }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> >>>>>>>>>>>>> index e10cdec6c56e..92d8c92662a4 100644
> >>>>>>>>>>>>> --- a/drivers/pci/pci.h
> >>>>>>>>>>>>> +++ b/drivers/pci/pci.h
> >>>>>>>>>>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
> >>>>>>>>>>>>>      void pcie_aspm_exit_link_state(struct pci_dev *pdev);
> >>>>>>>>>>>>>      void pcie_aspm_pm_state_change(struct pci_dev *pdev);
> >>>>>>>>>>>>>      void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
> >>>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>>>>>>>>>      #else
> >>>>>>>>>>>>>      static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>      static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>      static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>      static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
> >>>>>>>>>>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>>>>>>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>>>>>>>>>      #endif
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>      #ifdef CONFIG_PCIE_ECRC
> >>>>>>>>>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> >>>>>>>>>>>>> index a96b7424c9bc..2c29fdd20059 100644
> >>>>>>>>>>>>> --- a/drivers/pci/pcie/aspm.c
> >>>>>>>>>>>>> +++ b/drivers/pci/pcie/aspm.c
> >>>>>>>>>>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> >>>>>>>>>>>>>                                     PCI_L1SS_CTL1_L1SS_MASK, val);
> >>>>>>>>>>>>>      }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
> >>>>>>>>>>>>> +{
> >>>>>>>>>>>>> +       int aspm_l1ss;
> >>>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
> >>>>>>>>>>>>> +       u32 *cap;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +       if (!pci_is_pcie(dev))
> >>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>> +       if (!aspm_l1ss)
> >>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>> +       if (!save_state)
> >>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
> >>>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
> >>>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
> >>>>>>>>>>>>> +}
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
> >>>>>>>>>>>>> +{
> >>>>>>>>>>>>> +       int aspm_l1ss;
> >>>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
> >>>>>>>>>>>>> +       u32 *cap;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +       if (!pci_is_pcie(dev))
> >>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>> +       if (!aspm_l1ss)
> >>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>> +       if (!save_state)
> >>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
> >>>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> >>>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> >>>>>>>>>>>>> +}
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>>      static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
> >>>>>>>>>>>>>      {
> >>>>>>>>>>>>>             pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> 2.17.1
> >>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> With this patch (and also mentioned
> >>>>>>>>>> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
> >>>>>>>>>> applied on 5.10 (chromeos-5.10) I am observing problems after
> >>>>>>>>>> suspend/resume with my WiFi card - it looks like whole communication
> >>>>>>>>>> via PCI fails. Attaching logs (dmesg, lspci -vvv before suspend/resume
> >>>>>>>>>> and after) https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
> >>>>>>>>>>
> >>>>>>>>>> I played a little bit with this code and it looks like the
> >>>>>>>>>> pci_write_config_dword() to the PCI_L1SS_CTL1 breaks it (don't know
> >>>>>>>>>> why, not a PCI expert).
> >>>>>>>>>
> >>>>>>>>> Thanks a lot for testing this!  I'm not quite sure what to make of the
> >>>>>>>>> results since v5.10 is fairly old (Dec 2020) and I don't know what
> >>>>>>>>> other changes are in chromeos-5.10.
> >>>>>>>
> >>>>>>> Lukasz: I assume you are running this on Atlas and are seeing this bug
> >>>>>>> when uprev'ving it to 5.10 kernel. Can you please try it on a newer
> >>>>>>> Intel platform that have the latest upstream kernel running already
> >>>>>>> and see if this can be reproduced there too?
> >>>>>>> Note that the wifi PCI device is different on newer Intel platforms,
> >>>>>>> but platform design is similar enough that I suspect we should see
> >>>>>>> similar bug on those too. The other option is to try the latest
> >>>>>>> upstream kernel on Atlas. Perhaps if we just care about wifi (and
> >>>>>>> ignore bringing up the graphics stack and GUI), it may come up
> >>>>>>> sufficiently enough to try this patch?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> Rajat
> >>>>>>>
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>> Random observations, no analysis below.  This from your dmesg
> >>>>>>>>> certainly looks like PCI reads failing and returning ~0:
> >>>>>>>>>
> >>>>>>>>>       Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
> >>>>>>>>>       iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> >>>>>>>>>       iwlwifi 0000:01:00.0: Device gone - attempting removal
> >>>>>>>>>       Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
> >>>>>>>>>
> >>>>>>>>> And then we re-enumerate 01:00.0 and it looks like it may have been
> >>>>>>>>> reset (BAR is 0):
> >>>>>>>>>
> >>>>>>>>>       pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
> >>>>>>>>>       pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> >>>>>>>>>
> >>>>>>>>> lspci diffs from before/after suspend:
> >>>>>>>>>
> >>>>>>>>>        00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
> >>>>>>>>>          Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
> >>>>>>>>>       -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
> >>>>>>>>>       +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>>>       -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>>>       +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>>>       -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
> >>>>>>>>>       +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
> >>>>>>>>>       -       Capabilities: [150 v0] Null
> >>>>>>>>>       -       Capabilities: [200 v1] L1 PM Substates
> >>>>>>>>>       -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>       -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> >>>>>>>>>       -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>       -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
> >>>>>>>>>       -               L1SubCtl2: T_PwrOn=60us
> >>>>>>>>>
> >>>>>>>>> The DevSta differences might be BIOS bugs, probably not relevant.
> >>>>>>>>> Interesting that ASPM is disabled, maybe didn't get enabled after
> >>>>>>>>> re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
> >>>>>>>>> disappeared.
> >>>>>>>>>
> >>>>>>>>>        01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
> >>>>>>>>>                       LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>>>       -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>>>       +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >>>>>>>>>               Capabilities: [154 v1] L1 PM Substates
> >>>>>>>>>                       L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>                                 PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
> >>>>>>>>>       -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>       -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
> >>>>>>>>>       +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>>>>>>       +                          T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>>>>>>
> >>>>>>>>> Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
> >>>>>>>>> get reinitialized after re-enumeration?  Looks like we didn't restore
> >>>>>>>>> L1SubCtl1.
> >>>>>>>>>
> >>>>>>>>> Bjorn
> >>>>>>>>>
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Thank you all for the response and input! As Rajat mentioned I'm using
> >>>>>> chromebook - but not Atlas (Amberlake) - in this case it is Babymega
> >>>>>> (Apollolake)  - I will try to load most recent kernel and give it a
> >>>>>> try once again.
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Lukasz
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>>     I have applied this patch on top of v5.19-rc7 (chromeos) and I'm
> >>>>> still getting same results:
> >>>>> https://gist.github.com/semihalf-majczak-lukasz/4b716704c21a3758d6711b2030ea34b9
> >>>>>
> >>>>> Best regards,
> >>>>> Lukasz
> >>>>>
> >>> Hi Vidya,
> >>>
> >>> Sorry for the long delay, I have retested your patch on top of
> >>> linux-next/master (next-20220802) - the results for my device remain
> >>> the same.
> >>> Here are the logs (lspci -vvv before suspend, lspci -vvv after resume and dmesg)
> >>> https://gist.github.com/semihalf-majczak-lukasz/c7bfd811359f23278034056a8002b3ef
> >>> Let me know if you need any more logs and/or tests.
> >>>
> >>> Best regards,
> >>> Lukasz
> >>>
> > Hi Vidya,
> >
> > After your last email, I've re-tested my setup and (without your
> > patch) the capability register also disappears - so it looks like
> > there is - in fact - some problem in my setup, and your patch just
> > brings it to the surface, since after resume it tries to write to a
> > register that is no longer present. I'm very sorry for the confusion
> > here; I did not notice that at the very beginning.
> >
> > Best regards,
> > Lukasz
> >

* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-08-23 14:55                           ` Kai-Heng Feng
@ 2022-08-25 23:01                             ` Bjorn Helgaas
  2022-08-26  3:13                               ` Kai-Heng Feng
  2022-08-26 13:00                             ` Vidya Sagar
  1 sibling, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2022-08-25 23:01 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Vidya Sagar, Lukasz Majczak, Rajat Jain, Ben Chuang, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, kenny, treding, jonathanh,
	abhsahu, sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv

On Tue, Aug 23, 2022 at 10:55:01PM +0800, Kai-Heng Feng wrote:
> On Tue, Aug 9, 2022 at 12:17 AM Vidya Sagar <vidyas@nvidia.com> wrote:
> >
> > Thanks Lukasz for the update.
> > I think this confirms that there is no issue with the patch as such.
> > Bjorn, could you please define the next step for this patch?
> 
> I think the L1SS cap went away _after_ L1SS registers are restored,
> since your patch already checks the cap before doing any write:
> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> +       if (!aspm_l1ss)
> +               return;
> 
> That means it's more likely to be caused by the following change:
> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> 
> So is it possible to clear PCI_L1SS_CTL1 before setting PCI_L1SS_CTL2,
> like what aspm_calc_l1ss_info() does?

Sorry, I've totally lost track of where we are with this.  I guess the
object is to save/restore L1SS state.

And there are two problems that aren't understood yet?

  1) Lukasz's 01:00.0 wifi device didn't work immediately after
  resume, but seemed to be hot-added later? [1]

  2) The 00:14.0 Root Port L1SS capability was present before
  suspend/resume but not after? [2,3]

I thought Lukasz's latest emails [4,5] indicated that problem 1) still
happened and presumably only happens with Vidya's patch, and 2) also
still happens, but happens even *without* Vidya's patch.  Do I have
that right?

If adding the patch causes 1), obviously we would need to fix that.
It would certainly be good to understand 2) as well, but I guess if
that's a pre-existing problem, ...

Bjorn

[1] https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3#file-dmesg-L1762
[2] https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3#file-lspci-before-suspend-log-L136
[3] https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3#file-lspci-after-suspend-log-L136
[4] https://lore.kernel.org/r/CAFJ_xbr5NjoV1jC3P93N4UgooUuNdCRnrX7HuK=xLtPM5y7EjA@mail.gmail.com
[5] https://lore.kernel.org/r/CAFJ_xboyQyEaDeQ+pZH_YqN52-ALGNqzmmzeyNt6X_Cz-c1w9Q@mail.gmail.com

* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-08-25 23:01                             ` Bjorn Helgaas
@ 2022-08-26  3:13                               ` Kai-Heng Feng
  0 siblings, 0 replies; 24+ messages in thread
From: Kai-Heng Feng @ 2022-08-26  3:13 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Vidya Sagar, Lukasz Majczak, Rajat Jain, Ben Chuang, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, kenny, treding, jonathanh,
	abhsahu, sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv

On Fri, Aug 26, 2022 at 7:01 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Aug 23, 2022 at 10:55:01PM +0800, Kai-Heng Feng wrote:
> > On Tue, Aug 9, 2022 at 12:17 AM Vidya Sagar <vidyas@nvidia.com> wrote:
> > >
> > > Thanks Lukasz for the update.
> > > I think this confirms that there is no issue with the patch as such.
> > > Bjorn, could you please define the next step for this patch?
> >
> > I think the L1SS cap went away _after_ L1SS registers are restored,
> > since your patch already checks the cap before doing any write:
> > +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> > +       if (!aspm_l1ss)
> > +               return;
> >
> > That means it's more likely to be caused by the following change:
> > +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> > +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> >
> > So is it possible to clear PCI_L1SS_CTL1 before setting PCI_L1SS_CTL2,
> > like what aspm_calc_l1ss_info() does?
>
> Sorry, I've totally lost track of where we are with this.  I guess the
> object is to save/restore L1SS state.
>
> And there are two problems that aren't understood yet?
>
>   1) Lukasz's 01:00.0 wifi device didn't work immediately after
>   resume, but seemed to be hot-added later? [1]
>
>   2) The 00:14.0 Root Port L1SS capability was present before
>   suspend/resume but not after? [2,3]
>
> I thought Lukasz's latest emails [4,5] indicated that problem 1) still
> happened and presumably only happens with Vidya's patch, and 2) also
> still happens, but happens even *without* Vidya's patch.  Do I have
> that right?

Thanks, so the root port was already losing its L1SS cap before the patch
was applied.

>
> If adding the patch causes 1), obviously we would need to fix that.
> It would certainly be good to understand 2) as well, but I guess if
> that's a pre-existing problem, ...

I wonder if checking the parent device's L1SS cap in
pci_restore_aspm_l1ss_state() would be a good workaround?
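
A rough userspace model of that workaround, purely for illustration.
Everything here (struct mock_dev, cfg_write(), cfg_read(),
restore_l1ss()) is a hypothetical stand-in and not a kernel API; in the
kernel the parent would presumably come from pci_upstream_bridge() and
the lookup from pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS). The
sketch only shows the control flow: skip the restore when the upstream
bridge no longer exposes the L1SS capability.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets inside the L1SS capability (values as in pci_regs.h). */
#define L1SS_CTL1 0x08
#define L1SS_CTL2 0x0c

/* Hypothetical stand-in for struct pci_dev, for illustration only. */
struct mock_dev {
	struct mock_dev *parent;  /* upstream bridge, NULL for a Root Port */
	int l1ss_off;             /* L1SS capability offset, 0 if absent */
	uint32_t cfg[1024];       /* 4 KB of config space, dword-indexed */
	uint32_t saved_ctl1;      /* values captured at suspend time */
	uint32_t saved_ctl2;
	int have_saved;           /* non-zero once a save buffer is filled */
};

static void cfg_write(struct mock_dev *d, int off, uint32_t val)
{
	d->cfg[off / 4] = val;
}

static uint32_t cfg_read(const struct mock_dev *d, int off)
{
	return d->cfg[off / 4];
}

/* Returns 1 if the registers were restored, 0 if the restore was skipped. */
static int restore_l1ss(struct mock_dev *d)
{
	assert(d);

	if (!d->l1ss_off || !d->have_saved)
		return 0;

	/* Proposed workaround: bail out if the parent lost its L1SS cap. */
	if (d->parent && !d->parent->l1ss_off)
		return 0;

	/* Same write order as the V2 patch: CTL2 first, then CTL1. */
	cfg_write(d, d->l1ss_off + L1SS_CTL2, d->saved_ctl2);
	cfg_write(d, d->l1ss_off + L1SS_CTL1, d->saved_ctl1);
	return 1;
}
```

With this shape, an endpoint below a Root Port whose capability has
vanished is left untouched, while the normal case restores both
registers as before.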

If this is indeed a firmware side issue, it explains why Kenneth's XPS
doesn't have this issue anymore after some BIOS updates.

Kai-Heng

>
> Bjorn
>
> [1] https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3#file-dmesg-L1762
> [2] https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3#file-lspci-before-suspend-log-L136
> [3] https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3#file-lspci-after-suspend-log-L136
> [4] https://lore.kernel.org/r/CAFJ_xbr5NjoV1jC3P93N4UgooUuNdCRnrX7HuK=xLtPM5y7EjA@mail.gmail.com
> [5] https://lore.kernel.org/r/CAFJ_xboyQyEaDeQ+pZH_YqN52-ALGNqzmmzeyNt6X_Cz-c1w9Q@mail.gmail.com

* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-08-23 14:55                           ` Kai-Heng Feng
  2022-08-25 23:01                             ` Bjorn Helgaas
@ 2022-08-26 13:00                             ` Vidya Sagar
  2022-08-30 11:15                               ` Lukasz Majczak
  1 sibling, 1 reply; 24+ messages in thread
From: Vidya Sagar @ 2022-08-26 13:00 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Lukasz Majczak, Bjorn Helgaas, Rajat Jain, Ben Chuang, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, kenny, treding, jonathanh,
	abhsahu, sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv



On 8/23/2022 8:25 PM, Kai-Heng Feng wrote:
> External email: Use caution opening links or attachments
> 
> 
> Hi Vidya,
> 
> On Tue, Aug 9, 2022 at 12:17 AM Vidya Sagar <vidyas@nvidia.com> wrote:
>>
>> Thanks Lukasz for the update.
>> I think this confirms that there is no issue with the patch as such.
>> Bjorn, could you please define the next step for this patch?
> 
> I think the L1SS cap went away _after_ L1SS registers are restored,
> since your patch already checks the cap before doing any write:
> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> +       if (!aspm_l1ss)
> +               return;
> 
> That means it's more likely to be caused by the following change:
> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> 
> So is it possible to clear PCI_L1SS_CTL1 before setting PCI_L1SS_CTL2,
> like what aspm_calc_l1ss_info() does?

I posted a new patch
https://patchwork.kernel.org/project/linux-pci/patch/20220826125526.28859-1-vidyas@nvidia.com/
that keeps L1.2 disabled while restoring the rest of the fields in the
Control-1 register and restores the L1.2 enable bits afterwards. Could you
please try this new patch on your setup and share your observations?
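
For readers following along, the ordering described above can be
modelled roughly as below. This is a userspace sketch, not the code from
the new patch: struct mock_l1ss, mock_write() and restore_l1ss_ordered()
are hypothetical names, and the only values taken from the kernel
headers are the register offsets and the L1.2 enable mask (PCI-PM L1.2
and ASPM L1.2 bits of Control-1). The mock counts any Control-2 write
that happens while an L1.2 enable bit is still set.

```c
#include <assert.h>
#include <stdint.h>

#define L1SS_CTL1            0x08
#define L1SS_CTL2            0x0c
#define L1SS_CTL1_L1_2_MASK  0x00000005u  /* PCI-PM L1.2 | ASPM L1.2 */

/* Hypothetical register model, for illustration only. */
struct mock_l1ss {
	uint32_t ctl1;
	uint32_t ctl2;
	int ctl2_writes_with_l12_on;  /* ordering violations seen so far */
};

static void mock_write(struct mock_l1ss *r, int reg, uint32_t val)
{
	if (reg == L1SS_CTL1) {
		r->ctl1 = val;
	} else if (reg == L1SS_CTL2) {
		/* Flag writes made while L1.2 is still enabled. */
		if (r->ctl1 & L1SS_CTL1_L1_2_MASK)
			r->ctl2_writes_with_l12_on++;
		r->ctl2 = val;
	}
}

static void restore_l1ss_ordered(struct mock_l1ss *r,
				 uint32_t saved_ctl1, uint32_t saved_ctl2)
{
	assert(r);

	/* 1. Make sure L1.2 is off before touching anything else. */
	mock_write(r, L1SS_CTL1, r->ctl1 & ~L1SS_CTL1_L1_2_MASK);

	/* 2. Restore Control-2 (T_PwrOn). */
	mock_write(r, L1SS_CTL2, saved_ctl2);

	/* 3. Restore Control-1 with the L1.2 enable bits still clear. */
	mock_write(r, L1SS_CTL1, saved_ctl1 & ~L1SS_CTL1_L1_2_MASK);

	/* 4. Finally restore the L1.2 enable bits. */
	mock_write(r, L1SS_CTL1, saved_ctl1);
}
```

After the last step both registers hold their saved values, and the
violation counter stays at zero because Control-2 is only written while
the L1.2 enables are clear.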

Thanks & Regards,
Vidya Sagar

> 
> Kai-Heng
> 
>>
>> Thanks,
>> Vidya Sagar
>>
>> On 8/8/2022 7:37 PM, Lukasz Majczak wrote:
>>>
>>> On Wed, Aug 3, 2022 at 14:55 Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>
>>>> Thanks Lukasz for the logs.
>>>> I still see that the L1SS capability in the root port (00:14.0) disappeared
>>>> after resume.
>>>> I still don't understand how this patch can make the capability register
>>>> itself disappear. Honestly, I still see this as a HW issue.
>>>> Bjorn, could you please throw some light on this?
>>>>
>>>> Thanks,
>>>> Vidya Sagar
>>>>
>>>> On 8/3/2022 5:34 PM, Lukasz Majczak wrote:
>>>>>
>>>>> On Fri, Jul 29, 2022 at 16:36 Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>>>
>>>>>> Hi Lukasz,
>>>>>> Thanks for sharing your observations.
>>>>>>
>>>>>> Could you please also share the output of 'sudo lspci -vvvv' before and
>>>>>> after suspend-resume cycle with the latest linux-next?
>>>>>> Do we still see the L1SS capabilities disappearing post resume?
>>>>>>
>>>>>> Thanks,
>>>>>> Vidya Sagar
>>>>>>
>>>>>> On 7/29/2022 3:09 PM, Lukasz Majczak wrote:
>>>>>>>
>>>>>>> On Tue, Jul 26, 2022 at 09:20 Lukasz Majczak <lma@semihalf.com> wrote:
>>>>>>>>
>>>>>>>> On Tue, Jul 26, 2022 at 00:51 Rajat Jain <rajatja@google.com> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Agree with Bjorn's observations.
>>>>>>>>>> The fact that the L1SS capability registers themselves disappeared in
>>>>>>>>>> the root port post resume indicates that there seems to be something
>>>>>>>>>> wrong with the BIOS itself.
>>>>>>>>>> Could you please check from that perspective?
>>>>>>>>>
>>>>>>>>> ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
>>>>>>>>> is a shallower sleep state that preserves more state than, e.g., S3
>>>>>>>>> (suspend-to-RAM). When we use S0ix, the BIOS does not come into the
>>>>>>>>> picture at all, i.e. after the kernel runs its suspend routines, it
>>>>>>>>> just puts the CPU into the S0ix state. So I do not think there is a
>>>>>>>>> BIOS angle to this.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Vidya Sagar
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
>>>>>>>>>>>> On Fri, Jul 22, 2022 at 09:31 Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
>>>>>>>>>>>>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
>>>>>>>>>>>>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
>>>>>>>>>>>>>>> saved and restored during suspend/resume leading to L1 Substates
>>>>>>>>>>>>>>> configuration being lost post-resume.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Save the L1 Substates control registers so that the configuration is
>>>>>>>>>>>>>>> retained post-resume.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>>>>>>>>>>>>> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Vidya,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I tested this patch on kernel v5.19-rc6.
>>>>>>>>>>>>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
>>>>>>>>>>>>>> This patch can restore L1SS after suspend/resume.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The test results are as follows:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> After Boot:
>>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>>>>>>>              Capabilities: [110 v1] L1 PM Substates
>>>>>>>>>>>>>>                      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>>>>>                                PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
>>>>>>>>>>>>>>                      L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>>>>>>>                                 T_CommonMode=0us LTR1.2_Threshold=3145728ns
>>>>>>>>>>>>>>                      L1SubCtl2: T_PwrOn=3100us
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> After suspend/resume without this patch.
>>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>>>>>>>              Capabilities: [110 v1] L1 PM Substates
>>>>>>>>>>>>>>                      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>>>>>                                PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
>>>>>>>>>>>>>>                      L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>>>>>>>>>>>>>>                                 T_CommonMode=0us LTR1.2_Threshold=0ns
>>>>>>>>>>>>>>                      L1SubCtl2: T_PwrOn=10us
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> After suspend/resume with this patch.
>>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>>>>>>>              Capabilities: [110 v1] L1 PM Substates
>>>>>>>>>>>>>>                      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>>>>>                                PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
>>>>>>>>>>>>>>                      L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>>>>>>>                                 T_CommonMode=0us LTR1.2_Threshold=3145728ns
>>>>>>>>>>>>>>                      L1SubCtl2: T_PwrOn=3100us
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Tested-by: Ben Chuang <benchuanggli@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Forgot to add mine:
>>>>>>>>>>>>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Ben Chuang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> Kenneth R. Crudup <kenny@panix.com>, Could you please verify this patch
>>>>>>>>>>>>>>> on your laptop (Dell XPS 13) one last time?
>>>>>>>>>>>>>>> IMHO, the regression observed on your laptop with an old version of the patch
>>>>>>>>>>>>>>> could be due to a buggy old version BIOS in the laptop.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Vidya Sagar
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       drivers/pci/pci.c       |  7 +++++++
>>>>>>>>>>>>>>>       drivers/pci/pci.h       |  4 ++++
>>>>>>>>>>>>>>>       drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>>>       3 files changed, 55 insertions(+)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>>>>>>>>>>> index cfaf40a540a8..aca05880aaa3 100644
>>>>>>>>>>>>>>> --- a/drivers/pci/pci.c
>>>>>>>>>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>>>>>>>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
>>>>>>>>>>>>>>>                      return i;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>              pci_save_ltr_state(dev);
>>>>>>>>>>>>>>> +       pci_save_aspm_l1ss_state(dev);
>>>>>>>>>>>>>>>              pci_save_dpc_state(dev);
>>>>>>>>>>>>>>>              pci_save_aer_state(dev);
>>>>>>>>>>>>>>>              pci_save_ptm_state(dev);
>>>>>>>>>>>>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
>>>>>>>>>>>>>>>               * LTR itself (in the PCIe capability).
>>>>>>>>>>>>>>>               */
>>>>>>>>>>>>>>>              pci_restore_ltr_state(dev);
>>>>>>>>>>>>>>> +       pci_restore_aspm_l1ss_state(dev);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>              pci_restore_pcie_state(dev);
>>>>>>>>>>>>>>>              pci_restore_pasid_state(dev);
>>>>>>>>>>>>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
>>>>>>>>>>>>>>>              if (error)
>>>>>>>>>>>>>>>                      pci_err(dev, "unable to allocate suspend buffer for LTR\n");
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
>>>>>>>>>>>>>>> +                                           2 * sizeof(u32));
>>>>>>>>>>>>>>> +       if (error)
>>>>>>>>>>>>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>              pci_allocate_vc_save_buffers(dev);
>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>>>>>>>>>>>>> index e10cdec6c56e..92d8c92662a4 100644
>>>>>>>>>>>>>>> --- a/drivers/pci/pci.h
>>>>>>>>>>>>>>> +++ b/drivers/pci/pci.h
>>>>>>>>>>>>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
>>>>>>>>>>>>>>>       void pcie_aspm_exit_link_state(struct pci_dev *pdev);
>>>>>>>>>>>>>>>       void pcie_aspm_pm_state_change(struct pci_dev *pdev);
>>>>>>>>>>>>>>>       void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
>>>>>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
>>>>>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
>>>>>>>>>>>>>>>       #else
>>>>>>>>>>>>>>>       static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
>>>>>>>>>>>>>>>       static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
>>>>>>>>>>>>>>>       static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
>>>>>>>>>>>>>>>       static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
>>>>>>>>>>>>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
>>>>>>>>>>>>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
>>>>>>>>>>>>>>>       #endif
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       #ifdef CONFIG_PCIE_ECRC
>>>>>>>>>>>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>>>>>>>>>>>>> index a96b7424c9bc..2c29fdd20059 100644
>>>>>>>>>>>>>>> --- a/drivers/pci/pcie/aspm.c
>>>>>>>>>>>>>>> +++ b/drivers/pci/pcie/aspm.c
>>>>>>>>>>>>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
>>>>>>>>>>>>>>>                                      PCI_L1SS_CTL1_L1SS_MASK, val);
>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       int aspm_l1ss;
>>>>>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
>>>>>>>>>>>>>>> +       u32 *cap;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       if (!pci_is_pcie(dev))
>>>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>>>>>> +       if (!aspm_l1ss)
>>>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>>>>>> +       if (!save_state)
>>>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
>>>>>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
>>>>>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       int aspm_l1ss;
>>>>>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
>>>>>>>>>>>>>>> +       u32 *cap;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       if (!pci_is_pcie(dev))
>>>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>>>>>> +       if (!aspm_l1ss)
>>>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>>>>>> +       if (!save_state)
>>>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
>>>>>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
>>>>>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>       static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
>>>>>>>>>>>>>>>       {
>>>>>>>>>>>>>>>              pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> 2.17.1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> With this patch (and also mentioned
>>>>>>>>>>>> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
>>>>>>>>>>>> applied on 5.10 (chromeos-5.10) I am observing problems after
>>>>>>>>>>>> suspend/resume with my WiFi card - it looks like whole communication
>>>>>>>>>>>> via PCI fails. Attaching logs (dmesg, lspci -vvv before suspend/resume
>>>>>>>>>>>> and after) https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
>>>>>>>>>>>>
>>>>>>>>>>>> I played a little bit with this code and it looks like the
>>>>>>>>>>>> pci_write_config_dword() to the PCI_L1SS_CTL1 breaks it (don't know
>>>>>>>>>>>> why, not a PCI expert).
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot for testing this!  I'm not quite sure what to make of the
>>>>>>>>>>> results since v5.10 is fairly old (Dec 2020) and I don't know what
>>>>>>>>>>> other changes are in chromeos-5.10.
>>>>>>>>>
>>>>>>>>> Lukasz: I assume you are running this on Atlas and are seeing this bug
>>>>>>>>> when uprev'ving it to 5.10 kernel. Can you please try it on a newer
>>>>>>>>> Intel platform that have the latest upstream kernel running already
>>>>>>>>> and see if this can be reproduced there too?
>>>>>>>>> Note that the wifi PCI device is different on newer Intel platforms,
>>>>>>>>> but platform design is similar enough that I suspect we should see
>>>>>>>>> similar bug on those too. The other option is to try the latest
>>>>>>>>> upstream kernel on Atlas. Perhaps if we just care about wifi (and
>>>>>>>>> ignore bringing up the graphics stack and GUI), it may come up
>>>>>>>>> sufficiently enough to try this patch?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Rajat
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Random observations, no analysis below.  This from your dmesg
>>>>>>>>>>> certainly looks like PCI reads failing and returning ~0:
>>>>>>>>>>>
>>>>>>>>>>>        Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
>>>>>>>>>>>        iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
>>>>>>>>>>>        iwlwifi 0000:01:00.0: Device gone - attempting removal
>>>>>>>>>>>        Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
>>>>>>>>>>>
>>>>>>>>>>> And then we re-enumerate 01:00.0 and it looks like it may have been
>>>>>>>>>>> reset (BAR is 0):
>>>>>>>>>>>
>>>>>>>>>>>        pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
>>>>>>>>>>>        pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
>>>>>>>>>>>
>>>>>>>>>>> lspci diffs from before/after suspend:
>>>>>>>>>>>
>>>>>>>>>>>         00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
>>>>>>>>>>>           Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
>>>>>>>>>>>        -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
>>>>>>>>>>>        +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>>>>>        -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
>>>>>>>>>>>        +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>>>>>>>>>>>        -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
>>>>>>>>>>>        +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
>>>>>>>>>>>        -       Capabilities: [150 v0] Null
>>>>>>>>>>>        -       Capabilities: [200 v1] L1 PM Substates
>>>>>>>>>>>        -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>>        -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
>>>>>>>>>>>        -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>>>>        -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
>>>>>>>>>>>        -               L1SubCtl2: T_PwrOn=60us
>>>>>>>>>>>
>>>>>>>>>>> The DevSta differences might be BIOS bugs, probably not relevant.
>>>>>>>>>>> Interesting that ASPM is disabled, maybe didn't get enabled after
>>>>>>>>>>> re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
>>>>>>>>>>> disappeared.
>>>>>>>>>>>
>>>>>>>>>>>         01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
>>>>>>>>>>>                        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
>>>>>>>>>>>        -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>>>>>        +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>>>>>>>>>                Capabilities: [154 v1] L1 PM Substates
>>>>>>>>>>>                        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>>                                  PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
>>>>>>>>>>>        -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>>>>        -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
>>>>>>>>>>>        +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>>>>>>>>>>>        +                          T_CommonMode=0us LTR1.2_Threshold=0ns
>>>>>>>>>>>
>>>>>>>>>>> Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
>>>>>>>>>>> get reinitialized after re-enumeration?  Looks like we didn't restore
>>>>>>>>>>> L1SubCtl1.
>>>>>>>>>>>
>>>>>>>>>>> Bjorn
>>>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Thank you all for the response and input! As Rajat mentioned I'm using
>>>>>>>> chromebook - but not Atlas (Amberlake) - in this case it is Babymega
>>>>>>>> (Apollolake)  - I will try to load most recent kernel and give it a
>>>>>>>> try once again.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Lukasz
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>      I have applied this patch on top of v5.19-rc7 (chromeos) and I'm
>>>>>>> still getting same results:
>>>>>>> https://gist.github.com/semihalf-majczak-lukasz/4b716704c21a3758d6711b2030ea34b9
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Lukasz
>>>>>>>
>>>>> Hi Vidya,
>>>>>
>>>>> Sorry for the long delay, I have retested your patch on top of
>>>>> linux-next/master (next-20220802) - the results for my device remain
>>>>> the same.
>>>>> Here are the logs (lspci -vvv before suspend, lspci -vvv after resume and dmesg)
>>>>> https://gist.github.com/semihalf-majczak-lukasz/c7bfd811359f23278034056a8002b3ef
>>>>> Let me know if you need any more logs and/or tests.
>>>>>
>>>>> Best regards,
>>>>> Lukasz
>>>>>
>>> Hi Vidya,
>>>
>>> After your last email, I've re-tested my setup and (without your
>>> patch) the capability register also disappears, so it looks like
>>> there is in fact some problem in my setup, and your patch just brings
>>> it to the surface because after resume it tries to write to a register
>>> that is no longer present. I'm very sorry for the confusion and for
>>> not noticing that at the very beginning.
>>>
>>> Best regards,
>>> Lukasz
>>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-08-26 13:00                             ` Vidya Sagar
@ 2022-08-30 11:15                               ` Lukasz Majczak
  2022-08-30 14:02                                 ` Vidya Sagar
  0 siblings, 1 reply; 24+ messages in thread
From: Lukasz Majczak @ 2022-08-30 11:15 UTC (permalink / raw)
  To: Vidya Sagar
  Cc: Kai-Heng Feng, Bjorn Helgaas, Rajat Jain, Ben Chuang, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, kenny, treding, jonathanh,
	abhsahu, sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv

On Fri, 26 Aug 2022 at 15:00, Vidya Sagar <vidyas@nvidia.com> wrote:
>
>
>
> On 8/23/2022 8:25 PM, Kai-Heng Feng wrote:
> >
> > Hi Vidya,
> >
> > On Tue, Aug 9, 2022 at 12:17 AM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>
> >> Thanks Lukasz for the update.
> >> I think this confirms that there is no issue with the patch as such.
> >> Bjorn, could you please define the next step for this patch?
> >
> > I think the L1SS cap went away _after_ L1SS registers are restored,
> > > since your patch already checks the cap before doing any write:
> > +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> > +       if (!aspm_l1ss)
> > +               return;
> >
> > That means it's more likely to be caused by the following change:
> > +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> > +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> >
> > So is it possible to clear PCI_L1SS_CTL1 before setting PCI_L1SS_CTL2,
> > like what aspm_calc_l1ss_info() does?
>
> I posted a new patch
> https://patchwork.kernel.org/project/linux-pci/patch/20220826125526.28859-1-vidyas@nvidia.com/
> keeping L1.2 disabled while restoring the rest of the fields in
> Control-1 register and restoring the L1.2 enable bits later. Could you
> please try this new patch on your setup and update your observations?
>
> Thanks & Regards,
> Vidya Sagar
>
> >
> > Kai-Heng
> >
> >>
> >> Thanks,
> >> Vidya Sagar
> >>
> >> On 8/8/2022 7:37 PM, Lukasz Majczak wrote:
> >>>
> >>> On Wed, 3 Aug 2022 at 14:55, Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>
> >>>> Thanks Lukasz for the logs.
> >>>> I still see that the L1SS capability in the root port (00:14.0) disappeared
> >>>> after resume.
> >>>> I still don't understand how this patch can make the capability register
> >>>> itself disappear. Honestly, I still see this as a HW issue.
> >>>> Bjorn, could you please throw some light on this?
> >>>>
> >>>> Thanks,
> >>>> Vidya Sagar
> >>>>
> >>>> On 8/3/2022 5:34 PM, Lukasz Majczak wrote:
> >>>>>
> >>>>> On Fri, 29 Jul 2022 at 16:36, Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>>>
> >>>>>> Hi Lukasz,
> >>>>>> Thanks for sharing your observations.
> >>>>>>
> >>>>>> Could you please also share the output of 'sudo lspci -vvvv' before and
> >>>>>> after suspend-resume cycle with the latest linux-next?
> >>>>>> Do we still see the L1SS capabilities disappearing post resume?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Vidya Sagar
> >>>>>>
> >>>>>> On 7/29/2022 3:09 PM, Lukasz Majczak wrote:
> >>>>>>>
> >>>>>>> On Tue, 26 Jul 2022 at 09:20, Lukasz Majczak <lma@semihalf.com> wrote:
> >>>>>>>>
> >>>>>>>> On Tue, 26 Jul 2022 at 00:51, Rajat Jain <rajatja@google.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hello,
> >>>>>>>>>
> >>>>>>>>> On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Agree with Bjorn's observations.
> >>>>>>>>>> The fact that the L1SS capability registers themselves disappeared in
> >>>>>>>>>> the root port post resume indicates that there seems to be something
> >>>>>>>>>> wrong with the BIOS itself.
> >>>>>>>>>> Could you please check from that perspective?
> >>>>>>>>>
> >>>>>>>>> ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
> >>>>>>>>> is a shallower sleep state that preserves more state than, e.g., S3
> >>>>>>>>> (suspend-to-RAM). When we use S0ix, the BIOS does not come into the
> >>>>>>>>> picture at all, i.e. after the kernel runs its suspend routines, it
> >>>>>>>>> just puts the CPU into the S0ix state. So I do not think there is a
> >>>>>>>>> BIOS angle to this.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Vidya Sagar
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
> >>>>>>>>>>>> On Fri, 22 Jul 2022 at 09:31, Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> >>>>>>>>>>>>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
> >>>>>>>>>>>>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> >>>>>>>>>>>>>>> saved and restored during suspend/resume leading to L1 Substates
> >>>>>>>>>>>>>>> configuration being lost post-resume.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Save the L1 Substates control registers so that the configuration is
> >>>>>>>>>>>>>>> retained post-resume.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> >>>>>>>>>>>>>>> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Vidya,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I tested this patch on kernel v5.19-rc6.
> >>>>>>>>>>>>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
> >>>>>>>>>>>>>> This patch can restore L1SS after suspend/resume.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The test results are as follows:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> After Boot:
> >>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>>>>>              Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>>>>>                      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>>>                                PortCommonModeRestoreTime=255us
> >>>>>>>>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>>>>>>>                      L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>>>>>                                 T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>>>>>>>>>>>                      L1SubCtl2: T_PwrOn=3100us
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> After suspend/resume without this patch.
> >>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>>>>>              Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>>>>>                      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>>>                                PortCommonModeRestoreTime=255us
> >>>>>>>>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>>>>>>>                      L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>>>>>>>>>>>                                 T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>>>>>>>>>>>                      L1SubCtl2: T_PwrOn=10us
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> After suspend/resume with this patch.
> >>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>>>>>              Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>>>>>                      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>>>                                PortCommonModeRestoreTime=255us
> >>>>>>>>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>>>>>>>                      L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>>>>>                                 T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>>>>>>>>>>>                      L1SubCtl2: T_PwrOn=3100us
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Tested-by: Ben Chuang <benchuanggli@gmail.com>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Forgot to add mine:
> >>>>>>>>>>>>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>> Ben Chuang
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>> Kenneth R. Crudup <kenny@panix.com>, Could you please verify this patch
> >>>>>>>>>>>>>>> on your laptop (Dell XPS 13) one last time?
> >>>>>>>>>>>>>>> IMHO, the regression observed on your laptop with an old version of the patch
> >>>>>>>>>>>>>>> could be due to a buggy old version BIOS in the laptop.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Vidya Sagar
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>       drivers/pci/pci.c       |  7 +++++++
> >>>>>>>>>>>>>>>       drivers/pci/pci.h       |  4 ++++
> >>>>>>>>>>>>>>>       drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>>>>>>>>       3 files changed, 55 insertions(+)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >>>>>>>>>>>>>>> index cfaf40a540a8..aca05880aaa3 100644
> >>>>>>>>>>>>>>> --- a/drivers/pci/pci.c
> >>>>>>>>>>>>>>> +++ b/drivers/pci/pci.c
> >>>>>>>>>>>>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
> >>>>>>>>>>>>>>>                      return i;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>              pci_save_ltr_state(dev);
> >>>>>>>>>>>>>>> +       pci_save_aspm_l1ss_state(dev);
> >>>>>>>>>>>>>>>              pci_save_dpc_state(dev);
> >>>>>>>>>>>>>>>              pci_save_aer_state(dev);
> >>>>>>>>>>>>>>>              pci_save_ptm_state(dev);
> >>>>>>>>>>>>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
> >>>>>>>>>>>>>>>               * LTR itself (in the PCIe capability).
> >>>>>>>>>>>>>>>               */
> >>>>>>>>>>>>>>>              pci_restore_ltr_state(dev);
> >>>>>>>>>>>>>>> +       pci_restore_aspm_l1ss_state(dev);
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>              pci_restore_pcie_state(dev);
> >>>>>>>>>>>>>>>              pci_restore_pasid_state(dev);
> >>>>>>>>>>>>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
> >>>>>>>>>>>>>>>              if (error)
> >>>>>>>>>>>>>>>                      pci_err(dev, "unable to allocate suspend buffer for LTR\n");
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
> >>>>>>>>>>>>>>> +                                           2 * sizeof(u32));
> >>>>>>>>>>>>>>> +       if (error)
> >>>>>>>>>>>>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>              pci_allocate_vc_save_buffers(dev);
> >>>>>>>>>>>>>>>       }
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> >>>>>>>>>>>>>>> index e10cdec6c56e..92d8c92662a4 100644
> >>>>>>>>>>>>>>> --- a/drivers/pci/pci.h
> >>>>>>>>>>>>>>> +++ b/drivers/pci/pci.h
> >>>>>>>>>>>>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
> >>>>>>>>>>>>>>>       void pcie_aspm_exit_link_state(struct pci_dev *pdev);
> >>>>>>>>>>>>>>>       void pcie_aspm_pm_state_change(struct pci_dev *pdev);
> >>>>>>>>>>>>>>>       void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
> >>>>>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>>>>>>>>>>>       #else
> >>>>>>>>>>>>>>>       static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>>>       static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>>>       static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>>>       static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>>>>>>>>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>>>>>>>>>>>       #endif
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>       #ifdef CONFIG_PCIE_ECRC
> >>>>>>>>>>>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> >>>>>>>>>>>>>>> index a96b7424c9bc..2c29fdd20059 100644
> >>>>>>>>>>>>>>> --- a/drivers/pci/pcie/aspm.c
> >>>>>>>>>>>>>>> +++ b/drivers/pci/pcie/aspm.c
> >>>>>>>>>>>>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> >>>>>>>>>>>>>>>                                      PCI_L1SS_CTL1_L1SS_MASK, val);
> >>>>>>>>>>>>>>>       }
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
> >>>>>>>>>>>>>>> +{
> >>>>>>>>>>>>>>> +       int aspm_l1ss;
> >>>>>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
> >>>>>>>>>>>>>>> +       u32 *cap;
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> +       if (!pci_is_pcie(dev))
> >>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>>>> +       if (!aspm_l1ss)
> >>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>>>> +       if (!save_state)
> >>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
> >>>>>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
> >>>>>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
> >>>>>>>>>>>>>>> +}
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
> >>>>>>>>>>>>>>> +{
> >>>>>>>>>>>>>>> +       int aspm_l1ss;
> >>>>>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
> >>>>>>>>>>>>>>> +       u32 *cap;
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> +       if (!pci_is_pcie(dev))
> >>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>>>> +       if (!aspm_l1ss)
> >>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>>>> +       if (!save_state)
> >>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
> >>>>>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> >>>>>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> >>>>>>>>>>>>>>> +}
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>       static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
> >>>>>>>>>>>>>>>       {
> >>>>>>>>>>>>>>>              pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>> 2.17.1
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> With this patch (and also mentioned
> >>>>>>>>>>>> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
> >>>>>>>>>>>> applied on 5.10 (chromeos-5.10) I am observing problems after
> >>>>>>>>>>>> suspend/resume with my WiFi card - it looks like whole communication
> >>>>>>>>>>>> via PCI fails. Attaching logs (dmesg, lspci -vvv before suspend/resume
> >>>>>>>>>>>> and after) https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
> >>>>>>>>>>>>
> >>>>>>>>>>>> I played a little bit with this code and it looks like the
> >>>>>>>>>>>> pci_write_config_dword() to the PCI_L1SS_CTL1 breaks it (don't know
> >>>>>>>>>>>> why, not a PCI expert).
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks a lot for testing this!  I'm not quite sure what to make of the
> >>>>>>>>>>> results since v5.10 is fairly old (Dec 2020) and I don't know what
> >>>>>>>>>>> other changes are in chromeos-5.10.
> >>>>>>>>>
> >>>>>>>>> Lukasz: I assume you are running this on Atlas and are seeing this bug
> >>>>>>>>> when uprev'ving it to 5.10 kernel. Can you please try it on a newer
> >>>>>>>>> Intel platform that have the latest upstream kernel running already
> >>>>>>>>> and see if this can be reproduced there too?
> >>>>>>>>> Note that the wifi PCI device is different on newer Intel platforms,
> >>>>>>>>> but platform design is similar enough that I suspect we should see
> >>>>>>>>> similar bug on those too. The other option is to try the latest
> >>>>>>>>> upstream kernel on Atlas. Perhaps if we just care about wifi (and
> >>>>>>>>> ignore bringing up the graphics stack and GUI), it may come up
> >>>>>>>>> sufficiently enough to try this patch?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> Rajat
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Random observations, no analysis below.  This from your dmesg
> >>>>>>>>>>> certainly looks like PCI reads failing and returning ~0:
> >>>>>>>>>>>
> >>>>>>>>>>>        Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
> >>>>>>>>>>>        iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> >>>>>>>>>>>        iwlwifi 0000:01:00.0: Device gone - attempting removal
> >>>>>>>>>>>        Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
> >>>>>>>>>>>
> >>>>>>>>>>> And then we re-enumerate 01:00.0 and it looks like it may have been
> >>>>>>>>>>> reset (BAR is 0):
> >>>>>>>>>>>
> >>>>>>>>>>>        pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
> >>>>>>>>>>>        pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> >>>>>>>>>>>
> >>>>>>>>>>> lspci diffs from before/after suspend:
> >>>>>>>>>>>
> >>>>>>>>>>>         00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
> >>>>>>>>>>>           Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
> >>>>>>>>>>>        -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
> >>>>>>>>>>>        +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>>>>>        -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>>>>>        +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>>>>>        -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
> >>>>>>>>>>>        +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
> >>>>>>>>>>>        -       Capabilities: [150 v0] Null
> >>>>>>>>>>>        -       Capabilities: [200 v1] L1 PM Substates
> >>>>>>>>>>>        -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>        -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> >>>>>>>>>>>        -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>>        -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
> >>>>>>>>>>>        -               L1SubCtl2: T_PwrOn=60us
> >>>>>>>>>>>
> >>>>>>>>>>> The DevSta differences might be BIOS bugs, probably not relevant.
> >>>>>>>>>>> Interesting that ASPM is disabled, maybe didn't get enabled after
> >>>>>>>>>>> re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
> >>>>>>>>>>> disappeared.
> >>>>>>>>>>>
> >>>>>>>>>>>         01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
> >>>>>>>>>>>                        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>>>>>        -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>>>>>        +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >>>>>>>>>>>                Capabilities: [154 v1] L1 PM Substates
> >>>>>>>>>>>                        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>                                  PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
> >>>>>>>>>>>        -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>>        -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
> >>>>>>>>>>>        +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>>>>>>>>        +                          T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>>>>>>>>
> >>>>>>>>>>> Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
> >>>>>>>>>>> get reinitialized after re-enumeration?  Looks like we didn't restore
> >>>>>>>>>>> L1SubCtl1.
> >>>>>>>>>>>
> >>>>>>>>>>> Bjorn
> >>>>>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Thank you all for the response and input! As Rajat mentioned I'm using
> >>>>>>>> chromebook - but not Atlas (Amberlake) - in this case it is Babymega
> >>>>>>>> (Apollolake)  - I will try to load most recent kernel and give it a
> >>>>>>>> try once again.
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>> Lukasz
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>>      I have applied this patch on top of v5.19-rc7 (chromeos) and I'm
> >>>>>>> still getting same results:
> >>>>>>> https://gist.github.com/semihalf-majczak-lukasz/4b716704c21a3758d6711b2030ea34b9
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> Lukasz
> >>>>>>>
> >>>>> Hi Vidya,
> >>>>>
> >>>>> Sorry for the long delay, I have retested your patch on top of
> >>>>> linux-next/master (next-20220802) - the results for my device remain
> >>>>> the same.
> >>>>> Here are the logs (lspci -vvv before suspend, lspci -vvv after resume and dmesg)
> >>>>> https://gist.github.com/semihalf-majczak-lukasz/c7bfd811359f23278034056a8002b3ef
> >>>>> Let me know if you need any more logs and/or tests.
> >>>>>
> >>>>> Best regards,
> >>>>> Lukasz
> >>>>>
> >>> Hi Vidya,
> >>>
> >>> After your last email, I've re-tested my setup and (without your
> >>> patch) the capability register also disappears, so it looks like
> >>> there is in fact some problem in my setup, and your patch just brings
> >>> it to the surface because after resume it tries to write to a register
> >>> that is no longer present. I'm very sorry for the confusion and for
> >>> not noticing that at the very beginning.
> >>>
> >>> Best regards,
> >>> Lukasz
> >>>

Hi Vidya,

For me (on Apollolake devices) the results remain the same, but as I
mentioned earlier, the problem looks specific to Apollolake and is not
directly related to your patch (e.g. I'm losing the L1SS capability
even without your patch).
As a counterexample, I don't observe any issues with your patch (v3)
on Amberlake devices: the lspci -vvv output before suspend and after
resume is exactly the same.

Best regards,
Lukasz
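
The restore ordering discussed in the quoted thread (clear the L1.2
enables in PCI_L1SS_CTL1, write PCI_L1SS_CTL2, restore the rest of
CTL1, and only then re-enable L1.2) can be sketched as below. This is a
minimal illustration against a mocked register block, not the code from
the v3 patch. The struct mock_l1ss type, the mock_read()/mock_write()
helpers, the SKETCH_L1_2_ENABLES mask and the two restore functions are
hypothetical names introduced here; only the PCI_L1SS_* offsets and bit
values follow include/uapi/linux/pci_regs.h. The rule the mock enforces
(no field other than the enable bits may change while L1.2 is enabled)
is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets and bits as in include/uapi/linux/pci_regs.h. */
#define PCI_L1SS_CTL1             0x08
#define PCI_L1SS_CTL1_PCIPM_L1_2  0x00000001
#define PCI_L1SS_CTL1_ASPM_L1_2   0x00000004
#define PCI_L1SS_CTL1_L1SS_MASK   0x0000000f
#define PCI_L1SS_CTL2             0x0c

/* Hypothetical name for this sketch: both L1.2 enable bits. */
#define SKETCH_L1_2_ENABLES \
	(PCI_L1SS_CTL1_PCIPM_L1_2 | PCI_L1SS_CTL1_ASPM_L1_2)

/* Hypothetical stand-in for one device's L1SS capability registers. */
struct mock_l1ss {
	uint32_t regs[4];    /* header, CAP, CTL1, CTL2 */
	int l1_2_violation;  /* set when other fields change with L1.2 on */
};

static uint32_t mock_read(const struct mock_l1ss *m, int off)
{
	assert(off >= 0 && off <= PCI_L1SS_CTL2 && off % 4 == 0);
	return m->regs[off / 4];
}

static void mock_write(struct mock_l1ss *m, int off, uint32_t val)
{
	uint32_t old, changed;

	assert(off >= 0 && off <= PCI_L1SS_CTL2 && off % 4 == 0);
	old = m->regs[off / 4];
	changed = old ^ val;
	/* In CTL1, flipping the enable bits themselves is always allowed. */
	if (off == PCI_L1SS_CTL1)
		changed &= ~(uint32_t)PCI_L1SS_CTL1_L1SS_MASK;
	/* Assumed rule: nothing else may change while L1.2 is enabled. */
	if ((m->regs[PCI_L1SS_CTL1 / 4] & SKETCH_L1_2_ENABLES) && changed)
		m->l1_2_violation = 1;
	m->regs[off / 4] = val;
}

/* Ordering used by the V2 patch: CTL2 first, then all of CTL1. */
static void naive_restore_l1ss(struct mock_l1ss *m, uint32_t ctl1,
			       uint32_t ctl2)
{
	mock_write(m, PCI_L1SS_CTL2, ctl2);
	mock_write(m, PCI_L1SS_CTL1, ctl1);
}

/* Ordering discussed in the thread: keep L1.2 off until the very end. */
static void sketch_restore_l1ss(struct mock_l1ss *m, uint32_t ctl1,
				uint32_t ctl2)
{
	uint32_t cur = mock_read(m, PCI_L1SS_CTL1);

	/* 1. Make sure L1.2 is disabled before touching timing fields. */
	mock_write(m, PCI_L1SS_CTL1, cur & ~(uint32_t)SKETCH_L1_2_ENABLES);
	/* 2. Restore CTL2 while L1.2 is off. */
	mock_write(m, PCI_L1SS_CTL2, ctl2);
	/* 3. Restore CTL1 except for the L1.2 enable bits. */
	mock_write(m, PCI_L1SS_CTL1, ctl1 & ~(uint32_t)SKETCH_L1_2_ENABLES);
	/* 4. Finally restore the L1.2 enable bits. */
	mock_write(m, PCI_L1SS_CTL1, ctl1);
}
```

Both functions leave the same final register values; they differ only
in whether a write happens while L1.2 is still enabled, which is the
case the mock flags when the device comes back from resume with L1.2
already on.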

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-08-30 11:15                               ` Lukasz Majczak
@ 2022-08-30 14:02                                 ` Vidya Sagar
  2022-09-02  5:49                                   ` Lukasz Majczak
  0 siblings, 1 reply; 24+ messages in thread
From: Vidya Sagar @ 2022-08-30 14:02 UTC (permalink / raw)
  To: Lukasz Majczak
  Cc: Kai-Heng Feng, Bjorn Helgaas, Rajat Jain, Ben Chuang, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, kenny, treding, jonathanh,
	abhsahu, sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv



On 8/30/2022 4:45 PM, Lukasz Majczak wrote:
> 
> On Fri, 26 Aug 2022 at 15:00, Vidya Sagar <vidyas@nvidia.com> wrote:
>>
>>
>>
>> On 8/23/2022 8:25 PM, Kai-Heng Feng wrote:
>>>
>>> Hi Vidya,
>>>
>>> On Tue, Aug 9, 2022 at 12:17 AM Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>
>>>> Thanks Lukasz for the update.
>>>> I think this confirms that there is no issue with the patch as such.
>>>> Bjorn, could you please define the next step for this patch?
>>>
>>> I think the L1SS cap went away _after_ L1SS registers are restored,
>>> since your patch already checks the cap before doing any write:
>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
>>> +       if (!aspm_l1ss)
>>> +               return;
>>>
>>> That means it's more likely to be caused by the following change:
>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
>>>
>>> So is it possible to clear PCI_L1SS_CTL1 before setting PCI_L1SS_CTL2,
>>> like what aspm_calc_l1ss_info() does?
>>
>> I posted a new patch
>> https://patchwork.kernel.org/project/linux-pci/patch/20220826125526.28859-1-vidyas@nvidia.com/
>> keeping L1.2 disabled while restoring the rest of the fields in
>> Control-1 register and restoring the L1.2 enable bits later. Could you
>> please try this new patch on your setup and update your observations?
>>
>> Thanks & Regards,
>> Vidya Sagar
>>
>>>
>>> Kai-Heng
>>>
>>>>
>>>> Thanks,
>>>> Vidya Sagar
>>>>
>>>> On 8/8/2022 7:37 PM, Lukasz Majczak wrote:
>>>>>
>>>>> On Wed, 3 Aug 2022 at 14:55, Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>>>
>>>>>> Thanks Lukasz for the logs.
>>>>>> I still see that the L1SS capability in the root port (00:14.0) disappeared
>>>>>> after resume.
>>>>>> I still don't understand how this patch can make the capability register
>>>>>> itself disappear. Honestly, I still see this as a HW issue.
>>>>>> Bjorn, could you please throw some light on this?
>>>>>>
>>>>>> Thanks,
>>>>>> Vidya Sagar
>>>>>>
>>>>>> On 8/3/2022 5:34 PM, Lukasz Majczak wrote:
>>>>>>>
>>>>>>> On Fri, 29 Jul 2022 at 16:36, Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>>>>>
>>>>>>>> Hi Lukasz,
>>>>>>>> Thanks for sharing your observations.
>>>>>>>>
>>>>>>>> Could you please also share the output of 'sudo lspci -vvvv' before and
>>>>>>>> after suspend-resume cycle with the latest linux-next?
>>>>>>>> Do we still see the L1SS capabilities disappearing post resume?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vidya Sagar
>>>>>>>>
>>>>>>>> On 7/29/2022 3:09 PM, Lukasz Majczak wrote:
>>>>>>>>>
>>>>>>>>> On Tue, 26 Jul 2022 at 09:20, Lukasz Majczak <lma@semihalf.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue, 26 Jul 2022 at 00:51, Rajat Jain <rajatja@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Agree with Bjorn's observations.
>>>>>>>>>>>> The fact that the L1SS capability registers themselves disappeared in
>>>>>>>>>>>> the root port post resume indicates that there seems to be something
>>>>>>>>>>>> wrong with the BIOS itself.
>>>>>>>>>>>> Could you please check from that perspective?
>>>>>>>>>>>
>>>>>>>>>>> ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
>>>>>>>>>>> is a shallower sleep state that preserves more state than, e.g., S3
>>>>>>>>>>> (suspend-to-RAM). When we use S0ix, the BIOS does not come into the
>>>>>>>>>>> picture at all, i.e. after the kernel runs its suspend routines, it just
>>>>>>>>>>> puts the CPU into S0ix state. So I do not think there is a BIOS angle to
>>>>>>>>>>> this.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Vidya Sagar
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
>>>>>>>>>>>>> External email: Use caution opening links or attachments
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
>>>>>>>>>>>>>> On Fri, Jul 22, 2022 at 09:31 Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
>>>>>>>>>>>>>>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
>>>>>>>>>>>>>>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
>>>>>>>>>>>>>>>>> saved and restored during suspend/resume leading to L1 Substates
>>>>>>>>>>>>>>>>> configuration being lost post-resume.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Save the L1 Substates control registers so that the configuration is
>>>>>>>>>>>>>>>>> retained post-resume.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>>>>>>>>>>>>>>> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Vidya,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I tested this patch on kernel v5.19-rc6.
>>>>>>>>>>>>>>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
>>>>>>>>>>>>>>>> This patch can restore L1SS after suspend/resume.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The test results are as follows:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> After Boot:
>>>>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>>>>>>>>>               Capabilities: [110 v1] L1 PM Substates
>>>>>>>>>>>>>>>>                       L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
>>>>>>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>>>>>>>                                 PortCommonModeRestoreTime=255us
>>>>>>>>>>>>>>>> PortTPowerOnTime=3100us
>>>>>>>>>>>>>>>>                       L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>>>>>>>>>                                  T_CommonMode=0us LTR1.2_Threshold=3145728ns
>>>>>>>>>>>>>>>>                       L1SubCtl2: T_PwrOn=3100us
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> After suspend/resume without this patch.
>>>>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>>>>>>>>>               Capabilities: [110 v1] L1 PM Substates
>>>>>>>>>>>>>>>>                       L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
>>>>>>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>>>>>>>                                 PortCommonModeRestoreTime=255us
>>>>>>>>>>>>>>>> PortTPowerOnTime=3100us
>>>>>>>>>>>>>>>>                       L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>>>>>>>>>>>>>>>>                                  T_CommonMode=0us LTR1.2_Threshold=0ns
>>>>>>>>>>>>>>>>                       L1SubCtl2: T_PwrOn=10us
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> After suspend/resume with this patch.
>>>>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
>>>>>>>>>>>>>>>>               Capabilities: [110 v1] L1 PM Substates
>>>>>>>>>>>>>>>>                       L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
>>>>>>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>>>>>>>                                 PortCommonModeRestoreTime=255us
>>>>>>>>>>>>>>>> PortTPowerOnTime=3100us
>>>>>>>>>>>>>>>>                       L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>>>>>>>>>                                  T_CommonMode=0us LTR1.2_Threshold=3145728ns
>>>>>>>>>>>>>>>>                       L1SubCtl2: T_PwrOn=3100us
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Tested-by: Ben Chuang <benchuanggli@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Forgot to add mine:
>>>>>>>>>>>>>>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>> Ben Chuang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> Kenneth R. Crudup <kenny@panix.com>, Could you please verify this patch
>>>>>>>>>>>>>>>>> on your laptop (Dell XPS 13) one last time?
>>>>>>>>>>>>>>>>> IMHO, the regression observed on your laptop with an old version of the patch
>>>>>>>>>>>>>>>>> could be due to a buggy old version BIOS in the laptop.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Vidya Sagar
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>        drivers/pci/pci.c       |  7 +++++++
>>>>>>>>>>>>>>>>>        drivers/pci/pci.h       |  4 ++++
>>>>>>>>>>>>>>>>>        drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>>>>>        3 files changed, 55 insertions(+)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>>>>>>>>>>>>>>> index cfaf40a540a8..aca05880aaa3 100644
>>>>>>>>>>>>>>>>> --- a/drivers/pci/pci.c
>>>>>>>>>>>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>>>>>>>>>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
>>>>>>>>>>>>>>>>>                       return i;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>               pci_save_ltr_state(dev);
>>>>>>>>>>>>>>>>> +       pci_save_aspm_l1ss_state(dev);
>>>>>>>>>>>>>>>>>               pci_save_dpc_state(dev);
>>>>>>>>>>>>>>>>>               pci_save_aer_state(dev);
>>>>>>>>>>>>>>>>>               pci_save_ptm_state(dev);
>>>>>>>>>>>>>>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
>>>>>>>>>>>>>>>>>                * LTR itself (in the PCIe capability).
>>>>>>>>>>>>>>>>>                */
>>>>>>>>>>>>>>>>>               pci_restore_ltr_state(dev);
>>>>>>>>>>>>>>>>> +       pci_restore_aspm_l1ss_state(dev);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>               pci_restore_pcie_state(dev);
>>>>>>>>>>>>>>>>>               pci_restore_pasid_state(dev);
>>>>>>>>>>>>>>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
>>>>>>>>>>>>>>>>>               if (error)
>>>>>>>>>>>>>>>>>                       pci_err(dev, "unable to allocate suspend buffer for LTR\n");
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
>>>>>>>>>>>>>>>>> +                                           2 * sizeof(u32));
>>>>>>>>>>>>>>>>> +       if (error)
>>>>>>>>>>>>>>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>               pci_allocate_vc_save_buffers(dev);
>>>>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>>>>>>>>>>>>>>> index e10cdec6c56e..92d8c92662a4 100644
>>>>>>>>>>>>>>>>> --- a/drivers/pci/pci.h
>>>>>>>>>>>>>>>>> +++ b/drivers/pci/pci.h
>>>>>>>>>>>>>>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
>>>>>>>>>>>>>>>>>        void pcie_aspm_exit_link_state(struct pci_dev *pdev);
>>>>>>>>>>>>>>>>>        void pcie_aspm_pm_state_change(struct pci_dev *pdev);
>>>>>>>>>>>>>>>>>        void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
>>>>>>>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
>>>>>>>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
>>>>>>>>>>>>>>>>>        #else
>>>>>>>>>>>>>>>>>        static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
>>>>>>>>>>>>>>>>>        static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
>>>>>>>>>>>>>>>>>        static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
>>>>>>>>>>>>>>>>>        static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
>>>>>>>>>>>>>>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
>>>>>>>>>>>>>>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
>>>>>>>>>>>>>>>>>        #endif
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>        #ifdef CONFIG_PCIE_ECRC
>>>>>>>>>>>>>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>>>>>>>>>>>>>>> index a96b7424c9bc..2c29fdd20059 100644
>>>>>>>>>>>>>>>>> --- a/drivers/pci/pcie/aspm.c
>>>>>>>>>>>>>>>>> +++ b/drivers/pci/pcie/aspm.c
>>>>>>>>>>>>>>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
>>>>>>>>>>>>>>>>>                                       PCI_L1SS_CTL1_L1SS_MASK, val);
>>>>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       int aspm_l1ss;
>>>>>>>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
>>>>>>>>>>>>>>>>> +       u32 *cap;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       if (!pci_is_pcie(dev))
>>>>>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>>>>>>>> +       if (!aspm_l1ss)
>>>>>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>>>>>>>> +       if (!save_state)
>>>>>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
>>>>>>>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
>>>>>>>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       int aspm_l1ss;
>>>>>>>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
>>>>>>>>>>>>>>>>> +       u32 *cap;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       if (!pci_is_pcie(dev))
>>>>>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>>>>>>>> +       if (!aspm_l1ss)
>>>>>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
>>>>>>>>>>>>>>>>> +       if (!save_state)
>>>>>>>>>>>>>>>>> +               return;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
>>>>>>>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
>>>>>>>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>        static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
>>>>>>>>>>>>>>>>>        {
>>>>>>>>>>>>>>>>>               pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> 2.17.1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With this patch (and also mentioned
>>>>>>>>>>>>>> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
>>>>>>>>>>>>>> applied on 5.10 (chromeos-5.10) I am observing problems after
>>>>>>>>>>>>>> suspend/resume with my WiFi card - it looks like whole communication
>>>>>>>>>>>>>> via PCI fails. Attaching logs (dmesg, lspci -vvv before suspend/resume
>>>>>>>>>>>>>> and after) https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I played a little bit with this code and it looks like the
>>>>>>>>>>>>>> pci_write_config_dword() to the PCI_L1SS_CTL1 breaks it (don't know
>>>>>>>>>>>>>> why, not a PCI expert).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks a lot for testing this!  I'm not quite sure what to make of the
>>>>>>>>>>>>> results since v5.10 is fairly old (Dec 2020) and I don't know what
>>>>>>>>>>>>> other changes are in chromeos-5.10.
>>>>>>>>>>>
>>>>>>>>>>> Lukasz: I assume you are running this on Atlas and are seeing this bug
>>>>>>>>>>> when upreving it to the 5.10 kernel. Can you please try it on a newer
>>>>>>>>>>> Intel platform that has the latest upstream kernel running already
>>>>>>>>>>> and see if this can be reproduced there too?
>>>>>>>>>>> Note that the wifi PCI device is different on newer Intel platforms,
>>>>>>>>>>> but the platform design is similar enough that I suspect we should see
>>>>>>>>>>> a similar bug on those too. The other option is to try the latest
>>>>>>>>>>> upstream kernel on Atlas. Perhaps if we just care about wifi (and
>>>>>>>>>>> ignore bringing up the graphics stack and GUI), it may come up
>>>>>>>>>>> sufficiently to try this patch?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Rajat
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Random observations, no analysis below.  This from your dmesg
>>>>>>>>>>>>> certainly looks like PCI reads failing and returning ~0:
>>>>>>>>>>>>>
>>>>>>>>>>>>>         Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
>>>>>>>>>>>>>         iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
>>>>>>>>>>>>>         iwlwifi 0000:01:00.0: Device gone - attempting removal
>>>>>>>>>>>>>         Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> And then we re-enumerate 01:00.0 and it looks like it may have been
>>>>>>>>>>>>> reset (BAR is 0):
>>>>>>>>>>>>>
>>>>>>>>>>>>>         pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
>>>>>>>>>>>>>         pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
>>>>>>>>>>>>>
>>>>>>>>>>>>> lspci diffs from before/after suspend:
>>>>>>>>>>>>>
>>>>>>>>>>>>>          00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
>>>>>>>>>>>>>            Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
>>>>>>>>>>>>>         -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
>>>>>>>>>>>>>         +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>>>>>>>         -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
>>>>>>>>>>>>>         +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>>>>>>>>>>>>>         -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
>>>>>>>>>>>>>         +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
>>>>>>>>>>>>>         -       Capabilities: [150 v0] Null
>>>>>>>>>>>>>         -       Capabilities: [200 v1] L1 PM Substates
>>>>>>>>>>>>>         -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>>>>         -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
>>>>>>>>>>>>>         -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>>>>>>         -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
>>>>>>>>>>>>>         -               L1SubCtl2: T_PwrOn=60us
>>>>>>>>>>>>>
>>>>>>>>>>>>> The DevSta differences might be BIOS bugs, probably not relevant.
>>>>>>>>>>>>> Interesting that ASPM is disabled, maybe didn't get enabled after
>>>>>>>>>>>>> re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
>>>>>>>>>>>>> disappeared.
>>>>>>>>>>>>>
>>>>>>>>>>>>>          01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
>>>>>>>>>>>>>                         LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
>>>>>>>>>>>>>         -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>>>>>>>         +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>>>>>>>>>>>                 Capabilities: [154 v1] L1 PM Substates
>>>>>>>>>>>>>                         L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>>>>>>>>>>>>>                                   PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
>>>>>>>>>>>>>         -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>>>>>>>>>>>>>         -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
>>>>>>>>>>>>>         +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>>>>>>>>>>>>>         +                          T_CommonMode=0us LTR1.2_Threshold=0ns
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
>>>>>>>>>>>>> get reinitialized after re-enumeration?  Looks like we didn't restore
>>>>>>>>>>>>> L1SubCtl1.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bjorn
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Thank you all for the response and input! As Rajat mentioned I'm using
>>>>>>>>>> chromebook - but not Atlas (Amberlake) - in this case it is Babymega
>>>>>>>>>> (Apollolake)  - I will try to load most recent kernel and give it a
>>>>>>>>>> try once again.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Lukasz
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>       I have applied this patch on top of v5.19-rc7 (chromeos) and I'm
>>>>>>>>> still getting same results:
>>>>>>>>> https://gist.github.com/semihalf-majczak-lukasz/4b716704c21a3758d6711b2030ea34b9
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Lukasz
>>>>>>>>>
>>>>>>> Hi Vidya,
>>>>>>>
>>>>>>> Sorry for the long delay, I have retested your patch on top of
>>>>>>> linux-next/master (next-20220802) - the results for my device remain
>>>>>>> the same.
>>>>>>> Here are the logs (lspci -vvv before suspend, lspci -vvv after resume and dmesg)
>>>>>>> https://gist.github.com/semihalf-majczak-lukasz/c7bfd811359f23278034056a8002b3ef
>>>>>>> Let me know if you need any more logs and/or tests.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Lukasz
>>>>>>>
>>>>> Hi Vidya,
>>>>>
>>>>> After your last email, I've re-tested my setup and (without your
>>>>> patch) the capability register also disappears - so it looks like there
>>>>> is - in fact - some problem in my setup, and your patch just brings it to
>>>>> the surface, as after resume it tries to write to a register that is no
>>>>> longer present. I'm very sorry for the confusion here; I did not notice
>>>>> that at the very beginning.
>>>>>
>>>>> Best regards,
>>>>> Lukasz
>>>>>
> 
> Hi Vidya,
> 
> For me (on Apollolake devices) the results remain the same, but as
> I've mentioned earlier - it looks very much specific to
> Apollolake and is not directly related to your patch (e.g. I'm losing
> L1SS capabilities even without your patch).
> As a counterexample, I don't observe any issues with your patch
> (v3) on Amberlake devices - lspci -vvv before suspend and after resume
> are exactly the same.

Thanks for the update Lukasz.
Anyway, I sent V3 for review. Could you please review it and also test
it on your platform?

Thanks,
Vidya Sagar

> 
> Best regards,
> Lukasz
> 
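
The restore ordering discussed earlier in this thread (keep L1.2 disabled
while restoring the rest of Control-1 and Control-2, then restore the L1.2
enable bits last) can be sketched in userspace as below. This is only a
minimal illustration under stated assumptions, not the code from the posted
patch: struct fake_l1ss_dev and the fake_*() helpers are hypothetical
stand-ins for a device and for the kernel's config accessors, and only the
register offsets and the L1.2 mask are taken from
include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets and mask as defined in include/uapi/linux/pci_regs.h. */
#define PCI_L1SS_CTL1           0x08
#define PCI_L1SS_CTL2           0x0c
#define PCI_L1SS_CTL1_L1_2_MASK 0x00000005u /* PCI-PM L1.2 | ASPM L1.2 */

/* Hypothetical model of one device's L1SS capability registers. */
struct fake_l1ss_dev {
	uint32_t regs[4];             /* header, CAP, CTL1, CTL2 */
	unsigned int writes;          /* number of config writes issued */
	int ctl2_written_with_l12_on; /* set if CTL2 is written while L1.2 is on */
};

/* Hypothetical stand-in for a 32-bit config read. */
static uint32_t fake_read(const struct fake_l1ss_dev *dev, unsigned int off)
{
	assert(off % 4 == 0 && off / 4 < 4);
	return dev->regs[off / 4];
}

/* Hypothetical stand-in for a 32-bit config write. */
static void fake_write(struct fake_l1ss_dev *dev, unsigned int off, uint32_t val)
{
	assert(off % 4 == 0 && off / 4 < 4);
	if (off == PCI_L1SS_CTL2 &&
	    (dev->regs[PCI_L1SS_CTL1 / 4] & PCI_L1SS_CTL1_L1_2_MASK))
		dev->ctl2_written_with_l12_on = 1;
	dev->regs[off / 4] = val;
	dev->writes++;
}

/* Restore saved CTL1/CTL2 values, enabling L1.2 only as the last step. */
static void fake_restore_l1ss(struct fake_l1ss_dev *dev,
			      uint32_t saved_ctl1, uint32_t saved_ctl2)
{
	uint32_t ctl1 = fake_read(dev, PCI_L1SS_CTL1);

	/* 1. Clear the L1.2 enable bits before touching any timing field. */
	fake_write(dev, PCI_L1SS_CTL1, ctl1 & ~PCI_L1SS_CTL1_L1_2_MASK);

	/* 2. Restore Control-2 while L1.2 is off. */
	fake_write(dev, PCI_L1SS_CTL2, saved_ctl2);

	/* 3. Restore the rest of Control-1, still with L1.2 disabled. */
	fake_write(dev, PCI_L1SS_CTL1, saved_ctl1 & ~PCI_L1SS_CTL1_L1_2_MASK);

	/* 4. Restore the L1.2 enable bits last. */
	fake_write(dev, PCI_L1SS_CTL1, saved_ctl1);
}
```

The model costs two extra writes to Control-1 compared with the V2 restore
path, in exchange for never changing Control-2 or the Control-1 timing
fields while L1.2 is enabled. Whether that ordering helps on the affected
Apollolake systems is not established by this thread.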

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
  2022-08-30 14:02                                 ` Vidya Sagar
@ 2022-09-02  5:49                                   ` Lukasz Majczak
  0 siblings, 0 replies; 24+ messages in thread
From: Lukasz Majczak @ 2022-09-02  5:49 UTC (permalink / raw)
  To: Vidya Sagar
  Cc: Kai-Heng Feng, Bjorn Helgaas, Rajat Jain, Ben Chuang, bhelgaas,
	lorenzo.pieralisi, refactormyself, kw, kenny, treding, jonathanh,
	abhsahu, sagupta, linux-pci, Linux Kernel Mailing List, kthota,
	mmaddireddy, sagar.tv

On Tue, Aug 30, 2022 at 16:02 Vidya Sagar <vidyas@nvidia.com> wrote:
>
>
>
> On 8/30/2022 4:45 PM, Lukasz Majczak wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Fri, Aug 26, 2022 at 15:00 Vidya Sagar <vidyas@nvidia.com> wrote:
> >>
> >>
> >>
> >> On 8/23/2022 8:25 PM, Kai-Heng Feng wrote:
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> Hi Vidya,
> >>>
> >>> On Tue, Aug 9, 2022 at 12:17 AM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>
> >>>> Thanks Lukasz for the update.
> >>>> I think this confirms that there is no issue with the patch as such.
> >>>> Bjorn, could you please define the next step for this patch?
> >>>
> >>> I think the L1SS cap went away _after_ L1SS registers are restored,
> >>> since your patch already checks the cap before doing any write:
> >>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>> +       if (!aspm_l1ss)
> >>> +               return;
> >>>
> >>> That means it's more likely to be caused by the following change:
> >>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> >>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> >>>
> >>> So is it possible to clear PCI_L1SS_CTL1 before setting PCI_L1SS_CTL2,
> >>> like what aspm_calc_l1ss_info() does?
> >>
> >> I posted a new patch
> >> https://patchwork.kernel.org/project/linux-pci/patch/20220826125526.28859-1-vidyas@nvidia.com/
> >> keeping L1.2 disabled while restoring the rest of the fields in
> >> Control-1 register and restoring the L1.2 enable bits later. Could you
> >> please try this new patch on your setup and update your observations?
> >>
> >> Thanks & Regards,
> >> Vidya Sagar
> >>
> >>>
> >>> Kai-Heng
> >>>
> >>>>
> >>>> Thanks,
> >>>> Vidya Sagar
> >>>>
> >>>> On 8/8/2022 7:37 PM, Lukasz Majczak wrote:
> >>>>> External email: Use caution opening links or attachments
> >>>>>
> >>>>>
> >>>>> On Wed, Aug 3, 2022 at 14:55 Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>>>
> >>>>>> Thanks Lukasz for the logs.
> >>>>>> I still see that the L1SS capability in the root port (00:14.0) disappeared
> >>>>>> after resume.
> >>>>>> I still don't understand how this patch can make the capability register
> >>>>>> itself disappear. Honestly, I still see this as a HW issue.
> >>>>>> Bjorn, could you please throw some light on this?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Vidya Sagar
> >>>>>>
> >>>>>> On 8/3/2022 5:34 PM, Lukasz Majczak wrote:
> >>>>>>> External email: Use caution opening links or attachments
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Jul 29, 2022 at 16:36 Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi Lukasz,
> >>>>>>>> Thanks for sharing your observations.
> >>>>>>>>
> >>>>>>>> Could you please also share the output of 'sudo lspci -vvvv' before and
> >>>>>>>> after suspend-resume cycle with the latest linux-next?
> >>>>>>>> Do we still see the L1SS capabilities disappearing post resume?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Vidya Sagar
> >>>>>>>>
> >>>>>>>> On 7/29/2022 3:09 PM, Lukasz Majczak wrote:
> >>>>>>>>> External email: Use caution opening links or attachments
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Jul 26, 2022 at 09:20 Lukasz Majczak <lma@semihalf.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Jul 26, 2022 at 00:51 Rajat Jain <rajatja@google.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hello,
> >>>>>>>>>>>
> >>>>>>>>>>> On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Agree with Bjorn's observations.
> >>>>>>>>>>>> The fact that the L1SS capability registers themselves disappeared in
> >>>>>>>>>>>> the root port post resume indicates that there seems to be something
> >>>>>>>>>>>> wrong with the BIOS itself.
> >>>>>>>>>>>> Could you please check from that perspective?
> >>>>>>>>>>>
> >>>>>>>>>>> ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
> >>>>>>>>>>> is a shallower sleep state that preserves more state than, e.g., S3
> >>>>>>>>>>> (suspend-to-RAM). When we use S0ix, the BIOS does not come into the
> >>>>>>>>>>> picture at all, i.e. after the kernel runs its suspend routines, it just
> >>>>>>>>>>> puts the CPU into S0ix state. So I do not think there is a BIOS angle to
> >>>>>>>>>>> this.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Vidya Sagar
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
> >>>>>>>>>>>>> External email: Use caution opening links or attachments
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
> >>>>>>>>>>>>>> On Fri, Jul 22, 2022 at 09:31 Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> >>>>>>>>>>>>>>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@gmail.com> wrote:
> >>>>>>>>>>>>>>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@nvidia.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> >>>>>>>>>>>>>>>>> saved and restored during suspend/resume leading to L1 Substates
> >>>>>>>>>>>>>>>>> configuration being lost post-resume.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Save the L1 Substates control registers so that the configuration is
> >>>>>>>>>>>>>>>>> retained post-resume.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> >>>>>>>>>>>>>>>>> Tested-by: Abhishek Sahu <abhsahu@nvidia.com>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Vidya,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I tested this patch on kernel v5.19-rc6.
> >>>>>>>>>>>>>>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
> >>>>>>>>>>>>>>>> This patch can restore L1SS after suspend/resume.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The test results are as follows:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> After Boot:
> >>>>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>>>>>>>               Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>>>>>>>                       L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>>>>>                                 PortCommonModeRestoreTime=255us
> >>>>>>>>>>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>>>>>>>>>                       L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>>>>>>>                                  T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>>>>>>>>>>>>>                       L1SubCtl2: T_PwrOn=3100us
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> After suspend/resume without this patch.
> >>>>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>>>>>>>               Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>>>>>>>                       L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>>>>>                                 PortCommonModeRestoreTime=255us
> >>>>>>>>>>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>>>>>>>>>                       L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>>>>>>>>>>>>>                                  T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>>>>>>>>>>>>>                       L1SubCtl2: T_PwrOn=10us
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> After suspend/resume with this patch.
> >>>>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>>>>>>>               Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>>>>>>>                       L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>>>>>>>> ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>>>>>                                 PortCommonModeRestoreTime=255us
> >>>>>>>>>>>>>>>> PortTPowerOnTime=3100us
> >>>>>>>>>>>>>>>>                       L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>>>>>>>                                  T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>>>>>>>>>>>>>                       L1SubCtl2: T_PwrOn=3100us
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Tested-by: Ben Chuang <benchuanggli@gmail.com>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Forgot to add mine:
> >>>>>>>>>>>>>>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>> Ben Chuang
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>> Kenneth R. Crudup <kenny@panix.com>, Could you please verify this patch
> >>>>>>>>>>>>>>>>> on your laptop (Dell XPS 13) one last time?
> >>>>>>>>>>>>>>>>> IMHO, the regression observed on your laptop with an old version of the patch
> >>>>>>>>>>>>>>>>> could be due to a buggy old version BIOS in the laptop.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>> Vidya Sagar
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>        drivers/pci/pci.c       |  7 +++++++
> >>>>>>>>>>>>>>>>>        drivers/pci/pci.h       |  4 ++++
> >>>>>>>>>>>>>>>>>        drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>>>>>>>>>>        3 files changed, 55 insertions(+)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >>>>>>>>>>>>>>>>> index cfaf40a540a8..aca05880aaa3 100644
> >>>>>>>>>>>>>>>>> --- a/drivers/pci/pci.c
> >>>>>>>>>>>>>>>>> +++ b/drivers/pci/pci.c
> >>>>>>>>>>>>>>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
> >>>>>>>>>>>>>>>>>                       return i;
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>               pci_save_ltr_state(dev);
> >>>>>>>>>>>>>>>>> +       pci_save_aspm_l1ss_state(dev);
> >>>>>>>>>>>>>>>>>               pci_save_dpc_state(dev);
> >>>>>>>>>>>>>>>>>               pci_save_aer_state(dev);
> >>>>>>>>>>>>>>>>>               pci_save_ptm_state(dev);
> >>>>>>>>>>>>>>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
> >>>>>>>>>>>>>>>>>                * LTR itself (in the PCIe capability).
> >>>>>>>>>>>>>>>>>                */
> >>>>>>>>>>>>>>>>>               pci_restore_ltr_state(dev);
> >>>>>>>>>>>>>>>>> +       pci_restore_aspm_l1ss_state(dev);
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>               pci_restore_pcie_state(dev);
> >>>>>>>>>>>>>>>>>               pci_restore_pasid_state(dev);
> >>>>>>>>>>>>>>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
> >>>>>>>>>>>>>>>>>               if (error)
> >>>>>>>>>>>>>>>>>                       pci_err(dev, "unable to allocate suspend buffer for LTR\n");
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
> >>>>>>>>>>>>>>>>> +                                           2 * sizeof(u32));
> >>>>>>>>>>>>>>>>> +       if (error)
> >>>>>>>>>>>>>>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>>               pci_allocate_vc_save_buffers(dev);
> >>>>>>>>>>>>>>>>>        }
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> >>>>>>>>>>>>>>>>> index e10cdec6c56e..92d8c92662a4 100644
> >>>>>>>>>>>>>>>>> --- a/drivers/pci/pci.h
> >>>>>>>>>>>>>>>>> +++ b/drivers/pci/pci.h
> >>>>>>>>>>>>>>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
> >>>>>>>>>>>>>>>>>        void pcie_aspm_exit_link_state(struct pci_dev *pdev);
> >>>>>>>>>>>>>>>>>        void pcie_aspm_pm_state_change(struct pci_dev *pdev);
> >>>>>>>>>>>>>>>>>        void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
> >>>>>>>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>>>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>>>>>>>>>>>>>        #else
> >>>>>>>>>>>>>>>>>        static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>>>>>        static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>>>>>        static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>>>>>        static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>>>>>>>>>>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>>>>>>>>>>>>>        #endif
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>        #ifdef CONFIG_PCIE_ECRC
> >>>>>>>>>>>>>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> >>>>>>>>>>>>>>>>> index a96b7424c9bc..2c29fdd20059 100644
> >>>>>>>>>>>>>>>>> --- a/drivers/pci/pcie/aspm.c
> >>>>>>>>>>>>>>>>> +++ b/drivers/pci/pcie/aspm.c
> >>>>>>>>>>>>>>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> >>>>>>>>>>>>>>>>>                                       PCI_L1SS_CTL1_L1SS_MASK, val);
> >>>>>>>>>>>>>>>>>        }
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
> >>>>>>>>>>>>>>>>> +{
> >>>>>>>>>>>>>>>>> +       int aspm_l1ss;
> >>>>>>>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
> >>>>>>>>>>>>>>>>> +       u32 *cap;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       if (!pci_is_pcie(dev))
> >>>>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>>>>>> +       if (!aspm_l1ss)
> >>>>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>>>>>> +       if (!save_state)
> >>>>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
> >>>>>>>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
> >>>>>>>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
> >>>>>>>>>>>>>>>>> +}
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
> >>>>>>>>>>>>>>>>> +{
> >>>>>>>>>>>>>>>>> +       int aspm_l1ss;
> >>>>>>>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
> >>>>>>>>>>>>>>>>> +       u32 *cap;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       if (!pci_is_pcie(dev))
> >>>>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>>>>>> +       if (!aspm_l1ss)
> >>>>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>>>>>> +       if (!save_state)
> >>>>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
> >>>>>>>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> >>>>>>>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> >>>>>>>>>>>>>>>>> +}
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>>        static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
> >>>>>>>>>>>>>>>>>        {
> >>>>>>>>>>>>>>>>>               pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
> >>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>> 2.17.1
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> With this patch (and also mentioned
> >>>>>>>>>>>>>> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
> >>>>>>>>>>>>>> applied on 5.10 (chromeos-5.10) I am observing problems after
> >>>>>>>>>>>>>> suspend/resume with my WiFi card - it looks like whole communication
> >>>>>>>>>>>>>> via PCI fails. Attaching logs (dmesg, lspci -vvv before suspend/resume
> >>>>>>>>>>>>>> and after) https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I played a little bit with this code and it looks like the
> >>>>>>>>>>>>>> pci_write_config_dword() to the PCI_L1SS_CTL1 breaks it (don't know
> >>>>>>>>>>>>>> why, not a PCI expert).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks a lot for testing this!  I'm not quite sure what to make of the
> >>>>>>>>>>>>> results since v5.10 is fairly old (Dec 2020) and I don't know what
> >>>>>>>>>>>>> other changes are in chromeos-5.10.
> >>>>>>>>>>>
> >>>>>>>>>>> Lukasz: I assume you are running this on Atlas and are seeing this bug
> >>>>>>>>>>> when uprev'ving it to 5.10 kernel. Can you please try it on a newer
> >>>>>>>>>>> Intel platform that has the latest upstream kernel running already
> >>>>>>>>>>> and see if this can be reproduced there too?
> >>>>>>>>>>> Note that the wifi PCI device is different on newer Intel platforms,
> >>>>>>>>>>> but platform design is similar enough that I suspect we should see
> >>>>>>>>>>> similar bug on those too. The other option is to try the latest
> >>>>>>>>>>> upstream kernel on Atlas. Perhaps if we just care about wifi (and
> >>>>>>>>>>> ignore bringing up the graphics stack and GUI), it may come up
> >>>>>>>>>>> sufficiently enough to try this patch?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>>
> >>>>>>>>>>> Rajat
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Random observations, no analysis below.  This from your dmesg
> >>>>>>>>>>>>> certainly looks like PCI reads failing and returning ~0:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>         Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
> >>>>>>>>>>>>>         iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> >>>>>>>>>>>>>         iwlwifi 0000:01:00.0: Device gone - attempting removal
> >>>>>>>>>>>>>         Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> And then we re-enumerate 01:00.0 and it looks like it may have been
> >>>>>>>>>>>>> reset (BAR is 0):
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>         pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
> >>>>>>>>>>>>>         pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> lspci diffs from before/after suspend:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>          00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
> >>>>>>>>>>>>>            Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
> >>>>>>>>>>>>>         -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
> >>>>>>>>>>>>>         +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>>>>>>>         -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>>>>>>>         +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>>>>>>>         -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
> >>>>>>>>>>>>>         +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
> >>>>>>>>>>>>>         -       Capabilities: [150 v0] Null
> >>>>>>>>>>>>>         -       Capabilities: [200 v1] L1 PM Substates
> >>>>>>>>>>>>>         -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>>         -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> >>>>>>>>>>>>>         -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>>>>         -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
> >>>>>>>>>>>>>         -               L1SubCtl2: T_PwrOn=60us
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The DevSta differences might be BIOS bugs, probably not relevant.
> >>>>>>>>>>>>> Interesting that ASPM is disabled, maybe didn't get enabled after
> >>>>>>>>>>>>> re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
> >>>>>>>>>>>>> disappeared.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>          01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
> >>>>>>>>>>>>>                         LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>>>>>>>         -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>>>>>>>         +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >>>>>>>>>>>>>                 Capabilities: [154 v1] L1 PM Substates
> >>>>>>>>>>>>>                         L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>>                                   PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
> >>>>>>>>>>>>>         -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>>>>         -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
> >>>>>>>>>>>>>         +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>>>>>>>>>>         +                          T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
> >>>>>>>>>>>>> get reinitialized after re-enumeration?  Looks like we didn't restore
> >>>>>>>>>>>>> L1SubCtl1.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Bjorn
> >>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> Thank you all for the response and input! As Rajat mentioned I'm using
> >>>>>>>>>> chromebook - but not Atlas (Amberlake) - in this case it is Babymega
> >>>>>>>>>> (Apollolake)  - I will try to load most recent kernel and give it a
> >>>>>>>>>> try once again.
> >>>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Lukasz
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>>       I have applied this patch on top of v5.19-rc7 (chromeos) and I'm
> >>>>>>>>> still getting same results:
> >>>>>>>>> https://gist.github.com/semihalf-majczak-lukasz/4b716704c21a3758d6711b2030ea34b9
> >>>>>>>>>
> >>>>>>>>> Best regards,
> >>>>>>>>> Lukasz
> >>>>>>>>>
> >>>>>>> Hi Vidya,
> >>>>>>>
> >>>>>>> Sorry for the long delay, I have retested your patch on top of
> >>>>>>> linux-next/master (next-20220802) - the results for my device remain
> >>>>>>> the same.
> >>>>>>> Here are the logs (lspci -vvv before suspend, lspci -vvv after resume and dmesg)
> >>>>>>> https://gist.github.com/semihalf-majczak-lukasz/c7bfd811359f23278034056a8002b3ef
> >>>>>>> Let me know if you need any more logs and/or tests.
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> Lukasz
> >>>>>>>
> >>>>> Hi Vidya,
> >>>>>
> >>>>> After your last email, I re-tested my setup, and even without your
> >>>>> patch the capability register disappears. So it looks like there is,
> >>>>> in fact, some problem in my setup, and your patch just exposes it,
> >>>>> because after resume it tries to write to a register that is no longer
> >>>>> present. I'm very sorry for the confusion; I did not notice this at
> >>>>> the very beginning.
> >>>>>
> >>>>> Best regards,
> >>>>> Lukasz
> >>>>>
> >
> > Hi Vidya,
> >
> > For me (on Apollolake devices) the results remain the same, but as
> > I mentioned earlier, this looks specific to Apollolake and is not
> > directly related to your patch (e.g. I'm losing the L1SS capability
> > even without your patch).
> > As a counterexample, I don't observe any issues with your patch
> > (v3) on Amberlake devices: the lspci -vvv output before suspend and
> > after resume is exactly the same.
>
> Thanks for the update Lukasz.
> Anyway, I sent V3 for review. Could you please review it and also test
> it on your platform?
>
> Thanks,
> Vidya Sagar
>
> >
> > Best regards,
> > Lukasz
> >
Hi Vidya,

The results from my previous mail are for V3 of your patch:
Amberlake - works fine
Apollolake - still the same issue, but here it is not related to your
changes (we are still working on this).

Best regards,
Lukasz
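
For readers following the thread, the save/restore pattern in the quoted
patch can be sketched in user space roughly as below. This is only a
simulation under stated assumptions, not kernel code: struct fake_dev,
fake_read32(), fake_write32(), l1ss_save() and l1ss_restore() are
hypothetical names made up for illustration, and the config space is a
plain byte array. The offsets 0x08 and 0x0c match PCI_L1SS_CTL1 and
PCI_L1SS_CTL2 in pci_regs.h. The sketch keeps the patch's order (CTL2
first, then CTL1) and its early return when the capability is not found.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define L1SS_CTL1 0x08	/* same offset as PCI_L1SS_CTL1 */
#define L1SS_CTL2 0x0c	/* same offset as PCI_L1SS_CTL2 */

/* Hypothetical stand-in for struct pci_dev plus its save buffer. */
struct fake_dev {
	uint8_t cfg[4096];	/* simulated extended config space */
	int l1ss_offset;	/* 0 means "L1SS capability not found" */
	uint32_t saved[2];	/* saved[0] = CTL2, saved[1] = CTL1 */
	int have_saved;		/* set once l1ss_save() has filled saved[] */
};

static uint32_t fake_read32(const struct fake_dev *d, int off)
{
	uint32_t v;

	assert(off >= 0 && off + 4 <= (int)sizeof(d->cfg));
	memcpy(&v, &d->cfg[off], sizeof(v));
	return v;
}

static void fake_write32(struct fake_dev *d, int off, uint32_t v)
{
	assert(off >= 0 && off + 4 <= (int)sizeof(d->cfg));
	memcpy(&d->cfg[off], &v, sizeof(v));
}

/* Returns 1 if both control registers were saved, 0 if skipped. */
static int l1ss_save(struct fake_dev *d)
{
	if (!d->l1ss_offset)
		return 0;

	d->saved[0] = fake_read32(d, d->l1ss_offset + L1SS_CTL2);
	d->saved[1] = fake_read32(d, d->l1ss_offset + L1SS_CTL1);
	d->have_saved = 1;
	return 1;
}

/* Returns 1 if both control registers were written back, 0 if skipped. */
static int l1ss_restore(struct fake_dev *d)
{
	/* Skip when the capability is absent or nothing was saved. */
	if (!d->l1ss_offset || !d->have_saved)
		return 0;

	fake_write32(d, d->l1ss_offset + L1SS_CTL2, d->saved[0]);
	fake_write32(d, d->l1ss_offset + L1SS_CTL1, d->saved[1]);
	return 1;
}
```

In this model, a device whose capability lookup returns 0 at resume time
is simply skipped by l1ss_restore(), which corresponds to the
"if (!aspm_l1ss) return;" check in the quoted patch. It says nothing
about why the capability vanishes on Apollolake.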

end of thread, other threads:[~2022-09-02  5:49 UTC | newest]

Thread overview: 24+ messages
2022-07-05  6:00 [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume Vidya Sagar
2022-07-13 17:59 ` Vidya Sagar
2022-07-13 18:16   ` Bjorn Helgaas
2022-07-14  4:20     ` Kai-Heng Feng
2022-07-15 10:38 ` Ben Chuang
2022-07-22  7:31   ` Kai-Heng Feng
2022-07-22  9:41     ` Lukasz Majczak
2022-07-22 17:42       ` Bjorn Helgaas
2022-07-23 17:03         ` Vidya Sagar
2022-07-25 22:50           ` Rajat Jain
2022-07-26  7:20             ` Lukasz Majczak
2022-07-29  9:39               ` Lukasz Majczak
2022-07-29 14:35                 ` Vidya Sagar
2022-08-03 12:04                   ` Lukasz Majczak
2022-08-03 12:55                     ` Vidya Sagar
2022-08-08 14:07                       ` Lukasz Majczak
2022-08-08 16:16                         ` Vidya Sagar
2022-08-23 14:55                           ` Kai-Heng Feng
2022-08-25 23:01                             ` Bjorn Helgaas
2022-08-26  3:13                               ` Kai-Heng Feng
2022-08-26 13:00                             ` Vidya Sagar
2022-08-30 11:15                               ` Lukasz Majczak
2022-08-30 14:02                                 ` Vidya Sagar
2022-09-02  5:49                                   ` Lukasz Majczak
