All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
@ 2021-10-26  6:51 ` Kai-Heng Feng
  0 siblings, 0 replies; 16+ messages in thread
From: Kai-Heng Feng @ 2021-10-26  6:51 UTC (permalink / raw)
  To: jesse.brandeburg, anthony.l.nguyen
  Cc: sasha.neftin, acelan.kao, Kai-Heng Feng, David S. Miller,
	Jakub Kicinski, Dima Ruinskiy, intel-wired-lan, netdev,
	linux-kernel

On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
polling logic doesn't wait until ME really unconfigures s0ix.

So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
flagged, wait for 1 second to let ME unconfigure s0ix.

Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
v2:
 Add missing "Fixes:" tag

 drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 44e2dc8328a22..cd81ba00a6bc9 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
 	u32 mac_data;
 	u16 phy_data;
 	u32 i = 0;
+	bool dpg_exit_done;
 
 	if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
+		dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
 		/* Request ME unconfigure the device from S0ix */
 		mac_data = er32(H2ME);
 		mac_data &= ~E1000_H2ME_START_DPG;
 		mac_data |= E1000_H2ME_EXIT_DPG;
 		ew32(H2ME, mac_data);
 
+		if (dpg_exit_done) {
+			e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");
+			msleep(1000);
+		}
+
 		/* Poll up to 2.5 seconds for ME to unconfigure DPG.
 		 * If this takes more than 1 second, show a warning indicating a
 		 * firmware bug
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Intel-wired-lan] [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
@ 2021-10-26  6:51 ` Kai-Heng Feng
  0 siblings, 0 replies; 16+ messages in thread
From: Kai-Heng Feng @ 2021-10-26  6:51 UTC (permalink / raw)
  To: intel-wired-lan

On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
polling logic doesn't wait until ME really unconfigures s0ix.

So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
flagged, wait for 1 second to let ME unconfigure s0ix.

Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
v2:
 Add missing "Fixes:" tag

 drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 44e2dc8328a22..cd81ba00a6bc9 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
 	u32 mac_data;
 	u16 phy_data;
 	u32 i = 0;
+	bool dpg_exit_done;
 
 	if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
+		dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
 		/* Request ME unconfigure the device from S0ix */
 		mac_data = er32(H2ME);
 		mac_data &= ~E1000_H2ME_START_DPG;
 		mac_data |= E1000_H2ME_EXIT_DPG;
 		ew32(H2ME, mac_data);
 
+		if (dpg_exit_done) {
+			e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");
+			msleep(1000);
+		}
+
 		/* Poll up to 2.5 seconds for ME to unconfigure DPG.
 		 * If this takes more than 1 second, show a warning indicating a
 		 * firmware bug
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Intel-wired-lan] [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
  2021-10-26  6:51 ` [Intel-wired-lan] " Kai-Heng Feng
@ 2021-10-26  7:27   ` Paul Menzel
  -1 siblings, 0 replies; 16+ messages in thread
From: Paul Menzel @ 2021-10-26  7:27 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: intel-wired-lan, linux-kernel, acelan.kao, netdev,
	Jakub Kicinski, Dima Ruinskiy, David S. Miller, jesse.brandeburg,
	anthony.l.nguyen

Dear Kai-Heng,


On 26.10.21 08:51, Kai-Heng Feng wrote:

In the commit message summary, maybe write:

> e1000e: Add 1 s delay …


> On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
> polling logic doesn't wait until ME really unconfigures s0ix.

Please list one broken system. The log message says, it’s a firmware 
bug. Please elaborate.

> So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
> flagged, wait for 1 second to let ME unconfigure s0ix.

Where did you get the one second from?

> Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
> v2:
>   Add missing "Fixes:" tag
> 
>   drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 44e2dc8328a22..cd81ba00a6bc9 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
>   	u32 mac_data;
>   	u16 phy_data;
>   	u32 i = 0;
> +	bool dpg_exit_done;
>   
>   	if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
> +		dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
>   		/* Request ME unconfigure the device from S0ix */
>   		mac_data = er32(H2ME);
>   		mac_data &= ~E1000_H2ME_START_DPG;
>   		mac_data |= E1000_H2ME_EXIT_DPG;
>   		ew32(H2ME, mac_data);
>   
> +		if (dpg_exit_done) {
> +			e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");

What firmware exactly? Management Engine?

> +			msleep(1000);

One second is quite long. Can some bit be polled instead?

> +		}
> +
>   		/* Poll up to 2.5 seconds for ME to unconfigure DPG.
>   		 * If this takes more than 1 second, show a warning indicating a
>   		 * firmware bug
> 


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Intel-wired-lan] [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
@ 2021-10-26  7:27   ` Paul Menzel
  0 siblings, 0 replies; 16+ messages in thread
From: Paul Menzel @ 2021-10-26  7:27 UTC (permalink / raw)
  To: intel-wired-lan

Dear Kai-Heng,


On 26.10.21 08:51, Kai-Heng Feng wrote:

In the commit message summary, maybe write:

> e1000e: Add 1 s delay ?


> On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
> polling logic doesn't wait until ME really unconfigures s0ix.

Please list one broken system. The log message says, it?s a firmware 
bug. Please elaborate.

> So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
> flagged, wait for 1 second to let ME unconfigure s0ix.

Where did you get the one second from?

> Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
> v2:
>   Add missing "Fixes:" tag
> 
>   drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 44e2dc8328a22..cd81ba00a6bc9 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
>   	u32 mac_data;
>   	u16 phy_data;
>   	u32 i = 0;
> +	bool dpg_exit_done;
>   
>   	if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
> +		dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
>   		/* Request ME unconfigure the device from S0ix */
>   		mac_data = er32(H2ME);
>   		mac_data &= ~E1000_H2ME_START_DPG;
>   		mac_data |= E1000_H2ME_EXIT_DPG;
>   		ew32(H2ME, mac_data);
>   
> +		if (dpg_exit_done) {
> +			e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");

What firmware exactly? Management Engine?

> +			msleep(1000);

One second is quite long. Can some bit be polled instead?

> +		}
> +
>   		/* Poll up to 2.5 seconds for ME to unconfigure DPG.
>   		 * If this takes more than 1 second, show a warning indicating a
>   		 * firmware bug
> 


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
  2021-10-26  6:51 ` [Intel-wired-lan] " Kai-Heng Feng
@ 2021-10-26  8:47   ` Sasha Neftin
  -1 siblings, 0 replies; 16+ messages in thread
From: Sasha Neftin @ 2021-10-26  8:47 UTC (permalink / raw)
  To: Kai-Heng Feng, jesse.brandeburg, anthony.l.nguyen
  Cc: acelan.kao, David S. Miller, Jakub Kicinski, Dima Ruinskiy,
	intel-wired-lan, netdev, linux-kernel, Kraus, NechamaX,
	Fuxbrumer, Devora, Avivi, Amir

On 10/26/2021 09:51, Kai-Heng Feng wrote:
> On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
> polling logic doesn't wait until ME really unconfigures s0ix.
> 
> So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
> flagged, wait for 1 second to let ME unconfigure s0ix.
> 
> Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
> v2:
>   Add missing "Fixes:" tag
> 
>   drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 44e2dc8328a22..cd81ba00a6bc9 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
>   	u32 mac_data;
>   	u16 phy_data;
>   	u32 i = 0;
> +	bool dpg_exit_done;
>   
>   	if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
> +		dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
>   		/* Request ME unconfigure the device from S0ix */
>   		mac_data = er32(H2ME);
>   		mac_data &= ~E1000_H2ME_START_DPG;
>   		mac_data |= E1000_H2ME_EXIT_DPG;
>   		ew32(H2ME, mac_data);
>   
> +		if (dpg_exit_done) {
> +			e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");
> +			msleep(1000);
> +		}
Thanks for working on the enablement.
The delay approach is fragile. We need to work with CSME folks to 
understand why _DPG_EXIT_DONE indication is wrong on some ADL platforms.
Could you provide CSME/BIOS version? dmidecode -t 0 and cat 
/sys/class/mei/mei0/fw_ver
>   		/* Poll up to 2.5 seconds for ME to unconfigure DPG.
>   		 * If this takes more than 1 second, show a warning indicating a
>   		 * firmware bug
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Intel-wired-lan] [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
@ 2021-10-26  8:47   ` Sasha Neftin
  0 siblings, 0 replies; 16+ messages in thread
From: Sasha Neftin @ 2021-10-26  8:47 UTC (permalink / raw)
  To: intel-wired-lan

On 10/26/2021 09:51, Kai-Heng Feng wrote:
> On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
> polling logic doesn't wait until ME really unconfigures s0ix.
> 
> So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
> flagged, wait for 1 second to let ME unconfigure s0ix.
> 
> Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
> v2:
>   Add missing "Fixes:" tag
> 
>   drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 44e2dc8328a22..cd81ba00a6bc9 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
>   	u32 mac_data;
>   	u16 phy_data;
>   	u32 i = 0;
> +	bool dpg_exit_done;
>   
>   	if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
> +		dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
>   		/* Request ME unconfigure the device from S0ix */
>   		mac_data = er32(H2ME);
>   		mac_data &= ~E1000_H2ME_START_DPG;
>   		mac_data |= E1000_H2ME_EXIT_DPG;
>   		ew32(H2ME, mac_data);
>   
> +		if (dpg_exit_done) {
> +			e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");
> +			msleep(1000);
> +		}
Thanks for working on the enablement.
The delay approach is fragile. We need to work with CSME folks to 
understand why _DPG_EXIT_DONE indication is wrong on some ADL platforms.
Could you provide CSME/BIOS version? dmidecode -t 0 and cat 
/sys/class/mei/mei0/fw_ver
>   		/* Poll up to 2.5 seconds for ME to unconfigure DPG.
>   		 * If this takes more than 1 second, show a warning indicating a
>   		 * firmware bug
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
  2021-10-26  8:47   ` [Intel-wired-lan] " Sasha Neftin
@ 2021-10-26 22:50     ` Kai-Heng Feng
  -1 siblings, 0 replies; 16+ messages in thread
From: Kai-Heng Feng @ 2021-10-26 22:50 UTC (permalink / raw)
  To: Sasha Neftin
  Cc: Jesse Brandeburg, Nguyen, Anthony L, AceLan Kao, David S. Miller,
	Jakub Kicinski, Dima Ruinskiy,
	moderated list:INTEL ETHERNET DRIVERS, Linux Netdev List, LKML,
	Kraus, NechamaX, Fuxbrumer, Devora, Avivi, Amir

On Tue, Oct 26, 2021 at 4:48 PM Sasha Neftin <sasha.neftin@intel.com> wrote:
>
> On 10/26/2021 09:51, Kai-Heng Feng wrote:
> > On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
> > polling logic doesn't wait until ME really unconfigures s0ix.
> >
> > So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
> > flagged, wait for 1 second to let ME unconfigure s0ix.
> >
> > Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > ---
> > v2:
> >   Add missing "Fixes:" tag
> >
> >   drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> > index 44e2dc8328a22..cd81ba00a6bc9 100644
> > --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> > @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
> >       u32 mac_data;
> >       u16 phy_data;
> >       u32 i = 0;
> > +     bool dpg_exit_done;
> >
> >       if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
> > +             dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
> >               /* Request ME unconfigure the device from S0ix */
> >               mac_data = er32(H2ME);
> >               mac_data &= ~E1000_H2ME_START_DPG;
> >               mac_data |= E1000_H2ME_EXIT_DPG;
> >               ew32(H2ME, mac_data);
> >
> > +             if (dpg_exit_done) {
> > +                     e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");
> > +                     msleep(1000);
> > +             }
> Thanks for working on the enablement.
> The delay approach is fragile. We need to work with CSME folks to
> understand why _DPG_EXIT_DONE indication is wrong on some ADL platforms.
> Could you provide CSME/BIOS version? dmidecode -t 0 and cat
> /sys/class/mei/mei0/fw_ver

$ sudo dmidecode -t 0
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.4 present.
# SMBIOS implementations newer than version 3.2.0 are not
# fully supported by this version of dmidecode.

Handle 0x0001, DMI type 0, 26 bytes
BIOS Information
        Vendor: Dell Inc.
        Version: 0.12.68
        Release Date: 10/01/2021
        ROM Size: 48 MB
        Characteristics:
                PCI is supported
                PNP is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                Boot from CD is supported
                Selectable boot is supported
                EDD is supported
                Print screen service is supported (int 5h)
                8042 keyboard services are supported (int 9h)
                Serial services are supported (int 14h)
                Printer services are supported (int 17h)
                ACPI is supported
                USB legacy is supported
                BIOS boot specification is supported
                Function key-initiated network boot is supported
                Targeted content distribution is supported
                UEFI is supported
        BIOS Revision: 0.12

$ cat /sys/class/mei/mei0/fw_ver
0:16.0.15.1518
0:16.0.15.1518
0:16.0.15.1518

> >               /* Poll up to 2.5 seconds for ME to unconfigure DPG.
> >                * If this takes more than 1 second, show a warning indicating a
> >                * firmware bug
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Intel-wired-lan] [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
@ 2021-10-26 22:50     ` Kai-Heng Feng
  0 siblings, 0 replies; 16+ messages in thread
From: Kai-Heng Feng @ 2021-10-26 22:50 UTC (permalink / raw)
  To: intel-wired-lan

On Tue, Oct 26, 2021 at 4:48 PM Sasha Neftin <sasha.neftin@intel.com> wrote:
>
> On 10/26/2021 09:51, Kai-Heng Feng wrote:
> > On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
> > polling logic doesn't wait until ME really unconfigures s0ix.
> >
> > So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
> > flagged, wait for 1 second to let ME unconfigure s0ix.
> >
> > Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > ---
> > v2:
> >   Add missing "Fixes:" tag
> >
> >   drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> > index 44e2dc8328a22..cd81ba00a6bc9 100644
> > --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> > @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
> >       u32 mac_data;
> >       u16 phy_data;
> >       u32 i = 0;
> > +     bool dpg_exit_done;
> >
> >       if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
> > +             dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
> >               /* Request ME unconfigure the device from S0ix */
> >               mac_data = er32(H2ME);
> >               mac_data &= ~E1000_H2ME_START_DPG;
> >               mac_data |= E1000_H2ME_EXIT_DPG;
> >               ew32(H2ME, mac_data);
> >
> > +             if (dpg_exit_done) {
> > +                     e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");
> > +                     msleep(1000);
> > +             }
> Thanks for working on the enablement.
> The delay approach is fragile. We need to work with CSME folks to
> understand why _DPG_EXIT_DONE indication is wrong on some ADL platforms.
> Could you provide CSME/BIOS version? dmidecode -t 0 and cat
> /sys/class/mei/mei0/fw_ver

$ sudo dmidecode -t 0
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.4 present.
# SMBIOS implementations newer than version 3.2.0 are not
# fully supported by this version of dmidecode.

Handle 0x0001, DMI type 0, 26 bytes
BIOS Information
        Vendor: Dell Inc.
        Version: 0.12.68
        Release Date: 10/01/2021
        ROM Size: 48 MB
        Characteristics:
                PCI is supported
                PNP is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                Boot from CD is supported
                Selectable boot is supported
                EDD is supported
                Print screen service is supported (int 5h)
                8042 keyboard services are supported (int 9h)
                Serial services are supported (int 14h)
                Printer services are supported (int 17h)
                ACPI is supported
                USB legacy is supported
                BIOS boot specification is supported
                Function key-initiated network boot is supported
                Targeted content distribution is supported
                UEFI is supported
        BIOS Revision: 0.12

$ cat /sys/class/mei/mei0/fw_ver
0:16.0.15.1518
0:16.0.15.1518
0:16.0.15.1518

> >               /* Poll up to 2.5 seconds for ME to unconfigure DPG.
> >                * If this takes more than 1 second, show a warning indicating a
> >                * firmware bug
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
  2021-10-26 22:50     ` [Intel-wired-lan] " Kai-Heng Feng
@ 2021-10-29  9:14       ` Sasha Neftin
  -1 siblings, 0 replies; 16+ messages in thread
From: Sasha Neftin @ 2021-10-29  9:14 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Jesse Brandeburg, Nguyen, Anthony L, AceLan Kao, David S. Miller,
	Jakub Kicinski, Dima Ruinskiy,
	moderated list:INTEL ETHERNET DRIVERS, Linux Netdev List, LKML,
	Kraus, NechamaX, Fuxbrumer, Devora, Avivi, Amir, Neftin, Sasha

On 10/27/2021 01:50, Kai-Heng Feng wrote:
> On Tue, Oct 26, 2021 at 4:48 PM Sasha Neftin <sasha.neftin@intel.com> wrote:
>>
>> On 10/26/2021 09:51, Kai-Heng Feng wrote:
>>> On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
>>> polling logic doesn't wait until ME really unconfigures s0ix.
>>>
>>> So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
>>> flagged, wait for 1 second to let ME unconfigure s0ix.
>>>
>>> Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
>>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>> ---
>>> v2:
>>>    Add missing "Fixes:" tag
>>>
>>>    drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
>>>    1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
>>> index 44e2dc8328a22..cd81ba00a6bc9 100644
>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>> @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
>>>        u32 mac_data;
>>>        u16 phy_data;
>>>        u32 i = 0;
>>> +     bool dpg_exit_done;
>>>
>>>        if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
>>> +             dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
>>>                /* Request ME unconfigure the device from S0ix */
>>>                mac_data = er32(H2ME);
>>>                mac_data &= ~E1000_H2ME_START_DPG;
>>>                mac_data |= E1000_H2ME_EXIT_DPG;
>>>                ew32(H2ME, mac_data);
>>>
>>> +             if (dpg_exit_done) {
>>> +                     e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");
>>> +                     msleep(1000);
>>> +             }
>> Thanks for working on the enablement.
>> The delay approach is fragile. We need to work with CSME folks to
>> understand why _DPG_EXIT_DONE indication is wrong on some ADL platforms.
>> Could you provide CSME/BIOS version? dmidecode -t 0 and cat
>> /sys/class/mei/mei0/fw_ver
> 
> $ sudo dmidecode -t 0
> # dmidecode 3.2
> Getting SMBIOS data from sysfs.
> SMBIOS 3.4 present.
> # SMBIOS implementations newer than version 3.2.0 are not
> # fully supported by this version of dmidecode.
> 
> Handle 0x0001, DMI type 0, 26 bytes
> BIOS Information
>          Vendor: Dell Inc.
>          Version: 0.12.68
>          Release Date: 10/01/2021
>          ROM Size: 48 MB
>          Characteristics:
>                  PCI is supported
>                  PNP is supported
>                  BIOS is upgradeable
>                  BIOS shadowing is allowed
>                  Boot from CD is supported
>                  Selectable boot is supported
>                  EDD is supported
>                  Print screen service is supported (int 5h)
>                  8042 keyboard services are supported (int 9h)
>                  Serial services are supported (int 14h)
>                  Printer services are supported (int 17h)
>                  ACPI is supported
>                  USB legacy is supported
>                  BIOS boot specification is supported
>                  Function key-initiated network boot is supported
>                  Targeted content distribution is supported
>                  UEFI is supported
>          BIOS Revision: 0.12
> 
> $ cat /sys/class/mei/mei0/fw_ver
> 0:16.0.15.1518
> 0:16.0.15.1518
> 0:16.0.15.1518
> 
Thank you Kai-Heng. The _DPG_EXIT_DONE bit indication comes from the 
EXFWSM register controlled by the CSME. We have only read access.  I 
realized that this indication was set to 1 even before our request to 
unconfigure the s0ix settings from the CSME. I am wondering. Does after 
a ~ 1s delay (or less, or more) _DPG_EXIT_DONE indication eventually 
change by CSME to 0? (is it consistently)
>>>                /* Poll up to 2.5 seconds for ME to unconfigure DPG.
>>>                 * If this takes more than 1 second, show a warning indicating a
>>>                 * firmware bug
>>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Intel-wired-lan] [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
@ 2021-10-29  9:14       ` Sasha Neftin
  0 siblings, 0 replies; 16+ messages in thread
From: Sasha Neftin @ 2021-10-29  9:14 UTC (permalink / raw)
  To: intel-wired-lan

On 10/27/2021 01:50, Kai-Heng Feng wrote:
> On Tue, Oct 26, 2021 at 4:48 PM Sasha Neftin <sasha.neftin@intel.com> wrote:
>>
>> On 10/26/2021 09:51, Kai-Heng Feng wrote:
>>> On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
>>> polling logic doesn't wait until ME really unconfigures s0ix.
>>>
>>> So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
>>> flagged, wait for 1 second to let ME unconfigure s0ix.
>>>
>>> Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
>>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>> ---
>>> v2:
>>>    Add missing "Fixes:" tag
>>>
>>>    drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
>>>    1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
>>> index 44e2dc8328a22..cd81ba00a6bc9 100644
>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>> @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
>>>        u32 mac_data;
>>>        u16 phy_data;
>>>        u32 i = 0;
>>> +     bool dpg_exit_done;
>>>
>>>        if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
>>> +             dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
>>>                /* Request ME unconfigure the device from S0ix */
>>>                mac_data = er32(H2ME);
>>>                mac_data &= ~E1000_H2ME_START_DPG;
>>>                mac_data |= E1000_H2ME_EXIT_DPG;
>>>                ew32(H2ME, mac_data);
>>>
>>> +             if (dpg_exit_done) {
>>> +                     e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");
>>> +                     msleep(1000);
>>> +             }
>> Thanks for working on the enablement.
>> The delay approach is fragile. We need to work with CSME folks to
>> understand why _DPG_EXIT_DONE indication is wrong on some ADL platforms.
>> Could you provide CSME/BIOS version? dmidecode -t 0 and cat
>> /sys/class/mei/mei0/fw_ver
> 
> $ sudo dmidecode -t 0
> # dmidecode 3.2
> Getting SMBIOS data from sysfs.
> SMBIOS 3.4 present.
> # SMBIOS implementations newer than version 3.2.0 are not
> # fully supported by this version of dmidecode.
> 
> Handle 0x0001, DMI type 0, 26 bytes
> BIOS Information
>          Vendor: Dell Inc.
>          Version: 0.12.68
>          Release Date: 10/01/2021
>          ROM Size: 48 MB
>          Characteristics:
>                  PCI is supported
>                  PNP is supported
>                  BIOS is upgradeable
>                  BIOS shadowing is allowed
>                  Boot from CD is supported
>                  Selectable boot is supported
>                  EDD is supported
>                  Print screen service is supported (int 5h)
>                  8042 keyboard services are supported (int 9h)
>                  Serial services are supported (int 14h)
>                  Printer services are supported (int 17h)
>                  ACPI is supported
>                  USB legacy is supported
>                  BIOS boot specification is supported
>                  Function key-initiated network boot is supported
>                  Targeted content distribution is supported
>                  UEFI is supported
>          BIOS Revision: 0.12
> 
> $ cat /sys/class/mei/mei0/fw_ver
> 0:16.0.15.1518
> 0:16.0.15.1518
> 0:16.0.15.1518
> 
Thank you Kai-Heng. The _DPG_EXIT_DONE bit indication comes from the 
EXFWSM register controlled by the CSME. We have only read access.  I 
realized that this indication was set to 1 even before our request to 
unconfigure the s0ix settings from the CSME. I am wondering. Does after 
a ~ 1s delay (or less, or more) _DPG_EXIT_DONE indication eventually 
change by CSME to 0? (is it consistently)
>>>                /* Poll up to 2.5 seconds for ME to unconfigure DPG.
>>>                 * If this takes more than 1 second, show a warning indicating a
>>>                 * firmware bug
>>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
  2021-10-29  9:14       ` [Intel-wired-lan] " Sasha Neftin
@ 2021-11-02  3:27         ` Kai-Heng Feng
  -1 siblings, 0 replies; 16+ messages in thread
From: Kai-Heng Feng @ 2021-11-02  3:27 UTC (permalink / raw)
  To: Sasha Neftin
  Cc: Jesse Brandeburg, Nguyen, Anthony L, AceLan Kao, David S. Miller,
	Jakub Kicinski, Dima Ruinskiy,
	moderated list:INTEL ETHERNET DRIVERS, Linux Netdev List, LKML,
	Kraus, NechamaX, Fuxbrumer, Devora, Avivi, Amir

On Fri, Oct 29, 2021 at 5:14 PM Sasha Neftin <sasha.neftin@intel.com> wrote:
>
> On 10/27/2021 01:50, Kai-Heng Feng wrote:
> > On Tue, Oct 26, 2021 at 4:48 PM Sasha Neftin <sasha.neftin@intel.com> wrote:
> >>
> >> On 10/26/2021 09:51, Kai-Heng Feng wrote:
> >>> On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
> >>> polling logic doesn't wait until ME really unconfigures s0ix.
> >>>
> >>> So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
> >>> flagged, wait for 1 second to let ME unconfigure s0ix.
> >>>
> >>> Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
> >>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
> >>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> >>> ---
> >>> v2:
> >>>    Add missing "Fixes:" tag
> >>>
> >>>    drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
> >>>    1 file changed, 7 insertions(+)
> >>>
> >>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> >>> index 44e2dc8328a22..cd81ba00a6bc9 100644
> >>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> >>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> >>> @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
> >>>        u32 mac_data;
> >>>        u16 phy_data;
> >>>        u32 i = 0;
> >>> +     bool dpg_exit_done;
> >>>
> >>>        if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
> >>> +             dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
> >>>                /* Request ME unconfigure the device from S0ix */
> >>>                mac_data = er32(H2ME);
> >>>                mac_data &= ~E1000_H2ME_START_DPG;
> >>>                mac_data |= E1000_H2ME_EXIT_DPG;
> >>>                ew32(H2ME, mac_data);
> >>>
> >>> +             if (dpg_exit_done) {
> >>> +                     e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");
> >>> +                     msleep(1000);
> >>> +             }
> >> Thanks for working on the enablement.
> >> The delay approach is fragile. We need to work with CSME folks to
> >> understand why _DPG_EXIT_DONE indication is wrong on some ADL platforms.
> >> Could you provide CSME/BIOS version? dmidecode -t 0 and cat
> >> /sys/class/mei/mei0/fw_ver
> >
> > $ sudo dmidecode -t 0
> > # dmidecode 3.2
> > Getting SMBIOS data from sysfs.
> > SMBIOS 3.4 present.
> > # SMBIOS implementations newer than version 3.2.0 are not
> > # fully supported by this version of dmidecode.
> >
> > Handle 0x0001, DMI type 0, 26 bytes
> > BIOS Information
> >          Vendor: Dell Inc.
> >          Version: 0.12.68
> >          Release Date: 10/01/2021
> >          ROM Size: 48 MB
> >          Characteristics:
> >                  PCI is supported
> >                  PNP is supported
> >                  BIOS is upgradeable
> >                  BIOS shadowing is allowed
> >                  Boot from CD is supported
> >                  Selectable boot is supported
> >                  EDD is supported
> >                  Print screen service is supported (int 5h)
> >                  8042 keyboard services are supported (int 9h)
> >                  Serial services are supported (int 14h)
> >                  Printer services are supported (int 17h)
> >                  ACPI is supported
> >                  USB legacy is supported
> >                  BIOS boot specification is supported
> >                  Function key-initiated network boot is supported
> >                  Targeted content distribution is supported
> >                  UEFI is supported
> >          BIOS Revision: 0.12
> >
> > $ cat /sys/class/mei/mei0/fw_ver
> > 0:16.0.15.1518
> > 0:16.0.15.1518
> > 0:16.0.15.1518
> >
> Thank you Kai-Heng. The _DPG_EXIT_DONE bit indication comes from the
> EXFWSM register controlled by the CSME. We have only read access.  I
> realized that this indication was set to 1 even before our request to
> unconfigure the s0ix settings from the CSME. I am wondering. Does after
> a ~ 1s delay (or less, or more) _DPG_EXIT_DONE indication eventually
> change by CSME to 0? (is it consistently)

Never. It's consistently being 1.

Right now we are seeing the same issue on TGL, so I wonder if it's
better to just revert the CSME series?

Kai-Heng

> >>>                /* Poll up to 2.5 seconds for ME to unconfigure DPG.
> >>>                 * If this takes more than 1 second, show a warning indicating a
> >>>                 * firmware bug
> >>>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Intel-wired-lan] [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
@ 2021-11-02  3:27         ` Kai-Heng Feng
  0 siblings, 0 replies; 16+ messages in thread
From: Kai-Heng Feng @ 2021-11-02  3:27 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, Oct 29, 2021 at 5:14 PM Sasha Neftin <sasha.neftin@intel.com> wrote:
>
> On 10/27/2021 01:50, Kai-Heng Feng wrote:
> > On Tue, Oct 26, 2021 at 4:48 PM Sasha Neftin <sasha.neftin@intel.com> wrote:
> >>
> >> On 10/26/2021 09:51, Kai-Heng Feng wrote:
> >>> On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
> >>> polling logic doesn't wait until ME really unconfigures s0ix.
> >>>
> >>> So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
> >>> flagged, wait for 1 second to let ME unconfigure s0ix.
> >>>
> >>> Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
> >>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
> >>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> >>> ---
> >>> v2:
> >>>    Add missing "Fixes:" tag
> >>>
> >>>    drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
> >>>    1 file changed, 7 insertions(+)
> >>>
> >>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> >>> index 44e2dc8328a22..cd81ba00a6bc9 100644
> >>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> >>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> >>> @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
> >>>        u32 mac_data;
> >>>        u16 phy_data;
> >>>        u32 i = 0;
> >>> +     bool dpg_exit_done;
> >>>
> >>>        if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
> >>> +             dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
> >>>                /* Request ME unconfigure the device from S0ix */
> >>>                mac_data = er32(H2ME);
> >>>                mac_data &= ~E1000_H2ME_START_DPG;
> >>>                mac_data |= E1000_H2ME_EXIT_DPG;
> >>>                ew32(H2ME, mac_data);
> >>>
> >>> +             if (dpg_exit_done) {
> >>> +                     e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");
> >>> +                     msleep(1000);
> >>> +             }
> >> Thanks for working on the enablement.
> >> The delay approach is fragile. We need to work with CSME folks to
> >> understand why _DPG_EXIT_DONE indication is wrong on some ADL platforms.
> >> Could you provide CSME/BIOS version? dmidecode -t 0 and cat
> >> /sys/class/mei/mei0/fw_ver
> >
> > $ sudo dmidecode -t 0
> > # dmidecode 3.2
> > Getting SMBIOS data from sysfs.
> > SMBIOS 3.4 present.
> > # SMBIOS implementations newer than version 3.2.0 are not
> > # fully supported by this version of dmidecode.
> >
> > Handle 0x0001, DMI type 0, 26 bytes
> > BIOS Information
> >          Vendor: Dell Inc.
> >          Version: 0.12.68
> >          Release Date: 10/01/2021
> >          ROM Size: 48 MB
> >          Characteristics:
> >                  PCI is supported
> >                  PNP is supported
> >                  BIOS is upgradeable
> >                  BIOS shadowing is allowed
> >                  Boot from CD is supported
> >                  Selectable boot is supported
> >                  EDD is supported
> >                  Print screen service is supported (int 5h)
> >                  8042 keyboard services are supported (int 9h)
> >                  Serial services are supported (int 14h)
> >                  Printer services are supported (int 17h)
> >                  ACPI is supported
> >                  USB legacy is supported
> >                  BIOS boot specification is supported
> >                  Function key-initiated network boot is supported
> >                  Targeted content distribution is supported
> >                  UEFI is supported
> >          BIOS Revision: 0.12
> >
> > $ cat /sys/class/mei/mei0/fw_ver
> > 0:16.0.15.1518
> > 0:16.0.15.1518
> > 0:16.0.15.1518
> >
> Thank you Kai-Heng. The _DPG_EXIT_DONE bit indication comes from the
> EXFWSM register controlled by the CSME. We have only read access.  I
> realized that this indication was set to 1 even before our request to
> unconfigure the s0ix settings from the CSME. I am wondering. Does after
> a ~ 1s delay (or less, or more) _DPG_EXIT_DONE indication eventually
> change by CSME to 0? (is it consistently)

Never. It's consistently being 1.

Right now we are seeing the same issue on TGL, so I wonder if it's
better to just revert the CSME series?

Kai-Heng

> >>>                /* Poll up to 2.5 seconds for ME to unconfigure DPG.
> >>>                 * If this takes more than 1 second, show a warning indicating a
> >>>                 * firmware bug
> >>>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
  2021-11-02  3:27         ` [Intel-wired-lan] " Kai-Heng Feng
@ 2021-11-02  6:24           ` Sasha Neftin
  -1 siblings, 0 replies; 16+ messages in thread
From: Sasha Neftin @ 2021-11-02  6:24 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Jesse Brandeburg, Nguyen, Anthony L, AceLan Kao, David S. Miller,
	Jakub Kicinski, Dima Ruinskiy,
	moderated list:INTEL ETHERNET DRIVERS, Linux Netdev List, LKML,
	Kraus, NechamaX, Fuxbrumer, Devora, Avivi, Amir, Neftin, Sasha

On 11/2/2021 05:27, Kai-Heng Feng wrote:
> On Fri, Oct 29, 2021 at 5:14 PM Sasha Neftin <sasha.neftin@intel.com> wrote:
>>
>> On 10/27/2021 01:50, Kai-Heng Feng wrote:
>>> On Tue, Oct 26, 2021 at 4:48 PM Sasha Neftin <sasha.neftin@intel.com> wrote:
>>>>
>>>> On 10/26/2021 09:51, Kai-Heng Feng wrote:
>>>>> On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
>>>>> polling logic doesn't wait until ME really unconfigures s0ix.
>>>>>
>>>>> So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
>>>>> flagged, wait for 1 second to let ME unconfigure s0ix.
>>>>>
>>>>> Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
>>>>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
>>>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>>>> ---
>>>>> v2:
>>>>>     Add missing "Fixes:" tag
>>>>>
>>>>>     drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
>>>>>     1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>> index 44e2dc8328a22..cd81ba00a6bc9 100644
>>>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>> @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
>>>>>         u32 mac_data;
>>>>>         u16 phy_data;
>>>>>         u32 i = 0;
>>>>> +     bool dpg_exit_done;
>>>>>
>>>>>         if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
>>>>> +             dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
>>>>>                 /* Request ME unconfigure the device from S0ix */
>>>>>                 mac_data = er32(H2ME);
>>>>>                 mac_data &= ~E1000_H2ME_START_DPG;
>>>>>                 mac_data |= E1000_H2ME_EXIT_DPG;
>>>>>                 ew32(H2ME, mac_data);
>>>>>
>>>>> +             if (dpg_exit_done) {
>>>>> +                     e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");
>>>>> +                     msleep(1000);
>>>>> +             }
>>>> Thanks for working on the enablement.
>>>> The delay approach is fragile. We need to work with CSME folks to
>>>> understand why _DPG_EXIT_DONE indication is wrong on some ADL platforms.
>>>> Could you provide CSME/BIOS version? dmidecode -t 0 and cat
>>>> /sys/class/mei/mei0/fw_ver
>>>
>>> $ sudo dmidecode -t 0
>>> # dmidecode 3.2
>>> Getting SMBIOS data from sysfs.
>>> SMBIOS 3.4 present.
>>> # SMBIOS implementations newer than version 3.2.0 are not
>>> # fully supported by this version of dmidecode.
>>>
>>> Handle 0x0001, DMI type 0, 26 bytes
>>> BIOS Information
>>>           Vendor: Dell Inc.
>>>           Version: 0.12.68
>>>           Release Date: 10/01/2021
>>>           ROM Size: 48 MB
>>>           Characteristics:
>>>                   PCI is supported
>>>                   PNP is supported
>>>                   BIOS is upgradeable
>>>                   BIOS shadowing is allowed
>>>                   Boot from CD is supported
>>>                   Selectable boot is supported
>>>                   EDD is supported
>>>                   Print screen service is supported (int 5h)
>>>                   8042 keyboard services are supported (int 9h)
>>>                   Serial services are supported (int 14h)
>>>                   Printer services are supported (int 17h)
>>>                   ACPI is supported
>>>                   USB legacy is supported
>>>                   BIOS boot specification is supported
>>>                   Function key-initiated network boot is supported
>>>                   Targeted content distribution is supported
>>>                   UEFI is supported
>>>           BIOS Revision: 0.12
>>>
>>> $ cat /sys/class/mei/mei0/fw_ver
>>> 0:16.0.15.1518
>>> 0:16.0.15.1518
>>> 0:16.0.15.1518
>>>
>> Thank you Kai-Heng. The _DPG_EXIT_DONE bit indication comes from the
>> EXFWSM register controlled by the CSME. We have only read access.  I
>> realized that this indication was set to 1 even before our request to
>> unconfigure the s0ix settings from the CSME. I am wondering. Does after
>> a ~ 1s delay (or less, or more) _DPG_EXIT_DONE indication eventually
>> change by CSME to 0? (is it consistently)
> 
> Never. It's consistently being 1.
no. On my TGL platform is cleared by CSME:
[Sun Oct 31 08:54:40 2021] s0ix exit: EXFWSM register: 0x00000000
[Sun Oct 31 08:54:40 2021] s0ix exit (right after sent H2ME): EXFWSM 
register: 0x00000000
[Sun Oct 31 08:54:40 2021] s0ix exit(after polling): EXFWSM register: 
0x00000001
[Sun Oct 31 08:54:40 2021] e1000e 0000:00:1f.6 enp0s31f6: DPG_EXIT_DONE 
cleared after 130 msec
> 
> Right now we are seeing the same issue on TGL, so I wonder if it's
> better to just revert the CSME series?
no. We need to investigate it and understand what is CSME state we hit. 
Meanwhile few options:
1. use privilege flags to disable s0ix flow for problematic system 
(power will increased)
ethtool --set-priv-flags enp0s31f6 s0ix-enabled off
ethtool --show-priv-flags enp0s31f6
Private flags for enp0s31f6:
s0ix-enabled: off
2. delay as you suggested - less preferable I though
3. I would like to suggest (need to check it) in case the DPG_EXIT_DONE 
is 1 (and polling will be exit immediately) - let's perform enforce 
settings to the CSME, before write request to CSME unconfigure the 
device from s0ix :

if (er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE)
	mac_data |= E1000_H2ME_ENFORCE_SETTINGS;

I will update Bugzilla: 
https://bugzilla.kernel.org/show_bug.cgi?id=214821 with this information.

I also will need some another information regards SMB state in this case.
> 
> Kai-Heng
> 
>>>>>                 /* Poll up to 2.5 seconds for ME to unconfigure DPG.
>>>>>                  * If this takes more than 1 second, show a warning indicating a
>>>>>                  * firmware bug
>>>>>
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Intel-wired-lan] [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
@ 2021-11-02  6:24           ` Sasha Neftin
  0 siblings, 0 replies; 16+ messages in thread
From: Sasha Neftin @ 2021-11-02  6:24 UTC (permalink / raw)
  To: intel-wired-lan

On 11/2/2021 05:27, Kai-Heng Feng wrote:
> On Fri, Oct 29, 2021 at 5:14 PM Sasha Neftin <sasha.neftin@intel.com> wrote:
>>
>> On 10/27/2021 01:50, Kai-Heng Feng wrote:
>>> On Tue, Oct 26, 2021 at 4:48 PM Sasha Neftin <sasha.neftin@intel.com> wrote:
>>>>
>>>> On 10/26/2021 09:51, Kai-Heng Feng wrote:
>>>>> On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
>>>>> polling logic doesn't wait until ME really unconfigures s0ix.
>>>>>
>>>>> So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
>>>>> flagged, wait for 1 second to let ME unconfigure s0ix.
>>>>>
>>>>> Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
>>>>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
>>>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>>>> ---
>>>>> v2:
>>>>>     Add missing "Fixes:" tag
>>>>>
>>>>>     drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
>>>>>     1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>> index 44e2dc8328a22..cd81ba00a6bc9 100644
>>>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>> @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
>>>>>         u32 mac_data;
>>>>>         u16 phy_data;
>>>>>         u32 i = 0;
>>>>> +     bool dpg_exit_done;
>>>>>
>>>>>         if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
>>>>> +             dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
>>>>>                 /* Request ME unconfigure the device from S0ix */
>>>>>                 mac_data = er32(H2ME);
>>>>>                 mac_data &= ~E1000_H2ME_START_DPG;
>>>>>                 mac_data |= E1000_H2ME_EXIT_DPG;
>>>>>                 ew32(H2ME, mac_data);
>>>>>
>>>>> +             if (dpg_exit_done) {
>>>>> +                     e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");
>>>>> +                     msleep(1000);
>>>>> +             }
>>>> Thanks for working on the enablement.
>>>> The delay approach is fragile. We need to work with CSME folks to
>>>> understand why _DPG_EXIT_DONE indication is wrong on some ADL platforms.
>>>> Could you provide CSME/BIOS version? dmidecode -t 0 and cat
>>>> /sys/class/mei/mei0/fw_ver
>>>
>>> $ sudo dmidecode -t 0
>>> # dmidecode 3.2
>>> Getting SMBIOS data from sysfs.
>>> SMBIOS 3.4 present.
>>> # SMBIOS implementations newer than version 3.2.0 are not
>>> # fully supported by this version of dmidecode.
>>>
>>> Handle 0x0001, DMI type 0, 26 bytes
>>> BIOS Information
>>>           Vendor: Dell Inc.
>>>           Version: 0.12.68
>>>           Release Date: 10/01/2021
>>>           ROM Size: 48 MB
>>>           Characteristics:
>>>                   PCI is supported
>>>                   PNP is supported
>>>                   BIOS is upgradeable
>>>                   BIOS shadowing is allowed
>>>                   Boot from CD is supported
>>>                   Selectable boot is supported
>>>                   EDD is supported
>>>                   Print screen service is supported (int 5h)
>>>                   8042 keyboard services are supported (int 9h)
>>>                   Serial services are supported (int 14h)
>>>                   Printer services are supported (int 17h)
>>>                   ACPI is supported
>>>                   USB legacy is supported
>>>                   BIOS boot specification is supported
>>>                   Function key-initiated network boot is supported
>>>                   Targeted content distribution is supported
>>>                   UEFI is supported
>>>           BIOS Revision: 0.12
>>>
>>> $ cat /sys/class/mei/mei0/fw_ver
>>> 0:16.0.15.1518
>>> 0:16.0.15.1518
>>> 0:16.0.15.1518
>>>
>> Thank you Kai-Heng. The _DPG_EXIT_DONE bit indication comes from the
>> EXFWSM register controlled by the CSME. We have only read access.  I
>> realized that this indication was set to 1 even before our request to
>> unconfigure the s0ix settings from the CSME. I am wondering. Does after
>> a ~ 1s delay (or less, or more) _DPG_EXIT_DONE indication eventually
>> change by CSME to 0? (is it consistently)
> 
> Never. It's consistently being 1.
no. On my TGL platform is cleared by CSME:
[Sun Oct 31 08:54:40 2021] s0ix exit: EXFWSM register: 0x00000000
[Sun Oct 31 08:54:40 2021] s0ix exit (right after sent H2ME): EXFWSM 
register: 0x00000000
[Sun Oct 31 08:54:40 2021] s0ix exit(after polling): EXFWSM register: 
0x00000001
[Sun Oct 31 08:54:40 2021] e1000e 0000:00:1f.6 enp0s31f6: DPG_EXIT_DONE 
cleared after 130 msec
> 
> Right now we are seeing the same issue on TGL, so I wonder if it's
> better to just revert the CSME series?
no. We need to investigate it and understand what is CSME state we hit. 
Meanwhile few options:
1. use privilege flags to disable s0ix flow for problematic system 
(power will increased)
ethtool --set-priv-flags enp0s31f6 s0ix-enabled off
ethtool --show-priv-flags enp0s31f6
Private flags for enp0s31f6:
s0ix-enabled: off
2. delay as you suggested - less preferable I though
3. I would like to suggest (need to check it) in case the DPG_EXIT_DONE 
is 1 (and polling will be exit immediately) - let's perform enforce 
settings to the CSME, before write request to CSME unconfigure the 
device from s0ix :

if (er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE)
	mac_data |= E1000_H2ME_ENFORCE_SETTINGS;

I will update Bugzilla: 
https://bugzilla.kernel.org/show_bug.cgi?id=214821 with this information.

I also will need some another information regards SMB state in this case.
> 
> Kai-Heng
> 
>>>>>                 /* Poll up to 2.5 seconds for ME to unconfigure DPG.
>>>>>                  * If this takes more than 1 second, show a warning indicating a
>>>>>                  * firmware bug
>>>>>
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
  2021-11-02  6:24           ` [Intel-wired-lan] " Sasha Neftin
@ 2021-11-02  6:29             ` Sasha Neftin
  -1 siblings, 0 replies; 16+ messages in thread
From: Sasha Neftin @ 2021-11-02  6:29 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Jesse Brandeburg, Nguyen, Anthony L, AceLan Kao, David S. Miller,
	Jakub Kicinski, Dima Ruinskiy,
	moderated list:INTEL ETHERNET DRIVERS, Linux Netdev List, LKML,
	Kraus, NechamaX, Fuxbrumer, Devora, Avivi, Amir, Neftin, Sasha

On 11/2/2021 08:24, Sasha Neftin wrote:
> On 11/2/2021 05:27, Kai-Heng Feng wrote:
>> On Fri, Oct 29, 2021 at 5:14 PM Sasha Neftin <sasha.neftin@intel.com> 
>> wrote:
>>>
>>> On 10/27/2021 01:50, Kai-Heng Feng wrote:
>>>> On Tue, Oct 26, 2021 at 4:48 PM Sasha Neftin 
>>>> <sasha.neftin@intel.com> wrote:
>>>>>
>>>>> On 10/26/2021 09:51, Kai-Heng Feng wrote:
>>>>>> On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e 
>>>>>> resume
>>>>>> polling logic doesn't wait until ME really unconfigures s0ix.
>>>>>>
>>>>>> So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
>>>>>> flagged, wait for 1 second to let ME unconfigure s0ix.
>>>>>>
>>>>>> Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to 
>>>>>> support S0ix")
>>>>>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
>>>>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>>>>> ---
>>>>>> v2:
>>>>>>     Add missing "Fixes:" tag
>>>>>>
>>>>>>     drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
>>>>>>     1 file changed, 7 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
>>>>>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>> index 44e2dc8328a22..cd81ba00a6bc9 100644
>>>>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>> @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct 
>>>>>> e1000_adapter *adapter)
>>>>>>         u32 mac_data;
>>>>>>         u16 phy_data;
>>>>>>         u32 i = 0;
>>>>>> +     bool dpg_exit_done;
>>>>>>
>>>>>>         if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
>>>>>> +             dpg_exit_done = er32(EXFWSM) & 
>>>>>> E1000_EXFWSM_DPG_EXIT_DONE;
>>>>>>                 /* Request ME unconfigure the device from S0ix */
>>>>>>                 mac_data = er32(H2ME);
>>>>>>                 mac_data &= ~E1000_H2ME_START_DPG;
>>>>>>                 mac_data |= E1000_H2ME_EXIT_DPG;
>>>>>>                 ew32(H2ME, mac_data);
>>>>>>
>>>>>> +             if (dpg_exit_done) {
>>>>>> +                     e_warn("DPG_EXIT_DONE is already flagged. 
>>>>>> This is a firmware bug\n");
>>>>>> +                     msleep(1000);
>>>>>> +             }
>>>>> Thanks for working on the enablement.
>>>>> The delay approach is fragile. We need to work with CSME folks to
>>>>> understand why _DPG_EXIT_DONE indication is wrong on some ADL 
>>>>> platforms.
>>>>> Could you provide CSME/BIOS version? dmidecode -t 0 and cat
>>>>> /sys/class/mei/mei0/fw_ver
>>>>
>>>> $ sudo dmidecode -t 0
>>>> # dmidecode 3.2
>>>> Getting SMBIOS data from sysfs.
>>>> SMBIOS 3.4 present.
>>>> # SMBIOS implementations newer than version 3.2.0 are not
>>>> # fully supported by this version of dmidecode.
>>>>
>>>> Handle 0x0001, DMI type 0, 26 bytes
>>>> BIOS Information
>>>>           Vendor: Dell Inc.
>>>>           Version: 0.12.68
>>>>           Release Date: 10/01/2021
>>>>           ROM Size: 48 MB
>>>>           Characteristics:
>>>>                   PCI is supported
>>>>                   PNP is supported
>>>>                   BIOS is upgradeable
>>>>                   BIOS shadowing is allowed
>>>>                   Boot from CD is supported
>>>>                   Selectable boot is supported
>>>>                   EDD is supported
>>>>                   Print screen service is supported (int 5h)
>>>>                   8042 keyboard services are supported (int 9h)
>>>>                   Serial services are supported (int 14h)
>>>>                   Printer services are supported (int 17h)
>>>>                   ACPI is supported
>>>>                   USB legacy is supported
>>>>                   BIOS boot specification is supported
>>>>                   Function key-initiated network boot is supported
>>>>                   Targeted content distribution is supported
>>>>                   UEFI is supported
>>>>           BIOS Revision: 0.12
>>>>
>>>> $ cat /sys/class/mei/mei0/fw_ver
>>>> 0:16.0.15.1518
>>>> 0:16.0.15.1518
>>>> 0:16.0.15.1518
>>>>
>>> Thank you Kai-Heng. The _DPG_EXIT_DONE bit indication comes from the
>>> EXFWSM register controlled by the CSME. We have only read access.  I
>>> realized that this indication was set to 1 even before our request to
>>> unconfigure the s0ix settings from the CSME. I am wondering. Does after
>>> a ~ 1s delay (or less, or more) _DPG_EXIT_DONE indication eventually
>>> change by CSME to 0? (is it consistently)
>>
>> Never. It's consistently being 1.
> no. On my TGL platform is cleared by CSME:
> [Sun Oct 31 08:54:40 2021] s0ix exit: EXFWSM register: 0x00000000
> [Sun Oct 31 08:54:40 2021] s0ix exit (right after sent H2ME): EXFWSM 
> register: 0x00000000
> [Sun Oct 31 08:54:40 2021] s0ix exit(after polling): EXFWSM register: 
> 0x00000001
> [Sun Oct 31 08:54:40 2021] e1000e 0000:00:1f.6 enp0s31f6: DPG_EXIT_DONE 
> cleared after 130 msec
>>
>> Right now we are seeing the same issue on TGL, so I wonder if it's
>> better to just revert the CSME series?
> no. We need to investigate it and understand what is CSME state we hit. 
> Meanwhile few options:
> 1. use privilege flags to disable s0ix flow for problematic system 
> (power will increased)
> ethtool --set-priv-flags enp0s31f6 s0ix-enabled off
> ethtool --show-priv-flags enp0s31f6
> Private flags for enp0s31f6:
> s0ix-enabled: off
> 2. delay as you suggested - less preferable I though
> 3. I would like to suggest (need to check it) in case the DPG_EXIT_DONE 
> is 1 (and polling will be exit immediately) - let's perform enforce 
> settings to the CSME, before write request to CSME unconfigure the 
> device from s0ix :
> 
> if (er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE)
>      mac_data |= E1000_H2ME_ENFORCE_SETTINGS;
> 
and then allow to CSME finish the enforcing synchronization:
ew32(H2ME, mac_data);
usleep_range(30000, 31000);

> I will update Bugzilla: 
> https://bugzilla.kernel.org/show_bug.cgi?id=214821 with this information.
> 
> I also will need some another information regards SMB state in this case.
>>
>> Kai-Heng
>>
>>>>>>                 /* Poll up to 2.5 seconds for ME to unconfigure DPG.
>>>>>>                  * If this takes more than 1 second, show a 
>>>>>> warning indicating a
>>>>>>                  * firmware bug
>>>>>>
>>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Intel-wired-lan] [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged
@ 2021-11-02  6:29             ` Sasha Neftin
  0 siblings, 0 replies; 16+ messages in thread
From: Sasha Neftin @ 2021-11-02  6:29 UTC (permalink / raw)
  To: intel-wired-lan

On 11/2/2021 08:24, Sasha Neftin wrote:
> On 11/2/2021 05:27, Kai-Heng Feng wrote:
>> On Fri, Oct 29, 2021 at 5:14 PM Sasha Neftin <sasha.neftin@intel.com> 
>> wrote:
>>>
>>> On 10/27/2021 01:50, Kai-Heng Feng wrote:
>>>> On Tue, Oct 26, 2021 at 4:48 PM Sasha Neftin 
>>>> <sasha.neftin@intel.com> wrote:
>>>>>
>>>>> On 10/26/2021 09:51, Kai-Heng Feng wrote:
>>>>>> On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e 
>>>>>> resume
>>>>>> polling logic doesn't wait until ME really unconfigures s0ix.
>>>>>>
>>>>>> So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
>>>>>> flagged, wait for 1 second to let ME unconfigure s0ix.
>>>>>>
>>>>>> Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to 
>>>>>> support S0ix")
>>>>>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
>>>>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>>>>> ---
>>>>>> v2:
>>>>>> ??? Add missing "Fixes:" tag
>>>>>>
>>>>>> ??? drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
>>>>>> ??? 1 file changed, 7 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
>>>>>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>> index 44e2dc8328a22..cd81ba00a6bc9 100644
>>>>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>> @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct 
>>>>>> e1000_adapter *adapter)
>>>>>> ??????? u32 mac_data;
>>>>>> ??????? u16 phy_data;
>>>>>> ??????? u32 i = 0;
>>>>>> +???? bool dpg_exit_done;
>>>>>>
>>>>>> ??????? if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
>>>>>> +???????????? dpg_exit_done = er32(EXFWSM) & 
>>>>>> E1000_EXFWSM_DPG_EXIT_DONE;
>>>>>> ??????????????? /* Request ME unconfigure the device from S0ix */
>>>>>> ??????????????? mac_data = er32(H2ME);
>>>>>> ??????????????? mac_data &= ~E1000_H2ME_START_DPG;
>>>>>> ??????????????? mac_data |= E1000_H2ME_EXIT_DPG;
>>>>>> ??????????????? ew32(H2ME, mac_data);
>>>>>>
>>>>>> +???????????? if (dpg_exit_done) {
>>>>>> +???????????????????? e_warn("DPG_EXIT_DONE is already flagged. 
>>>>>> This is a firmware bug\n");
>>>>>> +???????????????????? msleep(1000);
>>>>>> +???????????? }
>>>>> Thanks for working on the enablement.
>>>>> The delay approach is fragile. We need to work with CSME folks to
>>>>> understand why _DPG_EXIT_DONE indication is wrong on some ADL 
>>>>> platforms.
>>>>> Could you provide CSME/BIOS version? dmidecode -t 0 and cat
>>>>> /sys/class/mei/mei0/fw_ver
>>>>
>>>> $ sudo dmidecode -t 0
>>>> # dmidecode 3.2
>>>> Getting SMBIOS data from sysfs.
>>>> SMBIOS 3.4 present.
>>>> # SMBIOS implementations newer than version 3.2.0 are not
>>>> # fully supported by this version of dmidecode.
>>>>
>>>> Handle 0x0001, DMI type 0, 26 bytes
>>>> BIOS Information
>>>> ????????? Vendor: Dell Inc.
>>>> ????????? Version: 0.12.68
>>>> ????????? Release Date: 10/01/2021
>>>> ????????? ROM Size: 48 MB
>>>> ????????? Characteristics:
>>>> ????????????????? PCI is supported
>>>> ????????????????? PNP is supported
>>>> ????????????????? BIOS is upgradeable
>>>> ????????????????? BIOS shadowing is allowed
>>>> ????????????????? Boot from CD is supported
>>>> ????????????????? Selectable boot is supported
>>>> ????????????????? EDD is supported
>>>> ????????????????? Print screen service is supported (int 5h)
>>>> ????????????????? 8042 keyboard services are supported (int 9h)
>>>> ????????????????? Serial services are supported (int 14h)
>>>> ????????????????? Printer services are supported (int 17h)
>>>> ????????????????? ACPI is supported
>>>> ????????????????? USB legacy is supported
>>>> ????????????????? BIOS boot specification is supported
>>>> ????????????????? Function key-initiated network boot is supported
>>>> ????????????????? Targeted content distribution is supported
>>>> ????????????????? UEFI is supported
>>>> ????????? BIOS Revision: 0.12
>>>>
>>>> $ cat /sys/class/mei/mei0/fw_ver
>>>> 0:16.0.15.1518
>>>> 0:16.0.15.1518
>>>> 0:16.0.15.1518
>>>>
>>> Thank you Kai-Heng. The _DPG_EXIT_DONE bit indication comes from the
>>> EXFWSM register controlled by the CSME. We have only read access.? I
>>> realized that this indication was set to 1 even before our request to
>>> unconfigure the s0ix settings from the CSME. I am wondering. Does after
>>> a ~ 1s delay (or less, or more) _DPG_EXIT_DONE indication eventually
>>> change by CSME to 0? (is it consistently)
>>
>> Never. It's consistently being 1.
> no. On my TGL platform is cleared by CSME:
> [Sun Oct 31 08:54:40 2021] s0ix exit: EXFWSM register: 0x00000000
> [Sun Oct 31 08:54:40 2021] s0ix exit (right after sent H2ME): EXFWSM 
> register: 0x00000000
> [Sun Oct 31 08:54:40 2021] s0ix exit(after polling): EXFWSM register: 
> 0x00000001
> [Sun Oct 31 08:54:40 2021] e1000e 0000:00:1f.6 enp0s31f6: DPG_EXIT_DONE 
> cleared after 130 msec
>>
>> Right now we are seeing the same issue on TGL, so I wonder if it's
>> better to just revert the CSME series?
> no. We need to investigate it and understand what is CSME state we hit. 
> Meanwhile few options:
> 1. use privilege flags to disable s0ix flow for problematic system 
> (power will increased)
> ethtool --set-priv-flags enp0s31f6 s0ix-enabled off
> ethtool --show-priv-flags enp0s31f6
> Private flags for enp0s31f6:
> s0ix-enabled: off
> 2. delay as you suggested - less preferable I though
> 3. I would like to suggest (need to check it) in case the DPG_EXIT_DONE 
> is 1 (and polling will be exit immediately) - let's perform enforce 
> settings to the CSME, before write request to CSME unconfigure the 
> device from s0ix :
> 
> if (er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE)
>  ????mac_data |= E1000_H2ME_ENFORCE_SETTINGS;
> 
and then allow to CSME finish the enforcing synchronization:
ew32(H2ME, mac_data);
usleep_range(30000, 31000);

> I will update Bugzilla: 
> https://bugzilla.kernel.org/show_bug.cgi?id=214821 with this information.
> 
> I also will need some another information regards SMB state in this case.
>>
>> Kai-Heng
>>
>>>>>> ??????????????? /* Poll up to 2.5 seconds for ME to unconfigure DPG.
>>>>>> ???????????????? * If this takes more than 1 second, show a 
>>>>>> warning indicating a
>>>>>> ???????????????? * firmware bug
>>>>>>
>>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-11-02  6:29 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-26  6:51 [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when DPG_EXIT_DONE is already flagged Kai-Heng Feng
2021-10-26  6:51 ` [Intel-wired-lan] " Kai-Heng Feng
2021-10-26  7:27 ` Paul Menzel
2021-10-26  7:27   ` Paul Menzel
2021-10-26  8:47 ` Sasha Neftin
2021-10-26  8:47   ` [Intel-wired-lan] " Sasha Neftin
2021-10-26 22:50   ` Kai-Heng Feng
2021-10-26 22:50     ` [Intel-wired-lan] " Kai-Heng Feng
2021-10-29  9:14     ` Sasha Neftin
2021-10-29  9:14       ` [Intel-wired-lan] " Sasha Neftin
2021-11-02  3:27       ` Kai-Heng Feng
2021-11-02  3:27         ` [Intel-wired-lan] " Kai-Heng Feng
2021-11-02  6:24         ` Sasha Neftin
2021-11-02  6:24           ` [Intel-wired-lan] " Sasha Neftin
2021-11-02  6:29           ` Sasha Neftin
2021-11-02  6:29             ` [Intel-wired-lan] " Sasha Neftin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.