* [Intel-wired-lan] e1000e driver stuck at 10Mbps after reconnection
@ 2019-01-03 21:28 Jan-Marek Glogowski
  2019-01-04 13:31 ` [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect Jan-Marek Glogowski
  2019-01-18 15:32 ` [Intel-wired-lan] e1000e driver stuck at 10Mbps after reconnection Jan-Marek Glogowski
  0 siblings, 2 replies; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-03 21:28 UTC
  To: intel-wired-lan

Hi

I'm seeing the same problem that Camille Bordignon reported in August:
https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20180806/013606.html

I'm on Ubuntu 18.04 (Kernel 4.15) and Ubuntu 12.04 + 14.04 HWE (Kernel 4.4).

My hardware is a Skylake-based Fujitsu laptop from the u757 series.

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection I219-LM (rev 21)
        Subsystem: Fujitsu Limited. Ethernet Connection I219-LM
        Flags: bus master, fast devsel, latency 0, IRQ 128
        Memory at c1200000 (32-bit, non-prefetchable) [size=128K]
        Capabilities: [c8] Power Management version 3
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [e0] PCI Advanced Features
        Kernel driver in use: e1000e
        Kernel modules: e1000e

A reconnect almost always drops the connection to 10 Mbps. From an initramfs it seems to work
correctly (almost always), as long as no traffic is sent. Once dhclient is started for the
interface, a reconnect drops the link to 10 Mbps.

All of the following output comes from Ubuntu's vanilla 4.20 kernel build plus a separately
patched e1000e module. I have not yet compiled the current master.

I added the following debug output to the module:

diff --git a/mac.c b/mac.c
index 4abd55d..7eae7ae 100644
--- a/mac.c
+++ b/mac.c
@@ -1308,6 +1308,7 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
 	u32 status;

 	status = er32(STATUS);
+	pr_info("e1000e_get_speed_and_duplex_copper::status %x\n", status);
 	if (status & E1000_STATUS_SPEED_1000)
 		*speed = SPEED_1000;
 	else if (status & E1000_STATUS_SPEED_100)
diff --git a/netdev.c b/netdev.c
index 16a73bd..3016ac1 100644
--- a/netdev.c
+++ b/netdev.c
@@ -5070,6 +5070,7 @@ static bool e1000e_has_link(struct e1000_adapter *adapter)
 		if (hw->mac.get_link_status) {
 			ret_val = hw->mac.ops.check_for_link(hw);
 			link_active = !hw->mac.get_link_status;
+			pr_info("e1000e_has_link::check_for_link %i\n", ret_val);
 		} else {
 			link_active = true;
 		}

Some "evidence" I discovered:
* Link state is always correct when I have a link before module load (on boot or with rmmod + modprobe)
* It doesn't matter if I connect to a switch or simply cross-connect two U7x7
* Booting just the initramfs and re-connecting seems to work (very often) until data is transmitted.

From the debug output I gathered the following:
* check_for_link always returns 0 (OK).
* The working status is always 0x80083, and the broken one is 0x40080003 most of the time. Once in
50 attempts I saw 0x80003 following the 0x40080003.
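To compare these values, the sketch below decodes them in user space with the STATUS bit masks from the driver's defines.h (FD 0x1, LU 0x2, SPEED_100 0x40, SPEED_1000 0x80, GIO_MASTER_ENABLE 0x80000), testing the speed bits in the same order as e1000e_get_speed_and_duplex_copper(). decode_status(), print_status() and struct status_info are hypothetical helpers for illustration only; bit 30 is left unnamed because the driver has no define for it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* STATUS bit masks, mirroring e1000e's defines.h. */
#define E1000_STATUS_FD                0x00000001u /* full duplex */
#define E1000_STATUS_LU                0x00000002u /* link up */
#define E1000_STATUS_SPEED_100         0x00000040u /* 100 Mbps */
#define E1000_STATUS_SPEED_1000        0x00000080u /* 1000 Mbps */
#define E1000_STATUS_GIO_MASTER_ENABLE 0x00080000u /* master request status */

/* Bit 30 has no define in the driver; name chosen here for illustration. */
#define STATUS_BIT30                   0x40000000u

/* Hypothetical decoded view of one STATUS value. */
struct status_info {
	int link_up;
	int full_duplex;
	unsigned int mbps; /* 10, 100 or 1000; meaningless without link */
	int bit30;
};

/* Decode STATUS in the same order as e1000e_get_speed_and_duplex_copper(). */
void decode_status(uint32_t status, struct status_info *out)
{
	assert(out != NULL);
	out->link_up = (status & E1000_STATUS_LU) != 0;
	out->full_duplex = (status & E1000_STATUS_FD) != 0;
	if (status & E1000_STATUS_SPEED_1000)
		out->mbps = 1000;
	else if (status & E1000_STATUS_SPEED_100)
		out->mbps = 100;
	else
		out->mbps = 10; /* neither speed bit set: 10 Mbps fallback */
	out->bit30 = (status & STATUS_BIT30) != 0;
}

/* Print one decoded value, e.g. print_status(0x40080003u). */
void print_status(uint32_t status)
{
	struct status_info info;

	decode_status(status, &info);
	printf("0x%08x: link %s, %u Mbps %s duplex, bit 30 %s\n",
	       (unsigned int)status, info.link_up ? "up" : "down", info.mbps,
	       info.full_duplex ? "full" : "half", info.bit30 ? "set" : "clear");
}
```

Decoded this way, 0x80083 is link up at 1000 Mbps full duplex, while both 0x40080003 and 0x80003 are link up at 10 Mbps full duplex: neither speed bit is set, so the 10 Mbps fallback branch is taken. The two broken values differ only in bit 30.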

I also have some older Skylake hardware (Fujitsu e737):

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection I219-V (rev 21)
        Subsystem: Fujitsu Limited. Ethernet Connection I219-V
        Flags: bus master, fast devsel, latency 0, IRQ 125
        Memory at a1200000 (32-bit, non-prefetchable) [size=128K]
        Capabilities: [c8] Power Management version 3
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [e0] PCI Advanced Features
        Kernel driver in use: e1000e
        Kernel modules: e1000e

on which I couldn't reproduce the problem with Ubuntu 12.04 + 14.04 HWE (Kernel 4.4).

If I remember correctly, my colleague could reproduce the problem with the u727 hardware (which
should be the same as the u757, just smaller) and the 12.04-based installation (kernel 4.4). I will
test this again tomorrow with my u757 hardware.

There was some discussion about adding a sleep somewhere, which was later dropped again.
Is a status with the 0x40000000 bit set valid? At least there is no define for that bit.

I also noticed that the "PHYAD" value reported by ethtool changes. As a rule of thumb, it is 1 with
link and 2 without. It changes to 2 instantly when the link is lost, but when the link comes back
it only changes back to 1 some time later, after auto-negotiation.

Jan-Marek

P.S. I also tried the following patch after I found the duplicate code. It didn't change anything
for me, and I don't know if the new check in e1000e_get_speed_and_duplex_copper is always correct.

From bcd0ec30383698f0da14a9f675ae7475175f979a Mon Sep 17 00:00:00 2001
From: Jan-Marek Glogowski <glogow@fbihome.de>
Date: Thu, 3 Jan 2019 22:11:25 +0100
Subject: [PATCH] e1000e: drop duplicate speed + duplex decoding code

Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
---
 80003es2lan.c |  4 ++--
 e1000.h       |  2 +-
 ethtool.c     | 21 +++++----------------
 hw.h          |  2 +-
 ich8lan.c     |  8 +++++---
 mac.c         | 11 ++++++++---
 mac.h         |  4 ++--
 7 files changed, 24 insertions(+), 28 deletions(-)

diff --git a/80003es2lan.c b/80003es2lan.c
index 257bd59..7779346 100644
--- a/80003es2lan.c
+++ b/80003es2lan.c
@@ -638,7 +638,7 @@ static s32 e1000_get_cable_length_80003es2lan(struct e1000_hw *hw)
  *  Retrieve the current speed and duplex configuration.
  **/
 static s32 e1000_get_link_up_info_80003es2lan(struct e1000_hw *hw, u16 *speed,
-					      u16 *duplex)
+					      u8 *duplex)
 {
 	s32 ret_val;

@@ -1068,7 +1068,7 @@ static s32 e1000_cfg_on_link_up_80003es2lan(struct e1000_hw *hw)
 {
 	s32 ret_val = 0;
 	u16 speed;
-	u16 duplex;
+	u8 duplex;

 	if (hw->phy.media_type == e1000_media_type_copper) {
 		ret_val = e1000e_get_speed_and_duplex_copper(hw, &speed,
diff --git a/e1000.h b/e1000.h
index c760dc7..190a4cd 100644
--- a/e1000.h
+++ b/e1000.h
@@ -200,7 +200,7 @@ struct e1000_adapter {
 	u32 rx_buffer_len;
 	u16 mng_vlan_id;
 	u16 link_speed;
-	u16 link_duplex;
+	u8 link_duplex;
 	u16 eeprom_vers;

 	/* track device up/down/testing state */
diff --git a/ethtool.c b/ethtool.c
index 02ebf20..dafdeed 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -105,7 +105,8 @@ static int e1000_get_link_ksettings(struct net_device *netdev,
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 	struct e1000_hw *hw = &adapter->hw;
-	u32 speed, supported, advertising;
+	u32 supported, advertising;
+	u16 speed;

 	if (hw->phy.media_type == e1000_media_type_copper) {
 		supported = (SUPPORTED_10baseT_Half |
@@ -148,21 +149,9 @@ static int e1000_get_link_ksettings(struct net_device *netdev,
 			cmd->base.duplex = adapter->link_duplex - 1;
 		}
 	} else if (!pm_runtime_suspended(netdev->dev.parent)) {
-		u32 status = er32(STATUS);
-
-		if (status & E1000_STATUS_LU) {
-			if (status & E1000_STATUS_SPEED_1000)
-				speed = SPEED_1000;
-			else if (status & E1000_STATUS_SPEED_100)
-				speed = SPEED_100;
-			else
-				speed = SPEED_10;
-
-			if (status & E1000_STATUS_FD)
-				cmd->base.duplex = DUPLEX_FULL;
-			else
-				cmd->base.duplex = DUPLEX_HALF;
-		}
+		if (!e1000e_get_speed_and_duplex_copper(hw, &speed,
+							&cmd->base.duplex))
+			cmd->base.speed = SPEED_UNKNOWN;
 	}

 	cmd->base.speed = speed;
diff --git a/hw.h b/hw.h
index eff75bd..d3c18ce 100644
--- a/hw.h
+++ b/hw.h
@@ -460,7 +460,7 @@ struct e1000_mac_operations {
 	void (*clear_vfta)(struct e1000_hw *);
 	s32  (*get_bus_info)(struct e1000_hw *);
 	void (*set_lan_id)(struct e1000_hw *);
-	s32  (*get_link_up_info)(struct e1000_hw *, u16 *, u16 *);
+	s32  (*get_link_up_info)(struct e1000_hw *, u16 *, u8 *);
 	s32  (*led_on)(struct e1000_hw *);
 	s32  (*led_off)(struct e1000_hw *);
 	void (*update_mc_addr_list)(struct e1000_hw *, u8 *, u32);
diff --git a/ich8lan.c b/ich8lan.c
index cdae0ef..fd59970 100644
--- a/ich8lan.c
+++ b/ich8lan.c
@@ -998,7 +998,8 @@ static s32 e1000_platform_pm_pch_lpt(struct e1000_hw *hw, bool link)
 	u16 lat_enc = 0;	/* latency encoded */

 	if (link) {
-		u16 speed, duplex, scale = 0;
+		u16 speed, scale = 0;
+		u8 duplex;
 		u16 max_snoop, max_nosnoop;
 		u16 max_ltr_enc;	/* max LTR latency encoded */
 		u64 value;
@@ -1386,7 +1387,8 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
 	 * the IPG and reduce Rx latency in the PHY.
 	 */
 	if ((hw->mac.type >= e1000_pch2lan) && link) {
-		u16 speed, duplex;
+		u16 speed;
+		u8 duplex;

 		e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex);
 		tipg_reg = er32(TIPG);
@@ -5074,7 +5076,7 @@ static s32 e1000_setup_copper_link_pch_lpt(struct e1000_hw *hw)
  *  gigabit speeds.
  **/
 static s32 e1000_get_link_up_info_ich8lan(struct e1000_hw *hw, u16 *speed,
-					  u16 *duplex)
+					  u8 *duplex)
 {
 	s32 ret_val;

diff --git a/mac.c b/mac.c
index 4abd55d..19c816c 100644
--- a/mac.c
+++ b/mac.c
@@ -1004,7 +1004,8 @@ s32 e1000e_config_fc_after_link_up(struct e1000_hw *hw)
 	s32 ret_val = 0;
 	u32 pcs_status_reg, pcs_adv_reg, pcs_lp_ability_reg, pcs_ctrl_reg;
 	u16 mii_status_reg, mii_nway_adv_reg, mii_nway_lp_ability_reg;
-	u16 speed, duplex;
+	u16 speed;
+	u8 duplex;

 	/* Check for the case where we have fiber media and auto-neg failed
 	 * so we had to force link.  In this case, we need to force the
@@ -1303,11 +1304,15 @@ s32 e1000e_config_fc_after_link_up(struct e1000_hw *hw)
  *  speed and duplex for copper connections.
  **/
 s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
-				       u16 *duplex)
+				       u8 *duplex)
 {
 	u32 status;

 	status = er32(STATUS);
+
+	if (!(status & E1000_STATUS_LU))
+		return 1;
+
 	if (status & E1000_STATUS_SPEED_1000)
 		*speed = SPEED_1000;
 	else if (status & E1000_STATUS_SPEED_100)
@@ -1337,7 +1342,7 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
  *  for fiber/serdes links.
  **/
 s32 e1000e_get_speed_and_duplex_fiber_serdes(struct e1000_hw __always_unused
-					     *hw, u16 *speed, u16 *duplex)
+					     *hw, u16 *speed, u8 *duplex)
 {
 	*speed = SPEED_1000;
 	*duplex = FULL_DUPLEX;
diff --git a/mac.h b/mac.h
index 6ab2611..c4416e8 100644
--- a/mac.h
+++ b/mac.h
@@ -17,9 +17,9 @@ s32 e1000e_get_bus_info_pcie(struct e1000_hw *hw);
 void e1000_set_lan_id_single_port(struct e1000_hw *hw);
 s32 e1000e_get_hw_semaphore(struct e1000_hw *hw);
 s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
-				       u16 *duplex);
+				       u8 *duplex);
 s32 e1000e_get_speed_and_duplex_fiber_serdes(struct e1000_hw *hw,
-					     u16 *speed, u16 *duplex);
+					     u16 *speed, u8 *duplex);
 s32 e1000e_id_led_init_generic(struct e1000_hw *hw);
 s32 e1000e_led_on_generic(struct e1000_hw *hw);
 s32 e1000e_led_off_generic(struct e1000_hw *hw);
-- 
2.17.1



* [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect
  2019-01-03 21:28 [Intel-wired-lan] e1000e driver stuck at 10Mbps after reconnection Jan-Marek Glogowski
@ 2019-01-04 13:31 ` Jan-Marek Glogowski
  2019-01-04 13:31   ` [Intel-wired-lan] [PATCH 1/3] e1000e: drop duplicate speed + duplex decoding code Jan-Marek Glogowski
                     ` (4 more replies)
  2019-01-18 15:32 ` [Intel-wired-lan] e1000e driver stuck at 10Mbps after reconnection Jan-Marek Glogowski
  1 sibling, 5 replies; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-04 13:31 UTC
  To: intel-wired-lan

This patch set is based only on a guess from the status register on a failing
reconnect. It works for me (TM), but I only have two different notebook series
(Fujitsu U7x7 and E7x6), and the problem only happens on the U7x7 with I219-LM,
not on the E7x6 with I219-V.

This patch might just give the auto-negotiation a little more time to finish,
and there may be a better way to "wait"...

The patches were developed and tested on Ubuntu's vanilla 4.20, built as an
external module. Before submission I applied them to today's Linus master
via "git am" without any conflicts, but I didn't compile them on master.
I wanted to get some feedback before putting more time into them.

Thanks for any feedback.

Jan-Marek



* [Intel-wired-lan] [PATCH 1/3] e1000e: drop duplicate speed + duplex decoding code
  2019-01-04 13:31 ` [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect Jan-Marek Glogowski
@ 2019-01-04 13:31   ` Jan-Marek Glogowski
  2019-01-04 13:31   ` [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation Jan-Marek Glogowski
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-04 13:31 UTC
  To: intel-wired-lan

This also moves the link-up status check into the common speed
and duplex extraction code. I expect the speed and duplex status
to be invalid if there is no link.

Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
---
 drivers/net/ethernet/intel/e1000e/80003es2lan.c |  4 ++--
 drivers/net/ethernet/intel/e1000e/e1000.h       |  2 +-
 drivers/net/ethernet/intel/e1000e/ethtool.c     | 20 ++++----------------
 drivers/net/ethernet/intel/e1000e/hw.h          |  2 +-
 drivers/net/ethernet/intel/e1000e/ich8lan.c     |  8 +++++---
 drivers/net/ethernet/intel/e1000e/mac.c         | 11 ++++++++---
 drivers/net/ethernet/intel/e1000e/mac.h         |  4 ++--
 7 files changed, 23 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/80003es2lan.c b/drivers/net/ethernet/intel/e1000e/80003es2lan.c
index 257bd59..7779346 100644
--- a/drivers/net/ethernet/intel/e1000e/80003es2lan.c
+++ b/drivers/net/ethernet/intel/e1000e/80003es2lan.c
@@ -638,7 +638,7 @@ static s32 e1000_get_cable_length_80003es2lan(struct e1000_hw *hw)
  *  Retrieve the current speed and duplex configuration.
  **/
 static s32 e1000_get_link_up_info_80003es2lan(struct e1000_hw *hw, u16 *speed,
-					      u16 *duplex)
+					      u8 *duplex)
 {
 	s32 ret_val;
 
@@ -1068,7 +1068,7 @@ static s32 e1000_cfg_on_link_up_80003es2lan(struct e1000_hw *hw)
 {
 	s32 ret_val = 0;
 	u16 speed;
-	u16 duplex;
+	u8 duplex;
 
 	if (hw->phy.media_type == e1000_media_type_copper) {
 		ret_val = e1000e_get_speed_and_duplex_copper(hw, &speed,
diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h
index be13227..7fd8d26 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -200,7 +200,7 @@ struct e1000_adapter {
 	u32 rx_buffer_len;
 	u16 mng_vlan_id;
 	u16 link_speed;
-	u16 link_duplex;
+	u8 link_duplex;
 	u16 eeprom_vers;
 
 	/* track device up/down/testing state */
diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
index 02ebf20..d6ad54b 100644
--- a/drivers/net/ethernet/intel/e1000e/ethtool.c
+++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
@@ -105,7 +105,8 @@ static int e1000_get_link_ksettings(struct net_device *netdev,
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 	struct e1000_hw *hw = &adapter->hw;
-	u32 speed, supported, advertising;
+	u32 supported, advertising;
+	u16 speed;
 
 	if (hw->phy.media_type == e1000_media_type_copper) {
 		supported = (SUPPORTED_10baseT_Half |
@@ -148,21 +149,8 @@ static int e1000_get_link_ksettings(struct net_device *netdev,
 			cmd->base.duplex = adapter->link_duplex - 1;
 		}
 	} else if (!pm_runtime_suspended(netdev->dev.parent)) {
-		u32 status = er32(STATUS);
-
-		if (status & E1000_STATUS_LU) {
-			if (status & E1000_STATUS_SPEED_1000)
-				speed = SPEED_1000;
-			else if (status & E1000_STATUS_SPEED_100)
-				speed = SPEED_100;
-			else
-				speed = SPEED_10;
-
-			if (status & E1000_STATUS_FD)
-				cmd->base.duplex = DUPLEX_FULL;
-			else
-				cmd->base.duplex = DUPLEX_HALF;
-		}
+		e1000e_get_speed_and_duplex_copper(hw, &speed,
+						   &cmd->base.duplex);
 	}
 
 	cmd->base.speed = speed;
diff --git a/drivers/net/ethernet/intel/e1000e/hw.h b/drivers/net/ethernet/intel/e1000e/hw.h
index eff75bd..d3c18ce 100644
--- a/drivers/net/ethernet/intel/e1000e/hw.h
+++ b/drivers/net/ethernet/intel/e1000e/hw.h
@@ -460,7 +460,7 @@ struct e1000_mac_operations {
 	void (*clear_vfta)(struct e1000_hw *);
 	s32  (*get_bus_info)(struct e1000_hw *);
 	void (*set_lan_id)(struct e1000_hw *);
-	s32  (*get_link_up_info)(struct e1000_hw *, u16 *, u16 *);
+	s32  (*get_link_up_info)(struct e1000_hw *, u16 *, u8 *);
 	s32  (*led_on)(struct e1000_hw *);
 	s32  (*led_off)(struct e1000_hw *);
 	void (*update_mc_addr_list)(struct e1000_hw *, u8 *, u32);
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index cdae0ef..fd59970 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -998,7 +998,8 @@ static s32 e1000_platform_pm_pch_lpt(struct e1000_hw *hw, bool link)
 	u16 lat_enc = 0;	/* latency encoded */
 
 	if (link) {
-		u16 speed, duplex, scale = 0;
+		u16 speed, scale = 0;
+		u8 duplex;
 		u16 max_snoop, max_nosnoop;
 		u16 max_ltr_enc;	/* max LTR latency encoded */
 		u64 value;
@@ -1386,7 +1387,8 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
 	 * the IPG and reduce Rx latency in the PHY.
 	 */
 	if ((hw->mac.type >= e1000_pch2lan) && link) {
-		u16 speed, duplex;
+		u16 speed;
+		u8 duplex;
 
 		e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex);
 		tipg_reg = er32(TIPG);
@@ -5074,7 +5076,7 @@ static s32 e1000_setup_copper_link_pch_lpt(struct e1000_hw *hw)
  *  gigabit speeds.
  **/
 static s32 e1000_get_link_up_info_ich8lan(struct e1000_hw *hw, u16 *speed,
-					  u16 *duplex)
+					  u8 *duplex)
 {
 	s32 ret_val;
 
diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
index 4abd55d..19c816c 100644
--- a/drivers/net/ethernet/intel/e1000e/mac.c
+++ b/drivers/net/ethernet/intel/e1000e/mac.c
@@ -1004,7 +1004,8 @@ s32 e1000e_config_fc_after_link_up(struct e1000_hw *hw)
 	s32 ret_val = 0;
 	u32 pcs_status_reg, pcs_adv_reg, pcs_lp_ability_reg, pcs_ctrl_reg;
 	u16 mii_status_reg, mii_nway_adv_reg, mii_nway_lp_ability_reg;
-	u16 speed, duplex;
+	u16 speed;
+	u8 duplex;
 
 	/* Check for the case where we have fiber media and auto-neg failed
 	 * so we had to force link.  In this case, we need to force the
@@ -1303,11 +1304,15 @@ s32 e1000e_config_fc_after_link_up(struct e1000_hw *hw)
  *  speed and duplex for copper connections.
  **/
 s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
-				       u16 *duplex)
+				       u8 *duplex)
 {
 	u32 status;
 
 	status = er32(STATUS);
+
+	if (!(status & E1000_STATUS_LU))
+		return 1;
+
 	if (status & E1000_STATUS_SPEED_1000)
 		*speed = SPEED_1000;
 	else if (status & E1000_STATUS_SPEED_100)
@@ -1337,7 +1342,7 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
  *  for fiber/serdes links.
  **/
 s32 e1000e_get_speed_and_duplex_fiber_serdes(struct e1000_hw __always_unused
-					     *hw, u16 *speed, u16 *duplex)
+					     *hw, u16 *speed, u8 *duplex)
 {
 	*speed = SPEED_1000;
 	*duplex = FULL_DUPLEX;
diff --git a/drivers/net/ethernet/intel/e1000e/mac.h b/drivers/net/ethernet/intel/e1000e/mac.h
index 6ab2611..c4416e8 100644
--- a/drivers/net/ethernet/intel/e1000e/mac.h
+++ b/drivers/net/ethernet/intel/e1000e/mac.h
@@ -17,9 +17,9 @@ s32 e1000e_get_bus_info_pcie(struct e1000_hw *hw);
 void e1000_set_lan_id_single_port(struct e1000_hw *hw);
 s32 e1000e_get_hw_semaphore(struct e1000_hw *hw);
 s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
-				       u16 *duplex);
+				       u8 *duplex);
 s32 e1000e_get_speed_and_duplex_fiber_serdes(struct e1000_hw *hw,
-					     u16 *speed, u16 *duplex);
+					     u16 *speed, u8 *duplex);
 s32 e1000e_id_led_init_generic(struct e1000_hw *hw);
 s32 e1000e_led_on_generic(struct e1000_hw *hw);
 s32 e1000e_led_off_generic(struct e1000_hw *hw);
-- 
2.1.4



* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-04 13:31 ` [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect Jan-Marek Glogowski
  2019-01-04 13:31   ` [Intel-wired-lan] [PATCH 1/3] e1000e: drop duplicate speed + duplex decoding code Jan-Marek Glogowski
@ 2019-01-04 13:31   ` Jan-Marek Glogowski
  2019-01-06 15:28     ` Neftin, Sasha
  2019-01-04 13:31   ` [Intel-wired-lan] [PATCH 3/3] e1000e: add some status debug output Jan-Marek Glogowski
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-04 13:31 UTC
  To: intel-wired-lan

My problem is that the hardware falls back to 10 Mbps after a
reconnect, which happens almost every time. In the broken case
the status register always has the 0x40000000 bit set.

The name of the status flag is still just a guess. Ignoring
the status while this bit is set solves my problem. But only
one of my notebook models (I219-LM, rev 21) exhibits the
problem. It doesn't happen on my other notebook with I219-V
(rev 21) hardware (or it's just much less likely).

Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
---
 drivers/net/ethernet/intel/e1000e/defines.h | 1 +
 drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 ++-
 drivers/net/ethernet/intel/e1000e/mac.c     | 2 ++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
index fd550de..3cd9f99 100644
--- a/drivers/net/ethernet/intel/e1000e/defines.h
+++ b/drivers/net/ethernet/intel/e1000e/defines.h
@@ -221,6 +221,7 @@
 #define E1000_STATUS_LAN_INIT_DONE 0x00000200   /* Lan Init Completion by NVM */
 #define E1000_STATUS_PHYRA      0x00000400      /* PHY Reset Asserted */
 #define E1000_STATUS_GIO_MASTER_ENABLE	0x00080000	/* Master Req status */
+#define E1000_STATUS_AUTONEG    0x40000000      /* in auto-negotiation */
 
 #define HALF_DUPLEX 1
 #define FULL_DUPLEX 2
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index fd59970..8588eb7 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1390,7 +1390,8 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
 		u16 speed;
 		u8 duplex;
 
-		e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex);
+		if (e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex))
+			goto out;
 		tipg_reg = er32(TIPG);
 		tipg_reg &= ~E1000_TIPG_IPGT_MASK;
 
diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
index 19c816c..ada8fbb 100644
--- a/drivers/net/ethernet/intel/e1000e/mac.c
+++ b/drivers/net/ethernet/intel/e1000e/mac.c
@@ -1310,6 +1310,8 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
 
 	status = er32(STATUS);
 
+	if (status & E1000_STATUS_AUTONEG)
+		return 1;
 	if (!(status & E1000_STATUS_LU))
 		return 1;
 
-- 
2.1.4
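
The effect of the extra check in PATCH 2/3 can be traced on the STATUS values reported earlier in the thread. In this user-space sketch, patched_speed() and first_accepted() are hypothetical helpers that mimic the two early returns; bit 30 is named only by its position, since its meaning is not established.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* STATUS bits, mirroring e1000e's defines.h. */
#define E1000_STATUS_LU         0x00000002u
#define E1000_STATUS_SPEED_100  0x00000040u
#define E1000_STATUS_SPEED_1000 0x00000080u
/* Bit tested by the patch; the name here only states its position. */
#define STATUS_BIT30            0x40000000u

/* Speed in Mbps the patched helper would report, or 0 if it returns early. */
unsigned int patched_speed(uint32_t status)
{
	if (status & STATUS_BIT30)
		return 0;
	if (!(status & E1000_STATUS_LU))
		return 0;
	if (status & E1000_STATUS_SPEED_1000)
		return 1000;
	if (status & E1000_STATUS_SPEED_100)
		return 100;
	return 10;
}

/* Index of the first sample the patched helper would accept, or n if none. */
size_t first_accepted(const uint32_t *samples, size_t n)
{
	size_t i;

	assert(samples != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (patched_speed(samples[i]) != 0)
			return i;
	return n;
}
```

With the check in place, 0x40080003 is rejected, but the 0x80003 value seen once in 50 attempts (link up, no speed bit set) still passes and is reported as 10 Mbps, so this check alone would not cover that case.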



* [Intel-wired-lan] [PATCH 3/3] e1000e: add some status debug output
  2019-01-04 13:31 ` [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect Jan-Marek Glogowski
  2019-01-04 13:31   ` [Intel-wired-lan] [PATCH 1/3] e1000e: drop duplicate speed + duplex decoding code Jan-Marek Glogowski
  2019-01-04 13:31   ` [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation Jan-Marek Glogowski
@ 2019-01-04 13:31   ` Jan-Marek Glogowski
  2019-01-06 15:54     ` Neftin, Sasha
  2019-01-04 23:39   ` [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect Jeff Kirsher
  2019-01-15 15:22   ` Jan-Marek Glogowski
  4 siblings, 1 reply; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-04 13:31 UTC
  To: intel-wired-lan

Add dynamic debug info for flow control advertising and dump
the status when extracting speed and duplex from it.

Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
---
 drivers/net/ethernet/intel/e1000e/mac.c | 10 +++++++---
 drivers/net/ethernet/intel/e1000e/phy.c |  6 +++++-
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
index ada8fbb..daa7be8 100644
--- a/drivers/net/ethernet/intel/e1000e/mac.c
+++ b/drivers/net/ethernet/intel/e1000e/mac.c
@@ -1310,10 +1310,14 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
 
 	status = er32(STATUS);
 
-	if (status & E1000_STATUS_AUTONEG)
+	if (status & E1000_STATUS_AUTONEG) {
+		e_dbg("status 0x%x => in auto-neg, no valid config\n", status);
 		return 1;
-	if (!(status & E1000_STATUS_LU))
+	}
+	if (!(status & E1000_STATUS_LU)) {
+		e_dbg("status 0x%x => no link, no valid config\n", status);
 		return 1;
+	}
 
 	if (status & E1000_STATUS_SPEED_1000)
 		*speed = SPEED_1000;
@@ -1327,7 +1331,7 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
 	else
 		*duplex = HALF_DUPLEX;
 
-	e_dbg("%u Mbps, %s Duplex\n",
+	e_dbg("status 0x%x => %u Mbps, %s Duplex\n", status,
 	      *speed == SPEED_1000 ? 1000 : *speed == SPEED_100 ? 100 : 10,
 	      *duplex == FULL_DUPLEX ? "Full" : "Half");
 
diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
index 4223301..91da35c 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.c
+++ b/drivers/net/ethernet/intel/e1000e/phy.c
@@ -1011,6 +1011,7 @@ static s32 e1000_phy_setup_autoneg(struct e1000_hw *hw)
 		 */
 		mii_autoneg_adv_reg &=
 		    ~(ADVERTISE_PAUSE_ASYM | ADVERTISE_PAUSE_CAP);
+		e_dbg("Advertise no flow control\n");
 		break;
 	case e1000_fc_rx_pause:
 		/* Rx Flow control is enabled, and Tx Flow control is
@@ -1024,6 +1025,7 @@ static s32 e1000_phy_setup_autoneg(struct e1000_hw *hw)
 		 */
 		mii_autoneg_adv_reg |=
 		    (ADVERTISE_PAUSE_ASYM | ADVERTISE_PAUSE_CAP);
+		e_dbg("Advertise no flow control\n");
 		break;
 	case e1000_fc_tx_pause:
 		/* Tx Flow control is enabled, and Rx Flow control is
@@ -1031,6 +1033,7 @@ static s32 e1000_phy_setup_autoneg(struct e1000_hw *hw)
 		 */
 		mii_autoneg_adv_reg |= ADVERTISE_PAUSE_ASYM;
 		mii_autoneg_adv_reg &= ~ADVERTISE_PAUSE_CAP;
+		e_dbg("Advertise Tx flow control\n");
 		break;
 	case e1000_fc_full:
 		/* Flow control (both Rx and Tx) is enabled by a software
@@ -1038,6 +1041,7 @@ static s32 e1000_phy_setup_autoneg(struct e1000_hw *hw)
 		 */
 		mii_autoneg_adv_reg |=
 		    (ADVERTISE_PAUSE_ASYM | ADVERTISE_PAUSE_CAP);
+		e_dbg("Advertise Tx and Rx flow control\n");
 		break;
 	default:
 		e_dbg("Flow control param set incorrectly\n");
@@ -1048,7 +1052,7 @@ static s32 e1000_phy_setup_autoneg(struct e1000_hw *hw)
 	if (ret_val)
 		return ret_val;
 
-	e_dbg("Auto-Neg Advertising %x\n", mii_autoneg_adv_reg);
+	e_dbg("Auto-Neg Advertising 0x%x\n", mii_autoneg_adv_reg);
 
 	if (phy->autoneg_mask & ADVERTISE_1000_FULL)
 		ret_val = e1e_wphy(hw, MII_CTRL1000, mii_1000t_ctrl_reg);
-- 
2.1.4
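
For reading the new debug messages, here is a user-space sketch of the bit operations performed by the switch in e1000_phy_setup_autoneg(), using the pause bits from linux/mii.h (ADVERTISE_PAUSE_CAP 0x0400, ADVERTISE_PAUSE_ASYM 0x0800). enum fc_mode and apply_fc() are local stand-ins, not driver API. The Rx-pause case sets the same two bits as the full case, so the message added for it ("Advertise no flow control") does not match what is actually advertised.

```c
#include <assert.h>
#include <stdint.h>

/* Pause bits of the MII advertisement register, as in linux/mii.h. */
#define ADVERTISE_PAUSE_CAP  0x0400 /* symmetric pause */
#define ADVERTISE_PAUSE_ASYM 0x0800 /* asymmetric pause */

/* Local stand-in for the driver's flow-control modes. */
enum fc_mode { FC_NONE, FC_RX_PAUSE, FC_TX_PAUSE, FC_FULL };

/* Apply the same set/clear operations as the driver's switch to adv. */
uint16_t apply_fc(uint16_t adv, enum fc_mode mode)
{
	switch (mode) {
	case FC_NONE:
		adv &= ~(ADVERTISE_PAUSE_ASYM | ADVERTISE_PAUSE_CAP);
		break;
	case FC_RX_PAUSE:
		/* No Rx-only encoding exists: advertise both; the driver
		 * later disables sending PAUSE frames in hardware. */
	case FC_FULL:
		adv |= ADVERTISE_PAUSE_ASYM | ADVERTISE_PAUSE_CAP;
		break;
	case FC_TX_PAUSE:
		adv |= ADVERTISE_PAUSE_ASYM;
		adv &= ~ADVERTISE_PAUSE_CAP;
		break;
	}
	return adv;
}
```

The example advertisement value 0x01e1 (10/100 half and full duplex) is arbitrary and only serves to show which bits each mode sets or clears: apply_fc(0x01e1, FC_RX_PAUSE) and apply_fc(0x01e1, FC_FULL) both give 0x0de1.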



* [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect
  2019-01-04 13:31 ` [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect Jan-Marek Glogowski
                     ` (2 preceding siblings ...)
  2019-01-04 13:31   ` [Intel-wired-lan] [PATCH 3/3] e1000e: add some status debug output Jan-Marek Glogowski
@ 2019-01-04 23:39   ` Jeff Kirsher
  2019-01-05  0:13     ` Jan-Marek Glogowski
  2019-01-15 15:22   ` Jan-Marek Glogowski
  4 siblings, 1 reply; 27+ messages in thread
From: Jeff Kirsher @ 2019-01-04 23:39 UTC
  To: intel-wired-lan

On Fri, 2019-01-04 at 14:31 +0100, Jan-Marek Glogowski wrote:
> This patch set is just based on a guess from the status register on a failing
> reconnect. It works for me (TM), but I just have two different notebook series
> (Fujitsu U7x7 and E7x6). And it just happens for the U7x7 with I219-LM, not
> the E7x6 with I219-V.
> 
> Might be this patch just adds a little bit more time for the auto-negotiation
> to finish and there is a better way to "wait"...
> 
> The patches were developed and tested on vanilla 4.20 from Ubuntu, building as
> an external module. Before submission I applied them to todays Linus master,
> via "git am" without any conflicts, but I didn't compile them on master.
> I wanted to get some featback before putting more time into them.
> 
> Thanks for any feedback.
> 

The developers I want to review and comment on the issue, as well as on
the changes you have submitted, are already into their weekend due to
their location.  So reviews and feedback will most likely take 48 hours
or more.

* [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect
  2019-01-04 23:39   ` [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect Jeff Kirsher
@ 2019-01-05  0:13     ` Jan-Marek Glogowski
  0 siblings, 0 replies; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-05  0:13 UTC
  To: intel-wired-lan

On 5 January 2019 00:39:30 CET, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:

>The developers I want to review and comment on the issue, as well as
>the changes you have submitted are into their weekend already due to
>their location.  So reviews and feedback will most likely take 48 hours
>or more.

I won't have access to the hardware until I'm back at work on Monday. From the August thread I only remember that no one could reproduce it. It might also be good to contact the original reporter.

Have a nice weekend and thanks for the reply and status update.


* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-04 13:31   ` [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation Jan-Marek Glogowski
@ 2019-01-06 15:28     ` Neftin, Sasha
  2019-01-06 19:53       ` Jan-Marek Glogowski
  0 siblings, 1 reply; 27+ messages in thread
From: Neftin, Sasha @ 2019-01-06 15:28 UTC
  To: intel-wired-lan

On 1/4/2019 15:31, Jan-Marek Glogowski wrote:
> My problem is the fallback of the hardware to 10 Mbps after a
> re-connect, which happens almost all times. In the broken case
> the status field has always the 0x40000000 bit set.
> 
> Still the naming for the status flag is just a guess. Ignoring
> the status, when this bit is set, solves my problem. But I just
> have one notebook hardware (I219-LM, rev 21), which exhibits the
> problem. It doesn't happen for my other notebook with I219-V
> (rev 21) hardware (or it's just much more unlikely).
> 
> Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
> ---
>   drivers/net/ethernet/intel/e1000e/defines.h | 1 +
>   drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 ++-
>   drivers/net/ethernet/intel/e1000e/mac.c     | 2 ++
>   3 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
> index fd550de..3cd9f99 100644
> --- a/drivers/net/ethernet/intel/e1000e/defines.h
> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
> @@ -221,6 +221,7 @@
>   #define E1000_STATUS_LAN_INIT_DONE 0x00000200   /* Lan Init Completion by NVM */
>   #define E1000_STATUS_PHYRA      0x00000400      /* PHY Reset Asserted */
>   #define E1000_STATUS_GIO_MASTER_ENABLE	0x00080000	/* Master Req status */
> +#define E1000_STATUS_AUTONEG    0x40000000      /* in auto-negotiation */
>   
There is no such indication. It should be removed.
>   #define HALF_DUPLEX 1
>   #define FULL_DUPLEX 2
> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> index fd59970..8588eb7 100644
> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> @@ -1390,7 +1390,8 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
>   		u16 speed;
>   		u8 duplex;
>   
> -		e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex);
> +		if (e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex))
> +			goto out;
>   		tipg_reg = er32(TIPG);
>   		tipg_reg &= ~E1000_TIPG_IPGT_MASK;
>   
> diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
> index 19c816c..ada8fbb 100644
> --- a/drivers/net/ethernet/intel/e1000e/mac.c
> +++ b/drivers/net/ethernet/intel/e1000e/mac.c
> @@ -1310,6 +1310,8 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
>   
>   	status = er32(STATUS);
>   
> +	if (status & E1000_STATUS_AUTONEG)
> +		return 1;
This is wrong. There is no AUTONEG indication in bit 30 of the E1000_STATUS
(0x0008) register. This code should be removed.
>   	if (!(status & E1000_STATUS_LU))
>   		return 1;
>   
> 
Hello Jan-Marek,
It's okay to use u8 for the duplex indication and u16 for the speed
indication, as in your previous patch. But using the 'autoneg status' is
wrong. I wonder how this can solve the problem. Have you encountered
this problem on other platforms with our devices? (I mean different,
not similar, HW.)
Anyway, the 0x40000000 indication is not related to auto-negotiation.
Could you repeat your experiments with ME disabled (via BIOS) and see
whether the same problem still happens?
Thanks,
Sasha


* [Intel-wired-lan] [PATCH 3/3] e1000e: add some status debug output
  2019-01-04 13:31   ` [Intel-wired-lan] [PATCH 3/3] e1000e: add some status debug output Jan-Marek Glogowski
@ 2019-01-06 15:54     ` Neftin, Sasha
  0 siblings, 0 replies; 27+ messages in thread
From: Neftin, Sasha @ 2019-01-06 15:54 UTC (permalink / raw)
  To: intel-wired-lan

On 1/4/2019 15:31, Jan-Marek Glogowski wrote:
> Add dynamic debug info for flow control advertising and dump
> the status when extracting speed and duplex from it.
> 
> Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
> ---
>   drivers/net/ethernet/intel/e1000e/mac.c | 10 +++++++---
>   drivers/net/ethernet/intel/e1000e/phy.c |  6 +++++-
>   2 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
> index ada8fbb..daa7be8 100644
> --- a/drivers/net/ethernet/intel/e1000e/mac.c
> +++ b/drivers/net/ethernet/intel/e1000e/mac.c
> @@ -1310,10 +1310,14 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
>   
>   	status = er32(STATUS);
>   
> -	if (status & E1000_STATUS_AUTONEG)
> +	if (status & E1000_STATUS_AUTONEG) {
> +		e_dbg("status 0x%x => in auto-neg, no valid config\n", status);
>   		return 1;
This is not relevant debug info.
> -	if (!(status & E1000_STATUS_LU))
> +	}
> +	if (!(status & E1000_STATUS_LU)) {
> +		e_dbg("status 0x%x => no link, no valid config\n", status);
>   		return 1;
> +	}
>   
No objection.
>   	if (status & E1000_STATUS_SPEED_1000)
>   		*speed = SPEED_1000;
> @@ -1327,7 +1331,7 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
>   	else
>   		*duplex = HALF_DUPLEX;
>   
> -	e_dbg("%u Mbps, %s Duplex\n",
> +	e_dbg("status 0x%x => %u Mbps, %s Duplex\n", status,
>   	      *speed == SPEED_1000 ? 1000 : *speed == SPEED_100 ? 100 : 10,
>   	      *duplex == FULL_DUPLEX ? "Full" : "Half");
>   
No objection.
> diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
> index 4223301..91da35c 100644
> --- a/drivers/net/ethernet/intel/e1000e/phy.c
> +++ b/drivers/net/ethernet/intel/e1000e/phy.c
> @@ -1011,6 +1011,7 @@ static s32 e1000_phy_setup_autoneg(struct e1000_hw *hw)
>   		 */
>   		mii_autoneg_adv_reg &=
>   		    ~(ADVERTISE_PAUSE_ASYM | ADVERTISE_PAUSE_CAP);
> +		e_dbg("Advertise no flow control\n");
No objection.
>   		break;
>   	case e1000_fc_rx_pause:
>   		/* Rx Flow control is enabled, and Tx Flow control is
> @@ -1024,6 +1025,7 @@ static s32 e1000_phy_setup_autoneg(struct e1000_hw *hw)
>   		 */
>   		mii_autoneg_adv_reg |=
>   		    (ADVERTISE_PAUSE_ASYM | ADVERTISE_PAUSE_CAP);
> +		e_dbg("Advertise no flow control\n");
No objection.
>   		break;
>   	case e1000_fc_tx_pause:
>   		/* Tx Flow control is enabled, and Rx Flow control is
> @@ -1031,6 +1033,7 @@ static s32 e1000_phy_setup_autoneg(struct e1000_hw *hw)
>   		 */
>   		mii_autoneg_adv_reg |= ADVERTISE_PAUSE_ASYM;
>   		mii_autoneg_adv_reg &= ~ADVERTISE_PAUSE_CAP;
> +		e_dbg("Advertise Tx flow control\n");
No objection.
>   		break;
>   	case e1000_fc_full:
>   		/* Flow control (both Rx and Tx) is enabled by a software
> @@ -1038,6 +1041,7 @@ static s32 e1000_phy_setup_autoneg(struct e1000_hw *hw)
>   		 */
>   		mii_autoneg_adv_reg |=
>   		    (ADVERTISE_PAUSE_ASYM | ADVERTISE_PAUSE_CAP);
> +		e_dbg("Advertise Tx and Rx flow control\n");
No objection.
>   		break;
>   	default:
>   		e_dbg("Flow control param set incorrectly\n");
> @@ -1048,7 +1052,7 @@ static s32 e1000_phy_setup_autoneg(struct e1000_hw *hw)
>   	if (ret_val)
>   		return ret_val;
>   
> -	e_dbg("Auto-Neg Advertising %x\n", mii_autoneg_adv_reg);
> +	e_dbg("Auto-Neg Advertising 0x%x\n", mii_autoneg_adv_reg);
>   
No objection.
>   	if (phy->autoneg_mask & ADVERTISE_1000_FULL)
>   		ret_val = e1e_wphy(hw, MII_CTRL1000, mii_1000t_ctrl_reg);
> 
Thanks,
Sasha


* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-06 15:28     ` Neftin, Sasha
@ 2019-01-06 19:53       ` Jan-Marek Glogowski
  2019-01-07  6:32         ` Neftin, Sasha
  0 siblings, 1 reply; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-06 19:53 UTC (permalink / raw)
  To: intel-wired-lan

On 6 January 2019 16:28:42 CET, "Neftin, Sasha" <sasha.neftin@intel.com> wrote:
>On 1/4/2019 15:31, Jan-Marek Glogowski wrote:
>> My problem is the fallback of the hardware to 10 Mbps after a
>> re-connect, which happens almost all times. In the broken case
>> the status field has always the 0x40000000 bit set.
>> 
>> Still the naming for the status flag is just a guess. Ignoring
>> the status, when this bit is set, solves my problem. But I just
>> have one notebook hardware (I219-LM, rev 21), which exhibits the
>> problem. It doesn't happen for my other notebook with I219-V
>> (rev 21) hardware (or it's just much more unlikely).
>> 
>> Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
>> ---
>>   drivers/net/ethernet/intel/e1000e/defines.h | 1 +
>>   drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 ++-
>>   drivers/net/ethernet/intel/e1000e/mac.c     | 2 ++
>>   3 files changed, 5 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
>> index fd550de..3cd9f99 100644
>> --- a/drivers/net/ethernet/intel/e1000e/defines.h
>> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
>> @@ -221,6 +221,7 @@
>>   #define E1000_STATUS_LAN_INIT_DONE 0x00000200   /* Lan Init Completion by NVM */
>>   #define E1000_STATUS_PHYRA      0x00000400      /* PHY Reset Asserted */
>>   #define E1000_STATUS_GIO_MASTER_ENABLE	0x00080000	/* Master Req status */
>> +#define E1000_STATUS_AUTONEG    0x40000000      /* in auto-negotiation */
>>   
>There is no such indication. Should be removed.
>>   #define HALF_DUPLEX 1
>>   #define FULL_DUPLEX 2
>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> index fd59970..8588eb7 100644
>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> @@ -1390,7 +1390,8 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
>>   		u16 speed;
>>   		u8 duplex;
>>   
>> -		e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex);
>> +		if (e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex))
>> +			goto out;
>>   		tipg_reg = er32(TIPG);
>>   		tipg_reg &= ~E1000_TIPG_IPGT_MASK;
>>   
>> diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
>> index 19c816c..ada8fbb 100644
>> --- a/drivers/net/ethernet/intel/e1000e/mac.c
>> +++ b/drivers/net/ethernet/intel/e1000e/mac.c
>> @@ -1310,6 +1310,8 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
>>   
>>   	status = er32(STATUS);
>>   
>> +	if (status & E1000_STATUS_AUTONEG)
>> +		return 1;
>This is wrong. We have no AUTONEG indication in bit 30 of E1000_STATUS 
>(0x0008) register. These code piece should be removed.
>>   	if (!(status & E1000_STATUS_LU))
>>   		return 1;
>>   
>> 
>Hello Jan-Marek,
>That's okay to use u8 size for a duplex indication and u16 size for a 
>link indication, as you refer in previous patch.
> But use the 'autoneg status' is wrong.

Just as a reminder: I have no idea what this bit actually indicates. This is just a guess I had when looking into the problem. I don't know if the device was still negotiating at this point, but this bit was set in the status register.

> I wonder how this can solve the problem. Do you 
>encountered with this problem on other platforms with our devices? (I meant different, no similar HW)

Other platforms, such as Windows? I only do Linux development, but I'll ask the Windows people and check whether this problem also happens there.

I don't see this problem with older HW (Fujitsu E7x6, also Skylake-based, but with I219-V). It happens with both of my U7x7 test notebooks. I have some older Haswell-based HW (E7x4), which I haven't tested yet. Google tells me it has "Intel 82579LM Gigabit" Ethernet.

All three of these series are in use, and we have a few hundred or even thousands of them. This problem was found while testing our next Ubuntu 18.04-based release. It only seems to happen with the "new" U-series; I'm not aware of any problems like this with the older E-series HW.
And it probably just happens more often now, for whatever reason.

>Anyway, 0x40000000 indication is not relevant to the auto-negotiation.
>May I ask do your experiments with ME disable (via BIOS) and see if
>same problem still happen.

Disabling ME shouldn't be a problem to test.

I'll continue testing all the HW tomorrow, with both our releases, and report back. And maybe there is an easier way to trigger the problem than re-plugging the cable all the time (maybe it's better to get a switch and power-cycle it...).

Please tell me if there is anything else I should look for or test.

JMG
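
One way to avoid checking each reconnect by hand is to read the negotiated speed back after every link-up and flag the 10 Mbps case. A minimal C sketch, assuming the interface exposes /sys/class/net/<iface>/speed; the helper names are hypothetical, and while the link is down the kernel may report -1 or fail the read, which the sketch treats as "unknown":

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build the sysfs speed path for an interface name, e.g.
 * "/sys/class/net/eth0/speed". Returns 0 on success, -1 if the
 * buffer is too small.
 */
static int speed_path(char *buf, size_t len, const char *ifname)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/speed", ifname);

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/*
 * Read the negotiated speed in Mbps from a sysfs-style file.
 * Returns 0 on success, -1 if the file cannot be opened, read,
 * or parsed as a number.
 */
static int read_link_speed(const char *path, long *mbps)
{
	FILE *f;
	int n;

	assert(path != NULL && mbps != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fscanf(f, "%ld", mbps);
	fclose(f);
	return n == 1 ? 0 : -1;
}

/*
 * Classify a speed reading: 1 for the 10 Mbps fallback, 0 for any
 * other positive speed, -1 for "unknown" (0 or negative, e.g. -1
 * while no link is up).
 */
static int is_10m_fallback(long mbps)
{
	if (mbps <= 0)
		return -1;
	return mbps == 10;
}
```

A test loop would call speed_path() for the interface under test (the name is whatever `ip link` shows on the notebook), then read_link_speed() and is_10m_fallback() after each reconnect.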


* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-06 19:53       ` Jan-Marek Glogowski
@ 2019-01-07  6:32         ` Neftin, Sasha
  2019-01-07  9:00           ` Jan-Marek Glogowski
  0 siblings, 1 reply; 27+ messages in thread
From: Neftin, Sasha @ 2019-01-07  6:32 UTC (permalink / raw)
  To: intel-wired-lan

On 1/6/2019 21:53, Jan-Marek Glogowski wrote:
> On 6 January 2019 16:28:42 CET, "Neftin, Sasha" <sasha.neftin@intel.com> wrote:
>> On 1/4/2019 15:31, Jan-Marek Glogowski wrote:
>>> My problem is the fallback of the hardware to 10 Mbps after a
>>> re-connect, which happens almost all times. In the broken case
>>> the status field has always the 0x40000000 bit set.
>>>
>>> Still the naming for the status flag is just a guess. Ignoring
>>> the status, when this bit is set, solves my problem. But I just
>>> have one notebook hardware (I219-LM, rev 21), which exhibits the
>>> problem. It doesn't happen for my other notebook with I219-V
>>> (rev 21) hardware (or it's just much more unlikely).
>>>
>>> Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
>>> ---
>>>    drivers/net/ethernet/intel/e1000e/defines.h | 1 +
>>>    drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 ++-
>>>    drivers/net/ethernet/intel/e1000e/mac.c     | 2 ++
>>>    3 files changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
>>> index fd550de..3cd9f99 100644
>>> --- a/drivers/net/ethernet/intel/e1000e/defines.h
>>> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
>>> @@ -221,6 +221,7 @@
>>>    #define E1000_STATUS_LAN_INIT_DONE 0x00000200   /* Lan Init Completion by NVM */
>>>    #define E1000_STATUS_PHYRA      0x00000400      /* PHY Reset Asserted */
>>>    #define E1000_STATUS_GIO_MASTER_ENABLE	0x00080000	/* Master Req status */
>>> +#define E1000_STATUS_AUTONEG    0x40000000      /* in auto-negotiation */
>>>    
>> There is no such indication. Should be removed.
>>>    #define HALF_DUPLEX 1
>>>    #define FULL_DUPLEX 2
>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>> index fd59970..8588eb7 100644
>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>> @@ -1390,7 +1390,8 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
>>>    		u16 speed;
>>>    		u8 duplex;
>>>    
>>> -		e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex);
>>> +		if (e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex))
>>> +			goto out;
>>>    		tipg_reg = er32(TIPG);
>>>    		tipg_reg &= ~E1000_TIPG_IPGT_MASK;
>>>    
>>> diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
>>> index 19c816c..ada8fbb 100644
>>> --- a/drivers/net/ethernet/intel/e1000e/mac.c
>>> +++ b/drivers/net/ethernet/intel/e1000e/mac.c
>>> @@ -1310,6 +1310,8 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
>>>    
>>>    	status = er32(STATUS);
>>>    
>>> +	if (status & E1000_STATUS_AUTONEG)
>>> +		return 1;
>> This is wrong. We have no AUTONEG indication in bit 30 of E1000_STATUS
>> (0x0008) register. These code piece should be removed.
>>>    	if (!(status & E1000_STATUS_LU))
>>>    		return 1;
>>>    
>>>
>> Hello Jan-Marek,
>> That's okay to use u8 size for a duplex indication and u16 size for a
>> link indication, as you refer in previous patch.
>> But use the 'autoneg status' is wrong.
> 
> Just as a reminder: I have no idea what this bit actually indicates. This is just a guess I had when looking into the problem. I don't know if the device was still negotiating at this point, but this bit was set in the status register.
> 
>> I wonder how this can solve the problem. Do you
>> encountered with this problem on other platforms with our devices? (I meant different, no similar HW)
> 
> Other platforms as Windows? I'm just doing Linux development, but I'll ask the Windows people and can check, if this problem also happens there.
> 
> I don't see this problem with older HW (Fujitsu E7x6, also Skylake based, but I219-V). It happens with both of my U7x7 test notebooks. I have some older Haswell based HW (E7x4), which I didn't yet test. Google tells me they have "Intel 82579LM Gigabit" ethernet.
> 
> All of these three series are in use and we have a few hundred or even thousand of them. This problem was found during the tests for our next Ubuntu 18.04 based release. This just seems to happen with the "new" U-series. I'm not aware of any problems like this with the older E-series HW.
> And it probably just happens more often now for whatever reason.
> 
>> Anyway, 0x40000000 indication is not relevant to the auto-negotiation.
>> May I ask do your experiments with ME disable (via BIOS) and see if
>> same problem still happen.
> 
> Disabling ME shouldn't be a problem to test.
> 
You mentioned that there is no problem on I219-V. The main
difference between I219-LM and I219-V is the 'Intel Standard Manageability'
feature. So I suggest disabling ME and re-checking.
> I'll continue testing all the HW tomorrow, with both our releases, and report back. And maybe there is an easier way to trigger the problem than re-plugging the cable all the time (maybe better to get a switch and power cycle that...).
> 
> Please tell me if there is anything else I should look for or test.
The next step should probably be to dump the registers and try to access
the PHY. But let's check with ME disabled as the first step.
> JMG
> 
Sasha


* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-07  6:32         ` Neftin, Sasha
@ 2019-01-07  9:00           ` Jan-Marek Glogowski
  2019-01-07 14:15             ` Jan-Marek Glogowski
  0 siblings, 1 reply; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-07  9:00 UTC (permalink / raw)
  To: intel-wired-lan



On 07.01.19 at 07:32, Neftin, Sasha wrote:
> On 1/6/2019 21:53, Jan-Marek Glogowski wrote:
>> On 6 January 2019 16:28:42 CET, "Neftin, Sasha" <sasha.neftin@intel.com> wrote:
>>> On 1/4/2019 15:31, Jan-Marek Glogowski wrote:
>>>> My problem is the fallback of the hardware to 10 Mbps after a
>>>> re-connect, which happens almost all times. In the broken case
>>>> the status field has always the 0x40000000 bit set.
>>>>
>>>> Still the naming for the status flag is just a guess. Ignoring
>>>> the status, when this bit is set, solves my problem. But I just
>>>> have one notebook hardware (I219-LM, rev 21), which exhibits the
>>>> problem. It doesn't happen for my other notebook with I219-V
>>>> (rev 21) hardware (or it's just much more unlikely).
>>>>
>>>> Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
>>>> ---
>>>>   drivers/net/ethernet/intel/e1000e/defines.h | 1 +
>>>>   drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 ++-
>>>>   drivers/net/ethernet/intel/e1000e/mac.c     | 2 ++
>>>>   3 files changed, 5 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
>>>> index fd550de..3cd9f99 100644
>>>> --- a/drivers/net/ethernet/intel/e1000e/defines.h
>>>> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
>>>> @@ -221,6 +221,7 @@
>>>>   #define E1000_STATUS_LAN_INIT_DONE 0x00000200   /* Lan Init Completion by NVM */
>>>>   #define E1000_STATUS_PHYRA      0x00000400      /* PHY Reset Asserted */
>>>>   #define E1000_STATUS_GIO_MASTER_ENABLE	0x00080000	/* Master Req status */
>>>> +#define E1000_STATUS_AUTONEG    0x40000000      /* in auto-negotiation */
>>>>   
>>> There is no such indication. Should be removed.
>>>>   #define HALF_DUPLEX 1
>>>>   #define FULL_DUPLEX 2
>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> index fd59970..8588eb7 100644
>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> @@ -1390,7 +1390,8 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
>>>>   		u16 speed;
>>>>   		u8 duplex;
>>>>   
>>>> -		e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex);
>>>> +		if (e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex))
>>>> +			goto out;
>>>>   		tipg_reg = er32(TIPG);
>>>>   		tipg_reg &= ~E1000_TIPG_IPGT_MASK;
>>>>   
>>>> diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
>>>> index 19c816c..ada8fbb 100644
>>>> --- a/drivers/net/ethernet/intel/e1000e/mac.c
>>>> +++ b/drivers/net/ethernet/intel/e1000e/mac.c
>>>> @@ -1310,6 +1310,8 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
>>>>   
>>>>   	status = er32(STATUS);
>>>>   
>>>> +	if (status & E1000_STATUS_AUTONEG)
>>>> +		return 1;
>>> This is wrong. We have no AUTONEG indication in bit 30 of E1000_STATUS
>>> (0x0008) register. These code piece should be removed.
>>>>   	if (!(status & E1000_STATUS_LU))
>>>>   		return 1;
>>>>   
>>>>
>>> Hello Jan-Marek,
>>> That's okay to use u8 size for a duplex indication and u16 size for a
>>> link indication, as you refer in previous patch.
>>> But use the 'autoneg status' is wrong.
>>
>> Just as a reminder: I have no idea what this bit actually indicates. This is just a guess I had
>> when looking into the problem. I don't know if the device was still negotiating at this point, but
>> this bit was set in the status register.
>>
>>> I wonder how this can solve the problem. Do you
>>> encountered with this problem on other platforms with our devices? (I meant different, no similar
>>> HW)
>>
>> Other platforms as Windows? I'm just doing Linux development, but I'll ask the Windows people and
>> can check, if this problem also happens there.
>>
>> I don't see this problem with older HW (Fujitsu E7x6, also Skylake based, but I219-V). It happens
>> with both of my U7x7 test notebooks. I have some older Haswell based HW (E7x4), which I didn't yet
>> test. Google tells me they have "Intel 82579LM Gigabit" ethernet.
>>
>> All of these three series are in use and we have a few hundred or even thousand of them. This
>> problem was found during the tests for our next Ubuntu 18.04 based release. This just seems to
>> happen with the "new" U-series. I'm not aware of any problems like this with the older E-series HW.
>> And it probably just happens more often now for whatever reason.
>>
>>> Anyway, 0x40000000 indication is not relevant to the auto-negotiation.
>>> May I ask do your experiments with ME disable (via BIOS) and see if
>>> same problem still happen.
>>
>> Disabling ME shouldn't be a problem to test.
>>
> You have mentioned that there is no problem on I219-V. The main difference between I219-LM and
> I219-V is 'Intel Standard Manageability' feature. So, I suggest to disable ME and re-check.
>> I'll continue testing all the HW tomorrow, with both our releases, and report back. And maybe
>> there is an easier way to trigger the problem than re-plugging the cable all the time (maybe
>> better to get a switch and power cycle that...).
>>
>> Please tell me if there is anything else I should look for or test.
> Further step more likely should be dump registers and try access to a
> PHY. But let's check ME disabled as the first step.

According to the BIOS, ME is actually disabled.
Nevertheless, I selected "UnConfigure ME", which didn't change anything in the BIOS (ME
v11.8.50.3425 FWIW). I did look for vendor BIOS updates, since you think this problem might be
ME-related. There is an update available.

JMG


* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-07  9:00           ` Jan-Marek Glogowski
@ 2019-01-07 14:15             ` Jan-Marek Glogowski
  2019-01-07 15:49               ` Neftin, Sasha
  0 siblings, 1 reply; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-07 14:15 UTC (permalink / raw)
  To: intel-wired-lan



On 07.01.19 at 10:00, Jan-Marek Glogowski wrote:
> 
> 
> On 07.01.19 at 07:32, Neftin, Sasha wrote:
>> On 1/6/2019 21:53, Jan-Marek Glogowski wrote:
>>> On 6 January 2019 16:28:42 CET, "Neftin, Sasha" <sasha.neftin@intel.com> wrote:
>>>> On 1/4/2019 15:31, Jan-Marek Glogowski wrote:
>>>>> My problem is the fallback of the hardware to 10 Mbps after a
>>>>> re-connect, which happens almost all times. In the broken case
>>>>> the status field has always the 0x40000000 bit set.
>>>>>
>>>>> Still the naming for the status flag is just a guess. Ignoring
>>>>> the status, when this bit is set, solves my problem. But I just
>>>>> have one notebook hardware (I219-LM, rev 21), which exhibits the
>>>>> problem. It doesn't happen for my other notebook with I219-V
>>>>> (rev 21) hardware (or it's just much more unlikely).
>>>>>
>>>>> Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
>>>>> ---
>>>>>   drivers/net/ethernet/intel/e1000e/defines.h | 1 +
>>>>>   drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 ++-
>>>>>   drivers/net/ethernet/intel/e1000e/mac.c     | 2 ++
>>>>>   3 files changed, 5 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
>>>>> index fd550de..3cd9f99 100644
>>>>> --- a/drivers/net/ethernet/intel/e1000e/defines.h
>>>>> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
>>>>> @@ -221,6 +221,7 @@
>>>>>   #define E1000_STATUS_LAN_INIT_DONE 0x00000200   /* Lan Init Completion by NVM */
>>>>>   #define E1000_STATUS_PHYRA      0x00000400      /* PHY Reset Asserted */
>>>>>   #define E1000_STATUS_GIO_MASTER_ENABLE	0x00080000	/* Master Req status */
>>>>> +#define E1000_STATUS_AUTONEG    0x40000000      /* in auto-negotiation */
>>>>>   
>>>> There is no such indication. Should be removed.
>>>>>   #define HALF_DUPLEX 1
>>>>>   #define FULL_DUPLEX 2
>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> index fd59970..8588eb7 100644
>>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> @@ -1390,7 +1390,8 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
>>>>>   		u16 speed;
>>>>>   		u8 duplex;
>>>>>   
>>>>> -		e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex);
>>>>> +		if (e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex))
>>>>> +			goto out;
>>>>>   		tipg_reg = er32(TIPG);
>>>>>   		tipg_reg &= ~E1000_TIPG_IPGT_MASK;
>>>>>   
>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
>>>>> index 19c816c..ada8fbb 100644
>>>>> --- a/drivers/net/ethernet/intel/e1000e/mac.c
>>>>> +++ b/drivers/net/ethernet/intel/e1000e/mac.c
>>>>> @@ -1310,6 +1310,8 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
>>>>>   
>>>>>   	status = er32(STATUS);
>>>>>   
>>>>> +	if (status & E1000_STATUS_AUTONEG)
>>>>> +		return 1;
>>>> This is wrong. We have no AUTONEG indication in bit 30 of E1000_STATUS
>>>> (0x0008) register. These code piece should be removed.
>>>>>   	if (!(status & E1000_STATUS_LU))
>>>>>   		return 1;
>>>>>   
>>>>>
>>>> Hello Jan-Marek,
>>>> That's okay to use u8 size for a duplex indication and u16 size for a
>>>> link indication, as you refer in previous patch.
>>>> But use the 'autoneg status' is wrong.
>>>
>>> Just as a reminder: I have no idea what this bit actually indicates. This is just a guess I had
>>> when looking into the problem. I don't know if the device was still negotiating at this point, but
>>> this bit was set in the status register.
>>>
>>>> I wonder how this can solve the problem. Do you
>>>> encountered with this problem on other platforms with our devices? (I meant different, no similar
>>>> HW)
>>>
>>> Other platforms as Windows? I'm just doing Linux development, but I'll ask the Windows people and
>>> can check, if this problem also happens there.
>>>
>>> I don't see this problem with older HW (Fujitsu E7x6, also Skylake based, but I219-V). It happens
>>> with both of my U7x7 test notebooks. I have some older Haswell based HW (E7x4), which I didn't yet
>>> test. Google tells me they have "Intel 82579LM Gigabit" ethernet.
>>>
>>> All of these three series are in use and we have a few hundred or even thousand of them. This
>>> problem was found during the tests for our next Ubuntu 18.04 based release. This just seems to
>>> happen with the "new" U-series. I'm not aware of any problems like this with the older E-series HW.
>>> And it probably just happens more often now for whatever reason.
>>>
>>>> Anyway, 0x40000000 indication is not relevant to the auto-negotiation.
>>>> May I ask do your experiments with ME disable (via BIOS) and see if
>>>> same problem still happen.
>>>
>>> Disabling ME shouldn't be a problem to test.
>>>
>> You have mentioned that there is no problem on I219-V. The main difference between I219-LM and
>> I219-V is 'Intel Standard Manageability' feature. So, I suggest to disable ME and re-check.
>>> I'll continue testing all the HW tomorrow, with both our releases, and report back. And maybe
>>> there is an easier way to trigger the problem than re-plugging the cable all the time (maybe
>>> better to get a switch and power cycle that...).
>>>
>>> Please tell me if there is anything else I should look for or test.
>> Further step more likely should be dump registers and try access to a
>> PHY. But let's check ME disabled as the first step.
> 
> According to the BIOS ME is actually disabled.
> Nevertheless I selected "UnConfigure ME", which didn't change anything in the BIOS (ME
> v11.8.50.3425 FWIW). I did look for vendor BIOS updates, as you think this problem might be ME
> related. There is an update available.

So I did the BIOS update - no changes regarding the network auto-negotiation behavior.

I also tried both of my E-series notebooks. The old Haswell series (E7x4) also has ME disabled and,
as suspected, the following HW:

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 04)
        Subsystem: Fujitsu Limited. Ethernet Connection I217-LM
        Flags: bus master, fast devsel, latency 0, IRQ 27
        Memory at f0500000 (32-bit, non-prefetchable) [size=128K]
        Memory at f053f000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at 3080 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [e0] PCI Advanced Features
        Kernel driver in use: e1000e
        Kernel modules: e1000e

I tried the patched module on both E-series HW and they always have the 0x40000000 bit set when
decoding the speed from the status register (always 0x40080083), either with or without the ME
available. So my patch breaks my older HW, as you probably suspected. I removed the 0x40000000 test
from the module, and they always negotiated 1000 Mbps just fine.

I've attached logs for all three notebooks with my patched module (without the 0x40000000 test) and
a debug filter for all files of the module (echo "file */e1000e-20/* +p;" >
/sys/kernel/debug/dynamic_debug/control).
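
The same filter can also be set from a program instead of with echo, e.g. as part of an automated rmmod/insmod/reconnect loop. A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug and the caller is root; dyndbg_query() is a hypothetical helper, and the query string would be the one used above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write one dynamic debug query to the given control file, e.g.
 * /sys/kernel/debug/dynamic_debug/control. A trailing newline is
 * appended. Returns 0 on success, -1 on any error.
 */
static int dyndbg_query(const char *control, const char *query)
{
	FILE *f;
	int ok;

	assert(control != NULL && query != NULL);
	if (strlen(query) == 0)
		return -1;

	f = fopen(control, "w");
	if (!f)
		return -1;

	ok = fputs(query, f) != EOF && fputc('\n', f) != EOF;
	/*
	 * stdio buffers the data, so the actual write(2), where the kernel
	 * parses the query and can reject it, may only happen in fclose().
	 */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Checking the result of fclose() matters here: a malformed query is typically rejected at write time, and with buffered stdio that error only surfaces when the stream is flushed.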

My test consisted of rmmod, sleep 1, insmod, setting the debug filter, and two reconnects.

So I'm basically back to square one.

How to proceed?

JMG
-------------- next part --------------
A non-text attachment was scrubbed...
Name: e1000-logs.tar.xz
Type: application/x-xz
Size: 12432 bytes
Desc: not available
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20190107/48a4ec34/attachment.xz>


* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-07 14:15             ` Jan-Marek Glogowski
@ 2019-01-07 15:49               ` Neftin, Sasha
  2019-01-07 16:37                 ` Jan-Marek Glogowski
  0 siblings, 1 reply; 27+ messages in thread
From: Neftin, Sasha @ 2019-01-07 15:49 UTC (permalink / raw)
  To: intel-wired-lan

On 1/7/2019 16:15, Jan-Marek Glogowski wrote:
> 
> 
> On 07.01.19 at 10:00, Jan-Marek Glogowski wrote:
>>
>>
>> On 07.01.19 at 07:32, Neftin, Sasha wrote:
>>> On 1/6/2019 21:53, Jan-Marek Glogowski wrote:
>>>> On 6 January 2019 16:28:42 CET, "Neftin, Sasha" <sasha.neftin@intel.com> wrote:
>>>>> On 1/4/2019 15:31, Jan-Marek Glogowski wrote:
>>>>>> My problem is the fallback of the hardware to 10 Mbps after a
>>>>>> re-connect, which happens almost all times. In the broken case
>>>>>> the status field has always the 0x40000000 bit set.
>>>>>>
>>>>>> Still the naming for the status flag is just a guess. Ignoring
>>>>>> the status, when this bit is set, solves my problem. But I just
>>>>>> have one notebook hardware (I219-LM, rev 21), which exhibits the
>>>>>> problem. It doesn't happen for my other notebook with I219-V
>>>>>> (rev 21) hardware (or it's just much more unlikely).
>>>>>>
>>>>>> Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
>>>>>> ---
>>>>>>  drivers/net/ethernet/intel/e1000e/defines.h | 1 +
>>>>>>  drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 ++-
>>>>>>  drivers/net/ethernet/intel/e1000e/mac.c     | 2 ++
>>>>>>  3 files changed, 5 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
>>>>>> index fd550de..3cd9f99 100644
>>>>>> --- a/drivers/net/ethernet/intel/e1000e/defines.h
>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
>>>>>> @@ -221,6 +221,7 @@
>>>>>>  #define E1000_STATUS_LAN_INIT_DONE 0x00000200   /* Lan Init Completion by NVM */
>>>>>>  #define E1000_STATUS_PHYRA      0x00000400      /* PHY Reset Asserted */
>>>>>>  #define E1000_STATUS_GIO_MASTER_ENABLE 0x00080000 /* Master Req status */
>>>>>> +#define E1000_STATUS_AUTONEG    0x40000000      /* in auto-negotiation */
>>>>>>
>>>>> There is no such indication. It should be removed.
>>>>>>  #define HALF_DUPLEX 1
>>>>>>  #define FULL_DUPLEX 2
>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>> index fd59970..8588eb7 100644
>>>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>> @@ -1390,7 +1390,8 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
>>>>>>                 u16 speed;
>>>>>>                 u8 duplex;
>>>>>>
>>>>>> -               e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex);
>>>>>> +               if (e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex))
>>>>>> +                       goto out;
>>>>>>                 tipg_reg = er32(TIPG);
>>>>>>                 tipg_reg &= ~E1000_TIPG_IPGT_MASK;
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
>>>>>> index 19c816c..ada8fbb 100644
>>>>>> --- a/drivers/net/ethernet/intel/e1000e/mac.c
>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/mac.c
>>>>>> @@ -1310,6 +1310,8 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
>>>>>>         status = er32(STATUS);
>>>>>>
>>>>>> +       if (status & E1000_STATUS_AUTONEG)
>>>>>> +               return 1;
>>>>> This is wrong. We have no AUTONEG indication in bit 30 of the E1000_STATUS
>>>>> (0x0008) register. This code should be removed.
>>>>>>         if (!(status & E1000_STATUS_LU))
>>>>>>                 return 1;
>>>>>>    
>>>>> Hello Jan-Marek,
>>>>> It's okay to use u8 for the duplex indication and u16 for the
>>>>> link indication, as you mention in the previous patch.
>>>>> But using the 'autoneg status' is wrong.
>>>>
>>>> Just as a reminder: I have no idea what this bit actually indicates. It was just a guess I made
>>>> while looking into the problem. I don't know whether the device was still negotiating at this point, but
>>>> this bit was set in the status register.
>>>>
>>>>> I wonder how this can solve the problem. Have you
>>>>> encountered this problem on other platforms with our devices? (I mean different, not similar,
>>>>> HW.)
>>>>
>>>> Other platforms as in Windows? I only do Linux development, but I'll ask the Windows people and
>>>> check whether this problem also happens there.
>>>>
>>>> I don't see this problem with older HW (Fujitsu E7x6, also Skylake based, but I219-V). It happens
>>>> with both of my U7x7 test notebooks. I have some older Haswell-based HW (E7x4), which I haven't
>>>> tested yet. Google tells me they have "Intel 82579LM Gigabit" Ethernet.
>>>>
>>>> All three of these series are in use, and we have a few hundred or even thousands of them. This
>>>> problem was found during testing for our next Ubuntu 18.04-based release. It only seems to
>>>> happen with the "new" U-series. I'm not aware of any problems like this with the older E-series HW.
>>>> And it probably just happens more often now for whatever reason.
>>>>
>>>>> Anyway, the 0x40000000 indication is not related to auto-negotiation.
>>>>> Could you repeat your experiments with ME disabled (via BIOS) and see whether
>>>>> the same problem still happens?
>>>>
>>>> Disabling ME shouldn't be a problem to test.
>>>>
>>> You mentioned that there is no problem on I219-V. The main difference between I219-LM and
>>> I219-V is the 'Intel Standard Manageability' feature. So I suggest disabling ME and re-checking.
>>>> I'll continue testing all the HW tomorrow, with both of our releases, and report back. And maybe
>>>> there is an easier way to trigger the problem than re-plugging the cable all the time (maybe
>>>> better to get a switch and power-cycle it...).
>>>>
>>>> Please tell me if there is anything else I should look for or test.
>>> The next step should most likely be to dump registers and try to access the
>>> PHY. But let's check with ME disabled as the first step.
>>
>> According to the BIOS, ME is actually disabled.
>> Nevertheless I selected "UnConfigure ME", which didn't change anything in the BIOS (ME
>> v11.8.50.3425 FWIW). I also looked for vendor BIOS updates, since you think this problem might be ME
>> related. There is an update available.
> 
> So I did the BIOS update - no changes regarding the network auto-negotiation behavior.
> 
> I also tried both of my E-series machines. The old Haswell series (E7x4) also has ME disabled and, as
> suspected, the following HW:
> 
> 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 04)
>          Subsystem: Fujitsu Limited. Ethernet Connection I217-LM
>          Flags: bus master, fast devsel, latency 0, IRQ 27
>          Memory at f0500000 (32-bit, non-prefetchable) [size=128K]
>          Memory at f053f000 (32-bit, non-prefetchable) [size=4K]
>          I/O ports at 3080 [size=32]
>          Capabilities: [c8] Power Management version 2
>          Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>          Capabilities: [e0] PCI Advanced Features
>          Kernel driver in use: e1000e
>          Kernel modules: e1000e
> 
> I tried the patched module on both E-series machines, and they always have the 0x40000000 bit set when
> decoding the speed from the status register (always 0x40080083), with or without ME
> available. So my patch breaks my older HW, as you probably suspected. I removed the 0x40000000 test
> from the module, and they always negotiated 1000 Mbps just fine.
> 
> I've attached logs for all three notebooks with my patched module (without the 0x40000000 test) and
> a debug filter for all files of the module (echo "file */e1000e-20/* +p;" >
> /sys/kernel/debug/dynamic_debug/control).
>
> My test consisted of: rmmod, sleep 1, insmod, setting the debug filter, then two reconnects.
> 
> So I'm basically back to square one.
> 
> How to proceed?
> 
ME disabled - good. How long do you wait for 1000 Mbps after reconnecting
the cable? Could you please wait 5-10 s and see whether the link comes back
to 1000 Mbps?
Unfortunately we have no such HW in our labs. I will try to ask whether our PAE
can help with more debugging if needed.
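The 5-10 s check can also be timed rather than eyeballed. A minimal sketch, assuming the standard sysfs attribute /sys/class/net/<iface>/speed (value in Mbps) and the interface name enp0s31f6 from the logs in this thread; read_speed_mbps() and wait_for_speed() are hypothetical helpers, and a failed or non-numeric read is treated as "no speed reported yet".

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Read a link speed in Mbps from a sysfs-style file such as
 * /sys/class/net/enp0s31f6/speed. Returns -1 if the file cannot be
 * opened or does not start with a number; while the link is down the
 * kernel may report -1 or fail the read, both of which end up as -1.
 */
static long read_speed_mbps(const char *path)
{
    FILE *f = fopen(path, "r");
    long speed = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &speed) != 1)
        speed = -1;
    fclose(f);
    return speed;
}

/*
 * Hypothetical helper: poll once per second for up to timeout_s seconds.
 * Returns the number of seconds waited until expected_mbps was seen, or -1
 * on timeout. The last value read is stored in *last_speed if non-NULL.
 */
int wait_for_speed(const char *path, long expected_mbps, int timeout_s,
                   long *last_speed)
{
    for (int waited = 0; waited <= timeout_s; waited++) {
        long speed = read_speed_mbps(path);

        if (last_speed != NULL)
            *last_speed = speed;
        if (speed == expected_mbps)
            return waited;
        if (waited < timeout_s)
            sleep(1);
    }
    return -1;
}
```

For example, wait_for_speed("/sys/class/net/enp0s31f6/speed", 1000, 10, &last) called right after re-plugging the cable returns the number of seconds until 1000 Mbps was reported; a return of -1 with last == 10 would correspond to the fallback described in this thread. Polling once per second keeps the sketch simple; finer timing would need a shorter interval.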
> JMG
> 
Sasha

* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-07 15:49               ` Neftin, Sasha
@ 2019-01-07 16:37                 ` Jan-Marek Glogowski
  2019-01-08  8:31                   ` Neftin, Sasha
  0 siblings, 1 reply; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-07 16:37 UTC
  To: intel-wired-lan



On 07.01.19 at 16:49, Neftin, Sasha wrote:
> ME disabled - good. How long do you wait for 1000 Mbps after reconnecting the cable? Could you
> please wait 5-10 s and see whether the link comes back to 1000 Mbps?

From the U757 logs attached to the last mail:

[11750.669940] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
[11750.670054] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
[11750.670165] e1000e 0000:00:1f.6 enp0s31f6: ARC subsystem not valid.
[11750.670166] e1000e: enp0s31f6 NIC Link is Down
[11752.925934] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
[11752.926065] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
[11752.926193] e1000e 0000:00:1f.6 enp0s31f6: ARC subsystem not valid.
...
[11754.813959] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
[11754.814034] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
[11754.814106] e1000e 0000:00:1f.6 enp0s31f6: ARC subsystem not valid.
...
[11768.142020] e1000e 0000:00:1f.6 enp0s31f6: status 0x40080003 => 10 Mbps, Full Duplex
...
[11768.151411] e1000e: enp0s31f6 NIC Link is Up 10 Mbps Full Duplex, Flow Control: None

That's roughly 16 s.

Actually, IMHO the chance of falling back to 10 Mbps is higher the longer the cable stays disconnected.
Reloading the module still remedies this condition.
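Decoding the two STATUS values reported in this thread makes the comparison concrete: bit 30 (0x40000000) is set both in the broken 10 Mbps case (0x40080003) and on the E-series machines that reach 1000 Mbps (0x40080083). The sketch below copies the relevant masks from defines.h and, to my understanding, checks the speed field in the same order as e1000e_get_speed_and_duplex_copper() (1000 first, then 100, otherwise 10). decode_status(), struct link_info and STATUS_BIT30 are hypothetical names for illustration only.

```c
#include <stdint.h>

/* Masks copied from drivers/net/ethernet/intel/e1000e/defines.h */
#define E1000_STATUS_FD         0x00000001 /* full duplex */
#define E1000_STATUS_LU         0x00000002 /* link up */
#define E1000_STATUS_SPEED_100  0x00000040
#define E1000_STATUS_SPEED_1000 0x00000080
/* Bit 30: no documented meaning per this thread; hypothetical name. */
#define STATUS_BIT30            0x40000000

/* Hypothetical result type for illustration only. */
struct link_info {
    int link_up;
    int full_duplex;
    int speed_mbps;
    int bit30;
};

/* Speed is checked 1000 first, then 100, otherwise 10 Mbps. */
struct link_info decode_status(uint32_t status)
{
    struct link_info li;

    li.link_up = (status & E1000_STATUS_LU) != 0;
    li.full_duplex = (status & E1000_STATUS_FD) != 0;
    if (status & E1000_STATUS_SPEED_1000)
        li.speed_mbps = 1000;
    else if (status & E1000_STATUS_SPEED_100)
        li.speed_mbps = 100;
    else
        li.speed_mbps = 10;
    li.bit30 = (status & STATUS_BIT30) != 0;
    return li;
}
```

Both values decode as link up, full duplex, with bit 30 set; they differ only in the 0x80 (SPEED_1000) bit, so bit 30 alone does not distinguish the broken case from the working one.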

> Unfortunately we have no such HW in our labs. I will try to ask whether our PAE can help with more
> debugging if needed.

Hmmm.

JMG

* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-07 16:37                 ` Jan-Marek Glogowski
@ 2019-01-08  8:31                   ` Neftin, Sasha
  2019-01-08  9:59                     ` Jan-Marek Glogowski
  0 siblings, 1 reply; 27+ messages in thread
From: Neftin, Sasha @ 2019-01-08  8:31 UTC
  To: intel-wired-lan

On 1/7/2019 18:37, Jan-Marek Glogowski wrote:
> 
> 
> On 07.01.19 at 16:49, Neftin, Sasha wrote:
>> ME disabled - good. How long do you wait for 1000 Mbps after reconnecting the cable? Could you
>> please wait 5-10 s and see whether the link comes back to 1000 Mbps?
> 
> From the U757 logs attached to the last mail:
> 
> [11750.669940] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
> [11750.670054] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
> [11750.670165] e1000e 0000:00:1f.6 enp0s31f6: ARC subsystem not valid.
> [11750.670166] e1000e: enp0s31f6 NIC Link is Down
> [11752.925934] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
> [11752.926065] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
> [11752.926193] e1000e 0000:00:1f.6 enp0s31f6: ARC subsystem not valid.
> ...
> [11754.813959] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
> [11754.814034] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
> [11754.814106] e1000e 0000:00:1f.6 enp0s31f6: ARC subsystem not valid.
> ...
> [11768.142020] e1000e 0000:00:1f.6 enp0s31f6: status 0x40080003 => 10 Mbps, Full Duplex
> ...
> [11768.151411] e1000e: enp0s31f6 NIC Link is Up 10 Mbps Full Duplex, Flow Control: None
> 
> That's roughly 16 s.
> 
> Actually, IMHO the chance of falling back to 10 Mbps is higher the longer the cable stays disconnected.
> Reloading the module still remedies this condition.
> 
>> Unfortunately we have no such HW in our labs. I will try to ask whether our PAE can help with more
>> debugging if needed.
> 
> Hmmm.
> 
Since you still read the 0x40000000 value in the status register, it makes
me think that ME is working. Another option, I think, is to ask your vendors
for the latest NVM (or one with ME disabled, if possible) for your HW. Since
I219-V works properly, I expected I219-LM without ME to work too.
Let's do the following experiment on your side. Please rmmod e1000e.ko and
bring up the machine without the driver. Then reconnect the cable a few
times and see at what speed the link comes up. You can rely on the LED
indicators.
> JMG
> 
Sasha

* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-08  8:31                   ` Neftin, Sasha
@ 2019-01-08  9:59                     ` Jan-Marek Glogowski
  2019-01-08 10:15                       ` Paul Menzel
  2019-01-08 10:15                       ` Jan-Marek Glogowski
  0 siblings, 2 replies; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-08  9:59 UTC
  To: intel-wired-lan



On 08.01.19 at 09:31, Neftin, Sasha wrote:
> On 1/7/2019 18:37, Jan-Marek Glogowski wrote:
>>
>>
>> On 07.01.19 at 16:49, Neftin, Sasha wrote:
>>> On 1/7/2019 16:15, Jan-Marek Glogowski wrote:
>>>>
>>>>
>>>> On 07.01.19 at 10:00, Jan-Marek Glogowski wrote:
>>>>>
>>>>>
>>>>> On 07.01.19 at 07:32, Neftin, Sasha wrote:
>>>>>> On 1/6/2019 21:53, Jan-Marek Glogowski wrote:
>>>>>>> On 6 January 2019 16:28:42 CET, "Neftin, Sasha" <sasha.neftin@intel.com> wrote:
>>>>>>>> On 1/4/2019 15:31, Jan-Marek Glogowski wrote:
>>>>>>>>> My problem is the fallback of the hardware to 10 Mbps after a
>>>>>>>>> re-connect, which happens almost every time. In the broken case
>>>>>>>>> the status register always has the 0x40000000 bit set.
>>>>>>>>>
>>>>>>>>> The naming of the status flag is still just a guess. Ignoring
>>>>>>>>> the status when this bit is set solves my problem. But I only
>>>>>>>>> have one notebook model (I219-LM, rev 21) that exhibits the
>>>>>>>>> problem. It doesn't happen on my other notebook with I219-V
>>>>>>>>> (rev 21) hardware (or it's just much less likely).
>>>>>>>>>
>>>>>>>>> Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
>>>>>>>>> ---
>>>>>>>>>  drivers/net/ethernet/intel/e1000e/defines.h | 1 +
>>>>>>>>>  drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 ++-
>>>>>>>>>  drivers/net/ethernet/intel/e1000e/mac.c     | 2 ++
>>>>>>>>>  3 files changed, 5 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
>>>>>>>>> index fd550de..3cd9f99 100644
>>>>>>>>> --- a/drivers/net/ethernet/intel/e1000e/defines.h
>>>>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
>>>>>>>>> @@ -221,6 +221,7 @@
>>>>>>>>>  #define E1000_STATUS_LAN_INIT_DONE 0x00000200   /* Lan Init Completion by NVM */
>>>>>>>>>  #define E1000_STATUS_PHYRA      0x00000400      /* PHY Reset Asserted */
>>>>>>>>>  #define E1000_STATUS_GIO_MASTER_ENABLE 0x00080000 /* Master Req status */
>>>>>>>>> +#define E1000_STATUS_AUTONEG    0x40000000      /* in auto-negotiation */
>>>>>>>>>
>>>>>>>> There is no such indication. It should be removed.
>>>>>>>>>  #define HALF_DUPLEX 1
>>>>>>>>>  #define FULL_DUPLEX 2
>>>>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>>>> index fd59970..8588eb7 100644
>>>>>>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>>>> @@ -1390,7 +1390,8 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
>>>>>>>>>                 u16 speed;
>>>>>>>>>                 u8 duplex;
>>>>>>>>>
>>>>>>>>> -               e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex);
>>>>>>>>> +               if (e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex))
>>>>>>>>> +                       goto out;
>>>>>>>>>                 tipg_reg = er32(TIPG);
>>>>>>>>>                 tipg_reg &= ~E1000_TIPG_IPGT_MASK;
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
>>>>>>>>> index 19c816c..ada8fbb 100644
>>>>>>>>> --- a/drivers/net/ethernet/intel/e1000e/mac.c
>>>>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/mac.c
>>>>>>>>> @@ -1310,6 +1310,8 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
>>>>>>>>>         status = er32(STATUS);
>>>>>>>>>
>>>>>>>>> +       if (status & E1000_STATUS_AUTONEG)
>>>>>>>>> +               return 1;
>>>>>>>> This is wrong. We have no AUTONEG indication in bit 30 of the E1000_STATUS
>>>>>>>> (0x0008) register. This code should be removed.
>>>>>>>>>         if (!(status & E1000_STATUS_LU))
>>>>>>>>>                 return 1;
>>>>>>>> Hello Jan-Marek,
>>>>>>>> It's okay to use u8 for the duplex indication and u16 for the
>>>>>>>> link indication, as you mention in the previous patch.
>>>>>>>> But using the 'autoneg status' is wrong.
>>>>>>>
>>>>>>> Just as a reminder: I have no idea what this bit actually indicates. This is just a guess I had
>>>>>>> when looking into the problem. I don't know if the device was still negotiating at this point,
>>>>>>> but this bit was set in the status register.
>>>>>>>
>>>>>>>> I wonder how this can solve the problem. Have you
>>>>>>>> encountered this problem on other platforms with our devices? (I mean different, not
>>>>>>>> similar, HW)
>>>>>>>
>>>>>>> Other platforms, as in Windows? I'm just doing Linux development, but I'll ask the Windows
>>>>>>> people and can check if this problem also happens there.
>>>>>>>
>>>>>>> I don't see this problem with older HW (Fujitsu E7x6, also Skylake based, but I219-V). It
>>>>>>> happens with both of my U7x7 test notebooks. I have some older Haswell based HW (E7x4), which
>>>>>>> I haven't tested yet. Google tells me they have "Intel 82579LM Gigabit" ethernet.
>>>>>>>
>>>>>>> All three of these series are in use, and we have a few hundred or even thousands of them. This
>>>>>>> problem was found during the tests for our next Ubuntu 18.04 based release. It just seems to
>>>>>>> happen with the "new" U-series. I'm not aware of any problems like this with the older E-series
>>>>>>> HW. And it probably just happens more often now, for whatever reason.
>>>>>>>
>>>>>>>> Anyway, the 0x40000000 indication is not related to auto-negotiation.
>>>>>>>> May I ask you to repeat your experiments with ME disabled (via BIOS) and see if
>>>>>>>> the same problem still happens?
>>>>>>>
>>>>>>> Disabling ME shouldn't be a problem to test.
>>>>>>>
>>>>>> You have mentioned that there is no problem on the I219-V. The main difference between the
>>>>>> I219-LM and the I219-V is the 'Intel Standard Manageability' feature. So, I suggest disabling
>>>>>> ME and re-checking.
>>>>>>>
>>>>>>> I'll continue testing all the HW tomorrow, with both our releases, and report back. And maybe
>>>>>>> there is an easier way to trigger the problem than re-plugging the cable all the time (maybe
>>>>>>> better to get a switch and power cycle that...).
>>>>>>>
>>>>>>> Please tell me if there is anything else I should look for or test.
>>>>>>>
>>>>>> The next step should most likely be to dump the registers and try to access the
>>>>>> PHY. But let's check with ME disabled as the first step.
>>>>>
>>>>> According to the BIOS, ME is actually disabled.
>>>>> Nevertheless I selected "UnConfigure ME", which didn't change anything in the BIOS (ME
>>>>> v11.8.50.3425 FWIW). I did look for vendor BIOS updates, as you think this problem might be ME
>>>>> related. There is an update available.
>>>>
>>>> So I did the BIOS update - no changes regarding the network auto-negotiation behavior.
>>>>
>>>> I also tried both of my E-Series. The old Haswell series (E7x4) also has a disabled ME and, as
>>>> suspected, the following HW:
>>>>
>>>> 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 04)
>>>>         Subsystem: Fujitsu Limited. Ethernet Connection I217-LM
>>>>         Flags: bus master, fast devsel, latency 0, IRQ 27
>>>>         Memory at f0500000 (32-bit, non-prefetchable) [size=128K]
>>>>         Memory at f053f000 (32-bit, non-prefetchable) [size=4K]
>>>>         I/O ports at 3080 [size=32]
>>>>         Capabilities: [c8] Power Management version 2
>>>>         Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>>         Capabilities: [e0] PCI Advanced Features
>>>>         Kernel driver in use: e1000e
>>>>         Kernel modules: e1000e
>>>>
>>>> I tried the patched module on both E-series HW and they always have the 0x40000000 bit set when
>>>> decoding the speed from the status register (always 0x40080083), either with or without the ME
>>>> available. So my patch breaks my older HW, as you probably suspected. I removed the 0x40000000 test
>>>> from the module, and they always negotiated 1000 Mbps just fine.
>>>>
>>>> I've attached logs for all three notebooks with my patched module (without the 0x40000000 test)
>>>> and a debug filter for all files of the module (echo "file */e1000e-20/* +p;" >
>>>> /sys/kernel/debug/dynamic_debug/control).
>>>>
>>>> My test consisted of rmmod'ing, sleep 1, insmod'ing, set debug filter + two reconnects.
>>>>
>>>> So I'm basically back to square one.
>>>>
>>>> How to proceed?
>>>>
>>> ME disabled - good. How long do you wait for 1000 Mbps after reconnecting the cable? Could you
>>> please wait 5-10 s and see if the link goes back to 1000 Mbps?
>>
>> From the U757 logs attached to the last mail:
>>
>> [11750.669940] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
>> [11750.670054] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
>> [11750.670165] e1000e 0000:00:1f.6 enp0s31f6: ARC subsystem not valid.
>> [11750.670166] e1000e: enp0s31f6 NIC Link is Down
>> [11752.925934] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
>> [11752.926065] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
>> [11752.926193] e1000e 0000:00:1f.6 enp0s31f6: ARC subsystem not valid.
>> ...
>> [11754.813959] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
>> [11754.814034] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
>> [11754.814106] e1000e 0000:00:1f.6 enp0s31f6: ARC subsystem not valid.
>> ...
>> [11768.142020] e1000e 0000:00:1f.6 enp0s31f6: status 0x40080003 => 10 Mbps, Full Duplex
>> ...
>> [11768.151411] e1000e: enp0s31f6 NIC Link is Up 10 Mbps Full Duplex, Flow Control: None
>>
>> That's about 16 s.
>>
>> Actually, IMHO there is a higher chance of falling back to 10 Mbit/s the longer you stay
>> disconnected. Still, reloading the module remedies this condition.
>>
>>> Unfortunately we have no such HW in our labs. I will try to ask if our PAE can help with more
>>> debugging if needed.
>>
>> Hmmm.
>>
> Since you still read the 0x40000000 value in the status register, it makes me think that ME
> is working. Alternatively, I think you should ask your vendor for the latest NVM (or one with ME
> disabled, if possible) for your HW. Since the I219-V works properly, I expected the I219-LM
> without ME to work too.

From the logs I sent:

I217-LM: "status 0x40080083 => 1000 Mbps, Full Duplex"
I219-V: "status 0x40080083 => 1000 Mbps, Full Duplex"
I219-LM auto-nego ok: "status 0x80083 => 1000 Mbps, Full Duplex"
I219-LM auto-nego broken: "status 0x40080003 => 10 Mbps, Full Duplex"

According to the BIOS, both LM variants have Intel ME disabled. If I can't trust the BIOS, is there a
way to check this?

> Let's do the following experiment on your side. Please rmmod e1000e.ko and bring up the machine
> without the driver. Then reconnect the cable a few times and see what link speed comes up. You can
> trust the LED indicators.

So even when the driver (+ ethtool) indicates 10 Mbit/s, both lights are on. Same without the driver.
But the wget speed is just 10 Mbit/s, as ethtool indicates. Manually overriding to 1000 Mbps still works.

Now I directly connected the I219-V to my I219-LM.

On the I219-V I ran "ethtool -s enp0s31f6 speed n autoneg off"
These are the results on the I219-LM side:
n=1000 : green and yellow lights on. "status 0x80083 => 1000 Mbps, Full Duplex"
n=100 : green and yellow lights on. "status 0x80042 => 100 Mbps, Half Duplex"
n=10 : just green light on. "status 0x80082 => 10 Mbps, Half Duplex"

The light indicators are the same on the I219-LM without the driver.

All status output on the I219-V has the 0x40000000 bit set; on the I219-LM it is normally never set.
The I219-V always had full duplex according to ethtool and the status.

If I set both sides to auto-neg, the first negotiation is correct (both have 1000 Mbps). If I
reconnect, it becomes 10 Mbps on the I219-LM side and 1000 Mbps on the I219-V side. Both lights are
still on. On the I219-LM side, the status bit 0x40000000 is set *only* the first time it drops to
10 Mbit/s (which is what I based my patch on, without testing it on other HW).

JMG


* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-08  9:59                     ` Jan-Marek Glogowski
@ 2019-01-08 10:15                       ` Paul Menzel
  2019-01-08 11:15                         ` Jan-Marek Glogowski
  2019-01-08 10:15                       ` Jan-Marek Glogowski
  1 sibling, 1 reply; 27+ messages in thread
From: Paul Menzel @ 2019-01-08 10:15 UTC (permalink / raw)
  To: intel-wired-lan

Dear Jan-Marek,


On 01/08/19 10:59, Jan-Marek Glogowski wrote:

> On 08.01.19 at 09:31, Neftin, Sasha wrote:
>> On 1/7/2019 18:37, Jan-Marek Glogowski wrote:

[…]

>> Since you still read the 0x40000000 value in the status register, it makes me think that ME
>> is working. Alternatively, I think you should ask your vendor for the latest NVM (or one with ME
>> disabled, if possible) for your HW. Since the I219-V works properly, I expected the I219-LM
>> without ME to work too.
> 
> From the logs I sent:
> 
> I217-LM: "status 0x40080083 => 1000 Mbps, Full Duplex"
> I219-V: "status 0x40080083 => 1000 Mbps, Full Duplex"
> I219-LM auto-nego ok: "status 0x80083 => 1000 Mbps, Full Duplex"
> I219-LM auto-nego broken: "status 0x40080003 => 10 Mbps, Full Duplex"
> 
> According to the BIOS, both LM variants have Intel ME disabled. If I can't trust the BIOS, is there a
> way to check this?

What does intelmetool from the coreboot project show?

    $ git clone https://review.coreboot.org/coreboot.git
    $ cd coreboot
    $ cd util/intelmetool
    $ make -j
    $ sudo ./intelmetool -m


Kind regards,

Paul



* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-08  9:59                     ` Jan-Marek Glogowski
  2019-01-08 10:15                       ` Paul Menzel
@ 2019-01-08 10:15                       ` Jan-Marek Glogowski
  1 sibling, 0 replies; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-08 10:15 UTC (permalink / raw)
  To: intel-wired-lan



On 08.01.19 at 10:59, Jan-Marek Glogowski wrote:
> 
> 
> On 08.01.19 at 09:31, Neftin, Sasha wrote:
>> On 1/7/2019 18:37, Jan-Marek Glogowski wrote:
>>>
>>>
>>> On 07.01.19 at 16:49, Neftin, Sasha wrote:
>>>> On 1/7/2019 16:15, Jan-Marek Glogowski wrote:
>>>>>
>>>>>
>>>>> On 07.01.19 at 10:00, Jan-Marek Glogowski wrote:
>>>>>>
>>>>>>
>>>>>> On 07.01.19 at 07:32, Neftin, Sasha wrote:
>>>>>>> On 1/6/2019 21:53, Jan-Marek Glogowski wrote:
>>>>>>>> On 6 January 2019 16:28:42 CET, "Neftin, Sasha" <sasha.neftin@intel.com> wrote:
>>>>>>>>> On 1/4/2019 15:31, Jan-Marek Glogowski wrote:
>>>>>>>>>> My problem is the fallback of the hardware to 10 Mbps after a
>>>>>>>>>> re-connect, which happens almost every time. In the broken case
>>>>>>>>>> the status field always has the 0x40000000 bit set.
>>>>>>>>>>
>>>>>>>>>> Still, the naming for the status flag is just a guess. Ignoring
>>>>>>>>>> the status when this bit is set solves my problem. But I only
>>>>>>>>>> have one notebook model (I219-LM, rev 21) that exhibits the
>>>>>>>>>> problem. It doesn't happen on my other notebook with I219-V
>>>>>>>>>> (rev 21) hardware (or it's just much more unlikely).
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
>>>>>>>>>> ---
>>>>>>>>>>  drivers/net/ethernet/intel/e1000e/defines.h | 1 +
>>>>>>>>>>  drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 ++-
>>>>>>>>>>  drivers/net/ethernet/intel/e1000e/mac.c     | 2 ++
>>>>>>>>>>  3 files changed, 5 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
>>>>>>>>>> index fd550de..3cd9f99 100644
>>>>>>>>>> --- a/drivers/net/ethernet/intel/e1000e/defines.h
>>>>>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
>>>>>>>>>> @@ -221,6 +221,7 @@
>>>>>>>>>>  #define E1000_STATUS_LAN_INIT_DONE  0x00000200  /* Lan Init Completion by NVM */
>>>>>>>>>>  #define E1000_STATUS_PHYRA  0x00000400  /* PHY Reset Asserted */
>>>>>>>>>>  #define E1000_STATUS_GIO_MASTER_ENABLE  0x00080000  /* Master Req status */
>>>>>>>>>> +#define E1000_STATUS_AUTONEG  0x40000000  /* in auto-negotiation */
>>>>>>>>>>
>>>>>>>>> There is no such indication. It should be removed.
>>>>>>>>>>  #define HALF_DUPLEX 1
>>>>>>>>>>  #define FULL_DUPLEX 2
>>>>>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>>>>> index fd59970..8588eb7 100644
>>>>>>>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>>>>> @@ -1390,7 +1390,8 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
>>>>>>>>>>          u16 speed;
>>>>>>>>>>          u8 duplex;
>>>>>>>>>> -        e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex);
>>>>>>>>>> +        if (e1000e_get_speed_and_duplex_copper(hw, &speed, &duplex))
>>>>>>>>>> +            goto out;
>>>>>>>>>>          tipg_reg = er32(TIPG);
>>>>>>>>>>          tipg_reg &= ~E1000_TIPG_IPGT_MASK;
>>>>>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
>>>>>>>>>> index 19c816c..ada8fbb 100644
>>>>>>>>>> --- a/drivers/net/ethernet/intel/e1000e/mac.c
>>>>>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/mac.c
>>>>>>>>>> @@ -1310,6 +1310,8 @@ s32 e1000e_get_speed_and_duplex_copper(struct e1000_hw *hw, u16 *speed,
>>>>>>>>>>          status = er32(STATUS);
>>>>>>>>>> +        if (status & E1000_STATUS_AUTONEG)
>>>>>>>>>> +            return 1;
>>>>>>>>> This is wrong. We have no AUTONEG indication in bit 30 of the E1000_STATUS
>>>>>>>>> (0x0008) register. This code should be removed.
>>>>>>>>>>          if (!(status & E1000_STATUS_LU))
>>>>>>>>>>              return 1;
>>>>>>>>>>
>>>>>>>>> Hello Jan-Marek,
>>>>>>>>> It's okay to use a u8 for the duplex indication and a u16 for the link speed
>>>>>>>>> indication, as you did in your previous patch.
>>>>>>>>> But using the 'autoneg status' is wrong.
>>>>>>>>
>>>>>>>> Just as a reminder: I have no idea what this bit actually indicates. This is just a guess I had
>>>>>>>> when looking into the problem. I don't know if the device was still negotiating at this point,
>>>>>>>> but this bit was set in the status register.
>>>>>>>>
>>>>>>>>> I wonder how this can solve the problem. Have you
>>>>>>>>> encountered this problem on other platforms with our devices? (I mean different, not
>>>>>>>>> similar, HW)
>>>>>>>>
>>>>>>>> Other platforms, as in Windows? I'm just doing Linux development, but I'll ask the Windows
>>>>>>>> people and can check if this problem also happens there.
>>>>>>>>
>>>>>>>> I don't see this problem with older HW (Fujitsu E7x6, also Skylake based, but I219-V). It
>>>>>>>> happens with both of my U7x7 test notebooks. I have some older Haswell based HW (E7x4), which
>>>>>>>> I haven't tested yet. Google tells me they have "Intel 82579LM Gigabit" ethernet.
>>>>>>>>
>>>>>>>> All three of these series are in use, and we have a few hundred or even thousands of them. This
>>>>>>>> problem was found during the tests for our next Ubuntu 18.04 based release. It just seems to
>>>>>>>> happen with the "new" U-series. I'm not aware of any problems like this with the older E-series
>>>>>>>> HW. And it probably just happens more often now, for whatever reason.
>>>>>>>>
>>>>>>>>> Anyway, the 0x40000000 indication is not related to auto-negotiation.
>>>>>>>>> May I ask you to repeat your experiments with ME disabled (via BIOS) and see if
>>>>>>>>> the same problem still happens?
>>>>>>>>
>>>>>>>> Disabling ME shouldn't be a problem to test.
>>>>>>>>
>>>>>>> You have mentioned that there is no problem on the I219-V. The main difference between the
>>>>>>> I219-LM and the I219-V is the 'Intel Standard Manageability' feature. So, I suggest disabling
>>>>>>> ME and re-checking.
>>>>>>>>
>>>>>>>> I'll continue testing all the HW tomorrow, with both our releases, and report back. And maybe
>>>>>>>> there is an easier way to trigger the problem than re-plugging the cable all the time (maybe
>>>>>>>> better to get a switch and power cycle that...).
>>>>>>>>
>>>>>>>> Please tell me if there is anything else I should look for or test.
>>>>>>>>
>>>>>>> The next step should most likely be to dump the registers and try to access the
>>>>>>> PHY. But let's check with ME disabled as the first step.
>>>>>>
>>>>>> According to the BIOS, ME is actually disabled.
>>>>>> Nevertheless I selected "UnConfigure ME", which didn't change anything in the BIOS (ME
>>>>>> v11.8.50.3425 FWIW). I did look for vendor BIOS updates, as you think this problem might be ME
>>>>>> related. There is an update available.
>>>>>
>>>>> So I did the BIOS update - no changes regarding the network auto-negotiation behavior.
>>>>>
>>>>> I also tried both of my E-Series. The old Haswell series (E7x4) also has a disabled ME and, as
>>>>> suspected, the following HW:
>>>>>
>>>>> 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 04)
>>>>>         Subsystem: Fujitsu Limited. Ethernet Connection I217-LM
>>>>>         Flags: bus master, fast devsel, latency 0, IRQ 27
>>>>>         Memory at f0500000 (32-bit, non-prefetchable) [size=128K]
>>>>>         Memory at f053f000 (32-bit, non-prefetchable) [size=4K]
>>>>>         I/O ports at 3080 [size=32]
>>>>>         Capabilities: [c8] Power Management version 2
>>>>>         Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>>>         Capabilities: [e0] PCI Advanced Features
>>>>>         Kernel driver in use: e1000e
>>>>>         Kernel modules: e1000e
>>>>>
>>>>> I tried the patched module on both E-series HW and they always have the 0x40000000 bit set when
>>>>> decoding the speed from the status register (always 0x40080083), either with or without the ME
>>>>> available. So my patch breaks my older HW, as you probably suspected. I removed the 0x40000000 test
>>>>> from the module, and they always negotiated 1000 Mbps just fine.
>>>>>
>>>>> I've attached logs for all three notebooks with my patched module (without the 0x40000000 test)
>>>>> and a debug filter for all files of the module (echo "file */e1000e-20/* +p;" >
>>>>> /sys/kernel/debug/dynamic_debug/control).
>>>>>
>>>>> My test consisted of rmmod'ing, sleep 1, insmod'ing, set debug filter + two reconnects.
>>>>>
>>>>> So I'm basically back to square one.
>>>>>
>>>>> How to proceed?
>>>>>
>>>> ME disabled - good. How long do you wait for 1000 Mbps after reconnecting the cable? Could you
>>>> please wait 5-10 s and see if the link goes back to 1000 Mbps?
>>>
>>> From the U757 logs attached to the last mail:
>>>
>>> [11750.669940] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
>>> [11750.670054] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
>>> [11750.670165] e1000e 0000:00:1f.6 enp0s31f6: ARC subsystem not valid.
>>> [11750.670166] e1000e: enp0s31f6 NIC Link is Down
>>> [11752.925934] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
>>> [11752.926065] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
>>> [11752.926193] e1000e 0000:00:1f.6 enp0s31f6: ARC subsystem not valid.
>>> ...
>>> [11754.813959] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
>>> [11754.814034] e1000e 0000:00:1f.6 enp0s31f6: reading PHY page 0 (or 0x0 shifted) reg 0x1
>>> [11754.814106] e1000e 0000:00:1f.6 enp0s31f6: ARC subsystem not valid.
>>> ...
>>> [11768.142020] e1000e 0000:00:1f.6 enp0s31f6: status 0x40080003 => 10 Mbps, Full Duplex
>>> ...
>>> [11768.151411] e1000e: enp0s31f6 NIC Link is Up 10 Mbps Full Duplex, Flow Control: None
>>>
>>> That's about 16 s.
>>>
>>> Actually, IMHO there is a higher chance of falling back to 10 Mbit/s the longer you stay
>>> disconnected. Still, reloading the module remedies this condition.
>>>
>>>> Unfortunately we have no such HW in our labs. I will try to ask if our PAE can help with more
>>>> debugging if needed.
>>>
>>> Hmmm.
>>>
>> Since you still read the 0x40000000 value in the status register, it makes me think that ME
>> is working. Alternatively, I think you should ask your vendor for the latest NVM (or one with ME
>> disabled, if possible) for your HW. Since the I219-V works properly, I expected the I219-LM
>> without ME to work too.
> 
> From the logs I sent:
> 
> I217-LM: "status 0x40080083 => 1000 Mbps, Full Duplex"
> I219-V: "status 0x40080083 => 1000 Mbps, Full Duplex"
> I219-LM auto-nego ok: "status 0x80083 => 1000 Mbps, Full Duplex"
> I219-LM auto-nego broken: "status 0x40080003 => 10 Mbps, Full Duplex"
> 
> According to the BIOS, both LM variants have Intel ME disabled. If I can't trust the BIOS, is there a
> way to check this?
> 
>> Let's do the following experiment on your side. Please rmmod e1000e.ko and bring up the machine
>> without the driver. Then reconnect the cable a few times and see what link speed comes up. You can
>> trust the LED indicators.
> 
> So even when the driver (+ ethtool) indicates 10 Mbit/s, both lights are on. Same without the driver.
> But the wget speed is just 10 Mbit/s, as ethtool indicates. Manually overriding to 1000 Mbps still works.
> 
> Now I directly connected the I219-V to my I219-LM.
> 
> On the I219-V I ran "ethtool -s enp0s31f6 speed n autoneg off"
> These are the results on the I219-LM side:
> n=1000 : green and yellow lights on. "status 0x80083 => 1000 Mbps, Full Duplex"
> n=100 : green and yellow lights on. "status 0x80042 => 100 Mbps, Half Duplex"
> n=10 : just green light on. "status 0x80082 => 10 Mbps, Half Duplex"
> 
> The light indicators are the same on the I219-LM without the driver.
> 
> All status output on the I219-V has the 0x40000000 bit set; on the I219-LM it is normally never set.
> The I219-V always had full duplex according to ethtool and the status.
> 
> If I set both sides to auto-neg, the first negotiation is correct (both have 1000 Mbps). If I
> reconnect, it becomes 10 Mbps on the I219-LM side and 1000 Mbps on the I219-V side. Both lights are
> still on. On the I219-LM side, the status bit 0x40000000 is set *only* the first time it drops to
> 10 Mbit/s (which is what I based my patch on, without testing it on other HW).

Re-enabling auto-negotiation also works (I219-V ethtool => I219-LM status):

1. "ethtool -s enp0s31f6 speed 100 autoneg off" => "status 0x80042 => 100 Mbps, Half Duplex"
2. "ethtool -s enp0s31f6 autoneg on" => "status 0x80083 => 1000 Mbps, Full Duplex"

This is according to dmesg (incl. status) and ethtool. It stays that way until I reconnect.

JMG


* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-08 10:15                       ` Paul Menzel
@ 2019-01-08 11:15                         ` Jan-Marek Glogowski
  2019-01-09 15:07                           ` Neftin, Sasha
  0 siblings, 1 reply; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-08 11:15 UTC (permalink / raw)
  To: intel-wired-lan

Hi Paul,

On 08.01.19 at 11:15, Paul Menzel wrote:
> On 01/08/19 10:59, Jan-Marek Glogowski wrote:
> 
>> On 08.01.19 at 09:31, Neftin, Sasha wrote:
>>> On 1/7/2019 18:37, Jan-Marek Glogowski wrote:
> 
> […]
> 
>>> Since you still read the 0x40000000 value in the status register, it makes me think that ME
>>> is working. Alternatively, I think you should ask your vendor for the latest NVM (or one with ME
>>> disabled, if possible) for your HW. Since the I219-V works properly, I expected the I219-LM
>>> without ME to work too.
>>
>> From the logs I sent:
>>
>> I217-LM: "status 0x40080083 => 1000 Mbps, Full Duplex"
>> I219-V: "status 0x40080083 => 1000 Mbps, Full Duplex"
>> I219-LM auto-nego ok: "status 0x80083 => 1000 Mbps, Full Duplex"
>> I219-LM auto-nego broken: "status 0x40080003 => 10 Mbps, Full Duplex"
>>
>> According to the BIOS, both LM variants have Intel ME disabled. If I can't trust the BIOS, is there a
>> way to check this?
> 
> What does intelmetool from the coreboot project show?
> 
>     $ git clone https://review.coreboot.org/coreboot.git
>     $ cd coreboot
>     $ cd util/intelmetool
>     $ make -j
>     $ sudo ./intelmetool -m


me-e736.log-norm
----------------
MEI found: [8086:9d3a] Sunrise Point-LP CSME HECI #1

ME Status   : 0x90000245
ME Status 2 : 0x86110306

ME: FW Partition Table      : OK
ME: Bringup Loader Failure  : NO
ME: Firmware Init Complete  : YES
ME: Manufacturing Mode      : NO
ME: Boot Options Present    : NO
ME: Update In Progress      : NO
ME: Current Working State   : Normal
ME: Current Operation State : M0 with UMA
ME: Current Operation Mode  : Normal
ME: Error Code              : No Error
ME: Progress Phase          : Clean Moff->Mx wake
ME: Power Management Event  : Pseudo-global reset
ME: Progress Phase State    : Unknown 0x11

ME: Extend Register not valid

ME: Firmware Version 11.0.1173.0 (code) 11.0.1173.0 (recovery) 11.0.1173.0 (fitc)

ME Capability: Full Network manageability                 : OFF
ME Capability: Regular Network manageability              : OFF
ME Capability: Manageability                              : OFF
ME Capability: Small business technology                  : OFF
ME Capability: Level III manageability                    : OFF
ME Capability: IntelR Anti-Theft (AT)                     : OFF
ME Capability: IntelR Capability Licensing Service (CLS)  : ON
ME Capability: IntelR Power Sharing Technology (MPC)      : OFF
ME Capability: ICC Over Clocking                          : ON
ME Capability: Protected Audio Video Path (PAVP)          : ON
ME Capability: IPV6                                       : OFF
ME Capability: KVM Remote Control (KVM)                   : OFF
ME Capability: Outbreak Containment Heuristic (OCH)       : OFF
ME Capability: Virtual LAN (VLAN)                         : ON
ME Capability: TLS                                        : OFF
ME Capability: Wireless LAN (WLAN)                        : OFF


me-e754.log-norm
----------------
Bad news, you have a `QM87 Express LPC Controller` so you have ME hardware on board and you can't
control or disable it, continuing...

MEI found: [8086:8c3a] 8 Series/C220 Series Chipset Family MEI Controller #1

ME Status   : 0x1e000245
ME Status 2 : 0x60002306

ME: FW Partition Table      : OK
ME: Bringup Loader Failure  : NO
ME: Firmware Init Complete  : YES
ME: Manufacturing Mode      : NO
ME: Boot Options Present    : NO
ME: Update In Progress      : NO
ME: Current Working State   : Normal
ME: Current Operation State : M0 with UMA
ME: Current Operation Mode  : Normal
ME: Error Code              : No Error
ME: Progress Phase          : Host Communication
ME: Power Management Event  : Clean Moff->Mx wake
ME: Progress Phase State    : Host communication established

ME: Extend SHA-256: d536aea220d776c0d26baaffc9832af56871511a7e304b37783b0fe7b8929503

ME: Firmware Version 9.0.1467.22 (code) 9.0.1467.22 (recovery) 9.0.1452.21 (fitc)

ME Capability: Full Network manageability                 : OFF
ME Capability: Regular Network manageability              : OFF
ME Capability: Manageability                              : ON
ME Capability: Small business technology                  : ON
ME Capability: Level III manageability                    : OFF
ME Capability: IntelR Anti-Theft (AT)                     : ON
ME Capability: IntelR Capability Licensing Service (CLS)  : ON
ME Capability: IntelR Power Sharing Technology (MPC)      : OFF
ME Capability: ICC Over Clocking                          : ON
ME Capability: Protected Audio Video Path (PAVP)          : ON
ME Capability: IPV6                                       : OFF
ME Capability: KVM Remote Control (KVM)                   : OFF
ME Capability: Outbreak Containment Heuristic (OCH)       : OFF
ME Capability: Virtual LAN (VLAN)                         : ON
ME Capability: TLS                                        : ON
ME Capability: Wireless LAN (WLAN)                        : OFF


me-u727.log-norm
----------------
MEI found: [8086:9d3a] Sunrise Point-LP CSME HECI #1

ME Status   : 0xa0000245
ME Status 2 : 0x89108106

ME: FW Partition Table      : OK
ME: Bringup Loader Failure  : NO
ME: Firmware Init Complete  : YES
ME: Manufacturing Mode      : NO
ME: Boot Options Present    : NO
ME: Update In Progress      : NO
ME: Current Working State   : Normal
ME: Current Operation State : M0 with UMA
ME: Current Operation Mode  : Normal
ME: Error Code              : No Error
ME: Progress Phase          : Clean Moff->Mx wake
ME: Power Management Event  : Non-power cycle reset
ME: Progress Phase State    : Unknown 0x10

ME: Extend Register not valid

ME: Firmware Version 11.6.3287.29 (code) 11.6.3287.29 (recovery) 11.6.3287.29 (fitc)

ME Capability: Full Network manageability                 : OFF
ME Capability: Regular Network manageability              : OFF
ME Capability: Manageability                              : ON
ME Capability: Small business technology                  : ON
ME Capability: Level III manageability                    : OFF
ME Capability: IntelR Anti-Theft (AT)                     : OFF
ME Capability: IntelR Capability Licensing Service (CLS)  : ON
ME Capability: IntelR Power Sharing Technology (MPC)      : OFF
ME Capability: ICC Over Clocking                          : OFF
ME Capability: Protected Audio Video Path (PAVP)          : ON
ME Capability: IPV6                                       : OFF
ME Capability: KVM Remote Control (KVM)                   : OFF
ME Capability: Outbreak Containment Heuristic (OCH)       : OFF
ME Capability: Virtual LAN (VLAN)                         : ON
ME Capability: TLS                                        : OFF
ME Capability: Wireless LAN (WLAN)                        : OFF


me-u757.log-norm
----------------
MEI found: [8086:9d3a] Sunrise Point-LP CSME HECI #1

ME Status   : 0x90000245
ME Status 2 : 0x89108106

ME: FW Partition Table      : OK
ME: Bringup Loader Failure  : NO
ME: Firmware Init Complete  : YES
ME: Manufacturing Mode      : NO
ME: Boot Options Present    : NO
ME: Update In Progress      : NO
ME: Current Working State   : Normal
ME: Current Operation State : M0 with UMA
ME: Current Operation Mode  : Normal
ME: Error Code              : No Error
ME: Progress Phase          : Clean Moff->Mx wake
ME: Power Management Event  : Non-power cycle reset
ME: Progress Phase State    : Unknown 0x10

ME: Extend Register not valid

ME: Firmware Version 11.8.3425.50 (code) 11.8.3425.50 (recovery) 11.8.3425.50 (fitc)

ME Capability: Full Network manageability                 : ON
ME Capability: Regular Network manageability              : OFF
ME Capability: Manageability                              : ON
ME Capability: Small business technology                  : OFF
ME Capability: Level III manageability                    : OFF
ME Capability: IntelR Anti-Theft (AT)                     : OFF
ME Capability: IntelR Capability Licensing Service (CLS)  : ON
ME Capability: IntelR Power Sharing Technology (MPC)      : OFF
ME Capability: ICC Over Clocking                          : OFF
ME Capability: Protected Audio Video Path (PAVP)          : ON
ME Capability: IPV6                                       : ON
ME Capability: KVM Remote Control (KVM)                   : ON
ME Capability: Outbreak Containment Heuristic (OCH)       : OFF
ME Capability: Virtual LAN (VLAN)                         : ON
ME Capability: TLS                                        : ON
ME Capability: Wireless LAN (WLAN)                        : ON


What do we make of this?

I see the same problem on both the u757 and the u727.
No problem on either the e736 or the e754.

Jan-Marek

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-08 11:15                         ` Jan-Marek Glogowski
@ 2019-01-09 15:07                           ` Neftin, Sasha
  2019-01-09 17:07                             ` Jan-Marek Glogowski
  0 siblings, 1 reply; 27+ messages in thread
From: Neftin, Sasha @ 2019-01-09 15:07 UTC (permalink / raw)
  To: intel-wired-lan

On 1/8/2019 13:15, Jan-Marek Glogowski wrote:
> Hi Paul,
> 
> Am 08.01.19 um 11:15 schrieb Paul Menzel:
>> On 01/08/19 10:59, Jan-Marek Glogowski wrote:
>>
>>> Am 08.01.19 um 09:31 schrieb Neftin, Sasha:
>>>> On 1/7/2019 18:37, Jan-Marek Glogowski wrote:
>>
>> […]
>>
>>>> Since you still read 0x40000000 value in the status register it is causing me to think that ME
>>>> works. Another way I think you should to go ask your vendors for last updated NVM (or with ME
>>>> disabled if possible) for your HW. Since I219V works as properly, I expected I219-LM without ME
>>>> works too.
>>>
>>> From the logs I sent:
>>>
>>> I217-LM: "status 0x40080083 => 1000 Mbps, Full Duplex"
>>> I219-V: "status 0x40080083 => 1000 Mbps, Full Duplex"
>>> I219-LM auto-nego ok: "status 0x80083 => 1000 Mbps, Full Duplex"
>>> I219-LM auto-nego broken: "status 0x40080003 => 10 Mbps, Full Duplex"
>>>
>>> According to the BIOS both LM-variants have Intel ME disabled. If I can't trust the BIOS, is there a
>>> way to check this?
>>
>> What does intelmetool from the coreboot project show?
>>
>>      $ git clone https://review.coreboot.org/coreboot.git
>>      $ cd coreboot
>>      $ cd util/intelmetool
>>      $ make -j
>>      $ sudo ./intelmetool -m
> 
> 
> me-e736.log-norm
> ----------------
> MEI found: [8086:9d3a] Sunrise Point-LP CSME HECI #1
> 
> ME Status   : 0x90000245
> ME Status 2 : 0x86110306
> 
> ME: FW Partition Table      : OK
> ME: Bringup Loader Failure  : NO
> ME: Firmware Init Complete  : YES
> ME: Manufacturing Mode      : NO
> ME: Boot Options Present    : NO
> ME: Update In Progress      : NO
> ME: Current Working State   : Normal
> ME: Current Operation State : M0 with UMA
> ME: Current Operation Mode  : Normal
> ME: Error Code              : No Error
> ME: Progress Phase          : Clean Moff->Mx wake
> ME: Power Management Event  : Pseudo-global reset
> ME: Progress Phase State    : Unknown 0x11
> 
> ME: Extend Register not valid
> 
> ME: Firmware Version 11.0.1173.0 (code) 11.0.1173.0 (recovery) 11.0.1173.0 (fitc)
> 
> ME Capability: Full Network manageability                 : OFF
> ME Capability: Regular Network manageability              : OFF
> ME Capability: Manageability                              : OFF
> ME Capability: Small business technology                  : OFF
> ME Capability: Level III manageability                    : OFF
> ME Capability: IntelR Anti-Theft (AT)                     : OFF
> ME Capability: IntelR Capability Licensing Service (CLS)  : ON
> ME Capability: IntelR Power Sharing Technology (MPC)      : OFF
> ME Capability: ICC Over Clocking                          : ON
> ME Capability: Protected Audio Video Path (PAVP)          : ON
> ME Capability: IPV6                                       : OFF
> ME Capability: KVM Remote Control (KVM)                   : OFF
> ME Capability: Outbreak Containment Heuristic (OCH)       : OFF
> ME Capability: Virtual LAN (VLAN)                         : ON
> ME Capability: TLS                                        : OFF
> ME Capability: Wireless LAN (WLAN)                        : OFF
> 
> 
> me-e754.log-norm
> ----------------
> Bad news, you have a `QM87 Express LPC Controller` so you have ME hardware on board and you can't
> control or disable it, continuing...
> 
> MEI found: [8086:8c3a] 8 Series/C220 Series Chipset Family MEI Controller #1
> 
> ME Status   : 0x1e000245
> ME Status 2 : 0x60002306
> 
> ME: FW Partition Table      : OK
> ME: Bringup Loader Failure  : NO
> ME: Firmware Init Complete  : YES
> ME: Manufacturing Mode      : NO
> ME: Boot Options Present    : NO
> ME: Update In Progress      : NO
> ME: Current Working State   : Normal
> ME: Current Operation State : M0 with UMA
> ME: Current Operation Mode  : Normal
> ME: Error Code              : No Error
> ME: Progress Phase          : Host Communication
> ME: Power Management Event  : Clean Moff->Mx wake
> ME: Progress Phase State    : Host communication established
> 
> ME: Extend SHA-256: d536aea220d776c0d26baaffc9832af56871511a7e304b37783b0fe7b8929503
> 
> ME: Firmware Version 9.0.1467.22 (code) 9.0.1467.22 (recovery) 9.0.1452.21 (fitc)
> 
> ME Capability: Full Network manageability                 : OFF
> ME Capability: Regular Network manageability              : OFF
> ME Capability: Manageability                              : ON
> ME Capability: Small business technology                  : ON
> ME Capability: Level III manageability                    : OFF
> ME Capability: IntelR Anti-Theft (AT)                     : ON
> ME Capability: IntelR Capability Licensing Service (CLS)  : ON
> ME Capability: IntelR Power Sharing Technology (MPC)      : OFF
> ME Capability: ICC Over Clocking                          : ON
> ME Capability: Protected Audio Video Path (PAVP)          : ON
> ME Capability: IPV6                                       : OFF
> ME Capability: KVM Remote Control (KVM)                   : OFF
> ME Capability: Outbreak Containment Heuristic (OCH)       : OFF
> ME Capability: Virtual LAN (VLAN)                         : ON
> ME Capability: TLS                                        : ON
> ME Capability: Wireless LAN (WLAN)                        : OFF
> 
> 
> me-u727.log-norm
> ----------------
> MEI found: [8086:9d3a] Sunrise Point-LP CSME HECI #1
> 
> ME Status   : 0xa0000245
> ME Status 2 : 0x89108106
> 
> ME: FW Partition Table      : OK
> ME: Bringup Loader Failure  : NO
> ME: Firmware Init Complete  : YES
> ME: Manufacturing Mode      : NO
> ME: Boot Options Present    : NO
> ME: Update In Progress      : NO
> ME: Current Working State   : Normal
> ME: Current Operation State : M0 with UMA
> ME: Current Operation Mode  : Normal
> ME: Error Code              : No Error
> ME: Progress Phase          : Clean Moff->Mx wake
> ME: Power Management Event  : Non-power cycle reset
> ME: Progress Phase State    : Unknown 0x10
> 
> ME: Extend Register not valid
> 
> ME: Firmware Version 11.6.3287.29 (code) 11.6.3287.29 (recovery) 11.6.3287.29 (fitc)
> 
> ME Capability: Full Network manageability                 : OFF
> ME Capability: Regular Network manageability              : OFF
> ME Capability: Manageability                              : ON
> ME Capability: Small business technology                  : ON
> ME Capability: Level III manageability                    : OFF
> ME Capability: IntelR Anti-Theft (AT)                     : OFF
> ME Capability: IntelR Capability Licensing Service (CLS)  : ON
> ME Capability: IntelR Power Sharing Technology (MPC)      : OFF
> ME Capability: ICC Over Clocking                          : OFF
> ME Capability: Protected Audio Video Path (PAVP)          : ON
> ME Capability: IPV6                                       : OFF
> ME Capability: KVM Remote Control (KVM)                   : OFF
> ME Capability: Outbreak Containment Heuristic (OCH)       : OFF
> ME Capability: Virtual LAN (VLAN)                         : ON
> ME Capability: TLS                                        : OFF
> ME Capability: Wireless LAN (WLAN)                        : OFF
> 
> 
> me-u757.log-norm
> ----------------
> MEI found: [8086:9d3a] Sunrise Point-LP CSME HECI #1
> 
> ME Status   : 0x90000245
> ME Status 2 : 0x89108106
> 
> ME: FW Partition Table      : OK
> ME: Bringup Loader Failure  : NO
> ME: Firmware Init Complete  : YES
> ME: Manufacturing Mode      : NO
> ME: Boot Options Present    : NO
> ME: Update In Progress      : NO
> ME: Current Working State   : Normal
> ME: Current Operation State : M0 with UMA
> ME: Current Operation Mode  : Normal
> ME: Error Code              : No Error
> ME: Progress Phase          : Clean Moff->Mx wake
> ME: Power Management Event  : Non-power cycle reset
> ME: Progress Phase State    : Unknown 0x10
> 
> ME: Extend Register not valid
> 
> ME: Firmware Version 11.8.3425.50 (code) 11.8.3425.50 (recovery) 11.8.3425.50 (fitc)
> 
> ME Capability: Full Network manageability                 : ON
> ME Capability: Regular Network manageability              : OFF
> ME Capability: Manageability                              : ON
> ME Capability: Small business technology                  : OFF
> ME Capability: Level III manageability                    : OFF
> ME Capability: IntelR Anti-Theft (AT)                     : OFF
> ME Capability: IntelR Capability Licensing Service (CLS)  : ON
> ME Capability: IntelR Power Sharing Technology (MPC)      : OFF
> ME Capability: ICC Over Clocking                          : OFF
> ME Capability: Protected Audio Video Path (PAVP)          : ON
> ME Capability: IPV6                                       : ON
> ME Capability: KVM Remote Control (KVM)                   : ON
> ME Capability: Outbreak Containment Heuristic (OCH)       : OFF
> ME Capability: Virtual LAN (VLAN)                         : ON
> ME Capability: TLS                                        : ON
> ME Capability: Wireless LAN (WLAN)                        : ON
> 
> 
> What do we make of this?
> 
You might try to contact your HW vendor. Probably your HW was Windows OS 
oriented. You may ask for a FW/NVM update no ME or try to replace the HW 
on none ME.
> I see the same problem on both the u757 and the u727.
> No problem on either the e736 or the e754.
> 
> Jan-Marek
> 
Sasha

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation
  2019-01-09 15:07                           ` Neftin, Sasha
@ 2019-01-09 17:07                             ` Jan-Marek Glogowski
  0 siblings, 0 replies; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-09 17:07 UTC (permalink / raw)
  To: intel-wired-lan



Am 09.01.19 um 16:07 schrieb Neftin, Sasha:
> On 1/8/2019 13:15, Jan-Marek Glogowski wrote:
>> Hi Paul,
>>
>> Am 08.01.19 um 11:15 schrieb Paul Menzel:
>>> On 01/08/19 10:59, Jan-Marek Glogowski wrote:
>>>
>>>> Am 08.01.19 um 09:31 schrieb Neftin, Sasha:
>>>>> On 1/7/2019 18:37, Jan-Marek Glogowski wrote:
>>>
>>> […]
>>>
>>>>> Since you still read 0x40000000 value in the status register it is causing me to think that ME
>>>>> works. Another way I think you should to go ask your vendors for last updated NVM (or with ME
>>>>> disabled if possible) for your HW. Since I219V works as properly, I expected I219-LM without ME
>>>>> works too.
>>>>
>>>> From the logs I sent:
>>>>
>>>> I217-LM: "status 0x40080083 => 1000 Mbps, Full Duplex"
>>>> I219-V: "status 0x40080083 => 1000 Mbps, Full Duplex"
>>>> I219-LM auto-nego ok: "status 0x80083 => 1000 Mbps, Full Duplex"
>>>> I219-LM auto-nego broken: "status 0x40080003 => 10 Mbps, Full Duplex"
>>>>
>>>> According to the BIOS both LM-variants have Intel ME disabled. If I can't trust the BIOS, is there a
>>>> way to check this?
>>>
>>> What does intelmetool from the coreboot project show?
>>>
>>>      $ git clone https://review.coreboot.org/coreboot.git
>>>      $ cd coreboot
>>>      $ cd util/intelmetool
>>>      $ make -j
>>>      $ sudo ./intelmetool -m
>>
>>
>> me-e736.log-norm
>> ----------------
>> MEI found: [8086:9d3a] Sunrise Point-LP CSME HECI #1
>>
>> ME Status   : 0x90000245
>> ME Status 2 : 0x86110306
>>
>> ME: FW Partition Table      : OK
>> ME: Bringup Loader Failure  : NO
>> ME: Firmware Init Complete  : YES
>> ME: Manufacturing Mode      : NO
>> ME: Boot Options Present    : NO
>> ME: Update In Progress      : NO
>> ME: Current Working State   : Normal
>> ME: Current Operation State : M0 with UMA
>> ME: Current Operation Mode  : Normal
>> ME: Error Code              : No Error
>> ME: Progress Phase          : Clean Moff->Mx wake
>> ME: Power Management Event  : Pseudo-global reset
>> ME: Progress Phase State    : Unknown 0x11
>>
>> ME: Extend Register not valid
>>
>> ME: Firmware Version 11.0.1173.0 (code) 11.0.1173.0 (recovery) 11.0.1173.0 (fitc)
>>
>> ME Capability: Full Network manageability                 : OFF
>> ME Capability: Regular Network manageability              : OFF
>> ME Capability: Manageability                              : OFF
>> ME Capability: Small business technology                  : OFF
>> ME Capability: Level III manageability                    : OFF
>> ME Capability: IntelR Anti-Theft (AT)                     : OFF
>> ME Capability: IntelR Capability Licensing Service (CLS)  : ON
>> ME Capability: IntelR Power Sharing Technology (MPC)      : OFF
>> ME Capability: ICC Over Clocking                          : ON
>> ME Capability: Protected Audio Video Path (PAVP)          : ON
>> ME Capability: IPV6                                       : OFF
>> ME Capability: KVM Remote Control (KVM)                   : OFF
>> ME Capability: Outbreak Containment Heuristic (OCH)       : OFF
>> ME Capability: Virtual LAN (VLAN)                         : ON
>> ME Capability: TLS                                        : OFF
>> ME Capability: Wireless LAN (WLAN)                        : OFF
>>
>>
>> me-e754.log-norm
>> ----------------
>> Bad news, you have a `QM87 Express LPC Controller` so you have ME hardware on board and you can't
>> control or disable it, continuing...
>>
>> MEI found: [8086:8c3a] 8 Series/C220 Series Chipset Family MEI Controller #1
>>
>> ME Status   : 0x1e000245
>> ME Status 2 : 0x60002306
>>
>> ME: FW Partition Table      : OK
>> ME: Bringup Loader Failure  : NO
>> ME: Firmware Init Complete  : YES
>> ME: Manufacturing Mode      : NO
>> ME: Boot Options Present    : NO
>> ME: Update In Progress      : NO
>> ME: Current Working State   : Normal
>> ME: Current Operation State : M0 with UMA
>> ME: Current Operation Mode  : Normal
>> ME: Error Code              : No Error
>> ME: Progress Phase          : Host Communication
>> ME: Power Management Event  : Clean Moff->Mx wake
>> ME: Progress Phase State    : Host communication established
>>
>> ME: Extend SHA-256: d536aea220d776c0d26baaffc9832af56871511a7e304b37783b0fe7b8929503
>>
>> ME: Firmware Version 9.0.1467.22 (code) 9.0.1467.22 (recovery) 9.0.1452.21 (fitc)
>>
>> ME Capability: Full Network manageability                 : OFF
>> ME Capability: Regular Network manageability              : OFF
>> ME Capability: Manageability                              : ON
>> ME Capability: Small business technology                  : ON
>> ME Capability: Level III manageability                    : OFF
>> ME Capability: IntelR Anti-Theft (AT)                     : ON
>> ME Capability: IntelR Capability Licensing Service (CLS)  : ON
>> ME Capability: IntelR Power Sharing Technology (MPC)      : OFF
>> ME Capability: ICC Over Clocking                          : ON
>> ME Capability: Protected Audio Video Path (PAVP)          : ON
>> ME Capability: IPV6                                       : OFF
>> ME Capability: KVM Remote Control (KVM)                   : OFF
>> ME Capability: Outbreak Containment Heuristic (OCH)       : OFF
>> ME Capability: Virtual LAN (VLAN)                         : ON
>> ME Capability: TLS                                        : ON
>> ME Capability: Wireless LAN (WLAN)                        : OFF
>>
>>
>> me-u727.log-norm
>> ----------------
>> MEI found: [8086:9d3a] Sunrise Point-LP CSME HECI #1
>>
>> ME Status   : 0xa0000245
>> ME Status 2 : 0x89108106
>>
>> ME: FW Partition Table      : OK
>> ME: Bringup Loader Failure  : NO
>> ME: Firmware Init Complete  : YES
>> ME: Manufacturing Mode      : NO
>> ME: Boot Options Present    : NO
>> ME: Update In Progress      : NO
>> ME: Current Working State   : Normal
>> ME: Current Operation State : M0 with UMA
>> ME: Current Operation Mode  : Normal
>> ME: Error Code              : No Error
>> ME: Progress Phase          : Clean Moff->Mx wake
>> ME: Power Management Event  : Non-power cycle reset
>> ME: Progress Phase State    : Unknown 0x10
>>
>> ME: Extend Register not valid
>>
>> ME: Firmware Version 11.6.3287.29 (code) 11.6.3287.29 (recovery) 11.6.3287.29 (fitc)
>>
>> ME Capability: Full Network manageability                 : OFF
>> ME Capability: Regular Network manageability              : OFF
>> ME Capability: Manageability                              : ON
>> ME Capability: Small business technology                  : ON
>> ME Capability: Level III manageability                    : OFF
>> ME Capability: IntelR Anti-Theft (AT)                     : OFF
>> ME Capability: IntelR Capability Licensing Service (CLS)  : ON
>> ME Capability: IntelR Power Sharing Technology (MPC)      : OFF
>> ME Capability: ICC Over Clocking                          : OFF
>> ME Capability: Protected Audio Video Path (PAVP)          : ON
>> ME Capability: IPV6                                       : OFF
>> ME Capability: KVM Remote Control (KVM)                   : OFF
>> ME Capability: Outbreak Containment Heuristic (OCH)       : OFF
>> ME Capability: Virtual LAN (VLAN)                         : ON
>> ME Capability: TLS                                        : OFF
>> ME Capability: Wireless LAN (WLAN)                        : OFF
>>
>>
>> me-u757.log-norm
>> ----------------
>> MEI found: [8086:9d3a] Sunrise Point-LP CSME HECI #1
>>
>> ME Status   : 0x90000245
>> ME Status 2 : 0x89108106
>>
>> ME: FW Partition Table      : OK
>> ME: Bringup Loader Failure  : NO
>> ME: Firmware Init Complete  : YES
>> ME: Manufacturing Mode      : NO
>> ME: Boot Options Present    : NO
>> ME: Update In Progress      : NO
>> ME: Current Working State   : Normal
>> ME: Current Operation State : M0 with UMA
>> ME: Current Operation Mode  : Normal
>> ME: Error Code              : No Error
>> ME: Progress Phase          : Clean Moff->Mx wake
>> ME: Power Management Event  : Non-power cycle reset
>> ME: Progress Phase State    : Unknown 0x10
>>
>> ME: Extend Register not valid
>>
>> ME: Firmware Version 11.8.3425.50 (code) 11.8.3425.50 (recovery) 11.8.3425.50 (fitc)
>>
>> ME Capability: Full Network manageability                 : ON
>> ME Capability: Regular Network manageability              : OFF
>> ME Capability: Manageability                              : ON
>> ME Capability: Small business technology                  : OFF
>> ME Capability: Level III manageability                    : OFF
>> ME Capability: IntelR Anti-Theft (AT)                     : OFF
>> ME Capability: IntelR Capability Licensing Service (CLS)  : ON
>> ME Capability: IntelR Power Sharing Technology (MPC)      : OFF
>> ME Capability: ICC Over Clocking                          : OFF
>> ME Capability: Protected Audio Video Path (PAVP)          : ON
>> ME Capability: IPV6                                       : ON
>> ME Capability: KVM Remote Control (KVM)                   : ON
>> ME Capability: Outbreak Containment Heuristic (OCH)       : OFF
>> ME Capability: Virtual LAN (VLAN)                         : ON
>> ME Capability: TLS                                        : ON
>> ME Capability: Wireless LAN (WLAN)                        : ON
>>
>>
>> What do we make of this?
>>
> You might try to contact your HW vendor. Probably your HW was Windows OS oriented. You may ask for a
> FW/NVM update no ME or try to replace the HW on none ME.

What part of ME needs to be disabled?
And - as you can see - both E-series machines also have some ME capabilities enabled, and they work.

The first two dumps are from my E-series notebooks, and they work just fine with their ME settings.
The other two dumps are from my broken U-series. I don't understand why ME should cause a problem on
the U-series and not on the E-series.

And AFAIK Intel CPUs don't work without ME.
And all my HW consists of laptops, so there is nothing I can replace.

So please tell me which problems you see in the U-series settings, compared to their E-series
counterparts, that might need a vendor fix.
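To put the comparison above in concrete terms, the capabilities reported as ON in the four intelmetool dumps can be encoded as bitmasks. This is only a sketch: the enum names and bit positions are my own shorthand, not intelmetool's, and the capability list does not capture the ME firmware versions (9.x on the e754, 11.x elsewhere) or how the firmware drives the PHY. Under that reading, no capability is ON on both broken U-series machines while OFF on both working E-series ones, and the u727's ON set is even a subset of the e754's. Only the u757 enables extra features (Full Network manageability, IPV6, KVM, WLAN).

```c
#include <stdint.h>

/* One bit per "ME Capability" line that is ON in at least one dump.
 * Names are shorthand chosen for this sketch, not intelmetool identifiers. */
enum me_cap {
	CAP_FULL_NET = 1u << 0,  /* Full Network manageability */
	CAP_MANAGE   = 1u << 1,  /* Manageability */
	CAP_SBT      = 1u << 2,  /* Small business technology */
	CAP_AT       = 1u << 3,  /* IntelR Anti-Theft */
	CAP_CLS      = 1u << 4,  /* IntelR Capability Licensing Service */
	CAP_ICC_OC   = 1u << 5,  /* ICC Over Clocking */
	CAP_PAVP     = 1u << 6,  /* Protected Audio Video Path */
	CAP_IPV6     = 1u << 7,
	CAP_KVM      = 1u << 8,  /* KVM Remote Control */
	CAP_VLAN     = 1u << 9,
	CAP_TLS      = 1u << 10,
	CAP_WLAN     = 1u << 11,
};

/* Capabilities reported as ON, transcribed from the four dumps. */
static const uint32_t e736 = CAP_CLS | CAP_ICC_OC | CAP_PAVP | CAP_VLAN;
static const uint32_t e754 = CAP_MANAGE | CAP_SBT | CAP_AT | CAP_CLS |
			     CAP_ICC_OC | CAP_PAVP | CAP_VLAN | CAP_TLS;
static const uint32_t u727 = CAP_MANAGE | CAP_SBT | CAP_CLS | CAP_PAVP | CAP_VLAN;
static const uint32_t u757 = CAP_FULL_NET | CAP_MANAGE | CAP_CLS | CAP_PAVP |
			     CAP_IPV6 | CAP_KVM | CAP_VLAN | CAP_TLS | CAP_WLAN;

/* Capabilities ON on both broken U-series machines but on neither
 * working E-series machine: the "suspects" if one capability were to blame. */
static uint32_t suspect_caps(void)
{
	uint32_t broken_common = u727 & u757;
	uint32_t working_any = e736 | e754;

	return broken_common & ~working_any;
}
```

An empty result only rules out a single shared capability flag as the cause; it does not rule out ME itself.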

Thanks

Jan-Marek

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect
  2019-01-04 13:31 ` [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect Jan-Marek Glogowski
                     ` (3 preceding siblings ...)
  2019-01-04 23:39   ` [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect Jeff Kirsher
@ 2019-01-15 15:22   ` Jan-Marek Glogowski
  2019-01-15 15:43     ` Neftin, Sasha
  4 siblings, 1 reply; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-15 15:22 UTC (permalink / raw)
  To: intel-wired-lan

Hi,

I still don't know how to proceed. The last statement for patch 2 is:

Am 09.01.19 um 16:07 schrieb Neftin, Sasha:

> You might try to contact your HW vendor. Probably your HW was Windows OS oriented.
> You may ask for a FW/NVM update no ME or try to replace the HW on none ME.

This hardware is a built-in part of the laptops. I don't know whether Linux or Windows will be
installed on them. I can't make much sense of "Probably your HW was Windows OS oriented".
I'll talk with our hardware supplier, who can probably talk to Fujitsu if this needs a
firmware-based fix. Should I point them to this thread and ask them to contact Intel?

As I already wrote, the old hardware worked fine and also has ME enabled, which AFAIK can't be
completely disabled.

That still leaves patches 1 and 3, which I will re-send independently of the broken 2nd patch.

Anything else that can or should be done?

Jan-Marek





^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect
  2019-01-15 15:22   ` Jan-Marek Glogowski
@ 2019-01-15 15:43     ` Neftin, Sasha
  2019-01-16 17:33       ` Jan-Marek Glogowski
  0 siblings, 1 reply; 27+ messages in thread
From: Neftin, Sasha @ 2019-01-15 15:43 UTC (permalink / raw)
  To: intel-wired-lan

On 1/15/2019 17:22, Jan-Marek Glogowski wrote:
> Hi,
> 
> I still don't know how to proceed. The last statement for patch 2 is:
> 
> Am 09.01.19 um 16:07 schrieb Neftin, Sasha:
> 
>> You might try to contact your HW vendor. Probably your HW was Windows OS oriented.
>> You may ask for a FW/NVM update no ME or try to replace the HW on none ME.
> 
> This hardware is a built-in part of the laptops. I don't know whether Linux or Windows will be
> installed on them. I can't make much sense of "Probably your HW was Windows OS oriented".
> I'll talk with our hardware supplier, who can probably talk to Fujitsu if this needs a
> firmware-based fix. Should I point them to this thread and ask them to contact Intel?
> 
> As I already wrote, the old hardware worked fine and also has ME enabled, which AFAIK can't be
> completely disabled.
> 
> That still leaves patches 1 and 3, which I will re-send independently of the broken 2nd patch.
> 
> Anything else that can or should be done?
> 
> Jan-Marek
> 
> 
> 
> 
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan at osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
> 
You can make changes and resend patches 1 and 3. There is a 'u8' type used 
instead of 'u16' - I'm not sure of the benefit of it. I thought the debug 
prints from patch 3 were essential.
Regarding your problem: Fujitsu can contact our customer representative.
Sasha

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect
  2019-01-15 15:43     ` Neftin, Sasha
@ 2019-01-16 17:33       ` Jan-Marek Glogowski
  2019-01-17  7:43         ` Neftin, Sasha
  0 siblings, 1 reply; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-16 17:33 UTC (permalink / raw)
  To: intel-wired-lan



Am 15.01.19 um 16:43 schrieb Neftin, Sasha:
> You can make changes and resend patches 1 and 3. There is a 'u8' type used instead of 'u16' - I'm
> not sure of the benefit of it. I thought the debug prints from patch 3 were essential.

Regarding the u8: after I passed &cmd->base.duplex directly to
e1000e_get_speed_and_duplex_copper(), the compiler complained that duplex is u8 in the UAPI struct,
while the function's duplex parameter is a u16 pointer.

The same happened when I tried &cmd->base.speed, which is u32 in the UAPI, while e1000e internally
uses u16 in a lot of places.

I decided to keep u16 for speed and change the few u8 duplex places. I don't really care, but if you
have a preference, I'll change it before sending v2, using a temporary duplex variable.
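A sketch of the "temporary variable" option, assuming stand-in types: struct link_base and fake_get_speed_and_duplex() are hypothetical substitutes for struct ethtool_link_ksettings (whose base.speed is u32 and base.duplex is u8) and e1000e_get_speed_and_duplex_copper() (which writes through u16 pointers). If I read the driver correctly, e1000e's own FULL_DUPLEX/HALF_DUPLEX values (2/1) also differ from ethtool's DUPLEX_FULL/DUPLEX_HALF (1/0), so going through a temporary gives a natural place to translate; please check both headers before relying on these constants.

```c
#include <stdint.h>

typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* e1000e-internal duplex values (assumed to match e1000e's hw.h). */
#define HALF_DUPLEX 1
#define FULL_DUPLEX 2
/* ethtool UAPI duplex values (assumed to match linux/ethtool.h). */
#define DUPLEX_HALF 0x00
#define DUPLEX_FULL 0x01

/* Hypothetical stand-in for the UAPI fields: speed is u32, duplex is u8. */
struct link_base {
	u32 speed;
	u8 duplex;
};

/* Hypothetical stand-in for e1000e_get_speed_and_duplex_copper(): like the
 * real function it writes through u16 pointers; here it just reports the
 * values it is given so the conversion below can be exercised. */
static int fake_get_speed_and_duplex(u16 hw_speed, u16 hw_duplex,
				     u16 *speed, u16 *duplex)
{
	*speed = hw_speed;
	*duplex = hw_duplex;
	return 0;
}

/* Use u16 temporaries instead of passing &base->speed or &base->duplex,
 * whose pointee types (u32, u8) don't match the u16 parameters. */
static int fill_link_base(struct link_base *base, u16 hw_speed, u16 hw_duplex)
{
	u16 speed, duplex;
	int ret = fake_get_speed_and_duplex(hw_speed, hw_duplex, &speed, &duplex);

	if (ret)
		return ret;
	base->speed = speed; /* u16 -> u32 widens losslessly */
	/* Translate e1000e's 1/2 encoding into ethtool's 0/1 encoding. */
	base->duplex = (duplex == FULL_DUPLEX) ? DUPLEX_FULL : DUPLEX_HALF;
	return 0;
}
```

This keeps the driver-internal u16 types untouched and confines the type conversion to the one function that fills the UAPI struct.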

> Regarding your problem: Fujitsu can contact our customer representative.

Already in touch with Fujitsu.

Thanks for your help!

Jan-Marek

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect
  2019-01-16 17:33       ` Jan-Marek Glogowski
@ 2019-01-17  7:43         ` Neftin, Sasha
  0 siblings, 0 replies; 27+ messages in thread
From: Neftin, Sasha @ 2019-01-17  7:43 UTC (permalink / raw)
  To: intel-wired-lan

On 1/16/2019 19:33, Jan-Marek Glogowski wrote:
> 
> 
> Am 15.01.19 um 16:43 schrieb Neftin, Sasha:
>> You can make changes and resend patches 1 and 3. There is a 'u8' type used instead of 'u16' - I'm
>> not sure of the benefit of it. I thought the debug prints from patch 3 were essential.
> 
> Regarding the u8: after I passed &cmd->base.duplex directly to
> e1000e_get_speed_and_duplex_copper(), the compiler complained that duplex is u8 in the UAPI struct,
> while the function's duplex parameter is a u16 pointer.
> 
> The same happened when I tried &cmd->base.speed, which is u32 in the UAPI, while e1000e internally
> uses u16 in a lot of places.
> 
> I decided to keep u16 for speed and change the few u8 duplex places. I don't really care, but if you
> have a preference, I'll change it before sending v2, using a temporary duplex variable.
There is not much benefit from these patches (IMHO), but I won't stop 
you. If you consider resubmitting, please be consistent and replace the 
type in all relevant places.
> 
>> Regarding your problem: Fujitsu can contact our customer representative.
> 
> Already in touch with Fujitsu.
>
Good.

> Thanks for your help!
> 
> Jan-Marek
> 
Sasha

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Intel-wired-lan] e1000e driver stuck at 10Mbps after reconnection
  2019-01-03 21:28 [Intel-wired-lan] e1000e driver stuck at 10Mbps after reconnection Jan-Marek Glogowski
  2019-01-04 13:31 ` [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect Jan-Marek Glogowski
@ 2019-01-18 15:32 ` Jan-Marek Glogowski
  1 sibling, 0 replies; 27+ messages in thread
From: Jan-Marek Glogowski @ 2019-01-18 15:32 UTC (permalink / raw)
  To: intel-wired-lan

So I've updated both my test notebooks (U757 and U727).
Both got the latest BIOS plus a new ME firmware (11.8.55.3510) I got from Fujitsu.
I also set ME to "Unconfigured" in the BIOS again, just in case.

I think 1000 Mbps has become more stable. But then I saw 10 Mbps when I first plugged in the cable
late during boot, well after the driver was loaded. That now seems to be the most reliable way to
trigger the bug. Most of the time a reconnect keeps 1000 Mbps, which is definitely an improvement.

Just tested:
* unplug
* rmmod e1000e
* modprobe e1000e
* connect
* => always 10 Mbps

Still:
* plugged
* rmmod e1000e
* modprobe e1000e
* => always 1000 Mbps

At 10 Mbps I can get back to 1000 Mbps if I unplug only very briefly, keeping the cable in the port;
so maybe the poll worker doesn't kick in yet to break something?

No difference between 4.15 and 5.0-rc2 vanilla, FWIW.

My broken patch still works: I sometimes get a "0x40080003", ignore it, and then react to the
correct "0x80083". Unlike on my other HW, the 0x40000000 bit is only set in the "error" case, not always.
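For reference, the status words quoted in this thread decode as follows with the STATUS bits that e1000e_get_speed_and_duplex_copper() tests (full duplex in bit 0, speed in bits 6-7; the values are assumed from e1000e's defines.h). The retry helper is a hypothetical illustration of the idea behind the broken patch 2/3, not its actual code. Since the I217-LM and I219-V report 0x40080083 at 1000 Mbps, keying on bit 30 alone would misfire on them, which is why the sketch gates it behind a quirk flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* STATUS register bits tested by e1000e_get_speed_and_duplex_copper()
 * (values assumed from e1000e's defines.h). */
#define STATUS_FD         0x00000001u /* full duplex */
#define STATUS_SPEED_100  0x00000040u
#define STATUS_SPEED_1000 0x00000080u
/* Bit 30, seen in the bad U-series reads; its meaning is not settled in
 * this thread, so this name is hypothetical. */
#define STATUS_BIT30      0x40000000u

/* Same precedence as the driver: 1000 first, then 100, else 10. */
static unsigned status_speed(uint32_t status)
{
	if (status & STATUS_SPEED_1000)
		return 1000;
	if (status & STATUS_SPEED_100)
		return 100;
	return 10;
}

static bool status_full_duplex(uint32_t status)
{
	return (status & STATUS_FD) != 0;
}

/* Hypothetical workaround: with the quirk enabled, treat a read that has
 * bit 30 set as not yet settled and re-read later instead of accepting
 * it. Without the quirk, every read is accepted as before. */
static bool status_should_retry(uint32_t status, bool quirk)
{
	return quirk && (status & STATUS_BIT30) != 0;
}
```

With the quirk on, 0x40080003 (10 Mbps, full duplex) is skipped and the later 0x80083 decodes to 1000 Mbps, full duplex, matching the behaviour described above.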

FWIW current intelmetool -m output diff is:

@@ -1,7 +1,7 @@
 MEI found: [8086:9d3a] Sunrise Point-LP CSME HECI #1

 ME Status   : 0x90000245
-ME Status 2 : 0x89108106
+ME Status 2 : 0x89118106

 ME: FW Partition Table      : OK
 ME: Bringup Loader Failure  : NO
@@ -15,11 +15,11 @@
 ME: Error Code              : No Error
 ME: Progress Phase          : Clean Moff->Mx wake
 ME: Power Management Event  : Non-power cycle reset
-ME: Progress Phase State    : Unknown 0x10
+ME: Progress Phase State    : Unknown 0x11

 ME: Extend Register not valid

-ME: Firmware Version 11.8.3425.50 (code) 11.8.3425.50 (recovery) 11.8.3425.50 (fitc)
+ME: Firmware Version 11.8.3510.55 (code) 11.8.3510.55 (recovery) 11.8.3425.50 (fitc)

 ME Capability: Full Network manageability                 : ON
 ME Capability: Regular Network manageability              : OFF
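A quick check of which register bit actually changed with the ME update: XOR-ing the two ME Status 2 values leaves only bit 16. Assumption, inferred solely from the dumps in this thread and not from intelmetool's source or Intel documentation: the "Progress Phase State" value intelmetool prints appears to sit in bits 16-23 of ME Status 2 (0x10 before and 0x11 after here, and 0x11 for the e736's 0x86110306).

```c
#include <stdint.h>

/* ME Status 2 before and after the ME update, from the diff above. */
#define ME_STATUS2_OLD 0x89108106u
#define ME_STATUS2_NEW 0x89118106u

/* Bits that differ between two register snapshots. */
static uint32_t changed_bits(uint32_t before, uint32_t after)
{
	return before ^ after;
}

/* Assumption (inferred from the dumps only): the "Progress Phase State"
 * printed by intelmetool is bits 16..23 of ME Status 2. */
static unsigned progress_phase_state(uint32_t status2)
{
	return (status2 >> 16) & 0xffu;
}
```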

I also tried blacklisting the mei and mei_me kernel modules, not really expecting a change. No
difference either.

I'm thinking of simply providing some kind of DMI-based quirk to enable my special code path just
for this HW. I'm open to any additional suggestions.
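A userspace sketch of the DMI-based quirk idea. In the kernel this would normally be a struct dmi_system_id table with DMI_MATCH() entries checked via dmi_check_system(); since that needs a kernel build, the matching is modeled here with strstr(), which mirrors DMI_MATCH's substring semantics. The vendor and product strings are placeholders, not verified DMI values for these Fujitsu models; the real ones can be read from /sys/class/dmi/id/sys_vendor and /sys/class/dmi/id/product_name on the affected laptops.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk entry: both substrings must match, like two
 * DMI_MATCH() entries in one struct dmi_system_id. */
struct dmi_quirk {
	const char *sys_vendor;
	const char *product_name;
};

/* Placeholder strings, not verified DMI values. */
static const struct dmi_quirk status_quirks[] = {
	{ "FUJITSU", "U757" },
	{ "FUJITSU", "U727" },
};

/* Returns true if any table entry matches the machine's DMI strings. */
static bool dmi_quirk_applies(const char *sys_vendor, const char *product_name)
{
	size_t n = sizeof(status_quirks) / sizeof(status_quirks[0]);

	for (size_t i = 0; i < n; i++) {
		if (strstr(sys_vendor, status_quirks[i].sys_vendor) &&
		    strstr(product_name, status_quirks[i].product_name))
			return true;
	}
	return false;
}
```

Keeping the table explicit limits the workaround to the affected models, so the I217-LM/I219-V machines that always set 0x40000000 keep the current behaviour.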

Fujitsu has basically the same info, and I'm waiting for an answer next week, as it's almost the weekend.

I guess because of the 0x40000000 bit it's still an ME-related problem.

Enough network plugging for this week. Hope I have more luck with my usb-c problem?

Jan-Marek

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-01-18 15:32 UTC | newest]

Thread overview: 27+ messages
2019-01-03 21:28 [Intel-wired-lan] e1000e driver stuck at 10Mbps after reconnection Jan-Marek Glogowski
2019-01-04 13:31 ` [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect Jan-Marek Glogowski
2019-01-04 13:31   ` [Intel-wired-lan] [PATCH 1/3] e1000e: drop duplicate speed + duplex decoding code Jan-Marek Glogowski
2019-01-04 13:31   ` [Intel-wired-lan] [PATCH 2/3] e1000e: ignore status during auto-negotiation Jan-Marek Glogowski
2019-01-06 15:28     ` Neftin, Sasha
2019-01-06 19:53       ` Jan-Marek Glogowski
2019-01-07  6:32         ` Neftin, Sasha
2019-01-07  9:00           ` Jan-Marek Glogowski
2019-01-07 14:15             ` Jan-Marek Glogowski
2019-01-07 15:49               ` Neftin, Sasha
2019-01-07 16:37                 ` Jan-Marek Glogowski
2019-01-08  8:31                   ` Neftin, Sasha
2019-01-08  9:59                     ` Jan-Marek Glogowski
2019-01-08 10:15                       ` Paul Menzel
2019-01-08 11:15                         ` Jan-Marek Glogowski
2019-01-09 15:07                           ` Neftin, Sasha
2019-01-09 17:07                             ` Jan-Marek Glogowski
2019-01-08 10:15                       ` Jan-Marek Glogowski
2019-01-04 13:31   ` [Intel-wired-lan] [PATCH 3/3] e1000e: add some status debug output Jan-Marek Glogowski
2019-01-06 15:54     ` Neftin, Sasha
2019-01-04 23:39   ` [Intel-wired-lan] [RfC] fix auto-negotiation after reconnect Jeff Kirsher
2019-01-05  0:13     ` Jan-Marek Glogowski
2019-01-15 15:22   ` Jan-Marek Glogowski
2019-01-15 15:43     ` Neftin, Sasha
2019-01-16 17:33       ` Jan-Marek Glogowski
2019-01-17  7:43         ` Neftin, Sasha
2019-01-18 15:32 ` [Intel-wired-lan] e1000e driver stuck at 10Mbps after reconnection Jan-Marek Glogowski
