* [net-next 0/9][pull request] Intel Wired LAN Driver Updates
@ 2012-05-03  9:56 Jeff Kirsher
  2012-05-03  9:56 ` [net-next 1/9] e1000e: suggest a possible workaround to a device hang on 82577/8 Jeff Kirsher
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Jeff Kirsher @ 2012-05-03  9:56 UTC
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series of patches contains updates for e1000e and ixgbevf.

The following are changes since commit af94bf6db1d58d26f1cdab145b6312ad363254a6:
  ixgbe: Fix use after free on module remove
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Bruce Allan (2):
  e1000e: suggest a possible workaround to a device hang on 82577/8
  e1000e: cleanup long [read|write]_reg_locked PHY ops function
    pointers

Chris Boot (2):
  e1000e: Disable ASPM L1 on 82574
  e1000e: Remove special case for 82573/82574 ASPM L1 disablement

Greg Rose (3):
  ixgbevf: Add support to recognize 100mb link speed
  ixgbevf: Make sure jumbo frames are set correctly after PF reset
  ixgbevf: Update version string

Matthew Vick (2):
  e1000e: Resolve intermittent negotiation issue on 82574/82583.
  e1000e: Driver workaround for IPv6 Header Extension Erratum.

 drivers/net/ethernet/intel/e1000e/80003es2lan.c   |    8 +++
 drivers/net/ethernet/intel/e1000e/82571.c         |   13 +++++-
 drivers/net/ethernet/intel/e1000e/e1000.h         |   10 ++++
 drivers/net/ethernet/intel/e1000e/ich8lan.c       |   54 ++++++++++-----------
 drivers/net/ethernet/intel/e1000e/netdev.c        |   21 ++------
 drivers/net/ethernet/intel/e1000e/phy.c           |   18 +++++++-
 drivers/net/ethernet/intel/ixgbevf/defines.h      |    2 +
 drivers/net/ethernet/intel/ixgbevf/ethtool.c      |   18 +++++--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |    2 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   30 ++++++------
 drivers/net/ethernet/intel/ixgbevf/vf.c           |   12 ++++-
 11 files changed, 119 insertions(+), 69 deletions(-)

-- 
1.7.7.6


* [net-next 1/9] e1000e: suggest a possible workaround to a device hang on 82577/8
  2012-05-03  9:56 [net-next 0/9][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
@ 2012-05-03  9:56 ` Jeff Kirsher
  2012-05-03  9:56 ` [net-next 2/9] e1000e: cleanup long [read|write]_reg_locked PHY ops function pointers Jeff Kirsher
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff Kirsher @ 2012-05-03  9:56 UTC
  To: davem; +Cc: Bruce Allan, netdev, gospo, sassmann, Jeff Kirsher

From: Bruce Allan <bruce.w.allan@intel.com>

There is a known issue in the 82577 and 82578 devices that can cause the
device hardware to hang under heavy traffic; the current workaround in the
driver is to disable transmit flow control by default.  If the user enables
transmit flow control and the device hang occurs, log a message to the
syslog suggesting that the user re-enable the workaround.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
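A minimal sketch of the condition this patch adds, written as standalone
C so it can be compiled outside the driver. The enum `mac_type` and the
helper `should_suggest_tx_pause_off()` are hypothetical names, and the
value used for `CTRL_TFCE` is an assumption for illustration that should
be checked against the driver headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's MAC types. */
enum mac_type { MAC_ICH10LAN, MAC_PCHLAN, MAC_PCH2LAN };

/* Assumed value of the transmit flow control enable bit in CTRL. */
#define CTRL_TFCE 0x10000000u

/*
 * Return true when the hang report should also suggest turning off
 * Tx pause: only for the pchlan MAC type, and only when transmit flow
 * control is currently enabled in the CTRL register value.
 */
static bool should_suggest_tx_pause_off(enum mac_type type, uint32_t ctrl)
{
	return type == MAC_PCHLAN && (ctrl & CTRL_TFCE) != 0;
}
```

On most systems the suggested action would be something like
`ethtool -A <iface> tx off`, where `<iface>` is a placeholder for the
affected interface.
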
 drivers/net/ethernet/intel/e1000e/netdev.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index c0e211b..e86b524 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1084,6 +1084,10 @@ static void e1000_print_hw_hang(struct work_struct *work)
 	      phy_1000t_status,
 	      phy_ext_status,
 	      pci_status);
+
+	/* Suggest workaround for known h/w issue */
+	if ((hw->mac.type == e1000_pchlan) && (er32(CTRL) & E1000_CTRL_TFCE))
+		e_err("Try turning off Tx pause (flow control) via ethtool\n");
 }
 
 /**
-- 
1.7.7.6


* [net-next 2/9] e1000e: cleanup long [read|write]_reg_locked PHY ops function pointers
  2012-05-03  9:56 [net-next 0/9][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
  2012-05-03  9:56 ` [net-next 1/9] e1000e: suggest a possible workaround to a device hang on 82577/8 Jeff Kirsher
@ 2012-05-03  9:56 ` Jeff Kirsher
  2012-05-03  9:56 ` [net-next 3/9] e1000e: Resolve intermittent negotiation issue on 82574/82583 Jeff Kirsher
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff Kirsher @ 2012-05-03  9:56 UTC
  To: davem; +Cc: Bruce Allan, netdev, gospo, sassmann, Jeff Kirsher

From: Bruce Allan <bruce.w.allan@intel.com>

Calling the locked versions of the read/write PHY ops function pointers
often produces excessively long lines.  Shorten these as is done with
the non-locked versions of the PHY register read/write functions.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
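The wrapper pattern described above can be sketched outside the kernel
as follows. Everything here is a simplified, hypothetical model: the
`struct hw` layout, the fake register file, and the names
`rphy_locked()`/`wphy_locked()` only imitate the shape of
`e1e_rphy_locked()`/`e1e_wphy_locked()` and are not the driver's code.

```c
#include <stdint.h>

struct hw;

/* Hypothetical, trimmed-down ops table modelled on hw->phy.ops. */
struct phy_ops {
	int32_t (*read_reg_locked)(struct hw *hw, uint32_t offset, uint16_t *data);
	int32_t (*write_reg_locked)(struct hw *hw, uint32_t offset, uint16_t data);
};

struct hw {
	struct { struct phy_ops ops; } phy;
	uint16_t regs[32]; /* fake register file for the sketch */
};

/* Short wrappers: callers no longer spell out hw->phy.ops.<name>. */
static inline int32_t rphy_locked(struct hw *hw, uint32_t offset, uint16_t *data)
{
	return hw->phy.ops.read_reg_locked(hw, offset, data);
}

static inline int32_t wphy_locked(struct hw *hw, uint32_t offset, uint16_t data)
{
	return hw->phy.ops.write_reg_locked(hw, offset, data);
}

/* Fake backends so the wrappers can be exercised without hardware. */
static int32_t fake_read(struct hw *hw, uint32_t offset, uint16_t *data)
{
	if (offset >= 32)
		return -1;
	*data = hw->regs[offset];
	return 0;
}

static int32_t fake_write(struct hw *hw, uint32_t offset, uint16_t data)
{
	if (offset >= 32)
		return -1;
	hw->regs[offset] = data;
	return 0;
}
```

The wrappers add no behavior of their own; they only shorten call sites,
which is why most of the multi-line calls in the diff below collapse to
a single line.
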
 drivers/net/ethernet/intel/e1000e/e1000.h   |   10 ++++++
 drivers/net/ethernet/intel/e1000e/ich8lan.c |   47 +++++++++++----------------
 2 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h
index 1da9bfa..c960cf8 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -673,11 +673,21 @@ static inline s32 e1e_rphy(struct e1000_hw *hw, u32 offset, u16 *data)
 	return hw->phy.ops.read_reg(hw, offset, data);
 }
 
+static inline s32 e1e_rphy_locked(struct e1000_hw *hw, u32 offset, u16 *data)
+{
+	return hw->phy.ops.read_reg_locked(hw, offset, data);
+}
+
 static inline s32 e1e_wphy(struct e1000_hw *hw, u32 offset, u16 data)
 {
 	return hw->phy.ops.write_reg(hw, offset, data);
 }
 
+static inline s32 e1e_wphy_locked(struct e1000_hw *hw, u32 offset, u16 data)
+{
+	return hw->phy.ops.write_reg_locked(hw, offset, data);
+}
+
 static inline s32 e1000_get_cable_length(struct e1000_hw *hw)
 {
 	return hw->phy.ops.get_cable_length(hw);
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index ca34ebf..c58ed26 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -304,9 +304,9 @@ static bool e1000_phy_is_accessible_pchlan(struct e1000_hw *hw)
 	u16 phy_reg;
 	u32 phy_id;
 
-	hw->phy.ops.read_reg_locked(hw, PHY_ID1, &phy_reg);
+	e1e_rphy_locked(hw, PHY_ID1, &phy_reg);
 	phy_id = (u32)(phy_reg << 16);
-	hw->phy.ops.read_reg_locked(hw, PHY_ID2, &phy_reg);
+	e1e_rphy_locked(hw, PHY_ID2, &phy_reg);
 	phy_id |= (u32)(phy_reg & PHY_REVISION_MASK);
 
 	if (hw->phy.id) {
@@ -1271,8 +1271,7 @@ static s32 e1000_sw_lcd_config_ich8lan(struct e1000_hw *hw)
 		reg_addr &= PHY_REG_MASK;
 		reg_addr |= phy_page;
 
-		ret_val = phy->ops.write_reg_locked(hw, (u32)reg_addr,
-						    reg_data);
+		ret_val = e1e_wphy_locked(hw, (u32)reg_addr, reg_data);
 		if (ret_val)
 			goto release;
 	}
@@ -1309,8 +1308,8 @@ static s32 e1000_k1_gig_workaround_hv(struct e1000_hw *hw, bool link)
 	/* Disable K1 when link is 1Gbps, otherwise use the NVM setting */
 	if (link) {
 		if (hw->phy.type == e1000_phy_82578) {
-			ret_val = hw->phy.ops.read_reg_locked(hw, BM_CS_STATUS,
-			                                          &status_reg);
+			ret_val = e1e_rphy_locked(hw, BM_CS_STATUS,
+						  &status_reg);
 			if (ret_val)
 				goto release;
 
@@ -1325,8 +1324,7 @@ static s32 e1000_k1_gig_workaround_hv(struct e1000_hw *hw, bool link)
 		}
 
 		if (hw->phy.type == e1000_phy_82577) {
-			ret_val = hw->phy.ops.read_reg_locked(hw, HV_M_STATUS,
-			                                          &status_reg);
+			ret_val = e1e_rphy_locked(hw, HV_M_STATUS, &status_reg);
 			if (ret_val)
 				goto release;
 
@@ -1341,15 +1339,13 @@ static s32 e1000_k1_gig_workaround_hv(struct e1000_hw *hw, bool link)
 		}
 
 		/* Link stall fix for link up */
-		ret_val = hw->phy.ops.write_reg_locked(hw, PHY_REG(770, 19),
-		                                           0x0100);
+		ret_val = e1e_wphy_locked(hw, PHY_REG(770, 19), 0x0100);
 		if (ret_val)
 			goto release;
 
 	} else {
 		/* Link stall fix for link down */
-		ret_val = hw->phy.ops.write_reg_locked(hw, PHY_REG(770, 19),
-		                                           0x4100);
+		ret_val = e1e_wphy_locked(hw, PHY_REG(770, 19), 0x4100);
 		if (ret_val)
 			goto release;
 	}
@@ -1448,7 +1444,7 @@ static s32 e1000_oem_bits_config_ich8lan(struct e1000_hw *hw, bool d0_state)
 
 	mac_reg = er32(PHY_CTRL);
 
-	ret_val = hw->phy.ops.read_reg_locked(hw, HV_OEM_BITS, &oem_reg);
+	ret_val = e1e_rphy_locked(hw, HV_OEM_BITS, &oem_reg);
 	if (ret_val)
 		goto release;
 
@@ -1475,7 +1471,7 @@ static s32 e1000_oem_bits_config_ich8lan(struct e1000_hw *hw, bool d0_state)
 	    !hw->phy.ops.check_reset_block(hw))
 		oem_reg |= HV_OEM_BITS_RESTART_AN;
 
-	ret_val = hw->phy.ops.write_reg_locked(hw, HV_OEM_BITS, oem_reg);
+	ret_val = e1e_wphy_locked(hw, HV_OEM_BITS, oem_reg);
 
 release:
 	hw->phy.ops.release(hw);
@@ -1571,11 +1567,10 @@ static s32 e1000_hv_phy_workarounds_ich8lan(struct e1000_hw *hw)
 	ret_val = hw->phy.ops.acquire(hw);
 	if (ret_val)
 		return ret_val;
-	ret_val = hw->phy.ops.read_reg_locked(hw, BM_PORT_GEN_CFG, &phy_data);
+	ret_val = e1e_rphy_locked(hw, BM_PORT_GEN_CFG, &phy_data);
 	if (ret_val)
 		goto release;
-	ret_val = hw->phy.ops.write_reg_locked(hw, BM_PORT_GEN_CFG,
-					       phy_data & 0x00FF);
+	ret_val = e1e_wphy_locked(hw, BM_PORT_GEN_CFG, phy_data & 0x00FF);
 release:
 	hw->phy.ops.release(hw);
 
@@ -1807,20 +1802,18 @@ static s32 e1000_lv_phy_workarounds_ich8lan(struct e1000_hw *hw)
 	ret_val = hw->phy.ops.acquire(hw);
 	if (ret_val)
 		return ret_val;
-	ret_val = hw->phy.ops.write_reg_locked(hw, I82579_EMI_ADDR,
-					       I82579_MSE_THRESHOLD);
+	ret_val = e1e_wphy_locked(hw, I82579_EMI_ADDR, I82579_MSE_THRESHOLD);
 	if (ret_val)
 		goto release;
 	/* set MSE higher to enable link to stay up when noise is high */
-	ret_val = hw->phy.ops.write_reg_locked(hw, I82579_EMI_DATA, 0x0034);
+	ret_val = e1e_wphy_locked(hw, I82579_EMI_DATA, 0x0034);
 	if (ret_val)
 		goto release;
-	ret_val = hw->phy.ops.write_reg_locked(hw, I82579_EMI_ADDR,
-					       I82579_MSE_LINK_DOWN);
+	ret_val = e1e_wphy_locked(hw, I82579_EMI_ADDR, I82579_MSE_LINK_DOWN);
 	if (ret_val)
 		goto release;
 	/* drop link after 5 times MSE threshold was reached */
-	ret_val = hw->phy.ops.write_reg_locked(hw, I82579_EMI_DATA, 0x0005);
+	ret_val = e1e_wphy_locked(hw, I82579_EMI_DATA, 0x0005);
 release:
 	hw->phy.ops.release(hw);
 
@@ -1995,12 +1988,10 @@ static s32 e1000_post_phy_reset_ich8lan(struct e1000_hw *hw)
 		ret_val = hw->phy.ops.acquire(hw);
 		if (ret_val)
 			return ret_val;
-		ret_val = hw->phy.ops.write_reg_locked(hw, I82579_EMI_ADDR,
-						       I82579_LPI_UPDATE_TIMER);
+		ret_val = e1e_wphy_locked(hw, I82579_EMI_ADDR,
+					  I82579_LPI_UPDATE_TIMER);
 		if (!ret_val)
-			ret_val = hw->phy.ops.write_reg_locked(hw,
-							       I82579_EMI_DATA,
-							       0x1387);
+			ret_val = e1e_wphy_locked(hw, I82579_EMI_DATA, 0x1387);
 		hw->phy.ops.release(hw);
 	}
 
-- 
1.7.7.6


* [net-next 3/9] e1000e: Resolve intermittent negotiation issue on 82574/82583.
  2012-05-03  9:56 [net-next 0/9][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
  2012-05-03  9:56 ` [net-next 1/9] e1000e: suggest a possible workaround to a device hang on 82577/8 Jeff Kirsher
  2012-05-03  9:56 ` [net-next 2/9] e1000e: cleanup long [read|write]_reg_locked PHY ops function pointers Jeff Kirsher
@ 2012-05-03  9:56 ` Jeff Kirsher
  2012-05-03  9:56 ` [net-next 4/9] e1000e: Driver workaround for IPv6 Header Extension Erratum Jeff Kirsher
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff Kirsher @ 2012-05-03  9:56 UTC
  To: davem; +Cc: Matthew Vick, netdev, gospo, sassmann, Jeff Kirsher

From: Matthew Vick <matthew.vick@intel.com>

For 82574 and 82583 devices, resolve an intermittent link issue where
the link negotiates to 100Mbps rather than 1Gbps when the PHY is powered
off and then powered back on several seconds later.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
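A rough sketch of the write ordering introduced by this patch, assuming
a helper that only records the register values instead of touching a
PHY. The function `downshift_write_sequence()` is hypothetical and the
bit value of `PSCR_ENABLE_DOWNSHIFT` is an assumption for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of the downshift enable flag; illustrative only. */
#define PSCR_ENABLE_DOWNSHIFT 0x0800u

/*
 * Fill 'writes' with the PHY_SPEC_CTRL values that would be written, in
 * order, and return how many there are. When 'needs_toggle' is set
 * (the 82574/82583 case), downshift is first cleared and committed,
 * then set again by the final write.
 */
static int downshift_write_sequence(uint16_t phy_data, bool needs_toggle,
				    uint16_t writes[2])
{
	int n = 0;

	if (needs_toggle) {
		phy_data &= (uint16_t)~PSCR_ENABLE_DOWNSHIFT;
		writes[n++] = phy_data; /* followed by a PHY commit */
	}

	phy_data |= PSCR_ENABLE_DOWNSHIFT;
	writes[n++] = phy_data;
	return n;
}
```

In the real driver the first write is followed by `e1000e_commit_phy()`
and both steps can fail, so each one is error-checked; the sketch leaves
that out to keep the ordering visible.
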
 drivers/net/ethernet/intel/e1000e/phy.c |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
index bd5ef64..ad22b8c 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.c
+++ b/drivers/net/ethernet/intel/e1000e/phy.c
@@ -722,8 +722,24 @@ s32 e1000e_copper_link_setup_m88(struct e1000_hw *hw)
 		phy_data |= M88E1000_PSCR_POLARITY_REVERSAL;
 
 	/* Enable downshift on BM (disabled by default) */
-	if (phy->type == e1000_phy_bm)
+	if (phy->type == e1000_phy_bm) {
+		/* For 82574/82583, first disable then enable downshift */
+		if (phy->id == BME1000_E_PHY_ID_R2) {
+			phy_data &= ~BME1000_PSCR_ENABLE_DOWNSHIFT;
+			ret_val = e1e_wphy(hw, M88E1000_PHY_SPEC_CTRL,
+					   phy_data);
+			if (ret_val)
+				return ret_val;
+			/* Commit the changes. */
+			ret_val = e1000e_commit_phy(hw);
+			if (ret_val) {
+				e_dbg("Error committing the PHY changes\n");
+				return ret_val;
+			}
+		}
+
 		phy_data |= BME1000_PSCR_ENABLE_DOWNSHIFT;
+	}
 
 	ret_val = e1e_wphy(hw, M88E1000_PHY_SPEC_CTRL, phy_data);
 	if (ret_val)
-- 
1.7.7.6


* [net-next 4/9] e1000e: Driver workaround for IPv6 Header Extension Erratum.
  2012-05-03  9:56 [net-next 0/9][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (2 preceding siblings ...)
  2012-05-03  9:56 ` [net-next 3/9] e1000e: Resolve intermittent negotiation issue on 82574/82583 Jeff Kirsher
@ 2012-05-03  9:56 ` Jeff Kirsher
  2012-05-03  9:56 ` [net-next 5/9] e1000e: Disable ASPM L1 on 82574 Jeff Kirsher
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff Kirsher @ 2012-05-03  9:56 UTC
  To: davem; +Cc: Matthew Vick, netdev, gospo, sassmann, Jeff Kirsher

From: Matthew Vick <matthew.vick@intel.com>

Previously, IPv6 extension header parsing was disabled for all devices
supported by e1000e, but only when using packet split mode. However, per
a silicon erratum, only certain devices need this restriction, and those
devices need IPv6 extension header parsing disabled in all modes.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
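The per-device selection made by the hunks below can be summarized in
one standalone function. This is only a sketch: the enum `mac_type` is a
hypothetical list whose ordering is chosen to mirror the
`<= e1000_82573` comparison in the patch, the helper names are made up,
and the RFCTL bit values are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical MAC type list; ordering mirrors the patch's comparison. */
enum mac_type {
	MAC_82571, MAC_82572, MAC_82573, MAC_82574, MAC_82583,
	MAC_80003ES2LAN, MAC_ICH8LAN, MAC_ICH9LAN, MAC_ICH10LAN, MAC_PCHLAN,
};

/* Assumed RFCTL bit values, for illustration only. */
#define RFCTL_IPV6_EX_DIS      0x00010000u
#define RFCTL_NEW_IPV6_EXT_DIS 0x00020000u

/* Devices for which the patch disables IPv6 extension header parsing. */
static bool needs_ipv6_ext_workaround(enum mac_type type)
{
	return type <= MAC_82573 || type == MAC_80003ES2LAN ||
	       type == MAC_ICH8LAN;
}

/* Return the RFCTL value with the workaround bits set where required. */
static uint32_t apply_rfctl_workaround(enum mac_type type, uint32_t rfctl)
{
	if (needs_ipv6_ext_workaround(type))
		rfctl |= RFCTL_IPV6_EX_DIS | RFCTL_NEW_IPV6_EXT_DIS;
	return rfctl;
}
```

The driver spreads this decision across the three
`e1000_initialize_hw_bits_*()` functions rather than using one helper;
the single function here is just a compact way to read the policy.
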
 drivers/net/ethernet/intel/e1000e/80003es2lan.c |    8 ++++++++
 drivers/net/ethernet/intel/e1000e/82571.c       |   10 ++++++++++
 drivers/net/ethernet/intel/e1000e/ich8lan.c     |    7 +++++++
 drivers/net/ethernet/intel/e1000e/netdev.c      |    9 +--------
 4 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/80003es2lan.c b/drivers/net/ethernet/intel/e1000e/80003es2lan.c
index 66f9877..4dd18a1 100644
--- a/drivers/net/ethernet/intel/e1000e/80003es2lan.c
+++ b/drivers/net/ethernet/intel/e1000e/80003es2lan.c
@@ -944,6 +944,14 @@ static void e1000_initialize_hw_bits_80003es2lan(struct e1000_hw *hw)
 	else
 		reg |= (1 << 28);
 	ew32(TARC(1), reg);
+
+	/*
+	 * Disable IPv6 extension header parsing because some malformed
+	 * IPv6 headers can hang the Rx.
+	 */
+	reg = er32(RFCTL);
+	reg |= (E1000_RFCTL_IPV6_EX_DIS | E1000_RFCTL_NEW_IPV6_EXT_DIS);
+	ew32(RFCTL, reg);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/e1000e/82571.c b/drivers/net/ethernet/intel/e1000e/82571.c
index 7b02e87..98632f4 100644
--- a/drivers/net/ethernet/intel/e1000e/82571.c
+++ b/drivers/net/ethernet/intel/e1000e/82571.c
@@ -1279,6 +1279,16 @@ static void e1000_initialize_hw_bits_82571(struct e1000_hw *hw)
 		ew32(CTRL_EXT, reg);
 	}
 
+	/*
+	 * Disable IPv6 extension header parsing because some malformed
+	 * IPv6 headers can hang the Rx.
+	 */
+	if (hw->mac.type <= e1000_82573) {
+		reg = er32(RFCTL);
+		reg |= (E1000_RFCTL_IPV6_EX_DIS | E1000_RFCTL_NEW_IPV6_EXT_DIS);
+		ew32(RFCTL, reg);
+	}
+
 	/* PCI-Ex Control Registers */
 	switch (hw->mac.type) {
 	case e1000_82574:
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index c58ed26..dfff441 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -3468,6 +3468,13 @@ static void e1000_initialize_hw_bits_ich8lan(struct e1000_hw *hw)
 	 */
 	reg = er32(RFCTL);
 	reg |= (E1000_RFCTL_NFSW_DIS | E1000_RFCTL_NFSR_DIS);
+
+	/*
+	 * Disable IPv6 extension header parsing because some malformed
+	 * IPv6 headers can hang the Rx.
+	 */
+	if (hw->mac.type == e1000_ich8lan)
+		reg |= (E1000_RFCTL_IPV6_EX_DIS | E1000_RFCTL_NEW_IPV6_EXT_DIS);
 	ew32(RFCTL, reg);
 }
 
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index e86b524..ab4505c 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -2939,6 +2939,7 @@ static void e1000_setup_rctl(struct e1000_adapter *adapter)
 	/* Enable Extended Status in all Receive Descriptors */
 	rfctl = er32(RFCTL);
 	rfctl |= E1000_RFCTL_EXTEN;
+	ew32(RFCTL, rfctl);
 
 	/*
 	 * 82571 and greater support packet-split where the protocol
@@ -2964,13 +2965,6 @@ static void e1000_setup_rctl(struct e1000_adapter *adapter)
 	if (adapter->rx_ps_pages) {
 		u32 psrctl = 0;
 
-		/*
-		 * disable packet split support for IPv6 extension headers,
-		 * because some malformed IPv6 headers can hang the Rx
-		 */
-		rfctl |= (E1000_RFCTL_IPV6_EX_DIS |
-			  E1000_RFCTL_NEW_IPV6_EXT_DIS);
-
 		/* Enable Packet split descriptors */
 		rctl |= E1000_RCTL_DTYP_PS;
 
@@ -3009,7 +3003,6 @@ static void e1000_setup_rctl(struct e1000_adapter *adapter)
 		 */
 	}
 
-	ew32(RFCTL, rfctl);
 	ew32(RCTL, rctl);
 	/* just started the receive unit, no need to restart */
 	adapter->flags &= ~FLAG_RX_RESTART_NOW;
-- 
1.7.7.6


* [net-next 5/9] e1000e: Disable ASPM L1 on 82574
  2012-05-03  9:56 [net-next 0/9][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (3 preceding siblings ...)
  2012-05-03  9:56 ` [net-next 4/9] e1000e: Driver workaround for IPv6 Header Extension Erratum Jeff Kirsher
@ 2012-05-03  9:56 ` Jeff Kirsher
  2012-05-03 10:08   ` Nix
  2012-05-03  9:56 ` [net-next 6/9] e1000e: Remove special case for 82573/82574 ASPM L1 disablement Jeff Kirsher
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Jeff Kirsher @ 2012-05-03  9:56 UTC
  To: davem
  Cc: Chris Boot, netdev, gospo, sassmann, Wyborny, Carolyn, Nix, Jeff Kirsher

From: Chris Boot <bootc@bootc.net>

ASPM on the 82574 causes trouble. Currently the driver disables L0s for
this NIC but only disables L1 if the MTU is >1500. This patch simply
causes L1 to be disabled regardless of the MTU setting.

Signed-off-by: Chris Boot <bootc@bootc.net>
Cc: "Wyborny, Carolyn" <carolyn.wyborny@intel.com>
Cc: Nix <nix@esperi.org.uk>
Link: https://lkml.org/lkml/2012/3/19/362
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
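To make the behavioral change concrete, here is a small sketch that
contrasts the old MTU-dependent rule with the new unconditional one for
the 82574. The two helper functions are hypothetical; the frame-length
constants carry the standard Ethernet values.

```c
#include <stdbool.h>

#define ETH_FRAME_LEN 1514 /* max Ethernet frame without FCS */
#define ETH_FCS_LEN   4    /* frame check sequence */

/* Before the patch: L1 is disabled only once the frame size is jumbo. */
static bool l1_disabled_before(int max_frame)
{
	return max_frame > ETH_FRAME_LEN + ETH_FCS_LEN;
}

/* After the patch: L1 is disabled on 82574 regardless of frame size. */
static bool l1_disabled_after(int max_frame)
{
	(void)max_frame; /* no longer part of the decision */
	return true;
}
```

With a 1500-byte MTU the maximum frame is 1500 + 14 + 4 = 1518 bytes,
which does not exceed the old threshold, so the old rule left L1 enabled
in the default configuration.
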
 drivers/net/ethernet/intel/e1000e/82571.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/82571.c b/drivers/net/ethernet/intel/e1000e/82571.c
index 98632f4..6a8a908 100644
--- a/drivers/net/ethernet/intel/e1000e/82571.c
+++ b/drivers/net/ethernet/intel/e1000e/82571.c
@@ -2072,8 +2072,9 @@ const struct e1000_info e1000_82574_info = {
 				  | FLAG_HAS_SMART_POWER_DOWN
 				  | FLAG_HAS_AMT
 				  | FLAG_HAS_CTRLEXT_ON_LOAD,
-	.flags2			  = FLAG2_CHECK_PHY_HANG
+	.flags2			 = FLAG2_CHECK_PHY_HANG
 				  | FLAG2_DISABLE_ASPM_L0S
+				  | FLAG2_DISABLE_ASPM_L1
 				  | FLAG2_NO_DISABLE_RX
 				  | FLAG2_DMA_BURST,
 	.pba			= 32,
-- 
1.7.7.6


* [net-next 6/9] e1000e: Remove special case for 82573/82574 ASPM L1 disablement
  2012-05-03  9:56 [net-next 0/9][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (4 preceding siblings ...)
  2012-05-03  9:56 ` [net-next 5/9] e1000e: Disable ASPM L1 on 82574 Jeff Kirsher
@ 2012-05-03  9:56 ` Jeff Kirsher
  2012-05-03  9:56 ` [net-next 7/9] ixgbevf: Add support to recognize 100mb link speed Jeff Kirsher
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff Kirsher @ 2012-05-03  9:56 UTC
  To: davem; +Cc: Chris Boot, netdev, gospo, sassmann, Jeff Kirsher

From: Chris Boot <bootc@bootc.net>

For the 82573, ASPM L1 gets disabled wholesale so this special-case code
is not required. For the 82574 the previous patch does the same as for
the 82573, disabling L1 on the adapter. Thus, this code is no longer
required and can be removed.

Signed-off-by: Chris Boot <bootc@bootc.net>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index ab4505c..9c4576e 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -5272,14 +5272,6 @@ static int e1000_change_mtu(struct net_device *netdev, int new_mtu)
 		return -EINVAL;
 	}
 
-	/* 82573 Errata 17 */
-	if (((adapter->hw.mac.type == e1000_82573) ||
-	     (adapter->hw.mac.type == e1000_82574)) &&
-	    (max_frame > ETH_FRAME_LEN + ETH_FCS_LEN)) {
-		adapter->flags2 |= FLAG2_DISABLE_ASPM_L1;
-		e1000e_disable_aspm(adapter->pdev, PCIE_LINK_STATE_L1);
-	}
-
 	while (test_and_set_bit(__E1000_RESETTING, &adapter->state))
 		usleep_range(1000, 2000);
 	/* e1000e_down -> e1000e_reset dependent on max_frame_size & mtu */
-- 
1.7.7.6


* [net-next 7/9] ixgbevf: Add support to recognize 100mb link speed
  2012-05-03  9:56 [net-next 0/9][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (5 preceding siblings ...)
  2012-05-03  9:56 ` [net-next 6/9] e1000e: Remove special case for 82573/82574 ASPM L1 disablement Jeff Kirsher
@ 2012-05-03  9:56 ` Jeff Kirsher
  2012-05-03  9:56 ` [net-next 8/9] ixgbevf: Make sure jumbo frames are set correctly after PF reset Jeff Kirsher
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff Kirsher @ 2012-05-03  9:56 UTC
  To: davem; +Cc: Greg Rose, netdev, gospo, sassmann, Jeff Kirsher

From: Greg Rose <gregory.v.rose@intel.com>

The X540 10 Gigabit controller is capable of linking at 100 Mb/s; add
support for reporting that link speed.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
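The decoding step can be sketched as a single standalone function. The
mask and field values are the ones from the defines.h hunk of this
patch; the function name `links_reg_to_mbps()` and the choice to return
0 for an unrecognized field are assumptions made for the sketch.

```c
#include <stdint.h>

/* Values taken from the defines.h hunk in this patch. */
#define LINKS_SPEED_MASK 0x30000000u
#define LINKS_SPEED_10G  0x30000000u
#define LINKS_SPEED_1G   0x20000000u
#define LINKS_SPEED_100  0x10000000u

/* Decode the speed field of a LINKS register value into Mb/s; 0 if unknown. */
static unsigned int links_reg_to_mbps(uint32_t links_reg)
{
	switch (links_reg & LINKS_SPEED_MASK) {
	case LINKS_SPEED_10G:
		return 10000;
	case LINKS_SPEED_1G:
		return 1000;
	case LINKS_SPEED_100:
		return 100;
	default:
		return 0; /* field value not covered by the patch */
	}
}
```

Unlike the sketch, the switch added to vf.c has no default case, so a
speed field of zero leaves `*speed` as the caller passed it in.
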
 drivers/net/ethernet/intel/ixgbevf/defines.h |    2 ++
 drivers/net/ethernet/intel/ixgbevf/ethtool.c |   18 ++++++++++++++----
 drivers/net/ethernet/intel/ixgbevf/vf.c      |   12 +++++++++---
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/defines.h b/drivers/net/ethernet/intel/ixgbevf/defines.h
index 947b5c8..e09a6cc 100644
--- a/drivers/net/ethernet/intel/ixgbevf/defines.h
+++ b/drivers/net/ethernet/intel/ixgbevf/defines.h
@@ -40,6 +40,7 @@
 typedef u32 ixgbe_link_speed;
 #define IXGBE_LINK_SPEED_1GB_FULL       0x0020
 #define IXGBE_LINK_SPEED_10GB_FULL      0x0080
+#define IXGBE_LINK_SPEED_100_FULL	0x0008
 
 #define IXGBE_CTRL_RST              0x04000000 /* Reset (SW) */
 #define IXGBE_RXDCTL_ENABLE         0x02000000 /* Enable specific Rx Queue */
@@ -48,6 +49,7 @@ typedef u32 ixgbe_link_speed;
 #define IXGBE_LINKS_SPEED_82599     0x30000000
 #define IXGBE_LINKS_SPEED_10G_82599 0x30000000
 #define IXGBE_LINKS_SPEED_1G_82599  0x20000000
+#define IXGBE_LINKS_SPEED_100_82599 0x10000000
 
 /* Number of Transmit and Receive Descriptors must be a multiple of 8 */
 #define IXGBE_REQ_TX_DESCRIPTOR_MULTIPLE  8
diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index 2bfe0d1..e8dddf5 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -107,10 +107,20 @@ static int ixgbevf_get_settings(struct net_device *netdev,
 	hw->mac.ops.check_link(hw, &link_speed, &link_up, false);
 
 	if (link_up) {
-		ethtool_cmd_speed_set(
-			ecmd,
-			(link_speed == IXGBE_LINK_SPEED_10GB_FULL) ?
-			SPEED_10000 : SPEED_1000);
+		__u32 speed = SPEED_10000;
+		switch (link_speed) {
+		case IXGBE_LINK_SPEED_10GB_FULL:
+			speed = SPEED_10000;
+			break;
+		case IXGBE_LINK_SPEED_1GB_FULL:
+			speed = SPEED_1000;
+			break;
+		case IXGBE_LINK_SPEED_100_FULL:
+			speed = SPEED_100;
+			break;
+		}
+
+		ethtool_cmd_speed_set(ecmd, speed);
 		ecmd->duplex = DUPLEX_FULL;
 	} else {
 		ethtool_cmd_speed_set(ecmd, -1);
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c
index 74be741..ec89b86 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
@@ -404,11 +404,17 @@ static s32 ixgbevf_check_mac_link_vf(struct ixgbe_hw *hw,
 	else
 		*link_up = false;
 
-	if ((links_reg & IXGBE_LINKS_SPEED_82599) ==
-	    IXGBE_LINKS_SPEED_10G_82599)
+	switch (links_reg & IXGBE_LINKS_SPEED_82599) {
+	case IXGBE_LINKS_SPEED_10G_82599:
 		*speed = IXGBE_LINK_SPEED_10GB_FULL;
-	else
+		break;
+	case IXGBE_LINKS_SPEED_1G_82599:
 		*speed = IXGBE_LINK_SPEED_1GB_FULL;
+		break;
+	case IXGBE_LINKS_SPEED_100_82599:
+		*speed = IXGBE_LINK_SPEED_100_FULL;
+		break;
+	}
 
 	return 0;
 }
-- 
1.7.7.6


* [net-next 8/9] ixgbevf: Make sure jumbo frames are set correctly after PF reset
  2012-05-03  9:56 [net-next 0/9][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (6 preceding siblings ...)
  2012-05-03  9:56 ` [net-next 7/9] ixgbevf: Add support to recognize 100mb link speed Jeff Kirsher
@ 2012-05-03  9:56 ` Jeff Kirsher
  2012-05-03  9:56 ` [net-next 9/9] ixgbevf: Update version string Jeff Kirsher
  2012-05-03 17:30 ` [net-next 0/9][pull request] Intel Wired LAN Driver Updates David Miller
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff Kirsher @ 2012-05-03  9:56 UTC
  To: davem; +Cc: Greg Rose, netdev, gospo, sassmann, Jeff Kirsher

From: Greg Rose <gregory.v.rose@intel.com>

If the Physical Function (PF) resets after the Virtual Function (VF) has
set a jumbo frame MTU, the VF's jumbo frame setting is overwritten.  Make
sure the VF driver always requests the proper MTU size after reset
synchronization.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
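A minimal sketch of the mailbox message that the VF now sends every time
it comes up. The helper `build_set_lpe_msg()` is hypothetical, and the
opcode value used for `VF_SET_LPE` is an assumption for illustration;
the frame-size arithmetic follows the hunk in ixgbevf_up_complete().

```c
#include <stdint.h>

#define ETH_HLEN    14 /* Ethernet header length */
#define ETH_FCS_LEN 4  /* frame check sequence */

/* Assumed opcode for the "set large packet enable" mailbox message. */
#define VF_SET_LPE 0x05u

/* Build the two-word message: opcode, then maximum frame size in bytes. */
static void build_set_lpe_msg(uint32_t mtu, uint32_t msg[2])
{
	msg[0] = VF_SET_LPE;
	msg[1] = mtu + ETH_HLEN + ETH_FCS_LEN;
}
```

For a 9000-byte MTU the requested frame size works out to 9018 bytes.
Because the request is now issued from the bring-up path, it is repeated
after any reset that goes through that path, not only when the MTU is
changed.
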
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |    2 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   28 ++++++++++----------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index dfed420..0a1b992 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -287,7 +287,7 @@ extern const struct ixgbe_mbx_operations ixgbevf_mbx_ops;
 extern const char ixgbevf_driver_name[];
 extern const char ixgbevf_driver_version[];
 
-extern int ixgbevf_up(struct ixgbevf_adapter *adapter);
+extern void ixgbevf_up(struct ixgbevf_adapter *adapter);
 extern void ixgbevf_down(struct ixgbevf_adapter *adapter);
 extern void ixgbevf_reinit_locked(struct ixgbevf_adapter *adapter);
 extern void ixgbevf_reset(struct ixgbevf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 307611a..5a0e228 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1608,13 +1608,14 @@ static void ixgbevf_init_last_counter_stats(struct ixgbevf_adapter *adapter)
 	adapter->stats.base_vfmprc = adapter->stats.last_vfmprc;
 }
 
-static int ixgbevf_up_complete(struct ixgbevf_adapter *adapter)
+static void ixgbevf_up_complete(struct ixgbevf_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
 	struct ixgbe_hw *hw = &adapter->hw;
 	int i, j = 0;
 	int num_rx_rings = adapter->num_rx_queues;
 	u32 txdctl, rxdctl;
+	u32 msg[2];
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		j = adapter->tx_ring[i].reg_idx;
@@ -1653,6 +1654,10 @@ static int ixgbevf_up_complete(struct ixgbevf_adapter *adapter)
 			hw->mac.ops.set_rar(hw, 0, hw->mac.perm_addr, 0);
 	}
 
+	msg[0] = IXGBE_VF_SET_LPE;
+	msg[1] = netdev->mtu + ETH_HLEN + ETH_FCS_LEN;
+	hw->mbx.ops.write_posted(hw, msg, 2);
+
 	clear_bit(__IXGBEVF_DOWN, &adapter->state);
 	ixgbevf_napi_enable_all(adapter);
 
@@ -1667,24 +1672,20 @@ static int ixgbevf_up_complete(struct ixgbevf_adapter *adapter)
 	adapter->flags |= IXGBE_FLAG_NEED_LINK_UPDATE;
 	adapter->link_check_timeout = jiffies;
 	mod_timer(&adapter->watchdog_timer, jiffies);
-	return 0;
 }
 
-int ixgbevf_up(struct ixgbevf_adapter *adapter)
+void ixgbevf_up(struct ixgbevf_adapter *adapter)
 {
-	int err;
 	struct ixgbe_hw *hw = &adapter->hw;
 
 	ixgbevf_configure(adapter);
 
-	err = ixgbevf_up_complete(adapter);
+	ixgbevf_up_complete(adapter);
 
 	/* clear any pending interrupts, may auto mask */
 	IXGBE_READ_REG(hw, IXGBE_VTEICR);
 
 	ixgbevf_irq_enable(adapter, true, true);
-
-	return err;
 }
 
 /**
@@ -2673,9 +2674,7 @@ static int ixgbevf_open(struct net_device *netdev)
 	 */
 	ixgbevf_map_rings_to_vectors(adapter);
 
-	err = ixgbevf_up_complete(adapter);
-	if (err)
-		goto err_up;
+	ixgbevf_up_complete(adapter);
 
 	/* clear any pending interrupts, may auto mask */
 	IXGBE_READ_REG(hw, IXGBE_VTEICR);
@@ -2689,7 +2688,6 @@ static int ixgbevf_open(struct net_device *netdev)
 
 err_req_irq:
 	ixgbevf_down(adapter);
-err_up:
 	ixgbevf_free_irq(adapter);
 err_setup_rx:
 	ixgbevf_free_all_rx_resources(adapter);
@@ -3196,9 +3194,11 @@ static int ixgbevf_change_mtu(struct net_device *netdev, int new_mtu)
 	/* must set new MTU before calling down or up */
 	netdev->mtu = new_mtu;
 
-	msg[0] = IXGBE_VF_SET_LPE;
-	msg[1] = max_frame;
-	hw->mbx.ops.write_posted(hw, msg, 2);
+	if (!netif_running(netdev)) {
+		msg[0] = IXGBE_VF_SET_LPE;
+		msg[1] = max_frame;
+		hw->mbx.ops.write_posted(hw, msg, 2);
+	}
 
 	if (netif_running(netdev))
 		ixgbevf_reinit_locked(adapter);
-- 
1.7.7.6


* [net-next 9/9] ixgbevf: Update version string
  2012-05-03  9:56 [net-next 0/9][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (7 preceding siblings ...)
  2012-05-03  9:56 ` [net-next 8/9] ixgbevf: Make sure jumbo frames are set correctly after PF reset Jeff Kirsher
@ 2012-05-03  9:56 ` Jeff Kirsher
  2012-05-03 17:30 ` [net-next 0/9][pull request] Intel Wired LAN Driver Updates David Miller
  9 siblings, 0 replies; 16+ messages in thread
From: Jeff Kirsher @ 2012-05-03  9:56 UTC
  To: davem; +Cc: Greg Rose, netdev, gospo, sassmann, Jeff Kirsher

From: Greg Rose <gregory.v.rose@intel.com>

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 5a0e228..f69ec42 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -57,7 +57,7 @@ const char ixgbevf_driver_name[] = "ixgbevf";
 static const char ixgbevf_driver_string[] =
 	"Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver";
 
-#define DRV_VERSION "2.2.0-k"
+#define DRV_VERSION "2.6.0-k"
 const char ixgbevf_driver_version[] = DRV_VERSION;
 static char ixgbevf_copyright[] =
 	"Copyright (c) 2009 - 2012 Intel Corporation.";
-- 
1.7.7.6


* Re: [net-next 5/9] e1000e: Disable ASPM L1 on 82574
  2012-05-03  9:56 ` [net-next 5/9] e1000e: Disable ASPM L1 on 82574 Jeff Kirsher
@ 2012-05-03 10:08   ` Nix
  2012-05-03 20:12     ` Wyborny, Carolyn
  0 siblings, 1 reply; 16+ messages in thread
From: Nix @ 2012-05-03 10:08 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Chris Boot, netdev, gospo, sassmann, Wyborny, Carolyn

On 3 May 2012, Jeff Kirsher spake thusly:

> From: Chris Boot <bootc@bootc.net>
>
> ASPM on the 82574 causes trouble. Currently the driver disables L0s for
> this NIC but only disables L1 if the MTU is >1500. This patch simply
> causes L1 to be disabled regardless of the MTU setting.
>
> Signed-off-by: Chris Boot <bootc@bootc.net>
> Cc: "Wyborny, Carolyn" <carolyn.wyborny@intel.com>
> Cc: Nix <nix@esperi.org.uk>
> Link: https://lkml.org/lkml/2012/3/19/362
> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

(reminder: this is known not to fix the instance of this problem I am
experiencing, where ASPM is being re-enabled by something even if turned
off via setpci during boot, though it does fix those instances seen by
others where that doesn't happen. I'd have done more printf()-scattering
debugging to see where it's turned back on if it wasn't that this is
happening on an always-on server for which rebooting outside the dead of
night is a long-winded chore...)

FWIW I have also seen -- very rare -- lockups of the same nature on
82574L links in 100MbE mode using non-jumbo frames. However they are far
more common on GbE jumbo-framed links, normally taking less than an hour
to take the link down with a wildly corrupted register set (as shown by
ethtool).

(It's annoying this firmware isn't flashable so we could just *fix* this
bug rather than working around it. :( )


I think I might cheat a bit next and printk_once() the state of ASPM L1
on the errant PCI device from inside the scheduler when it flips from L1
off to L1 on again. At 100 tests per second that should indicate at what
time the thing is turned back on fairly tightly: even if not providing a
direct clue as to which bit of the kernel is doing it, if I combine it
with a set -x in userspace it should at least indicate what bit of the
boot process is happening at the same time. It'll be the weekend before
I can try that though.
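
A rough sketch of the check such a sampler would make, for illustration
only. The bit positions are the ASPM Control field of the PCIe Link
Control register (bits 1:0: L0s and L1); struct aspm_watch and both
function names are made up here, and reading the register from the
device is left out entirely:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASPM Control field of the PCIe Link Control register (bits 1:0). */
#define LNKCTL_ASPM_L0S 0x0001u
#define LNKCTL_ASPM_L1  0x0002u

/* Hypothetical watcher state: remembers the last observed L1 setting. */
struct aspm_watch {
	bool seen;      /* at least one sample has been taken */
	bool l1_on;     /* L1 bit in the most recent sample */
	bool reported;  /* the off->on flip was already reported once */
};

/* True if the L1 bit is set in a Link Control register value. */
static bool aspm_l1_enabled(uint16_t lnkctl)
{
	return (lnkctl & LNKCTL_ASPM_L1) != 0;
}

/*
 * Feed one Link Control sample. Returns true only the first time L1 is
 * seen going from off to on, mimicking a printk_once()-style report.
 */
static bool aspm_watch_sample(struct aspm_watch *w, uint16_t lnkctl)
{
	bool now, flipped;

	assert(w != NULL);
	now = aspm_l1_enabled(lnkctl);
	flipped = w->seen && !w->l1_on && now;
	w->seen = true;
	w->l1_on = now;
	if (!flipped || w->reported)
		return false;
	w->reported = true;
	return true;
}
```

Note that a watcher like this never fires if L1 is already on in the
very first sample, since it only reports a transition.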

-- 
NULL && (void)


* Re: [net-next 0/9][pull request] Intel Wired LAN Driver Updates
  2012-05-03  9:56 [net-next 0/9][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (8 preceding siblings ...)
  2012-05-03  9:56 ` [net-next 9/9] ixgbevf: Update version string Jeff Kirsher
@ 2012-05-03 17:30 ` David Miller
  9 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2012-05-03 17:30 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu,  3 May 2012 02:56:23 -0700

> This series of patches contains updates for e1000e and ixgbevf.
> 
> The following are changes since commit af94bf6db1d58d26f1cdab145b6312ad363254a6:
>   ixgbe: Fix use after free on module remove
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Pulled, thanks Jeff.


* RE: [net-next 5/9] e1000e: Disable ASPM L1 on 82574
  2012-05-03 10:08   ` Nix
@ 2012-05-03 20:12     ` Wyborny, Carolyn
  2012-05-03 20:17       ` Nix
  0 siblings, 1 reply; 16+ messages in thread
From: Wyborny, Carolyn @ 2012-05-03 20:12 UTC (permalink / raw)
  To: Nix, Kirsher, Jeffrey T; +Cc: davem, Chris Boot, netdev, gospo, sassmann



>-----Original Message-----
>From: Nix [mailto:nix@esperi.org.uk]
>Sent: Thursday, May 03, 2012 3:09 AM
>To: Kirsher, Jeffrey T
>Cc: davem@davemloft.net; Chris Boot; netdev@vger.kernel.org;
>gospo@redhat.com; sassmann@redhat.com; Wyborny, Carolyn
>Subject: Re: [net-next 5/9] e1000e: Disable ASPM L1 on 82574
>
 [..]
>(reminder: this is known not to fix the instance of this problem I am
>experiencing, where ASPM is being re-enabled by something even if turned
>off via setpci during boot, though it does fix those instances seen by
>others where that doesn't happen. I'd have done more printf()-scattering
>debugging to see where it's turned back on if it wasn't that this is
>happening on an always-on server for which rebooting outside the dead of
>night is a long-winded chore...)
>
>FWIW I have also seen -- very rare -- lockups of the same nature on
>82574L links in 100MbE mode using non-jumbo frames. However they are far
>more common on GbE jumbo-framed links, normally taking less than an hour
>to take the link down with a wildly corrupted register set (as shown by
>ethtool).
>
>(It's annoying this firmware isn't flashable so we could just *fix* this
>bug rather than working around it. :( )
>
>
>I think I might cheat a bit next and printk_once() the state of ASPM L1
>on the errant PCI device from inside the scheduler when it flips from L1
>off to L1 on again. At 100 tests per second that should indicate at what
>time the thing is turned back on fairly tightly: even if not providing a
>direct clue as to which bit of the kernel is doing it, if I combine it
>with a set -x in userspace it should at least indicate what bit of the
>boot process is happening at the same time. It'll be the weekend before
>I can try that though.
>
>--
>NULL && (void)

Hello,

It would be good to know why/how your system is re-enabling the
setting.  The problem is not solvable in firmware unfortunately and is
somewhat platform dependent. MMIO-tracer might be used to try and see
when the re-enabling config space is written, but it might be too
heavyweight for a live production system.

I am also working on a patch to the driver to detect the condition and
then do a slot reset to avoid a whole system reboot.  Would you be
willing to test it in your problem system?

Thanks,

Carolyn


* Re: [net-next 5/9] e1000e: Disable ASPM L1 on 82574
  2012-05-03 20:12     ` Wyborny, Carolyn
@ 2012-05-03 20:17       ` Nix
  2012-05-05 16:33         ` Nix
  0 siblings, 1 reply; 16+ messages in thread
From: Nix @ 2012-05-03 20:17 UTC (permalink / raw)
  To: Wyborny, Carolyn
  Cc: Kirsher, Jeffrey T, davem, Chris Boot, netdev, gospo, sassmann

On 3 May 2012, Carolyn Wyborny told this:

> It would be good to know why/how your system is re-enabling the
> setting. The problem is not solvable in firmware unfortunately and is
> somewhat platform dependent. MMIO-tracer might be used to try and see

I entirely forgot about that tool! *Definitely* worth trying.

I'll give it a try this weekend.

> when the re-enabling config space is written, but it might be too
> heavyweight for a live production system.

Given that the re-enabling happens at around the same time as the boot
scripts finish running (it's done by the time I can log in), that's not
going to be a problem. Hence my speculation that it's being re-enabled
when the interface stabilizes (which is, of course, asynchronous) or
something like that.

> I am also working on a patch to the driver to detect the condition and
> then do a slot reset to avoid a whole system reboot. Would you be
> willing to test it in your problem system?

Yes, definitely. The whole-system reboot is irritating because the
system is headless, and with its NICs dead that means a big red switch
to reboot when this problem strikes, which gives me the heebie-jeebies
:)

(Turning off ASPM definitely completely fixes it, so it *is* this bug.
It's just getting the disabling to stick that's proving tricky.)

-- 
NULL && (void)


* Re: [net-next 5/9] e1000e: Disable ASPM L1 on 82574
  2012-05-03 20:17       ` Nix
@ 2012-05-05 16:33         ` Nix
  2012-05-09 14:02           ` Nix
  0 siblings, 1 reply; 16+ messages in thread
From: Nix @ 2012-05-05 16:33 UTC (permalink / raw)
  To: Wyborny, Carolyn, Matthew Garrett
  Cc: Kirsher, Jeffrey T, davem, Chris Boot, netdev, gospo, sassmann

On 3 May 2012, nix@esperi.org.uk outgrape:

> On 3 May 2012, Carolyn Wyborny told this:
>
>> It would be good to know why/how your system is re-enabling the
>> setting. The problem is not solvable in firmware unfortunately and is
>> somewhat platform dependent. MMIO-tracer might be used to try and see
>
> I entirely forgot about that tool! *Definitely* worth trying.
>
> I'll give it a try this weekend.

Well, mmiotrace was a total flop: massive numbers of unexpected
secondary interrupts and a hard lockup. Still, I've now diagnosed this
bug and it's right up Matthew Garrett's street!

Matthew: the problem here is a server with an 82574L (controlled by the
e1000e driver). This NIC has a hardware bug that makes it lock up within
an hour or two, in a way that only a reboot can solve, if PCIe ASPM is
not disabled during boot (leaving me with my home directory stuck behind
a dead NIC on a headless machine, most annoying). The driver is
attempting to disable it, but failing.

>> when the re-enabling config space is written, but it might be too
>> heavyweight for a live production system.
>
> Given that the re-enabling happens at around the same time as the boot
> scripts finish running (it's done by the time I can log in), that's not
> going to be a problem. Hence my speculation that it's being re-enabled
> when the interface stabilizes (which is, of course, asynchronous) or
> something like that.

This is wrong. The disable never happens. The BIOS has been told to
enable PCIe ASPM. However, the kernel log says:

May  5 17:06:53 spindle info: [    0.629699]  pci0000:00: Requesting ACPI _OSC control (0x1d)
May  5 17:06:53 spindle info: [    0.629941]  pci0000:00: ACPI _OSC request failed (AE_NOT_FOUND), returned control mask: 0x1d
May  5 17:06:53 spindle info: [    0.630373] ACPI _OSC control for PCIe not granted, disabling ASPM

Unless pcie_aspm=force has been specified on the kernel command line,
this flips aspm_disabled to 1.

The e1000e driver then says (with a bit of extra debugging info I
added):

May  5 17:06:53 spindle info: [    1.248153] e1000e 0000:03:00.0: Disabling ASPM L0s L1
May  5 17:06:53 spindle info: [    1.248393] e1000e 0000:03:00.0: Disabling ASPM via pci_disable_link_state_locked()
May  5 17:06:53 spindle info: [    1.248823] e1000e 0000:03:00.0: aspm disabled, not forcing

i.e. because aspm_disabled is set, pci/pcie/aspm.c refuses to make any
changes at all to ASPM link state, not even to turn *off* ASPM on a
device on which the BIOS turned it on at boot. So ASPM remains enabled
and the NIC eventually locks up.
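
As a toy model of the behaviour described above (not the code in
pci/pcie/aspm.c; the struct, the macros and the function name are all
invented for illustration), the effect is roughly:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ASPM_L0S 0x1u
#define TOY_ASPM_L1  0x2u

/* Toy stand-in for a PCIe link; not the kernel's own link structure. */
struct toy_link {
	unsigned int aspm_enabled; /* ASPM states currently enabled */
};

/*
 * Toy version of the reported behaviour: when the global aspm_disabled
 * flag is set, a driver's request to disable states is ignored entirely.
 * Returns true if the request was applied to the link.
 */
static bool toy_disable_link_state(struct toy_link *link, unsigned int states,
				   bool aspm_disabled)
{
	assert(link != NULL);
	if (aspm_disabled)
		return false; /* "aspm disabled, not forcing" */
	link->aspm_enabled &= ~states;
	return true;
}
```

With aspm_disabled set, a link that the BIOS left with L0s and L1
enabled keeps both bits however often the driver asks for them to be
cleared.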

The question here is how to fix it. It appears that the motherboard or
BIOS on this machine does not grant _OSC control even (especially?) if
you have turned on PCIe ASPM in the BIOS. But perhaps even if _OSC is
not granted you should permit PCIe ASPM to be *disabled* by drivers, just
not enabled? (The BIOS appears to be buggy in this area: if you turn off
ASPM, save, and go back into setup, ASPM has turned itself back on
again!)

I'm not sure what the right thing to do is here: I don't know enough
about this area. But it does seem very strange that the only way I have
to turn off PCIe ASPM reliably on this device is to tell the kernel to
forcibly turn it *on*!


* Re: [net-next 5/9] e1000e: Disable ASPM L1 on 82574
  2012-05-05 16:33         ` Nix
@ 2012-05-09 14:02           ` Nix
  0 siblings, 0 replies; 16+ messages in thread
From: Nix @ 2012-05-09 14:02 UTC (permalink / raw)
  To: Wyborny, Carolyn
  Cc: Matthew Garrett, Kirsher, Jeffrey T, davem, Chris Boot, netdev,
	gospo, sassmann

On 5 May 2012, nix@esperi.org.uk outgrape:

> The question here is how to fix it. It appears that the motherboard or
> BIOS on this machine does not grant _OSC control even (especially?) if
> you have turned on PCIe ASPM in the BIOS. But perhaps even if _OSC is
> not granted you should permit PCIe ASPM to be *disabled* by drivers, just
> not enabled? (The BIOS appears to be buggy in this area: if you turn off
> ASPM, save, and go back into setup, ASPM has turned itself back on
> again!)

This turned out to be me not knowing how to drive the BIOS's deeply
unintuitive configuration program. If I turn PCIe ASPM off in the BIOS,
the kernel does exactly the same as it does in my previous message (i.e.
decides that ASPM is disabled due to the failure of an _OSC request,
then refuses to change the ASPM link state of the e1000e), but since the
BIOS has already disabled ASPM, the card is not in crash-happy mode and
I don't need to force anything off by hand.

But if I turn ASPM on, as reported in my previous message the kernel
promptly bans itself from changing any PCIe ASPM link states whatsoever,
and the e1000e locks up about an hour later.

I presume that

May  5 17:06:53 spindle info: [    0.629699]  pci0000:00: Requesting ACPI _OSC control (0x1d)
May  5 17:06:53 spindle info: [    0.629941]  pci0000:00: ACPI _OSC request failed (AE_NOT_FOUND), returned control mask: 0x1d
May  5 17:06:53 spindle info: [    0.630373] ACPI _OSC control for PCIe not granted, disabling ASPM

is reporting some sort of BIOS bug, but it is at best confusing to have
the boot messages reporting that ASPM is disabled: better perhaps to
describe this as 'leaving ASPM on all devices how the BIOS set it' and
have the e1000e driver emit a giant flaming warning if it spots this
happening on an 82574L with ASPM turned on. (Or, alternatively, permit
ASPM to be turned off when the system is in this state, but not on,
whereupon the existing code in the e1000e driver will do the right
thing. But I don't know if that will break any laptops. This machine is
very much not a laptop.)
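
To make the second alternative concrete, here is a sketch of the policy
I have in mind. It is purely illustrative: none of these names exist in
the kernel, and whether firmware keeps control of ASPM is reduced to a
single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_ASPM_L0S 0x1u
#define SKETCH_ASPM_L1  0x2u

/* Hypothetical link record used only for this sketch. */
struct sketch_link {
	unsigned int aspm_enabled; /* ASPM states currently enabled */
};

/*
 * Suggested policy: if firmware has kept control of ASPM, the OS may
 * only clear states that are already enabled, never set new ones.
 * Otherwise the requested states are applied as given.
 * Returns the resulting set of enabled states.
 */
static unsigned int sketch_set_link_state(struct sketch_link *link,
					  unsigned int wanted,
					  bool firmware_owns_aspm)
{
	assert(link != NULL);
	if (firmware_owns_aspm)
		link->aspm_enabled &= wanted; /* turning off is allowed */
	else
		link->aspm_enabled = wanted;  /* full control */
	return link->aspm_enabled;
}
```

Under this policy a driver asking for "no ASPM" gets it even without
_OSC control, while a request to enable L1 on a link where firmware left
it off is silently dropped.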

> I'm not sure what the right thing to do is here: I don't know enough
> about this area. But it does seem very strange that the only way I have
> to turn off PCIe ASPM reliably on this device is to tell the kernel to
> forcibly turn it *on*!

This is still strange, though it seems that turning ASPM completely off
in the BIOS will also serve.
