All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] i40e: fix firmware update
@ 2017-09-01 14:02 ` Stefan Assmann
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Assmann @ 2017-09-01 14:02 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, davem, jeffrey.t.kirsher, sassmann

The first patch fixes the firmware update which is currently broken and
results in a bad flash (corrupt firmware). Recovery is possible with a
fixed driver.
The second patch reverts a commit that causes the firmware checksum
verification to fail right after a successful flash. This is related to
a recent workqueue change. Haven't gotten to the bottom of this yet, but
for the sake of a smooth firmware update experience let's revert the
commit for now.

Stefan Assmann (2):
  i40e: use non-locking i40e_read_nvm_word() function during nvmupdate
  Revert "i40e: remove WQ_UNBOUND and the task limit of our workqueue"

 drivers/net/ethernet/intel/i40e/i40e_main.c | 12 +++++-------
 drivers/net/ethernet/intel/i40e/i40e_nvm.c  | 24 ++++++++++++++++++++++--
 2 files changed, 27 insertions(+), 9 deletions(-)

-- 
2.13.5

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Intel-wired-lan] [PATCH 0/2] i40e: fix firmware update
@ 2017-09-01 14:02 ` Stefan Assmann
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Assmann @ 2017-09-01 14:02 UTC (permalink / raw)
  To: intel-wired-lan

The first patch fixes the firmware update which is currently broken and
results in a bad flash (corrupt firmware). Recovery is possible with a
fixed driver.
The second patch reverts a commit that causes the firmware checksum
verification to fail right after a successful flash. This is related to
a recent workqueue change. Haven't gotten to the bottom of this yet, but
for the sake of a smooth firmware update experience let's revert the
commit for now.

Stefan Assmann (2):
  i40e: use non-locking i40e_read_nvm_word() function during nvmupdate
  Revert "i40e: remove WQ_UNBOUND and the task limit of our workqueue"

 drivers/net/ethernet/intel/i40e/i40e_main.c | 12 +++++-------
 drivers/net/ethernet/intel/i40e/i40e_nvm.c  | 24 ++++++++++++++++++++++--
 2 files changed, 27 insertions(+), 9 deletions(-)

-- 
2.13.5


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] i40e: use non-locking i40e_read_nvm_word() function during nvmupdate
  2017-09-01 14:02 ` [Intel-wired-lan] " Stefan Assmann
@ 2017-09-01 14:02   ` Stefan Assmann
  -1 siblings, 0 replies; 8+ messages in thread
From: Stefan Assmann @ 2017-09-01 14:02 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, davem, jeffrey.t.kirsher, sassmann

During firmware update the adminq gets locked. Currently during the
update i40e_calc_nvm_checksum() calls i40e_read_nvm_word() which itself
tries to acquire the adminq lock. This fails as the lock is already
taken.
This results in the firmware update to fail, leaving the eeprom in a
corrupt state.

To fix this, introducing __i40e_read_nvm_word() which avoids locking and
replace the calls to i40e_read_nvm_word() in i40e_calc_nvm_checksum()
with the newly introduced function. The change matches the i40e
sourceforge driver as much as possible.

With this in place the firmware update is now working again.
Fixes: ("96a39aed25e6 i40e: Acquire NVM lock before reads on all devices")

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
---
 drivers/net/ethernet/intel/i40e/i40e_nvm.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_nvm.c b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
index 96afef98a08f..aa4cfdc51d2b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_nvm.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
@@ -322,6 +322,26 @@ i40e_status i40e_read_nvm_word(struct i40e_hw *hw, u16 offset,
 }
 
 /**
+ * __i40e_read_nvm_word - Reads nvm word, assumes caller does the locking
+ * @hw: pointer to the HW structure
+ * @offset: offset of the Shadow RAM word to read (0x000000 - 0x001FFF)
+ * @data: word read from the Shadow RAM
+ *
+ * Reads one 16 bit word from the Shadow RAM using the GLNVM_SRCTL register.
+ **/
+static i40e_status __i40e_read_nvm_word(struct i40e_hw *hw,
+					u16 offset, u16 *data)
+{
+	enum i40e_status_code ret_code = 0;
+
+	if (hw->flags & I40E_HW_FLAG_AQ_SRCTL_ACCESS_ENABLE)
+		ret_code = i40e_read_nvm_word_aq(hw, offset, data);
+	else
+		ret_code = i40e_read_nvm_word_srctl(hw, offset, data);
+	return ret_code;
+}
+
+/**
  * i40e_read_nvm_buffer_srctl - Reads Shadow RAM buffer via SRCTL register
  * @hw: pointer to the HW structure
  * @offset: offset of the Shadow RAM word to read (0x000000 - 0x001FFF).
@@ -516,14 +536,14 @@ static i40e_status i40e_calc_nvm_checksum(struct i40e_hw *hw,
 	data = (u16 *)vmem.va;
 
 	/* read pointer to VPD area */
-	ret_code = i40e_read_nvm_word(hw, I40E_SR_VPD_PTR, &vpd_module);
+	ret_code = __i40e_read_nvm_word(hw, I40E_SR_VPD_PTR, &vpd_module);
 	if (ret_code) {
 		ret_code = I40E_ERR_NVM_CHECKSUM;
 		goto i40e_calc_nvm_checksum_exit;
 	}
 
 	/* read pointer to PCIe Alt Auto-load module */
-	ret_code = i40e_read_nvm_word(hw, I40E_SR_PCIE_ALT_AUTO_LOAD_PTR,
+	ret_code = __i40e_read_nvm_word(hw, I40E_SR_PCIE_ALT_AUTO_LOAD_PTR,
 				      &pcie_alt_module);
 	if (ret_code) {
 		ret_code = I40E_ERR_NVM_CHECKSUM;
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Intel-wired-lan] [PATCH 1/2] i40e: use non-locking i40e_read_nvm_word() function during nvmupdate
@ 2017-09-01 14:02   ` Stefan Assmann
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Assmann @ 2017-09-01 14:02 UTC (permalink / raw)
  To: intel-wired-lan

During firmware update the adminq gets locked. Currently during the
update i40e_calc_nvm_checksum() calls i40e_read_nvm_word() which itself
tries to acquire the adminq lock. This fails as the lock is already
taken.
This results in the firmware update to fail, leaving the eeprom in a
corrupt state.

To fix this, introducing __i40e_read_nvm_word() which avoids locking and
replace the calls to i40e_read_nvm_word() in i40e_calc_nvm_checksum()
with the newly introduced function. The change matches the i40e
sourceforge driver as much as possible.

With this in place the firmware update is now working again.
Fixes: ("96a39aed25e6 i40e: Acquire NVM lock before reads on all devices")

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
---
 drivers/net/ethernet/intel/i40e/i40e_nvm.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_nvm.c b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
index 96afef98a08f..aa4cfdc51d2b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_nvm.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
@@ -322,6 +322,26 @@ i40e_status i40e_read_nvm_word(struct i40e_hw *hw, u16 offset,
 }
 
 /**
+ * __i40e_read_nvm_word - Reads nvm word, assumes caller does the locking
+ * @hw: pointer to the HW structure
+ * @offset: offset of the Shadow RAM word to read (0x000000 - 0x001FFF)
+ * @data: word read from the Shadow RAM
+ *
+ * Reads one 16 bit word from the Shadow RAM using the GLNVM_SRCTL register.
+ **/
+static i40e_status __i40e_read_nvm_word(struct i40e_hw *hw,
+					u16 offset, u16 *data)
+{
+	enum i40e_status_code ret_code = 0;
+
+	if (hw->flags & I40E_HW_FLAG_AQ_SRCTL_ACCESS_ENABLE)
+		ret_code = i40e_read_nvm_word_aq(hw, offset, data);
+	else
+		ret_code = i40e_read_nvm_word_srctl(hw, offset, data);
+	return ret_code;
+}
+
+/**
  * i40e_read_nvm_buffer_srctl - Reads Shadow RAM buffer via SRCTL register
  * @hw: pointer to the HW structure
  * @offset: offset of the Shadow RAM word to read (0x000000 - 0x001FFF).
@@ -516,14 +536,14 @@ static i40e_status i40e_calc_nvm_checksum(struct i40e_hw *hw,
 	data = (u16 *)vmem.va;
 
 	/* read pointer to VPD area */
-	ret_code = i40e_read_nvm_word(hw, I40E_SR_VPD_PTR, &vpd_module);
+	ret_code = __i40e_read_nvm_word(hw, I40E_SR_VPD_PTR, &vpd_module);
 	if (ret_code) {
 		ret_code = I40E_ERR_NVM_CHECKSUM;
 		goto i40e_calc_nvm_checksum_exit;
 	}
 
 	/* read pointer to PCIe Alt Auto-load module */
-	ret_code = i40e_read_nvm_word(hw, I40E_SR_PCIE_ALT_AUTO_LOAD_PTR,
+	ret_code = __i40e_read_nvm_word(hw, I40E_SR_PCIE_ALT_AUTO_LOAD_PTR,
 				      &pcie_alt_module);
 	if (ret_code) {
 		ret_code = I40E_ERR_NVM_CHECKSUM;
-- 
2.13.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] Revert "i40e: remove WQ_UNBOUND and the task limit of our workqueue"
  2017-09-01 14:02 ` [Intel-wired-lan] " Stefan Assmann
@ 2017-09-01 14:02   ` Stefan Assmann
  -1 siblings, 0 replies; 8+ messages in thread
From: Stefan Assmann @ 2017-09-01 14:02 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, davem, jeffrey.t.kirsher, sassmann

This reverts commit 4d5957cbdecdbb77d24c1465caadd801c07afa4a.

Due to this workqueue change the eeprom check right after flashing
firmware fails, although the flash itself completed successfully.
The error observed looks like this
i40e 0000:88:00.0: eeprom check failed (-62), Tx/Rx traffic disabled
The NIC is fully operational after the flash and even after a cold boot
any follow-up eeprom checks succeed.

This needs to be investigated, but for now it should be more important
to make sure the firmware update works as expected.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 6498da8806cb..069b6683e1b0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -12160,14 +12160,12 @@ static int __init i40e_init_module(void)
 		i40e_driver_string, i40e_driver_version_str);
 	pr_info("%s: %s\n", i40e_driver_name, i40e_copyright);
 
-	/* There is no need to throttle the number of active tasks because
-	 * each device limits its own task using a state bit for scheduling
-	 * the service task, and the device tasks do not interfere with each
-	 * other, so we don't set a max task limit. We must set WQ_MEM_RECLAIM
-	 * since we need to be able to guarantee forward progress even under
-	 * memory pressure.
+	/* we will see if single thread per module is enough for now,
+	 * it can't be any worse than using the system workqueue which
+	 * was already single threaded
 	 */
-	i40e_wq = alloc_workqueue("%s", WQ_MEM_RECLAIM, 0, i40e_driver_name);
+	i40e_wq = alloc_workqueue("%s", WQ_UNBOUND | WQ_MEM_RECLAIM, 1,
+				  i40e_driver_name);
 	if (!i40e_wq) {
 		pr_err("%s: Failed to create workqueue\n", i40e_driver_name);
 		return -ENOMEM;
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Intel-wired-lan] [PATCH 2/2] Revert "i40e: remove WQ_UNBOUND and the task limit of our workqueue"
@ 2017-09-01 14:02   ` Stefan Assmann
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Assmann @ 2017-09-01 14:02 UTC (permalink / raw)
  To: intel-wired-lan

This reverts commit 4d5957cbdecdbb77d24c1465caadd801c07afa4a.

Due to this workqueue change the eeprom check right after flashing
firmware fails, although the flash itself completed successfully.
The error observed looks like this
i40e 0000:88:00.0: eeprom check failed (-62), Tx/Rx traffic disabled
The NIC is fully operational after the flash and even after a cold boot
any follow-up eeprom checks succeed.

This needs to be investigated, but for now it should be more important
to make sure the firmware update works as expected.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 6498da8806cb..069b6683e1b0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -12160,14 +12160,12 @@ static int __init i40e_init_module(void)
 		i40e_driver_string, i40e_driver_version_str);
 	pr_info("%s: %s\n", i40e_driver_name, i40e_copyright);
 
-	/* There is no need to throttle the number of active tasks because
-	 * each device limits its own task using a state bit for scheduling
-	 * the service task, and the device tasks do not interfere with each
-	 * other, so we don't set a max task limit. We must set WQ_MEM_RECLAIM
-	 * since we need to be able to guarantee forward progress even under
-	 * memory pressure.
+	/* we will see if single thread per module is enough for now,
+	 * it can't be any worse than using the system workqueue which
+	 * was already single threaded
 	 */
-	i40e_wq = alloc_workqueue("%s", WQ_MEM_RECLAIM, 0, i40e_driver_name);
+	i40e_wq = alloc_workqueue("%s", WQ_UNBOUND | WQ_MEM_RECLAIM, 1,
+				  i40e_driver_name);
 	if (!i40e_wq) {
 		pr_err("%s: Failed to create workqueue\n", i40e_driver_name);
 		return -ENOMEM;
-- 
2.13.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* RE: [PATCH 0/2] i40e: fix firmware update
  2017-09-01 14:02 ` [Intel-wired-lan] " Stefan Assmann
@ 2017-09-01 19:19   ` Keller, Jacob E
  -1 siblings, 0 replies; 8+ messages in thread
From: Keller, Jacob E @ 2017-09-01 19:19 UTC (permalink / raw)
  To: Stefan Assmann, intel-wired-lan; +Cc: netdev, davem, Kirsher, Jeffrey T



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Stefan Assmann
> Sent: Friday, September 01, 2017 7:03 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; davem@davemloft.net; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; sassmann@kpanic.de
> Subject: [PATCH 0/2] i40e: fix firmware update
> 
> The first patch fixes the firmware update which is currently broken and
> results in a bad flash (corrupt firmware). Recovery is possible with a
> fixed driver.
> The second patch reverts a commit that causes the firmware checksum
> verification to fail right after a successful flash. This is related to
> a recent workqueue change. Haven't gotten to the bottom of this yet, but
> for the sake of a smooth firmware update experience let's revert the
> commit for now.

Hi Stefan,

Thanks for these patches, I apologize for the time it took for us to respond to this. 

The first patch is functionally correct, and I'm surprised we missed sending an equivalent ourselves. It looks like some related changes occurred around this code, and we failed to submit the patch.

I think Jeff would prefer if we send the version based directly on the out-of-tree code, which I will be reviving and submitting shortly.

The second issue I believe is not fixed correctly by the patch, I'm unsure why exactly changing the WQ would cause this but I believe that a similar patch which creates a non-locked version of i40e_nvm_read_buffer() will resolve this, and I will be sending that patch as well, which I believe is the real fix versus halting the work queue.

Thanks,
Jake

> 
> Stefan Assmann (2):
>   i40e: use non-locking i40e_read_nvm_word() function during nvmupdate
>   Revert "i40e: remove WQ_UNBOUND and the task limit of our workqueue"
> 
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 12 +++++-------
>  drivers/net/ethernet/intel/i40e/i40e_nvm.c  | 24 ++++++++++++++++++++++--
>  2 files changed, 27 insertions(+), 9 deletions(-)
> 
> --
> 2.13.5

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Intel-wired-lan] [PATCH 0/2] i40e: fix firmware update
@ 2017-09-01 19:19   ` Keller, Jacob E
  0 siblings, 0 replies; 8+ messages in thread
From: Keller, Jacob E @ 2017-09-01 19:19 UTC (permalink / raw)
  To: intel-wired-lan



> -----Original Message-----
> From: netdev-owner at vger.kernel.org [mailto:netdev-owner at vger.kernel.org]
> On Behalf Of Stefan Assmann
> Sent: Friday, September 01, 2017 7:03 AM
> To: intel-wired-lan at lists.osuosl.org
> Cc: netdev at vger.kernel.org; davem at davemloft.net; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; sassmann at kpanic.de
> Subject: [PATCH 0/2] i40e: fix firmware update
> 
> The first patch fixes the firmware update which is currently broken and
> results in a bad flash (corrupt firmware). Recovery is possible with a
> fixed driver.
> The second patch reverts a commit that causes the firmware checksum
> verification to fail right after a successful flash. This is related to
> a recent workqueue change. Haven't gotten to the bottom of this yet, but
> for the sake of a smooth firmware update experience let's revert the
> commit for now.

Hi Stefan,

Thanks for these patches, I apologize for the time it took for us to respond to this. 

The first patch is functionally correct, and I'm surprised we missed sending an equivalent ourselves. It looks like some related changes occurred around this code, and we failed to submit the patch.

I think Jeff would prefer if we send the version based directly on the out-of-tree code, which I will be reviving and submitting shortly.

The second issue I believe is not fixed correctly by the patch, I'm unsure why exactly changing the WQ would cause this but I believe that a similar patch which creates a non-locked version of i40e_nvm_read_buffer() will resolve this, and I will be sending that patch as well, which I believe is the real fix versus halting the work queue.

Thanks,
Jake

> 
> Stefan Assmann (2):
>   i40e: use non-locking i40e_read_nvm_word() function during nvmupdate
>   Revert "i40e: remove WQ_UNBOUND and the task limit of our workqueue"
> 
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 12 +++++-------
>  drivers/net/ethernet/intel/i40e/i40e_nvm.c  | 24 ++++++++++++++++++++++--
>  2 files changed, 27 insertions(+), 9 deletions(-)
> 
> --
> 2.13.5



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-09-01 19:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-01 14:02 [PATCH 0/2] i40e: fix firmware update Stefan Assmann
2017-09-01 14:02 ` [Intel-wired-lan] " Stefan Assmann
2017-09-01 14:02 ` [PATCH 1/2] i40e: use non-locking i40e_read_nvm_word() function during nvmupdate Stefan Assmann
2017-09-01 14:02   ` [Intel-wired-lan] " Stefan Assmann
2017-09-01 14:02 ` [PATCH 2/2] Revert "i40e: remove WQ_UNBOUND and the task limit of our workqueue" Stefan Assmann
2017-09-01 14:02   ` [Intel-wired-lan] " Stefan Assmann
2017-09-01 19:19 ` [PATCH 0/2] i40e: fix firmware update Keller, Jacob E
2017-09-01 19:19   ` [Intel-wired-lan] " Keller, Jacob E

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.