netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/4][pull request] Intel Wired LAN Driver Updates 2021-01-28
@ 2021-01-28 21:38 Tony Nguyen
  2021-01-28 21:38 ` [PATCH net 1/4] igc: Report speed and duplex as unknown when device is runtime suspended Tony Nguyen
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Tony Nguyen @ 2021-01-28 21:38 UTC (permalink / raw)
  To: davem, kuba; +Cc: Tony Nguyen, netdev, sassmann

This series contains updates to igc and i40e drivers.

Kai-Heng Feng fixes igc to report unknown speed and duplex during suspend
as an attempted read will cause errors.

Kevin Lo sets the default value to -IGC_ERR_NVM instead of success for
writing shadow RAM as this could miss a timeout. Also propagates the return
value for Flow Control configuration to properly pass on errors for igc.

Aleksandr reverts commit 2ad1274fa35a ("i40e: don't report link up for a VF
who hasn't enabled") as this can cause link flapping.

The following are changes since commit 44a674d6f79867d5652026f1cc11f7ba8a390183:
  Merge tag 'mlx5-fixes-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 1GbE

Aleksandr Loktionov (1):
  i40e: Revert "i40e: don't report link up for a VF who hasn't enabled
    queues"

Kai-Heng Feng (1):
  igc: Report speed and duplex as unknown when device is runtime
    suspended

Kevin Lo (2):
  igc: set the default return value to -IGC_ERR_NVM in
    igc_write_nvm_srwr
  igc: check return value of ret_val in igc_config_fc_after_link_up

 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 13 +------------
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |  1 -
 drivers/net/ethernet/intel/igc/igc_ethtool.c       |  3 ++-
 drivers/net/ethernet/intel/igc/igc_i225.c          |  3 +--
 drivers/net/ethernet/intel/igc/igc_mac.c           |  2 +-
 5 files changed, 5 insertions(+), 17 deletions

-- 
2.26.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net 1/4] igc: Report speed and duplex as unknown when device is runtime suspended
  2021-01-28 21:38 [PATCH net 0/4][pull request] Intel Wired LAN Driver Updates 2021-01-28 Tony Nguyen
@ 2021-01-28 21:38 ` Tony Nguyen
  2021-01-30  6:22   ` Jakub Kicinski
  2021-01-28 21:38 ` [PATCH net 2/4] igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr Tony Nguyen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Tony Nguyen @ 2021-01-28 21:38 UTC (permalink / raw)
  To: davem, kuba
  Cc: Kai-Heng Feng, netdev, sassmann, anthony.l.nguyen, Sasha Neftin

From: Kai-Heng Feng <kai.heng.feng@canonical.com>

Similar to commit 165ae7a8feb5 ("igb: Report speed and duplex as unknown
when device is runtime suspended"), if we try to read speed and duplex
sysfs while the device is runtime suspended, igc will complain and
stops working:

[  123.449883] igc 0000:03:00.0 enp3s0: PCIe link lost, device now detached
[  123.450052] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  123.450056] #PF: supervisor read access in kernel mode
[  123.450058] #PF: error_code(0x0000) - not-present page
[  123.450059] PGD 0 P4D 0
[  123.450064] Oops: 0000 [#1] SMP NOPTI
[  123.450068] CPU: 0 PID: 2525 Comm: udevadm Tainted: G     U  W  OE     5.10.0-1002-oem #2+rkl2-Ubuntu
[  123.450078] RIP: 0010:igc_rd32+0x1c/0x90 [igc]
[  123.450080] Code: c0 5d c3 b8 fd ff ff ff c3 0f 1f 44 00 00 0f 1f 44 00 00 55 89 f0 48 89 e5 41 56 41 55 41 54 49 89 c4 53 48 8b 57 08 48 01 d0 <44> 8b 28 41 83 fd ff 74 0c 5b 44 89 e8 41 5c 41 5d 4

[  123.450083] RSP: 0018:ffffb0d100d6fcc0 EFLAGS: 00010202
[  123.450085] RAX: 0000000000000008 RBX: ffffb0d100d6fd30 RCX: 0000000000000000
[  123.450087] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff945a12716c10
[  123.450089] RBP: ffffb0d100d6fce0 R08: ffff945a12716550 R09: ffff945a09874000
[  123.450090] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000008
[  123.450092] R13: ffff945a12716000 R14: ffff945a037da280 R15: ffff945a037da290
[  123.450094] FS:  00007f3b34c868c0(0000) GS:ffff945b89200000(0000) knlGS:0000000000000000
[  123.450096] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  123.450098] CR2: 0000000000000008 CR3: 00000001144de006 CR4: 0000000000770ef0
[  123.450100] PKRU: 55555554
[  123.450101] Call Trace:
[  123.450111]  igc_ethtool_get_link_ksettings+0xd6/0x1b0 [igc]
[  123.450118]  __ethtool_get_link_ksettings+0x71/0xb0
[  123.450123]  duplex_show+0x74/0xc0
[  123.450129]  dev_attr_show+0x1d/0x40
[  123.450134]  sysfs_kf_seq_show+0xa1/0x100
[  123.450137]  kernfs_seq_show+0x27/0x30
[  123.450142]  seq_read+0xb7/0x400
[  123.450148]  ? common_file_perm+0x72/0x170
[  123.450151]  kernfs_fop_read+0x35/0x1b0
[  123.450155]  vfs_read+0xb5/0x1b0
[  123.450157]  ksys_read+0x67/0xe0
[  123.450160]  __x64_sys_read+0x1a/0x20
[  123.450164]  do_syscall_64+0x38/0x90
[  123.450168]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  123.450170] RIP: 0033:0x7f3b351fe142
[  123.450173] Code: c0 e9 c2 fe ff ff 50 48 8d 3d 3a ca 0a 00 e8 f5 19 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[  123.450174] RSP: 002b:00007fffef2ec138 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  123.450177] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b351fe142
[  123.450179] RDX: 0000000000001001 RSI: 00005644c047f070 RDI: 0000000000000003
[  123.450180] RBP: 00007fffef2ec340 R08: 00005644c047f070 R09: 00007f3b352d9320
[  123.450182] R10: 00005644c047c010 R11: 0000000000000246 R12: 00005644c047cbf0
[  123.450184] R13: 00005644c047e6d0 R14: 0000000000000003 R15: 00007fffef2ec140
[  123.450189] Modules linked in: rfcomm ccm cmac algif_hash algif_skcipher af_alg bnep toshiba_acpi industrialio toshiba_haps hp_accel lis3lv02d btusb btrtl btbcm btintel bluetooth ecdh_generic ecc joydev input_leds nls_iso8859_1 snd_sof_pci snd_sof_intel_byt snd_sof_intel_ipc snd_sof_intel_hda_common snd_soc_hdac_hda snd_hda_codec_hdmi snd_sof_xtensa_dsp snd_sof_intel_hda snd_sof snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core ath10k_pci snd_hwdep intel_rapl_msr intel_rapl_common ath10k_core soundwire_bus snd_soc_core x86_pkg_temp_thermal ath intel_powerclamp snd_compress ac97_bus snd_pcm_dmaengine mac80211 snd_pcm coretemp snd_seq_midi snd_seq_midi_event snd_rawmidi kvm_intel cfg80211 snd_seq snd_seq_device snd_timer mei_hdcp kvm libarc4 snd crct10dif_pclmul ghash_clmulni_intel aesni_intel
 mei_me dell_wmi
[  123.450266]  dell_smbios soundcore sparse_keymap dcdbas crypto_simd cryptd mei dell_uart_backlight glue_helper ee1004 wmi_bmof intel_wmi_thunderbolt dell_wmi_descriptor mac_hid efi_pstore acpi_pad acpi_tad intel_cstate sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec crc32_pclmul rc_core drm intel_lpss_pci i2c_i801 ahci igc intel_lpss i2c_smbus idma64 xhci_pci libahci virt_dma xhci_pci_renesas wmi video pinctrl_tigerlake
[  123.450335] CR2: 0000000000000008
[  123.450338] ---[ end trace 9f731e38b53c35cc ]---

The more generic approach will be wrap get_link_ksettings() with begin()
and complete() callbacks, and calls runtime resume and runtime suspend
routine respectively. However, igc is like igb, runtime resume routine
uses rtnl_lock() which upper ethtool layer also uses.

So to prevent a deadlock on rtnl, take a different approach, use
pm_runtime_suspended() to avoid reading register while device is runtime
suspended.

Fixes: 8c5ad0dae93c ("igc: Add ethtool support")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_ethtool.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index 831f2f09de5f..ec8cd69d4992 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -1714,7 +1714,8 @@ static int igc_ethtool_get_link_ksettings(struct net_device *netdev,
 						     Asym_Pause);
 	}
 
-	status = rd32(IGC_STATUS);
+	status = pm_runtime_suspended(&adapter->pdev->dev) ?
+		 0 : rd32(IGC_STATUS);
 
 	if (status & IGC_STATUS_LU) {
 		if (status & IGC_STATUS_SPEED_1000) {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net 2/4] igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr
  2021-01-28 21:38 [PATCH net 0/4][pull request] Intel Wired LAN Driver Updates 2021-01-28 Tony Nguyen
  2021-01-28 21:38 ` [PATCH net 1/4] igc: Report speed and duplex as unknown when device is runtime suspended Tony Nguyen
@ 2021-01-28 21:38 ` Tony Nguyen
  2021-01-28 21:38 ` [PATCH net 3/4] igc: check return value of ret_val in igc_config_fc_after_link_up Tony Nguyen
  2021-01-28 21:38 ` [PATCH net 4/4] i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues" Tony Nguyen
  3 siblings, 0 replies; 14+ messages in thread
From: Tony Nguyen @ 2021-01-28 21:38 UTC (permalink / raw)
  To: davem, kuba; +Cc: Kevin Lo, netdev, sassmann, anthony.l.nguyen

From: Kevin Lo <kevlo@kevlo.org>

This patch sets the default return value to -IGC_ERR_NVM in
igc_write_nvm_srwr. Without this change it wouldn't lead to a shadow RAM
write EEWR timeout.

Fixes: ab4056126813 ("igc: Add NVM support")
Signed-off-by: Kevin Lo <kevlo@kevlo.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_i225.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_i225.c b/drivers/net/ethernet/intel/igc/igc_i225.c
index 8b67d9b49a83..7ec04e48860c 100644
--- a/drivers/net/ethernet/intel/igc/igc_i225.c
+++ b/drivers/net/ethernet/intel/igc/igc_i225.c
@@ -219,9 +219,9 @@ static s32 igc_write_nvm_srwr(struct igc_hw *hw, u16 offset, u16 words,
 			      u16 *data)
 {
 	struct igc_nvm_info *nvm = &hw->nvm;
+	s32 ret_val = -IGC_ERR_NVM;
 	u32 attempts = 100000;
 	u32 i, k, eewr = 0;
-	s32 ret_val = 0;
 
 	/* A check for invalid values:  offset too large, too many words,
 	 * too many words for the offset, and not enough words.
@@ -229,7 +229,6 @@ static s32 igc_write_nvm_srwr(struct igc_hw *hw, u16 offset, u16 words,
 	if (offset >= nvm->word_size || (words > (nvm->word_size - offset)) ||
 	    words == 0) {
 		hw_dbg("nvm parameter(s) out of bounds\n");
-		ret_val = -IGC_ERR_NVM;
 		goto out;
 	}
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net 3/4] igc: check return value of ret_val in igc_config_fc_after_link_up
  2021-01-28 21:38 [PATCH net 0/4][pull request] Intel Wired LAN Driver Updates 2021-01-28 Tony Nguyen
  2021-01-28 21:38 ` [PATCH net 1/4] igc: Report speed and duplex as unknown when device is runtime suspended Tony Nguyen
  2021-01-28 21:38 ` [PATCH net 2/4] igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr Tony Nguyen
@ 2021-01-28 21:38 ` Tony Nguyen
  2021-01-28 21:38 ` [PATCH net 4/4] i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues" Tony Nguyen
  3 siblings, 0 replies; 14+ messages in thread
From: Tony Nguyen @ 2021-01-28 21:38 UTC (permalink / raw)
  To: davem, kuba; +Cc: Kevin Lo, netdev, sassmann, anthony.l.nguyen, Sasha Neftin

From: Kevin Lo <kevlo@kevlo.org>

Check return value from ret_val to make error check actually work.

Fixes: 4eb8080143a9 ("igc: Add setup link functionality")
Signed-off-by: Kevin Lo <kevlo@kevlo.org>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_mac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c b/drivers/net/ethernet/intel/igc/igc_mac.c
index 09cd0ec7ee87..67b8ffd21d8a 100644
--- a/drivers/net/ethernet/intel/igc/igc_mac.c
+++ b/drivers/net/ethernet/intel/igc/igc_mac.c
@@ -638,7 +638,7 @@ s32 igc_config_fc_after_link_up(struct igc_hw *hw)
 	}
 
 out:
-	return 0;
+	return ret_val;
 }
 
 /**
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net 4/4] i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues"
  2021-01-28 21:38 [PATCH net 0/4][pull request] Intel Wired LAN Driver Updates 2021-01-28 Tony Nguyen
                   ` (2 preceding siblings ...)
  2021-01-28 21:38 ` [PATCH net 3/4] igc: check return value of ret_val in igc_config_fc_after_link_up Tony Nguyen
@ 2021-01-28 21:38 ` Tony Nguyen
  2021-01-29 20:23   ` Willem de Bruijn
  3 siblings, 1 reply; 14+ messages in thread
From: Tony Nguyen @ 2021-01-28 21:38 UTC (permalink / raw)
  To: davem, kuba
  Cc: Aleksandr Loktionov, netdev, sassmann, anthony.l.nguyen,
	Arkadiusz Kubalewski, Konrad Jankowski

From: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

This reverts commit 2ad1274fa35ace5c6360762ba48d33b63da2396c

VF queues were not brought up when PF was brought up after being
downed if the VF driver disabled VFs queues during PF down.
This could happen in some older or external VF driver implementations.
The problem was that PF driver used vf->queues_enabled as a condition
to decide what link-state it would send out which caused the issue.

Remove the check for vf->queues_enabled in the VF link notify.
Now VF will always be notified of the current link status.
Also remove the queues_enabled member from i40e_vf structure as it is
not used anymore. Otherwise VNF implementation was broken and caused
a link flap.

Fixes: 2ad1274fa35a ("i40e: don't report link up for a VF who hasn't enabled")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 13 +------------
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |  1 -
 2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 7efc61aacb0a..1b6ec9be155a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -55,12 +55,7 @@ static void i40e_vc_notify_vf_link_state(struct i40e_vf *vf)
 
 	pfe.event = VIRTCHNL_EVENT_LINK_CHANGE;
 	pfe.severity = PF_EVENT_SEVERITY_INFO;
-
-	/* Always report link is down if the VF queues aren't enabled */
-	if (!vf->queues_enabled) {
-		pfe.event_data.link_event.link_status = false;
-		pfe.event_data.link_event.link_speed = 0;
-	} else if (vf->link_forced) {
+	if (vf->link_forced) {
 		pfe.event_data.link_event.link_status = vf->link_up;
 		pfe.event_data.link_event.link_speed =
 			(vf->link_up ? i40e_virtchnl_link_speed(ls->link_speed) : 0);
@@ -70,7 +65,6 @@ static void i40e_vc_notify_vf_link_state(struct i40e_vf *vf)
 		pfe.event_data.link_event.link_speed =
 			i40e_virtchnl_link_speed(ls->link_speed);
 	}
-
 	i40e_aq_send_msg_to_vf(hw, abs_vf_id, VIRTCHNL_OP_EVENT,
 			       0, (u8 *)&pfe, sizeof(pfe), NULL);
 }
@@ -2443,8 +2437,6 @@ static int i40e_vc_enable_queues_msg(struct i40e_vf *vf, u8 *msg)
 		}
 	}
 
-	vf->queues_enabled = true;
-
 error_param:
 	/* send the response to the VF */
 	return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_ENABLE_QUEUES,
@@ -2466,9 +2458,6 @@ static int i40e_vc_disable_queues_msg(struct i40e_vf *vf, u8 *msg)
 	struct i40e_pf *pf = vf->pf;
 	i40e_status aq_ret = 0;
 
-	/* Immediately mark queues as disabled */
-	vf->queues_enabled = false;
-
 	if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) {
 		aq_ret = I40E_ERR_PARAM;
 		goto error_param;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
index 5491215d81de..091e32c1bb46 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
@@ -98,7 +98,6 @@ struct i40e_vf {
 	unsigned int tx_rate;	/* Tx bandwidth limit in Mbps */
 	bool link_forced;
 	bool link_up;		/* only valid if VF link is forced */
-	bool queues_enabled;	/* true if the VF queues are enabled */
 	bool spoofchk;
 	u16 num_vlan;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net 4/4] i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues"
  2021-01-28 21:38 ` [PATCH net 4/4] i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues" Tony Nguyen
@ 2021-01-29 20:23   ` Willem de Bruijn
  2021-01-30  0:09     ` Jacob Keller
  0 siblings, 1 reply; 14+ messages in thread
From: Willem de Bruijn @ 2021-01-29 20:23 UTC (permalink / raw)
  To: Tony Nguyen
  Cc: David Miller, Jakub Kicinski, Aleksandr Loktionov,
	Network Development, sassmann, Arkadiusz Kubalewski,
	Konrad Jankowski

On Thu, Jan 28, 2021 at 4:45 PM Tony Nguyen <anthony.l.nguyen@intel.com> wrote:
>
> From: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>
> This reverts commit 2ad1274fa35ace5c6360762ba48d33b63da2396c
>
> VF queues were not brought up when PF was brought up after being
> downed if the VF driver disabled VFs queues during PF down.
> This could happen in some older or external VF driver implementations.
> The problem was that PF driver used vf->queues_enabled as a condition
> to decide what link-state it would send out which caused the issue.
>
> Remove the check for vf->queues_enabled in the VF link notify.
> Now VF will always be notified of the current link status.
> Also remove the queues_enabled member from i40e_vf structure as it is
> not used anymore. Otherwise VNF implementation was broken and caused
> a link flap.
>
> Fixes: 2ad1274fa35a ("i40e: don't report link up for a VF who hasn't enabled")
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Doesn't this reintroduce the bug that the original patch aimed to solve?

Commit 2ad1274fa35a itself was also a fix.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 4/4] i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues"
  2021-01-29 20:23   ` Willem de Bruijn
@ 2021-01-30  0:09     ` Jacob Keller
  2021-01-30  2:00       ` Willem de Bruijn
  0 siblings, 1 reply; 14+ messages in thread
From: Jacob Keller @ 2021-01-30  0:09 UTC (permalink / raw)
  To: Willem de Bruijn, Tony Nguyen
  Cc: David Miller, Jakub Kicinski, Aleksandr Loktionov,
	Network Development, sassmann, Arkadiusz Kubalewski,
	Konrad Jankowski



On 1/29/2021 12:23 PM, Willem de Bruijn wrote:
> On Thu, Jan 28, 2021 at 4:45 PM Tony Nguyen <anthony.l.nguyen@intel.com> wrote:
>>
>> From: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>>
>> This reverts commit 2ad1274fa35ace5c6360762ba48d33b63da2396c
>>
>> VF queues were not brought up when PF was brought up after being
>> downed if the VF driver disabled VFs queues during PF down.
>> This could happen in some older or external VF driver implementations.
>> The problem was that PF driver used vf->queues_enabled as a condition
>> to decide what link-state it would send out which caused the issue.
>>
>> Remove the check for vf->queues_enabled in the VF link notify.
>> Now VF will always be notified of the current link status.
>> Also remove the queues_enabled member from i40e_vf structure as it is
>> not used anymore. Otherwise VNF implementation was broken and caused
>> a link flap.
>>
>> Fixes: 2ad1274fa35a ("i40e: don't report link up for a VF who hasn't enabled")
>> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
>> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
>> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> 
> Doesn't this reintroduce the bug that the original patch aimed to solve?
> 
> Commit 2ad1274fa35a itself was also a fix.
> 

Yea this might re-introduce the issue described in that commit. However
I believe the bug in question was due to very old versions of VF
drivers, (including an ancient version of FreeBSD if I recall).

Perhaps there is some better mechanism for handling this, but I think
reverting this is ok given that it causes problems in certain situations
where the link status wasn't reported properly.

Maybe there is a solution for both cases? but I would worry less about
an issue with the incredibly old VFs because we know that the issue is
fixed in newer VF code and the real problem is that the VF driver is
incorrectly assuming link up means it is ready to send.

Thus, I am comfortable with this revert: It simplifies the state for
both the PF and VF.

I would be open to alternatives as long as the issue described here is
also fixed.

Caveat: I was not involved in the decision to revert this and wasn't
aware of it until now, so I almost certainly have out of date information.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 4/4] i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues"
  2021-01-30  0:09     ` Jacob Keller
@ 2021-01-30  2:00       ` Willem de Bruijn
  2021-01-30  6:18         ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Willem de Bruijn @ 2021-01-30  2:00 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Tony Nguyen, David Miller, Jakub Kicinski, Aleksandr Loktionov,
	Network Development, sassmann, Arkadiusz Kubalewski,
	Konrad Jankowski

On Fri, Jan 29, 2021 at 7:09 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>
>
>
> On 1/29/2021 12:23 PM, Willem de Bruijn wrote:
> > On Thu, Jan 28, 2021 at 4:45 PM Tony Nguyen <anthony.l.nguyen@intel.com> wrote:
> >>
> >> From: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> >>
> >> This reverts commit 2ad1274fa35ace5c6360762ba48d33b63da2396c
> >>
> >> VF queues were not brought up when PF was brought up after being
> >> downed if the VF driver disabled VFs queues during PF down.
> >> This could happen in some older or external VF driver implementations.
> >> The problem was that PF driver used vf->queues_enabled as a condition
> >> to decide what link-state it would send out which caused the issue.
> >>
> >> Remove the check for vf->queues_enabled in the VF link notify.
> >> Now VF will always be notified of the current link status.
> >> Also remove the queues_enabled member from i40e_vf structure as it is
> >> not used anymore. Otherwise VNF implementation was broken and caused
> >> a link flap.
> >>
> >> Fixes: 2ad1274fa35a ("i40e: don't report link up for a VF who hasn't enabled")
> >> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> >> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
> >> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
> >> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> >
> > Doesn't this reintroduce the bug that the original patch aimed to solve?
> >
> > Commit 2ad1274fa35a itself was also a fix.
> >
>
> Yea this might re-introduce the issue described in that commit. However
> I believe the bug in question was due to very old versions of VF
> drivers, (including an ancient version of FreeBSD if I recall).
>
> Perhaps there is some better mechanism for handling this, but I think
> reverting this is ok given that it causes problems in certain situations
> where the link status wasn't reported properly.
>
> Maybe there is a solution for both cases? but I would worry less about
> an issue with the incredibly old VFs because we know that the issue is
> fixed in newer VF code and the real problem is that the VF driver is
> incorrectly assuming link up means it is ready to send.
>
> Thus, I am comfortable with this revert: It simplifies the state for
> both the PF and VF.
>
> I would be open to alternatives as long as the issue described here is
> also fixed.
>
> Caveat: I was not involved in the decision to revert this and wasn't
> aware of it until now, so I almost certainly have out of date information.

That's reasonable. The original patch is over three years old.

If it is considered safe to revert now, I would just articulate that
point in the commit.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 4/4] i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues"
  2021-01-30  2:00       ` Willem de Bruijn
@ 2021-01-30  6:18         ` Jakub Kicinski
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2021-01-30  6:18 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Jacob Keller, Tony Nguyen, David Miller, Aleksandr Loktionov,
	Network Development, sassmann, Arkadiusz Kubalewski,
	Konrad Jankowski

On Fri, 29 Jan 2021 21:00:02 -0500 Willem de Bruijn wrote:
> On Fri, Jan 29, 2021 at 7:09 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
> > Yea this might re-introduce the issue described in that commit. However
> > I believe the bug in question was due to very old versions of VF
> > drivers, (including an ancient version of FreeBSD if I recall).
> >
> > Perhaps there is some better mechanism for handling this, but I think
> > reverting this is ok given that it causes problems in certain situations
> > where the link status wasn't reported properly.
> >
> > Maybe there is a solution for both cases? but I would worry less about
> > an issue with the incredibly old VFs because we know that the issue is
> > fixed in newer VF code and the real problem is that the VF driver is
> > incorrectly assuming link up means it is ready to send.
> >
> > Thus, I am comfortable with this revert: It simplifies the state for
> > both the PF and VF.
> >
> > I would be open to alternatives as long as the issue described here is
> > also fixed.
> >
> > Caveat: I was not involved in the decision to revert this and wasn't
> > aware of it until now, so I almost certainly have out of date information.  
> 
> That's reasonable. The original patch is over three years old.
> 
> If it is considered safe to revert now, I would just articulate that
> point in the commit.

Agreed. I'd call out that the original fix was a work around for
clearly buggy client drivers, and they had enough time to be fixed.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/4] igc: Report speed and duplex as unknown when device is runtime suspended
  2021-01-28 21:38 ` [PATCH net 1/4] igc: Report speed and duplex as unknown when device is runtime suspended Tony Nguyen
@ 2021-01-30  6:22   ` Jakub Kicinski
  2021-01-30 14:00     ` Neftin, Sasha
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2021-01-30  6:22 UTC (permalink / raw)
  To: Tony Nguyen; +Cc: davem, Kai-Heng Feng, netdev, sassmann, Sasha Neftin

On Thu, 28 Jan 2021 13:38:48 -0800 Tony Nguyen wrote:
> From: Kai-Heng Feng <kai.heng.feng@canonical.com>
> 
> Similar to commit 165ae7a8feb5 ("igb: Report speed and duplex as unknown
> when device is runtime suspended"), if we try to read speed and duplex
> sysfs while the device is runtime suspended, igc will complain and
> stops working:

> The more generic approach will be wrap get_link_ksettings() with begin()
> and complete() callbacks, and calls runtime resume and runtime suspend
> routine respectively. However, igc is like igb, runtime resume routine
> uses rtnl_lock() which upper ethtool layer also uses.
> 
> So to prevent a deadlock on rtnl, take a different approach, use
> pm_runtime_suspended() to avoid reading register while device is runtime
> suspended.

Is someone working on the full fix to how PM operates?

There is another rd32(IGC_STATUS) in this file which I don't think 
is protected either. 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/4] igc: Report speed and duplex as unknown when device is runtime suspended
  2021-01-30  6:22   ` Jakub Kicinski
@ 2021-01-30 14:00     ` Neftin, Sasha
  2021-01-30 18:12       ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Neftin, Sasha @ 2021-01-30 14:00 UTC (permalink / raw)
  To: Jakub Kicinski, Tony Nguyen
  Cc: davem, Kai-Heng Feng, netdev, sassmann, Lifshits, Vitaly, Neftin, Sasha

On 1/30/2021 08:22, Jakub Kicinski wrote:
> On Thu, 28 Jan 2021 13:38:48 -0800 Tony Nguyen wrote:
>> From: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>
>> Similar to commit 165ae7a8feb5 ("igb: Report speed and duplex as unknown
>> when device is runtime suspended"), if we try to read speed and duplex
>> sysfs while the device is runtime suspended, igc will complain and
>> stops working:
> 
>> The more generic approach will be wrap get_link_ksettings() with begin()
>> and complete() callbacks, and calls runtime resume and runtime suspend
>> routine respectively. However, igc is like igb, runtime resume routine
>> uses rtnl_lock() which upper ethtool layer also uses.
>>
>> So to prevent a deadlock on rtnl, take a different approach, use
>> pm_runtime_suspended() to avoid reading register while device is runtime
>> suspended.
> 
> Is someone working on the full fix to how PM operates?
> 
> There is another rd32(IGC_STATUS) in this file which I don't think
> is protected either.
Hello Jakub,
What is another rd32(IGC_STATUS) you meant? in  igc_ethtool_get_regs? 
While the device in D3 state there is no configuration space registers 
access.
sasha
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/4] igc: Report speed and duplex as unknown when device is runtime suspended
  2021-01-30 14:00     ` Neftin, Sasha
@ 2021-01-30 18:12       ` Jakub Kicinski
  2021-01-31 10:22         ` Neftin, Sasha
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2021-01-30 18:12 UTC (permalink / raw)
  To: Neftin, Sasha
  Cc: Tony Nguyen, davem, Kai-Heng Feng, netdev, sassmann, Lifshits, Vitaly

On Sat, 30 Jan 2021 16:00:06 +0200 Neftin, Sasha wrote:
> On 1/30/2021 08:22, Jakub Kicinski wrote:
> > On Thu, 28 Jan 2021 13:38:48 -0800 Tony Nguyen wrote:  
> >> From: Kai-Heng Feng <kai.heng.feng@canonical.com>
> >>
> >> Similar to commit 165ae7a8feb5 ("igb: Report speed and duplex as unknown
> >> when device is runtime suspended"), if we try to read speed and duplex
> >> sysfs while the device is runtime suspended, igc will complain and
> >> stops working:  
> >   
> >> The more generic approach will be wrap get_link_ksettings() with begin()
> >> and complete() callbacks, and calls runtime resume and runtime suspend
> >> routine respectively. However, igc is like igb, runtime resume routine
> >> uses rtnl_lock() which upper ethtool layer also uses.
> >>
> >> So to prevent a deadlock on rtnl, take a different approach, use
> >> pm_runtime_suspended() to avoid reading register while device is runtime
> >> suspended.  
> > 
> > Is someone working on the full fix to how PM operates?
> > 
> > There is another rd32(IGC_STATUS) in this file which I don't think
> > is protected either.  
>
> What is another rd32(IGC_STATUS) you meant? in  igc_ethtool_get_regs? 

Yes.

> While the device in D3 state there is no configuration space registers 
> access.

That's to say similar stack trace will be generated to the one fixed
here, if someone runs ethtool -d, correct? I don't see anything
checking runtime there either.

To be clear I'm not asking for this to be addressed in this series.
Rather for a strong commitment that PM handling will be restructured.
It seems to me you should depend on refcounting / locking that the PM
subsystem does more rather than involving rtnl_lock.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/4] igc: Report speed and duplex as unknown when device is runtime suspended
  2021-01-30 18:12       ` Jakub Kicinski
@ 2021-01-31 10:22         ` Neftin, Sasha
  2021-02-01 22:04           ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Neftin, Sasha @ 2021-01-31 10:22 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tony Nguyen, davem, Kai-Heng Feng, netdev, sassmann, Lifshits,
	Vitaly, Keller, Jacob E, Brandeburg, Jesse, Neftin, Sasha

On 1/30/2021 20:12, Jakub Kicinski wrote:
> On Sat, 30 Jan 2021 16:00:06 +0200 Neftin, Sasha wrote:
>> On 1/30/2021 08:22, Jakub Kicinski wrote:
>>> On Thu, 28 Jan 2021 13:38:48 -0800 Tony Nguyen wrote:
>>>> From: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>>>
>>>> Similar to commit 165ae7a8feb5 ("igb: Report speed and duplex as unknown
>>>> when device is runtime suspended"), if we try to read speed and duplex
>>>> sysfs while the device is runtime suspended, igc will complain and
>>>> stops working:
>>>    
>>>> The more generic approach will be wrap get_link_ksettings() with begin()
>>>> and complete() callbacks, and calls runtime resume and runtime suspend
>>>> routine respectively. However, igc is like igb, runtime resume routine
>>>> uses rtnl_lock() which upper ethtool layer also uses.
>>>>
>>>> So to prevent a deadlock on rtnl, take a different approach, use
>>>> pm_runtime_suspended() to avoid reading register while device is runtime
>>>> suspended.
>>>
>>> Is someone working on the full fix to how PM operates?
>>>
>>> There is another rd32(IGC_STATUS) in this file which I don't think
>>> is protected either.
>>
>> What is another rd32(IGC_STATUS) you meant? in  igc_ethtool_get_regs?
> 
> Yes.
> 
>> While the device in D3 state there is no configuration space registers
>> access.
> 
> That's to say similar stack trace will be generated to the one fixed
> here, if someone runs ethtool -d, correct? I don't see anything
> checking runtime there either.
> 
yes.
This problem crosses many drivers. (not only igb, igc,...)

specific to this one (igc), can we check 'netif_running at begin of the 
_get_regs method:
if (!netif_running(netdev))
	return;
what do you think? (only OS can put device to the D3)

> To be clear I'm not asking for this to be addressed in this series.
> Rather for a strong commitment that PM handling will be restructured.
> It seems to me you should depend on refcounting / locking that the PM
> subsystem does more rather than involving rtnl_lock.
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/4] igc: Report speed and duplex as unknown when device is runtime suspended
  2021-01-31 10:22         ` Neftin, Sasha
@ 2021-02-01 22:04           ` Jakub Kicinski
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2021-02-01 22:04 UTC (permalink / raw)
  To: Neftin, Sasha
  Cc: Tony Nguyen, davem, Kai-Heng Feng, netdev, sassmann, Lifshits,
	Vitaly, Keller, Jacob E, Brandeburg, Jesse

On Sun, 31 Jan 2021 12:22:25 +0200 Neftin, Sasha wrote:
> On 1/30/2021 20:12, Jakub Kicinski wrote:
> > On Sat, 30 Jan 2021 16:00:06 +0200 Neftin, Sasha wrote:  
> >> What is another rd32(IGC_STATUS) you meant? in  igc_ethtool_get_regs?  
> > 
> > Yes.
> >   
> >> While the device in D3 state there is no configuration space registers
> >> access.  
> > 
> > That's to say similar stack trace will be generated to the one fixed
> > here, if someone runs ethtool -d, correct? I don't see anything
> > checking runtime there either.
> >   
> yes.
> This problem crosses many drivers. (not only igb, igc,...)
> 
> specific to this one (igc), can we check 'netif_running at begin of the 
> _get_regs method:
> if (!netif_running(netdev))
> 	return;
> what do you think? (only OS can put device to the D3)

That'd address the particular issue we noticed in the 5min review of
this patch, but similar, less obvious problems may still be lurking?
I wish I knew more about PM so I could suggest a solution. It'd be
ideal to avoid the rtnl_lock calls in resume, so that the driver can
just wake up the device from within the callbacks. Maybe embedded
experts can chime in and suggest how it's usually handled..

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-02-01 22:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-28 21:38 [PATCH net 0/4][pull request] Intel Wired LAN Driver Updates 2021-01-28 Tony Nguyen
2021-01-28 21:38 ` [PATCH net 1/4] igc: Report speed and duplex as unknown when device is runtime suspended Tony Nguyen
2021-01-30  6:22   ` Jakub Kicinski
2021-01-30 14:00     ` Neftin, Sasha
2021-01-30 18:12       ` Jakub Kicinski
2021-01-31 10:22         ` Neftin, Sasha
2021-02-01 22:04           ` Jakub Kicinski
2021-01-28 21:38 ` [PATCH net 2/4] igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr Tony Nguyen
2021-01-28 21:38 ` [PATCH net 3/4] igc: check return value of ret_val in igc_config_fc_after_link_up Tony Nguyen
2021-01-28 21:38 ` [PATCH net 4/4] i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues" Tony Nguyen
2021-01-29 20:23   ` Willem de Bruijn
2021-01-30  0:09     ` Jacob Keller
2021-01-30  2:00       ` Willem de Bruijn
2021-01-30  6:18         ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).