* [PATCH iwl-net 0/2] Fixes for iavf reset path
From: Ahmed Zaki @ 2024-01-23 23:31 UTC
  To: intel-wired-lan; +Cc: netdev, Ahmed Zaki

A couple of fixes for iavf reset issues that can occur in the early
states (before IRQs, queues, etc. are configured).

Ahmed Zaki (2):
  iavf: fix reset in early states
  iavf: allow an early reset event to be processed

 drivers/net/ethernet/intel/iavf/iavf_main.c    | 11 +++++++++++
 .../net/ethernet/intel/iavf/iavf_virtchnl.c    | 18 ++++++++++++++++++
 2 files changed, 29 insertions(+)

-- 
2.34.1


* [PATCH iwl-net 1/2] iavf: fix reset in early states
From: Ahmed Zaki @ 2024-01-23 23:31 UTC
  To: intel-wired-lan; +Cc: netdev, Ahmed Zaki, Tony Nguyen

iavf_reset_task() assumes that the adapter has finished its
initialization cycle and is in either __IAVF_DOWN or __IAVF_RUNNING.

In the early states, no resources have been allocated yet. Allow an early
reset by simply shutting down the admin queue and reverting to the first
state, __IAVF_STARTUP.

Fixes: 5eae00c57f5e ("i40evf: main driver core")
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
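
The early-state check described in the commit message relies on the
state enum being ordered, so that one comparison separates "nothing
allocated yet" from "fully initialized". A minimal user-space sketch of
that idea follows. It is not the driver code: the enum, the ST_* and
RESET_* names, and both helpers are assumptions made for illustration,
and only the relative order of the states is meant to mirror the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model of an ordered adapter state enum.
 * ST_INIT_IN_PROGRESS stands in for all intermediate init states.
 */
enum vf_state {
	ST_STARTUP,		/* first state, nothing allocated yet */
	ST_INIT_IN_PROGRESS,	/* hypothetical: somewhere mid-initialization */
	ST_INIT_FAILED,		/* last of the "early" states */
	ST_DOWN,		/* initialized, interface closed */
	ST_RUNNING,		/* initialized, interface open */
};

/* What the reset handler has to do for a given state (hypothetical). */
enum reset_action {
	RESET_RESTART_INIT,	/* close admin queue, go back to ST_STARTUP */
	RESET_FULL_REINIT,	/* tear down IRQs/queues, then reinitialize */
};

/* True when no IRQs or queues exist yet, so there is nothing to tear down. */
static bool vf_state_is_early(enum vf_state s)
{
	assert(s <= ST_RUNNING);
	return s <= ST_INIT_FAILED;
}

/* Pick the reset path from the current state. */
static enum reset_action vf_reset_action(enum vf_state s)
{
	return vf_state_is_early(s) ? RESET_RESTART_INIT : RESET_FULL_REINIT;
}
```

With this model, vf_reset_action(ST_INIT_IN_PROGRESS) selects the cheap
restart path, while ST_DOWN and ST_RUNNING still take the full reinit
path. The single comparison only works because the enum order encodes
how far initialization has progressed.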

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 335fd13e86f7..e1569035d5d0 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -3037,6 +3037,17 @@ static void iavf_reset_task(struct work_struct *work)
 	}
 
 continue_reset:
+	/* If we are still early in the state machine, just restart. */
+	if (adapter->state <= __IAVF_INIT_FAILED) {
+		iavf_shutdown_adminq(hw);
+		iavf_change_state(adapter, __IAVF_STARTUP);
+		iavf_startup(adapter);
+		mutex_unlock(&adapter->crit_lock);
+		queue_delayed_work(adapter->wq, &adapter->watchdog_task,
+				   msecs_to_jiffies(30));
+		return;
+	}
+
 	/* We don't use netif_running() because it may be true prior to
 	 * ndo_open() returning, so we can't assume it means all our open
 	 * tasks have finished, since we're not holding the rtnl_lock here.
-- 
2.34.1


* [PATCH iwl-net 2/2] iavf: allow an early reset event to be processed
From: Ahmed Zaki @ 2024-01-23 23:31 UTC
  To: intel-wired-lan; +Cc: netdev, Ahmed Zaki, Tony Nguyen

If a reset event is received from the PF early in the init cycle, the
iavf state machine could hang for about 25 seconds. For example:

    # echo 1 > /sys/class/net/enp175s0np0/device/sriov_numvfs && \
      ip link set dev enp175s0np0 vf 0 mac <new_mac>

The log shows:

    [532.770534] ice 0000:af:00.0: Enabling 1 VFs
    [532.880439] iavf 0000:af:01.0: enabling device (0000 -> 0002)
    [532.880983] ice 0000:af:00.0: Enabling 1 VFs with 17 vectors and 16 queues per VF
    [532.916547] ice 0000:af:00.0 enp175s0np0: Setting MAC 00:60:2f:20:3f:28 on VF 0. VF driver will be reinitialized
    [553.464990] iavf 0000:af:01.0: Failed to communicate with PF; waiting before retry
    [558.903000] iavf 0000:af:01.0: Hardware came out of reset. Attempting reinit.
    [558.984816] iavf 0000:af:01.0: Multiqueue Enabled: Queue pair count = 16

This happens because reset events are ignored in the early states, where
the misc IRQ vector is not yet initialized and the VF communicates with
the PF by polling the AQ buffer. Fix this by scanning the received
opcodes for a reset event and scheduling the reset task when one is
found.

Fixes: 5eae00c57f5e ("i40evf: main driver core")
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
---
 .../net/ethernet/intel/iavf/iavf_virtchnl.c    | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
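
The polling change described in the commit message can be modelled in
user space as a loop over received messages. The sketch below is only an
illustration under stated assumptions: struct msg, the OP_* and EV_*
constants, poll_for_op() and the -EAGAIN return for an exhausted queue
are all hypothetical and do not come from the driver. It shows the
control flow: non-reset events are skipped, a reset event aborts the
poll with -EIO and flags the reset, and a matching opcode ends the poll
successfully.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes and PF event types for this sketch. */
enum msg_op { OP_VERSION = 1, OP_GET_RESOURCES, OP_EVENT };
enum pf_event { EV_NONE, EV_LINK_CHANGE, EV_RESET_IMPENDING };

struct msg {
	enum msg_op op;
	enum pf_event event;	/* only meaningful when op == OP_EVENT */
};

/* Scan the queue for op_to_poll.
 * Returns 0 when the opcode is found, -EIO when a reset event arrives
 * first (and sets *reset_scheduled), -EAGAIN when the queue runs dry.
 */
static int poll_for_op(const struct msg *queue, size_t len,
		       enum msg_op op_to_poll, bool *reset_scheduled)
{
	assert(reset_scheduled != NULL);

	for (size_t i = 0; i < len; i++) {
		const struct msg *m = &queue[i];

		if (m->op == OP_EVENT) {
			/* Other events are not handled while polling. */
			if (m->event != EV_RESET_IMPENDING)
				continue;

			/* Setting the flag twice is harmless here. */
			*reset_scheduled = true;
			return -EIO;
		}

		if (m->op == op_to_poll)
			return 0;
	}

	return -EAGAIN;
}
```

Returning an error on a reset event makes the caller abandon the current
init step instead of waiting for a reply that will not arrive, which is
the delay visible in the log above.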

diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index 22f2df7c460b..9d8a5d3adcee 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -76,6 +76,24 @@ iavf_poll_virtchnl_msg(struct iavf_hw *hw, struct iavf_arq_event_info *event,
 			return iavf_status_to_errno(status);
 		received_op =
 		    (enum virtchnl_ops)le32_to_cpu(event->desc.cookie_high);
+
+		if (received_op == VIRTCHNL_OP_EVENT) {
+			struct iavf_adapter *adapter = hw->back;
+			struct virtchnl_pf_event *vpe =
+				(struct virtchnl_pf_event *)event->msg_buf;
+
+			if (vpe->event != VIRTCHNL_EVENT_RESET_IMPENDING)
+				continue;
+
+			dev_info(&adapter->pdev->dev, "Reset indication received from the PF\n");
+			if (!(adapter->flags & IAVF_FLAG_RESET_PENDING)) {
+				dev_info(&adapter->pdev->dev, "Scheduling reset task\n");
+				iavf_schedule_reset(adapter,
+						    IAVF_FLAG_RESET_PENDING);
+			}
+			return -EIO;
+		}
+
 		if (op_to_poll == received_op)
 			break;
 	}
-- 
2.34.1


* Re: [PATCH iwl-net 0/2] Fixes for iavf reset path
From: Ahmed Zaki @ 2024-02-20 20:52 UTC
  To: intel-wired-lan, Tony Nguyen; +Cc: netdev



On 2024-01-23 4:31 p.m., Ahmed Zaki wrote:
> A couple of fixes for iavf's reset issues that can happen in the early
> states (before configuring IRQs, queues, ..etc).
> 
> Ahmed Zaki (2):
>    iavf: fix reset in early states
>    iavf: allow an early reset event to be processed
> 
>   drivers/net/ethernet/intel/iavf/iavf_main.c    | 11 +++++++++++
>   .../net/ethernet/intel/iavf/iavf_virtchnl.c    | 18 ++++++++++++++++++
>   2 files changed, 29 insertions(+)
> 

Stress testing is showing errors like:

[ 3193.412996] iavf 0000:b1:01.0: Unable to get VF config (-5)
[ 3197.115178] iavf 0000:b1:01.0: Admin queue command never completed

and

[ 3274.183144] iavf 0000:b1:01.0: Failed to init Admin Queue (-53)


more often than we usually see. I will send a new version that handles
these error paths better.

@tony, please drop from iwl next-queue for now.

Ahmed
