netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 1/3] ibmvnic: don't stop queue in xmit
@ 2021-10-29 22:03 Sukadev Bhattiprolu
  2021-10-29 22:03 ` [PATCH net 2/3] ibmvnic: Process crqs after enabling interrupts Sukadev Bhattiprolu
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Sukadev Bhattiprolu @ 2021-10-29 22:03 UTC (permalink / raw)
  To: netdev; +Cc: Brian King, Dany Madden, abdhalee, vaish123

If adapter's resetting bit is on, discard the packet but don't stop the
transmit queue - instead leave that to the reset code. With this change,
it is possible that we may get several calls to ibmvnic_xmit() that simply
discard packets and return.

But if we stop the queue here, we might end up doing so just after
__ibmvnic_open() started the queues (during a hard/soft reset) and before
the ->resetting bit was cleared. If that happens, there will be no one to
restart queue and transmissions will be blocked indefinitely.

This can cause a TIMEOUT reset and with auto priority failover enabled,
an unnecessary FAILOVER reset to less favored backing device and then a
FAILOVER back to the most favored backing device. If we hit the window
repeatedly, we can get stuck in a loop of TIMEOUT, FAILOVER, FAILOVER
resets leaving the adapter unusable for extended periods of time.

Fixes: 7f5b030830fe ("ibmvnic: Free skb's in cases of failure in transmit")
Reported-by: Abdul Haleem <abdhalee@in.ibm.com>
Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 8f17096e614d..a1533979c670 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1914,8 +1914,6 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 	ind_bufp = &tx_scrq->ind_buf;
 
 	if (test_bit(0, &adapter->resetting)) {
-		if (!netif_subqueue_stopped(netdev, skb))
-			netif_stop_subqueue(netdev, queue_num);
 		dev_kfree_skb_any(skb);
 
 		tx_send_failed++;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net 2/3] ibmvnic: Process crqs after enabling interrupts
  2021-10-29 22:03 [PATCH net 1/3] ibmvnic: don't stop queue in xmit Sukadev Bhattiprolu
@ 2021-10-29 22:03 ` Sukadev Bhattiprolu
  2021-10-29 22:23   ` Dany Madden
  2021-10-29 22:03 ` [PATCH net 3/3] ibmvnic: delay complete() Sukadev Bhattiprolu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Sukadev Bhattiprolu @ 2021-10-29 22:03 UTC (permalink / raw)
  To: netdev; +Cc: Brian King, Dany Madden, abdhalee, vaish123

Soon after registering a CRQ it is possible that we get a fail over or
maybe a CRQ_INIT from the VIOS while interrupts were disabled.

Look for any such CRQs after enabling interrupts.

Otherwise we can intermittently fail to bring up ibmvnic adapters during
boot, specially in kexec/kdump kernels.

Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol")
Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index a1533979c670..50956f622b11 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -5611,6 +5611,9 @@ static int init_crq_queue(struct ibmvnic_adapter *adapter)
 	crq->cur = 0;
 	spin_lock_init(&crq->lock);
 
+	/* process any CRQs that were queued before we enabled interrupts */
+	tasklet_schedule(&adapter->tasklet);
+
 	return retrc;
 
 req_irq_failed:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net 3/3] ibmvnic: delay complete()
  2021-10-29 22:03 [PATCH net 1/3] ibmvnic: don't stop queue in xmit Sukadev Bhattiprolu
  2021-10-29 22:03 ` [PATCH net 2/3] ibmvnic: Process crqs after enabling interrupts Sukadev Bhattiprolu
@ 2021-10-29 22:03 ` Sukadev Bhattiprolu
  2021-10-29 22:25   ` Dany Madden
  2021-10-29 22:18 ` [PATCH net 1/3] ibmvnic: don't stop queue in xmit Dany Madden
  2021-11-01 13:20 ` patchwork-bot+netdevbpf
  3 siblings, 1 reply; 7+ messages in thread
From: Sukadev Bhattiprolu @ 2021-10-29 22:03 UTC (permalink / raw)
  To: netdev; +Cc: Brian King, Dany Madden, abdhalee, vaish123

If we get CRQ_INIT, we set errno to -EIO and first call complete() to
notify the waiter. Then we try to schedule a FAILOVER reset. If this
occurs while adapter is in PROBING state, ibmvnic_reset() changes the
error code to EAGAIN and returns without scheduling the FAILOVER. The
purpose of setting error code to EAGAIN is to ask the waiter to retry.

But due to the earlier complete() call, the waiter may already have seen
the -EIO response and decided not to retry. This can cause intermittent
failures when bringing up ibmvnic adapters during boot, specially in
in kexec/kdump kernels.

Defer the complete() call until after scheduling the reset.

Also streamline the error code to EAGAIN. Don't see why we need EIO
sometimes. All 3 callers of ibmvnic_reset_init() can handle EAGAIN.

Fixes: 17c8705838a5 ("ibmvnic: Return error code if init interrupted by transport event")
Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 50956f622b11..29cbf60dfd79 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2755,7 +2755,7 @@ static int ibmvnic_reset(struct ibmvnic_adapter *adapter,
 
 	if (adapter->state == VNIC_PROBING) {
 		netdev_warn(netdev, "Adapter reset during probe\n");
-		adapter->init_done_rc = EAGAIN;
+		adapter->init_done_rc = -EAGAIN;
 		ret = EAGAIN;
 		goto err;
 	}
@@ -5266,11 +5266,6 @@ static void ibmvnic_handle_crq(union ibmvnic_crq *crq,
 			 */
 			adapter->login_pending = false;
 
-			if (!completion_done(&adapter->init_done)) {
-				complete(&adapter->init_done);
-				adapter->init_done_rc = -EIO;
-			}
-
 			if (adapter->state == VNIC_DOWN)
 				rc = ibmvnic_reset(adapter, VNIC_RESET_PASSIVE_INIT);
 			else
@@ -5291,6 +5286,13 @@ static void ibmvnic_handle_crq(union ibmvnic_crq *crq,
 					   rc);
 				adapter->failover_pending = false;
 			}
+
+			if (!completion_done(&adapter->init_done)) {
+				complete(&adapter->init_done);
+				if (!adapter->init_done_rc)
+					adapter->init_done_rc = -EAGAIN;
+			}
+
 			break;
 		case IBMVNIC_CRQ_INIT_COMPLETE:
 			dev_info(dev, "Partner initialization complete\n");
@@ -5763,7 +5765,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id)
 		}
 
 		rc = ibmvnic_reset_init(adapter, false);
-	} while (rc == EAGAIN);
+	} while (rc == -EAGAIN);
 
 	/* We are ignoring the error from ibmvnic_reset_init() assuming that the
 	 * partner is not ready. CRQ is not active. When the partner becomes
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net 1/3] ibmvnic: don't stop queue in xmit
  2021-10-29 22:03 [PATCH net 1/3] ibmvnic: don't stop queue in xmit Sukadev Bhattiprolu
  2021-10-29 22:03 ` [PATCH net 2/3] ibmvnic: Process crqs after enabling interrupts Sukadev Bhattiprolu
  2021-10-29 22:03 ` [PATCH net 3/3] ibmvnic: delay complete() Sukadev Bhattiprolu
@ 2021-10-29 22:18 ` Dany Madden
  2021-11-01 13:20 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 7+ messages in thread
From: Dany Madden @ 2021-10-29 22:18 UTC (permalink / raw)
  To: Sukadev Bhattiprolu; +Cc: netdev, Brian King, abdhalee, vaish123

On 2021-10-29 15:03, Sukadev Bhattiprolu wrote:
> If adapter's resetting bit is on, discard the packet but don't stop the
> transmit queue - instead leave that to the reset code. With this 
> change,
> it is possible that we may get several calls to ibmvnic_xmit() that 
> simply
> discard packets and return.
> 
> But if we stop the queue here, we might end up doing so just after
> __ibmvnic_open() started the queues (during a hard/soft reset) and 
> before
> the ->resetting bit was cleared. If that happens, there will be no one 
> to
> restart queue and transmissions will be blocked indefinitely.
> 
> This can cause a TIMEOUT reset and with auto priority failover enabled,
> an unnecessary FAILOVER reset to less favored backing device and then a
> FAILOVER back to the most favored backing device. If we hit the window
> repeatedly, we can get stuck in a loop of TIMEOUT, FAILOVER, FAILOVER
> resets leaving the adapter unusable for extended periods of time.
> 
> Fixes: 7f5b030830fe ("ibmvnic: Free skb's in cases of failure in 
> transmit")
> Reported-by: Abdul Haleem <abdhalee@in.ibm.com>
> Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com>
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>

Reviewed-by: Dany Madden <drt@linux.ibm.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net 2/3] ibmvnic: Process crqs after enabling interrupts
  2021-10-29 22:03 ` [PATCH net 2/3] ibmvnic: Process crqs after enabling interrupts Sukadev Bhattiprolu
@ 2021-10-29 22:23   ` Dany Madden
  0 siblings, 0 replies; 7+ messages in thread
From: Dany Madden @ 2021-10-29 22:23 UTC (permalink / raw)
  To: Sukadev Bhattiprolu; +Cc: netdev, Brian King, abdhalee, vaish123

On 2021-10-29 15:03, Sukadev Bhattiprolu wrote:
> Soon after registering a CRQ it is possible that we get a fail over or
> maybe a CRQ_INIT from the VIOS while interrupts were disabled.
> 
> Look for any such CRQs after enabling interrupts.
> 
> Otherwise we can intermittently fail to bring up ibmvnic adapters 
> during
> boot, specially in kexec/kdump kernels.
> 
> Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol")
> Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com>
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>

Reviewed-by: Dany Madden <drt@linux.ibm.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net 3/3] ibmvnic: delay complete()
  2021-10-29 22:03 ` [PATCH net 3/3] ibmvnic: delay complete() Sukadev Bhattiprolu
@ 2021-10-29 22:25   ` Dany Madden
  0 siblings, 0 replies; 7+ messages in thread
From: Dany Madden @ 2021-10-29 22:25 UTC (permalink / raw)
  To: Sukadev Bhattiprolu; +Cc: netdev, Brian King, abdhalee, vaish123

On 2021-10-29 15:03, Sukadev Bhattiprolu wrote:
> If we get CRQ_INIT, we set errno to -EIO and first call complete() to
> notify the waiter. Then we try to schedule a FAILOVER reset. If this
> occurs while adapter is in PROBING state, ibmvnic_reset() changes the
> error code to EAGAIN and returns without scheduling the FAILOVER. The
> purpose of setting error code to EAGAIN is to ask the waiter to retry.
> 
> But due to the earlier complete() call, the waiter may already have 
> seen
> the -EIO response and decided not to retry. This can cause intermittent
> failures when bringing up ibmvnic adapters during boot, specially in
> in kexec/kdump kernels.
> 
> Defer the complete() call until after scheduling the reset.
> 
> Also streamline the error code to EAGAIN. Don't see why we need EIO
> sometimes. All 3 callers of ibmvnic_reset_init() can handle EAGAIN.
> 
> Fixes: 17c8705838a5 ("ibmvnic: Return error code if init interrupted
> by transport event")
> Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com>
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>

Reviewed-by: Dany Madden <drt@linux.ibm.com>

> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c
> b/drivers/net/ethernet/ibm/ibmvnic.c
> index 50956f622b11..29cbf60dfd79 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -2755,7 +2755,7 @@ static int ibmvnic_reset(struct ibmvnic_adapter 
> *adapter,
> 
>  	if (adapter->state == VNIC_PROBING) {
>  		netdev_warn(netdev, "Adapter reset during probe\n");
> -		adapter->init_done_rc = EAGAIN;
> +		adapter->init_done_rc = -EAGAIN;
>  		ret = EAGAIN;
>  		goto err;
>  	}
> @@ -5266,11 +5266,6 @@ static void ibmvnic_handle_crq(union ibmvnic_crq 
> *crq,
>  			 */
>  			adapter->login_pending = false;
> 
> -			if (!completion_done(&adapter->init_done)) {
> -				complete(&adapter->init_done);
> -				adapter->init_done_rc = -EIO;
> -			}
> -
>  			if (adapter->state == VNIC_DOWN)
>  				rc = ibmvnic_reset(adapter, VNIC_RESET_PASSIVE_INIT);
>  			else
> @@ -5291,6 +5286,13 @@ static void ibmvnic_handle_crq(union ibmvnic_crq 
> *crq,
>  					   rc);
>  				adapter->failover_pending = false;
>  			}
> +
> +			if (!completion_done(&adapter->init_done)) {
> +				complete(&adapter->init_done);
> +				if (!adapter->init_done_rc)
> +					adapter->init_done_rc = -EAGAIN;
> +			}
> +
>  			break;
>  		case IBMVNIC_CRQ_INIT_COMPLETE:
>  			dev_info(dev, "Partner initialization complete\n");
> @@ -5763,7 +5765,7 @@ static int ibmvnic_probe(struct vio_dev *dev,
> const struct vio_device_id *id)
>  		}
> 
>  		rc = ibmvnic_reset_init(adapter, false);
> -	} while (rc == EAGAIN);
> +	} while (rc == -EAGAIN);
> 
>  	/* We are ignoring the error from ibmvnic_reset_init() assuming that 
> the
>  	 * partner is not ready. CRQ is not active. When the partner becomes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net 1/3] ibmvnic: don't stop queue in xmit
  2021-10-29 22:03 [PATCH net 1/3] ibmvnic: don't stop queue in xmit Sukadev Bhattiprolu
                   ` (2 preceding siblings ...)
  2021-10-29 22:18 ` [PATCH net 1/3] ibmvnic: don't stop queue in xmit Dany Madden
@ 2021-11-01 13:20 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-11-01 13:20 UTC (permalink / raw)
  To: Sukadev Bhattiprolu; +Cc: netdev, brking, drt, abdhalee, vaish123

Hello:

This series was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Fri, 29 Oct 2021 15:03:14 -0700 you wrote:
> If adapter's resetting bit is on, discard the packet but don't stop the
> transmit queue - instead leave that to the reset code. With this change,
> it is possible that we may get several calls to ibmvnic_xmit() that simply
> discard packets and return.
> 
> But if we stop the queue here, we might end up doing so just after
> __ibmvnic_open() started the queues (during a hard/soft reset) and before
> the ->resetting bit was cleared. If that happens, there will be no one to
> restart queue and transmissions will be blocked indefinitely.
> 
> [...]

Here is the summary with links:
  - [net,1/3] ibmvnic: don't stop queue in xmit
    https://git.kernel.org/netdev/net/c/8878e46fcfd4
  - [net,2/3] ibmvnic: Process crqs after enabling interrupts
    https://git.kernel.org/netdev/net/c/6e20d00158f3
  - [net,3/3] ibmvnic: delay complete()
    https://git.kernel.org/netdev/net/c/6b278c0cb378

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-11-01 13:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-29 22:03 [PATCH net 1/3] ibmvnic: don't stop queue in xmit Sukadev Bhattiprolu
2021-10-29 22:03 ` [PATCH net 2/3] ibmvnic: Process crqs after enabling interrupts Sukadev Bhattiprolu
2021-10-29 22:23   ` Dany Madden
2021-10-29 22:03 ` [PATCH net 3/3] ibmvnic: delay complete() Sukadev Bhattiprolu
2021-10-29 22:25   ` Dany Madden
2021-10-29 22:18 ` [PATCH net 1/3] ibmvnic: don't stop queue in xmit Dany Madden
2021-11-01 13:20 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).