netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3 net 0/3] Bug fixes for ENA ethernet driver
@ 2020-08-19 17:28 Shay Agroskin
  2020-08-19 17:28 ` [PATCH V3 net 1/3] net: ena: Prevent reset after device destruction Shay Agroskin
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Shay Agroskin @ 2020-08-19 17:28 UTC (permalink / raw)
  To: davem, netdev
  Cc: Shay Agroskin, dwmw, zorik, matua, saeedb, msw, aliguori, nafea,
	gtzalik, netanel, alisaidi, benh, akiyano, sameehj, ndagan

This series adds the following:
- Fix undesired call to ena_restore after returning from suspend
- Fix condition inside a WARN_ON
- Fix overriding previous value when updating missed_tx statistic

v1->v2:
- fix bug when calling reset routine after device resources are freed (Jakub)

v2->v3:
- fix wrong hash in 'Fixes' tag

Shay Agroskin (3):
  net: ena: Prevent reset after device destruction
  net: ena: Change WARN_ON expression in ena_del_napi_in_range()
  net: ena: Make missed_tx stat incremental

 drivers/net/ethernet/amazon/ena/ena_netdev.c | 35 ++++++++++----------
 1 file changed, 18 insertions(+), 17 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH V3 net 1/3] net: ena: Prevent reset after device destruction
  2020-08-19 17:28 [PATCH V3 net 0/3] Bug fixes for ENA ethernet driver Shay Agroskin
@ 2020-08-19 17:28 ` Shay Agroskin
  2020-08-19 17:28 ` [PATCH V3 net 2/3] net: ena: Change WARN_ON expression in ena_del_napi_in_range() Shay Agroskin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Shay Agroskin @ 2020-08-19 17:28 UTC (permalink / raw)
  To: davem, netdev
  Cc: Shay Agroskin, dwmw, zorik, matua, saeedb, msw, aliguori, nafea,
	gtzalik, netanel, alisaidi, benh, akiyano, sameehj, ndagan

The reset work is scheduled by the timer routine whenever it
detects that a device reset is required (e.g. when a keep_alive signal
is missing).
When releasing device resources in ena_destroy_device() the driver
cancels the scheduling of the timer routine without destroying the reset
work explicitly.

This creates the following bug:
    The driver is suspended and the ena_suspend() function is called
	-> This function calls ena_destroy_device() to free the net device
	   resources
	    -> The driver waits for the timer routine to finish
	    its execution and then cancels it, thus preventing from it
	    to be called again.

    If, in its final execution, the timer routine schedules a reset,
    the reset routine might be called afterwards,and a redundant call to
    ena_restore_device() would be made.

By changing the reset routine we allow it to read the device's state
accurately.
This is achieved by checking whether ENA_FLAG_TRIGGER_RESET flag is set
before resetting the device and making both the destruction function and
the flag check are under rtnl lock.
The ENA_FLAG_TRIGGER_RESET is cleared at the end of the destruction
routine. Also surround the flag check with 'likely' because
we expect that the reset routine would be called only when
ENA_FLAG_TRIGGER_RESET flag is set.

The destruction of the timer and reset services in __ena_shutoff() have to
stay, even though the timer routine is destroyed in ena_destroy_device().
This is to avoid a case in which the reset routine is scheduled after
free_netdev() in __ena_shutoff(), which would create an access to freed
memory in adapter->flags.

Fixes: 8c5c7abdeb2d ("net: ena: add power management ops to the ENA driver")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 2a6c9725e092..44aeace196f0 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3601,16 +3601,14 @@ static void ena_fw_reset_device(struct work_struct *work)
 {
 	struct ena_adapter *adapter =
 		container_of(work, struct ena_adapter, reset_task);
-	struct pci_dev *pdev = adapter->pdev;
 
-	if (unlikely(!test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))) {
-		dev_err(&pdev->dev,
-			"device reset schedule while reset bit is off\n");
-		return;
-	}
 	rtnl_lock();
-	ena_destroy_device(adapter, false);
-	ena_restore_device(adapter);
+
+	if (likely(test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))) {
+		ena_destroy_device(adapter, false);
+		ena_restore_device(adapter);
+	}
+
 	rtnl_unlock();
 }
 
@@ -4389,8 +4387,11 @@ static void __ena_shutoff(struct pci_dev *pdev, bool shutdown)
 		netdev->rx_cpu_rmap = NULL;
 	}
 #endif /* CONFIG_RFS_ACCEL */
-	del_timer_sync(&adapter->timer_service);
 
+	/* Make sure timer and reset routine won't be called after
+	 * freeing device resources.
+	 */
+	del_timer_sync(&adapter->timer_service);
 	cancel_work_sync(&adapter->reset_task);
 
 	rtnl_lock(); /* lock released inside the below if-else block */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH V3 net 2/3] net: ena: Change WARN_ON expression in ena_del_napi_in_range()
  2020-08-19 17:28 [PATCH V3 net 0/3] Bug fixes for ENA ethernet driver Shay Agroskin
  2020-08-19 17:28 ` [PATCH V3 net 1/3] net: ena: Prevent reset after device destruction Shay Agroskin
@ 2020-08-19 17:28 ` Shay Agroskin
  2020-08-19 17:28 ` [PATCH V3 net 3/3] net: ena: Make missed_tx stat incremental Shay Agroskin
  2020-08-19 22:34 ` [PATCH V3 net 0/3] Bug fixes for ENA ethernet driver David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: Shay Agroskin @ 2020-08-19 17:28 UTC (permalink / raw)
  To: davem, netdev
  Cc: Shay Agroskin, dwmw, zorik, matua, saeedb, msw, aliguori, nafea,
	gtzalik, netanel, alisaidi, benh, akiyano, sameehj, ndagan

The ena_del_napi_in_range() function unregisters the napi handler for
rings in a given range.
This function had the following WARN_ON macro:

    WARN_ON(ENA_IS_XDP_INDEX(adapter, i) &&
	    adapter->ena_napi[i].xdp_ring);

This macro prints the call stack if the expression inside of it is
true [1], but the expression inside of it is the wanted situation.
The expression checks whether the ring has an XDP queue and its index
corresponds to a XDP one.

This patch changes the expression to
    !ENA_IS_XDP_INDEX(adapter, i) && adapter->ena_napi[i].xdp_ring
which indicates an unwanted situation.

Also, change the structure of the function. The napi handler is
unregistered for all rings, and so there's no need to check whether the
index is an XDP index or not. By removing this check the code becomes
much more readable.

Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 44aeace196f0..233db15c970d 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -2180,13 +2180,10 @@ static void ena_del_napi_in_range(struct ena_adapter *adapter,
 	int i;
 
 	for (i = first_index; i < first_index + count; i++) {
-		/* Check if napi was initialized before */
-		if (!ENA_IS_XDP_INDEX(adapter, i) ||
-		    adapter->ena_napi[i].xdp_ring)
-			netif_napi_del(&adapter->ena_napi[i].napi);
-		else
-			WARN_ON(ENA_IS_XDP_INDEX(adapter, i) &&
-				adapter->ena_napi[i].xdp_ring);
+		netif_napi_del(&adapter->ena_napi[i].napi);
+
+		WARN_ON(!ENA_IS_XDP_INDEX(adapter, i) &&
+			adapter->ena_napi[i].xdp_ring);
 	}
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH V3 net 3/3] net: ena: Make missed_tx stat incremental
  2020-08-19 17:28 [PATCH V3 net 0/3] Bug fixes for ENA ethernet driver Shay Agroskin
  2020-08-19 17:28 ` [PATCH V3 net 1/3] net: ena: Prevent reset after device destruction Shay Agroskin
  2020-08-19 17:28 ` [PATCH V3 net 2/3] net: ena: Change WARN_ON expression in ena_del_napi_in_range() Shay Agroskin
@ 2020-08-19 17:28 ` Shay Agroskin
  2020-08-19 22:34 ` [PATCH V3 net 0/3] Bug fixes for ENA ethernet driver David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: Shay Agroskin @ 2020-08-19 17:28 UTC (permalink / raw)
  To: davem, netdev
  Cc: Shay Agroskin, dwmw, zorik, matua, saeedb, msw, aliguori, nafea,
	gtzalik, netanel, alisaidi, benh, akiyano, sameehj, ndagan

Most statistics in ena driver are incremented, meaning that a stat's
value is a sum of all increases done to it since driver/queue
initialization.

This patch makes all statistics this way, effectively making missed_tx
statistic incremental.
Also added a comment regarding rx_drops and tx_drops to make it
clearer how these counters are calculated.

Fixes: 11095fdb712b ("net: ena: add statistics for missed tx packets")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 233db15c970d..a3a8edf9a734 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3687,7 +3687,7 @@ static int check_missing_comp_in_tx_queue(struct ena_adapter *adapter,
 	}
 
 	u64_stats_update_begin(&tx_ring->syncp);
-	tx_ring->tx_stats.missed_tx = missed_tx;
+	tx_ring->tx_stats.missed_tx += missed_tx;
 	u64_stats_update_end(&tx_ring->syncp);
 
 	return rc;
@@ -4556,6 +4556,9 @@ static void ena_keep_alive_wd(void *adapter_data,
 	tx_drops = ((u64)desc->tx_drops_high << 32) | desc->tx_drops_low;
 
 	u64_stats_update_begin(&adapter->syncp);
+	/* These stats are accumulated by the device, so the counters indicate
+	 * all drops since last reset.
+	 */
 	adapter->dev_stats.rx_drops = rx_drops;
 	adapter->dev_stats.tx_drops = tx_drops;
 	u64_stats_update_end(&adapter->syncp);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH V3 net 0/3] Bug fixes for ENA ethernet driver
  2020-08-19 17:28 [PATCH V3 net 0/3] Bug fixes for ENA ethernet driver Shay Agroskin
                   ` (2 preceding siblings ...)
  2020-08-19 17:28 ` [PATCH V3 net 3/3] net: ena: Make missed_tx stat incremental Shay Agroskin
@ 2020-08-19 22:34 ` David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2020-08-19 22:34 UTC (permalink / raw)
  To: shayagr
  Cc: netdev, dwmw, zorik, matua, saeedb, msw, aliguori, nafea,
	gtzalik, netanel, alisaidi, benh, akiyano, sameehj, ndagan

From: Shay Agroskin <shayagr@amazon.com>
Date: Wed, 19 Aug 2020 20:28:35 +0300

> This series adds the following:
> - Fix undesired call to ena_restore after returning from suspend
> - Fix condition inside a WARN_ON
> - Fix overriding previous value when updating missed_tx statistic
> 
> v1->v2:
> - fix bug when calling reset routine after device resources are freed (Jakub)
> 
> v2->v3:
> - fix wrong hash in 'Fixes' tag

Series applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-08-19 22:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-19 17:28 [PATCH V3 net 0/3] Bug fixes for ENA ethernet driver Shay Agroskin
2020-08-19 17:28 ` [PATCH V3 net 1/3] net: ena: Prevent reset after device destruction Shay Agroskin
2020-08-19 17:28 ` [PATCH V3 net 2/3] net: ena: Change WARN_ON expression in ena_del_napi_in_range() Shay Agroskin
2020-08-19 17:28 ` [PATCH V3 net 3/3] net: ena: Make missed_tx stat incremental Shay Agroskin
2020-08-19 22:34 ` [PATCH V3 net 0/3] Bug fixes for ENA ethernet driver David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).