All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] net: ena: fix race condition between device reset and link up setup
@ 2017-11-18  3:05 netanel
  2017-11-19  3:40 ` David Miller
  0 siblings, 1 reply; 4+ messages in thread
From: netanel @ 2017-11-18  3:05 UTC (permalink / raw)
  To: davem, netdev
  Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, evgenys

From: Netanel Belgazal <netanel@amazon.com>

In rare cases, ena driver would reset and re-start the device,
for example, in case of misbehaving application that causes
transmit timeout

The first step in the reset procedure is to stop the Tx traffic by
calling ena_carrier_off().

After the driver have just started the device reset procedure, device
happens to send an asynchronous notification (via AENQ) to the driver
than there was a link change (to link-up state).
This link change is mapped to a call to netif_carrier_on() which
re-activates the Tx queues, violating the assumption of no tx traffic
until device reset is completed, as the reset task might still be in
the process of queues initialization, leading to an access to
uninitialized memory.
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 11 +++++++++--
 drivers/net/ethernet/amazon/ena/ena_netdev.h |  3 ++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 5417e4da64ca..988d0383b4e7 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -2579,6 +2579,7 @@ static int ena_restore_device(struct ena_adapter *adapter)
 	bool wd_state;
 	int rc;
 
+	set_bit(ENA_FLAG_ONGOING_RESET, &adapter->flags);
 	rc = ena_device_init(ena_dev, adapter->pdev, &get_feat_ctx, &wd_state);
 	if (rc) {
 		dev_err(&pdev->dev, "Can not initialize device\n");
@@ -2592,6 +2593,11 @@ static int ena_restore_device(struct ena_adapter *adapter)
 		goto err_device_destroy;
 	}
 
+	clear_bit(ENA_FLAG_ONGOING_RESET, &adapter->flags);
+	/* Make sure we don't have a race with AENQ Links state handler */
+	if (test_bit(ENA_FLAG_LINK_UP, &adapter->flags))
+		netif_carrier_on(adapter->netdev);
+
 	rc = ena_enable_msix_and_set_admin_interrupts(adapter,
 						      adapter->num_queues);
 	if (rc) {
@@ -2618,7 +2624,7 @@ static int ena_restore_device(struct ena_adapter *adapter)
 	ena_com_admin_destroy(ena_dev);
 err:
 	clear_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags);
-
+	clear_bit(ENA_FLAG_ONGOING_RESET, &adapter->flags);
 	dev_err(&pdev->dev,
 		"Reset attempt failed. Can not reset the device\n");
 
@@ -3495,7 +3501,8 @@ static void ena_update_on_link_change(void *adapter_data,
 	if (status) {
 		netdev_dbg(adapter->netdev, "%s\n", __func__);
 		set_bit(ENA_FLAG_LINK_UP, &adapter->flags);
-		netif_carrier_on(adapter->netdev);
+		if (!test_bit(ENA_FLAG_ONGOING_RESET, &adapter->flags))
+			netif_carrier_on(adapter->netdev);
 	} else {
 		clear_bit(ENA_FLAG_LINK_UP, &adapter->flags);
 		netif_carrier_off(adapter->netdev);
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index ed8bd0a579c4..3bbc003871de 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -272,7 +272,8 @@ enum ena_flags_t {
 	ENA_FLAG_DEV_UP,
 	ENA_FLAG_LINK_UP,
 	ENA_FLAG_MSIX_ENABLED,
-	ENA_FLAG_TRIGGER_RESET
+	ENA_FLAG_TRIGGER_RESET,
+	ENA_FLAG_ONGOING_RESET
 };
 
 /* adapter specific private data structure */
-- 
2.7.3.AMZN

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net] net: ena: fix race condition between device reset and link up setup
  2017-11-18  3:05 [PATCH net] net: ena: fix race condition between device reset and link up setup netanel
@ 2017-11-19  3:40 ` David Miller
  0 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2017-11-19  3:40 UTC (permalink / raw)
  To: netanel; +Cc: netdev, dwmw, zorik, matua, saeedb, msw, aliguori, nafea, evgenys

From: <netanel@amazon.com>
Date: Sat, 18 Nov 2017 03:05:36 +0000

> From: Netanel Belgazal <netanel@amazon.com>
> 
> In rare cases, ena driver would reset and re-start the device,
> for example, in case of misbehaving application that causes
> transmit timeout
> 
> The first step in the reset procedure is to stop the Tx traffic by
> calling ena_carrier_off().
> 
> After the driver have just started the device reset procedure, device
> happens to send an asynchronous notification (via AENQ) to the driver
> than there was a link change (to link-up state).
> This link change is mapped to a call to netif_carrier_on() which
> re-activates the Tx queues, violating the assumption of no tx traffic
> until device reset is completed, as the reset task might still be in
> the process of queues initialization, leading to an access to
> uninitialized memory.

This patch lacks a proper Signed-off-by: tag.

Please resubmit with this corrected.

Thank you.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net] net: ena: fix race condition between device reset and link up setup
  2017-11-19 18:03 netanel
@ 2017-11-20  2:37 ` David Miller
  0 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2017-11-20  2:37 UTC (permalink / raw)
  To: netanel; +Cc: netdev, dwmw, zorik, matua, saeedb, msw, aliguori, nafea, evgenys

From: <netanel@amazon.com>
Date: Sun, 19 Nov 2017 18:03:40 +0000

> From: Netanel Belgazal <netanel@amazon.com>
> 
> In rare cases, ena driver would reset and re-start the device,
> for example, in case of misbehaving application that causes
> transmit timeout
> 
> The first step in the reset procedure is to stop the Tx traffic by
> calling ena_carrier_off().
> 
> After the driver have just started the device reset procedure, device
> happens to send an asynchronous notification (via AENQ) to the driver
> than there was a link change (to link-up state).
> This link change is mapped to a call to netif_carrier_on() which
> re-activates the Tx queues, violating the assumption of no tx traffic
> until device reset is completed, as the reset task might still be in
> the process of queues initialization, leading to an access to
> uninitialized memory.
> 
> Signed-off-by: Netanel Belgazal <netanel@amazon.com>

Applied.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH net] net: ena: fix race condition between device reset and link up setup
@ 2017-11-19 18:03 netanel
  2017-11-20  2:37 ` David Miller
  0 siblings, 1 reply; 4+ messages in thread
From: netanel @ 2017-11-19 18:03 UTC (permalink / raw)
  To: davem, netdev
  Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, evgenys

From: Netanel Belgazal <netanel@amazon.com>

In rare cases, ena driver would reset and re-start the device,
for example, in case of misbehaving application that causes
transmit timeout

The first step in the reset procedure is to stop the Tx traffic by
calling ena_carrier_off().

After the driver have just started the device reset procedure, device
happens to send an asynchronous notification (via AENQ) to the driver
than there was a link change (to link-up state).
This link change is mapped to a call to netif_carrier_on() which
re-activates the Tx queues, violating the assumption of no tx traffic
until device reset is completed, as the reset task might still be in
the process of queues initialization, leading to an access to
uninitialized memory.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 11 +++++++++--
 drivers/net/ethernet/amazon/ena/ena_netdev.h |  3 ++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 5417e4da64ca..988d0383b4e7 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -2579,6 +2579,7 @@ static int ena_restore_device(struct ena_adapter *adapter)
 	bool wd_state;
 	int rc;
 
+	set_bit(ENA_FLAG_ONGOING_RESET, &adapter->flags);
 	rc = ena_device_init(ena_dev, adapter->pdev, &get_feat_ctx, &wd_state);
 	if (rc) {
 		dev_err(&pdev->dev, "Can not initialize device\n");
@@ -2592,6 +2593,11 @@ static int ena_restore_device(struct ena_adapter *adapter)
 		goto err_device_destroy;
 	}
 
+	clear_bit(ENA_FLAG_ONGOING_RESET, &adapter->flags);
+	/* Make sure we don't have a race with AENQ Links state handler */
+	if (test_bit(ENA_FLAG_LINK_UP, &adapter->flags))
+		netif_carrier_on(adapter->netdev);
+
 	rc = ena_enable_msix_and_set_admin_interrupts(adapter,
 						      adapter->num_queues);
 	if (rc) {
@@ -2618,7 +2624,7 @@ static int ena_restore_device(struct ena_adapter *adapter)
 	ena_com_admin_destroy(ena_dev);
 err:
 	clear_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags);
-
+	clear_bit(ENA_FLAG_ONGOING_RESET, &adapter->flags);
 	dev_err(&pdev->dev,
 		"Reset attempt failed. Can not reset the device\n");
 
@@ -3495,7 +3501,8 @@ static void ena_update_on_link_change(void *adapter_data,
 	if (status) {
 		netdev_dbg(adapter->netdev, "%s\n", __func__);
 		set_bit(ENA_FLAG_LINK_UP, &adapter->flags);
-		netif_carrier_on(adapter->netdev);
+		if (!test_bit(ENA_FLAG_ONGOING_RESET, &adapter->flags))
+			netif_carrier_on(adapter->netdev);
 	} else {
 		clear_bit(ENA_FLAG_LINK_UP, &adapter->flags);
 		netif_carrier_off(adapter->netdev);
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index ed8bd0a579c4..3bbc003871de 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -272,7 +272,8 @@ enum ena_flags_t {
 	ENA_FLAG_DEV_UP,
 	ENA_FLAG_LINK_UP,
 	ENA_FLAG_MSIX_ENABLED,
-	ENA_FLAG_TRIGGER_RESET
+	ENA_FLAG_TRIGGER_RESET,
+	ENA_FLAG_ONGOING_RESET
 };
 
 /* adapter specific private data structure */
-- 
2.7.3.AMZN

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-11-20  2:37 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-18  3:05 [PATCH net] net: ena: fix race condition between device reset and link up setup netanel
2017-11-19  3:40 ` David Miller
2017-11-19 18:03 netanel
2017-11-20  2:37 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.