netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Shay Agroskin <shayagr@amazon.com>
To: <davem@davemloft.net>, <netdev@vger.kernel.org>
Cc: <dwmw@amazon.com>, <zorik@amazon.com>, <matua@amazon.com>,
	<saeedb@amazon.com>, <msw@amazon.com>, <aliguori@amazon.com>,
	<nafea@amazon.com>, <gtzalik@amazon.com>, <netanel@amazon.com>,
	<alisaidi@amazon.com>, <benh@amazon.com>, <akiyano@amazon.com>,
	<sameehj@amazon.com>, <ndagan@amazon.com>,
	Shay Agroskin <shayagr@amazon.com>
Subject: [PATCH V1 net 1/3] net: ena: Prevent reset after device destruction
Date: Wed, 12 Aug 2020 13:10:57 +0300	[thread overview]
Message-ID: <20200812101059.5501-2-shayagr@amazon.com> (raw)
In-Reply-To: <20200812101059.5501-1-shayagr@amazon.com>

The reset work is scheduled by the timer routine whenever it
detects that a device reset is required (e.g. when a keep_alive signal
is missing).
When releasing device resources in ena_destroy() the driver cancels the
scheduling of the timer routine without destroying the reset
work explicitly.

This creates the following bug:
    The driver is suspended and the ena_suspend() function is called
	-> This function calls ena_destroy() to free the net device
	   resources
	    -> The driver waits for the timer routine to finish
	    its execution and then cancels it, thus preventing from it
	    to be called again.

    If, in its final execution, the timer routine schedules a reset,
    the reset routine might be called after the device resources are
    freed, which might cause a kernel panic.

By changing the reset routine so that it cannot run simultaneously with
the destruction routine, we allow the reset routine read the device's
state accurately.
This is achieved by checking whether ENA_FLAG_TRIGGER_RESET flag is set
before resetting the device and making both the destruction function and
the flag check are under rtnl lock.
The ENA_FLAG_TRIGGER_RESET is cleared at the end of the destruction
routine. Also surround the flag check with 'likely' because
we expect that the reset routine would be called only when
ENA_FLAG_TRIGGER_RESET flag is set.

This patch also removes the destruction of the timer and reset services
from ena_remove() since the timer is destroyed by the destruction
routine and the reset work is handled by this patch.

Fixes: 8c5c7abdeb2d ("net: ena: add power management ops to the ENA driver")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 2a6c9725e092..0488fcbf48f7 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3601,16 +3601,14 @@ static void ena_fw_reset_device(struct work_struct *work)
 {
 	struct ena_adapter *adapter =
 		container_of(work, struct ena_adapter, reset_task);
-	struct pci_dev *pdev = adapter->pdev;
 
-	if (unlikely(!test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))) {
-		dev_err(&pdev->dev,
-			"device reset schedule while reset bit is off\n");
-		return;
-	}
 	rtnl_lock();
-	ena_destroy_device(adapter, false);
-	ena_restore_device(adapter);
+
+	if (likely(test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))) {
+		ena_destroy_device(adapter, false);
+		ena_restore_device(adapter);
+	}
+
 	rtnl_unlock();
 }
 
@@ -4389,9 +4387,6 @@ static void __ena_shutoff(struct pci_dev *pdev, bool shutdown)
 		netdev->rx_cpu_rmap = NULL;
 	}
 #endif /* CONFIG_RFS_ACCEL */
-	del_timer_sync(&adapter->timer_service);
-
-	cancel_work_sync(&adapter->reset_task);
 
 	rtnl_lock(); /* lock released inside the below if-else block */
 	adapter->reset_reason = ENA_REGS_RESET_SHUTDOWN;
-- 
2.28.0


  reply	other threads:[~2020-08-12 10:12 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-12 10:10 [PATCH V1 net 0/3] Bug fixes for ENA ethernet driver Shay Agroskin
2020-08-12 10:10 ` Shay Agroskin [this message]
2020-08-12 17:52   ` [PATCH V1 net 1/3] net: ena: Prevent reset after device destruction Jakub Kicinski
2020-08-13 12:51     ` Shay Agroskin
2020-08-13 20:41       ` Jakub Kicinski
2020-08-16 10:25         ` Shay Agroskin
2020-08-12 10:10 ` [PATCH V1 net 2/3] net: ena: Change WARN_ON expression in ena_del_napi_in_range() Shay Agroskin
2020-08-12 10:10 ` [PATCH V1 net 3/3] net: ena: Make missed_tx stat incremental Shay Agroskin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200812101059.5501-2-shayagr@amazon.com \
    --to=shayagr@amazon.com \
    --cc=akiyano@amazon.com \
    --cc=aliguori@amazon.com \
    --cc=alisaidi@amazon.com \
    --cc=benh@amazon.com \
    --cc=davem@davemloft.net \
    --cc=dwmw@amazon.com \
    --cc=gtzalik@amazon.com \
    --cc=matua@amazon.com \
    --cc=msw@amazon.com \
    --cc=nafea@amazon.com \
    --cc=ndagan@amazon.com \
    --cc=netanel@amazon.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedb@amazon.com \
    --cc=sameehj@amazon.com \
    --cc=zorik@amazon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).