All of lore.kernel.org
 help / color / mirror / Atom feed
From: <akiyano@amazon.com>
To: <davem@davemloft.net>, <netdev@vger.kernel.org>
Cc: Arthur Kiyanovski <akiyano@amazon.com>, <dwmw@amazon.com>,
	<zorik@amazon.com>, <matua@amazon.com>, <saeedb@amazon.com>,
	<msw@amazon.com>, <aliguori@amazon.com>, <nafea@amazon.com>,
	<gtzalik@amazon.com>, <netanel@amazon.com>, <alisaidi@amazon.com>
Subject: [PATCH V1 net 2/3] net: ena: fix crash during ena_remove()
Date: Mon, 19 Nov 2018 12:05:21 +0200	[thread overview]
Message-ID: <1542621922-31484-3-git-send-email-akiyano@amazon.com> (raw)
In-Reply-To: <1542621922-31484-1-git-send-email-akiyano@amazon.com>

From: Arthur Kiyanovski <akiyano@amazon.com>

In ena_remove() we have the following stack call:
ena_remove()
  unregister_netdev()
  ena_destroy_device()
    netif_carrier_off()

Calling netif_carrier_off() causes linkwatch to try to handle the
link change event on the already unregistered netdev, which leads
to a read from an unreadable memory address.

This patch switches the order of the two functions, so that
netif_carrier_off() is called on a regiestered netdev.

To accomplish this fix we also had to:
1. Remove the set bit ENA_FLAG_TRIGGER_RESET
2. Add a sanitiy check in ena_close()
both to prevent double device reset (when calling unregister_netdev()
ena_close is called, but the device was already deleted in
ena_destroy_device()).
3. Set the admin_queue running state to false to avoid using it after
device was reset (for example when calling ena_destroy_all_io_queues()
right after ena_com_dev_reset() in ena_down)

Fixes: 944b28aa2982 ("net: ena: fix missing lock during device destruction")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 1d3cead..a70bb1b 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1848,6 +1848,8 @@ static void ena_down(struct ena_adapter *adapter)
 		rc = ena_com_dev_reset(adapter->ena_dev, adapter->reset_reason);
 		if (rc)
 			dev_err(&adapter->pdev->dev, "Device reset failed\n");
+		/* stop submitting admin commands on a device that was reset */
+		ena_com_set_admin_running_state(adapter->ena_dev, false);
 	}
 
 	ena_destroy_all_io_queues(adapter);
@@ -1914,6 +1916,9 @@ static int ena_close(struct net_device *netdev)
 
 	netif_dbg(adapter, ifdown, netdev, "%s\n", __func__);
 
+	if (!test_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags))
+		return 0;
+
 	if (test_bit(ENA_FLAG_DEV_UP, &adapter->flags))
 		ena_down(adapter);
 
@@ -2613,9 +2618,7 @@ static void ena_destroy_device(struct ena_adapter *adapter, bool graceful)
 		ena_down(adapter);
 
 	/* Stop the device from sending AENQ events (in case reset flag is set
-	 *  and device is up, ena_close already reset the device
-	 * In case the reset flag is set and the device is up, ena_down()
-	 * already perform the reset, so it can be skipped.
+	 *  and device is up, ena_down() already reset the device.
 	 */
 	if (!(test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags) && dev_up))
 		ena_com_dev_reset(adapter->ena_dev, adapter->reset_reason);
@@ -3452,6 +3455,8 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	ena_com_rss_destroy(ena_dev);
 err_free_msix:
 	ena_com_dev_reset(ena_dev, ENA_REGS_RESET_INIT_ERR);
+	/* stop submitting admin commands on a device that was reset */
+	ena_com_set_admin_running_state(ena_dev, false);
 	ena_free_mgmnt_irq(adapter);
 	ena_disable_msix(adapter);
 err_worker_destroy:
@@ -3498,18 +3503,12 @@ static void ena_remove(struct pci_dev *pdev)
 
 	cancel_work_sync(&adapter->reset_task);
 
-	unregister_netdev(netdev);
-
-	/* If the device is running then we want to make sure the device will be
-	 * reset to make sure no more events will be issued by the device.
-	 */
-	if (test_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags))
-		set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags);
-
 	rtnl_lock();
 	ena_destroy_device(adapter, true);
 	rtnl_unlock();
 
+	unregister_netdev(netdev);
+
 	free_netdev(netdev);
 
 	ena_com_rss_destroy(ena_dev);
-- 
2.7.4

  parent reply	other threads:[~2018-11-19 20:28 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-19 10:05 [PATCH V1 net 0/3] net: ena: hibernation and rmmod bug fixes akiyano
2018-11-19 10:05 ` [PATCH V1 net 1/3] net: ena: fix crash during failed resume from hibernation akiyano
2018-11-19 10:05 ` akiyano [this message]
2018-11-19 10:05 ` [PATCH V1 net 3/3] net: ena: update driver version from 2.0.1 to 2.0.2 akiyano
2018-11-19 23:13 ` [PATCH V1 net 0/3] net: ena: hibernation and rmmod bug fixes David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1542621922-31484-3-git-send-email-akiyano@amazon.com \
    --to=akiyano@amazon.com \
    --cc=aliguori@amazon.com \
    --cc=alisaidi@amazon.com \
    --cc=davem@davemloft.net \
    --cc=dwmw@amazon.com \
    --cc=gtzalik@amazon.com \
    --cc=matua@amazon.com \
    --cc=msw@amazon.com \
    --cc=nafea@amazon.com \
    --cc=netanel@amazon.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedb@amazon.com \
    --cc=zorik@amazon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.