netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2 net 0/4] ENA driver bug fixes
@ 2020-03-17  7:06 akiyano
  2020-03-17  7:06 ` [PATCH V2 net 1/4] net: ena: fix incorrect setting of the number of msix vectors akiyano
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: akiyano @ 2020-03-17  7:06 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, ndagan, shayagr,
	sameehj

From: Arthur Kiyanovski <akiyano@amazon.com>

ENA driver bug fixes

Arthur Kiyanovski (4):
  net: ena: fix incorrect setting of the number of msix vectors
  net: ena: fix request of incorrect number of IRQ vectors
  net: ena: avoid memory access violation by validating req_id properly
  net: ena: fix continuous keep-alive resets

 drivers/net/ethernet/amazon/ena/ena_netdev.c | 27 ++++++++++++++------
 1 file changed, 19 insertions(+), 8 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH V2 net 1/4] net: ena: fix incorrect setting of the number of msix vectors
  2020-03-17  7:06 [PATCH V2 net 0/4] ENA driver bug fixes akiyano
@ 2020-03-17  7:06 ` akiyano
  2020-03-17  7:06 ` [PATCH V2 net 2/4] net: ena: fix request of incorrect number of IRQ vectors akiyano
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: akiyano @ 2020-03-17  7:06 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, ndagan, shayagr,
	sameehj

From: Arthur Kiyanovski <akiyano@amazon.com>

Overview:
We don't frequently change the msix vectors throughout the life cycle of
the driver. We do so in two functions: ena_probe() and ena_restore().
ena_probe() is only called when the driver is loaded. ena_restore() on the
other hand is called during device reset / resume operations.

We use num_io_queues for calculating and allocating the number of msix
vectors. At ena_probe() this value is equal to max_num_io_queues and thus
this is not an issue, however ena_restore() might be called after the
number of io queues has changed.

A possible bug scenario is as follows:

* Change number of queues from 8 to 4.
  (num_io_queues = 4, max_num_io_queues = 8, msix_vecs = 9,)
* Trigger reset occurs -> ena_restore is called.
  (num_io_queues = 4, max_num_io_queues =8 , msix_vecs = 5)
* Change number of queues from 4 to 6.
  (num_io_queues = 6, max_num_io_queues = 8, msix_vecs = 5)
* The driver will reset due to failure of check_for_rx_interrupt_queue()

Fix:
This can be easily fixed by always using max_num_io_queues to init the
msix_vecs, since this number won't change as opposed to num_io_queues.

Fixes: 4d19266022ec ("net: ena: multiple queue creation related cleanups")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 0b2fd96b93d7..39f290f4458f 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1968,7 +1968,7 @@ static int ena_enable_msix(struct ena_adapter *adapter)
 	}
 
 	/* Reserved the max msix vectors we might need */
-	msix_vecs = ENA_MAX_MSIX_VEC(adapter->num_io_queues);
+	msix_vecs = ENA_MAX_MSIX_VEC(adapter->max_num_io_queues);
 	netif_dbg(adapter, probe, adapter->netdev,
 		  "trying to enable MSI-X, vectors %d\n", msix_vecs);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH V2 net 2/4] net: ena: fix request of incorrect number of IRQ vectors
  2020-03-17  7:06 [PATCH V2 net 0/4] ENA driver bug fixes akiyano
  2020-03-17  7:06 ` [PATCH V2 net 1/4] net: ena: fix incorrect setting of the number of msix vectors akiyano
@ 2020-03-17  7:06 ` akiyano
  2020-03-17  7:06 ` [PATCH V2 net 3/4] net: ena: avoid memory access violation by validating req_id properly akiyano
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: akiyano @ 2020-03-17  7:06 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, ndagan, shayagr,
	sameehj

From: Arthur Kiyanovski <akiyano@amazon.com>

Bug:
In short the main issue is caused by the fact that the number of queues
is changed using ethtool after ena_probe() has been called and before
ena_up() was executed. Here is the full scenario in detail:

* ena_probe() is called when the driver is loaded, the driver is not up
  yet at the end of ena_probe().
* The number of queues is changed -> io_queue_count is changed as well -
  ena_up() is not called since the "dev_was_up" boolean in
  ena_update_queue_count() is false.
* ena_up() is called by the kernel (it's called asynchronously some
  time after ena_probe()). ena_setup_io_intr() is called by ena_up() and
  it uses io_queue_count to get the suitable irq lines for each msix
  vector. The function ena_request_io_irq() is called right after that
  and it uses msix_vecs - This value only changes during ena_probe() and
  ena_restore() - to request the irq vectors. This results in "Failed to
  request I/O IRQ" error for i > io_queue_count.

Numeric example:
* After ena_probe() io_queue_count = 8, msix_vecs = 9.
* The number of queues changes to 4 -> io_queue_count = 4, msix_vecs = 9.
* ena_up() is executed for the first time:
  ** ena_setup_io_intr() inits the vectors only up to io_queue_count.
  ** ena_request_io_irq() calls request_irq() and fails for i = 5.

How to reproduce:
simply run the following commands:
    sudo rmmod ena && sudo insmod ena.ko;
    sudo ethtool -L eth1 combined 3;

Fix:
Use ENA_MAX_MSIX_VEC(adapter->num_io_queues + adapter->xdp_num_queues)
instead of adapter->msix_vecs. We need to take XDP queues into
consideration as they need to have msix vectors assigned to them as well.
Note that the XDP cannot be attached before the driver is up and running
but in XDP mode the issue might occur when the number of queues changes
right after a reset trigger.
The ENA_MAX_MSIX_VEC simply adds one to the argument since the first msix
vector is reserved for management queue.

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 39f290f4458f..836fda585391 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -2068,6 +2068,7 @@ static int ena_request_mgmnt_irq(struct ena_adapter *adapter)
 
 static int ena_request_io_irq(struct ena_adapter *adapter)
 {
+	u32 io_queue_count = adapter->num_io_queues + adapter->xdp_num_queues;
 	unsigned long flags = 0;
 	struct ena_irq *irq;
 	int rc = 0, i, k;
@@ -2078,7 +2079,7 @@ static int ena_request_io_irq(struct ena_adapter *adapter)
 		return -EINVAL;
 	}
 
-	for (i = ENA_IO_IRQ_FIRST_IDX; i < adapter->msix_vecs; i++) {
+	for (i = ENA_IO_IRQ_FIRST_IDX; i < ENA_MAX_MSIX_VEC(io_queue_count); i++) {
 		irq = &adapter->irq_tbl[i];
 		rc = request_irq(irq->vector, irq->handler, flags, irq->name,
 				 irq->data);
@@ -2119,6 +2120,7 @@ static void ena_free_mgmnt_irq(struct ena_adapter *adapter)
 
 static void ena_free_io_irq(struct ena_adapter *adapter)
 {
+	u32 io_queue_count = adapter->num_io_queues + adapter->xdp_num_queues;
 	struct ena_irq *irq;
 	int i;
 
@@ -2129,7 +2131,7 @@ static void ena_free_io_irq(struct ena_adapter *adapter)
 	}
 #endif /* CONFIG_RFS_ACCEL */
 
-	for (i = ENA_IO_IRQ_FIRST_IDX; i < adapter->msix_vecs; i++) {
+	for (i = ENA_IO_IRQ_FIRST_IDX; i < ENA_MAX_MSIX_VEC(io_queue_count); i++) {
 		irq = &adapter->irq_tbl[i];
 		irq_set_affinity_hint(irq->vector, NULL);
 		free_irq(irq->vector, irq->data);
@@ -2144,12 +2146,13 @@ static void ena_disable_msix(struct ena_adapter *adapter)
 
 static void ena_disable_io_intr_sync(struct ena_adapter *adapter)
 {
+	u32 io_queue_count = adapter->num_io_queues + adapter->xdp_num_queues;
 	int i;
 
 	if (!netif_running(adapter->netdev))
 		return;
 
-	for (i = ENA_IO_IRQ_FIRST_IDX; i < adapter->msix_vecs; i++)
+	for (i = ENA_IO_IRQ_FIRST_IDX; i < ENA_MAX_MSIX_VEC(io_queue_count); i++)
 		synchronize_irq(adapter->irq_tbl[i].vector);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH V2 net 3/4] net: ena: avoid memory access violation by validating req_id properly
  2020-03-17  7:06 [PATCH V2 net 0/4] ENA driver bug fixes akiyano
  2020-03-17  7:06 ` [PATCH V2 net 1/4] net: ena: fix incorrect setting of the number of msix vectors akiyano
  2020-03-17  7:06 ` [PATCH V2 net 2/4] net: ena: fix request of incorrect number of IRQ vectors akiyano
@ 2020-03-17  7:06 ` akiyano
  2020-03-17  7:06 ` [PATCH V2 net 4/4] net: ena: fix continuous keep-alive resets akiyano
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: akiyano @ 2020-03-17  7:06 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, ndagan, shayagr,
	sameehj

From: Arthur Kiyanovski <akiyano@amazon.com>

Rx req_id is an index in struct ena_eth_io_rx_cdesc_base.
The driver should validate that the Rx req_id it received from
the device is in range [0, ring_size -1].  Failure to do so could
yield to potential memory access violoation.
The validation was mistakenly done when refilling
the Rx submission queue and not in Rx completion queue.

Fixes: ad974baef2a1 ("net: ena: add support for out of order rx buffers refill")
Signed-off-by: Noam Dagan <ndagan@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 836fda585391..51333a05c14d 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1018,13 +1018,9 @@ static int ena_refill_rx_bufs(struct ena_ring *rx_ring, u32 num)
 		struct ena_rx_buffer *rx_info;
 
 		req_id = rx_ring->free_ids[next_to_use];
-		rc = validate_rx_req_id(rx_ring, req_id);
-		if (unlikely(rc < 0))
-			break;
 
 		rx_info = &rx_ring->rx_buffer_info[req_id];
 
-
 		rc = ena_alloc_rx_page(rx_ring, rx_info,
 				       GFP_ATOMIC | __GFP_COMP);
 		if (unlikely(rc < 0)) {
@@ -1379,9 +1375,15 @@ static struct sk_buff *ena_rx_skb(struct ena_ring *rx_ring,
 	struct ena_rx_buffer *rx_info;
 	u16 len, req_id, buf = 0;
 	void *va;
+	int rc;
 
 	len = ena_bufs[buf].len;
 	req_id = ena_bufs[buf].req_id;
+
+	rc = validate_rx_req_id(rx_ring, req_id);
+	if (unlikely(rc < 0))
+		return NULL;
+
 	rx_info = &rx_ring->rx_buffer_info[req_id];
 
 	if (unlikely(!rx_info->page)) {
@@ -1454,6 +1456,11 @@ static struct sk_buff *ena_rx_skb(struct ena_ring *rx_ring,
 		buf++;
 		len = ena_bufs[buf].len;
 		req_id = ena_bufs[buf].req_id;
+
+		rc = validate_rx_req_id(rx_ring, req_id);
+		if (unlikely(rc < 0))
+			return NULL;
+
 		rx_info = &rx_ring->rx_buffer_info[req_id];
 	} while (1);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH V2 net 4/4] net: ena: fix continuous keep-alive resets
  2020-03-17  7:06 [PATCH V2 net 0/4] ENA driver bug fixes akiyano
                   ` (2 preceding siblings ...)
  2020-03-17  7:06 ` [PATCH V2 net 3/4] net: ena: avoid memory access violation by validating req_id properly akiyano
@ 2020-03-17  7:06 ` akiyano
  2020-03-17 20:22 ` [PATCH V2 net 0/4] ENA driver bug fixes Jakub Kicinski
  2020-03-18  4:25 ` David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: akiyano @ 2020-03-17  7:06 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, ndagan, shayagr,
	sameehj

From: Arthur Kiyanovski <akiyano@amazon.com>

last_keep_alive_jiffies is updated in probe and when a keep-alive
event is received.  In case the driver times-out on a keep-alive event,
it has high chances of continuously timing-out on keep-alive events.
This is because when the driver recovers from the keep-alive-timeout reset
the value of last_keep_alive_jiffies is very old, and if a keep-alive
event is not received before the next timer expires, the value of
last_keep_alive_jiffies will cause another keep-alive-timeout reset
and so forth in a loop.

Solution:
Update last_keep_alive_jiffies whenever the device is restored after
reset.

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Noam Dagan <ndagan@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 51333a05c14d..4647d7656761 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3486,6 +3486,7 @@ static int ena_restore_device(struct ena_adapter *adapter)
 		netif_carrier_on(adapter->netdev);
 
 	mod_timer(&adapter->timer_service, round_jiffies(jiffies + HZ));
+	adapter->last_keep_alive_jiffies = jiffies;
 	dev_err(&pdev->dev,
 		"Device reset completed successfully, Driver info: %s\n",
 		version);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH V2 net 0/4] ENA driver bug fixes
  2020-03-17  7:06 [PATCH V2 net 0/4] ENA driver bug fixes akiyano
                   ` (3 preceding siblings ...)
  2020-03-17  7:06 ` [PATCH V2 net 4/4] net: ena: fix continuous keep-alive resets akiyano
@ 2020-03-17 20:22 ` Jakub Kicinski
  2020-03-18  4:25 ` David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: Jakub Kicinski @ 2020-03-17 20:22 UTC (permalink / raw)
  To: akiyano
  Cc: davem, netdev, dwmw, zorik, matua, saeedb, msw, aliguori, nafea,
	gtzalik, netanel, alisaidi, benh, ndagan, shayagr, sameehj

On Tue, 17 Mar 2020 09:06:38 +0200 akiyano@amazon.com wrote:
> From: Arthur Kiyanovski <akiyano@amazon.com>
> 
> ENA driver bug fixes

Acked-by: Jakub Kicinski <kuba@kernel.org>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2 net 0/4] ENA driver bug fixes
  2020-03-17  7:06 [PATCH V2 net 0/4] ENA driver bug fixes akiyano
                   ` (4 preceding siblings ...)
  2020-03-17 20:22 ` [PATCH V2 net 0/4] ENA driver bug fixes Jakub Kicinski
@ 2020-03-18  4:25 ` David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2020-03-18  4:25 UTC (permalink / raw)
  To: akiyano
  Cc: netdev, dwmw, zorik, matua, saeedb, msw, aliguori, nafea,
	gtzalik, netanel, alisaidi, benh, ndagan, shayagr, sameehj

From: <akiyano@amazon.com>
Date: Tue, 17 Mar 2020 09:06:38 +0200

> From: Arthur Kiyanovski <akiyano@amazon.com>
> 
> ENA driver bug fixes

Series applied, thank you.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-03-18  4:25 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-17  7:06 [PATCH V2 net 0/4] ENA driver bug fixes akiyano
2020-03-17  7:06 ` [PATCH V2 net 1/4] net: ena: fix incorrect setting of the number of msix vectors akiyano
2020-03-17  7:06 ` [PATCH V2 net 2/4] net: ena: fix request of incorrect number of IRQ vectors akiyano
2020-03-17  7:06 ` [PATCH V2 net 3/4] net: ena: avoid memory access violation by validating req_id properly akiyano
2020-03-17  7:06 ` [PATCH V2 net 4/4] net: ena: fix continuous keep-alive resets akiyano
2020-03-17 20:22 ` [PATCH V2 net 0/4] ENA driver bug fixes Jakub Kicinski
2020-03-18  4:25 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).