* [pull request][net V2 0/5] Mellanox, mlx5 fixes 2017-11-08
@ 2017-11-10  6:50 Saeed Mahameed
  2017-11-10  6:50 ` [net V2 1/5] net/mlx5: Loop over temp list to release delay events Saeed Mahameed
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Saeed Mahameed @ 2017-11-10  6:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Saeed Mahameed

Hi Dave,

The following series includes some fixes for the mlx5 core and Ethernet
driver.

Sorry for the late submission, but as you can see I have some very
critical fixes below that I would like merged into this RC.

Please pull and let me know if there is any problem.

For -stable:
('net/mlx5e: Set page to null in case dma mapping fails') kernels >= 4.13
('net/mlx5: FPGA, return -EINVAL if size is zero') kernels >= 4.13
('net/mlx5: Cancel health poll before sending panic teardown command') kernels >= 4.13

V1->V2:
	- Fix Reviewed-by tag of the 2nd patch.
	- Drop the FPGA zero-size fix; it needs more change log info.

Thanks,
Saeed.

---

The following changes since commit 1cb483a5cc84b497afb51a6c5dfb5a38a0b67086:

  rds: ib: Fix NULL pointer dereference in debug code (2017-11-10 14:54:47 +0900)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2017-11-08

for you to fetch changes up to d1c61e6d79ea0d4d53dc18bcd2db30ef2d99cfa7:

  net/mlx5e: Increase Striding RQ minimum size limit to 4 multi-packet WQEs (2017-11-10 15:39:21 +0900)

----------------------------------------------------------------
mlx5-fixes-2017-11-08

This series includes some fixes for mlx5 core and ethernet driver.

----------------------------------------------------------------
Eugenia Emantayev (1):
      net/mlx5e: Increase Striding RQ minimum size limit to 4 multi-packet WQEs

Huy Nguyen (2):
      net/mlx5: Loop over temp list to release delay events
      net/mlx5: Cancel health poll before sending panic teardown command

Inbar Karmy (1):
      net/mlx5e: Set page to null in case dma mapping fails

Saeed Mahameed (1):
      net/mlx5e: Fix napi poll with zero budget

 drivers/net/ethernet/mellanox/mlx5/core/dev.c     |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 12 +++++-------
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 10 ++++++----
 drivers/net/ethernet/mellanox/mlx5/core/main.c    |  7 +++++++
 5 files changed, 20 insertions(+), 13 deletions(-)

* [net V2 1/5] net/mlx5: Loop over temp list to release delay events
  2017-11-10  6:50 [pull request][net V2 0/5] Mellanox, mlx5 fixes 2017-11-08 Saeed Mahameed
@ 2017-11-10  6:50 ` Saeed Mahameed
  2017-11-10  6:50 ` [net V2 2/5] net/mlx5: Cancel health poll before sending panic teardown command Saeed Mahameed
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Saeed Mahameed @ 2017-11-10  6:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Huy Nguyen, Feras Daoud, Saeed Mahameed

From: Huy Nguyen <huyn@mellanox.com>

list_splice_init reinitializes waiting_events_list after splicing it onto
the temp list, therefore we should loop over the temp list to fire the events.
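
To illustrate why the old loop never fired anything, here is a minimal
userspace sketch of the splice-and-reinitialize behavior. It is not the
kernel's list.h; struct list_node and the helper names are hypothetical
stand-ins that only mimic the semantics described above.

```c
#include <stddef.h>

/* Userspace stand-in for a circular, doubly linked list head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static int list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_add_tail_node(struct list_node *item, struct list_node *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Move every entry of src to the front of dst, then leave src empty. */
static void list_splice_and_init(struct list_node *src, struct list_node *dst)
{
	struct list_node *first, *last, *at;

	if (list_is_empty(src))
		return;

	first = src->next;
	last = src->prev;
	at = dst->next;

	first->prev = dst;
	dst->next = first;
	last->next = at;
	at->prev = last;

	list_init(src); /* src no longer references any entry */
}

/* Count entries by walking the list once. */
static size_t list_count(const struct list_node *head)
{
	const struct list_node *pos;
	size_t n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

After the splice, walking the source list visits zero entries while the
destination holds all of them, which is why the iteration has to use temp.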

Fixes: 4ca637a20a52 ("net/mlx5: Delay events till mlx5 interface's add complete for pci resume")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index fc281712869b..17b723218b0c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -93,7 +93,7 @@ static void delayed_event_release(struct mlx5_device_context *dev_ctx,
 	list_splice_init(&priv->waiting_events_list, &temp);
 	if (!dev_ctx->context)
 		goto out;
-	list_for_each_entry_safe(de, n, &priv->waiting_events_list, list)
+	list_for_each_entry_safe(de, n, &temp, list)
 		dev_ctx->intf->event(dev, dev_ctx->context, de->event, de->param);
 
 out:
-- 
2.13.6

* [net V2 2/5] net/mlx5: Cancel health poll before sending panic teardown command
  2017-11-10  6:50 [pull request][net V2 0/5] Mellanox, mlx5 fixes 2017-11-08 Saeed Mahameed
  2017-11-10  6:50 ` [net V2 1/5] net/mlx5: Loop over temp list to release delay events Saeed Mahameed
@ 2017-11-10  6:50 ` Saeed Mahameed
  2017-11-10  6:50 ` [net V2 3/5] net/mlx5e: Fix napi poll with zero budget Saeed Mahameed
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Saeed Mahameed @ 2017-11-10  6:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Huy Nguyen, Saeed Mahameed

From: Huy Nguyen <huyn@mellanox.com>

After the panic teardown firmware command, health_care detects the error
on the PCI bus and calls mlx5_pci_err_detected. This health_care flow is
no longer needed because the panic teardown firmware command brings
down the PCI bus communication with the HCA.

The solution is to cancel the health care timer and its pending
workqueue request before sending the panic teardown firmware command.
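
The ordering can be sketched in plain C as follows. This is only a model
under stated assumptions: struct fake_dev and the helpers are hypothetical,
and teardown_ret stands in for the result of the firmware command.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device; it only tracks the poller state. */
struct fake_dev {
	bool health_poll_running;
};

static void stop_health_poll(struct fake_dev *dev)
{
	dev->health_poll_running = false;
}

static void start_health_poll(struct fake_dev *dev)
{
	dev->health_poll_running = true;
}

/*
 * teardown_ret models the firmware teardown result:
 * 0 on success, a negative errno-style value on failure.
 */
static int try_fast_unload(struct fake_dev *dev, int teardown_ret)
{
	/* Quiesce the poller first so it cannot react to the dying link. */
	stop_health_poll(dev);

	if (teardown_ret) {
		/* Teardown did not happen, the device is alive: resume polling. */
		start_health_poll(dev);
		return teardown_ret;
	}

	return 0;
}
```

The point of the sketch is the order: the poller is stopped before the
teardown is attempted, and only restarted on the failure path.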

Kernel trace:
mlx5_core 0033:01:00.0: Shutdown was called
mlx5_core 0033:01:00.0: health_care:154:(pid 9304): handling bad device here
mlx5_core 0033:01:00.0: mlx5_handle_bad_state:114:(pid 9304): NIC state 1
mlx5_core 0033:01:00.0: mlx5_pci_err_detected was called
mlx5_core 0033:01:00.0: mlx5_enter_error_state:96:(pid 9304): start
mlx5_3:mlx5_ib_event:3061:(pid 9304): warning: event on port 0
mlx5_core 0033:01:00.0: mlx5_enter_error_state:104:(pid 9304): end
Unable to handle kernel paging request for data at address 0x0000003f
Faulting instruction address: 0xc0080000434b8c80

Fixes: 8812c24d28f4 ('net/mlx5: Add fast unload support in shutdown flow')
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 0d2c8dcd6eae..06562c9a6b9c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1482,9 +1482,16 @@ static int mlx5_try_fast_unload(struct mlx5_core_dev *dev)
 		return -EAGAIN;
 	}
 
+	/* Panic tear down fw command will stop the PCI bus communication
+	 * with the HCA, so the health poll is no longer needed.
+	 */
+	mlx5_drain_health_wq(dev);
+	mlx5_stop_health_poll(dev);
+
 	ret = mlx5_cmd_force_teardown_hca(dev);
 	if (ret) {
 		mlx5_core_dbg(dev, "Firmware couldn't do fast unload error: %d\n", ret);
+		mlx5_start_health_poll(dev);
 		return ret;
 	}
 
-- 
2.13.6

* [net V2 3/5] net/mlx5e: Fix napi poll with zero budget
  2017-11-10  6:50 [pull request][net V2 0/5] Mellanox, mlx5 fixes 2017-11-08 Saeed Mahameed
  2017-11-10  6:50 ` [net V2 1/5] net/mlx5: Loop over temp list to release delay events Saeed Mahameed
  2017-11-10  6:50 ` [net V2 2/5] net/mlx5: Cancel health poll before sending panic teardown command Saeed Mahameed
@ 2017-11-10  6:50 ` Saeed Mahameed
  2017-11-10  6:50 ` [net V2 4/5] net/mlx5e: Set page to null in case dma mapping fails Saeed Mahameed
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Saeed Mahameed @ 2017-11-10  6:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Saeed Mahameed, kernel-team

napi->poll can be called with budget 0, e.g. in netpoll scenarios
where the caller only wants to poll TX rings
(poll_one_napi@net/core/netpoll.c).

The below commit changed RX polling from a "while" loop to "do {} while",
which caused the initial budget to be ignored and at least one RX
packet to be handled.
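
A simplified sketch of the difference, not the driver code: the function
names are hypothetical and "pending" stands for completions waiting in the
queue.

```c
/* Pre-check loop: a zero budget processes nothing. */
static int poll_rx_while(int pending, int budget)
{
	int done = 0;

	while (done < budget && pending > 0) {
		pending--;
		done++;
	}
	return done;
}

/* Post-check loop: the body runs once before the budget is compared. */
static int poll_rx_do_while(int pending, int budget)
{
	int done = 0;

	if (pending == 0)
		return 0; /* early return on an empty queue */

	do {
		pending--;
		done++;
	} while (done < budget && pending > 0);

	return done;
}

/* Guard in the spirit of this patch: skip RX when the budget is zero. */
static int poll_rx_guarded(int pending, int budget)
{
	return budget ? poll_rx_do_while(pending, budget) : 0;
}
```

With three pending completions and a budget of 0, the post-check loop
still handles one packet, which is what netpoll flags as exceeding the
budget. The guarded variant handles none.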

This fixes the following warning:
[ 2852.049194] mlx5e_napi_poll+0x0/0x260 [mlx5_core] exceeded budget in poll
[ 2852.049195] ------------[ cut here ]------------
[ 2852.049195] WARNING: CPU: 0 PID: 25691 at net/core/netpoll.c:171 netpoll_poll_dev+0x18a/0x1a0

Fixes: 4b7dfc992514 ("net/mlx5e: Early-return on empty completion queues")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Tested-by: Martin KaFai Lau <kafai@fb.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index e906b754415c..ab92298eafc3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -49,7 +49,7 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
 	struct mlx5e_channel *c = container_of(napi, struct mlx5e_channel,
 					       napi);
 	bool busy = false;
-	int work_done;
+	int work_done = 0;
 	int i;
 
 	for (i = 0; i < c->num_tc; i++)
@@ -58,15 +58,17 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
 	if (c->xdp)
 		busy |= mlx5e_poll_xdpsq_cq(&c->rq.xdpsq.cq);
 
-	work_done = mlx5e_poll_rx_cq(&c->rq.cq, budget);
-	busy |= work_done == budget;
+	if (likely(budget)) { /* budget=0 means: don't poll rx rings */
+		work_done = mlx5e_poll_rx_cq(&c->rq.cq, budget);
+		busy |= work_done == budget;
+	}
 
 	busy |= c->rq.post_wqes(&c->rq);
 
 	if (busy) {
 		if (likely(mlx5e_channel_no_affinity_change(c)))
 			return budget;
-		if (work_done == budget)
+		if (budget && work_done == budget)
 			work_done--;
 	}
 
-- 
2.13.6

* [net V2 4/5] net/mlx5e: Set page to null in case dma mapping fails
  2017-11-10  6:50 [pull request][net V2 0/5] Mellanox, mlx5 fixes 2017-11-08 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2017-11-10  6:50 ` [net V2 3/5] net/mlx5e: Fix napi poll with zero budget Saeed Mahameed
@ 2017-11-10  6:50 ` Saeed Mahameed
  2017-11-10  6:50 ` [net V2 5/5] net/mlx5e: Increase Striding RQ minimum size limit to 4 multi-packet WQEs Saeed Mahameed
  2017-11-11 10:42 ` [pull request][net V2 0/5] Mellanox, mlx5 fixes 2017-11-08 David Miller
  5 siblings, 0 replies; 8+ messages in thread
From: Saeed Mahameed @ 2017-11-10  6:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Inbar Karmy, kernel-team, Saeed Mahameed

From: Inbar Karmy <inbark@mellanox.com>

Currently, when dma mapping fails, put_page is called,
but the page is not set to null. Later, in the page_reuse treatment in
mlx5e_free_rx_descs(), mlx5e_page_release() is called for the second time,
improperly doing dma_unmap (for a non-mapped address) and an extra put_page.
Prevent this by nullifying the page pointer when dma_map fails.
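
As a rough userspace model of the pattern (all names are hypothetical, and
it assumes the later cleanup pass skips descriptors whose page pointer is
NULL), clearing the pointer on the failure path keeps the cleanup from
dropping a second reference:

```c
#include <stddef.h>

/* Hypothetical stand-ins for a page and its per-descriptor bookkeeping. */
struct fake_page {
	int refcount;
};

struct fake_dma_info {
	struct fake_page *page;
};

/* map_fails models a DMA mapping error right after the allocation. */
static int alloc_mapped(struct fake_dma_info *di, struct fake_page *page,
			int map_fails)
{
	page->refcount = 1;
	di->page = page;

	if (map_fails) {
		page->refcount--; /* stands in for put_page() */
		di->page = NULL;  /* nothing left for a later cleanup to release */
		return -1;
	}
	return 0;
}

/* Cleanup pass: release only descriptors that still own a page. */
static void release_if_set(struct fake_dma_info *di)
{
	if (!di->page)
		return;

	di->page->refcount--;
	di->page = NULL;
}
```

If the pointer were left set after the failed mapping, release_if_set()
would decrement the count a second time.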

Fixes: accd58833237 ("net/mlx5e: Introduce RX Page-Reuse")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 15a1687483cc..91b1b0938931 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -215,22 +215,20 @@ static inline bool mlx5e_rx_cache_get(struct mlx5e_rq *rq,
 static inline int mlx5e_page_alloc_mapped(struct mlx5e_rq *rq,
 					  struct mlx5e_dma_info *dma_info)
 {
-	struct page *page;
-
 	if (mlx5e_rx_cache_get(rq, dma_info))
 		return 0;
 
-	page = dev_alloc_pages(rq->buff.page_order);
-	if (unlikely(!page))
+	dma_info->page = dev_alloc_pages(rq->buff.page_order);
+	if (unlikely(!dma_info->page))
 		return -ENOMEM;
 
-	dma_info->addr = dma_map_page(rq->pdev, page, 0,
+	dma_info->addr = dma_map_page(rq->pdev, dma_info->page, 0,
 				      RQ_PAGE_SIZE(rq), rq->buff.map_dir);
 	if (unlikely(dma_mapping_error(rq->pdev, dma_info->addr))) {
-		put_page(page);
+		put_page(dma_info->page);
+		dma_info->page = NULL;
 		return -ENOMEM;
 	}
-	dma_info->page = page;
 
 	return 0;
 }
-- 
2.13.6

* [net V2 5/5] net/mlx5e: Increase Striding RQ minimum size limit to 4 multi-packet WQEs
  2017-11-10  6:50 [pull request][net V2 0/5] Mellanox, mlx5 fixes 2017-11-08 Saeed Mahameed
                   ` (3 preceding siblings ...)
  2017-11-10  6:50 ` [net V2 4/5] net/mlx5e: Set page to null in case dma mapping fails Saeed Mahameed
@ 2017-11-10  6:50 ` Saeed Mahameed
  2017-11-11 10:42 ` [pull request][net V2 0/5] Mellanox, mlx5 fixes 2017-11-08 David Miller
  5 siblings, 0 replies; 8+ messages in thread
From: Saeed Mahameed @ 2017-11-10  6:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eugenia Emantayev, kernel-team, Saeed Mahameed

From: Eugenia Emantayev <eugenia@mellanox.com>

This is to prevent the case of working with a single MPWQE
(1 WQE is always reserved, as the RQ is a linked list).
When the WQE is fully consumed, HW should still have an available buffer
in order not to drop packets.
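
The arithmetic behind the new minimum, as a small sketch. The helper names
are hypothetical; the only inputs taken from this patch are the log2 sizes
and the single reserved WQE.

```c
/* Total WQEs in an RQ whose size is given as a log2 value. */
static unsigned int rq_wqes(unsigned int log_rq_size)
{
	return 1u << log_rq_size;
}

/* WQEs left for HW once the one reserved linked-list WQE is set aside. */
static unsigned int rq_usable_wqes(unsigned int log_rq_size)
{
	return rq_wqes(log_rq_size) - 1;
}
```

A log size of 0x1 gives 2 WQEs, of which only 1 is usable. A log size of
0x2 gives 4 WQEs with 3 usable, so HW still has buffer after one is
consumed.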

Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index cc13d3dbd366..13b5ef9d8703 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -67,7 +67,7 @@
 #define MLX5E_PARAMS_DEFAULT_LOG_RQ_SIZE                0xa
 #define MLX5E_PARAMS_MAXIMUM_LOG_RQ_SIZE                0xd
 
-#define MLX5E_PARAMS_MINIMUM_LOG_RQ_SIZE_MPW            0x1
+#define MLX5E_PARAMS_MINIMUM_LOG_RQ_SIZE_MPW            0x2
 #define MLX5E_PARAMS_DEFAULT_LOG_RQ_SIZE_MPW            0x3
 #define MLX5E_PARAMS_MAXIMUM_LOG_RQ_SIZE_MPW            0x6
 
-- 
2.13.6

* Re: [pull request][net V2 0/5] Mellanox, mlx5 fixes 2017-11-08
  2017-11-10  6:50 [pull request][net V2 0/5] Mellanox, mlx5 fixes 2017-11-08 Saeed Mahameed
                   ` (4 preceding siblings ...)
  2017-11-10  6:50 ` [net V2 5/5] net/mlx5e: Increase Striding RQ minimum size limit to 4 multi-packet WQEs Saeed Mahameed
@ 2017-11-11 10:42 ` David Miller
  2017-11-17 20:27   ` Saeed Mahameed
  5 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2017-11-11 10:42 UTC (permalink / raw)
  To: saeedm; +Cc: netdev

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Fri, 10 Nov 2017 15:50:15 +0900

> The following series includes some fixes for the mlx5 core and Ethernet
> driver.
> 
> Sorry for the late submission, but as you can see I have some very
> critical fixes below that I would like merged into this RC.
> 
> Please pull and let me know if there is any problem.

Pulled.

> For -stable:
> ('net/mlx5e: Set page to null in case dma mapping fails') kernels >= 4.13
> ('net/mlx5: FPGA, return -EINVAL if size is zero') kernels >= 4.13
> ('net/mlx5: Cancel health poll before sending panic teardown command') kernels >= 4.13

That FPGA change doesn't appear in this pull request.

* Re: [pull request][net V2 0/5] Mellanox, mlx5 fixes 2017-11-08
  2017-11-11 10:42 ` [pull request][net V2 0/5] Mellanox, mlx5 fixes 2017-11-08 David Miller
@ 2017-11-17 20:27   ` Saeed Mahameed
  0 siblings, 0 replies; 8+ messages in thread
From: Saeed Mahameed @ 2017-11-17 20:27 UTC (permalink / raw)
  To: David Miller; +Cc: Saeed Mahameed, Linux Netdev List

On Sat, Nov 11, 2017 at 2:42 AM, David Miller <davem@davemloft.net> wrote:
> From: Saeed Mahameed <saeedm@mellanox.com>
> Date: Fri, 10 Nov 2017 15:50:15 +0900
>
>> The following series includes some fixes for the mlx5 core and Ethernet
>> driver.
>>
>> Sorry for the late submission, but as you can see I have some very
>> critical fixes below that I would like merged into this RC.
>>
>> Please pull and let me know if there is any problem.
>
> Pulled.
>
>> For -stable:
>> ('net/mlx5e: Set page to null in case dma mapping fails') kernels >= 4.13
>> ('net/mlx5: FPGA, return -EINVAL if size is zero') kernels >= 4.13
>> ('net/mlx5: Cancel health poll before sending panic teardown command') kernels >= 4.13
>
> That FPGA change doesn't appear in this pull request.
>

Sorry about that. I had to drop it, as you can see in the "V1->V2" log,
but forgot to remove it from the -stable list.
