All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/3] mlxbf_gige: Fix several bugs
@ 2023-09-19 22:13 Asmaa Mnebhi
  2023-09-19 22:13 ` [PATCH v1 1/3] mlxbf_gige: Fix kernel panic at shutdown Asmaa Mnebhi
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Asmaa Mnebhi @ 2023-09-19 22:13 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, olteanv; +Cc: Asmaa Mnebhi, netdev, davthompson

Fix several bugs in the mlxbf_gige driver:
1) panic at shutdown
2) no ip assigned although link is up
3) clogged port due to RX pause frames

Asmaa Mnebhi (3):
  mlxbf_gige: Fix kernel panic at shutdown
  mlxbf_gige: Fix intermittent no ip issue
  mlxbf_gige: Enable the GigE port in mlxbf_gige_open

 .../mellanox/mlxbf_gige/mlxbf_gige_main.c     | 26 +++++++++----------
 .../mellanox/mlxbf_gige/mlxbf_gige_rx.c       |  9 ++++---
 2 files changed, 19 insertions(+), 16 deletions(-)

-- 
2.30.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v1 1/3] mlxbf_gige: Fix kernel panic at shutdown
  2023-09-19 22:13 [PATCH v1 0/3] mlxbf_gige: Fix several bugs Asmaa Mnebhi
@ 2023-09-19 22:13 ` Asmaa Mnebhi
  2023-09-20  2:51   ` Florian Fainelli
  2023-09-19 22:13 ` [PATCH v1 2/3] mlxbf_gige: Fix intermittent no ip issue Asmaa Mnebhi
  2023-09-19 22:13 ` [PATCH v1 3/3] mlxbf_gige: Enable the GigE port in mlxbf_gige_open Asmaa Mnebhi
  2 siblings, 1 reply; 5+ messages in thread
From: Asmaa Mnebhi @ 2023-09-19 22:13 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, olteanv; +Cc: Asmaa Mnebhi, netdev, davthompson

There is a race condition happening during shutdown due to pending napi transactions.
Since mlxbf_gige_poll is still running, it tries to access a NULL pointer and as a
result causes a kernel panic:

[  284.074822] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
...
[  284.322326] Call trace:
[  284.324757]  mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige]
[  284.330924]  mlxbf_gige_poll+0x54/0x160 [mlxbf_gige]
[  284.335876]  __napi_poll+0x40/0x1c8
[  284.339353]  net_rx_action+0x314/0x3a0
[  284.343086]  __do_softirq+0x128/0x334
[  284.346734]  run_ksoftirqd+0x54/0x6c
[  284.350294]  smpboot_thread_fn+0x14c/0x190
[  284.354375]  kthread+0x10c/0x110
[  284.357588]  ret_from_fork+0x10/0x20
[  284.361150] Code: 8b070000 f9000ea0 f95056c0 f86178a1 (b9407002)
[  284.367227] ---[ end trace a18340bbb9ea2fa7 ]---

To fix this, return in the case where "priv" is NULL.

Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
index 0d5a41a2ae01..cfb8fb957f0c 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
@@ -298,6 +298,9 @@ int mlxbf_gige_poll(struct napi_struct *napi, int budget)
 
 	priv = container_of(napi, struct mlxbf_gige, napi);
 
+	if (!priv)
+		return 0;
+
 	mlxbf_gige_handle_tx_complete(priv);
 
 	do {
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v1 2/3] mlxbf_gige: Fix intermittent no ip issue
  2023-09-19 22:13 [PATCH v1 0/3] mlxbf_gige: Fix several bugs Asmaa Mnebhi
  2023-09-19 22:13 ` [PATCH v1 1/3] mlxbf_gige: Fix kernel panic at shutdown Asmaa Mnebhi
@ 2023-09-19 22:13 ` Asmaa Mnebhi
  2023-09-19 22:13 ` [PATCH v1 3/3] mlxbf_gige: Enable the GigE port in mlxbf_gige_open Asmaa Mnebhi
  2 siblings, 0 replies; 5+ messages in thread
From: Asmaa Mnebhi @ 2023-09-19 22:13 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, olteanv; +Cc: Asmaa Mnebhi, netdev, davthompson

Although the link is up, there is no ip assigned on a setup with high background
traffic. Nothing is transmitted nor received.
The RX error count keeps on increasing. After several minutes, the RX error count
stagnates and the GigE interface finally gets an ip.

The issue is in the mlxbf_gige_rx_init function. As soon as the RX DMA is enabled,
the RX CI reaches the max of 128, and it becomes equal to RX PI. RX CI doesn't decrease
since the code hasn't ran phy_start yet.

The solution is to move the rx init after phy_start.

Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
---
 .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c | 14 +++++++-------
 .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c   |  6 +++---
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
index 694de9513b9f..7d132a132a29 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
@@ -147,14 +147,14 @@ static int mlxbf_gige_open(struct net_device *netdev)
 	 */
 	priv->valid_polarity = 0;
 
-	err = mlxbf_gige_rx_init(priv);
+	phy_start(phydev);
+
+	err = mlxbf_gige_tx_init(priv);
 	if (err)
 		goto free_irqs;
-	err = mlxbf_gige_tx_init(priv);
+	err = mlxbf_gige_rx_init(priv);
 	if (err)
-		goto rx_deinit;
-
-	phy_start(phydev);
+		goto tx_deinit;
 
 	netif_napi_add(netdev, &priv->napi, mlxbf_gige_poll);
 	napi_enable(&priv->napi);
@@ -176,8 +176,8 @@ static int mlxbf_gige_open(struct net_device *netdev)
 
 	return 0;
 
-rx_deinit:
-	mlxbf_gige_rx_deinit(priv);
+tx_deinit:
+	mlxbf_gige_tx_deinit(priv);
 
 free_irqs:
 	mlxbf_gige_free_irqs(priv);
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
index cfb8fb957f0c..d82feeabb061 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
@@ -142,6 +142,9 @@ int mlxbf_gige_rx_init(struct mlxbf_gige *priv)
 	writeq(MLXBF_GIGE_RX_MAC_FILTER_COUNT_PASS_EN,
 	       priv->base + MLXBF_GIGE_RX_MAC_FILTER_COUNT_PASS);
 
+	writeq(ilog2(priv->rx_q_entries),
+	       priv->base + MLXBF_GIGE_RX_WQE_SIZE_LOG2);
+
 	/* Clear MLXBF_GIGE_INT_MASK 'receive pkt' bit to
 	 * indicate readiness to receive interrupts
 	 */
@@ -154,9 +157,6 @@ int mlxbf_gige_rx_init(struct mlxbf_gige *priv)
 	data |= MLXBF_GIGE_RX_DMA_EN;
 	writeq(data, priv->base + MLXBF_GIGE_RX_DMA);
 
-	writeq(ilog2(priv->rx_q_entries),
-	       priv->base + MLXBF_GIGE_RX_WQE_SIZE_LOG2);
-
 	return 0;
 
 free_wqe_and_skb:
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v1 3/3] mlxbf_gige: Enable the GigE port in mlxbf_gige_open
  2023-09-19 22:13 [PATCH v1 0/3] mlxbf_gige: Fix several bugs Asmaa Mnebhi
  2023-09-19 22:13 ` [PATCH v1 1/3] mlxbf_gige: Fix kernel panic at shutdown Asmaa Mnebhi
  2023-09-19 22:13 ` [PATCH v1 2/3] mlxbf_gige: Fix intermittent no ip issue Asmaa Mnebhi
@ 2023-09-19 22:13 ` Asmaa Mnebhi
  2 siblings, 0 replies; 5+ messages in thread
From: Asmaa Mnebhi @ 2023-09-19 22:13 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, olteanv; +Cc: Asmaa Mnebhi, netdev, davthompson

At the moment, the GigE port is enabled in the mlxbf_gige_probe
function. If the mlxbf_gige_open is not executed, this could cause
pause frames to increase in the case where there is high backgroud
traffic. This results in clogging the port.
So move enabling the OOB port to mlxbf_gige_open.

Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
---
 .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c   | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
index 7d132a132a29..b285a9d0a66f 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
@@ -130,9 +130,15 @@ static int mlxbf_gige_open(struct net_device *netdev)
 {
 	struct mlxbf_gige *priv = netdev_priv(netdev);
 	struct phy_device *phydev = netdev->phydev;
+	u64 control;
 	u64 int_en;
 	int err;
 
+	/* Perform general init of GigE block */
+	control = readq(base + MLXBF_GIGE_CONTROL);
+	control |= MLXBF_GIGE_CONTROL_PORT_EN;
+	writeq(control, base + MLXBF_GIGE_CONTROL);
+
 	err = mlxbf_gige_request_irqs(priv);
 	if (err)
 		return err;
@@ -365,7 +371,6 @@ static int mlxbf_gige_probe(struct platform_device *pdev)
 	void __iomem *plu_base;
 	void __iomem *base;
 	int addr, phy_irq;
-	u64 control;
 	int err;
 
 	base = devm_platform_ioremap_resource(pdev, MLXBF_GIGE_RES_MAC);
@@ -380,11 +385,6 @@ static int mlxbf_gige_probe(struct platform_device *pdev)
 	if (IS_ERR(plu_base))
 		return PTR_ERR(plu_base);
 
-	/* Perform general init of GigE block */
-	control = readq(base + MLXBF_GIGE_CONTROL);
-	control |= MLXBF_GIGE_CONTROL_PORT_EN;
-	writeq(control, base + MLXBF_GIGE_CONTROL);
-
 	netdev = devm_alloc_etherdev(&pdev->dev, sizeof(*priv));
 	if (!netdev)
 		return -ENOMEM;
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v1 1/3] mlxbf_gige: Fix kernel panic at shutdown
  2023-09-19 22:13 ` [PATCH v1 1/3] mlxbf_gige: Fix kernel panic at shutdown Asmaa Mnebhi
@ 2023-09-20  2:51   ` Florian Fainelli
  0 siblings, 0 replies; 5+ messages in thread
From: Florian Fainelli @ 2023-09-20  2:51 UTC (permalink / raw)
  To: Asmaa Mnebhi, davem, edumazet, kuba, pabeni, olteanv; +Cc: netdev, davthompson



On 9/19/2023 3:13 PM, Asmaa Mnebhi wrote:
> There is a race condition happening during shutdown due to pending napi transactions.
> Since mlxbf_gige_poll is still running, it tries to access a NULL pointer and as a
> result causes a kernel panic:
> 
> [  284.074822] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
> ...
> [  284.322326] Call trace:
> [  284.324757]  mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige]
> [  284.330924]  mlxbf_gige_poll+0x54/0x160 [mlxbf_gige]
> [  284.335876]  __napi_poll+0x40/0x1c8
> [  284.339353]  net_rx_action+0x314/0x3a0
> [  284.343086]  __do_softirq+0x128/0x334
> [  284.346734]  run_ksoftirqd+0x54/0x6c
> [  284.350294]  smpboot_thread_fn+0x14c/0x190
> [  284.354375]  kthread+0x10c/0x110
> [  284.357588]  ret_from_fork+0x10/0x20
> [  284.361150] Code: 8b070000 f9000ea0 f95056c0 f86178a1 (b9407002)
> [  284.367227] ---[ end trace a18340bbb9ea2fa7 ]---
> 
> To fix this, return in the case where "priv" is NULL.
> 
> Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
> Reviewed-by: David Thompson <davthompson@nvidia.com>

This adds a test in a hot-path when the solution would be to simply make 
sure that the interface does not schedule any new NAPI calls, as well as 
stops being visible to the system. In its current form your shutdown 
function is trying to be as efficient as possible, I would just make 
your shutdown function the same as the remove function which would 
ensure the network device is torn down using a well traveled path.
-- 
Florian

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-09-20  2:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-19 22:13 [PATCH v1 0/3] mlxbf_gige: Fix several bugs Asmaa Mnebhi
2023-09-19 22:13 ` [PATCH v1 1/3] mlxbf_gige: Fix kernel panic at shutdown Asmaa Mnebhi
2023-09-20  2:51   ` Florian Fainelli
2023-09-19 22:13 ` [PATCH v1 2/3] mlxbf_gige: Fix intermittent no ip issue Asmaa Mnebhi
2023-09-19 22:13 ` [PATCH v1 3/3] mlxbf_gige: Enable the GigE port in mlxbf_gige_open Asmaa Mnebhi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.