netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05
@ 2020-03-05 23:17 Saeed Mahameed
  2020-03-05 23:17 ` [net 1/5] net/mlx5: DR, Fix postsend actions write length Saeed Mahameed
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Saeed Mahameed @ 2020-03-05 23:17 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Saeed Mahameed

Hi Dave,

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

For -stable v5.3
 ('net/mlx5e: kTLS, Fix wrong value in record tracker enum')

For -stable v5.4
 ('net/mlx5: DR, Fix postsend actions write length')

For -stable v5.5
 ('net/mlx5e: kTLS, Fix TCP seq off-by-1 issue in TX resync flow')
 ('net/mlx5e: Fix endianness handling in pedit mask')

Thanks,
Saeed.

---
The following changes since commit 3b4f06c715d0d3ecd6497275e3c85fe91462d0ee:

  sfc: complete the next packet when we receive a timestamp (2020-03-05 14:56:57 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2020-03-05

for you to fetch changes up to 0b136454741b2f6cb18d55e355d396db9248b2ab:

  net/mlx5: Clear LAG notifier pointer after unregister (2020-03-05 15:13:45 -0800)

----------------------------------------------------------------
mlx5-fixes-2020-03-05

----------------------------------------------------------------
Eli Cohen (1):
      net/mlx5: Clear LAG notifier pointer after unregister

Hamdan Igbaria (1):
      net/mlx5: DR, Fix postsend actions write length

Sebastian Hense (1):
      net/mlx5e: Fix endianness handling in pedit mask

Tariq Toukan (2):
      net/mlx5e: kTLS, Fix TCP seq off-by-1 issue in TX resync flow
      net/mlx5e: kTLS, Fix wrong value in record tracker enum

 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h      | 4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c   | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c              | 5 +++--
 drivers/net/ethernet/mellanox/mlx5/core/lag.c                | 4 +++-
 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c   | 3 ++-
 6 files changed, 11 insertions(+), 8 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [net 1/5] net/mlx5: DR, Fix postsend actions write length
  2020-03-05 23:17 [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05 Saeed Mahameed
@ 2020-03-05 23:17 ` Saeed Mahameed
  2020-03-05 23:17 ` [net 2/5] net/mlx5e: kTLS, Fix TCP seq off-by-1 issue in TX resync flow Saeed Mahameed
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Saeed Mahameed @ 2020-03-05 23:17 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Hamdan Igbaria, Alex Vesker, Saeed Mahameed

From: Hamdan Igbaria <hamdani@mellanox.com>

Fix the send info write length to be (actions x action) size in bytes.

Fixes: 297cccebdc5a ("net/mlx5: DR, Expose an internal API to issue RDMA operations")
Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c   | 3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
index 6dec2a550a10..2d93228ff633 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
@@ -933,7 +933,6 @@ static int dr_actions_l2_rewrite(struct mlx5dr_domain *dmn,
 
 	action->rewrite.data = (void *)ops;
 	action->rewrite.num_of_actions = i;
-	action->rewrite.chunk->byte_size = i * sizeof(*ops);
 
 	ret = mlx5dr_send_postsend_action(dmn, action);
 	if (ret) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
index c7f10d4f8f8d..095ec7b1399d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
@@ -558,7 +558,8 @@ int mlx5dr_send_postsend_action(struct mlx5dr_domain *dmn,
 	int ret;
 
 	send_info.write.addr = (uintptr_t)action->rewrite.data;
-	send_info.write.length = action->rewrite.chunk->byte_size;
+	send_info.write.length = action->rewrite.num_of_actions *
+				 DR_MODIFY_ACTION_SIZE;
 	send_info.write.lkey = 0;
 	send_info.remote_addr = action->rewrite.chunk->mr_addr;
 	send_info.rkey = action->rewrite.chunk->rkey;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net 2/5] net/mlx5e: kTLS, Fix TCP seq off-by-1 issue in TX resync flow
  2020-03-05 23:17 [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05 Saeed Mahameed
  2020-03-05 23:17 ` [net 1/5] net/mlx5: DR, Fix postsend actions write length Saeed Mahameed
@ 2020-03-05 23:17 ` Saeed Mahameed
  2020-03-05 23:17 ` [net 3/5] net/mlx5e: kTLS, Fix wrong value in record tracker enum Saeed Mahameed
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Saeed Mahameed @ 2020-03-05 23:17 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Tariq Toukan, Boris Pismenny, Saeed Mahameed

From: Tariq Toukan <tariqt@mellanox.com>

We have an off-by-1 issue in the TCP seq comparison.
The last sequence number that belongs to the TCP packet's payload
is not "start_seq + len", but one byte before it.
Fix it so the 'ends_before' is evaluated properly.

This fixes a bug that results in error completions in the
kTLS HW offload flows.

Fixes: ffbd9ca94e2e ("net/mlx5e: kTLS, Fix corner-case checks in TX resync flow")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
index f260dd96873b..52a56622034a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
@@ -218,7 +218,7 @@ tx_sync_info_get(struct mlx5e_ktls_offload_context_tx *priv_tx,
 	 *    this packet was already acknowledged and its record info
 	 *    was released.
 	 */
-	ends_before = before(tcp_seq + datalen, tls_record_start_seq(record));
+	ends_before = before(tcp_seq + datalen - 1, tls_record_start_seq(record));
 
 	if (unlikely(tls_record_is_start_marker(record))) {
 		ret = ends_before ? MLX5E_KTLS_SYNC_SKIP_NO_DATA : MLX5E_KTLS_SYNC_FAIL;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net 3/5] net/mlx5e: kTLS, Fix wrong value in record tracker enum
  2020-03-05 23:17 [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05 Saeed Mahameed
  2020-03-05 23:17 ` [net 1/5] net/mlx5: DR, Fix postsend actions write length Saeed Mahameed
  2020-03-05 23:17 ` [net 2/5] net/mlx5e: kTLS, Fix TCP seq off-by-1 issue in TX resync flow Saeed Mahameed
@ 2020-03-05 23:17 ` Saeed Mahameed
  2020-03-05 23:17 ` [net 4/5] net/mlx5e: Fix endianness handling in pedit mask Saeed Mahameed
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Saeed Mahameed @ 2020-03-05 23:17 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Tariq Toukan, Boris Pismenny, Saeed Mahameed

From: Tariq Toukan <tariqt@mellanox.com>

Fix to match the HW spec: TRACKING state is 1, SEARCHING is 2.
No real issue for now, as these values are not currently used.

Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
index a3efa29a4629..63116be6b1d6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
@@ -38,8 +38,8 @@ enum {
 
 enum {
 	MLX5E_TLS_PROGRESS_PARAMS_RECORD_TRACKER_STATE_START     = 0,
-	MLX5E_TLS_PROGRESS_PARAMS_RECORD_TRACKER_STATE_SEARCHING = 1,
-	MLX5E_TLS_PROGRESS_PARAMS_RECORD_TRACKER_STATE_TRACKING  = 2,
+	MLX5E_TLS_PROGRESS_PARAMS_RECORD_TRACKER_STATE_TRACKING  = 1,
+	MLX5E_TLS_PROGRESS_PARAMS_RECORD_TRACKER_STATE_SEARCHING = 2,
 };
 
 struct mlx5e_ktls_offload_context_tx {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net 4/5] net/mlx5e: Fix endianness handling in pedit mask
  2020-03-05 23:17 [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2020-03-05 23:17 ` [net 3/5] net/mlx5e: kTLS, Fix wrong value in record tracker enum Saeed Mahameed
@ 2020-03-05 23:17 ` Saeed Mahameed
  2020-03-05 23:17 ` [net 5/5] net/mlx5: Clear LAG notifier pointer after unregister Saeed Mahameed
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Saeed Mahameed @ 2020-03-05 23:17 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Sebastian Hense, Roi Dayan, Saeed Mahameed

From: Sebastian Hense <sebastian.hense1@ibm.com>

The mask value is provided as 64 bit and has to be casted in
either 32 or 16 bit. On big endian systems the wrong half was
casted which resulted in an all zero mask.

Fixes: 2b64beba0251 ("net/mlx5e: Support header re-write of partial fields in TC pedit offload")
Signed-off-by: Sebastian Hense <sebastian.hense1@ibm.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 74091f72c9a8..ec5fc52bf572 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -2476,10 +2476,11 @@ static int offload_pedit_fields(struct pedit_headers_action *hdrs,
 			continue;
 
 		if (f->field_bsize == 32) {
-			mask_be32 = *(__be32 *)&mask;
+			mask_be32 = (__be32)mask;
 			mask = (__force unsigned long)cpu_to_le32(be32_to_cpu(mask_be32));
 		} else if (f->field_bsize == 16) {
-			mask_be16 = *(__be16 *)&mask;
+			mask_be32 = (__be32)mask;
+			mask_be16 = *(__be16 *)&mask_be32;
 			mask = (__force unsigned long)cpu_to_le16(be16_to_cpu(mask_be16));
 		}
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net 5/5] net/mlx5: Clear LAG notifier pointer after unregister
  2020-03-05 23:17 [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05 Saeed Mahameed
                   ` (3 preceding siblings ...)
  2020-03-05 23:17 ` [net 4/5] net/mlx5e: Fix endianness handling in pedit mask Saeed Mahameed
@ 2020-03-05 23:17 ` Saeed Mahameed
  2020-03-09 19:08   ` Parav Pandit
  2020-03-05 23:56 ` [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05 Jakub Kicinski
  2020-03-07  6:43 ` David Miller
  6 siblings, 1 reply; 13+ messages in thread
From: Saeed Mahameed @ 2020-03-05 23:17 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eli Cohen, Vlad Buslov, Raed Salem, Saeed Mahameed

From: Eli Cohen <eli@mellanox.com>

After returning from unregister_netdevice_notifier_dev_net(), set the
notifier_call field to NULL so successive call to mlx5_lag_add() will
function as expected.

Fixes: 7907f23adc18 ("net/mlx5: Implement RoCE LAG feature")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lag.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
index 8e19f6ab8393..93052b07c76c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
@@ -615,8 +615,10 @@ void mlx5_lag_remove(struct mlx5_core_dev *dev)
 			break;
 
 	if (i == MLX5_MAX_PORTS) {
-		if (ldev->nb.notifier_call)
+		if (ldev->nb.notifier_call) {
 			unregister_netdevice_notifier_net(&init_net, &ldev->nb);
+			ldev->nb.notifier_call = NULL;
+		}
 		mlx5_lag_mp_cleanup(ldev);
 		cancel_delayed_work_sync(&ldev->bond_work);
 		mlx5_lag_dev_free(ldev);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05
  2020-03-05 23:17 [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05 Saeed Mahameed
                   ` (4 preceding siblings ...)
  2020-03-05 23:17 ` [net 5/5] net/mlx5: Clear LAG notifier pointer after unregister Saeed Mahameed
@ 2020-03-05 23:56 ` Jakub Kicinski
  2020-03-06  0:52   ` Saeed Mahameed
  2020-03-07  6:43 ` David Miller
  6 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2020-03-05 23:56 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: David S. Miller, netdev

On Thu,  5 Mar 2020 15:17:34 -0800 Saeed Mahameed wrote:
> Hi Dave,
> 
> This series introduces some fixes to mlx5 driver.
> 
> Please pull and let me know if there is any problem.
> 
> For -stable v5.3
>  ('net/mlx5e: kTLS, Fix wrong value in record tracker enum')

Back porting dead code feels kinda weird :S

> For -stable v5.4
>  ('net/mlx5: DR, Fix postsend actions write length')
> 
> For -stable v5.5
>  ('net/mlx5e: kTLS, Fix TCP seq off-by-1 issue in TX resync flow')
>  ('net/mlx5e: Fix endianness handling in pedit mask')

I can only trust your testing on the pedit change :)

Reviewed-by: Jakub Kicinski <kuba@kernel.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05
  2020-03-05 23:56 ` [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05 Jakub Kicinski
@ 2020-03-06  0:52   ` Saeed Mahameed
  0 siblings, 0 replies; 13+ messages in thread
From: Saeed Mahameed @ 2020-03-06  0:52 UTC (permalink / raw)
  To: kuba; +Cc: davem, netdev

On Thu, 2020-03-05 at 15:56 -0800, Jakub Kicinski wrote:
> On Thu,  5 Mar 2020 15:17:34 -0800 Saeed Mahameed wrote:
> > Hi Dave,
> > 
> > This series introduces some fixes to mlx5 driver.
> > 
> > Please pull and let me know if there is any problem.
> > 
> > For -stable v5.3
> >  ('net/mlx5e: kTLS, Fix wrong value in record tracker enum')
> 
> Back porting dead code feels kinda weird :S
> 

Yes this one can be skipped.

I have an automatic system that already tries to backport and test the
patch to find the correct -stable to mark, and this time I didn't
double check manually if the patch is actually needed to be backported
.. my bad.

> > For -stable v5.4
> >  ('net/mlx5: DR, Fix postsend actions write length')
> > 
> > For -stable v5.5
> >  ('net/mlx5e: kTLS, Fix TCP seq off-by-1 issue in TX resync flow')
> >  ('net/mlx5e: Fix endianness handling in pedit mask')
> 
> I can only trust your testing on the pedit change :)
> 
> Reviewed-by: Jakub Kicinski <kuba@kernel.org>

Thanks!


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05
  2020-03-05 23:17 [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05 Saeed Mahameed
                   ` (5 preceding siblings ...)
  2020-03-05 23:56 ` [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05 Jakub Kicinski
@ 2020-03-07  6:43 ` David Miller
  2020-03-23 22:57   ` Saeed Mahameed
  6 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2020-03-07  6:43 UTC (permalink / raw)
  To: saeedm; +Cc: netdev

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu,  5 Mar 2020 15:17:34 -0800

> This series introduces some fixes to mlx5 driver.
> 
> Please pull and let me know if there is any problem.

Pulled.

> For -stable v5.4
>  ('net/mlx5: DR, Fix postsend actions write length')
> 
> For -stable v5.5
>  ('net/mlx5e: kTLS, Fix TCP seq off-by-1 issue in TX resync flow')
>  ('net/mlx5e: Fix endianness handling in pedit mask')

Queued up, thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [net 5/5] net/mlx5: Clear LAG notifier pointer after unregister
  2020-03-05 23:17 ` [net 5/5] net/mlx5: Clear LAG notifier pointer after unregister Saeed Mahameed
@ 2020-03-09 19:08   ` Parav Pandit
  2020-03-11 10:10     ` Eli Cohen
  0 siblings, 1 reply; 13+ messages in thread
From: Parav Pandit @ 2020-03-09 19:08 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller
  Cc: netdev, Eli Cohen, Vlad Buslov, Raed Salem, Saeed Mahameed

Hi Eli,

> Sent: Thursday, March 5, 2020 5:18 PM
> To: David S. Miller <davem@davemloft.net>
> 
> From: Eli Cohen <eli@mellanox.com>
> 
> After returning from unregister_netdevice_notifier_dev_net(), set the
> notifier_call field to NULL so successive call to mlx5_lag_add() will function as
> expected.
> 
> Fixes: 7907f23adc18 ("net/mlx5: Implement RoCE LAG feature")
> Signed-off-by: Eli Cohen <eli@mellanox.com>
> Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
> Reviewed-by: Raed Salem <raeds@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/lag.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag.c
> b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
> index 8e19f6ab8393..93052b07c76c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
> @@ -615,8 +615,10 @@ void mlx5_lag_remove(struct mlx5_core_dev *dev)
>  			break;
> 
>  	if (i == MLX5_MAX_PORTS) {
> -		if (ldev->nb.notifier_call)
> +		if (ldev->nb.notifier_call) {
>  			unregister_netdevice_notifier_net(&init_net, &ldev-
> >nb);
> +			ldev->nb.notifier_call = NULL;
> +		}
>  		mlx5_lag_mp_cleanup(ldev);
>  		cancel_delayed_work_sync(&ldev->bond_work);
>  		mlx5_lag_dev_free(ldev);
> --
> 2.24.1

I have noticed this and applied this local change to avoid below call trace and reported it to Leon few days back in different discussion.

But I fail to justify that this was/is the right fix. To me it seems to hide another bug.
Can you please explain the flow why mlx5_lag_remove() will be called twice with i = 2 because of which null check is needed to avoid unregister_notifier second time?

RIP: 0010:__list_del_entry_valid+0x7c/0xa0
unregister_netdevice_notifier_dev_net+0x1f/0x70
mlx5_lag_remove+0x61/0xf0 [mlx5_core]
mlx5e_detach_netdev+0x24/0x50 [mlx5_core]
mlx5e_detach+0x36/0x40 [mlx5_core]
mlx5e_remove+0x48/0x60 [mlx5_core]
mlx5_remove_device+0xb0/0xc0 [mlx5_core]
mlx5_unregister_interface+0x39/0x90 [mlx5_core]
cleanup+0x5/0xdd1 [mlx5_core]


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [net 5/5] net/mlx5: Clear LAG notifier pointer after unregister
  2020-03-09 19:08   ` Parav Pandit
@ 2020-03-11 10:10     ` Eli Cohen
  0 siblings, 0 replies; 13+ messages in thread
From: Eli Cohen @ 2020-03-11 10:10 UTC (permalink / raw)
  To: Parav Pandit, Saeed Mahameed, David S. Miller
  Cc: netdev, Vlad Buslov, Raed Salem, Saeed Mahameed

Hi Parav,

My patch did not fix the issue you are quoting below with call trace. It  has been fixed by this patch:
commit e387f7d5fccf95299135c88b799184c3bef6a705
Author: Jiri Pirko <jiri@mellanox.com>
Date:   Thu Feb 27 08:22:10 2020 +0100

    mlx5: register lag notifier for init network namespace only

My fix is straightforward. Look at the patch.


-----Original Message-----
From: Parav Pandit <parav@mellanox.com> 
Sent: Monday, March 9, 2020 9:09 PM
To: Saeed Mahameed <saeedm@mellanox.com>; David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org; Eli Cohen <eli@mellanox.com>; Vlad Buslov <vladbu@mellanox.com>; Raed Salem <raeds@mellanox.com>; Saeed Mahameed <saeedm@mellanox.com>
Subject: RE: [net 5/5] net/mlx5: Clear LAG notifier pointer after unregister

Hi Eli,

> Sent: Thursday, March 5, 2020 5:18 PM
> To: David S. Miller <davem@davemloft.net>
> 
> From: Eli Cohen <eli@mellanox.com>
> 
> After returning from unregister_netdevice_notifier_dev_net(), set the 
> notifier_call field to NULL so successive call to mlx5_lag_add() will 
> function as expected.
> 
> Fixes: 7907f23adc18 ("net/mlx5: Implement RoCE LAG feature")
> Signed-off-by: Eli Cohen <eli@mellanox.com>
> Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
> Reviewed-by: Raed Salem <raeds@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/lag.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag.c
> b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
> index 8e19f6ab8393..93052b07c76c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
> @@ -615,8 +615,10 @@ void mlx5_lag_remove(struct mlx5_core_dev *dev)
>  			break;
> 
>  	if (i == MLX5_MAX_PORTS) {
> -		if (ldev->nb.notifier_call)
> +		if (ldev->nb.notifier_call) {
>  			unregister_netdevice_notifier_net(&init_net, &ldev-
> >nb);
> +			ldev->nb.notifier_call = NULL;
> +		}
>  		mlx5_lag_mp_cleanup(ldev);
>  		cancel_delayed_work_sync(&ldev->bond_work);
>  		mlx5_lag_dev_free(ldev);
> --
> 2.24.1

I have noticed this and applied this local change to avoid below call trace and reported it to Leon few days back in different discussion.

But I fail to justify that this was/is the right fix. To me it seems to hide another bug.
Can you please explain the flow why mlx5_lag_remove() will be called twice with i = 2 because of which null check is needed to avoid unregister_notifier second time?

RIP: 0010:__list_del_entry_valid+0x7c/0xa0
unregister_netdevice_notifier_dev_net+0x1f/0x70
mlx5_lag_remove+0x61/0xf0 [mlx5_core]
mlx5e_detach_netdev+0x24/0x50 [mlx5_core]
mlx5e_detach+0x36/0x40 [mlx5_core]
mlx5e_remove+0x48/0x60 [mlx5_core]
mlx5_remove_device+0xb0/0xc0 [mlx5_core]
mlx5_unregister_interface+0x39/0x90 [mlx5_core]
cleanup+0x5/0xdd1 [mlx5_core]


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05
  2020-03-07  6:43 ` David Miller
@ 2020-03-23 22:57   ` Saeed Mahameed
  2020-03-24  1:41     ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Saeed Mahameed @ 2020-03-23 22:57 UTC (permalink / raw)
  To: davem; +Cc: netdev

On Fri, 2020-03-06 at 22:43 -0800, David Miller wrote:
> From: Saeed Mahameed <saeedm@mellanox.com>
> Date: Thu,  5 Mar 2020 15:17:34 -0800
> 
> > This series introduces some fixes to mlx5 driver.
> > 
> > Please pull and let me know if there is any problem.
> 
> Pulled.
> 

Hi Dave, 

i don't see this pull request merged into your net tree, 
what am i missing ? 

I am preparing a new fixes pull request, should i re-include these
patches and start over ? I don't mind .. 

> > For -stable v5.4
> >  ('net/mlx5: DR, Fix postsend actions write length')
> > 
> > For -stable v5.5
> >  ('net/mlx5e: kTLS, Fix TCP seq off-by-1 issue in TX resync flow')
> >  ('net/mlx5e: Fix endianness handling in pedit mask')
> 
> Queued up, thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05
  2020-03-23 22:57   ` Saeed Mahameed
@ 2020-03-24  1:41     ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2020-03-24  1:41 UTC (permalink / raw)
  To: saeedm; +Cc: netdev

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Mon, 23 Mar 2020 22:57:32 +0000

> i don't see this pull request merged into your net tree, 
> what am i missing ? 
> 
> I am preparing a new fixes pull request, should i re-include these
> patches and start over ? I don't mind .. 

I left it build tested on my laptop but not pushed out, sorry :-)

It's in the tree now.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-03-24  1:41 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-05 23:17 [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05 Saeed Mahameed
2020-03-05 23:17 ` [net 1/5] net/mlx5: DR, Fix postsend actions write length Saeed Mahameed
2020-03-05 23:17 ` [net 2/5] net/mlx5e: kTLS, Fix TCP seq off-by-1 issue in TX resync flow Saeed Mahameed
2020-03-05 23:17 ` [net 3/5] net/mlx5e: kTLS, Fix wrong value in record tracker enum Saeed Mahameed
2020-03-05 23:17 ` [net 4/5] net/mlx5e: Fix endianness handling in pedit mask Saeed Mahameed
2020-03-05 23:17 ` [net 5/5] net/mlx5: Clear LAG notifier pointer after unregister Saeed Mahameed
2020-03-09 19:08   ` Parav Pandit
2020-03-11 10:10     ` Eli Cohen
2020-03-05 23:56 ` [pull request][net 0/5] Mellanox, mlx5 fixes 2020-03-05 Jakub Kicinski
2020-03-06  0:52   ` Saeed Mahameed
2020-03-07  6:43 ` David Miller
2020-03-23 22:57   ` Saeed Mahameed
2020-03-24  1:41     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).