linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
@ 2021-10-24  8:42 Leon Romanovsky
  2021-10-24  9:05 ` Ido Schimmel
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-24  8:42 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: Leon Romanovsky, Ido Schimmel, Jiri Pirko, linux-kernel, netdev,
	syzbot+93d5accfaefceedf43c1

From: Leon Romanovsky <leonro@nvidia.com>

Align netdevsim to be like all other physical devices that register and
unregister devlink traps during their probe and removal respectively.

netdevsim netdevsim0 netdevsim2 (unregistering): unset [1, 0] type 2 family 0 port 6081 - 0
netdevsim netdevsim0 netdevsim1 (unregistering): unset [1, 0] type 2 family 0 port 6081 - 0
netdevsim netdevsim0 netdevsim0 (unregistering): unset [1, 0] type 2 family 0 port 6081 - 0
------------[ cut here ]------------
WARNING: CPU: 0 PID: 6553 at net/core/devlink.c:11162 devlink_trap_groups_unregister+0xe8/0x110 net/core/devlink.c:11162
Modules linked in:
CPU: 0 PID: 6553 Comm: syz-executor166 Not tainted 5.15.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:devlink_trap_groups_unregister+0xe8/0x110 net/core/devlink.c:11162
Code: ff ff 31 ff 89 de e8 97 e3 41 fa 83 fb ff 75 cc e8 4d dc 41 fa 4c 89 f7 5b 5d 41 5c 41 5d 41 5e e9 6d 79 05 02 e8 38 dc 41 fa <0f> 0b e9 71 ff ff ff 4c 89 ef e8 19 4f 89 fa e9 3b ff ff ff 48 89
RSP: 0018:ffffc90002d8f3b0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000000
RDX: ffff888024a00000 RSI: ffffffff87350e68 RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff87350dd7 R11: 0000000000000000 R12: ffffffff8a263c20
R13: ffff888077d36000 R14: dffffc0000000000 R15: ffff888077d36388
FS:  000055555711e300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200004c0 CR3: 0000000070465000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 nsim_dev_traps_exit+0x67/0x170 drivers/net/netdevsim/dev.c:849
 nsim_dev_reload_destroy+0x20c/0x2f0 drivers/net/netdevsim/dev.c:1559
 nsim_dev_reload_down+0xdf/0x180 drivers/net/netdevsim/dev.c:883
 devlink_reload+0x53b/0x6b0 net/core/devlink.c:4040
 devlink_nl_cmd_reload+0x6ed/0x11d0 net/core/devlink.c:4161
 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731
 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
 genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2491
 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x86d/0xda0 net/netlink/af_netlink.c:1916
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f02abf45409
Code: 28 c3 e8 4a 15 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffeed83e458 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ffeed83e468 RCX: 00007f02abf45409
RDX: 0000000000000000 RSI: 0000000020000600 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffeed83e470
R13: 00007ffeed83e490 R14: 0000000000000000 R15: 0000000000000000

Fixes: da58f90f11f5 ("netdevsim: Add devlink-trap support")
Reported-by: syzbot+93d5accfaefceedf43c1@syzkaller.appspotmail.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/net/netdevsim/dev.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 9661aca35703..2698241bb886 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -1400,14 +1400,10 @@ static int nsim_dev_reload_create(struct nsim_dev *nsim_dev,
 	if (err)
 		return err;
 
-	err = nsim_dev_traps_init(devlink);
-	if (err)
-		goto err_dummy_region_exit;
-
 	nsim_dev->fib_data = nsim_fib_create(devlink, extack);
 	if (IS_ERR(nsim_dev->fib_data)) {
 		err = PTR_ERR(nsim_dev->fib_data);
-		goto err_traps_exit;
+		goto err_dummy_region_exit;
 	}
 
 	err = nsim_dev_health_init(nsim_dev, devlink);
@@ -1435,8 +1431,6 @@ static int nsim_dev_reload_create(struct nsim_dev *nsim_dev,
 	nsim_dev_health_exit(nsim_dev);
 err_fib_destroy:
 	nsim_fib_destroy(devlink, nsim_dev->fib_data);
-err_traps_exit:
-	nsim_dev_traps_exit(devlink);
 err_dummy_region_exit:
 	nsim_dev_dummy_region_exit(nsim_dev);
 	return err;
@@ -1556,7 +1550,6 @@ static void nsim_dev_reload_destroy(struct nsim_dev *nsim_dev)
 	nsim_dev_psample_exit(nsim_dev);
 	nsim_dev_health_exit(nsim_dev);
 	nsim_fib_destroy(devlink, nsim_dev->fib_data);
-	nsim_dev_traps_exit(devlink);
 	nsim_dev_dummy_region_exit(nsim_dev);
 	mutex_destroy(&nsim_dev->port_list_lock);
 }
@@ -1567,6 +1560,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
 	struct devlink *devlink = priv_to_devlink(nsim_dev);
 
 	devlink_unregister(devlink);
+	nsim_dev_traps_exit(devlink);
 	nsim_dev_reload_destroy(nsim_dev);
 
 	nsim_bpf_dev_exit(nsim_dev);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-24  8:42 [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device Leon Romanovsky
@ 2021-10-24  9:05 ` Ido Schimmel
  2021-10-24  9:54   ` Leon Romanovsky
  2021-10-25 23:19   ` Edwin Peer
  0 siblings, 2 replies; 29+ messages in thread
From: Ido Schimmel @ 2021-10-24  9:05 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: David S . Miller, Jakub Kicinski, Leon Romanovsky, Ido Schimmel,
	Jiri Pirko, linux-kernel, netdev, syzbot+93d5accfaefceedf43c1

On Sun, Oct 24, 2021 at 11:42:11AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Align netdevsim to be like all other physical devices that register and
> unregister devlink traps during their probe and removal respectively.

No, this is incorrect. Out of the three drivers that support both reload
and traps, both netdevsim and mlxsw unregister the traps during reload.
Here is another report from syzkaller about mlxsw [1].

Please revert both 22849b5ea595 ("devlink: Remove not-executed trap
policer notifications") and 8bbeed485823 ("devlink: Remove not-executed
trap group notifications").

Thanks

[1]
------------[ cut here ]------------
WARNING: CPU: 1 PID: 556 at net/core/devlink.c:11162 devlink_trap_groups_unregister+0xf6/0x120 net/core/devlink.c:11162
Modules linked in:
CPU: 1 PID: 556 Comm: syz-executor.0 Not tainted 5.15.0-rc6-custom-64140-gb38f6d83c5b7 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:devlink_trap_groups_unregister+0xf6/0x120 net/core/devlink.c:11162
Code: de 31 ff e8 7c 13 12 fe 83 fb ff 75 cc e8 d2 11 12 fe 4c 89 ff e8 ca 50 94 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 ba 11 12 fe <0f> 0b e9 6e ff ff ff 4c 89 ef e8 1b a6 57 fe e9 37 ff ff ff 4c 89
RSP: 0018:ffffc9000116f490 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff8350e302
RDX: 0000000000000000 RSI: ffff8881022655c0 RDI: 0000000000000002
RBP: ffffc9000116f4b8 R08: ffff8881022655c0 R09: fffffbfff0dfa61d
R10: 1ffff110235c42fa R11: fffffbfff0dfa61c R12: ffff88810a0bce28
R13: ffff8881033dc000 R14: 0000000000000001 R15: ffffed102071ebbb
FS:  0000000003152980(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055dad1f080f0 CR3: 0000000053fc4000 CR4: 00000000000006e0
Call Trace:
 mlxsw_sp_trap_groups_fini+0x10a/0x1c0 drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c:1411
 mlxsw_sp_devlink_traps_fini+0x175/0x1e0 drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c:1540
 mlxsw_sp_fini+0x3cb/0x500 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3114
 mlxsw_core_bus_device_unregister+0xdf/0x6c0 drivers/net/ethernet/mellanox/mlxsw/core.c:2091
 mlxsw_devlink_core_bus_device_reload_down+0x87/0xb0 drivers/net/ethernet/mellanox/mlxsw/core.c:1473
 devlink_reload+0x184/0x630 net/core/devlink.c:4040
 devlink_nl_cmd_reload+0x612/0x1320 net/core/devlink.c:4161
 genl_family_rcv_msg_doit.isra.0+0x253/0x370 net/netlink/genetlink.c:731
 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
 genl_rcv_msg+0x389/0x620 net/netlink/genetlink.c:792
 netlink_rcv_skb+0x173/0x480 net/netlink/af_netlink.c:2491
 genl_rcv+0x2e/0x40 net/netlink/genetlink.c:803
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x5ae/0x7f0 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x8e1/0xe30 net/netlink/af_netlink.c:1916
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg net/socket.c:724 [inline]
 __sys_sendto+0x2b6/0x410 net/socket.c:2036
 __do_sys_sendto net/socket.c:2048 [inline]
 __se_sys_sendto net/socket.c:2044 [inline]
 __x64_sys_sendto+0xe6/0x1a0 net/socket.c:2044
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x41f86a
Code: 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
RSP: 002b:00007ffdb75a9918 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00000000014c4320 RCX: 000000000041f86a
RDX: 0000000000000038 RSI: 00000000014c4370 RDI: 0000000000000004
RBP: 0000000000000001 R08: 00007ffdb75a9934 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00000000014c4370 R14: 0000000000000004 R15: 0000000000000000

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-24  9:05 ` Ido Schimmel
@ 2021-10-24  9:54   ` Leon Romanovsky
  2021-10-24 10:48     ` Ido Schimmel
  2021-10-25 23:19   ` Edwin Peer
  1 sibling, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-24  9:54 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David S . Miller, Jakub Kicinski, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1

On Sun, Oct 24, 2021 at 12:05:12PM +0300, Ido Schimmel wrote:
> On Sun, Oct 24, 2021 at 11:42:11AM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Align netdevsim to be like all other physical devices that register and
> > unregister devlink traps during their probe and removal respectively.
> 
> No, this is incorrect. Out of the three drivers that support both reload
> and traps, both netdevsim and mlxsw unregister the traps during reload.
> Here is another report from syzkaller about mlxsw [1].

Sorry, I overlooked it.

> 
> Please revert both 22849b5ea595 ("devlink: Remove not-executed trap
> policer notifications") and 8bbeed485823 ("devlink: Remove not-executed
> trap group notifications").

However, before we rush and revert commit, can you please explain why
current behavior to reregister traps on reload is correct?

I think that you are not changing traps during reload, so traps before
reload will be the same as after reload, am I right?

Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-24  9:54   ` Leon Romanovsky
@ 2021-10-24 10:48     ` Ido Schimmel
  2021-10-25  5:34       ` Leon Romanovsky
  0 siblings, 1 reply; 29+ messages in thread
From: Ido Schimmel @ 2021-10-24 10:48 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: David S . Miller, Jakub Kicinski, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1

On Sun, Oct 24, 2021 at 12:54:52PM +0300, Leon Romanovsky wrote:
> On Sun, Oct 24, 2021 at 12:05:12PM +0300, Ido Schimmel wrote:
> > On Sun, Oct 24, 2021 at 11:42:11AM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > > 
> > > Align netdevsim to be like all other physical devices that register and
> > > unregister devlink traps during their probe and removal respectively.
> > 
> > No, this is incorrect. Out of the three drivers that support both reload
> > and traps, both netdevsim and mlxsw unregister the traps during reload.
> > Here is another report from syzkaller about mlxsw [1].
> 
> Sorry, I overlooked it.
> 
> > 
> > Please revert both 22849b5ea595 ("devlink: Remove not-executed trap
> > policer notifications") and 8bbeed485823 ("devlink: Remove not-executed
> > trap group notifications").
> 
> However, before we rush and revert commit, can you please explain why
> current behavior to reregister traps on reload is correct?
> 
> I think that you are not changing traps during reload, so traps before
> reload will be the same as after reload, am I right?

During reload we tear down the entire driver and load it again. As part
of the reload_down() operation we tear down the various objects from
both devlink and the device (e.g., shared buffer, ports, traps, etc.).
As part of the reload_up() operation we issue a device reset and
register everything back.

While the list of objects doesn't change, their properties (e.g., shared
buffer size, trap action, policer rate) do change back to the default
after reload and we cannot go back on that as it's a user-visible
change.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-24 10:48     ` Ido Schimmel
@ 2021-10-25  5:34       ` Leon Romanovsky
  2021-10-25  8:08         ` Ido Schimmel
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-25  5:34 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David S . Miller, Jakub Kicinski, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1

On Sun, Oct 24, 2021 at 01:48:25PM +0300, Ido Schimmel wrote:
> On Sun, Oct 24, 2021 at 12:54:52PM +0300, Leon Romanovsky wrote:
> > On Sun, Oct 24, 2021 at 12:05:12PM +0300, Ido Schimmel wrote:
> > > On Sun, Oct 24, 2021 at 11:42:11AM +0300, Leon Romanovsky wrote:
> > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > > 
> > > > Align netdevsim to be like all other physical devices that register and
> > > > unregister devlink traps during their probe and removal respectively.
> > > 
> > > No, this is incorrect. Out of the three drivers that support both reload
> > > and traps, both netdevsim and mlxsw unregister the traps during reload.
> > > Here is another report from syzkaller about mlxsw [1].
> > 
> > Sorry, I overlooked it.
> > 
> > > 
> > > Please revert both 22849b5ea595 ("devlink: Remove not-executed trap
> > > policer notifications") and 8bbeed485823 ("devlink: Remove not-executed
> > > trap group notifications").
> > 
> > However, before we rush and revert commit, can you please explain why
> > current behavior to reregister traps on reload is correct?
> > 
> > I think that you are not changing traps during reload, so traps before
> > reload will be the same as after reload, am I right?
> 
> During reload we tear down the entire driver and load it again. As part
> of the reload_down() operation we tear down the various objects from
> both devlink and the device (e.g., shared buffer, ports, traps, etc.).
> As part of the reload_up() operation we issue a device reset and
> register everything back.

This is an implementation which is arguably questionable and pinpoints
problem with devlink reload. It mixes different SW layers into one big
mess which I tried to untangle.

The devlink "feature" that driver reregisters itself again during execution
of other user-visible devlink command can't be right design.

> 
> While the list of objects doesn't change, their properties (e.g., shared
> buffer size, trap action, policer rate) do change back to the default
> after reload and we cannot go back on that as it's a user-visible
> change.

I don't propose to go back, just prefer to see fixed mlxsw that
shouldn't touch already created and registered objects from net/core/devlink.c.

All reset-to-default should be performed internally to the driver
without any need to devlink_*_register() again, so we will be able to
clean rest devlink notifications.

So at least for the netdevsim, this change looks like the correct one,
while mlxsw should be fixed next.

Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-25  5:34       ` Leon Romanovsky
@ 2021-10-25  8:08         ` Ido Schimmel
  2021-10-25 10:56           ` Leon Romanovsky
  2021-10-25 18:24           ` Jakub Kicinski
  0 siblings, 2 replies; 29+ messages in thread
From: Ido Schimmel @ 2021-10-25  8:08 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: David S . Miller, Jakub Kicinski, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1

On Mon, Oct 25, 2021 at 08:34:55AM +0300, Leon Romanovsky wrote:
> On Sun, Oct 24, 2021 at 01:48:25PM +0300, Ido Schimmel wrote:
> > On Sun, Oct 24, 2021 at 12:54:52PM +0300, Leon Romanovsky wrote:
> > > On Sun, Oct 24, 2021 at 12:05:12PM +0300, Ido Schimmel wrote:
> > > > On Sun, Oct 24, 2021 at 11:42:11AM +0300, Leon Romanovsky wrote:
> > > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > > > 
> > > > > Align netdevsim to be like all other physical devices that register and
> > > > > unregister devlink traps during their probe and removal respectively.
> > > > 
> > > > No, this is incorrect. Out of the three drivers that support both reload
> > > > and traps, both netdevsim and mlxsw unregister the traps during reload.
> > > > Here is another report from syzkaller about mlxsw [1].
> > > 
> > > Sorry, I overlooked it.
> > > 
> > > > 
> > > > Please revert both 22849b5ea595 ("devlink: Remove not-executed trap
> > > > policer notifications") and 8bbeed485823 ("devlink: Remove not-executed
> > > > trap group notifications").
> > > 
> > > However, before we rush and revert commit, can you please explain why
> > > current behavior to reregister traps on reload is correct?
> > > 
> > > I think that you are not changing traps during reload, so traps before
> > > reload will be the same as after reload, am I right?
> > 
> > During reload we tear down the entire driver and load it again. As part
> > of the reload_down() operation we tear down the various objects from
> > both devlink and the device (e.g., shared buffer, ports, traps, etc.).
> > As part of the reload_up() operation we issue a device reset and
> > register everything back.
> 
> This is an implementation which is arguably questionable and pinpoints
> problem with devlink reload. It mixes different SW layers into one big
> mess which I tried to untangle.
> 
> The devlink "feature" that driver reregisters itself again during execution
> of other user-visible devlink command can't be right design.
> 
> > 
> > While the list of objects doesn't change, their properties (e.g., shared
> > buffer size, trap action, policer rate) do change back to the default
> > after reload and we cannot go back on that as it's a user-visible
> > change.
> 
> I don't propose to go back, just prefer to see fixed mlxsw that
> shouldn't touch already created and registered objects from net/core/devlink.c.
> 
> All reset-to-default should be performed internally to the driver
> without any need to devlink_*_register() again, so we will be able to
> clean rest devlink notifications.
> 
> So at least for the netdevsim, this change looks like the correct one,
> while mlxsw should be fixed next.

No, it's not correct. After your patch, trap properties like action are
not set back to the default. Regardless of what you think is the "right
design", you cannot introduce such regressions.

Calling devlink_*_unregister() in reload_down() and devlink_*_register()
in reload_up() is not new. It is done for multiple objects (e.g., ports,
regions, shared buffer, etc). After your patch, netdevsim is still doing
it.

Again, please revert the two commits I mentioned. If you think they are
necessary, you can re-submit them in the future, after proper review and
testing of the affected code paths.

Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-25  8:08         ` Ido Schimmel
@ 2021-10-25 10:56           ` Leon Romanovsky
  2021-10-26  6:51             ` Ido Schimmel
  2021-10-25 18:24           ` Jakub Kicinski
  1 sibling, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-25 10:56 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David S . Miller, Jakub Kicinski, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1

On Mon, Oct 25, 2021 at 11:08:00AM +0300, Ido Schimmel wrote:
> On Mon, Oct 25, 2021 at 08:34:55AM +0300, Leon Romanovsky wrote:
> > On Sun, Oct 24, 2021 at 01:48:25PM +0300, Ido Schimmel wrote:
> > > On Sun, Oct 24, 2021 at 12:54:52PM +0300, Leon Romanovsky wrote:
> > > > On Sun, Oct 24, 2021 at 12:05:12PM +0300, Ido Schimmel wrote:
> > > > > On Sun, Oct 24, 2021 at 11:42:11AM +0300, Leon Romanovsky wrote:
> > > > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > > > > 
> > > > > > Align netdevsim to be like all other physical devices that register and
> > > > > > unregister devlink traps during their probe and removal respectively.
> > > > > 
> > > > > No, this is incorrect. Out of the three drivers that support both reload
> > > > > and traps, both netdevsim and mlxsw unregister the traps during reload.
> > > > > Here is another report from syzkaller about mlxsw [1].
> > > > 
> > > > Sorry, I overlooked it.
> > > > 
> > > > > 
> > > > > Please revert both 22849b5ea595 ("devlink: Remove not-executed trap
> > > > > policer notifications") and 8bbeed485823 ("devlink: Remove not-executed
> > > > > trap group notifications").
> > > > 
> > > > However, before we rush and revert commit, can you please explain why
> > > > current behavior to reregister traps on reload is correct?
> > > > 
> > > > I think that you are not changing traps during reload, so traps before
> > > > reload will be the same as after reload, am I right?
> > > 
> > > During reload we tear down the entire driver and load it again. As part
> > > of the reload_down() operation we tear down the various objects from
> > > both devlink and the device (e.g., shared buffer, ports, traps, etc.).
> > > As part of the reload_up() operation we issue a device reset and
> > > register everything back.
> > 
> > This is an implementation which is arguably questionable and pinpoints
> > problem with devlink reload. It mixes different SW layers into one big
> > mess which I tried to untangle.
> > 
> > The devlink "feature" that driver reregisters itself again during execution
> > of other user-visible devlink command can't be right design.
> > 
> > > 
> > > While the list of objects doesn't change, their properties (e.g., shared
> > > buffer size, trap action, policer rate) do change back to the default
> > > after reload and we cannot go back on that as it's a user-visible
> > > change.
> > 
> > I don't propose to go back, just prefer to see fixed mlxsw that
> > shouldn't touch already created and registered objects from net/core/devlink.c.
> > 
> > All reset-to-default should be performed internally to the driver
> > without any need to devlink_*_register() again, so we will be able to
> > clean rest devlink notifications.
> > 
> > So at least for the netdevsim, this change looks like the correct one,
> > while mlxsw should be fixed next.
> 
> No, it's not correct. After your patch, trap properties like action are
> not set back to the default. Regardless of what you think is the "right
> design", you cannot introduce such regressions.

Again, I'm not against fixing the regression, I'm trying to understand
why it is impossible to fix mlxsw and netdevsim to honor SW layering
properly.

> 
> Calling devlink_*_unregister() in reload_down() and devlink_*_register()
> in reload_up() is not new. It is done for multiple objects (e.g., ports,
> regions, shared buffer, etc). After your patch, netdevsim is still doing
> it.

Yeah, it was introduced by same developers who did it in mlxsw, so no
wonders that same patterns exist in both drivers.

> 
> Again, please revert the two commits I mentioned. If you think they are
> necessary, you can re-submit them in the future, after proper review and
> testing of the affected code paths.

It was posted for review in the ML, no one objected.

Can you please explain why is it so important to touch devlink SW
objects, reallocate them again and again on every reload in mlxsw?

The flow is convoluted without any reason:
devlink reload ->
  mlxsw reload_down -> 
    devlink ... unregister -> 
      mlxsw reload_down again ->
        devlink reload again ->
	  mlxsw reload_up ->
	    devlink ... register ->
	      mlxsw to handle register ->
	        mlxsw reload_up again ->
		  devlink reload finish

Instead of:
devlink reload ->
 mlxsw reload_down ->
   devlink reload again ->
     mlxsw reload_up ->
       devlink reload finish

Thanks

> 
> Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-25  8:08         ` Ido Schimmel
  2021-10-25 10:56           ` Leon Romanovsky
@ 2021-10-25 18:24           ` Jakub Kicinski
  2021-10-25 19:12             ` Leon Romanovsky
  1 sibling, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2021-10-25 18:24 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Leon Romanovsky, David S . Miller, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1

On Mon, 25 Oct 2021 11:08:00 +0300 Ido Schimmel wrote:
> No, it's not correct. After your patch, trap properties like action are
> not set back to the default. Regardless of what you think is the "right
> design", you cannot introduce such regressions.
> 
> Calling devlink_*_unregister() in reload_down() and devlink_*_register()
> in reload_up() is not new. It is done for multiple objects (e.g., ports,
> regions, shared buffer, etc). After your patch, netdevsim is still doing
> it.

If we want to push forward in the direction that Leon advocates we'd
have to unregister the devlink instance before reload_down(), right?

Otherwise it seems fundamentally incompatible with the idea of reload
for reconfig. And we'd be trading one partially correct model for
another partially correct model :/

> Again, please revert the two commits I mentioned. If you think they are
> necessary, you can re-submit them in the future, after proper review and
> testing of the affected code paths.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-25 18:24           ` Jakub Kicinski
@ 2021-10-25 19:12             ` Leon Romanovsky
  0 siblings, 0 replies; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-25 19:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Ido Schimmel, David S . Miller, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1

On Mon, Oct 25, 2021 at 11:24:53AM -0700, Jakub Kicinski wrote:
> On Mon, 25 Oct 2021 11:08:00 +0300 Ido Schimmel wrote:
> > No, it's not correct. After your patch, trap properties like action are
> > not set back to the default. Regardless of what you think is the "right
> > design", you cannot introduce such regressions.
> > 
> > Calling devlink_*_unregister() in reload_down() and devlink_*_register()
> > in reload_up() is not new. It is done for multiple objects (e.g., ports,
> > regions, shared buffer, etc). After your patch, netdevsim is still doing
> > it.
> 
> If we want to push forward in the direction that Leon advocates we'd
> have to unregister the devlink instance before reload_down(), right?

Not really, we ensure that during "devlink reload", users can't send any
get/set commands. So you don't need to unregister anything, because
driver is safe to change any internal values it wants. The devlink instance
is locked for exclusive access.

The protection is done now by devlink_mutex, but will be slightly
different in the near future.

> 
> Otherwise it seems fundamentally incompatible with the idea of reload
> for reconfig. And we'd be trading one partially correct model for
> another partially correct model :/

I see devlink reload as command to reconfigure device and not devlink
itself. This is the difference between my direction vs. mlxsw way.

Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-24  9:05 ` Ido Schimmel
  2021-10-24  9:54   ` Leon Romanovsky
@ 2021-10-25 23:19   ` Edwin Peer
  2021-10-26  5:56     ` Leon Romanovsky
  1 sibling, 1 reply; 29+ messages in thread
From: Edwin Peer @ 2021-10-25 23:19 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Leon Romanovsky, David S . Miller, Jakub Kicinski,
	Leon Romanovsky, Ido Schimmel, Jiri Pirko,
	Linux Kernel Mailing List, netdev, syzbot+93d5accfaefceedf43c1,
	Michael Chan

On Sun, Oct 24, 2021 at 3:35 PM Ido Schimmel <idosch@idosch.org> wrote:

> On Sun, Oct 24, 2021 at 11:42:11AM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > Align netdevsim to be like all other physical devices that register and
> > unregister devlink traps during their probe and removal respectively.
>
> No, this is incorrect. Out of the three drivers that support both reload
> and traps, both netdevsim and mlxsw unregister the traps during reload.
> Here is another report from syzkaller about mlxsw [1].
>
> Please revert both 22849b5ea595 ("devlink: Remove not-executed trap
> policer notifications") and 8bbeed485823 ("devlink: Remove not-executed
> trap group notifications").

Could we also revert 82465bec3e97 ("devlink: Delete reload
enable/disable interface")? This interface is needed because bnxt_en
cannot reorder devlink last. If Leon had fully carried out the
re-ordering in our driver he would have introduced a udev
phys_port_name regression because of:

cda2cab0771 ("bnxt_en: Move devlink_register before registering netdev")

and:

ab178b058c4 ("bnxt: remove ndo_get_phys_port_name implementation")

I think this went unnoticed for bnxt_en, because Michael had not yet
posted our devlink reload patches, which presently rely on the reload
enable/disable API. Absent horrible kludges in reload down/up which
currently depends on the netdev, there doesn't appear to be a clean
way to resolve the circular dependency without the interlocks this API
provides.

I imagine other subtle regressions are lying in wait.

Regards,
Edwin Peer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-25 23:19   ` Edwin Peer
@ 2021-10-26  5:56     ` Leon Romanovsky
  2021-10-26 17:34       ` Edwin Peer
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-26  5:56 UTC (permalink / raw)
  To: Edwin Peer
  Cc: Ido Schimmel, David S . Miller, Jakub Kicinski, Ido Schimmel,
	Jiri Pirko, Linux Kernel Mailing List, netdev,
	syzbot+93d5accfaefceedf43c1, Michael Chan

On Mon, Oct 25, 2021 at 04:19:07PM -0700, Edwin Peer wrote:
> On Sun, Oct 24, 2021 at 3:35 PM Ido Schimmel <idosch@idosch.org> wrote:
> 
> > On Sun, Oct 24, 2021 at 11:42:11AM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > >
> > > Align netdevsim to be like all other physical devices that register and
> > > unregister devlink traps during their probe and removal respectively.
> >
> > No, this is incorrect. Out of the three drivers that support both reload
> > and traps, both netdevsim and mlxsw unregister the traps during reload.
> > Here is another report from syzkaller about mlxsw [1].
> >
> > Please revert both 22849b5ea595 ("devlink: Remove not-executed trap
> > policer notifications") and 8bbeed485823 ("devlink: Remove not-executed
> > trap group notifications").
> 
> Could we also revert 82465bec3e97 ("devlink: Delete reload
> enable/disable interface")? 

Absolutely not.

> This interface is needed because bnxt_en cannot reorder devlink last.
> If Leon had fully carried out the re-ordering in our driver he would
> have introduced a udev phys_port_name regression because of:
> 
> cda2cab0771 ("bnxt_en: Move devlink_register before registering netdev")
> 
> and:
> 
> ab178b058c4 ("bnxt: remove ndo_get_phys_port_name implementation")

devlink_register() doesn't do anything except performing as a barrier.

In a nutshell, latest devlink_register() implementation is better
implementation of previously existed "reload enable/disable" boolean.

You don't need to reorder whole devlink logic, just put a call to
devlink_register() in the place where you wanted to put your
devlink_reload_enable().

> 
> I think this went unnoticed for bnxt_en, because Michael had not yet
> posted our devlink reload patches, which presently rely on the reload
> enable/disable API. Absent horrible kludges in reload down/up which
> currently depends on the netdev, there doesn't appear to be a clean
> way to resolve the circular dependency without the interlocks this API
> provides.

You was supposed to update and retest your out-of-tree implementation
of devlink reload before posting it to the ML. However, if you use
devlink_*() API correctly, such dependency won't exist.

> 
> I imagine other subtle regressions are lying in wait.

Sorry, but we don't have crystal ball and can't guess what else is
broken in your out-of-tree driver.

Thanks

> 
> Regards,
> Edwin Peer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-25 10:56           ` Leon Romanovsky
@ 2021-10-26  6:51             ` Ido Schimmel
  2021-10-26  7:18               ` Leon Romanovsky
  0 siblings, 1 reply; 29+ messages in thread
From: Ido Schimmel @ 2021-10-26  6:51 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: David S . Miller, Jakub Kicinski, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1

On Mon, Oct 25, 2021 at 01:56:17PM +0300, Leon Romanovsky wrote:
> On Mon, Oct 25, 2021 at 11:08:00AM +0300, Ido Schimmel wrote:
> > On Mon, Oct 25, 2021 at 08:34:55AM +0300, Leon Romanovsky wrote:
> > > On Sun, Oct 24, 2021 at 01:48:25PM +0300, Ido Schimmel wrote:
> > > > On Sun, Oct 24, 2021 at 12:54:52PM +0300, Leon Romanovsky wrote:
> > > > > On Sun, Oct 24, 2021 at 12:05:12PM +0300, Ido Schimmel wrote:
> > > > > > On Sun, Oct 24, 2021 at 11:42:11AM +0300, Leon Romanovsky wrote:
> > > > > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > > > > > 
> > > > > > > Align netdevsim to be like all other physical devices that register and
> > > > > > > unregister devlink traps during their probe and removal respectively.
> > > > > > 
> > > > > > No, this is incorrect. Out of the three drivers that support both reload
> > > > > > and traps, both netdevsim and mlxsw unregister the traps during reload.
> > > > > > Here is another report from syzkaller about mlxsw [1].
> > > > > 
> > > > > Sorry, I overlooked it.
> > > > > 
> > > > > > 
> > > > > > Please revert both 22849b5ea595 ("devlink: Remove not-executed trap
> > > > > > policer notifications") and 8bbeed485823 ("devlink: Remove not-executed
> > > > > > trap group notifications").
> > > > > 
> > > > > However, before we rush and revert commit, can you please explain why
> > > > > current behavior to reregister traps on reload is correct?
> > > > > 
> > > > > I think that you are not changing traps during reload, so traps before
> > > > > reload will be the same as after reload, am I right?
> > > > 
> > > > During reload we tear down the entire driver and load it again. As part
> > > > of the reload_down() operation we tear down the various objects from
> > > > both devlink and the device (e.g., shared buffer, ports, traps, etc.).
> > > > As part of the reload_up() operation we issue a device reset and
> > > > register everything back.
> > > 
> > > This is an implementation which is arguably questionable and pinpoints
> > > problem with devlink reload. It mixes different SW layers into one big
> > > mess which I tried to untangle.
> > > 
> > > The devlink "feature" that driver reregisters itself again during execution
> > > of other user-visible devlink command can't be right design.
> > > 
> > > > 
> > > > While the list of objects doesn't change, their properties (e.g., shared
> > > > buffer size, trap action, policer rate) do change back to the default
> > > > after reload and we cannot go back on that as it's a user-visible
> > > > change.
> > > 
> > > I don't propose to go back, just prefer to see fixed mlxsw that
> > > shouldn't touch already created and registered objects from net/core/devlink.c.
> > > 
> > > All reset-to-default should be performed internally to the driver
> > > without any need to devlink_*_register() again, so we will be able to
> > > clean rest devlink notifications.
> > > 
> > > So at least for the netdevsim, this change looks like the correct one,
> > > while mlxsw should be fixed next.
> > 
> > No, it's not correct. After your patch, trap properties like action are
> > not set back to the default. Regardless of what you think is the "right
> > design", you cannot introduce such regressions.
> 
> Again, I'm not against fixing the regression, I'm trying to understand
> why it is impossible to fix mlxsw and netdevsim to honor SW layering
> properly.

devlink reload was implemented in 4.16 along with mlxsw support and from
day one it meant that devlink objects such as ports and shared buffer
were dynamically destroyed / created during reload. You cannot come
almost 4 years later, sprinkle assertions according to how you
implemented reload support and claim that the rest are wrong and need to
be fixed so that the assertions don't trigger.

Note that devlink reload is not the only command that can trigger the
destruction / creation of objects. devlink ports can also come and go
following split / unsplit commands.

> 
> > 
> > Calling devlink_*_unregister() in reload_down() and devlink_*_register()
> > in reload_up() is not new. It is done for multiple objects (e.g., ports,
> > regions, shared buffer, etc). After your patch, netdevsim is still doing
> > it.
> 
> Yeah, it was introduced by same developers who did it in mlxsw, so no
> wonders that same patterns exist in both drivers.

The point of netdevsim is to both provide a reference and allow
developers to test the code. If you bothered to run the tests under
tools/testing/selftests/drivers/net/netdevsim/ (or just instantiate /
reload netdevsim, really), these regressions [1][2] would have been
avoided. Instead, I need to ask you for the third time to revert the
changes.

[1] https://syzkaller.appspot.com/bug?id=c58973ab3345057753c9f629a88275e30ed2a370
[2] https://syzkaller.appspot.com/bug?extid=93d5accfaefceedf43c1

> 
> > 
> > Again, please revert the two commits I mentioned. If you think they are
> > necessary, you can re-submit them in the future, after proper review and
> > testing of the affected code paths.
> 
> It was posted for review in the ML, no one objected.

So what?

> 
> Can you please explain why is it so important to touch devlink SW
> objects, reallocate them again and again on every reload in mlxsw?

Because that's how reload was defined and implemented. A complete
reload. We are not changing the semantics 4 years later.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-26  6:51             ` Ido Schimmel
@ 2021-10-26  7:18               ` Leon Romanovsky
  2021-10-26 14:09                 ` Ido Schimmel
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-26  7:18 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David S . Miller, Jakub Kicinski, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1

On Tue, Oct 26, 2021 at 09:51:13AM +0300, Ido Schimmel wrote:

<...>

> > 
> > Can you please explain why is it so important to touch devlink SW
> > objects, reallocate them again and again on every reload in mlxsw?
> 
> Because that's how reload was defined and implemented. A complete
> reload. We are not changing the semantics 4 years later.

Please put your emotions aside and explain me technically why are you
must to do it?

The proposed semantics was broken for last 4 years, it can even seen as
dead on arrival, because it never worked for us in real production.

So I'm fixing bugs without relation to when they were introduced.

For example, this fix from Jiri [1] for basic design flow was merged almost
two years later after devlink reload was introduced [2], or this patch from
Parav [3] that fixed an issue introduced year before [4].

[1] 
commit 5a508a254bed9a2e36a5fb96c9065532a6bf1e9c
Author: Jiri Pirko <jiri@mellanox.com>
Date:   Sat Nov 9 11:29:46 2019 +0100

    devlink: disallow reload operation during device cleanup
    
[2] 
commit 2d8dc5bbf4e7603747875eb5cadcd67c1fa8b1bb
Author: Arkadi Sharshevsky <arkadis@mellanox.com>
Date:   Mon Jan 15 08:59:04 2018 +0100

    devlink: Add support for reload

[3]
commit a7b43649507dae4e55ff0087cad4e4dd1c6d5b99
Author: Parav Pandit <parav@nvidia.com>
Date:   Wed Nov 25 11:16:20 2020 +0200

    devlink: Make sure devlink instance and port are in same net namespace

[4]
commit 070c63f20f6c739a3c534555f56c7327536bfcc2
Author: Jiri Pirko <jiri@mellanox.com>
Date:   Thu Oct 3 11:49:39 2019 +0200

    net: devlink: allow to change namespaces during reload

Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-26  7:18               ` Leon Romanovsky
@ 2021-10-26 14:09                 ` Ido Schimmel
  2021-10-26 16:14                   ` Leon Romanovsky
  0 siblings, 1 reply; 29+ messages in thread
From: Ido Schimmel @ 2021-10-26 14:09 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: David S . Miller, Jakub Kicinski, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1

On Tue, Oct 26, 2021 at 10:18:12AM +0300, Leon Romanovsky wrote:
> On Tue, Oct 26, 2021 at 09:51:13AM +0300, Ido Schimmel wrote:
> 
> <...>
> 
> > > 
> > > Can you please explain why is it so important to touch devlink SW
> > > objects, reallocate them again and again on every reload in mlxsw?
> > 
> > Because that's how reload was defined and implemented. A complete
> > reload. We are not changing the semantics 4 years later.
> 
> Please put your emotions aside and explain me technically why are you
> must to do it?

Already did. The current semantics are "devlink-reload provides
mechanism to reinit driver entities, applying devlink-params and
devlink-resources new values. It also provides mechanism to activate
firmware."

And this is exactly what netdevsim and mlxsw are doing. Driver entities
are re-initialized. Your patch breaks that as entities are not
re-initialized, which results in user space breakage. You simply cannot
introduce such regressions.

> 
> The proposed semantics was broken for last 4 years, it can even seen as
> dead on arrival,

Again with the bombastic statements. It was "dead on arrival" like the
notifications were "impossible"?

> because it never worked for us in real production.

Who is "us"? mlx5 that apparently decided to do its own thing?

We are using reload in mlxsw on a daily basis and users are using it to
re-partition ASIC resources and activate firmware. There are tests over
netdevsim implementation that anyone can run for testing purposes. We
also made sure to integrate it into syzkaller:

https://github.com/google/syzkaller/commit/5b49e1f605a770e8f8fcdcbd1a8ff85591fc0c8e
https://github.com/google/syzkaller/commit/04ca72cd45348daab9d896bbec8ea4c2d13455ac
https://github.com/google/syzkaller/commit/6930bbef3b671ae21f74007f9e59efb9b236b93f
https://github.com/google/syzkaller/commit/d45a4d69d83f40579e74fb561e1583db1be0e294
https://github.com/google/syzkaller/commit/510951950dc0ee69cfdaf746061d3dbe31b49fd8

Which is why the regressions you introduced were discovered so quickly.

> 
> So I'm fixing bugs without relation to when they were introduced.

We all do

> 
> For example, this fix from Jiri [1] for basic design flow was merged almost
> two years later after devlink reload was introduced [2], or this patch from
> Parav [3] that fixed an issue introduced year before [4].

What is your point? That code has bugs?

By now I have spent more time arguing with you than you spent testing
your patches and it's clear this discussion is not going anywhere.

Are you going to send a revert or I will? This is the fourth time I'm
asking you.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-26 14:09                 ` Ido Schimmel
@ 2021-10-26 16:14                   ` Leon Romanovsky
  2021-10-26 19:02                     ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-26 16:14 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David S . Miller, Jakub Kicinski, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1

On Tue, Oct 26, 2021 at 05:09:47PM +0300, Ido Schimmel wrote:
> On Tue, Oct 26, 2021 at 10:18:12AM +0300, Leon Romanovsky wrote:
> > On Tue, Oct 26, 2021 at 09:51:13AM +0300, Ido Schimmel wrote:
> > 
> > <...>
> > 
> > > > 
> > > > Can you please explain why is it so important to touch devlink SW
> > > > objects, reallocate them again and again on every reload in mlxsw?
> > > 
> > > Because that's how reload was defined and implemented. A complete
> > > reload. We are not changing the semantics 4 years later.
> > 
> > Please put your emotions aside and explain me technically why are you
> > must to do it?
> 
> Already did. The current semantics are "devlink-reload provides
> mechanism to reinit driver entities, applying devlink-params and
> devlink-resources new values. It also provides mechanism to activate
> firmware."

Right, it doesn't mean that devlink should reregister itself.

> 
> And this is exactly what netdevsim and mlxsw are doing. Driver entities
> are re-initialized. Your patch breaks that as entities are not
> re-initialized, which results in user space breakage. You simply cannot
> introduce such regressions.

Again, and again. I don't want to introduce regression, and I'll fix it.
However, let's try to reach a conclusion on how to fix the current regression
properly.

> 
> > 
> > The proposed semantics was broken for last 4 years, it can even seen as
> > dead on arrival,
> 
> Again with the bombastic statements. It was "dead on arrival" like the
> notifications were "impossible"?

No, it was dead-on-arrival, because of deadlocks and kernel panics.

My search in internal bug tracker shows more than 500 bugs opened
by various teams where devlink reload is involved. It is hard to tell
which are pure devlink reload related, but when I started to work on
this series, close to 20 such bugs were assigned to me.

> 
> > because it never worked for us in real production.
> 
> Who is "us"? mlx5 that apparently decided to do its own thing?

Yes, mlx5. I don't see why you think that us not calling devlink
recursively means "own thing".

> 
> We are using reload in mlxsw on a daily basis and users are using it to
> re-partition ASIC resources and activate firmware. 

Are you really compare mlx5 deployment scale and broad of use with mlxsw?

> There are tests over netdevsim implementation that anyone can run for
> testing purposes. We also made sure to integrate it into syzkaller:
> 
> https://github.com/google/syzkaller/commit/5b49e1f605a770e8f8fcdcbd1a8ff85591fc0c8e
> https://github.com/google/syzkaller/commit/04ca72cd45348daab9d896bbec8ea4c2d13455ac
> https://github.com/google/syzkaller/commit/6930bbef3b671ae21f74007f9e59efb9b236b93f
> https://github.com/google/syzkaller/commit/d45a4d69d83f40579e74fb561e1583db1be0e294
> https://github.com/google/syzkaller/commit/510951950dc0ee69cfdaf746061d3dbe31b49fd8
> 
> Which is why the regressions you introduced were discovered so quickly.

No, it was caught because I explicitly added assertions to find misuse.

> 
> > 
> > So I'm fixing bugs without relation to when they were introduced.
> 
> We all do

Great

> 
> > 
> > For example, this fix from Jiri [1] for basic design flow was merged almost
> > two years later after devlink reload was introduced [2], or this patch from
> > Parav [3] that fixed an issue introduced year before [4].
> 
> What is your point? That code has bugs?

My point is that devlink should be decoupled from mlxsw, so mlxsw won't
call recursively to devlink when executes devlink API.

This is incorrect and for me it is a bug.

> 
> By now I have spent more time arguing with you than you spent testing
> your patches and it's clear this discussion is not going anywhere.
> 
> Are you going to send a revert or I will? This is the fourth time I'm
> asking you.

I understand your temptation to send revert, at the end it is the
easiest solution. However, I prefer to finish this discussion with
decision on how the end result in mlxsw will look like.

Let's hear Jiri and Jakub before we are rushing to revert something that
is correct in my opinion. We have whole week till merge window, and
revert takes less than 5 minutes, so no need to rush and do it before
direction is clear.

Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-26  5:56     ` Leon Romanovsky
@ 2021-10-26 17:34       ` Edwin Peer
  2021-10-26 19:22         ` Leon Romanovsky
  0 siblings, 1 reply; 29+ messages in thread
From: Edwin Peer @ 2021-10-26 17:34 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Ido Schimmel, David S . Miller, Jakub Kicinski, Ido Schimmel,
	Jiri Pirko, Linux Kernel Mailing List, netdev,
	syzbot+93d5accfaefceedf43c1, Michael Chan

On Mon, Oct 25, 2021 at 10:56 PM Leon Romanovsky <leon@kernel.org> wrote:

> > Could we also revert 82465bec3e97 ("devlink: Delete reload
> > enable/disable interface")?
>
> Absolutely not.

Although the following patch doesn't affect bnxt_en directly, I
believe the change that will ultimately cause regressions are the
patches of the form:

64ea2d0e7263 ("net/mlx5: Accept devlink user input after driver
initialization complete")

Removing the reload enable interface is merely the reason you're
moving devlink_register() later here, but it's the swapping of the
relative order of devlinqk_register() and register_netdev() which is
the problem.

Our proposed devlink reload depends on the netdev being registered.
This was previously gated by reload enable, but that is a secondary
issue. The real question is whether you now require devlink_register()
to go last in general? If so, that's a problem because we'll race with
user space. User visible regressions will definitely follow.

The bnxt_en driver was only saved from such regressions because you
did not carry out the same change there as you've done here.
Otherwise, you would have broken bnxt_en as a consequence. I'm
obviously not as familiar with mlx5, but I think you may have already
broken it. I imagine the only reason customers haven't complained
about this change yet is that few, if any, are running the net-next
code.

> In a nutshell, latest devlink_register() implementation is better
> implementation of previously existed "reload enable/disable" boolean.
>
> You don't need to reorder whole devlink logic, just put a call to
> devlink_register() in the place where you wanted to put your
> devlink_reload_enable().

We can't though, because of the two patches I pointed out previously.
Moving devlink_register() to the existing devlink_reload_enable()
location puts it after register_netdev(). That will cause a regression
with udev and phys port name. We already have the failing test case
and customer bug report for this. That is why devlink_register() was
moved earlier in bnxt_en. We can't now do the opposite and move it
later.

> You was supposed to update and retest your out-of-tree implementation
> of devlink reload before posting it to the ML. However, if you use
> devlink_*() API correctly, such dependency won't exist.

We dropped the ball there, absolutely, but the out-of-tree
implementation is only relevant to the extent that it is a testing
vehicle for our QA _before_ we post patches upstream. This API change
invalidates all of that testing work, which is regrettable, but not
the end of the world if your change is the right way to go (I don't
think it is).

I'm interested in the form that the _upstream_ implementation of our
devlink reload would need to take. I don't think we can move
devlink_register() later without introducing whole classes of user
space regressions. Thus, fixing our proposed devlink reload code in
the absence of devlink reload enable would entail implementing
precisely the kind of interlock that was previously provided by the
removed API, only internally. It doesn't seem right to remove a useful
shared API in favor of duplicative implementations in drivers to
accomplish the same goal.

> > I imagine other subtle regressions are lying in wait.
>
> Sorry, but we don't have crystal ball and can't guess what else is
> broken in your out-of-tree driver.

In addition to the udev phys port name regressions that we already
know about, it doesn't take a crystal ball to imagine user ifup
scripts that call into devlink. All sorts of things in user space kick
off in response to the presentation of the netdev. User enables
switchdev mode in ifup, that will break. User twiddles a devlink param
in ifup, that'll break too.

Regards,
Edwin Peer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-26 16:14                   ` Leon Romanovsky
@ 2021-10-26 19:02                     ` Jakub Kicinski
  2021-10-26 19:30                       ` Leon Romanovsky
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2021-10-26 19:02 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Ido Schimmel, David S . Miller, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1

On Tue, 26 Oct 2021 19:14:58 +0300 Leon Romanovsky wrote:
> > By now I have spent more time arguing with you than you spent testing
> > your patches and it's clear this discussion is not going anywhere.
> > 
> > Are you going to send a revert or I will? This is the fourth time I'm
> > asking you.  
> 
> I understand your temptation to send revert, at the end it is the
> easiest solution. However, I prefer to finish this discussion with
> decision on how the end result in mlxsw will look like.
> 
> Let's hear Jiri and Jakub before we are rushing to revert something that
> is correct in my opinion. We have whole week till merge window, and
> revert takes less than 5 minutes, so no need to rush and do it before
> direction is clear.

Having drivers in a broken state will not be conducive to calm discussions.
Let's do a quick revert and unbreak the selftests.

Speaking under correction, but the model of operation where we merge
patches rather quickly necessarily must also mean we are quick to
revert changes which broke stuff if the fix is not immediately obvious
or disputed.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-26 17:34       ` Edwin Peer
@ 2021-10-26 19:22         ` Leon Romanovsky
  2021-10-26 20:03           ` Edwin Peer
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-26 19:22 UTC (permalink / raw)
  To: Edwin Peer
  Cc: Ido Schimmel, David S . Miller, Jakub Kicinski, Ido Schimmel,
	Jiri Pirko, Linux Kernel Mailing List, netdev,
	syzbot+93d5accfaefceedf43c1, Michael Chan

On Tue, Oct 26, 2021 at 10:34:39AM -0700, Edwin Peer wrote:
> On Mon, Oct 25, 2021 at 10:56 PM Leon Romanovsky <leon@kernel.org> wrote:
> 
> > > Could we also revert 82465bec3e97 ("devlink: Delete reload
> > > enable/disable interface")?
> >
> > Absolutely not.
> 
> Although the following patch doesn't affect bnxt_en directly, I
> believe the change that will ultimately cause regressions are the
> patches of the form:
> 
> 64ea2d0e7263 ("net/mlx5: Accept devlink user input after driver
> initialization complete")
> 
> Removing the reload enable interface is merely the reason you're
> moving devlink_register() later here, but it's the swapping of the
> relative order of devlinqk_register() and register_netdev() which is
> the problem.

At least in mlx5 case, reload_enable() was before register_netdev().
It stayed like this after swapping it with devlink_register().

> 
> Our proposed devlink reload depends on the netdev being registered.
> This was previously gated by reload enable, but that is a secondary
> issue. The real question is whether you now require devlink_register()
> to go last in general? If so, that's a problem because we'll race with
> user space. User visible regressions will definitely follow.

No, it is not requirement, but my suggestion. You need to be aware that
after call to devlink_register(), the device will be fully open for devlink
netlink access. So it is strongly advised to put devlink_register to be the
last command in PCI initialization sequence.

It is exactly like it was before, but instead of one _reload_ interface
which had extra enable/disable logic, we are protecting all other set/get
commands. Before the change, you was able to send any devlink netlink commands
and crash unprotected driver.

> 
> The bnxt_en driver was only saved from such regressions because you
> did not carry out the same change there as you've done here.
> Otherwise, you would have broken bnxt_en as a consequence. I'm
> obviously not as familiar with mlx5, but I think you may have already
> broken it. I imagine the only reason customers haven't complained
> about this change yet is that few, if any, are running the net-next
> code.

This is not how mlx5 is implemented at all. We use auxiliary bus to
separate PCI core driver and eth netdev driver.

And yes, we have customers who rely on upstream code and test it
methodically.

> 
> > In a nutshell, latest devlink_register() implementation is better
> > implementation of previously existed "reload enable/disable" boolean.
> >
> > You don't need to reorder whole devlink logic, just put a call to
> > devlink_register() in the place where you wanted to put your
> > devlink_reload_enable().
> 
> We can't though, because of the two patches I pointed out previously.
> Moving devlink_register() to the existing devlink_reload_enable()
> location puts it after register_netdev(). That will cause a regression
> with udev and phys port name. We already have the failing test case
> and customer bug report for this. That is why devlink_register() was
> moved earlier in bnxt_en. We can't now do the opposite and move it
> later.

You obviously need to fix your code. Upstream version of bnxt driver
doesn't have reload_* support, so all this regression blaming it not
relevant here.

In upstream code, devlink_register() doesn't accept ops like it was
before and position of that call does only one thing - opens devlink
netlink access. All kernel devlink APIs continue to be accessible even
before devlink_register.

It looks like your failure is in backport code.

Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-26 19:02                     ` Jakub Kicinski
@ 2021-10-26 19:30                       ` Leon Romanovsky
  2021-10-26 19:56                         ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-26 19:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Ido Schimmel, David S . Miller, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1

On Tue, Oct 26, 2021 at 12:02:34PM -0700, Jakub Kicinski wrote:
> On Tue, 26 Oct 2021 19:14:58 +0300 Leon Romanovsky wrote:
> > > By now I have spent more time arguing with you than you spent testing
> > > your patches and it's clear this discussion is not going anywhere.
> > > 
> > > Are you going to send a revert or I will? This is the fourth time I'm
> > > asking you.  
> > 
> > I understand your temptation to send revert, at the end it is the
> > easiest solution. However, I prefer to finish this discussion with
> > decision on how the end result in mlxsw will look like.
> > 
> > Let's hear Jiri and Jakub before we are rushing to revert something that
> > is correct in my opinion. We have whole week till merge window, and
> > revert takes less than 5 minutes, so no need to rush and do it before
> > direction is clear.
> 
> Having drivers in a broken state will not be conducive to calm discussions.
> Let's do a quick revert and unbreak the selftests.

No problem, I'll send a revert now, but what is your take on the direction?
IMHO, the mlxsw layering should be fixed. All this recursive devlink re-entry
looks horrible and adds unneeded complexity.

> 
> Speaking under correction, but the model of operation where we merge
> patches rather quickly necessarily must also mean we are quick to
> revert changes which broke stuff if the fix is not immediately obvious
> or disputed.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-26 19:30                       ` Leon Romanovsky
@ 2021-10-26 19:56                         ` Jakub Kicinski
  2021-10-27  5:56                           ` Leon Romanovsky
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2021-10-26 19:56 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Ido Schimmel, David S . Miller, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1, Edwin Peer

On Tue, 26 Oct 2021 22:30:23 +0300 Leon Romanovsky wrote:
> On Tue, Oct 26, 2021 at 12:02:34PM -0700, Jakub Kicinski wrote:
> > On Tue, 26 Oct 2021 19:14:58 +0300 Leon Romanovsky wrote:  
> > > I understand your temptation to send revert, at the end it is the
> > > easiest solution. However, I prefer to finish this discussion with
> > > decision on how the end result in mlxsw will look like.
> > > 
> > > Let's hear Jiri and Jakub before we are rushing to revert something that
> > > is correct in my opinion. We have whole week till merge window, and
> > > revert takes less than 5 minutes, so no need to rush and do it before
> > > direction is clear.  
> > 
> > Having drivers in a broken state will not be conducive to calm discussions.
> > Let's do a quick revert and unbreak the selftests.  
> 
> No problem, I'll send a revert now, but what is your take on the direction?

I haven't put in the time to understand the detail so I was hoping not
to pass judgment on the direction. My likely unfounded feeling is that
reshuffling ordering is not going to fix what is fundamentally a
locking issue. Driver has internal locks it needs to hold both inside
devlink callbacks and when registering devlink objects. We would solve
a lot of the problems if those were one single lock instead of two. 
At least that's my recollection from the times I was actually writing
driver code...

> IMHO, the mlxsw layering should be fixed. All this recursive devlink re-entry
> looks horrible and adds unneeded complexity.

If you're asking about mlxsw or bnxt in particular I wouldn't say what
they do is wrong until we can point out bugs.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-26 19:22         ` Leon Romanovsky
@ 2021-10-26 20:03           ` Edwin Peer
  2021-10-27  6:43             ` Leon Romanovsky
  0 siblings, 1 reply; 29+ messages in thread
From: Edwin Peer @ 2021-10-26 20:03 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Ido Schimmel, David S . Miller, Jakub Kicinski, Ido Schimmel,
	Jiri Pirko, Linux Kernel Mailing List, netdev,
	syzbot+93d5accfaefceedf43c1, Michael Chan

On Tue, Oct 26, 2021 at 12:22 PM Leon Romanovsky <leon@kernel.org> wrote:

> At least in mlx5 case, reload_enable() was before register_netdev().
> It stayed like this after swapping it with devlink_register().

What am I missing here?

err = mlx5_init_one(dev);
if (err) {
       mlx5_core_err(dev, "mlx5_init_one failed with error code %d\n", err);
       goto err_init_one;
}

err = mlx5_crdump_enable(dev);
if (err)
        dev_err(&pdev->dev, "mlx5_crdump_enable failed with error code
%d\n", err);

pci_save_state(pdev);
devlink_register(devlink);

Doesn't mlx5_init_one() ultimately result in the netdev being
presented to user space, even if it is via aux bus?

> No, it is not requirement, but my suggestion. You need to be aware that
> after call to devlink_register(), the device will be fully open for devlink
> netlink access. So it is strongly advised to put devlink_register to be the
> last command in PCI initialization sequence.

Right, that's the problem. Once we register the netdev, we're in a
race with user space, which may expect to be able to call devlink
before we get to devlink_register().

> You obviously need to fix your code. Upstream version of bnxt driver
> doesn't have reload_* support, so all this regression blaming it not
> relevant here.

Right, our timing is unfortunate and that's on us. It's still not
clear to me how to actually fix the devlink reload code without the
benefit of something similar to the reload enable API.

> In upstream code, devlink_register() doesn't accept ops like it was
> before and position of that call does only one thing - opens devlink
> netlink access. All kernel devlink APIs continue to be accessible even
> before devlink_register.

This isn't about kernel API. This is precisely about existing user
space that expects devlink to work immediately after the netdev
appears.

> It looks like your failure is in backport code.

Our out-of-tree driver isn't the issue here. I'm talking about the
proposed upstream code. The issue is what to do in order to get
something workable upstream for devlink reload. We can't move
devlink_register() later, that will cause a regression. What do you
suggest instead?

Regards,
Edwin Peer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-26 19:56                         ` Jakub Kicinski
@ 2021-10-27  5:56                           ` Leon Romanovsky
  2021-10-27 14:17                             ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-27  5:56 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Ido Schimmel, David S . Miller, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1, Edwin Peer

On Tue, Oct 26, 2021 at 12:56:02PM -0700, Jakub Kicinski wrote:
> On Tue, 26 Oct 2021 22:30:23 +0300 Leon Romanovsky wrote:
> > On Tue, Oct 26, 2021 at 12:02:34PM -0700, Jakub Kicinski wrote:
> > > On Tue, 26 Oct 2021 19:14:58 +0300 Leon Romanovsky wrote:  
> > > > I understand your temptation to send revert, at the end it is the
> > > > easiest solution. However, I prefer to finish this discussion with
> > > > decision on how the end result in mlxsw will look like.
> > > > 
> > > > Let's hear Jiri and Jakub before we are rushing to revert something that
> > > > is correct in my opinion. We have whole week till merge window, and
> > > > revert takes less than 5 minutes, so no need to rush and do it before
> > > > direction is clear.  
> > > 
> > > Having drivers in a broken state will not be conducive to calm discussions.
> > > Let's do a quick revert and unbreak the selftests.  
> > 
> > No problem, I'll send a revert now, but what is your take on the direction?
> 
> I haven't put in the time to understand the detail so I was hoping not
> to pass judgment on the direction. My likely unfounded feeling is that
> reshuffling ordering is not going to fix what is fundamentally a
> locking issue. Driver has internal locks it needs to hold both inside
> devlink callbacks and when registering devlink objects. We would solve
> a lot of the problems if those were one single lock instead of two. 
> At least that's my recollection from the times I was actually writing
> driver code...

Exactly, and this is what reshuffling of registrations does. It allows us
to actually reduce number of locks to bare minimum, so at least creation
and deletion of devlink objects will be locks free.

Latest changes already solved devlink reload issues for mlx5 eth side
and it is deadlock and lockdep free now. We still have deadlocks with
our IB part, where we obligated to hold pernet lock during registering
to net notifiers, but it is different discussion.

> 
> > IMHO, the mlxsw layering should be fixed. All this recursive devlink re-entry
> > looks horrible and adds unneeded complexity.
> 
> If you're asking about mlxsw or bnxt in particular I wouldn't say what
> they do is wrong until we can point out bugs.

I'm talking about mlxsw and pointed to the reentry to devlink over and over.

From what I read about bnxt, it is test issue.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-26 20:03           ` Edwin Peer
@ 2021-10-27  6:43             ` Leon Romanovsky
  2021-10-27  8:46               ` Edwin Peer
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-27  6:43 UTC (permalink / raw)
  To: Edwin Peer
  Cc: Ido Schimmel, David S . Miller, Jakub Kicinski, Ido Schimmel,
	Jiri Pirko, Linux Kernel Mailing List, netdev,
	syzbot+93d5accfaefceedf43c1, Michael Chan

On Tue, Oct 26, 2021 at 01:03:41PM -0700, Edwin Peer wrote:
> On Tue, Oct 26, 2021 at 12:22 PM Leon Romanovsky <leon@kernel.org> wrote:
> 
> > At least in mlx5 case, reload_enable() was before register_netdev().
> > It stayed like this after swapping it with devlink_register().
> 
> What am I missing here?
> 
> err = mlx5_init_one(dev);
> if (err) {
>        mlx5_core_err(dev, "mlx5_init_one failed with error code %d\n", err);
>        goto err_init_one;
> }
> 
> err = mlx5_crdump_enable(dev);
> if (err)
>         dev_err(&pdev->dev, "mlx5_crdump_enable failed with error code
> %d\n", err);
> 
> pci_save_state(pdev);
> devlink_register(devlink);
> 
> Doesn't mlx5_init_one() ultimately result in the netdev being
> presented to user space, even if it is via aux bus?

The mlx5_init_one() aux devices, and driver is not always loaded
directly in the Linux kernel. The device creation triggers udev event,
which is handled by udev systemd. The systemd reads various modules.* files
that kernel provides and this is how it knows which driver to load.

In our case, the eth driver is part of mlx5_core module, so at the
device creation phase that module is already loaded and driver/core
will try to autoprobe it.

However, the last step is not always performed and controlled by the
userspace. Users can disable driver autoprobe and bind manually. This
is pretty standard practice in the SR-IOV or VFIO modes.

> 
> > No, it is not requirement, but my suggestion. You need to be aware that
> > after call to devlink_register(), the device will be fully open for devlink
> > netlink access. So it is strongly advised to put devlink_register to be the
> > last command in PCI initialization sequence.
> 
> Right, that's the problem. Once we register the netdev, we're in a
> race with user space, which may expect to be able to call devlink
> before we get to devlink_register().

This is why devlink has monitor mode where you can see devlink device
addition and removal. It is user space job to check that device is
ready.

> 
> > You obviously need to fix your code. Upstream version of bnxt driver
> > doesn't have reload_* support, so all this regression blaming it not
> > relevant here.
> 
> Right, our timing is unfortunate and that's on us. It's still not
> clear to me how to actually fix the devlink reload code without the
> benefit of something similar to the reload enable API.
> 
> > In upstream code, devlink_register() doesn't accept ops like it was
> > before and position of that call does only one thing - opens devlink
> > netlink access. All kernel devlink APIs continue to be accessible even
> > before devlink_register.
> 
> This isn't about kernel API. This is precisely about existing user
> space that expects devlink to work immediately after the netdev
> appears.

Can you please share open source project that has such assumption?

> 
> > It looks like your failure is in backport code.
> 
> Our out-of-tree driver isn't the issue here. I'm talking about the
> proposed upstream code. The issue is what to do in order to get
> something workable upstream for devlink reload. We can't move
> devlink_register() later, that will cause a regression. What do you
> suggest instead?

Fix your test respect devlink notifications and don't ignore them.

Thanks

> 
> Regards,
> Edwin Peer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-27  6:43             ` Leon Romanovsky
@ 2021-10-27  8:46               ` Edwin Peer
  2021-10-27  9:43                 ` Leon Romanovsky
  0 siblings, 1 reply; 29+ messages in thread
From: Edwin Peer @ 2021-10-27  8:46 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Ido Schimmel, David S . Miller, Jakub Kicinski, Ido Schimmel,
	Jiri Pirko, Linux Kernel Mailing List, netdev,
	syzbot+93d5accfaefceedf43c1, Michael Chan

On Tue, Oct 26, 2021 at 11:43 PM Leon Romanovsky <leon@kernel.org> wrote:

> In our case, the eth driver is part of mlx5_core module, so at the
> device creation phase that module is already loaded and driver/core
> will try to autoprobe it.

> However, the last step is not always performed and controlled by the
> userspace. Users can disable driver autoprobe and bind manually. This
> is pretty standard practice in the SR-IOV or VFIO modes.

While you say the netdev will not necessarily be bound, that still
sounds like the netdev will indeed be presented to user space before
devlink_register() when it is auto-probed?

> This is why devlink has monitor mode where you can see devlink device
> addition and removal. It is user space job to check that device is
> ready.

Isn't it more a question of what existing user space _does_ rather
than what user space _should_ do?

> > This isn't about kernel API. This is precisely about existing user
> > space that expects devlink to work immediately after the netdev
> > appears.
>
> Can you please share open source project that has such assumption?

I'm no python expert, but it looks like
https://github.com/openstack-charmers/mlnx-switchdev-mode/ might.
We've certainly had implicit user space assumptions trip over
registration order before, hence the change we made in January last
year to move devlink registration earlier. Granted, upon deeper
analysis, that specific case pertained to phys port name via sysfs,
which technically only needs port attrs via ndo_get_devlink_port, not
devlink_register(). That said, I'm certainly not confident that there
are no other existing users that might expect to be able to invoke
devlink in ifup scripts.

> > What do you suggest instead?
>
> Fix your test respect devlink notifications and don't ignore them.

That's not very helpful. The test case does what the user in the field
did to break it. We can't assume users have always been using our APIs
the way we intended.

Regards,
Edwin Peer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-27  8:46               ` Edwin Peer
@ 2021-10-27  9:43                 ` Leon Romanovsky
  0 siblings, 0 replies; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-27  9:43 UTC (permalink / raw)
  To: Edwin Peer
  Cc: Ido Schimmel, David S . Miller, Jakub Kicinski, Ido Schimmel,
	Jiri Pirko, Linux Kernel Mailing List, netdev,
	syzbot+93d5accfaefceedf43c1, Michael Chan

On Wed, Oct 27, 2021 at 01:46:44AM -0700, Edwin Peer wrote:
> On Tue, Oct 26, 2021 at 11:43 PM Leon Romanovsky <leon@kernel.org> wrote:
> 
> > In our case, the eth driver is part of mlx5_core module, so at the
> > device creation phase that module is already loaded and driver/core
> > will try to autoprobe it.
> 
> > However, the last step is not always performed and controlled by the
> > userspace. Users can disable driver autoprobe and bind manually. This
> > is pretty standard practice in the SR-IOV or VFIO modes.
> 
> While you say the netdev will not necessarily be bound, that still
> sounds like the netdev will indeed be presented to user space before
> devlink_register() when it is auto-probed?

And it is ok, there is no binding between netdev and devlink. They are
independent. For example, IB cards have devlink through mlx5_core, but
without netdev at all.

> 
> > This is why devlink has monitor mode where you can see devlink device
> > addition and removal. It is user space job to check that device is
> > ready.
> 
> Isn't it more a question of what existing user space _does_ rather
> than what user space _should_ do?

No, because races of kernel vs. user space always existed and every time,
the answer was the same - it is user space job to avoid race. Kernel
needs to protect itself and we successfully did it here.

Latest example of such programming model can be seen in VFIO too:
https://lore.kernel.org/kvm/20211026090605.91646-1-yishaih@nvidia.com/T/#m89ef326da9405dcefe482bd5a6a54aff7e0b90bc
"Yes. If the userspace races ioctls they get a deserved mess.

This race exists no matter what we do, as soon as the unlock happens a
racing reset ioctl could run in during the system call exit path.

The purpose of the locking is to protect the kernel from hostile
userspace, not to allow userspace to execute concurrent ioctl's in a
sensible way."

> 
> > > This isn't about kernel API. This is precisely about existing user
> > > space that expects devlink to work immediately after the netdev
> > > appears.
> >
> > Can you please share open source project that has such assumption?
> 
> I'm no python expert, but it looks like
> https://github.com/openstack-charmers/mlnx-switchdev-mode/ might.

No, it is not.

> We've certainly had implicit user space assumptions trip over
> registration order before, hence the change we made in January last
> year to move devlink registration earlier. Granted, upon deeper
> analysis, that specific case pertained to phys port name via sysfs,
> which technically only needs port attrs via ndo_get_devlink_port, not
> devlink_register(). That said, I'm certainly not confident that there
> are no other existing users that might expect to be able to invoke
> devlink in ifup scripts.

It is not what your change did, you moved devlink ops registration logic
to be before register_netdev(), this is continue to be in upstream code.
We are registering ops in devlink_alloc() and that function stays to be
at the same place.

> 
> > > What do you suggest instead?
> >
> > Fix your test respect devlink notifications and don't ignore them.
> 
> That's not very helpful. The test case does what the user in the field
> did to break it. We can't assume users have always been using our APIs
> the way we intended.

There is no API here.

Thanks

> 
> Regards,
> Edwin Peer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-27  5:56                           ` Leon Romanovsky
@ 2021-10-27 14:17                             ` Jakub Kicinski
  2021-10-27 15:17                               ` Leon Romanovsky
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2021-10-27 14:17 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Ido Schimmel, David S . Miller, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1, Edwin Peer

On Wed, 27 Oct 2021 08:56:45 +0300 Leon Romanovsky wrote:
> On Tue, Oct 26, 2021 at 12:56:02PM -0700, Jakub Kicinski wrote:
> > On Tue, 26 Oct 2021 22:30:23 +0300 Leon Romanovsky wrote:  
> > > No problem, I'll send a revert now, but what is your take on the direction?  
> > 
> > I haven't put in the time to understand the detail so I was hoping not
> > to pass judgment on the direction. My likely unfounded feeling is that
> > reshuffling ordering is not going to fix what is fundamentally a
> > locking issue. Driver has internal locks it needs to hold both inside
> > devlink callbacks and when registering devlink objects. We would solve
> > a lot of the problems if those were one single lock instead of two. 
> > At least that's my recollection from the times I was actually writing
> > driver code...  
> 
> Exactly, and this is what reshuffling of registrations does. It allows us
> to actually reduce number of locks to bare minimum, so at least creation
> and deletion of devlink objects will be locks free.

That's not what I meant. I meant devlink should call in to take
driver's lock or more likely driver should use the devlink instance
mutex instead of creating its own. Most of the devlink helpers (with
minor exceptions like alloc) should just assert that devlink instance
lock is already held by the driver when called.

> Latest changes already solved devlink reload issues for mlx5 eth side
> and it is deadlock and lockdep free now. We still have deadlocks with
> our IB part, where we obligated to hold pernet lock during registering
> to net notifiers, but it is different discussion.
> 
> > > IMHO, the mlxsw layering should be fixed. All this recursive devlink re-entry
> > > looks horrible and adds unneeded complexity.  
> > 
> > If you're asking about mlxsw or bnxt in particular I wouldn't say what
> > they do is wrong until we can point out bugs.  
> 
> I'm talking about mlxsw and pointed to the reentry to devlink over and over.

To me "pointing to re-entry" read like breaking the new model you have
in mind, not actual bug/race/deadlock etc. If that's not the case the
explanation flew over my head :)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-27 14:17                             ` Jakub Kicinski
@ 2021-10-27 15:17                               ` Leon Romanovsky
  2021-10-27 19:15                                 ` Leon Romanovsky
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-27 15:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Ido Schimmel, David S . Miller, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1, Edwin Peer

On Wed, Oct 27, 2021 at 07:17:23AM -0700, Jakub Kicinski wrote:
> On Wed, 27 Oct 2021 08:56:45 +0300 Leon Romanovsky wrote:
> > On Tue, Oct 26, 2021 at 12:56:02PM -0700, Jakub Kicinski wrote:
> > > On Tue, 26 Oct 2021 22:30:23 +0300 Leon Romanovsky wrote:  
> > > > No problem, I'll send a revert now, but what is your take on the direction?  
> > > 
> > > I haven't put in the time to understand the detail so I was hoping not
> > > to pass judgment on the direction. My likely unfounded feeling is that
> > > reshuffling ordering is not going to fix what is fundamentally a
> > > locking issue. Driver has internal locks it needs to hold both inside
> > > devlink callbacks and when registering devlink objects. We would solve
> > > a lot of the problems if those were one single lock instead of two. 
> > > At least that's my recollection from the times I was actually writing
> > > driver code...  
> > 
> > Exactly, and this is what reshuffling of registrations does. It allows us
> > to actually reduce number of locks to bare minimum, so at least creation
> > and deletion of devlink objects will be locks free.
> 
> That's not what I meant. I meant devlink should call in to take
> driver's lock or more likely driver should use the devlink instance
> mutex instead of creating its own. Most of the devlink helpers (with
> minor exceptions like alloc) should just assert that devlink instance
> lock is already held by the driver when called.
> 
> > Latest changes already solved devlink reload issues for mlx5 eth side
> > and it is deadlock and lockdep free now. We still have deadlocks with
> > our IB part, where we obligated to hold pernet lock during registering
> > to net notifiers, but it is different discussion.
> > 
> > > > IMHO, the mlxsw layering should be fixed. All this recursive devlink re-entry
> > > > looks horrible and adds unneeded complexity.  
> > > 
> > > If you're asking about mlxsw or bnxt in particular I wouldn't say what
> > > they do is wrong until we can point out bugs.  
> > 
> > I'm talking about mlxsw and pointed to the reentry to devlink over and over.
> 
> To me "pointing to re-entry" read like breaking the new model you have
> in mind, not actual bug/race/deadlock etc. If that's not the case the
> explanation flew over my head :)

It doesn't break, but complicates without any reason.

Let me try to summarize my vision for the devlink. It is not written
in stone and changes after every review comment. :)

I want to divide all devlink APIs into two buckets:
1. Before devlink_register() - we don't need to worry about locking at
all. If caller decides to go crazy and wants parallel calls to devlink
in this stage, he/she will need to be responsible for proper locking.

2. After devlink_register() - users can send their commands through
netlink, so we need maximum protection. The devlink core will have
RW semaphore to make sure that every entry in this stage is marked
or read (allow parallel calls) or write (exclusive access). Plus we
have a barrier for devlink_unregister().

My goal is to move as much as possible in first bucket, and various
devlink_*_register() calls are natural candidates for it.

In addition, they are candidates because devlink is SW layer, in my
view everything or almost everything should be allocated during driver
init with const arrays and feature bits with clear separation between
devlink and driver beneath.

Call chains like "devlink->driver->devlink->driver.." that exist in
mlxsw can't be correct from layering POV.

The current implementation of devlink reload in mlxsw adds amazingly
large amount of over engineering to main devlink core, because it drags
many (constant) initializations to be in bucket #2.

Bottom line, convert devlink core to be similar to driver core. :)

Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-27 15:17                               ` Leon Romanovsky
@ 2021-10-27 19:15                                 ` Leon Romanovsky
  2021-10-27 19:28                                   ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2021-10-27 19:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Ido Schimmel, David S . Miller, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1, Edwin Peer

On Wed, Oct 27, 2021 at 06:17:25PM +0300, Leon Romanovsky wrote:
> On Wed, Oct 27, 2021 at 07:17:23AM -0700, Jakub Kicinski wrote:
> > On Wed, 27 Oct 2021 08:56:45 +0300 Leon Romanovsky wrote:
> > > On Tue, Oct 26, 2021 at 12:56:02PM -0700, Jakub Kicinski wrote:
> > > > On Tue, 26 Oct 2021 22:30:23 +0300 Leon Romanovsky wrote:  
> > > > > No problem, I'll send a revert now, but what is your take on the direction?  
> > > > 
> > > > I haven't put in the time to understand the detail so I was hoping not
> > > > to pass judgment on the direction. My likely unfounded feeling is that
> > > > reshuffling ordering is not going to fix what is fundamentally a
> > > > locking issue. Driver has internal locks it needs to hold both inside
> > > > devlink callbacks and when registering devlink objects. We would solve
> > > > a lot of the problems if those were one single lock instead of two. 
> > > > At least that's my recollection from the times I was actually writing
> > > > driver code...  
> > > 
> > > Exactly, and this is what reshuffling of registrations does. It allows us
> > > to actually reduce number of locks to bare minimum, so at least creation
> > > and deletion of devlink objects will be locks free.
> > 
> > That's not what I meant. I meant devlink should call in to take
> > driver's lock or more likely driver should use the devlink instance
> > mutex instead of creating its own. Most of the devlink helpers (with
> > minor exceptions like alloc) should just assert that devlink instance
> > lock is already held by the driver when called.
> > 
> > > Latest changes already solved devlink reload issues for mlx5 eth side
> > > and it is deadlock and lockdep free now. We still have deadlocks with
> > > our IB part, where we obligated to hold pernet lock during registering
> > > to net notifiers, but it is different discussion.
> > > 
> > > > > IMHO, the mlxsw layering should be fixed. All this recursive devlink re-entry
> > > > > looks horrible and adds unneeded complexity.  
> > > > 
> > > > If you're asking about mlxsw or bnxt in particular I wouldn't say what
> > > > they do is wrong until we can point out bugs.  
> > > 
> > > I'm talking about mlxsw and pointed to the reentry to devlink over and over.
> > 
> > To me "pointing to re-entry" read like breaking the new model you have
> > in mind, not actual bug/race/deadlock etc. If that's not the case the
> > explanation flew over my head :)
> 
> It doesn't break, but complicates without any reason.
> 
> Let me try to summarize my vision for the devlink. It is not written
> in stone and changes after every review comment. :)
> 
> I want to divide all devlink APIs into two buckets:
> 1. Before devlink_register() - we don't need to worry about locking at
> all. If caller decides to go crazy and wants parallel calls to devlink
> in this stage, he/she will need to be responsible for proper locking.
> 
> 2. After devlink_register() - users can send their commands through
> netlink, so we need maximum protection. The devlink core will have
> RW semaphore to make sure that every entry in this stage is marked
> or read (allow parallel calls) or write (exclusive access). Plus we
> have a barrier for devlink_unregister().
> 
> My goal is to move as much as possible in first bucket, and various
> devlink_*_register() calls are natural candidates for it.
> 
> In addition, they are candidates because devlink is SW layer, in my
> view everything or almost everything should be allocated during driver
> init with const arrays and feature bits with clear separation between
> devlink and driver beneath.
> 
> Call chains like "devlink->driver->devlink->driver.." that exist in
> mlxsw can't be correct from layering POV.

One of the outcomes is that such chain usually prevents from us to ensure
proper locking annotation.

Let's take as an example devlink_trap_policers_register().
In some drivers, it is called before device_register() and we don't need
any locks at all, because we are in initialization flow.

In mlxsw, it is called during devlink reload, and we don't really need to
lock it too, because we were supposed to lock everything for the reload.

However, for the mlxsw, we created devlink_trap_policers_register() to be
dynamic, so we must lock devlink->lock, as we don't know how other users
of this API will use it.

In the reality, no one uses it dynamically except mlxsw and we stuck
with function that calls to useless lock without us able to properly
annotate it with an invitation to misuse.

It is an example of layering problem, there are many more subtle issues
like this that require some cabal knowledge of proper locks to make it
is safe.

Thanks

> 
> The current implementation of devlink reload in mlxsw adds amazingly
> large amount of over engineering to main devlink core, because it drags
> many (constant) initializations to be in bucket #2.
> 
> Bottom line, convert devlink core to be similar to driver core. :)
> 
> Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device
  2021-10-27 19:15                                 ` Leon Romanovsky
@ 2021-10-27 19:28                                   ` Jakub Kicinski
  0 siblings, 0 replies; 29+ messages in thread
From: Jakub Kicinski @ 2021-10-27 19:28 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Ido Schimmel, David S . Miller, Ido Schimmel, Jiri Pirko,
	linux-kernel, netdev, syzbot+93d5accfaefceedf43c1, Edwin Peer

On Wed, 27 Oct 2021 22:15:41 +0300 Leon Romanovsky wrote:
> One of the outcomes is that such chain usually prevents from us to ensure
> proper locking annotation.
> 
> Let's take as an example devlink_trap_policers_register().
> In some drivers, it is called before device_register() and we don't need
> any locks at all, because we are in initialization flow.
> 
> In mlxsw, it is called during devlink reload, and we don't really need to
> lock it too, because we were supposed to lock everything for the reload.
> 
> However, for the mlxsw, we created devlink_trap_policers_register() to be
> dynamic, so we must lock devlink->lock, as we don't know how other users
> of this API will use it.
> 
> In the reality, no one uses it dynamically except mlxsw and we stuck
> with function that calls to useless lock without us able to properly
> annotate it with an invitation to misuse.
> 
> It is an example of layering problem, there are many more subtle issues
> like this that require some cabal knowledge of proper locks to make it
> is safe.

Now that you made me express my opinion I started feeling attached to
my way of thinking :) Let me try to convert devlink core, netdevsim and
nfp to devlink instance locking and see how far I can get...

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2021-10-27 19:28 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-24  8:42 [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device Leon Romanovsky
2021-10-24  9:05 ` Ido Schimmel
2021-10-24  9:54   ` Leon Romanovsky
2021-10-24 10:48     ` Ido Schimmel
2021-10-25  5:34       ` Leon Romanovsky
2021-10-25  8:08         ` Ido Schimmel
2021-10-25 10:56           ` Leon Romanovsky
2021-10-26  6:51             ` Ido Schimmel
2021-10-26  7:18               ` Leon Romanovsky
2021-10-26 14:09                 ` Ido Schimmel
2021-10-26 16:14                   ` Leon Romanovsky
2021-10-26 19:02                     ` Jakub Kicinski
2021-10-26 19:30                       ` Leon Romanovsky
2021-10-26 19:56                         ` Jakub Kicinski
2021-10-27  5:56                           ` Leon Romanovsky
2021-10-27 14:17                             ` Jakub Kicinski
2021-10-27 15:17                               ` Leon Romanovsky
2021-10-27 19:15                                 ` Leon Romanovsky
2021-10-27 19:28                                   ` Jakub Kicinski
2021-10-25 18:24           ` Jakub Kicinski
2021-10-25 19:12             ` Leon Romanovsky
2021-10-25 23:19   ` Edwin Peer
2021-10-26  5:56     ` Leon Romanovsky
2021-10-26 17:34       ` Edwin Peer
2021-10-26 19:22         ` Leon Romanovsky
2021-10-26 20:03           ` Edwin Peer
2021-10-27  6:43             ` Leon Romanovsky
2021-10-27  8:46               ` Edwin Peer
2021-10-27  9:43                 ` Leon Romanovsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).