All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch net] devlink: change per-devlink netdev notifier to static one
@ 2023-05-10 14:46 Jiri Pirko
  2023-05-11  1:01 ` Jakub Kicinski
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Jiri Pirko @ 2023-05-10 14:46 UTC (permalink / raw)
  To: netdev; +Cc: kuba, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe

From: Jiri Pirko <jiri@nvidia.com>

The commit 565b4824c39f ("devlink: change port event netdev notifier
from per-net to global") changed original per-net notifier to be
per-devlink instance. That fixed the issue of non-receiving events
of netdev uninit if that moved to a different namespace.
That worked fine in -net tree.

However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
separate devlink instance for ethernet auxiliary device") and
commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
case of PCI device suspend") were merged, a deadlock was introduced
when removing a namespace with devlink instance with another nested
instance.

Here there is the bad flow example resulting in deadlock with mlx5:
net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
devlink_pernet_pre_exit() -> devlink_reload() ->
mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
mlx5e_destroy_devlink() -> devlink_free() ->
unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)

Steps to reproduce:
$ modprobe mlx5_core
$ ip netns add ns1
$ devlink dev reload pci/0000:08:00.0 netns ns1
$ ip netns del ns1

Resolve this by converting the notifier from per-devlink instance to
a static one registered during init phase and leaving it registered
forever. Use this notifier for all devlink port instances created
later on.

Note what a tree needs this fix only in case all of the cited fixes
commits are present.

Reported-by: Moshe Shemesh <moshe@nvidia.com>
Fixes: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global")
Fixes: ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 net/devlink/core.c          | 16 +++++++---------
 net/devlink/devl_internal.h |  1 -
 net/devlink/leftover.c      |  5 ++---
 3 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/net/devlink/core.c b/net/devlink/core.c
index 777b091ef74d..0e58eee44bdb 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -204,11 +204,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
 	if (ret < 0)
 		goto err_xa_alloc;
 
-	devlink->netdevice_nb.notifier_call = devlink_port_netdevice_event;
-	ret = register_netdevice_notifier(&devlink->netdevice_nb);
-	if (ret)
-		goto err_register_netdevice_notifier;
-
 	devlink->dev = dev;
 	devlink->ops = ops;
 	xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC);
@@ -233,8 +228,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
 
 	return devlink;
 
-err_register_netdevice_notifier:
-	xa_erase(&devlinks, devlink->index);
 err_xa_alloc:
 	kfree(devlink);
 	return NULL;
@@ -266,8 +259,6 @@ void devlink_free(struct devlink *devlink)
 	xa_destroy(&devlink->params);
 	xa_destroy(&devlink->ports);
 
-	WARN_ON_ONCE(unregister_netdevice_notifier(&devlink->netdevice_nb));
-
 	xa_erase(&devlinks, devlink->index);
 
 	devlink_put(devlink);
@@ -303,6 +294,10 @@ static struct pernet_operations devlink_pernet_ops __net_initdata = {
 	.pre_exit = devlink_pernet_pre_exit,
 };
 
+static struct notifier_block devlink_port_netdevice_nb __net_initdata = {
+	.notifier_call = devlink_port_netdevice_event,
+};
+
 static int __init devlink_init(void)
 {
 	int err;
@@ -311,6 +306,9 @@ static int __init devlink_init(void)
 	if (err)
 		goto out;
 	err = register_pernet_subsys(&devlink_pernet_ops);
+	if (err)
+		goto out;
+	err = register_netdevice_notifier(&devlink_port_netdevice_nb);
 
 out:
 	WARN_ON(err);
diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
index e133f423294a..62921b2eb0d3 100644
--- a/net/devlink/devl_internal.h
+++ b/net/devlink/devl_internal.h
@@ -50,7 +50,6 @@ struct devlink {
 	u8 reload_failed:1;
 	refcount_t refcount;
 	struct rcu_work rwork;
-	struct notifier_block netdevice_nb;
 	char priv[] __aligned(NETDEV_ALIGN);
 };
 
diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c
index dffca2f9bfa7..cd0254968076 100644
--- a/net/devlink/leftover.c
+++ b/net/devlink/leftover.c
@@ -7073,10 +7073,9 @@ int devlink_port_netdevice_event(struct notifier_block *nb,
 	struct devlink_port *devlink_port = netdev->devlink_port;
 	struct devlink *devlink;
 
-	devlink = container_of(nb, struct devlink, netdevice_nb);
-
-	if (!devlink_port || devlink_port->devlink != devlink)
+	if (!devlink_port)
 		return NOTIFY_OK;
+	devlink = devlink_port->devlink;
 
 	switch (event) {
 	case NETDEV_POST_INIT:
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [patch net] devlink: change per-devlink netdev notifier to static one
  2023-05-10 14:46 [patch net] devlink: change per-devlink netdev notifier to static one Jiri Pirko
@ 2023-05-11  1:01 ` Jakub Kicinski
  2023-05-11 12:18 ` Simon Horman
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2023-05-11  1:01 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe

On Wed, 10 May 2023 16:46:21 +0200 Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> The commit 565b4824c39f ("devlink: change port event netdev notifier
> from per-net to global") changed original per-net notifier to be
> per-devlink instance. That fixed the issue of non-receiving events
> of netdev uninit if that moved to a different namespace.
> That worked fine in -net tree.
> 
> However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
> separate devlink instance for ethernet auxiliary device") and
> commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
> case of PCI device suspend") were merged, a deadlock was introduced
> when removing a namespace with devlink instance with another nested
> instance.
> 
> Here there is the bad flow example resulting in deadlock with mlx5:
> net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
> devlink_pernet_pre_exit() -> devlink_reload() ->
> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
> mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
> mlx5e_destroy_devlink() -> devlink_free() ->
> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
> 
> Steps to reproduce:
> $ modprobe mlx5_core
> $ ip netns add ns1
> $ devlink dev reload pci/0000:08:00.0 netns ns1
> $ ip netns del ns1
> 
> Resolve this by converting the notifier from per-devlink instance to
> a static one registered during init phase and leaving it registered
> forever. Use this notifier for all devlink port instances created
> later on.
> 
> Note what a tree needs this fix only in case all of the cited fixes
> commits are present.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>

For posterity v1(/previous):
https://lore.kernel.org/all/20230509100939.760867-1-jiri@resnulli.us/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch net] devlink: change per-devlink netdev notifier to static one
  2023-05-10 14:46 [patch net] devlink: change per-devlink netdev notifier to static one Jiri Pirko
  2023-05-11  1:01 ` Jakub Kicinski
@ 2023-05-11 12:18 ` Simon Horman
  2023-05-12  1:10 ` patchwork-bot+netdevbpf
       [not found] ` <CGME20230515090912eucas1p2489efdc97f9cf1fddf2aad0449e8a2c7@eucas1p2.samsung.com>
  3 siblings, 0 replies; 10+ messages in thread
From: Simon Horman @ 2023-05-11 12:18 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, kuba, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe

On Wed, May 10, 2023 at 04:46:21PM +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> The commit 565b4824c39f ("devlink: change port event netdev notifier
> from per-net to global") changed original per-net notifier to be
> per-devlink instance. That fixed the issue of non-receiving events
> of netdev uninit if that moved to a different namespace.
> That worked fine in -net tree.
> 
> However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
> separate devlink instance for ethernet auxiliary device") and
> commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
> case of PCI device suspend") were merged, a deadlock was introduced
> when removing a namespace with devlink instance with another nested
> instance.
> 
> Here there is the bad flow example resulting in deadlock with mlx5:
> net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
> devlink_pernet_pre_exit() -> devlink_reload() ->
> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
> mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
> mlx5e_destroy_devlink() -> devlink_free() ->
> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
> 
> Steps to reproduce:
> $ modprobe mlx5_core
> $ ip netns add ns1
> $ devlink dev reload pci/0000:08:00.0 netns ns1
> $ ip netns del ns1
> 
> Resolve this by converting the notifier from per-devlink instance to
> a static one registered during init phase and leaving it registered
> forever. Use this notifier for all devlink port instances created
> later on.
> 
> Note what a tree needs this fix only in case all of the cited fixes
> commits are present.
> 
> Reported-by: Moshe Shemesh <moshe@nvidia.com>
> Fixes: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global")
> Fixes: ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
> Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>

Reviewed-by: Simon Horman <simon.horman@corigine.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch net] devlink: change per-devlink netdev notifier to static one
  2023-05-10 14:46 [patch net] devlink: change per-devlink netdev notifier to static one Jiri Pirko
  2023-05-11  1:01 ` Jakub Kicinski
  2023-05-11 12:18 ` Simon Horman
@ 2023-05-12  1:10 ` patchwork-bot+netdevbpf
       [not found] ` <CGME20230515090912eucas1p2489efdc97f9cf1fddf2aad0449e8a2c7@eucas1p2.samsung.com>
  3 siblings, 0 replies; 10+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-05-12  1:10 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, kuba, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 10 May 2023 16:46:21 +0200 you wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> The commit 565b4824c39f ("devlink: change port event netdev notifier
> from per-net to global") changed original per-net notifier to be
> per-devlink instance. That fixed the issue of non-receiving events
> of netdev uninit if that moved to a different namespace.
> That worked fine in -net tree.
> 
> [...]

Here is the summary with links:
  - [net] devlink: change per-devlink netdev notifier to static one
    https://git.kernel.org/netdev/net/c/e93c9378e33f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch net] devlink: change per-devlink netdev notifier to static one
       [not found] ` <CGME20230515090912eucas1p2489efdc97f9cf1fddf2aad0449e8a2c7@eucas1p2.samsung.com>
@ 2023-05-15  9:09   ` Marek Szyprowski
  2023-05-15 11:35     ` Jiri Pirko
  0 siblings, 1 reply; 10+ messages in thread
From: Marek Szyprowski @ 2023-05-15  9:09 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: kuba, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe

On 10.05.2023 16:46, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
>
> The commit 565b4824c39f ("devlink: change port event netdev notifier
> from per-net to global") changed original per-net notifier to be
> per-devlink instance. That fixed the issue of non-receiving events
> of netdev uninit if that moved to a different namespace.
> That worked fine in -net tree.
>
> However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
> separate devlink instance for ethernet auxiliary device") and
> commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
> case of PCI device suspend") were merged, a deadlock was introduced
> when removing a namespace with devlink instance with another nested
> instance.
>
> Here there is the bad flow example resulting in deadlock with mlx5:
> net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
> devlink_pernet_pre_exit() -> devlink_reload() ->
> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
> mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
> mlx5e_destroy_devlink() -> devlink_free() ->
> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
>
> Steps to reproduce:
> $ modprobe mlx5_core
> $ ip netns add ns1
> $ devlink dev reload pci/0000:08:00.0 netns ns1
> $ ip netns del ns1
>
> Resolve this by converting the notifier from per-devlink instance to
> a static one registered during init phase and leaving it registered
> forever. Use this notifier for all devlink port instances created
> later on.
>
> Note what a tree needs this fix only in case all of the cited fixes
> commits are present.
>
> Reported-by: Moshe Shemesh <moshe@nvidia.com>
> Fixes: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global")
> Fixes: ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
> Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>

This patch landed recently in linux-next as commit e93c9378e33f 
("devlink: change per-devlink netdev notifier to static one"). 
Unfortunately it causes serious regression with kernel compiled from 
multi_v7_defconfig (ARM 32bit) on all my test boards. Here is a example 
of the boot failure observed on QEMU's virt ARM 32bit machine:

8<--- cut here ---
Unable to handle kernel execution of memory at virtual address e5150010 
when execute
[e5150010] *pgd=4267a811, *pte=04750653, *ppte=04750453
Internal error: Oops: 8000000f [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 779 Comm: ip Not tainted 6.4.0-rc2-next-20230515 #6688
Hardware name: Generic DT based system
PC is at 0xe5150010
LR is at notifier_call_chain+0x60/0x11c
pc : [<e5150010>]    lr : [<c0365f50>]    psr: 60000013
...
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 4397806a  DAC: 00000051
...
Process ip (pid: 779, stack limit = 0x82b757b5)
Stack: (0xe9201a00 to 0xe9202000)
...
  notifier_call_chain from raw_notifier_call_chain+0x18/0x20
  raw_notifier_call_chain from __dev_open+0x74/0x190
  __dev_open from __dev_change_flags+0x17c/0x1f4
  __dev_change_flags from dev_change_flags+0x18/0x54
  dev_change_flags from do_setlink+0x24c/0xefc
  do_setlink from rtnl_newlink+0x534/0x818
  rtnl_newlink from rtnetlink_rcv_msg+0x250/0x300
  rtnetlink_rcv_msg from netlink_rcv_skb+0xb8/0x110
  netlink_rcv_skb from netlink_unicast+0x1f8/0x2bc
  netlink_unicast from netlink_sendmsg+0x1cc/0x44c
  netlink_sendmsg from ____sys_sendmsg+0x9c/0x260
  ____sys_sendmsg from ___sys_sendmsg+0x68/0x94
  ___sys_sendmsg from sys_sendmsg+0x4c/0x88
  sys_sendmsg from ret_fast_syscall+0x0/0x54
Exception stack(0xe9201fa8 to 0xe9201ff0)
...
---[ end trace 0000000000000000 ]---
[....] Configuring network interfaces...Segmentation fault
ifup: failed to bring up lo


Reverting $subject patch on top of linux next-20230515 fixes this issue.


> ---
>   net/devlink/core.c          | 16 +++++++---------
>   net/devlink/devl_internal.h |  1 -
>   net/devlink/leftover.c      |  5 ++---
>   3 files changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/net/devlink/core.c b/net/devlink/core.c
> index 777b091ef74d..0e58eee44bdb 100644
> --- a/net/devlink/core.c
> +++ b/net/devlink/core.c
> @@ -204,11 +204,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
>   	if (ret < 0)
>   		goto err_xa_alloc;
>   
> -	devlink->netdevice_nb.notifier_call = devlink_port_netdevice_event;
> -	ret = register_netdevice_notifier(&devlink->netdevice_nb);
> -	if (ret)
> -		goto err_register_netdevice_notifier;
> -
>   	devlink->dev = dev;
>   	devlink->ops = ops;
>   	xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC);
> @@ -233,8 +228,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
>   
>   	return devlink;
>   
> -err_register_netdevice_notifier:
> -	xa_erase(&devlinks, devlink->index);
>   err_xa_alloc:
>   	kfree(devlink);
>   	return NULL;
> @@ -266,8 +259,6 @@ void devlink_free(struct devlink *devlink)
>   	xa_destroy(&devlink->params);
>   	xa_destroy(&devlink->ports);
>   
> -	WARN_ON_ONCE(unregister_netdevice_notifier(&devlink->netdevice_nb));
> -
>   	xa_erase(&devlinks, devlink->index);
>   
>   	devlink_put(devlink);
> @@ -303,6 +294,10 @@ static struct pernet_operations devlink_pernet_ops __net_initdata = {
>   	.pre_exit = devlink_pernet_pre_exit,
>   };
>   
> +static struct notifier_block devlink_port_netdevice_nb __net_initdata = {
> +	.notifier_call = devlink_port_netdevice_event,
> +};
> +
>   static int __init devlink_init(void)
>   {
>   	int err;
> @@ -311,6 +306,9 @@ static int __init devlink_init(void)
>   	if (err)
>   		goto out;
>   	err = register_pernet_subsys(&devlink_pernet_ops);
> +	if (err)
> +		goto out;
> +	err = register_netdevice_notifier(&devlink_port_netdevice_nb);
>   
>   out:
>   	WARN_ON(err);
> diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
> index e133f423294a..62921b2eb0d3 100644
> --- a/net/devlink/devl_internal.h
> +++ b/net/devlink/devl_internal.h
> @@ -50,7 +50,6 @@ struct devlink {
>   	u8 reload_failed:1;
>   	refcount_t refcount;
>   	struct rcu_work rwork;
> -	struct notifier_block netdevice_nb;
>   	char priv[] __aligned(NETDEV_ALIGN);
>   };
>   
> diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c
> index dffca2f9bfa7..cd0254968076 100644
> --- a/net/devlink/leftover.c
> +++ b/net/devlink/leftover.c
> @@ -7073,10 +7073,9 @@ int devlink_port_netdevice_event(struct notifier_block *nb,
>   	struct devlink_port *devlink_port = netdev->devlink_port;
>   	struct devlink *devlink;
>   
> -	devlink = container_of(nb, struct devlink, netdevice_nb);
> -
> -	if (!devlink_port || devlink_port->devlink != devlink)
> +	if (!devlink_port)
>   		return NOTIFY_OK;
> +	devlink = devlink_port->devlink;
>   
>   	switch (event) {
>   	case NETDEV_POST_INIT:

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch net] devlink: change per-devlink netdev notifier to static one
  2023-05-15  9:09   ` Marek Szyprowski
@ 2023-05-15 11:35     ` Jiri Pirko
  2023-05-15 12:05       ` Ido Schimmel
  0 siblings, 1 reply; 10+ messages in thread
From: Jiri Pirko @ 2023-05-15 11:35 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: netdev, kuba, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe

Mon, May 15, 2023 at 11:09:10AM CEST, m.szyprowski@samsung.com wrote:
>On 10.05.2023 16:46, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@nvidia.com>
>>
>> The commit 565b4824c39f ("devlink: change port event netdev notifier
>> from per-net to global") changed original per-net notifier to be
>> per-devlink instance. That fixed the issue of non-receiving events
>> of netdev uninit if that moved to a different namespace.
>> That worked fine in -net tree.
>>
>> However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
>> separate devlink instance for ethernet auxiliary device") and
>> commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
>> case of PCI device suspend") were merged, a deadlock was introduced
>> when removing a namespace with devlink instance with another nested
>> instance.
>>
>> Here there is the bad flow example resulting in deadlock with mlx5:
>> net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
>> devlink_pernet_pre_exit() -> devlink_reload() ->
>> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
>> mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
>> mlx5e_destroy_devlink() -> devlink_free() ->
>> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
>>
>> Steps to reproduce:
>> $ modprobe mlx5_core
>> $ ip netns add ns1
>> $ devlink dev reload pci/0000:08:00.0 netns ns1
>> $ ip netns del ns1
>>
>> Resolve this by converting the notifier from per-devlink instance to
>> a static one registered during init phase and leaving it registered
>> forever. Use this notifier for all devlink port instances created
>> later on.
>>
>> Note what a tree needs this fix only in case all of the cited fixes
>> commits are present.
>>
>> Reported-by: Moshe Shemesh <moshe@nvidia.com>
>> Fixes: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global")
>> Fixes: ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
>> Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
>> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
>
>This patch landed recently in linux-next as commit e93c9378e33f 
>("devlink: change per-devlink netdev notifier to static one"). 
>Unfortunately it causes serious regression with kernel compiled from 
>multi_v7_defconfig (ARM 32bit) on all my test boards. Here is a example 
>of the boot failure observed on QEMU's virt ARM 32bit machine:
>
>8<--- cut here ---
>Unable to handle kernel execution of memory at virtual address e5150010 
>when execute
>[e5150010] *pgd=4267a811, *pte=04750653, *ppte=04750453
>Internal error: Oops: 8000000f [#1] SMP ARM
>Modules linked in:
>CPU: 0 PID: 779 Comm: ip Not tainted 6.4.0-rc2-next-20230515 #6688
>Hardware name: Generic DT based system
>PC is at 0xe5150010
>LR is at notifier_call_chain+0x60/0x11c
>pc : [<e5150010>]    lr : [<c0365f50>]    psr: 60000013
>...
>Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
>Control: 10c5387d  Table: 4397806a  DAC: 00000051
>...
>Process ip (pid: 779, stack limit = 0x82b757b5)
>Stack: (0xe9201a00 to 0xe9202000)
>...
>  notifier_call_chain from raw_notifier_call_chain+0x18/0x20
>  raw_notifier_call_chain from __dev_open+0x74/0x190
>  __dev_open from __dev_change_flags+0x17c/0x1f4
>  __dev_change_flags from dev_change_flags+0x18/0x54
>  dev_change_flags from do_setlink+0x24c/0xefc
>  do_setlink from rtnl_newlink+0x534/0x818
>  rtnl_newlink from rtnetlink_rcv_msg+0x250/0x300
>  rtnetlink_rcv_msg from netlink_rcv_skb+0xb8/0x110
>  netlink_rcv_skb from netlink_unicast+0x1f8/0x2bc
>  netlink_unicast from netlink_sendmsg+0x1cc/0x44c
>  netlink_sendmsg from ____sys_sendmsg+0x9c/0x260
>  ____sys_sendmsg from ___sys_sendmsg+0x68/0x94
>  ___sys_sendmsg from sys_sendmsg+0x4c/0x88
>  sys_sendmsg from ret_fast_syscall+0x0/0x54
>Exception stack(0xe9201fa8 to 0xe9201ff0)

Thanks for the report. From the first sight, don't have a clue what may
be wrong. Debugging.


>...
>---[ end trace 0000000000000000 ]---
>[....] Configuring network interfaces...Segmentation fault
>ifup: failed to bring up lo
>
>
>Reverting $subject patch on top of linux next-20230515 fixes this issue.
>
>
>> ---
>>   net/devlink/core.c          | 16 +++++++---------
>>   net/devlink/devl_internal.h |  1 -
>>   net/devlink/leftover.c      |  5 ++---
>>   3 files changed, 9 insertions(+), 13 deletions(-)
>>
>> diff --git a/net/devlink/core.c b/net/devlink/core.c
>> index 777b091ef74d..0e58eee44bdb 100644
>> --- a/net/devlink/core.c
>> +++ b/net/devlink/core.c
>> @@ -204,11 +204,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
>>   	if (ret < 0)
>>   		goto err_xa_alloc;
>>   
>> -	devlink->netdevice_nb.notifier_call = devlink_port_netdevice_event;
>> -	ret = register_netdevice_notifier(&devlink->netdevice_nb);
>> -	if (ret)
>> -		goto err_register_netdevice_notifier;
>> -
>>   	devlink->dev = dev;
>>   	devlink->ops = ops;
>>   	xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC);
>> @@ -233,8 +228,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
>>   
>>   	return devlink;
>>   
>> -err_register_netdevice_notifier:
>> -	xa_erase(&devlinks, devlink->index);
>>   err_xa_alloc:
>>   	kfree(devlink);
>>   	return NULL;
>> @@ -266,8 +259,6 @@ void devlink_free(struct devlink *devlink)
>>   	xa_destroy(&devlink->params);
>>   	xa_destroy(&devlink->ports);
>>   
>> -	WARN_ON_ONCE(unregister_netdevice_notifier(&devlink->netdevice_nb));
>> -
>>   	xa_erase(&devlinks, devlink->index);
>>   
>>   	devlink_put(devlink);
>> @@ -303,6 +294,10 @@ static struct pernet_operations devlink_pernet_ops __net_initdata = {
>>   	.pre_exit = devlink_pernet_pre_exit,
>>   };
>>   
>> +static struct notifier_block devlink_port_netdevice_nb __net_initdata = {
>> +	.notifier_call = devlink_port_netdevice_event,
>> +};
>> +
>>   static int __init devlink_init(void)
>>   {
>>   	int err;
>> @@ -311,6 +306,9 @@ static int __init devlink_init(void)
>>   	if (err)
>>   		goto out;
>>   	err = register_pernet_subsys(&devlink_pernet_ops);
>> +	if (err)
>> +		goto out;
>> +	err = register_netdevice_notifier(&devlink_port_netdevice_nb);
>>   
>>   out:
>>   	WARN_ON(err);
>> diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
>> index e133f423294a..62921b2eb0d3 100644
>> --- a/net/devlink/devl_internal.h
>> +++ b/net/devlink/devl_internal.h
>> @@ -50,7 +50,6 @@ struct devlink {
>>   	u8 reload_failed:1;
>>   	refcount_t refcount;
>>   	struct rcu_work rwork;
>> -	struct notifier_block netdevice_nb;
>>   	char priv[] __aligned(NETDEV_ALIGN);
>>   };
>>   
>> diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c
>> index dffca2f9bfa7..cd0254968076 100644
>> --- a/net/devlink/leftover.c
>> +++ b/net/devlink/leftover.c
>> @@ -7073,10 +7073,9 @@ int devlink_port_netdevice_event(struct notifier_block *nb,
>>   	struct devlink_port *devlink_port = netdev->devlink_port;
>>   	struct devlink *devlink;
>>   
>> -	devlink = container_of(nb, struct devlink, netdevice_nb);
>> -
>> -	if (!devlink_port || devlink_port->devlink != devlink)
>> +	if (!devlink_port)
>>   		return NOTIFY_OK;
>> +	devlink = devlink_port->devlink;
>>   
>>   	switch (event) {
>>   	case NETDEV_POST_INIT:
>
>Best regards
>-- 
>Marek Szyprowski, PhD
>Samsung R&D Institute Poland
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch net] devlink: change per-devlink netdev notifier to static one
  2023-05-15 11:35     ` Jiri Pirko
@ 2023-05-15 12:05       ` Ido Schimmel
  2023-05-15 12:28         ` Marek Szyprowski
  2023-05-15 12:37         ` Jiri Pirko
  0 siblings, 2 replies; 10+ messages in thread
From: Ido Schimmel @ 2023-05-15 12:05 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Marek Szyprowski, netdev, kuba, pabeni, davem, edumazet,
	jacob.e.keller, saeedm, moshe

On Mon, May 15, 2023 at 01:35:18PM +0200, Jiri Pirko wrote:
> Thanks for the report. From the first sight, don't have a clue what may
> be wrong. Debugging.

I guess he has CONFIG_NET_NS disabled which turns "__net_initdata" to
"__initdata" and frees the notifier block after init. "__net_initdata"
is a NOP when CONFIG_NET_NS is enabled.

Maybe this will help:

diff --git a/net/devlink/core.c b/net/devlink/core.c
index 0e58eee44bdb..c23ebabadc52 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -294,7 +294,7 @@ static struct pernet_operations devlink_pernet_ops __net_initdata = {
        .pre_exit = devlink_pernet_pre_exit,
 };
 
-static struct notifier_block devlink_port_netdevice_nb __net_initdata = {
+static struct notifier_block devlink_port_netdevice_nb = {
        .notifier_call = devlink_port_netdevice_event,
 };

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [patch net] devlink: change per-devlink netdev notifier to static one
  2023-05-15 12:05       ` Ido Schimmel
@ 2023-05-15 12:28         ` Marek Szyprowski
  2023-05-15 12:37         ` Jiri Pirko
  1 sibling, 0 replies; 10+ messages in thread
From: Marek Szyprowski @ 2023-05-15 12:28 UTC (permalink / raw)
  To: Ido Schimmel, Jiri Pirko
  Cc: netdev, kuba, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe

On 15.05.2023 14:05, Ido Schimmel wrote:
> On Mon, May 15, 2023 at 01:35:18PM +0200, Jiri Pirko wrote:
>> Thanks for the report. From the first sight, don't have a clue what may
>> be wrong. Debugging.
> I guess he has CONFIG_NET_NS disabled which turns "__net_initdata" to
> "__initdata" and frees the notifier block after init. "__net_initdata"
> is a NOP when CONFIG_NET_NS is enabled.
>
> Maybe this will help:
>
> diff --git a/net/devlink/core.c b/net/devlink/core.c
> index 0e58eee44bdb..c23ebabadc52 100644
> --- a/net/devlink/core.c
> +++ b/net/devlink/core.c
> @@ -294,7 +294,7 @@ static struct pernet_operations devlink_pernet_ops __net_initdata = {
>          .pre_exit = devlink_pernet_pre_exit,
>   };
>   
> -static struct notifier_block devlink_port_netdevice_nb __net_initdata = {
> +static struct notifier_block devlink_port_netdevice_nb = {
>          .notifier_call = devlink_port_netdevice_event,
>   };

Bingo, this fixes the issue.

Feel free to add:

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

to the final patch. Thanks!


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch net] devlink: change per-devlink netdev notifier to static one
  2023-05-15 12:05       ` Ido Schimmel
  2023-05-15 12:28         ` Marek Szyprowski
@ 2023-05-15 12:37         ` Jiri Pirko
  2023-05-15 13:40           ` Ido Schimmel
  1 sibling, 1 reply; 10+ messages in thread
From: Jiri Pirko @ 2023-05-15 12:37 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Marek Szyprowski, netdev, kuba, pabeni, davem, edumazet,
	jacob.e.keller, saeedm, moshe

Mon, May 15, 2023 at 02:05:54PM CEST, idosch@idosch.org wrote:
>On Mon, May 15, 2023 at 01:35:18PM +0200, Jiri Pirko wrote:
>> Thanks for the report. From the first sight, don't have a clue what may
>> be wrong. Debugging.
>
>I guess he has CONFIG_NET_NS disabled which turns "__net_initdata" to
>"__initdata" and frees the notifier block after init. "__net_initdata"
>is a NOP when CONFIG_NET_NS is enabled.
>
>Maybe this will help:
>
>diff --git a/net/devlink/core.c b/net/devlink/core.c
>index 0e58eee44bdb..c23ebabadc52 100644
>--- a/net/devlink/core.c
>+++ b/net/devlink/core.c
>@@ -294,7 +294,7 @@ static struct pernet_operations devlink_pernet_ops __net_initdata = {
>        .pre_exit = devlink_pernet_pre_exit,
> };
> 
>-static struct notifier_block devlink_port_netdevice_nb __net_initdata = {
>+static struct notifier_block devlink_port_netdevice_nb = {
>        .notifier_call = devlink_port_netdevice_event,
> };

Yeah I just ended up with the same assumption. That is going to fix it.
Are you sending the proper patch?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch net] devlink: change per-devlink netdev notifier to static one
  2023-05-15 12:37         ` Jiri Pirko
@ 2023-05-15 13:40           ` Ido Schimmel
  0 siblings, 0 replies; 10+ messages in thread
From: Ido Schimmel @ 2023-05-15 13:40 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Marek Szyprowski, netdev, kuba, pabeni, davem, edumazet,
	jacob.e.keller, saeedm, moshe

On Mon, May 15, 2023 at 02:37:45PM +0200, Jiri Pirko wrote:
> Mon, May 15, 2023 at 02:05:54PM CEST, idosch@idosch.org wrote:
> >On Mon, May 15, 2023 at 01:35:18PM +0200, Jiri Pirko wrote:
> >> Thanks for the report. From the first sight, don't have a clue what may
> >> be wrong. Debugging.
> >
> >I guess he has CONFIG_NET_NS disabled which turns "__net_initdata" to
> >"__initdata" and frees the notifier block after init. "__net_initdata"
> >is a NOP when CONFIG_NET_NS is enabled.
> >
> >Maybe this will help:
> >
> >diff --git a/net/devlink/core.c b/net/devlink/core.c
> >index 0e58eee44bdb..c23ebabadc52 100644
> >--- a/net/devlink/core.c
> >+++ b/net/devlink/core.c
> >@@ -294,7 +294,7 @@ static struct pernet_operations devlink_pernet_ops __net_initdata = {
> >        .pre_exit = devlink_pernet_pre_exit,
> > };
> > 
> >-static struct notifier_block devlink_port_netdevice_nb __net_initdata = {
> >+static struct notifier_block devlink_port_netdevice_nb = {
> >        .notifier_call = devlink_port_netdevice_event,
> > };
> 
> Yeah I just ended up with the same assumption. That is going to fix it.
> Are you sending the proper patch?

Yes. Will send later today

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-05-15 13:40 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-10 14:46 [patch net] devlink: change per-devlink netdev notifier to static one Jiri Pirko
2023-05-11  1:01 ` Jakub Kicinski
2023-05-11 12:18 ` Simon Horman
2023-05-12  1:10 ` patchwork-bot+netdevbpf
     [not found] ` <CGME20230515090912eucas1p2489efdc97f9cf1fddf2aad0449e8a2c7@eucas1p2.samsung.com>
2023-05-15  9:09   ` Marek Szyprowski
2023-05-15 11:35     ` Jiri Pirko
2023-05-15 12:05       ` Ido Schimmel
2023-05-15 12:28         ` Marek Szyprowski
2023-05-15 12:37         ` Jiri Pirko
2023-05-15 13:40           ` Ido Schimmel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.