All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jiri Pirko <jiri@resnulli.us>
To: Leon Romanovsky <leon@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Jiri Pirko <jiri@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Parav Pandit <parav@nvidia.com>
Subject: Re: [PATCH net-next v1 1/2] devlink: Break parameter notification sequence to be before/after unload/load driver
Date: Thu, 29 Jul 2021 16:33:58 +0200	[thread overview]
Message-ID: <YQK8VkXweZmWTggC@nanopsycho> (raw)
In-Reply-To: <a9a61cdee79cbfefa4e4e2cc973fe27a10b7ee4f.1627564383.git.leonro@nvidia.com>

Thu, Jul 29, 2021 at 03:17:49PM CEST, leon@kernel.org wrote:
>From: Leon Romanovsky <leonro@nvidia.com>
>
>The change of namespaces during devlink reload calls to driver unload
>before it accesses devlink parameters. The commands below causes to
>use-after-free bug when trying to get flow steering mode.
>
> * ip netns add n1
> * devlink dev reload pci/0000:00:09.0 netns n1
>
> ==================================================================
> BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
> Read of size 4 at addr ffff888009d04308 by task devlink/275
>
> CPU: 6 PID: 275 Comm: devlink Not tainted 5.12.0-rc2+ #2853
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  dump_stack+0x93/0xc2
>  print_address_description.constprop.0+0x18/0x140
>  ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
>  ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
>  kasan_report.cold+0x7c/0xd8
>  ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
>  mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
>  devlink_nl_param_fill+0x1c8/0xe80
>  ? __free_pages_ok+0x37a/0x8a0
>  ? devlink_flash_update_timeout_notify+0xd0/0xd0
>  ? lock_acquire+0x1a9/0x6d0
>  ? fs_reclaim_acquire+0xb7/0x160
>  ? lock_is_held_type+0x98/0x110
>  ? 0xffffffff81000000
>  ? lock_release+0x1f9/0x6c0
>  ? fs_reclaim_release+0xa1/0xf0
>  ? lock_downgrade+0x6d0/0x6d0
>  ? lock_is_held_type+0x98/0x110
>  ? lock_is_held_type+0x98/0x110
>  ? memset+0x20/0x40
>  ? __build_skb_around+0x1f8/0x2b0
>  devlink_param_notify+0x6d/0x180
>  devlink_reload+0x1c3/0x520
>  ? devlink_remote_reload_actions_performed+0x30/0x30
>  ? mutex_trylock+0x24b/0x2d0
>  ? devlink_nl_cmd_reload+0x62b/0x1070
>  devlink_nl_cmd_reload+0x66d/0x1070
>  ? devlink_reload+0x520/0x520
>  ? devlink_get_from_attrs+0x1bc/0x260
>  ? devlink_nl_pre_doit+0x64/0x4d0
>  genl_family_rcv_msg_doit+0x1e9/0x2f0
>  ? mutex_lock_io_nested+0x1130/0x1130
>  ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240
>  ? security_capable+0x51/0x90
>  genl_rcv_msg+0x27f/0x4a0
>  ? genl_get_cmd+0x3c0/0x3c0
>  ? lock_acquire+0x1a9/0x6d0
>  ? devlink_reload+0x520/0x520
>  ? lock_release+0x6c0/0x6c0
>  netlink_rcv_skb+0x11d/0x340
>  ? genl_get_cmd+0x3c0/0x3c0
>  ? netlink_ack+0x9f0/0x9f0
>  ? lock_release+0x1f9/0x6c0
>  genl_rcv+0x24/0x40
>  netlink_unicast+0x433/0x700
>  ? netlink_attachskb+0x730/0x730
>  ? _copy_from_iter_full+0x178/0x650
>  ? __alloc_skb+0x113/0x2b0
>  netlink_sendmsg+0x6f1/0xbd0
>  ? netlink_unicast+0x700/0x700
>  ? lock_is_held_type+0x98/0x110
>  ? netlink_unicast+0x700/0x700
>  sock_sendmsg+0xb0/0xe0
>  __sys_sendto+0x193/0x240
>  ? __x64_sys_getpeername+0xb0/0xb0
>  ? do_sys_openat2+0x10b/0x370
>  ? __up_read+0x1a1/0x7b0
>  ? do_user_addr_fault+0x219/0xdc0
>  ? __x64_sys_openat+0x120/0x1d0
>  ? __x64_sys_open+0x1a0/0x1a0
>  __x64_sys_sendto+0xdd/0x1b0
>  ? syscall_enter_from_user_mode+0x1d/0x50
>  do_syscall_64+0x2d/0x40
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7fc69d0af14a
> Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
> RSP: 002b:00007ffc1d8292f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fc69d0af14a
> RDX: 0000000000000038 RSI: 0000555f57c56440 RDI: 0000000000000003
> RBP: 0000555f57c56410 R08: 00007fc69d17b200 R09: 000000000000000c
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>
> Allocated by task 146:
>  kasan_save_stack+0x1b/0x40
>  __kasan_kmalloc+0x99/0xc0
>  mlx5_init_fs+0xf0/0x1c50 [mlx5_core]
>  mlx5_load+0xd2/0x180 [mlx5_core]
>  mlx5_init_one+0x2f6/0x450 [mlx5_core]
>  probe_one+0x47d/0x6e0 [mlx5_core]
>  pci_device_probe+0x2a0/0x4a0
>  really_probe+0x20a/0xc90
>  driver_probe_device+0xd8/0x380
>  device_driver_attach+0x1df/0x250
>  __driver_attach+0xff/0x240
>  bus_for_each_dev+0x11e/0x1a0
>  bus_add_driver+0x309/0x570
>  driver_register+0x1ee/0x380
>  0xffffffffa06b8062
>  do_one_initcall+0xd5/0x410
>  do_init_module+0x1c8/0x760
>  load_module+0x6d8b/0x9650
>  __do_sys_finit_module+0x118/0x1b0
>  do_syscall_64+0x2d/0x40
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Freed by task 275:
>  kasan_save_stack+0x1b/0x40
>  kasan_set_track+0x1c/0x30
>  kasan_set_free_info+0x20/0x30
>  __kasan_slab_free+0x102/0x140
>  slab_free_freelist_hook+0x74/0x1b0
>  kfree+0xd7/0x2a0
>  mlx5_unload+0x16/0xb0 [mlx5_core]
>  mlx5_unload_one+0xae/0x120 [mlx5_core]
>  mlx5_devlink_reload_down+0x1bc/0x380 [mlx5_core]
>  devlink_reload+0x141/0x520
>  devlink_nl_cmd_reload+0x66d/0x1070
>  genl_family_rcv_msg_doit+0x1e9/0x2f0
>  genl_rcv_msg+0x27f/0x4a0
>  netlink_rcv_skb+0x11d/0x340
>  genl_rcv+0x24/0x40
>  netlink_unicast+0x433/0x700
>  netlink_sendmsg+0x6f1/0xbd0
>  sock_sendmsg+0xb0/0xe0
>  __sys_sendto+0x193/0x240
>  __x64_sys_sendto+0xdd/0x1b0
>  do_syscall_64+0x2d/0x40
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> The buggy address belongs to the object at ffff888009d04300
>  which belongs to the cache kmalloc-128 of size 128
> The buggy address is located 8 bytes inside of
>  128-byte region [ffff888009d04300, ffff888009d04380)
> The buggy address belongs to the page:
> page:0000000086a64ecc refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888009d04000 pfn:0x9d04
> head:0000000086a64ecc order:1 compound_mapcount:0
> flags: 0x4000000000010200(slab|head)
> raw: 4000000000010200 ffffea0000203980 0000000200000002 ffff8880050428c0
> raw: ffff888009d04000 000000008020001d 00000001ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>  ffff888009d04200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff888009d04280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> >ffff888009d04300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                       ^
>  ffff888009d04380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  ffff888009d04400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================
>
>The right solution to devlink reload is to notify about deletion of
>parameters, unload driver, change net namespaces, load driver and notify
>about addition of parameters.
>
>Fixes: 070c63f20f6c ("net: devlink: allow to change namespaces during reload")
>Reviewed-by: Parav Pandit <parav@nvidia.com>
>Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>---
> net/core/devlink.c | 32 ++++++++++++++++++++------------
> 1 file changed, 20 insertions(+), 12 deletions(-)
>
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index b596a971b473..4385930b896f 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -3801,8 +3801,9 @@ static void devlink_param_notify(struct devlink *devlink,
> 				 struct devlink_param_item *param_item,
> 				 enum devlink_command cmd);
> 
>-static void devlink_reload_netns_change(struct devlink *devlink,
>-					struct net *dest_net)
>+static void devlink_ns_change_notify(struct devlink *devlink,
>+				     struct net *dest_net, struct net *curr_net,
>+				     enum devlink_command cmd, bool new)

Drop the cmd and determine the DEVLINK_CMD_PARAM_NEW/DEL by "new" arg as
well. I thought I wrote that clearly in my previous review, but
apparently not, sorry about that.



> {
> 	struct devlink_param_item *param_item;
> 
>@@ -3812,17 +3813,17 @@ static void devlink_reload_netns_change(struct devlink *devlink,
> 	 * reload process so the notifications are generated separatelly.
> 	 */
> 
>-	list_for_each_entry(param_item, &devlink->param_list, list)
>-		devlink_param_notify(devlink, 0, param_item,
>-				     DEVLINK_CMD_PARAM_DEL);
>-	devlink_notify(devlink, DEVLINK_CMD_DEL);
>+	if (!dest_net || net_eq(dest_net, curr_net))
>+		return;
> 
>-	__devlink_net_set(devlink, dest_net);
>+	if (new)
>+		devlink_notify(devlink, DEVLINK_CMD_NEW);
> 
>-	devlink_notify(devlink, DEVLINK_CMD_NEW);
> 	list_for_each_entry(param_item, &devlink->param_list, list)
>-		devlink_param_notify(devlink, 0, param_item,
>-				     DEVLINK_CMD_PARAM_NEW);
>+		devlink_param_notify(devlink, 0, param_item, cmd);

Like this:
		devlink_param_notify(devlink, 0, param_item, new ?
				     DEVLINK_CMD_PARAM_NEW ?
				     DEVLINK_CMD_PARAM_DEL);

>+
>+	if (!new)
>+		devlink_notify(devlink, DEVLINK_CMD_DEL);
> }
> 
> static bool devlink_reload_supported(const struct devlink_ops *ops)
>@@ -3902,6 +3903,7 @@ static int devlink_reload(struct devlink *devlink, struct net *dest_net,
> 			  u32 *actions_performed, struct netlink_ext_ack *extack)
> {
> 	u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE];
>+	struct net *curr_net;
> 	int err;
> 
> 	if (!devlink->reload_enabled)
>@@ -3909,18 +3911,24 @@ static int devlink_reload(struct devlink *devlink, struct net *dest_net,
> 
> 	memcpy(remote_reload_stats, devlink->stats.remote_reload_stats,
> 	       sizeof(remote_reload_stats));
>+
>+	curr_net = devlink_net(devlink);
>+	devlink_ns_change_notify(devlink, dest_net, curr_net,
>+				 DEVLINK_CMD_PARAM_DEL, false);
> 	err = devlink->ops->reload_down(devlink, !!dest_net, action, limit, extack);
> 	if (err)
> 		return err;
> 
>-	if (dest_net && !net_eq(dest_net, devlink_net(devlink)))
>-		devlink_reload_netns_change(devlink, dest_net);
>+	if (dest_net && !net_eq(dest_net, curr_net))
>+		__devlink_net_set(devlink, dest_net);
> 
> 	err = devlink->ops->reload_up(devlink, action, limit, actions_performed, extack);
> 	devlink_reload_failed_set(devlink, !!err);
> 	if (err)
> 		return err;
> 
>+	devlink_ns_change_notify(devlink, dest_net, curr_net,
>+				 DEVLINK_CMD_PARAM_NEW, true);
> 	WARN_ON(!(*actions_performed & BIT(action)));
> 	/* Catch driver on updating the remote action within devlink reload */
> 	WARN_ON(memcmp(remote_reload_stats, devlink->stats.remote_reload_stats,
>-- 
>2.31.1
>

  reply	other threads:[~2021-07-29 14:34 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-29 13:17 [PATCH net-next v1 0/2] Clean devlink net namespace operations Leon Romanovsky
2021-07-29 13:17 ` [PATCH net-next v1 1/2] devlink: Break parameter notification sequence to be before/after unload/load driver Leon Romanovsky
2021-07-29 14:33   ` Jiri Pirko [this message]
2021-07-29 14:38     ` Leon Romanovsky
2021-07-29 13:17 ` [PATCH net-next v1 2/2] devlink: Allocate devlink directly in requested net namespace Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YQK8VkXweZmWTggC@nanopsycho \
    --to=jiri@resnulli.us \
    --cc=davem@davemloft.net \
    --cc=jiri@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=parav@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.