netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/3] openvswitch: allow specifying ifindex of new interfaces
@ 2022-08-19 15:30 Andrey Zhadchenko
  2022-08-19 15:30 ` [PATCH net-next v2 1/3] " Andrey Zhadchenko
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Andrey Zhadchenko @ 2022-08-19 15:30 UTC (permalink / raw)
  To: netdev
  Cc: dev, pshelar, davem, edumazet, kuba, pabeni, ptikhomirov,
	alexander.mikhalitsyn, avagin, brauner, mark.d.gray, i.maximets,
	aconole

Hi!

CRIU currently do not support checkpoint/restore of OVS configurations, but
there was several requests for it. For example,
https://github.com/lxc/lxc/issues/2909

The main problem is ifindexes of newly created interfaces. We realy need to
preserve them after restore. Current openvswitch API does not allow to
specify ifindex. Most of the time we can just create an interface via
generic netlink requests and plug it into ovs but datapaths (generally any
OVS_VPORT_TYPE_INTERNAL) can only be created via openvswitch requests which
do not support selecting ifindex.

This patch allows to do so.
For new datapaths I decided to use dp_infindex in header as infindex
because it control ifindex for other requests too.
For internal vports I reused OVS_VPORT_ATTR_IFINDEX.

The only concern I have is that previously dp_ifindex was not used for
OVS_DP_VMD_NEW requests and some software may not set it to zero. However
we have been running this patch at Virtuozzo for 2 years and have not
encountered this problem. Not sure if it is worth to add new
ovs_datapath_attr instead.

v2:
Added two more patches.

Add OVS_DP_ATTR_PER_CPU_PIDS to dumps as suggested by Ilya Maximets.
Without it we won't be able to checkpoint/restore new openvswitch
configurations which use OVS_DP_F_DISPATCH_UPCALL_PER_CPU flag.

Found and fixed memory leak on datapath creation error path.

Andrey Zhadchenko (3):
  openvswitch: allow specifying ifindex of new interfaces
  openvswitch: fix memory leak at failed datapath creation
  openvswitch: add OVS_DP_ATTR_PER_CPU_PIDS to get requests

 include/uapi/linux/openvswitch.h     |   4 +
 net/openvswitch/datapath.c           | 107 ++++++++++++++++++---------
 net/openvswitch/vport-internal_dev.c |   1 +
 net/openvswitch/vport.h              |   2 +
 4 files changed, 77 insertions(+), 37 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net-next v2 1/3] openvswitch: allow specifying ifindex of new interfaces
  2022-08-19 15:30 [PATCH net-next v2 0/3] openvswitch: allow specifying ifindex of new interfaces Andrey Zhadchenko
@ 2022-08-19 15:30 ` Andrey Zhadchenko
  2022-08-22  8:06   ` Christian Brauner
  2022-08-23  1:37   ` Jakub Kicinski
  2022-08-19 15:30 ` [PATCH net-next v2 2/3] openvswitch: fix memory leak at failed datapath creation Andrey Zhadchenko
  2022-08-19 15:30 ` [PATCH net-next v2 3/3] openvswitch: add OVS_DP_ATTR_PER_CPU_PIDS to get requests Andrey Zhadchenko
  2 siblings, 2 replies; 11+ messages in thread
From: Andrey Zhadchenko @ 2022-08-19 15:30 UTC (permalink / raw)
  To: netdev
  Cc: dev, pshelar, davem, edumazet, kuba, pabeni, ptikhomirov,
	alexander.mikhalitsyn, avagin, brauner, mark.d.gray, i.maximets,
	aconole

CRIU is preserving ifindexes of net devices after restoration. However,
current Open vSwitch API does not allow to target ifindex, so we cannot
correctly restore OVS configuration.

Use ovs_header->dp_ifindex during OVS_DP_CMD_NEW as desired ifindex.
Use OVS_VPORT_ATTR_IFINDEX during OVS_VPORT_CMD_NEW to specify new netdev
ifindex.

Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
---
 include/uapi/linux/openvswitch.h     |  4 ++++
 net/openvswitch/datapath.c           | 16 ++++++++++++++--
 net/openvswitch/vport-internal_dev.c |  1 +
 net/openvswitch/vport.h              |  2 ++
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index ce3e1738d427..8ec82b63f85d 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -79,6 +79,10 @@ enum ovs_datapath_cmd {
  *
  * These attributes follow the &struct ovs_header within the Generic Netlink
  * payload for %OVS_DP_* commands.
+ *
+ * %OVS_DP_CMD_NEW requests will try to create net device with ifindex
+ * specified in dp_ifindex. With dp_infindex == 0 any available number will
+ * be chosen.
  */
 enum ovs_datapath_attr {
 	OVS_DP_ATTR_UNSPEC,
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 7e8a39a35627..8033c97a8d65 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1739,6 +1739,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	struct vport *vport;
 	struct ovs_net *ovs_net;
 	int err;
+	struct ovs_header *ovs_header = info->userhdr;
 
 	err = -EINVAL;
 	if (!a[OVS_DP_ATTR_NAME] || !a[OVS_DP_ATTR_UPCALL_PID])
@@ -1779,6 +1780,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	parms.dp = dp;
 	parms.port_no = OVSP_LOCAL;
 	parms.upcall_portids = a[OVS_DP_ATTR_UPCALL_PID];
+	parms.desired_ifindex = ovs_header->dp_ifindex;
 
 	/* So far only local changes have been made, now need the lock. */
 	ovs_lock();
@@ -2199,7 +2201,10 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	if (!a[OVS_VPORT_ATTR_NAME] || !a[OVS_VPORT_ATTR_TYPE] ||
 	    !a[OVS_VPORT_ATTR_UPCALL_PID])
 		return -EINVAL;
-	if (a[OVS_VPORT_ATTR_IFINDEX])
+
+	parms.type = nla_get_u32(a[OVS_VPORT_ATTR_TYPE]);
+
+	if (a[OVS_VPORT_ATTR_IFINDEX] && parms.type != OVS_VPORT_TYPE_INTERNAL)
 		return -EOPNOTSUPP;
 
 	port_no = a[OVS_VPORT_ATTR_PORT_NO]
@@ -2236,12 +2241,19 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	parms.name = nla_data(a[OVS_VPORT_ATTR_NAME]);
-	parms.type = nla_get_u32(a[OVS_VPORT_ATTR_TYPE]);
 	parms.options = a[OVS_VPORT_ATTR_OPTIONS];
 	parms.dp = dp;
 	parms.port_no = port_no;
 	parms.upcall_portids = a[OVS_VPORT_ATTR_UPCALL_PID];
 
+	if (parms.type == OVS_VPORT_TYPE_INTERNAL) {
+		if (a[OVS_VPORT_ATTR_IFINDEX])
+			parms.desired_ifindex =
+				nla_get_u32(a[OVS_VPORT_ATTR_IFINDEX]);
+		else
+			parms.desired_ifindex = 0;
+	}
+
 	vport = new_vport(&parms);
 	err = PTR_ERR(vport);
 	if (IS_ERR(vport)) {
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 5b2ee9c1c00b..2ab8a7e06d4b 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -154,6 +154,7 @@ static struct vport *internal_dev_create(const struct vport_parms *parms)
 	if (vport->port_no == OVSP_LOCAL)
 		vport->dev->features |= NETIF_F_NETNS_LOCAL;
 
+	dev->ifindex = parms->desired_ifindex;
 	rtnl_lock();
 	err = register_netdevice(vport->dev);
 	if (err)
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index 9de5030d9801..24e1cba2f1ac 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -98,6 +98,8 @@ struct vport_parms {
 	enum ovs_vport_type type;
 	struct nlattr *options;
 
+	int desired_ifindex;
+
 	/* For ovs_vport_alloc(). */
 	struct datapath *dp;
 	u16 port_no;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v2 2/3] openvswitch: fix memory leak at failed datapath creation
  2022-08-19 15:30 [PATCH net-next v2 0/3] openvswitch: allow specifying ifindex of new interfaces Andrey Zhadchenko
  2022-08-19 15:30 ` [PATCH net-next v2 1/3] " Andrey Zhadchenko
@ 2022-08-19 15:30 ` Andrey Zhadchenko
  2022-08-22 12:53   ` Aaron Conole
  2022-08-19 15:30 ` [PATCH net-next v2 3/3] openvswitch: add OVS_DP_ATTR_PER_CPU_PIDS to get requests Andrey Zhadchenko
  2 siblings, 1 reply; 11+ messages in thread
From: Andrey Zhadchenko @ 2022-08-19 15:30 UTC (permalink / raw)
  To: netdev
  Cc: dev, pshelar, davem, edumazet, kuba, pabeni, ptikhomirov,
	alexander.mikhalitsyn, avagin, brauner, mark.d.gray, i.maximets,
	aconole

ovs_dp_cmd_new()->ovs_dp_change()->ovs_dp_set_upcall_portids()
allocates array via kmalloc.
If for some reason new_vport() fails during ovs_dp_cmd_new()
dp->upcall_portids must be freed.
Add missing kfree.

Kmemleak example:
unreferenced object 0xffff88800c382500 (size 64):
  comm "dump_state", pid 323, jiffies 4294955418 (age 104.347s)
  hex dump (first 32 bytes):
    5e c2 79 e4 1f 7a 38 c7 09 21 38 0c 80 88 ff ff  ^.y..z8..!8.....
    03 00 00 00 0a 00 00 00 14 00 00 00 28 00 00 00  ............(...
  backtrace:
    [<0000000071bebc9f>] ovs_dp_set_upcall_portids+0x38/0xa0
    [<000000000187d8bd>] ovs_dp_change+0x63/0xe0
    [<000000002397e446>] ovs_dp_cmd_new+0x1f0/0x380
    [<00000000aa06f36e>] genl_family_rcv_msg_doit+0xea/0x150
    [<000000008f583bc4>] genl_rcv_msg+0xdc/0x1e0
    [<00000000fa10e377>] netlink_rcv_skb+0x50/0x100
    [<000000004959cece>] genl_rcv+0x24/0x40
    [<000000004699ac7f>] netlink_unicast+0x23e/0x360
    [<00000000c153573e>] netlink_sendmsg+0x24e/0x4b0
    [<000000006f4aa380>] sock_sendmsg+0x62/0x70
    [<00000000d0068654>] ____sys_sendmsg+0x230/0x270
    [<0000000012dacf7d>] ___sys_sendmsg+0x88/0xd0
    [<0000000011776020>] __sys_sendmsg+0x59/0xa0
    [<000000002e8f2dc1>] do_syscall_64+0x3b/0x90
    [<000000003243e7cb>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: b83d23a2a38b ("openvswitch: Introduce per-cpu upcall dispatch")
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
---
 net/openvswitch/datapath.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 8033c97a8d65..20c9964b74cc 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1804,7 +1804,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 				ovs_dp_reset_user_features(skb, info);
 		}
 
-		goto err_unlock_and_destroy_meters;
+		goto err_destroy_pids;
 	}
 
 	err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
@@ -1819,6 +1819,9 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	ovs_notify(&dp_datapath_genl_family, reply, info);
 	return 0;
 
+err_destroy_pids:
+	if (rcu_dereference_raw(dp->upcall_portids))
+		kfree(rcu_dereference_raw(dp->upcall_portids));
 err_unlock_and_destroy_meters:
 	ovs_unlock();
 	ovs_meters_exit(dp);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v2 3/3] openvswitch: add OVS_DP_ATTR_PER_CPU_PIDS to get requests
  2022-08-19 15:30 [PATCH net-next v2 0/3] openvswitch: allow specifying ifindex of new interfaces Andrey Zhadchenko
  2022-08-19 15:30 ` [PATCH net-next v2 1/3] " Andrey Zhadchenko
  2022-08-19 15:30 ` [PATCH net-next v2 2/3] openvswitch: fix memory leak at failed datapath creation Andrey Zhadchenko
@ 2022-08-19 15:30 ` Andrey Zhadchenko
  2022-08-23  1:41   ` Jakub Kicinski
  2 siblings, 1 reply; 11+ messages in thread
From: Andrey Zhadchenko @ 2022-08-19 15:30 UTC (permalink / raw)
  To: netdev
  Cc: dev, pshelar, davem, edumazet, kuba, pabeni, ptikhomirov,
	alexander.mikhalitsyn, avagin, brauner, mark.d.gray, i.maximets,
	aconole

CRIU needs OVS_DP_ATTR_PER_CPU_PIDS to checkpoint/restore newest
openvswitch versions.
Add pids to generic datapath reply. Adjust reply allocation to reserve
memory for pids and move it under ovs_lock() to ensure than number of
pids is unchanged while we adding them to nlmsg.

Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
---
 net/openvswitch/datapath.c | 88 +++++++++++++++++++++++---------------
 1 file changed, 53 insertions(+), 35 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 20c9964b74cc..865c848a041b 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1506,9 +1506,11 @@ static struct genl_family dp_flow_genl_family __ro_after_init = {
 	.module = THIS_MODULE,
 };
 
-static size_t ovs_dp_cmd_msg_size(void)
+static size_t ovs_dp_cmd_msg_size(struct datapath *dp)
 {
 	size_t msgsize = NLMSG_ALIGN(sizeof(struct ovs_header));
+	struct dp_nlsk_pids *pids = ovsl_dereference(dp->upcall_portids);
+
 
 	msgsize += nla_total_size(IFNAMSIZ);
 	msgsize += nla_total_size_64bit(sizeof(struct ovs_dp_stats));
@@ -1516,6 +1518,9 @@ static size_t ovs_dp_cmd_msg_size(void)
 	msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_USER_FEATURES */
 	msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_MASKS_CACHE_SIZE */
 
+	if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU && pids)
+		msgsize += nla_total_size_64bit(sizeof(u32) * pids->n_pids);
+
 	return msgsize;
 }
 
@@ -1527,6 +1532,7 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb,
 	struct ovs_dp_stats dp_stats;
 	struct ovs_dp_megaflow_stats dp_megaflow_stats;
 	int err;
+	struct dp_nlsk_pids *pids = ovsl_dereference(dp->upcall_portids);
 
 	ovs_header = genlmsg_put(skb, portid, seq, &dp_datapath_genl_family,
 				 flags, cmd);
@@ -1556,6 +1562,11 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb,
 			ovs_flow_tbl_masks_cache_size(&dp->table)))
 		goto nla_put_failure;
 
+	if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU && pids &&
+	    nla_put_64bit(skb, OVS_DP_ATTR_PER_CPU_PIDS, sizeof(u32) * pids->n_pids,
+			  &pids->pids, OVS_DP_ATTR_PAD))
+		goto nla_put_failure;
+
 	genlmsg_end(skb, ovs_header);
 	return 0;
 
@@ -1565,9 +1576,9 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb,
 	return -EMSGSIZE;
 }
 
-static struct sk_buff *ovs_dp_cmd_alloc_info(void)
+static struct sk_buff *ovs_dp_cmd_alloc_info(struct datapath *dp)
 {
-	return genlmsg_new(ovs_dp_cmd_msg_size(), GFP_KERNEL);
+	return genlmsg_new(ovs_dp_cmd_msg_size(dp), GFP_KERNEL);
 }
 
 /* Called with rcu_read_lock or ovs_mutex. */
@@ -1745,14 +1756,9 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	if (!a[OVS_DP_ATTR_NAME] || !a[OVS_DP_ATTR_UPCALL_PID])
 		goto err;
 
-	reply = ovs_dp_cmd_alloc_info();
-	if (!reply)
-		return -ENOMEM;
-
-	err = -ENOMEM;
 	dp = kzalloc(sizeof(*dp), GFP_KERNEL);
 	if (dp == NULL)
-		goto err_destroy_reply;
+		return -ENOMEM;
 
 	ovs_dp_set_net(dp, sock_net(skb->sk));
 
@@ -1785,9 +1791,15 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	/* So far only local changes have been made, now need the lock. */
 	ovs_lock();
 
-	err = ovs_dp_change(dp, a);
-	if (err)
+	reply = ovs_dp_cmd_alloc_info(dp);
+	if (!reply) {
+		err = -ENOMEM;
 		goto err_unlock_and_destroy_meters;
+	}
+
+	err = ovs_dp_change(dp, a);
+	if (err)
+		goto err_destroy_reply;
 
 	vport = new_vport(&parms);
 	if (IS_ERR(vport)) {
@@ -1822,6 +1834,8 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 err_destroy_pids:
 	if (rcu_dereference_raw(dp->upcall_portids))
 		kfree(rcu_dereference_raw(dp->upcall_portids));
+err_destroy_reply:
+	kfree_skb(reply);
 err_unlock_and_destroy_meters:
 	ovs_unlock();
 	ovs_meters_exit(dp);
@@ -1833,8 +1847,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	ovs_flow_tbl_destroy(&dp->table);
 err_destroy_dp:
 	kfree(dp);
-err_destroy_reply:
-	kfree_skb(reply);
 err:
 	return err;
 }
@@ -1881,15 +1893,17 @@ static int ovs_dp_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	struct datapath *dp;
 	int err;
 
-	reply = ovs_dp_cmd_alloc_info();
-	if (!reply)
-		return -ENOMEM;
-
 	ovs_lock();
 	dp = lookup_datapath(sock_net(skb->sk), info->userhdr, info->attrs);
 	err = PTR_ERR(dp);
 	if (IS_ERR(dp))
-		goto err_unlock_free;
+		goto err_unlock;
+
+	reply = ovs_dp_cmd_alloc_info(dp);
+	if (!reply) {
+		err = -ENOMEM;
+		goto err_unlock;
+	}
 
 	err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
 				   info->snd_seq, 0, OVS_DP_CMD_DEL);
@@ -1902,9 +1916,8 @@ static int ovs_dp_cmd_del(struct sk_buff *skb, struct genl_info *info)
 
 	return 0;
 
-err_unlock_free:
+err_unlock:
 	ovs_unlock();
-	kfree_skb(reply);
 	return err;
 }
 
@@ -1914,19 +1927,21 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	struct datapath *dp;
 	int err;
 
-	reply = ovs_dp_cmd_alloc_info();
-	if (!reply)
-		return -ENOMEM;
-
 	ovs_lock();
 	dp = lookup_datapath(sock_net(skb->sk), info->userhdr, info->attrs);
 	err = PTR_ERR(dp);
 	if (IS_ERR(dp))
-		goto err_unlock_free;
+		goto err_unlock;
+
+	reply = ovs_dp_cmd_alloc_info(dp);
+	if (!reply) {
+		err = -ENOMEM;
+		goto err_unlock;
+	}
 
 	err = ovs_dp_change(dp, info->attrs);
 	if (err)
-		goto err_unlock_free;
+		goto err_free;
 
 	err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
 				   info->snd_seq, 0, OVS_DP_CMD_SET);
@@ -1937,9 +1952,10 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
 
 	return 0;
 
-err_unlock_free:
-	ovs_unlock();
+err_free:
 	kfree_skb(reply);
+err_unlock:
+	ovs_unlock();
 	return err;
 }
 
@@ -1949,16 +1965,19 @@ static int ovs_dp_cmd_get(struct sk_buff *skb, struct genl_info *info)
 	struct datapath *dp;
 	int err;
 
-	reply = ovs_dp_cmd_alloc_info();
-	if (!reply)
-		return -ENOMEM;
-
 	ovs_lock();
 	dp = lookup_datapath(sock_net(skb->sk), info->userhdr, info->attrs);
 	if (IS_ERR(dp)) {
 		err = PTR_ERR(dp);
-		goto err_unlock_free;
+		goto err_unlock;
 	}
+
+	reply = ovs_dp_cmd_alloc_info(dp);
+	if (!reply) {
+		err = -ENOMEM;
+		goto err_unlock;
+	}
+
 	err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
 				   info->snd_seq, 0, OVS_DP_CMD_GET);
 	BUG_ON(err < 0);
@@ -1966,9 +1985,8 @@ static int ovs_dp_cmd_get(struct sk_buff *skb, struct genl_info *info)
 
 	return genlmsg_reply(reply, info);
 
-err_unlock_free:
+err_unlock:
 	ovs_unlock();
-	kfree_skb(reply);
 	return err;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 1/3] openvswitch: allow specifying ifindex of new interfaces
  2022-08-19 15:30 ` [PATCH net-next v2 1/3] " Andrey Zhadchenko
@ 2022-08-22  8:06   ` Christian Brauner
  2022-08-23  1:37   ` Jakub Kicinski
  1 sibling, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2022-08-22  8:06 UTC (permalink / raw)
  To: Andrey Zhadchenko
  Cc: netdev, dev, pshelar, davem, edumazet, kuba, pabeni, ptikhomirov,
	alexander.mikhalitsyn, avagin, mark.d.gray, i.maximets, aconole

On Fri, Aug 19, 2022 at 06:30:42PM +0300, Andrey Zhadchenko wrote:
> CRIU is preserving ifindexes of net devices after restoration. However,
> current Open vSwitch API does not allow to target ifindex, so we cannot
> correctly restore OVS configuration.
> 
> Use ovs_header->dp_ifindex during OVS_DP_CMD_NEW as desired ifindex.
> Use OVS_VPORT_ATTR_IFINDEX during OVS_VPORT_CMD_NEW to specify new netdev
> ifindex.
> 
> Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
> ---

Looks good to me,
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 2/3] openvswitch: fix memory leak at failed datapath creation
  2022-08-19 15:30 ` [PATCH net-next v2 2/3] openvswitch: fix memory leak at failed datapath creation Andrey Zhadchenko
@ 2022-08-22 12:53   ` Aaron Conole
  2022-08-23  1:38     ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Aaron Conole @ 2022-08-22 12:53 UTC (permalink / raw)
  To: Andrey Zhadchenko
  Cc: netdev, dev, pshelar, davem, edumazet, kuba, pabeni, ptikhomirov,
	alexander.mikhalitsyn, avagin, brauner, mark.d.gray, i.maximets

Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> writes:

> ovs_dp_cmd_new()->ovs_dp_change()->ovs_dp_set_upcall_portids()
> allocates array via kmalloc.
> If for some reason new_vport() fails during ovs_dp_cmd_new()
> dp->upcall_portids must be freed.
> Add missing kfree.
>
> Kmemleak example:
> unreferenced object 0xffff88800c382500 (size 64):
>   comm "dump_state", pid 323, jiffies 4294955418 (age 104.347s)
>   hex dump (first 32 bytes):
>     5e c2 79 e4 1f 7a 38 c7 09 21 38 0c 80 88 ff ff  ^.y..z8..!8.....
>     03 00 00 00 0a 00 00 00 14 00 00 00 28 00 00 00  ............(...
>   backtrace:
>     [<0000000071bebc9f>] ovs_dp_set_upcall_portids+0x38/0xa0
>     [<000000000187d8bd>] ovs_dp_change+0x63/0xe0
>     [<000000002397e446>] ovs_dp_cmd_new+0x1f0/0x380
>     [<00000000aa06f36e>] genl_family_rcv_msg_doit+0xea/0x150
>     [<000000008f583bc4>] genl_rcv_msg+0xdc/0x1e0
>     [<00000000fa10e377>] netlink_rcv_skb+0x50/0x100
>     [<000000004959cece>] genl_rcv+0x24/0x40
>     [<000000004699ac7f>] netlink_unicast+0x23e/0x360
>     [<00000000c153573e>] netlink_sendmsg+0x24e/0x4b0
>     [<000000006f4aa380>] sock_sendmsg+0x62/0x70
>     [<00000000d0068654>] ____sys_sendmsg+0x230/0x270
>     [<0000000012dacf7d>] ___sys_sendmsg+0x88/0xd0
>     [<0000000011776020>] __sys_sendmsg+0x59/0xa0
>     [<000000002e8f2dc1>] do_syscall_64+0x3b/0x90
>     [<000000003243e7cb>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> Fixes: b83d23a2a38b ("openvswitch: Introduce per-cpu upcall dispatch")
> Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
> ---

Thanks for this patch.  I guess independent of this series, this patch
should be applied to the net tree as well - it fixes an existing issue.

Acked-by: Aaron Conole <aconole@redhat.com>

I will try as well the other patches in the series.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 1/3] openvswitch: allow specifying ifindex of new interfaces
  2022-08-19 15:30 ` [PATCH net-next v2 1/3] " Andrey Zhadchenko
  2022-08-22  8:06   ` Christian Brauner
@ 2022-08-23  1:37   ` Jakub Kicinski
  2022-08-23 13:50     ` Andrey Zhadchenko
  1 sibling, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2022-08-23  1:37 UTC (permalink / raw)
  To: Andrey Zhadchenko
  Cc: netdev, dev, pshelar, davem, edumazet, pabeni, ptikhomirov,
	alexander.mikhalitsyn, avagin, brauner, mark.d.gray, i.maximets,
	aconole

On Fri, 19 Aug 2022 18:30:42 +0300 Andrey Zhadchenko wrote:
> CRIU is preserving ifindexes of net devices after restoration. However,
> current Open vSwitch API does not allow to target ifindex, so we cannot
> correctly restore OVS configuration.
> 
> Use ovs_header->dp_ifindex during OVS_DP_CMD_NEW as desired ifindex.
> Use OVS_VPORT_ATTR_IFINDEX during OVS_VPORT_CMD_NEW to specify new netdev
> ifindex.

> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -1739,6 +1739,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
>  	struct vport *vport;
>  	struct ovs_net *ovs_net;
>  	int err;
> +	struct ovs_header *ovs_header = info->userhdr;
>  
>  	err = -EINVAL;
>  	if (!a[OVS_DP_ATTR_NAME] || !a[OVS_DP_ATTR_UPCALL_PID])
> @@ -1779,6 +1780,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
>  	parms.dp = dp;
>  	parms.port_no = OVSP_LOCAL;
>  	parms.upcall_portids = a[OVS_DP_ATTR_UPCALL_PID];
> +	parms.desired_ifindex = ovs_header->dp_ifindex;

Are you 100% sure that all user space making this call initializes
dp_ifindex to 0? There is no validation in the kernel today that 
the field is not garbage as far as I can tell.

If you are sure, please add the appropriate analysis to the commit msg.

>  	/* So far only local changes have been made, now need the lock. */
>  	ovs_lock();
> @@ -2199,7 +2201,10 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct genl_info *info)
>  	if (!a[OVS_VPORT_ATTR_NAME] || !a[OVS_VPORT_ATTR_TYPE] ||
>  	    !a[OVS_VPORT_ATTR_UPCALL_PID])
>  		return -EINVAL;
> -	if (a[OVS_VPORT_ATTR_IFINDEX])
> +
> +	parms.type = nla_get_u32(a[OVS_VPORT_ATTR_TYPE]);
> +
> +	if (a[OVS_VPORT_ATTR_IFINDEX] && parms.type != OVS_VPORT_TYPE_INTERNAL)
>  		return -EOPNOTSUPP;
>  
>  	port_no = a[OVS_VPORT_ATTR_PORT_NO]
> @@ -2236,12 +2241,19 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct genl_info *info)
>  	}
>  
>  	parms.name = nla_data(a[OVS_VPORT_ATTR_NAME]);
> -	parms.type = nla_get_u32(a[OVS_VPORT_ATTR_TYPE]);
>  	parms.options = a[OVS_VPORT_ATTR_OPTIONS];
>  	parms.dp = dp;
>  	parms.port_no = port_no;
>  	parms.upcall_portids = a[OVS_VPORT_ATTR_UPCALL_PID];
>  
> +	if (parms.type == OVS_VPORT_TYPE_INTERNAL) {
> +		if (a[OVS_VPORT_ATTR_IFINDEX])

You already validated that type must be internal for ifindex 
to be specified, so the outer if is unnecessary.

It's pretty common in netlink handling code to validate first
then act assuming validation has passed.

> +			parms.desired_ifindex =
> +				nla_get_u32(a[OVS_VPORT_ATTR_IFINDEX]);
> +		else
> +			parms.desired_ifindex = 0;
> +	}
> +
>  	vport = new_vport(&parms);
>  	err = PTR_ERR(vport);
>  	if (IS_ERR(vport)) {

> diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
> index 9de5030d9801..24e1cba2f1ac 100644
> --- a/net/openvswitch/vport.h
> +++ b/net/openvswitch/vport.h
> @@ -98,6 +98,8 @@ struct vport_parms {
>  	enum ovs_vport_type type;
>  	struct nlattr *options;
>  
> +	int desired_ifindex;

Any chance this field would make sense somewhere else? you're adding 
a 4B field between two pointers it will result in a padding.

Also you're missing kdoc for this field.

>  	/* For ovs_vport_alloc(). */
>  	struct datapath *dp;
>  	u16 port_no;


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 2/3] openvswitch: fix memory leak at failed datapath creation
  2022-08-22 12:53   ` Aaron Conole
@ 2022-08-23  1:38     ` Jakub Kicinski
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2022-08-23  1:38 UTC (permalink / raw)
  To: Aaron Conole
  Cc: Andrey Zhadchenko, netdev, dev, pshelar, davem, edumazet, pabeni,
	ptikhomirov, alexander.mikhalitsyn, avagin, brauner, mark.d.gray,
	i.maximets

On Mon, 22 Aug 2022 08:53:44 -0400 Aaron Conole wrote:
> Thanks for this patch.  I guess independent of this series, this patch
> should be applied to the net tree as well - it fixes an existing issue.

Yes, please, this needs to be reposted separately as [PATCH net].

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 3/3] openvswitch: add OVS_DP_ATTR_PER_CPU_PIDS to get requests
  2022-08-19 15:30 ` [PATCH net-next v2 3/3] openvswitch: add OVS_DP_ATTR_PER_CPU_PIDS to get requests Andrey Zhadchenko
@ 2022-08-23  1:41   ` Jakub Kicinski
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2022-08-23  1:41 UTC (permalink / raw)
  To: Andrey Zhadchenko
  Cc: netdev, dev, pshelar, davem, edumazet, pabeni, ptikhomirov,
	alexander.mikhalitsyn, avagin, brauner, mark.d.gray, i.maximets,
	aconole

On Fri, 19 Aug 2022 18:30:44 +0300 Andrey Zhadchenko wrote:
> -static size_t ovs_dp_cmd_msg_size(void)
> +static size_t ovs_dp_cmd_msg_size(struct datapath *dp)
>  {
>  	size_t msgsize = NLMSG_ALIGN(sizeof(struct ovs_header));
> +	struct dp_nlsk_pids *pids = ovsl_dereference(dp->upcall_portids);
> +
>  

double new line

>  	msgsize += nla_total_size(IFNAMSIZ);
>  	msgsize += nla_total_size_64bit(sizeof(struct ovs_dp_stats));
> @@ -1516,6 +1518,9 @@ static size_t ovs_dp_cmd_msg_size(void)
>  	msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_USER_FEATURES */
>  	msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_MASKS_CACHE_SIZE */
>  
> +	if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU && pids)
> +		msgsize += nla_total_size_64bit(sizeof(u32) * pids->n_pids);

Can we make a safe over estimation here, like nr_cpu_ids maybe?
Would that be too large? It's fairly common to overestimate the
netlink message allocation.

Also why 64bit if the value is in u32 units?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 1/3] openvswitch: allow specifying ifindex of new interfaces
  2022-08-23  1:37   ` Jakub Kicinski
@ 2022-08-23 13:50     ` Andrey Zhadchenko
  2022-08-23 19:15       ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Andrey Zhadchenko @ 2022-08-23 13:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, dev, pshelar, davem, edumazet, pabeni, ptikhomirov,
	alexander.mikhalitsyn, avagin, brauner, mark.d.gray, i.maximets,
	aconole

Thanks for the review!

On 8/23/22 04:37, Jakub Kicinski wrote:
> On Fri, 19 Aug 2022 18:30:42 +0300 Andrey Zhadchenko wrote:
>> CRIU is preserving ifindexes of net devices after restoration. However,
>> current Open vSwitch API does not allow to target ifindex, so we cannot
>> correctly restore OVS configuration.
>>
>> Use ovs_header->dp_ifindex during OVS_DP_CMD_NEW as desired ifindex.
>> Use OVS_VPORT_ATTR_IFINDEX during OVS_VPORT_CMD_NEW to specify new netdev
>> ifindex.
> 
>> --- a/net/openvswitch/datapath.c
>> +++ b/net/openvswitch/datapath.c
>> @@ -1739,6 +1739,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
>>   	struct vport *vport;
>>   	struct ovs_net *ovs_net;
>>   	int err;
>> +	struct ovs_header *ovs_header = info->userhdr;
>>   
>>   	err = -EINVAL;
>>   	if (!a[OVS_DP_ATTR_NAME] || !a[OVS_DP_ATTR_UPCALL_PID])
>> @@ -1779,6 +1780,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
>>   	parms.dp = dp;
>>   	parms.port_no = OVSP_LOCAL;
>>   	parms.upcall_portids = a[OVS_DP_ATTR_UPCALL_PID];
>> +	parms.desired_ifindex = ovs_header->dp_ifindex;
> 
> Are you 100% sure that all user space making this call initializes
> dp_ifindex to 0? There is no validation in the kernel today that
> the field is not garbage as far as I can tell.
> 
> If you are sure, please add the appropriate analysis to the commit msg.

I am not sure that *all* users set dp_ifindex to zero. At least I do not 
know about them. Primary ovs userspace ovs-vswitchd certainly set to 
zero, others like WeaveNet do it too. But there can be more?
What is the policy regarding this? I can add a new attribute, it is not 
hard.

> 
>>   	/* So far only local changes have been made, now need the lock. */
>>   	ovs_lock();
>> @@ -2199,7 +2201,10 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct genl_info *info)
>>   	if (!a[OVS_VPORT_ATTR_NAME] || !a[OVS_VPORT_ATTR_TYPE] ||
>>   	    !a[OVS_VPORT_ATTR_UPCALL_PID])
>>   		return -EINVAL;
>> -	if (a[OVS_VPORT_ATTR_IFINDEX])
>> +
>> +	parms.type = nla_get_u32(a[OVS_VPORT_ATTR_TYPE]);
>> +
>> +	if (a[OVS_VPORT_ATTR_IFINDEX] && parms.type != OVS_VPORT_TYPE_INTERNAL)
>>   		return -EOPNOTSUPP;
>>   
>>   	port_no = a[OVS_VPORT_ATTR_PORT_NO]
>> @@ -2236,12 +2241,19 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct genl_info *info)
>>   	}
>>   
>>   	parms.name = nla_data(a[OVS_VPORT_ATTR_NAME]);
>> -	parms.type = nla_get_u32(a[OVS_VPORT_ATTR_TYPE]);
>>   	parms.options = a[OVS_VPORT_ATTR_OPTIONS];
>>   	parms.dp = dp;
>>   	parms.port_no = port_no;
>>   	parms.upcall_portids = a[OVS_VPORT_ATTR_UPCALL_PID];
>>   
>> +	if (parms.type == OVS_VPORT_TYPE_INTERNAL) {
>> +		if (a[OVS_VPORT_ATTR_IFINDEX])
> 
> You already validated that type must be internal for ifindex
> to be specified, so the outer if is unnecessary.
> 
> It's pretty common in netlink handling code to validate first
> then act assuming validation has passed.
> 
>> +			parms.desired_ifindex =
>> +				nla_get_u32(a[OVS_VPORT_ATTR_IFINDEX]);
>> +		else
>> +			parms.desired_ifindex = 0;
>> +	}
>> +
>>   	vport = new_vport(&parms);
>>   	err = PTR_ERR(vport);
>>   	if (IS_ERR(vport)) {
> 
>> diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
>> index 9de5030d9801..24e1cba2f1ac 100644
>> --- a/net/openvswitch/vport.h
>> +++ b/net/openvswitch/vport.h
>> @@ -98,6 +98,8 @@ struct vport_parms {
>>   	enum ovs_vport_type type;
>>   	struct nlattr *options;
>>   
>> +	int desired_ifindex;
> 
> Any chance this field would make sense somewhere else? you're adding
> a 4B field between two pointers it will result in a padding.
> 
> Also you're missing kdoc for this field.

Sure, will fix all the above in next version.

> 
>>   	/* For ovs_vport_alloc(). */
>>   	struct datapath *dp;
>>   	u16 port_no;
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 1/3] openvswitch: allow specifying ifindex of new interfaces
  2022-08-23 13:50     ` Andrey Zhadchenko
@ 2022-08-23 19:15       ` Jakub Kicinski
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2022-08-23 19:15 UTC (permalink / raw)
  To: Andrey Zhadchenko
  Cc: netdev, dev, pshelar, davem, edumazet, pabeni, ptikhomirov,
	alexander.mikhalitsyn, avagin, brauner, mark.d.gray, i.maximets,
	aconole

On Tue, 23 Aug 2022 16:50:21 +0300 Andrey Zhadchenko wrote:
> > Are you 100% sure that all user space making this call initializes
> > dp_ifindex to 0? There is no validation in the kernel today that
> > the field is not garbage as far as I can tell.
> > 
> > If you are sure, please add the appropriate analysis to the commit msg.  
> 
> I am not sure that *all* users set dp_ifindex to zero. At least I do not 
> know about them. Primary ovs userspace ovs-vswitchd certainly set to 
> zero, others like WeaveNet do it too. But there can be more?
> What is the policy regarding this? I can add a new attribute, it is not 
> hard.

IDK how many user space components driving OvS exist out there.
We could risk it and reuse the field, but if we get it wrong 
and don't catch the regression before the final release is cut 
the result gets _really_ ugly. We will have _some_ user space
out there expecting us to use ifindex and _some_ which expects
it can pass garbage, so we can neither revert not leave it...
If that makes sense.

If using / adding attribute is not a hassle that'd be my vote.

Using fixed structs like struct ovs_header is strongly discouraged
for new families exactly because of situations like this :(

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-08-23 20:02 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-19 15:30 [PATCH net-next v2 0/3] openvswitch: allow specifying ifindex of new interfaces Andrey Zhadchenko
2022-08-19 15:30 ` [PATCH net-next v2 1/3] " Andrey Zhadchenko
2022-08-22  8:06   ` Christian Brauner
2022-08-23  1:37   ` Jakub Kicinski
2022-08-23 13:50     ` Andrey Zhadchenko
2022-08-23 19:15       ` Jakub Kicinski
2022-08-19 15:30 ` [PATCH net-next v2 2/3] openvswitch: fix memory leak at failed datapath creation Andrey Zhadchenko
2022-08-22 12:53   ` Aaron Conole
2022-08-23  1:38     ` Jakub Kicinski
2022-08-19 15:30 ` [PATCH net-next v2 3/3] openvswitch: add OVS_DP_ATTR_PER_CPU_PIDS to get requests Andrey Zhadchenko
2022-08-23  1:41   ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).