All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] support for 256 VFs in RTM_GETLINK
@ 2021-01-23  4:53 Edwin Peer
  2021-01-23  4:53 ` [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end() Edwin Peer
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Edwin Peer @ 2021-01-23  4:53 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, Jakub Kicinski, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek, David Ahern

[-- Attachment #1: Type: text/plain, Size: 1658 bytes --]

RTM_GETLINK for greater than about 220 VFs truncates IFLA_VFINFO_LIST
due to the maximum reach of nlattr's nla_len being exceeded. There is
not a lot of enthusiasm for extensive fixes to the deprecated netlink
ABI for VF config, but there appears to be even less appetite for the
kinds of work arounds that would be necessitated in order to truly
keep it frozen [1].

This series first addresses nla_nest_end()'s propensity to generate
malformed netlink messages. The balance of the series comprises very
minor ABI updates intended to be low impact, in order to address the
remaining issues. First, the existing RTEXT_FILTER_SKIP_STATS is
called upon to alleviate the problem when user space does not want
statistics and then a minor tweak is introduced in two steps in order
to promote the stats one level up in the hierarchy with the minimum
of code churn.

Finally, the kernel series is followed by an iproute2 series to take
advantage of the changes.

[1] https://lore.kernel.org/netdev/20210115225950.18762-1-edwin.peer@broadcom.com/

Edwin Peer (4):
  netlink: truncate overlength attribute list in nla_nest_end()
  rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO
  rtnetlink: refactor IFLA_VF_INFO stats into rtnl_fill_vfstats()
  rtnetlink: promote IFLA_VF_STATS to same level as IFLA_VF_INFO

 include/net/netlink.h          |  11 ++-
 include/uapi/linux/if_link.h   |   1 +
 include/uapi/linux/netlink.h   |   1 +
 include/uapi/linux/rtnetlink.h |   1 +
 lib/nlattr.c                   |  27 +++++++
 net/core/rtnetlink.c           | 132 +++++++++++++++++++++------------
 6 files changed, 122 insertions(+), 51 deletions(-)

-- 
2.30.0


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4160 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2021-01-23  4:53 [PATCH net-next 0/4] support for 256 VFs in RTM_GETLINK Edwin Peer
@ 2021-01-23  4:53 ` Edwin Peer
  2021-01-23 19:14   ` David Ahern
  2021-01-26  1:43   ` Jakub Kicinski
  2021-01-23  4:53 ` [PATCH net-next 2/4] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO Edwin Peer
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 25+ messages in thread
From: Edwin Peer @ 2021-01-23  4:53 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, Jakub Kicinski, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek, David Ahern

[-- Attachment #1: Type: text/plain, Size: 4950 bytes --]

If a nested list of attributes is too long, then the length will
exceed the 16-bit nla_len of the parent nlattr. In such cases,
determine how many whole attributes can fit and truncate the
message to this length. This properly maintains the nesting
hierarchy, keeping the entire message valid, while fitting more
subelements inside the nest range than may result if the length
is wrapped modulo 64KB.

Marking truncated attributes, such that user space can determine
the precise attribute truncated, by means of an additional bit in
the nla_type was considered and rejected. The NLA_F_NESTED and
NLA_F_NET_BYTEORDER flags are supposed to be mutually exclusive.
So, in theory, the latter bit could have been redefined for nested
attributes in order to indicate truncation, but user space tools
(most notably iproute2) cannot be relied on to honor NLA_TYPE_MASK,
resulting in alteration of the perceived nla_type and subsequent
catastrophic failure.

Failing the entire message with a hard error must also be rejected,
as this would break existing user space functionality. The trigger
issue is evident for IFLA_VFINFO_LIST and a hard error here would
cause iproute2 to fail to render an entire interface list even if
only a single interface warranted a truncated VF list. Instead, set
NLM_F_NEST_TRUNCATED in the netlink header to inform user space
about the incomplete data. In this particular case, however, user
space can better ascertain which instance is truncated by consulting
the associated IFLA_NUM_VF to determine how many VFs were expected.

Fixes: bfa83a9e03cf ("[NETLINK]: Type-safe netlink messages/attributes interface")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 include/net/netlink.h        | 11 +++++++++--
 include/uapi/linux/netlink.h |  1 +
 lib/nlattr.c                 | 27 +++++++++++++++++++++++++++
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index 1ceec518ab49..fc8c57dafb05 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -1785,19 +1785,26 @@ static inline struct nlattr *nla_nest_start(struct sk_buff *skb, int attrtype)
 	return nla_nest_start_noflag(skb, attrtype | NLA_F_NESTED);
 }
 
+int __nla_nest_trunc_msg(struct sk_buff *skb, const struct nlattr *start);
+
 /**
  * nla_nest_end - Finalize nesting of attributes
  * @skb: socket buffer the attributes are stored in
  * @start: container attribute
  *
  * Corrects the container attribute header to include the all
- * appeneded attributes.
+ * appeneded attributes. The list of attributes will be truncated
+ * if too long to fit within the parent attribute's maximum reach.
  *
  * Returns the total data length of the skb.
  */
 static inline int nla_nest_end(struct sk_buff *skb, struct nlattr *start)
 {
-	start->nla_len = skb_tail_pointer(skb) - (unsigned char *)start;
+	int len = skb_tail_pointer(skb) - (unsigned char *)start;
+
+	if (len > 0xffff)
+		len = __nla_nest_trunc_msg(skb, start);
+	start->nla_len = len;
 	return skb->len;
 }
 
diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index 3d94269bbfa8..44a250825c30 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -57,6 +57,7 @@ struct nlmsghdr {
 #define NLM_F_ECHO		0x08	/* Echo this request 		*/
 #define NLM_F_DUMP_INTR		0x10	/* Dump was inconsistent due to sequence change */
 #define NLM_F_DUMP_FILTERED	0x20	/* Dump was filtered as requested */
+#define NLM_F_NEST_TRUNCATED	0x40	/* Message contains truncated nested attribute */
 
 /* Modifiers to GET request */
 #define NLM_F_ROOT	0x100	/* specify tree	root	*/
diff --git a/lib/nlattr.c b/lib/nlattr.c
index 5b6116e81f9f..2a267c0d3e16 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -1119,4 +1119,31 @@ int nla_append(struct sk_buff *skb, int attrlen, const void *data)
 	return 0;
 }
 EXPORT_SYMBOL(nla_append);
+
+/**
+ * __nla_nest_trunc_msg - Truncate list of nested netlink attributes to max len
+ * @skb: socket buffer with tail pointer positioned after end of nested list
+ * @start: container attribute designating the beginning of the list
+ *
+ * Trims the skb to fit only the attributes which are within the range of the
+ * containing nest attribute. This is a helper for nla_nest_end, to prevent
+ * adding unduly to the length of what is an inline function. It is not
+ * intended to be called from anywhere else.
+ *
+ * Returns the truncated length of the enclosing nest attribute in accordance
+ * with the number of whole attributes that can fit.
+ */
+int __nla_nest_trunc_msg(struct sk_buff *skb, const struct nlattr *start)
+{
+	struct nlattr *attr = nla_data(start);
+	int rem = 0xffff - NLA_HDRLEN;
+
+	while (nla_ok(attr, rem))
+		attr = nla_next(attr, &rem);
+	nlmsg_trim(skb, attr);
+	nlmsg_hdr(skb)->nlmsg_flags |= NLM_F_NEST_TRUNCATED;
+	return (unsigned char *)attr - (unsigned char *)start;
+}
+EXPORT_SYMBOL(__nla_nest_trunc_msg);
+
 #endif
-- 
2.30.0


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4160 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next 2/4] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO
  2021-01-23  4:53 [PATCH net-next 0/4] support for 256 VFs in RTM_GETLINK Edwin Peer
  2021-01-23  4:53 ` [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end() Edwin Peer
@ 2021-01-23  4:53 ` Edwin Peer
  2021-01-26  1:55   ` Jakub Kicinski
  2021-01-23  4:53 ` [PATCH net-next 3/4] rtnetlink: refactor IFLA_VF_INFO stats into rtnl_fill_vfstats() Edwin Peer
  2021-01-23  4:53 ` [PATCH net-next 4/4] rtnetlink: promote IFLA_VF_STATS to same level as IFLA_VF_INFO Edwin Peer
  3 siblings, 1 reply; 25+ messages in thread
From: Edwin Peer @ 2021-01-23  4:53 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, Jakub Kicinski, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek, David Ahern

[-- Attachment #1: Type: text/plain, Size: 6298 bytes --]

This filter already exists for excluding IPv6 SNMP stats. Extend its
definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK.

This patch constitutes a partial fix for a netlink attribute nesting
overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the
requester doesn't need them, the truncation of the VF list is avoided.

While it was technically only the stats added in commit c5a9f6f0ab40
("net/core: Add drop counters to VF statistics") breaking the camel's
back, the appreciable size of the stats data should never have been
included without due consideration for the maximum number of VFs
supported by PCI.

Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice")
Fixes: c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 net/core/rtnetlink.c | 96 +++++++++++++++++++++++---------------------
 1 file changed, 51 insertions(+), 45 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 3d6ab194d0f5..466f920ac974 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -931,24 +931,27 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
 			 nla_total_size(sizeof(struct ifla_vf_rate)) +
 			 nla_total_size(sizeof(struct ifla_vf_link_state)) +
 			 nla_total_size(sizeof(struct ifla_vf_rss_query_en)) +
-			 nla_total_size(0) + /* nest IFLA_VF_STATS */
-			 /* IFLA_VF_STATS_RX_PACKETS */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_TX_PACKETS */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_RX_BYTES */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_TX_BYTES */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_BROADCAST */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_MULTICAST */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_RX_DROPPED */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_TX_DROPPED */
-			 nla_total_size_64bit(sizeof(__u64)) +
 			 nla_total_size(sizeof(struct ifla_vf_trust)));
+		if (~ext_filter_mask & RTEXT_FILTER_SKIP_STATS) {
+			size += num_vfs *
+				(nla_total_size(0) + /* nest IFLA_VF_STATS */
+				 /* IFLA_VF_STATS_RX_PACKETS */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_TX_PACKETS */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_RX_BYTES */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_TX_BYTES */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_BROADCAST */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_MULTICAST */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_RX_DROPPED */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_TX_DROPPED */
+				 nla_total_size_64bit(sizeof(__u64)));
+		}
 		return size;
 	} else
 		return 0;
@@ -1223,7 +1226,8 @@ static noinline_for_stack int rtnl_fill_stats(struct sk_buff *skb,
 static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 					       struct net_device *dev,
 					       int vfs_num,
-					       struct nlattr *vfinfo)
+					       struct nlattr *vfinfo,
+					       u32 ext_filter_mask)
 {
 	struct ifla_vf_rss_query_en vf_rss_query_en;
 	struct nlattr *vf, *vfstats, *vfvlanlist;
@@ -1329,33 +1333,35 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 		goto nla_put_vf_failure;
 	}
 	nla_nest_end(skb, vfvlanlist);
-	memset(&vf_stats, 0, sizeof(vf_stats));
-	if (dev->netdev_ops->ndo_get_vf_stats)
-		dev->netdev_ops->ndo_get_vf_stats(dev, vfs_num,
-						&vf_stats);
-	vfstats = nla_nest_start_noflag(skb, IFLA_VF_STATS);
-	if (!vfstats)
-		goto nla_put_vf_failure;
-	if (nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_PACKETS,
-			      vf_stats.rx_packets, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_PACKETS,
-			      vf_stats.tx_packets, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_BYTES,
-			      vf_stats.rx_bytes, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_BYTES,
-			      vf_stats.tx_bytes, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_BROADCAST,
-			      vf_stats.broadcast, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_MULTICAST,
-			      vf_stats.multicast, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_DROPPED,
-			      vf_stats.rx_dropped, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_DROPPED,
-			      vf_stats.tx_dropped, IFLA_VF_STATS_PAD)) {
-		nla_nest_cancel(skb, vfstats);
-		goto nla_put_vf_failure;
+	if (~ext_filter_mask & RTEXT_FILTER_SKIP_STATS) {
+		memset(&vf_stats, 0, sizeof(vf_stats));
+		if (dev->netdev_ops->ndo_get_vf_stats)
+			dev->netdev_ops->ndo_get_vf_stats(dev, vfs_num,
+							  &vf_stats);
+		vfstats = nla_nest_start_noflag(skb, IFLA_VF_STATS);
+		if (!vfstats)
+			goto nla_put_vf_failure;
+		if (nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_PACKETS,
+				      vf_stats.rx_packets, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_PACKETS,
+				      vf_stats.tx_packets, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_BYTES,
+				      vf_stats.rx_bytes, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_BYTES,
+				      vf_stats.tx_bytes, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_BROADCAST,
+				      vf_stats.broadcast, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_MULTICAST,
+				      vf_stats.multicast, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_DROPPED,
+				      vf_stats.rx_dropped, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_DROPPED,
+				      vf_stats.tx_dropped, IFLA_VF_STATS_PAD)) {
+			nla_nest_cancel(skb, vfstats);
+			goto nla_put_vf_failure;
+		}
+		nla_nest_end(skb, vfstats);
 	}
-	nla_nest_end(skb, vfstats);
 	nla_nest_end(skb, vf);
 	return 0;
 
@@ -1388,7 +1394,7 @@ static noinline_for_stack int rtnl_fill_vf(struct sk_buff *skb,
 		return -EMSGSIZE;
 
 	for (i = 0; i < num_vfs; i++) {
-		if (rtnl_fill_vfinfo(skb, dev, i, vfinfo))
+		if (rtnl_fill_vfinfo(skb, dev, i, vfinfo, ext_filter_mask))
 			return -EMSGSIZE;
 	}
 
-- 
2.30.0


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4160 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next 3/4] rtnetlink: refactor IFLA_VF_INFO stats into rtnl_fill_vfstats()
  2021-01-23  4:53 [PATCH net-next 0/4] support for 256 VFs in RTM_GETLINK Edwin Peer
  2021-01-23  4:53 ` [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end() Edwin Peer
  2021-01-23  4:53 ` [PATCH net-next 2/4] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO Edwin Peer
@ 2021-01-23  4:53 ` Edwin Peer
  2021-01-23  4:53 ` [PATCH net-next 4/4] rtnetlink: promote IFLA_VF_STATS to same level as IFLA_VF_INFO Edwin Peer
  3 siblings, 0 replies; 25+ messages in thread
From: Edwin Peer @ 2021-01-23  4:53 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, Jakub Kicinski, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek, David Ahern

[-- Attachment #1: Type: text/plain, Size: 4104 bytes --]

Moving VF stats into their own function will facilitate separating
them out later.

No functional change.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 net/core/rtnetlink.c | 66 +++++++++++++++++++++++++-------------------
 1 file changed, 38 insertions(+), 28 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 466f920ac974..95564fd12f24 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1223,6 +1223,42 @@ static noinline_for_stack int rtnl_fill_stats(struct sk_buff *skb,
 	return 0;
 }
 
+static noinline_for_stack int rtnl_fill_vfstats(struct sk_buff *skb,
+						struct net_device *dev,
+						int vf_num)
+{
+	struct ifla_vf_stats vf_stats;
+	struct nlattr *vfstats;
+
+	memset(&vf_stats, 0, sizeof(vf_stats));
+	if (dev->netdev_ops->ndo_get_vf_stats)
+		dev->netdev_ops->ndo_get_vf_stats(dev, vf_num, &vf_stats);
+	vfstats = nla_nest_start_noflag(skb, IFLA_VF_STATS);
+	if (!vfstats)
+		return -EMSGSIZE;
+	if (nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_PACKETS,
+			      vf_stats.rx_packets, IFLA_VF_STATS_PAD) ||
+	    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_PACKETS,
+			      vf_stats.tx_packets, IFLA_VF_STATS_PAD) ||
+	    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_BYTES,
+			      vf_stats.rx_bytes, IFLA_VF_STATS_PAD) ||
+	    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_BYTES,
+			      vf_stats.tx_bytes, IFLA_VF_STATS_PAD) ||
+	    nla_put_u64_64bit(skb, IFLA_VF_STATS_BROADCAST,
+			      vf_stats.broadcast, IFLA_VF_STATS_PAD) ||
+	    nla_put_u64_64bit(skb, IFLA_VF_STATS_MULTICAST,
+			      vf_stats.multicast, IFLA_VF_STATS_PAD) ||
+	    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_DROPPED,
+			      vf_stats.rx_dropped, IFLA_VF_STATS_PAD) ||
+	    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_DROPPED,
+			      vf_stats.tx_dropped, IFLA_VF_STATS_PAD)) {
+		nla_nest_cancel(skb, vfstats);
+		return -EMSGSIZE;
+	}
+	nla_nest_end(skb, vfstats);
+	return 0;
+}
+
 static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 					       struct net_device *dev,
 					       int vfs_num,
@@ -1230,12 +1266,11 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 					       u32 ext_filter_mask)
 {
 	struct ifla_vf_rss_query_en vf_rss_query_en;
-	struct nlattr *vf, *vfstats, *vfvlanlist;
+	struct nlattr *vf, *vfvlanlist;
 	struct ifla_vf_link_state vf_linkstate;
 	struct ifla_vf_vlan_info vf_vlan_info;
 	struct ifla_vf_spoofchk vf_spoofchk;
 	struct ifla_vf_tx_rate vf_tx_rate;
-	struct ifla_vf_stats vf_stats;
 	struct ifla_vf_trust vf_trust;
 	struct ifla_vf_vlan vf_vlan;
 	struct ifla_vf_rate vf_rate;
@@ -1334,33 +1369,8 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 	}
 	nla_nest_end(skb, vfvlanlist);
 	if (~ext_filter_mask & RTEXT_FILTER_SKIP_STATS) {
-		memset(&vf_stats, 0, sizeof(vf_stats));
-		if (dev->netdev_ops->ndo_get_vf_stats)
-			dev->netdev_ops->ndo_get_vf_stats(dev, vfs_num,
-							  &vf_stats);
-		vfstats = nla_nest_start_noflag(skb, IFLA_VF_STATS);
-		if (!vfstats)
-			goto nla_put_vf_failure;
-		if (nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_PACKETS,
-				      vf_stats.rx_packets, IFLA_VF_STATS_PAD) ||
-		    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_PACKETS,
-				      vf_stats.tx_packets, IFLA_VF_STATS_PAD) ||
-		    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_BYTES,
-				      vf_stats.rx_bytes, IFLA_VF_STATS_PAD) ||
-		    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_BYTES,
-				      vf_stats.tx_bytes, IFLA_VF_STATS_PAD) ||
-		    nla_put_u64_64bit(skb, IFLA_VF_STATS_BROADCAST,
-				      vf_stats.broadcast, IFLA_VF_STATS_PAD) ||
-		    nla_put_u64_64bit(skb, IFLA_VF_STATS_MULTICAST,
-				      vf_stats.multicast, IFLA_VF_STATS_PAD) ||
-		    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_DROPPED,
-				      vf_stats.rx_dropped, IFLA_VF_STATS_PAD) ||
-		    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_DROPPED,
-				      vf_stats.tx_dropped, IFLA_VF_STATS_PAD)) {
-			nla_nest_cancel(skb, vfstats);
+		if (rtnl_fill_vfstats(skb, dev, vfs_num))
 			goto nla_put_vf_failure;
-		}
-		nla_nest_end(skb, vfstats);
 	}
 	nla_nest_end(skb, vf);
 	return 0;
-- 
2.30.0


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4160 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next 4/4] rtnetlink: promote IFLA_VF_STATS to same level as IFLA_VF_INFO
  2021-01-23  4:53 [PATCH net-next 0/4] support for 256 VFs in RTM_GETLINK Edwin Peer
                   ` (2 preceding siblings ...)
  2021-01-23  4:53 ` [PATCH net-next 3/4] rtnetlink: refactor IFLA_VF_INFO stats into rtnl_fill_vfstats() Edwin Peer
@ 2021-01-23  4:53 ` Edwin Peer
  2021-01-26  2:01   ` Jakub Kicinski
  3 siblings, 1 reply; 25+ messages in thread
From: Edwin Peer @ 2021-01-23  4:53 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, Jakub Kicinski, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek, David Ahern

[-- Attachment #1: Type: text/plain, Size: 3636 bytes --]

Separating the VF stats out of IFLA_VF_INFO appears to be the least
impact way of resolving the nlattr overflow bug in IFLA_VFINFO_LIST.

Since changing the hierarchy does constitute an ABI change, it must
be explicitly requested via RTEXT_FILTER_VF_SEPARATE_STATS. Otherwise,
the old location is maintained for compatibility.

A new container type, namely IFLA_VFSTATS_LIST, is introduced to group
the stats objects into an ordered list that corresponds with the order
of VFs in IFLA_VFINFO_LIST.

Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice")
Fixes: c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 include/uapi/linux/if_link.h   |  1 +
 include/uapi/linux/rtnetlink.h |  1 +
 net/core/rtnetlink.c           | 24 +++++++++++++++++++++---
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 2bd0d8bbcdb2..db12ffd2bffd 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -341,6 +341,7 @@ enum {
 	IFLA_ALT_IFNAME, /* Alternative ifname */
 	IFLA_PERM_ADDRESS,
 	IFLA_PROTO_DOWN_REASON,
+	IFLA_VFSTATS_LIST,
 	__IFLA_MAX
 };
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index b841caa4657e..f2f4f9b4d595 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -789,6 +789,7 @@ enum {
 #define RTEXT_FILTER_MRP	(1 << 4)
 #define RTEXT_FILTER_CFM_CONFIG	(1 << 5)
 #define RTEXT_FILTER_CFM_STATUS	(1 << 6)
+#define RTEXT_FILTER_VF_SEPARATE_STATS	(1 << 7)
 
 /* End of information exported to user level */
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 95564fd12f24..cddd3945bc11 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -933,6 +933,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
 			 nla_total_size(sizeof(struct ifla_vf_rss_query_en)) +
 			 nla_total_size(sizeof(struct ifla_vf_trust)));
 		if (~ext_filter_mask & RTEXT_FILTER_SKIP_STATS) {
+			if (ext_filter_mask & RTEXT_FILTER_VF_SEPARATE_STATS)
+				size += nla_total_size(0); /* IFLA_VFSTATS_LIST */
 			size += num_vfs *
 				(nla_total_size(0) + /* nest IFLA_VF_STATS */
 				 /* IFLA_VF_STATS_RX_PACKETS */
@@ -1368,7 +1370,8 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 		goto nla_put_vf_failure;
 	}
 	nla_nest_end(skb, vfvlanlist);
-	if (~ext_filter_mask & RTEXT_FILTER_SKIP_STATS) {
+	if (~ext_filter_mask & RTEXT_FILTER_SKIP_STATS &&
+	    ~ext_filter_mask & RTEXT_FILTER_VF_SEPARATE_STATS) {
 		if (rtnl_fill_vfstats(skb, dev, vfs_num))
 			goto nla_put_vf_failure;
 	}
@@ -1386,7 +1389,7 @@ static noinline_for_stack int rtnl_fill_vf(struct sk_buff *skb,
 					   struct net_device *dev,
 					   u32 ext_filter_mask)
 {
-	struct nlattr *vfinfo;
+	struct nlattr *vfinfo, *vfstats;
 	int i, num_vfs;
 
 	if (!dev->dev.parent || ((ext_filter_mask & RTEXT_FILTER_VF) == 0))
@@ -1407,8 +1410,23 @@ static noinline_for_stack int rtnl_fill_vf(struct sk_buff *skb,
 		if (rtnl_fill_vfinfo(skb, dev, i, vfinfo, ext_filter_mask))
 			return -EMSGSIZE;
 	}
-
 	nla_nest_end(skb, vfinfo);
+
+	if (~ext_filter_mask & RTEXT_FILTER_SKIP_STATS &&
+	    ext_filter_mask & RTEXT_FILTER_VF_SEPARATE_STATS) {
+		vfstats = nla_nest_start_noflag(skb, IFLA_VFSTATS_LIST);
+		if (!vfstats)
+			return -EMSGSIZE;
+
+		for (i = 0; i < num_vfs; i++) {
+			if (rtnl_fill_vfstats(skb, dev, i)) {
+				nla_nest_cancel(skb, vfstats);
+				return -EMSGSIZE;
+			}
+		}
+		nla_nest_end(skb, vfstats);
+	}
+
 	return 0;
 }
 
-- 
2.30.0


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4160 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2021-01-23  4:53 ` [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end() Edwin Peer
@ 2021-01-23 19:14   ` David Ahern
  2021-01-23 20:42     ` Edwin Peer
  2021-01-26  1:43   ` Jakub Kicinski
  1 sibling, 1 reply; 25+ messages in thread
From: David Ahern @ 2021-01-23 19:14 UTC (permalink / raw)
  To: Edwin Peer, netdev
  Cc: Jakub Kicinski, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek

On 1/22/21 9:53 PM, Edwin Peer wrote:
> If a nested list of attributes is too long, then the length will
> exceed the 16-bit nla_len of the parent nlattr. In such cases,
> determine how many whole attributes can fit and truncate the
> message to this length. This properly maintains the nesting
> hierarchy, keeping the entire message valid, while fitting more
> subelements inside the nest range than may result if the length
> is wrapped modulo 64KB.
> 
> Marking truncated attributes, such that user space can determine
> the precise attribute truncated, by means of an additional bit in
> the nla_type was considered and rejected. The NLA_F_NESTED and
> NLA_F_NET_BYTEORDER flags are supposed to be mutually exclusive.
> So, in theory, the latter bit could have been redefined for nested
> attributes in order to indicate truncation, but user space tools
> (most notably iproute2) cannot be relied on to honor NLA_TYPE_MASK,
> resulting in alteration of the perceived nla_type and subsequent
> catastrophic failure.
> 

Did you look at using NETLINK_CB / netlink_skb_parms to keep a running
length of nested attributes to avoid the need to trim?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2021-01-23 19:14   ` David Ahern
@ 2021-01-23 20:42     ` Edwin Peer
  2021-01-23 21:03       ` Edwin Peer
  2021-01-26  4:50       ` David Ahern
  0 siblings, 2 replies; 25+ messages in thread
From: Edwin Peer @ 2021-01-23 20:42 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, Jakub Kicinski, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek

[-- Attachment #1: Type: text/plain, Size: 2672 bytes --]

On Sat, Jan 23, 2021 at 11:14 AM David Ahern <dsahern@gmail.com> wrote:

> > Marking truncated attributes, such that user space can determine
> > the precise attribute truncated, by means of an additional bit in
> > the nla_type was considered and rejected. The NLA_F_NESTED and
> > NLA_F_NET_BYTEORDER flags are supposed to be mutually exclusive.
> > So, in theory, the latter bit could have been redefined for nested
> > attributes in order to indicate truncation, but user space tools
> > (most notably iproute2) cannot be relied on to honor NLA_TYPE_MASK,
> > resulting in alteration of the perceived nla_type and subsequent
> > catastrophic failure.
>
> Did you look at using NETLINK_CB / netlink_skb_parms to keep a running
> length of nested attributes to avoid the need to trim?

I did not, but thinking about it now, I don't think that's necessarily
the way to go. We shouldn't be concerned about the cost of iterating
over the list and trimming the skb for what should be a rare exception
path. Ideally, we want to make sure at compile time (by having correct
code) that we don't ever exceed this limit at run time. Perhaps we
should investigate static analysis approaches to prove nla_len can't
be exceeded?

Tracking the outer nest length during nla_put() would provide for
convenient error indication at the precise location where things go
wrong, but that's a fair amount of housekeeping that isn't free of
complexity and run time cost either. Instead of rarely (if ever)
undoing work, we'll always do extra work that we hardly ever need.

Then, if nla_put() can detect nesting errors, there's the issue of
what to do in the case of errors. Case in point, the IFLA_VFINFO_LIST
scenario would now require explicit error handling in the generator
logic, because we can't fail hard at that point. We would need to be
sure we propagate all possible nesting errors up to a common location
(probably where the nest ends, which is where we're dealing with the
problem in this solution), set the truncated flag and carry on (for
the same net effect the trim in nla_nest_end() has). If there are
other cases we don't know about today, they might turn from soft fails
into hard errors, breaking existing user space. Truncating the list is
the only non-obtrusive solution to any existing brokenness that is
guaranteed to not make things worse, but we can't know where we need
to do that apriori and would need to explicitly handle each case as
they come up.

Hard errors on nest overflow can only reliably work for new code. That
is, assuming it is tested to the extremes when it goes in, not after
user space comes to rely on the broken behavior.

Regards,
Edwin Peer

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4160 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2021-01-23 20:42     ` Edwin Peer
@ 2021-01-23 21:03       ` Edwin Peer
  2021-01-26  4:56         ` David Ahern
  2021-01-26  4:50       ` David Ahern
  1 sibling, 1 reply; 25+ messages in thread
From: Edwin Peer @ 2021-01-23 21:03 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, Jakub Kicinski, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek

[-- Attachment #1: Type: text/plain, Size: 1079 bytes --]

On Sat, Jan 23, 2021 at 12:42 PM Edwin Peer <edwin.peer@broadcom.com> wrote:

> Then, if nla_put() can detect nesting errors, there's the issue of
> what to do in the case of errors. Case in point, the IFLA_VFINFO_LIST
> scenario would now require explicit error handling in the generator
> logic, because we can't fail hard at that point. We would need to be
> sure we propagate all possible nesting errors up to a common location
> (probably where the nest ends, which is where we're dealing with the
> problem in this solution), set the truncated flag and carry on (for
> the same net effect the trim in nla_nest_end() has).

Also, the unwind here turns out to be just as complicated, requiring a
traversal from the start and a trim anyway, because the attribute that
triggers the overflow may be deep inside the hierarchy. We can't
simply truncate at this point. We should remove whole elements at the
uppermost level, or risk having partially filled attribute trees
rather than missing attributes at the level of the exceeded nest -
which may be worse.

Regards,
Edwin Peer

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4160 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2021-01-23  4:53 ` [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end() Edwin Peer
  2021-01-23 19:14   ` David Ahern
@ 2021-01-26  1:43   ` Jakub Kicinski
  1 sibling, 0 replies; 25+ messages in thread
From: Jakub Kicinski @ 2021-01-26  1:43 UTC (permalink / raw)
  To: Edwin Peer
  Cc: netdev, Andrew Gospodarek, Michael Chan, Stephen Hemminger,
	Michal Kubecek, David Ahern

On Fri, 22 Jan 2021 20:53:18 -0800 Edwin Peer wrote:
> If a nested list of attributes is too long, then the length will
> exceed the 16-bit nla_len of the parent nlattr. In such cases,
> determine how many whole attributes can fit and truncate the
> message to this length. This properly maintains the nesting
> hierarchy, keeping the entire message valid, while fitting more
> subelements inside the nest range than may result if the length
> is wrapped modulo 64KB.
> 
> Marking truncated attributes, such that user space can determine
> the precise attribute truncated, by means of an additional bit in
> the nla_type was considered and rejected. The NLA_F_NESTED and
> NLA_F_NET_BYTEORDER flags are supposed to be mutually exclusive.
> So, in theory, the latter bit could have been redefined for nested
> attributes in order to indicate truncation, but user space tools
> (most notably iproute2) cannot be relied on to honor NLA_TYPE_MASK,
> resulting in alteration of the perceived nla_type and subsequent
> catastrophic failure.
> 
> Failing the entire message with a hard error must also be rejected,
> as this would break existing user space functionality. The trigger
> issue is evident for IFLA_VFINFO_LIST and a hard error here would
> cause iproute2 to fail to render an entire interface list even if
> only a single interface warranted a truncated VF list. Instead, set
> NLM_F_NEST_TRUNCATED in the netlink header to inform user space
> about the incomplete data. In this particular case, however, user
> space can better ascertain which instance is truncated by consulting
> the associated IFLA_NUM_VF to determine how many VFs were expected.
> 
> Fixes: bfa83a9e03cf ("[NETLINK]: Type-safe netlink messages/attributes interface")
> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>


> diff --git a/include/net/netlink.h b/include/net/netlink.h
> index 1ceec518ab49..fc8c57dafb05 100644
> --- a/include/net/netlink.h
> +++ b/include/net/netlink.h
> @@ -1785,19 +1785,26 @@ static inline struct nlattr *nla_nest_start(struct sk_buff *skb, int attrtype)
>  	return nla_nest_start_noflag(skb, attrtype | NLA_F_NESTED);
>  }
>  
> +int __nla_nest_trunc_msg(struct sk_buff *skb, const struct nlattr *start);
> +
>  /**
>   * nla_nest_end - Finalize nesting of attributes
>   * @skb: socket buffer the attributes are stored in
>   * @start: container attribute
>   *
>   * Corrects the container attribute header to include the all
> - * appeneded attributes.
> + * appeneded attributes. The list of attributes will be truncated
> + * if too long to fit within the parent attribute's maximum reach.
>   *
>   * Returns the total data length of the skb.
>   */
>  static inline int nla_nest_end(struct sk_buff *skb, struct nlattr *start)

What are the semantics for multiple nests? All attrs down will always
have the trunc flag set? Because I'd have expected the skb_tail to be
trimmed in __nla_nest_trunc_msg()..

In fact if there is another nest around the "full" one, and the "full"
one is close to 0xffff wouldn't that end up cascading all the way down
to the outermost attr yielding an empty top level attr? I feel like we
cater to a specific use case here where that doesn't happen.

After initially being concerned about using up another flag, I warmed
up to the concept of NLM_F_NEST_TRUNCATED, but I think this is all
better driven by the writer. The writer knows the size and count of the
attrs because it sizes the skb. So can the writer not drive the
truncation? Add a "how much space do we have left" check in
rtnl_fill_vf() ?

This would also avoid the assumption that the contents of the nla are
other nlas.

>  {
> -	start->nla_len = skb_tail_pointer(skb) - (unsigned char *)start;
> +	int len = skb_tail_pointer(skb) - (unsigned char *)start;
> +
> +	if (len > 0xffff)
> +		len = __nla_nest_trunc_msg(skb, start);
> +	start->nla_len = len;
>  	return skb->len;

This function really needs to start returning an error on overflow.
Perhaps even with a WARN(). 

Could you please check how many callers we'd need to change to have
nla_nest_end() have the same semantics as nla_put() (i.e. 0 or -errno)?
On a quick scan I see that most cases only care about errors already.

>  }
>  
> diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
> index 3d94269bbfa8..44a250825c30 100644
> --- a/include/uapi/linux/netlink.h
> +++ b/include/uapi/linux/netlink.h
> @@ -57,6 +57,7 @@ struct nlmsghdr {
>  #define NLM_F_ECHO		0x08	/* Echo this request 		*/
>  #define NLM_F_DUMP_INTR		0x10	/* Dump was inconsistent due to sequence change */
>  #define NLM_F_DUMP_FILTERED	0x20	/* Dump was filtered as requested */
> +#define NLM_F_NEST_TRUNCATED	0x40	/* Message contains truncated nested attribute */
>  
>  /* Modifiers to GET request */
>  #define NLM_F_ROOT	0x100	/* specify tree	root	*/
> diff --git a/lib/nlattr.c b/lib/nlattr.c
> index 5b6116e81f9f..2a267c0d3e16 100644
> --- a/lib/nlattr.c
> +++ b/lib/nlattr.c
> @@ -1119,4 +1119,31 @@ int nla_append(struct sk_buff *skb, int attrlen, const void *data)
>  	return 0;
>  }
>  EXPORT_SYMBOL(nla_append);
> +
> +/**
> + * __nla_nest_trunc_msg - Truncate list of nested netlink attributes to max len
> + * @skb: socket buffer with tail pointer positioned after end of nested list
> + * @start: container attribute designating the beginning of the list
> + *
> + * Trims the skb to fit only the attributes which are within the range of the
> + * containing nest attribute. This is a helper for nla_nest_end, to prevent
> + * adding unduly to the length of what is an inline function. It is not
> + * intended to be called from anywhere else.
> + *
> + * Returns the truncated length of the enclosing nest attribute in accordance
> + * with the number of whole attributes that can fit.
> + */
> +int __nla_nest_trunc_msg(struct sk_buff *skb, const struct nlattr *start)
> +{
> +	struct nlattr *attr = nla_data(start);
> +	int rem = 0xffff - NLA_HDRLEN;
> +
> +	while (nla_ok(attr, rem))
> +		attr = nla_next(attr, &rem);
> +	nlmsg_trim(skb, attr);
> +	nlmsg_hdr(skb)->nlmsg_flags |= NLM_F_NEST_TRUNCATED;
> +	return (unsigned char *)attr - (unsigned char *)start;
> +}
> +EXPORT_SYMBOL(__nla_nest_trunc_msg);
> +
>  #endif


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 2/4] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO
  2021-01-23  4:53 ` [PATCH net-next 2/4] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO Edwin Peer
@ 2021-01-26  1:55   ` Jakub Kicinski
  2021-01-26 22:48     ` Edwin Peer
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2021-01-26  1:55 UTC (permalink / raw)
  To: Edwin Peer
  Cc: netdev, Andrew Gospodarek, Michael Chan, Stephen Hemminger,
	Michal Kubecek, David Ahern

On Fri, 22 Jan 2021 20:53:19 -0800 Edwin Peer wrote:
> This filter already exists for excluding IPv6 SNMP stats. Extend its
> definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK.
> 
> This patch constitutes a partial fix for a netlink attribute nesting
> overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the
> requester doesn't need them, the truncation of the VF list is avoided.
> 
> While it was technically only the stats added in commit c5a9f6f0ab40
> ("net/core: Add drop counters to VF statistics") breaking the camel's
> back, the appreciable size of the stats data should never have been
> included without due consideration for the maximum number of VFs
> supported by PCI.
> 
> Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice")
> Fixes: c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics")
> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>

Could you include in the commit message the size breakdown of a single
VF nest? With and without efficient unaligned 64b access?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 4/4] rtnetlink: promote IFLA_VF_STATS to same level as IFLA_VF_INFO
  2021-01-23  4:53 ` [PATCH net-next 4/4] rtnetlink: promote IFLA_VF_STATS to same level as IFLA_VF_INFO Edwin Peer
@ 2021-01-26  2:01   ` Jakub Kicinski
  2021-01-26 14:50     ` Edwin Peer
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2021-01-26  2:01 UTC (permalink / raw)
  To: Edwin Peer
  Cc: netdev, Andrew Gospodarek, Michael Chan, Stephen Hemminger,
	Michal Kubecek, David Ahern

On Fri, 22 Jan 2021 20:53:21 -0800 Edwin Peer wrote:
> Separating the VF stats out of IFLA_VF_INFO appears to be the least
> impact way of resolving the nlattr overflow bug in IFLA_VFINFO_LIST.
> 
> Since changing the hierarchy does constitute an ABI change, it must
> be explicitly requested via RTEXT_FILTER_VF_SEPARATE_STATS. Otherwise,
> the old location is maintained for compatibility.

We don't want any additions to the VF ABI via GETLINK. IMO the clear
truncation is fine, hiding stats is pushing it, new attr is a no go.

> A new container type, namely IFLA_VFSTATS_LIST, is introduced to group
> the stats objects into an ordered list that corresponds with the order
> of VFs in IFLA_VFINFO_LIST.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2021-01-23 20:42     ` Edwin Peer
  2021-01-23 21:03       ` Edwin Peer
@ 2021-01-26  4:50       ` David Ahern
  1 sibling, 0 replies; 25+ messages in thread
From: David Ahern @ 2021-01-26  4:50 UTC (permalink / raw)
  To: Edwin Peer
  Cc: netdev, Jakub Kicinski, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek

On 1/23/21 1:42 PM, Edwin Peer wrote:
> On Sat, Jan 23, 2021 at 11:14 AM David Ahern <dsahern@gmail.com> wrote:
> 
>>> Marking truncated attributes, such that user space can determine
>>> the precise attribute truncated, by means of an additional bit in
>>> the nla_type was considered and rejected. The NLA_F_NESTED and
>>> NLA_F_NET_BYTEORDER flags are supposed to be mutually exclusive.
>>> So, in theory, the latter bit could have been redefined for nested
>>> attributes in order to indicate truncation, but user space tools
>>> (most notably iproute2) cannot be relied on to honor NLA_TYPE_MASK,
>>> resulting in alteration of the perceived nla_type and subsequent
>>> catastrophic failure.
>>
>> Did you look at using NETLINK_CB / netlink_skb_parms to keep a running
>> length of nested attributes to avoid the need to trim?
> 
> I did not, but thinking about it now, I don't think that's necessarily
> the way to go. We shouldn't be concerned about the cost of iterating
> over the list and trimming the skb for what should be a rare exception
> path. Ideally, we want to make sure at compile time (by having correct
> code) that we don't ever exceed this limit at run time. Perhaps we
> should investigate static analysis approaches to prove nla_len can't
> be exceeded?

It's not a rare exception path if VF data is dumped on every ip link
dump request. I think you would be surprised how often random s/w does a
link dump.

As for the static analysis, the number of VFs is dynamic so impossible
to detect at compile time. Limiting the number of VFs to what can fit
would be a different kind of regression.

> 
> Tracking the outer nest length during nla_put() would provide for
> convenient error indication at the precise location where things go
> wrong, but that's a fair amount of housekeeping that isn't free of
> complexity and run time cost either. Instead of rarely (if ever)
> undoing work, we'll always do extra work that we hardly ever need.
> 
> Then, if nla_put() can detect nesting errors, there's the issue of
> what to do in the case of errors. Case in point, the IFLA_VFINFO_LIST
> scenario would now require explicit error handling in the generator
> logic, because we can't fail hard at that point. We would need to be
> sure we propagate all possible nesting errors up to a common location
> (probably where the nest ends, which is where we're dealing with the
> problem in this solution), set the truncated flag and carry on (for
> the same net effect the trim in nla_nest_end() has). If there are
> other cases we don't know about today, they might turn from soft fails
> into hard errors, breaking existing user space. Truncating the list is
> the only non-obtrusive solution to any existing brokenness that is
> guaranteed to not make things worse, but we can't know where we need
> to do that apriori and would need to explicitly handle each case as
> they come up.

Yes, tracking the errors is hard given the flow of netlink helpers, but
ultimately all of the space checks come down to:

	if (unlikely(skb_tailroom(skb) < nla_total_size(attrlen)))
		return -EMSGSIZE;

so surely there is a solution. There are only a few entry points to
track space available (open coded checks deserve their fate) which can
be used to track overflow, nesting and multiple layers of nesting to
then avoid the memcpy in __nla_put, __nla_put_64bit, and __nla_put_nohdr

> 
> Hard errors on nest overflow can only reliably work for new code. That
> is, assuming it is tested to the extremes when it goes in, not after
> user space comes to rely on the broken behavior.
> 
> Regards,
> Edwin Peer
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2021-01-23 21:03       ` Edwin Peer
@ 2021-01-26  4:56         ` David Ahern
  2021-01-26 17:51           ` Edwin Peer
  0 siblings, 1 reply; 25+ messages in thread
From: David Ahern @ 2021-01-26  4:56 UTC (permalink / raw)
  To: Edwin Peer
  Cc: netdev, Jakub Kicinski, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek

On 1/23/21 2:03 PM, Edwin Peer wrote:
> On Sat, Jan 23, 2021 at 12:42 PM Edwin Peer <edwin.peer@broadcom.com> wrote:
> 
>> Then, if nla_put() can detect nesting errors, there's the issue of
>> what to do in the case of errors. Case in point, the IFLA_VFINFO_LIST
>> scenario would now require explicit error handling in the generator
>> logic, because we can't fail hard at that point. We would need to be
>> sure we propagate all possible nesting errors up to a common location
>> (probably where the nest ends, which is where we're dealing with the
>> problem in this solution), set the truncated flag and carry on (for
>> the same net effect the trim in nla_nest_end() has).
> 
> Also, the unwind here turns out to be just as complicated, requiring a
> traversal from the start and a trim anyway, because the attribute that
> triggers the overflow may be deep inside the hierarchy. We can't
> simply truncate at this point. We should remove whole elements at the
> uppermost level, or risk having partially filled attribute trees
> rather than missing attributes at the level of the exceeded nest -
> which may be worse.
> 

I'm not a fan of the skb trim idea. I think it would be better to figure
out how to stop adding to the skb when an attr length is going to exceed
64kB. Not failing hard with an error (ip link sh needs to succeed), but
truncating the specific attribute of a message with a flag so userspace
knows it is short.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 4/4] rtnetlink: promote IFLA_VF_STATS to same level as IFLA_VF_INFO
  2021-01-26  2:01   ` Jakub Kicinski
@ 2021-01-26 14:50     ` Edwin Peer
  0 siblings, 0 replies; 25+ messages in thread
From: Edwin Peer @ 2021-01-26 14:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, Andrew Gospodarek, Michael Chan, Stephen Hemminger,
	Michal Kubecek, David Ahern

[-- Attachment #1: Type: text/plain, Size: 1613 bytes --]

On Mon, Jan 25, 2021 at 6:01 PM Jakub Kicinski <kuba@kernel.org> wrote:

> > Since changing the hierarchy does constitute an ABI change, it must
> > be explicitly requested via RTEXT_FILTER_VF_SEPARATE_STATS. Otherwise,
> > the old location is maintained for compatibility.
>
> We don't want any additions to the VF ABI via GETLINK. IMO the clear
> truncation is fine, hiding stats is pushing it, new attr is a no go.

How does one fix an API bug if one is not allowed to change it in any way?

Perhaps that's a rhetorical question, but I had hoped we could be more
pragmatic. I'm not particularly proud of this proposed patch on its
standalone technical merits. There are obviously far superior
solutions if we are permitted to add to the GETLINK VF API in a
meaningful way. That said, it does present a compromise that cannot be
construed as adding capabilities to a deprecated API. Instead of
adding new operations that don't suffer the same ills, it merely moves
existing structures around without changing any of the semantics of
the prevailing request.

By drawing such a hard line in the sand, you are declaring that this
bug has no possible resolution by decree. With my vendor hat on, it's
probably no skin off my teeth - all my competitors suffer the same
Linux usability issue. As a long time Linux user and somebody who
cares about the customer experience, it just makes me sad. :( There's
a difference between encouraging people to move towards a newer API,
by insisting that the development of new functionality happens there,
and actively making sure the old API sucks to use.

Regards,
Edwin Peer

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4160 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2021-01-26  4:56         ` David Ahern
@ 2021-01-26 17:51           ` Edwin Peer
  2023-06-05  7:28             ` Gal Pressman
  0 siblings, 1 reply; 25+ messages in thread
From: Edwin Peer @ 2021-01-26 17:51 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, Jakub Kicinski, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek

[-- Attachment #1: Type: text/plain, Size: 658 bytes --]

On Mon, Jan 25, 2021 at 8:56 PM David Ahern <dsahern@gmail.com> wrote:

> I'm not a fan of the skb trim idea. I think it would be better to figure
> out how to stop adding to the skb when an attr length is going to exceed
> 64kB. Not failing hard with an error (ip link sh needs to succeed), but
> truncating the specific attribute of a message with a flag so userspace
> knows it is short.

Absent the ability to do something useful in terms of actually
avoiding the overflow [1], I'm abandoning this approach entirely. I
have a different idea that I will propose in due course.

[1] https://marc.info/?l=linux-netdev&m=161163943811663

Regards,
Edwin Peer

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4160 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 2/4] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO
  2021-01-26  1:55   ` Jakub Kicinski
@ 2021-01-26 22:48     ` Edwin Peer
  0 siblings, 0 replies; 25+ messages in thread
From: Edwin Peer @ 2021-01-26 22:48 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, Andrew Gospodarek, Michael Chan, Stephen Hemminger,
	Michal Kubecek, David Ahern

[-- Attachment #1: Type: text/plain, Size: 516 bytes --]

On Mon, Jan 25, 2021 at 5:56 PM Jakub Kicinski <kuba@kernel.org> wrote:

> > Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice")
> > Fixes: c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics")
> > Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
>
> Could you include in the commit message the size breakdown of a single
> VF nest? With and without efficient unaligned 64b access?

Oh, apologies. I see missed this question. I can certainly do that.

Regards,
Edwin Peer

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4160 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2021-01-26 17:51           ` Edwin Peer
@ 2023-06-05  7:28             ` Gal Pressman
  2023-06-05 18:58               ` Jakub Kicinski
  0 siblings, 1 reply; 25+ messages in thread
From: Gal Pressman @ 2023-06-05  7:28 UTC (permalink / raw)
  To: Edwin Peer, David Ahern
  Cc: netdev, Jakub Kicinski, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek

On 26/01/2021 19:51, Edwin Peer wrote:
> On Mon, Jan 25, 2021 at 8:56 PM David Ahern <dsahern@gmail.com> wrote:
> 
>> I'm not a fan of the skb trim idea. I think it would be better to figure
>> out how to stop adding to the skb when an attr length is going to exceed
>> 64kB. Not failing hard with an error (ip link sh needs to succeed), but
>> truncating the specific attribute of a message with a flag so userspace
>> knows it is short.
> 
> Absent the ability to do something useful in terms of actually
> avoiding the overflow [1], I'm abandoning this approach entirely. I
> have a different idea that I will propose in due course.
> 
> [1] https://marc.info/?l=linux-netdev&m=161163943811663
> 
> Regards,
> Edwin Peer

Hello Edwin,

I'm also interested in getting this issue resolved, have you had any
progress since this series? Are you still working on it?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2023-06-05  7:28             ` Gal Pressman
@ 2023-06-05 18:58               ` Jakub Kicinski
  2023-06-05 19:27                 ` Edwin Peer
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2023-06-05 18:58 UTC (permalink / raw)
  To: Gal Pressman, espeer
  Cc: David Ahern, netdev, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek

[Updating Edwin's email.]

On Mon, 5 Jun 2023 10:28:06 +0300 Gal Pressman wrote:
> On 26/01/2021 19:51, Edwin Peer wrote:
> > On Mon, Jan 25, 2021 at 8:56 PM David Ahern <dsahern@gmail.com> wrote:
> >   
> >> I'm not a fan of the skb trim idea. I think it would be better to figure
> >> out how to stop adding to the skb when an attr length is going to exceed
> >> 64kB. Not failing hard with an error (ip link sh needs to succeed), but
> >> truncating the specific attribute of a message with a flag so userspace
> >> knows it is short.  
> > 
> > Absent the ability to do something useful in terms of actually
> > avoiding the overflow [1], I'm abandoning this approach entirely. I
> > have a different idea that I will propose in due course.
> > 
> > [1] https://marc.info/?l=linux-netdev&m=161163943811663
> > 
> > Regards,
> > Edwin Peer  
> 
> Hello Edwin,
> 
> I'm also interested in getting this issue resolved, have you had any
> progress since this series? Are you still working on it?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2023-06-05 18:58               ` Jakub Kicinski
@ 2023-06-05 19:27                 ` Edwin Peer
  2023-06-06  8:01                   ` Gal Pressman
  0 siblings, 1 reply; 25+ messages in thread
From: Edwin Peer @ 2023-06-05 19:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Gal Pressman, David Ahern, netdev, Andrew Gospodarek,
	Michael Chan, Stephen Hemminger, Michal Kubecek

On Mon, Jun 5, 2023 at 11:58 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> [Updating Edwin's email.]
>
> On Mon, 5 Jun 2023 10:28:06 +0300 Gal Pressman wrote:
> > On 26/01/2021 19:51, Edwin Peer wrote:
> > > On Mon, Jan 25, 2021 at 8:56 PM David Ahern <dsahern@gmail.com> wrote:
> > >
> > >> I'm not a fan of the skb trim idea. I think it would be better to figure
> > >> out how to stop adding to the skb when an attr length is going to exceed
> > >> 64kB. Not failing hard with an error (ip link sh needs to succeed), but
> > >> truncating the specific attribute of a message with a flag so userspace
> > >> knows it is short.
> > >
> > > Absent the ability to do something useful in terms of actually
> > > avoiding the overflow [1], I'm abandoning this approach entirely. I
> > > have a different idea that I will propose in due course.
> > >
> > > [1] https://marc.info/?l=linux-netdev&m=161163943811663
> > >
> > > Regards,
> > > Edwin Peer
> >
> > Hello Edwin,
> >
> > I'm also interested in getting this issue resolved, have you had any
> > progress since this series? Are you still working on it?

Hi Kuba,

Thanks for the CC, I left Broadcom quite some time ago and am no
longer subscribed to netdev as a result (been living in firmware land
doing work in Rust).

I have no immediate plans to pick this up, at least not in the short
to medium term. My work in progress was on the laptop I returned and I
cannot immediately recall what solution I had in mind here.

Regards,
Edwin Peer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2023-06-05 19:27                 ` Edwin Peer
@ 2023-06-06  8:01                   ` Gal Pressman
  2023-06-06 16:17                     ` Jakub Kicinski
  0 siblings, 1 reply; 25+ messages in thread
From: Gal Pressman @ 2023-06-06  8:01 UTC (permalink / raw)
  To: Edwin Peer, Jakub Kicinski
  Cc: David Ahern, netdev, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek

On 05/06/2023 22:27, Edwin Peer wrote:
> On Mon, Jun 5, 2023 at 11:58 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> [Updating Edwin's email.]
>>
>> On Mon, 5 Jun 2023 10:28:06 +0300 Gal Pressman wrote:
>>> On 26/01/2021 19:51, Edwin Peer wrote:
>>>> On Mon, Jan 25, 2021 at 8:56 PM David Ahern <dsahern@gmail.com> wrote:
>>>>
>>>>> I'm not a fan of the skb trim idea. I think it would be better to figure
>>>>> out how to stop adding to the skb when an attr length is going to exceed
>>>>> 64kB. Not failing hard with an error (ip link sh needs to succeed), but
>>>>> truncating the specific attribute of a message with a flag so userspace
>>>>> knows it is short.
>>>>
>>>> Absent the ability to do something useful in terms of actually
>>>> avoiding the overflow [1], I'm abandoning this approach entirely. I
>>>> have a different idea that I will propose in due course.
>>>>
>>>> [1] https://marc.info/?l=linux-netdev&m=161163943811663
>>>>
>>>> Regards,
>>>> Edwin Peer
>>>
>>> Hello Edwin,
>>>
>>> I'm also interested in getting this issue resolved, have you had any
>>> progress since this series? Are you still working on it?
> 
> Hi Kuba,
> 
> Thanks for the CC, I left Broadcom quite some time ago and am no
> longer subscribed to netdev as a result (been living in firmware land
> doing work in Rust).
> 
> I have no immediate plans to pick this up, at least not in the short
> to medium term. My work in progress was on the laptop I returned and I
> cannot immediately recall what solution I had in mind here.
> 
> Regards,
> Edwin Peer

Jakub, sorry if this has been discussed already in the past, but can you
please clarify what is an accepted (or more importantly, not accepted)
solution for this issue? I'm not familiar with the history and don't
want to repeat previous mistakes.

So far I've seen discussions about increasing the recv buffer size, and
this patchset which changes the GETLINK ABI, both of which were nacked.

Having 'ip link show' broken is very unfortunate :\, how should one
approach this issue in 2023?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2023-06-06  8:01                   ` Gal Pressman
@ 2023-06-06 16:17                     ` Jakub Kicinski
  2023-06-07 13:31                       ` Gal Pressman
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2023-06-06 16:17 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Edwin Peer, David Ahern, netdev, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek

On Tue, 6 Jun 2023 11:01:14 +0300 Gal Pressman wrote:
> On 05/06/2023 22:27, Edwin Peer wrote:
> > Thanks for the CC, I left Broadcom quite some time ago and am no
> > longer subscribed to netdev as a result (been living in firmware land
> > doing work in Rust).
> > 
> > I have no immediate plans to pick this up, at least not in the short
> > to medium term. My work in progress was on the laptop I returned and I
> > cannot immediately recall what solution I had in mind here.
> 
> Jakub, sorry if this has been discussed already in the past, but can you
> please clarify what is an accepted (or more importantly, not accepted)
> solution for this issue? I'm not familiar with the history and don't
> want to repeat previous mistakes.

The problem is basically that attributes can only be 64kB and 
the legacy SR-IOV API wraps all the link info in an attribute.

> So far I've seen discussions about increasing the recv buffer size, and
> this patchset which changes the GETLINK ABI, both of which were nacked.

Filtering out some of the info, like the stats, is okay, but that just
increases the limit. A limit still exists.

> Having 'ip link show' broken is very unfortunate :\, how should one
> approach this issue in 2023?

Sure is, which is why we should be moving away from the legacy SR-IOV
APIs.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2023-06-06 16:17                     ` Jakub Kicinski
@ 2023-06-07 13:31                       ` Gal Pressman
  2023-06-07 16:33                         ` Jakub Kicinski
  0 siblings, 1 reply; 25+ messages in thread
From: Gal Pressman @ 2023-06-07 13:31 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Edwin Peer, David Ahern, netdev, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek

On 06/06/2023 19:17, Jakub Kicinski wrote:
> On Tue, 6 Jun 2023 11:01:14 +0300 Gal Pressman wrote:
>> On 05/06/2023 22:27, Edwin Peer wrote:
>>> Thanks for the CC, I left Broadcom quite some time ago and am no
>>> longer subscribed to netdev as a result (been living in firmware land
>>> doing work in Rust).
>>>
>>> I have no immediate plans to pick this up, at least not in the short
>>> to medium term. My work in progress was on the laptop I returned and I
>>> cannot immediately recall what solution I had in mind here.
>>
>> Jakub, sorry if this has been discussed already in the past, but can you
>> please clarify what is an accepted (or more importantly, not accepted)
>> solution for this issue? I'm not familiar with the history and don't
>> want to repeat previous mistakes.
> 
> The problem is basically that attributes can only be 64kB and 
> the legacy SR-IOV API wraps all the link info in an attribute.

Isn't that a second order issue? The skb itself is limited to 32kB AFAICT.

>> So far I've seen discussions about increasing the recv buffer size, and
>> this patchset which changes the GETLINK ABI, both of which were nacked.
> 
> Filtering out some of the info, like the stats, is okay, but that just
> increases the limit. A limit still exists.

Any objections to at least take the second patch here?
It doesn't introduce any ABI changes, but will allow 'ip link show' to
work properly (although 'ip -s link show' will remain broken).

>> Having 'ip link show' broken is very unfortunate :\, how should one
>> approach this issue in 2023?
> 
> Sure is, which is why we should be moving away from the legacy SR-IOV
> APIs.

Agreed!
I do not suggest to extend/improve this API, just make sure it's not broken.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2023-06-07 13:31                       ` Gal Pressman
@ 2023-06-07 16:33                         ` Jakub Kicinski
  2023-06-07 16:52                           ` Stephen Hemminger
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2023-06-07 16:33 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Edwin Peer, David Ahern, netdev, Andrew Gospodarek, Michael Chan,
	Stephen Hemminger, Michal Kubecek

On Wed, 7 Jun 2023 16:31:48 +0300 Gal Pressman wrote:
> On 06/06/2023 19:17, Jakub Kicinski wrote:
> > On Tue, 6 Jun 2023 11:01:14 +0300 Gal Pressman wrote:  
> >> Jakub, sorry if this has been discussed already in the past, but can you
> >> please clarify what is an accepted (or more importantly, not accepted)
> >> solution for this issue? I'm not familiar with the history and don't
> >> want to repeat previous mistakes.  
> > 
> > The problem is basically that attributes can only be 64kB and 
> > the legacy SR-IOV API wraps all the link info in an attribute.  
> 
> Isn't that a second order issue? The skb itself is limited to 32kB AFAICT.

Hm, you're right. But allocation larger than 32kB are costly.
We can't make every link dump allocate 64kB, it will cause
regressions on systems under memory pressure (== real world).

You'd need to come up with some careful scheme of using larger
buffers.

> >> So far I've seen discussions about increasing the recv buffer size, and
> >> this patchset which changes the GETLINK ABI, both of which were nacked.  
> > 
> > Filtering out some of the info, like the stats, is okay, but that just
> > increases the limit. A limit still exists.  
> 
> Any objections to at least take the second patch here?
> It doesn't introduce any ABI changes, but will allow 'ip link show' to
> work properly (although 'ip -s link show' will remain broken).

Yup, retest / repost?

> >> Having 'ip link show' broken is very unfortunate :\, how should one
> >> approach this issue in 2023?  
> > 
> > Sure is, which is why we should be moving away from the legacy SR-IOV
> > APIs.  
> 
> Agreed!
> I do not suggest to extend/improve this API, just make sure it's not broken.
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2023-06-07 16:33                         ` Jakub Kicinski
@ 2023-06-07 16:52                           ` Stephen Hemminger
  2023-06-07 17:29                             ` Jakub Kicinski
  0 siblings, 1 reply; 25+ messages in thread
From: Stephen Hemminger @ 2023-06-07 16:52 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Gal Pressman, Edwin Peer, David Ahern, netdev, Andrew Gospodarek,
	Michael Chan, Michal Kubecek

On Wed, 7 Jun 2023 09:33:24 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> > > 
> > > The problem is basically that attributes can only be 64kB and 
> > > the legacy SR-IOV API wraps all the link info in an attribute.    
> > 
> > Isn't that a second order issue? The skb itself is limited to 32kB AFAICT.  
> 
> Hm, you're right. But allocation larger than 32kB are costly.
> We can't make every link dump allocate 64kB, it will cause
> regressions on systems under memory pressure (== real world).
> 
> You'd need to come up with some careful scheme of using larger
> buffers.

Why does it all have to be a single message?
Things like 3 million routes are dumped fine, as multiple messages.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end()
  2023-06-07 16:52                           ` Stephen Hemminger
@ 2023-06-07 17:29                             ` Jakub Kicinski
  0 siblings, 0 replies; 25+ messages in thread
From: Jakub Kicinski @ 2023-06-07 17:29 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Gal Pressman, Edwin Peer, David Ahern, netdev, Andrew Gospodarek,
	Michael Chan, Michal Kubecek

On Wed, 7 Jun 2023 09:52:54 -0700 Stephen Hemminger wrote:
> > Hm, you're right. But allocation larger than 32kB are costly.
> > We can't make every link dump allocate 64kB, it will cause
> > regressions on systems under memory pressure (== real world).
> > 
> > You'd need to come up with some careful scheme of using larger
> > buffers.  
> 
> Why does it all have to be a single message?
> Things like 3 million routes are dumped fine, as multiple messages.

The old API we can't change. The new API is switchdev / devlink,
and it's already in place. We'd have to add a third API, like 
the old one but with different msg format which for obvious 
reasons I'd prefer not to do.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-06-07 17:29 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-23  4:53 [PATCH net-next 0/4] support for 256 VFs in RTM_GETLINK Edwin Peer
2021-01-23  4:53 ` [PATCH net-next 1/4] netlink: truncate overlength attribute list in nla_nest_end() Edwin Peer
2021-01-23 19:14   ` David Ahern
2021-01-23 20:42     ` Edwin Peer
2021-01-23 21:03       ` Edwin Peer
2021-01-26  4:56         ` David Ahern
2021-01-26 17:51           ` Edwin Peer
2023-06-05  7:28             ` Gal Pressman
2023-06-05 18:58               ` Jakub Kicinski
2023-06-05 19:27                 ` Edwin Peer
2023-06-06  8:01                   ` Gal Pressman
2023-06-06 16:17                     ` Jakub Kicinski
2023-06-07 13:31                       ` Gal Pressman
2023-06-07 16:33                         ` Jakub Kicinski
2023-06-07 16:52                           ` Stephen Hemminger
2023-06-07 17:29                             ` Jakub Kicinski
2021-01-26  4:50       ` David Ahern
2021-01-26  1:43   ` Jakub Kicinski
2021-01-23  4:53 ` [PATCH net-next 2/4] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO Edwin Peer
2021-01-26  1:55   ` Jakub Kicinski
2021-01-26 22:48     ` Edwin Peer
2021-01-23  4:53 ` [PATCH net-next 3/4] rtnetlink: refactor IFLA_VF_INFO stats into rtnl_fill_vfstats() Edwin Peer
2021-01-23  4:53 ` [PATCH net-next 4/4] rtnetlink: promote IFLA_VF_STATS to same level as IFLA_VF_INFO Edwin Peer
2021-01-26  2:01   ` Jakub Kicinski
2021-01-26 14:50     ` Edwin Peer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.