All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
@ 2016-09-28  9:03 Cyrill Gorcunov
  2016-09-28 10:08 ` Jamal Hadi Salim
  0 siblings, 1 reply; 27+ messages in thread
From: Cyrill Gorcunov @ 2016-09-28  9:03 UTC (permalink / raw)
  To: Eric Dumazet, David Ahern
  Cc: netdev, linux-kernel, David Miller, kuznet, jmorris, yoshfuji,
	kaber, avagin, stephen

In criu we are actively using diag interface to collect sockets
present in the system when dumping applications. And while for
unix, tcp, udp[lite], packet, netlink it works as expected,
the raw sockets do not have. Thus add it.

v2:
 - add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@)
 - implement @destroy for diag requests (by dsa@)

v3:
 - add export of raw_abort for IPv6 (by dsa@)
 - pass net-admin flag into inet_sk_diag_fill due to
   changes in net-next branch (by dsa@)

v4:
 - use @pad in struct inet_diag_req_v2 for raw socket
   protocol specification: raw module carries sockets
   which may have custom protocol passed from socket()
   syscall and sole @sdiag_protocol is not enough to
   match underlied ones
 - start reporting protocol specifed in socket() call
   when sockets are raw ones for the same reason: user
   space tools like ss may parse this attribute and use
   it for socket matching

v5 (by eric.dumazet@):
 - use sock_hold in raw_sock_get instead of atomic_inc,
   we're holding (raw_v4_hashinfo|raw_v6_hashinfo)->lock
   when looking up so counter won't be zero here.

CC: David S. Miller <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Ahern <dsa@cumulusnetworks.com>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
CC: Andrey Vagin <avagin@openvz.org>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---

Thanks all for feedback! Take a look please once time permit.

 include/net/raw.h              |    6 +
 include/net/rawv6.h            |    7 +
 include/uapi/linux/inet_diag.h |    5 
 net/ipv4/Kconfig               |    8 +
 net/ipv4/Makefile              |    1 
 net/ipv4/inet_diag.c           |    9 +
 net/ipv4/raw.c                 |   21 +++
 net/ipv4/raw_diag.c            |  233 +++++++++++++++++++++++++++++++++++++++++
 net/ipv6/raw.c                 |    7 -
 9 files changed, 292 insertions(+), 5 deletions(-)

Index: linux-ml.git/include/net/raw.h
===================================================================
--- linux-ml.git.orig/include/net/raw.h
+++ linux-ml.git/include/net/raw.h
@@ -23,6 +23,12 @@
 
 extern struct proto raw_prot;
 
+extern struct raw_hashinfo raw_v4_hashinfo;
+struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
+			     unsigned short num, __be32 raddr,
+			     __be32 laddr, int dif);
+
+int raw_abort(struct sock *sk, int err);
 void raw_icmp_error(struct sk_buff *, int, u32);
 int raw_local_deliver(struct sk_buff *, int);
 
Index: linux-ml.git/include/net/rawv6.h
===================================================================
--- linux-ml.git.orig/include/net/rawv6.h
+++ linux-ml.git/include/net/rawv6.h
@@ -3,6 +3,13 @@
 
 #include <net/protocol.h>
 
+extern struct raw_hashinfo raw_v6_hashinfo;
+struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
+			     unsigned short num, const struct in6_addr *loc_addr,
+			     const struct in6_addr *rmt_addr, int dif);
+
+int raw_abort(struct sock *sk, int err);
+
 void raw6_icmp_error(struct sk_buff *, int nexthdr,
 		u8 type, u8 code, int inner_offset, __be32);
 bool raw6_local_deliver(struct sk_buff *, int);
Index: linux-ml.git/include/uapi/linux/inet_diag.h
===================================================================
--- linux-ml.git.orig/include/uapi/linux/inet_diag.h
+++ linux-ml.git/include/uapi/linux/inet_diag.h
@@ -38,7 +38,10 @@ struct inet_diag_req_v2 {
 	__u8	sdiag_family;
 	__u8	sdiag_protocol;
 	__u8	idiag_ext;
-	__u8	pad;
+	union {
+		__u8	pad;
+		__u8	sdiag_raw_protocol;	/* SOCK_RAW only, @pad for others */
+	};
 	__u32	idiag_states;
 	struct inet_diag_sockid id;
 };
Index: linux-ml.git/net/ipv4/Kconfig
===================================================================
--- linux-ml.git.orig/net/ipv4/Kconfig
+++ linux-ml.git/net/ipv4/Kconfig
@@ -430,6 +430,14 @@ config INET_UDP_DIAG
 	  Support for UDP socket monitoring interface used by the ss tool.
 	  If unsure, say Y.
 
+config INET_RAW_DIAG
+	tristate "RAW: socket monitoring interface"
+	depends on INET_DIAG && (IPV6 || IPV6=n)
+	default n
+	---help---
+	  Support for RAW socket monitoring interface used by the ss tool.
+	  If unsure, say Y.
+
 config INET_DIAG_DESTROY
 	bool "INET: allow privileged process to administratively close sockets"
 	depends on INET_DIAG
Index: linux-ml.git/net/ipv4/Makefile
===================================================================
--- linux-ml.git.orig/net/ipv4/Makefile
+++ linux-ml.git/net/ipv4/Makefile
@@ -40,6 +40,7 @@ obj-$(CONFIG_NETFILTER)	+= netfilter.o n
 obj-$(CONFIG_INET_DIAG) += inet_diag.o 
 obj-$(CONFIG_INET_TCP_DIAG) += tcp_diag.o
 obj-$(CONFIG_INET_UDP_DIAG) += udp_diag.o
+obj-$(CONFIG_INET_RAW_DIAG) += raw_diag.o
 obj-$(CONFIG_NET_TCPPROBE) += tcp_probe.o
 obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o
 obj-$(CONFIG_TCP_CONG_CDG) += tcp_cdg.o
Index: linux-ml.git/net/ipv4/inet_diag.c
===================================================================
--- linux-ml.git.orig/net/ipv4/inet_diag.c
+++ linux-ml.git/net/ipv4/inet_diag.c
@@ -200,6 +200,15 @@ int inet_sk_diag_fill(struct sock *sk, s
 		if (sock_diag_put_meminfo(sk, skb, INET_DIAG_SKMEMINFO))
 			goto errout;
 
+	/*
+	 * RAW sockets might have user-defined protocols assigned,
+	 * so report the one supplied on socket creation.
+	 */
+	if (sk->sk_type == SOCK_RAW) {
+		if (nla_put_u8(skb, INET_DIAG_PROTOCOL, sk->sk_protocol))
+			goto errout;
+	}
+
 	if (!icsk) {
 		handler->idiag_get_info(sk, r, NULL);
 		goto out;
Index: linux-ml.git/net/ipv4/raw.c
===================================================================
--- linux-ml.git.orig/net/ipv4/raw.c
+++ linux-ml.git/net/ipv4/raw.c
@@ -89,9 +89,10 @@ struct raw_frag_vec {
 	int hlen;
 };
 
-static struct raw_hashinfo raw_v4_hashinfo = {
+struct raw_hashinfo raw_v4_hashinfo = {
 	.lock = __RW_LOCK_UNLOCKED(raw_v4_hashinfo.lock),
 };
+EXPORT_SYMBOL_GPL(raw_v4_hashinfo);
 
 int raw_hash_sk(struct sock *sk)
 {
@@ -120,7 +121,7 @@ void raw_unhash_sk(struct sock *sk)
 }
 EXPORT_SYMBOL_GPL(raw_unhash_sk);
 
-static struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
+struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
 		unsigned short num, __be32 raddr, __be32 laddr, int dif)
 {
 	sk_for_each_from(sk) {
@@ -136,6 +137,7 @@ static struct sock *__raw_v4_lookup(stru
 found:
 	return sk;
 }
+EXPORT_SYMBOL_GPL(__raw_v4_lookup);
 
 /*
  *	0 - deliver
@@ -918,6 +920,20 @@ static int compat_raw_ioctl(struct sock
 }
 #endif
 
+int raw_abort(struct sock *sk, int err)
+{
+	lock_sock(sk);
+
+	sk->sk_err = err;
+	sk->sk_error_report(sk);
+	udp_disconnect(sk, 0);
+
+	release_sock(sk);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(raw_abort);
+
 struct proto raw_prot = {
 	.name		   = "RAW",
 	.owner		   = THIS_MODULE,
@@ -943,6 +959,7 @@ struct proto raw_prot = {
 	.compat_getsockopt = compat_raw_getsockopt,
 	.compat_ioctl	   = compat_raw_ioctl,
 #endif
+	.diag_destroy	   = raw_abort,
 };
 
 #ifdef CONFIG_PROC_FS
Index: linux-ml.git/net/ipv4/raw_diag.c
===================================================================
--- /dev/null
+++ linux-ml.git/net/ipv4/raw_diag.c
@@ -0,0 +1,233 @@
+#include <linux/module.h>
+
+#include <linux/inet_diag.h>
+#include <linux/sock_diag.h>
+
+#include <net/raw.h>
+#include <net/rawv6.h>
+
+#ifdef pr_fmt
+# undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+static struct raw_hashinfo *
+raw_get_hashinfo(const struct inet_diag_req_v2 *r)
+{
+	if (r->sdiag_family == AF_INET) {
+		return &raw_v4_hashinfo;
+#if IS_ENABLED(CONFIG_IPV6)
+	} else if (r->sdiag_family == AF_INET6) {
+		return &raw_v6_hashinfo;
+#endif
+	} else {
+		pr_warn_once("Unexpected inet family %d\n",
+			     r->sdiag_family);
+		WARN_ON_ONCE(1);
+		return ERR_PTR(-EINVAL);
+	}
+}
+
+static struct sock *raw_lookup(struct net *net, struct sock *from,
+			       const struct inet_diag_req_v2 *r)
+{
+	struct sock *sk = NULL;
+
+	if (r->sdiag_family == AF_INET)
+		sk = __raw_v4_lookup(net, from, r->sdiag_raw_protocol,
+				     r->id.idiag_dst[0],
+				     r->id.idiag_src[0],
+				     r->id.idiag_if);
+#if IS_ENABLED(CONFIG_IPV6)
+	else
+		sk = __raw_v6_lookup(net, from, r->sdiag_raw_protocol,
+				     (const struct in6_addr *)r->id.idiag_src,
+				     (const struct in6_addr *)r->id.idiag_dst,
+				     r->id.idiag_if);
+#endif
+	return sk;
+}
+
+static struct sock *raw_sock_get(struct net *net, const struct inet_diag_req_v2 *r)
+{
+	struct raw_hashinfo *hashinfo = raw_get_hashinfo(r);
+	struct sock *sk = NULL, *s;
+	int slot;
+
+	if (IS_ERR(hashinfo))
+		return ERR_CAST(hashinfo);
+
+	read_lock(&hashinfo->lock);
+	for (slot = 0; slot < RAW_HTABLE_SIZE; slot++) {
+		sk_for_each(s, &hashinfo->ht[slot]) {
+			sk = raw_lookup(net, s, r);
+			if (sk) {
+				/*
+				 * Grab it and keep until we fill
+				 * diag message to be reported, so
+				 * caller should call sock_put then.
+				 * We can do that because we're keeping
+				 * hashinfo->lock here.
+				 */
+				sock_hold(sk);
+				break;
+			}
+		}
+	}
+	read_unlock(&hashinfo->lock);
+
+	return sk ? sk : ERR_PTR(-ENOENT);
+}
+
+static int raw_diag_dump_one(struct sk_buff *in_skb,
+			     const struct nlmsghdr *nlh,
+			     const struct inet_diag_req_v2 *r)
+{
+	struct net *net = sock_net(in_skb->sk);
+	struct sk_buff *rep;
+	struct sock *sk;
+	int err;
+
+	sk = raw_sock_get(net, r);
+	if (IS_ERR(sk))
+		return PTR_ERR(sk);
+
+	rep = nlmsg_new(sizeof(struct inet_diag_msg) +
+			sizeof(struct inet_diag_meminfo) + 64,
+			GFP_KERNEL);
+	if (!rep) {
+		sock_put(sk);
+		return -ENOMEM;
+	}
+
+	err = inet_sk_diag_fill(sk, NULL, rep, r,
+				sk_user_ns(NETLINK_CB(in_skb).sk),
+				NETLINK_CB(in_skb).portid,
+				nlh->nlmsg_seq, 0, nlh,
+				netlink_net_capable(in_skb, CAP_NET_ADMIN));
+	sock_put(sk);
+
+	if (err < 0) {
+		kfree_skb(rep);
+		return err;
+	}
+
+	err = netlink_unicast(net->diag_nlsk, rep,
+			      NETLINK_CB(in_skb).portid,
+			      MSG_DONTWAIT);
+	if (err > 0)
+		err = 0;
+	return err;
+}
+
+static int sk_diag_dump(struct sock *sk, struct sk_buff *skb,
+			struct netlink_callback *cb,
+			const struct inet_diag_req_v2 *r,
+			struct nlattr *bc, bool net_admin)
+{
+	if (!inet_diag_bc_sk(bc, sk))
+		return 0;
+
+	return inet_sk_diag_fill(sk, NULL, skb, r,
+			sk_user_ns(NETLINK_CB(cb->skb).sk),
+			NETLINK_CB(cb->skb).portid,
+			cb->nlh->nlmsg_seq, NLM_F_MULTI,
+			cb->nlh, net_admin);
+}
+
+static void raw_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
+			  const struct inet_diag_req_v2 *r, struct nlattr *bc)
+{
+	bool net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN);
+	struct raw_hashinfo *hashinfo = raw_get_hashinfo(r);
+	struct net *net = sock_net(skb->sk);
+	int num, s_num, slot, s_slot;
+	struct sock *sk = NULL;
+
+	if (IS_ERR(hashinfo))
+		return;
+
+	s_slot = cb->args[0];
+	num = s_num = cb->args[1];
+
+	read_lock(&hashinfo->lock);
+	for (slot = s_slot; slot < RAW_HTABLE_SIZE; s_num = 0, slot++) {
+		num = 0;
+
+		sk_for_each(sk, &hashinfo->ht[slot]) {
+			struct inet_sock *inet = inet_sk(sk);
+
+			if (!net_eq(sock_net(sk), net))
+				continue;
+			if (num < s_num)
+				goto next;
+			if (sk->sk_family != r->sdiag_family)
+				goto next;
+			if (r->id.idiag_sport != inet->inet_sport &&
+			    r->id.idiag_sport)
+				goto next;
+			if (r->id.idiag_dport != inet->inet_dport &&
+			    r->id.idiag_dport)
+				goto next;
+			if (sk_diag_dump(sk, skb, cb, r, bc, net_admin) < 0)
+				goto out_unlock;
+next:
+			num++;
+		}
+	}
+
+out_unlock:
+	read_unlock(&hashinfo->lock);
+
+	cb->args[0] = slot;
+	cb->args[1] = num;
+}
+
+static void raw_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
+			      void *info)
+{
+	r->idiag_rqueue = sk_rmem_alloc_get(sk);
+	r->idiag_wqueue = sk_wmem_alloc_get(sk);
+}
+
+#ifdef CONFIG_INET_DIAG_DESTROY
+static int raw_diag_destroy(struct sk_buff *in_skb,
+			    const struct inet_diag_req_v2 *r)
+{
+	struct net *net = sock_net(in_skb->sk);
+	struct sock *sk;
+
+	sk = raw_sock_get(net, r);
+	if (IS_ERR(sk))
+		return PTR_ERR(sk);
+	return sock_diag_destroy(sk, ECONNABORTED);
+}
+#endif
+
+static const struct inet_diag_handler raw_diag_handler = {
+	.dump			= raw_diag_dump,
+	.dump_one		= raw_diag_dump_one,
+	.idiag_get_info		= raw_diag_get_info,
+	.idiag_type		= IPPROTO_RAW,
+	.idiag_info_size	= 0,
+#ifdef CONFIG_INET_DIAG_DESTROY
+	.destroy		= raw_diag_destroy,
+#endif
+};
+
+static int __init raw_diag_init(void)
+{
+	return inet_diag_register(&raw_diag_handler);
+}
+
+static void __exit raw_diag_exit(void)
+{
+	inet_diag_unregister(&raw_diag_handler);
+}
+
+module_init(raw_diag_init);
+module_exit(raw_diag_exit);
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_NETLINK, NETLINK_SOCK_DIAG, 2-255 /* AF_INET - IPPROTO_RAW */);
+MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_NETLINK, NETLINK_SOCK_DIAG, 10-255 /* AF_INET6 - IPPROTO_RAW */);
Index: linux-ml.git/net/ipv6/raw.c
===================================================================
--- linux-ml.git.orig/net/ipv6/raw.c
+++ linux-ml.git/net/ipv6/raw.c
@@ -65,11 +65,12 @@
 
 #define	ICMPV6_HDRLEN	4	/* ICMPv6 header, RFC 4443 Section 2.1 */
 
-static struct raw_hashinfo raw_v6_hashinfo = {
+struct raw_hashinfo raw_v6_hashinfo = {
 	.lock = __RW_LOCK_UNLOCKED(raw_v6_hashinfo.lock),
 };
+EXPORT_SYMBOL_GPL(raw_v6_hashinfo);
 
-static struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
+struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
 		unsigned short num, const struct in6_addr *loc_addr,
 		const struct in6_addr *rmt_addr, int dif)
 {
@@ -102,6 +103,7 @@ static struct sock *__raw_v6_lookup(stru
 found:
 	return sk;
 }
+EXPORT_SYMBOL_GPL(__raw_v6_lookup);
 
 /*
  *	0 - deliver
@@ -1252,6 +1254,7 @@ struct proto rawv6_prot = {
 	.compat_getsockopt = compat_rawv6_getsockopt,
 	.compat_ioctl	   = compat_rawv6_ioctl,
 #endif
+	.diag_destroy	   = raw_abort,
 };
 
 #ifdef CONFIG_PROC_FS

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28  9:03 [PATCH v5] net: ip, diag -- Add diag interface for raw sockets Cyrill Gorcunov
@ 2016-09-28 10:08 ` Jamal Hadi Salim
  2016-09-28 10:17   ` Cyrill Gorcunov
  0 siblings, 1 reply; 27+ messages in thread
From: Jamal Hadi Salim @ 2016-09-28 10:08 UTC (permalink / raw)
  To: Cyrill Gorcunov, Eric Dumazet, David Ahern
  Cc: netdev, linux-kernel, David Miller, kuznet, jmorris, yoshfuji,
	kaber, avagin, stephen

On 16-09-28 05:03 AM, Cyrill Gorcunov wrote:
> In criu we are actively using diag interface to collect sockets
> present in the system when dumping applications. And while for
> unix, tcp, udp[lite], packet, netlink it works as expected,
> the raw sockets do not have. Thus add it.
>
> v2:
>  - add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@)
>  - implement @destroy for diag requests (by dsa@)
>
> v3:
>  - add export of raw_abort for IPv6 (by dsa@)
>  - pass net-admin flag into inet_sk_diag_fill due to
>    changes in net-next branch (by dsa@)
>
> v4:
>  - use @pad in struct inet_diag_req_v2 for raw socket
>    protocol specification: raw module carries sockets
>    which may have custom protocol passed from socket()
>    syscall and sole @sdiag_protocol is not enough to
>    match underlied ones
>  - start reporting protocol specifed in socket() call
>    when sockets are raw ones for the same reason: user
>    space tools like ss may parse this attribute and use
>    it for socket matching
>
> v5 (by eric.dumazet@):
>  - use sock_hold in raw_sock_get instead of atomic_inc,
>    we're holding (raw_v4_hashinfo|raw_v6_hashinfo)->lock
>    when looking up so counter won't be zero here.
>
> CC: David S. Miller <davem@davemloft.net>
> CC: Eric Dumazet <eric.dumazet@gmail.com>
> CC: David Ahern <dsa@cumulusnetworks.com>
> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> CC: James Morris <jmorris@namei.org>
> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> CC: Patrick McHardy <kaber@trash.net>
> CC: Andrey Vagin <avagin@openvz.org>
> CC: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> ---
>
> Thanks all for feedback! Take a look please once time permit.
>
>  include/net/raw.h              |    6 +
>  include/net/rawv6.h            |    7 +
>  include/uapi/linux/inet_diag.h |    5
>  net/ipv4/Kconfig               |    8 +
>  net/ipv4/Makefile              |    1
>  net/ipv4/inet_diag.c           |    9 +
>  net/ipv4/raw.c                 |   21 +++
>  net/ipv4/raw_diag.c            |  233 +++++++++++++++++++++++++++++++++++++++++
>  net/ipv6/raw.c                 |    7 -
>  9 files changed, 292 insertions(+), 5 deletions(-)
>
> Index: linux-ml.git/include/net/raw.h
> ===================================================================
> --- linux-ml.git.orig/include/net/raw.h
> +++ linux-ml.git/include/net/raw.h
> @@ -23,6 +23,12 @@
>
>  extern struct proto raw_prot;
>
> +extern struct raw_hashinfo raw_v4_hashinfo;
> +struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
> +			     unsigned short num, __be32 raddr,
> +			     __be32 laddr, int dif);
> +
> +int raw_abort(struct sock *sk, int err);
>  void raw_icmp_error(struct sk_buff *, int, u32);
>  int raw_local_deliver(struct sk_buff *, int);
>
> Index: linux-ml.git/include/net/rawv6.h
> ===================================================================
> --- linux-ml.git.orig/include/net/rawv6.h
> +++ linux-ml.git/include/net/rawv6.h
> @@ -3,6 +3,13 @@
>
>  #include <net/protocol.h>
>
> +extern struct raw_hashinfo raw_v6_hashinfo;
> +struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
> +			     unsigned short num, const struct in6_addr *loc_addr,
> +			     const struct in6_addr *rmt_addr, int dif);
> +
> +int raw_abort(struct sock *sk, int err);
> +
>  void raw6_icmp_error(struct sk_buff *, int nexthdr,
>  		u8 type, u8 code, int inner_offset, __be32);
>  bool raw6_local_deliver(struct sk_buff *, int);
> Index: linux-ml.git/include/uapi/linux/inet_diag.h
> ===================================================================
> --- linux-ml.git.orig/include/uapi/linux/inet_diag.h
> +++ linux-ml.git/include/uapi/linux/inet_diag.h
> @@ -38,7 +38,10 @@ struct inet_diag_req_v2 {
>  	__u8	sdiag_family;
>  	__u8	sdiag_protocol;
>  	__u8	idiag_ext;
> -	__u8	pad;
> +	union {
> +		__u8	pad;
> +		__u8	sdiag_raw_protocol;	/* SOCK_RAW only, @pad for others */
> +	};


Above looks funny. Why is it a union? pad is for exposing a byte-hole
for padding/alignment reasons and i doubt anybody is using it.
Should you not just rename it?
Also I notice when things like __raw_v4_lookup() are claiming it is 
unsigned short instead of a u8?

cheers,
jamal

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 10:08 ` Jamal Hadi Salim
@ 2016-09-28 10:17   ` Cyrill Gorcunov
  2016-09-28 10:43     ` Jamal Hadi Salim
  0 siblings, 1 reply; 27+ messages in thread
From: Cyrill Gorcunov @ 2016-09-28 10:17 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Eric Dumazet, David Ahern, netdev, linux-kernel, David Miller,
	kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On Wed, Sep 28, 2016 at 06:08:00AM -0400, Jamal Hadi Salim wrote:
...
> > @@ -38,7 +38,10 @@ struct inet_diag_req_v2 {
> >  	__u8	sdiag_family;
> >  	__u8	sdiag_protocol;
> >  	__u8	idiag_ext;
> > -	__u8	pad;
> > +	union {
> > +		__u8	pad;
> > +		__u8	sdiag_raw_protocol;	/* SOCK_RAW only, @pad for others */
> > +	};
> 
> 
> Above looks funny. Why is it a union? pad is for exposing a byte-hole
> for padding/alignment reasons and i doubt anybody is using it.

Someone may have set it to zero explicitly on source level, and the
compilation will fail on new kernel then. So no, keeping the name
is reasonable.

> Should you not just rename it?
> Also I notice when things like __raw_v4_lookup() are claiming it is unsigned
> short instead of a u8?

The protocol is still up to 255 for a while, is it expected that IPPROTO_MAX
will be increased in more-less near future? Of course I can drop the idea
of using @pad here and switch to some extended reauest but prefer to stick
with simplier solution. Hm?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 10:17   ` Cyrill Gorcunov
@ 2016-09-28 10:43     ` Jamal Hadi Salim
  2016-09-28 10:51       ` Cyrill Gorcunov
  0 siblings, 1 reply; 27+ messages in thread
From: Jamal Hadi Salim @ 2016-09-28 10:43 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Eric Dumazet, David Ahern, netdev, linux-kernel, David Miller,
	kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On 16-09-28 06:17 AM, Cyrill Gorcunov wrote:
> On Wed, Sep 28, 2016 at 06:08:00AM -0400, Jamal Hadi Salim wrote:
> ...
>>> @@ -38,7 +38,10 @@ struct inet_diag_req_v2 {
>>>  	__u8	sdiag_family;
>>>  	__u8	sdiag_protocol;
>>>  	__u8	idiag_ext;
>>> -	__u8	pad;
>>> +	union {
>>> +		__u8	pad;
>>> +		__u8	sdiag_raw_protocol;	/* SOCK_RAW only, @pad for others */
>>> +	};
>>
>>
>> Above looks funny. Why is it a union? pad is for exposing a byte-hole
>> for padding/alignment reasons and i doubt anybody is using it.
>
> Someone may have set it to zero explicitly on source level, and the
> compilation will fail on new kernel then. So no, keeping the name
> is reasonable.
>

I dont know how compilation will fail but you may be right with note:
that is not how pads have been used in the past. They are supposed to
cosmetic annotation which indicates "here's a hole; use it in the
future if you are looking to add something". And someone in the
future can claim them. I am not sure if MBZ philosophy applies.

>> Should you not just rename it?
>> Also I notice when things like __raw_v4_lookup() are claiming it is unsigned
>> short instead of a u8?
>
> The protocol is still up to 255 for a while, is it expected that IPPROTO_MAX
> will be increased in more-less near future? Of course I can drop the idea
> of using @pad here and switch to some extended reauest but prefer to stick
> with simplier solution. Hm?
>

Ok. If i understood correctly it was already unsigned short before your
patch -so i agree it doesnt matter. Maybe just put a comment to express
that if ever protocol goes above 255 it wont be sufficient.



cheers,
jamal

PS:- sorry for butting in the discussion - i blame it on coffee.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 10:43     ` Jamal Hadi Salim
@ 2016-09-28 10:51       ` Cyrill Gorcunov
  2016-09-28 11:06         ` Jamal Hadi Salim
  0 siblings, 1 reply; 27+ messages in thread
From: Cyrill Gorcunov @ 2016-09-28 10:51 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Eric Dumazet, David Ahern, netdev, linux-kernel, David Miller,
	kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On Wed, Sep 28, 2016 at 06:43:01AM -0400, Jamal Hadi Salim wrote:
...
> > 
> > Someone may have set it to zero explicitly on source level, and the
> > compilation will fail on new kernel then. So no, keeping the name
> > is reasonable.
> > 
> 
> I dont know how compilation will fail but you may be right with note:
> that is not how pads have been used in the past. They are supposed to
> cosmetic annotation which indicates "here's a hole; use it in the
> future if you are looking to add something". And someone in the
> future can claim them. I am not sure if MBZ philosophy applies.

This structure is uapi, so anyone has complete rights to reference
@pad in the userspace programs. Sure it would be more clear to remove
the @pad completely, but if we choose so I think it's better to do
on top instead and then if someone complain we can easily revert
the single trivial commit instead of this big patch.

> 
> > > Should you not just rename it?
> > > Also I notice when things like __raw_v4_lookup() are claiming it is unsigned
> > > short instead of a u8?
> > 
> > The protocol is still up to 255 for a while, is it expected that IPPROTO_MAX
> > will be increased in more-less near future? Of course I can drop the idea
> > of using @pad here and switch to some extended reauest but prefer to stick
> > with simplier solution. Hm?
> > 
> 
> Ok. If i understood correctly it was already unsigned short before your
> patch -so i agree it doesnt matter. Maybe just put a comment to express
> that if ever protocol goes above 255 it wont be sufficient.

If protocol goes over u8 then complete inet_diag_req_v2 structure will
have to be reworked becaue @sdiag_protocol is u8 as well. IOW, once
someone liftup IPPROTO_MAX > 255, he will notice the problem immediately
because diag for such module simply stop working properly.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 10:51       ` Cyrill Gorcunov
@ 2016-09-28 11:06         ` Jamal Hadi Salim
  2016-09-28 11:27           ` Cyrill Gorcunov
  0 siblings, 1 reply; 27+ messages in thread
From: Jamal Hadi Salim @ 2016-09-28 11:06 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Eric Dumazet, David Ahern, netdev, linux-kernel, David Miller,
	kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On 16-09-28 06:51 AM, Cyrill Gorcunov wrote:
> On Wed, Sep 28, 2016 at 06:43:01AM -0400, Jamal Hadi Salim wrote:

[..]
>> I dont know how compilation will fail but you may be right with note:
>> that is not how pads have been used in the past. They are supposed to
>> cosmetic annotation which indicates "here's a hole; use it in the
>> future if you are looking to add something". And someone in the
>> future can claim them. I am not sure if MBZ philosophy applies.
>
> This structure is uapi, so anyone has complete rights to reference
> @pad in the userspace programs. Sure it would be more clear to remove
> the @pad completely, but if we choose so I think it's better to do
> on top instead and then if someone complain we can easily revert
> the single trivial commit instead of this big patch.

I am conflicted.
A field labelled "pad" does not appear to be valid as "UAPI". It is
a cosmetic indicator. If you did sizeof() with or without it being
present the value doesnt change.
BTW: There is at least one major structure in inet diag has a hole
today and doesnt have a padding indicator.

> If protocol goes over u8 then complete inet_diag_req_v2 structure will
> have to be reworked becaue @sdiag_protocol is u8 as well. IOW, once
> someone liftup IPPROTO_MAX > 255, he will notice the problem immediately
> because diag for such module simply stop working properly.
>

ok.

cheers,
jamal

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 11:06         ` Jamal Hadi Salim
@ 2016-09-28 11:27           ` Cyrill Gorcunov
  2016-09-28 12:06             ` Jamal Hadi Salim
                               ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Cyrill Gorcunov @ 2016-09-28 11:27 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Eric Dumazet, David Ahern, netdev, linux-kernel, David Miller,
	kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On Wed, Sep 28, 2016 at 07:06:26AM -0400, Jamal Hadi Salim wrote:
> > 
> > This structure is uapi, so anyone has complete rights to reference
> > @pad in the userspace programs. Sure it would be more clear to remove
> > the @pad completely, but if we choose so I think it's better to do
> > on top instead and then if someone complain we can easily revert
> > the single trivial commit instead of this big patch.
> 
> I am conflicted.
> A field labelled "pad" does not appear to be valid as "UAPI". It is
> a cosmetic indicator. If you did sizeof() with or without it being
> present the value doesnt change.

I think you miss the point what I'm trying to say: currently end-user
may have reference to this member (for any reason) and his program
will compile and run. If we change the name the compilation procedure
fails and this will break API. Yes, referrning @pad is bad idea for
userspace code, and yes (!) better to simply rename it but lets do
that later, on top, so that if we break something in userspace
we could easily revert the oneline change.

> BTW: There is at least one major structure in inet diag has a hole
> today and doesnt have a padding indicator.
> 
> > If protocol goes over u8 then complete inet_diag_req_v2 structure will
> > have to be reworked becaue @sdiag_protocol is u8 as well. IOW, once
> > someone liftup IPPROTO_MAX > 255, he will notice the problem immediately
> > because diag for such module simply stop working properly.
> > 
> 
> ok.
> 
> cheers,
> jamal
> 

	Cyrill

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 11:27           ` Cyrill Gorcunov
@ 2016-09-28 12:06             ` Jamal Hadi Salim
  2016-09-28 12:09               ` David Miller
  2016-09-28 12:21               ` Cyrill Gorcunov
  2016-09-28 12:07             ` David Miller
  2016-09-28 12:57             ` Eric Dumazet
  2 siblings, 2 replies; 27+ messages in thread
From: Jamal Hadi Salim @ 2016-09-28 12:06 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Eric Dumazet, David Ahern, netdev, linux-kernel, David Miller,
	kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On 16-09-28 07:27 AM, Cyrill Gorcunov wrote:
> On Wed, Sep 28, 2016 at 07:06:26AM -0400, Jamal Hadi Salim wrote:
>>>
>>> This structure is uapi, so anyone has complete rights to reference
>>> @pad in the userspace programs. Sure it would be more clear to remove
>>> the @pad completely, but if we choose so I think it's better to do
>>> on top instead and then if someone complain we can easily revert
>>> the single trivial commit instead of this big patch.
>>
>> I am conflicted.
>> A field labelled "pad" does not appear to be valid as "UAPI". It is
>> a cosmetic indicator. If you did sizeof() with or without it being
>> present the value doesnt change.
>
> I think you miss the point what I'm trying to say: currently end-user
> may have reference to this member (for any reason) and his program
> will compile and run. If we change the name the compilation procedure
> fails and this will break API. Yes, referrning @pad is bad idea for
> userspace code, and yes (!) better to simply rename it but lets do
> that later, on top, so that if we break something in userspace
> we could easily revert the oneline change.
>


I understood well your point;-> Maybe my response was not clear:
_nobody should be fscking fondling pad fields_ setting them or
otherwise.
Maybe let these programs fail. I asked if you knew any such app which
did anything with a pad field.

cheers,
jamal

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 11:27           ` Cyrill Gorcunov
  2016-09-28 12:06             ` Jamal Hadi Salim
@ 2016-09-28 12:07             ` David Miller
  2016-09-28 12:09               ` Jamal Hadi Salim
  2016-09-28 12:18               ` Cyrill Gorcunov
  2016-09-28 12:57             ` Eric Dumazet
  2 siblings, 2 replies; 27+ messages in thread
From: David Miller @ 2016-09-28 12:07 UTC (permalink / raw)
  To: gorcunov
  Cc: jhs, eric.dumazet, dsa, netdev, linux-kernel, kuznet, jmorris,
	yoshfuji, kaber, avagin, stephen

From: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Wed, 28 Sep 2016 14:27:03 +0300

> On Wed, Sep 28, 2016 at 07:06:26AM -0400, Jamal Hadi Salim wrote:
>> > 
>> > This structure is uapi, so anyone has complete rights to reference
>> > @pad in the userspace programs. Sure it would be more clear to remove
>> > the @pad completely, but if we choose so I think it's better to do
>> > on top instead and then if someone complain we can easily revert
>> > the single trivial commit instead of this big patch.
>> 
>> I am conflicted.
>> A field labelled "pad" does not appear to be valid as "UAPI". It is
>> a cosmetic indicator. If you did sizeof() with or without it being
>> present the value doesnt change.
> 
> I think you miss the point what I'm trying to say: currently end-user
> may have reference to this member (for any reason) and his program
> will compile and run. If we change the name the compilation procedure
> fails and this will break API. Yes, referrning @pad is bad idea for
> userspace code, and yes (!) better to simply rename it but lets do
> that later, on top, so that if we break something in userspace
> we could easily revert the oneline change.

Right, it would be legal for an existing user to have code that
explicitly initializes every member of the structure, including 'pad'.
So we have to keep that member around, at a minimum, for their sake.

>> BTW: There is at least one major structure in inet diag has a hole
>> today and doesnt have a padding indicator.
>> 
>> > If protocol goes over u8 then complete inet_diag_req_v2 structure will
>> > have to be reworked becaue @sdiag_protocol is u8 as well. IOW, once
>> > someone liftup IPPROTO_MAX > 255, he will notice the problem immediately
>> > because diag for such module simply stop working properly.
>> > 
>> 
>> ok.

Indeed, we need a 16-bit value here.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 12:06             ` Jamal Hadi Salim
@ 2016-09-28 12:09               ` David Miller
  2016-09-28 12:21               ` Cyrill Gorcunov
  1 sibling, 0 replies; 27+ messages in thread
From: David Miller @ 2016-09-28 12:09 UTC (permalink / raw)
  To: jhs
  Cc: gorcunov, eric.dumazet, dsa, netdev, linux-kernel, kuznet,
	jmorris, yoshfuji, kaber, avagin, stephen

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Wed, 28 Sep 2016 08:06:51 -0400

> I understood well your point;-> Maybe my response was not clear:
> _nobody should be fscking fondling pad fields_ setting them or
> otherwise.

Especially considering potential future uses of the field, existing
users absolutely must zero out the field.

Whether this is via a memset() of the entire structure or via
an explicit initialization to zero is their choice.

So setting 'pad' is in fact valid.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 12:07             ` David Miller
@ 2016-09-28 12:09               ` Jamal Hadi Salim
  2016-09-28 12:16                 ` David Miller
  2016-09-28 12:18               ` Cyrill Gorcunov
  1 sibling, 1 reply; 27+ messages in thread
From: Jamal Hadi Salim @ 2016-09-28 12:09 UTC (permalink / raw)
  To: David Miller, gorcunov
  Cc: eric.dumazet, dsa, netdev, linux-kernel, kuznet, jmorris,
	yoshfuji, kaber, avagin, stephen

On 16-09-28 08:07 AM, David Miller wrote:

> Right, it would be legal for an existing user to have code that
> explicitly initializes every member of the structure, including 'pad'.
> So we have to keep that member around, at a minimum, for their sake.
>

I think we need to start labelling any new pad fields added as
"Not UAPI. Do not fsck fondle this".

cheers,
jamal

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 12:09               ` Jamal Hadi Salim
@ 2016-09-28 12:16                 ` David Miller
  2016-09-28 12:27                   ` Jamal Hadi Salim
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2016-09-28 12:16 UTC (permalink / raw)
  To: jhs
  Cc: gorcunov, eric.dumazet, dsa, netdev, linux-kernel, kuznet,
	jmorris, yoshfuji, kaber, avagin, stephen

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Wed, 28 Sep 2016 08:09:28 -0400

> On 16-09-28 08:07 AM, David Miller wrote:
> 
>> Right, it would be legal for an existing user to have code that
>> explicitly initializes every member of the structure, including 'pad'.
>> So we have to keep that member around, at a minimum, for their sake.
>>
> 
> I think we need to start labelling any new pad fields added as
> "Not UAPI. Do not fsck fondle this".

They must initialize it to zero.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 12:07             ` David Miller
  2016-09-28 12:09               ` Jamal Hadi Salim
@ 2016-09-28 12:18               ` Cyrill Gorcunov
  1 sibling, 0 replies; 27+ messages in thread
From: Cyrill Gorcunov @ 2016-09-28 12:18 UTC (permalink / raw)
  To: David Miller
  Cc: jhs, eric.dumazet, dsa, netdev, linux-kernel, kuznet, jmorris,
	yoshfuji, kaber, avagin, stephen

On Wed, Sep 28, 2016 at 08:07:01AM -0400, David Miller wrote:
...
> > 
> > I think you miss the point what I'm trying to say: currently end-user
> > may have reference to this member (for any reason) and his program
> > will compile and run. If we change the name the compilation procedure
> > fails and this will break API. Yes, referrning @pad is bad idea for
> > userspace code, and yes (!) better to simply rename it but lets do
> > that later, on top, so that if we break something in userspace
> > we could easily revert the oneline change.
> 
> Right, it would be legal for an existing user to have code that
> explicitly initializes every member of the structure, including 'pad'.
> So we have to keep that member around, at a minimum, for their sake.

+1

> 
> >> BTW: There is at least one major structure in inet diag has a hole
> >> today and doesnt have a padding indicator.
> >> 
> >> > If protocol goes over u8 then complete inet_diag_req_v2 structure will
> >> > have to be reworked becaue @sdiag_protocol is u8 as well. IOW, once
> >> > someone liftup IPPROTO_MAX > 255, he will notice the problem immediately
> >> > because diag for such module simply stop working properly.
> 
> Indeed, we need a 16-bit value here.

Yes, and we will need inet_diag_req_v3 for this sake ;) I think
we can even introduce it early and convert _v2 to _v3 transparently
inside kernel. I could start working on such change if people agreed
(but a bit latter, on the next week probably)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 12:06             ` Jamal Hadi Salim
  2016-09-28 12:09               ` David Miller
@ 2016-09-28 12:21               ` Cyrill Gorcunov
  1 sibling, 0 replies; 27+ messages in thread
From: Cyrill Gorcunov @ 2016-09-28 12:21 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Eric Dumazet, David Ahern, netdev, linux-kernel, David Miller,
	kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On Wed, Sep 28, 2016 at 08:06:51AM -0400, Jamal Hadi Salim wrote:
> 
> I understood well your point;-> Maybe my response was not clear:
> _nobody should be fscking fondling pad fields_ setting them or
> otherwise.
> Maybe let these programs fail. I asked if you knew any such app which
> did anything with a pad field.

Nope, I don't know such programs. criu and iproute2 both zeroify
this strcuture explicitly by whole. But who knows where else this
structure used in userspace world.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 12:16                 ` David Miller
@ 2016-09-28 12:27                   ` Jamal Hadi Salim
  2016-09-28 12:38                     ` Jamal Hadi Salim
  2016-09-28 12:45                     ` Cyrill Gorcunov
  0 siblings, 2 replies; 27+ messages in thread
From: Jamal Hadi Salim @ 2016-09-28 12:27 UTC (permalink / raw)
  To: David Miller
  Cc: gorcunov, eric.dumazet, dsa, netdev, linux-kernel, kuznet,
	jmorris, yoshfuji, kaber, avagin, stephen

On 16-09-28 08:16 AM, David Miller wrote:
> From: Jamal Hadi Salim <jhs@mojatatu.com>
> Date: Wed, 28 Sep 2016 08:09:28 -0400
>
>> On 16-09-28 08:07 AM, David Miller wrote:
>>
>>> Right, it would be legal for an existing user to have code that
>>> explicitly initializes every member of the structure, including 'pad'.
>>> So we have to keep that member around, at a minimum, for their sake.
>>>
>>
>> I think we need to start labelling any new pad fields added as
>> "Not UAPI. Do not fsck fondle this".
>
> They must initialize it to zero.
>

What if in the future actually meant to use 0 for
something?;-> For example in Cyrill's case it means PROTO_IP
Not sure if it useful to interpret or not but it is part of the
enum for protocols.

Maybe we shouldnt be adding pad fields in these netlink
structure definitions then one can liberally add new ones.
Note: inet_diag somewhere has a netlink structure that
has a hole. I pointed it out to Eric D. and he said we cant
add it now because it would break ABI.
So where do we draw the line for future extensions?

cheers,
jamal

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 12:27                   ` Jamal Hadi Salim
@ 2016-09-28 12:38                     ` Jamal Hadi Salim
  2016-09-28 12:45                     ` Cyrill Gorcunov
  1 sibling, 0 replies; 27+ messages in thread
From: Jamal Hadi Salim @ 2016-09-28 12:38 UTC (permalink / raw)
  To: David Miller
  Cc: gorcunov, eric.dumazet, dsa, netdev, linux-kernel, kuznet,
	jmorris, yoshfuji, kaber, avagin, stephen

On 16-09-28 08:27 AM, Jamal Hadi Salim wrote:
> On 16-09-28 08:16 AM, David Miller wrote:
>> From: Jamal Hadi Salim <jhs@mojatatu.com>
>> Date: Wed, 28 Sep 2016 08:09:28 -0400
>>
>>> On 16-09-28 08:07 AM, David Miller wrote:
>>>
>>>> Right, it would be legal for an existing user to have code that
>>>> explicitly initializes every member of the structure, including 'pad'.
>>>> So we have to keep that member around, at a minimum, for their sake.
>>>>
>>>
>>> I think we need to start labelling any new pad fields added as
>>> "Not UAPI. Do not fsck fondle this".
>>
>> They must initialize it to zero.
>>
>
> What if in the future actually meant to use 0 for
> something?;-> For example in Cyrill's case it means PROTO_IP
> Not sure if it useful to interpret or not but it is part of the
> enum for protocols.

I just tested with: socket(AF_INET, SOCK_RAW, 0);

and got back: EPROTONOSUPPORT

So in this case doesnt matter. But my point stands i.e 0 could mean
something.

cheers,
jamal

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 12:27                   ` Jamal Hadi Salim
  2016-09-28 12:38                     ` Jamal Hadi Salim
@ 2016-09-28 12:45                     ` Cyrill Gorcunov
  2016-09-28 12:50                       ` Jamal Hadi Salim
  1 sibling, 1 reply; 27+ messages in thread
From: Cyrill Gorcunov @ 2016-09-28 12:45 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: David Miller, eric.dumazet, dsa, netdev, linux-kernel, kuznet,
	jmorris, yoshfuji, kaber, avagin, stephen

On Wed, Sep 28, 2016 at 08:27:24AM -0400, Jamal Hadi Salim wrote:
> > 
> > They must initialize it to zero.
> > 
> 
> What if in the future actually meant to use 0 for
> something?;-> For example in Cyrill's case it means PROTO_IP
> Not sure if it useful to interpret or not but it is part of the
> enum for protocols.

It will be perfectly fine if we start using 0 here for something,
the main match key is @sdiag_proto (which will be IPPROTO_RAW for
my case). If someone is to use this field for something else the
main key selection remain the same, iow @sdiag_proto first and
then subprotocol if needed.

> Maybe we shouldnt be adding pad fields in these netlink
> structure definitions then one can liberally add new ones.

You know, I personally don't see much problem in defining union,
especially while anonymous uninons do work for us here.

> Note: inet_diag somewhere has a netlink structure that
> has a hole. I pointed it out to Eric D. and he said we cant
> add it now because it would break ABI.

Naming holes generated by a compiler for alignment sake should
not break abi (because alignments are abi by self), but I may
be missing something in context of where the structure you're
talking about is present. And what about non-x86 archs? They
might have different members alignment requirements.

> So where do we draw the line for future extensions?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 12:45                     ` Cyrill Gorcunov
@ 2016-09-28 12:50                       ` Jamal Hadi Salim
  2016-09-28 12:57                         ` Cyrill Gorcunov
  0 siblings, 1 reply; 27+ messages in thread
From: Jamal Hadi Salim @ 2016-09-28 12:50 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: David Miller, eric.dumazet, dsa, netdev, linux-kernel, kuznet,
	jmorris, yoshfuji, kaber, avagin, stephen

On 16-09-28 08:45 AM, Cyrill Gorcunov wrote:

>> Note: inet_diag somewhere has a netlink structure that
>> has a hole. I pointed it out to Eric D. and he said we cant
>> add it now because it would break ABI.
>
> Naming holes generated by a compiler for alignment sake should
> not break abi (because alignments are abi by self), but I may
> be missing something in context of where the structure you're
> talking about is present. And what about non-x86 archs? They
> might have different members alignment requirements.
>

struct tcp_info.

Sorry - i didnt mean to drag this for long; but the more i think
about it, the union is a pragmatic approach.

cheers,
jamal

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 11:27           ` Cyrill Gorcunov
  2016-09-28 12:06             ` Jamal Hadi Salim
  2016-09-28 12:07             ` David Miller
@ 2016-09-28 12:57             ` Eric Dumazet
  2016-09-28 13:02               ` Eric Dumazet
  2016-09-28 13:03               ` Cyrill Gorcunov
  2 siblings, 2 replies; 27+ messages in thread
From: Eric Dumazet @ 2016-09-28 12:57 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Jamal Hadi Salim, David Ahern, netdev, linux-kernel,
	David Miller, kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On Wed, 2016-09-28 at 14:27 +0300, Cyrill Gorcunov wrote:
> On Wed, Sep 28, 2016 at 07:06:26AM -0400, Jamal Hadi Salim wrote:
> > > 
> > > This structure is uapi, so anyone has complete rights to reference
> > > @pad in the userspace programs. Sure it would be more clear to remove
> > > the @pad completely, but if we choose so I think it's better to do
> > > on top instead and then if someone complain we can easily revert
> > > the single trivial commit instead of this big patch.
> > 
> > I am conflicted.
> > A field labelled "pad" does not appear to be valid as "UAPI". It is
> > a cosmetic indicator. If you did sizeof() with or without it being
> > present the value doesnt change.
> 
> I think you miss the point what I'm trying to say: currently end-user
> may have reference to this member (for any reason) and his program
> will compile and run. If we change the name the compilation procedure
> fails and this will break API. Yes, referrning @pad is bad idea for
> userspace code, and yes (!) better to simply rename it but lets do
> that later, on top, so that if we break something in userspace
> we could easily revert the oneline change.

Note that some programs could fail to compile with the added union
anyway.

Some gcc versions are unable to compile a static init with an union

struct inet_diag_req_v2 foo = { .pad = 0, sdiag_family = AF_INET, };

When I cooked my recent fq commit I simply removed a pad and replaced
it :

git show fefa569a9d4bc4 -- include

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 12:50                       ` Jamal Hadi Salim
@ 2016-09-28 12:57                         ` Cyrill Gorcunov
  0 siblings, 0 replies; 27+ messages in thread
From: Cyrill Gorcunov @ 2016-09-28 12:57 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: David Miller, eric.dumazet, dsa, netdev, linux-kernel, kuznet,
	jmorris, yoshfuji, kaber, avagin, stephen

On Wed, Sep 28, 2016 at 08:50:31AM -0400, Jamal Hadi Salim wrote:
> 
> struct tcp_info.

Yeah I see. As I said naming pads will be safe but to do so we
will have to compile on every arch we support and make sure the
implicit pad remains here.

> Sorry - i didnt mean to drag this for long; but the more i think
> about it, the union is a pragmatic approach.

Sure, lets keep it as is.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 12:57             ` Eric Dumazet
@ 2016-09-28 13:02               ` Eric Dumazet
  2016-09-28 13:09                 ` Cyrill Gorcunov
  2016-09-28 13:03               ` Cyrill Gorcunov
  1 sibling, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2016-09-28 13:02 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Jamal Hadi Salim, David Ahern, netdev, linux-kernel,
	David Miller, kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On Wed, 2016-09-28 at 05:57 -0700, Eric Dumazet wrote:

> Note that some programs could fail to compile with the added union
> anyway.
> 
> Some gcc versions are unable to compile a static init with an union
> 
> struct inet_diag_req_v2 foo = { .pad = 0, sdiag_family = AF_INET, };
> 
> When I cooked my recent fq commit I simply removed a pad and replaced
> it :
> 
> git show fefa569a9d4bc4 -- include

This is a bit different of course, since struct tc_fq_qd_stats is only
one way : Kernel produces the content and gives it to user space.

User space should probably not need to initialize such a structure, but
who knows what a programmer can write ;)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 12:57             ` Eric Dumazet
  2016-09-28 13:02               ` Eric Dumazet
@ 2016-09-28 13:03               ` Cyrill Gorcunov
  2016-09-28 13:29                 ` Eric Dumazet
  1 sibling, 1 reply; 27+ messages in thread
From: Cyrill Gorcunov @ 2016-09-28 13:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jamal Hadi Salim, David Ahern, netdev, linux-kernel,
	David Miller, kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On Wed, Sep 28, 2016 at 05:57:12AM -0700, Eric Dumazet wrote:
...
> Note that some programs could fail to compile with the added union
> anyway.
> 
> Some gcc versions are unable to compile a static init with an union
> 
> struct inet_diag_req_v2 foo = { .pad = 0, sdiag_family = AF_INET, };
> 
> When I cooked my recent fq commit I simply removed a pad and replaced
> it :
> 
> git show fefa569a9d4bc4 -- include

Oh, crap :( I've been looking into uapi headers, found that we
use anonymous unions (for example include/uapi/linux/bcache.h)
and thought it will be safe (and my test builds didn't fail).
Are you happen to know which gcc versions cant do that?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 13:02               ` Eric Dumazet
@ 2016-09-28 13:09                 ` Cyrill Gorcunov
  0 siblings, 0 replies; 27+ messages in thread
From: Cyrill Gorcunov @ 2016-09-28 13:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jamal Hadi Salim, David Ahern, netdev, linux-kernel,
	David Miller, kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On Wed, Sep 28, 2016 at 06:02:11AM -0700, Eric Dumazet wrote:
> 
> This is a bit different of course, since struct tc_fq_qd_stats is only
> one way : Kernel produces the content and gives it to user space.
> 
> User space should probably not need to initialize such a structure, but
> who knows what a programmer can write ;)

It's the same issue as I have: the users may have initialized tc_fq_qd_stats for
some reson. But since nobody complained yet about such change -- it is a good
sign of catching the luck :)

You know at the moment I'm not sure how to be then -- either leave it as
uninon, either rename it (via patch on top), or even simply require the
protocol to be passed as additional data in netlink packet...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 13:03               ` Cyrill Gorcunov
@ 2016-09-28 13:29                 ` Eric Dumazet
  2016-09-28 13:43                   ` Cyrill Gorcunov
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2016-09-28 13:29 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Jamal Hadi Salim, David Ahern, netdev, linux-kernel,
	David Miller, kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On Wed, 2016-09-28 at 16:03 +0300, Cyrill Gorcunov wrote:
> On Wed, Sep 28, 2016 at 05:57:12AM -0700, Eric Dumazet wrote:
> ...
> > Note that some programs could fail to compile with the added union
> > anyway.
> > 
> > Some gcc versions are unable to compile a static init with an union
> > 
> > struct inet_diag_req_v2 foo = { .pad = 0, sdiag_family = AF_INET, };
> > 
> > When I cooked my recent fq commit I simply removed a pad and replaced
> > it :
> > 
> > git show fefa569a9d4bc4 -- include
> 
> Oh, crap :( I've been looking into uapi headers, found that we
> use anonymous unions (for example include/uapi/linux/bcache.h)
> and thought it will be safe (and my test builds didn't fail).
> Are you happen to know which gcc versions cant do that?

The most recent example I have in mind is a kbuild bot report on the
recent TCP BBR patches. We had to rework the patch to avoid the problem.

If I remember well, this was a gcc-4.7, but not on x86.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 13:29                 ` Eric Dumazet
@ 2016-09-28 13:43                   ` Cyrill Gorcunov
  2016-09-28 13:49                     ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Cyrill Gorcunov @ 2016-09-28 13:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jamal Hadi Salim, David Ahern, netdev, linux-kernel,
	David Miller, kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On Wed, Sep 28, 2016 at 06:29:08AM -0700, Eric Dumazet wrote:
> > 
> > Oh, crap :( I've been looking into uapi headers, found that we
> > use anonymous unions (for example include/uapi/linux/bcache.h)
> > and thought it will be safe (and my test builds didn't fail).
> > Are you happen to know which gcc versions cant do that?
> 
> The most recent example I have in mind is a kbuild bot report on the
> recent TCP BBR patches. We had to rework the patch to avoid the problem.
> 
> If I remember well, this was a gcc-4.7, but not on x86.

4.7 is pretty widespread, so I've to think...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 13:43                   ` Cyrill Gorcunov
@ 2016-09-28 13:49                     ` Eric Dumazet
  2016-09-28 14:55                       ` Cyrill Gorcunov
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2016-09-28 13:49 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Jamal Hadi Salim, David Ahern, netdev, linux-kernel,
	David Miller, kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On Wed, 2016-09-28 at 16:43 +0300, Cyrill Gorcunov wrote:
> On Wed, Sep 28, 2016 at 06:29:08AM -0700, Eric Dumazet wrote:
> > > 
> > > Oh, crap :( I've been looking into uapi headers, found that we
> > > use anonymous unions (for example include/uapi/linux/bcache.h)
> > > and thought it will be safe (and my test builds didn't fail).
> > > Are you happen to know which gcc versions cant do that?
> > 
> > The most recent example I have in mind is a kbuild bot report on the
> > recent TCP BBR patches. We had to rework the patch to avoid the problem.
> > 
> > If I remember well, this was a gcc-4.7, but not on x86.
> 
> 4.7 is pretty widespread, so I've to think...

Sorry, 4.4.7 it was

https://www.mail-archive.com/netdev@vger.kernel.org/msg128714.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] net: ip, diag -- Add diag interface for raw sockets
  2016-09-28 13:49                     ` Eric Dumazet
@ 2016-09-28 14:55                       ` Cyrill Gorcunov
  0 siblings, 0 replies; 27+ messages in thread
From: Cyrill Gorcunov @ 2016-09-28 14:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jamal Hadi Salim, David Ahern, netdev, linux-kernel,
	David Miller, kuznet, jmorris, yoshfuji, kaber, avagin, stephen

On Wed, Sep 28, 2016 at 06:49:51AM -0700, Eric Dumazet wrote:
> > 
> > 4.7 is pretty widespread, so I've to think...
> 
> Sorry, 4.4.7 it was
> 
> https://www.mail-archive.com/netdev@vger.kernel.org/msg128714.html

Ah, thanks for info!

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-09-28 14:55 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-28  9:03 [PATCH v5] net: ip, diag -- Add diag interface for raw sockets Cyrill Gorcunov
2016-09-28 10:08 ` Jamal Hadi Salim
2016-09-28 10:17   ` Cyrill Gorcunov
2016-09-28 10:43     ` Jamal Hadi Salim
2016-09-28 10:51       ` Cyrill Gorcunov
2016-09-28 11:06         ` Jamal Hadi Salim
2016-09-28 11:27           ` Cyrill Gorcunov
2016-09-28 12:06             ` Jamal Hadi Salim
2016-09-28 12:09               ` David Miller
2016-09-28 12:21               ` Cyrill Gorcunov
2016-09-28 12:07             ` David Miller
2016-09-28 12:09               ` Jamal Hadi Salim
2016-09-28 12:16                 ` David Miller
2016-09-28 12:27                   ` Jamal Hadi Salim
2016-09-28 12:38                     ` Jamal Hadi Salim
2016-09-28 12:45                     ` Cyrill Gorcunov
2016-09-28 12:50                       ` Jamal Hadi Salim
2016-09-28 12:57                         ` Cyrill Gorcunov
2016-09-28 12:18               ` Cyrill Gorcunov
2016-09-28 12:57             ` Eric Dumazet
2016-09-28 13:02               ` Eric Dumazet
2016-09-28 13:09                 ` Cyrill Gorcunov
2016-09-28 13:03               ` Cyrill Gorcunov
2016-09-28 13:29                 ` Eric Dumazet
2016-09-28 13:43                   ` Cyrill Gorcunov
2016-09-28 13:49                     ` Eric Dumazet
2016-09-28 14:55                       ` Cyrill Gorcunov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.