* [PATCH net-next v2 0/2] ipv4: per-datagram IP_TOS and IP_TTL via sendmsg()
@ 2013-08-23 12:19 Francesco Fusco
  2013-08-23 12:19 ` [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data Francesco Fusco
  2013-08-23 12:19 ` [PATCH net-next v2 2/2] ipv4: processing ancillary IP_TOS or IP_TTL Francesco Fusco
  0 siblings, 2 replies; 7+ messages in thread
From: Francesco Fusco @ 2013-08-23 12:19 UTC (permalink / raw)
  To: davem; +Cc: netdev

IPv4 offers no way to set the TOS field on a per-packet basis, whereas IPv6
does. With IPv4, one therefore has to fall back to the per-socket
setsockopt(IP_TOS).

Using the existing per-socket option is inconvenient, particularly when
multiple threads share the same socket but need different TOS values:
every sendmsg() would have to be preceded by a setsockopt() call, and
another thread could change the socket's TOS in between the two calls.
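As an illustration of the intended usage, here is a minimal userspace
sketch, assuming a kernel with this series applied (IP_TOS accepted as an
IPPROTO_IP control message on IPv4 datagram sockets). The helper names
fill_tos_cmsg() and send_with_tos() are hypothetical. The TOS is passed as
an int because ip_cmsg_send() in patch 1/2 requires cmsg_len to equal
CMSG_LEN(sizeof(int)).

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: attach a single IPPROTO_IP/IP_TOS control message
 * to msg. cbuf must be at least CMSG_SPACE(sizeof(int)) bytes and aligned
 * for struct cmsghdr. */
static void fill_tos_cmsg(struct msghdr *msg, void *cbuf, size_t cbuflen,
			  int tos)
{
	struct cmsghdr *cmsg;

	memset(cbuf, 0, cbuflen);
	msg->msg_control = cbuf;
	msg->msg_controllen = CMSG_SPACE(sizeof(int));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = IPPROTO_IP;
	cmsg->cmsg_type = IP_TOS;
	/* ip_cmsg_send() rejects any length other than CMSG_LEN(sizeof(int)) */
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &tos, sizeof(tos));
}

/* Hypothetical helper: send one datagram with a per-call TOS. The
 * socket-wide value set via setsockopt(IP_TOS) is left untouched. */
static ssize_t send_with_tos(int fd, const void *data, size_t len,
			     const struct sockaddr_in *dst, int tos)
{
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* guarantees cmsghdr alignment */
	} control;
	struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
	struct msghdr msg = {
		.msg_name = (void *)dst,
		.msg_namelen = sizeof(*dst),
		.msg_iov = &iov,
		.msg_iovlen = 1,
	};

	fill_tos_cmsg(&msg, control.buf, sizeof(control.buf), tos);
	return sendmsg(fd, &msg, 0);
}
```

Each thread can call send_with_tos() with its own TOS on the shared socket,
with no setsockopt() round trip; datagrams sent without the control message
still use the socket-wide IP_TOS.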

Francesco Fusco (2):
  ipv4: IP_TOS and IP_TTL can be specified as ancillary data
  ipv4: processing ancillary IP_TOS or IP_TTL

 include/net/inet_sock.h |  3 +++
 include/net/ip.h        | 14 ++++++++++++++
 include/net/route.h     |  1 +
 net/ipv4/icmp.c         |  5 +++++
 net/ipv4/ip_output.c    | 13 ++++++++++---
 net/ipv4/ip_sockglue.c  | 20 +++++++++++++++++++-
 net/ipv4/ping.c         |  4 +++-
 net/ipv4/raw.c          |  4 +++-
 net/ipv4/udp.c          |  4 +++-
 9 files changed, 61 insertions(+), 7 deletions(-)

-- 
1.8.3.1

* [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data
  2013-08-23 12:19 [PATCH net-next v2 0/2] ipv4: per-datagram IP_TOS and IP_TTL via sendmsg() Francesco Fusco
@ 2013-08-23 12:19 ` Francesco Fusco
  2013-08-27 18:56   ` David Miller
  2013-08-23 12:19 ` [PATCH net-next v2 2/2] ipv4: processing ancillary IP_TOS or IP_TTL Francesco Fusco
  1 sibling, 1 reply; 7+ messages in thread
From: Francesco Fusco @ 2013-08-23 12:19 UTC (permalink / raw)
  To: davem; +Cc: netdev

This patch allows the IP_TTL and IP_TOS values passed from userspace as
ancillary data to be stored in struct ipcm_cookie. Three fields are added
to the struct:

- the TTL, expressed as __u8.
  The allowed values are in the range [1, 255].
  A value of 0 means that the TTL is not specified.

- the TOS, expressed as __s16.
  The allowed values are in the range [0, 255].
  A value of -1 means that the TOS is not specified.

- the priority, expressed as a char and computed from the TOS with
  rt_tos2priority() when the ancillary data is handled.
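A userspace model of the sentinel encoding described above may help; it is
a sketch, not kernel code. struct cookie_model and the cookie_model_*()
functions are hypothetical. The range checks copy the IP_TTL and IP_TOS
cases added to ip_cmsg_send() below; rt_tos2priority() is not modelled.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical userspace model of the new struct ipcm_cookie fields. */
struct cookie_model {
	uint8_t ttl;	/* 0: not specified, otherwise 1..255 */
	int16_t tos;	/* -1: not specified, otherwise 0..255 */
};

static void cookie_model_init(struct cookie_model *c)
{
	c->ttl = 0;	/* same sentinels the callers set in patch 2/2 */
	c->tos = -1;
}

/* Mirrors the IP_TTL case: 0 is rejected, so it can safely double as
 * the "not specified" marker in an unsigned 8-bit field. */
static int cookie_model_set_ttl(struct cookie_model *c, int val)
{
	if (val < 1 || val > 255)
		return -EINVAL;
	c->ttl = (uint8_t)val;
	return 0;
}

/* Mirrors the IP_TOS case: the whole 0..255 range is valid, which is why
 * a wider signed type is needed to represent "not specified". */
static int cookie_model_set_tos(struct cookie_model *c, int val)
{
	if (val < 0 || val > 255)
		return -EINVAL;
	c->tos = (int16_t)val;
	return 0;
}
```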

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
---
 v1->v2
  - changed the ipcm_cookie ttl field from __s16 to __u8.
    A value of 0 means that the TTL has not been specified
  - the tos field is still __s16. Since the user can specify
    values in the range 0-255 inclusive, a value of -1 is
    used as a flag saying that the value has not been
    specified
  - the priority is now a char, which is the return type of
    rt_tos2priority, instead of a __u32
  - improved commit message

 include/net/ip.h       |  3 +++
 net/ipv4/ip_sockglue.c | 20 +++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index a68f838..84b5476 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -56,6 +56,9 @@ struct ipcm_cookie {
 	int			oif;
 	struct ip_options_rcu	*opt;
 	__u8			tx_flags;
+	__u8			ttl;
+	__s16			tos;
+	char			priority;
 };
 
 #define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index d9c4f11..56e3445 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -189,7 +189,7 @@ EXPORT_SYMBOL(ip_cmsg_recv);
 
 int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc)
 {
-	int err;
+	int err, val;
 	struct cmsghdr *cmsg;
 
 	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
@@ -215,6 +215,24 @@ int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc)
 			ipc->addr = info->ipi_spec_dst.s_addr;
 			break;
 		}
+		case IP_TTL:
+			if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+				return -EINVAL;
+			val = *(int *)CMSG_DATA(cmsg);
+			if (val < 1 || val > 255)
+				return -EINVAL;
+			ipc->ttl = val;
+			break;
+		case IP_TOS:
+			if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+				return -EINVAL;
+			val = *(int *)CMSG_DATA(cmsg);
+			if (val < 0 || val > 255)
+				return -EINVAL;
+			ipc->tos = val;
+			ipc->priority = rt_tos2priority(ipc->tos);
+			break;
+
 		default:
 			return -EINVAL;
 		}
-- 
1.8.3.1

* [PATCH net-next v2 2/2] ipv4: processing ancillary IP_TOS or IP_TTL
  2013-08-23 12:19 [PATCH net-next v2 0/2] ipv4: per-datagram IP_TOS and IP_TTL via sendmsg() Francesco Fusco
  2013-08-23 12:19 ` [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data Francesco Fusco
@ 2013-08-23 12:19 ` Francesco Fusco
  1 sibling, 0 replies; 7+ messages in thread
From: Francesco Fusco @ 2013-08-23 12:19 UTC (permalink / raw)
  To: davem; +Cc: netdev

If IP_TOS or IP_TTL is specified as ancillary data, sendmsg() sends the
packets with the specified TOS or TTL, overriding the per-socket values set
with the traditional setsockopt().

struct inet_cork now stores the TOS, TTL and priority values passed in
struct ipcm_cookie. If the cookie carries a user-specified TOS (tos != -1)
or TTL (ttl != 0), that value overrides the per-socket one. When the TOS
is overridden, the skb priority is overridden accordingly.
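The override precedence can be sketched in userspace as follows, under the
assumption that ip_select_ttl() is replaced by a fixed per-socket unicast
TTL. All struct and function names here are hypothetical; the selection
logic mirrors the changes to __ip_make_skb() in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-datagram values carried by inet_cork. */
struct cork_model {
	uint8_t ttl;		/* 0: use the socket/route default */
	int16_t tos;		/* -1: use the socket's IP_TOS */
	char priority;		/* only meaningful when tos != -1 */
};

/* Hypothetical model of the relevant per-socket state. */
struct sock_model {
	uint8_t tos;		/* inet->tos */
	uint8_t mc_ttl;		/* inet->mc_ttl */
	uint8_t unicast_ttl;	/* stands in for ip_select_ttl() */
	uint32_t priority;	/* sk->sk_priority */
};

struct hdr_model {
	uint8_t ttl;
	uint8_t tos;
	uint32_t priority;
};

/* Same precedence as the patched __ip_make_skb(): a cork value wins,
 * otherwise fall back to the socket (and, for TTL, the route type). */
static struct hdr_model resolve_header(const struct cork_model *cork,
				       const struct sock_model *sk,
				       bool multicast)
{
	struct hdr_model h;

	if (cork->ttl != 0)
		h.ttl = cork->ttl;
	else if (multicast)
		h.ttl = sk->mc_ttl;
	else
		h.ttl = sk->unicast_ttl;

	h.tos = (cork->tos != -1) ? (uint8_t)cork->tos : sk->tos;
	h.priority = (cork->tos != -1) ? (uint32_t)cork->priority
				       : sk->priority;
	return h;
}
```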

Two helper functions, get_rttos() and get_rtconn_flags(), take a
user-specified TOS into account when computing RT_TOS and RT_CONN_FLAGS.
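A userspace sketch of the two helpers, assuming RT_TOS() masks its argument
with IPTOS_TOS_MASK (0x1E), as include/net/route.h does at the time of this
patch, and modelling the SOCK_LOCALROUTE test as a plain int flag. The
model_* names are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/ip.h>	/* IPTOS_TOS_MASK */
#include <stdint.h>

/* Hypothetical stand-in for the kernel's RT_TOS(): keep only the TOS
 * bits used for route lookups. */
#define MODEL_RT_TOS(tos) ((uint8_t)((tos) & IPTOS_TOS_MASK))

/* Mirrors get_rttos(): prefer the per-datagram TOS when one was given. */
static uint8_t model_get_rttos(int16_t cookie_tos, uint8_t sock_tos)
{
	return (cookie_tos != -1) ? MODEL_RT_TOS(cookie_tos)
				  : MODEL_RT_TOS(sock_tos);
}

/* Mirrors get_rtconn_flags(): same selection, plus the low bit that
 * RT_CONN_FLAGS*() set when SOCK_LOCALROUTE is enabled. */
static uint8_t model_get_rtconn_flags(int16_t cookie_tos, uint8_t sock_tos,
				      int localroute)
{
	return (uint8_t)(model_get_rttos(cookie_tos, sock_tos) |
			 (localroute ? 1 : 0));
}
```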

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
---
 v1->v2
  - reworked the entire patch
  - modified the ttl field in the struct inet_cork from __s16 to __u8:
    0 means that the TTL is not specified
  - the tos field in the struct inet_cork is still __s16:
    -1 means that the tos is not set
  - modified the priority field in the struct inet_cork from __u32 to 
    char.
  - introduced the get_rttos and get_rtconn_flags functions

 include/net/inet_sock.h |  3 +++
 include/net/ip.h        | 11 +++++++++++
 include/net/route.h     |  1 +
 net/ipv4/icmp.c         |  5 +++++
 net/ipv4/ip_output.c    | 13 ++++++++++---
 net/ipv4/ping.c         |  4 +++-
 net/ipv4/raw.c          |  4 +++-
 net/ipv4/udp.c          |  4 +++-
 8 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index b21a7f0..97734d0 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -103,6 +103,9 @@ struct inet_cork {
 	int			length; /* Total length of all frames */
 	struct dst_entry	*dst;
 	u8			tx_flags;
+	__u8			ttl;
+	__s16			tos;
+	char			priority;
 };
 
 struct inet_cork_full {
diff --git a/include/net/ip.h b/include/net/ip.h
index 84b5476..174d22f 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -28,6 +28,7 @@
 #include <linux/skbuff.h>
 
 #include <net/inet_sock.h>
+#include <net/route.h>
 #include <net/snmp.h>
 #include <net/flow.h>
 
@@ -142,6 +143,16 @@ static inline struct sk_buff *ip_finish_skb(struct sock *sk, struct flowi4 *fl4)
 	return __ip_make_skb(sk, fl4, &sk->sk_write_queue, &inet_sk(sk)->cork.base);
 }
 
+static inline __u8 get_rttos(struct ipcm_cookie *ipc, struct inet_sock *inet)
+{
+	return (ipc->tos != -1) ? RT_TOS(ipc->tos) : RT_TOS(inet->tos);
+}
+
+static inline __u8 get_rtconn_flags(struct ipcm_cookie *ipc, struct sock *sk)
+{
+	return (ipc->tos != -1) ? RT_CONN_FLAGS_TOS(sk, ipc->tos) : RT_CONN_FLAGS(sk);
+}
+
 /* datagram.c */
 extern int		ip4_datagram_connect(struct sock *sk, 
 					     struct sockaddr *uaddr, int addr_len);
diff --git a/include/net/route.h b/include/net/route.h
index 2ea40c1..0a659cc 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -39,6 +39,7 @@
 #define RTO_ONLINK	0x01
 
 #define RT_CONN_FLAGS(sk)   (RT_TOS(inet_sk(sk)->tos) | sock_flag(sk, SOCK_LOCALROUTE))
+#define RT_CONN_FLAGS_TOS(sk, tos)   (RT_TOS(tos) | sock_flag(sk, SOCK_LOCALROUTE))
 
 struct fib_nh;
 struct fib_info;
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 5f7d11a..5c0e8bc 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -353,6 +353,9 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	saddr = fib_compute_spec_dst(skb);
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
+
 	if (icmp_param->replyopts.opt.opt.optlen) {
 		ipc.opt = &icmp_param->replyopts.opt;
 		if (ipc.opt->opt.srr)
@@ -608,6 +611,8 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	ipc.addr = iph->saddr;
 	ipc.opt = &icmp_param->replyopts.opt;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 
 	rt = icmp_route_lookup(net, &fl4, skb_in, iph, saddr, tos,
 			       type, code, icmp_param);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 4bcabf3..854f4f3 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1068,6 +1068,9 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 			 rt->dst.dev->mtu : dst_mtu(&rt->dst);
 	cork->dst = &rt->dst;
 	cork->length = 0;
+	cork->ttl = ipc->ttl;
+	cork->tos = ipc->tos;
+	cork->priority = ipc->priority;
 	cork->tx_flags = ipc->tx_flags;
 
 	return 0;
@@ -1319,7 +1322,9 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 	if (cork->flags & IPCORK_OPT)
 		opt = cork->opt;
 
-	if (rt->rt_type == RTN_MULTICAST)
+	if (cork->ttl != 0)
+		ttl = cork->ttl;
+	else if (rt->rt_type == RTN_MULTICAST)
 		ttl = inet->mc_ttl;
 	else
 		ttl = ip_select_ttl(inet, &rt->dst);
@@ -1327,7 +1332,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 	iph = (struct iphdr *)skb->data;
 	iph->version = 4;
 	iph->ihl = 5;
-	iph->tos = inet->tos;
+	iph->tos = (cork->tos != -1) ? cork->tos : inet->tos;
 	iph->frag_off = df;
 	iph->ttl = ttl;
 	iph->protocol = sk->sk_protocol;
@@ -1339,7 +1344,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 		ip_options_build(skb, opt, cork->addr, rt, 0);
 	}
 
-	skb->priority = sk->sk_priority;
+	skb->priority = (cork->tos != -1) ? cork->priority : sk->sk_priority;
 	skb->mark = sk->sk_mark;
 	/*
 	 * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
@@ -1489,6 +1494,8 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
 	ipc.addr = daddr;
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 
 	if (replyopts.opt.opt.optlen) {
 		ipc.opt = &replyopts.opt;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index d7d9882..706d108e 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -713,6 +713,8 @@ int ping_v4_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	ipc.opt = NULL;
 	ipc.oif = sk->sk_bound_dev_if;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 
 	sock_tx_timestamp(sk, &ipc.tx_flags);
 
@@ -744,7 +746,7 @@ int ping_v4_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 			return -EINVAL;
 		faddr = ipc.opt->opt.faddr;
 	}
-	tos = RT_TOS(inet->tos);
+	tos = get_rttos(&ipc, inet);
 	if (sock_flag(sk, SOCK_LOCALROUTE) ||
 	    (msg->msg_flags & MSG_DONTROUTE) ||
 	    (ipc.opt && ipc.opt->opt.is_strictroute)) {
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 41d8450..b6533d3 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -517,6 +517,8 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	ipc.addr = inet->inet_saddr;
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 	ipc.oif = sk->sk_bound_dev_if;
 
 	if (msg->msg_controllen) {
@@ -556,7 +558,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 			daddr = ipc.opt->opt.faddr;
 		}
 	}
-	tos = RT_CONN_FLAGS(sk);
+	tos = get_rtconn_flags(&ipc, sk);
 	if (msg->msg_flags & MSG_DONTROUTE)
 		tos |= RTO_ONLINK;
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0b24508..3f15039 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -855,6 +855,8 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 
 	getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
 
@@ -938,7 +940,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		faddr = ipc.opt->opt.faddr;
 		connected = 0;
 	}
-	tos = RT_TOS(inet->tos);
+	tos = get_rttos(&ipc, inet);
 	if (sock_flag(sk, SOCK_LOCALROUTE) ||
 	    (msg->msg_flags & MSG_DONTROUTE) ||
 	    (ipc.opt && ipc.opt->opt.is_strictroute)) {
-- 
1.8.3.1

* Re: [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data
  2013-08-23 12:19 ` [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data Francesco Fusco
@ 2013-08-27 18:56   ` David Miller
  2013-08-28  7:56     ` Francesco Fusco
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2013-08-27 18:56 UTC (permalink / raw)
  To: ffusco; +Cc: netdev

From: Francesco Fusco <ffusco@redhat.com>
Date: Fri, 23 Aug 2013 14:19:32 +0200

>   - changed the icmp_cookie ttl field from __s16 to __u8.
>     A value of 0 means that the TTL has not been specified

Sorry, I have to ask you to change the ttl field type back to __s16
and use "-1" to mean not-specified.

Zero is a valid TTL setting and it means to not allow the
packet to leave this host.

Please make this change and resubmit, thanks.

* Re: [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data
  2013-08-27 18:56   ` David Miller
@ 2013-08-28  7:56     ` Francesco Fusco
  2013-09-18  0:46       ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Francesco Fusco @ 2013-08-28  7:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On 08/27/2013 08:56 PM, David Miller wrote:
> From: Francesco Fusco <ffusco@redhat.com>
> Date: Fri, 23 Aug 2013 14:19:32 +0200
>
>>    - changed the icmp_cookie ttl field from __s16 to __u8.
>>      A value of 0 means that the TTL has not been specified
>
> Sorry, I have to ask you to change the ttl field type back to __s16
> and use "-1" to mean not-specified.
>
> Zero is a valid TTL setting and it means to not allow the
> packet to leave this host.

Actually setsockopt() does not allow a TTL value of zero:

From net/ipv4/ip_sockglue.c::do_ip_setsockopt()
-----
	case IP_TTL:
		if (optlen < 1)
			goto e_inval;
		if (val != -1 && (val < 1 || val > 255))
			goto e_inval;
		inet->uc_ttl = val;
		break;
---------

To stay consistent with setsockopt(), my patch also rejects a TTL of zero
in the ancillary data:
+	if (val < 1 || val > 255)
+		return -EINVAL;

Therefore, a value of 0 in ipcm_cookie->ttl can only mean that the user
has not specified the TTL.

I agree that treating 0 as "TTL not specified" could be somewhat
confusing, and that -1 would be clearer. However, it seems to me that we
would then use one more byte in a struct that is stored on the stack,
just for readability.
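The setsockopt() behaviour quoted above can be checked from userspace with
a small probe. probe_ip_ttl() is a hypothetical helper; it assumes a Linux
host where an AF_INET datagram socket can be created. On such a kernel, 0
and 256 should be rejected with EINVAL, while 64 and -1 (reset to the
default) should be accepted.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: try to set IP_TTL to ttl on a fresh UDP socket.
 * Returns 0 if the kernel accepts the value, otherwise the errno value. */
static int probe_ip_ttl(int ttl)
{
	int err = 0;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return errno;
	if (setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)) < 0)
		err = errno;	/* saved before close() can overwrite errno */
	close(fd);
	return err;
}
```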

> Please make this change and resubmit, thanks.

That said, I can change the code as you requested despite the above;
just let me know.

Thanks,
Francesco

* Re: [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data
  2013-08-28  7:56     ` Francesco Fusco
@ 2013-09-18  0:46       ` David Miller
  2013-09-18  8:16         ` Francesco Fusco
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2013-09-18  0:46 UTC (permalink / raw)
  To: ffusco; +Cc: netdev

From: Francesco Fusco <ffusco@redhat.com>
Date: Wed, 28 Aug 2013 09:56:32 +0200

> On 08/27/2013 08:56 PM, David Miller wrote:
>> From: Francesco Fusco <ffusco@redhat.com>
>> Date: Fri, 23 Aug 2013 14:19:32 +0200
>>
>>>    - changed the icmp_cookie ttl field from __s16 to __u8.
>>>      A value of 0 means that the TTL has not been specified
>>
>> Sorry, I have to ask you to change the ttl field type back to __s16
>> and use "-1" to mean not-specified.
>>
>> Zero is a valid TTL setting and it means to not allow the
>> packet to leave this host.
> 
> Actually setsockopt() does not allow a TTL value of zero:
> 
> From net/ipv4/ip_sockglue.c::do_ip_setsockopt()

Indeed, you are right.

Please resubmit these patches for the next merge window.

Thank you.

* Re: [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data
  2013-09-18  0:46       ` David Miller
@ 2013-09-18  8:16         ` Francesco Fusco
  0 siblings, 0 replies; 7+ messages in thread
From: Francesco Fusco @ 2013-09-18  8:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Thanks David.
I will resubmit the patches as they are as soon as the merge window 
opens again.

Best,
Francesco

On 09/18/2013 02:46 AM, David Miller wrote:
> From: Francesco Fusco <ffusco@redhat.com>
> Date: Wed, 28 Aug 2013 09:56:32 +0200
>
>> On 08/27/2013 08:56 PM, David Miller wrote:
>>> From: Francesco Fusco <ffusco@redhat.com>
>>> Date: Fri, 23 Aug 2013 14:19:32 +0200
>>>
>>>>     - changed the icmp_cookie ttl field from __s16 to __u8.
>>>>       A value of 0 means that the TTL has not been specified
>>>
>>> Sorry, I have to ask you to change the ttl field type back to __s16
>>> and use "-1" to mean not-specified.
>>>
>>> Zero is a valid TTL setting and it means to not allow the
>>> packet to leave this host.
>>
>> Actually setsockopt() does not allow a TTL value of zero:
>>
>> From net/ipv4/ip_sockglue.c::do_ip_setsockopt()
>
> Indeed, you are right.
>
> Please resubmit these patches for the next merge window.
>
> Thank you.
>
