* [PATCH net-next v2 0/2] ipv4: per-datagram IP_TOS and IP_TTL via sendmsg()
@ 2013-08-23 12:19 Francesco Fusco
2013-08-23 12:19 ` [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data Francesco Fusco
2013-08-23 12:19 ` [PATCH net-next v2 2/2] ipv4: processing ancillary IP_TOS or IP_TTL Francesco Fusco
0 siblings, 2 replies; 7+ messages in thread
From: Francesco Fusco @ 2013-08-23 12:19 UTC
To: davem; +Cc: netdev
There is currently no way to set the IP_TOS field on a per-packet basis in
IPv4, while IPv6 has such a mechanism. For IPv4 one therefore has to fall back
to setsockopt().
Using the existing per-socket option is inconvenient, particularly when
multiple threads share the same socket but each needs its own TOS value: every
thread would have to call setsockopt() before each sendmsg().
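For illustration, a minimal userspace sketch of what a per-datagram TOS could look like once this series is applied. The names union tos_ctrl, attach_tos() and send_with_tos() are hypothetical and not part of the patches; the control message layout (level IPPROTO_IP, type IP_TOS, an int payload) follows the checks added to ip_cmsg_send() in patch 1/2.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical buffer type: room for one int-sized control message,
 * aligned for struct cmsghdr. */
union tos_ctrl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Hypothetical helper: attach a single IP_TOS control message to msg. */
static void attach_tos(struct msghdr *msg, union tos_ctrl *ctrl, int tos)
{
	struct cmsghdr *cmsg;

	memset(ctrl, 0, sizeof(*ctrl));
	msg->msg_control = ctrl->buf;
	msg->msg_controllen = sizeof(ctrl->buf);

	cmsg = CMSG_FIRSTHDR(msg);
	assert(cmsg != NULL);
	cmsg->cmsg_level = IPPROTO_IP;
	cmsg->cmsg_type = IP_TOS;
	/* The kernel side of this series expects exactly an int payload. */
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &tos, sizeof(tos));
}

/* Hypothetical wrapper: send one datagram with its own TOS value,
 * leaving the per-socket IP_TOS setting untouched. */
static ssize_t send_with_tos(int fd, const struct sockaddr_in *dst,
			     const void *buf, size_t len, int tos)
{
	union tos_ctrl ctrl;
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_name = (void *)dst;
	msg.msg_namelen = sizeof(*dst);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	attach_tos(&msg, &ctrl, tos);

	return sendmsg(fd, &msg, 0);
}
```

Because the TOS travels with the message, threads sharing one socket would not need to serialize around setsockopt(). A per-datagram TTL would use the same layout with IP_TTL as the message type.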
Francesco Fusco (2):
ipv4: IP_TOS and IP_TTL can be specified as ancillary data
ipv4: processing ancillary IP_TOS or IP_TTL
include/net/inet_sock.h | 3 +++
include/net/ip.h | 14 ++++++++++++++
include/net/route.h | 1 +
net/ipv4/icmp.c | 5 +++++
net/ipv4/ip_output.c | 13 ++++++++++---
net/ipv4/ip_sockglue.c | 20 +++++++++++++++++++-
net/ipv4/ping.c | 4 +++-
net/ipv4/raw.c | 4 +++-
net/ipv4/udp.c | 4 +++-
9 files changed, 61 insertions(+), 7 deletions(-)
--
1.8.3.1
* [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data
2013-08-23 12:19 [PATCH net-next v2 0/2] ipv4: per-datagram IP_TOS and IP_TTL via sendmsg() Francesco Fusco
@ 2013-08-23 12:19 ` Francesco Fusco
2013-08-27 18:56 ` David Miller
2013-08-23 12:19 ` [PATCH net-next v2 2/2] ipv4: processing ancillary IP_TOS or IP_TTL Francesco Fusco
1 sibling, 1 reply; 7+ messages in thread
From: Francesco Fusco @ 2013-08-23 12:19 UTC
To: davem; +Cc: netdev
This patch enables the IP_TTL and IP_TOS values passed from userspace to
be stored in the ipcm_cookie struct. Three fields are added to the struct:
- the TTL, expressed as __u8.
The allowed values are in the range [1,255].
A value of 0 means that the TTL is not specified.
- the TOS, expressed as __s16.
The allowed values are in the range [0,255].
A value of -1 means that the TOS is not specified.
- the priority, expressed as a char and computed when
handling the ancillary data.
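The sentinel convention described above can be sketched in plain userspace C. This is only a model, not the kernel code: struct cookie_model, COOKIE_MODEL_INIT, cookie_set_ttl() and cookie_set_tos() are hypothetical names, and the priority field is left out because it is derived from the TOS by rt_tos2priority().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace model of the two sentinel-carrying fields; not kernel code. */
struct cookie_model {
	uint8_t ttl;	/* 1..255, or 0 when not specified */
	int16_t tos;	/* 0..255, or -1 when not specified */
};

/* "Nothing specified" state, as each sendmsg() path would start with. */
#define COOKIE_MODEL_INIT { .ttl = 0, .tos = -1 }

/* Same range check as the IP_TTL case: 0 is rejected, so 0 stays free
 * to mean "not specified". */
static int cookie_set_ttl(struct cookie_model *c, int val)
{
	if (val < 1 || val > 255)
		return -EINVAL;
	c->ttl = (uint8_t)val;
	return 0;
}

/* Same range check as the IP_TOS case: 0 is a valid TOS, hence the
 * wider signed type and the -1 sentinel. */
static int cookie_set_tos(struct cookie_model *c, int val)
{
	if (val < 0 || val > 255)
		return -EINVAL;
	c->tos = (int16_t)val;
	return 0;
}
```

The asymmetry between the two fields follows from the valid ranges: every value of a __u8 TOS is meaningful, so the "not specified" marker has to live outside [0,255].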
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
---
v1->v2
- changed the ipcm_cookie ttl field from __s16 to __u8.
A value of 0 means that the TTL has not been specified
- the tos field is still __s16. The user can specify
values in the range 0-255 inclusive, therefore I use
a value of -1 as a flag saying that the value has
not been specified
- the priority is now a char instead of a __u32,
which is the return type of rt_tos2priority
- improved commit message
include/net/ip.h | 3 +++
net/ipv4/ip_sockglue.c | 20 +++++++++++++++++++-
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/include/net/ip.h b/include/net/ip.h
index a68f838..84b5476 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -56,6 +56,9 @@ struct ipcm_cookie {
int oif;
struct ip_options_rcu *opt;
__u8 tx_flags;
+ __u8 ttl;
+ __s16 tos;
+ char priority;
};
#define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index d9c4f11..56e3445 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -189,7 +189,7 @@ EXPORT_SYMBOL(ip_cmsg_recv);
int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc)
{
- int err;
+ int err, val;
struct cmsghdr *cmsg;
for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
@@ -215,6 +215,24 @@ int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc)
ipc->addr = info->ipi_spec_dst.s_addr;
break;
}
+ case IP_TTL:
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+ return -EINVAL;
+ val = *(int *)CMSG_DATA(cmsg);
+ if (val < 1 || val > 255)
+ return -EINVAL;
+ ipc->ttl = val;
+ break;
+ case IP_TOS:
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+ return -EINVAL;
+ val = *(int *)CMSG_DATA(cmsg);
+ if (val < 0 || val > 255)
+ return -EINVAL;
+ ipc->tos = val;
+ ipc->priority = rt_tos2priority(ipc->tos);
+ break;
+
default:
return -EINVAL;
}
--
1.8.3.1
* [PATCH net-next v2 2/2] ipv4: processing ancillary IP_TOS or IP_TTL
2013-08-23 12:19 [PATCH net-next v2 0/2] ipv4: per-datagram IP_TOS and IP_TTL via sendmsg() Francesco Fusco
2013-08-23 12:19 ` [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data Francesco Fusco
@ 2013-08-23 12:19 ` Francesco Fusco
1 sibling, 0 replies; 7+ messages in thread
From: Francesco Fusco @ 2013-08-23 12:19 UTC
To: davem; +Cc: netdev
If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out
packets with the specified TTL or TOS overriding the socket values specified
with the traditional setsockopt().
The struct inet_cork stores the values of TOS, TTL and priority that are
passed through the struct ipcm_cookie. If there are user-specified TOS
(tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are
used to override the per-socket values. When a TOS is given, the priority
is changed accordingly as well.
Two helper functions get_rttos and get_rtconn_flags are defined to take
into account the presence of a user specified TOS value when computing
RT_TOS and RT_CONN_FLAGS.
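The override rule can be sketched as a small userspace model. The names struct cork_model, struct sock_model, effective_ttl() and effective_tos() are hypothetical; in particular, the single sock_model.ttl field stands in for whatever the kernel would otherwise pick (inet->mc_ttl for multicast routes, ip_select_ttl() otherwise).

```c
#include <assert.h>
#include <stdint.h>

/* Per-datagram values carried in the cork; sentinels as in the patch. */
struct cork_model {
	uint8_t ttl;	/* 0 means not specified */
	int16_t tos;	/* -1 means not specified */
};

/* Simplified stand-in for the per-socket defaults. */
struct sock_model {
	uint8_t ttl;
	uint8_t tos;
};

/* A user-specified TTL wins over the socket default. */
static uint8_t effective_ttl(const struct cork_model *cork,
			     const struct sock_model *sk)
{
	return cork->ttl != 0 ? cork->ttl : sk->ttl;
}

/* A user-specified TOS wins over the socket default; 0 is a real value. */
static uint8_t effective_tos(const struct cork_model *cork,
			     const struct sock_model *sk)
{
	return cork->tos != -1 ? (uint8_t)cork->tos : sk->tos;
}
```

Note that a cork TOS of 0 still overrides the socket value, which is why the TOS check compares against -1 rather than testing for non-zero.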
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
---
v1->v2
- reworked the entire patch
- modified the ttl field in the struct inet_cork from __s16 to __u8:
0 means that the TTL is not specified
- the tos field in the struct inet_cork is still __s16:
-1 means that the tos is not set
- modified the priority field in the struct inet_cork from __u32 to
char.
- introduced the get_rttos and get_rtconn_flags functions
include/net/inet_sock.h | 3 +++
include/net/ip.h | 11 +++++++++++
include/net/route.h | 1 +
net/ipv4/icmp.c | 5 +++++
net/ipv4/ip_output.c | 13 ++++++++++---
net/ipv4/ping.c | 4 +++-
net/ipv4/raw.c | 4 +++-
net/ipv4/udp.c | 4 +++-
8 files changed, 39 insertions(+), 6 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index b21a7f0..97734d0 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -103,6 +103,9 @@ struct inet_cork {
int length; /* Total length of all frames */
struct dst_entry *dst;
u8 tx_flags;
+ __u8 ttl;
+ __s16 tos;
+ char priority;
};
struct inet_cork_full {
diff --git a/include/net/ip.h b/include/net/ip.h
index 84b5476..174d22f 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -28,6 +28,7 @@
#include <linux/skbuff.h>
#include <net/inet_sock.h>
+#include <net/route.h>
#include <net/snmp.h>
#include <net/flow.h>
@@ -142,6 +143,16 @@ static inline struct sk_buff *ip_finish_skb(struct sock *sk, struct flowi4 *fl4)
return __ip_make_skb(sk, fl4, &sk->sk_write_queue, &inet_sk(sk)->cork.base);
}
+static inline __u8 get_rttos(struct ipcm_cookie* ipc, struct inet_sock *inet)
+{
+ return (ipc->tos != -1) ? RT_TOS(ipc->tos) : RT_TOS(inet->tos);
+}
+
+static inline __u8 get_rtconn_flags(struct ipcm_cookie* ipc, struct sock* sk)
+{
+ return (ipc->tos != -1) ? RT_CONN_FLAGS_TOS(sk, ipc->tos) : RT_CONN_FLAGS(sk);
+}
+
/* datagram.c */
extern int ip4_datagram_connect(struct sock *sk,
struct sockaddr *uaddr, int addr_len);
diff --git a/include/net/route.h b/include/net/route.h
index 2ea40c1..0a659cc 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -39,6 +39,7 @@
#define RTO_ONLINK 0x01
#define RT_CONN_FLAGS(sk) (RT_TOS(inet_sk(sk)->tos) | sock_flag(sk, SOCK_LOCALROUTE))
+#define RT_CONN_FLAGS_TOS(sk,tos) (RT_TOS(tos) | sock_flag(sk, SOCK_LOCALROUTE))
struct fib_nh;
struct fib_info;
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 5f7d11a..5c0e8bc 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -353,6 +353,9 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
saddr = fib_compute_spec_dst(skb);
ipc.opt = NULL;
ipc.tx_flags = 0;
+ ipc.ttl = 0;
+ ipc.tos = -1;
+
if (icmp_param->replyopts.opt.opt.optlen) {
ipc.opt = &icmp_param->replyopts.opt;
if (ipc.opt->opt.srr)
@@ -608,6 +611,8 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
ipc.addr = iph->saddr;
ipc.opt = &icmp_param->replyopts.opt;
ipc.tx_flags = 0;
+ ipc.ttl = 0;
+ ipc.tos = -1;
rt = icmp_route_lookup(net, &fl4, skb_in, iph, saddr, tos,
type, code, icmp_param);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 4bcabf3..854f4f3 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1068,6 +1068,9 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
rt->dst.dev->mtu : dst_mtu(&rt->dst);
cork->dst = &rt->dst;
cork->length = 0;
+ cork->ttl = ipc->ttl;
+ cork->tos = ipc->tos;
+ cork->priority = ipc->priority;
cork->tx_flags = ipc->tx_flags;
return 0;
@@ -1319,7 +1322,9 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
if (cork->flags & IPCORK_OPT)
opt = cork->opt;
- if (rt->rt_type == RTN_MULTICAST)
+ if (cork->ttl != 0)
+ ttl = cork->ttl;
+ else if (rt->rt_type == RTN_MULTICAST)
ttl = inet->mc_ttl;
else
ttl = ip_select_ttl(inet, &rt->dst);
@@ -1327,7 +1332,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
iph = (struct iphdr *)skb->data;
iph->version = 4;
iph->ihl = 5;
- iph->tos = inet->tos;
+ iph->tos = (cork->tos != -1) ? cork->tos : inet->tos;
iph->frag_off = df;
iph->ttl = ttl;
iph->protocol = sk->sk_protocol;
@@ -1339,7 +1344,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
ip_options_build(skb, opt, cork->addr, rt, 0);
}
- skb->priority = sk->sk_priority;
+ skb->priority = (cork->tos != -1) ? cork->priority: sk->sk_priority;
skb->mark = sk->sk_mark;
/*
* Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
@@ -1489,6 +1494,8 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
ipc.addr = daddr;
ipc.opt = NULL;
ipc.tx_flags = 0;
+ ipc.ttl = 0;
+ ipc.tos = -1;
if (replyopts.opt.opt.optlen) {
ipc.opt = &replyopts.opt;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index d7d9882..706d108e 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -713,6 +713,8 @@ int ping_v4_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
ipc.opt = NULL;
ipc.oif = sk->sk_bound_dev_if;
ipc.tx_flags = 0;
+ ipc.ttl = 0;
+ ipc.tos = -1;
sock_tx_timestamp(sk, &ipc.tx_flags);
@@ -744,7 +746,7 @@ int ping_v4_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
return -EINVAL;
faddr = ipc.opt->opt.faddr;
}
- tos = RT_TOS(inet->tos);
+ tos = get_rttos(&ipc, inet);
if (sock_flag(sk, SOCK_LOCALROUTE) ||
(msg->msg_flags & MSG_DONTROUTE) ||
(ipc.opt && ipc.opt->opt.is_strictroute)) {
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 41d8450..b6533d3 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -517,6 +517,8 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
ipc.addr = inet->inet_saddr;
ipc.opt = NULL;
ipc.tx_flags = 0;
+ ipc.ttl = 0;
+ ipc.tos = -1;
ipc.oif = sk->sk_bound_dev_if;
if (msg->msg_controllen) {
@@ -556,7 +558,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
daddr = ipc.opt->opt.faddr;
}
}
- tos = RT_CONN_FLAGS(sk);
+ tos = get_rtconn_flags(&ipc, sk);
if (msg->msg_flags & MSG_DONTROUTE)
tos |= RTO_ONLINK;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0b24508..3f15039 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -855,6 +855,8 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
ipc.opt = NULL;
ipc.tx_flags = 0;
+ ipc.ttl = 0;
+ ipc.tos = -1;
getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
@@ -938,7 +940,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
faddr = ipc.opt->opt.faddr;
connected = 0;
}
- tos = RT_TOS(inet->tos);
+ tos = get_rttos(&ipc, inet);
if (sock_flag(sk, SOCK_LOCALROUTE) ||
(msg->msg_flags & MSG_DONTROUTE) ||
(ipc.opt && ipc.opt->opt.is_strictroute)) {
--
1.8.3.1
* Re: [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data
2013-08-23 12:19 ` [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data Francesco Fusco
@ 2013-08-27 18:56 ` David Miller
2013-08-28 7:56 ` Francesco Fusco
0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2013-08-27 18:56 UTC
To: ffusco; +Cc: netdev
From: Francesco Fusco <ffusco@redhat.com>
Date: Fri, 23 Aug 2013 14:19:32 +0200
> - changed the icmp_cookie ttl field from __s16 to __u8.
> A value of 0 means that the TTL has not been specified
Sorry, I have to ask you to change the ttl field type back to __s16
and use "-1" to mean not-specified.
Zero is a valid TTL setting and it means to not allow the
packet to leave this host.
Please make this change and resubmit, thanks.
* Re: [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data
2013-08-27 18:56 ` David Miller
@ 2013-08-28 7:56 ` Francesco Fusco
2013-09-18 0:46 ` David Miller
0 siblings, 1 reply; 7+ messages in thread
From: Francesco Fusco @ 2013-08-28 7:56 UTC
To: David Miller; +Cc: netdev
On 08/27/2013 08:56 PM, David Miller wrote:
> From: Francesco Fusco <ffusco@redhat.com>
> Date: Fri, 23 Aug 2013 14:19:32 +0200
>
>> - changed the icmp_cookie ttl field from __s16 to __u8.
>> A value of 0 means that the TTL has not been specified
>
> Sorry, I have to ask you to change the ttl field type back to __s16
> and use "-1" to mean not-specified.
>
> Zero is a valid TTL setting and it means to not allow the
> packet to leave this host.
Actually setsockopt() does not allow a TTL value of zero:
From net/ipv4/ip_sockglue.c::do_ip_setsockopt()
-----
case IP_TTL:
if (optlen < 1)
goto e_inval;
if (val != -1 && (val < 1 || val > 255))
goto e_inval;
inet->uc_ttl = val;
break;
---------
To make my patch consistent with the behavior of setsockopt() I also
do not accept a TTL of zero in the ancillary data:
+ if (val < 1 || val > 255)
+ return -EINVAL;
Therefore, if ipcm_cookie->ttl has a value of 0, that can only mean
that the user has not specified the TTL.
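The argument can be checked with a small userspace sketch of the two range tests quoted above. The function names setsockopt_ttl_ok() and cmsg_ttl_ok() are hypothetical; the conditions are copied from do_ip_setsockopt() and from the new IP_TTL case in ip_cmsg_send().

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the setsockopt() check: -1 (use the default) or 1..255. */
static bool setsockopt_ttl_ok(int val)
{
	return val == -1 || (val >= 1 && val <= 255);
}

/* Mirrors the ancillary-data check in this patch: 1..255 only. */
static bool cmsg_ttl_ok(int val)
{
	return val >= 1 && val <= 255;
}
```

Neither path accepts 0, so a stored value of 0 cannot collide with anything the user is able to request.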
I agree that it could be somewhat confusing to treat 0 as an unspecified
TTL, and that -1 would be clearer. However, it seems to me that we would
end up using one more byte in a struct that is stored on the stack, purely
for readability reasons.
> Please make this change and resubmit, thanks.
I can change the code as you requested despite what I wrote above,
let me know.
Thanks,
Francesco
* Re: [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data
2013-08-28 7:56 ` Francesco Fusco
@ 2013-09-18 0:46 ` David Miller
2013-09-18 8:16 ` Francesco Fusco
0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2013-09-18 0:46 UTC
To: ffusco; +Cc: netdev
From: Francesco Fusco <ffusco@redhat.com>
Date: Wed, 28 Aug 2013 09:56:32 +0200
> On 08/27/2013 08:56 PM, David Miller wrote:
>> From: Francesco Fusco <ffusco@redhat.com>
>> Date: Fri, 23 Aug 2013 14:19:32 +0200
>>
>>> - changed the icmp_cookie ttl field from __s16 to __u8.
>>> A value of 0 means that the TTL has not been specified
>>
>> Sorry, I have to ask you to change the ttl field type back to __s16
>> and use "-1" to mean not-specified.
>>
>> Zero is a valid TTL setting and it means to not allow the
>> packet to leave this host.
>
> Actually setsockopt() does not allow a TTL value of zero:
>
> From net/ipv4/ip_sockglue.c::do_ip_setsockopt()
Indeed, you are right.
Please resubmit these patches for the next merge window.
Thank you.
* Re: [PATCH net-next v2 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data
2013-09-18 0:46 ` David Miller
@ 2013-09-18 8:16 ` Francesco Fusco
0 siblings, 0 replies; 7+ messages in thread
From: Francesco Fusco @ 2013-09-18 8:16 UTC
To: David Miller; +Cc: netdev
Thanks David.
I will resubmit the patches as they are as soon as the merge window
opens again.
Best,
Francesco
On 09/18/2013 02:46 AM, David Miller wrote:
> From: Francesco Fusco <ffusco@redhat.com>
> Date: Wed, 28 Aug 2013 09:56:32 +0200
>
>> On 08/27/2013 08:56 PM, David Miller wrote:
>>> From: Francesco Fusco <ffusco@redhat.com>
>>> Date: Fri, 23 Aug 2013 14:19:32 +0200
>>>
>>>> - changed the icmp_cookie ttl field from __s16 to __u8.
>>>> A value of 0 means that the TTL has not been specified
>>>
>>> Sorry, I have to ask you to change the ttl field type back to __s16
>>> and use "-1" to mean not-specified.
>>>
>>> Zero is a valid TTL setting and it means to not allow the
>>> packet to leave this host.
>>
>> Actually setsockopt() does not allow a TTL value of zero:
>>
>> From net/ipv4/ip_sockglue.c::do_ip_setsockopt()
>
> Indeed, you are right.
>
> Please resubmit these patches for the next merge window.
>
> Thank you.
>