netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] net/ipv4/raw Optimise ipv4 raw sends when IP_HDRINCL set.
@ 2020-05-10 16:00 David Laight
  2020-05-11 20:49 ` David Miller
  0 siblings, 1 reply; 5+ messages in thread
From: David Laight @ 2020-05-10 16:00 UTC (permalink / raw)
  To: netdev

The final routing for ipv4 packets may be done with the IP address
from the message header not that from the address buffer.
If the addresses are different FLOWI_FLAG_KNOWN_NH must be set so
that a temporary 'struct rtable' entry is created to send the message.
However the allocate + free (under RCU) is relatively expensive
and can be avoided by a quick check shows the addresses match.

Signed-off-by: David Laight <david.laight@aculab.com>
---

This makes a considerable difference when we are sending a lot
of RTP streams from a raw socket.
IP_HDRINCL has to be set so that the calculated UDP checksum is right.

 net/ipv4/raw.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 3183413..0a81376 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -495,6 +495,27 @@ static int raw_getfrag(void *from, char *to, int offset, int len, int odd,
 	return ip_generic_getfrag(rfv->msg, to, offset, len, odd, skb);
 }
 
+static bool raw_msg_addr_matches(struct msghdr *msg, __be32 daddr)
+{
+	const struct iovec *iov;
+	__be32 msg_daddr;
+
+	/* Check common case of user buffer with header in the first fragment.
+	 * If we return false the message is still sent.
+	 */
+
+	if (!iter_is_iovec(&msg->msg_iter))
+		return false;
+	iov = msg->msg_iter.iov;
+	if (!iov || iov->iov_len < 20)
+		return false;
+
+	if (get_user(msg_daddr, (__be32 __user *)(iov->iov_base + 16)))
+		return false;
+
+	return daddr == msg_daddr;
+}
+
 static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 {
 	struct inet_sock *inet = inet_sk(sk);
@@ -626,9 +647,14 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	flowi4_init_output(&fl4, ipc.oif, ipc.sockc.mark, tos,
 			   RT_SCOPE_UNIVERSE,
 			   hdrincl ? IPPROTO_RAW : sk->sk_protocol,
-			   inet_sk_flowi_flags(sk) |
-			    (hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
+			   inet_sk_flowi_flags(sk),
 			   daddr, saddr, 0, 0, sk->sk_uid);
+	/* The final message routing may be done with the destination address
+	 * in the user-supplied ipv4 header. If this differs from 'daddr' then
+	 * a temporary destination table entry has to be created.
+	 */ 
+	if (hdrincl && !raw_msg_addr_matches(msg, daddr))
+		fl4.flowi4_flags |= FLOWI_FLAG_KNOWN_NH;
 
 	if (!hdrincl) {
 		rfv.msg = msg;
-- 
1.8.1.2

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] net/ipv4/raw Optimise ipv4 raw sends when IP_HDRINCL set.
  2020-05-10 16:00 [PATCH net-next] net/ipv4/raw Optimise ipv4 raw sends when IP_HDRINCL set David Laight
@ 2020-05-11 20:49 ` David Miller
  2020-05-11 21:28   ` David Laight
  0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2020-05-11 20:49 UTC (permalink / raw)
  To: David.Laight; +Cc: netdev

From: David Laight <David.Laight@ACULAB.COM>
Date: Sun, 10 May 2020 16:00:41 +0000

> The final routing for ipv4 packets may be done with the IP address
> from the message header not that from the address buffer.
> If the addresses are different FLOWI_FLAG_KNOWN_NH must be set so
> that a temporary 'struct rtable' entry is created to send the message.
> However the allocate + free (under RCU) is relatively expensive
> and can be avoided by a quick check shows the addresses match.
> 
> Signed-off-by: David Laight <david.laight@aculab.com>

The user can change the daddr field in userspace between when you do
this test and when the iphdr is copied into the sk_buff.

Also, you are obfuscating what you are doing in the way you have coded
this check.  You extract 4 bytes from a magic offset (16), which is
hard to understand.

Just explicitly code out the fact that you are accessing the daddr
field of an ip header.

But nonetheless you have to solve the "modified in userspace
meanwhile" problem, as this is a bug we fix often in the kernel so we
don't want to add new instances.

Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH net-next] net/ipv4/raw Optimise ipv4 raw sends when IP_HDRINCL set.
  2020-05-11 20:49 ` David Miller
@ 2020-05-11 21:28   ` David Laight
  2020-05-11 23:09     ` David Miller
  0 siblings, 1 reply; 5+ messages in thread
From: David Laight @ 2020-05-11 21:28 UTC (permalink / raw)
  To: 'David Miller'; +Cc: netdev

From: David Miller
> Sent: 11 May 2020 21:50
> To: David Laight <David.Laight@ACULAB.COM>
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH net-next] net/ipv4/raw Optimise ipv4 raw sends when IP_HDRINCL set.
> 
> From: David Laight <David.Laight@ACULAB.COM>
> Date: Sun, 10 May 2020 16:00:41 +0000
> 
> > The final routing for ipv4 packets may be done with the IP address
> > from the message header not that from the address buffer.
> > If the addresses are different FLOWI_FLAG_KNOWN_NH must be set so
> > that a temporary 'struct rtable' entry is created to send the message.
> > However the allocate + free (under RCU) is relatively expensive
> > and can be avoided by a quick check shows the addresses match.
> >
> > Signed-off-by: David Laight <david.laight@aculab.com>
> 
> The user can change the daddr field in userspace between when you do
> this test and when the iphdr is copied into the sk_buff.
> 
> Also, you are obfuscating what you are doing in the way you have coded
> this check.  You extract 4 bytes from a magic offset (16), which is
> hard to understand.

Ok, that should at least be the structure offset.

> Just explicitly code out the fact that you are accessing the daddr
> field of an ip header.
> 
> But nonetheless you have to solve the "modified in userspace
> meanwhile" problem, as this is a bug we fix often in the kernel so we
> don't want to add new instances.

In this case the "modified in userspace meanwhile" just breaks the
application - it isn't any kind of security issue.

The problem is that you can't read the data into an skb until you
have the offset - which is got by looking up the destination address.
But you need the actual destination (from the packet data) to match
the address buffer if you don't want to create a temporary rtable entry.

I didn't find the commit that make rtable entries shared.
I though the same table was used for routes and arps  - but
it looks like they got separated at some point.

The code could put the address it read back into the skb, but that would
look even worse.

At the moment the performance is horrid when hundreds of the rtable
entries get deleted under rcu all together.
They are also added to a single locked linked list.

I'm running 500 RTP streams which each send one UDP message every 20ms.
It really needs to used the cached rtable entries.
The only sane way to send the data is through a raw socket and to get
the UDP checksum set you have to sort out the both IP addresses.
(A UPD socket would be given rx data - which needs to go elsewhere.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] net/ipv4/raw Optimise ipv4 raw sends when IP_HDRINCL set.
  2020-05-11 21:28   ` David Laight
@ 2020-05-11 23:09     ` David Miller
  2020-05-12  8:17       ` David Laight
  0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2020-05-11 23:09 UTC (permalink / raw)
  To: David.Laight; +Cc: netdev

From: David Laight <David.Laight@ACULAB.COM>
Date: Mon, 11 May 2020 21:28:18 +0000

> In this case the "modified in userspace meanwhile" just breaks the
> application - it isn't any kind of security issue.

The kernel must provide correct behavior based upon the stable IP
header that it copies into userspace.  I'm not moving on this
requirement, sorry.

I'm sure you have great reasons why you can't use normal UDP sockets
for RTP traffic, but that's how you will get a cached route and avoid
this exact problem.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH net-next] net/ipv4/raw Optimise ipv4 raw sends when IP_HDRINCL set.
  2020-05-11 23:09     ` David Miller
@ 2020-05-12  8:17       ` David Laight
  0 siblings, 0 replies; 5+ messages in thread
From: David Laight @ 2020-05-12  8:17 UTC (permalink / raw)
  To: 'David Miller'; +Cc: netdev

From: David Miller <davem@davemloft.net>
> Sent: 12 May 2020 00:10
> From: David Laight <David.Laight@ACULAB.COM>
> Date: Mon, 11 May 2020 21:28:18 +0000
> 
> > In this case the "modified in userspace meanwhile" just breaks the
> > application - it isn't any kind of security issue.
> 
> The kernel must provide correct behavior based upon the stable IP
> header that it copies into userspace.  I'm not moving on this
> requirement, sorry.
> 
> I'm sure you have great reasons why you can't use normal UDP sockets
> for RTP traffic, but that's how you will get a cached route and avoid
> this exact problem.

Not unless you can tell me how to create a UDP socket that doesn't
receive data.
Even if there is a corresponding RTP receive flow there is no reason
why it should use the same port numbers and IP addresses.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-05-12  8:18 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-10 16:00 [PATCH net-next] net/ipv4/raw Optimise ipv4 raw sends when IP_HDRINCL set David Laight
2020-05-11 20:49 ` David Miller
2020-05-11 21:28   ` David Laight
2020-05-11 23:09     ` David Miller
2020-05-12  8:17       ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).