All of lore.kernel.org
 help / color / mirror / Atom feed
* EINVAL when using connect() for udp sockets
@ 2017-03-09  3:10 Daurnimator
  2017-03-23  2:22 ` Daurnimator
  0 siblings, 1 reply; 17+ messages in thread
From: Daurnimator @ 2017-03-09  3:10 UTC (permalink / raw)
  To: netdev; +Cc: William Ahern, santitm99

When debugging https://github.com/daurnimator/lua-http/issues/73 which
uses https://github.com/wahern/dns we ran into an issue where modern
linux kernels return EINVAL if you try and re-use a udp socket.
The issue seems to occur if you go from a local destination ip to a
non-local one.

>From connect(2) man page:
> connectionless protocol sockets may use connect() multiple times to change their association

(the following content is also available at
https://gist.github.com/daurnimator/6765345776e87a3830ed101d1d983ee1)

$ cat connect-bug-stage2.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <netinet/ip.h>
#include <arpa/inet.h>
#include <errno.h>

int main() {
    int fd = socket(PF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP);
    if (fd == -1)
        exit(1);
    if (bind(fd, (struct sockaddr*)&(struct
sockaddr_in){.sin_family=AF_INET, .sin_port=htons(57997),
.sin_addr=inet_addr("0.0.0.0")}, 16))
        exit(2);
    if (connect(fd, (struct sockaddr*)&(struct
sockaddr_in){.sin_family=AF_INET, .sin_port=htons(53),
.sin_addr=inet_addr("127.0.0.2")}, 16))
        exit(3);
    if (-1 == sendto(fd, "test", 4, 0, NULL, 0))
        exit(4);
    char buf[200];
    if (-1 != recvfrom(fd, buf, 200, 0, 0, 0) && errno != ECONNREFUSED)
        exit(5);
    /* okay, try next server... */
    if (connect(fd, (struct sockaddr*)&(struct
sockaddr_in){.sin_family=AF_INET, .sin_port=htons(53),
.sin_addr=inet_addr("8.8.8.8")}, 16))
        exit(6);
    exit(0);
}
$ gcc connect-bug-stage2.c
$ strace ./a.out
execve("./a.out", ["./a.out"], [/* 56 vars */]) = 0
brk(NULL) = 0x860000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=215350, ...}) = 0
mmap(NULL, 215350, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fc978791000
close(3) = 0
open("/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\3\2\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1951744, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7fc97878f000
mmap(NULL, 3791152, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7fc978206000
mprotect(0x7fc97839b000, 2093056, PROT_NONE) = 0
mmap(0x7fc97859a000, 24576, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x194000) = 0x7fc97859a000
mmap(0x7fc9785a0000, 14640, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fc9785a0000
close(3) = 0
arch_prctl(ARCH_SET_FS, 0x7fc978790400) = 0
mprotect(0x7fc97859a000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ) = 0
mprotect(0x7fc9787c6000, 4096, PROT_READ) = 0
munmap(0x7fc978791000, 215350) = 0
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(57997),
sin_addr=inet_addr("0.0.0.0")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("127.0.0.2")}, 16) = 0
sendto(3, "test", 4, 0, NULL, 0) = 4
recvfrom(3, 0x7ffcc2c9a3a0, 200, 0, NULL, NULL) = -1 ECONNREFUSED
(Connection refused)
connect(3, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("8.8.8.8")}, 16) = -1 EINVAL (Invalid argument)
exit_group(6) = ?
+++ exited with 6 +++

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-03-09  3:10 EINVAL when using connect() for udp sockets Daurnimator
@ 2017-03-23  2:22 ` Daurnimator
  2017-03-23  5:18   ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Daurnimator @ 2017-03-23  2:22 UTC (permalink / raw)
  To: netdev; +Cc: William Ahern, Santi T.

On 9 March 2017 at 14:10, Daurnimator <quae@daurnimator.com> wrote:
> When debugging https://github.com/daurnimator/lua-http/issues/73 which
> uses https://github.com/wahern/dns we ran into an issue where modern
> linux kernels return EINVAL if you try and re-use a udp socket.
> The issue seems to occur if you go from a local destination ip to a
> non-local one.

Did anyone get a chance to look into this issue?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-03-23  2:22 ` Daurnimator
@ 2017-03-23  5:18   ` Eric Dumazet
  2017-03-24 22:34     ` Cong Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2017-03-23  5:18 UTC (permalink / raw)
  To: Daurnimator; +Cc: netdev, William Ahern, Santi T.

On Thu, 2017-03-23 at 13:22 +1100, Daurnimator wrote:
> On 9 March 2017 at 14:10, Daurnimator <quae@daurnimator.com> wrote:
> > When debugging https://github.com/daurnimator/lua-http/issues/73 which
> > uses https://github.com/wahern/dns we ran into an issue where modern
> > linux kernels return EINVAL if you try and re-use a udp socket.
> > The issue seems to occur if you go from a local destination ip to a
> > non-local one.
> 
> Did anyone get a chance to look into this issue?

I believe man page is not complete.

A disconnect is needed before another connect()

eg :

#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <netinet/ip.h>
#include <arpa/inet.h>
#include <errno.h>

int main() {
    int fd = socket(PF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP);
    if (fd == -1)
        exit(1);
    if (bind(fd, (struct sockaddr*)&(struct
sockaddr_in){.sin_family=AF_INET, .sin_port=htons(57997),
.sin_addr=inet_addr("0.0.0.0")}, 16))
        exit(2);
    if (connect(fd, (struct sockaddr*)&(struct
sockaddr_in){.sin_family=AF_INET, .sin_port=htons(53),
.sin_addr=inet_addr("127.0.0.2")}, 16))
        exit(3);
    if (-1 == sendto(fd, "test", 4, 0, NULL, 0))
        exit(4);
    char buf[200];
    if (-1 != recvfrom(fd, buf, 200, 0, 0, 0) && errno != ECONNREFUSED)
        exit(5);
    /* okay, try next server... */

/* disconnect */
    if (connect(fd, (struct sockaddr*)&(struct sockaddr_in){.sin_family=AF_UNSPEC}, 16))
	exit(6);

    if (connect(fd, (struct sockaddr*)&(struct
sockaddr_in){.sin_family=AF_INET, .sin_port=htons(53),
.sin_addr=inet_addr("8.8.8.8")}, 16))
        exit(7);
    exit(0);
}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-03-23  5:18   ` Eric Dumazet
@ 2017-03-24 22:34     ` Cong Wang
  2017-03-24 23:19       ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Cong Wang @ 2017-03-24 22:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daurnimator, Linux Kernel Network Developers, William Ahern,
	Santi T.,
	mtk.manpages

(Cc'ing Michael Kerrisk)

On Wed, Mar 22, 2017 at 10:18 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-03-23 at 13:22 +1100, Daurnimator wrote:
>> On 9 March 2017 at 14:10, Daurnimator <quae@daurnimator.com> wrote:
>> > When debugging https://github.com/daurnimator/lua-http/issues/73 which
>> > uses https://github.com/wahern/dns we ran into an issue where modern
>> > linux kernels return EINVAL if you try and re-use a udp socket.
>> > The issue seems to occur if you go from a local destination ip to a
>> > non-local one.
>>
>> Did anyone get a chance to look into this issue?
>
> I believe man page is not complete.
>
> A disconnect is needed before another connect()

Is it? Making connect() reentrant is reasonable for connection-less
protocol like UDP, but I don't dig POSIX for the details. If so we need
something like below...

--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -40,7 +40,7 @@ int __ip4_datagram_connect(struct sock *sk, struct
sockaddr *uaddr, int addr_len
        sk_dst_reset(sk);

        oif = sk->sk_bound_dev_if;
-       saddr = inet->inet_saddr;
+       saddr = inet->inet_saddr = 0;
        if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
                if (!oif)
                        oif = inet->mc_index;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-03-24 22:34     ` Cong Wang
@ 2017-03-24 23:19       ` Eric Dumazet
  2017-03-28  4:22         ` Cong Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2017-03-24 23:19 UTC (permalink / raw)
  To: Cong Wang
  Cc: Daurnimator, Linux Kernel Network Developers, William Ahern,
	Santi T.,
	mtk.manpages

On Fri, 2017-03-24 at 15:34 -0700, Cong Wang wrote:
> (Cc'ing Michael Kerrisk)
> 
> On Wed, Mar 22, 2017 at 10:18 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Thu, 2017-03-23 at 13:22 +1100, Daurnimator wrote:
> >> On 9 March 2017 at 14:10, Daurnimator <quae@daurnimator.com> wrote:
> >> > When debugging https://github.com/daurnimator/lua-http/issues/73 which
> >> > uses https://github.com/wahern/dns we ran into an issue where modern
> >> > linux kernels return EINVAL if you try and re-use a udp socket.
> >> > The issue seems to occur if you go from a local destination ip to a
> >> > non-local one.
> >>
> >> Did anyone get a chance to look into this issue?
> >
> > I believe man page is not complete.
> >
> > A disconnect is needed before another connect()
> 
> Is it? Making connect() reentrant is reasonable for connection-less
> protocol like UDP, but I don't dig POSIX for the details. If so we need
> something like below...
> 
> --- a/net/ipv4/datagram.c
> +++ b/net/ipv4/datagram.c
> @@ -40,7 +40,7 @@ int __ip4_datagram_connect(struct sock *sk, struct
> sockaddr *uaddr, int addr_len
>         sk_dst_reset(sk);
> 
>         oif = sk->sk_bound_dev_if;
> -       saddr = inet->inet_saddr;
> +       saddr = inet->inet_saddr = 0;
>         if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
>                 if (!oif)
>                         oif = inet->mc_index;

Wont this break bind() ?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-03-24 23:19       ` Eric Dumazet
@ 2017-03-28  4:22         ` Cong Wang
  2017-03-28 23:11           ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Cong Wang @ 2017-03-28  4:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daurnimator, Linux Kernel Network Developers, William Ahern,
	Santi T.,
	Michael Kerrisk-manpages

On Fri, Mar 24, 2017 at 4:19 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2017-03-24 at 15:34 -0700, Cong Wang wrote:
>> (Cc'ing Michael Kerrisk)
>>
>> On Wed, Mar 22, 2017 at 10:18 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Thu, 2017-03-23 at 13:22 +1100, Daurnimator wrote:
>> >> On 9 March 2017 at 14:10, Daurnimator <quae@daurnimator.com> wrote:
>> >> > When debugging https://github.com/daurnimator/lua-http/issues/73 which
>> >> > uses https://github.com/wahern/dns we ran into an issue where modern
>> >> > linux kernels return EINVAL if you try and re-use a udp socket.
>> >> > The issue seems to occur if you go from a local destination ip to a
>> >> > non-local one.
>> >>
>> >> Did anyone get a chance to look into this issue?
>> >
>> > I believe man page is not complete.
>> >
>> > A disconnect is needed before another connect()
>>
>> Is it? Making connect() reentrant is reasonable for connection-less
>> protocol like UDP, but I don't dig POSIX for the details. If so we need
>> something like below...
>>
>> --- a/net/ipv4/datagram.c
>> +++ b/net/ipv4/datagram.c
>> @@ -40,7 +40,7 @@ int __ip4_datagram_connect(struct sock *sk, struct
>> sockaddr *uaddr, int addr_len
>>         sk_dst_reset(sk);
>>
>>         oif = sk->sk_bound_dev_if;
>> -       saddr = inet->inet_saddr;
>> +       saddr = inet->inet_saddr = 0;
>>         if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
>>                 if (!oif)
>>                         oif = inet->mc_index;
>
> Wont this break bind() ?
>

Right. We need to distinguish bind() and connect(), something
like below?

--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -26,7 +26,7 @@ int __ip4_datagram_connect(struct sock *sk, struct
sockaddr *uaddr, int addr_len
        struct sockaddr_in *usin = (struct sockaddr_in *) uaddr;
        struct flowi4 *fl4;
        struct rtable *rt;
-       __be32 saddr;
+       __be32 saddr = 0;
        int oif;
        int err;

@@ -40,7 +40,8 @@ int __ip4_datagram_connect(struct sock *sk, struct
sockaddr *uaddr, int addr_len
        sk_dst_reset(sk);

        oif = sk->sk_bound_dev_if;
-       saddr = inet->inet_saddr;
+       if (sk->sk_userlocks & SOCK_BINDADDR_LOCK)
+               saddr = inet->inet_saddr;
        if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
                if (!oif)
                        oif = inet->mc_index;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-03-28  4:22         ` Cong Wang
@ 2017-03-28 23:11           ` Eric Dumazet
  2017-03-28 23:44             ` Daurnimator
  2017-03-29  0:52             ` Eric Dumazet
  0 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2017-03-28 23:11 UTC (permalink / raw)
  To: Cong Wang
  Cc: Daurnimator, Linux Kernel Network Developers, William Ahern,
	Santi T.,
	Michael Kerrisk-manpages

On Mon, 2017-03-27 at 21:22 -0700, Cong Wang wrote:
> On Fri, Mar 24, 2017 at 4:19 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Fri, 2017-03-24 at 15:34 -0700, Cong Wang wrote:
> >> (Cc'ing Michael Kerrisk)
> >>
> >> On Wed, Mar 22, 2017 at 10:18 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> > On Thu, 2017-03-23 at 13:22 +1100, Daurnimator wrote:
> >> >> On 9 March 2017 at 14:10, Daurnimator <quae@daurnimator.com> wrote:
> >> >> > When debugging https://github.com/daurnimator/lua-http/issues/73 which
> >> >> > uses https://github.com/wahern/dns we ran into an issue where modern
> >> >> > linux kernels return EINVAL if you try and re-use a udp socket.
> >> >> > The issue seems to occur if you go from a local destination ip to a
> >> >> > non-local one.
> >> >>
> >> >> Did anyone get a chance to look into this issue?
> >> >
> >> > I believe man page is not complete.
> >> >
> >> > A disconnect is needed before another connect()
> >>
> >> Is it? Making connect() reentrant is reasonable for connection-less
> >> protocol like UDP, but I don't dig POSIX for the details. If so we need
> >> something like below...
> >>
> >> --- a/net/ipv4/datagram.c
> >> +++ b/net/ipv4/datagram.c
> >> @@ -40,7 +40,7 @@ int __ip4_datagram_connect(struct sock *sk, struct
> >> sockaddr *uaddr, int addr_len
> >>         sk_dst_reset(sk);
> >>
> >>         oif = sk->sk_bound_dev_if;
> >> -       saddr = inet->inet_saddr;
> >> +       saddr = inet->inet_saddr = 0;
> >>         if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
> >>                 if (!oif)
> >>                         oif = inet->mc_index;
> >
> > Wont this break bind() ?
> >
> 
> Right. We need to distinguish bind() and connect(), something
> like below?
> 
> --- a/net/ipv4/datagram.c
> +++ b/net/ipv4/datagram.c
> @@ -26,7 +26,7 @@ int __ip4_datagram_connect(struct sock *sk, struct
> sockaddr *uaddr, int addr_len
>         struct sockaddr_in *usin = (struct sockaddr_in *) uaddr;
>         struct flowi4 *fl4;
>         struct rtable *rt;
> -       __be32 saddr;
> +       __be32 saddr = 0;
>         int oif;
>         int err;
> 
> @@ -40,7 +40,8 @@ int __ip4_datagram_connect(struct sock *sk, struct
> sockaddr *uaddr, int addr_len
>         sk_dst_reset(sk);
> 
>         oif = sk->sk_bound_dev_if;
> -       saddr = inet->inet_saddr;
> +       if (sk->sk_userlocks & SOCK_BINDADDR_LOCK)
> +               saddr = inet->inet_saddr;
>         if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
>                 if (!oif)
>                         oif = inet->mc_index;

Yes, this looks better.

Although you probably need to change a bit later this part :

if (!inet->inet_saddr)
	inet->inet_saddr = fl4->saddr;  /* Update source address */


Should we also fix IPv6 or is this bug only about IPv4 ?

Thanks !

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-03-28 23:11           ` Eric Dumazet
@ 2017-03-28 23:44             ` Daurnimator
  2017-03-28 23:45               ` Daurnimator
  2017-03-29  0:52             ` Eric Dumazet
  1 sibling, 1 reply; 17+ messages in thread
From: Daurnimator @ 2017-03-28 23:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cong Wang, Linux Kernel Network Developers, William Ahern,
	Santi T.,
	Michael Kerrisk-manpages

On 29 March 2017 at 10:11, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Should we also fix IPv6 or is this bug only about IPv4 ?

In IPv6 the second connect() returns ENETUNREACH (rather than failing
yet returning 0 as it does in IPv4).
This should probably incorrect behaviour and should the operation
should succeed.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-03-28 23:44             ` Daurnimator
@ 2017-03-28 23:45               ` Daurnimator
  0 siblings, 0 replies; 17+ messages in thread
From: Daurnimator @ 2017-03-28 23:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cong Wang, Linux Kernel Network Developers, William Ahern,
	Santi T.,
	Michael Kerrisk-manpages

On 29 March 2017 at 10:44, Daurnimator <quae@daurnimator.com> wrote:
> This should probably incorrect behaviour and should the operation
> should succeed.

Uh, not sure why there are so many "should"s in that message.

This is probably incorrect behaviour and the operation should succeed.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-03-28 23:11           ` Eric Dumazet
  2017-03-28 23:44             ` Daurnimator
@ 2017-03-29  0:52             ` Eric Dumazet
  2017-03-30 23:36               ` Cong Wang
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2017-03-29  0:52 UTC (permalink / raw)
  To: Cong Wang
  Cc: Daurnimator, Linux Kernel Network Developers, William Ahern,
	Santi T.,
	Michael Kerrisk-manpages

On Tue, 2017-03-28 at 16:11 -0700, Eric Dumazet wrote:

> Yes, this looks better.
> 
> Although you probably need to change a bit later this part :
> 
> if (!inet->inet_saddr)
> 	inet->inet_saddr = fl4->saddr;  /* Update source address */
> 

I came up with the following tested patch for IPv4

diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index f915abff1350a86af8d5bb89725b751c061b0fb5..1454b6191e0d38ffae0ae260578858285bc5f77b 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -40,7 +40,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	sk_dst_reset(sk);
 
 	oif = sk->sk_bound_dev_if;
-	saddr = inet->inet_saddr;
+	saddr = (sk->sk_userlocks & SOCK_BINDADDR_LOCK) ? inet->inet_saddr : 0;
 	if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
 		if (!oif)
 			oif = inet->mc_index;
@@ -64,9 +64,8 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 		err = -EACCES;
 		goto out;
 	}
-	if (!inet->inet_saddr)
+	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK)) {
 		inet->inet_saddr = fl4->saddr;	/* Update source address */
-	if (!inet->inet_rcv_saddr) {
 		inet->inet_rcv_saddr = fl4->saddr;
 		if (sk->sk_prot->rehash)
 			sk->sk_prot->rehash(sk);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-03-29  0:52             ` Eric Dumazet
@ 2017-03-30 23:36               ` Cong Wang
  2017-03-31  0:02                 ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Cong Wang @ 2017-03-30 23:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daurnimator, Linux Kernel Network Developers, William Ahern,
	Santi T.,
	Michael Kerrisk-manpages

On Tue, Mar 28, 2017 at 5:52 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2017-03-28 at 16:11 -0700, Eric Dumazet wrote:
>
>> Yes, this looks better.
>>
>> Although you probably need to change a bit later this part :
>>
>> if (!inet->inet_saddr)
>>       inet->inet_saddr = fl4->saddr;  /* Update source address */
>>
>
> I came up with the following tested patch for IPv4
>
> diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
> index f915abff1350a86af8d5bb89725b751c061b0fb5..1454b6191e0d38ffae0ae260578858285bc5f77b 100644
> --- a/net/ipv4/datagram.c
> +++ b/net/ipv4/datagram.c
> @@ -40,7 +40,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
>         sk_dst_reset(sk);
>
>         oif = sk->sk_bound_dev_if;
> -       saddr = inet->inet_saddr;
> +       saddr = (sk->sk_userlocks & SOCK_BINDADDR_LOCK) ? inet->inet_saddr : 0;
>         if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
>                 if (!oif)
>                         oif = inet->mc_index;
> @@ -64,9 +64,8 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
>                 err = -EACCES;
>                 goto out;
>         }
> -       if (!inet->inet_saddr)
> +       if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK)) {
>                 inet->inet_saddr = fl4->saddr;  /* Update source address */
> -       if (!inet->inet_rcv_saddr) {
>                 inet->inet_rcv_saddr = fl4->saddr;
>                 if (sk->sk_prot->rehash)
>                         sk->sk_prot->rehash(sk);

Why do we need this here? If you mean bind() INADDR_ANY is bound,
then it is totally a different problem?

BTW, I am still not sure about what POSIX says about the connect()
behavior, I can only find this [1]:

"
If the initiating socket is not connection-mode, then connect() shall set the
socket's peer address, and no connection is made. For SOCK_DGRAM
sockets, the peer address identifies where all datagrams are sent on
subsequent send() functions, and limits the remote sender for subsequent
recv() functions.
"

It doesn't say anything about source address. But the man page [2] says:

"
When
       connect(2) is called on an unbound socket, the socket is
       automatically bound to a random free port or to a usable shared port
       with the local address set to INADDR_ANY.
"

Seems the last part is inaccurate, kernel actually picks a source address
from route instead of just using INADDR_ANY for connect(2).

So, for me, I think the following behaviors make sense for UDP:

1) When a bind() is called before connect()'s, aka:

bind();
connect(addr1); // should not change source addr
connect(addr2); // should fail is the source addr can not reach peer addr

2) No bind() before connect()'s, aka:

connect(addr1); // Free to bind a source addr
connect(addr2); // Free to bind a new source addr and change peer addr

Thoughts?

1. http://pubs.opengroup.org/onlinepubs/9699919799/functions/connect.html
2. http://man7.org/linux/man-pages/man7/ip.7.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-03-30 23:36               ` Cong Wang
@ 2017-03-31  0:02                 ` Eric Dumazet
  2017-03-31 16:52                   ` Cong Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2017-03-31  0:02 UTC (permalink / raw)
  To: Cong Wang
  Cc: Daurnimator, Linux Kernel Network Developers, William Ahern,
	Santi T.,
	Michael Kerrisk-manpages

On Thu, 2017-03-30 at 16:36 -0700, Cong Wang wrote:
> On Tue, Mar 28, 2017 at 5:52 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tue, 2017-03-28 at 16:11 -0700, Eric Dumazet wrote:
> >
> >> Yes, this looks better.
> >>
> >> Although you probably need to change a bit later this part :
> >>
> >> if (!inet->inet_saddr)
> >>       inet->inet_saddr = fl4->saddr;  /* Update source address */
> >>
> >
> > I came up with the following tested patch for IPv4
> >
> > diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
> > index f915abff1350a86af8d5bb89725b751c061b0fb5..1454b6191e0d38ffae0ae260578858285bc5f77b 100644
> > --- a/net/ipv4/datagram.c
> > +++ b/net/ipv4/datagram.c
> > @@ -40,7 +40,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
> >         sk_dst_reset(sk);
> >
> >         oif = sk->sk_bound_dev_if;
> > -       saddr = inet->inet_saddr;
> > +       saddr = (sk->sk_userlocks & SOCK_BINDADDR_LOCK) ? inet->inet_saddr : 0;
> >         if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
> >                 if (!oif)
> >                         oif = inet->mc_index;
> > @@ -64,9 +64,8 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
> >                 err = -EACCES;
> >                 goto out;
> >         }
> > -       if (!inet->inet_saddr)
> > +       if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK)) {
> >                 inet->inet_saddr = fl4->saddr;  /* Update source address */
> > -       if (!inet->inet_rcv_saddr) {
> >                 inet->inet_rcv_saddr = fl4->saddr;
> >                 if (sk->sk_prot->rehash)
> >                         sk->sk_prot->rehash(sk);
> 
> Why do we need this here? If you mean bind() INADDR_ANY is bound,
> then it is totally a different problem?


Proper delivery of RX packets will need to find the socket, and this
needs the 2-tuple (source address, source port) info for UDP.

So after a connect(), we need to rehash

> 
> BTW, I am still not sure about what POSIX says about the connect()
> behavior, I can only find this [1]:
> 
> "
> If the initiating socket is not connection-mode, then connect() shall set the
> socket's peer address, and no connection is made. For SOCK_DGRAM
> sockets, the peer address identifies where all datagrams are sent on
> subsequent send() functions, and limits the remote sender for subsequent
> recv() functions.
> "
> 
> It doesn't say anything about source address. But the man page [2] says:
> 
> "
> When
>        connect(2) is called on an unbound socket, the socket is
>        automatically bound to a random free port or to a usable shared port
>        with the local address set to INADDR_ANY.
> "
> 
> Seems the last part is inaccurate, kernel actually picks a source address
> from route instead of just using INADDR_ANY for connect(2).
> 
> So, for me, I think the following behaviors make sense for UDP:
> 
> 1) When a bind() is called before connect()'s, aka:
> 
> bind();
> connect(addr1); // should not change source addr

It depends. bind() can be only allocating the source port.

If bind(INADDR_ANY) was used, then we need to determine source addr at
connect() time.

Point of connect() is that future send() wont have to guess the 4-tuple
infos. But also that incoming packets will find this precise socket
thanks to a higher score.

And tools like "ss -aun" should display the 4-tuple after a successful
connect()

> connect(addr2); // should fail is the source addr can not reach peer addr
> 
> 2) No bind() before connect()'s, aka:
> 
> connect(addr1); // Free to bind a source addr
> connect(addr2); // Free to bind a new source addr and change peer addr

Exactly. My patch does this.

> 
> Thoughts?
> 
> 1. http://pubs.opengroup.org/onlinepubs/9699919799/functions/connect.html
> 2. http://man7.org/linux/man-pages/man7/ip.7.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-03-31  0:02                 ` Eric Dumazet
@ 2017-03-31 16:52                   ` Cong Wang
  2017-03-31 17:00                     ` Cong Wang
  2017-04-28  2:55                     ` Daurnimator
  0 siblings, 2 replies; 17+ messages in thread
From: Cong Wang @ 2017-03-31 16:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daurnimator, Linux Kernel Network Developers, William Ahern,
	Santi T.,
	Michael Kerrisk-manpages

On Thu, Mar 30, 2017 at 5:02 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-03-30 at 16:36 -0700, Cong Wang wrote:
>> On Tue, Mar 28, 2017 at 5:52 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Tue, 2017-03-28 at 16:11 -0700, Eric Dumazet wrote:
>> >
>> >> Yes, this looks better.
>> >>
>> >> Although you probably need to change a bit later this part :
>> >>
>> >> if (!inet->inet_saddr)
>> >>       inet->inet_saddr = fl4->saddr;  /* Update source address */
>> >>
>> >
>> > I came up with the following tested patch for IPv4
>> >
>> > diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
>> > index f915abff1350a86af8d5bb89725b751c061b0fb5..1454b6191e0d38ffae0ae260578858285bc5f77b 100644
>> > --- a/net/ipv4/datagram.c
>> > +++ b/net/ipv4/datagram.c
>> > @@ -40,7 +40,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
>> >         sk_dst_reset(sk);
>> >
>> >         oif = sk->sk_bound_dev_if;
>> > -       saddr = inet->inet_saddr;
>> > +       saddr = (sk->sk_userlocks & SOCK_BINDADDR_LOCK) ? inet->inet_saddr : 0;
>> >         if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
>> >                 if (!oif)
>> >                         oif = inet->mc_index;
>> > @@ -64,9 +64,8 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
>> >                 err = -EACCES;
>> >                 goto out;
>> >         }
>> > -       if (!inet->inet_saddr)
>> > +       if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK)) {
>> >                 inet->inet_saddr = fl4->saddr;  /* Update source address */
>> > -       if (!inet->inet_rcv_saddr) {
>> >                 inet->inet_rcv_saddr = fl4->saddr;
>> >                 if (sk->sk_prot->rehash)
>> >                         sk->sk_prot->rehash(sk);
>>
>> Why do we need this here? If you mean bind() INADDR_ANY is bound,
>> then it is totally a different problem?
>
>
> Proper delivery of RX packets will need to find the socket, and this
> needs the 2-tuple (source address, source port) info for UDP.
>
> So after a connect(), we need to rehash

Oh, I didn't notice remove the if (!inet->inet_rcv_saddr) check...

[...]

>> 1) When a bind() is called before connect()'s, aka:
>>
>> bind();
>> connect(addr1); // should not change source addr
>
> It depends. bind() can be only allocating the source port.
>
> If bind(INADDR_ANY) was used, then we need to determine source addr at
> connect() time.

Yes, bind() only sets the flag for non-zero address:

        if (inet->inet_rcv_saddr)
                sk->sk_userlocks |= SOCK_BINDADDR_LOCK;


Please submit your patch formally and with a man page patch too.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-03-31 16:52                   ` Cong Wang
@ 2017-03-31 17:00                     ` Cong Wang
  2017-04-28  2:55                     ` Daurnimator
  1 sibling, 0 replies; 17+ messages in thread
From: Cong Wang @ 2017-03-31 17:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daurnimator, Linux Kernel Network Developers, William Ahern,
	Santi T.,
	Michael Kerrisk-manpages

On Fri, Mar 31, 2017 at 9:52 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> Please submit your patch formally and with a man page patch too.

BTW, and probably IPv6 too.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-03-31 16:52                   ` Cong Wang
  2017-03-31 17:00                     ` Cong Wang
@ 2017-04-28  2:55                     ` Daurnimator
  2017-04-28  4:48                       ` Eric Dumazet
  1 sibling, 1 reply; 17+ messages in thread
From: Daurnimator @ 2017-04-28  2:55 UTC (permalink / raw)
  To: Cong Wang
  Cc: Eric Dumazet, Linux Kernel Network Developers, William Ahern,
	Santi T.,
	Michael Kerrisk-manpages

On 1 April 2017 at 03:52, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> Please submit your patch formally and with a man page patch too.

Did a patch get submitted? I had a look but couldn't see it.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-04-28  2:55                     ` Daurnimator
@ 2017-04-28  4:48                       ` Eric Dumazet
  2017-09-04 13:23                         ` Daurnimator
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2017-04-28  4:48 UTC (permalink / raw)
  To: Daurnimator
  Cc: Cong Wang, Linux Kernel Network Developers, William Ahern,
	Santi T.,
	Michael Kerrisk-manpages

On Fri, 2017-04-28 at 12:55 +1000, Daurnimator wrote:
> On 1 April 2017 at 03:52, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > Please submit your patch formally and with a man page patch too.
> 
> Did a patch get submitted? I had a look but couldn't see it.

Not yet.

I will probably wait for next cycle (linux-4.13), I have been busy on
other work lately.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: EINVAL when using connect() for udp sockets
  2017-04-28  4:48                       ` Eric Dumazet
@ 2017-09-04 13:23                         ` Daurnimator
  0 siblings, 0 replies; 17+ messages in thread
From: Daurnimator @ 2017-09-04 13:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cong Wang, Linux Kernel Network Developers, William Ahern,
	Santi T.,
	Michael Kerrisk-manpages

On 28 April 2017 at 14:48, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2017-04-28 at 12:55 +1000, Daurnimator wrote:
>> On 1 April 2017 at 03:52, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > Please submit your patch formally and with a man page patch too.
>>
>> Did a patch get submitted? I had a look but couldn't see it.
>
> Not yet.
>
> I will probably wait for next cycle (linux-4.13), I have been busy on
> other work lately.

Will you have time to work on this soon?

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-09-04 13:23 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-09  3:10 EINVAL when using connect() for udp sockets Daurnimator
2017-03-23  2:22 ` Daurnimator
2017-03-23  5:18   ` Eric Dumazet
2017-03-24 22:34     ` Cong Wang
2017-03-24 23:19       ` Eric Dumazet
2017-03-28  4:22         ` Cong Wang
2017-03-28 23:11           ` Eric Dumazet
2017-03-28 23:44             ` Daurnimator
2017-03-28 23:45               ` Daurnimator
2017-03-29  0:52             ` Eric Dumazet
2017-03-30 23:36               ` Cong Wang
2017-03-31  0:02                 ` Eric Dumazet
2017-03-31 16:52                   ` Cong Wang
2017-03-31 17:00                     ` Cong Wang
2017-04-28  2:55                     ` Daurnimator
2017-04-28  4:48                       ` Eric Dumazet
2017-09-04 13:23                         ` Daurnimator

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.