* socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
@ 2009-03-16 23:48 Felix von Leitner
  2009-03-17  0:00 ` Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Felix von Leitner @ 2009-03-16 23:48 UTC (permalink / raw)
  To: netdev

Here's an strace:

socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET6, sin6_port=htons(6969), inet_pton(AF_INET6, "::ffff:0.0.0.0", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)

This is supposed to work, and it works on other operating systems, even
on Mac OS X.

I think it used to work on Linux, too.

I'm using 2.6.29-rc7 right now, but others have reported this not
working on distro kernels, too.
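
For anyone who wants to reproduce this without strace, the traced calls can be
sketched in C roughly as follows. This is a minimal sketch, not the application's
actual code: the helper names fill_mapped_any() and try_bind_mapped_any() are
made up for illustration, and port 6969 is simply the one from the trace.
Whether bind() succeeds depends on the kernel you run it on.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: fill sa with ::ffff:0.0.0.0 and the given port.
 * Returns 0 on success, -1 if the literal could not be parsed. */
int fill_mapped_any(struct sockaddr_in6 *sa, unsigned short port)
{
        assert(sa != NULL);
        memset(sa, 0, sizeof(*sa));
        sa->sin6_family = AF_INET6;
        sa->sin6_port = htons(port);
        /* inet_pton() returns 1 on success */
        return inet_pton(AF_INET6, "::ffff:0.0.0.0", &sa->sin6_addr) == 1 ? 0 : -1;
}

/* Hypothetical helper: replay the traced call sequence.
 * Returns 0 if bind() succeeded, otherwise the errno of the failing call. */
int try_bind_mapped_any(unsigned short port)
{
        struct sockaddr_in6 sa;
        int one = 1, err = 0, flags, fd;

        if (fill_mapped_any(&sa, port) != 0)
                return EINVAL;
        fd = socket(PF_INET6, SOCK_STREAM, 0);
        if (fd < 0)
                return errno;
        flags = fcntl(fd, F_GETFL);
        if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
            setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
            bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
                err = errno;    /* saved before close() can overwrite it */
        close(fd);
        return err;
}
```

Calling try_bind_mapped_any(6969) and comparing the result against
EADDRNOTAVAIL shows whether a given kernel behaves like the trace above.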

Felix

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-16 23:48 socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0 Felix von Leitner
@ 2009-03-17  0:00 ` Stephen Hemminger
  2009-03-17  0:18   ` Felix von Leitner
  2009-03-17  2:26 ` Brian Haley
  2009-03-17  9:03 ` Bjørn Mork
  2 siblings, 1 reply; 27+ messages in thread
From: Stephen Hemminger @ 2009-03-17  0:00 UTC (permalink / raw)
  To: Felix von Leitner; +Cc: netdev

On Tue, 17 Mar 2009 00:48:10 +0100
Felix von Leitner <felix-kernel@fefe.de> wrote:

> Here's an strace:
> 
> socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
> fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
> fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
> setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> bind(3, {sa_family=AF_INET6, sin6_port=htons(6969), inet_pton(AF_INET6, "::ffff:0.0.0.0", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
> 
> This is supposed to work, and it works on other operating systems, even
> on Mac OS X.
> 
> I think it used to work on Linux, too.
> 
> I'm using 2.6.29-rc7 right now, but others have reported this not
> working on distro kernels, too.
> 
> Felix

Most likely you already have the same port open on IPv4, and unless
you make the socket IPv6-only (IPV6_V6ONLY), the bind will fail. The
standard way of writing servers is to bind only for IPv6 and handle IPv4
clients via IPv4-mapped IPv6 addresses.
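
On such a socket, IPv4 clients show up with peer addresses of the form
::ffff:a.b.c.d. A minimal sketch of how a server might render them, assuming
the address came from accept() or getpeername(); the helper name
peer_to_text() is made up for illustration:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: render a peer address taken from an AF_INET6 socket.
 * IPv4-mapped peers are printed as plain dotted quads, everything else as
 * an IPv6 literal. Returns buf on success, NULL on failure. */
const char *peer_to_text(const struct sockaddr_in6 *peer, char *buf, socklen_t len)
{
        assert(peer != NULL && buf != NULL);
        if (IN6_IS_ADDR_V4MAPPED(&peer->sin6_addr)) {
                struct in_addr v4;

                /* the IPv4 address is the last 4 of the 16 address bytes */
                memcpy(&v4, &peer->sin6_addr.s6_addr[12], sizeof(v4));
                return inet_ntop(AF_INET, &v4, buf, len);
        }
        return inet_ntop(AF_INET6, &peer->sin6_addr, buf, len);
}
```

A buffer of INET6_ADDRSTRLEN bytes is large enough for either form.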


* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17  0:00 ` Stephen Hemminger
@ 2009-03-17  0:18   ` Felix von Leitner
  0 siblings, 0 replies; 27+ messages in thread
From: Felix von Leitner @ 2009-03-17  0:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

> > Here's an strace:
> > 
> > socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
> > fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
> > fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
> > setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> > bind(3, {sa_family=AF_INET6, sin6_port=htons(6969), inet_pton(AF_INET6, "::ffff:0.0.0.0", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
> > 
> > This is supposed to work, and it works on other operating systems, even
> > on Mac OS X.
> > 
> > I think it used to work on Linux, too.
> > 
> > I'm using 2.6.29-rc7 right now, but others have reported this not
> > working on distro kernels, too.
> > 
> > Felix

> Most likely you already have the same port open on IPv4, and unless
> you make the socket IPv6-only (IPV6_V6ONLY), the bind will fail. The
> standard way of writing servers is to bind only for IPv6 and handle IPv4
> clients via IPv4-mapped IPv6 addresses.

No I don't have anything else on that port.

BTW, just for the record, binding to ::ffff:10.0.0.3 (my eth0 address at
the moment) still works, so the mechanism is not completely broken.

Felix


* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-16 23:48 socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0 Felix von Leitner
  2009-03-17  0:00 ` Stephen Hemminger
@ 2009-03-17  2:26 ` Brian Haley
  2009-03-17  2:47   ` Eric Dumazet
  2009-03-17 12:58   ` Felix von Leitner
  2009-03-17  9:03 ` Bjørn Mork
  2 siblings, 2 replies; 27+ messages in thread
From: Brian Haley @ 2009-03-17  2:26 UTC (permalink / raw)
  To: Felix von Leitner; +Cc: netdev

Felix von Leitner wrote:
> Here's an strace:
> 
> socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
> fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
> fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
> setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> bind(3, {sa_family=AF_INET6, sin6_port=htons(6969), inet_pton(AF_INET6, "::ffff:0.0.0.0", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
> 
> This is supposed to work, and it works on other operating systems, even
> on Mac OS X.
> 
> I think it used to work on Linux, too.
> 
> I'm using 2.6.29-rc7 right now, but others have reported this not
> working on distro kernels, too.

I don't think this ever worked on Linux, from the very beginning of inet6_bind():

        /* Check if the address belongs to the host. */
        if (addr_type == IPV6_ADDR_MAPPED) {
                v4addr = addr->sin6_addr.s6_addr32[3];
                if (inet_addr_type(net, v4addr) != RTN_LOCAL) {
                        err = -EADDRNOTAVAIL;
                        goto out;
                }
        } else {

So if it's a mapped address, the lower 32-bits must contain a local address.
RFC 3493 doesn't specifically mention what to do with ::ffff:0.0.0.0, so this
looks like a gray area to me.

So are you trying to get IPv4-only behavior out of this socket?  Seems like the
wrong way to go about it.

-Brian


* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17  2:26 ` Brian Haley
@ 2009-03-17  2:47   ` Eric Dumazet
  2009-03-17  8:51     ` Bjørn Mork
  2009-03-17 16:00     ` Brian Haley
  2009-03-17 12:58   ` Felix von Leitner
  1 sibling, 2 replies; 27+ messages in thread
From: Eric Dumazet @ 2009-03-17  2:47 UTC (permalink / raw)
  To: Brian Haley; +Cc: Felix von Leitner, netdev

Brian Haley a écrit :
> Felix von Leitner wrote:
>> Here's an strace:
>>
>> socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
>> fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
>> fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
>> setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>> bind(3, {sa_family=AF_INET6, sin6_port=htons(6969), inet_pton(AF_INET6, "::ffff:0.0.0.0", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
>>
>> This is supposed to work, and it works on other operating systems, even
>> on Mac OS X.
>>
>> I think it used to work on Linux, too.
>>
>> I'm using 2.6.29-rc7 right now, but others have reported this not
>> working on distro kernels, too.
> 
> I don't think this ever worked on Linux, from the very beginning of inet6_bind():
> 
>         /* Check if the address belongs to the host. */
>         if (addr_type == IPV6_ADDR_MAPPED) {
>                 v4addr = addr->sin6_addr.s6_addr32[3];
>                 if (inet_addr_type(net, v4addr) != RTN_LOCAL) {
>                         err = -EADDRNOTAVAIL;
>                         goto out;
>                 }
>         } else {
> 
> So if it's a mapped address, the lower 32-bits must contain a local address.
> RFC 3493 doesn't specifically mention what to do with ::ffff:0.0.0.0, so this
> looks like a gray area to me.
> 
> So are you trying to get IPv4-only behavior out of this socket?  Seems like the
> wrong way to go about it.

To me, section 3.7 of RFC 3493 is not gray. It refers only to interoperating
with IPv4 applications, i.e. *sending* UDP messages to IPv4 nodes or
*connecting* to TCP IPv4 nodes.

So "::ffff:0.0.0.0" has no meaning to contact an IPV4 node, since 0.0.0.0 is not
a valid IPV4 address.

RFC 2373 is also clear

Part of RFC 3493 :

   Applications may use AF_INET6 sockets to open TCP connections to IPv4
   nodes, or send UDP packets to IPv4 nodes, by simply encoding the
   destination's IPv4 address as an IPv4-mapped IPv6 address, and
   passing that address, within a sockaddr_in6 structure, in the
   connect() or sendto() call.  When applications use AF_INET6 sockets
   to accept TCP connections from IPv4 nodes, or receive UDP packets
   from IPv4 nodes, the system returns the peer's address to the
   application in the accept(), recvfrom(), or getpeername() call using
   a sockaddr_in6 structure encoded this way.



RFC 2373 states :

 The IPv6 transition mechanisms [TRAN] include a technique for hosts
 and routers to dynamically tunnel IPv6 packets over IPv4 routing
 infrastructure.  IPv6 nodes that utilize this technique are assigned
 special IPv6 unicast addresses that carry an IPv4 address in the low-
 order 32-bits.  This type of address is termed an "IPv4-compatible
 IPv6 address" and has the format:

   |                80 bits               | 16 |      32 bits        |
   +--------------------------------------+--------------------------+
   |0000..............................0000|0000|    IPv4 address     |
   +--------------------------------------+----+---------------------+

 A second type of IPv6 address which holds an embedded IPv4 address is
 also defined.  This address is used to represent the addresses of
 IPv4-only nodes (those that *do not* support IPv6) as IPv6 addresses.
 This type of address is termed an "IPv4-mapped IPv6 address" and has
 the format:

   |                80 bits               | 16 |      32 bits        |
   +--------------------------------------+--------------------------+
   |0000..............................0000|FFFF|    IPv4 address     |
   +--------------------------------------+----+---------------------+



So using "::ffff:0.0.0.0" as a local address for an
IPv6 socket is a paradox, since "IPv4-mapped IPv6 addresses"
are for IPv4-only nodes.

If you want to accept only IPV4 connections, why use AF_INET6 in the first
place ?

Check how sctp_v6_cmp_addr() is implemented to see how expensive it
is to handle extensive IPv6 address comparisons...


/* Compare addresses exactly.
 * v4-mapped-v6 is also in consideration.
 */
static int sctp_v6_cmp_addr(const union sctp_addr *addr1,
                            const union sctp_addr *addr2)
{
        if (addr1->sa.sa_family != addr2->sa.sa_family) {
                if (addr1->sa.sa_family == AF_INET &&
                    addr2->sa.sa_family == AF_INET6 &&
                    ipv6_addr_v4mapped(&addr2->v6.sin6_addr)) {
                        if (addr2->v6.sin6_port == addr1->v4.sin_port &&
                            addr2->v6.sin6_addr.s6_addr32[3] ==
                            addr1->v4.sin_addr.s_addr)
                                return 1;
                }
                if (addr2->sa.sa_family == AF_INET &&
                    addr1->sa.sa_family == AF_INET6 &&
                    ipv6_addr_v4mapped(&addr1->v6.sin6_addr)) {
                        if (addr1->v6.sin6_port == addr2->v4.sin_port &&
                            addr1->v6.sin6_addr.s6_addr32[3] ==
                            addr2->v4.sin_addr.s_addr)
                                return 1;
                }
                return 0;
        }
        if (!ipv6_addr_equal(&addr1->v6.sin6_addr, &addr2->v6.sin6_addr))
                return 0;
        /* If this is a linklocal address, compare the scope_id. */
        if (ipv6_addr_type(&addr1->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) {
                if (addr1->v6.sin6_scope_id && addr2->v6.sin6_scope_id &&
                    (addr1->v6.sin6_scope_id != addr2->v6.sin6_scope_id)) {
                        return 0;
                }
        }

        return 1;
}



* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17  2:47   ` Eric Dumazet
@ 2009-03-17  8:51     ` Bjørn Mork
  2009-03-17 16:00     ` Brian Haley
  1 sibling, 0 replies; 27+ messages in thread
From: Bjørn Mork @ 2009-03-17  8:51 UTC (permalink / raw)
  To: netdev

Eric Dumazet <dada1@cosmosbay.com> writes:

> RFC 2373 states :

I fully agree with your interpretation...

..., but just FYI, RFC 2373 was obsoleted by RFC 3513 which in turn was
obsoleted by RFC 4291.  Among the important changes was the deprecation
of the first address class you quote (the ::a.b.c.d addresses).

This doesn't affect the question though.



Bjørn



* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-16 23:48 socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0 Felix von Leitner
  2009-03-17  0:00 ` Stephen Hemminger
  2009-03-17  2:26 ` Brian Haley
@ 2009-03-17  9:03 ` Bjørn Mork
  2 siblings, 0 replies; 27+ messages in thread
From: Bjørn Mork @ 2009-03-17  9:03 UTC (permalink / raw)
  To: netdev

Felix von Leitner <felix-kernel@fefe.de> writes:

> bind(3, {sa_family=AF_INET6, sin6_port=htons(6969), inet_pton(AF_INET6, "::ffff:0.0.0.0", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
>
> This is supposed to work, and it works on other operating systems, even
> on Mac OS X.
>
> I think it used to work on Linux, too.

You can find test results for a number of (older) OSes here:
http://www.kame.net/newsletter/20010504/

You'll probably have to refer to the bindtest man page to interpret the
results:  http://www.jinmei.org/bindtest-man.txt



Bjørn



* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17  2:26 ` Brian Haley
  2009-03-17  2:47   ` Eric Dumazet
@ 2009-03-17 12:58   ` Felix von Leitner
  2009-03-17 13:47     ` Vlad Yasevich
  2009-03-17 15:59     ` Brian Haley
  1 sibling, 2 replies; 27+ messages in thread
From: Felix von Leitner @ 2009-03-17 12:58 UTC (permalink / raw)
  To: Brian Haley; +Cc: netdev

> I don't think this ever worked on Linux, from the very beginning of inet6_bind():

>         /* Check if the address belongs to the host. */
>         if (addr_type == IPV6_ADDR_MAPPED) {
>                 v4addr = addr->sin6_addr.s6_addr32[3];
>                 if (inet_addr_type(net, v4addr) != RTN_LOCAL) {
>                         err = -EADDRNOTAVAIL;
>                         goto out;
>                 }
>         } else {

What is the harm in allowing this?  That way an application ported to
IPv6 can still bind IPv4-only.  Why would it be legal to bind to a
specific IPv4 address but not to all IPv4 addresses?

The specific case is a bittorrent tracker.  The code was ported to IPv6,
but since there is so much overhead in storing IPv6 addresses you are
supposed to run two processes, one on the IPv6 address and one on the
IPv4 address (the IPv4 one then does not have overhead).  The sane way
to do this is to bind the IPv6 socket to ::ffff:0.0.0.0 then.  Otherwise
you would need some kind of giant abstraction layer in the application.
And we specifically added the ipv4 mapped addresses so applications
would not need to have a giant abstraction layer.

Did I mention *BSD and OSX allow this?

> So are you trying to get IPv4-only behavior out of this socket?  Seems
> like the wrong way to go about it.

Why would you say that?

Felix


* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17 12:58   ` Felix von Leitner
@ 2009-03-17 13:47     ` Vlad Yasevich
  2009-03-17 14:14       ` Felix von Leitner
  2009-03-17 15:59     ` Brian Haley
  1 sibling, 1 reply; 27+ messages in thread
From: Vlad Yasevich @ 2009-03-17 13:47 UTC (permalink / raw)
  To: Felix von Leitner; +Cc: Brian Haley, netdev

Felix von Leitner wrote:
>> I don't think this ever worked on Linux, from the very beginning of inet6_bind():
> 
>>         /* Check if the address belongs to the host. */
>>         if (addr_type == IPV6_ADDR_MAPPED) {
>>                 v4addr = addr->sin6_addr.s6_addr32[3];
>>                 if (inet_addr_type(net, v4addr) != RTN_LOCAL) {
>>                         err = -EADDRNOTAVAIL;
>>                         goto out;
>>                 }
>>         } else {
> 
> What is the harm in allowing this?  That way an application ported to
> IPv6 can still bind IPv4-only.  Why would it be legal to bind to a
> specific IPv4 address but not to all IPv4 addresses?
> 
> The specific case is a bittorrent tracker.  The code was ported to IPv6,
> but since there is so much overhead in storing IPv6 addresses you are
> supposed to run two processes, one on the IPv6 address and one on the
> IPv4 address (the IPv4 one then does not have overhead).  The sane way
> to do this is to bind the IPv6 socket to ::ffff:0.0.0.0 then.  Otherwise
> you would need some kind of giant abstraction layer in the application.
> And we specifically added the ipv4 mapped addresses so applications
> would not need to have a giant abstraction layer.

Sorry, I just don't buy this.  You imply that you don't want the overhead
of storing IPv6 addresses, but you still get this with ::ffff:0.0.0.0.
In fact, now your overhead is even worse since every IPv4 address will be
stored and interpreted as a 128-bit IPv6 address.

If you really care about overhead, run 2 services.  Your IPv6 service
will only track real IPv6 addresses and will reduce your total overhead.

If you don't care about overhead, just bind a single socket to :: and
you will get behavior identical to the ::ffff:0.0.0.0 case, but with
the added benefit of tracking real IPv6 addresses as well.

Having written support for ::ffff:0.0.0.0, I've always thought it was
a bastardized case that didn't provide any benefits.  It was like saying:
"I've got IPv6 on my system, but I don't really support it, even though
I pretend that I do."

> 
> Did I mention *BSD and OSX allow this?
> 
>> So are you trying to get IPv4-only behavior out of this socket?  Seems
>> like the wrong way to go about it.
> 
> Why would you say that?

Because that case doesn't provide any benefits.  It only has the drawback that
you have to deal with ipv4-mapped IPv6 addresses, which is the overhead of
the whole thing.

If you are prepared to deal with it, you might as well deal with real ipv6 addresses
at the same time and mitigate your overhead somewhat.

-vlad
> 
> Felix



* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17 13:47     ` Vlad Yasevich
@ 2009-03-17 14:14       ` Felix von Leitner
  2009-03-17 14:57         ` Vlad Yasevich
  2009-03-17 15:21         ` Eric Dumazet
  0 siblings, 2 replies; 27+ messages in thread
From: Felix von Leitner @ 2009-03-17 14:14 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Brian Haley, netdev

> Sorry, I just don't buy this.  You imply that you don't want the overhead
> of storing IPv6 addresses, but you still get this with ::ffff:0.0.0.0.
> In fact, now your overhead is even worse since every IPv4 address will be
> stored and interpreted as a 128-bit IPv6 address.

> If you really care about overhead, run 2 services.  Your IPv6 service
> will only track real IPv6 addresses and will reduce your total overhead.

I am worried about the overhead of storing the IPv6 addresses.
I am not storing them in the IPv4 case.

But the socket code has been rewritten to use IPv6 addresses only,
precisely because IPv4-mapped addresses exist.

> If you don't care about overhead, just bind a single socket to :: and
> you will get behavior identical to the ::ffff:0.0.0.0 case, but with
> the added benefit of tracking real IPv6 addresses as well.

You probably mean well but please stick to the problem at hand and don't
speculate about my app.

> Having written support for ::ffff:0.0.0.0, I've always thought it was
> a bastardized case that didn't provide any benefits.  It was like saying:
> "I've got IPv6 on my system, but I don't really support it, even though
> I pretend that I do."

The app has a command line option to specify which address to bind to.
The app understands IPv4 addresses and converts them to ipv4 mapped
addresses so it can only deal with sockaddr_in6 when talking to the
kernel and does not need to store info on what kind of socket family it
is dealing with.

If someone specifies 0.0.0.0, it does not work.  It's that easy.
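
The conversion described above can be sketched like this. It is a minimal
sketch under stated assumptions, not the application's actual code: the helper
name parse_bind_addr() is made up for illustration, and it only builds the
address, it does not call bind().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: parse an IPv4 or IPv6 literal into an in6_addr,
 * turning IPv4 input into the IPv4-mapped form ::ffff:a.b.c.d.
 * Returns 0 on success, -1 if text is neither kind of literal. */
int parse_bind_addr(const char *text, struct in6_addr *out)
{
        struct in_addr v4;

        assert(text != NULL && out != NULL);
        if (inet_pton(AF_INET, text, &v4) == 1) {
                memset(out, 0, sizeof(*out));   /* 80 zero bits */
                out->s6_addr[10] = 0xff;        /* 16 one bits */
                out->s6_addr[11] = 0xff;
                memcpy(&out->s6_addr[12], &v4, sizeof(v4));
                return 0;
        }
        return inet_pton(AF_INET6, text, out) == 1 ? 0 : -1;
}
```

With this, "0.0.0.0" on the command line becomes ::ffff:0.0.0.0, which is
exactly the address the failing bind() in the trace was given.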

Now it may be a fascinating side discussion on whether you think IPv4
mapped 0.0.0.0 is useful or not, but rest assured: it is useful to at
least one high profile app that is so far running on Linux.

> > Why would you say that?
> Because that case doesn't provide any benefits.

You may not see it but it does.

> It only has the drawback that you have to deal with ipv4-mapped IPv6
> addresses, which is the overhead of the whole thing.

That is not a drawback.  On the contrary.  It greatly simplifies how the
app deals with the socket API.

> If you are prepared to deal with it, you might as well deal with real ipv6 addresses
> at the same time and mitigate your overhead somewhat.

You are currently proving all the snide remarks by the BSD people about
the Linux IP stack true, and the "professionalism" snide remarks of the
Solaris people.  Great work, man.

Felix


* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17 14:14       ` Felix von Leitner
@ 2009-03-17 14:57         ` Vlad Yasevich
  2009-03-17 17:51           ` Felix von Leitner
  2009-03-17 15:21         ` Eric Dumazet
  1 sibling, 1 reply; 27+ messages in thread
From: Vlad Yasevich @ 2009-03-17 14:57 UTC (permalink / raw)
  To: Felix von Leitner; +Cc: Brian Haley, netdev

Felix von Leitner wrote:
>> Sorry, I just don't buy this.  You imply that you don't want the overhead
>> of storing IPv6 addresses, but you still get this with ::ffff:0.0.0.0.
>> In fact, now your overhead is even worse since every IPv4 address will be
>> stored and interpreted as a 128-bit IPv6 address.
> 
>> If you really care about overhead, run 2 services.  Your IPv6 service
>> will only track real IPv6 addresses and will reduce your total overhead.
> 
> I am worried about the overhead of storing the IPv6 addresses.
> I am not storing them in the IPv4 case.
> 
> But the socket code has been rewritten to use IPv6 addresses only,
> precisely because IPv4-mapped addresses exist.

So, what you want to do is provide an IPv4-only service on a fully
configured dual-stacked machine by running an IPv6-enabled application?

Why do you not want to provide the IPv6 side of the same service?

You mentioned overhead (and I am guessing that's the answer to the above question),
but is the number of IPv6 clients so high that your service would
not be able to handle it?

As I've already mentioned, your overhead of tracking IPv6 clients is actually
lower than that of tracking all the IPv4 clients using mapped addresses.

One way of preventing the tracking of IPv6 clients is by disallowing IPv6 traffic
or even not configuring any IPv6 addresses.  That could get you what you want
right now, without waiting for a kernel patch.

> 
>> If you don't care about overhead, just bind a single socket to :: and
>> you will get behavior identical to the ::ffff:0.0.0.0 case, but with
>> the added benefit of tracking real IPv6 addresses as well.
> 
> You probably mean well but please stick to the problem at hand and don't
> speculate about my app.
> 
>> Having written support for ::ffff:0.0.0.0, I've always thought it was
>> a bastardized case that didn't provide any benefits.  It was like saying:
>> "I've got IPv6 on my system, but I don't really support it, even though
>> I pretend that I do."
> 
> The app has a command line option to specify which address to bind to.
> The app understands IPv4 addresses and converts them to ipv4 mapped
> addresses so it can only deal with sockaddr_in6 when talking to the
> kernel and does not need to store info on what kind of socket family it
> is dealing with.
> 
> If someone specifies 0.0.0.0, it does not work.  It's that easy.
> 
> Now it may be a fascinating side discussion on whether you think IPv4
> mapped 0.0.0.0 is useful or not, but rest assured: it is useful to at
> least one high profile app that is so far running on Linux.
> 

In this case, you are making a trade-off of application complexity against
kernel complexity.  You are making your application much simpler, while demanding
more complexity from the kernel.

It is your right as an application developer, and it is our right as kernel developers
to push back and provide alternatives.

>>> Why would you say that?
>> Because that case doesn't provide any benefits.
> 
> You may not see it but it does.
> 
>> It only has the drawback that you have to deal with ipv4-mapped IPv6
>> addresses, which is the overhead of the whole thing.
> 
> That is not a drawback.  On the contrary.  It greatly simplifies how the
> app deals with the socket API.
> 
>> If you are prepared to deal with it, you might as well deal with real ipv6 addresses
>> at the same time and mitigate your overhead somewhat.
> 
> You are currently proving all the snide remarks by the BSD people about
> the Linux IP stack true, and the "professionalism" snide remarks of the
> Solaris people.  Great work, man.
> 

This is really a great way to convince someone to do the work... :/

-vlad



* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17 14:14       ` Felix von Leitner
  2009-03-17 14:57         ` Vlad Yasevich
@ 2009-03-17 15:21         ` Eric Dumazet
  2009-03-17 18:01           ` Felix von Leitner
  1 sibling, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2009-03-17 15:21 UTC (permalink / raw)
  To: Felix von Leitner; +Cc: Vlad Yasevich, Brian Haley, netdev

Felix von Leitner a écrit :
>> Sorry, I just don't buy this.  You imply that you don't want the overhead
>> of storing IPv6 addresses, but you still get this with ::ffff:0.0.0.0.
>> In fact, now your overhead is even worse since every IPv4 address will be
>> stored and interpreted as a 128-bit IPv6 address.
> 
>> If you really care about overhead, run 2 services.  Your IPv6 service
>> will only track real IPv6 addresses and will reduce your total overhead.
> 
> I am worried about the overhead of storing the IPv6 addresses.
> I am not storing them in the IPv4 case.
> 
> But the socket code has been rewritten to use IPv6 addresses only,
> precisely because IPv4-mapped addresses exist.
> 
>> If you don't care about overhead, just bind a single socket to :: and
>> you will get behavior identical to the ::ffff:0.0.0.0 case, but with
>> the added benefit of tracking real IPv6 addresses as well.
> 
> You probably mean well but please stick to the problem at hand and don't
> speculate about my app.
> 
>> Having written support for ::ffff:0.0.0.0, I've always thought it was
>> a bastardized case that didn't provide any benefits.  It was like saying:
>> "I've got IPv6 on my system, but I don't really support it, even though
>> I pretend that I do."
> 
> The app has a command line option to specify which address to bind to.
> The app understands IPv4 addresses and converts them to ipv4 mapped
> addresses so it can only deal with sockaddr_in6 when talking to the
> kernel and does not need to store info on what kind of socket family it
> is dealing with.
> 
> If someone specifies 0.0.0.0, it does not work.  It's that easy.
> 
> Now it may be a fascinating side discussion on whether you think IPv4
> mapped 0.0.0.0 is useful or not, but rest assured: it is useful to at
> least one high profile app that is so far running on Linux.
> 
>>> Why would you say that?
>> Because that case doesn't provide any benefits.
> 
> You may not see it but it does.
> 
>> It only has the drawback that you have to deal with ipv4-mapped IPv6
>> addresses, which is the overhead of the whole thing.
> 
> That is not a drawback.  On the contrary.  It greatly simplifies how the
> app deals with the socket API.
> 
>> If you are prepared to deal with it, you might as well deal with real ipv6 addresses
>> at the same time and mitigate your overhead somewhat.
> 
> You are currently proving all the snide remarks by the BSD people about
> the Linux IP stack true, and the "professionalism" snide remarks of the
> Solaris people.  Great work, man.
>

Trying to understand why you seem furious; let's try to be pragmatic.

Most users of your great program won't have a fix for this until next year.

I am afraid you have no choice but to change your program, or lose users.

Still, I don't get your point. Having TCP V6 sockets is much more expensive
at the kernel level (same for UDP), and bittorrent is known to stress the network a bit, so
having the application use an IPv4 socket where it can is a win: your
program gets more users, and computers spend less power.

grep TCP /proc/slabinfo

tw_sock_TCPv6          0      0    192   21    1 : tunables    0    0    0 : slabdata      0      0      0
TCPv6                140    140   1600   20    8 : tunables    0    0    0 : slabdata      7      7      0
tw_sock_TCP          256    256    128   32    1 : tunables    0    0    0 : slabdata      8      8      0
TCP                  197    198   1472   22    8 : tunables    0    0    0 : slabdata      9      9      0


Gasp, OSX having this "::ffff:0.0.0.0" right is probably the reason why more computers
run OSX than Linux. Sometimes don't implement RFCs too literally :)






* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17 12:58   ` Felix von Leitner
  2009-03-17 13:47     ` Vlad Yasevich
@ 2009-03-17 15:59     ` Brian Haley
       [not found]       ` <20090317180840.GC13270@codeblau.de>
  1 sibling, 1 reply; 27+ messages in thread
From: Brian Haley @ 2009-03-17 15:59 UTC (permalink / raw)
  To: Felix von Leitner; +Cc: netdev

Felix von Leitner wrote:
>> I don't think this ever worked on Linux, from the very beginning of inet6_bind():
> 
>>         /* Check if the address belongs to the host. */
>>         if (addr_type == IPV6_ADDR_MAPPED) {
>>                 v4addr = addr->sin6_addr.s6_addr32[3];
>>                 if (inet_addr_type(net, v4addr) != RTN_LOCAL) {
>>                         err = -EADDRNOTAVAIL;
>>                         goto out;
>>                 }
>>         } else {
> 
> What is the harm in allowing this?  That way an application ported to
> IPv6 can still bind IPv4-only.  Why would it be legal to bind to a
> specific IPv4 address but not to all IPv4 addresses?

Please show me a porting guide that even mentions supporting IPv4-only mode
through an IPv6 socket by using this method.  There is none that I know of.

> The specific case is a bittorrent tracker.  The code was ported to IPv6,
> but since there is so much overhead in storing IPv6 addresses you are
> supposed to run two processes, one on the IPv6 address and one on the
> IPv4 address (the IPv4 one then does not have overhead).  The sane way
> to do this is to bind the IPv6 socket to ::ffff:0.0.0.0 then.  Otherwise
> you would need some kind of giant abstraction layer in the application.
> And we specifically added the ipv4 mapped addresses so applications
> would not need to have a giant abstraction layer.
> 
> Did I mention *BSD and OSX allow this?

That was their decision, and it doesn't mean it's the right thing to do.  It
doesn't mean Linux shouldn't change either, but name-calling isn't going to get
you anywhere on this list.

Compare your bittorrent server to Apache, which is probably the most widely-used
server application in the world.  It doesn't do what you're trying to do.  See
http://httpd.apache.org/docs/2.2/bind.html and/or browse the source code.

>> So are you trying to get IPv4-only behavior out of this socket?  Seems
>> like the wrong way to go about it.
> 
> Why would you say that?

Because if you want IPv4-only you open an AF_INET socket.  There is no
IPv4-only equivalent to IPv6-only mode, which you get when you open an
AF_INET6 socket and set IPV6_V6ONLY on it.

-Brian
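
For illustration, the two socket setups contrasted above might look like
this in C.  This is only a minimal sketch: the helper names open_v4_only()
and open_v6_only() are made up for this example and appear nowhere in the
thread, and error handling is reduced to returning -1.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* IPv4-only service: a plain AF_INET socket.  Returns the fd or -1. */
int open_v4_only(void)
{
	return socket(AF_INET, SOCK_STREAM, 0);
}

/* IPv6-only service: an AF_INET6 socket with IPV6_V6ONLY switched on,
 * so it never handles IPv4 (v4-mapped) traffic.  Returns the fd or -1. */
int open_v6_only(void)
{
	int on = 1;
	int fd = socket(AF_INET6, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Both helpers return an unbound socket; the caller would still bind() it to
the wildcard or to a specific address of the matching family.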

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17  2:47   ` Eric Dumazet
  2009-03-17  8:51     ` Bjørn Mork
@ 2009-03-17 16:00     ` Brian Haley
  1 sibling, 0 replies; 27+ messages in thread
From: Brian Haley @ 2009-03-17 16:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Felix von Leitner, netdev

Eric Dumazet wrote:
> To me, section 3.7 of RFC 3493 is not gray. It is only referring to interoperating
> with IPV4 applications.
> Ie *sending* UDP messages to IPV4 nodes, or *connect* to TCP IPV4 nodes.
> 
> So "::ffff:0.0.0.0" has no meaning to contact an IPV4 node, since 0.0.0.0 is not
> a valid IPV4 address.

I agree with you Eric :)  I was simply referring to the fact that RFC 3493
doesn't distinguish between valid and invalid use of mapped addresses:

  IPv4-mapped addresses are written as follows:

      ::FFFF:<IPv4-address>

<IPv4-address> could be interpreted as 0.0.0.0 if you take that little section
out of context.

-Brian
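
As a sketch of the ::FFFF:<IPv4-address> layout discussed above, the
following C builds a mapped address from an IPv4 address and recognises
the ::ffff:0.0.0.0 case.  The function names make_v4mapped() and
is_v4mapped_any() are hypothetical helpers for this example;
IN6_IS_ADDR_V4MAPPED is the standard macro from <netinet/in.h>.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Build ::ffff:a.b.c.d from an IPv4 address in network byte order:
 * 80 zero bits, 16 one bits, then the 32-bit IPv4 address. */
struct in6_addr make_v4mapped(struct in_addr v4)
{
	struct in6_addr v6;

	memset(&v6, 0, sizeof(v6));
	v6.s6_addr[10] = 0xff;
	v6.s6_addr[11] = 0xff;
	memcpy(&v6.s6_addr[12], &v4.s_addr, sizeof(v4.s_addr));
	return v6;
}

/* Return 1 if addr is exactly ::ffff:0.0.0.0, else 0. */
int is_v4mapped_any(const struct in6_addr *addr)
{
	static const unsigned char zero[4];

	return IN6_IS_ADDR_V4MAPPED(addr) &&
	       memcmp(&addr->s6_addr[12], zero, sizeof(zero)) == 0;
}
```

Syntactically, then, ::ffff:0.0.0.0 is a well-formed mapped address; the
disagreement in this thread is about what bind() should do with it.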

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17 14:57         ` Vlad Yasevich
@ 2009-03-17 17:51           ` Felix von Leitner
  0 siblings, 0 replies; 27+ messages in thread
From: Felix von Leitner @ 2009-03-17 17:51 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Brian Haley, netdev

> > I am worried about the overhead of storing the IPv6 addresses.
> > I am not storing them in the IPv4 case.

> > But the socket code has been rewritten to use IPv6 addresses only,
> > precisely because IPv4-mapped addresses exist.
> So, what you want to do is provide IPv4 only service on a fully
> configured dual-stacked machine by running an IPv6 enabled application?

Yes.
Actually, I want to provide IPv6 and IPv4 service, but it turns out the
users in some cases want to run the service in IPv4-only mode.

> Why do you not want to provide IPv6 side of the same service?

As I said, in this particular case, you run two processes.
One for IPv6 and one for IPv4.

The reason is that

  a) it's P2P, so you don't want to provide IPv6 addresses of peers to
  IPv4 users anyway, because if they supported IPv6, they'd be
  connecting via IPv6.

  b) IPv4 users outnumber IPv6 users by a wide margin.  For the IPv4
  case it does not make sense to waste 12 bytes per IP address to even
  store the "::ffff:" part.

> You mentioned overhead (and I am guessing that's the answer the above question),
> but is the number of IPv6 clients so high that your service would
> not be able to handle it.

The overhead is the memory overhead needed to store the IP addresses of
the peers.  For some popular files we are talking about a five digit
number of peers, and we don't want to store the full IPv6 address for
those.  We do want to use IPv6 sockets so we don't have to add code to
differentiate and make it work, because the kernel already has that code
in the form of the ipv4-mapped address handling code.  And it works,
except for that one if clause that prevents me from binding to
::ffff:0.0.0.0
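
A minimal sketch of the storage saving described above, assuming the
application receives peers as struct sockaddr_in6 from an AF_INET6 socket:
when the peer address is v4-mapped, only the trailing 4 bytes need to be
kept.  The helper name peer_v4_from_mapped() is hypothetical and not taken
from the software under discussion.

```c
#include <netinet/in.h>
#include <string.h>

/* If sa holds a v4-mapped address, copy its 4-byte IPv4 part into *out
 * (network byte order) and return 1.  Otherwise leave *out untouched and
 * return 0, meaning the full 16-byte address would have to be stored. */
int peer_v4_from_mapped(const struct sockaddr_in6 *sa, struct in_addr *out)
{
	if (!IN6_IS_ADDR_V4MAPPED(&sa->sin6_addr))
		return 0;
	memcpy(&out->s_addr, &sa->sin6_addr.s6_addr[12], sizeof(out->s_addr));
	return 1;
}
```

With sizeof(struct in6_addr) at 16 bytes and sizeof(struct in_addr) at 4,
this is where the 12 bytes per peer come from.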

As I said, this is not _me_ who wants to bind there.  It's the user who
uses "-i 0.0.0.0" to get a process that runs only in IPv4 mode.  It took
me a while to see the point in that, too.

But again, it's not my place to argue with the customers on how they
want to use the software.  It's my place to provide software that does
what they need.  And if you ask me, the same holds true for you.

> As I've already mentioned, your overhead of tracking IPv6 clients is actually
> lower than tracking all the IPv4 clients using mapped addresses.

You did not understand the problem then.
I hope you understand it now.

> One way of preventing the tracking IPv6 clients is by disallowing IPv6 traffic
> or even not configuring any IPv6 addresses.  That could get what you want
> right now, without waiting for a kernel patch.

We do have IPv6, and we have it enabled, and we run a copy of the
software on the IPv6 address, too.

Now we could bind to the specific address of the PC, but that happens to
interfere with the load balancing and failover installation we have.  In
the case of one failing node, we configure that IP address on one of the
other hosts and expect that host to handle that traffic.

> In this case, you are making a trade-off of application complexity against
> kernel complexity.  You are making your application much simpler, while demanding
> more complexity from the kernel.

In fact it's the other way around.

I waited for the kernel to support v4 mapped addresses.
Then I wrote the socket layer on top of it.

You already committed on providing the complexity.  Now I just want you
to follow through on the promise. :-)

> >> If you are prepared to deal with it, you might as well deal with real ipv6 addresses
> >> at the same time and mitigate your overhead somewhat.
> > You are currently proving all the snide remarks by the BSD people about
> > the Linux IP stack true, and the "professionalism" snide remarks of the
> > Solaris people.  Great work, man.
> This is really a great way to convince someone to do the work... :/

Hey, I'm just saying.  My middleware runs on Linux, BSD, OSX and
Solaris.  I'm just writing the middleware.  Previously, users of my
middleware switched from BSD to Linux because v4 mapped v6 addresses
were turned off by default in FreeBSD.  My users made a stink about it
and convinced FreeBSD to change the default.  But many of them switched
to Linux.

What do you think happens if my middleware now does not work right on
Linux?  People will switch to Solaris.  Or FreeBSD.

I am willing to put up a fight before abandoning ship.  You apparently
think this is a disservice to you because I'm taking your time with
this, but it's in fact the opposite.  I'm giving Linux an opportunity
here to set things right.

Linux has stood tall as a beacon of "it may take us longer but we like
to do things right".  We did not just do a big kernel lock, we wanted to
do it right.  We did not just take an old Unix filesystem, we wanted to
do it right.  We did not just reimplement mbufs, we wanted to do memory
management right.

And now I hope we do not just let some language lawyer weasel through
some RFC and provide an interpretation of it that would legally allow
the current broken behavior.  I hope we fix it instead.

This may not seem like much to you, but we are talking about the biggest
noncommercial Internet messaging infrastructure here.  If they run
Linux, that is an asset for Linux.  Because it shows that we can scale.
We can provide a proper implementation of the IPv6 APIs.

Please don't be part of the problem.  Be part of the solution.

Felix

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17 15:21         ` Eric Dumazet
@ 2009-03-17 18:01           ` Felix von Leitner
  0 siblings, 0 replies; 27+ messages in thread
From: Felix von Leitner @ 2009-03-17 18:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Vlad Yasevich, Brian Haley, netdev

> Trying to understand why you seem furious, lets try to be pragmatic.

I'm not furious.  I just get angry when people I submit a bug report to
tell me they don't want to fix the bug.

Some people think that if I submit a bug to them, they are doing me a
service if they fix the bug.  In fact it's the opposite of that.  If I
submit a bug, I am doing them a service, because I am telling them in
what way their software fails to meet the requirements of the users.

> Most users of your great program wont have a fix for this until next year.

You underestimate my users.  The few ones that run into this kind of
problem are not above patching their kernels to make it work.

But I am not willing to provide a kernel patch and do the customer
support for that.

> I am afraid you have no choice but to change your program, or lose users.

No I will not.  My program works.  Just not on Linux.
If my users see that "the Linux people" don't consider running high
profile high throughput messaging systems important enough to remove one
if clause of dubious merit, then they go switch to Solaris or FreeBSD
instead.  And then Solaris and FreeBSD get the PR benefit.

> Still I dont get your point. Having TCP V6 sockets is much more expensive
> at kernel level (same for UDP), and bittorrent is known to stress network a bit, so
> having application use an IPV4 socket where it can is a win for your
> program getting more users, and computers spend less power.

There are three things to say to that:

  1. IPv6 is the future.  If I implement IPv4 code because the IPv6 code
  is slower, there will never be an incentive for the kernel people to
  tune the IPv6 code, and it will continue to suck.

  2. IPv4 users won't ever switch to IPv6 if they hear it's so slow that
  people like me had to provide a legacy code path for performance
  reasons.  That is exactly the wrong message to send.

  3. In my benchmarks the performance difference was negligible.  It was
  in the area of 1-2%, i.e. within the margin of error.

> Gasp, OSX having this "::ffff:0.0.0.0" right is probably the reason why more computers
>  run OSX than Linux. Sometimes, don't implement the RFC too literally :)

Your target audience is not the RFCs, it's the people.
And the people just told you that you implemented this part of the code
wrong.

Please listen to your users and don't berate them.

Even if we assume that the RFCs can be read so that the current
implementation is technically not illegal, note that the other operating
systems interpreted it differently.  So you miss the main goal of the
RFCs, providing a fertile ground for interoperability.

Just forget all I said.  Just look at the facts.

The RFCs are unclear.
All the other major IPv6 stacks do it the other way.
Maybe they are right?

Felix

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
       [not found]       ` <20090317180840.GC13270@codeblau.de>
@ 2009-03-17 19:21         ` Brian Haley
  2009-03-17 19:31           ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Brian Haley @ 2009-03-17 19:21 UTC (permalink / raw)
  To: Felix von Leitner; +Cc: netdev

Top-posting so others can see your off-list rant in full.  I see no reason to
help you any further, even though I did have a patch that would change this
behavior for you.  Good luck with your "biggest noncommercial Internet messaging
infrastructure" in the world.

-Brian

Felix von Leitner wrote:
>> Please show me a porting guide that even mentions supporting IPv4-only mode
>> through an IPv6 socket by using this method.  There is none that I know of.
> 
> Are you kidding me?
> A _porting guide_?!?
> 
> If you are trying to troll me, you just succeeded.
> 
> Now please make room so the adults can talk about the issue at hand
> while you are putting up straw men.
> 
>>> Did I mention *BSD and OSX allow this?
>> That was their decision, and it doesn't mean it's the right thing to do.
> 
> Riiiight.
> 
> There is an old joke.  The Joneses are driving on the freeway, when the
> radio sounds a warning.  "Warning!  There is a car driving the wrong way
> on the freeway!"  Says grandpa (who is driving the car) "what do you
> mean, one guy?  Hundreds!!"
> 
> Sometimes, if there are two ways to read something, and your users tell
> you which way they want it, and the competition does it the way the
> users want, and you don't, sometimes, in that case, YOU ARE WRONG.
> 
> It's that easy.
> 
> Hey, you have an hp.com email address.  Why don't you check out how
> HP-UX handles this.
> 
>> Compare your bittorrent server to Apache, which is probably the most widely-used
>> server application in the world.  It doesn't do what you're trying to do.  See
>> http://httpd.apache.org/docs/2.2/bind.html and/or browse the source code.
> 
> What is this supposed to be?  Name dropping?
> 
> I'm not impressed.
> 
> And Apache never won any speed or scalability records.  Just because
> many people use Apache does not mean it's a good piece of software.  You
> know, many more people use Windows than Linux.  That does not make
> Windows the standard to follow.  Hey, many people use sendmail!  And
> BIND!
> 
> Felix
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17 19:21         ` Brian Haley
@ 2009-03-17 19:31           ` David Miller
  2009-03-17 21:05             ` Vlad Yasevich
                               ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: David Miller @ 2009-03-17 19:31 UTC (permalink / raw)
  To: brian.haley; +Cc: felix-kernel, netdev

From: Brian Haley <brian.haley@hp.com>
Date: Tue, 17 Mar 2009 15:21:52 -0400

> Top-posting so others can see your off-list rant in full.  I see no
> reason to help you any further, even though I did have a patch that
> would change this behavior for you.  Good luck with your "biggest
> noncommercial Internet messaging infrastructure" in the world.

What a jerk.  Brian, don't help him any more, you were being
very reasonable in your email to him.  His response was way
out of line.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17 19:31           ` David Miller
@ 2009-03-17 21:05             ` Vlad Yasevich
  2009-03-17 21:05             ` [RFC PATCH 1/4] ipv6: Disallow binding to v4-mapped address on v6-only socket Vlad Yasevich
                               ` (4 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Vlad Yasevich @ 2009-03-17 21:05 UTC (permalink / raw)
  To: davem; +Cc: netdev

Hi David

Regardless of how we may feel about this thread, it did make me run
the BSD bindtest utility and look at the results.  What I found was
rather surprising.

There were multiple tests that one would expect to succeed, but they
were failing.

Things that I consider broken:
  1) We can bind to a v4-mapped IPv6 address on a v6-only socket.
  2) We report conflicts between IPv4 wildcards and explicit IPv6
     addresses, and vice versa.
  3) We inconsistently treat v4 addresses and v4-mapped addresses.  As an
     example, try binding to 0.1.2.3. (This also kind of goes to binding
     ::ffff:0.0.0.0).

The following 4 RFC patches attempt to fix this.  I've run the bindtest tool
and am currently analyzing the results.  They look a heck of a lot better.

Thanks
-vlad
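
The kind of check such a bind test performs can be sketched in a few lines
of C.  This is not the BSD bindtest utility itself, only an illustration;
try_bind_v6() is a hypothetical helper, and it uses port 0 (kernel-chosen)
rather than the port 6969 from the original strace.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to bind an AF_INET6 TCP socket to addr_text on a kernel-chosen port,
 * with IPV6_V6ONLY set to v6only (0 or 1).  Returns 0 on success, or the
 * errno value of the first call that failed. */
int try_bind_v6(const char *addr_text, int v6only)
{
	struct sockaddr_in6 sa;
	int fd, err = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sin6_family = AF_INET6;
	sa.sin6_port = htons(0);
	if (inet_pton(AF_INET6, addr_text, &sa.sin6_addr) != 1)
		return EINVAL;

	fd = socket(AF_INET6, SOCK_STREAM, 0);
	if (fd < 0)
		return errno;
	if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY,
		       &v6only, sizeof(v6only)) < 0 ||
	    bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		err = errno;	/* saved before close() can overwrite errno */
	close(fd);
	return err;
}
```

Calling try_bind_v6("::ffff:0.0.0.0", 0) corresponds to the case that
started this thread, and try_bind_v6("::ffff:127.0.0.1", 1) corresponds to
item 1 in the list above.  The result depends on the kernel it runs on, so
no expected value is given here.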


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC PATCH 1/4] ipv6: Disallow binding to v4-mapped address on v6-only socket.
  2009-03-17 19:31           ` David Miller
  2009-03-17 21:05             ` Vlad Yasevich
@ 2009-03-17 21:05             ` Vlad Yasevich
  2009-03-17 21:06             ` [RFC PATCH 2/4] ipv6: Allow ipv4 wildcard binds after ipv6 address binds Vlad Yasevich
                               ` (3 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Vlad Yasevich @ 2009-03-17 21:05 UTC (permalink / raw)
  To: davem; +Cc: netdev, Vlad Yasevich

A socket marked v6-only cannot send traffic to or receive traffic
from v4-mapped addresses.  Thus allowing a bind to a v4-mapped address
on such a socket makes no sense.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
 net/ipv6/af_inet6.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 3e2ddfa..07b9f3c 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -276,6 +276,13 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 
 	/* Check if the address belongs to the host. */
 	if (addr_type == IPV6_ADDR_MAPPED) {
+		/* Binding to v4-mapped address on a v6-only socket
+		 * makes no sense
+		 */
+		if (np->ipv6only) {
+			err = -EINVAL;
+			goto out;
+		}
 		v4addr = addr->sin6_addr.s6_addr32[3];
 		if (inet_addr_type(net, v4addr) != RTN_LOCAL) {
 			err = -EADDRNOTAVAIL;
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC PATCH 2/4] ipv6: Allow ipv4 wildcard binds after ipv6 address binds
  2009-03-17 19:31           ` David Miller
  2009-03-17 21:05             ` Vlad Yasevich
  2009-03-17 21:05             ` [RFC PATCH 1/4] ipv6: Disallow binding to v4-mapped address on v6-only socket Vlad Yasevich
@ 2009-03-17 21:06             ` Vlad Yasevich
  2009-03-17 21:06             ` [RFC PATCH 3/4] ipv6: Make v4-mapped bindings consitant with IPv4 Vlad Yasevich
                               ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Vlad Yasevich @ 2009-03-17 21:06 UTC (permalink / raw)
  To: davem; +Cc: netdev, Vlad Yasevich

The IPv4 wildcard (0.0.0.0) address does not intersect
in any way with explicit IPv6 addresses.  Binding to both
should be permitted, but the IPv4 conflict code checks the
ipv6only bit as part of the test.  Since binding to an explicit
IPv6 address restricts the socket to only that IPv6 address,
the side-effect is that the socket behaves as v6-only.
Explicitly setting ipv6only in this case allows the 2 binds
to succeed.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
 net/ipv6/af_inet6.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 07b9f3c..0adce8e 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -346,8 +346,11 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		goto out;
 	}
 
-	if (addr_type != IPV6_ADDR_ANY)
+	if (addr_type != IPV6_ADDR_ANY) {
 		sk->sk_userlocks |= SOCK_BINDADDR_LOCK;
+		if (addr_type != IPV6_ADDR_MAPPED)
+			np->ipv6only = 1;
+	}
 	if (snum)
 		sk->sk_userlocks |= SOCK_BINDPORT_LOCK;
 	inet->sport = htons(inet->num);
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC PATCH 3/4] ipv6: Make v4-mapped bindings consitant with IPv4
  2009-03-17 19:31           ` David Miller
                               ` (2 preceding siblings ...)
  2009-03-17 21:06             ` [RFC PATCH 2/4] ipv6: Allow ipv4 wildcard binds after ipv6 address binds Vlad Yasevich
@ 2009-03-17 21:06             ` Vlad Yasevich
  2009-03-17 21:06             ` [RFC PATCH 4/4] ipv6: Fix conflict resolutions during ipv6 binding Vlad Yasevich
  2009-03-18  9:13             ` socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0 Jarek Poplawski
  5 siblings, 0 replies; 27+ messages in thread
From: Vlad Yasevich @ 2009-03-17 21:06 UTC (permalink / raw)
  To: davem; +Cc: netdev, Vlad Yasevich

Binding to a v4-mapped address on an AF_INET6 socket should
produce the same result as binding to an IPv4 address on an
AF_INET socket.  The two are interchangeable, as a v4-mapped
address is really a portability aid.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
 net/ipv6/af_inet6.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 0adce8e..274cc89 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -276,6 +276,8 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 
 	/* Check if the address belongs to the host. */
 	if (addr_type == IPV6_ADDR_MAPPED) {
+		int chk_addr_ret;
+
 		/* Binding to v4-mapped address on a v6-only socket
 		 * makes no sense
 		 */
@@ -283,11 +285,17 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 			err = -EINVAL;
 			goto out;
 		}
+
+		/* Reproduce AF_INET checks to make the bindings consitant */
 		v4addr = addr->sin6_addr.s6_addr32[3];
-		if (inet_addr_type(net, v4addr) != RTN_LOCAL) {
-			err = -EADDRNOTAVAIL;
+		chk_addr_ret = inet_addr_type(net, v4addr);
+		if (!sysctl_ip_nonlocal_bind &&
+		    !(inet->freebind || inet->transparent) &&
+		    v4addr != htonl(INADDR_ANY) &&
+		    chk_addr_ret != RTN_LOCAL &&
+		    chk_addr_ret != RTN_MULTICAST &&
+		    chk_addr_ret != RTN_BROADCAST)
 			goto out;
-		}
 	} else {
 		if (addr_type != IPV6_ADDR_ANY) {
 			struct net_device *dev = NULL;
-- 
1.5.4.3
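
To make the new condition in this patch easier to follow, here is a small
userspace model of it.  Everything in it is a stand-in invented for this
sketch: struct bind_ctx, enum addr_kind and v4_bind_rejected() are not
kernel names; they only mirror sysctl_ip_nonlocal_bind, inet->freebind,
inet->transparent and the RTN_* result of inet_addr_type().

```c
#include <stdbool.h>

/* Stand-in for the route type returned by inet_addr_type(). */
enum addr_kind { KIND_UNICAST, KIND_LOCAL, KIND_BROADCAST, KIND_MULTICAST };

/* Stand-in for the settings consulted by the check. */
struct bind_ctx {
	bool nonlocal_bind;	/* models sysctl_ip_nonlocal_bind */
	bool freebind;		/* models inet->freebind */
	bool transparent;	/* models inet->transparent */
};

/* Return true when the modelled check would take the failure path:
 * no override is active, the address is not the wildcard, and it is
 * neither local, multicast nor broadcast. */
bool v4_bind_rejected(const struct bind_ctx *c, bool is_any,
		      enum addr_kind kind)
{
	return !c->nonlocal_bind &&
	       !(c->freebind || c->transparent) &&
	       !is_any &&
	       kind != KIND_LOCAL &&
	       kind != KIND_MULTICAST &&
	       kind != KIND_BROADCAST;
}
```

In this model the wildcard is never rejected, which is what lets
::ffff:0.0.0.0 through.  Note also that the hunk as posted removes the
"err = -EADDRNOTAVAIL;" assignment ahead of "goto out", so the value
returned on the failure path depends on what err holds at that point,
which the hunk does not show.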


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC PATCH 4/4] ipv6: Fix conflict resolutions during ipv6 binding
  2009-03-17 19:31           ` David Miller
                               ` (3 preceding siblings ...)
  2009-03-17 21:06             ` [RFC PATCH 3/4] ipv6: Make v4-mapped bindings consitant with IPv4 Vlad Yasevich
@ 2009-03-17 21:06             ` Vlad Yasevich
  2009-03-18  9:13             ` socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0 Jarek Poplawski
  5 siblings, 0 replies; 27+ messages in thread
From: Vlad Yasevich @ 2009-03-17 21:06 UTC (permalink / raw)
  To: davem; +Cc: netdev, Vlad Yasevich

The ipv6 version of bind_conflict code calls ipv6_rcv_saddr_equal()
which at times wrongly identified intersections between addresses.
It broke down in a few particular cases and caused erroneous
bind conflicts.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
 include/net/addrconf.h |    4 ++--
 include/net/udp.h      |    2 ++
 net/ipv4/udp.c         |    3 ++-
 net/ipv6/addrconf.c    |   34 ----------------------------------
 net/ipv6/udp.c         |   30 ++++++++++++++++++++++++++++++
 5 files changed, 36 insertions(+), 37 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index c216de5..7b55ab2 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -88,8 +88,8 @@ extern int			ipv6_dev_get_saddr(struct net *net,
 extern int			ipv6_get_lladdr(struct net_device *dev,
 						struct in6_addr *addr,
 						unsigned char banned_flags);
-extern int			ipv6_rcv_saddr_equal(const struct sock *sk, 
-						      const struct sock *sk2);
+extern int 			ipv6_rcv_saddr_equal(const struct sock *sk,
+						    const struct sock *sk2);
 extern void			addrconf_join_solict(struct net_device *dev,
 					struct in6_addr *addr);
 extern void			addrconf_leave_solict(struct inet6_dev *idev,
diff --git a/include/net/udp.h b/include/net/udp.h
index 90e6ce5..93dbe29 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -124,6 +124,8 @@ static inline void udp_lib_close(struct sock *sk, long timeout)
 	sk_common_release(sk);
 }
 
+extern int	ipv4_rcv_saddr_equal(const struct sock *sk1,
+				    const struct sock *sk2);
 extern int	udp_lib_get_port(struct sock *sk, unsigned short snum,
 		int (*)(const struct sock*,const struct sock*));
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 4bd178a..ce64e4d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -222,7 +222,7 @@ fail:
 	return error;
 }
 
-static int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2)
+int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2)
 {
 	struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2);
 
@@ -1819,6 +1819,7 @@ EXPORT_SYMBOL(udp_lib_getsockopt);
 EXPORT_SYMBOL(udp_lib_setsockopt);
 EXPORT_SYMBOL(udp_poll);
 EXPORT_SYMBOL(udp_lib_get_port);
+EXPORT_SYMBOL(ipv4_rcv_saddr_equal);
 
 #ifdef CONFIG_PROC_FS
 EXPORT_SYMBOL(udp_proc_register);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index e83852a..00a37c1 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1367,40 +1367,6 @@ struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net, const struct in6_addr *add
 	return ifp;
 }
 
-int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2)
-{
-	const struct in6_addr *sk_rcv_saddr6 = &inet6_sk(sk)->rcv_saddr;
-	const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2);
-	__be32 sk_rcv_saddr = inet_sk(sk)->rcv_saddr;
-	__be32 sk2_rcv_saddr = inet_rcv_saddr(sk2);
-	int sk_ipv6only = ipv6_only_sock(sk);
-	int sk2_ipv6only = inet_v6_ipv6only(sk2);
-	int addr_type = ipv6_addr_type(sk_rcv_saddr6);
-	int addr_type2 = sk2_rcv_saddr6 ? ipv6_addr_type(sk2_rcv_saddr6) : IPV6_ADDR_MAPPED;
-
-	if (!sk2_rcv_saddr && !sk_ipv6only)
-		return 1;
-
-	if (addr_type2 == IPV6_ADDR_ANY &&
-	    !(sk2_ipv6only && addr_type == IPV6_ADDR_MAPPED))
-		return 1;
-
-	if (addr_type == IPV6_ADDR_ANY &&
-	    !(sk_ipv6only && addr_type2 == IPV6_ADDR_MAPPED))
-		return 1;
-
-	if (sk2_rcv_saddr6 &&
-	    ipv6_addr_equal(sk_rcv_saddr6, sk2_rcv_saddr6))
-		return 1;
-
-	if (addr_type == IPV6_ADDR_MAPPED &&
-	    !sk2_ipv6only &&
-	    (!sk2_rcv_saddr || !sk_rcv_saddr || sk_rcv_saddr == sk2_rcv_saddr))
-		return 1;
-
-	return 0;
-}
-
 /* Gets referenced address, destroys ifaddr */
 
 static void addrconf_dad_stop(struct inet6_ifaddr *ifp)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 84b1a29..7e45761 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -49,6 +49,36 @@
 #include <linux/seq_file.h>
 #include "udp_impl.h"
 
+int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2)
+{
+	const struct in6_addr *sk_rcv_saddr6 = &inet6_sk(sk)->rcv_saddr;
+	const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2);
+	__be32 sk_rcv_saddr = inet_sk(sk)->rcv_saddr;
+	__be32 sk2_rcv_saddr = inet_rcv_saddr(sk2);
+	int sk_ipv6only = ipv6_only_sock(sk);
+	int sk2_ipv6only = inet_v6_ipv6only(sk2);
+	int addr_type = ipv6_addr_type(sk_rcv_saddr6);
+	int addr_type2 = sk2_rcv_saddr6 ? ipv6_addr_type(sk2_rcv_saddr6) : IPV6_ADDR_MAPPED;
+
+	/* if both are mapped, treat as IPv4 */
+	if (addr_type == IPV6_ADDR_MAPPED && addr_type2 == IPV6_ADDR_MAPPED)
+		return ipv4_rcv_saddr_equal(sk, sk2);
+
+	if (addr_type2 == IPV6_ADDR_ANY &&
+	    !(sk2_ipv6only && addr_type == IPV6_ADDR_MAPPED))
+		return 1;
+
+	if (addr_type == IPV6_ADDR_ANY &&
+	    !(sk_ipv6only && addr_type2 == IPV6_ADDR_MAPPED))
+		return 1;
+
+	if (sk2_rcv_saddr6 &&
+	    ipv6_addr_equal(sk_rcv_saddr6, sk2_rcv_saddr6))
+		return 1;
+
+	return 0;
+}
+
 int udp_v6_get_port(struct sock *sk, unsigned short snum)
 {
 	return udp_lib_get_port(sk, snum, ipv6_rcv_saddr_equal);
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-17 19:31           ` David Miller
                               ` (4 preceding siblings ...)
  2009-03-17 21:06             ` [RFC PATCH 4/4] ipv6: Fix conflict resolutions during ipv6 binding Vlad Yasevich
@ 2009-03-18  9:13             ` Jarek Poplawski
  2009-03-18 21:36               ` David Miller
  5 siblings, 1 reply; 27+ messages in thread
From: Jarek Poplawski @ 2009-03-18  9:13 UTC (permalink / raw)
  To: David Miller; +Cc: brian.haley, felix-kernel, netdev

On 17-03-2009 20:31, David Miller wrote:
> From: Brian Haley <brian.haley@hp.com>
> Date: Tue, 17 Mar 2009 15:21:52 -0400
> 
>> Top-posting so others can see your off-list rant in full.  I see no
>> reason to help you any further, even though I did have a patch that
>> would change this behavior for you.  Good luck with your "biggest
>> noncommercial Internet messaging infrastructure" in the world.
> 
> What a jerk.  Brian, don't help him any more, you were being
> very reasonable in your email to him.  His response was way
> out of line.

Do you mean he got that joke wrong? Otherwise I think he is right. We
shouldn't advise him on how to do things right; since what he wants
looks legal and is accepted elsewhere, we should try to do this in the
least invasive way.

Jarek P.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-18  9:13             ` socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0 Jarek Poplawski
@ 2009-03-18 21:36               ` David Miller
  2009-03-18 21:53                 ` Jarek Poplawski
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2009-03-18 21:36 UTC (permalink / raw)
  To: jarkao2; +Cc: brian.haley, felix-kernel, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Wed, 18 Mar 2009 09:13:07 +0000

> On 17-03-2009 20:31, David Miller wrote:
> > From: Brian Haley <brian.haley@hp.com>
> > Date: Tue, 17 Mar 2009 15:21:52 -0400
> > 
> >> Top-posting so others can see your off-list rant in full.  I see no
> >> reason to help you any further, even though I did have a patch that
> >> would change this behavior for you.  Good luck with your "biggest
> >> noncommercial Internet messaging infrastructure" in the world.
> > 
> > What a jerk.  Brian, don't help him any more, you were being
> > very reasonable in your email to him.  His response was way
> > out of line.
> 
> Do you mean he got that joke wrong? Otherwise I think he is right. We
> shouldn't advise him on how to do things right; since what he wants
> looks legal and is accepted elsewhere, we should try to do this in the
> least invasive way.

First of all, no matter if we allow that kind of bind() he wants or
not, he cannot use it in his application unless he wants his
application to be useless on most people's machines for at least a
year.

That's why the "make Linux be compatible with X other systems" is
always a joke argument.  Application wise, one still has to be
compatible with all existing Linux systems which is a much larger
issue.

And yes we should advise people what is an appropriate way to
accomplish some task.  If we aren't the experts on such a topic,
then who the hell is?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-18 21:36               ` David Miller
@ 2009-03-18 21:53                 ` Jarek Poplawski
  2009-03-19  0:32                   ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Jarek Poplawski @ 2009-03-18 21:53 UTC (permalink / raw)
  To: David Miller; +Cc: brian.haley, felix-kernel, netdev

On Wed, Mar 18, 2009 at 02:36:35PM -0700, David Miller wrote:
...
> And yes we should advise people what is an appropriate way to
> accomplish some task.  If we aren't the experts on such a topic,
> then who the hell is?

Only if sb. is looking for advice; otherwise it's not very nice,
especially if repeated many times.

Jarek P.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0
  2009-03-18 21:53                 ` Jarek Poplawski
@ 2009-03-19  0:32                   ` David Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2009-03-19  0:32 UTC (permalink / raw)
  To: jarkao2; +Cc: brian.haley, felix-kernel, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Wed, 18 Mar 2009 22:53:00 +0100

> On Wed, Mar 18, 2009 at 02:36:35PM -0700, David Miller wrote:
> ...
> > And yes we should advise people what is an appropriate way to
> > accomplish some task.  If we aren't the experts on such a topic,
> > then who the hell is?
> 
> Only if sb. is looking for advice; otherwise it's not very nice,
> especially if repeated many times.

If the purpose of the query was to suggest that Linux should behave a
certain way, it should be no surprise to anyone that if we should
disagree with that suggestion we would suggest what we consider more
desirable alternatives for the application developer.

I don't even think this is worth the time we are spending to discuss
it, it seems so straightforward.


end of thread

Thread overview: 27+ messages
2009-03-16 23:48 socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0 Felix von Leitner
2009-03-17  0:00 ` Stephen Hemminger
2009-03-17  0:18   ` Felix von Leitner
2009-03-17  2:26 ` Brian Haley
2009-03-17  2:47   ` Eric Dumazet
2009-03-17  8:51     ` Bjørn Mork
2009-03-17 16:00     ` Brian Haley
2009-03-17 12:58   ` Felix von Leitner
2009-03-17 13:47     ` Vlad Yasevich
2009-03-17 14:14       ` Felix von Leitner
2009-03-17 14:57         ` Vlad Yasevich
2009-03-17 17:51           ` Felix von Leitner
2009-03-17 15:21         ` Eric Dumazet
2009-03-17 18:01           ` Felix von Leitner
2009-03-17 15:59     ` Brian Haley
     [not found]       ` <20090317180840.GC13270@codeblau.de>
2009-03-17 19:21         ` Brian Haley
2009-03-17 19:31           ` David Miller
2009-03-17 21:05             ` Vlad Yasevich
2009-03-17 21:05             ` [RFC PATCH 1/4] ipv6: Disallow binding to v4-mapped address on v6-only socket Vlad Yasevich
2009-03-17 21:06             ` [RFC PATCH 2/4] ipv6: Allow ipv4 wildcard binds after ipv6 address binds Vlad Yasevich
2009-03-17 21:06             ` [RFC PATCH 3/4] ipv6: Make v4-mapped bindings consitant with IPv4 Vlad Yasevich
2009-03-17 21:06             ` [RFC PATCH 4/4] ipv6: Fix conflict resolutions during ipv6 binding Vlad Yasevich
2009-03-18  9:13             ` socket api problem: can't bind an ipv6 socket to ::ffff:0.0.0.0 Jarek Poplawski
2009-03-18 21:36               ` David Miller
2009-03-18 21:53                 ` Jarek Poplawski
2009-03-19  0:32                   ` David Miller
2009-03-17  9:03 ` Bjørn Mork
