linux-man.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* struct sockaddr_storage
@ 2023-01-19 14:11 Alejandro Colomar
  2023-01-20 10:06 ` Stefan Puiu
  0 siblings, 1 reply; 12+ messages in thread
From: Alejandro Colomar @ 2023-01-19 14:11 UTC (permalink / raw)
  To: GNU C Library; +Cc: linux-man, gcc, Igor Sysoev


[-- Attachment #1.1: Type: text/plain, Size: 1702 bytes --]

Hi!

I just received a report about struct sockaddr_storage in the man pages.  It 
reminded me of some concern I've always had about it: it doesn't seem to be a 
usable type.

It has some alignment promises that make it "just work" most of the time, but 
it's still a UB mine, according to ISO C.

According to strict aliasing rules, if you declare a variable of type 'struct 
sockaddr_storage', that's what you get, and trying to access it later as some 
other sockaddr_8 is simply not legal.  The compiler may assume those accesses 
can't happen, and optimize as it pleases.

That means that one needs to declare a union with all possible sockaddr_* types 
that are of interest, so that access as any of them is later allowed by the 
compiler (of course, the user still needs to access the correct one, but that's 
of course).

In that union, one could add a member that is of type sockaddr_storage for 
getting a more consistent structure size (for example, if some members are 
conditional on preprocessor stuff), but I don't see much value in that. 
Especially, given this comment that Igor Sysoev wrote in NGINX Unit's source code:

  * struct sockaddr_storage is:
  *    128 bytes on Linux, FreeBSD, MacOSX, NetBSD;
  *    256 bytes on Solaris, OpenBSD, and HP-UX;
  *   1288 bytes on AIX.
  *
  * struct sockaddr_storage is too large on some platforms
  * or less than real maximum struct sockaddr_un length.

Which makes it even more useless as a type.


Should we warn about uses of this type?  Should we recommend against using it in 
the manual page, since there's no legitimate uses of it?

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: struct sockaddr_storage
  2023-01-19 14:11 struct sockaddr_storage Alejandro Colomar
@ 2023-01-20 10:06 ` Stefan Puiu
  2023-01-20 12:39   ` Alejandro Colomar
  2023-01-24 11:16   ` Rich Felker
  0 siblings, 2 replies; 12+ messages in thread
From: Stefan Puiu @ 2023-01-20 10:06 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: GNU C Library, linux-man, gcc, Igor Sysoev

Hi Alex,

On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hi!
>
> I just received a report about struct sockaddr_storage in the man pages.  It
> reminded me of some concern I've always had about it: it doesn't seem to be a
> usable type.
>
> It has some alignment promises that make it "just work" most of the time, but
> it's still a UB mine, according to ISO C.
>
> According to strict aliasing rules, if you declare a variable of type 'struct
> sockaddr_storage', that's what you get, and trying to access it later as some
> other sockaddr_8 is simply not legal.  The compiler may assume those accesses
> can't happen, and optimize as it pleases.

Can you detail the "is not legal" part? How about the APIs like
connect() etc that use pointers to struct sockaddr, where the
underlying type is different, why would that be legal while using
sockaddr_storage isn't?
Will code break in practice?

>
> That means that one needs to declare a union with all possible sockaddr_* types
> that are of interest, so that access as any of them is later allowed by the
> compiler (of course, the user still needs to access the correct one, but that's
> of course).
>
> In that union, one could add a member that is of type sockaddr_storage for
> getting a more consistent structure size (for example, if some members are
> conditional on preprocessor stuff), but I don't see much value in that.
> Especially, given this comment that Igor Sysoev wrote in NGINX Unit's source code:
>
>   * struct sockaddr_storage is:
>   *    128 bytes on Linux, FreeBSD, MacOSX, NetBSD;
>   *    256 bytes on Solaris, OpenBSD, and HP-UX;
>   *   1288 bytes on AIX.
>   *
>   * struct sockaddr_storage is too large on some platforms
>   * or less than real maximum struct sockaddr_un length.
>
> Which makes it even more useless as a type.

I'm not sure using struct sockaddr_storage for storing sockaddr_un's
(UNIX domain socket addresses, right?) is that common a usage. I've
used it in the past to store either a sockaddr_in or a sockaddr_in6,
and I think that would be a more common scenario. The comment above
probably makes sense for nginx, but different projects have different
needs.

As for the size, I guess it might matter if you want to port your code
to AIX, Solaris, OpenBSD etc. I don't think all software is meant to
be portable, though (or portable to those platforms). Maybe a warning
is in order that, for portable code, developers should check its size
on the other platforms targeted.

Just my 2 cents, as always,
Stefan.

>
>
> Should we warn about uses of this type?  Should we recommend against using it in
> the manual page, since there's no legitimate uses of it?
>
> Cheers,
>
> Alex
>
> --
> <http://www.alejandro-colomar.es/>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: struct sockaddr_storage
  2023-01-20 10:06 ` Stefan Puiu
@ 2023-01-20 12:39   ` Alejandro Colomar
  2023-01-23  7:40     ` Stefan Puiu
  2023-01-24 11:16   ` Rich Felker
  1 sibling, 1 reply; 12+ messages in thread
From: Alejandro Colomar @ 2023-01-20 12:39 UTC (permalink / raw)
  To: Stefan Puiu; +Cc: GNU C Library, linux-man, gcc, Igor Sysoev


[-- Attachment #1.1: Type: text/plain, Size: 6064 bytes --]

Hi Stefan,

On 1/20/23 11:06, Stefan Puiu wrote:
> Hi Alex,
> 
> On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar
> <alx.manpages@gmail.com> wrote:
>>
>> Hi!
>>
>> I just received a report about struct sockaddr_storage in the man pages.  It
>> reminded me of some concern I've always had about it: it doesn't seem to be a
>> usable type.
>>
>> It has some alignment promises that make it "just work" most of the time, but
>> it's still a UB mine, according to ISO C.
>>
>> According to strict aliasing rules, if you declare a variable of type 'struct
>> sockaddr_storage', that's what you get, and trying to access it later as some
>> other sockaddr_8 is simply not legal.  The compiler may assume those accesses
>> can't happen, and optimize as it pleases.
> 
> Can you detail the "is not legal" part?

I mean that it's Undefined Behavior contraband.

> How about the APIs like
> connect() etc that use pointers to struct sockaddr, where the
> underlying type is different, why would that be legal while using
> sockaddr_storage isn't?

That's also bad.  However, it can be fixed by fixing `sockaddr_storage` and 
telling everyone to use it instead of using whatever other `sockaddr_*`.  You 
need a union for the underlying storage, so that the library functions can 
access both as `sockaddr` and as `sockaddr_*`.

The problem isn't really in the implementation of connect(2), but on the type. 
The implementation of connect(2) would be fine if we just fixed the type.  See 
some example:

struct my_sockaddr_storage {
	union {
		sa_family_t          ss_family;
		struct sockaddr      sa;
		struct sockaddr_in   sin;
		struct sockaddr_in6  sin6;
		struct sockaddr_un   sun;
	};
};


void
foo(foo)
{
	struct my_sockaddr_storage  mss;
	struct sockaddr_storage     ss;

	// initialize mss and ss

	inet_sockaddr2str(&mss.sa);  // correct
	inet_sockaddr2str((struct sockaddr_storage *)&ss);  // UB
}

/* This function is correct, as far as the accessed object has the
  * type we're using.  That's only possible through a `union`, since
  * we're accessing it with 2 different types: `sockaddr` for the
  * `sa_family` and then the appropriate subtype for the address
  * itself.
  */
const char *
inet_sockaddr2str(const struct sockaddr *sa)
{
	struct sockaddr_in   *sin;
	struct sockaddr_in6  *sin6;

	static char          buf[INET_ADDRSTRLENMAX];

	switch (sa->sa_family) {
	case AF_INET:
		sin = (struct sockaddr_in *) sa;
		inet_ntop(AF_INET, &sin->sin_addr, buf, NITEMS(buf));
		return buf;
	case AF_INET6:
		sin6 = (struct sockaddr_in6 *) sa;
		inet_ntop(AF_INET6, &sin6->sin6_addr, buf, NITEMS(buf));
		return buf;
	default:
		errno = EAFNOSUPPORT;
		return NULL;
	}
}


BTW, you need a union _even if_ you only care about a single address family. 
That is, if you only care about Unix sockets, you can't declare your variable of 
type sockaddr_un, because the libc functions and syscalls still need to access 
it as a sockaddr to see which family it has.

> Will code break in practice?

Well, it depends on how much compilers advance.  Here's some interesting experiment:

<https://software.codidact.com/posts/287748/287750#answer-287750>

I wouldn't rely on Undefined Behavior not causing nasal demons.  When you get 
them, you can only kill them with garlic.

> 
>>
>> That means that one needs to declare a union with all possible sockaddr_* types
>> that are of interest, so that access as any of them is later allowed by the
>> compiler (of course, the user still needs to access the correct one, but that's
>> of course).
>>
>> In that union, one could add a member that is of type sockaddr_storage for
>> getting a more consistent structure size (for example, if some members are
>> conditional on preprocessor stuff), but I don't see much value in that.
>> Especially, given this comment that Igor Sysoev wrote in NGINX Unit's source code:
>>
>>    * struct sockaddr_storage is:
>>    *    128 bytes on Linux, FreeBSD, MacOSX, NetBSD;
>>    *    256 bytes on Solaris, OpenBSD, and HP-UX;
>>    *   1288 bytes on AIX.
>>    *
>>    * struct sockaddr_storage is too large on some platforms
>>    * or less than real maximum struct sockaddr_un length.
>>
>> Which makes it even more useless as a type.
> 
> I'm not sure using struct sockaddr_storage for storing sockaddr_un's
> (UNIX domain socket addresses, right?) is that common a usage. I've
> used it in the past to store either a sockaddr_in or a sockaddr_in6,
> and I think that would be a more common scenario. The comment above
> probably makes sense for nginx, but different projects have different
> needs.
> 
> As for the size, I guess it might matter if you want to port your code
> to AIX, Solaris, OpenBSD etc. I don't think all software is meant to
> be portable, though (or portable to those platforms). Maybe a warning
> is in order that, for portable code, developers should check its size
> on the other platforms targeted.

The size thing is just an added problem.  The deep problem is that you need to 
use a union that contains all types that you care about _plus_ plain sockaddr, 
because the structure will be accessed at least as a sockaddr, plus one of the 
different specialized structures.  So even for only sockaddr_un, you need at 
least the following:

union my_unix_sockaddr {
	struct sockaddr     sa;
	struct sockaddr_un  sun;
};

Not doing that will necessarily result in invoking Undefined Behavior at some point.

> 
> Just my 2 cents, as always,
> Stefan.

The good thing is that fixing sockaddr_storage and telling everybody to use it 
always fixes the problem, so I'm preparing a patch for glibc.

Cheers,

Alex

> 
>>
>>
>> Should we warn about uses of this type?  Should we recommend against using it in
>> the manual page, since there's no legitimate uses of it?
>>
>> Cheers,
>>
>> Alex
>>
>> --
>> <http://www.alejandro-colomar.es/>

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: struct sockaddr_storage
  2023-01-20 12:39   ` Alejandro Colomar
@ 2023-01-23  7:40     ` Stefan Puiu
  2023-01-23 16:03       ` Alejandro Colomar
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Puiu @ 2023-01-23  7:40 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: GNU C Library, linux-man, gcc, Igor Sysoev

Hi Alex,

On Fri, Jan 20, 2023 at 2:40 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hi Stefan,
>
> On 1/20/23 11:06, Stefan Puiu wrote:
> > Hi Alex,
> >
> > On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar
> > <alx.manpages@gmail.com> wrote:
> >>
> >> Hi!
> >>
> >> I just received a report about struct sockaddr_storage in the man pages.  It
> >> reminded me of some concern I've always had about it: it doesn't seem to be a
> >> usable type.
> >>
> >> It has some alignment promises that make it "just work" most of the time, but
> >> it's still a UB mine, according to ISO C.
> >>
> >> According to strict aliasing rules, if you declare a variable of type 'struct
> >> sockaddr_storage', that's what you get, and trying to access it later as some
> >> other sockaddr_8 is simply not legal.  The compiler may assume those accesses
> >> can't happen, and optimize as it pleases.
> >
> > Can you detail the "is not legal" part?
>
> I mean that it's Undefined Behavior contraband.

OK, next question. Is this theoretical or practical UB? People check
documentation about how to write code today, I think.

>
> > How about the APIs like
> > connect() etc that use pointers to struct sockaddr, where the
> > underlying type is different, why would that be legal while using
> > sockaddr_storage isn't?
>
> That's also bad.  However, it can be fixed by fixing `sockaddr_storage` and
> telling everyone to use it instead of using whatever other `sockaddr_*`.  You
> need a union for the underlying storage, so that the library functions can
> access both as `sockaddr` and as `sockaddr_*`.
>
> The problem isn't really in the implementation of connect(2), but on the type.
> The implementation of connect(2) would be fine if we just fixed the type.  See
> some example:
>
> struct my_sockaddr_storage {
>         union {
>                 sa_family_t          ss_family;
>                 struct sockaddr      sa;
>                 struct sockaddr_in   sin;
>                 struct sockaddr_in6  sin6;
>                 struct sockaddr_un   sun;
>         };
> };
>
>
> void
> foo(foo)
> {
>         struct my_sockaddr_storage  mss;
>         struct sockaddr_storage     ss;
>
>         // initialize mss and ss
>
>         inet_sockaddr2str(&mss.sa);  // correct
>         inet_sockaddr2str((struct sockaddr_storage *)&ss);  // UB
> }
>
> /* This function is correct, as far as the accessed object has the
>   * type we're using.  That's only possible through a `union`, since
>   * we're accessing it with 2 different types: `sockaddr` for the
>   * `sa_family` and then the appropriate subtype for the address
>   * itself.
>   */
> const char *
> inet_sockaddr2str(const struct sockaddr *sa)
> {
>         struct sockaddr_in   *sin;
>         struct sockaddr_in6  *sin6;
>
>         static char          buf[INET_ADDRSTRLENMAX];
>
>         switch (sa->sa_family) {
>         case AF_INET:
>                 sin = (struct sockaddr_in *) sa;
>                 inet_ntop(AF_INET, &sin->sin_addr, buf, NITEMS(buf));
>                 return buf;
>         case AF_INET6:
>                 sin6 = (struct sockaddr_in6 *) sa;
>                 inet_ntop(AF_INET6, &sin6->sin6_addr, buf, NITEMS(buf));
>                 return buf;
>         default:
>                 errno = EAFNOSUPPORT;
>                 return NULL;
>         }
> }
>
>
> BTW, you need a union _even if_ you only care about a single address family.
> That is, if you only care about Unix sockets, you can't declare your variable of
> type sockaddr_un, because the libc functions and syscalls still need to access
> it as a sockaddr to see which family it has.
>
> > Will code break in practice?
>
> Well, it depends on how much compilers advance.  Here's some interesting experiment:
>
> <https://software.codidact.com/posts/287748/287750#answer-287750>

That code plays with 2 pointers to the same area, one to double and
one to int, so I don't think it's that similar to the sockaddr
situation. At least for struct sockaddr, the sa_family field is the
same for all struct sockaddr_* variants. Also, in practical terms, I
don't think any compiler optimization that breaks socket APIs (and, if
I recall correctly, there are instances of this pattern in the kernel
as well) is going to be an easy sell. It's possible, but realistically
speaking, I don't think it's going to happen.

>
> I wouldn't rely on Undefined Behavior not causing nasal demons.  When you get
> them, you can only kill them with garlic.

OK, but not all theoretical issues have practical implications. Is
there code that can show UB in practical terms with struct
sockaddr_storage today? Like Eric mentioned in another thread, does
UBSan complain about code using struct sockaddr_storage?

Thanks,
Stefan.

>
> >
> >>
> >> That means that one needs to declare a union with all possible sockaddr_* types
> >> that are of interest, so that access as any of them is later allowed by the
> >> compiler (of course, the user still needs to access the correct one, but that's
> >> of course).
> >>
> >> In that union, one could add a member that is of type sockaddr_storage for
> >> getting a more consistent structure size (for example, if some members are
> >> conditional on preprocessor stuff), but I don't see much value in that.
> >> Especially, given this comment that Igor Sysoev wrote in NGINX Unit's source code:
> >>
> >>    * struct sockaddr_storage is:
> >>    *    128 bytes on Linux, FreeBSD, MacOSX, NetBSD;
> >>    *    256 bytes on Solaris, OpenBSD, and HP-UX;
> >>    *   1288 bytes on AIX.
> >>    *
> >>    * struct sockaddr_storage is too large on some platforms
> >>    * or less than real maximum struct sockaddr_un length.
> >>
> >> Which makes it even more useless as a type.
> >
> > I'm not sure using struct sockaddr_storage for storing sockaddr_un's
> > (UNIX domain socket addresses, right?) is that common a usage. I've
> > used it in the past to store either a sockaddr_in or a sockaddr_in6,
> > and I think that would be a more common scenario. The comment above
> > probably makes sense for nginx, but different projects have different
> > needs.
> >
> > As for the size, I guess it might matter if you want to port your code
> > to AIX, Solaris, OpenBSD etc. I don't think all software is meant to
> > be portable, though (or portable to those platforms). Maybe a warning
> > is in order that, for portable code, developers should check its size
> > on the other platforms targeted.
>
> The size thing is just an added problem.  The deep problem is that you need to
> use a union that contains all types that you care about _plus_ plain sockaddr,
> because the structure will be accessed at least as a sockaddr, plus one of the
> different specialized structures.  So even for only sockaddr_un, you need at
> least the following:
>
> union my_unix_sockaddr {
>         struct sockaddr     sa;
>         struct sockaddr_un  sun;
> };
>
> Not doing that will necessarily result in invoking Undefined Behavior at some point.
>
> >
> > Just my 2 cents, as always,
> > Stefan.
>
> The good thing is that fixing sockaddr_storage and telling everybody to use it
> always fixes the problem, so I'm preparing a patch for glibc.
>
> Cheers,
>
> Alex
>
> >
> >>
> >>
> >> Should we warn about uses of this type?  Should we recommend against using it in
> >> the manual page, since there's no legitimate uses of it?
> >>
> >> Cheers,
> >>
> >> Alex
> >>
> >> --
> >> <http://www.alejandro-colomar.es/>
>
> --
> <http://www.alejandro-colomar.es/>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: struct sockaddr_storage
  2023-01-23  7:40     ` Stefan Puiu
@ 2023-01-23 16:03       ` Alejandro Colomar
  2023-01-23 16:28         ` Richard Biener
  2023-01-23 16:37         ` Jakub Jelinek
  0 siblings, 2 replies; 12+ messages in thread
From: Alejandro Colomar @ 2023-01-23 16:03 UTC (permalink / raw)
  To: Stefan Puiu
  Cc: GNU C Library, linux-man, gcc, Igor Sysoev, Bastien Roucariès


[-- Attachment #1.1: Type: text/plain, Size: 4583 bytes --]

Hi Stefan,

On 1/23/23 08:40, Stefan Puiu wrote:
>>>> According to strict aliasing rules, if you declare a variable of type 'struct
>>>> sockaddr_storage', that's what you get, and trying to access it later as some
>>>> other sockaddr_8 is simply not legal.  The compiler may assume those accesses
>>>> can't happen, and optimize as it pleases.
>>>
>>> Can you detail the "is not legal" part?
>>
>> I mean that it's Undefined Behavior contraband.
> 
> OK, next question. Is this theoretical or practical UB?


Since the functions using this type are not functions that should be inlined, 
since the code is rather large, they are not visible to the compiler, so many of 
the optimizations that this UB enables are not likely to happen.  Translation 
Unit (TU) boundaries are what keeps most UB invokations not be really dangerous.

Also, glibc seems to be using a GCC attribute (transparent_union) to make the 
code avoid UB even if it were inlined, so if you use glibc, you're fine.  If 
you're using some smaller libc with a less capable compiler, or maybe C++, you 
are less lucky, but TU boundaries will probably still save your day.

> People check
> documentation about how to write code today, I think.

I'm fine documenting how to do it today.  But before changing the documentation, 
I'd like to take some time to reflect on what can we do to fix the standard so 
that we don't have this semi-broken state forever.  When we have a clear idea of 
what we can do to fix the implementation and hopefully the standard long-term, 
possibly keeping source code the same, we can do a better recommendation for 
programmers.

Today, you can do 2 things:

-  You don't care about UB, and would like that C had always been K&R C, and GCC 
just makes it work.  Then use `sockaddr_storage`.  It will just work.  When it 
stops working, you can blame the compiler and libc for optimizing way too much.

-  You care a lot about UB.  Then write your own union, as all the `sockaddr` 
interface should have been designed from the ground up.  That's what unions are for.

Which should we recommend?  That's my problem.

I don't want to be documenting the latter, because it's non-standard, and it's 
still likely to do it invoking UB in a different way, because it's a difficult 
part of the language, and when you roll your own, you're likely to make accidents.

So, ideally, I'd like to document the former, but for that, I'd like to make 
sure that it will work forever, since otherwise we'd be blamed when somebody's 
code is compiled in a platform with some combination of libc, compiler, and 
phase of the moon, that makes the UB become non-theoretical.

I think we can fix the definition of `sockaddr_storage` to have defined 
behavior, with the changes I'm discussing with Bastien, so I guess we'll 
document the former.

>>> Will code break in practice?
>>
>> Well, it depends on how much compilers advance.  Here's some interesting experiment:
>>
>> <https://software.codidact.com/posts/287748/287750#answer-287750>
> 
> That code plays with 2 pointers to the same area, one to double and
> one to int, so I don't think it's that similar to the sockaddr
> situation. At least for struct sockaddr, the sa_family field is the
> same for all struct sockaddr_* variants. Also, in practical terms, I
> don't think any compiler optimization that breaks socket APIs (and, if
> I recall correctly, there are instances of this pattern in the kernel
> as well) is going to be an easy sell. It's possible, but realistically
> speaking, I don't think it's going to happen.

The common initial sequence of structures is only allowed if the structures form 
part of a union (which is why to avoid UB you need a union; and still, you need 
to make sure you don't invoke UB in a different way).
<https://port70.net/%7Ensz/c/c11/n1570.html#6.5.2.3p6>

> 
>>
>> I wouldn't rely on Undefined Behavior not causing nasal demons.  When you get
>> them, you can only kill them with garlic.
> 
> OK, but not all theoretical issues have practical implications. Is
> there code that can show UB in practical terms with struct
> sockaddr_storage today? Like Eric mentioned in another thread, does
> UBSan complain about code using struct sockaddr_storage?

It's unlikely.  But I can't promise it will be safe under some random 
combination of compiler and library, and depends also on what you do in your 
code, which will affect compiler optimizations.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: struct sockaddr_storage
  2023-01-23 16:03       ` Alejandro Colomar
@ 2023-01-23 16:28         ` Richard Biener
  2023-01-24 16:38           ` Alex Colomar
  2023-01-23 16:37         ` Jakub Jelinek
  1 sibling, 1 reply; 12+ messages in thread
From: Richard Biener @ 2023-01-23 16:28 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Stefan Puiu, GNU C Library, linux-man, gcc, Igor Sysoev,
	Bastien Roucariès

On Mon, Jan 23, 2023 at 5:04 PM Alejandro Colomar via Gcc
<gcc@gcc.gnu.org> wrote:
>
> Hi Stefan,
>
> On 1/23/23 08:40, Stefan Puiu wrote:
> >>>> According to strict aliasing rules, if you declare a variable of type 'struct
> >>>> sockaddr_storage', that's what you get, and trying to access it later as some
> >>>> other sockaddr_8 is simply not legal.  The compiler may assume those accesses
> >>>> can't happen, and optimize as it pleases.
> >>>
> >>> Can you detail the "is not legal" part?
> >>
> >> I mean that it's Undefined Behavior contraband.
> >
> > OK, next question. Is this theoretical or practical UB?
>
>
> Since the functions using this type are not functions that should be inlined,
> since the code is rather large, they are not visible to the compiler, so many of
> the optimizations that this UB enables are not likely to happen.  Translation
> Unit (TU) boundaries are what keeps most UB invokations not be really dangerous.
>
> Also, glibc seems to be using a GCC attribute (transparent_union) to make the
> code avoid UB even if it were inlined, so if you use glibc, you're fine.  If
> you're using some smaller libc with a less capable compiler, or maybe C++, you
> are less lucky, but TU boundaries will probably still save your day.
>
> > People check
> > documentation about how to write code today, I think.
>
> I'm fine documenting how to do it today.  But before changing the documentation,
> I'd like to take some time to reflect on what can we do to fix the standard so
> that we don't have this semi-broken state forever.  When we have a clear idea of
> what we can do to fix the implementation and hopefully the standard long-term,
> possibly keeping source code the same, we can do a better recommendation for
> programmers.
>
> Today, you can do 2 things:
>
> -  You don't care about UB, and would like that C had always been K&R C, and GCC
> just makes it work.  Then use `sockaddr_storage`.  It will just work.  When it
> stops working, you can blame the compiler and libc for optimizing way too much.
>
> -  You care a lot about UB.  Then write your own union, as all the `sockaddr`
> interface should have been designed from the ground up.  That's what unions are for.
>
> Which should we recommend?  That's my problem.
>
> I don't want to be documenting the latter, because it's non-standard, and it's
> still likely to do it invoking UB in a different way, because it's a difficult
> part of the language, and when you roll your own, you're likely to make accidents.
>
> So, ideally, I'd like to document the former, but for that, I'd like to make
> sure that it will work forever, since otherwise we'd be blamed when somebody's
> code is compiled in a platform with some combination of libc, compiler, and
> phase of the moon, that makes the UB become non-theoretical.
>
> I think we can fix the definition of `sockaddr_storage` to have defined
> behavior, with the changes I'm discussing with Bastien, so I guess we'll
> document the former.
>
> >>> Will code break in practice?
> >>
> >> Well, it depends on how much compilers advance.  Here's some interesting experiment:
> >>
> >> <https://software.codidact.com/posts/287748/287750#answer-287750>
> >
> > That code plays with 2 pointers to the same area, one to double and
> > one to int, so I don't think it's that similar to the sockaddr
> > situation. At least for struct sockaddr, the sa_family field is the
> > same for all struct sockaddr_* variants. Also, in practical terms, I
> > don't think any compiler optimization that breaks socket APIs (and, if
> > I recall correctly, there are instances of this pattern in the kernel
> > as well) is going to be an easy sell. It's possible, but realistically
> > speaking, I don't think it's going to happen.
>
> The common initial sequence of structures is only allowed if the structures form
> part of a union (which is why to avoid UB you need a union; and still, you need
> to make sure you don't invoke UB in a different way).
> <https://port70.net/%7Ensz/c/c11/n1570.html#6.5.2.3p6>

GCC only allows it if the union is visible as part of the access, that
is, it allows it
under its rule of allowing punning for union accesses and not specially because
of the initial sequence rule.  So

 u.a.x = 1;
 ... = u.b.x;

is allowed but

  struct A *p = &u.a;
  p->x = 1;
  struct B *q = &u.b;
  ... = q->x;

is UB with GCC if struct A and B are the union members with a common
initial sequence.

Richard.

> >
> >>
> >> I wouldn't rely on Undefined Behavior not causing nasal demons.  When you get
> >> them, you can only kill them with garlic.
> >
> > OK, but not all theoretical issues have practical implications. Is
> > there code that can show UB in practical terms with struct
> > sockaddr_storage today? Like Eric mentioned in another thread, does
> > UBSan complain about code using struct sockaddr_storage?
>
> It's unlikely.  But I can't promise it will be safe under some random
> combination of compiler and library, and depends also on what you do in your
> code, which will affect compiler optimizations.
>
> Cheers,
>
> Alex
>
> --
> <http://www.alejandro-colomar.es/>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: struct sockaddr_storage
  2023-01-23 16:03       ` Alejandro Colomar
  2023-01-23 16:28         ` Richard Biener
@ 2023-01-23 16:37         ` Jakub Jelinek
  2023-01-24 16:40           ` Alex Colomar
  2023-01-24 18:00           ` Alex Colomar
  1 sibling, 2 replies; 12+ messages in thread
From: Jakub Jelinek @ 2023-01-23 16:37 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Stefan Puiu, GNU C Library, linux-man, gcc, Igor Sysoev,
	Bastien Roucariès

On Mon, Jan 23, 2023 at 05:03:00PM +0100, Alejandro Colomar via Gcc wrote:
> Hi Stefan,
> 
> On 1/23/23 08:40, Stefan Puiu wrote:
> > > > > According to strict aliasing rules, if you declare a variable of type 'struct
> > > > > sockaddr_storage', that's what you get, and trying to access it later as some
> > > > > other sockaddr_8 is simply not legal.  The compiler may assume those accesses
> > > > > can't happen, and optimize as it pleases.
> > > > 
> > > > Can you detail the "is not legal" part?
> > > 
> > > I mean that it's Undefined Behavior contraband.
> > 
> > OK, next question. Is this theoretical or practical UB?
> 
> 
> Since the functions using this type are not functions that should be
> inlined, since the code is rather large, they are not visible to the
> compiler, so many of the optimizations that this UB enables are not likely
> to happen.  Translation Unit (TU) boundaries are what keeps most UB
> invokations not be really dangerous.
> 
> Also, glibc seems to be using a GCC attribute (transparent_union) to make
> the code avoid UB even if it were inlined, so if you use glibc, you're fine.
> If you're using some smaller libc with a less capable compiler, or maybe
> C++, you are less lucky, but TU boundaries will probably still save your
> day.

Please see transparent_union documentation in GCC documentation.
E.g. https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Common-Type-Attributes.html#index-transparent_005funion-type-attribute
transparent_union doesn't change anything regarding type punning, it is
solely about function arguments, how arguments of that type are passed
(as first union member) and that no casts to the union are needed from
the member types.
And, with LTO TU boundaries are lost, inlining or other IPA optimizations
(including say modref) work in between TUs.

	Jakub


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: struct sockaddr_storage
  2023-01-20 10:06 ` Stefan Puiu
  2023-01-20 12:39   ` Alejandro Colomar
@ 2023-01-24 11:16   ` Rich Felker
  2023-01-24 16:53     ` Alex Colomar
  1 sibling, 1 reply; 12+ messages in thread
From: Rich Felker @ 2023-01-24 11:16 UTC (permalink / raw)
  To: Stefan Puiu; +Cc: Alejandro Colomar, GNU C Library, linux-man, gcc, Igor Sysoev

On Fri, Jan 20, 2023 at 12:06:50PM +0200, Stefan Puiu via Libc-alpha wrote:
> Hi Alex,
> 
> On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar
> <alx.manpages@gmail.com> wrote:
> >
> > Hi!
> >
> > I just received a report about struct sockaddr_storage in the man pages.  It
> > reminded me of some concern I've always had about it: it doesn't seem to be a
> > usable type.
> >
> > It has some alignment promises that make it "just work" most of the time, but
> > it's still a UB mine, according to ISO C.
> >
> > According to strict aliasing rules, if you declare a variable of type 'struct
> > sockaddr_storage', that's what you get, and trying to access it later as some
> > other sockaddr_8 is simply not legal.  The compiler may assume those accesses
> > can't happen, and optimize as it pleases.
> 
> Can you detail the "is not legal" part? How about the APIs like
> connect() etc that use pointers to struct sockaddr, where the
> underlying type is different, why would that be legal while using
> sockaddr_storage isn't?

Because they're specified to take different types. In C, any struct
pointer type can legally point to any other struct type. You just
can't dereference through it with the wrong type. How the
implementation of connect etc. handle this is an implementation
detail. You're allowed to pass pointers to struct sockaddr_in, etc. to
connect etc. simply because the specification says you are.

In any case, sockaddr_storage is a legacy thing designed by folks who
didn't understand the rules of the C language. It should never appear
in modern code except perhaps with sizeof for allocting buffers. There
is no action that needs to be taken here except documenting that it
should not be used (cannot be used meaningfully without UB).

Rich

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: struct sockaddr_storage
  2023-01-23 16:28         ` Richard Biener
@ 2023-01-24 16:38           ` Alex Colomar
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Colomar @ 2023-01-24 16:38 UTC (permalink / raw)
  To: Richard Biener
  Cc: Stefan Puiu, GNU C Library, linux-man, gcc, Igor Sysoev,
	Bastien Roucariès


[-- Attachment #1.1: Type: text/plain, Size: 1039 bytes --]

Hi Richard,

On 1/23/23 17:28, Richard Biener wrote:
>> The common initial sequence of structures is only allowed if the structures form
>> part of a union (which is why to avoid UB you need a union; and still, you need
>> to make sure you don't invoke UB in a different way).
>> <https://port70.net/%7Ensz/c/c11/n1570.html#6.5.2.3p6>
> 
> GCC only allows it if the union is visible as part of the access, that
> is, it allows it
> under its rule of allowing punning for union accesses and not specially because
> of the initial sequence rule.  So
> 
>   u.a.x = 1;
>   ... = u.b.x;
> 
> is allowed but
> 
>    struct A *p = &u.a;
>    p->x = 1;
>    struct B *q = &u.b;
>    ... = q->x;
> 
> is UB with GCC if struct A and B are the union members with a common
> initial sequence.

Yep.  That's why we need a union that is defined in libc, so that it can 
be used both in and out of glibc.  sockaddr_storage can be reconverted 
to that purpose.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: struct sockaddr_storage
  2023-01-23 16:37         ` Jakub Jelinek
@ 2023-01-24 16:40           ` Alex Colomar
  2023-01-24 18:00           ` Alex Colomar
  1 sibling, 0 replies; 12+ messages in thread
From: Alex Colomar @ 2023-01-24 16:40 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Stefan Puiu, GNU C Library, linux-man, gcc, Igor Sysoev,
	Bastien Roucariès


[-- Attachment #1.1: Type: text/plain, Size: 938 bytes --]

Hi Jakub,

On 1/23/23 17:37, Jakub Jelinek wrote:
> Please see transparent_union documentation in GCC documentation.
> E.g. https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Common-Type-Attributes.html#index-transparent_005funion-type-attribute
> transparent_union doesn't change anything regarding type punning, it is
> solely about function arguments, how arguments of that type are passed
> (as first union member) and that no casts to the union are needed from
> the member types.

Yep, when I wrote that I didn't fully understand it.  Now I got it. 
I'll prepare some better suggestion about a fix.

Thanks.

> And, with LTO TU boundaries are lost, inlining or other IPA optimizations
> (including say modref) work in between TUs.

Yeah, that's why we need a fix.  Compilers will only get better at 
optimizing, so UB will sooner or later be a problem.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: struct sockaddr_storage
  2023-01-24 11:16   ` Rich Felker
@ 2023-01-24 16:53     ` Alex Colomar
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Colomar @ 2023-01-24 16:53 UTC (permalink / raw)
  To: Rich Felker, Stefan Puiu; +Cc: GNU C Library, linux-man, gcc, Igor Sysoev


[-- Attachment #1.1: Type: text/plain, Size: 2738 bytes --]

Hi Rick,

On 1/24/23 12:16, Rich Felker wrote:
> On Fri, Jan 20, 2023 at 12:06:50PM +0200, Stefan Puiu via Libc-alpha wrote:
>> Hi Alex,
>>
>> On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar
>> <alx.manpages@gmail.com> wrote:
>>>
>>> Hi!
>>>
>>> I just received a report about struct sockaddr_storage in the man pages.  It
>>> reminded me of some concern I've always had about it: it doesn't seem to be a
>>> usable type.
>>>
>>> It has some alignment promises that make it "just work" most of the time, but
>>> it's still a UB mine, according to ISO C.
>>>
>>> According to strict aliasing rules, if you declare a variable of type 'struct
>>> sockaddr_storage', that's what you get, and trying to access it later as some
>>> other sockaddr_8 is simply not legal.  The compiler may assume those accesses
>>> can't happen, and optimize as it pleases.
>>
>> Can you detail the "is not legal" part? How about the APIs like
>> connect() etc that use pointers to struct sockaddr, where the
>> underlying type is different, why would that be legal while using
>> sockaddr_storage isn't?
> 
> Because they're specified to take different types. In C, any struct
> pointer type can legally point to any other struct type. You just
> can't dereference through it with the wrong type.

Yep.  Which basically means that users need to treat sockaddr structures 
as black boxes.  Otherwise, there's going to be undefined behavior at 
some point.  Because of course, you can't know the right type before 
reading the first field, which is already UB.

> How the
> implementation of connect etc. handle this is an implementation
> detail. You're allowed to pass pointers to struct sockaddr_in, etc. to
> connect etc. simply because the specification says you are.

While the implementation has some more freedom regarding UB, in this 
case it's waiting for a compiler optimization to break this code, so I'd 
go the safe way and use standard C techniques in libc so that there are 
no long-term UB issues.

As a side effect, user code that currently invokes UB could be changed 
to have defined behavior.

> 
> In any case, sockaddr_storage is a legacy thing designed by folks who
> didn't understand the rules of the C language. It should never appear
> in modern code except perhaps with sizeof for allocting buffers. There
> is no action that needs to be taken here except documenting that it
> should not be used (cannot be used meaningfully without UB).

I agree with you on this.  sockaddr_storage has been broken since day 0. 
  However, for designing a solution for libc using unions, it could be 
useful.

> 
> Rich

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: struct sockaddr_storage
  2023-01-23 16:37         ` Jakub Jelinek
  2023-01-24 16:40           ` Alex Colomar
@ 2023-01-24 18:00           ` Alex Colomar
  1 sibling, 0 replies; 12+ messages in thread
From: Alex Colomar @ 2023-01-24 18:00 UTC (permalink / raw)
  To: GNU C Library, Bastien Roucariès
  Cc: Stefan Puiu, linux-man, gcc, Igor Sysoev, Rich Felker,
	Andrew Clayton, Richard Biener, Zack Weinberg, Florian Weimer,
	Joseph Myers, Jakub Jelinek


[-- Attachment #1.1: Type: text/plain, Size: 1472 bytes --]

Hi,

After reading more about transparent_unit, here's my idea of a fix for 
the API.  old_api() is an example for the libc functions that accept a 
`struct sockaddr *`, and user_code() is an example for user code 
functions that handle sockaddr structures.  The interface would be the 
same as it is currently, but the implementation inside libc would change 
to use a union.  In user code, uses of sockaddr_storage would be made 
safe with these changes, I believe, and new code would be simpler, since 
it wouldn't need casts.



void old_api(union my_sockaddr_ptr *sa);


struct sockaddr_storage {
	union {
		struct {
			sa_family_t  ss_family;
		};
		struct sockaddr_in   sin;
		struct sockaddr_in6  sin6;
		struct sockaddr_un   sun;
		// ...
	};
};


union [[gnu::transparent_union]] sockaddr_ptr {
	struct sockaddr_storage  *ss;
	struct sockaddr          *sa;
};


void old_api(struct sockaddr_storage *ss)
{
	// Here libc uses the union, so it doesn't invoke UB.
	ss->sun.sa_family = AF_UNIX;
	//...
}


void user_code(void)
{
	struct my_sockaddr_storage  ss;  // object definition

	// ...

	old_api(&ss);  // The transparent_union allows no casts.

	switch (ss.ss_family) {
		// This is safe too.
		// thanks to common initial sequence within a union.
	}
}


This would in fact deprecate plain `struct sockaddr`, as Bastien suggested.


Cheers,

Alex


-- 
<http://www.alejandro-colomar.es/>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-01-24 18:00 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-19 14:11 struct sockaddr_storage Alejandro Colomar
2023-01-20 10:06 ` Stefan Puiu
2023-01-20 12:39   ` Alejandro Colomar
2023-01-23  7:40     ` Stefan Puiu
2023-01-23 16:03       ` Alejandro Colomar
2023-01-23 16:28         ` Richard Biener
2023-01-24 16:38           ` Alex Colomar
2023-01-23 16:37         ` Jakub Jelinek
2023-01-24 16:40           ` Alex Colomar
2023-01-24 18:00           ` Alex Colomar
2023-01-24 11:16   ` Rich Felker
2023-01-24 16:53     ` Alex Colomar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).