* struct sockaddr_storage @ 2023-01-19 14:11 Alejandro Colomar 2023-01-20 10:06 ` Stefan Puiu 0 siblings, 1 reply; 12+ messages in thread From: Alejandro Colomar @ 2023-01-19 14:11 UTC (permalink / raw) To: GNU C Library; +Cc: linux-man, gcc, Igor Sysoev [-- Attachment #1.1: Type: text/plain, Size: 1702 bytes --] Hi! I just received a report about struct sockaddr_storage in the man pages. It reminded me of some concern I've always had about it: it doesn't seem to be a usable type. It has some alignment promises that make it "just work" most of the time, but it's still a UB mine, according to ISO C. According to strict aliasing rules, if you declare a variable of type 'struct sockaddr_storage', that's what you get, and trying to access it later as some other sockaddr_8 is simply not legal. The compiler may assume those accesses can't happen, and optimize as it pleases. That means that one needs to declare a union with all possible sockaddr_* types that are of interest, so that access as any of them is later allowed by the compiler (of course, the user still needs to access the correct one, but that's of course). In that union, one could add a member that is of type sockaddr_storage for getting a more consistent structure size (for example, if some members are conditional on preprocessor stuff), but I don't see much value in that. Especially, given this comment that Igor Sysoev wrote in NGINX Unit's source code: * struct sockaddr_storage is: * 128 bytes on Linux, FreeBSD, MacOSX, NetBSD; * 256 bytes on Solaris, OpenBSD, and HP-UX; * 1288 bytes on AIX. * * struct sockaddr_storage is too large on some platforms * or less than real maximum struct sockaddr_un length. Which makes it even more useless as a type. Should we warn about uses of this type? Should we recommend against using it in the manual page, since there's no legitimate uses of it? Cheers, Alex -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: struct sockaddr_storage 2023-01-19 14:11 struct sockaddr_storage Alejandro Colomar @ 2023-01-20 10:06 ` Stefan Puiu 2023-01-20 12:39 ` Alejandro Colomar 2023-01-24 11:16 ` Rich Felker 0 siblings, 2 replies; 12+ messages in thread From: Stefan Puiu @ 2023-01-20 10:06 UTC (permalink / raw) To: Alejandro Colomar; +Cc: GNU C Library, linux-man, gcc, Igor Sysoev Hi Alex, On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar <alx.manpages@gmail.com> wrote: > > Hi! > > I just received a report about struct sockaddr_storage in the man pages. It > reminded me of some concern I've always had about it: it doesn't seem to be a > usable type. > > It has some alignment promises that make it "just work" most of the time, but > it's still a UB mine, according to ISO C. > > According to strict aliasing rules, if you declare a variable of type 'struct > sockaddr_storage', that's what you get, and trying to access it later as some > other sockaddr_8 is simply not legal. The compiler may assume those accesses > can't happen, and optimize as it pleases. Can you detail the "is not legal" part? How about the APIs like connect() etc that use pointers to struct sockaddr, where the underlying type is different, why would that be legal while using sockaddr_storage isn't? Will code break in practice? > > That means that one needs to declare a union with all possible sockaddr_* types > that are of interest, so that access as any of them is later allowed by the > compiler (of course, the user still needs to access the correct one, but that's > of course). > > In that union, one could add a member that is of type sockaddr_storage for > getting a more consistent structure size (for example, if some members are > conditional on preprocessor stuff), but I don't see much value in that. > Especially, given this comment that Igor Sysoev wrote in NGINX Unit's source code: > > * struct sockaddr_storage is: > * 128 bytes on Linux, FreeBSD, MacOSX, NetBSD; > * 256 bytes on Solaris, OpenBSD, and HP-UX; > * 1288 bytes on AIX. > * > * struct sockaddr_storage is too large on some platforms > * or less than real maximum struct sockaddr_un length. > > Which makes it even more useless as a type. I'm not sure using struct sockaddr_storage for storing sockaddr_un's (UNIX domain socket addresses, right?) is that common a usage. I've used it in the past to store either a sockaddr_in or a sockaddr_in6, and I think that would be a more common scenario. The comment above probably makes sense for nginx, but different projects have different needs. As for the size, I guess it might matter if you want to port your code to AIX, Solaris, OpenBSD etc. I don't think all software is meant to be portable, though (or portable to those platforms). Maybe a warning is in order that, for portable code, developers should check its size on the other platforms targeted. Just my 2 cents, as always, Stefan. > > > Should we warn about uses of this type? Should we recommend against using it in > the manual page, since there's no legitimate uses of it? > > Cheers, > > Alex > > -- > <http://www.alejandro-colomar.es/> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: struct sockaddr_storage 2023-01-20 10:06 ` Stefan Puiu @ 2023-01-20 12:39 ` Alejandro Colomar 2023-01-23 7:40 ` Stefan Puiu 2023-01-24 11:16 ` Rich Felker 1 sibling, 1 reply; 12+ messages in thread From: Alejandro Colomar @ 2023-01-20 12:39 UTC (permalink / raw) To: Stefan Puiu; +Cc: GNU C Library, linux-man, gcc, Igor Sysoev [-- Attachment #1.1: Type: text/plain, Size: 6064 bytes --] Hi Stefan, On 1/20/23 11:06, Stefan Puiu wrote: > Hi Alex, > > On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar > <alx.manpages@gmail.com> wrote: >> >> Hi! >> >> I just received a report about struct sockaddr_storage in the man pages. It >> reminded me of some concern I've always had about it: it doesn't seem to be a >> usable type. >> >> It has some alignment promises that make it "just work" most of the time, but >> it's still a UB mine, according to ISO C. >> >> According to strict aliasing rules, if you declare a variable of type 'struct >> sockaddr_storage', that's what you get, and trying to access it later as some >> other sockaddr_8 is simply not legal. The compiler may assume those accesses >> can't happen, and optimize as it pleases. > > Can you detail the "is not legal" part? I mean that it's Undefined Behavior contraband. > How about the APIs like > connect() etc that use pointers to struct sockaddr, where the > underlying type is different, why would that be legal while using > sockaddr_storage isn't? That's also bad. However, it can be fixed by fixing `sockaddr_storage` and telling everyone to use it instead of using whatever other `sockaddr_*`. You need a union for the underlying storage, so that the library functions can access both as `sockaddr` and as `sockaddr_*`. The problem isn't really in the implementation of connect(2), but on the type. The implementation of connect(2) would be fine if we just fixed the type. See some example: struct my_sockaddr_storage { union { sa_family_t ss_family; struct sockaddr sa; struct sockaddr_in sin; struct sockaddr_in6 sin6; struct sockaddr_un sun; }; }; void foo(foo) { struct my_sockaddr_storage mss; struct sockaddr_storage ss; // initialize mss and ss inet_sockaddr2str(&mss.sa); // correct inet_sockaddr2str((struct sockaddr_storage *)&ss); // UB } /* This function is correct, as far as the accessed object has the * type we're using. That's only possible through a `union`, since * we're accessing it with 2 different types: `sockaddr` for the * `sa_family` and then the appropriate subtype for the address * itself. */ const char * inet_sockaddr2str(const struct sockaddr *sa) { struct sockaddr_in *sin; struct sockaddr_in6 *sin6; static char buf[INET_ADDRSTRLENMAX]; switch (sa->sa_family) { case AF_INET: sin = (struct sockaddr_in *) sa; inet_ntop(AF_INET, &sin->sin_addr, buf, NITEMS(buf)); return buf; case AF_INET6: sin6 = (struct sockaddr_in6 *) sa; inet_ntop(AF_INET6, &sin6->sin6_addr, buf, NITEMS(buf)); return buf; default: errno = EAFNOSUPPORT; return NULL; } } BTW, you need a union _even if_ you only care about a single address family. That is, if you only care about Unix sockets, you can't declare your variable of type sockaddr_un, because the libc functions and syscalls still need to access it as a sockaddr to see which family it has. > Will code break in practice? Well, it depends on how much compilers advance. Here's some interesting experiment: <https://software.codidact.com/posts/287748/287750#answer-287750> I wouldn't rely on Undefined Behavior not causing nasal demons. When you get them, you can only kill them with garlic. > >> >> That means that one needs to declare a union with all possible sockaddr_* types >> that are of interest, so that access as any of them is later allowed by the >> compiler (of course, the user still needs to access the correct one, but that's >> of course). >> >> In that union, one could add a member that is of type sockaddr_storage for >> getting a more consistent structure size (for example, if some members are >> conditional on preprocessor stuff), but I don't see much value in that. >> Especially, given this comment that Igor Sysoev wrote in NGINX Unit's source code: >> >> * struct sockaddr_storage is: >> * 128 bytes on Linux, FreeBSD, MacOSX, NetBSD; >> * 256 bytes on Solaris, OpenBSD, and HP-UX; >> * 1288 bytes on AIX. >> * >> * struct sockaddr_storage is too large on some platforms >> * or less than real maximum struct sockaddr_un length. >> >> Which makes it even more useless as a type. > > I'm not sure using struct sockaddr_storage for storing sockaddr_un's > (UNIX domain socket addresses, right?) is that common a usage. I've > used it in the past to store either a sockaddr_in or a sockaddr_in6, > and I think that would be a more common scenario. The comment above > probably makes sense for nginx, but different projects have different > needs. > > As for the size, I guess it might matter if you want to port your code > to AIX, Solaris, OpenBSD etc. I don't think all software is meant to > be portable, though (or portable to those platforms). Maybe a warning > is in order that, for portable code, developers should check its size > on the other platforms targeted. The size thing is just an added problem. The deep problem is that you need to use a union that contains all types that you care about _plus_ plain sockaddr, because the structure will be accessed at least as a sockaddr, plus one of the different specialized structures. So even for only sockaddr_un, you need at least the following: union my_unix_sockaddr { struct sockaddr sa; struct sockaddr_un sun; }; Not doing that will necessarily result in invoking Undefined Behavior at some point. > > Just my 2 cents, as always, > Stefan. The good thing is that fixing sockaddr_storage and telling everybody to use it always fixes the problem, so I'm preparing a patch for glibc. Cheers, Alex > >> >> >> Should we warn about uses of this type? Should we recommend against using it in >> the manual page, since there's no legitimate uses of it? >> >> Cheers, >> >> Alex >> >> -- >> <http://www.alejandro-colomar.es/> -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: struct sockaddr_storage 2023-01-20 12:39 ` Alejandro Colomar @ 2023-01-23 7:40 ` Stefan Puiu 2023-01-23 16:03 ` Alejandro Colomar 0 siblings, 1 reply; 12+ messages in thread From: Stefan Puiu @ 2023-01-23 7:40 UTC (permalink / raw) To: Alejandro Colomar; +Cc: GNU C Library, linux-man, gcc, Igor Sysoev Hi Alex, On Fri, Jan 20, 2023 at 2:40 PM Alejandro Colomar <alx.manpages@gmail.com> wrote: > > Hi Stefan, > > On 1/20/23 11:06, Stefan Puiu wrote: > > Hi Alex, > > > > On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar > > <alx.manpages@gmail.com> wrote: > >> > >> Hi! > >> > >> I just received a report about struct sockaddr_storage in the man pages. It > >> reminded me of some concern I've always had about it: it doesn't seem to be a > >> usable type. > >> > >> It has some alignment promises that make it "just work" most of the time, but > >> it's still a UB mine, according to ISO C. > >> > >> According to strict aliasing rules, if you declare a variable of type 'struct > >> sockaddr_storage', that's what you get, and trying to access it later as some > >> other sockaddr_8 is simply not legal. The compiler may assume those accesses > >> can't happen, and optimize as it pleases. > > > > Can you detail the "is not legal" part? > > I mean that it's Undefined Behavior contraband. OK, next question. Is this theoretical or practical UB? People check documentation about how to write code today, I think. > > > How about the APIs like > > connect() etc that use pointers to struct sockaddr, where the > > underlying type is different, why would that be legal while using > > sockaddr_storage isn't? > > That's also bad. However, it can be fixed by fixing `sockaddr_storage` and > telling everyone to use it instead of using whatever other `sockaddr_*`. You > need a union for the underlying storage, so that the library functions can > access both as `sockaddr` and as `sockaddr_*`. > > The problem isn't really in the implementation of connect(2), but on the type. > The implementation of connect(2) would be fine if we just fixed the type. See > some example: > > struct my_sockaddr_storage { > union { > sa_family_t ss_family; > struct sockaddr sa; > struct sockaddr_in sin; > struct sockaddr_in6 sin6; > struct sockaddr_un sun; > }; > }; > > > void > foo(foo) > { > struct my_sockaddr_storage mss; > struct sockaddr_storage ss; > > // initialize mss and ss > > inet_sockaddr2str(&mss.sa); // correct > inet_sockaddr2str((struct sockaddr_storage *)&ss); // UB > } > > /* This function is correct, as far as the accessed object has the > * type we're using. That's only possible through a `union`, since > * we're accessing it with 2 different types: `sockaddr` for the > * `sa_family` and then the appropriate subtype for the address > * itself. > */ > const char * > inet_sockaddr2str(const struct sockaddr *sa) > { > struct sockaddr_in *sin; > struct sockaddr_in6 *sin6; > > static char buf[INET_ADDRSTRLENMAX]; > > switch (sa->sa_family) { > case AF_INET: > sin = (struct sockaddr_in *) sa; > inet_ntop(AF_INET, &sin->sin_addr, buf, NITEMS(buf)); > return buf; > case AF_INET6: > sin6 = (struct sockaddr_in6 *) sa; > inet_ntop(AF_INET6, &sin6->sin6_addr, buf, NITEMS(buf)); > return buf; > default: > errno = EAFNOSUPPORT; > return NULL; > } > } > > > BTW, you need a union _even if_ you only care about a single address family. > That is, if you only care about Unix sockets, you can't declare your variable of > type sockaddr_un, because the libc functions and syscalls still need to access > it as a sockaddr to see which family it has. > > > Will code break in practice? > > Well, it depends on how much compilers advance. Here's some interesting experiment: > > <https://software.codidact.com/posts/287748/287750#answer-287750> That code plays with 2 pointers to the same area, one to double and one to int, so I don't think it's that similar to the sockaddr situation. At least for struct sockaddr, the sa_family field is the same for all struct sockaddr_* variants. Also, in practical terms, I don't think any compiler optimization that breaks socket APIs (and, if I recall correctly, there are instances of this pattern in the kernel as well) is going to be an easy sell. It's possible, but realistically speaking, I don't think it's going to happen. > > I wouldn't rely on Undefined Behavior not causing nasal demons. When you get > them, you can only kill them with garlic. OK, but not all theoretical issues have practical implications. Is there code that can show UB in practical terms with struct sockaddr_storage today? Like Eric mentioned in another thread, does UBSan complain about code using struct sockaddr_storage? Thanks, Stefan. > > > > >> > >> That means that one needs to declare a union with all possible sockaddr_* types > >> that are of interest, so that access as any of them is later allowed by the > >> compiler (of course, the user still needs to access the correct one, but that's > >> of course). > >> > >> In that union, one could add a member that is of type sockaddr_storage for > >> getting a more consistent structure size (for example, if some members are > >> conditional on preprocessor stuff), but I don't see much value in that. > >> Especially, given this comment that Igor Sysoev wrote in NGINX Unit's source code: > >> > >> * struct sockaddr_storage is: > >> * 128 bytes on Linux, FreeBSD, MacOSX, NetBSD; > >> * 256 bytes on Solaris, OpenBSD, and HP-UX; > >> * 1288 bytes on AIX. > >> * > >> * struct sockaddr_storage is too large on some platforms > >> * or less than real maximum struct sockaddr_un length. > >> > >> Which makes it even more useless as a type. > > > > I'm not sure using struct sockaddr_storage for storing sockaddr_un's > > (UNIX domain socket addresses, right?) is that common a usage. I've > > used it in the past to store either a sockaddr_in or a sockaddr_in6, > > and I think that would be a more common scenario. The comment above > > probably makes sense for nginx, but different projects have different > > needs. > > > > As for the size, I guess it might matter if you want to port your code > > to AIX, Solaris, OpenBSD etc. I don't think all software is meant to > > be portable, though (or portable to those platforms). Maybe a warning > > is in order that, for portable code, developers should check its size > > on the other platforms targeted. > > The size thing is just an added problem. The deep problem is that you need to > use a union that contains all types that you care about _plus_ plain sockaddr, > because the structure will be accessed at least as a sockaddr, plus one of the > different specialized structures. So even for only sockaddr_un, you need at > least the following: > > union my_unix_sockaddr { > struct sockaddr sa; > struct sockaddr_un sun; > }; > > Not doing that will necessarily result in invoking Undefined Behavior at some point. > > > > > Just my 2 cents, as always, > > Stefan. > > The good thing is that fixing sockaddr_storage and telling everybody to use it > always fixes the problem, so I'm preparing a patch for glibc. > > Cheers, > > Alex > > > > >> > >> > >> Should we warn about uses of this type? Should we recommend against using it in > >> the manual page, since there's no legitimate uses of it? > >> > >> Cheers, > >> > >> Alex > >> > >> -- > >> <http://www.alejandro-colomar.es/> > > -- > <http://www.alejandro-colomar.es/> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: struct sockaddr_storage 2023-01-23 7:40 ` Stefan Puiu @ 2023-01-23 16:03 ` Alejandro Colomar 2023-01-23 16:28 ` Richard Biener 2023-01-23 16:37 ` Jakub Jelinek 0 siblings, 2 replies; 12+ messages in thread From: Alejandro Colomar @ 2023-01-23 16:03 UTC (permalink / raw) To: Stefan Puiu Cc: GNU C Library, linux-man, gcc, Igor Sysoev, Bastien Roucariès [-- Attachment #1.1: Type: text/plain, Size: 4583 bytes --] Hi Stefan, On 1/23/23 08:40, Stefan Puiu wrote: >>>> According to strict aliasing rules, if you declare a variable of type 'struct >>>> sockaddr_storage', that's what you get, and trying to access it later as some >>>> other sockaddr_8 is simply not legal. The compiler may assume those accesses >>>> can't happen, and optimize as it pleases. >>> >>> Can you detail the "is not legal" part? >> >> I mean that it's Undefined Behavior contraband. > > OK, next question. Is this theoretical or practical UB? Since the functions using this type are not functions that should be inlined, since the code is rather large, they are not visible to the compiler, so many of the optimizations that this UB enables are not likely to happen. Translation Unit (TU) boundaries are what keeps most UB invokations not be really dangerous. Also, glibc seems to be using a GCC attribute (transparent_union) to make the code avoid UB even if it were inlined, so if you use glibc, you're fine. If you're using some smaller libc with a less capable compiler, or maybe C++, you are less lucky, but TU boundaries will probably still save your day. > People check > documentation about how to write code today, I think. I'm fine documenting how to do it today. But before changing the documentation, I'd like to take some time to reflect on what can we do to fix the standard so that we don't have this semi-broken state forever. When we have a clear idea of what we can do to fix the implementation and hopefully the standard long-term, possibly keeping source code the same, we can do a better recommendation for programmers. Today, you can do 2 things: - You don't care about UB, and would like that C had always been K&R C, and GCC just makes it work. Then use `sockaddr_storage`. It will just work. When it stops working, you can blame the compiler and libc for optimizing way too much. - You care a lot about UB. Then write your own union, as all the `sockaddr` interface should have been designed from the ground up. That's what unions are for. Which should we recommend? That's my problem. I don't want to be documenting the latter, because it's non-standard, and it's still likely to do it invoking UB in a different way, because it's a difficult part of the language, and when you roll your own, you're likely to make accidents. So, ideally, I'd like to document the former, but for that, I'd like to make sure that it will work forever, since otherwise we'd be blamed when somebody's code is compiled in a platform with some combination of libc, compiler, and phase of the moon, that makes the UB become non-theoretical. I think we can fix the definition of `sockaddr_storage` to have defined behavior, with the changes I'm discussing with Bastien, so I guess we'll document the former. >>> Will code break in practice? >> >> Well, it depends on how much compilers advance. Here's some interesting experiment: >> >> <https://software.codidact.com/posts/287748/287750#answer-287750> > > That code plays with 2 pointers to the same area, one to double and > one to int, so I don't think it's that similar to the sockaddr > situation. At least for struct sockaddr, the sa_family field is the > same for all struct sockaddr_* variants. Also, in practical terms, I > don't think any compiler optimization that breaks socket APIs (and, if > I recall correctly, there are instances of this pattern in the kernel > as well) is going to be an easy sell. It's possible, but realistically > speaking, I don't think it's going to happen. The common initial sequence of structures is only allowed if the structures form part of a union (which is why to avoid UB you need a union; and still, you need to make sure you don't invoke UB in a different way). <https://port70.net/%7Ensz/c/c11/n1570.html#6.5.2.3p6> > >> >> I wouldn't rely on Undefined Behavior not causing nasal demons. When you get >> them, you can only kill them with garlic. > > OK, but not all theoretical issues have practical implications. Is > there code that can show UB in practical terms with struct > sockaddr_storage today? Like Eric mentioned in another thread, does > UBSan complain about code using struct sockaddr_storage? It's unlikely. But I can't promise it will be safe under some random combination of compiler and library, and depends also on what you do in your code, which will affect compiler optimizations. Cheers, Alex -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: struct sockaddr_storage 2023-01-23 16:03 ` Alejandro Colomar @ 2023-01-23 16:28 ` Richard Biener 2023-01-24 16:38 ` Alex Colomar 2023-01-23 16:37 ` Jakub Jelinek 1 sibling, 1 reply; 12+ messages in thread From: Richard Biener @ 2023-01-23 16:28 UTC (permalink / raw) To: Alejandro Colomar Cc: Stefan Puiu, GNU C Library, linux-man, gcc, Igor Sysoev, Bastien Roucariès On Mon, Jan 23, 2023 at 5:04 PM Alejandro Colomar via Gcc <gcc@gcc.gnu.org> wrote: > > Hi Stefan, > > On 1/23/23 08:40, Stefan Puiu wrote: > >>>> According to strict aliasing rules, if you declare a variable of type 'struct > >>>> sockaddr_storage', that's what you get, and trying to access it later as some > >>>> other sockaddr_8 is simply not legal. The compiler may assume those accesses > >>>> can't happen, and optimize as it pleases. > >>> > >>> Can you detail the "is not legal" part? > >> > >> I mean that it's Undefined Behavior contraband. > > > > OK, next question. Is this theoretical or practical UB? > > > Since the functions using this type are not functions that should be inlined, > since the code is rather large, they are not visible to the compiler, so many of > the optimizations that this UB enables are not likely to happen. Translation > Unit (TU) boundaries are what keeps most UB invokations not be really dangerous. > > Also, glibc seems to be using a GCC attribute (transparent_union) to make the > code avoid UB even if it were inlined, so if you use glibc, you're fine. If > you're using some smaller libc with a less capable compiler, or maybe C++, you > are less lucky, but TU boundaries will probably still save your day. > > > People check > > documentation about how to write code today, I think. > > I'm fine documenting how to do it today. But before changing the documentation, > I'd like to take some time to reflect on what can we do to fix the standard so > that we don't have this semi-broken state forever. When we have a clear idea of > what we can do to fix the implementation and hopefully the standard long-term, > possibly keeping source code the same, we can do a better recommendation for > programmers. > > Today, you can do 2 things: > > - You don't care about UB, and would like that C had always been K&R C, and GCC > just makes it work. Then use `sockaddr_storage`. It will just work. When it > stops working, you can blame the compiler and libc for optimizing way too much. > > - You care a lot about UB. Then write your own union, as all the `sockaddr` > interface should have been designed from the ground up. That's what unions are for. > > Which should we recommend? That's my problem. > > I don't want to be documenting the latter, because it's non-standard, and it's > still likely to do it invoking UB in a different way, because it's a difficult > part of the language, and when you roll your own, you're likely to make accidents. > > So, ideally, I'd like to document the former, but for that, I'd like to make > sure that it will work forever, since otherwise we'd be blamed when somebody's > code is compiled in a platform with some combination of libc, compiler, and > phase of the moon, that makes the UB become non-theoretical. > > I think we can fix the definition of `sockaddr_storage` to have defined > behavior, with the changes I'm discussing with Bastien, so I guess we'll > document the former. > > >>> Will code break in practice? > >> > >> Well, it depends on how much compilers advance. Here's some interesting experiment: > >> > >> <https://software.codidact.com/posts/287748/287750#answer-287750> > > > > That code plays with 2 pointers to the same area, one to double and > > one to int, so I don't think it's that similar to the sockaddr > > situation. At least for struct sockaddr, the sa_family field is the > > same for all struct sockaddr_* variants. Also, in practical terms, I > > don't think any compiler optimization that breaks socket APIs (and, if > > I recall correctly, there are instances of this pattern in the kernel > > as well) is going to be an easy sell. It's possible, but realistically > > speaking, I don't think it's going to happen. > > The common initial sequence of structures is only allowed if the structures form > part of a union (which is why to avoid UB you need a union; and still, you need > to make sure you don't invoke UB in a different way). > <https://port70.net/%7Ensz/c/c11/n1570.html#6.5.2.3p6> GCC only allows it if the union is visible as part of the access, that is, it allows it under its rule of allowing punning for union accesses and not specially because of the initial sequence rule. So u.a.x = 1; ... = u.b.x; is allowed but struct A *p = &u.a; p->x = 1; struct B *q = &u.b; ... = q->x; is UB with GCC if struct A and B are the union members with a common initial sequence. Richard. > > > >> > >> I wouldn't rely on Undefined Behavior not causing nasal demons. When you get > >> them, you can only kill them with garlic. > > > > OK, but not all theoretical issues have practical implications. Is > > there code that can show UB in practical terms with struct > > sockaddr_storage today? Like Eric mentioned in another thread, does > > UBSan complain about code using struct sockaddr_storage? > > It's unlikely. But I can't promise it will be safe under some random > combination of compiler and library, and depends also on what you do in your > code, which will affect compiler optimizations. > > Cheers, > > Alex > > -- > <http://www.alejandro-colomar.es/> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: struct sockaddr_storage 2023-01-23 16:28 ` Richard Biener @ 2023-01-24 16:38 ` Alex Colomar 0 siblings, 0 replies; 12+ messages in thread From: Alex Colomar @ 2023-01-24 16:38 UTC (permalink / raw) To: Richard Biener Cc: Stefan Puiu, GNU C Library, linux-man, gcc, Igor Sysoev, Bastien Roucariès [-- Attachment #1.1: Type: text/plain, Size: 1039 bytes --] Hi Richard, On 1/23/23 17:28, Richard Biener wrote: >> The common initial sequence of structures is only allowed if the structures form >> part of a union (which is why to avoid UB you need a union; and still, you need >> to make sure you don't invoke UB in a different way). >> <https://port70.net/%7Ensz/c/c11/n1570.html#6.5.2.3p6> > > GCC only allows it if the union is visible as part of the access, that > is, it allows it > under its rule of allowing punning for union accesses and not specially because > of the initial sequence rule. So > > u.a.x = 1; > ... = u.b.x; > > is allowed but > > struct A *p = &u.a; > p->x = 1; > struct B *q = &u.b; > ... = q->x; > > is UB with GCC if struct A and B are the union members with a common > initial sequence. Yep. That's why we need a union that is defined in libc, so that it can be used both in and out of glibc. sockaddr_storage can be reconverted to that purpose. Cheers, Alex -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: struct sockaddr_storage 2023-01-23 16:03 ` Alejandro Colomar 2023-01-23 16:28 ` Richard Biener @ 2023-01-23 16:37 ` Jakub Jelinek 2023-01-24 16:40 ` Alex Colomar 2023-01-24 18:00 ` Alex Colomar 1 sibling, 2 replies; 12+ messages in thread From: Jakub Jelinek @ 2023-01-23 16:37 UTC (permalink / raw) To: Alejandro Colomar Cc: Stefan Puiu, GNU C Library, linux-man, gcc, Igor Sysoev, Bastien Roucariès On Mon, Jan 23, 2023 at 05:03:00PM +0100, Alejandro Colomar via Gcc wrote: > Hi Stefan, > > On 1/23/23 08:40, Stefan Puiu wrote: > > > > > According to strict aliasing rules, if you declare a variable of type 'struct > > > > > sockaddr_storage', that's what you get, and trying to access it later as some > > > > > other sockaddr_8 is simply not legal. The compiler may assume those accesses > > > > > can't happen, and optimize as it pleases. > > > > > > > > Can you detail the "is not legal" part? > > > > > > I mean that it's Undefined Behavior contraband. > > > > OK, next question. Is this theoretical or practical UB? > > > Since the functions using this type are not functions that should be > inlined, since the code is rather large, they are not visible to the > compiler, so many of the optimizations that this UB enables are not likely > to happen. Translation Unit (TU) boundaries are what keeps most UB > invokations not be really dangerous. > > Also, glibc seems to be using a GCC attribute (transparent_union) to make > the code avoid UB even if it were inlined, so if you use glibc, you're fine. > If you're using some smaller libc with a less capable compiler, or maybe > C++, you are less lucky, but TU boundaries will probably still save your > day. Please see transparent_union documentation in GCC documentation. E.g. https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Common-Type-Attributes.html#index-transparent_005funion-type-attribute transparent_union doesn't change anything regarding type punning, it is solely about function arguments, how arguments of that type are passed (as first union member) and that no casts to the union are needed from the member types. And, with LTO TU boundaries are lost, inlining or other IPA optimizations (including say modref) work in between TUs. Jakub ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: struct sockaddr_storage 2023-01-23 16:37 ` Jakub Jelinek @ 2023-01-24 16:40 ` Alex Colomar 2023-01-24 18:00 ` Alex Colomar 1 sibling, 0 replies; 12+ messages in thread From: Alex Colomar @ 2023-01-24 16:40 UTC (permalink / raw) To: Jakub Jelinek Cc: Stefan Puiu, GNU C Library, linux-man, gcc, Igor Sysoev, Bastien Roucariès [-- Attachment #1.1: Type: text/plain, Size: 938 bytes --] Hi Jakub, On 1/23/23 17:37, Jakub Jelinek wrote: > Please see transparent_union documentation in GCC documentation. > E.g. https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Common-Type-Attributes.html#index-transparent_005funion-type-attribute > transparent_union doesn't change anything regarding type punning, it is > solely about function arguments, how arguments of that type are passed > (as first union member) and that no casts to the union are needed from > the member types. Yep, when I wrote that I didn't fully understand it. Now I got it. I'll prepare some better suggestion about a fix. Thanks. > And, with LTO TU boundaries are lost, inlining or other IPA optimizations > (including say modref) work in between TUs. Yeah, that's why we need a fix. Compilers will only get better at optimizing, so UB will sooner or later be a problem. Cheers, Alex -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: struct sockaddr_storage 2023-01-23 16:37 ` Jakub Jelinek 2023-01-24 16:40 ` Alex Colomar @ 2023-01-24 18:00 ` Alex Colomar 1 sibling, 0 replies; 12+ messages in thread From: Alex Colomar @ 2023-01-24 18:00 UTC (permalink / raw) To: GNU C Library, Bastien Roucariès Cc: Stefan Puiu, linux-man, gcc, Igor Sysoev, Rich Felker, Andrew Clayton, Richard Biener, Zack Weinberg, Florian Weimer, Joseph Myers, Jakub Jelinek [-- Attachment #1.1: Type: text/plain, Size: 1472 bytes --] Hi, After reading more about transparent_unit, here's my idea of a fix for the API. old_api() is an example for the libc functions that accept a `struct sockaddr *`, and user_code() is an example for user code functions that handle sockaddr structures. The interface would be the same as it is currently, but the implementation inside libc would change to use a union. In user code, uses of sockaddr_storage would be made safe with these changes, I believe, and new code would be simpler, since it wouldn't need casts. void old_api(union my_sockaddr_ptr *sa); struct sockaddr_storage { union { struct { sa_family_t ss_family; }; struct sockaddr_in sin; struct sockaddr_in6 sin6; struct sockaddr_un sun; // ... }; }; union [[gnu::transparent_union]] sockaddr_ptr { struct sockaddr_storage *ss; struct sockaddr *sa; }; void old_api(struct sockaddr_storage *ss) { // Here libc uses the union, so it doesn't invoke UB. ss->sun.sa_family = AF_UNIX; //... } void user_code(void) { struct my_sockaddr_storage ss; // object definition // ... old_api(&ss); // The transparent_union allows no casts. switch (ss.ss_family) { // This is safe too. // thanks to common initial sequence within a union. } } This would in fact deprecate plain `struct sockaddr`, as Bastien suggested. Cheers, Alex -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: struct sockaddr_storage 2023-01-20 10:06 ` Stefan Puiu 2023-01-20 12:39 ` Alejandro Colomar @ 2023-01-24 11:16 ` Rich Felker 2023-01-24 16:53 ` Alex Colomar 1 sibling, 1 reply; 12+ messages in thread From: Rich Felker @ 2023-01-24 11:16 UTC (permalink / raw) To: Stefan Puiu; +Cc: Alejandro Colomar, GNU C Library, linux-man, gcc, Igor Sysoev On Fri, Jan 20, 2023 at 12:06:50PM +0200, Stefan Puiu via Libc-alpha wrote: > Hi Alex, > > On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar > <alx.manpages@gmail.com> wrote: > > > > Hi! > > > > I just received a report about struct sockaddr_storage in the man pages. It > > reminded me of some concern I've always had about it: it doesn't seem to be a > > usable type. > > > > It has some alignment promises that make it "just work" most of the time, but > > it's still a UB mine, according to ISO C. > > > > According to strict aliasing rules, if you declare a variable of type 'struct > > sockaddr_storage', that's what you get, and trying to access it later as some > > other sockaddr_8 is simply not legal. The compiler may assume those accesses > > can't happen, and optimize as it pleases. > > Can you detail the "is not legal" part? How about the APIs like > connect() etc that use pointers to struct sockaddr, where the > underlying type is different, why would that be legal while using > sockaddr_storage isn't? Because they're specified to take different types. In C, any struct pointer type can legally point to any other struct type. You just can't dereference through it with the wrong type. How the implementation of connect etc. handle this is an implementation detail. You're allowed to pass pointers to struct sockaddr_in, etc. to connect etc. simply because the specification says you are. In any case, sockaddr_storage is a legacy thing designed by folks who didn't understand the rules of the C language. It should never appear in modern code except perhaps with sizeof for allocting buffers. There is no action that needs to be taken here except documenting that it should not be used (cannot be used meaningfully without UB). Rich ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: struct sockaddr_storage 2023-01-24 11:16 ` Rich Felker @ 2023-01-24 16:53 ` Alex Colomar 0 siblings, 0 replies; 12+ messages in thread From: Alex Colomar @ 2023-01-24 16:53 UTC (permalink / raw) To: Rich Felker, Stefan Puiu; +Cc: GNU C Library, linux-man, gcc, Igor Sysoev [-- Attachment #1.1: Type: text/plain, Size: 2738 bytes --] Hi Rick, On 1/24/23 12:16, Rich Felker wrote: > On Fri, Jan 20, 2023 at 12:06:50PM +0200, Stefan Puiu via Libc-alpha wrote: >> Hi Alex, >> >> On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar >> <alx.manpages@gmail.com> wrote: >>> >>> Hi! >>> >>> I just received a report about struct sockaddr_storage in the man pages. It >>> reminded me of some concern I've always had about it: it doesn't seem to be a >>> usable type. >>> >>> It has some alignment promises that make it "just work" most of the time, but >>> it's still a UB mine, according to ISO C. >>> >>> According to strict aliasing rules, if you declare a variable of type 'struct >>> sockaddr_storage', that's what you get, and trying to access it later as some >>> other sockaddr_8 is simply not legal. The compiler may assume those accesses >>> can't happen, and optimize as it pleases. >> >> Can you detail the "is not legal" part? How about the APIs like >> connect() etc that use pointers to struct sockaddr, where the >> underlying type is different, why would that be legal while using >> sockaddr_storage isn't? > > Because they're specified to take different types. In C, any struct > pointer type can legally point to any other struct type. You just > can't dereference through it with the wrong type. Yep. Which basically means that users need to treat sockaddr structures as black boxes. Otherwise, there's going to be undefined behavior at some point. Because of course, you can't know the right type before reading the first field, which is already UB. > How the > implementation of connect etc. handle this is an implementation > detail. You're allowed to pass pointers to struct sockaddr_in, etc. to > connect etc. simply because the specification says you are. While the implementation has some more freedom regarding UB, in this case it's waiting for a compiler optimization to break this code, so I'd go the safe way and use standard C techniques in libc so that there are no long-term UB issues. As a side effect, user code that currently invokes UB could be changed to have defined behavior. > > In any case, sockaddr_storage is a legacy thing designed by folks who > didn't understand the rules of the C language. It should never appear > in modern code except perhaps with sizeof for allocting buffers. There > is no action that needs to be taken here except documenting that it > should not be used (cannot be used meaningfully without UB). I agree with you on this. sockaddr_storage has been broken since day 0. However, for designing a solution for libc using unions, it could be useful. > > Rich Cheers, Alex -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-01-24 18:00 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-01-19 14:11 struct sockaddr_storage Alejandro Colomar 2023-01-20 10:06 ` Stefan Puiu 2023-01-20 12:39 ` Alejandro Colomar 2023-01-23 7:40 ` Stefan Puiu 2023-01-23 16:03 ` Alejandro Colomar 2023-01-23 16:28 ` Richard Biener 2023-01-24 16:38 ` Alex Colomar 2023-01-23 16:37 ` Jakub Jelinek 2023-01-24 16:40 ` Alex Colomar 2023-01-24 18:00 ` Alex Colomar 2023-01-24 11:16 ` Rich Felker 2023-01-24 16:53 ` Alex Colomar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).