On Tue, 2020-07-28 at 20:31 +0200, Thorsten Glaser wrote:
> Package: src:linux
> Version: 5.7.6-1
> Severity: normal
> Tags: upstream
> X-Debbugs-Cc: tg@mirbsd.de
>
> I’m using setsockopt to set the traffic class on sending and receive
> it in control messages on receiving, for both IPv4 and IPv6.
>
> The relevant documentation is the ip(7) manpage and, because the ipv6(7)
> manpage doesn’t contain it, RFC3542.

ip(7) also doesn't document IP_PKTOPTIONS.

[...]
> Same in net/ipv4/ip_sockglue.c…
>
>         int tos = inet->rcv_tos;
>         put_cmsg(&msg, SOL_IP, IP_TOS, sizeof(tos), &tos);
> … in one place, but…
>
>         put_cmsg(msg, SOL_IP, IP_TOS, 1, &ip_hdr(skb)->tos);
>
> … in ip_cmsg_recv_tos(), yielding inconsistent results for IPv4(!).

Those are two different APIs though: recvmsg() for datagram sockets, vs
getsockopt(... IP_PKTOPTIONS ...) for stream sockets. They obviously
ought to be consistent, but mistakes happen.

[...]
> tl;dr: Receiving traffic class values from IP traffic is broken on
> big endian platforms.

Some user-space that uses getsockopt(... IP_PKTOPTIONS ...) for stream
sockets might be broken. I searched for 'cmsg_type.*IP_TOS' on
codesearch.debian.net, and found only two instances where it was used
in conjunction with IP_PKTOPTIONS.

libzorpll reads only the first byte (so is broken on big-endian):
https://sources.debian.org/src/libzorpll/7.0.1.0%7Ealpha1-1.1/src/io.cc/#L239

squid reads an int and then truncates it to a byte (so is fine):
https://sources.debian.org/src/squid/4.12-1/src/ip/QosConfig.cc/#L41

> I place the following suggestion for discussion, to achieve maximum
> portability: put 4 bytes into the CMSG for both IPv4 and IPv6, where
> the first and fourth byte are, identically, traffic class, second and
> third 0.
[...]

I see no point in changing the IPv6 behaviour: it seems to be
consistent with itself and with the standard, so only risks breaking
user-space that works today.

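For user-space that has to cope with both payload sizes today, a reader
along these lines should work regardless of byte order. This is only a
sketch: tos_from_cmsg() is a hypothetical helper, not something taken
from the kernel, libzorpll or squid, and it assumes the payload is either
a single byte or a native-endian int, as in the two put_cmsg() calls
quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: extract the TOS byte from an IPv4 IP_TOS control
 * message, whether it carries 1 byte or a native-endian int.
 * Returns 0 on success, -1 if cmsg is not a usable IP_TOS message.
 */
static int tos_from_cmsg(const struct cmsghdr *cmsg, unsigned char *tos)
{
    assert(cmsg != NULL && tos != NULL);

    /* IPPROTO_IP has the same value as the Linux-specific SOL_IP. */
    if (cmsg->cmsg_level != IPPROTO_IP || cmsg->cmsg_type != IP_TOS)
        return -1;
    if (cmsg->cmsg_len < CMSG_LEN(1))
        return -1;                      /* no payload at all */

    size_t payload = cmsg->cmsg_len - CMSG_LEN(0);
    const unsigned char *data = CMSG_DATA(cmsg);

    if (payload >= sizeof(int)) {
        /* int-sized payload: copy it out, then truncate as squid does. */
        int value;
        memcpy(&value, data, sizeof(value));
        *tos = (unsigned char)value;
    } else {
        /* 1-byte payload: the byte is the TOS value itself. */
        *tos = data[0];
    }
    return 0;
}
```

Deciding on the payload length rather than always reading the first
byte is what keeps this independent of endianness: the first byte of an
int-sized payload is the TOS value only on little-endian machines.
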

As for IPv4, changing the format of the IP_TOS field in the
IP_PKTOPTIONS value looks like it would work for the two users found in
Debian.

But you should know that the highest priority for Linux API
compatibility is to avoid breaking currently working user-space. That
means that ugly and inconsistent APIs won't get fixed if fixing them
would cause a regression for the programs people actually use. If the
API never worked like it was supposed to on some architectures, that's
not a regression, and is lower priority.

Ben.

-- 
Ben Hutchings
It is easier to write an incorrect program than to understand a correct
one.