All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Bug#966459: linux: traffic class socket options (both IPv4/IPv6) inconsistent with docs/standards
       [not found] <159596111771.2639.6929056987566441726.reportbug@tglase-nb.lan.tarent.de>
@ 2020-08-02 17:49 ` Ben Hutchings
  2020-08-02 19:29   ` Thorsten Glaser
  0 siblings, 1 reply; 6+ messages in thread
From: Ben Hutchings @ 2020-08-02 17:49 UTC (permalink / raw)
  To: Thorsten Glaser, 966459; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 2769 bytes --]

[The previous message is archived at <https://bugs.debian.org/966459>.]

On Tue, 2020-07-28 at 20:31 +0200, Thorsten Glaser wrote:
> Package: src:linux
> Version: 5.7.6-1
> Severity: normal
> Tags: upstream
> X-Debbugs-Cc: tg@mirbsd.de
> 
> I’m using setsockopt to set the traffic class on sending and receive
> it in control messages on receiving, for both IPv4 and IPv6.
> 
> The relevant documentation is the ip(7) manpage and, because the ipv6(7)
> manpage doesn’t contain it, RFC3542.

ip(7) also doesn't document IP_PKTOPIONS.

[...]
> Same in net/ipv4/ip_sockglue.c…
> 
>                         int tos = inet->rcv_tos;
>                         put_cmsg(&msg, SOL_IP, IP_TOS, sizeof(tos), &tos);
> … in one place, but…
> 
>         put_cmsg(msg, SOL_IP, IP_TOS, 1, &ip_hdr(skb)->tos);
> 
> … in ip_cmsg_recv_tos(), yielding inconsistent results for IPv4(!).

Those are two different APIs though: recvmsg() for datagram sockets, vs
getsockopt(... IP_PKTOPTIONS ...) for stream sockets.  They obviously
ought to be consistent, but mistakes happen.

[...]
> tl;dr: Receiving traffic class values from IP traffic is broken on
> big endian platforms.

Some user-space that uses getsockopt(... IP_PKTOPTIONS ...) for stream
sockets might be broken.

I searched for 'cmsg_type.*IP_TOS' on codesearch.debian.net, and found
only two instances where it was used in conjunction with IP_PKTOPTIONS.

libzorpll reads only the first byte (so is broken on big-endian):
https://sources.debian.org/src/libzorpll/7.0.1.0%7Ealpha1-1.1/src/io.cc/#L239

squid reads an int and then truncates it to a byte (so is fine):
https://sources.debian.org/src/squid/4.12-1/src/ip/QosConfig.cc/#L41

> I place the following suggestion for discussion, to achieve maximum
> portability: put 4 bytes into the CMSG for both IPv4 and IPv6, where
> the first and fourth byte are, identically, traffic class, second and
> third 0.
[...]

I see no point in changing the IPv6 behaviour: it seems to be
consistent with itself and with the standard, so only risks breaking
user-space that works today.

As for IPv4, changing the format of the IP_TOS field in the
IP_PKTOPIONS value looks like it would work for the two users found in
Debian.

But you should know that the highest priority for Linux API
compatibility is to avoid breaking currently working user-space.  That
means that ugly and inconsistent APIs won't get fixed if it causes a
regression for the programs people actually use.  If the API never
worked like it was supposed to on some architectures, that's not a
regression, and is lower priority.

Ben.

-- 
Ben Hutchings
It is easier to write an incorrect program
than to understand a correct one.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Bug#966459: linux: traffic class socket options (both IPv4/IPv6) inconsistent with docs/standards
  2020-08-02 17:49 ` Bug#966459: linux: traffic class socket options (both IPv4/IPv6) inconsistent with docs/standards Ben Hutchings
@ 2020-08-02 19:29   ` Thorsten Glaser
  2020-08-02 20:29     ` Ben Hutchings
  0 siblings, 1 reply; 6+ messages in thread
From: Thorsten Glaser @ 2020-08-02 19:29 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: 966459, netdev

Ben Hutchings dixit:

>ip(7) also doesn't document IP_PKTOPIONS.

Hmm, I don’t use IP_PKTOPIONS though. I’m not exactly sure I found
the correct place in the kernel for what I do.

On the sending side, I use setsockopt with either
IPPROTO_IP,IP_TOS or IPPROTO_IPV6,IPV6_TCLASS to
set the default traffic class on outgoing packets.

On the receiving side I use setsockopt with either
IPPROTO_IP,IP_RECVTOS or IPPROTO_IPV6,IPV6_RECVTCLASS
to set up the socket then recvmsg to get a cmsg(3) of
IPPROTO_IP,IP_TOS/IPPROTO_IPV6,IPV6_TCLASS from which
I read the traffic class octet.

These are where I believe I found inconsistencies
between code and documentation.

>Those are two different APIs though: recvmsg() for datagram sockets, vs
>getsockopt(... IP_PKTOPTIONS ...) for stream sockets.  They obviously
>ought to be consistent, but mistakes happen.

OK, I’m currently looking at the datagram case only.
This may change later if there’s enough time.

>I see no point in changing the IPv6 behaviour: it seems to be
>consistent with itself and with the standard

Not really: if the kernel writes an int and userspace reads
its first byte, it only works by accident on little endian,
but not elsewhere.

>so only risks breaking user-space that works today.

Hrm. It risks breaking userspace that reads an int. But the
RFC clearly says it should read the first byte, not an int.

>But you should know that the highest priority for Linux API
>compatibility is to avoid breaking currently working user-space.  That
>means that ugly and inconsistent APIs won't get fixed if it causes a
>regression for the programs people actually use.  If the API never
>worked like it was supposed to on some architectures, that's not a
>regression, and is lower priority.

This is why I just put this up for discussion instead of
requesting a specific change.

That being said, given that the IPv6 API is *only* documented
in the RFC and *not* documented in the Linux manpages…

(Perhaps codesearching for IPV6_TCLASS might also help.
It’s unclear how many users this has…)



In the end, what I really want, is clear documentation for
how I should implement the following file that it works on
Linux, and ideally also other systems implementing the RFC
API (FreeBSD supposedly does but needs testing):

https://github.com/tarent/ECN-Bits/blob/master/linux-c/lib/ecn.c

Given that there’s no documentation, trying to read the
coffee grounds from the kernel source, finding it doesn’t
even match the RFC (which, again, doesn’t match what itojun
proposed, for some reason), does not instigate trust in the
things I *think* I’ve found.

bye,
//mirabilos
-- 
tarent solutions GmbH
Rochusstraße 2-4, D-53123 Bonn • http://www.tarent.de/
Tel: +49 228 54881-393 • Fax: +49 228 54881-235
HRB 5168 (AG Bonn) • USt-ID (VAT): DE122264941
Geschäftsführer: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Bug#966459: linux: traffic class socket options (both IPv4/IPv6) inconsistent with docs/standards
  2020-08-02 19:29   ` Thorsten Glaser
@ 2020-08-02 20:29     ` Ben Hutchings
  2020-08-02 20:44       ` Thorsten Glaser
  0 siblings, 1 reply; 6+ messages in thread
From: Ben Hutchings @ 2020-08-02 20:29 UTC (permalink / raw)
  To: Thorsten Glaser; +Cc: 966459, netdev

[-- Attachment #1: Type: text/plain, Size: 1326 bytes --]

On Sun, 2020-08-02 at 19:29 +0000, Thorsten Glaser wrote:
> Ben Hutchings dixit:
> 
> >ip(7) also doesn't document IP_PKTOPIONS.
> 
> Hmm, I don’t use IP_PKTOPIONS though. I’m not exactly sure I found
> the correct place in the kernel for what I do.

The first instance of put_cmsg(...IP_TOS...) you found in
net/ipv4/ip_sockglue.c implements that socket option.

[...]
> >I see no point in changing the IPv6 behaviour: it seems to be
> >consistent with itself and with the standard
> 
> Not really: if the kernel writes an int and userspace reads
> its first byte, it only works by accident on little endian,
> but not elsewhere.

The RFC says that the IPV6_TCLASS option's value is an int, and that
"the first byte of cmsg_data[] will be the *first byte of the integer*
traffic class" (my emphasis).  We can infer from "the first byte of"
that cmsg_data[] will hold more than one byte.  And "the integer"
suggests that it's a C int, like the socket option.

> >so only risks breaking user-space that works today.
> 
> Hrm. It risks breaking userspace that reads an int. But the
> RFC clearly says it should read the first byte, not an int.
[...]

No, the wording is *not* clear.

Ben.

-- 
Ben Hutchings
It is easier to write an incorrect program
than to understand a correct one.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Bug#966459: linux: traffic class socket options (both IPv4/IPv6) inconsistent with docs/standards
  2020-08-02 20:29     ` Ben Hutchings
@ 2020-08-02 20:44       ` Thorsten Glaser
  2020-08-03  3:32         ` Ben Hutchings
  0 siblings, 1 reply; 6+ messages in thread
From: Thorsten Glaser @ 2020-08-02 20:44 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: 966459, netdev

On Sun, 2 Aug 2020, Ben Hutchings wrote:

> The RFC says that the IPV6_TCLASS option's value is an int, and that

for setsockopt (“option's”), not cmsg

> No, the wording is *not* clear.

Agreed.

So perhaps let’s try to find out what’s actually right…

Thanks for helping,
//mirabilos
-- 
tarent solutions GmbH
Rochusstraße 2-4, D-53123 Bonn • http://www.tarent.de/
Tel: +49 228 54881-393 • Fax: +49 228 54881-235
HRB 5168 (AG Bonn) • USt-ID (VAT): DE122264941
Geschäftsführer: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Bug#966459: linux: traffic class socket options (both IPv4/IPv6) inconsistent with docs/standards
  2020-08-02 20:44       ` Thorsten Glaser
@ 2020-08-03  3:32         ` Ben Hutchings
  2020-08-03 16:58           ` Thorsten Glaser
  0 siblings, 1 reply; 6+ messages in thread
From: Ben Hutchings @ 2020-08-03  3:32 UTC (permalink / raw)
  To: Thorsten Glaser, 966459; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 663 bytes --]

On Sun, 2020-08-02 at 22:44 +0200, Thorsten Glaser wrote:
> On Sun, 2 Aug 2020, Ben Hutchings wrote:
> 
> > The RFC says that the IPV6_TCLASS option's value is an int, and that
> 
> for setsockopt (“option's”), not cmsg
> 
> > No, the wording is *not* clear.
> 
> Agreed.
> 
> So perhaps let’s try to find out what’s actually right…

For what it's worth, FreeBSD/Darwin and Windows also put 4 bytes of
data in a IPV6_TCLASS cmsg.  So whether or not it's "right", it's
consistent between three independent implementations.

Ben.

-- 
Ben Hutchings
For every complex problem
there is a solution that is simple, neat, and wrong.



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Bug#966459: linux: traffic class socket options (both IPv4/IPv6) inconsistent with docs/standards
  2020-08-03  3:32         ` Ben Hutchings
@ 2020-08-03 16:58           ` Thorsten Glaser
  0 siblings, 0 replies; 6+ messages in thread
From: Thorsten Glaser @ 2020-08-03 16:58 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: 966459, netdev

Hi Ben,

> For what it's worth, FreeBSD/Darwin and Windows also put 4 bytes of
> data in a IPV6_TCLASS cmsg.  So whether or not it's "right", it's
> consistent between three independent implementations.

oh, thank you, I don’t have any of these systems around at the
moment, so checking them was tricky for me.

So basically I should read an int in host endianness then (or
keep the code I currently have that compares byte 0 and 3, using
the one that’s not 0, if any). Great, thank you!

After some minor porting work, it turns out that the current code
does work on MidnightBSD (equivalent to FreeBSD 10.4) for IPv6.
I guess I’ll keep ints then.

bye,
//mirabilos
-- 
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-08-03 16:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <159596111771.2639.6929056987566441726.reportbug@tglase-nb.lan.tarent.de>
2020-08-02 17:49 ` Bug#966459: linux: traffic class socket options (both IPv4/IPv6) inconsistent with docs/standards Ben Hutchings
2020-08-02 19:29   ` Thorsten Glaser
2020-08-02 20:29     ` Ben Hutchings
2020-08-02 20:44       ` Thorsten Glaser
2020-08-03  3:32         ` Ben Hutchings
2020-08-03 16:58           ` Thorsten Glaser

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.