Re: [PATCH net-next] net: DSCP in IPv4 routing v2

From: Guillaume Nault <gnault@redhat.com>
To: Russell Strong <russell@strong.id.au>
Cc: netdev@vger.kernel.org
Subject: Re: [PATCH net-next] net: DSCP in IPv4 routing v2
Date: Tue, 24 Nov 2020 16:22:22 +0100
Message-ID: <20201124152222.GB28947@linux.home>
In-Reply-To: <20201124124149.11fe991e@192-168-1-16.tpgi.com.au>

On Tue, Nov 24, 2020 at 12:41:49PM +1000, Russell Strong wrote:
> On Mon, 23 Nov 2020 23:55:05 +0100 Guillaume Nault <gnault@redhat.com> wrote:
> > On Sat, Nov 21, 2020 at 06:24:46PM +1000, Russell Strong wrote:
> 
> I was wondering if one patch would be acceptable, or should it be broken
> up? If it were broken up, it would not make sense to apply only half of them.

A patch series would be applied in its entirety or not applied at all.
However, it's not acceptable to temporarily introduce regressions in one
patch and fix them later in the series. The tree has to remain
bisectable.

Anyway, I believe there's no need to replace all the TOS macros in the
same patch series. DSCP doesn't have to be enabled everywhere at once.
Small, targeted patch series are much easier to review.

> > RT_TOS didn't clear the second lowest bit, while the new IP_DSCP does.
> > Therefore, there's no guarantee that such a blanket replacement isn't
> > going to change existing behaviours. Replacements have to be done
> > step by step and accompanied by an explanation of why they're safe.
> 
> Original TOS did not use this bit until it was added in RFC1349 as "lowcost".
> The DSCP change (RFC2474) marked these as currently unused, but worse than that,
> with the introduction of ECN, both of those now "unused" bits are for ECN.
> Other parts of the kernel are using those bits for ECN, so bit 1 probably
> shouldn't be used in routing anymore, as congestion could create unexpected
> routing behaviour, e.g. with fib_rules.

The IETF meaning and history of these bits are well understood. But we
can't write patches based on assumptions like "bit 1 probably shouldn't
be used". The actual code is what matters. That's why, again, changes
have to be done incrementally and in a reviewable manner.
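To make the behaviour change concrete, here is a minimal user-space sketch comparing RT_TOS()-style masking with a DSCP-style mask. IPTOS_TOS_MASK (0x1E) and INET_ECN_ECT_0 (0x02) use the values from the kernel headers; SKETCH_DSCP_MASK (0xFC, the six high-order bits per RFC 2474) is an assumption and may not match the name or value used by the v2 patch. Whenever bit 1 is set, the two masks produce different lookup keys.

```c
#include <stdint.h>

/* Values from the kernel headers: IPTOS_TOS_MASK is in
 * include/uapi/linux/ip.h, INET_ECN_ECT_0 in include/net/inet_ecn.h. */
#define IPTOS_TOS_MASK 0x1E
#define INET_ECN_ECT_0 0x02

/* Assumption: the DSCP mask keeps the six high-order bits (RFC 2474).
 * The name and exact value used by the v2 patch may differ. */
#define SKETCH_DSCP_MASK 0xFC

/* RT_TOS()-style masking: keeps bits 1..4 only. ECT(0) survives, while
 * the three high-order DSCP bits are dropped. */
static uint8_t rt_tos(uint8_t tos)
{
	return tos & IPTOS_TOS_MASK;
}

/* DSCP-style masking: keeps bits 2..7 and clears both ECN bits. */
static uint8_t dscp_only(uint8_t tos)
{
	return tos & SKETCH_DSCP_MASK;
}

/* Non-zero when bit 1 (the ECT(0) position) is kept by rt_tos() but
 * cleared by dscp_only(), which happens exactly when it is set in tos. */
static int bit1_differs(uint8_t tos)
{
	return (rt_tos(tos) & INET_ECN_ECT_0) !=
	       (dscp_only(tos) & INET_ECN_ECT_0);
}
```

For instance, an ECT(0)-marked packet with no other TOS bits yields 0x02 under rt_tos() but 0x00 under dscp_only(), so a fib rule matching on one of those values would stop matching after a blanket replacement.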

> > For example some of the ip6_make_flowinfo() calls can probably
> > erroneously mark some packets with ECT(0). Instead of masking the
> > problem in this patch, I think it'd be better to have an explicit fix
> > that'd mask the ECN bits in ip6_make_flowinfo() and drop the buggy
> > RT_TOS() in the callers.
> > 
> > Another example is inet_rtm_getroute(). It calls
> > ip_route_output_key_hash_rcu() without masking the tos field first.
> 
> Should rtm->tos be checked for validity in inet_rtm_valid_getroute_req? Seems
> like it was missed.

Well, I don't think so. inet_rtm_valid_getroute_req() is supposed to
return an error if a parameter is wrong. Verifying ->tos should have
been done since day 1, yes. However, in practice, we've been accepting
any value for years. That's the kind of user space behaviour that we
can't really change. The only solution I can see is to mask the ECN
bits silently. That way, users can still pass whatever they like (we
won't break any script), but the result will be right (that is,
consistent with what routing does).
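The two options can be contrasted with a small sketch. Both helpers are hypothetical (neither exists in the kernel), and INET_ECN_MASK is redefined locally with the same value (0x03) as in include/net/inet_ecn.h. The strict variant is the one that would break existing users; the lenient one never fails and only normalises the key.

```c
#include <errno.h>
#include <stdint.h>

#define INET_ECN_MASK 0x03 /* same value as in include/net/inet_ecn.h */

/* Rejected option (hypothetical): strict validation. Scripts that have
 * been passing arbitrary TOS values for years would start getting
 * -EINVAL. */
static int getroute_tos_strict(uint8_t user_tos, uint8_t *out)
{
	if (user_tos & INET_ECN_MASK)
		return -EINVAL;
	*out = user_tos;
	return 0;
}

/* Preferred option (hypothetical helper): accept any value, but silently
 * clear the ECN bits so the lookup key is consistent with what the
 * routing code uses. */
static uint8_t getroute_tos_lenient(uint8_t user_tos)
{
	return user_tos & (uint8_t)~INET_ECN_MASK;
}
```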

> > Therefore it can return a different route than what the routing code
> > would actually use. Like for the ip6_make_flowinfo() case, it might
> > be better to stop relying on the callers to mask ECN bits and do that
> > in ip_route_output_key_hash_rcu() instead.
> 
> In this context one of the ECN bits is not an ECN bit, as can be seen by
> 
> #define RT_FL_TOS(oldflp4) \
>         ((oldflp4)->flowi4_tos & (IP_DSCP_MASK | RTO_ONLINK))

The RTO_ONLINK flag would have to be passed in a different way. Not a
trivial task (many places to audit), but that looks feasible.
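A rough sketch of the overlap and of one possible way out. RTO_ONLINK (0x01) and IPTOS_RT_MASK (0x1C) follow the mainline definitions in the kernel headers (mainline RT_FL_TOS() uses IPTOS_RT_MASK rather than the IP_DSCP_MASK from the quoted patch). The structs and split_onlink() are hypothetical stand-ins, not a proposal for the actual struct flowi4 layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTO_ONLINK     0x01                  /* include/net/route.h */
#define IPTOS_TOS_MASK 0x1E
#define IPTOS_RT_MASK  (IPTOS_TOS_MASK & ~3) /* 0x1C */
#define INET_ECN_MASK  0x03

/* Minimal stand-in for struct flowi4: only the field discussed here. */
struct flowi4_sketch {
	uint8_t flowi4_tos;
};

/* Mainline-style RT_FL_TOS(): RTO_ONLINK rides in bit 0 of the TOS
 * byte, i.e. in one of the two ECN bits, while bit 1 is cleared. */
static uint8_t rt_fl_tos(const struct flowi4_sketch *fl4)
{
	return fl4->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK);
}

/* Hypothetical alternative: carry "on-link" as a separate flag so that
 * both ECN bits can be masked uniformly in the TOS byte. */
struct flowi4_alt_sketch {
	uint8_t tos;
	bool onlink;
};

static struct flowi4_alt_sketch split_onlink(uint8_t legacy_tos)
{
	struct flowi4_alt_sketch out = {
		.tos = legacy_tos & (uint8_t)~INET_ECN_MASK,
		.onlink = (legacy_tos & RTO_ONLINK) != 0,
	};
	return out;
}
```

The hard part is not the split itself but auditing every caller that sets or tests RTO_ONLINK in flowi4_tos.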

> It's all a bit messy and spread about.  Reducing the distributed nature of
> the masking would be good.

Yes, that's why I'd like to stop sprinkling RT_TOS everywhere and mask
the bits in central places when possible. Once the RT_TOS situation
improves, adding DSCP support will be much easier.
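As an illustration of masking in a central place, here is a sketch of the ip6_make_flowinfo() idea mentioned earlier in the thread. Mainline ip6_make_flowinfo() roughly computes htonl(tclass << IPV6_TCLASS_SHIFT) | flowlabel without touching the ECN bits; ip6_make_flowinfo_sketch() is a hypothetical variant that clears them once, so callers no longer need RT_TOS(). __be32 is replaced by uint32_t so it builds in user space.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define IPV6_TCLASS_SHIFT 20
#define INET_ECN_MASK     0x03

/* Hypothetical variant of ip6_make_flowinfo(): mask the ECN bits here,
 * instead of relying on every caller to pre-mask tclass. The flow label
 * is expected in network byte order, as in the kernel helper. */
static uint32_t ip6_make_flowinfo_sketch(unsigned int tclass,
					 uint32_t flowlabel_be)
{
	tclass &= ~(unsigned int)INET_ECN_MASK;
	return htonl(tclass << IPV6_TCLASS_SHIFT) | flowlabel_be;
}
```

With this, a caller passing a tclass of 0x02 can no longer accidentally mark the packet ECT(0).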

> > I'll verify that these two problems can actually happen in practice
> > and will send patches if necessary.
> 
> Thanks
>