bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Maciej Żenczykowski" <maze@google.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: "Tyler Wear (QUIC)" <quic_twear@quicinc.com>,
	"Network Development" <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, "Yonghong Song" <yhs@fb.com>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Song Liu" <song@kernel.org>
Subject: Re: [PATCH bpf-next v6 1/2] Add skb_store_bytes() for BPF_PROG_TYPE_CGROUP_SKB
Date: Fri, 14 Jan 2022 13:18:44 -0800	[thread overview]
Message-ID: <CANP3RGfJ2G8P40hN2F=PGDYUc3pm84=SNppHp_J0V+YiDkLM_A@mail.gmail.com> (raw)
In-Reply-To: <CAADnVQJc=qgz47S1OuUBmX5Rb_opZUCADKqzqGnBruxtJONO7Q@mail.gmail.com>

> > > > This is wrong.
> > > > CGROUP_INET_EGRESS bpf prog cannot arbitrary change packet data.

I agree with this sentiment, which is why the original proposal was
simply to add a helper which is only capable of modifying the
tos/tclass/dscp field, and not any arbitrary bytes.  (note: there
already is such a helper to set the ECN congestion notification bits,
so there's somewhat of a precedent)

> > > > The networking stack populated the IP header at that point.
> > > > If the prog changes it to something else it will be confusing other
> > > > layers of stack. neigh(L2) will be wrong, etc.
> > > > We can still change certain things in the packet, but not arbitrary bytes.
> > > >
> > > > We cannot change the DS field directly in the packet either.

This part I won't agree with.  In most cases there is no DSCP based
routing decision, in which case it seems perfectly reasonable to
change the DSCP bits here.  Indeed last I checked (though this was a
few years ago) the ipv4 tos routing code wasn't even capable of making
sane decisions, because it looks at the bottom 4 bits of the TOS
field, instead of the top 6 bits, ie. you can route on ECN bits, but
you can't route on the full DSCP field.  Additionally afaik the ipv6
tclass routing simply wasn't implemented.  However, I last had to deal
with this probably half a decade ago, on even older kernels, so
perhaps the situation has changed.

Additionally DSCP bits may affect transmit queue selection (for
something like wifi qos / traffic prioritization across multiple
transmit queues with different air-time behaviours - which can use
dscp), so ideally we need dscp to be set *before* the mq qdisc /
dispatch.  I think this implies it needs to happen before tc (though
again, I'm not too certain of the ordering here).

> > > > It can only be changed by changing its value in the socket.

Changing it directly in the socket has two problems:
- it becomes visible to userspace which is undesirable (ie. I've run
across userspace code which will set tos to A, then read it back and
exit/fail/crash if it doesn't see A)
- if the tos bits themselves are an input to the decision about what
tos bits to actually use, then this becomes recursive and basically
impossible to get right.  (for example ssh sets tos to different
values for interactive/bulk (ie. copy) traffic, so using application
selected tos to select wire tos is perfectly reasonable)

> > > Why is the DS field unchangeable, but ecn is changeable?
> >
> > Per spec the requirement is to modify the ds field of egress packets with DSCP value. Setting ds field on socket will not suffice here.
> > Another case is where device is a middle-man and needs to modify the packets of a connected tethered client with the DSCP value, using a sock will not be able to change the packet here.
>
> If DS field needs to be changed differently for every packet
> it's better to use TC layer for this task.
> qdiscs may send packets with different DSs to different queues.

  reply	other threads:[~2022-01-14 21:18 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-13  0:06 [PATCH bpf-next v6 1/2] Add skb_store_bytes() for BPF_PROG_TYPE_CGROUP_SKB Tyler Wear
2022-01-13  0:06 ` [PATCH bpf-next v6 2/2] selftests/bpf: CGROUP_SKB test for skb_store_bytes() Tyler Wear
2022-01-13  0:55   ` Song Liu
2022-01-13  0:43 ` [PATCH bpf-next v6 1/2] Add skb_store_bytes() for BPF_PROG_TYPE_CGROUP_SKB Song Liu
2022-01-13  2:14 ` Alexei Starovoitov
2022-01-13  5:12   ` Tyler Wear
2022-01-14  6:49     ` Tyler Wear (QUIC)
2022-01-14  6:55       ` Alexei Starovoitov
2022-01-14 21:18         ` Maciej Żenczykowski [this message]
2022-01-18 20:37           ` Alexei Starovoitov
2022-01-18 22:23             ` Tyler Wear
2022-01-18 22:31               ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANP3RGfJ2G8P40hN2F=PGDYUc3pm84=SNppHp_J0V+YiDkLM_A@mail.gmail.com' \
    --to=maze@google.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kafai@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=quic_twear@quicinc.com \
    --cc=song@kernel.org \
    --cc=toke@redhat.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).