From: Daniel Borkmann <daniel@iogearbox.net>
To: Martin Lau <kafai@fb.com>
Cc: Lawrence Brakmo <brakmo@fb.com>, netdev <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@fb.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH v2 bpf-next 4/9] bpf: add bpf helper bpf_skb_ecn_set_ce
Date: Mon, 25 Feb 2019 11:10:39 +0100
Message-ID: <c9ebe42a-4fc5-74f5-1b80-30ec45772174@iogearbox.net>
In-Reply-To: <20190223073031.utnow4seviqyfqta@kafai-mbp.dhcp.thefacebook.com>

On 02/23/2019 08:30 AM, Martin Lau wrote:
> On Sat, Feb 23, 2019 at 02:14:26AM +0100, Daniel Borkmann wrote:
>> On 02/23/2019 02:06 AM, brakmo wrote:
>>> This patch adds a new bpf helper BPF_FUNC_skb_ecn_set_ce
>>> "int bpf_skb_ecn_set_ce(struct sk_buff *skb)". It is added to
>>> BPF_PROG_TYPE_CGROUP_SKB typed bpf_prog which currently can
>>> be attached to the ingress and egress path. The helper is needed
>>> because this type of bpf_prog cannot modify the skb directly.
>>>
>>> This helper is used to set the ECN field of ECN capable IP packets to CE
>>> (congestion encountered) in the IPv6 or IPv4 header of the skb. It can be
>>> used by a bpf_prog to manage egress or ingress network bandwidth limit
>>> per cgroupv2 by inducing an ECN response in the TCP sender.
>>> This works best when using DCTCP.
>>>
>>> Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
>>> ---
>>>  include/uapi/linux/bpf.h | 10 +++++++++-
>>>  net/core/filter.c        | 14 ++++++++++++++
>>>  2 files changed, 23 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 95b5058fa945..fc646f3eaf9b 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -2365,6 +2365,13 @@ union bpf_attr {
>>>   *		Make a tcp_sock enter CWR state.
>>>   *	Return
>>>   *		0 on success, or a negative error in case of failure.
>>> + *
>>> + * int bpf_skb_ecn_set_ce(struct sk_buff *skb)
>>> + *	Description
>>> + *		Sets the ECN field of the IP header to CE (congestion encountered)
>>> + *		if the current value is ECT (ECN capable transport). Works with IPv6 and IPv4.
>>> + *	Return
>>> + *		1 if set, 0 if not set.
>>>   */
>>>  #define __BPF_FUNC_MAPPER(FN)		\
>>>  	FN(unspec),			\
>>> @@ -2464,7 +2471,8 @@ union bpf_attr {
>>>  	FN(spin_unlock),		\
>>>  	FN(sk_fullsock),		\
>>>  	FN(tcp_sock),			\
>>> -	FN(tcp_enter_cwr),
>>> +	FN(tcp_enter_cwr),		\
>>> +	FN(skb_ecn_set_ce),
>>>  
>>>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>>   * function eBPF program intends to call
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index ca57ef25279c..955369c6ed30 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -5444,6 +5444,18 @@ static const struct bpf_func_proto bpf_tcp_enter_cwr_proto = {
>>>  	.ret_type    = RET_INTEGER,
>>>  	.arg1_type    = ARG_PTR_TO_TCP_SOCK,
>>>  };
>>> +
>>> +BPF_CALL_1(bpf_skb_ecn_set_ce, struct sk_buff *, skb)
>>> +{
>>> +	return INET_ECN_set_ce(skb);
>>
>> Hm, but as mentioned last time, don't we have to ensure here that skb
>> is writable (aka skb->data private to us before writing into it)?
> INET_ECN_set_ce(skb) is also called from a few net/sched/sch_*.c
> but I don't see how they ensure that the skb is writable.
> 
> Maybe I have missed something there that can also be borrowed and
> reused here?
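
For reference, a rough userspace sketch of what "set CE if ECT" means at the byte level. This is an assumed model, not the kernel's INET_ECN_set_ce(); the function names and the raw-buffer interface are hypothetical. Only ECT(0)/ECT(1) packets are modified, IPv4 needs an incremental header checksum fix-up (RFC 1624), and IPv6 has no header checksum. Like the quoted "1 if set, 0 if not set" contract, this sketch reports an already-CE packet as set.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 3168 ECN codepoints: low two bits of IPv4 TOS / IPv6 traffic class. */
enum { ECN_NOT_ECT = 0, ECN_ECT_1 = 1, ECN_ECT_0 = 2, ECN_CE = 3 };

/* Complemented one's-complement sum over an IPv4 header. With the checksum
 * field zeroed this yields the value to store; over a header with a correct
 * checksum it yields 0. */
static uint16_t ipv4_csum(const uint8_t *hdr, size_t hlen)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < hlen; i += 2)
		sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical model of "set CE if ECT"; pkt points at the IP header.
 * Returns 1 if the packet carries CE afterwards (also when it already did),
 * 0 if it is not ECN capable or the header is too short. */
static int ecn_set_ce(uint8_t *pkt, size_t len)
{
	if (len < 1)
		return 0;

	switch (pkt[0] >> 4) {
	case 4: {
		size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;
		uint8_t ecn;

		if (ihl < 20 || len < ihl)
			return 0;
		ecn = pkt[1] & 0x03;
		if (ecn == ECN_NOT_ECT)
			return 0;
		if (ecn == ECN_CE)
			return 1;

		/* RFC 1624 incremental update: HC' = ~(~HC + ~m + m'), where
		 * m and m' are the old and new 16-bit words holding the TOS. */
		uint16_t m_old = (uint16_t)((pkt[0] << 8) | pkt[1]);
		pkt[1] |= ECN_CE;
		uint16_t m_new = (uint16_t)((pkt[0] << 8) | pkt[1]);
		uint16_t hc = (uint16_t)((pkt[10] << 8) | pkt[11]);
		uint32_t sum = (uint32_t)(uint16_t)~hc + (uint16_t)~m_old + m_new;

		while (sum >> 16)
			sum = (sum & 0xffff) + (sum >> 16);
		hc = (uint16_t)~sum;
		pkt[10] = (uint8_t)(hc >> 8);
		pkt[11] = (uint8_t)(hc & 0xff);
		return 1;
	}
	case 6: {
		uint8_t ecn;

		if (len < 40)
			return 0;
		/* The traffic class straddles bytes 0 and 1; its two ECN bits
		 * are bits 5..4 of byte 1. IPv6 has no header checksum. */
		ecn = (pkt[1] >> 4) & 0x03;
		if (ecn == ECN_NOT_ECT)
			return 0;
		pkt[1] |= ECN_CE << 4;
		return 1;
	}
	default:
		return 0;
	}
}
```

Note that the IPv4 case writes two separate fields (TOS and checksum), which is why it matters who else can see the buffer while it is being modified.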

My understanding is that before doing any writes into the skb, we should
make sure the data area is private to us (and that the offset lies within
the linear data). In tc BPF (ingress, egress) we use the
bpf_try_make_writable() helper for this; others like act_{pedit,skbmod}
or ovs have similar logic before writing into the skb. Note that in all
these cases it's mostly about generic writes, so the location could also
be L4, for example.
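
The "private to us and in linear data" rule can be sketched as copy-on-write over a shared, refcounted buffer. This is a hypothetical userspace model loosely inspired by cloned skbs sharing skb->data, not the kernel's bpf_try_make_writable(); struct buf, struct pkt, linear_len and pkt_make_writable() are all made-up names, and the linear check simply fails instead of pulling data in.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted data area that clones share. */
struct buf {
	int refcnt;
	size_t len;
	uint8_t data[];
};

/* Hypothetical packet handle. linear_len models how many leading bytes
 * are directly addressable (the rest would live in fragments). */
struct pkt {
	struct buf *buf;
	size_t linear_len;
};

static struct buf *buf_alloc(const uint8_t *src, size_t len)
{
	struct buf *b = malloc(sizeof(*b) + len);

	if (!b)
		return NULL;
	b->refcnt = 1;
	b->len = len;
	memcpy(b->data, src, len);
	return b;
}

static void buf_put(struct buf *b)
{
	if (b && --b->refcnt == 0)
		free(b);
}

/* A clone shares the data area; nothing is copied. */
static struct pkt pkt_clone(struct pkt *p)
{
	p->buf->refcnt++;
	return *p;
}

/* Make the first write_len bytes safe to modify: they must be in the
 * linear area, and the data area must not be shared with another handle
 * (copy it if it is). Returns 0 or a negative errno. */
static int pkt_make_writable(struct pkt *p, size_t write_len)
{
	struct buf *copy;

	if (write_len > p->linear_len)
		return -EINVAL;	/* a real implementation would try to pull first */
	if (p->buf->refcnt == 1)
		return 0;	/* already private */

	copy = buf_alloc(p->buf->data, p->buf->len);
	if (!copy)
		return -ENOMEM;
	buf_put(p->buf);
	p->buf = copy;
	return 0;
}
```

With this, a write through one handle never changes what a clone sees, which is the property in-place ECN marking on a shared skb would lack.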

The difference between the above helper and the net/sched/sch_*.c
instances could be that i) in the qdisc case, INET_ECN_set_ce() is only
called on egress, and ii) there may be a convention that qdiscs
specifically are allowed to mangle it, whereas the helper could be
called on both ingress and egress and confuse other subsystems, since
they would not see the original packet, or could race with it and see a
partially updated (invalid) packet.
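
To make the "partially updated (invalid) packet" concern concrete: on IPv4, marking CE touches two separate fields, the TOS byte and the header checksum. The sketch below is a hypothetical userspace model (not kernel code; ce_step1_store_tos() and ce_step2_store_csum() are made-up names) that splits the two stores so the intermediate state can be inspected. Anyone reading a shared header between them sees a checksum that does not verify.

```c
#include <stddef.h>
#include <stdint.h>

/* Complemented one's-complement sum; 0 means the IPv4 header verifies. */
static uint16_t ipv4_csum(const uint8_t *hdr, size_t hlen)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < hlen; i += 2)
		sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical step 1 of CE marking an ECT IPv4 header: rewrite the TOS
 * byte. Returns the old 16-bit word that held it, needed by step 2. */
static uint16_t ce_step1_store_tos(uint8_t *hdr)
{
	uint16_t m_old = (uint16_t)((hdr[0] << 8) | hdr[1]);

	hdr[1] |= 0x03;	/* ECN = CE */
	return m_old;
}

/* Hypothetical step 2: fix up the checksum, HC' = ~(~HC + ~m + m'). */
static void ce_step2_store_csum(uint8_t *hdr, uint16_t m_old)
{
	uint16_t m_new = (uint16_t)((hdr[0] << 8) | hdr[1]);
	uint16_t hc = (uint16_t)((hdr[10] << 8) | hdr[11]);
	uint32_t sum = (uint32_t)(uint16_t)~hc + (uint16_t)~m_old + m_new;

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	hc = (uint16_t)~sum;
	hdr[10] = (uint8_t)(hc >> 8);
	hdr[11] = (uint8_t)(hc & 0xff);
}
```

Whether another user could actually observe the intermediate state depends on the data being shared, which is exactly what a writability check is meant to rule out.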

Eric, do you have a chance to clarify? If that is the case, it would
perhaps make sense to disallow the helper in the cgroup ingress path.


Thread overview: 29+ messages
2019-02-23  1:06 [PATCH v2 bpf-next 0/9] bpf: Network Resource Manager (NRM) brakmo
2019-02-23  1:06 ` [PATCH v2 bpf-next 1/9] bpf: Remove const from get_func_proto brakmo
2019-02-23  1:06 ` [PATCH v2 bpf-next 2/9] bpf: Add bpf helper bpf_tcp_enter_cwr brakmo
2019-02-24  1:32   ` Eric Dumazet
2019-02-24  3:08     ` Martin Lau
2019-02-24  4:44       ` Alexei Starovoitov
2019-02-24 18:00       ` Eric Dumazet
2019-02-25 23:14   ` Stanislav Fomichev
2019-02-26  1:30     ` Martin Lau
2019-02-26  3:32       ` Stanislav Fomichev
2019-02-23  1:06 ` [PATCH v2 bpf-next 3/9] bpf: Test bpf_tcp_enter_cwr in test_verifier brakmo
2019-02-23  1:06 ` [PATCH v2 bpf-next 4/9] bpf: add bpf helper bpf_skb_ecn_set_ce brakmo
2019-02-23  1:14   ` Daniel Borkmann
2019-02-23  7:30     ` Martin Lau
2019-02-25 10:10       ` Daniel Borkmann [this message]
2019-02-25 16:52         ` Eric Dumazet
2019-02-23  1:06 ` [PATCH v2 bpf-next 5/9] bpf: Add bpf helper bpf_tcp_check_probe_timer brakmo
2019-02-23  1:07 ` [PATCH v2 bpf-next 6/9] bpf: sync bpf.h to tools and update bpf_helpers.h brakmo
2019-02-23  1:07 ` [PATCH v2 bpf-next 7/9] bpf: Sample NRM BPF program to limit egress bw brakmo
2019-02-23  1:07 ` [PATCH v2 bpf-next 8/9] bpf: User program for testing NRM brakmo
2019-02-23  1:07 ` [PATCH v2 bpf-next 9/9] bpf: NRM test script brakmo
2019-02-23  3:03 ` [PATCH v2 bpf-next 0/9] bpf: Network Resource Manager (NRM) David Ahern
2019-02-23 18:39   ` Eric Dumazet
2019-02-23 20:40     ` Alexei Starovoitov
2019-02-23 20:43       ` Eric Dumazet
2019-02-23 23:25         ` Alexei Starovoitov
2019-02-24  2:58           ` David Ahern
2019-02-24  4:48             ` Alexei Starovoitov
2019-02-25  1:38               ` David Ahern
