From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F430C43381 for ; Tue, 19 Feb 2019 10:52:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DCB8821902 for ; Tue, 19 Feb 2019 10:52:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726589AbfBSKwi (ORCPT ); Tue, 19 Feb 2019 05:52:38 -0500 Received: from www62.your-server.de ([213.133.104.62]:52994 "EHLO www62.your-server.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725767AbfBSKwi (ORCPT ); Tue, 19 Feb 2019 05:52:38 -0500 Received: from [78.46.172.2] (helo=sslproxy05.your-server.de) by www62.your-server.de with esmtpsa (TLSv1.2:DHE-RSA-AES256-GCM-SHA384:256) (Exim 4.89_1) (envelope-from ) id 1gw30t-0000tl-Ao; Tue, 19 Feb 2019 11:52:35 +0100 Received: from [178.197.248.36] (helo=linux.home) by sslproxy05.your-server.de with esmtpsa (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.89) (envelope-from ) id 1gw30t-0008sq-54; Tue, 19 Feb 2019 11:52:35 +0100 Subject: Re: [PATCH bpf-next 3/9] bpf: add bpf helper bpf_skb_set_ecn To: brakmo , netdev Cc: Martin Lau , Alexei Starovoitov , Daniel Borkmann --cc=Kernel Team <"daniel@iogearbox.netKernel-team"@fb.com> References: <20190219053832.2086706-1-brakmo@fb.com> From: Daniel Borkmann Message-ID: Date: Tue, 19 Feb 2019 11:52:34 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 MIME-Version: 1.0 In-Reply-To: <20190219053832.2086706-1-brakmo@fb.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Authenticated-Sender: daniel@iogearbox.net X-Virus-Scanned: Clear (ClamAV 0.100.2/25364/Mon Feb 18 11:35:52 2019) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On 02/19/2019 06:38 AM, brakmo wrote: > This patch adds a new bpf helper BPF_FUNC_skb_set_ecn > "int bpf_skb_set_Ecn(struct sk_buff *skb)". It is added to > BPF_PROG_TYPE_CGROUP_SKB typed bpf_prog which currently can > be attached to the ingress and egress path. This type of > bpf_prog cannot modify the skb directly. > > This helper is used to set the ECN bits (2) of the IPv6 or IPv4 > header in skb. It can be used by a bpf_prog to manage egress > network bandwdith limit per cgroupv2 by inducing an ECN > response in the TCP sender (when the packet is ECN enabled). > This works best when using DCTCP. > > Signed-off-by: Lawrence Brakmo > --- > include/uapi/linux/bpf.h | 10 +++++++++- > net/core/filter.c | 29 +++++++++++++++++++++++++++++ > 2 files changed, 38 insertions(+), 1 deletion(-) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 9e9f4f1a0370..5daf404511f7 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -2365,6 +2365,13 @@ union bpf_attr { > * Make a tcp_sock enter CWR state. > * Return > * 0 > + * > + * int bpf_skb_set_ecn(struct sk_buf *skb, int val) Nit: BPF_CALL_2() has u32 val > + * Description > + * Sets ECN bits (2) of IP header. Works with IPv6 and IPv4. > + * val should be one of 0, 1, 2, 3. > + * Return > + * -EINVAL on error (e.g. val > 3), 0 otherwise. > */ > #define __BPF_FUNC_MAPPER(FN) \ > FN(unspec), \ > @@ -2464,7 +2471,8 @@ union bpf_attr { > FN(spin_unlock), \ > FN(sk_fullsock), \ > FN(tcp_sock), \ > - FN(tcp_enter_cwr), > + FN(tcp_enter_cwr), \ > + FN(skb_set_ecn), > > /* integer value in 'imm' field of BPF_CALL instruction selects which helper > * function eBPF program intends to call > diff --git a/net/core/filter.c b/net/core/filter.c > index f51c4a781844..275acfb2117d 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -5438,6 +5438,33 @@ static const struct bpf_func_proto bpf_tcp_enter_cwr_proto = { > .ret_type = RET_INTEGER, > .arg1_type = ARG_PTR_TO_TCP_SOCK, > }; > + > +BPF_CALL_2(bpf_skb_set_ecn, struct sk_buff *, skb, u32, val) > +{ > + struct ipv6hdr *ip6h = ipv6_hdr(skb); > + > + if ((val & ~0x3) != 0) Nit: INET_ECN_MASK > + return -EINVAL; > + > + if (ip6h->version == 6) { > + ip6h->flow_lbl[0] = (ip6h->flow_lbl[0] & ~0x30) | (val << 4); > + return 0; > + } else if (ip6h->version == 4) { > + struct iphdr *ip4h = (struct iphdr *)ip6h; > + > + ip4h->tos = (ip4h->tos & ~0x3) | val; > + return 0; > + } Couldn't this be done as native BPF code via direct packet access instead? Afaik, skb->data should most likely points to network header for the hooks and skb->protocol should be one of ETH_P_IP{,V6}, no? Aside from this, don't we also have cloned skbs here (in particular from TCP side)? Looking at cg_skb_verifier_ops ... it seems there also a bug in the current code, namely that if we have a direct packet write, we don't make the skb writable; at that point skb->data is not private. The cg_skb_is_valid_access() allows to fetch PTR_TO_PACKET{,_END}, so we need a fix like the below for -bpf: diff --git a/net/core/filter.c b/net/core/filter.c index f7d0004fc160..34fe6da0a236 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5796,6 +5796,12 @@ static bool sk_filter_is_valid_access(int off, int size, return bpf_skb_is_valid_access(off, size, type, prog, info); } +static int cg_skb_prologue(struct bpf_insn *insn_buf, bool direct_write, + const struct bpf_prog *prog) +{ + return bpf_unclone_prologue(insn_buf, direct_write, prog, 0); +} + static bool cg_skb_is_valid_access(int off, int size, enum bpf_access_type type, const struct bpf_prog *prog, @@ -7595,6 +7601,7 @@ const struct bpf_verifier_ops cg_skb_verifier_ops = { .get_func_proto = cg_skb_func_proto, .is_valid_access = cg_skb_is_valid_access, .convert_ctx_access = bpf_convert_ctx_access, + .gen_prologue = cg_skb_prologue, }; const struct bpf_prog_ops cg_skb_prog_ops = { > + return -EINVAL; > +} > + > +static const struct bpf_func_proto bpf_skb_set_ecn_proto = { > + .func = bpf_skb_set_ecn, > + .gpl_only = false, > + .ret_type = RET_INTEGER, > + .arg1_type = ARG_PTR_TO_CTX, > + .arg2_type = ARG_ANYTHING, > +}; > #endif /* CONFIG_INET */ > > bool bpf_helper_changes_pkt_data(void *func) > @@ -5599,6 +5626,8 @@ cg_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) > return &bpf_tcp_sock_proto; > case BPF_FUNC_tcp_enter_cwr: > return &bpf_tcp_enter_cwr_proto; > + case BPF_FUNC_skb_set_ecn: > + return &bpf_skb_set_ecn_proto; > #endif > default: > return sk_filter_func_proto(func_id, prog); >