Re: [PATCH v2 bpf-next 2/9] bpf: Add bpf helper bpf_tcp_enter_cwr

From: Martin Lau <kafai@fb.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Lawrence Brakmo <brakmo@fb.com>, netdev <netdev@vger.kernel.org>,
	"Alexei Starovoitov" <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"Kernel Team" <Kernel-team@fb.com>
Subject: Re: [PATCH v2 bpf-next 2/9] bpf: Add bpf helper bpf_tcp_enter_cwr
Date: Sun, 24 Feb 2019 03:08:48 +0000
Message-ID: <20190224030845.imwjbkoaxipuzb75@kafai-mbp.dhcp.thefacebook.com>
In-Reply-To: <2a218060-8a62-150c-c05e-5433df18aaab@gmail.com>

On Sat, Feb 23, 2019 at 05:32:14PM -0800, Eric Dumazet wrote:
> 
> 
> On 02/22/2019 05:06 PM, brakmo wrote:
> > From: Martin KaFai Lau <kafai@fb.com>
> > 
> > This patch adds a new bpf helper BPF_FUNC_tcp_enter_cwr
> > "int bpf_tcp_enter_cwr(struct bpf_tcp_sock *tp)".
> > It is added to BPF_PROG_TYPE_CGROUP_SKB which can be attached
> > to the egress path where the bpf prog is called by
> > ip_finish_output() or ip6_finish_output().  The verifier
> > ensures that the parameter must be a tcp_sock.
> > 
> > This helper makes a tcp_sock enter CWR state.  It can be used
> > by a bpf_prog to manage egress network bandwidth limit per
> > cgroupv2.  A later patch will have a sample program to
> > show how it can be used to limit bandwidth usage per cgroupv2.
> > 
> > To ensure it is only called from BPF_CGROUP_INET_EGRESS, the
> > attr->expected_attach_type must be specified as BPF_CGROUP_INET_EGRESS
> > during load time if the prog uses this new helper.
> > The newly added prog->enforce_expected_attach_type bit will also be set
> > if this new helper is used.  This bit is for backward compatibility reason
> > because currently prog->expected_attach_type has been ignored in
> > BPF_PROG_TYPE_CGROUP_SKB.  During attach time,
> > prog->expected_attach_type is only enforced if the
> > prog->enforce_expected_attach_type bit is set.
> > i.e. prog->expected_attach_type is only enforced if this new helper
> > is used by the prog.
> > 
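For readers following the thread, the attach-time rule described in the
commit message above can be modeled in user space roughly as follows.
This is only a sketch under stated assumptions, not the code from the
patch: struct model_prog, enum model_attach_type and model_check_attach()
are hypothetical names made up for illustration, and only the two fields
named in the commit message are represented.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the attach types; not the kernel's enum. */
enum model_attach_type {
	MODEL_CGROUP_INET_INGRESS,
	MODEL_CGROUP_INET_EGRESS,
};

/* Hypothetical model of the two prog fields the commit message mentions. */
struct model_prog {
	enum model_attach_type expected_attach_type;
	/* Assumed to be set at load time only if the prog uses the new helper. */
	bool enforce_expected_attach_type;
};

/*
 * Attach-time check as described: expected_attach_type is compared
 * against the requested attach type only when the enforce bit is set.
 * Returns 0 if the attach is allowed, -EINVAL otherwise.
 */
static int model_check_attach(const struct model_prog *prog,
			      enum model_attach_type attach_type)
{
	assert(prog != NULL);

	/* Older progs: expected_attach_type keeps being ignored. */
	if (!prog->enforce_expected_attach_type)
		return 0;

	return prog->expected_attach_type == attach_type ? 0 : -EINVAL;
}
```

In this model a prog that does not use the helper attaches anywhere, as
before, while a prog that uses it can only be attached where its
expected_attach_type says.
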
> 
> BTW, it seems to me that BPF_CGROUP_INET_EGRESS can be used while the socket lock is not held.
Thanks for pointing it out.

I see.  I just noticed the comment above ip6_xmit():
/*
 * xmit an sk_buff (used by TCP, SCTP and DCCP)
 * Note : socket lock is not held for SYNACK packets, but might be modified
 * by calls to skb_set_owner_w() and ipv6_local_error(),
 * which are using proper atomic operations or spinlocks.
 */
Are there other cases besides SYNACK?

Thanks,
Martin