Re: [PATCH bpf] bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog

From: Martin KaFai Lau <kafai@fb.com>
To: Hou Tao <houtao1@huawei.com>
Cc: <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	KP Singh <kpsingh@kernel.org>
Subject: Re: [PATCH bpf] bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog
Date: Wed, 8 Sep 2021 10:19:39 -0700	[thread overview]
Message-ID: <20210908171939.l6ozdyoji3n5baaf@kafai-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <8e8dd070-ba19-2153-bf9b-8bbb16a70abb@huawei.com>

On Wed, Sep 08, 2021 at 09:31:55PM +0800, Hou Tao wrote:
> Hi,
> 
> On 9/8/2021 2:06 PM, Martin KaFai Lau wrote:
> > On Wed, Sep 01, 2021 at 04:53:44PM +0800, Hou Tao wrote:
> >> Currently if a function ptr in struct_ops has a return value, its
> >> caller will get a random return value from it, because the return
> >> value of related BPF_PROG_TYPE_STRUCT_OPS prog is just dropped.
> >>
> >> So adding a new flag BPF_TRAMP_F_RET_FENTRY_RET to tell bpf trampoline
> >> to save and return the return value of struct_ops prog if ret_size of
> >> the function ptr is greater than 0. Also restricting the flag to be
> >> used alone.
> > Thanks for the report and fix!  Sorry for the late reply.
> >
> > This bug is missed because the tcp-cc func is not always called.
> > A better test needs to be created to force exercising these funcs
> > in bpf_test_run(), which can be a follow-up patch in the bpf-next.
> > Could you help to create this test as a follow up?
> 
> Yes, will do. The first thought comes into my mind is implementing .get_info hook
> in a bpf tcp_congestion_ops and checking its return value in userspace by
> getsockopt(fd, TCP_CC_INFO).
The bpf-tcp-cc's struct_ops currently does not support ".get_info".
It will be a good addition also.

Different bpf-tcp-cc implementations have different infos, so it cannot be
bounded by a fixed struct like 'union tcp_cc_info'.  The format should be
a btf_id followed by the actual info-data.  The kernel should be able to
learn the size of the info-data from the btf_id.  The ".get_info" is
also used by inet_diag for tools (ss) like iproute2.  libbpf can pretty-print
the btf described data and libbpf support is added to iproute2, so pieces
should be in-place for iproute2's tools to handle data described by btf.

For ".get_info" in getsockopt(TCP_CC_INFO), not sure how the application
may use them but I think it will at least enable the application log
them as other kernel's tcp-cc do.  The implementation details may
need some more thoughts but should not be a big issue.

> I also consider to add a new BPF struct_ops
> for testing purpose, but it may be a little overkill.
A dummy struct_ops for testing makes sense. It probably should
be the one done first for testing purpose.  Although "get_info"
is a good add, having a separate testing struct_ops will be easier
to test other interesting cases in the future.

> I just check that it can be applied both on bpf and bpf-next, do you
> have other commits in your tree ?
There is no local commit.

From a quick look, the patch is created from a pretty old tree and it
is missing the BPF_TRAMP_F_SKIP_FRAME.  It is introduced in
commit 7e6f3cd89f04 ("bpf, x86: Store caller's ip in trampoline stack")
on Jul 15 2021 which is pretty old.

I am only able to apply with the --3way merge like "git am --3way".
Andrii, is it fine to land the patch like this?

> @@ -1949,17 +1972,19 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  	struct bpf_tramp_progs *fmod_ret = &tprogs[BPF_TRAMP_MODIFY_RETURN];
>  	u8 **branches = NULL;
>  	u8 *prog;
> +	bool save_ret;
>  
>  	/* x86-64 supports up to 6 arguments. 7+ can be added in the future */
>  	if (nr_args > 6)
>  		return -ENOTSUPP;
>  
> -	if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
> -	    (flags & BPF_TRAMP_F_SKIP_FRAME))
> +	if (!is_valid_bpf_tramp_flags(flags))
>  		return -EINVAL;
>  
> -	if (flags & BPF_TRAMP_F_CALL_ORIG)
> -		stack_size += 8; /* room for return value of orig_call */
> +	/* room for return value of orig_call or fentry prog */
> +	save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
> +	if (save_ret)
> +		stack_size += 8;
>  
>  	if (flags & BPF_TRAMP_F_SKIP_FRAME)
  	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>  		/* skip patched call instruction and point orig_call to actual