From: Yonghong Song <yhs@fb.com>
To: Jakub Sitnicki <jakub@cloudflare.com>, <bpf@vger.kernel.org>
Cc: <netdev@vger.kernel.org>, <kernel-team@cloudflare.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>
Subject: Re: [PATCH bpf] bpf: Shift and mask loads narrower than context field size
Date: Tue, 14 Jul 2020 23:44:01 -0700	[thread overview]
Message-ID: <c98aaa5e-9347-c23f-cfa6-e267f2485c5b@fb.com> (raw)
In-Reply-To: <20200710173123.427983-1-jakub@cloudflare.com>



On 7/10/20 10:31 AM, Jakub Sitnicki wrote:
> When the size of a load from context is the same as the target field size,
> but less than the context field size, the verifier does not emit the shift
> and mask instructions for loads at a non-zero offset.
> 
> This has the unexpected effect of loading the same data no matter what the
> offset was, while the expected behavior would be to load zeros for offsets
> greater than the target field size.
> 
> For instance, a u16 load from a u32 context field backed by a u16 target
> field at an offset of 2 bytes results in:
> 
>    SEC("sk_reuseport/narrow_half")
>    int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>    {
>    	__u16 *half;
> 
>    	half = (__u16 *)&ctx->ip_protocol;
>    	if (half[0] == 0xaaaa)
>    		return SK_DROP;
>    	if (half[1] == 0xbbbb)
>    		return SK_DROP;
>    	return SK_PASS;
>    }

It would be good if you could include the llvm asm output like below so
people can correlate source => asm => xlated code:

        0:       w0 = 0
        1:       r2 = *(u16 *)(r1 + 24)
        2:       if w2 == 43690 goto +4 <LBB0_3>
        3:       r1 = *(u16 *)(r1 + 26)
        4:       w0 = 1
        5:       if w1 != 48059 goto +1 <LBB0_3>
        6:       w0 = 0

0000000000000038 <LBB0_3>:
        7:       exit

> 
>    int reuseport_narrow_half(struct sk_reuseport_md * ctx):
>    ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>       0: (b4) w0 = 0
>    ; if (half[0] == 0xaaaa)
>       1: (79) r2 = *(u64 *)(r1 +8)
>       2: (69) r2 = *(u16 *)(r2 +924)
>    ; if (half[0] == 0xaaaa)
>       3: (16) if w2 == 0xaaaa goto pc+5
>    ; if (half[1] == 0xbbbb)
>       4: (79) r1 = *(u64 *)(r1 +8)
>       5: (69) r1 = *(u16 *)(r1 +924)
>       6: (b4) w0 = 1
>    ; if (half[1] == 0xbbbb)
>       7: (56) if w1 != 0xbbbb goto pc+1
>       8: (b4) w0 = 0
>    ; }
>       9: (95) exit

Indeed we have an issue here. Insn 5 is not correct;
the original assembly is.

Internally, ip_protocol is backed by 2 bytes in sk_reuseport_kern.
The current verifier implementation makes an important assumption:
    all user load requests fall within the kernel-internal field size
In this case, the verifier actually only correctly supports
    . one byte from offset 0
    . one byte from offset 1
    . two bytes from offset 0
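
For example, an in-range one-byte user load from offset 1 ends up as the
2-byte kernel load followed by a shift and mask, along these lines (an
illustrative xlated sketch, not verbatim verifier output, reusing the
+924 offset from the listings above):

       r2 = *(u16 *)(r2 +924)
       w2 >>= 8
       w2 &= 0xff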

The original assembly code tries to access 2 bytes from offset 2,
and the verifier performed an incorrect transformation.

This assumption actually makes sense, since any other read is
ambiguous. For example, for ip_protocol, if someone wants to
load 2 bytes from offset 2, what should we return? 0? In that case,
the verifier can actually convert it to 0 without doing a load at all.


> In this case half[0] == half[1] == sk->sk_protocol, which backs the
> ctx->ip_protocol field.
> 
> Fix it by shifting and masking any load from context that is narrower than
> the context field size (is_narrower_load = size < ctx_field_size), in
> addition to loads that are narrower than the target field size.

The fix can work around the issue, but I think we should generate
better code for such cases.

> 
> The "size < target_size" check is left in place to cover the case when a
> context field is narrower than its target field, even if we might not have
> such case now. (It would have to be a u32 context field backed by a u64
> target field, with context fields all being 4-bytes or wider.)
> 
> Going back to the example, with the fix in place, the upper half load from
> ctx->ip_protocol yields zero:
> 
>    int reuseport_narrow_half(struct sk_reuseport_md * ctx):
>    ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>       0: (b4) w0 = 0
>    ; if (half[0] == 0xaaaa)
>       1: (79) r2 = *(u64 *)(r1 +8)
>       2: (69) r2 = *(u16 *)(r2 +924)
>       3: (54) w2 &= 65535
>    ; if (half[0] == 0xaaaa)
>       4: (16) if w2 == 0xaaaa goto pc+7
>    ; if (half[1] == 0xbbbb)
>       5: (79) r1 = *(u64 *)(r1 +8)
>       6: (69) r1 = *(u16 *)(r1 +924)

The load is still 2 bytes from offset 0, with the upper 48 bits being 0.

>       7: (74) w1 >>= 16

w1 will be 0 now, so this will work.

>       8: (54) w1 &= 65535

For the above insns 5-8, the verifier, based on target information, can
directly generate w1 = 0 since:
   . the target kernel field size is 2 and the ctx field size is 4.
   . the user tries to access 2 bytes at offset 2.

Here, we need to decide whether we permit users to do a partial read
beyond the narrow kernel field or not (e.g., this example). I would
say yes, but Daniel or Alexei can provide additional comments.

If we allow such accesses, I would like the verifier to generate better
code as I illustrated above. This can be implemented in the verifier
itself, with the target passing the additional kernel field size to
the verifier; the target already passes the ctx field size back to
the verifier.
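
To make that concrete, below is a minimal sketch against
convert_ctx_accesses() in kernel/bpf/verifier.c. It assumes a
hypothetical insn_aux_data member, target_field_size, recorded by
is_valid_access() the same way ctx_field_size is recorded today;
the name and the plumbing are illustrative only:

    /* Hypothetical: kernel field size reported by the target via
     * is_valid_access(), like the existing ctx_field_size.
     */
    u32 target_field_size = env->insn_aux_data[i + delta].target_field_size;
    /* Byte offset of the narrow load within the ctx field. */
    u32 field_off = off & (size_default - 1);

    if (is_narrower_load && field_off >= target_field_size) {
            /* The load reads only bytes beyond the backing kernel
             * field, so the result is a constant zero; no load,
             * shift, or mask is needed.
             */
            insn_buf[0] = BPF_MOV32_IMM(insn->dst_reg, 0);
            cnt = 1;
    } else {
            /* Fall through to the existing convert_ctx_access()
             * plus shift-and-mask sequence.
             */
    }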

>       9: (b4) w0 = 1
>    ; if (half[1] == 0xbbbb)
>      10: (56) if w1 != 0xbbbb goto pc+1
>      11: (b4) w0 = 0
>    ; }
>      12: (95) exit
> 
> Fixes: f96da09473b5 ("bpf: simplify narrower ctx access")
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>   kernel/bpf/verifier.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 94cead5a43e5..1c4d0e24a5a2 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -9760,7 +9760,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>   			return -EINVAL;
>   		}
>   
> -		if (is_narrower_load && size < target_size) {
> +		if (is_narrower_load || size < target_size) {
>   			u8 shift = bpf_ctx_narrow_access_offset(
>   				off, size, size_default) * 8;
>   			if (ctx_field_size <= 4) {
> 
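
For reference, the bpf_ctx_narrow_access_offset() helper used in the
diff computes the byte offset of the narrow load within the
default-sized field, taking endianness into account. Paraphrased from
include/linux/filter.h (roughly, not verbatim):

    static inline u32
    bpf_ctx_narrow_access_offset(u32 off, u32 size, u32 size_default)
    {
            u8 access_off = off & (size_default - 1);

    #ifdef __LITTLE_ENDIAN
            return access_off;
    #else
            return size_default - (access_off + size);
    #endif
    }

On little-endian the shift selects the requested bytes from the low
end of the loaded value; on big-endian it counts from the high end
instead.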

Thread overview: 6+ messages
2020-07-10 17:31 [PATCH bpf] bpf: Shift and mask loads narrower than context field size Jakub Sitnicki
2020-07-15  6:44 ` Yonghong Song [this message]
2020-07-15 19:26   ` Jakub Sitnicki
2020-07-15 20:59     ` Yonghong Song
2020-07-16 11:48       ` Jakub Sitnicki
2020-07-16 17:38         ` Yonghong Song
