bpf.vger.kernel.org archive mirror
* [PATCH bpf] bpf: Shift and mask loads narrower than context field size
@ 2020-07-10 17:31 Jakub Sitnicki
  2020-07-15  6:44 ` Yonghong Song
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Sitnicki @ 2020-07-10 17:31 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann

When the size of a load from context is the same as the target field
size, but less than the context field size, the verifier does not emit
the shift and mask instructions for loads at a non-zero offset.

This has the unexpected effect of loading the same data no matter what
the offset is, while the expected behavior would be to load zeros for
offsets greater than the target field size.

For instance, u16 load from u32 context field backed by u16 target field at
an offset of 2 bytes results in:

  SEC("sk_reuseport/narrow_half")
  int reuseport_narrow_half(struct sk_reuseport_md *ctx)
  {
  	__u16 *half;

  	half = (__u16 *)&ctx->ip_protocol;
  	if (half[0] == 0xaaaa)
  		return SK_DROP;
  	if (half[1] == 0xbbbb)
  		return SK_DROP;
  	return SK_PASS;
  }

  int reuseport_narrow_half(struct sk_reuseport_md * ctx):
  ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
     0: (b4) w0 = 0
  ; if (half[0] == 0xaaaa)
     1: (79) r2 = *(u64 *)(r1 +8)
     2: (69) r2 = *(u16 *)(r2 +924)
  ; if (half[0] == 0xaaaa)
     3: (16) if w2 == 0xaaaa goto pc+5
  ; if (half[1] == 0xbbbb)
     4: (79) r1 = *(u64 *)(r1 +8)
     5: (69) r1 = *(u16 *)(r1 +924)
     6: (b4) w0 = 1
  ; if (half[1] == 0xbbbb)
     7: (56) if w1 != 0xbbbb goto pc+1
     8: (b4) w0 = 0
  ; }
     9: (95) exit

In this case half[0] == half[1] == sk->sk_protocol that backs the
ctx->ip_protocol field.

Fix it by shifting and masking any load from context that is narrower than
context field size (is_narrower_load = size < ctx_field_size), in addition
to loads that are narrower than target field size.

The "size < target_size" check is left in place to cover the case when a
context field is narrower than its target field, even if we might not have
such case now. (It would have to be a u32 context field backed by a u64
target field, with context fields all being 4-bytes or wider.)

Going back to the example, with the fix in place, the upper half load from
ctx->ip_protocol yields zero:

  int reuseport_narrow_half(struct sk_reuseport_md * ctx):
  ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
     0: (b4) w0 = 0
  ; if (half[0] == 0xaaaa)
     1: (79) r2 = *(u64 *)(r1 +8)
     2: (69) r2 = *(u16 *)(r2 +924)
     3: (54) w2 &= 65535
  ; if (half[0] == 0xaaaa)
     4: (16) if w2 == 0xaaaa goto pc+7
  ; if (half[1] == 0xbbbb)
     5: (79) r1 = *(u64 *)(r1 +8)
     6: (69) r1 = *(u16 *)(r1 +924)
     7: (74) w1 >>= 16
     8: (54) w1 &= 65535
     9: (b4) w0 = 1
  ; if (half[1] == 0xbbbb)
    10: (56) if w1 != 0xbbbb goto pc+1
    11: (b4) w0 = 0
  ; }
    12: (95) exit

Fixes: f96da09473b5 ("bpf: simplify narrower ctx access")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 kernel/bpf/verifier.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 94cead5a43e5..1c4d0e24a5a2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9760,7 +9760,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			return -EINVAL;
 		}
 
-		if (is_narrower_load && size < target_size) {
+		if (is_narrower_load || size < target_size) {
 			u8 shift = bpf_ctx_narrow_access_offset(
 				off, size, size_default) * 8;
 			if (ctx_field_size <= 4) {
-- 
2.25.4



* Re: [PATCH bpf] bpf: Shift and mask loads narrower than context field size
  2020-07-10 17:31 [PATCH bpf] bpf: Shift and mask loads narrower than context field size Jakub Sitnicki
@ 2020-07-15  6:44 ` Yonghong Song
  2020-07-15 19:26   ` Jakub Sitnicki
  0 siblings, 1 reply; 6+ messages in thread
From: Yonghong Song @ 2020-07-15  6:44 UTC (permalink / raw)
  To: Jakub Sitnicki, bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann



On 7/10/20 10:31 AM, Jakub Sitnicki wrote:
> When size of load from context is the same as target field size, but less
> than context field size, the verifier does not emit the shift and mask
> instructions for loads at non-zero offset.
> 
> This has the unexpected effect of loading the same data no matter what the
> offset was. While the expected behavior would be to load zeros for offsets
> that are greater than target field size.
> 
> For instance, u16 load from u32 context field backed by u16 target field at
> an offset of 2 bytes results in:
> 
>    SEC("sk_reuseport/narrow_half")
>    int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>    {
>    	__u16 *half;
> 
>    	half = (__u16 *)&ctx->ip_protocol;
>    	if (half[0] == 0xaaaa)
>    		return SK_DROP;
>    	if (half[1] == 0xbbbb)
>    		return SK_DROP;
>    	return SK_PASS;
>    }

It would be good if you could include the llvm asm output like below so
people can correlate source => asm => xlated code:

        0:       w0 = 0
        1:       r2 = *(u16 *)(r1 + 24)
        2:       if w2 == 43690 goto +4 <LBB0_3>
        3:       r1 = *(u16 *)(r1 + 26)
        4:       w0 = 1
        5:       if w1 != 48059 goto +1 <LBB0_3>
        6:       w0 = 0

0000000000000038 <LBB0_3>:
        7:       exit

> 
>    int reuseport_narrow_half(struct sk_reuseport_md * ctx):
>    ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>       0: (b4) w0 = 0
>    ; if (half[0] == 0xaaaa)
>       1: (79) r2 = *(u64 *)(r1 +8)
>       2: (69) r2 = *(u16 *)(r2 +924)
>    ; if (half[0] == 0xaaaa)
>       3: (16) if w2 == 0xaaaa goto pc+5
>    ; if (half[1] == 0xbbbb)
>       4: (79) r1 = *(u64 *)(r1 +8)
>       5: (69) r1 = *(u16 *)(r1 +924)
>       6: (b4) w0 = 1
>    ; if (half[1] == 0xbbbb)
>       7: (56) if w1 != 0xbbbb goto pc+1
>       8: (b4) w0 = 0
>    ; }
>       9: (95) exit

Indeed we have an issue here. Insn 5 is not correct.
The original assembly is correct.

Internally, ip_protocol is backed by 2 bytes in sk_reuseport_kern.
The current verifier implementation makes an important assumption:
    all user load requests are within the size of the kernel-internal field.
In this case, the verifier actually only correctly supports
    . one byte from offset 0
    . one byte from offset 1
    . two bytes from offset 0

The original assembly code tries to access 2 bytes from offset 2
and the verifier did an incorrect transformation.

This actually makes sense since any other read is
misleading. For example, for ip_protocol, if someone wants to
load 2 bytes from offset 2, what should we return? 0? In that case,
the verifier can actually convert it to 0 without doing a load.


> In this case half[0] == half[1] == sk->sk_protocol that backs the
> ctx->ip_protocol field.
> 
> Fix it by shifting and masking any load from context that is narrower than
> context field size (is_narrower_load = size < ctx_field_size), in addition
> to loads that are narrower than target field size.

The fix can work around the issue, but I think we should generate
better code for such cases.

> 
> The "size < target_size" check is left in place to cover the case when a
> context field is narrower than its target field, even if we might not have
> such case now. (It would have to be a u32 context field backed by a u64
> target field, with context fields all being 4-bytes or wider.)
> 
> Going back to the example, with the fix in place, the upper half load from
> ctx->ip_protocol yields zero:
> 
>    int reuseport_narrow_half(struct sk_reuseport_md * ctx):
>    ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>       0: (b4) w0 = 0
>    ; if (half[0] == 0xaaaa)
>       1: (79) r2 = *(u64 *)(r1 +8)
>       2: (69) r2 = *(u16 *)(r2 +924)
>       3: (54) w2 &= 65535
>    ; if (half[0] == 0xaaaa)
>       4: (16) if w2 == 0xaaaa goto pc+7
>    ; if (half[1] == 0xbbbb)
>       5: (79) r1 = *(u64 *)(r1 +8)
>       6: (69) r1 = *(u16 *)(r1 +924)

The load is still from offset 0, 2 bytes with upper 48 bits as 0.

>       7: (74) w1 >>= 16

w1 will be 0 now. so this will work.

>       8: (54) w1 &= 65535

For the above insns 5-8, the verifier, based on target information, can
directly generate w1 = 0 since:
   . target kernel field size is 2, ctx field size is 4.
   . user tries to access offset 2, size 2.

Here, we need to decide whether we permit users to do partial reads
beyond the narrow kernel field (e.g., this example). I would say yes,
but Daniel or Alexei can provide additional comments.

If we allow such accesses, I would like the verifier to generate better
code, as illustrated above. This can be implemented in the verifier
itself, with the target passing the additional kernel field size to the
verifier. The target already passes the ctx field size back to the
verifier.

>       9: (b4) w0 = 1
>    ; if (half[1] == 0xbbbb)
>      10: (56) if w1 != 0xbbbb goto pc+1
>      11: (b4) w0 = 0
>    ; }
>      12: (95) exit
> 
> Fixes: f96da09473b5 ("bpf: simplify narrower ctx access")
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>   kernel/bpf/verifier.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 94cead5a43e5..1c4d0e24a5a2 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -9760,7 +9760,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>   			return -EINVAL;
>   		}
>   
> -		if (is_narrower_load && size < target_size) {
> +		if (is_narrower_load || size < target_size) {
>   			u8 shift = bpf_ctx_narrow_access_offset(
>   				off, size, size_default) * 8;
>   			if (ctx_field_size <= 4) {
> 


* Re: [PATCH bpf] bpf: Shift and mask loads narrower than context field size
  2020-07-15  6:44 ` Yonghong Song
@ 2020-07-15 19:26   ` Jakub Sitnicki
  2020-07-15 20:59     ` Yonghong Song
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Sitnicki @ 2020-07-15 19:26 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann

On Wed, Jul 15, 2020 at 08:44 AM CEST, Yonghong Song wrote:
> On 7/10/20 10:31 AM, Jakub Sitnicki wrote:
>> When size of load from context is the same as target field size, but less
>> than context field size, the verifier does not emit the shift and mask
>> instructions for loads at non-zero offset.
>>
>> This has the unexpected effect of loading the same data no matter what the
>> offset was. While the expected behavior would be to load zeros for offsets
>> that are greater than target field size.
>>
>> For instance, u16 load from u32 context field backed by u16 target field at
>> an offset of 2 bytes results in:
>>
>>    SEC("sk_reuseport/narrow_half")
>>    int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>>    {
>>    	__u16 *half;
>>
>>    	half = (__u16 *)&ctx->ip_protocol;
>>    	if (half[0] == 0xaaaa)
>>    		return SK_DROP;
>>    	if (half[1] == 0xbbbb)
>>    		return SK_DROP;
>>    	return SK_PASS;
>>    }
>
> It would be good if you can include llvm asm output like below so people
> can correlate source => asm => xlated codes:
>
>        0:       w0 = 0
>        1:       r2 = *(u16 *)(r1 + 24)
>        2:       if w2 == 43690 goto +4 <LBB0_3>
>        3:       r1 = *(u16 *)(r1 + 26)
>        4:       w0 = 1
>        5:       if w1 != 48059 goto +1 <LBB0_3>
>        6:       w0 = 0
>
> 0000000000000038 <LBB0_3>:
>        7:       exit

Sure, not a problem, if it makes reasoning about the problem easier.
I'm assuming that it is the narrow load at an offset that you wanted to
confirm with asm output.

>
>>
>>    int reuseport_narrow_half(struct sk_reuseport_md * ctx):
>>    ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>>       0: (b4) w0 = 0
>>    ; if (half[0] == 0xaaaa)
>>       1: (79) r2 = *(u64 *)(r1 +8)
>>       2: (69) r2 = *(u16 *)(r2 +924)
>>    ; if (half[0] == 0xaaaa)
>>       3: (16) if w2 == 0xaaaa goto pc+5
>>    ; if (half[1] == 0xbbbb)
>>       4: (79) r1 = *(u64 *)(r1 +8)
>>       5: (69) r1 = *(u16 *)(r1 +924)
>>       6: (b4) w0 = 1
>>    ; if (half[1] == 0xbbbb)
>>       7: (56) if w1 != 0xbbbb goto pc+1
>>       8: (b4) w0 = 0
>>    ; }
>>       9: (95) exit
>
> Indeed we have an issue here. The insn 5 is not correct.
> The original assembly is correct.
>
> Internally ip_protocol is backed by 2 bytes in sk_reuseport_kern.
> The current verifier implementation makes an important assumption:
>    all user load requests are within the size of kernel internal range
> In this case, the verifier actually only correctly supports
>    . one byte from offset 0
>    . one byte from offset 1
>    . two bytes from offset 0

I don't think that's true. For a field that has a target size of 2 bytes,
like ip_protocol, a 1-byte load at any offset is correctly supported
because right shifting and masking take place. That is because we hit
the "size < target_size" condition in this case. Only loads of size >=
target size at an offset != 0 give surprising results.

>
> The original assembly code tries to access 2 bytes from offset 2
> and the verifier did incorrect transformation.
>
> This actually makes sense since any other read is
> misleading. For example, for ip_protocol, if someone wants to
> load 2 bytes from offset 2, what should we return? 0? In that case,
> the verifier can actually convert it to 0 without doing a load.

Yes, IMHO, if you are loading 2 bytes at an offset of 2 from a 4-byte
context field that holds an unsigned value, then it should return 0 for
a field that is backed by a 2-byte kernel field.

I agree that it could be optimized to load an immediate value instead of
performing a load from memory. It didn't occur to me, so thanks for the
suggestion.

>> In this case half[0] == half[1] == sk->sk_protocol that backs the
>> ctx->ip_protocol field.
>>
>> Fix it by shifting and masking any load from context that is narrower than
>> context field size (is_narrower_load = size < ctx_field_size), in addition
>> to loads that are narrower than target field size.
>
> The fix can workaround the issue, but I think we should generate better codes
> for such cases.

Not sure I'd go as far as calling it a workaround. After all, I
understand that in BPF loading a half word into a register is well
defined, and you can rely on the upper word being zero. But please
correct me if not.

You're right, though, that the approach can be smarter here; that is, we
can emit just a single instruction that doesn't access memory.

>> The "size < target_size" check is left in place to cover the case when a
>> context field is narrower than its target field, even if we might not have
>> such case now. (It would have to be a u32 context field backed by a u64
>> target field, with context fields all being 4-bytes or wider.)
>>
>> Going back to the example, with the fix in place, the upper half load from
>> ctx->ip_protocol yields zero:
>>
>>    int reuseport_narrow_half(struct sk_reuseport_md * ctx):
>>    ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>>       0: (b4) w0 = 0
>>    ; if (half[0] == 0xaaaa)
>>       1: (79) r2 = *(u64 *)(r1 +8)
>>       2: (69) r2 = *(u16 *)(r2 +924)
>>       3: (54) w2 &= 65535
>>    ; if (half[0] == 0xaaaa)
>>       4: (16) if w2 == 0xaaaa goto pc+7
>>    ; if (half[1] == 0xbbbb)
>>       5: (79) r1 = *(u64 *)(r1 +8)
>>       6: (69) r1 = *(u16 *)(r1 +924)
>
> The load is still from offset 0, 2 bytes with upper 48 bits as 0.

Yes, this is how narrow loads currently work, right? It is not specific
to the case I'm fixing.

To give an example - if you do a 1-byte load at offset 1, it will load
the value from offset 0 and shift it right by 1 byte. So it is expected
that the load is always from offset 0 with the current implementation.

SEC("sk_reuseport/narrow_byte")
int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
{
	__u8 *byte;

	byte = (__u8 *)&ctx->ip_protocol;
	if (byte[0] == 0xaa)
		return SK_DROP;
	if (byte[1] == 0xbb)
		return SK_DROP;
	if (byte[2] == 0xcc)
		return SK_DROP;
	if (byte[3] == 0xdd)
		return SK_DROP;
	return SK_PASS;
}

int reuseport_narrow_byte(struct sk_reuseport_md * ctx):
; int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
   0: (b4) w0 = 0
; if (byte[0] == 0xaa)
   1: (79) r2 = *(u64 *)(r1 +8)
   2: (69) r2 = *(u16 *)(r2 +924)
   3: (54) w2 &= 255
; if (byte[0] == 0xaa)
   4: (16) if w2 == 0xaa goto pc+17
; if (byte[1] == 0xbb)
   5: (79) r2 = *(u64 *)(r1 +8)
   6: (69) r2 = *(u16 *)(r2 +924)
   7: (74) w2 >>= 8
   8: (54) w2 &= 255
; if (byte[1] == 0xbb)
   9: (16) if w2 == 0xbb goto pc+12
; if (byte[2] == 0xcc)
  10: (79) r2 = *(u64 *)(r1 +8)
  11: (69) r2 = *(u16 *)(r2 +924)
  12: (74) w2 >>= 16
  13: (54) w2 &= 255
; if (byte[2] == 0xcc)
  14: (16) if w2 == 0xcc goto pc+7
; if (byte[3] == 0xdd)
  15: (79) r1 = *(u64 *)(r1 +8)
  16: (69) r1 = *(u16 *)(r1 +924)
  17: (74) w1 >>= 24
  18: (54) w1 &= 255
  19: (b4) w0 = 1
; if (byte[3] == 0xdd)
  20: (56) if w1 != 0xdd goto pc+1
  21: (b4) w0 = 0
; }
  22: (95) exit

>
>>       7: (74) w1 >>= 16
>
> w1 will be 0 now. so this will work.
>
>>       8: (54) w1 &= 65535
>
> For the above insns 5-8, verifier, based on target information can
> directly generate w1 = 0 since:
>   . target kernel field size is 2, ctx field size is 4.
>   . user tries to access offset 2 size 2.
>
> Here, we need to decide whether we permits user to do partial read beyond of
> kernel narrow field or not (e.g., this example)? I would
> say yes, but Daniel or Alexei can provide additional comments.
>
> If we allow such accesses, I would like verifier to generate better
> code as I illustrated in the above. This can be implemented in
> verifier itself with target passing additional kernel field size
> to the verifier. The target already passed the ctx field size back
> to the verifier.

Keep in mind that the BPF user is writing their code under the
assumption that the context field has 4 bytes. IMHO it's reasonable to
expect that I can load 2 bytes at an offset of 2 from a 4-byte field.

Restricting it now to loads below the target field size, which is
unknown to the user, would mean rejecting programs that are working
today, even if they are getting funny values.

I think implementing what you suggest is doable without major
changes. We have the load size, target field size, and context field
size at hand in convert_ctx_accesses(), so it seems like a matter of
adding an 'if' branch to better handle the case when we know the end
result must be 0. I'll give it a try.

But I do want to emphasize that I still think the fix in its current
form is correct, or at least not worse than what we already have in
place for narrow loads.

>
>>       9: (b4) w0 = 1
>>    ; if (half[1] == 0xbbbb)
>>      10: (56) if w1 != 0xbbbb goto pc+1
>>      11: (b4) w0 = 0
>>    ; }
>>      12: (95) exit
>>
>> Fixes: f96da09473b5 ("bpf: simplify narrower ctx access")
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>   kernel/bpf/verifier.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 94cead5a43e5..1c4d0e24a5a2 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -9760,7 +9760,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>>   			return -EINVAL;
>>   		}
>>   -		if (is_narrower_load && size < target_size) {
>> +		if (is_narrower_load || size < target_size) {
>>   			u8 shift = bpf_ctx_narrow_access_offset(
>>   				off, size, size_default) * 8;
>>   			if (ctx_field_size <= 4) {
>>


* Re: [PATCH bpf] bpf: Shift and mask loads narrower than context field size
  2020-07-15 19:26   ` Jakub Sitnicki
@ 2020-07-15 20:59     ` Yonghong Song
  2020-07-16 11:48       ` Jakub Sitnicki
  0 siblings, 1 reply; 6+ messages in thread
From: Yonghong Song @ 2020-07-15 20:59 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann



On 7/15/20 12:26 PM, Jakub Sitnicki wrote:
> On Wed, Jul 15, 2020 at 08:44 AM CEST, Yonghong Song wrote:
>> On 7/10/20 10:31 AM, Jakub Sitnicki wrote:
>>> When size of load from context is the same as target field size, but less
>>> than context field size, the verifier does not emit the shift and mask
>>> instructions for loads at non-zero offset.
>>>
>>> This has the unexpected effect of loading the same data no matter what the
>>> offset was. While the expected behavior would be to load zeros for offsets
>>> that are greater than target field size.
>>>
>>> For instance, u16 load from u32 context field backed by u16 target field at
>>> an offset of 2 bytes results in:
>>>
>>>     SEC("sk_reuseport/narrow_half")
>>>     int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>>>     {
>>>     	__u16 *half;
>>>
>>>     	half = (__u16 *)&ctx->ip_protocol;
>>>     	if (half[0] == 0xaaaa)
>>>     		return SK_DROP;
>>>     	if (half[1] == 0xbbbb)
>>>     		return SK_DROP;
>>>     	return SK_PASS;
>>>     }
>>
>> It would be good if you can include llvm asm output like below so people
>> can correlate source => asm => xlated codes:
>>
>>         0:       w0 = 0
>>         1:       r2 = *(u16 *)(r1 + 24)
>>         2:       if w2 == 43690 goto +4 <LBB0_3>
>>         3:       r1 = *(u16 *)(r1 + 26)
>>         4:       w0 = 1
>>         5:       if w1 != 48059 goto +1 <LBB0_3>
>>         6:       w0 = 0
>>
>> 0000000000000038 <LBB0_3>:
>>         7:       exit
> 
> Sure, not a problem, if it makes reasoning about the problem easier.
> I'm assuming that it is the narrow load at an offset that you wanted to
> confirm with asm output.
> 
>>
>>>
>>>     int reuseport_narrow_half(struct sk_reuseport_md * ctx):
>>>     ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>>>        0: (b4) w0 = 0
>>>     ; if (half[0] == 0xaaaa)
>>>        1: (79) r2 = *(u64 *)(r1 +8)
>>>        2: (69) r2 = *(u16 *)(r2 +924)
>>>     ; if (half[0] == 0xaaaa)
>>>        3: (16) if w2 == 0xaaaa goto pc+5
>>>     ; if (half[1] == 0xbbbb)
>>>        4: (79) r1 = *(u64 *)(r1 +8)
>>>        5: (69) r1 = *(u16 *)(r1 +924)
>>>        6: (b4) w0 = 1
>>>     ; if (half[1] == 0xbbbb)
>>>        7: (56) if w1 != 0xbbbb goto pc+1
>>>        8: (b4) w0 = 0
>>>     ; }
>>>        9: (95) exit
>>
>> Indeed we have an issue here. The insn 5 is not correct.
>> The original assembly is correct.
>>
>> Internally ip_protocol is backed by 2 bytes in sk_reuseport_kern.
>> The current verifier implementation makes an important assumption:
>>     all user load requests are within the size of kernel internal range
>> In this case, the verifier actually only correctly supports
>>     . one byte from offset 0
>>     . one byte from offset 1
>>     . two bytes from offset 0
> 
> I don't think that's true. For a field that has a target size of 2 bytes,
> like ip_protocol, a 1-byte load at any offset is correctly supported
> because right shifting and masking take place. That is because we hit
> the "size < target_size" condition in this case. Only loads of size >=
> target size at an offset != 0 give surprising results.

Okay, yes, you are right. We relied on the fact that even if the insn
loads 2 bytes, we effectively always load 8 bytes for a 64-bit load and
4 bytes for a 32-bit load. For any other one-byte access, shifting and
masking kick in to make it work, although the value is 0.

> 
>>
>> The original assembly code tries to access 2 bytes from offset 2
>> and the verifier did incorrect transformation.
>>
>> This actually makes sense since any other read is
>> misleading. For example, for ip_protocol, if someone wants to
>> load 2 bytes from offset 2, what should we return? 0? In that case,
>> the verifier can actually convert it to 0 without doing a load.
> 
> Yes, IMHO, if you are loading 2 bytes at an offset of 2 from a 4-byte
> context field that holds an unsigned value, then it should return 0 for
> a field that is backed by a 2-byte kernel field.
> 
> I agree that it could be optimized to load an immediate value instead of
> performing a load from memory. It didn't occur to me, so thanks for the
> suggestion.
> 
>>> In this case half[0] == half[1] == sk->sk_protocol that backs the
>>> ctx->ip_protocol field.
>>>
>>> Fix it by shifting and masking any load from context that is narrower than
>>> context field size (is_narrower_load = size < ctx_field_size), in addition
>>> to loads that are narrower than target field size.
>>
>> The fix can workaround the issue, but I think we should generate better codes
>> for such cases.
> 
> Not sure I'd go as far as calling it a workaround. After all I
> understand that in BPF loading a half word into a register is well
> defined, and you can rely on upper word being zero. But please correct
> me if not.

This is true. I should have said that "the fix can fix the issue".

> 
> You're right, though, that approach can be smarter here, that is we can
> emit just a single instruction that doesn't access memory.
> 
>>> The "size < target_size" check is left in place to cover the case when a
>>> context field is narrower than its target field, even if we might not have
>>> such case now. (It would have to be a u32 context field backed by a u64
>>> target field, with context fields all being 4-bytes or wider.)
>>>
>>> Going back to the example, with the fix in place, the upper half load from
>>> ctx->ip_protocol yields zero:
>>>
>>>     int reuseport_narrow_half(struct sk_reuseport_md * ctx):
>>>     ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>>>        0: (b4) w0 = 0
>>>     ; if (half[0] == 0xaaaa)
>>>        1: (79) r2 = *(u64 *)(r1 +8)
>>>        2: (69) r2 = *(u16 *)(r2 +924)
>>>        3: (54) w2 &= 65535
>>>     ; if (half[0] == 0xaaaa)
>>>        4: (16) if w2 == 0xaaaa goto pc+7
>>>     ; if (half[1] == 0xbbbb)
>>>        5: (79) r1 = *(u64 *)(r1 +8)
>>>        6: (69) r1 = *(u16 *)(r1 +924)
>>
>> The load is still from offset 0, 2 bytes with upper 48 bits as 0.
> 
> Yes, this is how narrow loads currently work, right? It is not specific
> to the case I'm fixing.
> 
> To give an example - if you do a 1-byte load at offset 1, it will load
> the value from offset 0, and shift it right by 1 byte. So it is expected
> that the load is always from offset 0 with current implementation.

Yes, the load is always from offset 0. The confusing part is that it
loads 2 bytes from offset 0 and then right-shifts by 2 bytes to get
0...

> 
> SEC("sk_reuseport/narrow_byte")
> int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
> {
> 	__u8 *byte;
> 
> 	byte = (__u8 *)&ctx->ip_protocol;
> 	if (byte[0] == 0xaa)
> 		return SK_DROP;
> 	if (byte[1] == 0xbb)
> 		return SK_DROP;
> 	if (byte[2] == 0xcc)
> 		return SK_DROP;
> 	if (byte[3] == 0xdd)
> 		return SK_DROP;
> 	return SK_PASS;
> }
> 
> int reuseport_narrow_byte(struct sk_reuseport_md * ctx):
> ; int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
>     0: (b4) w0 = 0
> ; if (byte[0] == 0xaa)
>     1: (79) r2 = *(u64 *)(r1 +8)
>     2: (69) r2 = *(u16 *)(r2 +924)
>     3: (54) w2 &= 255
> ; if (byte[0] == 0xaa)
>     4: (16) if w2 == 0xaa goto pc+17
> ; if (byte[1] == 0xbb)
>     5: (79) r2 = *(u64 *)(r1 +8)
>     6: (69) r2 = *(u16 *)(r2 +924)
>     7: (74) w2 >>= 8
>     8: (54) w2 &= 255
> ; if (byte[1] == 0xbb)
>     9: (16) if w2 == 0xbb goto pc+12
> ; if (byte[2] == 0xcc)
>    10: (79) r2 = *(u64 *)(r1 +8)
>    11: (69) r2 = *(u16 *)(r2 +924)
>    12: (74) w2 >>= 16
>    13: (54) w2 &= 255
> ; if (byte[2] == 0xcc)
>    14: (16) if w2 == 0xcc goto pc+7
> ; if (byte[3] == 0xdd)
>    15: (79) r1 = *(u64 *)(r1 +8)
>    16: (69) r1 = *(u16 *)(r1 +924)
>    17: (74) w1 >>= 24
>    18: (54) w1 &= 255
>    19: (b4) w0 = 1
> ; if (byte[3] == 0xdd)
>    20: (56) if w1 != 0xdd goto pc+1
>    21: (b4) w0 = 0
> ; }
>    22: (95) exit
> 
>>
>>>        7: (74) w1 >>= 16
>>
>> w1 will be 0 now. so this will work.
>>
>>>        8: (54) w1 &= 65535
>>
>> For the above insns 5-8, verifier, based on target information can
>> directly generate w1 = 0 since:
>>    . target kernel field size is 2, ctx field size is 4.
>>    . user tries to access offset 2 size 2.
>>
>> Here, we need to decide whether we permits user to do partial read beyond of
>> kernel narrow field or not (e.g., this example)? I would
>> say yes, but Daniel or Alexei can provide additional comments.
>>
>> If we allow such accesses, I would like verifier to generate better
>> code as I illustrated in the above. This can be implemented in
>> verifier itself with target passing additional kernel field size
>> to the verifier. The target already passed the ctx field size back
>> to the verifier.
> 
> Keep in mind that the BPF user is writing their code under the
> assumption that the context field has 4 bytes. IMHO it's reasonable to
> expect that I can load 2 bytes at offset of 2 from a 4 byte field.
> 
> Restricting it now to loads below the target field size, which is
> unknown to the user, would mean rejecting programs that are working
> today. Even if they are getting funny values.
> 
> I think implementing what you suggest is doable without major
> changes. We have load size, target field size, and context field size at
> hand in convert_ctx_accesses(), so it seems like a matter of adding an
> 'if' branch to handle better the case when we know the end result must
> be 0. I'll give it a try.

Sounds good. The target_size is returned by convert_ctx_access(), which
is too late, as the verifier has already generated the load
instructions. You need to get it earlier, in is_valid_access().

> 
> But I do want to emphasize that I still think the fix in its current
> form is correct, or at least not worse than what we already have in
> place for narrow loads.

I did agree that the fix in this patch is correct. It is just that we
could do better to fix this problem.

> 
>>
>>>        9: (b4) w0 = 1
>>>     ; if (half[1] == 0xbbbb)
>>>       10: (56) if w1 != 0xbbbb goto pc+1
>>>       11: (b4) w0 = 0
>>>     ; }
>>>       12: (95) exit
>>>
>>> Fixes: f96da09473b5 ("bpf: simplify narrower ctx access")
>>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>>> ---
>>>    kernel/bpf/verifier.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 94cead5a43e5..1c4d0e24a5a2 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -9760,7 +9760,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>>>    			return -EINVAL;
>>>    		}
>>>    -		if (is_narrower_load && size < target_size) {
>>> +		if (is_narrower_load || size < target_size) {
>>>    			u8 shift = bpf_ctx_narrow_access_offset(
>>>    				off, size, size_default) * 8;
>>>    			if (ctx_field_size <= 4) {
>>>


* Re: [PATCH bpf] bpf: Shift and mask loads narrower than context field size
  2020-07-15 20:59     ` Yonghong Song
@ 2020-07-16 11:48       ` Jakub Sitnicki
  2020-07-16 17:38         ` Yonghong Song
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Sitnicki @ 2020-07-16 11:48 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann

On Wed, Jul 15, 2020 at 10:59 PM CEST, Yonghong Song wrote:
> On 7/15/20 12:26 PM, Jakub Sitnicki wrote:
>> On Wed, Jul 15, 2020 at 08:44 AM CEST, Yonghong Song wrote:
>>> On 7/10/20 10:31 AM, Jakub Sitnicki wrote:

[...]

>>>> The "size < target_size" check is left in place to cover the case when a
>>>> context field is narrower than its target field, even if we might not have
>>>> such case now. (It would have to be a u32 context field backed by a u64
>>>> target field, with context fields all being 4-bytes or wider.)
>>>>
>>>> Going back to the example, with the fix in place, the upper half load from
>>>> ctx->ip_protocol yields zero:
>>>>
>>>>     int reuseport_narrow_half(struct sk_reuseport_md * ctx):
>>>>     ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>>>>        0: (b4) w0 = 0
>>>>     ; if (half[0] == 0xaaaa)
>>>>        1: (79) r2 = *(u64 *)(r1 +8)
>>>>        2: (69) r2 = *(u16 *)(r2 +924)
>>>>        3: (54) w2 &= 65535
>>>>     ; if (half[0] == 0xaaaa)
>>>>        4: (16) if w2 == 0xaaaa goto pc+7
>>>>     ; if (half[1] == 0xbbbb)
>>>>        5: (79) r1 = *(u64 *)(r1 +8)
>>>>        6: (69) r1 = *(u16 *)(r1 +924)
>>>
>>> The load is still from offset 0, 2 bytes with upper 48 bits as 0.
>>
>> Yes, this is how narrow loads currently work, right? It is not specific
>> to the case I'm fixing.
>>
>> To give an example - if you do a 1-byte load at offset 1, it will load
>> the value from offset 0, and shift it right by 1 byte. So it is expected
>> that the load is always from offset 0 with current implementation.
>
> Yes, the load is always from offset 0. The confusing part is that it
> loads 2 bytes from offset 0 and then right-shifts by 2 bytes just to
> get 0...

Right, I see how silly the generated instruction sequence is. I guess
I've accepted how the <prog_type>_convert_ctx_access functions emit loads
and never stopped to question this part before.
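
To make the mechanics concrete, here is a little-endian model in plain C
of the shift-and-mask sequence the verifier emits for a narrow load. The
function name and shape are purely illustrative, not the kernel's actual
code:

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian model of a narrow context load: the emitted load always
 * reads the whole field from offset 0, then shifts and masks down to
 * the requested (off, size) slice.  Illustrative only.
 */
static uint64_t narrow_ctx_load(uint64_t field, uint32_t off,
				uint32_t size, uint32_t size_default)
{
	uint32_t shift = (off & (size_default - 1)) * 8;
	uint64_t mask = (size == 8) ? ~0ULL : ((1ULL << (size * 8)) - 1);

	return (field >> shift) & mask;
}
```

With a u16 target field (say sk->sk_protocol) backing a u32 context
field, any load at an in-field offset >= 2 shifts all the valid bits
away and yields 0, which is exactly the behavior the patch restores.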

>> SEC("sk_reuseport/narrow_byte")
>> int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
>> {
>> 	__u8 *byte;
>>
>> 	byte = (__u8 *)&ctx->ip_protocol;
>> 	if (byte[0] == 0xaa)
>> 		return SK_DROP;
>> 	if (byte[1] == 0xbb)
>> 		return SK_DROP;
>> 	if (byte[2] == 0xcc)
>> 		return SK_DROP;
>> 	if (byte[3] == 0xdd)
>> 		return SK_DROP;
>> 	return SK_PASS;
>> }
>>
>> int reuseport_narrow_byte(struct sk_reuseport_md * ctx):
>> ; int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
>>     0: (b4) w0 = 0
>> ; if (byte[0] == 0xaa)
>>     1: (79) r2 = *(u64 *)(r1 +8)
>>     2: (69) r2 = *(u16 *)(r2 +924)
>>     3: (54) w2 &= 255
>> ; if (byte[0] == 0xaa)
>>     4: (16) if w2 == 0xaa goto pc+17
>> ; if (byte[1] == 0xbb)
>>     5: (79) r2 = *(u64 *)(r1 +8)
>>     6: (69) r2 = *(u16 *)(r2 +924)
>>     7: (74) w2 >>= 8
>>     8: (54) w2 &= 255
>> ; if (byte[1] == 0xbb)
>>     9: (16) if w2 == 0xbb goto pc+12
>> ; if (byte[2] == 0xcc)
>>    10: (79) r2 = *(u64 *)(r1 +8)
>>    11: (69) r2 = *(u16 *)(r2 +924)
>>    12: (74) w2 >>= 16
>>    13: (54) w2 &= 255
>> ; if (byte[2] == 0xcc)
>>    14: (16) if w2 == 0xcc goto pc+7
>> ; if (byte[3] == 0xdd)
>>    15: (79) r1 = *(u64 *)(r1 +8)
>>    16: (69) r1 = *(u16 *)(r1 +924)
>>    17: (74) w1 >>= 24
>>    18: (54) w1 &= 255
>>    19: (b4) w0 = 1
>> ; if (byte[3] == 0xdd)
>>    20: (56) if w1 != 0xdd goto pc+1
>>    21: (b4) w0 = 0
>> ; }
>>    22: (95) exit
>>
>>>
>>>>        7: (74) w1 >>= 16
>>>
>>> w1 will be 0 now. so this will work.
>>>
>>>>        8: (54) w1 &= 65535
>>>
>>> For the above insns 5-8, the verifier, based on target information, can
>>> directly generate w1 = 0 since:
>>>    . target kernel field size is 2, ctx field size is 4.
>>>    . user tries to access offset 2 size 2.
>>>
>>> Here, we need to decide whether we permit the user to do a partial read
>>> beyond the narrow kernel field or not (e.g., this example)? I would
>>> say yes, but Daniel or Alexei can provide additional comments.
>>>
>>> If we allow such accesses, I would like the verifier to generate better
>>> code, as I illustrated above. This can be implemented in the
>>> verifier itself with the target passing the additional kernel field size
>>> to the verifier. The target already passes the ctx field size back
>>> to the verifier.
>>
>> Keep in mind that the BPF user is writing their code under the
>> assumption that the context field has 4 bytes. IMHO it's reasonable to
>> expect that I can load 2 bytes at offset of 2 from a 4 byte field.
>>
>> Restricting it now to loads below the target field size, which is
>> unknown to the user, would mean rejecting programs that are working
>> today. Even if they are getting funny values.
>>
>> I think implementing what you suggest is doable without major
>> changes. We have load size, target field size, and context field size at
>> hand in convert_ctx_accesses(), so it seems like a matter of adding an
>> 'if' branch to handle better the case when we know the end result must
>> be 0. I'll give it a try.
>
> Sounds good. The target_size is returned in convert_ctx_access(), which
> is too late as the verifier already generated load instructions. You need to get
> it earlier in is_valid_access().

I have a feeling that I'm not following what you have in mind.

True, target_size is only known after convert_ctx_access generated
instructions. At this point, if we want to optimize the narrow loads
that must return 0, we can pop however many instructions
convert_ctx_access appended to insn_buf and emit BPF_MOV32/64_IMM.
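
As a sketch of that idea (hypothetical helper, not actual verifier
code): once target_size is known, a narrow load whose in-field offset
lies entirely beyond the little-endian target field can only ever read
the zero padding, so the whole load/shift/mask sequence could collapse
into a single BPF_MOV32_IMM(dst, 0):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: does a narrow ctx load at this offset read
 * only the zero padding beyond the target field?  If so, the verifier
 * could emit a move of immediate 0 instead of load + shift + mask.
 */
static bool narrow_load_reads_zeros(uint32_t off, uint32_t size_default,
				    uint32_t target_size)
{
	return (off & (size_default - 1)) >= target_size;
}
```

For the example above, half[1] and byte[2]/byte[3] of ctx->ip_protocol
(u16 target in a u32 field) would qualify, while half[0] and byte[1]
still overlap the target and need the real load.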

However, it sounds a bit more complex than what I hoped for initially,
so I'm starting to doubt the value. Narrow loads at an offset that
matches or exceeds the target field size must be a corner case, given
that the current "broken" behavior went unnoticed so far.

I'll need to play with the code and see how it turns out. But for the
moment please consider acking/nacking this one, as a simple way to fix
the issue targeted at 'bpf' branch and stable kernels.

>
>>
>> But I do want to emphasize that I still think the fix in its current form
>> is correct, or at least not worse than what we already have in place for
>> narrow loads.
>
> I did agree that the fix in this patch is correct. It is just that we
> could do better to fix this problem.

I agree with your sentiment. Sorry if I got too defensive there.

>
>>
>>>
>>>>        9: (b4) w0 = 1
>>>>     ; if (half[1] == 0xbbbb)
>>>>       10: (56) if w1 != 0xbbbb goto pc+1
>>>>       11: (b4) w0 = 0
>>>>     ; }
>>>>       12: (95) exit
>>>>
>>>> Fixes: f96da09473b5 ("bpf: simplify narrower ctx access")
>>>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>>>> ---
>>>>    kernel/bpf/verifier.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>> index 94cead5a43e5..1c4d0e24a5a2 100644
>>>> --- a/kernel/bpf/verifier.c
>>>> +++ b/kernel/bpf/verifier.c
>>>> @@ -9760,7 +9760,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>>>>    			return -EINVAL;
>>>>    		}
>>>>    -		if (is_narrower_load && size < target_size) {
>>>> +		if (is_narrower_load || size < target_size) {
>>>>    			u8 shift = bpf_ctx_narrow_access_offset(
>>>>    				off, size, size_default) * 8;
>>>>    			if (ctx_field_size <= 4) {
>>>>


* Re: [PATCH bpf] bpf: Shift and mask loads narrower than context field size
  2020-07-16 11:48       ` Jakub Sitnicki
@ 2020-07-16 17:38         ` Yonghong Song
  0 siblings, 0 replies; 6+ messages in thread
From: Yonghong Song @ 2020-07-16 17:38 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann



On 7/16/20 4:48 AM, Jakub Sitnicki wrote:
> On Wed, Jul 15, 2020 at 10:59 PM CEST, Yonghong Song wrote:
>> On 7/15/20 12:26 PM, Jakub Sitnicki wrote:
>>> On Wed, Jul 15, 2020 at 08:44 AM CEST, Yonghong Song wrote:
>>>> On 7/10/20 10:31 AM, Jakub Sitnicki wrote:
> 
> [...]
> 
>>>>> The "size < target_size" check is left in place to cover the case when a
>>>>> context field is narrower than its target field, even if we might not have
>>>>> such case now. (It would have to be a u32 context field backed by a u64
>>>>> target field, with context fields all being 4-bytes or wider.)
>>>>>
>>>>> Going back to the example, with the fix in place, the upper half load from
>>>>> ctx->ip_protocol yields zero:
>>>>>
>>>>>      int reuseport_narrow_half(struct sk_reuseport_md * ctx):
>>>>>      ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>>>>>         0: (b4) w0 = 0
>>>>>      ; if (half[0] == 0xaaaa)
>>>>>         1: (79) r2 = *(u64 *)(r1 +8)
>>>>>         2: (69) r2 = *(u16 *)(r2 +924)
>>>>>         3: (54) w2 &= 65535
>>>>>      ; if (half[0] == 0xaaaa)
>>>>>         4: (16) if w2 == 0xaaaa goto pc+7
>>>>>      ; if (half[1] == 0xbbbb)
>>>>>         5: (79) r1 = *(u64 *)(r1 +8)
>>>>>         6: (69) r1 = *(u16 *)(r1 +924)
>>>>
>>>> The load is still from offset 0, 2 bytes with upper 48 bits as 0.
>>>
>>> Yes, this is how narrow loads currently work, right? It is not specific
>>> to the case I'm fixing.
>>>
>>> To give an example - if you do a 1-byte load at offset 1, it will load
>>> the value from offset 0, and shift it right by 1 byte. So it is expected
>>> that the load is always from offset 0 with current implementation.
>>
>> Yes, the load is always from offset 0. The confusing part is that it
>> loads 2 bytes from offset 0 and then right-shifts by 2 bytes just to
>> get 0...
> 
> Right, I see how silly the generated instruction sequence is. I guess
> I've accepted how the <prog_type>_convert_ctx_access functions emit loads
> and never stopped to question this part before.
> 
>>> SEC("sk_reuseport/narrow_byte")
>>> int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
>>> {
>>> 	__u8 *byte;
>>>
>>> 	byte = (__u8 *)&ctx->ip_protocol;
>>> 	if (byte[0] == 0xaa)
>>> 		return SK_DROP;
>>> 	if (byte[1] == 0xbb)
>>> 		return SK_DROP;
>>> 	if (byte[2] == 0xcc)
>>> 		return SK_DROP;
>>> 	if (byte[3] == 0xdd)
>>> 		return SK_DROP;
>>> 	return SK_PASS;
>>> }
>>>
>>> int reuseport_narrow_byte(struct sk_reuseport_md * ctx):
>>> ; int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
>>>      0: (b4) w0 = 0
>>> ; if (byte[0] == 0xaa)
>>>      1: (79) r2 = *(u64 *)(r1 +8)
>>>      2: (69) r2 = *(u16 *)(r2 +924)
>>>      3: (54) w2 &= 255
>>> ; if (byte[0] == 0xaa)
>>>      4: (16) if w2 == 0xaa goto pc+17
>>> ; if (byte[1] == 0xbb)
>>>      5: (79) r2 = *(u64 *)(r1 +8)
>>>      6: (69) r2 = *(u16 *)(r2 +924)
>>>      7: (74) w2 >>= 8
>>>      8: (54) w2 &= 255
>>> ; if (byte[1] == 0xbb)
>>>      9: (16) if w2 == 0xbb goto pc+12
>>> ; if (byte[2] == 0xcc)
>>>     10: (79) r2 = *(u64 *)(r1 +8)
>>>     11: (69) r2 = *(u16 *)(r2 +924)
>>>     12: (74) w2 >>= 16
>>>     13: (54) w2 &= 255
>>> ; if (byte[2] == 0xcc)
>>>     14: (16) if w2 == 0xcc goto pc+7
>>> ; if (byte[3] == 0xdd)
>>>     15: (79) r1 = *(u64 *)(r1 +8)
>>>     16: (69) r1 = *(u16 *)(r1 +924)
>>>     17: (74) w1 >>= 24
>>>     18: (54) w1 &= 255
>>>     19: (b4) w0 = 1
>>> ; if (byte[3] == 0xdd)
>>>     20: (56) if w1 != 0xdd goto pc+1
>>>     21: (b4) w0 = 0
>>> ; }
>>>     22: (95) exit
>>>
>>>>
>>>>>         7: (74) w1 >>= 16
>>>>
>>>> w1 will be 0 now. so this will work.
>>>>
>>>>>         8: (54) w1 &= 65535
>>>>
>>>> For the above insns 5-8, the verifier, based on target information, can
>>>> directly generate w1 = 0 since:
>>>>     . target kernel field size is 2, ctx field size is 4.
>>>>     . user tries to access offset 2 size 2.
>>>>
>>>> Here, we need to decide whether we permit the user to do a partial read
>>>> beyond the narrow kernel field or not (e.g., this example)? I would
>>>> say yes, but Daniel or Alexei can provide additional comments.
>>>>
>>>> If we allow such accesses, I would like the verifier to generate better
>>>> code, as I illustrated above. This can be implemented in the
>>>> verifier itself with the target passing the additional kernel field size
>>>> to the verifier. The target already passes the ctx field size back
>>>> to the verifier.
>>>
>>> Keep in mind that the BPF user is writing their code under the
>>> assumption that the context field has 4 bytes. IMHO it's reasonable to
>>> expect that I can load 2 bytes at offset of 2 from a 4 byte field.
>>>
>>> Restricting it now to loads below the target field size, which is
>>> unknown to the user, would mean rejecting programs that are working
>>> today. Even if they are getting funny values.
>>>
>>> I think implementing what you suggest is doable without major
>>> changes. We have load size, target field size, and context field size at
>>> hand in convert_ctx_accesses(), so it seems like a matter of adding an
>>> 'if' branch to handle better the case when we know the end result must
>>> be 0. I'll give it a try.
>>
>> Sounds good. The target_size is returned in convert_ctx_access(), which
>> is too late as the verifier already generated load instructions. You need to get
>> it earlier in is_valid_access().
> 
> I have a feeling that I'm not following what you have in mind.
> 
> True, target_size is only known after convert_ctx_access generated
> instructions. At this point, if we want to optimize the narrow loads
> that must return 0, we can pop however many instructions
> convert_ctx_access appended to insn_buf and emit BPF_MOV32/64_IMM.
> 
> However, it sounds a bit more complex than what I hoped for initially,
> so I'm starting to doubt the value. Narrow loads at an offset that
> matches or exceeds the target field size must be a corner case, given
> that the current "broken" behavior went unnoticed so far.
> 
> I'll need to play with the code and see how it turns out. But for the
> moment please consider acking/nacking this one, as a simple way to fix
> the issue targeted at 'bpf' branch and stable kernels.

Ack for the current patch as it does fix the problem. See the comments
below with a slight change to avoid penalizing the existing common case
like:
    __u16 proto = ctx->ip_protocol;

> 
>>
>>>
>>> But I do want to emphasize that I still think the fix in its current form
>>> is correct, or at least not worse than what we already have in place for
>>> narrow loads.
>>
>> I did agree that the fix in this patch is correct. It is just that we
>> could do better to fix this problem.
> 
> I agree with your sentiment. Sorry if I got too defensive there.
> 
>>
>>>
>>>>
>>>>>         9: (b4) w0 = 1
>>>>>      ; if (half[1] == 0xbbbb)
>>>>>        10: (56) if w1 != 0xbbbb goto pc+1
>>>>>        11: (b4) w0 = 0
>>>>>      ; }
>>>>>        12: (95) exit
>>>>>
>>>>> Fixes: f96da09473b5 ("bpf: simplify narrower ctx access")
>>>>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>>>>> ---
>>>>>     kernel/bpf/verifier.c | 2 +-
>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>>> index 94cead5a43e5..1c4d0e24a5a2 100644
>>>>> --- a/kernel/bpf/verifier.c
>>>>> +++ b/kernel/bpf/verifier.c
>>>>> @@ -9760,7 +9760,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>>>>>     			return -EINVAL;
>>>>>     		}
>>>>>     -		if (is_narrower_load && size < target_size) {
>>>>> +		if (is_narrower_load || size < target_size) {

Maybe:

                 if (is_narrower_load &&
                     (size < target_size ||
                      (off & (size_default - 1)) != 0)) {

With the original patch, any narrow load will do the shift/mask, which
is not good. For a narrow load, we only need to shift/mask if
   - size < target_size, or
   - the off is not on the field boundary.

I still prefer better xlated code. But if it is too complex, the above
change is also acceptable. I just do not like the generated xlated byte
code; it is very easy for people to get confused.
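
Modeled side by side (illustrative C, not verifier code), the patch's
condition and the refined one differ only for the aligned load of
exactly target-field size, which is the common
`__u16 proto = ctx->ip_protocol;` case:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* When does each variant emit the shift/mask sequence for a ctx load?
 * Mirrors the expressions discussed above; size_default is the default
 * ctx access size (typically 4).
 */
static bool patch_cond(uint32_t size, uint32_t ctx_field_size,
		       uint32_t target_size)
{
	bool is_narrower_load = size < ctx_field_size;

	return is_narrower_load || size < target_size;
}

static bool refined_cond(uint32_t off, uint32_t size, uint32_t size_default,
			 uint32_t ctx_field_size, uint32_t target_size)
{
	bool is_narrower_load = size < ctx_field_size;

	return is_narrower_load &&
	       (size < target_size || (off & (size_default - 1)) != 0);
}
```

For the aligned u16 load of a u16-backed u32 field the patch still
emits a redundant `w2 &= 65535`, while the refined condition skips it;
both still fix the off-field half[1] load.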


>>>>>     			u8 shift = bpf_ctx_narrow_access_offset(
>>>>>     				off, size, size_default) * 8;
>>>>>     			if (ctx_field_size <= 4) {
>>>>>


end of thread, other threads:[~2020-07-16 17:38 UTC | newest]

Thread overview: 6+ messages
2020-07-10 17:31 [PATCH bpf] bpf: Shift and mask loads narrower than context field size Jakub Sitnicki
2020-07-15  6:44 ` Yonghong Song
2020-07-15 19:26   ` Jakub Sitnicki
2020-07-15 20:59     ` Yonghong Song
2020-07-16 11:48       ` Jakub Sitnicki
2020-07-16 17:38         ` Yonghong Song
