Re: [bpf PATCH v3] bpf: verifier, do_refine_retval_range may clamp umin to 0 incorrectly

From: Yonghong Song <yhs@fb.com>
To: John Fastabend <john.fastabend@gmail.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>
Subject: Re: [bpf PATCH v3] bpf: verifier, do_refine_retval_range may clamp umin to 0 incorrectly
Date: Fri, 7 Feb 2020 22:23:08 -0800
Message-ID: <3883ca15-5c4b-8065-3698-988bd08ff55d@fb.com>
In-Reply-To: <5e3dccccc0a92_7e1b2b0e9aae25bcec@john-XPS-13-9370.notmuch>

On 2/7/20 12:47 PM, John Fastabend wrote:
> Yonghong Song wrote:
>>
>>
>> On 2/4/20 7:05 PM, John Fastabend wrote:
>>> Yonghong Song wrote:
>>>>
>>>>
>>>> On 2/4/20 11:55 AM, John Fastabend wrote:
>>>>> Alexei Starovoitov wrote:
>>>>>> On Fri, Jan 31, 2020 at 9:16 AM John Fastabend <john.fastabend@gmail.com> wrote:
>>>>>>>
>>>>>>> I also don't mind building a pseudo instruction here for sign extension,
>>>>>>> but it's not clear to me why we are getting different instruction
>>>>>>> selections. Why is sext being chosen in your case?
>>>
>>> [...]
>>>
>>>>>> zext is there in both cases and it will be optimized with your llvm patch.
>>>>>> So please send it. Don't delay :)
>>>>>
>>>>> LLVM patch here, https://urldefense.proofpoint.com/v2/url?u=https-3A__reviews.llvm.org_D73985&d=DwICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=VnK0SKxGnw_yzWjaO-cZFrmlZB9p86L4me-mWE_vDto&s=jwDJuAEdJ23HVcvIILvkfxvTNSe_cgHQFn_MpXssfXc&e=
>>>>>
>>>>> With the updated LLVM I can pass selftests with the above fix and the
>>>>> additional patch below, which gets tighter bounds on 32-bit registers. So
>>>>> going forward I think we need to review the above llvm patch and, assuming
>>>>> it looks good, commit it, and then go forward with this series.
>>>> [...]
>>>> With the above patch, there is still one more issue in test_seg6_loop.o,
>>>> which is related to llvm code generation, w.r.t. our strange 32bit
>>>> packet begin and packet end.
>>>>
>>>> With the patch, the following code is generated:
>>>>
>>>> 2: (61) r1 = *(u32 *)(r6 +76)
>>>> 3: R1_w=pkt(id=0,off=0,r=0,imm=0) R2_w=pkt_end(id=0,off=0,imm=0)
>>>> R6_w=ctx(id=0,off=0,imm=0) R10=fp0
>>>> ; cursor = (void *)(long)skb->data;
>>>> 3: (bc) w8 = w1
>>>> 4: R1_w=pkt(id=0,off=0,r=0,imm=0) R2_w=pkt_end(id=0,off=0,imm=0)
>>>> R6_w=ctx(id=0,off=0,imm=0)
>>>> R8_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
>>>> ; if ((void *)ipver + sizeof(*ipver) > data_end)
>>>> 4: (bf) r3 = r8
>>>>
>>>> In the above, r1 is a packet pointer; after the assignment w8 = w1, r8
>>>> becomes a scalar, which leads to a verification failure later.
>>>>
>>>> Without the patch, we generate:
>>>> 1: R1=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) R10=fp0
>>>> ; data_end = (void *)(long)skb->data_end;
>>>> 1: (61) r1 = *(u32 *)(r6 +80)
>>>> 2: R1_w=pkt_end(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) R10=fp0
>>>> ; cursor = (void *)(long)skb->data;
>>>> 2: (61) r8 = *(u32 *)(r6 +76)
>>>> 3: R1_w=pkt_end(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0)
>>>> R8_w=pkt(id=0,off=0,r=0,imm=0) R10=fp0
>>>> ; if ((void *)ipver + sizeof(*ipver) > data_end)
>>>> 3: (bf) r2 = r8
>>>> 4: R1_w=pkt_end(id=0,off=0,imm=0) R2_w=pkt(id=0,off=0,r=0,imm=0)
>>>> R6_w=ctx(id=0,off=0,imm=0) R8_w=pkt(id=0,off=0,r=0,imm=0) R10=fp0
>>>> 4: (07) r2 += 1
>>>> 5: R1_w=pkt_end(id=0,off=0,imm=0) R2_w=pkt(id=0,off=1,r=0,imm=0)
>>>> R6_w=ctx(id=0,off=0,imm=0) R8_w=pkt(id=0,off=0,r=0,imm=0) R10=fp0
>>>>
>>>> r2 remains a packet pointer, so we don't have issues later.
>>>>
>>>> Not sure how we could fix this in llvm, as llvm has no idea that the
>>>> w1 in w8 = w1 above is a packet pointer.
>>>>
>>>
>>> OK, thanks for the analysis. I have this on my stack as well, but I still
>>> need to check that it's correct,
>>>
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 320e2df..3072dba7 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -2804,8 +2804,11 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
>>>                   reg->umax_value = mask;
>>>           }
>>>           reg->smin_value = reg->umin_value;
>>> -       if (reg->smax_value < 0 || reg->smax_value > reg->umax_value)
>>> +       if (reg->smax_value < 0 || reg->smax_value > reg->umax_value) {
>>>                   reg->smax_value = reg->umax_value;
>>> +       } else {
>>> +               reg->umax_value = reg->smax_value;
>>> +       }
>>>    }
>>>
>>> This helps, but I am still hitting the above issue with the packet pointer,
>>> as you pointed out. I'll sort out how we can fix this. Somewhat related,
>>
>> I just fixed llvm so that it does a better job with zext code gen.
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__reviews.llvm.org_D74101&d=DwICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=H-nAm78-zRumhyRoNeSzy3IaT2e1IQSfD7rq-DYkDUM&s=GwE8T2RDQGwRxrmCTHhipk47qj3aJFyICQrm-cBmS9w&e=
>> This should solve the above issue.
> 
> Great, I applied this but still have one more issue to resolve.

So the one remaining issue is related to coerce_reg_to_size(), right?
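
For reference while discussing it, here is a minimal standalone sketch of the
bounds arithmetic in coerce_reg_to_size() with the else branch from the diff
quoted above applied. It is not the kernel code: struct reg_bounds and
coerce_bounds_to_size() are hypothetical names, var_off tracking is left out,
and the sketch only shows what the arithmetic computes, not whether the
tightened umax is sound for every input.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct bpf_reg_state:
 * only the four scalar bounds, no var_off (tnum) tracking. */
struct reg_bounds {
	uint64_t umin, umax;
	int64_t smin, smax;
};

/* Model of the bounds handling with the quoted else branch applied.
 * Assumes 1 <= size <= 7 so that the shift below is well defined. */
static void coerce_bounds_to_size(struct reg_bounds *reg, int size)
{
	uint64_t mask = ((uint64_t)1 << (size * 8)) - 1;

	if ((reg->umin & ~mask) == (reg->umax & ~mask)) {
		/* High bits agree, so truncation keeps the range intact. */
		reg->umin &= mask;
		reg->umax &= mask;
	} else {
		/* High bits differ: fall back to the full truncated range. */
		reg->umin = 0;
		reg->umax = mask;
	}
	reg->smin = (int64_t)reg->umin;
	if (reg->smax < 0 || (uint64_t)reg->smax > reg->umax)
		reg->smax = (int64_t)reg->umax;
	else
		reg->umax = (uint64_t)reg->smax;	/* the branch added by the diff */
}
```

With size 4 and input bounds umin=0, umax=UINT64_MAX, smin=INT64_MIN,
smax=256, this model ends with umin=0, umax=256, smin=0, smax=256. Note that
the negative part of the input range is dropped here, which seems to be the
kind of case that needs checking.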

> 
>>
>>> we have a similar issue that we hit fairly consistently and that I've been
>>> meaning to sort out, where the cmp happens on a different register than is
>>> used in the call, for example something like this pseudocode:
>>>
>>>      r8 = r2
>>>      if r8 > blah goto +label
>>>      r1 = dest_ptr
>>>      r1 += r2
>>>      r2 = size
>>>      r3 = ptr
>>>      call #some_call
>>>
>>> and the verifier aborts because r8 was verified instead of r2. The
>>> working plan was to walk back in the def-use chain and sort it out
>>> but tbd.
>>
>> I have another llvm patch (not merged yet)
>>     https://urldefense.proofpoint.com/v2/url?u=https-3A__reviews.llvm.org_D72787&d=DwICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=H-nAm78-zRumhyRoNeSzy3IaT2e1IQSfD7rq-DYkDUM&s=yvw4ipXO3Eln_HHZvXPBZS9n-0w2ek5BjKbtG_Q2f4E&e=
>> to undo some llvm optimization so we do not have the above code.
>> But the resulting byte code needs some kernel verifier changes. The
>> following is my previous attempt, which you commented on.
>>
>> https://lore.kernel.org/bpf/20200123191815.1364298-1-yhs@fb.com/T/#m8e3dee022801542ddf15b8e406dc05185f959b4f
>>
>> I think this is better than making the verifier more complex to do
>> backtracking. What do you think?
>>
> 
> In general I think it's nice if llvm can continue to optimize as it
> wants and we can verify it. I was hoping to try this next week and
> see how it falls out after getting the above resolved. If it gets
> too ugly, sure, we can fall back to removing the optimization.

Sounds good. Looking forward to your patch. Thanks!
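
For anyone following along, the r8/r2 mismatch from the pseudocode quoted
above can be modeled in a few lines of C. This is a toy sketch only: the
toy_* names are hypothetical, a single unsigned upper bound stands in for
the full register state, and it assumes no link is kept between a register
and its copy, which is the behavior described in the thread.

```c
#include <stdint.h>

/* Toy register state: only an unsigned upper bound is tracked. */
struct toy_reg {
	uint64_t umax;
};

/* "rD = rS": the copy inherits the source bounds, but the model keeps
 * no link between the two registers afterwards. */
static void toy_mov(struct toy_reg *dst, const struct toy_reg *src)
{
	*dst = *src;
}

/* Fall-through path of "if rX > limit goto label": only rX is refined. */
static void toy_fallthrough_jgt(struct toy_reg *reg, uint64_t limit)
{
	if (reg->umax > limit)
		reg->umax = limit;
}

/* Hypothetical argument check: the offset must be provably in range. */
static int toy_offset_ok(const struct toy_reg *off, uint64_t buf_size)
{
	return off->umax < buf_size;
}
```

Starting from an unbounded r2, copying it to r8 and taking the fall-through
of "if r8 > 63" leaves r8 bounded by 63 while r2 stays unbounded, so a check
such as toy_offset_ok(&r2, 64) fails even though r8 and r2 hold the same
value at run time.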