From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Cupertino Miranda <cupertino.miranda@oracle.com>
Cc: bpf@vger.kernel.org, Yonghong Song <yonghong.song@linux.dev>,
	 Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	David Faust <david.faust@oracle.com>,
	 Jose Marchesi <jose.marchesi@oracle.com>,
	Elena Zannoni <elena.zannoni@oracle.com>
Subject: Re: [PATCH bpf-next v3 2/6] bpf/verifier: refactor checks for range computation
Date: Fri, 26 Apr 2024 09:11:52 -0700
Message-ID: <CAEf4BzazPWOgXFco=PJnGEAaJgjr2MG12=3Sr3=9gMckwTSDLg@mail.gmail.com>
In-Reply-To: <87edasmnlr.fsf@oracle.com>

On Fri, Apr 26, 2024 at 3:20 AM Cupertino Miranda
<cupertino.miranda@oracle.com> wrote:
>
>
> Andrii Nakryiko writes:
>
> > On Wed, Apr 24, 2024 at 3:41 PM Cupertino Miranda
> > <cupertino.miranda@oracle.com> wrote:
> >>
> >> Split range computation checks into their own function, isolating the
> >> pessimistic range set for dst_reg and the failing return to a single point.
> >>
> >> Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
> >> Cc: Yonghong Song <yonghong.song@linux.dev>
> >> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> >> Cc: David Faust <david.faust@oracle.com>
> >> Cc: Jose Marchesi <jose.marchesi@oracle.com>
> >> Cc: Elena Zannoni <elena.zannoni@oracle.com>
> >> ---
> >>  kernel/bpf/verifier.c | 141 +++++++++++++++++++++++-------------------
> >>  1 file changed, 77 insertions(+), 64 deletions(-)
> >>
> >
> > I know you are moving around pre-existing code, so a bunch of nits
> > below are to pre-existing code, but let's use this as an opportunity
> > to clean it up a bit.
> >
> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >> index 6fe641c8ae33..829a12d263a5 100644
> >> --- a/kernel/bpf/verifier.c
> >> +++ b/kernel/bpf/verifier.c
> >> @@ -13695,6 +13695,82 @@ static void scalar_min_max_arsh(struct bpf_reg_state *dst_reg,
> >>         __update_reg_bounds(dst_reg);
> >>  }
> >>
> >> +static bool is_const_reg_and_valid(struct bpf_reg_state reg, bool alu32,
> >
> > hm.. why passing reg_state by value? Use pointer?
> >
> Someone mentioned this in a review already and I forgot to change it.
> Apologies if I did not reply on this.
>
> The reason I pass by value is more of an approach to programming.
> I do it as a guarantee to the caller that there is no mutation of
> the value.
> Whether it is better or worse from a performance point of view is
> arguable: although it might appear to copy the value, it also gives
> the compiler more information about the intent of the callee function,
> allowing it to optimize further.
> I personally would leave the copy by value, but I understand if you want
> to keep having the same code style.

It's a pretty big 120-byte structure. Maybe the compiler can optimize
the copy away, but I'd still be concerned. Hopefully it can optimize
just as well with a (const) pointer when the function is inlined.

But I do insist: if you look at (most? I haven't checked every single
function, of course) other uses in verifier.c, we pass things like
that by pointer. I understand the desire to state the intent not to
modify it, but that's what passing `const struct bpf_reg_state *reg`
is for, so I think you don't lose anything with that.
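
To make the comparison concrete, here is a minimal sketch. The struct
and both helpers are made-up stand-ins for illustration (only the
64-bit bounds are kept), not the verifier's actual definitions. The
const pointer variant states the same no-mutation intent without
copying the struct at the call site:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bpf_reg_state; only 64-bit bounds kept. */
struct reg_bounds_sketch {
	int64_t smin_value;
	int64_t smax_value;
	uint64_t umin_value;
	uint64_t umax_value;
};

/* By value: the callee works on its own copy, so the caller's struct is
 * untouched, but the copy has to be made unless the compiler elides it.
 */
static bool bounds_sane_by_value(struct reg_bounds_sketch reg)
{
	return reg.smin_value <= reg.smax_value &&
	       reg.umin_value <= reg.umax_value;
}

/* By const pointer: nothing is copied, and a write through reg does not
 * compile.
 */
static bool bounds_sane_by_ptr(const struct reg_bounds_sketch *reg)
{
	return reg->smin_value <= reg->smax_value &&
	       reg->umin_value <= reg->umax_value;
}

/* Usage: both variants give the same answer for the same input. */
static void demo_bounds(void)
{
	struct reg_bounds_sketch r = {
		.smin_value = 0, .smax_value = 10,
		.umin_value = 0, .umax_value = 10,
	};

	assert(bounds_sane_by_value(r) == bounds_sane_by_ptr(&r));
}
```

An assignment such as reg->smin_value = 0 inside the pointer variant is
rejected at compile time, which gives the caller much the same
guarantee as the by-value version.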

>
>
> >> +                                  bool *valid)
> >> +{
> >> +       s64 smin_val = reg.smin_value;
> >> +       s64 smax_val = reg.smax_value;
> >> +       u64 umin_val = reg.umin_value;
> >> +       u64 umax_val = reg.umax_value;
> >> +
> >
> > don't add empty line between variable declarations, all variables
> > should be in a single continuous block
> >
> >> +       s32 s32_min_val = reg.s32_min_value;
> >> +       s32 s32_max_val = reg.s32_max_value;
> >> +       u32 u32_min_val = reg.u32_min_value;
> >> +       u32 u32_max_val = reg.u32_max_value;
> >> +
> >
> > but see below, I'm not sure we even need these local variables, they
> > don't save all that much typing
> >
> >> +       bool known = alu32 ? tnum_subreg_is_const(reg.var_off) :
> >> +                            tnum_is_const(reg.var_off);
> >
> > "known" is a misnomer, imo. It's `is_const`.
> >
> >> +
> >> +       if (alu32) {
> >> +               if ((known &&
> >> +                    (s32_min_val != s32_max_val || u32_min_val != u32_max_val)) ||
> >> +                     s32_min_val > s32_max_val || u32_min_val > u32_max_val)
> >> +                       *valid = false;
> >> +       } else {
> >> +               if ((known &&
> >> +                    (smin_val != smax_val || umin_val != umax_val)) ||
> >> +                   smin_val > smax_val || umin_val > umax_val)
> >> +                       *valid = false;
> >> +       }
> >> +
> >> +       return known;
> >
> >
> > The above is really hard to follow, especially how known && !known
> > cases are being handled is very easy to misinterpret. How about we
> > rewrite the equivalent logic in a few steps:
> >
> > if (alu32) {
> >     if (tnum_subreg_is_const(reg->var_off)) {
> >         return reg->s32_min_value == reg->s32_max_value &&
> >                reg->u32_min_value == reg->u32_max_value;
> >     } else {
> >         return reg->s32_min_value <= reg->s32_max_value &&
> >                reg->u32_min_value <= reg->u32_max_value;
> >     }
> > } else {
> >    /* same as above for 64-bit bounds */
> > }
> >
> > And you don't even need any local variables, while all the important
> > conditions are a bit easier to follow? Or is it just me?
> >
>
> With current state of the code, indeed, it seems you don't need the extra
> valid argument to pass the extra information.
> Considering that the KNOWN value is now only used in the shift
> operators, it seems now safe to merge both valid and the return value
> from the function, since the logic will result in the same behaviour.
>

cool, let's do it then
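
For reference, the rewrite suggested above could be spelled out roughly
as follows, with the 64-bit branch filled in by analogy. This is only a
sketch: struct tnum_sketch, struct reg_sketch and the two tnum helpers
are simplified stand-ins written for this example (a tnum is treated as
constant when its mask has no unknown bits), and reg_bounds_consistent
is a hypothetical name, not the function in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for struct tnum and struct bpf_reg_state. */
struct tnum_sketch {
	uint64_t value;
	uint64_t mask;	/* set bits are unknown */
};

struct reg_sketch {
	struct tnum_sketch var_off;
	int64_t smin_value, smax_value;
	uint64_t umin_value, umax_value;
	int32_t s32_min_value, s32_max_value;
	uint32_t u32_min_value, u32_max_value;
};

/* Constant when no bit is unknown. */
static bool sketch_tnum_is_const(struct tnum_sketch a)
{
	return a.mask == 0;
}

/* Constant when no bit of the low 32 bits is unknown. */
static bool sketch_tnum_subreg_is_const(struct tnum_sketch a)
{
	return (a.mask & 0xffffffffULL) == 0;
}

/* Constant register: min and max must coincide.
 * Non-constant register: min must not exceed max.
 */
static bool reg_bounds_consistent(const struct reg_sketch *reg, bool alu32)
{
	if (alu32) {
		if (sketch_tnum_subreg_is_const(reg->var_off))
			return reg->s32_min_value == reg->s32_max_value &&
			       reg->u32_min_value == reg->u32_max_value;
		return reg->s32_min_value <= reg->s32_max_value &&
		       reg->u32_min_value <= reg->u32_max_value;
	}

	if (sketch_tnum_is_const(reg->var_off))
		return reg->smin_value == reg->smax_value &&
		       reg->umin_value == reg->umax_value;
	return reg->smin_value <= reg->smax_value &&
	       reg->umin_value <= reg->umax_value;
}

/* Usage: a constant 5 with all bounds equal to 5 is consistent. */
static void demo_consistent(void)
{
	struct reg_sketch r = {
		.var_off = { .value = 5, .mask = 0 },
		.smin_value = 5, .smax_value = 5,
		.umin_value = 5, .umax_value = 5,
		.s32_min_value = 5, .s32_max_value = 5,
		.u32_min_value = 5, .u32_max_value = 5,
	};

	assert(reg_bounds_consistent(&r, true));
	assert(reg_bounds_consistent(&r, false));
}
```

Note that this returns a single consistency answer; how the caller
still learns whether the register is constant (which the shift cases
need) is not settled by this sketch.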

> >> +}
> >> +
> >> +static bool is_safe_to_compute_dst_reg_range(struct bpf_insn *insn,
> >> +                                            struct bpf_reg_state src_reg)
> >> +{
> >> +       bool src_known;
> >> +       u64 insn_bitness = (BPF_CLASS(insn->code) == BPF_ALU64) ? 64 : 32;
> >
> > whole u64 for this seems like an overkill, I'd just stick to `int`
> >
> >> +       bool alu32 = (BPF_CLASS(insn->code) != BPF_ALU64);
> >
> > insn_bitness == 32 ?
> >
> >> +       u8 opcode = BPF_OP(insn->code);
> >> +
> >
> > nit: don't split variables block with empty line
> >
> >> +       bool valid_known = true;
> >
> > need an empty line between variable declarations and the rest
> >
> >> +       src_known = is_const_reg_and_valid(src_reg, alu32, &valid_known);
> >> +
> >> +       /* Taint dst register if offset had invalid bounds
> >> +        * derived from e.g. dead branches.
> >> +        */
> >> +       if (valid_known == false)
> >
> > nit: !valid_known
> >
> >> +               return false;
> >
> > given this logic/handling, why not just return false from
> > is_const_reg_and_valid() if it's a constant, but it's not
> > valid/consistent? It's simpler and fits the logic and function's name,
> > no? See my suggestion above
> >
> >> +
> >> +       switch (opcode) {
> >
> > inline opcode variable here, you use it just once
> >
> >> +       case BPF_ADD:
> >> +       case BPF_SUB:
> >> +       case BPF_AND:
> >> +               return true;
> >> +
> >> +       /* Compute range for the following only if the src_reg is known.
> >> +        */
> >> +       case BPF_XOR:
> >> +       case BPF_OR:
> >> +       case BPF_MUL:
> >> +               return src_known;
> >> +
> >> +       /* Shift operators range is only computable if shift dimension operand
> >> +        * is known. Also, shifts greater than 31 or 63 are undefined. This
> >> +        * includes shifts by a negative number.
> >> +        */
> >> +       case BPF_LSH:
> >> +       case BPF_RSH:
> >> +       case BPF_ARSH:
> >
> > preserve original comment?
> >
> >> -                       /* Shifts greater than 31 or 63 are undefined.
> >> -                        * This includes shifts by a negative number.
> >> -                        */
> >
> >> +               return (src_known && src_reg.umax_value < insn_bitness);
> >
> > nit: unnecessary ()
> >
> >> +       default:
> >> +               break;
> >
> > return false here, and drop return below
> >
> >> +       }
> >> +
> >> +       return false;
> >> +}
> >> +
> >>  /* WARNING: This function does calculations on 64-bit values, but the actual
> >>   * execution may occur on 32-bit values. Therefore, things like bitshifts
> >>   * need extra checks in the 32-bit case.
> >
> > [...]
>
> Apart from the obvious coding style problems, I will address those
> optimizations in an independent patch at the end, if you agree. I would
> prefer to separate the improvements to avoid changing semantics in the
> refactoring patch, as previously requested by Yonghong.

sure
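
Whenever the style nits on the opcode switch are picked up, the result
could look something like the sketch below: `int` for the bitness, the
opcode inlined into the switch, no extra parentheses, and `return
false` in the default case. To keep it self-contained, the SK_* macros
mirror the BPF opcode encoding under different names, and the function
takes an already computed src_is_const flag and src_umax instead of a
register state. That signature is an assumption of this example, not
what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the BPF opcode encoding, renamed to avoid clashes. */
#define SK_BPF_CLASS(code)	((code) & 0x07)
#define SK_BPF_OP(code)		((code) & 0xf0)
#define SK_BPF_ALU		0x04
#define SK_BPF_ALU64		0x07
#define SK_BPF_ADD		0x00
#define SK_BPF_SUB		0x10
#define SK_BPF_MUL		0x20
#define SK_BPF_DIV		0x30
#define SK_BPF_OR		0x40
#define SK_BPF_AND		0x50
#define SK_BPF_LSH		0x60
#define SK_BPF_RSH		0x70
#define SK_BPF_XOR		0xa0
#define SK_BPF_ARSH		0xc0

/* Hypothetical helper: can a range be computed for the destination? */
static bool sketch_safe_to_compute_range(uint8_t code, bool src_is_const,
					 uint64_t src_umax)
{
	int insn_bitness = SK_BPF_CLASS(code) == SK_BPF_ALU64 ? 64 : 32;

	switch (SK_BPF_OP(code)) {
	case SK_BPF_ADD:
	case SK_BPF_SUB:
	case SK_BPF_AND:
		return true;

	/* Compute range for the following only if the source is constant. */
	case SK_BPF_XOR:
	case SK_BPF_OR:
	case SK_BPF_MUL:
		return src_is_const;

	/* Shifts greater than 31 or 63 are undefined.
	 * This includes shifts by a negative number.
	 */
	case SK_BPF_LSH:
	case SK_BPF_RSH:
	case SK_BPF_ARSH:
		return src_is_const && src_umax < (uint64_t)insn_bitness;
	default:
		return false;
	}
}

/* Usage: a 32-bit shift by 31 is fine, a shift by 32 is not. */
static void demo_safe_range(void)
{
	assert(sketch_safe_to_compute_range(SK_BPF_ALU | SK_BPF_LSH, true, 31));
	assert(!sketch_safe_to_compute_range(SK_BPF_ALU | SK_BPF_LSH, true, 32));
}
```

A negative shift amount shows up as a huge unsigned maximum, so the
single unsigned comparison against the bitness rejects it as well.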
