Re: Verifier rejects previously accepted program

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Lorenz Bauer <lmb@cloudflare.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	kernel-team <kernel-team@cloudflare.com>,
	bpf <bpf@vger.kernel.org>,
	regressions@lists.linux.dev, Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>
Subject: Re: Verifier rejects previously accepted program
Date: Tue, 9 Nov 2021 20:25:30 -0800	[thread overview]
Message-ID: <20211110042530.6ye65mpspre7au5f@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <CACAyw99KGdTAz+G3aU8G3eqC926YYpgD57q-A+NFNVqqiJPY3g@mail.gmail.com>

On Mon, Nov 08, 2021 at 01:21:12PM +0000, Lorenz Bauer wrote:
> On Fri, 5 Nov 2021 at 19:49, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Nov 05, 2021 at 10:41:40AM +0000, Lorenz Bauer wrote:
> > >
> > > bpf-next with f30d4968e9ae on top:
> > >
> > >     works!
> >
> > Awesome.
> >
> > > commit 3e8ce29850f1 ("bpf: Prevent pointer mismatch in
> > > bpf_timer_init.") (found via bisection):
> > >
> > >     BPF program is too large. Processed 1000001 insn
> > >
> > > commit 3e8ce29850f1^ ("bpf: Add map side support for bpf timers."):
> > >
> > >    works!
> >
> > So with just 3e8ce29850f1 it's "too large" and with parent commit it works ?
> > I've analyzed offending commit again and don't see how it can be causing
> > state pruning to be more conservative for your asm.
> > reg->map_uid should only be non-zero for lookups from inner maps,
> > but your asm doesn't have lookups at all in that loop.
> 
> I misattributed the problem to the loop, since it was really prominent
> in the verifier output. We use nested maps extensively, most likely
> those are what's causing the problem.
> 
> > Maybe in some case map_uid doesn't get cleared, but I couldn't find
> > such code path with manual code analysis.
> > I think it's worth investigating further.
> > Please craft a reproducer.
> 
> I've started with some verifier log analysis to narrow the problem down.
> 
> * Same test case as before
> * Dump verifier output with log_level=2 for both 3e8ce29850f1 and 3e8ce29850f1^
> * Use diff to find the first non-matching line
> 
> 3e8ce29850f1 makes the verifier do a lot more work on our code. Some
> later commit then drops the complexity below what the verifier will
> accept, probably the more precise scalar spill tracking.
> 
> 3e8ce29850f1^:                  295498 insns
> 3e8ce29850f1:                > 1000000 insns
> be2f2d1680df + bd479d103883:    450161 insns
> 
> Trace from 3e8ce29850f1^ (working):
> 
> 1033: R0=map_value(id=0,off=0,ks=4,vs=36,imm=0) R1_w=invP0
> R3_w=map_value(id=0,off=0,ks=4,vs=36,imm=0) R6=ctx(id=0,off=0,imm=0)
> R7=inv(id=0) R8=pkt(id=0,off=18,r=38,imm=0) R9=inv0 R10=fp0
> fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmm00m0 fp-48=mmmm0000
> fp-56=00000000 fp-64=00000000 fp-72=0000mmmm fp-80=mmmmmmmm
> fp-88=map_value fp-96=pkt_end fp-104=map_value fp-112=pkt fp-120=fp
> fp-128=map_value
> 1033: (16) if w1 == 0x0 goto pc+43
> 1077: safe
> 1178: R0=inv0 R1=map_ptr(id=0,off=0,ks=4,vs=4,imm=0) R2_w=inv0
> R3=inv2388976653695081527 R4=inv-8645972361240307355 R5=inv(id=6898)
> R6=ctx(id=0,off=0,imm=0) R7=inv(id=0) R8=pkt(id=0,off=18,r=38,imm=0)
> R9=inv0 R10=fp0 fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmm00m0
> fp-48=mmmm0000 fp-56=00000000 fp-64=00000000 fp-72=0000mmmm
> fp-80=mmmmmmmm fp-88=map_value fp-96=pkt_end fp-104=map_value
> fp-112=pkt fp-120=fp fp-128=map_value
> 1178: (63) *(u32 *)(r10 -32) = r7
> <...>
> processed 295498 insns (limit 1000000) max_states_per_insn 29
> total_states 14527 peak_states 1322 mark_read 53
> 
> Trace from 3e8ce29850f1 (broken):
> 
> 1033: R0=map_value(id=0,off=0,ks=4,vs=36,imm=0) R1_w=invP0
> R3_w=map_value(id=0,off=0,ks=4,vs=36,imm=0) R6=ctx(id=0,off=0,imm=0)
> R7=inv(id=0) R8=pkt(id=0,off=18,r=38,imm=0) R9=inv0 R10=fp0
> fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmm00m0 fp-48=mmmm0000
> fp-56=00000000 fp-64=00000000 fp-72=0000mmmm fp-80=mmmmmmmm
> fp-88=map_value fp-96=pkt_end fp-104=map_value fp-112=pkt fp-120=fp
> fp-128=map_value
> 1033: (16) if w1 == 0x0 goto pc+43
> 1077: R0=map_value(id=0,off=0,ks=4,vs=36,imm=0) R1_w=invP0
> R3_w=map_value(id=0,off=0,ks=4,vs=36,imm=0) R6=ctx(id=0,off=0,imm=0)
> R7=inv(id=0) R8=pkt(id=0,off=18,r=38,imm=0) R9=inv0 R10=fp0
> fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmm00m0 fp-48=mmmm0000
> fp-56=00000000 fp-64=00000000 fp-72=0000mmmm fp-80=mmmmmmmm
> fp-88=map_value fp-96=pkt_end fp-104=map_value fp-112=pkt fp-120=fp
> fp-128=map_value
> 1077: (79) r2 = *(u64 *)(r10 -128)

R2 loads a spilled map_value.

> 1078: R0=map_value(id=0,off=0,ks=4,vs=36,imm=0) R1_w=invP0
> R2_w=map_value(id=0,off=0,ks=4,vs=32,imm=0)
> R3_w=map_value(id=0,off=0,ks=4,vs=36,imm=0) R6=ctx(id=0,off=0,imm=0)
> R7=inv(id=0) R8=pkt(id=0,off=18,r=38,imm=0) R9=inv0 R10=fp0
> fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmm00m0 fp-48=mmmm0000
> fp-56=00000000 fp-64=00000000 fp-72=0000mmmm fp-80=mmmmmmmm
> fp-88=map_value fp-96=pkt_end fp-104=map_value fp-112=pkt fp-120=fp
> fp-128=map_value
> 1078: (79) r1 = *(u64 *)(r2 +0)
> <...>
> (truncated)
> 
> Trace from be2f2d1680df ("libbpf: Deprecate bpf_program__load() API")
> with bd479d103883 ("bpf: Do not reject when the stack read size is
> different from the tracked scalar size") cherry picked:
> 
> processed 450161 insns (limit 1000000) max_states_per_insn 19
> total_states 19452 peak_states 1319 mark_read 53
> 
> r2 is the result of a lookup from a per-CPU array, ts_metrics in the
> snippet below:
> 
> struct bpf_map_def traffic_set_metrics_map __section("maps") = {
>     .type        = BPF_MAP_TYPE_PERCPU_ARRAY,
>     .key_size    = sizeof(traffic_set_id_t),
>     .value_size  = sizeof(traffic_set_metrics_t),
>     .max_entries = SET_BY_USERSPACE,
> };
> 
>     traffic_set_metrics_t *ts_metrics =
> bpf_map_lookup_elem(&traffic_set_metrics_map, &meta->ts_id);
>     if (ts_metrics == NULL) {
>         return XDP_ABORTED;
>     }
> 
>    <...>
> 
>    if (meta->from_plurimog) {
>         ts_metrics->packets_total_plurimog_ingress++;
>     } else {
>         ts_metrics->packets_total_main++; // insn 1078
>     }

but it goes into R2 from non-inner map which ruins all my theories.

I've tried to craft a test case based on a theory and so far couldn't do so.
Could you please try the following hack:

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1aafb43f61d1..89b8f79b7236 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -665,9 +665,10 @@ static void print_verifier_state(struct bpf_verifier_env *env,
                                 t == PTR_TO_MAP_KEY ||
                                 t == PTR_TO_MAP_VALUE ||
                                 t == PTR_TO_MAP_VALUE_OR_NULL)
-                               verbose(env, ",ks=%d,vs=%d",
+                               verbose(env, ",ks=%d,vs=%d,uid=%d",
                                        reg->map_ptr->key_size,
-                                       reg->map_ptr->value_size);
+                                       reg->map_ptr->value_size,
+                                       reg->map_uid);
                        if (tnum_is_const(reg->var_off)) {
                                /* Typically an immediate SCALAR_VALUE, but
                                 * could be a pointer whose offset is too big
@@ -10509,8 +10510,11 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
                 */
                if (rcur->type != PTR_TO_MAP_VALUE_OR_NULL)
                        return false;
-               if (memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)))
+               if (memcmp(rold, rcur, offsetof(struct bpf_reg_state, map_uid)))
                        return false;
+               if (rcur->map_uid)
+                       if (!check_ids(rold->map_uid, rcur->map_uid, idmap))
+                               return false;
                /* Check our ids match any regs they're supposed to */
                return check_ids(rold->id, rcur->id, idmap);
        case PTR_TO_PACKET_META:


The verbose() part will help to confirm that R2 in the above should be uid=0.

After that please try only with:
-               if (memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)))
+               if (memcmp(rold, rcur, offsetof(struct bpf_reg_state, map_uid)))

It should resolve the regression, but will break timer safety check and makes
the map_uid logic not quite right (though no existing test will show it).
Hence the check_ids() part in the hunk above that should make map_uid correct again
and hopefully not repeat the infinite loop you're seeing.

Without a reproducer it's all wild guesses.

If offsetof(map_uid) doesn't help another guess would be:
@@ -10496,7 +10497,7 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
                 * it's valid for all map elements regardless of the key
                 * used in bpf_map_lookup()
                 */
-               return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
+               return memcmp(rold, rcur, offsetof(struct bpf_reg_state, map_uid)) == 0 &&
                       range_within(rold, rcur) &&
                       tnum_in(rold->var_off, rcur->var_off);
that's for PTR_TO_MAP_VALUE and that would be a different theory which makes even less sense.

If neither help the reproducer would be must have to make further progress.