Re: [PATCH v4 bpf-next 01/15] bpf: split read liveness into REG_LIVE_READ64 and REG_LIVE_READ32

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Jiong Wang <jiong.wang@netronome.com>
Cc: daniel@iogearbox.net, bpf@vger.kernel.org,
	netdev@vger.kernel.org, oss-drivers@netronome.com
Subject: Re: [PATCH v4 bpf-next 01/15] bpf: split read liveness into REG_LIVE_READ64 and REG_LIVE_READ32
Date: Tue, 16 Apr 2019 09:20:24 -0700	[thread overview]
Message-ID: <20190416162022.wsymfdriull5srvo@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <lyimveo0ul.fsf@netronome.com>

On Tue, Apr 16, 2019 at 08:39:30AM +0100, Jiong Wang wrote:
> 
> Alexei Starovoitov writes:
> 
> > On Mon, Apr 15, 2019 at 06:26:11PM +0100, Jiong Wang wrote:
> >> Register liveness infrastructure doesn't track register read width at the
> >> moment, while the width information will be needed for the later 32-bit
> >> safety analysis pass.
> >> 
> >> This patch take the first step to split read liveness into REG_LIVE_READ64
> >> and REG_LIVE_READ32.
> >> 
> >> Liveness propagation code are updated accordingly. They are taught to
> >> understand how to propagate REG_LIVE_READ64 and REG_LIVE_READ32 at the same
> >> propagation iteration. For example, "mark_reg_read" now propagate "flags"
> >> which could be multiple read bits instead of the single REG_LIVE_READ64.
> >> 
> >> A write still screen off all width of reads.
> >> 
> >> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> >> ---
> >>  include/linux/bpf_verifier.h |   8 +--
> >>  kernel/bpf/verifier.c        | 119 +++++++++++++++++++++++++++++++++++++++----
> >>  2 files changed, 115 insertions(+), 12 deletions(-)
> >> 
> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> >> index b3ab61f..fba0ebb 100644
> >> --- a/include/linux/bpf_verifier.h
> >> +++ b/include/linux/bpf_verifier.h
> >> @@ -36,9 +36,11 @@
> >>   */
> >>  enum bpf_reg_liveness {
> >>  	REG_LIVE_NONE = 0, /* reg hasn't been read or written this branch */
> >> -	REG_LIVE_READ, /* reg was read, so we're sensitive to initial value */
> >> -	REG_LIVE_WRITTEN, /* reg was written first, screening off later reads */
> >> -	REG_LIVE_DONE = 4, /* liveness won't be updating this register anymore */
> >> +	REG_LIVE_READ32 = 0x1, /* reg was read, so we're sensitive to initial value */
> >> +	REG_LIVE_READ64 = 0x2, /* likewise, but full 64-bit content matters */
> >> +	REG_LIVE_READ = REG_LIVE_READ32 | REG_LIVE_READ64,
> >> +	REG_LIVE_WRITTEN = 0x4, /* reg was written first, screening off later reads */
> >> +	REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
> >>  };
> >>  
> >>  struct bpf_reg_state {
> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >> index c722015..5784b279 100644
> >> --- a/kernel/bpf/verifier.c
> >> +++ b/kernel/bpf/verifier.c
> >> @@ -1135,7 +1135,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
> >>   */
> >>  static int mark_reg_read(struct bpf_verifier_env *env,
> >>  			 const struct bpf_reg_state *state,
> >> -			 struct bpf_reg_state *parent)
> >> +			 struct bpf_reg_state *parent, u8 flags)
> >>  {
> >>  	bool writes = parent == state->parent; /* Observe write marks */
> >>  	int cnt = 0;
> >> @@ -1150,17 +1150,23 @@ static int mark_reg_read(struct bpf_verifier_env *env,
> >>  				parent->var_off.value, parent->off);
> >>  			return -EFAULT;
> >>  		}
> >> -		if (parent->live & REG_LIVE_READ)
> >> +		/* The first condition is much more likely to be true than the
> >> +		 * second, make it checked first.
> >> +		 */
> >> +		if ((parent->live & REG_LIVE_READ) == flags ||
> >> +		    parent->live & REG_LIVE_READ64)
> >>  			/* The parentage chain never changes and
> >>  			 * this parent was already marked as LIVE_READ.
> >>  			 * There is no need to keep walking the chain again and
> >>  			 * keep re-marking all parents as LIVE_READ.
> >>  			 * This case happens when the same register is read
> >>  			 * multiple times without writes into it in-between.
> >> +			 * Also, if parent has REG_LIVE_READ64 set, then no need
> >> +			 * to set the weak REG_LIVE_READ32.
> >>  			 */
> >>  			break;
> >>  		/* ... then we depend on parent's value */
> >> -		parent->live |= REG_LIVE_READ;
> >> +		parent->live |= flags;
> >>  		state = parent;
> >>  		parent = state->parent;
> >>  		writes = true;
> >> @@ -1172,12 +1178,95 @@ static int mark_reg_read(struct bpf_verifier_env *env,
> >>  	return 0;
> >>  }
> >>  
> >> +/* This function is supposed to be used by the following 32-bit optimization
> >> + * code only. It returns TRUE if the source or destination register operates
> >> + * on 64-bit, otherwise return FALSE.
> >> + */
> >> +static bool is_reg64(struct bpf_insn *insn, u32 regno,
> >> +		     struct bpf_reg_state *reg, enum reg_arg_type t)
> >> +{
> >> +	u8 code, class, op;
> >> +
> >
> > why is it called for case when t != SRC_OP ?
> > this patch is using the return value only in t == SRC_OP case
> > and other patches don't use is_reg64() at all.
> 
> It is used for case when t == DST*, in patch 2/15, please search "rw64" in
> that patch.

argh. such patch split didn't make it easy to review.

> And "is_reg64" aims to return TRUE if it is 64-bit read when T == SRC_OP,
> and return TRUE if it is 64-bit write when T = DST*. 
> 
> >> +	code = insn->code;
> >> +	class = BPF_CLASS(code);
> >> +	op = BPF_OP(code);
> >> +	if (class == BPF_JMP) {
> >> +		/* BPF_EXIT will reach here because of return value readability
> >> +		 * test for "main" which has s32 return value.
> >> +		 */
> >> +		if (op == BPF_EXIT)
> >> +			return false;
> >
> > That's not incorrect. bpf2bpf calls return 64-bit values.
> 
> bpf2bpf calls has all instructions exposed to insn walker, so the data-flow
> is naturally tracked. For example:
>   
> callee:
>   w0 = w2
>   exit

So then it needs to be different code? Like read32_maybe ?

> caller:
>   call callee2
>   r2 = r0
>   
> insn walker should have marked REG_0 is a sub-register define in callee's
> frame, and such marker is naturally propagetd back to caller's frame inside
> "prepare_func_exit" which is doing:
> 
>   /* return to the caller whatever r0 had in the callee */                 
>   caller->regs[BPF_REG_0] = *r0;
> 
> This copies parentage chain, and also copies the sub-register definition
> information as we have merged it into reg state.
> 
> > bpf abi is such that all bpf progs return 64-bit values.
> > Historically we truncate to 32-bit in BPF_PROG_RUN,
> > but some future bpf hook might use all 64 bits of return value.
> 
> hmm, was thinking bpf main always return 32-bit, so safe to return FALSE
> here. If we could have 64-bit main return, then perhaps for main, always do
> zero-extension on r0 if it comes from sub-register def, this seems a little
> bit unnecessary, given there is prog_type available during the check, using
> which we can known whether whether return value is 64-bit or not.
> 
> thoughts?

I think it's safer to go with read64 for now and optimize later
if really necessary.

> >> +		if (op == BPF_CALL) {
> >> +			/* BPF to BPF call will reach here because of marking
> >> +			 * caller saved clobber with DST_OP_NO_MARK for which we
> >> +			 * don't care the register def because they are anyway
> >> +			 * marked as NOT_INIT already.
> >> +			 */
> >
> > the comment doesn't seem to match the code. why return anything here?
> 
> It is to handle the case like:
> 
>   check_reg_arg(env, caller_saved[i], DST_OP_NO_MARK);
> 
> which is called inside "check_func_call" for call insn. I think for caller
> saved register, return anything is fine. They must have new def inside
> callee before used. And when returned to caller, they must be restored
> before used otherwise the prog will be rejected due to their status is
> marked as NOT_INIT.

but neither this patch nor patch 2 is using the return value.

> 
> So, for all cases, they are tracked insn walker accurately.
> 
> > The return value won't be used anyway.
> >
> > If is_reg64() is inside check_reg_arg() under if (t == SRC_OP)
> > all these corner cases wouldn't cause review headaches and can be converted
> > to WARN_ONCE.
> >
> > What am I missing?
> 
> As explained, is_reg64 could be used by both t == SRC_OP and DST*. And is
> will be called for instructions with implicit register usage for example
> CALL, and we need to return access width for those implicitly used register
> here as well.
> 
> Make sense?

Kinda. I'll take a look again. My first reaction is that patch 1 and 2
should be one patch or split differently.