All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brendan Jackman <jackmanb@google.com>
To: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	KP Singh <kpsingh@chromium.org>,
	Florent Revest <revest@chromium.org>,
	linux-kernel@vger.kernel.org, Jann Horn <jannh@google.com>
Subject: Re: [PATCH bpf-next v3 08/14] bpf: Add instructions for atomic_[cmp]xchg
Date: Fri, 4 Dec 2020 09:26:18 +0000	[thread overview]
Message-ID: <X8oAuo7HB/2XvP4g@google.com> (raw)
In-Reply-To: <34cf7a6e-4e97-9895-6dca-b38e631599b9@fb.com>

O Thu, Dec 03, 2020 at 09:34:23PM -0800, Yonghong Song wrote:
> On 12/3/20 8:02 AM, Brendan Jackman wrote:
> > This adds two atomic opcodes, both of which include the BPF_FETCH
> > flag. XCHG without the BPF_FETCh flag would naturally encode
> 
> BPF_FETCh => BPF_FETCH

Thanks, sorry I think you've already pointed that one out and I didn't fix it!

> > atomic_set. This is not supported because it would be of limited
> > value to userspace (it doesn't imply any barriers). CMPXCHG without
> > BPF_FETCH woulud be an atomic compare-and-write. We don't have such
> > an operation in the kernel so it isn't provided to BPF either.
> > 
> > There are two significant design decisions made for the CMPXCHG
> > instruction:
> > 
> >   - To solve the issue that this operation fundamentally has 3
> >     operands, but we only have two register fields. Therefore the
> >     operand we compare against (the kernel's API calls it 'old') is
> >     hard-coded to be R0. x86 has similar design (and A64 doesn't
> >     have this problem).
> > 
> >     A potential alternative might be to encode the other operand's
> >     register number in the immediate field.
> > 
> >   - The kernel's atomic_cmpxchg returns the old value, while the C11
> >     userspace APIs return a boolean indicating the comparison
> >     result. Which should BPF do? A64 returns the old value. x86 returns
> >     the old value in the hard-coded register (and also sets a
> >     flag). That means return-old-value is easier to JIT.
> > 
> > Signed-off-by: Brendan Jackman <jackmanb@google.com>
> 
> Ack with minor comments in the above and below.

Thanks, ack to all the comments.

Have run a `grep -r "atomic_.*(\*" *.patch` - hopefully we're now free
of this mistake where the first arg is dereferenced in the
comments/disasm...

> Acked-by: Yonghong Song <yhs@fb.com>
> 
> > Change-Id: I3f19ad867dfd08515eecf72674e6fdefe28424bb
> > ---
> >   arch/x86/net/bpf_jit_comp.c    |  8 ++++++++
> >   include/linux/filter.h         | 20 ++++++++++++++++++++
> >   include/uapi/linux/bpf.h       |  4 +++-
> >   kernel/bpf/core.c              | 20 ++++++++++++++++++++
> >   kernel/bpf/disasm.c            | 15 +++++++++++++++
> >   kernel/bpf/verifier.c          | 19 +++++++++++++++++--
> >   tools/include/linux/filter.h   | 20 ++++++++++++++++++++
> >   tools/include/uapi/linux/bpf.h |  4 +++-
> >   8 files changed, 106 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 88cb09fa3bfb..7d29bc3bb4ff 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -831,6 +831,14 @@ static int emit_atomic(u8 **pprog, u8 atomic_op,
> >   		/* src_reg = atomic_fetch_add(*(dst_reg + off), src_reg); */
> >   		EMIT2(0x0F, 0xC1);
> >   		break;
> > +	case BPF_XCHG:
> > +		/* src_reg = atomic_xchg(*(u32/u64*)(dst_reg + off), src_reg); */
> 
> src_reg = atomic_xchg((u32/u64*)(dst_reg + off), src_reg)?
> 
> > +		EMIT1(0x87);
> > +		break;
> > +	case BPF_CMPXCHG:
> > +		/* r0 = atomic_cmpxchg(*(u32/u64*)(dst_reg + off), r0, src_reg); */
> 
> r0 = atomic_cmpxchg((u32/u64*)(dst_reg + off), r0, src_reg)?
> 
> > +		EMIT2(0x0F, 0xB1);
> > +		break;
> >   	default:
> >   		pr_err("bpf_jit: unknown atomic opcode %02x\n", atomic_op);
> >   		return -EFAULT;
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index 4e04d0fc454f..6186280715ed 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -280,6 +280,26 @@ static inline bool insn_is_zext(const struct bpf_insn *insn)
> >   		.off   = OFF,					\
> >   		.imm   = BPF_ADD | BPF_FETCH })
> > +/* Atomic exchange, src_reg = atomic_xchg((dst_reg + off), src_reg) */
> 
> src_reg = atomic_xchg(dst_reg + off, src_reg)?
> 
> > +
> > +#define BPF_ATOMIC_XCHG(SIZE, DST, SRC, OFF)			\
> > +	((struct bpf_insn) {					\
> > +		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC,	\
> > +		.dst_reg = DST,					\
> > +		.src_reg = SRC,					\
> > +		.off   = OFF,					\
> > +		.imm   = BPF_XCHG  })
> > +
> > +/* Atomic compare-exchange, r0 = atomic_cmpxchg((dst_reg + off), r0, src_reg) */
> 
> r0 = atomic_cmpxchg(dst_reg + off, r0, src_reg)?
> 
> > +
> > +#define BPF_ATOMIC_CMPXCHG(SIZE, DST, SRC, OFF)			\
> > +	((struct bpf_insn) {					\
> > +		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC,	\
> > +		.dst_reg = DST,					\
> > +		.src_reg = SRC,					\
> > +		.off   = OFF,					\
> > +		.imm   = BPF_CMPXCHG })
> > +
> >   /* Memory store, *(uint *) (dst_reg + off16) = imm32 */
> >   #define BPF_ST_MEM(SIZE, DST, OFF, IMM)				\
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 025e377e7229..53334530cc81 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -45,7 +45,9 @@
> >   #define BPF_EXIT	0x90	/* function return */
> >   /* atomic op type fields (stored in immediate) */
> > -#define BPF_FETCH	0x01	/* fetch previous value into src reg */
> > +#define BPF_XCHG	(0xe0 | BPF_FETCH)	/* atomic exchange */
> > +#define BPF_CMPXCHG	(0xf0 | BPF_FETCH)	/* atomic compare-and-write */
> > +#define BPF_FETCH	0x01	/* not an opcode on its own, used to build others */
> >   /* Register numbers */
> >   enum {
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 61e93eb7d363..28f960bc2e30 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -1630,6 +1630,16 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
> >   				(u32) SRC,
> >   				(atomic_t *)(unsigned long) (DST + insn->off));
> >   			break;
> > +		case BPF_XCHG:
> > +			SRC = (u32) atomic_xchg(
> > +				(atomic_t *)(unsigned long) (DST + insn->off),
> > +				(u32) SRC);
> > +			break;
> > +		case BPF_CMPXCHG:
> > +			BPF_R0 = (u32) atomic_cmpxchg(
> > +				(atomic_t *)(unsigned long) (DST + insn->off),
> > +				(u32) BPF_R0, (u32) SRC);
> > +			break;
> >   		default:
> >   			goto default_label;
> >   		}
> > @@ -1647,6 +1657,16 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
> >   				(u64) SRC,
> >   				(atomic64_t *)(s64) (DST + insn->off));
> >   			break;
> > +		case BPF_XCHG:
> > +			SRC = (u64) atomic64_xchg(
> > +				(atomic64_t *)(u64) (DST + insn->off),
> > +				(u64) SRC);
> > +			break;
> > +		case BPF_CMPXCHG:
> > +			BPF_R0 = (u64) atomic64_cmpxchg(
> > +				(atomic64_t *)(u64) (DST + insn->off),
> > +				(u64) BPF_R0, (u64) SRC);
> > +			break;
> >   		default:
> >   			goto default_label;
> >   		}
> > diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
> > index 3ee2246a52ef..18357ea9a17d 100644
> > --- a/kernel/bpf/disasm.c
> > +++ b/kernel/bpf/disasm.c
> > @@ -167,6 +167,21 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
> >   				BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
> >   				bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
> >   				insn->dst_reg, insn->off, insn->src_reg);
> > +		} else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
> > +			   insn->imm == BPF_CMPXCHG) {
> > +			verbose(cbs->private_data, "(%02x) r0 = atomic%s_cmpxchg(*(%s *)(r%d %+d), r0, r%d)\n",
> 
> (%02x) r0 = atomic%s_cmpxchg((%s *)(r%d %+d), r0, r%d)?
> 
> > +				insn->code,
> > +				BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
> > +				bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
> > +				insn->dst_reg, insn->off,
> > +				insn->src_reg);
> > +		} else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
> > +			   insn->imm == BPF_XCHG) {
> > +			verbose(cbs->private_data, "(%02x) r%d = atomic%s_xchg(*(%s *)(r%d %+d), r%d)\n",
> 
> (%02x) r%d = atomic%s_xchg((%s *)(r%d %+d), r%d)?
> 
> > +				insn->code, insn->src_reg,
> > +				BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
> > +				bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
> > +				insn->dst_reg, insn->off, insn->src_reg);
> >   		} else {
> >   			verbose(cbs->private_data, "BUG_%02x\n", insn->code);
> >   		}
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index a68adbcee370..ccf4315e54e7 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -3601,10 +3601,13 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
> >   static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
> >   {
> >   	int err;
> > +	int load_reg;
> 
> nit: not a big deal but maybe put this definition before 'int err' to
> maintain reverse christmas tree coding style.
> 
> >   	switch (insn->imm) {
> >   	case BPF_ADD:
> >   	case BPF_ADD | BPF_FETCH:
> > +	case BPF_XCHG:
> > +	case BPF_CMPXCHG:
> >   		break;
> >   	default:
> >   		verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n", insn->imm);
> > @@ -3626,6 +3629,13 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> >   	if (err)
> >   		return err;
> > +	if (insn->imm == BPF_CMPXCHG) {
> > +		/* Check comparison of R0 with memory location */
> > +		err = check_reg_arg(env, BPF_REG_0, SRC_OP);
> > +		if (err)
> > +			return err;
> > +	}
> > +
> >   	if (is_pointer_value(env, insn->src_reg)) {
> >   		verbose(env, "R%d leaks addr into mem\n", insn->src_reg);
> >   		return -EACCES;
> > @@ -3656,8 +3666,13 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> >   	if (!(insn->imm & BPF_FETCH))
> >   		return 0;
> > -	/* check and record load of old value into src reg  */
> > -	err = check_reg_arg(env, insn->src_reg, DST_OP);
> > +	if (insn->imm == BPF_CMPXCHG)
> > +		load_reg = BPF_REG_0;
> > +	else
> > +		load_reg = insn->src_reg;
> > +
> > +	/* check and record load of old value */
> > +	err = check_reg_arg(env, load_reg, DST_OP);
> >   	if (err)
> >   		return err;
> > diff --git a/tools/include/linux/filter.h b/tools/include/linux/filter.h
> > index ac7701678e1a..ea99bd17d003 100644
> > --- a/tools/include/linux/filter.h
> > +++ b/tools/include/linux/filter.h
> > @@ -190,6 +190,26 @@
> >   		.off   = OFF,					\
> >   		.imm   = BPF_ADD | BPF_FETCH })
> > +/* Atomic exchange, src_reg = atomic_xchg((dst_reg + off), src_reg) */
> 
> src_reg = atomic_xchg(dst_reg + off, src_reg)?
> 
> > +
> > +#define BPF_ATOMIC_XCHG(SIZE, DST, SRC, OFF)			\
> > +	((struct bpf_insn) {					\
> > +		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC,	\
> > +		.dst_reg = DST,					\
> > +		.src_reg = SRC,					\
> > +		.off   = OFF,					\
> > +		.imm   = BPF_XCHG })
> > +
> > +/* Atomic compare-exchange, r0 = atomic_cmpxchg((dst_reg + off), r0, src_reg) */
> 
> r0 = atomic_cmpxchg(dst_reg + off, r0, src_reg)?
> 
> > +
> > +#define BPF_ATOMIC_CMPXCHG(SIZE, DST, SRC, OFF)			\
> > +	((struct bpf_insn) {					\
> > +		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC,	\
> > +		.dst_reg = DST,					\
> > +		.src_reg = SRC,					\
> > +		.off   = OFF,					\
> > +		.imm   = BPF_CMPXCHG })
> > +
> >   /* Memory store, *(uint *) (dst_reg + off16) = imm32 */
> >   #define BPF_ST_MEM(SIZE, DST, OFF, IMM)				\
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index 025e377e7229..53334530cc81 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -45,7 +45,9 @@
> >   #define BPF_EXIT	0x90	/* function return */
> >   /* atomic op type fields (stored in immediate) */
> > -#define BPF_FETCH	0x01	/* fetch previous value into src reg */
> > +#define BPF_XCHG	(0xe0 | BPF_FETCH)	/* atomic exchange */
> > +#define BPF_CMPXCHG	(0xf0 | BPF_FETCH)	/* atomic compare-and-write */
> > +#define BPF_FETCH	0x01	/* not an opcode on its own, used to build others */
> >   /* Register numbers */
> >   enum {
> > 

  reply	other threads:[~2020-12-04  9:27 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-03 16:02 [PATCH bpf-next v3 00/14] Atomics for eBPF Brendan Jackman
2020-12-03 16:02 ` [PATCH bpf-next v3 01/14] bpf: x86: Factor out emission of ModR/M for *(reg + off) Brendan Jackman
2020-12-03 16:02 ` [PATCH bpf-next v3 02/14] bpf: x86: Factor out emission of REX byte Brendan Jackman
2020-12-03 16:02 ` [PATCH bpf-next v3 03/14] bpf: x86: Factor out function to emit NEG Brendan Jackman
2020-12-03 16:02 ` [PATCH bpf-next v3 04/14] bpf: x86: Factor out a lookup table for some ALU opcodes Brendan Jackman
2020-12-03 16:02 ` [PATCH bpf-next v3 05/14] bpf: Rename BPF_XADD and prepare to encode other atomics in .imm Brendan Jackman
2020-12-04  4:49   ` Yonghong Song
2020-12-03 16:02 ` [PATCH bpf-next v3 06/14] bpf: Move BPF_STX reserved field check into BPF_STX verifier code Brendan Jackman
2020-12-04  4:51   ` Yonghong Song
2020-12-03 16:02 ` [PATCH bpf-next v3 07/14] bpf: Add BPF_FETCH field / create atomic_fetch_add instruction Brendan Jackman
2020-12-04  5:02   ` Yonghong Song
2020-12-04  5:27   ` Yonghong Song
2020-12-04  9:12     ` Brendan Jackman
2020-12-03 16:02 ` [PATCH bpf-next v3 08/14] bpf: Add instructions for atomic_[cmp]xchg Brendan Jackman
2020-12-04  5:34   ` Yonghong Song
2020-12-04  9:26     ` Brendan Jackman [this message]
2020-12-03 16:02 ` [PATCH bpf-next v3 09/14] bpf: Pull out a macro for interpreting atomic ALU operations Brendan Jackman
2020-12-04  6:30   ` Yonghong Song
2020-12-04  9:29     ` Brendan Jackman
2020-12-04 15:20       ` Yonghong Song
2020-12-03 16:02 ` [PATCH bpf-next v3 10/14] bpf: Add bitwise atomic instructions Brendan Jackman
2020-12-04  6:42   ` Yonghong Song
2020-12-04  9:36     ` Brendan Jackman
2020-12-04 15:21       ` Yonghong Song
2020-12-07 11:28         ` Brendan Jackman
2020-12-07 15:58           ` Yonghong Song
2020-12-07 16:14             ` Brendan Jackman
2020-12-03 16:02 ` [PATCH bpf-next v3 11/14] tools build: Implement feature check for BPF atomics in Clang Brendan Jackman
2020-12-03 21:02   ` Andrii Nakryiko
2020-12-03 16:02 ` [PATCH bpf-next v3 12/14] bpf: Pull tools/build/feature biz into selftests Makefile Brendan Jackman
2020-12-03 21:01   ` Andrii Nakryiko
2020-12-04  9:41     ` Brendan Jackman
2020-12-04 19:00       ` Andrii Nakryiko
2020-12-07 11:00         ` Brendan Jackman
2020-12-08  2:19           ` Andrii Nakryiko
2020-12-08 17:04             ` Brendan Jackman
2020-12-08 18:31               ` Andrii Nakryiko
2020-12-03 16:02 ` [PATCH bpf-next v3 13/14] bpf: Add tests for new BPF atomic operations Brendan Jackman
2020-12-04  7:06   ` Yonghong Song
2020-12-04  9:45     ` Brendan Jackman
2020-12-04 15:28       ` Yonghong Song
2020-12-04 19:49         ` Andrii Nakryiko
2020-12-07 15:48           ` Brendan Jackman
2020-12-03 16:02 ` [PATCH bpf-next v3 14/14] bpf: Document new atomic instructions Brendan Jackman
2020-12-03 16:10 ` [PATCH bpf-next v3 00/14] Atomics for eBPF Brendan Jackman
2020-12-04  4:46 ` Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=X8oAuo7HB/2XvP4g@google.com \
    --to=jackmanb@google.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jannh@google.com \
    --cc=kpsingh@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=revest@chromium.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.