From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Yonghong Song <yhs@fb.com>, Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <ast@fb.com>,
	Brendan Jackman <jackmanb@google.com>, bpf <bpf@vger.kernel.org>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>, KP Singh <kpsingh@google.com>
Subject: Re: What should BPF_CMPXCHG do when the values match?
Date: Wed, 10 Feb 2021 10:08:20 -0800
Message-ID: <CAADnVQ+gnQED7WYAw7Vmm5=omngCKYXnmgU_NqPUfESBerH8gQ@mail.gmail.com>
In-Reply-To: <5c6501bea0f7ae9dcb9d5f2071441942d5d3dc0f.camel@linux.ibm.com>

On Wed, Feb 10, 2021 at 5:28 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> On Tue, 2021-02-09 at 20:14 -0800, Alexei Starovoitov wrote:
> > On Tue, Feb 9, 2021 at 4:43 PM Ilya Leoshkevich <iii@linux.ibm.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > I'm implementing BPF_CMPXCHG for the s390x JIT and noticed that the
> > > doc, the interpreter and the X64 JIT do not agree on what the behavior
> > > should be when the values match.
> > >
> > > If the operand size is BPF_W, this matters, because, depending on the
> > > answer, the upper bits of R0 are either zeroed out or left intact.
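
To make the two possibilities concrete, here is a rough user-space C sketch of
the BPF_W case (illustrative only; atomicity and the real in-tree helpers are
omitted):

    #include <stdint.h>

    /* Semantics 1 ("C semantics"): the 32-bit result is assigned back to
     * the 64-bit R0, so bits 32-63 are always zeroed.
     */
    static uint64_t cmpxchg32_zext(uint32_t *addr, uint64_t r0, uint32_t src)
    {
            uint32_t old = *addr;

            if (old == (uint32_t)r0)
                    *addr = src;
            return old;                     /* implicit zero-extension */
    }

    /* Semantics 2 (x86 cmpxchg behavior): on a match R0 is not written
     * back at all, so bits 32-63 keep whatever they held before.
     */
    static uint64_t cmpxchg32_keep_hi(uint32_t *addr, uint64_t r0, uint32_t src)
    {
            uint32_t old = *addr;

            if (old == (uint32_t)r0)
                    *addr = src;            /* R0 left untouched */
            else
                    r0 = old;               /* 32-bit load, zero-extends */
            return r0;
    }
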
> > >
> > > I ran an experiment based on the following modification to the
> > > "atomic compare-and-exchange smoketest - 32bit" test on top of commit
> > > ee5cc0363ea0:
> > >
> > > --- a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> > > +++ b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> > > @@ -57,8 +57,8 @@
> > >                 BPF_MOV32_IMM(BPF_REG_1, 4),
> > >                 BPF_MOV32_IMM(BPF_REG_0, 3),
> > >                 BPF_ATOMIC_OP(BPF_W, BPF_CMPXCHG, BPF_REG_10, BPF_REG_1, -4),
> > > -               /* if (old != 3) exit(4); */
> > > -               BPF_JMP32_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
> > > +               /* if ((u64)old != 3) exit(4); */
> > > +               BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
> > >                 BPF_MOV32_IMM(BPF_REG_0, 4),
> > >                 BPF_EXIT_INSN(),
> > >                 /* if (val != 4) exit(5); */
> > >
> > > and got the following results:
> > >
> > > 1) Documentation: Upper bits of R0 are zeroed out - but it looks as if
> > >    there is a typo and either a period or the word "otherwise" is
> > >    missing?
> > >
> > >    > If they match it is replaced with ``src_reg``, The value that was
> > >    > there before is loaded back to ``R0``.
> > >
> > > 2) Interpreter + KVM: Upper bits of R0 are zeroed out (C semantics)
> > >
> > > 3) X64 JIT + KVM: Upper bits of R0 are untouched (cmpxchg semantics)
> > >
> > >    => 0xffffffffc0146bc7: lock cmpxchg %edi,-0x4(%rbp)
> > >       0xffffffffc0146bcc: cmp $0x3,%rax
> > >    (gdb) p/x $rax
> > >    0x6bd5720c00000003
> > >    (gdb) x/d $rbp-4
> > >    0xffffc90001263d5c: 3
> > >
> > >       0xffffffffc0146bc7: lock cmpxchg %edi,-0x4(%rbp)
> > >    => 0xffffffffc0146bcc: cmp $0x3,%rax
> > >    (gdb) p/x $rax
> > >    0x6bd5720c00000003
> > >
> > > 4) X64 JIT + TCG: Upper bits of R0 are zeroed out (qemu bug?)
> > >
> > >    => 0xffffffffc01441fc: lock cmpxchg %edi,-0x4(%rbp)
> > >       0xffffffffc0144201: cmp $0x3,%rax
> > >    (gdb) p/x $rax
> > >    0x81776ea600000003
> > >    (gdb) x/d $rbp-4
> > >    0xffffc90001117d5c: 3
> > >
> > >       0xffffffffc01441fc: lock cmpxchg %edi,-0x4(%rbp)
> > >    => 0xffffffffc0144201: cmp $0x3,%rax
> > >    (gdb) p/x $rax
> > >    $3 = 0x3
> > >
> > > So which option is the right one? In my opinion, it would be safer to
> > > follow what the interpreter does and zero out the upper bits.
> >
> > Wow. What a find!
> > I thought that all 32-bit x86 ops zero-extend the dest register.
>
> I think that's true, it's just that when the values match, cmpxchg is
> specified to do nothing.
>
> > I agree that it's best to zero upper bits for cmpxchg as well.
>
> I will send a doc patch to clarify the wording then.
>
> > I wonder whether compilers know about this exceptional behavior.
>
> I'm not too familiar with the BPF LLVM backend, but at least CMPXCHG32
> is defined in a similar way to XFALU32, so it should be fine. Maybe
> Yonghong can comment on this further.

I meant the x86 backends in gcc and llvm.
The bpf backend in llvm I've already checked.

> > I believe the bpf backend considers full R0 to be used by bpf's
> > cmpxchg.
>
> It's a little bit inconsistent at the moment. I don't know why yet,
> but on s390 the subreg optimization kicks in and I have to run with the
> following patch in order to avoid stack pointer zero extension:

makes sense.
This is needed not only for cmpxchg, but for all bpf_fetch variants, right?
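
For reference, a sketch of which register each fetching variant defines (my
reading of the uapi encoding, where BPF_XCHG and BPF_CMPXCHG already imply
BPF_FETCH):

        switch (insn.imm) {
        case BPF_CMPXCHG:                /* old value lands in R0      */
                load_reg = BPF_REG_0;
                break;
        case BPF_XCHG:                   /* old value lands in src_reg */
        case BPF_ADD | BPF_FETCH:
        case BPF_AND | BPF_FETCH:
        case BPF_OR  | BPF_FETCH:
        case BPF_XOR | BPF_FETCH:
                load_reg = insn.src_reg;
                break;
        }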

> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -10588,6 +10588,7 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
>         for (i = 0; i < len; i++) {
>                 int adj_idx = i + delta;
>                 struct bpf_insn insn;
> +               u8 load_reg;
>
>                 insn = insns[adj_idx];
>                 if (!aux[adj_idx].zext_dst) {
> @@ -10630,9 +10631,29 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
>                 if (!bpf_jit_needs_zext())
>                         continue;
>
> +               /* zext_dst means that we want to zero-extend whatever register
> +                * the insn defines, which is dst_reg most of the time, with
> +                * the notable exception of BPF_STX + BPF_ATOMIC + BPF_FETCH.
> +                */
> +               if (BPF_CLASS(insn.code) == BPF_STX &&
> +                   BPF_MODE(insn.code) == BPF_ATOMIC) {
> +                       /* BPF_STX + BPF_ATOMIC insns without BPF_FETCH do not
> +                        * define any registers, therefore zext_dst cannot be
> +                        * set.
> +                        */
> +                       if (WARN_ON_ONCE(!(insn.imm & BPF_FETCH)))
> +                               return -EINVAL;

warn makes sense.

> +                       if (insn.imm == BPF_CMPXCHG)
> +                               load_reg = BPF_REG_0;
> +                       else
> +                               load_reg = insn.src_reg;

pls use ?:.
I think it will read easier.
And submit it as an official patch. Please.
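
Something like this, roughly (untested):

               load_reg = insn.imm == BPF_CMPXCHG ? BPF_REG_0 : insn.src_reg;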

> +               } else {
> +                       load_reg = insn.dst_reg;
> +               }
> +
>                 zext_patch[0] = insn;
> -               zext_patch[1].dst_reg = insn.dst_reg;
> -               zext_patch[1].src_reg = insn.dst_reg;
> +               zext_patch[1].dst_reg = load_reg;
> +               zext_patch[1].src_reg = load_reg;
>                 patch = zext_patch;
>                 patch_len = 2;
>  apply_patch_buffer:
>
> However, this doesn't seem to affect x86_64.

Right, but it will affect x86-32. It doesn't implement atomics yet,
but it would be good to keep zext correct.

> > Do you know what xchg does on x86? What about arm64 with cas?
>
> xchg always zeroes out the upper half.
> Unlike x86_64's cmpxchg, arm64's cas is specified to always zero out
> the upper half, even if the values match. I don't have access to an ARMv8.1
> machine to test this, but at least QEMU does behave this way.
> s390's cs does not zero out the upper half, we need to use llgfr in
> addition (which doesn't sound like a big deal to me).

thanks for checking!
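
For s390 that would presumably mean a sequence along these lines after the
32-bit op (just a sketch of the idea, not the actual JIT code):

      cs    %r0,%src,off(%base)   # 32-bit compare-and-swap, old value in %r0
      llgfr %r0,%r0               # zero-extend the low 32 bits of %r0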

Brendan,
could you please follow up with an x64 JIT fix to add 'mov eax,eax' for
u32-sized cmpxchg?
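
i.e. for the BPF_W case the emitted code would become something like (a sketch
only, actual emission details up to you):

   lock cmpxchg %edi,-0x4(%rbp)
   mov  %eax,%eax               # zeroes bits 32-63 of %rax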

Thread overview: 5+ messages
2021-02-10  0:39 What should BPF_CMPXCHG do when the values match? Ilya Leoshkevich
2021-02-10  4:14 ` Alexei Starovoitov
2021-02-10 13:28   ` Ilya Leoshkevich
2021-02-10 18:08     ` Alexei Starovoitov [this message]
2021-02-10 18:28       ` Brendan Jackman
