From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Yonghong Song <yhs@fb.com>, Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Alexei Starovoitov <ast@fb.com>,
Brendan Jackman <jackmanb@google.com>, bpf <bpf@vger.kernel.org>,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>, KP Singh <kpsingh@google.com>
Subject: Re: What should BPF_CMPXCHG do when the values match?
Date: Wed, 10 Feb 2021 10:08:20 -0800 [thread overview]
Message-ID: <CAADnVQ+gnQED7WYAw7Vmm5=omngCKYXnmgU_NqPUfESBerH8gQ@mail.gmail.com> (raw)
In-Reply-To: <5c6501bea0f7ae9dcb9d5f2071441942d5d3dc0f.camel@linux.ibm.com>
On Wed, Feb 10, 2021 at 5:28 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> On Tue, 2021-02-09 at 20:14 -0800, Alexei Starovoitov wrote:
> > On Tue, Feb 9, 2021 at 4:43 PM Ilya Leoshkevich <iii@linux.ibm.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > I'm implementing BPF_CMPXCHG for the s390x JIT and noticed that the
> > > doc, the interpreter and the X64 JIT do not agree on what the
> > > behavior
> > > should be when the values match.
> > >
> > > If the operand size is BPF_W, this matters, because, depending on
> > > the
> > > answer, the upper bits of R0 are either zeroed out out or are left
> > > intact.
> > >
> > > I made the experiment based on the following modification to the
> > > "atomic compare-and-exchange smoketest - 32bit" test on top of
> > > commit
> > > ee5cc0363ea0:
> > >
> > > --- a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> > > +++ b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> > > @@ -57,8 +57,8 @@
> > > BPF_MOV32_IMM(BPF_REG_1, 4),
> > > BPF_MOV32_IMM(BPF_REG_0, 3),
> > > BPF_ATOMIC_OP(BPF_W, BPF_CMPXCHG, BPF_REG_10,
> > > BPF_REG_1, -4),
> > > - /* if (old != 3) exit(4); */
> > > - BPF_JMP32_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
> > > + /* if ((u64)old != 3) exit(4); */
> > > + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
> > > BPF_MOV32_IMM(BPF_REG_0, 4),
> > > BPF_EXIT_INSN(),
> > > /* if (val != 4) exit(5); */
> > >
> > > and got the following results:
> > >
> > > 1) Documentation: Upper bits of R0 are zeroed out - but it looks as
> > > if
> > > there is a typo and either a period or the word "otherwise" is
> > > missing?
> > >
> > > > If they match it is replaced with ``src_reg``, The value that
> > > was
> > > > there before is loaded back to ``R0``.
> > >
> > > 2) Interpreter + KVM: Upper bits of R0 are zeroed out (C semantics)
> > >
> > > 3) X64 JIT + KVM: Upper bits of R0 are untouched (cmpxchg
> > > semantics)
> > >
> > > => 0xffffffffc0146bc7: lock cmpxchg %edi,-0x4(%rbp)
> > > 0xffffffffc0146bcc: cmp $0x3,%rax
> > > (gdb) p/x $rax
> > > 0x6bd5720c00000003
> > > (gdb) x/d $rbp-4
> > > 0xffffc90001263d5c: 3
> > >
> > > 0xffffffffc0146bc7: lock cmpxchg %edi,-0x4(%rbp)
> > > => 0xffffffffc0146bcc: cmp $0x3,%rax
> > > (gdb) p/x $rax
> > > 0x6bd5720c00000003
> > >
> > > 4) X64 JIT + TCG: Upper bits of R0 are zeroed out (qemu bug?)
> > >
> > > => 0xffffffffc01441fc: lock cmpxchg %edi,-0x4(%rbp)
> > > 0xffffffffc0144201: cmp $0x3,%rax
> > > (gdb) p/x $rax
> > > 0x81776ea600000003
> > > (gdb) x/d $rbp-4
> > > 0xffffc90001117d5c: 3
> > >
> > > 0xffffffffc01441fc: lock cmpxchg %edi,-0x4(%rbp)
> > > => 0xffffffffc0144201: cmp $0x3,%rax
> > > (gdb) p/x $rax
> > > $3 = 0x3
> > >
> > > So which option is the right one? In my opinion, it would be safer
> > > to
> > > follow what the interpreter does and zero out the upper bits.
> >
> > Wow. What a find!
> > I thought that all 32-bit x86 ops zero-extend the dest register.
>
> I think that's true, it's just that when the values match, cmpxchg is
> specified to do nothing.
>
> > I agree that it's best to zero upper bits for cmpxchg as well.
>
> I will send a doc patch to clarify the wording then.
>
> > I wonder whether compilers know about this exceptional behavior.
>
> I'm not too familiar with the BPF LLVM backend, but at least CMPXCHG32
> is defined in a similar way to XFALU32, so it should be fine. Maybe
> Yonghong can comment on this further.
I meant x86 backends in gcc and llvm.
bpf backend in llvm I've already checked.
> > I believe the bpf backend considers full R0 to be used by bpf's
> > cmpxchg.
>
> It's a little bit inconsistent at the moment. I don't know why yet,
> but on s390 the subreg optimization kicks in and I have to run with the
> following patch in order to avoid stack pointer zero extension:
makes sense.
This is needed not only for cmpxchg, but for all bpf_fetch variants, right?
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -10588,6 +10588,7 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct
> bpf_verifier_env *env,
> for (i = 0; i < len; i++) {
> int adj_idx = i + delta;
> struct bpf_insn insn;
> + u8 load_reg;
>
> insn = insns[adj_idx];
> if (!aux[adj_idx].zext_dst) {
> @@ -10630,9 +10631,29 @@ static int
> opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
> if (!bpf_jit_needs_zext())
> continue;
>
> + /* zext_dst means that we want to zero-extend whatever
> register
> + * the insn defines, which is dst_reg most of the time,
> with
> + * the notable exception of BPF_STX + BPF_ATOMIC +
> BPF_FETCH.
> + */
> + if (BPF_CLASS(insn.code) == BPF_STX &&
> + BPF_MODE(insn.code) == BPF_ATOMIC) {
> + /* BPF_STX + BPF_ATOMIC insns without BPF_FETCH
> do not
> + * define any registers, therefore zext_dst
> cannot be
> + * set.
> + */
> + if (WARN_ON_ONCE(!(insn.imm & BPF_FETCH)))
> + return -EINVAL;
warn makes sense.
> + if (insn.imm == BPF_CMPXCHG)
> + load_reg = BPF_REG_0;
> + else
> + load_reg = insn.src_reg;
pls use ?:.
I think it will read easier.
And submit it as an official patch. Please.
> + } else {
> + load_reg = insn.dst_reg;
> + }
> +
> zext_patch[0] = insn;
> - zext_patch[1].dst_reg = insn.dst_reg;
> - zext_patch[1].src_reg = insn.dst_reg;
> + zext_patch[1].dst_reg = load_reg;
> + zext_patch[1].src_reg = load_reg;
> patch = zext_patch;
> patch_len = 2;
> apply_patch_buffer:
>
> However, this doesn't seem to affect x86_64.
Right, but it will affect x86-32. It doesn't implement atomics yet,
but would be good to keep zext correct.
> > Do you know what xchg does on x86? What about arm64 with cas?
>
> xchg always zeroes out the upper half.
> Unlike x86_64's cmpxchg, arm64's cas is specified to always zero out
> the upper half, even if the values match. I don't have access to arm8.1
> machine to test this, but at least QEMU does behave this way.
> s390's cs does not zero out the upper half, we need to use llgfr in
> addition (which doesn't sound like a big deal to me).
thanks for checking!
Brendan,
could you please follow up with x64 jit fix to add 'mov eax,eax' for
u32-sized cmpxchg ?
next prev parent reply other threads:[~2021-02-10 18:11 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-10 0:39 What should BPF_CMPXCHG do when the values match? Ilya Leoshkevich
2021-02-10 4:14 ` Alexei Starovoitov
2021-02-10 13:28 ` Ilya Leoshkevich
2021-02-10 18:08 ` Alexei Starovoitov [this message]
2021-02-10 18:28 ` Brendan Jackman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAADnVQ+gnQED7WYAw7Vmm5=omngCKYXnmgU_NqPUfESBerH8gQ@mail.gmail.com' \
--to=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@fb.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=iii@linux.ibm.com \
--cc=jackmanb@google.com \
--cc=kpsingh@google.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).