Re: [PATCH bpf-next v2 03/15] bpf: Support new sign-extension mov insns

From: Eduard Zingerman <eddyz87@gmail.com>
To: Fangrui Song <maskray@google.com>, Yonghong Song <yhs@meta.com>,
	 Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com
Subject: Re: [PATCH bpf-next v2 03/15] bpf: Support new sign-extension mov insns
Date: Wed, 19 Jul 2023 19:57:16 +0300
Message-ID: <f865243714e683d35d61221a778658ea4be745ae.camel@gmail.com>
In-Reply-To: <CAFP8O3+2dTqatr_of4faH2a9r2dm3e3MatFfXT2-JsYMJqOQ=A@mail.gmail.com>

On Wed, 2023-07-19 at 08:59 -0700, Fangrui Song wrote:
> On Wed, Jul 19, 2023 at 5:53 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Tue, 2023-07-18 at 18:17 -0700, Yonghong Song wrote:
> > [...]
> > > > > > +static void emit_movsx_reg(u8 **pprog, int num_bits, bool is64, u32 dst_reg,
> > > > > > +                          u32 src_reg)
> > > > > > +{
> > > > > > +       u8 *prog = *pprog;
> > > > > > +
> > > > > > +       if (is64) {
> > > > > > +               /* movs[b,w,l]q dst, src */
> > > > > > +               if (num_bits == 8)
> > > > > > +                       EMIT4(add_2mod(0x48, src_reg, dst_reg), 0x0f, 0xbe,
> > > > > > +                             add_2reg(0xC0, src_reg, dst_reg));
> > > > > > +               else if (num_bits == 16)
> > > > > > +                       EMIT4(add_2mod(0x48, src_reg, dst_reg), 0x0f, 0xbf,
> > > > > > +                             add_2reg(0xC0, src_reg, dst_reg));
> > > > > > +               else if (num_bits == 32)
> > > > > > +                       EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x63,
> > > > > > +                             add_2reg(0xC0, src_reg, dst_reg));
> > > > > > +       } else {
> > > > > > +               /* movs[b,w]l dst, src */
> > > > > > +               if (num_bits == 8) {
> > > > > > +                       EMIT4(add_2mod(0x40, src_reg, dst_reg), 0x0f, 0xbe,
> > > > > > +                             add_2reg(0xC0, src_reg, dst_reg));
> > > > 
> > > > Nit: As far as I understand from 4-126, Vol. 2B of [1],
> > > >       the 0x40 prefix (REX prefix) is optional here
> > > >       (same as implemented below for num_bits == 16).
> > > 
> > > I think the 0x40 prefix is at least needed if a register is from R8 - R15?
> > 
> > Yes, please see below.
> > 
> > > I use this website to do asm/disasm experiments and did
> > > try various combinations with first 8 and later 8 registers
> > > and it seems correct results are generated.
> > 
> > It seems all roads lead to that web-site, I used it as well :)
> > Today I learned that the following could be used:
> > 
> >   echo 'movsx rax,ax' | as -o /dev/null -aln -msyntax=intel -mnaked-reg
> > 
> > Which opens a road to scripting experiments.
> 
> This internal tool from llvm-project may also be useful:)
> 
> llvm-mc -triple=x86_64 -show-inst -x86-asm-syntax=intel
> -output-asm-variant=1 <<< 'movsx rax, ax'

Thank you, this works (with -show-encoding).

> 
> > > > 
> > > > [1] https://cdrdv2.intel.com/v1/dl/getContent/671200
> > > > 
> > > > 
> > > > > > +               } else if (num_bits == 16) {
> > > > > > +                       if (is_ereg(dst_reg) || is_ereg(src_reg))
> > > > > > +                               EMIT1(add_2mod(0x40, src_reg, dst_reg));
> > > > > > +                       EMIT3(add_2mod(0x0f, src_reg, dst_reg), 0xbf,
> > > > 
> > > > Nit: Based on the same manual, I don't understand why
> > > >       add_2mod(0x0f, src_reg, dst_reg) is used; a plain 0x0f should suffice
> > > >       (but I tried it both ways and it works...).
> > > 
> > >  From the above online assembler website.
> > > 
> > > But I will check the doc to see whether it can be simplified.
> > 
> > I tried all combinations of r0..r9 for 64/32-bit destinations,
> > 32/16/8 sources [1]:
> > - 0x40 based prefix is generated if any of the following is true:
> >   - dst is 64 bit
> >   - dst is ereg
> >   - src is ereg
> >   - dst is 32-bit and src is 'sil' (part of 'rsi', used for r2)
> >     (!) This one is surprising and web-site shows the same results.
> >         For example `movsx eax,sil` is encoded as `40 0F BE C6`,
> >         disassembling `0F BE C6` (w/o prefix) gives `movsx eax,dh`.

I think I found the place in the manual that explains the situation:

  3.7.2.1 Register Operands in 64-Bit Mode
  Register operands in 64-bit mode can be any of the following:
  - ...
  - 8-bit general-purpose registers: AL, BL, CL, DL, SIL, DIL, SPL, BPL,
    and R8B-R15B are available using REX prefixes;
    AL, BL, CL, DL, AH, BH, CH, DH are available without using REX prefixes.
  - ...

Vol. 1, page 3-21
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-1-manual.pdf
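
To make the rule concrete, here is a minimal sketch of when a
register-to-register movsx with an 8-bit source has to carry a REX prefix.
This is not the JIT code: the helper name is made up, and it assumes raw
hardware register numbers (0 = rax ... 15 = r15), not BPF register numbers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of bpf_jit_comp.c.
 * 'dst' and 'src' are hardware register numbers (0 = rax ... 15 = r15).
 * Returns true if `movsx dst32, src8` must carry a REX prefix.
 */
static bool movsx8_needs_rex(int dst, int src)
{
	assert(dst >= 0 && dst < 16 && src >= 0 && src < 16);

	/* r8..r15 need REX.R (dst) or REX.B (src) to be addressable. */
	if (dst >= 8 || src >= 8)
		return true;
	/*
	 * Without REX, byte register encodings 4..7 mean ah, ch, dh, bh.
	 * With any REX prefix they mean spl, bpl, sil, dil instead.
	 */
	return src >= 4;
}
```

With rsi being hardware register 6, movsx8_needs_rex(0, 6) is true, which
is the `movsx eax,sil` case; movsx8_needs_rex(0, 0) (`movsx eax,al`) is
false.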

> > - opcodes:
> >   - 63      64-bit dst, 32-bit src
> >   - 0F BF   64-bit dst, 16-bit src
> >   - 0F BE   64-bit dst,  8-bit src
> >   - 0F BF   32-bit dst, 16-bit src (same as 64-bit dst)
> >   - 0F BE   32-bit dst,  8-bit src (same as 64-bit dst)
> > 
> > The script is at [2] (it is not particularly interesting, but here it is
> > in case you want to tweak it).
> > 
> > [1] https://gist.github.com/eddyz87/94b35fd89f023c43dd2480e196b28ea1
> > [2] https://gist.github.com/eddyz87/60991379c547df11d30fa91901862227
> > 
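
The prefix rules and the opcode list above can be combined into a small
standalone encoder. What follows is only a sketch for experiments, under
the assumption that registers are given as hardware numbers (0 = rax ...
15 = r15); encode_movsx() is a hypothetical name, not the helper from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical standalone encoder, for experiments only.
 * Encodes register-to-register `movsx/movsxd dst, src` into buf and
 * returns the number of bytes written (3 or 4).
 * dst/src are hardware register numbers, src_bits is 8, 16 or 32;
 * src_bits == 32 is only accepted with a 64-bit destination.
 */
static size_t encode_movsx(uint8_t *buf, bool dst64, int src_bits,
			   int dst, int src)
{
	size_t n = 0;
	uint8_t rex = 0x40;
	bool need_rex = dst64;

	assert(dst >= 0 && dst < 16 && src >= 0 && src < 16);
	assert(src_bits == 8 || src_bits == 16 || (src_bits == 32 && dst64));

	if (dst64)
		rex |= 0x08;		/* REX.W: 64-bit operand size */
	if (dst >= 8) {
		rex |= 0x04;		/* REX.R extends ModRM.reg (dst) */
		need_rex = true;
	}
	if (src >= 8) {
		rex |= 0x01;		/* REX.B extends ModRM.rm (src) */
		need_rex = true;
	}
	/* spl, bpl, sil, dil are only reachable with a REX prefix */
	if (src_bits == 8 && src >= 4)
		need_rex = true;

	if (need_rex)
		buf[n++] = rex;
	if (src_bits == 32) {
		buf[n++] = 0x63;	/* movsxd */
	} else {
		buf[n++] = 0x0f;
		buf[n++] = src_bits == 8 ? 0xbe : 0xbf;
	}
	/* ModRM: mod = 11 (register direct), reg = dst, rm = src */
	buf[n++] = 0xc0 | ((dst & 7) << 3) | (src & 7);
	return n;
}
```

For example, encode_movsx(buf, false, 8, 0, 6) (i.e. `movsx eax,sil`)
writes `40 0f be c6`, the same bytes as quoted above.
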
> > > > > > +                             add_2reg(0xC0, src_reg, dst_reg));
> > > > > > +               }
> > > > > > +       }
> > > > > > +
> > > > > > +       *pprog = prog;
> > > > > > +}
> > [...]
> 
> 
>