All of lore.kernel.org
 help / color / mirror / Atom feed
* bpf: Propose some new instructions for -mcpu=v4
@ 2023-02-09 22:54 Yonghong Song
  2023-02-09 23:36 ` Jose E. Marchesi
  2023-02-09 23:39 ` Andrii Nakryiko
  0 siblings, 2 replies; 11+ messages in thread
From: Yonghong Song @ 2023-02-09 22:54 UTC (permalink / raw)
  To: alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jose E. Marchesi, David Faust, James Hilliard, bpf,
	Martin KaFai Lau
  Cc: yhs

Over the past, there are some discussions to extend bpf
instruction ISA to accommodate some new use cases or
fix some potential issues. These new instructions will
be included in new cpu flavor -mcpu=v4.

The following are the proposal
to add new instructions in 6 different categories.
The proposal is a little bit rough. You can find bpf insn
background information in Documentation/bpf/instruction-set.rst.
You comments or suggestions are welcome!

SDIV/SMOD (signed div and mod)
==============================

bpf already has unsigned DIV and MOD. They are encoded as

   insn    code(4 bits)     source(1 bit)     instruction class(3 bit) 
off(16 bits)
   DIV     0x3              0/1               BPF_ALU/BPF_ALU64          0
   MOD     0x9              0/1               BPF_ALU/BPF_ALU64          0

The current 'code' field only has two value left, 0xe and 0xf.
gcc used these two values (0xe and 0xf) for SDIV and SMOD.
But using these two values takes up all 'code' space and makes
future extension hard.

Here, I propose to encode SDIV/SMOD like below:

   insn    code(4 bits)     source(1 bit)     instruction class(3 bit) 
off(16 bits)
   DIV     0x3              0/1               BPF_ALU/BPF_ALU64          1
   MOD     0x9              0/1               BPF_ALU/BPF_ALU64          1

Basically, we reuse the same 'code' value but changing 'off' from 0 to 1
to indicate signed div/mod.

Sign extend load
================

Currently llvm generated normal load instructions are encoded like below.

   mode(3 bits)      size(2 bits)    instruction class(3 bits)
   BPF_MEM (0x3)     8/16/32/64      BPF_LDX

For mode, existing used values are 0x0, 0x1, 0x2, 0x3, 0x6.
The proposal is to use mod value 0x4 to encode sign extend loads.

   alu32_mode  mode(3 bits)      size(2 bits)    instruction class(3 bits)
   yes         BPF_SMEM (0x4)    8/16            BPF_LDX
   no          BPF_SMEM (0x4)    8/16/32         BPF_LDX

Sign extend register mov
========================

Current BPF_MOV insn is encoded as
   insn    code(4 bits)     source(1 bit)     instruction class(3 bit) 
off(16 bits)
   MOV     0xb              0/1               BPF_ALU/BPF_ALU64          0

Let us support sign extended move insn as defined below:

   alu32_mode  insn    code(4 bits)    source(1 bit)    instruction 
class(3 bit)   off(16 bits)
   yes         MOVS    0xb             0/1              BPF_ALU 
           8/16
   no          MOVS    0xb             0/1              BPF_ALU64 
           8/16/32

In the above sign extended mov instruction, 'off' represents the 'size'.
For example, if alu32 mode is enabled, and 'off' is 8, which means sign 
extend a 8-bit
value (imm or register) to a 32-bit value. If alu32 mode is not enabled, 
the same 8-bit
value will sign extend to a 64-bit value.

32-bit JA
=========

Currently, the whole range of operations with BPF_JMP32/BPF_JMP insn are 
implemented like below

   ========  =====  =========================  ============
   code      value  description                notes
   ========  =====  =========================  ============
   BPF_JA    0x00   PC += off                  BPF_JMP only
   BPF_JEQ   0x10   PC += off if dst == src
   BPF_JGT   0x20   PC += off if dst > src     unsigned
   BPF_JGE   0x30   PC += off if dst >= src    unsigned
   BPF_JSET  0x40   PC += off if dst & src
   BPF_JNE   0x50   PC += off if dst != src
   BPF_JSGT  0x60   PC += off if dst > src     signed
   BPF_JSGE  0x70   PC += off if dst >= src    signed
   BPF_CALL  0x80   function call
   BPF_EXIT  0x90   function / program return  BPF_JMP only
   BPF_JLT   0xa0   PC += off if dst < src     unsigned
   BPF_JLE   0xb0   PC += off if dst <= src    unsigned
   BPF_JSLT  0xc0   PC += off if dst < src     signed
   BPF_JSLE  0xd0   PC += off if dst <= src    signed
   ========  =====  =========================  ============

Here the 'off' is 16 bit so the range of jump is [-32768, 32767].
In rare cases, people may have large programs or have loops fully unrolled.
This may cause some jump offset beyond the above range. In current
llvm implementation, wrong code (after truncation) will be generated.

To fix this issue, the following new insn is proposed

   ========  =====  =========================  ============
   code      value  description                notes
   ========  =====  =========================  ============
   BPF_JA    0x00   PC += imm                  BPF_JMP32 only, src = 1

The way, the jump offset range become [-2^31, 2^31 - 1].

For other jump instructions, e.g., BPF_JEQ, with a jmp offset
beyond [-32768, 32767]. It can be simulated with a
'BPF_JA (PC += imm)' followed by the original
BPF_JEQ with the range 'off', or BPF_JEQ with a short range followed
by a BPF_JA.

bswap16/32/64
=============

Currently, llvm does not generate bswap16/32/64 properly.
Rather it generates be16/32/64 and le16/32/64 instructions based on
endianness of the current bpf target in compilation.
The existing encode looks below:

   bpf target      insn    code(4 bits)     source(1 bit)
     instruction class(3 bit)   imm(32 bits)
   big endian      LE      0xd              LE(0)
     BPF_ALU                    16/32/64
   little endian   BE      0xd              BE(1)
     BPF_ALU                    16/32/64

LE insn will do swap if the running target is big endian.
BE insn will do swap if the running target is little endian.
See kernel/bpf/core.c for details.

The new bswap instruction will have the following encoding:
insn    code(4 bits)     source(1 bit)     instruction class(3 bit) 
imm(32 bits)
BSWAP   0xd              0                 BPF_ALU64 
16/32/64

The BSWAP insn will be swap unconditionally.

ST
==

The kernel has already supported BPF_ST insn like below,

   mode(3 bits)      size(2 bits)    instruction class(3 bits)
   BPF_MEM (0x3)     8/16/32/64      BPF_ST

The semantics is:
   *(size *) (dst_reg + off) = imm32
LLVM just needs to implement this instruction under -mcpu=v4. looks
like gcc can already generate this instruction.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bpf: Propose some new instructions for -mcpu=v4
  2023-02-09 22:54 bpf: Propose some new instructions for -mcpu=v4 Yonghong Song
@ 2023-02-09 23:36 ` Jose E. Marchesi
  2023-02-09 23:45   ` Jose E. Marchesi
                     ` (2 more replies)
  2023-02-09 23:39 ` Andrii Nakryiko
  1 sibling, 3 replies; 11+ messages in thread
From: Jose E. Marchesi @ 2023-02-09 23:36 UTC (permalink / raw)
  To: Yonghong Song
  Cc: alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	David Faust, James Hilliard, bpf, Martin KaFai Lau


Hi Yonghong.
Thanks for the proposal!

> SDIV/SMOD (signed div and mod)
> ==============================
>
> bpf already has unsigned DIV and MOD. They are encoded as
>
>   insn    code(4 bits)     source(1 bit)     instruction class(3 bit)
>   off(16 bits)
>   DIV     0x3              0/1               BPF_ALU/BPF_ALU64          0
>   MOD     0x9              0/1               BPF_ALU/BPF_ALU64          0
>
> The current 'code' field only has two value left, 0xe and 0xf.
> gcc used these two values (0xe and 0xf) for SDIV and SMOD.
> But using these two values takes up all 'code' space and makes
> future extension hard.
>
> Here, I propose to encode SDIV/SMOD like below:
>
>   insn    code(4 bits)     source(1 bit)     instruction class(3 bit)
>   off(16 bits)
>   DIV     0x3              0/1               BPF_ALU/BPF_ALU64          1
>   MOD     0x9              0/1               BPF_ALU/BPF_ALU64          1
>
> Basically, we reuse the same 'code' value but changing 'off' from 0 to 1
> to indicate signed div/mod.

I have a general concern about using instruction operands to encode
opcodes (in this case, 'off').

At the moment we have two BPF instruction formats:

 - The 64-bit instructions:

    code:8 regs:8 offset:16 imm:32

 - The 128-bit instructions:

    code:8 regs:8 offset:16 imm:32 unused:32 imm:32 

Of these, `code', `regs' and `unused' are what is commonly known as
instruction fields.  These are typically used for register numbers,
flags, and opcodes.

On the other hand, offset, imm32 and imm:32:::imm:32 are instruction
operands (the later is non-contiguous and conforms the 64-bit operand in
the 128-bit instruction).

The main difference between these is that the bytes conforming
instruction operands are themselves impacted by endianness, on top on
the endianness effect on the whole instruction.  (The weird endian-flip
in the two nibbles of `regs' is unfortunate, but I guess there is
nothing we can do about it at this point and I count them as
non-operands.)

If you use an instruction operand (such as `offset') in order to act as
an opcode, you incur in two inconveniences:

1) In effect you have "moving" opcodes that depend on the endianness.
   The opcode for signed-operation will be 0x1 in big-endian BPF, but
   0x8000 in little-endian bpf.

2) You lose the ability of easily adding more complementary opcodes in
   these 16 bits in the future, in case you ever need them.

As far as I have seen in other architectures, the usual way of doing
this is to add an additional instruction format, in this case for the
class of arithmetic instructions, where the bits dedicated to the unused
operand (offset) becomes a new opcodes field:

  - 32-bit arithmetic instructions:

    code:8 regs:8 code2:16 imm:32

Where code2 is now an additional field (not an operand) that provides
extra additional opcode space for this particular class of instructions.
This can be divided in a 1-bit field to signify "signed" and the rest
reserved for future use:

   opcode2 ::= unused(15) signed(1)

Thoughts?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bpf: Propose some new instructions for -mcpu=v4
  2023-02-09 22:54 bpf: Propose some new instructions for -mcpu=v4 Yonghong Song
  2023-02-09 23:36 ` Jose E. Marchesi
@ 2023-02-09 23:39 ` Andrii Nakryiko
  2023-02-10 18:53   ` Yonghong Song
  1 sibling, 1 reply; 11+ messages in thread
From: Andrii Nakryiko @ 2023-02-09 23:39 UTC (permalink / raw)
  To: Yonghong Song
  Cc: alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jose E. Marchesi, David Faust, James Hilliard, bpf,
	Martin KaFai Lau

On Thu, Feb 9, 2023 at 2:55 PM Yonghong Song <yhs@meta.com> wrote:
>
> Over the past, there are some discussions to extend bpf
> instruction ISA to accommodate some new use cases or
> fix some potential issues. These new instructions will
> be included in new cpu flavor -mcpu=v4.
>
> The following are the proposal
> to add new instructions in 6 different categories.
> The proposal is a little bit rough. You can find bpf insn
> background information in Documentation/bpf/instruction-set.rst.
> You comments or suggestions are welcome!
>

Great that we are trying to fix and complete the instruction set! Just
one comment/question below for condition jumps.

[...]

>
> 32-bit JA
> =========
>
> Currently, the whole range of operations with BPF_JMP32/BPF_JMP insn are
> implemented like below
>
>    ========  =====  =========================  ============
>    code      value  description                notes
>    ========  =====  =========================  ============
>    BPF_JA    0x00   PC += off                  BPF_JMP only
>    BPF_JEQ   0x10   PC += off if dst == src
>    BPF_JGT   0x20   PC += off if dst > src     unsigned
>    BPF_JGE   0x30   PC += off if dst >= src    unsigned
>    BPF_JSET  0x40   PC += off if dst & src
>    BPF_JNE   0x50   PC += off if dst != src
>    BPF_JSGT  0x60   PC += off if dst > src     signed
>    BPF_JSGE  0x70   PC += off if dst >= src    signed
>    BPF_CALL  0x80   function call
>    BPF_EXIT  0x90   function / program return  BPF_JMP only
>    BPF_JLT   0xa0   PC += off if dst < src     unsigned
>    BPF_JLE   0xb0   PC += off if dst <= src    unsigned
>    BPF_JSLT  0xc0   PC += off if dst < src     signed
>    BPF_JSLE  0xd0   PC += off if dst <= src    signed
>    ========  =====  =========================  ============
>
> Here the 'off' is 16 bit so the range of jump is [-32768, 32767].
> In rare cases, people may have large programs or have loops fully unrolled.
> This may cause some jump offset beyond the above range. In current
> llvm implementation, wrong code (after truncation) will be generated.
>
> To fix this issue, the following new insn is proposed
>
>    ========  =====  =========================  ============
>    code      value  description                notes
>    ========  =====  =========================  ============
>    BPF_JA    0x00   PC += imm                  BPF_JMP32 only, src = 1
>
> The way, the jump offset range become [-2^31, 2^31 - 1].
>
> For other jump instructions, e.g., BPF_JEQ, with a jmp offset
> beyond [-32768, 32767]. It can be simulated with a
> 'BPF_JA (PC += imm)' followed by the original
> BPF_JEQ with the range 'off', or BPF_JEQ with a short range followed
> by a BPF_JA.

Why not implement the same approach (using imm if src = 1) for all the
conditional jumps? Just too much JIT work or some other reasons?

[...]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bpf: Propose some new instructions for -mcpu=v4
  2023-02-09 23:36 ` Jose E. Marchesi
@ 2023-02-09 23:45   ` Jose E. Marchesi
  2023-02-10  1:45   ` Jose E. Marchesi
  2023-02-10 18:37   ` Yonghong Song
  2 siblings, 0 replies; 11+ messages in thread
From: Jose E. Marchesi @ 2023-02-09 23:45 UTC (permalink / raw)
  To: Yonghong Song
  Cc: alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	David Faust, James Hilliard, bpf, Martin KaFai Lau


> 1) In effect you have "moving" opcodes that depend on the endianness.
>    The opcode for signed-operation will be 0x1 in big-endian BPF, but
>    0x8000 in little-endian bpf.

s/0x8000/0x0100/.

(Yeah I'm bad with such things :'D)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bpf: Propose some new instructions for -mcpu=v4
  2023-02-09 23:36 ` Jose E. Marchesi
  2023-02-09 23:45   ` Jose E. Marchesi
@ 2023-02-10  1:45   ` Jose E. Marchesi
  2023-02-10 14:29     ` Jose E. Marchesi
  2023-02-10 18:56     ` Yonghong Song
  2023-02-10 18:37   ` Yonghong Song
  2 siblings, 2 replies; 11+ messages in thread
From: Jose E. Marchesi @ 2023-02-10  1:45 UTC (permalink / raw)
  To: Yonghong Song
  Cc: alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	David Faust, James Hilliard, bpf, Martin KaFai Lau


> Hi Yonghong.
> Thanks for the proposal!
>
>> SDIV/SMOD (signed div and mod)
>> ==============================
>>
>> bpf already has unsigned DIV and MOD. They are encoded as
>>
>>   insn    code(4 bits)     source(1 bit)     instruction class(3 bit)
>>   off(16 bits)
>>   DIV     0x3              0/1               BPF_ALU/BPF_ALU64          0
>>   MOD     0x9              0/1               BPF_ALU/BPF_ALU64          0
>>
>> The current 'code' field only has two value left, 0xe and 0xf.
>> gcc used these two values (0xe and 0xf) for SDIV and SMOD.
>> But using these two values takes up all 'code' space and makes
>> future extension hard.
>>
>> Here, I propose to encode SDIV/SMOD like below:
>>
>>   insn    code(4 bits)     source(1 bit)     instruction class(3 bit)
>>   off(16 bits)
>>   DIV     0x3              0/1               BPF_ALU/BPF_ALU64          1
>>   MOD     0x9              0/1               BPF_ALU/BPF_ALU64          1
>>
>> Basically, we reuse the same 'code' value but changing 'off' from 0 to 1
>> to indicate signed div/mod.
>
> I have a general concern about using instruction operands to encode
> opcodes (in this case, 'off').
>
> At the moment we have two BPF instruction formats:
>
>  - The 64-bit instructions:
>
>     code:8 regs:8 offset:16 imm:32
>
>  - The 128-bit instructions:
>
>     code:8 regs:8 offset:16 imm:32 unused:32 imm:32 
>
> Of these, `code', `regs' and `unused' are what is commonly known as
> instruction fields.  These are typically used for register numbers,
> flags, and opcodes.
>
> On the other hand, offset, imm32 and imm:32:::imm:32 are instruction
> operands (the later is non-contiguous and conforms the 64-bit operand in
> the 128-bit instruction).
>
> The main difference between these is that the bytes conforming
> instruction operands are themselves impacted by endianness, on top on
> the endianness effect on the whole instruction.  (The weird endian-flip
> in the two nibbles of `regs' is unfortunate, but I guess there is
> nothing we can do about it at this point and I count them as
> non-operands.)
>
> If you use an instruction operand (such as `offset') in order to act as
> an opcode, you incur in two inconveniences:
>
> 1) In effect you have "moving" opcodes that depend on the endianness.
>    The opcode for signed-operation will be 0x1 in big-endian BPF, but
>    0x8000 in little-endian bpf.
>
> 2) You lose the ability of easily adding more complementary opcodes in
>    these 16 bits in the future, in case you ever need them.
>
> As far as I have seen in other architectures, the usual way of doing
> this is to add an additional instruction format, in this case for the
> class of arithmetic instructions, where the bits dedicated to the unused
> operand (offset) becomes a new opcodes field:
>
>   - 32-bit arithmetic instructions:
>
>     code:8 regs:8 code2:16 imm:32
>
> Where code2 is now an additional field (not an operand) that provides
> extra additional opcode space for this particular class of instructions.
> This can be divided in a 1-bit field to signify "signed" and the rest
> reserved for future use:
>
>    opcode2 ::= unused(15) signed(1)

Actually this would be just for DIV/MOD instructions, so the new format
should only apply to them.  The new format would be something like:

  - 64-bit ALU/ALU64 div/mod instructions (code=3,9):

    code:8 regs:8 unused:15 signed:1 imm:32

And for the rest of the ALU and ALU64 instructions
(code=0,1,2,4,5,6,7,8,a,b,c,d):

  - 64-bit ALU/ALU64 instructions:

    code:8 regs:8 unused:16 imm:32

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bpf: Propose some new instructions for -mcpu=v4
  2023-02-10  1:45   ` Jose E. Marchesi
@ 2023-02-10 14:29     ` Jose E. Marchesi
  2023-02-10 22:05       ` Alexei Starovoitov
  2023-02-10 18:56     ` Yonghong Song
  1 sibling, 1 reply; 11+ messages in thread
From: Jose E. Marchesi @ 2023-02-10 14:29 UTC (permalink / raw)
  To: Yonghong Song
  Cc: alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	David Faust, James Hilliard, bpf, Martin KaFai Lau


>> Hi Yonghong.
>> Thanks for the proposal!
>>
>>> SDIV/SMOD (signed div and mod)
>>> ==============================
>>>
>>> bpf already has unsigned DIV and MOD. They are encoded as
>>>
>>>   insn    code(4 bits)     source(1 bit)     instruction class(3 bit)
>>>   off(16 bits)
>>>   DIV     0x3              0/1               BPF_ALU/BPF_ALU64          0
>>>   MOD     0x9              0/1               BPF_ALU/BPF_ALU64          0
>>>
>>> The current 'code' field only has two value left, 0xe and 0xf.
>>> gcc used these two values (0xe and 0xf) for SDIV and SMOD.
>>> But using these two values takes up all 'code' space and makes
>>> future extension hard.
>>>
>>> Here, I propose to encode SDIV/SMOD like below:
>>>
>>>   insn    code(4 bits)     source(1 bit)     instruction class(3 bit)
>>>   off(16 bits)
>>>   DIV     0x3              0/1               BPF_ALU/BPF_ALU64          1
>>>   MOD     0x9              0/1               BPF_ALU/BPF_ALU64          1
>>>
>>> Basically, we reuse the same 'code' value but changing 'off' from 0 to 1
>>> to indicate signed div/mod.
>>
>> I have a general concern about using instruction operands to encode
>> opcodes (in this case, 'off').
>>
>> At the moment we have two BPF instruction formats:
>>
>>  - The 64-bit instructions:
>>
>>     code:8 regs:8 offset:16 imm:32
>>
>>  - The 128-bit instructions:
>>
>>     code:8 regs:8 offset:16 imm:32 unused:32 imm:32 
>>
>> Of these, `code', `regs' and `unused' are what is commonly known as
>> instruction fields.  These are typically used for register numbers,
>> flags, and opcodes.
>>
>> On the other hand, offset, imm32 and imm:32:::imm:32 are instruction
>> operands (the later is non-contiguous and conforms the 64-bit operand in
>> the 128-bit instruction).
>>
>> The main difference between these is that the bytes conforming
>> instruction operands are themselves impacted by endianness, on top on
>> the endianness effect on the whole instruction.  (The weird endian-flip
>> in the two nibbles of `regs' is unfortunate, but I guess there is
>> nothing we can do about it at this point and I count them as
>> non-operands.)
>>
>> If you use an instruction operand (such as `offset') in order to act as
>> an opcode, you incur in two inconveniences:
>>
>> 1) In effect you have "moving" opcodes that depend on the endianness.
>>    The opcode for signed-operation will be 0x1 in big-endian BPF, but
>>    0x8000 in little-endian bpf.
>>
>> 2) You lose the ability of easily adding more complementary opcodes in
>>    these 16 bits in the future, in case you ever need them.
>>
>> As far as I have seen in other architectures, the usual way of doing
>> this is to add an additional instruction format, in this case for the
>> class of arithmetic instructions, where the bits dedicated to the unused
>> operand (offset) becomes a new opcodes field:
>>
>>   - 32-bit arithmetic instructions:
>>
>>     code:8 regs:8 code2:16 imm:32
>>
>> Where code2 is now an additional field (not an operand) that provides
>> extra additional opcode space for this particular class of instructions.
>> This can be divided in a 1-bit field to signify "signed" and the rest
>> reserved for future use:
>>
>>    opcode2 ::= unused(15) signed(1)
>
> Actually this would be just for DIV/MOD instructions, so the new format
> should only apply to them.  The new format would be something like:
>
>   - 64-bit ALU/ALU64 div/mod instructions (code=3,9):
>
>     code:8 regs:8 unused:15 signed:1 imm:32
>
> And for the rest of the ALU and ALU64 instructions
> (code=0,1,2,4,5,6,7,8,a,b,c,d):
>
>   - 64-bit ALU/ALU64 instructions:
>
>     code:8 regs:8 unused:16 imm:32

Re-reading what I wrote above I realize that it is messy and uses
confusing terms that are not used in instruction-set.rst, and it also
has mistakes.  Sorry about that, 3:30AM posting.

After sleeping over it I figured I better start over and this time I
keep it short and stick to instruction-set.rst terminology :)

What I am proposing is, instead of using the `offset' multi-byte field
to encode an opcode:

  =============  =======  =======  =======  ============
  32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
  =============  =======  =======  =======  ============
  imm            offset   src_reg  dst_reg  opcode
  =============  =======  =======  =======  ============

To introduce a new opcode2 field for ALU32/ALU instructions like this:

  =============  ======= ======= =======  =======  ============
  32 bits (MSB)  8 bits  8 bits  4 bits   4 bits   8 bits (LSB)
  =============  ======= ======= =======  =======  ============
  imm            opcode2 unused  src_reg  dst_reg  opcode
  =============  ======= ======= =======  =======  ============

This way:

1) The opcode2 field's stored value will be the same regardless of
   endianness.

2) The remaining 8 bits stay free for future extensions.

That is from a purely ISA perspective.  But then this morning I thought
about the kernel and its uapi.  This is in uapi/linux/bpf.h:

  struct bpf_insn {
  	__u8	code;		/* opcode */
  	__u8	dst_reg:4;	/* dest register */
  	__u8	src_reg:4;	/* source register */
  	__s16	off;		/* signed offset */
  	__s32	imm;		/* signed immediate constant */
  };

If the bpf_insn struct is supposed to reflect the encoding of stored BPF
instructions, and since it is part of the uapi, does this mean we are
stuck with that instruction format (and only that format) forever?

Because if changing the internal structure of the bpf_insn struct is a
no-no, then there is no way to expand the existing opcode space without
using unused multi-byte fields like `off' as such :/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bpf: Propose some new instructions for -mcpu=v4
  2023-02-09 23:36 ` Jose E. Marchesi
  2023-02-09 23:45   ` Jose E. Marchesi
  2023-02-10  1:45   ` Jose E. Marchesi
@ 2023-02-10 18:37   ` Yonghong Song
  2 siblings, 0 replies; 11+ messages in thread
From: Yonghong Song @ 2023-02-10 18:37 UTC (permalink / raw)
  To: Jose E. Marchesi
  Cc: alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	David Faust, James Hilliard, bpf, Martin KaFai Lau



On 2/9/23 3:36 PM, Jose E. Marchesi wrote:
> 
> Hi Yonghong.
> Thanks for the proposal!
> 
>> SDIV/SMOD (signed div and mod)
>> ==============================
>>
>> bpf already has unsigned DIV and MOD. They are encoded as
>>
>>    insn    code(4 bits)     source(1 bit)     instruction class(3 bit)
>>    off(16 bits)
>>    DIV     0x3              0/1               BPF_ALU/BPF_ALU64          0
>>    MOD     0x9              0/1               BPF_ALU/BPF_ALU64          0
>>
>> The current 'code' field only has two value left, 0xe and 0xf.
>> gcc used these two values (0xe and 0xf) for SDIV and SMOD.
>> But using these two values takes up all 'code' space and makes
>> future extension hard.
>>
>> Here, I propose to encode SDIV/SMOD like below:
>>
>>    insn    code(4 bits)     source(1 bit)     instruction class(3 bit)
>>    off(16 bits)
>>    DIV     0x3              0/1               BPF_ALU/BPF_ALU64          1
>>    MOD     0x9              0/1               BPF_ALU/BPF_ALU64          1
>>
>> Basically, we reuse the same 'code' value but changing 'off' from 0 to 1
>> to indicate signed div/mod.
> 
> I have a general concern about using instruction operands to encode
> opcodes (in this case, 'off').
> 
> At the moment we have two BPF instruction formats:
> 
>   - The 64-bit instructions:
> 
>      code:8 regs:8 offset:16 imm:32
> 
>   - The 128-bit instructions:
> 
>      code:8 regs:8 offset:16 imm:32 unused:32 imm:32
> 
> Of these, `code', `regs' and `unused' are what is commonly known as
> instruction fields.  These are typically used for register numbers,
> flags, and opcodes.
> 
> On the other hand, offset, imm32 and imm:32:::imm:32 are instruction
> operands (the later is non-contiguous and conforms the 64-bit operand in
> the 128-bit instruction).
> 
> The main difference between these is that the bytes conforming
> instruction operands are themselves impacted by endianness, on top on
> the endianness effect on the whole instruction.  (The weird endian-flip
> in the two nibbles of `regs' is unfortunate, but I guess there is
> nothing we can do about it at this point and I count them as
> non-operands.)
> 
> If you use an instruction operand (such as `offset') in order to act as
> an opcode, you incur in two inconveniences:
> 
> 1) In effect you have "moving" opcodes that depend on the endianness.
>     The opcode for signed-operation will be 0x1 in big-endian BPF, but
>     0x8000 in little-endian bpf.
> 
> 2) You lose the ability of easily adding more complementary opcodes in
>     these 16 bits in the future, in case you ever need them.
> 
> As far as I have seen in other architectures, the usual way of doing
> this is to add an additional instruction format, in this case for the
> class of arithmetic instructions, where the bits dedicated to the unused
> operand (offset) becomes a new opcodes field:
> 
>    - 32-bit arithmetic instructions:
> 
>      code:8 regs:8 code2:16 imm:32
> 
> Where code2 is now an additional field (not an operand) that provides
> extra additional opcode space for this particular class of instructions.
> This can be divided in a 1-bit field to signify "signed" and the rest
> reserved for future use:
> 
>     opcode2 ::= unused(15) signed(1)
> 
> Thoughts?

If I understand correctly, you proposed something like

insn    code(4 bits)     source(1 bit)     instruction class(3 bit)
EXT     0xe              0/1               BPF_ALU/BPF_ALU64

The insn BPF_EXT means the actual 'code' is in 'off' (16 bits), right?
I think this could work.

But we already have precedence to use off/imm fields for insn encoding,
e.g., newer atomic insns like XCHG using imm for the 'code' and we use
the BPF_ATOMIC to indicate a 'class' of atomic operations.
In this specific case, we use MOD/DIV as the 'code' which implies
a class of mod/div variants and the 'off' is the way to differentiate.

There are pro's and con's for each approach for the above EXT vs.
my proposal. But maybe we could reserve EXT for future use?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bpf: Propose some new instructions for -mcpu=v4
  2023-02-09 23:39 ` Andrii Nakryiko
@ 2023-02-10 18:53   ` Yonghong Song
  2023-02-10 21:47     ` Andrii Nakryiko
  0 siblings, 1 reply; 11+ messages in thread
From: Yonghong Song @ 2023-02-10 18:53 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jose E. Marchesi, David Faust, James Hilliard, bpf,
	Martin KaFai Lau



On 2/9/23 3:39 PM, Andrii Nakryiko wrote:
> On Thu, Feb 9, 2023 at 2:55 PM Yonghong Song <yhs@meta.com> wrote:
>>
>> Over the past, there are some discussions to extend bpf
>> instruction ISA to accommodate some new use cases or
>> fix some potential issues. These new instructions will
>> be included in new cpu flavor -mcpu=v4.
>>
>> The following are the proposal
>> to add new instructions in 6 different categories.
>> The proposal is a little bit rough. You can find bpf insn
>> background information in Documentation/bpf/instruction-set.rst.
>> You comments or suggestions are welcome!
>>
> 
> Great that we are trying to fix and complete the instruction set! Just
> one comment/question below for condition jumps.
> 
> [...]
> 
>>
>> 32-bit JA
>> =========
>>
>> Currently, the whole range of operations with BPF_JMP32/BPF_JMP insn are
>> implemented like below
>>
>>     ========  =====  =========================  ============
>>     code      value  description                notes
>>     ========  =====  =========================  ============
>>     BPF_JA    0x00   PC += off                  BPF_JMP only
>>     BPF_JEQ   0x10   PC += off if dst == src
>>     BPF_JGT   0x20   PC += off if dst > src     unsigned
>>     BPF_JGE   0x30   PC += off if dst >= src    unsigned
>>     BPF_JSET  0x40   PC += off if dst & src
>>     BPF_JNE   0x50   PC += off if dst != src
>>     BPF_JSGT  0x60   PC += off if dst > src     signed
>>     BPF_JSGE  0x70   PC += off if dst >= src    signed
>>     BPF_CALL  0x80   function call
>>     BPF_EXIT  0x90   function / program return  BPF_JMP only
>>     BPF_JLT   0xa0   PC += off if dst < src     unsigned
>>     BPF_JLE   0xb0   PC += off if dst <= src    unsigned
>>     BPF_JSLT  0xc0   PC += off if dst < src     signed
>>     BPF_JSLE  0xd0   PC += off if dst <= src    signed
>>     ========  =====  =========================  ============
>>
>> Here the 'off' is 16 bit so the range of jump is [-32768, 32767].
>> In rare cases, people may have large programs or have loops fully unrolled.
>> This may cause some jump offset beyond the above range. In current
>> llvm implementation, wrong code (after truncation) will be generated.
>>
>> To fix this issue, the following new insn is proposed
>>
>>     ========  =====  =========================  ============
>>     code      value  description                notes
>>     ========  =====  =========================  ============
>>     BPF_JA    0x00   PC += imm                  BPF_JMP32 only, src = 1
>>
>> The way, the jump offset range become [-2^31, 2^31 - 1].
>>
>> For other jump instructions, e.g., BPF_JEQ, with a jmp offset
>> beyond [-32768, 32767]. It can be simulated with a
>> 'BPF_JA (PC += imm)' followed by the original
>> BPF_JEQ with the range 'off', or BPF_JEQ with a short range followed
>> by a BPF_JA.
> 
> Why not implement the same approach (using imm if src = 1) for all the
> conditional jumps? Just too much JIT work or some other reasons?

We cannot use 'src' since 'src' is used in conditional jump, e.g.,

   ========  =====  =========================  ============
   code      value  description                notes
   ========  =====  =========================  ============
   BPF_JEQ   0x10   PC += off if dst == src

In this particular case, there is no good way to extend
the insn with range [-2^31, 2^31 - 1] as 'off/dst/src' all
used by the above insn. The sample extension to original
BPF_JEQ seems not working so I came up with the above
BPF_JA (32bit range) + BPF_JEQ(16 bit range) approach.
It is ugly and increase implementation complexity, but
considering this is a corner case. It may not be
worthwhile to design a whole range of 32bit range of
BPF_JEQ/JGT/... instructions.

> 
> [...]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bpf: Propose some new instructions for -mcpu=v4
  2023-02-10  1:45   ` Jose E. Marchesi
  2023-02-10 14:29     ` Jose E. Marchesi
@ 2023-02-10 18:56     ` Yonghong Song
  1 sibling, 0 replies; 11+ messages in thread
From: Yonghong Song @ 2023-02-10 18:56 UTC (permalink / raw)
  To: Jose E. Marchesi
  Cc: alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	David Faust, James Hilliard, bpf, Martin KaFai Lau



On 2/9/23 5:45 PM, Jose E. Marchesi wrote:
> 
>> Hi Yonghong.
>> Thanks for the proposal!
>>
>>> SDIV/SMOD (signed div and mod)
>>> ==============================
>>>
>>> bpf already has unsigned DIV and MOD. They are encoded as
>>>
>>>    insn    code(4 bits)     source(1 bit)     instruction class(3 bit)
>>>    off(16 bits)
>>>    DIV     0x3              0/1               BPF_ALU/BPF_ALU64          0
>>>    MOD     0x9              0/1               BPF_ALU/BPF_ALU64          0
>>>
>>> The current 'code' field only has two value left, 0xe and 0xf.
>>> gcc used these two values (0xe and 0xf) for SDIV and SMOD.
>>> But using these two values takes up all 'code' space and makes
>>> future extension hard.
>>>
>>> Here, I propose to encode SDIV/SMOD like below:
>>>
>>>    insn    code(4 bits)     source(1 bit)     instruction class(3 bit)
>>>    off(16 bits)
>>>    DIV     0x3              0/1               BPF_ALU/BPF_ALU64          1
>>>    MOD     0x9              0/1               BPF_ALU/BPF_ALU64          1
>>>
>>> Basically, we reuse the same 'code' value but changing 'off' from 0 to 1
>>> to indicate signed div/mod.
>>
>> I have a general concern about using instruction operands to encode
>> opcodes (in this case, 'off').
>>
>> At the moment we have two BPF instruction formats:
>>
>>   - The 64-bit instructions:
>>
>>      code:8 regs:8 offset:16 imm:32
>>
>>   - The 128-bit instructions:
>>
>>      code:8 regs:8 offset:16 imm:32 unused:32 imm:32
>>
>> Of these, `code', `regs' and `unused' are what is commonly known as
>> instruction fields.  These are typically used for register numbers,
>> flags, and opcodes.
>>
>> On the other hand, offset, imm32 and imm:32:::imm:32 are instruction
>> operands (the later is non-contiguous and conforms the 64-bit operand in
>> the 128-bit instruction).
>>
>> The main difference between these is that the bytes conforming
>> instruction operands are themselves impacted by endianness, on top on
>> the endianness effect on the whole instruction.  (The weird endian-flip
>> in the two nibbles of `regs' is unfortunate, but I guess there is
>> nothing we can do about it at this point and I count them as
>> non-operands.)
>>
>> If you use an instruction operand (such as `offset') in order to act as
>> an opcode, you incur in two inconveniences:
>>
>> 1) In effect you have "moving" opcodes that depend on the endianness.
>>     The opcode for signed-operation will be 0x1 in big-endian BPF, but
>>     0x8000 in little-endian bpf.
>>
>> 2) You lose the ability of easily adding more complementary opcodes in
>>     these 16 bits in the future, in case you ever need them.
>>
>> As far as I have seen in other architectures, the usual way of doing
>> this is to add an additional instruction format, in this case for the
>> class of arithmetic instructions, where the bits dedicated to the unused
>> operand (offset) becomes a new opcodes field:
>>
>>    - 32-bit arithmetic instructions:
>>
>>      code:8 regs:8 code2:16 imm:32
>>
>> Where code2 is now an additional field (not an operand) that provides
>> extra additional opcode space for this particular class of instructions.
>> This can be divided in a 1-bit field to signify "signed" and the rest
>> reserved for future use:
>>
>>     opcode2 ::= unused(15) signed(1)
> 
> Actually this would be just for DIV/MOD instructions, so the new format
> should only apply to them.  The new format would be something like:
> 
>    - 64-bit ALU/ALU64 div/mod instructions (code=3,9):
> 
>      code:8 regs:8 unused:15 signed:1 imm:32
> 
> And for the rest of the ALU and ALU64 instructions
> (code=0,1,2,4,5,6,7,8,a,b,c,d):
> 
>    - 64-bit ALU/ALU64 instructions:
> 
>      code:8 regs:8 unused:16 imm:32

That is correct. My design can be interpreted this way.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bpf: Propose some new instructions for -mcpu=v4
  2023-02-10 18:53   ` Yonghong Song
@ 2023-02-10 21:47     ` Andrii Nakryiko
  0 siblings, 0 replies; 11+ messages in thread
From: Andrii Nakryiko @ 2023-02-10 21:47 UTC (permalink / raw)
  To: Yonghong Song
  Cc: alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jose E. Marchesi, David Faust, James Hilliard, bpf,
	Martin KaFai Lau

On Fri, Feb 10, 2023 at 10:54 AM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 2/9/23 3:39 PM, Andrii Nakryiko wrote:
> > On Thu, Feb 9, 2023 at 2:55 PM Yonghong Song <yhs@meta.com> wrote:
> >>
> >> Over the past, there are some discussions to extend bpf
> >> instruction ISA to accommodate some new use cases or
> >> fix some potential issues. These new instructions will
> >> be included in new cpu flavor -mcpu=v4.
> >>
> >> The following are the proposal
> >> to add new instructions in 6 different categories.
> >> The proposal is a little bit rough. You can find bpf insn
> >> background information in Documentation/bpf/instruction-set.rst.
> >> You comments or suggestions are welcome!
> >>
> >
> > Great that we are trying to fix and complete the instruction set! Just
> > one comment/question below for condition jumps.
> >
> > [...]
> >
> >>
> >> 32-bit JA
> >> =========
> >>
> >> Currently, the whole range of operations with BPF_JMP32/BPF_JMP insn are
> >> implemented like below
> >>
> >>     ========  =====  =========================  ============
> >>     code      value  description                notes
> >>     ========  =====  =========================  ============
> >>     BPF_JA    0x00   PC += off                  BPF_JMP only
> >>     BPF_JEQ   0x10   PC += off if dst == src
> >>     BPF_JGT   0x20   PC += off if dst > src     unsigned
> >>     BPF_JGE   0x30   PC += off if dst >= src    unsigned
> >>     BPF_JSET  0x40   PC += off if dst & src
> >>     BPF_JNE   0x50   PC += off if dst != src
> >>     BPF_JSGT  0x60   PC += off if dst > src     signed
> >>     BPF_JSGE  0x70   PC += off if dst >= src    signed
> >>     BPF_CALL  0x80   function call
> >>     BPF_EXIT  0x90   function / program return  BPF_JMP only
> >>     BPF_JLT   0xa0   PC += off if dst < src     unsigned
> >>     BPF_JLE   0xb0   PC += off if dst <= src    unsigned
> >>     BPF_JSLT  0xc0   PC += off if dst < src     signed
> >>     BPF_JSLE  0xd0   PC += off if dst <= src    signed
> >>     ========  =====  =========================  ============
> >>
> >> Here the 'off' is 16 bit so the range of jump is [-32768, 32767].
> >> In rare cases, people may have large programs or have loops fully unrolled.
> >> This may cause some jump offset beyond the above range. In current
> >> llvm implementation, wrong code (after truncation) will be generated.
> >>
> >> To fix this issue, the following new insn is proposed
> >>
> >>     ========  =====  =========================  ============
> >>     code      value  description                notes
> >>     ========  =====  =========================  ============
> >>     BPF_JA    0x00   PC += imm                  BPF_JMP32 only, src = 1
> >>
> >> The way, the jump offset range become [-2^31, 2^31 - 1].
> >>
> >> For other jump instructions, e.g., BPF_JEQ, with a jmp offset
> >> beyond [-32768, 32767]. It can be simulated with a
> >> 'BPF_JA (PC += imm)' followed by the original
> >> BPF_JEQ with the range 'off', or BPF_JEQ with a short range followed
> >> by a BPF_JA.
> >
> > Why not implement the same approach (using imm if src = 1) for all the
> > conditional jumps? Just too much JIT work or some other reasons?
>
> We cannot use 'src' since 'src' is used in conditional jump, e.g.,

Right, I missed this part, thanks.

>
>    ========  =====  =========================  ============
>    code      value  description                notes
>    ========  =====  =========================  ============
>    BPF_JEQ   0x10   PC += off if dst == src
>
> In this particular case, there is no good way to extend
> the insn with range [-2^31, 2^31 - 1] as 'off/dst/src' all
> used by the above insn. The sample extension to original
> BPF_JEQ seems not working so I came up with the above
> BPF_JA (32bit range) + BPF_JEQ(16 bit range) approach.
> It is ugly and increase implementation complexity, but
> considering this is a corner case. It may not be
> worthwhile to design a whole range of 32bit range of
> BPF_JEQ/JGT/... instructions.
>
> >
> > [...]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: bpf: Propose some new instructions for -mcpu=v4
  2023-02-10 14:29     ` Jose E. Marchesi
@ 2023-02-10 22:05       ` Alexei Starovoitov
  0 siblings, 0 replies; 11+ messages in thread
From: Alexei Starovoitov @ 2023-02-10 22:05 UTC (permalink / raw)
  To: Jose E. Marchesi
  Cc: Yonghong Song, alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, David Faust, James Hilliard, bpf,
	Martin KaFai Lau

On Fri, Feb 10, 2023 at 6:29 AM Jose E. Marchesi
<jose.marchesi@oracle.com> wrote:
>
>
> >> Hi Yonghong.
> >> Thanks for the proposal!
> >>
> >>> SDIV/SMOD (signed div and mod)
> >>> ==============================
> >>>
> >>> bpf already has unsigned DIV and MOD. They are encoded as
> >>>
> >>>   insn    code(4 bits)     source(1 bit)     instruction class(3 bit)
> >>>   off(16 bits)
> >>>   DIV     0x3              0/1               BPF_ALU/BPF_ALU64          0
> >>>   MOD     0x9              0/1               BPF_ALU/BPF_ALU64          0
> >>>
> >>> The current 'code' field only has two value left, 0xe and 0xf.
> >>> gcc used these two values (0xe and 0xf) for SDIV and SMOD.
> >>> But using these two values takes up all 'code' space and makes
> >>> future extension hard.
> >>>
> >>> Here, I propose to encode SDIV/SMOD like below:
> >>>
> >>>   insn    code(4 bits)     source(1 bit)     instruction class(3 bit)
> >>>   off(16 bits)
> >>>   DIV     0x3              0/1               BPF_ALU/BPF_ALU64          1
> >>>   MOD     0x9              0/1               BPF_ALU/BPF_ALU64          1
> >>>
> >>> Basically, we reuse the same 'code' value but changing 'off' from 0 to 1
> >>> to indicate signed div/mod.
> >>
> >> I have a general concern about using instruction operands to encode
> >> opcodes (in this case, 'off').
> >>
> >> At the moment we have two BPF instruction formats:
> >>
> >>  - The 64-bit instructions:
> >>
> >>     code:8 regs:8 offset:16 imm:32
> >>
> >>  - The 128-bit instructions:
> >>
> >>     code:8 regs:8 offset:16 imm:32 unused:32 imm:32
> >>
> >> Of these, `code', `regs' and `unused' are what is commonly known as
> >> instruction fields.  These are typically used for register numbers,
> >> flags, and opcodes.
> >>
> >> On the other hand, offset, imm32 and imm:32:::imm:32 are instruction
> >> operands (the later is non-contiguous and conforms the 64-bit operand in
> >> the 128-bit instruction).
> >>
> >> The main difference between these is that the bytes conforming
> >> instruction operands are themselves impacted by endianness, on top on
> >> the endianness effect on the whole instruction.  (The weird endian-flip
> >> in the two nibbles of `regs' is unfortunate, but I guess there is
> >> nothing we can do about it at this point and I count them as
> >> non-operands.)
> >>
> >> If you use an instruction operand (such as `offset') in order to act as
> >> an opcode, you incur in two inconveniences:
> >>
> >> 1) In effect you have "moving" opcodes that depend on the endianness.
> >>    The opcode for signed-operation will be 0x1 in big-endian BPF, but
> >>    0x8000 in little-endian bpf.
> >>
> >> 2) You lose the ability of easily adding more complementary opcodes in
> >>    these 16 bits in the future, in case you ever need them.
> >>
> >> As far as I have seen in other architectures, the usual way of doing
> >> this is to add an additional instruction format, in this case for the
> >> class of arithmetic instructions, where the bits dedicated to the unused
> >> operand (offset) becomes a new opcodes field:
> >>
> >>   - 32-bit arithmetic instructions:
> >>
> >>     code:8 regs:8 code2:16 imm:32
> >>
> >> Where code2 is now an additional field (not an operand) that provides
> >> extra additional opcode space for this particular class of instructions.
> >> This can be divided in a 1-bit field to signify "signed" and the rest
> >> reserved for future use:
> >>
> >>    opcode2 ::= unused(15) signed(1)
> >
> > Actually this would be just for DIV/MOD instructions, so the new format
> > should only apply to them.  The new format would be something like:
> >
> >   - 64-bit ALU/ALU64 div/mod instructions (code=3,9):
> >
> >     code:8 regs:8 unused:15 signed:1 imm:32
> >
> > And for the rest of the ALU and ALU64 instructions
> > (code=0,1,2,4,5,6,7,8,a,b,c,d):
> >
> >   - 64-bit ALU/ALU64 instructions:
> >
> >     code:8 regs:8 unused:16 imm:32
>
> Re-reading what I wrote above I realize that it is messy and uses
> confusing terms that are not used in instruction-set.rst, and it also
> has mistakes.  Sorry about that, 3:30AM posting.
>
> After sleeping over it I figured I better start over and this time I
> keep it short and stick to instruction-set.rst terminology :)
>
> What I am proposing is, instead of using the `offset' multi-byte field
> to encode an opcode:
>
>   =============  =======  =======  =======  ============
>   32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
>   =============  =======  =======  =======  ============
>   imm            offset   src_reg  dst_reg  opcode
>   =============  =======  =======  =======  ============
>
> To introduce a new opcode2 field for ALU32/ALU instructions like this:
>
>   =============  ======= ======= =======  =======  ============
>   32 bits (MSB)  8 bits  8 bits  4 bits   4 bits   8 bits (LSB)
>   =============  ======= ======= =======  =======  ============
>   imm            opcode2 unused  src_reg  dst_reg  opcode
>   =============  ======= ======= =======  =======  ============
>
> This way:
>
> 1) The opcode2 field's stored value will be the same regardless of
>    endianness.
>
> 2) The remaining 8 bits stay free for future extensions.
>
> That is from a purely ISA perspective.  But then this morning I thought
> about the kernel and its uapi.  This is in uapi/linux/bpf.h:
>
>   struct bpf_insn {
>         __u8    code;           /* opcode */
>         __u8    dst_reg:4;      /* dest register */
>         __u8    src_reg:4;      /* source register */
>         __s16   off;            /* signed offset */
>         __s32   imm;            /* signed immediate constant */
>   };
>
> If the bpf_insn struct is supposed to reflect the encoding of stored BPF
> instructions, and since it is part of the uapi, does this mean we are
> stuck with that instruction format (and only that format) forever?
>
> Because if changing the internal structure of the bpf_insn struct is a
> no-no, then there is no way to expand the existing opcode space without
> using unused multi-byte fields like `off' as such :/

Yes. It's an uapi and we're stuck with it.
If I understand correctly you want to extend 8-bit opcode field
with another endian independent 8 or 1 bit field.
It's possible on paper, but very challenging and ugly in the code.
llvm backend does this for 'off' field:
OSE.write<uint16_t>((Value >> 32) & 0xffff);
where OSE is endian dependent writer.
It does it unconditionally for all insns when it emits them into elf.
While in BPFInstrInfo.td we set this 16 bits as:
let Inst{47-32} = offset;
so to make it 'endian independent' from ISA point of view
the llvm backend would need to undo the endianness OSE.write
and become endian dependent when it sets Inst{47-32} which is
possible, but quite ugly.
Another alternative is to teach binary emitter to different opcodes
and do OSE.write<uint16_t> one way for one set of opcode
and differently for another. Which is equally ugly.

The same ugliness will be on the kernel side.
Instead of doing if (bpf_insn->off == 1) in the interpreter and all JITs
in all architectures (both little and big)
the kernel would need to do this pseudo-endian-independent dance
and little end arch would do if (bpf_insn->off == 1) while
big-endian arch would do if (bpf_insn->off == 255)
which is double ugly.

imo Yonghong's proposal to encode off=1 for signed div/mod is cleaner.
It fits well to what we have already for other insns.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-02-10 22:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-09 22:54 bpf: Propose some new instructions for -mcpu=v4 Yonghong Song
2023-02-09 23:36 ` Jose E. Marchesi
2023-02-09 23:45   ` Jose E. Marchesi
2023-02-10  1:45   ` Jose E. Marchesi
2023-02-10 14:29     ` Jose E. Marchesi
2023-02-10 22:05       ` Alexei Starovoitov
2023-02-10 18:56     ` Yonghong Song
2023-02-10 18:37   ` Yonghong Song
2023-02-09 23:39 ` Andrii Nakryiko
2023-02-10 18:53   ` Yonghong Song
2023-02-10 21:47     ` Andrii Nakryiko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.