All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file
@ 2022-09-27 18:59 dthaler1968
  2022-09-27 18:59 ` [PATCH 02/15] ebpf-docs: Linux byteswap note dthaler1968
                   ` (14 more replies)
  0 siblings, 15 replies; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/instruction-set.rst | 38 ++--------------
 Documentation/bpf/linux-notes.rst     | 65 +++++++++++++++++++++++++++
 2 files changed, 68 insertions(+), 35 deletions(-)
 create mode 100644 Documentation/bpf/linux-notes.rst

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 1b0e6711d..352f25a1e 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -282,8 +282,6 @@ arithmetic operations in the imm field to encode the atomic operation:
 
   *(u64 *)(dst_reg + off16) += src_reg
 
-``BPF_XADD`` is a deprecated name for ``BPF_ATOMIC | BPF_ADD``.
-
 In addition to the simple atomic operations, there also is a modifier and
 two complex atomic operations:
 
@@ -331,36 +329,6 @@ There is currently only one such instruction.
 Legacy BPF Packet access instructions
 -------------------------------------
 
-eBPF has special instructions for access to packet data that have been
-carried over from classic BPF to retain the performance of legacy socket
-filters running in the eBPF interpreter.
-
-The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and
-``BPF_IND | <size> | BPF_LD``.
-
-These instructions are used to access packet data and can only be used when
-the program context is a pointer to networking packet.  ``BPF_ABS``
-accesses packet data at an absolute offset specified by the immediate data
-and ``BPF_IND`` access packet data at an offset that includes the value of
-a register in addition to the immediate data.
-
-These instructions have seven implicit operands:
-
- * Register R6 is an implicit input that must contain pointer to a
-   struct sk_buff.
- * Register R0 is an implicit output which contains the data fetched from
-   the packet.
- * Registers R1-R5 are scratch registers that are clobbered after a call to
-   ``BPF_ABS | BPF_LD`` or ``BPF_IND | BPF_LD`` instructions.
-
-These instructions have an implicit program exit condition as well. When an
-eBPF program is trying to access the data beyond the packet boundary, the
-program execution will be aborted.
-
-``BPF_ABS | BPF_W | BPF_LD`` means::
-
-  R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + imm32))
-
-``BPF_IND | BPF_W | BPF_LD`` means::
-
-  R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
+eBPF previously introduced special instructions for access to packet data that were
+carried over from classic BPF. However, these instructions are
+deprecated and should no longer be used.
diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/linux-notes.rst
new file mode 100644
index 000000000..93c01386d
--- /dev/null
+++ b/Documentation/bpf/linux-notes.rst
@@ -0,0 +1,65 @@
+.. contents::
+.. sectnum::
+
+==========================
+Linux implementation notes
+==========================
+
+This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
+
+Legacy BPF Packet access instructions
+=====================================
+
+As mentioned in the `ISA standard documentation <instruction-set.rst#legacy-bpf-packet-access-instructions>`_,
+Linux has special eBPF instructions for access to packet data that have been
+carried over from classic BPF to retain the performance of legacy socket
+filters running in the eBPF interpreter.
+
+The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and
+``BPF_IND | <size> | BPF_LD``.
+
+These instructions are used to access packet data and can only be used when
+the program context is a pointer to a networking packet.  ``BPF_ABS``
+accesses packet data at an absolute offset specified by the immediate data
+and ``BPF_IND`` access packet data at an offset that includes the value of
+a register in addition to the immediate data.
+
+These instructions have seven implicit operands:
+
+* Register R6 is an implicit input that must contain a pointer to a
+  struct sk_buff.
+* Register R0 is an implicit output which contains the data fetched from
+  the packet.
+* Registers R1-R5 are scratch registers that are clobbered by the
+  instruction.
+
+These instructions have an implicit program exit condition as well. If an
+eBPF program attempts access data beyond the packet boundary, the
+program execution will be aborted.
+
+``BPF_ABS | BPF_W | BPF_LD`` (0x20) means::
+
+  R0 = ntohl(*(u32 *) ((struct sk_buff *) R6->data + imm))
+
+where ``ntohl()`` converts a 32-bit value from network byte order to host byte order.
+
+``BPF_IND | BPF_W | BPF_LD`` (0x40) means::
+
+  R0 = ntohl(*(u32 *) ((struct sk_buff *) R6->data + src + imm))
+
+Appendix
+========
+
+For reference, the following table lists legacy Linux-specific opcodes in order by value.
+
+======  ====  ===================================================  =============
+opcode  imm   description                                          reference
+======  ====  ===================================================  =============
+0x20    any   dst = ntohl(\*(uint32_t \*)(R6->data + imm))         `Legacy BPF Packet access instructions`_
+0x28    any   dst = ntohs(\*(uint16_t \*)(R6->data + imm))         `Legacy BPF Packet access instructions`_
+0x30    any   dst = (\*(uint8_t \*)(R6->data + imm))               `Legacy BPF Packet access instructions`_
+0x38    any   dst = ntohll(\*(uint64_t \*)(R6->data + imm))        `Legacy BPF Packet access instructions`_
+0x40    any   dst = ntohl(\*(uint32_t \*)(R6->data + src + imm))   `Legacy BPF Packet access instructions`_
+0x48    any   dst = ntohs(\*(uint16_t \*)(R6->data + src + imm))   `Legacy BPF Packet access instructions`_
+0x50    any   dst = \*(uint8_t \*)(R6->data + src + imm))          `Legacy BPF Packet access instructions`_
+0x58    any   dst = ntohll(\*(uint64_t \*)(R6->data + src + imm))  `Legacy BPF Packet access instructions`_
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 02/15] ebpf-docs: Linux byteswap note
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
@ 2022-09-27 18:59 ` dthaler1968
  2022-09-27 18:59 ` [PATCH 03/15] ebpf-docs: Move Clang notes to a separate file dthaler1968
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/instruction-set.rst | 4 ----
 Documentation/bpf/linux-notes.rst     | 5 +++++
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 352f25a1e..1735b91ec 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -156,10 +156,6 @@ Examples:
 
   dst_reg = htobe64(dst_reg)
 
-``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and
-``BPF_TO_BE`` respectively.
-
-
 Jump instructions
 -----------------
 
diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/linux-notes.rst
index 93c01386d..1c31379b4 100644
--- a/Documentation/bpf/linux-notes.rst
+++ b/Documentation/bpf/linux-notes.rst
@@ -7,6 +7,11 @@ Linux implementation notes
 
 This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
 
+Byte swap instructions
+======================
+
+``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and ``BPF_TO_BE`` respectively.
+
 Legacy BPF Packet access instructions
 =====================================
 
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 03/15] ebpf-docs: Move Clang notes to a separate file
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
  2022-09-27 18:59 ` [PATCH 02/15] ebpf-docs: Linux byteswap note dthaler1968
@ 2022-09-27 18:59 ` dthaler1968
  2022-09-27 18:59 ` [PATCH 04/15] ebpf-docs: Add Clang note about BPF_ALU dthaler1968
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/clang-notes.rst     | 24 ++++++++++++++++++++++++
 Documentation/bpf/instruction-set.rst |  6 ------
 2 files changed, 24 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/bpf/clang-notes.rst

diff --git a/Documentation/bpf/clang-notes.rst b/Documentation/bpf/clang-notes.rst
new file mode 100644
index 000000000..b15179cb5
--- /dev/null
+++ b/Documentation/bpf/clang-notes.rst
@@ -0,0 +1,24 @@
+.. contents::
+.. sectnum::
+
+==========================
+Clang implementation notes
+==========================
+
+This document provides more details specific to the Clang/LLVM implementation of the eBPF instruction set.
+
+Versions
+========
+
+Clang defined "CPU" versions, where a CPU version of 3 corresponds to the current eBPF ISA.
+
+Clang can select the eBPF ISA version using ``-mcpu=v3`` for example to select version 3.
+
+Atomic operations
+=================
+
+Clang can generate atomic instructions by default when ``-mcpu=v3`` is
+enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
+Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
+the atomics features, while keeping a lower ``-mcpu`` version, you can use
+``-Xclang -target-feature -Xclang +alu32``.
diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 1735b91ec..541483118 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -303,12 +303,6 @@ The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
 value that was at ``dst_reg + off`` before the operation is zero-extended
 and loaded back to ``R0``.
 
-Clang can generate atomic instructions by default when ``-mcpu=v3`` is
-enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
-Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
-the atomics features, while keeping a lower ``-mcpu`` version, you can use
-``-Xclang -target-feature -Xclang +alu32``.
-
 64-bit immediate instructions
 -----------------------------
 
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 04/15] ebpf-docs: Add Clang note about BPF_ALU
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
  2022-09-27 18:59 ` [PATCH 02/15] ebpf-docs: Linux byteswap note dthaler1968
  2022-09-27 18:59 ` [PATCH 03/15] ebpf-docs: Move Clang notes to a separate file dthaler1968
@ 2022-09-27 18:59 ` dthaler1968
  2022-09-27 18:59 ` [PATCH 05/15] ebpf-docs: Add TOC and fix formatting dthaler1968
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/clang-notes.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/bpf/clang-notes.rst b/Documentation/bpf/clang-notes.rst
index b15179cb5..528feddf2 100644
--- a/Documentation/bpf/clang-notes.rst
+++ b/Documentation/bpf/clang-notes.rst
@@ -14,6 +14,12 @@ Clang defined "CPU" versions, where a CPU version of 3 corresponds to the curren
 
 Clang can select the eBPF ISA version using ``-mcpu=v3`` for example to select version 3.
 
+Arithmetic instructions
+=======================
+
+For CPU versions prior to 3, Clang v7.0 and later can enable ``BPF_ALU`` support with
+``-Xclang -target-feature -Xclang +alu32``.  In CPU version 3, support is automatically included.
+
 Atomic operations
 =================
 
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 05/15] ebpf-docs: Add TOC and fix formatting
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
                   ` (2 preceding siblings ...)
  2022-09-27 18:59 ` [PATCH 04/15] ebpf-docs: Add Clang note about BPF_ALU dthaler1968
@ 2022-09-27 18:59 ` dthaler1968
  2022-09-27 18:59 ` [PATCH 06/15] ebpf-docs: Use standard type convention in standard doc dthaler1968
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/instruction-set.rst | 268 +++++++++++++-------------
 1 file changed, 136 insertions(+), 132 deletions(-)

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 541483118..4997d2088 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -1,7 +1,12 @@
+.. contents::
+.. sectnum::
+
+========================================
+eBPF Instruction Set Specification, v1.0
+========================================
+
+This document specifies version 1.0 of the eBPF instruction set.
 
-====================
-eBPF Instruction Set
-====================
 
 Registers and calling convention
 ================================
@@ -11,10 +16,10 @@ all of which are 64-bits wide.
 
 The eBPF calling convention is defined as:
 
- * R0: return value from function calls, and exit value for eBPF programs
- * R1 - R5: arguments for function calls
- * R6 - R9: callee saved registers that function calls will preserve
- * R10: read-only frame pointer to access stack
+* R0: return value from function calls, and exit value for eBPF programs
+* R1 - R5: arguments for function calls
+* R6 - R9: callee saved registers that function calls will preserve
+* R10: read-only frame pointer to access stack
 
 R0 - R5 are scratch registers and eBPF programs needs to spill/fill them if
 necessary across calls.
@@ -24,17 +29,17 @@ Instruction encoding
 
 eBPF has two instruction encodings:
 
- * the basic instruction encoding, which uses 64 bits to encode an instruction
- * the wide instruction encoding, which appends a second 64-bit immediate value
-   (imm64) after the basic instruction for a total of 128 bits.
+* the basic instruction encoding, which uses 64 bits to encode an instruction
+* the wide instruction encoding, which appends a second 64-bit immediate value
+  (imm64) after the basic instruction for a total of 128 bits.
 
 The basic instruction encoding looks as follows:
 
- =============  =======  ===============  ====================  ============
- 32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
- =============  =======  ===============  ====================  ============
- immediate      offset   source register  destination register  opcode
- =============  =======  ===============  ====================  ============
+=============  =======  ===============  ====================  ============
+32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
+=============  =======  ===============  ====================  ============
+immediate      offset   source register  destination register  opcode
+=============  =======  ===============  ====================  ============
 
 Note that most instructions do not use all of the fields.
 Unused fields shall be cleared to zero.
@@ -44,30 +49,30 @@ Instruction classes
 
 The three LSB bits of the 'opcode' field store the instruction class:
 
-  =========  =====  ===============================
-  class      value  description
-  =========  =====  ===============================
-  BPF_LD     0x00   non-standard load operations
-  BPF_LDX    0x01   load into register operations
-  BPF_ST     0x02   store from immediate operations
-  BPF_STX    0x03   store from register operations
-  BPF_ALU    0x04   32-bit arithmetic operations
-  BPF_JMP    0x05   64-bit jump operations
-  BPF_JMP32  0x06   32-bit jump operations
-  BPF_ALU64  0x07   64-bit arithmetic operations
-  =========  =====  ===============================
+=========  =====  ===============================  ===================================
+class      value  description                      reference
+=========  =====  ===============================  ===================================
+BPF_LD     0x00   non-standard load operations     `Load and store instructions`_
+BPF_LDX    0x01   load into register operations    `Load and store instructions`_
+BPF_ST     0x02   store from immediate operations  `Load and store instructions`_
+BPF_STX    0x03   store from register operations   `Load and store instructions`_
+BPF_ALU    0x04   32-bit arithmetic operations     `Arithmetic and jump instructions`_
+BPF_JMP    0x05   64-bit jump operations           `Arithmetic and jump instructions`_
+BPF_JMP32  0x06   32-bit jump operations           `Arithmetic and jump instructions`_
+BPF_ALU64  0x07   64-bit arithmetic operations     `Arithmetic and jump instructions`_
+=========  =====  ===============================  ===================================
 
 Arithmetic and jump instructions
 ================================
 
-For arithmetic and jump instructions (BPF_ALU, BPF_ALU64, BPF_JMP and
-BPF_JMP32), the 8-bit 'opcode' field is divided into three parts:
+For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and
+``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts:
 
-  ==============  ======  =================
-  4 bits (MSB)    1 bit   3 bits (LSB)
-  ==============  ======  =================
-  operation code  source  instruction class
-  ==============  ======  =================
+==============  ======  =================
+4 bits (MSB)    1 bit   3 bits (LSB)
+==============  ======  =================
+operation code  source  instruction class
+==============  ======  =================
 
 The 4th bit encodes the source operand:
 
@@ -84,51 +89,51 @@ The four MSB bits store the operation code.
 Arithmetic instructions
 -----------------------
 
-BPF_ALU uses 32-bit wide operands while BPF_ALU64 uses 64-bit wide operands for
+``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
 otherwise identical operations.
-The code field encodes the operation as below:
-
-  ========  =====  =================================================
-  code      value  description
-  ========  =====  =================================================
-  BPF_ADD   0x00   dst += src
-  BPF_SUB   0x10   dst -= src
-  BPF_MUL   0x20   dst \*= src
-  BPF_DIV   0x30   dst /= src
-  BPF_OR    0x40   dst \|= src
-  BPF_AND   0x50   dst &= src
-  BPF_LSH   0x60   dst <<= src
-  BPF_RSH   0x70   dst >>= src
-  BPF_NEG   0x80   dst = ~src
-  BPF_MOD   0x90   dst %= src
-  BPF_XOR   0xa0   dst ^= src
-  BPF_MOV   0xb0   dst = src
-  BPF_ARSH  0xc0   sign extending shift right
-  BPF_END   0xd0   byte swap operations (see separate section below)
-  ========  =====  =================================================
-
-BPF_ADD | BPF_X | BPF_ALU means::
+The 'code' field encodes the operation as below:
+
+========  =====  ==========================================================
+code      value  description
+========  =====  ==========================================================
+BPF_ADD   0x00   dst += src
+BPF_SUB   0x10   dst -= src
+BPF_MUL   0x20   dst \*= src
+BPF_DIV   0x30   dst /= src
+BPF_OR    0x40   dst \|= src
+BPF_AND   0x50   dst &= src
+BPF_LSH   0x60   dst <<= src
+BPF_RSH   0x70   dst >>= src
+BPF_NEG   0x80   dst = ~src
+BPF_MOD   0x90   dst %= src
+BPF_XOR   0xa0   dst ^= src
+BPF_MOV   0xb0   dst = src
+BPF_ARSH  0xc0   sign extending shift right
+BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
+========  =====  ==========================================================
+
+``BPF_ADD | BPF_X | BPF_ALU`` means::
 
   dst_reg = (u32) dst_reg + (u32) src_reg;
 
-BPF_ADD | BPF_X | BPF_ALU64 means::
+``BPF_ADD | BPF_X | BPF_ALU64`` means::
 
   dst_reg = dst_reg + src_reg
 
-BPF_XOR | BPF_K | BPF_ALU means::
+``BPF_XOR | BPF_K | BPF_ALU`` means::
 
   src_reg = (u32) src_reg ^ (u32) imm32
 
-BPF_XOR | BPF_K | BPF_ALU64 means::
+``BPF_XOR | BPF_K | BPF_ALU64`` means::
 
   src_reg = src_reg ^ imm32
 
 
 Byte swap instructions
-----------------------
+~~~~~~~~~~~~~~~~~~~~~~
 
 The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
-code field of ``BPF_END``.
+'code' field of ``BPF_END``.
 
 The byte swap instructions operate on the destination register
 only and do not use a separate source register or immediate value.
@@ -136,14 +141,14 @@ only and do not use a separate source register or immediate value.
 The 1-bit source operand field in the opcode is used to to select what byte
 order the operation convert from or to:
 
-  =========  =====  =================================================
-  source     value  description
-  =========  =====  =================================================
-  BPF_TO_LE  0x00   convert between host byte order and little endian
-  BPF_TO_BE  0x08   convert between host byte order and big endian
-  =========  =====  =================================================
+=========  =====  =================================================
+source     value  description
+=========  =====  =================================================
+BPF_TO_LE  0x00   convert between host byte order and little endian
+BPF_TO_BE  0x08   convert between host byte order and big endian
+=========  =====  =================================================
 
-The imm field encodes the width of the swap operations.  The following widths
+The 'imm' field encodes the width of the swap operations.  The following widths
 are supported: 16, 32 and 64.
 
 Examples:
@@ -159,28 +164,28 @@ Examples:
 Jump instructions
 -----------------
 
-BPF_JMP32 uses 32-bit wide operands while BPF_JMP uses 64-bit wide operands for
+``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
 otherwise identical operations.
-The code field encodes the operation as below:
-
-  ========  =====  =========================  ============
-  code      value  description                notes
-  ========  =====  =========================  ============
-  BPF_JA    0x00   PC += off                  BPF_JMP only
-  BPF_JEQ   0x10   PC += off if dst == src
-  BPF_JGT   0x20   PC += off if dst > src     unsigned
-  BPF_JGE   0x30   PC += off if dst >= src    unsigned
-  BPF_JSET  0x40   PC += off if dst & src
-  BPF_JNE   0x50   PC += off if dst != src
-  BPF_JSGT  0x60   PC += off if dst > src     signed
-  BPF_JSGE  0x70   PC += off if dst >= src    signed
-  BPF_CALL  0x80   function call
-  BPF_EXIT  0x90   function / program return  BPF_JMP only
-  BPF_JLT   0xa0   PC += off if dst < src     unsigned
-  BPF_JLE   0xb0   PC += off if dst <= src    unsigned
-  BPF_JSLT  0xc0   PC += off if dst < src     signed
-  BPF_JSLE  0xd0   PC += off if dst <= src    signed
-  ========  =====  =========================  ============
+The 'code' field encodes the operation as below:
+
+========  =====  =========================  ============
+code      value  description                notes
+========  =====  =========================  ============
+BPF_JA    0x00   PC += off                  BPF_JMP only
+BPF_JEQ   0x10   PC += off if dst == src
+BPF_JGT   0x20   PC += off if dst > src     unsigned
+BPF_JGE   0x30   PC += off if dst >= src    unsigned
+BPF_JSET  0x40   PC += off if dst & src
+BPF_JNE   0x50   PC += off if dst != src
+BPF_JSGT  0x60   PC += off if dst > src     signed
+BPF_JSGE  0x70   PC += off if dst >= src    signed
+BPF_CALL  0x80   function call
+BPF_EXIT  0x90   function / program return  BPF_JMP only
+BPF_JLT   0xa0   PC += off if dst < src     unsigned
+BPF_JLE   0xb0   PC += off if dst <= src    unsigned
+BPF_JSLT  0xc0   PC += off if dst < src     signed
+BPF_JSLE  0xd0   PC += off if dst <= src    signed
+========  =====  =========================  ============
 
 The eBPF program needs to store the return value into register R0 before doing a
 BPF_EXIT.
@@ -189,14 +194,26 @@ BPF_EXIT.
 Load and store instructions
 ===========================
 
-For load and store instructions (BPF_LD, BPF_LDX, BPF_ST and BPF_STX), the
+For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the
 8-bit 'opcode' field is divided as:
 
-  ============  ======  =================
-  3 bits (MSB)  2 bits  3 bits (LSB)
-  ============  ======  =================
-  mode          size    instruction class
-  ============  ======  =================
+============  ======  =================
+3 bits (MSB)  2 bits  3 bits (LSB)
+============  ======  =================
+mode          size    instruction class
+============  ======  =================
+
+The mode modifier is one of:
+
+  =============  =====  ====================================  =============
+  mode modifier  value  description                           reference
+  =============  =====  ====================================  =============
+  BPF_IMM        0x00   64-bit immediate instructions         `64-bit immediate instructions`_
+  BPF_ABS        0x20   legacy BPF packet access (absolute)   `Legacy BPF Packet access instructions`_
+  BPF_IND        0x40   legacy BPF packet access (indirect)   `Legacy BPF Packet access instructions`_
+  BPF_MEM        0x60   regular load and store operations     `Regular load and store operations`_
+  BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
+  =============  =====  ====================================  =============
 
 The size modifier is one of:
 
@@ -209,19 +226,6 @@ The size modifier is one of:
   BPF_DW         0x18   double word (8 bytes)
   =============  =====  =====================
 
-The mode modifier is one of:
-
-  =============  =====  ====================================
-  mode modifier  value  description
-  =============  =====  ====================================
-  BPF_IMM        0x00   64-bit immediate instructions
-  BPF_ABS        0x20   legacy BPF packet access (absolute)
-  BPF_IND        0x40   legacy BPF packet access (indirect)
-  BPF_MEM        0x60   regular load and store operations
-  BPF_ATOMIC     0xc0   atomic operations
-  =============  =====  ====================================
-
-
 Regular load and store operations
 ---------------------------------
 
@@ -252,42 +256,42 @@ by other eBPF programs or means outside of this specification.
 All atomic operations supported by eBPF are encoded as store operations
 that use the ``BPF_ATOMIC`` mode modifier as follows:
 
-  * ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
-  * ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
-  * 8-bit and 16-bit wide atomic operations are not supported.
+* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
+* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
+* 8-bit and 16-bit wide atomic operations are not supported.
 
-The imm field is used to encode the actual atomic operation.
+The 'imm' field is used to encode the actual atomic operation.
 Simple atomic operation use a subset of the values defined to encode
-arithmetic operations in the imm field to encode the atomic operation:
+arithmetic operations in the 'imm' field to encode the atomic operation:
 
-  ========  =====  ===========
-  imm       value  description
-  ========  =====  ===========
-  BPF_ADD   0x00   atomic add
-  BPF_OR    0x40   atomic or
-  BPF_AND   0x50   atomic and
-  BPF_XOR   0xa0   atomic xor
-  ========  =====  ===========
+========  =====  ===========
+imm       value  description
+========  =====  ===========
+BPF_ADD   0x00   atomic add
+BPF_OR    0x40   atomic or
+BPF_AND   0x50   atomic and
+BPF_XOR   0xa0   atomic xor
+========  =====  ===========
 
 
-``BPF_ATOMIC | BPF_W  | BPF_STX`` with imm = BPF_ADD means::
+``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::
 
   *(u32 *)(dst_reg + off16) += src_reg
 
-``BPF_ATOMIC | BPF_DW | BPF_STX`` with imm = BPF ADD means::
+``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::
 
   *(u64 *)(dst_reg + off16) += src_reg
 
 In addition to the simple atomic operations, there also is a modifier and
 two complex atomic operations:
 
-  ===========  ================  ===========================
-  imm          value             description
-  ===========  ================  ===========================
-  BPF_FETCH    0x01              modifier: return old value
-  BPF_XCHG     0xe0 | BPF_FETCH  atomic exchange
-  BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
-  ===========  ================  ===========================
+===========  ================  ===========================
+imm          value             description
+===========  ================  ===========================
+BPF_FETCH    0x01              modifier: return old value
+BPF_XCHG     0xe0 | BPF_FETCH  atomic exchange
+BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
+===========  ================  ===========================
 
 The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
 always set for the complex atomic operations.  If the ``BPF_FETCH`` flag
@@ -306,7 +310,7 @@ and loaded back to ``R0``.
 64-bit immediate instructions
 -----------------------------
 
-Instructions with the ``BPF_IMM`` mode modifier use the wide instruction
+Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
 encoding for an extra imm64 value.
 
 There is currently only one such instruction.
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 06/15] ebpf-docs: Use standard type convention in standard doc
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
                   ` (3 preceding siblings ...)
  2022-09-27 18:59 ` [PATCH 05/15] ebpf-docs: Add TOC and fix formatting dthaler1968
@ 2022-09-27 18:59 ` dthaler1968
  2022-09-30 20:49   ` Alexei Starovoitov
  2022-09-27 18:59 ` [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow dthaler1968
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/instruction-set.rst | 14 ++++++++++----
 Documentation/bpf/linux-notes.rst     |  6 ++++++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 4997d2088..a24bc5d53 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -7,6 +7,10 @@ eBPF Instruction Set Specification, v1.0
 
 This document specifies version 1.0 of the eBPF instruction set.
 
+Documentation conventions
+=========================
+
+This specification uses the standard C types (uint32_t, etc.) in documentation.
 
 Registers and calling convention
 ================================
@@ -114,7 +118,9 @@ BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
 
 ``BPF_ADD | BPF_X | BPF_ALU`` means::
 
-  dst_reg = (u32) dst_reg + (u32) src_reg;
+  dst_reg = (uint32_t) dst_reg + (uint32_t) src_reg;
+
+where '(uint32_t)' indicates truncation to 32 bits.
 
 ``BPF_ADD | BPF_X | BPF_ALU64`` means::
 
@@ -122,7 +128,7 @@ BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
 
 ``BPF_XOR | BPF_K | BPF_ALU`` means::
 
-  src_reg = (u32) src_reg ^ (u32) imm32
+  src_reg = (uint32_t) src_reg ^ (uint32_t) imm32
 
 ``BPF_XOR | BPF_K | BPF_ALU64`` means::
 
@@ -276,11 +282,11 @@ BPF_XOR   0xa0   atomic xor
 
 ``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::
 
-  *(u32 *)(dst_reg + off16) += src_reg
+  *(uint32_t *)(dst_reg + off16) += src_reg
 
 ``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::
 
-  *(u64 *)(dst_reg + off16) += src_reg
+  *(uint32_t *)(dst_reg + off16) += src_reg
 
 In addition to the simple atomic operations, there also is a modifier and
 two complex atomic operations:
diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/linux-notes.rst
index 1c31379b4..522ebe27d 100644
--- a/Documentation/bpf/linux-notes.rst
+++ b/Documentation/bpf/linux-notes.rst
@@ -7,6 +7,12 @@ Linux implementation notes
 
 This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
 
+Arithmetic instructions
+=======================
+
+While the eBPF instruction set document uses the standard C terminology as the cross-platform specification,
+in the Linux kernel, uint32_t is expressed as u32, uint64_t is expressed as u64, etc.
+
 Byte swap instructions
 ======================
 
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
                   ` (4 preceding siblings ...)
  2022-09-27 18:59 ` [PATCH 06/15] ebpf-docs: Use standard type convention in standard doc dthaler1968
@ 2022-09-27 18:59 ` dthaler1968
  2022-09-30 20:52   ` Alexei Starovoitov
  2022-09-27 18:59 ` [PATCH 08/15] ebpf-docs: Use consistent names for the same field dthaler1968
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/instruction-set.rst | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index a24bc5d53..3c5a63612 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -103,19 +103,26 @@ code      value  description
 BPF_ADD   0x00   dst += src
 BPF_SUB   0x10   dst -= src
 BPF_MUL   0x20   dst \*= src
-BPF_DIV   0x30   dst /= src
+BPF_DIV   0x30   dst = (src != 0) ? (dst / src) : 0
 BPF_OR    0x40   dst \|= src
 BPF_AND   0x50   dst &= src
 BPF_LSH   0x60   dst <<= src
 BPF_RSH   0x70   dst >>= src
 BPF_NEG   0x80   dst = ~src
-BPF_MOD   0x90   dst %= src
+BPF_MOD   0x90   dst = (src != 0) ? (dst % src) : dst
 BPF_XOR   0xa0   dst ^= src
 BPF_MOV   0xb0   dst = src
 BPF_ARSH  0xc0   sign extending shift right
 BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
 ========  =====  ==========================================================
 
+Underflow and overflow are allowed during arithmetic operations,
+meaning the 64-bit or 32-bit value will wrap.  If
+eBPF program execution would result in division by zero,
+the destination register is instead set to zero.
+If execution would result in modulo by zero,
+the destination register is instead left unchanged.
+
 ``BPF_ADD | BPF_X | BPF_ALU`` means::
 
   dst_reg = (uint32_t) dst_reg + (uint32_t) src_reg;
@@ -135,6 +142,14 @@ where '(uint32_t)' indicates truncation to 32 bits.
   src_reg = src_reg ^ imm32
 
 
+Also note that the modulo operation often varies by language
+when the dividend or divisor are negative, where Python, Ruby, etc.
+differ from C, Go, Java, etc. This specification requires that
+modulo use truncated division (where -13 % 3 == -1) as implemented
+in C, Go, etc.:
+
+   a % n = a - n * trunc(a / n)
+
 Byte swap instructions
 ~~~~~~~~~~~~~~~~~~~~~~
 
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 08/15] ebpf-docs: Use consistent names for the same field
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
                   ` (5 preceding siblings ...)
  2022-09-27 18:59 ` [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow dthaler1968
@ 2022-09-27 18:59 ` dthaler1968
  2022-09-30 20:57   ` Alexei Starovoitov
  2022-09-27 18:59 ` [PATCH 09/15] ebpf-docs: Explain helper functions dthaler1968
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/instruction-set.rst | 107 ++++++++++++++++++--------
 1 file changed, 76 insertions(+), 31 deletions(-)

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 3c5a63612..2987234eb 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -34,20 +34,59 @@ Instruction encoding
 eBPF has two instruction encodings:
 
 * the basic instruction encoding, which uses 64 bits to encode an instruction
-* the wide instruction encoding, which appends a second 64-bit immediate value
-  (imm64) after the basic instruction for a total of 128 bits.
+* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
+  constant) value after the basic instruction for a total of 128 bits.
 
-The basic instruction encoding looks as follows:
+The basic instruction encoding is as follows, where MSB and LSB mean the most significant
+bits and least significant bits, respectively:
 
 =============  =======  ===============  ====================  ============
 32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
 =============  =======  ===============  ====================  ============
-immediate      offset   source register  destination register  opcode
+imm            offset   src              dst                   opcode
 =============  =======  ===============  ====================  ============
 
+imm
+  signed integer immediate value
+
+offset
+  signed integer offset used with pointer arithmetic
+
+src
+  the source register number (0-10), except where otherwise specified
+  (`64-bit immediate instructions`_ reuse this field for other purposes)
+
+dst
+  destination register number (0-10)
+
+opcode
+  operation to perform
+
 Note that most instructions do not use all of the fields.
 Unused fields shall be cleared to zero.
 
+As discussed below in `64-bit immediate instructions`_, some
+instructions use a 64-bit immediate value that is constructed as follows.
+The 64 bits following the basic instruction contain a pseudo instruction
+using the same format but with opcode, dst, src, and offset all set to zero,
+and imm containing the high 32 bits of the immediate value.
+
+=================  ==================
+64 bits (MSB)      64 bits (LSB)
+=================  ==================
+basic instruction  pseudo instruction
+=================  ==================
+
+Thus the 64-bit immediate value is constructed as follows:
+
+  imm64 = imm + (next_imm << 32)
+
+where 'next_imm' refers to the imm value of the pseudo instruction
+following the basic instruction.
+
+In the remainder of this document 'src' and 'dst' refer to the values of the source
+and destination registers, respectively, rather than the register number.
+
 Instruction classes
 -------------------
 
@@ -75,20 +114,24 @@ For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` an
 ==============  ======  =================
 4 bits (MSB)    1 bit   3 bits (LSB)
 ==============  ======  =================
-operation code  source  instruction class
+code            source  instruction class
 ==============  ======  =================
 
-The 4th bit encodes the source operand:
+code
+  the operation code, whose meaning varies by instruction class
 
-  ======  =====  ========================================
-  source  value  description
-  ======  =====  ========================================
-  BPF_K   0x00   use 32-bit immediate as source operand
-  BPF_X   0x08   use 'src_reg' register as source operand
-  ======  =====  ========================================
+source
+  the source operand location, which unless otherwise specified is one of:
 
-The four MSB bits store the operation code.
+  ======  =====  ==========================================
+  source  value  description
+  ======  =====  ==========================================
+  BPF_K   0x00   use 32-bit 'imm' value as source operand
+  BPF_X   0x08   use 'src' register value as source operand
+  ======  =====  ==========================================
 
+instruction class
+  the instruction class (see `Instruction classes`_)
 
 Arithmetic instructions
 -----------------------
@@ -116,6 +159,8 @@ BPF_ARSH  0xc0   sign extending shift right
 BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
 ========  =====  ==========================================================
 
+where 'src' is the source operand value.
+
 Underflow and overflow are allowed during arithmetic operations,
 meaning the 64-bit or 32-bit value will wrap.  If
 eBPF program execution would result in division by zero,
@@ -125,21 +170,21 @@ the destination register is instead left unchanged.
 
 ``BPF_ADD | BPF_X | BPF_ALU`` means::
 
-  dst_reg = (uint32_t) dst_reg + (uint32_t) src_reg;
+  dst = (uint32_t) (dst + src)
 
 where '(uint32_t)' indicates truncation to 32 bits.
 
 ``BPF_ADD | BPF_X | BPF_ALU64`` means::
 
-  dst_reg = dst_reg + src_reg
+  dst = dst + src
 
 ``BPF_XOR | BPF_K | BPF_ALU`` means::
 
-  src_reg = (uint32_t) src_reg ^ (uint32_t) imm32
+  src = (uint32_t) src ^ (uint32_t) imm
 
 ``BPF_XOR | BPF_K | BPF_ALU64`` means::
 
-  src_reg = src_reg ^ imm32
+  src = src ^ imm
 
 
 Also note that the modulo operation often varies by language
@@ -176,11 +221,11 @@ Examples:
 
 ``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
 
-  dst_reg = htole16(dst_reg)
+  dst = htole16(dst)
 
 ``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
 
-  dst_reg = htobe64(dst_reg)
+  dst = htobe64(dst)
 
 Jump instructions
 -----------------
@@ -255,15 +300,15 @@ instructions that transfer data between a register and memory.
 
 ``BPF_MEM | <size> | BPF_STX`` means::
 
-  *(size *) (dst_reg + off) = src_reg
+  *(size *) (dst + offset) = src_reg
 
 ``BPF_MEM | <size> | BPF_ST`` means::
 
-  *(size *) (dst_reg + off) = imm32
+  *(size *) (dst + offset) = imm32
 
 ``BPF_MEM | <size> | BPF_LDX`` means::
 
-  dst_reg = *(size *) (src_reg + off)
+  dst = *(size *) (src + offset)
 
 Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
 
@@ -297,11 +342,11 @@ BPF_XOR   0xa0   atomic xor
 
 ``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::
 
-  *(uint32_t *)(dst_reg + off16) += src_reg
+  *(uint32_t *)(dst + offset) += src
 
 ``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::
 
-  *(uint32_t *)(dst_reg + off16) += src_reg
+  *(uint64_t *)(dst + offset) += src
 
 In addition to the simple atomic operations, there also is a modifier and
 two complex atomic operations:
@@ -316,16 +361,16 @@ BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
 
 The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
 always set for the complex atomic operations.  If the ``BPF_FETCH`` flag
-is set, then the operation also overwrites ``src_reg`` with the value that
+is set, then the operation also overwrites ``src`` with the value that
 was in memory before it was modified.
 
-The ``BPF_XCHG`` operation atomically exchanges ``src_reg`` with the value
-addressed by ``dst_reg + off``.
+The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
+addressed by ``dst + offset``.
 
 The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
-``dst_reg + off`` with ``R0``. If they match, the value addressed by
-``dst_reg + off`` is replaced with ``src_reg``. In either case, the
-value that was at ``dst_reg + off`` before the operation is zero-extended
+``dst + offset`` with ``R0``. If they match, the value addressed by
+``dst + offset`` is replaced with ``src``. In either case, the
+value that was at ``dst + offset`` before the operation is zero-extended
 and loaded back to ``R0``.
 
 64-bit immediate instructions
@@ -338,7 +383,7 @@ There is currently only one such instruction.
 
 ``BPF_LD | BPF_DW | BPF_IMM`` means::
 
-  dst_reg = imm64
+  dst = imm64
 
 
 Legacy BPF Packet access instructions
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 09/15] ebpf-docs: Explain helper functions
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
                   ` (6 preceding siblings ...)
  2022-09-27 18:59 ` [PATCH 08/15] ebpf-docs: Use consistent names for the same field dthaler1968
@ 2022-09-27 18:59 ` dthaler1968
  2022-09-30 22:01   ` Alexei Starovoitov
  2022-09-27 18:59 ` [PATCH 10/15] ebpf-docs: Add appendix of all opcodes in order dthaler1968
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/instruction-set.rst | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 2987234eb..926957830 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -245,7 +245,7 @@ BPF_JSET  0x40   PC += off if dst & src
 BPF_JNE   0x50   PC += off if dst != src
 BPF_JSGT  0x60   PC += off if dst > src     signed
 BPF_JSGE  0x70   PC += off if dst >= src    signed
-BPF_CALL  0x80   function call
+BPF_CALL  0x80   function call              see `Helper functions`_
 BPF_EXIT  0x90   function / program return  BPF_JMP only
 BPF_JLT   0xa0   PC += off if dst < src     unsigned
 BPF_JLE   0xb0   PC += off if dst <= src    unsigned
@@ -256,6 +256,22 @@ BPF_JSLE  0xd0   PC += off if dst <= src    signed
 The eBPF program needs to store the return value into register R0 before doing a
 BPF_EXIT.
 
+Helper functions
+~~~~~~~~~~~~~~~~
+Helper functions are a concept whereby BPF programs can call into a
+set of function calls exposed by the eBPF runtime.  Each helper
+function is identified by an integer used in a ``BPF_CALL`` instruction.
+The available helper functions may differ for each eBPF program type.
+
+Conceptually, each helper function is implemented with a commonly shared function
+signature defined as:
+
+  uint64_t function(uint64_t r1, uint64_t r2, uint64_t r3, uint64_t r4, uint64_t r5)
+
+In actuality, each helper function is defined as taking between 0 and 5 arguments,
+with the remaining registers being ignored.  The definition of a helper function
+is responsible for specifying the type (e.g., integer, pointer, etc.) of the value returned,
+the number of arguments, and the type of each argument.
 
 Load and store instructions
 ===========================
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 10/15] ebpf-docs: Add appendix of all opcodes in order
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
                   ` (7 preceding siblings ...)
  2022-09-27 18:59 ` [PATCH 09/15] ebpf-docs: Explain helper functions dthaler1968
@ 2022-09-27 18:59 ` dthaler1968
  2022-09-30 22:02   ` Alexei Starovoitov
  2022-09-27 18:59 ` [PATCH 11/15] ebpf-docs: Improve English readability dthaler1968
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/instruction-set.rst | 197 ++++++++++++++++++++++++++
 1 file changed, 197 insertions(+)

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 926957830..b6f098104 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -408,3 +408,200 @@ Legacy BPF Packet access instructions
 eBPF previously introduced special instructions for access to packet data that were
 carried over from classic BPF. However, these instructions are
 deprecated and should no longer be used.
+
+Appendix
+========
+
+For reference, the following table lists opcodes in order by value.
+
+======  ===  ====  ===================================================  ========================================
+opcode  src  imm   description                                          reference
+======  ===  ====  ===================================================  ========================================
+0x00    0x0  any   (additional immediate value)                         `64-bit immediate instructions`_
+0x04    0x0  any   dst = (uint32_t)(dst + imm)                          `Arithmetic instructions`_
+0x05    0x0  0x00  goto +offset                                         `Jump instructions`_
+0x07    0x0  any   dst += imm                                           `Arithmetic instructions`_
+0x0c    any  0x00  dst = (uint32_t)(dst + src)                          `Arithmetic instructions`_
+0x0f    any  0x00  dst += src                                           `Arithmetic instructions`_
+0x14    0x0  any   dst = (uint32_t)(dst - imm)                          `Arithmetic instructions`_
+0x15    0x0  any   if dst == imm goto +offset                           `Jump instructions`_
+0x16    0x0  any   if (uint32_t)dst == imm goto +offset                 `Jump instructions`_
+0x17    0x0  any   dst -= imm                                           `Arithmetic instructions`_
+0x18    0x0  any   dst = imm64                                          `64-bit immediate instructions`_
+0x1c    any  0x00  dst = (uint32_t)(dst - src)                          `Arithmetic instructions`_
+0x1d    any  0x00  if dst == src goto +offset                           `Jump instructions`_
+0x1e    any  0x00  if (uint32_t)dst == (uint32_t)src goto +offset       `Jump instructions`_
+0x1f    any  0x00  dst -= src                                           `Arithmetic instructions`_
+0x20    any  any   (deprecated, implementation-specific)                `Legacy BPF Packet access instructions`_
+0x24    0x0  any   dst = (uint32_t)(dst \* imm)                         `Arithmetic instructions`_
+0x25    0x0  any   if dst > imm goto +offset                            `Jump instructions`_
+0x26    0x0  any   if (uint32_t)dst > imm goto +offset                  `Jump instructions`_
+0x27    0x0  any   dst \*= imm                                          `Arithmetic instructions`_
+0x28    any  any   (deprecated, implementation-specific)                `Legacy BPF Packet access instructions`_
+0x2c    any  0x00  dst = (uint32_t)(dst \* src)                         `Arithmetic instructions`_
+0x2d    any  0x00  if dst > src goto +offset                            `Jump instructions`_
+0x2e    any  0x00  if (uint32_t)dst > (uint32_t)src goto +offset        `Jump instructions`_
+0x2f    any  0x00  dst \*= src                                          `Arithmetic instructions`_
+0x30    any  any   (deprecated, implementation-specific)                `Legacy BPF Packet access instructions`_
+0x34    0x0  any   dst = (uint32_t)((imm != 0) ? (dst / imm) : 0)       `Arithmetic instructions`_
+0x35    0x0  any   if dst >= imm goto +offset                           `Jump instructions`_
+0x36    0x0  any   if (uint32_t)dst >= imm goto +offset                 `Jump instructions`_
+0x37    0x0  any   dst = (imm != 0) ? (dst / imm) : 0                   `Arithmetic instructions`_
+0x38    any  any   (deprecated, implementation-specific)                `Legacy BPF Packet access instructions`_
+0x3c    any  0x00  dst = (uint32_t)((imm != 0) ? (dst / src) : 0)       `Arithmetic instructions`_
+0x3d    any  0x00  if dst >= src goto +offset                           `Jump instructions`_
+0x3e    any  0x00  if (uint32_t)dst >= (uint32_t)src goto +offset       `Jump instructions`_
+0x3f    any  0x00  dst = (src !+ 0) ? (dst / src) : 0                   `Arithmetic instructions`_
+0x40    any  any   (deprecated, implementation-specific)                `Legacy BPF Packet access instructions`_
+0x44    0x0  any   dst = (uint32_t)(dst \| imm)                         `Arithmetic instructions`_
+0x45    0x0  any   if dst & imm goto +offset                            `Jump instructions`_
+0x46    0x0  any   if (uint32_t)dst & imm goto +offset                  `Jump instructions`_
+0x47    0x0  any   dst \|= imm                                          `Arithmetic instructions`_
+0x48    any  any   (deprecated, implementation-specific)                `Legacy BPF Packet access instructions`_
+0x4c    any  0x00  dst = (uint32_t)(dst \| src)                         `Arithmetic instructions`_
+0x4d    any  0x00  if dst & src goto +offset                            `Jump instructions`_
+0x4e    any  0x00  if (uint32_t)dst & (uint32_t)src goto +offset        `Jump instructions`_
+0x4f    any  0x00  dst \|= src                                          `Arithmetic instructions`_
+0x50    any  any   (deprecated, implementation-specific)                `Legacy BPF Packet access instructions`_
+0x54    0x0  any   dst = (uint32_t)(dst & imm)                          `Arithmetic instructions`_
+0x55    0x0  any   if dst != imm goto +offset                           `Jump instructions`_
+0x56    0x0  any   if (uint32_t)dst != imm goto +offset                 `Jump instructions`_
+0x57    0x0  any   dst &= imm                                           `Arithmetic instructions`_
+0x58    any  any   (deprecated, implementation-specific)                `Legacy BPF Packet access instructions`_
+0x5c    any  0x00  dst = (uint32_t)(dst & src)                          `Arithmetic instructions`_
+0x5d    any  0x00  if dst != src goto +offset                           `Jump instructions`_
+0x5e    any  0x00  if (uint32_t)dst != (uint32_t)src goto +offset       `Jump instructions`_
+0x5f    any  0x00  dst &= src                                           `Arithmetic instructions`_
+0x61    any  0x00  dst = \*(uint32_t \*)(src + offset)                  `Load and store instructions`_
+0x62    0x0  any   \*(uint32_t \*)(dst + offset) = imm                  `Load and store instructions`_
+0x63    any  0x00  \*(uint32_t \*)(dst + offset) = src                  `Load and store instructions`_
+0x64    0x0  any   dst = (uint32_t)(dst << imm)                         `Arithmetic instructions`_
+0x65    0x0  any   if dst s> imm goto +offset                           `Jump instructions`_
+0x66    0x0  any   if (int32_t)dst s> (int32_t)imm goto +offset         `Jump instructions`_
+0x67    0x0  any   dst <<= imm                                          `Arithmetic instructions`_
+0x69    any  0x00  dst = \*(uint16_t \*)(src + offset)                  `Load and store instructions`_
+0x6a    0x0  any   \*(uint16_t \*)(dst + offset) = imm                  `Load and store instructions`_
+0x6b    any  0x00  \*(uint16_t \*)(dst + offset) = src                  `Load and store instructions`_
+0x6c    any  0x00  dst = (uint32_t)(dst << src)                         `Arithmetic instructions`_
+0x6d    any  0x00  if dst s> src goto +offset                           `Jump instructions`_
+0x6e    any  0x00  if (int32_t)dst s> (int32_t)src goto +offset         `Jump instructions`_
+0x6f    any  0x00  dst <<= src                                          `Arithmetic instructions`_
+0x71    any  0x00  dst = \*(uint8_t \*)(src + offset)                   `Load and store instructions`_
+0x72    0x0  any   \*(uint8_t \*)(dst + offset) = imm                   `Load and store instructions`_
+0x73    any  0x00  \*(uint8_t \*)(dst + offset) = src                   `Load and store instructions`_
+0x74    0x0  any   dst = (uint32_t)(dst >> imm)                         `Arithmetic instructions`_
+0x75    0x0  any   if dst s>= imm goto +offset                          `Jump instructions`_
+0x76    0x0  any   if (int32_t)dst s>= (int32_t)imm goto +offset        `Jump instructions`_
+0x77    0x0  any   dst >>= imm                                          `Arithmetic instructions`_
+0x79    any  0x00  dst = \*(uint64_t \*)(src + offset)                  `Load and store instructions`_
+0x7a    0x0  any   \*(uint64_t \*)(dst + offset) = imm                  `Load and store instructions`_
+0x7b    any  0x00  \*(uint64_t \*)(dst + offset) = src                  `Load and store instructions`_
+0x7c    any  0x00  dst = (uint32_t)(dst >> src)                         `Arithmetic instructions`_
+0x7d    any  0x00  if dst s>= src goto +offset                          `Jump instructions`_
+0x7e    any  0x00  if (int32_t)dst s>= (int32_t)src goto +offset        `Jump instructions`_
+0x7f    any  0x00  dst >>= src                                          `Arithmetic instructions`_
+0x84    0x0  0x00  dst = (uint32_t)-dst                                 `Arithmetic instructions`_
+0x85    0x0  any   call helper function imm                             `Helper functions`_
+0x87    0x0  0x00  dst = -dst                                           `Arithmetic instructions`_
+0x94    0x0  any   dst = (uint32_t)((imm != 0) ? (dst % imm) : dst)     `Arithmetic instructions`_
+0x95    0x0  0x00  return                                               `Jump instructions`_
+0x97    0x0  any   dst = (imm != 0) ? (dst % imm) : dst                 `Arithmetic instructions`_
+0x9c    any  0x00  dst = (uint32_t)((src != 0) ? (dst % src) : dst)     `Arithmetic instructions`_
+0x9f    any  0x00  dst = (src != 0) ? (dst % src) : dst                 `Arithmetic instructions`_
+0xa4    0x0  any   dst = (uint32_t)(dst ^ imm)                          `Arithmetic instructions`_
+0xa5    0x0  any   if dst < imm goto +offset                            `Jump instructions`_
+0xa6    0x0  any   if (uint32_t)dst < imm goto +offset                  `Jump instructions`_
+0xa7    0x0  any   dst ^= imm                                           `Arithmetic instructions`_
+0xac    any  0x00  dst = (uint32_t)(dst ^ src)                          `Arithmetic instructions`_
+0xad    any  0x00  if dst < src goto +offset                            `Jump instructions`_
+0xae    any  0x00  if (uint32_t)dst < (uint32_t)src goto +offset        `Jump instructions`_
+0xaf    any  0x00  dst ^= src                                           `Arithmetic instructions`_
+0xb4    0x0  any   dst = (uint32_t) imm                                 `Arithmetic instructions`_
+0xb5    0x0  any   if dst <= imm goto +offset                           `Jump instructions`_
+0xa6    0x0  any   if (uint32_t)dst <= imm goto +offset                 `Jump instructions`_
+0xb7    0x0  any   dst = imm                                            `Arithmetic instructions`_
+0xbc    any  0x00  dst = (uint32_t) src                                 `Arithmetic instructions`_
+0xbd    any  0x00  if dst <= src goto +offset                           `Jump instructions`_
+0xbe    any  0x00  if (uint32_t)dst <= (uint32_t)src goto +offset       `Jump instructions`_
+0xbf    any  0x00  dst = src                                            `Arithmetic instructions`_
+0xc3    any  0x00  lock \*(uint32_t \*)(dst + offset) += src            `Atomic operations`_
+0xc3    any  0x01  lock::                                               `Atomic operations`_
+
+                       *(uint32_t *)(dst + offset) += src
+                       src = *(uint32_t *)(dst + offset)
+0xc3    any  0x40  \*(uint32_t \*)(dst + offset) \|= src                `Atomic operations`_
+0xc3    any  0x41  lock::                                               `Atomic operations`_
+
+                       *(uint32_t *)(dst + offset) |= src
+                       src = *(uint32_t *)(dst + offset)
+0xc3    any  0x50  \*(uint32_t \*)(dst + offset) &= src                 `Atomic operations`_
+0xc3    any  0x51  lock::                                               `Atomic operations`_
+
+                       *(uint32_t *)(dst + offset) &= src
+                       src = *(uint32_t *)(dst + offset)
+0xc3    any  0xa0  \*(uint32_t \*)(dst + offset) ^= src                 `Atomic operations`_
+0xc3    any  0xa1  lock::                                               `Atomic operations`_
+
+                       *(uint32_t *)(dst + offset) ^= src
+                       src = *(uint32_t *)(dst + offset)
+0xc3    any  0xe1  lock::                                               `Atomic operations`_
+
+                       temp = *(uint32_t *)(dst + offset)
+                       *(uint32_t *)(dst + offset) = src
+                       src = temp
+0xc3    any  0xf1  lock::                                               `Atomic operations`_
+
+                       temp = *(uint32_t *)(dst + offset)
+                       if *(uint32_t)(dst + offset) == R0
+                          *(uint32_t)(dst + offset) = src
+                       R0 = temp
+0xc4    0x0  any   dst = (uint32_t)(dst s>> imm)                        `Arithmetic instructions`_
+0xc5    0x0  any   if dst s< imm goto +offset                           `Jump instructions`_
+0xc6    0x0  any   if (int32_t)dst s< (int32_t)imm goto +offset         `Jump instructions`_
+0xc7    0x0  any   dst s>>= imm                                         `Arithmetic instructions`_
+0xcc    any  0x00  dst = (uint32_t)(dst s>> src)                        `Arithmetic instructions`_
+0xcd    any  0x00  if dst s< src goto +offset                           `Jump instructions`_
+0xce    any  0x00  if (int32_t)dst s< (int32_t)src goto +offset         `Jump instructions`_
+0xcf    any  0x00  dst s>>= src                                         `Arithmetic instructions`_
+0xd4    0x0  0x10  dst = htole16(dst)                                   `Byte swap instructions`_
+0xd4    0x0  0x20  dst = htole32(dst)                                   `Byte swap instructions`_
+0xd4    0x0  0x40  dst = htole64(dst)                                   `Byte swap instructions`_
+0xd5    0x0  any   if dst s<= imm goto +offset                          `Jump instructions`_
+0xd6    0x0  any   if (int32_t)dst s<= (int32_t)imm goto +offset        `Jump instructions`_
+0xdb    any  0x00  lock \*(uint64_t \*)(dst + offset) += src            `Atomic operations`_
+0xdb    any  0x01  lock::                                               `Atomic operations`_
+
+                       *(uint64_t *)(dst + offset) += src
+                       src = *(uint64_t *)(dst + offset)
+0xdb    any  0x40  \*(uint64_t \*)(dst + offset) \|= src                `Atomic operations`_
+0xdb    any  0x41  lock::                                               `Atomic operations`_
+
+                       *(uint64_t *)(dst + offset) |= src
+                       lock src = *(uint64_t *)(dst + offset)
+0xdb    any  0x50  \*(uint64_t \*)(dst + offset) &= src                 `Atomic operations`_
+0xdb    any  0x51  lock::                                               `Atomic operations`_
+
+                       *(uint64_t *)(dst + offset) &= src
+                       src = *(uint64_t *)(dst + offset)
+0xdb    any  0xa0  \*(uint64_t \*)(dst + offset) ^= src                 `Atomic operations`_
+0xdb    any  0xa1  lock::                                               `Atomic operations`_
+
+                       *(uint64_t *)(dst + offset) ^= src
+                       src = *(uint64_t *)(dst + offset)
+0xdb    any  0xe1  lock::                                               `Atomic operations`_
+
+                       temp = *(uint64_t *)(dst + offset)
+                       *(uint64_t *)(dst + offset) = src
+                       src = temp
+0xdb    any  0xf1  lock::                                               `Atomic operations`_
+
+                       temp = *(uint64_t *)(dst + offset)
+                       if *(uint64_t)(dst + offset) == R0
+                          *(uint64_t)(dst + offset) = src
+                       R0 = temp
+0xdc    0x0  0x10  dst = htobe16(dst)                                   `Byte swap instructions`_
+0xdc    0x0  0x20  dst = htobe32(dst)                                   `Byte swap instructions`_
+0xdc    0x0  0x40  dst = htobe64(dst)                                   `Byte swap instructions`_
+0xdd    any  0x00  if dst s<= src goto +offset                          `Jump instructions`_
+0xde    any  0x00  if (int32_t)dst s<= (int32_t)src goto +offset        `Jump instructions`_
+======  ===  ====  ===================================================  ========================================
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 11/15] ebpf-docs: Improve English readability
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
                   ` (8 preceding siblings ...)
  2022-09-27 18:59 ` [PATCH 10/15] ebpf-docs: Add appendix of all opcodes in order dthaler1968
@ 2022-09-27 18:59 ` dthaler1968
  2022-09-30 22:16   ` Alexei Starovoitov
  2022-09-27 18:59 ` [PATCH 12/15] ebpf-docs: Add Linux note about register calling convention dthaler1968
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/instruction-set.rst | 137 ++++++++++++++++----------
 1 file changed, 87 insertions(+), 50 deletions(-)

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index b6f098104..328207ff6 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -7,6 +7,9 @@ eBPF Instruction Set Specification, v1.0
 
 This document specifies version 1.0 of the eBPF instruction set.
 
+The eBPF instruction set consists of eleven 64 bit registers, a program counter,
+and 512 bytes of stack space.
+
 Documentation conventions
 =========================
 
@@ -25,12 +28,24 @@ The eBPF calling convention is defined as:
 * R6 - R9: callee saved registers that function calls will preserve
 * R10: read-only frame pointer to access stack
 
-R0 - R5 are scratch registers and eBPF programs needs to spill/fill them if
-necessary across calls.
+Registers R0 - R5 are scratch registers, meaning the BPF program needs to either
+spill them to the BPF stack or move them to callee saved registers if these
+arguments are to be reused across multiple function calls. Spilling means
+that the value in the register is moved to the BPF stack. The reverse operation
+of moving the variable from the BPF stack to the register is called filling.
+The reason for spilling/filling is due to the limited number of registers.
+
+Upon entering execution of an eBPF program, registers R1 - R5 initially can contain
+the input arguments for the program (similar to the argc/argv pair for a typical C program).
+The actual number of registers used, and their meaning, is defined by the program type;
+for example, a networking program might have an argument that includes network packet data
+and/or metadata.
 
 Instruction encoding
 ====================
 
+An eBPF program is a sequence of instructions.
+
 eBPF has two instruction encodings:
 
 * the basic instruction encoding, which uses 64 bits to encode an instruction
@@ -63,7 +78,7 @@ opcode
   operation to perform
 
 Note that most instructions do not use all of the fields.
-Unused fields shall be cleared to zero.
+Unused fields must be set to zero.
 
 As discussed below in `64-bit immediate instructions`_, some
 instructions use a 64-bit immediate value that is constructed as follows.
@@ -90,7 +105,9 @@ and destination registers, respectively, rather than the register number.
 Instruction classes
 -------------------
 
-The three LSB bits of the 'opcode' field store the instruction class:
+The encoding of the 'opcode' field varies and can be determined from
+the three least significant bits (LSB) of the 'opcode' field which holds
+the "instruction class", as follows:
 
 =========  =====  ===============================  ===================================
 class      value  description                      reference
@@ -136,9 +153,11 @@ instruction class
 Arithmetic instructions
 -----------------------
 
-``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
+Instruction class ``BPF_ALU`` uses 32-bit wide operands (zeroing the upper 32 bits
+of the destination register) while ``BPF_ALU64`` uses 64-bit wide operands for
 otherwise identical operations.
-The 'code' field encodes the operation as below:
+
+The 4-bit 'code' field encodes the operation as follows:
 
 ========  =====  ==========================================================
 code      value  description
@@ -168,21 +187,23 @@ the destination register is instead set to zero.
 If execution would result in modulo by zero,
 the destination register is instead left unchanged.
 
-``BPF_ADD | BPF_X | BPF_ALU`` means::
+Examples:
+
+``BPF_ADD | BPF_X | BPF_ALU`` (0x0c) means::
 
   dst = (uint32_t) (dst + src)
 
 where '(uint32_t)' indicates truncation to 32 bits.
 
-``BPF_ADD | BPF_X | BPF_ALU64`` means::
+``BPF_ADD | BPF_X | BPF_ALU64`` (0x0f) means::
 
   dst = dst + src
 
-``BPF_XOR | BPF_K | BPF_ALU`` means::
+``BPF_XOR | BPF_K | BPF_ALU`` (0xa4) means::
 
   src = (uint32_t) src ^ (uint32_t) imm
 
-``BPF_XOR | BPF_K | BPF_ALU64`` means::
+``BPF_XOR | BPF_K | BPF_ALU64`` (0xa7) means::
 
   src = src ^ imm
 
@@ -204,8 +225,9 @@ The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
 The byte swap instructions operate on the destination register
 only and do not use a separate source register or immediate value.
 
-The 1-bit source operand field in the opcode is used to to select what byte
-order the operation convert from or to:
+Byte swap instructions use non-default semantics of the 1-bit 'source' field in
+the 'opcode' field.  Instead of indicating the source operator, it is instead
+used to select what byte order the operation converts from or to:
 
 =========  =====  =================================================
 source     value  description
@@ -215,24 +237,33 @@ BPF_TO_BE  0x08   convert between host byte order and big endian
 =========  =====  =================================================
 
 The 'imm' field encodes the width of the swap operations.  The following widths
-are supported: 16, 32 and 64.
-
-Examples:
-
-``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
-
-  dst = htole16(dst)
-
-``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
-
-  dst = htobe64(dst)
+are supported: 16, 32 and 64. The following table summarizes the resulting
+possibilities:
+
+=============================  =========  ===  ========  ==================
+opcode construction            opcode     imm  mnemonic  pseudocode
+=============================  =========  ===  ========  ==================
+BPF_END | BPF_TO_LE | BPF_ALU  0xd4       16   le16 dst  dst = htole16(dst)
+BPF_END | BPF_TO_LE | BPF_ALU  0xd4       32   le32 dst  dst = htole32(dst)
+BPF_END | BPF_TO_LE | BPF_ALU  0xd4       64   le64 dst  dst = htole64(dst)
+BPF_END | BPF_TO_BE | BPF_ALU  0xdc       16   be16 dst  dst = htobe16(dst)
+BPF_END | BPF_TO_BE | BPF_ALU  0xdc       32   be32 dst  dst = htobe32(dst)
+BPF_END | BPF_TO_BE | BPF_ALU  0xdc       64   be64 dst  dst = htobe64(dst)
+=============================  =========  ===  ========  ==================
+
+where
+
+* mnenomic indicates a short form that might be displayed by some tools such as disassemblers
+* 'htoleNN()' indicates converting a NN-bit value from host byte order to little-endian byte order
+* 'htobeNN()' indicates converting a NN-bit value from host byte order to big-endian byte order
 
 Jump instructions
 -----------------
 
-``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
+Instruction class ``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
 otherwise identical operations.
-The 'code' field encodes the operation as below:
+
+The 4-bit 'code' field encodes the operation as below, where PC is the program counter:
 
 ========  =====  =========================  ============
 code      value  description                notes
@@ -253,9 +284,6 @@ BPF_JSLT  0xc0   PC += off if dst < src     signed
 BPF_JSLE  0xd0   PC += off if dst <= src    signed
 ========  =====  =========================  ============
 
-The eBPF program needs to store the return value into register R0 before doing a
-BPF_EXIT.
-
 Helper functions
 ~~~~~~~~~~~~~~~~
 Helper functions are a concept whereby BPF programs can call into a
@@ -285,7 +313,8 @@ For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_
 mode          size    instruction class
 ============  ======  =================
 
-The mode modifier is one of:
+mode
+  one of:
 
   =============  =====  ====================================  =============
   mode modifier  value  description                           reference
@@ -297,7 +326,8 @@ The mode modifier is one of:
   BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
   =============  =====  ====================================  =============
 
-The size modifier is one of:
+size
+  one of:
 
   =============  =====  =====================
   size modifier  value  description
@@ -308,25 +338,31 @@ The size modifier is one of:
   BPF_DW         0x18   double word (8 bytes)
   =============  =====  =====================
 
+instruction class
+  the instruction class (see `Instruction classes`_)
+
 Regular load and store operations
 ---------------------------------
 
 The ``BPF_MEM`` mode modifier is used to encode regular load and store
 instructions that transfer data between a register and memory.
 
-``BPF_MEM | <size> | BPF_STX`` means::
-
-  *(size *) (dst + offset) = src_reg
-
-``BPF_MEM | <size> | BPF_ST`` means::
-
-  *(size *) (dst + offset) = imm32
-
-``BPF_MEM | <size> | BPF_LDX`` means::
-
-  dst = *(size *) (src + offset)
-
-Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
+=============================  =========  ====================================
+opcode construction            opcode     pseudocode
+=============================  =========  ====================================
+BPF_MEM | BPF_B | BPF_LDX      0x71       dst = \*(uint8_t \*) (src + offset)
+BPF_MEM | BPF_H | BPF_LDX      0x69       dst = \*(uint16_t \*) (src + offset)
+BPF_MEM | BPF_W | BPF_LDX      0x61       dst = \*(uint32_t \*) (src + offset)
+BPF_MEM | BPF_DW | BPF_LDX     0x79       dst = \*(uint64_t \*) (src + offset)
+BPF_MEM | BPF_B | BPF_ST       0x72       \*(uint8_t \*) (dst + offset) = imm
+BPF_MEM | BPF_H | BPF_ST       0x6a       \*(uint16_t \*) (dst + offset) = imm
+BPF_MEM | BPF_W | BPF_ST       0x62       \*(uint32_t \*) (dst + offset) = imm
+BPF_MEM | BPF_DW | BPF_ST      0x7a       \*(uint64_t \*) (dst + offset) = imm
+BPF_MEM | BPF_B | BPF_STX      0x73       \*(uint8_t \*) (dst + offset) = src
+BPF_MEM | BPF_H | BPF_STX      0x6b       \*(uint16_t \*) (dst + offset) = src
+BPF_MEM | BPF_W | BPF_STX      0x63       \*(uint32_t \*) (dst + offset) = src
+BPF_MEM | BPF_DW | BPF_STX     0x7b       \*(uint64_t \*) (dst + offset) = src
+=============================  =========  ====================================
 
 Atomic operations
 -----------------
@@ -338,9 +374,11 @@ by other eBPF programs or means outside of this specification.
 All atomic operations supported by eBPF are encoded as store operations
 that use the ``BPF_ATOMIC`` mode modifier as follows:
 
-* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
-* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
-* 8-bit and 16-bit wide atomic operations are not supported.
+* ``BPF_ATOMIC | BPF_W | BPF_STX`` (0xc3) for 32-bit operations
+* ``BPF_ATOMIC | BPF_DW | BPF_STX`` (0xdb) for 64-bit operations
+
+Note that 8-bit (``BPF_B``) and 16-bit (``BPF_H``) wide atomic operations are not supported,
+nor is ``BPF_ATOMIC | <size> | BPF_ST``.
 
 The 'imm' field is used to encode the actual atomic operation.
 Simple atomic operation use a subset of the values defined to encode
@@ -355,16 +393,15 @@ BPF_AND   0x50   atomic and
 BPF_XOR   0xa0   atomic xor
 ========  =====  ===========
 
-
-``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::
+``BPF_ATOMIC | BPF_W  | BPF_STX`` (0xc3) with 'imm' = BPF_ADD means::
 
   *(uint32_t *)(dst + offset) += src
 
-``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::
+``BPF_ATOMIC | BPF_DW | BPF_STX`` (0xdb) with 'imm' = BPF ADD means::
 
   *(uint64_t *)(dst + offset) += src
 
-In addition to the simple atomic operations, there also is a modifier and
+In addition to the simple atomic operations above, there also is a modifier and
 two complex atomic operations:
 
 ===========  ================  ===========================
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 12/15] ebpf-docs: Add Linux note about register calling convention
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
                   ` (9 preceding siblings ...)
  2022-09-27 18:59 ` [PATCH 11/15] ebpf-docs: Improve English readability dthaler1968
@ 2022-09-27 18:59 ` dthaler1968
  2022-09-30 22:17   ` Alexei Starovoitov
  2022-09-27 18:59 ` [PATCH 13/15] ebpf-docs: Add extended 64-bit immediate instructions dthaler1968
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/linux-notes.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/linux-notes.rst
index 522ebe27d..0581ba326 100644
--- a/Documentation/bpf/linux-notes.rst
+++ b/Documentation/bpf/linux-notes.rst
@@ -7,6 +7,12 @@ Linux implementation notes
 
 This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
 
+Registers and calling convention
+================================
+
+All program types only use R1 which contains the "context", which is typically a structure containing all
+the inputs needed, and the exit value for eBPF programs is passed as a 32 bit value.
+
 Arithmetic instructions
 =======================
 
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 13/15] ebpf-docs: Add extended 64-bit immediate instructions
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
                   ` (10 preceding siblings ...)
  2022-09-27 18:59 ` [PATCH 12/15] ebpf-docs: Add Linux note about register calling convention dthaler1968
@ 2022-09-27 18:59 ` dthaler1968
  2022-09-27 18:59 ` [PATCH 14/15] ebpf-docs: Add extended call instructions dthaler1968
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/instruction-set.rst | 54 +++++++++++++++++++++++++--
 Documentation/bpf/linux-notes.rst     | 10 +++++
 2 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 328207ff6..667d97715 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -430,14 +430,54 @@ and loaded back to ``R0``.
 -----------------------------
 
 Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
-encoding for an extra imm64 value.
+encoding defined in `Instruction encoding`_, and use the 'src' field of the
+basic instruction to hold an opcode subtype.
+
+The following instructions are defined, and use additional concepts defined below:
+
+=========================  ======  ===  =====================================  ===========  ==============
+opcode construction        opcode  src  pseudocode                             imm type     dst type
+=========================  ======  ===  =====================================  ===========  ==============
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x0  dst = imm64                            integer      integer
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x1  dst = map_by_fd(imm)                   map fd       map
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x2  dst = mva(map_by_fd(imm)) + next_imm   map fd       data pointer
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x3  dst = variable_addr(imm)               variable id  data pointer
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x4  dst = code_addr(imm)                   integer      code pointer
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x5  dst = map_by_idx(imm)                  map index    map
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x6  dst = mva(map_by_idx(imm)) + next_imm  map index    data pointer
+=========================  ======  ===  =====================================  ===========  ==============
 
-There is currently only one such instruction.
+where
+
+* map_by_fd(fd) means to convert a 32-bit POSIX file descriptor into an address of a map object (see `Map objects`_)
+* map_by_index(index) means to convert a 32-bit index into an address of a map object
+* mva(map) gets the address of the first value in a given map object
+* variable_addr(id) gets the address of a variable (see `Variables`_) with a given id
+* code_addr(offset) gets the address of the instruction at a specified relative offset in units of 64-bit blocks
+* the 'imm type' can be used by disassemblers for display
+* the 'dst type' can be used for verification and JIT compilation purposes
+
+Map objects
+~~~~~~~~~~~
+
+Maps are shared memory regions accessible by eBPF programs on some platforms, where we use the term "map object"
+to refer to an object containing the data and metadata (e.g., size) about the memory region.
+A map can have various semantics as defined in a separate document, and may or may not have a single
+contiguous memory region, but the 'mva(map)' is currently only defined for maps that do have a single
+contiguous memory region.  Support for maps is optional.
 
-``BPF_LD | BPF_DW | BPF_IMM`` means::
+Each map object can have a POSIX file descriptor (fd) if supported by the platform,
+where 'map_by_fd(fd)' means to get the map with the specified file descriptor.
+Each eBPF program can also be defined to use a set of maps associated with the program
+at load time, and 'map_by_index(index)' means to get the map with the given index in the set
+associated with the eBPF program containing the instruction.
 
-  dst = imm64
+Variables
+~~~~~~~~~
 
+Variables are memory regions, identified by integer ids, accessible by eBPF programs on
+some platforms.  The 'variable_addr(id)' operation means to get the address of the memory region
+identified by the given id.  Support for such variables is optional.
 
 Legacy BPF Packet access instructions
 -------------------------------------
@@ -465,6 +505,12 @@ opcode  src  imm   description                                          referenc
 0x16    0x0  any   if (uint32_t)dst == imm goto +offset                 `Jump instructions`_
 0x17    0x0  any   dst -= imm                                           `Arithmetic instructions`_
 0x18    0x0  any   dst = imm64                                          `64-bit immediate instructions`_
+0x18    0x1  any   dst = map_by_fd(imm)                                 `64-bit immediate instructions`_
+0x18    0x2  any   dst = mva(map_by_fd(imm)) + next_imm                 `64-bit immediate instructions`_
+0x18    0x3  any   dst = variable_addr(imm)                             `64-bit immediate instructions`_
+0x18    0x4  any   dst = code_addr(imm)                                 `64-bit immediate instructions`_
+0x18    0x5  any   dst = map_by_idx(imm)                                `64-bit immediate instructions`_
+0x18    0x6  any   dst = mva(map_by_idx(imm)) + next_imm                `64-bit immediate instructions`_
 0x1c    any  0x00  dst = (uint32_t)(dst - src)                          `Arithmetic instructions`_
 0x1d    any  0x00  if dst == src goto +offset                           `Jump instructions`_
 0x1e    any  0x00  if (uint32_t)dst == (uint32_t)src goto +offset       `Jump instructions`_
diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/linux-notes.rst
index 0581ba326..e7f79242c 100644
--- a/Documentation/bpf/linux-notes.rst
+++ b/Documentation/bpf/linux-notes.rst
@@ -24,6 +24,16 @@ Byte swap instructions
 
 ``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and ``BPF_TO_BE`` respectively.
 
+Map objects
+===========
+
+Linux only supports the 'mva(map)' operation on array maps with a single element.
+
+Variables
+=========
+
+Linux uses BTF ids to identify variables.
+
 Legacy BPF Packet access instructions
 =====================================
 
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 14/15] ebpf-docs: Add extended call instructions
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
                   ` (11 preceding siblings ...)
  2022-09-27 18:59 ` [PATCH 13/15] ebpf-docs: Add extended 64-bit immediate instructions dthaler1968
@ 2022-09-27 18:59 ` dthaler1968
  2022-09-27 18:59 ` [PATCH 15/15] ebpf-docs: Add note about invalid instruction dthaler1968
  2022-09-30 20:50 ` [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file patchwork-bot+netdevbpf
  14 siblings, 0 replies; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/instruction-set.rst | 52 +++++++++++++++++----------
 1 file changed, 34 insertions(+), 18 deletions(-)

diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 667d97715..2ac8f0dae 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -265,24 +265,26 @@ otherwise identical operations.
 
 The 4-bit 'code' field encodes the operation as below, where PC is the program counter:
 
-========  =====  =========================  ============
-code      value  description                notes
-========  =====  =========================  ============
-BPF_JA    0x00   PC += off                  BPF_JMP only
-BPF_JEQ   0x10   PC += off if dst == src
-BPF_JGT   0x20   PC += off if dst > src     unsigned
-BPF_JGE   0x30   PC += off if dst >= src    unsigned
-BPF_JSET  0x40   PC += off if dst & src
-BPF_JNE   0x50   PC += off if dst != src
-BPF_JSGT  0x60   PC += off if dst > src     signed
-BPF_JSGE  0x70   PC += off if dst >= src    signed
-BPF_CALL  0x80   function call              see `Helper functions`_
-BPF_EXIT  0x90   function / program return  BPF_JMP only
-BPF_JLT   0xa0   PC += off if dst < src     unsigned
-BPF_JLE   0xb0   PC += off if dst <= src    unsigned
-BPF_JSLT  0xc0   PC += off if dst < src     signed
-BPF_JSLE  0xd0   PC += off if dst <= src    signed
-========  =====  =========================  ============
+========  =====  ===  ==========================  ========================
+code      value  src  description                 notes
+========  =====  ===  ==========================  ========================
+BPF_JA    0x0    0x0  PC += offset                BPF_JMP only
+BPF_JEQ   0x1    any  PC += offset if dst == src
+BPF_JGT   0x2    any  PC += offset if dst > src   unsigned
+BPF_JGE   0x3    any  PC += offset if dst >= src  unsigned
+BPF_JSET  0x4    any  PC += offset if dst & src
+BPF_JNE   0x5    any  PC += offset if dst != src
+BPF_JSGT  0x6    any  PC += offset if dst > src   signed
+BPF_JSGE  0x7    any  PC += offset if dst >= src  signed
+BPF_CALL  0x8    0x0  call helper function imm    see `Helper functions`_
+BPF_CALL  0x8    0x1  call PC += offset           see `eBPF functions`_
+BPF_CALL  0x8    0x2  call runtime function imm   see `Runtime functions`_
+BPF_EXIT  0x9    0x0  return                      BPF_JMP only
+BPF_JLT   0xa    any  PC += offset if dst < src   unsigned
+BPF_JLE   0xb    any  PC += offset if dst <= src  unsigned
+BPF_JSLT  0xc    any  PC += offset if dst < src   signed
+BPF_JSLE  0xd    any  PC += offset if dst <= src  signed
+========  =====  ===  ==========================  ========================
 
 Helper functions
 ~~~~~~~~~~~~~~~~
@@ -301,6 +303,18 @@ with the remaining registers being ignored.  The definition of a helper function
 is responsible for specifying the type (e.g., integer, pointer, etc.) of the value returned,
 the number of arguments, and the type of each argument.
 
+Runtime functions
+~~~~~~~~~~~~~~~~~
+Runtime functions are like helper functions except that they are not specific
+to eBPF programs.  They use a different numbering space from helper functions,
+but otherwise the same considerations apply.
+
+eBPF functions
+~~~~~~~~~~~~~~
+eBPF functions are functions exposed by the same eBPF program as the caller,
+and are referenced by offset from the call instruction, similar to ``BPF_JA``.
+A ``BPF_EXIT`` within the eBPF function will return to the caller.
+
 Load and store instructions
 ===========================
 
@@ -585,6 +599,8 @@ opcode  src  imm   description                                          referenc
 0x7f    any  0x00  dst >>= src                                          `Arithmetic instructions`_
 0x84    0x0  0x00  dst = (uint32_t)-dst                                 `Arithmetic instructions`_
 0x85    0x0  any   call helper function imm                             `Helper functions`_
+0x85    0x1  any   call PC += offset                                    `eBPF functions`_
+0x85    0x2  any   call runtime function imm                            `Runtime functions`_
 0x87    0x0  0x00  dst = -dst                                           `Arithmetic instructions`_
 0x94    0x0  any   dst = (uint32_t)((imm != 0) ? (dst % imm) : dst)     `Arithmetic instructions`_
 0x95    0x0  0x00  return                                               `Jump instructions`_
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 15/15] ebpf-docs: Add note about invalid instruction
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
                   ` (12 preceding siblings ...)
  2022-09-27 18:59 ` [PATCH 14/15] ebpf-docs: Add extended call instructions dthaler1968
@ 2022-09-27 18:59 ` dthaler1968
  2022-09-30 22:21   ` Alexei Starovoitov
  2022-09-30 20:50 ` [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file patchwork-bot+netdevbpf
  14 siblings, 1 reply; 50+ messages in thread
From: dthaler1968 @ 2022-09-27 18:59 UTC (permalink / raw)
  To: bpf; +Cc: Dave Thaler

From: Dave Thaler <dthaler@microsoft.com>

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
---
 Documentation/bpf/clang-notes.rst     | 5 +++++
 Documentation/bpf/instruction-set.rst | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/Documentation/bpf/clang-notes.rst b/Documentation/bpf/clang-notes.rst
index 528feddf2..3c934421b 100644
--- a/Documentation/bpf/clang-notes.rst
+++ b/Documentation/bpf/clang-notes.rst
@@ -20,6 +20,11 @@ Arithmetic instructions
 For CPU versions prior to 3, Clang v7.0 and later can enable ``BPF_ALU`` support with
 ``-Xclang -target-feature -Xclang +alu32``.  In CPU version 3, support is automatically included.
 
+Invalid instructions
+====================
+
+Clang will generate the invalid ``BPF_CALL | BPF_X | BPF_JMP`` (0x8d) instruction if ``-O0`` is used.
+
 Atomic operations
 =================
 
diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 2ac8f0dae..af9dc0cc6 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -303,6 +303,9 @@ with the remaining registers being ignored.  The definition of a helper function
 is responsible for specifying the type (e.g., integer, pointer, etc.) of the value returned,
 the number of arguments, and the type of each argument.
 
+Note that ``BPF_CALL | BPF_X | BPF_JMP`` (0x8d), where the helper function integer
+would be read from a specified register, is not currently permitted.
+
 Runtime functions
 ~~~~~~~~~~~~~~~~~
 Runtime functions are like helper functions except that they are not specific
-- 
2.33.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/15] ebpf-docs: Use standard type convention in standard doc
  2022-09-27 18:59 ` [PATCH 06/15] ebpf-docs: Use standard type convention in standard doc dthaler1968
@ 2022-09-30 20:49   ` Alexei Starovoitov
  0 siblings, 0 replies; 50+ messages in thread
From: Alexei Starovoitov @ 2022-09-30 20:49 UTC (permalink / raw)
  To: dthaler1968; +Cc: bpf, Dave Thaler

On Tue, Sep 27, 2022 at 06:59:49PM +0000, dthaler1968@googlemail.com wrote:
> From: Dave Thaler <dthaler@microsoft.com>
> 
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>

Please always add commit log even it's just a copy paste of a subject line.

The patch subj should be 'bpf, docs:'.
I fixed it up while applying the first 5 patches.

> ---
>  Documentation/bpf/instruction-set.rst | 14 ++++++++++----
>  Documentation/bpf/linux-notes.rst     |  6 ++++++
>  2 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index 4997d2088..a24bc5d53 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -7,6 +7,10 @@ eBPF Instruction Set Specification, v1.0
>  
>  This document specifies version 1.0 of the eBPF instruction set.
>  
> +Documentation conventions
> +=========================
> +
> +This specification uses the standard C types (uint32_t, etc.) in documentation.
>  
>  Registers and calling convention
>  ================================
> @@ -114,7 +118,9 @@ BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
>  
>  ``BPF_ADD | BPF_X | BPF_ALU`` means::
>  
> -  dst_reg = (u32) dst_reg + (u32) src_reg;
> +  dst_reg = (uint32_t) dst_reg + (uint32_t) src_reg;
> +
> +where '(uint32_t)' indicates truncation to 32 bits.

This part I'm not excited about.
imo the "standard" types are too verbose. They make the doc harder to read.
I think it would be just fine for the standard to say in the conventions section
that u32 is a 32-bit unsigned integer and that's how it's used in the doc.
u32 would be a name that the doc uses. Name match with linux type is 'accidental'.

>  ``BPF_ADD | BPF_X | BPF_ALU64`` means::
>  
> @@ -122,7 +128,7 @@ BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
>  
>  ``BPF_XOR | BPF_K | BPF_ALU`` means::
>  
> -  src_reg = (u32) src_reg ^ (u32) imm32
> +  src_reg = (uint32_t) src_reg ^ (uint32_t) imm32
>  
>  ``BPF_XOR | BPF_K | BPF_ALU64`` means::
>  
> @@ -276,11 +282,11 @@ BPF_XOR   0xa0   atomic xor
>  
>  ``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::
>  
> -  *(u32 *)(dst_reg + off16) += src_reg
> +  *(uint32_t *)(dst_reg + off16) += src_reg
>  
>  ``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::
>  
> -  *(u64 *)(dst_reg + off16) += src_reg
> +  *(uint32_t *)(dst_reg + off16) += src_reg

Bug.
And the later patch fixes it :(
Let's not add it in the first place.

>  In addition to the simple atomic operations, there also is a modifier and
>  two complex atomic operations:
> diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/linux-notes.rst
> index 1c31379b4..522ebe27d 100644
> --- a/Documentation/bpf/linux-notes.rst
> +++ b/Documentation/bpf/linux-notes.rst
> @@ -7,6 +7,12 @@ Linux implementation notes
>  
>  This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
>  
> +Arithmetic instructions
> +=======================
> +
> +While the eBPF instruction set document uses the standard C terminology as the cross-platform specification,
> +in the Linux kernel, uint32_t is expressed as u32, uint64_t is expressed as u64, etc.
> +
>  Byte swap instructions
>  ======================
>  
> -- 
> 2.33.4
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file
  2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
                   ` (13 preceding siblings ...)
  2022-09-27 18:59 ` [PATCH 15/15] ebpf-docs: Add note about invalid instruction dthaler1968
@ 2022-09-30 20:50 ` patchwork-bot+netdevbpf
  14 siblings, 0 replies; 50+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-09-30 20:50 UTC (permalink / raw)
  To: None; +Cc: bpf, dthaler

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Tue, 27 Sep 2022 18:59:44 +0000 you wrote:
> From: Dave Thaler <dthaler@microsoft.com>
> 
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> ---
>  Documentation/bpf/instruction-set.rst | 38 ++--------------
>  Documentation/bpf/linux-notes.rst     | 65 +++++++++++++++++++++++++++
>  2 files changed, 68 insertions(+), 35 deletions(-)
>  create mode 100644 Documentation/bpf/linux-notes.rst

Here is the summary with links:
  - [01/15] ebpf-docs: Move legacy packet instructions to a separate file
    https://git.kernel.org/bpf/bpf-next/c/6166da0a02cd
  - [02/15] ebpf-docs: Linux byteswap note
    https://git.kernel.org/bpf/bpf-next/c/9a0bf21337c6
  - [03/15] ebpf-docs: Move Clang notes to a separate file
    https://git.kernel.org/bpf/bpf-next/c/6c7aaffb24ef
  - [04/15] ebpf-docs: Add Clang note about BPF_ALU
    https://git.kernel.org/bpf/bpf-next/c/ee159bdbdbce
  - [05/15] ebpf-docs: Add TOC and fix formatting
    https://git.kernel.org/bpf/bpf-next/c/5a8921ba96ce
  - [06/15] ebpf-docs: Use standard type convention in standard doc
    (no matching commit)
  - [07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow
    (no matching commit)
  - [08/15] ebpf-docs: Use consistent names for the same field
    (no matching commit)
  - [09/15] ebpf-docs: Explain helper functions
    (no matching commit)
  - [10/15] ebpf-docs: Add appendix of all opcodes in order
    (no matching commit)
  - [11/15] ebpf-docs: Improve English readability
    (no matching commit)
  - [12/15] ebpf-docs: Add Linux note about register calling convention
    (no matching commit)
  - [13/15] ebpf-docs: Add extended 64-bit immediate instructions
    (no matching commit)
  - [14/15] ebpf-docs: Add extended call instructions
    (no matching commit)
  - [15/15] ebpf-docs: Add note about invalid instruction
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow
  2022-09-27 18:59 ` [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow dthaler1968
@ 2022-09-30 20:52   ` Alexei Starovoitov
  2022-09-30 21:54     ` Dave Thaler
  0 siblings, 1 reply; 50+ messages in thread
From: Alexei Starovoitov @ 2022-09-30 20:52 UTC (permalink / raw)
  To: dthaler1968; +Cc: bpf, Dave Thaler

On Tue, Sep 27, 2022 at 06:59:50PM +0000, dthaler1968@googlemail.com wrote:
> From: Dave Thaler <dthaler@microsoft.com>
> 
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> ---
>  Documentation/bpf/instruction-set.rst | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index a24bc5d53..3c5a63612 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -103,19 +103,26 @@ code      value  description
>  BPF_ADD   0x00   dst += src
>  BPF_SUB   0x10   dst -= src
>  BPF_MUL   0x20   dst \*= src
> -BPF_DIV   0x30   dst /= src
> +BPF_DIV   0x30   dst = (src != 0) ? (dst / src) : 0
>  BPF_OR    0x40   dst \|= src
>  BPF_AND   0x50   dst &= src
>  BPF_LSH   0x60   dst <<= src
>  BPF_RSH   0x70   dst >>= src
>  BPF_NEG   0x80   dst = ~src
> -BPF_MOD   0x90   dst %= src
> +BPF_MOD   0x90   dst = (src != 0) ? (dst % src) : dst
>  BPF_XOR   0xa0   dst ^= src
>  BPF_MOV   0xb0   dst = src
>  BPF_ARSH  0xc0   sign extending shift right
>  BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
>  ========  =====  ==========================================================
>  
> +Underflow and overflow are allowed during arithmetic operations,
> +meaning the 64-bit or 32-bit value will wrap.  If
> +eBPF program execution would result in division by zero,
> +the destination register is instead set to zero.
> +If execution would result in modulo by zero,
> +the destination register is instead left unchanged.
> +
>  ``BPF_ADD | BPF_X | BPF_ALU`` means::
>  
>    dst_reg = (uint32_t) dst_reg + (uint32_t) src_reg;
> @@ -135,6 +142,14 @@ where '(uint32_t)' indicates truncation to 32 bits.
>    src_reg = src_reg ^ imm32
>  
>  
> +Also note that the modulo operation often varies by language
> +when the dividend or divisor are negative, where Python, Ruby, etc.
> +differ from C, Go, Java, etc. This specification requires that
> +modulo use truncated division (where -13 % 3 == -1) as implemented
> +in C, Go, etc.:
> +
> +   a % n = a - n * trunc(a / n)
> +

Interesting bit of info, but I'm not sure how it relates to the ISA doc.

I think it's more important to say that BPF ISA only supports unsigned div/mod.
There are no instructions for signed div/mod.

>  Byte swap instructions
>  ~~~~~~~~~~~~~~~~~~~~~~
>  
> -- 
> 2.33.4
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 08/15] ebpf-docs: Use consistent names for the same field
  2022-09-27 18:59 ` [PATCH 08/15] ebpf-docs: Use consistent names for the same field dthaler1968
@ 2022-09-30 20:57   ` Alexei Starovoitov
  2022-10-04 14:44     ` Dave Thaler
  0 siblings, 1 reply; 50+ messages in thread
From: Alexei Starovoitov @ 2022-09-30 20:57 UTC (permalink / raw)
  To: dthaler1968; +Cc: bpf, Dave Thaler

On Tue, Sep 27, 2022 at 06:59:51PM +0000, dthaler1968@googlemail.com wrote:
> From: Dave Thaler <dthaler@microsoft.com>
> 
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> ---
>  Documentation/bpf/instruction-set.rst | 107 ++++++++++++++++++--------
>  1 file changed, 76 insertions(+), 31 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index 3c5a63612..2987234eb 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -34,20 +34,59 @@ Instruction encoding
>  eBPF has two instruction encodings:
>  
>  * the basic instruction encoding, which uses 64 bits to encode an instruction
> -* the wide instruction encoding, which appends a second 64-bit immediate value
> -  (imm64) after the basic instruction for a total of 128 bits.
> +* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
> +  constant) value after the basic instruction for a total of 128 bits.
>  
> -The basic instruction encoding looks as follows:
> +The basic instruction encoding is as follows, where MSB and LSB mean the most significant
> +bits and least significant bits, respectively:
>  
>  =============  =======  ===============  ====================  ============
>  32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
>  =============  =======  ===============  ====================  ============
> -immediate      offset   source register  destination register  opcode
> +imm            offset   src              dst                   opcode
>  =============  =======  ===============  ====================  ============
>  
> +imm
> +  signed integer immediate value
> +
> +offset
> +  signed integer offset used with pointer arithmetic
> +
> +src
> +  the source register number (0-10), except where otherwise specified
> +  (`64-bit immediate instructions`_ reuse this field for other purposes)

There are more than one?
I guess we have such section now,
but in ISA it really is only one insn. LD_IMM64.
It's one insn for the interpreter and one insn for JITs.

> +
> +dst
> +  destination register number (0-10)
> +
> +opcode
> +  operation to perform
> +
>  Note that most instructions do not use all of the fields.
>  Unused fields shall be cleared to zero.
>  
> +As discussed below in `64-bit immediate instructions`_, some
> +instructions use a 64-bit immediate value that is constructed as follows.
> +The 64 bits following the basic instruction contain a pseudo instruction
> +using the same format but with opcode, dst, src, and offset all set to zero,
> +and imm containing the high 32 bits of the immediate value.

'instructions' here and further reads a bit odd.
May be calling it one instruction where imm_lo/hi have different semantics
depending on src would be better?

> +
> +=================  ==================
> +64 bits (MSB)      64 bits (LSB)
> +=================  ==================
> +basic instruction  pseudo instruction
> +=================  ==================
> +
> +Thus the 64-bit immediate value is constructed as follows:
> +
> +  imm64 = imm + (next_imm << 32)
> +
> +where 'next_imm' refers to the imm value of the pseudo instruction
> +following the basic instruction.
> +
> +In the remainder of this document 'src' and 'dst' refer to the values of the source
> +and destination registers, respectively, rather than the register number.
> +
>  Instruction classes
>  -------------------
>  
> @@ -75,20 +114,24 @@ For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` an
>  ==============  ======  =================
>  4 bits (MSB)    1 bit   3 bits (LSB)
>  ==============  ======  =================
> -operation code  source  instruction class
> +code            source  instruction class
>  ==============  ======  =================
>  
> -The 4th bit encodes the source operand:

feels wrong to lose this part.

> +code
> +  the operation code, whose meaning varies by instruction class
>  
> -  ======  =====  ========================================
> -  source  value  description
> -  ======  =====  ========================================
> -  BPF_K   0x00   use 32-bit immediate as source operand
> -  BPF_X   0x08   use 'src_reg' register as source operand
> -  ======  =====  ========================================
> +source
> +  the source operand location, which unless otherwise specified is one of:
>  
> -The four MSB bits store the operation code.

same here.

> +  ======  =====  ==========================================
> +  source  value  description
> +  ======  =====  ==========================================
> +  BPF_K   0x00   use 32-bit 'imm' value as source operand
> +  BPF_X   0x08   use 'src' register value as source operand
> +  ======  =====  ==========================================
>  
> +instruction class
> +  the instruction class (see `Instruction classes`_)
>  
>  Arithmetic instructions
>  -----------------------
> @@ -116,6 +159,8 @@ BPF_ARSH  0xc0   sign extending shift right
>  BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
>  ========  =====  ==========================================================
>  
> +where 'src' is the source operand value.
> +
>  Underflow and overflow are allowed during arithmetic operations,
>  meaning the 64-bit or 32-bit value will wrap.  If
>  eBPF program execution would result in division by zero,
> @@ -125,21 +170,21 @@ the destination register is instead left unchanged.
>  
>  ``BPF_ADD | BPF_X | BPF_ALU`` means::
>  
> -  dst_reg = (uint32_t) dst_reg + (uint32_t) src_reg;
> +  dst = (uint32_t) (dst + src)
>  
>  where '(uint32_t)' indicates truncation to 32 bits.
>  
>  ``BPF_ADD | BPF_X | BPF_ALU64`` means::
>  
> -  dst_reg = dst_reg + src_reg
> +  dst = dst + src
>  
>  ``BPF_XOR | BPF_K | BPF_ALU`` means::
>  
> -  src_reg = (uint32_t) src_reg ^ (uint32_t) imm32
> +  src = (uint32_t) src ^ (uint32_t) imm
>  
>  ``BPF_XOR | BPF_K | BPF_ALU64`` means::
>  
> -  src_reg = src_reg ^ imm32
> +  src = src ^ imm
>  
>  
>  Also note that the modulo operation often varies by language
> @@ -176,11 +221,11 @@ Examples:
>  
>  ``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
>  
> -  dst_reg = htole16(dst_reg)
> +  dst = htole16(dst)
>  
>  ``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
>  
> -  dst_reg = htobe64(dst_reg)
> +  dst = htobe64(dst)
>  
>  Jump instructions
>  -----------------
> @@ -255,15 +300,15 @@ instructions that transfer data between a register and memory.
>  
>  ``BPF_MEM | <size> | BPF_STX`` means::
>  
> -  *(size *) (dst_reg + off) = src_reg
> +  *(size *) (dst + offset) = src_reg
>  
>  ``BPF_MEM | <size> | BPF_ST`` means::
>  
> -  *(size *) (dst_reg + off) = imm32
> +  *(size *) (dst + offset) = imm32
>  
>  ``BPF_MEM | <size> | BPF_LDX`` means::
>  
> -  dst_reg = *(size *) (src_reg + off)
> +  dst = *(size *) (src + offset)
>  
>  Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
>  
> @@ -297,11 +342,11 @@ BPF_XOR   0xa0   atomic xor
>  
>  ``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::
>  
> -  *(uint32_t *)(dst_reg + off16) += src_reg
> +  *(uint32_t *)(dst + offset) += src
>  
>  ``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::
>  
> -  *(uint32_t *)(dst_reg + off16) += src_reg
> +  *(uint64_t *)(dst + offset) += src
>  
>  In addition to the simple atomic operations, there also is a modifier and
>  two complex atomic operations:
> @@ -316,16 +361,16 @@ BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
>  
>  The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
>  always set for the complex atomic operations.  If the ``BPF_FETCH`` flag
> -is set, then the operation also overwrites ``src_reg`` with the value that
> +is set, then the operation also overwrites ``src`` with the value that
>  was in memory before it was modified.
>  
> -The ``BPF_XCHG`` operation atomically exchanges ``src_reg`` with the value
> -addressed by ``dst_reg + off``.
> +The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
> +addressed by ``dst + offset``.
>  
>  The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
> -``dst_reg + off`` with ``R0``. If they match, the value addressed by
> -``dst_reg + off`` is replaced with ``src_reg``. In either case, the
> -value that was at ``dst_reg + off`` before the operation is zero-extended
> +``dst + offset`` with ``R0``. If they match, the value addressed by
> +``dst + offset`` is replaced with ``src``. In either case, the
> +value that was at ``dst + offset`` before the operation is zero-extended
>  and loaded back to ``R0``.
>  
>  64-bit immediate instructions
> @@ -338,7 +383,7 @@ There is currently only one such instruction.
>  
>  ``BPF_LD | BPF_DW | BPF_IMM`` means::
>  
> -  dst_reg = imm64
> +  dst = imm64
>  
>  
>  Legacy BPF Packet access instructions
> -- 
> 2.33.4
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow
  2022-09-30 20:52   ` Alexei Starovoitov
@ 2022-09-30 21:54     ` Dave Thaler
  2022-09-30 21:59       ` Alexei Starovoitov
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Thaler @ 2022-09-30 21:54 UTC (permalink / raw)
  To: Alexei Starovoitov, dthaler1968; +Cc: bpf

[...]
> > +Also note that the modulo operation often varies by language when the
> > +dividend or divisor are negative, where Python, Ruby, etc.
> > +differ from C, Go, Java, etc. This specification requires that modulo
> > +use truncated division (where -13 % 3 == -1) as implemented in C, Go,
> > +etc.:
> > +
> > +   a % n = a - n * trunc(a / n)
> > +
> 
> Interesting bit of info, but I'm not sure how it relates to the ISA doc.

It's because there's multiple definitions of modulo out there as the paragraph notes,
which differ in what they do with negative numbers.
The ISA defines the modulo operation as being the specific version above.
If you tried to implement the ISA in say Python and didn't know that,
you'd have a non-compliant implementation.

Dave

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow
  2022-09-30 21:54     ` Dave Thaler
@ 2022-09-30 21:59       ` Alexei Starovoitov
  2022-09-30 22:41         ` Dave Thaler
  0 siblings, 1 reply; 50+ messages in thread
From: Alexei Starovoitov @ 2022-09-30 21:59 UTC (permalink / raw)
  To: Dave Thaler; +Cc: dthaler1968, bpf

On Fri, Sep 30, 2022 at 09:54:17PM +0000, Dave Thaler wrote:
> [...]
> > > +Also note that the modulo operation often varies by language when the
> > > +dividend or divisor are negative, where Python, Ruby, etc.
> > > +differ from C, Go, Java, etc. This specification requires that modulo
> > > +use truncated division (where -13 % 3 == -1) as implemented in C, Go,
> > > +etc.:
> > > +
> > > +   a % n = a - n * trunc(a / n)
> > > +
> > 
> > Interesting bit of info, but I'm not sure how it relates to the ISA doc.
> 
> It's because there's multiple definitions of modulo out there as the paragraph notes,
> which differ in what they do with negative numbers.
> The ISA defines the modulo operation as being the specific version above.
> If you tried to implement the ISA in say Python and didn't know that,
> you'd have a non-compliant implementation.

Is it because the languages have weird rules to pick between signed vs unsigned mod?
At least from llvm pov the smod and umod have fixed behavior.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 09/15] ebpf-docs: Explain helper functions
  2022-09-27 18:59 ` [PATCH 09/15] ebpf-docs: Explain helper functions dthaler1968
@ 2022-09-30 22:01   ` Alexei Starovoitov
  2022-09-30 23:01     ` Dave Thaler
  0 siblings, 1 reply; 50+ messages in thread
From: Alexei Starovoitov @ 2022-09-30 22:01 UTC (permalink / raw)
  To: dthaler1968; +Cc: bpf, Dave Thaler

On Tue, Sep 27, 2022 at 06:59:52PM +0000, dthaler1968@googlemail.com wrote:
> From: Dave Thaler <dthaler@microsoft.com>
> 
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> ---
>  Documentation/bpf/instruction-set.rst | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index 2987234eb..926957830 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -245,7 +245,7 @@ BPF_JSET  0x40   PC += off if dst & src
>  BPF_JNE   0x50   PC += off if dst != src
>  BPF_JSGT  0x60   PC += off if dst > src     signed
>  BPF_JSGE  0x70   PC += off if dst >= src    signed
> -BPF_CALL  0x80   function call
> +BPF_CALL  0x80   function call              see `Helper functions`_
>  BPF_EXIT  0x90   function / program return  BPF_JMP only
>  BPF_JLT   0xa0   PC += off if dst < src     unsigned
>  BPF_JLE   0xb0   PC += off if dst <= src    unsigned
> @@ -256,6 +256,22 @@ BPF_JSLE  0xd0   PC += off if dst <= src    signed
>  The eBPF program needs to store the return value into register R0 before doing a
>  BPF_EXIT.
>  
> +Helper functions
> +~~~~~~~~~~~~~~~~
> +Helper functions are a concept whereby BPF programs can call into a
> +set of function calls exposed by the eBPF runtime.  Each helper
> +function is identified by an integer used in a ``BPF_CALL`` instruction.
> +The available helper functions may differ for each eBPF program type.
> +
> +Conceptually, each helper function is implemented with a commonly shared function
> +signature defined as:
> +
> +  uint64_t function(uint64_t r1, uint64_t r2, uint64_t r3, uint64_t r4, uint64_t r5)
> +
> +In actuality, each helper function is defined as taking between 0 and 5 arguments,
> +with the remaining registers being ignored.  The definition of a helper function
> +is responsible for specifying the type (e.g., integer, pointer, etc.) of the value returned,
> +the number of arguments, and the type of each argument.

If we explain helpers in the doc then we should explain kfuncs and bpf-to-bpf calls as well.
Otherwise it looks incomplete and eventually will suffer the same issue as '64-bit instructionS'.
Here it's only one CALL insn.
Though 'imm' value can be interpreted differently bpf2bpf vs helper vs kfunc.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 10/15] ebpf-docs: Add appendix of all opcodes in order
  2022-09-27 18:59 ` [PATCH 10/15] ebpf-docs: Add appendix of all opcodes in order dthaler1968
@ 2022-09-30 22:02   ` Alexei Starovoitov
  2022-09-30 22:43     ` Dave Thaler
  0 siblings, 1 reply; 50+ messages in thread
From: Alexei Starovoitov @ 2022-09-30 22:02 UTC (permalink / raw)
  To: dthaler1968; +Cc: bpf, Dave Thaler

On Tue, Sep 27, 2022 at 06:59:53PM +0000, dthaler1968@googlemail.com wrote:
> From: Dave Thaler <dthaler@microsoft.com>
> 
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> ---
>  Documentation/bpf/instruction-set.rst | 197 ++++++++++++++++++++++++++
>  1 file changed, 197 insertions(+)
> 
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index 926957830..b6f098104 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -408,3 +408,200 @@ Legacy BPF Packet access instructions
>  eBPF previously introduced special instructions for access to packet data that were
>  carried over from classic BPF. However, these instructions are
>  deprecated and should no longer be used.
> +
> +Appendix
> +========
> +
> +For reference, the following table lists opcodes in order by value.
> +
> +======  ===  ====  ===================================================  ========================================
> +opcode  src  imm   description                                          reference
> +======  ===  ====  ===================================================  ========================================
> +0x00    0x0  any   (additional immediate value)                         `64-bit immediate instructions`_
> +0x04    0x0  any   dst = (uint32_t)(dst + imm)                          `Arithmetic instructions`_
> +0x05    0x0  0x00  goto +offset                                         `Jump instructions`_
> +0x07    0x0  any   dst += imm                                           `Arithmetic instructions`_
> +0x0c    any  0x00  dst = (uint32_t)(dst + src)                          `Arithmetic instructions`_


I guess it's useful, but how did you generate this table?
Looks very error prone if all hex numbers were done by human.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 11/15] ebpf-docs: Improve English readability
  2022-09-27 18:59 ` [PATCH 11/15] ebpf-docs: Improve English readability dthaler1968
@ 2022-09-30 22:16   ` Alexei Starovoitov
  2022-10-04 14:32     ` Dave Thaler
  0 siblings, 1 reply; 50+ messages in thread
From: Alexei Starovoitov @ 2022-09-30 22:16 UTC (permalink / raw)
  To: dthaler1968; +Cc: bpf, Dave Thaler

On Tue, Sep 27, 2022 at 06:59:54PM +0000, dthaler1968@googlemail.com wrote:
> From: Dave Thaler <dthaler@microsoft.com>
> 
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> ---
>  Documentation/bpf/instruction-set.rst | 137 ++++++++++++++++----------
>  1 file changed, 87 insertions(+), 50 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index b6f098104..328207ff6 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -7,6 +7,9 @@ eBPF Instruction Set Specification, v1.0
>  
>  This document specifies version 1.0 of the eBPF instruction set.
>  
> +The eBPF instruction set consists of eleven 64 bit registers, a program counter,
> +and 512 bytes of stack space.

I would not add 512 to a doc.
That's what we have now, but we might relax that in the future.

> +
>  Documentation conventions
>  =========================
>  
> @@ -25,12 +28,24 @@ The eBPF calling convention is defined as:
>  * R6 - R9: callee saved registers that function calls will preserve
>  * R10: read-only frame pointer to access stack
>  
> -R0 - R5 are scratch registers and eBPF programs needs to spill/fill them if
> -necessary across calls.
> +Registers R0 - R5 are scratch registers, meaning the BPF program needs to either
> +spill them to the BPF stack or move them to callee saved registers if these
> +arguments are to be reused across multiple function calls. Spilling means
> +that the value in the register is moved to the BPF stack. The reverse operation
> +of moving the variable from the BPF stack to the register is called filling.
> +The reason for spilling/filling is due to the limited number of registers.

More canonical way would be to say that r0-r5 are caller saved.
I like above clarification though.

> +
> +Upon entering execution of an eBPF program, registers R1 - R5 initially can contain
> +the input arguments for the program (similar to the argc/argv pair for a typical C program).
> +The actual number of registers used, and their meaning, is defined by the program type;
> +for example, a networking program might have an argument that includes network packet data
> +and/or metadata.
>  
>  Instruction encoding
>  ====================
>  
> +An eBPF program is a sequence of instructions.
> +
>  eBPF has two instruction encodings:
>  
>  * the basic instruction encoding, which uses 64 bits to encode an instruction
> @@ -63,7 +78,7 @@ opcode
>    operation to perform
>  
>  Note that most instructions do not use all of the fields.
> -Unused fields shall be cleared to zero.
> +Unused fields must be set to zero.
>  
>  As discussed below in `64-bit immediate instructions`_, some
>  instructions use a 64-bit immediate value that is constructed as follows.
> @@ -90,7 +105,9 @@ and destination registers, respectively, rather than the register number.
>  Instruction classes
>  -------------------
>  
> -The three LSB bits of the 'opcode' field store the instruction class:
> +The encoding of the 'opcode' field varies and can be determined from
> +the three least significant bits (LSB) of the 'opcode' field which holds
> +the "instruction class", as follows:
>  
>  =========  =====  ===============================  ===================================
>  class      value  description                      reference
> @@ -136,9 +153,11 @@ instruction class
>  Arithmetic instructions
>  -----------------------
>  
> -``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
> +Instruction class ``BPF_ALU`` uses 32-bit wide operands (zeroing the upper 32 bits
> +of the destination register) while ``BPF_ALU64`` uses 64-bit wide operands for
>  otherwise identical operations.
> -The 'code' field encodes the operation as below:
> +
> +The 4-bit 'code' field encodes the operation as follows:
>  
>  ========  =====  ==========================================================
>  code      value  description
> @@ -168,21 +187,23 @@ the destination register is instead set to zero.
>  If execution would result in modulo by zero,
>  the destination register is instead left unchanged.
>  
> -``BPF_ADD | BPF_X | BPF_ALU`` means::
> +Examples:
> +
> +``BPF_ADD | BPF_X | BPF_ALU`` (0x0c) means::
>  
>    dst = (uint32_t) (dst + src)
>  
>  where '(uint32_t)' indicates truncation to 32 bits.
>  
> -``BPF_ADD | BPF_X | BPF_ALU64`` means::
> +``BPF_ADD | BPF_X | BPF_ALU64`` (0x0f) means::

I don't think adding hex values help here.

>  
>    dst = dst + src
>  
> -``BPF_XOR | BPF_K | BPF_ALU`` means::
> +``BPF_XOR | BPF_K | BPF_ALU`` (0xa4) means::
>  
>    src = (uint32_t) src ^ (uint32_t) imm
>  
> -``BPF_XOR | BPF_K | BPF_ALU64`` means::
> +``BPF_XOR | BPF_K | BPF_ALU64`` (0xa7) means::
>  
>    src = src ^ imm
>  
> @@ -204,8 +225,9 @@ The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
>  The byte swap instructions operate on the destination register
>  only and do not use a separate source register or immediate value.
>  
> -The 1-bit source operand field in the opcode is used to to select what byte
> -order the operation convert from or to:
> +Byte swap instructions use non-default semantics of the 1-bit 'source' field in

I would drop 'non-default'. Many fields have different meanings depending on opcode.
BPF_SRC() macro just reads that bit.

> +the 'opcode' field.  Instead of indicating the source operator, it is instead
> +used to select what byte order the operation converts from or to:
>  
>  =========  =====  =================================================
>  source     value  description
> @@ -215,24 +237,33 @@ BPF_TO_BE  0x08   convert between host byte order and big endian
>  =========  =====  =================================================
>  
>  The 'imm' field encodes the width of the swap operations.  The following widths
> -are supported: 16, 32 and 64.
> -
> -Examples:
> -
> -``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
> -
> -  dst = htole16(dst)
> -
> -``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
> -
> -  dst = htobe64(dst)
> +are supported: 16, 32 and 64. The following table summarizes the resulting
> +possibilities:
> +
> +=============================  =========  ===  ========  ==================
> +opcode construction            opcode     imm  mnemonic  pseudocode
> +=============================  =========  ===  ========  ==================
> +BPF_END | BPF_TO_LE | BPF_ALU  0xd4       16   le16 dst  dst = htole16(dst)
> +BPF_END | BPF_TO_LE | BPF_ALU  0xd4       32   le32 dst  dst = htole32(dst)
> +BPF_END | BPF_TO_LE | BPF_ALU  0xd4       64   le64 dst  dst = htole64(dst)
> +BPF_END | BPF_TO_BE | BPF_ALU  0xdc       16   be16 dst  dst = htobe16(dst)
> +BPF_END | BPF_TO_BE | BPF_ALU  0xdc       32   be32 dst  dst = htobe32(dst)
> +BPF_END | BPF_TO_BE | BPF_ALU  0xdc       64   be64 dst  dst = htobe64(dst)
> +=============================  =========  ===  ========  ==================
> +
> +where
> +
> +* mnenomic indicates a short form that might be displayed by some tools such as disassemblers
> +* 'htoleNN()' indicates converting a NN-bit value from host byte order to little-endian byte order
> +* 'htobeNN()' indicates converting a NN-bit value from host byte order to big-endian byte order

I think we need to add normal bswap insn.
These to_le and to_be are very awkward to use.
As soon as we have new insn the compiler will be using it.
There is no equivalent of to_be and to_le in C. It wasn't good ISA design.

>  Jump instructions
>  -----------------
>  
> -``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
> +Instruction class ``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
>  otherwise identical operations.
> -The 'code' field encodes the operation as below:
> +
> +The 4-bit 'code' field encodes the operation as below, where PC is the program counter:
>  
>  ========  =====  =========================  ============
>  code      value  description                notes
> @@ -253,9 +284,6 @@ BPF_JSLT  0xc0   PC += off if dst < src     signed
>  BPF_JSLE  0xd0   PC += off if dst <= src    signed
>  ========  =====  =========================  ============
>  
> -The eBPF program needs to store the return value into register R0 before doing a
> -BPF_EXIT.
> -
>  Helper functions
>  ~~~~~~~~~~~~~~~~
>  Helper functions are a concept whereby BPF programs can call into a
> @@ -285,7 +313,8 @@ For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_
>  mode          size    instruction class
>  ============  ======  =================
>  
> -The mode modifier is one of:
> +mode
> +  one of:
>  
>    =============  =====  ====================================  =============
>    mode modifier  value  description                           reference
> @@ -297,7 +326,8 @@ The mode modifier is one of:
>    BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
>    =============  =====  ====================================  =============
>  
> -The size modifier is one of:
> +size
> +  one of:
>  
>    =============  =====  =====================
>    size modifier  value  description
> @@ -308,25 +338,31 @@ The size modifier is one of:
>    BPF_DW         0x18   double word (8 bytes)
>    =============  =====  =====================
>  
> +instruction class
> +  the instruction class (see `Instruction classes`_)
> +
>  Regular load and store operations
>  ---------------------------------
>  
>  The ``BPF_MEM`` mode modifier is used to encode regular load and store
>  instructions that transfer data between a register and memory.
>  
> -``BPF_MEM | <size> | BPF_STX`` means::
> -
> -  *(size *) (dst + offset) = src_reg
> -
> -``BPF_MEM | <size> | BPF_ST`` means::
> -
> -  *(size *) (dst + offset) = imm32
> -
> -``BPF_MEM | <size> | BPF_LDX`` means::
> -
> -  dst = *(size *) (src + offset)
> -
> -Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
> +=============================  =========  ====================================
> +opcode construction            opcode     pseudocode
> +=============================  =========  ====================================
> +BPF_MEM | BPF_B | BPF_LDX      0x71       dst = \*(uint8_t \*) (src + offset)
> +BPF_MEM | BPF_H | BPF_LDX      0x69       dst = \*(uint16_t \*) (src + offset)
> +BPF_MEM | BPF_W | BPF_LDX      0x61       dst = \*(uint32_t \*) (src + offset)
> +BPF_MEM | BPF_DW | BPF_LDX     0x79       dst = \*(uint64_t \*) (src + offset)
> +BPF_MEM | BPF_B | BPF_ST       0x72       \*(uint8_t \*) (dst + offset) = imm
> +BPF_MEM | BPF_H | BPF_ST       0x6a       \*(uint16_t \*) (dst + offset) = imm
> +BPF_MEM | BPF_W | BPF_ST       0x62       \*(uint32_t \*) (dst + offset) = imm
> +BPF_MEM | BPF_DW | BPF_ST      0x7a       \*(uint64_t \*) (dst + offset) = imm
> +BPF_MEM | BPF_B | BPF_STX      0x73       \*(uint8_t \*) (dst + offset) = src
> +BPF_MEM | BPF_H | BPF_STX      0x6b       \*(uint16_t \*) (dst + offset) = src
> +BPF_MEM | BPF_W | BPF_STX      0x63       \*(uint32_t \*) (dst + offset) = src
> +BPF_MEM | BPF_DW | BPF_STX     0x7b       \*(uint64_t \*) (dst + offset) = src
> +=============================  =========  ====================================

I think the table is more verbose and less readable than the original text.

>  
>  Atomic operations
>  -----------------
> @@ -338,9 +374,11 @@ by other eBPF programs or means outside of this specification.
>  All atomic operations supported by eBPF are encoded as store operations
>  that use the ``BPF_ATOMIC`` mode modifier as follows:
>  
> -* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
> -* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
> -* 8-bit and 16-bit wide atomic operations are not supported.
> +* ``BPF_ATOMIC | BPF_W | BPF_STX`` (0xc3) for 32-bit operations
> +* ``BPF_ATOMIC | BPF_DW | BPF_STX`` (0xdb) for 64-bit operations
> +
> +Note that 8-bit (``BPF_B``) and 16-bit (``BPF_H``) wide atomic operations are not supported,
> +nor is ``BPF_ATOMIC | <size> | BPF_ST``.
>  
>  The 'imm' field is used to encode the actual atomic operation.
>  Simple atomic operation use a subset of the values defined to encode
> @@ -355,16 +393,15 @@ BPF_AND   0x50   atomic and
>  BPF_XOR   0xa0   atomic xor
>  ========  =====  ===========
>  
> -
> -``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::
> +``BPF_ATOMIC | BPF_W  | BPF_STX`` (0xc3) with 'imm' = BPF_ADD means::
>  
>    *(uint32_t *)(dst + offset) += src
>  
> -``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::
> +``BPF_ATOMIC | BPF_DW | BPF_STX`` (0xdb) with 'imm' = BPF ADD means::
>  
>    *(uint64_t *)(dst + offset) += src
>  
> -In addition to the simple atomic operations, there also is a modifier and
> +In addition to the simple atomic operations above, there also is a modifier and
>  two complex atomic operations:
>  
>  ===========  ================  ===========================
> -- 
> 2.33.4
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/15] ebpf-docs: Add Linux note about register calling convention
  2022-09-27 18:59 ` [PATCH 12/15] ebpf-docs: Add Linux note about register calling convention dthaler1968
@ 2022-09-30 22:17   ` Alexei Starovoitov
  0 siblings, 0 replies; 50+ messages in thread
From: Alexei Starovoitov @ 2022-09-30 22:17 UTC (permalink / raw)
  To: dthaler1968; +Cc: bpf, Dave Thaler

On Tue, Sep 27, 2022 at 06:59:55PM +0000, dthaler1968@googlemail.com wrote:
> From: Dave Thaler <dthaler@microsoft.com>
> 
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> ---
>  Documentation/bpf/linux-notes.rst | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/linux-notes.rst
> index 522ebe27d..0581ba326 100644
> --- a/Documentation/bpf/linux-notes.rst
> +++ b/Documentation/bpf/linux-notes.rst
> @@ -7,6 +7,12 @@ Linux implementation notes
>  
>  This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
>  
> +Registers and calling convention
> +================================
> +
> +All program types only use R1 which contains the "context", which is typically a structure containing all
> +the inputs needed, and the exit value for eBPF programs is passed as a 32 bit value.

There is a patch pending that makes return values 64-bit.
Also bpf progs that replace other bpf progs have all 5 input args.
I think this paragraph is unnecessary.

> +
>  Arithmetic instructions
>  =======================
>  
> -- 
> 2.33.4
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 15/15] ebpf-docs: Add note about invalid instruction
  2022-09-27 18:59 ` [PATCH 15/15] ebpf-docs: Add note about invalid instruction dthaler1968
@ 2022-09-30 22:21   ` Alexei Starovoitov
  0 siblings, 0 replies; 50+ messages in thread
From: Alexei Starovoitov @ 2022-09-30 22:21 UTC (permalink / raw)
  To: dthaler1968; +Cc: bpf, Dave Thaler

On Tue, Sep 27, 2022 at 06:59:58PM +0000, dthaler1968@googlemail.com wrote:
> From: Dave Thaler <dthaler@microsoft.com>
> 
> Signed-off-by: Dave Thaler <dthaler@microsoft.com>
> ---
>  Documentation/bpf/clang-notes.rst     | 5 +++++
>  Documentation/bpf/instruction-set.rst | 3 +++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/Documentation/bpf/clang-notes.rst b/Documentation/bpf/clang-notes.rst
> index 528feddf2..3c934421b 100644
> --- a/Documentation/bpf/clang-notes.rst
> +++ b/Documentation/bpf/clang-notes.rst
> @@ -20,6 +20,11 @@ Arithmetic instructions
>  For CPU versions prior to 3, Clang v7.0 and later can enable ``BPF_ALU`` support with
>  ``-Xclang -target-feature -Xclang +alu32``.  In CPU version 3, support is automatically included.
>  
> +Invalid instructions
> +====================
> +
> +Clang will generate the invalid ``BPF_CALL | BPF_X | BPF_JMP`` (0x8d) instruction if ``-O0`` is used.

I wouldn't call it invalid and it's not related to -O0.
It's a "reserved" instruction.
When we support indirect jumps that's what it would be.

> +
>  Atomic operations
>  =================
>  
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index 2ac8f0dae..af9dc0cc6 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -303,6 +303,9 @@ with the remaining registers being ignored.  The definition of a helper function
>  is responsible for specifying the type (e.g., integer, pointer, etc.) of the value returned,
>  the number of arguments, and the type of each argument.
>  
> +Note that ``BPF_CALL | BPF_X | BPF_JMP`` (0x8d), where the helper function integer
> +would be read from a specified register, is not currently permitted.

I would say:
Note that ``BPF_CALL | BPF_X | BPF_JMP`` instruction is reserved and currently not permitted.

> +
>  Runtime functions
>  ~~~~~~~~~~~~~~~~~
>  Runtime functions are like helper functions except that they are not specific
> -- 
> 2.33.4
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow
  2022-09-30 21:59       ` Alexei Starovoitov
@ 2022-09-30 22:41         ` Dave Thaler
  2022-09-30 23:41           ` Alexei Starovoitov
  2023-09-29 21:03             ` [Bpf] " Dave Thaler
  0 siblings, 2 replies; 50+ messages in thread
From: Dave Thaler @ 2022-09-30 22:41 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: dthaler1968, bpf

> -----Original Message-----
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Sent: Friday, September 30, 2022 2:59 PM
> To: Dave Thaler <dthaler@microsoft.com>
> Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
> Subject: Re: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero,
> overflow, and underflow
> 
> On Fri, Sep 30, 2022 at 09:54:17PM +0000, Dave Thaler wrote:
> > [...]
> > > > +Also note that the modulo operation often varies by language when
> > > > +the dividend or divisor are negative, where Python, Ruby, etc.
> > > > +differ from C, Go, Java, etc. This specification requires that
> > > > +modulo use truncated division (where -13 % 3 == -1) as
> > > > +implemented in C, Go,
> > > > +etc.:
> > > > +
> > > > +   a % n = a - n * trunc(a / n)
> > > > +
> > >
> > > Interesting bit of info, but I'm not sure how it relates to the ISA doc.
> >
> > It's because there's multiple definitions of modulo out there as the
> > paragraph notes, which differ in what they do with negative numbers.
> > The ISA defines the modulo operation as being the specific version above.
> > If you tried to implement the ISA in say Python and didn't know that,
> > you'd have a non-compliant implementation.
> 
> Is it because the languages have weird rules to pick between signed vs
> unsigned mod?
> At least from llvm pov the smod and umod have fixed behavior.

It's because there's different mathematical definitions and different languages have chosen different definitions.  E.g., languages/libraries that follow Knuth use a
different mathematical definition than C uses.  For details see:

https://en.wikipedia.org/wiki/Modulo_operation#Variants_of_the_definition

https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/

Dave


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH 10/15] ebpf-docs: Add appendix of all opcodes in order
  2022-09-30 22:02   ` Alexei Starovoitov
@ 2022-09-30 22:43     ` Dave Thaler
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Thaler @ 2022-09-30 22:43 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
[...] 
> I guess it's useful, but how did you generate this table?
> Looks very error prone if all hex numbers were done by human.

Yes it was human generated and checked by other reviewers such as Quentin.
I wish there was an automated way.  If someone else can come up with
an automated process, I welcome the patch contribution :)

Dave

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH 09/15] ebpf-docs: Explain helper functions
  2022-09-30 22:01   ` Alexei Starovoitov
@ 2022-09-30 23:01     ` Dave Thaler
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Thaler @ 2022-09-30 23:01 UTC (permalink / raw)
  To: Alexei Starovoitov, dthaler1968; +Cc: bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
[...]
> > +Helper functions
> > +~~~~~~~~~~~~~~~~
> > +Helper functions are a concept whereby BPF programs can call into a
> > +set of function calls exposed by the eBPF runtime.  Each helper
> > +function is identified by an integer used in a ``BPF_CALL`` instruction.
> > +The available helper functions may differ for each eBPF program type.
> > +
> > +Conceptually, each helper function is implemented with a commonly
> > +shared function signature defined as:
> > +
> > +  uint64_t function(uint64_t r1, uint64_t r2, uint64_t r3, uint64_t
> > + r4, uint64_t r5)
> > +
> > +In actuality, each helper function is defined as taking between 0 and
> > +5 arguments, with the remaining registers being ignored.  The
> > +definition of a helper function is responsible for specifying the
> > +type (e.g., integer, pointer, etc.) of the value returned, the number of
> arguments, and the type of each argument.
> 
> If we explain helpers in the doc then we should explain kfuncs and bpf-to-bpf
> calls as well.
> Otherwise it looks incomplete and eventually will suffer the same issue as '64-
> bit instructionS'.

Agree, and blurbs were indeed added in 
[PATCH 14/15] ebpf-docs: Add extended call instructions

Dave

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow
  2022-09-30 22:41         ` Dave Thaler
@ 2022-09-30 23:41           ` Alexei Starovoitov
  2022-10-04 16:36             ` Dave Thaler
  2023-09-29 21:03             ` [Bpf] " Dave Thaler
  1 sibling, 1 reply; 50+ messages in thread
From: Alexei Starovoitov @ 2022-09-30 23:41 UTC (permalink / raw)
  To: Dave Thaler; +Cc: dthaler1968, bpf

On Fri, Sep 30, 2022 at 3:41 PM Dave Thaler <dthaler@microsoft.com> wrote:
>
> > -----Original Message-----
> > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Sent: Friday, September 30, 2022 2:59 PM
> > To: Dave Thaler <dthaler@microsoft.com>
> > Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
> > Subject: Re: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero,
> > overflow, and underflow
> >
> > On Fri, Sep 30, 2022 at 09:54:17PM +0000, Dave Thaler wrote:
> > > [...]
> > > > > +Also note that the modulo operation often varies by language when
> > > > > +the dividend or divisor are negative, where Python, Ruby, etc.
> > > > > +differ from C, Go, Java, etc. This specification requires that
> > > > > +modulo use truncated division (where -13 % 3 == -1) as
> > > > > +implemented in C, Go,
> > > > > +etc.:
> > > > > +
> > > > > +   a % n = a - n * trunc(a / n)
> > > > > +
> > > >
> > > > Interesting bit of info, but I'm not sure how it relates to the ISA doc.
> > >
> > > It's because there's multiple definitions of modulo out there as the
> > > paragraph notes, which differ in what they do with negative numbers.
> > > The ISA defines the modulo operation as being the specific version above.
> > > If you tried to implement the ISA in say Python and didn't know that,
> > > you'd have a non-compliant implementation.
> >
> > Is it because the languages have weird rules to pick between signed vs
> > unsigned mod?
> > At least from llvm pov the smod and umod have fixed behavior.
>
> It's because there's different mathematical definitions and different languages have chosen different definitions.  E.g., languages/libraries that follow Knuth use a
> different mathematical definition than C uses.  For details see:
>
> https://en.wikipedia.org/wiki/Modulo_operation#Variants_of_the_definition
>
> https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/

Those differences are in signed div/mod only, right?
Unsigned div/mod doesn't have it, right?
bpf has only unsigned div/mod.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH 11/15] ebpf-docs: Improve English readability
  2022-09-30 22:16   ` Alexei Starovoitov
@ 2022-10-04 14:32     ` Dave Thaler
  2022-10-04 15:38       ` Alexei Starovoitov
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Thaler @ 2022-10-04 14:32 UTC (permalink / raw)
  To: Alexei Starovoitov, dthaler1968; +Cc: bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > +The eBPF instruction set consists of eleven 64 bit registers, a
> > +program counter, and 512 bytes of stack space.
> 
> I would not add 512 to a doc.
> That's what we have now, but we might relax that in the future.

I think it's important to at least give a minimum.  Will change to
"at least 512".

[...]
> > +Registers R0 - R5 are scratch registers, meaning the BPF program
> > +needs to either spill them to the BPF stack or move them to callee
> > +saved registers if these arguments are to be reused across multiple
> > +function calls. Spilling means that the value in the register is
> > +moved to the BPF stack. The reverse operation of moving the variable
> from the BPF stack to the register is called filling.
> > +The reason for spilling/filling is due to the limited number of registers.
> 
> More canonical way would be to say that r0-r5 are caller saved.

Will change "scratch" to "caller-saved"

[...]
> > -``BPF_ADD | BPF_X | BPF_ALU64`` means::
> > +``BPF_ADD | BPF_X | BPF_ALU64`` (0x0f) means::
> 
> I don't think adding hex values help here.

I found it very helpful in verifying that the Appendix table
was correct, and providing a correlation to the text here
that shows the construction of the value.  So I'd like to keep them.

[...]
> > -The 1-bit source operand field in the opcode is used to to select
> > what byte -order the operation convert from or to:
> > +Byte swap instructions use non-default semantics of the 1-bit
> > +'source' field in
> 
> I would drop 'non-default'. Many fields have different meanings depending
> on opcode.
> BPF_SRC() macro just reads that bit.

Will reword to say:
Byte swap instructions use the 1-bit 'source' field in the 'opcode' field
as follows.  Instead of indicating the source operator, it is instead
used to select what byte order the operation converts from or to:

[...]
> > +* mnenomic indicates a short form that might be displayed by some
> > +tools such as disassemblers
> > +* 'htoleNN()' indicates converting a NN-bit value from host byte
> > +order to little-endian byte order
> > +* 'htobeNN()' indicates converting a NN-bit value from host byte
> > +order to big-endian byte order
> 
> I think we need to add normal bswap insn.
> These to_le and to_be are very awkward to use.
> As soon as we have new insn the compiler will be using it.
> There is no equivalent of to_be and to_le in C. It wasn't good ISA design.

I will interpret this as a request for someone to do code work, rather
than any request for immediately changes to the doc :)

[...]
> >  Regular load and store operations
> >  ---------------------------------
> >
> >  The ``BPF_MEM`` mode modifier is used to encode regular load and
> > store  instructions that transfer data between a register and memory.
> >
> > -``BPF_MEM | <size> | BPF_STX`` means::
> > -
> > -  *(size *) (dst + offset) = src_reg
> > -
> > -``BPF_MEM | <size> | BPF_ST`` means::
> > -
> > -  *(size *) (dst + offset) = imm32
> > -
> > -``BPF_MEM | <size> | BPF_LDX`` means::
> > -
> > -  dst = *(size *) (src + offset)
> > -
> > -Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
> > +=============================  =========
> ====================================
> > +opcode construction            opcode     pseudocode
> > +=============================  =========
> ====================================
> > +BPF_MEM | BPF_B | BPF_LDX      0x71       dst = \*(uint8_t \*) (src + offset)
> > +BPF_MEM | BPF_H | BPF_LDX      0x69       dst = \*(uint16_t \*) (src +
> offset)
> > +BPF_MEM | BPF_W | BPF_LDX      0x61       dst = \*(uint32_t \*) (src +
> offset)
> > +BPF_MEM | BPF_DW | BPF_LDX     0x79       dst = \*(uint64_t \*) (src +
> offset)
> > +BPF_MEM | BPF_B | BPF_ST       0x72       \*(uint8_t \*) (dst + offset) = imm
> > +BPF_MEM | BPF_H | BPF_ST       0x6a       \*(uint16_t \*) (dst + offset) =
> imm
> > +BPF_MEM | BPF_W | BPF_ST       0x62       \*(uint32_t \*) (dst + offset) =
> imm
> > +BPF_MEM | BPF_DW | BPF_ST      0x7a       \*(uint64_t \*) (dst + offset) =
> imm
> > +BPF_MEM | BPF_B | BPF_STX      0x73       \*(uint8_t \*) (dst + offset) = src
> > +BPF_MEM | BPF_H | BPF_STX      0x6b       \*(uint16_t \*) (dst + offset) =
> src
> > +BPF_MEM | BPF_W | BPF_STX      0x63       \*(uint32_t \*) (dst + offset) =
> src
> > +BPF_MEM | BPF_DW | BPF_STX     0x7b       \*(uint64_t \*) (dst + offset) =
> src
> > +=============================  =========
> > +====================================
> 
> I think the table is more verbose and less readable than the original text.

Will change back to original text.

Dave

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH 08/15] ebpf-docs: Use consistent names for the same field
  2022-09-30 20:57   ` Alexei Starovoitov
@ 2022-10-04 14:44     ` Dave Thaler
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Thaler @ 2022-10-04 14:44 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
[...]
> > +src
> > +  the source register number (0-10), except where otherwise specified
> > +  (`64-bit immediate instructions`_ reuse this field for other
> > +purposes)
> 
> There are more than one?
> I guess we have such section now,
> but in ISA it really is only one insn. LD_IMM64.
> It's one insn for the interpreter and one insn for JITs.

Here the plural is really referring to occurrences in programs,
rather than implying multiple opcodes, so I don't think the grammar
is incorrect.

[...]
> > +As discussed below in `64-bit immediate instructions`_, some
> > +instructions use a 64-bit immediate value that is constructed as follows.
> > +The 64 bits following the basic instruction contain a pseudo
> > +instruction using the same format but with opcode, dst, src, and
> > +offset all set to zero, and imm containing the high 32 bits of the immediate
> value.
> 
> 'instructions' here and further reads a bit odd.
> May be calling it one instruction where imm_lo/hi have different semantics
> depending on src would be better?

I will reword the first sentence above to:

As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
instruction uses a 64-bit immediate value that is constructed as follows.

[...]
> 
> > +
> > +=================  ==================
> > +64 bits (MSB)      64 bits (LSB)
> > +=================  ================== basic instruction  pseudo
> > +instruction =================  ==================
> > +
> > +Thus the 64-bit immediate value is constructed as follows:
> > +
> > +  imm64 = imm + (next_imm << 32)
> > +
> > +where 'next_imm' refers to the imm value of the pseudo instruction
> > +following the basic instruction.
> > +
> > +In the remainder of this document 'src' and 'dst' refer to the values
> > +of the source and destination registers, respectively, rather than the
> register number.
> > +
> >  Instruction classes
> >  -------------------
> >
> > @@ -75,20 +114,24 @@ For arithmetic and jump instructions
> > (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` an  ==============  ======
> =================
> >  4 bits (MSB)    1 bit   3 bits (LSB)
> >  ==============  ======  ================= -operation code  source
> > instruction class
> > +code            source  instruction class
> >  ==============  ======  =================
> >
> > -The 4th bit encodes the source operand:
> 
> feels wrong to lose this part.

The concept is still in the document, just in the "Arithmetic and jump instructions"
section since it seems specific to those (i.e., not the "Load and store instructions").

> > +code
> > +  the operation code, whose meaning varies by instruction class
> >
> > -  ======  =====  ========================================
> > -  source  value  description
> > -  ======  =====  ========================================
> > -  BPF_K   0x00   use 32-bit immediate as source operand
> > -  BPF_X   0x08   use 'src_reg' register as source operand
> > -  ======  =====  ========================================
> > +source
> > +  the source operand location, which unless otherwise specified is one of:
> >
> > -The four MSB bits store the operation code.
> 
> same here.

Still there in the supersection.

These are easier to see in the rendered document.

Dave

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 11/15] ebpf-docs: Improve English readability
  2022-10-04 14:32     ` Dave Thaler
@ 2022-10-04 15:38       ` Alexei Starovoitov
  2022-10-04 15:55         ` Dave Thaler
  0 siblings, 1 reply; 50+ messages in thread
From: Alexei Starovoitov @ 2022-10-04 15:38 UTC (permalink / raw)
  To: Dave Thaler; +Cc: dthaler1968, bpf

On Tue, Oct 4, 2022 at 7:32 AM Dave Thaler <dthaler@microsoft.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > > +The eBPF instruction set consists of eleven 64 bit registers, a
> > > +program counter, and 512 bytes of stack space.
> >
> > I would not add 512 to a doc.
> > That's what we have now, but we might relax that in the future.
>
> I think it's important to at least give a minimum.  Will change to
> "at least 512".

I'm not sure 512 stands as a minimum either.
When we have the gatekeeper prog it will be able to enforce
any stack size.

> [...]
> > > +Registers R0 - R5 are scratch registers, meaning the BPF program
> > > +needs to either spill them to the BPF stack or move them to callee
> > > +saved registers if these arguments are to be reused across multiple
> > > +function calls. Spilling means that the value in the register is
> > > +moved to the BPF stack. The reverse operation of moving the variable
> > from the BPF stack to the register is called filling.
> > > +The reason for spilling/filling is due to the limited number of registers.
> >
> > More canonical way would be to say that r0-r5 are caller saved.
>
> Will change "scratch" to "caller-saved"
>
> [...]
> > > -``BPF_ADD | BPF_X | BPF_ALU64`` means::
> > > +``BPF_ADD | BPF_X | BPF_ALU64`` (0x0f) means::
> >
> > I don't think adding hex values help here.
>
> I found it very helpful in verifying that the Appendix table
> was correct, and providing a correlation to the text here
> that shows the construction of the value.  So I'd like to keep them.

I think that means that the appendix table shouldn't be there either.
I'd like to avoid both.

> [...]
> > > -The 1-bit source operand field in the opcode is used to to select
> > > what byte -order the operation convert from or to:
> > > +Byte swap instructions use non-default semantics of the 1-bit
> > > +'source' field in
> >
> > I would drop 'non-default'. Many fields have different meanings depending
> > on opcode.
> > BPF_SRC() macro just reads that bit.
>
> Will reword to say:
> Byte swap instructions use the 1-bit 'source' field in the 'opcode' field
> as follows.  Instead of indicating the source operator, it is instead
> used to select what byte order the operation converts from or to:

makes sense.

> [...]
> > > +* mnenomic indicates a short form that might be displayed by some
> > > +tools such as disassemblers
> > > +* 'htoleNN()' indicates converting a NN-bit value from host byte
> > > +order to little-endian byte order
> > > +* 'htobeNN()' indicates converting a NN-bit value from host byte
> > > +order to big-endian byte order
> >
> > I think we need to add normal bswap insn.
> > These to_le and to_be are very awkward to use.
> > As soon as we have new insn the compiler will be using it.
> > There is no equivalent of to_be and to_le in C. It wasn't good ISA design.
>
> I will interpret this as a request for someone to do code work, rather
> than any request for immediately changes to the doc :)
>
> [...]
> > >  Regular load and store operations
> > >  ---------------------------------
> > >
> > >  The ``BPF_MEM`` mode modifier is used to encode regular load and
> > > store  instructions that transfer data between a register and memory.
> > >
> > > -``BPF_MEM | <size> | BPF_STX`` means::
> > > -
> > > -  *(size *) (dst + offset) = src_reg
> > > -
> > > -``BPF_MEM | <size> | BPF_ST`` means::
> > > -
> > > -  *(size *) (dst + offset) = imm32
> > > -
> > > -``BPF_MEM | <size> | BPF_LDX`` means::
> > > -
> > > -  dst = *(size *) (src + offset)
> > > -
> > > -Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
> > > +=============================  =========
> > ====================================
> > > +opcode construction            opcode     pseudocode
> > > +=============================  =========
> > ====================================
> > > +BPF_MEM | BPF_B | BPF_LDX      0x71       dst = \*(uint8_t \*) (src + offset)
> > > +BPF_MEM | BPF_H | BPF_LDX      0x69       dst = \*(uint16_t \*) (src +
> > offset)
> > > +BPF_MEM | BPF_W | BPF_LDX      0x61       dst = \*(uint32_t \*) (src +
> > offset)
> > > +BPF_MEM | BPF_DW | BPF_LDX     0x79       dst = \*(uint64_t \*) (src +
> > offset)
> > > +BPF_MEM | BPF_B | BPF_ST       0x72       \*(uint8_t \*) (dst + offset) = imm
> > > +BPF_MEM | BPF_H | BPF_ST       0x6a       \*(uint16_t \*) (dst + offset) =
> > imm
> > > +BPF_MEM | BPF_W | BPF_ST       0x62       \*(uint32_t \*) (dst + offset) =
> > imm
> > > +BPF_MEM | BPF_DW | BPF_ST      0x7a       \*(uint64_t \*) (dst + offset) =
> > imm
> > > +BPF_MEM | BPF_B | BPF_STX      0x73       \*(uint8_t \*) (dst + offset) = src
> > > +BPF_MEM | BPF_H | BPF_STX      0x6b       \*(uint16_t \*) (dst + offset) =
> > src
> > > +BPF_MEM | BPF_W | BPF_STX      0x63       \*(uint32_t \*) (dst + offset) =
> > src
> > > +BPF_MEM | BPF_DW | BPF_STX     0x7b       \*(uint64_t \*) (dst + offset) =
> > src
> > > +=============================  =========
> > > +====================================
> >
> > I think the table is more verbose and less readable than the original text.
>
> Will change back to original text.

Please see git. I've removed that table. Please don't add it back.
I see no value in such tables other than more things to get wrong.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH 11/15] ebpf-docs: Improve English readability
  2022-10-04 15:38       ` Alexei Starovoitov
@ 2022-10-04 15:55         ` Dave Thaler
  2022-10-04 15:56           ` Dave Thaler
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Thaler @ 2022-10-04 15:55 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: bpf

> > I found it very helpful in verifying that the Appendix table was
> > correct, and providing a correlation to the text here that shows the
> > construction of the value.  So I'd like to keep them.
> 
> I think that means that the appendix table shouldn't be there either.
> I'd like to avoid both.

I've heard from multiple people with different affiliations that the appendix
is the most useful part of the document, and what they wanted to see
added.  So I added by popular request.

Dave


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH 11/15] ebpf-docs: Improve English readability
  2022-10-04 15:55         ` Dave Thaler
@ 2022-10-04 15:56           ` Dave Thaler
  2022-10-04 16:19             ` Alexei Starovoitov
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Thaler @ 2022-10-04 15:56 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: bpf

Also worth noting that Quentin has a script that I believe parses the appendix
and uses it to generate a validator for ebpf programs.  (Which also
helps validate the appendix).

Dave

> -----Original Message-----
> From: Dave Thaler
> Sent: Tuesday, October 4, 2022 8:55 AM
> To: 'Alexei Starovoitov' <alexei.starovoitov@gmail.com>
> Cc: bpf@vger.kernel.org
> Subject: RE: [PATCH 11/15] ebpf-docs: Improve English readability
> 
> > > I found it very helpful in verifying that the Appendix table was
> > > correct, and providing a correlation to the text here that shows the
> > > construction of the value.  So I'd like to keep them.
> >
> > I think that means that the appendix table shouldn't be there either.
> > I'd like to avoid both.
> 
> I've heard from multiple people with different affiliations that the appendix
> is the most useful part of the document, and what they wanted to see
> added.  So I added by popular request.
> 
> Dave


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 11/15] ebpf-docs: Improve English readability
  2022-10-04 15:56           ` Dave Thaler
@ 2022-10-04 16:19             ` Alexei Starovoitov
  2022-10-04 16:41               ` Dave Thaler
  0 siblings, 1 reply; 50+ messages in thread
From: Alexei Starovoitov @ 2022-10-04 16:19 UTC (permalink / raw)
  To: Dave Thaler; +Cc: bpf

On Tue, Oct 4, 2022 at 8:56 AM Dave Thaler <dthaler@microsoft.com> wrote:
>
> Also worth noting that Quentin has a script that I believe parses the appendix
> and uses it to generate a validator for ebpf programs.  (Which also
> helps validate the appendix).

The last thing I want to see is a document becoming a description
of the code.
We've always been doing it the other way around.
The documentation can live next to the code and docs automatically
generated from .h or .c files.
Doing the other way around sooner or later will be a disaster.
Imagine a typo in instruction-set.rst.
What should we do next? Fix a typo and say, look, the code
behaves differently, so we're fixing the doc.
If so, there is close to zero reason to add hex to the doc,
since it's not an authoritative answer.
On the other hand if instruction-set.rst is the source of
the truth then the code would have to change, which we obviously
cannot do. So let's not get us into the corner with
such tables.

> Dave
>
> > -----Original Message-----
> > From: Dave Thaler
> > Sent: Tuesday, October 4, 2022 8:55 AM
> > To: 'Alexei Starovoitov' <alexei.starovoitov@gmail.com>
> > Cc: bpf@vger.kernel.org
> > Subject: RE: [PATCH 11/15] ebpf-docs: Improve English readability
> >
> > > > I found it very helpful in verifying that the Appendix table was
> > > > correct, and providing a correlation to the text here that shows the
> > > > construction of the value.  So I'd like to keep them.
> > >
> > > I think that means that the appendix table shouldn't be there either.
> > > I'd like to avoid both.
> >
> > I've heard from multiple people with different affiliations that the appendix
> > is the most useful part of the document, and what they wanted to see
> > added.  So I added by popular request.

These people should speak up then.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow
  2022-09-30 23:41           ` Alexei Starovoitov
@ 2022-10-04 16:36             ` Dave Thaler
  2022-10-04 17:24               ` div_k. Was: " Alexei Starovoitov
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Thaler @ 2022-10-04 16:36 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
[...]
> > > > > > +Also note that the modulo operation often varies by language
> > > > > > +when the dividend or divisor are negative, where Python, Ruby,
> etc.
> > > > > > +differ from C, Go, Java, etc. This specification requires
> > > > > > +that modulo use truncated division (where -13 % 3 == -1) as
> > > > > > +implemented in C, Go,
> > > > > > +etc.:
> > > > > > +
> > > > > > +   a % n = a - n * trunc(a / n)
> > > > > > +
> > > > >
> > > > > Interesting bit of info, but I'm not sure how it relates to the ISA doc.
[...]
> Those differences are in signed div/mod only, right?
> Unsigned div/mod doesn't have it, right?
> bpf has only unsigned div/mod.

Ah right, will replace.  However since imm is a signed integer, that leaves
an ambiguity that is important to clarify.

What is the expected value for the following 64-bit BPF_DIV operation:
    r0 = 0xFFFFFFFFFFFFFFFF
    r0 /= -10
Is it 0x1 or 0x10000000a?  i.e., is the -10 sign extended to
0xFFFFFFFFFFFFFFF6 or treated as 0xFFFFFFF6 when doing the unsigned
division?

Dave

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH 11/15] ebpf-docs: Improve English readability
  2022-10-04 16:19             ` Alexei Starovoitov
@ 2022-10-04 16:41               ` Dave Thaler
  2022-10-04 16:54                 ` Dave Thaler
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Thaler @ 2022-10-04 16:41 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: bpf

> -----Original Message-----
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Sent: Tuesday, October 4, 2022 9:20 AM
> To: Dave Thaler <dthaler@microsoft.com>
> Cc: bpf@vger.kernel.org
> Subject: Re: [PATCH 11/15] ebpf-docs: Improve English readability
> 
> On Tue, Oct 4, 2022 at 8:56 AM Dave Thaler <dthaler@microsoft.com>
> wrote:
> >
> > Also worth noting that Quentin has a script that I believe parses the
> > appendix and uses it to generate a validator for ebpf programs.
> > (Which also helps validate the appendix).
> 
> The last thing I want to see is a document becoming a description of the
> code.
> We've always been doing it the other way around.
> The documentation can live next to the code and docs automatically
> generated from .h or .c files.
> Doing the other way around sooner or later will be a disaster.
> Imagine a typo in instruction-set.rst.
> What should we do next? Fix a typo and say, look, the code behaves
> differently, so we're fixing the doc.
> If so, there is close to zero reason to add hex to the doc, since it's not an
> authoritative answer.
> On the other hand if instruction-set.rst is the source of the truth then the
> code would have to change, which we obviously cannot do. So let's not get us
> into the corner with such tables.

The point of a standard is to be a source of truth.  If another implementation
(ubpf, hbpf, etc.) doesn't match the spec, then the code would have to change.
Such tables are seen as invaluable for determining correctness of other
implementations.   So the feedback is that it's important to have such if we
want everyone else to do the right thing.

> These people should speak up then.

I agree.

Dave

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH 11/15] ebpf-docs: Improve English readability
  2022-10-04 16:41               ` Dave Thaler
@ 2022-10-04 16:54                 ` Dave Thaler
  2022-10-06 20:44                   ` Jim Harris
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Thaler @ 2022-10-04 16:54 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: bpf

Regarding the tables:
> Such tables are seen as invaluable for determining correctness of other
> implementations.   So the feedback is that it's important to have such if we
> want everyone else to do the right thing.
> 
> > These people should speak up then.
> 
> I agree.

Here's two public examples...

Christoph Hellwig, said on May 17 at https://lore.kernel.org/bpf/20220517091011.GA18723@lst.de/:
> One useful thing for this would be an opcode table with all the 
> optional field usage in machine readable format.
>
> Jim who is on CC has already built a nice table off all opcodes based 
> on existing material that might be a good starting point.

Jim Harris responded on that thread with a strawman which was
used as the basis for the table in the appendix.

Jim then commented in the github version on August 30:
> In my opinion, this table is the biggest thing that has been missing, 
> and will be most essential for a more "formal" specification.

I will encourage them and others to comment on this thread.

Dave


^ permalink raw reply	[flat|nested] 50+ messages in thread

* div_k. Was: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow
  2022-10-04 16:36             ` Dave Thaler
@ 2022-10-04 17:24               ` Alexei Starovoitov
  2022-10-04 18:23                 ` Dave Thaler
  0 siblings, 1 reply; 50+ messages in thread
From: Alexei Starovoitov @ 2022-10-04 17:24 UTC (permalink / raw)
  To: Dave Thaler, Daniel Borkmann; +Cc: bpf, Johan Almbladh

On Tue, Oct 4, 2022 at 9:36 AM Dave Thaler <dthaler@microsoft.com> wrote:
> > Those differences are in signed div/mod only, right?
> > Unsigned div/mod doesn't have it, right?
> > bpf has only unsigned div/mod.
>
> Ah right, will replace.  However since imm is a signed integer, that leaves
> an ambiguity that is important to clarify.
>
> What is the expected value for the following 64-bit BPF_DIV operation:
>     r0 = 0xFFFFFFFFFFFFFFFF
>     r0 /= -10
> Is it 0x1 or 0x10000000a?  i.e., is the -10 sign extended to
> 0xFFFFFFFFFFFFFFF6 or treated as 0xFFFFFFF6 when doing the unsigned
> division?

x86 and arm64 JITs treat it as imm32 is zero extended.
But looking at the interpreter:
        ALU64_DIV_K:
                DST = div64_u64(DST, IMM);
it looks like we have a bug there.
But we have a bunch of div_k tests in lib/test_bpf.c
including negative imm32. Hmm.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: div_k. Was: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow
  2022-10-04 17:24               ` div_k. Was: " Alexei Starovoitov
@ 2022-10-04 18:23                 ` Dave Thaler
  2022-10-04 18:34                   ` Alexei Starovoitov
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Thaler @ 2022-10-04 18:23 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann; +Cc: bpf, Johan Almbladh

Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
[...]
> > What is the expected value for the following 64-bit BPF_DIV operation:
> >     r0 = 0xFFFFFFFFFFFFFFFF
> >     r0 /= -10
> > Is it 0x1 or 0x10000000a?  i.e., is the -10 sign extended to
> > 0xFFFFFFFFFFFFFFF6 or treated as 0xFFFFFFF6 when doing the unsigned
> > division?
> 
> x86 and arm64 JITs treat it as imm32 is zero extended.

Alan Jowett just pointed out to me that the question is not limited to DIV.

r0 = 1
r0 += -1

Is the answer 0 or 0x0000000100000000?
Assuming the answer is to zero extend imm32, it contains the latter, which
would be counter-intuitive enough to make it important to point out explicitly.

> But looking at the interpreter:
>         ALU64_DIV_K:
>                 DST = div64_u64(DST, IMM); it looks like we have a bug there.
> But we have a bunch of div_k tests in lib/test_bpf.c including negative
> imm32. Hmm.

Yeah.

"ALU64_DIV_K: 0xffffffffffffffff / (-1) = 0x0000000000000001",
"ALU64_ADD_K: 2147483646 + -2147483647 = -1",
"ALU64_ADD_K: 0 + (-1) = 0xffffffffffffffff",
"ALU64_MUL_K: 1 * -2147483647 = -2147483647",
"ALU64_MUL_K: 1 * (-1) = 0xffffffffffffffff",
"ALU64_AND_K: 0x0000ffffffff0000 & -1 = 0x0000ffffffff0000",
"ALU64_AND_K: 0xffffffffffffffff & -1 = 0xffffffffffffffff",
"ALU64_OR_K: 0x000000000000000 | -1 = 0xffffffffffffffff",

The above assume sign extension not zero extension is the correct behavior
for these operations, if I understand correctly.

Dave

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: div_k. Was: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow
  2022-10-04 18:23                 ` Dave Thaler
@ 2022-10-04 18:34                   ` Alexei Starovoitov
  0 siblings, 0 replies; 50+ messages in thread
From: Alexei Starovoitov @ 2022-10-04 18:34 UTC (permalink / raw)
  To: Dave Thaler; +Cc: Daniel Borkmann, bpf, Johan Almbladh

On Tue, Oct 4, 2022 at 11:23 AM Dave Thaler <dthaler@microsoft.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> [...]
> > > What is the expected value for the following 64-bit BPF_DIV operation:
> > >     r0 = 0xFFFFFFFFFFFFFFFF
> > >     r0 /= -10
> > > Is it 0x1 or 0x10000000a?  i.e., is the -10 sign extended to
> > > 0xFFFFFFFFFFFFFFF6 or treated as 0xFFFFFFF6 when doing the unsigned
> > > division?
> >
> > x86 and arm64 JITs treat it as imm32 is zero extended.
>
> Alan Jowett just pointed out to me that the question is not limited to DIV.
>
> r0 = 1
> r0 += -1
>
> Is the answer 0 or 0x0000000100000000?
> Assuming the answer is to zero extend imm32, it contains the latter, which
> would be counter-intuitive enough to make it important to point out explicitly.

This is an obvious one. imm32 is _sign_ extended everywhere.

>
> > But looking at the interpreter:
> >         ALU64_DIV_K:
> >                 DST = div64_u64(DST, IMM); it looks like we have a bug there.
> > But we have a bunch of div_k tests in lib/test_bpf.c including negative
> > imm32. Hmm.

Actually I misread the JITs.
/* mov r11, imm32 */
EMIT3_off32(0x49, 0xC7, 0xC3, imm32);

It is sign extending. There is no bug in the interpreter or JIT.

> Yeah.
>
> "ALU64_DIV_K: 0xffffffffffffffff / (-1) = 0x0000000000000001",
> "ALU64_ADD_K: 2147483646 + -2147483647 = -1",
> "ALU64_ADD_K: 0 + (-1) = 0xffffffffffffffff",
> "ALU64_MUL_K: 1 * -2147483647 = -2147483647",
> "ALU64_MUL_K: 1 * (-1) = 0xffffffffffffffff",
> "ALU64_AND_K: 0x0000ffffffff0000 & -1 = 0x0000ffffffff0000",
> "ALU64_AND_K: 0xffffffffffffffff & -1 = 0xffffffffffffffff",
> "ALU64_OR_K: 0x000000000000000 | -1 = 0xffffffffffffffff",
>
> The above assume sign extension not zero extension is the correct behavior
> for these operations, if I understand correctly.
>
> Dave

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 11/15] ebpf-docs: Improve English readability
  2022-10-04 16:54                 ` Dave Thaler
@ 2022-10-06 20:44                   ` Jim Harris
  0 siblings, 0 replies; 50+ messages in thread
From: Jim Harris @ 2022-10-06 20:44 UTC (permalink / raw)
  To: Dave Thaler; +Cc: Alexei Starovoitov, bpf

On Tue, Oct 4, 2022 at 10:09 AM Dave Thaler <dthaler@microsoft.com> wrote:
>
> Regarding the tables:
> > Such tables are seen as invaluable for determining correctness of other
> > implementations.   So the feedback is that it's important to have such if we
> > want everyone else to do the right thing.
> >
> > > These people should speak up then.
> >
> > I agree.
>
> Here's two public examples...
>
> Christoph Hellwig, said on May 17 at https://lore.kernel.org/bpf/20220517091011.GA18723@lst.de/:
> > One useful thing for this would be an opcode table with all the
> > optional field usage in machine readable format.
> >
> > Jim who is on CC has already built a nice table off all opcodes based
> > on existing material that might be a good starting point.
>
> Jim Harris responded on that thread with a strawman which was
> used as the basis for the table in the appendix.
>
> Jim then commented in the github version on August 30:
> > In my opinion, this table is the biggest thing that has been missing,
> > and will be most essential for a more "formal" specification.

I think an opcode table is a huge help for developing alternate eBPF
implementations - anything that makes it more explicit which opcodes
are valid (and which ones are not).

For example, the section for BPF_ALU and BPF_ALU64 classes lists the
operation codes.  BPF_END is in that list.  Description says "see
separate section below".  The "Byte swap instructions" section then
says that byte swap instructions always use BPF_ALU, even for 64-bit
widths.  So an implementer can synthesize all of this to determine
that opcodes 0xD7 and 0xDF (which have BPF_ALU64 | BPF_END) are not
valid.  It would be more clear if somewhere there was a list that
explicitly showed that 0xD4 and 0xDC (BPF_ALU | BPF_END) were valid,
but 0xD7 and 0xDF were not.

If there's concern that an appendix gets out of sync with the code and
the primary sections of the instruction set doc, maybe these opcodes
can be added to the existing per-class-code tables in the instruction
set doc.  A full opcode table could probably be generated from those
tables instead of the hand-written one that Dave and I worked on in
his patch.

-Jim


> I will encourage them and others to comment on this thread.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Signed modulo operations
@ 2023-09-29 21:03             ` Dave Thaler
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Thaler @ 2023-09-29 21:03 UTC (permalink / raw)
  To: bpf; +Cc: bpf

In the email discussion below, we concluded it wasn't relevant at the time because
there were no signed modulo instructions.  However, now there is and I believe the
ambiguity in the current spec needs to be addressed.

> -----Original Message-----
> From: Dave Thaler
> Sent: Friday, September 30, 2022 3:42 PM
> To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
> Subject: RE: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero,
> overflow, and underflow
> 
> > -----Original Message-----
> > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Sent: Friday, September 30, 2022 2:59 PM
> > To: Dave Thaler <dthaler@microsoft.com>
> > Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
> > Subject: Re: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by
> > zero, overflow, and underflow
> >
> > On Fri, Sep 30, 2022 at 09:54:17PM +0000, Dave Thaler wrote:
> > > [...]
> > > > > +Also note that the modulo operation often varies by language
> > > > > +when the dividend or divisor are negative, where Python, Ruby, etc.
> > > > > +differ from C, Go, Java, etc. This specification requires that
> > > > > +modulo use truncated division (where -13 % 3 == -1) as
> > > > > +implemented in C, Go,
> > > > > +etc.:
> > > > > +
> > > > > +   a % n = a - n * trunc(a / n)
> > > > > +
> > > >
> > > > Interesting bit of info, but I'm not sure how it relates to the ISA doc.
> > >
> > > It's because there's multiple definitions of modulo out there as the
> > > paragraph notes, which differ in what they do with negative numbers.
> > > The ISA defines the modulo operation as being the specific version above.
> > > If you tried to implement the ISA in say Python and didn't know
> > > that, you'd have a non-compliant implementation.
> >
> > Is it because the languages have weird rules to pick between signed vs
> > unsigned mod?
> > At least from llvm pov the smod and umod have fixed behavior.
> 
> It's because there's different mathematical definitions and different languages
> have chosen different definitions.  E.g., languages/libraries that follow Knuth
> use a different mathematical definition than C uses.  For details see:
> 
> https://en.wikipedia.org/wiki/Modulo_operation#Variants_of_the_definition
> 
> https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/
> 
> Dave

Perhaps text like the proposed snippet quoted in the exchange above should be
added around the new text that now appears in the doc, i.e. the ambiguous text
is currently:
> For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
> 'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
> is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
> interpreted as a 64-bit signed value.  

Dave


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Bpf] Signed modulo operations
@ 2023-09-29 21:03             ` Dave Thaler
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Thaler @ 2023-09-29 21:03 UTC (permalink / raw)
  To: bpf; +Cc: bpf

In the email discussion below, we concluded it wasn't relevant at the time because
there were no signed modulo instructions.  However, now there is and I believe the
ambiguity in the current spec needs to be addressed.

> -----Original Message-----
> From: Dave Thaler
> Sent: Friday, September 30, 2022 3:42 PM
> To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
> Subject: RE: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero,
> overflow, and underflow
> 
> > -----Original Message-----
> > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Sent: Friday, September 30, 2022 2:59 PM
> > To: Dave Thaler <dthaler@microsoft.com>
> > Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
> > Subject: Re: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by
> > zero, overflow, and underflow
> >
> > On Fri, Sep 30, 2022 at 09:54:17PM +0000, Dave Thaler wrote:
> > > [...]
> > > > > +Also note that the modulo operation often varies by language
> > > > > +when the dividend or divisor are negative, where Python, Ruby, etc.
> > > > > +differ from C, Go, Java, etc. This specification requires that
> > > > > +modulo use truncated division (where -13 % 3 == -1) as
> > > > > +implemented in C, Go,
> > > > > +etc.:
> > > > > +
> > > > > +   a % n = a - n * trunc(a / n)
> > > > > +
> > > >
> > > > Interesting bit of info, but I'm not sure how it relates to the ISA doc.
> > >
> > > It's because there's multiple definitions of modulo out there as the
> > > paragraph notes, which differ in what they do with negative numbers.
> > > The ISA defines the modulo operation as being the specific version above.
> > > If you tried to implement the ISA in say Python and didn't know
> > > that, you'd have a non-compliant implementation.
> >
> > Is it because the languages have weird rules to pick between signed vs
> > unsigned mod?
> > At least from llvm pov the smod and umod have fixed behavior.
> 
> It's because there's different mathematical definitions and different languages
> have chosen different definitions.  E.g., languages/libraries that follow Knuth
> use a different mathematical definition than C uses.  For details see:
> 
> https://en.wikipedia.org/wiki/Modulo_operation#Variants_of_the_definition
> 
> https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/
> 
> Dave

Perhaps text like the proposed snippet quoted in the exchange above should be
added around the new text that now appears in the doc, i.e. the ambiguous text
is currently:
> For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
> 'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
> is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
> interpreted as a 64-bit signed value.  

Dave

-- 
Bpf mailing list
Bpf@ietf.org
https://www.ietf.org/mailman/listinfo/bpf

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Bpf] Signed modulo operations
@ 2023-09-30  6:07               ` Carsten Bormann
  0 siblings, 0 replies; 50+ messages in thread
From: Carsten Bormann @ 2023-09-30  6:07 UTC (permalink / raw)
  To: Dave Thaler; +Cc: bpf, bpf

I didn’t follow the whole discussion, but it maybe it's worth pointing out that C’s % is not a modulo operator, but a remainder operator.

Grüße, Carsten


> On 29. Sep 2023, at 23:03, Dave Thaler <dthaler=40microsoft.com@dmarc.ietf.org> wrote:
> 
> In the email discussion below, we concluded it wasn't relevant at the time because
> there were no signed modulo instructions.  However, now there is and I believe the
> ambiguity in the current spec needs to be addressed.
> 
>> -----Original Message-----
>> From: Dave Thaler
>> Sent: Friday, September 30, 2022 3:42 PM
>> To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>> Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
>> Subject: RE: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero,
>> overflow, and underflow
>> 
>>> -----Original Message-----
>>> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>> Sent: Friday, September 30, 2022 2:59 PM
>>> To: Dave Thaler <dthaler@microsoft.com>
>>> Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
>>> Subject: Re: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by
>>> zero, overflow, and underflow
>>> 
>>> On Fri, Sep 30, 2022 at 09:54:17PM +0000, Dave Thaler wrote:
>>>> [...]
>>>>>> +Also note that the modulo operation often varies by language
>>>>>> +when the dividend or divisor are negative, where Python, Ruby, etc.
>>>>>> +differ from C, Go, Java, etc. This specification requires that
>>>>>> +modulo use truncated division (where -13 % 3 == -1) as
>>>>>> +implemented in C, Go,
>>>>>> +etc.:
>>>>>> +
>>>>>> +   a % n = a - n * trunc(a / n)
>>>>>> +
>>>>> 
>>>>> Interesting bit of info, but I'm not sure how it relates to the ISA doc.
>>>> 
>>>> It's because there's multiple definitions of modulo out there as the
>>>> paragraph notes, which differ in what they do with negative numbers.
>>>> The ISA defines the modulo operation as being the specific version above.
>>>> If you tried to implement the ISA in say Python and didn't know
>>>> that, you'd have a non-compliant implementation.
>>> 
>>> Is it because the languages have weird rules to pick between signed vs
>>> unsigned mod?
>>> At least from llvm pov the smod and umod have fixed behavior.
>> 
>> It's because there's different mathematical definitions and different languages
>> have chosen different definitions.  E.g., languages/libraries that follow Knuth
>> use a different mathematical definition than C uses.  For details see:
>> 
>> https://en.wikipedia.org/wiki/Modulo_operation#Variants_of_the_definition
>> 
>> https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/
>> 
>> Dave
> 
> Perhaps text like the proposed snippet quoted in the exchange above should be
> added around the new text that now appears in the doc, i.e. the ambiguous text
> is currently:
>> For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
>> 'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
>> is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
>> interpreted as a 64-bit signed value.  
> 
> Dave
> 
> -- 
> Bpf mailing list
> Bpf@ietf.org
> https://www.ietf.org/mailman/listinfo/bpf

-- 
Bpf mailing list
Bpf@ietf.org
https://www.ietf.org/mailman/listinfo/bpf

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Bpf] Signed modulo operations
@ 2023-09-30  6:07               ` Carsten Bormann
  0 siblings, 0 replies; 50+ messages in thread
From: Carsten Bormann @ 2023-09-30  6:07 UTC (permalink / raw)
  To: Dave Thaler; +Cc: bpf, bpf

I didn’t follow the whole discussion, but it maybe it's worth pointing out that C’s % is not a modulo operator, but a remainder operator.

Grüße, Carsten


> On 29. Sep 2023, at 23:03, Dave Thaler <dthaler=40microsoft.com@dmarc.ietf.org> wrote:
> 
> In the email discussion below, we concluded it wasn't relevant at the time because
> there were no signed modulo instructions.  However, now there is and I believe the
> ambiguity in the current spec needs to be addressed.
> 
>> -----Original Message-----
>> From: Dave Thaler
>> Sent: Friday, September 30, 2022 3:42 PM
>> To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>> Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
>> Subject: RE: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero,
>> overflow, and underflow
>> 
>>> -----Original Message-----
>>> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>> Sent: Friday, September 30, 2022 2:59 PM
>>> To: Dave Thaler <dthaler@microsoft.com>
>>> Cc: dthaler1968@googlemail.com; bpf@vger.kernel.org
>>> Subject: Re: [PATCH 07/15] ebpf-docs: Fix modulo zero, division by
>>> zero, overflow, and underflow
>>> 
>>> On Fri, Sep 30, 2022 at 09:54:17PM +0000, Dave Thaler wrote:
>>>> [...]
>>>>>> +Also note that the modulo operation often varies by language
>>>>>> +when the dividend or divisor are negative, where Python, Ruby, etc.
>>>>>> +differ from C, Go, Java, etc. This specification requires that
>>>>>> +modulo use truncated division (where -13 % 3 == -1) as
>>>>>> +implemented in C, Go,
>>>>>> +etc.:
>>>>>> +
>>>>>> +   a % n = a - n * trunc(a / n)
>>>>>> +
>>>>> 
>>>>> Interesting bit of info, but I'm not sure how it relates to the ISA doc.
>>>> 
>>>> It's because there's multiple definitions of modulo out there as the
>>>> paragraph notes, which differ in what they do with negative numbers.
>>>> The ISA defines the modulo operation as being the specific version above.
>>>> If you tried to implement the ISA in say Python and didn't know
>>>> that, you'd have a non-compliant implementation.
>>> 
>>> Is it because the languages have weird rules to pick between signed vs
>>> unsigned mod?
>>> At least from llvm pov the smod and umod have fixed behavior.
>> 
>> It's because there's different mathematical definitions and different languages
>> have chosen different definitions.  E.g., languages/libraries that follow Knuth
>> use a different mathematical definition than C uses.  For details see:
>> 
>> https://en.wikipedia.org/wiki/Modulo_operation#Variants_of_the_definition
>> 
>> https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/
>> 
>> Dave
> 
> Perhaps text like the proposed snippet quoted in the exchange above should be
> added around the new text that now appears in the doc, i.e. the ambiguous text
> is currently:
>> For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
>> 'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
>> is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
>> interpreted as a 64-bit signed value.  
> 
> Dave
> 
> -- 
> Bpf mailing list
> Bpf@ietf.org
> https://www.ietf.org/mailman/listinfo/bpf


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Bpf] Signed modulo operations
@ 2023-09-30 14:48               ` Alexei Starovoitov
  0 siblings, 0 replies; 50+ messages in thread
From: Alexei Starovoitov @ 2023-09-30 14:48 UTC (permalink / raw)
  To: Dave Thaler; +Cc: bpf, bpf

On Fri, Sep 29, 2023 at 2:03 PM Dave Thaler
<dthaler=40microsoft.com@dmarc.ietf.org> wrote:
>
> Perhaps text like the proposed snippet quoted in the exchange above should be
> added around the new text that now appears in the doc, i.e. the ambiguous text
> is currently:
> > For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
> > 'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
> > is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
> > interpreted as a 64-bit signed value.

That's what we have in the doc and it's a correct description.
Which part is ambiguous?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Bpf] Signed modulo operations
@ 2023-09-30 14:48               ` Alexei Starovoitov
  0 siblings, 0 replies; 50+ messages in thread
From: Alexei Starovoitov @ 2023-09-30 14:48 UTC (permalink / raw)
  To: Dave Thaler; +Cc: bpf, bpf

On Fri, Sep 29, 2023 at 2:03 PM Dave Thaler
<dthaler=40microsoft.com@dmarc.ietf.org> wrote:
>
> Perhaps text like the proposed snippet quoted in the exchange above should be
> added around the new text that now appears in the doc, i.e. the ambiguous text
> is currently:
> > For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
> > 'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
> > is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
> > interpreted as a 64-bit signed value.

That's what we have in the doc and it's a correct description.
Which part is ambiguous?

-- 
Bpf mailing list
Bpf@ietf.org
https://www.ietf.org/mailman/listinfo/bpf

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Bpf] Signed modulo operations
  2023-09-30 14:48               ` Alexei Starovoitov
  (?)
@ 2023-10-02 13:19               ` Eduard Zingerman
  -1 siblings, 0 replies; 50+ messages in thread
From: Eduard Zingerman @ 2023-10-02 13:19 UTC (permalink / raw)
  To: Alexei Starovoitov, Dave Thaler; +Cc: bpf, bpf

On Sat, 2023-09-30 at 07:48 -0700, Alexei Starovoitov wrote:
> On Fri, Sep 29, 2023 at 2:03 PM Dave Thaler
> <dthaler=40microsoft.com@dmarc.ietf.org> wrote:
> > 
> > Perhaps text like the proposed snippet quoted in the exchange above should be
> > added around the new text that now appears in the doc, i.e. the ambiguous text
> > is currently:
> > > For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
> > > 'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
> > > is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
> > > interpreted as a 64-bit signed value.
> 
> That's what we have in the doc and it's a correct description.
> Which part is ambiguous?
> 

As far as I understand Dave suggests to add exact specification for
the SMOD instruction as "signed modulo" might have different definitions [1].

I double checked and current clang implementation generates SMOD for
LLVM's 'srem' operation [2], which follows C semantics and defines
remainder as:

  remainder = a - n * trunc(divident / divisor)

> This instruction returns the remainder of a division (where the
> result is either zero or has the same sign as the dividend, op1)

And this is consistent with interpreter logic in core.c, e.g.:

	ALU64_MOD_K:
		switch (OFF) {
		case 0: ... break;
		case 1:
			AX = div64_s64(DST, IMM); /* implemented as '/' */
			DST = DST - AX * IMM;
			break;
		}
		CONT;

[1] https://en.wikipedia.org/wiki/Modulo
[2] https://llvm.org/docs/LangRef.html#srem-instruction

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2023-10-02 13:19 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-27 18:59 [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file dthaler1968
2022-09-27 18:59 ` [PATCH 02/15] ebpf-docs: Linux byteswap note dthaler1968
2022-09-27 18:59 ` [PATCH 03/15] ebpf-docs: Move Clang notes to a separate file dthaler1968
2022-09-27 18:59 ` [PATCH 04/15] ebpf-docs: Add Clang note about BPF_ALU dthaler1968
2022-09-27 18:59 ` [PATCH 05/15] ebpf-docs: Add TOC and fix formatting dthaler1968
2022-09-27 18:59 ` [PATCH 06/15] ebpf-docs: Use standard type convention in standard doc dthaler1968
2022-09-30 20:49   ` Alexei Starovoitov
2022-09-27 18:59 ` [PATCH 07/15] ebpf-docs: Fix modulo zero, division by zero, overflow, and underflow dthaler1968
2022-09-30 20:52   ` Alexei Starovoitov
2022-09-30 21:54     ` Dave Thaler
2022-09-30 21:59       ` Alexei Starovoitov
2022-09-30 22:41         ` Dave Thaler
2022-09-30 23:41           ` Alexei Starovoitov
2022-10-04 16:36             ` Dave Thaler
2022-10-04 17:24               ` div_k. Was: " Alexei Starovoitov
2022-10-04 18:23                 ` Dave Thaler
2022-10-04 18:34                   ` Alexei Starovoitov
2023-09-29 21:03           ` Signed modulo operations Dave Thaler
2023-09-29 21:03             ` [Bpf] " Dave Thaler
2023-09-30  6:07             ` Carsten Bormann
2023-09-30  6:07               ` Carsten Bormann
2023-09-30 14:48             ` Alexei Starovoitov
2023-09-30 14:48               ` Alexei Starovoitov
2023-10-02 13:19               ` Eduard Zingerman
2022-09-27 18:59 ` [PATCH 08/15] ebpf-docs: Use consistent names for the same field dthaler1968
2022-09-30 20:57   ` Alexei Starovoitov
2022-10-04 14:44     ` Dave Thaler
2022-09-27 18:59 ` [PATCH 09/15] ebpf-docs: Explain helper functions dthaler1968
2022-09-30 22:01   ` Alexei Starovoitov
2022-09-30 23:01     ` Dave Thaler
2022-09-27 18:59 ` [PATCH 10/15] ebpf-docs: Add appendix of all opcodes in order dthaler1968
2022-09-30 22:02   ` Alexei Starovoitov
2022-09-30 22:43     ` Dave Thaler
2022-09-27 18:59 ` [PATCH 11/15] ebpf-docs: Improve English readability dthaler1968
2022-09-30 22:16   ` Alexei Starovoitov
2022-10-04 14:32     ` Dave Thaler
2022-10-04 15:38       ` Alexei Starovoitov
2022-10-04 15:55         ` Dave Thaler
2022-10-04 15:56           ` Dave Thaler
2022-10-04 16:19             ` Alexei Starovoitov
2022-10-04 16:41               ` Dave Thaler
2022-10-04 16:54                 ` Dave Thaler
2022-10-06 20:44                   ` Jim Harris
2022-09-27 18:59 ` [PATCH 12/15] ebpf-docs: Add Linux note about register calling convention dthaler1968
2022-09-30 22:17   ` Alexei Starovoitov
2022-09-27 18:59 ` [PATCH 13/15] ebpf-docs: Add extended 64-bit immediate instructions dthaler1968
2022-09-27 18:59 ` [PATCH 14/15] ebpf-docs: Add extended call instructions dthaler1968
2022-09-27 18:59 ` [PATCH 15/15] ebpf-docs: Add note about invalid instruction dthaler1968
2022-09-30 22:21   ` Alexei Starovoitov
2022-09-30 20:50 ` [PATCH 01/15] ebpf-docs: Move legacy packet instructions to a separate file patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.