* [PATCH bpf-next 0/7] nfp: bpf: add multiplication, divide and memcpy from maps
@ 2018-06-25  3:54 Jakub Kicinski
  2018-06-25  3:54 ` [PATCH bpf-next 1/7] nfp: bpf: allow source ptr type be map ptr in memcpy optimization Jakub Kicinski
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Jakub Kicinski @ 2018-06-25  3:54 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski

Hi!

This set enables memcpy optimization when the source is a map pointer.
The rest adds multiplication and divide support, which Jiong describes
as follows:

NFP supports u16 and u32 multiplication. Multiplication is done 8 bits per
step, therefore we need 2 steps for u16 and 4 steps for u32.

We also need one start instruction to initialize the sequence and one or
two instructions to fetch the result, depending on whether you need the
high half of the u32 multiplication.

For ALU64, if either operand is beyond u32's value range, we reject it. One
thing to note: if the source operand is BPF_K, then we need to check the
"imm" field directly, and we reject it if it is negative, because for ALU64
"imm" (with s32 type) is expected to be sign extended to s64, which NFP mul
doesn't support. For ALU32, it is fine for "imm" to be negative, because
the result is 32 bits and there is no difference in the low half of the
result between signed and unsigned mul, so we still get the correct result.

NFP doesn't have an integer divide instruction; this patch set uses the
reciprocal algorithm (the basic one, reciprocal_div) to emulate it.

For each u32 divide, we would need 11 instructions to finish the operation.

  7 (for multiplication) + 4 (various ALUs) = 11

Given NFP only supports multiplication no wider than u32, we require the
divisor and dividend to be no wider than that as well.

Also, eBPF doesn't support signed divide and enforces this at the C language
level by failing compilation. However, the LLVM assembler doesn't enforce
it, so it is possible for a negative constant to leak in as a BPF_K operand
through assembly code; we reject such cases as well.

Meanwhile, reciprocal_div.h only implements the basic version of:

  "Division by Invariant Integers Using Multiplication"
                        - Torbjörn Granlund and Peter L. Montgomery

This patch set further implements the optimized version (Figure 4.2 in the
paper) inside the existing reciprocal_div.h. When the divisor is even and
the calculated reciprocal magic number doesn't fit in u32, we can reduce
the required ALU instructions from 4 to 2 or, in some cases, 1.

The advanced version requires a more complex calculation to get the
reciprocal multiplier and other control variables, but it can then reduce
the number of emulation operations required. It makes sense to use it for
JIT divide code generation (for example eBPF JIT backends), where we are
willing to trade compilation time on the host for the performance of the
JITed code.


Jiong Wang (7):
  nfp: bpf: allow source ptr type be map ptr in memcpy optimization
  lib: reciprocal_div: implement the improved algorithm on the paper
    mentioned
  nfp: bpf: rename umin/umax to umin_src/umax_src
  nfp: bpf: copy range info for all operands of all ALU operations
  nfp: bpf: support u16 and u32 multiplications
  nfp: bpf: support u32 divide using reciprocal_div.h
  nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h

 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 232 +++++++++++++++++-
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  43 ++--
 .../net/ethernet/netronome/nfp/bpf/offload.c  |   6 +-
 .../net/ethernet/netronome/nfp/bpf/verifier.c |  95 ++++++-
 drivers/net/ethernet/netronome/nfp/nfp_asm.h  |  28 +++
 include/linux/reciprocal_div.h                |  65 +++++
 lib/reciprocal_div.c                          |  37 +++
 7 files changed, 467 insertions(+), 39 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 1/7] nfp: bpf: allow source ptr type be map ptr in memcpy optimization
  2018-06-25  3:54 [PATCH bpf-next 0/7] nfp: bpf: add multiplication, divide and memcpy from maps Jakub Kicinski
@ 2018-06-25  3:54 ` Jakub Kicinski
  2018-06-26  5:50   ` Song Liu
  2018-06-25  3:54 ` [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned Jakub Kicinski
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2018-06-25  3:54 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jiong Wang

From: Jiong Wang <jiong.wang@netronome.com>

Map reads are already supported on NFP; this patch enables the memcpy
optimization for copies from a map to a packet.

This patch also fixes one latent bug which would cause copying from an
unexpected address once memcpy from a map pointer is enabled.

Reported-by: Mary Pham <mary.pham@netronome.com>
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 8a92088df0d7..33111739b210 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -670,7 +670,7 @@ static int nfp_cpp_memcpy(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	xfer_num = round_up(len, 4) / 4;
 
 	if (src_40bit_addr)
-		addr40_offset(nfp_prog, meta->insn.src_reg, off, &src_base,
+		addr40_offset(nfp_prog, meta->insn.src_reg * 2, off, &src_base,
 			      &off);
 
 	/* Setup PREV_ALU fields to override memory read length. */
@@ -3299,7 +3299,8 @@ curr_pair_is_memcpy(struct nfp_insn_meta *ld_meta,
 	if (!is_mbpf_load(ld_meta) || !is_mbpf_store(st_meta))
 		return false;
 
-	if (ld_meta->ptr.type != PTR_TO_PACKET)
+	if (ld_meta->ptr.type != PTR_TO_PACKET &&
+	    ld_meta->ptr.type != PTR_TO_MAP_VALUE)
 		return false;
 
 	if (st_meta->ptr.type != PTR_TO_PACKET)
-- 
2.17.1


* [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
  2018-06-25  3:54 [PATCH bpf-next 0/7] nfp: bpf: add multiplication, divide and memcpy from maps Jakub Kicinski
  2018-06-25  3:54 ` [PATCH bpf-next 1/7] nfp: bpf: allow source ptr type be map ptr in memcpy optimization Jakub Kicinski
@ 2018-06-25  3:54 ` Jakub Kicinski
  2018-06-26  6:21   ` Song Liu
  2018-06-25  3:54 ` [PATCH bpf-next 3/7] nfp: bpf: rename umin/umax to umin_src/umax_src Jakub Kicinski
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2018-06-25  3:54 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jiong Wang

From: Jiong Wang <jiong.wang@netronome.com>

The newly added "reciprocal_value_adv" implements the advanced version of
the algorithm described in Figure 4.2 of the paper, except when the divisor
has its MSB set, which would require a u128 divide on the host and can
easily be handled before calling "reciprocal_value_adv".

The advanced version requires a more complex calculation to get the
reciprocal multiplier and other control variables, but it can then reduce
the number of emulation operations required.

It makes no sense to use this advanced version for host divide emulation;
the extra complexity of calculating the multiplier etc. could completely
cancel out the savings on emulation operations.

However, it makes sense to use it for JIT divide code generation (for
example eBPF JIT backends), where we are willing to trade compilation time
on the host for the performance of the JITed code. As shown by the
following pseudo code, the required emulation operations can go down from 6
(the basic version) to 3 or 4.

To use the result of "reciprocal_value_adv", suppose we want to calculate
n/d; the C-style pseudo code is as follows, and can easily be adapted to
real code generation for other JIT targets.

  struct reciprocal_value_adv rvalue;
  u8 pre_shift, exp;

  if (d >= (1u << 31)) {
    result = n >= d;
    return;
  }
  rvalue = reciprocal_value_adv(d, 32);
  exp = rvalue.exp;
  if (rvalue.is_wide_m && !(d & 1)) {
    pre_shift = fls(d & -d) - 1;
    rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
  } else {
    pre_shift = 0;
  }

  // code generation starts.
  if (d == 1 << exp) {
    result = n >> exp;
  } else if (rvalue.is_wide_m) {
    // pre_shift must be zero when reached here.
    t = (n * rvalue.m) >> 32;
    result = n - t;
    result >>= 1;
    result += t;
    result >>= rvalue.sh - 1;
  } else {
    if (pre_shift)
      result = n >> pre_shift;
    result = ((u64)result * rvalue.m) >> 32;
    result >>= rvalue.sh;
  }

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/linux/reciprocal_div.h | 65 ++++++++++++++++++++++++++++++++++
 lib/reciprocal_div.c           | 37 +++++++++++++++++++
 2 files changed, 102 insertions(+)

diff --git a/include/linux/reciprocal_div.h b/include/linux/reciprocal_div.h
index e031e9f2f9d8..5a695e4697d3 100644
--- a/include/linux/reciprocal_div.h
+++ b/include/linux/reciprocal_div.h
@@ -25,6 +25,9 @@ struct reciprocal_value {
 	u8 sh1, sh2;
 };
 
+/* "reciprocal_value" and "reciprocal_divide" together implement the basic
+ * version of the algorithm described in Figure 4.1 of the paper.
+ */
 struct reciprocal_value reciprocal_value(u32 d);
 
 static inline u32 reciprocal_divide(u32 a, struct reciprocal_value R)
@@ -33,4 +36,66 @@ static inline u32 reciprocal_divide(u32 a, struct reciprocal_value R)
 	return (t + ((a - t) >> R.sh1)) >> R.sh2;
 }
 
+struct reciprocal_value_adv {
+	u32 m;
+	u8 sh, exp;
+	bool is_wide_m;
+};
+
+/* "reciprocal_value_adv" implements the advanced version of the algorithm
+ * described in Figure 4.2 of the paper except when the divisor has its MSB
+ * set, which would require a u128 divide on the host and can easily be
+ * handled before calling "reciprocal_value_adv".
+ *
+ * The advanced version requires more complex calculation to get the reciprocal
+ * multiplier and other control variables, but then could reduce the required
+ * emulation operations.
+ *
+ * It makes no sense to use this advanced version for host divide emulation,
+ * those extra complexities for calculating multiplier etc could completely
+ * waive our saving on emulation operations.
+ *
+ * However, it makes sense to use it for JIT divide code generation for which
+ * we are willing to trade performance of JITed code with that of host. As shown
+ * by the following pseudo code, the required emulation operations could go down
+ * from 6 (the basic version) to 3 or 4.
+ *
+ * To use the result of "reciprocal_value_adv", suppose we want to calculate
+ * n/d:
+ *
+ *   struct reciprocal_value_adv rvalue;
+ *   u8 pre_shift, exp;
+ *
+ *   if (d >= (1u << 31)) {
+ *     result = n >= d;
+ *     return;
+ *   }
+ *   rvalue = reciprocal_value_adv(d, 32);
+ *   exp = rvalue.exp;
+ *   if (rvalue.is_wide_m && !(d & 1)) {
+ *     pre_shift = fls(d & -d) - 1;
+ *     rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
+ *   } else {
+ *     pre_shift = 0;
+ *   }
+ *
+ *   // code generation starts.
+ *   if (d == 1 << exp) {
+ *     result = n >> exp;
+ *   } else if (rvalue.is_wide_m) {
+ *     // pre_shift must be zero when reached here.
+ *     t = (n * rvalue.m) >> 32;
+ *     result = n - t;
+ *     result >>= 1;
+ *     result += t;
+ *     result >>= rvalue.sh - 1;
+ *   } else {
+ *     if (pre_shift)
+ *       result = n >> pre_shift;
+ *     result = ((u64)result * rvalue.m) >> 32;
+ *     result >>= rvalue.sh;
+ *   }
+ */
+struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec);
+
 #endif /* _LINUX_RECIPROCAL_DIV_H */
diff --git a/lib/reciprocal_div.c b/lib/reciprocal_div.c
index fcb4ce682c6f..a41501ebad7c 100644
--- a/lib/reciprocal_div.c
+++ b/lib/reciprocal_div.c
@@ -26,3 +26,40 @@ struct reciprocal_value reciprocal_value(u32 d)
 	return R;
 }
 EXPORT_SYMBOL(reciprocal_value);
+
+struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec)
+{
+	struct reciprocal_value_adv R;
+	u32 l, post_shift;
+	u64 mhigh, mlow;
+
+	l = fls(d - 1);
+	post_shift = l;
+	/* NOTE: mlow/mhigh could overflow u64 when l == 32 which means d has
+	 * MSB set. This case needs to be handled before calling
+	 * "reciprocal_value_adv", please see the comment at
+	 * include/linux/reciprocal_div.h.
+	 */
+	mlow = 1ULL << (32 + l);
+	do_div(mlow, d);
+	mhigh = (1ULL << (32 + l)) + (1ULL << (32 + l - prec));
+	do_div(mhigh, d);
+
+	for (; post_shift > 0; post_shift--) {
+		u64 lo = mlow >> 1, hi = mhigh >> 1;
+
+		if (lo >= hi)
+			break;
+
+		mlow = lo;
+		mhigh = hi;
+	}
+
+	R.m = (u32)mhigh;
+	R.sh = post_shift;
+	R.exp = l;
+	R.is_wide_m = mhigh > U32_MAX;
+
+	return R;
+}
+EXPORT_SYMBOL(reciprocal_value_adv);
-- 
2.17.1


* [PATCH bpf-next 3/7] nfp: bpf: rename umin/umax to umin_src/umax_src
  2018-06-25  3:54 [PATCH bpf-next 0/7] nfp: bpf: add multiplication, divide and memcpy from maps Jakub Kicinski
  2018-06-25  3:54 ` [PATCH bpf-next 1/7] nfp: bpf: allow source ptr type be map ptr in memcpy optimization Jakub Kicinski
  2018-06-25  3:54 ` [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned Jakub Kicinski
@ 2018-06-25  3:54 ` Jakub Kicinski
  2018-06-26  6:21   ` Song Liu
  2018-06-25  3:54 ` [PATCH bpf-next 4/7] nfp: bpf: copy range info for all operands of all ALU operations Jakub Kicinski
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2018-06-25  3:54 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jiong Wang

From: Jiong Wang <jiong.wang@netronome.com>

The two fields are copies of the umin and umax info of bpf_insn->src_reg
generated by the verifier.

Rename to make their meaning clear.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c      | 12 ++++++------
 drivers/net/ethernet/netronome/nfp/bpf/main.h     | 10 +++++-----
 drivers/net/ethernet/netronome/nfp/bpf/offload.c  |  2 +-
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c |  4 ++--
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 33111739b210..4a629e9b5c0f 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -1772,8 +1772,8 @@ static int shl_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	u8 dst, src;
 
 	dst = insn->dst_reg * 2;
-	umin = meta->umin;
-	umax = meta->umax;
+	umin = meta->umin_src;
+	umax = meta->umax_src;
 	if (umin == umax)
 		return __shl_imm64(nfp_prog, dst, umin);
 
@@ -1881,8 +1881,8 @@ static int shr_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	u8 dst, src;
 
 	dst = insn->dst_reg * 2;
-	umin = meta->umin;
-	umax = meta->umax;
+	umin = meta->umin_src;
+	umax = meta->umax_src;
 	if (umin == umax)
 		return __shr_imm64(nfp_prog, dst, umin);
 
@@ -1995,8 +1995,8 @@ static int ashr_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	u8 dst, src;
 
 	dst = insn->dst_reg * 2;
-	umin = meta->umin;
-	umax = meta->umax;
+	umin = meta->umin_src;
+	umax = meta->umax_src;
 	if (umin == umax)
 		return __ashr_imm64(nfp_prog, dst, umin);
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 654fe7823e5e..5975a19c28cb 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -263,8 +263,8 @@ struct nfp_bpf_reg_state {
  * @func_id: function id for call instructions
  * @arg1: arg1 for call instructions
  * @arg2: arg2 for call instructions
- * @umin: copy of core verifier umin_value.
- * @umax: copy of core verifier umax_value.
+ * @umin_src: copy of core verifier umin_value for src operand.
+ * @umax_src: copy of core verifier umax_value for src operand.
  * @off: index of first generated machine instruction (in nfp_prog.prog)
  * @n: eBPF instruction number
  * @flags: eBPF instruction extra optimization flags
@@ -301,11 +301,11 @@ struct nfp_insn_meta {
 			struct nfp_bpf_reg_state arg2;
 		};
 		/* We are interested in range info for some operands,
-		 * for example, the shift amount.
+		 * for example, the shift amount which is kept in src operand.
 		 */
 		struct {
-			u64 umin;
-			u64 umax;
+			u64 umin_src;
+			u64 umax_src;
 		};
 	};
 	unsigned int off;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 7eae4c0266f8..856a0003bb75 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -191,7 +191,7 @@ nfp_prog_prepare(struct nfp_prog *nfp_prog, const struct bpf_insn *prog,
 		meta->insn = prog[i];
 		meta->n = i;
 		if (is_mbpf_indir_shift(meta))
-			meta->umin = U64_MAX;
+			meta->umin_src = U64_MAX;
 
 		list_add_tail(&meta->l, &nfp_prog->insns);
 	}
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 4bfeba7b21b2..e862b739441f 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -555,8 +555,8 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
 		const struct bpf_reg_state *sreg =
 			cur_regs(env) + meta->insn.src_reg;
 
-		meta->umin = min(meta->umin, sreg->umin_value);
-		meta->umax = max(meta->umax, sreg->umax_value);
+		meta->umin_src = min(meta->umin_src, sreg->umin_value);
+		meta->umax_src = max(meta->umax_src, sreg->umax_value);
 	}
 
 	return 0;
-- 
2.17.1


* [PATCH bpf-next 4/7] nfp: bpf: copy range info for all operands of all ALU operations
  2018-06-25  3:54 [PATCH bpf-next 0/7] nfp: bpf: add multiplication, divide and memcpy from maps Jakub Kicinski
                   ` (2 preceding siblings ...)
  2018-06-25  3:54 ` [PATCH bpf-next 3/7] nfp: bpf: rename umin/umax to umin_src/umax_src Jakub Kicinski
@ 2018-06-25  3:54 ` Jakub Kicinski
  2018-06-26  6:50   ` Song Liu
  2018-06-25  3:54 ` [PATCH bpf-next 5/7] nfp: bpf: support u16 and u32 multiplications Jakub Kicinski
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2018-06-25  3:54 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jiong Wang

From: Jiong Wang <jiong.wang@netronome.com>

The NFP verifier hook currently copies range information of the shift
amount for indirect shift operations, so that optimized shift sequences can
be generated.

We want to use range info for more things. For example, to decide whether
multiplication and divide are supported on the given range.

This patch simply lets the NFP verifier hook copy range info for all
operands of all ALU operations.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/main.h | 33 +++++++------------
 .../net/ethernet/netronome/nfp/bpf/offload.c  |  4 ++-
 .../net/ethernet/netronome/nfp/bpf/verifier.c |  6 +++-
 3 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 5975a19c28cb..c985d0ac61a3 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -265,6 +265,8 @@ struct nfp_bpf_reg_state {
  * @arg2: arg2 for call instructions
 * @umin_src: copy of core verifier umin_value for src operand.
  * @umax_src: copy of core verifier umax_value for src operand.
+ * @umin_dst: copy of core verifier umin_value for dst operand.
+ * @umax_dst: copy of core verifier umax_value for dst operand.
  * @off: index of first generated machine instruction (in nfp_prog.prog)
  * @n: eBPF instruction number
  * @flags: eBPF instruction extra optimization flags
@@ -300,12 +302,15 @@ struct nfp_insn_meta {
 			struct bpf_reg_state arg1;
 			struct nfp_bpf_reg_state arg2;
 		};
-		/* We are interested in range info for some operands,
-		 * for example, the shift amount which is kept in src operand.
+		/* We are interested in range info for operands of ALU
+		 * operations. For example, shift amount, multiplicand and
+		 * multiplier etc.
 		 */
 		struct {
 			u64 umin_src;
 			u64 umax_src;
+			u64 umin_dst;
+			u64 umax_dst;
 		};
 	};
 	unsigned int off;
@@ -339,6 +344,11 @@ static inline u8 mbpf_mode(const struct nfp_insn_meta *meta)
 	return BPF_MODE(meta->insn.code);
 }
 
+static inline bool is_mbpf_alu(const struct nfp_insn_meta *meta)
+{
+	return mbpf_class(meta) == BPF_ALU64 || mbpf_class(meta) == BPF_ALU;
+}
+
 static inline bool is_mbpf_load(const struct nfp_insn_meta *meta)
 {
 	return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_LDX | BPF_MEM);
@@ -384,25 +394,6 @@ static inline bool is_mbpf_xadd(const struct nfp_insn_meta *meta)
 	return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_STX | BPF_XADD);
 }
 
-static inline bool is_mbpf_indir_shift(const struct nfp_insn_meta *meta)
-{
-	u8 code = meta->insn.code;
-	bool is_alu, is_shift;
-	u8 opclass, opcode;
-
-	opclass = BPF_CLASS(code);
-	is_alu = opclass == BPF_ALU64 || opclass == BPF_ALU;
-	if (!is_alu)
-		return false;
-
-	opcode = BPF_OP(code);
-	is_shift = opcode == BPF_LSH || opcode == BPF_RSH || opcode == BPF_ARSH;
-	if (!is_shift)
-		return false;
-
-	return BPF_SRC(code) == BPF_X;
-}
-
 /**
  * struct nfp_prog - nfp BPF program
  * @bpf: backpointer to the bpf app priv structure
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 856a0003bb75..78f44c4d95b4 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -190,8 +190,10 @@ nfp_prog_prepare(struct nfp_prog *nfp_prog, const struct bpf_insn *prog,
 
 		meta->insn = prog[i];
 		meta->n = i;
-		if (is_mbpf_indir_shift(meta))
+		if (is_mbpf_alu(meta)) {
 			meta->umin_src = U64_MAX;
+			meta->umin_dst = U64_MAX;
+		}
 
 		list_add_tail(&meta->l, &nfp_prog->insns);
 	}
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index e862b739441f..7bd9666bd8ff 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -551,12 +551,16 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
 	if (is_mbpf_xadd(meta))
 		return nfp_bpf_check_xadd(nfp_prog, meta, env);
 
-	if (is_mbpf_indir_shift(meta)) {
+	if (is_mbpf_alu(meta)) {
 		const struct bpf_reg_state *sreg =
 			cur_regs(env) + meta->insn.src_reg;
+		const struct bpf_reg_state *dreg =
+			cur_regs(env) + meta->insn.dst_reg;
 
 		meta->umin_src = min(meta->umin_src, sreg->umin_value);
 		meta->umax_src = max(meta->umax_src, sreg->umax_value);
+		meta->umin_dst = min(meta->umin_dst, dreg->umin_value);
+		meta->umax_dst = max(meta->umax_dst, dreg->umax_value);
 	}
 
 	return 0;
-- 
2.17.1


* [PATCH bpf-next 5/7] nfp: bpf: support u16 and u32 multiplications
  2018-06-25  3:54 [PATCH bpf-next 0/7] nfp: bpf: add multiplication, divide and memcpy from maps Jakub Kicinski
                   ` (3 preceding siblings ...)
  2018-06-25  3:54 ` [PATCH bpf-next 4/7] nfp: bpf: copy range info for all operands of all ALU operations Jakub Kicinski
@ 2018-06-25  3:54 ` Jakub Kicinski
  2018-06-26 22:23   ` Song Liu
  2018-06-25  3:54 ` [PATCH bpf-next 6/7] nfp: bpf: support u32 divide using reciprocal_div.h Jakub Kicinski
  2018-06-25  3:54 ` [PATCH bpf-next 7/7] nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h Jakub Kicinski
  6 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2018-06-25  3:54 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jiong Wang

From: Jiong Wang <jiong.wang@netronome.com>

NFP supports u16 and u32 multiplication. Multiplication is done 8 bits per
step, therefore we need 2 steps for u16 and 4 steps for u32.

We also need one start instruction to initialize the sequence and one or
two instructions to fetch the result, depending on whether you need the
high half of the u32 multiplication.

For ALU64, if either operand is beyond u32's value range, we reject it. One
thing to note: if the source operand is BPF_K, then we need to check the
"imm" field directly, and we reject it if it is negative, because for ALU64
"imm" (with s32 type) is expected to be sign extended to s64, which NFP mul
doesn't support. For ALU32, it is fine for "imm" to be negative, because
the result is 32 bits and there is no difference in the low half of the
result between signed and unsigned mul, so we still get the correct result.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 137 ++++++++++++++++++
 drivers/net/ethernet/netronome/nfp/bpf/main.h |   5 +
 .../net/ethernet/netronome/nfp/bpf/verifier.c |  58 ++++++--
 drivers/net/ethernet/netronome/nfp/nfp_asm.h  |  28 ++++
 4 files changed, 217 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 4a629e9b5c0f..7d7061d93358 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -415,6 +415,60 @@ emit_alu(struct nfp_prog *nfp_prog, swreg dst,
 		   reg.dst_lmextn, reg.src_lmextn);
 }
 
+static void
+__emit_mul(struct nfp_prog *nfp_prog, enum alu_dst_ab dst_ab, u16 areg,
+	   enum mul_type type, enum mul_step step, u16 breg, bool swap,
+	   bool wr_both, bool dst_lmextn, bool src_lmextn)
+{
+	u64 insn;
+
+	insn = OP_MUL_BASE |
+		FIELD_PREP(OP_MUL_A_SRC, areg) |
+		FIELD_PREP(OP_MUL_B_SRC, breg) |
+		FIELD_PREP(OP_MUL_STEP, step) |
+		FIELD_PREP(OP_MUL_DST_AB, dst_ab) |
+		FIELD_PREP(OP_MUL_SW, swap) |
+		FIELD_PREP(OP_MUL_TYPE, type) |
+		FIELD_PREP(OP_MUL_WR_AB, wr_both) |
+		FIELD_PREP(OP_MUL_SRC_LMEXTN, src_lmextn) |
+		FIELD_PREP(OP_MUL_DST_LMEXTN, dst_lmextn);
+
+	nfp_prog_push(nfp_prog, insn);
+}
+
+static void
+emit_mul(struct nfp_prog *nfp_prog, swreg lreg, enum mul_type type,
+	 enum mul_step step, swreg rreg)
+{
+	struct nfp_insn_ur_regs reg;
+	u16 areg;
+	int err;
+
+	if (type == MUL_TYPE_START && step != MUL_STEP_NONE) {
+		nfp_prog->error = -EINVAL;
+		return;
+	}
+
+	if (step == MUL_LAST || step == MUL_LAST_2) {
+		/* When the step number is MUL_LAST or MUL_LAST_2, the left
+		 * source is used as the destination.
+		 */
+		err = swreg_to_unrestricted(lreg, reg_none(), rreg, &reg);
+		areg = reg.dst;
+	} else {
+		err = swreg_to_unrestricted(reg_none(), lreg, rreg, &reg);
+		areg = reg.areg;
+	}
+
+	if (err) {
+		nfp_prog->error = err;
+		return;
+	}
+
+	__emit_mul(nfp_prog, reg.dst_ab, areg, type, step, reg.breg, reg.swap,
+		   reg.wr_both, reg.dst_lmextn, reg.src_lmextn);
+}
+
 static void
 __emit_ld_field(struct nfp_prog *nfp_prog, enum shf_sc sc,
 		u8 areg, u8 bmask, u8 breg, u8 shift, bool imm8,
@@ -1380,6 +1434,65 @@ static void wrp_end32(struct nfp_prog *nfp_prog, swreg reg_in, u8 gpr_out)
 		      SHF_SC_R_ROT, 16);
 }
 
+static void
+wrp_mul_u32(struct nfp_prog *nfp_prog, swreg dst_hi, swreg dst_lo, swreg lreg,
+	    swreg rreg, bool gen_high_half)
+{
+	emit_mul(nfp_prog, lreg, MUL_TYPE_START, MUL_STEP_NONE, rreg);
+	emit_mul(nfp_prog, lreg, MUL_TYPE_STEP_32x32, MUL_STEP_1, rreg);
+	emit_mul(nfp_prog, lreg, MUL_TYPE_STEP_32x32, MUL_STEP_2, rreg);
+	emit_mul(nfp_prog, lreg, MUL_TYPE_STEP_32x32, MUL_STEP_3, rreg);
+	emit_mul(nfp_prog, lreg, MUL_TYPE_STEP_32x32, MUL_STEP_4, rreg);
+	emit_mul(nfp_prog, dst_lo, MUL_TYPE_STEP_32x32, MUL_LAST, reg_none());
+	if (gen_high_half)
+		emit_mul(nfp_prog, dst_hi, MUL_TYPE_STEP_32x32, MUL_LAST_2,
+			 reg_none());
+	else
+		wrp_immed(nfp_prog, dst_hi, 0);
+}
+
+static void
+wrp_mul_u16(struct nfp_prog *nfp_prog, swreg dst_hi, swreg dst_lo, swreg lreg,
+	    swreg rreg)
+{
+	emit_mul(nfp_prog, lreg, MUL_TYPE_START, MUL_STEP_NONE, rreg);
+	emit_mul(nfp_prog, lreg, MUL_TYPE_STEP_16x16, MUL_STEP_1, rreg);
+	emit_mul(nfp_prog, lreg, MUL_TYPE_STEP_16x16, MUL_STEP_2, rreg);
+	emit_mul(nfp_prog, dst_lo, MUL_TYPE_STEP_16x16, MUL_LAST, reg_none());
+}
+
+static int
+wrp_mul(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+	bool gen_high_half, bool ropnd_from_reg)
+{
+	swreg multiplier, multiplicand, dst_hi, dst_lo;
+	const struct bpf_insn *insn = &meta->insn;
+	u32 lopnd_max, ropnd_max;
+	u8 dst_reg;
+
+	dst_reg = insn->dst_reg;
+	multiplicand = reg_a(dst_reg * 2);
+	dst_hi = reg_both(dst_reg * 2 + 1);
+	dst_lo = reg_both(dst_reg * 2);
+	lopnd_max = meta->umax_dst;
+	if (ropnd_from_reg) {
+		multiplier = reg_b(insn->src_reg * 2);
+		ropnd_max = meta->umax_src;
+	} else {
+		u32 imm = insn->imm;
+
+		multiplier = re_load_imm_any(nfp_prog, imm, imm_b(nfp_prog));
+		ropnd_max = imm;
+	}
+	if (lopnd_max > U16_MAX || ropnd_max > U16_MAX)
+		wrp_mul_u32(nfp_prog, dst_hi, dst_lo, multiplicand, multiplier,
+			    gen_high_half);
+	else
+		wrp_mul_u16(nfp_prog, dst_hi, dst_lo, multiplicand, multiplier);
+
+	return 0;
+}
+
 static int adjust_head(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
 	swreg tmp = imm_a(nfp_prog), tmp_len = imm_b(nfp_prog);
@@ -1684,6 +1797,16 @@ static int sub_imm64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	return 0;
 }
 
+static int mul_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return wrp_mul(nfp_prog, meta, true, true);
+}
+
+static int mul_imm64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return wrp_mul(nfp_prog, meta, true, false);
+}
+
 static int neg_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
 	const struct bpf_insn *insn = &meta->insn;
@@ -2097,6 +2220,16 @@ static int sub_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	return wrp_alu32_imm(nfp_prog, meta, ALU_OP_SUB, !meta->insn.imm);
 }
 
+static int mul_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return wrp_mul(nfp_prog, meta, false, true);
+}
+
+static int mul_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return wrp_mul(nfp_prog, meta, false, false);
+}
+
 static int neg_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
 	u8 dst = meta->insn.dst_reg * 2;
@@ -2848,6 +2981,8 @@ static const instr_cb_t instr_cb[256] = {
 	[BPF_ALU64 | BPF_ADD | BPF_K] =	add_imm64,
 	[BPF_ALU64 | BPF_SUB | BPF_X] =	sub_reg64,
 	[BPF_ALU64 | BPF_SUB | BPF_K] =	sub_imm64,
+	[BPF_ALU64 | BPF_MUL | BPF_X] =	mul_reg64,
+	[BPF_ALU64 | BPF_MUL | BPF_K] =	mul_imm64,
 	[BPF_ALU64 | BPF_NEG] =		neg_reg64,
 	[BPF_ALU64 | BPF_LSH | BPF_X] =	shl_reg64,
 	[BPF_ALU64 | BPF_LSH | BPF_K] =	shl_imm64,
@@ -2867,6 +3002,8 @@ static const instr_cb_t instr_cb[256] = {
 	[BPF_ALU | BPF_ADD | BPF_K] =	add_imm,
 	[BPF_ALU | BPF_SUB | BPF_X] =	sub_reg,
 	[BPF_ALU | BPF_SUB | BPF_K] =	sub_imm,
+	[BPF_ALU | BPF_MUL | BPF_X] =	mul_reg,
+	[BPF_ALU | BPF_MUL | BPF_K] =	mul_imm,
 	[BPF_ALU | BPF_NEG] =		neg_reg,
 	[BPF_ALU | BPF_LSH | BPF_K] =	shl_imm,
 	[BPF_ALU | BPF_END | BPF_X] =	end_reg32,
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index c985d0ac61a3..c10079b1a312 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -394,6 +394,11 @@ static inline bool is_mbpf_xadd(const struct nfp_insn_meta *meta)
 	return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_STX | BPF_XADD);
 }
 
+static inline bool is_mbpf_mul(const struct nfp_insn_meta *meta)
+{
+	return is_mbpf_alu(meta) && mbpf_op(meta) == BPF_MUL;
+}
+
 /**
  * struct nfp_prog - nfp BPF program
  * @bpf: backpointer to the bpf app priv structure
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 7bd9666bd8ff..30d4f1580693 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -516,6 +516,51 @@ nfp_bpf_check_xadd(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	return nfp_bpf_check_ptr(nfp_prog, meta, env, meta->insn.dst_reg);
 }
 
+static int
+nfp_bpf_check_alu(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+		  struct bpf_verifier_env *env)
+{
+	const struct bpf_reg_state *sreg =
+		cur_regs(env) + meta->insn.src_reg;
+	const struct bpf_reg_state *dreg =
+		cur_regs(env) + meta->insn.dst_reg;
+
+	meta->umin_src = min(meta->umin_src, sreg->umin_value);
+	meta->umax_src = max(meta->umax_src, sreg->umax_value);
+	meta->umin_dst = min(meta->umin_dst, dreg->umin_value);
+	meta->umax_dst = max(meta->umax_dst, dreg->umax_value);
+
+	/* NFP supports u16 and u32 multiplication.
+	 *
+	 * For ALU64, if either operand is beyond u32's value range, we reject
+	 * it. One thing to note, if the source operand is BPF_K, then we need
+	 * to check "imm" field directly, and we'd reject it if it is negative.
+	 * Because for ALU64, "imm" (with s32 type) is expected to be sign
+	 * extended to s64 which NFP mul doesn't support.
+	 *
+	 * For ALU32, it is fine for "imm" be negative though, because the
+	 * result is 32-bits and there is no difference on the low halve of
+	 * the result for signed/unsigned mul, so we will get correct result.
+	 */
+	if (is_mbpf_mul(meta)) {
+		if (meta->umax_dst > U32_MAX) {
+			pr_vlog(env, "multiplier is not within u32 value range\n");
+			return -EINVAL;
+		}
+		if (mbpf_src(meta) == BPF_X && meta->umax_src > U32_MAX) {
+			pr_vlog(env, "multiplicand is not within u32 value range\n");
+			return -EINVAL;
+		}
+		if (mbpf_class(meta) == BPF_ALU64 &&
+		    mbpf_src(meta) == BPF_K && meta->insn.imm < 0) {
+			pr_vlog(env, "sign extended multiplicand won't be within u32 value range\n");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 static int
 nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
 {
@@ -551,17 +596,8 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
 	if (is_mbpf_xadd(meta))
 		return nfp_bpf_check_xadd(nfp_prog, meta, env);
 
-	if (is_mbpf_alu(meta)) {
-		const struct bpf_reg_state *sreg =
-			cur_regs(env) + meta->insn.src_reg;
-		const struct bpf_reg_state *dreg =
-			cur_regs(env) + meta->insn.dst_reg;
-
-		meta->umin_src = min(meta->umin_src, sreg->umin_value);
-		meta->umax_src = max(meta->umax_src, sreg->umax_value);
-		meta->umin_dst = min(meta->umin_dst, dreg->umin_value);
-		meta->umax_dst = max(meta->umax_dst, dreg->umax_value);
-	}
+	if (is_mbpf_alu(meta))
+		return nfp_bpf_check_alu(nfp_prog, meta, env);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_asm.h b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
index f6677bc9875a..cdc4e065f6f5 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_asm.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
@@ -426,4 +426,32 @@ static inline u32 nfp_get_ind_csr_ctx_ptr_offs(u32 read_offset)
 	return (read_offset & ~NFP_IND_ME_CTX_PTR_BASE_MASK) | NFP_CSR_CTX_PTR;
 }
 
+enum mul_type {
+	MUL_TYPE_START		= 0x00,
+	MUL_TYPE_STEP_24x8	= 0x01,
+	MUL_TYPE_STEP_16x16	= 0x02,
+	MUL_TYPE_STEP_32x32	= 0x03,
+};
+
+enum mul_step {
+	MUL_STEP_1		= 0x00,
+	MUL_STEP_NONE		= MUL_STEP_1,
+	MUL_STEP_2		= 0x01,
+	MUL_STEP_3		= 0x02,
+	MUL_STEP_4		= 0x03,
+	MUL_LAST		= 0x04,
+	MUL_LAST_2		= 0x05,
+};
+
+#define OP_MUL_BASE		0x0f800000000ULL
+#define OP_MUL_A_SRC		0x000000003ffULL
+#define OP_MUL_B_SRC		0x000000ffc00ULL
+#define OP_MUL_STEP		0x00000700000ULL
+#define OP_MUL_DST_AB		0x00000800000ULL
+#define OP_MUL_SW		0x00040000000ULL
+#define OP_MUL_TYPE		0x00180000000ULL
+#define OP_MUL_WR_AB		0x20000000000ULL
+#define OP_MUL_SRC_LMEXTN	0x40000000000ULL
+#define OP_MUL_DST_LMEXTN	0x80000000000ULL
+
 #endif
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 6/7] nfp: bpf: support u32 divide using reciprocal_div.h
  2018-06-25  3:54 [PATCH bpf-next 0/7] nfp: bpf: add multiplication, divide and memcpy from maps Jakub Kicinski
                   ` (4 preceding siblings ...)
  2018-06-25  3:54 ` [PATCH bpf-next 5/7] nfp: bpf: support u16 and u32 multiplications Jakub Kicinski
@ 2018-06-25  3:54 ` Jakub Kicinski
  2018-06-26 22:28   ` Song Liu
  2018-06-25  3:54 ` [PATCH bpf-next 7/7] nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h Jakub Kicinski
  6 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2018-06-25  3:54 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jiong Wang

From: Jiong Wang <jiong.wang@netronome.com>

NFP doesn't have an integer divide instruction; this patch uses a reciprocal
algorithm (the basic one, reciprocal_div) to emulate it.

For each u32 divide, we would need 11 instructions to finish the operation.

  7 (for multiplication) + 4 (various ALUs) = 11

Given that NFP only supports multiplication no wider than u32, we require the
divisor and dividend to be no wider than that as well.

Also, eBPF doesn't support signed divide and enforces this at the C language
level by failing compilation. However, the LLVM assembler doesn't enforce it,
so it is possible for a negative constant to leak in as a BPF_K operand
through assembly code; we reject such cases as well.
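For readers without the NFP background, the basic scheme can be sketched in
plain userspace C. This is a re-creation mirroring lib/reciprocal_div.c, not
driver code (fls32 is a local stand-in for the kernel's fls()); the subtract,
shift, add, shift sequence in reciprocal_divide corresponds to the 4 "various
ALUs" counted above:

```c
#include <assert.h>
#include <stdint.h>

/* Kernel-style fls(): 1-based index of the most significant set bit. */
static int fls32(uint32_t x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

struct reciprocal_value {
	uint32_t m;
	uint8_t sh1, sh2;
};

/* Precompute the magic multiplier and shifts for divisor d (d != 0),
 * mirroring lib/reciprocal_div.c.
 */
static struct reciprocal_value reciprocal_value(uint32_t d)
{
	struct reciprocal_value R;
	int l = fls32(d - 1);

	R.m = (uint32_t)((((1ULL << 32) * ((1ULL << l) - d)) / d) + 1);
	R.sh1 = l ? 1 : 0;
	R.sh2 = l ? l - 1 : 0;
	return R;
}

/* n / d emulated as one u32 multiply plus 4 ALU ops:
 * subtract, shift, add, shift.
 */
static uint32_t reciprocal_divide(uint32_t n, struct reciprocal_value R)
{
	uint32_t t = (uint32_t)(((uint64_t)n * R.m) >> 32);

	return (t + ((n - t) >> R.sh1)) >> R.sh2;
}
```

On NFP the multiply itself costs 7 instructions, which with the 4 ALU ops
above gives the 11-instruction count quoted in this commit message.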

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 58 ++++++++++++++++++-
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  5 ++
 .../net/ethernet/netronome/nfp/bpf/verifier.c | 31 ++++++++++
 3 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 7d7061d93358..d732b6cfc356 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -34,10 +34,11 @@
 #define pr_fmt(fmt)	"NFP net bpf: " fmt
 
 #include <linux/bug.h>
-#include <linux/kernel.h>
 #include <linux/bpf.h>
 #include <linux/filter.h>
+#include <linux/kernel.h>
 #include <linux/pkt_cls.h>
+#include <linux/reciprocal_div.h>
 #include <linux/unistd.h>
 
 #include "main.h"
@@ -1493,6 +1494,32 @@ wrp_mul(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	return 0;
 }
 
+static int wrp_div_imm(struct nfp_prog *nfp_prog, u8 dst, u64 imm)
+{
+	swreg tmp_both = imm_both(nfp_prog), dst_both = reg_both(dst);
+	swreg dst_a = reg_a(dst), dst_b = reg_a(dst);
+	struct reciprocal_value rvalue;
+	swreg tmp_b = imm_b(nfp_prog);
+	swreg magic;
+
+	if (imm > U32_MAX) {
+		wrp_immed(nfp_prog, dst_both, 0);
+		return 0;
+	}
+
+	rvalue = reciprocal_value(imm);
+	magic = re_load_imm_any(nfp_prog, rvalue.m, imm_b(nfp_prog));
+	wrp_mul_u32(nfp_prog, tmp_both, tmp_both, dst_a, magic, true);
+	emit_alu(nfp_prog, dst_both, dst_a, ALU_OP_SUB, tmp_b);
+	emit_shf(nfp_prog, dst_both, reg_none(), SHF_OP_NONE, dst_b,
+		 SHF_SC_R_SHF, rvalue.sh1);
+	emit_alu(nfp_prog, dst_both, dst_a, ALU_OP_ADD, tmp_b);
+	emit_shf(nfp_prog, dst_both, reg_none(), SHF_OP_NONE, dst_b,
+		 SHF_SC_R_SHF, rvalue.sh2);
+
+	return 0;
+}
+
 static int adjust_head(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
 	swreg tmp = imm_a(nfp_prog), tmp_len = imm_b(nfp_prog);
@@ -1807,6 +1834,21 @@ static int mul_imm64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	return wrp_mul(nfp_prog, meta, true, false);
 }
 
+static int div_imm64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	const struct bpf_insn *insn = &meta->insn;
+
+	return wrp_div_imm(nfp_prog, insn->dst_reg * 2, insn->imm);
+}
+
+static int div_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	/* NOTE: verifier hook has rejected cases for which verifier doesn't
+	 * know whether the source operand is constant or not.
+	 */
+	return wrp_div_imm(nfp_prog, meta->insn.dst_reg * 2, meta->umin_src);
+}
+
 static int neg_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
 	const struct bpf_insn *insn = &meta->insn;
@@ -2230,6 +2272,16 @@ static int mul_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	return wrp_mul(nfp_prog, meta, false, false);
 }
 
+static int div_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return div_reg64(nfp_prog, meta);
+}
+
+static int div_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return div_imm64(nfp_prog, meta);
+}
+
 static int neg_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
 	u8 dst = meta->insn.dst_reg * 2;
@@ -2983,6 +3035,8 @@ static const instr_cb_t instr_cb[256] = {
 	[BPF_ALU64 | BPF_SUB | BPF_K] =	sub_imm64,
 	[BPF_ALU64 | BPF_MUL | BPF_X] =	mul_reg64,
 	[BPF_ALU64 | BPF_MUL | BPF_K] =	mul_imm64,
+	[BPF_ALU64 | BPF_DIV | BPF_X] =	div_reg64,
+	[BPF_ALU64 | BPF_DIV | BPF_K] =	div_imm64,
 	[BPF_ALU64 | BPF_NEG] =		neg_reg64,
 	[BPF_ALU64 | BPF_LSH | BPF_X] =	shl_reg64,
 	[BPF_ALU64 | BPF_LSH | BPF_K] =	shl_imm64,
@@ -3004,6 +3058,8 @@ static const instr_cb_t instr_cb[256] = {
 	[BPF_ALU | BPF_SUB | BPF_K] =	sub_imm,
 	[BPF_ALU | BPF_MUL | BPF_X] =	mul_reg,
 	[BPF_ALU | BPF_MUL | BPF_K] =	mul_imm,
+	[BPF_ALU | BPF_DIV | BPF_X] =	div_reg,
+	[BPF_ALU | BPF_DIV | BPF_K] =	div_imm,
 	[BPF_ALU | BPF_NEG] =		neg_reg,
 	[BPF_ALU | BPF_LSH | BPF_K] =	shl_imm,
 	[BPF_ALU | BPF_END | BPF_X] =	end_reg32,
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index c10079b1a312..9845c1a2d4c2 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -399,6 +399,11 @@ static inline bool is_mbpf_mul(const struct nfp_insn_meta *meta)
 	return is_mbpf_alu(meta) && mbpf_op(meta) == BPF_MUL;
 }
 
+static inline bool is_mbpf_div(const struct nfp_insn_meta *meta)
+{
+	return is_mbpf_alu(meta) && mbpf_op(meta) == BPF_DIV;
+}
+
 /**
  * struct nfp_prog - nfp BPF program
  * @bpf: backpointer to the bpf app priv structure
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 30d4f1580693..f0f07e988c46 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -558,6 +558,37 @@ nfp_bpf_check_alu(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 		}
 	}
 
+	/* NFP doesn't have divide instructions, we support divide by constant
+	 * through reciprocal multiplication. Given NFP support multiplication
+	 * no bigger than u32, we'd require divisor and dividend no bigger than
+	 * that as well.
+	 *
+	 * Also eBPF doesn't support signed divide and has enforced this on C
+	 * language level by failing compilation. However LLVM assembler hasn't
+	 * enforced this, so it is possible for negative constant to leak in as
+	 * a BPF_K operand through assembly code, we reject such cases as well.
+	 */
+	if (is_mbpf_div(meta)) {
+		if (meta->umax_dst > U32_MAX) {
+			pr_vlog(env, "divisor is not within u32 value range\n");
+			return -EINVAL;
+		}
+		if (mbpf_src(meta) == BPF_X) {
+			if (meta->umin_src != meta->umax_src) {
+				pr_vlog(env, "dividend is not constant\n");
+				return -EINVAL;
+			}
+			if (meta->umax_src > U32_MAX) {
+				pr_vlog(env, "dividend is not within u32 value range\n");
+				return -EINVAL;
+			}
+		}
+		if (mbpf_src(meta) == BPF_K && meta->insn.imm < 0) {
+			pr_vlog(env, "divide by negative constant is not supported\n");
+			return -EINVAL;
+		}
+	}
+
 	return 0;
 }
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 7/7] nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h
  2018-06-25  3:54 [PATCH bpf-next 0/7] nfp: bpf: add multiplication, divide and memcpy from maps Jakub Kicinski
                   ` (5 preceding siblings ...)
  2018-06-25  3:54 ` [PATCH bpf-next 6/7] nfp: bpf: support u32 divide using reciprocal_div.h Jakub Kicinski
@ 2018-06-25  3:54 ` Jakub Kicinski
  2018-06-26 20:59   ` Jakub Kicinski
  6 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2018-06-25  3:54 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jiong Wang

From: Jiong Wang <jiong.wang@netronome.com>

As we are doing JIT, we want to use the advanced version of the reciprocal
divide (reciprocal_value_adv), trading host-side computation for faster JITed
code.

This reduces the required ALU instructions from 4 to 2 or 1.
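The case split this patch JITs can be sketched as a userspace C function,
re-created from the pseudo code in the reciprocal_value_adv patch of this
series (div_by_const and fls32 are local illustrative names, not kernel or
driver functions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Kernel-style fls(): 1-based index of the most significant set bit. */
static int fls32(uint32_t x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

struct reciprocal_value_adv {
	uint32_t m;
	uint8_t sh, exp;
	bool is_wide_m;
};

/* Mirrors reciprocal_value_adv() from this series; requires d < 2^31 so
 * that the 64-bit intermediates below cannot overflow.
 */
static struct reciprocal_value_adv reciprocal_value_adv(uint32_t d, uint8_t prec)
{
	struct reciprocal_value_adv R;
	uint32_t l = fls32(d - 1), post_shift = l;
	uint64_t mlow = (1ULL << (32 + l)) / d;
	uint64_t mhigh = ((1ULL << (32 + l)) + (1ULL << (32 + l - prec))) / d;

	for (; post_shift > 0; post_shift--) {
		uint64_t lo = mlow >> 1, hi = mhigh >> 1;

		if (lo >= hi)
			break;
		mlow = lo;
		mhigh = hi;
	}
	R.m = (uint32_t)mhigh;
	R.sh = post_shift;
	R.exp = l;
	R.is_wide_m = mhigh > UINT32_MAX;
	return R;
}

/* The three code-generation cases: a power-of-two divisor needs 1 shift,
 * a wide multiplier needs mul + 4 ALU ops, and a narrow multiplier needs
 * mul + 1 or 2 ALU ops (the optional pre-shift strips trailing zero bits
 * from an even divisor to narrow the multiplier).
 */
static uint32_t div_by_const(uint32_t n, uint32_t d)
{
	struct reciprocal_value_adv rv;
	uint8_t pre_shift = 0, exp;

	if (d >= (1u << 31))	/* MSB set: rejected by the verifier hook */
		return n >= d;

	rv = reciprocal_value_adv(d, 32);
	exp = rv.exp;
	if (rv.is_wide_m && !(d & 1)) {
		pre_shift = (uint8_t)(fls32(d & -d) - 1);
		rv = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
	}

	if (d == 1u << exp)
		return n >> exp;
	if (rv.is_wide_m) {
		uint32_t t = (uint32_t)(((uint64_t)n * rv.m) >> 32);

		return (((n - t) >> 1) + t) >> (rv.sh - 1);
	}
	n >>= pre_shift;
	return (uint32_t)((((uint64_t)n * rv.m) >> 32) >> rv.sh);
}
```

Note the divisor-with-MSB-set case returns `n >= d` directly (the quotient
can only be 0 or 1); the driver instead rejects such programs in the
verifier hook rather than emitting a compare-and-branch sequence.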

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 38 ++++++++++++++-----
 .../net/ethernet/netronome/nfp/bpf/verifier.c | 16 ++++++--
 2 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index d732b6cfc356..f99ac00bd649 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -1498,8 +1498,9 @@ static int wrp_div_imm(struct nfp_prog *nfp_prog, u8 dst, u64 imm)
 {
 	swreg tmp_both = imm_both(nfp_prog), dst_both = reg_both(dst);
 	swreg dst_a = reg_a(dst), dst_b = reg_a(dst);
-	struct reciprocal_value rvalue;
+	struct reciprocal_value_adv rvalue;
 	swreg tmp_b = imm_b(nfp_prog);
+	u8 pre_shift, exp;
 	swreg magic;
 
 	if (imm > U32_MAX) {
@@ -1507,15 +1508,34 @@ static int wrp_div_imm(struct nfp_prog *nfp_prog, u8 dst, u64 imm)
 		return 0;
 	}
 
-	rvalue = reciprocal_value(imm);
+	rvalue = reciprocal_value_adv(imm, 32);
+	exp = rvalue.exp;
+	if (rvalue.is_wide_m && !(imm & 1)) {
+		pre_shift = fls(imm & -imm) - 1;
+		rvalue = reciprocal_value_adv(imm >> pre_shift, 32 - pre_shift);
+	} else {
+		pre_shift = 0;
+	}
 	magic = re_load_imm_any(nfp_prog, rvalue.m, imm_b(nfp_prog));
-	wrp_mul_u32(nfp_prog, tmp_both, tmp_both, dst_a, magic, true);
-	emit_alu(nfp_prog, dst_both, dst_a, ALU_OP_SUB, tmp_b);
-	emit_shf(nfp_prog, dst_both, reg_none(), SHF_OP_NONE, dst_b,
-		 SHF_SC_R_SHF, rvalue.sh1);
-	emit_alu(nfp_prog, dst_both, dst_a, ALU_OP_ADD, tmp_b);
-	emit_shf(nfp_prog, dst_both, reg_none(), SHF_OP_NONE, dst_b,
-		 SHF_SC_R_SHF, rvalue.sh2);
+	if (imm == 1 << exp) {
+		emit_shf(nfp_prog, dst_both, reg_none(), SHF_OP_NONE, dst_b,
+			 SHF_SC_R_SHF, exp);
+	} else if (rvalue.is_wide_m) {
+		wrp_mul_u32(nfp_prog, tmp_both, tmp_both, dst_a, magic, true);
+		emit_alu(nfp_prog, dst_both, dst_a, ALU_OP_SUB, tmp_b);
+		emit_shf(nfp_prog, dst_both, reg_none(), SHF_OP_NONE, dst_b,
+			 SHF_SC_R_SHF, 1);
+		emit_alu(nfp_prog, dst_both, dst_a, ALU_OP_ADD, tmp_b);
+		emit_shf(nfp_prog, dst_both, reg_none(), SHF_OP_NONE, dst_b,
+			 SHF_SC_R_SHF, rvalue.sh - 1);
+	} else {
+		if (pre_shift)
+			emit_shf(nfp_prog, dst_both, reg_none(), SHF_OP_NONE,
+				 dst_b, SHF_SC_R_SHF, pre_shift);
+		wrp_mul_u32(nfp_prog, dst_both, dst_both, dst_a, magic, true);
+		emit_shf(nfp_prog, dst_both, reg_none(), SHF_OP_NONE,
+			 dst_b, SHF_SC_R_SHF, rvalue.sh);
+	}
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index f0f07e988c46..39c2c24fea11 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -561,12 +561,22 @@ nfp_bpf_check_alu(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	/* NFP doesn't have divide instructions, we support divide by constant
 	 * through reciprocal multiplication. Given NFP support multiplication
 	 * no bigger than u32, we'd require divisor and dividend no bigger than
-	 * that as well.
+	 * that as well. There is a further range requirement on dividend,
+	 * please see the NOTE below.
 	 *
 	 * Also eBPF doesn't support signed divide and has enforced this on C
 	 * language level by failing compilation. However LLVM assembler hasn't
 	 * enforced this, so it is possible for negative constant to leak in as
 	 * a BPF_K operand through assembly code, we reject such cases as well.
+	 *
+	 * NOTE: because we are using "reciprocal_value_adv" which doesn't
+	 * support dividend with MSB set, so we need to JIT separate NFP
+	 * sequence to handle such case. It could be a simple sequence if there
+	 * is conditional move, however there isn't for NFP. So, we don't bother
+	 * generating compare-if-set-branch sequence by rejecting the program
+	 * straight away when the u32 dividend has MSB set. Divide by such a
+	 * large constant would be rare in practice. Also, the programmer could
+	 * simply rewrite it as "result = divisor >= the_const".
 	 */
 	if (is_mbpf_div(meta)) {
 		if (meta->umax_dst > U32_MAX) {
@@ -578,8 +588,8 @@ nfp_bpf_check_alu(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 				pr_vlog(env, "dividend is not constant\n");
 				return -EINVAL;
 			}
-			if (meta->umax_src > U32_MAX) {
-				pr_vlog(env, "dividend is not within u32 value range\n");
+			if (meta->umax_src > U32_MAX / 2) {
+				pr_vlog(env, "dividend is bigger than U32_MAX/2\n");
 				return -EINVAL;
 			}
 		}
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 1/7] nfp: bpf: allow source ptr type be map ptr in memcpy optimization
  2018-06-25  3:54 ` [PATCH bpf-next 1/7] nfp: bpf: allow source ptr type be map ptr in memcpy optimization Jakub Kicinski
@ 2018-06-26  5:50   ` Song Liu
  2018-06-26  7:08     ` Jakub Kicinski
  0 siblings, 1 reply; 22+ messages in thread
From: Song Liu @ 2018-06-26  5:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking, Jiong Wang

On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> From: Jiong Wang <jiong.wang@netronome.com>
>
> Map read has been supported on NFP, this patch enables optimization for
> memcpy from map to packet.
>
> This patch also fixed one latent bug which will cause copying from
> unexpected address once memcpy for map pointer enabled.
>
> Reported-by: Mary Pham <mary.pham@netronome.com>
> Reported-by: David Beckett <david.beckett@netronome.com>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
>  drivers/net/ethernet/netronome/nfp/bpf/jit.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> index 8a92088df0d7..33111739b210 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> @@ -670,7 +670,7 @@ static int nfp_cpp_memcpy(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>         xfer_num = round_up(len, 4) / 4;
>
>         if (src_40bit_addr)
> -               addr40_offset(nfp_prog, meta->insn.src_reg, off, &src_base,
> +               addr40_offset(nfp_prog, meta->insn.src_reg * 2, off, &src_base,
>                               &off);

Did this break other cases before this patch?

I am sorry if this is a dumb question. I don't think I fully
understand addr40_offset().

Song

>
>         /* Setup PREV_ALU fields to override memory read length. */
> @@ -3299,7 +3299,8 @@ curr_pair_is_memcpy(struct nfp_insn_meta *ld_meta,
>         if (!is_mbpf_load(ld_meta) || !is_mbpf_store(st_meta))
>                 return false;
>
> -       if (ld_meta->ptr.type != PTR_TO_PACKET)
> +       if (ld_meta->ptr.type != PTR_TO_PACKET &&
> +           ld_meta->ptr.type != PTR_TO_MAP_VALUE)
>                 return false;
>
>         if (st_meta->ptr.type != PTR_TO_PACKET)
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
  2018-06-25  3:54 ` [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned Jakub Kicinski
@ 2018-06-26  6:21   ` Song Liu
  2018-06-26 20:52     ` Jakub Kicinski
  0 siblings, 1 reply; 22+ messages in thread
From: Song Liu @ 2018-06-26  6:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking, Jiong Wang

On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> From: Jiong Wang <jiong.wang@netronome.com>
>
> The new added "reciprocal_value_adv" implements the advanced version of the
> algorithm described in Figure 4.2 of the paper except when dividend has MSB
> set which would require u128 divide on host and actually could be easily
> handled before calling the new "reciprocal_value_adv".
>
> The advanced version requires more complex calculation to get the
> reciprocal multiplier and other control variables, but then could reduce
> the required emulation operations.
>
> It makes no sense to use this advanced version for host divide emulation,
> those extra complexities for calculating multiplier etc could completely
> waive our saving on emulation operations.
>
> However, it makes sense to use it for JIT divide code generation (for
> example eBPF JIT backends) for which we are willing to trade performance of
> JITed code with that of host. As shown by the following pseudo code, the
> required emulation operations could go down from 6 (the basic version) to 3
> or 4.
>
> To use the result of "reciprocal_value_adv", suppose we want to calculate
> n/d, the C-style pseudo code will be the following, it could be easily
> changed to real code generation for other JIT targets.
>
>   struct reciprocal_value_adv rvalue;
>   u8 pre_shift, exp;
>
>   if (d >= (1u << 31)) {
>     result = n >= d;
>     return;
>   }
>   rvalue = reciprocal_value_adv(d, 32)
>   exp = rvalue.exp;
>   if (rvalue.is_wide_m && !(d & 1)) {
>     pre_shift = fls(d & -d) - 1;
>     rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
>   } else {
>     pre_shift = 0;
>   }
>
>   // code generation starts.
>   if (imm == 1 << exp) {
>     result = n >> exp;
>   } else if (rvalue.is_wide_m) {
>     // pre_shift must be zero when reached here.
>     t = (n * rvalue.m) >> 32;
>     result = n - t;
>     result >>= 1;
>     result += t;
>     result >>= rvalue.sh - 1;
>   } else {
>     if (pre_shift)
>       result = n >> pre_shift;
>     result = ((u64)result * rvalue.m) >> 32;
>     result >>= rvalue.sh;
>   }
>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
>  include/linux/reciprocal_div.h | 65 ++++++++++++++++++++++++++++++++++
>  lib/reciprocal_div.c           | 37 +++++++++++++++++++
>  2 files changed, 102 insertions(+)
>
> diff --git a/include/linux/reciprocal_div.h b/include/linux/reciprocal_div.h
> index e031e9f2f9d8..5a695e4697d3 100644
> --- a/include/linux/reciprocal_div.h
> +++ b/include/linux/reciprocal_div.h
> @@ -25,6 +25,9 @@ struct reciprocal_value {
>         u8 sh1, sh2;
>  };
>
> +/* "reciprocal_value" and "reciprocal_divide" together implement the basic
> + * version of the algorithm described in Figure 4.1 of the paper.
> + */
>  struct reciprocal_value reciprocal_value(u32 d);
>
>  static inline u32 reciprocal_divide(u32 a, struct reciprocal_value R)
> @@ -33,4 +36,66 @@ static inline u32 reciprocal_divide(u32 a, struct reciprocal_value R)
>         return (t + ((a - t) >> R.sh1)) >> R.sh2;
>  }
>
> +struct reciprocal_value_adv {
> +       u32 m;
> +       u8 sh, exp;
> +       bool is_wide_m;
> +};
> +
> +/* "reciprocal_value_adv" implements the advanced version of the algorithm
> + * described in Figure 4.2 of the paper except when dividend has MSB set which
> + * would require u128 divide on host and actually could be easily handled before
> + * calling "reciprocal_value_adv".
> + *
> + * The advanced version requires more complex calculation to get the reciprocal
> + * multiplier and other control variables, but then could reduce the required
> + * emulation operations.
> + *
> + * It makes no sense to use this advanced version for host divide emulation,
> + * those extra complexities for calculating multiplier etc could completely
> + * waive our saving on emulation operations.
> + *
> + * However, it makes sense to use it for JIT divide code generation for which
> + * we are willing to trade performance of JITed code with that of host. As shown
> + * by the following pseudo code, the required emulation operations could go down
> + * from 6 (the basic version) to 3 or 4.
> + *
> + * To use the result of "reciprocal_value_adv", suppose we want to calculate
> + * n/d:
> + *
> + *   struct reciprocal_value_adv rvalue;
> + *   u8 pre_shift, exp;
> + *
> + *   if (d >= (1u << 31)) {
> + *     result = n >= d;
> + *     return;
> + *   }
> + *   rvalue = reciprocal_value_adv(d, 32)
> + *   exp = rvalue.exp;
> + *   if (rvalue.is_wide_m && !(d & 1)) {
> + *     pre_shift = fls(d & -d) - 1;
> + *     rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
> + *   } else {
> + *     pre_shift = 0;
> + *   }
> + *
> + *   // code generation starts.
> + *   if (imm == 1 << exp) {
> + *     result = n >> exp;
> + *   } else if (rvalue.is_wide_m) {
> + *     // pre_shift must be zero when reached here.
> + *     t = (n * rvalue.m) >> 32;
> + *     result = n - t;
> + *     result >>= 1;
> + *     result += t;
> + *     result >>= rvalue.sh - 1;
> + *   } else {
> + *     if (pre_shift)
> + *       result = n >> pre_shift;
> + *     result = ((u64)result * rvalue.m) >> 32;
> + *     result >>= rvalue.sh;
> + *   }
> + */
> +struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec);
> +
>  #endif /* _LINUX_RECIPROCAL_DIV_H */
> diff --git a/lib/reciprocal_div.c b/lib/reciprocal_div.c
> index fcb4ce682c6f..a41501ebad7c 100644
> --- a/lib/reciprocal_div.c
> +++ b/lib/reciprocal_div.c
> @@ -26,3 +26,40 @@ struct reciprocal_value reciprocal_value(u32 d)
>         return R;
>  }
>  EXPORT_SYMBOL(reciprocal_value);
> +
> +struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec)
> +{
> +       struct reciprocal_value_adv R;
> +       u32 l, post_shift;
> +       u64 mhigh, mlow;
> +
> +       l = fls(d - 1);
> +       post_shift = l;
> +       /* NOTE: mlow/mhigh could overflow u64 when l == 32 which means d has
> +        * MSB set. This case needs to be handled before calling
> +        * "reciprocal_value_adv", please see the comment at
> +        * include/linux/reciprocal_div.h.
> +        */

Shall we handle the l == 32 case better? I guess the concern here is that
the extra handling may slow down the fast path? If that's the case, we
should at least add a WARNING on the slow path.

Thanks,
Song


> +       mlow = 1ULL << (32 + l);
> +       do_div(mlow, d);
> +       mhigh = (1ULL << (32 + l)) + (1ULL << (32 + l - prec));
> +       do_div(mhigh, d);
> +
> +       for (; post_shift > 0; post_shift--) {
> +               u64 lo = mlow >> 1, hi = mhigh >> 1;
> +
> +               if (lo >= hi)
> +                       break;
> +
> +               mlow = lo;
> +               mhigh = hi;
> +       }
> +
> +       R.m = (u32)mhigh;
> +       R.sh = post_shift;
> +       R.exp = l;
> +       R.is_wide_m = mhigh > U32_MAX;
> +
> +       return R;
> +}
> +EXPORT_SYMBOL(reciprocal_value_adv);
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 3/7] nfp: bpf: rename umin/umax to umin_src/umax_src
  2018-06-25  3:54 ` [PATCH bpf-next 3/7] nfp: bpf: rename umin/umax to umin_src/umax_src Jakub Kicinski
@ 2018-06-26  6:21   ` Song Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Song Liu @ 2018-06-26  6:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking, Jiong Wang

On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> From: Jiong Wang <jiong.wang@netronome.com>
>
> The two fields are a copy of umin and umax info of bpf_insn->src_reg
> generated by verifier.
>
> Rename to make their meaning clear.
>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  drivers/net/ethernet/netronome/nfp/bpf/jit.c      | 12 ++++++------
>  drivers/net/ethernet/netronome/nfp/bpf/main.h     | 10 +++++-----
>  drivers/net/ethernet/netronome/nfp/bpf/offload.c  |  2 +-
>  drivers/net/ethernet/netronome/nfp/bpf/verifier.c |  4 ++--
>  4 files changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> index 33111739b210..4a629e9b5c0f 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> @@ -1772,8 +1772,8 @@ static int shl_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>         u8 dst, src;
>
>         dst = insn->dst_reg * 2;
> -       umin = meta->umin;
> -       umax = meta->umax;
> +       umin = meta->umin_src;
> +       umax = meta->umax_src;
>         if (umin == umax)
>                 return __shl_imm64(nfp_prog, dst, umin);
>
> @@ -1881,8 +1881,8 @@ static int shr_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>         u8 dst, src;
>
>         dst = insn->dst_reg * 2;
> -       umin = meta->umin;
> -       umax = meta->umax;
> +       umin = meta->umin_src;
> +       umax = meta->umax_src;
>         if (umin == umax)
>                 return __shr_imm64(nfp_prog, dst, umin);
>
> @@ -1995,8 +1995,8 @@ static int ashr_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>         u8 dst, src;
>
>         dst = insn->dst_reg * 2;
> -       umin = meta->umin;
> -       umax = meta->umax;
> +       umin = meta->umin_src;
> +       umax = meta->umax_src;
>         if (umin == umax)
>                 return __ashr_imm64(nfp_prog, dst, umin);
>
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
> index 654fe7823e5e..5975a19c28cb 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
> @@ -263,8 +263,8 @@ struct nfp_bpf_reg_state {
>   * @func_id: function id for call instructions
>   * @arg1: arg1 for call instructions
>   * @arg2: arg2 for call instructions
> - * @umin: copy of core verifier umin_value.
> - * @umax: copy of core verifier umax_value.
> + * @umin_src: copy of core verifier umin_value for src operand.
> + * @umax_src: copy of core verifier umax_value for src operand.
>   * @off: index of first generated machine instruction (in nfp_prog.prog)
>   * @n: eBPF instruction number
>   * @flags: eBPF instruction extra optimization flags
> @@ -301,11 +301,11 @@ struct nfp_insn_meta {
>                         struct nfp_bpf_reg_state arg2;
>                 };
>                 /* We are interested in range info for some operands,
> -                * for example, the shift amount.
> +                * for example, the shift amount which is kept in src operand.
>                  */
>                 struct {
> -                       u64 umin;
> -                       u64 umax;
> +                       u64 umin_src;
> +                       u64 umax_src;
>                 };
>         };
>         unsigned int off;
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
> index 7eae4c0266f8..856a0003bb75 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
> @@ -191,7 +191,7 @@ nfp_prog_prepare(struct nfp_prog *nfp_prog, const struct bpf_insn *prog,
>                 meta->insn = prog[i];
>                 meta->n = i;
>                 if (is_mbpf_indir_shift(meta))
> -                       meta->umin = U64_MAX;
> +                       meta->umin_src = U64_MAX;
>
>                 list_add_tail(&meta->l, &nfp_prog->insns);
>         }
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
> index 4bfeba7b21b2..e862b739441f 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
> @@ -555,8 +555,8 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
>                 const struct bpf_reg_state *sreg =
>                         cur_regs(env) + meta->insn.src_reg;
>
> -               meta->umin = min(meta->umin, sreg->umin_value);
> -               meta->umax = max(meta->umax, sreg->umax_value);
> +               meta->umin_src = min(meta->umin_src, sreg->umin_value);
> +               meta->umax_src = max(meta->umax_src, sreg->umax_value);
>         }
>
>         return 0;
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 4/7] nfp: bpf: copy range info for all operands of all ALU operations
  2018-06-25  3:54 ` [PATCH bpf-next 4/7] nfp: bpf: copy range info for all operands of all ALU operations Jakub Kicinski
@ 2018-06-26  6:50   ` Song Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Song Liu @ 2018-06-26  6:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking, Jiong Wang

On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> From: Jiong Wang <jiong.wang@netronome.com>
>
> The NFP verifier hook is copying range information of the shift amount for
> indirect shift operations so that optimized shift sequences can be generated.
>
> We want to use range info to do more things. For example, to decide whether
> multiplication and divide are supported on the given range.
>
> This patch simply lets the NFP verifier hook copy range info for all
> operands of all ALU operations.
>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  drivers/net/ethernet/netronome/nfp/bpf/main.h | 33 +++++++------------
>  .../net/ethernet/netronome/nfp/bpf/offload.c  |  4 ++-
>  .../net/ethernet/netronome/nfp/bpf/verifier.c |  6 +++-
>  3 files changed, 20 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
> index 5975a19c28cb..c985d0ac61a3 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
> @@ -265,6 +265,8 @@ struct nfp_bpf_reg_state {
>   * @arg2: arg2 for call instructions
>   * @umin_src: copy of core verifier umin_value for src opearnd.
>   * @umax_src: copy of core verifier umax_value for src operand.
> + * @umin_dst: copy of core verifier umin_value for dst opearnd.
> + * @umax_dst: copy of core verifier umax_value for dst operand.
>   * @off: index of first generated machine instruction (in nfp_prog.prog)
>   * @n: eBPF instruction number
>   * @flags: eBPF instruction extra optimization flags
> @@ -300,12 +302,15 @@ struct nfp_insn_meta {
>                         struct bpf_reg_state arg1;
>                         struct nfp_bpf_reg_state arg2;
>                 };
> -               /* We are interested in range info for some operands,
> -                * for example, the shift amount which is kept in src operand.
> +               /* We are interested in range info for operands of ALU
> +                * operations. For example, shift amount, multiplicand and
> +                * multiplier etc.
>                  */
>                 struct {
>                         u64 umin_src;
>                         u64 umax_src;
> +                       u64 umin_dst;
> +                       u64 umax_dst;
>                 };
>         };
>         unsigned int off;
> @@ -339,6 +344,11 @@ static inline u8 mbpf_mode(const struct nfp_insn_meta *meta)
>         return BPF_MODE(meta->insn.code);
>  }
>
> +static inline bool is_mbpf_alu(const struct nfp_insn_meta *meta)
> +{
> +       return mbpf_class(meta) == BPF_ALU64 || mbpf_class(meta) == BPF_ALU;
> +}
> +
>  static inline bool is_mbpf_load(const struct nfp_insn_meta *meta)
>  {
>         return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_LDX | BPF_MEM);
> @@ -384,25 +394,6 @@ static inline bool is_mbpf_xadd(const struct nfp_insn_meta *meta)
>         return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_STX | BPF_XADD);
>  }
>
> -static inline bool is_mbpf_indir_shift(const struct nfp_insn_meta *meta)
> -{
> -       u8 code = meta->insn.code;
> -       bool is_alu, is_shift;
> -       u8 opclass, opcode;
> -
> -       opclass = BPF_CLASS(code);
> -       is_alu = opclass == BPF_ALU64 || opclass == BPF_ALU;
> -       if (!is_alu)
> -               return false;
> -
> -       opcode = BPF_OP(code);
> -       is_shift = opcode == BPF_LSH || opcode == BPF_RSH || opcode == BPF_ARSH;
> -       if (!is_shift)
> -               return false;
> -
> -       return BPF_SRC(code) == BPF_X;
> -}
> -
>  /**
>   * struct nfp_prog - nfp BPF program
>   * @bpf: backpointer to the bpf app priv structure
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
> index 856a0003bb75..78f44c4d95b4 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
> @@ -190,8 +190,10 @@ nfp_prog_prepare(struct nfp_prog *nfp_prog, const struct bpf_insn *prog,
>
>                 meta->insn = prog[i];
>                 meta->n = i;
> -               if (is_mbpf_indir_shift(meta))
> +               if (is_mbpf_alu(meta)) {
>                         meta->umin_src = U64_MAX;
> +                       meta->umin_dst = U64_MAX;
> +               }
>
>                 list_add_tail(&meta->l, &nfp_prog->insns);
>         }
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
> index e862b739441f..7bd9666bd8ff 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
> @@ -551,12 +551,16 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
>         if (is_mbpf_xadd(meta))
>                 return nfp_bpf_check_xadd(nfp_prog, meta, env);
>
> -       if (is_mbpf_indir_shift(meta)) {
> +       if (is_mbpf_alu(meta)) {
>                 const struct bpf_reg_state *sreg =
>                         cur_regs(env) + meta->insn.src_reg;
> +               const struct bpf_reg_state *dreg =
> +                       cur_regs(env) + meta->insn.dst_reg;
>
>                 meta->umin_src = min(meta->umin_src, sreg->umin_value);
>                 meta->umax_src = max(meta->umax_src, sreg->umax_value);
> +               meta->umin_dst = min(meta->umin_dst, dreg->umin_value);
> +               meta->umax_dst = max(meta->umax_dst, dreg->umax_value);
>         }
>
>         return 0;
> --
> 2.17.1
>

* Re: [PATCH bpf-next 1/7] nfp: bpf: allow source ptr type be map ptr in memcpy optimization
  2018-06-26  5:50   ` Song Liu
@ 2018-06-26  7:08     ` Jakub Kicinski
  2018-06-26 16:26       ` Song Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2018-06-26  7:08 UTC (permalink / raw)
  To: Song Liu
  Cc: Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking, Jiong Wang

On Mon, Jun 25, 2018 at 10:50 PM, Song Liu <liu.song.a23@gmail.com> wrote:
> On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
>> From: Jiong Wang <jiong.wang@netronome.com>
>>
>> Map read has been supported on NFP; this patch enables the optimization
>> for memcpy from map to packet.
>>
>> This patch also fixes a latent bug which would cause copying from an
>> unexpected address once memcpy for map pointers is enabled.
>>
>> Reported-by: Mary Pham <mary.pham@netronome.com>
>> Reported-by: David Beckett <david.beckett@netronome.com>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> ---
>>  drivers/net/ethernet/netronome/nfp/bpf/jit.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
>> index 8a92088df0d7..33111739b210 100644
>> --- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
>> +++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
>> @@ -670,7 +670,7 @@ static int nfp_cpp_memcpy(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>>         xfer_num = round_up(len, 4) / 4;
>>
>>         if (src_40bit_addr)
>> -               addr40_offset(nfp_prog, meta->insn.src_reg, off, &src_base,
>> +               addr40_offset(nfp_prog, meta->insn.src_reg * 2, off, &src_base,
>>                               &off);
>
> Did this break other cases before this patch?
>
> I am sorry if this is a dumb question. I don't think I fully
> understand addr40_offset().

Only map memory uses 40 bit addressing right now, so the if was pretty
much dead code before the patch.

The memcpy optimization was left out of the initial map support due to
insufficient test coverage, I should have probably left more of the 40
bit addressing code out back then.

* Re: [PATCH bpf-next 1/7] nfp: bpf: allow source ptr type be map ptr in memcpy optimization
  2018-06-26  7:08     ` Jakub Kicinski
@ 2018-06-26 16:26       ` Song Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Song Liu @ 2018-06-26 16:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking, Jiong Wang

On Tue, Jun 26, 2018 at 12:08 AM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> On Mon, Jun 25, 2018 at 10:50 PM, Song Liu <liu.song.a23@gmail.com> wrote:
>> On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
>> <jakub.kicinski@netronome.com> wrote:
>>> From: Jiong Wang <jiong.wang@netronome.com>
>>>
>>> Map read has been supported on NFP; this patch enables the optimization
>>> for memcpy from map to packet.
>>>
>>> This patch also fixes a latent bug which would cause copying from an
>>> unexpected address once memcpy for map pointers is enabled.
>>>
>>> Reported-by: Mary Pham <mary.pham@netronome.com>
>>> Reported-by: David Beckett <david.beckett@netronome.com>
>>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>> ---
>>>  drivers/net/ethernet/netronome/nfp/bpf/jit.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
>>> index 8a92088df0d7..33111739b210 100644
>>> --- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
>>> +++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
>>> @@ -670,7 +670,7 @@ static int nfp_cpp_memcpy(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>>>         xfer_num = round_up(len, 4) / 4;
>>>
>>>         if (src_40bit_addr)
>>> -               addr40_offset(nfp_prog, meta->insn.src_reg, off, &src_base,
>>> +               addr40_offset(nfp_prog, meta->insn.src_reg * 2, off, &src_base,
>>>                               &off);
>>
>> Did this break other cases before this patch?
>>
>> I am sorry if this is a dumb question. I don't think I fully
>> understand addr40_offset().
>
> Only map memory uses 40 bit addressing right now, so the if was pretty
> much dead code before the patch.
>
> The memcpy optimization was left out of the initial map support due to
> insufficient test coverage, I should have probably left more of the 40
> bit addressing code out back then.

Thanks for the explanation!

Acked-by: Song Liu <songliubraving@fb.com>

* Re: [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
  2018-06-26  6:21   ` Song Liu
@ 2018-06-26 20:52     ` Jakub Kicinski
  2018-06-27  8:54       ` Daniel Borkmann
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2018-06-26 20:52 UTC (permalink / raw)
  To: Song Liu
  Cc: Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking, Jiong Wang

On Mon, 25 Jun 2018 23:21:10 -0700, Song Liu wrote:
> > +struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec)
> > +{
> > +       struct reciprocal_value_adv R;
> > +       u32 l, post_shift;
> > +       u64 mhigh, mlow;
> > +
> > +       l = fls(d - 1);
> > +       post_shift = l;
> > +       /* NOTE: mlow/mhigh could overflow u64 when l == 32 which means d has
> > +        * MSB set. This case needs to be handled before calling
> > +        * "reciprocal_value_adv", please see the comment at
> > +        * include/linux/reciprocal_div.h.
> > +        */  
> 
> Shall we handle the l == 32 case better? I guess the concern here is that
> extra handling may slow down the fast path? If that's the case, we should
> at least add a WARNING on the slow path.

Agreed, I think Jiong is travelling, hence no response.  We'll respin.

* Re: [PATCH bpf-next 7/7] nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h
  2018-06-25  3:54 ` [PATCH bpf-next 7/7] nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h Jakub Kicinski
@ 2018-06-26 20:59   ` Jakub Kicinski
  2018-07-05 18:28     ` Jiong Wang
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2018-06-26 20:59 UTC (permalink / raw)
  To: Jiong Wang; +Cc: alexei.starovoitov, daniel, oss-drivers, netdev

On Sun, 24 Jun 2018 20:54:21 -0700, Jakub Kicinski wrote:
> +	 * NOTE: because we are using "reciprocal_value_adv" which doesn't
> +	 * support dividend with MSB set, so we need to JIT separate NFP
> +	 * sequence to handle such case. It could be a simple sequence if there
> +	 * is conditional move, however there isn't for NFP. So, we don't bother
> +	 * generating compare-if-set-branch sequence by rejecting the program
> +	 * straight away when the u32 dividend has MSB set. Divide by such a
> +	 * large constant would be rare in practice. Also, the programmer could
> +	 * simply rewrite it as "result = divisor >= the_const".

Thinking about this again, can we just use carry bit?  The code may end
up shorter than the explanation why we don't support that case :P

immed[c, 0]
alu[--, a, -, b]
alu[c, c, +carry, 0]

Should be equivalent to:

c = a >= b

(Thanks to Edwin for double-checking the carry semantics.)

* Re: [PATCH bpf-next 5/7] nfp: bpf: support u16 and u32 multiplications
  2018-06-25  3:54 ` [PATCH bpf-next 5/7] nfp: bpf: support u16 and u32 multiplications Jakub Kicinski
@ 2018-06-26 22:23   ` Song Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Song Liu @ 2018-06-26 22:23 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking, Jiong Wang

On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> From: Jiong Wang <jiong.wang@netronome.com>
>
> NFP supports u16 and u32 multiplication. Multiplication is done 8 bits per
> step, therefore we need 2 steps for u16 and 4 steps for u32.
>
> We also need one start instruction to initialize the sequence and one or
> two instructions to fetch the result, depending on whether you need the
> high half of the u32 multiplication.
>
> For ALU64, if either operand is beyond u32's value range, we reject it. One
> thing to note: if the source operand is BPF_K, then we need to check the
> "imm" field directly, and we'd reject it if it is negative, because for
> ALU64 "imm" (with s32 type) is expected to be sign extended to s64, which
> NFP mul doesn't support. For ALU32, it is fine for "imm" to be negative,
> because the result is 32 bits and there is no difference in the low half
> of the result for signed/unsigned mul, so we will get the correct result.
>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 137 ++++++++++++++++++
>  drivers/net/ethernet/netronome/nfp/bpf/main.h |   5 +
>  .../net/ethernet/netronome/nfp/bpf/verifier.c |  58 ++++++--
>  drivers/net/ethernet/netronome/nfp/nfp_asm.h  |  28 ++++
>  4 files changed, 217 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> index 4a629e9b5c0f..7d7061d93358 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> @@ -415,6 +415,60 @@ emit_alu(struct nfp_prog *nfp_prog, swreg dst,
>                    reg.dst_lmextn, reg.src_lmextn);
>  }
>
> +static void
> +__emit_mul(struct nfp_prog *nfp_prog, enum alu_dst_ab dst_ab, u16 areg,
> +          enum mul_type type, enum mul_step step, u16 breg, bool swap,
> +          bool wr_both, bool dst_lmextn, bool src_lmextn)
> +{
> +       u64 insn;
> +
> +       insn = OP_MUL_BASE |
> +               FIELD_PREP(OP_MUL_A_SRC, areg) |
> +               FIELD_PREP(OP_MUL_B_SRC, breg) |
> +               FIELD_PREP(OP_MUL_STEP, step) |
> +               FIELD_PREP(OP_MUL_DST_AB, dst_ab) |
> +               FIELD_PREP(OP_MUL_SW, swap) |
> +               FIELD_PREP(OP_MUL_TYPE, type) |
> +               FIELD_PREP(OP_MUL_WR_AB, wr_both) |
> +               FIELD_PREP(OP_MUL_SRC_LMEXTN, src_lmextn) |
> +               FIELD_PREP(OP_MUL_DST_LMEXTN, dst_lmextn);
> +
> +       nfp_prog_push(nfp_prog, insn);
> +}
> +
> +static void
> +emit_mul(struct nfp_prog *nfp_prog, swreg lreg, enum mul_type type,
> +        enum mul_step step, swreg rreg)
> +{
> +       struct nfp_insn_ur_regs reg;
> +       u16 areg;
> +       int err;
> +
> +       if (type == MUL_TYPE_START && step != MUL_STEP_NONE) {
> +               nfp_prog->error = -EINVAL;
> +               return;
> +       }
> +
> +       if (step == MUL_LAST || step == MUL_LAST_2) {
> +               /* When type is step and step Number is LAST or LAST2, left
> +                * source is used as destination.
> +                */
> +               err = swreg_to_unrestricted(lreg, reg_none(), rreg, &reg);
> +               areg = reg.dst;
> +       } else {
> +               err = swreg_to_unrestricted(reg_none(), lreg, rreg, &reg);
> +               areg = reg.areg;
> +       }
> +
> +       if (err) {
> +               nfp_prog->error = err;
> +               return;
> +       }
> +
> +       __emit_mul(nfp_prog, reg.dst_ab, areg, type, step, reg.breg, reg.swap,
> +                  reg.wr_both, reg.dst_lmextn, reg.src_lmextn);
> +}
> +
>  static void
>  __emit_ld_field(struct nfp_prog *nfp_prog, enum shf_sc sc,
>                 u8 areg, u8 bmask, u8 breg, u8 shift, bool imm8,
> @@ -1380,6 +1434,65 @@ static void wrp_end32(struct nfp_prog *nfp_prog, swreg reg_in, u8 gpr_out)
>                       SHF_SC_R_ROT, 16);
>  }
>
> +static void
> +wrp_mul_u32(struct nfp_prog *nfp_prog, swreg dst_hi, swreg dst_lo, swreg lreg,
> +           swreg rreg, bool gen_high_half)
> +{
> +       emit_mul(nfp_prog, lreg, MUL_TYPE_START, MUL_STEP_NONE, rreg);
> +       emit_mul(nfp_prog, lreg, MUL_TYPE_STEP_32x32, MUL_STEP_1, rreg);
> +       emit_mul(nfp_prog, lreg, MUL_TYPE_STEP_32x32, MUL_STEP_2, rreg);
> +       emit_mul(nfp_prog, lreg, MUL_TYPE_STEP_32x32, MUL_STEP_3, rreg);
> +       emit_mul(nfp_prog, lreg, MUL_TYPE_STEP_32x32, MUL_STEP_4, rreg);
> +       emit_mul(nfp_prog, dst_lo, MUL_TYPE_STEP_32x32, MUL_LAST, reg_none());
> +       if (gen_high_half)
> +               emit_mul(nfp_prog, dst_hi, MUL_TYPE_STEP_32x32, MUL_LAST_2,
> +                        reg_none());
> +       else
> +               wrp_immed(nfp_prog, dst_hi, 0);
> +}
> +
> +static void
> +wrp_mul_u16(struct nfp_prog *nfp_prog, swreg dst_hi, swreg dst_lo, swreg lreg,
> +           swreg rreg)
> +{
> +       emit_mul(nfp_prog, lreg, MUL_TYPE_START, MUL_STEP_NONE, rreg);
> +       emit_mul(nfp_prog, lreg, MUL_TYPE_STEP_16x16, MUL_STEP_1, rreg);
> +       emit_mul(nfp_prog, lreg, MUL_TYPE_STEP_16x16, MUL_STEP_2, rreg);
> +       emit_mul(nfp_prog, dst_lo, MUL_TYPE_STEP_16x16, MUL_LAST, reg_none());
> +}
> +
> +static int
> +wrp_mul(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
> +       bool gen_high_half, bool ropnd_from_reg)
> +{
> +       swreg multiplier, multiplicand, dst_hi, dst_lo;
> +       const struct bpf_insn *insn = &meta->insn;
> +       u32 lopnd_max, ropnd_max;
> +       u8 dst_reg;
> +
> +       dst_reg = insn->dst_reg;
> +       multiplicand = reg_a(dst_reg * 2);
> +       dst_hi = reg_both(dst_reg * 2 + 1);
> +       dst_lo = reg_both(dst_reg * 2);
> +       lopnd_max = meta->umax_dst;
> +       if (ropnd_from_reg) {
> +               multiplier = reg_b(insn->src_reg * 2);
> +               ropnd_max = meta->umax_src;
> +       } else {
> +               u32 imm = insn->imm;
> +
> +               multiplier = re_load_imm_any(nfp_prog, imm, imm_b(nfp_prog));
> +               ropnd_max = imm;
> +       }
> +       if (lopnd_max > U16_MAX || ropnd_max > U16_MAX)
> +               wrp_mul_u32(nfp_prog, dst_hi, dst_lo, multiplicand, multiplier,
> +                           gen_high_half);
> +       else
> +               wrp_mul_u16(nfp_prog, dst_hi, dst_lo, multiplicand, multiplier);
> +
> +       return 0;
> +}
> +
>  static int adjust_head(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>  {
>         swreg tmp = imm_a(nfp_prog), tmp_len = imm_b(nfp_prog);
> @@ -1684,6 +1797,16 @@ static int sub_imm64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>         return 0;
>  }
>
> +static int mul_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
> +{
> +       return wrp_mul(nfp_prog, meta, true, true);
> +}
> +
> +static int mul_imm64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
> +{
> +       return wrp_mul(nfp_prog, meta, true, false);
> +}
> +
>  static int neg_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>  {
>         const struct bpf_insn *insn = &meta->insn;
> @@ -2097,6 +2220,16 @@ static int sub_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>         return wrp_alu32_imm(nfp_prog, meta, ALU_OP_SUB, !meta->insn.imm);
>  }
>
> +static int mul_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
> +{
> +       return wrp_mul(nfp_prog, meta, false, true);
> +}
> +
> +static int mul_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
> +{
> +       return wrp_mul(nfp_prog, meta, false, false);
> +}
> +
>  static int neg_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>  {
>         u8 dst = meta->insn.dst_reg * 2;
> @@ -2848,6 +2981,8 @@ static const instr_cb_t instr_cb[256] = {
>         [BPF_ALU64 | BPF_ADD | BPF_K] = add_imm64,
>         [BPF_ALU64 | BPF_SUB | BPF_X] = sub_reg64,
>         [BPF_ALU64 | BPF_SUB | BPF_K] = sub_imm64,
> +       [BPF_ALU64 | BPF_MUL | BPF_X] = mul_reg64,
> +       [BPF_ALU64 | BPF_MUL | BPF_K] = mul_imm64,
>         [BPF_ALU64 | BPF_NEG] =         neg_reg64,
>         [BPF_ALU64 | BPF_LSH | BPF_X] = shl_reg64,
>         [BPF_ALU64 | BPF_LSH | BPF_K] = shl_imm64,
> @@ -2867,6 +3002,8 @@ static const instr_cb_t instr_cb[256] = {
>         [BPF_ALU | BPF_ADD | BPF_K] =   add_imm,
>         [BPF_ALU | BPF_SUB | BPF_X] =   sub_reg,
>         [BPF_ALU | BPF_SUB | BPF_K] =   sub_imm,
> +       [BPF_ALU | BPF_MUL | BPF_X] =   mul_reg,
> +       [BPF_ALU | BPF_MUL | BPF_K] =   mul_imm,
>         [BPF_ALU | BPF_NEG] =           neg_reg,
>         [BPF_ALU | BPF_LSH | BPF_K] =   shl_imm,
>         [BPF_ALU | BPF_END | BPF_X] =   end_reg32,
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
> index c985d0ac61a3..c10079b1a312 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
> @@ -394,6 +394,11 @@ static inline bool is_mbpf_xadd(const struct nfp_insn_meta *meta)
>         return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_STX | BPF_XADD);
>  }
>
> +static inline bool is_mbpf_mul(const struct nfp_insn_meta *meta)
> +{
> +       return is_mbpf_alu(meta) && mbpf_op(meta) == BPF_MUL;
> +}
> +
>  /**
>   * struct nfp_prog - nfp BPF program
>   * @bpf: backpointer to the bpf app priv structure
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
> index 7bd9666bd8ff..30d4f1580693 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
> @@ -516,6 +516,51 @@ nfp_bpf_check_xadd(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
>         return nfp_bpf_check_ptr(nfp_prog, meta, env, meta->insn.dst_reg);
>  }
>
> +static int
> +nfp_bpf_check_alu(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
> +                 struct bpf_verifier_env *env)
> +{
> +       const struct bpf_reg_state *sreg =
> +               cur_regs(env) + meta->insn.src_reg;
> +       const struct bpf_reg_state *dreg =
> +               cur_regs(env) + meta->insn.dst_reg;
> +
> +       meta->umin_src = min(meta->umin_src, sreg->umin_value);
> +       meta->umax_src = max(meta->umax_src, sreg->umax_value);
> +       meta->umin_dst = min(meta->umin_dst, dreg->umin_value);
> +       meta->umax_dst = max(meta->umax_dst, dreg->umax_value);
> +
> +       /* NFP supports u16 and u32 multiplication.
> +        *
> +        * For ALU64, if either operand is beyond u32's value range, we reject
> +        * it. One thing to note, if the source operand is BPF_K, then we need
> +        * to check "imm" field directly, and we'd reject it if it is negative.
> +        * Because for ALU64, "imm" (with s32 type) is expected to be sign
> +        * extended to s64 which NFP mul doesn't support.
> +        *
> +        * For ALU32, it is fine for "imm" be negative though, because the
> +        * result is 32-bits and there is no difference on the low halve of
> +        * the result for signed/unsigned mul, so we will get correct result.
> +        */
> +       if (is_mbpf_mul(meta)) {
> +               if (meta->umax_dst > U32_MAX) {
> +                       pr_vlog(env, "multiplier is not within u32 value range\n");
> +                       return -EINVAL;
> +               }
> +               if (mbpf_src(meta) == BPF_X && meta->umax_src > U32_MAX) {
> +                       pr_vlog(env, "multiplicand is not within u32 value range\n");
> +                       return -EINVAL;
> +               }
> +               if (mbpf_class(meta) == BPF_ALU64 &&
> +                   mbpf_src(meta) == BPF_K && meta->insn.imm < 0) {
> +                       pr_vlog(env, "sign extended multiplicand won't be within u32 value range\n");
> +                       return -EINVAL;
> +               }
> +       }
> +
> +       return 0;
> +}
> +
>  static int
>  nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
>  {
> @@ -551,17 +596,8 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
>         if (is_mbpf_xadd(meta))
>                 return nfp_bpf_check_xadd(nfp_prog, meta, env);
>
> -       if (is_mbpf_alu(meta)) {
> -               const struct bpf_reg_state *sreg =
> -                       cur_regs(env) + meta->insn.src_reg;
> -               const struct bpf_reg_state *dreg =
> -                       cur_regs(env) + meta->insn.dst_reg;
> -
> -               meta->umin_src = min(meta->umin_src, sreg->umin_value);
> -               meta->umax_src = max(meta->umax_src, sreg->umax_value);
> -               meta->umin_dst = min(meta->umin_dst, dreg->umin_value);
> -               meta->umax_dst = max(meta->umax_dst, dreg->umax_value);
> -       }
> +       if (is_mbpf_alu(meta))
> +               return nfp_bpf_check_alu(nfp_prog, meta, env);
>
>         return 0;
>  }
> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_asm.h b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
> index f6677bc9875a..cdc4e065f6f5 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_asm.h
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
> @@ -426,4 +426,32 @@ static inline u32 nfp_get_ind_csr_ctx_ptr_offs(u32 read_offset)
>         return (read_offset & ~NFP_IND_ME_CTX_PTR_BASE_MASK) | NFP_CSR_CTX_PTR;
>  }
>
> +enum mul_type {
> +       MUL_TYPE_START          = 0x00,
> +       MUL_TYPE_STEP_24x8      = 0x01,
> +       MUL_TYPE_STEP_16x16     = 0x02,
> +       MUL_TYPE_STEP_32x32     = 0x03,
> +};
> +
> +enum mul_step {
> +       MUL_STEP_1              = 0x00,
> +       MUL_STEP_NONE           = MUL_STEP_1,
> +       MUL_STEP_2              = 0x01,
> +       MUL_STEP_3              = 0x02,
> +       MUL_STEP_4              = 0x03,
> +       MUL_LAST                = 0x04,
> +       MUL_LAST_2              = 0x05,
> +};
> +
> +#define OP_MUL_BASE            0x0f800000000ULL
> +#define OP_MUL_A_SRC           0x000000003ffULL
> +#define OP_MUL_B_SRC           0x000000ffc00ULL
> +#define OP_MUL_STEP            0x00000700000ULL
> +#define OP_MUL_DST_AB          0x00000800000ULL
> +#define OP_MUL_SW              0x00040000000ULL
> +#define OP_MUL_TYPE            0x00180000000ULL
> +#define OP_MUL_WR_AB           0x20000000000ULL
> +#define OP_MUL_SRC_LMEXTN      0x40000000000ULL
> +#define OP_MUL_DST_LMEXTN      0x80000000000ULL
> +
>  #endif
> --
> 2.17.1
>

* Re: [PATCH bpf-next 6/7] nfp: bpf: support u32 divide using reciprocal_div.h
  2018-06-25  3:54 ` [PATCH bpf-next 6/7] nfp: bpf: support u32 divide using reciprocal_div.h Jakub Kicinski
@ 2018-06-26 22:28   ` Song Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Song Liu @ 2018-06-26 22:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking, Jiong Wang

On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> From: Jiong Wang <jiong.wang@netronome.com>
>
> NFP doesn't have an integer divide instruction; this patch uses a reciprocal
> algorithm (the basic one, reciprocal_div) to emulate it.
>
> For each u32 divide, we would need 11 instructions to finish the operation.
>
>   7 (for multiplication) + 4 (various ALUs) = 11
>
> Given NFP only supports multiplication no bigger than u32, we'd require
> divisor and dividend no bigger than that as well.
>
> Also, eBPF doesn't support signed divide and enforces this at the C language
> level by failing compilation. However, the LLVM assembler doesn't enforce
> this, so it is possible for a negative constant to leak in as a BPF_K
> operand through assembly code; we reject such cases as well.
>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 58 ++++++++++++++++++-
>  drivers/net/ethernet/netronome/nfp/bpf/main.h |  5 ++
>  .../net/ethernet/netronome/nfp/bpf/verifier.c | 31 ++++++++++
>  3 files changed, 93 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> index 7d7061d93358..d732b6cfc356 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> @@ -34,10 +34,11 @@
>  #define pr_fmt(fmt)    "NFP net bpf: " fmt
>
>  #include <linux/bug.h>
> -#include <linux/kernel.h>
>  #include <linux/bpf.h>
>  #include <linux/filter.h>
> +#include <linux/kernel.h>
>  #include <linux/pkt_cls.h>
> +#include <linux/reciprocal_div.h>
>  #include <linux/unistd.h>
>
>  #include "main.h"
> @@ -1493,6 +1494,32 @@ wrp_mul(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
>         return 0;
>  }
>
> +static int wrp_div_imm(struct nfp_prog *nfp_prog, u8 dst, u64 imm)
> +{
> +       swreg tmp_both = imm_both(nfp_prog), dst_both = reg_both(dst);
> +       swreg dst_a = reg_a(dst), dst_b = reg_b(dst);
> +       struct reciprocal_value rvalue;
> +       swreg tmp_b = imm_b(nfp_prog);
> +       swreg magic;
> +
> +       if (imm > U32_MAX) {
> +               wrp_immed(nfp_prog, dst_both, 0);
> +               return 0;
> +       }
> +
> +       rvalue = reciprocal_value(imm);
> +       magic = re_load_imm_any(nfp_prog, rvalue.m, imm_b(nfp_prog));
> +       wrp_mul_u32(nfp_prog, tmp_both, tmp_both, dst_a, magic, true);
> +       emit_alu(nfp_prog, dst_both, dst_a, ALU_OP_SUB, tmp_b);
> +       emit_shf(nfp_prog, dst_both, reg_none(), SHF_OP_NONE, dst_b,
> +                SHF_SC_R_SHF, rvalue.sh1);
> +       emit_alu(nfp_prog, dst_both, dst_a, ALU_OP_ADD, tmp_b);
> +       emit_shf(nfp_prog, dst_both, reg_none(), SHF_OP_NONE, dst_b,
> +                SHF_SC_R_SHF, rvalue.sh2);
> +
> +       return 0;
> +}
> +
>  static int adjust_head(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>  {
>         swreg tmp = imm_a(nfp_prog), tmp_len = imm_b(nfp_prog);
> @@ -1807,6 +1834,21 @@ static int mul_imm64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>         return wrp_mul(nfp_prog, meta, true, false);
>  }
>
> +static int div_imm64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
> +{
> +       const struct bpf_insn *insn = &meta->insn;
> +
> +       return wrp_div_imm(nfp_prog, insn->dst_reg * 2, insn->imm);
> +}
> +
> +static int div_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
> +{
> +       /* NOTE: the verifier hook has already rejected cases where the source
> +        * operand is not known to be constant, so umin_src == umax_src here.
> +        */
> +       return wrp_div_imm(nfp_prog, meta->insn.dst_reg * 2, meta->umin_src);
> +}
> +
>  static int neg_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>  {
>         const struct bpf_insn *insn = &meta->insn;
> @@ -2230,6 +2272,16 @@ static int mul_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>         return wrp_mul(nfp_prog, meta, false, false);
>  }
>
> +static int div_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
> +{
> +       return div_reg64(nfp_prog, meta);
> +}
> +
> +static int div_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
> +{
> +       return div_imm64(nfp_prog, meta);
> +}
> +
>  static int neg_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>  {
>         u8 dst = meta->insn.dst_reg * 2;
> @@ -2983,6 +3035,8 @@ static const instr_cb_t instr_cb[256] = {
>         [BPF_ALU64 | BPF_SUB | BPF_K] = sub_imm64,
>         [BPF_ALU64 | BPF_MUL | BPF_X] = mul_reg64,
>         [BPF_ALU64 | BPF_MUL | BPF_K] = mul_imm64,
> +       [BPF_ALU64 | BPF_DIV | BPF_X] = div_reg64,
> +       [BPF_ALU64 | BPF_DIV | BPF_K] = div_imm64,
>         [BPF_ALU64 | BPF_NEG] =         neg_reg64,
>         [BPF_ALU64 | BPF_LSH | BPF_X] = shl_reg64,
>         [BPF_ALU64 | BPF_LSH | BPF_K] = shl_imm64,
> @@ -3004,6 +3058,8 @@ static const instr_cb_t instr_cb[256] = {
>         [BPF_ALU | BPF_SUB | BPF_K] =   sub_imm,
>         [BPF_ALU | BPF_MUL | BPF_X] =   mul_reg,
>         [BPF_ALU | BPF_MUL | BPF_K] =   mul_imm,
> +       [BPF_ALU | BPF_DIV | BPF_X] =   div_reg,
> +       [BPF_ALU | BPF_DIV | BPF_K] =   div_imm,
>         [BPF_ALU | BPF_NEG] =           neg_reg,
>         [BPF_ALU | BPF_LSH | BPF_K] =   shl_imm,
>         [BPF_ALU | BPF_END | BPF_X] =   end_reg32,
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
> index c10079b1a312..9845c1a2d4c2 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
> @@ -399,6 +399,11 @@ static inline bool is_mbpf_mul(const struct nfp_insn_meta *meta)
>         return is_mbpf_alu(meta) && mbpf_op(meta) == BPF_MUL;
>  }
>
> +static inline bool is_mbpf_div(const struct nfp_insn_meta *meta)
> +{
> +       return is_mbpf_alu(meta) && mbpf_op(meta) == BPF_DIV;
> +}
> +
>  /**
>   * struct nfp_prog - nfp BPF program
>   * @bpf: backpointer to the bpf app priv structure
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
> index 30d4f1580693..f0f07e988c46 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
> @@ -558,6 +558,37 @@ nfp_bpf_check_alu(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
>                 }
>         }
>
> +       /* NFP doesn't have divide instructions; we support divide by constant
> +        * through reciprocal multiplication. Given NFP supports multiplication
> +        * no wider than u32, we require divisor and dividend to be no wider
> +        * than that as well.
> +        *
> +        * Also, eBPF doesn't support signed divide and enforces this at the C
> +        * language level by failing compilation. However, the LLVM assembler
> +        * hasn't enforced this, so a negative constant can leak in as a BPF_K
> +        * operand through assembly code; we reject such cases as well.
> +        */
> +       if (is_mbpf_div(meta)) {
> +               if (meta->umax_dst > U32_MAX) {
> +                       pr_vlog(env, "divisor is not within u32 value range\n");
> +                       return -EINVAL;
> +               }
> +               if (mbpf_src(meta) == BPF_X) {
> +                       if (meta->umin_src != meta->umax_src) {
> +                               pr_vlog(env, "dividend is not constant\n");
> +                               return -EINVAL;
> +                       }
> +                       if (meta->umax_src > U32_MAX) {
> +                               pr_vlog(env, "dividend is not within u32 value range\n");
> +                               return -EINVAL;
> +                       }
> +               }
> +               if (mbpf_src(meta) == BPF_K && meta->insn.imm < 0) {
> +                       pr_vlog(env, "divide by negative constant is not supported\n");
> +                       return -EINVAL;
> +               }
> +       }
> +
>         return 0;
>  }
>
> --
> 2.17.1
>

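The mul/sub/shift/add/shift sequence emitted by wrp_div_imm() above follows the basic reciprocal_div scheme from include/linux/reciprocal_div.h. As a rough sketch of that scheme in plain Python (modelling u32 arithmetic for illustration — not kernel code, and the helper names mirror the kernel's only loosely):

```python
def reciprocal_value(d):
    # Precompute (m, sh1, sh2) for a u32 divisor d, mirroring the
    # basic reciprocal_value() in lib/reciprocal_div.c.
    l = (d - 1).bit_length()                       # fls(d - 1)
    m = ((1 << 32) * ((1 << l) - d)) // d + 1      # magic multiplier
    return m & 0xFFFFFFFF, min(l, 1), max(l - 1, 0)

def reciprocal_divide(a, d):
    # The 11-instruction NFP sequence boils down to:
    #   t = mul_hi(a, m)                    (7 insns: u32 multiply)
    #   a = ((a - t) >> sh1) + t; a >>= sh2 (4 ALU/shift insns)
    m, sh1, sh2 = reciprocal_value(d)
    t = (a * m) >> 32                       # high half of u32 x u32
    return (t + (((a - t) & 0xFFFFFFFF) >> sh1)) >> sh2
```

Spot-checking against Python's integer division over the full u32 range of both operands confirms the sequence, e.g. reciprocal_divide(100, 7) == 14.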

* Re: [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
  2018-06-26 20:52     ` Jakub Kicinski
@ 2018-06-27  8:54       ` Daniel Borkmann
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Borkmann @ 2018-06-27  8:54 UTC (permalink / raw)
  To: Jakub Kicinski, Song Liu
  Cc: Alexei Starovoitov, oss-drivers, Networking, Jiong Wang

On 06/26/2018 10:52 PM, Jakub Kicinski wrote:
> On Mon, 25 Jun 2018 23:21:10 -0700, Song Liu wrote:
>>> +struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec)
>>> +{
>>> +       struct reciprocal_value_adv R;
>>> +       u32 l, post_shift;
>>> +       u64 mhigh, mlow;
>>> +
>>> +       l = fls(d - 1);
>>> +       post_shift = l;
>>> +       /* NOTE: mlow/mhigh could overflow u64 when l == 32 which means d has
>>> +        * MSB set. This case needs to be handled before calling
>>> +        * "reciprocal_value_adv", please see the comment at
>>> +        * include/linux/reciprocal_div.h.
>>> +        */  
>>
>> Shall we handle the l == 32 case better? I guess the concern here is that
>> extra handling may slow down the fast path? If that's the case, we should
>> at least add a WARNING on the slow path.
> 
> Agreed, I think Jiong is travelling, hence no response.  We'll respin.

Ok, since there's going to be a respin, I've tossed the current series from
patchwork in that case.

Thanks,
Daniel

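The overflow the quoted comment warns about is in the intermediates, not in the final magic value: assuming mlow/mhigh are derived from 1ULL << (32 + l) as in the standard improved algorithm (the snippet above only shows l = fls(d - 1), so this is an inference), the intermediate needs 65 bits as soon as l == 32. A quick Python check, where arbitrary-precision integers let us inspect the width directly (the function name is ours):

```python
def adv_intermediate_bits(d):
    # l = fls(d - 1); the mlow/mhigh intermediates start from
    # 1 << (32 + l), so they need 32 + l + 1 bits -- i.e. 65 bits
    # (wider than u64) once l == 32, which happens when d > 2**31.
    l = (d - 1).bit_length()
    return (1 << (32 + l)).bit_length()
```

Note that d == 2**31 itself still has l == 31 and fits; only MSB-set values of d that are not a power of two push l to 32.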

* Re: [PATCH bpf-next 7/7] nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h
  2018-06-26 20:59   ` Jakub Kicinski
@ 2018-07-05 18:28     ` Jiong Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Jiong Wang @ 2018-07-05 18:28 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: alexei.starovoitov, daniel, oss-drivers, netdev

On 26/06/2018 21:59, Jakub Kicinski wrote:
> On Sun, 24 Jun 2018 20:54:21 -0700, Jakub Kicinski wrote:
>> +	 * NOTE: because we are using "reciprocal_value_adv", which doesn't
>> +	 * support a dividend with the MSB set, we need to JIT a separate NFP
>> +	 * sequence to handle such a case. It could be a simple sequence if
>> +	 * there were a conditional move, but NFP has none. So, rather than
>> +	 * generating a compare-then-set branch sequence, we reject the program
>> +	 * straight away when the u32 dividend has its MSB set. Dividing by such
>> +	 * a large constant would be rare in practice. Also, the programmer can
>> +	 * simply rewrite it as "result = divisor >= the_const".
> Thinking about this again, can we just use carry bit?

Good catch, yes we can.

> The code may end
> up shorter than the explanation why we don't support that case :P
>
> immed[c, 0]
> alu[--, a, -, b]
> alu[c, c, +carry, 0]

eBPF input will be "a = a / b". Given "immed" doesn't affect the carry
bit, I'd reorder the sequence so we only need one tmp register, to hold
"b", which is a constant.

   alu[--, a, -, b]
   immed[b, 0]
   alu[a, b, +carry, 0]
  
Thanks.
Regards,
Jiong

>
> Should be equivalent to:
>
> c = a >= b
>
> (Thanks to Edwin for double-checking the carry semantics.)

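The carry-bit trick above can be modelled in plain Python to convince ourselves of the semantics (a sketch; the masking emulates 32-bit registers, and the helper name is made up):

```python
MASK = 0xFFFFFFFF  # emulate u32 register width

def unsigned_ge_via_carry(a, b):
    # "alu[--, a, -, b]" computes a - b as a + ~b + 1; the carry out
    # of bit 32 is set exactly when no borrow occurs, i.e. a >= b
    # (unsigned).
    carry = ((a & MASK) + ((~b) & MASK) + 1) >> 32
    # "immed[b, 0]" then "alu[a, b, +carry, 0]" adds 0 + 0 + carry,
    # leaving the 0/1 comparison result in the destination.
    return carry
```

This matches the reordered three-instruction sequence: the subtract only sets flags, immed (which doesn't touch carry) zeroes the scratch register, and the final add materialises the carry as the result.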

* Re: [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
  2018-06-28 19:02 [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned Jiong Wang
@ 2018-06-28 21:05 ` Jakub Kicinski
  0 siblings, 0 replies; 22+ messages in thread
From: Jakub Kicinski @ 2018-06-28 21:05 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Song Liu, Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking

On Thu, 28 Jun 2018 20:02:43 +0100, Jiong Wang wrote:
> > If that's the case, we should at least add a WARNING on the slow path.  
> 
> OK, I will add a pr_warn inside "reciprocal_value_adv" when l == 32 is
> triggered.

WARN() seems useful, given that seeing l == 32 means the code calling this
function is buggy, and we want to see the backtrace to figure out how it
happened.


* Re: [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
@ 2018-06-28 19:02 Jiong Wang
  2018-06-28 21:05 ` Jakub Kicinski
  0 siblings, 1 reply; 22+ messages in thread
From: Jiong Wang @ 2018-06-28 19:02 UTC (permalink / raw)
  To: Song Liu
  Cc: Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, oss-drivers,
	Networking

On Tue, Jun 26, 2018 at 7:21 AM, Song Liu <liu.song.a23@gmail.com> wrote:
> On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
>> From: Jiong Wang <jiong.wang@netronome.com>

<snip>

>> +
>> +struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec)
>> +{
>> +       struct reciprocal_value_adv R;
>> +       u32 l, post_shift;
>> +       u64 mhigh, mlow;
>> +
>> +       l = fls(d - 1);
>> +       post_shift = l;
>> +       /* NOTE: mlow/mhigh could overflow u64 when l == 32 which means d has
>> +        * MSB set. This case needs to be handled before calling
>> +        * "reciprocal_value_adv", please see the comment at
>> +        * include/linux/reciprocal_div.h.
>> +        */
>
> Shall we handle the l == 32 case better? I guess the concern here is that
> extra handling may slow down the fast path?

The implementation of "reciprocal_value_adv" hasn't considered l == 32,
which would make the code more complex.

As described in the pseudo code for calling "reciprocal_value_adv" in
include/linux/reciprocal_div.h, l == 32 means the MSB of the dividend is
set, so the result of the unsigned divide could only be 0 or 1; the divide
result can then easily be obtained by a comparison followed by a
conditional move of 0 or 1 into the result.

> If that's the case, we should at least add a WARNING on the slow path.

OK, I will add a pr_warn inside "reciprocal_value_adv" when l == 32 is
triggered.

Thanks,
Jiong

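The reduction described above — when the MSB-set operand cannot be handled by the advanced reciprocal, the quotient collapses to a comparison — can be sketched as follows (hypothetical helper name, plain Python):

```python
def div_by_msb_set_const(a, d):
    # When the u32 constant d has its MSB set (d >= 2**31), any u32 a
    # satisfies a < 2**32 <= 2 * d, so the unsigned quotient a // d
    # can only be 0 or 1, and it equals the comparison a >= d.
    assert d >> 31 == 1, "only meaningful for MSB-set constants"
    return 1 if a >= d else 0
```

On NFP this is exactly the compare-plus-carry idiom discussed earlier in the thread, which avoids needing a conditional move.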

Thread overview: 22+ messages
-- links below jump to the message on this page --
2018-06-25  3:54 [PATCH bpf-next 0/7] nfp: bpf: add multiplication, divide and memcpy from maps Jakub Kicinski
2018-06-25  3:54 ` [PATCH bpf-next 1/7] nfp: bpf: allow source ptr type be map ptr in memcpy optimization Jakub Kicinski
2018-06-26  5:50   ` Song Liu
2018-06-26  7:08     ` Jakub Kicinski
2018-06-26 16:26       ` Song Liu
2018-06-25  3:54 ` [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned Jakub Kicinski
2018-06-26  6:21   ` Song Liu
2018-06-26 20:52     ` Jakub Kicinski
2018-06-27  8:54       ` Daniel Borkmann
2018-06-25  3:54 ` [PATCH bpf-next 3/7] nfp: bpf: rename umin/umax to umin_src/umax_src Jakub Kicinski
2018-06-26  6:21   ` Song Liu
2018-06-25  3:54 ` [PATCH bpf-next 4/7] nfp: bpf: copy range info for all operands of all ALU operations Jakub Kicinski
2018-06-26  6:50   ` Song Liu
2018-06-25  3:54 ` [PATCH bpf-next 5/7] nfp: bpf: support u16 and u32 multiplications Jakub Kicinski
2018-06-26 22:23   ` Song Liu
2018-06-25  3:54 ` [PATCH bpf-next 6/7] nfp: bpf: support u32 divide using reciprocal_div.h Jakub Kicinski
2018-06-26 22:28   ` Song Liu
2018-06-25  3:54 ` [PATCH bpf-next 7/7] nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h Jakub Kicinski
2018-06-26 20:59   ` Jakub Kicinski
2018-07-05 18:28     ` Jiong Wang
2018-06-28 19:02 [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned Jiong Wang
2018-06-28 21:05 ` Jakub Kicinski
