* [PATCH] i40e: replace switch-statement with if-clause
@ 2019-01-21 16:33 ` bjorn.topel
0 siblings, 0 replies; 12+ messages in thread
From: bjorn.topel @ 2019-01-21 16:33 UTC (permalink / raw)
To: intel-wired-lan
Cc: Björn Töpel, brouer, magnus.karlsson, magnus.karlsson, netdev
From: Björn Töpel <bjorn.topel@intel.com>
GCC will generate jump tables for switch-statements with more than 5
case statements. An entry into the jump table is an indirect call,
which means that for CONFIG_RETPOLINE builds, this is rather
expensive.
This commit replaces the switch-statement that acts on the XDP program
result with an if-clause.
The if-clause was also refactored into a common function that can be
used by AF_XDP zero-copy and non-zero-copy code.
Performance prior this patch:
$ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
XDP stats CPU pps issue-pps
XDP-RX CPU 20 18983018 0
XDP-RX CPU total 18983018
RXQ stats RXQ:CPU pps issue-pps
rx_queue_index 20:20 18983012 0
rx_queue_index 20:sum 18983012
$ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
sock0@enp134s0f0:20 rxdrop
pps pkts 2.00
rx 14,641,496 144,751,092
tx 0 0
And after:
$ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
XDP stats CPU pps issue-pps
XDP-RX CPU 20 24000986 0
XDP-RX CPU total 24000986
RXQ stats RXQ:CPU pps issue-pps
rx_queue_index 20:20 24000985 0
rx_queue_index 20:sum 24000985
+26%
$ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
sock0@enp134s0f0:20 rxdrop
pps pkts 2.00
rx 17,623,578 163,503,263
tx 0 0
+20%
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 31 ++++---------------
.../ethernet/intel/i40e/i40e_txrx_common.h | 27 ++++++++++++++++
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 24 ++------------
3 files changed, 35 insertions(+), 47 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index a7e14e98889f..b339b7ee6380 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2,7 +2,6 @@
/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include <linux/prefetch.h>
-#include <linux/bpf_trace.h>
#include <net/xdp.h>
#include "i40e.h"
#include "i40e_trace.h"
@@ -2195,41 +2194,23 @@ int i40e_xmit_xdp_tx_ring(struct xdp_buff *xdp, struct i40e_ring *xdp_ring)
static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
struct xdp_buff *xdp)
{
- int err, result = I40E_XDP_PASS;
- struct i40e_ring *xdp_ring;
struct bpf_prog *xdp_prog;
+ int result;
u32 act;
rcu_read_lock();
xdp_prog = READ_ONCE(rx_ring->xdp_prog);
- if (!xdp_prog)
+ if (!xdp_prog) {
+ result = I40E_XDP_PASS;
goto xdp_out;
+ }
prefetchw(xdp->data_hard_start); /* xdp_frame write */
act = bpf_prog_run_xdp(xdp_prog, xdp);
- switch (act) {
- case XDP_PASS:
- break;
- case XDP_TX:
- xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
- result = i40e_xmit_xdp_tx_ring(xdp, xdp_ring);
- break;
- case XDP_REDIRECT:
- err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
- result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
- break;
- default:
- bpf_warn_invalid_xdp_action(act);
- /* fall through */
- case XDP_ABORTED:
- trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
- /* fall through -- handle aborts by dropping packet */
- case XDP_DROP:
- result = I40E_XDP_CONSUMED;
- break;
- }
+ i40e_xdp_do_action(act, &result, rx_ring, xdp, xdp_prog);
+
xdp_out:
rcu_read_unlock();
return ERR_PTR(-result);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index 8af0e99c6c0d..8cc4d8365f9e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -4,6 +4,8 @@
#ifndef I40E_TXRX_COMMON_
#define I40E_TXRX_COMMON_
+#include <linux/bpf_trace.h>
+
void i40e_fd_handle_status(struct i40e_ring *rx_ring,
union i40e_rx_desc *rx_desc, u8 prog_id);
int i40e_xmit_xdp_tx_ring(struct xdp_buff *xdp, struct i40e_ring *xdp_ring);
@@ -88,4 +90,29 @@ void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring);
void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring);
bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi);
+static inline void i40e_xdp_do_action(u32 act, int *result,
+ struct i40e_ring *rx_ring,
+ struct xdp_buff *xdp,
+ struct bpf_prog *xdp_prog)
+{
+ struct i40e_ring *xdp_ring;
+ int err;
+
+ if (act == XDP_TX) {
+ xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
+ *result = i40e_xmit_xdp_tx_ring(xdp, xdp_ring);
+ } else if (act == XDP_REDIRECT) {
+ err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
+ *result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
+ } else if (act == XDP_PASS) {
+ *result = I40E_XDP_PASS;
+ } else if (act == XDP_DROP) {
+ *result = I40E_XDP_CONSUMED;
+ } else {
+ if (act != XDP_ABORTED)
+ bpf_warn_invalid_xdp_action(act);
+ trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
+ *result = I40E_XDP_CONSUMED;
+ }
+}
#endif /* I40E_TXRX_COMMON_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 870cf654e436..1ed56475ec78 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -282,9 +282,8 @@ int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
**/
static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
{
- int err, result = I40E_XDP_PASS;
- struct i40e_ring *xdp_ring;
struct bpf_prog *xdp_prog;
+ int result;
u32 act;
rcu_read_lock();
@@ -294,26 +293,7 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
xdp_prog = READ_ONCE(rx_ring->xdp_prog);
act = bpf_prog_run_xdp(xdp_prog, xdp);
xdp->handle += xdp->data - xdp->data_hard_start;
- switch (act) {
- case XDP_PASS:
- break;
- case XDP_TX:
- xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
- result = i40e_xmit_xdp_tx_ring(xdp, xdp_ring);
- break;
- case XDP_REDIRECT:
- err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
- result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
- break;
- default:
- bpf_warn_invalid_xdp_action(act);
- case XDP_ABORTED:
- trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
- /* fallthrough -- handle aborts by dropping packet */
- case XDP_DROP:
- result = I40E_XDP_CONSUMED;
- break;
- }
+ i40e_xdp_do_action(act, &result, rx_ring, xdp, xdp_prog);
rcu_read_unlock();
return result;
}
--
2.19.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Intel-wired-lan] [PATCH] i40e: replace switch-statement with if-clause
@ 2019-01-21 16:33 ` bjorn.topel
0 siblings, 0 replies; 12+ messages in thread
From: bjorn.topel @ 2019-01-21 16:33 UTC (permalink / raw)
To: intel-wired-lan
From: Bj?rn T?pel <bjorn.topel@intel.com>
GCC will generate jump tables for switch-statements with more than 5
case statements. An entry into the jump table is an indirect call,
which means that for CONFIG_RETPOLINE builds, this is rather
expensive.
This commit replaces the switch-statement that acts on the XDP program
result with an if-clause.
The if-clause was also refactored into a common function that can be
used by AF_XDP zero-copy and non-zero-copy code.
Performance prior this patch:
$ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
XDP stats CPU pps issue-pps
XDP-RX CPU 20 18983018 0
XDP-RX CPU total 18983018
RXQ stats RXQ:CPU pps issue-pps
rx_queue_index 20:20 18983012 0
rx_queue_index 20:sum 18983012
$ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
sock0 at enp134s0f0:20 rxdrop
pps pkts 2.00
rx 14,641,496 144,751,092
tx 0 0
And after:
$ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
XDP stats CPU pps issue-pps
XDP-RX CPU 20 24000986 0
XDP-RX CPU total 24000986
RXQ stats RXQ:CPU pps issue-pps
rx_queue_index 20:20 24000985 0
rx_queue_index 20:sum 24000985
+26%
$ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
sock0 at enp134s0f0:20 rxdrop
pps pkts 2.00
rx 17,623,578 163,503,263
tx 0 0
+20%
Signed-off-by: Bj?rn T?pel <bjorn.topel@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 31 ++++---------------
.../ethernet/intel/i40e/i40e_txrx_common.h | 27 ++++++++++++++++
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 24 ++------------
3 files changed, 35 insertions(+), 47 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index a7e14e98889f..b339b7ee6380 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2,7 +2,6 @@
/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include <linux/prefetch.h>
-#include <linux/bpf_trace.h>
#include <net/xdp.h>
#include "i40e.h"
#include "i40e_trace.h"
@@ -2195,41 +2194,23 @@ int i40e_xmit_xdp_tx_ring(struct xdp_buff *xdp, struct i40e_ring *xdp_ring)
static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
struct xdp_buff *xdp)
{
- int err, result = I40E_XDP_PASS;
- struct i40e_ring *xdp_ring;
struct bpf_prog *xdp_prog;
+ int result;
u32 act;
rcu_read_lock();
xdp_prog = READ_ONCE(rx_ring->xdp_prog);
- if (!xdp_prog)
+ if (!xdp_prog) {
+ result = I40E_XDP_PASS;
goto xdp_out;
+ }
prefetchw(xdp->data_hard_start); /* xdp_frame write */
act = bpf_prog_run_xdp(xdp_prog, xdp);
- switch (act) {
- case XDP_PASS:
- break;
- case XDP_TX:
- xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
- result = i40e_xmit_xdp_tx_ring(xdp, xdp_ring);
- break;
- case XDP_REDIRECT:
- err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
- result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
- break;
- default:
- bpf_warn_invalid_xdp_action(act);
- /* fall through */
- case XDP_ABORTED:
- trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
- /* fall through -- handle aborts by dropping packet */
- case XDP_DROP:
- result = I40E_XDP_CONSUMED;
- break;
- }
+ i40e_xdp_do_action(act, &result, rx_ring, xdp, xdp_prog);
+
xdp_out:
rcu_read_unlock();
return ERR_PTR(-result);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index 8af0e99c6c0d..8cc4d8365f9e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -4,6 +4,8 @@
#ifndef I40E_TXRX_COMMON_
#define I40E_TXRX_COMMON_
+#include <linux/bpf_trace.h>
+
void i40e_fd_handle_status(struct i40e_ring *rx_ring,
union i40e_rx_desc *rx_desc, u8 prog_id);
int i40e_xmit_xdp_tx_ring(struct xdp_buff *xdp, struct i40e_ring *xdp_ring);
@@ -88,4 +90,29 @@ void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring);
void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring);
bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi);
+static inline void i40e_xdp_do_action(u32 act, int *result,
+ struct i40e_ring *rx_ring,
+ struct xdp_buff *xdp,
+ struct bpf_prog *xdp_prog)
+{
+ struct i40e_ring *xdp_ring;
+ int err;
+
+ if (act == XDP_TX) {
+ xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
+ *result = i40e_xmit_xdp_tx_ring(xdp, xdp_ring);
+ } else if (act == XDP_REDIRECT) {
+ err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
+ *result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
+ } else if (act == XDP_PASS) {
+ *result = I40E_XDP_PASS;
+ } else if (act == XDP_DROP) {
+ *result = I40E_XDP_CONSUMED;
+ } else {
+ if (act != XDP_ABORTED)
+ bpf_warn_invalid_xdp_action(act);
+ trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
+ *result = I40E_XDP_CONSUMED;
+ }
+}
#endif /* I40E_TXRX_COMMON_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 870cf654e436..1ed56475ec78 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -282,9 +282,8 @@ int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
**/
static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
{
- int err, result = I40E_XDP_PASS;
- struct i40e_ring *xdp_ring;
struct bpf_prog *xdp_prog;
+ int result;
u32 act;
rcu_read_lock();
@@ -294,26 +293,7 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
xdp_prog = READ_ONCE(rx_ring->xdp_prog);
act = bpf_prog_run_xdp(xdp_prog, xdp);
xdp->handle += xdp->data - xdp->data_hard_start;
- switch (act) {
- case XDP_PASS:
- break;
- case XDP_TX:
- xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
- result = i40e_xmit_xdp_tx_ring(xdp, xdp_ring);
- break;
- case XDP_REDIRECT:
- err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
- result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
- break;
- default:
- bpf_warn_invalid_xdp_action(act);
- case XDP_ABORTED:
- trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
- /* fallthrough -- handle aborts by dropping packet */
- case XDP_DROP:
- result = I40E_XDP_CONSUMED;
- break;
- }
+ i40e_xdp_do_action(act, &result, rx_ring, xdp, xdp_prog);
rcu_read_unlock();
return result;
}
--
2.19.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Intel-wired-lan] [PATCH] i40e: replace switch-statement with if-clause
2019-01-21 16:33 ` [Intel-wired-lan] " bjorn.topel
@ 2019-01-21 16:42 ` Paul Menzel
-1 siblings, 0 replies; 12+ messages in thread
From: Paul Menzel @ 2019-01-21 16:42 UTC (permalink / raw)
To: Björn Töpel, intel-wired-lan
Cc: netdev, Björn Töpel, magnus.karlsson, magnus.karlsson, brouer
[-- Attachment #1: Type: text/plain, Size: 2584 bytes --]
Dear Björn,
On 01/21/19 17:33, bjorn.topel@gmail.com wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> GCC will generate jump tables for switch-statements with more than 5
> case statements. An entry into the jump table is an indirect call,
> which means that for CONFIG_RETPOLINE builds, this is rather
> expensive.
>
> This commit replaces the switch-statement that acts on the XDP program
> result with an if-clause.
Maybe mention the performance improvement already here. I’d also put it
into the commit message summary. Something like:
> i40e: Speed up retpoline case by using if-clause
If that jump tables are a common problem, I wonder, why the compiler
cannot be adapted to generate better performing code or an option passed
to the compiler.
> The if-clause was also refactored into a common function that can be
> used by AF_XDP zero-copy and non-zero-copy code.
>
> Performance prior this patch:
> $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> XDP stats CPU pps issue-pps
> XDP-RX CPU 20 18983018 0
> XDP-RX CPU total 18983018
>
> RXQ stats RXQ:CPU pps issue-pps
> rx_queue_index 20:20 18983012 0
> rx_queue_index 20:sum 18983012
>
> $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> sock0@enp134s0f0:20 rxdrop
> pps pkts 2.00
> rx 14,641,496 144,751,092
> tx 0 0
>
> And after:
> $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> XDP stats CPU pps issue-pps
> XDP-RX CPU 20 24000986 0
> XDP-RX CPU total 24000986
>
> RXQ stats RXQ:CPU pps issue-pps
> rx_queue_index 20:20 24000985 0
> rx_queue_index 20:sum 24000985
>
> +26%
>
> $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> sock0@enp134s0f0:20 rxdrop
> pps pkts 2.00
> rx 17,623,578 163,503,263
> tx 0 0
>
> +20%
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 31 ++++---------------
> .../ethernet/intel/i40e/i40e_txrx_common.h | 27 ++++++++++++++++
> drivers/net/ethernet/intel/i40e/i40e_xsk.c | 24 ++------------
> 3 files changed, 35 insertions(+), 47 deletions(-)
[…]
Kind regards,
Paul
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5174 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Intel-wired-lan] [PATCH] i40e: replace switch-statement with if-clause
@ 2019-01-21 16:42 ` Paul Menzel
0 siblings, 0 replies; 12+ messages in thread
From: Paul Menzel @ 2019-01-21 16:42 UTC (permalink / raw)
To: intel-wired-lan
Dear Bj?rn,
On 01/21/19 17:33, bjorn.topel at gmail.com wrote:
> From: Bj?rn T?pel <bjorn.topel@intel.com>
>
> GCC will generate jump tables for switch-statements with more than 5
> case statements. An entry into the jump table is an indirect call,
> which means that for CONFIG_RETPOLINE builds, this is rather
> expensive.
>
> This commit replaces the switch-statement that acts on the XDP program
> result with an if-clause.
Maybe mention the performance improvement already here. I?d also put it
into the commit message summary. Something like:
> i40e: Speed up retpoline case by using if-clause
If that jump tables are a common problem, I wonder, why the compiler
cannot be adapted to generate better performing code or an option passed
to the compiler.
> The if-clause was also refactored into a common function that can be
> used by AF_XDP zero-copy and non-zero-copy code.
>
> Performance prior this patch:
> $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> XDP stats CPU pps issue-pps
> XDP-RX CPU 20 18983018 0
> XDP-RX CPU total 18983018
>
> RXQ stats RXQ:CPU pps issue-pps
> rx_queue_index 20:20 18983012 0
> rx_queue_index 20:sum 18983012
>
> $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> sock0 at enp134s0f0:20 rxdrop
> pps pkts 2.00
> rx 14,641,496 144,751,092
> tx 0 0
>
> And after:
> $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> XDP stats CPU pps issue-pps
> XDP-RX CPU 20 24000986 0
> XDP-RX CPU total 24000986
>
> RXQ stats RXQ:CPU pps issue-pps
> rx_queue_index 20:20 24000985 0
> rx_queue_index 20:sum 24000985
>
> +26%
>
> $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> sock0 at enp134s0f0:20 rxdrop
> pps pkts 2.00
> rx 17,623,578 163,503,263
> tx 0 0
>
> +20%
>
> Signed-off-by: Bj?rn T?pel <bjorn.topel@intel.com>
> ---
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 31 ++++---------------
> .../ethernet/intel/i40e/i40e_txrx_common.h | 27 ++++++++++++++++
> drivers/net/ethernet/intel/i40e/i40e_xsk.c | 24 ++------------
> 3 files changed, 35 insertions(+), 47 deletions(-)
[?]
Kind regards,
Paul
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5174 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20190121/15b4cb01/attachment.p7s>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Intel-wired-lan] [PATCH] i40e: replace switch-statement with if-clause
2019-01-21 16:42 ` Paul Menzel
@ 2019-01-21 16:53 ` =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
-1 siblings, 0 replies; 12+ messages in thread
From: Björn Töpel @ 2019-01-21 16:53 UTC (permalink / raw)
To: Paul Menzel
Cc: Björn Töpel, intel-wired-lan, Netdev, Magnus Karlsson,
Karlsson, Magnus, Jesper Dangaard Brouer
Den mån 21 jan. 2019 kl 17:42 skrev Paul Menzel <pmenzel@molgen.mpg.de>:
>
> Dear Björn,
>
>
> On 01/21/19 17:33, bjorn.topel@gmail.com wrote:
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > GCC will generate jump tables for switch-statements with more than 5
> > case statements. An entry into the jump table is an indirect call,
> > which means that for CONFIG_RETPOLINE builds, this is rather
> > expensive.
> >
> > This commit replaces the switch-statement that acts on the XDP program
> > result with an if-clause.
>
> Maybe mention the performance improvement already here. I’d also put it
> into the commit message summary. Something like:
>
> > i40e: Speed up retpoline case by using if-clause
>
Yes, this would, indeed, have been a better summary!
> If that jump tables are a common problem, I wonder, why the compiler
> cannot be adapted to generate better performing code or an option passed
> to the compiler.
>
It might make sense to use -fno-jump-tables or a better value for the
case-values-threshold param for the i40e code. However, doing that
would require a much broader testing, since there are a number of
different places where a switch-statement is used. And depending on
the context, a jump table might still be a better option.
Thank you for the comments!
Björn
> > The if-clause was also refactored into a common function that can be
> > used by AF_XDP zero-copy and non-zero-copy code.
> >
> > Performance prior this patch:
> > $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> > Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> > XDP stats CPU pps issue-pps
> > XDP-RX CPU 20 18983018 0
> > XDP-RX CPU total 18983018
> >
> > RXQ stats RXQ:CPU pps issue-pps
> > rx_queue_index 20:20 18983012 0
> > rx_queue_index 20:sum 18983012
> >
> > $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> > sock0@enp134s0f0:20 rxdrop
> > pps pkts 2.00
> > rx 14,641,496 144,751,092
> > tx 0 0
> >
> > And after:
> > $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> > Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> > XDP stats CPU pps issue-pps
> > XDP-RX CPU 20 24000986 0
> > XDP-RX CPU total 24000986
> >
> > RXQ stats RXQ:CPU pps issue-pps
> > rx_queue_index 20:20 24000985 0
> > rx_queue_index 20:sum 24000985
> >
> > +26%
> >
> > $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> > sock0@enp134s0f0:20 rxdrop
> > pps pkts 2.00
> > rx 17,623,578 163,503,263
> > tx 0 0
> >
> > +20%
> >
> > Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> > ---
> > drivers/net/ethernet/intel/i40e/i40e_txrx.c | 31 ++++---------------
> > .../ethernet/intel/i40e/i40e_txrx_common.h | 27 ++++++++++++++++
> > drivers/net/ethernet/intel/i40e/i40e_xsk.c | 24 ++------------
> > 3 files changed, 35 insertions(+), 47 deletions(-)
>
> […]
>
>
> Kind regards,
>
> Paul
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Intel-wired-lan] [PATCH] i40e: replace switch-statement with if-clause
@ 2019-01-21 16:53 ` =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
0 siblings, 0 replies; 12+ messages in thread
From: =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?= @ 2019-01-21 16:53 UTC (permalink / raw)
To: intel-wired-lan
Den m?n 21 jan. 2019 kl 17:42 skrev Paul Menzel <pmenzel@molgen.mpg.de>:
>
> Dear Bj?rn,
>
>
> On 01/21/19 17:33, bjorn.topel at gmail.com wrote:
> > From: Bj?rn T?pel <bjorn.topel@intel.com>
> >
> > GCC will generate jump tables for switch-statements with more than 5
> > case statements. An entry into the jump table is an indirect call,
> > which means that for CONFIG_RETPOLINE builds, this is rather
> > expensive.
> >
> > This commit replaces the switch-statement that acts on the XDP program
> > result with an if-clause.
>
> Maybe mention the performance improvement already here. I?d also put it
> into the commit message summary. Something like:
>
> > i40e: Speed up retpoline case by using if-clause
>
Yes, this would, indeed, have been a better summary!
> If that jump tables are a common problem, I wonder, why the compiler
> cannot be adapted to generate better performing code or an option passed
> to the compiler.
>
It might make sense to use -fno-jump-tables or a better value for the
case-values-threshold param for the i40e code. However, doing that
would require a much broader testing, since there are a number of
different places where a switch-statement is used. And depending on
the context, a jump table might still be a better option.
Thank you for the comments!
Bj?rn
> > The if-clause was also refactored into a common function that can be
> > used by AF_XDP zero-copy and non-zero-copy code.
> >
> > Performance prior this patch:
> > $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> > Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> > XDP stats CPU pps issue-pps
> > XDP-RX CPU 20 18983018 0
> > XDP-RX CPU total 18983018
> >
> > RXQ stats RXQ:CPU pps issue-pps
> > rx_queue_index 20:20 18983012 0
> > rx_queue_index 20:sum 18983012
> >
> > $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> > sock0 at enp134s0f0:20 rxdrop
> > pps pkts 2.00
> > rx 14,641,496 144,751,092
> > tx 0 0
> >
> > And after:
> > $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> > Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> > XDP stats CPU pps issue-pps
> > XDP-RX CPU 20 24000986 0
> > XDP-RX CPU total 24000986
> >
> > RXQ stats RXQ:CPU pps issue-pps
> > rx_queue_index 20:20 24000985 0
> > rx_queue_index 20:sum 24000985
> >
> > +26%
> >
> > $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> > sock0 at enp134s0f0:20 rxdrop
> > pps pkts 2.00
> > rx 17,623,578 163,503,263
> > tx 0 0
> >
> > +20%
> >
> > Signed-off-by: Bj?rn T?pel <bjorn.topel@intel.com>
> > ---
> > drivers/net/ethernet/intel/i40e/i40e_txrx.c | 31 ++++---------------
> > .../ethernet/intel/i40e/i40e_txrx_common.h | 27 ++++++++++++++++
> > drivers/net/ethernet/intel/i40e/i40e_xsk.c | 24 ++------------
> > 3 files changed, 35 insertions(+), 47 deletions(-)
>
> [?]
>
>
> Kind regards,
>
> Paul
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan at osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] i40e: replace switch-statement with if-clause
2019-01-21 16:33 ` [Intel-wired-lan] " bjorn.topel
@ 2019-01-21 18:59 ` Jesper Dangaard Brouer
-1 siblings, 0 replies; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2019-01-21 18:59 UTC (permalink / raw)
To: bjorn.topel
Cc: intel-wired-lan, Björn Töpel, magnus.karlsson,
magnus.karlsson, netdev, brouer
On Mon, 21 Jan 2019 17:33:56 +0100
bjorn.topel@gmail.com wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> GCC will generate jump tables for switch-statements with more than 5
> case statements. An entry into the jump table is an indirect call,
> which means that for CONFIG_RETPOLINE builds, this is rather
> expensive.
>
> This commit replaces the switch-statement that acts on the XDP program
> result with an if-clause.
>
> The if-clause was also refactored into a common function that can be
> used by AF_XDP zero-copy and non-zero-copy code.
>
> Performance prior this patch:
> $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> XDP stats CPU pps issue-pps
> XDP-RX CPU 20 18983018 0
> XDP-RX CPU total 18983018
>
> RXQ stats RXQ:CPU pps issue-pps
> rx_queue_index 20:20 18983012 0
> rx_queue_index 20:sum 18983012
>
> $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> sock0@enp134s0f0:20 rxdrop
> pps pkts 2.00
> rx 14,641,496 144,751,092
> tx 0 0
>
> And after:
> $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> XDP stats CPU pps issue-pps
> XDP-RX CPU 20 24000986 0
> XDP-RX CPU total 24000986
>
> RXQ stats RXQ:CPU pps issue-pps
> rx_queue_index 20:20 24000985 0
> rx_queue_index 20:sum 24000985
>
> +26%
>
> $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> sock0@enp134s0f0:20 rxdrop
> pps pkts 2.00
> rx 17,623,578 163,503,263
> tx 0 0
>
> +20%
The saving/cost of the retpoline is around 11 nanosec, which
corresponds well with my previous experience and microbenchmarking
around 12 ns.
((1/18983012)-(1/24000986))*10^9
11.01372430029000000000 nanosec
((1/14641496)-(1/17623578))*10^9
11.55686507951000000000 nanosec
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Intel-wired-lan] [PATCH] i40e: replace switch-statement with if-clause
@ 2019-01-21 18:59 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2019-01-21 18:59 UTC (permalink / raw)
To: intel-wired-lan
On Mon, 21 Jan 2019 17:33:56 +0100
bjorn.topel at gmail.com wrote:
> From: Bj?rn T?pel <bjorn.topel@intel.com>
>
> GCC will generate jump tables for switch-statements with more than 5
> case statements. An entry into the jump table is an indirect call,
> which means that for CONFIG_RETPOLINE builds, this is rather
> expensive.
>
> This commit replaces the switch-statement that acts on the XDP program
> result with an if-clause.
>
> The if-clause was also refactored into a common function that can be
> used by AF_XDP zero-copy and non-zero-copy code.
>
> Performance prior this patch:
> $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> XDP stats CPU pps issue-pps
> XDP-RX CPU 20 18983018 0
> XDP-RX CPU total 18983018
>
> RXQ stats RXQ:CPU pps issue-pps
> rx_queue_index 20:20 18983012 0
> rx_queue_index 20:sum 18983012
>
> $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> sock0 at enp134s0f0:20 rxdrop
> pps pkts 2.00
> rx 14,641,496 144,751,092
> tx 0 0
>
> And after:
> $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> XDP stats CPU pps issue-pps
> XDP-RX CPU 20 24000986 0
> XDP-RX CPU total 24000986
>
> RXQ stats RXQ:CPU pps issue-pps
> rx_queue_index 20:20 24000985 0
> rx_queue_index 20:sum 24000985
>
> +26%
>
> $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> sock0 at enp134s0f0:20 rxdrop
> pps pkts 2.00
> rx 17,623,578 163,503,263
> tx 0 0
>
> +20%
The saving/cost of the retpoline is around 11 nanosec, which
corresponds well with my previous experience and microbenchmarking
around 12 ns.
((1/18983012)-(1/24000986))*10^9
11.01372430029000000000 nanosec
((1/14641496)-(1/17623578))*10^9
11.55686507951000000000 nanosec
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Intel-wired-lan] [PATCH] i40e: replace switch-statement with if-clause
2019-01-21 16:53 ` =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
@ 2019-01-21 19:12 ` Jesper Dangaard Brouer
-1 siblings, 0 replies; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2019-01-21 19:12 UTC (permalink / raw)
To: Björn Töpel
Cc: Paul Menzel, Björn Töpel, intel-wired-lan, Netdev,
Magnus Karlsson, Karlsson, Magnus, brouer
On Mon, 21 Jan 2019 17:53:45 +0100
Björn Töpel <bjorn.topel@gmail.com> wrote:
> > If that jump tables are a common problem, I wonder, why the compiler
> > cannot be adapted to generate better performing code or an option passed
> > to the compiler.
> >
>
> It might make sense to use -fno-jump-tables or a better value for the
> case-values-threshold param for the i40e code. However, doing that
> would require a much broader testing, since there are a number of
> different places where a switch-statement is used. And depending on
> the context, a jump table might still be a better option.
I recently found out that it is possible to disable GCC attributes per
function basis. See how I played with it here:
https://github.com/xdp-project/xdp-project/blob/master/areas/dma/dma01_test_hellwig_direct_dma.org#investigate-overhead-of-bpf-indirect-retpoline
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Intel-wired-lan] [PATCH] i40e: replace switch-statement with if-clause
@ 2019-01-21 19:12 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2019-01-21 19:12 UTC (permalink / raw)
To: intel-wired-lan
On Mon, 21 Jan 2019 17:53:45 +0100
Bj?rn T?pel <bjorn.topel@gmail.com> wrote:
> > If that jump tables are a common problem, I wonder, why the compiler
> > cannot be adapted to generate better performing code or an option passed
> > to the compiler.
> >
>
> It might make sense to use -fno-jump-tables or a better value for the
> case-values-threshold param for the i40e code. However, doing that
> would require a much broader testing, since there are a number of
> different places where a switch-statement is used. And depending on
> the context, a jump table might still be a better option.
I recently found out that it is possible to disable GCC attributes per
function basis. See how I played with it here:
https://github.com/xdp-project/xdp-project/blob/master/areas/dma/dma01_test_hellwig_direct_dma.org#investigate-overhead-of-bpf-indirect-retpoline
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Intel-wired-lan] [PATCH] i40e: replace switch-statement with if-clause
2019-01-21 19:12 ` Jesper Dangaard Brouer
@ 2019-01-22 6:51 ` =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
-1 siblings, 0 replies; 12+ messages in thread
From: Björn Töpel @ 2019-01-22 6:51 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Paul Menzel, Björn Töpel, intel-wired-lan, Netdev,
Magnus Karlsson, Karlsson, Magnus
Den mån 21 jan. 2019 kl 20:12 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
>
> On Mon, 21 Jan 2019 17:53:45 +0100
> Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> > > If that jump tables are a common problem, I wonder, why the compiler
> > > cannot be adapted to generate better performing code or an option passed
> > > to the compiler.
> > >
> >
> > It might make sense to use -fno-jump-tables or a better value for the
> > case-values-threshold param for the i40e code. However, doing that
> > would require a much broader testing, since there are a number of
> > different places where a switch-statement is used. And depending on
> > the context, a jump table might still be a better option.
>
> I recently found out that it is possible to disable GCC attributes per
> function basis. See how I played with it here:
> https://github.com/xdp-project/xdp-project/blob/master/areas/dma/dma01_test_hellwig_direct_dma.org#investigate-overhead-of-bpf-indirect-retpoline
>
Nice! ...but does that help here? I mean, excluding the thunks in jump
table call generation isn't really an option?
Is it possible in a function scoped way to change, say, the
case-values-threshold?
Björn
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Intel-wired-lan] [PATCH] i40e: replace switch-statement with if-clause
@ 2019-01-22 6:51 ` =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
0 siblings, 0 replies; 12+ messages in thread
From: =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?= @ 2019-01-22 6:51 UTC (permalink / raw)
To: intel-wired-lan
Den m?n 21 jan. 2019 kl 20:12 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
>
> On Mon, 21 Jan 2019 17:53:45 +0100
> Bj?rn T?pel <bjorn.topel@gmail.com> wrote:
>
> > > If that jump tables are a common problem, I wonder, why the compiler
> > > cannot be adapted to generate better performing code or an option passed
> > > to the compiler.
> > >
> >
> > It might make sense to use -fno-jump-tables or a better value for the
> > case-values-threshold param for the i40e code. However, doing that
> > would require a much broader testing, since there are a number of
> > different places where a switch-statement is used. And depending on
> > the context, a jump table might still be a better option.
>
> I recently found out that it is possible to disable GCC attributes per
> function basis. See how I played with it here:
> https://github.com/xdp-project/xdp-project/blob/master/areas/dma/dma01_test_hellwig_direct_dma.org#investigate-overhead-of-bpf-indirect-retpoline
>
Nice! ...but does that help here? I mean, excluding the thunks in jump
table call generation isn't really an option?
Is it possible in a function scoped way to change, say, the
case-values-threshold?
Bj?rn
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2019-01-22 6:51 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-21 16:33 [PATCH] i40e: replace switch-statement with if-clause bjorn.topel
2019-01-21 16:33 ` [Intel-wired-lan] " bjorn.topel
2019-01-21 16:42 ` Paul Menzel
2019-01-21 16:42 ` Paul Menzel
2019-01-21 16:53 ` Björn Töpel
2019-01-21 16:53 ` =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
2019-01-21 19:12 ` Jesper Dangaard Brouer
2019-01-21 19:12 ` Jesper Dangaard Brouer
2019-01-22 6:51 ` Björn Töpel
2019-01-22 6:51 ` =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
2019-01-21 18:59 ` Jesper Dangaard Brouer
2019-01-21 18:59 ` [Intel-wired-lan] " Jesper Dangaard Brouer
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.