netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/3] net/mlx5: use indirect call wrappers
@ 2019-05-31 12:53 Paolo Abeni
  2019-05-31 12:53 ` [PATCH net-next 1/3] net/mlx5e: use indirect calls wrapper for skb allocation Paolo Abeni
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Paolo Abeni @ 2019-05-31 12:53 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Saeed Mahameed, Leon Romanovsky

The mlx5_core driver uses several indirect calls in fast-path, some of them
are invoked on each ingress packet, even for the XDP-only traffic.

This series leverage the indirect call wrappers infrastructure the avoid
the expansive RETPOLINE overhead for 2 indirect calls in fast-path.

Each call is addressed on a different patch, plus we need to introduce a couple
of additional helpers to cope with the higher number of possible direct-call
alternatives.

Paolo Abeni (3):
  net/mlx5e: use indirect calls wrapper for skb allocation
  indirect call wrappers: add helpers for 3 and 4 ways switch
  net/mlx5e: use indirect calls wrapper for the rx packet handler

 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 26 ++++++++++++++-----
 include/linux/indirect_call_wrapper.h         | 12 +++++++++
 2 files changed, 32 insertions(+), 6 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 1/3] net/mlx5e: use indirect calls wrapper for skb allocation
  2019-05-31 12:53 [PATCH net-next 0/3] net/mlx5: use indirect call wrappers Paolo Abeni
@ 2019-05-31 12:53 ` Paolo Abeni
  2019-05-31 12:53 ` [PATCH net-next 2/3] indirect call wrappers: add helpers for 3 and 4 ways switch Paolo Abeni
  2019-05-31 12:53 ` [PATCH net-next 3/3] net/mlx5e: use indirect calls wrapper for the rx packet handler Paolo Abeni
  2 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2019-05-31 12:53 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Saeed Mahameed, Leon Romanovsky

We can avoid an indirect call per packet wrapping the skb creation
with the appropriate helper.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 22 ++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 13133e7f088e..0fe5f13d07cc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -34,6 +34,7 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/tcp.h>
+#include <linux/indirect_call_wrapper.h>
 #include <net/ip6_checksum.h>
 #include <net/page_pool.h>
 #include <net/inet_ecn.h>
@@ -1092,7 +1093,10 @@ void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	wi       = get_frag(rq, ci);
 	cqe_bcnt = be32_to_cpu(cqe->byte_cnt);
 
-	skb = rq->wqe.skb_from_cqe(rq, cqe, wi, cqe_bcnt);
+	skb = INDIRECT_CALL_2(rq->wqe.skb_from_cqe,
+			      mlx5e_skb_from_cqe_linear,
+			      mlx5e_skb_from_cqe_nonlinear,
+			      rq, cqe, wi, cqe_bcnt);
 	if (!skb) {
 		/* probably for XDP */
 		if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
@@ -1279,8 +1283,10 @@ void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 
 	cqe_bcnt = mpwrq_get_cqe_byte_cnt(cqe);
 
-	skb = rq->mpwqe.skb_from_cqe_mpwrq(rq, wi, cqe_bcnt, head_offset,
-					   page_idx);
+	skb = INDIRECT_CALL_2(rq->mpwqe.skb_from_cqe_mpwrq,
+			      mlx5e_skb_from_cqe_mpwrq_linear,
+			      mlx5e_skb_from_cqe_mpwrq_nonlinear,
+			      rq, wi, cqe_bcnt, head_offset, page_idx);
 	if (!skb)
 		goto mpwrq_cqe_out;
 
@@ -1437,7 +1443,10 @@ void mlx5i_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	wi       = get_frag(rq, ci);
 	cqe_bcnt = be32_to_cpu(cqe->byte_cnt);
 
-	skb = rq->wqe.skb_from_cqe(rq, cqe, wi, cqe_bcnt);
+	skb = INDIRECT_CALL_2(rq->wqe.skb_from_cqe,
+			      mlx5e_skb_from_cqe_linear,
+			      mlx5e_skb_from_cqe_nonlinear,
+			      rq, cqe, wi, cqe_bcnt);
 	if (!skb)
 		goto wq_free_wqe;
 
@@ -1469,7 +1478,10 @@ void mlx5e_ipsec_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	wi       = get_frag(rq, ci);
 	cqe_bcnt = be32_to_cpu(cqe->byte_cnt);
 
-	skb = rq->wqe.skb_from_cqe(rq, cqe, wi, cqe_bcnt);
+	skb = INDIRECT_CALL_2(rq->wqe.skb_from_cqe,
+			      mlx5e_skb_from_cqe_linear,
+			      mlx5e_skb_from_cqe_nonlinear,
+			      rq, cqe, wi, cqe_bcnt);
 	if (unlikely(!skb)) {
 		/* a DROP, save the page-reuse checks */
 		mlx5e_free_rx_wqe(rq, wi, true);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 2/3] indirect call wrappers: add helpers for 3 and 4 ways switch
  2019-05-31 12:53 [PATCH net-next 0/3] net/mlx5: use indirect call wrappers Paolo Abeni
  2019-05-31 12:53 ` [PATCH net-next 1/3] net/mlx5e: use indirect calls wrapper for skb allocation Paolo Abeni
@ 2019-05-31 12:53 ` Paolo Abeni
  2019-05-31 18:30   ` Saeed Mahameed
  2019-05-31 12:53 ` [PATCH net-next 3/3] net/mlx5e: use indirect calls wrapper for the rx packet handler Paolo Abeni
  2 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2019-05-31 12:53 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Saeed Mahameed, Leon Romanovsky

Experimental results[1] has shown that resorting to several branches
and a direct-call is faster than indirect call via retpoline, even
when the number of added branches go up 5.

This change adds two additional helpers, to cope with indirect calls
with up to 4 available direct call option. We will use them
in the next patch.

[1] https://linuxplumbersconf.org/event/2/contributions/99/attachments/98/117/lpc18_paper_af_xdp_perf-v2.pdf

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/indirect_call_wrapper.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/indirect_call_wrapper.h b/include/linux/indirect_call_wrapper.h
index 00d7e8e919c6..7c4cac87eaf7 100644
--- a/include/linux/indirect_call_wrapper.h
+++ b/include/linux/indirect_call_wrapper.h
@@ -23,6 +23,16 @@
 		likely(f == f2) ? f2(__VA_ARGS__) :			\
 				  INDIRECT_CALL_1(f, f1, __VA_ARGS__);	\
 	})
+#define INDIRECT_CALL_3(f, f3, f2, f1, ...)				\
+	({								\
+		likely(f == f3) ? f3(__VA_ARGS__) :			\
+				  INDIRECT_CALL_2(f, f2, f1, __VA_ARGS__); \
+	})
+#define INDIRECT_CALL_4(f, f4, f3, f2, f1, ...)				\
+	({								\
+		likely(f == f4) ? f4(__VA_ARGS__) :			\
+				  INDIRECT_CALL_3(f, f3, f2, f1, __VA_ARGS__); \
+	})
 
 #define INDIRECT_CALLABLE_DECLARE(f)	f
 #define INDIRECT_CALLABLE_SCOPE
@@ -30,6 +40,8 @@
 #else
 #define INDIRECT_CALL_1(f, f1, ...) f(__VA_ARGS__)
 #define INDIRECT_CALL_2(f, f2, f1, ...) f(__VA_ARGS__)
+#define INDIRECT_CALL_3(f, f3, f2, f1, ...) f(__VA_ARGS__)
+#define INDIRECT_CALL_4(f, f4, f3, f2, f1, ...) f(__VA_ARGS__)
 #define INDIRECT_CALLABLE_DECLARE(f)
 #define INDIRECT_CALLABLE_SCOPE		static
 #endif
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 3/3] net/mlx5e: use indirect calls wrapper for the rx packet handler
  2019-05-31 12:53 [PATCH net-next 0/3] net/mlx5: use indirect call wrappers Paolo Abeni
  2019-05-31 12:53 ` [PATCH net-next 1/3] net/mlx5e: use indirect calls wrapper for skb allocation Paolo Abeni
  2019-05-31 12:53 ` [PATCH net-next 2/3] indirect call wrappers: add helpers for 3 and 4 ways switch Paolo Abeni
@ 2019-05-31 12:53 ` Paolo Abeni
  2019-05-31 18:41   ` Saeed Mahameed
  2 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2019-05-31 12:53 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Saeed Mahameed, Leon Romanovsky

We can avoid another indirect call per packet wrapping the rx
handler call with the proper helper.

To ensure that even the last listed direct call experience
measurable gain, despite the additional conditionals we must
traverse before reaching it, I tested reversing the order of the
listed options, with performance differences below noise level.

Together with the previous indirect call patch, this gives
~6% performance improvement in raw UDP tput.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 0fe5f13d07cc..c3752dbe00c8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1333,7 +1333,9 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
 
 		mlx5_cqwq_pop(cqwq);
 
-		rq->handle_rx_cqe(rq, cqe);
+		INDIRECT_CALL_4(rq->handle_rx_cqe, mlx5e_handle_rx_cqe_mpwrq,
+				mlx5e_handle_rx_cqe, mlx5e_handle_rx_cqe_rep,
+				mlx5e_ipsec_handle_rx_cqe, rq, cqe);
 	} while ((++work_done < budget) && (cqe = mlx5_cqwq_get_cqe(cqwq)));
 
 out:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 2/3] indirect call wrappers: add helpers for 3 and 4 ways switch
  2019-05-31 12:53 ` [PATCH net-next 2/3] indirect call wrappers: add helpers for 3 and 4 ways switch Paolo Abeni
@ 2019-05-31 18:30   ` Saeed Mahameed
  2019-06-03  9:51     ` Paolo Abeni
  0 siblings, 1 reply; 10+ messages in thread
From: Saeed Mahameed @ 2019-05-31 18:30 UTC (permalink / raw)
  To: netdev, pabeni; +Cc: davem, leon

On Fri, 2019-05-31 at 14:53 +0200, Paolo Abeni wrote:
> Experimental results[1] has shown that resorting to several branches
> and a direct-call is faster than indirect call via retpoline, even
> when the number of added branches go up 5.
> 
> This change adds two additional helpers, to cope with indirect calls
> with up to 4 available direct call option. We will use them
> in the next patch.
> 
> [1] 
> https://linuxplumbersconf.org/event/2/contributions/99/attachments/98/117/lpc18_paper_af_xdp_perf-v2.pdf
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/linux/indirect_call_wrapper.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/include/linux/indirect_call_wrapper.h
> b/include/linux/indirect_call_wrapper.h
> index 00d7e8e919c6..7c4cac87eaf7 100644
> --- a/include/linux/indirect_call_wrapper.h
> +++ b/include/linux/indirect_call_wrapper.h
> @@ -23,6 +23,16 @@
>  		likely(f == f2) ? f2(__VA_ARGS__) :			
> \
>  				  INDIRECT_CALL_1(f, f1, __VA_ARGS__);	
> \
>  	})
> +#define INDIRECT_CALL_3(f, f3, f2, f1, ...)				
> \
> +	({								
> \
> +		likely(f == f3) ? f3(__VA_ARGS__) :			
> \
> +				  INDIRECT_CALL_2(f, f2, f1,
> __VA_ARGS__); \
> +	})
> +#define INDIRECT_CALL_4(f, f4, f3, f2, f1, ...)			
> 	\
> +	({								
> \
> +		likely(f == f4) ? f4(__VA_ARGS__) :		

do we really want "likely" here ? in our cases there is no preference
on whuch fN is going to have the top priority, all of them are equally
important and statically configured and guranteed to not change on data
path .. 

> 	\
> +				  INDIRECT_CALL_3(f, f3, f2, f1,
> __VA_ARGS__); \
> +	})
>  

Oh the RETPOLINE!

On which (N) where INDIRECT_CALL_N(f, fN, fN-1, ..., f1,...) , calling
the indirection function pointer directly is going to be actually
better than this whole INDIRECT_CALL_N wrapper "if else" dance ?

>  #define INDIRECT_CALLABLE_DECLARE(f)	f
>  #define INDIRECT_CALLABLE_SCOPE
> @@ -30,6 +40,8 @@
>  #else
>  #define INDIRECT_CALL_1(f, f1, ...) f(__VA_ARGS__)
>  #define INDIRECT_CALL_2(f, f2, f1, ...) f(__VA_ARGS__)
> +#define INDIRECT_CALL_3(f, f3, f2, f1, ...) f(__VA_ARGS__)
> +#define INDIRECT_CALL_4(f, f4, f3, f2, f1, ...) f(__VA_ARGS__)
>  #define INDIRECT_CALLABLE_DECLARE(f)
>  #define INDIRECT_CALLABLE_SCOPE		static
>  #endif

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 3/3] net/mlx5e: use indirect calls wrapper for the rx packet handler
  2019-05-31 12:53 ` [PATCH net-next 3/3] net/mlx5e: use indirect calls wrapper for the rx packet handler Paolo Abeni
@ 2019-05-31 18:41   ` Saeed Mahameed
  2019-06-03  9:55     ` Paolo Abeni
  0 siblings, 1 reply; 10+ messages in thread
From: Saeed Mahameed @ 2019-05-31 18:41 UTC (permalink / raw)
  To: netdev, pabeni; +Cc: davem, leon

On Fri, 2019-05-31 at 14:53 +0200, Paolo Abeni wrote:
> We can avoid another indirect call per packet wrapping the rx
> handler call with the proper helper.
> 
> To ensure that even the last listed direct call experience
> measurable gain, despite the additional conditionals we must
> traverse before reaching it, I tested reversing the order of the
> listed options, with performance differences below noise level.
> 
> Together with the previous indirect call patch, this gives
> ~6% performance improvement in raw UDP tput.
> 

Nice ! I like it.

> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 0fe5f13d07cc..c3752dbe00c8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -1333,7 +1333,9 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int
> budget)
>  
>  		mlx5_cqwq_pop(cqwq);
>  
> -		rq->handle_rx_cqe(rq, cqe);
> +		INDIRECT_CALL_4(rq->handle_rx_cqe,
> mlx5e_handle_rx_cqe_mpwrq,
> +				mlx5e_handle_rx_cqe,
> mlx5e_handle_rx_cqe_rep,
> +				mlx5e_ipsec_handle_rx_cqe, rq, cqe);

you missed mlx5i_handle_rx_cqe, anyway don't add INDIRECT_CALL_5 :D

just replace mlx5e_handle_rx_cqe_rep with mlx5i_handle_rx_cqe, 
mlx5e_handle_rx_cqe_rep is actually a slow path of switchdev mode.

Maybe define the list somewhere in en.h where it is visible for every
one:

#define MLX5_RX_INDIRECT_CALL_LIST \
mlx5e_handle_rx_cqe_mpwrq, mlx5e_handle_rx_cqe, mlx5i_handle_rx_cqe,
mlx5e_ipsec_handle_rx_cqe

and here:
INDIRECT_CALL_4(rq->handle_rx_cqe, MLX5_RX_INDIRECT_CALL_LIST, rq,
cqe);

>  	} while ((++work_done < budget) && (cqe =
> mlx5_cqwq_get_cqe(cqwq)));
>  
>  out:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 2/3] indirect call wrappers: add helpers for 3 and 4 ways switch
  2019-05-31 18:30   ` Saeed Mahameed
@ 2019-06-03  9:51     ` Paolo Abeni
  2019-06-03 22:27       ` Saeed Mahameed
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2019-06-03  9:51 UTC (permalink / raw)
  To: Saeed Mahameed, netdev; +Cc: davem, leon

On Fri, 2019-05-31 at 18:30 +0000, Saeed Mahameed wrote:
> On Fri, 2019-05-31 at 14:53 +0200, Paolo Abeni wrote:
> > Experimental results[1] has shown that resorting to several branches
> > and a direct-call is faster than indirect call via retpoline, even
> > when the number of added branches go up 5.
> > 
> > This change adds two additional helpers, to cope with indirect calls
> > with up to 4 available direct call option. We will use them
> > in the next patch.
> > 
> > [1] 
> > https://linuxplumbersconf.org/event/2/contributions/99/attachments/98/117/lpc18_paper_af_xdp_perf-v2.pdf
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  include/linux/indirect_call_wrapper.h | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/include/linux/indirect_call_wrapper.h
> > b/include/linux/indirect_call_wrapper.h
> > index 00d7e8e919c6..7c4cac87eaf7 100644
> > --- a/include/linux/indirect_call_wrapper.h
> > +++ b/include/linux/indirect_call_wrapper.h
> > @@ -23,6 +23,16 @@
> >  		likely(f == f2) ? f2(__VA_ARGS__) :			
> > \
> >  				  INDIRECT_CALL_1(f, f1, __VA_ARGS__);	
> > \
> >  	})
> > +#define INDIRECT_CALL_3(f, f3, f2, f1, ...)				
> > \
> > +	({								
> > \
> > +		likely(f == f3) ? f3(__VA_ARGS__) :			
> > \
> > +				  INDIRECT_CALL_2(f, f2, f1,
> > __VA_ARGS__); \
> > +	})
> > +#define INDIRECT_CALL_4(f, f4, f3, f2, f1, ...)			
> > 	\
> > +	({								
> > \
> > +		likely(f == f4) ? f4(__VA_ARGS__) :		
> 
> do we really want "likely" here ? in our cases there is no preference
> on whuch fN is going to have the top priority, all of them are equally
> important and statically configured and guranteed to not change on data
> path .. 

I was a little undecided about that, too. 'likely()' is there mainly
for simmetry with the already existing _1 and _2 variants. In such
macros the branch prediction hint represent a real priority of the
available choices.

To avoid the branch prediction, a new set of macros should be defined,
but that also sounds redundant.

If you have strong opinion against the breanch prediction hint, I could
either drop this patch and the next one or resort to custom macros in
the mlx code.

Any [alternative] suggestions more than welcome!
	\
> > +				  INDIRECT_CALL_3(f, f3, f2, f1,
> > __VA_ARGS__); \
> > +	})
> >  
> 
> Oh the RETPOLINE!
> 
> On which (N) where INDIRECT_CALL_N(f, fN, fN-1, ..., f1,...) , calling
> the indirection function pointer directly is going to be actually
> better than this whole INDIRECT_CALL_N wrapper "if else" dance ?

In commit ce02ef06fcf7a399a6276adb83f37373d10cbbe1, it's measured a
relevant gain even with more than 5 options. I personally would avoid
adding much more options than the above.

Thanks,

Paolo



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 3/3] net/mlx5e: use indirect calls wrapper for the rx packet handler
  2019-05-31 18:41   ` Saeed Mahameed
@ 2019-06-03  9:55     ` Paolo Abeni
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2019-06-03  9:55 UTC (permalink / raw)
  To: Saeed Mahameed, netdev; +Cc: davem, leon

On Fri, 2019-05-31 at 18:41 +0000, Saeed Mahameed wrote:
> On Fri, 2019-05-31 at 14:53 +0200, Paolo Abeni wrote:
> > We can avoid another indirect call per packet wrapping the rx
> > handler call with the proper helper.
> > 
> > To ensure that even the last listed direct call experience
> > measurable gain, despite the additional conditionals we must
> > traverse before reaching it, I tested reversing the order of the
> > listed options, with performance differences below noise level.
> > 
> > Together with the previous indirect call patch, this gives
> > ~6% performance improvement in raw UDP tput.
> > 
> 
> Nice ! I like it.
> 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > index 0fe5f13d07cc..c3752dbe00c8 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > @@ -1333,7 +1333,9 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int
> > budget)
> >  
> >  		mlx5_cqwq_pop(cqwq);
> >  
> > -		rq->handle_rx_cqe(rq, cqe);
> > +		INDIRECT_CALL_4(rq->handle_rx_cqe,
> > mlx5e_handle_rx_cqe_mpwrq,
> > +				mlx5e_handle_rx_cqe,
> > mlx5e_handle_rx_cqe_rep,
> > +				mlx5e_ipsec_handle_rx_cqe, rq, cqe);
> 
> you missed mlx5i_handle_rx_cqe, anyway don't add INDIRECT_CALL_5 :D
> 
> just replace mlx5e_handle_rx_cqe_rep with mlx5i_handle_rx_cqe, 
> mlx5e_handle_rx_cqe_rep is actually a slow path of switchdev mode.

Thank you! This is exactly the kind of feedback I was looking for! I
hoped some of the options was less relevant than the other, but I do
not know the driver well enough to guess. Also I missed completely
mlx5i_handle_rx_cqe, as you noded.

> Maybe define the list somewhere in en.h where it is visible for every
> one:
> 
> #define MLX5_RX_INDIRECT_CALL_LIST \
> mlx5e_handle_rx_cqe_mpwrq, mlx5e_handle_rx_cqe, mlx5i_handle_rx_cqe,
> mlx5e_ipsec_handle_rx_cqe
> 
> and here:
> INDIRECT_CALL_4(rq->handle_rx_cqe, MLX5_RX_INDIRECT_CALL_LIST, rq,
> cqe);

Will do in v2, unless this patch will be dropped, please see my reply
to patch 2/3.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 2/3] indirect call wrappers: add helpers for 3 and 4 ways switch
  2019-06-03  9:51     ` Paolo Abeni
@ 2019-06-03 22:27       ` Saeed Mahameed
  2019-06-05  8:35         ` Paolo Abeni
  0 siblings, 1 reply; 10+ messages in thread
From: Saeed Mahameed @ 2019-06-03 22:27 UTC (permalink / raw)
  To: edumazet, netdev, pabeni; +Cc: davem, leon

On Mon, 2019-06-03 at 11:51 +0200, Paolo Abeni wrote:
> On Fri, 2019-05-31 at 18:30 +0000, Saeed Mahameed wrote:
> > On Fri, 2019-05-31 at 14:53 +0200, Paolo Abeni wrote:
> > > Experimental results[1] has shown that resorting to several
> > > branches
> > > and a direct-call is faster than indirect call via retpoline,
> > > even
> > > when the number of added branches go up 5.
> > > 
> > > This change adds two additional helpers, to cope with indirect
> > > calls
> > > with up to 4 available direct call option. We will use them
> > > in the next patch.
> > > 
> > > [1] 
> > > https://linuxplumbersconf.org/event/2/contributions/99/attachments/98/117/lpc18_paper_af_xdp_perf-v2.pdf
> > > 
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > >  include/linux/indirect_call_wrapper.h | 12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > > 
> > > diff --git a/include/linux/indirect_call_wrapper.h
> > > b/include/linux/indirect_call_wrapper.h
> > > index 00d7e8e919c6..7c4cac87eaf7 100644
> > > --- a/include/linux/indirect_call_wrapper.h
> > > +++ b/include/linux/indirect_call_wrapper.h
> > > @@ -23,6 +23,16 @@
> > >  		likely(f == f2) ? f2(__VA_ARGS__) :			
> > > \
> > >  				  INDIRECT_CALL_1(f, f1, __VA_ARGS__);	
> > > \
> > >  	})
> > > +#define INDIRECT_CALL_3(f, f3, f2, f1, ...)			
> > > 	
> > > \
> > > +	({								
> > > \
> > > +		likely(f == f3) ? f3(__VA_ARGS__) :			
> > > \
> > > +				  INDIRECT_CALL_2(f, f2, f1,
> > > __VA_ARGS__); \
> > > +	})
> > > +#define INDIRECT_CALL_4(f, f4, f3, f2, f1, ...)			
> > > 	\
> > > +	({								
> > > \
> > > +		likely(f == f4) ? f4(__VA_ARGS__) :		
> > 
> > do we really want "likely" here ? in our cases there is no
> > preference
> > on whuch fN is going to have the top priority, all of them are
> > equally
> > important and statically configured and guranteed to not change on
> > data
> > path .. 
> 
> I was a little undecided about that, too. 'likely()' is there mainly
> for simmetry with the already existing _1 and _2 variants. In such
> macros the branch prediction hint represent a real priority of the
> available choices.
> 

For macro _1 it make sense to have the likely keyword but for _2 it
doesn't, by looking at most of the usecases of INDIRECT_CALL_2, they
seem to be all around improving tcp/udp related indirection calls in
the protocol stack, and they seem to prefer tcp over udp. But IMHO at
least for the above usecase I think the likely keyword is being misused
here and should be remove from all INDIRECT_CALL_N where N > 1;

Eric, what do you think ?

> To avoid the branch prediction, a new set of macros should be
> defined,
> but that also sounds redundant.
> 
> If you have strong opinion against the breanch prediction hint, I
> could
> either drop this patch and the next one or resort to custom macros in
> the mlx code.
> 
> Any [alternative] suggestions more than welcome!
> 	\

custom macros can work, but in case you don't want to introduce such
macros in a vendor specific driver, then i think your patches are still
an improvement after all..

In any case, just make sure to use the order i suggested in next patch
with: MLX5_RX_INDIRECT_CALL_LIST

> > > +				  INDIRECT_CALL_3(f, f3, f2, f1,
> > > __VA_ARGS__); \
> > > +	})
> > >  
> > 
> > Oh the RETPOLINE!
> > 
> > On which (N) where INDIRECT_CALL_N(f, fN, fN-1, ..., f1,...) ,
> > calling
> > the indirection function pointer directly is going to be actually
> > better than this whole INDIRECT_CALL_N wrapper "if else" dance ?
> 
> In commit ce02ef06fcf7a399a6276adb83f37373d10cbbe1, it's measured a
> relevant gain even with more than 5 options. I personally would avoid
> adding much more options than the above.
> 
> Thanks,
> 
> Paolo
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 2/3] indirect call wrappers: add helpers for 3 and 4 ways switch
  2019-06-03 22:27       ` Saeed Mahameed
@ 2019-06-05  8:35         ` Paolo Abeni
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2019-06-05  8:35 UTC (permalink / raw)
  To: Saeed Mahameed, edumazet, netdev; +Cc: davem, leon

On Mon, 2019-06-03 at 22:27 +0000, Saeed Mahameed wrote:
> On Mon, 2019-06-03 at 11:51 +0200, Paolo Abeni wrote:
> > On Fri, 2019-05-31 at 18:30 +0000, Saeed Mahameed wrote:
> > > On Fri, 2019-05-31 at 14:53 +0200, Paolo Abeni wrote:
> > > > Experimental results[1] has shown that resorting to several
> > > > branches
> > > > and a direct-call is faster than indirect call via retpoline,
> > > > even
> > > > when the number of added branches go up 5.
> > > > 
> > > > This change adds two additional helpers, to cope with indirect
> > > > calls
> > > > with up to 4 available direct call option. We will use them
> > > > in the next patch.
> > > > 
> > > > [1] 
> > > > https://linuxplumbersconf.org/event/2/contributions/99/attachments/98/117/lpc18_paper_af_xdp_perf-v2.pdf
> > > > 
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > ---
> > > >  include/linux/indirect_call_wrapper.h | 12 ++++++++++++
> > > >  1 file changed, 12 insertions(+)
> > > > 
> > > > diff --git a/include/linux/indirect_call_wrapper.h
> > > > b/include/linux/indirect_call_wrapper.h
> > > > index 00d7e8e919c6..7c4cac87eaf7 100644
> > > > --- a/include/linux/indirect_call_wrapper.h
> > > > +++ b/include/linux/indirect_call_wrapper.h
> > > > @@ -23,6 +23,16 @@
> > > >  		likely(f == f2) ? f2(__VA_ARGS__) :			
> > > > \
> > > >  				  INDIRECT_CALL_1(f, f1, __VA_ARGS__);	
> > > > \
> > > >  	})
> > > > +#define INDIRECT_CALL_3(f, f3, f2, f1, ...)			
> > > > 	
> > > > \
> > > > +	({								
> > > > \
> > > > +		likely(f == f3) ? f3(__VA_ARGS__) :			
> > > > \
> > > > +				  INDIRECT_CALL_2(f, f2, f1,
> > > > __VA_ARGS__); \
> > > > +	})
> > > > +#define INDIRECT_CALL_4(f, f4, f3, f2, f1, ...)			
> > > > 	\
> > > > +	({								
> > > > \
> > > > +		likely(f == f4) ? f4(__VA_ARGS__) :		
> > > 
> > > do we really want "likely" here ? in our cases there is no
> > > preference
> > > on whuch fN is going to have the top priority, all of them are
> > > equally
> > > important and statically configured and guranteed to not change on
> > > data
> > > path .. 
> > 
> > I was a little undecided about that, too. 'likely()' is there mainly
> > for simmetry with the already existing _1 and _2 variants. In such
> > macros the branch prediction hint represent a real priority of the
> > available choices.
> > 
> 
> For macro _1 it make sense to have the likely keyword but for _2 it
> doesn't, by looking at most of the usecases of INDIRECT_CALL_2, they
> seem to be all around improving tcp/udp related indirection calls in
> the protocol stack, and they seem to prefer tcp over udp. But IMHO at
> least for the above usecase I think the likely keyword is being misused
> here and should be remove from all INDIRECT_CALL_N where N > 1;

I experimented a bit with gcc 8.3.1 and some BP hint variations:

* with current macros we have single test for fN and an incresing
number of conditional jumps and tests for the following functions, as
the generated code looks somehow alike:

	cmp f4, function_ptr
	jne test_f3
	call f4
post_call:
	// ...

	// ...
test_f3:
	cmp f3, function_ptr
	jne test_f2
	call f3
	jmp post_call

test_f2:
	cmp f2, function_ptr
	//...

* keeping 'likely' only on INDIRECT_CALL_1 we get a conditinal jump for
fN and the number of conditional jumps and tests grows for the next
functions, as the generated code looks somehow alike:

	cmp f4, function_ptr
	je call_f4
	cmp f3, function_ptr
	je call_f3
	//...
	cmp f1, function_ptr
	jne indirect_call
	call f1
post_call:
	// ...

	// ...
call_f4:
	call f4
	jmp post_call
call_f3:
	call f3
	jmp post_call
	// ...


* without any BP hints, is quite alike the above, except for the last
test, the indirect call don't need an additional jump: 

	cmp f4, function_ptr
	je call_f4
	cmp f3, function_ptr
	je call_f3
	//...
	cmp f1, function_ptr
	je call_f1
	call retpoline_helper

I think the first option should be overall better then the 2nd. The 3rd
one is the worse.

> In any case, just make sure to use the order i suggested in next
> patch with: MLX5_RX_INDIRECT_CALL_LIST

Sure! will do in next iteration, as soon as the above topic is settled.

Thanks!

Paolo




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-06-05  8:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-31 12:53 [PATCH net-next 0/3] net/mlx5: use indirect call wrappers Paolo Abeni
2019-05-31 12:53 ` [PATCH net-next 1/3] net/mlx5e: use indirect calls wrapper for skb allocation Paolo Abeni
2019-05-31 12:53 ` [PATCH net-next 2/3] indirect call wrappers: add helpers for 3 and 4 ways switch Paolo Abeni
2019-05-31 18:30   ` Saeed Mahameed
2019-06-03  9:51     ` Paolo Abeni
2019-06-03 22:27       ` Saeed Mahameed
2019-06-05  8:35         ` Paolo Abeni
2019-05-31 12:53 ` [PATCH net-next 3/3] net/mlx5e: use indirect calls wrapper for the rx packet handler Paolo Abeni
2019-05-31 18:41   ` Saeed Mahameed
2019-06-03  9:55     ` Paolo Abeni

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).