From: Pablo Neira Ayuso <pablo@netfilter.org>
To: netfilter-devel@vger.kernel.org
Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org
Subject: [PATCH net-next 02/16] netfilter: nft_set_pipapo_avx2: Skip LDMXCSR, we don't need a valid MXCSR state
Date: Wed,  2 Jun 2021 00:06:15 +0200
Message-ID: <20210601220629.18307-3-pablo@netfilter.org>
In-Reply-To: <20210601220629.18307-1-pablo@netfilter.org>

From: Stefano Brivio <sbrivio@redhat.com>

We don't need a valid MXCSR state for the lookup routines: none of
the instructions we use rely on or affect any bit in the MXCSR
register.

Instead of calling kernel_fpu_begin(), we can pass 0 as mask to
kernel_fpu_begin_mask() and spare one LDMXCSR instruction.
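
As a rough userspace model of why a zero mask spares the instruction
(this is not the kernel implementation): the flag names KFPU_387 and
KFPU_MXCSR follow the x86 FPU API header as I understand it, while the
bit values, struct fpu_init_cost and model_fpu_begin_mask() are
hypothetical and exist only for this sketch. It assumes each flag
requests exactly one initialisation instruction.

```c
/* Flag names mirror the kernel's; the bit values here are illustrative. */
#define KFPU_387	(1u << 0)	/* caller needs x87 state: FNINIT */
#define KFPU_MXCSR	(1u << 1)	/* caller needs MXCSR state: LDMXCSR */

/* Hypothetical record of which init instructions a given mask triggers. */
struct fpu_init_cost {
	int fninit;
	int ldmxcsr;
};

/* Model only: the real function also saves state and disables preemption. */
static struct fpu_init_cost model_fpu_begin_mask(unsigned int mask)
{
	struct fpu_init_cost cost = { 0, 0 };

	if (mask & KFPU_MXCSR)
		cost.ldmxcsr = 1;	/* MXCSR would be loaded with its default */
	if (mask & KFPU_387)
		cost.fninit = 1;	/* x87 unit would be reinitialised */

	return cost;
}
```

In this model, model_fpu_begin_mask(KFPU_MXCSR) corresponds to the
"LDMXCSR only" case and model_fpu_begin_mask(0) to this patch, where
neither instruction is issued.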

Commit 49200d17d27d ("x86/fpu/64: Don't FNINIT in kernel_fpu_begin()")
already speeds up lookups considerably, and by dropping the MXCSR
initialisation we can now get a much smaller, but measurable, increase
in matching rates.

The table below reports matching rates and a wild approximation of
clock cycles needed for a match in a "port,net" test with 10 entries
from selftests/netfilter/nft_concat_range.sh, limited to the first
field, i.e. the port (with nft_set_rbtree initialisation skipped), run
on a single AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB L2$).

The (very rough) estimation of clock cycles is obtained by simply
dividing frequency by matching rate. The "cycles spared" column refers
to the difference in cycles compared to the previous row, and the rate
increase also refers to the previous row. Results are averages of six
runs.

Merely for context, I'm also reporting packet rates obtained by
skipping kernel_fpu_begin() and kernel_fpu_end() altogether (which
shows a very limited impact now), as well as skipping the whole lookup
function, compared to simply counting and dropping all packets using
the netdev hook drop (see nft_concat_range.sh for details). This
workload also includes packet generation with pktgen and the receive
path of veth.

                                      |matching|  est.  | cycles |  rate  |
                                      |  rate  | cycles | spared |increase|
                                      | (Mpps) |        |        |        |
--------------------------------------|--------|--------|--------|--------|
FNINIT, LDMXCSR (before 49200d17d27d) |  5.245 |    553 |      - |      - |
LDMXCSR only (with 49200d17d27d)      |  6.347 |    457 |     96 |  21.0% |
Without LDMXCSR (this patch)          |  6.461 |    449 |      8 |   1.8% |
-------- for reference only: ---------|--------|--------|--------|--------|
Without kernel_fpu_begin()            |  6.513 |    445 |      4 |   0.8% |
Without actual matching (return true) |  7.649 |    379 |     66 |  17.4% |
Without lookup operation (netdev drop)| 10.320 |    281 |     98 |  34.9% |

The clock cycles spared by avoiding LDMXCSR appear to be in line with CPI
and latency indicated in the manuals of comparable architectures: Intel
Skylake (CPI: 1, latency: 7) and AMD 12h (latency: 12) -- I couldn't find
this information for AMD 17h.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_set_pipapo_avx2.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index eabdb8d552ee..1c2620923a61 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1136,8 +1136,13 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 
 	m = rcu_dereference(priv->match);
 
-	/* This also protects access to all data related to scratch maps */
-	kernel_fpu_begin();
+	/* This also protects access to all data related to scratch maps.
+	 *
+	 * Note that we don't need a valid MXCSR state for any of the
+	 * operations we use here, so pass 0 as mask and spare a LDMXCSR
+	 * instruction.
+	 */
+	kernel_fpu_begin_mask(0);
 
 	scratch = *raw_cpu_ptr(m->scratch_aligned);
 	if (unlikely(!scratch)) {
-- 
2.30.2

