From: Konstantin Ananyev <konstantin.ananyev-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: dev-VfR2kkLFssw@public.gmane.org
Subject: [PATCH v2 14/17] librte_acl: move lo/hi dwords shuffle out from calc_addr
Date: Mon, 12 Jan 2015 19:16:18 +0000
Message-ID: <1421090181-17150-15-git-send-email-konstantin.ananyev@intel.com>
In-Reply-To: <1421090181-17150-1-git-send-email-konstantin.ananyev-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Reorganise the SSE code path a bit by moving the lo/hi dwords shuffle
out of calc_addr() and into its caller, transition4().
This makes calc_addr() for SSE and AVX2 practically identical
and opens up an opportunity for further code deduplication.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
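Note for readers: the shuffle that this patch hoists into transition4()
can be tried in isolation. The following is a minimal sketch, not code
from librte_acl. It assumes an x86 target with SSE2, uses raw intrinsics
instead of the MM_* wrapper macros, and the names split_lo_hi() and
split_demo() are hypothetical. It shows how the 0x88 and 0xdd selectors
split four 64-bit transitions, held in two registers, into their low and
high 32-bit halves.

```c
#include <stdint.h>
#include <emmintrin.h>	/* SSE2; also provides the SSE _mm_shuffle_ps() */

/*
 * Split four 64-bit values held in two 128-bit registers
 * (idx1 = {v0, v1}, idx2 = {v2, v3}) into low and high dwords.
 * Hypothetical helper, mirrors the shuffle added to transition4().
 */
static inline void
split_lo_hi(__m128i idx1, __m128i idx2, __m128i *tr_lo, __m128i *tr_hi)
{
	/* 0x88 picks dwords 0 and 2 of each source: the low halves. */
	*tr_lo = _mm_castps_si128(_mm_shuffle_ps(_mm_castsi128_ps(idx1),
		_mm_castsi128_ps(idx2), 0x88));

	/* 0xdd picks dwords 1 and 3 of each source: the high halves. */
	*tr_hi = _mm_castps_si128(_mm_shuffle_ps(_mm_castsi128_ps(idx1),
		_mm_castsi128_ps(idx2), 0xdd));
}

/* Hypothetical wrapper: load four values, split them, store the halves. */
static void
split_demo(const uint64_t trans[4], uint32_t lo[4], uint32_t hi[4])
{
	__m128i idx1, idx2, tr_lo, tr_hi;

	idx1 = _mm_loadu_si128((const __m128i *)&trans[0]);
	idx2 = _mm_loadu_si128((const __m128i *)&trans[2]);

	split_lo_hi(idx1, idx2, &tr_lo, &tr_hi);

	_mm_storeu_si128((__m128i *)lo, tr_lo);
	_mm_storeu_si128((__m128i *)hi, tr_hi);
}
```

With the split done by the caller, the address calculation only ever
sees tr_lo and tr_hi, which is what lets the SSE and AVX2 variants share
the same shape.
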
 lib/librte_acl/acl_run_sse.h | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/lib/librte_acl/acl_run_sse.h b/lib/librte_acl/acl_run_sse.h
index 1b7870e..4a174e9 100644
--- a/lib/librte_acl/acl_run_sse.h
+++ b/lib/librte_acl/acl_run_sse.h
@@ -172,9 +172,9 @@ acl_match_check_x4(int slot, const struct rte_acl_ctx *ctx, struct parms *parms,
  */
 static inline __attribute__((always_inline)) xmm_t
 calc_addr_sse(xmm_t index_mask, xmm_t next_input, xmm_t shuffle_input,
-	xmm_t ones_16, xmm_t indices1, xmm_t indices2)
+	xmm_t ones_16, xmm_t tr_lo, xmm_t tr_hi)
 {
-	xmm_t addr, node_types, range, temp;
+	xmm_t addr, node_types;
 	xmm_t dfa_msk, dfa_ofs, quad_ofs;
 	xmm_t in, r, t;
 
@@ -187,18 +187,14 @@ calc_addr_sse(xmm_t index_mask, xmm_t next_input, xmm_t shuffle_input,
 	 * it reaches a match.
 	 */
 
-	/* Shuffle low 32 into temp and high 32 into indices2 */
-	temp = (xmm_t)MM_SHUFFLEPS((__m128)indices1, (__m128)indices2, 0x88);
-	range = (xmm_t)MM_SHUFFLEPS((__m128)indices1, (__m128)indices2, 0xdd);
-
 	t = MM_XOR(index_mask, index_mask);
 
 	/* shuffle input byte to all 4 positions of 32 bit value */
 	in = MM_SHUFFLE8(next_input, shuffle_input);
 
 	/* Calc node type and node addr */
-	node_types = MM_ANDNOT(index_mask, temp);
-	addr = MM_AND(index_mask, temp);
+	node_types = MM_ANDNOT(index_mask, tr_lo);
+	addr = MM_AND(index_mask, tr_lo);
 
 	/*
 	 * Calc addr for DFAs - addr = dfa_index + input_byte
@@ -211,7 +207,7 @@ calc_addr_sse(xmm_t index_mask, xmm_t next_input, xmm_t shuffle_input,
 	r = _mm_add_epi8(r, range_base);
 
 	t = _mm_srli_epi32(in, 24);
-	r = _mm_shuffle_epi8(range, r);
+	r = _mm_shuffle_epi8(tr_hi, r);
 
 	dfa_ofs = _mm_sub_epi32(t, r);
 
@@ -224,22 +220,22 @@ calc_addr_sse(xmm_t index_mask, xmm_t next_input, xmm_t shuffle_input,
 	 */
 
 	/* check ranges */
-	temp = MM_CMPGT8(in, range);
+	t = MM_CMPGT8(in, tr_hi);
 
 	/* convert -1 to 1 (bytes greater than input byte */
-	temp = MM_SIGN8(temp, temp);
+	t = MM_SIGN8(t, t);
 
 	/* horizontal add pairs of bytes into words */
-	temp = MM_MADD8(temp, temp);
+	t = MM_MADD8(t, t);
 
 	/* horizontal add pairs of words into dwords */
-	quad_ofs = MM_MADD16(temp, ones_16);
+	quad_ofs = MM_MADD16(t, ones_16);
 
-	/* mask to range type nodes */
-	temp = _mm_blendv_epi8(quad_ofs, dfa_ofs, dfa_msk);
+	/* blend DFA and QUAD/SINGLE. */
+	t = _mm_blendv_epi8(quad_ofs, dfa_ofs, dfa_msk);
 
 	/* add index into node position */
-	return MM_ADD32(addr, temp);
+	return MM_ADD32(addr, t);
 }
 
 /*
@@ -249,13 +245,19 @@ static inline __attribute__((always_inline)) xmm_t
 transition4(xmm_t next_input, const uint64_t *trans,
 	xmm_t *indices1, xmm_t *indices2)
 {
-	xmm_t addr;
+	xmm_t addr, tr_lo, tr_hi;
 	uint64_t trans0, trans2;
 
+	/* Shuffle low 32 into tr_lo and high 32 into tr_hi */
+	tr_lo = (xmm_t)_mm_shuffle_ps((__m128)*indices1, (__m128)*indices2,
+		0x88);
+	tr_hi = (xmm_t)_mm_shuffle_ps((__m128)*indices1, (__m128)*indices2,
+		0xdd);
+
 	 /* Calculate the address (array index) for all 4 transitions. */
 
 	addr = calc_addr_sse(xmm_index_mask.x, next_input, xmm_shuffle_input.x,
-		xmm_ones_16.x, *indices1, *indices2);
+		xmm_ones_16.x, tr_lo, tr_hi);
 
 	 /* Gather 64 bit transitions and pack back into 2 registers. */
 
-- 
1.8.5.3

