All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 1/4] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h
  2015-12-01 18:41 ` [PATCH 1/4] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h Jianbo Liu
@ 2015-12-01 12:41   ` Jan Viktorin
  2015-12-01 12:43   ` Jan Viktorin
  1 sibling, 0 replies; 50+ messages in thread
From: Jan Viktorin @ 2015-12-01 12:41 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

Hello Jianbo,

thank you for this fix. I had the feeling this works the same like in the Linux
Kernel where the CONFIG_ prefix is be used. My bad. I recommend to make
this patch separate. I can't see any relation to the rest of the series.

Regards
Jan

On Tue,  1 Dec 2015 13:41:13 -0500
Jianbo Liu <jianbo.liu@linaro.org> wrote:

> CONFIG_* from config files can not be used in code.
> 
> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
> ---
>  lib/librte_eal/common/include/arch/arm/rte_cycles_32.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h b/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
> index 6c6098e..9c1be71 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
> @@ -54,7 +54,7 @@ extern "C" {
>   * @return
>   *   The time base for this lcore.
>   */
> -#ifndef CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU
> +#ifndef RTE_ARM_EAL_RDTSC_USE_PMU
>  
>  /**
>   * This call is easily portable to any ARM architecture, however,



-- 
   Jan Viktorin                  E-mail: Viktorin@RehiveTech.com
   System Architect              Web:    www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/4] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h
  2015-12-01 18:41 ` [PATCH 1/4] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h Jianbo Liu
  2015-12-01 12:41   ` Jan Viktorin
@ 2015-12-01 12:43   ` Jan Viktorin
  1 sibling, 0 replies; 50+ messages in thread
From: Jan Viktorin @ 2015-12-01 12:43 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

On Tue,  1 Dec 2015 13:41:13 -0500
Jianbo Liu <jianbo.liu@linaro.org> wrote:

> CONFIG_* from config files can not be used in code.
> 
> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
> ---
Acked-by: Jan Viktorin <viktorin@rehivetech.com>

-- 
   Jan Viktorin                  E-mail: Viktorin@RehiveTech.com
   System Architect              Web:    www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/4] support acl/lpm/table/pipeline libs for armv7 and armv8
  2015-12-01 18:41 [PATCH 0/4] support acl/lpm/table/pipeline libs for armv7 and armv8 Jianbo Liu
@ 2015-12-01 12:47 ` Jan Viktorin
  2015-12-01 20:56   ` Jianbo Liu
  2015-12-01 18:41 ` [PATCH 1/4] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h Jianbo Liu
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 50+ messages in thread
From: Jan Viktorin @ 2015-12-01 12:47 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

On Tue,  1 Dec 2015 13:41:12 -0500
Jianbo Liu <jianbo.liu@linaro.org> wrote:

> Hi,
> I'm from Linaro.org, and will work on DPDK to make it better
> runing on different ARM Platforms.
> 
> This patchset includes a small fix in rte_cycle_32.h,
> and enables acl/lpm/table/pipeline libs for armv7 and armv8.
> Please apply it after [PATCH v4 0/2] disable CONFIG_RTE_SCHED_VECTOR for arm.

Would it avoid some merge conflicts or is there some other dependency?

Jan

> 
> Thanks!
> Jianbo
> 
> 
> Jianbo Liu (4):
>   eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h
>   eal/acl: enable acl for armv7-a
>   eal/arm: Enable lpm/table/pipeline libs
>   maintainers: claim resposibility for ARMv7 and ARMv8
> 
>  MAINTAINERS                                        |  2 +
>  config/defconfig_arm-armv7a-linuxapp-gcc           |  4 --
>  config/defconfig_arm64-armv8a-linuxapp-gcc         |  3 -
>  lib/librte_acl/Makefile                            |  2 +-
>  lib/librte_acl/rte_acl.c                           |  2 +-
>  .../common/include/arch/arm/rte_cycles_32.h        |  2 +-
>  lib/librte_eal/common/include/arch/arm/rte_vect.h  | 51 ++++++++++++++++
>  lib/librte_lpm/rte_lpm.h                           | 68 ++++++++++++++++------
>  8 files changed, 105 insertions(+), 29 deletions(-)
> 



-- 
   Jan Viktorin                  E-mail: Viktorin@RehiveTech.com
   System Architect              Web:    www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/4] eal/acl: enable acl for armv7-a
  2015-12-01 18:41 ` [PATCH 2/4] eal/acl: enable acl for armv7-a Jianbo Liu
@ 2015-12-01 14:43   ` Jerin Jacob
  2015-12-01 14:46     ` Jan Viktorin
  0 siblings, 1 reply; 50+ messages in thread
From: Jerin Jacob @ 2015-12-01 14:43 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

On Tue, Dec 01, 2015 at 01:41:14PM -0500, Jianbo Liu wrote:
> Implement vqtbl1q_u8 intrinsic function, which is not support in armv7-a.
> 
> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
> ---
>  config/defconfig_arm-armv7a-linuxapp-gcc          |  1 -
>  lib/librte_acl/Makefile                           |  2 +-
>  lib/librte_acl/rte_acl.c                          |  2 +-
>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 23 +++++++++++++++++++++++
>  4 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> index 9924ff9..cbebd64 100644
> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> @@ -53,7 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
>  CONFIG_RTE_EAL_IGB_UIO=n
>  
>  # fails to compile on ARM
> -CONFIG_RTE_LIBRTE_ACL=n
>  CONFIG_RTE_LIBRTE_LPM=n
>  CONFIG_RTE_LIBRTE_TABLE=n
>  CONFIG_RTE_LIBRTE_PIPELINE=n
> diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
> index 897237d..2e394c9 100644
> --- a/lib/librte_acl/Makefile
> +++ b/lib/librte_acl/Makefile
> @@ -49,7 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_bld.c
>  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c
>  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_scalar.c
>  
> -ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
> +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
>  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_neon.c
>  CFLAGS_acl_run_neon.o += -flax-vector-conversions -Wno-maybe-uninitialized
>  else
> diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
> index e2fdebd..339aace 100644
> --- a/lib/librte_acl/rte_acl.c
> +++ b/lib/librte_acl/rte_acl.c
> @@ -114,7 +114,7 @@ rte_acl_init(void)
>  {
>  	enum rte_acl_classify_alg alg = RTE_ACL_CLASSIFY_DEFAULT;
>  
> -#ifdef RTE_ARCH_ARM64
> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>  	alg =  RTE_ACL_CLASSIFY_NEON;

I believe SIMD is optional in armv7. If true, select alg as
RTE_ACL_CLASSIFY_NEON only when cpufeature NEON enabled.

>  #else
>  #ifdef CC_AVX2_SUPPORT
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> index 21cdb4d..a33c054 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> @@ -53,6 +53,29 @@ typedef union rte_xmm {
>  	double   pd[XMM_SIZE / sizeof(double)];
>  } __attribute__((aligned(16))) rte_xmm_t;
>  
> +#ifdef RTE_ARCH_ARM
> +/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
> +static __inline uint8x16_t
> +vqtbl1q_u8(uint8x16_t a, uint8x16_t b)
> +{
> +	uint8_t i, pos;
> +	rte_xmm_t rte_a, rte_b, rte_ret;
> +
> +	vst1q_u8(rte_a.u8, a);
> +	vst1q_u8(rte_b.u8, b);
> +
> +	for (i = 0; i < 16; i++) {
> +		pos = rte_b.u8[i];
> +		if (pos < 16)
> +			rte_ret.u8[i] = rte_a.u8[pos];
> +		else
> +			rte_ret.u8[i] = 0;
> +	}
> +
> +	return vld1q_u8(rte_ret.u8);
> +}
> +#endif
> +
>  #ifdef __cplusplus
>  }
>  #endif
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/4] eal/acl: enable acl for armv7-a
  2015-12-01 14:43   ` Jerin Jacob
@ 2015-12-01 14:46     ` Jan Viktorin
  2015-12-02  6:14       ` Jianbo Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Viktorin @ 2015-12-01 14:46 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

On Tue, 1 Dec 2015 20:13:49 +0530
Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:

> >  	enum rte_acl_classify_alg alg = RTE_ACL_CLASSIFY_DEFAULT;
> >  
> > -#ifdef RTE_ARCH_ARM64
> > +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> >  	alg =  RTE_ACL_CLASSIFY_NEON;  
> 
> I believe SIMD is optional in armv7. If true, select alg as
> RTE_ACL_CLASSIFY_NEON only when cpufeature NEON enabled.

Yes. Or, probably, we can be happy with

#if defined(__ARM_NEON_FP)
...
#endif

as it is currently done in rte_memcpy_32.h.

Regards
Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-01 18:41 ` [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs Jianbo Liu
@ 2015-12-01 16:41   ` Jerin Jacob
  2015-12-01 17:02     ` Jan Viktorin
                       ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Jerin Jacob @ 2015-12-01 16:41 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote:
> Adds ARM NEON support for lpm.
> And enables table/pipeline libraries which depend on lpm.

I already sent the patch on the same yesterday.
We can converge the patches after the discussion.
Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml


> 
> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
> ---
>  config/defconfig_arm-armv7a-linuxapp-gcc          |  3 -
>  config/defconfig_arm64-armv8a-linuxapp-gcc        |  3 -
>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++++++++++
>  lib/librte_lpm/rte_lpm.h                          | 68 ++++++++++++++++-------
>  4 files changed, 77 insertions(+), 25 deletions(-)
> 
> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> index cbebd64..efffa1f 100644
> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
>  CONFIG_RTE_EAL_IGB_UIO=n
>  
>  # fails to compile on ARM
> -CONFIG_RTE_LIBRTE_LPM=n
> -CONFIG_RTE_LIBRTE_TABLE=n
> -CONFIG_RTE_LIBRTE_PIPELINE=n
>  CONFIG_RTE_SCHED_VECTOR=n
>  
>  # cannot use those on ARM
> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
> index 504f3ed..57f7941 100644
> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n
>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
>  CONFIG_RTE_LIBRTE_I40E_PMD=n
>  
> -CONFIG_RTE_LIBRTE_LPM=n
> -CONFIG_RTE_LIBRTE_TABLE=n
> -CONFIG_RTE_LIBRTE_PIPELINE=n
>  CONFIG_RTE_SCHED_VECTOR=n
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> index a33c054..7437711 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> @@ -41,6 +41,8 @@ extern "C" {
>  
>  typedef int32x4_t xmm_t;
>  
> +typedef int32x4_t __m128i;
> +
>  #define	XMM_SIZE	(sizeof(xmm_t))
>  #define	XMM_MASK	(XMM_SIZE - 1)
>  
> @@ -53,6 +55,32 @@ typedef union rte_xmm {
>  	double   pd[XMM_SIZE / sizeof(double)];
>  } __attribute__((aligned(16))) rte_xmm_t;
>  
> +static __inline __m128i
> +_mm_set_epi32(int i3, int i2, int i1, int i0)
> +{
> +	int32_t r[4] = {i0, i1, i2, i3};
> +
> +	return vld1q_s32(r);
> +}
> +
> +static __inline __m128i
> +_mm_loadu_si128(__m128i *p)
> +{
> +	return vld1q_s32((int32_t *)p);
> +}
> +
> +static __inline __m128i
> +_mm_set1_epi32(int i)
> +{
> +	return vdupq_n_s32(i);
> +}
> +
> +static __inline __m128i
> +_mm_and_si128(__m128i a, __m128i b)
> +{
> +	return vandq_s32(a, b);
> +}
> +

IMO, it makes sense to not emulate the SSE intrinsics with NEON
Let's create the rte_vect_* as required. look at the existing patch.


>  #ifdef RTE_ARCH_ARM
>  /* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
>  static __inline uint8x16_t
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index c299ce2..c76c07d 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -361,6 +361,47 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
>  /* Mask four results. */
>  #define	 RTE_LPM_MASKX4_RES	UINT64_C(0x00ff00ff00ff00ff)
>  
> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)

Separate out arm implementation to the different header file.
Too many ifdef looks odd in the header file and difficult to manage.


> +static inline void
> +rte_lpm_tbl24_val4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t tbl[4])
> +{
> +	uint32x4_t i24;
> +	uint32_t idx[4];
> +
> +	/* get 4 indexes for tbl24[]. */
> +	i24 = vshrq_n_u32(vreinterpretq_u32_s32(ip), CHAR_BIT);
> +	vst1q_u32(idx, i24);
> +
> +	/* extract values from tbl24[] */
> +	tbl[0] = *(const uint16_t *)&lpm->tbl24[idx[0]];
> +	tbl[1] = *(const uint16_t *)&lpm->tbl24[idx[1]];
> +	tbl[2] = *(const uint16_t *)&lpm->tbl24[idx[2]];
> +	tbl[3] = *(const uint16_t *)&lpm->tbl24[idx[3]];
> +}

Nice. There is an improvement in this portion code wrt my patch. This is
a candidate for convergence.


> +#else
> +static inline void
> +rte_lpm_tbl24_val4(const struct rte_lpm *lpm, __m128i ip, uint16_t tbl[4])
> +{
> +	__m128i i24;
> +	uint64_t idx;
> +
> +	/* get 4 indexes for tbl24[]. */
> +	i24 = _mm_srli_epi32(ip, CHAR_BIT);
> +
> +	/* extract values from tbl24[] */
> +	idx = _mm_cvtsi128_si64(i24);
> +	i24 = _mm_srli_si128(i24, sizeof(uint64_t));
> +
> +	tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> +	tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> +
> +	idx = _mm_cvtsi128_si64(i24);
> +
> +	tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> +	tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> +}
> +#endif
> +
>  /**
>   * Lookup four IP addresses in an LPM table.
>   *
> @@ -381,17 +422,19 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
>   *   if lookup would fail.
>   */
>  static inline void
> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> +rte_lpm_lookupx4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t hop[4],
> +	uint16_t defv)

This would call for change in the change the ABI,
IMO, __m128i can be used to represent 128bit vector to avoid ABI chang


> +#else
separate out arm implementation to the different header file. Too many
ifdef looks odd in the header file.

Could you  rebase your patch based on existing patch and send the
improvement portion as separate patch or I can send update patch with
your improvements and with your signoff.


>  rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
>  	uint16_t defv)
> +#endif
>  {
> -	__m128i i24;
>  	rte_xmm_t i8;
>  	uint16_t tbl[4];
> -	uint64_t idx, pt;
> -
> -	const __m128i mask8 =
> -		_mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX);
> +	uint64_t pt;
>  
> +	const __m128i mask8 = _mm_set1_epi32(UINT8_MAX);
>  	/*
>  	 * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries
>  	 * as one 64-bit value (0x0300030003000300).
> @@ -412,20 +455,7 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
>  		(uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 |
>  		(uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48);
>  
> -	/* get 4 indexes for tbl24[]. */
> -	i24 = _mm_srli_epi32(ip, CHAR_BIT);
> -
> -	/* extract values from tbl24[] */
> -	idx = _mm_cvtsi128_si64(i24);
> -	i24 = _mm_srli_si128(i24, sizeof(uint64_t));
> -
> -	tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> -	tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> -
> -	idx = _mm_cvtsi128_si64(i24);
> -
> -	tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> -	tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> +	rte_lpm_tbl24_val4(lpm, ip, tbl);
>  
>  	/* get 4 indexes for tbl8[]. */
>  	i8.x = _mm_and_si128(ip, mask8);
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/4] maintainers: claim resposibility for ARMv7 and ARMv8
  2015-12-01 18:41 ` [PATCH 4/4] maintainers: claim resposibility for ARMv7 and ARMv8 Jianbo Liu
@ 2015-12-01 16:44   ` Jerin Jacob
  0 siblings, 0 replies; 50+ messages in thread
From: Jerin Jacob @ 2015-12-01 16:44 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

On Tue, Dec 01, 2015 at 01:41:16PM -0500, Jianbo Liu wrote:
> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
> ---
>  MAINTAINERS | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4478862..f859985 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -124,10 +124,12 @@ F: doc/guides/sample_app_ug/multi_process.rst
>  
>  ARM v7
>  M: Jan Viktorin <viktorin@rehivetech.com>
> +M: Jianbo Liu <jianbo.liu@linaro.org>
>  F: lib/librte_eal/common/include/arch/arm/
>  
>  ARM v8
>  M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> +M: Jianbo Liu <jianbo.liu@linaro.org>

+1

>  F: lib/librte_eal/common/include/arch/arm/*_64.h
>  F: lib/librte_acl/acl_run_neon.*
>  
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-01 16:41   ` Jerin Jacob
@ 2015-12-01 17:02     ` Jan Viktorin
  2015-12-02  7:02     ` Jianbo Liu
       [not found]     ` <CAP4Qi3-5ofDU-2-4KsxFzMC1OpTsc5WjmxcFT2Eu_URA0UBzDw@mail.gmail.com>
  2 siblings, 0 replies; 50+ messages in thread
From: Jan Viktorin @ 2015-12-01 17:02 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

On Tue, 1 Dec 2015 22:11:42 +0530
Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:

> On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote:
> > Adds ARM NEON support for lpm.
> > And enables table/pipeline libraries which depend on lpm.  
> 
> I already sent the patch on the same yesterday.
> We can converge the patches after the discussion.
> Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml

I've missed that too. Did you CC me?

Jan


-- 
   Jan Viktorin                  E-mail: Viktorin@RehiveTech.com
   System Architect              Web:    www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 0/4] support acl/lpm/table/pipeline libs for armv7 and armv8
@ 2015-12-01 18:41 Jianbo Liu
  2015-12-01 12:47 ` Jan Viktorin
                   ` (5 more replies)
  0 siblings, 6 replies; 50+ messages in thread
From: Jianbo Liu @ 2015-12-01 18:41 UTC (permalink / raw)
  To: dev

Hi,
I'm from Linaro.org, and will work on DPDK to make it better
runing on different ARM Platforms.

This patchset includes a small fix in rte_cycle_32.h,
and enables acl/lpm/table/pipeline libs for armv7 and armv8.
Please apply it after [PATCH v4 0/2] disable CONFIG_RTE_SCHED_VECTOR for arm.

Thanks!
Jianbo


Jianbo Liu (4):
  eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h
  eal/acl: enable acl for armv7-a
  eal/arm: Enable lpm/table/pipeline libs
  maintainers: claim resposibility for ARMv7 and ARMv8

 MAINTAINERS                                        |  2 +
 config/defconfig_arm-armv7a-linuxapp-gcc           |  4 --
 config/defconfig_arm64-armv8a-linuxapp-gcc         |  3 -
 lib/librte_acl/Makefile                            |  2 +-
 lib/librte_acl/rte_acl.c                           |  2 +-
 .../common/include/arch/arm/rte_cycles_32.h        |  2 +-
 lib/librte_eal/common/include/arch/arm/rte_vect.h  | 51 ++++++++++++++++
 lib/librte_lpm/rte_lpm.h                           | 68 ++++++++++++++++------
 8 files changed, 105 insertions(+), 29 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/4] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h
  2015-12-01 18:41 [PATCH 0/4] support acl/lpm/table/pipeline libs for armv7 and armv8 Jianbo Liu
  2015-12-01 12:47 ` Jan Viktorin
@ 2015-12-01 18:41 ` Jianbo Liu
  2015-12-01 12:41   ` Jan Viktorin
  2015-12-01 12:43   ` Jan Viktorin
  2015-12-01 18:41 ` [PATCH 2/4] eal/acl: enable acl for armv7-a Jianbo Liu
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 50+ messages in thread
From: Jianbo Liu @ 2015-12-01 18:41 UTC (permalink / raw)
  To: dev

CONFIG_* from config files can not be used in code.

Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
---
 lib/librte_eal/common/include/arch/arm/rte_cycles_32.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h b/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
index 6c6098e..9c1be71 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
@@ -54,7 +54,7 @@ extern "C" {
  * @return
  *   The time base for this lcore.
  */
-#ifndef CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU
+#ifndef RTE_ARM_EAL_RDTSC_USE_PMU
 
 /**
  * This call is easily portable to any ARM architecture, however,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 2/4] eal/acl: enable acl for armv7-a
  2015-12-01 18:41 [PATCH 0/4] support acl/lpm/table/pipeline libs for armv7 and armv8 Jianbo Liu
  2015-12-01 12:47 ` Jan Viktorin
  2015-12-01 18:41 ` [PATCH 1/4] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h Jianbo Liu
@ 2015-12-01 18:41 ` Jianbo Liu
  2015-12-01 14:43   ` Jerin Jacob
  2015-12-01 18:41 ` [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs Jianbo Liu
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 50+ messages in thread
From: Jianbo Liu @ 2015-12-01 18:41 UTC (permalink / raw)
  To: dev

Implement vqtbl1q_u8 intrinsic function, which is not support in armv7-a.

Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
---
 config/defconfig_arm-armv7a-linuxapp-gcc          |  1 -
 lib/librte_acl/Makefile                           |  2 +-
 lib/librte_acl/rte_acl.c                          |  2 +-
 lib/librte_eal/common/include/arch/arm/rte_vect.h | 23 +++++++++++++++++++++++
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
index 9924ff9..cbebd64 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -53,7 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
 CONFIG_RTE_EAL_IGB_UIO=n
 
 # fails to compile on ARM
-CONFIG_RTE_LIBRTE_ACL=n
 CONFIG_RTE_LIBRTE_LPM=n
 CONFIG_RTE_LIBRTE_TABLE=n
 CONFIG_RTE_LIBRTE_PIPELINE=n
diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
index 897237d..2e394c9 100644
--- a/lib/librte_acl/Makefile
+++ b/lib/librte_acl/Makefile
@@ -49,7 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_bld.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_scalar.c
 
-ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
+ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_neon.c
 CFLAGS_acl_run_neon.o += -flax-vector-conversions -Wno-maybe-uninitialized
 else
diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
index e2fdebd..339aace 100644
--- a/lib/librte_acl/rte_acl.c
+++ b/lib/librte_acl/rte_acl.c
@@ -114,7 +114,7 @@ rte_acl_init(void)
 {
 	enum rte_acl_classify_alg alg = RTE_ACL_CLASSIFY_DEFAULT;
 
-#ifdef RTE_ARCH_ARM64
+#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
 	alg =  RTE_ACL_CLASSIFY_NEON;
 #else
 #ifdef CC_AVX2_SUPPORT
diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
index 21cdb4d..a33c054 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
@@ -53,6 +53,29 @@ typedef union rte_xmm {
 	double   pd[XMM_SIZE / sizeof(double)];
 } __attribute__((aligned(16))) rte_xmm_t;
 
+#ifdef RTE_ARCH_ARM
+/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
+static __inline uint8x16_t
+vqtbl1q_u8(uint8x16_t a, uint8x16_t b)
+{
+	uint8_t i, pos;
+	rte_xmm_t rte_a, rte_b, rte_ret;
+
+	vst1q_u8(rte_a.u8, a);
+	vst1q_u8(rte_b.u8, b);
+
+	for (i = 0; i < 16; i++) {
+		pos = rte_b.u8[i];
+		if (pos < 16)
+			rte_ret.u8[i] = rte_a.u8[pos];
+		else
+			rte_ret.u8[i] = 0;
+	}
+
+	return vld1q_u8(rte_ret.u8);
+}
+#endif
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-01 18:41 [PATCH 0/4] support acl/lpm/table/pipeline libs for armv7 and armv8 Jianbo Liu
                   ` (2 preceding siblings ...)
  2015-12-01 18:41 ` [PATCH 2/4] eal/acl: enable acl for armv7-a Jianbo Liu
@ 2015-12-01 18:41 ` Jianbo Liu
  2015-12-01 16:41   ` Jerin Jacob
  2015-12-01 18:41 ` [PATCH 4/4] maintainers: claim resposibility for ARMv7 and ARMv8 Jianbo Liu
  2015-12-03 15:02 ` [PATCH v2 0/3] support acl lib for armv7-a and a small fix Jianbo Liu
  5 siblings, 1 reply; 50+ messages in thread
From: Jianbo Liu @ 2015-12-01 18:41 UTC (permalink / raw)
  To: dev

Adds ARM NEON support for lpm.
And enables table/pipeline libraries which depend on lpm.

Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
---
 config/defconfig_arm-armv7a-linuxapp-gcc          |  3 -
 config/defconfig_arm64-armv8a-linuxapp-gcc        |  3 -
 lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++++++++++
 lib/librte_lpm/rte_lpm.h                          | 68 ++++++++++++++++-------
 4 files changed, 77 insertions(+), 25 deletions(-)

diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
index cbebd64..efffa1f 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
 CONFIG_RTE_EAL_IGB_UIO=n
 
 # fails to compile on ARM
-CONFIG_RTE_LIBRTE_LPM=n
-CONFIG_RTE_LIBRTE_TABLE=n
-CONFIG_RTE_LIBRTE_PIPELINE=n
 CONFIG_RTE_SCHED_VECTOR=n
 
 # cannot use those on ARM
diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
index 504f3ed..57f7941 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n
 CONFIG_RTE_LIBRTE_FM10K_PMD=n
 CONFIG_RTE_LIBRTE_I40E_PMD=n
 
-CONFIG_RTE_LIBRTE_LPM=n
-CONFIG_RTE_LIBRTE_TABLE=n
-CONFIG_RTE_LIBRTE_PIPELINE=n
 CONFIG_RTE_SCHED_VECTOR=n
diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
index a33c054..7437711 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
@@ -41,6 +41,8 @@ extern "C" {
 
 typedef int32x4_t xmm_t;
 
+typedef int32x4_t __m128i;
+
 #define	XMM_SIZE	(sizeof(xmm_t))
 #define	XMM_MASK	(XMM_SIZE - 1)
 
@@ -53,6 +55,32 @@ typedef union rte_xmm {
 	double   pd[XMM_SIZE / sizeof(double)];
 } __attribute__((aligned(16))) rte_xmm_t;
 
+static __inline __m128i
+_mm_set_epi32(int i3, int i2, int i1, int i0)
+{
+	int32_t r[4] = {i0, i1, i2, i3};
+
+	return vld1q_s32(r);
+}
+
+static __inline __m128i
+_mm_loadu_si128(__m128i *p)
+{
+	return vld1q_s32((int32_t *)p);
+}
+
+static __inline __m128i
+_mm_set1_epi32(int i)
+{
+	return vdupq_n_s32(i);
+}
+
+static __inline __m128i
+_mm_and_si128(__m128i a, __m128i b)
+{
+	return vandq_s32(a, b);
+}
+
 #ifdef RTE_ARCH_ARM
 /* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
 static __inline uint8x16_t
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index c299ce2..c76c07d 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -361,6 +361,47 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
 /* Mask four results. */
 #define	 RTE_LPM_MASKX4_RES	UINT64_C(0x00ff00ff00ff00ff)
 
+#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
+static inline void
+rte_lpm_tbl24_val4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t tbl[4])
+{
+	uint32x4_t i24;
+	uint32_t idx[4];
+
+	/* get 4 indexes for tbl24[]. */
+	i24 = vshrq_n_u32(vreinterpretq_u32_s32(ip), CHAR_BIT);
+	vst1q_u32(idx, i24);
+
+	/* extract values from tbl24[] */
+	tbl[0] = *(const uint16_t *)&lpm->tbl24[idx[0]];
+	tbl[1] = *(const uint16_t *)&lpm->tbl24[idx[1]];
+	tbl[2] = *(const uint16_t *)&lpm->tbl24[idx[2]];
+	tbl[3] = *(const uint16_t *)&lpm->tbl24[idx[3]];
+}
+#else
+static inline void
+rte_lpm_tbl24_val4(const struct rte_lpm *lpm, __m128i ip, uint16_t tbl[4])
+{
+	__m128i i24;
+	uint64_t idx;
+
+	/* get 4 indexes for tbl24[]. */
+	i24 = _mm_srli_epi32(ip, CHAR_BIT);
+
+	/* extract values from tbl24[] */
+	idx = _mm_cvtsi128_si64(i24);
+	i24 = _mm_srli_si128(i24, sizeof(uint64_t));
+
+	tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
+	tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
+
+	idx = _mm_cvtsi128_si64(i24);
+
+	tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
+	tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
+}
+#endif
+
 /**
  * Lookup four IP addresses in an LPM table.
  *
@@ -381,17 +422,19 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
  *   if lookup would fail.
  */
 static inline void
+#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
+rte_lpm_lookupx4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t hop[4],
+	uint16_t defv)
+#else
 rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
 	uint16_t defv)
+#endif
 {
-	__m128i i24;
 	rte_xmm_t i8;
 	uint16_t tbl[4];
-	uint64_t idx, pt;
-
-	const __m128i mask8 =
-		_mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX);
+	uint64_t pt;
 
+	const __m128i mask8 = _mm_set1_epi32(UINT8_MAX);
 	/*
 	 * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries
 	 * as one 64-bit value (0x0300030003000300).
@@ -412,20 +455,7 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
 		(uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 |
 		(uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48);
 
-	/* get 4 indexes for tbl24[]. */
-	i24 = _mm_srli_epi32(ip, CHAR_BIT);
-
-	/* extract values from tbl24[] */
-	idx = _mm_cvtsi128_si64(i24);
-	i24 = _mm_srli_si128(i24, sizeof(uint64_t));
-
-	tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
-	tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
-
-	idx = _mm_cvtsi128_si64(i24);
-
-	tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
-	tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
+	rte_lpm_tbl24_val4(lpm, ip, tbl);
 
 	/* get 4 indexes for tbl8[]. */
 	i8.x = _mm_and_si128(ip, mask8);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 4/4] maintainers: claim resposibility for ARMv7 and ARMv8
  2015-12-01 18:41 [PATCH 0/4] support acl/lpm/table/pipeline libs for armv7 and armv8 Jianbo Liu
                   ` (3 preceding siblings ...)
  2015-12-01 18:41 ` [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs Jianbo Liu
@ 2015-12-01 18:41 ` Jianbo Liu
  2015-12-01 16:44   ` Jerin Jacob
  2015-12-03 15:02 ` [PATCH v2 0/3] support acl lib for armv7-a and a small fix Jianbo Liu
  5 siblings, 1 reply; 50+ messages in thread
From: Jianbo Liu @ 2015-12-01 18:41 UTC (permalink / raw)
  To: dev

Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4478862..f859985 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -124,10 +124,12 @@ F: doc/guides/sample_app_ug/multi_process.rst
 
 ARM v7
 M: Jan Viktorin <viktorin@rehivetech.com>
+M: Jianbo Liu <jianbo.liu@linaro.org>
 F: lib/librte_eal/common/include/arch/arm/
 
 ARM v8
 M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
+M: Jianbo Liu <jianbo.liu@linaro.org>
 F: lib/librte_eal/common/include/arch/arm/*_64.h
 F: lib/librte_acl/acl_run_neon.*
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/4] support acl/lpm/table/pipeline libs for armv7 and armv8
  2015-12-01 12:47 ` Jan Viktorin
@ 2015-12-01 20:56   ` Jianbo Liu
  0 siblings, 0 replies; 50+ messages in thread
From: Jianbo Liu @ 2015-12-01 20:56 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: dev

On Tue, Dec 01, 2015 at 01:47:23PM +0100, Jan Viktorin wrote:
> On Tue,  1 Dec 2015 13:41:12 -0500
> Jianbo Liu <jianbo.liu@linaro.org> wrote:
> 
> > Hi,
> > I'm from Linaro.org, and will work on DPDK to make it better
> > runing on different ARM Platforms.
> > 
> > This patchset includes a small fix in rte_cycle_32.h,
> > and enables acl/lpm/table/pipeline libs for armv7 and armv8.
> > Please apply it after [PATCH v4 0/2] disable CONFIG_RTE_SCHED_VECTOR for arm.
> 
> Would it avoid some merge conflicts or is there some other dependency?
> 
There is no conflicts, but please apply Jerin's patch first since this
patchset is based on that.

> Jan
> 
> > 
> > Thanks!
> > Jianbo
> > 
> > 
> > Jianbo Liu (4):
> >   eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h
> >   eal/acl: enable acl for armv7-a
> >   eal/arm: Enable lpm/table/pipeline libs
> >   maintainers: claim resposibility for ARMv7 and ARMv8
> > 
> >  MAINTAINERS                                        |  2 +
> >  config/defconfig_arm-armv7a-linuxapp-gcc           |  4 --
> >  config/defconfig_arm64-armv8a-linuxapp-gcc         |  3 -
> >  lib/librte_acl/Makefile                            |  2 +-
> >  lib/librte_acl/rte_acl.c                           |  2 +-
> >  .../common/include/arch/arm/rte_cycles_32.h        |  2 +-
> >  lib/librte_eal/common/include/arch/arm/rte_vect.h  | 51 ++++++++++++++++
> >  lib/librte_lpm/rte_lpm.h                           | 68 ++++++++++++++++------
> >  8 files changed, 105 insertions(+), 29 deletions(-)
> > 
> 
> 
> 
> -- 
>    Jan Viktorin                  E-mail: Viktorin@RehiveTech.com
>    System Architect              Web:    www.RehiveTech.com
>    RehiveTech
>    Brno, Czech Republic

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/4] eal/acl: enable acl for armv7-a
  2015-12-01 14:46     ` Jan Viktorin
@ 2015-12-02  6:14       ` Jianbo Liu
  0 siblings, 0 replies; 50+ messages in thread
From: Jianbo Liu @ 2015-12-02  6:14 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: dev

On 1 December 2015 at 22:46, Jan Viktorin <viktorin@rehivetech.com> wrote:
> On Tue, 1 Dec 2015 20:13:49 +0530
> Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
>
>> >     enum rte_acl_classify_alg alg = RTE_ACL_CLASSIFY_DEFAULT;
>> >
>> > -#ifdef RTE_ARCH_ARM64
>> > +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>> >     alg =  RTE_ACL_CLASSIFY_NEON;
>>
>> I believe SIMD is optional in armv7. If true, select alg as
>> RTE_ACL_CLASSIFY_NEON only when cpufeature NEON enabled.
>
> Yes. Or, probably, we can be happy with
>
> #if defined(__ARM_NEON_FP)
> ...
> #endif
>
> as it is currently done in rte_memcpy_32.h.
>
> Regards
> Jan

Athough optional for armv7, I believe there is NEON in most of the
popular armv7a chips.
Anyway, I will add the checking...

Thanks!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-01 16:41   ` Jerin Jacob
  2015-12-01 17:02     ` Jan Viktorin
@ 2015-12-02  7:02     ` Jianbo Liu
       [not found]     ` <CAP4Qi3-5ofDU-2-4KsxFzMC1OpTsc5WjmxcFT2Eu_URA0UBzDw@mail.gmail.com>
  2 siblings, 0 replies; 50+ messages in thread
From: Jianbo Liu @ 2015-12-02  7:02 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

On 2 December 2015 at 00:41, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote:
>> Adds ARM NEON support for lpm.
>> And enables table/pipeline libraries which depend on lpm.
>
> I already sent the patch on the same yesterday.
> We can converge the patches after the discussion.
> Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml
>
Yes, I have read your patch. But there are many differences, so I sent
mine for your reviewing :)

>
>>
>> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
>> ---
>>  config/defconfig_arm-armv7a-linuxapp-gcc          |  3 -
>>  config/defconfig_arm64-armv8a-linuxapp-gcc        |  3 -
>>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++++++++++
>>  lib/librte_lpm/rte_lpm.h                          | 68 ++++++++++++++++-------
>>  4 files changed, 77 insertions(+), 25 deletions(-)
>>
>> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
>> index cbebd64..efffa1f 100644
>> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
>> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
>> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
>>  CONFIG_RTE_EAL_IGB_UIO=n
>>
>>  # fails to compile on ARM
>> -CONFIG_RTE_LIBRTE_LPM=n
>> -CONFIG_RTE_LIBRTE_TABLE=n
>> -CONFIG_RTE_LIBRTE_PIPELINE=n
>>  CONFIG_RTE_SCHED_VECTOR=n
>>
>>  # cannot use those on ARM
>> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
>> index 504f3ed..57f7941 100644
>> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
>> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
>> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n
>>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
>>  CONFIG_RTE_LIBRTE_I40E_PMD=n
>>
>> -CONFIG_RTE_LIBRTE_LPM=n
>> -CONFIG_RTE_LIBRTE_TABLE=n
>> -CONFIG_RTE_LIBRTE_PIPELINE=n
>>  CONFIG_RTE_SCHED_VECTOR=n
>> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> index a33c054..7437711 100644
>> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> @@ -41,6 +41,8 @@ extern "C" {
>>
>>  typedef int32x4_t xmm_t;
>>
>> +typedef int32x4_t __m128i;
>> +
>>  #define      XMM_SIZE        (sizeof(xmm_t))
>>  #define      XMM_MASK        (XMM_SIZE - 1)
>>
>> @@ -53,6 +55,32 @@ typedef union rte_xmm {
>>       double   pd[XMM_SIZE / sizeof(double)];
>>  } __attribute__((aligned(16))) rte_xmm_t;
>>
>> +static __inline __m128i
>> +_mm_set_epi32(int i3, int i2, int i1, int i0)
>> +{
>> +     int32_t r[4] = {i0, i1, i2, i3};
>> +
>> +     return vld1q_s32(r);
>> +}
>> +
>> +static __inline __m128i
>> +_mm_loadu_si128(__m128i *p)
>> +{
>> +     return vld1q_s32((int32_t *)p);
>> +}
>> +
>> +static __inline __m128i
>> +_mm_set1_epi32(int i)
>> +{
>> +     return vdupq_n_s32(i);
>> +}
>> +
>> +static __inline __m128i
>> +_mm_and_si128(__m128i a, __m128i b)
>> +{
>> +     return vandq_s32(a, b);
>> +}
>> +
>
> IMO, it makes sense to not emulate the SSE intrinsics with NEON
> Let's create the rte_vect_* as required. look at the existing patch.
>
I thought of creating a layer of SIMD over all the platforms before.
But can't you see it make things complicated, considering there are
only few simple intrinsic to implement?
If do so, we also need to explain to others how to use these interfaces.
Besides, this patch did the smallest changes to the original code, and
more likely to be accepted by others.

>
>>  #ifdef RTE_ARCH_ARM
>>  /* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
>>  static __inline uint8x16_t
>> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
>> index c299ce2..c76c07d 100644
>> --- a/lib/librte_lpm/rte_lpm.h
>> +++ b/lib/librte_lpm/rte_lpm.h
>> @@ -361,6 +361,47 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
>>  /* Mask four results. */
>>  #define       RTE_LPM_MASKX4_RES     UINT64_C(0x00ff00ff00ff00ff)
>>
>> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>
> Separate out arm implementation to the different header file.
> Too many ifdef looks odd in the header file and difficult to manage.
>
But there are many ifdefs already.
And It seems unreasonable to add a new file only for one small function.

>
>> +static inline void
>> +rte_lpm_tbl24_val4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t tbl[4])
>> +{
>> +     uint32x4_t i24;
>> +     uint32_t idx[4];
>> +
>> +     /* get 4 indexes for tbl24[]. */
>> +     i24 = vshrq_n_u32(vreinterpretq_u32_s32(ip), CHAR_BIT);
>> +     vst1q_u32(idx, i24);
>> +
>> +     /* extract values from tbl24[] */
>> +     tbl[0] = *(const uint16_t *)&lpm->tbl24[idx[0]];
>> +     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx[1]];
>> +     tbl[2] = *(const uint16_t *)&lpm->tbl24[idx[2]];
>> +     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx[3]];
>> +}
>
> Nice. There is an improvement in this portion code wrt my patch. This is
> a candidate for convergence.
>
>
>> +#else
>> +static inline void
>> +rte_lpm_tbl24_val4(const struct rte_lpm *lpm, __m128i ip, uint16_t tbl[4])
>> +{
>> +     __m128i i24;
>> +     uint64_t idx;
>> +
>> +     /* get 4 indexes for tbl24[]. */
>> +     i24 = _mm_srli_epi32(ip, CHAR_BIT);
>> +
>> +     /* extract values from tbl24[] */
>> +     idx = _mm_cvtsi128_si64(i24);
>> +     i24 = _mm_srli_si128(i24, sizeof(uint64_t));
>> +
>> +     tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
>> +     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>> +
>> +     idx = _mm_cvtsi128_si64(i24);
>> +
>> +     tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
>> +     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>> +}
>> +#endif
>> +
>>  /**
>>   * Lookup four IP addresses in an LPM table.
>>   *
>> @@ -381,17 +422,19 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
>>   *   if lookup would fail.
>>   */
>>  static inline void
>> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>> +rte_lpm_lookupx4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t hop[4],
>> +     uint16_t defv)
>
> This would call for change in the change the ABI,
> IMO, __m128i can be used to represent 128bit vector to avoid ABI chang
>
This redefine rte_lpm_lookupx4 is unncessary, I will remove it, so no
ABI change.
And there only one ifdef for ARM platforms left.

>
>> +#else
> separate out arm implementation to the different header file. Too many
> ifdef looks odd in the header file.
>
> Could you  rebase your patch based on existing patch and send the
> improvement portion as separate patch or I can send update patch with
> your improvements and with your signoff.
>
>
>>  rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
>>       uint16_t defv)
>> +#endif
>>  {
>> -     __m128i i24;
>>       rte_xmm_t i8;
>>       uint16_t tbl[4];
>> -     uint64_t idx, pt;
>> -
>> -     const __m128i mask8 =
>> -             _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX);
>> +     uint64_t pt;
>>
>> +     const __m128i mask8 = _mm_set1_epi32(UINT8_MAX);
>>       /*
>>        * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries
>>        * as one 64-bit value (0x0300030003000300).
>> @@ -412,20 +455,7 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
>>               (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 |
>>               (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48);
>>
>> -     /* get 4 indexes for tbl24[]. */
>> -     i24 = _mm_srli_epi32(ip, CHAR_BIT);
>> -
>> -     /* extract values from tbl24[] */
>> -     idx = _mm_cvtsi128_si64(i24);
>> -     i24 = _mm_srli_si128(i24, sizeof(uint64_t));
>> -
>> -     tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
>> -     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>> -
>> -     idx = _mm_cvtsi128_si64(i24);
>> -
>> -     tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
>> -     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>> +     rte_lpm_tbl24_val4(lpm, ip, tbl);
>>
>>       /* get 4 indexes for tbl8[]. */
>>       i8.x = _mm_and_si128(ip, mask8);
>> --
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
       [not found]     ` <CAP4Qi3-5ofDU-2-4KsxFzMC1OpTsc5WjmxcFT2Eu_URA0UBzDw@mail.gmail.com>
@ 2015-12-02  8:03       ` Jerin Jacob
  2015-12-02  9:49         ` Jianbo Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Jerin Jacob @ 2015-12-02  8:03 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

On Wed, Dec 02, 2015 at 02:54:52PM +0800, Jianbo Liu wrote:
> On 2 December 2015 at 00:41, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote:
> >> Adds ARM NEON support for lpm.
> >> And enables table/pipeline libraries which depend on lpm.
> >
> > I already sent the patch on the same yesterday.
> > We can converge the patches after the discussion.
> > Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml
> >
> Yes, I have read your patch. But there are many differences, so I sent
> mine for your reviewing :)
> 
> >
> >>
> >> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
> >> ---
> >>  config/defconfig_arm-armv7a-linuxapp-gcc          |  3 -
> >>  config/defconfig_arm64-armv8a-linuxapp-gcc        |  3 -
> >>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++++++++++
> >>  lib/librte_lpm/rte_lpm.h                          | 68 ++++++++++++++++-------
> >>  4 files changed, 77 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> >> index cbebd64..efffa1f 100644
> >> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> >> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> >> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
> >>  CONFIG_RTE_EAL_IGB_UIO=n
> >>
> >>  # fails to compile on ARM
> >> -CONFIG_RTE_LIBRTE_LPM=n
> >> -CONFIG_RTE_LIBRTE_TABLE=n
> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
> >>  CONFIG_RTE_SCHED_VECTOR=n
> >>
> >>  # cannot use those on ARM
> >> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> index 504f3ed..57f7941 100644
> >> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n
> >>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
> >>  CONFIG_RTE_LIBRTE_I40E_PMD=n
> >>
> >> -CONFIG_RTE_LIBRTE_LPM=n
> >> -CONFIG_RTE_LIBRTE_TABLE=n
> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
> >>  CONFIG_RTE_SCHED_VECTOR=n
> >> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> index a33c054..7437711 100644
> >> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> @@ -41,6 +41,8 @@ extern "C" {
> >>
> >>  typedef int32x4_t xmm_t;
> >>
> >> +typedef int32x4_t __m128i;
> >> +
> >>  #define      XMM_SIZE        (sizeof(xmm_t))
> >>  #define      XMM_MASK        (XMM_SIZE - 1)
> >>
> >> @@ -53,6 +55,32 @@ typedef union rte_xmm {
> >>       double   pd[XMM_SIZE / sizeof(double)];
> >>  } __attribute__((aligned(16))) rte_xmm_t;
> >>
> >> +static __inline __m128i
> >> +_mm_set_epi32(int i3, int i2, int i1, int i0)
> >> +{
> >> +     int32_t r[4] = {i0, i1, i2, i3};
> >> +
> >> +     return vld1q_s32(r);
> >> +}
> >> +
> >> +static __inline __m128i
> >> +_mm_loadu_si128(__m128i *p)
> >> +{
> >> +     return vld1q_s32((int32_t *)p);
> >> +}
> >> +
> >> +static __inline __m128i
> >> +_mm_set1_epi32(int i)
> >> +{
> >> +     return vdupq_n_s32(i);
> >> +}
> >> +
> >> +static __inline __m128i
> >> +_mm_and_si128(__m128i a, __m128i b)
> >> +{
> >> +     return vandq_s32(a, b);
> >> +}
> >> +

IMO, it's not always good to emulate GCC defined intrinsics of
other architecture. What if a legacy DPDK application has such mappings
then BOOM, multiple definition, which one is correct? which one
to comment it out? Integration pain starts for DPDK library consumer:-(

> >
> > IMO, it makes sense to not emulate the SSE intrinsics with NEON
> > Let's create the rte_vect_* as required. look at the existing patch.
> >
> I thought of creating a layer of SIMD over all the platforms before.
> But can't you see it make things complicated, considering there are
> only few simple intrinsic to implement?

Not true, There were, a lot of SSE intrinsics needs be to emulated for ACL NEON
implementation if I were to take this approach and emulation comes with
the cost.

So my take is,
lets the each architecture implementation for specific SIMD version of DPDK
API in the library should have the freedom to implement the API in
NATIVE.

And let's create only rte_vect_* abstraction only for using
that API/library. Which boils down to have very minimal rte_vect_*
abstraction to load, store, set not beyond that.

This makes clear "contract" between DPDK library and the applications.
and make easy for remaning new architecture  porting effort in DPDK.

Imagine how your proposed function will look like if new architecture
wants to implement "optimized" version of rte_lpm_lookupx4


> If do so, we also need to explain to others how to use these interfaces.
> Besides, this patch did the smallest changes to the original code, and
> more likely to be accepted by others.

other patch makes no changes to IA version of rte_lpm_lookupx4.I thought
that make reviewer easy to review the changes in architecture
perspective.

> 
> >
> >>  #ifdef RTE_ARCH_ARM
> >>  /* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
> >>  static __inline uint8x16_t
> >> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> >> index c299ce2..c76c07d 100644
> >> --- a/lib/librte_lpm/rte_lpm.h
> >> +++ b/lib/librte_lpm/rte_lpm.h
> >> @@ -361,6 +361,47 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
> >>  /* Mask four results. */
> >>  #define       RTE_LPM_MASKX4_RES     UINT64_C(0x00ff00ff00ff00ff)
> >>
> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> >
> > Separate out arm implementation to the different header file.
> > Too many ifdef looks odd in the header file and difficult to manage.
> >
> But there are many ifdefs already.
> And It seems unreasonable to add a new file only for one small function.
> 

small or big, its matter of each architecture to have
the freedom for the optimized version for the implementation.

What if  other architecture demands to write this function in assembly
or restructure it for performance improvement?


> >
> >> +static inline void
> >> +rte_lpm_tbl24_val4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t tbl[4])
> >> +{
> >> +     uint32x4_t i24;
> >> +     uint32_t idx[4];
> >> +
> >> +     /* get 4 indexes for tbl24[]. */
> >> +     i24 = vshrq_n_u32(vreinterpretq_u32_s32(ip), CHAR_BIT);
> >> +     vst1q_u32(idx, i24);
> >> +
> >> +     /* extract values from tbl24[] */
> >> +     tbl[0] = *(const uint16_t *)&lpm->tbl24[idx[0]];
> >> +     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx[1]];
> >> +     tbl[2] = *(const uint16_t *)&lpm->tbl24[idx[2]];
> >> +     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx[3]];
> >> +}
> >
> > Nice. There is an improvement in this portion code wrt my patch. This is
> > a candidate for convergence.
> >
> >
> >> +#else
> >> +static inline void
> >> +rte_lpm_tbl24_val4(const struct rte_lpm *lpm, __m128i ip, uint16_t tbl[4])
> >> +{
> >> +     __m128i i24;
> >> +     uint64_t idx;
> >> +
> >> +     /* get 4 indexes for tbl24[]. */
> >> +     i24 = _mm_srli_epi32(ip, CHAR_BIT);
> >> +
> >> +     /* extract values from tbl24[] */
> >> +     idx = _mm_cvtsi128_si64(i24);
> >> +     i24 = _mm_srli_si128(i24, sizeof(uint64_t));
> >> +
> >> +     tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> >> +     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >> +
> >> +     idx = _mm_cvtsi128_si64(i24);
> >> +
> >> +     tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> >> +     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >> +}
> >> +#endif
> >> +
> >>  /**
> >>   * Lookup four IP addresses in an LPM table.
> >>   *
> >> @@ -381,17 +422,19 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
> >>   *   if lookup would fail.
> >>   */
> >>  static inline void
> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> >> +rte_lpm_lookupx4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t hop[4],
> >> +     uint16_t defv)
> >
> > This would call for change in the change the ABI,
> > IMO, __m128i can be used to represent 128bit vector to avoid ABI chang
> >
> This redefine rte_lpm_lookupx4 is unncessary, I will remove it, so no
> ABI change.
> And there only one ifdef for ARM platforms left.
> 
> >
> >> +#else
> > separate out arm implementation to the different header file. Too many
> > ifdef looks odd in the header file.
> >
> > Could you  rebase your patch based on existing patch and send the
> > improvement portion as separate patch or I can send update patch with
> > your improvements and with your signoff.
> >
> >
> >>  rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
> >>       uint16_t defv)
> >> +#endif
> >>  {
> >> -     __m128i i24;
> >>       rte_xmm_t i8;
> >>       uint16_t tbl[4];
> >> -     uint64_t idx, pt;
> >> -
> >> -     const __m128i mask8 =
> >> -             _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX);
> >> +     uint64_t pt;
> >>
> >> +     const __m128i mask8 = _mm_set1_epi32(UINT8_MAX);
> >>       /*
> >>        * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries
> >>        * as one 64-bit value (0x0300030003000300).
> >> @@ -412,20 +455,7 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
> >>               (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 |
> >>               (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48);
> >>
> >> -     /* get 4 indexes for tbl24[]. */
> >> -     i24 = _mm_srli_epi32(ip, CHAR_BIT);
> >> -
> >> -     /* extract values from tbl24[] */
> >> -     idx = _mm_cvtsi128_si64(i24);
> >> -     i24 = _mm_srli_si128(i24, sizeof(uint64_t));
> >> -
> >> -     tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> >> -     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >> -
> >> -     idx = _mm_cvtsi128_si64(i24);
> >> -
> >> -     tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> >> -     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >> +     rte_lpm_tbl24_val4(lpm, ip, tbl);
> >>
> >>       /* get 4 indexes for tbl8[]. */
> >>       i8.x = _mm_and_si128(ip, mask8);
> >> --
> >> 1.8.3.1
> >>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-02  8:03       ` Jerin Jacob
@ 2015-12-02  9:49         ` Jianbo Liu
  2015-12-02 10:33           ` Ananyev, Konstantin
  2015-12-02 10:39           ` Jerin Jacob
  0 siblings, 2 replies; 50+ messages in thread
From: Jianbo Liu @ 2015-12-02  9:49 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

On 2 December 2015 at 16:03, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> On Wed, Dec 02, 2015 at 02:54:52PM +0800, Jianbo Liu wrote:
>> On 2 December 2015 at 00:41, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
>> > On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote:
>> >> Adds ARM NEON support for lpm.
>> >> And enables table/pipeline libraries which depend on lpm.
>> >
>> > I already sent the patch on the same yesterday.
>> > We can converge the patches after the discussion.
>> > Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml
>> >
>> Yes, I have read your patch. But there are many differences, so I sent
>> mine for your reviewing :)
>>
>> >
>> >>
>> >> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
>> >> ---
>> >>  config/defconfig_arm-armv7a-linuxapp-gcc          |  3 -
>> >>  config/defconfig_arm64-armv8a-linuxapp-gcc        |  3 -
>> >>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++++++++++
>> >>  lib/librte_lpm/rte_lpm.h                          | 68 ++++++++++++++++-------
>> >>  4 files changed, 77 insertions(+), 25 deletions(-)
>> >>
>> >> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
>> >> index cbebd64..efffa1f 100644
>> >> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
>> >> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
>> >> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
>> >>  CONFIG_RTE_EAL_IGB_UIO=n
>> >>
>> >>  # fails to compile on ARM
>> >> -CONFIG_RTE_LIBRTE_LPM=n
>> >> -CONFIG_RTE_LIBRTE_TABLE=n
>> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
>> >>  CONFIG_RTE_SCHED_VECTOR=n
>> >>
>> >>  # cannot use those on ARM
>> >> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
>> >> index 504f3ed..57f7941 100644
>> >> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
>> >> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
>> >> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n
>> >>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
>> >>  CONFIG_RTE_LIBRTE_I40E_PMD=n
>> >>
>> >> -CONFIG_RTE_LIBRTE_LPM=n
>> >> -CONFIG_RTE_LIBRTE_TABLE=n
>> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
>> >>  CONFIG_RTE_SCHED_VECTOR=n
>> >> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> >> index a33c054..7437711 100644
>> >> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> >> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> >> @@ -41,6 +41,8 @@ extern "C" {
>> >>
>> >>  typedef int32x4_t xmm_t;
>> >>
>> >> +typedef int32x4_t __m128i;
>> >> +
>> >>  #define      XMM_SIZE        (sizeof(xmm_t))
>> >>  #define      XMM_MASK        (XMM_SIZE - 1)
>> >>
>> >> @@ -53,6 +55,32 @@ typedef union rte_xmm {
>> >>       double   pd[XMM_SIZE / sizeof(double)];
>> >>  } __attribute__((aligned(16))) rte_xmm_t;
>> >>
>> >> +static __inline __m128i
>> >> +_mm_set_epi32(int i3, int i2, int i1, int i0)
>> >> +{
>> >> +     int32_t r[4] = {i0, i1, i2, i3};
>> >> +
>> >> +     return vld1q_s32(r);
>> >> +}
>> >> +
>> >> +static __inline __m128i
>> >> +_mm_loadu_si128(__m128i *p)
>> >> +{
>> >> +     return vld1q_s32((int32_t *)p);
>> >> +}
>> >> +
>> >> +static __inline __m128i
>> >> +_mm_set1_epi32(int i)
>> >> +{
>> >> +     return vdupq_n_s32(i);
>> >> +}
>> >> +
>> >> +static __inline __m128i
>> >> +_mm_and_si128(__m128i a, __m128i b)
>> >> +{
>> >> +     return vandq_s32(a, b);
>> >> +}
>> >> +
>
> IMO, it's not always good to emulate GCC defined intrinsics of
> other architecture. What if a legacy DPDK application has such mappings
> then BOOM, multiple definition, which one is correct? which one
> to comment it out? Integration pain starts for DPDK library consumer:-(
>
They can include rte_vect.h in build/include directly, which is linked correctly
to the one for that ARCH, so there is no need to worry about.


>> >
>> > IMO, it makes sense to not emulate the SSE intrinsics with NEON
>> > Let's create the rte_vect_* as required. look at the existing patch.
>> >
>> I thought of creating a layer of SIMD over all the platforms before.
>> But can't you see it make things complicated, considering there are
>> only few simple intrinsic to implement?
>
> Not true, There were, a lot of SSE intrinsics needs be to emulated for ACL NEON
> implementation if I were to take this approach and emulation comes with
> the cost.
>
No, I will not re-implement all the intrinsic like that .
I only do with the simple intrinsic, such as load/store, as you said below.

> So my take is,
> lets the each architecture implementation for specific SIMD version of DPDK
> API in the library should have the freedom to implement the API in
> NATIVE.
>
> And let's create only rte_vect_* abstraction only for using
> that API/library. Which boils down to have very minimal rte_vect_*
> abstraction to load, store, set not beyond that.
>
> This makes clear "contract" between DPDK library and the applications.
> and make easy for remaning new architecture  porting effort in DPDK.
>
Agree.
But I reuse existing intrinsic names, and you recreate new ones.
And I try to do as few changes as possible, and try to avoid any
mistaken which may cause code un-compiled.
I think it's design level question, we need to hear what others talk about it.

> Imagine how your proposed function will look like if new architecture
> wants to implement "optimized" version of rte_lpm_lookupx4
>
There is no optimization for this (simple) rte_lpm_lookupx4, otherwise
you have done that in your patch.
If there is for other new platform, defintely they should do like
yours, as you did for NEON ACL.

>
>> If do so, we also need to explain to others how to use these interfaces.
>> Besides, this patch did the smallest changes to the original code, and
>> more likely to be accepted by others.
>
> other patch makes no changes to IA version of rte_lpm_lookupx4.I thought
> that make reviewer easy to review the changes in architecture
> perspective.
>
As I know, they don't enable LPM for PPC, and ARM is the first one to
touch this issue.

>>
>> >
>> >>  #ifdef RTE_ARCH_ARM
>> >>  /* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
>> >>  static __inline uint8x16_t
>> >> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
>> >> index c299ce2..c76c07d 100644
>> >> --- a/lib/librte_lpm/rte_lpm.h
>> >> +++ b/lib/librte_lpm/rte_lpm.h
>> >> @@ -361,6 +361,47 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
>> >>  /* Mask four results. */
>> >>  #define       RTE_LPM_MASKX4_RES     UINT64_C(0x00ff00ff00ff00ff)
>> >>
>> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>> >
>> > Separate out arm implementation to the different header file.
>> > Too many ifdef looks odd in the header file and difficult to manage.
>> >
>> But there are many ifdefs already.
>> And It seems unreasonable to add a new file only for one small function.
>>
>
> small or big, its matter of each architecture to have
> the freedom for the optimized version for the implementation.
>
> What if  other architecture demands to write this function in assembly
> or restructure it for performance improvement?
>
If there is such demands, should do like that.
But I don't see any restructure in your patch, and you still follow
the logic as x86, is it worth adding a new file?

>
>> >
>> >> +static inline void
>> >> +rte_lpm_tbl24_val4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t tbl[4])
>> >> +{
>> >> +     uint32x4_t i24;
>> >> +     uint32_t idx[4];
>> >> +
>> >> +     /* get 4 indexes for tbl24[]. */
>> >> +     i24 = vshrq_n_u32(vreinterpretq_u32_s32(ip), CHAR_BIT);
>> >> +     vst1q_u32(idx, i24);
>> >> +
>> >> +     /* extract values from tbl24[] */
>> >> +     tbl[0] = *(const uint16_t *)&lpm->tbl24[idx[0]];
>> >> +     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx[1]];
>> >> +     tbl[2] = *(const uint16_t *)&lpm->tbl24[idx[2]];
>> >> +     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx[3]];
>> >> +}
>> >
>> > Nice. There is an improvement in this portion code wrt my patch. This is
>> > a candidate for convergence.
>> >
>> >
>> >> +#else
>> >> +static inline void
>> >> +rte_lpm_tbl24_val4(const struct rte_lpm *lpm, __m128i ip, uint16_t tbl[4])
>> >> +{
>> >> +     __m128i i24;
>> >> +     uint64_t idx;
>> >> +
>> >> +     /* get 4 indexes for tbl24[]. */
>> >> +     i24 = _mm_srli_epi32(ip, CHAR_BIT);
>> >> +
>> >> +     /* extract values from tbl24[] */
>> >> +     idx = _mm_cvtsi128_si64(i24);
>> >> +     i24 = _mm_srli_si128(i24, sizeof(uint64_t));
>> >> +
>> >> +     tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
>> >> +     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>> >> +
>> >> +     idx = _mm_cvtsi128_si64(i24);
>> >> +
>> >> +     tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
>> >> +     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>> >> +}
>> >> +#endif
>> >> +
>> >>  /**
>> >>   * Lookup four IP addresses in an LPM table.
>> >>   *
>> >> @@ -381,17 +422,19 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
>> >>   *   if lookup would fail.
>> >>   */
>> >>  static inline void
>> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>> >> +rte_lpm_lookupx4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t hop[4],
>> >> +     uint16_t defv)
>> >
>> > This would call for change in the change the ABI,
>> > IMO, __m128i can be used to represent 128bit vector to avoid ABI chang
>> >
>> This redefine rte_lpm_lookupx4 is unncessary, I will remove it, so no
>> ABI change.
>> And there only one ifdef for ARM platforms left.
>>
>> >
>> >> +#else
>> > separate out arm implementation to the different header file. Too many
>> > ifdef looks odd in the header file.
>> >
>> > Could you  rebase your patch based on existing patch and send the
>> > improvement portion as separate patch or I can send update patch with
>> > your improvements and with your signoff.
>> >
>> >
>> >>  rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
>> >>       uint16_t defv)
>> >> +#endif
>> >>  {
>> >> -     __m128i i24;
>> >>       rte_xmm_t i8;
>> >>       uint16_t tbl[4];
>> >> -     uint64_t idx, pt;
>> >> -
>> >> -     const __m128i mask8 =
>> >> -             _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX);
>> >> +     uint64_t pt;
>> >>
>> >> +     const __m128i mask8 = _mm_set1_epi32(UINT8_MAX);
>> >>       /*
>> >>        * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries
>> >>        * as one 64-bit value (0x0300030003000300).
>> >> @@ -412,20 +455,7 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
>> >>               (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 |
>> >>               (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48);
>> >>
>> >> -     /* get 4 indexes for tbl24[]. */
>> >> -     i24 = _mm_srli_epi32(ip, CHAR_BIT);
>> >> -
>> >> -     /* extract values from tbl24[] */
>> >> -     idx = _mm_cvtsi128_si64(i24);
>> >> -     i24 = _mm_srli_si128(i24, sizeof(uint64_t));
>> >> -
>> >> -     tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
>> >> -     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>> >> -
>> >> -     idx = _mm_cvtsi128_si64(i24);
>> >> -
>> >> -     tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
>> >> -     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>> >> +     rte_lpm_tbl24_val4(lpm, ip, tbl);
>> >>
>> >>       /* get 4 indexes for tbl8[]. */
>> >>       i8.x = _mm_and_si128(ip, mask8);
>> >> --
>> >> 1.8.3.1
>> >>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-02  9:49         ` Jianbo Liu
@ 2015-12-02 10:33           ` Ananyev, Konstantin
  2015-12-02 10:48             ` Jerin Jacob
  2015-12-02 10:39           ` Jerin Jacob
  1 sibling, 1 reply; 50+ messages in thread
From: Ananyev, Konstantin @ 2015-12-02 10:33 UTC (permalink / raw)
  To: Jianbo Liu, Jerin Jacob; +Cc: dev

Hi everyone,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu
> Sent: Wednesday, December 02, 2015 9:50 AM
> To: Jerin Jacob
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
> 
> On 2 December 2015 at 16:03, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > On Wed, Dec 02, 2015 at 02:54:52PM +0800, Jianbo Liu wrote:
> >> On 2 December 2015 at 00:41, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> >> > On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote:
> >> >> Adds ARM NEON support for lpm.
> >> >> And enables table/pipeline libraries which depend on lpm.
> >> >
> >> > I already sent the patch on the same yesterday.
> >> > We can converge the patches after the discussion.
> >> > Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml
> >> >
> >> Yes, I have read your patch. But there are many differences, so I sent
> >> mine for your reviewing :)
> >>
> >> >
> >> >>
> >> >> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
> >> >> ---
> >> >>  config/defconfig_arm-armv7a-linuxapp-gcc          |  3 -
> >> >>  config/defconfig_arm64-armv8a-linuxapp-gcc        |  3 -
> >> >>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++++++++++
> >> >>  lib/librte_lpm/rte_lpm.h                          | 68 ++++++++++++++++-------
> >> >>  4 files changed, 77 insertions(+), 25 deletions(-)
> >> >>
> >> >> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> >> >> index cbebd64..efffa1f 100644
> >> >> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> >> >> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> >> >> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
> >> >>  CONFIG_RTE_EAL_IGB_UIO=n
> >> >>
> >> >>  # fails to compile on ARM
> >> >> -CONFIG_RTE_LIBRTE_LPM=n
> >> >> -CONFIG_RTE_LIBRTE_TABLE=n
> >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
> >> >>  CONFIG_RTE_SCHED_VECTOR=n
> >> >>
> >> >>  # cannot use those on ARM
> >> >> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> >> index 504f3ed..57f7941 100644
> >> >> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> >> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> >> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n
> >> >>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
> >> >>  CONFIG_RTE_LIBRTE_I40E_PMD=n
> >> >>
> >> >> -CONFIG_RTE_LIBRTE_LPM=n
> >> >> -CONFIG_RTE_LIBRTE_TABLE=n
> >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
> >> >>  CONFIG_RTE_SCHED_VECTOR=n
> >> >> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> >> index a33c054..7437711 100644
> >> >> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> >> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> >> @@ -41,6 +41,8 @@ extern "C" {
> >> >>
> >> >>  typedef int32x4_t xmm_t;
> >> >>
> >> >> +typedef int32x4_t __m128i;
> >> >> +
> >> >>  #define      XMM_SIZE        (sizeof(xmm_t))
> >> >>  #define      XMM_MASK        (XMM_SIZE - 1)
> >> >>
> >> >> @@ -53,6 +55,32 @@ typedef union rte_xmm {
> >> >>       double   pd[XMM_SIZE / sizeof(double)];
> >> >>  } __attribute__((aligned(16))) rte_xmm_t;
> >> >>
> >> >> +static __inline __m128i
> >> >> +_mm_set_epi32(int i3, int i2, int i1, int i0)
> >> >> +{
> >> >> +     int32_t r[4] = {i0, i1, i2, i3};
> >> >> +
> >> >> +     return vld1q_s32(r);
> >> >> +}
> >> >> +
> >> >> +static __inline __m128i
> >> >> +_mm_loadu_si128(__m128i *p)
> >> >> +{
> >> >> +     return vld1q_s32((int32_t *)p);
> >> >> +}
> >> >> +
> >> >> +static __inline __m128i
> >> >> +_mm_set1_epi32(int i)
> >> >> +{
> >> >> +     return vdupq_n_s32(i);
> >> >> +}
> >> >> +
> >> >> +static __inline __m128i
> >> >> +_mm_and_si128(__m128i a, __m128i b)
> >> >> +{
> >> >> +     return vandq_s32(a, b);
> >> >> +}
> >> >> +
> >
> > IMO, it's not always good to emulate GCC defined intrinsics of
> > other architecture. What if a legacy DPDK application has such mappings
> > then BOOM, multiple definition, which one is correct? which one
> > to comment it out? Integration pain starts for DPDK library consumer:-(
> >
> They can include rte_vect.h in build/include directly, which is linked correctly
> to the one for that ARCH, so there is no need to worry about.
> 
> 
> >> >
> >> > IMO, it makes sense to not emulate the SSE intrinsics with NEON
> >> > Let's create the rte_vect_* as required. look at the existing patch.
> >> >
> >> I thought of creating a layer of SIMD over all the platforms before.
> >> But can't you see it make things complicated, considering there are
> >> only few simple intrinsic to implement?
> >
> > Not true, There were, a lot of SSE intrinsics needs be to emulated for ACL NEON
> > implementation if I were to take this approach and emulation comes with
> > the cost.
> >
> No, I will not re-implement all the intrinsic like that .
> I only do with the simple intrinsic, such as load/store, as you said below.
> 
> > So my take is,
> > lets the each architecture implementation for specific SIMD version of DPDK
> > API in the library should have the freedom to implement the API in
> > NATIVE.
> >
> > And let's create only rte_vect_* abstraction only for using
> > that API/library. Which boils down to have very minimal rte_vect_*
> > abstraction to load, store, set not beyond that.
> >
> > This makes clear "contract" between DPDK library and the applications.
> > and make easy for remaning new architecture  porting effort in DPDK.
> >
> Agree.
> But I reuse existing intrinsic names, and you recreate new ones.
> And I try to do as few changes as possible, and try to avoid any
> mistaken which may cause code un-compiled.
> I think it's design level question, we need to hear what others talk about it.
> 
> > Imagine how your proposed function will look like if new architecture
> > wants to implement "optimized" version of rte_lpm_lookupx4
> >
> There is no optimization for this (simple) rte_lpm_lookupx4, otherwise
> you have done that in your patch.
> If there is for other new platform, defintely they should do like
> yours, as you did for NEON ACL.
> 
> >
> >> If do so, we also need to explain to others how to use these interfaces.
> >> Besides, this patch did the smallest changes to the original code, and
> >> more likely to be accepted by others.
> >
> > other patch makes no changes to IA version of rte_lpm_lookupx4.I thought
> > that make reviewer easy to review the changes in architecture
> > perspective.
> >
> As I know, they don't enable LPM for PPC, and ARM is the first one to
> touch this issue.
> 
> >>
> >> >
> >> >>  #ifdef RTE_ARCH_ARM
> >> >>  /* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
> >> >>  static __inline uint8x16_t
> >> >> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> >> >> index c299ce2..c76c07d 100644
> >> >> --- a/lib/librte_lpm/rte_lpm.h
> >> >> +++ b/lib/librte_lpm/rte_lpm.h
> >> >> @@ -361,6 +361,47 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
> >> >>  /* Mask four results. */
> >> >>  #define       RTE_LPM_MASKX4_RES     UINT64_C(0x00ff00ff00ff00ff)
> >> >>
> >> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> >> >
> >> > Separate out arm implementation to the different header file.
> >> > Too many ifdef looks odd in the header file and difficult to manage.
> >> >
> >> But there are many ifdefs already.
> >> And It seems unreasonable to add a new file only for one small function.
> >>
> >
> > small or big, its matter of each architecture to have
> > the freedom for the optimized version for the implementation.
> >
> > What if  other architecture demands to write this function in assembly
> > or restructure it for performance improvement?
> >
> If there is such demands, should do like that.
> But I don't see any restructure in your patch, and you still follow
> the logic as x86, is it worth adding a new file?
> 

My preference would also be to put architecture dependent implementation
into different files. 
Might be create lib/librte_lpm/arch/(arm|x86)/... here?
Konstantin  


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-02  9:49         ` Jianbo Liu
  2015-12-02 10:33           ` Ananyev, Konstantin
@ 2015-12-02 10:39           ` Jerin Jacob
  2015-12-02 13:05             ` Jan Viktorin
  2015-12-02 13:13             ` Jianbo Liu
  1 sibling, 2 replies; 50+ messages in thread
From: Jerin Jacob @ 2015-12-02 10:39 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

On Wed, Dec 02, 2015 at 05:49:41PM +0800, Jianbo Liu wrote:
> On 2 December 2015 at 16:03, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > On Wed, Dec 02, 2015 at 02:54:52PM +0800, Jianbo Liu wrote:
> >> On 2 December 2015 at 00:41, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> >> > On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote:
> >> >> Adds ARM NEON support for lpm.
> >> >> And enables table/pipeline libraries which depend on lpm.
> >> >
> >> > I already sent the patch on the same yesterday.
> >> > We can converge the patches after the discussion.
> >> > Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml
> >> >
> >> Yes, I have read your patch. But there are many differences, so I sent
> >> mine for your reviewing :)
> >>
> >> >
> >> >>
> >> >> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
> >> >> ---
> >> >>  config/defconfig_arm-armv7a-linuxapp-gcc          |  3 -
> >> >>  config/defconfig_arm64-armv8a-linuxapp-gcc        |  3 -
> >> >>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++++++++++
> >> >>  lib/librte_lpm/rte_lpm.h                          | 68 ++++++++++++++++-------
> >> >>  4 files changed, 77 insertions(+), 25 deletions(-)
> >> >>
> >> >> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> >> >> index cbebd64..efffa1f 100644
> >> >> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> >> >> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> >> >> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
> >> >>  CONFIG_RTE_EAL_IGB_UIO=n
> >> >>
> >> >>  # fails to compile on ARM
> >> >> -CONFIG_RTE_LIBRTE_LPM=n
> >> >> -CONFIG_RTE_LIBRTE_TABLE=n
> >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
> >> >>  CONFIG_RTE_SCHED_VECTOR=n
> >> >>
> >> >>  # cannot use those on ARM
> >> >> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> >> index 504f3ed..57f7941 100644
> >> >> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> >> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> >> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n
> >> >>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
> >> >>  CONFIG_RTE_LIBRTE_I40E_PMD=n
> >> >>
> >> >> -CONFIG_RTE_LIBRTE_LPM=n
> >> >> -CONFIG_RTE_LIBRTE_TABLE=n
> >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
> >> >>  CONFIG_RTE_SCHED_VECTOR=n
> >> >> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> >> index a33c054..7437711 100644
> >> >> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> >> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> >> @@ -41,6 +41,8 @@ extern "C" {
> >> >>
> >> >>  typedef int32x4_t xmm_t;
> >> >>
> >> >> +typedef int32x4_t __m128i;
> >> >> +
> >> >>  #define      XMM_SIZE        (sizeof(xmm_t))
> >> >>  #define      XMM_MASK        (XMM_SIZE - 1)
> >> >>
> >> >> @@ -53,6 +55,32 @@ typedef union rte_xmm {
> >> >>       double   pd[XMM_SIZE / sizeof(double)];
> >> >>  } __attribute__((aligned(16))) rte_xmm_t;
> >> >>
> >> >> +static __inline __m128i
> >> >> +_mm_set_epi32(int i3, int i2, int i1, int i0)
> >> >> +{
> >> >> +     int32_t r[4] = {i0, i1, i2, i3};
> >> >> +
> >> >> +     return vld1q_s32(r);
> >> >> +}
> >> >> +
> >> >> +static __inline __m128i
> >> >> +_mm_loadu_si128(__m128i *p)
> >> >> +{
> >> >> +     return vld1q_s32((int32_t *)p);
> >> >> +}
> >> >> +
> >> >> +static __inline __m128i
> >> >> +_mm_set1_epi32(int i)
> >> >> +{
> >> >> +     return vdupq_n_s32(i);
> >> >> +}
> >> >> +
> >> >> +static __inline __m128i
> >> >> +_mm_and_si128(__m128i a, __m128i b)
> >> >> +{
> >> >> +     return vandq_s32(a, b);
> >> >> +}
> >> >> +
> >
> > IMO, it's not always good to emulate GCC defined intrinsics of
> > other architecture. What if a legacy DPDK application has such mappings
> > then BOOM, multiple definition, which one is correct? which one
> > to comment it out? Integration pain starts for DPDK library consumer:-(
> >
> They can include rte_vect.h in build/include directly, which is linked correctly
> to the one for that ARCH, so there is no need to worry about.

I think you missed the point,I was trying to say that
legacy DPDK application and third party stacks uses SSE2NEON kind of
libraries
for quick integration, for example, something like this
https://github.com/jratcliff63367/sse2neon/blob/master/SSE2NEON.h

AND they include "rte_lpm.h"(it internally includes rte_vect.h)
that lead to multiple definition and its not good.

>
>
> >> >
> >> > IMO, it makes sense to not emulate the SSE intrinsics with NEON
> >> > Let's create the rte_vect_* as required. look at the existing patch.
> >> >
> >> I thought of creating a layer of SIMD over all the platforms before.
> >> But can't you see it make things complicated, considering there are
> >> only few simple intrinsic to implement?
> >
> > Not true, There were, a lot of SSE intrinsics needs be to emulated for ACL NEON
> > implementation if I were to take this approach and emulation comes with
> > the cost.
> >
> No, I will not re-implement all the intrinsic like that .
> I only do with the simple intrinsic, such as load/store, as you said below.

but you forced to add _mm_and_si128 also to the list and emulated
_mm_and_si128 intrinsic. Am just saying no emulation.


>
> > So my take is,
> > lets the each architecture implementation for specific SIMD version of DPDK
> > API in the library should have the freedom to implement the API in
> > NATIVE.
> >
> > And let's create only rte_vect_* abstraction only for using
> > that API/library. Which boils down to have very minimal rte_vect_*
> > abstraction to load, store, set not beyond that.
> >
> > This makes clear "contract" between DPDK library and the applications.
> > and make easy for remaning new architecture  porting effort in DPDK.
> >
> Agree.
> But I reuse existing intrinsic names, and you recreate new ones.
> And I try to do as few changes as possible, and try to avoid any
> mistaken which may cause code un-compiled.

Its trival to verify. Just compile it

> I think it's design level question, we need to hear what others talk about it.
>
> > Imagine how your proposed function will look like if new architecture
> > wants to implement "optimized" version of rte_lpm_lookupx4
> >
> There is no optimization for this (simple) rte_lpm_lookupx4, otherwise
> you have done that in your patch.
> If there is for other new platform, defintely they should do like
> yours, as you did for NEON ACL.
>
> >
> >> If do so, we also need to explain to others how to use these interfaces.
> >> Besides, this patch did the smallest changes to the original code, and
> >> more likely to be accepted by others.
> >
> > other patch makes no changes to IA version of rte_lpm_lookupx4.I thought
> > that make reviewer easy to review the changes in architecture
> > perspective.
> >
> As I know, they don't enable LPM for PPC, and ARM is the first one to
> touch this issue.
>
> >>
> >> >
> >> >>  #ifdef RTE_ARCH_ARM
> >> >>  /* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
> >> >>  static __inline uint8x16_t
> >> >> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> >> >> index c299ce2..c76c07d 100644
> >> >> --- a/lib/librte_lpm/rte_lpm.h
> >> >> +++ b/lib/librte_lpm/rte_lpm.h
> >> >> @@ -361,6 +361,47 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
> >> >>  /* Mask four results. */
> >> >>  #define       RTE_LPM_MASKX4_RES     UINT64_C(0x00ff00ff00ff00ff)
> >> >>
> >> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> >> >
> >> > Separate out arm implementation to the different header file.
> >> > Too many ifdef looks odd in the header file and difficult to manage.
> >> >
> >> But there are many ifdefs already.
> >> And It seems unreasonable to add a new file only for one small function.
> >>
> >
> > small or big, its matter of each architecture to have
> > the freedom for the optimized version for the implementation.
> >
> > What if  other architecture demands to write this function in assembly
> > or restructure it for performance improvement?
> >
> If there is such demands, should do like that.
> But I don't see any restructure in your patch, and you still follow
> the logic as x86, is it worth adding a new file?

SIMD Logic on getting  4 indexes for tbl24[] is different.

/* get 4 indexes for tbl24[]. */
i24 = _mm_srli_epi32(ip, CHAR_BIT);

/* extract values from tbl24[] */
idx = _mm_cvtsi128_si64(i24);
i24 = _mm_srli_si128(i24, sizeof(uint64_t));

tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];

idx = _mm_cvtsi128_si64(i24);

tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];

VS

/* extract values from tbl24[] */
idx = vgetq_lane_u64((uint64x2_t)i24, 0);

tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];

idx = vgetq_lane_u64((uint64x2_t)i24, 1);

tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];

>
> >
> >> >
> >> >> +static inline void
> >> >> +rte_lpm_tbl24_val4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t tbl[4])
> >> >> +{
> >> >> +     uint32x4_t i24;
> >> >> +     uint32_t idx[4];
> >> >> +
> >> >> +     /* get 4 indexes for tbl24[]. */
> >> >> +     i24 = vshrq_n_u32(vreinterpretq_u32_s32(ip), CHAR_BIT);
> >> >> +     vst1q_u32(idx, i24);
> >> >> +
> >> >> +     /* extract values from tbl24[] */
> >> >> +     tbl[0] = *(const uint16_t *)&lpm->tbl24[idx[0]];
> >> >> +     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx[1]];
> >> >> +     tbl[2] = *(const uint16_t *)&lpm->tbl24[idx[2]];
> >> >> +     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx[3]];
> >> >> +}
> >> >
> >> > Nice. There is an improvement in this portion code wrt my patch. This is
> >> > a candidate for convergence.
> >> >
> >> >
> >> >> +#else
> >> >> +static inline void
> >> >> +rte_lpm_tbl24_val4(const struct rte_lpm *lpm, __m128i ip, uint16_t tbl[4])
> >> >> +{
> >> >> +     __m128i i24;
> >> >> +     uint64_t idx;
> >> >> +
> >> >> +     /* get 4 indexes for tbl24[]. */
> >> >> +     i24 = _mm_srli_epi32(ip, CHAR_BIT);
> >> >> +
> >> >> +     /* extract values from tbl24[] */
> >> >> +     idx = _mm_cvtsi128_si64(i24);
> >> >> +     i24 = _mm_srli_si128(i24, sizeof(uint64_t));
> >> >> +
> >> >> +     tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> >> >> +     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >> >> +
> >> >> +     idx = _mm_cvtsi128_si64(i24);
> >> >> +
> >> >> +     tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> >> >> +     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >> >> +}
> >> >> +#endif
> >> >> +
> >> >>  /**
> >> >>   * Lookup four IP addresses in an LPM table.
> >> >>   *
> >> >> @@ -381,17 +422,19 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
> >> >>   *   if lookup would fail.
> >> >>   */
> >> >>  static inline void
> >> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> >> >> +rte_lpm_lookupx4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t hop[4],
> >> >> +     uint16_t defv)
> >> >
> >> > This would call for change in the change the ABI,
> >> > IMO, __m128i can be used to represent 128bit vector to avoid ABI chang
> >> >
> >> This redefine rte_lpm_lookupx4 is unncessary, I will remove it, so no
> >> ABI change.
> >> And there only one ifdef for ARM platforms left.
> >>
> >> >
> >> >> +#else
> >> > separate out arm implementation to the different header file. Too many
> >> > ifdef looks odd in the header file.
> >> >
> >> > Could you  rebase your patch based on existing patch and send the
> >> > improvement portion as separate patch or I can send update patch with
> >> > your improvements and with your signoff.
> >> >
> >> >
> >> >>  rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
> >> >>       uint16_t defv)
> >> >> +#endif
> >> >>  {
> >> >> -     __m128i i24;
> >> >>       rte_xmm_t i8;
> >> >>       uint16_t tbl[4];
> >> >> -     uint64_t idx, pt;
> >> >> -
> >> >> -     const __m128i mask8 =
> >> >> -             _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX);
> >> >> +     uint64_t pt;
> >> >>
> >> >> +     const __m128i mask8 = _mm_set1_epi32(UINT8_MAX);
> >> >>       /*
> >> >>        * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries
> >> >>        * as one 64-bit value (0x0300030003000300).
> >> >> @@ -412,20 +455,7 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
> >> >>               (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 |
> >> >>               (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48);
> >> >>
> >> >> -     /* get 4 indexes for tbl24[]. */
> >> >> -     i24 = _mm_srli_epi32(ip, CHAR_BIT);
> >> >> -
> >> >> -     /* extract values from tbl24[] */
> >> >> -     idx = _mm_cvtsi128_si64(i24);
> >> >> -     i24 = _mm_srli_si128(i24, sizeof(uint64_t));
> >> >> -
> >> >> -     tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> >> >> -     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >> >> -
> >> >> -     idx = _mm_cvtsi128_si64(i24);
> >> >> -
> >> >> -     tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> >> >> -     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >> >> +     rte_lpm_tbl24_val4(lpm, ip, tbl);
> >> >>
> >> >>       /* get 4 indexes for tbl8[]. */
> >> >>       i8.x = _mm_and_si128(ip, mask8);
> >> >> --
> >> >> 1.8.3.1
> >> >>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-02 10:33           ` Ananyev, Konstantin
@ 2015-12-02 10:48             ` Jerin Jacob
  2015-12-02 13:06               ` Jan Viktorin
  0 siblings, 1 reply; 50+ messages in thread
From: Jerin Jacob @ 2015-12-02 10:48 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Wed, Dec 02, 2015 at 10:33:44AM +0000, Ananyev, Konstantin wrote:
> Hi everyone,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu
> > Sent: Wednesday, December 02, 2015 9:50 AM
> > To: Jerin Jacob
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
> > 
> > On 2 December 2015 at 16:03, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > > On Wed, Dec 02, 2015 at 02:54:52PM +0800, Jianbo Liu wrote:
> > >> On 2 December 2015 at 00:41, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > >> > On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote:
> > >> >> Adds ARM NEON support for lpm.
> > >> >> And enables table/pipeline libraries which depend on lpm.
> > >> >
> > >> > I already sent the patch on the same yesterday.
> > >> > We can converge the patches after the discussion.
> > >> > Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml
> > >> >
> > >> Yes, I have read your patch. But there are many differences, so I sent
> > >> mine for your reviewing :)
> > >>
> > >> >
> > >> >>
> > >> >> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
> > >> >> ---
> > >> >>  config/defconfig_arm-armv7a-linuxapp-gcc          |  3 -
> > >> >>  config/defconfig_arm64-armv8a-linuxapp-gcc        |  3 -
> > >> >>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++++++++++
> > >> >>  lib/librte_lpm/rte_lpm.h                          | 68 ++++++++++++++++-------
> > >> >>  4 files changed, 77 insertions(+), 25 deletions(-)
> > >> >>
> > >> >> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> > >> >> index cbebd64..efffa1f 100644
> > >> >> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> > >> >> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> > >> >> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
> > >> >>  CONFIG_RTE_EAL_IGB_UIO=n
> > >> >>
> > >> >>  # fails to compile on ARM
> > >> >> -CONFIG_RTE_LIBRTE_LPM=n
> > >> >> -CONFIG_RTE_LIBRTE_TABLE=n
> > >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
> > >> >>  CONFIG_RTE_SCHED_VECTOR=n
> > >> >>
> > >> >>  # cannot use those on ARM
> > >> >> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
> > >> >> index 504f3ed..57f7941 100644
> > >> >> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
> > >> >> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
> > >> >> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n
> > >> >>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
> > >> >>  CONFIG_RTE_LIBRTE_I40E_PMD=n
> > >> >>
> > >> >> -CONFIG_RTE_LIBRTE_LPM=n
> > >> >> -CONFIG_RTE_LIBRTE_TABLE=n
> > >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
> > >> >>  CONFIG_RTE_SCHED_VECTOR=n
> > >> >> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> > >> >> index a33c054..7437711 100644
> > >> >> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
> > >> >> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> > >> >> @@ -41,6 +41,8 @@ extern "C" {
> > >> >>
> > >> >>  typedef int32x4_t xmm_t;
> > >> >>
> > >> >> +typedef int32x4_t __m128i;
> > >> >> +
> > >> >>  #define      XMM_SIZE        (sizeof(xmm_t))
> > >> >>  #define      XMM_MASK        (XMM_SIZE - 1)
> > >> >>
> > >> >> @@ -53,6 +55,32 @@ typedef union rte_xmm {
> > >> >>       double   pd[XMM_SIZE / sizeof(double)];
> > >> >>  } __attribute__((aligned(16))) rte_xmm_t;
> > >> >>
> > >> >> +static __inline __m128i
> > >> >> +_mm_set_epi32(int i3, int i2, int i1, int i0)
> > >> >> +{
> > >> >> +     int32_t r[4] = {i0, i1, i2, i3};
> > >> >> +
> > >> >> +     return vld1q_s32(r);
> > >> >> +}
> > >> >> +
> > >> >> +static __inline __m128i
> > >> >> +_mm_loadu_si128(__m128i *p)
> > >> >> +{
> > >> >> +     return vld1q_s32((int32_t *)p);
> > >> >> +}
> > >> >> +
> > >> >> +static __inline __m128i
> > >> >> +_mm_set1_epi32(int i)
> > >> >> +{
> > >> >> +     return vdupq_n_s32(i);
> > >> >> +}
> > >> >> +
> > >> >> +static __inline __m128i
> > >> >> +_mm_and_si128(__m128i a, __m128i b)
> > >> >> +{
> > >> >> +     return vandq_s32(a, b);
> > >> >> +}
> > >> >> +
> > >
> > > IMO, it's not always good to emulate GCC defined intrinsics of
> > > other architecture. What if a legacy DPDK application has such mappings
> > > then BOOM, multiple definition, which one is correct? which one
> > > to comment it out? Integration pain starts for DPDK library consumer:-(
> > >
> > They can include rte_vect.h in build/include directly, which is linked correctly
> > to the one for that ARCH, so there is no need to worry about.
> > 
> > 
> > >> >
> > >> > IMO, it makes sense to not emulate the SSE intrinsics with NEON
> > >> > Let's create the rte_vect_* as required. look at the existing patch.
> > >> >
> > >> I thought of creating a layer of SIMD over all the platforms before.
> > >> But can't you see it make things complicated, considering there are
> > >> only few simple intrinsic to implement?
> > >
> > > Not true, There were, a lot of SSE intrinsics needs be to emulated for ACL NEON
> > > implementation if I were to take this approach and emulation comes with
> > > the cost.
> > >
> > No, I will not re-implement all the intrinsic like that .
> > I only do with the simple intrinsic, such as load/store, as you said below.
> > 
> > > So my take is,
> > > lets the each architecture implementation for specific SIMD version of DPDK
> > > API in the library should have the freedom to implement the API in
> > > NATIVE.
> > >
> > > And let's create only rte_vect_* abstraction only for using
> > > that API/library. Which boils down to have very minimal rte_vect_*
> > > abstraction to load, store, set not beyond that.
> > >
> > > This makes clear "contract" between DPDK library and the applications.
> > > and make easy for remaning new architecture  porting effort in DPDK.
> > >
> > Agree.
> > But I reuse existing intrinsic names, and you recreate new ones.
> > And I try to do as few changes as possible, and try to avoid any
> > mistaken which may cause code un-compiled.
> > I think it's design level question, we need to hear what others talk about it.
> > 
> > > Imagine how your proposed function will look like if new architecture
> > > wants to implement "optimized" version of rte_lpm_lookupx4
> > >
> > There is no optimization for this (simple) rte_lpm_lookupx4, otherwise
> > you have done that in your patch.
> > If there is for other new platform, defintely they should do like
> > yours, as you did for NEON ACL.
> > 
> > >
> > >> If do so, we also need to explain to others how to use these interfaces.
> > >> Besides, this patch did the smallest changes to the original code, and
> > >> more likely to be accepted by others.
> > >
> > > other patch makes no changes to IA version of rte_lpm_lookupx4.I thought
> > > that make reviewer easy to review the changes in architecture
> > > perspective.
> > >
> > As I know, they don't enable LPM for PPC, and ARM is the first one to
> > touch this issue.
> > 
> > >>
> > >> >
> > >> >>  #ifdef RTE_ARCH_ARM
> > >> >>  /* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
> > >> >>  static __inline uint8x16_t
> > >> >> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> > >> >> index c299ce2..c76c07d 100644
> > >> >> --- a/lib/librte_lpm/rte_lpm.h
> > >> >> +++ b/lib/librte_lpm/rte_lpm.h
> > >> >> @@ -361,6 +361,47 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
> > >> >>  /* Mask four results. */
> > >> >>  #define       RTE_LPM_MASKX4_RES     UINT64_C(0x00ff00ff00ff00ff)
> > >> >>
> > >> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> > >> >
> > >> > Separate out arm implementation to the different header file.
> > >> > Too many ifdef looks odd in the header file and difficult to manage.
> > >> >
> > >> But there are many ifdefs already.
> > >> And It seems unreasonable to add a new file only for one small function.
> > >>
> > >
> > > small or big, its matter of each architecture to have
> > > the freedom for the optimized version for the implementation.
> > >
> > > What if  other architecture demands to write this function in assembly
> > > or restructure it for performance improvement?
> > >
> > If there is such demands, should do like that.
> > But I don't see any restructure in your patch, and you still follow
> > the logic as x86, is it worth adding a new file?
> > 
> 
> My preference would also be to put architecture dependent implementation
> into different files. 
> Might be create lib/librte_lpm/arch/(arm|x86)/... here?
> Konstantin  

+1

my existing patch creates lib/librte_lpm/rte_lpm_neon.h instead
of lib/librte_lpm/arch/arm/rte_lpm_neon.h like
lib/librte_hash/rte_cmp_x86.h

I am OK for changing the directory structure as proposed in my next revision
of patch.
Let me know if anyone has any objections/concerns.

Jerin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-02 10:39           ` Jerin Jacob
@ 2015-12-02 13:05             ` Jan Viktorin
  2015-12-02 13:13             ` Jianbo Liu
  1 sibling, 0 replies; 50+ messages in thread
From: Jan Viktorin @ 2015-12-02 13:05 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

On Wed, 2 Dec 2015 16:09:06 +0530
Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:

> > > [snip]
> > > IMO, it's not always good to emulate GCC defined intrinsics of
> > > other architecture. What if a legacy DPDK application has such mappings
> > > then BOOM, multiple definition, which one is correct? which one
> > > to comment it out? Integration pain starts for DPDK library consumer:-(
> > >  
> > They can include rte_vect.h in build/include directly, which is linked correctly
> > to the one for that ARCH, so there is no need to worry about.  
> 
> I think you missed the point,I was trying to say that
> legacy DPDK application and third party stacks uses SSE2NEON kind of
> libraries
> for quick integration, for example, something like this
> https://github.com/jratcliff63367/sse2neon/blob/master/SSE2NEON.h
> 
> AND they include "rte_lpm.h"(it internally includes rte_vect.h)
> that lead to multiple definition and its not good.
> 
> >
> >  
> > >> >
> > >> > IMO, it makes sense to not emulate the SSE intrinsics with NEON
> > >> > Let's create the rte_vect_* as required. look at the existing patch.
> > >> >  
> > >> I thought of creating a layer of SIMD over all the platforms before.
> > >> But can't you see it make things complicated, considering there are
> > >> only few simple intrinsic to implement?  
> > >
> > > Not true, There were, a lot of SSE intrinsics needs be to emulated for ACL NEON
> > > implementation if I were to take this approach and emulation comes with
> > > the cost.
> > >  
> > No, I will not re-implement all the intrinsic like that .
> > I only do with the simple intrinsic, such as load/store, as you said below.  
> 
> but you forced to add _mm_and_si128 also to the list and emulated
> _mm_and_si128 intrinsic. Am just saying no emulation.
> 

Guys, do we want emulate x86 on ARM? I hope we don't ;). I think, as
more platforms might come into DPDK, there will be a need for a proper
abstract vector operations API. Yes, we have to describe this API to
people. However, otherwise, the ARM guys must learn SSE and write for
ARM platform something that looks quite odd. And if there are some "neon
emulations" as shown above, it's definitely an argue to have the API
that can hide those approachs.

Regards
Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-02 10:48             ` Jerin Jacob
@ 2015-12-02 13:06               ` Jan Viktorin
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Viktorin @ 2015-12-02 13:06 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

On Wed, 2 Dec 2015 16:18:13 +0530
Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:

> > > [snip]
> > 
> > My preference would also be to put architecture dependent implementation
> > into different files. 
> > Might be create lib/librte_lpm/arch/(arm|x86)/... here?
> > Konstantin    
> 
> +1
> 
> my existing patch creates lib/librte_lpm/rte_lpm_neon.h instead
> of lib/librte_lpm/arch/arm/rte_lpm_neon.h like
> lib/librte_hash/rte_cmp_x86.h
> 
> I am OK for changing the directory structure as proposed in my next revision
> of patch.
> Let me know if anyone has any objections/concerns.
> 
> Jerin

I don't like the idea to have arch/... directory structure inside
libraries. I would delay such decision until there are really a big
number of different optimized implementations.

However, the rte_lpm_neon.h approach is OK from my point of view.

Jan

> > [snip]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-02 10:39           ` Jerin Jacob
  2015-12-02 13:05             ` Jan Viktorin
@ 2015-12-02 13:13             ` Jianbo Liu
  2015-12-02 14:34               ` Jerin Jacob
  1 sibling, 1 reply; 50+ messages in thread
From: Jianbo Liu @ 2015-12-02 13:13 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

On 2 December 2015 at 18:39, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> On Wed, Dec 02, 2015 at 05:49:41PM +0800, Jianbo Liu wrote:
>> On 2 December 2015 at 16:03, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
>> > On Wed, Dec 02, 2015 at 02:54:52PM +0800, Jianbo Liu wrote:
>> >> On 2 December 2015 at 00:41, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
>> >> > On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote:
>> >> >> Adds ARM NEON support for lpm.
>> >> >> And enables table/pipeline libraries which depend on lpm.
>> >> >
>> >> > I already sent the patch on the same yesterday.
>> >> > We can converge the patches after the discussion.
>> >> > Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml
>> >> >
>> >> Yes, I have read your patch. But there are many differences, so I sent
>> >> mine for your reviewing :)
>> >>
>> >> >
>> >> >>
>> >> >> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
>> >> >> ---
>> >> >>  config/defconfig_arm-armv7a-linuxapp-gcc          |  3 -
>> >> >>  config/defconfig_arm64-armv8a-linuxapp-gcc        |  3 -
>> >> >>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++++++++++
>> >> >>  lib/librte_lpm/rte_lpm.h                          | 68 ++++++++++++++++-------
>> >> >>  4 files changed, 77 insertions(+), 25 deletions(-)
>> >> >>
>> >> >> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
>> >> >> index cbebd64..efffa1f 100644
>> >> >> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
>> >> >> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
>> >> >> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
>> >> >>  CONFIG_RTE_EAL_IGB_UIO=n
>> >> >>
>> >> >>  # fails to compile on ARM
>> >> >> -CONFIG_RTE_LIBRTE_LPM=n
>> >> >> -CONFIG_RTE_LIBRTE_TABLE=n
>> >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
>> >> >>  CONFIG_RTE_SCHED_VECTOR=n
>> >> >>
>> >> >>  # cannot use those on ARM
>> >> >> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
>> >> >> index 504f3ed..57f7941 100644
>> >> >> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
>> >> >> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
>> >> >> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n
>> >> >>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
>> >> >>  CONFIG_RTE_LIBRTE_I40E_PMD=n
>> >> >>
>> >> >> -CONFIG_RTE_LIBRTE_LPM=n
>> >> >> -CONFIG_RTE_LIBRTE_TABLE=n
>> >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
>> >> >>  CONFIG_RTE_SCHED_VECTOR=n
>> >> >> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> >> >> index a33c054..7437711 100644
>> >> >> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> >> >> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
>> >> >> @@ -41,6 +41,8 @@ extern "C" {
>> >> >>
>> >> >>  typedef int32x4_t xmm_t;
>> >> >>
>> >> >> +typedef int32x4_t __m128i;
>> >> >> +
>> >> >>  #define      XMM_SIZE        (sizeof(xmm_t))
>> >> >>  #define      XMM_MASK        (XMM_SIZE - 1)
>> >> >>
>> >> >> @@ -53,6 +55,32 @@ typedef union rte_xmm {
>> >> >>       double   pd[XMM_SIZE / sizeof(double)];
>> >> >>  } __attribute__((aligned(16))) rte_xmm_t;
>> >> >>
>> >> >> +static __inline __m128i
>> >> >> +_mm_set_epi32(int i3, int i2, int i1, int i0)
>> >> >> +{
>> >> >> +     int32_t r[4] = {i0, i1, i2, i3};
>> >> >> +
>> >> >> +     return vld1q_s32(r);
>> >> >> +}
>> >> >> +
>> >> >> +static __inline __m128i
>> >> >> +_mm_loadu_si128(__m128i *p)
>> >> >> +{
>> >> >> +     return vld1q_s32((int32_t *)p);
>> >> >> +}
>> >> >> +
>> >> >> +static __inline __m128i
>> >> >> +_mm_set1_epi32(int i)
>> >> >> +{
>> >> >> +     return vdupq_n_s32(i);
>> >> >> +}
>> >> >> +
>> >> >> +static __inline __m128i
>> >> >> +_mm_and_si128(__m128i a, __m128i b)
>> >> >> +{
>> >> >> +     return vandq_s32(a, b);
>> >> >> +}
>> >> >> +
>> >
>> > IMO, it's not always good to emulate GCC defined intrinsics of
>> > other architecture. What if a legacy DPDK application has such mappings
>> > then BOOM, multiple definition, which one is correct? which one
>> > to comment it out? Integration pain starts for DPDK library consumer:-(
>> >
>> They can include rte_vect.h in build/include directly, which is linked correctly
>> to the one for that ARCH, so there is no need to worry about.
>
> I think you missed the point,I was trying to say that
> legacy DPDK application and third party stacks uses SSE2NEON kind of
> libraries
> for quick integration, for example, something like this
> https://github.com/jratcliff63367/sse2neon/blob/master/SSE2NEON.h
>
> AND they include "rte_lpm.h"(it internally includes rte_vect.h)
> that lead to multiple definition and its not good.
>
But you will have similar issue since "typedef int32x4_t __m128i"
appears in both your patch and this header file.

>>
>>
>> >> >
>> >> > IMO, it makes sense to not emulate the SSE intrinsics with NEON
>> >> > Let's create the rte_vect_* as required. look at the existing patch.
>> >> >
>> >> I thought of creating a layer of SIMD over all the platforms before.
>> >> But can't you see it make things complicated, considering there are
>> >> only few simple intrinsic to implement?
>> >
>> > Not true, There were, a lot of SSE intrinsics needs be to emulated for ACL NEON
>> > implementation if I were to take this approach and emulation comes with
>> > the cost.
>> >
>> No, I will not re-implement all the intrinsic like that .
>> I only do with the simple intrinsic, such as load/store, as you said below.
>
> but you forced to add _mm_and_si128 also to the list and emulated
> _mm_and_si128 intrinsic. Am just saying no emulation.
>
I means simple intrinsic, not load/store only.
Depends on how you define emulation. Actually, these simple intrisinic
could be only one NEON instruction, and will not bring cost.

>
>>
>> > So my take is,
>> > lets the each architecture implementation for specific SIMD version of DPDK
>> > API in the library should have the freedom to implement the API in
>> > NATIVE.
>> >
>> > And let's create only rte_vect_* abstraction only for using
>> > that API/library. Which boils down to have very minimal rte_vect_*
>> > abstraction to load, store, set not beyond that.
>> >
>> > This makes clear "contract" between DPDK library and the applications.
>> > and make easy for remaning new architecture  porting effort in DPDK.
>> >
>> Agree.
>> But I reuse existing intrinsic names, and you recreate new ones.
>> And I try to do as few changes as possible, and try to avoid any
>> mistaken which may cause code un-compiled.
>
> Its trival to verify. Just compile it
>
>> I think it's design level question, we need to hear what others talk about it.
>>
>> > Imagine how your proposed function will look like if new architecture
>> > wants to implement "optimized" version of rte_lpm_lookupx4
>> >
>> There is no optimization for this (simple) rte_lpm_lookupx4, otherwise
>> you have done that in your patch.
>> If there is for other new platform, defintely they should do like
>> yours, as you did for NEON ACL.
>>
>> >
>> >> If do so, we also need to explain to others how to use these interfaces.
>> >> Besides, this patch did the smallest changes to the original code, and
>> >> more likely to be accepted by others.
>> >
>> > other patch makes no changes to IA version of rte_lpm_lookupx4.I thought
>> > that make reviewer easy to review the changes in architecture
>> > perspective.
>> >
>> As I know, they don't enable LPM for PPC, and ARM is the first one to
>> touch this issue.
>>
>> >>
>> >> >
>> >> >>  #ifdef RTE_ARCH_ARM
>> >> >>  /* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
>> >> >>  static __inline uint8x16_t
>> >> >> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
>> >> >> index c299ce2..c76c07d 100644
>> >> >> --- a/lib/librte_lpm/rte_lpm.h
>> >> >> +++ b/lib/librte_lpm/rte_lpm.h
>> >> >> @@ -361,6 +361,47 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
>> >> >>  /* Mask four results. */
>> >> >>  #define       RTE_LPM_MASKX4_RES     UINT64_C(0x00ff00ff00ff00ff)
>> >> >>
>> >> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>> >> >
>> >> > Separate out arm implementation to the different header file.
>> >> > Too many ifdef looks odd in the header file and difficult to manage.
>> >> >
>> >> But there are many ifdefs already.
>> >> And It seems unreasonable to add a new file only for one small function.
>> >>
>> >
>> > small or big, its matter of each architecture to have
>> > the freedom for the optimized version for the implementation.
>> >
>> > What if  other architecture demands to write this function in assembly
>> > or restructure it for performance improvement?
>> >
>> If there is such demands, should do like that.
>> But I don't see any restructure in your patch, and you still follow
>> the logic as x86, is it worth adding a new file?
>
> SIMD Logic on getting  4 indexes for tbl24[] is different.
>
> /* get 4 indexes for tbl24[]. */
> i24 = _mm_srli_epi32(ip, CHAR_BIT);
>
> /* extract values from tbl24[] */
> idx = _mm_cvtsi128_si64(i24);
> i24 = _mm_srli_si128(i24, sizeof(uint64_t));
>
> tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>
> idx = _mm_cvtsi128_si64(i24);
>
> tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>
> VS
>
> /* extract values from tbl24[] */
> idx = vgetq_lane_u64((uint64x2_t)i24, 0);
>
> tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>
> idx = vgetq_lane_u64((uint64x2_t)i24, 1);
>
> tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>
It's only the optimazation of part of code in that function. I did the
similar in my patch.
But, looking from the whole, this function is not restructured, and
the logic is the same as x86.

>>
>> >
>> >> >
>> >> >> +static inline void
>> >> >> +rte_lpm_tbl24_val4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t tbl[4])
>> >> >> +{
>> >> >> +     uint32x4_t i24;
>> >> >> +     uint32_t idx[4];
>> >> >> +
>> >> >> +     /* get 4 indexes for tbl24[]. */
>> >> >> +     i24 = vshrq_n_u32(vreinterpretq_u32_s32(ip), CHAR_BIT);
>> >> >> +     vst1q_u32(idx, i24);
>> >> >> +
>> >> >> +     /* extract values from tbl24[] */
>> >> >> +     tbl[0] = *(const uint16_t *)&lpm->tbl24[idx[0]];
>> >> >> +     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx[1]];
>> >> >> +     tbl[2] = *(const uint16_t *)&lpm->tbl24[idx[2]];
>> >> >> +     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx[3]];
>> >> >> +}
>> >> >
>> >> > Nice. There is an improvement in this portion code wrt my patch. This is
>> >> > a candidate for convergence.
>> >> >
>> >> >
>> >> >> +#else
>> >> >> +static inline void
>> >> >> +rte_lpm_tbl24_val4(const struct rte_lpm *lpm, __m128i ip, uint16_t tbl[4])
>> >> >> +{
>> >> >> +     __m128i i24;
>> >> >> +     uint64_t idx;
>> >> >> +
>> >> >> +     /* get 4 indexes for tbl24[]. */
>> >> >> +     i24 = _mm_srli_epi32(ip, CHAR_BIT);
>> >> >> +
>> >> >> +     /* extract values from tbl24[] */
>> >> >> +     idx = _mm_cvtsi128_si64(i24);
>> >> >> +     i24 = _mm_srli_si128(i24, sizeof(uint64_t));
>> >> >> +
>> >> >> +     tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
>> >> >> +     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>> >> >> +
>> >> >> +     idx = _mm_cvtsi128_si64(i24);
>> >> >> +
>> >> >> +     tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
>> >> >> +     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>> >> >> +}
>> >> >> +#endif
>> >> >> +
>> >> >>  /**
>> >> >>   * Lookup four IP addresses in an LPM table.
>> >> >>   *
>> >> >> @@ -381,17 +422,19 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
>> >> >>   *   if lookup would fail.
>> >> >>   */
>> >> >>  static inline void
>> >> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>> >> >> +rte_lpm_lookupx4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t hop[4],
>> >> >> +     uint16_t defv)
>> >> >
>> >> > This would call for change in the change the ABI,
>> >> > IMO, __m128i can be used to represent 128bit vector to avoid ABI chang
>> >> >
>> >> This redefine rte_lpm_lookupx4 is unncessary, I will remove it, so no
>> >> ABI change.
>> >> And there only one ifdef for ARM platforms left.
>> >>
>> >> >
>> >> >> +#else
>> >> > separate out arm implementation to the different header file. Too many
>> >> > ifdef looks odd in the header file.
>> >> >
>> >> > Could you  rebase your patch based on existing patch and send the
>> >> > improvement portion as separate patch or I can send update patch with
>> >> > your improvements and with your signoff.
>> >> >
>> >> >
>> >> >>  rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
>> >> >>       uint16_t defv)
>> >> >> +#endif
>> >> >>  {
>> >> >> -     __m128i i24;
>> >> >>       rte_xmm_t i8;
>> >> >>       uint16_t tbl[4];
>> >> >> -     uint64_t idx, pt;
>> >> >> -
>> >> >> -     const __m128i mask8 =
>> >> >> -             _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX);
>> >> >> +     uint64_t pt;
>> >> >>
>> >> >> +     const __m128i mask8 = _mm_set1_epi32(UINT8_MAX);
>> >> >>       /*
>> >> >>        * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries
>> >> >>        * as one 64-bit value (0x0300030003000300).
>> >> >> @@ -412,20 +455,7 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
>> >> >>               (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 |
>> >> >>               (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48);
>> >> >>
>> >> >> -     /* get 4 indexes for tbl24[]. */
>> >> >> -     i24 = _mm_srli_epi32(ip, CHAR_BIT);
>> >> >> -
>> >> >> -     /* extract values from tbl24[] */
>> >> >> -     idx = _mm_cvtsi128_si64(i24);
>> >> >> -     i24 = _mm_srli_si128(i24, sizeof(uint64_t));
>> >> >> -
>> >> >> -     tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
>> >> >> -     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>> >> >> -
>> >> >> -     idx = _mm_cvtsi128_si64(i24);
>> >> >> -
>> >> >> -     tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
>> >> >> -     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
>> >> >> +     rte_lpm_tbl24_val4(lpm, ip, tbl);
>> >> >>
>> >> >>       /* get 4 indexes for tbl8[]. */
>> >> >>       i8.x = _mm_and_si128(ip, mask8);
>> >> >> --
>> >> >> 1.8.3.1
>> >> >>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-02 13:13             ` Jianbo Liu
@ 2015-12-02 14:34               ` Jerin Jacob
  2015-12-02 16:40                 ` Thomas Monjalon
  0 siblings, 1 reply; 50+ messages in thread
From: Jerin Jacob @ 2015-12-02 14:34 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

On Wed, Dec 02, 2015 at 09:13:51PM +0800, Jianbo Liu wrote:
> On 2 December 2015 at 18:39, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > On Wed, Dec 02, 2015 at 05:49:41PM +0800, Jianbo Liu wrote:
> >> On 2 December 2015 at 16:03, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> >> > On Wed, Dec 02, 2015 at 02:54:52PM +0800, Jianbo Liu wrote:
> >> >> On 2 December 2015 at 00:41, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> >> >> > On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote:
> >> >> >> Adds ARM NEON support for lpm.
> >> >> >> And enables table/pipeline libraries which depend on lpm.
> >> >> >
> >> >> > I already sent the patch on the same yesterday.
> >> >> > We can converge the patches after the discussion.
> >> >> > Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml
> >> >> >
> >> >> Yes, I have read your patch. But there are many differences, so I sent
> >> >> mine for your reviewing :)
> >> >>
> >> >> >
> >> >> >>
> >> >> >> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
> >> >> >> ---
> >> >> >>  config/defconfig_arm-armv7a-linuxapp-gcc          |  3 -
> >> >> >>  config/defconfig_arm64-armv8a-linuxapp-gcc        |  3 -
> >> >> >>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++++++++++
> >> >> >>  lib/librte_lpm/rte_lpm.h                          | 68 ++++++++++++++++-------
> >> >> >>  4 files changed, 77 insertions(+), 25 deletions(-)
> >> >> >>
> >> >> >> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> >> >> >> index cbebd64..efffa1f 100644
> >> >> >> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> >> >> >> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> >> >> >> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
> >> >> >>  CONFIG_RTE_EAL_IGB_UIO=n
> >> >> >>
> >> >> >>  # fails to compile on ARM
> >> >> >> -CONFIG_RTE_LIBRTE_LPM=n
> >> >> >> -CONFIG_RTE_LIBRTE_TABLE=n
> >> >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
> >> >> >>  CONFIG_RTE_SCHED_VECTOR=n
> >> >> >>
> >> >> >>  # cannot use those on ARM
> >> >> >> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> >> >> index 504f3ed..57f7941 100644
> >> >> >> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> >> >> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> >> >> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n
> >> >> >>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
> >> >> >>  CONFIG_RTE_LIBRTE_I40E_PMD=n
> >> >> >>
> >> >> >> -CONFIG_RTE_LIBRTE_LPM=n
> >> >> >> -CONFIG_RTE_LIBRTE_TABLE=n
> >> >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
> >> >> >>  CONFIG_RTE_SCHED_VECTOR=n
> >> >> >> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> >> >> index a33c054..7437711 100644
> >> >> >> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> >> >> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> >> >> @@ -41,6 +41,8 @@ extern "C" {
> >> >> >>
> >> >> >>  typedef int32x4_t xmm_t;
> >> >> >>
> >> >> >> +typedef int32x4_t __m128i;
> >> >> >> +
> >> >> >>  #define      XMM_SIZE        (sizeof(xmm_t))
> >> >> >>  #define      XMM_MASK        (XMM_SIZE - 1)
> >> >> >>
> >> >> >> @@ -53,6 +55,32 @@ typedef union rte_xmm {
> >> >> >>       double   pd[XMM_SIZE / sizeof(double)];
> >> >> >>  } __attribute__((aligned(16))) rte_xmm_t;
> >> >> >>
> >> >> >> +static __inline __m128i
> >> >> >> +_mm_set_epi32(int i3, int i2, int i1, int i0)
> >> >> >> +{
> >> >> >> +     int32_t r[4] = {i0, i1, i2, i3};
> >> >> >> +
> >> >> >> +     return vld1q_s32(r);
> >> >> >> +}
> >> >> >> +
> >> >> >> +static __inline __m128i
> >> >> >> +_mm_loadu_si128(__m128i *p)
> >> >> >> +{
> >> >> >> +     return vld1q_s32((int32_t *)p);
> >> >> >> +}
> >> >> >> +
> >> >> >> +static __inline __m128i
> >> >> >> +_mm_set1_epi32(int i)
> >> >> >> +{
> >> >> >> +     return vdupq_n_s32(i);
> >> >> >> +}
> >> >> >> +
> >> >> >> +static __inline __m128i
> >> >> >> +_mm_and_si128(__m128i a, __m128i b)
> >> >> >> +{
> >> >> >> +     return vandq_s32(a, b);
> >> >> >> +}
> >> >> >> +
> >> >
> >> > IMO, it's not always good to emulate GCC defined intrinsics of
> >> > other architecture. What if a legacy DPDK application has such mappings
> >> > then BOOM, multiple definition, which one is correct? which one
> >> > to comment it out? Integration pain starts for DPDK library consumer:-(
> >> >
> >> They can include rte_vect.h in build/include directly, which is linked correctly
> >> to the one for that ARCH, so there is no need to worry about.
> >
> > I think you missed the point,I was trying to say that
> > legacy DPDK application and third party stacks uses SSE2NEON kind of
> > libraries
> > for quick integration, for example, something like this
> > https://github.com/jratcliff63367/sse2neon/blob/master/SSE2NEON.h
> >
> > AND they include "rte_lpm.h"(it internally includes rte_vect.h)
> > that lead to multiple definition and its not good.
> >
> But you will have similar issue since "typedef int32x4_t __m128i"
> appears in both your patch and this header file.

I just tested it, it won't break, back to back "typedef int32x4_t __m128i"
is fine(unlike inline function).

my intention to keep __m128i "as is"  because changing the __m128i to rte_???
something would break the ABI.


> 
> >>
> >>
> >> >> >
> >> >> > IMO, it makes sense to not emulate the SSE intrinsics with NEON
> >> >> > Let's create the rte_vect_* as required. look at the existing patch.
> >> >> >
> >> >> I thought of creating a layer of SIMD over all the platforms before.
> >> >> But can't you see it make things complicated, considering there are
> >> >> only few simple intrinsic to implement?
> >> >
> >> > Not true, There were, a lot of SSE intrinsics needs be to emulated for ACL NEON
> >> > implementation if I were to take this approach and emulation comes with
> >> > the cost.
> >> >
> >> No, I will not re-implement all the intrinsic like that .
> >> I only do with the simple intrinsic, such as load/store, as you said below.
> >
> > but you forced to add _mm_and_si128 also to the list and emulated
> > _mm_and_si128 intrinsic. Am just saying no emulation.
> >
> I means simple intrinsic, not load/store only.
> Depends on how you define emulation. Actually, these simple intrisinic
> could be only one NEON instruction, and will not bring cost.
> 
> >
> >>
> >> > So my take is,
> >> > lets the each architecture implementation for specific SIMD version of DPDK
> >> > API in the library should have the freedom to implement the API in
> >> > NATIVE.
> >> >
> >> > And let's create only rte_vect_* abstraction only for using
> >> > that API/library. Which boils down to have very minimal rte_vect_*
> >> > abstraction to load, store, set not beyond that.
> >> >
> >> > This makes clear "contract" between DPDK library and the applications.
> >> > and make easy for remaning new architecture  porting effort in DPDK.
> >> >
> >> Agree.
> >> But I reuse existing intrinsic names, and you recreate new ones.
> >> And I try to do as few changes as possible, and try to avoid any
> >> mistaken which may cause code un-compiled.
> >
> > Its trival to verify. Just compile it
> >
> >> I think it's design level question, we need to hear what others talk about it.
> >>
> >> > Imagine how your proposed function will look like if new architecture
> >> > wants to implement "optimized" version of rte_lpm_lookupx4
> >> >
> >> There is no optimization for this (simple) rte_lpm_lookupx4, otherwise
> >> you have done that in your patch.
> >> If there is for other new platform, defintely they should do like
> >> yours, as you did for NEON ACL.
> >>
> >> >
> >> >> If do so, we also need to explain to others how to use these interfaces.
> >> >> Besides, this patch did the smallest changes to the original code, and
> >> >> more likely to be accepted by others.
> >> >
> >> > other patch makes no changes to IA version of rte_lpm_lookupx4.I thought
> >> > that make reviewer easy to review the changes in architecture
> >> > perspective.
> >> >
> >> As I know, they don't enable LPM for PPC, and ARM is the first one to
> >> touch this issue.
> >>
> >> >>
> >> >> >
> >> >> >>  #ifdef RTE_ARCH_ARM
> >> >> >>  /* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
> >> >> >>  static __inline uint8x16_t
> >> >> >> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> >> >> >> index c299ce2..c76c07d 100644
> >> >> >> --- a/lib/librte_lpm/rte_lpm.h
> >> >> >> +++ b/lib/librte_lpm/rte_lpm.h
> >> >> >> @@ -361,6 +361,47 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
> >> >> >>  /* Mask four results. */
> >> >> >>  #define       RTE_LPM_MASKX4_RES     UINT64_C(0x00ff00ff00ff00ff)
> >> >> >>
> >> >> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> >> >> >
> >> >> > Separate out arm implementation to the different header file.
> >> >> > Too many ifdef looks odd in the header file and difficult to manage.
> >> >> >
> >> >> But there are many ifdefs already.
> >> >> And It seems unreasonable to add a new file only for one small function.
> >> >>
> >> >
> >> > small or big, its matter of each architecture to have
> >> > the freedom for the optimized version for the implementation.
> >> >
> >> > What if  other architecture demands to write this function in assembly
> >> > or restructure it for performance improvement?
> >> >
> >> If there is such demands, should do like that.
> >> But I don't see any restructure in your patch, and you still follow
> >> the logic as x86, is it worth adding a new file?
> >
> > SIMD Logic on getting  4 indexes for tbl24[] is different.
> >
> > /* get 4 indexes for tbl24[]. */
> > i24 = _mm_srli_epi32(ip, CHAR_BIT);
> >
> > /* extract values from tbl24[] */
> > idx = _mm_cvtsi128_si64(i24);
> > i24 = _mm_srli_si128(i24, sizeof(uint64_t));
> >
> > tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> > tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >
> > idx = _mm_cvtsi128_si64(i24);
> >
> > tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> > tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >
> > VS
> >
> > /* extract values from tbl24[] */
> > idx = vgetq_lane_u64((uint64x2_t)i24, 0);
> >
> > tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> > tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >
> > idx = vgetq_lane_u64((uint64x2_t)i24, 1);
> >
> > tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> > tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >
> It's only the optimazation of part of code in that function. I did the
> similar in my patch.
> But, looking from the whole, this function is not restructured, and
> the logic is the same as x86.
> 
> >>
> >> >
> >> >> >
> >> >> >> +static inline void
> >> >> >> +rte_lpm_tbl24_val4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t tbl[4])
> >> >> >> +{
> >> >> >> +     uint32x4_t i24;
> >> >> >> +     uint32_t idx[4];
> >> >> >> +
> >> >> >> +     /* get 4 indexes for tbl24[]. */
> >> >> >> +     i24 = vshrq_n_u32(vreinterpretq_u32_s32(ip), CHAR_BIT);
> >> >> >> +     vst1q_u32(idx, i24);
> >> >> >> +
> >> >> >> +     /* extract values from tbl24[] */
> >> >> >> +     tbl[0] = *(const uint16_t *)&lpm->tbl24[idx[0]];
> >> >> >> +     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx[1]];
> >> >> >> +     tbl[2] = *(const uint16_t *)&lpm->tbl24[idx[2]];
> >> >> >> +     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx[3]];
> >> >> >> +}
> >> >> >
> >> >> > Nice. There is an improvement in this portion code wrt my patch. This is
> >> >> > a candidate for convergence.
> >> >> >
> >> >> >
> >> >> >> +#else
> >> >> >> +static inline void
> >> >> >> +rte_lpm_tbl24_val4(const struct rte_lpm *lpm, __m128i ip, uint16_t tbl[4])
> >> >> >> +{
> >> >> >> +     __m128i i24;
> >> >> >> +     uint64_t idx;
> >> >> >> +
> >> >> >> +     /* get 4 indexes for tbl24[]. */
> >> >> >> +     i24 = _mm_srli_epi32(ip, CHAR_BIT);
> >> >> >> +
> >> >> >> +     /* extract values from tbl24[] */
> >> >> >> +     idx = _mm_cvtsi128_si64(i24);
> >> >> >> +     i24 = _mm_srli_si128(i24, sizeof(uint64_t));
> >> >> >> +
> >> >> >> +     tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> >> >> >> +     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >> >> >> +
> >> >> >> +     idx = _mm_cvtsi128_si64(i24);
> >> >> >> +
> >> >> >> +     tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> >> >> >> +     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >> >> >> +}
> >> >> >> +#endif
> >> >> >> +
> >> >> >>  /**
> >> >> >>   * Lookup four IP addresses in an LPM table.
> >> >> >>   *
> >> >> >> @@ -381,17 +422,19 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
> >> >> >>   *   if lookup would fail.
> >> >> >>   */
> >> >> >>  static inline void
> >> >> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> >> >> >> +rte_lpm_lookupx4(const struct rte_lpm *lpm, int32x4_t ip, uint16_t hop[4],
> >> >> >> +     uint16_t defv)
> >> >> >
> >> >> > This would call for change in the change the ABI,
> >> >> > IMO, __m128i can be used to represent 128bit vector to avoid ABI chang
> >> >> >
> >> >> This redefine rte_lpm_lookupx4 is unncessary, I will remove it, so no
> >> >> ABI change.
> >> >> And there only one ifdef for ARM platforms left.
> >> >>
> >> >> >
> >> >> >> +#else
> >> >> > separate out arm implementation to the different header file. Too many
> >> >> > ifdef looks odd in the header file.
> >> >> >
> >> >> > Could you  rebase your patch based on existing patch and send the
> >> >> > improvement portion as separate patch or I can send update patch with
> >> >> > your improvements and with your signoff.
> >> >> >
> >> >> >
> >> >> >>  rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
> >> >> >>       uint16_t defv)
> >> >> >> +#endif
> >> >> >>  {
> >> >> >> -     __m128i i24;
> >> >> >>       rte_xmm_t i8;
> >> >> >>       uint16_t tbl[4];
> >> >> >> -     uint64_t idx, pt;
> >> >> >> -
> >> >> >> -     const __m128i mask8 =
> >> >> >> -             _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX);
> >> >> >> +     uint64_t pt;
> >> >> >>
> >> >> >> +     const __m128i mask8 = _mm_set1_epi32(UINT8_MAX);
> >> >> >>       /*
> >> >> >>        * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries
> >> >> >>        * as one 64-bit value (0x0300030003000300).
> >> >> >> @@ -412,20 +455,7 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4],
> >> >> >>               (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 |
> >> >> >>               (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48);
> >> >> >>
> >> >> >> -     /* get 4 indexes for tbl24[]. */
> >> >> >> -     i24 = _mm_srli_epi32(ip, CHAR_BIT);
> >> >> >> -
> >> >> >> -     /* extract values from tbl24[] */
> >> >> >> -     idx = _mm_cvtsi128_si64(i24);
> >> >> >> -     i24 = _mm_srli_si128(i24, sizeof(uint64_t));
> >> >> >> -
> >> >> >> -     tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> >> >> >> -     tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >> >> >> -
> >> >> >> -     idx = _mm_cvtsi128_si64(i24);
> >> >> >> -
> >> >> >> -     tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx];
> >> >> >> -     tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32];
> >> >> >> +     rte_lpm_tbl24_val4(lpm, ip, tbl);
> >> >> >>
> >> >> >>       /* get 4 indexes for tbl8[]. */
> >> >> >>       i8.x = _mm_and_si128(ip, mask8);
> >> >> >> --
> >> >> >> 1.8.3.1
> >> >> >>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-02 14:34               ` Jerin Jacob
@ 2015-12-02 16:40                 ` Thomas Monjalon
  2015-12-02 16:53                   ` Jerin Jacob
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Monjalon @ 2015-12-02 16:40 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

2015-12-02 20:04, Jerin Jacob:
> On Wed, Dec 02, 2015 at 09:13:51PM +0800, Jianbo Liu wrote:
> > On 2 December 2015 at 18:39, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > > AND they include "rte_lpm.h"(it internally includes rte_vect.h)
> > > that lead to multiple definition and its not good.
> > >
> > But you will have similar issue since "typedef int32x4_t __m128i"
> > appears in both your patch and this header file.
> 
> I just tested it, it won't break, back to back "typedef int32x4_t __m128i"
> is fine(unlike inline function).
> 
> my intention to keep __m128i "as is"  because changing the __m128i to rte_???
> something would break the ABI.

Isn't it already broken in 2.2?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-02 16:40                 ` Thomas Monjalon
@ 2015-12-02 16:53                   ` Jerin Jacob
  2015-12-02 16:57                     ` Thomas Monjalon
  0 siblings, 1 reply; 50+ messages in thread
From: Jerin Jacob @ 2015-12-02 16:53 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Wed, Dec 02, 2015 at 05:40:13PM +0100, Thomas Monjalon wrote:
> 2015-12-02 20:04, Jerin Jacob:
> > On Wed, Dec 02, 2015 at 09:13:51PM +0800, Jianbo Liu wrote:
> > > On 2 December 2015 at 18:39, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > > > AND they include "rte_lpm.h"(it internally includes rte_vect.h)
> > > > that lead to multiple definition and its not good.
> > > >
> > > But you will have similar issue since "typedef int32x4_t __m128i"
> > > appears in both your patch and this header file.
> > 
> > I just tested it, it won't break, back to back "typedef int32x4_t __m128i"
> > is fine(unlike inline function).
> > 
> > my intention to keep __m128i "as is"  because changing the __m128i to rte_???
> > something would break the ABI.
> 
> Isn't it already broken in 2.2?

Does it mean, You would like to have rte_128i(or similar) kind of
abstraction to represent 128bit SIMD variable in DPDK?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-02 16:53                   ` Jerin Jacob
@ 2015-12-02 16:57                     ` Thomas Monjalon
  2015-12-02 17:38                       ` Jerin Jacob
  2015-12-03  9:33                       ` Jerin Jacob
  0 siblings, 2 replies; 50+ messages in thread
From: Thomas Monjalon @ 2015-12-02 16:57 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

2015-12-02 22:23, Jerin Jacob:
> On Wed, Dec 02, 2015 at 05:40:13PM +0100, Thomas Monjalon wrote:
> > 2015-12-02 20:04, Jerin Jacob:
> > > On Wed, Dec 02, 2015 at 09:13:51PM +0800, Jianbo Liu wrote:
> > > > On 2 December 2015 at 18:39, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > > > > AND they include "rte_lpm.h"(it internally includes rte_vect.h)
> > > > > that lead to multiple definition and its not good.
> > > > >
> > > > But you will have similar issue since "typedef int32x4_t __m128i"
> > > > appears in both your patch and this header file.
> > > 
> > > I just tested it, it won't break, back to back "typedef int32x4_t __m128i"
> > > is fine(unlike inline function).
> > > 
> > > my intention to keep __m128i "as is"  because changing the __m128i to rte_???
> > > something would break the ABI.
> > 
> > Isn't it already broken in 2.2?
> 
> Does it mean, You would like to have rte_128i(or similar) kind of
> abstraction to represent 128bit SIMD variable in DPDK?

If you are convinced that it is the best way to write a generic code, yes.
I think the most important question is to know what is the best solution
for performance and maintainability. The API/ABI questions will be considered
after.

Thanks for your involvement guys.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-02 16:57                     ` Thomas Monjalon
@ 2015-12-02 17:38                       ` Jerin Jacob
  2015-12-03  9:33                       ` Jerin Jacob
  1 sibling, 0 replies; 50+ messages in thread
From: Jerin Jacob @ 2015-12-02 17:38 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Wed, Dec 02, 2015 at 05:57:10PM +0100, Thomas Monjalon wrote:
> 2015-12-02 22:23, Jerin Jacob:
> > On Wed, Dec 02, 2015 at 05:40:13PM +0100, Thomas Monjalon wrote:
> > > 2015-12-02 20:04, Jerin Jacob:
> > > > On Wed, Dec 02, 2015 at 09:13:51PM +0800, Jianbo Liu wrote:
> > > > > On 2 December 2015 at 18:39, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > > > > > AND they include "rte_lpm.h"(it internally includes rte_vect.h)
> > > > > > that lead to multiple definition and its not good.
> > > > > >
> > > > > But you will have similar issue since "typedef int32x4_t __m128i"
> > > > > appears in both your patch and this header file.
> > > > 
> > > > I just tested it, it won't break, back to back "typedef int32x4_t __m128i"
> > > > is fine(unlike inline function).
> > > > 
> > > > my intention to keep __m128i "as is"  because changing the __m128i to rte_???
> > > > something would break the ABI.
> > > 
> > > Isn't it already broken in 2.2?
> > 
> > Does it mean, You would like to have rte_128i(or similar) kind of
> > abstraction to represent 128bit SIMD variable in DPDK?
> 
> If you are convinced that it is the best way to write a generic code, yes.
> I think the most important question is to know what is the best solution
> for performance and maintainability. The API/ABI questions will be considered

IMO, a true portable platform-independent library may need rte_128i kind
of abstracttion to represent a 128bit SIMD variable. I can send an RFC
patch to see the changes required across the DPDK.


> after.
> 
> Thanks for your involvement guys.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-02 16:57                     ` Thomas Monjalon
  2015-12-02 17:38                       ` Jerin Jacob
@ 2015-12-03  9:33                       ` Jerin Jacob
  2015-12-03 11:02                         ` Ananyev, Konstantin
  1 sibling, 1 reply; 50+ messages in thread
From: Jerin Jacob @ 2015-12-03  9:33 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Wed, Dec 02, 2015 at 05:57:10PM +0100, Thomas Monjalon wrote:
> 2015-12-02 22:23, Jerin Jacob:
> > On Wed, Dec 02, 2015 at 05:40:13PM +0100, Thomas Monjalon wrote:
> > > 2015-12-02 20:04, Jerin Jacob:
> > > > On Wed, Dec 02, 2015 at 09:13:51PM +0800, Jianbo Liu wrote:
> > > > > On 2 December 2015 at 18:39, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > > > > > AND they include "rte_lpm.h"(it internally includes rte_vect.h)
> > > > > > that lead to multiple definition and its not good.
> > > > > >
> > > > > But you will have similar issue since "typedef int32x4_t __m128i"
> > > > > appears in both your patch and this header file.
> > > >
> > > > I just tested it, it won't break, back to back "typedef int32x4_t __m128i"
> > > > is fine(unlike inline function).
> > > >
> > > > my intention to keep __m128i "as is"  because changing the __m128i to rte_???
> > > > something would break the ABI.
> > >
> > > Isn't it already broken in 2.2?
> >
> > Does it mean, You would like to have rte_128i(or similar) kind of
> > abstraction to represent 128bit SIMD variable in DPDK?
>
> If you are convinced that it is the best way to write a generic code, yes.

I grep-ed through DPDK API list to see the dependency with SIMD in API
definition.I see only rte_lpm_lookupx4 API has SIMD dependency in API
definition.

I believe that's the root cause of the problem. IMO, The
better way to fix this would be to remove __m128i from API and have more
general representation to remove the architecture dependency from API

something like this,

rte_lpm_lookupx4(const struct rte_lpm *lpm, uint32_t *ip, uint16_t
hop[4], uint16_t defv)

instead of

rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t
hop[4],  uint16_t defv)

Now I am not sure why this API was created like this, from l3fwd.c
example, it looks to accommodate the IPV4 byte swap[1]. If it's true,
maybe we can have eal byte swap abstraction for optimized byte swap on
memory for 4 IP address in one shot

or

Have rte_lpm_lookupx4 take an argument for byte swap or not ?

or

something similar?

Thoughts ?

[1]
const  __m128i bswap_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
                                                4, 5, 6, 7, 0, 1, 2, 3);
/* Byte swap 4 IPV4 addresses. */
dip = _mm_shuffle_epi8(dip, bswap_mask);

Jerin

> I think the most important question is to know what is the best solution
> for performance and maintainability. The API/ABI questions will be considered
> after.
>
> Thanks for your involvement guys.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-03  9:33                       ` Jerin Jacob
@ 2015-12-03 11:02                         ` Ananyev, Konstantin
  2015-12-03 12:17                           ` Jerin Jacob
  0 siblings, 1 reply; 50+ messages in thread
From: Ananyev, Konstantin @ 2015-12-03 11:02 UTC (permalink / raw)
  To: Jerin Jacob, Thomas Monjalon; +Cc: dev

Hi Jerin,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> Sent: Thursday, December 03, 2015 9:34 AM
> To: Thomas Monjalon
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
> 
> On Wed, Dec 02, 2015 at 05:57:10PM +0100, Thomas Monjalon wrote:
> > 2015-12-02 22:23, Jerin Jacob:
> > > On Wed, Dec 02, 2015 at 05:40:13PM +0100, Thomas Monjalon wrote:
> > > > 2015-12-02 20:04, Jerin Jacob:
> > > > > On Wed, Dec 02, 2015 at 09:13:51PM +0800, Jianbo Liu wrote:
> > > > > > On 2 December 2015 at 18:39, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > > > > > > AND they include "rte_lpm.h"(it internally includes rte_vect.h)
> > > > > > > that lead to multiple definition and its not good.
> > > > > > >
> > > > > > But you will have similar issue since "typedef int32x4_t __m128i"
> > > > > > appears in both your patch and this header file.
> > > > >
> > > > > I just tested it, it won't break, back to back "typedef int32x4_t __m128i"
> > > > > is fine(unlike inline function).
> > > > >
> > > > > my intention to keep __m128i "as is"  because changing the __m128i to rte_???
> > > > > something would break the ABI.
> > > >
> > > > Isn't it already broken in 2.2?
> > >
> > > Does it mean, You would like to have rte_128i(or similar) kind of
> > > abstraction to represent 128bit SIMD variable in DPDK?
> >
> > If you are convinced that it is the best way to write a generic code, yes.
> 
> I grep-ed through DPDK API list to see the dependency with SIMD in API
> definition.I see only rte_lpm_lookupx4 API has SIMD dependency in API
> definition.
> 
> I believe that's the root cause of the problem. IMO, The
> better way to fix this would be to remove __m128i from API and have more
> general representation to remove the architecture dependency from API
> 
> something like this,
> 
> rte_lpm_lookupx4(const struct rte_lpm *lpm, uint32_t *ip, uint16_t
> hop[4], uint16_t defv)
> 
> instead of
> 
> rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t
> hop[4],  uint16_t defv)

The idea for that function was that rte_lpm_lookupx4() accepts 4 IPv4 addresses that are:
1. already in 128bit register
2. 'prepared' - byte swap is already done for them if needed. 

About ways to fix  __m128i dependency: as I can see x86 and arm DPDK code
already has xmm_t typedef:
 
$ find lib -type f | xargs grep xmm_t | grep typedef
lib/librte_eal/common/include/arch/x86/rte_vect.h:typedef __m128i xmm_t;
lib/librte_eal/common/include/arch/arm/rte_vect.h:typedef int32x4_t xmm_t;

Why not to  change rte_lpm_lookupx4() to accept xmm_t as input parameter.
As I understand it would solve the problem, and wouldn't introduce any API/ABI breakage, right?

Konstantin

> 
> Now I am not sure why this API was created like this, from l3fwd.c
> example, it looks to accommodate the IPV4 byte swap[1]. If it's true,
> maybe we can have eal byte swap abstraction for optimized byte swap on
> memory for 4 IP address in one shot
> 
> or
> 
> Have rte_lpm_lookupx4 take an argument for byte swap or not ?
> 
> or
> 
> something similar?
> 
> Thoughts ?
> 
> [1]
> const  __m128i bswap_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
>                                                 4, 5, 6, 7, 0, 1, 2, 3);
> /* Byte swap 4 IPV4 addresses. */
> dip = _mm_shuffle_epi8(dip, bswap_mask);
> 
> Jerin
> 
> > I think the most important question is to know what is the best solution
> > for performance and maintainability. The API/ABI questions will be considered
> > after.
> >
> > Thanks for your involvement guys.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-03 11:02                         ` Ananyev, Konstantin
@ 2015-12-03 12:17                           ` Jerin Jacob
  2015-12-03 12:42                             ` Ananyev, Konstantin
  0 siblings, 1 reply; 50+ messages in thread
From: Jerin Jacob @ 2015-12-03 12:17 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Thu, Dec 03, 2015 at 11:02:07AM +0000, Ananyev, Konstantin wrote:

Hi Konstantin,

> Hi Jerin,
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > Sent: Thursday, December 03, 2015 9:34 AM
> > To: Thomas Monjalon
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
> >
> > On Wed, Dec 02, 2015 at 05:57:10PM +0100, Thomas Monjalon wrote:
> > > 2015-12-02 22:23, Jerin Jacob:
> > > > On Wed, Dec 02, 2015 at 05:40:13PM +0100, Thomas Monjalon wrote:
> > > > > 2015-12-02 20:04, Jerin Jacob:
> > > > > > On Wed, Dec 02, 2015 at 09:13:51PM +0800, Jianbo Liu wrote:
> > > > > > > On 2 December 2015 at 18:39, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > > > > > > > AND they include "rte_lpm.h"(it internally includes rte_vect.h)
> > > > > > > > that lead to multiple definition and its not good.
> > > > > > > >
> > > > > > > But you will have similar issue since "typedef int32x4_t __m128i"
> > > > > > > appears in both your patch and this header file.
> > > > > >
> > > > > > I just tested it, it won't break, back to back "typedef int32x4_t __m128i"
> > > > > > is fine(unlike inline function).
> > > > > >
> > > > > > my intention to keep __m128i "as is"  because changing the __m128i to rte_???
> > > > > > something would break the ABI.
> > > > >
> > > > > Isn't it already broken in 2.2?
> > > >
> > > > Does it mean, You would like to have rte_128i(or similar) kind of
> > > > abstraction to represent 128bit SIMD variable in DPDK?
> > >
> > > If you are convinced that it is the best way to write a generic code, yes.
> >
> > I grep-ed through DPDK API list to see the dependency with SIMD in API
> > definition.I see only rte_lpm_lookupx4 API has SIMD dependency in API
> > definition.
> >
> > I believe that's the root cause of the problem. IMO, The
> > better way to fix this would be to remove __m128i from API and have more
> > general representation to remove the architecture dependency from API
> >
> > something like this,
> >
> > rte_lpm_lookupx4(const struct rte_lpm *lpm, uint32_t *ip, uint16_t
> > hop[4], uint16_t defv)
> >
> > instead of
> >
> > rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t
> > hop[4],  uint16_t defv)
>
> The idea for that function was that rte_lpm_lookupx4() accepts 4 IPv4 addresses that are:
> 1. already in 128bit register
> 2. 'prepared' - byte swap is already done for them if needed.
>
> About ways to fix  __m128i dependency: as I can see x86 and arm DPDK code
> already has xmm_t typedef:
>
> $ find lib -type f | xargs grep xmm_t | grep typedef
> lib/librte_eal/common/include/arch/x86/rte_vect.h:typedef __m128i xmm_t;
> lib/librte_eal/common/include/arch/arm/rte_vect.h:typedef int32x4_t xmm_t;
>
> Why not to  change rte_lpm_lookupx4() to accept xmm_t as input parameter.
> As I understand it would solve the problem, and wouldn't introduce any API/ABI breakage, right?

Yes, If we have API/ABI breakage concerns.

IMO, Now this would call for some kind of rte_vect_* abstraction load,
store, set kind of SIMD operation on xmm_t in common test code to
aviod #ifdef's in app/test/test_lpm.c

I guess we may not need those abstractions in
lib/librte_eal/common/include/arch/ directory.
keeping in app/test/xmmt_ops.h should be enough, right?


>
> Konstantin
>
> >
> > Now I am not sure why this API was created like this, from l3fwd.c
> > example, it looks to accommodate the IPV4 byte swap[1]. If it's true,
> > maybe we can have eal byte swap abstraction for optimized byte swap on
> > memory for 4 IP address in one shot
> >
> > or
> >
> > Have rte_lpm_lookupx4 take an argument for byte swap or not ?
> >
> > or
> >
> > something similar?
> >
> > Thoughts ?
> >
> > [1]
> > const  __m128i bswap_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
> >                                                 4, 5, 6, 7, 0, 1, 2, 3);
> > /* Byte swap 4 IPV4 addresses. */
> > dip = _mm_shuffle_epi8(dip, bswap_mask);
> >
> > Jerin
> >
> > > I think the most important question is to know what is the best solution
> > > for performance and maintainability. The API/ABI questions will be considered
> > > after.
> > >
> > > Thanks for your involvement guys.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-03 12:17                           ` Jerin Jacob
@ 2015-12-03 12:42                             ` Ananyev, Konstantin
  2015-12-03 13:20                               ` Jerin Jacob
  0 siblings, 1 reply; 50+ messages in thread
From: Ananyev, Konstantin @ 2015-12-03 12:42 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev



> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Thursday, December 03, 2015 12:17 PM
> To: Ananyev, Konstantin
> Cc: Thomas Monjalon; dev@dpdk.org; viktorin@rehivetech.com
> Subject: Re: [dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
> 
> On Thu, Dec 03, 2015 at 11:02:07AM +0000, Ananyev, Konstantin wrote:
> 
> Hi Konstantin,
> 
> > Hi Jerin,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > > Sent: Thursday, December 03, 2015 9:34 AM
> > > To: Thomas Monjalon
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
> > >
> > > On Wed, Dec 02, 2015 at 05:57:10PM +0100, Thomas Monjalon wrote:
> > > > 2015-12-02 22:23, Jerin Jacob:
> > > > > On Wed, Dec 02, 2015 at 05:40:13PM +0100, Thomas Monjalon wrote:
> > > > > > 2015-12-02 20:04, Jerin Jacob:
> > > > > > > On Wed, Dec 02, 2015 at 09:13:51PM +0800, Jianbo Liu wrote:
> > > > > > > > On 2 December 2015 at 18:39, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > > > > > > > > AND they include "rte_lpm.h"(it internally includes rte_vect.h)
> > > > > > > > > that lead to multiple definition and its not good.
> > > > > > > > >
> > > > > > > > But you will have similar issue since "typedef int32x4_t __m128i"
> > > > > > > > appears in both your patch and this header file.
> > > > > > >
> > > > > > > I just tested it, it won't break, back to back "typedef int32x4_t __m128i"
> > > > > > > is fine(unlike inline function).
> > > > > > >
> > > > > > > my intention to keep __m128i "as is"  because changing the __m128i to rte_???
> > > > > > > something would break the ABI.
> > > > > >
> > > > > > Isn't it already broken in 2.2?
> > > > >
> > > > > Does it mean, You would like to have rte_128i(or similar) kind of
> > > > > abstraction to represent 128bit SIMD variable in DPDK?
> > > >
> > > > If you are convinced that it is the best way to write a generic code, yes.
> > >
> > > I grep-ed through DPDK API list to see the dependency with SIMD in API
> > > definition.I see only rte_lpm_lookupx4 API has SIMD dependency in API
> > > definition.
> > >
> > > I believe that's the root cause of the problem. IMO, The
> > > better way to fix this would be to remove __m128i from API and have more
> > > general representation to remove the architecture dependency from API
> > >
> > > something like this,
> > >
> > > rte_lpm_lookupx4(const struct rte_lpm *lpm, uint32_t *ip, uint16_t
> > > hop[4], uint16_t defv)
> > >
> > > instead of
> > >
> > > rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t
> > > hop[4],  uint16_t defv)
> >
> > The idea for that function was that rte_lpm_lookupx4() accepts 4 IPv4 addresses that are:
> > 1. already in 128bit register
> > 2. 'prepared' - byte swap is already done for them if needed.
> >
> > About ways to fix  __m128i dependency: as I can see x86 and arm DPDK code
> > already has xmm_t typedef:
> >
> > $ find lib -type f | xargs grep xmm_t | grep typedef
> > lib/librte_eal/common/include/arch/x86/rte_vect.h:typedef __m128i xmm_t;
> > lib/librte_eal/common/include/arch/arm/rte_vect.h:typedef int32x4_t xmm_t;
> >
> > Why not to  change rte_lpm_lookupx4() to accept xmm_t as input parameter.
> > As I understand it would solve the problem, and wouldn't introduce any API/ABI breakage, right?
> 
> Yes, If we have API/ABI breakage concerns.
> 
> IMO, Now this would call for some kind of rte_vect_* abstraction load,
> store, set kind of SIMD operation on xmm_t in common test code to
> aviod #ifdef's in app/test/test_lpm.c

Yes, seems so.

> 
> I guess we may not need those abstractions in
> lib/librte_eal/common/include/arch/ directory.
> keeping in app/test/xmmt_ops.h should be enough, right?

That sounds ok to me.
At least for now.
For future the more generic question - do we like to have some
generic layer abstraction for similar vector instrincts across different archs?
>From one side it might help people writing/using vector implementation of some stuff,
from other side - there would be extra hassle creating/supporting it.    

Konstantin

> 
> 
> >
> > Konstantin
> >
> > >
> > > Now I am not sure why this API was created like this, from l3fwd.c
> > > example, it looks to accommodate the IPV4 byte swap[1]. If it's true,
> > > maybe we can have eal byte swap abstraction for optimized byte swap on
> > > memory for 4 IP address in one shot
> > >
> > > or
> > >
> > > Have rte_lpm_lookupx4 take an argument for byte swap or not ?
> > >
> > > or
> > >
> > > something similar?
> > >
> > > Thoughts ?
> > >
> > > [1]
> > > const  __m128i bswap_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
> > >                                                 4, 5, 6, 7, 0, 1, 2, 3);
> > > /* Byte swap 4 IPV4 addresses. */
> > > dip = _mm_shuffle_epi8(dip, bswap_mask);
> > >
> > > Jerin
> > >
> > > > I think the most important question is to know what is the best solution
> > > > for performance and maintainability. The API/ABI questions will be considered
> > > > after.
> > > >
> > > > Thanks for your involvement guys.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
  2015-12-03 12:42                             ` Ananyev, Konstantin
@ 2015-12-03 13:20                               ` Jerin Jacob
  0 siblings, 0 replies; 50+ messages in thread
From: Jerin Jacob @ 2015-12-03 13:20 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Thu, Dec 03, 2015 at 12:42:13PM +0000, Ananyev, Konstantin wrote:
>
>
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Thursday, December 03, 2015 12:17 PM
> > To: Ananyev, Konstantin
> > Cc: Thomas Monjalon; dev@dpdk.org; viktorin@rehivetech.com
> > Subject: Re: [dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
> >
> > On Thu, Dec 03, 2015 at 11:02:07AM +0000, Ananyev, Konstantin wrote:
> >
> > Hi Konstantin,
> >
> > > Hi Jerin,
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > > > Sent: Thursday, December 03, 2015 9:34 AM
> > > > To: Thomas Monjalon
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
> > > >
> > > > On Wed, Dec 02, 2015 at 05:57:10PM +0100, Thomas Monjalon wrote:
> > > > > 2015-12-02 22:23, Jerin Jacob:
> > > > > > On Wed, Dec 02, 2015 at 05:40:13PM +0100, Thomas Monjalon wrote:
> > > > > > > 2015-12-02 20:04, Jerin Jacob:
> > > > > > > > On Wed, Dec 02, 2015 at 09:13:51PM +0800, Jianbo Liu wrote:
> > > > > > > > > On 2 December 2015 at 18:39, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > > > > > > > > > AND they include "rte_lpm.h"(it internally includes rte_vect.h)
> > > > > > > > > > that lead to multiple definition and its not good.
> > > > > > > > > >
> > > > > > > > > But you will have similar issue since "typedef int32x4_t __m128i"
> > > > > > > > > appears in both your patch and this header file.
> > > > > > > >
> > > > > > > > I just tested it, it won't break, back to back "typedef int32x4_t __m128i"
> > > > > > > > is fine(unlike inline function).
> > > > > > > >
> > > > > > > > my intention to keep __m128i "as is"  because changing the __m128i to rte_???
> > > > > > > > something would break the ABI.
> > > > > > >
> > > > > > > Isn't it already broken in 2.2?
> > > > > >
> > > > > > Does it mean, You would like to have rte_128i(or similar) kind of
> > > > > > abstraction to represent 128bit SIMD variable in DPDK?
> > > > >
> > > > > If you are convinced that it is the best way to write a generic code, yes.
> > > >
> > > > I grep-ed through DPDK API list to see the dependency with SIMD in API
> > > > definition.I see only rte_lpm_lookupx4 API has SIMD dependency in API
> > > > definition.
> > > >
> > > > I believe that's the root cause of the problem. IMO, The
> > > > better way to fix this would be to remove __m128i from API and have more
> > > > general representation to remove the architecture dependency from API
> > > >
> > > > something like this,
> > > >
> > > > rte_lpm_lookupx4(const struct rte_lpm *lpm, uint32_t *ip, uint16_t
> > > > hop[4], uint16_t defv)
> > > >
> > > > instead of
> > > >
> > > > rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t
> > > > hop[4],  uint16_t defv)
> > >
> > > The idea for that function was that rte_lpm_lookupx4() accepts 4 IPv4 addresses that are:
> > > 1. already in 128bit register
> > > 2. 'prepared' - byte swap is already done for them if needed.
> > >
> > > About ways to fix  __m128i dependency: as I can see x86 and arm DPDK code
> > > already has xmm_t typedef:
> > >
> > > $ find lib -type f | xargs grep xmm_t | grep typedef
> > > lib/librte_eal/common/include/arch/x86/rte_vect.h:typedef __m128i xmm_t;
> > > lib/librte_eal/common/include/arch/arm/rte_vect.h:typedef int32x4_t xmm_t;
> > >
> > > Why not to  change rte_lpm_lookupx4() to accept xmm_t as input parameter.
> > > As I understand it would solve the problem, and wouldn't introduce any API/ABI breakage, right?
> >
> > Yes, If we have API/ABI breakage concerns.
> >
> > IMO, Now this would call for some kind of rte_vect_* abstraction load,
> > store, set kind of SIMD operation on xmm_t in common test code to
> > aviod #ifdef's in app/test/test_lpm.c
>
> Yes, seems so.
>
> >
> > I guess we may not need those abstractions in
> > lib/librte_eal/common/include/arch/ directory.
> > keeping in app/test/xmmt_ops.h should be enough, right?
>
> That sounds ok to me.
> At least for now.
> For future the more generic question - do we like to have some
> generic layer abstraction for similar vector instrincts across different archs?
> From one side it might help people writing/using vector implementation of some stuff,
> from other side - there would be extra hassle creating/supporting it.

There are few such libaries avilable on web. example:

NEON -> SSE
https://software.intel.com/sites/default/files/managed/cf/f6/NEONvsSSE.h

SSE -> NEON
https://github.com/jratcliff63367/sse2neon/blob/master/SSE2NEON.h

but coming up with common abstraction will  be difficult as it's not
one to one mapped all the time and performance criteria to choose
the instruction on given architecture to realize a certain logic etc

Jerin

>
> Konstantin
>
> >
> >
> > >
> > > Konstantin
> > >
> > > >
> > > > Now I am not sure why this API was created like this, from l3fwd.c
> > > > example, it looks to accommodate the IPV4 byte swap[1]. If it's true,
> > > > maybe we can have eal byte swap abstraction for optimized byte swap on
> > > > memory for 4 IP address in one shot
> > > >
> > > > or
> > > >
> > > > Have rte_lpm_lookupx4 take an argument for byte swap or not ?
> > > >
> > > > or
> > > >
> > > > something similar?
> > > >
> > > > Thoughts ?
> > > >
> > > > [1]
> > > > const  __m128i bswap_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
> > > >                                                 4, 5, 6, 7, 0, 1, 2, 3);
> > > > /* Byte swap 4 IPV4 addresses. */
> > > > dip = _mm_shuffle_epi8(dip, bswap_mask);
> > > >
> > > > Jerin
> > > >
> > > > > I think the most important question is to know what is the best solution
> > > > > for performance and maintainability. The API/ABI questions will be considered
> > > > > after.
> > > > >
> > > > > Thanks for your involvement guys.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2 0/3] support acl lib for armv7-a and a small fix
  2015-12-01 18:41 [PATCH 0/4] support acl/lpm/table/pipeline libs for armv7 and armv8 Jianbo Liu
                   ` (4 preceding siblings ...)
  2015-12-01 18:41 ` [PATCH 4/4] maintainers: claim resposibility for ARMv7 and ARMv8 Jianbo Liu
@ 2015-12-03 15:02 ` Jianbo Liu
  2015-12-03 15:02   ` [PATCH v2 1/3] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h Jianbo Liu
                     ` (3 more replies)
  5 siblings, 4 replies; 50+ messages in thread
From: Jianbo Liu @ 2015-12-03 15:02 UTC (permalink / raw)
  To: dev

This patchset includes a small fix in rte_cycle_32.h,
and support ACL for armv7-a platform.

v2:
- select alg as RTE_ACL_CLASSIFY_NEON only when NEON is checked in cpuflags.
- remove lpm/table/pipeline patch, and part of change will be merged into Jerin's.

Jianbo Liu (3):
  eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h
  eal/acl: enable acl for armv7-a
  maintainers: claim resposibility for ARMv7 and ARMv8

 MAINTAINERS                                        |  2 ++
 config/defconfig_arm-armv7a-linuxapp-gcc           |  1 -
 lib/librte_acl/Makefile                            |  2 +-
 lib/librte_acl/rte_acl.c                           |  5 ++++-
 .../common/include/arch/arm/rte_cycles_32.h        |  2 +-
 lib/librte_eal/common/include/arch/arm/rte_vect.h  | 23 ++++++++++++++++++++++
 6 files changed, 31 insertions(+), 4 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2 1/3] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h
  2015-12-03 15:02 ` [PATCH v2 0/3] support acl lib for armv7-a and a small fix Jianbo Liu
@ 2015-12-03 15:02   ` Jianbo Liu
  2015-12-08  1:13     ` Thomas Monjalon
  2015-12-03 15:02   ` [PATCH v2 2/3] eal/acl: enable acl for armv7-a Jianbo Liu
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 50+ messages in thread
From: Jianbo Liu @ 2015-12-03 15:02 UTC (permalink / raw)
  To: dev

CONFIG_* from config files can not be used in code.

Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
Acked-by: Jan Viktorin <viktorin@rehivetech.com>
---
 lib/librte_eal/common/include/arch/arm/rte_cycles_32.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h b/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
index 6c6098e..9c1be71 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_32.h
@@ -54,7 +54,7 @@ extern "C" {
  * @return
  *   The time base for this lcore.
  */
-#ifndef CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU
+#ifndef RTE_ARM_EAL_RDTSC_USE_PMU
 
 /**
  * This call is easily portable to any ARM architecture, however,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 2/3] eal/acl: enable acl for armv7-a
  2015-12-03 15:02 ` [PATCH v2 0/3] support acl lib for armv7-a and a small fix Jianbo Liu
  2015-12-03 15:02   ` [PATCH v2 1/3] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h Jianbo Liu
@ 2015-12-03 15:02   ` Jianbo Liu
  2015-12-03 15:13     ` Jerin Jacob
  2015-12-08  1:18     ` Thomas Monjalon
  2015-12-03 15:02   ` [PATCH v2 3/3] maintainers: claim resposibility for ARMv7 and ARMv8 Jianbo Liu
  2015-12-08  1:24   ` [PATCH v2 0/3] support acl lib for armv7-a and a small fix Thomas Monjalon
  3 siblings, 2 replies; 50+ messages in thread
From: Jianbo Liu @ 2015-12-03 15:02 UTC (permalink / raw)
  To: dev

Implement vqtbl1q_u8 intrinsic function, which is not support in armv7-a.

Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
---
 config/defconfig_arm-armv7a-linuxapp-gcc          |  1 -
 lib/librte_acl/Makefile                           |  2 +-
 lib/librte_acl/rte_acl.c                          |  5 ++++-
 lib/librte_eal/common/include/arch/arm/rte_vect.h | 23 +++++++++++++++++++++++
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
index 9924ff9..cbebd64 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -53,7 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
 CONFIG_RTE_EAL_IGB_UIO=n
 
 # fails to compile on ARM
-CONFIG_RTE_LIBRTE_ACL=n
 CONFIG_RTE_LIBRTE_LPM=n
 CONFIG_RTE_LIBRTE_TABLE=n
 CONFIG_RTE_LIBRTE_PIPELINE=n
diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
index 897237d..2e394c9 100644
--- a/lib/librte_acl/Makefile
+++ b/lib/librte_acl/Makefile
@@ -49,7 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_bld.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_scalar.c
 
-ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
+ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_neon.c
 CFLAGS_acl_run_neon.o += -flax-vector-conversions -Wno-maybe-uninitialized
 else
diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
index e2fdebd..4ba9786 100644
--- a/lib/librte_acl/rte_acl.c
+++ b/lib/librte_acl/rte_acl.c
@@ -114,8 +114,11 @@ rte_acl_init(void)
 {
 	enum rte_acl_classify_alg alg = RTE_ACL_CLASSIFY_DEFAULT;
 
-#ifdef RTE_ARCH_ARM64
+#if defined(RTE_ARCH_ARM64)
 	alg =  RTE_ACL_CLASSIFY_NEON;
+#elif defined(RTE_ARCH_ARM)
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+		alg =  RTE_ACL_CLASSIFY_NEON;
 #else
 #ifdef CC_AVX2_SUPPORT
 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
index 21cdb4d..a33c054 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
@@ -53,6 +53,29 @@ typedef union rte_xmm {
 	double   pd[XMM_SIZE / sizeof(double)];
 } __attribute__((aligned(16))) rte_xmm_t;
 
+#ifdef RTE_ARCH_ARM
+/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
+static __inline uint8x16_t
+vqtbl1q_u8(uint8x16_t a, uint8x16_t b)
+{
+	uint8_t i, pos;
+	rte_xmm_t rte_a, rte_b, rte_ret;
+
+	vst1q_u8(rte_a.u8, a);
+	vst1q_u8(rte_b.u8, b);
+
+	for (i = 0; i < 16; i++) {
+		pos = rte_b.u8[i];
+		if (pos < 16)
+			rte_ret.u8[i] = rte_a.u8[pos];
+		else
+			rte_ret.u8[i] = 0;
+	}
+
+	return vld1q_u8(rte_ret.u8);
+}
+#endif
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 3/3] maintainers: claim resposibility for ARMv7 and ARMv8
  2015-12-03 15:02 ` [PATCH v2 0/3] support acl lib for armv7-a and a small fix Jianbo Liu
  2015-12-03 15:02   ` [PATCH v2 1/3] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h Jianbo Liu
  2015-12-03 15:02   ` [PATCH v2 2/3] eal/acl: enable acl for armv7-a Jianbo Liu
@ 2015-12-03 15:02   ` Jianbo Liu
  2015-12-08  1:24   ` [PATCH v2 0/3] support acl lib for armv7-a and a small fix Thomas Monjalon
  3 siblings, 0 replies; 50+ messages in thread
From: Jianbo Liu @ 2015-12-03 15:02 UTC (permalink / raw)
  To: dev

Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4478862..f859985 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -124,10 +124,12 @@ F: doc/guides/sample_app_ug/multi_process.rst
 
 ARM v7
 M: Jan Viktorin <viktorin@rehivetech.com>
+M: Jianbo Liu <jianbo.liu@linaro.org>
 F: lib/librte_eal/common/include/arch/arm/
 
 ARM v8
 M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
+M: Jianbo Liu <jianbo.liu@linaro.org>
 F: lib/librte_eal/common/include/arch/arm/*_64.h
 F: lib/librte_acl/acl_run_neon.*
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 2/3] eal/acl: enable acl for armv7-a
  2015-12-03 15:02   ` [PATCH v2 2/3] eal/acl: enable acl for armv7-a Jianbo Liu
@ 2015-12-03 15:13     ` Jerin Jacob
  2015-12-08  1:18     ` Thomas Monjalon
  1 sibling, 0 replies; 50+ messages in thread
From: Jerin Jacob @ 2015-12-03 15:13 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

On Thu, Dec 03, 2015 at 11:02:55PM +0800, Jianbo Liu wrote:
> Implement vqtbl1q_u8 intrinsic function, which is not support in armv7-a.
> 
> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

> ---
>  config/defconfig_arm-armv7a-linuxapp-gcc          |  1 -
>  lib/librte_acl/Makefile                           |  2 +-
>  lib/librte_acl/rte_acl.c                          |  5 ++++-
>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 23 +++++++++++++++++++++++
>  4 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> index 9924ff9..cbebd64 100644
> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> @@ -53,7 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
>  CONFIG_RTE_EAL_IGB_UIO=n
>  
>  # fails to compile on ARM
> -CONFIG_RTE_LIBRTE_ACL=n
>  CONFIG_RTE_LIBRTE_LPM=n
>  CONFIG_RTE_LIBRTE_TABLE=n
>  CONFIG_RTE_LIBRTE_PIPELINE=n
> diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
> index 897237d..2e394c9 100644
> --- a/lib/librte_acl/Makefile
> +++ b/lib/librte_acl/Makefile
> @@ -49,7 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_bld.c
>  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c
>  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_scalar.c
>  
> -ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
> +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
>  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_neon.c
>  CFLAGS_acl_run_neon.o += -flax-vector-conversions -Wno-maybe-uninitialized
>  else
> diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
> index e2fdebd..4ba9786 100644
> --- a/lib/librte_acl/rte_acl.c
> +++ b/lib/librte_acl/rte_acl.c
> @@ -114,8 +114,11 @@ rte_acl_init(void)
>  {
>  	enum rte_acl_classify_alg alg = RTE_ACL_CLASSIFY_DEFAULT;
>  
> -#ifdef RTE_ARCH_ARM64
> +#if defined(RTE_ARCH_ARM64)
>  	alg =  RTE_ACL_CLASSIFY_NEON;
> +#elif defined(RTE_ARCH_ARM)
> +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
> +		alg =  RTE_ACL_CLASSIFY_NEON;
>  #else
>  #ifdef CC_AVX2_SUPPORT
>  	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> index 21cdb4d..a33c054 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> @@ -53,6 +53,29 @@ typedef union rte_xmm {
>  	double   pd[XMM_SIZE / sizeof(double)];
>  } __attribute__((aligned(16))) rte_xmm_t;
>  
> +#ifdef RTE_ARCH_ARM
> +/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
> +static __inline uint8x16_t
> +vqtbl1q_u8(uint8x16_t a, uint8x16_t b)
> +{
> +	uint8_t i, pos;
> +	rte_xmm_t rte_a, rte_b, rte_ret;
> +
> +	vst1q_u8(rte_a.u8, a);
> +	vst1q_u8(rte_b.u8, b);
> +
> +	for (i = 0; i < 16; i++) {
> +		pos = rte_b.u8[i];
> +		if (pos < 16)
> +			rte_ret.u8[i] = rte_a.u8[pos];
> +		else
> +			rte_ret.u8[i] = 0;
> +	}
> +
> +	return vld1q_u8(rte_ret.u8);
> +}
> +#endif
> +
>  #ifdef __cplusplus
>  }
>  #endif
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/3] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h
  2015-12-03 15:02   ` [PATCH v2 1/3] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h Jianbo Liu
@ 2015-12-08  1:13     ` Thomas Monjalon
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Monjalon @ 2015-12-08  1:13 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

2015-12-03 23:02, Jianbo Liu:
> CONFIG_* from config files can not be used in code.
> 
> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
> Acked-by: Jan Viktorin <viktorin@rehivetech.com>

Fixes: 12f45fa7e29b ("eal/arm: read timer from PMU if enabled")

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 2/3] eal/acl: enable acl for armv7-a
  2015-12-03 15:02   ` [PATCH v2 2/3] eal/acl: enable acl for armv7-a Jianbo Liu
  2015-12-03 15:13     ` Jerin Jacob
@ 2015-12-08  1:18     ` Thomas Monjalon
  2015-12-08  1:50       ` Jianbo Liu
  1 sibling, 1 reply; 50+ messages in thread
From: Thomas Monjalon @ 2015-12-08  1:18 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

2015-12-03 23:02, Jianbo Liu:
> -ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
> +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
[...]
> +#ifdef RTE_ARCH_ARM
> +/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */

I'm convinced there is a good reason why ARMv8 is also called ARCH_ARM64,
and ARMv7 may be called AArch32 or ARCH_ARM. But I don't know why?

Is ARCH_ARM32 or ARCH_ARMv7 too simple?
Is it possible to have a 32-bit ARMv8?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/3] support acl lib for armv7-a and a small fix
  2015-12-03 15:02 ` [PATCH v2 0/3] support acl lib for armv7-a and a small fix Jianbo Liu
                     ` (2 preceding siblings ...)
  2015-12-03 15:02   ` [PATCH v2 3/3] maintainers: claim resposibility for ARMv7 and ARMv8 Jianbo Liu
@ 2015-12-08  1:24   ` Thomas Monjalon
  3 siblings, 0 replies; 50+ messages in thread
From: Thomas Monjalon @ 2015-12-08  1:24 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

2015-12-03 23:02, Jianbo Liu:
> This patchset includes a small fix in rte_cycle_32.h,
> and support ACL for armv7-a platform.
> 
> v2:
> - select alg as RTE_ACL_CLASSIFY_NEON only when NEON is checked in cpuflags.
> - remove lpm/table/pipeline patch, and part of change will be merged into Jerin's.
> 
> Jianbo Liu (3):
>   eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h
>   eal/acl: enable acl for armv7-a
>   maintainers: claim resposibility for ARMv7 and ARMv8

Applied, thanks

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 2/3] eal/acl: enable acl for armv7-a
  2015-12-08  1:18     ` Thomas Monjalon
@ 2015-12-08  1:50       ` Jianbo Liu
  2015-12-08  2:23         ` Thomas Monjalon
  0 siblings, 1 reply; 50+ messages in thread
From: Jianbo Liu @ 2015-12-08  1:50 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On 8 December 2015 at 09:18, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 2015-12-03 23:02, Jianbo Liu:
>> -ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
>> +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
> [...]
>> +#ifdef RTE_ARCH_ARM
>> +/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
>
> I'm convinced there is a good reason why ARMv8 is also called ARCH_ARM64,
> and ARMv7 may be called AArch32 or ARCH_ARM. But I don't know why?
>
https://lkml.org/lkml/2012/7/15/133

> Is ARCH_ARM32 or ARCH_ARMv7 too simple?
> Is it possible to have a 32-bit ARMv8?
Yes, ARMv8-R/M

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 2/3] eal/acl: enable acl for armv7-a
  2015-12-08  1:50       ` Jianbo Liu
@ 2015-12-08  2:23         ` Thomas Monjalon
  2015-12-08  7:56           ` Jianbo Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Monjalon @ 2015-12-08  2:23 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

2015-12-08 09:50, Jianbo Liu:
> On 8 December 2015 at 09:18, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> > 2015-12-03 23:02, Jianbo Liu:
> >> -ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
> >> +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
> > [...]
> >> +#ifdef RTE_ARCH_ARM
> >> +/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
> >
> > I'm convinced there is a good reason why ARMv8 is also called ARCH_ARM64,
> > and ARMv7 may be called AArch32 or ARCH_ARM. But I don't know why?
> >
> https://lkml.org/lkml/2012/7/15/133
> 
> > Is ARCH_ARM32 or ARCH_ARMv7 too simple?
> > Is it possible to have a 32-bit ARMv8?
> Yes, ARMv8-R/M

So what does mean CONFIG_RTE_ARCH_ARM?
ARMv7? ARM32?
Please consider a renaming.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 2/3] eal/acl: enable acl for armv7-a
  2015-12-08  2:23         ` Thomas Monjalon
@ 2015-12-08  7:56           ` Jianbo Liu
  2015-12-08 10:03             ` Thomas Monjalon
  0 siblings, 1 reply; 50+ messages in thread
From: Jianbo Liu @ 2015-12-08  7:56 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On 8 December 2015 at 10:23, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 2015-12-08 09:50, Jianbo Liu:
>> On 8 December 2015 at 09:18, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
>> > 2015-12-03 23:02, Jianbo Liu:
>> >> -ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
>> >> +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
>> > [...]
>> >> +#ifdef RTE_ARCH_ARM
>> >> +/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
>> >
>> > I'm convinced there is a good reason why ARMv8 is also called ARCH_ARM64,
>> > and ARMv7 may be called AArch32 or ARCH_ARM. But I don't know why?
>> >
>> https://lkml.org/lkml/2012/7/15/133
>>
>> > Is ARCH_ARM32 or ARCH_ARMv7 too simple?
>> > Is it possible to have a 32-bit ARMv8?
>> Yes, ARMv8-R/M
>
> So what does mean CONFIG_RTE_ARCH_ARM?
> ARMv7? ARM32?
> Please consider a renaming.

I'd rather not renaming becase it can be both ARMv7 and AARCH32, which
are ISA compatibility.
If further differentiation is needed, CONFIG_RTE_ARCH_ARMv7 is added
in the config, just like Jan Viktorin did.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 2/3] eal/acl: enable acl for armv7-a
  2015-12-08  7:56           ` Jianbo Liu
@ 2015-12-08 10:03             ` Thomas Monjalon
  2015-12-08 10:21               ` Jianbo Liu
  2015-12-08 10:25               ` Jan Viktorin
  0 siblings, 2 replies; 50+ messages in thread
From: Thomas Monjalon @ 2015-12-08 10:03 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

2015-12-08 15:56, Jianbo Liu:
> On 8 December 2015 at 10:23, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> > 2015-12-08 09:50, Jianbo Liu:
> >> On 8 December 2015 at 09:18, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> >> > 2015-12-03 23:02, Jianbo Liu:
> >> >> -ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
> >> >> +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
> >> > [...]
> >> >> +#ifdef RTE_ARCH_ARM
> >> >> +/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
> >> >
> >> > I'm convinced there is a good reason why ARMv8 is also called ARCH_ARM64,
> >> > and ARMv7 may be called AArch32 or ARCH_ARM. But I don't know why?
> >> >
> >> https://lkml.org/lkml/2012/7/15/133
> >>
> >> > Is ARCH_ARM32 or ARCH_ARMv7 too simple?
> >> > Is it possible to have a 32-bit ARMv8?
> >> Yes, ARMv8-R/M
> >
> > So what does mean CONFIG_RTE_ARCH_ARM?
> > ARMv7? ARM32?
> > Please consider a renaming.
> 
> I'd rather not renaming becase it can be both ARMv7 and AARCH32, which
> are ISA compatibility.
> If further differentiation is needed, CONFIG_RTE_ARCH_ARMv7 is added
> in the config, just like Jan Viktorin did.

I don't understand.
You say CONFIG_RTE_ARCH_ARM is for ARMv7 and AARCH32, right?
Both are 32-bit right?
Why not rename it to CONFIG_RTE_ARCH_ARM32?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 2/3] eal/acl: enable acl for armv7-a
  2015-12-08 10:03             ` Thomas Monjalon
@ 2015-12-08 10:21               ` Jianbo Liu
  2015-12-08 10:38                 ` Thomas Monjalon
  2015-12-08 10:25               ` Jan Viktorin
  1 sibling, 1 reply; 50+ messages in thread
From: Jianbo Liu @ 2015-12-08 10:21 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On 8 December 2015 at 18:03, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 2015-12-08 15:56, Jianbo Liu:
>> On 8 December 2015 at 10:23, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
>> > 2015-12-08 09:50, Jianbo Liu:
>> >> On 8 December 2015 at 09:18, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
>> >> > 2015-12-03 23:02, Jianbo Liu:
>> >> >> -ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
>> >> >> +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
>> >> > [...]
>> >> >> +#ifdef RTE_ARCH_ARM
>> >> >> +/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
>> >> >
>> >> > I'm convinced there is a good reason why ARMv8 is also called ARCH_ARM64,
>> >> > and ARMv7 may be called AArch32 or ARCH_ARM. But I don't know why?
>> >> >
>> >> https://lkml.org/lkml/2012/7/15/133
>> >>
>> >> > Is ARCH_ARM32 or ARCH_ARMv7 too simple?
>> >> > Is it possible to have a 32-bit ARMv8?
>> >> Yes, ARMv8-R/M
>> >
>> > So what does mean CONFIG_RTE_ARCH_ARM?
>> > ARMv7? ARM32?
>> > Please consider a renaming.
>>
>> I'd rather not renaming becase it can be both ARMv7 and AARCH32, which
>> are ISA compatibility.
>> If further differentiation is needed, CONFIG_RTE_ARCH_ARMv7 is added
>> in the config, just like Jan Viktorin did.
>
> I don't understand.
> You say CONFIG_RTE_ARCH_ARM is for ARMv7 and AARCH32, right?
> Both are 32-bit right?
> Why not rename it to CONFIG_RTE_ARCH_ARM32?

I understand that you want to make the naming more clear.
But arm/arm64 are used in Linux kernel, I think it's better to stay the same.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 2/3] eal/acl: enable acl for armv7-a
  2015-12-08 10:03             ` Thomas Monjalon
  2015-12-08 10:21               ` Jianbo Liu
@ 2015-12-08 10:25               ` Jan Viktorin
  1 sibling, 0 replies; 50+ messages in thread
From: Jan Viktorin @ 2015-12-08 10:25 UTC (permalink / raw)
  To: Thomas Monjalon, Jianbo Liu; +Cc: dev

  Původní zpráva  
Od: Thomas Monjalon
Odesláno: úterý, 8. prosince 2015 11:04
Komu: Jianbo Liu
Kopie: dev@dpdk.org; Jan Viktorin; Jerin Jacob
Předmět: Re: [dpdk-dev] [PATCH v2 2/3] eal/acl: enable acl for armv7-a

2015-12-08 15:56, Jianbo Liu:
> On 8 December 2015 at 10:23, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> > 2015-12-08 09:50, Jianbo Liu:
> >> On 8 December 2015 at 09:18, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> >> > 2015-12-03 23:02, Jianbo Liu:
> >> >> -ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
> >> >> +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
> >> > [...]
> >> >> +#ifdef RTE_ARCH_ARM
> >> >> +/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
> >> >
> >> > I'm convinced there is a good reason why ARMv8 is also called ARCH_ARM64,
> >> > and ARMv7 may be called AArch32 or ARCH_ARM. But I don't know why?
> >> >
> >> https://lkml.org/lkml/2012/7/15/133
> >>
> >> > Is ARCH_ARM32 or ARCH_ARMv7 too simple?
> >> > Is it possible to have a 32-bit ARMv8?
> >> Yes, ARMv8-R/M
> >
> > So what does mean CONFIG_RTE_ARCH_ARM?
> > ARMv7? ARM32?
> > Please consider a renaming.
> 
> I'd rather not renaming becase it can be both ARMv7 and AARCH32, which
> are ISA compatibility.
> If further differentiation is needed, CONFIG_RTE_ARCH_ARMv7 is added
> in the config, just like Jan Viktorin did.

>> I don't understand.
>> You say CONFIG_RTE_ARCH_ARM is for ARMv7 and AARCH32, right?
>> Both are 32-bit right?
>> Why not rename it to CONFIG_RTE_ARCH_ARM32?

Hello,

CONFIG_RTE_ARCH_ARMv7 entry specifies the certain architecture, subset of CONFIG_RTE_ARCH_ARM (which is 32b).

For ARM64 we can differentiate among architectures v8, v9, ..., v64 :) as well.

However, I doubt somebody will use dpdk on ARMv6.

IMO, the major options should be CONFIG_RTE_ARCH_ARM and CONFIG_RTE_ARCH_ARM64.

Jan Viktorin
RehiveTech
Sent from a mobile device‎

‎

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 2/3] eal/acl: enable acl for armv7-a
  2015-12-08 10:21               ` Jianbo Liu
@ 2015-12-08 10:38                 ` Thomas Monjalon
  2015-12-08 11:27                   ` Jan Viktorin
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Monjalon @ 2015-12-08 10:38 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev

2015-12-08 18:21, Jianbo Liu:
> On 8 December 2015 at 18:03, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> > 2015-12-08 15:56, Jianbo Liu:
> >> On 8 December 2015 at 10:23, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> >> > 2015-12-08 09:50, Jianbo Liu:
> >> >> On 8 December 2015 at 09:18, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> >> >> > 2015-12-03 23:02, Jianbo Liu:
> >> >> >> -ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
> >> >> >> +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
> >> >> > [...]
> >> >> >> +#ifdef RTE_ARCH_ARM
> >> >> >> +/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
> >> >> >
> >> >> > I'm convinced there is a good reason why ARMv8 is also called ARCH_ARM64,
> >> >> > and ARMv7 may be called AArch32 or ARCH_ARM. But I don't know why?
> >> >> >
> >> >> https://lkml.org/lkml/2012/7/15/133
> >> >>
> >> >> > Is ARCH_ARM32 or ARCH_ARMv7 too simple?
> >> >> > Is it possible to have a 32-bit ARMv8?
> >> >> Yes, ARMv8-R/M
> >> >
> >> > So what does mean CONFIG_RTE_ARCH_ARM?
> >> > ARMv7? ARM32?
> >> > Please consider a renaming.
> >>
> >> I'd rather not renaming becase it can be both ARMv7 and AARCH32, which
> >> are ISA compatibility.
> >> If further differentiation is needed, CONFIG_RTE_ARCH_ARMv7 is added
> >> in the config, just like Jan Viktorin did.
> >
> > I don't understand.
> > You say CONFIG_RTE_ARCH_ARM is for ARMv7 and AARCH32, right?
> > Both are 32-bit right?
> > Why not rename it to CONFIG_RTE_ARCH_ARM32?
> 
> I understand that you want to make the naming more clear.
> But arm/arm64 are used in Linux kernel, I think it's better to stay the same.

Linux supports ARM for a very long time. Doing a rename now is costly.
The DPDK support is recent. Keeping a bad naming scheme because an
old project follows this scheme is insane.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 2/3] eal/acl: enable acl for armv7-a
  2015-12-08 10:38                 ` Thomas Monjalon
@ 2015-12-08 11:27                   ` Jan Viktorin
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Viktorin @ 2015-12-08 11:27 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Tue, 08 Dec 2015 11:38:46 +0100
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2015-12-08 18:21, Jianbo Liu:
> > On 8 December 2015 at 18:03, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:  
> > > 2015-12-08 15:56, Jianbo Liu:  
> > >> On 8 December 2015 at 10:23, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:  
> > >> > 2015-12-08 09:50, Jianbo Liu:  
> > >> >> On 8 December 2015 at 09:18, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:  
> > >> >> > 2015-12-03 23:02, Jianbo Liu:  
> > >> >> >> -ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
> > >> >> >> +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)  
> > >> >> > [...]  
> > >> >> >> +#ifdef RTE_ARCH_ARM
> > >> >> >> +/* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */  
> > >> >> >
> > >> >> > I'm convinced there is a good reason why ARMv8 is also called ARCH_ARM64,
> > >> >> > and ARMv7 may be called AArch32 or ARCH_ARM. But I don't know why?
> > >> >> >  
> > >> >> https://lkml.org/lkml/2012/7/15/133
> > >> >>  
> > >> >> > Is ARCH_ARM32 or ARCH_ARMv7 too simple?
> > >> >> > Is it possible to have a 32-bit ARMv8?  
> > >> >> Yes, ARMv8-R/M
> > >> >
> > >> > So what does mean CONFIG_RTE_ARCH_ARM?
> > >> > ARMv7? ARM32?
> > >> > Please consider a renaming.  
> > >>
> > >> I'd rather not renaming becase it can be both ARMv7 and AARCH32, which
> > >> are ISA compatibility.
> > >> If further differentiation is needed, CONFIG_RTE_ARCH_ARMv7 is added
> > >> in the config, just like Jan Viktorin did.  
> > >
> > > I don't understand.
> > > You say CONFIG_RTE_ARCH_ARM is for ARMv7 and AARCH32, right?
> > > Both are 32-bit right?
> > > Why not rename it to CONFIG_RTE_ARCH_ARM32?  
> > 
> > I understand that you want to make the naming more clear.
> > But arm/arm64 are used in Linux kernel, I think it's better to stay the same.  
> 
> Linux supports ARM for a very long time. Doing a rename now is costly.
> The DPDK support is recent. Keeping a bad naming scheme because an
> old project follows this scheme is insane.

I think, the idea about having CONFIG_RTE_ARCH_ARM32 is not bad. We
should do it soon, however.

Regards
Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2015-12-08 11:32 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-01 18:41 [PATCH 0/4] support acl/lpm/table/pipeline libs for armv7 and armv8 Jianbo Liu
2015-12-01 12:47 ` Jan Viktorin
2015-12-01 20:56   ` Jianbo Liu
2015-12-01 18:41 ` [PATCH 1/4] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h Jianbo Liu
2015-12-01 12:41   ` Jan Viktorin
2015-12-01 12:43   ` Jan Viktorin
2015-12-01 18:41 ` [PATCH 2/4] eal/acl: enable acl for armv7-a Jianbo Liu
2015-12-01 14:43   ` Jerin Jacob
2015-12-01 14:46     ` Jan Viktorin
2015-12-02  6:14       ` Jianbo Liu
2015-12-01 18:41 ` [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs Jianbo Liu
2015-12-01 16:41   ` Jerin Jacob
2015-12-01 17:02     ` Jan Viktorin
2015-12-02  7:02     ` Jianbo Liu
     [not found]     ` <CAP4Qi3-5ofDU-2-4KsxFzMC1OpTsc5WjmxcFT2Eu_URA0UBzDw@mail.gmail.com>
2015-12-02  8:03       ` Jerin Jacob
2015-12-02  9:49         ` Jianbo Liu
2015-12-02 10:33           ` Ananyev, Konstantin
2015-12-02 10:48             ` Jerin Jacob
2015-12-02 13:06               ` Jan Viktorin
2015-12-02 10:39           ` Jerin Jacob
2015-12-02 13:05             ` Jan Viktorin
2015-12-02 13:13             ` Jianbo Liu
2015-12-02 14:34               ` Jerin Jacob
2015-12-02 16:40                 ` Thomas Monjalon
2015-12-02 16:53                   ` Jerin Jacob
2015-12-02 16:57                     ` Thomas Monjalon
2015-12-02 17:38                       ` Jerin Jacob
2015-12-03  9:33                       ` Jerin Jacob
2015-12-03 11:02                         ` Ananyev, Konstantin
2015-12-03 12:17                           ` Jerin Jacob
2015-12-03 12:42                             ` Ananyev, Konstantin
2015-12-03 13:20                               ` Jerin Jacob
2015-12-01 18:41 ` [PATCH 4/4] maintainers: claim resposibility for ARMv7 and ARMv8 Jianbo Liu
2015-12-01 16:44   ` Jerin Jacob
2015-12-03 15:02 ` [PATCH v2 0/3] support acl lib for armv7-a and a small fix Jianbo Liu
2015-12-03 15:02   ` [PATCH v2 1/3] eal/arm: use RTE_ARM_EAL_RDTSC_USE_PMU in rte_cycle_32.h Jianbo Liu
2015-12-08  1:13     ` Thomas Monjalon
2015-12-03 15:02   ` [PATCH v2 2/3] eal/acl: enable acl for armv7-a Jianbo Liu
2015-12-03 15:13     ` Jerin Jacob
2015-12-08  1:18     ` Thomas Monjalon
2015-12-08  1:50       ` Jianbo Liu
2015-12-08  2:23         ` Thomas Monjalon
2015-12-08  7:56           ` Jianbo Liu
2015-12-08 10:03             ` Thomas Monjalon
2015-12-08 10:21               ` Jianbo Liu
2015-12-08 10:38                 ` Thomas Monjalon
2015-12-08 11:27                   ` Jan Viktorin
2015-12-08 10:25               ` Jan Viktorin
2015-12-03 15:02   ` [PATCH v2 3/3] maintainers: claim resposibility for ARMv7 and ARMv8 Jianbo Liu
2015-12-08  1:24   ` [PATCH v2 0/3] support acl lib for armv7-a and a small fix Thomas Monjalon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.