From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jerin Jacob Subject: Re: [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs Date: Wed, 2 Dec 2015 16:18:13 +0530 Message-ID: <20151202104811.GA6337@localhost.localdomain> References: <1448995276-9599-1-git-send-email-jianbo.liu@linaro.org> <1448995276-9599-4-git-send-email-jianbo.liu@linaro.org> <20151201164139.GA12144@localhost.localdomain> <20151202080259.GA32494@localhost.localdomain> <2601191342CEEE43887BDE71AB97725836ACFADA@irsmsx105.ger.corp.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Cc: "dev@dpdk.org" To: "Ananyev, Konstantin" Return-path: Received: from na01-by2-obe.outbound.protection.outlook.com (mail-by2on0091.outbound.protection.outlook.com [207.46.100.91]) by dpdk.org (Postfix) with ESMTP id 3CF8E568A for ; Wed, 2 Dec 2015 11:48:38 +0100 (CET) Content-Disposition: inline In-Reply-To: <2601191342CEEE43887BDE71AB97725836ACFADA@irsmsx105.ger.corp.intel.com> List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Wed, Dec 02, 2015 at 10:33:44AM +0000, Ananyev, Konstantin wrote: > Hi everyone, > > > -----Original Message----- > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu > > Sent: Wednesday, December 02, 2015 9:50 AM > > To: Jerin Jacob > > Cc: dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs > > > > On 2 December 2015 at 16:03, Jerin Jacob wrote: > > > On Wed, Dec 02, 2015 at 02:54:52PM +0800, Jianbo Liu wrote: > > >> On 2 December 2015 at 00:41, Jerin Jacob wrote: > > >> > On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote: > > >> >> Adds ARM NEON support for lpm. > > >> >> And enables table/pipeline libraries which depend on lpm. > > >> > > > >> > I already sent the patch on the same yesterday. > > >> > We can converge the patches after the discussion. > > >> > Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml > > >> > > > >> Yes, I have read your patch. But there are many differences, so I sent > > >> mine for your reviewing :) > > >> > > >> > > > >> >> > > >> >> Signed-off-by: Jianbo Liu > > >> >> --- > > >> >> config/defconfig_arm-armv7a-linuxapp-gcc | 3 - > > >> >> config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - > > >> >> lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++++++++++ > > >> >> lib/librte_lpm/rte_lpm.h | 68 ++++++++++++++++------- > > >> >> 4 files changed, 77 insertions(+), 25 deletions(-) > > >> >> > > >> >> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc > > >> >> index cbebd64..efffa1f 100644 > > >> >> --- a/config/defconfig_arm-armv7a-linuxapp-gcc > > >> >> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc > > >> >> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n > > >> >> CONFIG_RTE_EAL_IGB_UIO=n > > >> >> > > >> >> # fails to compile on ARM > > >> >> -CONFIG_RTE_LIBRTE_LPM=n > > >> >> -CONFIG_RTE_LIBRTE_TABLE=n > > >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n > > >> >> CONFIG_RTE_SCHED_VECTOR=n > > >> >> > > >> >> # cannot use those on ARM > > >> >> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc > > >> >> index 504f3ed..57f7941 100644 > > >> >> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc > > >> >> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc > > >> >> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n > > >> >> CONFIG_RTE_LIBRTE_FM10K_PMD=n > > >> >> CONFIG_RTE_LIBRTE_I40E_PMD=n > > >> >> > > >> >> -CONFIG_RTE_LIBRTE_LPM=n > > >> >> -CONFIG_RTE_LIBRTE_TABLE=n > > >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n > > >> >> CONFIG_RTE_SCHED_VECTOR=n > > >> >> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h > > >> >> index a33c054..7437711 100644 > > >> >> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h > > >> >> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h > > >> >> @@ -41,6 +41,8 @@ extern "C" { > > >> >> > > >> >> typedef int32x4_t xmm_t; > > >> >> > > >> >> +typedef int32x4_t __m128i; > > >> >> + > > >> >> #define XMM_SIZE (sizeof(xmm_t)) > > >> >> #define XMM_MASK (XMM_SIZE - 1) > > >> >> > > >> >> @@ -53,6 +55,32 @@ typedef union rte_xmm { > > >> >> double pd[XMM_SIZE / sizeof(double)]; > > >> >> } __attribute__((aligned(16))) rte_xmm_t; > > >> >> > > >> >> +static __inline __m128i > > >> >> +_mm_set_epi32(int i3, int i2, int i1, int i0) > > >> >> +{ > > >> >> + int32_t r[4] = {i0, i1, i2, i3}; > > >> >> + > > >> >> + return vld1q_s32(r); > > >> >> +} > > >> >> + > > >> >> +static __inline __m128i > > >> >> +_mm_loadu_si128(__m128i *p) > > >> >> +{ > > >> >> + return vld1q_s32((int32_t *)p); > > >> >> +} > > >> >> + > > >> >> +static __inline __m128i > > >> >> +_mm_set1_epi32(int i) > > >> >> +{ > > >> >> + return vdupq_n_s32(i); > > >> >> +} > > >> >> + > > >> >> +static __inline __m128i > > >> >> +_mm_and_si128(__m128i a, __m128i b) > > >> >> +{ > > >> >> + return vandq_s32(a, b); > > >> >> +} > > >> >> + > > > > > > IMO, it's not always good to emulate GCC defined intrinsics of > > > other architecture. What if a legacy DPDK application has such mappings > > > then BOOM, multiple definition, which one is correct? which one > > > to comment it out? Integration pain starts for DPDK library consumer:-( > > > > > They can include rte_vect.h in build/include directly, which is linked correctly > > to the one for that ARCH, so there is no need to worry about. > > > > > > >> > > > >> > IMO, it makes sense to not emulate the SSE intrinsics with NEON > > >> > Let's create the rte_vect_* as required. look at the existing patch. > > >> > > > >> I thought of creating a layer of SIMD over all the platforms before. > > >> But can't you see it make things complicated, considering there are > > >> only few simple intrinsic to implement? > > > > > > Not true, There were, a lot of SSE intrinsics needs be to emulated for ACL NEON > > > implementation if I were to take this approach and emulation comes with > > > the cost. > > > > > No, I will not re-implement all the intrinsic like that . > > I only do with the simple intrinsic, such as load/store, as you said below. > > > > > So my take is, > > > lets the each architecture implementation for specific SIMD version of DPDK > > > API in the library should have the freedom to implement the API in > > > NATIVE. > > > > > > And let's create only rte_vect_* abstraction only for using > > > that API/library. Which boils down to have very minimal rte_vect_* > > > abstraction to load, store, set not beyond that. > > > > > > This makes clear "contract" between DPDK library and the applications. > > > and make easy for remaning new architecture porting effort in DPDK. > > > > > Agree. > > But I reuse existing intrinsic names, and you recreate new ones. > > And I try to do as few changes as possible, and try to avoid any > > mistaken which may cause code un-compiled. > > I think it's design level question, we need to hear what others talk about it. > > > > > Imagine how your proposed function will look like if new architecture > > > wants to implement "optimized" version of rte_lpm_lookupx4 > > > > > There is no optimization for this (simple) rte_lpm_lookupx4, otherwise > > you have done that in your patch. > > If there is for other new platform, defintely they should do like > > yours, as you did for NEON ACL. > > > > > > > >> If do so, we also need to explain to others how to use these interfaces. > > >> Besides, this patch did the smallest changes to the original code, and > > >> more likely to be accepted by others. > > > > > > other patch makes no changes to IA version of rte_lpm_lookupx4.I thought > > > that make reviewer easy to review the changes in architecture > > > perspective. > > > > > As I know, they don't enable LPM for PPC, and ARM is the first one to > > touch this issue. > > > > >> > > >> > > > >> >> #ifdef RTE_ARCH_ARM > > >> >> /* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */ > > >> >> static __inline uint8x16_t > > >> >> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h > > >> >> index c299ce2..c76c07d 100644 > > >> >> --- a/lib/librte_lpm/rte_lpm.h > > >> >> +++ b/lib/librte_lpm/rte_lpm.h > > >> >> @@ -361,6 +361,47 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips, > > >> >> /* Mask four results. */ > > >> >> #define RTE_LPM_MASKX4_RES UINT64_C(0x00ff00ff00ff00ff) > > >> >> > > >> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) > > >> > > > >> > Separate out arm implementation to the different header file. > > >> > Too many ifdef looks odd in the header file and difficult to manage. > > >> > > > >> But there are many ifdefs already. > > >> And It seems unreasonable to add a new file only for one small function. > > >> > > > > > > small or big, its matter of each architecture to have > > > the freedom for the optimized version for the implementation. > > > > > > What if other architecture demands to write this function in assembly > > > or restructure it for performance improvement? > > > > > If there is such demands, should do like that. > > But I don't see any restructure in your patch, and you still follow > > the logic as x86, is it worth adding a new file? > > > > My preference would also be to put architecture dependent implementation > into different files. > Might be create lib/librte_lpm/arch/(arm|x86)/... here? > Konstantin +1 my existing patch creates lib/librte_lpm/rte_lpm_neon.h instead of lib/librte_lpm/arch/arm/rte_lpm_neon.h like lib/librte_hash/rte_cmp_x86.h I am OK for changing the directory structure as proposed in my next revision of patch. Let me know if anyone has any objections/concerns. Jerin