From: Jianbo Liu
Subject: Re: [PATCH v3 2/4] ixgbe: implement vector PMD for arm architecture
Date: Thu, 26 May 2016 09:37:10 +0800
To: Jerin Jacob
Cc: dev@dpdk.org, Bruce Richardson, "Zhang, Helin", "Ananyev, Konstantin"

On 25 May 2016 at 20:29, Jerin Jacob wrote:
> On Fri, May 06, 2016 at 11:55:46AM +0530, Jianbo Liu wrote:
>> use ARM NEON intrinsic to implement ixgbe vPMD
>>
>> Signed-off-by: Jianbo Liu
>> ---
>>  drivers/net/ixgbe/Makefile              |   4 +
>>  drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 561 ++++++++++++++++++++++++++++++++
>>  2 files changed, 565 insertions(+)
>>  create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
>> +		/* Read desc statuses backwards to avoid race condition */
>> +		/* A.1 load 4 pkts desc */
>> +		descs[3] = vld1q_u64((uint64_t *)(rxdp + 3));
>> +		rte_rmb();
>
> Any specific reason to add rte_rmb() here, If there is no performance
> drop then it makes sense to add before descs[3] uses it.i.e
> at rte_compiler_barrier() place in x86 code.
>
It is there to avoid reading inconsistent descriptor statuses, since the
descriptors are read backwards.
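
For anyone following the thread, the ordering concern can be sketched in
portable C. This is only an illustration under stated assumptions, not the
patch code: struct rx_desc, DESC_DD, BURST and count_done_backwards() are
hypothetical names, and a C11 acquire fence stands in for rte_rmb(). The
fence sits where the quoted patch places its barrier, right after the load of
the last descriptor. Whether that is the best position is exactly what is
being discussed above.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DESC_DD 0x1u /* hypothetical "descriptor done" status bit */
#define BURST   4    /* descriptors examined per iteration */

/* Hypothetical, simplified RX descriptor: only the status word. */
struct rx_desc {
	_Atomic uint32_t status;
};

/*
 * Snapshot BURST descriptors from the highest index down to the lowest,
 * then count how many consecutive descriptors, starting at index 0,
 * have the done bit set in the snapshot.
 */
static unsigned int
count_done_backwards(struct rx_desc *ring)
{
	uint32_t snap[BURST];
	unsigned int n;
	int i;

	/* Load the last descriptor of the burst first. */
	snap[BURST - 1] = atomic_load_explicit(&ring[BURST - 1].status,
					       memory_order_relaxed);

	/*
	 * Stand-in for rte_rmb() (an assumption of this sketch): the loads
	 * below may not be reordered before the load above.
	 */
	atomic_thread_fence(memory_order_acquire);

	/* Load the remaining descriptors, still walking backwards. */
	for (i = BURST - 2; i >= 0; i--)
		snap[i] = atomic_load_explicit(&ring[i].status,
					       memory_order_relaxed);

	/* Only a contiguous run of done descriptors from index 0 counts. */
	for (n = 0; n < BURST; n++)
		if (!(snap[n] & DESC_DD))
			break;

	return n;
}
```

With descriptors 0 and 1 marked done, the function returns 2. Marking
descriptor 3 as well leaves the result at 2, because descriptor 2 breaks the
contiguous run.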

>> +
>> +		/* B.2 copy 2 mbuf point into rx_pkts */
>> +		vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1);
>> +
>> +		/* B.1 load 1 mbuf point */
>> +		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
>> +
>> +		descs[2] = vld1q_u64((uint64_t *)(rxdp + 2));
>> +		/* B.1 load 2 mbuf point */
>> +		descs[1] = vld1q_u64((uint64_t *)(rxdp + 1));
>> +		descs[0] = vld1q_u64((uint64_t *)(rxdp));
>> +
>> +		/* B.2 copy 2 mbuf point into rx_pkts */
>> +		vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2);
>> +
>> +		if (split_packet) {
>> +			rte_prefetch_non_temporal(&rx_pkts[pos]->cacheline1);
>> +			rte_prefetch_non_temporal(&rx_pkts[pos+1]->cacheline1);
>> +			rte_prefetch_non_temporal(&rx_pkts[pos+2]->cacheline1);
>> +			rte_prefetch_non_temporal(&rx_pkts[pos+3]->cacheline1);
>
> replace with rte_mbuf_prefetch_part2 or equivalent
>
rte_mbuf_prefetch_part2 is a new function that was added after this patchset,
so it's better to submit a separate patch for that, as Bruce said.