From: Ray Kinsella <mdr@ashroe.eu>
To: Pavan Nikhilesh Bhagavatula, Jerin Jacob Kollanukkaran, Nithin Kumar Dabilpuram
Cc: dev@dpdk.org, thomas@monjalon.net, david.marchand@redhat.com, mattias.ronnblom@ericsson.com, Kiran Kumar Kokkilagadda
Date: Tue, 24 Mar 2020 14:38:27 +0000
Message-ID: <7a6842d9-43fd-1746-113f-887d6afb16e5@ashroe.eu>
References: <20200318213551.3489504-1-jerinj@marvell.com> <20200318213551.3489504-21-jerinj@marvell.com> <02c4c25a-83ba-dac5-20e6-7b140cbcb4f1@ashroe.eu> <5a99e696-3853-5782-0a4c-0debcc74faa8@ashroe.eu> <20a3cb35-d57b-4799-f084-919f3f55da6f@ashroe.eu>
Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v1 20/26] node:
 ipv4 lookup for x86
List-Id: DPDK patches and discussions <dev.dpdk.org>

On 24/03/2020 09:40, Pavan Nikhilesh Bhagavatula wrote:
> Hi Ray,
>
> I have tried to avoid hand-unrolling loops and found the following observations.
>
> 1. Although it decreases LOC, it also takes away readability.
> Example:
> Avoiding the unrolled code below [SNIP]
> Which is kind of unreadable.

I am confused - isn't it exactly the same code? You still haven't completely
unrolled the loop either? I don't know how one is readable and the other is not.

> 2. Not all compilers are made equal. I found that most compilers don't
> unroll the loop above, even when compiled with `-funroll-all-loops`.
> I have checked with the following compilers:
> GCC 9.2.0
> Clang 9.0.1
> Aarch64 GCC 7.3.0
> Aarch64 GCC 9.2.0

Compilers have been unrolling fixed-length loops for a long time - this isn't
new technology. If the compiler isn't unrolling, you are doing something that
makes it think it is a bad idea. Hand-unrolling the loop isn't the solution;
understanding what the compiler is doing is a better idea.

In front of your for loop, insert the following to indicate to the compiler
what you want to do:

#pragma unroll BUF_PER_LOOP

With clang you can ask it why it is not unrolling the loop with the following
switches (the output is verbose, but the reason is in there):

-Rpass=loop-unroll -Rpass-missed=loop-unroll

> 3. Performance-wise, I see a lot of degradation on our platform, at least 13%.

Is the loop being unrolled?

> On IA with a Broadwell (Xeon E5-2690) and i40e, the performance remains the same w.r.t. Rx/Tx, since the
> hotspot is in the Tx path of the driver, which limits the per-core capability.
> But the performance difference in the number of cycles per node can be seen below:
>
> Hand unrolling:
> +-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+
> |Node                           |calls          |objs           |realloc_count  |objs/call      |objs/sec(10E6) |cycles/call|
> +-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+
> |ip4_lookup                     |7765918        |248509344      |1              |32.000         |27.725408      |779.0000   |
> |ip4_rewrite                    |7765925        |248509568      |1              |32.000         |27.725408      |425.0000   |
> |ethdev_tx-1                    |7765927        |204056223      |1              |26.000         |22.762720      |597.0000   |
> |pkt_drop                       |1389170        |44453409       |1              |32.000         |4.962688       |298.0000   |
> |ethdev_rx-0-0                  |63604111       |248509792      |2              |32.000         |27.725408      |982.0000   |
> +-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+
>
> W/o unrolling:
>
> +-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+
> |Node                           |calls          |objs           |realloc_count  |objs/call      |objs/sec(10E6) |cycles/call|
> +-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+
> |ip4_lookup                     |18864640       |603668448      |1              |32.000         |26.051328      |828.0000   |
> |ip4_rewrite                    |18864646       |603668640      |1              |32.000         |26.051328      |534.0000   |
> |ethdev_tx-1                    |18864648       |527874175      |1              |27.000         |22.780256      |633.0000   |
> |pkt_drop                       |2368580        |75794529       |1              |32.000         |3.271072       |286.0000   |
> |ethdev_rx-0-0                  |282058226      |603668864      |2              |32.000         |26.051328      |994.0000   |
> +-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+
>
> Considering the above findings, we would like to continue unrolling the loops by hand.
>
> Regards,
> Pavan.
>
>> -----Original Message-----
>> From: Ray Kinsella
>> Sent: Friday, March 20, 2020 2:44 PM
>> To: Pavan Nikhilesh Bhagavatula; Jerin Jacob Kollanukkaran; Nithin Kumar Dabilpuram
>> Cc: dev@dpdk.org; thomas@monjalon.net; david.marchand@redhat.com; mattias.ronnblom@ericsson.com; Kiran Kumar Kokkilagadda
>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH v1 20/26] node: ipv4 lookup for x86
>>
>> On 19/03/2020 16:13, Pavan Nikhilesh Bhagavatula wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ray Kinsella
>>>> Sent: Thursday, March 19, 2020 9:21 PM
>>>> To: Pavan Nikhilesh Bhagavatula; Jerin Jacob Kollanukkaran; Nithin Kumar Dabilpuram
>>>> Cc: dev@dpdk.org; thomas@monjalon.net; david.marchand@redhat.com; mattias.ronnblom@ericsson.com; Kiran Kumar Kokkilagadda
>>>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH v1 20/26] node: ipv4 lookup for x86
>>>>
>>>>
>>>>
>>>> On 19/03/2020 14:22, Pavan Nikhilesh Bhagavatula wrote:
>>>>>> On 18/03/2020 21:35, jerinj@marvell.com wrote:
>>>>>>> From: Pavan Nikhilesh
>>>>>>>
>>>>>>> Add IPv4 lookup process function for ip4_lookup
>>>>>>> rte_node. This node performs LPM lookup using x86_64
>>>>>>> vector supported RTE_LPM API on every packet received
>>>>>>> and forwards it to a next node that is identified by
>>>>>>> lookup result.
>>>>>>>
>>>>>>> Signed-off-by: Pavan Nikhilesh
>>>>>>> Signed-off-by: Nithin Dabilpuram
>>>>>>> Signed-off-by: Kiran Kumar K
>>>>>>> ---
>>>>>>>  lib/librte_node/ip4_lookup.c | 245 +++++++++++++++++++++++++++++++++++
>>>>>>>  1 file changed, 245 insertions(+)
>>>>>>>
>>>>>>> diff --git a/lib/librte_node/ip4_lookup.c b/lib/librte_node/ip4_lookup.c
>>>>>>> index d7fcd1158..c003e9c91 100644
>>>>>>> --- a/lib/librte_node/ip4_lookup.c
>>>>>>> +++ b/lib/librte_node/ip4_lookup.c
>>>>>>> @@ -264,6 +264,251 @@ ip4_lookup_node_process(struct rte_graph *graph, struct rte_node *node,
>>>>>>>  	return nb_objs;
>>>>>>>  }
>>>>>>>
>>>>>>> +#elif defined(RTE_ARCH_X86)
>>>>>>> +
>>>>>>> +/* X86 SSE */
>>>>>>> +static uint16_t
>>>>>>> +ip4_lookup_node_process(struct rte_graph *graph, struct rte_node *node,
>>>>>>> +			void **objs, uint16_t nb_objs)
>>>>>>> +{
>>>>>>> +	struct rte_mbuf *mbuf0, *mbuf1, *mbuf2, *mbuf3, **pkts;
>>>>>>> +	rte_edge_t next0, next1, next2, next3, next_index;
>>>>>>> +	struct rte_ipv4_hdr *ipv4_hdr;
>>>>>>> +	struct rte_ether_hdr *eth_hdr;
>>>>>>> +	uint32_t ip0, ip1, ip2, ip3;
>>>>>>> +	void **to_next, **from;
>>>>>>> +	uint16_t last_spec = 0;
>>>>>>> +	uint16_t n_left_from;
>>>>>>> +	struct rte_lpm *lpm;
>>>>>>> +	uint16_t held = 0;
>>>>>>> +	uint32_t drop_nh;
>>>>>>> +	rte_xmm_t dst;
>>>>>>> +	__m128i dip; /* SSE register */
>>>>>>> +	int rc, i;
>>>>>>> +
>>>>>>> +	/* Speculative next */
>>>>>>> +	next_index = RTE_NODE_IP4_LOOKUP_NEXT_REWRITE;
>>>>>>> +	/* Drop node */
>>>>>>> +	drop_nh = ((uint32_t)RTE_NODE_IP4_LOOKUP_NEXT_PKT_DROP) << 16;
>>>>>>> +
>>>>>>> +	/* Get socket specific LPM from ctx */
>>>>>>> +	lpm = *((struct rte_lpm **)node->ctx);
>>>>>>> +
>>>>>>> +	pkts = (struct rte_mbuf **)objs;
>>>>>>> +	from = objs;
>>>>>>> +	n_left_from = nb_objs;
>>>>>>
>>>>>> I doubt this initial prefetch of the first 4 packets has any benefit.
>>>>>
>>>>> Ack will remove in v2 for x86.
>>>>>
>>>>>>
>>>>>>> +	if (n_left_from >= 4) {
>>>>>>> +		for (i = 0; i < 4; i++) {
>>>>>>> +			rte_prefetch0(rte_pktmbuf_mtod(pkts[i],
>>>>>>> +					struct rte_ether_hdr *) + 1);
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	/* Get stream for the speculated next node */
>>>>>>> +	to_next = rte_node_next_stream_get(graph, node, next_index, nb_objs);
>>>>>>
>>>>>> Suggest you don't reuse the hand-unrolling optimization from FD.io VPP.
>>>>>> I have never found any performance benefit from them, and they
>>>>>> make the code unnecessarily verbose.
>>>>>>
>>>>>
>>>>> How would we take the benefit of rte_lpm_lookupx4 without unrolling the loop?
>>>>> Also, in future, if we are using rte_rib and fib with a CPU supporting wider SIMD, we might
>>>>> need to unroll them further (AVX256 and 512; currently rte_lpm_lookupx4 uses only 128 bit
>>>>> since it only uses the SSE extension).
>>>>
>>>> Let the compiler do it for you, using a constant vector length:
>>>> for (int i = 0; i < 4; ++i) { ... }
>>>>
>>>
>>> Ok, I think I misunderstood the previous comment.
>>> It was only for the prefetches in the loop, right?
>>
>>
>> no, it was for all the needless repetition.
>> hand-unrolling loops serve no purpose but to add verbosity.
>>
>>>
>>>>>
>>>>>>
>>>>>>> +	while (n_left_from >= 4) {
>>>>>>> +		/* Prefetch next-next mbufs */
>>>>>>> +		if (likely(n_left_from >= 11)) {
>>>>>>> +			rte_prefetch0(pkts[8]);
>>>>>>> +			rte_prefetch0(pkts[9]);
>>>>>>> +			rte_prefetch0(pkts[10]);
>>>>>>> +			rte_prefetch0(pkts[11]);
>>>>>>> +		}
>>>>>>> +
>>>>>>> +		/* Prefetch next mbuf data */
>>>>>>> +		if (likely(n_left_from >= 7)) {
>>>>>>> +			rte_prefetch0(rte_pktmbuf_mtod(pkts[4],
>>>>>>> +					struct rte_ether_hdr *) + 1);
>>>>>>> +			rte_prefetch0(rte_pktmbuf_mtod(pkts[5],
>>>>>>> +					struct rte_ether_hdr *) + 1);
>>>>>>> +			rte_prefetch0(rte_pktmbuf_mtod(pkts[6],
>>>>>>> +					struct rte_ether_hdr *) + 1);
>>>>>>> +			rte_prefetch0(rte_pktmbuf_mtod(pkts[7],
>>>>>>> +					struct rte_ether_hdr *) + 1);
>>>>>>> +		}
>>>>>>> +
>>>>>>> +		mbuf0 = pkts[0];
>>>>>>> +		mbuf1 = pkts[1];
>>>>>>> +		mbuf2 = pkts[2];
>>>>>>> +		mbuf3 = pkts[3];
>>>>>>> +
>>>>>>> +		pkts += 4;
>>>>>>> +		n_left_from -= 4;
>>>>>>> +
>>>>>>> +		/* Extract DIP of mbuf0 */
>>>>>>> +		eth_hdr = rte_pktmbuf_mtod(mbuf0, struct rte_ether_hdr *);
>>>>>>> +		ipv4_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1);
>>>>>>> +		ip0 = ipv4_hdr->dst_addr;
>>>>>>> +		/* Extract cksum, ttl as ipv4 hdr is in cache */
>>>>>>> +		rte_node_mbuf_priv1(mbuf0)->cksum = ipv4_hdr->hdr_checksum;
>>>>>>> +		rte_node_mbuf_priv1(mbuf0)->ttl = ipv4_hdr->time_to_live;
>>>>>>> +
>>>>>>> +		/* Extract DIP of mbuf1 */
>>>>>>> +		eth_hdr = rte_pktmbuf_mtod(mbuf1, struct rte_ether_hdr *);
>>>>>>> +		ipv4_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1);
>>>>>>> +		ip1 = ipv4_hdr->dst_addr;
>>>>>>> +		/* Extract cksum, ttl as ipv4 hdr is in cache */
>>>>>>> +		rte_node_mbuf_priv1(mbuf1)->cksum = ipv4_hdr->hdr_checksum;
>>>>>>> +		rte_node_mbuf_priv1(mbuf1)->ttl = ipv4_hdr->time_to_live;
>>>>>>> +
>>>>>>> +		/* Extract DIP of mbuf2 */
>>>>>>> +		eth_hdr = rte_pktmbuf_mtod(mbuf2, struct rte_ether_hdr *);
>>>>>>> +		ipv4_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1);
>>>>>>> +		ip2 = ipv4_hdr->dst_addr;
>>>>>>> +		/* Extract cksum, ttl as ipv4 hdr is in cache */
>>>>>>> +		rte_node_mbuf_priv1(mbuf2)->cksum = ipv4_hdr->hdr_checksum;
>>>>>>> +		rte_node_mbuf_priv1(mbuf2)->ttl = ipv4_hdr->time_to_live;
>>>>>>> +
>>>>>>> +		/* Extract DIP of mbuf3 */
>>>>>>> +		eth_hdr = rte_pktmbuf_mtod(mbuf3, struct rte_ether_hdr *);
>>>>>>> +		ipv4_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1);
>>>>>>> +		ip3 = ipv4_hdr->dst_addr;
>>>>>>> +
>>>>>>> +		/* Prepare for lookup x4 */
>>>>>>> +		dip = _mm_set_epi32(ip3, ip2, ip1, ip0);
>>>>>>> +
>>>>>>> +		/* Byte swap 4 IPV4 addresses. */
>>>>>>> +		const __m128i bswap_mask = _mm_set_epi8(
>>>>>>> +			12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3);
>>>>>>> +		dip = _mm_shuffle_epi8(dip, bswap_mask);
>>>>>>> +
>>>>>>> +		/* Extract cksum, ttl as ipv4 hdr is in cache */
>>>>>>> +		rte_node_mbuf_priv1(mbuf3)->cksum = ipv4_hdr->hdr_checksum;
>>>>>>> +		rte_node_mbuf_priv1(mbuf3)->ttl = ipv4_hdr->time_to_live;
>>>>>>> +
>>>>>>> +		/* Perform LPM lookup to get NH and next node */
>>>>>>> +		rte_lpm_lookupx4(lpm, dip, dst.u32, drop_nh);
>>>>>>> +
>>>>>>> +		/* Extract next node id and NH */
>>>>>>> +		rte_node_mbuf_priv1(mbuf0)->nh = dst.u32[0] & 0xFFFF;
>>>>>>> +		next0 = (dst.u32[0] >> 16);
>>>>>>> +
>>>>>>> +		rte_node_mbuf_priv1(mbuf1)->nh = dst.u32[1] & 0xFFFF;
>>>>>>> +		next1 = (dst.u32[1] >> 16);
>>>>>>> +
>>>>>>> +		rte_node_mbuf_priv1(mbuf2)->nh = dst.u32[2] & 0xFFFF;
>>>>>>> +		next2 = (dst.u32[2] >> 16);
>>>>>>> +
>>>>>>> +		rte_node_mbuf_priv1(mbuf3)->nh = dst.u32[3] & 0xFFFF;
>>>>>>> +		next3 = (dst.u32[3] >> 16);
>>>>>>> +
>>>>>>> +		/* Enqueue four to next node */
>>>>>>> +		rte_edge_t fix_spec =
>>>>>>> +			(next_index ^ next0) | (next_index ^ next1) |
>>>>>>> +			(next_index ^ next2) | (next_index ^ next3);
>>>>>>> +
>>>>>>> +		if (unlikely(fix_spec)) {
>>>>>>> +			/* Copy things successfully speculated till now */
>>>>>>> +			rte_memcpy(to_next, from, last_spec * sizeof(from[0]));
>>>>>>> +			from += last_spec;
>>>>>>> +			to_next += last_spec;
>>>>>>> +			held += last_spec;
>>>>>>> +			last_spec = 0;
>>>>>>> +
>>>>>>> +			/* Next0 */
>>>>>>> +			if (next_index == next0) {
>>>>>>> +				to_next[0] = from[0];
>>>>>>> +				to_next++;
>>>>>>> +				held++;
>>>>>>> +			} else {
>>>>>>> +				rte_node_enqueue_x1(graph, node, next0, from[0]);
>>>>>>> +			}
>>>>>>> +
>>>>>>> +			/* Next1 */
>>>>>>> +			if (next_index == next1) {
>>>>>>> +				to_next[0] = from[1];
>>>>>>> +				to_next++;
>>>>>>> +				held++;
>>>>>>> +			} else {
>>>>>>> +				rte_node_enqueue_x1(graph, node, next1, from[1]);
>>>>>>> +			}
>>>>>>> +
>>>>>>> +			/* Next2 */
>>>>>>> +			if (next_index == next2) {
>>>>>>> +				to_next[0] = from[2];
>>>>>>> +				to_next++;
>>>>>>> +				held++;
>>>>>>> +			} else {
>>>>>>> +				rte_node_enqueue_x1(graph, node, next2, from[2]);
>>>>>>> +			}
>>>>>>> +
>>>>>>> +			/* Next3 */
>>>>>>> +			if (next_index == next3) {
>>>>>>> +				to_next[0] = from[3];
>>>>>>> +				to_next++;
>>>>>>> +				held++;
>>>>>>> +			} else {
>>>>>>> +				rte_node_enqueue_x1(graph, node, next3, from[3]);
>>>>>>> +			}
>>>>>>> +
>>>>>>> +			from += 4;
>>>>>>> +
>>>>>>> +		} else {
>>>>>>> +			last_spec += 4;
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	while (n_left_from > 0) {
>>>>>>> +		uint32_t next_hop;
>>>>>>> +
>>>>>>> +		mbuf0 = pkts[0];
>>>>>>> +
>>>>>>> +		pkts += 1;
>>>>>>> +		n_left_from -= 1;
>>>>>>> +
>>>>>>> +		/* Extract DIP of mbuf0 */
>>>>>>> +		eth_hdr = rte_pktmbuf_mtod(mbuf0, struct rte_ether_hdr *);
>>>>>>> +		ipv4_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1);
>>>>>>> +		/* Extract cksum, ttl as ipv4 hdr is in cache */
>>>>>>> +		rte_node_mbuf_priv1(mbuf0)->cksum = ipv4_hdr->hdr_checksum;
>>>>>>> +		rte_node_mbuf_priv1(mbuf0)->ttl = ipv4_hdr->time_to_live;
>>>>>>> +
>>>>>>> +		rc = rte_lpm_lookup(lpm, rte_be_to_cpu_32(ipv4_hdr->dst_addr),
>>>>>>> +				    &next_hop);
>>>>>>> +		next_hop = (rc == 0) ? next_hop : drop_nh;
>>>>>>> +
>>>>>>> +		rte_node_mbuf_priv1(mbuf0)->nh = next_hop & 0xFFFF;
>>>>>>> +		next0 = (next_hop >> 16);
>>>>>>> +
>>>>>>> +		if (unlikely(next_index ^ next0)) {
>>>>>>> +			/* Copy things successfully speculated till now */
>>>>>>> +			rte_memcpy(to_next, from, last_spec * sizeof(from[0]));
>>>>>>> +			from += last_spec;
>>>>>>> +			to_next += last_spec;
>>>>>>> +			held += last_spec;
>>>>>>> +			last_spec = 0;
>>>>>>> +
>>>>>>> +			rte_node_enqueue_x1(graph, node, next0, from[0]);
>>>>>>> +			from += 1;
>>>>>>> +		} else {
>>>>>>> +			last_spec += 1;
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	/* !!! Home run !!! */
>>>>>>> +	if (likely(last_spec == nb_objs)) {
>>>>>>> +		rte_node_next_stream_move(graph, node, next_index);
>>>>>>> +		return nb_objs;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	held += last_spec;
>>>>>>> +	/* Copy things successfully speculated till now */
>>>>>>> +	rte_memcpy(to_next, from, last_spec * sizeof(from[0]));
>>>>>>> +	rte_node_next_stream_put(graph, node, next_index, held);
>>>>>>> +
>>>>>>> +	return nb_objs;
>>>>>>> +}
>>>>>>> +
>>>>>>>  #else
>>>>>>>
>>>>>>>  static uint16_t
>>>>>>>