From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 2 Jul 2019 09:53:14 +0200
From: Olivier Matz <olivier.matz@6wind.com>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: dev@dpdk.org, Andrew Rybchenko <arybchenko@solarflare.com>
Message-ID: <20190702075314.npiao2zcsajykrob@platinum>
References: <20190516180427.17270-1-stephen@networkplumber.org>
 <20190624204435.29452-1-stephen@networkplumber.org>
 <20190624204435.29452-5-stephen@networkplumber.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190624204435.29452-5-stephen@networkplumber.org>
User-Agent: NeoMutt/20170113 (1.7.2)
Subject: Re: [dpdk-dev] [PATCH v5 4/8] net/ether: use bitops to speedup
 comparison

Hi,

On Mon, Jun 24, 2019 at 01:44:31PM -0700, Stephen Hemminger wrote:
> Using bit operations like or and xor is faster than a loop
> on all architectures. Really just explicit unrolling.
> 
> Similar cast to uint16 unaligned is already done in
> other functions here.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
> ---
>  lib/librte_net/rte_ether.h | 17 +++++++----------
>  1 file changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
> index 8edc7e217b25..feb35a33c94b 100644
> --- a/lib/librte_net/rte_ether.h
> +++ b/lib/librte_net/rte_ether.h
> @@ -81,11 +81,10 @@ struct rte_ether_addr {
>  static inline int rte_is_same_ether_addr(const struct rte_ether_addr *ea1,
>  				     const struct rte_ether_addr *ea2)
>  {
> -	int i;
> -	for (i = 0; i < RTE_ETHER_ADDR_LEN; i++)
> -		if (ea1->addr_bytes[i] != ea2->addr_bytes[i])
> -			return 0;
> -	return 1;
> +	const unaligned_uint16_t *w1 = (const uint16_t *)ea1;
> +	const unaligned_uint16_t *w2 = (const uint16_t *)ea2;
> +
> +	return ((w1[0] ^ w2[0]) | (w1[1] ^ w2[1]) | (w1[2] ^ w2[2])) == 0;
>  }
>  
>  /**
> @@ -100,11 +99,9 @@ static inline int rte_is_same_ether_addr(const struct rte_ether_addr *ea1,
>   */
>  static inline int rte_is_zero_ether_addr(const struct rte_ether_addr *ea)
>  {
> -	int i;
> -	for (i = 0; i < RTE_ETHER_ADDR_LEN; i++)
> -		if (ea->addr_bytes[i] != 0x00)
> -			return 0;
> -	return 1;
> +	const unaligned_uint16_t *w = (const uint16_t *)ea;
> +
> +	return (w[0] | w[1] | w[2]) == 0;
>  }
>  
>  /**

I wonder if using memcmp() would be faster with recent compilers
(gcc >= 7). I tried it quickly, and the generated code looks good
(no function call is emitted):
https://godbolt.org/z/9MOL7g

It would avoid the use of unaligned_uint16_t, and it would remove the
need for the next patch, which adds the alignment constraint.
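
To make the suggestion concrete, here is a rough sketch of what the
memcmp() variant could look like. It is not tested against the DPDK
tree: the sketch_ names and the stand-in struct are hypothetical, used
only so that the snippet compiles on its own. The idea is that the
length is a compile-time constant, so the compiler can expand the
comparison inline without any cast to a 16-bit type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for RTE_ETHER_ADDR_LEN. */
#define SKETCH_ETHER_ADDR_LEN 6

/* Hypothetical stand-in for struct rte_ether_addr: 6 bytes, no padding. */
struct sketch_ether_addr {
	uint8_t addr_bytes[SKETCH_ETHER_ADDR_LEN];
};

static_assert(sizeof(struct sketch_ether_addr) == SKETCH_ETHER_ADDR_LEN,
	      "address struct must not contain padding");

/* Return 1 if both addresses are byte-for-byte identical, 0 otherwise. */
static inline int
sketch_is_same_ether_addr(const struct sketch_ether_addr *ea1,
			  const struct sketch_ether_addr *ea2)
{
	/* Constant size: the compiler is free to inline this comparison. */
	return memcmp(ea1->addr_bytes, ea2->addr_bytes,
		      SKETCH_ETHER_ADDR_LEN) == 0;
}

/* Return 1 if all six bytes are zero, 0 otherwise. */
static inline int
sketch_is_zero_ether_addr(const struct sketch_ether_addr *ea)
{
	/* Static storage is zero-initialized, so this is 00:00:00:00:00:00. */
	static const uint8_t zero[SKETCH_ETHER_ADDR_LEN];

	return memcmp(ea->addr_bytes, zero, SKETCH_ETHER_ADDR_LEN) == 0;
}
```

Since memcmp() works on bytes, this version makes no assumption about
the alignment of the address. Whether it is really as fast as the
xor/or version would need to be measured on the targeted compilers and
architectures.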