From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 12 Mar 2021 18:28:12 +0000
To: Eric Dumazet
From: Alexander Lobakin
Cc: Alexander Lobakin, "David S. Miller", Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Wei Wang,
	Cong Wang, Taehee Yoo, netdev, LKML
Reply-To: Alexander Lobakin
Subject: Re: [PATCH net-next 4/4] gro: improve flow distribution across GRO buckets in dev_gro_receive()
Message-ID: <20210312182754.241807-1-alobakin@pm.me>
References: <20210312162127.239795-1-alobakin@pm.me> <20210312162127.239795-5-alobakin@pm.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Eric Dumazet
Date: Fri, 12 Mar 2021 17:33:53 +0100

> On Fri, Mar 12, 2021 at 5:22 PM Alexander Lobakin wrote:
> >
> > Most of the functions that "convert" hash value into an index
> > (when RPS is configured / XPS is not configured / etc.) set
> > reciprocal_scale() on it. Its logics is simple, but fair enough and
> > accounts the entire input value.
> > On the opposite side, 'hash & (GRO_HASH_BUCKETS - 1)' expression uses
> > only 3 least significant bits of the value, which is far from
> > optimal (especially for XOR RSS hashers, where the hashes of two
> > different flows may differ only by 1 bit somewhere in the middle).
> >
> > Use reciprocal_scale() here too to take the entire hash value into
> > account and improve flow dispersion between GRO hash buckets.
> >
> > Signed-off-by: Alexander Lobakin
> > ---
> >  net/core/dev.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 65d9e7d9d1e8..bd7c9ba54623 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -5952,7 +5952,7 @@ static void gro_flush_oldest(struct napi_struct *napi, struct list_head *head)
> >
> >  static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
> >  {
> > -       u32 bucket = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);
> > +       u32 bucket = reciprocal_scale(skb_get_hash_raw(skb), GRO_HASH_BUCKETS);
>
> This is going to use 3 high order bits instead of 3 low-order bits.

We-e-ell, seems like it.

> Now, had you use hash_32(skb_get_hash_raw(skb), 3), you could have
> claimed to use "more bits"

Nice suggestion, I'll try. If there are no visible improvements,
I'll just drop this one.

> Toeplitz already shuffles stuff.

So do CRC and others, but I feel like we shouldn't rely only on
the hardware.

> Adding a multiply here seems not needed.
>
> Please provide experimental results, because this looks unnecessary to me.

Thanks,
Al
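
P.S. For anyone following the thread, here is a rough userspace sketch of
the three bucket mappings discussed above: the current mask, a
reciprocal_scale()-style mapping, and a hash_32()-style mapping. The helper
names (gro_bucket_mask() and friends) are made up for illustration and are
not kernel symbols. GRO_HASH_BUCKETS is assumed to be 8, which is what
"3 bits" implies, and the arithmetic mirrors reciprocal_scale() and
hash_32() as I understand them, so treat it as an approximation and not
the exact in-tree code.

```c
#include <stdint.h>

/* Assumed value: the thread talks about 3 bits, i.e. 8 buckets. */
#define GRO_HASH_BUCKETS 8u
#define GRO_HASH_BITS    3u
/* Multiplier used by the kernel's 32-bit multiplicative hash. */
#define GOLDEN_RATIO_32  0x61C88647u

/* Hypothetical helper: current code, keeps only the 3 low-order bits. */
static inline uint32_t gro_bucket_mask(uint32_t hash)
{
	return hash & (GRO_HASH_BUCKETS - 1);
}

/*
 * Hypothetical helper: reciprocal_scale()-style mapping, (hash * n) >> 32.
 * For n == 8 this is the same as hash >> 29, i.e. the 3 high-order bits.
 */
static inline uint32_t gro_bucket_scale(uint32_t hash)
{
	return (uint32_t)(((uint64_t)hash * GRO_HASH_BUCKETS) >> 32);
}

/*
 * Hypothetical helper: hash_32()-style mapping. The 32-bit multiply lets
 * lower input bits influence the top 3 bits that are kept.
 */
static inline uint32_t gro_bucket_hash32(uint32_t hash)
{
	return (uint32_t)(hash * GOLDEN_RATIO_32) >> (32 - GRO_HASH_BITS);
}
```

Taking two hashes that differ only in bit 16 (0x00000000 and 0x00010000),
the first two helpers put both into bucket 0, while the hash_32()-style
one yields buckets 0 and 4. This says nothing about real traffic, so it
does not replace the experimental results requested above.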