Date: Fri, 3 Jun 2022 09:21:09 -0700
From: Stephen Hemminger
To: madhuker.mythri@oracle.com
Cc: ferruh.yigit@intel.com, dev@dpdk.org
Subject: Re: [PATCH] net/tap: Fixed RSS algorithm to support fragmented packets
Message-ID: <20220603092109.71735d43@hermes.local>
In-Reply-To: <20220603085359.229c898c@hermes.local>
References: <20220325152809.2035-1-madhuker.mythri@oracle.com> <20220603085359.229c898c@hermes.local>

On Fri, 3 Jun 2022 08:53:59 -0700
Stephen Hemminger wrote:

> On Fri, 25 Mar 2022 20:58:09 +0530
> madhuker.mythri@oracle.com wrote:
>
> > From: Madhuker Mythri
> >
> > As per analysis of the DPDK Tap PMD, the existing RSS algorithm considers the 4-tuple (Src-IP, Dst-IP, Src-port and Dst-port) and does not identify fragmented packets, so the fragments of a single packet get different RSS hash values and are distributed across multiple queues.
> > The RSS algorithm assumes that all incoming IP packets carry an L4 protocol (UDP/TCP) and tries to fetch the L4 fields (Src-port and Dst-port) from each incoming packet. For fragmented packets the L4 header is not present (except in the first fragment), so the L4 header fields should not be included in the RSS hash for fragmented packets.
> > This is a bug in the RSS algorithm implemented in the BPF functionality of the TAP PMD.
> >
> > So, modified the RSS eBPF C program and generated the C array in the 'tap_bpf_insns.h' file, which is in eBPF byte-code instruction format.
> >
> > Bugzilla Id: 870
> >
> > Signed-off-by: Madhuker Mythri
> > ---
> >  drivers/net/tap/tap_bpf_insns.h   | 3371 +++++++++++++++--------------
> >  drivers/net/tap/tap_bpf_program.c |   48 +-
> >  2 files changed, 1743 insertions(+), 1676 deletions(-)
> >
>
> Going back to the original RSS specs on Windows:
> https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-hashing-types
>
> There is a note there:
>
>   If a NIC receives a packet that has both IP and TCP headers, NDIS_HASH_TCP_IPV4 should not always
>   be used. In the case of a fragmented IP packet, NDIS_HASH_IPV4 must be used.
>   This includes the first fragment which contains both IP and TCP headers.
>
> The modified BPF program in this patch does not do that exactly.
> Adding a port of 0 is not the same as hashing a smaller tuple.
>
> Do IPv6 fragments need a similar fix?

Something like the following (totally untested)...

diff --git a/drivers/net/tap/tap_bpf_program.c b/drivers/net/tap/tap_bpf_program.c
index 20c310e5e7ba..db6b32b9003b 100644
--- a/drivers/net/tap/tap_bpf_program.c
+++ b/drivers/net/tap/tap_bpf_program.c
@@ -170,10 +170,19 @@ rss_l3_l4(struct __sk_buff *skb)
 			.dport = PORT(*(src_dst_port + 2),
 				      *(src_dst_port + 3)),
 		};
+		__u16 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
+		__u16 frag_off = PORT(frag_off_addr[0], frag_off_addr[1]);
+		__u8 *proto_addr = data + off + offsetof(struct iphdr, protocol);
+		__u8 protocol = *proto_addr;
 		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
+
+		/* If only want L3 or fragmented or not tcp/udp then do L3 only */
+		if ((rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3)) ||
+		    (frag_off & 0x3fff) ||
+		    !(protocol == IPPROTO_TCP || protocol == IPPROTO_UDP))
 			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
+
+		hash = rte_softrss_be((__u32 *)&v4_tuple, key, input_len);
 	} else if (proto == htons(ETH_P_IPV6)) {
 		if (data + off + sizeof(struct ipv6hdr) +
 					sizeof(__u32) > data_end)
@@ -182,6 +191,8 @@ rss_l3_l4(struct __sk_buff *skb)
 					offsetof(struct ipv6hdr, saddr);
 		__u8 *src_dst_port = data + off +
 					sizeof(struct ipv6hdr);
+		__u8 *nexthdr = data + off + offsetof(struct ipv6hdr, nexthdr);
+
 		struct ipv6_l3_l4_tuple v6_tuple;
 		for (j = 0; j < 4; j++)
 			*((uint32_t *)&v6_tuple.src_addr + j) =
@@ -197,9 +208,11 @@ rss_l3_l4(struct __sk_buff *skb)
 				      *(src_dst_port + 3));
 
 		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
+
+		if ((rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3)) ||
+		    !(*nexthdr == IPPROTO_TCP || *nexthdr == IPPROTO_UDP))
 			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
+		hash = rte_softrss_be((__u32 *)&v6_tuple, key, input_len);
 	} else {
 		return TC_ACT_PIPE;
 	}
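
For reference, the tuple-length decision made in the diff above can be sketched as plain userspace C, outside of BPF. This is only an illustration and is as untested against the real PMD as the patch itself. The helper names (read_be16, ipv4_is_fragment, ipv4_hash_words, ipv6_hash_words) and the two mask macros are hypothetical and do not exist in tap_bpf_program.c. The sketch assumes frag_off is read from its two wire bytes through a byte pointer, and that an IPv6 fragment shows up as next header 44 (IPPROTO_FRAGMENT), which the TCP/UDP check already sends down the L3-only path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <netinet/in.h>	/* IPPROTO_TCP, IPPROTO_UDP, IPPROTO_FRAGMENT */

/* Hypothetical names for this sketch; the patch uses the literal 0x3fff. */
#define IP_FLAG_MF	0x2000u	/* "more fragments" flag */
#define IP_OFFSET_MASK	0x1fffu	/* 13-bit fragment offset */

/* The two named masks together are the 0x3fff tested in the patch. */
static_assert((IP_FLAG_MF | IP_OFFSET_MASK) == 0x3fff, "mask mismatch");

/* Assemble the big-endian frag_off field from its two wire bytes. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * A packet is a fragment if MF is set (first and middle fragments)
 * or the offset is non-zero (middle and last fragments).
 * The DF bit (0x4000) is deliberately not part of the test.
 */
static bool ipv4_is_fragment(uint16_t frag_off)
{
	return (frag_off & (IP_FLAG_MF | IP_OFFSET_MASK)) != 0;
}

/* 32-bit words to hash for IPv4: 2 = addresses only, 3 = addresses + ports. */
static unsigned int ipv4_hash_words(bool l3_only, uint16_t frag_off,
				    uint8_t protocol)
{
	if (l3_only || ipv4_is_fragment(frag_off) ||
	    !(protocol == IPPROTO_TCP || protocol == IPPROTO_UDP))
		return 2;
	return 3;
}

/* 32-bit words to hash for IPv6: 8 = addresses only, 9 = addresses + ports. */
static unsigned int ipv6_hash_words(bool l3_only, uint8_t nexthdr)
{
	if (l3_only || !(nexthdr == IPPROTO_TCP || nexthdr == IPPROTO_UDP))
		return 8;
	return 9;
}
```

With these helpers, a first fragment (frag_off 0x2000) and a later fragment (for example frag_off 0x00b9) of the same UDP datagram both select the 2-word tuple, so they hash on addresses only and land on the same queue, which is what the NDIS note asks for.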