Date: Wed, 13 Feb 2019 11:52:12 +0100
From: Steffen Klassert
To: Paolo Abeni
Cc: Willem de Bruijn, "Jason A. Donenfeld", netdev@vger.kernel.org
Subject: Re: [PATCH RFC v2 1/3] UDP: enable GRO by default.
Message-ID: <20190213105212.GY3087@gauss3.secunet.de>
References: <20190128085025.14532-1-steffen.klassert@secunet.com>
 <20190128085025.14532-2-steffen.klassert@secunet.com>
 <9615a29c028dea4f95efa7d1921f67b1a8c1f6e2.camel@redhat.com>
 <20190213090406.GW3087@gauss3.secunet.de>
 <87f5eb5964c77840eccaaba184039b226a387fc7.camel@redhat.com>
In-Reply-To: <87f5eb5964c77840eccaaba184039b226a387fc7.camel@redhat.com>
User-Agent: Mutt/1.9.4 (2018-02-28)

Move discussion back to the list...

On Wed, Feb 13, 2019 at 10:50:53AM +0100, Paolo Abeni wrote:
> Hi,
>
> I'm unsure if the off-list is intentional, anyhow I keep the reply
> off-list. Please feel free to move back the discussion on the ML as it
> fits you better.
Your last mail was already off-list, I've just replied to everyone
in Cc :-)

> On Wed, 2019-02-13 at 10:04 +0100, Steffen Klassert wrote:
> > On Tue, Feb 12, 2019 at 05:18:38PM +0100, Paolo Abeni wrote:
> > > Hi,
> > >
> > > On Mon, 2019-01-28 at 09:50 +0100, Steffen Klassert wrote:
> > > > diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
> > > > index 83b11d0ac091..5f7937a4f71a 100644
> > > > --- a/net/ipv6/udp_offload.c
> > > > +++ b/net/ipv6/udp_offload.c
> > > > @@ -119,6 +119,8 @@ INDIRECT_CALLABLE_SCOPE
> > > >  struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb)
> > > >  {
> > > >  	struct udphdr *uh = udp_gro_udphdr(skb);
> > > > +	struct sk_buff *pp;
> > > > +	struct sock *sk;
> > > >
> > > >  	if (unlikely(!uh) || !static_branch_unlikely(&udpv6_encap_needed_key))
> > > >  		goto flush;
> > >
> > > This ^^^^
> > > should be dropped, otherwise GRO is not enabled for ipv6.
> >
> > Yes, will change this in the next patchset.
> >
> > > I finally had some time to benchmark this series.
> > >
> > > I used a pktgen sender b2b connected with the DUT via a 10Gb link.
> > > The sender generates 64 bytes UDP packets, and the receiver runs an
> > > UDP sink _without_ UDP_GRO.
> > >
> > > With aggregation (pktgen uses a single flow):
> > >
> > >                         vanilla     patched     delta
> > >                         (kpps)      (kpps)      (%)
> > >
> > > sink with rcvmsg        2000        2050        2.5
> > > sink with rcvmmsg       1260        2900        130
> >
> > Do you know the bottleneck with the patched kernel? In particular
> > with rcvmsg, I guess the bottleneck is a full socket receive queue.
>
> Yes, with the patched kernel the bottle-neck is the user-space sink:
> especially due to PTI the US/KS transition is very costly. This is why
> rcvmmsg gives such great improvements.
>
> IIRC the bottle-neck is still on the US even with rcvmmsg, because GRO
> (even with segmentation) gives a great speedup vs non GRO processing.
>
> Additional note: all the above is without any nf/ct related module
> loaded - that is, with the less favorable conditions for GRO.
>
> > > Note: with the unpatched kernel, the bottle-neck is in ksoftirq
> > > processing: making the receiver faster (with rcvmmsg) reduces the
> > > "goodput", as the kernel processing needs to spend more time to
> > > wake up the user-space process.
> > >
> > > When aggregation does not take place (pktgen changes the source port
> > > for each packet, creating 64K different flows):
> >
> > I tried this with random source ports on my forwarding setup.
> > Here the NIC distributes the packets to different RX queues,
> > so I could easily fill a 40G link with this workload.
>
> I'm sorry, I omitted another piece of information: In my tests I forced
> the NIC to use a single queue, (and with 64 bytes packets). Otherwise
> the link speed (or the bus b/w) easily becomes the bottle-neck and I
> could not measure any difference. I have only 10Gbs links handy.

Ok, so this is likely the worst case we can get.

> > >                         vanilla     patched     delta
> > >                         (kpps)      (kpps)      (%)
> > >
> > > sink with rcvmsg        2020        1620        -20
> > > sink with rcvmmsg       1260        1110        -12
> > >
> > > Here there is a measurable regression. I haven't tracked it fully,
> > > but it looks like it depends mostly on less cache-friendly
> > > execution: 64 packets are stored in the GRO hash and then flushed
> > > altogether. When that happens, most packet headers are cold in the
> > > cache.
> >
> > With 64 packets in the gro hashtable, a L1D cache miss is rather
> > likely. Btw. why can we hold 64 flows in the gro hashtable now?
> > The old gro_list could hold up to 8 flows. Seems like
> >
> > commit 6312fe77751f57d4fa2b28abeef84c6a95c28136
> > net: limit each hash list length to MAX_GRO_SKBS
> >
> > changed this.
>
> AFAICS, yes each list can have at most 8 entries, but the hash has
> GRO_HASH_BUCKETS (8) buckets/lists for a possible total of 64 flows.
> Likely we will hit the MAX_GRO_SKBS limit before completely filling
> the hash, I have to instrument to get some real numbers, but likely
> at each napi_poll() completion we will have a lot of flows still
> there (almost 64).
>
> I'm using HZ == 1000, so I hit a napi_gro_flush() at each napi_poll()
>
> > > I'm wondering if a prefetch(skb->data) in __napi_gro_flush_chain()
> > > could help?!? (even for TCP workload)
> >
> > Yes, the problem should exist for TCP too. Not sure if a prefetch
> > will help, but you can try.
>
> Indeed, I would be happy for any better solution!

Maybe we could adjust napi_gro_complete() etc. so that we can use
netif_receive_skb_list() on a gro flush. Cache would be still cold,
but we would not need to run the full stack for each packet in the
gro hashtable.

> The following is likely overkill, but... a long time ago, I tried to
> implement early demux for unconnected sockets:
>
> https://www.spinics.net/lists/netdev/msg456384.html
>
> One idea behind that series is that, if there is no policy routing
> configuration in place, we can detect if a packet is locally terminated
> without any route lookup, with a way cheaper address lookup. *Perhaps*
> a similar idea could be used here -> in absence of UDP_GRO sockets, do
> UDP GRO only if the packet is not locally terminated (using an address
> lookup).

That would not help TCP and we can not use listified gro for the local
UDP input path (still not sure if we want to).
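
For reference, this is roughly where the prefetch you suggested could
go. Completely untested sketch against my reading of the current
__napi_gro_flush_chain() in net/core/dev.c (would also need
linux/prefetch.h there), so the details may be off:

static void __napi_gro_flush_chain(struct napi_struct *napi, u32 index,
				   bool flush_old)
{
	struct list_head *head = &napi->gro_hash[index].list;
	struct sk_buff *skb, *p;

	list_for_each_entry_safe_reverse(skb, p, head, list) {
		if (flush_old && NAPI_GRO_CB(skb)->age == jiffies)
			return;
		/* Start pulling the (likely cold) headers of the next
		 * held flow into the cache while the current skb is
		 * handed up the stack.
		 */
		if (&p->list != head)
			prefetch(p->data);
		skb_list_del_init(skb);
		napi_gro_complete(skb);
		napi->gro_hash[index].count--;
	}

	if (!napi->gro_hash[index].count)
		__clear_bit(index, &napi->gro_bitmask);
}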
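
And a very rough sketch of what I mean with netif_receive_skb_list()
on a gro flush. napi_gro_complete_finish() is a made-up helper here,
it would do what napi_gro_complete() does today minus the final
netif_receive_skb_internal() call, and return true if the skb survived
the ->gro_complete() callbacks:

static void __napi_gro_flush_chain(struct napi_struct *napi, u32 index,
				   bool flush_old)
{
	struct list_head *head = &napi->gro_hash[index].list;
	struct sk_buff *skb, *p;
	LIST_HEAD(done);

	list_for_each_entry_safe_reverse(skb, p, head, list) {
		if (flush_old && NAPI_GRO_CB(skb)->age == jiffies)
			break;	/* break, not return, so the skbs already
				 * collected below still get delivered
				 */
		skb_list_del_init(skb);
		/* hypothetical: run the ->gro_complete() callbacks only,
		 * don't pass the skb to the stack yet
		 */
		if (napi_gro_complete_finish(skb))
			list_add_tail(&skb->list, &done);
		napi->gro_hash[index].count--;
	}

	/* hand the whole flushed batch to the stack in one go */
	if (!list_empty(&done))
		netif_receive_skb_list(&done);

	if (!napi->gro_hash[index].count)
		__clear_bit(index, &napi->gro_bitmask);
}

Cache would still be cold when the gro_complete callbacks touch the
headers, but at least the stack traversal itself would be batched.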