From: Eric Dumazet
Subject: Re: [PATCH net-next 2/2] udp: implement and use per cpu rx skbs cache
Date: Sat, 21 Apr 2018 09:45:00 -0700
To: Willem de Bruijn, Jesper Dangaard Brouer
Cc: Paolo Abeni, Network Development, "David S. Miller", Tariq Toukan

On 04/21/2018 08:54 AM, Willem de Bruijn wrote:
> On Fri, Apr 20, 2018 at 9:48 AM, Jesper Dangaard Brouer wrote:
>>
>> On Thu, 19 Apr 2018 06:47:10 -0700 Eric Dumazet wrote:
>>> On 04/19/2018 12:40 AM, Paolo Abeni wrote:
>>>> On Wed, 2018-04-18 at 12:21 -0700, Eric Dumazet wrote:
>>>>> On 04/18/2018 10:15 AM, Paolo Abeni wrote:
>> [...]
>>>>
>>>> Any suggestions for better results are more than welcome!
>>>
>>> Yes, remote skb freeing. I mentioned this idea to Jesper and Tariq in
>>> Seoul (netdev conference). Not tied to UDP, but a generic solution.
>>
>> Yes, I remember. I think... was it the idea where you basically
>> wanted to queue SKBs back to the CPU that allocated them, right?
>>
>> Freeing an SKB on the same CPU that allocated it has multiple
>> advantages: (1) the SLUB allocator can use a non-atomic,
>> "cpu-local" (double-)cmpxchg; (2) the four cache lines of the SKB
>> that memset clears stay local; (3) the atomic SKB refcnt (users)
>> stays local.
>>
>> We just have to make sure that the queue-back mechanism doesn't cost
>> more than the operations we expect to save. Bulk transfer is an
>> obvious approach. For storing SKBs until they are returned, we
>> already have a fast mechanism: see napi_consume_skb() calling
>> _kfree_skb_defer(), which uses SLUB/SLAB bulk free to amortize
>> cost (1).
>>
>> I guess the missing piece of information is that we don't know which
>> CPU the SKB was created on...
>
> For connected sockets, sk->sk_incoming_cpu has this data. It records
> the BH CPU on enqueue to the UDP socket, so one caveat is that it may
> be wrong with RPS/RFS.
>
> Another option is to associate the skb not with its source CPU but
> with its napi struct, and have the device driver free it in the
> context of its napi processing. This has the additional benefit that
> skb->napi_id is already stored per skb, so it also works for
> unconnected sockets.
>
> Third, the skb->napi_id field is unused after sk->sk_napi_id is set
> on socket enqueue, so the BH CPU could be stored there afterwards,
> essentially extending sk_incoming_cpu to unconnected sockets.

At Google we use something named TXCS, which is what I mentioned to
Jesper and Tariq. (In our case, we wanted to perform skb
destructor/freeing not on the CPU handling the TX queue, but on the
CPUs that originally cooked the skb, i.e. the ones running the TCP
stack.)

To accommodate generic needs (both RX and TX), I do not believe we can
union any existing fields without a lot of pain/bugs.
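
For illustration, a minimal sketch of the queue-back idea from Jesper's
mail, assuming a new per-cpu defer list and an alloc_cpu value stamped
at allocation time; struct skb_defer, skb_queue_remote_free() and
skb_defer_drain() are hypothetical, not existing kernel code:

#include <linux/percpu.h>
#include <linux/skbuff.h>
#include <linux/smp.h>
#include <linux/spinlock.h>

/*
 * Sketch only -- not existing kernel code.  Remote frees are pushed
 * onto the allocating CPU's defer list; that CPU later drains the
 * list from its own softirq context, so the SLUB fast path, the
 * memset-cleared skb cache lines and the refcount all stay CPU-local.
 * The alloc_cpu argument is assumed stamped at allocation time.
 */
struct skb_defer {
	spinlock_t	 lock;		/* assumed initialised at boot */
	struct sk_buff	*list;		/* singly linked via skb->next */
};
static DEFINE_PER_CPU(struct skb_defer, skb_defer);

static void skb_queue_remote_free(struct sk_buff *skb, int alloc_cpu)
{
	struct skb_defer *sd = &per_cpu(skb_defer, alloc_cpu);

	if (alloc_cpu == raw_smp_processor_id()) {
		__kfree_skb(skb);	/* already local, free directly */
		return;
	}
	/* Callers may run in process or BH context, hence _bh. */
	spin_lock_bh(&sd->lock);
	skb->next = sd->list;		/* push; alloc_cpu frees it later */
	sd->list = skb;
	spin_unlock_bh(&sd->lock);
}

/* Runs on the allocating CPU, e.g. hooked into its NAPI poll loop. */
static void skb_defer_drain(void)
{
	struct skb_defer *sd = this_cpu_ptr(&skb_defer);
	struct sk_buff *skb, *next;

	spin_lock_bh(&sd->lock);
	skb = sd->list;
	sd->list = NULL;
	spin_unlock_bh(&sd->lock);

	for (; skb; skb = next) {
		next = skb->next;
		/* bulk free, cf. _kfree_skb_defer(), would amortize this */
		__kfree_skb(skb);
	}
}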
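
And a sketch of Willem's third option, reusing skb->napi_id to carry
the BH CPU once sk_mark_napi_id() has consumed it on socket enqueue;
skb_record_rx_cpu() and skb_rx_cpu() are hypothetical helpers:

#include <linux/skbuff.h>
#include <linux/smp.h>
#include <net/busy_poll.h>

/*
 * Sketch only, assumes CONFIG_NET_RX_BUSY_POLL.  Once
 * sk_mark_napi_id() has copied skb->napi_id into sk->sk_napi_id on
 * socket enqueue, the per-skb field is dead weight, so it could carry
 * the BH CPU instead: sk_incoming_cpu semantics, but per skb, hence
 * usable for unconnected sockets too.  Neither helper exists upstream.
 */
static inline void skb_record_rx_cpu(struct sock *sk, struct sk_buff *skb)
{
	sk_mark_napi_id(sk, skb);		/* consume the napi_id first */
	skb->napi_id = raw_smp_processor_id();	/* then reuse the slot for the BH cpu */
}

static inline int skb_rx_cpu(const struct sk_buff *skb)
{
	return skb->napi_id;
}

At free time, skb_rx_cpu() would supply the alloc_cpu argument to
skb_queue_remote_free() above. Note that skb->napi_id already sits in
a union with sender_cpu, which is exactly the kind of field sharing
that risks the pain/bugs mentioned above.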