Date: Fri, 30 Apr 2021 19:24:03 +0300
From: Ilias Apalodimas
To: Yunsheng Lin
Cc: Matteo Croce, netdev@vger.kernel.org, linux-mm@kvack.org, Ayush Sawal,
 Vinay Kumar Yadav, Rohit Maheshwari, "David S. Miller", Jakub Kicinski,
 Thomas Petazzoni, Marcin Wojtas, Russell King, Mirko Lindner,
 Stephen Hemminger, Tariq Toukan, Jesper Dangaard Brouer,
 Alexei Starovoitov, Daniel Borkmann, John Fastabend, Boris Pismenny,
 Arnd Bergmann, Andrew Morton, "Peter Zijlstra (Intel)", Vlastimil Babka,
 Yu Zhao, Will Deacon, Michel Lespinasse, Fenghua Yu, Roman Gushchin,
 Hugh Dickins, Peter Xu, Jason Gunthorpe, Guoqing Jiang, Jonathan Lemon,
 Alexander Lobakin, Cong Wang, wenxu, Kevin Hao, Aleksandr Nogikh,
 Jakub Sitnicki, Marco Elver, Willem de Bruijn, Miaohe Lin,
 Guillaume Nault, linux-kernel@vger.kernel.org,
 linux-rdma@vger.kernel.org, bpf@vger.kernel.org, Matthew Wilcox,
 Eric Dumazet, David Ahern, Lorenzo Bianconi, Saeed Mahameed,
 Andrew Lunn, Paolo Abeni
Subject: Re: [PATCH net-next v3 0/5] page_pool: recycle buffers
References: <20210409223801.104657-1-mcroce@linux.microsoft.com>
 <9bf7c5b3-c3cf-e669-051f-247aa8df5c5a@huawei.com>
In-Reply-To: <9bf7c5b3-c3cf-e669-051f-247aa8df5c5a@huawei.com>

[...]

> >>
> >> 1. skb frag page recycling does not need "struct xdp_rxq_info" or
> >> "struct xdp_mem_info" to bind the relation between "struct page" and
> >> "struct page_pool", which seems unnecessary at this point if binding
> >> a "struct page_pool" pointer directly in "struct page" does not cause
> >> a size increase.
> >
> > We can't do that. The reason we need those structs is that we rely on
> > the existing XDP code, which already recycles its buffers, to enable
> > recycling. Since we allocate a page per packet when using page_pool
> > for a driver, the same ideas apply to an SKB and an XDP frame. We just
> > recycle the
>
> I am not really familiar with XDP here, but a packet from hw is either a
> "struct xdp_frame/xdp_buff" for XDP or a "struct sk_buff" for the TCP/IP
> stack; a packet cannot be both a "struct xdp_frame/xdp_buff" and a
> "struct sk_buff" at the same time, right?
>

Yes, but the payload is irrelevant in both cases and that's what we use
page_pool for. You can't use this patchset unless your driver uses
build_skb(). So in both cases you just allocate memory for the payload and
decide what to wrap the buffer with (XDP or SKB) later.
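To make that concrete, here is a condensed sketch of the rx pattern being
described -- not code from the patchset or from any particular driver; the
function name, headroom and error handling are illustrative only:

#include <linux/skbuff.h>
#include <net/page_pool.h>

static struct sk_buff *rx_build_one_skb(struct page_pool *pool,
                                        unsigned int len)
{
        struct page *page;
        struct sk_buff *skb;
        void *va;

        /* one full page per packet, handed out by the pool */
        page = page_pool_dev_alloc_pages(pool);
        if (!page)
                return NULL;
        va = page_address(page);

        /* ... hw has DMA-written the frame at va + NET_SKB_PAD here ... */

        /* the wrapper is decided late: the payload buffer already
         * exists, build_skb() just wraps an skb around it */
        skb = build_skb(va, PAGE_SIZE);
        if (!skb) {
                page_pool_put_full_page(pool, page, false);
                return NULL;
        }
        skb_reserve(skb, NET_SKB_PAD);
        skb_put(skb, len);
        return skb;
}

An XDP path would instead point an xdp_buff at the same 'va'; either way
page_pool only ever sees the payload page.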
Miller" , Jakub Kicinski , Thomas Petazzoni , Marcin Wojtas , Russell King , Mirko Lindner , Stephen Hemminger , Tariq Toukan , Jesper Dangaard Brouer , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Boris Pismenny , Arnd Bergmann , Andrew Morton , "Peter Zijlstra (Intel)" , Vlastimil Babka , Yu Zhao , Will Deacon , Michel Lespinasse , Fenghua Yu , Roman Gushchin , Hugh Dickins , Peter Xu , Jason Gunthorpe , Guoqing Jiang , Jonathan Lemon , Alexander Lobakin , Cong Wang , wenxu , Kevin Hao , Aleksandr Nogikh , Jakub Sitnicki , Marco Elver , Willem de Bruijn , Miaohe Lin , Guillaume Nault , linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, bpf@vger.kernel.org, Matthew Wilcox , Eric Dumazet , David Ahern , Lorenzo Bianconi , Saeed Mahameed , Andrew Lunn , Paolo Abeni Subject: Re: [PATCH net-next v3 0/5] page_pool: recycle buffers Message-ID: References: <20210409223801.104657-1-mcroce@linux.microsoft.com> <9bf7c5b3-c3cf-e669-051f-247aa8df5c5a@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <9bf7c5b3-c3cf-e669-051f-247aa8df5c5a@huawei.com> Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org [...] > >> > >> 1. skb frag page recycling do not need "struct xdp_rxq_info" or > >> "struct xdp_mem_info" to bond the relation between "struct page" and > >> "struct page_pool", which seems uncessary at this point if bonding > >> a "struct page_pool" pointer directly in "struct page" does not cause > >> space increasing. > > > > We can't do that. The reason we need those structs is that we rely on the > > existing XDP code, which already recycles it's buffers, to enable > > recycling. Since we allocate a page per packet when using page_pool for a > > driver , the same ideas apply to an SKB and XDP frame. We just recycle the > > I am not really familar with XDP here, but a packet from hw is either a > "struct xdp_frame/xdp_buff" for XDP or a "struct sk_buff" for TCP/IP stack, > a packet can not be both "struct xdp_frame/xdp_buff" and "struct sk_buff" at > the same time, right? > Yes, but the payload is irrelevant in both cases and that's what we use page_pool for. You can't use this patchset unless your driver usues build_skb(). So in both cases you just allocate memory for the payload and decide what the wrap the buffer with (XDP or SKB) later. > What does not really make sense to me is that the page has to be from page > pool when a skb's frag page can be recycled, right? If it is ture, the switch > case in __xdp_return() does not really make sense for skb recycling, why go > all the trouble of checking the mem->type and mem->id to find the page_pool > pointer when recyclable page for skb can only be from page pool? In any case you need to find in which pool the buffer you try to recycle belongs. In order to make the whole idea generic and be able to recycle skb fragments instead of just the skb head you need to store some information on struct page. That's the fundamental difference of this patchset compared to the RFC we sent a few years back [1] which was just storing information on the skb. The way this is done on the current patchset is that we store the struct xdp_mem_info in page->private and then look it up on xdp_return(). Now that being said Matthew recently reworked struct page, so we could see if we can store the page pool pointer directly instead of the struct xdp_mem_info. That would allow us to call into page pool functions directly. 
> > payload and we don't really care what's in that. We could rename the
> > functions to something more generic in the future though?
> >
> >>
> >> 2. it would be good to do the page reference count update batching
> >> in the page pool instead of in a specific driver.
> >>
> >>
> >> page_pool_atomic_sub_if_positive() is added to decide who can call
> >> page_pool_put_full_page(), because the driver and the stack may hold
> >> references to the same page; only the last one holding a complete
> >> reference to a page can call page_pool_put_full_page() to decide if
> >> recycling is possible, and if not, the page is released. So I am
> >> wondering if a similar page_pool_atomic_sub_if_positive() could be
> >> added to the specific user space address unmapping path to allow skb
> >> recycling for RX zerocopy too?
> >>
> >
> > I would prefer a different page pool type if we wanted to support the
> > split page model. The changes as-is are quite intrusive, since they
> > change the entire skb return path. So I would prefer introducing the
> > changes one at a time.
>
> I understand there may be a fundamental semantic change when the split
> page model is supported by the page pool, but as I understand it the
> split page support mainly affects the skb recycling path and the drivers
> that use the page pool (XDP too) if we are careful enough, not the
> entire skb return path.

It affects those drivers only, but in order to do so it intercepts the
packet in skb_free_head(), which pretty much affects the entire network
path.

> Anyway, one change at a time is always preferred if supporting split
> pages proves to be non-trivial and intrusive.
>
> >
> > The fundamental difference between having the recycling in the driver
> > vs having it in a generic API is pretty straightforward. When a driver
> > holds the extra page references it is free to decide what to reuse
> > when it is about to refill its Rx descriptors. So TCP zerocopy might
> > work even if the userspace applications hold the buffers for an X
> > amount of time. With this proposal though we *need* to decide what to
> > do with the buffer when we are about to free the skb.
>
> I am not sure I understand what you meant by "free the skb"; does it
> mean that kfree_skb() is called to free the skb?

Yes

> As I understand it, if the skb completely owns the page (which means
> page_count() == 1) when kfree_skb() is called, __page_pool_put_page() is
> called; otherwise page_ref_dec() is called, which is exactly what
> page_pool_atomic_sub_if_positive() tries to handle atomically.
>

Not really, the opposite is happening here. If the pp_recycle bit is set
we will always call page_pool_return_skb_page(). If the page signature
matches the 'magic' set by page pool we will always call
xdp_return_skb_frame(), which will end up calling __page_pool_put_page().
If the refcnt is 1 we'll try to recycle the page. If it's not, we'll
release it from page_pool (releasing some internal references we keep),
unmap the buffer and decrement the refcnt.

[1] https://lore.kernel.org/netdev/154413868810.21735.572808840657728172.stgit@firesoul/

Cheers
/Ilias