Date: Fri, 14 May 2021 10:36:18 +0300
From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
To: Yunsheng Lin
Cc: Matteo Croce, netdev@vger.kernel.org, linux-mm@kvack.org, Ayush Sawal,
	Vinay Kumar Yadav, Rohit Maheshwari, "David S. Miller", Jakub Kicinski,
	Thomas Petazzoni, Marcin Wojtas, Russell King, Mirko Lindner,
	Stephen Hemminger, Tariq Toukan, Jesper Dangaard Brouer,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend, Boris Pismenny,
	Arnd Bergmann, Andrew Morton, "Peter Zijlstra (Intel)", Vlastimil Babka,
	Yu Zhao, Will Deacon, Fenghua Yu, Roman Gushchin, Hugh Dickins,
	Peter Xu, Jason Gunthorpe, Jonathan Lemon, Alexander Lobakin,
	Cong Wang, wenxu, Kevin Hao, Jakub Sitnicki, Marco Elver,
	Willem de Bruijn, Miaohe Lin, Guillaume Nault,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	bpf@vger.kernel.org, Matthew Wilcox, Eric Dumazet, David Ahern,
	Lorenzo Bianconi, Saeed Mahameed, Andrew Lunn, Paolo Abeni,
	Sven Auhagen
Subject: Re: [PATCH net-next v5 3/5] page_pool: Allow drivers to hint on SKB recycling
References: <20210513165846.23722-1-mcroce@linux.microsoft.com>
	<20210513165846.23722-4-mcroce@linux.microsoft.com>
	<798d6dad-7950-91b2-46a5-3535f44df4e2@huawei.com>
In-Reply-To: <798d6dad-7950-91b2-46a5-3535f44df4e2@huawei.com>

[...]

> >  * using a single memcpy() in __copy_skb_header()
> >  */
> > @@ -3088,7 +3095,13 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f)
> >  */
> > static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
>
> Does it make sense to define a new function like recyclable_skb_frag_unref()
> instead of adding the recycle parameter? This way we may avoid checking
> skb->pp_recycle for the head data and every frag?
>

We'd still have to check when to run __skb_frag_unref() or
recyclable_skb_frag_unref(), so I am not sure we can avoid that.
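To make the trade-off concrete, a split version would look roughly like
the sketch below (untested; recyclable_skb_frag_unref() is the
hypothetical name from your suggestion):

/* Hypothetical split: a dedicated unref for page_pool-backed frags.
 * The skb->pp_recycle test does not disappear; it just moves from
 * inside the helper to the call sites.
 */
static inline void recyclable_skb_frag_unref(skb_frag_t *frag)
{
	struct page *page = skb_frag_page(frag);

#ifdef CONFIG_PAGE_POOL
	if (page_pool_return_skb_page(page_address(page)))
		return;
#endif
	put_page(page);
}

static inline void skb_frag_unref(struct sk_buff *skb, int f)
{
	skb_frag_t *frag = &skb_shinfo(skb)->frags[f];

	if (skb->pp_recycle)
		recyclable_skb_frag_unref(frag);
	else
		put_page(skb_frag_page(frag));
}

So the branch on skb->pp_recycle is still taken for the head data and
every frag; it only lives in a different place.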
In any case I'll have a look.

> > {
> > -	put_page(skb_frag_page(frag));
> > +	struct page *page = skb_frag_page(frag);
> > +
> > +#ifdef CONFIG_PAGE_POOL
> > +	if (recycle && page_pool_return_skb_page(page_address(page)))
> > +		return;
> > +#endif
> > +	put_page(page);
> > }
> >
> > /**
> > @@ -3100,7 +3113,7 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
> >  */
> > static inline void skb_frag_unref(struct sk_buff *skb, int f)
> > {
> > -	__skb_frag_unref(&skb_shinfo(skb)->frags[f], false);
> > +	__skb_frag_unref(&skb_shinfo(skb)->frags[f], skb->pp_recycle);
> > }
> >
> > /**
> > @@ -4699,5 +4712,14 @@ static inline u64 skb_get_kcov_handle(struct sk_buff *skb)
> > #endif
> > }
> >
> > +#ifdef CONFIG_PAGE_POOL
> > +static inline void skb_mark_for_recycle(struct sk_buff *skb, struct page *page,
> > +					struct page_pool *pp)
> > +{
> > +	skb->pp_recycle = 1;
> > +	page_pool_store_mem_info(page, pp);
> > +}
> > +#endif
> > +
> > #endif /* __KERNEL__ */
> > #endif /* _LINUX_SKBUFF_H */
> > diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> > index 24b3d42c62c0..ce75abeddb29 100644
> > --- a/include/net/page_pool.h
> > +++ b/include/net/page_pool.h
> > @@ -148,6 +148,8 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
> > 	return pool->p.dma_dir;
> > }
> >
> > +bool page_pool_return_skb_page(void *data);
> > +
> > struct page_pool *page_pool_create(const struct page_pool_params *params);
> >
> > #ifdef CONFIG_PAGE_POOL
> > @@ -253,4 +255,11 @@ static inline void page_pool_ring_unlock(struct page_pool *pool)
> > 	spin_unlock_bh(&pool->ring.producer_lock);
> > }
> >
> > +/* Store mem_info on struct page and use it while recycling skb frags */
> > +static inline
> > +void page_pool_store_mem_info(struct page *page, struct page_pool *pp)
> > +{
> > +	page->pp = pp;
> > +}
> > +
> > #endif /* _NET_PAGE_POOL_H */
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index 9de5d8c08c17..fa9f17db7c48 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -626,3 +626,26 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
> > 	}
> > }
> > EXPORT_SYMBOL(page_pool_update_nid);
> > +
> > +bool page_pool_return_skb_page(void *data)
> > +{
> > +	struct page_pool *pp;
> > +	struct page *page;
> > +
> > +	page = virt_to_head_page(data);
> > +	if (unlikely(page->pp_magic != PP_SIGNATURE))
>
> We have checked skb->pp_recycle before checking page->pp_magic,
> so the above seems like it should be likely() instead of unlikely()?
>

The check here is != PP_SIGNATURE. So since we already checked for
pp_recycle, it's unlikely the signature won't match.

> > +		return false;
> > +
> > +	pp = (struct page_pool *)page->pp;
> > +
> > +	/* Driver set this to memory recycling info. Reset it on recycle.
> > +	 * This will *not* work for NIC using a split-page memory model.
> > +	 * The page will be returned to the pool here regardless of the
> > +	 * 'flipped' fragment being in use or not.
> > +	 */
> > +	page->pp = NULL;
>
> Why not clear page->pp only when the page cannot be recycled by the
> page pool, so that we do not need to set and clear it every time the
> page is recycled?
>

If the page cannot be recycled, page->pp will probably not be set to
begin with. Since we don't embed the feature in page_pool and we
require the driver to explicitly enable it as part of the 'skb flow',
I'd rather keep it as is.
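To illustrate what "explicitly enable it as part of the 'skb flow'"
means on the driver side, the intent is roughly the following (sketch
only; the struct and function names are invented, not from the series):

/* Hypothetical driver rx path: buffers come from the driver's
 * page_pool, and once the skb is built around the page the driver
 * opts in to recycling.
 */
struct xyz_rx_queue {
	struct page_pool *page_pool;
};

static struct sk_buff *xyz_build_rx_skb(struct xyz_rx_queue *rxq,
					struct page *page, int len)
{
	struct sk_buff *skb = build_skb(page_address(page), PAGE_SIZE);

	if (!skb)
		return NULL;

	skb_reserve(skb, NET_SKB_PAD);
	skb_put(skb, len);

	/* Sets skb->pp_recycle and stores the pool pointer in page->pp,
	 * which page_pool_return_skb_page() picks up when the skb is
	 * freed.
	 */
	skb_mark_for_recycle(skb, page, rxq->page_pool);

	return skb;
}

A page that never went through such a path keeps page->pp unset, so
there is nothing to clear for it.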
When we set/clear page->pp, the page is probably already in the cache,
so I doubt this will have any measurable impact.

> > +	page_pool_put_full_page(pp, virt_to_head_page(data), false);
> > +
> > 	C(end);

[...]

> > @@ -1725,6 +1734,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> > 	skb->cloned = 0;
> > 	skb->hdr_len = 0;
> > 	skb->nohdr = 0;
> > +	skb->pp_recycle = 0;
>
> I am not sure why we clear skb->pp_recycle here.
> As I understand it, pskb_expand_head() only allocates new head data;
> the old frag pages in skb_shinfo()->frags could still be from the
> page pool, right?
>

Ah, correct! In that case we must not clear skb->pp_recycle. The new
head will fail on the signature check and end up being freed, while
the remaining frags will be recycled. The *original* head will be
unmapped/recycled (based on the page refcnt) in pskb_expand_head()
itself.

> > 	atomic_set(&skb_shinfo(skb)->dataref, 1);
> >
> > 	skb_metadata_clear(skb);
> > @@ -3495,7 +3505,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
> > 	fragto = &skb_shinfo(tgt)->frags[merge];
> >
> > 	skb_frag_size_add(fragto, skb_frag_size(fragfrom));
> > -	__skb_frag_unref(fragfrom, false);
> > +	__skb_frag_unref(fragfrom, skb->pp_recycle);
> > }
> >
> > /* Reposition in the original skb */
> > @@ -5285,6 +5295,13 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > 	if (skb_cloned(to))
> > 		return false;
> >
> > +	/* We can't coalesce skbs that are allocated from slab and page_pool.
> > +	 * The recycle mark is on the skb, so that might end up trying to
> > +	 * recycle a slab-allocated skb->head.
> > +	 */
> > +	if (to->pp_recycle != from->pp_recycle)
> > +		return false;
>
> Since we are also depending on page->pp_magic to decide whether to
> recycle a page, could we just set to->pp_recycle according to
> from->pp_recycle and do the coalesce?

So I was thinking about this myself. This check is a 'leftover' from my
initial version, where I only had the pp_recycle bit + struct page
metadata (without the signature). Since that version didn't have the
signature, you could not coalesce 2 skbs coming from page_pool/slab.
We could now do what you suggest, but honestly I can't think of many
use cases where this can happen to begin with. I think I'd prefer
leaving it as is and adjusting the comment. If we can somehow prove
this happens often and has a performance impact, we can go ahead and
remove it.

[...]

Thanks
/Ilias