Date: Fri, 14 May 2021 10:36:18 +0300
From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
To: Yunsheng Lin
Cc: Matteo Croce, netdev@vger.kernel.org, linux-mm@kvack.org, Ayush Sawal,
	Vinay Kumar Yadav, Rohit Maheshwari, "David S. Miller", Jakub Kicinski,
	Thomas Petazzoni, Marcin Wojtas, Russell King, Mirko Lindner,
	Stephen Hemminger, Tariq Toukan, Jesper Dangaard Brouer,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend, Boris Pismenny,
	Arnd Bergmann, Andrew Morton, "Peter Zijlstra (Intel)", Vlastimil Babka,
	Yu Zhao, Will Deacon, Fenghua Yu, Roman Gushchin, Hugh Dickins,
	Peter Xu, Jason Gunthorpe, Jonathan Lemon, Alexander Lobakin,
	Cong Wang, wenxu, Kevin Hao, Jakub Sitnicki, Marco Elver,
	Willem de Bruijn, Miaohe Lin, Guillaume Nault,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	bpf@vger.kernel.org, Matthew Wilcox, Eric Dumazet, David Ahern,
	Lorenzo Bianconi, Saeed Mahameed, Andrew Lunn, Paolo Abeni,
	Sven Auhagen
Subject: Re: [PATCH net-next v5 3/5] page_pool: Allow drivers to hint on SKB recycling
References: <20210513165846.23722-1-mcroce@linux.microsoft.com>
	<20210513165846.23722-4-mcroce@linux.microsoft.com>
	<798d6dad-7950-91b2-46a5-3535f44df4e2@huawei.com>
In-Reply-To: <798d6dad-7950-91b2-46a5-3535f44df4e2@huawei.com>

[...]

> >  * using a single memcpy() in __copy_skb_header()
> >  */
> > @@ -3088,7 +3095,13 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f)
> >  */
> > static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
>
> Does it make sense to define a new function like recyclable_skb_frag_unref()
> instead of adding the recycle parameter? This way we may avoid checking
> skb->pp_recycle for the head data and every frag?
>

We'd still have to check when to run __skb_frag_unref() or
recyclable_skb_frag_unref(), so I am not sure we can avoid that.
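To make the trade-off concrete, a split version would look roughly like
the sketch below (untested; recyclable_skb_frag_unref() is the
hypothetical name from your suggestion):

/* Hypothetical split: a dedicated unref for page_pool-backed frags.
 * The skb->pp_recycle test does not disappear; it just moves from
 * inside the helper to the call sites.
 */
static inline void recyclable_skb_frag_unref(skb_frag_t *frag)
{
	struct page *page = skb_frag_page(frag);

#ifdef CONFIG_PAGE_POOL
	if (page_pool_return_skb_page(page_address(page)))
		return;
#endif
	put_page(page);
}

static inline void skb_frag_unref(struct sk_buff *skb, int f)
{
	skb_frag_t *frag = &skb_shinfo(skb)->frags[f];

	if (skb->pp_recycle)
		recyclable_skb_frag_unref(frag);
	else
		put_page(skb_frag_page(frag));
}

So the branch on skb->pp_recycle is still taken for the head data and
every frag; it only lives in a different place.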
In any case I'll have a look.

> > {
> > -	put_page(skb_frag_page(frag));
> > +	struct page *page = skb_frag_page(frag);
> > +
> > +#ifdef CONFIG_PAGE_POOL
> > +	if (recycle && page_pool_return_skb_page(page_address(page)))
> > +		return;
> > +#endif
> > +	put_page(page);
> > }
> >
> > /**
> > @@ -3100,7 +3113,7 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
> >  */
> > static inline void skb_frag_unref(struct sk_buff *skb, int f)
> > {
> > -	__skb_frag_unref(&skb_shinfo(skb)->frags[f], false);
> > +	__skb_frag_unref(&skb_shinfo(skb)->frags[f], skb->pp_recycle);
> > }
> >
> > /**
> > @@ -4699,5 +4712,14 @@ static inline u64 skb_get_kcov_handle(struct sk_buff *skb)
> > #endif
> > }
> >
> > +#ifdef CONFIG_PAGE_POOL
> > +static inline void skb_mark_for_recycle(struct sk_buff *skb, struct page *page,
> > +					struct page_pool *pp)
> > +{
> > +	skb->pp_recycle = 1;
> > +	page_pool_store_mem_info(page, pp);
> > +}
> > +#endif
> > +
> > #endif /* __KERNEL__ */
> > #endif /* _LINUX_SKBUFF_H */
> > diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> > index 24b3d42c62c0..ce75abeddb29 100644
> > --- a/include/net/page_pool.h
> > +++ b/include/net/page_pool.h
> > @@ -148,6 +148,8 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
> > 	return pool->p.dma_dir;
> > }
> >
> > +bool page_pool_return_skb_page(void *data);
> > +
> > struct page_pool *page_pool_create(const struct page_pool_params *params);
> >
> > #ifdef CONFIG_PAGE_POOL
> > @@ -253,4 +255,11 @@ static inline void page_pool_ring_unlock(struct page_pool *pool)
> > 	spin_unlock_bh(&pool->ring.producer_lock);
> > }
> >
> > +/* Store mem_info on struct page and use it while recycling skb frags */
> > +static inline
> > +void page_pool_store_mem_info(struct page *page, struct page_pool *pp)
> > +{
> > +	page->pp = pp;
> > +}
> > +
> > #endif /* _NET_PAGE_POOL_H */
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index 9de5d8c08c17..fa9f17db7c48 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -626,3 +626,26 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
> > 	}
> > }
> > EXPORT_SYMBOL(page_pool_update_nid);
> > +
> > +bool page_pool_return_skb_page(void *data)
> > +{
> > +	struct page_pool *pp;
> > +	struct page *page;
> > +
> > +	page = virt_to_head_page(data);
> > +	if (unlikely(page->pp_magic != PP_SIGNATURE))
>
> We have checked skb->pp_recycle before checking page->pp_magic,
> so the above seems like it should be likely() instead of unlikely()?
>

The check here is != PP_SIGNATURE. So since we already checked for
pp_recycle, it's unlikely the signature won't match.

> > +		return false;
> > +
> > +	pp = (struct page_pool *)page->pp;
> > +
> > +	/* Driver set this to memory recycling info. Reset it on recycle.
> > +	 * This will *not* work for NIC using a split-page memory model.
> > +	 * The page will be returned to the pool here regardless of the
> > +	 * 'flipped' fragment being in use or not.
> > +	 */
> > +	page->pp = NULL;
>
> Why not clear page->pp only when the page cannot be recycled by the
> page pool, so that we do not need to set and clear it every time the
> page is recycled?
>

If the page cannot be recycled, page->pp will probably not be set to
begin with. Since we don't embed the feature in page_pool and we
require the driver to explicitly enable it as part of the 'skb flow',
I'd rather keep it as is.
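To illustrate what "explicitly enable it as part of the 'skb flow'"
means on the driver side, the intent is roughly the following (sketch
only; the struct and function names are invented, not from the series):

/* Hypothetical driver rx path: buffers come from the driver's
 * page_pool, and once the skb is built around the page the driver
 * opts in to recycling.
 */
struct xyz_rx_queue {
	struct page_pool *page_pool;
};

static struct sk_buff *xyz_build_rx_skb(struct xyz_rx_queue *rxq,
					struct page *page, int len)
{
	struct sk_buff *skb = build_skb(page_address(page), PAGE_SIZE);

	if (!skb)
		return NULL;

	skb_reserve(skb, NET_SKB_PAD);
	skb_put(skb, len);

	/* Sets skb->pp_recycle and stores the pool pointer in page->pp,
	 * which page_pool_return_skb_page() picks up when the skb is
	 * freed.
	 */
	skb_mark_for_recycle(skb, page, rxq->page_pool);

	return skb;
}

A page that never went through such a path keeps page->pp unset, so
there is nothing to clear for it.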
When we set/clear page->pp, the page is probably already in the cache,
so I doubt this will have any measurable impact.

> > +	page_pool_put_full_page(pp, virt_to_head_page(data), false);
> > +
> > 	C(end);

[...]

> > @@ -1725,6 +1734,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> > 	skb->cloned = 0;
> > 	skb->hdr_len = 0;
> > 	skb->nohdr = 0;
> > +	skb->pp_recycle = 0;
>
> I am not sure why we clear skb->pp_recycle here.
> As I understand it, pskb_expand_head() only allocates new head data;
> the old frag pages in skb_shinfo()->frags could still be from the
> page pool, right?
>

Ah, correct! In that case we must not clear skb->pp_recycle. The new
head will fail on the signature check and end up being freed, while
the remaining frags will be recycled. The *original* head will be
unmapped/recycled (based on the page refcnt) in pskb_expand_head()
itself.

> > 	atomic_set(&skb_shinfo(skb)->dataref, 1);
> >
> > 	skb_metadata_clear(skb);
> > @@ -3495,7 +3505,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
> > 	fragto = &skb_shinfo(tgt)->frags[merge];
> >
> > 	skb_frag_size_add(fragto, skb_frag_size(fragfrom));
> > -	__skb_frag_unref(fragfrom, false);
> > +	__skb_frag_unref(fragfrom, skb->pp_recycle);
> > }
> >
> > /* Reposition in the original skb */
> > @@ -5285,6 +5295,13 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > 	if (skb_cloned(to))
> > 		return false;
> >
> > +	/* We can't coalesce skbs that are allocated from slab and page_pool.
> > +	 * The recycle mark is on the skb, so that might end up trying to
> > +	 * recycle a slab-allocated skb->head.
> > +	 */
> > +	if (to->pp_recycle != from->pp_recycle)
> > +		return false;
>
> Since we are also depending on page->pp_magic to decide whether to
> recycle a page, could we just set to->pp_recycle according to
> from->pp_recycle and do the coalesce?

So I was thinking about this myself. This check is a 'leftover' from my
initial version, where I only had the pp_recycle bit + struct page
metadata (without the signature). Since that version didn't have the
signature, you could not coalesce 2 skbs coming from page_pool/slab.
We could now do what you suggest, but honestly I can't think of many
use cases where this can happen to begin with. I think I'd prefer
leaving it as is and adjusting the comment. If we can somehow prove
this happens often and has a performance impact, we can go ahead and
remove it.

[...]

Thanks
/Ilias