* [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case
@ 2023-06-28 12:11 Liang Chen
  2023-06-29  6:53 ` Yunsheng Lin
  2023-06-30 23:07 ` Jakub Kicinski
  0 siblings, 2 replies; 11+ messages in thread
From: Liang Chen @ 2023-06-28 12:11 UTC (permalink / raw)
  To: ilias.apalodimas, hawk
  Cc: kuba, davem, edumazet, pabeni, linyunsheng, netdev, liangchen.linux

To address the issues encountered with commit 1effe8ca4e34
("skbuff: fix coalescing for page_pool fragment recycling"), the following
combination of conditions was excluded from skb coalescing:

from->pp_recycle = 1
from->cloned = 1
to->pp_recycle = 1

However, in page pool environments, the aforementioned combination can
be quite common. In scenarios with a high number of small packets, it
can significantly affect the success rate of coalescing. For example,
for packets of 256 bytes in size, our comparison of coalescing success
rates is as follows:

Without page pool: 70%
With page pool: 13%

Consequently, this has an impact on performance:

Without page pool: 2.64 Gbits/sec
With page pool: 2.41 Gbits/sec

Therefore, it seems worthwhile to optimize this scenario and enable
coalescing of this particular combination. To achieve this, we need to
ensure the correct increment of the "from" SKB page's page pool fragment
count (pp_frag_count).

Following this optimization, the success rate of coalescing measured in our
environment has improved as follows:

With page pool: 60%

This success rate approaches the rate achieved without using page pool,
and performance has also improved:

With page pool: 2.61 Gbits/sec

Below is the performance comparison for small packets before and after this
optimization. We observed no impact on packets larger than 4K.

without page pool fragment (PP_FLAG_PAGE_FRAG)
packet size     before      after
(bytes)         (Gbits/sec) (Gbits/sec)
128             1.28        1.37
256             2.41        2.61
512             4.56        4.87
1024            7.69        8.21
2048            12.85       13.41

with page pool fragment (PP_FLAG_PAGE_FRAG)
packet size     before      after
(bytes)         (Gbits/sec) (Gbits/sec)
128             1.28        1.37
256             2.35        2.62
512             4.37        4.86
1024            7.62        8.41
2048            13.07       13.53

with page pool fragment (PP_FLAG_PAGE_FRAG) and high order (order = 3)
packet size     before      after
(bytes)         (Gbits/sec) (Gbits/sec)
128             1.28        1.41
256             2.41        2.74
512             4.57        5.25
1024            8.61        9.71
2048            14.81       16.78

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
---
 include/net/page_pool.h | 21 +++++++++++++++++++++
 net/core/skbuff.c       | 11 +++++++----
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 126f9e294389..05e5d8ead63b 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -399,4 +399,25 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
 		page_pool_update_nid(pool, new_nid);
 }
 
+static inline bool page_pool_is_pp_page(struct page *page)
+{
+	return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
+}
+
+static inline bool page_pool_is_pp_page_frag(struct page *page)
+{
+	return !!(page->pp->p.flags & PP_FLAG_PAGE_FRAG);
+}
+
+static inline void page_pool_page_ref(struct page *page)
+{
+	struct page *head_page = compound_head(page);
+
+	if (page_pool_is_pp_page(head_page) &&
+			page_pool_is_pp_page_frag(head_page))
+		atomic_long_inc(&head_page->pp_frag_count);
+	else
+		get_page(head_page);
+}
+
 #endif /* _NET_PAGE_POOL_H */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6c5915efbc17..9806b091f0f6 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5666,8 +5666,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	 * !@to->pp_recycle but its tricky (due to potential race with
 	 * the clone disappearing) and rare, so not worth dealing with.
 	 */
-	if (to->pp_recycle != from->pp_recycle ||
-	    (from->pp_recycle && skb_cloned(from)))
+	if (to->pp_recycle != from->pp_recycle)
 		return false;
 
 	if (len <= skb_tailroom(to)) {
@@ -5724,8 +5723,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	/* if the skb is not cloned this does nothing
 	 * since we set nr_frags to 0.
 	 */
-	for (i = 0; i < from_shinfo->nr_frags; i++)
-		__skb_frag_ref(&from_shinfo->frags[i]);
+	if (from->pp_recycle)
+		for (i = 0; i < from_shinfo->nr_frags; i++)
+			page_pool_page_ref(skb_frag_page(&from_shinfo->frags[i]));
+	else
+		for (i = 0; i < from_shinfo->nr_frags; i++)
+			__skb_frag_ref(&from_shinfo->frags[i]);
 
 	to->truesize += delta;
 	to->len += len;
-- 
2.31.1



* Re: [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case
  2023-06-28 12:11 [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case Liang Chen
@ 2023-06-29  6:53 ` Yunsheng Lin
  2023-06-29 12:17   ` Liang Chen
  2023-06-30 23:07 ` Jakub Kicinski
  1 sibling, 1 reply; 11+ messages in thread
From: Yunsheng Lin @ 2023-06-29  6:53 UTC (permalink / raw)
  To: Liang Chen, ilias.apalodimas, hawk; +Cc: kuba, davem, edumazet, pabeni, netdev

On 2023/6/28 20:11, Liang Chen wrote:
> In order to address the issues encountered with commit 1effe8ca4e34
> ("skbuff: fix coalescing for page_pool fragment recycling"), the
> combination of the following condition was excluded from skb coalescing:
> 
> from->pp_recycle = 1
> from->cloned = 1
> to->pp_recycle = 1
> 
> However, with page pool environments, the aforementioned combination can
> be quite common. In scenarios with a higher number of small packets, it
> can significantly affect the success rate of coalescing. For example,
> when considering packets of 256 bytes size, our comparison of coalescing
> success rate is as follows:

As skb_try_coalesce() only allows coalescing when the 'to' skb is not cloned.

Could you give more detail about the testing where we have a non-cloned
'to' skb and a cloned 'from' skb? As both of them should belong to the
same flow.

I had the below patchset trying to do something similar to what this patch does:
https://lore.kernel.org/all/20211009093724.10539-5-linyunsheng@huawei.com/

It seems this patch is only trying to optimize a specific case for skb
coalescing, so if skb coalescing between non-cloned and cloned skbs is a
common case, then it might be worth optimizing.


> 
> Without page pool: 70%
> With page pool: 13%
> 

...

> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index 126f9e294389..05e5d8ead63b 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -399,4 +399,25 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
>  		page_pool_update_nid(pool, new_nid);
>  }
>  
> +static inline bool page_pool_is_pp_page(struct page *page)
> +{
> +	return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
> +}
> +
> +static inline bool page_pool_is_pp_page_frag(struct page *page)
> +{
> +	return !!(page->pp->p.flags & PP_FLAG_PAGE_FRAG);
> +}
> +
> +static inline void page_pool_page_ref(struct page *page)
> +{
> +	struct page *head_page = compound_head(page);

It seems we could avoid adding head_page here:
page = compound_head(page);

> +
> +	if (page_pool_is_pp_page(head_page) &&
> +			page_pool_is_pp_page_frag(head_page))
> +		atomic_long_inc(&head_page->pp_frag_count);
> +	else
> +		get_page(head_page);

page_ref_inc() should be enough here instead of get_page()
as compound_head() has been called.

> +}
> +
>  #endif /* _NET_PAGE_POOL_H */
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 6c5915efbc17..9806b091f0f6 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5666,8 +5666,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>  	 * !@to->pp_recycle but its tricky (due to potential race with
>  	 * the clone disappearing) and rare, so not worth dealing with.
>  	 */
> -	if (to->pp_recycle != from->pp_recycle ||
> -	    (from->pp_recycle && skb_cloned(from)))
> +	if (to->pp_recycle != from->pp_recycle)
>  		return false;
>  
>  	if (len <= skb_tailroom(to)) {
> @@ -5724,8 +5723,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>  	/* if the skb is not cloned this does nothing
>  	 * since we set nr_frags to 0.
>  	 */
> -	for (i = 0; i < from_shinfo->nr_frags; i++)
> -		__skb_frag_ref(&from_shinfo->frags[i]);
> +	if (from->pp_recycle)
> +		for (i = 0; i < from_shinfo->nr_frags; i++)
> +			page_pool_page_ref(skb_frag_page(&from_shinfo->frags[i]));
> +	else
> +		for (i = 0; i < from_shinfo->nr_frags; i++)
> +			__skb_frag_ref(&from_shinfo->frags[i]);
>  
>  	to->truesize += delta;
>  	to->len += len;
> 


* Re: [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case
  2023-06-29  6:53 ` Yunsheng Lin
@ 2023-06-29 12:17   ` Liang Chen
  2023-06-29 12:19     ` Liang Chen
  0 siblings, 1 reply; 11+ messages in thread
From: Liang Chen @ 2023-06-29 12:17 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: ilias.apalodimas, hawk, kuba, davem, edumazet, pabeni, netdev

On Thu, Jun 29, 2023 at 2:53 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/6/28 20:11, Liang Chen wrote:
> > In order to address the issues encountered with commit 1effe8ca4e34
> > ("skbuff: fix coalescing for page_pool fragment recycling"), the
> > combination of the following condition was excluded from skb coalescing:
> >
> > from->pp_recycle = 1
> > from->cloned = 1
> > to->pp_recycle = 1
> >
> > However, with page pool environments, the aforementioned combination can
> > be quite common. In scenarios with a higher number of small packets, it
> > can significantly affect the success rate of coalescing. For example,
> > when considering packets of 256 bytes size, our comparison of coalescing
> > success rate is as follows:
>
> As skb_try_coalesce() only allow coaleascing when 'to' skb is not cloned.
>
> Could you give more detailed about the testing when we have a non-cloned
> 'to' skb and a cloned 'from' skb? As both of them should be belong to the
> same flow.
>
> I had the below patchset trying to do something similar as this patch does:
> https://lore.kernel.org/all/20211009093724.10539-5-linyunsheng@huawei.com/
>
> It seems this patch is only trying to optimize a specific case for skb
> coalescing, So if skb coalescing between non-cloned and cloned skb is a
> common case, then it might worth optimizing.
>

Sure, thanks for the information! The testing is just a common iperf
test, as below.

iperf3 -c <server IP> -i 5 -f g -t 0 -l 128

We observed the frequency of each combination of the pp (page pool)
and clone conditions when entering skb_try_coalesce(). The results
motivated us to propose such an optimization, as we noticed that case
11 (from pp/clone=1/1 and to pp/clone=1/0) occurs quite often.

+-------------+--------------+--------------+--------------+--------------+
|   from/to   | pp/clone=0/0 | pp/clone=0/1 | pp/clone=1/0 | pp/clone=1/1 |
+-------------+--------------+--------------+--------------+--------------+
|pp/clone=0/0 | 0            | 1            | 2            | 3            |
|pp/clone=0/1 | 4            | 5            | 6            | 7            |
|pp/clone=1/0 | 8            | 9            | 10           | 11           |
|pp/clone=1/1 | 12           | 13           | 14           | 15           |
+-------------+--------------+--------------+--------------+--------------+


packet size 128:
total : 152903
0 : 0            (0%)
1 : 0            (0%)
2 : 0            (0%)
3 : 0            (0%)
4 : 0            (0%)
5 : 0            (0%)
6 : 0            (0%)
7 : 0            (0%)
8 : 0            (0%)
9 : 0            (0%)
10 : 20681       (13%)
11 : 90136       (58%)
12 : 0           (0%)
13 : 0           (0%)
14 : 0           (0%)
15 : 42086       (27%)
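
For reference, the distribution above was collected with a throwaway
debug counter roughly like the one below (not part of the patch; how
the two indices map onto the rows and columns of the table is an
assumption about how the case numbers were derived):

#include <linux/atomic.h>
#include <linux/skbuff.h>

static atomic_long_t coalesce_hist[4][4];

/* hypothetical hook called on entry to skb_try_coalesce() */
static void coalesce_hist_account(const struct sk_buff *to,
				  const struct sk_buff *from)
{
	unsigned int t = to->pp_recycle * 2 + skb_cloned(to);
	unsigned int f = from->pp_recycle * 2 + skb_cloned(from);

	atomic_long_inc(&coalesce_hist[t][f]);	/* dumped later via printk */
}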

Thanks,
Liang


>
> >
> > Without page pool: 70%
> > With page pool: 13%
> >
>
> ...
>
> > diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> > index 126f9e294389..05e5d8ead63b 100644
> > --- a/include/net/page_pool.h
> > +++ b/include/net/page_pool.h
> > @@ -399,4 +399,25 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
> >               page_pool_update_nid(pool, new_nid);
> >  }
> >
> > +static inline bool page_pool_is_pp_page(struct page *page)
> > +{
> > +     return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
> > +}
> > +
> > +static inline bool page_pool_is_pp_page_frag(struct page *page)
> > +{
> > +     return !!(page->pp->p.flags & PP_FLAG_PAGE_FRAG);
> > +}
> > +
> > +static inline void page_pool_page_ref(struct page *page)
> > +{
> > +     struct page *head_page = compound_head(page);
>
> It seems we could avoid adding head_page here:
> page = compound_head(page);
>
> > +
> > +     if (page_pool_is_pp_page(head_page) &&
> > +                     page_pool_is_pp_page_frag(head_page))
> > +             atomic_long_inc(&head_page->pp_frag_count);
> > +     else
> > +             get_page(head_page);
>
> page_ref_inc() should be enough here instead of get_page()
> as compound_head() have been called.
>
> > +}
> > +
> >  #endif /* _NET_PAGE_POOL_H */
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 6c5915efbc17..9806b091f0f6 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5666,8 +5666,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> >        * !@to->pp_recycle but its tricky (due to potential race with
> >        * the clone disappearing) and rare, so not worth dealing with.
> >        */
> > -     if (to->pp_recycle != from->pp_recycle ||
> > -         (from->pp_recycle && skb_cloned(from)))
> > +     if (to->pp_recycle != from->pp_recycle)
> >               return false;
> >
> >       if (len <= skb_tailroom(to)) {
> > @@ -5724,8 +5723,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> >       /* if the skb is not cloned this does nothing
> >        * since we set nr_frags to 0.
> >        */
> > -     for (i = 0; i < from_shinfo->nr_frags; i++)
> > -             __skb_frag_ref(&from_shinfo->frags[i]);
> > +     if (from->pp_recycle)
> > +             for (i = 0; i < from_shinfo->nr_frags; i++)
> > +                     page_pool_page_ref(skb_frag_page(&from_shinfo->frags[i]));
> > +     else
> > +             for (i = 0; i < from_shinfo->nr_frags; i++)
> > +                     __skb_frag_ref(&from_shinfo->frags[i]);
> >
> >       to->truesize += delta;
> >       to->len += len;
> >


* Re: [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case
  2023-06-29 12:17   ` Liang Chen
@ 2023-06-29 12:19     ` Liang Chen
  2023-06-30 11:52       ` Yunsheng Lin
  0 siblings, 1 reply; 11+ messages in thread
From: Liang Chen @ 2023-06-29 12:19 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: ilias.apalodimas, hawk, kuba, davem, edumazet, pabeni, netdev

On Thu, Jun 29, 2023 at 8:17 PM Liang Chen <liangchen.linux@gmail.com> wrote:
>
> On Thu, Jun 29, 2023 at 2:53 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >
> > On 2023/6/28 20:11, Liang Chen wrote:
> > > In order to address the issues encountered with commit 1effe8ca4e34
> > > ("skbuff: fix coalescing for page_pool fragment recycling"), the
> > > combination of the following condition was excluded from skb coalescing:
> > >
> > > from->pp_recycle = 1
> > > from->cloned = 1
> > > to->pp_recycle = 1
> > >
> > > However, with page pool environments, the aforementioned combination can
> > > be quite common. In scenarios with a higher number of small packets, it
> > > can significantly affect the success rate of coalescing. For example,
> > > when considering packets of 256 bytes size, our comparison of coalescing
> > > success rate is as follows:
> >
> > As skb_try_coalesce() only allow coaleascing when 'to' skb is not cloned.
> >
> > Could you give more detailed about the testing when we have a non-cloned
> > 'to' skb and a cloned 'from' skb? As both of them should be belong to the
> > same flow.
> >
> > I had the below patchset trying to do something similar as this patch does:
> > https://lore.kernel.org/all/20211009093724.10539-5-linyunsheng@huawei.com/
> >
> > It seems this patch is only trying to optimize a specific case for skb
> > coalescing, So if skb coalescing between non-cloned and cloned skb is a
> > common case, then it might worth optimizing.
> >
>
> Sure, Thanks for the information! The testing is just a common iperf
> test as below.
>
> iperf3 -c <server IP> -i 5 -f g -t 0 -l 128
>
> We observed the frequency of each combination of the pp (page pool)
> and clone condition when entering skb_try_coalesce. The results
> motivated us to propose such an optimization, as we noticed that case
> 11 (from pp/clone=1/1 and to pp/clone = 1/0) occurs quite often.
>
> +-------------+--------------+--------------+--------------+--------------+
> |   from/to   | pp/clone=0/0 | pp/clone=0/1 | pp/clone=1/0 | pp/clone=1/1 |
> +-------------+--------------+--------------+--------------+--------------+
> |pp/clone=0/0 | 0            | 1            | 2            | 3            |
> |pp/clone=0/1 | 4            | 5            | 6            | 7            |
> |pp/clone=1/0 | 8            | 9            | 10           | 11           |
> |pp/clone=1/1 | 12           | 13           | 14           | 15           |
> |+------------+--------------+--------------+--------------+--------------+
>
>
> packet size 128:
> total : 152903
> 0 : 0            (0%)
> 1 : 0            (0%)
> 2 : 0            (0%)
> 3 : 0            (0%)
> 4 : 0            (0%)
> 5 : 0            (0%)
> 6 : 0            (0%)
> 7 : 0            (0%)
> 8 : 0            (0%)
> 9 : 0            (0%)
> 10 : 20681       (13%)
> 11 : 90136       (58%)
> 12 : 0           (0%)
> 13 : 0           (0%)
> 14 : 0           (0%)
> 15 : 42086       (27%)
>
> Thanks,
> Liang
>
>
> >
> > >
> > > Without page pool: 70%
> > > With page pool: 13%
> > >
> >
> > ...
> >
> > > diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> > > index 126f9e294389..05e5d8ead63b 100644
> > > --- a/include/net/page_pool.h
> > > +++ b/include/net/page_pool.h
> > > @@ -399,4 +399,25 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
> > >               page_pool_update_nid(pool, new_nid);
> > >  }
> > >
> > > +static inline bool page_pool_is_pp_page(struct page *page)
> > > +{
> > > +     return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
> > > +}
> > > +
> > > +static inline bool page_pool_is_pp_page_frag(struct page *page)
> > > +{
> > > +     return !!(page->pp->p.flags & PP_FLAG_PAGE_FRAG);
> > > +}
> > > +
> > > +static inline void page_pool_page_ref(struct page *page)
> > > +{
> > > +     struct page *head_page = compound_head(page);
> >
> > It seems we could avoid adding head_page here:
> > page = compound_head(page);
> >

Sure.

> > > +
> > > +     if (page_pool_is_pp_page(head_page) &&
> > > +                     page_pool_is_pp_page_frag(head_page))
> > > +             atomic_long_inc(&head_page->pp_frag_count);
> > > +     else
> > > +             get_page(head_page);
> >
> > page_ref_inc() should be enough here instead of get_page()
> > as compound_head() have been called.
> >

Yeah, it will be changed to page_ref_inc() in v2.

> > > +}
> > > +
> > >  #endif /* _NET_PAGE_POOL_H */
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index 6c5915efbc17..9806b091f0f6 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -5666,8 +5666,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > >        * !@to->pp_recycle but its tricky (due to potential race with
> > >        * the clone disappearing) and rare, so not worth dealing with.
> > >        */
> > > -     if (to->pp_recycle != from->pp_recycle ||
> > > -         (from->pp_recycle && skb_cloned(from)))
> > > +     if (to->pp_recycle != from->pp_recycle)
> > >               return false;
> > >
> > >       if (len <= skb_tailroom(to)) {
> > > @@ -5724,8 +5723,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > >       /* if the skb is not cloned this does nothing
> > >        * since we set nr_frags to 0.
> > >        */
> > > -     for (i = 0; i < from_shinfo->nr_frags; i++)
> > > -             __skb_frag_ref(&from_shinfo->frags[i]);
> > > +     if (from->pp_recycle)
> > > +             for (i = 0; i < from_shinfo->nr_frags; i++)
> > > +                     page_pool_page_ref(skb_frag_page(&from_shinfo->frags[i]));
> > > +     else
> > > +             for (i = 0; i < from_shinfo->nr_frags; i++)
> > > +                     __skb_frag_ref(&from_shinfo->frags[i]);
> > >
> > >       to->truesize += delta;
> > >       to->len += len;
> > >


* Re: [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case
  2023-06-29 12:19     ` Liang Chen
@ 2023-06-30 11:52       ` Yunsheng Lin
  2023-07-03  9:09         ` Liang Chen
  0 siblings, 1 reply; 11+ messages in thread
From: Yunsheng Lin @ 2023-06-30 11:52 UTC (permalink / raw)
  To: Liang Chen; +Cc: ilias.apalodimas, hawk, kuba, davem, edumazet, pabeni, netdev

On 2023/6/29 20:19, Liang Chen wrote:
> On Thu, Jun 29, 2023 at 8:17 PM Liang Chen <liangchen.linux@gmail.com> wrote:
>>
>> On Thu, Jun 29, 2023 at 2:53 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>>
>>> On 2023/6/28 20:11, Liang Chen wrote:
>>>> In order to address the issues encountered with commit 1effe8ca4e34
>>>> ("skbuff: fix coalescing for page_pool fragment recycling"), the
>>>> combination of the following condition was excluded from skb coalescing:
>>>>
>>>> from->pp_recycle = 1
>>>> from->cloned = 1
>>>> to->pp_recycle = 1
>>>>
>>>> However, with page pool environments, the aforementioned combination can
>>>> be quite common. In scenarios with a higher number of small packets, it
>>>> can significantly affect the success rate of coalescing. For example,
>>>> when considering packets of 256 bytes size, our comparison of coalescing
>>>> success rate is as follows:
>>>
>>> As skb_try_coalesce() only allow coaleascing when 'to' skb is not cloned.
>>>
>>> Could you give more detailed about the testing when we have a non-cloned
>>> 'to' skb and a cloned 'from' skb? As both of them should be belong to the
>>> same flow.
>>>
>>> I had the below patchset trying to do something similar as this patch does:
>>> https://lore.kernel.org/all/20211009093724.10539-5-linyunsheng@huawei.com/
>>>
>>> It seems this patch is only trying to optimize a specific case for skb
>>> coalescing, So if skb coalescing between non-cloned and cloned skb is a
>>> common case, then it might worth optimizing.
>>>
>>
>> Sure, Thanks for the information! The testing is just a common iperf
>> test as below.
>>
>> iperf3 -c <server IP> -i 5 -f g -t 0 -l 128
>>
>> We observed the frequency of each combination of the pp (page pool)
>> and clone condition when entering skb_try_coalesce. The results
>> motivated us to propose such an optimization, as we noticed that case
>> 11 (from pp/clone=1/1 and to pp/clone = 1/0) occurs quite often.
>>
>> +-------------+--------------+--------------+--------------+--------------+
>> |   from/to   | pp/clone=0/0 | pp/clone=0/1 | pp/clone=1/0 | pp/clone=1/1 |
>> +-------------+--------------+--------------+--------------+--------------+
>> |pp/clone=0/0 | 0            | 1            | 2            | 3            |
>> |pp/clone=0/1 | 4            | 5            | 6            | 7            |
>> |pp/clone=1/0 | 8            | 9            | 10           | 11           |
>> |pp/clone=1/1 | 12           | 13           | 14           | 15           |
>> |+------------+--------------+--------------+--------------+--------------+


I ran the iperf test, and it seems there is only one skb_clone() call for
each round; also, I was using 'iperf', not 'iperf3'.
Is there any app like tcpdump running? It seems odd that the skbs from the
rx path need to be cloned for a common iperf test. Which app or
configuration is causing the cloning?

Maybe use ftrace to see the skb_clone() calls?
echo skb_clone > set_ftrace_filter
echo function > current_tracer


* Re: [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case
  2023-06-28 12:11 [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case Liang Chen
  2023-06-29  6:53 ` Yunsheng Lin
@ 2023-06-30 23:07 ` Jakub Kicinski
  2023-07-03  9:12   ` Liang Chen
  1 sibling, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2023-06-30 23:07 UTC (permalink / raw)
  To: Liang Chen
  Cc: ilias.apalodimas, hawk, davem, edumazet, pabeni, linyunsheng, netdev

On Wed, 28 Jun 2023 20:11:50 +0800 Liang Chen wrote:
> +static inline void page_pool_page_ref(struct page *page)
> +{
> +	struct page *head_page = compound_head(page);
> +
> +	if (page_pool_is_pp_page(head_page) &&
> +			page_pool_is_pp_page_frag(head_page))
> +		atomic_long_inc(&head_page->pp_frag_count);
> +	else
> +		get_page(head_page);

AFAICT this is not correct, you cannot take an extra *pp* reference 
on a pp page, unless it is a frag. If the @to skb is a pp skb (and it
must be, if we get here) we will have two skbs with pp_recycle = 1.
Which means if they both get freed at the same time they will both
try to return / release the page.

I haven't looked at Yunsheng's v5 yet, but that's why I was asking 
to rename the pp_frag_count to pp_ref_count. pp_frag_count is a true
refcount (rather than skb->pp_recycle acting as a poor man's single
shot refcount), so in case of frag'd pages / after Yunsheng's work
we will be able to take new refs.
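
To make the distinction concrete, here is a toy user-space model (purely
illustrative, none of this is kernel code or page pool API) of why a
single-shot recycle flag cannot be shared by two owners while a true
refcount can:

#include <assert.h>
#include <stdatomic.h>

static atomic_long returns;     /* how many times the page got "returned" */
static atomic_long frag_count;  /* stands in for page->pp_frag_count */

static void free_skb_with_recycle_flag(void)
{
	/* pp_recycle = 1: every owner unconditionally returns the page */
	atomic_fetch_add(&returns, 1);
}

static void free_skb_with_frag_ref(void)
{
	/* frag'ed page: only the last reference actually returns it */
	if (atomic_fetch_sub(&frag_count, 1) == 1)
		atomic_fetch_add(&returns, 1);
}

int main(void)
{
	/* two skbs sharing one non-frag pp page: double return */
	atomic_store(&returns, 0);
	free_skb_with_recycle_flag();
	free_skb_with_recycle_flag();
	assert(atomic_load(&returns) == 2);	/* broken */

	/* two skbs each holding a pp_frag_count reference: single return */
	atomic_store(&returns, 0);
	atomic_store(&frag_count, 2);
	free_skb_with_frag_ref();
	free_skb_with_frag_ref();
	assert(atomic_load(&returns) == 1);	/* correct */
	return 0;
}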

Long story short please wait until Yunsheng's changes are finalized.


* Re: [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case
  2023-06-30 11:52       ` Yunsheng Lin
@ 2023-07-03  9:09         ` Liang Chen
  0 siblings, 0 replies; 11+ messages in thread
From: Liang Chen @ 2023-07-03  9:09 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: ilias.apalodimas, hawk, kuba, davem, edumazet, pabeni, netdev

On Fri, Jun 30, 2023 at 7:52 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/6/29 20:19, Liang Chen wrote:
> > On Thu, Jun 29, 2023 at 8:17 PM Liang Chen <liangchen.linux@gmail.com> wrote:
> >>
> >> On Thu, Jun 29, 2023 at 2:53 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>>
> >>> On 2023/6/28 20:11, Liang Chen wrote:
> >>>> In order to address the issues encountered with commit 1effe8ca4e34
> >>>> ("skbuff: fix coalescing for page_pool fragment recycling"), the
> >>>> combination of the following condition was excluded from skb coalescing:
> >>>>
> >>>> from->pp_recycle = 1
> >>>> from->cloned = 1
> >>>> to->pp_recycle = 1
> >>>>
> >>>> However, with page pool environments, the aforementioned combination can
> >>>> be quite common. In scenarios with a higher number of small packets, it
> >>>> can significantly affect the success rate of coalescing. For example,
> >>>> when considering packets of 256 bytes size, our comparison of coalescing
> >>>> success rate is as follows:
> >>>
> >>> As skb_try_coalesce() only allow coaleascing when 'to' skb is not cloned.
> >>>
> >>> Could you give more detailed about the testing when we have a non-cloned
> >>> 'to' skb and a cloned 'from' skb? As both of them should be belong to the
> >>> same flow.
> >>>
> >>> I had the below patchset trying to do something similar as this patch does:
> >>> https://lore.kernel.org/all/20211009093724.10539-5-linyunsheng@huawei.com/
> >>>
> >>> It seems this patch is only trying to optimize a specific case for skb
> >>> coalescing, So if skb coalescing between non-cloned and cloned skb is a
> >>> common case, then it might worth optimizing.
> >>>
> >>
> >> Sure, Thanks for the information! The testing is just a common iperf
> >> test as below.
> >>
> >> iperf3 -c <server IP> -i 5 -f g -t 0 -l 128
> >>
> >> We observed the frequency of each combination of the pp (page pool)
> >> and clone condition when entering skb_try_coalesce. The results
> >> motivated us to propose such an optimization, as we noticed that case
> >> 11 (from pp/clone=1/1 and to pp/clone = 1/0) occurs quite often.
> >>
> >> +-------------+--------------+--------------+--------------+--------------+
> >> |   from/to   | pp/clone=0/0 | pp/clone=0/1 | pp/clone=1/0 | pp/clone=1/1 |
> >> +-------------+--------------+--------------+--------------+--------------+
> >> |pp/clone=0/0 | 0            | 1            | 2            | 3            |
> >> |pp/clone=0/1 | 4            | 5            | 6            | 7            |
> >> |pp/clone=1/0 | 8            | 9            | 10           | 11           |
> >> |pp/clone=1/1 | 12           | 13           | 14           | 15           |
> >> |+------------+--------------+--------------+--------------+--------------+
>
>
> I run the iperf test, it seems there is only one skb_clone() calling for each
> round, and I was using 'iperf', not 'iperf3'.
> Is there any app like tcpdump running? It seems odd that the skb from the rx
> need to be cloned for a common iperf test, which app or configuration is causing
> the cloning?
>
> Maybe using the ftrace to see the skb_clone() calling?
> echo skb_clone > set_ftrace_filter
> echo function > current_tracer

Thanks for raising the concerns. We did some investigation into the
cause of skb cloning. The result is that in our environment (Fedora 37
default network setup) NetworkManager creates a SOCK_DGRAM socket,
which eventually leads to an additional packet_type (struct
packet_sock.prot_hook) being registered, and thus the cloning, since
__netif_receive_skb_core() iterates through orig_dev->ptype_specific for
all possible skb delivery targets and increases skb->users
accordingly.
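
For illustration, a minimal stand-in for that kind of socket is below
(an assumption for illustration only, not taken from NetworkManager
sources; ETH_P_IP is just a guess at the protocol, and it needs
CAP_NET_RAW to run). While such a socket exists, the extra packet_type
handler stays registered and matching rx skbs gain an additional
delivery target:

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	/* ETH_P_IP is only an illustrative choice of protocol here */
	int fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));

	if (fd < 0) {
		perror("socket(AF_PACKET, SOCK_DGRAM)");
		return 1;
	}

	pause();	/* keep the extra handler registered */
	close(fd);
	return 0;
}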

We will update the commit message to include this information and point
out that the figures are specific to our environment. But there are
many possible reasons why skbs can be cloned, and improvements in this
code path can still bring benefits.

Thanks,
Liang


* Re: [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case
  2023-06-30 23:07 ` Jakub Kicinski
@ 2023-07-03  9:12   ` Liang Chen
  2023-07-03 18:53     ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Liang Chen @ 2023-07-03  9:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: ilias.apalodimas, hawk, davem, edumazet, pabeni, linyunsheng, netdev

On Sat, Jul 1, 2023 at 7:07 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 28 Jun 2023 20:11:50 +0800 Liang Chen wrote:
> > +static inline void page_pool_page_ref(struct page *page)
> > +{
> > +     struct page *head_page = compound_head(page);
> > +
> > +     if (page_pool_is_pp_page(head_page) &&
> > +                     page_pool_is_pp_page_frag(head_page))
> > +             atomic_long_inc(&head_page->pp_frag_count);
> > +     else
> > +             get_page(head_page);
>
> AFAICT this is not correct, you cannot take an extra *pp* reference
> on a pp page, unless it is a frag. If the @to skb is a pp skb (and it
> must be, if we get here) we will have two skbs with pp_recycle = 1.
> Which means if they both get freed at the same time they will both
> try to return / release the page.
>
> I haven't looked at Yunsheng's v5 yet, but that's why I was asking
> to rename the pp_frag_count to pp_ref_count. pp_frag_count is a true
> refcount (rather than skb->pp_recycle acting as a poor man's single
> shot refcount), so in case of frag'd pages / after Yunsheng's work
> we will be able to take new refs.
>
> Long story short please wait until Yunsheng's changes are finalized.

Sure, thanks for the information. I will wait until Yunsheng's changes
are finalized before proposing v2.

As for the "pp" reference, it has the test
page_pool_is_pp_page_frag(head_page) there. So for a non-frag pp page,
it will be a get_page() call.

Thanks,
Liang


* Re: [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case
  2023-07-03  9:12   ` Liang Chen
@ 2023-07-03 18:53     ` Jakub Kicinski
  2023-07-04  0:50       ` Yunsheng Lin
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2023-07-03 18:53 UTC (permalink / raw)
  To: Liang Chen
  Cc: ilias.apalodimas, hawk, davem, edumazet, pabeni, linyunsheng, netdev

On Mon, 3 Jul 2023 17:12:46 +0800 Liang Chen wrote:
> As for the "pp" reference, it has the test
> page_pool_is_pp_page_frag(head_page) there. So for a non-frag pp page,
> it will be a get_page call.

You don't understand - you can't put a page from a page pool in two
skbs with pp_recycle set, unless the page is frag'ed.


* Re: [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case
  2023-07-03 18:53     ` Jakub Kicinski
@ 2023-07-04  0:50       ` Yunsheng Lin
  2023-07-04  4:54         ` Liang Chen
  0 siblings, 1 reply; 11+ messages in thread
From: Yunsheng Lin @ 2023-07-04  0:50 UTC (permalink / raw)
  To: Jakub Kicinski, Liang Chen
  Cc: ilias.apalodimas, hawk, davem, edumazet, pabeni, netdev

On 2023/7/4 2:53, Jakub Kicinski wrote:
> On Mon, 3 Jul 2023 17:12:46 +0800 Liang Chen wrote:
>> As for the "pp" reference, it has the test
>> page_pool_is_pp_page_frag(head_page) there. So for a non-frag pp page,
>> it will be a get_page call.
> 
> You don't understand - you can't put a page from a page pool in two
> skbs with pp_recycle set, unless the page is frag'ed.

Agreed. I think we should disallow skb coalescing for non-frag pp pages
instead of calling get_page(), as there is a data race when calling
page_pool_return_skb_page() concurrently for the same non-frag pp page.
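
Something along these lines is what I have in mind, reusing the helpers
added by this patch (a rough, untested sketch of an early check in
skb_try_coalesce(), done before 'to' is modified, not a finished change):

	if (from->pp_recycle && skb_cloned(from)) {
		for (i = 0; i < from_shinfo->nr_frags; i++) {
			struct page *page =
				compound_head(skb_frag_page(&from_shinfo->frags[i]));

			/*
			 * Sharing a non-frag pp page between two pp_recycle
			 * skbs would lead to a double page_pool release,
			 * so bail out.
			 */
			if (page_pool_is_pp_page(page) &&
			    !page_pool_is_pp_page_frag(page))
				return false;
		}
	}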

Even with my patchset, it may break on arches where
PAGE_POOL_DMA_USE_PP_FRAG_COUNT is true.

> .
> 


* Re: [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case
  2023-07-04  0:50       ` Yunsheng Lin
@ 2023-07-04  4:54         ` Liang Chen
  0 siblings, 0 replies; 11+ messages in thread
From: Liang Chen @ 2023-07-04  4:54 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Jakub Kicinski, ilias.apalodimas, hawk, davem, edumazet, pabeni, netdev

On Tue, Jul 4, 2023 at 8:50 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/7/4 2:53, Jakub Kicinski wrote:
> > On Mon, 3 Jul 2023 17:12:46 +0800 Liang Chen wrote:
> >> As for the "pp" reference, it has the test
> >> page_pool_is_pp_page_frag(head_page) there. So for a non-frag pp page,
> >> it will be a get_page call.
> >
> > You don't understand - you can't put a page from a page pool in two
> > skbs with pp_recycle set, unless the page is frag'ed.
>
> Agreed. I think we should disallow skb coaleasing for non-frag pp page
> instead of calling get_page(), as there is data race when calling
> page_pool_return_skb_page() concurrently for the same non-frag pp page.
>
> Even with my patchset, it may break the arch with
> PAGE_POOL_DMA_USE_PP_FRAG_COUNT being true.
>

Yeah, that's my fault. I thought __page_pool_put_page() handled this
elevated refcnt, but overlooked that the second skb release would
enter page_pool_return_skb_page() again rather than directly calling
put_page().

Disallowing skb coalescing for non-frag pp pages sounds good; we will
handle this problem in v2 after Yunsheng's changes are finalized.

> > .
> >

