All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>, <brouer@redhat.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	<davem@davemloft.net>, <kuba@kernel.org>,
	<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linuxarm@openeuler.org>, <hawk@kernel.org>,
	<jonathan.lemon@gmail.com>, <alobakin@pm.me>,
	<willemb@google.com>, <cong.wang@bytedance.com>,
	<pabeni@redhat.com>, <haokexin@gmail.com>, <nogikh@google.com>,
	<elver@google.com>, <memxor@gmail.com>, <edumazet@google.com>,
	<dsahern@gmail.com>
Subject: Re: Re: [PATCH net-next v2 3/3] skbuff: keep track of pp page when __skb_frag_ref() is called
Date: Fri, 17 Sep 2021 11:57:18 +0800	[thread overview]
Message-ID: <36676c07-c2ca-bbd2-972c-95b4027c424f@huawei.com> (raw)
In-Reply-To: <YUMxKhzm+9MDR0jW@apalos.home>

On 2021/9/16 19:57, Ilias Apalodimas wrote:
> On Thu, Sep 16, 2021 at 07:04:54PM +0800, Yunsheng Lin wrote:
>> On 2021/9/16 18:38, Ilias Apalodimas wrote:
>>> On Thu, Sep 16, 2021 at 05:33:39PM +0800, Yunsheng Lin wrote:
>>>> On 2021/9/16 16:44, Ilias Apalodimas wrote:
>>>>>>>> appear if we try to pull in your patches on using page pool and recycling
>>>>>
>>>>> [...]
>>>>>
>>>>>>>> for Tx where TSO and skb_split are used?
>>>>>>
>>>>>> As my understanding, the problem might exists without tx recycling, because a
>>>>>> skb from wire would be passed down to the tcp stack and retransmited back to
>>>>>> the wire theoretically. As I am not able to setup a configuration to verify
>>>>>> and test it and the handling seems tricky, so I am targetting net-next branch
>>>>>> instead of net branch.
>>>>>>
>>>>>>>>
>>>>>>>> I'll be honest, when I came up with the recycling idea for page pool, I
>>>>>>>> never intended to support Tx.  I agree with Alexander here,  If people want
>>>>>>>> to use it on Tx and think there's value,  we might need to go back to the
>>>>>>>> drawing board and see what I've missed.  It's still early and there's a
>>>>>>>> handful of drivers using it,  so it will less painful now.
>>>>>>
>>>>>> Yes, we also need to prototype it to see if there is something missing in the
>>>>>> drawing board and how much improvement we get from that:)
>>>>>>
>>>>>>>
>>>>>>> I agree, page_pool is NOT designed or intended for TX support.
>>>>>>> E.g. it doesn't make sense to allocate a page_pool instance per socket, as the backing memory structures for page_pool are too much.
>>>>>>> As the number RX-queues are more limited it was deemed okay that we use page_pool per RX-queue, which sacrifice some memory to gain speed.
>>>>>>
>>>>>> As memtioned before, Tx recycling is based on page_pool instance per socket.
>>>>>> it shares the page_pool instance with rx.
>>>>>>
>>>>>> Anyway, based on feedback from edumazet and dsahern, I am still trying to
>>>>>> see if the page pool is meaningful for tx.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> The pp_recycle_bit was introduced to make the checking faster, instead of
>>>>>>>> getting stuff into cache and check the page signature.  If that ends up
>>>>>>>> being counterproductive, we could just replace the entire logic with the
>>>>>>>> frag count and the page signature, couldn't we?  In that case we should be
>>>>>>>> very cautious and measure potential regression on the standard path.
>>>>>>>
>>>>>>> +1
>>>>>>
>>>>>> I am not sure "pp_recycle_bit was introduced to make the checking faster" is a
>>>>>> valid. The size of "struct page" is only about 9 words(36/72 bytes), which is
>>>>>> mostly to be in the same cache line, and both standard path and recycle path have
>>>>>> been touching the "struct page", so it seems the overhead for checking signature
>>>>>> seems minimal.
>>>>>>
>>>>>> I agree that we need to be cautious and measure potential regression on the
>>>>>> standard path.
>>>>>
>>>>> well pp_recycle is on the same cache line boundary with the head_frag we
>>>>> need to decide on recycling. After that we start checking page signatures
>>>>> etc,  which means the default release path remains mostly unaffected.  
>>>>>
>>>>> I guess what you are saying here, is that 'struct page' is going to be
>>>>> accessed eventually by the default network path,  so there won't be any 
>>>>> noticeable performance hit?  What about the other usecases we have
>>>>
>>>> Yes.
>>>
>>> In that case you'd need to call virt_to_head_page() early though, get it
>>> and then compare the signature.   I guess that's avoidable by using 
>>> frag->bv_page for the fragments?
>>
>> If a page of a skb frag is from page pool, It seems frag->bv_page is
>> always point to head_page of a compound page, so the calling of
>> virt_to_head_page() does not seems necessary.
>>
> 
> I was mostly referring to the skb head here and how would you trigger the
> recycling path. 
> 
> I think we are talking about different things here.  
> One idea is to use the last bit of frag->bv_page to identify fragments
> allocated from page_pool, which is done today with the signature.
> 
> The signature however exists in the head page so my question was, can we rid
> of that without having a performance penalty?

As both skb frag and head page is eventually operated on the head page
of a compound page(if it is a compound page) for normal case too, maybe
we can refactor the code to get the head page of a compound page before
the signature checking without doing a second virt_to_head_page() or
compound_head() call?

> 
> IOW in skb_free_head() an we replace:
> 
> if (skb_pp_recycle(skb, head)) 
> with
> if (page->pp_magic & ~0x3UL) == PP_SIGNATURE)
> and get rid of the 'bool recycle' argument in __skb_frag_unref()?

For the frag page of a skb, it seems ok to get rid of the 'bool recycle'
argument in __skb_frag_unref(), as __skb_frag_unref() and __skb_frag_ref()
is symmetrically called to put/get a page.

For the head page of a skb, we might need to make sure the head page
passed to __build_skb_around() meet below condition:
do pp_frag_count incrementing instead of _refcount incrementing when
the head page is not newly allocated and it is from page pool.
It seems hard to audit that?


> 
>> bit 0 of frag->bv_page is different way of indicatior for a pp page,
>> it is better we do not confuse with the page signature way. Using
>> a bit 0 may give us a free word in 'struct page' if we manage to
>> use skb->pp_recycle to indicate a head page of the skb uniquely, meaning
>> page->pp_magic can be used for future feature.
>>
>>
>>>
>>>>
>>>>> for pp_recycle right now?  __skb_frag_unref() in skb_shift() or
>>>>> skb_try_coalesce() (the latter can probably be removed tbh).
>>>>
>>>> If we decide to go with accurate indicator of a pp page, we just need
>>>> to make sure network stack use __skb_frag_unref() and __skb_frag_ref()
>>>> to put and get a page frag, the indicator checking need only done in
>>>> __skb_frag_unref() and __skb_frag_ref(), so the skb_shift() and
>>>> skb_try_coalesce() should be fine too.
>>>>
>>>>>
>>>>>>
>>>>>> Another way is to use the bit 0 of frag->bv_page ptr to indicate if a frag
>>>>>> page is from page pool.
>>>>>
>>>>> Instead of the 'struct page' signature?  And the pp_recycle bit will
>>>>> continue to exist?  
>>>>
>>>> pp_recycle bit might only exist or is only used for the head page for the skb.
>>>> The bit 0 of frag->bv_page ptr can be used to indicate a frag page uniquely.
>>>> Doing a memcpying of shinfo or "*fragto = *fragfrom" automatically pass the
>>>> indicator to the new shinfo before doing a __skb_frag_ref(), and __skb_frag_ref()
>>>> will increment the _refcount or pp_frag_count according to the bit 0 of
>>>> frag->bv_page.
>>>>
>>>> By the way, I also prototype the above idea, and it seems to work well too.
>>>>
>>>
>>> As long as no one else touches this, it's just another way of identifying a
>>> page_pool allocated page.  But are we gaining by that?  Not using
>>> virt_to_head_page() as stated above? But in that case you still need to
>>> keep pp_recycle around. 
>>
>> No, we do not need the pp_recycle, as long as the we make sure __skb_frag_ref()
>> is called after memcpying the shinfo or doing "*fragto = *fragfrom".
> 
> But we'll have to keep it for the skb head in this case.

As above, I am not really look into skb head case:)

> 
> Regards
> /Ilias
> 
>>
>>>
>>>>> .
>>>>> Right now the 'naive' explanation on the recycling decision is something like:
>>>>>
>>>>> if (pp_recycle) <--- recycling bit is set
>>>>>     (check page signature) <--- signature matches page pool
>>>>> 		(check fragment refcnt) <--- If frags are enabled and is the last consumer
>>>>> 			recycle
>>>>>
>>>>> If we can proove the performance is unaffected when we eliminate the first if,
>>>>> then obviously we should remove it.  I'll try running that test here and see,
>>>>> but keep in mind I am only testing on an 1GB interface.  Any chance we can get 
>>>>> measurements on a beefier hardware using hns3 ?
>>>>
>>>> Sure, I will try it.
>>>> As the kind of performance overhead is small, any performance testcase in mind?
>>>>
>>>
>>> 'eliminate the first if' wasn't accurate.  I meant switch the first if and
>>> check the struct page signature instead.  That would be the best solution
>>> imho.  We effectively have a single rule to check if a packet comes from
>>> page_pool or not.
>>
>> I am not sure what does "switch " means here, if the page signature can
>> indicate a pp page uniquely, the "if (pp_recycle)" checking can be removed.
>>
>>>
>>> You can start by sending a lot of packets and dropping those immediately.
>>> That should put enough stress on the receive path and the allocators and it
>>> should give us a rough idea. 
>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> But in general,  I'd be happier if we only had a simple logic in our
>>>>>>>> testing for the pages we have to recycle.  Debugging and understanding this
>>>>>>>> otherwise will end up being a mess.
>>>>>>>
>>>>>>>
>>>>>
>>>>> [...]
>>>>>
>>>>> Regards
>>>>> /Ilias
>>>>> .
>>>>>
>>>
>>> Regards
>>> /Ilias
>>> .
>>>
> .
> 

  reply	other threads:[~2021-09-17  3:57 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-14 12:11 [PATCH net-next v2 0/3] some optimization for page pool Yunsheng Lin
2021-09-14 12:11 ` [PATCH net-next v2 1/3] page_pool: support non-split page with PP_FLAG_PAGE_FRAG Yunsheng Lin
2021-09-14 12:11 ` [PATCH net-next v2 2/3] pool_pool: avoid calling compound_head() for skb frag page Yunsheng Lin
2021-09-14 12:11 ` [PATCH net-next v2 3/3] skbuff: keep track of pp page when __skb_frag_ref() is called Yunsheng Lin
2021-09-15  0:59   ` Alexander Duyck
2021-09-15  9:07     ` Yunsheng Lin
2021-09-15 12:56       ` Ilias Apalodimas
2021-09-15 15:42         ` Jesper Dangaard Brouer
2021-09-16  2:05           ` [Linuxarm] " Yunsheng Lin
2021-09-16  8:44             ` Ilias Apalodimas
2021-09-16  9:33               ` Yunsheng Lin
2021-09-16 10:38                 ` Ilias Apalodimas
2021-09-16 11:04                   ` Yunsheng Lin
2021-09-16 11:21                     ` Yunsheng Lin
2021-09-16 11:57                     ` [Linuxarm] " Ilias Apalodimas
2021-09-17  3:57                       ` Yunsheng Lin [this message]
2021-09-17  6:38                         ` Ilias Apalodimas
2021-09-17  9:17                           ` Yunsheng Lin
2021-09-17 15:01                             ` Ilias Apalodimas
2021-09-18  1:43                               ` Yunsheng Lin
2021-09-18  9:23                                 ` Ilias Apalodimas
2021-09-22  3:38                                   ` Yunsheng Lin
2021-09-17 17:15             ` [Linuxarm] " Eric Dumazet
2021-09-18  2:42               ` Yunsheng Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=36676c07-c2ca-bbd2-972c-95b4027c424f@huawei.com \
    --to=linyunsheng@huawei.com \
    --cc=alexander.duyck@gmail.com \
    --cc=alobakin@pm.me \
    --cc=brouer@redhat.com \
    --cc=cong.wang@bytedance.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@gmail.com \
    --cc=edumazet@google.com \
    --cc=elver@google.com \
    --cc=haokexin@gmail.com \
    --cc=hawk@kernel.org \
    --cc=ilias.apalodimas@linaro.org \
    --cc=jbrouer@redhat.com \
    --cc=jonathan.lemon@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@openeuler.org \
    --cc=memxor@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=nogikh@google.com \
    --cc=pabeni@redhat.com \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.