From: Yunsheng Lin <linyunsheng@huawei.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	<netdev@vger.kernel.org>, Eric Dumazet <eric.dumazet@gmail.com>,
	<linux-mm@kvack.org>, Mel Gorman <mgorman@techsingularity.net>
Cc: lorenzo@kernel.org, "Toke Høiland-Jørgensen" <toke@redhat.com>,
	bpf@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	willy@infradead.org
Subject: Re: [PATCH RFC net-next/mm V3 1/2] page_pool: Remove workqueue in new shutdown scheme
Date: Thu, 4 May 2023 10:42:32 +0800	[thread overview]
Message-ID: <387f4653-1986-3ffe-65e7-448a59002ed0@huawei.com> (raw)
In-Reply-To: <168269857929.2191653.13267688321246766547.stgit@firesoul>

On 2023/4/29 0:16, Jesper Dangaard Brouer wrote:
>  void page_pool_release_page(struct page_pool *pool, struct page *page)
>  {
> +	unsigned int flags = READ_ONCE(pool->p.flags);
>  	dma_addr_t dma;
> -	int count;
> +	u32 release_cnt;
> +	u32 hold_cnt;
>  
>  	if (!(pool->p.flags & PP_FLAG_DMA_MAP))
>  		/* Always account for inflight pages, even if we didn't
> @@ -490,11 +503,15 @@ void page_pool_release_page(struct page_pool *pool, struct page *page)
>  skip_dma_unmap:
>  	page_pool_clear_pp_info(page);
>  
> -	/* This may be the last page returned, releasing the pool, so
> -	 * it is not safe to reference pool afterwards.
> -	 */
> -	count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
> -	trace_page_pool_state_release(pool, page, count);

There is a time window between "unsigned int flags = READ_ONCE(pool->p.flags)"
and the flags check below; if page_pool_destroy() is called concurrently during
that window, it seems we will leak the pp instance here?

It seems very hard to avoid this kind of corner case when using both
flags & PP_FLAG_SHUTDOWN and release_cnt/hold_cnt checking to decide if the pp
instance can be freed.
Can we use something like the biased reference counting used by frag support
in page_pool? That way we only need to check a single variable, and we avoid
cache bouncing as much as possible.
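To make the biased-refcount suggestion concrete, here is a minimal user-space
sketch in C11 atomics. It only models the idea borrowed from page_pool's frag
support (a large bias taken by the pool up front, one shared counter); the
names pp_sketch, BIAS_MAX and the functions are invented for this illustration
and are not the kernel API:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch of biased reference counting, loosely modelled
 * on page_pool's frag support. Not kernel code.
 */
#define BIAS_MAX (1u << 30)    /* large bias owned by the pool up front */

struct pp_sketch {
	atomic_uint refcnt;    /* the single shared counter */
	bool freed;
};

static void pp_init(struct pp_sketch *pp)
{
	/* Pool takes the whole bias at creation; no separate hold_cnt. */
	atomic_init(&pp->refcnt, BIAS_MAX);
	pp->freed = false;
}

/* Each in-flight page drops one reference when it is released. */
static void pp_page_release(struct pp_sketch *pp)
{
	if (atomic_fetch_sub(&pp->refcnt, 1) == 1)
		pp->freed = true;  /* last reference frees the pool */
}

/* Destroy drops the unused part of the bias in one atomic op:
 * the bias minus the number of pages ever handed out.
 */
static void pp_destroy(struct pp_sketch *pp, unsigned int pages_out)
{
	unsigned int unused = BIAS_MAX - pages_out;

	if (atomic_fetch_sub(&pp->refcnt, unused) == unused)
		pp->freed = true;
}
```

In this scheme the freeing decision is a single atomic_fetch_sub on one
counter, so there is no window between a flags load and a separate
release_cnt/hold_cnt check.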

> +	if (flags & PP_FLAG_SHUTDOWN)
> +		hold_cnt = pp_read_hold_cnt(pool);
> +
> +	release_cnt = atomic_inc_return(&pool->pages_state_release_cnt);
> +	trace_page_pool_state_release(pool, page, release_cnt);
> +
> +	/* In shutdown phase, last page will free pool instance */
> +	if (flags & PP_FLAG_SHUTDOWN)
> +		page_pool_free_attempt(pool, hold_cnt, release_cnt);
>  }
>  EXPORT_SYMBOL(page_pool_release_page);
> 

...

>  
>  void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
> @@ -856,6 +884,10 @@ EXPORT_SYMBOL(page_pool_unlink_napi);
>  
>  void page_pool_destroy(struct page_pool *pool)
>  {
> +	unsigned int flags;
> +	u32 release_cnt;
> +	u32 hold_cnt;
> +
>  	if (!pool)
>  		return;
>  
> @@ -868,11 +900,39 @@ void page_pool_destroy(struct page_pool *pool)
>  	if (!page_pool_release(pool))
>  		return;
>  
> -	pool->defer_start = jiffies;
> -	pool->defer_warn  = jiffies + DEFER_WARN_INTERVAL;
> +	/* PP have pages inflight, thus cannot immediately release memory.
> +	 * Enter into shutdown phase, depending on remaining in-flight PP
> +	 * pages to trigger shutdown process (on concurrent CPUs) and last
> +	 * page will free pool instance.
> +	 *
> +	 * There exist two race conditions here, we need to take into
> +	 * account in the following code.
> +	 *
> +	 * 1. Before setting PP_FLAG_SHUTDOWN another CPU released the last
> +	 *    pages into the ptr_ring.  Thus, it missed triggering shutdown
> +	 *    process, which can then be stalled forever.
> +	 *
> +	 * 2. After setting PP_FLAG_SHUTDOWN another CPU released the last
> +	 *    page, which triggered shutdown process and freed pool
> +	 *    instance. Thus, its not safe to dereference *pool afterwards.
> +	 *
> +	 * Handling races by holding a fake in-flight count, via
> +	 * artificially bumping pages_state_hold_cnt, which assures pool
> +	 * isn't freed under us.  For race(1) its safe to recheck ptr_ring
> +	 * (it will not free pool). Race(2) cannot happen, and we can
> +	 * release fake in-flight count as last step.
> +	 */
> +	hold_cnt = READ_ONCE(pool->pages_state_hold_cnt) + 1;
> +	smp_store_release(&pool->pages_state_hold_cnt, hold_cnt);

I assume the smp_store_release() is used to ensure the correct order
between the two operations above?
There is a data dependency between them (the store uses the value just
read), so do we really need the smp_store_release() here?

> +	barrier();
> +	flags = READ_ONCE(pool->p.flags) | PP_FLAG_SHUTDOWN;

Do we need a stronger barrier, like smp_mb(), to prevent the CPU from
executing "flags = READ_ONCE(pool->p.flags) | PP_FLAG_SHUTDOWN"
before "smp_store_release(&pool->pages_state_hold_cnt, hold_cnt)"?
A release store only orders the accesses *before* it; it does not stop
a later load from being reordered ahead of the store.
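The ordering question can be modelled with C11 atomics, where
smp_store_release() roughly corresponds to a memory_order_release store and
READ_ONCE() to a relaxed load. This is only a model of the patch's destroy
path, not kernel code, and the PP_FLAG_SHUTDOWN bit value is invented here:

```c
#include <stdatomic.h>

/* Model of the destroy-path stores/load from the patch, using C11
 * atomics as stand-ins for the kernel primitives. Illustrative only.
 */
static _Atomic unsigned int pages_state_hold_cnt;
static _Atomic unsigned int pool_flags;

#define PP_FLAG_SHUTDOWN (1u << 3)  /* illustrative bit value */

unsigned int destroy_path(void)
{
	unsigned int hold_cnt, flags;

	hold_cnt = atomic_load_explicit(&pages_state_hold_cnt,
					memory_order_relaxed) + 1;

	/* Release store: all accesses *before* this store are ordered
	 * before it. It says nothing about the load below, which the
	 * CPU (or the compiler, absent a barrier) may hoist above it.
	 */
	atomic_store_explicit(&pages_state_hold_cnt, hold_cnt,
			      memory_order_release);

	/* A relaxed load is not ordered against the preceding release
	 * store; only a full barrier between them (smp_mb() in kernel
	 * terms, memory_order_seq_cst here) forbids that reordering.
	 */
	flags = atomic_load_explicit(&pool_flags, memory_order_relaxed)
		| PP_FLAG_SHUTDOWN;

	atomic_store_explicit(&pool_flags, flags, memory_order_release);
	return flags;
}
```

In this model nothing forbids the relaxed load of pool_flags from completing
before the release store to pages_state_hold_cnt, which is exactly the
store-to-load reordering the question is about.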

> +	smp_store_release(&pool->p.flags, flags);
> +
> +	/* Concurrent CPUs could have returned last pages into ptr_ring */
> +	page_pool_empty_ring(pool);
>  
> -	INIT_DELAYED_WORK(&pool->release_dw, page_pool_release_retry);
> -	schedule_delayed_work(&pool->release_dw, DEFER_TIME);
> +	release_cnt = atomic_inc_return(&pool->pages_state_release_cnt);
> +	page_pool_free_attempt(pool, hold_cnt, release_cnt);
>  }
>  EXPORT_SYMBOL(page_pool_destroy);
>  
> 
> 
> .
> 

Thread overview: 13+ messages
2023-04-28 16:16 [PATCH RFC net-next/mm V3 0/2] page_pool: new approach for leak detection and shutdown phase Jesper Dangaard Brouer
2023-04-28 16:16 ` [PATCH RFC net-next/mm V3 1/2] page_pool: Remove workqueue in new shutdown scheme Jesper Dangaard Brouer
2023-04-28 21:38   ` Toke Høiland-Jørgensen
2023-05-03 15:21     ` Jesper Dangaard Brouer
2023-05-03  2:33   ` Jakub Kicinski
2023-05-03 11:18     ` Toke Høiland-Jørgensen
2023-05-03 15:49       ` Jesper Dangaard Brouer
2023-05-04  1:47         ` Jakub Kicinski
2023-05-04  2:42   ` Yunsheng Lin [this message]
2023-05-04 13:48     ` Jesper Dangaard Brouer
2023-05-05  0:54       ` Yunsheng Lin
2023-05-06 13:11         ` Yunsheng Lin
2023-04-28 16:16 ` [PATCH RFC net-next/mm V3 2/2] mm/page_pool: catch page_pool memory leaks Jesper Dangaard Brouer
