From: Saeed Mahameed <saeedm@mellanox.com>
To: "linyunsheng@huawei.com" <linyunsheng@huawei.com>,
	"brouer@redhat.com" <brouer@redhat.com>
Cc: "jonathan.lemon@gmail.com" <jonathan.lemon@gmail.com>,
	"ilias.apalodimas@linaro.org" <ilias.apalodimas@linaro.org>,
	Li Rongqing <lirongqing@baidu.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [PATCH][v2] page_pool: handle page recycle for NUMA_NO_NODE condition
Date: Tue, 10 Dec 2019 19:45:28 +0000
Message-ID: <585eda1ebe8788959b31bca5bb6943908c08c909.camel@mellanox.com>
In-Reply-To: <e9855bd9-dddd-e12c-c889-b872702f80d1@huawei.com>

On Tue, 2019-12-10 at 09:31 +0800, Yunsheng Lin wrote:
> On 2019/12/10 7:34, Saeed Mahameed wrote:
> > On Mon, 2019-12-09 at 13:14 +0100, Jesper Dangaard Brouer wrote:
> > > On Sat, 7 Dec 2019 03:52:41 +0000
> > > Saeed Mahameed <saeedm@mellanox.com> wrote:
> > > 
> > > > On Fri, 2019-12-06 at 17:32 +0800, Li RongQing wrote:
> > > > > Some drivers use page_pool but do not require pages to be
> > > > > allocated from a bound node, or they simply assign pool.p.nid
> > > > > to NUMA_NO_NODE; commit d5394610b1ba ("page_pool: Don't
> > > > > recycle non-reusable pages") blocks recycling for such drivers.
> > > > > 
> > > > > So, when nid is NUMA_NO_NODE, treat a page as reusable if it
> > > > > belongs to the current memory node.
> > > > > 
> > > > > v1-->v2: add check with numa_mem_id from Yunsheng
> > > > > 
> > > > > Fixes: d5394610b1ba ("page_pool: Don't recycle non-reusable pages")
> > > > > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > > > > Suggested-by: Yunsheng Lin <linyunsheng@huawei.com>
> > > > > Cc: Saeed Mahameed <saeedm@mellanox.com>
> > > > > ---
> > > > >  net/core/page_pool.c | 7 ++++++-
> > > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > > > > index a6aefe989043..3c8b51ccd1c1 100644
> > > > > --- a/net/core/page_pool.c
> > > > > +++ b/net/core/page_pool.c
> > > > > @@ -312,12 +312,17 @@ static bool __page_pool_recycle_direct(struct page *page,
> > > > >  /* page is NOT reusable when:
> > > > >   * 1) allocated when system is under some pressure. (page_is_pfmemalloc)
> > > > >   * 2) belongs to a different NUMA node than pool->p.nid.
> > > > > + * 3) belongs to a different memory node than current context
> > > > > + * if pool->p.nid is NUMA_NO_NODE
> > > > >   *
> > > > >   * To update pool->p.nid users must call page_pool_update_nid.
> > > > >   */
> > > > >  static bool pool_page_reusable(struct page_pool *pool, struct page *page)
> > > > >  {
> > > > > -	return !page_is_pfmemalloc(page) && page_to_nid(page) == pool->p.nid;
> > > > > +	return !page_is_pfmemalloc(page) &&
> > > > > +		(page_to_nid(page) == pool->p.nid ||
> > > > > +		(pool->p.nid == NUMA_NO_NODE &&
> > > > > +		page_to_nid(page) == numa_mem_id()));
> > > > >  }
> > > > > 
> > > > 
> > > > Cc'ed Jesper, Ilias & Jonathan.
> > > > 
> > > > I don't think it is correct to check that the page nid is the
> > > > same as numa_mem_id() if the pool is NUMA_NO_NODE. In that case
> > > > we should allow all pages to be recycled, because you can't
> > > > assume where pages are allocated from and where they are being
> > > > handled.
> > > > 
> > > > I suggest the following:
> > > > 
> > > > return !page_is_pfmemalloc(page) &&
> > > > 	(page_to_nid(page) == pool->p.nid || pool->p.nid == NUMA_NO_NODE);
> > > > 
> > > > 1) never recycle emergency pages, regardless of pool nid.
> > > > 2) always recycle if pool is NUMA_NO_NODE.
> > > > 
> > > > The above change should not add any overhead; a modest branch
> > > > predictor will handle this with no effort.
> > > > 
> > > > Jesper et al. what do you think?
> > > 
> > > The patch description doesn't explain the problem very well.
> > > 
> > > Let's first establish what the problem is.  After I took a closer
> > > look, I do think we have a real problem here...
> > > 
> > > If the function alloc_pages_node() is called with NUMA_NO_NODE (see
> > > below signature), then the nid is re-assigned to numa_mem_id().
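> > > For reference, alloc_pages_node() in include/linux/gfp.h looks
> > > roughly like this (paraphrased, not a verbatim copy):
> > > 
> > > static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
> > > 					    unsigned int order)
> > > {
> > > 	/* NUMA_NO_NODE falls back to the memory node closest to the
> > > 	 * CPU we are currently running on.
> > > 	 */
> > > 	if (nid == NUMA_NO_NODE)
> > > 		nid = numa_mem_id();
> > > 
> > > 	return __alloc_pages_node(nid, gfp_mask, order);
> > > }
> > > 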
> > > 
> > > Our current code checks: page_to_nid(page) == pool->p.nid, which
> > > seems bogus, as pool->p.nid = NUMA_NO_NODE while page_to_nid(page)
> > > will not return NUMA_NO_NODE... it was set to the locally detected
> > > NUMA node, right?
> > > 
> > 
> > right.
> > 
> > > So, we do need a fix... but the question is: what semantics do we
> > > want?
> > > 
> > 
> > Maybe assume that __page_pool_recycle_direct() is always called from
> > the right node and change the current bogus check:
> > 
> > from:
> > 	page_to_nid(page) == pool->p.nid
> > 
> > to:
> > 	page_to_nid(page) == numa_mem_id()
> > 
> > This will allow recycling only if the handling node is the same as
> > the one where the page was allocated, regardless of pool->p.nid.
> > 
> > So the semantics are:
> > 
> > 1) allocate from: pool->p.nid, as chosen by the user.
> > 2) recycle when: page_to_nid(page) == numa_mem_id().
> > 3) the pool user must guarantee that the handler runs on the right
> >    node, which should always be the case; otherwise recycling will be
> >    skipped (no cross-NUMA recycling).
> > 
> > 
> > a) if the pool migrates, we will stop recycling until the pool moves
> >    back to the original node, or the user calls page_pool_update_nid()
> >    as we do in mlx5.
> > b) if the pool is NUMA_NO_NODE, then allocation and handling will be
> >    done on numa_mem_id(), which means the above check will work.
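> > 
> > With those semantics, the check would become something like this
> > (just a sketch, not tested):
> > 
> > static bool pool_page_reusable(struct page_pool *pool, struct page *page)
> > {
> > 	/* Recycle only pages that are not emergency allocations and
> > 	 * that were allocated on the node we are currently running on.
> > 	 */
> > 	return !page_is_pfmemalloc(page) &&
> > 	       page_to_nid(page) == numa_mem_id();
> > }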
> 
> Only checking page_to_nid(page) == numa_mem_id() may not work for the
> below case in mvneta.c:
> 
> static int mvneta_create_page_pool(struct mvneta_port *pp,
> 				   struct mvneta_rx_queue *rxq, int size)
> {
> 	struct bpf_prog *xdp_prog = READ_ONCE(pp->xdp_prog);
> 	struct page_pool_params pp_params = {
> 		.order = 0,
> 		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
> 		.pool_size = size,
> 		.nid = cpu_to_node(0),
> 		.dev = pp->dev->dev.parent,
> 		.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE,
> 		.offset = pp->rx_offset_correction,
> 		.max_len = MVNETA_MAX_RX_BUF_SIZE,
> 	};
> 
> Here pool->p.nid is not NUMA_NO_NODE, so the node of a page allocated
> for rx may not be numa_mem_id() when running in NAPI polling, because
> pool->p.nid is not the same as the node of the cpu running the NAPI
> polling.
> 
> Does the page pool support recycling for the above case?
> 

I don't think you want to allow cross-NUMA recycling.

> Or we "fix' the above case by setting pool->p.nid to
> NUMA_NO_NODE/dev_to_node(),
> or by calling pool_update_nid() in NAPI polling as mlx5 does?
> 

Yes, just update the nid when needed, and make sure the NAPI polling
runs on a consistent core; eventually allocation and recycling will
happen on the same core.
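
For reference, what mlx5 does in its RX NAPI poll is roughly the
following (paraphrased; see page_pool_nid_changed() in
include/net/page_pool.h):

	/* Cheap check in the NAPI path: only take the update path when
	 * the pool's configured node differs from the local memory node.
	 */
	if (rq->page_pool)
		page_pool_nid_changed(rq->page_pool, numa_mem_id());

page_pool_nid_changed() only calls page_pool_update_nid() when the nid
actually changed, so the common case is a single compare.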

> 
> > Thanks,
> > Saeed.
