Re: [PATCH v2] rcu/kvfree: Prevents cache growing when the backoff_page_cache_fill is set

From: Uladzislau Rezki <urezki@gmail.com>
To: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Cc: "paulmck@kernel.org" <paulmck@kernel.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	"frederic@kernel.org" <frederic@kernel.org>,
	"joel@joelfernandes.org" <joel@joelfernandes.org>,
	"qiang.zhang1211@gmail.com" <qiang.zhang1211@gmail.com>,
	"rcu@vger.kernel.org" <rcu@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] rcu/kvfree: Prevents cache growing when the backoff_page_cache_fill is set
Date: Sat, 8 Apr 2023 11:04:39 +0200	[thread overview]
Message-ID: <ZDEuJwnkpkr717Fr@pc636> (raw)
In-Reply-To: <PH0PR11MB5880D0D946C20C4DEE3AEF99DA979@PH0PR11MB5880.namprd11.prod.outlook.com>

On Sat, Apr 08, 2023 at 01:56:40AM +0000, Zhang, Qiang1 wrote:
> On Fri, Apr 07, 2023 at 01:26:39AM +0000, Zhang, Qiang1 wrote:
> > > >>On Thu, Apr 06, 2023 at 06:37:53AM +0200, Uladzislau Rezki wrote:
> > > > On Thu, Apr 06, 2023 at 08:12:38AM +0800, Zqiang wrote:
> > > > > Currently, in kfree_rcu_shrink_scan(), the drain_page_cache() is
> > > > > executed before kfree_rcu_monitor() to drain page cache, if the bnode
> > > > > structure's->gp_snap has done, the kvfree_rcu_bulk() will fill the
> > > > > page cache again in kfree_rcu_monitor(), this commit add a check
> > > > > for krcp structure's->backoff_page_cache_fill in put_cached_bnode(),
> > > > > if the krcp structure's->backoff_page_cache_fill is set, prevent page
> > > > > cache growing.
> > > > > 
> > > > > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > > > > ---
> > > > >  kernel/rcu/tree.c | 2 ++
> > > > >  1 file changed, 2 insertions(+)
> > > > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 9cc0a7766fd2..f25430ae1936 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -2907,6 +2907,8 @@ static inline bool
> > > > >  put_cached_bnode(struct kfree_rcu_cpu *krcp,
> > > > >  	struct kvfree_rcu_bulk_data *bnode)
> > > > >  {
> > > > > +	if (atomic_read(&krcp->backoff_page_cache_fill))
> > > > > +		return false;
> > > > >  	// Check the limit.
> > > > >  	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
> > > > >  		return false;
> > > > > -- 
> > > > > 2.32.0
> > > > >
> > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > >
> > > >Thank you both!
> > > >
> > > >One question, though.  Might it be better to instead modify the "for"
> > > >loop in fill_page_cache_func() to start at krcp->nr_bkv_objs instead
> > > >of starting at zero?  That way, we still provide a single page under
> > > >low-memory conditions, but provide rcu_min_cached_objs of them if memory
> > > >is plentiful.
> > > >
> > > >Alternatively, if we really don't want to allow any pages at all under
> > > >low-memory conditions, shouldn't the fill_page_cache_func() set nr_pages
> > > >to zero (instead of the current 1) when the krcp->backoff_page_cache_fill
> > > >flag is set?  
> > > 
> > > Hi, Paul
> > > 
> > > If the krcp->backoff_page_cache_fill is true, the put_cached_bnode () return false,
> > > the allocated single page will also be freed in fill_page_cache_func().
> > > 
> > > or it would be better not to allocate under memory pressure.
> > >
> > >That was my thought.  ;-)
> > >
> > > How about like this?
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 9cc0a7766fd2..94aedbc3da36 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2907,6 +2907,8 @@ static inline bool
> > >  put_cached_bnode(struct kfree_rcu_cpu *krcp,
> > >         struct kvfree_rcu_bulk_data *bnode)
> > >  {
> > > +       if (atomic_read(&krcp->backoff_page_cache_fill))
> > > +               return false;
> > >         // Check the limit.
> > >         if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
> > >                 return false;
> > > @@ -3220,7 +3222,7 @@ static void fill_page_cache_func(struct work_struct *work)
> > >         int i;
> > > 
> > >         nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ?
> > > -               1 : rcu_min_cached_objs;
> > > +               0 : rcu_min_cached_objs;
> > > 
> > >         for (i = 0; i < nr_pages; i++) {
> > >
> > >The other question is why this loop does not allow for any pages
> > >that might already be allocated, thus perhaps looking like this:
> > >
> > >		for (i = krcp->nr_bkv_objs; i < nr_pages; i++) {
> > >
> > >Or do we somehow know that krcp->nr_bkv_objs is equal to zero?  (I am not
> > >seeing this, but I do feel the need to ask.)
> > 
> > 
> > The fill_page_cache_func() is triggered when we invoke get_cached_bnode() return NULL,
> > this also means that krcp->nr_bkv_objs is equal to zero. 
> > But if can_alloc is set,  we will unlock krcp0->lock and allocated single page,  after that
> > we will reacquire krcp1 and lock,  but the krcp1 at this time may be different from the
> > previous krcp0,  if !bnode is true, also trigger work to invoke fill_page_cache_func(),  but
> > maybe the krcp1-> nr_bkv_objs is not equal to zero.
> >
> >OK.  Given all of these good points, what would be a good patch for
> >this issue?  ;-)
> 
> Is it possible to keep the filling of the page always on the correct krcp?
> 
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3303,7 +3303,7 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
>                         // scenarios.
>                         bnode = (struct kvfree_rcu_bulk_data *)
>                                 __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
> -                       *krcp = krc_this_cpu_lock(flags);
> +                       raw_spin_lock_irqsave(&(*krcp)->lock, *flags);
>                 }
> 
I do not expect any contention if this is applied. From the other
hand it might be that it was done deliberately for some reason or
because we most likely anyway stay on the same CPU.

I did some test to see how many times a migration occurs during this
small window and according to my data it was negligible.

Anyway i do not have any objections as of now against this change
and it looks correct to me.

--
Uladzislau Rezki