linux-kernel.vger.kernel.org archive mirror
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>,
	LKML <linux-kernel@vger.kernel.org>, RCU <rcu@vger.kernel.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <neeraj.iitr10@gmail.com>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Subject: Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
Date: Tue, 14 Jun 2022 22:12:05 -0700
Message-ID: <20220615051205.GG1790663@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <YqgtuBXW5pW4ivD/@pc638.lan>

On Tue, Jun 14, 2022 at 08:42:00AM +0200, Uladzislau Rezki wrote:
> > Hello, Joel, Paul.
> > 
> > > Hi Vlad, Paul,
> > > 
> > > On Thu, Jun 09, 2022 at 03:10:57PM +0200, Uladzislau Rezki wrote:
> > > > On Tue, Jun 7, 2022 at 5:47 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > >
> > > > > On Sun, Jun 05, 2022 at 11:10:31AM +0200, Uladzislau Rezki wrote:
> > > > > > > On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > > > > Currently the monitor work is scheduled with a fixed interval of
> > > > > > > > HZ/20, i.e. every 50 milliseconds. The drawback of such an approach
> > > > > > > > is low utilization of page slots in some scenarios. A page can store
> > > > > > > > up to 512 records. For example, on an Android system it can look like:
> > > > > > > >
> > > > > > > > <snip>
> > > > > > > >   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> > > > > > > >   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > > > > >   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > > > > >   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> > > > > > > >   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> > > > > > > >   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> > > > > > > >   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > > > > >   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > > > > >   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> > > > > > > >   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> > > > > > > >   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> > > > > > > >   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> > > > > > > >   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> > > > > > > >   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> > > > > > > >   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > > > > > > > <snip>
> > > > > > > >
> > > > > > > > where each page carries only a few records to reclaim. In order to
> > > > > > > > improve batching and make page utilization more efficient, the patch
> > > > > > > > introduces a drain interval that can be set to either
> > > > > > > > KFREE_DRAIN_JIFFIES_MAX or KFREE_DRAIN_JIFFIES_MIN. It is adjusted
> > > > > > > > when a flood is detected: in that case memory reclaim occurs more
> > > > > > > > often, whereas in mostly idle cases the interval is set to its
> > > > > > > > maximum timeout, which improves the utilization of page slots.
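For scale, a rough calculation from the trace above: the 15 drained pages carry 40 records in total against 512 slots each, which is about 0.5% utilization. The C sketch below reproduces that arithmetic and the MIN/MAX switch the patch describes. HZ, the two interval values, RECORDS_PER_PAGE, and the helper names are assumptions for illustration, not the patch's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only; the patch defines its own. */
#define HZ 1000
#define RECORDS_PER_PAGE 512                /* capacity quoted in the patch */
#define KFREE_DRAIN_JIFFIES_MAX (HZ)        /* assumption: 1 s when mostly idle */
#define KFREE_DRAIN_JIFFIES_MIN (HZ / 20)   /* assumption: today's 50 ms interval */

/* nr_records values copied from the trace above. */
static const int traced_nr_records[] = {
	1, 2, 1, 2, 2, 1, 2, 1, 5, 5, 1, 6, 3, 4, 4
};

/* Average page-slot utilization of the traced batches, in percent. */
static double traced_page_utilization(void)
{
	size_t n = sizeof(traced_nr_records) / sizeof(traced_nr_records[0]);
	int total = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		total += traced_nr_records[i];

	/* 40 records spread over 15 pages of 512 slots each. */
	return 100.0 * total / ((double)n * RECORDS_PER_PAGE);
}

/* Hypothetical helper: drain often during a flood, rarely when idle. */
static unsigned long kfree_drain_interval(bool flood_detected)
{
	return flood_detected ? KFREE_DRAIN_JIFFIES_MIN : KFREE_DRAIN_JIFFIES_MAX;
}
```

Stretching the idle interval is what lets more records accumulate per page before a drain; the short interval only applies while a flood is being detected.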
> > > > > > > >
> > > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > >
> > > > > > > That does look like a problem well worth solving!
> > > > > > >
> > > > > > Agreed, better ideas make for a better final solution :)
> > > > > >
> > > > > > >
> > > > > > > But I am missing one thing. If we are having a callback flood, why do we
> > > > > > > need a shorter timeout?
> > > > > > >
> > > > > > To offload faster, because otherwise we run into the classical issue:
> > > > > > a low-memory condition resulting in OOM.
> > > > >
> > > > > But doesn't each callback queued during the flood give us an opportunity
> > > > > to react to the flood?  That will be way more fine-grained than any
> > > > > reasonable timer, right?  Or am I missing something?
> > > > >
> > > > We can set the timer to zero or to the current "jiffies" to initiate
> > > > the offloading if the page is full. In that sense it probably makes
> > > > sense to propagate those two attributes to user space, so the user can
> > > > configure the min/max drain interval.
> > > > 
> > > > Or we can deal only with a fixed interval exposed via sysfs so that the
> > > > user can control it. In that case we can get rid of the MIN one and just
> > > > trigger a timer if the page is full. I think this approach is better.
> > > 
> > > Yes, I also think triggering the timer with a zero timeout is better. Can
> > > you (Vlad) accomplish that by just calling the timer callback inline,
> > > instead of queuing a timer? I imagine you would just do queue_work()
> > > instead of queue_delayed_work() in this scenario.
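The variant converged on here (one longer fixed interval, with a full page treated as the flood signal that triggers an immediate drain) can be modelled in user space as below. The struct, RECORDS_PER_PAGE, and the one-second KFREE_DRAIN_JIFFIES value are hypothetical. In the kernel, the immediate case would map to queue_work() or to mod_delayed_work() with a zero delay, and the idle case to queue_delayed_work().

```c
#include <assert.h>
#include <stdbool.h>

#define HZ 1000                     /* assumed tick rate */
#define RECORDS_PER_PAGE 512        /* page capacity quoted in the patch */
#define KFREE_DRAIN_JIFFIES (HZ)    /* assumed single, longer interval */

/* Hypothetical per-CPU state; not the kernel's struct kfree_rcu_cpu. */
struct krc_sim {
	int page_records;     /* records stored in the current page */
	bool work_pending;    /* drain work already scheduled? */
	unsigned long delay;  /* delay of the scheduled drain, in jiffies */
};

/*
 * Queue one record. A full page is the flood signal: the drain is
 * pulled in to run now (delay 0). Otherwise the drain is scheduled
 * once with the long fixed interval. Returns true if the drain was
 * made immediate.
 */
static bool krc_sim_queue(struct krc_sim *k)
{
	assert(k->page_records < RECORDS_PER_PAGE);
	k->page_records++;

	if (k->page_records == RECORDS_PER_PAGE) {
		k->work_pending = true;
		k->delay = 0;
		return true;
	}
	if (!k->work_pending) {
		k->work_pending = true;
		k->delay = KFREE_DRAIN_JIFFIES;
	}
	return false;
}

/* The drain work ran: the page is handed off and a fresh one is used. */
static void krc_sim_drain(struct krc_sim *k)
{
	k->page_records = 0;
	k->work_pending = false;
	k->delay = 0;
}
```

With this shape there is no MIN interval to tune: light traffic waits the full interval and fills pages better, while a flood is drained as soon as each page fills.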
> > > 
> > > > > I do agree that the action would often need to be indirect to avoid the
> > > > > memory-allocation-state hassles, but we already can do that, either via
> > > > > an extremely short-term hrtimer or something like irq-work.
> > > > >
> > > > > > > Wouldn't a check on the number of blocks queued be simpler, more direct,
> > > > > > > and provide faster response to the start of a callback flood?
> > > > > > >
> > > > > > I rely on krcp->count because we cannot always store the pointer in the
> > > > > > page slots. We cannot allocate a page in the caller context, so we use a
> > > > > > page-cache worker that fills the cache in normal context. While it
> > > > > > populates the cache, pointers are temporarily queued to the linked list.
> > > > > >
> > > > > > Any thoughts?
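A hypothetical model of the point above: until the page-cache worker supplies a page, objects go onto a fallback list, so the page fill level alone would undercount a flood, whereas a single counter in the spirit of krcp->count sees both paths. The struct and field names are assumptions, not the kernel's struct kfree_rcu_cpu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RECORDS_PER_PAGE 512   /* page capacity quoted in the patch */

/* Fallback link; the kernel chains through the object's rcu_head instead. */
struct obj {
	struct obj *next;
};

/* Hypothetical model of the per-CPU state, not the kernel's layout. */
struct krc_model {
	void *slots[RECORDS_PER_PAGE]; /* page slots */
	int nr_slots;                  /* used slots in the page */
	bool have_page;                /* page-cache worker supplied a page */
	struct obj *list;              /* fallback list while no page exists */
	int count;                     /* all queued pointers: page + list */
};

static void krc_model_queue(struct krc_model *k, struct obj *p)
{
	assert(p != NULL);
	if (k->have_page && k->nr_slots < RECORDS_PER_PAGE) {
		k->slots[k->nr_slots++] = p;
	} else {
		/* No page can be allocated in this context: chain the object. */
		p->next = k->list;
		k->list = p;
	}
	k->count++; /* counts both paths, like krcp->count */
}
```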
> > > > >
> > > > > There are a great many ways to approach this.  One of them is to maintain
> > > > > a per-CPU free-running counter of kvfree_rcu() calls, and to reset this
> > > > > counter each jiffy.
> > > > >
> > > > > Or am I missing a trick here?
> > > > >
> > > > Do you mean to have a per-CPU timer that checks the per-CPU freed counter
> > > > and schedules the work if it is needed? Or have I missed your point?
> > > 
> > > I think he (Paul) is describing the way 'flood detection' can work, similar
> > > to how the bypass list code is implemented. There he maintains a count, and
> > > only if it exceeds a limit are callbacks queued onto the bypass list.
> > > 
> > OK, I see that. We also do a similar thing: we say it is a flood when a
> > page becomes full, so it is a kind of threshold that we pass.
> > 
> > > This code:
> > > 
> > >         // If we have advanced to a new jiffy, reset counts to allow
> > >         // moving back from ->nocb_bypass to ->cblist.
> > >         if (j == rdp->nocb_nobypass_last) {
> > >                 c = rdp->nocb_nobypass_count + 1;
> > >         } else {
> > >                 WRITE_ONCE(rdp->nocb_nobypass_last, j);
> > >                 c = rdp->nocb_nobypass_count - nocb_nobypass_lim_per_jiffy;
> > >                 if (ULONG_CMP_LT(rdp->nocb_nobypass_count,
> > >                                  nocb_nobypass_lim_per_jiffy))
> > >                         c = 0;
> > >                 else if (c > nocb_nobypass_lim_per_jiffy)
> > >                         c = nocb_nobypass_lim_per_jiffy;
> > >         }
> > >         WRITE_ONCE(rdp->nocb_nobypass_count, c);
> > > 
> > > 
> > > Your (Vlad's) approach OTOH is also fine with me: you check whether the page
> > > is full and use that as a 'flood is happening' detector.
> > > 
> > OK, thank you, Joel. I also think that this way we improve batching and
> > utilization of the page, which is actually the intention of the patch in question.
> > 
> Paul, will you pick up this patch?

I did pick up the first one:

16224f4cdf03 ("rcu/kvfree: Remove useless monitor_todo flag")

On the second one, if you use page-fill as your flood detector, can't
you simplify things by just using the one longer timeout, as discussed
in this thread?

Or did I miss a turn somewhere?

							Thanx, Paul

> Thanks!
> 
> --
> Uladzislau Rezki

Thread overview: 20+ messages
2022-06-02  8:06 [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag Uladzislau Rezki (Sony)
2022-06-02  8:06 ` [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval Uladzislau Rezki (Sony)
2022-06-02 23:32   ` Joel Fernandes
2022-06-03  9:55     ` Uladzislau Rezki
2022-06-04  3:03       ` Joel Fernandes
2022-06-04 15:51   ` Paul E. McKenney
2022-06-05  9:10     ` Uladzislau Rezki
2022-06-07  3:47       ` Paul E. McKenney
2022-06-09 13:10         ` Uladzislau Rezki
2022-06-10 16:45           ` Joel Fernandes
2022-06-13  9:47             ` Uladzislau Rezki
2022-06-14  6:42               ` Uladzislau Rezki
2022-06-15  5:12                 ` Paul E. McKenney [this message]
2022-06-15  7:32                   ` Uladzislau Rezki
2022-06-15 14:02                     ` Paul E. McKenney
2022-06-02 23:43 ` [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag Joel Fernandes
2022-06-03  9:51   ` Uladzislau Rezki
2022-06-04  3:07     ` Joel Fernandes
2022-06-04 16:03   ` Paul E. McKenney
2022-06-05  8:52     ` Uladzislau Rezki
