From: Uladzislau Rezki <urezki@gmail.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	"Uladzislau Rezki (Sony)" <urezki@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>, RCU <rcu@vger.kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	Daniel Axtens <dja@axtens.net>,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <neeraju@codeaurora.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Theodore Y . Ts'o" <tytso@mit.edu>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Subject: Re: [PATCH 1/3] kvfree_rcu: Allocate a page for a single argument
Date: Thu, 21 Jan 2021 14:35:10 +0100
Message-ID: <20210121133510.GB1872@pc638.lan>
In-Reply-To: <20210120215403.GH2743@paulmck-ThinkPad-P72>

On Wed, Jan 20, 2021 at 01:54:03PM -0800, Paul E. McKenney wrote:
> On Wed, Jan 20, 2021 at 08:57:57PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2021-01-20 17:21:46 [+0100], Uladzislau Rezki (Sony) wrote:
> > > For a single argument we can directly request a page from the caller's
> > > context when the "carry page block" runs out of free spots. Instead
> > > of hitting the slow path we can request an extra page on demand and
> > > proceed with the fast path.
> > > 
> > > A single-argument kvfree_rcu() must be invoked in sleepable contexts,
> > > and its fallback is the relatively high-latency synchronize_rcu().
> > > Single-argument kvfree_rcu() therefore uses GFP_KERNEL|__GFP_RETRY_MAYFAIL
> > > to allow limited sleeping within the memory allocator.
> > > 
> > > [ paulmck: Add add_ptr_to_bulk_krc_lock header comment per Michal Hocko. ]
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > ---
> > >  kernel/rcu/tree.c | 42 ++++++++++++++++++++++++++----------------
> > >  1 file changed, 26 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index e04e336bee42..2014fb22644d 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3465,37 +3465,50 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
> > >  	}
> > >  }
> > >  
> > > +// Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock()
> > > +// state specified by flags.  If can_alloc is true, the caller must
> > > +// be schedulable and not be holding any locks or mutexes that might be
> > > +// acquired by the memory allocator or anything that it might invoke.
> > > +// Returns true if ptr was successfully recorded, else the caller must
> > > +// use a fallback.
> > 
> > The whole RCU department is getting swamped by the // comments. Can't we
> > have proper kernel doc and /* */ style comments like the remaining part
> > of the kernel?
> 
> Because // comments are easier to type and take up less horizontal space.
> Also, this kvfree_call_rcu_add_ptr_to_bulk() function is local to
> kvfree_rcu(), and we don't normally docbook-ify such functions.
> 
> > >  static inline bool
> > > -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> > > +	unsigned long *flags, void *ptr, bool can_alloc)
> > >  {
> > >  	struct kvfree_rcu_bulk_data *bnode;
> > >  	int idx;
> > >  
> > > -	if (unlikely(!krcp->initialized))
> > > +	*krcp = krc_this_cpu_lock(flags);
> > > +	if (unlikely(!(*krcp)->initialized))
> > >  		return false;
> > >  
> > > -	lockdep_assert_held(&krcp->lock);
> > >  	idx = !!is_vmalloc_addr(ptr);
> > >  
> > >  	/* Check if a new block is required. */
> > > -	if (!krcp->bkvhead[idx] ||
> > > -			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
> > > -		bnode = get_cached_bnode(krcp);
> > > -		/* Switch to emergency path. */
> > > +	if (!(*krcp)->bkvhead[idx] ||
> > > +			(*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
> > > +		bnode = get_cached_bnode(*krcp);
> > > +		if (!bnode && can_alloc) {
> > > +			krc_this_cpu_unlock(*krcp, *flags);
> > > +			bnode = (struct kvfree_rcu_bulk_data *)
> > 
> > There is no need for this cast.
> 
> Without it, gcc version 7.5.0 says:
> 
> 	warning: assignment makes pointer from integer without a cast
> 
> > > +				__get_free_page(GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> > > +			*krcp = krc_this_cpu_lock(flags);
> > 
> > so if bnode is NULL you could retry get_cached_bnode() since it might
> > have been filled (given preemption or CPU migration changed something).
> > Judging from patch #3 you think that a CPU migration is a bad thing. But
> > why?
> 
> So that the later "(*krcp)->bkvhead[idx] = bnode" assignment associates
> it with the correct CPU.
> 
> Though now that you mention it, couldn't the following happen?
> 
> o	Task A on CPU 0 notices that allocation is needed, so it
> 	drops the lock, disables migration, and sleeps while
> 	allocating.
> 
> o	Task B on CPU 0 does the same.
> 
> o	The two tasks wake up in some order, and the second one
> 	causes trouble at the "(*krcp)->bkvhead[idx] = bnode"
> 	assignment.
> 
> Uladzislau, do we need to recheck "!(*krcp)->bkvhead[idx]" just after
> the migrate_enable()?  Along with the KVFREE_BULK_MAX_ENTR check?
> 
I probably should have mentioned the sequence you describe, where two
tasks each get a page on the same CPU; I had been thinking about it :)
Yes, it can happen: we drop the lock and the context is fully preemptible,
so another task can invoke kvfree_rcu() and end up in the same place,
entering the page allocator.

I spent some time trying to reproduce it, without any luck. That is why I
did not describe this case in the commit message and did not pay much
attention to the scenario.

>
> Uladzislau, do we need to recheck "!(*krcp)->bkvhead[idx]" just after
> the migrate_enable()?  Along with the KVFREE_BULK_MAX_ENTR check?
>
The two woken tasks will be serialized, i.e. the assignment is protected
by our local lock. We take krc_this_cpu_lock(flags) as the first step and
re-enable migration right after that. In that case a migration can occur
only once krc_this_cpu_unlock(*krcp, *flags) is invoked.

The scenario you described can happen; in that case a previous bnode
in the drain list can be either empty or only partly utilized. But, again,
I was not able to trigger such a scenario.
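
To make the interleaving concrete, here is a rough user-space sketch, not
kernel code. It replays the two-task sequence deterministically in a single
thread and counts how many blocks end up on the list, with and without a
recheck after the lock is retaken. All names (struct block, struct cpu_cache,
BULK_MAX, finish_add(), simulate_two_allocators()) are hypothetical stand-ins,
and the sketch simply frees a spare block where real code might keep it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BULK_MAX 4	/* hypothetical; stands in for KVFREE_BULK_MAX_ENTR */

struct block {
	struct block *next;
	int nr_records;
	void *records[BULK_MAX];
};

struct cpu_cache {
	struct block *head;	/* plays the role of bkvhead[idx] */
};

static bool need_new_block(const struct cpu_cache *c)
{
	return !c->head || c->head->nr_records == BULK_MAX;
}

/* What one task does once it has retaken the lock with a fresh block. */
static void finish_add(struct cpu_cache *c, struct block *blk, void *ptr,
		       bool recheck)
{
	if (!recheck || need_new_block(c)) {
		/* Push the new block in front of the current head. */
		blk->nr_records = 0;
		blk->next = c->head;
		c->head = blk;
	} else {
		/* Someone else already installed a block; drop the spare. */
		free(blk);
	}
	assert(c->head->nr_records < BULK_MAX);
	c->head->records[c->head->nr_records++] = ptr;
}

/* Returns the number of blocks on the list, or -1 if allocation failed. */
static int simulate_two_allocators(bool recheck)
{
	struct cpu_cache cache = { .head = NULL };
	int obj_a, obj_b, nr_blocks = 0;
	struct block *blk_a, *blk_b, *b;

	/* Both tasks saw that a block is needed, then dropped the lock. */
	assert(need_new_block(&cache));
	blk_a = calloc(1, sizeof(*blk_a));
	blk_b = calloc(1, sizeof(*blk_b));
	if (!blk_a || !blk_b) {
		free(blk_a);
		free(blk_b);
		return -1;
	}

	/* The lock serializes the two completions: A first, then B. */
	finish_add(&cache, blk_a, &obj_a, recheck);
	finish_add(&cache, blk_b, &obj_b, recheck);

	/* Count and release the blocks. */
	while ((b = cache.head)) {
		cache.head = b->next;
		nr_blocks++;
		free(b);
	}
	return nr_blocks;
}
```

Without the recheck, task B pushes its block in front of task A's block,
which is left holding a single record. With the recheck, both pointers
land in one block.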

If we should fix it, I think we can go with the "alloc_in_progress"
protection below:

<snip>
urezki@pc638:~/data/raid0/coding/linux-rcu.git$ git diff
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cad36074366d..95485ec7267e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3488,12 +3488,19 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
        if (!(*krcp)->bkvhead[idx] ||
                        (*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
                bnode = get_cached_bnode(*krcp);
-               if (!bnode && can_alloc) {
+               if (!bnode && can_alloc && !(*krcp)->alloc_in_progress)  {
                        migrate_disable();
+
+                       /* Set it before dropping the lock. */
+                       (*krcp)->alloc_in_progress = true;
                        krc_this_cpu_unlock(*krcp, *flags);
+
                        bnode = (struct kvfree_rcu_bulk_data *)
                                __get_free_page(GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOMEMALLOC | __GFP_NOWARN);
                        *krcp = krc_this_cpu_lock(flags);
+
+                       /* Clear it, the lock was taken back. */
+                       (*krcp)->alloc_in_progress = false;
                        migrate_enable();
                }
 
urezki@pc638:~/data/raid0/coding/linux-rcu.git$
<snip>

In that case a second task will follow the fallback path, bypassing the
page request. I can send it as a separate patch if there are no objections.
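
The gating logic of the diff above can be sketched in isolation as follows.
This is a minimal user-space illustration under stated assumptions, not the
kernel implementation: struct cpu_cache, enum add_path, choose_path() and
allocation_done() are hypothetical names, and both functions assume the
caller holds the per-CPU lock and that the same per-CPU structure is seen
again after the lock is retaken (which migrate_disable() is meant to ensure
in the diff).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU state; only the flag matters. */
struct cpu_cache {
	bool alloc_in_progress;
};

enum add_path { PATH_ALLOCATE, PATH_FALLBACK };

/*
 * Called with the lock held when no cached block is available.
 * Only one task at a time is allowed to go to the page allocator.
 */
static enum add_path choose_path(struct cpu_cache *c, bool can_alloc)
{
	if (can_alloc && !c->alloc_in_progress) {
		/* Set the flag before the lock is dropped. */
		c->alloc_in_progress = true;
		return PATH_ALLOCATE;
	}
	return PATH_FALLBACK;
}

/* Called once the lock has been retaken after the allocation attempt. */
static void allocation_done(struct cpu_cache *c)
{
	assert(c->alloc_in_progress);
	c->alloc_in_progress = false;
}
```

A second caller that arrives while the flag is set gets PATH_FALLBACK, and
allocation is allowed again once allocation_done() has run.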

--
Vlad Rezki
