linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Michal Hocko <mhocko@suse.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>, RCU <rcu@vger.kernel.org>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	"Theodore Y . Ts'o" <tytso@mit.edu>,
	Joel Fernandes <joel@joelfernandes.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Subject: Re: [RFC-PATCH 1/2] mm: Add __GFP_NO_LOCKS flag
Date: Fri, 14 Aug 2020 07:14:25 -0700	[thread overview]
Message-ID: <20200814141425.GM4295@paulmck-ThinkPad-P72> (raw)
In-Reply-To: <20200814083037.GD3982@worktop.programming.kicks-ass.net>

On Fri, Aug 14, 2020 at 10:30:37AM +0200, Peter Zijlstra wrote:
> On Fri, Aug 14, 2020 at 01:59:04AM +0200, Thomas Gleixner wrote:
> > On Fri, Aug 14 2020 at 00:06, peterz wrote:
> > > I'm still not getting it, how do we end up trying to allocate memory
> > > from under raw spinlocks if you're not allowed to use kfree_rcu() under
> > > one ?
> > >
> > > Can someone please spell out the actual problem?
> > 
> > Your actual problem is the heat wave. Get an icepack already :)
> 
> Sure, but also much of the below wasn't stated anywhere in the thread I
> got Cc'ed on. All I got was half arsed solutions without an actual
> problem statement.
> 
> > To set things straight:
> > 
> >   1) kfree_rcu() which used to be just a conveniance wrapper around
> >      call_rcu() always worked in any context except NMI.
> > 
> >      Otherwise RT would have never worked or would have needed other
> >      horrible hacks to defer kfree in certain contexts including
> >      context switch.
> 
> I've never used kfree_rcu(), htf would I know.

Doing this to kfree_rcu() is the first step.  We will also be doing this
to call_rcu(), which has some long-standing invocations from various
raw contexts, including hardirq handler.

> >   2) RCU grew an optimization for kfree_rcu() which avoids the linked
> >      list cache misses when a large number of objects is freed via
> >      kfree_rcu(). Paul says it speeds up post grace period processing by
> >      a factor of 8 which is impressive.
> > 
> >      That's done by avoiding the linked list and storing the object
> >      pointer in an array of pointers which is page sized. This array is
> >      then freed in bulk without having to touch the rcu head linked list
> >      which obviously avoids cache misses.
> > 
> >      Occasionally kfree_rcu() needs to allocate a page for this but it
> >      can fallback to the linked list when the allocation fails.
> >      
> >      Inconveniantly this allocation happens with an RCU internal raw
> >      lock held, but even if we could do the allocation outside the RCU
> >      internal lock this would not solve the problem that kfree_rcu() can
> >      be called in any context except NMI even on RT.
> > 
> >      So RT forbids allocations from truly atomic contexts even with
> >      GFP_ATOMIC/NOWAIT. GFP_ATOMIC is kinda meaningless on RT because
> >      everything which calls it is in non-atomic context :) But still
> >      GFP_ATOMIC or GFP_NOWAIT retain the semantics of !RT and do not go
> >      down into deeper levels or wait for pages to become available.
> 
> Right so on RT just cut out the allocation and unconditionally make it
> NULL. Since it needs to deal with GFP_ATOMIC|GFP_NOWARN failing the
> allocation anyway.

Except that this is not just RT due to CONFIG_PROVE_RAW_LOCK_NESTING=y.

> >   3) #2 upsets lockdep (with the raw lock nesting enabled) rightfully
> >      when RCU tries to allocate a page, the lockless page cache is empty
> >      and it acquires zone->lock which is a regular spinlock
> 
> There was no lockdep splat in the thread.

I don't see the point of including such a splat given that we know that
lockdep is supposed to splat in this situation.

> >   4) As kfree_rcu() can be used from any context except NMI and RT
> >      relies on it we ran into a circular dependency problem.
> 
> Well, which actual usage sites are under a raw spinlock? Most of the
> ones I could find are not.

There are some on their way in, but this same optimization will be applied
to call_rcu(), and there are no shortage of such call sites in that case.
And these call sites have been around for a very long time.

> >      If we disable that feature for RT then the problem would be solved
> >      except that lockdep still would complain about taking zone->lock
> >      within a forbidden context on !RT kernels.
> > 
> >      Preventing that feature because of that is not a feasible option
> >      either. Aside of that we discuss this postfactum, IOW the stuff is
> >      upstream already despite the problem being known before.
> 
> Well, that's a fail :-( I tought we were supposed to solve known issues
> before shit got merged.

The enforcement is coming in and the kfree_rcu() speed up is coming in
at the same time.  The call_rcu() speedup will appear later.

> >   5) Ulad posted patches which introduce a new GFP allocation mode
> >      which makes the allocation fail if the lockless cache is empty,
> >      i.e. it does not try to touch zone->lock in that case.
> > 
> >      That works perfectly fine on RT and !RT, makes lockdep happy and
> >      Paul is happy as well.
> > 
> >      If the lockless cache, which works perfectly fine on RT, is empty
> >      then the performance of kfree_rcu() post grace period processing is
> >      probably not making the situation of the system worse.
> > 
> >      Michal is not fond of the new GFP flag and wants to explore options
> >      to avoid that completely. But so far there is no real feasible
> >      solution.
> 
> Not only Michal, those patches looked pretty horrid.

They are the first round, and will be improved.

> >      A) It was suggested to make zone->lock raw, but that's a total
> >         disaster latency wise. With just a kernel compile (!RT kernel)
> >         spinwait times are in the hundreds of microseconds.
> 
> Yeah, I know, been there done that, like over a decade ago :-)
> 
> >      B) Michal suggested to have GFP_RT_ATOMIC defined to 0 which would
> >         not require a new GFP bit and bail out when RT is enabled and
> >         the context is atomic.
> 
> I would've written the code like:
> 
> 	void *mem = NULL;
> 
> 	...
> #ifndef CONFIG_PREEMPT_RT
> 	mem = kmalloc(PAGE_SIZE, GFP_ATOMIC|GFP_NOWAIT);
> #endif
> 	if (!mem)
> 	  ....
> 
> Seems much simpler and doesn't pollute the GFP_ space.

But fails in the CONFIG_PROVE_RAW_LOCK_NESTING=y case when lockdep
is enabled.

> >      D) To solve the lockdep issue of #B Michal suggested to have magic
> >         lockdep annotations which allow !RT kernels to take zone->lock
> >         from the otherwise forbidden contexts because on !RT this lock
> >         nesting could be considered a false positive.
> > 
> >         But this would be horrors of some sorts because there might be
> >         locks further down which then need the same treatment or some
> >         general 'yay, I'm excempt from everything' annotation which is
> >         short of disabling lockdep completly.
> > 
> >         Even if that could be solved and made correct for both RT and
> >         !RT then this opens the can of worms that this kind of
> >         annotation would show up all over the place within no time for
> >         the completely wrong reasons.
> 
> So due to this heat crap I've not slept more than a few hours a night
> for the last week, I'm not entirely there, but we already have a bunch
> of lockdep magic for this raw nesting stuff.

Ouch!  Here is hoping for cooler weather soon!

> But yes, we need to avoid abuse, grep for lockdep_off() :-( This
> drivers/md/dm-cache-target.c thing is just plain broken. It sorta
> 'works' on mainline (and even there it can behave badly), but on RT it
> will come apart with a bang.
> 
> > Paul compiled this options list:
> > 
> > > 1.	Prohibit invoking allocators from raw atomic context, such
> > >	as when holding a raw spinlock.
> > 
> >   Clearly the simplest solution but not Pauls favourite and
> >   unfortunately he has a good reason.
> 
> Which isn't actually stated anywhere I suppose ?

Several times, but why not one more?  ;-)

In CONFIG_PREEMPT_NONE=y kernels, which are heavily used, and for which
the proposed kfree_rcu() and later call_rcu() optimizations are quite
important, there is no way to tell at runtime whether or you are in
atomic raw context.

> > > 2.	Adding a GFP_ flag.
> > 
> >   Michal does not like the restriction for !RT kernels and tries to
> >   avoid the introduction of a new allocation mode.
> 
> Like above, I tend to be with Michal on this, just wrap the actual
> allocation in CONFIG_PREEMPT_RT, the code needs to handle a NULL pointer
> there anyway.

That disables the optimization in the CONFIG_PREEMPT_NONE=y case,
where it really is needed.

> > > 3.	Reusing existing GFP_ flags/values/whatever to communicate
> > >	the raw-context information that was to be communicated by
> > >	the new GFP_ flag.
> > >
> > > 4.	Making lockdep forgive acquiring spinlocks while holding
> > >	raw spinlocks, but only in CONFIG_PREEMPT_NONE=y kernels.
> 
> Uhh, !CONFIG_PREEMPT_RT, the rest is 'fine'.

I would be OK with either.  In CONFIG_PREEMPT_NONE=n kernels, the
kfree_rcu() code could use preemptible() to determine whether it was safe
to invoke the allocator.  The code in kfree_rcu() might look like this:

	mem = NULL;
	if (IS_ENABLED(CONFIG_PREEMPT_NONE) || preemptible())
		mem = __get_free_page(...);

Is your point is that the usual mistakes would then be caught by the
usual testing on CONFIG_PREEMPT_NONE=n kernels?

> >  These are not seperate of each other as #3 requires #4. The most
> >  horrible solution IMO from a technical POV as it proliferates
> >  inconsistency for no good reaosn.
> > 
> >  Aside of that it'd be solving a problem which does not exist simply
> >  because kfree_rcu() does not depend on it and we really don't want to
> >  set precedence and encourage the (ab)use of this in any way.
> 
> My preferred solution is 1, if you want to use an allocator, you simply
> don't get to play under raw_spinlock_t. And from a quick grep, most
> kfree_rcu() users are not under raw_spinlock_t context.

There is at least one on its way in, but more to the point, we will
need to apply this same optimization to call_rcu(), which is used in
raw atomic context, including from hardirq handlers.

> This here sounds like someone who wants to have his cake and eat it too.

Just looking for a non-negative sized slice of cake, actually!

> I'll try and think about a lockdep annotation that isn't utter crap, but
> that's probably next week, I need this heat to end and get a few nights
> sleep, I'm an utter wreck atm.

Again, here is hoping for cooler weather!

							Thanx, Paul


  parent reply	other threads:[~2020-08-14 14:14 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-09 20:43 [RFC-PATCH 0/2] __GFP_NO_LOCKS Uladzislau Rezki (Sony)
2020-08-09 20:43 ` [RFC-PATCH 1/2] mm: Add __GFP_NO_LOCKS flag Uladzislau Rezki (Sony)
2020-08-10 12:31   ` Michal Hocko
2020-08-10 16:07     ` Uladzislau Rezki
2020-08-10 19:25       ` Michal Hocko
2020-08-11  8:19         ` Michal Hocko
2020-08-11  9:37           ` Uladzislau Rezki
2020-08-11  9:42             ` Uladzislau Rezki
2020-08-11 10:28               ` Michal Hocko
2020-08-11 10:45                 ` Uladzislau Rezki
2020-08-11 10:26             ` Michal Hocko
2020-08-11 11:33               ` Uladzislau Rezki
2020-08-11  9:18         ` Uladzislau Rezki
2020-08-11 10:21           ` Michal Hocko
2020-08-11 11:10             ` Uladzislau Rezki
2020-08-11 14:44         ` Thomas Gleixner
2020-08-11 15:22           ` Thomas Gleixner
2020-08-12 11:38             ` Thomas Gleixner
2020-08-12 12:01               ` Uladzislau Rezki
2020-08-13  7:18               ` Michal Hocko
2020-08-11 15:33           ` Paul E. McKenney
2020-08-11 15:43             ` Thomas Gleixner
2020-08-11 15:56               ` Sebastian Andrzej Siewior
2020-08-11 16:02               ` Paul E. McKenney
2020-08-11 16:19                 ` Paul E. McKenney
2020-08-11 19:39               ` Thomas Gleixner
2020-08-11 21:09                 ` Paul E. McKenney
2020-08-12  0:13                   ` Thomas Gleixner
2020-08-12  4:29                     ` Paul E. McKenney
2020-08-12  8:32                       ` Thomas Gleixner
2020-08-12 13:30                         ` Paul E. McKenney
2020-08-13  7:50                     ` Michal Hocko
2020-08-13  9:58                       ` Uladzislau Rezki
2020-08-13 11:15                         ` Michal Hocko
2020-08-13 13:27                           ` Thomas Gleixner
2020-08-13 13:45                             ` Michal Hocko
2020-08-13 14:32                             ` Matthew Wilcox
2020-08-13 16:14                               ` Thomas Gleixner
2020-08-13 16:22                                 ` Matthew Wilcox
2020-08-13 13:22                         ` Thomas Gleixner
2020-08-13 13:33                           ` Michal Hocko
2020-08-13 14:34                             ` Thomas Gleixner
2020-08-13 14:53                               ` Michal Hocko
2020-08-13 15:41                                 ` Paul E. McKenney
2020-08-13 15:54                                   ` Michal Hocko
2020-08-13 16:04                                     ` Paul E. McKenney
2020-08-13 16:13                                       ` Michal Hocko
2020-08-13 16:29                                         ` Paul E. McKenney
2020-08-13 17:12                                           ` Michal Hocko
2020-08-13 17:27                                             ` Paul E. McKenney
2020-08-13 18:31                                           ` peterz
2020-08-13 19:13                                             ` Michal Hocko
2020-08-13 16:20                                     ` Uladzislau Rezki
2020-08-13 16:36                                       ` Michal Hocko
2020-08-14 11:54                                         ` Uladzislau Rezki
2020-08-13 17:09                                 ` Thomas Gleixner
2020-08-13 17:22                                   ` Michal Hocko
2020-08-14  7:17                                   ` Michal Hocko
2020-08-14 12:15                                     ` Uladzislau Rezki
2020-08-14 12:48                                       ` Michal Hocko
2020-08-14 13:34                                         ` Paul E. McKenney
2020-08-14 14:06                                           ` Michal Hocko
2020-08-14 18:01                                             ` Paul E. McKenney
2020-08-14 23:14                                               ` Thomas Gleixner
2020-08-14 23:41                                                 ` Paul E. McKenney
2020-08-15  0:43                                                   ` Thomas Gleixner
2020-08-15  3:01                                                     ` Paul E. McKenney
2020-08-15  8:27                                                 ` Peter Zijlstra
2020-08-15 13:03                                                   ` Paul E. McKenney
2020-08-15  8:42                                                 ` Peter Zijlstra
2020-08-15 14:18                                                   ` Paul E. McKenney
2020-08-15 14:23                                                     ` Paul E. McKenney
2020-08-17  8:47                                                 ` Michal Hocko
2020-08-13 18:26                               ` peterz
2020-08-13 18:52                                 ` Paul E. McKenney
2020-08-13 22:06                                   ` peterz
2020-08-13 23:23                                     ` Paul E. McKenney
2020-08-13 23:59                                     ` Thomas Gleixner
2020-08-14  8:30                                       ` Peter Zijlstra
2020-08-14 10:23                                         ` peterz
2020-08-14 15:26                                           ` Paul E. McKenney
2020-08-14 14:14                                         ` Paul E. McKenney [this message]
2020-08-14 16:11                                           ` Paul E. McKenney
2020-08-14 17:49                                             ` Peter Zijlstra
2020-08-14 18:02                                               ` Paul E. McKenney
2020-08-14 19:33                                                 ` Thomas Gleixner
2020-08-14 20:41                                                   ` Paul E. McKenney
2020-08-14 21:52                                                     ` Peter Zijlstra
2020-08-14 23:27                                                       ` Paul E. McKenney
2020-08-14 23:40                                                       ` Thomas Gleixner
2020-08-16 22:56                                                       ` Uladzislau Rezki
2020-08-17  8:28                                                         ` Michal Hocko
2020-08-17 10:36                                                           ` Uladzislau Rezki
2020-08-17 22:28                                                           ` Paul E. McKenney
2020-08-18  7:43                                                             ` Michal Hocko
2020-08-18 13:53                                                               ` Paul E. McKenney
2020-08-18 14:43                                                                 ` Thomas Gleixner
2020-08-18 16:13                                                                   ` Paul E. McKenney
2020-08-18 16:55                                                                     ` Thomas Gleixner
2020-08-18 17:13                                                                       ` Paul E. McKenney
2020-08-18 23:26                                                                         ` Thomas Gleixner
2020-08-19 23:07                                                                           ` Paul E. McKenney
2020-08-18 15:02                                                                 ` Michal Hocko
2020-08-18 15:45                                                                   ` Uladzislau Rezki
2020-08-18 16:18                                                                   ` Paul E. McKenney
2020-08-14 16:19                                           ` peterz
2020-08-14 18:15                                             ` Paul E. McKenney
2020-08-13 13:29                         ` Uladzislau Rezki
2020-08-13 13:41                           ` Michal Hocko
2020-08-13 14:22                             ` Uladzislau Rezki
2020-08-09 20:43 ` [PATCH 2/2] rcu/tree: use " Uladzislau Rezki (Sony)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200814141425.GM4295@paulmck-ThinkPad-P72 \
    --to=paulmck@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=oleksiy.avramchenko@sonymobile.com \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tytso@mit.edu \
    --cc=urezki@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).