All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Michal Hocko <mhocko@suse.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>, RCU <rcu@vger.kernel.org>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	"Theodore Y . Ts'o" <tytso@mit.edu>,
	Joel Fernandes <joel@joelfernandes.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Subject: Re: [RFC-PATCH 1/2] mm: Add __GFP_NO_LOCKS flag
Date: Fri, 14 Aug 2020 07:14:25 -0700	[thread overview]
Message-ID: <20200814141425.GM4295@paulmck-ThinkPad-P72> (raw)
In-Reply-To: <20200814083037.GD3982@worktop.programming.kicks-ass.net>

On Fri, Aug 14, 2020 at 10:30:37AM +0200, Peter Zijlstra wrote:
> On Fri, Aug 14, 2020 at 01:59:04AM +0200, Thomas Gleixner wrote:
> > On Fri, Aug 14 2020 at 00:06, peterz wrote:
> > > I'm still not getting it, how do we end up trying to allocate memory
> > > from under raw spinlocks if you're not allowed to use kfree_rcu() under
> > > one ?
> > >
> > > Can someone please spell out the actual problem?
> > 
> > Your actual problem is the heat wave. Get an icepack already :)
> 
> Sure, but also much of the below wasn't stated anywhere in the thread I
> got Cc'ed on. All I got was half arsed solutions without an actual
> problem statement.
> 
> > To set things straight:
> > 
> >   1) kfree_rcu() which used to be just a conveniance wrapper around
> >      call_rcu() always worked in any context except NMI.
> > 
> >      Otherwise RT would have never worked or would have needed other
> >      horrible hacks to defer kfree in certain contexts including
> >      context switch.
> 
> I've never used kfree_rcu(), htf would I know.

Doing this to kfree_rcu() is the first step.  We will also be doing this
to call_rcu(), which has some long-standing invocations from various
raw contexts, including hardirq handler.

> >   2) RCU grew an optimization for kfree_rcu() which avoids the linked
> >      list cache misses when a large number of objects is freed via
> >      kfree_rcu(). Paul says it speeds up post grace period processing by
> >      a factor of 8 which is impressive.
> > 
> >      That's done by avoiding the linked list and storing the object
> >      pointer in an array of pointers which is page sized. This array is
> >      then freed in bulk without having to touch the rcu head linked list
> >      which obviously avoids cache misses.
> > 
> >      Occasionally kfree_rcu() needs to allocate a page for this but it
> >      can fallback to the linked list when the allocation fails.
> >      
> >      Inconveniantly this allocation happens with an RCU internal raw
> >      lock held, but even if we could do the allocation outside the RCU
> >      internal lock this would not solve the problem that kfree_rcu() can
> >      be called in any context except NMI even on RT.
> > 
> >      So RT forbids allocations from truly atomic contexts even with
> >      GFP_ATOMIC/NOWAIT. GFP_ATOMIC is kinda meaningless on RT because
> >      everything which calls it is in non-atomic context :) But still
> >      GFP_ATOMIC or GFP_NOWAIT retain the semantics of !RT and do not go
> >      down into deeper levels or wait for pages to become available.
> 
> Right so on RT just cut out the allocation and unconditionally make it
> NULL. Since it needs to deal with GFP_ATOMIC|GFP_NOWARN failing the
> allocation anyway.

Except that this is not just RT due to CONFIG_PROVE_RAW_LOCK_NESTING=y.

> >   3) #2 upsets lockdep (with the raw lock nesting enabled) rightfully
> >      when RCU tries to allocate a page, the lockless page cache is empty
> >      and it acquires zone->lock which is a regular spinlock
> 
> There was no lockdep splat in the thread.

I don't see the point of including such a splat given that we know that
lockdep is supposed to splat in this situation.

> >   4) As kfree_rcu() can be used from any context except NMI and RT
> >      relies on it we ran into a circular dependency problem.
> 
> Well, which actual usage sites are under a raw spinlock? Most of the
> ones I could find are not.

There are some on their way in, but this same optimization will be applied
to call_rcu(), and there are no shortage of such call sites in that case.
And these call sites have been around for a very long time.

> >      If we disable that feature for RT then the problem would be solved
> >      except that lockdep still would complain about taking zone->lock
> >      within a forbidden context on !RT kernels.
> > 
> >      Preventing that feature because of that is not a feasible option
> >      either. Aside of that we discuss this postfactum, IOW the stuff is
> >      upstream already despite the problem being known before.
> 
> Well, that's a fail :-( I tought we were supposed to solve known issues
> before shit got merged.

The enforcement is coming in and the kfree_rcu() speed up is coming in
at the same time.  The call_rcu() speedup will appear later.

> >   5) Ulad posted patches which introduce a new GFP allocation mode
> >      which makes the allocation fail if the lockless cache is empty,
> >      i.e. it does not try to touch zone->lock in that case.
> > 
> >      That works perfectly fine on RT and !RT, makes lockdep happy and
> >      Paul is happy as well.
> > 
> >      If the lockless cache, which works perfectly fine on RT, is empty
> >      then the performance of kfree_rcu() post grace period processing is
> >      probably not making the situation of the system worse.
> > 
> >      Michal is not fond of the new GFP flag and wants to explore options
> >      to avoid that completely. But so far there is no real feasible
> >      solution.
> 
> Not only Michal, those patches looked pretty horrid.

They are the first round, and will be improved.

> >      A) It was suggested to make zone->lock raw, but that's a total
> >         disaster latency wise. With just a kernel compile (!RT kernel)
> >         spinwait times are in the hundreds of microseconds.
> 
> Yeah, I know, been there done that, like over a decade ago :-)
> 
> >      B) Michal suggested to have GFP_RT_ATOMIC defined to 0 which would
> >         not require a new GFP bit and bail out when RT is enabled and
> >         the context is atomic.
> 
> I would've written the code like:
> 
> 	void *mem = NULL;
> 
> 	...
> #ifndef CONFIG_PREEMPT_RT
> 	mem = kmalloc(PAGE_SIZE, GFP_ATOMIC|GFP_NOWAIT);
> #endif
> 	if (!mem)
> 	  ....
> 
> Seems much simpler and doesn't pollute the GFP_ space.

But fails in the CONFIG_PROVE_RAW_LOCK_NESTING=y case when lockdep
is enabled.

> >      D) To solve the lockdep issue of #B Michal suggested to have magic
> >         lockdep annotations which allow !RT kernels to take zone->lock
> >         from the otherwise forbidden contexts because on !RT this lock
> >         nesting could be considered a false positive.
> > 
> >         But this would be horrors of some sorts because there might be
> >         locks further down which then need the same treatment or some
> >         general 'yay, I'm excempt from everything' annotation which is
> >         short of disabling lockdep completly.
> > 
> >         Even if that could be solved and made correct for both RT and
> >         !RT then this opens the can of worms that this kind of
> >         annotation would show up all over the place within no time for
> >         the completely wrong reasons.
> 
> So due to this heat crap I've not slept more than a few hours a night
> for the last week, I'm not entirely there, but we already have a bunch
> of lockdep magic for this raw nesting stuff.

Ouch!  Here is hoping for cooler weather soon!

> But yes, we need to avoid abuse, grep for lockdep_off() :-( This
> drivers/md/dm-cache-target.c thing is just plain broken. It sorta
> 'works' on mainline (and even there it can behave badly), but on RT it
> will come apart with a bang.
> 
> > Paul compiled this options list:
> > 
> > > 1.	Prohibit invoking allocators from raw atomic context, such
> > >	as when holding a raw spinlock.
> > 
> >   Clearly the simplest solution but not Pauls favourite and
> >   unfortunately he has a good reason.
> 
> Which isn't actually stated anywhere I suppose ?

Several times, but why not one more?  ;-)

In CONFIG_PREEMPT_NONE=y kernels, which are heavily used, and for which
the proposed kfree_rcu() and later call_rcu() optimizations are quite
important, there is no way to tell at runtime whether or you are in
atomic raw context.

> > > 2.	Adding a GFP_ flag.
> > 
> >   Michal does not like the restriction for !RT kernels and tries to
> >   avoid the introduction of a new allocation mode.
> 
> Like above, I tend to be with Michal on this, just wrap the actual
> allocation in CONFIG_PREEMPT_RT, the code needs to handle a NULL pointer
> there anyway.

That disables the optimization in the CONFIG_PREEMPT_NONE=y case,
where it really is needed.

> > > 3.	Reusing existing GFP_ flags/values/whatever to communicate
> > >	the raw-context information that was to be communicated by
> > >	the new GFP_ flag.
> > >
> > > 4.	Making lockdep forgive acquiring spinlocks while holding
> > >	raw spinlocks, but only in CONFIG_PREEMPT_NONE=y kernels.
> 
> Uhh, !CONFIG_PREEMPT_RT, the rest is 'fine'.

I would be OK with either.  In CONFIG_PREEMPT_NONE=n kernels, the
kfree_rcu() code could use preemptible() to determine whether it was safe
to invoke the allocator.  The code in kfree_rcu() might look like this:

	mem = NULL;
	if (IS_ENABLED(CONFIG_PREEMPT_NONE) || preemptible())
		mem = __get_free_page(...);

Is your point is that the usual mistakes would then be caught by the
usual testing on CONFIG_PREEMPT_NONE=n kernels?

> >  These are not seperate of each other as #3 requires #4. The most
> >  horrible solution IMO from a technical POV as it proliferates
> >  inconsistency for no good reaosn.
> > 
> >  Aside of that it'd be solving a problem which does not exist simply
> >  because kfree_rcu() does not depend on it and we really don't want to
> >  set precedence and encourage the (ab)use of this in any way.
> 
> My preferred solution is 1, if you want to use an allocator, you simply
> don't get to play under raw_spinlock_t. And from a quick grep, most
> kfree_rcu() users are not under raw_spinlock_t context.

There is at least one on its way in, but more to the point, we will
need to apply this same optimization to call_rcu(), which is used in
raw atomic context, including from hardirq handlers.

> This here sounds like someone who wants to have his cake and eat it too.

Just looking for a non-negative sized slice of cake, actually!

> I'll try and think about a lockdep annotation that isn't utter crap, but
> that's probably next week, I need this heat to end and get a few nights
> sleep, I'm an utter wreck atm.

Again, here is hoping for cooler weather!

							Thanx, Paul

  parent reply	other threads:[~2020-08-14 14:14 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-09 20:43 [RFC-PATCH 0/2] __GFP_NO_LOCKS Uladzislau Rezki (Sony)
2020-08-09 20:43 ` [RFC-PATCH 1/2] mm: Add __GFP_NO_LOCKS flag Uladzislau Rezki (Sony)
2020-08-10 12:31   ` Michal Hocko
2020-08-10 16:07     ` Uladzislau Rezki
2020-08-10 19:25       ` Michal Hocko
2020-08-11  8:19         ` Michal Hocko
2020-08-11  9:37           ` Uladzislau Rezki
2020-08-11  9:42             ` Uladzislau Rezki
2020-08-11 10:28               ` Michal Hocko
2020-08-11 10:45                 ` Uladzislau Rezki
2020-08-11 10:26             ` Michal Hocko
2020-08-11 11:33               ` Uladzislau Rezki
2020-08-11  9:18         ` Uladzislau Rezki
2020-08-11 10:21           ` Michal Hocko
2020-08-11 11:10             ` Uladzislau Rezki
2020-08-11 14:44         ` Thomas Gleixner
2020-08-11 15:22           ` Thomas Gleixner
2020-08-12 11:38             ` Thomas Gleixner
2020-08-12 12:01               ` Uladzislau Rezki
2020-08-13  7:18               ` Michal Hocko
2020-08-11 15:33           ` Paul E. McKenney
2020-08-11 15:43             ` Thomas Gleixner
2020-08-11 15:56               ` Sebastian Andrzej Siewior
2020-08-11 16:02               ` Paul E. McKenney
2020-08-11 16:19                 ` Paul E. McKenney
2020-08-11 19:39               ` Thomas Gleixner
2020-08-11 21:09                 ` Paul E. McKenney
2020-08-12  0:13                   ` Thomas Gleixner
2020-08-12  4:29                     ` Paul E. McKenney
2020-08-12  8:32                       ` Thomas Gleixner
2020-08-12 13:30                         ` Paul E. McKenney
2020-08-13  7:50                     ` Michal Hocko
2020-08-13  9:58                       ` Uladzislau Rezki
2020-08-13 11:15                         ` Michal Hocko
2020-08-13 13:27                           ` Thomas Gleixner
2020-08-13 13:45                             ` Michal Hocko
2020-08-13 14:32                             ` Matthew Wilcox
2020-08-13 16:14                               ` Thomas Gleixner
2020-08-13 16:22                                 ` Matthew Wilcox
2020-08-13 13:22                         ` Thomas Gleixner
2020-08-13 13:33                           ` Michal Hocko
2020-08-13 14:34                             ` Thomas Gleixner
2020-08-13 14:53                               ` Michal Hocko
2020-08-13 15:41                                 ` Paul E. McKenney
2020-08-13 15:54                                   ` Michal Hocko
2020-08-13 16:04                                     ` Paul E. McKenney
2020-08-13 16:13                                       ` Michal Hocko
2020-08-13 16:29                                         ` Paul E. McKenney
2020-08-13 17:12                                           ` Michal Hocko
2020-08-13 17:27                                             ` Paul E. McKenney
2020-08-13 18:31                                           ` peterz
2020-08-13 19:13                                             ` Michal Hocko
2020-08-13 16:20                                     ` Uladzislau Rezki
2020-08-13 16:36                                       ` Michal Hocko
2020-08-14 11:54                                         ` Uladzislau Rezki
2020-08-13 17:09                                 ` Thomas Gleixner
2020-08-13 17:22                                   ` Michal Hocko
2020-08-14  7:17                                   ` Michal Hocko
2020-08-14 12:15                                     ` Uladzislau Rezki
2020-08-14 12:48                                       ` Michal Hocko
2020-08-14 13:34                                         ` Paul E. McKenney
2020-08-14 14:06                                           ` Michal Hocko
2020-08-14 18:01                                             ` Paul E. McKenney
2020-08-14 23:14                                               ` Thomas Gleixner
2020-08-14 23:41                                                 ` Paul E. McKenney
2020-08-15  0:43                                                   ` Thomas Gleixner
2020-08-15  3:01                                                     ` Paul E. McKenney
2020-08-15  8:27                                                 ` Peter Zijlstra
2020-08-15 13:03                                                   ` Paul E. McKenney
2020-08-15  8:42                                                 ` Peter Zijlstra
2020-08-15 14:18                                                   ` Paul E. McKenney
2020-08-15 14:23                                                     ` Paul E. McKenney
2020-08-17  8:47                                                 ` Michal Hocko
2020-08-13 18:26                               ` peterz
2020-08-13 18:52                                 ` Paul E. McKenney
2020-08-13 22:06                                   ` peterz
2020-08-13 23:23                                     ` Paul E. McKenney
2020-08-13 23:59                                     ` Thomas Gleixner
2020-08-14  8:30                                       ` Peter Zijlstra
2020-08-14 10:23                                         ` peterz
2020-08-14 15:26                                           ` Paul E. McKenney
2020-08-14 14:14                                         ` Paul E. McKenney [this message]
2020-08-14 16:11                                           ` Paul E. McKenney
2020-08-14 17:49                                             ` Peter Zijlstra
2020-08-14 18:02                                               ` Paul E. McKenney
2020-08-14 19:33                                                 ` Thomas Gleixner
2020-08-14 20:41                                                   ` Paul E. McKenney
2020-08-14 21:52                                                     ` Peter Zijlstra
2020-08-14 23:27                                                       ` Paul E. McKenney
2020-08-14 23:40                                                       ` Thomas Gleixner
2020-08-16 22:56                                                       ` Uladzislau Rezki
2020-08-17  8:28                                                         ` Michal Hocko
2020-08-17 10:36                                                           ` Uladzislau Rezki
2020-08-17 22:28                                                           ` Paul E. McKenney
2020-08-18  7:43                                                             ` Michal Hocko
2020-08-18 13:53                                                               ` Paul E. McKenney
2020-08-18 14:43                                                                 ` Thomas Gleixner
2020-08-18 16:13                                                                   ` Paul E. McKenney
2020-08-18 16:55                                                                     ` Thomas Gleixner
2020-08-18 17:13                                                                       ` Paul E. McKenney
2020-08-18 23:26                                                                         ` Thomas Gleixner
2020-08-19 23:07                                                                           ` Paul E. McKenney
2020-08-18 15:02                                                                 ` Michal Hocko
2020-08-18 15:45                                                                   ` Uladzislau Rezki
2020-08-18 16:18                                                                   ` Paul E. McKenney
2020-08-14 16:19                                           ` peterz
2020-08-14 18:15                                             ` Paul E. McKenney
2020-08-13 13:29                         ` Uladzislau Rezki
2020-08-13 13:41                           ` Michal Hocko
2020-08-13 14:22                             ` Uladzislau Rezki
2020-08-09 20:43 ` [PATCH 2/2] rcu/tree: use " Uladzislau Rezki (Sony)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200814141425.GM4295@paulmck-ThinkPad-P72 \
    --to=paulmck@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=oleksiy.avramchenko@sonymobile.com \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tytso@mit.edu \
    --cc=urezki@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.