From: "Paul E. McKenney" <paulmck@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Joel Fernandes <joel@joelfernandes.org>,
	Ankur Arora <ankur.a.arora@oracle.com>,
	linux-kernel@vger.kernel.org, peterz@infradead.org,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, mingo@redhat.com, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, willy@infradead.org, mgorman@suse.de,
	jpoimboe@kernel.org, mark.rutland@arm.com, jgross@suse.com,
	andrew.cooper3@citrix.com, bristot@kernel.org,
	mathieu.desnoyers@efficios.com, geert@linux-m68k.org,
	glaubitz@physik.fu-berlin.de, anton.ivanov@cambridgegreys.com,
	mattst88@gmail.com, krypton@ulrich-teichert.org,
	rostedt@goodmis.org, David.Laight@aculab.com, richard@nod.at,
	mjguzik@gmail.com, jon.grimm@amd.com, bharata@amd.com,
	raghavendra.kt@amd.com, boris.ostrovsky@oracle.com,
	konrad.wilk@oracle.com, rcu@vger.kernel.org
Subject: Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
Date: Mon, 11 Mar 2024 12:53:47 -0700
Message-ID: <e762474b-a3fa-46bd-9816-7663fbba7271@paulmck-laptop>
In-Reply-To: <87wmq8pop1.ffs@tglx>

On Mon, Mar 11, 2024 at 08:12:58PM +0100, Thomas Gleixner wrote:
> On Mon, Mar 11 2024 at 11:25, Joel Fernandes wrote:
> > On 3/11/2024 1:18 AM, Ankur Arora wrote:
> >>> Yes, I mentioned this 'disabling preemption' aspect in my last email. My point
> >>> being, unlike CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_AUTO allows for kernel
> >>> preemption in preempt=none. So the "Don't preempt the kernel" behavior has
> >>> changed. That is, preempt=none under CONFIG_PREEMPT_AUTO is different from
> >>> CONFIG_PREEMPT_NONE=y already. Here we *are* preempting. And RCU is getting on
> >> 
> >> I think that's a view from too close to the implementation. Someone
> >> using the kernel is not necessarily concerned with whether tasks are
> >> preempted or not. They are concerned with throughput and latency.
> >
> > No, we are not only talking about that (throughput/latency). We are also talking
> > about the issue related to RCU reader-preemption causing OOM (well and that
> > could hurt both throughput and latency as well).
> 
> That happens only when PREEMPT_RCU=y. For PREEMPT_RCU=n the read side
> critical sections still have preemption disabled.
> 
> > With CONFIG_PREEMPT_AUTO=y, you now preempt in the preempt=none mode. Something
> > very different from the classical CONFIG_PREEMPT_NONE=y.
> 
> In PREEMPT_RCU=y and preempt=none mode this happens only when really
> required, i.e. when the task does not schedule out or return to user
> space on time, or when a higher scheduling class task gets runnable. For
> the latter the jury is still out whether this should be done or just
> lazily deferred like the SCHED_OTHER preemption requests.
> 
> In any case for that to matter this forced preemption would need to
> preempt a RCU read side critical section and then keep the preempted
> task away from the CPU for a long time.
> 
> That's very different from the unconditional kernel preemption model which
> preempt=full provides and only marginally different from the existing
> PREEMPT_NONE model. I know there might be dragons, but I'm not convinced
> yet that this is an actual problem.
> 
> OTOH, doesn't PREEMPT_RCU=y have a mechanism to mitigate that already?

You are right, it does, CONFIG_RCU_BOOST=y.

> > Essentially this means preemption is now more aggressive from the point of view
> > of a preempt=none user. I was suggesting that, a point of view could be RCU
> > should always support preemptibility (don't give PREEMPT_RCU=n option) because
> > AUTO *does preempt* unlike classic CONFIG_PREEMPT_NONE. Otherwise it is
> > inconsistent -- say with CONFIG_PREEMPT=y (another *preemption mode*) which
> > forces CONFIG_PREEMPT_RCU. However to Paul's point, we need to worry about those
> > users who are concerned with running out of memory due to reader
> > preemption.
> 
> What's wrong with the combination of PREEMPT_AUTO=y and PREEMPT_RCU=n?
> Paul and I agreed long ago that this needs to be supported.
> 
> > In that vein, maybe we should also support CONFIG_PREEMPT_RCU=n for
> > CONFIG_PREEMPT=y as well. There are plenty of popular systems with relatively
> > low memory that need low latency (like some low-end devices / laptops
> > :-)).
> 
> I'm not sure whether that's useful as the goal is to get rid of all the
> CONFIG_PREEMPT_FOO options, no?
> 
> I'd rather spend brain cycles on figuring out whether RCU can be flipped
> over between PREEMPT_RCU=n/y at boot or obviously run-time.

Well, it is just software, so anything is possible.  But there can
be a wide gap between "possible" and "sensible".  ;-)

In theory, one boot-time approach would be to build preemptible RCU,
and then to boot-time binary-rewrite calls to __rcu_read_lock()
and __rcu_read_unlock() to preempt_disable() and preempt_enable(),
respectively.  Because preemptible RCU has to treat preemption-disabled
regions of code as RCU readers, this Should Just Work.  However, there
would then be a lot of needless branches in the grace-period code.
Only the ones on fastpaths (for example, context switch) would need
to be static-branchified, but there would likely need to be other
restructuring, given the need for current preemptible RCU to do a better
job of emulating non-preemptible RCU.  (Emulating non-preemptible RCU
is of course currently a complete non-goal for preemptible RCU.)
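
As a rough illustration of the boot-time idea above, here is a minimal
userspace sketch, not kernel code.  Every bt_* name is hypothetical, and
function pointers stand in for the binary-rewritten call sites.  The one
property it tries to show is that a grace-period check which already
treats preemption-disabled regions as readers keeps working after the
swap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of the bt_* names are kernel APIs. */
struct bt_task {
	int preempt_count;	/* models preempt_disable() nesting */
	int rcu_read_nesting;	/* models preemptible-RCU reader nesting */
};

static void bt_preempt_disable(struct bt_task *t)
{
	t->preempt_count++;
}

static void bt_preempt_enable(struct bt_task *t)
{
	assert(t->preempt_count > 0);
	t->preempt_count--;
}

static void bt_rcu_lock_preemptible(struct bt_task *t)
{
	t->rcu_read_nesting++;
}

static void bt_rcu_unlock_preemptible(struct bt_task *t)
{
	assert(t->rcu_read_nesting > 0);
	t->rcu_read_nesting--;
}

/* Stand-ins for the call sites that would be rewritten at boot. */
static void (*bt_rcu_read_lock)(struct bt_task *) = bt_rcu_lock_preemptible;
static void (*bt_rcu_read_unlock)(struct bt_task *) = bt_rcu_unlock_preemptible;

/* One-time boot decision, assumed to run before any reader exists. */
static void bt_boot_select(bool preemptible_rcu)
{
	if (!preemptible_rcu) {
		bt_rcu_read_lock = bt_preempt_disable;
		bt_rcu_read_unlock = bt_preempt_enable;
	}
}

/* The grace-period side counts either kind of region as a reader. */
static bool bt_in_reader(const struct bt_task *t)
{
	return t->rcu_read_nesting > 0 || t->preempt_count > 0;
}
```

After bt_boot_select(false), a bt_rcu_read_lock() call bumps only
preempt_count, and bt_in_reader() still reports a reader.  The sketch
says nothing about the needless grace-period branches or the
restructuring mentioned above.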

So maybe?

But this one needs careful design and review up front, as in step through
all the code and check assumptions and changes in behavior.  After all,
this stuff is way easier to break than to debug and fix.  ;-)


On the other hand, making RCU switch at runtime is...  Tricky.

For example, if the system was in non-preemptible mode at rcu_read_lock()
time, the corresponding rcu_read_unlock() needs to be aware that it needs
to act as if the system was still in non-preemptible mode, and vice versa.
Grace period processing during the switch needs to be aware that different
CPUs will be switching at different times.  Also, it will be common for a
given CPU's switch to span more than one grace period.  So any approach
based on either binary rewrite or static branches will need to be set
up in a multi-phase multi-grace-period state machine.  Sort of like
Frederic's runtime-switched callback offloading, but rather more complex,
and way more performance sensitive.
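
To make the first of these constraints concrete, here is a minimal
userspace sketch, again with purely hypothetical sw_* names.  It assumes
each reader records the mode it saw at lock time so that the matching
unlock can undo the right thing.  That per-reader record is an
assumption made for illustration, not a proposed design, and the sketch
ignores the per-CPU and multi-grace-period parts entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of the sw_* names are kernel APIs. */
enum sw_mode { SW_NONPREEMPTIBLE, SW_PREEMPTIBLE };

#define SW_MAX_NEST 8

struct sw_reader {
	enum sw_mode entry_mode[SW_MAX_NEST];	/* mode seen by each lock */
	int depth;				/* current nesting depth */
	int preempt_count;			/* non-preemptible readers */
	int rcu_read_nesting;			/* preemptible readers */
};

/* Global mode, which may flip while readers are in flight. */
static enum sw_mode sw_current_mode = SW_NONPREEMPTIBLE;

static void sw_rcu_read_lock(struct sw_reader *r)
{
	enum sw_mode m = sw_current_mode;

	assert(r->depth < SW_MAX_NEST);
	r->entry_mode[r->depth++] = m;	/* remember for the unlock */
	if (m == SW_PREEMPTIBLE)
		r->rcu_read_nesting++;
	else
		r->preempt_count++;
}

static void sw_rcu_read_unlock(struct sw_reader *r)
{
	assert(r->depth > 0);
	/* Undo what the matching lock did, whatever the mode is now. */
	if (r->entry_mode[--r->depth] == SW_PREEMPTIBLE)
		r->rcu_read_nesting--;
	else
		r->preempt_count--;
}

/* A grace period must wait for readers of either flavor. */
static bool sw_reader_blocks_gp(const struct sw_reader *r)
{
	return r->preempt_count > 0 || r->rcu_read_nesting > 0;
}
```

Even this toy version adds a store and a load to every reader, which
hints at why the real thing would be so performance sensitive.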

But do we even need to switch RCU at runtime, other than to say that
we did it?  What is the use case?  Or is this just a case of "it would
be cool!"?  Don't get me wrong, I am a sucker for "it would be cool",
as you well know, but even for me there are limits.  ;-)

At the moment, I would prioritize improving quiescent-state forcing for
existing RCU over this, especially perhaps given the concerns from the
MM folks.

But what is motivating the desire to boot-time/run-time switch RCU
between preemptible and non-preemptible?

							Thanx, Paul
