linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	peterz@infradead.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	willy@infradead.org, mgorman@suse.de, jpoimboe@kernel.org,
	mark.rutland@arm.com, jgross@suse.com, andrew.cooper3@citrix.com,
	bristot@kernel.org, mathieu.desnoyers@efficios.com,
	glaubitz@physik.fu-berlin.de, anton.ivanov@cambridgegreys.com,
	mattst88@gmail.com, krypton@ulrich-teichert.org,
	David.Laight@aculab.com, richard@nod.at, jon.grimm@amd.com,
	bharata@amd.com, boris.ostrovsky@oracle.com,
	konrad.wilk@oracle.com
Subject: Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling
Date: Wed, 21 Feb 2024 11:41:47 -0800	[thread overview]
Message-ID: <8f30ecd8-629b-414e-b6ea-b526b265b592@paulmck-laptop> (raw)
In-Reply-To: <20240221131901.69c80c47@gandalf.local.home>

On Wed, Feb 21, 2024 at 01:19:01PM -0500, Steven Rostedt wrote:
> On Mon, 19 Feb 2024 08:48:20 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > > I will look again -- it is quite possible that I was confused by earlier
> > > in-fleet setups that had Tasks RCU enabled even when preemption was
> > > disabled.  (We don't do that anymore, and, had I been paying sufficient
> > > attention, would not have been doing it to start with.  Back in the day,
> > > enabling rcutorture, even as a module, had the side effect of enabling
> > > Tasks RCU.  How else to test it, right?  Well...)  
> > 
> > OK, I got my head straight on this one...
> > 
> > And the problem is in fact that Tasks RCU isn't normally present
> > in non-preemptible kernels.  This is because normal RCU will wait
> > for preemption-disabled regions of code, and in PREMPT_NONE and
> > PREEMPT_VOLUNTARY kernels, that includes pretty much any region of code
> > lacking an explicit schedule() or similar.  And as I understand it,
> > tracing trampolines rely on this implicit lack of preemption.
> > 
> > So, with lazy preemption, we could preempt in the middle of a
> > trampoline, and synchronize_rcu() won't save us.
> > 
> > Steve and Mathieu will correct me if I am wrong.
> > 
> > If I do understand this correctly, one workaround is to remove the
> > "if PREEMPTIBLE" on all occurrences of "select TASKS_RCU".  That way,
> > all kernels would use synchronize_rcu_tasks(), which would wait for
> > a voluntary context switch.
> > 
> > This workaround does increase the overhead and tracepoint-removal
> > latency on non-preemptible kernels, so it might be time to revisit the
> > synchronization of trampolines.  Unfortunately, the things I have come
> > up with thus far have disadvantages:
> > 
> > o	Keep a set of permanent trampolines that enter and exit
> > 	some sort of explicit RCU read-side critical section.
> > 	If the address for this trampoline to call is in a register,
> > 	then these permanent trampolines remain constant so that
> > 	no synchronization of them is required.  The selected
> > 	flavor of RCU can then be used to deal with the non-permanent
> > 	trampolines.
> > 
> > 	The disadvantage here is a significant increase in the complexity
> > 	and overhead of trampoline code and the code that invokes the
> > 	trampolines.  This overhead limits where tracing may be used
> > 	in the kernel, which is of course undesirable.
> 
> I wonder if we can just see if the instruction pointer at preemption is at
> something that was allocated? That is, if it __is_kernel(addr) returns
> false, then we need to do more work. Of course that means modules will also
> trigger this. We could check __is_module_text() but that does a bit more
> work and may cause too much overhead. But who knows, if the module check is
> only done if the __is_kernel() check fails, maybe it's not that bad.

I do like very much that idea, but it requires that we be able to identify
this instruction pointer perfectly, no matter what.  It might also require
that we be able to perfectly identify any IRQ return addresses as well,
for example, if the preemption was triggered within an interrupt handler.
And interrupts from softirq environments might require identifying an
additional level of IRQ return address.  The original IRQ might have
interrupted a trampoline, and then after transitioning into softirq,
another IRQ might also interrupt a trampoline, and this last IRQ handler
might have instigated a preemption.

Are there additional levels or mechanisms requiring identifying
return addresses?

For whatever it is worth, and in case it should prove necessary,
I have added a sneak preview of the Kconfig workaround at the end of
this message.

							Thanx, Paul

> -- Steve
> 
> > 
> > o	Check for being preempted within a trampoline, and track this
> > 	within the tasks structure.  The disadvantage here is that this
> > 	requires keeping track of all of the trampolines and adding a
> > 	check for being in one on a scheduler fast path.
> > 
> > o	Have a variant of Tasks RCU which checks the stack of preempted
> > 	tasks, waiting until all have been seen without being preempted
> > 	in a trampoline.  This still requires keeping track of all the
> > 	trampolines in an easy-to-search manner, but gets the overhead
> > 	of searching off of the scheduler fastpaths.
> > 
> > 	It is also necessary to check running tasks, which might have
> > 	been interrupted from within a trampoline.
> > 
> > 	I would have a hard time convincing myself that these return
> > 	addresses were unconditionally reliable.  But maybe they are?
> > 
> > o	Your idea here!
> > 
> > Again, the short-term workaround is to remove the "if PREEMPTIBLE" from
> > all of the "select TASKS_RCU" clauses.
> > 
> > > > > My next step is to try this on bare metal on a system configured as
> > > > > is the fleet.  But good progress for a week!!!  
> > > > 
> > > > Yeah this is great. Fingers crossed for the wider set of tests.  
> > > 
> > > I got what might be a one-off when hitting rcutorture and KASAN harder.
> > > I am running 320*TRACE01 to see if it reproduces.  
> > 
> > [ . . . ]
> > 
> > > So, first see if it is reproducible, second enable more diagnostics,
> > > third make more grace-period sequence numbers available to rcutorture,
> > > fourth recheck the diagnostics code, and then see where we go from there.
> > > It might be that lazy preemption needs adjustment, or it might be that
> > > it just tickled latent diagnostic issues in rcutorture.
> > > 
> > > (I rarely hit this WARN_ON() except in early development, when the
> > > problem is usually glaringly obvious, hence all the uncertainty.)  
> > 
> > And it is eminently reproducible.  Digging into it...

diff --git a/arch/Kconfig b/arch/Kconfig
index c91917b508736..154f994547632 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -55,7 +55,7 @@ config KPROBES
 	depends on MODULES
 	depends on HAVE_KPROBES
 	select KALLSYMS
-	select TASKS_RCU if PREEMPTION
+	select NEED_TASKS_RCU
 	help
 	  Kprobes allows you to trap at almost any kernel address and
 	  execute a callback function.  register_kprobe() establishes
@@ -104,7 +104,7 @@ config STATIC_CALL_SELFTEST
 config OPTPROBES
 	def_bool y
 	depends on KPROBES && HAVE_OPTPROBES
-	select TASKS_RCU if PREEMPTION
+	select NEED_TASKS_RCU
 
 config KPROBES_ON_FTRACE
 	def_bool y
diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index 6a906ff930065..ce9fbc3b27ecf 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -27,7 +27,7 @@ config BPF_SYSCALL
 	bool "Enable bpf() system call"
 	select BPF
 	select IRQ_WORK
-	select TASKS_RCU if PREEMPTION
+	select NEED_TASKS_RCU
 	select TASKS_TRACE_RCU
 	select BINARY_PRINTF
 	select NET_SOCK_MSG if NET
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 7dca0138260c3..3e079de0f5b43 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -85,9 +85,13 @@ config FORCE_TASKS_RCU
 	  idle, and user-mode execution as quiescent states.  Not for
 	  manual selection in most cases.
 
-config TASKS_RCU
+config NEED_TASKS_RCU
 	bool
 	default n
+
+config TASKS_RCU
+	bool
+	default NEED_TASKS_RCU && (PREEMPTION || PREEMPT_AUTO)
 	select IRQ_WORK
 
 config FORCE_TASKS_RUDE_RCU
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 61c541c36596d..6cdc5ff919b09 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -163,7 +163,7 @@ config TRACING
 	select BINARY_PRINTF
 	select EVENT_TRACING
 	select TRACE_CLOCK
-	select TASKS_RCU if PREEMPTION
+	select NEED_TASKS_RCU
 
 config GENERIC_TRACER
 	bool
@@ -204,7 +204,7 @@ config FUNCTION_TRACER
 	select GENERIC_TRACER
 	select CONTEXT_SWITCH_TRACER
 	select GLOB
-	select TASKS_RCU if PREEMPTION
+	select NEED_TASKS_RCU
 	select TASKS_RUDE_RCU
 	help
 	  Enable the kernel to trace every kernel function. This is done

  reply	other threads:[~2024-02-21 19:41 UTC|newest]

Thread overview: 157+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-13  5:55 [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling Ankur Arora
2024-02-13  5:55 ` [PATCH 01/30] preempt: introduce CONFIG_PREEMPT_AUTO Ankur Arora
2024-02-13  5:55 ` [PATCH 02/30] thread_info: selector for TIF_NEED_RESCHED[_LAZY] Ankur Arora
2024-02-19 15:16   ` Thomas Gleixner
2024-02-20 22:50     ` Ankur Arora
2024-02-21 17:05       ` Thomas Gleixner
2024-02-21 18:26   ` Steven Rostedt
2024-02-21 20:03     ` Thomas Gleixner
2024-02-13  5:55 ` [PATCH 03/30] thread_info: tif_need_resched() now takes resched_t as param Ankur Arora
2024-02-14  3:17   ` kernel test robot
2024-02-14 14:08   ` Mark Rutland
2024-02-15  4:08     ` Ankur Arora
2024-02-19 12:30       ` Mark Rutland
2024-02-20 22:09         ` Ankur Arora
2024-02-19 15:21     ` Thomas Gleixner
2024-02-20 22:21       ` Ankur Arora
2024-02-21 17:07         ` Thomas Gleixner
2024-02-21 21:22           ` Ankur Arora
2024-02-13  5:55 ` [PATCH 04/30] sched: make test_*_tsk_thread_flag() return bool Ankur Arora
2024-02-14 14:12   ` Mark Rutland
2024-02-15  2:04     ` Ankur Arora
2024-02-13  5:55 ` [PATCH 05/30] sched: *_tsk_need_resched() now takes resched_t as param Ankur Arora
2024-02-19 15:26   ` Thomas Gleixner
2024-02-20 22:37     ` Ankur Arora
2024-02-21 17:10       ` Thomas Gleixner
2024-02-13  5:55 ` [PATCH 06/30] entry: handle lazy rescheduling at user-exit Ankur Arora
2024-02-19 15:29   ` Thomas Gleixner
2024-02-20 22:38     ` Ankur Arora
2024-02-13  5:55 ` [PATCH 07/30] entry/kvm: handle lazy rescheduling at guest-entry Ankur Arora
2024-02-13  5:55 ` [PATCH 08/30] entry: irqentry_exit only preempts for TIF_NEED_RESCHED Ankur Arora
2024-02-13  5:55 ` [PATCH 09/30] sched: __schedule_loop() doesn't need to check for need_resched_lazy() Ankur Arora
2024-02-13  5:55 ` [PATCH 10/30] sched: separate PREEMPT_DYNAMIC config logic Ankur Arora
2024-02-13  5:55 ` [PATCH 11/30] sched: runtime preemption config under PREEMPT_AUTO Ankur Arora
2024-02-13  5:55 ` [PATCH 12/30] rcu: limit PREEMPT_RCU to full preemption " Ankur Arora
2024-02-13  5:55 ` [PATCH 13/30] rcu: fix header guard for rcu_all_qs() Ankur Arora
2024-02-13  5:55 ` [PATCH 14/30] preempt,rcu: warn on PREEMPT_RCU=n, preempt=full Ankur Arora
2024-02-13  5:55 ` [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y Ankur Arora
2024-03-10 10:03   ` Joel Fernandes
2024-03-10 18:56     ` Paul E. McKenney
2024-03-11  0:48       ` Joel Fernandes
2024-03-11  3:56         ` Paul E. McKenney
2024-03-11 15:01           ` Joel Fernandes
2024-03-11 20:51             ` Ankur Arora
2024-03-11 22:12               ` Thomas Gleixner
2024-03-11  5:18         ` Ankur Arora
2024-03-11 15:25           ` Joel Fernandes
2024-03-11 19:12             ` Thomas Gleixner
2024-03-11 19:53               ` Paul E. McKenney
2024-03-11 20:29                 ` Thomas Gleixner
2024-03-12  0:01                   ` Paul E. McKenney
2024-03-12  0:08               ` Joel Fernandes
2024-03-12  3:16                 ` Ankur Arora
2024-03-12  3:24                   ` Joel Fernandes
2024-03-12  5:23                     ` Ankur Arora
2024-02-13  5:55 ` [PATCH 16/30] rcu: force context-switch " Ankur Arora
2024-02-13  5:55 ` [PATCH 17/30] x86/thread_info: define TIF_NEED_RESCHED_LAZY Ankur Arora
2024-02-14 13:25   ` Mark Rutland
2024-02-14 20:31     ` Ankur Arora
2024-02-19 12:32       ` Mark Rutland
2024-02-13  5:55 ` [PATCH 18/30] sched: prepare for lazy rescheduling in resched_curr() Ankur Arora
2024-02-13  5:55 ` [PATCH 19/30] sched: default preemption policy for PREEMPT_AUTO Ankur Arora
2024-02-13  5:55 ` [PATCH 20/30] sched: handle idle preemption " Ankur Arora
2024-02-13  5:55 ` [PATCH 21/30] sched: schedule eagerly in resched_cpu() Ankur Arora
2024-02-13  5:55 ` [PATCH 22/30] sched/fair: refactor update_curr(), entity_tick() Ankur Arora
2024-02-13  5:55 ` [PATCH 23/30] sched/fair: handle tick expiry under lazy preemption Ankur Arora
2024-02-21 21:38   ` Steven Rostedt
2024-02-28 13:47   ` Juri Lelli
2024-02-29  6:43     ` Ankur Arora
2024-02-29  9:33       ` Juri Lelli
2024-02-29 23:54         ` Ankur Arora
2024-03-01  0:28           ` Paul E. McKenney
2024-02-13  5:55 ` [PATCH 24/30] sched: support preempt=none under PREEMPT_AUTO Ankur Arora
2024-02-13  5:55 ` [PATCH 25/30] sched: support preempt=full " Ankur Arora
2024-02-13  5:55 ` [PATCH 26/30] sched: handle preempt=voluntary " Ankur Arora
2024-03-03  1:08   ` Joel Fernandes
2024-03-05  8:11     ` Ankur Arora
2024-03-06 20:42       ` Joel Fernandes
2024-03-07 19:01         ` Paul E. McKenney
2024-03-08  0:15           ` Joel Fernandes
2024-03-08  0:42             ` Paul E. McKenney
2024-03-08  4:22               ` Ankur Arora
2024-03-08 21:33                 ` Paul E. McKenney
2024-03-11  4:50                   ` Ankur Arora
2024-03-11 19:26                     ` Paul E. McKenney
2024-03-11 20:09                       ` Ankur Arora
2024-03-11 20:23                         ` Linus Torvalds
2024-03-11 21:03                           ` Ankur Arora
2024-03-12  0:03                           ` Paul E. McKenney
2024-03-12 12:14                             ` Thomas Gleixner
2024-03-12 19:40                               ` Paul E. McKenney
2024-03-08  3:49             ` Ankur Arora
2024-03-08  5:29               ` Joel Fernandes
2024-03-08  6:54               ` Juri Lelli
2024-03-11  5:34                 ` Ankur Arora
2024-02-13  5:55 ` [PATCH 27/30] sched: latency warn for TIF_NEED_RESCHED_LAZY Ankur Arora
2024-02-13  5:55 ` [PATCH 28/30] tracing: support lazy resched Ankur Arora
2024-02-13  5:55 ` [PATCH 29/30] Documentation: tracing: add TIF_NEED_RESCHED_LAZY Ankur Arora
2024-02-21 21:43   ` Steven Rostedt
2024-02-21 23:22     ` Ankur Arora
2024-02-21 23:53       ` Steven Rostedt
2024-03-01 23:33     ` Joel Fernandes
2024-03-02  3:09       ` Ankur Arora
2024-03-03 19:32         ` Joel Fernandes
2024-02-13  5:55 ` [PATCH 30/30] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y Ankur Arora
2024-02-13  9:47 ` [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling Geert Uytterhoeven
2024-02-13 21:46   ` Ankur Arora
2024-02-14 23:57 ` Paul E. McKenney
2024-02-15  2:03   ` Ankur Arora
2024-02-15  3:45     ` Paul E. McKenney
2024-02-15 19:28       ` Paul E. McKenney
2024-02-15 20:04         ` Thomas Gleixner
2024-02-15 20:54           ` Paul E. McKenney
2024-02-15 20:53         ` Ankur Arora
2024-02-15 20:55           ` Paul E. McKenney
2024-02-15 21:24         ` Ankur Arora
2024-02-15 22:54           ` Paul E. McKenney
2024-02-15 22:56             ` Paul E. McKenney
2024-02-16  0:45             ` Ankur Arora
2024-02-16  2:59               ` Paul E. McKenney
2024-02-17  0:55                 ` Paul E. McKenney
2024-02-17  3:59                   ` Ankur Arora
2024-02-18 18:17                     ` Paul E. McKenney
2024-02-19 16:48                       ` Paul E. McKenney
2024-02-21 18:19                         ` Steven Rostedt
2024-02-21 19:41                           ` Paul E. McKenney [this message]
2024-02-21 20:11                             ` Steven Rostedt
2024-02-21 20:22                               ` Paul E. McKenney
2024-02-22 15:50                                 ` Mark Rutland
2024-02-22 19:11                                   ` Paul E. McKenney
2024-02-23 11:05                                     ` Mark Rutland
2024-02-23 15:31                                       ` Paul E. McKenney
2024-03-02  1:16                                         ` Paul E. McKenney
2024-03-19 11:45                                           ` Tasks RCU, ftrace, and trampolines (was: Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling) Mark Rutland
2024-03-19 23:33                                             ` Paul E. McKenney
2024-02-21  6:48                   ` [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling Ankur Arora
2024-02-21 17:44                     ` Paul E. McKenney
2024-02-16  0:45             ` Ankur Arora
2024-02-21 12:23 ` Raghavendra K T
2024-02-21 17:15   ` Thomas Gleixner
2024-02-21 17:27     ` Raghavendra K T
2024-02-21 21:16       ` Ankur Arora
2024-02-22  4:05         ` Raghavendra K T
2024-02-22 21:23       ` Thomas Gleixner
2024-02-23  3:14         ` Ankur Arora
2024-02-23  6:28           ` Raghavendra K T
2024-02-24  3:15             ` Raghavendra K T
2024-02-27 17:45               ` Ankur Arora
2024-02-22 13:04     ` Raghavendra K T
2024-04-23 15:21 ` Shrikanth Hegde
2024-04-23 16:13   ` Linus Torvalds
2024-04-26  7:46     ` Shrikanth Hegde
2024-04-26 19:00       ` Ankur Arora
2024-05-07 11:16         ` Shrikanth Hegde
2024-05-08  5:18           ` Ankur Arora
2024-05-15 14:31             ` Shrikanth Hegde
     [not found] <draft-87a5o4go5i.ffs@tglx>
2024-02-19 15:54 ` Thomas Gleixner
2024-02-21  6:48   ` Ankur Arora

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8f30ecd8-629b-414e-b6ea-b526b265b592@paulmck-laptop \
    --to=paulmck@kernel.org \
    --cc=David.Laight@aculab.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrew.cooper3@citrix.com \
    --cc=ankur.a.arora@oracle.com \
    --cc=anton.ivanov@cambridgegreys.com \
    --cc=bharata@amd.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=bristot@kernel.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=glaubitz@physik.fu-berlin.de \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=jon.grimm@amd.com \
    --cc=jpoimboe@kernel.org \
    --cc=juri.lelli@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=krypton@ulrich-teichert.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mattst88@gmail.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=richard@nod.at \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=vincent.guittot@linaro.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).