All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: paulmck@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ankur Arora <ankur.a.arora@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	willy@infradead.org, mgorman@suse.de, rostedt@goodmis.org,
	jon.grimm@amd.com, bharata@amd.com, raghavendra.kt@amd.com,
	boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
	jgross@suse.com, andrew.cooper3@citrix.com,
	Frederic Weisbecker <fweisbec@gmail.com>
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
Date: Wed, 18 Oct 2023 15:16:12 +0200	[thread overview]
Message-ID: <87pm1c3wbn.ffs@tglx> (raw)
In-Reply-To: <a375674b-de27-4965-a4bf-e0679229e28e@paulmck-laptop>

Paul!

On Tue, Oct 17 2023 at 18:03, Paul E. McKenney wrote:
> Belatedly calling out some RCU issues.  Nothing fatal, just a
> (surprisingly) few adjustments that will need to be made.  The key thing
> to note is that from RCU's viewpoint, with this change, all kernels
> are preemptible, though rcu_read_lock() readers remain
> non-preemptible.

Why? Either I'm confused or you or both of us :)

With this approach the kernel is by definition fully preemptible, which
means means rcu_read_lock() is preemptible too. That's pretty much the
same situation as with PREEMPT_DYNAMIC.

For throughput sake this fully preemptible kernel provides a mechanism
to delay preemption for SCHED_OTHER tasks, i.e. instead of setting
NEED_RESCHED the scheduler sets NEED_RESCHED_LAZY.

That means the preemption points in preempt_enable() and return from
interrupt to kernel will not see NEED_RESCHED and the tasks can run to
completion either to the point where they call schedule() or when they
return to user space. That's pretty much what PREEMPT_NONE does today.

The difference to NONE/VOLUNTARY is that the explicit cond_resched()
points are not longer required because the scheduler can preempt the
long running task by setting NEED_RESCHED instead.

That preemption might be suboptimal in some cases compared to
cond_resched(), but from my initial experimentation that's not really an
issue.

> With that:
>
> 1.	As an optimization, given that preempt_count() would always give
> 	good information, the scheduling-clock interrupt could sense RCU
> 	readers for new-age CONFIG_PREEMPT_NONE=y kernels.  As might the
> 	IPI handlers for expedited grace periods.  A nice optimization.
> 	Except that...
>
> 2.	The quiescent-state-forcing code currently relies on the presence
> 	of cond_resched() in CONFIG_PREEMPT_RCU=n kernels.  One fix
> 	would be to do resched_cpu() more quickly, but some workloads
> 	might not love the additional IPIs.  Another approach to do #1
> 	above to replace the quiescent states from cond_resched() with
> 	scheduler-tick-interrupt-sensed quiescent states.

Right. The tick can see either the lazy resched bit "ignored" or some
magic "RCU needs a quiescent state" and force a reschedule.

> 	Plus...
>
> 3.	For nohz_full CPUs that run for a long time in the kernel,
> 	there are no scheduling-clock interrupts.  RCU reaches for
> 	the resched_cpu() hammer a few jiffies into the grace period.
> 	And it sets the ->rcu_urgent_qs flag so that the holdout CPU's
> 	interrupt-entry code will re-enable its scheduling-clock interrupt
> 	upon receiving the resched_cpu() IPI.

You can spare the IPI by setting NEED_RESCHED on the remote CPU which
will cause it to preempt.

> 	So nohz_full CPUs should be OK as far as RCU is concerned.
> 	Other subsystems might have other opinions.
>
> 4.	As another optimization, kvfree_rcu() could unconditionally
> 	check preempt_count() to sense a clean environment suitable for
> 	memory allocation.

Correct. All the limitations of preempt count being useless are gone.

> 5.	Kconfig files with "select TASKS_RCU if PREEMPTION" must
> 	instead say "select TASKS_RCU".  This means that the #else
> 	in include/linux/rcupdate.h that defines TASKS_RCU in terms of
> 	vanilla RCU must go.  There might be be some fallout if something
> 	fails to select TASKS_RCU, builds only with CONFIG_PREEMPT_NONE=y,
> 	and expects call_rcu_tasks(), synchronize_rcu_tasks(), or
> 	rcu_tasks_classic_qs() do do something useful.

In the end there is no CONFIG_PREEMPT_XXX anymore. The only knob
remaining would be CONFIG_PREEMPT_RT, which should be renamed to
CONFIG_RT or such as it does not really change the preemption
model itself. RT just reduces the preemption disabled sections with the
lock conversions, forced interrupt threading and some more.

> 6.	You might think that RCU Tasks (as opposed to RCU Tasks Trace
> 	or RCU Tasks Rude) would need those pesky cond_resched() calls
> 	to stick around.  The reason is that RCU Tasks readers are ended
> 	only by voluntary context switches.  This means that although a
> 	preemptible infinite loop in the kernel won't inconvenience a
> 	real-time task (nor an non-real-time task for all that long),
> 	and won't delay grace periods for the other flavors of RCU,
> 	it would indefinitely delay an RCU Tasks grace period.
>
> 	However, RCU Tasks grace periods seem to be finite in preemptible
> 	kernels today, so they should remain finite in limited-preemptible
> 	kernels tomorrow.  Famous last words...

That's an issue which you have today with preempt FULL, right? So if it
turns out to be a problem then it's not a problem of the new model.

> 7.	RCU Tasks Trace, RCU Tasks Rude, and SRCU shouldn't notice
> 	any algorithmic difference from this change.
>
> 8.	As has been noted elsewhere, in this new limited-preemption
> 	mode of operation, rcu_read_lock() readers remain preemptible.
> 	This means that most of the CONFIG_PREEMPT_RCU #ifdefs remain.

Why? You fundamentally have a preemptible kernel with PREEMPT_RCU, no?

> 9.	The rcu_preempt_depth() macro could do something useful in
> 	limited-preemption kernels.  Its current lack of ability in
> 	CONFIG_PREEMPT_NONE=y kernels has caused trouble in the past.

Correct.

> 10.	The cond_resched_rcu() function must remain because we still
> 	have non-preemptible rcu_read_lock() readers.

Where?

> 11.	My guess is that the IPVS_EST_TICK_CHAINS heuristic remains
> 	unchanged, but I must defer to the include/net/ip_vs.h people.

*blink*

> 12.	I need to check with the BPF folks on the BPF verifier's
> 	definition of BTF_ID(func, rcu_read_unlock_strict).
>
> 13.	The kernel/locking/rtmutex.c file's rtmutex_spin_on_owner()
> 	function might have some redundancy across the board instead
> 	of just on CONFIG_PREEMPT_RCU=y.  Or might not.
>
> 14.	The kernel/trace/trace_osnoise.c file's run_osnoise() function
> 	might need to do something for non-preemptible RCU to make
> 	up for the lack of cond_resched() calls.  Maybe just drop the
> 	"IS_ENABLED()" and execute the body of the current "if" statement
> 	unconditionally.

Again. There is no non-preemtible RCU with this model, unless I'm
missing something important here.

> 15.	I must defer to others on the mm/pgtable-generic.c file's
> 	#ifdef that depends on CONFIG_PREEMPT_RCU.

All those ifdefs should die :)

> While in the area, I noted that KLP seems to depend on cond_resched(),
> but on this I must defer to the KLP people.

Yeah, KLP needs some thoughts, but that's not rocket science to fix IMO.

> I am sure that I am missing something, but I have not yet seen any
> show-stoppers.  Just some needed adjustments.

Right. If it works out as I think it can work out the main adjustments
are to remove a large amount of #ifdef maze and related gunk :)

Thanks,

        tglx

  parent reply	other threads:[~2023-10-18 13:16 UTC|newest]

Thread overview: 214+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-30 18:49 [PATCH v2 0/9] x86/clear_huge_page: multi-page clearing Ankur Arora
2023-08-30 18:49 ` [PATCH v2 1/9] mm/clear_huge_page: allow arch override for clear_huge_page() Ankur Arora
2023-08-30 18:49 ` [PATCH v2 2/9] mm/huge_page: separate clear_huge_page() and copy_huge_page() Ankur Arora
2023-08-30 18:49 ` [PATCH v2 3/9] mm/huge_page: cleanup clear_/copy_subpage() Ankur Arora
2023-09-08 13:09   ` Matthew Wilcox
2023-09-11 17:22     ` Ankur Arora
2023-08-30 18:49 ` [PATCH v2 4/9] x86/clear_page: extend clear_page*() for multi-page clearing Ankur Arora
2023-09-08 13:11   ` Matthew Wilcox
2023-08-30 18:49 ` [PATCH v2 5/9] x86/clear_page: add clear_pages() Ankur Arora
2023-08-30 18:49 ` [PATCH v2 6/9] x86/clear_huge_page: multi-page clearing Ankur Arora
2023-08-31 18:26   ` kernel test robot
2023-09-08 12:38   ` Peter Zijlstra
2023-09-13  6:43   ` Raghavendra K T
2023-08-30 18:49 ` [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED Ankur Arora
2023-09-08  7:02   ` Peter Zijlstra
2023-09-08 17:15     ` Linus Torvalds
2023-09-08 22:50       ` Peter Zijlstra
2023-09-09  5:15         ` Linus Torvalds
2023-09-09  6:39           ` Ankur Arora
2023-09-09  9:11             ` Peter Zijlstra
2023-09-09 20:04               ` Ankur Arora
2023-09-09  5:30       ` Ankur Arora
2023-09-09  9:12         ` Peter Zijlstra
2023-09-09 20:15     ` Ankur Arora
2023-09-09 21:16       ` Linus Torvalds
2023-09-10  3:48         ` Ankur Arora
2023-09-10  4:35           ` Linus Torvalds
2023-09-10 10:01             ` Ankur Arora
2023-09-10 18:32               ` Linus Torvalds
2023-09-11 15:04                 ` Peter Zijlstra
2023-09-11 16:29                   ` andrew.cooper3
2023-09-11 17:04                   ` Ankur Arora
2023-09-12  8:26                     ` Peter Zijlstra
2023-09-12 12:24                       ` Phil Auld
2023-09-12 12:33                       ` Matthew Wilcox
2023-09-18 23:42                       ` Thomas Gleixner
2023-09-19  1:57                         ` Linus Torvalds
2023-09-19  8:03                           ` Ingo Molnar
2023-09-19  8:43                             ` Ingo Molnar
2023-09-19 13:43                               ` Thomas Gleixner
2023-09-19 13:25                             ` Thomas Gleixner
2023-09-19 12:30                           ` Thomas Gleixner
2023-09-19 13:00                             ` Arches that don't support PREEMPT Matthew Wilcox
2023-09-19 13:00                               ` Matthew Wilcox
2023-09-19 13:00                               ` Matthew Wilcox
2023-09-19 13:34                               ` Geert Uytterhoeven
2023-09-19 13:34                                 ` Geert Uytterhoeven
2023-09-19 13:34                                 ` Geert Uytterhoeven
2023-09-19 13:37                               ` John Paul Adrian Glaubitz
2023-09-19 13:37                                 ` John Paul Adrian Glaubitz
2023-09-19 13:37                                 ` John Paul Adrian Glaubitz
2023-09-19 13:42                                 ` Peter Zijlstra
2023-09-19 13:42                                   ` Peter Zijlstra
2023-09-19 13:42                                   ` Peter Zijlstra
2023-09-19 13:48                                   ` John Paul Adrian Glaubitz
2023-09-19 13:48                                     ` John Paul Adrian Glaubitz
2023-09-19 13:48                                     ` John Paul Adrian Glaubitz
2023-09-19 14:16                                     ` Peter Zijlstra
2023-09-19 14:16                                       ` Peter Zijlstra
2023-09-19 14:16                                       ` Peter Zijlstra
2023-09-19 14:24                                       ` John Paul Adrian Glaubitz
2023-09-19 14:24                                         ` John Paul Adrian Glaubitz
2023-09-19 14:24                                         ` John Paul Adrian Glaubitz
2023-09-19 14:32                                         ` Matthew Wilcox
2023-09-19 14:32                                           ` Matthew Wilcox
2023-09-19 14:32                                           ` Matthew Wilcox
2023-09-19 15:31                                           ` Steven Rostedt
2023-09-19 15:31                                             ` Steven Rostedt
2023-09-19 15:31                                             ` Steven Rostedt
2023-09-20 14:38                                       ` Anton Ivanov
2023-09-20 14:38                                         ` Anton Ivanov
2023-09-20 14:38                                         ` Anton Ivanov
2023-09-21 12:20                                       ` Arnd Bergmann
2023-09-21 12:20                                         ` Arnd Bergmann
2023-09-21 12:20                                         ` Arnd Bergmann
2023-09-19 14:17                                     ` Thomas Gleixner
2023-09-19 14:17                                       ` Thomas Gleixner
2023-09-19 14:17                                       ` Thomas Gleixner
2023-09-19 14:50                                       ` H. Peter Anvin
2023-09-19 14:50                                         ` H. Peter Anvin
2023-09-19 14:50                                         ` H. Peter Anvin
2023-09-19 14:57                                         ` Matt Turner
2023-09-19 14:57                                           ` Matt Turner
2023-09-19 14:57                                           ` Matt Turner
2023-09-19 17:09                                         ` Ulrich Teichert
2023-09-19 17:09                                           ` Ulrich Teichert
2023-09-19 17:25                                     ` Linus Torvalds
2023-09-19 17:25                                       ` Linus Torvalds
2023-09-19 17:25                                       ` Linus Torvalds
2023-09-19 17:58                                       ` John Paul Adrian Glaubitz
2023-09-19 17:58                                         ` John Paul Adrian Glaubitz
2023-09-19 17:58                                         ` John Paul Adrian Glaubitz
2023-09-19 18:31                                       ` Thomas Gleixner
2023-09-19 18:31                                         ` Thomas Gleixner
2023-09-19 18:31                                         ` Thomas Gleixner
2023-09-19 18:38                                         ` Steven Rostedt
2023-09-19 18:38                                           ` Steven Rostedt
2023-09-19 18:38                                           ` Steven Rostedt
2023-09-19 18:52                                           ` Linus Torvalds
2023-09-19 18:52                                             ` Linus Torvalds
2023-09-19 18:52                                             ` Linus Torvalds
2023-09-19 19:53                                             ` Thomas Gleixner
2023-09-19 19:53                                               ` Thomas Gleixner
2023-09-19 19:53                                               ` Thomas Gleixner
2023-09-20  7:32                                           ` Ingo Molnar
2023-09-20  7:32                                             ` Ingo Molnar
2023-09-20  7:32                                             ` Ingo Molnar
2023-09-20  7:29                                         ` Ingo Molnar
2023-09-20  7:29                                           ` Ingo Molnar
2023-09-20  7:29                                           ` Ingo Molnar
2023-09-20  8:26                                       ` Thomas Gleixner
2023-09-20  8:26                                         ` Thomas Gleixner
2023-09-20  8:26                                         ` Thomas Gleixner
2023-09-20 10:37                                       ` David Laight
2023-09-20 10:37                                         ` David Laight
2023-09-20 10:37                                         ` David Laight
2023-09-19 14:21                                   ` Anton Ivanov
2023-09-19 14:21                                     ` Anton Ivanov
2023-09-19 14:21                                     ` Anton Ivanov
2023-09-19 15:17                                     ` Thomas Gleixner
2023-09-19 15:17                                       ` Thomas Gleixner
2023-09-19 15:17                                       ` Thomas Gleixner
2023-09-19 15:21                                       ` Anton Ivanov
2023-09-19 15:21                                         ` Anton Ivanov
2023-09-19 15:21                                         ` Anton Ivanov
2023-09-19 16:22                                         ` Richard Weinberger
2023-09-19 16:22                                           ` Richard Weinberger
2023-09-19 16:22                                           ` Richard Weinberger
2023-09-19 16:41                                           ` Anton Ivanov
2023-09-19 16:41                                             ` Anton Ivanov
2023-09-19 16:41                                             ` Anton Ivanov
2023-09-19 17:33                                             ` Thomas Gleixner
2023-09-19 17:33                                               ` Thomas Gleixner
2023-09-19 17:33                                               ` Thomas Gleixner
2023-10-06 14:51                               ` Geert Uytterhoeven
2023-10-06 14:51                                 ` Geert Uytterhoeven
2023-09-20 14:22                             ` [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED Ankur Arora
2023-09-20 20:51                               ` Thomas Gleixner
2023-09-21  0:14                                 ` Thomas Gleixner
2023-09-21  0:58                                 ` Ankur Arora
2023-09-21  2:12                                   ` Thomas Gleixner
2023-09-20 23:58                             ` Thomas Gleixner
2023-09-21  0:57                               ` Ankur Arora
2023-09-21  2:02                                 ` Thomas Gleixner
2023-09-21  4:16                                   ` Ankur Arora
2023-09-21 13:59                                     ` Steven Rostedt
2023-09-21 16:00                               ` Linus Torvalds
2023-09-21 22:55                                 ` Thomas Gleixner
2023-09-23  1:11                                   ` Thomas Gleixner
2023-10-02 14:15                                     ` Steven Rostedt
2023-10-02 16:13                                       ` Thomas Gleixner
2023-10-18  1:03                                     ` Paul E. McKenney
2023-10-18 12:09                                       ` Ankur Arora
2023-10-18 17:51                                         ` Paul E. McKenney
2023-10-18 22:53                                           ` Thomas Gleixner
2023-10-18 23:25                                             ` Paul E. McKenney
2023-10-18 13:16                                       ` Thomas Gleixner [this message]
2023-10-18 14:31                                         ` Steven Rostedt
2023-10-18 17:55                                           ` Paul E. McKenney
2023-10-18 18:00                                             ` Steven Rostedt
2023-10-18 18:13                                               ` Paul E. McKenney
2023-10-19 12:37                                                 ` Daniel Bristot de Oliveira
2023-10-19 17:08                                                   ` Paul E. McKenney
2023-10-18 17:19                                         ` Paul E. McKenney
2023-10-18 17:41                                           ` Steven Rostedt
2023-10-18 17:59                                             ` Paul E. McKenney
2023-10-18 20:15                                           ` Ankur Arora
2023-10-18 20:42                                             ` Paul E. McKenney
2023-10-19  0:21                                           ` Thomas Gleixner
2023-10-19 19:13                                             ` Paul E. McKenney
2023-10-20 21:59                                               ` Paul E. McKenney
2023-10-20 22:56                                               ` Ankur Arora
2023-10-20 23:36                                                 ` Paul E. McKenney
2023-10-21  1:05                                                   ` Ankur Arora
2023-10-21  2:08                                                     ` Paul E. McKenney
2023-10-24 12:15                                               ` Thomas Gleixner
2023-10-24 18:59                                                 ` Paul E. McKenney
2023-09-23 22:50                             ` Thomas Gleixner
2023-09-24  0:10                               ` Thomas Gleixner
2023-09-24  7:19                               ` Matthew Wilcox
2023-09-24  7:55                                 ` Thomas Gleixner
2023-09-24 10:29                                   ` Matthew Wilcox
2023-09-25  0:13                               ` Ankur Arora
2023-10-06 13:01                             ` Geert Uytterhoeven
2023-09-19  7:21                         ` Ingo Molnar
2023-09-19 19:05                         ` Ankur Arora
2023-10-24 14:34                         ` Steven Rostedt
2023-10-25  1:49                           ` Steven Rostedt
2023-10-26  7:50                           ` Sergey Senozhatsky
2023-10-26 12:48                             ` Steven Rostedt
2023-09-11 16:48             ` Steven Rostedt
2023-09-11 20:50               ` Linus Torvalds
2023-09-11 21:16                 ` Linus Torvalds
2023-09-12  7:20                   ` Peter Zijlstra
2023-09-12  7:38                     ` Ingo Molnar
2023-09-11 22:20                 ` Steven Rostedt
2023-09-11 23:10                   ` Ankur Arora
2023-09-11 23:16                     ` Steven Rostedt
2023-09-12 16:30                   ` Linus Torvalds
2023-09-12  3:27                 ` Matthew Wilcox
2023-09-12 16:20                   ` Linus Torvalds
2023-09-19  3:21   ` Andy Lutomirski
2023-09-19  9:20     ` Thomas Gleixner
2023-09-19  9:49       ` Ingo Molnar
2023-08-30 18:49 ` [PATCH v2 8/9] irqentry: define irqentry_exit_allow_resched() Ankur Arora
2023-09-08 12:42   ` Peter Zijlstra
2023-09-11 17:24     ` Ankur Arora
2023-08-30 18:49 ` [PATCH v2 9/9] x86/clear_huge_page: make clear_contig_region() preemptible Ankur Arora
2023-09-08 12:45   ` Peter Zijlstra
2023-09-03  8:14 ` [PATCH v2 0/9] x86/clear_huge_page: multi-page clearing Mateusz Guzik
2023-09-05 22:14   ` Ankur Arora
2023-09-08  2:18   ` Raghavendra K T
2023-09-05  1:06 ` Raghavendra K T
2023-09-05 19:36   ` Ankur Arora

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pm1c3wbn.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=akpm@linux-foundation.org \
    --cc=andrew.cooper3@citrix.com \
    --cc=ankur.a.arora@oracle.com \
    --cc=bharata@amd.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=jon.grimm@amd.com \
    --cc=juri.lelli@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=raghavendra.kt@amd.com \
    --cc=rostedt@goodmis.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vincent.guittot@linaro.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.