* [GIT PULL] RCU changes for v5.10
@ 2020-10-12 14:14 Ingo Molnar
2020-10-12 20:25 ` Linus Torvalds
2020-10-18 21:39 ` Linus Torvalds
0 siblings, 2 replies; 13+ messages in thread
From: Ingo Molnar @ 2020-10-12 14:14 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, Paul E. McKenney, Thomas Gleixner, Peter Zijlstra,
Andrew Morton
Linus,
Please pull the latest core/rcu git tree from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-rcu-2020-10-12
# HEAD: c6de896fa0a4546c799c86513d99bd011b4a6177 Merge branch 'rcu/fix-rt' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
RCU changes for v5.10:
- Debugging for smp_call_function()
- RT raw/non-raw lock ordering fixes
- Strict grace periods for KASAN
- New smp_call_function() torture test
- Torture-test updates
- Documentation updates
- Miscellaneous fixes
Thanks,
Ingo
------------------>
Alexander A. Klimov (1):
rcutorture: Replace HTTP links with HTTPS ones
Colin Ian King (1):
refperf: Avoid null pointer dereference when buf fails to allocate
Joel Fernandes (Google) (6):
rcu/trace: Print negative GP numbers correctly
rcu/trace: Use gp_seq_req in acceleration's rcu_grace_period tracepoint
rcu: Clarify comments about FQS loop reporting quiescent states
rcu: Make FQS more aggressive in complaining about offline CPUs
rcutorture: Output number of elapsed grace periods
rcu/segcblist: Prevent useless GP start if no CBs to accelerate
Madhuparna Bhowmik (2):
rculist: Introduce list/hlist_for_each_entry_srcu() macros
kvm: mmu: page_track: Fix RCU list API usage
Neeraj Upadhyay (2):
rcu/tree: Force quiescent state on callback overload
rcu/tree: Remove CONFIG_PREMPT_RCU check in force_qs_rnp()
Paul E. McKenney (56):
lib: Add backtrace_idle parameter to force backtrace of idle CPUs
rcu: Remove KCSAN stubs
rcu: Remove KCSAN stubs from update.c
srcu: Remove KCSAN stubs
rcu: Initialize at declaration time in rcu_exp_handler()
nocb: Clarify RCU nocb CPU error message
nocb: Remove show_rcu_nocb_state() false positive printout
rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_divisor
rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_resched_ns
rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_kick_kthreads
rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_cpu_stall_ftrace_dump
rcu: Move rcu_cpu_started per-CPU variable to rcu_data
rcu/nocb: Add a warning for non-GP kthread running GP code
rcu: Remove unused __rcu_is_watching() function
scftorture: Add smp_call_function() torture test
torture: Declare parse-console.sh independence from rcutorture
torture: Add scftorture to the rcutorture scripting
scftorture: Implement weighted primitive selection
tick-sched: Clarify "NOHZ: local_softirq_pending" warning
scftorture: Summarize per-thread statistics
scftorture: Add smp_call_function_single() memory-ordering checks
scftorture: Add smp_call_function_many() memory-ordering checks
scftorture: Add smp_call_function() memory-ordering checks
scftorture: Consolidate scftorture_invoke_one() check and kfree()
scftorture: Consolidate scftorture_invoke_one() scf_check initialization
scftorture: Flag errors in torture-compatible manner
scftorture: Prevent compiler from reducing race probabilities
scftorture: Check unexpected "switch" statement value
scftorture: Block scftorture_invoker() kthreads for offline CPUs
scftorture: Adapt memory-ordering test to UP operation
scftorture: Add cond_resched() to test loop
rcuperf: Change rcuperf to rcuscale
rcu: Add Kconfig option for strict RCU grace periods
rcu: Reduce leaf fanout for strict RCU grace periods
rcu: Restrict default jiffies_till_first_fqs for strict RCU GPs
rcu: Force DEFAULT_RCU_BLIMIT to 1000 for strict RCU GPs
rcu: Always set .need_qs from __rcu_read_lock() for strict GPs
rcu: Do full report for .need_qs for strict GPs
rcu: Attempt QS when CPU discovers GP for strict GPs
rcu: IPI all CPUs at GP start for strict GPs
rcu: IPI all CPUs at GP end for strict GPs
rcu: Provide optional RCU-reader exit delay for strict GPs
rcu: Execute RCU reader shortly after rcu_core for strict GPs
rcu: Report QS for outermost PREEMPT=n rcu_read_unlock() for strict GPs
rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
rcutorture: Remove KCSAN stubs
torture: Update initrd documentation
rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
torture: Add kvm.sh --help and update help message
rcutorture: Properly set rcu_fwds for OOM handling
rcutorture: Properly synchronize with OOM notifier
rcutorture: Hoist OOM registry up one level
rcutorture: Allow pointer leaks to test diagnostic code
torture: Add gdb support
smp: Add source and destination CPUs to __call_single_data
kernel/smp: Provide CSD lock timeout diagnostics
Paul Gortmaker (1):
torture: document --allcpus argument added to the kvm.sh script
Randy Dunlap (2):
doc: Drop doubled words from RCU Data-Structures.rst
doc: Drop doubled words from RCU requirements documentation
Thomas Gleixner (13):
lib/debug: Remove pointless ARCH_NO_PREEMPT dependencies
preempt: Make preempt count unconditional
preempt: Cleanup PREEMPT_COUNT leftovers
lockdep: Cleanup PREEMPT_COUNT leftovers
mm/pagemap: Cleanup PREEMPT_COUNT leftovers
locking/bitspinlock: Cleanup PREEMPT_COUNT leftovers
uaccess: Cleanup PREEMPT_COUNT leftovers
sched: Cleanup PREEMPT_COUNT leftovers
ARM: Cleanup PREEMPT_COUNT leftovers
xtensa: Cleanup PREEMPT_COUNT leftovers
drm/i915: Cleanup PREEMPT_COUNT leftovers
rcutorture: Cleanup PREEMPT_COUNT leftovers
preempt: Remove PREEMPT_COUNT from Kconfig
Tobias Klauser (2):
docs: Fix typo in synchronize_rcu() function name
rcu: Fix kerneldoc comments in rcupdate.h
Uladzislau Rezki (Sony) (1):
rcu/tree: Allocate a page when caller is preemptible
Wei Yongjun (3):
scftorture: Make symbol 'scf_torture_rand' static
locktorture: Make function torture_percpu_rwsem_init() static
smp: Make symbol 'csd_bug_count' static
Zqiang (1):
rcu: Shrink each possible cpu krcp
kernel test robot (1):
kvfree_rcu(): Fix ifnullfree.cocci warnings
.../RCU/Design/Data-Structures/Data-Structures.rst | 2 +-
.../RCU/Design/Requirements/Requirements.rst | 4 +-
Documentation/RCU/whatisRCU.rst | 2 +-
Documentation/admin-guide/kernel-parameters.txt | 153 +++++-
MAINTAINERS | 3 +-
arch/arm/include/asm/assembler.h | 11 -
arch/arm/kernel/iwmmxt.S | 2 -
arch/arm/mach-ep93xx/crunch-bits.S | 2 -
arch/x86/kvm/mmu/page_track.c | 6 +-
arch/xtensa/kernel/entry.S | 2 +-
drivers/gpu/drm/i915/Kconfig.debug | 1 -
drivers/gpu/drm/i915/i915_utils.h | 3 +-
include/linux/bit_spinlock.h | 4 +-
include/linux/lockdep.h | 6 +-
include/linux/pagemap.h | 4 +-
include/linux/preempt.h | 37 +-
include/linux/rculist.h | 48 ++
include/linux/rcupdate.h | 19 +-
include/linux/rcutiny.h | 1 -
include/linux/rcutree.h | 1 -
include/linux/smp.h | 3 +
include/linux/smp_types.h | 3 +
include/linux/uaccess.h | 6 +-
include/trace/events/rcu.h | 54 +-
kernel/Kconfig.preempt | 4 -
kernel/Makefile | 2 +
kernel/entry/common.c | 2 +-
kernel/locking/locktorture.c | 2 +-
kernel/rcu/Kconfig | 8 +-
kernel/rcu/Kconfig.debug | 17 +-
kernel/rcu/Makefile | 2 +-
kernel/rcu/rcu_segcblist.c | 10 +-
kernel/rcu/{rcuperf.c => rcuscale.c} | 330 ++++++------
kernel/rcu/rcutorture.c | 61 ++-
kernel/rcu/refscale.c | 8 +-
kernel/rcu/srcutree.c | 13 -
kernel/rcu/tree.c | 244 +++++----
kernel/rcu/tree.h | 2 +
kernel/rcu/tree_exp.h | 6 +-
kernel/rcu/tree_plugin.h | 40 +-
kernel/rcu/tree_stall.h | 8 +-
kernel/rcu/update.c | 13 -
kernel/scftorture.c | 575 +++++++++++++++++++++
kernel/sched/core.c | 6 +-
kernel/smp.c | 134 +++++
kernel/time/tick-sched.c | 2 +-
lib/Kconfig.debug | 24 +-
lib/nmi_backtrace.c | 6 +-
...rf-ftrace.sh => kvm-recheck-rcuscale-ftrace.sh} | 6 +-
...-recheck-rcuperf.sh => kvm-recheck-rcuscale.sh} | 14 +-
.../selftests/rcutorture/bin/kvm-recheck-scf.sh | 38 ++
.../selftests/rcutorture/bin/kvm-test-1-run.sh | 33 +-
tools/testing/selftests/rcutorture/bin/kvm.sh | 36 +-
.../selftests/rcutorture/bin/parse-console.sh | 11 +-
.../selftests/rcutorture/configs/rcu/SRCU-t | 1 -
.../selftests/rcutorture/configs/rcu/SRCU-u | 1 -
.../selftests/rcutorture/configs/rcu/TINY01 | 1 -
.../selftests/rcutorture/configs/rcu/TREE05 | 1 +
.../selftests/rcutorture/configs/rcuperf/CFcommon | 2 -
.../configs/{rcuperf => rcuscale}/CFLIST | 0
.../selftests/rcutorture/configs/rcuscale/CFcommon | 2 +
.../rcutorture/configs/{rcuperf => rcuscale}/TINY | 0
.../rcutorture/configs/{rcuperf => rcuscale}/TREE | 0
.../configs/{rcuperf => rcuscale}/TREE54 | 0
.../configs/{rcuperf => rcuscale}/ver_functions.sh | 4 +-
.../selftests/rcutorture/configs/scf/CFLIST | 2 +
.../selftests/rcutorture/configs/scf/CFcommon | 2 +
.../selftests/rcutorture/configs/scf/NOPREEMPT | 9 +
.../rcutorture/configs/scf/NOPREEMPT.boot | 1 +
.../selftests/rcutorture/configs/scf/PREEMPT | 9 +
.../rcutorture/configs/scf/ver_functions.sh | 30 ++
.../testing/selftests/rcutorture/doc/TINY_RCU.txt | 5 +-
.../selftests/rcutorture/doc/TREE_RCU-kconfig.txt | 1 -
tools/testing/selftests/rcutorture/doc/initrd.txt | 36 +-
.../selftests/rcutorture/doc/rcu-test-image.txt | 41 +-
.../rcutorture/formal/srcu-cbmc/src/config.h | 1 -
76 files changed, 1626 insertions(+), 557 deletions(-)
rename kernel/rcu/{rcuperf.c => rcuscale.c} (64%)
create mode 100644 kernel/scftorture.c
rename tools/testing/selftests/rcutorture/bin/{kvm-recheck-rcuperf-ftrace.sh => kvm-recheck-rcuscale-ftrace.sh} (92%)
rename tools/testing/selftests/rcutorture/bin/{kvm-recheck-rcuperf.sh => kvm-recheck-rcuscale.sh} (84%)
create mode 100755 tools/testing/selftests/rcutorture/bin/kvm-recheck-scf.sh
delete mode 100644 tools/testing/selftests/rcutorture/configs/rcuperf/CFcommon
rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/CFLIST (100%)
create mode 100644 tools/testing/selftests/rcutorture/configs/rcuscale/CFcommon
rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/TINY (100%)
rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/TREE (100%)
rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/TREE54 (100%)
rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/ver_functions.sh (88%)
create mode 100644 tools/testing/selftests/rcutorture/configs/scf/CFLIST
create mode 100644 tools/testing/selftests/rcutorture/configs/scf/CFcommon
create mode 100644 tools/testing/selftests/rcutorture/configs/scf/NOPREEMPT
create mode 100644 tools/testing/selftests/rcutorture/configs/scf/NOPREEMPT.boot
create mode 100644 tools/testing/selftests/rcutorture/configs/scf/PREEMPT
create mode 100644 tools/testing/selftests/rcutorture/configs/scf/ver_functions.sh
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [GIT PULL] RCU changes for v5.10
2020-10-12 14:14 [GIT PULL] RCU changes for v5.10 Ingo Molnar
@ 2020-10-12 20:25 ` Linus Torvalds
2020-10-12 21:44 ` Paul E. McKenney
` (2 more replies)
2020-10-18 21:39 ` Linus Torvalds
1 sibling, 3 replies; 13+ messages in thread
From: Linus Torvalds @ 2020-10-12 20:25 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linux Kernel Mailing List, Paul E. McKenney, Thomas Gleixner,
Peter Zijlstra, Andrew Morton
On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
>
> Please pull the latest core/rcu git tree from:
>
> RCU changes for v5.10:
>
> - Debugging for smp_call_function()
> - RT raw/non-raw lock ordering fixes
> - Strict grace periods for KASAN
> - New smp_call_function() torture test
> - Torture-test updates
> - Documentation updates
> - Miscellaneous fixes
I am *very* unhappy with this pull request.
It doesn't even mention the big removal of CONFIG_PREEMPT_COUNT, which I felt
was still under discussion.
I don't absolutely hate that code, and I'm willing to be convinced
about how little it matters for people who don't want to have the
counting overhead, but I refuse to pull it as some secret hidden thing
that isn't even mentioned in the pull request.
Honestly, I did not get any strong arguments for why making the
preempt count unconditional was such an important thing.
Yes, Thomas pointed me at a couple of uses that were garbage, but even
the people involved in those seemed to agree they were legacy garbage.
So why was this preempt-count thing then pushed through like this?
Linus
* Re: [GIT PULL] RCU changes for v5.10
2020-10-12 20:25 ` Linus Torvalds
@ 2020-10-12 21:44 ` Paul E. McKenney
2020-10-12 21:59 ` Linus Torvalds
2020-10-13 7:32 ` Ingo Molnar
2020-10-13 9:16 ` Thomas Gleixner
2 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2020-10-12 21:44 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
Peter Zijlstra, Andrew Morton
On Mon, Oct 12, 2020 at 01:25:09PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Please pull the latest core/rcu git tree from:
> >
> > RCU changes for v5.10:
> >
> > - Debugging for smp_call_function()
> > - RT raw/non-raw lock ordering fixes
> > - Strict grace periods for KASAN
> > - New smp_call_function() torture test
> > - Torture-test updates
> > - Documentation updates
> > - Miscellaneous fixes
>
> I am *very* unhappy with this pull request.
>
> It doesn't even mention the big removal of CONFIG_PREEMPT_COUNT, which I felt
> was still under discussion.
>
> I don't absolutely hate that code, and I'm willing to be convinced
> about how little it matters for people who don't want to have the
> counting overhead, but I refuse to pull it as some secret hidden thing
> that isn't even mentioned in the pull request.
>
> Honestly, I did not get any strong arguments for why making the
> preempt count unconditional was such an important thing.
>
> Yes, Thomas pointed me at a couple of uses that were garbage, but even
> the people involved in those seemed to agree they were legacy garbage.
>
> So why was this preempt-count thing then pushed through like this?
So that RCU can tell, even in CONFIG_PREEMPT_NONE=y kernels, whether it
is safe to invoke the memory allocator. RCU needs to figure this out
for -rt kernels and for the CONFIG_PROVE_RAW_LOCK_NESTING Kconfig option
that was recently added to lockdep. And with this option, lockdep has
been triggering for kvfree_rcu() for a while now.
We have tried a number of alternative fixes, but they have had subtle
problems. Or, in the case of the alternative that uses a lockless
interface to the memory allocator, the not-so-subtle problem of strong
resistance from the maintainers.
In contrast, your earlier comments seemed to indicate that with a valid
use case, you would be OK with unconditional PREEMPT_COUNT, though
perhaps that was a case of excessive optimism on my part. I based my
optimism in part on your not having complained about either the patch
series or the pull request, both of which I CCed you on:
https://lore.kernel.org/lkml/20200928233041.GA23230@paulmck-ThinkPad-P72
Patch series.
https://lore.kernel.org/lkml/20201001210750.GA25287@paulmck-ThinkPad-P72
Pull request.
Of course, if you hate this approach, we can go back to browbeating the
memory-allocator maintainers. On the other hand, the approach used in
this pull request does work quite well and I do know that there are
people who were quite tired of the kvfree_rcu() lockdep complaints
starting quite some time ago.
But either way, please let me know how you would like us to proceed.
Thanx, Paul
* Re: [GIT PULL] RCU changes for v5.10
2020-10-12 21:44 ` Paul E. McKenney
@ 2020-10-12 21:59 ` Linus Torvalds
2020-10-12 23:54 ` Paul E. McKenney
0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2020-10-12 21:59 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
Peter Zijlstra, Andrew Morton
On Mon, Oct 12, 2020 at 2:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> So that RCU can tell, even in CONFIG_PREEMPT_NONE=y kernels, whether it
> is safe to invoke the memory allocator.
So in what situation is RCU called from random contexts that it can't even tell?
> But either way, please let me know how you would like us to proceed.
Well, AT A MINIMUM, the pull request should damn well have made it
1000% clear that this removes a case that has existed for decades, and
that potentially makes a difference for small kernels in particular.
In fact, my personal config option - still to this day - is
CONFIG_PREEMPT_VOLUNTARY and on the kernel I'm running,
CONFIG_PREEMPT_COUNT isn't actually set.
Because honestly, the code generation of some core code looks better
that way (in places where I've historically looked at things), and the
latency arguments against it simply aren't relevant when you have 8
cores or more.
So I don't think that "make preempt count unconditional" is some small
meaningless detail.
What is so magical about RCU allocating memory? I assume it's some
debug case? Why does that debug case then have a
select PREEMPT_COUNT
like is done for PROVE_LOCKING?
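[Editor's note: the existing pattern Linus points at looked roughly like this in lib/Kconfig.debug of that era (approximate), with a second, purely hypothetical option sketching what a debug-only dependency for the RCU case would look like:]

```kconfig
config PROVE_LOCKING
	bool "Lock debugging: prove locking correctness"
	select PREEMPT_COUNT if !ARCH_NO_PREEMPT
	...

# Hypothetical: if the RCU allocation-context check were debug-only,
# it could pull in the count the same way instead of making it global.
config RCU_STRICT_ALLOC_DEBUG
	bool "Check RCU allocation contexts"
	select PREEMPT_COUNT
```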
> I based my
> optimism in part on your not having complained about either the patch
> series or the pull request, both of which I CCed you on:
I had already raised my concerns when that patch series was posted by
Thomas originally. I did not feel like I needed to re-raise them just
because the series got reposted by somebody else.
Linus
* Re: [GIT PULL] RCU changes for v5.10
2020-10-12 21:59 ` Linus Torvalds
@ 2020-10-12 23:54 ` Paul E. McKenney
2020-10-13 0:14 ` Linus Torvalds
0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2020-10-12 23:54 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
Peter Zijlstra, Andrew Morton
On Mon, Oct 12, 2020 at 02:59:41PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 2:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > So that RCU can tell, even in CONFIG_PREEMPT_NONE=y kernels, whether it
> > is safe to invoke the memory allocator.
>
> So in what situation is RCU called from random contexts that it can't even tell?
In CONFIG_PREEMPT_NONE=y kernels, RCU has no way to tell whether or
not its caller holds a raw spinlock, which some callers do. And if its
caller holds a raw spinlock, then RCU cannot invoke the memory allocator
because the allocator acquires non-raw spinlocks, which in turn results
in lockdep splats. Making CONFIG_PREEMPT_COUNT unconditional allows
RCU to make this determination.
Please note that RCU always provides a fallback for memory-allocation
failure, but such failure needs to be rare, at least in non-OOM
situations.
The alternatives to this approach are:
1. Lockless memory allocation, which was provided by an earlier
patch series. Again, the relevant maintainers are not happy
with this approach.
2. Defer memory allocation to a clean environment. However,
even softirq handlers are not clean enough, so this approach
incurs a full scheduling delay. And this delay is incurred
unconditionally in kernels built with CONFIG_PREEMPT_COUNT=n,
even if the system has memory coming out of its ears, and even
if RCU's caller happens to be a clean environment.
3. A long and sad litany of subtly broken approaches.
> > But either way, please let me know how you would like us to proceed.
>
> Well, AT A MINIMUM, the pull request should damn well have made it
> 1000% clear that this removes a case that has existed for decades, and
> that potentially makes a difference for small kernels in particular.
Got it, thank you.
> In fact, my personal config option - still to this day - is
> CONFIG_PREEMPT_VOLUNTARY and on the kernel I'm running,
> CONFIG_PREEMPT_COUNT isn't actually set.
>
> Because honestly, the code generation of some core code looks better
> that way (in places where I've historically looked at things), and the
> latency arguments against it simply aren't relevant when you have 8
> cores or more.
>
> So I don't think that "make preempt count unconditional" is some small
> meaningless detail.
Understood and agreed. And to take your point one step further, not
just CONFIG_PREEMPT_VOLUNTARY but also CONFIG_PREEMPT_NONE is in
extremely heavy use, including by my employer.
And understood on kernel text size. Raw performance is a different story:
Even microbenchmarks didn't show statistically significant performance
change from CONFIG_PREEMPT_COUNT=n, and system-level benchmarks showed no
difference whatsoever.
So would it help if CONFIG_PREEMPT_COUNT=n became unconditional only for
CONFIG_SMP=y kernels? RCU does have other options for CONFIG_SMP=n. Or
do your small-kernel concerns extend beyond single-CPU microcontrollers?
> What is so magical about RCU allocating memory? I assume it's some
> debug case? Why does that debug case then have a
>
> select PREEMPT_COUNT
>
> like is done for PROVE_LOCKING?
Sadly, no, it is not just a debug case.
This memory allocation enables a cache-locality optimization to
callback processing that reduces cache misses. This optimization
is currently implemented only for kvfree_rcu(), where it reduces
callback-invocation-time cache misses by a factor of eight on typical
x86 systems, which produces decent system-level benefits. So it would
be good to also apply this optimization to call_rcu().
> > I based my
> > optimism in part on your not having complained about either the patch
> > series or the pull request, both of which I CCed you on:
>
> I had already raised my concerns when that patch series was posted by
> Thomas originally. I did not feel like I needed to re-raise them just
> because the series got reposted by somebody else.
OK, I did not know, but I do know it now!
Thanx, Paul
* Re: [GIT PULL] RCU changes for v5.10
2020-10-12 23:54 ` Paul E. McKenney
@ 2020-10-13 0:14 ` Linus Torvalds
2020-10-13 2:41 ` Paul E. McKenney
0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2020-10-13 0:14 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
Peter Zijlstra, Andrew Morton
On Mon, Oct 12, 2020 at 4:54 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> In CONFIG_PREEMPT_NONE=y kernels, RCU has no way to tell whether or
> not its caller holds a raw spinlock, which some callers do.
Only kfree_rcu()? (And apparently eventually call_rcu())?
And since we have lockdep, and it warns about it, and raw spinlocks
are really really rare, do we really need to then disable this simple
optimization for everybody else?
We have been very successful with "don't do that then" rules.
Eg, you cannot do normal memory allocations inside a spinlock (or you
have to mark them with GFP_ATOMIC, and not all allocations can be
marked as such), and this has been the case basically forever. And we
have debug code and tools that will check that.
Why is it impossible to just say "you can't do kfree_rcu() while
holding a raw spinlock"?
Particularly for something like kfree_rcu() and particularly if it's
just about raw spinlocks, it would seem to be very natural to just say
"just delay freeing it until after you've released the raw spinlock".
Because I sure hope that you don't find raw spinlock users in random
places. It should be stuff like core scheduling, RCU itself, etc.
> Making CONFIG_PREEMPT_COUNT unconditional allows
> RCU to make this determination.
I understand _that_ part, but the thing I find objectionable is how a
small piece of code seems to want to change the rules we have had in
the kernel since basically day #1.
(Ok, so the preempt count itself is much more recent than "day #1",
but the basic non-counting spinlocks do go back to very early in the
SMP stages).
Linus
* Re: [GIT PULL] RCU changes for v5.10
2020-10-13 0:14 ` Linus Torvalds
@ 2020-10-13 2:41 ` Paul E. McKenney
0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2020-10-13 2:41 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
Peter Zijlstra, Andrew Morton
On Mon, Oct 12, 2020 at 05:14:42PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 4:54 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > In CONFIG_PREEMPT_NONE=y kernels, RCU has no way to tell whether or
> > not its caller holds a raw spinlock, which some callers do.
>
> Only kfree_rcu()? (And apparently eventually call_rcu())?
Yes. The other RCU APIs either only use raw spinlocks themselves on
the one hand or must be called from schedulable contexts on the other.
> And since we have lockdep, and it warns about it, and raw spinlocks
> are really really rare, do we really need to then disable this simple
> optimization for everybody else?
>
> We have been very successful with "don't do that then" rules.
>
> Eg, you cannot do normal memory allocations inside a spinlock (or you
> have to mark them with GFP_ATOMIC, and not all allocations can be
> marked as such), and this has been the case basically forever. And we
> have debug code and tools that will check that.
>
> Why is it impossible to just say "you can't do kfree_rcu() while
> holding a raw spinlock"?
>
> Particularly for something like kfree_rcu() and particularly if it's
> just about raw spinlocks, it would seem to be very natural to just say
> "just delay freeing it until after you've released the raw spinlock".
>
> Because I sure hope that you don't find raw spinlock users in random
> places. It should be stuff like core scheduling, RCU itself, etc.
True enough, but core stuff does use RCU, some of it while holding
raw spinlocks.
And you are right that "just don't do that, defer it instead" is often
very effective. In fact, I defer wakeups within RCU in order to avoid
deadlocks with the scheduler. It is simple in concept, and it does
work, but it is also a disproportionate source of bugs. Most of which
rcutorture finds in the safety and comfort of my own system, thankfully,
but some do escape. Maybe I am overreacting, but I have been burned
often enough that I feel the need to avoid this.
Plus I did oversimplify. CONFIG_PREEMPT_COUNT also allows the call_rcu()
portion to avoid deadlocks with the current non-lockless memory allocator.
So if the answer from you on global CONFIG_PREEMPT_COUNT=y and from
the MM guys on lockless allocation is irrevocably "@#$@#$ NO!" or the
current-day bowdlerized equivalent, I will take the scheduling delays
in the shorts and defer allocation.
> > Making CONFIG_PREEMPT_COUNT unconditional allows
> > RCU to make this determination.
>
> I understand _that_ part, but the thing I find objectionable is how a
> small piece of code seems to want to change the rules we have had in
> the kernel since basically day #1.
>
> (Ok, so the preempt count itself is much more recent than "day #1",
> but the basic non-counting spinlocks do go back to very early in the
> SMP stages).
Understood, a count-free CONFIG_PREEMPT_NONE has been in place in the
Linux kernel for an extremely long time. And I also understand that
adding CONFIG_PREEMPT_COUNT=y everywhere is a pervasive change that is
not to be taken lightly.
Thanx, Paul
* Re: [GIT PULL] RCU changes for v5.10
2020-10-12 20:25 ` Linus Torvalds
2020-10-12 21:44 ` Paul E. McKenney
@ 2020-10-13 7:32 ` Ingo Molnar
2020-10-13 9:16 ` Thomas Gleixner
2 siblings, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2020-10-13 7:32 UTC (permalink / raw)
To: Linus Torvalds
Cc: Linux Kernel Mailing List, Paul E. McKenney, Thomas Gleixner,
Peter Zijlstra, Andrew Morton
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Please pull the latest core/rcu git tree from:
> >
> > RCU changes for v5.10:
> >
> > - Debugging for smp_call_function()
> > - RT raw/non-raw lock ordering fixes
> > - Strict grace periods for KASAN
> > - New smp_call_function() torture test
> > - Torture-test updates
> > - Documentation updates
> > - Miscellaneous fixes
>
> I am *very* unhappy with this pull request.
>
> It doesn't even mention the big removal of CONFIG_PREEMPT_COUNT, which I felt
> was still under discussion.
Not mentioning the unconditional PREEMPT_COUNT enabling aspect was 100% my
fault in summarizing the changes insufficiently, as I (mistakenly) thought
them to be uncontroversial. My apologies for that!
Here's a second attempt to properly justify these changes:
Regarding the performance aspect of the change, I was relying on these
performance measurements:
"Freshly conducted benchmarks did not reveal any measurable impact from
enabling preempt count unconditionally. On kernels with
CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY the preempt count is only
incremented and decremented but the result of the decrement is not
tested. Contrary to that enabling CONFIG_PREEMPT which tests the result
has a small but measurable impact due to the conditional branch/call."
FWIW, to inject some hard numbers into this discussion, here's also the
code generation impact of an unconditional PREEMPT_COUNT, on x86-defconfig:
text data bss filename
19675937 5591036 1433672 vmlinux.ubuntu.vanilla # 856deb866d16: ("Linux 5.9-rc5")
19682382 5590964 1425480 vmlinux.ubuntu.PREEMPT_COUNT=y # 7681205ba49d: ("preempt: Make preempt count unconditional")
So this is a pretty small, +0.03% increase (+6k) in generated code in the
core kernel, and it doesn't add widespread new control dependencies either.
I also measured the core kernel code generation impact on the kernel config
from a major Linux distribution that uses PREEMPT_VOLUNTARY=y (Ubuntu):
kepler:~/tip> grep PREEMPT .config
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPT_NOTIFIERS=y
text data bss filename
15754341 13790786 5242880 vmlinux.ubuntu.vanilla # 856deb866d16: ("Linux 5.9-rc5")
15754790 13791018 5242880 vmlinux.ubuntu.PREEMPT_COUNT=y # 7681205ba49d: ("preempt: Make preempt count unconditional")
15754771 13791018 5242880 vmlinux.ubuntu.full_cleanups # 849b9c5446cc: ("kvfree_rcu(): Fix ifnullfree.cocci warnings")
In this test the changes result in very little generated code increase in
the core kernel, just +449 bytes, or +0.003%.
In fact the impact was so low on this config that I initially disbelieved
it and double-checked the result and re-ran the build with all =m's turned
into =y's, to get a whole-kernel measurement of the generated code impact:
text data bss filename
84594448 61819613 42000384 vmlinux.ubuntu.vanilla # 856deb866d16: ("Linux 5.9-rc5")
84594129 61819777 42000384 vmlinux.ubuntu.PREEMPT_COUNT=y # 7681205ba49d: ("preempt: Make preempt count unconditional")
Note how the full ~84 MB image actually *shrank*, possibly due to random
function & section alignment noise.
So to get a truly sensitive measurement of the impact of the PREEMPT_COUNT
change I built with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, to get tight instruction
packing and no alignment padding artifacts:
text data bss filename
69460329 60932573 40411136 vmlinux.ubuntu.vanilla # 856deb866d16: ("Linux 5.9-rc5")
69460739 60936853 40411136 vmlinux.ubuntu.PREEMPT_COUNT=y # 7681205ba49d: ("preempt: Make preempt count unconditional")
This shows a 410 bytes (+0.0005%) increase.
( Side note: it's rather impressive that -Os saves 21% of text size - if
only GCC wasn't so stupid with the final 2-3% size optimizations... )
So there's even less relative impact on the whole 84 MB kernel image -
modules don't do much direct preempt_count manipulation.
Just for completeness' sake I re-ran the original defconfig build as well,
this time with -Os:
text data bss filename
16091696 5565988 2928696 vmlinux.defconfig.Os.vanilla # 856deb866d16: ("Linux 5.9-rc5")
16095525 5570156 2928696 vmlinux.defconfig.Os.PREEMPT_COUNT=y # 7681205ba49d: ("preempt: Make preempt count unconditional")
3.8k, or +0.025% - similar to the initial +0.03% result.
So even though I'm normally fiercely anti-bloat, if we combine the
performance and code generation measurements with these maintainability
arguments:
"It's about time to make essential functionality of the kernel consistent
across the various preemption models.
Enable CONFIG_PREEMPT_COUNT unconditionally. Follow up changes will
remove the #ifdeffery and remove the config option at the end."
I think the PREEMPT_COUNT=y change to reduce the schism between the
various preemption models is justified - and reducing the code-base
distance to -rt is the icing on the cake.
Thanks,
Ingo
* Re: [GIT PULL] RCU changes for v5.10
2020-10-12 20:25 ` Linus Torvalds
2020-10-12 21:44 ` Paul E. McKenney
2020-10-13 7:32 ` Ingo Molnar
@ 2020-10-13 9:16 ` Thomas Gleixner
2 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2020-10-13 9:16 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Linux Kernel Mailing List, Paul E. McKenney, Peter Zijlstra,
Andrew Morton
Linus,
On Mon, Oct 12 2020 at 13:25, Linus Torvalds wrote:
> I don't absolutely hate that code, and I'm willing to be convinced
> about how little it matters for people who don't want to have the
> counting overhead, but I refuse to pull it as some secret hidden thing
> that isn't even mentioned in the pull request.
>
> Honestly, I did not get any strong arguments for why making the
> preempt count unconditional was such an important thing.
>
> Yes, Thomas pointed me at a couple of uses that were garbage, but even
> the people involved in those seemed to agree they were legacy garbage.
Yes, we agreed on those parts and also started to act on your request to
remove these constructs where non-core code changes behaviour depending
on some context check, by either separating it out or handing the
condition in from the call sites.
You said in that original discussion:
> Of course core code can (and will) look at things like
>
> if (in_interrupt())
> .. schedule work asynchronously ..
>
> because core code ends up being called from odd places, and code like
> that is expected to have understanding of the rules it plays with.
>
> But something like RCU is a very different beast from some "walk the
> scatter-gather list" code.
>
> RCU does its work in the background, and works with lots of different
> things. And it's so core and used everywhere that it knows about these
> things. I mean, we literally have special code explicitly to let RCU
> know "we entered kernel context now".
As Paul was facing the problem of not knowing the exact context on
PREEMPT_NONE kernels, which can, aside from the raw spinlock issue,
cause deadlocks against the page allocator, we had the impression that
solving this particular itch with a consistent preempt count was
justified.
As you did not voice objections to Paul's series, which incorporated my
initial work, I was encouraging Paul to go ahead with this. Of course,
if I misinterpreted your silence on that patch submission, I need to
fine-tune my void decoder.
That said, I fully agree that this all should have been part of the pull
request message and in hindsight I should have reminded Ingo to be very
careful about that when we were splitting up the pull request pile of
the tip tree.
I hope that clarifies the context some more and makes the whole thing
more palatable.
Thanks,
tglx
/me goes back to stare at in_*() constructs...
* Re: [GIT PULL] RCU changes for v5.10
2020-10-12 14:14 [GIT PULL] RCU changes for v5.10 Ingo Molnar
2020-10-12 20:25 ` Linus Torvalds
@ 2020-10-18 21:39 ` Linus Torvalds
2020-10-19 3:24 ` Paul E. McKenney
2020-10-19 7:31 ` Ingo Molnar
1 sibling, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2020-10-18 21:39 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linux Kernel Mailing List, Paul E. McKenney, Thomas Gleixner,
Peter Zijlstra, Andrew Morton
On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
>
> Please pull the latest core/rcu git tree from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-rcu-2020-10-12
I've pulled everything but that last merge and the PREEMPT_COUNT stuff
that came with it.
When Paul asked whether it was ok for RCU to use preempt_count() and I
answered in the affirmative, I didn't mean it in the sense of "RCU
wants to force it on everybody else too".
I'm pretty convinced that the proper fix is to simply make sure that
rcu_free() and friends aren't run under any raw spinlocks. So even if
the cost of preempt-count isn't that noticeable, there just isn't a
reason for RCU to say "screw everybody else, I want this" when there
are other alternatives.
Linus
* Re: [GIT PULL] RCU changes for v5.10
2020-10-18 21:39 ` Linus Torvalds
@ 2020-10-19 3:24 ` Paul E. McKenney
2020-10-19 16:51 ` Linus Torvalds
2020-10-19 7:31 ` Ingo Molnar
1 sibling, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2020-10-19 3:24 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
Peter Zijlstra, Andrew Morton
On Sun, Oct 18, 2020 at 02:39:56PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Please pull the latest core/rcu git tree from:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-rcu-2020-10-12
>
> I've pulled everything but that last merge and the PREEMPT_COUNT stuff
> that came with it.
>
> When Paul asked whether it was ok for RCU to use preempt_count() and I
> answered in the affirmative, I didn't mean it in the sense of "RCU
> wants to force it on everybody else too".
>
> I'm pretty convinced that the proper fix is to simply make sure that
> rcu_free() and friends aren't run under any raw spinlocks. So even if
> the cost of preempt-count isn't that noticeable, there just isn't a
> reason for RCU to say "screw everybody else, I want this" when there
> are other alternatives.
Thank you for pulling the other branches.
On CONFIG_PREEMPT_COUNT, got it. It would be OK for RCU to use
preempt_count() for some debugging or specialty kernel, but not across
the board. Thank you for bearing with me on this one.
There is more to it than just raw spinlocks, but regardless we will go
back to the drawing board and come up with a less intrusive fix for the
v5.11 merge window.
Thanx, Paul
* Re: [GIT PULL] RCU changes for v5.10
2020-10-18 21:39 ` Linus Torvalds
2020-10-19 3:24 ` Paul E. McKenney
@ 2020-10-19 7:31 ` Ingo Molnar
1 sibling, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2020-10-19 7:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Linux Kernel Mailing List, Paul E. McKenney, Thomas Gleixner,
Peter Zijlstra, Andrew Morton
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Please pull the latest core/rcu git tree from:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-rcu-2020-10-12
>
> I've pulled everything but that last merge and the PREEMPT_COUNT
> stuff that came with it.
>
> When Paul asked whether it was ok for RCU to use preempt_count() and
> I answered in the affirmative, I didn't mean it in the sense of "RCU
> wants to force it on everybody else too".
>
> I'm pretty convinced that the proper fix is to simply make sure that
> rcu_free() and friends aren't run under any raw spinlocks. So even
> if the cost of preempt-count isn't that noticeable, there just isn't
> a reason for RCU to say "screw everybody else, I want this" when
> there are other alternatives.
That's certainly true - thanks for catching this & sorting it out from
the bigger pull request!
Thanks,
Ingo
* Re: [GIT PULL] RCU changes for v5.10
2020-10-19 3:24 ` Paul E. McKenney
@ 2020-10-19 16:51 ` Linus Torvalds
0 siblings, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2020-10-19 16:51 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
Peter Zijlstra, Andrew Morton
On Sun, Oct 18, 2020 at 8:24 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On CONFIG_PREEMPT_COUNT, got it. It would be OK for RCU to use
> preempt_count() for some debugging or specialty kernel, but not across
> the board.
Right - that was what I thought you were asking originally.
I don't think a driver or random piece of code like that should ever
use "preempt_count()" on its own - partly because the rules are
subtle, but partly simply because drivers have no business with those
kinds of low-level things.
But yeah, for some core stuff like RCU, using preempt_count() for
debugging etc makes sense. Just not to change _behavior_, because
preempt_count on its own is almost entirely meaningless. It's just one
(local) part of so much state. Again, partly because preempt count
isn't necessarily always meaningful due to config settings, but partly
because there are just so many other things like "are interrupts
disabled" or "are we in an NMI context" or whatever.
And in some odd situation, depending on exactly what you do, maybe
preempt-count can be exactly what you need, because you know
everything else about the state statically. "preempt_enable()"
obviously is one such thing - the whole point is: if
CONFIG_PREEMPT_COUNT is on, then the _semantics_ of this is "decrease
the preempt count, and if it goes to zero and we should reschedule,
do that".
So it's not that preempt_count() is meaningless, but it's such a
specialized thing that 99.9% of all code really cannot and shouldn't
use it.
Linus