* [GIT PULL] RCU changes for v5.10
@ 2020-10-12 14:14 Ingo Molnar
  2020-10-12 20:25 ` Linus Torvalds
  2020-10-18 21:39 ` Linus Torvalds
  0 siblings, 2 replies; 13+ messages in thread

From: Ingo Molnar @ 2020-10-12 14:14 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, Paul E. McKenney, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest core/rcu git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-rcu-2020-10-12

   # HEAD: c6de896fa0a4546c799c86513d99bd011b4a6177 Merge branch 'rcu/fix-rt' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu

RCU changes for v5.10:

 - Debugging for smp_call_function()
 - RT raw/non-raw lock ordering fixes
 - Strict grace periods for KASAN
 - New smp_call_function() torture test
 - Torture-test updates
 - Documentation updates
 - Miscellaneous fixes

 Thanks,

	Ingo

------------------>

Alexander A. Klimov (1):
      rcutorture: Replace HTTP links with HTTPS ones

Colin Ian King (1):
      refperf: Avoid null pointer dereference when buf fails to allocate

Joel Fernandes (Google) (6):
      rcu/trace: Print negative GP numbers correctly
      rcu/trace: Use gp_seq_req in acceleration's rcu_grace_period tracepoint
      rcu: Clarify comments about FQS loop reporting quiescent states
      rcu: Make FQS more aggressive in complaining about offline CPUs
      rcutorture: Output number of elapsed grace periods
      rcu/segcblist: Prevent useless GP start if no CBs to accelerate

Madhuparna Bhowmik (2):
      rculist: Introduce list/hlist_for_each_entry_srcu() macros
      kvm: mmu: page_track: Fix RCU list API usage

Neeraj Upadhyay (2):
      rcu/tree: Force quiescent state on callback overload
      rcu/tree: Remove CONFIG_PREMPT_RCU check in force_qs_rnp()

Paul E. McKenney (56):
      lib: Add backtrace_idle parameter to force backtrace of idle CPUs
      rcu: Remove KCSAN stubs
      rcu: Remove KCSAN stubs from update.c
      srcu: Remove KCSAN stubs
      rcu: Initialize at declaration time in rcu_exp_handler()
      nocb: Clarify RCU nocb CPU error message
      nocb: Remove show_rcu_nocb_state() false positive printout
      rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_divisor
      rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_resched_ns
      rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_kick_kthreads
      rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_cpu_stall_ftrace_dump
      rcu: Move rcu_cpu_started per-CPU variable to rcu_data
      rcu/nocb: Add a warning for non-GP kthread running GP code
      rcu: Remove unused __rcu_is_watching() function
      scftorture: Add smp_call_function() torture test
      torture: Declare parse-console.sh independence from rcutorture
      torture: Add scftorture to the rcutorture scripting
      scftorture: Implement weighted primitive selection
      tick-sched: Clarify "NOHZ: local_softirq_pending" warning
      scftorture: Summarize per-thread statistics
      scftorture: Add smp_call_function_single() memory-ordering checks
      scftorture: Add smp_call_function_many() memory-ordering checks
      scftorture: Add smp_call_function() memory-ordering checks
      scftorture: Consolidate scftorture_invoke_one() check and kfree()
      scftorture: Consolidate scftorture_invoke_one() scf_check initialization
      scftorture: Flag errors in torture-compatible manner
      scftorture: Prevent compiler from reducing race probabilities
      scftorture: Check unexpected "switch" statement value
      scftorture: Block scftorture_invoker() kthreads for offline CPUs
      scftorture: Adapt memory-ordering test to UP operation
      scftorture: Add cond_resched() to test loop
      rcuperf: Change rcuperf to rcuscale
      rcu: Add Kconfig option for strict RCU grace periods
      rcu: Reduce leaf fanout for strict RCU grace periods
      rcu: Restrict default jiffies_till_first_fqs for strict RCU GPs
      rcu: Force DEFAULT_RCU_BLIMIT to 1000 for strict RCU GPs
      rcu: Always set .need_qs from __rcu_read_lock() for strict GPs
      rcu: Do full report for .need_qs for strict GPs
      rcu: Attempt QS when CPU discovers GP for strict GPs
      rcu: IPI all CPUs at GP start for strict GPs
      rcu: IPI all CPUs at GP end for strict GPs
      rcu: Provide optional RCU-reader exit delay for strict GPs
      rcu: Execute RCU reader shortly after rcu_core for strict GPs
      rcu: Report QS for outermost PREEMPT=n rcu_read_unlock() for strict GPs
      rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
      rcutorture: Remove KCSAN stubs
      torture: Update initrd documentation
      rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
      torture: Add kvm.sh --help and update help message
      rcutorture: Properly set rcu_fwds for OOM handling
      rcutorture: Properly synchronize with OOM notifier
      rcutorture: Hoist OOM registry up one level
      rcutorture: Allow pointer leaks to test diagnostic code
      torture: Add gdb support
      smp: Add source and destination CPUs to __call_single_data
      kernel/smp: Provide CSD lock timeout diagnostics

Paul Gortmaker (1):
      torture: document --allcpus argument added to the kvm.sh script

Randy Dunlap (2):
      doc: Drop doubled words from RCU Data-Structures.rst
      doc: Drop doubled words from RCU requirements documentation

Thomas Gleixner (13):
      lib/debug: Remove pointless ARCH_NO_PREEMPT dependencies
      preempt: Make preempt count unconditional
      preempt: Cleanup PREEMPT_COUNT leftovers
      lockdep: Cleanup PREEMPT_COUNT leftovers
      mm/pagemap: Cleanup PREEMPT_COUNT leftovers
      locking/bitspinlock: Cleanup PREEMPT_COUNT leftovers
      uaccess: Cleanup PREEMPT_COUNT leftovers
      sched: Cleanup PREEMPT_COUNT leftovers
      ARM: Cleanup PREEMPT_COUNT leftovers
      xtensa: Cleanup PREEMPT_COUNT leftovers
      drm/i915: Cleanup PREEMPT_COUNT leftovers
      rcutorture: Cleanup PREEMPT_COUNT leftovers
      preempt: Remove PREEMPT_COUNT from Kconfig

Tobias Klauser (2):
      docs: Fix typo in synchronize_rcu() function name
      rcu: Fix kerneldoc comments in rcupdate.h

Uladzislau Rezki (Sony) (1):
      rcu/tree: Allocate a page when caller is preemptible

Wei Yongjun (3):
      scftorture: Make symbol 'scf_torture_rand' static
      locktorture: Make function torture_percpu_rwsem_init() static
      smp: Make symbol 'csd_bug_count' static

Zqiang (1):
      rcu: Shrink each possible cpu krcp

kernel test robot (1):
      kvfree_rcu(): Fix ifnullfree.cocci warnings


 .../RCU/Design/Data-Structures/Data-Structures.rst |   2 +-
 .../RCU/Design/Requirements/Requirements.rst       |   4 +-
 Documentation/RCU/whatisRCU.rst                    |   2 +-
 Documentation/admin-guide/kernel-parameters.txt    | 153 +++++-
 MAINTAINERS                                        |   3 +-
 arch/arm/include/asm/assembler.h                   |  11 -
 arch/arm/kernel/iwmmxt.S                           |   2 -
 arch/arm/mach-ep93xx/crunch-bits.S                 |   2 -
 arch/x86/kvm/mmu/page_track.c                      |   6 +-
 arch/xtensa/kernel/entry.S                         |   2 +-
 drivers/gpu/drm/i915/Kconfig.debug                 |   1 -
 drivers/gpu/drm/i915/i915_utils.h                  |   3 +-
 include/linux/bit_spinlock.h                       |   4 +-
 include/linux/lockdep.h                            |   6 +-
 include/linux/pagemap.h                            |   4 +-
 include/linux/preempt.h                            |  37 +-
 include/linux/rculist.h                            |  48 ++
 include/linux/rcupdate.h                           |  19 +-
 include/linux/rcutiny.h                            |   1 -
 include/linux/rcutree.h                            |   1 -
 include/linux/smp.h                                |   3 +
 include/linux/smp_types.h                          |   3 +
 include/linux/uaccess.h                            |   6 +-
 include/trace/events/rcu.h                         |  54 +-
 kernel/Kconfig.preempt                             |   4 -
 kernel/Makefile                                    |   2 +
 kernel/entry/common.c                              |   2 +-
 kernel/locking/locktorture.c                       |   2 +-
 kernel/rcu/Kconfig                                 |   8 +-
 kernel/rcu/Kconfig.debug                           |  17 +-
 kernel/rcu/Makefile                                |   2 +-
 kernel/rcu/rcu_segcblist.c                         |  10 +-
 kernel/rcu/{rcuperf.c => rcuscale.c}               | 330 ++++++------
 kernel/rcu/rcutorture.c                            |  61 ++-
 kernel/rcu/refscale.c                              |   8 +-
 kernel/rcu/srcutree.c                              |  13 -
 kernel/rcu/tree.c                                  | 244 +++++----
 kernel/rcu/tree.h                                  |   2 +
 kernel/rcu/tree_exp.h                              |   6 +-
 kernel/rcu/tree_plugin.h                           |  40 +-
 kernel/rcu/tree_stall.h                            |   8 +-
 kernel/rcu/update.c                                |  13 -
 kernel/scftorture.c                                | 575 +++++++++++++++++++++
 kernel/sched/core.c                                |   6 +-
 kernel/smp.c                                       | 134 +++++
 kernel/time/tick-sched.c                           |   2 +-
 lib/Kconfig.debug                                  |  24 +-
 lib/nmi_backtrace.c                                |   6 +-
 ...rf-ftrace.sh => kvm-recheck-rcuscale-ftrace.sh} |   6 +-
 ...-recheck-rcuperf.sh => kvm-recheck-rcuscale.sh} |  14 +-
 .../selftests/rcutorture/bin/kvm-recheck-scf.sh    |  38 ++
 .../selftests/rcutorture/bin/kvm-test-1-run.sh     |  33 +-
 tools/testing/selftests/rcutorture/bin/kvm.sh      |  36 +-
 .../selftests/rcutorture/bin/parse-console.sh      |  11 +-
 .../selftests/rcutorture/configs/rcu/SRCU-t        |   1 -
 .../selftests/rcutorture/configs/rcu/SRCU-u        |   1 -
 .../selftests/rcutorture/configs/rcu/TINY01        |   1 -
 .../selftests/rcutorture/configs/rcu/TREE05        |   1 +
 .../selftests/rcutorture/configs/rcuperf/CFcommon  |   2 -
 .../configs/{rcuperf => rcuscale}/CFLIST           |   0
 .../selftests/rcutorture/configs/rcuscale/CFcommon |   2 +
 .../rcutorture/configs/{rcuperf => rcuscale}/TINY  |   0
 .../rcutorture/configs/{rcuperf => rcuscale}/TREE  |   0
 .../configs/{rcuperf => rcuscale}/TREE54           |   0
 .../configs/{rcuperf => rcuscale}/ver_functions.sh |   4 +-
 .../selftests/rcutorture/configs/scf/CFLIST        |   2 +
 .../selftests/rcutorture/configs/scf/CFcommon      |   2 +
 .../selftests/rcutorture/configs/scf/NOPREEMPT     |   9 +
 .../rcutorture/configs/scf/NOPREEMPT.boot          |   1 +
 .../selftests/rcutorture/configs/scf/PREEMPT       |   9 +
 .../rcutorture/configs/scf/ver_functions.sh        |  30 ++
 .../testing/selftests/rcutorture/doc/TINY_RCU.txt  |   5 +-
 .../selftests/rcutorture/doc/TREE_RCU-kconfig.txt  |   1 -
 tools/testing/selftests/rcutorture/doc/initrd.txt  |  36 +-
 .../selftests/rcutorture/doc/rcu-test-image.txt    |  41 +-
 .../rcutorture/formal/srcu-cbmc/src/config.h       |   1 -
 76 files changed, 1626 insertions(+), 557 deletions(-)
 rename kernel/rcu/{rcuperf.c => rcuscale.c} (64%)
 create mode 100644 kernel/scftorture.c
 rename tools/testing/selftests/rcutorture/bin/{kvm-recheck-rcuperf-ftrace.sh => kvm-recheck-rcuscale-ftrace.sh} (92%)
 rename tools/testing/selftests/rcutorture/bin/{kvm-recheck-rcuperf.sh => kvm-recheck-rcuscale.sh} (84%)
 create mode 100755 tools/testing/selftests/rcutorture/bin/kvm-recheck-scf.sh
 delete mode 100644 tools/testing/selftests/rcutorture/configs/rcuperf/CFcommon
 rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/CFLIST (100%)
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcuscale/CFcommon
 rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/TINY (100%)
 rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/TREE (100%)
 rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/TREE54 (100%)
 rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/ver_functions.sh (88%)
 create mode 100644 tools/testing/selftests/rcutorture/configs/scf/CFLIST
 create mode 100644 tools/testing/selftests/rcutorture/configs/scf/CFcommon
 create mode 100644 tools/testing/selftests/rcutorture/configs/scf/NOPREEMPT
 create mode 100644 tools/testing/selftests/rcutorture/configs/scf/NOPREEMPT.boot
 create mode 100644 tools/testing/selftests/rcutorture/configs/scf/PREEMPT
 create mode 100644 tools/testing/selftests/rcutorture/configs/scf/ver_functions.sh

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 14:14 [GIT PULL] RCU changes for v5.10 Ingo Molnar
@ 2020-10-12 20:25 ` Linus Torvalds
  2020-10-12 21:44   ` Paul E. McKenney
                      ` (2 more replies)
  2020-10-18 21:39 ` Linus Torvalds
  1 sibling, 3 replies; 13+ messages in thread

From: Linus Torvalds @ 2020-10-12 20:25 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linux Kernel Mailing List, Paul E. McKenney, Thomas Gleixner, Peter Zijlstra, Andrew Morton

On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
>
> Please pull the latest core/rcu git tree from:
>
> RCU changes for v5.10:
>
>  - Debugging for smp_call_function()
>  - RT raw/non-raw lock ordering fixes
>  - Strict grace periods for KASAN
>  - New smp_call_function() torture test
>  - Torture-test updates
>  - Documentation updates
>  - Miscellaneous fixes

I am *very* unhappy with this pull request.

It doesn't even mention the big removal of CONFIG_PREEMPT_COUNT as a
config option, which I felt was still under discussion.

I don't absolutely hate that code, and I'm willing to be convinced
about how little it matters for people who don't want to have the
counting overhead, but I refuse to pull it as some secret hidden thing
that isn't even mentioned in the pull request.

Honestly, I did not get any strong arguments for why making the
preempt count unconditional was such an important thing.

Yes, Thomas pointed me at a couple of uses that were garbage, but even
the people involved in those seemed to agree they were legacy garbage.

So why was this preempt-count thing then pushed through like this?

            Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 20:25 ` Linus Torvalds
@ 2020-10-12 21:44   ` Paul E. McKenney
  2020-10-12 21:59     ` Linus Torvalds
  2020-10-13  7:32   ` Ingo Molnar
  2020-10-13  9:16   ` Thomas Gleixner
  2 siblings, 1 reply; 13+ messages in thread

From: Paul E. McKenney @ 2020-10-12 21:44 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra, Andrew Morton

On Mon, Oct 12, 2020 at 01:25:09PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Please pull the latest core/rcu git tree from:
> >
> > RCU changes for v5.10:
> >
> >  - Debugging for smp_call_function()
> >  - RT raw/non-raw lock ordering fixes
> >  - Strict grace periods for KASAN
> >  - New smp_call_function() torture test
> >  - Torture-test updates
> >  - Documentation updates
> >  - Miscellaneous fixes
>
> I am *very* unhappy with this pull request.
>
> It doesn't even mention the big removal of CONFIG_PREEMPT_COUNT as a
> config option, which I felt was still under discussion.
>
> I don't absolutely hate that code, and I'm willing to be convinced
> about how little it matters for people who don't want to have the
> counting overhead, but I refuse to pull it as some secret hidden thing
> that isn't even mentioned in the pull request.
>
> Honestly, I did not get any strong arguments for why making the
> preempt count unconditional was such an important thing.
>
> Yes, Thomas pointed me at a couple of uses that were garbage, but even
> the people involved in those seemed to agree they were legacy garbage.
>
> So why was this preempt-count thing then pushed through like this?

So that RCU can tell, even in CONFIG_PREEMPT_NONE=y kernels, whether it
is safe to invoke the memory allocator. RCU needs to figure this out
for -rt kernels and for the CONFIG_PROVE_RAW_LOCK_NESTING Kconfig
option that was recently added to lockdep. And with this option,
lockdep has been triggering for kvfree_rcu() for a while now.

We have tried a number of alternative fixes, but they have had subtle
problems. Or, in the case of the alternative that uses a lockless
interface to the memory allocator, the not-so-subtle problem of strong
resistance from the maintainers.

In contrast, your earlier comments seemed to indicate that with a valid
use case, you would be OK with unconditional PREEMPT_COUNT, though
perhaps that was a case of excessive optimism on my part. I based my
optimism in part on your not having complained about either the patch
series or the pull request, both of which I CCed you on:

https://lore.kernel.org/lkml/20200928233041.GA23230@paulmck-ThinkPad-P72  Patch series.
https://lore.kernel.org/lkml/20201001210750.GA25287@paulmck-ThinkPad-P72  Pull request.

Of course, if you hate this approach, we can go back to browbeating the
memory-allocator maintainers. On the other hand, the approach used in
this pull request does work quite well, and I do know that there are
people who were quite tired of the kvfree_rcu() lockdep complaints
starting quite some time ago.

But either way, please let me know how you would like us to proceed.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 21:44 ` Paul E. McKenney
@ 2020-10-12 21:59   ` Linus Torvalds
  2020-10-12 23:54     ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread

From: Linus Torvalds @ 2020-10-12 21:59 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra, Andrew Morton

On Mon, Oct 12, 2020 at 2:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> So that RCU can tell, even in CONFIG_PREEMPT_NONE=y kernels, whether it
> is safe to invoke the memory allocator.

So in what situation is RCU called from random contexts that it can't
even tell?

> But either way, please let me know how you would like us to proceed.

Well, AT A MINIMUM, the pull request should damn well have made it
1000% clear that this removes a case that has existed for decades, and
that potentially makes a difference for small kernels in particular.

In fact, my personal config option - still to this day - is
CONFIG_PREEMPT_VOLUNTARY, and on the kernel I'm running,
CONFIG_PREEMPT_COUNT isn't actually set.

Because honestly, the code generation of some core code looks better
that way (in places where I've historically looked at things), and the
latency arguments against it simply aren't relevant when you have 8
cores or more.

So I don't think that "make preempt count unconditional" is some small
meaningless detail.

What is so magical about RCU allocating memory? I assume it's some
debug case? Why does that debug case then have a

	select PREEMPT_COUNT

like is done for PROVE_LOCKING?

> I based my
> optimism in part on your not having complained about either the patch
> series or the pull request, both of which I CCed you on:

I had already raised my concerns when that patch series was posted by
Thomas originally. I did not feel like I needed to re-raise them just
because the series got reposted by somebody else.

            Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread
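[Editor's note: for readers unfamiliar with the pattern Linus references, this is roughly what a debug-only `select PREEMPT_COUNT` would look like in Kconfig, modeled on how PROVE_LOCKING pulls in its prerequisites. The option name below is hypothetical, invented purely for illustration; it is not an option that exists in the tree.]

```kconfig
# Hypothetical sketch of the shape Linus is asking about: a debug
# option that enables the preempt count only when it is selected,
# instead of making the count unconditional for everyone.
config RCU_DEBUG_ALLOC_CONTEXT        # illustrative name, not a real option
	bool "Debug RCU memory-allocation contexts"
	depends on DEBUG_KERNEL
	select PREEMPT_COUNT
	help
	  Give RCU an accurate preemption count on this debug
	  configuration only, so it can tell whether its caller
	  holds a raw spinlock before invoking the allocator.
```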
* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 21:59 ` Linus Torvalds
@ 2020-10-12 23:54   ` Paul E. McKenney
  2020-10-13  0:14     ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread

From: Paul E. McKenney @ 2020-10-12 23:54 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra, Andrew Morton

On Mon, Oct 12, 2020 at 02:59:41PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 2:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > So that RCU can tell, even in CONFIG_PREEMPT_NONE=y kernels, whether it
> > is safe to invoke the memory allocator.
>
> So in what situation is RCU called from random contexts that it can't even tell?

In CONFIG_PREEMPT_NONE=y kernels, RCU has no way to tell whether or not
its caller holds a raw spinlock, which some callers do. And if its
caller holds a raw spinlock, then RCU cannot invoke the memory
allocator because the allocator acquires non-raw spinlocks, which in
turn results in lockdep splats. Making CONFIG_PREEMPT_COUNT
unconditional allows RCU to make this determination.

Please note that RCU always provides a fallback for memory-allocation
failure, but such failure needs to be rare, at least in non-OOM
situations.

The alternatives to this approach are:

1.	Lockless memory allocation, which was provided by an earlier
	patch series. Again, the relevant maintainers are not happy
	with this approach.

2.	Defer memory allocation to a clean environment. However, even
	softirq handlers are not clean enough, so this approach incurs
	a full scheduling delay. And this delay is incurred
	unconditionally in kernels built with CONFIG_PREEMPT_COUNT=n,
	even if the system has memory coming out of its ears, and even
	if RCU's caller happens to be a clean environment.

3.	A long and sad litany of subtly broken approaches.

> > But either way, please let me know how you would like us to proceed.
>
> Well, AT A MINIMUM, the pull request should damn well have made it
> 1000% clear that this removes a case that has existed for decades, and
> that potentially makes a difference for small kernels in particular.

Got it, thank you.

> In fact, my personal config option - still to this day - is
> CONFIG_PREEMPT_VOLUNTARY, and on the kernel I'm running,
> CONFIG_PREEMPT_COUNT isn't actually set.
>
> Because honestly, the code generation of some core code looks better
> that way (in places where I've historically looked at things), and the
> latency arguments against it simply aren't relevant when you have 8
> cores or more.
>
> So I don't think that "make preempt count unconditional" is some small
> meaningless detail.

Understood and agreed. And to take your point one step further, not
just CONFIG_PREEMPT_VOLUNTARY but also CONFIG_PREEMPT_NONE is in
extremely heavy use, including by my employer.

And understood on kernel text size. Raw performance is a different
story: Even microbenchmarks didn't show statistically significant
performance change from CONFIG_PREEMPT_COUNT=n, and system-level
benchmarks showed no difference whatsoever.

So would it help if CONFIG_PREEMPT_COUNT=y became unconditional only
for CONFIG_SMP=y kernels? RCU does have other options for
CONFIG_SMP=n. Or do your small-kernel concerns extend beyond
single-CPU microcontrollers?

> What is so magical about RCU allocating memory? I assume it's some
> debug case? Why does that debug case then have a
>
>	select PREEMPT_COUNT
>
> like is done for PROVE_LOCKING?

Sadly, no, it is not just a debug case. This memory allocation enables
a cache-locality optimization to callback processing that reduces cache
misses. This optimization is currently implemented only for
kvfree_rcu(), where it reduces callback-invocation-time cache misses by
a factor of eight on typical x86 systems, which produces decent
system-level benefits. So it would be good to also apply this
optimization to call_rcu().

> > I based my
> > optimism in part on your not having complained about either the patch
> > series or the pull request, both of which I CCed you on:
>
> I had already raised my concerns when that patch series was posted by
> Thomas originally. I did not feel like I needed to re-raise them just
> because the series got reposted by somebody else.

OK, I did not know, but I do know it now!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 23:54 ` Paul E. McKenney
@ 2020-10-13  0:14   ` Linus Torvalds
  2020-10-13  2:41     ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread

From: Linus Torvalds @ 2020-10-13 0:14 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra, Andrew Morton

On Mon, Oct 12, 2020 at 4:54 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> In CONFIG_PREEMPT_NONE=y kernels, RCU has no way to tell whether or
> not its caller holds a raw spinlock, which some callers do.

Only kfree_rcu()? (And apparently eventually call_rcu())?

And since we have lockdep, and it warns about it, and raw spinlocks
are really really rare, do we really need to then disable this simple
optimization for everybody else?

We have been very successful with "don't do that then" rules.

Eg, you cannot do normal memory allocations inside a spinlock (or you
have to mark them with GFP_ATOMIC, and not all allocations can be
marked as such), and this has been the case basically forever. And we
have debug code and tools that will check that.

Why is it impossible to just say "you can't do kfree_rcu() while
holding a raw spinlock"?

Particularly for something like kfree_rcu() and particularly if it's
just about raw spinlocks, it would seem to be very natural to just say
"just delay freeing it until after you've released the raw spinlock".

Because I sure hope that you don't find raw spinlock users in random
places. It should be stuff like core scheduling, RCU itself, etc.

> Making CONFIG_PREEMPT_COUNT unconditional allows
> RCU to make this determination.

I understand _that_ part, but the thing I find objectionable is how a
small piece of code seems to want to change the rules we have had in
the kernel since basically day #1.

(Ok, so the preempt count itself is much more recent than "day #1",
but the basic non-counting spinlocks do go back to very early in the
SMP stages).

            Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [GIT PULL] RCU changes for v5.10
  2020-10-13  0:14 ` Linus Torvalds
@ 2020-10-13  2:41   ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread

From: Paul E. McKenney @ 2020-10-13 2:41 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra, Andrew Morton

On Mon, Oct 12, 2020 at 05:14:42PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 4:54 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > In CONFIG_PREEMPT_NONE=y kernels, RCU has no way to tell whether or
> > not its caller holds a raw spinlock, which some callers do.
>
> Only kfree_rcu()? (And apparently eventually call_rcu())?

Yes. The other RCU APIs either only use raw spinlocks themselves on
the one hand or must be called from schedulable contexts on the other.

> And since we have lockdep, and it warns about it, and raw spinlocks
> are really really rare, do we really need to then disable this simple
> optimization for everybody else?
>
> We have been very successful with "don't do that then" rules.
>
> Eg, you cannot do normal memory allocations inside a spinlock (or you
> have to mark them with GFP_ATOMIC, and not all allocations can be
> marked as such), and this has been the case basically forever. And we
> have debug code and tools that will check that.
>
> Why is it impossible to just say "you can't do kfree_rcu() while
> holding a raw spinlock"?
>
> Particularly for something like kfree_rcu() and particularly if it's
> just about raw spinlocks, it would seem to be very natural to just say
> "just delay freeing it until after you've released the raw spinlock".
>
> Because I sure hope that you don't find raw spinlock users in random
> places. It should be stuff like core scheduling, RCU itself, etc.

True enough, but core stuff does use RCU, some of it while holding raw
spinlocks. And you are right that "just don't do that, defer it
instead" is often very effective.

In fact, I defer wakeups within RCU in order to avoid deadlocks with
the scheduler. It is simple in concept, and it does work, but it is
also a disproportionate source of bugs. Most of which rcutorture finds
in the safety and comfort of my own system, thankfully, but some do
escape. Maybe I am overreacting, but I have been burned often enough
that I feel the need to avoid this.

Plus I did oversimplify. CONFIG_PREEMPT_COUNT also allows the
call_rcu() portion to avoid deadlocks with the current non-lockless
memory allocator. So if the answer from you on global
CONFIG_PREEMPT_COUNT=y and from the MM guys on lockless allocation is
irrevocably "@#$@#$ NO!" or the current-day bowdlerized equivalent, I
will take the scheduling delays in the shorts and defer allocation.

> > Making CONFIG_PREEMPT_COUNT unconditional allows
> > RCU to make this determination.
>
> I understand _that_ part, but the thing I find objectionable is how a
> small piece of code seems to want to change the rules we have had in
> the kernel since basically day #1.
>
> (Ok, so the preempt count itself is much more recent than "day #1",
> but the basic non-counting spinlocks do go back to very early in the
> SMP stages).

Understood, a count-free CONFIG_PREEMPT_NONE has been in place in the
Linux kernel for an extremely long time. And I also understand that
adding CONFIG_PREEMPT_COUNT=y everywhere is a pervasive change that is
not to be taken lightly.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 20:25 ` Linus Torvalds
  2020-10-12 21:44   ` Paul E. McKenney
@ 2020-10-13  7:32   ` Ingo Molnar
  2020-10-13  9:16   ` Thomas Gleixner
  2 siblings, 0 replies; 13+ messages in thread

From: Ingo Molnar @ 2020-10-13 7:32 UTC (permalink / raw)
To: Linus Torvalds
Cc: Linux Kernel Mailing List, Paul E. McKenney, Thomas Gleixner, Peter Zijlstra, Andrew Morton

* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Please pull the latest core/rcu git tree from:
> >
> > RCU changes for v5.10:
> >
> >  - Debugging for smp_call_function()
> >  - RT raw/non-raw lock ordering fixes
> >  - Strict grace periods for KASAN
> >  - New smp_call_function() torture test
> >  - Torture-test updates
> >  - Documentation updates
> >  - Miscellaneous fixes
>
> I am *very* unhappy with this pull request.
>
> It doesn't even mention the big removal of CONFIG_PREEMPT_COUNT as a
> config option, which I felt was still under discussion.

Not mentioning the unconditional PREEMPT_COUNT enabling aspect was 100%
my fault in summarizing the changes insufficiently, as I (mistakenly)
thought them to be uncontroversial. My apologies for that!

Here's a second attempt to properly justify these changes:

Regarding the performance aspect of the change, I was relying on these
performance measurements:

  "Freshly conducted benchmarks did not reveal any measurable impact
   from enabling preempt count unconditionally. On kernels with
   CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY the preempt count is
   only incremented and decremented but the result of the decrement is
   not tested. Contrary to that enabling CONFIG_PREEMPT which tests the
   result has a small but measurable impact due to the conditional
   branch/call."

FWIW, to inject some hard numbers into this discussion, here's also the
code generation impact of an unconditional PREEMPT_COUNT, on
x86-defconfig:

     text      data     bss      filename
     19675937  5591036  1433672  vmlinux.ubuntu.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
     19682382  5590964  1425480  vmlinux.ubuntu.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")

So this is a pretty small, +0.03% increase (+6k) in generated code in
the core kernel, and it doesn't add widespread new control dependencies
either.

I also measured the core kernel code generation impact on the kernel
config from a major Linux distribution that uses PREEMPT_VOLUNTARY=y
(Ubuntu):

  kepler:~/tip> grep PREEMPT .config
  # CONFIG_PREEMPT_NONE is not set
  CONFIG_PREEMPT_VOLUNTARY=y
  # CONFIG_PREEMPT is not set
  CONFIG_PREEMPT_COUNT=y
  CONFIG_PREEMPT_NOTIFIERS=y

     text      data      bss      filename
     15754341  13790786  5242880  vmlinux.ubuntu.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
     15754790  13791018  5242880  vmlinux.ubuntu.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")
     15754771  13791018  5242880  vmlinux.ubuntu.full_cleanups    # 849b9c5446cc: ("kvfree_rcu(): Fix ifnullfree.cocci warnings")

In this test the changes result in very little generated code increase
in the core kernel, just +449 bytes, or +0.003%.

In fact the impact was so low on this config that I initially
disbelieved it and double-checked the result and re-ran the build with
all =m's turned into =y's, to get a whole-kernel measurement of the
generated code impact:

     text      data      bss       filename
     84594448  61819613  42000384  vmlinux.ubuntu.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
     84594129  61819777  42000384  vmlinux.ubuntu.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")

Note how the full ~84 MB image actually *shrunk*, possibly due to
random function & section alignment noise.

So to get a truly sensitive measurement of the impact of the
PREEMPT_COUNT change I built with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, to get
tight instruction packing and no alignment padding artifacts:

     text      data      bss       filename
     69460329  60932573  40411136  vmlinux.ubuntu.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
     69460739  60936853  40411136  vmlinux.ubuntu.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")

This shows a 410 bytes (+0.0005%) increase.

( Side note: it's rather impressive that -Os saves 21% of text size -
  if only GCC wasn't so stupid with the final 2-3% size
  optimizations... )

So there's even less relative impact on the whole 84 MB kernel image -
modules don't do much direct preempt_count manipulation.

Just for completeness' sake I re-ran the original defconfig build as
well, this time with -Os:

     text      data     bss      filename
     16091696  5565988  2928696  vmlinux.defconfig.Os.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
     16095525  5570156  2928696  vmlinux.defconfig.Os.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")

3.8k, or +0.025% - similar to the initial +0.03% result.

So even though I'm normally fiercely anti-bloat, if we combine the
performance and code generation measurements with these maintainability
arguments:

  "It's about time to make essential functionality of the kernel
   consistent across the various preemption models. Enable
   CONFIG_PREEMPT_COUNT unconditionally. Follow up changes will remove
   the #ifdeffery and remove the config option at the end."

I think the PREEMPT_COUNT=y change to reduce the schism between the
various preemption models is IMHO justified - and reducing the code
base distance to -rt is the icing on the cake.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 20:25   ` Linus Torvalds
  2020-10-12 21:44     ` Paul E. McKenney
  2020-10-13  7:32     ` Ingo Molnar
@ 2020-10-13  9:16   ` Thomas Gleixner
  2 siblings, 0 replies; 13+ messages in thread

From: Thomas Gleixner @ 2020-10-13 9:16 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Linux Kernel Mailing List, Paul E. McKenney, Peter Zijlstra, Andrew Morton

Linus,

On Mon, Oct 12 2020 at 13:25, Linus Torvalds wrote:
> I don't absolutely hate that code, and I'm willing to be convinced
> about how little it matters for people who don't want to have the
> counting overhead, but I refuse to pull it as some secret hidden thing
> that isn't even mentioned in the pull request.
>
> Honestly, I did not get any strong arguments for why making the
> preempt count unconditional was such an important thing.
>
> Yes, Thomas pointed me at a couple of uses that were garbage, but even
> the people involved in those seemed to agree they were legacy garbage.

Yes, we agreed on those parts and also started to act on your request to
remove those constructs where non-core code changes behaviour depending
on some context check, by either separating the code paths or handing
the condition in from the call sites.

You said in that original discussion:

> Of course core code can (and will) look at things like
>
>         if (in_interrupt())
>                 .. schedule work asynchronously ..
>
> because core code ends up being called from odd places, and code like
> that is expected to have understanding of the rules it plays with.
>
> But something like RCU is a very different beast from some "walk the
> scatter-gather list" code.
>
> RCU does its work in the background, and works with lots of different
> things. And it's so core and used everywhere that it knows about these
> things. I mean, we literally have special code explicitly to let RCU
> know "we entered kernel context now".
As Paul was facing the problem of not knowing the exact context on
PREEMPT_NONE kernels, which can - aside from the raw spinlock issue -
cause deadlocks vs. the page allocator, we had the impression that
solving this particular itch with a consistent preempt count was
justified.

As you did not voice objections to Paul's series, which incorporated my
initial work, I was encouraging Paul to go ahead with this. Of course,
if I misinterpreted your silence on that patch submission, I need to
fine-tune my void decoder.

That said, I fully agree that this all should have been part of the pull
request message, and in hindsight I should have reminded Ingo to be very
careful about that when we were splitting up the pull request pile of
the tip tree.

I hope that clarifies the context some more and makes the whole thing
more palatable.

Thanks,

        tglx

/me goes back to stare at in_*() constructs...
* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 14:14 [GIT PULL] RCU changes for v5.10 Ingo Molnar
  2020-10-12 20:25 ` Linus Torvalds
@ 2020-10-18 21:39 ` Linus Torvalds
  2020-10-19  3:24   ` Paul E. McKenney
  2020-10-19  7:31   ` Ingo Molnar
  1 sibling, 2 replies; 13+ messages in thread

From: Linus Torvalds @ 2020-10-18 21:39 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linux Kernel Mailing List, Paul E. McKenney, Thomas Gleixner, Peter Zijlstra, Andrew Morton

On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
>
> Please pull the latest core/rcu git tree from:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-rcu-2020-10-12

I've pulled everything but that last merge and the PREEMPT_COUNT stuff
that came with it.

When Paul asked whether it was ok for RCU to use preempt_count() and I
answered in the affirmative, I didn't mean it in the sense of "RCU
wants to force it on everybody else too".

I'm pretty convinced that the proper fix is to simply make sure that
rcu_free() and friends aren't run under any raw spinlocks. So even if
the cost of preempt-count isn't that noticeable, there just isn't a
reason for RCU to say "screw everybody else, I want this" when there
are other alternatives.

                Linus
* Re: [GIT PULL] RCU changes for v5.10
  2020-10-18 21:39 ` Linus Torvalds
@ 2020-10-19  3:24   ` Paul E. McKenney
  2020-10-19 16:51     ` Linus Torvalds
  1 sibling, 1 reply; 13+ messages in thread

From: Paul E. McKenney @ 2020-10-19 3:24 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra, Andrew Morton

On Sun, Oct 18, 2020 at 02:39:56PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Please pull the latest core/rcu git tree from:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-rcu-2020-10-12
>
> I've pulled everything but that last merge and the PREEMPT_COUNT stuff
> that came with it.
>
> When Paul asked whether it was ok for RCU to use preempt_count() and I
> answered in the affirmative, I didn't mean it in the sense of "RCU
> wants to force it on everybody else too".
>
> I'm pretty convinced that the proper fix is to simply make sure that
> rcu_free() and friends aren't run under any raw spinlocks. So even if
> the cost of preempt-count isn't that noticeable, there just isn't a
> reason for RCU to say "screw everybody else, I want this" when there
> are other alternatives.

Thank you for pulling the other branches.

On CONFIG_PREEMPT_COUNT, got it. It would be OK for RCU to use
preempt_count() for some debugging or specialty kernel, but not across
the board. Thank you for bearing with me on this one.

There is more to it than just raw spinlocks, but regardless we will go
back to the drawing board and come up with a less intrusive fix for the
v5.11 merge window.

							Thanx, Paul
* Re: [GIT PULL] RCU changes for v5.10
  2020-10-19  3:24 ` Paul E. McKenney
@ 2020-10-19 16:51   ` Linus Torvalds
  0 siblings, 0 replies; 13+ messages in thread

From: Linus Torvalds @ 2020-10-19 16:51 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra, Andrew Morton

On Sun, Oct 18, 2020 at 8:24 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On CONFIG_PREEMPT_COUNT, got it. It would be OK for RCU to use
> preempt_count() for some debugging or specialty kernel, but not across
> the board.

Right - that was what I thought you were asking originally.

I don't think a driver or random piece of code like that should ever
use "preempt_count()" on its own - partly because the rules are subtle,
but partly simply because drivers have no business with those kinds of
low-level things.

But yeah, for some core stuff like RCU, using preempt_count() for
debugging etc makes sense. Just not to change _behavior_, because
preempt_count on its own is almost entirely meaningless. It's just one
(local) part of so much state.

Again, partly because preempt count isn't necessarily always meaningful
due to config settings, but partly because there are just so many other
things like "are interrupts disabled" or "are we in an NMI context" or
whatever.

And in some odd situation, depending on exactly what you do, maybe
preempt-count can be exactly what you need, because you know everything
else about the state statically.

"preempt_enable()" obviously is one such thing - the whole point is "if
CONFIG_PREEMPT_COUNT is on, then the _semantics_ of this is 'decrease
preempt count, and if it goes to zero and we should reschedule, do
that'".

So it's not that preempt_count() is meaningless, but it's such a
specialized thing that 99.9% of all code really cannot and shouldn't
use it.

                Linus
* Re: [GIT PULL] RCU changes for v5.10
  2020-10-18 21:39 ` Linus Torvalds
@ 2020-10-19  7:31   ` Ingo Molnar
  1 sibling, 0 replies; 13+ messages in thread

From: Ingo Molnar @ 2020-10-19 7:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Linux Kernel Mailing List, Paul E. McKenney, Thomas Gleixner, Peter Zijlstra, Andrew Morton

* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Please pull the latest core/rcu git tree from:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-rcu-2020-10-12
>
> I've pulled everything but that last merge and the PREEMPT_COUNT
> stuff that came with it.
>
> When Paul asked whether it was ok for RCU to use preempt_count() and
> I answered in the affirmative, I didn't mean it in the sense of "RCU
> wants to force it on everybody else too".
>
> I'm pretty convinced that the proper fix is to simply make sure that
> rcu_free() and friends aren't run under any raw spinlocks. So even
> if the cost of preempt-count isn't that noticeable, there just isn't
> a reason for RCU to say "screw everybody else, I want this" when
> there are other alternatives.

That's certainly true - thanks for catching this & sorting it out from
the bigger pull request!

Thanks,

	Ingo
end of thread, other threads:[~2020-10-19 16:52 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-12 14:14 [GIT PULL] RCU changes for v5.10 Ingo Molnar
2020-10-12 20:25 ` Linus Torvalds
2020-10-12 21:44   ` Paul E. McKenney
2020-10-12 21:59     ` Linus Torvalds
2020-10-12 23:54       ` Paul E. McKenney
2020-10-13  0:14         ` Linus Torvalds
2020-10-13  2:41           ` Paul E. McKenney
2020-10-13  7:32   ` Ingo Molnar
2020-10-13  9:16   ` Thomas Gleixner
2020-10-18 21:39 ` Linus Torvalds
2020-10-19  3:24   ` Paul E. McKenney
2020-10-19 16:51     ` Linus Torvalds
2020-10-19  7:31   ` Ingo Molnar