linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] RCU changes for v5.10
@ 2020-10-12 14:14 Ingo Molnar
  2020-10-12 20:25 ` Linus Torvalds
  2020-10-18 21:39 ` Linus Torvalds
  0 siblings, 2 replies; 13+ messages in thread
From: Ingo Molnar @ 2020-10-12 14:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Paul E. McKenney, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton

Linus,

Please pull the latest core/rcu git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-rcu-2020-10-12

   # HEAD: c6de896fa0a4546c799c86513d99bd011b4a6177 Merge branch 'rcu/fix-rt' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu

RCU changes for v5.10:

 - Debugging for smp_call_function()
 - RT raw/non-raw lock ordering fixes
 - Strict grace periods for KASAN
 - New smp_call_function() torture test
 - Torture-test updates
 - Documentation updates
 - Miscellaneous fixes

 Thanks,

	Ingo

------------------>
Alexander A. Klimov (1):
      rcutorture: Replace HTTP links with HTTPS ones

Colin Ian King (1):
      refperf: Avoid null pointer dereference when buf fails to allocate

Joel Fernandes (Google) (6):
      rcu/trace: Print negative GP numbers correctly
      rcu/trace: Use gp_seq_req in acceleration's rcu_grace_period tracepoint
      rcu: Clarify comments about FQS loop reporting quiescent states
      rcu: Make FQS more aggressive in complaining about offline CPUs
      rcutorture: Output number of elapsed grace periods
      rcu/segcblist: Prevent useless GP start if no CBs to accelerate

Madhuparna Bhowmik (2):
      rculist: Introduce list/hlist_for_each_entry_srcu() macros
      kvm: mmu: page_track: Fix RCU list API usage

Neeraj Upadhyay (2):
      rcu/tree: Force quiescent state on callback overload
      rcu/tree: Remove CONFIG_PREMPT_RCU check in force_qs_rnp()

Paul E. McKenney (56):
      lib: Add backtrace_idle parameter to force backtrace of idle CPUs
      rcu: Remove KCSAN stubs
      rcu: Remove KCSAN stubs from update.c
      srcu: Remove KCSAN stubs
      rcu: Initialize at declaration time in rcu_exp_handler()
      nocb: Clarify RCU nocb CPU error message
      nocb: Remove show_rcu_nocb_state() false positive printout
      rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_divisor
      rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_resched_ns
      rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_kick_kthreads
      rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_cpu_stall_ftrace_dump
      rcu: Move rcu_cpu_started per-CPU variable to rcu_data
      rcu/nocb: Add a warning for non-GP kthread running GP code
      rcu: Remove unused __rcu_is_watching() function
      scftorture: Add smp_call_function() torture test
      torture: Declare parse-console.sh independence from rcutorture
      torture: Add scftorture to the rcutorture scripting
      scftorture: Implement weighted primitive selection
      tick-sched: Clarify "NOHZ: local_softirq_pending" warning
      scftorture: Summarize per-thread statistics
      scftorture: Add smp_call_function_single() memory-ordering checks
      scftorture: Add smp_call_function_many() memory-ordering checks
      scftorture: Add smp_call_function() memory-ordering checks
      scftorture: Consolidate scftorture_invoke_one() check and kfree()
      scftorture: Consolidate scftorture_invoke_one() scf_check initialization
      scftorture: Flag errors in torture-compatible manner
      scftorture: Prevent compiler from reducing race probabilities
      scftorture: Check unexpected "switch" statement value
      scftorture: Block scftorture_invoker() kthreads for offline CPUs
      scftorture: Adapt memory-ordering test to UP operation
      scftorture: Add cond_resched() to test loop
      rcuperf: Change rcuperf to rcuscale
      rcu: Add Kconfig option for strict RCU grace periods
      rcu: Reduce leaf fanout for strict RCU grace periods
      rcu: Restrict default jiffies_till_first_fqs for strict RCU GPs
      rcu: Force DEFAULT_RCU_BLIMIT to 1000 for strict RCU GPs
      rcu: Always set .need_qs from __rcu_read_lock() for strict GPs
      rcu: Do full report for .need_qs for strict GPs
      rcu: Attempt QS when CPU discovers GP for strict GPs
      rcu: IPI all CPUs at GP start for strict GPs
      rcu: IPI all CPUs at GP end for strict GPs
      rcu: Provide optional RCU-reader exit delay for strict GPs
      rcu: Execute RCU reader shortly after rcu_core for strict GPs
      rcu: Report QS for outermost PREEMPT=n rcu_read_unlock() for strict GPs
      rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
      rcutorture: Remove KCSAN stubs
      torture: Update initrd documentation
      rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
      torture: Add kvm.sh --help and update help message
      rcutorture: Properly set rcu_fwds for OOM handling
      rcutorture: Properly synchronize with OOM notifier
      rcutorture: Hoist OOM registry up one level
      rcutorture: Allow pointer leaks to test diagnostic code
      torture: Add gdb support
      smp: Add source and destination CPUs to __call_single_data
      kernel/smp: Provide CSD lock timeout diagnostics

Paul Gortmaker (1):
      torture: document --allcpus argument added to the kvm.sh script

Randy Dunlap (2):
      doc: Drop doubled words from RCU Data-Structures.rst
      doc: Drop doubled words from RCU requirements documentation

Thomas Gleixner (13):
      lib/debug: Remove pointless ARCH_NO_PREEMPT dependencies
      preempt: Make preempt count unconditional
      preempt: Cleanup PREEMPT_COUNT leftovers
      lockdep: Cleanup PREEMPT_COUNT leftovers
      mm/pagemap: Cleanup PREEMPT_COUNT leftovers
      locking/bitspinlock: Cleanup PREEMPT_COUNT leftovers
      uaccess: Cleanup PREEMPT_COUNT leftovers
      sched: Cleanup PREEMPT_COUNT leftovers
      ARM: Cleanup PREEMPT_COUNT leftovers
      xtensa: Cleanup PREEMPT_COUNT leftovers
      drm/i915: Cleanup PREEMPT_COUNT leftovers
      rcutorture: Cleanup PREEMPT_COUNT leftovers
      preempt: Remove PREEMPT_COUNT from Kconfig

Tobias Klauser (2):
      docs: Fix typo in synchronize_rcu() function name
      rcu: Fix kerneldoc comments in rcupdate.h

Uladzislau Rezki (Sony) (1):
      rcu/tree: Allocate a page when caller is preemptible

Wei Yongjun (3):
      scftorture: Make symbol 'scf_torture_rand' static
      locktorture: Make function torture_percpu_rwsem_init() static
      smp: Make symbol 'csd_bug_count' static

Zqiang (1):
      rcu: Shrink each possible cpu krcp

kernel test robot (1):
      kvfree_rcu(): Fix ifnullfree.cocci warnings


 .../RCU/Design/Data-Structures/Data-Structures.rst |   2 +-
 .../RCU/Design/Requirements/Requirements.rst       |   4 +-
 Documentation/RCU/whatisRCU.rst                    |   2 +-
 Documentation/admin-guide/kernel-parameters.txt    | 153 +++++-
 MAINTAINERS                                        |   3 +-
 arch/arm/include/asm/assembler.h                   |  11 -
 arch/arm/kernel/iwmmxt.S                           |   2 -
 arch/arm/mach-ep93xx/crunch-bits.S                 |   2 -
 arch/x86/kvm/mmu/page_track.c                      |   6 +-
 arch/xtensa/kernel/entry.S                         |   2 +-
 drivers/gpu/drm/i915/Kconfig.debug                 |   1 -
 drivers/gpu/drm/i915/i915_utils.h                  |   3 +-
 include/linux/bit_spinlock.h                       |   4 +-
 include/linux/lockdep.h                            |   6 +-
 include/linux/pagemap.h                            |   4 +-
 include/linux/preempt.h                            |  37 +-
 include/linux/rculist.h                            |  48 ++
 include/linux/rcupdate.h                           |  19 +-
 include/linux/rcutiny.h                            |   1 -
 include/linux/rcutree.h                            |   1 -
 include/linux/smp.h                                |   3 +
 include/linux/smp_types.h                          |   3 +
 include/linux/uaccess.h                            |   6 +-
 include/trace/events/rcu.h                         |  54 +-
 kernel/Kconfig.preempt                             |   4 -
 kernel/Makefile                                    |   2 +
 kernel/entry/common.c                              |   2 +-
 kernel/locking/locktorture.c                       |   2 +-
 kernel/rcu/Kconfig                                 |   8 +-
 kernel/rcu/Kconfig.debug                           |  17 +-
 kernel/rcu/Makefile                                |   2 +-
 kernel/rcu/rcu_segcblist.c                         |  10 +-
 kernel/rcu/{rcuperf.c => rcuscale.c}               | 330 ++++++------
 kernel/rcu/rcutorture.c                            |  61 ++-
 kernel/rcu/refscale.c                              |   8 +-
 kernel/rcu/srcutree.c                              |  13 -
 kernel/rcu/tree.c                                  | 244 +++++----
 kernel/rcu/tree.h                                  |   2 +
 kernel/rcu/tree_exp.h                              |   6 +-
 kernel/rcu/tree_plugin.h                           |  40 +-
 kernel/rcu/tree_stall.h                            |   8 +-
 kernel/rcu/update.c                                |  13 -
 kernel/scftorture.c                                | 575 +++++++++++++++++++++
 kernel/sched/core.c                                |   6 +-
 kernel/smp.c                                       | 134 +++++
 kernel/time/tick-sched.c                           |   2 +-
 lib/Kconfig.debug                                  |  24 +-
 lib/nmi_backtrace.c                                |   6 +-
 ...rf-ftrace.sh => kvm-recheck-rcuscale-ftrace.sh} |   6 +-
 ...-recheck-rcuperf.sh => kvm-recheck-rcuscale.sh} |  14 +-
 .../selftests/rcutorture/bin/kvm-recheck-scf.sh    |  38 ++
 .../selftests/rcutorture/bin/kvm-test-1-run.sh     |  33 +-
 tools/testing/selftests/rcutorture/bin/kvm.sh      |  36 +-
 .../selftests/rcutorture/bin/parse-console.sh      |  11 +-
 .../selftests/rcutorture/configs/rcu/SRCU-t        |   1 -
 .../selftests/rcutorture/configs/rcu/SRCU-u        |   1 -
 .../selftests/rcutorture/configs/rcu/TINY01        |   1 -
 .../selftests/rcutorture/configs/rcu/TREE05        |   1 +
 .../selftests/rcutorture/configs/rcuperf/CFcommon  |   2 -
 .../configs/{rcuperf => rcuscale}/CFLIST           |   0
 .../selftests/rcutorture/configs/rcuscale/CFcommon |   2 +
 .../rcutorture/configs/{rcuperf => rcuscale}/TINY  |   0
 .../rcutorture/configs/{rcuperf => rcuscale}/TREE  |   0
 .../configs/{rcuperf => rcuscale}/TREE54           |   0
 .../configs/{rcuperf => rcuscale}/ver_functions.sh |   4 +-
 .../selftests/rcutorture/configs/scf/CFLIST        |   2 +
 .../selftests/rcutorture/configs/scf/CFcommon      |   2 +
 .../selftests/rcutorture/configs/scf/NOPREEMPT     |   9 +
 .../rcutorture/configs/scf/NOPREEMPT.boot          |   1 +
 .../selftests/rcutorture/configs/scf/PREEMPT       |   9 +
 .../rcutorture/configs/scf/ver_functions.sh        |  30 ++
 .../testing/selftests/rcutorture/doc/TINY_RCU.txt  |   5 +-
 .../selftests/rcutorture/doc/TREE_RCU-kconfig.txt  |   1 -
 tools/testing/selftests/rcutorture/doc/initrd.txt  |  36 +-
 .../selftests/rcutorture/doc/rcu-test-image.txt    |  41 +-
 .../rcutorture/formal/srcu-cbmc/src/config.h       |   1 -
 76 files changed, 1626 insertions(+), 557 deletions(-)
 rename kernel/rcu/{rcuperf.c => rcuscale.c} (64%)
 create mode 100644 kernel/scftorture.c
 rename tools/testing/selftests/rcutorture/bin/{kvm-recheck-rcuperf-ftrace.sh => kvm-recheck-rcuscale-ftrace.sh} (92%)
 rename tools/testing/selftests/rcutorture/bin/{kvm-recheck-rcuperf.sh => kvm-recheck-rcuscale.sh} (84%)
 create mode 100755 tools/testing/selftests/rcutorture/bin/kvm-recheck-scf.sh
 delete mode 100644 tools/testing/selftests/rcutorture/configs/rcuperf/CFcommon
 rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/CFLIST (100%)
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcuscale/CFcommon
 rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/TINY (100%)
 rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/TREE (100%)
 rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/TREE54 (100%)
 rename tools/testing/selftests/rcutorture/configs/{rcuperf => rcuscale}/ver_functions.sh (88%)
 create mode 100644 tools/testing/selftests/rcutorture/configs/scf/CFLIST
 create mode 100644 tools/testing/selftests/rcutorture/configs/scf/CFcommon
 create mode 100644 tools/testing/selftests/rcutorture/configs/scf/NOPREEMPT
 create mode 100644 tools/testing/selftests/rcutorture/configs/scf/NOPREEMPT.boot
 create mode 100644 tools/testing/selftests/rcutorture/configs/scf/PREEMPT
 create mode 100644 tools/testing/selftests/rcutorture/configs/scf/ver_functions.sh


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 14:14 [GIT PULL] RCU changes for v5.10 Ingo Molnar
@ 2020-10-12 20:25 ` Linus Torvalds
  2020-10-12 21:44   ` Paul E. McKenney
                     ` (2 more replies)
  2020-10-18 21:39 ` Linus Torvalds
  1 sibling, 3 replies; 13+ messages in thread
From: Linus Torvalds @ 2020-10-12 20:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linux Kernel Mailing List, Paul E. McKenney, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
>
> Please pull the latest core/rcu git tree from:
>
> RCU changes for v5.10:
>
>  - Debugging for smp_call_function()
>  - RT raw/non-raw lock ordering fixes
>  - Strict grace periods for KASAN
>  - New smp_call_function() torture test
>  - Torture-test updates
>  - Documentation updates
>  - Miscellaneous fixes

I am *very* unhappy with this pull request.

It doesn't even mention the big removal of CONFIR_PREEMPT, that I felt
was still under discussion.

I don't absolutely hate that code, and I'm willing to be convinced
about how little it matter for people who don't want to have the
counting overhead, but I refuse to pull it as some secret hidden thing
that isn't even mentioned in the pull request.

Honestly, I did not get any strong arguments for why making the
preempt count unconditional was such an important thing.

Yes, Thomas pointed me at a couple of uses that were garbage, but even
the people involved in those seemed to agree they were legacy garbage.

So why was this preempt-count thing then pushed through like this?

                 Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 20:25 ` Linus Torvalds
@ 2020-10-12 21:44   ` Paul E. McKenney
  2020-10-12 21:59     ` Linus Torvalds
  2020-10-13  7:32   ` Ingo Molnar
  2020-10-13  9:16   ` Thomas Gleixner
  2 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2020-10-12 21:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Mon, Oct 12, 2020 at 01:25:09PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Please pull the latest core/rcu git tree from:
> >
> > RCU changes for v5.10:
> >
> >  - Debugging for smp_call_function()
> >  - RT raw/non-raw lock ordering fixes
> >  - Strict grace periods for KASAN
> >  - New smp_call_function() torture test
> >  - Torture-test updates
> >  - Documentation updates
> >  - Miscellaneous fixes
> 
> I am *very* unhappy with this pull request.
> 
> It doesn't even mention the big removal of CONFIR_PREEMPT, that I felt
> was still under discussion.
> 
> I don't absolutely hate that code, and I'm willing to be convinced
> about how little it matter for people who don't want to have the
> counting overhead, but I refuse to pull it as some secret hidden thing
> that isn't even mentioned in the pull request.
> 
> Honestly, I did not get any strong arguments for why making the
> preempt count unconditional was such an important thing.
> 
> Yes, Thomas pointed me at a couple of uses that were garbage, but even
> the people involved in those seemed to agree they were legacy garbage.
> 
> So why was this preempt-count thing then pushed through like this?

So that RCU can tell, even in CONFIG_PREEMPT_NONE=y kernels, whether it
is safe to invoke the memory allocator.  RCU needs to figure this out
for -rt kernels and for the CONFIG_PROVE_RAW_LOCK_NESTING Kconfig option
that was recently added to lockdep.  And with this option, lockdep has
been triggering for kvfree_rcu() for awhile now.

We have tried a number of alternative fixes, but they have had subtle
problems.  Or, in the case of the alternative that uses a lockless
interface to the memory allocator, the not-so-subtle problem of strong
resistance from the maintainers.

In contrast, your earlier comments seemed to indicate that with a valid
use case, you would be OK with unconditional PREEMPT_COUNT, though
perhaps that was a case of excessive optimism on my part.  I based my
optimism in part on your not having complained about either the patch
series or the pull request, both of which I CCed you on:

https://lore.kernel.org/lkml/20200928233041.GA23230@paulmck-ThinkPad-P72
	Patch series.

https://lore.kernel.org/lkml/20201001210750.GA25287@paulmck-ThinkPad-P72
	Pull request.

Of course, if you hate this approach, we can go back to browbeating the
memory-allocator maintainers.  On the other hand, the approach used in
this pull request does work quite well and I do know that there are
people who were quite tired of the kvfree_rcu() lockdep complaints
starting quite some time ago.

But either way, please let me know how you would like us to proceed.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 21:44   ` Paul E. McKenney
@ 2020-10-12 21:59     ` Linus Torvalds
  2020-10-12 23:54       ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2020-10-12 21:59 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Mon, Oct 12, 2020 at 2:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> So that RCU can tell, even in CONFIG_PREEMPT_NONE=y kernels, whether it
> is safe to invoke the memory allocator.

So in what situation is RCU called from random contexts that it can't even tell?

> But either way, please let me know how you would like us to proceed.

Well, AT A MINIMUM, the pull request should damn well have made it
1000% clear that this removes a case that has existed for decades, and
that potentially makes a difference for small kernels in particular.

In fact, my personal config option - still to this day - is
CONFIG_PREEMPT_VOLUNTARY and on the kernel I'm running,
CONFIG_PREEMPT_COUNT isn't actually set.

Because honestly, the code generation of some core code looks better
that way (in places where I've historically looked at things), and the
latency arguments against it simply aren't relevant when you have 8
cores or more.

So i don't think that "make preempt count unconditional" is some small
meaningless detail.

What is so magical about RCU allocating memory? I assume it's some
debug case? Why does that debug case then have a

    select PREEMPT_COUNT

like is done for PROVE_LOCKING?

> I based my
> optimism in part on your not having complained about either the patch
> series or the pull request, both of which I CCed you on:

I had already raised my concerns when that patch series was posted by
Thomas originally. I did not feel like I needed to re-raise them just
because the series got reposted by somebody else.

                Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 21:59     ` Linus Torvalds
@ 2020-10-12 23:54       ` Paul E. McKenney
  2020-10-13  0:14         ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2020-10-12 23:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Mon, Oct 12, 2020 at 02:59:41PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 2:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > So that RCU can tell, even in CONFIG_PREEMPT_NONE=y kernels, whether it
> > is safe to invoke the memory allocator.
> 
> So in what situation is RCU called from random contexts that it can't even tell?

In CONFIG_PREEMPT_NONE=y kernels, RCU has no way to tell whether or
not its caller holds a raw spinlock, which some callers do.  And if its
caller holds a raw spinlock, then RCU cannot invoke the memory allocator
because the allocator acquires non-raw spinlocks, which in turn results
in lockdep splats.  Making CONFIG_PREEMPT_COUNT unconditional allows
RCU to make this determination.

Please note that RCU always provides a fallback for memory-allocation
failure, but such failure needs to be rare, at least in non-OOM
situations.

The alternatives to this approach are:

1.	Lockless memory allocation, which was provided by an earlier
	patch series.  Again, the relevant maintainers are not happy
	with this approach.

2.	Defer memory allocation to a clean environment.  However,
	even softirq handlers are not clean enough, so this approach
	incurs a full scheduling delay.  And this delay is incurred
	unconditionally in kernels built with CONFIG_PREEMPT_COUNT=n,
	even if the system has memory coming out of its ears, and even
	if RCU's caller happens to be a clean environment.

3.	A long and sad litany of subtly broken approaches.

> > But either way, please let me know how you would like us to proceed.
> 
> Well, AT A MINIMUM, the pull request should damn well have made it
> 1000% clear that this removes a case that has existed for decades, and
> that potentially makes a difference for small kernels in particular.

Got it, thank you.

> In fact, my personal config option - still to this day - is
> CONFIG_PREEMPT_VOLUNTARY and on the kernel I'm running,
> CONFIG_PREEMPT_COUNT isn't actually set.
> 
> Because honestly, the code generation of some core code looks better
> that way (in places where I've historically looked at things), and the
> latency arguments against it simply aren't relevant when you have 8
> cores or more.
> 
> So i don't think that "make preempt count unconditional" is some small
> meaningless detail.

Understood and agreed.  And to take your point one step further, not
just CONFIG_PREEMPT_VOLUNTARY but also CONFIG_PREEMPT_NONE is also in
extremely heavy use, including by my employer.

And understood on kernel text size.  Raw performance is a different story:
Even microbenchmarks didn't show statistically significant performance
change from CONFIG_PREEMPT_COUNT=n, and system-level benchmarks showed no
difference whatsoever.

So would it help if CONFIG_PREEMPT_COUNT=n became unconditional only for
CONFIG_SMP=y kernels?  RCU does have other options for CONFIG_SMP=n.  Or
do your small-kernel concerns extend beyond single-CPU microcontrollers?

> What is so magical about RCU allocating memory? I assume it's some
> debug case? Why does that debug case then have a
> 
>     select PREEMPT_COUNT
> 
> like is done for PROVE_LOCKING?

Sadly, no, it is not just a debug case.

This memory allocation enables a cache-locality optimization to
callback processing that reduces cache misses.  This optimization
is currently implemented only for kvfree_rcu(), where it reduces
callback-invocation-time cache misses by a factor of eight on typical
x86 systems, which produces decent system-level benefits.  So it would
be good to also apply this optimization to call_rcu().

> > I based my
> > optimism in part on your not having complained about either the patch
> > series or the pull request, both of which I CCed you on:
> 
> I had already raised my concerns when that patch series was posted by
> Thomas originally. I did not feel like I needed to re-raise them just
> because the series got reposted by somebody else.

OK, I did not know, but I do know it now!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 23:54       ` Paul E. McKenney
@ 2020-10-13  0:14         ` Linus Torvalds
  2020-10-13  2:41           ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2020-10-13  0:14 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Mon, Oct 12, 2020 at 4:54 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> In CONFIG_PREEMPT_NONE=y kernels, RCU has no way to tell whether or
> not its caller holds a raw spinlock, which some callers do.

Only kfree_rcu()? (And apparently eventually call_rcu())?

And since we have lockdep, and it warns about it, and raw spinlocks
are really really rare, do we really need to then disable this simple
optimization for everybody else?

We have been very successful with "don't do that then" rules.

Eg, you cannot do normal memory allocations inside a spinlock (or you
have to mark them with GFP_ATOMIC, and not all allocations can be
marked as such), and this has been the case basically forever. And we
have debug code and tools that will check that.

Why is it impossible to just say "you can't do kfree_rcu() while
holding a raw spinlock"?

Particularly for something like kfree_rcu() and particularly if it's
just about raw spinlocks, it would seem to be very natural to just say
"just delay freeing it until after you've released the raw spinlock".

Because I sure hope that you don't find raw spinlock users in random
places. It should be stuff like core scheduling, RCU itself, etc.

> Making CONFIG_PREEMPT_COUNT unconditional allows
> RCU to make this determination.

I understand _that_ part, but the thing I find objectionable is how a
small piece of code seems to want to change the rules we have had in
the kernel since basically day #1.

(Ok, so the preempt count itself is much more recent than "day #1",
but the basic non-counting spinlocks do go back to very early in the
SMP stages).

                    Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] RCU changes for v5.10
  2020-10-13  0:14         ` Linus Torvalds
@ 2020-10-13  2:41           ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2020-10-13  2:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Mon, Oct 12, 2020 at 05:14:42PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 4:54 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > In CONFIG_PREEMPT_NONE=y kernels, RCU has no way to tell whether or
> > not its caller holds a raw spinlock, which some callers do.
> 
> Only kfree_rcu()? (And apparently eventually call_rcu())?

Yes.  The other RCU APIs either only use raw spinlocks themselves on
the one hand or must be called from schedulable contexts on the other.

> And since we have lockdep, and it warns about it, and raw spinlocks
> are really really rare, do we really need to then disable this simple
> optimization for everybody else?
> 
> We have been very successful with "don't do that then" rules.
> 
> Eg, you cannot do normal memory allocations inside a spinlock (or you
> have to mark them with GFP_ATOMIC, and not all allocations can be
> marked as such), and this has been the case basically forever. And we
> have debug code and tools that will check that.
> 
> Why is it impossible to just say "you can't do kfree_rcu() while
> holding a raw spinlock"?
> 
> Particularly for something like kfree_rcu() and particularly if it's
> just about raw spinlocks, it would seem to be very natural to just say
> "just delay freeing it until after you've released the raw spinlock".
> 
> Because I sure hope that you don't find raw spinlock users in random
> places. It should be stuff like core scheduling, RCU itself, etc.

True enough, but core stuff does use RCU, some of it while holding
raw spinlocks.

And you are right that "just don't do that, defer it instead" is often
very effective.  In fact, I defer wakeups within RCU in order to avoid
deadlocks with the scheduler.  It is simple in concept, and it does
work, but it is also a disproportionate source of bugs.  Most of which
rcutorture finds in the safety and comfort of my own system, thankfully,
but some do escape.  Maybe I am overreacting, but I have been burned
often enough that I feel the need to avoid this.

Plus I did oversimplify.  CONFIG_PREEMPT_COUNT also allows the call_rcu()
portion to avoid deadlocks with the current non-lockless memory allocator.

So if the answer from you on global CONFIG_PREEMPT_COUNT=y and from
the MM guys on lockless allocation is irrevocably "@#$@#$ NO!" or the
current-day bowdlerized equivalent, I will take the scheduling delays
in the shorts and defer allocation.

> > Making CONFIG_PREEMPT_COUNT unconditional allows
> > RCU to make this determination.
> 
> I understand _that_ part, but the thing I find objectionable is how a
> small piece of code seems to want to change the rules we have had in
> the kernel since basically day #1.
> 
> (Ok, so the preempt count itself is much more recent than "day #1",
> but the basic non-counting spinlocks do go back to very early in the
> SMP stages).

Understood, a count-free CONFIG_PREEMPT_NONE has been in place in the
Linux kernel for an extremely long time.  And I also understand that
adding CONFIG_PREEMPT_COUNT=y everywhere is a pervasive change that is
not to be taken lightly.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 20:25 ` Linus Torvalds
  2020-10-12 21:44   ` Paul E. McKenney
@ 2020-10-13  7:32   ` Ingo Molnar
  2020-10-13  9:16   ` Thomas Gleixner
  2 siblings, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2020-10-13  7:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Paul E. McKenney, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Please pull the latest core/rcu git tree from:
> >
> > RCU changes for v5.10:
> >
> >  - Debugging for smp_call_function()
> >  - RT raw/non-raw lock ordering fixes
> >  - Strict grace periods for KASAN
> >  - New smp_call_function() torture test
> >  - Torture-test updates
> >  - Documentation updates
> >  - Miscellaneous fixes
> 
> I am *very* unhappy with this pull request.
> 
> It doesn't even mention the big removal of CONFIR_PREEMPT, that I felt 
> was still under discussion.

Not mentioning the unconditional PREEMPT_COUNT enabling aspect was 100% my 
fault in summarizing the changes insufficiently, as I (mistakenly) thought 
them to be uncontroversial. My apologies for that!

Here's a second attempt to properly justify these changes:

Regarding the performance aspect of the change, I was relying on these 
performance measurements:

  "Freshly conducted benchmarks did not reveal any measurable impact from 
   enabling preempt count unconditionally. On kernels with 
   CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY the preempt count is only 
   incremented and decremented but the result of the decrement is not 
   tested. Contrary to that enabling CONFIG_PREEMPT which tests the result 
   has a small but measurable impact due to the conditional branch/call."

FWIW, to inject some hard numbers into this discussion, here's also the 
code generation impact of an unconditional PREEMPT_COUNT, on x86-defconfig:

      text       data        bss    filename
  19675937    5591036    1433672    vmlinux.ubuntu.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
  19682382    5590964    1425480    vmlinux.ubuntu.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")

So this is a pretty small, +0.03% increase (+6k) in generated code in the 
core kernel, and it doesn't add widespread new control dependencies either.

I also measured the core kernel code generation impact on the kernel config 
from a major Linux distribution that uses PREEMPT_VOLUNTARY=y (Ubuntu):

  kepler:~/tip> grep PREEMPT .config
  # CONFIG_PREEMPT_NONE is not set
  CONFIG_PREEMPT_VOLUNTARY=y
  # CONFIG_PREEMPT is not set
  CONFIG_PREEMPT_COUNT=y
  CONFIG_PREEMPT_NOTIFIERS=y

     text       data        bss      filename
  15754341    13790786    5242880    vmlinux.ubuntu.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
  15754790    13791018    5242880    vmlinux.ubuntu.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")
  15754771    13791018    5242880    vmlinux.ubuntu.full_cleanups    # 849b9c5446cc: ("kvfree_rcu(): Fix ifnullfree.cocci warnings")

In this test the changes result in very little generated code increase in 
the core kernel, just +449 bytes, or +0.003%.

In fact the impact was so low on this config that I initially disbelieved 
it and double-checked the result and re-ran the build with all =m's turned 
into =y's, to get a whole-kernel measurement of the generated code impact:

      text       data        bss      filename
  84594448    61819613    42000384    vmlinux.ubuntu.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
  84594129    61819777    42000384    vmlinux.ubuntu.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")

Note how the full ~84 MB image actually *shrunk*, possibly due to random 
function & section alignment noise.

So to get a truly sensitive measurement of the impact of the PREEMPT_COUNT 
change I built with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, to get tight instruction 
packing and no alignment padding artifacts:

      text        data         bss    filename
  69460329    60932573    40411136    vmlinux.ubuntu.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
  69460739    60936853    40411136    vmlinux.ubuntu.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")

This shows a 410 bytes (+0.0005%) increase.

  ( Side note: it's rather impressive that -Os saves 21% of text size - if 
    only GCC wasn't so stupid with the final 2-3% size optimizations... )

So there's even less relative impact on the whole 84 MB kernel image - 
modules don't do much direct preempt_count manipulation.

Just for completeness' sake I re-ran the original defconfig build as well, 
this time with -Os:

     text       data        bss     filename
  16091696    5565988    2928696    vmlinux.defconfig.Os.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
  16095525    5570156    2928696    vmlinux.defconfig.Os.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")

3.8k, or +0.025% - similar to the initial +0.03% result.

So even though I'm normally fiercely anti-bloat, if we combine the 
performance and code generation measurements with these maintainability 
arguments:

   "It's about time to make essential functionality of the kernel consistent 
    across the various preemption models.

    Enable CONFIG_PREEMPT_COUNT unconditionally. Follow up changes will 
    remove the #ifdeffery and remove the config option at the end."

I think the PREEMPT_COUNT=y change to reduce the schizm between the various 
preemption models is IMHO justified - and reducing the code base distance 
to -rt is the icing on the cake.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 20:25 ` Linus Torvalds
  2020-10-12 21:44   ` Paul E. McKenney
  2020-10-13  7:32   ` Ingo Molnar
@ 2020-10-13  9:16   ` Thomas Gleixner
  2 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2020-10-13  9:16 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Linux Kernel Mailing List, Paul E. McKenney, Peter Zijlstra,
	Andrew Morton

Linus,

On Mon, Oct 12 2020 at 13:25, Linus Torvalds wrote:
> I don't absolutely hate that code, and I'm willing to be convinced
> about how little it matter for people who don't want to have the
> counting overhead, but I refuse to pull it as some secret hidden thing
> that isn't even mentioned in the pull request.
>
> Honestly, I did not get any strong arguments for why making the
> preempt count unconditional was such an important thing.
>
> Yes, Thomas pointed me at a couple of uses that were garbage, but even
> the people involved in those seemed to agree they were legacy garbage.

Yes, we agreed on those parts and also started to act on your request to
remove these constructs where non core code is changing behaviour
depending on some context check by either seperating it or handing the
condition in from the call sites.

You said in that original discussion:

> Of course core code can (and will) look at things like
>
>	if (in_interrupt())
>	    .. schedule work asynchronously ..
>
>  because core code ends up being called from odd places, and code like
>  that is expected to have understanding of the rules it plays with.
>
>  But something like RCU is a very different beast from some "walk the
>  scatter-gather list" code.
>
>  RCU does its work in the background, and works with lots of different
>  things. And it's so core and used everywhere that it knows about these
>  things. I mean, we literally have special code explicitly to let RCU
>  know "we entered kernel context now".

As Paul was facing the problem of not knowing the exact context on
PREEMPT_NONE kernels which can aside of the raw spinlock issue cause
deadlocks vs. the page allocator, we had the impression that solving
this particular itch with a consistent preempt count was justified.

As you did not voice objections on Pauls series which incorporated my
initial work, I was encouraging Paul to go ahead with this. Of course if
I misinterpreted your silence on that patch submission, I need to fine
tune my void decoder.

That said, I fully agree that this all should have been part of the pull
request message and in hindsight I should have reminded Ingo to be very
careful about that when we were splitting up the pull request pile of
the tip tree.

I hope that clarifies the context some more and makes the whole thing
more palatable.

Thanks,

        tglx

/me goes back to stare at in_*() constructs...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] RCU changes for v5.10
  2020-10-12 14:14 [GIT PULL] RCU changes for v5.10 Ingo Molnar
  2020-10-12 20:25 ` Linus Torvalds
@ 2020-10-18 21:39 ` Linus Torvalds
  2020-10-19  3:24   ` Paul E. McKenney
  2020-10-19  7:31   ` Ingo Molnar
  1 sibling, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2020-10-18 21:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linux Kernel Mailing List, Paul E. McKenney, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
>
> Please pull the latest core/rcu git tree from:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-rcu-2020-10-12

I've pulled everything but that last merge and the PREEMPT_COUNT stuff
that came with it.

When Paul asked whether it was ok for RCU to use preempt_count() and I
answered in the affirmative, I didn't mean it in the sense of "RCU
wants to force it on everybody else too".

I'm pretty convinced that the proper fix is to simply make sure that
rcu_free() and friends aren't run under any raw spinlocks. So even if
the cost of preempt-count isn't that noticeable, there just isn't a
reason for RCU to say "screw everybody else, I want this" when there
are other alternatives.

            Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] RCU changes for v5.10
  2020-10-18 21:39 ` Linus Torvalds
@ 2020-10-19  3:24   ` Paul E. McKenney
  2020-10-19 16:51     ` Linus Torvalds
  2020-10-19  7:31   ` Ingo Molnar
  1 sibling, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2020-10-19  3:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Sun, Oct 18, 2020 at 02:39:56PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Please pull the latest core/rcu git tree from:
> >
> >    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-rcu-2020-10-12
> 
> I've pulled everything but that last merge and the PREEMPT_COUNT stuff
> that came with it.
> 
> When Paul asked whether it was ok for RCU to use preempt_count() and I
> answered in the affirmative, I didn't mean it in the sense of "RCU
> wants to force it on everybody else too".
> 
> I'm pretty convinced that the proper fix is to simply make sure that
> rcu_free() and friends aren't run under any raw spinlocks. So even if
> the cost of preempt-count isn't that noticeable, there just isn't a
> reason for RCU to say "screw everybody else, I want this" when there
> are other alternatives.

Thank you for pulling the other branches.

On CONFIG_PREEMPT_COUNT, got it.  It would be OK for RCU to use
preempt_count() for some debugging or specialty kernel, but not across
the board.  Thank you for bearing with me on this one.

There is more to it than just raw spinlocks, but regardless we will go
back to the drawing board and come up with a less intrusive fix for the
v5.11 merge window.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] RCU changes for v5.10
  2020-10-18 21:39 ` Linus Torvalds
  2020-10-19  3:24   ` Paul E. McKenney
@ 2020-10-19  7:31   ` Ingo Molnar
  1 sibling, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2020-10-19  7:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Paul E. McKenney, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Please pull the latest core/rcu git tree from:
> >
> >    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-rcu-2020-10-12
> 
> I've pulled everything but that last merge and the PREEMPT_COUNT 
> stuff that came with it.
> 
> When Paul asked whether it was ok for RCU to use preempt_count() and 
> I answered in the affirmative, I didn't mean it in the sense of "RCU 
> wants to force it on everybody else too".
>
> I'm pretty convinced that the proper fix is to simply make sure that 
> rcu_free() and friends aren't run under any raw spinlocks. So even 
> if the cost of preempt-count isn't that noticeable, there just isn't 
> a reason for RCU to say "screw everybody else, I want this" when 
> there are other alternatives.

That's certainly true - thanks for catching this & sorting it out from 
the bigger pull request!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] RCU changes for v5.10
  2020-10-19  3:24   ` Paul E. McKenney
@ 2020-10-19 16:51     ` Linus Torvalds
  0 siblings, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2020-10-19 16:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Sun, Oct 18, 2020 at 8:24 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On CONFIG_PREEMPT_COUNT, got it.  It would be OK for RCU to use
> preempt_count() for some debugging or specialty kernel, but not across
> the board.

Right - that was what I thought you were asking originally.

I don't think a driver or random piece of code like that should ever
use "preempt_count()" on its own - partly because the rules are
subtle, but partly simply because drivers have no business with those
kinds of low-level things.

But yeah, for some core stuff like RCU, using preempt_count() for
debugging etc makes sense. Just not to change _behavior_, because
preempt_count on its own is almost entirely meaningless. It's just one
(local) part of so much state. Again, partly because preempt count
isn't necessarily always meaningful due to config settings, but partly
because there are just so many other things like "are interrupts
disabled" or "are we in an NMI context" or whatever.

And in some odd situation, depending on exactly what you do, maybe
preempt-count can be exactly what you need, because you know
everything else about the state statically. "preempt_enable()"
obviously is one such thing - the whole point is "if
CONFIG_PREEMPT_COUNT is on, then the _semantics_ of this is 'increase
preempt count', and if it goes to zero, and we should reschedule, do
that'".

So it's not that preempt_count() is meaningless, but it's such a
specialized thing that 99.9% of all code really cannot and shouldn't
use it.

           Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-10-19 16:52 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-12 14:14 [GIT PULL] RCU changes for v5.10 Ingo Molnar
2020-10-12 20:25 ` Linus Torvalds
2020-10-12 21:44   ` Paul E. McKenney
2020-10-12 21:59     ` Linus Torvalds
2020-10-12 23:54       ` Paul E. McKenney
2020-10-13  0:14         ` Linus Torvalds
2020-10-13  2:41           ` Paul E. McKenney
2020-10-13  7:32   ` Ingo Molnar
2020-10-13  9:16   ` Thomas Gleixner
2020-10-18 21:39 ` Linus Torvalds
2020-10-19  3:24   ` Paul E. McKenney
2020-10-19 16:51     ` Linus Torvalds
2020-10-19  7:31   ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).