* [PATCH 00/23] mm, sched: Rework lazy mm handling
@ 2022-01-08 16:43 Andy Lutomirski
  2022-01-08 16:43 ` [PATCH 01/23] membarrier: Document why membarrier() works Andy Lutomirski
                   ` (22 more replies)
  0 siblings, 23 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

Hi all-

Sorry I've been sitting on this for so long.  I think it's in decent shape:
it has no *known* bugs, and I think it's time to get the show on the road.
This series needs more eyeballs, too.

The overall point of this series is to get rid of the scalability
problems with mm_count.  My goal is to solve them once and for all,
for all architectures, in a way that doesn't have any gotchas for
unwary users of ->active_mm.

Most of this series is just cleanup, though.  mmgrab(), mmdrop(), and
->active_mm are a mess.  A number of ->active_mm users are simply
wrong.  kthread lazy mm handling is inconsistent with user thread lazy
mm handling (by accident, as far as I can tell).  And membarrier()
relies on the barrier semantics of mmdrop() and mmgrab(), such that
anything that gets rid of those barriers risks breaking membarrier().
x86 is sometimes non-lazy when the core thinks it's lazy because the
core mm code didn't offer any mechanism by which x86 could tell the core
that it's exiting lazy mode.

So: bogus users of ->active_mm are fixed, membarrier() is reworked so
that its barriers are explicit instead of depending on mmdrop() and
mmgrab(), x86 lazy handling is extensively tidied up, and x86's EFI mm
code gets tidied up a bit too.  I think I've done all of this in a way
that introduces little or no overhead.


Additionally, all the code paths that change current->mm are consolidated
so that there is only one path to start using an mm and only one path
to stop using it.
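
To sketch what that single path might look like, here is a rough
approximation assembled from the kthread_use_mm()/exec_mmap() fragments
that appear later in this series.  The name __change_current_mm() comes
from the patch titles; the real signature and details in patches 14-15
may well differ:

	void __change_current_mm(struct mm_struct *mm)
	{
		struct task_struct *tsk = current;
		struct mm_struct *active_mm;

		task_lock(tsk);
		local_irq_disable();
		active_mm = tsk->active_mm;
		if (active_mm != mm) {
			mmgrab(mm);
			tsk->active_mm = mm;
		}
		/* membarrier reads ->mm without locks */
		WRITE_ONCE(tsk->mm, mm);
		membarrier_update_current_mm(mm);
		switch_mm_irqs_off(active_mm, mm, tsk);
		membarrier_finish_switch_mm(mm);
		local_irq_enable();
		task_unlock(tsk);

		if (active_mm != mm)
			mmdrop(active_mm);
	}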

Once that's done, the actual meat (the hazard pointers) isn't so bad, and
the x86 optimization on top that should eliminate scanning of remote CPUs
in __mmput() is about two lines of code.  Other architectures with
sufficiently accurate mm_cpumask() tracking should be able to do the same
thing.
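
For a flavor of the hazard-pointer idea, here is a conceptual sketch
only, with made-up names (lazy_mm_hazard, start_lazy_mm,
kick_lazy_users) -- the real implementation is in patch 16:

	/* Each CPU advertises the mm it is currently borrowing lazily. */
	static DEFINE_PER_CPU(struct mm_struct *, lazy_mm_hazard);

	/* Scheduler side: publish the borrowed mm instead of mmgrab()ing it. */
	static void start_lazy_mm(struct mm_struct *mm)
	{
		this_cpu_write(lazy_mm_hazard, mm);
		smp_mb();	/* order the publish against later mm accesses */
	}

	/*
	 * __mmput()/exit side: scan for remaining lazy users instead of
	 * waiting for mm_count to drain.  With sufficiently accurate
	 * mm_cpumask() tracking (the x86 optimization), only the CPUs in
	 * that mask need to be scanned.
	 */
	static void kick_lazy_users(struct mm_struct *mm)
	{
		int cpu;

		for_each_possible_cpu(cpu) {
			if (READ_ONCE(per_cpu(lazy_mm_hazard, cpu)) == mm) {
				/* make that CPU drop the lazy mm, e.g. by
				 * IPIing it over to init_mm, before the mm
				 * is torn down */
			}
		}
	}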

akpm, this is intended mostly to replace Nick Piggin's lazy shootdown
series.  This series implements lazy shootdown implicitly on x86, and
powerpc should be able to do the same thing in just a couple of lines
of code if it wants to.  The result is, IMO, much cleaner and more
maintainable.

Once this is all reviewed, I'm hoping it can go in -tip (and -next) after
the merge window or go in -mm.  This is not intended for v5.16.  I suspect
-tip is easier in case other arch maintainers want to optimize their
code in the same release.

Andy Lutomirski (23):
  membarrier: Document why membarrier() works
  x86/mm: Handle unlazying membarrier core sync in the arch code
  membarrier: Remove membarrier_arch_switch_mm() prototype in core code
  membarrier: Make the post-switch-mm barrier explicit
  membarrier, kthread: Use _ONCE accessors for task->mm
  powerpc/membarrier: Remove special barrier on mm switch
  membarrier: Rewrite sync_core_before_usermode() and improve
    documentation
  membarrier: Remove redundant clear of mm->membarrier_state in
    exec_mmap()
  membarrier: Fix incorrect barrier positions during exec and
    kthread_use_mm()
  x86/events, x86/insn-eval: Remove incorrect active_mm references
  sched/scs: Initialize shadow stack on idle thread bringup, not
    shutdown
  Rework "sched/core: Fix illegal RCU from offline CPUs"
  exec: Remove unnecessary vmacache_seqnum clear in exec_mmap()
  sched, exec: Factor current mm changes out from exec
  kthread: Switch to __change_current_mm()
  sched: Use lightweight hazard pointers to grab lazy mms
  x86/mm: Make use/unuse_temporary_mm() non-static
  x86/mm: Allow temporary mms when IRQs are on
  x86/efi: Make efi_enter/leave_mm use the temporary_mm machinery
  x86/mm: Remove leave_mm() in favor of unlazy_mm_irqs_off()
  x86/mm: Use unlazy_mm_irqs_off() in TLB flush IPIs
  x86/mm: Optimize for_each_possible_lazymm_cpu()
  x86/mm: Opt in to IRQs-off activate_mm()

 .../membarrier-sync-core/arch-support.txt     |  69 +--
 arch/arm/include/asm/membarrier.h             |  21 +
 arch/arm/kernel/smp.c                         |   2 -
 arch/arm64/include/asm/membarrier.h           |  19 +
 arch/arm64/kernel/smp.c                       |   2 -
 arch/csky/kernel/smp.c                        |   2 -
 arch/ia64/kernel/process.c                    |   1 -
 arch/mips/cavium-octeon/smp.c                 |   1 -
 arch/mips/kernel/smp-bmips.c                  |   2 -
 arch/mips/kernel/smp-cps.c                    |   1 -
 arch/mips/loongson64/smp.c                    |   2 -
 arch/powerpc/include/asm/membarrier.h         |  28 +-
 arch/powerpc/mm/mmu_context.c                 |   1 -
 arch/powerpc/platforms/85xx/smp.c             |   2 -
 arch/powerpc/platforms/powermac/smp.c         |   2 -
 arch/powerpc/platforms/powernv/smp.c          |   1 -
 arch/powerpc/platforms/pseries/hotplug-cpu.c  |   2 -
 arch/powerpc/platforms/pseries/pmem.c         |   1 -
 arch/riscv/kernel/cpu-hotplug.c               |   2 -
 arch/s390/kernel/smp.c                        |   1 -
 arch/sh/kernel/smp.c                          |   1 -
 arch/sparc/kernel/smp_64.c                    |   2 -
 arch/x86/Kconfig                              |   2 +-
 arch/x86/events/core.c                        |   9 +-
 arch/x86/include/asm/membarrier.h             |  25 ++
 arch/x86/include/asm/mmu.h                    |   6 +-
 arch/x86/include/asm/mmu_context.h            |  15 +-
 arch/x86/include/asm/sync_core.h              |  20 -
 arch/x86/kernel/alternative.c                 |  67 +--
 arch/x86/kernel/cpu/mce/core.c                |   2 +-
 arch/x86/kernel/smpboot.c                     |   2 -
 arch/x86/lib/insn-eval.c                      |  13 +-
 arch/x86/mm/tlb.c                             | 155 +++++--
 arch/x86/platform/efi/efi_64.c                |   9 +-
 arch/x86/xen/mmu_pv.c                         |   2 +-
 arch/xtensa/kernel/smp.c                      |   1 -
 drivers/cpuidle/cpuidle.c                     |   2 +-
 drivers/idle/intel_idle.c                     |   4 +-
 drivers/misc/sgi-gru/grufault.c               |   2 +-
 drivers/misc/sgi-gru/gruhandles.c             |   2 +-
 drivers/misc/sgi-gru/grukservices.c           |   2 +-
 fs/exec.c                                     |  28 +-
 include/linux/mmu_context.h                   |   4 +-
 include/linux/sched/hotplug.h                 |   6 -
 include/linux/sched/mm.h                      |  58 ++-
 include/linux/sync_core.h                     |  21 -
 init/Kconfig                                  |   3 -
 kernel/cpu.c                                  |  21 +-
 kernel/exit.c                                 |   2 +-
 kernel/fork.c                                 |  11 +
 kernel/kthread.c                              |  50 +--
 kernel/sched/core.c                           | 409 +++++++++++++++---
 kernel/sched/idle.c                           |   1 +
 kernel/sched/membarrier.c                     |  97 ++++-
 kernel/sched/sched.h                          |  11 +-
 55 files changed, 745 insertions(+), 482 deletions(-)
 create mode 100644 arch/arm/include/asm/membarrier.h
 create mode 100644 arch/arm64/include/asm/membarrier.h
 create mode 100644 arch/x86/include/asm/membarrier.h
 delete mode 100644 include/linux/sync_core.h

-- 
2.33.1


* [PATCH 01/23] membarrier: Document why membarrier() works
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
@ 2022-01-08 16:43 ` Andy Lutomirski
  2022-01-12 15:30   ` Mathieu Desnoyers
  2022-01-08 16:43 ` [PATCH 02/23] x86/mm: Handle unlazying membarrier core sync in the arch code Andy Lutomirski
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

We had a nice comment at the top of membarrier.c explaining why membarrier
worked in a handful of scenarios, but that consisted more of a list of
things not to forget than an actual description of the algorithm and why it
should be expected to work.

Add a comment explaining my understanding of the algorithm.  This exposes a
couple of implementation issues that I will hopefully fix up in subsequent
patches.
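
For orientation, here is a grossly simplified sketch of the sequence the
new comment describes.  The names membarrier_sketch() and
membarrier_ipi_mb() and the tmpmask handling are illustrative only; the
real logic lives in membarrier_private_expedited() and also handles the
single-cpu fast path, registration checks, and fallbacks:

	static void membarrier_ipi_mb(void *info)
	{
		smp_mb();	/* step 3: the barrier on the target CPU */
	}

	static void membarrier_sketch(struct mm_struct *mm,
				      struct cpumask *tmpmask)
	{
		int cpu;

		smp_mb();				/* step 1 */

		cpumask_clear(tmpmask);
		rcu_read_lock();
		for_each_online_cpu(cpu) {
			struct task_struct *p;

			p = rcu_dereference(cpu_rq(cpu)->curr);
			if (p && READ_ONCE(p->mm) == mm)
				__cpumask_set_cpu(cpu, tmpmask);
		}
		rcu_read_unlock();

		/* steps 2-4: send the IPIs and wait for them to finish */
		smp_call_function_many(tmpmask, membarrier_ipi_mb, NULL, 1);

		smp_mb();				/* step 5 */
	}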

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 kernel/sched/membarrier.c | 60 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index b5add64d9698..30e964b9689d 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -7,8 +7,64 @@
 #include "sched.h"
 
 /*
- * For documentation purposes, here are some membarrier ordering
- * scenarios to keep in mind:
+ * The basic principle behind the regular memory barrier mode of
+ * membarrier() is as follows.  membarrier() is called in one thread.  It
+ * iterates over all CPUs, and, for each CPU, it either sends an IPI to
+ * that CPU or it does not. If it sends an IPI, then we have the
+ * following sequence of events:
+ *
+ * 1. membarrier() does smp_mb().
+ * 2. membarrier() does a store (the IPI request payload) that is observed by
+ *    the target CPU.
+ * 3. The target CPU does smp_mb().
+ * 4. The target CPU does a store (the completion indication) that is observed
+ *    by membarrier()'s wait-for-IPIs-to-finish request.
+ * 5. membarrier() does smp_mb().
+ *
+ * So all pre-membarrier() local accesses are visible after the IPI on the
+ * target CPU and all pre-IPI remote accesses are visible after
+ * membarrier(). IOW membarrier() has synchronized both ways with the target
+ * CPU.
+ *
+ * (This has the caveat that membarrier() does not interrupt the CPU that it's
+ * running on at the time it sends the IPIs. However, if that is the CPU on
+ * which membarrier() starts and/or finishes, membarrier() does smp_mb() and,
+ * if not, then the scheduler's migration of membarrier() is a full barrier.)
+ *
+ * membarrier() skips sending an IPI only if membarrier() sees
+ * cpu_rq(cpu)->curr->mm != target mm.  The sequence of events is:
+ *
+ *           membarrier()            |          target CPU
+ * ---------------------------------------------------------------------
+ *                                   | 1. smp_mb()
+ *                                   | 2. set rq->curr->mm = other_mm
+ *                                   |    (by writing to ->curr or to ->mm)
+ * 3. smp_mb()                       |
+ * 4. read rq->curr->mm == other_mm  |
+ * 5. smp_mb()                       |
+ *                                   | 6. rq->curr->mm = target_mm
+ *                                   |    (by writing to ->curr or to ->mm)
+ *                                   | 7. smp_mb()
+ *                                   |
+ *
+ * All memory accesses on the target CPU prior to scheduling are visible
+ * to membarrier()'s caller after membarrier() returns due to steps 1, 2, 4
+ * and 5.
+ *
+ * All memory accesses by membarrier()'s caller prior to membarrier() are
+ * visible to the target CPU after scheduling due to steps 3, 4, 6, and 7.
+ *
+ * Note that tasks can change their ->mm, e.g. via kthread_use_mm().  So
+ * tasks that switch their ->mm must follow the same rules as the scheduler
+ * changing rq->curr, and the membarrier() code needs to do both dereferences
+ * carefully.
+ *
+ * GLOBAL_EXPEDITED support works the same way except that all references
+ * to rq->curr->mm are replaced with references to rq->membarrier_state.
+ *
+ *
+ * Specific examples of how this produces the documented properties of
+ * membarrier():
  *
  * A) Userspace thread execution after IPI vs membarrier's memory
  *    barrier before sending the IPI
-- 
2.33.1


* [PATCH 02/23] x86/mm: Handle unlazying membarrier core sync in the arch code
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
  2022-01-08 16:43 ` [PATCH 01/23] membarrier: Document why membarrier() works Andy Lutomirski
@ 2022-01-08 16:43 ` Andy Lutomirski
  2022-01-12 15:40   ` Mathieu Desnoyers
  2022-01-08 16:43 ` [PATCH 03/23] membarrier: Remove membarrier_arch_switch_mm() prototype in core code Andy Lutomirski
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

The core scheduler isn't a great place for
membarrier_mm_sync_core_before_usermode() -- the core scheduler
doesn't actually know whether we are lazy.  With the old code, if a
CPU is running a membarrier-registered task, goes idle, gets unlazied
via a TLB shootdown IPI, and switches back to the
membarrier-registered task, it will do an unnecessary core sync.

Conveniently, x86 is the only architecture that does anything in this
sync_core_before_usermode(), so membarrier_mm_sync_core_before_usermode()
is a no-op on all other architectures and we can just move the code.

(I am not claiming that the SYNC_CORE code was correct before or after this
 change on any non-x86 architecture.  I merely claim that this change
 improves readability, is correct on x86, and makes no change on any other
 architecture.)

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/mm/tlb.c        | 58 +++++++++++++++++++++++++++++++---------
 include/linux/sched/mm.h | 13 ---------
 kernel/sched/core.c      | 14 +++++-----
 3 files changed, 53 insertions(+), 32 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 59ba2968af1b..1ae15172885e 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -9,6 +9,7 @@
 #include <linux/cpu.h>
 #include <linux/debugfs.h>
 #include <linux/sched/smt.h>
+#include <linux/sched/mm.h>
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
@@ -485,6 +486,15 @@ void cr4_update_pce(void *ignored)
 static inline void cr4_update_pce_mm(struct mm_struct *mm) { }
 #endif
 
+static void sync_core_if_membarrier_enabled(struct mm_struct *next)
+{
+#ifdef CONFIG_MEMBARRIER
+	if (unlikely(atomic_read(&next->membarrier_state) &
+		     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE))
+		sync_core_before_usermode();
+#endif
+}
+
 void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			struct task_struct *tsk)
 {
@@ -539,16 +549,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		this_cpu_write(cpu_tlbstate_shared.is_lazy, false);
 
 	/*
-	 * The membarrier system call requires a full memory barrier and
-	 * core serialization before returning to user-space, after
-	 * storing to rq->curr, when changing mm.  This is because
-	 * membarrier() sends IPIs to all CPUs that are in the target mm
-	 * to make them issue memory barriers.  However, if another CPU
-	 * switches to/from the target mm concurrently with
-	 * membarrier(), it can cause that CPU not to receive an IPI
-	 * when it really should issue a memory barrier.  Writing to CR3
-	 * provides that full memory barrier and core serializing
-	 * instruction.
+	 * membarrier() support requires that, when we change rq->curr->mm:
+	 *
+	 *  - If next->mm has membarrier registered, a full memory barrier
+	 *    after writing rq->curr (or rq->curr->mm if we switched the mm
+	 *    without switching tasks) and before returning to user mode.
+	 *
+	 *  - If next->mm has SYNC_CORE registered, then we sync core before
+	 *    returning to user mode.
+	 *
+	 * In the case where prev->mm == next->mm, membarrier() uses an IPI
+	 * instead, and no particular barriers are needed while context
+	 * switching.
+	 *
+	 * x86 gets all of this as a side-effect of writing to CR3 except
+	 * in the case where we unlazy without flushing.
+	 *
+	 * All other architectures are civilized and do all of this implicitly
+	 * when transitioning from kernel to user mode.
 	 */
 	if (real_prev == next) {
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
@@ -566,7 +584,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		/*
 		 * If the CPU is not in lazy TLB mode, we are just switching
 		 * from one thread in a process to another thread in the same
-		 * process. No TLB flush required.
+		 * process. No TLB flush or membarrier() synchronization
+		 * is required.
 		 */
 		if (!was_lazy)
 			return;
@@ -576,16 +595,31 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 * If the TLB is up to date, just use it.
 		 * The barrier synchronizes with the tlb_gen increment in
 		 * the TLB shootdown code.
+		 *
+		 * As a future optimization opportunity, it's plausible
+		 * that the x86 memory model is strong enough that this
+		 * smp_mb() isn't needed.
 		 */
 		smp_mb();
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
 		if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
-				next_tlb_gen)
+		    next_tlb_gen) {
+			/*
+			 * We switched logical mm but we're not going to
+			 * write to CR3.  We already did smp_mb() above,
+			 * but membarrier() might require a sync_core()
+			 * as well.
+			 */
+			sync_core_if_membarrier_enabled(next);
+
 			return;
+		}
 
 		/*
 		 * TLB contents went out of date while we were in lazy
 		 * mode. Fall through to the TLB switching code below.
+		 * No need for an explicit membarrier invocation -- the CR3
+		 * write will serialize.
 		 */
 		new_asid = prev_asid;
 		need_flush = true;
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 5561486fddef..c256a7fc0423 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -345,16 +345,6 @@ enum {
 #include <asm/membarrier.h>
 #endif
 
-static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
-{
-	if (current->mm != mm)
-		return;
-	if (likely(!(atomic_read(&mm->membarrier_state) &
-		     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
-		return;
-	sync_core_before_usermode();
-}
-
 extern void membarrier_exec_mmap(struct mm_struct *mm);
 
 extern void membarrier_update_current_mm(struct mm_struct *next_mm);
@@ -370,9 +360,6 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
 static inline void membarrier_exec_mmap(struct mm_struct *mm)
 {
 }
-static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
-{
-}
 static inline void membarrier_update_current_mm(struct mm_struct *next_mm)
 {
 }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f21714ea3db8..6a1db8264c7b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4822,22 +4822,22 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	kmap_local_sched_in();
 
 	fire_sched_in_preempt_notifiers(current);
+
 	/*
 	 * When switching through a kernel thread, the loop in
 	 * membarrier_{private,global}_expedited() may have observed that
 	 * kernel thread and not issued an IPI. It is therefore possible to
 	 * schedule between user->kernel->user threads without passing though
 	 * switch_mm(). Membarrier requires a barrier after storing to
-	 * rq->curr, before returning to userspace, so provide them here:
+	 * rq->curr, before returning to userspace, and mmdrop() provides
+	 * this barrier.
 	 *
-	 * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
-	 *   provided by mmdrop(),
-	 * - a sync_core for SYNC_CORE.
+	 * If an architecture needs to take a specific action for
+	 * SYNC_CORE, it can do so in switch_mm_irqs_off().
 	 */
-	if (mm) {
-		membarrier_mm_sync_core_before_usermode(mm);
+	if (mm)
 		mmdrop(mm);
-	}
+
 	if (unlikely(prev_state == TASK_DEAD)) {
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);
-- 
2.33.1


* [PATCH 03/23] membarrier: Remove membarrier_arch_switch_mm() prototype in core code
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
  2022-01-08 16:43 ` [PATCH 01/23] membarrier: Document why membarrier() works Andy Lutomirski
  2022-01-08 16:43 ` [PATCH 02/23] x86/mm: Handle unlazying membarrier core sync in the arch code Andy Lutomirski
@ 2022-01-08 16:43 ` Andy Lutomirski
  2022-01-08 16:43 ` [PATCH 04/23] membarrier: Make the post-switch-mm barrier explicit Andy Lutomirski
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

membarrier_arch_switch_mm()'s sole implementation and caller are in
arch/powerpc.  Having a fallback implementation in include/linux is
confusing -- remove it.

It's still mentioned in a comment, but a subsequent patch will remove
it.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 include/linux/sched/mm.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index c256a7fc0423..0df706c099e5 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -350,13 +350,6 @@ extern void membarrier_exec_mmap(struct mm_struct *mm);
 extern void membarrier_update_current_mm(struct mm_struct *next_mm);
 
 #else
-#ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
-static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
-					     struct mm_struct *next,
-					     struct task_struct *tsk)
-{
-}
-#endif
 static inline void membarrier_exec_mmap(struct mm_struct *mm)
 {
 }
-- 
2.33.1


* [PATCH 04/23] membarrier: Make the post-switch-mm barrier explicit
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (2 preceding siblings ...)
  2022-01-08 16:43 ` [PATCH 03/23] membarrier: Remove membarrier_arch_switch_mm() prototype in core code Andy Lutomirski
@ 2022-01-08 16:43 ` Andy Lutomirski
  2022-01-12 15:52   ` Mathieu Desnoyers
  2022-01-08 16:43 ` [PATCH 05/23] membarrier, kthread: Use _ONCE accessors for task->mm Andy Lutomirski
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

membarrier() needs a barrier after any CPU changes mm.  There is currently
a comment explaining why this barrier probably exists in all cases. The
logic is based on ensuring that the barrier exists on every control flow
path through the scheduler.  It also relies on mmgrab() and mmdrop() being
full barriers.

mmgrab() and mmdrop() would be better if they were not full barriers.  As a
trivial optimization, mmgrab() could use a relaxed atomic and mmdrop()
could use a release on architectures that have these operations.  Larger
optimizations are also in the works.  Doing any of these optimizations
while preserving an unnecessary barrier will complicate the code and
penalize non-membarrier-using tasks.

Simplify the logic by adding an explicit barrier, and allow architectures
to override it as an optimization if they want to.
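
For illustration, an architecture whose mm switch already implies the
required barrier could override the generic helper roughly like this
(hypothetical arch "foo"; the generic version added below is guarded by
#ifndef, so defining the name suppresses it):

	/* arch/foo/include/asm/membarrier.h */
	#define membarrier_finish_switch_mm membarrier_finish_switch_mm
	static inline void membarrier_finish_switch_mm(struct mm_struct *mm)
	{
		/* switch_mm_irqs_off() already provided a full barrier */
	}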

One of the deleted comments in this patch said "It is therefore
possible to schedule between user->kernel->user threads without
passing through switch_mm()".  It is possible to do this without, say,
writing to CR3 on x86, but the core scheduler indeed calls
switch_mm_irqs_off() to tell the arch code to go back from lazy mode
to no-lazy mode.

The membarrier_finish_switch_mm() call in exec_mmap() is a no-op so long as
there is no way for a newly execed program to register for membarrier prior
to running user code.  Subsequent patches will merge the exec_mmap() code
with the kthread_use_mm() code, though, and keeping the paths consistent
will make the result more comprehensible.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 fs/exec.c                |  1 +
 include/linux/sched/mm.h | 18 ++++++++++++++++++
 kernel/kthread.c         | 12 +-----------
 kernel/sched/core.c      | 34 +++++++++-------------------------
 4 files changed, 29 insertions(+), 36 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index a098c133d8d7..3abbd0294e73 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1019,6 +1019,7 @@ static int exec_mmap(struct mm_struct *mm)
 	activate_mm(active_mm, mm);
 	if (IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
 		local_irq_enable();
+	membarrier_finish_switch_mm(mm);
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 0df706c099e5..e8919995d8dd 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -349,6 +349,20 @@ extern void membarrier_exec_mmap(struct mm_struct *mm);
 
 extern void membarrier_update_current_mm(struct mm_struct *next_mm);
 
+/*
+ * Called by the core scheduler after calling switch_mm_irqs_off().
+ * Architectures that have implicit barriers when switching mms can
+ * override this as an optimization.
+ */
+#ifndef membarrier_finish_switch_mm
+static inline void membarrier_finish_switch_mm(struct mm_struct *mm)
+{
+	if (atomic_read(&mm->membarrier_state) &
+	    (MEMBARRIER_STATE_GLOBAL_EXPEDITED | MEMBARRIER_STATE_PRIVATE_EXPEDITED))
+		smp_mb();
+}
+#endif
+
 #else
 static inline void membarrier_exec_mmap(struct mm_struct *mm)
 {
@@ -356,6 +370,10 @@ static inline void membarrier_exec_mmap(struct mm_struct *mm)
 static inline void membarrier_update_current_mm(struct mm_struct *next_mm)
 {
 }
+static inline void membarrier_finish_switch_mm(struct mm_struct *mm)
+{
+}
+
 #endif
 
 #endif /* _LINUX_SCHED_MM_H */
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 5b37a8567168..396ae78a1a34 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1361,25 +1361,15 @@ void kthread_use_mm(struct mm_struct *mm)
 	tsk->mm = mm;
 	membarrier_update_current_mm(mm);
 	switch_mm_irqs_off(active_mm, mm, tsk);
+	membarrier_finish_switch_mm(mm);
 	local_irq_enable();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
 	finish_arch_post_lock_switch();
 #endif
 
-	/*
-	 * When a kthread starts operating on an address space, the loop
-	 * in membarrier_{private,global}_expedited() may not observe
-	 * that tsk->mm, and not issue an IPI. Membarrier requires a
-	 * memory barrier after storing to tsk->mm, before accessing
-	 * user-space memory. A full memory barrier for membarrier
-	 * {PRIVATE,GLOBAL}_EXPEDITED is implicitly provided by
-	 * mmdrop(), or explicitly with smp_mb().
-	 */
 	if (active_mm != mm)
 		mmdrop(active_mm);
-	else
-		smp_mb();
 
 	to_kthread(tsk)->oldfs = force_uaccess_begin();
 }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6a1db8264c7b..917068b0a145 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4824,14 +4824,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	fire_sched_in_preempt_notifiers(current);
 
 	/*
-	 * When switching through a kernel thread, the loop in
-	 * membarrier_{private,global}_expedited() may have observed that
-	 * kernel thread and not issued an IPI. It is therefore possible to
-	 * schedule between user->kernel->user threads without passing though
-	 * switch_mm(). Membarrier requires a barrier after storing to
-	 * rq->curr, before returning to userspace, and mmdrop() provides
-	 * this barrier.
-	 *
 	 * If an architecture needs to take a specific action for
 	 * SYNC_CORE, it can do so in switch_mm_irqs_off().
 	 */
@@ -4915,15 +4907,14 @@ context_switch(struct rq *rq, struct task_struct *prev,
 			prev->active_mm = NULL;
 	} else {                                        // to user
 		membarrier_switch_mm(rq, prev->active_mm, next->mm);
+		switch_mm_irqs_off(prev->active_mm, next->mm, next);
+
 		/*
 		 * sys_membarrier() requires an smp_mb() between setting
-		 * rq->curr / membarrier_switch_mm() and returning to userspace.
-		 *
-		 * The below provides this either through switch_mm(), or in
-		 * case 'prev->active_mm == next->mm' through
-		 * finish_task_switch()'s mmdrop().
+		 * rq->curr->mm to a membarrier-enabled mm and returning
+		 * to userspace.
 		 */
-		switch_mm_irqs_off(prev->active_mm, next->mm, next);
+		membarrier_finish_switch_mm(next->mm);
 
 		if (!prev->mm) {                        // from kernel
 			/* will mmdrop() in finish_task_switch(). */
@@ -6264,17 +6255,10 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 		RCU_INIT_POINTER(rq->curr, next);
 		/*
 		 * The membarrier system call requires each architecture
-		 * to have a full memory barrier after updating
-		 * rq->curr, before returning to user-space.
-		 *
-		 * Here are the schemes providing that barrier on the
-		 * various architectures:
-		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
-		 *   switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
-		 * - finish_lock_switch() for weakly-ordered
-		 *   architectures where spin_unlock is a full barrier,
-		 * - switch_to() for arm64 (weakly-ordered, spin_unlock
-		 *   is a RELEASE barrier),
+		 * to have a full memory barrier before and after updating
+		 * rq->curr->mm, before returning to userspace.  This
+		 * is provided by membarrier_finish_switch_mm().  Architectures
+		 * that want to optimize this can override that function.
 		 */
 		++*switch_count;
 
-- 
2.33.1


* [PATCH 05/23] membarrier, kthread: Use _ONCE accessors for task->mm
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (3 preceding siblings ...)
  2022-01-08 16:43 ` [PATCH 04/23] membarrier: Make the post-switch-mm barrier explicit Andy Lutomirski
@ 2022-01-08 16:43 ` Andy Lutomirski
  2022-01-12 15:55   ` Mathieu Desnoyers
  2022-01-08 16:43   ` Andy Lutomirski
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

membarrier reads cpu_rq(remote cpu)->curr->mm without locking.  Use
READ_ONCE() and WRITE_ONCE() to remove the data races.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 fs/exec.c                 | 2 +-
 kernel/exit.c             | 2 +-
 kernel/kthread.c          | 4 ++--
 kernel/sched/membarrier.c | 7 ++++---
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 3abbd0294e73..38b05e01c5bd 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1006,7 +1006,7 @@ static int exec_mmap(struct mm_struct *mm)
 	local_irq_disable();
 	active_mm = tsk->active_mm;
 	tsk->active_mm = mm;
-	tsk->mm = mm;
+	WRITE_ONCE(tsk->mm, mm);  /* membarrier reads this without locks */
 	/*
 	 * This prevents preemption while active_mm is being loaded and
 	 * it and mm are being updated, which could cause problems for
diff --git a/kernel/exit.c b/kernel/exit.c
index 91a43e57a32e..70f2cbc42015 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -491,7 +491,7 @@ static void exit_mm(void)
 	 */
 	smp_mb__after_spinlock();
 	local_irq_disable();
-	current->mm = NULL;
+	WRITE_ONCE(current->mm, NULL);
 	membarrier_update_current_mm(NULL);
 	enter_lazy_tlb(mm, current);
 	local_irq_enable();
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 396ae78a1a34..3b18329f885c 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1358,7 +1358,7 @@ void kthread_use_mm(struct mm_struct *mm)
 		mmgrab(mm);
 		tsk->active_mm = mm;
 	}
-	tsk->mm = mm;
+	WRITE_ONCE(tsk->mm, mm);  /* membarrier reads this without locks */
 	membarrier_update_current_mm(mm);
 	switch_mm_irqs_off(active_mm, mm, tsk);
 	membarrier_finish_switch_mm(mm);
@@ -1399,7 +1399,7 @@ void kthread_unuse_mm(struct mm_struct *mm)
 	smp_mb__after_spinlock();
 	sync_mm_rss(mm);
 	local_irq_disable();
-	tsk->mm = NULL;
+	WRITE_ONCE(tsk->mm, NULL);  /* membarrier reads this without locks */
 	membarrier_update_current_mm(NULL);
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 30e964b9689d..327830f89c37 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -411,7 +411,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 			goto out;
 		rcu_read_lock();
 		p = rcu_dereference(cpu_rq(cpu_id)->curr);
-		if (!p || p->mm != mm) {
+		if (!p || READ_ONCE(p->mm) != mm) {
 			rcu_read_unlock();
 			goto out;
 		}
@@ -424,7 +424,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 			struct task_struct *p;
 
 			p = rcu_dereference(cpu_rq(cpu)->curr);
-			if (p && p->mm == mm)
+			if (p && READ_ONCE(p->mm) == mm)
 				__cpumask_set_cpu(cpu, tmpmask);
 		}
 		rcu_read_unlock();
@@ -522,7 +522,8 @@ static int sync_runqueues_membarrier_state(struct mm_struct *mm)
 		struct task_struct *p;
 
 		p = rcu_dereference(rq->curr);
-		if (p && p->mm == mm)
+		/* exec and kthread_use_mm() write ->mm without locks */
+		if (p && READ_ONCE(p->mm) == mm)
 			__cpumask_set_cpu(cpu, tmpmask);
 	}
 	rcu_read_unlock();
-- 
2.33.1


* [PATCH 06/23] powerpc/membarrier: Remove special barrier on mm switch
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
@ 2022-01-08 16:43   ` Andy Lutomirski
  2022-01-08 16:43 ` [PATCH 02/23] x86/mm: Handle unlazying membarrier core sync in the arch code Andy Lutomirski
                     ` (21 subsequent siblings)
  22 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski, Michael Ellerman, Paul Mackerras, linuxppc-dev

powerpc did the following on some, but not all, paths through
switch_mm_irqs_off():

       /*
        * Only need the full barrier when switching between processes.
        * Barrier when switching from kernel to userspace is not
        * required here, given that it is implied by mmdrop(). Barrier
        * when switching from userspace to kernel is not needed after
        * store to rq->curr.
        */
       if (likely(!(atomic_read(&next->membarrier_state) &
                    (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
                     MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
               return;

This is puzzling: if !prev, then one might expect that we are switching
from kernel to user, not user to kernel, which is inconsistent with the
comment.  But this is all nonsense, because the one and only caller would
never have prev == NULL and would, in fact, OOPS if prev == NULL.

In any event, this code is unnecessary, since the new generic
membarrier_finish_switch_mm() provides the same barrier without arch help.

arch/powerpc/include/asm/membarrier.h remains as an empty header,
because a later patch in this series will add code to it.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/powerpc/include/asm/membarrier.h | 24 ------------------------
 arch/powerpc/mm/mmu_context.c         |  1 -
 2 files changed, 25 deletions(-)

diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h
index de7f79157918..b90766e95bd1 100644
--- a/arch/powerpc/include/asm/membarrier.h
+++ b/arch/powerpc/include/asm/membarrier.h
@@ -1,28 +1,4 @@
 #ifndef _ASM_POWERPC_MEMBARRIER_H
 #define _ASM_POWERPC_MEMBARRIER_H
 
-static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
-					     struct mm_struct *next,
-					     struct task_struct *tsk)
-{
-	/*
-	 * Only need the full barrier when switching between processes.
-	 * Barrier when switching from kernel to userspace is not
-	 * required here, given that it is implied by mmdrop(). Barrier
-	 * when switching from userspace to kernel is not needed after
-	 * store to rq->curr.
-	 */
-	if (IS_ENABLED(CONFIG_SMP) &&
-	    likely(!(atomic_read(&next->membarrier_state) &
-		     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
-		      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
-		return;
-
-	/*
-	 * The membarrier system call requires a full memory barrier
-	 * after storing to rq->curr, before going back to user-space.
-	 */
-	smp_mb();
-}
-
 #endif /* _ASM_POWERPC_MEMBARRIER_H */
diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
index 74246536b832..5f2daa6b0497 100644
--- a/arch/powerpc/mm/mmu_context.c
+++ b/arch/powerpc/mm/mmu_context.c
@@ -84,7 +84,6 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		asm volatile ("dssall");
 
 	if (!new_on_cpu)
-		membarrier_arch_switch_mm(prev, next, tsk);
 
 	/*
 	 * The actual HW switching method differs between the various
-- 
2.33.1


* [PATCH 07/23] membarrier: Rewrite sync_core_before_usermode() and improve documentation
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
  2022-01-08 16:43 ` [PATCH 01/23] membarrier: Document why membarrier() works Andy Lutomirski
@ 2022-01-08 16:43   ` Andy Lutomirski
  2022-01-08 16:43 ` [PATCH 03/23] membarrier: Remove membarrier_arch_switch_mm() prototype in core code Andy Lutomirski
                     ` (20 subsequent siblings)
  22 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski, Michael Ellerman, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel, stable

The old sync_core_before_usermode() comments suggested that a
non-icache-syncing return-to-usermode instruction is x86-specific and that
all other architectures automatically notice cross-modified code on return
to userspace.

This is misleading.  The incantation needed to modify code from one
CPU and execute it on another CPU is highly architecture dependent.
On x86, according to the SDM, one must modify the code, issue SFENCE
if the modification was WC or nontemporal, and then issue a "serializing
instruction" on the CPU that will execute the code.  membarrier() can do
the latter.

On arm, arm64 and powerpc, one must flush the icache and then flush the
pipeline on the target CPU, although the CPU manuals don't necessarily use
this language.

So let's drop any pretense that we can have a generic way to define or
implement membarrier's SYNC_CORE operation and instead require all
architectures to define the helper and supply their own documentation as to
how to use it.  This means x86, arm, arm64, and powerpc for now.  Let's also
rename the function from sync_core_before_usermode() to
membarrier_sync_core_before_usermode() because the precise flushing details
may very well be specific to membarrier, and even the concept of
"sync_core" in the kernel is mostly an x86-ism.

(It may well be the case that, on real x86 processors, synchronizing the
 icache (which requires no action at all) and "flushing the pipeline" is
 sufficient, but trying to use this language would be confusing at best.
 LFENCE does something awfully like "flushing the pipeline", but the SDM
 does not permit LFENCE as an alternative to a "serializing instruction"
 for this purpose.)
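
To make the usage model concrete, here is an illustrative userspace-side
sketch of the JIT publish sequence that the updated arch-support.txt
text describes.  The membarrier() wrapper and publish_jitted_code() are
made-up names, and the process is assumed to have issued
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE once at startup:

	#include <linux/membarrier.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int membarrier(int cmd, unsigned int flags, int cpu_id)
	{
		return syscall(__NR_membarrier, cmd, flags, cpu_id);
	}

	static void publish_jitted_code(void *buf, size_t len)
	{
		/* 1. The new instructions have been written into buf. */

		/* 2. Make the icache coherent (needed on arm/arm64; a no-op
		 *    on x86). */
		__builtin___clear_cache((char *)buf, (char *)buf + len);

		/* 3. Force a core-serializing event on every CPU that might
		 *    run a thread of this process. */
		membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0);

		/* 4. Any thread may now jump to buf. */
	}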

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Acked-by: Will Deacon <will@kernel.org> # for arm64
Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 .../membarrier-sync-core/arch-support.txt     | 69 ++++++-------------
 arch/arm/include/asm/membarrier.h             | 21 ++++++
 arch/arm64/include/asm/membarrier.h           | 19 +++++
 arch/powerpc/include/asm/membarrier.h         | 10 +++
 arch/x86/Kconfig                              |  1 -
 arch/x86/include/asm/membarrier.h             | 25 +++++++
 arch/x86/include/asm/sync_core.h              | 20 ------
 arch/x86/kernel/alternative.c                 |  2 +-
 arch/x86/kernel/cpu/mce/core.c                |  2 +-
 arch/x86/mm/tlb.c                             |  3 +-
 drivers/misc/sgi-gru/grufault.c               |  2 +-
 drivers/misc/sgi-gru/gruhandles.c             |  2 +-
 drivers/misc/sgi-gru/grukservices.c           |  2 +-
 include/linux/sched/mm.h                      |  1 -
 include/linux/sync_core.h                     | 21 ------
 init/Kconfig                                  |  3 -
 kernel/sched/membarrier.c                     | 14 +++-
 17 files changed, 115 insertions(+), 102 deletions(-)
 create mode 100644 arch/arm/include/asm/membarrier.h
 create mode 100644 arch/arm64/include/asm/membarrier.h
 create mode 100644 arch/x86/include/asm/membarrier.h
 delete mode 100644 include/linux/sync_core.h

diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
index 883d33b265d6..4009b26bf5c3 100644
--- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt
+++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
@@ -5,51 +5,26 @@
 #
 # Architecture requirements
 #
-# * arm/arm64/powerpc
 #
-# Rely on implicit context synchronization as a result of exception return
-# when returning from IPI handler, and when returning to user-space.
-#
-# * x86
-#
-# x86-32 uses IRET as return from interrupt, which takes care of the IPI.
-# However, it uses both IRET and SYSEXIT to go back to user-space. The IRET
-# instruction is core serializing, but not SYSEXIT.
-#
-# x86-64 uses IRET as return from interrupt, which takes care of the IPI.
-# However, it can return to user-space through either SYSRETL (compat code),
-# SYSRETQ, or IRET.
-#
-# Given that neither SYSRET{L,Q}, nor SYSEXIT, are core serializing, we rely
-# instead on write_cr3() performed by switch_mm() to provide core serialization
-# after changing the current mm, and deal with the special case of kthread ->
-# uthread (temporarily keeping current mm into active_mm) by issuing a
-# sync_core_before_usermode() in that specific case.
-#
-    -----------------------
-    |         arch |status|
-    -----------------------
-    |       alpha: | TODO |
-    |         arc: | TODO |
-    |         arm: |  ok  |
-    |       arm64: |  ok  |
-    |        csky: | TODO |
-    |       h8300: | TODO |
-    |     hexagon: | TODO |
-    |        ia64: | TODO |
-    |        m68k: | TODO |
-    |  microblaze: | TODO |
-    |        mips: | TODO |
-    |       nds32: | TODO |
-    |       nios2: | TODO |
-    |    openrisc: | TODO |
-    |      parisc: | TODO |
-    |     powerpc: |  ok  |
-    |       riscv: | TODO |
-    |        s390: | TODO |
-    |          sh: | TODO |
-    |       sparc: | TODO |
-    |          um: | TODO |
-    |         x86: |  ok  |
-    |      xtensa: | TODO |
-    -----------------------
+# An architecture that wants to support
+# MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE needs to define precisely what it
+# is supposed to do and implement membarrier_sync_core_before_usermode() to
+# make it do that.  Then it can select ARCH_HAS_MEMBARRIER_SYNC_CORE via
+# Kconfig and document what SYNC_CORE does on that architecture in this
+# list.
+#
+# On x86, a program can safely modify code, issue
+# MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, and then execute that code, via
+# the modified address or an alias, from any thread in the calling process.
+#
+# On arm and arm64, a program can modify code, flush the icache as needed,
+# and issue MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE to force a "context
+# synchronizing event", aka pipeline flush on all CPUs that might run the
+# calling process.  Then the program can execute the modified code as long
+# as it is executed from an address consistent with the icache flush and
+# the CPU's cache type.  On arm, cacheflush(2) can be used for the icache
+# flushing operation.
+#
+# On powerpc, a program can use MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE
+# similarly to arm64.  It would be nice if the powerpc maintainers could
+# add a more clear explanation.
diff --git a/arch/arm/include/asm/membarrier.h b/arch/arm/include/asm/membarrier.h
new file mode 100644
index 000000000000..c162a0758657
--- /dev/null
+++ b/arch/arm/include/asm/membarrier.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM_MEMBARRIER_H
+#define _ASM_ARM_MEMBARRIER_H
+
+#include <asm/barrier.h>
+
+/*
+ * On arm, anyone trying to use membarrier() to handle JIT code is required
+ * to first flush the icache (most likely by using cacheflush(2)) and then
+ * do SYNC_CORE.  All that's needed after the icache flush is to execute a
+ * "context synchronization event".
+ *
+ * Returning to user mode is a context synchronization event, so no
+ * specific action by the kernel is needed other than ensuring that the
+ * kernel is entered.
+ */
+static inline void membarrier_sync_core_before_usermode(void)
+{
+}
+
+#endif /* _ASM_ARM_MEMBARRIER_H */
diff --git a/arch/arm64/include/asm/membarrier.h b/arch/arm64/include/asm/membarrier.h
new file mode 100644
index 000000000000..db8e0ea57253
--- /dev/null
+++ b/arch/arm64/include/asm/membarrier.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM64_MEMBARRIER_H
+#define _ASM_ARM64_MEMBARRIER_H
+
+#include <asm/barrier.h>
+
+/*
+ * On arm64, anyone trying to use membarrier() to handle JIT code is
+ * required to first flush the icache and then do SYNC_CORE.  All that's
+ * needed after the icache flush is to execute a "context synchronization
+ * event".  Right now, ERET does this, and we are guaranteed to ERET before
+ * any user code runs.  If Linux ever programs the CPU to make ERET stop
+ * being a context synchronizing event, then this will need to be adjusted.
+ */
+static inline void membarrier_sync_core_before_usermode(void)
+{
+}
+
+#endif /* _ASM_ARM64_MEMBARRIER_H */
diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h
index b90766e95bd1..466abe6fdcea 100644
--- a/arch/powerpc/include/asm/membarrier.h
+++ b/arch/powerpc/include/asm/membarrier.h
@@ -1,4 +1,14 @@
 #ifndef _ASM_POWERPC_MEMBARRIER_H
 #define _ASM_POWERPC_MEMBARRIER_H
 
+#include <asm/barrier.h>
+
+/*
+ * The RFI family of instructions are context synchronising, and
+ * that is how we return to userspace, so nothing is required here.
+ */
+static inline void membarrier_sync_core_before_usermode(void)
+{
+}
+
 #endif /* _ASM_POWERPC_MEMBARRIER_H */
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d9830e7e1060..5060c38bf560 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -90,7 +90,6 @@ config X86
 	select ARCH_HAS_SET_DIRECT_MAP
 	select ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_HAS_STRICT_MODULE_RWX
-	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 	select ARCH_HAS_SYSCALL_WRAPPER
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAS_DEBUG_WX
diff --git a/arch/x86/include/asm/membarrier.h b/arch/x86/include/asm/membarrier.h
new file mode 100644
index 000000000000..9b72a1b49359
--- /dev/null
+++ b/arch/x86/include/asm/membarrier.h
@@ -0,0 +1,25 @@
+#ifndef _ASM_X86_MEMBARRIER_H
+#define _ASM_X86_MEMBARRIER_H
+
+#include <asm/sync_core.h>
+
+/*
+ * Ensure that the CPU notices any instruction changes before the next time
+ * it returns to usermode.
+ */
+static inline void membarrier_sync_core_before_usermode(void)
+{
+	/* With PTI, we unconditionally serialize before running user code. */
+	if (static_cpu_has(X86_FEATURE_PTI))
+		return;
+
+	/*
+	 * Even if we're in an interrupt, we might reschedule before returning,
+	 * in which case we could switch to a different thread in the same mm
+	 * and return using SYSRET or SYSEXIT.  Instead of trying to keep
+	 * track of our need to sync the core, just sync right away.
+	 */
+	sync_core();
+}
+
+#endif /* _ASM_X86_MEMBARRIER_H */
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index ab7382f92aff..bfe4ac4e6be2 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -88,24 +88,4 @@ static inline void sync_core(void)
 	iret_to_self();
 }
 
-/*
- * Ensure that a core serializing instruction is issued before returning
- * to user-mode. x86 implements return to user-space through sysexit,
- * sysrel, and sysretq, which are not core serializing.
- */
-static inline void sync_core_before_usermode(void)
-{
-	/* With PTI, we unconditionally serialize before running user code. */
-	if (static_cpu_has(X86_FEATURE_PTI))
-		return;
-
-	/*
-	 * Even if we're in an interrupt, we might reschedule before returning,
-	 * in which case we could switch to a different thread in the same mm
-	 * and return using SYSRET or SYSEXIT.  Instead of trying to keep
-	 * track of our need to sync the core, just sync right away.
-	 */
-	sync_core();
-}
-
 #endif /* _ASM_X86_SYNC_CORE_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index e9da3dc71254..b47cd22b2eb1 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -17,7 +17,7 @@
 #include <linux/kprobes.h>
 #include <linux/mmu_context.h>
 #include <linux/bsearch.h>
-#include <linux/sync_core.h>
+#include <asm/sync_core.h>
 #include <asm/text-patching.h>
 #include <asm/alternative.h>
 #include <asm/sections.h>
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 193204aee880..a2529e09f620 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -41,12 +41,12 @@
 #include <linux/irq_work.h>
 #include <linux/export.h>
 #include <linux/set_memory.h>
-#include <linux/sync_core.h>
 #include <linux/task_work.h>
 #include <linux/hardirq.h>
 
 #include <asm/intel-family.h>
 #include <asm/processor.h>
+#include <asm/sync_core.h>
 #include <asm/traps.h>
 #include <asm/tlbflush.h>
 #include <asm/mce.h>
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 1ae15172885e..74b7a615bc15 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -12,6 +12,7 @@
 #include <linux/sched/mm.h>
 
 #include <asm/tlbflush.h>
+#include <asm/membarrier.h>
 #include <asm/mmu_context.h>
 #include <asm/nospec-branch.h>
 #include <asm/cache.h>
@@ -491,7 +492,7 @@ static void sync_core_if_membarrier_enabled(struct mm_struct *next)
 #ifdef CONFIG_MEMBARRIER
 	if (unlikely(atomic_read(&next->membarrier_state) &
 		     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE))
-		sync_core_before_usermode();
+		membarrier_sync_core_before_usermode();
 #endif
 }
 
diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index d7ef61e602ed..462c667bd6c4 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -20,8 +20,8 @@
 #include <linux/io.h>
 #include <linux/uaccess.h>
 #include <linux/security.h>
-#include <linux/sync_core.h>
 #include <linux/prefetch.h>
+#include <asm/sync_core.h>
 #include "gru.h"
 #include "grutables.h"
 #include "grulib.h"
diff --git a/drivers/misc/sgi-gru/gruhandles.c b/drivers/misc/sgi-gru/gruhandles.c
index 1d75d5e540bc..c8cba1c1b00f 100644
--- a/drivers/misc/sgi-gru/gruhandles.c
+++ b/drivers/misc/sgi-gru/gruhandles.c
@@ -16,7 +16,7 @@
 #define GRU_OPERATION_TIMEOUT	(((cycles_t) local_cpu_data->itc_freq)*10)
 #define CLKS2NSEC(c)		((c) *1000000000 / local_cpu_data->itc_freq)
 #else
-#include <linux/sync_core.h>
+#include <asm/sync_core.h>
 #include <asm/tsc.h>
 #define GRU_OPERATION_TIMEOUT	((cycles_t) tsc_khz*10*1000)
 #define CLKS2NSEC(c)		((c) * 1000000 / tsc_khz)
diff --git a/drivers/misc/sgi-gru/grukservices.c b/drivers/misc/sgi-gru/grukservices.c
index 0ea923fe6371..ce03ff3f7c3a 100644
--- a/drivers/misc/sgi-gru/grukservices.c
+++ b/drivers/misc/sgi-gru/grukservices.c
@@ -16,10 +16,10 @@
 #include <linux/miscdevice.h>
 #include <linux/proc_fs.h>
 #include <linux/interrupt.h>
-#include <linux/sync_core.h>
 #include <linux/uaccess.h>
 #include <linux/delay.h>
 #include <linux/export.h>
+#include <asm/sync_core.h>
 #include <asm/io_apic.h>
 #include "gru.h"
 #include "grulib.h"
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index e8919995d8dd..e107f292fc42 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -7,7 +7,6 @@
 #include <linux/sched.h>
 #include <linux/mm_types.h>
 #include <linux/gfp.h>
-#include <linux/sync_core.h>
 
 /*
  * Routines for handling mm_structs
diff --git a/include/linux/sync_core.h b/include/linux/sync_core.h
deleted file mode 100644
index 013da4b8b327..000000000000
--- a/include/linux/sync_core.h
+++ /dev/null
@@ -1,21 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_SYNC_CORE_H
-#define _LINUX_SYNC_CORE_H
-
-#ifdef CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
-#include <asm/sync_core.h>
-#else
-/*
- * This is a dummy sync_core_before_usermode() implementation that can be used
- * on all architectures which return to user-space through core serializing
- * instructions.
- * If your architecture returns to user-space through non-core-serializing
- * instructions, you need to write your own functions.
- */
-static inline void sync_core_before_usermode(void)
-{
-}
-#endif
-
-#endif /* _LINUX_SYNC_CORE_H */
-
diff --git a/init/Kconfig b/init/Kconfig
index 11f8a845f259..bbaf93f9438b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2364,9 +2364,6 @@ source "kernel/Kconfig.locks"
 config ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	bool
 
-config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
-	bool
-
 # It may be useful for an architecture to override the definitions of the
 # SYSCALL_DEFINE() and __SYSCALL_DEFINEx() macros in <linux/syscalls.h>
 # and the COMPAT_ variants in <linux/compat.h>, in particular to use a
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 327830f89c37..eb73eeaedc7d 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -5,6 +5,14 @@
  * membarrier system call
  */
 #include "sched.h"
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
+#include <asm/membarrier.h>
+#else
+static inline void membarrier_sync_core_before_usermode(void)
+{
+	compiletime_assert(0, "architecture does not implement membarrier_sync_core_before_usermode");
+}
+#endif
 
 /*
  * The basic principle behind the regular memory barrier mode of
@@ -231,12 +239,12 @@ static void ipi_sync_core(void *info)
 	 * the big comment at the top of this file.
 	 *
 	 * A sync_core() would provide this guarantee, but
-	 * sync_core_before_usermode() might end up being deferred until
-	 * after membarrier()'s smp_mb().
+	 * membarrier_sync_core_before_usermode() might end up being deferred
+	 * until after membarrier()'s smp_mb().
 	 */
 	smp_mb();	/* IPIs should be serializing but paranoid. */
 
-	sync_core_before_usermode();
+	membarrier_sync_core_before_usermode();
 }
 
 static void ipi_rseq(void *info)
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 08/23] membarrier: Remove redundant clear of mm->membarrier_state in exec_mmap()
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (6 preceding siblings ...)
  2022-01-08 16:43   ` Andy Lutomirski
@ 2022-01-08 16:43 ` Andy Lutomirski
  2022-01-12 16:13   ` Mathieu Desnoyers
  2022-01-08 16:43 ` [PATCH 09/23] membarrier: Fix incorrect barrier positions during exec and kthread_use_mm() Andy Lutomirski
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

exec_mmap() supplies a brand-new mm from mm_alloc(), and membarrier_state
is already 0.  There's no need to clear it again.
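
For reference, a sketch of mm_alloc() itself, close to the version in
kernel/fork.c at the time of writing (simplified), shows why the state is
guaranteed to start at zero:

    struct mm_struct *mm_alloc(void)
    {
            struct mm_struct *mm;

            mm = allocate_mm();
            if (!mm)
                    return NULL;

            /* The zero-fill already leaves mm->membarrier_state == 0. */
            memset(mm, 0, sizeof(*mm));
            return mm_init(mm, current, current_user_ns());
    }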

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 kernel/sched/membarrier.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index eb73eeaedc7d..c38014c2ed66 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -285,7 +285,6 @@ void membarrier_exec_mmap(struct mm_struct *mm)
 	 * clearing this state.
 	 */
 	smp_mb();
-	atomic_set(&mm->membarrier_state, 0);
 	/*
 	 * Keep the runqueue membarrier_state in sync with this mm
 	 * membarrier_state.
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 09/23] membarrier: Fix incorrect barrier positions during exec and kthread_use_mm()
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (7 preceding siblings ...)
  2022-01-08 16:43 ` [PATCH 08/23] membarrier: Remove redundant clear of mm->membarrier_state in exec_mmap() Andy Lutomirski
@ 2022-01-08 16:43 ` Andy Lutomirski
  2022-01-12 16:30   ` Mathieu Desnoyers
  2022-01-08 16:43 ` [PATCH 10/23] x86/events, x86/insn-eval: Remove incorrect active_mm references Andy Lutomirski
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

membarrier() requires a barrier before changes to rq->curr->mm, not just
before writes to rq->membarrier_state.  Move the barrier in exec_mmap() to
the right place.  Add the barrier in kthread_use_mm() -- it was entirely
missing before.

This patch makes exec_mmap() and kthread_use_mm() use the same membarrier
hooks, which results in some code deletion.

As an added bonus, this will eliminate a redundant barrier in execve() on
arches for which spinlock acquisition is a barrier.
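
For context, the barrier pairs with the CPU-selection step in
kernel/sched/membarrier.c, paraphrased below (details elided, not code from
this patch): membarrier() only IPIs CPUs whose rq->curr->mm matches the
caller's mm, so a CPU that is concurrently installing a new mm must fully
order its earlier memory accesses before its ->mm store becomes visible.

    /* Paraphrased from membarrier_private_expedited(). */
    for_each_online_cpu(cpu) {
            struct task_struct *p;

            p = rcu_dereference(cpu_rq(cpu)->curr);
            if (p && p->mm == mm)
                    __cpumask_set_cpu(cpu, tmpmask);        /* gets an IPI */
    }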

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 fs/exec.c                 |  6 +++++-
 include/linux/sched/mm.h  |  2 --
 kernel/kthread.c          |  5 +++++
 kernel/sched/membarrier.c | 15 ---------------
 4 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 38b05e01c5bd..325dab98bc51 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1001,12 +1001,16 @@ static int exec_mmap(struct mm_struct *mm)
 	}
 
 	task_lock(tsk);
-	membarrier_exec_mmap(mm);
+	/*
+	 * membarrier() requires a full barrier before switching mm.
+	 */
+	smp_mb__after_spinlock();
 
 	local_irq_disable();
 	active_mm = tsk->active_mm;
 	tsk->active_mm = mm;
 	WRITE_ONCE(tsk->mm, mm);  /* membarrier reads this without locks */
+	membarrier_update_current_mm(mm);
 	/*
 	 * This prevents preemption while active_mm is being loaded and
 	 * it and mm are being updated, which could cause problems for
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index e107f292fc42..f1d2beac464c 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -344,8 +344,6 @@ enum {
 #include <asm/membarrier.h>
 #endif
 
-extern void membarrier_exec_mmap(struct mm_struct *mm);
-
 extern void membarrier_update_current_mm(struct mm_struct *next_mm);
 
 /*
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 3b18329f885c..18b0a2e0e3b2 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1351,6 +1351,11 @@ void kthread_use_mm(struct mm_struct *mm)
 	WARN_ON_ONCE(tsk->mm);
 
 	task_lock(tsk);
+	/*
+	 * membarrier() requires a full barrier before switching mm.
+	 */
+	smp_mb__after_spinlock();
+
 	/* Hold off tlb flush IPIs while switching mm's */
 	local_irq_disable();
 	active_mm = tsk->active_mm;
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index c38014c2ed66..44fafa6e1efd 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -277,21 +277,6 @@ static void ipi_sync_rq_state(void *info)
 	smp_mb();
 }
 
-void membarrier_exec_mmap(struct mm_struct *mm)
-{
-	/*
-	 * Issue a memory barrier before clearing membarrier_state to
-	 * guarantee that no memory access prior to exec is reordered after
-	 * clearing this state.
-	 */
-	smp_mb();
-	/*
-	 * Keep the runqueue membarrier_state in sync with this mm
-	 * membarrier_state.
-	 */
-	this_cpu_write(runqueues.membarrier_state, 0);
-}
-
 void membarrier_update_current_mm(struct mm_struct *next_mm)
 {
 	struct rq *rq = this_rq();
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 10/23] x86/events, x86/insn-eval: Remove incorrect active_mm references
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (8 preceding siblings ...)
  2022-01-08 16:43 ` [PATCH 09/23] membarrier: Fix incorrect barrier positions during exec and kthread_use_mm() Andy Lutomirski
@ 2022-01-08 16:43 ` Andy Lutomirski
  2022-01-08 16:43 ` [PATCH 11/23] sched/scs: Initialize shadow stack on idle thread bringup, not shutdown Andy Lutomirski
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski, Joerg Roedel, Masami Hiramatsu

When decoding an instruction or handling a perf event that references an
LDT segment, if we don't have a valid user context, trying to access the
LDT by any means other than SLDT is racy.  Certainly, using
current->active_mm is wrong, as active_mm can point to a real user mm when
CR3 and LDTR no longer reference that mm.

Clean up the code.  If nmi_uaccess_okay() says we don't have a valid
context, just fail.  Otherwise use current->mm.
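
The resulting access pattern, written out as a hypothetical helper (not
something this patch adds), mirrors what the perf-event path now does:

    /* Hypothetical helper; illustrates the pattern only. */
    static struct ldt_struct *try_get_current_ldt(void)
    {
            if (!nmi_uaccess_okay())
                    return NULL;    /* no valid, non-lazy user mm: give up */

            /* Pairs with the smp_store_release() that installs a new LDT. */
            return smp_load_acquire(&current->mm->context.ldt);
    }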

Cc: Joerg Roedel <jroedel@suse.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/events/core.c   |  9 ++++++++-
 arch/x86/lib/insn-eval.c | 13 ++++++++++---
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 6dfa8ddaa60f..930082f0eba5 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2800,8 +2800,15 @@ static unsigned long get_segment_base(unsigned int segment)
 #ifdef CONFIG_MODIFY_LDT_SYSCALL
 		struct ldt_struct *ldt;
 
+		/*
+		 * If we're not in a valid context with a real (not just lazy)
+		 * user mm, then don't even try.
+		 */
+		if (!nmi_uaccess_okay())
+			return 0;
+
 		/* IRQs are off, so this synchronizes with smp_store_release */
-		ldt = READ_ONCE(current->active_mm->context.ldt);
+		ldt = smp_load_acquire(&current->mm->context.ldt);
 		if (!ldt || idx >= ldt->nr_entries)
 			return 0;
 
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index a1d24fdc07cf..87a85a9dcdc4 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -609,14 +609,21 @@ static bool get_desc(struct desc_struct *out, unsigned short sel)
 		/* Bits [15:3] contain the index of the desired entry. */
 		sel >>= 3;
 
-		mutex_lock(&current->active_mm->context.lock);
-		ldt = current->active_mm->context.ldt;
+		/*
+		 * If we're not in a valid context with a real (not just lazy)
+		 * user mm, then don't even try.
+		 */
+		if (!nmi_uaccess_okay())
+			return false;
+
+		mutex_lock(&current->mm->context.lock);
+		ldt = current->mm->context.ldt;
 		if (ldt && sel < ldt->nr_entries) {
 			*out = ldt->entries[sel];
 			success = true;
 		}
 
-		mutex_unlock(&current->active_mm->context.lock);
+		mutex_unlock(&current->mm->context.lock);
 
 		return success;
 	}
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 11/23] sched/scs: Initialize shadow stack on idle thread bringup, not shutdown
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (9 preceding siblings ...)
  2022-01-08 16:43 ` [PATCH 10/23] x86/events, x86/insn-eval: Remove incorrect active_mm references Andy Lutomirski
@ 2022-01-08 16:43 ` Andy Lutomirski
  2022-01-10 22:06   ` Sami Tolvanen
  2022-01-08 16:43 ` [PATCH 12/23] Rework "sched/core: Fix illegal RCU from offline CPUs" Andy Lutomirski
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski, Woody Lin, Valentin Schneider, Sami Tolvanen

Starting with commit 63acd42c0d49 ("sched/scs: Reset the shadow stack when
idle_task_exit"), the idle thread's shadow stack was reset from the idle
task's context during CPU hot-unplug.  This was fragile: between resetting
the shadow stack and actually stopping the idle task, the shadow stack
did not match the actual call stack.

Clean this up by resetting the idle task's SCS in bringup_cpu().

init_idle() still does scs_task_reset() -- see the comments there.  I
leave this to an SCS maintainer to untangle further.
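
For reference, scs_task_reset() simply rewinds the task's shadow call stack
pointer to the base of its SCS area (paraphrased from include/linux/scs.h),
which is why calling it on a task that is still executing leaves the shadow
stack out of sync with the real call stack:

    static inline void scs_task_reset(struct task_struct *tsk)
    {
            /* Point the shadow stack pointer back at the base. */
            task_scs_sp(tsk) = task_scs(tsk);
    }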

Cc: Woody Lin <woodylin@google.com>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 kernel/cpu.c        | 3 +++
 kernel/sched/core.c | 9 ++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 192e43a87407..be16816bb87c 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -33,6 +33,7 @@
 #include <linux/slab.h>
 #include <linux/percpu-rwsem.h>
 #include <linux/cpuset.h>
+#include <linux/scs.h>
 
 #include <trace/events/power.h>
 #define CREATE_TRACE_POINTS
@@ -587,6 +588,8 @@ static int bringup_cpu(unsigned int cpu)
 	struct task_struct *idle = idle_thread_get(cpu);
 	int ret;
 
+	scs_task_reset(idle);
+
 	/*
 	 * Some architectures have to walk the irq descriptors to
 	 * setup the vector space for the cpu which comes online.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 917068b0a145..acd52a7d1349 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8621,7 +8621,15 @@ void __init init_idle(struct task_struct *idle, int cpu)
 	idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
 	kthread_set_per_cpu(idle, cpu);
 
+	/*
+	 * NB: This is called from sched_init() on the *current* idle thread.
+	 * This seems fragile if not actively incorrect.
+	 *
+	 * Initializing SCS for about-to-be-brought-up CPU idle threads
+	 * is in bringup_cpu(), but that does not cover the boot CPU.
+	 */
 	scs_task_reset(idle);
+
 	kasan_unpoison_task_stack(idle);
 
 #ifdef CONFIG_SMP
@@ -8779,7 +8787,6 @@ void idle_task_exit(void)
 		finish_arch_post_lock_switch();
 	}
 
-	scs_task_reset(current);
 	/* finish_cpu(), as ran on the BP, will clean up the active_mm state */
 }
 
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 12/23] Rework "sched/core: Fix illegal RCU from offline CPUs"
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (10 preceding siblings ...)
  2022-01-08 16:43 ` [PATCH 11/23] sched/scs: Initialize shadow stack on idle thread bringup, not shutdown Andy Lutomirski
@ 2022-01-08 16:43 ` Andy Lutomirski
  2022-01-08 16:43 ` [PATCH 13/23] exec: Remove unnecessary vmacache_seqnum clear in exec_mmap() Andy Lutomirski
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

This reworks commit bf2c59fce4074e55d622089b34be3a6bc95484fb.  The problem
solved by that commit was that mmdrop() after cpuhp_report_idle_dead() is
an illegal use of RCU, so, with that commit applied, mmdrop() of the last
lazy mm on an offlined CPU was done by the BSP.

With the upcoming reworking of lazy mm references, retaining that design
would involve the cpu hotplug code poking into internal scheduler details.

Rework the fix.  Add a helper unlazy_mm_irqs_off() to fully switch a CPU to
init_mm, releasing any previous lazy active_mm, and do this before
cpuhp_report_idle_dead().

Note that the actual refcounting of init_mm is inconsistent both before and
after this patch.  Most (all?) arches mmgrab(&init_mm) when booting an AP
and set current->active_mm = &init_mm on that AP.  This is consistent with
the current ->active_mm refcounting rules, but architectures don't do a
corresponding mmdrop() when a CPU goes offline.  The result is that each
offline/online cycle leaks one init_mm reference.  This seems fairly
harmless.
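
As a rough sketch of the shape such a helper could take (illustrative only;
the real unlazy_mm_irqs_off() is added to kernel/sched/core.c further down
and may differ in detail):

    /* Sketch only -- see kernel/sched/core.c for the real thing. */
    static void unlazy_mm_irqs_off(void)
    {
            struct mm_struct *old_mm = current->active_mm;

            lockdep_assert_irqs_disabled();

            if (old_mm == &init_mm)
                    return;

            switch_mm_irqs_off(old_mm, &init_mm, current);
            current->active_mm = &init_mm;
            mmdrop(old_mm);         /* drop the lazy reference */
    }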

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/arm/kernel/smp.c                        |  2 -
 arch/arm64/kernel/smp.c                      |  2 -
 arch/csky/kernel/smp.c                       |  2 -
 arch/ia64/kernel/process.c                   |  1 -
 arch/mips/cavium-octeon/smp.c                |  1 -
 arch/mips/kernel/smp-bmips.c                 |  2 -
 arch/mips/kernel/smp-cps.c                   |  1 -
 arch/mips/loongson64/smp.c                   |  2 -
 arch/powerpc/platforms/85xx/smp.c            |  2 -
 arch/powerpc/platforms/powermac/smp.c        |  2 -
 arch/powerpc/platforms/powernv/smp.c         |  1 -
 arch/powerpc/platforms/pseries/hotplug-cpu.c |  2 -
 arch/powerpc/platforms/pseries/pmem.c        |  1 -
 arch/riscv/kernel/cpu-hotplug.c              |  2 -
 arch/s390/kernel/smp.c                       |  1 -
 arch/sh/kernel/smp.c                         |  1 -
 arch/sparc/kernel/smp_64.c                   |  2 -
 arch/x86/kernel/smpboot.c                    |  2 -
 arch/xtensa/kernel/smp.c                     |  1 -
 include/linux/sched/hotplug.h                |  6 ---
 kernel/cpu.c                                 | 18 +-------
 kernel/sched/core.c                          | 43 +++++++++++---------
 kernel/sched/idle.c                          |  1 +
 kernel/sched/sched.h                         |  1 +
 24 files changed, 27 insertions(+), 72 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 842427ff2b3c..19863ad2f852 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -323,8 +323,6 @@ void arch_cpu_idle_dead(void)
 {
 	unsigned int cpu = smp_processor_id();
 
-	idle_task_exit();
-
 	local_irq_disable();
 
 	/*
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 6f6ff072acbd..4b38fb42543f 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -366,8 +366,6 @@ void cpu_die(void)
 	unsigned int cpu = smp_processor_id();
 	const struct cpu_operations *ops = get_cpu_ops(cpu);
 
-	idle_task_exit();
-
 	local_daif_mask();
 
 	/* Tell __cpu_die() that this CPU is now safe to dispose of */
diff --git a/arch/csky/kernel/smp.c b/arch/csky/kernel/smp.c
index e2993539af8e..4b17c3b8fcba 100644
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -309,8 +309,6 @@ void __cpu_die(unsigned int cpu)
 
 void arch_cpu_idle_dead(void)
 {
-	idle_task_exit();
-
 	cpu_report_death();
 
 	while (!secondary_stack)
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index e56d63f4abf9..ddb13db7ff7e 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -209,7 +209,6 @@ static inline void play_dead(void)
 
 	max_xtp();
 	local_irq_disable();
-	idle_task_exit();
 	ia64_jump_to_sal(&sal_boot_rendez_state[this_cpu]);
 	/*
 	 * The above is a point of no-return, the processor is
diff --git a/arch/mips/cavium-octeon/smp.c b/arch/mips/cavium-octeon/smp.c
index 89954f5f87fb..7130ec7e9b61 100644
--- a/arch/mips/cavium-octeon/smp.c
+++ b/arch/mips/cavium-octeon/smp.c
@@ -343,7 +343,6 @@ void play_dead(void)
 {
 	int cpu = cpu_number_map(cvmx_get_core_num());
 
-	idle_task_exit();
 	octeon_processor_boot = 0xff;
 	per_cpu(cpu_state, cpu) = CPU_DEAD;
 
diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c
index b6ef5f7312cf..bd1e650dd176 100644
--- a/arch/mips/kernel/smp-bmips.c
+++ b/arch/mips/kernel/smp-bmips.c
@@ -388,8 +388,6 @@ static void bmips_cpu_die(unsigned int cpu)
 
 void __ref play_dead(void)
 {
-	idle_task_exit();
-
 	/* flush data cache */
 	_dma_cache_wback_inv(0, ~0);
 
diff --git a/arch/mips/kernel/smp-cps.c b/arch/mips/kernel/smp-cps.c
index bcd6a944b839..23221fcee423 100644
--- a/arch/mips/kernel/smp-cps.c
+++ b/arch/mips/kernel/smp-cps.c
@@ -472,7 +472,6 @@ void play_dead(void)
 	unsigned int cpu;
 
 	local_irq_disable();
-	idle_task_exit();
 	cpu = smp_processor_id();
 	cpu_death = CPU_DEATH_POWER;
 
diff --git a/arch/mips/loongson64/smp.c b/arch/mips/loongson64/smp.c
index 09ebe84a17fe..a1fe59f354d1 100644
--- a/arch/mips/loongson64/smp.c
+++ b/arch/mips/loongson64/smp.c
@@ -788,8 +788,6 @@ void play_dead(void)
 	unsigned int cpu = smp_processor_id();
 	void (*play_dead_at_ckseg1)(int *);
 
-	idle_task_exit();
-
 	prid_imp = read_c0_prid() & PRID_IMP_MASK;
 	prid_rev = read_c0_prid() & PRID_REV_MASK;
 
diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index c6df294054fe..9de9e1fcc87a 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -121,8 +121,6 @@ static void smp_85xx_cpu_offline_self(void)
 	/* mask all irqs to prevent cpu wakeup */
 	qoriq_pm_ops->irq_mask(cpu);
 
-	idle_task_exit();
-
 	mtspr(SPRN_TCR, 0);
 	mtspr(SPRN_TSR, mfspr(SPRN_TSR));
 
diff --git a/arch/powerpc/platforms/powermac/smp.c b/arch/powerpc/platforms/powermac/smp.c
index 3256a316e884..69d2bdd8246d 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -924,7 +924,6 @@ static void pmac_cpu_offline_self(void)
 	int cpu = smp_processor_id();
 
 	local_irq_disable();
-	idle_task_exit();
 	pr_debug("CPU%d offline\n", cpu);
 	generic_set_cpu_dead(cpu);
 	smp_wmb();
@@ -939,7 +938,6 @@ static void pmac_cpu_offline_self(void)
 	int cpu = smp_processor_id();
 
 	local_irq_disable();
-	idle_task_exit();
 
 	/*
 	 * turn off as much as possible, we'll be
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index cbb67813cd5d..cba21d053dae 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -169,7 +169,6 @@ static void pnv_cpu_offline_self(void)
 
 	/* Standard hot unplug procedure */
 
-	idle_task_exit();
 	cpu = smp_processor_id();
 	DBG("CPU%d offline\n", cpu);
 	generic_set_cpu_dead(cpu);
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index d646c22e94ab..c11ccd038866 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -19,7 +19,6 @@
 #include <linux/kernel.h>
 #include <linux/interrupt.h>
 #include <linux/delay.h>
-#include <linux/sched.h>	/* for idle_task_exit */
 #include <linux/sched/hotplug.h>
 #include <linux/cpu.h>
 #include <linux/of.h>
@@ -63,7 +62,6 @@ static void pseries_cpu_offline_self(void)
 	unsigned int hwcpu = hard_smp_processor_id();
 
 	local_irq_disable();
-	idle_task_exit();
 	if (xive_enabled())
 		xive_teardown_cpu();
 	else
diff --git a/arch/powerpc/platforms/pseries/pmem.c b/arch/powerpc/platforms/pseries/pmem.c
index 439ac72c2470..5280fcd5b37d 100644
--- a/arch/powerpc/platforms/pseries/pmem.c
+++ b/arch/powerpc/platforms/pseries/pmem.c
@@ -9,7 +9,6 @@
 #include <linux/kernel.h>
 #include <linux/interrupt.h>
 #include <linux/delay.h>
-#include <linux/sched.h>	/* for idle_task_exit */
 #include <linux/sched/hotplug.h>
 #include <linux/cpu.h>
 #include <linux/of.h>
diff --git a/arch/riscv/kernel/cpu-hotplug.c b/arch/riscv/kernel/cpu-hotplug.c
index df84e0c13db1..6cced2d79f07 100644
--- a/arch/riscv/kernel/cpu-hotplug.c
+++ b/arch/riscv/kernel/cpu-hotplug.c
@@ -77,8 +77,6 @@ void __cpu_die(unsigned int cpu)
  */
 void cpu_stop(void)
 {
-	idle_task_exit();
-
 	(void)cpu_report_death();
 
 	cpu_ops[smp_processor_id()]->cpu_stop();
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 1a04e5bdf655..328930549803 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -987,7 +987,6 @@ void __cpu_die(unsigned int cpu)
 
 void __noreturn cpu_die(void)
 {
-	idle_task_exit();
 	__bpon();
 	pcpu_sigp_retry(pcpu_devices + smp_processor_id(), SIGP_STOP, 0);
 	for (;;) ;
diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
index 65924d9ec245..cbd14604a736 100644
--- a/arch/sh/kernel/smp.c
+++ b/arch/sh/kernel/smp.c
@@ -106,7 +106,6 @@ int native_cpu_disable(unsigned int cpu)
 
 void play_dead_common(void)
 {
-	idle_task_exit();
 	irq_ctx_exit(raw_smp_processor_id());
 	mb();
 
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index 0224d8f19ed6..450dc9513ff0 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1301,8 +1301,6 @@ void cpu_play_dead(void)
 	int cpu = smp_processor_id();
 	unsigned long pstate;
 
-	idle_task_exit();
-
 	if (tlb_type == hypervisor) {
 		struct trap_per_cpu *tb = &trap_block[cpu];
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 85f6e242b6b4..a57a709f2c35 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1656,8 +1656,6 @@ void native_cpu_die(unsigned int cpu)
 
 void play_dead_common(void)
 {
-	idle_task_exit();
-
 	/* Ack it */
 	(void)cpu_report_death();
 
diff --git a/arch/xtensa/kernel/smp.c b/arch/xtensa/kernel/smp.c
index 1254da07ead1..fb011807d041 100644
--- a/arch/xtensa/kernel/smp.c
+++ b/arch/xtensa/kernel/smp.c
@@ -329,7 +329,6 @@ void arch_cpu_idle_dead(void)
  */
 void __ref cpu_die(void)
 {
-	idle_task_exit();
 	local_irq_disable();
 	__asm__ __volatile__(
 			"	movi	a2, cpu_restart\n"
diff --git a/include/linux/sched/hotplug.h b/include/linux/sched/hotplug.h
index 412cdaba33eb..18fa3e63123e 100644
--- a/include/linux/sched/hotplug.h
+++ b/include/linux/sched/hotplug.h
@@ -18,10 +18,4 @@ extern int sched_cpu_dying(unsigned int cpu);
 # define sched_cpu_dying	NULL
 #endif
 
-#ifdef CONFIG_HOTPLUG_CPU
-extern void idle_task_exit(void);
-#else
-static inline void idle_task_exit(void) {}
-#endif
-
 #endif /* _LINUX_SCHED_HOTPLUG_H */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index be16816bb87c..709e2a7583ad 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -3,7 +3,6 @@
  *
  * This code is licenced under the GPL.
  */
-#include <linux/sched/mm.h>
 #include <linux/proc_fs.h>
 #include <linux/smp.h>
 #include <linux/init.h>
@@ -605,21 +604,6 @@ static int bringup_cpu(unsigned int cpu)
 	return bringup_wait_for_ap(cpu);
 }
 
-static int finish_cpu(unsigned int cpu)
-{
-	struct task_struct *idle = idle_thread_get(cpu);
-	struct mm_struct *mm = idle->active_mm;
-
-	/*
-	 * idle_task_exit() will have switched to &init_mm, now
-	 * clean up any remaining active_mm state.
-	 */
-	if (mm != &init_mm)
-		idle->active_mm = &init_mm;
-	mmdrop(mm);
-	return 0;
-}
-
 /*
  * Hotplug state machine related functions
  */
@@ -1699,7 +1683,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
 	[CPUHP_BRINGUP_CPU] = {
 		.name			= "cpu:bringup",
 		.startup.single		= bringup_cpu,
-		.teardown.single	= finish_cpu,
+		.teardown.single	= NULL,
 		.cant_stop		= true,
 	},
 	/* Final state before CPU kills itself */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index acd52a7d1349..32275b4ff141 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8678,6 +8678,30 @@ void __init init_idle(struct task_struct *idle, int cpu)
 #endif
 }
 
+/*
+ * Drops current->active_mm and switches current->active_mm to &init_mm.
+ * Caller must have IRQs off and must have current->mm == NULL (i.e. must
+ * be in a kernel thread).
+ */
+void unlazy_mm_irqs_off(void)
+{
+	struct mm_struct *mm = current->active_mm;
+
+	lockdep_assert_irqs_disabled();
+
+	if (WARN_ON_ONCE(current->mm))
+		return;
+
+	if (mm == &init_mm)
+		return;
+
+	switch_mm_irqs_off(mm, &init_mm, current);
+	mmgrab(&init_mm);
+	current->active_mm = &init_mm;
+	finish_arch_post_lock_switch();
+	mmdrop(mm);
+}
+
 #ifdef CONFIG_SMP
 
 int cpuset_cpumask_can_shrink(const struct cpumask *cur,
@@ -8771,25 +8795,6 @@ void sched_setnuma(struct task_struct *p, int nid)
 #endif /* CONFIG_NUMA_BALANCING */
 
 #ifdef CONFIG_HOTPLUG_CPU
-/*
- * Ensure that the idle task is using init_mm right before its CPU goes
- * offline.
- */
-void idle_task_exit(void)
-{
-	struct mm_struct *mm = current->active_mm;
-
-	BUG_ON(cpu_online(smp_processor_id()));
-	BUG_ON(current != this_rq()->idle);
-
-	if (mm != &init_mm) {
-		switch_mm(mm, &init_mm, current);
-		finish_arch_post_lock_switch();
-	}
-
-	/* finish_cpu(), as ran on the BP, will clean up the active_mm state */
-}
-
 static int __balance_push_cpu_stop(void *arg)
 {
 	struct task_struct *p = arg;
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index d17b0a5ce6ac..af6a98e7a8d1 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -285,6 +285,7 @@ static void do_idle(void)
 		local_irq_disable();
 
 		if (cpu_is_offline(cpu)) {
+			unlazy_mm_irqs_off();
 			tick_nohz_idle_stop_tick();
 			cpuhp_report_idle_dead();
 			arch_cpu_idle_dead();
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3d3e5793e117..b496a9ee9aec 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3064,3 +3064,4 @@ extern int sched_dynamic_mode(const char *str);
 extern void sched_dynamic_update(int mode);
 #endif
 
+extern void unlazy_mm_irqs_off(void);
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 13/23] exec: Remove unnecessary vmacache_seqnum clear in exec_mmap()
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (11 preceding siblings ...)
  2022-01-08 16:43 ` [PATCH 12/23] Rework "sched/core: Fix illegal RCU from offline CPUs" Andy Lutomirski
@ 2022-01-08 16:43 ` Andy Lutomirski
  2022-01-08 16:43 ` [PATCH 14/23] sched, exec: Factor current mm changes out from exec Andy Lutomirski
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

exec_mmap() activates a brand new mm, so vmacache_seqnum is already 0.
Stop zeroing it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 fs/exec.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/exec.c b/fs/exec.c
index 325dab98bc51..2afa7b0c75f2 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1024,7 +1024,6 @@ static int exec_mmap(struct mm_struct *mm)
 	if (IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
 		local_irq_enable();
 	membarrier_finish_switch_mm(mm);
-	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
 	if (old_mm) {
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 14/23] sched, exec: Factor current mm changes out from exec
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (12 preceding siblings ...)
  2022-01-08 16:43 ` [PATCH 13/23] exec: Remove unnecessary vmacache_seqnum clear in exec_mmap() Andy Lutomirski
@ 2022-01-08 16:43 ` Andy Lutomirski
  2022-01-08 16:44 ` [PATCH 15/23] kthread: Switch to __change_current_mm() Andy Lutomirski
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:43 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

Currently, exec_mmap() open-codes an mm change.  Create new core
__change_current_mm() and __change_current_mm_to_kernel() helpers
and use the former from exec_mmap().  This moves the nasty scheduler
details out of exec.c and prepares for reusing this code elsewhere.
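
Roughly, the intended call pattern (the kthread side is converted in a
later patch in this series) is:

    /* exec: install a brand-new mm */
    __change_current_mm(mm, true);

    /* kthread: temporarily adopt an existing user mm, then drop it */
    __change_current_mm(mm, false);
    ...
    __change_current_mm_to_kernel();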

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 fs/exec.c                |  32 +----------
 include/linux/sched/mm.h |  20 +++++++
 kernel/sched/core.c      | 119 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 141 insertions(+), 30 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 2afa7b0c75f2..9e1c2ee7c986 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -971,15 +971,13 @@ EXPORT_SYMBOL(read_code);
 static int exec_mmap(struct mm_struct *mm)
 {
 	struct task_struct *tsk;
-	struct mm_struct *old_mm, *active_mm;
+	struct mm_struct *old_mm;
 	int ret;
 
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
 	old_mm = current->mm;
 	exec_mm_release(tsk, old_mm);
-	if (old_mm)
-		sync_mm_rss(old_mm);
 
 	ret = down_write_killable(&tsk->signal->exec_update_lock);
 	if (ret)
@@ -1000,41 +998,15 @@ static int exec_mmap(struct mm_struct *mm)
 		}
 	}
 
-	task_lock(tsk);
-	/*
-	 * membarrier() requires a full barrier before switching mm.
-	 */
-	smp_mb__after_spinlock();
+	__change_current_mm(mm, true);
 
-	local_irq_disable();
-	active_mm = tsk->active_mm;
-	tsk->active_mm = mm;
-	WRITE_ONCE(tsk->mm, mm);  /* membarrier reads this without locks */
-	membarrier_update_current_mm(mm);
-	/*
-	 * This prevents preemption while active_mm is being loaded and
-	 * it and mm are being updated, which could cause problems for
-	 * lazy tlb mm refcounting when these are updated by context
-	 * switches. Not all architectures can handle irqs off over
-	 * activate_mm yet.
-	 */
-	if (!IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
-		local_irq_enable();
-	activate_mm(active_mm, mm);
-	if (IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
-		local_irq_enable();
-	membarrier_finish_switch_mm(mm);
-	vmacache_flush(tsk);
-	task_unlock(tsk);
 	if (old_mm) {
 		mmap_read_unlock(old_mm);
-		BUG_ON(active_mm != old_mm);
 		setmax_mm_hiwater_rss(&tsk->signal->maxrss, old_mm);
 		mm_update_next_owner(old_mm);
 		mmput(old_mm);
 		return 0;
 	}
-	mmdrop(active_mm);
 	return 0;
 }
 
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index f1d2beac464c..7509b2b2e99d 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -83,6 +83,26 @@ extern void mmput(struct mm_struct *);
 void mmput_async(struct mm_struct *);
 #endif
 
+/*
+ * Switch the mm for current.  This does not mmget() mm, nor does it mmput()
+ * the previous mm, if any.  The caller is responsible for reference counting,
+ * although __change_current_mm() handles all details related to lazy mm
+ * refcounting.
+ *
+ * If the caller is a user task, the caller must call mm_update_next_owner().
+ */
+void __change_current_mm(struct mm_struct *mm, bool mm_is_brand_new);
+
+/*
+ * Switch the mm for current to the kernel mm.  This does not mmdrop()
+ * -- the caller is responsible for reference counting, although
+ * __change_current_mm_to_kernel() handles all details related to lazy
+ * mm refcounting.
+ *
+ * If the caller is a user task, the caller must call mm_update_next_owner().
+ */
+void __change_current_mm_to_kernel(void);
+
 /* Grab a reference to a task's mm, if it is not already going away */
 extern struct mm_struct *get_task_mm(struct task_struct *task);
 /*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 32275b4ff141..95eb0e78f74c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -14,6 +14,7 @@
 
 #include <linux/nospec.h>
 
+#include <linux/vmacache.h>
 #include <linux/kcov.h>
 #include <linux/scs.h>
 
@@ -4934,6 +4935,124 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	return finish_task_switch(prev);
 }
 
+void __change_current_mm(struct mm_struct *mm, bool mm_is_brand_new)
+{
+	struct task_struct *tsk = current;
+	struct mm_struct *old_active_mm, *mm_to_drop = NULL;
+
+	BUG_ON(!mm);	/* likely to cause corruption if we continue */
+
+	/*
+	 * We do not want to schedule, nor should procfs peek at current->mm
+	 * while we're modifying it.  task_lock() disables preemption and
+	 * locks against procfs.
+	 */
+	task_lock(tsk);
+	/*
+	 * membarrier() requires a full barrier before switching mm.
+	 */
+	smp_mb__after_spinlock();
+
+	local_irq_disable();
+
+	if (tsk->mm) {
+		/* We're detaching from an old mm.  Sync stats. */
+		sync_mm_rss(tsk->mm);
+	} else {
+		/*
+		 * Switching from kernel mm to user.  Drop the old lazy
+		 * mm reference.
+		 */
+		mm_to_drop = tsk->active_mm;
+	}
+
+	old_active_mm = tsk->active_mm;
+	tsk->active_mm = mm;
+	WRITE_ONCE(tsk->mm, mm);  /* membarrier reads this without locks */
+	membarrier_update_current_mm(mm);
+
+	if (mm_is_brand_new) {
+		/*
+		 * For historical reasons, some architectures want IRQs on
+		 * when activate_mm() is called.  If we're going to call
+		 * activate_mm(), turn on IRQs but leave preemption
+		 * disabled.
+		 */
+		if (!IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
+			local_irq_enable();
+		activate_mm(old_active_mm, mm);
+		if (IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
+			local_irq_enable();
+	} else {
+		switch_mm_irqs_off(old_active_mm, mm, tsk);
+		local_irq_enable();
+	}
+
+	/* IRQs are on now.  Preemption is still disabled by task_lock(). */
+
+	membarrier_finish_switch_mm(mm);
+	vmacache_flush(tsk);
+	task_unlock(tsk);
+
+#ifdef finish_arch_post_lock_switch
+	if (!mm_is_brand_new) {
+		/*
+		 * Some architectures want a callback after
+		 * switch_mm_irqs_off() once locks are dropped.  Callers of
+		 * activate_mm() historically did not do this, so skip it if
+		 * we did activate_mm().  On arm, this is because
+		 * activate_mm() switches mm with IRQs on, which uses a
+		 * different code path.
+		 *
+		 * Yes, this is extremely fragile and should be cleaned up.
+		 */
+		finish_arch_post_lock_switch();
+	}
+#endif
+
+	if (mm_to_drop)
+		mmdrop(mm_to_drop);
+}
+
+void __change_current_mm_to_kernel(void)
+{
+	struct task_struct *tsk = current;
+	struct mm_struct *old_mm = tsk->mm;
+
+	if (!old_mm)
+		return;	/* nothing to do */
+
+	/*
+	 * We do not want to schedule, nor should procfs peek at current->mm
+	 * while we're modifying it.  task_lock() disables preemption and
+	 * locks against procfs.
+	 */
+	task_lock(tsk);
+	/*
+	 * membarrier() requires a full barrier before switching mm.
+	 */
+	smp_mb__after_spinlock();
+
+	/* current has a real mm, so it must be active */
+	WARN_ON_ONCE(tsk->active_mm != tsk->mm);
+
+	local_irq_disable();
+
+	sync_mm_rss(old_mm);
+
+	WRITE_ONCE(tsk->mm, NULL);  /* membarrier reads this without locks */
+	membarrier_update_current_mm(NULL);
+	vmacache_flush(tsk);
+
+	/* active_mm is still 'old_mm' */
+	mmgrab(old_mm);
+	enter_lazy_tlb(old_mm, tsk);
+
+	local_irq_enable();
+
+	task_unlock(tsk);
+}
+
 /*
  * nr_running and nr_context_switches:
  *
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 15/23] kthread: Switch to __change_current_mm()
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (13 preceding siblings ...)
  2022-01-08 16:43 ` [PATCH 14/23] sched, exec: Factor current mm changes out from exec Andy Lutomirski
@ 2022-01-08 16:44 ` Andy Lutomirski
  2022-01-08 16:44 ` [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms Andy Lutomirski
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:44 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

Remove the open-coded mm switching in kthread_use_mm() and
kthread_unuse_mm().

This has one internally-visible effect: the old code's active_mm
refcounting was inconsistent with everything else and mmgrabbed the
mm in kthread_use_mm().
rules as normal user threads, so kthreads that are currently using
a user mm will not hold an mm_count reference.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 kernel/kthread.c | 45 ++-------------------------------------------
 1 file changed, 2 insertions(+), 43 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 18b0a2e0e3b2..77586f5b14e5 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1344,37 +1344,12 @@ EXPORT_SYMBOL(kthread_destroy_worker);
  */
 void kthread_use_mm(struct mm_struct *mm)
 {
-	struct mm_struct *active_mm;
 	struct task_struct *tsk = current;
 
 	WARN_ON_ONCE(!(tsk->flags & PF_KTHREAD));
 	WARN_ON_ONCE(tsk->mm);
 
-	task_lock(tsk);
-	/*
-	 * membarrier() requires a full barrier before switching mm.
-	 */
-	smp_mb__after_spinlock();
-
-	/* Hold off tlb flush IPIs while switching mm's */
-	local_irq_disable();
-	active_mm = tsk->active_mm;
-	if (active_mm != mm) {
-		mmgrab(mm);
-		tsk->active_mm = mm;
-	}
-	WRITE_ONCE(tsk->mm, mm);  /* membarrier reads this without locks */
-	membarrier_update_current_mm(mm);
-	switch_mm_irqs_off(active_mm, mm, tsk);
-	membarrier_finish_switch_mm(mm);
-	local_irq_enable();
-	task_unlock(tsk);
-#ifdef finish_arch_post_lock_switch
-	finish_arch_post_lock_switch();
-#endif
-
-	if (active_mm != mm)
-		mmdrop(active_mm);
+	__change_current_mm(mm, false);
 
 	to_kthread(tsk)->oldfs = force_uaccess_begin();
 }
@@ -1393,23 +1368,7 @@ void kthread_unuse_mm(struct mm_struct *mm)
 
 	force_uaccess_end(to_kthread(tsk)->oldfs);
 
-	task_lock(tsk);
-	/*
-	 * When a kthread stops operating on an address space, the loop
-	 * in membarrier_{private,global}_expedited() may not observe
-	 * that tsk->mm, and not issue an IPI. Membarrier requires a
-	 * memory barrier after accessing user-space memory, before
-	 * clearing tsk->mm.
-	 */
-	smp_mb__after_spinlock();
-	sync_mm_rss(mm);
-	local_irq_disable();
-	WRITE_ONCE(tsk->mm, NULL);  /* membarrier reads this without locks */
-	membarrier_update_current_mm(NULL);
-	/* active_mm is still 'mm' */
-	enter_lazy_tlb(mm, tsk);
-	local_irq_enable();
-	task_unlock(tsk);
+	__change_current_mm_to_kernel();
 }
 EXPORT_SYMBOL_GPL(kthread_unuse_mm);
 
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (14 preceding siblings ...)
  2022-01-08 16:44 ` [PATCH 15/23] kthread: Switch to __change_current_mm() Andy Lutomirski
@ 2022-01-08 16:44 ` Andy Lutomirski
  2022-01-08 19:22   ` Linus Torvalds
  2022-01-09  5:56   ` Nadav Amit
  2022-01-08 16:44 ` [PATCH 17/23] x86/mm: Make use/unuse_temporary_mm() non-static Andy Lutomirski
                   ` (6 subsequent siblings)
  22 siblings, 2 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:44 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski, Linus Torvalds

Currently, switching between a real user mm and a kernel context (including
idle) performs an atomic operation on a per-mm counter via mmgrab() and
mmdrop().  For a single-threaded program, this isn't a big problem: a pair
of atomic operations when entering and returning from idle isn't free, but
it's not very expensive in the grand scheme of things.  For a heavily
multithreaded program on a large system, however, the overhead can be very
large -- all CPUs can end up hammering the same cacheline with atomic
operations, and scalability suffers.

The purpose of mmgrab() and mmdrop() is to make "lazy tlb" mode safe.  When
Linux switches from user to kernel mm context, instead of immediately
reprogramming the MMU to use init_mm, the kernel continues to use the most
recent set of user page tables.  This is safe as long as those page tables
aren't freed.

RCU can't be used to keep the pagetables alive, since RCU read locks can't
be held when idle.

To improve scalability, this patch adds a percpu hazard pointer scheme to
keep lazily-used mms alive.  Each CPU has a single pointer to an mm that
must not be freed, and __mmput() checks the pointers belonging to all CPUs
that might be lazily using the mm in question.
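
Boiled down, the scheme added below looks roughly like this (heavily
simplified -- the real code also has to hand references off via
rq->drop_mm and deal with the associated races):

    /* CPU going lazy, while still holding an mm_users reference: */
    WRITE_ONCE(this_rq()->lazy_mm, mm);

    /* __mmput() -> mm_unlazy_mm_count(), once mm_users has hit zero: */
    for_each_possible_lazymm_cpu(cpu, mm) {
        if (smp_load_acquire(&cpu_rq(cpu)->lazy_mm) == mm)
            /* transfer a reference that the remote rq drops later */;
    }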

By default, this means walking all online CPUs, but arch code can override
the set of CPUs to check if they can do something more clever.  For
architectures that have accurate mm_cpumask(), mm_cpumask() is a reasonable
choice.  For architectures that can guarantee that *no* remote CPUs are
lazily using an mm after the user portion of the pagetables is torn down
(any architecture that uses IPI shootdowns in exit_mmap() and unlazies the
MMU in the IPI handler, e.g. x86 on bare metal), the set of CPUs to check
could be empty.
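
As a sketch (not part of this patch), such an architecture could
plausibly provide something like:

    #define for_each_possible_lazymm_cpu(cpu, mm) \
        for_each_cpu((cpu), mm_cpumask(mm))

and an architecture in the IPI-shootdown camp could shrink the
iteration even further.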

XXX: I *think* this is correct when hot-unplugging a CPU, but this needs
double-checking and maybe even a WARN to make sure the ordering is correct.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linux-MM <linux-mm@kvack.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 include/linux/sched/mm.h |   3 +
 kernel/fork.c            |  11 ++
 kernel/sched/core.c      | 230 +++++++++++++++++++++++++++++++++------
 kernel/sched/sched.h     |  10 +-
 4 files changed, 221 insertions(+), 33 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 7509b2b2e99d..3ceba11c049c 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -76,6 +76,9 @@ static inline bool mmget_not_zero(struct mm_struct *mm)
 
 /* mmput gets rid of the mappings and all user-space */
 extern void mmput(struct mm_struct *);
+
+extern void mm_unlazy_mm_count(struct mm_struct *mm);
+
 #ifdef CONFIG_MMU
 /* same as above but performs the slow path from the async context. Can
  * be called from the atomic context as well
diff --git a/kernel/fork.c b/kernel/fork.c
index 38681ad44c76..2df72cf3c0d2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1122,6 +1122,17 @@ static inline void __mmput(struct mm_struct *mm)
 	}
 	if (mm->binfmt)
 		module_put(mm->binfmt->module);
+
+	/*
+	 * We hold one mm_count reference.  Convert all remaining lazy_mm
+	 * references to mm_count references so that the mm will be genuinely
+	 * unused when mm_count goes to zero.  Do this after exit_mmap() so
+	 * that, if the architecture shoots down remote TLB entries via IPI in
+	 * exit_mmap() and calls unlazy_mm_irqs_off() when doing so, most or
+	 * all lazy_mm references can be removed without
+	 * mm_unlazy_mm_count()'s help.
+	 */
+	mm_unlazy_mm_count(mm);
 	mmdrop(mm);
 }
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 95eb0e78f74c..64e4058b3c61 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -20,6 +20,7 @@
 
 #include <asm/switch_to.h>
 #include <asm/tlb.h>
+#include <asm/mmu.h>
 
 #include "../workqueue_internal.h"
 #include "../../fs/io-wq.h"
@@ -4750,6 +4751,144 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
 	prepare_arch_switch(next);
 }
 
+/*
+ * Called after each context switch.
+ *
+ * Strictly speaking, no action at all is required here.  This rq
+ * can hold an extra reference to at most one mm, so the memory
+ * wasted by deferring the mmdrop() forever is bounded.  That being
+ * said, it's straightforward to safely drop spare references
+ * in the common case.
+ */
+static void mmdrop_lazy(struct rq *rq)
+{
+	struct mm_struct *old_mm;
+
+	old_mm = READ_ONCE(rq->drop_mm);
+
+	do {
+		/*
+		 * If there is nothing to drop or if we are still using old_mm,
+		 * then don't call mmdrop().
+		 */
+		if (likely(!old_mm || old_mm == rq->lazy_mm))
+			return;
+	} while (!try_cmpxchg_relaxed(&rq->drop_mm, &old_mm, NULL));
+
+	mmdrop(old_mm);
+}
+
+#ifndef for_each_possible_lazymm_cpu
+#define for_each_possible_lazymm_cpu(cpu, mm) for_each_online_cpu((cpu))
+#endif
+
+static bool __try_mm_drop_rq_ref(struct rq *rq, struct mm_struct *mm)
+{
+	struct mm_struct *old_drop_mm = smp_load_acquire(&rq->drop_mm);
+
+	/*
+	 * We know that old_drop_mm != mm: this is the only function that
+	 * might set drop_mm to mm, and we haven't set it yet.
+	 */
+	WARN_ON_ONCE(old_drop_mm == mm);
+
+	if (!old_drop_mm) {
+		/*
+		 * Just set rq->drop_mm to mm and our reference will
+		 * get dropped eventually after rq is done with it.
+		 */
+		return try_cmpxchg(&rq->drop_mm, &old_drop_mm, mm);
+	}
+
+	/*
+	 * The target cpu could still be using old_drop_mm.  We know that, if
+	 * old_drop_mm still exists, then old_drop_mm->mm_users == 0.  Can we
+	 * drop it?
+	 *
+	 * NB: it is critical that we load rq->lazy_mm again after loading
+	 * drop_mm.  If we looked at a prior value of lazy_mm (which we
+	 * already know to be mm), then we would be subject to a race:
+	 *
+	 * Us:
+	 *     Load rq->lazy_mm.
+	 * Remote CPU:
+	 *     Switch to old_drop_mm (with mm_users > 0)
+	 *     Become lazy and set rq->lazy_mm = old_drop_mm
+	 * Third CPU:
+	 *     Set old_drop_mm->mm_users to 0.
+	 *     Set rq->drop_mm = old_drop_mm
+	 * Us:
+	 *     Incorrectly believe that old_drop_mm is unused
+	 *     because rq->lazy_mm != old_drop_mm
+	 *
+	 * In other words, to verify that rq->lazy_mm is not keeping a given
+	 * mm alive, we must load rq->lazy_mm _after_ we know that mm_users ==
+	 * 0 and therefore that rq will not switch to that mm.
+	 */
+	if (smp_load_acquire(&rq->lazy_mm) != mm) {
+		/*
+		 * We got lucky!  rq _was_ using mm, but it stopped.
+		 * Just drop our reference.
+		 */
+		mmdrop(mm);
+		return true;
+	}
+
+	/*
+	 * If we got here, rq->lazy_mm != old_drop_mm, and we ruled
+	 * out the race described above.  rq is done with old_drop_mm,
+	 * so we can steal the reference held by rq and replace it with
+	 * our reference to mm.
+	 */
+	if (cmpxchg(&rq->drop_mm, old_drop_mm, mm) != old_drop_mm)
+		return false;
+
+	mmdrop(old_drop_mm);
+	return true;
+}
+
+/*
+ * This converts all lazy_mm references to mm to mm_count refcounts.  Our
+ * caller holds an mm_count reference, so we don't need to worry about mm
+ * being freed out from under us.
+ */
+void mm_unlazy_mm_count(struct mm_struct *mm)
+{
+	unsigned int drop_count = 0;
+	int cpu;
+
+	/*
+	 * mm_users is zero, so no cpu will set its rq->lazy_mm to mm.
+	 */
+	WARN_ON_ONCE(atomic_read(&mm->mm_users) != 0);
+
+	for_each_possible_lazymm_cpu(cpu, mm) {
+		struct rq *rq = cpu_rq(cpu);
+
+		if (smp_load_acquire(&rq->lazy_mm) != mm)
+			continue;
+
+		/*
+		 * Grab one reference.  Do it as a batch so we do a maximum
+		 * of two atomic operations instead of one per lazy reference.
+		 */
+		if (!drop_count) {
+			/*
+			 * Collect lots of references.  We'll drop the ones we
+			 * don't use.
+			 */
+			drop_count = num_possible_cpus();
+			atomic_add(drop_count, &mm->mm_count);
+		}
+		drop_count--;
+
+		while (!__try_mm_drop_rq_ref(rq, mm))
+			;
+	}
+
+	atomic_sub(drop_count, &mm->mm_count);
+}
+
 /**
  * finish_task_switch - clean up after a task-switch
  * @prev: the thread we just switched away from.
@@ -4773,7 +4912,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	__releases(rq->lock)
 {
 	struct rq *rq = this_rq();
-	struct mm_struct *mm = rq->prev_mm;
 	long prev_state;
 
 	/*
@@ -4792,8 +4930,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 		      current->comm, current->pid, preempt_count()))
 		preempt_count_set(FORK_PREEMPT_COUNT);
 
-	rq->prev_mm = NULL;
-
 	/*
 	 * A task struct has one reference for the use as "current".
 	 * If a task dies, then it sets TASK_DEAD in tsk->state and calls
@@ -4824,12 +4960,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 
 	fire_sched_in_preempt_notifiers(current);
 
-	/*
-	 * If an architecture needs to take a specific action for
-	 * SYNC_CORE, it can do so in switch_mm_irqs_off().
-	 */
-	if (mm)
-		mmdrop(mm);
+	mmdrop_lazy(rq);
 
 	if (unlikely(prev_state == TASK_DEAD)) {
 		if (prev->sched_class->task_dead)
@@ -4891,36 +5022,55 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 */
 	arch_start_context_switch(prev);
 
+	/*
+	 * Sanity check: if something went wrong and the previous mm was
+	 * freed while we were still using it, KASAN might not notice
+	 * without help.
+	 */
+	kasan_check_byte(prev->active_mm);
+
 	/*
 	 * kernel -> kernel   lazy + transfer active
-	 *   user -> kernel   lazy + mmgrab() active
+	 *   user -> kernel   lazy + lazy_mm grab active
 	 *
-	 * kernel ->   user   switch + mmdrop() active
+	 * kernel ->   user   switch + lazy_mm release active
 	 *   user ->   user   switch
 	 */
 	if (!next->mm) {                                // to kernel
 		enter_lazy_tlb(prev->active_mm, next);
 
 		next->active_mm = prev->active_mm;
-		if (prev->mm)                           // from user
-			mmgrab(prev->active_mm);
-		else
+		if (prev->mm) {                         // from user
+			SCHED_WARN_ON(rq->lazy_mm);
+
+			/*
+			 * Acquire a lazy_mm reference to the active
+			 * (lazy) mm.  No explicit barrier needed: we still
+			 * hold an explicit (mm_users) reference.  __mmput()
+			 * can't be called until we call mmput() to drop
+			 * our reference, and __mmput() is a release barrier.
+			 */
+			WRITE_ONCE(rq->lazy_mm, next->active_mm);
+		} else {
 			prev->active_mm = NULL;
+		}
 	} else {                                        // to user
 		membarrier_switch_mm(rq, prev->active_mm, next->mm);
 		switch_mm_irqs_off(prev->active_mm, next->mm, next);
 
 		/*
-		 * sys_membarrier() requires an smp_mb() between setting
-		 * rq->curr->mm to a membarrier-enabled mm and returning
-		 * to userspace.
+		 * An arch implementation of for_each_possible_lazymm_cpu()
+		 * may skip this CPU now that we have switched away from
+		 * prev->active_mm, so we must not reference it again.
 		 */
+
 		membarrier_finish_switch_mm(next->mm);
 
 		if (!prev->mm) {                        // from kernel
-			/* will mmdrop() in finish_task_switch(). */
-			rq->prev_mm = prev->active_mm;
 			prev->active_mm = NULL;
+
+			/* Drop our lazy_mm reference to the old lazy mm. */
+			smp_store_release(&rq->lazy_mm, NULL);
 		}
 	}
 
@@ -4938,7 +5088,8 @@ context_switch(struct rq *rq, struct task_struct *prev,
 void __change_current_mm(struct mm_struct *mm, bool mm_is_brand_new)
 {
 	struct task_struct *tsk = current;
-	struct mm_struct *old_active_mm, *mm_to_drop = NULL;
+	struct mm_struct *old_active_mm;
+	bool was_kernel;
 
 	BUG_ON(!mm);	/* likely to cause corruption if we continue */
 
@@ -4958,12 +5109,9 @@ void __change_current_mm(struct mm_struct *mm, bool mm_is_brand_new)
 	if (tsk->mm) {
 		/* We're detaching from an old mm.  Sync stats. */
 		sync_mm_rss(tsk->mm);
+		was_kernel = false;
 	} else {
-		/*
-		 * Switching from kernel mm to user.  Drop the old lazy
-		 * mm reference.
-		 */
-		mm_to_drop = tsk->active_mm;
+		was_kernel = true;
 	}
 
 	old_active_mm = tsk->active_mm;
@@ -4992,6 +5140,10 @@ void __change_current_mm(struct mm_struct *mm, bool mm_is_brand_new)
 
 	membarrier_finish_switch_mm(mm);
 	vmacache_flush(tsk);
+
+	if (was_kernel)
+		smp_store_release(&this_rq()->lazy_mm, NULL);
+
 	task_unlock(tsk);
 
 #ifdef finish_arch_post_lock_switch
@@ -5009,9 +5161,6 @@ void __change_current_mm(struct mm_struct *mm, bool mm_is_brand_new)
 		finish_arch_post_lock_switch();
 	}
 #endif
-
-	if (mm_to_drop)
-		mmdrop(mm_to_drop);
 }
 
 void __change_current_mm_to_kernel(void)
@@ -5044,8 +5193,17 @@ void __change_current_mm_to_kernel(void)
 	membarrier_update_current_mm(NULL);
 	vmacache_flush(tsk);
 
-	/* active_mm is still 'old_mm' */
-	mmgrab(old_mm);
+	/*
+	 * active_mm is still 'old_mm'
+	 *
+	 * Acquire a lazy_mm reference to the active (lazy) mm.  As in
+	 * context_switch(), no explicit barrier needed: we still hold an
+	 * explicit (mm_users) reference.  __mmput() can't be called until we
+	 * call mmput() to drop our reference, and __mmput() is a release
+	 * barrier.
+	 */
+	WRITE_ONCE(this_rq()->lazy_mm, old_mm);
+
 	enter_lazy_tlb(old_mm, tsk);
 
 	local_irq_enable();
@@ -8805,6 +8963,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
 void unlazy_mm_irqs_off(void)
 {
 	struct mm_struct *mm = current->active_mm;
+	struct rq *rq = this_rq();
 
 	lockdep_assert_irqs_disabled();
 
@@ -8815,10 +8974,17 @@ void unlazy_mm_irqs_off(void)
 		return;
 
 	switch_mm_irqs_off(mm, &init_mm, current);
-	mmgrab(&init_mm);
 	current->active_mm = &init_mm;
+
+	/*
+	 * We don't need a lazy reference to init_mm -- it's not about
+	 * to go away.
+	 */
+	smp_store_release(&rq->lazy_mm, NULL);
+
 	finish_arch_post_lock_switch();
-	mmdrop(mm);
+
+	mmdrop_lazy(rq);
 }
 
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b496a9ee9aec..1010e63962d9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -977,7 +977,15 @@ struct rq {
 	struct task_struct	*idle;
 	struct task_struct	*stop;
 	unsigned long		next_balance;
-	struct mm_struct	*prev_mm;
+
+	/*
+	 * Fast refcounting scheme for lazy mm.  lazy_mm is a hazard pointer:
+	 * setting it to point to a lazily used mm keeps that mm from being
+	 * freed.  drop_mm points to an mm that needs an mmdrop() call
+	 * after the CPU owning the rq is done with it.
+	 */
+	struct mm_struct	*lazy_mm;
+	struct mm_struct	*drop_mm;
 
 	unsigned int		clock_update_flags;
 	u64			clock;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 17/23] x86/mm: Make use/unuse_temporary_mm() non-static
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (15 preceding siblings ...)
  2022-01-08 16:44 ` [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms Andy Lutomirski
@ 2022-01-08 16:44 ` Andy Lutomirski
  2022-01-08 16:44 ` [PATCH 18/23] x86/mm: Allow temporary mms when IRQs are on Andy Lutomirski
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:44 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

This prepares them for use outside of the alternative machinery.
The code is unchanged.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/mmu_context.h |  7 ++++
 arch/x86/kernel/alternative.c      | 65 +-----------------------------
 arch/x86/mm/tlb.c                  | 60 +++++++++++++++++++++++++++
 3 files changed, 68 insertions(+), 64 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 27516046117a..2ca4fc4a8a0a 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -220,4 +220,11 @@ unsigned long __get_current_cr3_fast(void);
 
 #include <asm-generic/mmu_context.h>
 
+typedef struct {
+	struct mm_struct *mm;
+} temp_mm_state_t;
+
+extern temp_mm_state_t use_temporary_mm(struct mm_struct *mm);
+extern void unuse_temporary_mm(temp_mm_state_t prev_state);
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index b47cd22b2eb1..af4c37e177ff 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -29,6 +29,7 @@
 #include <asm/io.h>
 #include <asm/fixmap.h>
 #include <asm/paravirt.h>
+#include <asm/mmu_context.h>
 
 int __read_mostly alternatives_patched;
 
@@ -706,70 +707,6 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
 	}
 }
 
-typedef struct {
-	struct mm_struct *mm;
-} temp_mm_state_t;
-
-/*
- * Using a temporary mm allows to set temporary mappings that are not accessible
- * by other CPUs. Such mappings are needed to perform sensitive memory writes
- * that override the kernel memory protections (e.g., W^X), without exposing the
- * temporary page-table mappings that are required for these write operations to
- * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
- * mapping is torn down.
- *
- * Context: The temporary mm needs to be used exclusively by a single core. To
- *          harden security IRQs must be disabled while the temporary mm is
- *          loaded, thereby preventing interrupt handler bugs from overriding
- *          the kernel memory protection.
- */
-static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
-{
-	temp_mm_state_t temp_state;
-
-	lockdep_assert_irqs_disabled();
-
-	/*
-	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
-	 * with a stale address space WITHOUT being in lazy mode after
-	 * restoring the previous mm.
-	 */
-	if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
-		leave_mm(smp_processor_id());
-
-	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
-	switch_mm_irqs_off(NULL, mm, current);
-
-	/*
-	 * If breakpoints are enabled, disable them while the temporary mm is
-	 * used. Userspace might set up watchpoints on addresses that are used
-	 * in the temporary mm, which would lead to wrong signals being sent or
-	 * crashes.
-	 *
-	 * Note that breakpoints are not disabled selectively, which also causes
-	 * kernel breakpoints (e.g., perf's) to be disabled. This might be
-	 * undesirable, but still seems reasonable as the code that runs in the
-	 * temporary mm should be short.
-	 */
-	if (hw_breakpoint_active())
-		hw_breakpoint_disable();
-
-	return temp_state;
-}
-
-static inline void unuse_temporary_mm(temp_mm_state_t prev_state)
-{
-	lockdep_assert_irqs_disabled();
-	switch_mm_irqs_off(NULL, prev_state.mm, current);
-
-	/*
-	 * Restore the breakpoints if they were disabled before the temporary mm
-	 * was loaded.
-	 */
-	if (hw_breakpoint_active())
-		hw_breakpoint_restore();
-}
-
 __ro_after_init struct mm_struct *poking_mm;
 __ro_after_init unsigned long poking_addr;
 
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 74b7a615bc15..4e371f30e2ab 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -702,6 +702,66 @@ void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 	this_cpu_write(cpu_tlbstate_shared.is_lazy, true);
 }
 
+/*
+ * Using a temporary mm allows to set temporary mappings that are not accessible
+ * by other CPUs. Such mappings are needed to perform sensitive memory writes
+ * that override the kernel memory protections (e.g., W^X), without exposing the
+ * temporary page-table mappings that are required for these write operations to
+ * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
+ * mapping is torn down.
+ *
+ * Context: The temporary mm needs to be used exclusively by a single core. To
+ *          harden security IRQs must be disabled while the temporary mm is
+ *          loaded, thereby preventing interrupt handler bugs from overriding
+ *          the kernel memory protection.
+ */
+temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
+{
+	temp_mm_state_t temp_state;
+
+	lockdep_assert_irqs_disabled();
+
+	/*
+	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
+	 * with a stale address space WITHOUT being in lazy mode after
+	 * restoring the previous mm.
+	 */
+	if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
+		leave_mm(smp_processor_id());
+
+	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+	switch_mm_irqs_off(NULL, mm, current);
+
+	/*
+	 * If breakpoints are enabled, disable them while the temporary mm is
+	 * used. Userspace might set up watchpoints on addresses that are used
+	 * in the temporary mm, which would lead to wrong signals being sent or
+	 * crashes.
+	 *
+	 * Note that breakpoints are not disabled selectively, which also causes
+	 * kernel breakpoints (e.g., perf's) to be disabled. This might be
+	 * undesirable, but still seems reasonable as the code that runs in the
+	 * temporary mm should be short.
+	 */
+	if (hw_breakpoint_active())
+		hw_breakpoint_disable();
+
+	return temp_state;
+}
+
+void unuse_temporary_mm(temp_mm_state_t prev_state)
+{
+	lockdep_assert_irqs_disabled();
+	switch_mm_irqs_off(NULL, prev_state.mm, current);
+
+	/*
+	 * Restore the breakpoints if they were disabled before the temporary mm
+	 * was loaded.
+	 */
+	if (hw_breakpoint_active())
+		hw_breakpoint_restore();
+}
+
 /*
  * Call this when reinitializing a CPU.  It fixes the following potential
  * problems:
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 18/23] x86/mm: Allow temporary mms when IRQs are on
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (16 preceding siblings ...)
  2022-01-08 16:44 ` [PATCH 17/23] x86/mm: Make use/unuse_temporary_mm() non-static Andy Lutomirski
@ 2022-01-08 16:44 ` Andy Lutomirski
  2022-01-08 16:44 ` [PATCH 19/23] x86/efi: Make efi_enter/leave_mm use the temporary_mm machinery Andy Lutomirski
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:44 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

EFI runtime services should use temporary mms, but they want IRQs on.
Preemption must still be disabled in a temporary mm context.

At some point, the entire temporary mm mechanism should be moved out of
arch code.
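
The resulting usage contract, roughly sketched (&some_mm is just a
placeholder):

    temp_mm_state_t state;

    preempt_disable();
    state = use_temporary_mm(&some_mm);
    /* Touch mappings in some_mm.  IRQs may stay on, but do not sleep. */
    unuse_temporary_mm(state);
    preempt_enable();

For sensitive writes (e.g. text poking), IRQs should still be disabled
across the temporary mm, as noted in the updated comment below.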

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/mm/tlb.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 4e371f30e2ab..36ce9dffb963 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -708,18 +708,23 @@ void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
  * that override the kernel memory protections (e.g., W^X), without exposing the
  * temporary page-table mappings that are required for these write operations to
  * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
- * mapping is torn down.
+ * mapping is torn down.  Temporary mms can also be used for EFI runtime service
+ * calls or similar functionality.
  *
- * Context: The temporary mm needs to be used exclusively by a single core. To
- *          harden security IRQs must be disabled while the temporary mm is
- *          loaded, thereby preventing interrupt handler bugs from overriding
- *          the kernel memory protection.
+ * It is illegal to schedule while using a temporary mm -- the context switch
+ * code is unaware of the temporary mm and does not know how to context switch.
+ * Use a real (non-temporary) mm in a kernel thread if you need to sleep.
+ *
+ * Note: For sensitive memory writes, the temporary mm needs to be used
+ *       exclusively by a single core, and IRQs should be disabled while the
+ *       temporary mm is loaded, thereby preventing interrupt handler bugs from
+ *       overriding the kernel memory protection.
  */
 temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
 {
 	temp_mm_state_t temp_state;
 
-	lockdep_assert_irqs_disabled();
+	lockdep_assert_preemption_disabled();
 
 	/*
 	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
@@ -751,7 +756,7 @@ temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
 
 void unuse_temporary_mm(temp_mm_state_t prev_state)
 {
-	lockdep_assert_irqs_disabled();
+	lockdep_assert_preemption_disabled();
 	switch_mm_irqs_off(NULL, prev_state.mm, current);
 
 	/*
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 19/23] x86/efi: Make efi_enter/leave_mm use the temporary_mm machinery
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (17 preceding siblings ...)
  2022-01-08 16:44 ` [PATCH 18/23] x86/mm: Allow temporary mms when IRQs are on Andy Lutomirski
@ 2022-01-08 16:44 ` Andy Lutomirski
  2022-01-10 13:13   ` Ard Biesheuvel
  2022-01-08 16:44 ` [PATCH 20/23] x86/mm: Remove leave_mm() in favor of unlazy_mm_irqs_off() Andy Lutomirski
                   ` (3 subsequent siblings)
  22 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:44 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski, Ard Biesheuvel

This should be considerably more robust.  It's also necessary for the optimized
for_each_possible_lazymm_cpu() on x86 -- without this patch, EFI calls in
lazy context would remove the lazy mm from mm_cpumask().

Cc: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/platform/efi/efi_64.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 7515e78ef898..b9a571904363 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -54,7 +54,7 @@
  * 0xffff_ffff_0000_0000 and limit EFI VA mapping space to 64G.
  */
 static u64 efi_va = EFI_VA_START;
-static struct mm_struct *efi_prev_mm;
+static temp_mm_state_t efi_temp_mm_state;
 
 /*
  * We need our own copy of the higher levels of the page tables
@@ -461,15 +461,12 @@ void __init efi_dump_pagetable(void)
  */
 void efi_enter_mm(void)
 {
-	efi_prev_mm = current->active_mm;
-	current->active_mm = &efi_mm;
-	switch_mm(efi_prev_mm, &efi_mm, NULL);
+	efi_temp_mm_state = use_temporary_mm(&efi_mm);
 }
 
 void efi_leave_mm(void)
 {
-	current->active_mm = efi_prev_mm;
-	switch_mm(&efi_mm, efi_prev_mm, NULL);
+	unuse_temporary_mm(efi_temp_mm_state);
 }
 
 static DEFINE_SPINLOCK(efi_runtime_lock);
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 20/23] x86/mm: Remove leave_mm() in favor of unlazy_mm_irqs_off()
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (18 preceding siblings ...)
  2022-01-08 16:44 ` [PATCH 19/23] x86/efi: Make efi_enter/leave_mm use the temporary_mm machinery Andy Lutomirski
@ 2022-01-08 16:44 ` Andy Lutomirski
  2022-01-08 16:44 ` [PATCH 21/23] x86/mm: Use unlazy_mm_irqs_off() in TLB flush IPIs Andy Lutomirski
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:44 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

x86's mm_cpumask() precisely tracks every CPU using an mm, with one
major caveat: x86 internally switches back to init_mm more
aggressively than the core code.  This means that it's possible for
x86 to point CR3 to init_mm and drop current->active_mm from
mm_cpumask().  The core scheduler doesn't know when this happens,
which is currently fine.

But if we want to use mm_cpumask() to optimize
for_each_possible_lazymm_cpu(), we need to keep mm_cpumask() in
sync with the core scheduler.

This patch removes x86's bespoke leave_mm() and uses the core scheduler's
unlazy_mm_irqs_off() so that a lazy mm can be dropped and ->active_mm
cleaned up together.  This allows for_each_possible_lazymm_cpu() to be
wired up on x86.

As a side effect, non-x86 architectures that use ACPI C3 will now leave
lazy mm mode before entering C3.  This can only possibly affect ia64,
because only x86 and ia64 enable CONFIG_ACPI_PROCESSOR_CSTATE.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/mmu.h  |  2 --
 arch/x86/mm/tlb.c           | 29 +++--------------------------
 arch/x86/xen/mmu_pv.c       |  2 +-
 drivers/cpuidle/cpuidle.c   |  2 +-
 drivers/idle/intel_idle.c   |  4 ++--
 include/linux/mmu_context.h |  4 +---
 kernel/sched/sched.h        |  2 --
 7 files changed, 8 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 5d7494631ea9..03ba71420ff3 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -63,7 +63,5 @@ typedef struct {
 		.lock = __MUTEX_INITIALIZER(mm.context.lock),		\
 	}
 
-void leave_mm(int cpu);
-#define leave_mm leave_mm
 
 #endif /* _ASM_X86_MMU_H */
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 36ce9dffb963..e502565176b9 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -8,6 +8,7 @@
 #include <linux/export.h>
 #include <linux/cpu.h>
 #include <linux/debugfs.h>
+#include <linux/mmu_context.h>
 #include <linux/sched/smt.h>
 #include <linux/sched/mm.h>
 
@@ -294,28 +295,6 @@ static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush)
 	write_cr3(new_mm_cr3);
 }
 
-void leave_mm(int cpu)
-{
-	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
-
-	/*
-	 * It's plausible that we're in lazy TLB mode while our mm is init_mm.
-	 * If so, our callers still expect us to flush the TLB, but there
-	 * aren't any user TLB entries in init_mm to worry about.
-	 *
-	 * This needs to happen before any other sanity checks due to
-	 * intel_idle's shenanigans.
-	 */
-	if (loaded_mm == &init_mm)
-		return;
-
-	/* Warn if we're not lazy. */
-	WARN_ON(!this_cpu_read(cpu_tlbstate_shared.is_lazy));
-
-	switch_mm(NULL, &init_mm, NULL);
-}
-EXPORT_SYMBOL_GPL(leave_mm);
-
 void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	       struct task_struct *tsk)
 {
@@ -512,8 +491,6 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * from lazy TLB mode to normal mode if active_mm isn't changing.
 	 * When this happens, we don't assume that CR3 (and hence
 	 * cpu_tlbstate.loaded_mm) matches next.
-	 *
-	 * NB: leave_mm() calls us with prev == NULL and tsk == NULL.
 	 */
 
 	/* We don't want flush_tlb_func() to run concurrently with us. */
@@ -523,7 +500,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	/*
 	 * Verify that CR3 is what we think it is.  This will catch
 	 * hypothetical buggy code that directly switches to swapper_pg_dir
-	 * without going through leave_mm() / switch_mm_irqs_off() or that
+	 * without going through switch_mm_irqs_off() or that
 	 * does something like write_cr3(read_cr3_pa()).
 	 *
 	 * Only do this check if CONFIG_DEBUG_VM=y because __read_cr3()
@@ -732,7 +709,7 @@ temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
 	 * restoring the previous mm.
 	 */
 	if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
-		leave_mm(smp_processor_id());
+		unlazy_mm_irqs_off();
 
 	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
 	switch_mm_irqs_off(NULL, mm, current);
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 3359c23573c5..ba849185810a 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -898,7 +898,7 @@ static void drop_mm_ref_this_cpu(void *info)
 	struct mm_struct *mm = info;
 
 	if (this_cpu_read(cpu_tlbstate.loaded_mm) == mm)
-		leave_mm(smp_processor_id());
+		unlazy_mm_irqs_off();
 
 	/*
 	 * If this cpu still has a stale cr3 reference, then make sure
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ef2ea1b12cd8..b865822a6278 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -223,7 +223,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	}
 
 	if (target_state->flags & CPUIDLE_FLAG_TLB_FLUSHED)
-		leave_mm(dev->cpu);
+		unlazy_mm_irqs_off();
 
 	/* Take note of the planned idle state. */
 	sched_idle_set_state(target_state);
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index e6c543b5ee1d..bb5d3b3e28df 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -115,8 +115,8 @@ static unsigned int mwait_substates __initdata;
  * If the local APIC timer is not known to be reliable in the target idle state,
  * enable one-shot tick broadcasting for the target CPU before executing MWAIT.
  *
- * Optionally call leave_mm() for the target CPU upfront to avoid wakeups due to
- * flushing user TLBs.
+ * Optionally call unlazy_mm_irqs_off() for the target CPU upfront to avoid
+ * wakeups due to flushing user TLBs.
  *
  * Must be called under local_irq_disable().
  */
diff --git a/include/linux/mmu_context.h b/include/linux/mmu_context.h
index b9b970f7ab45..035e8e42eb78 100644
--- a/include/linux/mmu_context.h
+++ b/include/linux/mmu_context.h
@@ -10,9 +10,7 @@
 # define switch_mm_irqs_off switch_mm
 #endif
 
-#ifndef leave_mm
-static inline void leave_mm(int cpu) { }
-#endif
+extern void unlazy_mm_irqs_off(void);
 
 /*
  * CPUs that are capable of running user task @p. Must contain at least one
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1010e63962d9..e57121bc84d5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3071,5 +3071,3 @@ extern int preempt_dynamic_mode;
 extern int sched_dynamic_mode(const char *str);
 extern void sched_dynamic_update(int mode);
 #endif
-
-extern void unlazy_mm_irqs_off(void);
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 21/23] x86/mm: Use unlazy_mm_irqs_off() in TLB flush IPIs
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (19 preceding siblings ...)
  2022-01-08 16:44 ` [PATCH 20/23] x86/mm: Remove leave_mm() in favor of unlazy_mm_irqs_off() Andy Lutomirski
@ 2022-01-08 16:44 ` Andy Lutomirski
  2022-01-08 16:44 ` [PATCH 22/23] x86/mm: Optimize for_each_possible_lazymm_cpu() Andy Lutomirski
  2022-01-08 16:44 ` [PATCH 23/23] x86/mm: Opt in to IRQs-off activate_mm() Andy Lutomirski
  22 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:44 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

When IPI-flushing a lazy mm, we switch away from the lazy mm.  Use
unlazy_mm_irqs_off() so the scheduler knows we did this.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/mm/tlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index e502565176b9..225b407812c7 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -843,7 +843,7 @@ static void flush_tlb_func(void *info)
 		 * This should be rare, with native_flush_tlb_multi() skipping
 		 * IPIs to lazy TLB mode CPUs.
 		 */
-		switch_mm_irqs_off(NULL, &init_mm, NULL);
+		unlazy_mm_irqs_off();
 		return;
 	}
 
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 22/23] x86/mm: Optimize for_each_possible_lazymm_cpu()
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (20 preceding siblings ...)
  2022-01-08 16:44 ` [PATCH 21/23] x86/mm: Use unlazy_mm_irqs_off() in TLB flush IPIs Andy Lutomirski
@ 2022-01-08 16:44 ` Andy Lutomirski
  2022-01-08 16:44 ` [PATCH 23/23] x86/mm: Opt in to IRQs-off activate_mm() Andy Lutomirski
  22 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:44 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

Now that x86 no longer switches away from a lazy mm behind the scheduler's
back (and thus no longer clears a CPU that the scheduler still considers
lazy from mm_cpumask()), x86 can use mm_cpumask() to optimize
for_each_possible_lazymm_cpu().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/mmu.h | 4 ++++
 arch/x86/mm/tlb.c          | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 03ba71420ff3..da55f768e68c 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -63,5 +63,9 @@ typedef struct {
 		.lock = __MUTEX_INITIALIZER(mm.context.lock),		\
 	}
 
+/* On x86, mm_cpumask(mm) contains all CPUs that might be lazily using mm */
+#define for_each_possible_lazymm_cpu(cpu, mm) \
+	for_each_cpu((cpu), mm_cpumask((mm)))
+
 
 #endif /* _ASM_X86_MMU_H */
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 225b407812c7..04eb43e96e23 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -706,7 +706,9 @@ temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
 	/*
 	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
 	 * with a stale address space WITHOUT being in lazy mode after
-	 * restoring the previous mm.
+	 * restoring the previous mm.  Additionally, once we switch mms,
+	 * for_each_possible_lazymm_cpu() will no longer report this CPU,
+	 * so a lazymm pin wouldn't work.
 	 */
 	if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
 		unlazy_mm_irqs_off();
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 23/23] x86/mm: Opt in to IRQs-off activate_mm()
  2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
                   ` (21 preceding siblings ...)
  2022-01-08 16:44 ` [PATCH 22/23] x86/mm: Optimize for_each_possible_lazymm_cpu() Andy Lutomirski
@ 2022-01-08 16:44 ` Andy Lutomirski
  22 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 16:44 UTC (permalink / raw)
  To: Andrew Morton, Linux-MM
  Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, x86, Rik van Riel,
	Dave Hansen, Peter Zijlstra, Nadav Amit, Mathieu Desnoyers,
	Andy Lutomirski

We gain nothing by having the core code enable IRQs right before calling
activate_mm() only for us to turn them right back off again in switch_mm().

This will save a few cycles, so execve() should be blazingly fast with this
patch applied!

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/Kconfig                   | 1 +
 arch/x86/include/asm/mmu_context.h | 8 ++++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5060c38bf560..908a596619f2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -119,6 +119,7 @@ config X86
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_THP_SWAP		if X86_64
 	select ARCH_HAS_PARANOID_L1D_FLUSH
+	select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
 	select BUILDTIME_TABLE_SORT
 	select CLKEVT_I8253
 	select CLOCKSOURCE_VALIDATE_LAST_CYCLE
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 2ca4fc4a8a0a..f028f1b68bc0 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -132,10 +132,10 @@ extern void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			       struct task_struct *tsk);
 #define switch_mm_irqs_off switch_mm_irqs_off
 
-#define activate_mm(prev, next)			\
-do {						\
-	paravirt_activate_mm((prev), (next));	\
-	switch_mm((prev), (next), NULL);	\
+#define activate_mm(prev, next)				\
+do {							\
+	paravirt_activate_mm((prev), (next));		\
+	switch_mm_irqs_off((prev), (next), NULL);	\
 } while (0);
 
 #ifdef CONFIG_X86_32
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-08 16:44 ` [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms Andy Lutomirski
@ 2022-01-08 19:22   ` Linus Torvalds
  2022-01-08 22:04     ` Andy Lutomirski
  2022-01-09  5:56   ` Nadav Amit
  1 sibling, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2022-01-08 19:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Morton, Linux-MM, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	the arch/x86 maintainers, Rik van Riel, Dave Hansen,
	Peter Zijlstra, Nadav Amit, Mathieu Desnoyers

On Sat, Jan 8, 2022 at 8:44 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> To improve scalability, this patch adds a percpu hazard pointer scheme to
> keep lazily-used mms alive.  Each CPU has a single pointer to an mm that
> must not be freed, and __mmput() checks the pointers belonging to all CPUs
> that might be lazily using the mm in question.

Ugh. This feels horribly fragile to me, and also looks like it makes
some common cases potentially quite expensive for machines with large
CPU counts if they don't do that mm_cpumask optimization - which in
turn feels quite fragile as well.

IOW, this just feels *complicated*.

And I think it's overly so. I get the strong feeling that we could
make the rules much simpler and more straightforward.

For example, how about we make the rules be

 - a lazy TLB mm reference requires that there's an actual active user
of that mm (ie "mm_users > 0")

 - the last mm_users decrement (ie __mmput) forces a TLB flush, and
that TLB flush must make sure that no lazy users exist (which I think
it does already anyway).

Doesn't that seem like a really simple set of rules?

And the nice thing about it is that we *already* do that required TLB
flush in all normal circumstances. __mmput() already calls
exit_mmap(), and exit_mm() already forces that TLB flush in every
normal situation.

So we might have to make sure that every architecture really does that
"drop lazy mms on TLB flush", and maybe add a flag to the existing
'struct mmu_gather tlb' to make sure that flush actually always
happens (even if the process somehow managed to unmap all vma's even
before exiting).
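
Roughly, as a hypothetical sketch (made-up helper name; exit_mmap() and
mmdrop() are the real functions):

static void mmput_last_user(struct mm_struct *mm)	/* hypothetical */
{
	/*
	 * mm_users just hit zero, so no thread can start using this mm
	 * again.  The final flush in exit_mmap() must also kick every
	 * lazy user off the mm.
	 */
	exit_mmap(mm);

	/*
	 * After that flush, no CPU has this mm as its lazy ->active_mm,
	 * so lazy users hold no extra references and the mm can simply
	 * be freed.
	 */
	mmdrop(mm);
}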

Is there something silly I'm missing? Somebody pat me on the head, and
say "There, there, Linus, don't try to get involved with things you
don't understand.." and explain to me in small words.

                  Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-08 19:22   ` Linus Torvalds
@ 2022-01-08 22:04     ` Andy Lutomirski
  2022-01-09  0:27       ` Linus Torvalds
  2022-01-09  0:53       ` Linus Torvalds
  0 siblings, 2 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-08 22:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Linux-MM, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	the arch/x86 maintainers, Rik van Riel, Dave Hansen,
	Peter Zijlstra (Intel),
	Nadav Amit, Mathieu Desnoyers



On Sat, Jan 8, 2022, at 12:22 PM, Linus Torvalds wrote:
> On Sat, Jan 8, 2022 at 8:44 AM Andy Lutomirski <luto@kernel.org> wrote:
>>
>> To improve scalability, this patch adds a percpu hazard pointer scheme to
>> keep lazily-used mms alive.  Each CPU has a single pointer to an mm that
>> must not be freed, and __mmput() checks the pointers belonging to all CPUs
>> that might be lazily using the mm in question.
>
> Ugh. This feels horribly fragile to me, and also looks like it makes
> some common cases potentially quite expensive for machines with large
> CPU counts if they don't do that mm_cpumask optimization - which in
> turn feels quite fragile as well.
>
> IOW, this just feels *complicated*.
>
> And I think it's overly so. I get the strong feeling that we could
> make the rules much simpler and more straightforward.
>
> For example, how about we make the rules be

There there, Linus, not everything is as simple^Wincapable as x86 bare metal, and mm_cpumask does not have useful cross-arch semantics.  Is that good?

>
>  - a lazy TLB mm reference requires that there's an actual active user
> of that mm (ie "mm_users > 0")
>
>  - the last mm_users decrement (ie __mmput) forces a TLB flush, and
> that TLB flush must make sure that no lazy users exist (which I think
> it does already anyway).

It does, on x86 bare metal, in exit_mmap().  It’s implicit, but it could be made explicit, as below.

>
> Doesn't that seem like a really simple set of rules?
>
> And the nice thing about it is that we *already* do that required TLB
> flush in all normal circumstances. __mmput() already calls
> exit_mmap(), and exit_mm() already forces that TLB flush in every
> normal situation.

Exactly. On x86 bare metal and similar architectures, this flush is done by IPI, which involves a loop over all CPUs that might be using the mm.  And other patches in this series add the core ability to shoot down a lazy mm cleanly, so that the core drops its reference, and wire that ability up for x86.

>
> So we might have to make sure that every architecture really does that
> "drop lazy mms on TLB flush", and maybe add a flag to the existing
> 'struct mmu_gather tlb' to make sure that flush actually always
> happens (even if the process somehow managed to unmap all vma's even
> before exiting).

So this requires that all architectures actually walk all relevant CPUs to see if an IPI is needed and send that IPI.  On architectures that actually need an IPI anyway (x86 bare metal, powerpc (I think), and others), fine. But on architectures with a broadcast-to-all-CPUs flush (ARM64 IIUC), the extra IPI will be much, much slower than a simple load-acquire in a loop.

In fact, arm64 doesn’t even track mm_cpumask at all last time I checked, so even an IPI lazy shoot down would require looping *all* CPUs, doing a load-acquire, and possibly doing an IPI. I much prefer doing a load-acquire and possibly a cmpxchg.

(And x86 PV can do hypercall flushes.  If a bunch of vCPUs are not running, an IPI shootdown will end up sleeping until they run, whereas this patch will allow the hypervisor to leave them asleep and thus to finish __mmput without waking them. This only matters on a CPU-oversubscribed host, but still.  And it kind of looks like hardware remote flushes are coming in AMD land eventually.)

But yes, I fully agree that this patch is complicated and subtle.

>
> Is there something silly I'm missing? Somebody pat me on the head, and
> say "There, there, Linus, don't try to get involved with things you
> don't understand.." and explain to me in small words.
>
>                   Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-08 22:04     ` Andy Lutomirski
@ 2022-01-09  0:27       ` Linus Torvalds
  2022-01-09  0:53       ` Linus Torvalds
  1 sibling, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2022-01-09  0:27 UTC (permalink / raw)
  To: Andy Lutomirski, Will Deacon, Catalin Marinas
  Cc: Andrew Morton, Linux-MM, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	the arch/x86 maintainers, Rik van Riel, Dave Hansen,
	Peter Zijlstra (Intel),
	Nadav Amit, Mathieu Desnoyers

[-- Attachment #1: Type: text/plain, Size: 1902 bytes --]

On Sat, Jan 8, 2022 at 2:04 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> So this requires that all architectures actually walk all relevant
> CPUs to see if an IPI is needed and send that IPI. On architectures
> that actually need an IPI anyway (x86 bare metal, powerpc (I think)
> and others, fine. But on architectures with a broadcast-to-all-CPUs
> flush (ARM64 IIUC), then the extra IPI will be much much slower than a
> simple load-acquire in a loop.

... hmm. How about a hybrid scheme?

 (a) architectures that already require that IPI anyway for TLB
invalidation (ie x86, but others too), just make the rule be that the TLB
flush by exit_mmap() get rid of any lazy TLB mm references. Which they
already do.

 (b) architectures like arm64 that do hw-assisted TLB shootdown will have
an ASID allocator model, and what you do is to use that to either
    (b') increment/decrement the mm_count at mm ASID allocation/freeing time
    (b'') use the existing ASID tracking data to find the CPU's that have
that ASID

 (c) can you really imagine hardware TLB shootdown without ASID allocation?
That doesn't seem to make sense. But if it exists, maybe that kind of crazy
case would do the percpu array walking.

(Honesty in advertising: I don't know the arm64 ASID code - I used to know
the old alpha version I wrote in a previous lifetime - but afaik any ASID
allocator has to be able to track CPU's that have a particular ASID in use
and be able to invalidate it).

Hmm. The x86 maintainers are on this thread, but they aren't even the
problem. Adding Catalin and Will to this, I think they should know if/how
this would fit with the arm64 ASID allocator.

Will/Catalin, background here:


https://lore.kernel.org/all/CAHk-=wj4LZaFB5HjZmzf7xLFSCcQri-WWqOEJHwQg0QmPRSdQA@mail.gmail.com/

for my objection to that special "keep non-refcounted magic per-cpu pointer
to lazy tlb mm".

           Linus

[-- Attachment #2: Type: text/html, Size: 2452 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-08 22:04     ` Andy Lutomirski
  2022-01-09  0:27       ` Linus Torvalds
@ 2022-01-09  0:53       ` Linus Torvalds
  2022-01-09  3:58         ` Andy Lutomirski
  1 sibling, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2022-01-09  0:53 UTC (permalink / raw)
  To: Andy Lutomirski, Will Deacon, Catalin Marinas
  Cc: Andrew Morton, Linux-MM, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	the arch/x86 maintainers, Rik van Riel, Dave Hansen,
	Peter Zijlstra (Intel),
	Nadav Amit, Mathieu Desnoyers

[ Let's try this again, without the html crud this time. Apologies to
the people who see this reply twice ]

On Sat, Jan 8, 2022 at 2:04 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> So this requires that all architectures actually walk all relevant
> CPUs to see if an IPI is needed and send that IPI. On architectures
> that actually need an IPI anyway (x86 bare metal, powerpc (I think)
> and others, fine. But on architectures with a broadcast-to-all-CPUs
> flush (ARM64 IIUC), then the extra IPI will be much much slower than a
> simple load-acquire in a loop.

... hmm. How about a hybrid scheme?

 (a) architectures that already require that IPI anyway for TLB
invalidation (ie x86, but others too), just make the rule be that the
TLB flush by exit_mmap() get rid of any lazy TLB mm references. Which
they already do.

 (b) architectures like arm64 that do hw-assisted TLB shootdown will
have an ASID allocator model, and what you do is to use that to either
    (b') increment/decrement the mm_count at mm ASID allocation/freeing time
    (b'') use the existing ASID tracking data to find the CPU's that
have that ASID

 (c) can you really imagine hardware TLB shootdown without ASID
allocation? That doesn't seem to make sense. But if it exists, maybe
that kind of crazy case would do the percpu array walking.

(Honesty in advertising: I don't know the arm64 ASID code - I used to
know the old alpha version I wrote in a previous lifetime - but afaik
any ASID allocator has to be able to track CPU's that have a
particular ASID in use and be able to invalidate it).

Hmm. The x86 maintainers are on this thread, but they aren't even the
problem. Adding Catalin and Will to this, I think they should know
if/how this would fit with the arm64 ASID allocator.

Will/Catalin, background here:

   https://lore.kernel.org/all/CAHk-=wj4LZaFB5HjZmzf7xLFSCcQri-WWqOEJHwQg0QmPRSdQA@mail.gmail.com/

for my objection to that special "keep non-refcounted magic per-cpu
pointer to lazy tlb mm".

           Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09  0:53       ` Linus Torvalds
@ 2022-01-09  3:58         ` Andy Lutomirski
  2022-01-09  4:38           ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-09  3:58 UTC (permalink / raw)
  To: Linus Torvalds, Will Deacon, Catalin Marinas
  Cc: Andrew Morton, Linux-MM, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	the arch/x86 maintainers, Rik van Riel, Dave Hansen,
	Peter Zijlstra (Intel),
	Nadav Amit, Mathieu Desnoyers



On Sat, Jan 8, 2022, at 4:53 PM, Linus Torvalds wrote:
> [ Let's try this again, without the html crud this time. Apologies to
> the people who see this reply twice ]
>
> On Sat, Jan 8, 2022 at 2:04 PM Andy Lutomirski <luto@kernel.org> wrote:
>>
>> So this requires that all architectures actually walk all relevant
>> CPUs to see if an IPI is needed and send that IPI. On architectures
>> that actually need an IPI anyway (x86 bare metal, powerpc (I think)
>> and others, fine. But on architectures with a broadcast-to-all-CPUs
>> flush (ARM64 IIUC), then the extra IPI will be much much slower than a
>> simple load-acquire in a loop.
>
> ... hmm. How about a hybrid scheme?
>
>  (a) architectures that already require that IPI anyway for TLB
> invalidation (ie x86, but others too), just make the rule be that the
> TLB flush by exit_mmap() get rid of any lazy TLB mm references. Which
> they already do.
>
>  (b) architectures like arm64 that do hw-assisted TLB shootdown will
> have an ASID allocator model, and what you do is to use that to either
>     (b') increment/decrement the mm_count at mm ASID allocation/freeing time
>     (b'') use the existing ASID tracking data to find the CPU's that
> have that ASID
>
>  (c) can you really imagine hardware TLB shootdown without ASID
> allocation? That doesn't seem to make sense. But if it exists, maybe
> that kind of crazy case would do the percpu array walking.
>

So I can go over a handful of TLB flush schemes:

1. x86 bare metal.  As noted, just plain shootdown would work.  (Unless we switch to inexact mm_cpumask() tracking, which might be enough of a win that it's worth it.)  Right now, "ASID" (i.e. PCID, thanks Intel) is allocated per cpu.  They are never explicitly freed -- they just expire off a percpu LRU.  The data structures have no idea whether an mm still exists -- instead they track mm->context.ctx_id, which is 64 bits and never reused.

2. x86 paravirt.  This is just like bare metal except there's a hypercall to flush a specific target cpu.  (I think this is mutually exclusive with PCID, but I'm not sure.  I haven't looked that hard.  I'm not sure exactly what is implemented right now.  It could be an operation to flush (cpu, pcid), but that gets awkward for reasons that aren't too relevant to this discussion.)  In this model, the exit_mmap() shootdown would either need to switch to a non-paravirt flush or we need a fancy mm_count solution of some sort.

3. Hypothetical better x86.  AMD has INVLPGB, which is almost useless right now.  But it's *so* close to being very useful, and I've asked engineers at AMD and Intel to improve this.  Specifically, I want PCID to be widened to 64 bits.  (This would, as I understand it, not affect the TLB hardware at all.  It would affect the tiny table that sits in front of the real PCID and maintains the illusion that PCID is 12 bits, and it would affect the MOV CR3 instruction.  The latter makes it complicated.)  And INVLPGB would invalidate a given 64-bit PCID system-wide.  In this model, there would be no such thing as freeing an ASID.  So I think we would want something very much like this patch.

4. ARM64.  I only barely understand it, but I think it's an intermediate scheme with ASIDs that are wide enough to be useful but narrow enough to run out on occasion.  I don't think they're tracked -- I think the whole world just gets invalidated when they overflow.  I could be wrong.

In any event, ASID lifetimes aren't a magic solution -- how do we know when to expire an ASID?  Presumably it would be when an mm is fully freed (__mmdrop), which gets us right back to square one.

In any case, what I particularly like about my patch is that, while it's subtle, it's subtle just once.  I think it can handle all the interesting arch cases by merely redefining for_each_possible_lazymm_cpu() to do the right thing.
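
As a simplified illustration of the shape (an assumption, not the
literal definition from patch 16), the generic fallback has to treat
every CPU as a possible lazy user:

#ifndef for_each_possible_lazymm_cpu
/* Without better arch tracking, any CPU might be a lazy user. */
#define for_each_possible_lazymm_cpu(cpu, mm)	for_each_possible_cpu(cpu)
#endif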

> (Honesty in advertising: I don't know the arm64 ASID code - I used to
> know the old alpha version I wrote in a previous lifetime - but afaik
> any ASID allocator has to be able to track CPU's that have a
> particular ASID in use and be able to invalidate it).
>
> Hmm. The x86 maintainers are on this thread, but they aren't even the
> problem. Adding Catalin and Will to this, I think they should know
> if/how this would fit with the arm64 ASID allocator.
>

Well, I am an x86 mm maintainer, and there is definitely a performance problem on large x86 systems right now. :)

> Will/Catalin, background here:
>
>    
> https://lore.kernel.org/all/CAHk-=wj4LZaFB5HjZmzf7xLFSCcQri-WWqOEJHwQg0QmPRSdQA@mail.gmail.com/
>
> for my objection to that special "keep non-refcounted magic per-cpu
> pointer to lazy tlb mm".
>
>            Linus

--Andy

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09  3:58         ` Andy Lutomirski
@ 2022-01-09  4:38           ` Linus Torvalds
  2022-01-09 20:19             ` Andy Lutomirski
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2022-01-09  4:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Will Deacon, Catalin Marinas, Andrew Morton, Linux-MM,
	Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch,
	the arch/x86 maintainers, Rik van Riel, Dave Hansen,
	Peter Zijlstra (Intel),
	Nadav Amit, Mathieu Desnoyers

On Sat, Jan 8, 2022 at 7:59 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> > Hmm. The x86 maintainers are on this thread, but they aren't even the
> > problem. Adding Catalin and Will to this, I think they should know
> > if/how this would fit with the arm64 ASID allocator.
> >
>
> Well, I am an x86 mm maintainer, and there is definitely a performance problem on large x86 systems right now. :)

Well, my point was that on x86, the complexities of the patch you
posted are completely pointless.

So on x86, you can just remove the mmgrab/mmdrop reference counts from
the lazy mm use entirely, and voila, that performance problem is gone.
We don't _need_ reference counting on x86 at all, if we just say that
the rule is that a lazy mm is always associated with a
honest-to-goodness live mm.

So on x86 - and any platform with the IPI model - there is no need for
hundreds of lines of complexity at all.

THAT is my point. Your patch adds complexity that buys you ABSOLUTELY NOTHING.

You then saying that the mmgrab/mmdrop is a performance problem is
just trying to muddy the water. You can just remove it entirely.

Now, I do agree that that depends on the whole "TLB IPI will get rid
of any lazy mm users on other cpus". So I agree that if you have
hardware TLB invalidation that then doesn't have that software
component to it, you need something else.

But my argument _then_ was that hardware TLB invalidation then needs
the hardware ASID thing to be useful, and the ASID management code
already effectively keeps track of "this ASID is used on other CPU's".
And that's exactly the same kind of information that your patch
basically added a separate percpu array for.

So I think that even for that hardware TLB shootdown case, your patch
only adds overhead.

And it potentially adds a *LOT* of overhead, if you replace an atomic
refcount with a "for_each_possible_cpu()" loop that has to do cmpxchg
things too.

Now, on x86, where we maintain that mm_cpumask, and as a result that
overhead is much lower - but we maintain that mm_cpumask exactly
*because* we do that IPI thing, so I don't think you can use that
argument in favor of your patch. When we do the IPI thing, your patch
is worthless overhead.

See?

Btw, you don't even need to really solve the arm64 TLB invalidate
thing - we could make the rule be that we only do the mmgrab/mmput at
all on platforms that don't do that IPI flush.

I think that's basically exactly what Nick Piggin wanted to do on powerpc, no?

But you hated that patch, for non-obvious reasons, and are now
introducing this new patch that is clearly non-optimal on x86.

So I think there's some intellectual dishonesty on your part here.

                Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-08 16:44 ` [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms Andy Lutomirski
  2022-01-08 19:22   ` Linus Torvalds
@ 2022-01-09  5:56   ` Nadav Amit
  2022-01-09  6:48     ` Linus Torvalds
  1 sibling, 1 reply; 79+ messages in thread
From: Nadav Amit @ 2022-01-09  5:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Morton, Linux-MM, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	X86 ML, Rik van Riel, Dave Hansen, Peter Zijlstra,
	Mathieu Desnoyers, Linus Torvalds


> On Jan 8, 2022, at 8:44 AM, Andy Lutomirski <luto@kernel.org> wrote:
> 
> The purpose of mmgrab() and mmdrop() is to make "lazy tlb" mode safe.

Just wondering: In a world of ASID/PCID - does the “lazy TLB” really
have a worthy advantage?

Considering the fact that with PTI anyhow address spaces are switched
all the time, can’t we just get rid of it?


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09  5:56   ` Nadav Amit
@ 2022-01-09  6:48     ` Linus Torvalds
  2022-01-09  8:49       ` Nadav Amit
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2022-01-09  6:48 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andy Lutomirski, Andrew Morton, Linux-MM, Nicholas Piggin,
	Anton Blanchard, Benjamin Herrenschmidt, Paul Mackerras,
	Randy Dunlap, linux-arch, X86 ML, Rik van Riel, Dave Hansen,
	Peter Zijlstra, Mathieu Desnoyers

On Sat, Jan 8, 2022 at 9:56 PM Nadav Amit <nadav.amit@gmail.com> wrote:
>
> Just wondering: In a world of ASID/PCID - does the “lazy TLB” really
> have a worthy advantage?
>
> Considering the fact that with PTI anyhow address spaces are switched
> all the time, can’t we just get rid of it?

Hmm.. That may indeed be the right thing to do.

I think arm64 already hardcodes ASID 0 to init_mm, and that kernel
threads (and the idle threads in particular) might as well just use
that. In that kind of situation, there's likely little advantage to
reusing a user address space ID, and quite possibly any potential
advantage is overshadowed by the costs.

The lazy tlb thing goes back a *looong* way, and lots of things have
changed since. Maybe it's not worth it any more.

Or maybe it's only worth it on platforms where it's free (UP, possibly
other situations - like if you have IPI and it's "free").

If I read the history correctly, it looks like PF_LAZY_TLB was
introduced in 2.3.11-pre4 or something. Back in summer of 1999. The
"active_mm" vs "mm" model came later.

                  Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09  6:48     ` Linus Torvalds
@ 2022-01-09  8:49       ` Nadav Amit
  2022-01-09 19:10         ` Linus Torvalds
  2022-01-09 19:22         ` Rik van Riel
  0 siblings, 2 replies; 79+ messages in thread
From: Nadav Amit @ 2022-01-09  8:49 UTC (permalink / raw)
  To: Linus Torvalds, Andy Lutomirski
  Cc: Andrew Morton, Linux-MM, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	X86 ML, Rik van Riel, Dave Hansen, Peter Zijlstra,
	Mathieu Desnoyers



> On Jan 8, 2022, at 10:48 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> On Sat, Jan 8, 2022 at 9:56 PM Nadav Amit <nadav.amit@gmail.com> wrote:
>> 
>> Just wondering: In a world of ASID/PCID - does the “lazy TLB” really
>> have a worthy advantage?
>> 
>> Considering the fact that with PTI anyhow address spaces are switched
>> all the time, can’t we just get rid of it?
> 

[snip]

> 
> Or maybe it's only worth it on platforms where it's free (UP, possibly
> other situations - like if you have IPI and it's "free").

On UP it might be free, but on x86+IPIs there is a tradeoff.

When you decide which CPUs you want to send the IPI to, in the
common flow (no tables freed) you check whether they use
“lazy TLB” or not in order to filter out the lazy ones. In the
past this was on a cacheline with other frequently-dirtied data, so
the cacheline often bounced from cache to cache. Worse, the
test used an indirect branch, so it was expensive with Spectre v2
mitigations. I fixed it some time ago, so things are better and
today the is_lazy cacheline should bounce less between caches,
but there is a tradeoff in maintaining and checking both the cpumask
and then is_lazy for each CPU in the cpumask.

It is possible for instance to get rid of is_lazy, keep the CPU
on mm_cpumask when switching to kernel thread, and then if/when
an IPI is received remove it from cpumask to avoid further
unnecessary TLB shootdown IPIs.

I do not know whether it is a pure win, but there is a tradeoff.
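
A rough sketch of that alternative, with a made-up helper name (not a
real patch):

/* Hypothetical: called from the TLB flush IPI path on x86. */
static void lazy_cpu_drop_from_cpumask(const struct flush_tlb_info *f)
{
	if (current->mm ||
	    this_cpu_read(cpu_tlbstate.loaded_mm) != f->mm)
		return;

	/* Stop receiving shootdowns for an mm we only hold lazily... */
	cpumask_clear_cpu(smp_processor_id(), mm_cpumask(f->mm));

	/* ...and switch away so no stale translations remain. */
	unlazy_mm_irqs_off();
}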

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09  8:49       ` Nadav Amit
@ 2022-01-09 19:10         ` Linus Torvalds
  2022-01-09 19:52           ` Andy Lutomirski
  2022-01-09 19:22         ` Rik van Riel
  1 sibling, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2022-01-09 19:10 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andy Lutomirski, Andrew Morton, Linux-MM, Nicholas Piggin,
	Anton Blanchard, Benjamin Herrenschmidt, Paul Mackerras,
	Randy Dunlap, linux-arch, X86 ML, Rik van Riel, Dave Hansen,
	Peter Zijlstra, Mathieu Desnoyers

On Sun, Jan 9, 2022 at 12:49 AM Nadav Amit <nadav.amit@gmail.com> wrote:
>
> I do not know whether it is a pure win, but there is a tradeoff.

Hmm. I guess only some serious testing would tell.

On x86, I'd be a bit worried about removing lazy TLB simply because
even with ASID support there (called PCIDs by Intel for NIH reasons),
the actual ASID space on x86 was at least originally very very
limited.

Architecturally, x86 may expose 12 bits of ASID space, but iirc at
least the first few implementations actually only internally had one
or two bits, and hashed the 12 bits down to that internal very limited
hardware TLB ID space.

We only use a handful of ASIDs per CPU on x86 partly for this reason
(but also since there's no remote hardware TLB shootdown, there's no
reason to have a bigger global ASID space, so ASIDs aren't _that_
common).

And I don't know how many non-PCID x86 systems (perhaps virtualized?)
there might be out there.

But it would be very interesting to test some "disable lazy tlb"
patch. The main problem workloads tend to be IO, and I'm not sure how
many of the automated performance tests would catch issues. I guess
some threaded pipe ping-pong test (with each thread pinned to
different cores) would show it.

And I guess there is some load that triggered the original powerpc
patch by Nick&co, and that Andy has been using..

Anybody willing to cook up a patch and run some benchmarks? Perhaps
one that basically just replaces "set ->mm to NULL" with "set ->mm to
&init_mm" - so that the lazy TLB code is still *there*, but it never
triggers..

I think it's mainly 'copy_thread()' in kernel/fork.c and the 'init_mm'
initializer in mm/init-mm.c, but there's probably other things too
that have that knowledge of the special "tsk->mm = NULL" situation.

                  Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09  8:49       ` Nadav Amit
  2022-01-09 19:10         ` Linus Torvalds
@ 2022-01-09 19:22         ` Rik van Riel
  2022-01-09 19:34           ` Nadav Amit
  1 sibling, 1 reply; 79+ messages in thread
From: Rik van Riel @ 2022-01-09 19:22 UTC (permalink / raw)
  To: Nadav Amit, Linus Torvalds, Andy Lutomirski
  Cc: Andrew Morton, Linux-MM, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	X86 ML, Dave Hansen, Peter Zijlstra, Mathieu Desnoyers

[-- Attachment #1: Type: text/plain, Size: 624 bytes --]

On Sun, 2022-01-09 at 00:49 -0800, Nadav Amit wrote:
> 
> It is possible for instance to get rid of is_lazy, keep the CPU
> on mm_cpumask when switching to kernel thread, and then if/when
> an IPI is received remove it from cpumask to avoid further
> unnecessary TLB shootdown IPIs.
> 
> I do not know whether it is a pure win, but there is a tradeoff.

That's not a win at all. It is what we used to have before
the lazy mm stuff was re-introduced, and it led to quite a
few unnecessary IPIs, which are especially slow on virtual
machines, where the target CPU may not be running.

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09 19:22         ` Rik van Riel
@ 2022-01-09 19:34           ` Nadav Amit
  2022-01-09 19:37             ` Rik van Riel
  0 siblings, 1 reply; 79+ messages in thread
From: Nadav Amit @ 2022-01-09 19:34 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linus Torvalds, Andy Lutomirski, Andrew Morton, Linux-MM,
	Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, X86 ML, Dave Hansen,
	Peter Zijlstra, Mathieu Desnoyers



> On Jan 9, 2022, at 11:22 AM, Rik van Riel <riel@surriel.com> wrote:
> 
> On Sun, 2022-01-09 at 00:49 -0800, Nadav Amit wrote:
>> 
>> It is possible for instance to get rid of is_lazy, keep the CPU
>> on mm_cpumask when switching to kernel thread, and then if/when
>> an IPI is received remove it from cpumask to avoid further
>> unnecessary TLB shootdown IPIs.
>> 
>> I do not know whether it is a pure win, but there is a tradeoff.
> 
> That's not a win at all. It is what we used to have before
> the lazy mm stuff was re-introduced, and it led to quite a
> few unnecessary IPIs, which are especially slow on virtual
> machines, where the target CPU may not be running.

You make a good point about VMs.

IIUC Lazy-TLB serves several goals:

1. Avoid arch address-space switch to save switching time and
   TLB misses.
2. Prevent unnecessary IPIs while kernel threads run.
3. Avoid cache-contention on mm_cpumask when switching to a kernel
   thread.

Your concern is with (2), and this one is easy to keep regardless
of the rest.

I am not sure that (3) is actually helpful, since it might lead
to more cache activity than without lazy-TLB, but that is somewhat
orthogonal to everything else.

As for (1), which is the most fragile aspect, unless you use
shadow page-tables, I am not sure there is a significant benefit.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09 19:34           ` Nadav Amit
@ 2022-01-09 19:37             ` Rik van Riel
  2022-01-09 19:51               ` Nadav Amit
  0 siblings, 1 reply; 79+ messages in thread
From: Rik van Riel @ 2022-01-09 19:37 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Linus Torvalds, Andy Lutomirski, Andrew Morton, Linux-MM,
	Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, X86 ML, Dave Hansen,
	Peter Zijlstra, Mathieu Desnoyers

[-- Attachment #1: Type: text/plain, Size: 1684 bytes --]

On Sun, 2022-01-09 at 11:34 -0800, Nadav Amit wrote:
> 
> 
> > On Jan 9, 2022, at 11:22 AM, Rik van Riel <riel@surriel.com> wrote:
> > 
> > On Sun, 2022-01-09 at 00:49 -0800, Nadav Amit wrote:
> > > 
> > > It is possible for instance to get rid of is_lazy, keep the CPU
> > > on mm_cpumask when switching to kernel thread, and then if/when
> > > an IPI is received remove it from cpumask to avoid further
> > > unnecessary TLB shootdown IPIs.
> > > 
> > > I do not know whether it is a pure win, but there is a tradeoff.
> > 
> > That's not a win at all. It is what we used to have before
> > the lazy mm stuff was re-introduced, and it led to quite a
> > few unnecessary IPIs, which are especially slow on virtual
> > machines, where the target CPU may not be running.
> 
> You make a good point about VMs.
> 
> IIUC Lazy-TLB serves several goals:
> 
> 1. Avoid arch address-space switch to save switching time and
>    TLB misses.
> 2. Prevent unnecessary IPIs while kernel threads run.
> 3. Avoid cache-contention on mm_cpumask when switching to a kernel
>    thread.
> 
> Your concern is with (2), and this one is easy to keep regardless
> of the rest.
> 
> I am not sure that (3) is actually helpful, since it might lead
> to more cache activity than without lazy-TLB, but that is somewhat
> orthogonal to everything else.

I have seen problems with (3) in practice, too.

For many workloads, context switching is much, much more
of a hot path than TLB shootdowns, which are relatively
infrequent by comparison.

Basically ASID took away only the first concern from your
list above, not the other two.

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09 19:37             ` Rik van Riel
@ 2022-01-09 19:51               ` Nadav Amit
  2022-01-09 19:54                 ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Nadav Amit @ 2022-01-09 19:51 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linus Torvalds, Andy Lutomirski, Andrew Morton, Linux-MM,
	Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, X86 ML, Dave Hansen,
	Peter Zijlstra, Mathieu Desnoyers



> On Jan 9, 2022, at 11:37 AM, Rik van Riel <riel@surriel.com> wrote:
> 
> On Sun, 2022-01-09 at 11:34 -0800, Nadav Amit wrote:
>> 
>> 
>>> On Jan 9, 2022, at 11:22 AM, Rik van Riel <riel@surriel.com> wrote:
>>> 
>>> On Sun, 2022-01-09 at 00:49 -0800, Nadav Amit wrote:
>>>> 
>>>> It is possible for instance to get rid of is_lazy, keep the CPU
>>>> on mm_cpumask when switching to kernel thread, and then if/when
>>>> an IPI is received remove it from cpumask to avoid further
>>>> unnecessary TLB shootdown IPIs.
>>>> 
>>>> I do not know whether it is a pure win, but there is a tradeoff.
>>> 
>>> That's not a win at all. It is what we used to have before
>>> the lazy mm stuff was re-introduced, and it led to quite a
>>> few unnecessary IPIs, which are especially slow on virtual
>>> machines, where the target CPU may not be running.
>> 
>> You make a good point about VMs.
>> 
>> IIUC Lazy-TLB serves several goals:
>> 
>> 1. Avoid arch address-space switch to save switching time and
>>    TLB misses.
>> 2. Prevent unnecessary IPIs while kernel threads run.
>> 3. Avoid cache-contention on mm_cpumask when switching to a kernel
>>    thread.
>> 
>> Your concern is with (2), and this one is easy to keep regardless
>> of the rest.
>> 
>> I am not sure that (3) is actually helpful, since it might lead
>> to more cache activity than without lazy-TLB, but that is somewhat
>> orthogonal to everything else.
> 
> I have seen problems with (3) in practice, too.
> 
> For many workloads, context switching is much, much more
> of a hot path than TLB shootdowns, which are relatively
> infrequent by comparison.
> 
> Basically ASID took away only the first concern from your
> list above, not the other two.

I agree, but the point I was trying to make is that you can keep lazy
TLB for (2) and (3), but still switch the address-space. If you
already accept PTI, then the 600 cycles or so of switching the
address space back and forth on context switch, which should happen
less often than the switches PTI already does on syscalls/exceptions,
are not that painful.

You can also make a case that it is “safer” to switch the address
space, although SMAP/SMEP protection provides similar properties.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09 19:10         ` Linus Torvalds
@ 2022-01-09 19:52           ` Andy Lutomirski
  2022-01-09 20:00             ` Linus Torvalds
  2022-01-09 20:34             ` Nadav Amit
  0 siblings, 2 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-09 19:52 UTC (permalink / raw)
  To: Linus Torvalds, Nadav Amit
  Cc: Andrew Morton, Linux-MM, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	the arch/x86 maintainers, Rik van Riel, Dave Hansen,
	Peter Zijlstra (Intel),
	Mathieu Desnoyers

On Sun, Jan 9, 2022, at 11:10 AM, Linus Torvalds wrote:
> On Sun, Jan 9, 2022 at 12:49 AM Nadav Amit <nadav.amit@gmail.com> wrote:
>>
>> I do not know whether it is a pure win, but there is a tradeoff.
>
> Hmm. I guess only some serious testing would tell.
>
> On x86, I'd be a bit worried about removing lazy TLB simply because
> even with ASID support there (called PCIDs by Intel for NIH reasons),
> the actual ASID space on x86 was at least originally very very
> limited.
>
> Architecturally, x86 may expose 12 bits of ASID space, but iirc at
> least the first few implementations actually only internally had one
> or two bits, and hashed the 12 bits down to that internal very limited
> hardware TLB ID space.
>
> We only use a handful of ASIDs per CPU on x86 partly for this reason
> (but also since there's no remote hardware TLB shootdown, there's no
> reason to have a bigger global ASID space, so ASIDs aren't _that_
> common).
>
> And I don't know how many non-PCID x86 systems (perhaps virtualized?)
> there might be out there.
>
> But it would be very interesting to test some "disable lazy tlb"
> patch. The main problem workloads tend to be IO, and I'm not sure how
> many of the automated performance tests would catch issues. I guess
> some threaded pipe ping-pong test (with each thread pinned to
> different cores) would show it.

My original PCID series actually did remove lazy TLB on x86.  I don't remember why, but people objected.  The issue isn't the limited PCID space -- IIRC it's just that MOV CR3 is slooooow.  If we get rid of lazy TLB on x86, then we are writing CR3 twice on even a very short idle.  That adds maybe 1k cycles, which isn't great.

>
> And I guess there is some load that triggered the original powerpc
> patch by Nick&co, and that Andy has been using..

I don't own a big enough machine.  The workloads I'm aware of with the problem have massively multithreaded programs using many CPUs, and transitions into and out of lazy mode ping-pong the cacheline.

>
> Anybody willing to cook up a patch and run some benchmarks? Perhaps
> one that basically just replaces "set ->mm to NULL" with "set ->mm to
> &init_mm" - so that the lazy TLB code is still *there*, but it never
> triggers..

It would 

>
> I think it's mainly 'copy_thread()' in kernel/fork.c and the 'init_mm'
> initializer in mm/init-mm.c, but there's probably other things too
> that have that knowledge of the special "tsk->mm = NULL" situation.

I think, for a little test, we would leave all the mm == NULL code alone and just change the enter-lazy logic.  On top of all the cleanups in this series, that would be trivial.
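
Hypothetically, something like this (names and placement are guesses,
not a real patch):

/* Experiment: when switching to a kernel thread, don't go lazy. */
static void experiment_no_lazy_switch(struct task_struct *prev,
				      struct task_struct *next)
{
	/* Pay for a real switch to init_mm instead of borrowing prev's mm. */
	next->active_mm = &init_mm;
	mmgrab(&init_mm);
	switch_mm_irqs_off(prev->active_mm, &init_mm, next);
}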

>
>                   Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09 19:51               ` Nadav Amit
@ 2022-01-09 19:54                 ` Linus Torvalds
  0 siblings, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2022-01-09 19:54 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Rik van Riel, Andy Lutomirski, Andrew Morton, Linux-MM,
	Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch, X86 ML, Dave Hansen,
	Peter Zijlstra, Mathieu Desnoyers

On Sun, Jan 9, 2022 at 11:51 AM Nadav Amit <nadav.amit@gmail.com> wrote:
>
> If you already accept PTI, [..]

I really don't think anybody should ever "accept PTI".

It's an absolutely enormous cost, and it should be seen as a
last-choice thing. Only really meant for completely broken hardware
(ie meltdown), and for people who have some very serious issues and
think it's "reasonable" to have the TLB isolation.

No real normal sane setup should ever "accept PTI", and it shouldn't
be used as an argument.

              Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09 19:52           ` Andy Lutomirski
@ 2022-01-09 20:00             ` Linus Torvalds
  2022-01-09 20:34             ` Nadav Amit
  1 sibling, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2022-01-09 20:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Nadav Amit, Andrew Morton, Linux-MM, Nicholas Piggin,
	Anton Blanchard, Benjamin Herrenschmidt, Paul Mackerras,
	Randy Dunlap, linux-arch, the arch/x86 maintainers, Rik van Riel,
	Dave Hansen, Peter Zijlstra (Intel),
	Mathieu Desnoyers

On Sun, Jan 9, 2022 at 11:53 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> My original PCID series actually did remove lazy TLB on x86. I don't
> remember why, but people objected. The issue isn't the limited PCID
> space -- IIRC it's just that MOV CR3 is slooooow. If we get rid of
> lazy TLB on x86, then we are writing CR3 twice on even a very short
> idle. That adds maybe 1k cycles, which isn't great.

Yeah, my gut feel is that lazy-TLB almost certainly makes sense on x86.

And the grab/mmput overhead and associated cacheline ping-pong is (I
think) something we could just get rid of on x86 due to the IPI model.
There are probably other costs to lazy TLB, and I can imagine that
there are other maintenance costs, but yes, cr3 moves have always been
expensive on x86 even aside from the actual TLB switch.

But I could easily imagine the situation being different on arm64, for example.

But numbers beat "gut feel" and "easily imagine" every time. So it
would be kind of nice to have that ...

          Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09  4:38           ` Linus Torvalds
@ 2022-01-09 20:19             ` Andy Lutomirski
  2022-01-09 20:48               ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-09 20:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Will Deacon, Catalin Marinas, Andrew Morton, Linux-MM,
	Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch,
	the arch/x86 maintainers, Rik van Riel, Dave Hansen,
	Peter Zijlstra (Intel),
	Nadav Amit, Mathieu Desnoyers



On Sat, Jan 8, 2022, at 8:38 PM, Linus Torvalds wrote:
> On Sat, Jan 8, 2022 at 7:59 PM Andy Lutomirski <luto@kernel.org> wrote:
>>
>> > Hmm. The x86 maintainers are on this thread, but they aren't even the
>> > problem. Adding Catalin and Will to this, I think they should know
>> > if/how this would fit with the arm64 ASID allocator.
>> >
>>
>> Well, I am an x86 mm maintainer, and there is definitely a performance problem on large x86 systems right now. :)
>
> Well, my point was that on x86, the complexities of the patch you
> posted are completely pointless.
>
> So on x86, you can just remove the mmgrab/mmdrop reference counts from
> the lazy mm use entirely, and voila, that performance problem is gone.
> We don't _need_ reference counting on x86 at all, if we just say that
> the rule is that a lazy mm is always associated with a
> honest-to-goodness live mm.
>
> So on x86 - and any platform with the IPI model - there is no need for
> hundreds of lines of complexity at all.
>
> THAT is my point. Your patch adds complexity that buys you ABSOLUTELY NOTHING.
>
> You then saying that the mmgrab/mmdrop is a performance problem is
> just trying to muddy the water. You can just remove it entirely.
>
> Now, I do agree that that depends on the whole "TLB IPI will get rid
> of any lazy mm users on other cpus". So I agree that if you have
> hardware TLB invalidation that then doesn't have that software
> component to it, you need something else.
>
> But my argument _then_ was that hardware TLB invalidation then needs
> the hardware ASID thing to be useful, and the ASID management code
> already effectively keeps track of "this ASID is used on other CPU's".
> And that's exactly the same kind of information that your patch
> basically added a separate percpu array for.
>

Are you *sure*?  The ASID management code on x86 is (as mentioned before) completely unaware of whether an ASID is actually in use anywhere.  The x86 ASID code is a per-cpu LRU -- it tracks whether an mm has been recently used on a cpu, not whether the mm exists.  If an mm hasn't been used recently, the ASID gets recycled.  If we had more bits, we wouldn't even recycle it.  An ASID can and does get recycled while the mm still exists.

> So I think that even for that hardware TLB shootdown case, your patch
> only adds overhead.

The overhead is literally:

exit_mmap();
for each cpu still in mm_cpumask:
  smp_load_acquire

That's it, unless the mm is actually in use, in which case there is also a cmpxchg.

>
> And it potentially adds a *LOT* of overhead, if you replace an atomic
> refcount with a "for_each_possible_cpu()" loop that has to do cmpxchg
> things too.

The cmpxchg is only in the case in which the mm is actually in use on that CPU.  I'm having trouble imagining a workload in which the loop is even measurable unless the bit scan itself is somehow problematic.

On a very large arm64 system, I would believe there could be real overhead.  But these very large systems are exactly the systems that currently ping-pong mm_count.
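
Spelled out as a sketch (the per-CPU variable name here is made up; the
real code is in patch 16):

static DEFINE_PER_CPU(struct mm_struct *, lazy_mm_hazard);

static void mm_drop_lazy_users(struct mm_struct *mm)
{
	int cpu;

	for_each_possible_lazymm_cpu(cpu, mm) {
		struct mm_struct **hazard = per_cpu_ptr(&lazy_mm_hazard, cpu);

		/* Pairs with the store made when that CPU went lazy. */
		if (smp_load_acquire(hazard) != mm)
			continue;

		/*
		 * The CPU may still be lazily using mm: repoint its hazard
		 * pointer at init_mm so mm can be freed.  If the cmpxchg
		 * fails, the CPU already moved on by itself.
		 */
		cmpxchg(hazard, mm, &init_mm);
	}
}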

>
> Btw, you don't even need to really solve the arm64 TLB invalidate
> thing - we could make the rule be that we only do the mmgrab/mmput at
> all on platforms that don't do that IPI flush.
>
> I think that's basically exactly what Nick Piggin wanted to do on powerpc, no?

More or less, but...

>
> But you hated that patch, for non-obvious reasons, and are now
> introducing this new patch that is clearly non-optimal on x86.

I hated that patch because it's messy and it leaves the core lazy handling in an IMO quite regrettable state, not because I'm particularly opposed to shooting down lazies on platforms where it makes sense (powerpc and mostly x86).

As just the most obvious issue, note the kasan_check_byte() in this patch that verifies that ->active_mm doesn't point to freed memory when the scheduler is entered.  If we flipped shoot-lazies on on x86, then KASAN would blow up with that.

For perspective, this whole series is 23 patches.  Exactly two of them are directly related to my hazard pointer scheme: patches 16 and 22.  The rest of them are, in my opinion, cleanups and some straight-up bugfixes that are worthwhile no matter what we do with lazy mm handling per se.

>
> So I think there's some intellectual dishonesty on your part here.

I never said I hated shoot-lazies.  I didn't like the *code*.  I thought I could do better, and I still think my hazard pointer scheme is nifty and, aside from some complexity, quite nice.  It even reduces to shoot-lazies if for_each_possible_lazymm_cpu() is defined to do nothing.  But mainly I wanted the code to be better, so I went and did it.  I could respin this series without the hazard pointers quite easily.

--Andy

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09 19:52           ` Andy Lutomirski
  2022-01-09 20:00             ` Linus Torvalds
@ 2022-01-09 20:34             ` Nadav Amit
  2022-01-09 20:48               ` Andy Lutomirski
  1 sibling, 1 reply; 79+ messages in thread
From: Nadav Amit @ 2022-01-09 20:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Andrew Morton, Linux-MM, Nicholas Piggin,
	Anton Blanchard, Benjamin Herrenschmidt, Paul Mackerras,
	Randy Dunlap, linux-arch, the arch/x86 maintainers, Rik van Riel,
	Dave Hansen, Peter Zijlstra (Intel),
	Mathieu Desnoyers



> On Jan 9, 2022, at 11:52 AM, Andy Lutomirski <luto@kernel.org> wrote:
> 
> On Sun, Jan 9, 2022, at 11:10 AM, Linus Torvalds wrote:
>> On Sun, Jan 9, 2022 at 12:49 AM Nadav Amit <nadav.amit@gmail.com> wrote:
>>> 
>>> I do not know whether it is a pure win, but there is a tradeoff.
>> 
>> Hmm. I guess only some serious testing would tell.
>> 
>> On x86, I'd be a bit worried about removing lazy TLB simply because
>> even with ASID support there (called PCIDs by Intel for NIH reasons),
>> the actual ASID space on x86 was at least originally very very
>> limited.
>> 
>> Architecturally, x86 may expose 12 bits of ASID space, but iirc at
>> least the first few implementations actually only internally had one
>> or two bits, and hashed the 12 bits down to that internal very limited
>> hardware TLB ID space.
>> 
>> We only use a handful of ASIDs per CPU on x86 partly for this reason
>> (but also since there's no remote hardware TLB shootdown, there's no
>> reason to have a bigger global ASID space, so ASIDs aren't _that_
>> common).
>> 
>> And I don't know how many non-PCID x86 systems (perhaps virtualized?)
>> there might be out there.
>> 
>> But it would be very interesting to test some "disable lazy tlb"
>> patch. The main problem workloads tend to be IO, and I'm not sure how
>> many of the automated performance tests would catch issues. I guess
>> some threaded pipe ping-pong test (with each thread pinned to
>> different cores) would show it.
> 
> My original PCID series actually did remove lazy TLB on x86.  I don't remember why, but people objected.  The issue isn't the limited PCID space -- IIRC it's just that MOV CR3 is slooooow.  If we get rid of lazy TLB on x86, then we are writing CR3 twice on even a very short idle.  That adds maybe 1k cycles, which isn't great.

Just for the record: I just ran a short test when CPUs are on max freq
on my skylake. MOV-CR3 without flush is 250-300 cycles. One can argue
that you mostly only care for one of the switches for the idle thread
(once you wake up). And waking up by itself has its overheads.
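
Roughly what the measurement looks like, in case anyone wants to redo it.
This is a from-memory sketch, not the exact code I ran, and it assumes
PCIDs are enabled so the no-flush bit is legal to set:

static u64 time_one_cr3_write(void)
{
        unsigned long cr3 = __read_cr3();
        unsigned long flags;
        u64 t0, t1;

        local_irq_save(flags);
        t0 = rdtsc_ordered();
        /* Write the current CR3 back with the no-flush bit set. */
        native_write_cr3(cr3 | X86_CR3_PCID_NOFLUSH);
        t1 = rdtsc_ordered();
        local_irq_restore(flags);

        return t1 - t0;
}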

But you are the master of micro optimizations, and as Rik said, I
mostly think of TLB shootdowns and might miss the big picture. Just
trying to make your life easier by less coding and my life simpler
in understanding your super-smart code. ;-)


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09 20:34             ` Nadav Amit
@ 2022-01-09 20:48               ` Andy Lutomirski
  0 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-09 20:48 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Linus Torvalds, Andrew Morton, Linux-MM, Nicholas Piggin,
	Anton Blanchard, Benjamin Herrenschmidt, Paul Mackerras,
	Randy Dunlap, linux-arch, the arch/x86 maintainers, Rik van Riel,
	Dave Hansen, Peter Zijlstra (Intel),
	Mathieu Desnoyers



On Sun, Jan 9, 2022, at 1:34 PM, Nadav Amit wrote:
>> On Jan 9, 2022, at 11:52 AM, Andy Lutomirski <luto@kernel.org> wrote:
>> 
>> On Sun, Jan 9, 2022, at 11:10 AM, Linus Torvalds wrote:
>>> On Sun, Jan 9, 2022 at 12:49 AM Nadav Amit <nadav.amit@gmail.com> wrote:
>>>> 
>>>> I do not know whether it is a pure win, but there is a tradeoff.
>>> 
>>> Hmm. I guess only some serious testing would tell.
>>> 
>>> On x86, I'd be a bit worried about removing lazy TLB simply because
>>> even with ASID support there (called PCIDs by Intel for NIH reasons),
>>> the actual ASID space on x86 was at least originally very very
>>> limited.
>>> 
>>> Architecturally, x86 may expose 12 bits of ASID space, but iirc at
>>> least the first few implementations actually only internally had one
>>> or two bits, and hashed the 12 bits down to that internal very limited
>>> hardware TLB ID space.
>>> 
>>> We only use a handful of ASIDs per CPU on x86 partly for this reason
>>> (but also since there's no remote hardware TLB shootdown, there's no
>>> reason to have a bigger global ASID space, so ASIDs aren't _that_
>>> common).
>>> 
>>> And I don't know how many non-PCID x86 systems (perhaps virtualized?)
>>> there might be out there.
>>> 
>>> But it would be very interesting to test some "disable lazy tlb"
>>> patch. The main problem workloads tend to be IO, and I'm not sure how
>>> many of the automated performance tests would catch issues. I guess
>>> some threaded pipe ping-pong test (with each thread pinned to
>>> different cores) would show it.
>> 
>> My original PCID series actually did remove lazy TLB on x86.  I don't remember why, but people objected.  The issue isn't the limited PCID space -- IIRC it's just that MOV CR3 is slooooow.  If we get rid of lazy TLB on x86, then we are writing CR3 twice on even a very short idle.  That adds maybe 1k cycles, which isn't great.
>
> Just for the record: I just ran a short test when CPUs are on max freq
> on my skylake. MOV-CR3 without flush is 250-300 cycles. One can argue
> that you mostly only care for one of the switches for the idle thread
> (once you wake up). And waking up by itself has its overheads.
>
> But you are the master of micro optimizations, and as Rik said, I
> mostly think of TLB shootdowns and might miss the big picture. Just
> trying to make your life easier by less coding and my life simpler
> in understanding your super-smart code. ;-)

As Rik pointed out, the mm_cpumask manipulation is also expensive if we get rid of lazy. Let me ponder how to do this nicely.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09 20:19             ` Andy Lutomirski
@ 2022-01-09 20:48               ` Linus Torvalds
  2022-01-09 21:51                 ` Linus Torvalds
  2022-01-11 10:39                 ` Will Deacon
  0 siblings, 2 replies; 79+ messages in thread
From: Linus Torvalds @ 2022-01-09 20:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Will Deacon, Catalin Marinas, Andrew Morton, Linux-MM,
	Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch,
	the arch/x86 maintainers, Rik van Riel, Dave Hansen,
	Peter Zijlstra (Intel),
	Nadav Amit, Mathieu Desnoyers

On Sun, Jan 9, 2022 at 12:20 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> Are you *sure*? The ASID management code on x86 is (as mentioned
> before) completely unaware of whether an ASID is actually in use
> anywhere.

Right.

But the ASID situation on x86 is very very different, exactly because
x86 doesn't have cross-CPU TLB invalidates.

Put another way: x86 TLB hardware is fundamentally per-cpu. As such,
any ASID management is also per-cpu.

That's fundamentally not true on arm64.  And that's not some "arm64
implementation detail". That's fundamental to doing cross-CPU TLB
invalidates in hardware.

If your TLB invalidates act across CPU's, then the state they act on
are also obviously across CPU's.

So the ASID situation is fundamentally different depending on the
hardware usage. On x86, TLB's are per-core, and on arm64 they are not,
and that's reflected in our code too.

As a result, on x86, each mm has a per-cpu ASID, and there's a small
array of per-cpu "mm->asid" mappings.
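
(For reference, that per-cpu mapping really is just a tiny array; this
is paraphrased from memory from arch/x86/include/asm/tlbflush.h, not a
verbatim copy:)

struct tlb_context {
        u64 ctx_id;     /* which mm this ASID slot was last used for */
        u64 tlb_gen;    /* how up to date that slot's TLB contents are */
};

struct tlb_state {
        struct mm_struct *loaded_mm;
        u16 loaded_mm_asid;
        u16 next_asid;
        /* ... */
        struct tlb_context ctxs[TLB_NR_DYN_ASIDS];  /* TLB_NR_DYN_ASIDS == 6 */
};
DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate);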

On arm, each mm has an asid, and it's allocated from a global asid
space - so there is no need for that "mm->asid" mapping, because the
asid is there in the mm, and it's shared across cpus.

That said, I still don't actually know the arm64 ASID management code.

The thing about TLB flushes is that it's ok to do them spuriously (as
long as you don't do _too_ many of them and tank performance), so two
different mm's can have the same hw ASID on two different cores and
that just makes cross-CPU TLB invalidates too aggressive. You can't
share an ASID on the _same_ core without flushing in between context
switches, because then the TLB on that core might be re-used for a
different mm. So the flushing rules aren't necessarily 100% 1:1 with
the "in use" rules, and who knows whether the arm64 ASID management
actually ends up matching that whole "this lazy TLB is still in use on
another CPU" tracking.

So I don't really know the arm64 situation. And it's possible that lazy
TLB isn't even worth it on arm64 in the first place.

> > So I think that even for that hardware TLB shootdown case, your patch
> > only adds overhead.
>
> The overhead is literally:
>
> exit_mmap();
> for each cpu still in mm_cpumask:
>   smp_load_acquire
>
> That's it, unless the mm is actually in use

Ok, now do this for a machine with 1024 CPU's.

And tell me it is "scalable".

> On a very large arm64 system, I would believe there could be real overhead.  But these very large systems are exactly the systems that currently ping-pong mm_count.

Right.

But I think your arguments against mm_count are questionable.

I'd much rather have a *much* smaller patch that says "on x86 and
powerpc, we don't need this overhead at all".

And then the arm64 people can look at it and say "Yeah, we'll still do
the mm_count thing", or maybe say "Yeah, we can solve it another way".

And maybe the arm64 people actually say "Yeah, this hazard pointer
thing is perfect for us". That still doesn't necessarily argue for it
on an architecture that ends up serializing with an IPI anyway.

                Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09 20:48               ` Linus Torvalds
@ 2022-01-09 21:51                 ` Linus Torvalds
  2022-01-10  0:52                   ` Andy Lutomirski
  2022-01-10  4:56                   ` Nicholas Piggin
  2022-01-11 10:39                 ` Will Deacon
  1 sibling, 2 replies; 79+ messages in thread
From: Linus Torvalds @ 2022-01-09 21:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Will Deacon, Catalin Marinas, Andrew Morton, Linux-MM,
	Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch,
	the arch/x86 maintainers, Rik van Riel, Dave Hansen,
	Peter Zijlstra (Intel),
	Nadav Amit, Mathieu Desnoyers

[ Ugh, I actually went back and looked at Nick's patches again, to
just verify my memory, and they weren't as pretty as I thought they
were ]

On Sun, Jan 9, 2022 at 12:48 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'd much rather have a *much* smaller patch that says "on x86 and
> powerpc, we don't need this overhead at all".

For some reason I thought Nick's patch worked at "last mmput" time and
the TLB flush IPIs that happen at that point anyway would then make
sure any lazy TLB is cleaned up.

But that's not actually what it does. It ties the
MMU_LAZY_TLB_REFCOUNT to an explicit TLB shootdown triggered by the
last mmdrop() instead. Because it really tied the whole logic to the
mm_count logic (and made lazy tlb to not do mm_count) rather than the
mm_users thing I mis-remembered it doing.

So at least some of my arguments were based on me just mis-remembering
what Nick's patch actually did (mainly because I mentally recreated
the patch from "Nick did something like this" and what I thought would
be the way to do it on x86).

So I guess I have to recant my arguments.

I still think my "get rid of lazy at last mmput" model should work,
and would be a perfect match for x86, but I can't really point to Nick
having done that.

So I was full of BS.

Hmm. I'd love to try to actually create a patch that does that "Nick
thing", but on last mmput() (ie when __mmput triggers). Because I
think this is interesting. But then I look at my schedule for the
upcoming week, and I go "I don't have a leg to stand on in this
discussion, and I'm just all hot air".

Because code talks, BS walks.

                Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09 21:51                 ` Linus Torvalds
@ 2022-01-10  0:52                   ` Andy Lutomirski
  2022-01-10  2:36                     ` Rik van Riel
  2022-01-10  4:56                   ` Nicholas Piggin
  1 sibling, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-10  0:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Will Deacon, Catalin Marinas, Andrew Morton, Linux-MM,
	Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch,
	the arch/x86 maintainers, Rik van Riel, Dave Hansen,
	Peter Zijlstra (Intel),
	Nadav Amit, Mathieu Desnoyers

On Sun, Jan 9, 2022, at 1:51 PM, Linus Torvalds wrote:
> [ Ugh, I actually went back and looked at Nick's patches again, to
> just verify my memory, and they weren't as pretty as I thought they
> were ]
>
> On Sun, Jan 9, 2022 at 12:48 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I'd much rather have a *much* smaller patch that says "on x86 and
>> powerpc, we don't need this overhead at all".

I can whip this up.  It won’t be much smaller — we still need the entire remainder of the series as prep, and we’ll end up with an if (arch needs mmcount) somewhere, but it’s straightforward.

But I reserve the right to spam you with an even bigger patch, because mm_cpumask also pingpongs, and I have some ideas for helping that out too.

Also:

>> exit_mmap();
>> for each cpu still in mm_cpumask:
>>   smp_load_acquire
>>
>> That's it, unless the mm is actually in use
>
> Ok, now do this for a machine with 1024 CPU's.
>
> And tell me it is "scalable".
>

Do you mean a machine with 1024 CPUs and 2 bits set in mm_cpumask or 1024 CPUs with 800 bits set in mm_cpumask?  In the former case, this is fine.  In the latter case, *on x86*, sure it does 800 loads, but we're about to do 800 CR3 writes to tear the whole mess down, so the 800 loads should be in the noise.  (And this series won't actually do this anyway on bare metal, since exit_mmap does the shootdown.)

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-10  0:52                   ` Andy Lutomirski
@ 2022-01-10  2:36                     ` Rik van Riel
  2022-01-10  3:51                       ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Rik van Riel @ 2022-01-10  2:36 UTC (permalink / raw)
  To: Andy Lutomirski, Linus Torvalds
  Cc: Will Deacon, Catalin Marinas, Andrew Morton, Linux-MM,
	Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch,
	the arch/x86 maintainers, Dave Hansen, Peter Zijlstra (Intel),
	Nadav Amit, Mathieu Desnoyers

[-- Attachment #1: Type: text/plain, Size: 941 bytes --]

On Sun, 2022-01-09 at 17:52 -0700, Andy Lutomirski wrote:
> On Sun, Jan 9, 2022, at 1:51 PM, Linus Torvalds wrote:
> 
> 
> 
> > > exit_mmap();
> > > for each cpu still in mm_cpumask:
> > >   smp_load_acquire
> > > 
> > > That's it, unless the mm is actually in use
> > 
> > Ok, now do this for a machine with 1024 CPU's.
> > 
> > And tell me it is "scalable".
> > 
> 
> Do you mean a machine with 1024 CPUs and 2 bits set in mm_cpumask or
> 1024 CPU with 800 bits set in mm_cpumask?  In the former case, this
> is fine.  In the latter case, *on x86*, sure it does 800 loads, but
> we're about to do 800 CR3 writes to tear the whole mess down, so the
> 800 loads should be in the noise.  (And this series won't actually do
> this anyway on bare metal, since exit_mmap does the shootdown.)

Also, while 800 loads is kinda expensive, it is a heck of
a lot less expensive than 800 IPIs.

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-10  2:36                     ` Rik van Riel
@ 2022-01-10  3:51                       ` Linus Torvalds
  0 siblings, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2022-01-10  3:51 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andy Lutomirski, Will Deacon, Catalin Marinas, Andrew Morton,
	Linux-MM, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	the arch/x86 maintainers, Dave Hansen, Peter Zijlstra (Intel),
	Nadav Amit, Mathieu Desnoyers

On Sun, Jan 9, 2022 at 6:40 PM Rik van Riel <riel@surriel.com> wrote:
>
> Also, while 800 loads is kinda expensive, it is a heck of
> a lot less expensive than 800 IPIs.

Rik, the IPI's you have to do *anyway*. So there are exactly zero extra IPI's.

Go take a look. It's part of the whole "flush TLB's" thing in __mmput().

So let me explain one more time what I think we should have done, at
least on x86:

 (1) stop refcounting active_mm entries entirely on x86

Why can we do that? Because instead of worrying about doing those
mm_count games for the active_mm reference, we realize that any
active_mm has to have a _regular_ mm associated with it, and it has a
'mm_users' count.

And when that mm_users count goes to zero, we have:

 (2) mmput -> __mmput -> exit_mmap(), which already has to flush all
TLB's because it's tearing down the page tables

And since it has to flush those TLB's as part of tearing down the page
tables, we on x86 then have:

 (3) that TLB flush will have to do the IPI's to anybody who has that
mm active already

and that IPI has to be done *regardless*. And the TLB flushing done by
that IPI? That code already clears the lazy status (and not doing so
would be pointless and in fact wrong).
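
(That's what the x86 flush IPI already does today; paraphrasing
flush_tlb_func() in arch/x86/mm/tlb.c from memory, not verbatim:)

        if (this_cpu_read(cpu_tlbstate_shared.is_lazy)) {
                /*
                 * This CPU is only lazily using the mm being flushed.
                 * Instead of flushing, just switch to init_mm, which
                 * also takes this CPU out of lazy mode for that mm.
                 */
                switch_mm_irqs_off(NULL, &init_mm, NULL);
                return;
        }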

Notice? There isn't some "800 loads". There isn't some "800 IPI's".
And there isn't any refcounting cost of the lazy TLB.

Well, right now there *is* that refcounting cost, but my point is that
I don't think it should exist.

It shouldn't exist as an atomic access to mm_count (with those cache
ping-pongs when you have a lot of threads across a lot of CPUs), but
it *also* shouldn't exist as a "lightweight hazard pointer".

See my point? I think the lazy-tlb refcounting we do is pointless if
you have to do IPI's for TLB flushes.

Note: the above is for x86, which has to do the IPI's anyway (and it's
very possible that if you don't have to do IPI's because you have HW
TLB coherency, maybe lazy TLB's aren't what you should be using, but I
think that should be a separate discussion).

And yes, right now we do that pointless reference counting, because it
was simple and straightforward, and people historically didn't see it
as a problem.

Plus we had to have that whole secondary 'mm_count' anyway for other
reasons, since we use it for things that need to keep a ref to 'struct
mm_struct' around regardless of page table counts (eg things like a
lot of /proc references to 'struct mm_struct' do not want to keep
forced references to user page tables alive).

But I think conceptually mm_count (ie mmgrab/mmdrop) was always really
dodgy for lazy TLB. Lazy TLB really cares about the page tables still
being there, and that's not what mm_count is ostensibly about. That's
really what mm_users is about.

Yet mmgrab/mmdrop is exactly what the lazy TLB code uses, even if it's
technically odd (ie mmgrab really only keeps the 'struct mm' around,
but not about the vma's and page tables).
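
(For the record, the two counts and their helpers, simplified from
include/linux/sched/mm.h and kernel/fork.c rather than quoted verbatim:)

static inline void mmgrab(struct mm_struct *mm)
{
        atomic_inc(&mm->mm_count);      /* keeps the struct mm_struct itself alive */
}

static inline void mmdrop(struct mm_struct *mm)
{
        if (unlikely(atomic_dec_and_test(&mm->mm_count)))
                __mmdrop(mm);           /* frees pgd, destroys context, frees the struct */
}

void mmput(struct mm_struct *mm)
{
        if (atomic_dec_and_test(&mm->mm_users))
                __mmput(mm);            /* exit_mmap(): tears down vmas and page tables */
}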

Side note: you can see the effects of this mis-use of mmgrab/mmdrop in
 how we tear down _almost_ all the page table state in __mmput(). But
look at what we keep until the final __mmdrop, even though there are
no page tables left:

        mm_free_pgd(mm);
        destroy_context(mm);

exactly because even though we've torn down all the page tables
earlier, we had to keep the page table *root* around for the lazy
case.

It's kind of a layering violation, but it comes from that lazy-tlb
mm_count use, and so we have that odd situation where the page table
directory lifetime is very different from the rest of the page table
lifetimes.

(You can easily make excuses for it by just saying that "mm_users" is
the user-space page table user count, and that the page directory has
a different lifetime because it's also about the kernel page tables,
so it's all a bit of a gray area, but I do think it's also a bit of a
sign of how our refcounting for lazy-tlb is a bit dodgy).

                Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09 21:51                 ` Linus Torvalds
  2022-01-10  0:52                   ` Andy Lutomirski
@ 2022-01-10  4:56                   ` Nicholas Piggin
  2022-01-10  5:17                     ` Nicholas Piggin
  2022-01-10 20:52                     ` Andy Lutomirski
  1 sibling, 2 replies; 79+ messages in thread
From: Nicholas Piggin @ 2022-01-10  4:56 UTC (permalink / raw)
  To: Andy Lutomirski, Linus Torvalds
  Cc: Andrew Morton, Anton Blanchard, Benjamin Herrenschmidt,
	Catalin Marinas, Dave Hansen, linux-arch, Linux-MM,
	Mathieu Desnoyers, Nadav Amit, Paul Mackerras,
	Peter Zijlstra (Intel),
	Randy Dunlap, Rik van Riel, Will Deacon,
	the arch/x86 maintainers

Excerpts from Linus Torvalds's message of January 10, 2022 7:51 am:
> [ Ugh, I actually went back and looked at Nick's patches again, to
> just verify my memory, and they weren't as pretty as I thought they
> were ]
> 
> On Sun, Jan 9, 2022 at 12:48 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I'd much rather have a *much* smaller patch that says "on x86 and
>> powerpc, we don't need this overhead at all".
> 
> For some reason I thought Nick's patch worked at "last mmput" time and
> the TLB flush IPIs that happen at that point anyway would then make
> sure any lazy TLB is cleaned up.
> 
> But that's not actually what it does. It ties the
> MMU_LAZY_TLB_REFCOUNT to an explicit TLB shootdown triggered by the
> last mmdrop() instead. Because it really tied the whole logic to the
> mm_count logic (and made lazy tlb to not do mm_count) rather than the
> mm_users thing I mis-remembered it doing.

It does this because on powerpc with hash MMU, we can't use IPIs for
TLB shootdowns.

> So at least some of my arguments were based on me just mis-remembering
> what Nick's patch actually did (mainly because I mentally recreated
> the patch from "Nick did something like this" and what I thought would
> be the way to do it on x86).

With powerpc with the radix MMU using IPI based shootdowns, we can 
actually do the switch-away-from-lazy on the final TLB flush and the
final broadcast shootdown thing becomes a no-op. I didn't post that
additional patch because it's powerpc-specific and I didn't want to
post more code so widely.

> So I guess I have to recant my arguments.
> 
> I still think my "get rid of lazy at last mmput" model should work,
> and would be a perfect match for x86, but I can't really point to Nick
> having done that.
> 
> So I was full of BS.
> 
> Hmm. I'd love to try to actually create a patch that does that "Nick
> thing", but on last mmput() (ie when __mmput triggers). Because I
> think this is interesting. But then I look at my schedule for the
> upcoming week, and I go "I don't have a leg to stand on in this
> discussion, and I'm just all hot air".

I agree Andy's approach is very complicated and adds more overhead than
necessary for powerpc, which is why I don't want to use it. I'm still
not entirely sure what the big problem would be in converting x86 to use
it; I admit I haven't kept up with the exact details of its lazy tlb
mm handling recently, though.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-10  4:56                   ` Nicholas Piggin
@ 2022-01-10  5:17                     ` Nicholas Piggin
  2022-01-10 17:19                       ` Linus Torvalds
  2022-01-10 20:52                     ` Andy Lutomirski
  1 sibling, 1 reply; 79+ messages in thread
From: Nicholas Piggin @ 2022-01-10  5:17 UTC (permalink / raw)
  To: Andy Lutomirski, Linus Torvalds
  Cc: Andrew Morton, Anton Blanchard, Benjamin Herrenschmidt,
	Catalin Marinas, Dave Hansen, linux-arch, Linux-MM,
	Mathieu Desnoyers, Nadav Amit, Paul Mackerras,
	Peter Zijlstra (Intel),
	Randy Dunlap, Rik van Riel, Will Deacon,
	the arch/x86 maintainers

Excerpts from Nicholas Piggin's message of January 10, 2022 2:56 pm:
> Excerpts from Linus Torvalds's message of January 10, 2022 7:51 am:
>> [ Ugh, I actually went back and looked at Nick's patches again, to
>> just verify my memory, and they weren't as pretty as I thought they
>> were ]
>> 
>> On Sun, Jan 9, 2022 at 12:48 PM Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>>
>>> I'd much rather have a *much* smaller patch that says "on x86 and
>>> powerpc, we don't need this overhead at all".
>> 
>> For some reason I thought Nick's patch worked at "last mmput" time and
>> the TLB flush IPIs that happen at that point anyway would then make
>> sure any lazy TLB is cleaned up.
>> 
>> But that's not actually what it does. It ties the
>> MMU_LAZY_TLB_REFCOUNT to an explicit TLB shootdown triggered by the
>> last mmdrop() instead. Because it really tied the whole logic to the
>> mm_count logic (and made lazy tlb to not do mm_count) rather than the
>> mm_users thing I mis-remembered it doing.
> 
> It does this because on powerpc with hash MMU, we can't use IPIs for
> TLB shootdowns.
> 
>> So at least some of my arguments were based on me just mis-remembering
>> what Nick's patch actually did (mainly because I mentally recreated
>> the patch from "Nick did something like this" and what I thought would
>> be the way to do it on x86).
> 
> With powerpc with the radix MMU using IPI based shootdowns, we can 
> actually do the switch-away-from-lazy on the final TLB flush and the
> final broadcast shootdown thing becomes a no-op. I didn't post that
> additional patch because it's powerpc-specific and I didn't want to
> post more code so widely.

This is the patch that goes on top of the series I posted. It's not
very clean at the moment; it was just a proof of concept. I wrote it
a year ago by the looks of it, so no guarantees. But it exits all other
lazies as part of the final exit_mmap TLB flush so there should not be
additional IPIs at drop-time. Possibly you could get preempted and
moved CPUs between them but the point is the vast majority of the time
you won't require more IPIs.

Well, with powerpc it's not _quite_ that simple: it is possible we
could use the broadcast TLBIE instruction rather than IPIs for this. In
practice I think that's not _so much_ faster that the IPIs are a
problem, and on highly threaded jobs where you might have hundreds of
other CPUs in the mask, you'd rather avoid the cacheline bouncing in
context switch anyway.

Thanks,
Nick

From 1f7fd5de284fab6b94bf49f55ce691ae22538473 Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin@gmail.com>
Date: Tue, 23 Feb 2021 14:11:21 +1000
Subject: [PATCH] powerpc/64s/radix: TLB flush optimization for lazy mm
 shootdown refcounting

XXX: could also clear lazy at exit, perhaps?
XXX: doesn't really matter AFAIKS because it will soon go away with mmput
XXX: must audit exit flushes for nest MMU
---
 arch/powerpc/mm/book3s64/radix_tlb.c | 45 ++++++++++++++--------------
 kernel/fork.c                        | 16 ++++++++--
 2 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 59156c2d2ebe..b64cd0d22b8b 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -723,7 +723,7 @@ void radix__local_flush_all_mm(struct mm_struct *mm)
 }
 EXPORT_SYMBOL(radix__local_flush_all_mm);
 
-static void __flush_all_mm(struct mm_struct *mm, bool fullmm)
+static void radix__flush_all_mm(struct mm_struct *mm, bool fullmm)
 {
 	radix__local_flush_all_mm(mm);
 }
@@ -862,7 +862,7 @@ enum tlb_flush_type {
 	FLUSH_TYPE_GLOBAL,
 };
 
-static enum tlb_flush_type flush_type_needed(struct mm_struct *mm, bool fullmm)
+static enum tlb_flush_type flush_type_needed(struct mm_struct *mm)
 {
 	int active_cpus = atomic_read(&mm->context.active_cpus);
 	int cpu = smp_processor_id();
@@ -888,14 +888,6 @@ static enum tlb_flush_type flush_type_needed(struct mm_struct *mm, bool fullmm)
 	if (atomic_read(&mm->context.copros) > 0)
 		return FLUSH_TYPE_GLOBAL;
 
-	/*
-	 * In the fullmm case there's no point doing the exit_flush_lazy_tlbs
-	 * because the mm is being taken down anyway, and a TLBIE tends to
-	 * be faster than an IPI+TLBIEL.
-	 */
-	if (fullmm)
-		return FLUSH_TYPE_GLOBAL;
-
 	/*
 	 * If we are running the only thread of a single-threaded process,
 	 * then we should almost always be able to trim off the rest of the
@@ -947,7 +939,7 @@ void radix__flush_tlb_mm(struct mm_struct *mm)
 	 * switch_mm_irqs_off
 	 */
 	smp_mb();
-	type = flush_type_needed(mm, false);
+	type = flush_type_needed(mm);
 	if (type == FLUSH_TYPE_LOCAL) {
 		_tlbiel_pid(pid, RIC_FLUSH_TLB);
 	} else if (type == FLUSH_TYPE_GLOBAL) {
@@ -971,7 +963,7 @@ void radix__flush_tlb_mm(struct mm_struct *mm)
 }
 EXPORT_SYMBOL(radix__flush_tlb_mm);
 
-static void __flush_all_mm(struct mm_struct *mm, bool fullmm)
+void radix__flush_all_mm(struct mm_struct *mm)
 {
 	unsigned long pid;
 	enum tlb_flush_type type;
@@ -982,7 +974,7 @@ static void __flush_all_mm(struct mm_struct *mm, bool fullmm)
 
 	preempt_disable();
 	smp_mb(); /* see radix__flush_tlb_mm */
-	type = flush_type_needed(mm, fullmm);
+	type = flush_type_needed(mm);
 	if (type == FLUSH_TYPE_LOCAL) {
 		_tlbiel_pid(pid, RIC_FLUSH_ALL);
 	} else if (type == FLUSH_TYPE_GLOBAL) {
@@ -1002,11 +994,6 @@ static void __flush_all_mm(struct mm_struct *mm, bool fullmm)
 	}
 	preempt_enable();
 }
-
-void radix__flush_all_mm(struct mm_struct *mm)
-{
-	__flush_all_mm(mm, false);
-}
 EXPORT_SYMBOL(radix__flush_all_mm);
 
 void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
@@ -1021,7 +1008,7 @@ void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
 
 	preempt_disable();
 	smp_mb(); /* see radix__flush_tlb_mm */
-	type = flush_type_needed(mm, false);
+	type = flush_type_needed(mm);
 	if (type == FLUSH_TYPE_LOCAL) {
 		_tlbiel_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
 	} else if (type == FLUSH_TYPE_GLOBAL) {
@@ -1127,7 +1114,7 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
 
 	preempt_disable();
 	smp_mb(); /* see radix__flush_tlb_mm */
-	type = flush_type_needed(mm, fullmm);
+	type = flush_type_needed(mm);
 	if (type == FLUSH_TYPE_NONE)
 		goto out;
 
@@ -1295,7 +1282,18 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 	 * See the comment for radix in arch_exit_mmap().
 	 */
 	if (tlb->fullmm || tlb->need_flush_all) {
-		__flush_all_mm(mm, true);
+		/*
+		 * Shootdown based lazy tlb mm refcounting means we have to
+		 * IPI everyone in the mm_cpumask anyway soon when the mm goes
+		 * away, so might as well do it as part of the final flush now.
+		 *
+		 * If lazy shootdown was improved to reduce IPIs (e.g., by
+		 * batching), then it may end up being better to use tlbies
+		 * here instead.
+		 */
+		smp_mb(); /* see radix__flush_tlb_mm */
+		exit_flush_lazy_tlbs(mm);
+		_tlbiel_pid(mm->context.id, RIC_FLUSH_ALL);
 	} else if ( (psize = radix_get_mmu_psize(page_size)) == -1) {
 		if (!tlb->freed_tables)
 			radix__flush_tlb_mm(mm);
@@ -1326,10 +1324,11 @@ static void __radix__flush_tlb_range_psize(struct mm_struct *mm,
 		return;
 
 	fullmm = (end == TLB_FLUSH_ALL);
+	WARN_ON_ONCE(fullmm); /* XXX: this shouldn't get fullmm? */
 
 	preempt_disable();
 	smp_mb(); /* see radix__flush_tlb_mm */
-	type = flush_type_needed(mm, fullmm);
+	type = flush_type_needed(mm);
 	if (type == FLUSH_TYPE_NONE)
 		goto out;
 
@@ -1412,7 +1411,7 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 	/* Otherwise first do the PWC, then iterate the pages. */
 	preempt_disable();
 	smp_mb(); /* see radix__flush_tlb_mm */
-	type = flush_type_needed(mm, false);
+	type = flush_type_needed(mm);
 	if (type == FLUSH_TYPE_LOCAL) {
 		_tlbiel_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true);
 	} else if (type == FLUSH_TYPE_GLOBAL) {

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH 06/23] powerpc/membarrier: Remove special barrier on mm switch
  2022-01-08 16:43   ` Andy Lutomirski
@ 2022-01-10  8:42     ` Christophe Leroy
  0 siblings, 0 replies; 79+ messages in thread
From: Christophe Leroy @ 2022-01-10  8:42 UTC (permalink / raw)
  To: Andy Lutomirski, Andrew Morton, Linux-MM
  Cc: linux-arch, x86, Rik van Riel, Peter Zijlstra, Randy Dunlap,
	linuxppc-dev, Nicholas Piggin, Dave Hansen, Mathieu Desnoyers,
	Paul Mackerras, Nadav Amit



Le 08/01/2022 à 17:43, Andy Lutomirski a écrit :
> powerpc did the following on some, but not all, paths through
> switch_mm_irqs_off():
> 
>         /*
>          * Only need the full barrier when switching between processes.
>          * Barrier when switching from kernel to userspace is not
>          * required here, given that it is implied by mmdrop(). Barrier
>          * when switching from userspace to kernel is not needed after
>          * store to rq->curr.
>          */
>         if (likely(!(atomic_read(&next->membarrier_state) &
>                      (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
>                       MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
>                 return;
> 
> This is puzzling: if !prev, then one might expect that we are switching
> from kernel to user, not user to kernel, which is inconsistent with the
> comment.  But this is all nonsense, because the one and only caller would
> never have prev == NULL and would, in fact, OOPS if prev == NULL.
> 
> In any event, this code is unnecessary, since the new generic
> membarrier_finish_switch_mm() provides the same barrier without arch help.

I can't find this function membarrier_finish_switch_mm(), neither in 
Linus tree, nor in linux-next tree.

> 
> arch/powerpc/include/asm/membarrier.h remains as an empty header,
> because a later patch in this series will add code to it.
> 
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>   arch/powerpc/include/asm/membarrier.h | 24 ------------------------
>   arch/powerpc/mm/mmu_context.c         |  1 -
>   2 files changed, 25 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h
> index de7f79157918..b90766e95bd1 100644
> --- a/arch/powerpc/include/asm/membarrier.h
> +++ b/arch/powerpc/include/asm/membarrier.h
> @@ -1,28 +1,4 @@
>   #ifndef _ASM_POWERPC_MEMBARRIER_H
>   #define _ASM_POWERPC_MEMBARRIER_H
>   
> -static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
> -					     struct mm_struct *next,
> -					     struct task_struct *tsk)
> -{
> -	/*
> -	 * Only need the full barrier when switching between processes.
> -	 * Barrier when switching from kernel to userspace is not
> -	 * required here, given that it is implied by mmdrop(). Barrier
> -	 * when switching from userspace to kernel is not needed after
> -	 * store to rq->curr.
> -	 */
> -	if (IS_ENABLED(CONFIG_SMP) &&
> -	    likely(!(atomic_read(&next->membarrier_state) &
> -		     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
> -		      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
> -		return;
> -
> -	/*
> -	 * The membarrier system call requires a full memory barrier
> -	 * after storing to rq->curr, before going back to user-space.
> -	 */
> -	smp_mb();
> -}
> -
>   #endif /* _ASM_POWERPC_MEMBARRIER_H */
> diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
> index 74246536b832..5f2daa6b0497 100644
> --- a/arch/powerpc/mm/mmu_context.c
> +++ b/arch/powerpc/mm/mmu_context.c
> @@ -84,7 +84,6 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>   		asm volatile ("dssall");
>   
>   	if (!new_on_cpu)
> -		membarrier_arch_switch_mm(prev, next, tsk);

Are you sure that's what you want?

It now means you have:

	if (!new_on_cpu)
	switch_mmu_context(prev, next, tsk);


>   
>   	/*
>   	 * The actual HW switching method differs between the various

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 19/23] x86/efi: Make efi_enter/leave_mm use the temporary_mm machinery
  2022-01-08 16:44 ` [PATCH 19/23] x86/efi: Make efi_enter/leave_mm use the temporary_mm machinery Andy Lutomirski
@ 2022-01-10 13:13   ` Ard Biesheuvel
  0 siblings, 0 replies; 79+ messages in thread
From: Ard Biesheuvel @ 2022-01-10 13:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Morton, Linux-MM, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	x86, Rik van Riel, Dave Hansen, Peter Zijlstra, Nadav Amit,
	Mathieu Desnoyers

On Sat, 8 Jan 2022 at 17:44, Andy Lutomirski <luto@kernel.org> wrote:
>
> This should be considerably more robust.  It's also necessary for optimized
> for_each_possible_lazymm_cpu() on x86 -- without this patch, EFI calls in
> lazy context would remove the lazy mm from mm_cpumask().
>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>

Acked-by: Ard Biesheuvel <ardb@kernel.org>

> ---
>  arch/x86/platform/efi/efi_64.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index 7515e78ef898..b9a571904363 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -54,7 +54,7 @@
>   * 0xffff_ffff_0000_0000 and limit EFI VA mapping space to 64G.
>   */
>  static u64 efi_va = EFI_VA_START;
> -static struct mm_struct *efi_prev_mm;
> +static temp_mm_state_t efi_temp_mm_state;
>
>  /*
>   * We need our own copy of the higher levels of the page tables
> @@ -461,15 +461,12 @@ void __init efi_dump_pagetable(void)
>   */
>  void efi_enter_mm(void)
>  {
> -       efi_prev_mm = current->active_mm;
> -       current->active_mm = &efi_mm;
> -       switch_mm(efi_prev_mm, &efi_mm, NULL);
> +       efi_temp_mm_state = use_temporary_mm(&efi_mm);
>  }
>
>  void efi_leave_mm(void)
>  {
> -       current->active_mm = efi_prev_mm;
> -       switch_mm(&efi_mm, efi_prev_mm, NULL);
> +       unuse_temporary_mm(efi_temp_mm_state);
>  }
>
>  static DEFINE_SPINLOCK(efi_runtime_lock);
> --
> 2.33.1
>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-10  5:17                     ` Nicholas Piggin
@ 2022-01-10 17:19                       ` Linus Torvalds
  2022-01-11  2:24                         ` Nicholas Piggin
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2022-01-10 17:19 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Andy Lutomirski, Andrew Morton, Anton Blanchard,
	Benjamin Herrenschmidt, Catalin Marinas, Dave Hansen, linux-arch,
	Linux-MM, Mathieu Desnoyers, Nadav Amit, Paul Mackerras,
	Peter Zijlstra (Intel),
	Randy Dunlap, Rik van Riel, Will Deacon,
	the arch/x86 maintainers

On Sun, Jan 9, 2022 at 9:18 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> This is the patch that goes on top of the series I posted. It's not
> very clean at the moment it was just a proof of concept.

Yeah, this looks like what x86 basically already effectively does.

x86 obviously doesn't have that TLBIE option, and already has that
"exit lazy mode" logic (although it does so differently, using
switch_mm_irqs_off(), and guards it with the 'info->freed_tables'
check).

But there are so many different possible ways to flush TLB's (the
whole "paravirt vs native") that it would still require some
double-checking that there isn't some case that does it differently..

               Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-10  4:56                   ` Nicholas Piggin
  2022-01-10  5:17                     ` Nicholas Piggin
@ 2022-01-10 20:52                     ` Andy Lutomirski
  2022-01-11  3:10                       ` Nicholas Piggin
  1 sibling, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-10 20:52 UTC (permalink / raw)
  To: Nicholas Piggin, Linus Torvalds
  Cc: Andrew Morton, Anton Blanchard, Benjamin Herrenschmidt,
	Catalin Marinas, Dave Hansen, linux-arch, Linux-MM,
	Mathieu Desnoyers, Nadav Amit, Paul Mackerras,
	Peter Zijlstra (Intel),
	Randy Dunlap, Rik van Riel, Will Deacon,
	the arch/x86 maintainers



On Sun, Jan 9, 2022, at 8:56 PM, Nicholas Piggin wrote:
> Excerpts from Linus Torvalds's message of January 10, 2022 7:51 am:
>> [ Ugh, I actually went back and looked at Nick's patches again, to
>> just verify my memory, and they weren't as pretty as I thought they
>> were ]
>> 
>> On Sun, Jan 9, 2022 at 12:48 PM Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>>
>>> I'd much rather have a *much* smaller patch that says "on x86 and
>>> powerpc, we don't need this overhead at all".
>> 
>> For some reason I thought Nick's patch worked at "last mmput" time and
>> the TLB flush IPIs that happen at that point anyway would then make
>> sure any lazy TLB is cleaned up.
>> 
>> But that's not actually what it does. It ties the
>> MMU_LAZY_TLB_REFCOUNT to an explicit TLB shootdown triggered by the
>> last mmdrop() instead. Because it really tied the whole logic to the
>> mm_count logic (and made lazy tlb to not do mm_count) rather than the
>> mm_users thing I mis-remembered it doing.
>
> It does this because on powerpc with hash MMU, we can't use IPIs for
> TLB shootdowns.

I know nothing about powerpc’s mmu. If you can’t do IPI shootdowns, it sounds like the hazard pointer scheme might actually be pretty good.

>
>> So at least some of my arguments were based on me just mis-remembering
>> what Nick's patch actually did (mainly because I mentally recreated
>> the patch from "Nick did something like this" and what I thought would
>> be the way to do it on x86).
>
> With powerpc with the radix MMU using IPI based shootdowns, we can 
> actually do the switch-away-from-lazy on the final TLB flush and the
> final broadcast shootdown thing becomes a no-op. I didn't post that
> additional patch because it's powerpc-specific and I didn't want to
> post more code so widely.
>
>> So I guess I have to recant my arguments.
>> 
>> I still think my "get rid of lazy at last mmput" model should work,
>> and would be a perfect match for x86, but I can't really point to Nick
>> having done that.
>> 
>> So I was full of BS.
>> 
>> Hmm. I'd love to try to actually create a patch that does that "Nick
>> thing", but on last mmput() (ie when __mmput triggers). Because I
>> think this is interesting. But then I look at my schedule for the
>> upcoming week, and I go "I don't have a leg to stand on in this
>> discussion, and I'm just all hot air".
>
> I agree Andy's approach is very complicated and adds more overhead than
> necessary for powerpc, which is why I don't want to use it. I'm still
> not entirely sure what the big problem would be to convert x86 to use
> it, I admit I haven't kept up with the exact details of its lazy tlb
> mm handling recently though.

The big problem is the entire remainder of this series!  If x86 is going to do shootdowns without mm_count, I want the result to work and be maintainable. A few of the issues that needed solving:

- x86 tracks usage of the lazy mm on CPUs that have it loaded into the MMU, not CPUs that have it in active_mm.  Getting this in sync needed core changes.

- mmgrab and mmdrop are barriers, and core code relies on that. If we get rid of a bunch of calls (conditionally), we need to stop depending on the barriers. I fixed this.

- There were too many mmgrab and mmdrop calls, and the call sites had different semantics and different refcounting rules (thanks, kthread).  I cleaned this up.

- If we do a shootdown instead of a refcount, then, when exit() tears down its mm, we are lazily using *that* mm when we do the shootdowns. If active_mm continues to point to the being-freed mm and an NMI tries to dereference it, we’re toast. I fixed those issues.

- If we do a UEFI runtime service call while lazy or a text_poke while lazy and the mm goes away while this is happening, we would blow up. Refcounting prevents this but, in current kernels, a shootdown IPI on x86 would not prevent this.  I fixed these issues (and removed duplicate code).

My point here is that the current lazy mm code is a huge mess. 90% of the complexity in this series is cleaning up core messiness and x86 messiness. I would still like to get rid of ->active_mm entirely (it appears to serve no good purpose on any architecture), but that can be saved for later, I think.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 11/23] sched/scs: Initialize shadow stack on idle thread bringup, not shutdown
  2022-01-08 16:43 ` [PATCH 11/23] sched/scs: Initialize shadow stack on idle thread bringup, not shutdown Andy Lutomirski
@ 2022-01-10 22:06   ` Sami Tolvanen
  0 siblings, 0 replies; 79+ messages in thread
From: Sami Tolvanen @ 2022-01-10 22:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Morton, Linux-MM, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	X86 ML, Rik van Riel, Dave Hansen, Peter Zijlstra, Nadav Amit,
	Mathieu Desnoyers, Woody Lin, Valentin Schneider, Mark Rutland

Hi Andy,

On Sat, Jan 8, 2022 at 8:44 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> Starting with commit 63acd42c0d49 ("sched/scs: Reset the shadow stack when
> idle_task_exit"), the idle thread's shadow stack was reset from the idle
> task's context during CPU hot-unplug.  This was fragile: between resetting
> the shadow stack and actually stopping the idle task, the shadow stack
> did not match the actual call stack.
>
> Clean this up by resetting the idle task's SCS in bringup_cpu().
>
> init_idle() still does scs_task_reset() -- see the comments there.  I
> leave this to an SCS maintainer to untangle further.
>
> Cc: Woody Lin <woodylin@google.com>
> Cc: Valentin Schneider <valentin.schneider@arm.com>
> Cc: Sami Tolvanen <samitolvanen@google.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  kernel/cpu.c        | 3 +++
>  kernel/sched/core.c | 9 ++++++++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 192e43a87407..be16816bb87c 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -33,6 +33,7 @@
>  #include <linux/slab.h>
>  #include <linux/percpu-rwsem.h>
>  #include <linux/cpuset.h>
> +#include <linux/scs.h>
>
>  #include <trace/events/power.h>
>  #define CREATE_TRACE_POINTS
> @@ -587,6 +588,8 @@ static int bringup_cpu(unsigned int cpu)
>         struct task_struct *idle = idle_thread_get(cpu);
>         int ret;
>
> +       scs_task_reset(idle);
> +
>         /*
>          * Some architectures have to walk the irq descriptors to
>          * setup the vector space for the cpu which comes online.
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 917068b0a145..acd52a7d1349 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8621,7 +8621,15 @@ void __init init_idle(struct task_struct *idle, int cpu)
>         idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
>         kthread_set_per_cpu(idle, cpu);
>
> +       /*
> +        * NB: This is called from sched_init() on the *current* idle thread.
> +        * This seems fragile if not actively incorrect.
> +        *
> +        * Initializing SCS for about-to-be-brought-up CPU idle threads
> +        * is in bringup_cpu(), but that does not cover the boot CPU.
> +        */
>         scs_task_reset(idle);
> +
>         kasan_unpoison_task_stack(idle);
>
>  #ifdef CONFIG_SMP
> @@ -8779,7 +8787,6 @@ void idle_task_exit(void)
>                 finish_arch_post_lock_switch();
>         }
>
> -       scs_task_reset(current);
>         /* finish_cpu(), as ran on the BP, will clean up the active_mm state */
>  }

I believe Mark already fixed this one here:

https://lore.kernel.org/lkml/20211123114047.45918-1-mark.rutland@arm.com/

Sami

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-10 17:19                       ` Linus Torvalds
@ 2022-01-11  2:24                         ` Nicholas Piggin
  0 siblings, 0 replies; 79+ messages in thread
From: Nicholas Piggin @ 2022-01-11  2:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Anton Blanchard, Benjamin Herrenschmidt,
	Catalin Marinas, Dave Hansen, linux-arch, Linux-MM,
	Andy Lutomirski, Mathieu Desnoyers, Nadav Amit, Paul Mackerras,
	Peter Zijlstra (Intel),
	Randy Dunlap, Rik van Riel, Will Deacon,
	the arch/x86 maintainers

Excerpts from Linus Torvalds's message of January 11, 2022 3:19 am:
> On Sun, Jan 9, 2022 at 9:18 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> This is the patch that goes on top of the series I posted. It's not
>> very clean at the moment it was just a proof of concept.
> 
> Yeah, this looks like what x86 basically already effectively does.
> 
> x86 obviously doesn't have that TLBIE option, and already has that
> "exit lazy mode" logic (although it does so differently, using
> switch_mm_irqs_off(), and guards it with the 'info->freed_tables'
> check).
> 
> But there are so many different possible ways to flush TLB's (the
> whole "paravirt vs native") that it would still require some
> double-checking that there isn't some case that does it differently..

Oh yeah x86 needs a little porting to be able to use this for sure,
but there's no reason it couldn't do it.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-10 20:52                     ` Andy Lutomirski
@ 2022-01-11  3:10                       ` Nicholas Piggin
  2022-01-11 15:39                         ` Andy Lutomirski
  0 siblings, 1 reply; 79+ messages in thread
From: Nicholas Piggin @ 2022-01-11  3:10 UTC (permalink / raw)
  To: Andy Lutomirski, Linus Torvalds
  Cc: Andrew Morton, Anton Blanchard, Benjamin Herrenschmidt,
	Catalin Marinas, Dave Hansen, linux-arch, Linux-MM,
	Mathieu Desnoyers, Nadav Amit, Paul Mackerras,
	Peter Zijlstra (Intel),
	Randy Dunlap, Rik van Riel, Will Deacon,
	the arch/x86 maintainers

Excerpts from Andy Lutomirski's message of January 11, 2022 6:52 am:
> 
> 
> On Sun, Jan 9, 2022, at 8:56 PM, Nicholas Piggin wrote:
>> Excerpts from Linus Torvalds's message of January 10, 2022 7:51 am:
>>> [ Ugh, I actually went back and looked at Nick's patches again, to
>>> just verify my memory, and they weren't as pretty as I thought they
>>> were ]
>>> 
>>> On Sun, Jan 9, 2022 at 12:48 PM Linus Torvalds
>>> <torvalds@linux-foundation.org> wrote:
>>>>
>>>> I'd much rather have a *much* smaller patch that says "on x86 and
>>>> powerpc, we don't need this overhead at all".
>>> 
>>> For some reason I thought Nick's patch worked at "last mmput" time and
>>> the TLB flush IPIs that happen at that point anyway would then make
>>> sure any lazy TLB is cleaned up.
>>> 
>>> But that's not actually what it does. It ties the
>>> MMU_LAZY_TLB_REFCOUNT to an explicit TLB shootdown triggered by the
>>> last mmdrop() instead. Because it really tied the whole logic to the
>>> mm_count logic (and made lazy tlb to not do mm_count) rather than the
>>> mm_users thing I mis-remembered it doing.
>>
>> It does this because on powerpc with hash MMU, we can't use IPIs for
>> TLB shootdowns.
> 
> I know nothing about powerpc’s mmu. If you can’t do IPI shootdowns,

The paravirtualised hash MMU environment can't do IPI shootdowns because
it has a single level of translation: the guest uses hypercalls to insert
and remove translations, and the hypervisor flushes TLBs. The HV could
flush TLBs with IPIs, but obviously the guest can't use those to execute
the lazy switch. In radix guests (and on all bare metal) the OS flushes
its own TLBs.

We are moving over to radix, but powerpc also has a hardware broadcast 
flush instruction which can be a bit faster than IPIs and is usable by 
bare metal and radix guests, so those can also avoid the IPIs if they 
want. Part of the powerpc patch I just sent to combine the lazy switch 
with the final TLB flush is to force it to always take the IPI path and 
not use the TLBIE instruction on the final exit.

So hazard pointers could avoid some IPIs there too.

> it sounds like the hazard pointer scheme might actually be pretty good.

Some IPIs in the exit path just aren't that big a concern. I measured,
got numbers, tried to irritate it; it just wasn't really a problem. Some
archs use IPIs for all threaded TLB shootdowns anyway, and exits aren't
that frequent. Very fast, short-lived processes that do a lot of exits
just don't tend to spread across a lot of CPUs leaving lazy tlb mms to
shoot, and the long-lived, multi-threaded ones that do spread don't exit
at high rates.

So from what I can see it's premature optimization. Actually maybe not
even an optimization, because IIRC it adds complexity and even a barrier
on powerpc in the context switch path, which for us is a lot more critical
than exit(); we don't want slowdowns there.

It's a pretty high-complexity, boutique kind of synchronization. Which,
don't get me wrong, is the kind of thing I like; it is clever and may be
perfectly bug free, but it needs to prove itself over the simple, dumb
shoot-lazies approach.

>>> So at least some of my arguments were based on me just mis-remembering
>>> what Nick's patch actually did (mainly because I mentally recreated
>>> the patch from "Nick did something like this" and what I thought would
>>> be the way to do it on x86).
>>
>> With powerpc with the radix MMU using IPI based shootdowns, we can 
>> actually do the switch-away-from-lazy on the final TLB flush and the
>> final broadcast shootdown thing becomes a no-op. I didn't post that
>> additional patch because it's powerpc-specific and I didn't want to
>> post more code so widely.
>>
>>> So I guess I have to recant my arguments.
>>> 
>>> I still think my "get rid of lazy at last mmput" model should work,
>>> and would be a perfect match for x86, but I can't really point to Nick
>>> having done that.
>>> 
>>> So I was full of BS.
>>> 
>>> Hmm. I'd love to try to actually create a patch that does that "Nick
>>> thing", but on last mmput() (ie when __mmput triggers). Because I
>>> think this is interesting. But then I look at my schedule for the
>>> upcoming week, and I go "I don't have a leg to stand on in this
>>> discussion, and I'm just all hot air".
>>
>> I agree Andy's approach is very complicated and adds more overhead than
>> necessary for powerpc, which is why I don't want to use it. I'm still
>> not entirely sure what the big problem would be to convert x86 to use
>> it, I admit I haven't kept up with the exact details of its lazy tlb
>> mm handling recently though.
> 
> The big problem is the entire remainder of this series!  If x86 is going to do shootdowns without mm_count, I want the result to work and be maintainable. A few of the issues that needed solving:
> 
> - x86 tracks usage of the lazy mm on CPUs that have it loaded into the MMU, not CPUs that have it in active_mm.  Getting this in sync needed core changes.

Definitely should have been done at the time x86 deviated, but better 
late than never.

> 
> - mmgrab and mmdrop are barriers, and core code relies on that. If we get rid of a bunch of calls (conditionally), we need to stop depending on the barriers. I fixed this.

membarrier relied on a barrier that mmdrop was providing. Adding an smp_mb()
instead, if mmdrop becomes a no-op, is fine. Patches changing membarrier's 
ordering requirements can go in concurrently and are not fundamentally tied
to lazy tlb mm switching; it just reuses an existing ordering point.

> - There were too many mmgrab and mmdrop calls, and the call sites had different semantics and different refcounting rules (thanks, kthread).  I cleaned this up.

Seems like a decent cleanup. Again, nothing lazy tlb specific: just general
switch code that should be factored and better contained in kernel/sched/,
which is fine, but it can go in concurrently with the lazy tlb improvements.

> - If we do a shootdown instead of a refcount, then, when exit() tears down its mm, we are lazily using *that* mm when we do the shootdowns. If active_mm continues to point to the being-freed mm and an NMI tries to dereference it, we’re toast. I fixed those issues.

My shoot lazies patch has no such issues with that AFAIKS. What exact 
issue is it and where did you fix it?

> 
> - If we do a UEFI runtime service call while lazy or a text_poke while lazy and the mm goes away while this is happening, we would blow up. Refcounting prevents this but, in current kernels, a shootdown IPI on x86 would not prevent this.  I fixed these issues (and removed duplicate code).
> 
> My point here is that the current lazy mm code is a huge mess. 90% of the complexity in this series is cleaning up core messiness and x86 messiness. I would still like to get rid of ->active_mm entirely (it appears to serve no good purpose on any architecture),  it that can be saved for later, I think.

I disagree, the lazy mm code is very clean and simple. And I can't see 
how you would propose to remove active_mm from core code; I'm skeptical,
but would be very interested to see it. Either way, that's nothing to do
with my shoot lazies patch and can also be done concurrently, except for
mechanical merge issues.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-09 20:48               ` Linus Torvalds
  2022-01-09 21:51                 ` Linus Torvalds
@ 2022-01-11 10:39                 ` Will Deacon
  2022-01-11 15:22                   ` Andy Lutomirski
  1 sibling, 1 reply; 79+ messages in thread
From: Will Deacon @ 2022-01-11 10:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Catalin Marinas, Andrew Morton, Linux-MM,
	Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
	Paul Mackerras, Randy Dunlap, linux-arch,
	the arch/x86 maintainers, Rik van Riel, Dave Hansen,
	Peter Zijlstra (Intel),
	Nadav Amit, Mathieu Desnoyers

Hi Andy, Linus,

On Sun, Jan 09, 2022 at 12:48:42PM -0800, Linus Torvalds wrote:
> On Sun, Jan 9, 2022 at 12:20 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > Are you *sure*? The ASID management code on x86 is (as mentioned
> > before) completely unaware of whether an ASID is actually in use
> > anywhere.
> 
> Right.
> 
> But the ASID situation on x86 is very very different, exactly because
> x86 doesn't have cross-CPU TLB invalidates.
> 
> Put another way: x86 TLB hardware is fundamentally per-cpu. As such,
> any ASID management is also per-cpu.
> 
> That's fundamentally not true on arm64.  And that's not some "arm64
> implementation detail". That's fundamental to doing cross-CPU TLB
> invalidates in hardware.
> 
> If your TLB invalidates act across CPU's, then the state they act on
> are also obviously across CPU's.
> 
> So the ASID situation is fundamentally different depending on the
> hardware usage. On x86, TLB's are per-core, and on arm64 they are not,
> and that's reflected in our code too.
> 
> As a result, on x86, each mm has a per-cpu ASID, and there's a small
> array of per-cpu "mm->asid" mappings.
> 
> On arm, each mm has an asid, and it's allocated from a global asid
> space - so there is no need for that "mm->asid" mapping, because the
> asid is there in the mm, and it's shared across cpus.
> 
> That said, I still don't actually know the arm64 ASID management code.

That appears to be a common theme in this thread, so hopefully I can shed
some light on the arm64 side of things:

The CPU supports either 8-bit or 16-bit ASIDs and we require that we don't
have more CPUs than we can represent in the ASID space (well, we WARN in
this case but it's likely to go badly wrong). We reserve ASID 0 for things
like the idmap, so as far as the allocator is concerned ASID 0 is "invalid"
and we rely on this.

As Linus says above, the ASID is per-'mm' and we require that all threads
of an 'mm' use the same ASID at the same time, otherwise the hardware TLB
broadcasting isn't going to work properly because the invalidations are
typically tagged by ASID.

As Andy points out later, this means that we have to reuse ASIDs for
different 'mm's once we have enough of them. We do this using a 64-bit
context ID in mm_context_t, where the lower bits are the ASID for the 'mm'
and the upper bits are a generation count. The ASID allocator keeps an
internal generation count which is incremented whenever we fail to allocate
an ASID and are forced to invalidate them all and start re-allocating. We
assume that the generation count doesn't overflow.

When switching to an 'mm', we check if the generation count held in the
'mm' is behind the allocator's current generation count. If it is, then
we know that the 'mm' needs to be allocated a new ASID. Allocation is
performed with a spinlock held and basically involves setting a new bit
in the bitmap and updating the 'mm' with the new ASID and current
generation. We don't reclaim ASIDs greedily on 'mm' teardown -- this was
pretty slow when I looked at it in the past.

So far so good, but it gets more complicated when we look at the details of
the overflow handling. Overflow is always detected on the allocation path
with the spinlock held but other CPUs could happily be running other code
(inc. user code) at this point. Therefore, we can't simply invalidate the
TLBs, clear the bitmap and start re-allocating ASIDs because we could end up
with an ASID shared between two running 'mm's, leading to both invalidation
interference but also the potential to hit stale TLB entries allocated after
the invalidation on rollover. We handle this with a couple of per-cpu
variables, 'active_asids' and 'reserved_asids'.

'active_asids' is set to the current ASID in switch_mm() just before
writing the actual TTBR register. On a rollover, the CPU holding the lock
goes through each CPU's 'active_asids' entry, atomic xchg()s it to 0 and
writes the result into the corresponding 'reserved_asids' entry. These
'reserved_asids' are then immediately marked as allocated and a flag is
set for each CPU to indicate that its TLBs are dirty. This allows the
CPU handling the rollover to continue with its allocation without stopping
the world and without broadcasting TLB invalidation; other CPUs will
hit a generation mismatch on their next switch_mm(), notice that they are
running a reserved ASID from an older generation, upgrade the generation
(i.e. keep the same ASID) and then invalidate their local TLB.

So we do have some tracking of which ASIDs are where, but we can't generally
say "is this ASID dirty in the TLBs of this CPU". That also gets more
complicated on some systems where a TLB can be shared between some of the
CPUs (I haven't covered that case above, since I think that this is enough
detail already.)
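
To make the switch-time check above a bit more concrete, here is a much
simplified C-like sketch of it. The names and structure are illustrative
only; the real thing lives in arch/arm64/mm/context.c, has a lockless
fast path and handles details omitted here. asid_gen() and new_context()
stand in for the real helpers:

static u64 asid_generation;		/* allocator's current generation */
static DEFINE_RAW_SPINLOCK(asid_lock);
static DEFINE_PER_CPU(u64, active_asids);
static DEFINE_PER_CPU(bool, tlb_flush_pending);

void check_and_switch_context(struct mm_struct *mm)
{
	u64 ctx = atomic64_read(&mm->context.id);	/* generation | ASID */

	if (asid_gen(ctx) != READ_ONCE(asid_generation)) {
		raw_spin_lock(&asid_lock);
		/*
		 * new_context() may roll over: it xchg()s every CPU's
		 * active_asids into reserved_asids, bumps the generation
		 * and marks all TLBs dirty.
		 */
		ctx = new_context(mm);
		atomic64_set(&mm->context.id, ctx);
		raw_spin_unlock(&asid_lock);
	}

	if (this_cpu_xchg(tlb_flush_pending, false))
		local_flush_tlb_all();		/* first switch after rollover */

	this_cpu_write(active_asids, ctx);
	cpu_switch_mm(mm->pgd, mm);		/* TTBR write tagged with the ASID */
}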

FWIW, we have a TLA+ model of some of this, which may (or may not) be easier
to follow than my text:

https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/kernel-tla.git/tree/asidalloc.tla

although the syntax is pretty hard going :(

> The thing about TLB flushes is that it's ok to do them spuriously (as
> long as you don't do _too_ many of them and tank performance), so two
> different mm's can have the same hw ASID on two different cores and
> that just makes cross-CPU TLB invalidates too aggressive. You can't
> share an ASID on the _same_ core without flushing in between context
> switches, because then the TLB on that core might be re-used for a
> different mm. So the flushing rules aren't necessarily 100% 1:1 with
> the "in use" rules, and who knows if the arm64 ASID management
> actually ends up just matching what that whole "this lazy TLB is still
> in use on another CPU".

The shared TLBs (Arm calls this "Common-not-private") make this problematic,
as the TLB is no longer necessarily per-core.

> So I don't really know the arm64 situation. And i's possible that lazy
> TLB isn't even worth it on arm64 in the first place.

ASID allocation aside, I think there are a few useful things to point out
for arm64:

	- We only have "local" or "all" TLB invalidation; nothing targetted
	  (and for KVM guests this is always "all").

	- Most mms end up running on more than one CPU (at least, when I
	  last looked at this a fork+exec would end up with the mm having
	  been installed on two CPUs)

	- We don't track mm_cpumask as it showed up as a bottleneck in the
	  past and, because of the earlier points, it wasn't very useful
	  anyway

	- mmgrab() should be fast for us (it's a posted atomic add),
	  although mmdrop() will be slower as it has to return data to
	  check against the count going to zero.
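
	  For reference, this is roughly what those two look like today
	  (simplified from include/linux/sched/mm.h):

	static inline void mmgrab(struct mm_struct *mm)
	{
		atomic_inc(&mm->mm_count);	/* fire-and-forget increment */
	}

	static inline void mmdrop(struct mm_struct *mm)
	{
		/* needs the old value back, so the CPU has to wait for it */
		if (unlikely(atomic_dec_and_test(&mm->mm_count)))
			__mmdrop(mm);
	}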

So it doesn't feel like an obvious win to me for us to scan these new hazard
pointers on arm64. At least, I would love to see some numbers if we're going
to make changes here.

Will

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-11 10:39                 ` Will Deacon
@ 2022-01-11 15:22                   ` Andy Lutomirski
  0 siblings, 0 replies; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-11 15:22 UTC (permalink / raw)
  To: Will Deacon, Linus Torvalds
  Cc: Catalin Marinas, Andrew Morton, Linux-MM, Nicholas Piggin,
	Anton Blanchard, Benjamin Herrenschmidt, Paul Mackerras,
	Randy Dunlap, linux-arch, the arch/x86 maintainers, Rik van Riel,
	Dave Hansen, Peter Zijlstra (Intel),
	Nadav Amit, Mathieu Desnoyers



On Tue, Jan 11, 2022, at 2:39 AM, Will Deacon wrote:
> Hi Andy, Linus,
>
> On Sun, Jan 09, 2022 at 12:48:42PM -0800, Linus Torvalds wrote:
>> On Sun, Jan 9, 2022 at 12:20 PM Andy Lutomirski <luto@kernel.org> wrote:

>> That said, I still don't actually know the arm64 ASID management code.
>
> That appears to be a common theme in this thread, so hopefully I can shed
> some light on the arm64 side of things:
>

Thanks!

>
> FWIW, we have a TLA+ model of some of this, which may (or may not) be easier
> to follow than my text:

Yikes. Your fine hardware engineers should consider 64-bit ASIDs :)

I suppose x86-on-AMD could copy this, but eww.  OTOH x86 can easily have more CPUs than ASIDs, so maybe not.

>
> https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/kernel-tla.git/tree/asidalloc.tla
>
> although the syntax is pretty hard going :(
>
>> The thing about TLB flushes is that it's ok to do them spuriously (as
>> long as you don't do _too_ many of them and tank performance), so two
>> different mm's can have the same hw ASID on two different cores and
>> that just makes cross-CPU TLB invalidates too aggressive. You can't
>> share an ASID on the _same_ core without flushing in between context
>> switches, because then the TLB on that core might be re-used for a
>> different mm. So the flushing rules aren't necessarily 100% 1:1 with
>> the "in use" rules, and who knows if the arm64 ASID management
>> actually ends up just matching what that whole "this lazy TLB is still
>> in use on another CPU".
>
> The shared TLBs (Arm calls this "Common-not-private") make this problematic,
> as the TLB is no longer necessarily per-core.
>
>> So I don't really know the arm64 situation. And i's possible that lazy
>> TLB isn't even worth it on arm64 in the first place.
>
> ASID allocation aside, I think there are a few useful things to point out
> for arm64:
>
> 	- We only have "local" or "all" TLB invalidation; nothing targetted
> 	  (and for KVM guests this is always "all").
>
> 	- Most mms end up running on more than one CPU (at least, when I
> 	  last looked at this a fork+exec would end up with the mm having
> 	  been installed on two CPUs)
>
> 	- We don't track mm_cpumask as it showed up as a bottleneck in the
> 	  past and, because of the earlier points, it wasn't very useful
> 	  anyway
>
> 	- mmgrab() should be fast for us (it's a posted atomic add),
> 	  although mmdrop() will be slower as it has to return data to
> 	  check against the count going to zero.
>
> So it doesn't feel like an obvious win to me for us to scan these new hazard
> pointers on arm64. At least, I would love to see some numbers if we're going
> to make changes here.

I will table the hazard pointer scheme, then, and adjust the series to do shootdowns.

I would guess that once arm64 hits a few hundred CPUs, you'll start finding workloads where mmdrop() at least starts to hurt.  But we can cross that bridge when we get to it.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-11  3:10                       ` Nicholas Piggin
@ 2022-01-11 15:39                         ` Andy Lutomirski
  2022-01-11 22:48                           ` Nicholas Piggin
  0 siblings, 1 reply; 79+ messages in thread
From: Andy Lutomirski @ 2022-01-11 15:39 UTC (permalink / raw)
  To: Nicholas Piggin, Linus Torvalds
  Cc: Andrew Morton, Anton Blanchard, Benjamin Herrenschmidt,
	Catalin Marinas, Dave Hansen, linux-arch, Linux-MM,
	Mathieu Desnoyers, Nadav Amit, Paul Mackerras,
	Peter Zijlstra (Intel),
	Randy Dunlap, Rik van Riel, Will Deacon,
	the arch/x86 maintainers



On Mon, Jan 10, 2022, at 7:10 PM, Nicholas Piggin wrote:
> Excerpts from Andy Lutomirski's message of January 11, 2022 6:52 am:
>> 
>> 
>> On Sun, Jan 9, 2022, at 8:56 PM, Nicholas Piggin wrote:
>>> Excerpts from Linus Torvalds's message of January 10, 2022 7:51 am:
>>>> [ Ugh, I actually went back and looked at Nick's patches again, to
>>>> just verify my memory, and they weren't as pretty as I thought they
>>>> were ]
>>>> 
>>>> On Sun, Jan 9, 2022 at 12:48 PM Linus Torvalds
>>>> <torvalds@linux-foundation.org> wrote:
>>>>>
>>>>> I'd much rather have a *much* smaller patch that says "on x86 and
>>>>> powerpc, we don't need this overhead at all".
>>>> 
>>>> For some reason I thought Nick's patch worked at "last mmput" time and
>>>> the TLB flush IPIs that happen at that point anyway would then make
>>>> sure any lazy TLB is cleaned up.
>>>> 
>>>> But that's not actually what it does. It ties the
>>>> MMU_LAZY_TLB_REFCOUNT to an explicit TLB shootdown triggered by the
>>>> last mmdrop() instead. Because it really tied the whole logic to the
>>>> mm_count logic (and made lazy tlb to not do mm_count) rather than the
>>>> mm_users thing I mis-remembered it doing.
>>>
>>> It does this because on powerpc with hash MMU, we can't use IPIs for
>>> TLB shootdowns.
>> 
>> I know nothing about powerpc’s mmu. If you can’t do IPI shootdowns,
>
> The paravirtualised hash MMU environment doesn't because it has a single 
> level translation and the guest uses hypercalls to insert and remove 
> translations and the hypervisor flushes TLBs. The HV could flush TLBs
> with IPIs but obviously the guest can't use those to execute the lazy
> switch. In radix guests (and all bare metal) the OS flushes its own
> TLBs.
>
> We are moving over to radix, but powerpc also has a hardware broadcast 
> flush instruction which can be a bit faster than IPIs and is usable by 
> bare metal and radix guests so those can also avoid the IPIs if they 
> want. Part of the powerpc patch I just sent to combine the lazy switch 
> with the final TLB flush is to force it to always take the IPI path and 
> not use TLBIE instruction on the final exit.
>
> So hazard points could avoid some IPIs there too.
>
>> it sounds like the hazard pointer scheme might actually be pretty good.
>
> Some IPIs in the exit path just aren't that big a concern. I measured,
> got numbers, tried to irritate it, just wasn't really a problem. Some
> archs use IPIs for all threaded TLB shootdowns and exits not that
> frequent. Very fast short lived processes that do a lot of exits just
> don't tend to spread across a lot of CPUs leaving lazy tlb mms to shoot,
> and long lived and multi threaded ones that do don't exit at high rates.
>
> So from what I can see it's premature optimization. Actually maybe not
> even optimization because IIRC it adds complexity and even a barrier on
> powerpc in the context switch path which is a lot more critical than
> exit() for us we don't want slowdowns there.
>
> It's a pretty high complexity boutique kind of synchronization. Which
> don't get me wrong is the kind of thing I like, it is clever and may be
> perfectly bug free but it needs to prove itself over the simple dumb
> shoot lazies approach.
>
>>>> So at least some of my arguments were based on me just mis-remembering
>>>> what Nick's patch actually did (mainly because I mentally recreated
>>>> the patch from "Nick did something like this" and what I thought would
>>>> be the way to do it on x86).
>>>
>>> With powerpc with the radix MMU using IPI based shootdowns, we can 
>>> actually do the switch-away-from-lazy on the final TLB flush and the
>>> final broadcast shootdown thing becomes a no-op. I didn't post that
>>> additional patch because it's powerpc-specific and I didn't want to
>>> post more code so widely.
>>>
>>>> So I guess I have to recant my arguments.
>>>> 
>>>> I still think my "get rid of lazy at last mmput" model should work,
>>>> and would be a perfect match for x86, but I can't really point to Nick
>>>> having done that.
>>>> 
>>>> So I was full of BS.
>>>> 
>>>> Hmm. I'd love to try to actually create a patch that does that "Nick
>>>> thing", but on last mmput() (ie when __mmput triggers). Because I
>>>> think this is interesting. But then I look at my schedule for the
>>>> upcoming week, and I go "I don't have a leg to stand on in this
>>>> discussion, and I'm just all hot air".
>>>
>>> I agree Andy's approach is very complicated and adds more overhead than
>>> necessary for powerpc, which is why I don't want to use it. I'm still
>>> not entirely sure what the big problem would be to convert x86 to use
>>> it, I admit I haven't kept up with the exact details of its lazy tlb
>>> mm handling recently though.
>> 
>> The big problem is the entire remainder of this series!  If x86 is going to do shootdowns without mm_count, I want the result to work and be maintainable. A few of the issues that needed solving:
>> 
>> - x86 tracks usage of the lazy mm on CPUs that have it loaded into the MMU, not CPUs that have it in active_mm.  Getting this in sync needed core changes.
>
> Definitely should have been done at the time x86 deviated, but better 
> late than never.
>

I suspect that this code may predate there being anything for x86 to deviate from.

>> 
>> - mmgrab and mmdrop are barriers, and core code relies on that. If we get rid of a bunch of calls (conditionally), we need to stop depending on the barriers. I fixed this.
>
> membarrier relied on a call that mmdrop was providing. Adding a smp_mb()
> instead if mmdrop is a no-op is fine. Patches changing membarrier's 
> ordering requirements can be concurrent and are not fundmentally tied
> to lazy tlb mm switching, it just reuses an existing ordering point.

smp_mb() is rather expensive on x86 at least, and I think on powerpc too.  Let's not.

Also, IMO my cleanups here generally make sense and make the code better, so I think we should just go with them.

>
>> - There were too many mmgrab and mmdrop calls, and the call sites had different semantics and different refcounting rules (thanks, kthread).  I cleaned this up.
>
> Seems like a decent cleanup. Again lazy tlb specific, just general switch
> code should be factored and better contained in kernel/sched/ which is
> fine, but concurrent to lazy tlb improvements.

I personally rather dislike the model of:

...
mmgrab_lazy();
...
mmdrop_lazy();
...
mmgrab_lazy();
...
mmdrop_lazy();
...

where the different calls have incompatible logic at the call sites and a magic config option NOPs them all away and adds barriers.  I'm personally a big fan of cleaning up code before making it even more complicated.

>
>> - If we do a shootdown instead of a refcount, then, when exit() tears down its mm, we are lazily using *that* mm when we do the shootdowns. If active_mm continues to point to the being-freed mm and an NMI tries to dereference it, we’re toast. I fixed those issues.
>
> My shoot lazies patch has no such issues with that AFAIKS. What exact 
> issue is it and where did you fix it?

Without my unlazy_mm_irqs_off() (or something similar), x86's very sensible (and very old!) code to unlazy a lazy CPU when flushing leaves active_mm pointing to the old lazy mm.  That's not a problem at all on current kernels, but in a shootdown-lazy world, those active_mm pointers will stick around.  Even with that fixed, without some care, an NMI during the shootdown could dereference ->active_mm at a time when it's not being actively kept alive.

Fixed by unlazy_mm_irqs_off(), the patches that use it, and the patches that make x86 stop inappropriately using ->active_mm.

>
>> 
>> - If we do a UEFI runtime service call while lazy or a text_poke while lazy and the mm goes away while this is happening, we would blow up. Refcounting prevents this but, in current kernels, a shootdown IPI on x86 would not prevent this.  I fixed these issues (and removed duplicate code).
>> 
>> My point here is that the current lazy mm code is a huge mess. 90% of the complexity in this series is cleaning up core messiness and x86 messiness. I would still like to get rid of ->active_mm entirely (it appears to serve no good purpose on any architecture),  it that can be saved for later, I think.
>
> I disagree, the lazy mm code is very clean and simple. And I can't see 
> how you would propose to remove active_mm from core code I'm skeptical
> but would be very interested to see, but that's nothing to do with my
> shoot lazies patch and can also be concurrent except for mechanical
> merge issues.

I think we may just have to agree to disagree.  The more I looked at the lazy code, the more problems I found.  So I fixed them.  That work is done now (as far as I'm aware) except for rebasing and review.

--Andy

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-11 15:39                         ` Andy Lutomirski
@ 2022-01-11 22:48                           ` Nicholas Piggin
  2022-01-12  0:42                             ` Nicholas Piggin
  0 siblings, 1 reply; 79+ messages in thread
From: Nicholas Piggin @ 2022-01-11 22:48 UTC (permalink / raw)
  To: Andy Lutomirski, Linus Torvalds
  Cc: Andrew Morton, Anton Blanchard, Benjamin Herrenschmidt,
	Catalin Marinas, Dave Hansen, linux-arch, Linux-MM,
	Mathieu Desnoyers, Nadav Amit, Paul Mackerras,
	Peter Zijlstra (Intel),
	Randy Dunlap, Rik van Riel, Will Deacon,
	the arch/x86 maintainers

Excerpts from Andy Lutomirski's message of January 12, 2022 1:39 am:
> 
> 
> On Mon, Jan 10, 2022, at 7:10 PM, Nicholas Piggin wrote:
>> Excerpts from Andy Lutomirski's message of January 11, 2022 6:52 am:
>>> 
>>> 
>>> On Sun, Jan 9, 2022, at 8:56 PM, Nicholas Piggin wrote:
>>>> Excerpts from Linus Torvalds's message of January 10, 2022 7:51 am:
>>>>> [ Ugh, I actually went back and looked at Nick's patches again, to
>>>>> just verify my memory, and they weren't as pretty as I thought they
>>>>> were ]
>>>>> 
>>>>> On Sun, Jan 9, 2022 at 12:48 PM Linus Torvalds
>>>>> <torvalds@linux-foundation.org> wrote:
>>>>>>
>>>>>> I'd much rather have a *much* smaller patch that says "on x86 and
>>>>>> powerpc, we don't need this overhead at all".
>>>>> 
>>>>> For some reason I thought Nick's patch worked at "last mmput" time and
>>>>> the TLB flush IPIs that happen at that point anyway would then make
>>>>> sure any lazy TLB is cleaned up.
>>>>> 
>>>>> But that's not actually what it does. It ties the
>>>>> MMU_LAZY_TLB_REFCOUNT to an explicit TLB shootdown triggered by the
>>>>> last mmdrop() instead. Because it really tied the whole logic to the
>>>>> mm_count logic (and made lazy tlb to not do mm_count) rather than the
>>>>> mm_users thing I mis-remembered it doing.
>>>>
>>>> It does this because on powerpc with hash MMU, we can't use IPIs for
>>>> TLB shootdowns.
>>> 
>>> I know nothing about powerpc’s mmu. If you can’t do IPI shootdowns,
>>
>> The paravirtualised hash MMU environment doesn't because it has a single 
>> level translation and the guest uses hypercalls to insert and remove 
>> translations and the hypervisor flushes TLBs. The HV could flush TLBs
>> with IPIs but obviously the guest can't use those to execute the lazy
>> switch. In radix guests (and all bare metal) the OS flushes its own
>> TLBs.
>>
>> We are moving over to radix, but powerpc also has a hardware broadcast 
>> flush instruction which can be a bit faster than IPIs and is usable by 
>> bare metal and radix guests so those can also avoid the IPIs if they 
>> want. Part of the powerpc patch I just sent to combine the lazy switch 
>> with the final TLB flush is to force it to always take the IPI path and 
>> not use TLBIE instruction on the final exit.
>>
>> So hazard points could avoid some IPIs there too.
>>
>>> it sounds like the hazard pointer scheme might actually be pretty good.
>>
>> Some IPIs in the exit path just aren't that big a concern. I measured,
>> got numbers, tried to irritate it, just wasn't really a problem. Some
>> archs use IPIs for all threaded TLB shootdowns and exits not that
>> frequent. Very fast short lived processes that do a lot of exits just
>> don't tend to spread across a lot of CPUs leaving lazy tlb mms to shoot,
>> and long lived and multi threaded ones that do don't exit at high rates.
>>
>> So from what I can see it's premature optimization. Actually maybe not
>> even optimization because IIRC it adds complexity and even a barrier on
>> powerpc in the context switch path which is a lot more critical than
>> exit() for us we don't want slowdowns there.
>>
>> It's a pretty high complexity boutique kind of synchronization. Which
>> don't get me wrong is the kind of thing I like, it is clever and may be
>> perfectly bug free but it needs to prove itself over the simple dumb
>> shoot lazies approach.
>>
>>>>> So at least some of my arguments were based on me just mis-remembering
>>>>> what Nick's patch actually did (mainly because I mentally recreated
>>>>> the patch from "Nick did something like this" and what I thought would
>>>>> be the way to do it on x86).
>>>>
>>>> With powerpc with the radix MMU using IPI based shootdowns, we can 
>>>> actually do the switch-away-from-lazy on the final TLB flush and the
>>>> final broadcast shootdown thing becomes a no-op. I didn't post that
>>>> additional patch because it's powerpc-specific and I didn't want to
>>>> post more code so widely.
>>>>
>>>>> So I guess I have to recant my arguments.
>>>>> 
>>>>> I still think my "get rid of lazy at last mmput" model should work,
>>>>> and would be a perfect match for x86, but I can't really point to Nick
>>>>> having done that.
>>>>> 
>>>>> So I was full of BS.
>>>>> 
>>>>> Hmm. I'd love to try to actually create a patch that does that "Nick
>>>>> thing", but on last mmput() (ie when __mmput triggers). Because I
>>>>> think this is interesting. But then I look at my schedule for the
>>>>> upcoming week, and I go "I don't have a leg to stand on in this
>>>>> discussion, and I'm just all hot air".
>>>>
>>>> I agree Andy's approach is very complicated and adds more overhead than
>>>> necessary for powerpc, which is why I don't want to use it. I'm still
>>>> not entirely sure what the big problem would be to convert x86 to use
>>>> it, I admit I haven't kept up with the exact details of its lazy tlb
>>>> mm handling recently though.
>>> 
>>> The big problem is the entire remainder of this series!  If x86 is going to do shootdowns without mm_count, I want the result to work and be maintainable. A few of the issues that needed solving:
>>> 
>>> - x86 tracks usage of the lazy mm on CPUs that have it loaded into the MMU, not CPUs that have it in active_mm.  Getting this in sync needed core changes.
>>
>> Definitely should have been done at the time x86 deviated, but better 
>> late than never.
>>
> 
> I suspect that this code may predate there being anything for x86 to deviate from.

Interesting, active_mm came in 2.3.11 and x86's cpu_tlbstate[].active_mm 
2.3.43. Longer than I thought.

>>> 
>>> - mmgrab and mmdrop are barriers, and core code relies on that. If we get rid of a bunch of calls (conditionally), we need to stop depending on the barriers. I fixed this.
>>
>> membarrier relied on a call that mmdrop was providing. Adding a smp_mb()
>> instead if mmdrop is a no-op is fine. Patches changing membarrier's 
>> ordering requirements can be concurrent and are not fundmentally tied
>> to lazy tlb mm switching, it just reuses an existing ordering point.
> 
> smp_mb() is rather expensive on x86 at least, and I think on powerpc to.  Let's not.

You misunderstand me. If mmdrop is not there to provide the required 
ordering that membarrier needs, then the membarrier code does its own
smp_mb(). It's not _more_ expensive than before because the full barrier
from the mmdrop is gone.
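
Something like this, purely as an illustration (the config name follows
Linus's mention of MMU_LAZY_TLB_REFCOUNT above; nothing here is taken
verbatim from either series):

	/* finish_task_switch(), with lazy tlb refcounting compiled out: */
	if (mm) {
		if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT))
			mmdrop(mm);	/* atomic_dec_and_test() is a full barrier */
		else
			smp_mb();	/* keep the ordering membarrier() relies on */
	}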

> Also, IMO my cleanups here generally make sense and make the code better, so I think we should just go with them.

Sure if you can make the membarrier code better and avoid this ordering
requirement that's nice. This is orthogonal to the lazy tlb code though
and they can go in parallel (again aside from mechanical merge issues).

I'm not sure what you don't understand about this: that ordering is a
membarrier requirement, and it happens to use an existing barrier
that the lazy mm code had anyway, which is perfectly fine and something
that is done all over the kernel in performance-critical code.

>>> - There were too many mmgrab and mmdrop calls, and the call sites had different semantics and different refcounting rules (thanks, kthread).  I cleaned this up.
>>
>> Seems like a decent cleanup. Again lazy tlb specific, just general switch
>> code should be factored and better contained in kernel/sched/ which is
>> fine, but concurrent to lazy tlb improvements.
> 
> I personally rather dislike the model of:
> 
> ...
> mmgrab_lazy();
> ...
> mmdrop_lazy();
> ...
> mmgrab_lazy();
> ...
> mmdrop_lazy();
> ...
> 
> where the different calls have incompatible logic at the call sites and a magic config option NOPs them all away and adds barriers.  I'm personally a big fan of cleaning up code before making it even more complicated.

Not sure what model that is though. Call sites don't have to know 
anything at all about the options or barriers. The rule is simply
that the lazy tlb mm reference is manipulated with the _lazy variants.

It was just mostly duplicated code. Again, can go in parallel and no
dependencies other than mechanical merge.

> 
>>
>>> - If we do a shootdown instead of a refcount, then, when exit() tears down its mm, we are lazily using *that* mm when we do the shootdowns. If active_mm continues to point to the being-freed mm and an NMI tries to dereference it, we’re toast. I fixed those issues.
>>
>> My shoot lazies patch has no such issues with that AFAIKS. What exact 
>> issue is it and where did you fix it?
> 
> Without my unlazy_mm_irqs_off() (or something similar), x86's very sensible (and very old!) code to unlazy a lazy CPU when flushing leaves active_mm pointing to the old lazy mm.  That's not a problem at all on current kernels, but in a shootdown-lazy world, those active_mm pointers will stick around.  Even with that fixed, without some care, an NMI during the shootdown CPU could dereference ->active_mm at a time when it's not being actively kept alive.
> 
> Fixed by unlazy_mm_irqs_off(), the patches that use it, and the patches that make x86 stop inappropriately using ->active_mm.

Oh you're talking about some x86 specific issue where you would have
problems if you didn't do the port properly? Don't tell me this is why
you've been nacking my patches for 15 months.

>>> - If we do a UEFI runtime service call while lazy or a text_poke while lazy and the mm goes away while this is happening, we would blow up. Refcounting prevents this but, in current kernels, a shootdown IPI on x86 would not prevent this.  I fixed these issues (and removed duplicate code).
>>> 
>>> My point here is that the current lazy mm code is a huge mess. 90% of the complexity in this series is cleaning up core messiness and x86 messiness. I would still like to get rid of ->active_mm entirely (it appears to serve no good purpose on any architecture),  it that can be saved for later, I think.
>>
>> I disagree, the lazy mm code is very clean and simple. And I can't see 
>> how you would propose to remove active_mm from core code I'm skeptical
>> but would be very interested to see, but that's nothing to do with my
>> shoot lazies patch and can also be concurrent except for mechanical
>> merge issues.
> 
> I think we may just have to agree to disagree.  The more I looked at the lazy code, the more problems I found.  So I fixed them.  That work is done now (as far as I'm aware) except for rebasing and review.

I don't see any problems with the lazy tlb mm code outside arch/x86 or 
anything your series fixed with it aside from a bit of code duplication.

Anyway I will try to take a look over and review bits I can before the
5.18 merge window. For 5.17 my series has been ready to go for a year or
so and very small, so let's merge that first since Linus wants to try to go
with that approach rather than the refcount one.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
  2022-01-11 22:48                           ` Nicholas Piggin
@ 2022-01-12  0:42                             ` Nicholas Piggin
  0 siblings, 0 replies; 79+ messages in thread
From: Nicholas Piggin @ 2022-01-12  0:42 UTC (permalink / raw)
  To: Andy Lutomirski, Linus Torvalds
  Cc: Andrew Morton, Anton Blanchard, Benjamin Herrenschmidt,
	Catalin Marinas, Dave Hansen, linux-arch, Linux-MM,
	Mathieu Desnoyers, Nadav Amit, Paul Mackerras,
	Peter Zijlstra (Intel),
	Randy Dunlap, Rik van Riel, Will Deacon,
	the arch/x86 maintainers

Excerpts from Nicholas Piggin's message of January 12, 2022 8:48 am:
> Anyway I will try to take a look over and review bits I can before the
> 5.18 merge window. For 5.17 my series has been ready to go for a year or
> so and very small, so let's merge that first since Linus wants to try to go
> with that approach rather than the refcount one.

5.19 and 5.18 respectively.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 01/23] membarrier: Document why membarrier() works
  2022-01-08 16:43 ` [PATCH 01/23] membarrier: Document why membarrier() works Andy Lutomirski
@ 2022-01-12 15:30   ` Mathieu Desnoyers
  0 siblings, 0 replies; 79+ messages in thread
From: Mathieu Desnoyers @ 2022-01-12 15:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Morton, linux-mm, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	x86, riel, Dave Hansen, Peter Zijlstra, Nadav Amit

----- On Jan 8, 2022, at 11:43 AM, Andy Lutomirski luto@kernel.org wrote:

> We had a nice comment at the top of membarrier.c explaining why membarrier
> worked in a handful of scenarios, but that consisted more of a list of
> things not to forget than an actual description of the algorithm and why it
> should be expected to work.
> 
> Add a comment explaining my understanding of the algorithm.  This exposes a
> couple of implementation issues that I will hopefully fix up in subsequent
> patches.

Given that no explanation about the specific implementation issues is provided
here, I would be tempted to remove the last sentence above, and keep that for
the commit messages of the subsequent patches.

The explanation you add here is clear and very much fits my mental model, thanks!

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

> 
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> kernel/sched/membarrier.c | 60 +++++++++++++++++++++++++++++++++++++--
> 1 file changed, 58 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index b5add64d9698..30e964b9689d 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -7,8 +7,64 @@
> #include "sched.h"
> 
> /*
> - * For documentation purposes, here are some membarrier ordering
> - * scenarios to keep in mind:
> + * The basic principle behind the regular memory barrier mode of
> + * membarrier() is as follows.  membarrier() is called in one thread.  It
> + * iterates over all CPUs, and, for each CPU, it either sends an IPI to
> + * that CPU or it does not. If it sends an IPI, then we have the
> + * following sequence of events:
> + *
> + * 1. membarrier() does smp_mb().
> + * 2. membarrier() does a store (the IPI request payload) that is observed by
> + *    the target CPU.
> + * 3. The target CPU does smp_mb().
> + * 4. The target CPU does a store (the completion indication) that is observed
> + *    by membarrier()'s wait-for-IPIs-to-finish request.
> + * 5. membarrier() does smp_mb().
> + *
> + * So all pre-membarrier() local accesses are visible after the IPI on the
> + * target CPU and all pre-IPI remote accesses are visible after
> + * membarrier(). IOW membarrier() has synchronized both ways with the target
> + * CPU.
> + *
> + * (This has the caveat that membarrier() does not interrupt the CPU that it's
> + * running on at the time it sends the IPIs. However, if that is the CPU on
> + * which membarrier() starts and/or finishes, membarrier() does smp_mb() and,
> + * if not, then the scheduler's migration of membarrier() is a full barrier.)
> + *
> + * membarrier() skips sending an IPI only if membarrier() sees
> + * cpu_rq(cpu)->curr->mm != target mm.  The sequence of events is:
> + *
> + *           membarrier()            |          target CPU
> + * ---------------------------------------------------------------------
> + *                                   | 1. smp_mb()
> + *                                   | 2. set rq->curr->mm = other_mm
> + *                                   |    (by writing to ->curr or to ->mm)
> + * 3. smp_mb()                       |
> + * 4. read rq->curr->mm == other_mm  |
> + * 5. smp_mb()                       |
> + *                                   | 6. rq->curr->mm = target_mm
> + *                                   |    (by writing to ->curr or to ->mm)
> + *                                   | 7. smp_mb()
> + *                                   |
> + *
> + * All memory accesses on the target CPU prior to scheduling are visible
> + * to membarrier()'s caller after membarrier() returns due to steps 1, 2, 4
> + * and 5.
> + *
> + * All memory accesses by membarrier()'s caller prior to membarrier() are
> + * visible to the target CPU after scheduling due to steps 3, 4, 6, and 7.
> + *
> + * Note that tasks can change their ->mm, e.g. via kthread_use_mm().  So
> + * tasks that switch their ->mm must follow the same rules as the scheduler
> + * changing rq->curr, and the membarrier() code needs to do both dereferences
> + * carefully.
> + *
> + * GLOBAL_EXPEDITED support works the same way except that all references
> + * to rq->curr->mm are replaced with references to rq->membarrier_state.
> + *
> + *
> + * Specific examples of how this produces the documented properties of
> + * membarrier():
>  *
>  * A) Userspace thread execution after IPI vs membarrier's memory
>  *    barrier before sending the IPI
> --
> 2.33.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 02/23] x86/mm: Handle unlazying membarrier core sync in the arch code
  2022-01-08 16:43 ` [PATCH 02/23] x86/mm: Handle unlazying membarrier core sync in the arch code Andy Lutomirski
@ 2022-01-12 15:40   ` Mathieu Desnoyers
  0 siblings, 0 replies; 79+ messages in thread
From: Mathieu Desnoyers @ 2022-01-12 15:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Morton, linux-mm, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	x86, riel, Dave Hansen, Peter Zijlstra, Nadav Amit

----- On Jan 8, 2022, at 11:43 AM, Andy Lutomirski luto@kernel.org wrote:

> The core scheduler isn't a great place for
> membarrier_mm_sync_core_before_usermode() -- the core scheduler
> doesn't actually know whether we are lazy.  With the old code, if a
> CPU is running a membarrier-registered task, goes idle, gets unlazied
> via a TLB shootdown IPI, and switches back to the
> membarrier-registered task, it will do an unnecessary core sync.
> 
> Conveniently, x86 is the only architecture that does anything in this
> sync_core_before_usermode(), so membarrier_mm_sync_core_before_usermode()
> is a no-op on all other architectures and we can just move the code.
> 
> (I am not claiming that the SYNC_CORE code was correct before or after this
> change on any non-x86 architecture.  I merely claim that this change
> improves readability, is correct on x86, and makes no change on any other
> architecture.)
> 

Looks good to me! Thanks!

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> arch/x86/mm/tlb.c        | 58 +++++++++++++++++++++++++++++++---------
> include/linux/sched/mm.h | 13 ---------
> kernel/sched/core.c      | 14 +++++-----
> 3 files changed, 53 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 59ba2968af1b..1ae15172885e 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -9,6 +9,7 @@
> #include <linux/cpu.h>
> #include <linux/debugfs.h>
> #include <linux/sched/smt.h>
> +#include <linux/sched/mm.h>
> 
> #include <asm/tlbflush.h>
> #include <asm/mmu_context.h>
> @@ -485,6 +486,15 @@ void cr4_update_pce(void *ignored)
> static inline void cr4_update_pce_mm(struct mm_struct *mm) { }
> #endif
> 
> +static void sync_core_if_membarrier_enabled(struct mm_struct *next)
> +{
> +#ifdef CONFIG_MEMBARRIER
> +	if (unlikely(atomic_read(&next->membarrier_state) &
> +		     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE))
> +		sync_core_before_usermode();
> +#endif
> +}
> +
> void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> 			struct task_struct *tsk)
> {
> @@ -539,16 +549,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct
> mm_struct *next,
> 		this_cpu_write(cpu_tlbstate_shared.is_lazy, false);
> 
> 	/*
> -	 * The membarrier system call requires a full memory barrier and
> -	 * core serialization before returning to user-space, after
> -	 * storing to rq->curr, when changing mm.  This is because
> -	 * membarrier() sends IPIs to all CPUs that are in the target mm
> -	 * to make them issue memory barriers.  However, if another CPU
> -	 * switches to/from the target mm concurrently with
> -	 * membarrier(), it can cause that CPU not to receive an IPI
> -	 * when it really should issue a memory barrier.  Writing to CR3
> -	 * provides that full memory barrier and core serializing
> -	 * instruction.
> +	 * membarrier() support requires that, when we change rq->curr->mm:
> +	 *
> +	 *  - If next->mm has membarrier registered, a full memory barrier
> +	 *    after writing rq->curr (or rq->curr->mm if we switched the mm
> +	 *    without switching tasks) and before returning to user mode.
> +	 *
> +	 *  - If next->mm has SYNC_CORE registered, then we sync core before
> +	 *    returning to user mode.
> +	 *
> +	 * In the case where prev->mm == next->mm, membarrier() uses an IPI
> +	 * instead, and no particular barriers are needed while context
> +	 * switching.
> +	 *
> +	 * x86 gets all of this as a side-effect of writing to CR3 except
> +	 * in the case where we unlazy without flushing.
> +	 *
> +	 * All other architectures are civilized and do all of this implicitly
> +	 * when transitioning from kernel to user mode.
> 	 */
> 	if (real_prev == next) {
> 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
> @@ -566,7 +584,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct
> mm_struct *next,
> 		/*
> 		 * If the CPU is not in lazy TLB mode, we are just switching
> 		 * from one thread in a process to another thread in the same
> -		 * process. No TLB flush required.
> +		 * process. No TLB flush or membarrier() synchronization
> +		 * is required.
> 		 */
> 		if (!was_lazy)
> 			return;
> @@ -576,16 +595,31 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct
> mm_struct *next,
> 		 * If the TLB is up to date, just use it.
> 		 * The barrier synchronizes with the tlb_gen increment in
> 		 * the TLB shootdown code.
> +		 *
> +		 * As a future optimization opportunity, it's plausible
> +		 * that the x86 memory model is strong enough that this
> +		 * smp_mb() isn't needed.
> 		 */
> 		smp_mb();
> 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
> 		if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
> -				next_tlb_gen)
> +		    next_tlb_gen) {
> +			/*
> +			 * We switched logical mm but we're not going to
> +			 * write to CR3.  We already did smp_mb() above,
> +			 * but membarrier() might require a sync_core()
> +			 * as well.
> +			 */
> +			sync_core_if_membarrier_enabled(next);
> +
> 			return;
> +		}
> 
> 		/*
> 		 * TLB contents went out of date while we were in lazy
> 		 * mode. Fall through to the TLB switching code below.
> +		 * No need for an explicit membarrier invocation -- the CR3
> +		 * write will serialize.
> 		 */
> 		new_asid = prev_asid;
> 		need_flush = true;
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 5561486fddef..c256a7fc0423 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -345,16 +345,6 @@ enum {
> #include <asm/membarrier.h>
> #endif
> 
> -static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct
> *mm)
> -{
> -	if (current->mm != mm)
> -		return;
> -	if (likely(!(atomic_read(&mm->membarrier_state) &
> -		     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
> -		return;
> -	sync_core_before_usermode();
> -}
> -
> extern void membarrier_exec_mmap(struct mm_struct *mm);
> 
> extern void membarrier_update_current_mm(struct mm_struct *next_mm);
> @@ -370,9 +360,6 @@ static inline void membarrier_arch_switch_mm(struct
> mm_struct *prev,
> static inline void membarrier_exec_mmap(struct mm_struct *mm)
> {
> }
> -static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct
> *mm)
> -{
> -}
> static inline void membarrier_update_current_mm(struct mm_struct *next_mm)
> {
> }
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f21714ea3db8..6a1db8264c7b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4822,22 +4822,22 @@ static struct rq *finish_task_switch(struct task_struct
> *prev)
> 	kmap_local_sched_in();
> 
> 	fire_sched_in_preempt_notifiers(current);
> +
> 	/*
> 	 * When switching through a kernel thread, the loop in
> 	 * membarrier_{private,global}_expedited() may have observed that
> 	 * kernel thread and not issued an IPI. It is therefore possible to
> 	 * schedule between user->kernel->user threads without passing though
> 	 * switch_mm(). Membarrier requires a barrier after storing to
> -	 * rq->curr, before returning to userspace, so provide them here:
> +	 * rq->curr, before returning to userspace, and mmdrop() provides
> +	 * this barrier.
> 	 *
> -	 * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
> -	 *   provided by mmdrop(),
> -	 * - a sync_core for SYNC_CORE.
> +	 * If an architecture needs to take a specific action for
> +	 * SYNC_CORE, it can do so in switch_mm_irqs_off().
> 	 */
> -	if (mm) {
> -		membarrier_mm_sync_core_before_usermode(mm);
> +	if (mm)
> 		mmdrop(mm);
> -	}
> +
> 	if (unlikely(prev_state == TASK_DEAD)) {
> 		if (prev->sched_class->task_dead)
> 			prev->sched_class->task_dead(prev);
> --
> 2.33.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 04/23] membarrier: Make the post-switch-mm barrier explicit
  2022-01-08 16:43 ` [PATCH 04/23] membarrier: Make the post-switch-mm barrier explicit Andy Lutomirski
@ 2022-01-12 15:52   ` Mathieu Desnoyers
  0 siblings, 0 replies; 79+ messages in thread
From: Mathieu Desnoyers @ 2022-01-12 15:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Morton, linux-mm, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	x86, riel, Dave Hansen, Peter Zijlstra, Nadav Amit

----- On Jan 8, 2022, at 11:43 AM, Andy Lutomirski luto@kernel.org wrote:

> membarrier() needs a barrier after any CPU changes mm.  There is currently
> a comment explaining why this barrier probably exists in all cases. The
> logic is based on ensuring that the barrier exists on every control flow
> path through the scheduler.  It also relies on mmgrab() and mmdrop() being
> full barriers.
> 
> mmgrab() and mmdrop() would be better if they were not full barriers.  As a
> trivial optimization, mmgrab() could use a relaxed atomic and mmdrop()
> could use a release on architectures that have these operations.  Larger
> optimizations are also in the works.  Doing any of these optimizations
> while preserving an unnecessary barrier will complicate the code and
> penalize non-membarrier-using tasks.
> 
> Simplify the logic by adding an explicit barrier, and allow architectures
> to override it as an optimization if they want to.
> 
> One of the deleted comments in this patch said "It is therefore
> possible to schedule between user->kernel->user threads without
> passing through switch_mm()".  It is possible to do this without, say,
> writing to CR3 on x86, but the core scheduler indeed calls
> switch_mm_irqs_off() to tell the arch code to go back from lazy mode
> to no-lazy mode.
> 
> The membarrier_finish_switch_mm() call in exec_mmap() is a no-op so long as
> there is no way for a newly execed program to register for membarrier prior
> to running user code.  Subsequent patches will merge the exec_mmap() code
> with the kthread_use_mm() code, though, and keeping the paths consistent
> will make the result more comprehensible.

I don't agree with the approach here. Adding additional memory barrier overhead
for the sake of possible future optimization work is IMHO not an appropriate
justification for a performance regression.

However I think we can manage to allow forward progress on optimization of
mmgrab/mmdrop without hurting performance.

One possible approach would be to introduce smp_mb__{before,after}_{mmgrab,mmdrop},
which would initially be no-ops. Those could be used by membarrier without
introducing any overhead, and would allow to gradually move the implicit barriers
from mmgrab/mmdrop to smp_mb__{before,after}_{mmgrab,mmdrop} on a per-architecture
basis.
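
Concretely, something along these lines (sketch only; none of these
helpers exist yet, the names just follow the proposal above):

/* Default: mmgrab()/mmdrop() already imply full barriers, so no-ops. */
#ifndef smp_mb__before_mmgrab
#define smp_mb__before_mmgrab()		do { } while (0)
#endif
#ifndef smp_mb__after_mmdrop
#define smp_mb__after_mmdrop()		do { } while (0)
#endif

/* membarrier-sensitive path, e.g. in finish_task_switch(): */
if (mm) {
	mmdrop(mm);
	smp_mb__after_mmdrop();		/* ordering membarrier depends on */
}

An architecture that later relaxes mmdrop() to a release (or elides it
for lazy mms) would then define smp_mb__after_mmdrop() as smp_mb().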

Thanks,

Mathieu


> 
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> fs/exec.c                |  1 +
> include/linux/sched/mm.h | 18 ++++++++++++++++++
> kernel/kthread.c         | 12 +-----------
> kernel/sched/core.c      | 34 +++++++++-------------------------
> 4 files changed, 29 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index a098c133d8d7..3abbd0294e73 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1019,6 +1019,7 @@ static int exec_mmap(struct mm_struct *mm)
> 	activate_mm(active_mm, mm);
> 	if (IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
> 		local_irq_enable();
> +	membarrier_finish_switch_mm(mm);
> 	tsk->mm->vmacache_seqnum = 0;
> 	vmacache_flush(tsk);
> 	task_unlock(tsk);
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 0df706c099e5..e8919995d8dd 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -349,6 +349,20 @@ extern void membarrier_exec_mmap(struct mm_struct *mm);
> 
> extern void membarrier_update_current_mm(struct mm_struct *next_mm);
> 
> +/*
> + * Called by the core scheduler after calling switch_mm_irqs_off().
> + * Architectures that have implicit barriers when switching mms can
> + * override this as an optimization.
> + */
> +#ifndef membarrier_finish_switch_mm
> +static inline void membarrier_finish_switch_mm(struct mm_struct *mm)
> +{
> +	if (atomic_read(&mm->membarrier_state) &
> +	    (MEMBARRIER_STATE_GLOBAL_EXPEDITED | MEMBARRIER_STATE_PRIVATE_EXPEDITED))
> +		smp_mb();
> +}
> +#endif
> +
> #else
> static inline void membarrier_exec_mmap(struct mm_struct *mm)
> {
> @@ -356,6 +370,10 @@ static inline void membarrier_exec_mmap(struct mm_struct *mm)
> static inline void membarrier_update_current_mm(struct mm_struct *next_mm)
> {
> }
> +static inline void membarrier_finish_switch_mm(struct mm_struct *mm)
> +{
> +}
> +
> #endif
> 
> #endif /* _LINUX_SCHED_MM_H */
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 5b37a8567168..396ae78a1a34 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1361,25 +1361,15 @@ void kthread_use_mm(struct mm_struct *mm)
> 	tsk->mm = mm;
> 	membarrier_update_current_mm(mm);
> 	switch_mm_irqs_off(active_mm, mm, tsk);
> +	membarrier_finish_switch_mm(mm);
> 	local_irq_enable();
> 	task_unlock(tsk);
> #ifdef finish_arch_post_lock_switch
> 	finish_arch_post_lock_switch();
> #endif
> 
> -	/*
> -	 * When a kthread starts operating on an address space, the loop
> -	 * in membarrier_{private,global}_expedited() may not observe
> -	 * that tsk->mm, and not issue an IPI. Membarrier requires a
> -	 * memory barrier after storing to tsk->mm, before accessing
> -	 * user-space memory. A full memory barrier for membarrier
> -	 * {PRIVATE,GLOBAL}_EXPEDITED is implicitly provided by
> -	 * mmdrop(), or explicitly with smp_mb().
> -	 */
> 	if (active_mm != mm)
> 		mmdrop(active_mm);
> -	else
> -		smp_mb();
> 
> 	to_kthread(tsk)->oldfs = force_uaccess_begin();
> }
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 6a1db8264c7b..917068b0a145 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4824,14 +4824,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> 	fire_sched_in_preempt_notifiers(current);
> 
> 	/*
> -	 * When switching through a kernel thread, the loop in
> -	 * membarrier_{private,global}_expedited() may have observed that
> -	 * kernel thread and not issued an IPI. It is therefore possible to
> -	 * schedule between user->kernel->user threads without passing though
> -	 * switch_mm(). Membarrier requires a barrier after storing to
> -	 * rq->curr, before returning to userspace, and mmdrop() provides
> -	 * this barrier.
> -	 *
> 	 * If an architecture needs to take a specific action for
> 	 * SYNC_CORE, it can do so in switch_mm_irqs_off().
> 	 */
> @@ -4915,15 +4907,14 @@ context_switch(struct rq *rq, struct task_struct *prev,
> 			prev->active_mm = NULL;
> 	} else {                                        // to user
> 		membarrier_switch_mm(rq, prev->active_mm, next->mm);
> +		switch_mm_irqs_off(prev->active_mm, next->mm, next);
> +
> 		/*
> 		 * sys_membarrier() requires an smp_mb() between setting
> -		 * rq->curr / membarrier_switch_mm() and returning to userspace.
> -		 *
> -		 * The below provides this either through switch_mm(), or in
> -		 * case 'prev->active_mm == next->mm' through
> -		 * finish_task_switch()'s mmdrop().
> +		 * rq->curr->mm to a membarrier-enabled mm and returning
> +		 * to userspace.
> 		 */
> -		switch_mm_irqs_off(prev->active_mm, next->mm, next);
> +		membarrier_finish_switch_mm(next->mm);
> 
> 		if (!prev->mm) {                        // from kernel
> 			/* will mmdrop() in finish_task_switch(). */
> @@ -6264,17 +6255,10 @@ static void __sched notrace __schedule(unsigned int sched_mode)
> 		RCU_INIT_POINTER(rq->curr, next);
> 		/*
> 		 * The membarrier system call requires each architecture
> -		 * to have a full memory barrier after updating
> -		 * rq->curr, before returning to user-space.
> -		 *
> -		 * Here are the schemes providing that barrier on the
> -		 * various architectures:
> -		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
> -		 *   switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
> -		 * - finish_lock_switch() for weakly-ordered
> -		 *   architectures where spin_unlock is a full barrier,
> -		 * - switch_to() for arm64 (weakly-ordered, spin_unlock
> -		 *   is a RELEASE barrier),
> +		 * to have a full memory barrier before and after updating
> +		 * rq->curr->mm, before returning to userspace.  This
> +		 * is provided by membarrier_finish_switch_mm().  Architectures
> +		 * that want to optimize this can override that function.
> 		 */
> 		++*switch_count;
> 
> --
> 2.33.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 05/23] membarrier, kthread: Use _ONCE accessors for task->mm
  2022-01-08 16:43 ` [PATCH 05/23] membarrier, kthread: Use _ONCE accessors for task->mm Andy Lutomirski
@ 2022-01-12 15:55   ` Mathieu Desnoyers
  0 siblings, 0 replies; 79+ messages in thread
From: Mathieu Desnoyers @ 2022-01-12 15:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Morton, linux-mm, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	x86, riel, Dave Hansen, Peter Zijlstra, Nadav Amit

----- On Jan 8, 2022, at 11:43 AM, Andy Lutomirski luto@kernel.org wrote:

> membarrier reads cpu_rq(remote cpu)->curr->mm without locking.  Use
> READ_ONCE() and WRITE_ONCE() to remove the data races.
> 

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Acked-by: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> fs/exec.c                 | 2 +-
> kernel/exit.c             | 2 +-
> kernel/kthread.c          | 4 ++--
> kernel/sched/membarrier.c | 7 ++++---
> 4 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 3abbd0294e73..38b05e01c5bd 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1006,7 +1006,7 @@ static int exec_mmap(struct mm_struct *mm)
> 	local_irq_disable();
> 	active_mm = tsk->active_mm;
> 	tsk->active_mm = mm;
> -	tsk->mm = mm;
> +	WRITE_ONCE(tsk->mm, mm);  /* membarrier reads this without locks */
> 	/*
> 	 * This prevents preemption while active_mm is being loaded and
> 	 * it and mm are being updated, which could cause problems for
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 91a43e57a32e..70f2cbc42015 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -491,7 +491,7 @@ static void exit_mm(void)
> 	 */
> 	smp_mb__after_spinlock();
> 	local_irq_disable();
> -	current->mm = NULL;
> +	WRITE_ONCE(current->mm, NULL);
> 	membarrier_update_current_mm(NULL);
> 	enter_lazy_tlb(mm, current);
> 	local_irq_enable();
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 396ae78a1a34..3b18329f885c 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1358,7 +1358,7 @@ void kthread_use_mm(struct mm_struct *mm)
> 		mmgrab(mm);
> 		tsk->active_mm = mm;
> 	}
> -	tsk->mm = mm;
> +	WRITE_ONCE(tsk->mm, mm);  /* membarrier reads this without locks */
> 	membarrier_update_current_mm(mm);
> 	switch_mm_irqs_off(active_mm, mm, tsk);
> 	membarrier_finish_switch_mm(mm);
> @@ -1399,7 +1399,7 @@ void kthread_unuse_mm(struct mm_struct *mm)
> 	smp_mb__after_spinlock();
> 	sync_mm_rss(mm);
> 	local_irq_disable();
> -	tsk->mm = NULL;
> +	WRITE_ONCE(tsk->mm, NULL);  /* membarrier reads this without locks */
> 	membarrier_update_current_mm(NULL);
> 	/* active_mm is still 'mm' */
> 	enter_lazy_tlb(mm, tsk);
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index 30e964b9689d..327830f89c37 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -411,7 +411,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
> 			goto out;
> 		rcu_read_lock();
> 		p = rcu_dereference(cpu_rq(cpu_id)->curr);
> -		if (!p || p->mm != mm) {
> +		if (!p || READ_ONCE(p->mm) != mm) {
> 			rcu_read_unlock();
> 			goto out;
> 		}
> @@ -424,7 +424,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
> 			struct task_struct *p;
> 
> 			p = rcu_dereference(cpu_rq(cpu)->curr);
> -			if (p && p->mm == mm)
> +			if (p && READ_ONCE(p->mm) == mm)
> 				__cpumask_set_cpu(cpu, tmpmask);
> 		}
> 		rcu_read_unlock();
> @@ -522,7 +522,8 @@ static int sync_runqueues_membarrier_state(struct mm_struct *mm)
> 		struct task_struct *p;
> 
> 		p = rcu_dereference(rq->curr);
> -		if (p && p->mm == mm)
> +		/* exec and kthread_use_mm() write ->mm without locks */
> +		if (p && READ_ONCE(p->mm) == mm)
> 			__cpumask_set_cpu(cpu, tmpmask);
> 	}
> 	rcu_read_unlock();
> --
> 2.33.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 06/23] powerpc/membarrier: Remove special barrier on mm switch
  2022-01-08 16:43   ` Andy Lutomirski
@ 2022-01-12 15:57     ` Mathieu Desnoyers
  -1 siblings, 0 replies; 79+ messages in thread
From: Mathieu Desnoyers @ 2022-01-12 15:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Morton, linux-mm, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	x86, riel, Dave Hansen, Peter Zijlstra, Nadav Amit,
	Michael Ellerman, Paul Mackerras, linuxppc-dev

----- On Jan 8, 2022, at 11:43 AM, Andy Lutomirski luto@kernel.org wrote:

> powerpc did the following on some, but not all, paths through
> switch_mm_irqs_off():
> 
>       /*
>        * Only need the full barrier when switching between processes.
>        * Barrier when switching from kernel to userspace is not
>        * required here, given that it is implied by mmdrop(). Barrier
>        * when switching from userspace to kernel is not needed after
>        * store to rq->curr.
>        */
>       if (likely(!(atomic_read(&next->membarrier_state) &
>                    (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
>                     MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
>               return;
> 
> This is puzzling: if !prev, then one might expect that we are switching
> from kernel to user, not user to kernel, which is inconsistent with the
> comment.  But this is all nonsense, because the one and only caller would
> never have prev == NULL and would, in fact, OOPS if prev == NULL.
> 
> In any event, this code is unnecessary, since the new generic
> membarrier_finish_switch_mm() provides the same barrier without arch help.
> 
> arch/powerpc/include/asm/membarrier.h remains as an empty header,
> because a later patch in this series will add code to it.

My disagreement with "membarrier: Make the post-switch-mm barrier explicit"
may affect this patch significantly, or even make it irrelevant.

Thanks,

Mathieu

> 
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> arch/powerpc/include/asm/membarrier.h | 24 ------------------------
> arch/powerpc/mm/mmu_context.c         |  1 -
> 2 files changed, 25 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h
> index de7f79157918..b90766e95bd1 100644
> --- a/arch/powerpc/include/asm/membarrier.h
> +++ b/arch/powerpc/include/asm/membarrier.h
> @@ -1,28 +1,4 @@
> #ifndef _ASM_POWERPC_MEMBARRIER_H
> #define _ASM_POWERPC_MEMBARRIER_H
> 
> -static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
> -					     struct mm_struct *next,
> -					     struct task_struct *tsk)
> -{
> -	/*
> -	 * Only need the full barrier when switching between processes.
> -	 * Barrier when switching from kernel to userspace is not
> -	 * required here, given that it is implied by mmdrop(). Barrier
> -	 * when switching from userspace to kernel is not needed after
> -	 * store to rq->curr.
> -	 */
> -	if (IS_ENABLED(CONFIG_SMP) &&
> -	    likely(!(atomic_read(&next->membarrier_state) &
> -		     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
> -		      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
> -		return;
> -
> -	/*
> -	 * The membarrier system call requires a full memory barrier
> -	 * after storing to rq->curr, before going back to user-space.
> -	 */
> -	smp_mb();
> -}
> -
> #endif /* _ASM_POWERPC_MEMBARRIER_H */
> diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
> index 74246536b832..5f2daa6b0497 100644
> --- a/arch/powerpc/mm/mmu_context.c
> +++ b/arch/powerpc/mm/mmu_context.c
> @@ -84,7 +84,6 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> 		asm volatile ("dssall");
> 
> 	if (!new_on_cpu)
> -		membarrier_arch_switch_mm(prev, next, tsk);
> 
> 	/*
> 	 * The actual HW switching method differs between the various
> --
> 2.33.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 07/23] membarrier: Rewrite sync_core_before_usermode() and improve documentation
  2022-01-08 16:43   ` Andy Lutomirski
  (?)
@ 2022-01-12 16:11     ` Mathieu Desnoyers
  -1 siblings, 0 replies; 79+ messages in thread
From: Mathieu Desnoyers @ 2022-01-12 16:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Morton, linux-mm, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	x86, riel, Dave Hansen, Peter Zijlstra, Nadav Amit,
	Michael Ellerman, Paul Mackerras, linuxppc-dev, Catalin Marinas,
	Will Deacon, linux-arm-kernel, stable

----- On Jan 8, 2022, at 11:43 AM, Andy Lutomirski luto@kernel.org wrote:

> The old sync_core_before_usermode() comments suggested that a
> non-icache-syncing return-to-usermode instruction is x86-specific and that
> all other architectures automatically notice cross-modified code on return
> to userspace.
> 
> This is misleading.  The incantation needed to modify code from one
> CPU and execute it on another CPU is highly architecture dependent.
> On x86, according to the SDM, one must modify the code, issue SFENCE
> if the modification was WC or nontemporal, and then issue a "serializing
> instruction" on the CPU that will execute the code.  membarrier() can do
> the latter.
> 
> On arm, arm64 and powerpc, one must flush the icache and then flush the
> pipeline on the target CPU, although the CPU manuals don't necessarily use
> this language.
> 
> So let's drop any pretense that we can have a generic way to define or
> implement membarrier's SYNC_CORE operation and instead require all
> architectures to define the helper and supply their own documentation as to
> how to use it.  This means x86, arm64, and powerpc for now.  Let's also
> rename the function from sync_core_before_usermode() to
> membarrier_sync_core_before_usermode() because the precise flushing details
> may very well be specific to membarrier, and even the concept of
> "sync_core" in the kernel is mostly an x86-ism.
> 
> (It may well be the case that, on real x86 processors, synchronizing the
> icache (which requires no action at all) and "flushing the pipeline" is
> sufficient, but trying to use this language would be confusing at best.
> LFENCE does something awfully like "flushing the pipeline", but the SDM
> does not permit LFENCE as an alternative to a "serializing instruction"
> for this purpose.)
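
As an illustrative aside (not part of the patch), the user-space half of
that x86 sequence maps onto the existing membarrier() commands roughly as
follows; the wrapper and helper names here are made up, and error handling
is omitted:

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static int membarrier(int cmd, unsigned int flags, int cpu_id)
{
	return syscall(__NR_membarrier, cmd, flags, cpu_id);
}

static void jit_init(void)
{
	/* Register once per process before relying on SYNC_CORE. */
	membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0);
}

static void jit_publish_code(void)
{
	/*
	 * The new instructions have already been written (preceded by
	 * sfence if the stores were WC/non-temporal).  Ask the kernel to
	 * issue a serializing instruction on every CPU running this mm
	 * before it returns to user mode.
	 */
	membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0);
}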

A few comments below:

[...]

> +# On powerpc, a program can use MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE
> +# similarly to arm64.  It would be nice if the powerpc maintainers could
> +# add a more clear explanation.

Any thoughts from ppc maintainers ?

[...]

> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index e9da3dc71254..b47cd22b2eb1 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -17,7 +17,7 @@
> #include <linux/kprobes.h>
> #include <linux/mmu_context.h>
> #include <linux/bsearch.h>
> -#include <linux/sync_core.h>
> +#include <asm/sync_core.h>

All this churn wrt move from linux/sync_core.h to asm/sync_core.h
should probably be moved to a separate cleanup patch.

> #include <asm/text-patching.h>
> #include <asm/alternative.h>
> #include <asm/sections.h>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 193204aee880..a2529e09f620 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -41,12 +41,12 @@
> #include <linux/irq_work.h>
> #include <linux/export.h>
> #include <linux/set_memory.h>
> -#include <linux/sync_core.h>
> #include <linux/task_work.h>
> #include <linux/hardirq.h>
> 
> #include <asm/intel-family.h>
> #include <asm/processor.h>
> +#include <asm/sync_core.h>
> #include <asm/traps.h>
> #include <asm/tlbflush.h>
> #include <asm/mce.h>

[...]

> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index d7ef61e602ed..462c667bd6c4 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -20,8 +20,8 @@
> #include <linux/io.h>
> #include <linux/uaccess.h>
> #include <linux/security.h>
> -#include <linux/sync_core.h>
> #include <linux/prefetch.h>
> +#include <asm/sync_core.h>
> #include "gru.h"
> #include "grutables.h"
> #include "grulib.h"
> diff --git a/drivers/misc/sgi-gru/gruhandles.c b/drivers/misc/sgi-gru/gruhandles.c
> index 1d75d5e540bc..c8cba1c1b00f 100644
> --- a/drivers/misc/sgi-gru/gruhandles.c
> +++ b/drivers/misc/sgi-gru/gruhandles.c
> @@ -16,7 +16,7 @@
> #define GRU_OPERATION_TIMEOUT	(((cycles_t) local_cpu_data->itc_freq)*10)
> #define CLKS2NSEC(c)		((c) *1000000000 / local_cpu_data->itc_freq)
> #else
> -#include <linux/sync_core.h>
> +#include <asm/sync_core.h>
> #include <asm/tsc.h>
> #define GRU_OPERATION_TIMEOUT	((cycles_t) tsc_khz*10*1000)
> #define CLKS2NSEC(c)		((c) * 1000000 / tsc_khz)
> diff --git a/drivers/misc/sgi-gru/grukservices.c b/drivers/misc/sgi-gru/grukservices.c
> index 0ea923fe6371..ce03ff3f7c3a 100644
> --- a/drivers/misc/sgi-gru/grukservices.c
> +++ b/drivers/misc/sgi-gru/grukservices.c
> @@ -16,10 +16,10 @@
> #include <linux/miscdevice.h>
> #include <linux/proc_fs.h>
> #include <linux/interrupt.h>
> -#include <linux/sync_core.h>
> #include <linux/uaccess.h>
> #include <linux/delay.h>
> #include <linux/export.h>
> +#include <asm/sync_core.h>
> #include <asm/io_apic.h>
> #include "gru.h"
> #include "grulib.h"
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index e8919995d8dd..e107f292fc42 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -7,7 +7,6 @@
> #include <linux/sched.h>
> #include <linux/mm_types.h>
> #include <linux/gfp.h>
> -#include <linux/sync_core.h>
> 
> /*
>  * Routines for handling mm_structs
> diff --git a/include/linux/sync_core.h b/include/linux/sync_core.h
> deleted file mode 100644
> index 013da4b8b327..000000000000
> --- a/include/linux/sync_core.h
> +++ /dev/null
> @@ -1,21 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -#ifndef _LINUX_SYNC_CORE_H
> -#define _LINUX_SYNC_CORE_H
> -
> -#ifdef CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
> -#include <asm/sync_core.h>
> -#else
> -/*
> - * This is a dummy sync_core_before_usermode() implementation that can be used
> - * on all architectures which return to user-space through core serializing
> - * instructions.
> - * If your architecture returns to user-space through non-core-serializing
> - * instructions, you need to write your own functions.
> - */
> -static inline void sync_core_before_usermode(void)
> -{
> -}
> -#endif
> -
> -#endif /* _LINUX_SYNC_CORE_H */
> -

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 08/23] membarrier: Remove redundant clear of mm->membarrier_state in exec_mmap()
  2022-01-08 16:43 ` [PATCH 08/23] membarrier: Remove redundant clear of mm->membarrier_state in exec_mmap() Andy Lutomirski
@ 2022-01-12 16:13   ` Mathieu Desnoyers
  0 siblings, 0 replies; 79+ messages in thread
From: Mathieu Desnoyers @ 2022-01-12 16:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Morton, linux-mm, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	x86, riel, Dave Hansen, Peter Zijlstra, Nadav Amit

----- On Jan 8, 2022, at 11:43 AM, Andy Lutomirski luto@kernel.org wrote:

> exec_mmap() supplies a brand-new mm from mm_alloc(), and membarrier_state
> is already 0.  There's no need to clear it again.

Then I suspect we might want to tweak the comment just above the memory barrier?

        /*
         * Issue a memory barrier before clearing membarrier_state to
         * guarantee that no memory access prior to exec is reordered after
         * clearing this state.
         */

Is that barrier still needed?

Thanks,

Mathieu

> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> kernel/sched/membarrier.c | 1 -
> 1 file changed, 1 deletion(-)
> 
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index eb73eeaedc7d..c38014c2ed66 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -285,7 +285,6 @@ void membarrier_exec_mmap(struct mm_struct *mm)
> 	 * clearing this state.
> 	 */
> 	smp_mb();
> -	atomic_set(&mm->membarrier_state, 0);
> 	/*
> 	 * Keep the runqueue membarrier_state in sync with this mm
> 	 * membarrier_state.
> --
> 2.33.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com




^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 09/23] membarrier: Fix incorrect barrier positions during exec and kthread_use_mm()
  2022-01-08 16:43 ` [PATCH 09/23] membarrier: Fix incorrect barrier positions during exec and kthread_use_mm() Andy Lutomirski
@ 2022-01-12 16:30   ` Mathieu Desnoyers
  2022-01-12 17:08     ` Mathieu Desnoyers
  0 siblings, 1 reply; 79+ messages in thread
From: Mathieu Desnoyers @ 2022-01-12 16:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Morton, linux-mm, Nicholas Piggin, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras, Randy Dunlap, linux-arch,
	x86, riel, Dave Hansen, Peter Zijlstra, Nadav Amit

----- On Jan 8, 2022, at 11:43 AM, Andy Lutomirski luto@kernel.org wrote:

> membarrier() requires a barrier before changes to rq->curr->mm, not just
> before writes to rq->membarrier_state.  Move the barrier in exec_mmap() to
> the right place.

I don't see anything that was technically wrong with membarrier_exec_mmap()
before this patchset. membarrier_exec_mmap() issued an smp_mb() just after
the task_lock(), and proceeded to clear the mm->membarrier_state and
runqueue membarrier state. And then the tsk->mm is set *after* the smp_mb().

So from this commit message we could be led to think there was something
wrong before, but I do not think it's true. This first part of the proposed
change is merely a performance optimization that removes a useless memory
barrier on architectures where smp_mb__after_spinlock() is a no-op, and
removes a useless store to mm->membarrier_state because it is already
zero-initialized. This is all very nice, but does not belong in a "Fix" patch.
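
For reference, a simplified sketch of the pre-series ordering described
above, reconstructed from the quoted diffs (not verbatim kernel code):

	task_lock(tsk);
	membarrier_exec_mmap(mm);	/* smp_mb(), then clear membarrier state */
	local_irq_disable();
	active_mm = tsk->active_mm;
	tsk->active_mm = mm;
	tsk->mm = mm;			/* the store to ->mm happens after the smp_mb() */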

> Add the barrier in kthread_use_mm() -- it was entirely
> missing before.

This is correct. This second part of the patch is indeed a relevant fix.

Thanks,

Mathieu

> 
> This patch makes exec_mmap() and kthread_use_mm() use the same membarrier
> hooks, which results in some code deletion.
> 
> As an added bonus, this will eliminate a redundant barrier in execve() on
> arches for which spinlock acquisition is a barrier.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> fs/exec.c                 |  6 +++++-
> include/linux/sched/mm.h  |  2 --
> kernel/kthread.c          |  5 +++++
> kernel/sched/membarrier.c | 15 ---------------
> 4 files changed, 10 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 38b05e01c5bd..325dab98bc51 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1001,12 +1001,16 @@ static int exec_mmap(struct mm_struct *mm)
> 	}
> 
> 	task_lock(tsk);
> -	membarrier_exec_mmap(mm);
> +	/*
> +	 * membarrier() requires a full barrier before switching mm.
> +	 */
> +	smp_mb__after_spinlock();
> 
> 	local_irq_disable();
> 	active_mm = tsk->active_mm;
> 	tsk->active_mm = mm;
> 	WRITE_ONCE(tsk->mm, mm);  /* membarrier reads this without locks */
> +	membarrier_update_current_mm(mm);
> 	/*
> 	 * This prevents preemption while active_mm is being loaded and
> 	 * it and mm are being updated, which could cause problems for
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index e107f292fc42..f1d2beac464c 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -344,8 +344,6 @@ enum {
> #include <asm/membarrier.h>
> #endif
> 
> -extern void membarrier_exec_mmap(struct mm_struct *mm);
> -
> extern void membarrier_update_current_mm(struct mm_struct *next_mm);
> 
> /*
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 3b18329f885c..18b0a2e0e3b2 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1351,6 +1351,11 @@ void kthread_use_mm(struct mm_struct *mm)
> 	WARN_ON_ONCE(tsk->mm);
> 
> 	task_lock(tsk);
> +	/*
> +	 * membarrier() requires a full barrier before switching mm.
> +	 */
> +	smp_mb__after_spinlock();
> +
> 	/* Hold off tlb flush IPIs while switching mm's */
> 	local_irq_disable();
> 	active_mm = tsk->active_mm;
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index c38014c2ed66..44fafa6e1efd 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -277,21 +277,6 @@ static void ipi_sync_rq_state(void *info)
> 	smp_mb();
> }
> 
> -void membarrier_exec_mmap(struct mm_struct *mm)
> -{
> -	/*
> -	 * Issue a memory barrier before clearing membarrier_state to
> -	 * guarantee that no memory access prior to exec is reordered after
> -	 * clearing this state.
> -	 */
> -	smp_mb();
> -	/*
> -	 * Keep the runqueue membarrier_state in sync with this mm
> -	 * membarrier_state.
> -	 */
> -	this_cpu_write(runqueues.membarrier_state, 0);
> -}
> -
> void membarrier_update_current_mm(struct mm_struct *next_mm)
> {
> 	struct rq *rq = this_rq();
> --
> 2.33.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 09/23] membarrier: Fix incorrect barrier positions during exec and kthread_use_mm()
  2022-01-12 16:30   ` Mathieu Desnoyers
@ 2022-01-12 17:08     ` Mathieu Desnoyers
  0 siblings, 0 replies; 79+ messages in thread
From: Mathieu Desnoyers @ 2022-01-12 17:08 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andy Lutomirski, Andrew Morton, linux-mm, Nicholas Piggin,
	Anton Blanchard, Benjamin Herrenschmidt, Paul Mackerras,
	Randy Dunlap, linux-arch, x86, riel, Dave Hansen, Peter Zijlstra,
	Nadav Amit


----- Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> ----- On Jan 8, 2022, at 11:43 AM, Andy Lutomirski luto@kernel.org wrote:
> 
> > membarrier() requires a barrier before changes to rq->curr->mm, not just
> > before writes to rq->membarrier_state.  Move the barrier in exec_mmap() to
> > the right place.
> 
> I don't see anything that was technically wrong with membarrier_exec_mmap()
> before this patchset. membarrier_exec_mmap() issued an smp_mb() just after
> the task_lock(), and proceeded to clear the mm->membarrier_state and
> runqueue membarrier state. And then the tsk->mm is set *after* the smp_mb().
> 
> So from this commit message we could be led to think there was something
> wrong before, but I do not think it's true. This first part of the proposed
> change is merely a performance optimization that removes a useless memory
> barrier on architectures where smp_mb__after_spinlock() is a no-op, and
> removes a useless store to mm->membarrier_state because it is already
> zero-initialized. This is all very nice, but does not belong in a "Fix" patch.
> 
> > Add the barrier in kthread_use_mm() -- it was entirely
> > missing before.
> 
> This is correct. This second part of the patch is indeed a relevant fix.

However, this adds a useless barrier for CONFIG_MEMBARRIER=n.
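
One way to avoid that (a sketch only; the helper name below is made up and
is not part of the series) would be to hide the barrier behind a
membarrier-specific helper that compiles away for CONFIG_MEMBARRIER=n:

#ifdef CONFIG_MEMBARRIER
static inline void membarrier_mm_switch_barrier(void)
{
	/* membarrier() requires a full barrier before switching mm. */
	smp_mb__after_spinlock();
}
#else
static inline void membarrier_mm_switch_barrier(void)
{
}
#endif

exec_mmap() and kthread_use_mm() would then call
membarrier_mm_switch_barrier() right after task_lock() instead of
open-coding smp_mb__after_spinlock().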

Thanks,

Mathieu


> 
> Thanks,
> 
> Mathieu
> 
> > 
> > This patch makes exec_mmap() and kthread_use_mm() use the same membarrier
> > hooks, which results in some code deletion.
> > 
> > As an added bonus, this will eliminate a redundant barrier in execve() on
> > arches for which spinlock acquisition is a barrier.
> > 
> > Signed-off-by: Andy Lutomirski <luto@kernel.org>
> > ---
> > fs/exec.c                 |  6 +++++-
> > include/linux/sched/mm.h  |  2 --
> > kernel/kthread.c          |  5 +++++
> > kernel/sched/membarrier.c | 15 ---------------
> > 4 files changed, 10 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 38b05e01c5bd..325dab98bc51 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1001,12 +1001,16 @@ static int exec_mmap(struct mm_struct *mm)
> > 	}
> > 
> > 	task_lock(tsk);
> > -	membarrier_exec_mmap(mm);
> > +	/*
> > +	 * membarrier() requires a full barrier before switching mm.
> > +	 */
> > +	smp_mb__after_spinlock();
> > 
> > 	local_irq_disable();
> > 	active_mm = tsk->active_mm;
> > 	tsk->active_mm = mm;
> > 	WRITE_ONCE(tsk->mm, mm);  /* membarrier reads this without locks */
> > +	membarrier_update_current_mm(mm);
> > 	/*
> > 	 * This prevents preemption while active_mm is being loaded and
> > 	 * it and mm are being updated, which could cause problems for
> > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > index e107f292fc42..f1d2beac464c 100644
> > --- a/include/linux/sched/mm.h
> > +++ b/include/linux/sched/mm.h
> > @@ -344,8 +344,6 @@ enum {
> > #include <asm/membarrier.h>
> > #endif
> > 
> > -extern void membarrier_exec_mmap(struct mm_struct *mm);
> > -
> > extern void membarrier_update_current_mm(struct mm_struct *next_mm);
> > 
> > /*
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index 3b18329f885c..18b0a2e0e3b2 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -1351,6 +1351,11 @@ void kthread_use_mm(struct mm_struct *mm)
> > 	WARN_ON_ONCE(tsk->mm);
> > 
> > 	task_lock(tsk);
> > +	/*
> > +	 * membarrier() requires a full barrier before switching mm.
> > +	 */
> > +	smp_mb__after_spinlock();
> > +
> > 	/* Hold off tlb flush IPIs while switching mm's */
> > 	local_irq_disable();
> > 	active_mm = tsk->active_mm;
> > diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> > index c38014c2ed66..44fafa6e1efd 100644
> > --- a/kernel/sched/membarrier.c
> > +++ b/kernel/sched/membarrier.c
> > @@ -277,21 +277,6 @@ static void ipi_sync_rq_state(void *info)
> > 	smp_mb();
> > }
> > 
> > -void membarrier_exec_mmap(struct mm_struct *mm)
> > -{
> > -	/*
> > -	 * Issue a memory barrier before clearing membarrier_state to
> > -	 * guarantee that no memory access prior to exec is reordered after
> > -	 * clearing this state.
> > -	 */
> > -	smp_mb();
> > -	/*
> > -	 * Keep the runqueue membarrier_state in sync with this mm
> > -	 * membarrier_state.
> > -	 */
> > -	this_cpu_write(runqueues.membarrier_state, 0);
> > -}
> > -
> > void membarrier_update_current_mm(struct mm_struct *next_mm)
> > {
> > 	struct rq *rq = this_rq();
> > --
> > 2.33.1
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

end of thread, other threads:[~2022-01-12 17:08 UTC | newest]

Thread overview: 79+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
2022-01-08 16:43 ` [PATCH 01/23] membarrier: Document why membarrier() works Andy Lutomirski
2022-01-12 15:30   ` Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 02/23] x86/mm: Handle unlazying membarrier core sync in the arch code Andy Lutomirski
2022-01-12 15:40   ` Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 03/23] membarrier: Remove membarrier_arch_switch_mm() prototype in core code Andy Lutomirski
2022-01-08 16:43 ` [PATCH 04/23] membarrier: Make the post-switch-mm barrier explicit Andy Lutomirski
2022-01-12 15:52   ` Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 05/23] membarrier, kthread: Use _ONCE accessors for task->mm Andy Lutomirski
2022-01-12 15:55   ` Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 06/23] powerpc/membarrier: Remove special barrier on mm switch Andy Lutomirski
2022-01-08 16:43   ` Andy Lutomirski
2022-01-10  8:42   ` Christophe Leroy
2022-01-10  8:42     ` Christophe Leroy
2022-01-12 15:57   ` Mathieu Desnoyers
2022-01-12 15:57     ` Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 07/23] membarrier: Rewrite sync_core_before_usermode() and improve documentation Andy Lutomirski
2022-01-08 16:43   ` Andy Lutomirski
2022-01-08 16:43   ` Andy Lutomirski
2022-01-12 16:11   ` Mathieu Desnoyers
2022-01-12 16:11     ` Mathieu Desnoyers
2022-01-12 16:11     ` Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 08/23] membarrier: Remove redundant clear of mm->membarrier_state in exec_mmap() Andy Lutomirski
2022-01-12 16:13   ` Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 09/23] membarrier: Fix incorrect barrier positions during exec and kthread_use_mm() Andy Lutomirski
2022-01-12 16:30   ` Mathieu Desnoyers
2022-01-12 17:08     ` Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 10/23] x86/events, x86/insn-eval: Remove incorrect active_mm references Andy Lutomirski
2022-01-08 16:43 ` [PATCH 11/23] sched/scs: Initialize shadow stack on idle thread bringup, not shutdown Andy Lutomirski
2022-01-10 22:06   ` Sami Tolvanen
2022-01-08 16:43 ` [PATCH 12/23] Rework "sched/core: Fix illegal RCU from offline CPUs" Andy Lutomirski
2022-01-08 16:43 ` [PATCH 13/23] exec: Remove unnecessary vmacache_seqnum clear in exec_mmap() Andy Lutomirski
2022-01-08 16:43 ` [PATCH 14/23] sched, exec: Factor current mm changes out from exec Andy Lutomirski
2022-01-08 16:44 ` [PATCH 15/23] kthread: Switch to __change_current_mm() Andy Lutomirski
2022-01-08 16:44 ` [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms Andy Lutomirski
2022-01-08 19:22   ` Linus Torvalds
2022-01-08 22:04     ` Andy Lutomirski
2022-01-09  0:27       ` Linus Torvalds
2022-01-09  0:53       ` Linus Torvalds
2022-01-09  3:58         ` Andy Lutomirski
2022-01-09  4:38           ` Linus Torvalds
2022-01-09 20:19             ` Andy Lutomirski
2022-01-09 20:48               ` Linus Torvalds
2022-01-09 21:51                 ` Linus Torvalds
2022-01-10  0:52                   ` Andy Lutomirski
2022-01-10  2:36                     ` Rik van Riel
2022-01-10  3:51                       ` Linus Torvalds
2022-01-10  4:56                   ` Nicholas Piggin
2022-01-10  5:17                     ` Nicholas Piggin
2022-01-10 17:19                       ` Linus Torvalds
2022-01-11  2:24                         ` Nicholas Piggin
2022-01-10 20:52                     ` Andy Lutomirski
2022-01-11  3:10                       ` Nicholas Piggin
2022-01-11 15:39                         ` Andy Lutomirski
2022-01-11 22:48                           ` Nicholas Piggin
2022-01-12  0:42                             ` Nicholas Piggin
2022-01-11 10:39                 ` Will Deacon
2022-01-11 15:22                   ` Andy Lutomirski
2022-01-09  5:56   ` Nadav Amit
2022-01-09  6:48     ` Linus Torvalds
2022-01-09  8:49       ` Nadav Amit
2022-01-09 19:10         ` Linus Torvalds
2022-01-09 19:52           ` Andy Lutomirski
2022-01-09 20:00             ` Linus Torvalds
2022-01-09 20:34             ` Nadav Amit
2022-01-09 20:48               ` Andy Lutomirski
2022-01-09 19:22         ` Rik van Riel
2022-01-09 19:34           ` Nadav Amit
2022-01-09 19:37             ` Rik van Riel
2022-01-09 19:51               ` Nadav Amit
2022-01-09 19:54                 ` Linus Torvalds
2022-01-08 16:44 ` [PATCH 17/23] x86/mm: Make use/unuse_temporary_mm() non-static Andy Lutomirski
2022-01-08 16:44 ` [PATCH 18/23] x86/mm: Allow temporary mms when IRQs are on Andy Lutomirski
2022-01-08 16:44 ` [PATCH 19/23] x86/efi: Make efi_enter/leave_mm use the temporary_mm machinery Andy Lutomirski
2022-01-10 13:13   ` Ard Biesheuvel
2022-01-08 16:44 ` [PATCH 20/23] x86/mm: Remove leave_mm() in favor of unlazy_mm_irqs_off() Andy Lutomirski
2022-01-08 16:44 ` [PATCH 21/23] x86/mm: Use unlazy_mm_irqs_off() in TLB flush IPIs Andy Lutomirski
2022-01-08 16:44 ` [PATCH 22/23] x86/mm: Optimize for_each_possible_lazymm_cpu() Andy Lutomirski
2022-01-08 16:44 ` [PATCH 23/23] x86/mm: Opt in to IRQs-off activate_mm() Andy Lutomirski
