* [RFC 00/10] arm64/riscv: Introduce fast kexec reboot
@ 2022-08-22  2:15 ` Pingfan Liu
  0 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, linux-ia64, linux-riscv, linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Steven Price,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld,
	Frederic Weisbecker, Russell King, Catalin Marinas, Will Deacon,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Peter Zijlstra,
	Eric W. Biederman

On an SMP arm64 machine, it may take a long time to kexec-reboot into a
new kernel, and the time grows linearly with the number of CPUs. On an
80-CPU machine it takes about 15 seconds; with this series, the time
drops dramatically to about one second.

*** Current situation 'slow kexec reboot' ***

At present, some architectures rely on smp_shutdown_nonboot_cpus() to
implement "kexec -e". Since smp_shutdown_nonboot_cpus() tears down the
CPUs serially, it is very slow.
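
Roughly, the serial path looks like this (condensed and slightly
simplified from smp_shutdown_nonboot_cpus() in kernel/cpu.c; locking
and bookkeeping omitted):

	for_each_online_cpu(cpu) {
		if (cpu == primary_cpu)
			continue;
		/* Blocks until this CPU has fully reached CPUHP_OFFLINE */
		error = _cpu_down(cpu, 0, CPUHP_OFFLINE);
		if (error)
			pr_err("Failed to offline CPU%d: %d\n", cpu, error);
	}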

Taking a closer look, the cpu_down() processing for a single CPU can
roughly be divided into two stages:
-1. from CPUHP_ONLINE to CPUHP_TEARDOWN_CPU
-2. from CPUHP_TEARDOWN_CPU to CPUHP_AP_IDLE_DEAD,
    which is driven by stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu))
    and runs on the CPU being torn down.

If these stages could run in parallel across CPUs, the reboot could be
sped up. That is the aim of this series.
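
Pictorially, for one CPU (state names from include/linux/cpuhotplug.h):

	CPUHP_ONLINE
	    |  stage 1: the AP's hotplug thread walks the states down
	    v
	CPUHP_TEARDOWN_CPU
	    |  stage 2: stop_machine_cpuslocked(take_cpu_down, ...),
	    v           running on the dying CPU itself
	CPUHP_AP_IDLE_DEAD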

*** Contrast with other implementations ***

X86 and PowerPC have their own machine_shutdown(), which does not rely
on the CPU hot-removal mechanism. They just single out a few critical
components and tear them down in a per-CPU NMI handler during the kexec
reboot. But for some architectures, say arm64, it is not easy to define
these critical components due to the variety of chipmakers'
implementations.

As a result, sticking to the CPU hot-removal mechanism is the simplest
way to implement the teardown in parallel.


*** Things worthy of consideration ***

1. The definition of a clean boundary between the first kernel and the new kernel
-1.1 firmware
     The firmware should be left in a proper internal state, so that it
can work for the new kernel. This is achieved by the firmware-related
cpuhp_step's teardown interface, if any.

-1.2 CPU internal state
     Whether the caches or the PMU need a clean shutdown before rebooting.

2. The dependencies between cpuhp_steps
   A clean cut involves only a few cpuhp_steps, but these may propagate
to other cpuhp_steps through dependencies. This series does not try to
judge the dependencies; instead, it simply iterates downward through
each cpuhp_step. This strategy demands that each involved cpuhp_step's
teardown procedure support parallelism.
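
To make that demand concrete: a subsystem registers its teardown
(offline) callback through the standard hotplug API, and with this
series such a callback may run on several dying CPUs at once. A minimal
sketch, where subsys_online_cpu(), subsys_offline_cpu() and subsys_lock
are hypothetical names:

	static DEFINE_SPINLOCK(subsys_lock);	/* the subsystem's own lock */

	static int subsys_offline_cpu(unsigned int cpu)
	{
		/*
		 * May run concurrently with other CPUs' teardown, so rely
		 * on the subsystem's locking rather than on hotplug
		 * operations being serialized.
		 */
		spin_lock(&subsys_lock);
		/* retire or migrate this CPU's per-cpu resources here */
		spin_unlock(&subsys_lock);
		return 0;
	}

	/* registration, e.g. at driver init time: */
	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "subsys:online",
				subsys_online_cpu, subsys_offline_cpu);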


*** Solution ***

Ideally, if the interface _cpu_down() can be enhanced to enable
parallelism, then the fast reboot can be achieved.

But revisiting the two parts of the current cpu_down() process, the
second part, stop_machine_cpuslocked(), is a blockade. Packed inside
_cpu_down(), stop_machine_cpuslocked() only allows one CPU at a time to
execute the teardown.

So this series breaks down the process of _cpu_down() and divides the
teardown into three steps:
1. Send each AP from CPUHP_ONLINE to CPUHP_TEARDOWN_CPU
   in parallel.
2. Sync on the BP, waiting for all APs to enter the CPUHP_TEARDOWN_CPU
   state.
3. Send each AP from CPUHP_TEARDOWN_CPU to CPUHP_AP_IDLE_DEAD through
   stop_machine_cpuslocked(), in parallel.

The exposed stop_machine_cpuslocked() can then be used to support
parallelism.

Step 2 is introduced to satisfy the prerequisite that
stop_machine_cpuslocked() can start on all of the dying CPUs at once.

The remaining issue is how to support parallelism in steps 1 and 3.
Fortunately, each subsystem has its own carefully designed locking
scheme. Adhering to the subsystem's locking rules in each cpuhp_step
teardown interface makes things work, as sketched below.
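
A condensed sketch of the resulting flow (the helpers are the ones
introduced by patches 1 and 3; see the diffs below for the real code):

	struct cpuhp_cpu_state *st;
	unsigned int cpu;

	/* Step 1: kick each AP's hotplug thread, without waiting */
	for_each_cpu(cpu, cpus)
		cpuhp_kick_ap_work_async(cpu);	/* target: CPUHP_TEARDOWN_CPU */

	/* Step 2: the BP waits until every AP parks at CPUHP_TEARDOWN_CPU */
	for_each_cpu(cpu, cpus) {
		st = per_cpu_ptr(&cpuhp_state, cpu);
		wait_for_ap_thread(st, st->bringup);
	}

	/* Step 3: a single stop_machine over the whole mask runs
	 * take_cpu_down() on every dying CPU simultaneously */
	stop_machine_cpuslocked(take_cpu_down, NULL, cpus);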


*** No rollback if failure ***

During a kexec reboot, the devices have already been shut down; there
is no way for the system to roll back to a workable state. So this
series does not consider rollback if a failure happens in cpu_down();
it just ventures to move on.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Price <steven.price@arm.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
To: linux-arm-kernel@lists.infradead.org
To: linux-ia64@vger.kernel.org
To: linux-riscv@lists.infradead.org
To: linux-kernel@vger.kernel.org

Pingfan Liu (10):
  cpu/hotplug: Make __cpuhp_kick_ap() ready for async
  cpu/hotplug: Compile smp_shutdown_nonboot_cpus() conditioned on
    CONFIG_SHUTDOWN_NONBOOT_CPUS
  cpu/hotplug: Introduce fast kexec reboot
  cpu/hotplug: Check the capability of kexec quick reboot
  perf/arm-dsu: Make dsu_pmu_cpu_teardown() parallel
  rcu/hotplug: Make rcutree_dead_cpu() parallel
  lib/cpumask: Introduce cpumask_not_dying_but()
  cpuhp: Replace cpumask_any_but(cpu_online_mask, cpu)
  genirq/cpuhotplug: Ask migrate_one_irq() to migrate to a real online
    cpu
  arm64: smp: Make __cpu_disable() parallel

 arch/Kconfig                             |   4 +
 arch/arm/Kconfig                         |   1 +
 arch/arm/mach-imx/mmdc.c                 |   2 +-
 arch/arm/mm/cache-l2x0-pmu.c             |   2 +-
 arch/arm64/Kconfig                       |   1 +
 arch/arm64/kernel/smp.c                  |  31 +++-
 arch/ia64/Kconfig                        |   1 +
 arch/riscv/Kconfig                       |   1 +
 drivers/dma/idxd/perfmon.c               |   2 +-
 drivers/fpga/dfl-fme-perf.c              |   2 +-
 drivers/gpu/drm/i915/i915_pmu.c          |   2 +-
 drivers/perf/arm-cci.c                   |   2 +-
 drivers/perf/arm-ccn.c                   |   2 +-
 drivers/perf/arm-cmn.c                   |   4 +-
 drivers/perf/arm_dmc620_pmu.c            |   2 +-
 drivers/perf/arm_dsu_pmu.c               |  16 +-
 drivers/perf/arm_smmuv3_pmu.c            |   2 +-
 drivers/perf/fsl_imx8_ddr_perf.c         |   2 +-
 drivers/perf/hisilicon/hisi_uncore_pmu.c |   2 +-
 drivers/perf/marvell_cn10k_tad_pmu.c     |   2 +-
 drivers/perf/qcom_l2_pmu.c               |   2 +-
 drivers/perf/qcom_l3_pmu.c               |   2 +-
 drivers/perf/xgene_pmu.c                 |   2 +-
 drivers/soc/fsl/qbman/bman_portal.c      |   2 +-
 drivers/soc/fsl/qbman/qman_portal.c      |   2 +-
 include/linux/cpuhotplug.h               |   2 +
 include/linux/cpumask.h                  |   3 +
 kernel/cpu.c                             | 213 ++++++++++++++++++++---
 kernel/irq/cpuhotplug.c                  |   3 +-
 kernel/rcu/tree.c                        |   3 +-
 lib/cpumask.c                            |  18 ++
 31 files changed, 281 insertions(+), 54 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC 01/10] cpu/hotplug: Make __cpuhp_kick_ap() ready for async
  2022-08-22  2:15 ` Pingfan Liu
@ 2022-08-22  2:15 ` Pingfan Liu
  -1 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Steven Price, Andi Kleen,
	Frederic Weisbecker, Jason A. Donenfeld, Mark Rutland

At present, during a kexec reboot, the teardown of CPUs cannot run in
parallel. As the first step towards parallelism, the initiator needs to
kick the AP threads one by one instead of waiting for each AP thread's
completion.

Change the prototype of __cpuhp_kick_ap() to cope with this demand.
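
A sketch of the two modes after this change, with st =
per_cpu_ptr(&cpuhp_state, cpu) (illustrative only; the real callers are
in the diff below and in patch 3):

	/* Synchronous, as all existing callers behave today: */
	ret = cpuhp_kick_ap(cpu, st, st->target, true);	/* kick and wait */

	/* Asynchronous, for the parallel kexec path: */
	cpuhp_kick_ap(cpu, st, st->target, false);	/* returns at once */
	/* ... later, the initiator collects the completion: */
	wait_for_ap_thread(st, st->bringup);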

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Price <steven.price@arm.com>
Cc: "Peter Zijlstra
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Mark Rutland <mark.rutland@arm.com>
To: linux-kernel@vger.kernel.org
---
 kernel/cpu.c | 41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index bbad5e375d3b..338e1d426c7e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -526,7 +526,7 @@ cpuhp_reset_state(int cpu, struct cpuhp_cpu_state *st,
 }
 
 /* Regular hotplug invocation of the AP hotplug thread */
-static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
+static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st, bool sync)
 {
 	if (!st->single && st->state == st->target)
 		return;
@@ -539,20 +539,22 @@ static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
 	smp_mb();
 	st->should_run = true;
 	wake_up_process(st->thread);
-	wait_for_ap_thread(st, st->bringup);
+	if (sync)
+		wait_for_ap_thread(st, st->bringup);
 }
 
 static int cpuhp_kick_ap(int cpu, struct cpuhp_cpu_state *st,
-			 enum cpuhp_state target)
+		enum cpuhp_state target, bool sync)
 {
 	enum cpuhp_state prev_state;
 	int ret;
 
 	prev_state = cpuhp_set_state(cpu, st, target);
-	__cpuhp_kick_ap(st);
-	if ((ret = st->result)) {
+	__cpuhp_kick_ap(st, sync);
+	ret = st->result;
+	if (sync && ret) {
 		cpuhp_reset_state(cpu, st, prev_state);
-		__cpuhp_kick_ap(st);
+		__cpuhp_kick_ap(st, true);
 	}
 
 	return ret;
@@ -583,7 +585,7 @@ static int bringup_wait_for_ap(unsigned int cpu)
 	if (st->target <= CPUHP_AP_ONLINE_IDLE)
 		return 0;
 
-	return cpuhp_kick_ap(cpu, st, st->target);
+	return cpuhp_kick_ap(cpu, st, st->target, true);
 }
 
 static int bringup_cpu(unsigned int cpu)
@@ -835,7 +837,7 @@ cpuhp_invoke_ap_callback(int cpu, enum cpuhp_state state, bool bringup,
 	st->cb_state = state;
 	st->single = true;
 
-	__cpuhp_kick_ap(st);
+	__cpuhp_kick_ap(st, true);
 
 	/*
 	 * If we failed and did a partial, do a rollback.
@@ -844,7 +846,7 @@ cpuhp_invoke_ap_callback(int cpu, enum cpuhp_state state, bool bringup,
 		st->rollback = true;
 		st->bringup = !bringup;
 
-		__cpuhp_kick_ap(st);
+		__cpuhp_kick_ap(st, true);
 	}
 
 	/*
@@ -868,12 +870,29 @@ static int cpuhp_kick_ap_work(unsigned int cpu)
 	cpuhp_lock_release(true);
 
 	trace_cpuhp_enter(cpu, st->target, prev_state, cpuhp_kick_ap_work);
-	ret = cpuhp_kick_ap(cpu, st, st->target);
+	ret = cpuhp_kick_ap(cpu, st, st->target, true);
 	trace_cpuhp_exit(cpu, st->state, prev_state, ret);
 
 	return ret;
 }
 
+/* In the async case, trace is meaningless since ret value is not available */
+static int cpuhp_kick_ap_work_async(unsigned int cpu)
+{
+	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+	int ret;
+
+	cpuhp_lock_acquire(false);
+	cpuhp_lock_release(false);
+
+	cpuhp_lock_acquire(true);
+	cpuhp_lock_release(true);
+
+	ret = cpuhp_kick_ap(cpu, st, st->target, false);
+
+	return ret;
+}
+
 static struct smp_hotplug_thread cpuhp_threads = {
 	.store			= &cpuhp_state.thread,
 	.thread_should_run	= cpuhp_should_run,
@@ -1171,7 +1190,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 	if (ret && st->state < prev_state) {
 		if (st->state == CPUHP_TEARDOWN_CPU) {
 			cpuhp_reset_state(cpu, st, prev_state);
-			__cpuhp_kick_ap(st);
+			__cpuhp_kick_ap(st, true);
 		} else {
 			WARN(1, "DEAD callback error for CPU%d", cpu);
 		}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC 02/10] cpu/hotplug: Compile smp_shutdown_nonboot_cpus() conditioned on CONFIG_SHUTDOWN_NONBOOT_CPUS
  2022-08-22  2:15 ` Pingfan Liu
@ 2022-08-22  2:15   ` Pingfan Liu
  -1 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, linux-ia64, linux-riscv, linux-kernel
  Cc: Pingfan Liu, Russell King, Catalin Marinas, Will Deacon,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Peter Zijlstra,
	Eric W. Biederman, Mark Rutland, Marco Elver, Masami Hiramatsu,
	Dan Li, Song Liu, Sami Tolvanen, Arnd Bergmann, Linus Walleij,
	Ard Biesheuvel, Tony Lindgren, Nick Hawkins, John Crispin,
	Geert Uytterhoeven, Andrew Morton, Bjorn Andersson,
	Anshuman Khandual, Thomas Gleixner, Steven Price

Only arm/arm64/ia64/riscv share smp_shutdown_nonboot_cpus(), so compile
this code conditionally on the macro CONFIG_SHUTDOWN_NONBOOT_CPUS.
Later this macro will also bracket the quick kexec reboot code.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Dan Li <ashimida@linux.alibaba.com>
Cc: Song Liu <song@kernel.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Nick Hawkins <nick.hawkins@hpe.com>
Cc: John Crispin <john@phrozen.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Price <steven.price@arm.com>
To: linux-arm-kernel@lists.infradead.org
To: linux-ia64@vger.kernel.org
To: linux-riscv@lists.infradead.org
To: linux-kernel@vger.kernel.org
---
 arch/Kconfig       | 4 ++++
 arch/arm/Kconfig   | 1 +
 arch/arm64/Kconfig | 1 +
 arch/ia64/Kconfig  | 1 +
 arch/riscv/Kconfig | 1 +
 kernel/cpu.c       | 3 +++
 6 files changed, 11 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index f330410da63a..be447537d0f6 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -14,6 +14,10 @@ menu "General architecture-dependent options"
 config CRASH_CORE
 	bool
 
+config SHUTDOWN_NONBOOT_CPUS
+	select KEXEC_CORE
+	bool
+
 config KEXEC_CORE
 	select CRASH_CORE
 	bool
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 87badeae3181..711cfdb4f9f4 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -129,6 +129,7 @@ config ARM
 	select PCI_SYSCALL if PCI
 	select PERF_USE_VMALLOC
 	select RTC_LIB
+	select SHUTDOWN_NONBOOT_CPUS
 	select SYS_SUPPORTS_APM_EMULATION
 	select THREAD_INFO_IN_TASK
 	select HAVE_ARCH_VMAP_STACK if MMU && ARM_HAS_GROUP_RELOCS
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 571cc234d0b3..8c481a0b1829 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -223,6 +223,7 @@ config ARM64
 	select PCI_SYSCALL if PCI
 	select POWER_RESET
 	select POWER_SUPPLY
+	select SHUTDOWN_NONBOOT_CPUS
 	select SPARSE_IRQ
 	select SWIOTLB
 	select SYSCTL_EXCEPTION_TRACE
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 26ac8ea15a9e..8a3ddea97d1b 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -52,6 +52,7 @@ config IA64
 	select ARCH_CLOCKSOURCE_DATA
 	select GENERIC_TIME_VSYSCALL
 	select LEGACY_TIMER_TICK
+	select SHUTDOWN_NONBOOT_CPUS
 	select SWIOTLB
 	select SYSCTL_ARCH_UNALIGN_NO_WARN
 	select HAVE_MOD_ARCH_SPECIFIC
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index ed66c31e4655..02606a48c5ea 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -120,6 +120,7 @@ config RISCV
 	select PCI_MSI if PCI
 	select RISCV_INTC
 	select RISCV_TIMER if RISCV_SBI
+	select SHUTDOWN_NONBOOT_CPUS
 	select SPARSE_IRQ
 	select SYSCTL_EXCEPTION_TRACE
 	select THREAD_INFO_IN_TASK
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 338e1d426c7e..2be6ba811a01 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1258,6 +1258,8 @@ int remove_cpu(unsigned int cpu)
 }
 EXPORT_SYMBOL_GPL(remove_cpu);
 
+#ifdef CONFIG_SHUTDOWN_NONBOOT_CPUS
+
 void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
 {
 	unsigned int cpu;
@@ -1299,6 +1301,7 @@ void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
 
 	cpu_maps_update_done();
 }
+#endif
 
 #else
 #define takedown_cpu		NULL
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC 03/10] cpu/hotplug: Introduce fast kexec reboot
  2022-08-22  2:15 ` Pingfan Liu
@ 2022-08-22  2:15   ` Pingfan Liu
  -1 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, linux-ia64, linux-riscv, linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Steven Price,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld,
	Frederic Weisbecker, Russell King, Catalin Marinas, Will Deacon,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Peter Zijlstra,
	Eric W. Biederman

*** Current situation 'slow kexec reboot' ***

At present, some architectures rely on smp_shutdown_nonboot_cpus() to
implement "kexec -e". Since smp_shutdown_nonboot_cpus() tears down the
CPUs serially, it is very slow.

Taking a closer look, the cpu_down() processing for a single CPU can
roughly be divided into two stages:
-1. from CPUHP_ONLINE to CPUHP_TEARDOWN_CPU
-2. from CPUHP_TEARDOWN_CPU to CPUHP_AP_IDLE_DEAD,
    which is driven by stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu))
    and runs on the CPU being torn down.

If these stages could run in parallel across CPUs, the reboot could be
sped up. That is the aim of this patch.

*** Contrast with other implementations ***

X86 and PowerPC have their own machine_shutdown(), which does not rely
on the CPU hot-removal mechanism. They just single out a few critical
components and tear them down in a per-CPU NMI handler during the kexec
reboot. But for some architectures, say arm64, it is not easy to define
these critical components due to the variety of chipmakers'
implementations.

As a result, sticking to the CPU hot-removal mechanism is the simplest
way to implement the teardown in parallel. It also opens an opportunity
to implement cpu_down() itself in parallel in the future (not done by
this series).

*** Things worthy of consideration ***

1. The definition of a clean boundary between the first kernel and the new kernel
-1.1 firmware
     The firmware should be left in a proper internal state. This is
achieved by the firmware-related cpuhp_step's teardown interface, if
any.

-1.2 CPU internal
     Whether the caches or the PMU need a clean shutdown before rebooting.

2. The dependencies between cpuhp_steps
   A clean cut involves only a few cpuhp_steps, but these may propagate
to other cpuhp_steps through dependencies. This series does not try to
judge the dependencies; instead, it simply iterates downward through
each cpuhp_step. This strategy demands that each cpuhp_step's teardown
interface support parallelism.

*** Solution ***

Ideally, if the interface _cpu_down() can be enhanced to enable
parallelism, then the fast reboot can be achieved.

But revisiting the two parts of the current cpu_down() process, the
second part, stop_machine_cpuslocked(), is a blockade. Packed inside
_cpu_down(), stop_machine_cpuslocked() only allows one CPU at a time to
execute the teardown.

So this patch breaks down the process of _cpu_down() and divides the
teardown into three steps. The exposed stop_machine_cpuslocked() can
then be used to support parallelism.
1. Bring each AP from CPUHP_ONLINE to CPUHP_TEARDOWN_CPU
   in parallel.
2. Sync on the BP, waiting for all APs to enter the CPUHP_TEARDOWN_CPU
   state.
3. Bring each AP from CPUHP_TEARDOWN_CPU to CPUHP_AP_IDLE_DEAD through
   stop_machine_cpuslocked(), in parallel.

Step 2 is introduced to satisfy the prerequisite that
stop_machine_cpuslocked() can start on all of the dying CPUs at once.

The remaining issue is how to support parallelism in steps 1 and 3.
Fortunately, each subsystem has its own carefully designed locking
scheme. Adhering to the subsystem's locking rules in each cpuhp_step
teardown interface makes things work.

*** No rollback if failure ***

During a kexec reboot, the devices have already been shut down; there
is no way for the system to roll back to a workable state. So this
series does not consider rollback.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Price <steven.price@arm.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
To: linux-arm-kernel@lists.infradead.org
To: linux-ia64@vger.kernel.org
To: linux-riscv@lists.infradead.org
To: linux-kernel@vger.kernel.org
---
 kernel/cpu.c | 139 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 129 insertions(+), 10 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 2be6ba811a01..94ab2727d6bb 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1260,10 +1260,125 @@ EXPORT_SYMBOL_GPL(remove_cpu);
 
 #ifdef CONFIG_SHUTDOWN_NONBOOT_CPUS
 
-void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
+/*
+ * Push all of the CPUs down to the state CPUHP_TEARDOWN_CPU.
+ * Since kexec-reboot has already shut down all devices, there is no way to
+ * roll back; the CPUs' teardown likewise performs no rollback and instead
+ * just throws a warning.
+ */
+static void cpus_down_no_rollback(struct cpumask *cpus)
 {
+	struct cpuhp_cpu_state *st;
 	unsigned int cpu;
+
+	/* launch the AP work one by one, but do not wait for completion */
+	for_each_cpu(cpu, cpus) {
+		st = per_cpu_ptr(&cpuhp_state, cpu);
+		/*
+		 * If the current CPU state is in the range of the AP hotplug thread,
+		 * then we need to kick the thread.
+		 */
+		if (st->state > CPUHP_TEARDOWN_CPU) {
+			cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU);
+			/* kick asynchronously so the teardowns run in parallel; there is no rollback */
+			cpuhp_kick_ap_work_async(cpu);
+		}
+	}
+
+	/* wait for all AP work to complete */
+	for_each_cpu(cpu, cpus) {
+		st = per_cpu_ptr(&cpuhp_state, cpu);
+		wait_for_ap_thread(st, st->bringup);
+		if (st->result)
+			pr_warn("cpu %u refuses to offline due to %d\n", cpu, st->result);
+		else if (st->state > CPUHP_TEARDOWN_CPU)
+			pr_warn("cpu %u refuses to offline, state: %d\n", cpu, st->state);
+	}
+}
+
+static int __takedown_cpu_cleanup(unsigned int cpu)
+{
+	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+
+	/*
+	 * The teardown callback for CPUHP_AP_SCHED_STARTING will have removed
+	 * all runnable tasks from the CPU, there's only the idle task left now
+	 * that the migration thread is done doing the stop_machine thing.
+	 *
+	 * Wait for the stop thread to go away.
+	 */
+	wait_for_ap_thread(st, false);
+	BUG_ON(st->state != CPUHP_AP_IDLE_DEAD);
+
+	hotplug_cpu__broadcast_tick_pull(cpu);
+	/* This actually kills the CPU. */
+	__cpu_die(cpu);
+
+	tick_cleanup_dead_cpu(cpu);
+	rcutree_migrate_callbacks(cpu);
+	return 0;
+}
+
+/*
+ * The caller must ensure that all AP hotplug threads are done before calling this function.
+ */
+static void takedown_cpus_no_rollback(struct cpumask *cpus)
+{
+	struct cpuhp_cpu_state *st;
+	unsigned int cpu;
+
+	for_each_cpu(cpu, cpus) {
+		st = per_cpu_ptr(&cpuhp_state, cpu);
+		WARN_ON(st->state != CPUHP_TEARDOWN_CPU);
+		/* takedown_cpu() is not invoked, so set the state manually */
+		st->state = CPUHP_AP_ONLINE;
+		cpuhp_set_state(cpu, st, CPUHP_AP_OFFLINE);
+	}
+
+	irq_lock_sparse();
+	/* ask stopper kthreads to execute take_cpu_down() in parallel */
+	stop_machine_cpuslocked(take_cpu_down, NULL, cpus);
+
+	/* Finally wait for completion and clean up */
+	for_each_cpu(cpu, cpus)
+		__takedown_cpu_cleanup(cpu);
+	irq_unlock_sparse();
+}
+
+static bool check_quick_reboot(void)
+{
+	return false;
+}
+
+static struct cpumask kexec_ap_map;
+
+void smp_shutdown_nonboot_cpus_quick_path(unsigned int primary_cpu)
+{
+	struct cpumask *cpus = &kexec_ap_map;
+	/*
+	 * Prevent other subsystems from accessing __cpu_online_mask; internally,
+	 * __cpu_disable() accesses the bitmap in parallel and needs its own local lock.
+	 */
+	cpus_write_lock();
+
+	cpumask_copy(cpus, cpu_online_mask);
+	cpumask_clear_cpu(primary_cpu, cpus);
+	cpus_down_no_rollback(cpus);
+	takedown_cpus_no_rollback(cpus);
+	/*
+	 * For some subsystems, teardown work remains for the offline CPUs from
+	 * CPUHP_BRINGUP_CPU to CPUHP_OFFLINE. But since none of those callbacks
+	 * interact with hardware or firmware, they have no effect on the new
+	 * kernel, so the cpuhp callbacks in that range are skipped.
+	 */
+
+	cpus_write_unlock();
+}
+
+void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
+{
 	int error;
+	unsigned int cpu;
 
 	cpu_maps_update_begin();
 
@@ -1275,15 +1390,19 @@ void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
 	if (!cpu_online(primary_cpu))
 		primary_cpu = cpumask_first(cpu_online_mask);
 
-	for_each_online_cpu(cpu) {
-		if (cpu == primary_cpu)
-			continue;
-
-		error = cpu_down_maps_locked(cpu, CPUHP_OFFLINE);
-		if (error) {
-			pr_err("Failed to offline CPU%d - error=%d",
-				cpu, error);
-			break;
+	if (check_quick_reboot()) {
+		smp_shutdown_nonboot_cpus_quick_path(primary_cpu);
+	} else {
+		for_each_online_cpu(cpu) {
+			if (cpu == primary_cpu)
+				continue;
+
+			error = cpu_down_maps_locked(cpu, CPUHP_OFFLINE);
+			if (error) {
+				pr_err("Failed to offline CPU%d - error=%d",
+					cpu, error);
+				break;
+			}
 		}
 	}
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC 04/10] cpu/hotplug: Check the capability of kexec quick reboot
  2022-08-22  2:15 ` Pingfan Liu
@ 2022-08-22  2:15   ` Pingfan Liu
  -1 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, linux-ia64, linux-riscv, linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Steven Price,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld,
	Frederic Weisbecker, Russell King, Catalin Marinas, Will Deacon,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Peter Zijlstra,
	Eric W. Biederman

The kexec quick reboot requires each involved cpuhp_step's teardown to
run in parallel.

There are many teardown cpuhp_steps, but not all of them belong to the
arm/arm64/riscv kexec reboot path. So introduce a member
'support_kexec_parallel' in struct cpuhp_step to signal whether the
teardown supports parallel execution or not. If a cpuhp_step is used in
the kexec reboot path, then it needs to support parallelism to enable
the quick reboot.

The function check_quick_reboot() checks all teardown cpuhp_steps and
reports any that lack support.
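
For illustration, a subsystem whose teardown callback is parallel-safe
would opt in right after registering its hotplug state, as the
arm_dsu_pmu patch later in this series does (the foo_* names below are
placeholders, not part of this patch):

  ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "perf/foo:online",
  				foo_cpu_online, foo_cpu_teardown);
  if (ret < 0)
  	return ret;
  foo_cpuhp_state = ret;
  /* the teardown callback tolerates concurrent invocation */
  cpuhp_set_step_parallel(ret);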

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Price <steven.price@arm.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
To: linux-arm-kernel@lists.infradead.org
To: linux-ia64@vger.kernel.org
To: linux-riscv@lists.infradead.org
To: linux-kernel@vger.kernel.org
---
 include/linux/cpuhotplug.h |  2 ++
 kernel/cpu.c               | 28 +++++++++++++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index f61447913db9..73093fc15aec 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -374,6 +374,8 @@ static inline int cpuhp_setup_state_multi(enum cpuhp_state state,
 				   (void *) teardown, true);
 }
 
+void cpuhp_set_step_parallel(enum cpuhp_state state);
+
 int __cpuhp_state_add_instance(enum cpuhp_state state, struct hlist_node *node,
 			       bool invoke);
 int __cpuhp_state_add_instance_cpuslocked(enum cpuhp_state state,
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 94ab2727d6bb..1261c3f3be51 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -137,6 +137,9 @@ struct cpuhp_step {
 	/* public: */
 	bool			cant_stop;
 	bool			multi_instance;
+#ifdef CONFIG_SHUTDOWN_NONBOOT_CPUS
+	bool			support_kexec_parallel;
+#endif
 };
 
 static DEFINE_MUTEX(cpuhp_state_mutex);
@@ -147,6 +150,14 @@ static struct cpuhp_step *cpuhp_get_step(enum cpuhp_state state)
 	return cpuhp_hp_states + state;
 }
 
+#ifdef CONFIG_SHUTDOWN_NONBOOT_CPUS
+void cpuhp_set_step_parallel(enum cpuhp_state state)
+{
+	cpuhp_hp_states[state].support_kexec_parallel = true;
+}
+EXPORT_SYMBOL(cpuhp_set_step_parallel);
+#endif
+
 static bool cpuhp_step_empty(bool bringup, struct cpuhp_step *step)
 {
 	return bringup ? !step->startup.single : !step->teardown.single;
@@ -1347,7 +1358,22 @@ static void takedown_cpus_no_rollback(struct cpumask *cpus)
 
 static bool check_quick_reboot(void)
 {
-	return false;
+	struct cpuhp_step *step;
+	enum cpuhp_state state;
+	bool ret = true;
+
+	for (state = CPUHP_ONLINE; state >= CPUHP_AP_OFFLINE; state--) {
+		step = cpuhp_get_step(state);
+		if (step->teardown.single == NULL)
+			continue;
+		if (step->support_kexec_parallel == false) {
+			pr_info("cpuhp state:%d, %s, does not support cpudown in parallel\n",
+					state, step->name);
+			ret = false;
+		}
+	}
+
+	return ret;
 }
 
 static struct cpumask kexec_ap_map;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread


* [RFC 05/10] perf/arm-dsu: Make dsu_pmu_cpu_teardown() parallel
  2022-08-22  2:15 ` Pingfan Liu
@ 2022-08-22  2:15   ` Pingfan Liu
  -1 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel; +Cc: Pingfan Liu, Will Deacon, Mark Rutland

In the case of the kexec quick reboot, dsu_pmu_cpu_teardown() may run
on several CPUs concurrently, and a lock is needed to protect against
contention on a dsu_pmu.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
To: linux-arm-kernel@lists.infradead.org
To: linux-kernel@vger.kernel.org
---
 drivers/perf/arm_dsu_pmu.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index a36698a90d2f..aa9f4393ff0c 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -833,16 +833,23 @@ static int dsu_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
 	struct dsu_pmu *dsu_pmu = hlist_entry_safe(node, struct dsu_pmu,
 						   cpuhp_node);
 
-	if (!cpumask_test_and_clear_cpu(cpu, &dsu_pmu->active_cpu))
+	raw_spin_lock(&dsu_pmu->pmu_lock);
+	if (!cpumask_test_and_clear_cpu(cpu, &dsu_pmu->active_cpu)) {
+		raw_spin_unlock(&dsu_pmu->pmu_lock);
 		return 0;
+	}
 
 	dst = dsu_pmu_get_online_cpu_any_but(dsu_pmu, cpu);
 	/* If there are no active CPUs in the DSU, leave IRQ disabled */
-	if (dst >= nr_cpu_ids)
+	if (dst >= nr_cpu_ids) {
+		raw_spin_unlock(&dsu_pmu->pmu_lock);
 		return 0;
+	}
 
-	perf_pmu_migrate_context(&dsu_pmu->pmu, cpu, dst);
+	/* dst must not be in the dying mask; set it before unlocking to block parallel teardowns */
 	dsu_pmu_set_active_cpu(dst, dsu_pmu);
+	raw_spin_unlock(&dsu_pmu->pmu_lock);
+	perf_pmu_migrate_context(&dsu_pmu->pmu, cpu, dst);
 
 	return 0;
 }
@@ -858,6 +865,7 @@ static int __init dsu_pmu_init(void)
 	if (ret < 0)
 		return ret;
 	dsu_pmu_cpuhp_state = ret;
+	cpuhp_set_step_parallel(ret);
 	return platform_driver_register(&dsu_pmu_driver);
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread


* [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-22  2:15 ` Pingfan Liu
                   ` (7 preceding siblings ...)
@ 2022-08-22  2:15 ` Pingfan Liu
  2022-08-22  2:45   ` Paul E. McKenney
                     ` (2 more replies)
  -1 siblings, 3 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-kernel, rcu
  Cc: Pingfan Liu, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Thomas Gleixner, Steven Price, Mark Rutland,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld

In order to support parallel teardown, rcu_state.n_online_cpus should
be decremented with atomic_dec().
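
The plain store is part of a non-atomic read-modify-write, which can
lose a decrement once several CPUs die concurrently, for example:

  /*
   * CPU A: reads  n_online_cpus == 8    CPU B: reads  n_online_cpus == 8
   * CPU A: writes n_online_cpus == 7    CPU B: writes n_online_cpus == 7
   *                                            => one decrement is lost
   * atomic_dec() makes the read-modify-write indivisible.
   */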

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Price <steven.price@arm.com>
Cc: "Peter Zijlstra
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: linux-kernel@vger.kernel.org
To: rcu@vger.kernel.org
---
 kernel/cpu.c      | 1 +
 kernel/rcu/tree.c | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 1261c3f3be51..90debbe28e85 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1872,6 +1872,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
 		.name			= "RCU/tree:prepare",
 		.startup.single		= rcutree_prepare_cpu,
 		.teardown.single	= rcutree_dead_cpu,
+		.support_kexec_parallel	= true,
 	},
 	/*
 	 * On the tear-down path, timers_dead_cpu() must be invoked
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79aea7df4345..07d31e16c65e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2168,7 +2168,8 @@ int rcutree_dead_cpu(unsigned int cpu)
 	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
 		return 0;
 
-	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
+	/* Hot-remove may run in parallel; hot-add serializes against remove via the hotplug lock */
+	atomic_dec((atomic_t *)&rcu_state.n_online_cpus);
 	/* Adjust any no-longer-needed kthreads. */
 	rcu_boost_kthread_setaffinity(rnp, -1);
 	// Stop-machine done, so allow nohz_full to disable tick.
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC 07/10] lib/cpumask: Introduce cpumask_not_dying_but()
  2022-08-22  2:15 ` Pingfan Liu
                   ` (8 preceding siblings ...)
  (?)
@ 2022-08-22  2:15 ` Pingfan Liu
  2022-08-22 14:15   ` Yury Norov
  -1 siblings, 1 reply; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Yury Norov, Andy Shevchenko, Rasmus Villemoes,
	Thomas Gleixner, Steven Price, Mark Rutland, Jason A. Donenfeld,
	Kuppuswamy Sathyanarayanan

During cpu hot-removal, the dying cpus are still in cpu_online_mask.
Meanwhile, a subsystem migrates its broker from the dying cpu to an
online cpu in its teardown cpuhp_step.

Once cpus are torn down in parallel, cpu_online_mask can no longer
distinguish the dying cpus from the truly online ones.

Introduce cpumask_not_dying_but() to pick a truly online cpu.
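
As a sketch of the difference (assuming cpus 0-3 are in
cpu_online_mask and cpus 0-1 are both being torn down in parallel, so
both sit in cpu_dying_mask):

    /*
     * Teardown callback running for cpu 0:
     *   cpumask_any_but(cpu_online_mask, 0)        -> 1  (dying!)
     *   cpumask_not_dying_but(cpu_online_mask, 0)  -> 2  (truly online)
     */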

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Price <steven.price@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
To: linux-kernel@vger.kernel.org
---
 include/linux/cpumask.h |  3 +++
 kernel/cpu.c            |  3 +++
 lib/cpumask.c           | 18 ++++++++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 0d435d0edbcb..d2033a239a07 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -317,6 +317,9 @@ unsigned int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
 	return i;
 }
 
+/* for parallel kexec reboot */
+int cpumask_not_dying_but(const struct cpumask *mask, unsigned int cpu);
+
 #define CPU_BITS_NONE						\
 {								\
 	[0 ... BITS_TO_LONGS(NR_CPUS)-1] = 0UL			\
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 90debbe28e85..771e344f8ff9 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1282,6 +1282,9 @@ static void cpus_down_no_rollback(struct cpumask *cpus)
 	struct cpuhp_cpu_state *st;
 	unsigned int cpu;
 
+	for_each_cpu(cpu, cpus)
+		set_cpu_dying(cpu, true);
+
 	/* launch ap work one by one, without waiting for completion */
 	for_each_cpu(cpu, cpus) {
 		st = per_cpu_ptr(&cpuhp_state, cpu);
diff --git a/lib/cpumask.c b/lib/cpumask.c
index 8baeb37e23d3..6474f07ed87a 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -7,6 +7,24 @@
 #include <linux/memblock.h>
 #include <linux/numa.h>
 
+/* Used in parallel kexec-reboot cpuhp callbacks */
+int cpumask_not_dying_but(const struct cpumask *mask,
+					   unsigned int cpu)
+{
+	unsigned int i;
+
+	if (IS_ENABLED(CONFIG_SHUTDOWN_NONBOOT_CPUS)) {
+		cpumask_check(cpu);
+		for_each_cpu(i, mask)
+			if (i != cpu && !cpumask_test_cpu(i, cpu_dying_mask))
+				break;
+		return i;
+	} else {
+		return cpumask_any_but(mask, cpu);
+	}
+}
+EXPORT_SYMBOL(cpumask_not_dying_but);
+
 /**
  * cpumask_next_wrap - helper to implement for_each_cpu_wrap
  * @n: the cpu prior to the place to search
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC 08/10] cpuhp: Replace cpumask_any_but(cpu_online_mask, cpu)
  2022-08-22  2:15 ` Pingfan Liu
                     ` (2 preceding siblings ...)
  (?)
@ 2022-08-22  2:15   ` Pingfan Liu
  -1 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, dmaengine, linux-fpga, intel-gfx, dri-devel,
	linux-arm-msm, linuxppc-dev, linux-kernel
  Cc: Pingfan Liu, Russell King, Shawn Guo, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, NXP Linux Team,
	Fenghua Yu, Dave Jiang, Vinod Koul, Wu Hao, Tom Rix,
	Moritz Fischer, Xu Yilun, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, David Airlie, Daniel Vetter,
	Will Deacon, Mark Rutland, Frank Li, Shaokun Zhang, Qi Liu,
	Andy Gross, Bjorn Andersson, Konrad Dybcio, Khuong Dinh, Li Yang,
	Yury Norov

In the kexec quick-reboot path, the dying cpus are still in
cpu_online_mask. During cpu teardown, a subsystem needs to migrate its
broker to a truly online cpu.

Replace cpumask_any_but(cpu_online_mask, cpu) in the teardown
procedures with cpumask_not_dying_but(cpu_online_mask, cpu).
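
As a sketch, the resulting pattern in a typical offline/teardown
callback (struct my_pmu and its fields are hypothetical, not taken
from the patches below; hlist_entry_safe() and
perf_pmu_migrate_context() are the usual kernel helpers):

    static int my_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
    {
    	struct my_pmu *pmu = hlist_entry_safe(node, struct my_pmu, node);
    	unsigned int target;

    	if (cpu != pmu->cpu)
    		return 0;

    	/* Avoid migrating the context to a cpu that is itself dying. */
    	target = cpumask_not_dying_but(cpu_online_mask, cpu);
    	if (target >= nr_cpu_ids)	/* no valid target left */
    		return 0;

    	perf_pmu_migrate_context(&pmu->pmu, cpu, target);
    	pmu->cpu = target;
    	return 0;
    }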

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: Wu Hao <hao.wu@intel.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Moritz Fischer <mdf@kernel.org>
Cc: Xu Yilun <yilun.xu@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Frank Li <Frank.li@nxp.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Qi Liu <liuqi115@huawei.com>
Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Konrad Dybcio <konrad.dybcio@somainline.org>
Cc: Khuong Dinh <khuong@os.amperecomputing.com>
Cc: Li Yang <leoyang.li@nxp.com>
Cc: Yury Norov <yury.norov@gmail.com>
To: linux-arm-kernel@lists.infradead.org
To: dmaengine@vger.kernel.org
To: linux-fpga@vger.kernel.org
To: intel-gfx@lists.freedesktop.org
To: dri-devel@lists.freedesktop.org
To: linux-arm-msm@vger.kernel.org
To: linuxppc-dev@lists.ozlabs.org
To: linux-kernel@vger.kernel.org
---
 arch/arm/mach-imx/mmdc.c                 | 2 +-
 arch/arm/mm/cache-l2x0-pmu.c             | 2 +-
 drivers/dma/idxd/perfmon.c               | 2 +-
 drivers/fpga/dfl-fme-perf.c              | 2 +-
 drivers/gpu/drm/i915/i915_pmu.c          | 2 +-
 drivers/perf/arm-cci.c                   | 2 +-
 drivers/perf/arm-ccn.c                   | 2 +-
 drivers/perf/arm-cmn.c                   | 4 ++--
 drivers/perf/arm_dmc620_pmu.c            | 2 +-
 drivers/perf/arm_dsu_pmu.c               | 2 +-
 drivers/perf/arm_smmuv3_pmu.c            | 2 +-
 drivers/perf/fsl_imx8_ddr_perf.c         | 2 +-
 drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
 drivers/perf/marvell_cn10k_tad_pmu.c     | 2 +-
 drivers/perf/qcom_l2_pmu.c               | 2 +-
 drivers/perf/qcom_l3_pmu.c               | 2 +-
 drivers/perf/xgene_pmu.c                 | 2 +-
 drivers/soc/fsl/qbman/bman_portal.c      | 2 +-
 drivers/soc/fsl/qbman/qman_portal.c      | 2 +-
 19 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index af12668d0bf5..a109a7ea8613 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -220,7 +220,7 @@ static int mmdc_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (!cpumask_test_and_clear_cpu(cpu, &pmu_mmdc->cpu))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index 993fefdc167a..1b0037ef7fa5 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -428,7 +428,7 @@ static int l2x0_pmu_offline_cpu(unsigned int cpu)
 	if (!cpumask_test_and_clear_cpu(cpu, &pmu_cpu))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index d73004f47cf4..f3f1ccb55f73 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -528,7 +528,7 @@ static int perf_event_cpu_offline(unsigned int cpu, struct hlist_node *node)
 	if (!cpumask_test_and_clear_cpu(cpu, &perfmon_dsa_cpu_mask))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 
 	/* migrate events if there is a valid target */
 	if (target < nr_cpu_ids)
diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
index 587c82be12f7..57804f28357e 100644
--- a/drivers/fpga/dfl-fme-perf.c
+++ b/drivers/fpga/dfl-fme-perf.c
@@ -948,7 +948,7 @@ static int fme_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != priv->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 958b37123bf1..f866f9223492 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1068,7 +1068,7 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
 		return 0;
 
 	if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
-		target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
+		target = cpumask_not_dying_but(topology_sibling_cpumask(cpu), cpu);
 
 		/* Migrate events if there is a valid target */
 		if (target < nr_cpu_ids) {
diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 03b1309875ae..481da937fb9d 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1447,7 +1447,7 @@ static int cci_pmu_offline_cpu(unsigned int cpu)
 	if (!g_cci_pmu || cpu != g_cci_pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 728d13d8e98a..573d6906ec9b 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -1205,7 +1205,7 @@ static int arm_ccn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
 	if (cpu != dt->cpu)
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 	perf_pmu_migrate_context(&dt->pmu, cpu, target);
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 80d8309652a4..1847182a1ed3 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1787,9 +1787,9 @@ static int arm_cmn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_no
 	node = dev_to_node(cmn->dev);
 	if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
 	    cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
-		target = cpumask_any(&mask);
+		target = cpumask_not_dying_but(&mask, cpu);
 	else
-		target = cpumask_any_but(cpu_online_mask, cpu);
+		target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target < nr_cpu_ids)
 		arm_cmn_migrate(cmn, target);
 	return 0;
diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 280a6ae3e27c..3a0a2bb92e12 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -611,7 +611,7 @@ static int dmc620_pmu_cpu_teardown(unsigned int cpu,
 	if (cpu != irq->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index aa9f4393ff0c..e19ce0406b02 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -236,7 +236,7 @@ static int dsu_pmu_get_online_cpu_any_but(struct dsu_pmu *dsu_pmu, int cpu)
 
 	cpumask_and(&online_supported,
 			 &dsu_pmu->associated_cpus, cpu_online_mask);
-	return cpumask_any_but(&online_supported, cpu);
+	return cpumask_not_dying_but(&online_supported, cpu);
 }
 
 static inline bool dsu_pmu_counter_valid(struct dsu_pmu *dsu_pmu, u32 idx)
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 00d4c45a8017..827315d31056 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -640,7 +640,7 @@ static int smmu_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != smmu_pmu->on_cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c
index 8e058e08fe81..4e0276fc1548 100644
--- a/drivers/perf/fsl_imx8_ddr_perf.c
+++ b/drivers/perf/fsl_imx8_ddr_perf.c
@@ -664,7 +664,7 @@ static int ddr_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index fbc8a93d5eac..8c39da8f4b3c 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -518,7 +518,7 @@ int hisi_uncore_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	/* Choose a new CPU to migrate ownership of the PMU to */
 	cpumask_and(&pmu_online_cpus, &hisi_pmu->associated_cpus,
 		    cpu_online_mask);
-	target = cpumask_any_but(&pmu_online_cpus, cpu);
+	target = cpumask_not_dying_but(&pmu_online_cpus, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
index 69c3050a4348..268e3288893d 100644
--- a/drivers/perf/marvell_cn10k_tad_pmu.c
+++ b/drivers/perf/marvell_cn10k_tad_pmu.c
@@ -387,7 +387,7 @@ static int tad_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 30234c261b05..8823d0bb6476 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -822,7 +822,7 @@ static int l2cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	/* Any other CPU for this cluster which is still online */
 	cpumask_and(&cluster_online_cpus, &cluster->cluster_cpus,
 		    cpu_online_mask);
-	target = cpumask_any_but(&cluster_online_cpus, cpu);
+	target = cpumask_not_dying_but(&cluster_online_cpus, cpu);
 	if (target >= nr_cpu_ids) {
 		disable_irq(cluster->irq);
 		return 0;
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 1ff2ff6582bf..ba26b2fa0736 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -718,7 +718,7 @@ static int qcom_l3_cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *no
 
 	if (!cpumask_test_and_clear_cpu(cpu, &l3pmu->cpumask))
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 	perf_pmu_migrate_context(&l3pmu->pmu, cpu, target);
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 0c32dffc7ede..069eb0a0d3ba 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -1804,7 +1804,7 @@ static int xgene_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
 	if (!cpumask_test_and_clear_cpu(cpu, &xgene_pmu->cpu))
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index 4d7b9caee1c4..8ebcf87e7d06 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -67,7 +67,7 @@ static int bman_offline_cpu(unsigned int cpu)
 		return 0;
 
 	/* use any other online CPU */
-	cpu = cpumask_any_but(cpu_online_mask, cpu);
+	cpu = cpumask_not_dying_but(cpu_online_mask, cpu);
 	irq_set_affinity(pcfg->irq, cpumask_of(cpu));
 	return 0;
 }
diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index e23b60618c1a..3807a8285ced 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -148,7 +148,7 @@ static int qman_offline_cpu(unsigned int cpu)
 		pcfg = qman_get_qm_portal_config(p);
 		if (pcfg) {
 			/* select any other online CPU */
-			cpu = cpumask_any_but(cpu_online_mask, cpu);
+			cpu = cpumask_not_dying_but(cpu_online_mask, cpu);
 			irq_set_affinity(pcfg->irq, cpumask_of(cpu));
 			qman_portal_update_sdest(pcfg, cpu);
 		}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC 08/10] cpuhp: Replace cpumask_any_but(cpu_online_mask, cpu)
@ 2022-08-22  2:15   ` Pingfan Liu
  0 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, dmaengine, linux-fpga, intel-gfx, dri-devel,
	linux-arm-msm, linuxppc-dev, linux-kernel
  Cc: Mark Rutland, David Airlie, Tom Rix, Joonas Lahtinen, Frank Li,
	Pingfan Liu, Bjorn Andersson, Will Deacon, Khuong Dinh,
	Dave Jiang, Fabio Estevam, Russell King, Andy Gross,
	NXP Linux Team, Qi Liu, Wu Hao, Fenghua Yu,
	Pengutronix Kernel Team, Yury Norov, Konrad Dybcio, Sascha Hauer,
	Jani Nikula, Shaokun Zhang, Moritz Fischer, Rodrigo Vivi,
	Tvrtko Ursulin, Li Yang

In a kexec quick reboot path, the dying cpus are still on
cpu_online_mask. During the teardown of cpu, a subsystem needs to
migrate its broker to a real online cpu.

This patch replaces cpumask_any_but(cpu_online_mask, cpu) in a teardown
procedure with cpumask_not_dying_but(cpu_online_mask, cpu).

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: Wu Hao <hao.wu@intel.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Moritz Fischer <mdf@kernel.org>
Cc: Xu Yilun <yilun.xu@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Frank Li <Frank.li@nxp.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Qi Liu <liuqi115@huawei.com>
Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Konrad Dybcio <konrad.dybcio@somainline.org>
Cc: Khuong Dinh <khuong@os.amperecomputing.com>
Cc: Li Yang <leoyang.li@nxp.com>
Cc: Yury Norov <yury.norov@gmail.com>
To: linux-arm-kernel@lists.infradead.org
To: dmaengine@vger.kernel.org
To: linux-fpga@vger.kernel.org
To: intel-gfx@lists.freedesktop.org
To: dri-devel@lists.freedesktop.org
To: linux-arm-msm@vger.kernel.org
To: linuxppc-dev@lists.ozlabs.org
To: linux-kernel@vger.kernel.org
---
 arch/arm/mach-imx/mmdc.c                 | 2 +-
 arch/arm/mm/cache-l2x0-pmu.c             | 2 +-
 drivers/dma/idxd/perfmon.c               | 2 +-
 drivers/fpga/dfl-fme-perf.c              | 2 +-
 drivers/gpu/drm/i915/i915_pmu.c          | 2 +-
 drivers/perf/arm-cci.c                   | 2 +-
 drivers/perf/arm-ccn.c                   | 2 +-
 drivers/perf/arm-cmn.c                   | 4 ++--
 drivers/perf/arm_dmc620_pmu.c            | 2 +-
 drivers/perf/arm_dsu_pmu.c               | 2 +-
 drivers/perf/arm_smmuv3_pmu.c            | 2 +-
 drivers/perf/fsl_imx8_ddr_perf.c         | 2 +-
 drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
 drivers/perf/marvell_cn10k_tad_pmu.c     | 2 +-
 drivers/perf/qcom_l2_pmu.c               | 2 +-
 drivers/perf/qcom_l3_pmu.c               | 2 +-
 drivers/perf/xgene_pmu.c                 | 2 +-
 drivers/soc/fsl/qbman/bman_portal.c      | 2 +-
 drivers/soc/fsl/qbman/qman_portal.c      | 2 +-
 19 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index af12668d0bf5..a109a7ea8613 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -220,7 +220,7 @@ static int mmdc_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (!cpumask_test_and_clear_cpu(cpu, &pmu_mmdc->cpu))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index 993fefdc167a..1b0037ef7fa5 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -428,7 +428,7 @@ static int l2x0_pmu_offline_cpu(unsigned int cpu)
 	if (!cpumask_test_and_clear_cpu(cpu, &pmu_cpu))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index d73004f47cf4..f3f1ccb55f73 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -528,7 +528,7 @@ static int perf_event_cpu_offline(unsigned int cpu, struct hlist_node *node)
 	if (!cpumask_test_and_clear_cpu(cpu, &perfmon_dsa_cpu_mask))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 
 	/* migrate events if there is a valid target */
 	if (target < nr_cpu_ids)
diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
index 587c82be12f7..57804f28357e 100644
--- a/drivers/fpga/dfl-fme-perf.c
+++ b/drivers/fpga/dfl-fme-perf.c
@@ -948,7 +948,7 @@ static int fme_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != priv->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 958b37123bf1..f866f9223492 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1068,7 +1068,7 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
 		return 0;
 
 	if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
-		target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
+		target = cpumask_not_dying_but(topology_sibling_cpumask(cpu), cpu);
 
 		/* Migrate events if there is a valid target */
 		if (target < nr_cpu_ids) {
diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 03b1309875ae..481da937fb9d 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1447,7 +1447,7 @@ static int cci_pmu_offline_cpu(unsigned int cpu)
 	if (!g_cci_pmu || cpu != g_cci_pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 728d13d8e98a..573d6906ec9b 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -1205,7 +1205,7 @@ static int arm_ccn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
 	if (cpu != dt->cpu)
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 	perf_pmu_migrate_context(&dt->pmu, cpu, target);
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 80d8309652a4..1847182a1ed3 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1787,9 +1787,9 @@ static int arm_cmn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_no
 	node = dev_to_node(cmn->dev);
 	if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
 	    cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
-		target = cpumask_any(&mask);
+		target = cpumask_not_dying_but(&mask, cpu);
 	else
-		target = cpumask_any_but(cpu_online_mask, cpu);
+		target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target < nr_cpu_ids)
 		arm_cmn_migrate(cmn, target);
 	return 0;
diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 280a6ae3e27c..3a0a2bb92e12 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -611,7 +611,7 @@ static int dmc620_pmu_cpu_teardown(unsigned int cpu,
 	if (cpu != irq->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index aa9f4393ff0c..e19ce0406b02 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -236,7 +236,7 @@ static int dsu_pmu_get_online_cpu_any_but(struct dsu_pmu *dsu_pmu, int cpu)
 
 	cpumask_and(&online_supported,
 			 &dsu_pmu->associated_cpus, cpu_online_mask);
-	return cpumask_any_but(&online_supported, cpu);
+	return cpumask_not_dying_but(&online_supported, cpu);
 }
 
 static inline bool dsu_pmu_counter_valid(struct dsu_pmu *dsu_pmu, u32 idx)
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 00d4c45a8017..827315d31056 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -640,7 +640,7 @@ static int smmu_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != smmu_pmu->on_cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c
index 8e058e08fe81..4e0276fc1548 100644
--- a/drivers/perf/fsl_imx8_ddr_perf.c
+++ b/drivers/perf/fsl_imx8_ddr_perf.c
@@ -664,7 +664,7 @@ static int ddr_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index fbc8a93d5eac..8c39da8f4b3c 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -518,7 +518,7 @@ int hisi_uncore_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	/* Choose a new CPU to migrate ownership of the PMU to */
 	cpumask_and(&pmu_online_cpus, &hisi_pmu->associated_cpus,
 		    cpu_online_mask);
-	target = cpumask_any_but(&pmu_online_cpus, cpu);
+	target = cpumask_not_dying_but(&pmu_online_cpus, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
index 69c3050a4348..268e3288893d 100644
--- a/drivers/perf/marvell_cn10k_tad_pmu.c
+++ b/drivers/perf/marvell_cn10k_tad_pmu.c
@@ -387,7 +387,7 @@ static int tad_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 30234c261b05..8823d0bb6476 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -822,7 +822,7 @@ static int l2cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	/* Any other CPU for this cluster which is still online */
 	cpumask_and(&cluster_online_cpus, &cluster->cluster_cpus,
 		    cpu_online_mask);
-	target = cpumask_any_but(&cluster_online_cpus, cpu);
+	target = cpumask_not_dying_but(&cluster_online_cpus, cpu);
 	if (target >= nr_cpu_ids) {
 		disable_irq(cluster->irq);
 		return 0;
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 1ff2ff6582bf..ba26b2fa0736 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -718,7 +718,7 @@ static int qcom_l3_cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *no
 
 	if (!cpumask_test_and_clear_cpu(cpu, &l3pmu->cpumask))
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 	perf_pmu_migrate_context(&l3pmu->pmu, cpu, target);
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 0c32dffc7ede..069eb0a0d3ba 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -1804,7 +1804,7 @@ static int xgene_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
 	if (!cpumask_test_and_clear_cpu(cpu, &xgene_pmu->cpu))
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index 4d7b9caee1c4..8ebcf87e7d06 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -67,7 +67,7 @@ static int bman_offline_cpu(unsigned int cpu)
 		return 0;
 
 	/* use any other online CPU */
-	cpu = cpumask_any_but(cpu_online_mask, cpu);
+	cpu = cpumask_not_dying_but(cpu_online_mask, cpu);
 	irq_set_affinity(pcfg->irq, cpumask_of(cpu));
 	return 0;
 }
diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index e23b60618c1a..3807a8285ced 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -148,7 +148,7 @@ static int qman_offline_cpu(unsigned int cpu)
 		pcfg = qman_get_qm_portal_config(p);
 		if (pcfg) {
 			/* select any other online CPU */
-			cpu = cpumask_any_but(cpu_online_mask, cpu);
+			cpu = cpumask_not_dying_but(cpu_online_mask, cpu);
 			irq_set_affinity(pcfg->irq, cpumask_of(cpu));
 			qman_portal_update_sdest(pcfg, cpu);
 		}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC 08/10] cpuhp: Replace cpumask_any_but(cpu_online_mask, cpu)
@ 2022-08-22  2:15   ` Pingfan Liu
  0 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, dmaengine, linux-fpga, intel-gfx, dri-devel,
	linux-arm-msm, linuxppc-dev, linux-kernel
  Cc: Pingfan Liu, Russell King, Shawn Guo, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, NXP Linux Team,
	Fenghua Yu, Dave Jiang, Vinod Koul, Wu Hao, Tom Rix,
	Moritz Fischer, Xu Yilun, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, David Airlie, Daniel Vetter,
	Will Deacon, Mark Rutland, Frank Li, Shaokun Zhang, Qi Liu,
	Andy Gross, Bjorn Andersson, Konrad Dybcio, Khuong Dinh, Li Yang,
	Yury Norov

In a kexec quick reboot path, the dying cpus are still on
cpu_online_mask. During the teardown of cpu, a subsystem needs to
migrate its broker to a real online cpu.

This patch replaces cpumask_any_but(cpu_online_mask, cpu) in a teardown
procedure with cpumask_not_dying_but(cpu_online_mask, cpu).

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: Wu Hao <hao.wu@intel.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Moritz Fischer <mdf@kernel.org>
Cc: Xu Yilun <yilun.xu@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Frank Li <Frank.li@nxp.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Qi Liu <liuqi115@huawei.com>
Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Konrad Dybcio <konrad.dybcio@somainline.org>
Cc: Khuong Dinh <khuong@os.amperecomputing.com>
Cc: Li Yang <leoyang.li@nxp.com>
Cc: Yury Norov <yury.norov@gmail.com>
To: linux-arm-kernel@lists.infradead.org
To: dmaengine@vger.kernel.org
To: linux-fpga@vger.kernel.org
To: intel-gfx@lists.freedesktop.org
To: dri-devel@lists.freedesktop.org
To: linux-arm-msm@vger.kernel.org
To: linuxppc-dev@lists.ozlabs.org
To: linux-kernel@vger.kernel.org
---
 arch/arm/mach-imx/mmdc.c                 | 2 +-
 arch/arm/mm/cache-l2x0-pmu.c             | 2 +-
 drivers/dma/idxd/perfmon.c               | 2 +-
 drivers/fpga/dfl-fme-perf.c              | 2 +-
 drivers/gpu/drm/i915/i915_pmu.c          | 2 +-
 drivers/perf/arm-cci.c                   | 2 +-
 drivers/perf/arm-ccn.c                   | 2 +-
 drivers/perf/arm-cmn.c                   | 4 ++--
 drivers/perf/arm_dmc620_pmu.c            | 2 +-
 drivers/perf/arm_dsu_pmu.c               | 2 +-
 drivers/perf/arm_smmuv3_pmu.c            | 2 +-
 drivers/perf/fsl_imx8_ddr_perf.c         | 2 +-
 drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
 drivers/perf/marvell_cn10k_tad_pmu.c     | 2 +-
 drivers/perf/qcom_l2_pmu.c               | 2 +-
 drivers/perf/qcom_l3_pmu.c               | 2 +-
 drivers/perf/xgene_pmu.c                 | 2 +-
 drivers/soc/fsl/qbman/bman_portal.c      | 2 +-
 drivers/soc/fsl/qbman/qman_portal.c      | 2 +-
 19 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index af12668d0bf5..a109a7ea8613 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -220,7 +220,7 @@ static int mmdc_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (!cpumask_test_and_clear_cpu(cpu, &pmu_mmdc->cpu))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index 993fefdc167a..1b0037ef7fa5 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -428,7 +428,7 @@ static int l2x0_pmu_offline_cpu(unsigned int cpu)
 	if (!cpumask_test_and_clear_cpu(cpu, &pmu_cpu))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index d73004f47cf4..f3f1ccb55f73 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -528,7 +528,7 @@ static int perf_event_cpu_offline(unsigned int cpu, struct hlist_node *node)
 	if (!cpumask_test_and_clear_cpu(cpu, &perfmon_dsa_cpu_mask))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 
 	/* migrate events if there is a valid target */
 	if (target < nr_cpu_ids)
diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
index 587c82be12f7..57804f28357e 100644
--- a/drivers/fpga/dfl-fme-perf.c
+++ b/drivers/fpga/dfl-fme-perf.c
@@ -948,7 +948,7 @@ static int fme_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != priv->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 958b37123bf1..f866f9223492 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1068,7 +1068,7 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
 		return 0;
 
 	if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
-		target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
+		target = cpumask_not_dying_but(topology_sibling_cpumask(cpu), cpu);
 
 		/* Migrate events if there is a valid target */
 		if (target < nr_cpu_ids) {
diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 03b1309875ae..481da937fb9d 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1447,7 +1447,7 @@ static int cci_pmu_offline_cpu(unsigned int cpu)
 	if (!g_cci_pmu || cpu != g_cci_pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 728d13d8e98a..573d6906ec9b 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -1205,7 +1205,7 @@ static int arm_ccn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
 	if (cpu != dt->cpu)
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 	perf_pmu_migrate_context(&dt->pmu, cpu, target);
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 80d8309652a4..1847182a1ed3 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1787,9 +1787,9 @@ static int arm_cmn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_no
 	node = dev_to_node(cmn->dev);
 	if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
 	    cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
-		target = cpumask_any(&mask);
+		target = cpumask_not_dying_but(&mask, cpu);
 	else
-		target = cpumask_any_but(cpu_online_mask, cpu);
+		target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target < nr_cpu_ids)
 		arm_cmn_migrate(cmn, target);
 	return 0;
diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 280a6ae3e27c..3a0a2bb92e12 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -611,7 +611,7 @@ static int dmc620_pmu_cpu_teardown(unsigned int cpu,
 	if (cpu != irq->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index aa9f4393ff0c..e19ce0406b02 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -236,7 +236,7 @@ static int dsu_pmu_get_online_cpu_any_but(struct dsu_pmu *dsu_pmu, int cpu)
 
 	cpumask_and(&online_supported,
 			 &dsu_pmu->associated_cpus, cpu_online_mask);
-	return cpumask_any_but(&online_supported, cpu);
+	return cpumask_not_dying_but(&online_supported, cpu);
 }
 
 static inline bool dsu_pmu_counter_valid(struct dsu_pmu *dsu_pmu, u32 idx)
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 00d4c45a8017..827315d31056 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -640,7 +640,7 @@ static int smmu_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != smmu_pmu->on_cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c
index 8e058e08fe81..4e0276fc1548 100644
--- a/drivers/perf/fsl_imx8_ddr_perf.c
+++ b/drivers/perf/fsl_imx8_ddr_perf.c
@@ -664,7 +664,7 @@ static int ddr_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index fbc8a93d5eac..8c39da8f4b3c 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -518,7 +518,7 @@ int hisi_uncore_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	/* Choose a new CPU to migrate ownership of the PMU to */
 	cpumask_and(&pmu_online_cpus, &hisi_pmu->associated_cpus,
 		    cpu_online_mask);
-	target = cpumask_any_but(&pmu_online_cpus, cpu);
+	target = cpumask_not_dying_but(&pmu_online_cpus, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
index 69c3050a4348..268e3288893d 100644
--- a/drivers/perf/marvell_cn10k_tad_pmu.c
+++ b/drivers/perf/marvell_cn10k_tad_pmu.c
@@ -387,7 +387,7 @@ static int tad_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 30234c261b05..8823d0bb6476 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -822,7 +822,7 @@ static int l2cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	/* Any other CPU for this cluster which is still online */
 	cpumask_and(&cluster_online_cpus, &cluster->cluster_cpus,
 		    cpu_online_mask);
-	target = cpumask_any_but(&cluster_online_cpus, cpu);
+	target = cpumask_not_dying_but(&cluster_online_cpus, cpu);
 	if (target >= nr_cpu_ids) {
 		disable_irq(cluster->irq);
 		return 0;
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 1ff2ff6582bf..ba26b2fa0736 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -718,7 +718,7 @@ static int qcom_l3_cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *no
 
 	if (!cpumask_test_and_clear_cpu(cpu, &l3pmu->cpumask))
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 	perf_pmu_migrate_context(&l3pmu->pmu, cpu, target);
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 0c32dffc7ede..069eb0a0d3ba 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -1804,7 +1804,7 @@ static int xgene_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
 	if (!cpumask_test_and_clear_cpu(cpu, &xgene_pmu->cpu))
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index 4d7b9caee1c4..8ebcf87e7d06 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -67,7 +67,7 @@ static int bman_offline_cpu(unsigned int cpu)
 		return 0;
 
 	/* use any other online CPU */
-	cpu = cpumask_any_but(cpu_online_mask, cpu);
+	cpu = cpumask_not_dying_but(cpu_online_mask, cpu);
 	irq_set_affinity(pcfg->irq, cpumask_of(cpu));
 	return 0;
 }
diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index e23b60618c1a..3807a8285ced 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -148,7 +148,7 @@ static int qman_offline_cpu(unsigned int cpu)
 		pcfg = qman_get_qm_portal_config(p);
 		if (pcfg) {
 			/* select any other online CPU */
-			cpu = cpumask_any_but(cpu_online_mask, cpu);
+			cpu = cpumask_not_dying_but(cpu_online_mask, cpu);
 			irq_set_affinity(pcfg->irq, cpumask_of(cpu));
 			qman_portal_update_sdest(pcfg, cpu);
 		}
-- 
2.31.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC 08/10] cpuhp: Replace cpumask_any_but(cpu_online_mask, cpu)
@ 2022-08-22  2:15   ` Pingfan Liu
  0 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, dmaengine, linux-fpga, intel-gfx, dri-devel,
	linux-arm-msm, linuxppc-dev, linux-kernel
  Cc: Mark Rutland, David Airlie, Tom Rix, Frank Li, Pingfan Liu,
	Bjorn Andersson, Will Deacon, Khuong Dinh, Dave Jiang,
	Russell King, Andy Gross, NXP Linux Team, Qi Liu, Wu Hao,
	Fenghua Yu, Pengutronix Kernel Team, Yury Norov, Konrad Dybcio,
	Sascha Hauer, Shaokun Zhang, Moritz Fischer, Rodrigo Vivi,
	Tvrtko Ursulin, Li Yang, Vinod Koul, Shawn Guo, Xu Yilun

In a kexec quick reboot path, the dying cpus are still on
cpu_online_mask. During the teardown of cpu, a subsystem needs to
migrate its broker to a real online cpu.

This patch replaces cpumask_any_but(cpu_online_mask, cpu) in a teardown
procedure with cpumask_not_dying_but(cpu_online_mask, cpu).

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: Wu Hao <hao.wu@intel.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Moritz Fischer <mdf@kernel.org>
Cc: Xu Yilun <yilun.xu@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Frank Li <Frank.li@nxp.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Qi Liu <liuqi115@huawei.com>
Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Konrad Dybcio <konrad.dybcio@somainline.org>
Cc: Khuong Dinh <khuong@os.amperecomputing.com>
Cc: Li Yang <leoyang.li@nxp.com>
Cc: Yury Norov <yury.norov@gmail.com>
To: linux-arm-kernel@lists.infradead.org
To: dmaengine@vger.kernel.org
To: linux-fpga@vger.kernel.org
To: intel-gfx@lists.freedesktop.org
To: dri-devel@lists.freedesktop.org
To: linux-arm-msm@vger.kernel.org
To: linuxppc-dev@lists.ozlabs.org
To: linux-kernel@vger.kernel.org
---
 arch/arm/mach-imx/mmdc.c                 | 2 +-
 arch/arm/mm/cache-l2x0-pmu.c             | 2 +-
 drivers/dma/idxd/perfmon.c               | 2 +-
 drivers/fpga/dfl-fme-perf.c              | 2 +-
 drivers/gpu/drm/i915/i915_pmu.c          | 2 +-
 drivers/perf/arm-cci.c                   | 2 +-
 drivers/perf/arm-ccn.c                   | 2 +-
 drivers/perf/arm-cmn.c                   | 4 ++--
 drivers/perf/arm_dmc620_pmu.c            | 2 +-
 drivers/perf/arm_dsu_pmu.c               | 2 +-
 drivers/perf/arm_smmuv3_pmu.c            | 2 +-
 drivers/perf/fsl_imx8_ddr_perf.c         | 2 +-
 drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
 drivers/perf/marvell_cn10k_tad_pmu.c     | 2 +-
 drivers/perf/qcom_l2_pmu.c               | 2 +-
 drivers/perf/qcom_l3_pmu.c               | 2 +-
 drivers/perf/xgene_pmu.c                 | 2 +-
 drivers/soc/fsl/qbman/bman_portal.c      | 2 +-
 drivers/soc/fsl/qbman/qman_portal.c      | 2 +-
 19 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index af12668d0bf5..a109a7ea8613 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -220,7 +220,7 @@ static int mmdc_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (!cpumask_test_and_clear_cpu(cpu, &pmu_mmdc->cpu))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index 993fefdc167a..1b0037ef7fa5 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -428,7 +428,7 @@ static int l2x0_pmu_offline_cpu(unsigned int cpu)
 	if (!cpumask_test_and_clear_cpu(cpu, &pmu_cpu))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index d73004f47cf4..f3f1ccb55f73 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -528,7 +528,7 @@ static int perf_event_cpu_offline(unsigned int cpu, struct hlist_node *node)
 	if (!cpumask_test_and_clear_cpu(cpu, &perfmon_dsa_cpu_mask))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 
 	/* migrate events if there is a valid target */
 	if (target < nr_cpu_ids)
diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
index 587c82be12f7..57804f28357e 100644
--- a/drivers/fpga/dfl-fme-perf.c
+++ b/drivers/fpga/dfl-fme-perf.c
@@ -948,7 +948,7 @@ static int fme_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != priv->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 958b37123bf1..f866f9223492 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1068,7 +1068,7 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
 		return 0;
 
 	if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
-		target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
+		target = cpumask_not_dying_but(topology_sibling_cpumask(cpu), cpu);
 
 		/* Migrate events if there is a valid target */
 		if (target < nr_cpu_ids) {
diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 03b1309875ae..481da937fb9d 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1447,7 +1447,7 @@ static int cci_pmu_offline_cpu(unsigned int cpu)
 	if (!g_cci_pmu || cpu != g_cci_pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 728d13d8e98a..573d6906ec9b 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -1205,7 +1205,7 @@ static int arm_ccn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
 	if (cpu != dt->cpu)
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 	perf_pmu_migrate_context(&dt->pmu, cpu, target);
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 80d8309652a4..1847182a1ed3 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1787,9 +1787,9 @@ static int arm_cmn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_no
 	node = dev_to_node(cmn->dev);
 	if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
 	    cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
-		target = cpumask_any(&mask);
+		target = cpumask_not_dying_but(&mask, cpu);
 	else
-		target = cpumask_any_but(cpu_online_mask, cpu);
+		target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target < nr_cpu_ids)
 		arm_cmn_migrate(cmn, target);
 	return 0;
diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 280a6ae3e27c..3a0a2bb92e12 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -611,7 +611,7 @@ static int dmc620_pmu_cpu_teardown(unsigned int cpu,
 	if (cpu != irq->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index aa9f4393ff0c..e19ce0406b02 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -236,7 +236,7 @@ static int dsu_pmu_get_online_cpu_any_but(struct dsu_pmu *dsu_pmu, int cpu)
 
 	cpumask_and(&online_supported,
 			 &dsu_pmu->associated_cpus, cpu_online_mask);
-	return cpumask_any_but(&online_supported, cpu);
+	return cpumask_not_dying_but(&online_supported, cpu);
 }
 
 static inline bool dsu_pmu_counter_valid(struct dsu_pmu *dsu_pmu, u32 idx)
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 00d4c45a8017..827315d31056 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -640,7 +640,7 @@ static int smmu_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != smmu_pmu->on_cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c
index 8e058e08fe81..4e0276fc1548 100644
--- a/drivers/perf/fsl_imx8_ddr_perf.c
+++ b/drivers/perf/fsl_imx8_ddr_perf.c
@@ -664,7 +664,7 @@ static int ddr_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index fbc8a93d5eac..8c39da8f4b3c 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -518,7 +518,7 @@ int hisi_uncore_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	/* Choose a new CPU to migrate ownership of the PMU to */
 	cpumask_and(&pmu_online_cpus, &hisi_pmu->associated_cpus,
 		    cpu_online_mask);
-	target = cpumask_any_but(&pmu_online_cpus, cpu);
+	target = cpumask_not_dying_but(&pmu_online_cpus, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
index 69c3050a4348..268e3288893d 100644
--- a/drivers/perf/marvell_cn10k_tad_pmu.c
+++ b/drivers/perf/marvell_cn10k_tad_pmu.c
@@ -387,7 +387,7 @@ static int tad_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 30234c261b05..8823d0bb6476 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -822,7 +822,7 @@ static int l2cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	/* Any other CPU for this cluster which is still online */
 	cpumask_and(&cluster_online_cpus, &cluster->cluster_cpus,
 		    cpu_online_mask);
-	target = cpumask_any_but(&cluster_online_cpus, cpu);
+	target = cpumask_not_dying_but(&cluster_online_cpus, cpu);
 	if (target >= nr_cpu_ids) {
 		disable_irq(cluster->irq);
 		return 0;
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 1ff2ff6582bf..ba26b2fa0736 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -718,7 +718,7 @@ static int qcom_l3_cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *no
 
 	if (!cpumask_test_and_clear_cpu(cpu, &l3pmu->cpumask))
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 	perf_pmu_migrate_context(&l3pmu->pmu, cpu, target);
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 0c32dffc7ede..069eb0a0d3ba 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -1804,7 +1804,7 @@ static int xgene_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
 	if (!cpumask_test_and_clear_cpu(cpu, &xgene_pmu->cpu))
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index 4d7b9caee1c4..8ebcf87e7d06 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -67,7 +67,7 @@ static int bman_offline_cpu(unsigned int cpu)
 		return 0;
 
 	/* use any other online CPU */
-	cpu = cpumask_any_but(cpu_online_mask, cpu);
+	cpu = cpumask_not_dying_but(cpu_online_mask, cpu);
 	irq_set_affinity(pcfg->irq, cpumask_of(cpu));
 	return 0;
 }
diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index e23b60618c1a..3807a8285ced 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -148,7 +148,7 @@ static int qman_offline_cpu(unsigned int cpu)
 		pcfg = qman_get_qm_portal_config(p);
 		if (pcfg) {
 			/* select any other online CPU */
-			cpu = cpumask_any_but(cpu_online_mask, cpu);
+			cpu = cpumask_not_dying_but(cpu_online_mask, cpu);
 			irq_set_affinity(pcfg->irq, cpumask_of(cpu));
 			qman_portal_update_sdest(pcfg, cpu);
 		}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [Intel-gfx] [RFC 08/10] cpuhp: Replace cpumask_any_but(cpu_online_mask, cpu)
@ 2022-08-22  2:15   ` Pingfan Liu
  0 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, dmaengine, linux-fpga, intel-gfx, dri-devel,
	linux-arm-msm, linuxppc-dev, linux-kernel
  Cc: Mark Rutland, David Airlie, Tom Rix, Frank Li, Pingfan Liu,
	Bjorn Andersson, Will Deacon, Khuong Dinh, Dave Jiang,
	Fabio Estevam, Russell King, Andy Gross, NXP Linux Team, Qi Liu,
	Wu Hao, Fenghua Yu, Pengutronix Kernel Team, Yury Norov,
	Konrad Dybcio, Sascha Hauer, Shaokun Zhang, Moritz Fischer,
	Rodrigo Vivi, Li Yang, Vinod Koul, Daniel Vetter, Shawn Guo,
	Xu Yilun

In the kexec quick-reboot path, the dying cpus are still set in
cpu_online_mask. During cpu teardown, a subsystem needs to migrate its
broker to a cpu that is genuinely online.

This patch replaces cpumask_any_but(cpu_online_mask, cpu) in the teardown
procedures with cpumask_not_dying_but(cpu_online_mask, cpu), which skips
the dying cpus.
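
For reference, a minimal sketch of the helper this relies on; the real
implementation is introduced by an earlier patch in this series, and
cpu_dying_mask is the kernel's existing mask of cpus on their way down:

	/* Sketch: like cpumask_any_but(), but also skip dying cpus. */
	int cpumask_not_dying_but(const struct cpumask *mask, unsigned int cpu)
	{
		unsigned int i;

		cpumask_check(cpu);
		for_each_cpu(i, mask)
			if (i != cpu && !cpumask_test_cpu(i, cpu_dying_mask))
				break;
		return i;	/* >= nr_cpu_ids if no suitable cpu exists */
	}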

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: Wu Hao <hao.wu@intel.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Moritz Fischer <mdf@kernel.org>
Cc: Xu Yilun <yilun.xu@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Frank Li <Frank.li@nxp.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Qi Liu <liuqi115@huawei.com>
Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Konrad Dybcio <konrad.dybcio@somainline.org>
Cc: Khuong Dinh <khuong@os.amperecomputing.com>
Cc: Li Yang <leoyang.li@nxp.com>
Cc: Yury Norov <yury.norov@gmail.com>
To: linux-arm-kernel@lists.infradead.org
To: dmaengine@vger.kernel.org
To: linux-fpga@vger.kernel.org
To: intel-gfx@lists.freedesktop.org
To: dri-devel@lists.freedesktop.org
To: linux-arm-msm@vger.kernel.org
To: linuxppc-dev@lists.ozlabs.org
To: linux-kernel@vger.kernel.org
---
 arch/arm/mach-imx/mmdc.c                 | 2 +-
 arch/arm/mm/cache-l2x0-pmu.c             | 2 +-
 drivers/dma/idxd/perfmon.c               | 2 +-
 drivers/fpga/dfl-fme-perf.c              | 2 +-
 drivers/gpu/drm/i915/i915_pmu.c          | 2 +-
 drivers/perf/arm-cci.c                   | 2 +-
 drivers/perf/arm-ccn.c                   | 2 +-
 drivers/perf/arm-cmn.c                   | 4 ++--
 drivers/perf/arm_dmc620_pmu.c            | 2 +-
 drivers/perf/arm_dsu_pmu.c               | 2 +-
 drivers/perf/arm_smmuv3_pmu.c            | 2 +-
 drivers/perf/fsl_imx8_ddr_perf.c         | 2 +-
 drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
 drivers/perf/marvell_cn10k_tad_pmu.c     | 2 +-
 drivers/perf/qcom_l2_pmu.c               | 2 +-
 drivers/perf/qcom_l3_pmu.c               | 2 +-
 drivers/perf/xgene_pmu.c                 | 2 +-
 drivers/soc/fsl/qbman/bman_portal.c      | 2 +-
 drivers/soc/fsl/qbman/qman_portal.c      | 2 +-
 19 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index af12668d0bf5..a109a7ea8613 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -220,7 +220,7 @@ static int mmdc_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (!cpumask_test_and_clear_cpu(cpu, &pmu_mmdc->cpu))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index 993fefdc167a..1b0037ef7fa5 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -428,7 +428,7 @@ static int l2x0_pmu_offline_cpu(unsigned int cpu)
 	if (!cpumask_test_and_clear_cpu(cpu, &pmu_cpu))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index d73004f47cf4..f3f1ccb55f73 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -528,7 +528,7 @@ static int perf_event_cpu_offline(unsigned int cpu, struct hlist_node *node)
 	if (!cpumask_test_and_clear_cpu(cpu, &perfmon_dsa_cpu_mask))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 
 	/* migrate events if there is a valid target */
 	if (target < nr_cpu_ids)
diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
index 587c82be12f7..57804f28357e 100644
--- a/drivers/fpga/dfl-fme-perf.c
+++ b/drivers/fpga/dfl-fme-perf.c
@@ -948,7 +948,7 @@ static int fme_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != priv->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 958b37123bf1..f866f9223492 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1068,7 +1068,7 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
 		return 0;
 
 	if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
-		target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
+		target = cpumask_not_dying_but(topology_sibling_cpumask(cpu), cpu);
 
 		/* Migrate events if there is a valid target */
 		if (target < nr_cpu_ids) {
diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 03b1309875ae..481da937fb9d 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1447,7 +1447,7 @@ static int cci_pmu_offline_cpu(unsigned int cpu)
 	if (!g_cci_pmu || cpu != g_cci_pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 728d13d8e98a..573d6906ec9b 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -1205,7 +1205,7 @@ static int arm_ccn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
 	if (cpu != dt->cpu)
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 	perf_pmu_migrate_context(&dt->pmu, cpu, target);
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 80d8309652a4..1847182a1ed3 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1787,9 +1787,9 @@ static int arm_cmn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_no
 	node = dev_to_node(cmn->dev);
 	if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
 	    cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
-		target = cpumask_any(&mask);
+		target = cpumask_not_dying_but(&mask, cpu);
 	else
-		target = cpumask_any_but(cpu_online_mask, cpu);
+		target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target < nr_cpu_ids)
 		arm_cmn_migrate(cmn, target);
 	return 0;
diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 280a6ae3e27c..3a0a2bb92e12 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -611,7 +611,7 @@ static int dmc620_pmu_cpu_teardown(unsigned int cpu,
 	if (cpu != irq->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index aa9f4393ff0c..e19ce0406b02 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -236,7 +236,7 @@ static int dsu_pmu_get_online_cpu_any_but(struct dsu_pmu *dsu_pmu, int cpu)
 
 	cpumask_and(&online_supported,
 			 &dsu_pmu->associated_cpus, cpu_online_mask);
-	return cpumask_any_but(&online_supported, cpu);
+	return cpumask_not_dying_but(&online_supported, cpu);
 }
 
 static inline bool dsu_pmu_counter_valid(struct dsu_pmu *dsu_pmu, u32 idx)
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 00d4c45a8017..827315d31056 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -640,7 +640,7 @@ static int smmu_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != smmu_pmu->on_cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c
index 8e058e08fe81..4e0276fc1548 100644
--- a/drivers/perf/fsl_imx8_ddr_perf.c
+++ b/drivers/perf/fsl_imx8_ddr_perf.c
@@ -664,7 +664,7 @@ static int ddr_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index fbc8a93d5eac..8c39da8f4b3c 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -518,7 +518,7 @@ int hisi_uncore_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	/* Choose a new CPU to migrate ownership of the PMU to */
 	cpumask_and(&pmu_online_cpus, &hisi_pmu->associated_cpus,
 		    cpu_online_mask);
-	target = cpumask_any_but(&pmu_online_cpus, cpu);
+	target = cpumask_not_dying_but(&pmu_online_cpus, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
index 69c3050a4348..268e3288893d 100644
--- a/drivers/perf/marvell_cn10k_tad_pmu.c
+++ b/drivers/perf/marvell_cn10k_tad_pmu.c
@@ -387,7 +387,7 @@ static int tad_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 30234c261b05..8823d0bb6476 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -822,7 +822,7 @@ static int l2cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	/* Any other CPU for this cluster which is still online */
 	cpumask_and(&cluster_online_cpus, &cluster->cluster_cpus,
 		    cpu_online_mask);
-	target = cpumask_any_but(&cluster_online_cpus, cpu);
+	target = cpumask_not_dying_but(&cluster_online_cpus, cpu);
 	if (target >= nr_cpu_ids) {
 		disable_irq(cluster->irq);
 		return 0;
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 1ff2ff6582bf..ba26b2fa0736 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -718,7 +718,7 @@ static int qcom_l3_cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *no
 
 	if (!cpumask_test_and_clear_cpu(cpu, &l3pmu->cpumask))
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 	perf_pmu_migrate_context(&l3pmu->pmu, cpu, target);
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 0c32dffc7ede..069eb0a0d3ba 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -1804,7 +1804,7 @@ static int xgene_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
 	if (!cpumask_test_and_clear_cpu(cpu, &xgene_pmu->cpu))
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index 4d7b9caee1c4..8ebcf87e7d06 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -67,7 +67,7 @@ static int bman_offline_cpu(unsigned int cpu)
 		return 0;
 
 	/* use any other online CPU */
-	cpu = cpumask_any_but(cpu_online_mask, cpu);
+	cpu = cpumask_not_dying_but(cpu_online_mask, cpu);
 	irq_set_affinity(pcfg->irq, cpumask_of(cpu));
 	return 0;
 }
diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index e23b60618c1a..3807a8285ced 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -148,7 +148,7 @@ static int qman_offline_cpu(unsigned int cpu)
 		pcfg = qman_get_qm_portal_config(p);
 		if (pcfg) {
 			/* select any other online CPU */
-			cpu = cpumask_any_but(cpu_online_mask, cpu);
+			cpu = cpumask_not_dying_but(cpu_online_mask, cpu);
 			irq_set_affinity(pcfg->irq, cpumask_of(cpu));
 			qman_portal_update_sdest(pcfg, cpu);
 		}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC 09/10] genirq/cpuhotplug: Ask migrate_one_irq() to migrate to a real online cpu
  2022-08-22  2:15 ` Pingfan Liu
                   ` (10 preceding siblings ...)
  (?)
@ 2022-08-22  2:15 ` Pingfan Liu
  -1 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Pingfan Liu, Thomas Gleixner

In the kexec quick-reboot path, the dying cpus are still set in
cpu_online_mask, but an interrupt should be migrated onto a genuinely
online cpu instead of a dying one. Otherwise, the interrupt has no later
opportunity to move onto a live cpu.
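
To make the selection concrete, a worked example with illustrative
values (not from the patch itself):

	/*
	 * cpu_online_mask = {0,1,2,3}
	 * cpu_dying_mask  = {1,2,3}	(kexec reboot in flight)
	 * affinity        = {2,3}
	 *
	 * cpumask_andnot(affinity, affinity, cpu_dying_mask)
	 *	-> affinity = {}, no usable online cpu remains, so
	 * cpumask_andnot(affinity, cpu_online_mask, cpu_dying_mask)
	 *	-> affinity = {0}, and the irq lands on the boot cpu
	 *	   instead of on a dying one.
	 */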

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
To: linux-kernel@vger.kernel.org
---
 kernel/irq/cpuhotplug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 1ed2b1739363..e85d6456f310 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -110,6 +110,7 @@ static bool migrate_one_irq(struct irq_desc *desc)
 	if (maskchip && chip->irq_mask)
 		chip->irq_mask(d);
 
+	cpumask_andnot(affinity, affinity, cpu_dying_mask);
 	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
 		/*
 		 * If the interrupt is managed, then shut it down and leave
@@ -120,7 +121,7 @@ static bool migrate_one_irq(struct irq_desc *desc)
 			irq_shutdown_and_deactivate(desc);
 			return false;
 		}
-		affinity = cpu_online_mask;
+		cpumask_andnot(affinity, cpu_online_mask, cpu_dying_mask);
 		brokeaff = true;
 	}
 	/*
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC 10/10] arm64: smp: Make __cpu_disable() parallel
  2022-08-22  2:15 ` Pingfan Liu
@ 2022-08-22  2:15   ` Pingfan Liu
  -1 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: Pingfan Liu, Catalin Marinas, Will Deacon, Viresh Kumar,
	Sudeep Holla, Phil Auld, Rob Herring, Ben Dooks

A dying cpu calls take_cpu_down()->__cpu_disable(), which means that if
the teardown path runs in parallel, __cpu_disable() can execute
concurrently on several cpus, and that may corrupt cpu_online_mask and
the other cpu maps unless an extra lock provides protection.

At present, the cpumasks are protected by cpu_add_remove_lock, but that
lock sits far above __cpu_disable(). To protect __cpu_disable() from
concurrent execution in the kexec quick-reboot path, introduce a local
lock, cpumap_lock.
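
The race being closed, sketched in plain C with a hypothetical cpu_map
variable standing in for the cpumask updates done by
remove_cpu_topology() and numa_remove_cpu():

	unsigned long cpu_map = 0x6;		/* cpus 1 and 2 still set */

	void clear_cpu(int cpu)			/* run by each dying cpu */
	{
		unsigned long old = cpu_map;	/* load */
		cpu_map = old & ~(1UL << cpu);	/* store: without a lock, */
	}					/* a racing clear is lost */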

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Ben Dooks <ben-linux@fluff.org>
To: linux-arm-kernel@lists.infradead.org
To: linux-kernel@vger.kernel.org
---
 arch/arm64/kernel/smp.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ffc5d76cf695..fee8879048b0 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -287,6 +287,28 @@ static int op_cpu_disable(unsigned int cpu)
 	return 0;
 }
 
+static DEFINE_SPINLOCK(cpumap_lock);
+
+static void __cpu_clear_maps(unsigned int cpu)
+{
+	/*
+	 * In the case of kexec rebooting, the cpu_add_remove_lock mutex cannot serialize the concurrent callers, hence this local lock.
+	 */
+	if (kexec_in_progress)
+		spin_lock(&cpumap_lock);
+	remove_cpu_topology(cpu);
+	numa_remove_cpu(cpu);
+
+	/*
+	 * Take this CPU offline.  Once we clear this, we can't return,
+	 * and we must not schedule until we're ready to give up the cpu.
+	 */
+	set_cpu_online(cpu, false);
+	if (kexec_in_progress)
+		spin_unlock(&cpumap_lock);
+
+}
+
 /*
  * __cpu_disable runs on the processor to be shutdown.
  */
@@ -299,14 +321,7 @@ int __cpu_disable(void)
 	if (ret)
 		return ret;
 
-	remove_cpu_topology(cpu);
-	numa_remove_cpu(cpu);
-
-	/*
-	 * Take this CPU offline.  Once we clear this, we can't return,
-	 * and we must not schedule until we're ready to give up the cpu.
-	 */
-	set_cpu_online(cpu, false);
+	__cpu_clear_maps(cpu);
 	ipi_teardown(cpu);
 
 	/*
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC 10/10] arm64: smp: Make __cpu_disable() parallel
@ 2022-08-22  2:15   ` Pingfan Liu
  0 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: Pingfan Liu, Catalin Marinas, Will Deacon, Viresh Kumar,
	Sudeep Holla, Phil Auld, Rob Herring, Ben Dooks

A dying cpu calls take_cpu_down()->__cpu_disable(), which means that if
the teardown path runs in parallel, __cpu_disable() can execute
concurrently on several cpus, and that may corrupt cpu_online_mask and
the other cpu maps unless an extra lock provides protection.

At present, the cpumasks are protected by cpu_add_remove_lock, but that
lock sits far above __cpu_disable(). To protect __cpu_disable() from
concurrent execution in the kexec quick-reboot path, introduce a local
lock, cpumap_lock.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Ben Dooks <ben-linux@fluff.org>
To: linux-arm-kernel@lists.infradead.org
To: linux-kernel@vger.kernel.org
---
 arch/arm64/kernel/smp.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ffc5d76cf695..fee8879048b0 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -287,6 +287,28 @@ static int op_cpu_disable(unsigned int cpu)
 	return 0;
 }
 
+static DEFINE_SPINLOCK(cpumap_lock);
+
+static void __cpu_clear_maps(unsigned int cpu)
+{
+	/*
+	 * In the case of kexec rebooting, the cpu_add_remove_lock mutex cannot serialize the concurrent callers, hence this local lock.
+	 */
+	if (kexec_in_progress)
+		spin_lock(&cpumap_lock);
+	remove_cpu_topology(cpu);
+	numa_remove_cpu(cpu);
+
+	/*
+	 * Take this CPU offline.  Once we clear this, we can't return,
+	 * and we must not schedule until we're ready to give up the cpu.
+	 */
+	set_cpu_online(cpu, false);
+	if (kexec_in_progress)
+		spin_unlock(&cpumap_lock);
+
+}
+
 /*
  * __cpu_disable runs on the processor to be shutdown.
  */
@@ -299,14 +321,7 @@ int __cpu_disable(void)
 	if (ret)
 		return ret;
 
-	remove_cpu_topology(cpu);
-	numa_remove_cpu(cpu);
-
-	/*
-	 * Take this CPU offline.  Once we clear this, we can't return,
-	 * and we must not schedule until we're ready to give up the cpu.
-	 */
-	set_cpu_online(cpu, false);
+	__cpu_clear_maps(cpu);
 	ipi_teardown(cpu);
 
 	/*
-- 
2.31.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-22  2:15 ` [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel Pingfan Liu
@ 2022-08-22  2:45   ` Paul E. McKenney
  2022-08-23  1:50     ` Pingfan Liu
  2022-08-22  4:54   ` kernel test robot
  2022-08-22 18:08   ` Joel Fernandes
  2 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2022-08-22  2:45 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, rcu, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Thomas Gleixner, Steven Price, Mark Rutland,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld, boqun.feng

On Mon, Aug 22, 2022 at 10:15:16AM +0800, Pingfan Liu wrote:
> In order to support parallel, rcu_state.n_online_cpus should be
> atomic_dec()
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>

I have to ask...  What testing have you subjected this patch to?

							Thanx, Paul

> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Steven Price <steven.price@arm.com>
> Cc: "Peter Zijlstra
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: linux-kernel@vger.kernel.org
> To: rcu@vger.kernel.org
> ---
>  kernel/cpu.c      | 1 +
>  kernel/rcu/tree.c | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 1261c3f3be51..90debbe28e85 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1872,6 +1872,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
>  		.name			= "RCU/tree:prepare",
>  		.startup.single		= rcutree_prepare_cpu,
>  		.teardown.single	= rcutree_dead_cpu,
> +		.support_kexec_parallel	= true,
>  	},
>  	/*
>  	 * On the tear-down path, timers_dead_cpu() must be invoked
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 79aea7df4345..07d31e16c65e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2168,7 +2168,8 @@ int rcutree_dead_cpu(unsigned int cpu)
>  	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
>  		return 0;
>  
> -	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> +	/* Hot remove path allows parallel, while Hot add races against remove on lock */
> +	atomic_dec((atomic_t *)&rcu_state.n_online_cpus);
>  	/* Adjust any no-longer-needed kthreads. */
>  	rcu_boost_kthread_setaffinity(rnp, -1);
>  	// Stop-machine done, so allow nohz_full to disable tick.
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-22  2:15 ` [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel Pingfan Liu
  2022-08-22  2:45   ` Paul E. McKenney
@ 2022-08-22  4:54   ` kernel test robot
  2022-08-22 18:08   ` Joel Fernandes
  2 siblings, 0 replies; 49+ messages in thread
From: kernel test robot @ 2022-08-22  4:54 UTC (permalink / raw)
  To: Pingfan Liu; +Cc: llvm, kbuild-all

Hi Pingfan,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on linus/master]
[also build test ERROR on v6.0-rc2 next-20220819]
[cannot apply to arm64/for-next/core soc/for-next arm/for-next kvmarm/next xilinx-xlnx/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Pingfan-Liu/arm64-riscv-Introduce-fast-kexec-reboot/20220822-101854
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 1c23f9e627a7b412978b4e852793c5e3c3efc555
config: i386-randconfig-a001-20220822 (https://download.01.org/0day-ci/archive/20220822/202208221246.L4LUk5u8-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/28c1ad168af1fe0412af126f49213f44664175ed
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Pingfan-Liu/arm64-riscv-Introduce-fast-kexec-reboot/20220822-101854
        git checkout 28c1ad168af1fe0412af126f49213f44664175ed
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> kernel/cpu.c:1875:4: error: field designator 'support_kexec_parallel' does not refer to any field in type 'struct cpuhp_step'
                   .support_kexec_parallel = true,
                    ^
   1 error generated.


vim +1875 kernel/cpu.c

  1821	
  1822	/* Boot processor state steps */
  1823	static struct cpuhp_step cpuhp_hp_states[] = {
  1824		[CPUHP_OFFLINE] = {
  1825			.name			= "offline",
  1826			.startup.single		= NULL,
  1827			.teardown.single	= NULL,
  1828		},
  1829	#ifdef CONFIG_SMP
  1830		[CPUHP_CREATE_THREADS]= {
  1831			.name			= "threads:prepare",
  1832			.startup.single		= smpboot_create_threads,
  1833			.teardown.single	= NULL,
  1834			.cant_stop		= true,
  1835		},
  1836		[CPUHP_PERF_PREPARE] = {
  1837			.name			= "perf:prepare",
  1838			.startup.single		= perf_event_init_cpu,
  1839			.teardown.single	= perf_event_exit_cpu,
  1840		},
  1841		[CPUHP_RANDOM_PREPARE] = {
  1842			.name			= "random:prepare",
  1843			.startup.single		= random_prepare_cpu,
  1844			.teardown.single	= NULL,
  1845		},
  1846		[CPUHP_WORKQUEUE_PREP] = {
  1847			.name			= "workqueue:prepare",
  1848			.startup.single		= workqueue_prepare_cpu,
  1849			.teardown.single	= NULL,
  1850		},
  1851		[CPUHP_HRTIMERS_PREPARE] = {
  1852			.name			= "hrtimers:prepare",
  1853			.startup.single		= hrtimers_prepare_cpu,
  1854			.teardown.single	= hrtimers_dead_cpu,
  1855		},
  1856		[CPUHP_SMPCFD_PREPARE] = {
  1857			.name			= "smpcfd:prepare",
  1858			.startup.single		= smpcfd_prepare_cpu,
  1859			.teardown.single	= smpcfd_dead_cpu,
  1860		},
  1861		[CPUHP_RELAY_PREPARE] = {
  1862			.name			= "relay:prepare",
  1863			.startup.single		= relay_prepare_cpu,
  1864			.teardown.single	= NULL,
  1865		},
  1866		[CPUHP_SLAB_PREPARE] = {
  1867			.name			= "slab:prepare",
  1868			.startup.single		= slab_prepare_cpu,
  1869			.teardown.single	= slab_dead_cpu,
  1870		},
  1871		[CPUHP_RCUTREE_PREP] = {
  1872			.name			= "RCU/tree:prepare",
  1873			.startup.single		= rcutree_prepare_cpu,
  1874			.teardown.single	= rcutree_dead_cpu,
> 1875			.support_kexec_parallel	= true,
  1876		},
  1877		/*
  1878		 * On the tear-down path, timers_dead_cpu() must be invoked
  1879		 * before blk_mq_queue_reinit_notify() from notify_dead(),
  1880		 * otherwise a RCU stall occurs.
  1881		 */
  1882		[CPUHP_TIMERS_PREPARE] = {
  1883			.name			= "timers:prepare",
  1884			.startup.single		= timers_prepare_cpu,
  1885			.teardown.single	= timers_dead_cpu,
  1886		},
  1887		/* Kicks the plugged cpu into life */
  1888		[CPUHP_BRINGUP_CPU] = {
  1889			.name			= "cpu:bringup",
  1890			.startup.single		= bringup_cpu,
  1891			.teardown.single	= finish_cpu,
  1892			.cant_stop		= true,
  1893		},
  1894		/* Final state before CPU kills itself */
  1895		[CPUHP_AP_IDLE_DEAD] = {
  1896			.name			= "idle:dead",
  1897		},
  1898		/*
  1899		 * Last state before CPU enters the idle loop to die. Transient state
  1900		 * for synchronization.
  1901		 */
  1902		[CPUHP_AP_OFFLINE] = {
  1903			.name			= "ap:offline",
  1904			.cant_stop		= true,
  1905		},
  1906		/* First state is scheduler control. Interrupts are disabled */
  1907		[CPUHP_AP_SCHED_STARTING] = {
  1908			.name			= "sched:starting",
  1909			.startup.single		= sched_cpu_starting,
  1910			.teardown.single	= sched_cpu_dying,
  1911		},
  1912		[CPUHP_AP_RCUTREE_DYING] = {
  1913			.name			= "RCU/tree:dying",
  1914			.startup.single		= NULL,
  1915			.teardown.single	= rcutree_dying_cpu,
  1916		},
  1917		[CPUHP_AP_SMPCFD_DYING] = {
  1918			.name			= "smpcfd:dying",
  1919			.startup.single		= NULL,
  1920			.teardown.single	= smpcfd_dying_cpu,
  1921		},
  1922		/* Entry state on starting. Interrupts enabled from here on. Transient
  1923		 * state for synchronsization */
  1924		[CPUHP_AP_ONLINE] = {
  1925			.name			= "ap:online",
  1926		},
  1927		/*
  1928		 * Handled on control processor until the plugged processor manages
  1929		 * this itself.
  1930		 */
  1931		[CPUHP_TEARDOWN_CPU] = {
  1932			.name			= "cpu:teardown",
  1933			.startup.single		= NULL,
  1934			.teardown.single	= takedown_cpu,
  1935			.cant_stop		= true,
  1936		},
  1937	
  1938		[CPUHP_AP_SCHED_WAIT_EMPTY] = {
  1939			.name			= "sched:waitempty",
  1940			.startup.single		= NULL,
  1941			.teardown.single	= sched_cpu_wait_empty,
  1942		},
  1943	
  1944		/* Handle smpboot threads park/unpark */
  1945		[CPUHP_AP_SMPBOOT_THREADS] = {
  1946			.name			= "smpboot/threads:online",
  1947			.startup.single		= smpboot_unpark_threads,
  1948			.teardown.single	= smpboot_park_threads,
  1949		},
  1950		[CPUHP_AP_IRQ_AFFINITY_ONLINE] = {
  1951			.name			= "irq/affinity:online",
  1952			.startup.single		= irq_affinity_online_cpu,
  1953			.teardown.single	= NULL,
  1954		},
  1955		[CPUHP_AP_PERF_ONLINE] = {
  1956			.name			= "perf:online",
  1957			.startup.single		= perf_event_init_cpu,
  1958			.teardown.single	= perf_event_exit_cpu,
  1959		},
  1960		[CPUHP_AP_WATCHDOG_ONLINE] = {
  1961			.name			= "lockup_detector:online",
  1962			.startup.single		= lockup_detector_online_cpu,
  1963			.teardown.single	= lockup_detector_offline_cpu,
  1964		},
  1965		[CPUHP_AP_WORKQUEUE_ONLINE] = {
  1966			.name			= "workqueue:online",
  1967			.startup.single		= workqueue_online_cpu,
  1968			.teardown.single	= workqueue_offline_cpu,
  1969		},
  1970		[CPUHP_AP_RANDOM_ONLINE] = {
  1971			.name			= "random:online",
  1972			.startup.single		= random_online_cpu,
  1973			.teardown.single	= NULL,
  1974		},
  1975		[CPUHP_AP_RCUTREE_ONLINE] = {
  1976			.name			= "RCU/tree:online",
  1977			.startup.single		= rcutree_online_cpu,
  1978			.teardown.single	= rcutree_offline_cpu,
  1979		},
  1980	#endif
  1981		/*
  1982		 * The dynamically registered state space is here
  1983		 */
  1984	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 07/10] lib/cpumask: Introduce cpumask_not_dying_but()
  2022-08-22  2:15 ` [RFC 07/10] lib/cpumask: Introduce cpumask_not_dying_but() Pingfan Liu
@ 2022-08-22 14:15   ` Yury Norov
  2022-08-23  7:29     ` Pingfan Liu
  0 siblings, 1 reply; 49+ messages in thread
From: Yury Norov @ 2022-08-22 14:15 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, Andy Shevchenko, Rasmus Villemoes, Thomas Gleixner,
	Steven Price, Mark Rutland, Jason A. Donenfeld,
	Kuppuswamy Sathyanarayanan

On Mon, Aug 22, 2022 at 10:15:17AM +0800, Pingfan Liu wrote:
> During cpu hot-removing, the dying cpus are still in cpu_online_mask.
> On the other hand, A subsystem will migrate its broker from the dying
> cpu to a online cpu in its teardown cpuhp_step.
> 
> After enabling the teardown of cpus in parallel, cpu_online_mask can not
> tell those dying from the real online.
> 
> Introducing a function cpumask_not_dying_but() to pick a real online
> cpu.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Yury Norov <yury.norov@gmail.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Steven Price <steven.price@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> To: linux-kernel@vger.kernel.org
> ---
>  include/linux/cpumask.h |  3 +++
>  kernel/cpu.c            |  3 +++
>  lib/cpumask.c           | 18 ++++++++++++++++++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> index 0d435d0edbcb..d2033a239a07 100644
> --- a/include/linux/cpumask.h
> +++ b/include/linux/cpumask.h
> @@ -317,6 +317,9 @@ unsigned int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
>  	return i;
>  }
>  
> +/* for parallel kexec reboot */
> +int cpumask_not_dying_but(const struct cpumask *mask, unsigned int cpu);
> +
>  #define CPU_BITS_NONE						\
>  {								\
>  	[0 ... BITS_TO_LONGS(NR_CPUS)-1] = 0UL			\
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 90debbe28e85..771e344f8ff9 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1282,6 +1282,9 @@ static void cpus_down_no_rollback(struct cpumask *cpus)
>  	struct cpuhp_cpu_state *st;
>  	unsigned int cpu;
>  
> +	for_each_cpu(cpu, cpus)
> +		set_cpu_dying(cpu, true);
> +
>  	/* launch ap work one by one, but not wait for completion */
>  	for_each_cpu(cpu, cpus) {
>  		st = per_cpu_ptr(&cpuhp_state, cpu);
> diff --git a/lib/cpumask.c b/lib/cpumask.c
> index 8baeb37e23d3..6474f07ed87a 100644
> --- a/lib/cpumask.c
> +++ b/lib/cpumask.c
> @@ -7,6 +7,24 @@
>  #include <linux/memblock.h>
>  #include <linux/numa.h>
>  
> +/* Used in parallel kexec-reboot cpuhp callbacks */
> +int cpumask_not_dying_but(const struct cpumask *mask,
> +					   unsigned int cpu)
> +{
> +	unsigned int i;
> +
> +	if (CONFIG_SHUTDOWN_NONBOOT_CPUS) {

Hmm... Would it even work? Anyways, the documentation says:
Within code, where possible, use the IS_ENABLED macro to convert a Kconfig
symbol into a C boolean expression, and use it in a normal C conditional:

.. code-block:: c

        if (IS_ENABLED(CONFIG_SOMETHING)) {
                ...
        }


> +		cpumask_check(cpu);
> +		for_each_cpu(i, mask)
> +			if (i != cpu && !cpumask_test_cpu(i, cpu_dying_mask))
> +				break;
> +		return i;
> +	} else {
> +		return cpumask_any_but(mask, cpu);
> +	}
> +}
> +EXPORT_SYMBOL(cpumask_not_dying_but);

I don't like how you create a dedicated function for a random
mask. Dying mask is nothing special, right? What you really
need is probably this:
        cpumask_andnot_any_but(mask, cpu_dying_mask, cpu);

Now, if you still think it's worth that, you can add a trivial wrapper
for cpu_dying_mask. (But please pick some other name, because
'not dying but' sounds like a hangover description. :) )
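
For the IS_ENABLED() point above, a sketch of the quoted function
reworked accordingly:

	int cpumask_not_dying_but(const struct cpumask *mask, unsigned int cpu)
	{
		unsigned int i;

		if (!IS_ENABLED(CONFIG_SHUTDOWN_NONBOOT_CPUS))
			return cpumask_any_but(mask, cpu);

		cpumask_check(cpu);
		for_each_cpu(i, mask)
			if (i != cpu && !cpumask_test_cpu(i, cpu_dying_mask))
				break;
		return i;
	}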

Thanks,
Yury

> +
>  /**
>   * cpumask_next_wrap - helper to implement for_each_cpu_wrap
>   * @n: the cpu prior to the place to search
> -- 
> 2.31.1

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-22  2:15 ` [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel Pingfan Liu
  2022-08-22  2:45   ` Paul E. McKenney
  2022-08-22  4:54   ` kernel test robot
@ 2022-08-22 18:08   ` Joel Fernandes
  2022-08-23  1:56     ` Pingfan Liu
  2 siblings, 1 reply; 49+ messages in thread
From: Joel Fernandes @ 2022-08-22 18:08 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: LKML, rcu, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Thomas Gleixner, Steven Price,
	Mark Rutland, Kuppuswamy Sathyanarayanan, Jason A. Donenfeld

On Sun, Aug 21, 2022 at 10:16 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> In order to support parallel, rcu_state.n_online_cpus should be
> atomic_dec()

What does Parallel mean? Is that some kexec terminology?

Thanks,

 - Joel

>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Steven Price <steven.price@arm.com>
> Cc: "Peter Zijlstra
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: linux-kernel@vger.kernel.org
> To: rcu@vger.kernel.org
> ---
>  kernel/cpu.c      | 1 +
>  kernel/rcu/tree.c | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 1261c3f3be51..90debbe28e85 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1872,6 +1872,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
>                 .name                   = "RCU/tree:prepare",
>                 .startup.single         = rcutree_prepare_cpu,
>                 .teardown.single        = rcutree_dead_cpu,
> +               .support_kexec_parallel = true,
>         },
>         /*
>          * On the tear-down path, timers_dead_cpu() must be invoked
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 79aea7df4345..07d31e16c65e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2168,7 +2168,8 @@ int rcutree_dead_cpu(unsigned int cpu)
>         if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
>                 return 0;
>
> -       WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> +       /* Hot remove path allows parallel, while Hot add races against remove on lock */
> +       atomic_dec((atomic_t *)&rcu_state.n_online_cpus);
>         /* Adjust any no-longer-needed kthreads. */
>         rcu_boost_kthread_setaffinity(rnp, -1);
>         // Stop-machine done, so allow nohz_full to disable tick.
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-22  2:45   ` Paul E. McKenney
@ 2022-08-23  1:50     ` Pingfan Liu
  2022-08-23  3:01       ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Pingfan Liu @ 2022-08-23  1:50 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, rcu, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Thomas Gleixner, Steven Price, Mark Rutland,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld, boqun.feng

On Sun, Aug 21, 2022 at 07:45:28PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 22, 2022 at 10:15:16AM +0800, Pingfan Liu wrote:
> > In order to support parallel, rcu_state.n_online_cpus should be
> > atomic_dec()
> > 
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> 
> I have to ask...  What testing have you subjected this patch to?
> 

This patch belongs to the series [1]. The series aims to enable
kexec-reboot in parallel on all cpus. As a result, the involved RCU
part is expected to support parallel teardown.

[1]: https://lore.kernel.org/linux-arm-kernel/20220822021520.6996-3-kernelfans@gmail.com/T/#mf62352138d7b040fdb583ba66f8cd0ed1e145feb

Thanks,

	Pingfan


> 							Thanx, Paul
> 
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Steven Price <steven.price@arm.com>
> > Cc: "Peter Zijlstra
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > To: linux-kernel@vger.kernel.org
> > To: rcu@vger.kernel.org
> > ---
> >  kernel/cpu.c      | 1 +
> >  kernel/rcu/tree.c | 3 ++-
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 1261c3f3be51..90debbe28e85 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -1872,6 +1872,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
> >  		.name			= "RCU/tree:prepare",
> >  		.startup.single		= rcutree_prepare_cpu,
> >  		.teardown.single	= rcutree_dead_cpu,
> > +		.support_kexec_parallel	= true,
> >  	},
> >  	/*
> >  	 * On the tear-down path, timers_dead_cpu() must be invoked
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 79aea7df4345..07d31e16c65e 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2168,7 +2168,8 @@ int rcutree_dead_cpu(unsigned int cpu)
> >  	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> >  		return 0;
> >  
> > -	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > +	/* Hot remove path allows parallel, while Hot add races against remove on lock */
> > +	atomic_dec((atomic_t *)&rcu_state.n_online_cpus);
> >  	/* Adjust any no-longer-needed kthreads. */
> >  	rcu_boost_kthread_setaffinity(rnp, -1);
> >  	// Stop-machine done, so allow nohz_full to disable tick.
> > -- 
> > 2.31.1
> > 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-22 18:08   ` Joel Fernandes
@ 2022-08-23  1:56     ` Pingfan Liu
  2022-08-23  3:14       ` Joel Fernandes
  2022-08-24 13:44       ` Jason A. Donenfeld
  0 siblings, 2 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-23  1:56 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, rcu, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Thomas Gleixner, Steven Price,
	Mark Rutland, Kuppuswamy Sathyanarayanan, Jason A. Donenfeld

On Mon, Aug 22, 2022 at 02:08:38PM -0400, Joel Fernandes wrote:
> On Sun, Aug 21, 2022 at 10:16 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> > In order to support parallel, rcu_state.n_online_cpus should be
> > atomic_dec()
> 
> What does Parallel mean? Is that some kexec terminology?
> 

'Parallel' means concurrent. It is not kexec terminology; it comes
from SMP usage.

Thanks,

	Pingfan


> Thanks,
> 
>  - Joel
> 
> >
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Steven Price <steven.price@arm.com>
> > Cc: "Peter Zijlstra
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > To: linux-kernel@vger.kernel.org
> > To: rcu@vger.kernel.org
> > ---
> >  kernel/cpu.c      | 1 +
> >  kernel/rcu/tree.c | 3 ++-
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 1261c3f3be51..90debbe28e85 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -1872,6 +1872,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
> >                 .name                   = "RCU/tree:prepare",
> >                 .startup.single         = rcutree_prepare_cpu,
> >                 .teardown.single        = rcutree_dead_cpu,
> > +               .support_kexec_parallel = true,
> >         },
> >         /*
> >          * On the tear-down path, timers_dead_cpu() must be invoked
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 79aea7df4345..07d31e16c65e 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2168,7 +2168,8 @@ int rcutree_dead_cpu(unsigned int cpu)
> >         if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> >                 return 0;
> >
> > -       WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > +       /* Hot remove path allows parallel, while Hot add races against remove on lock */
> > +       atomic_dec((atomic_t *)&rcu_state.n_online_cpus);
> >         /* Adjust any no-longer-needed kthreads. */
> >         rcu_boost_kthread_setaffinity(rnp, -1);
> >         // Stop-machine done, so allow nohz_full to disable tick.
> > --
> > 2.31.1
> >

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-23  1:50     ` Pingfan Liu
@ 2022-08-23  3:01       ` Paul E. McKenney
  2022-08-24 13:53         ` Pingfan Liu
  0 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2022-08-23  3:01 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, rcu, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Thomas Gleixner, Steven Price, Mark Rutland,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld, boqun.feng

On Tue, Aug 23, 2022 at 09:50:56AM +0800, Pingfan Liu wrote:
> On Sun, Aug 21, 2022 at 07:45:28PM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 22, 2022 at 10:15:16AM +0800, Pingfan Liu wrote:
> > > In order to support parallel, rcu_state.n_online_cpus should be
> > > atomic_dec()
> > > 
> > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > 
> > I have to ask...  What testing have you subjected this patch to?
> > 
> 
> This patch subjects to [1]. The series aims to enable kexec-reboot in
> parallel on all cpu. As a result, the involved RCU part is expected to
> support parallel.

I understand (and even sympathize with) the expectation.  But results
sometimes diverge from expectations.  There have been implicit assumptions
in RCU about only one CPU going offline at a time, and I am not sure
that all of them have been addressed.  Concurrent CPU onlining has
been looked at recently here:

https://docs.google.com/document/d/1jymsaCPQ1PUDcfjIKm0UIbVdrJAaGX-6cXrmcfm0PRU/edit?usp=sharing

You did use atomic_dec() to make the rcu_state.n_online_cpus decrement
atomic, which is good.  Did you look through the rest of RCU's CPU-offline
code paths and related code paths?
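
For the decrement itself, the issue is the classic lost-update race;
n below stands in for rcu_state.n_online_cpus (the patch casts the int
to atomic_t to get the atomic form):

	WRITE_ONCE(n, n - 1);	/* load, subtract, store: two cpus can
				 * read the same value in parallel and
				 * one decrement is lost */
	atomic_dec(&n);		/* one atomic read-modify-write, safe
				 * under concurrent offlining */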

> [1]: https://lore.kernel.org/linux-arm-kernel/20220822021520.6996-3-kernelfans@gmail.com/T/#mf62352138d7b040fdb583ba66f8cd0ed1e145feb

Perhaps I am more blind than usual today, but I am not seeing anything
in this patch describing the testing.  At this point, I am thinking in
terms of making rcutorture test concurrent CPU offlining.

Thoughts?

							Thanx, Paul

> Thanks,
> 
> 	Pingfan
> 
> 
> > 							Thanx, Paul
> > 
> > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > Cc: Frederic Weisbecker <frederic@kernel.org>
> > > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Steven Price <steven.price@arm.com>
> > > Cc: "Peter Zijlstra
> > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > > To: linux-kernel@vger.kernel.org
> > > To: rcu@vger.kernel.org
> > > ---
> > >  kernel/cpu.c      | 1 +
> > >  kernel/rcu/tree.c | 3 ++-
> > >  2 files changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > > index 1261c3f3be51..90debbe28e85 100644
> > > --- a/kernel/cpu.c
> > > +++ b/kernel/cpu.c
> > > @@ -1872,6 +1872,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
> > >  		.name			= "RCU/tree:prepare",
> > >  		.startup.single		= rcutree_prepare_cpu,
> > >  		.teardown.single	= rcutree_dead_cpu,
> > > +		.support_kexec_parallel	= true,
> > >  	},
> > >  	/*
> > >  	 * On the tear-down path, timers_dead_cpu() must be invoked
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 79aea7df4345..07d31e16c65e 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2168,7 +2168,8 @@ int rcutree_dead_cpu(unsigned int cpu)
> > >  	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > >  		return 0;
> > >  
> > > -	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > > +	/* Hot remove path allows parallel, while Hot add races against remove on lock */
> > > +	atomic_dec((atomic_t *)&rcu_state.n_online_cpus);
> > >  	/* Adjust any no-longer-needed kthreads. */
> > >  	rcu_boost_kthread_setaffinity(rnp, -1);
> > >  	// Stop-machine done, so allow nohz_full to disable tick.
> > > -- 
> > > 2.31.1
> > > 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-23  1:56     ` Pingfan Liu
@ 2022-08-23  3:14       ` Joel Fernandes
  2022-08-24 13:38         ` Pingfan Liu
  2022-08-24 13:44       ` Jason A. Donenfeld
  1 sibling, 1 reply; 49+ messages in thread
From: Joel Fernandes @ 2022-08-23  3:14 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: LKML, rcu, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Thomas Gleixner, Steven Price,
	Mark Rutland, Kuppuswamy Sathyanarayanan, Jason A. Donenfeld

On Mon, Aug 22, 2022 at 9:56 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> On Mon, Aug 22, 2022 at 02:08:38PM -0400, Joel Fernandes wrote:
> > On Sun, Aug 21, 2022 at 10:16 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> > >
> > > In order to support parallel, rcu_state.n_online_cpus should be
> > > atomic_dec()
> >
> > What does Parallel mean? Is that some kexec terminology?
> >
>
> 'Parallel' means concurrent. It is not a kexec terminology, instead,
> should be SMP.

Ah ok! Makes sense. Apologies for being the word police here, but you
could probably reword it to "In order to support parallel offlining"
or some such.

 - Joel



>
> Thanks,
>
>         Pingfan
>
>
> > Thanks,
> >
> >  - Joel
> >
> > >
> > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > Cc: Frederic Weisbecker <frederic@kernel.org>
> > > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Steven Price <steven.price@arm.com>
> > > Cc: "Peter Zijlstra
> > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > > To: linux-kernel@vger.kernel.org
> > > To: rcu@vger.kernel.org
> > > ---
> > >  kernel/cpu.c      | 1 +
> > >  kernel/rcu/tree.c | 3 ++-
> > >  2 files changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > > index 1261c3f3be51..90debbe28e85 100644
> > > --- a/kernel/cpu.c
> > > +++ b/kernel/cpu.c
> > > @@ -1872,6 +1872,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
> > >                 .name                   = "RCU/tree:prepare",
> > >                 .startup.single         = rcutree_prepare_cpu,
> > >                 .teardown.single        = rcutree_dead_cpu,
> > > +               .support_kexec_parallel = true,
> > >         },
> > >         /*
> > >          * On the tear-down path, timers_dead_cpu() must be invoked
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 79aea7df4345..07d31e16c65e 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2168,7 +2168,8 @@ int rcutree_dead_cpu(unsigned int cpu)
> > >         if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > >                 return 0;
> > >
> > > -       WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > > +       /* Hot remove path allows parallel, while Hot add races against remove on lock */
> > > +       atomic_dec((atomic_t *)&rcu_state.n_online_cpus);
> > >         /* Adjust any no-longer-needed kthreads. */
> > >         rcu_boost_kthread_setaffinity(rnp, -1);
> > >         // Stop-machine done, so allow nohz_full to disable tick.
> > > --
> > > 2.31.1
> > >

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 07/10] lib/cpumask: Introduce cpumask_not_dying_but()
  2022-08-22 14:15   ` Yury Norov
@ 2022-08-23  7:29     ` Pingfan Liu
  0 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-23  7:29 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Andy Shevchenko, Rasmus Villemoes, Thomas Gleixner,
	Steven Price, Mark Rutland, Jason A. Donenfeld,
	Kuppuswamy Sathyanarayanan

On Mon, Aug 22, 2022 at 07:15:45AM -0700, Yury Norov wrote:
> On Mon, Aug 22, 2022 at 10:15:17AM +0800, Pingfan Liu wrote:
> > During cpu hot-removing, the dying cpus are still in cpu_online_mask.
> > On the other hand, A subsystem will migrate its broker from the dying
> > cpu to a online cpu in its teardown cpuhp_step.
> > 
> > After enabling the teardown of cpus in parallel, cpu_online_mask can not
> > tell those dying from the real online.
> > 
> > Introducing a function cpumask_not_dying_but() to pick a real online
> > cpu.
> > 
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Yury Norov <yury.norov@gmail.com>
> > Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Steven Price <steven.price@arm.com>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > To: linux-kernel@vger.kernel.org
> > ---
> >  include/linux/cpumask.h |  3 +++
> >  kernel/cpu.c            |  3 +++
> >  lib/cpumask.c           | 18 ++++++++++++++++++
> >  3 files changed, 24 insertions(+)
> > 
> > diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> > index 0d435d0edbcb..d2033a239a07 100644
> > --- a/include/linux/cpumask.h
> > +++ b/include/linux/cpumask.h
> > @@ -317,6 +317,9 @@ unsigned int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
> >  	return i;
> >  }
> >  
> > +/* for parallel kexec reboot */
> > +int cpumask_not_dying_but(const struct cpumask *mask, unsigned int cpu);
> > +
> >  #define CPU_BITS_NONE						\
> >  {								\
> >  	[0 ... BITS_TO_LONGS(NR_CPUS)-1] = 0UL			\
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 90debbe28e85..771e344f8ff9 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -1282,6 +1282,9 @@ static void cpus_down_no_rollback(struct cpumask *cpus)
> >  	struct cpuhp_cpu_state *st;
> >  	unsigned int cpu;
> >  
> > +	for_each_cpu(cpu, cpus)
> > +		set_cpu_dying(cpu, true);
> > +
> >  	/* launch ap work one by one, but not wait for completion */
> >  	for_each_cpu(cpu, cpus) {
> >  		st = per_cpu_ptr(&cpuhp_state, cpu);
> > diff --git a/lib/cpumask.c b/lib/cpumask.c
> > index 8baeb37e23d3..6474f07ed87a 100644
> > --- a/lib/cpumask.c
> > +++ b/lib/cpumask.c
> > @@ -7,6 +7,24 @@
> >  #include <linux/memblock.h>
> >  #include <linux/numa.h>
> >  
> > +/* Used in parallel kexec-reboot cpuhp callbacks */
> > +int cpumask_not_dying_but(const struct cpumask *mask,
> > +					   unsigned int cpu)
> > +{
> > +	unsigned int i;
> > +
> > +	if (CONFIG_SHUTDOWN_NONBOOT_CPUS) {
> 
> Hmm... Would it even work? Anyways, the documentation says:
> Within code, where possible, use the IS_ENABLED macro to convert a Kconfig
> symbol into a C boolean expression, and use it in a normal C conditional:
> 
> .. code-block:: c
> 
>         if (IS_ENABLED(CONFIG_SOMETHING)) {
>                 ...
>         }
> 

Yes, it should be as you pointed out.

I changed the code from the "#ifdef" style to the "if (IS_ENABLED())"
style just before sending out the series. Sorry for the haste; I did
not run a compile check again.
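
For reference, with IS_ENABLED() the helper would look roughly like
this (an untested sketch, keeping the same semantics as the posted
patch):

int cpumask_not_dying_but(const struct cpumask *mask, unsigned int cpu)
{
	unsigned int i;

	if (!IS_ENABLED(CONFIG_SHUTDOWN_NONBOOT_CPUS))
		return cpumask_any_but(mask, cpu);

	cpumask_check(cpu);
	/* Pick any cpu in @mask, other than @cpu, that is not marked dying. */
	for_each_cpu(i, mask)
		if (i != cpu && !cpumask_test_cpu(i, cpu_dying_mask))
			break;
	return i;
}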

> 
> > +		cpumask_check(cpu);
> > +		for_each_cpu(i, mask)
> > +			if (i != cpu && !cpumask_test_cpu(i, cpu_dying_mask))
> > +				break;
> > +		return i;
> > +	} else {
> > +		return cpumask_any_but(mask, cpu);
> > +	}
> > +}
> > +EXPORT_SYMBOL(cpumask_not_dying_but);
> 
> I don't like how you create a dedicated function for a random
> mask. Dying mask is nothing special, right? What you really

Yes, I agree.

> need is probably this:
>         cpumask_andnot_any_but(mask, cpu_dying_mask, cpu);
> 

That is it.

> Now, if you still think it's worth that, you can add a trivial wrapper
> for cpu_dying_mask. (But please pick some other name, because
> 'not dying but' sounds like a hangover description. :) )
> 

Since cpumask_andnot_any_but(mask, cpu_dying_mask, cpu) would work
properly even if !IS_ENABLED(CONFIG_SHUTDOWN_NONBOOT_CPUS), replacing
the call site with "cpumask_andnot() + cpumask_any_but()" is also an
option.
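
At each call site that would be something like the following (just a
sketch; "tmp" stands for whatever scratch cpumask the call site has
available):

	/* Drop the dying CPUs from @mask, then pick any remaining CPU but @cpu. */
	cpumask_andnot(&tmp, mask, cpu_dying_mask);
	target = cpumask_any_but(&tmp, cpu);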

I appreciate your help.


Thanks,

	Pingfan

> Thanks,
> Yury
> 
> > +
> >  /**
> >   * cpumask_next_wrap - helper to implement for_each_cpu_wrap
> >   * @n: the cpu prior to the place to search
> > -- 
> > 2.31.1

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-23  3:14       ` Joel Fernandes
@ 2022-08-24 13:38         ` Pingfan Liu
  0 siblings, 0 replies; 49+ messages in thread
From: Pingfan Liu @ 2022-08-24 13:38 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, rcu, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Thomas Gleixner, Steven Price,
	Mark Rutland, Kuppuswamy Sathyanarayanan, Jason A. Donenfeld

On Tue, Aug 23, 2022 at 11:14 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Mon, Aug 22, 2022 at 9:56 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> > On Mon, Aug 22, 2022 at 02:08:38PM -0400, Joel Fernandes wrote:
> > > On Sun, Aug 21, 2022 at 10:16 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> > > >
> > > > In order to support parallel, rcu_state.n_online_cpus should be
> > > > atomic_dec()
> > >
> > > What does Parallel mean? Is that some kexec terminology?
> > >
> >
> > 'Parallel' means concurrent. It is not a kexec terminology, instead,
> > should be SMP.
>
> Ah ok! Makes sense. Apologies to be the word-police here, but you
> probably could reword it to "In order to support parallel offlining"
> or some such.
>

Thanks for your advice. It is a good English lesson, and clearer
wording will make communication in the community more productive.


Thanks,

    Pingfan


>  - Joel
>
>
>
> >
> > Thanks,
> >
> >         Pingfan
> >
> >
> > > Thanks,
> > >
> > >  - Joel
> > >
> > > >
> > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > Cc: Frederic Weisbecker <frederic@kernel.org>
> > > > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Steven Price <steven.price@arm.com>
> > > > Cc: "Peter Zijlstra
> > > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > > Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > > > To: linux-kernel@vger.kernel.org
> > > > To: rcu@vger.kernel.org
> > > > ---
> > > >  kernel/cpu.c      | 1 +
> > > >  kernel/rcu/tree.c | 3 ++-
> > > >  2 files changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > > > index 1261c3f3be51..90debbe28e85 100644
> > > > --- a/kernel/cpu.c
> > > > +++ b/kernel/cpu.c
> > > > @@ -1872,6 +1872,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
> > > >                 .name                   = "RCU/tree:prepare",
> > > >                 .startup.single         = rcutree_prepare_cpu,
> > > >                 .teardown.single        = rcutree_dead_cpu,
> > > > +               .support_kexec_parallel = true,
> > > >         },
> > > >         /*
> > > >          * On the tear-down path, timers_dead_cpu() must be invoked
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 79aea7df4345..07d31e16c65e 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -2168,7 +2168,8 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > >         if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > >                 return 0;
> > > >
> > > > -       WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > > > +       /* Hot remove path allows parallel, while Hot add races against remove on lock */
> > > > +       atomic_dec((atomic_t *)&rcu_state.n_online_cpus);
> > > >         /* Adjust any no-longer-needed kthreads. */
> > > >         rcu_boost_kthread_setaffinity(rnp, -1);
> > > >         // Stop-machine done, so allow nohz_full to disable tick.
> > > > --
> > > > 2.31.1
> > > >

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-23  1:56     ` Pingfan Liu
  2022-08-23  3:14       ` Joel Fernandes
@ 2022-08-24 13:44       ` Jason A. Donenfeld
  1 sibling, 0 replies; 49+ messages in thread
From: Jason A. Donenfeld @ 2022-08-24 13:44 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Joel Fernandes, LKML, rcu, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Thomas Gleixner, Steven Price,
	Mark Rutland, Kuppuswamy Sathyanarayanan

On 8/22/22, Pingfan Liu <kernelfans@gmail.com> wrote:
> On Mon, Aug 22, 2022 at 02:08:38PM -0400, Joel Fernandes wrote:
>> On Sun, Aug 21, 2022 at 10:16 PM Pingfan Liu <kernelfans@gmail.com>
>> wrote:
>> >
>> > In order to support parallel, rcu_state.n_online_cpus should be
>> > atomic_dec()
>>
>> What does Parallel mean? Is that some kexec terminology?
>>
>
> 'Parallel' means concurrent. It is not a kexec terminology, instead,
> should be SMP.

Only sort of. See section A.6 of Paul's book:
http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-23  3:01       ` Paul E. McKenney
@ 2022-08-24 13:53         ` Pingfan Liu
  2022-08-24 16:20           ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Pingfan Liu @ 2022-08-24 13:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, rcu, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Thomas Gleixner, Steven Price, Mark Rutland,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld, boqun.feng

On Tue, Aug 23, 2022 at 11:01 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Aug 23, 2022 at 09:50:56AM +0800, Pingfan Liu wrote:
> > On Sun, Aug 21, 2022 at 07:45:28PM -0700, Paul E. McKenney wrote:
> > > On Mon, Aug 22, 2022 at 10:15:16AM +0800, Pingfan Liu wrote:
> > > > In order to support parallel, rcu_state.n_online_cpus should be
> > > > atomic_dec()
> > > >
> > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > >
> > > I have to ask...  What testing have you subjected this patch to?
> > >
> >
> > This patch subjects to [1]. The series aims to enable kexec-reboot in
> > parallel on all cpu. As a result, the involved RCU part is expected to
> > support parallel.
>
> I understand (and even sympathize with) the expectation.  But results
> sometimes diverge from expectations.  There have been implicit assumptions
> in RCU about only one CPU going offline at a time, and I am not sure
> that all of them have been addressed.  Concurrent CPU onlining has
> been looked at recently here:
>
> https://docs.google.com/document/d/1jymsaCPQ1PUDcfjIKm0UIbVdrJAaGX-6cXrmcfm0PRU/edit?usp=sharing
>
> You did us atomic_dec() to make rcu_state.n_online_cpus decrementing be
> atomic, which is good.  Did you look through the rest of RCU's CPU-offline
> code paths and related code paths?
>

I went through that code at a shallow level, especially each
cpuhp_step hook in the RCU subsystem.

But as you pointed out, there are implicit assumptions that only one
CPU goes offline at a time. I will digest the Google doc which you
shared, and then I can come to a final conclusion.

> > [1]: https://lore.kernel.org/linux-arm-kernel/20220822021520.6996-3-kernelfans@gmail.com/T/#mf62352138d7b040fdb583ba66f8cd0ed1e145feb
>
> Perhaps I am more blind than usual today, but I am not seeing anything
> in this patch describing the testing.  At this point, I am thinking in
> terms of making rcutorture test concurrent CPU offlining parallel
>

Yes, testing results are more convincing in this area.

After making the implicit assumptions clear, I will write some code to
bridge my series and the rcutorture test, since this series is a
little different from parallel cpu offlining: it happens after all
devices are torn down, and there is no way to roll back.

> Thoughts?
>

I need a deeper dive into this area, and hope to bring out something soon.


Thanks,

    Pingfan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-24 13:53         ` Pingfan Liu
@ 2022-08-24 16:20           ` Paul E. McKenney
  2022-08-24 17:26             ` Joel Fernandes
  2022-08-31 16:15             ` Paul E. McKenney
  0 siblings, 2 replies; 49+ messages in thread
From: Paul E. McKenney @ 2022-08-24 16:20 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: LKML, rcu, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Thomas Gleixner, Steven Price, Mark Rutland,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld, boqun.feng

On Wed, Aug 24, 2022 at 09:53:11PM +0800, Pingfan Liu wrote:
> On Tue, Aug 23, 2022 at 11:01 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Tue, Aug 23, 2022 at 09:50:56AM +0800, Pingfan Liu wrote:
> > > On Sun, Aug 21, 2022 at 07:45:28PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Aug 22, 2022 at 10:15:16AM +0800, Pingfan Liu wrote:
> > > > > In order to support parallel, rcu_state.n_online_cpus should be
> > > > > atomic_dec()
> > > > >
> > > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > >
> > > > I have to ask...  What testing have you subjected this patch to?
> > > >
> > >
> > > This patch subjects to [1]. The series aims to enable kexec-reboot in
> > > parallel on all cpu. As a result, the involved RCU part is expected to
> > > support parallel.
> >
> > I understand (and even sympathize with) the expectation.  But results
> > sometimes diverge from expectations.  There have been implicit assumptions
> > in RCU about only one CPU going offline at a time, and I am not sure
> > that all of them have been addressed.  Concurrent CPU onlining has
> > been looked at recently here:
> >
> > https://docs.google.com/document/d/1jymsaCPQ1PUDcfjIKm0UIbVdrJAaGX-6cXrmcfm0PRU/edit?usp=sharing
> >
> > You did us atomic_dec() to make rcu_state.n_online_cpus decrementing be
> > atomic, which is good.  Did you look through the rest of RCU's CPU-offline
> > code paths and related code paths?
> 
> I went through those codes at a shallow level, especially at each
> cpuhp_step hook in the RCU system.

And that is fine, at least as a first step.

> But as you pointed out, there are implicit assumptions about only one
> CPU going offline at a time, I will chew the google doc which you
> share.  Then I can come to a final result.

Boqun Feng, Neeraj Upadhyay, Uladzislau Rezki, and I took a quick look,
and rcu_boost_kthread_setaffinity() seems to need some help.  As it
stands, it appears that concurrent invocations of this function from the
CPU-offline path will cause all but the last outgoing CPU's bit to be
(incorrectly) set in the cpumask_var_t passed to set_cpus_allowed_ptr().

This should not be difficult to fix, for example, by maintaining a
separate per-leaf-rcu_node-structure bitmask of the concurrently outgoing
CPUs for that rcu_node structure.  (Similar in structure to the
->qsmask field.)
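
Roughly along these lines (illustrative only; ->offlinemask is a
made-up name for that new bitmask, and mask/cm are as in the current
rcu_boost_kthread_setaffinity()):

	/* CPU-offline path: record the outgoing CPU under the rcu_node lock. */
	raw_spin_lock_irqsave_rcu_node(rnp, flags);
	rnp->offlinemask |= rdp->grpmask;
	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);

	/* Affinity computation: exclude every concurrently outgoing CPU. */
	outgoing = READ_ONCE(rnp->offlinemask);
	for_each_leaf_node_possible_cpu(rnp, cpu)
		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
		    !(outgoing & leaf_node_cpu_bit(rnp, cpu)))
			cpumask_set_cpu(cpu, cm);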

There are probably more where that one came from.  ;-)

> > > [1]: https://lore.kernel.org/linux-arm-kernel/20220822021520.6996-3-kernelfans@gmail.com/T/#mf62352138d7b040fdb583ba66f8cd0ed1e145feb
> >
> > Perhaps I am more blind than usual today, but I am not seeing anything
> > in this patch describing the testing.  At this point, I am thinking in
> > terms of making rcutorture test concurrent CPU offlining parallel
> 
> Yes, testing results are more convincing in this area.
> 
> After making clear the implicit assumptions, I will write some code to
> bridge my code and rcutorture test. Since the series is a little
> different from parallel cpu offlining. It happens after all devices
> are torn down, and there is no way to rollback.

Very good, looking forward to seeing what you come up with!

> > Thoughts?
> 
> Need a deeper dive into this field. Hope to bring out something soon.

Again, looking forward to seeing what you find!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-24 16:20           ` Paul E. McKenney
@ 2022-08-24 17:26             ` Joel Fernandes
  2022-08-24 19:21               ` Paul E. McKenney
  2022-08-31 16:15             ` Paul E. McKenney
  1 sibling, 1 reply; 49+ messages in thread
From: Joel Fernandes @ 2022-08-24 17:26 UTC (permalink / raw)
  To: paulmck, Pingfan Liu
  Cc: LKML, rcu, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Thomas Gleixner, Steven Price, Mark Rutland,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld, boqun.feng



On 8/24/2022 12:20 PM, Paul E. McKenney wrote:
> On Wed, Aug 24, 2022 at 09:53:11PM +0800, Pingfan Liu wrote:
>> On Tue, Aug 23, 2022 at 11:01 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>>>
>>> On Tue, Aug 23, 2022 at 09:50:56AM +0800, Pingfan Liu wrote:
>>>> On Sun, Aug 21, 2022 at 07:45:28PM -0700, Paul E. McKenney wrote:
>>>>> On Mon, Aug 22, 2022 at 10:15:16AM +0800, Pingfan Liu wrote:
>>>>>> In order to support parallel, rcu_state.n_online_cpus should be
>>>>>> atomic_dec()
>>>>>>
>>>>>> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
>>>>>
>>>>> I have to ask...  What testing have you subjected this patch to?
>>>>>
>>>>
>>>> This patch subjects to [1]. The series aims to enable kexec-reboot in
>>>> parallel on all cpu. As a result, the involved RCU part is expected to
>>>> support parallel.
>>>
>>> I understand (and even sympathize with) the expectation.  But results
>>> sometimes diverge from expectations.  There have been implicit assumptions
>>> in RCU about only one CPU going offline at a time, and I am not sure
>>> that all of them have been addressed.  Concurrent CPU onlining has
>>> been looked at recently here:
>>>
>>> https://docs.google.com/document/d/1jymsaCPQ1PUDcfjIKm0UIbVdrJAaGX-6cXrmcfm0PRU/edit?usp=sharing
>>>
>>> You did us atomic_dec() to make rcu_state.n_online_cpus decrementing be
>>> atomic, which is good.  Did you look through the rest of RCU's CPU-offline
>>> code paths and related code paths?
>>
>> I went through those codes at a shallow level, especially at each
>> cpuhp_step hook in the RCU system.
> 
> And that is fine, at least as a first step.
> 
>> But as you pointed out, there are implicit assumptions about only one
>> CPU going offline at a time, I will chew the google doc which you
>> share.  Then I can come to a final result.
> 
> Boqun Feng, Neeraj Upadhyay, Uladzislau Rezki, and I took a quick look,
> and rcu_boost_kthread_setaffinity() seems to need some help.  As it
> stands, it appears that concurrent invocations of this function from the
> CPU-offline path will cause all but the last outgoing CPU's bit to be
> (incorrectly) set in the cpumask_var_t passed to set_cpus_allowed_ptr().
> 
> This should not be difficult to fix, for example, by maintaining a
> separate per-leaf-rcu_node-structure bitmask of the concurrently outgoing
> CPUs for that rcu_node structure.  (Similar in structure to the
> ->qsmask field.)
> 
> There are probably more where that one came from.  ;-)

Should rcutree_dying_cpu()'s access to rnp->qsmask have a READ_ONCE()? I was
thinking of grace-period initialization or QS-reporting paths racing with it.
It's just tracing, still :)
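
Concretely, something like this (an untested sketch against the
current rcutree_dying_cpu()):

	/* The GP kthread may clear ->qsmask bits concurrently; annotate the read. */
	blkd = !!(READ_ONCE(rnp->qsmask) & rdp->grpmask);
	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
			       blkd ? TPS("cpuofl-bgp") : TPS("cpuofl"));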

Thanks,

- Joel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-24 17:26             ` Joel Fernandes
@ 2022-08-24 19:21               ` Paul E. McKenney
  2022-08-24 22:54                 ` Joel Fernandes
  0 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2022-08-24 19:21 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Pingfan Liu, LKML, rcu, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Thomas Gleixner, Steven Price, Mark Rutland,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld, boqun.feng

On Wed, Aug 24, 2022 at 01:26:01PM -0400, Joel Fernandes wrote:
> 
> 
> On 8/24/2022 12:20 PM, Paul E. McKenney wrote:
> > On Wed, Aug 24, 2022 at 09:53:11PM +0800, Pingfan Liu wrote:
> >> On Tue, Aug 23, 2022 at 11:01 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >>>
> >>> On Tue, Aug 23, 2022 at 09:50:56AM +0800, Pingfan Liu wrote:
> >>>> On Sun, Aug 21, 2022 at 07:45:28PM -0700, Paul E. McKenney wrote:
> >>>>> On Mon, Aug 22, 2022 at 10:15:16AM +0800, Pingfan Liu wrote:
> >>>>>> In order to support parallel, rcu_state.n_online_cpus should be
> >>>>>> atomic_dec()
> >>>>>>
> >>>>>> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> >>>>>
> >>>>> I have to ask...  What testing have you subjected this patch to?
> >>>>>
> >>>>
> >>>> This patch subjects to [1]. The series aims to enable kexec-reboot in
> >>>> parallel on all cpu. As a result, the involved RCU part is expected to
> >>>> support parallel.
> >>>
> >>> I understand (and even sympathize with) the expectation.  But results
> >>> sometimes diverge from expectations.  There have been implicit assumptions
> >>> in RCU about only one CPU going offline at a time, and I am not sure
> >>> that all of them have been addressed.  Concurrent CPU onlining has
> >>> been looked at recently here:
> >>>
> >>> https://docs.google.com/document/d/1jymsaCPQ1PUDcfjIKm0UIbVdrJAaGX-6cXrmcfm0PRU/edit?usp=sharing
> >>>
> >>> You did us atomic_dec() to make rcu_state.n_online_cpus decrementing be
> >>> atomic, which is good.  Did you look through the rest of RCU's CPU-offline
> >>> code paths and related code paths?
> >>
> >> I went through those codes at a shallow level, especially at each
> >> cpuhp_step hook in the RCU system.
> > 
> > And that is fine, at least as a first step.
> > 
> >> But as you pointed out, there are implicit assumptions about only one
> >> CPU going offline at a time, I will chew the google doc which you
> >> share.  Then I can come to a final result.
> > 
> > Boqun Feng, Neeraj Upadhyay, Uladzislau Rezki, and I took a quick look,
> > and rcu_boost_kthread_setaffinity() seems to need some help.  As it
> > stands, it appears that concurrent invocations of this function from the
> > CPU-offline path will cause all but the last outgoing CPU's bit to be
> > (incorrectly) set in the cpumask_var_t passed to set_cpus_allowed_ptr().
> > 
> > This should not be difficult to fix, for example, by maintaining a
> > separate per-leaf-rcu_node-structure bitmask of the concurrently outgoing
> > CPUs for that rcu_node structure.  (Similar in structure to the
> > ->qsmask field.)
> > 
> > There are probably more where that one came from.  ;-)
> 
> Should rcutree_dying_cpu() access to rnp->qsmask have a READ_ONCE() ? I was
> thinking grace period initialization or qs reporting paths racing with that. Its
> just tracing, still :)

Looks like it should be, regardless of Pingfan's patches, given that
the grace-period kthread might report a quiescent state concurrently.
Good catch!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-24 19:21               ` Paul E. McKenney
@ 2022-08-24 22:54                 ` Joel Fernandes
  2022-08-24 23:01                   ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Joel Fernandes @ 2022-08-24 22:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Pingfan Liu, LKML, rcu, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Thomas Gleixner, Steven Price, Mark Rutland,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld, Boqun Feng

On Wed, Aug 24, 2022 at 3:21 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Wed, Aug 24, 2022 at 01:26:01PM -0400, Joel Fernandes wrote:
> >
> >
> > On 8/24/2022 12:20 PM, Paul E. McKenney wrote:
> > > On Wed, Aug 24, 2022 at 09:53:11PM +0800, Pingfan Liu wrote:
> > >> On Tue, Aug 23, 2022 at 11:01 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >>>
> > >>> On Tue, Aug 23, 2022 at 09:50:56AM +0800, Pingfan Liu wrote:
> > >>>> On Sun, Aug 21, 2022 at 07:45:28PM -0700, Paul E. McKenney wrote:
> > >>>>> On Mon, Aug 22, 2022 at 10:15:16AM +0800, Pingfan Liu wrote:
> > >>>>>> In order to support parallel, rcu_state.n_online_cpus should be
> > >>>>>> atomic_dec()
> > >>>>>>
> > >>>>>> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > >>>>>
> > >>>>> I have to ask...  What testing have you subjected this patch to?
> > >>>>>
> > >>>>
> > >>>> This patch subjects to [1]. The series aims to enable kexec-reboot in
> > >>>> parallel on all cpu. As a result, the involved RCU part is expected to
> > >>>> support parallel.
> > >>>
> > >>> I understand (and even sympathize with) the expectation.  But results
> > >>> sometimes diverge from expectations.  There have been implicit assumptions
> > >>> in RCU about only one CPU going offline at a time, and I am not sure
> > >>> that all of them have been addressed.  Concurrent CPU onlining has
> > >>> been looked at recently here:
> > >>>
> > >>> https://docs.google.com/document/d/1jymsaCPQ1PUDcfjIKm0UIbVdrJAaGX-6cXrmcfm0PRU/edit?usp=sharing
> > >>>
> > >>> You did us atomic_dec() to make rcu_state.n_online_cpus decrementing be
> > >>> atomic, which is good.  Did you look through the rest of RCU's CPU-offline
> > >>> code paths and related code paths?
> > >>
> > >> I went through those codes at a shallow level, especially at each
> > >> cpuhp_step hook in the RCU system.
> > >
> > > And that is fine, at least as a first step.
> > >
> > >> But as you pointed out, there are implicit assumptions about only one
> > >> CPU going offline at a time, I will chew the google doc which you
> > >> share.  Then I can come to a final result.
> > >
> > > Boqun Feng, Neeraj Upadhyay, Uladzislau Rezki, and I took a quick look,
> > > and rcu_boost_kthread_setaffinity() seems to need some help.  As it
> > > stands, it appears that concurrent invocations of this function from the
> > > CPU-offline path will cause all but the last outgoing CPU's bit to be
> > > (incorrectly) set in the cpumask_var_t passed to set_cpus_allowed_ptr().
> > >
> > > This should not be difficult to fix, for example, by maintaining a
> > > separate per-leaf-rcu_node-structure bitmask of the concurrently outgoing
> > > CPUs for that rcu_node structure.  (Similar in structure to the
> > > ->qsmask field.)
> > >
> > > There are probably more where that one came from.  ;-)
> >
> > Should rcutree_dying_cpu() access to rnp->qsmask have a READ_ONCE() ? I was
> > thinking grace period initialization or qs reporting paths racing with that. Its
> > just tracing, still :)
>
> Looks like it should be regardless of Pingfan's patches, given that
> the grace-period kthread might report a quiescent state concurrently.

Thanks for confirming, I'll queue it into my next revision of the series.

 - Joel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-24 22:54                 ` Joel Fernandes
@ 2022-08-24 23:01                   ` Paul E. McKenney
  0 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2022-08-24 23:01 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Pingfan Liu, LKML, rcu, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Thomas Gleixner, Steven Price, Mark Rutland,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld, Boqun Feng

On Wed, Aug 24, 2022 at 06:54:01PM -0400, Joel Fernandes wrote:
> On Wed, Aug 24, 2022 at 3:21 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Wed, Aug 24, 2022 at 01:26:01PM -0400, Joel Fernandes wrote:
> > >
> > >
> > > On 8/24/2022 12:20 PM, Paul E. McKenney wrote:
> > > > On Wed, Aug 24, 2022 at 09:53:11PM +0800, Pingfan Liu wrote:
> > > >> On Tue, Aug 23, 2022 at 11:01 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >>>
> > > >>> On Tue, Aug 23, 2022 at 09:50:56AM +0800, Pingfan Liu wrote:
> > > >>>> On Sun, Aug 21, 2022 at 07:45:28PM -0700, Paul E. McKenney wrote:
> > > >>>>> On Mon, Aug 22, 2022 at 10:15:16AM +0800, Pingfan Liu wrote:
> > > >>>>>> In order to support parallel, rcu_state.n_online_cpus should be
> > > >>>>>> atomic_dec()
> > > >>>>>>
> > > >>>>>> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > >>>>>
> > > >>>>> I have to ask...  What testing have you subjected this patch to?
> > > >>>>>
> > > >>>>
> > > >>>> This patch subjects to [1]. The series aims to enable kexec-reboot in
> > > >>>> parallel on all cpu. As a result, the involved RCU part is expected to
> > > >>>> support parallel.
> > > >>>
> > > >>> I understand (and even sympathize with) the expectation.  But results
> > > >>> sometimes diverge from expectations.  There have been implicit assumptions
> > > >>> in RCU about only one CPU going offline at a time, and I am not sure
> > > >>> that all of them have been addressed.  Concurrent CPU onlining has
> > > >>> been looked at recently here:
> > > >>>
> > > >>> https://docs.google.com/document/d/1jymsaCPQ1PUDcfjIKm0UIbVdrJAaGX-6cXrmcfm0PRU/edit?usp=sharing
> > > >>>
> > > >>> You did us atomic_dec() to make rcu_state.n_online_cpus decrementing be
> > > >>> atomic, which is good.  Did you look through the rest of RCU's CPU-offline
> > > >>> code paths and related code paths?
> > > >>
> > > >> I went through those codes at a shallow level, especially at each
> > > >> cpuhp_step hook in the RCU system.
> > > >
> > > > And that is fine, at least as a first step.
> > > >
> > > >> But as you pointed out, there are implicit assumptions about only one
> > > >> CPU going offline at a time, I will chew the google doc which you
> > > >> share.  Then I can come to a final result.
> > > >
> > > > Boqun Feng, Neeraj Upadhyay, Uladzislau Rezki, and I took a quick look,
> > > > and rcu_boost_kthread_setaffinity() seems to need some help.  As it
> > > > stands, it appears that concurrent invocations of this function from the
> > > > CPU-offline path will cause all but the last outgoing CPU's bit to be
> > > > (incorrectly) set in the cpumask_var_t passed to set_cpus_allowed_ptr().
> > > >
> > > > This should not be difficult to fix, for example, by maintaining a
> > > > separate per-leaf-rcu_node-structure bitmask of the concurrently outgoing
> > > > CPUs for that rcu_node structure.  (Similar in structure to the
> > > > ->qsmask field.)
> > > >
> > > > There are probably more where that one came from.  ;-)
> > >
> > > Should rcutree_dying_cpu() access to rnp->qsmask have a READ_ONCE() ? I was
> > > thinking grace period initialization or qs reporting paths racing with that. Its
> > > just tracing, still :)
> >
> > Looks like it should be regardless of Pingfan's patches, given that
> > the grace-period kthread might report a quiescent state concurrently.
> 
> Thanks for confirming, I'll queue it into my next revision of the series.

Sounds good!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-24 16:20           ` Paul E. McKenney
  2022-08-24 17:26             ` Joel Fernandes
@ 2022-08-31 16:15             ` Paul E. McKenney
  2022-09-05  3:53               ` Pingfan Liu
  1 sibling, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2022-08-31 16:15 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: LKML, rcu, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Thomas Gleixner, Steven Price, Mark Rutland,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld, boqun.feng

On Wed, Aug 24, 2022 at 09:20:50AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 24, 2022 at 09:53:11PM +0800, Pingfan Liu wrote:
> > On Tue, Aug 23, 2022 at 11:01 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > On Tue, Aug 23, 2022 at 09:50:56AM +0800, Pingfan Liu wrote:
> > > > On Sun, Aug 21, 2022 at 07:45:28PM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Aug 22, 2022 at 10:15:16AM +0800, Pingfan Liu wrote:
> > > > > > In order to support parallel, rcu_state.n_online_cpus should be
> > > > > > atomic_dec()
> > > > > >
> > > > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > > >
> > > > > I have to ask...  What testing have you subjected this patch to?
> > > > >
> > > >
> > > > This patch subjects to [1]. The series aims to enable kexec-reboot in
> > > > parallel on all cpu. As a result, the involved RCU part is expected to
> > > > support parallel.
> > >
> > > I understand (and even sympathize with) the expectation.  But results
> > > sometimes diverge from expectations.  There have been implicit assumptions
> > > in RCU about only one CPU going offline at a time, and I am not sure
> > > that all of them have been addressed.  Concurrent CPU onlining has
> > > been looked at recently here:
> > >
> > > https://docs.google.com/document/d/1jymsaCPQ1PUDcfjIKm0UIbVdrJAaGX-6cXrmcfm0PRU/edit?usp=sharing
> > >
> > > You did us atomic_dec() to make rcu_state.n_online_cpus decrementing be
> > > atomic, which is good.  Did you look through the rest of RCU's CPU-offline
> > > code paths and related code paths?
> > 
> > I went through those codes at a shallow level, especially at each
> > cpuhp_step hook in the RCU system.
> 
> And that is fine, at least as a first step.
> 
> > But as you pointed out, there are implicit assumptions about only one
> > CPU going offline at a time, I will chew the google doc which you
> > share.  Then I can come to a final result.
> 
> Boqun Feng, Neeraj Upadhyay, Uladzislau Rezki, and I took a quick look,
> and rcu_boost_kthread_setaffinity() seems to need some help.  As it
> stands, it appears that concurrent invocations of this function from the
> CPU-offline path will cause all but the last outgoing CPU's bit to be
> (incorrectly) set in the cpumask_var_t passed to set_cpus_allowed_ptr().
> 
> This should not be difficult to fix, for example, by maintaining a
> separate per-leaf-rcu_node-structure bitmask of the concurrently outgoing
> CPUs for that rcu_node structure.  (Similar in structure to the
> ->qsmask field.)
> 
> There are probably more where that one came from.  ;-)

And here is one more from this week's session.

The calls to tick_dep_set() and tick_dep_clear() use atomic operations,
but they operate on a global variable.  This means that the first call
to rcutree_offline_cpu() would enable the tick and the first call to
rcutree_dead_cpu() would disable the tick.  This might be OK, but it
is at the very least bad practice.  There needs to be a counter
mediating these calls.
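
That is, something like the following (a sketch only, with made-up
names):

static atomic_t rcu_tick_dep_count;

static void rcu_tick_dep_get(void)
{
	/* The first of the concurrent callers enables the tick dependency. */
	if (atomic_inc_return(&rcu_tick_dep_count) == 1)
		tick_dep_set(TICK_DEP_BIT_RCU);
}

static void rcu_tick_dep_put(void)
{
	/* The last caller to drop its reference clears it again. */
	if (atomic_dec_and_test(&rcu_tick_dep_count))
		tick_dep_clear(TICK_DEP_BIT_RCU);
}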

For more detail, please see the Google document:

https://docs.google.com/document/d/1jymsaCPQ1PUDcfjIKm0UIbVdrJAaGX-6cXrmcfm0PRU/edit?usp=sharing

							Thanx, Paul

> > > > [1]: https://lore.kernel.org/linux-arm-kernel/20220822021520.6996-3-kernelfans@gmail.com/T/#mf62352138d7b040fdb583ba66f8cd0ed1e145feb
> > >
> > > Perhaps I am more blind than usual today, but I am not seeing anything
> > > in this patch describing the testing.  At this point, I am thinking in
> > > terms of making rcutorture test concurrent CPU offlining parallel
> > 
> > Yes, testing results are more convincing in this area.
> > 
> > After making clear the implicit assumptions, I will write some code to
> > bridge my code and rcutorture test. Since the series is a little
> > different from parallel cpu offlining. It happens after all devices
> > are torn down, and there is no way to rollback.
> 
> Very good, looking forward to seeing what you come up with!
> 
> > > Thoughts?
> > 
> > Need a deeper dive into this field. Hope to bring out something soon.
> 
> Again, looking forward to seeing what you find!
> 
> 							Thanx, Paul

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-08-31 16:15             ` Paul E. McKenney
@ 2022-09-05  3:53               ` Pingfan Liu
  2022-09-06 18:45                 ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Pingfan Liu @ 2022-09-05  3:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, rcu, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Thomas Gleixner, Steven Price, Mark Rutland,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld, boqun.feng

On Thu, Sep 1, 2022 at 12:15 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Wed, Aug 24, 2022 at 09:20:50AM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 24, 2022 at 09:53:11PM +0800, Pingfan Liu wrote:
> > > On Tue, Aug 23, 2022 at 11:01 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > On Tue, Aug 23, 2022 at 09:50:56AM +0800, Pingfan Liu wrote:
> > > > > On Sun, Aug 21, 2022 at 07:45:28PM -0700, Paul E. McKenney wrote:
> > > > > > On Mon, Aug 22, 2022 at 10:15:16AM +0800, Pingfan Liu wrote:
> > > > > > > In order to support parallel, rcu_state.n_online_cpus should be
> > > > > > > atomic_dec()
> > > > > > >
> > > > > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > > > >
> > > > > > I have to ask...  What testing have you subjected this patch to?
> > > > > >
> > > > >
> > > > > This patch subjects to [1]. The series aims to enable kexec-reboot in
> > > > > parallel on all cpu. As a result, the involved RCU part is expected to
> > > > > support parallel.
> > > >
> > > > I understand (and even sympathize with) the expectation.  But results
> > > > sometimes diverge from expectations.  There have been implicit assumptions
> > > > in RCU about only one CPU going offline at a time, and I am not sure
> > > > that all of them have been addressed.  Concurrent CPU onlining has
> > > > been looked at recently here:
> > > >
> > > > https://docs.google.com/document/d/1jymsaCPQ1PUDcfjIKm0UIbVdrJAaGX-6cXrmcfm0PRU/edit?usp=sharing
> > > >
> > > > You did us atomic_dec() to make rcu_state.n_online_cpus decrementing be
> > > > atomic, which is good.  Did you look through the rest of RCU's CPU-offline
> > > > code paths and related code paths?
> > >
> > > I went through those codes at a shallow level, especially at each
> > > cpuhp_step hook in the RCU system.
> >
> > And that is fine, at least as a first step.
> >
> > > But as you pointed out, there are implicit assumptions about only one
> > > CPU going offline at a time, I will chew the google doc which you
> > > share.  Then I can come to a final result.
> >
> > Boqun Feng, Neeraj Upadhyay, Uladzislau Rezki, and I took a quick look,
> > and rcu_boost_kthread_setaffinity() seems to need some help.  As it
> > stands, it appears that concurrent invocations of this function from the
> > CPU-offline path will cause all but the last outgoing CPU's bit to be
> > (incorrectly) set in the cpumask_var_t passed to set_cpus_allowed_ptr().
> >
> > This should not be difficult to fix, for example, by maintaining a
> > separate per-leaf-rcu_node-structure bitmask of the concurrently outgoing
> > CPUs for that rcu_node structure.  (Similar in structure to the
> > ->qsmask field.)
> >

Sorry for the late reply; I was interrupted by some other things.
I have taken a different approach and posted a series ([PATCH 1/3] rcu:
remove redundant cpu affinity setting during teardown) for that at
https://lore.kernel.org/rcu/20220905033852.18988-1-kernelfans@gmail.com/T/#t

Besides, I have begun integrating concurrent cpu hot-removal into the
rcutorture test.

> > There are probably more where that one came from.  ;-)
>
> And here is one more from this week's session.
>

Thanks for the update.

> The calls to tick_dep_set() and tick_dep_clear() use atomic operations,
> but they operate on a global variable.  This means that the first call
> to rcutree_offline_cpu() would enable the tick and the first call to
> rcutree_dead_cpu() would disable the tick.  This might be OK, but it
> is at the very least bad practice.  There needs to be a counter
> mediating these calls.
>

I will see what I can do here.

> For more detail, please see the Google document:
>
> https://docs.google.com/document/d/1jymsaCPQ1PUDcfjIKm0UIbVdrJAaGX-6cXrmcfm0PRU/edit?usp=sharing
>

I have read it and hope that both online and offline concurrency can
become reality in the near future.

Thanks,

    Pingfan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel
  2022-09-05  3:53               ` Pingfan Liu
@ 2022-09-06 18:45                 ` Paul E. McKenney
  0 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2022-09-06 18:45 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: LKML, rcu, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Thomas Gleixner, Steven Price, Mark Rutland,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld, boqun.feng

On Mon, Sep 05, 2022 at 11:53:52AM +0800, Pingfan Liu wrote:
> On Thu, Sep 1, 2022 at 12:15 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Wed, Aug 24, 2022 at 09:20:50AM -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 24, 2022 at 09:53:11PM +0800, Pingfan Liu wrote:
> > > > On Tue, Aug 23, 2022 at 11:01 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > On Tue, Aug 23, 2022 at 09:50:56AM +0800, Pingfan Liu wrote:
> > > > > > On Sun, Aug 21, 2022 at 07:45:28PM -0700, Paul E. McKenney wrote:
> > > > > > > On Mon, Aug 22, 2022 at 10:15:16AM +0800, Pingfan Liu wrote:
> > > > > > > > In order to support parallel, rcu_state.n_online_cpus should be
> > > > > > > > atomic_dec()
> > > > > > > >
> > > > > > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > > > > >
> > > > > > > I have to ask...  What testing have you subjected this patch to?
> > > > > > >
> > > > > >
> > > > > > This patch subjects to [1]. The series aims to enable kexec-reboot in
> > > > > > parallel on all cpu. As a result, the involved RCU part is expected to
> > > > > > support parallel.
> > > > >
> > > > > I understand (and even sympathize with) the expectation.  But results
> > > > > sometimes diverge from expectations.  There have been implicit assumptions
> > > > > in RCU about only one CPU going offline at a time, and I am not sure
> > > > > that all of them have been addressed.  Concurrent CPU onlining has
> > > > > been looked at recently here:
> > > > >
> > > > > https://docs.google.com/document/d/1jymsaCPQ1PUDcfjIKm0UIbVdrJAaGX-6cXrmcfm0PRU/edit?usp=sharing
> > > > >
> > > > > You did us atomic_dec() to make rcu_state.n_online_cpus decrementing be
> > > > > atomic, which is good.  Did you look through the rest of RCU's CPU-offline
> > > > > code paths and related code paths?
> > > >
> > > > I went through those codes at a shallow level, especially at each
> > > > cpuhp_step hook in the RCU system.
> > >
> > > And that is fine, at least as a first step.
> > >
> > > > But as you pointed out, there are implicit assumptions about only one
> > > > CPU going offline at a time, I will chew the google doc which you
> > > > share.  Then I can come to a final result.
> > >
> > > Boqun Feng, Neeraj Upadhyay, Uladzislau Rezki, and I took a quick look,
> > > and rcu_boost_kthread_setaffinity() seems to need some help.  As it
> > > stands, it appears that concurrent invocations of this function from the
> > > CPU-offline path will cause all but the last outgoing CPU's bit to be
> > > (incorrectly) set in the cpumask_var_t passed to set_cpus_allowed_ptr().
> > >
> > > This should not be difficult to fix, for example, by maintaining a
> > > separate per-leaf-rcu_node-structure bitmask of the concurrently outgoing
> > > CPUs for that rcu_node structure.  (Similar in structure to the
> > > ->qsmask field.)
> > >
> 
> Sorry to reply late, since I am interrupted by some other things.
> I have took a different way and posted a series ([PATCH 1/3] rcu:
> remove redundant cpu affinity setting during teardown) for that on
> https://lore.kernel.org/rcu/20220905033852.18988-1-kernelfans@gmail.com/T/#t

And I took patch #3, thank you!

#1 allows the kthread to run on the outgoing CPU, which is to be
avoided, and #2 depends on #1.

> Besides, for the integration of the concurrency cpu hot-removing into
> the rcu torture test, I begin to do it.

Very good!  I am looking forward to seeing what you come up with.

> > > There are probably more where that one came from.  ;-)
> >
> > And here is one more from this week's session.
> 
> Thanks for the update.
> 
> > The calls to tick_dep_set() and tick_dep_clear() use atomic operations,
> > but they operate on a global variable.  This means that the first call
> > to rcutree_offline_cpu() would enable the tick and the first call to
> > rcutree_dead_cpu() would disable the tick.  This might be OK, but it
> > is at the very least bad practice.  There needs to be a counter
> > mediating these calls.
> 
> I will see what I can do here.
> 
> > For more detail, please see the Google document:
> >
> > https://docs.google.com/document/d/1jymsaCPQ1PUDcfjIKm0UIbVdrJAaGX-6cXrmcfm0PRU/edit?usp=sharing
> >
> 
> Have read it and hope that both online and offline concurrency can
> come to true in near future.

Indeed, I suspect that a lot of people would like to see faster kexec!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2022-09-06 18:45 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-22  2:15 [RFC 00/10] arm64/riscv: Introduce fast kexec reboot Pingfan Liu
2022-08-22  2:15 ` Pingfan Liu
2022-08-22  2:15 ` Pingfan Liu
2022-08-22  2:15 ` Pingfan Liu
2022-08-22  2:15 ` [RFC 01/10] cpu/hotplug: Make __cpuhp_kick_ap() ready for async Pingfan Liu
2022-08-22  2:15 ` [RFC 02/10] cpu/hotplug: Compile smp_shutdown_nonboot_cpus() conditioned on CONFIG_SHUTDOWN_NONBOOT_CPUS Pingfan Liu
2022-08-22  2:15   ` [RFC 02/10] cpu/hotplug: Compile smp_shutdown_nonboot_cpus() conditioned on CONFIG_SHUTDOWN_NONBOOT_ Pingfan Liu
2022-08-22  2:15   ` [RFC 02/10] cpu/hotplug: Compile smp_shutdown_nonboot_cpus() conditioned on CONFIG_SHUTDOWN_NONBOOT_CPUS Pingfan Liu
2022-08-22  2:15   ` Pingfan Liu
2022-08-22  2:15 ` [RFC 03/10] cpu/hotplug: Introduce fast kexec reboot Pingfan Liu
2022-08-22  2:15   ` Pingfan Liu
2022-08-22  2:15   ` Pingfan Liu
2022-08-22  2:15   ` Pingfan Liu
2022-08-22  2:15 ` [RFC 04/10] cpu/hotplug: Check the capability of kexec quick reboot Pingfan Liu
2022-08-22  2:15   ` Pingfan Liu
2022-08-22  2:15   ` Pingfan Liu
2022-08-22  2:15   ` Pingfan Liu
2022-08-22  2:15 ` [RFC 05/10] perf/arm-dsu: Make dsu_pmu_cpu_teardown() parallel Pingfan Liu
2022-08-22  2:15   ` Pingfan Liu
2022-08-22  2:15 ` [RFC 06/10] rcu/hotplug: Make rcutree_dead_cpu() parallel Pingfan Liu
2022-08-22  2:45   ` Paul E. McKenney
2022-08-23  1:50     ` Pingfan Liu
2022-08-23  3:01       ` Paul E. McKenney
2022-08-24 13:53         ` Pingfan Liu
2022-08-24 16:20           ` Paul E. McKenney
2022-08-24 17:26             ` Joel Fernandes
2022-08-24 19:21               ` Paul E. McKenney
2022-08-24 22:54                 ` Joel Fernandes
2022-08-24 23:01                   ` Paul E. McKenney
2022-08-31 16:15             ` Paul E. McKenney
2022-09-05  3:53               ` Pingfan Liu
2022-09-06 18:45                 ` Paul E. McKenney
2022-08-22  4:54   ` kernel test robot
2022-08-22 18:08   ` Joel Fernandes
2022-08-23  1:56     ` Pingfan Liu
2022-08-23  3:14       ` Joel Fernandes
2022-08-24 13:38         ` Pingfan Liu
2022-08-24 13:44       ` Jason A. Donenfeld
2022-08-22  2:15 ` [RFC 07/10] lib/cpumask: Introduce cpumask_not_dying_but() Pingfan Liu
2022-08-22 14:15   ` Yury Norov
2022-08-23  7:29     ` Pingfan Liu
2022-08-22  2:15 ` [RFC 08/10] cpuhp: Replace cpumask_any_but(cpu_online_mask, cpu) Pingfan Liu
2022-08-22  2:15   ` [Intel-gfx] " Pingfan Liu
2022-08-22  2:15   ` Pingfan Liu
2022-08-22  2:15   ` Pingfan Liu
2022-08-22  2:15   ` Pingfan Liu
2022-08-22  2:15 ` [RFC 09/10] genirq/cpuhotplug: Ask migrate_one_irq() to migrate to a real online cpu Pingfan Liu
2022-08-22  2:15 ` [RFC 10/10] arm64: smp: Make __cpu_disable() parallel Pingfan Liu
2022-08-22  2:15   ` Pingfan Liu
