* [PATCH v4 00/11] smp: cross CPU call interface
@ 2022-05-19 20:49 Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Donghai Qiao @ 2022-05-19 20:49 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

The motivation of this patch set is to turn the existing cross CPU
call mechanism into a formal interface that is friendlier to kernel
developers.

For simplicity, self-explanatory naming, less code redundancy and no
ambiguity, the set of functions below can satisfy any existing demand
for cross calls, while carrying over the same semantics and similar
parameter lists. A rough usage sketch follows the prototypes.

int smp_call(int cpu, smp_call_func_t func, void *info, unsigned int flags)

int smp_call_cond(int cpu, smp_call_func_t func, void *info,
                smp_cond_func_t condf, unsigned int flags)

void smp_call_mask(const struct cpumask *mask, smp_call_func_t func,
                void *info, unsigned int flags)

void smp_call_mask_others(const struct cpumask *mask, smp_call_func_t func,
                void *info, unsigned int flags)

void smp_call_mask_cond(const struct cpumask *mask, smp_call_func_t func,
                void *info, smp_cond_func_t condf, unsigned int flags)

int smp_call_csd(int cpu, call_single_data_t *csd, unsigned int flags)
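
As a rough usage sketch (for illustration only, not part of this
series; cpu, mask, func and info stand for a caller's own arguments):

	/* Run func(info) on one CPU and wait until it has completed. */
	err = smp_call(cpu, func, info, SMP_CALL_TYPE_SYNC);

	/* Run func(info) on every online CPU, including the local one. */
	smp_call_mask(cpu_online_mask, func, info, SMP_CALL_TYPE_SYNC);

	/*
	 * Run func(info) on the CPUs in mask, excluding the local CPU.
	 * Patch 2 leaves it to the caller to disable preemption around
	 * this variant if needed.
	 */
	preempt_disable();
	smp_call_mask_others(mask, func, info, SMP_CALL_TYPE_ASYNC);
	preempt_enable();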

Patch 1: The smp cross call related structures and definitions are
consolidated from smp.c and smp_types.h into smp.h. As a result,
smp_types.h is deleted from the source tree.

Patch 2: The set of smp_call* functions listed above is defined.
Their implementations are completed by the subsequent patches in
this set.

Patch 3: Removed SCF_WAIT and SCF_RUN_LOCAL and changed
smp_call_function_many_cond() accordingly. In order to differentiate
cross calls that include the local CPU from those that exclude it,
the interface provides two different functions - smp_call_mask() and
smp_call_mask_others(). As a result of these changes, the on_each_cpu*
functions, smp_call_function_many() and smp_call_function() will be
eliminated by later patches in this series.

Patch 4: Eliminated the percpu global csd_data and temporarily hooked
smp_call_function_single() up to smp_call().

Patch 5: Replaced smp_call_function_single_async() with smp_call_csd()
and extended smp_call_csd() to also support synchronous cross calls to
a single CPU with a preallocated csd structure.

Patch 6: irq_work.c and core.c should use the cross call interface
rather than the unpublished internal function __smp_call_single_queue().

This change also exposes the unpublished types SMP_CALL_TYPE_TTWU and
SMP_CALL_TYPE_IRQ_WORK. However, it is a temporary change so that all
kernel clients can use the interface as a first step. A permanent
solution should implement a dedicated cross call func() for each of
them, from which the TTWU and IRQ_WORK types of requests can be
processed. That way, the special TTWU and IRQ_WORK types will not be
visible from the cross call interface. Because this requires changes
to core.c and irq_work.c, it should be done with a follow-up patch set.

Patch 7: When a caller needs to send a cross call to a preferred CPU,
the caller should provide the CPU rather than having the cross call
interface pick one, because the interface should do the common job for
the majority of callers rather than cover every single special case.

In the scenario of smp_call_function_any(), a preferred CPU is either
the local CPU, any CPU in the same node, or any online CPU, as long as
it is present in the cpumask. However, selecting a preferred CPU should
be done by the caller. Thus, the code that deals with CPU selection is
moved out of smp_call_function_any() into a new utility function,
__smp_cpu_nearby(). The function smp_call_function_any() can then be
replaced with the combination of __smp_cpu_nearby() and smp_call(), as
sketched below, and is eliminated as a result.
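
For illustration, the conversion pattern used by the callers updated
in patch 7 looks roughly like this (cpu, mask, func and info stand for
the caller's own arguments):

	preempt_disable();
	cpu = __smp_cpu_nearby(mask);
	if ((unsigned int)cpu >= nr_cpu_ids) {
		preempt_enable();
		return -ENXIO;
	}
	ret = smp_call(cpu, func, info, SMP_CALL_TYPE_SYNC);
	preempt_enable();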

Patch 8: Eliminated smp_call_function, smp_call_function_many,
smp_call_function_many_cond, on_each_cpu, on_each_cpu_mask,
on_each_cpu_cond, on_each_cpu_cond_mask.

Patch 9: Eliminated smp_call_function_single_async.

Patch 10: Eliminated smp_call_function_single.

Patch 11: Modified up.c to adopt the same format of cross CPU calls.

Note: Each patch in this set depends on its precedent patch only.
      The kernel can be built and booted if it is patched with any
      number of patches applied in order from 1 to 11.

v1 -> v2: Removed 'x' from the function names and changed XCALL to
	  SMP_CALL in the new macros.
v2 -> v3: Modified the cover letter since v2;
	  Addressed the issues that Peter and Thomas had pointed out
	  previously.
v3 -> v4: Fixed the assert warnings emitted from __smp_call_mask_cond
	  which were triggered by callers that did not call
	  preempt_disable() before smp_call_mask_others().
	  Fixed the "error: 'lush_tlb_all_local' undeclared" error
	  at arch/parisc/mm/init.c:850:32
	  Fixed the "variable 'end_paddr' set but not used" issue
	  at arch/parisc/mm/init.c:623:31

Donghai Qiao (11):
  smp: consolidate the structure definitions to smp.h
  smp: the definitions of cross call interface
  smp: remove SCF_WAIT and SCF_RUN_LOCAL
  smp: replace smp_call_function_single() with  smp_call()
  smp: replace smp_call_function_single_async with smp_call_csd
  smp: use smp_call_csd() from irq_work.c and core.c
  smp: eliminate smp_call_function_any
  smp: replace smp_call_function_many_cond with __smp_call_mask_cond
  smp: replace smp_call_function_single_async with smp_call_csd
  smp: replace smp_call_function_single() with smp_call
  smp: up.c to adopt the same format of cross CPU call

 arch/alpha/kernel/process.c                   |   2 +-
 arch/alpha/kernel/rtc.c                       |   4 +-
 arch/alpha/kernel/smp.c                       |  13 +-
 arch/arc/kernel/perf_event.c                  |   2 +-
 arch/arc/mm/cache.c                           |   2 +-
 arch/arc/mm/tlb.c                             |  14 +-
 arch/arm/common/bL_switcher.c                 |   3 +-
 arch/arm/kernel/machine_kexec.c               |   7 +-
 arch/arm/kernel/perf_event_v7.c               |  18 +-
 arch/arm/kernel/smp_tlb.c                     |  25 +-
 arch/arm/kernel/smp_twd.c                     |   4 +-
 arch/arm/mach-bcm/bcm_kona_smc.c              |   2 +-
 arch/arm/mach-mvebu/pmsu.c                    |   4 +-
 arch/arm/mm/flush.c                           |  10 +-
 arch/arm/vfp/vfpmodule.c                      |   2 +-
 arch/arm64/kernel/armv8_deprecated.c          |   4 +-
 arch/arm64/kernel/perf_event.c                |  16 +-
 arch/arm64/kernel/topology.c                  |   2 +-
 arch/arm64/kvm/arm.c                          |   6 +-
 arch/csky/abiv2/cacheflush.c                  |   2 +-
 arch/csky/kernel/cpu-probe.c                  |   2 +-
 arch/csky/kernel/perf_event.c                 |   2 +-
 arch/csky/kernel/smp.c                        |   2 +-
 arch/csky/mm/cachev2.c                        |   2 +-
 arch/ia64/kernel/mca.c                        |   4 +-
 arch/ia64/kernel/palinfo.c                    |   3 +-
 arch/ia64/kernel/smp.c                        |  12 +-
 arch/ia64/kernel/smpboot.c                    |   2 +-
 arch/ia64/kernel/uncached.c                   |  14 +-
 arch/mips/cavium-octeon/octeon-irq.c          |   4 +-
 arch/mips/cavium-octeon/setup.c               |  12 +-
 arch/mips/kernel/crash.c                      |   6 +-
 arch/mips/kernel/machine_kexec.c              |   5 +-
 arch/mips/kernel/perf_event_mipsxx.c          |   7 +-
 arch/mips/kernel/process.c                    |   2 +-
 arch/mips/kernel/smp-bmips.c                  |   3 +-
 arch/mips/kernel/smp-cps.c                    |   8 +-
 arch/mips/kernel/smp.c                        |  13 +-
 arch/mips/kernel/sysrq.c                      |   5 +-
 arch/mips/mm/c-r4k.c                          |   4 +-
 arch/mips/sibyte/common/cfe.c                 |   5 +-
 arch/openrisc/kernel/smp.c                    |  15 +-
 arch/parisc/kernel/cache.c                    |   4 +-
 arch/parisc/mm/init.c                         |   4 +-
 arch/powerpc/kernel/dawr.c                    |   8 +-
 arch/powerpc/kernel/kvm.c                     |   2 +-
 arch/powerpc/kernel/security.c                |   6 +-
 arch/powerpc/kernel/smp.c                     |  10 +-
 arch/powerpc/kernel/sysfs.c                   |  28 +-
 arch/powerpc/kernel/tau_6xx.c                 |   4 +-
 arch/powerpc/kernel/watchdog.c                |   4 +-
 arch/powerpc/kexec/core_64.c                  |   7 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c           |   2 +-
 arch/powerpc/kvm/book3s_hv.c                  |   8 +-
 arch/powerpc/mm/book3s64/pgtable.c            |   2 +-
 arch/powerpc/mm/book3s64/radix_tlb.c          |  12 +-
 arch/powerpc/mm/nohash/tlb.c                  |  11 +-
 arch/powerpc/mm/slice.c                       |   4 +-
 arch/powerpc/perf/core-book3s.c               |   2 +-
 arch/powerpc/perf/imc-pmu.c                   |   2 +-
 arch/powerpc/platforms/85xx/smp.c             |  14 +-
 arch/powerpc/platforms/powernv/idle.c         |   2 +-
 arch/powerpc/platforms/pseries/lparcfg.c      |   2 +-
 arch/riscv/mm/cacheflush.c                    |   4 +-
 arch/s390/hypfs/hypfs_diag0c.c                |   2 +-
 arch/s390/kernel/alternative.c                |   2 +-
 arch/s390/kernel/perf_cpum_cf.c               |  10 +-
 arch/s390/kernel/perf_cpum_cf_common.c        |   4 +-
 arch/s390/kernel/perf_cpum_sf.c               |   4 +-
 arch/s390/kernel/processor.c                  |   2 +-
 arch/s390/kernel/smp.c                        |   2 +-
 arch/s390/kernel/topology.c                   |   2 +-
 arch/s390/mm/pgalloc.c                        |   2 +-
 arch/s390/pci/pci_irq.c                       |   4 +-
 arch/sh/kernel/smp.c                          |  24 +-
 arch/sh/mm/cache.c                            |   3 +-
 arch/sparc/include/asm/mman.h                 |   4 +-
 arch/sparc/kernel/nmi.c                       |  19 +-
 arch/sparc/kernel/perf_event.c                |   4 +-
 arch/sparc/kernel/smp_64.c                    |  14 +-
 arch/sparc/mm/init_64.c                       |   2 +-
 arch/x86/events/core.c                        |   6 +-
 arch/x86/events/intel/core.c                  |   4 +-
 arch/x86/kernel/alternative.c                 |   2 +-
 arch/x86/kernel/amd_nb.c                      |   2 +-
 arch/x86/kernel/apic/apic.c                   |   2 +-
 arch/x86/kernel/apic/vector.c                 |   2 +-
 arch/x86/kernel/cpu/aperfmperf.c              |   5 +-
 arch/x86/kernel/cpu/bugs.c                    |   2 +-
 arch/x86/kernel/cpu/mce/amd.c                 |   4 +-
 arch/x86/kernel/cpu/mce/core.c                |  12 +-
 arch/x86/kernel/cpu/mce/inject.c              |  12 +-
 arch/x86/kernel/cpu/mce/intel.c               |   2 +-
 arch/x86/kernel/cpu/microcode/core.c          |   4 +-
 arch/x86/kernel/cpu/mtrr/mtrr.c               |   2 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |  14 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        |   8 +-
 arch/x86/kernel/cpu/sgx/main.c                |   5 +-
 arch/x86/kernel/cpu/umwait.c                  |   2 +-
 arch/x86/kernel/cpu/vmware.c                  |   2 +-
 arch/x86/kernel/cpuid.c                       |   2 +-
 arch/x86/kernel/kvm.c                         |   6 +-
 arch/x86/kernel/ldt.c                         |   2 +-
 arch/x86/kvm/vmx/vmx.c                        |   3 +-
 arch/x86/kvm/x86.c                            |  11 +-
 arch/x86/lib/cache-smp.c                      |   4 +-
 arch/x86/lib/msr-smp.c                        |  20 +-
 arch/x86/mm/pat/set_memory.c                  |   4 +-
 arch/x86/mm/tlb.c                             |  12 +-
 arch/x86/xen/mmu_pv.c                         |   4 +-
 arch/x86/xen/smp_pv.c                         |   5 +-
 arch/x86/xen/suspend.c                        |   4 +-
 arch/xtensa/kernel/smp.c                      |  29 +-
 block/blk-mq.c                                |   2 +-
 drivers/acpi/processor_idle.c                 |   4 +-
 drivers/char/agp/generic.c                    |   2 +-
 drivers/clocksource/ingenic-timer.c           |   2 +-
 drivers/clocksource/mips-gic-timer.c          |   2 +-
 drivers/cpufreq/acpi-cpufreq.c                |  15 +-
 drivers/cpufreq/powernow-k8.c                 |   9 +-
 drivers/cpufreq/powernv-cpufreq.c             |  39 +-
 drivers/cpufreq/sparc-us2e-cpufreq.c          |   4 +-
 drivers/cpufreq/sparc-us3-cpufreq.c           |   4 +-
 drivers/cpufreq/speedstep-ich.c               |   7 +-
 drivers/cpufreq/tegra194-cpufreq.c            |   8 +-
 drivers/cpuidle/coupled.c                     |   2 +-
 drivers/cpuidle/driver.c                      |   8 +-
 drivers/edac/amd64_edac.c                     |   4 +-
 drivers/firmware/arm_sdei.c                   |  10 +-
 drivers/gpu/drm/i915/vlv_sideband.c           |   2 +-
 drivers/hwmon/fam15h_power.c                  |   2 +-
 .../hwtracing/coresight/coresight-cpu-debug.c |   3 +-
 .../coresight/coresight-etm3x-core.c          |  11 +-
 .../coresight/coresight-etm4x-core.c          |  12 +-
 .../coresight/coresight-etm4x-sysfs.c         |   2 +-
 drivers/hwtracing/coresight/coresight-trbe.c  |   6 +-
 drivers/irqchip/irq-mvebu-pic.c               |   4 +-
 .../net/ethernet/cavium/liquidio/lio_core.c   |   2 +-
 drivers/net/ethernet/marvell/mvneta.c         |  34 +-
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |   8 +-
 drivers/perf/arm_spe_pmu.c                    |  14 +-
 .../intel/speed_select_if/isst_if_mbox_msr.c  |   4 +-
 drivers/platform/x86/intel_ips.c              |   4 +-
 drivers/powercap/intel_rapl_common.c          |   2 +-
 drivers/powercap/intel_rapl_msr.c             |   2 +-
 drivers/regulator/qcom_spmi-regulator.c       |   3 +-
 drivers/soc/fsl/qbman/qman.c                  |   4 +-
 drivers/soc/fsl/qbman/qman_test_stash.c       |   9 +-
 drivers/soc/xilinx/xlnx_event_manager.c       |   2 +-
 drivers/tty/sysrq.c                           |   5 +-
 drivers/watchdog/booke_wdt.c                  |   8 +-
 fs/buffer.c                                   |   2 +-
 include/linux/irq_work.h                      |   2 +-
 include/linux/smp.h                           | 216 ++++--
 include/linux/smp_types.h                     |  69 --
 kernel/cpu.c                                  |   4 +-
 kernel/debug/debug_core.c                     |   2 +-
 kernel/events/core.c                          |  10 +-
 kernel/irq_work.c                             |   4 +-
 kernel/profile.c                              |   4 +-
 kernel/rcu/rcutorture.c                       |   3 +-
 kernel/rcu/tasks.h                            |   4 +-
 kernel/rcu/tree.c                             |   6 +-
 kernel/rcu/tree_exp.h                         |   4 +-
 kernel/relay.c                                |   5 +-
 kernel/scftorture.c                           |  20 +-
 kernel/sched/core.c                           |   4 +-
 kernel/sched/fair.c                           |   2 +-
 kernel/sched/membarrier.c                     |  10 +-
 kernel/smp.c                                  | 641 ++++++++----------
 kernel/time/clockevents.c                     |   2 +-
 kernel/time/clocksource.c                     |   2 +-
 kernel/time/hrtimer.c                         |   5 +-
 kernel/time/tick-common.c                     |   2 +-
 kernel/trace/ftrace.c                         |  18 +-
 kernel/trace/ring_buffer.c                    |   7 +-
 kernel/trace/trace.c                          |  12 +-
 kernel/trace/trace_events.c                   |   2 +-
 kernel/up.c                                   |  58 +-
 mm/kasan/quarantine.c                         |   2 +-
 mm/mmu_gather.c                               |   5 +-
 mm/slab.c                                     |   2 +-
 net/bpf/test_run.c                            |   4 +-
 net/core/dev.c                                |   2 +-
 net/iucv/iucv.c                               |  21 +-
 virt/kvm/kvm_main.c                           |  12 +-
 186 files changed, 1160 insertions(+), 991 deletions(-)
 delete mode 100644 include/linux/smp_types.h

-- 
2.27.0



* [PATCH v4 01/11] smp: consolidate the structure definitions to smp.h
  2022-05-19 20:49 [PATCH v4 00/11] smp: cross CPU call interface Donghai Qiao
@ 2022-05-19 20:49 ` Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 02/11] smp: the definitions of cross call interface Donghai Qiao
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Donghai Qiao @ 2022-05-19 20:49 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Move the structure definitions from kernel/smp.c to
include/linux/smp.h

Move the structure definitions from include/linux/smp_types.h
to include/linux/smp.h and delete smp_types.h

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v2 -> v3: Moved the array seq_type[] back to smp.c as Thomas
           pointed out.

 include/linux/irq_work.h  |   2 +-
 include/linux/smp.h       | 118 ++++++++++++++++++++++++++++++++++++--
 include/linux/smp_types.h |  69 ----------------------
 kernel/smp.c              |  50 ----------------
 4 files changed, 115 insertions(+), 124 deletions(-)
 delete mode 100644 include/linux/smp_types.h

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 8cd11a223260..145af67b1cd3 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -2,7 +2,7 @@
 #ifndef _LINUX_IRQ_WORK_H
 #define _LINUX_IRQ_WORK_H
 
-#include <linux/smp_types.h>
+#include <linux/smp.h>
 #include <linux/rcuwait.h>
 
 /*
diff --git a/include/linux/smp.h b/include/linux/smp.h
index a80ab58ae3f1..94bd901b4e4c 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -10,13 +10,74 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/list.h>
+#include <linux/llist.h>
 #include <linux/cpumask.h>
 #include <linux/init.h>
-#include <linux/smp_types.h>
 
 typedef void (*smp_call_func_t)(void *info);
 typedef bool (*smp_cond_func_t)(int cpu, void *info);
 
+enum {
+	CSD_FLAG_LOCK		= 0x01,
+
+	IRQ_WORK_PENDING	= 0x01,
+	IRQ_WORK_BUSY		= 0x02,
+	IRQ_WORK_LAZY		= 0x04, /* No IPI, wait for tick */
+	IRQ_WORK_HARD_IRQ	= 0x08, /* IRQ context on PREEMPT_RT */
+
+	IRQ_WORK_CLAIMED	= (IRQ_WORK_PENDING | IRQ_WORK_BUSY),
+
+	CSD_TYPE_ASYNC		= 0x00,
+	CSD_TYPE_SYNC		= 0x10,
+	CSD_TYPE_IRQ_WORK	= 0x20,
+	CSD_TYPE_TTWU		= 0x30,
+
+	CSD_FLAG_TYPE_MASK	= 0xF0,
+};
+
+/*
+ * struct __call_single_node is the primary type on
+ * smp.c:call_single_queue.
+ *
+ * flush_smp_call_function_queue() only reads the type from
+ * __call_single_node::u_flags as a regular load, the above
+ * (anonymous) enum defines all the bits of this word.
+ *
+ * Other bits are not modified until the type is known.
+ *
+ * CSD_TYPE_SYNC/ASYNC:
+ *	struct {
+ *		struct llist_node node;
+ *		unsigned int flags;
+ *		smp_call_func_t func;
+ *		void *info;
+ *	};
+ *
+ * CSD_TYPE_IRQ_WORK:
+ *	struct {
+ *		struct llist_node node;
+ *		atomic_t flags;
+ *		void (*func)(struct irq_work *);
+ *	};
+ *
+ * CSD_TYPE_TTWU:
+ *	struct {
+ *		struct llist_node node;
+ *		unsigned int flags;
+ *	};
+ *
+ */
+struct __call_single_node {
+	struct llist_node	llist;
+	union {
+		unsigned int	u_flags;
+		atomic_t	a_flags;
+	};
+#ifdef CONFIG_64BIT
+	u16 src, dst;
+#endif
+};
+
 /*
  * structure shares (partial) layout with struct irq_work
  */
@@ -26,13 +87,62 @@ struct __call_single_data {
 	void *info;
 };
 
-#define CSD_INIT(_func, _info) \
-	(struct __call_single_data){ .func = (_func), .info = (_info), }
-
 /* Use __aligned() to avoid to use 2 cache lines for 1 csd */
 typedef struct __call_single_data call_single_data_t
 	__aligned(sizeof(struct __call_single_data));
 
+struct cfd_percpu {
+	call_single_data_t	csd;
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+	u64	seq_queue;
+	u64	seq_ipi;
+	u64	seq_noipi;
+#endif
+};
+
+struct call_function_data {
+	struct cfd_percpu	__percpu *pcpu;
+	cpumask_var_t		cpumask;
+	cpumask_var_t		cpumask_ipi;
+};
+
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+union cfd_seq_cnt {
+	u64		val;
+	struct {
+		u64	src:16;
+		u64	dst:16;
+#define CFD_SEQ_NOCPU	0xffff
+		u64	type:4;
+#define CFD_SEQ_QUEUE	0
+#define CFD_SEQ_IPI	1
+#define CFD_SEQ_NOIPI	2
+#define CFD_SEQ_PING	3
+#define CFD_SEQ_PINGED	4
+#define CFD_SEQ_HANDLE	5
+#define CFD_SEQ_DEQUEUE	6
+#define CFD_SEQ_IDLE	7
+#define CFD_SEQ_GOTIPI	8
+#define CFD_SEQ_HDLEND	9
+		u64	cnt:28;
+	}		u;
+};
+
+struct cfd_seq_local {
+	u64	ping;
+	u64	pinged;
+	u64	handle;
+	u64	dequeue;
+	u64	idle;
+	u64	gotipi;
+	u64	hdlend;
+};
+#endif
+
+#define CSD_TYPE(_csd)	((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
+#define CSD_INIT(_func, _info) \
+	((struct __call_single_data){ .func = (_func), .info = (_info), })
+
 #define INIT_CSD(_csd, _func, _info)		\
 do {						\
 	*(_csd) = CSD_INIT((_func), (_info));	\
diff --git a/include/linux/smp_types.h b/include/linux/smp_types.h
deleted file mode 100644
index 2e8461af8df6..000000000000
--- a/include/linux/smp_types.h
+++ /dev/null
@@ -1,69 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __LINUX_SMP_TYPES_H
-#define __LINUX_SMP_TYPES_H
-
-#include <linux/llist.h>
-
-enum {
-	CSD_FLAG_LOCK		= 0x01,
-
-	IRQ_WORK_PENDING	= 0x01,
-	IRQ_WORK_BUSY		= 0x02,
-	IRQ_WORK_LAZY		= 0x04, /* No IPI, wait for tick */
-	IRQ_WORK_HARD_IRQ	= 0x08, /* IRQ context on PREEMPT_RT */
-
-	IRQ_WORK_CLAIMED	= (IRQ_WORK_PENDING | IRQ_WORK_BUSY),
-
-	CSD_TYPE_ASYNC		= 0x00,
-	CSD_TYPE_SYNC		= 0x10,
-	CSD_TYPE_IRQ_WORK	= 0x20,
-	CSD_TYPE_TTWU		= 0x30,
-
-	CSD_FLAG_TYPE_MASK	= 0xF0,
-};
-
-/*
- * struct __call_single_node is the primary type on
- * smp.c:call_single_queue.
- *
- * flush_smp_call_function_queue() only reads the type from
- * __call_single_node::u_flags as a regular load, the above
- * (anonymous) enum defines all the bits of this word.
- *
- * Other bits are not modified until the type is known.
- *
- * CSD_TYPE_SYNC/ASYNC:
- *	struct {
- *		struct llist_node node;
- *		unsigned int flags;
- *		smp_call_func_t func;
- *		void *info;
- *	};
- *
- * CSD_TYPE_IRQ_WORK:
- *	struct {
- *		struct llist_node node;
- *		atomic_t flags;
- *		void (*func)(struct irq_work *);
- *	};
- *
- * CSD_TYPE_TTWU:
- *	struct {
- *		struct llist_node node;
- *		unsigned int flags;
- *	};
- *
- */
-
-struct __call_single_node {
-	struct llist_node	llist;
-	union {
-		unsigned int	u_flags;
-		atomic_t	a_flags;
-	};
-#ifdef CONFIG_64BIT
-	u16 src, dst;
-#endif
-};
-
-#endif /* __LINUX_SMP_TYPES_H */
diff --git a/kernel/smp.c b/kernel/smp.c
index 65a630f62363..4d192ac85a91 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -29,30 +29,7 @@
 #include "smpboot.h"
 #include "sched/smp.h"
 
-#define CSD_TYPE(_csd)	((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
-
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
-union cfd_seq_cnt {
-	u64		val;
-	struct {
-		u64	src:16;
-		u64	dst:16;
-#define CFD_SEQ_NOCPU	0xffff
-		u64	type:4;
-#define CFD_SEQ_QUEUE	0
-#define CFD_SEQ_IPI	1
-#define CFD_SEQ_NOIPI	2
-#define CFD_SEQ_PING	3
-#define CFD_SEQ_PINGED	4
-#define CFD_SEQ_HANDLE	5
-#define CFD_SEQ_DEQUEUE	6
-#define CFD_SEQ_IDLE	7
-#define CFD_SEQ_GOTIPI	8
-#define CFD_SEQ_HDLEND	9
-		u64	cnt:28;
-	}		u;
-};
-
 static char *seq_type[] = {
 	[CFD_SEQ_QUEUE]		= "queue",
 	[CFD_SEQ_IPI]		= "ipi",
@@ -65,37 +42,10 @@ static char *seq_type[] = {
 	[CFD_SEQ_GOTIPI]	= "gotipi",
 	[CFD_SEQ_HDLEND]	= "hdlend (src CPU 0 == early)",
 };
-
-struct cfd_seq_local {
-	u64	ping;
-	u64	pinged;
-	u64	handle;
-	u64	dequeue;
-	u64	idle;
-	u64	gotipi;
-	u64	hdlend;
-};
 #endif
 
-struct cfd_percpu {
-	call_single_data_t	csd;
-#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
-	u64	seq_queue;
-	u64	seq_ipi;
-	u64	seq_noipi;
-#endif
-};
-
-struct call_function_data {
-	struct cfd_percpu	__percpu *pcpu;
-	cpumask_var_t		cpumask;
-	cpumask_var_t		cpumask_ipi;
-};
-
 static DEFINE_PER_CPU_ALIGNED(struct call_function_data, cfd_data);
-
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);
-
 static void flush_smp_call_function_queue(bool warn_cpu_offline);
 
 int smpcfd_prepare_cpu(unsigned int cpu)
-- 
2.27.0



* [PATCH v4 02/11] smp: the definitions of cross call interface
  2022-05-19 20:49 [PATCH v4 00/11] smp: cross CPU call interface Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
@ 2022-05-19 20:49 ` Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 03/11] smp: remove SCF_WAIT and SCF_RUN_LOCAL Donghai Qiao
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Donghai Qiao @ 2022-05-19 20:49 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

The functions of the cross CPU call interface are defined below:

int smp_call(int cpu, smp_call_func_t func, void *info,
		unsigned int flags)

int smp_call_cond(int cpu, smp_call_func_t func, void *info,
		smp_cond_func_t condf, unsigned int flags)

void smp_call_mask(const struct cpumask *mask, smp_call_func_t func,
		void *info, unsigned int flags)

void smp_call_mask_others(const struct cpumask *mask, smp_call_func_t func,
		void *info, unsigned int flags)

void smp_call_mask_cond(const struct cpumask *mask, smp_call_func_t func,
		void *info, smp_cond_func_t condf, unsigned int flags)

int smp_call_csd(int cpu, call_single_data_t *csd, unsigned int flags)

The set of functions above covers the requirements for both synchronous
and asynchronous cross calls. The target CPUs can be specified as a
single CPU, all of the CPUs in a given cpumask, or all of the CPUs in a
cpumask except the local CPU. The CPUs in the cpumask can be as many as
all of the online CPUs or just a selected subset.

Although there is a constraint that cross calls cannot be made from
contexts with interrupts disabled, the function smp_call_csd() is
allowed to do so, provided the callers supply and maintain their own
preallocated call_single_data_t structures, as sketched below.
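
A rough usage sketch (illustration only; my_csd stands for a
caller-owned call_single_data_t, and cpu, func, info are placeholders):

	/* Synchronous: returns only after func(info) has run on cpu. */
	err = smp_call(cpu, func, info, SMP_CALL_TYPE_SYNC);

	/*
	 * Asynchronous with a preallocated csd; this variant may be
	 * issued with interrupts disabled.
	 */
	INIT_CSD(&my_csd, func, info);
	err = smp_call_csd(cpu, &my_csd, SMP_CALL_TYPE_ASYNC);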

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: Removed 'x' from the function names and changed XCALL to
          SMP_CALL in the new macros
v2 -> v3: In order to fix the issues that Peter pointed out previously:
          Removed smp_call_any();
          Changed smp_call_private to smp_call_csd;
          Added smp_call_mask_others();
          Modified the commit comments accordingly.

 include/linux/smp.h |  30 ++++++++
 kernel/smp.c        | 172 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 202 insertions(+)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 94bd901b4e4c..06a20454fd53 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -148,6 +148,36 @@ do {						\
 	*(_csd) = CSD_INIT((_func), (_info));	\
 } while (0)
 
+
+/*
+ * smp_call Interface.
+ *
+ * Also see kernel/smp.c for the details.
+ */
+#define	SMP_CALL_TYPE_SYNC		CSD_TYPE_SYNC
+#define	SMP_CALL_TYPE_ASYNC		CSD_TYPE_ASYNC
+#define	SMP_CALL_TYPE_IRQ_WORK		CSD_TYPE_IRQ_WORK
+#define	SMP_CALL_TYPE_TTWU		CSD_TYPE_TTWU
+#define	SMP_CALL_TYPE_MASK		CSD_FLAG_TYPE_MASK
+
+#define	SMP_CALL_ALL	-1
+
+extern int smp_call(int cpu, smp_call_func_t func, void *info, unsigned int flags);
+
+extern int smp_call_cond(int cpu, smp_call_func_t func, void *info,
+		smp_cond_func_t condf, unsigned int flags);
+
+extern void smp_call_mask(const struct cpumask *mask, smp_call_func_t func,
+		void *info, unsigned int flags);
+
+void smp_call_mask_others(const struct cpumask *mask, smp_call_func_t func,
+                void *info, unsigned int flags);
+
+extern void smp_call_mask_cond(const struct cpumask *mask, smp_call_func_t func,
+		void *info, smp_cond_func_t condf, unsigned int flags);
+
+extern int smp_call_csd(int cpu, call_single_data_t *csd, unsigned int flags);
+
 /*
  * Enqueue a llist_node on the call_single_queue; be very careful, read
  * flush_smp_call_function_queue() in detail.
diff --git a/kernel/smp.c b/kernel/smp.c
index 4d192ac85a91..906be125068f 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -1185,3 +1185,175 @@ int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
 	return sscs.ret;
 }
 EXPORT_SYMBOL_GPL(smp_call_on_cpu);
+
+
+void __smp_call_mask_cond(const struct cpumask *mask,
+		smp_call_func_t func, void *info,
+		smp_cond_func_t cond_func, bool local_cpu,
+		unsigned int flags)
+{
+}
+
+/*
+ * smp_call Interface
+ *
+ * This interface consolidates the historical cross CPU call usage.
+ *
+ * Normally this interface cannot be used with interrupts disabled or
+ * from a hardware interrupt handler or from a bottom half handler.
+ * But there are two exceptions:
+ * 1) It can be used during early boot while early_boot_irqs_disabled
+ *    is set. In this scenario, you should use local_irq_save/restore()
+ *    instead of local_irq_disable/enable().
+ * 2) Because smp_call_csd(cpu, csd, SMP_CALL_TYPE_ASYNC) is an asynchronous
+ *    call with a preallocated csd structure, it can be called from
+ *    contexts where interrupts are disabled.
+ */
+
+/*
+ * Parameters:
+ *
+ * cpu: If cpu == SMP_CALL_ALL, the cross call is for all the online CPUs
+ *      Otherwise, if 0 <= cpu < nr_cpu_ids, the cross call is for that cpu.
+ *
+ * func: It is the cross function that the destination CPUs need to execute.
+ *       This function must be fast and non-blocking.
+ *
+ * info: It is the parameter to func().
+ *
+ * flags: The flags specify whether the cross call is performed
+ *	  synchronously or asynchronously.
+ *
+ *	  A synchronous cross call will not return until all the
+ *	  destination CPUs have executed func() and responded to the call.
+ *
+ *	  An asynchronous cross call returns as soon as it has fired all
+ *	  the cross calls and run func() locally if needed, regardless of
+ *	  the status of the target CPUs.
+ *
+ * Return: %0 on success or negative errno value on error.
+ */
+int smp_call(int cpu, smp_call_func_t func, void *info, unsigned int flags)
+{
+	return smp_call_cond(cpu, func, info, NULL, flags);
+}
+EXPORT_SYMBOL(smp_call);
+
+/*
+ * Parameters:
+ *
+ * cond_func: This is a condition function cond_func(cpu, info) invoked by
+ *	      the underlying cross call mechanism only. If the return value
+ *	      from cond_func(cpu, info) is true, the cross call will be sent
+ *	      to that cpu, otherwise not.
+ *
+ * Others: see smp_call().
+ *
+ * Return: %0 on success or negative errno value on error.
+ */
+int smp_call_cond(int cpu, smp_call_func_t func, void *info,
+		    smp_cond_func_t cond_func, unsigned int flags)
+{
+	preempt_disable();
+	if (cpu == SMP_CALL_ALL) {
+		__smp_call_mask_cond(cpu_online_mask, func, info, cond_func, true, flags);
+	} else if ((unsigned int)cpu < nr_cpu_ids) {
+		__smp_call_mask_cond(cpumask_of(cpu), func, info, cond_func, true, flags);
+	} else {
+		preempt_enable();
+		pr_warn("Invalid cpu ID = %d\n", cpu);
+		return -ENXIO;
+	}
+	preempt_enable();
+	return 0;
+}
+EXPORT_SYMBOL(smp_call_cond);
+
+/*
+ * Parameters:
+ *
+ * mask: The cross call will be sent to all the CPUs in mask.
+ *
+ * Others: see smp_call().
+ */
+void smp_call_mask(const struct cpumask *mask, smp_call_func_t func,
+		void *info, unsigned int flags)
+{
+	if (mask == NULL) {
+		pr_warn("mask must not be NULL\n");
+		return;
+	}
+
+	preempt_disable();
+	__smp_call_mask_cond(mask, func, info, NULL, true, flags);
+	preempt_enable();
+}
+EXPORT_SYMBOL(smp_call_mask);
+
+/*
+ * Parameters:
+ *
+ * mask: The cross call will be sent to all the CPUs in mask except
+ *       the local CPU.
+ *
+ * Others: see smp_call().
+ *
+ * Besides, it is at the callers' discretion whether preemption
+ * needs to be disabled or not during the call.
+ */
+void smp_call_mask_others(const struct cpumask *mask, smp_call_func_t func,
+		void *info, unsigned int flags)
+{
+	if (mask == NULL) {
+		pr_warn("mask must not be NULL\n");
+		return;
+	}
+
+	__smp_call_mask_cond(mask, func, info, NULL, false, flags);
+}
+EXPORT_SYMBOL(smp_call_mask_others);
+
+/*
+ * The combination of smp_call_cond() and smp_call_mask()
+ */
+void smp_call_mask_cond(const struct cpumask *mask,
+		smp_call_func_t func, void *info,
+		smp_cond_func_t cond_func,
+		unsigned int flags)
+{
+	if (!mask) {
+		/*
+		 * If mask is NULL meaning for all CPUs,
+		 * then cond_func must be provided.
+		 */
+		WARN_ON_ONCE(!cond_func);
+		mask = cpu_online_mask;
+	}
+
+	preempt_disable();
+	__smp_call_mask_cond(mask, func, info, cond_func, true, flags);
+	preempt_enable();
+}
+EXPORT_SYMBOL(smp_call_mask_cond);
+
+/*
+ * This function provides an alternative way of sending a cross call to
+ * only one destination CPU with a caller pre-allocated private csd instead
+ * of using the csd resource in the cross call mechanism. Thus it is the
+ * callers' responsibility to set up and maintain their own private csd.
+ *
+ * Because of that, this function can be used from contexts with
+ * interrupts disabled.
+ *
+ * Parameters
+ *
+ * cpu:   Must be a valid CPU number less than nr_cpu_ids.
+ * csd:   The private csd provided by the callers.
+ *
+ * Others: see smp_call().
+ */
+int smp_call_csd(int cpu, call_single_data_t *csd, unsigned int flags)
+{
+	return 0;
+}
+EXPORT_SYMBOL(smp_call_csd);
-- 
2.27.0



* [PATCH v4 03/11] smp: remove SCF_WAIT and SCF_RUN_LOCAL
  2022-05-19 20:49 [PATCH v4 00/11] smp: cross CPU call interface Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 02/11] smp: the definitions of cross call interface Donghai Qiao
@ 2022-05-19 20:49 ` Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 04/11] smp: replace smp_call_function_single() with smp_call() Donghai Qiao
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Donghai Qiao @ 2022-05-19 20:49 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Remove SCF_WAIT and SCF_RUN_LOCAL and change smp_call_function_many_cond()
accordingly. In order to differentiate cross calls that include the local
CPU from those that exclude it, the interface provides two different
functions to deal with that - smp_call_mask() and smp_call_mask_others().
A rough mapping for existing callers is sketched below.

As a result of these changes, the set of on_each_cpu* functions will be
eliminated by a later patch in this series.
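
For illustration only, the rough mapping (func, info and wait stand for
the caller's own arguments):

	/* was: on_each_cpu(func, info, wait) */
	smp_call_mask(cpu_online_mask, func, info,
		      wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC);

	/*
	 * was: smp_call_function(func, info, wait), i.e. all online CPUs
	 * except the local one; callers may need to disable preemption
	 * around this call.
	 */
	smp_call_mask_others(cpu_online_mask, func, info,
			     wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC);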

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: Removed 'x' from the function names and changed XCALL to SMP_CALL
          in the new macros
v2 -> v3: Added a parameter 'local_cpu' to smp_call_function_many_cond()
          and made the corresponding changes.

 kernel/smp.c | 37 ++++++++++---------------------------
 1 file changed, 10 insertions(+), 27 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 906be125068f..101a48d1d8af 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -802,23 +802,14 @@ int smp_call_function_any(const struct cpumask *mask,
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
 
-/*
- * Flags to be used as scf_flags argument of smp_call_function_many_cond().
- *
- * %SCF_WAIT:		Wait until function execution is completed
- * %SCF_RUN_LOCAL:	Run also locally if local cpu is set in cpumask
- */
-#define SCF_WAIT	(1U << 0)
-#define SCF_RUN_LOCAL	(1U << 1)
-
 static void smp_call_function_many_cond(const struct cpumask *mask,
 					smp_call_func_t func, void *info,
-					unsigned int scf_flags,
+					bool local_cpu,
+					bool wait,
 					smp_cond_func_t cond_func)
 {
 	int cpu, last_cpu, this_cpu = smp_processor_id();
 	struct call_function_data *cfd;
-	bool wait = scf_flags & SCF_WAIT;
 	bool run_remote = false;
 	bool run_local = false;
 	int nr_cpus = 0;
@@ -844,14 +835,14 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	WARN_ON_ONCE(!in_task());
 
 	/* Check if we need local execution. */
-	if ((scf_flags & SCF_RUN_LOCAL) && cpumask_test_cpu(this_cpu, mask))
+	if (local_cpu && cpumask_test_cpu(this_cpu, mask))
 		run_local = true;
 
 	/* Check if we need remote execution, i.e., any CPU excluding this one. */
 	cpu = cpumask_first_and(mask, cpu_online_mask);
 	if (cpu == this_cpu)
 		cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
-	if (cpu < nr_cpu_ids)
+	if ((unsigned int)cpu < nr_cpu_ids)
 		run_remote = true;
 
 	if (run_remote) {
@@ -922,16 +913,12 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 }
 
 /**
- * smp_call_function_many(): Run a function on a set of CPUs.
+ * smp_call_function_many(): Run a function on a set of CPUs except the local CPU.
  * @mask: The set of cpus to run on (only runs on online subset).
  * @func: The function to run. This must be fast and non-blocking.
  * @info: An arbitrary pointer to pass to the function.
- * @wait: Bitmask that controls the operation. If %SCF_WAIT is set, wait
- *        (atomically) until function has completed on other CPUs. If
- *        %SCF_RUN_LOCAL is set, the function will also be run locally
- *        if the local CPU is set in the @cpumask.
- *
- * If @wait is true, then returns once @func has returned.
+ * @wait: If wait is true, the call will not return until func()
+ *        has completed on other CPUs.
  *
  * You must not call this function with disabled interrupts or from a
  * hardware interrupt handler or from a bottom half handler. Preemption
@@ -940,7 +927,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 void smp_call_function_many(const struct cpumask *mask,
 			    smp_call_func_t func, void *info, bool wait)
 {
-	smp_call_function_many_cond(mask, func, info, wait * SCF_WAIT, NULL);
+	smp_call_function_many_cond(mask, func, info, false, wait, NULL);
 }
 EXPORT_SYMBOL(smp_call_function_many);
 
@@ -1052,7 +1039,7 @@ void __init smp_init(void)
 }
 
 /*
- * on_each_cpu_cond(): Call a function on each processor for which
+ * on_each_cpu_cond(): Call a function on each processor which
  * the supplied function cond_func returns true, optionally waiting
  * for all the required CPUs to finish. This may include the local
  * processor.
@@ -1076,13 +1063,9 @@ void __init smp_init(void)
 void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
 			   void *info, bool wait, const struct cpumask *mask)
 {
-	unsigned int scf_flags = SCF_RUN_LOCAL;
-
-	if (wait)
-		scf_flags |= SCF_WAIT;
 
 	preempt_disable();
-	smp_call_function_many_cond(mask, func, info, scf_flags, cond_func);
+	smp_call_function_many_cond(mask, func, info, true, wait, cond_func);
 	preempt_enable();
 }
 EXPORT_SYMBOL(on_each_cpu_cond_mask);
-- 
2.27.0



* [PATCH v4 04/11] smp: replace smp_call_function_single() with  smp_call()
  2022-05-19 20:49 [PATCH v4 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (2 preceding siblings ...)
  2022-05-19 20:49 ` [PATCH v4 03/11] smp: remove SCF_WAIT and SCF_RUN_LOCAL Donghai Qiao
@ 2022-05-19 20:49 ` Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 05/11] smp: replace smp_call_function_single_async with smp_call_csd Donghai Qiao
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Donghai Qiao @ 2022-05-19 20:49 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Eliminated the percpu global csd_data and temporarily hooked up
smp_call_function_single() to smp_call().

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: Removed 'x' from the function names and changed XCALL to
          SMP_CALL in the new macros

 kernel/smp.c | 74 ++++++++++++++++++----------------------------------
 1 file changed, 25 insertions(+), 49 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 101a48d1d8af..8fdea9547502 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -414,8 +414,6 @@ static __always_inline void csd_unlock(struct __call_single_data *csd)
 	smp_store_release(&csd->node.u_flags, 0);
 }
 
-static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
-
 void __smp_call_single_queue(int cpu, struct llist_node *node)
 {
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
@@ -649,6 +647,9 @@ void flush_smp_call_function_from_idle(void)
 }
 
 /*
+ * This is a temporary hook-up. This function will be eliminated
+ * by a later patch in this series.
+ *
  * smp_call_function_single - Run a function on a specific CPU
  * @func: The function to run. This must be fast and non-blocking.
  * @info: An arbitrary pointer to pass to the function.
@@ -657,59 +658,21 @@ void flush_smp_call_function_from_idle(void)
  * Returns 0 on success, else a negative status code.
  */
 int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
-			     int wait)
+			int wait)
 {
-	call_single_data_t *csd;
-	call_single_data_t csd_stack = {
-		.node = { .u_flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC, },
-	};
-	int this_cpu;
-	int err;
-
-	/*
-	 * prevent preemption and reschedule on another processor,
-	 * as well as CPU removal
-	 */
-	this_cpu = get_cpu();
-
-	/*
-	 * Can deadlock when called with interrupts disabled.
-	 * We allow cpu's that are not yet online though, as no one else can
-	 * send smp call function interrupt to this cpu and as such deadlocks
-	 * can't happen.
-	 */
-	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
-		     && !oops_in_progress);
+	unsigned int flags = 0;
 
-	/*
-	 * When @wait we can deadlock when we interrupt between llist_add() and
-	 * arch_send_call_function_ipi*(); when !@wait we can deadlock due to
-	 * csd_lock() on because the interrupt context uses the same csd
-	 * storage.
-	 */
-	WARN_ON_ONCE(!in_task());
-
-	csd = &csd_stack;
-	if (!wait) {
-		csd = this_cpu_ptr(&csd_data);
-		csd_lock(csd);
-	}
-
-	csd->func = func;
-	csd->info = info;
-#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
-	csd->node.src = smp_processor_id();
-	csd->node.dst = cpu;
-#endif
-
-	err = generic_exec_single(cpu, csd);
+	if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu))
+		return -ENXIO;
 
 	if (wait)
-		csd_lock_wait(csd);
+		flags = SMP_CALL_TYPE_SYNC;
+	else
+		flags = SMP_CALL_TYPE_ASYNC;
 
-	put_cpu();
+	smp_call(cpu, func, info, flags);
 
-	return err;
+	return 0;
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
@@ -1175,6 +1138,19 @@ void __smp_call_mask_cond(const struct cpumask *mask,
 		smp_cond_func_t cond_func, bool local_cpu,
 		unsigned int flags)
 {
+	bool wait = false;
+
+	if (flags == SMP_CALL_TYPE_SYNC)
+		wait = true;
+
+	preempt_disable();
+
+	/*
+	 * This is a temporary hook. The function smp_call_function_many_cond()
+	 * will be inlined here by a later patch in this series.
+	 */
+	smp_call_function_many_cond(mask, func, info, local_cpu, wait, cond_func);
+	preempt_enable();
 }
 
 /*
-- 
2.27.0



* [PATCH v4 05/11] smp: replace smp_call_function_single_async with smp_call_csd
  2022-05-19 20:49 [PATCH v4 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (3 preceding siblings ...)
  2022-05-19 20:49 ` [PATCH v4 04/11] smp: replace smp_call_function_single() with smp_call() Donghai Qiao
@ 2022-05-19 20:49 ` Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 06/11] smp: use smp_call_csd() from irq_work.c and core.c Donghai Qiao
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Donghai Qiao @ 2022-05-19 20:49 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Replaced smp_call_function_single_async() with smp_call_csd() and extended
it to support synchronous calls to a single CPU with a preallocated csd
structure, as sketched below.
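
A minimal sketch of the conversion (illustration only; csd stands for
a caller-owned call_single_data_t set up with INIT_CSD()):

	/* before */
	smp_call_function_single_async(cpu, &csd);

	/* after: same asynchronous behavior */
	smp_call_csd(cpu, &csd, SMP_CALL_TYPE_ASYNC);

	/* new: synchronous call reusing the same preallocated csd */
	smp_call_csd(cpu, &csd, SMP_CALL_TYPE_SYNC);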

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: Removed 'x' from the function names and changed XCALL to
          SMP_CALL in the new macros
v2 -> v3: Changed the call of smp_call_private() to smp_call_csd()

 include/linux/smp.h |   3 +-
 kernel/smp.c        | 157 ++++++++++++++++++++------------------------
 2 files changed, 75 insertions(+), 85 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 06a20454fd53..b4885e45690b 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -193,7 +193,8 @@ int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
 			   void *info, bool wait, const struct cpumask *mask);
 
-int smp_call_function_single_async(int cpu, struct __call_single_data *csd);
+#define	smp_call_function_single_async(cpu, csd) \
+	smp_call_csd(cpu, csd, SMP_CALL_TYPE_ASYNC)
 
 /*
  * Cpus stopping functions in panic. All have default weak definitions.
diff --git a/kernel/smp.c b/kernel/smp.c
index 8fdea9547502..f08135ad70e3 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -444,41 +444,6 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
 		send_call_function_single_ipi(cpu);
 }
 
-/*
- * Insert a previously allocated call_single_data_t element
- * for execution on the given CPU. data must already have
- * ->func, ->info, and ->flags set.
- */
-static int generic_exec_single(int cpu, struct __call_single_data *csd)
-{
-	if (cpu == smp_processor_id()) {
-		smp_call_func_t func = csd->func;
-		void *info = csd->info;
-		unsigned long flags;
-
-		/*
-		 * We can unlock early even for the synchronous on-stack case,
-		 * since we're doing this from the same CPU..
-		 */
-		csd_lock_record(csd);
-		csd_unlock(csd);
-		local_irq_save(flags);
-		func(info);
-		csd_lock_record(NULL);
-		local_irq_restore(flags);
-		return 0;
-	}
-
-	if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
-		csd_unlock(csd);
-		return -ENXIO;
-	}
-
-	__smp_call_single_queue(cpu, &csd->node.llist);
-
-	return 0;
-}
-
 /**
  * generic_smp_call_function_single_interrupt - Execute SMP IPI callbacks
  *
@@ -676,52 +641,6 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
-/**
- * smp_call_function_single_async() - Run an asynchronous function on a
- * 			         specific CPU.
- * @cpu: The CPU to run on.
- * @csd: Pre-allocated and setup data structure
- *
- * Like smp_call_function_single(), but the call is asynchonous and
- * can thus be done from contexts with disabled interrupts.
- *
- * The caller passes his own pre-allocated data structure
- * (ie: embedded in an object) and is responsible for synchronizing it
- * such that the IPIs performed on the @csd are strictly serialized.
- *
- * If the function is called with one csd which has not yet been
- * processed by previous call to smp_call_function_single_async(), the
- * function will return immediately with -EBUSY showing that the csd
- * object is still in progress.
- *
- * NOTE: Be careful, there is unfortunately no current debugging facility to
- * validate the correctness of this serialization.
- *
- * Return: %0 on success or negative errno value on error
- */
-int smp_call_function_single_async(int cpu, struct __call_single_data *csd)
-{
-	int err = 0;
-
-	preempt_disable();
-
-	if (csd->node.u_flags & CSD_FLAG_LOCK) {
-		err = -EBUSY;
-		goto out;
-	}
-
-	csd->node.u_flags = CSD_FLAG_LOCK;
-	smp_wmb();
-
-	err = generic_exec_single(cpu, csd);
-
-out:
-	preempt_enable();
-
-	return err;
-}
-EXPORT_SYMBOL_GPL(smp_call_function_single_async);
-
 /*
  * smp_call_function_any - Run a function on any of the given cpus
  * @mask: The mask of cpus it can run on.
@@ -1304,15 +1223,85 @@ EXPORT_SYMBOL(smp_call_mask_cond);
  * Because of that, this function can be used from the contexts with disabled
  * interrupts.
  *
- * Parameters
+ * The bit CSD_FLAG_LOCK will be set in csd->node.u_flags only if the call
+ * is made as type CSD_TYPE_SYNC or CSD_TYPE_ASYNC.
  *
+ * Parameters
+ * cpu:   Must be a valid CPU number less than nr_cpu_ids.
  * csd:   The private csd provided by the callers.
- *
  * Others: see smp_call().
+ *
+ * Return: %0 on success or negative errno value on error.
+ *
+ * The following comments are from smp_call_function_single_async():
+ *
+ *   The call is asynchronous and can thus be done from contexts with
+ *   disabled interrupts. If the function is called with one csd which
+ *   has not yet been processed by previous call, the function will
+ *   return immediately with -EBUSY showing that the csd object is
+ *   still in progress.
+ *
+ *   NOTE: Be careful, there is unfortunately no current debugging
+ *   facility to validate the correctness of this serialization.
  */
 int smp_call_csd(int cpu, call_single_data_t *csd, unsigned int flags)
 {
-	return 0;
+	int err = 0;
+
+	if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
+		pr_warn("cpu ID must be a positive number < nr_cpu_ids and must be currently online\n");
+		return -EINVAL;
+	}
+
+	if (csd == NULL) {
+		pr_warn("csd must not be NULL\n");
+		return -EINVAL;
+	}
+
+	preempt_disable();
+	if (csd->node.u_flags & CSD_FLAG_LOCK) {
+		err = -EBUSY;
+		goto out;
+	}
+
+	/*
+	 * CSD_FLAG_LOCK is set for CSD_TYPE_SYNC or CSD_TYPE_ASYNC only.
+	 */
+	if ((flags & ~(CSD_TYPE_SYNC | CSD_TYPE_ASYNC)) == 0)
+		csd->node.u_flags = CSD_FLAG_LOCK | flags;
+	else
+		csd->node.u_flags = flags;
+
+	if (cpu == smp_processor_id()) {
+		smp_call_func_t func = csd->func;
+		void *info = csd->info;
+		unsigned long flags;
+
+		/*
+		 * We can unlock early even for the synchronous on-stack case,
+		 * since we're doing this from the same CPU..
+		 */
+		csd_lock_record(csd);
+		csd_unlock(csd);
+		local_irq_save(flags);
+		func(info);
+		csd_lock_record(NULL);
+		local_irq_restore(flags);
+		goto out;
+	}
+
+	/*
+	 * Ensure the flags are visible before the csd
+	 * goes to the queue.
+	 */
+	smp_wmb();
+
+	__smp_call_single_queue(cpu, &csd->node.llist);
+
+	if (flags & CSD_TYPE_SYNC)
+		csd_lock_wait(csd);
+out:
+	preempt_enable();
+	return err;
 }
 EXPORT_SYMBOL(smp_call_csd);
-- 
2.27.0



* [PATCH v4 06/11] smp: use smp_call_csd() from irq_work.c and core.c
  2022-05-19 20:49 [PATCH v4 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (4 preceding siblings ...)
  2022-05-19 20:49 ` [PATCH v4 05/11] smp: replace smp_call_function_single_async with smp_call_csd Donghai Qiao
@ 2022-05-19 20:49 ` Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 07/11] smp: eliminate smp_call_function_any Donghai Qiao
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Donghai Qiao @ 2022-05-19 20:49 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

irq_work.c and core.c should use the cross call interface rather than
the unpublished internal function __smp_call_single_queue().

This change also exposes the unpublished types SMP_CALL_TYPE_TTWU and
SMP_CALL_TYPE_IRQ_WORK. However, it is a temporary change so that all
kernel clients can use the interface as a first step. A permanent
solution should implement a dedicated cross call func() for each of
them, from which the TTWU and IRQ_WORK types of requests can be
processed. That way, the special TTWU and IRQ_WORK types will not be
visible from the cross call interface. Because this requires changes
to core.c and irq_work.c, it should be done with a follow-up patch set
to reduce the level of risk.

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: Removed 'x' from the function names and changed XCALL to
          SMP_CALL in the new macros
v2 -> v3: Modified the commit comments

 kernel/irq_work.c   | 4 ++--
 kernel/sched/core.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 7afa40fe5cc4..aa67ea2ccb8f 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -159,8 +159,8 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
 			if (!irq_work_claim(work))
 				goto out;
 		}
-
-		__smp_call_single_queue(cpu, &work->node.llist);
+		smp_call_csd(cpu, (call_single_data_t *)&work->node.llist,
+					SMP_CALL_TYPE_IRQ_WORK);
 	} else {
 		__irq_work_queue_local(work);
 	}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 51efaabac3e4..6d1d30bd6220 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3780,7 +3780,7 @@ static void __ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags
 	p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);
 
 	WRITE_ONCE(rq->ttwu_pending, 1);
-	__smp_call_single_queue(cpu, &p->wake_entry.llist);
+	smp_call_csd(cpu, (call_single_data_t *)&p->wake_entry.llist, SMP_CALL_TYPE_TTWU);
 }
 
 void wake_up_if_idle(int cpu)
-- 
2.27.0



* [PATCH v4 07/11] smp: eliminate smp_call_function_any
  2022-05-19 20:49 [PATCH v4 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (5 preceding siblings ...)
  2022-05-19 20:49 ` [PATCH v4 06/11] smp: use smp_call_csd() from irq_work.c and core.c Donghai Qiao
@ 2022-05-19 20:49 ` Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 08/11] smp: replace smp_call_function_many_cond with __smp_call_mask_cond Donghai Qiao
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Donghai Qiao @ 2022-05-19 20:49 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao, kernel test robot

When a caller needs to send a cross call to a preferred CPU,
the caller should provide the CPU rather than having the cross
call interface pick one, because an interface should do the
common job for the majority of callers rather than cover every
single special case.

In the scenario of smp_call_function_any(), a preferred CPU is
either the local CPU, any CPU in the same node, or any online
CPU, as long as it is present in the cpumask. However, selecting
a preferred CPU should be done by the caller rather than by the
interface. Thus, that particular code is moved out of
smp_call_function_any() into a utility function,
__smp_cpu_nearby(). The function smp_call_function_any() can
then be replaced with the combination of __smp_cpu_nearby() and
smp_call(). As a result, the function smp_call_function_any()
is eliminated.

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
---
v1 -> v2: Removed 'x' from the function names and changed XCALL
          to SMP_CALL in the new macros
v2 -> v3: Added a new utility function __smp_cpu_nearby() and
          used the combination of __smp_cpu_nearby() and smp_call()
          to replace smp_call_function_any() accordingly.
v3 -> v4: Changed '/**' to '/*' at kernel/smp.c:1033
 
 arch/arm/kernel/perf_event_v7.c           | 18 +++++-
 arch/arm64/kernel/perf_event.c            | 14 ++++-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 12 +++-
 drivers/cpufreq/acpi-cpufreq.c            |  9 ++-
 drivers/cpufreq/powernv-cpufreq.c         | 37 +++++++++--
 drivers/perf/arm_spe_pmu.c                | 14 ++++-
 include/linux/smp.h                       | 12 +---
 kernel/smp.c                              | 77 ++++++++++-------------
 8 files changed, 123 insertions(+), 70 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index eb2190477da1..64e9d62e9720 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1192,9 +1192,21 @@ static void armv7_read_num_pmnc_events(void *info)
 
 static int armv7_probe_num_events(struct arm_pmu *arm_pmu)
 {
-	return smp_call_function_any(&arm_pmu->supported_cpus,
-				     armv7_read_num_pmnc_events,
-				     &arm_pmu->num_events, 1);
+	int cpu;
+	int ret;
+
+	preempt_disable();
+	cpu = __smp_cpu_nearby(&arm_pmu->supported_cpus);
+	if ((unsigned int)cpu >= nr_cpu_ids) {
+		preempt_enable();
+		return -ENXIO;
+	}
+
+	ret = smp_call(cpu, armv7_read_num_pmnc_events,
+			&arm_pmu->num_events, SMP_CALL_TYPE_SYNC);
+	preempt_enable();
+
+	return ret;
 }
 
 static int armv7_a8_pmu_init(struct arm_pmu *cpu_pmu)
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index cb69ff1e6138..be4ef28dc1d6 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1184,11 +1184,19 @@ static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
 		.pmu = cpu_pmu,
 		.present = false,
 	};
+	int cpu;
 	int ret;
 
-	ret = smp_call_function_any(&cpu_pmu->supported_cpus,
-				    __armv8pmu_probe_pmu,
-				    &probe, 1);
+	preempt_disable();
+	cpu = __smp_cpu_nearby(&cpu_pmu->supported_cpus);
+	if ((unsigned int)cpu >= nr_cpu_ids) {
+		preempt_enable();
+		return -ENXIO;
+	}
+
+	ret = smp_call(cpu, __armv8pmu_probe_pmu, &probe, SMP_CALL_TYPE_SYNC);
+	preempt_enable();
+
 	if (ret)
 		return ret;
 
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 87666275eed9..f2dd4f29af27 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -502,6 +502,8 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
 		    int evtid, int first)
 {
+	int cpu;
+
 	/*
 	 * setup the parameters to send to the IPI to read the data.
 	 */
@@ -512,7 +514,15 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	rr->val = 0;
 	rr->first = first;
 
-	smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
+	preempt_disable();
+	cpu = __smp_cpu_nearby(&d->cpu_mask);
+	if ((unsigned int)cpu >= nr_cpu_ids) {
+		preempt_enable();
+		return;
+	}
+
+	smp_call(cpu, mon_event_count, rr, SMP_CALL_TYPE_SYNC);
+	preempt_enable();
 }
 
 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 3d514b82d055..9bbb4b9391f8 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -310,10 +310,15 @@ static u32 drv_read(struct acpi_cpufreq_data *data, const struct cpumask *mask)
 		.reg = &perf->control_register,
 		.func.read = data->cpu_freq_read,
 	};
+	int cpu;
 	int err;
 
-	err = smp_call_function_any(mask, do_drv_read, &cmd, 1);
-	WARN_ON_ONCE(err);	/* smp_call_function_any() was buggy? */
+	preempt_disable();
+	cpu = __smp_cpu_nearby(mask);
+	err = smp_call(cpu, do_drv_read, &cmd, SMP_CALL_TYPE_SYNC);
+	preempt_enable();
+
+	WARN_ON_ONCE(err);	/* smp_call was buggy? */
 	return cmd.val;
 }
 
diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index fddbd1ea1635..078ab1c46f73 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -507,8 +507,15 @@ static unsigned int powernv_cpufreq_get(unsigned int cpu)
 {
 	struct powernv_smp_call_data freq_data;
 
-	smp_call_function_any(cpu_sibling_mask(cpu), powernv_read_cpu_freq,
-			&freq_data, 1);
+	preempt_disable();
+	cpu = (unsigned int)__smp_cpu_nearby(cpu_sibling_mask(cpu));
+	if (cpu >= nr_cpu_ids) {
+		preempt_enable();
+		return 0;
+	}
+
+	smp_call((int)cpu, powernv_read_cpu_freq, &freq_data, SMP_CALL_TYPE_SYNC);
+	preempt_enable();
 
 	return freq_data.freq;
 }
@@ -744,6 +751,7 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
 	struct powernv_smp_call_data freq_data;
 	unsigned int cur_msec, gpstate_idx;
 	struct global_pstate_info *gpstates = policy->driver_data;
+	int cpu;
 
 	if (unlikely(rebooting) && new_index != get_nominal_index())
 		return 0;
@@ -817,11 +825,20 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
 
 no_gpstate:
 	/*
-	 * Use smp_call_function to send IPI and execute the
+	 * Use smp_call to send IPI and execute the
 	 * mtspr on target CPU.  We could do that without IPI
 	 * if current CPU is within policy->cpus (core)
 	 */
-	smp_call_function_any(policy->cpus, set_pstate, &freq_data, 1);
+	preempt_disable();
+	cpu = __smp_cpu_nearby(policy->cpus);
+	if ((unsigned int)cpu >= nr_cpu_ids) {
+		preempt_enable();
+		return -ENXIO;
+	}
+
+	smp_call(cpu, set_pstate, &freq_data, SMP_CALL_TYPE_SYNC);
+	preempt_enable();
+
 	return 0;
 }
 
@@ -921,8 +938,16 @@ static void powernv_cpufreq_work_fn(struct work_struct *work)
 
 	cpus_read_lock();
 	cpumask_and(&mask, &chip->mask, cpu_online_mask);
-	smp_call_function_any(&mask,
-			      powernv_cpufreq_throttle_check, NULL, 0);
+
+	preempt_disable();
+	cpu = (unsigned int)__smp_cpu_nearby(&mask);
+	if (cpu >= nr_cpu_ids) {
+		preempt_enable();
+		goto out;
+	}
+
+	smp_call((int)cpu, powernv_cpufreq_throttle_check, NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
 
 	if (!chip->restore)
 		goto out;
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index d44bcc29d99c..92fa8e00086f 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -1104,11 +1104,21 @@ static int arm_spe_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
 
 static int arm_spe_pmu_dev_init(struct arm_spe_pmu *spe_pmu)
 {
-	int ret;
 	cpumask_t *mask = &spe_pmu->supported_cpus;
+	int cpu;
+	int ret;
 
 	/* Make sure we probe the hardware on a relevant CPU */
-	ret = smp_call_function_any(mask,  __arm_spe_pmu_dev_probe, spe_pmu, 1);
+	preempt_disable();
+	cpu = __smp_cpu_nearby(mask);
+	if ((unsigned int)cpu >= nr_cpu_ids) {
+		preempt_enable();
+		return -ENXIO;
+	}
+
+	ret = smp_call(cpu, __arm_spe_pmu_dev_probe, spe_pmu, SMP_CALL_TYPE_SYNC);
+	preempt_enable();
+
 	if (ret || !(spe_pmu->features & SPE_PMU_FEAT_DEV_PROBED))
 		return -ENXIO;
 
diff --git a/include/linux/smp.h b/include/linux/smp.h
index b4885e45690b..1fb0951ca16b 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -178,6 +178,8 @@ extern void smp_call_mask_cond(const struct cpumask *mask, smp_call_func_t func,
 
 extern int smp_call_csd(int cpu, call_single_data_t *csd, unsigned int flags);
 
+extern int __smp_cpu_nearby(const struct cpumask *mask);
+
 /*
  * Enqueue a llist_node on the call_single_queue; be very careful, read
  * flush_smp_call_function_queue() in detail.
@@ -291,9 +293,6 @@ void smp_call_function(smp_call_func_t func, void *info, int wait);
 void smp_call_function_many(const struct cpumask *mask,
 			    smp_call_func_t func, void *info, bool wait);
 
-int smp_call_function_any(const struct cpumask *mask,
-			  smp_call_func_t func, void *info, int wait);
-
 void kick_all_cpus_sync(void);
 void wake_up_all_idle_cpus(void);
 
@@ -342,13 +341,6 @@ static inline void smp_send_reschedule(int cpu) { }
 			(up_smp_call_function(func, info))
 static inline void call_function_init(void) { }
 
-static inline int
-smp_call_function_any(const struct cpumask *mask, smp_call_func_t func,
-		      void *info, int wait)
-{
-	return smp_call_function_single(0, func, info, wait);
-}
-
 static inline void kick_all_cpus_sync(void) {  }
 static inline void wake_up_all_idle_cpus(void) {  }
 
diff --git a/kernel/smp.c b/kernel/smp.c
index f08135ad70e3..df343b1368eb 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -641,49 +641,6 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
-/*
- * smp_call_function_any - Run a function on any of the given cpus
- * @mask: The mask of cpus it can run on.
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait until function has completed.
- *
- * Returns 0 on success, else a negative status code (if no cpus were online).
- *
- * Selection preference:
- *	1) current cpu if in @mask
- *	2) any cpu of current node if in @mask
- *	3) any other online cpu in @mask
- */
-int smp_call_function_any(const struct cpumask *mask,
-			  smp_call_func_t func, void *info, int wait)
-{
-	unsigned int cpu;
-	const struct cpumask *nodemask;
-	int ret;
-
-	/* Try for same CPU (cheapest) */
-	cpu = get_cpu();
-	if (cpumask_test_cpu(cpu, mask))
-		goto call;
-
-	/* Try for same node. */
-	nodemask = cpumask_of_node(cpu_to_node(cpu));
-	for (cpu = cpumask_first_and(nodemask, mask); cpu < nr_cpu_ids;
-	     cpu = cpumask_next_and(cpu, nodemask, mask)) {
-		if (cpu_online(cpu))
-			goto call;
-	}
-
-	/* Any online will do: smp_call_function_single handles nr_cpu_ids. */
-	cpu = cpumask_any_and(mask, cpu_online_mask);
-call:
-	ret = smp_call_function_single(cpu, func, info, wait);
-	put_cpu();
-	return ret;
-}
-EXPORT_SYMBOL_GPL(smp_call_function_any);
-
 static void smp_call_function_many_cond(const struct cpumask *mask,
 					smp_call_func_t func, void *info,
 					bool local_cpu,
@@ -1072,6 +1029,40 @@ void __smp_call_mask_cond(const struct cpumask *mask,
 	preempt_enable();
 }
 
+/*
+ * Given a cpumask, find an online CPU as close to the local CPU as
+ * possible, in the order below:
+ *
+ *   - The first pick is the local CPU, if it is present in the cpumask.
+ *   - The second pick is a CPU in the same node as the local CPU
+ *     that is also present in the cpumask.
+ *   - The third pick is any online CPU that is also in the cpumask.
+ *
+ * Callers must disable preemption when using this function.
+ */
+int __smp_cpu_nearby(const struct cpumask *mask)
+{
+	int cpu;
+	const struct cpumask *nodemask;
+
+	cpu = smp_processor_id();
+	if (cpumask_test_cpu(cpu, mask))
+		return cpu;
+
+	/* Any CPU in the same node */
+	nodemask = cpumask_of_node(cpu_to_node(cpu));
+	for (cpu = cpumask_first_and(nodemask, mask); cpu < nr_cpu_ids;
+	     cpu = cpumask_next_and(cpu, nodemask, mask)) {
+		if (cpu_online(cpu))
+			return cpu;
+	}
+
+	/* Any online CPU will do */
+	cpu = cpumask_any_and(mask, cpu_online_mask);
+	return cpu;
+}
+EXPORT_SYMBOL(__smp_cpu_nearby);
+
 /*
  * smp_call Interface
  *
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 08/11] smp: replace smp_call_function_many_cond with __smp_call_mask_cond
  2022-05-19 20:49 [PATCH v4 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (6 preceding siblings ...)
  2022-05-19 20:49 ` [PATCH v4 07/11] smp: eliminate smp_call_function_any Donghai Qiao
@ 2022-05-19 20:49 ` Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 09/11] smp: replace smp_call_function_single_async with smp_call_csd Donghai Qiao
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Donghai Qiao @ 2022-05-19 20:49 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao, kernel test robot,
	Sven Schnelle

Replaced smp_call_function_many_cond() with __smp_call_mask_cond()
and made the changes accordingly.

Consolidated and cleaned up the redundant code along the invocation paths of
smp_call_function_many_cond(). The old entry points map onto the new interface
as follows; a short usage sketch follows the mapping.

on_each_cpu_cond_mask(cond_func, func, info, wait, mask) is replaced by
smp_call_mask_cond(mask, func, info, cond_func, (wait ? SMP_CALL_TYPE_SYNC :
SMP_CALL_TYPE_ASYNC))

smp_call_function_many(mask, func, info, wait) is replaced by
smp_call_mask_others(mask, func, info, (wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC))

smp_call_function(func, info, wait) is replaced by
smp_call_mask_others(SMP_CALL_MASK_ALL, func, info, (wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC))

on_each_cpu(func, info, wait) is replaced by
smp_call(SMP_CALL_ALL, func, info, (wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC))

on_each_cpu_mask(mask, func, info, wait) is replaced by
smp_call_mask(mask, func, info, (wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC))

on_each_cpu_cond(cond_func, func, info, wait) is replaced by
smp_call_cond(SMP_CALL_ALL, func, info, cond_func, (wait ? SMP_CALL_TYPE_SYNC :
SMP_CALL_TYPE_ASYNC))
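
As a minimal sketch, assuming the semantics above (func and info stand for any
existing callback and its argument), a waiting broadcast converts as:

	/* before: run func on every online CPU, including this one, and wait */
	on_each_cpu(func, info, 1);

	/* after */
	smp_call(SMP_CALL_ALL, func, info, SMP_CALL_TYPE_SYNC);

	/* before: same, but restricted to the CPUs in mask */
	on_each_cpu_mask(mask, func, info, 1);

	/* after */
	smp_call_mask(mask, func, info, SMP_CALL_TYPE_SYNC);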

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Sven Schnelle <svens@linux.ibm.com> 
---
v1 -> v2: Removed 'x' from the function names and changed XCALL to SMP_CALL
          in the new macros
v2 -> v3: Modified the commit message;
          Replaced smp_call_function_many(mask,...) with smp_call_mask_others(mask,...)
          for all the invocations;
          Replaced smp_call_function() with smp_call_mask_others(SMP_CALL_MASK_ALL,...)
          for all the invocations.
v3 -> v4: Fixed the assert warnings emitted from __smp_call_mask_cond(), which
          are triggered by callers that do not call preempt_disable() before
          calling smp_call_mask_others() (see the sketch after this changelog).
          Removed the variable 'end_paddr', which is set but not used, at
          arch/parisc/mm/init.c:623:31.
          Fixed the undeclared flush_tlb_all_local at arch/parisc/mm/init.c:850:32.
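
For the preempt_disable() requirement above, the converted call sites in this
patch follow the pattern below (stop_this_cpu() is one of the callbacks
actually converted here; the bracketing is what v4 adds):

	preempt_disable();
	smp_call_mask_others(SMP_CALL_MASK_ALL, stop_this_cpu,
			NULL, SMP_CALL_TYPE_ASYNC);
	preempt_enable();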

 arch/alpha/kernel/process.c                   |   2 +-
 arch/alpha/kernel/smp.c                       |  13 +-
 arch/arc/kernel/perf_event.c                  |   2 +-
 arch/arc/mm/cache.c                           |   2 +-
 arch/arc/mm/tlb.c                             |  14 +-
 arch/arm/common/bL_switcher.c                 |   3 +-
 arch/arm/kernel/machine_kexec.c               |   7 +-
 arch/arm/kernel/smp_tlb.c                     |  25 ++--
 arch/arm/kernel/smp_twd.c                     |   4 +-
 arch/arm/mm/flush.c                           |  10 +-
 arch/arm/vfp/vfpmodule.c                      |   2 +-
 arch/arm64/kernel/armv8_deprecated.c          |   4 +-
 arch/arm64/kernel/perf_event.c                |   2 +-
 arch/arm64/kvm/arm.c                          |   6 +-
 arch/csky/abiv2/cacheflush.c                  |   2 +-
 arch/csky/kernel/perf_event.c                 |   2 +-
 arch/csky/kernel/smp.c                        |   2 +-
 arch/csky/mm/cachev2.c                        |   2 +-
 arch/ia64/kernel/mca.c                        |   4 +-
 arch/ia64/kernel/smp.c                        |  12 +-
 arch/ia64/kernel/uncached.c                   |  14 +-
 arch/mips/cavium-octeon/octeon-irq.c          |   4 +-
 arch/mips/cavium-octeon/setup.c               |  12 +-
 arch/mips/kernel/crash.c                      |   6 +-
 arch/mips/kernel/machine_kexec.c              |   5 +-
 arch/mips/kernel/perf_event_mipsxx.c          |   7 +-
 arch/mips/kernel/smp.c                        |  11 +-
 arch/mips/kernel/sysrq.c                      |   5 +-
 arch/mips/mm/c-r4k.c                          |   4 +-
 arch/mips/sibyte/common/cfe.c                 |   5 +-
 arch/openrisc/kernel/smp.c                    |  15 +-
 arch/parisc/kernel/cache.c                    |   4 +-
 arch/parisc/mm/init.c                         |   4 +-
 arch/powerpc/kernel/dawr.c                    |   8 +-
 arch/powerpc/kernel/kvm.c                     |   2 +-
 arch/powerpc/kernel/security.c                |   6 +-
 arch/powerpc/kernel/smp.c                     |  10 +-
 arch/powerpc/kernel/sysfs.c                   |   2 +-
 arch/powerpc/kernel/tau_6xx.c                 |   4 +-
 arch/powerpc/kexec/core_64.c                  |   7 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c           |   2 +-
 arch/powerpc/mm/book3s64/pgtable.c            |   2 +-
 arch/powerpc/mm/book3s64/radix_tlb.c          |  12 +-
 arch/powerpc/mm/nohash/tlb.c                  |  11 +-
 arch/powerpc/mm/slice.c                       |   4 +-
 arch/powerpc/perf/core-book3s.c               |   2 +-
 arch/powerpc/perf/imc-pmu.c                   |   2 +-
 arch/powerpc/platforms/85xx/smp.c             |   8 +-
 arch/powerpc/platforms/powernv/idle.c         |   2 +-
 arch/powerpc/platforms/pseries/lparcfg.c      |   2 +-
 arch/riscv/mm/cacheflush.c                    |   4 +-
 arch/s390/hypfs/hypfs_diag0c.c                |   2 +-
 arch/s390/kernel/alternative.c                |   2 +-
 arch/s390/kernel/perf_cpum_cf.c               |  10 +-
 arch/s390/kernel/perf_cpum_cf_common.c        |   4 +-
 arch/s390/kernel/perf_cpum_sf.c               |   4 +-
 arch/s390/kernel/processor.c                  |   2 +-
 arch/s390/kernel/smp.c                        |   2 +-
 arch/s390/kernel/topology.c                   |   2 +-
 arch/s390/mm/pgalloc.c                        |   2 +-
 arch/s390/pci/pci_irq.c                       |   2 +-
 arch/sh/kernel/smp.c                          |  24 +++-
 arch/sh/mm/cache.c                            |   3 +-
 arch/sparc/include/asm/mman.h                 |   4 +-
 arch/sparc/kernel/nmi.c                       |  15 +-
 arch/sparc/kernel/perf_event.c                |   4 +-
 arch/sparc/kernel/smp_64.c                    |  14 +-
 arch/sparc/mm/init_64.c                       |   2 +-
 arch/x86/events/core.c                        |   6 +-
 arch/x86/events/intel/core.c                  |   4 +-
 arch/x86/kernel/alternative.c                 |   2 +-
 arch/x86/kernel/amd_nb.c                      |   2 +-
 arch/x86/kernel/apic/apic.c                   |   2 +-
 arch/x86/kernel/cpu/bugs.c                    |   2 +-
 arch/x86/kernel/cpu/mce/core.c                |  12 +-
 arch/x86/kernel/cpu/mce/inject.c              |   4 +-
 arch/x86/kernel/cpu/mce/intel.c               |   2 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |   2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        |   6 +-
 arch/x86/kernel/cpu/sgx/main.c                |   5 +-
 arch/x86/kernel/cpu/umwait.c                  |   2 +-
 arch/x86/kernel/cpu/vmware.c                  |   2 +-
 arch/x86/kernel/kvm.c                         |   2 +-
 arch/x86/kernel/ldt.c                         |   2 +-
 arch/x86/kvm/x86.c                            |   4 +-
 arch/x86/lib/cache-smp.c                      |   2 +-
 arch/x86/lib/msr-smp.c                        |   2 +-
 arch/x86/mm/pat/set_memory.c                  |   4 +-
 arch/x86/mm/tlb.c                             |  12 +-
 arch/x86/xen/mmu_pv.c                         |   2 +-
 arch/x86/xen/smp_pv.c                         |   5 +-
 arch/x86/xen/suspend.c                        |   4 +-
 arch/xtensa/kernel/smp.c                      |  22 +--
 drivers/char/agp/generic.c                    |   2 +-
 drivers/clocksource/mips-gic-timer.c          |   2 +-
 drivers/cpufreq/acpi-cpufreq.c                |   6 +-
 drivers/cpufreq/tegra194-cpufreq.c            |   2 +-
 drivers/cpuidle/driver.c                      |   8 +-
 drivers/edac/amd64_edac.c                     |   4 +-
 drivers/firmware/arm_sdei.c                   |  10 +-
 drivers/gpu/drm/i915/vlv_sideband.c           |   2 +-
 drivers/hwmon/fam15h_power.c                  |   2 +-
 drivers/irqchip/irq-mvebu-pic.c               |   4 +-
 drivers/net/ethernet/marvell/mvneta.c         |  30 ++--
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |   8 +-
 drivers/platform/x86/intel_ips.c              |   4 +-
 drivers/soc/xilinx/xlnx_event_manager.c       |   2 +-
 drivers/tty/sysrq.c                           |   5 +-
 drivers/watchdog/booke_wdt.c                  |   8 +-
 fs/buffer.c                                   |   2 +-
 include/linux/smp.h                           |  55 +------
 kernel/profile.c                              |   4 +-
 kernel/rcu/tree.c                             |   4 +-
 kernel/scftorture.c                           |  15 +-
 kernel/sched/membarrier.c                     |   8 +-
 kernel/smp.c                                  | 135 ++++--------------
 kernel/time/hrtimer.c                         |   5 +-
 kernel/trace/ftrace.c                         |  18 ++-
 kernel/trace/ring_buffer.c                    |   7 +-
 kernel/trace/trace.c                          |  12 +-
 kernel/trace/trace_events.c                   |   2 +-
 mm/kasan/quarantine.c                         |   2 +-
 mm/mmu_gather.c                               |   5 +-
 mm/slab.c                                     |   2 +-
 net/iucv/iucv.c                               |  10 +-
 virt/kvm/kvm_main.c                           |  10 +-
 126 files changed, 454 insertions(+), 459 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 5f8527081da9..90eeb04b435f 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -167,7 +167,7 @@ common_shutdown(int mode, char *restart_cmd)
 	struct halt_info args;
 	args.mode = mode;
 	args.restart_cmd = restart_cmd;
-	on_each_cpu(common_shutdown_1, &args, 0);
+	smp_call(SMP_CALL_ALL, common_shutdown_1, &args, SMP_CALL_TYPE_ASYNC);
 }
 
 void
diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index cb64e4797d2a..298463cb8c77 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -611,7 +611,7 @@ void
 smp_imb(void)
 {
 	/* Must wait other processors to flush their icache before continue. */
-	on_each_cpu(ipi_imb, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_imb, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(smp_imb);
 
@@ -626,7 +626,7 @@ flush_tlb_all(void)
 {
 	/* Although we don't have any data to pass, we do want to
 	   synchronize with the other processors.  */
-	on_each_cpu(ipi_flush_tlb_all, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 #define asn_locked() (cpu_data[smp_processor_id()].asn_lock)
@@ -661,7 +661,8 @@ flush_tlb_mm(struct mm_struct *mm)
 		}
 	}
 
-	smp_call_function(ipi_flush_tlb_mm, mm, 1);
+	smp_call_mask_others(SMP_CALL_MASK_ALL, ipi_flush_tlb_mm,
+				mm, SMP_CALL_TYPE_SYNC);
 
 	preempt_enable();
 }
@@ -712,7 +713,8 @@ flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 	data.mm = mm;
 	data.addr = addr;
 
-	smp_call_function(ipi_flush_tlb_page, &data, 1);
+	smp_call_mask_others(SMP_CALL_MASK_ALL, ipi_flush_tlb_page,
+			&data, SMP_CALL_TYPE_SYNC);
 
 	preempt_enable();
 }
@@ -762,7 +764,8 @@ flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
 		}
 	}
 
-	smp_call_function(ipi_flush_icache_page, mm, 1);
+	smp_call_mask_others(SMP_CALL_MASK_ALL, ipi_flush_icache_page,
+			mm, SMP_CALL_TYPE_SYNC);
 
 	preempt_enable();
 }
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index adff957962da..682db1682287 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -811,7 +811,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 						 this_cpu_ptr(&arc_pmu_cpu));
 
 			if (!ret)
-				on_each_cpu(arc_cpu_pmu_irq_init, &irq, 1);
+				smp_call(SMP_CALL_ALL, arc_cpu_pmu_irq_init, &irq, SMP_CALL_TYPE_SYNC);
 			else
 				irq = -1;
 		}
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index 5446967ea98d..4ad4e8f843e3 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -569,7 +569,7 @@ static void __ic_line_inv_vaddr(phys_addr_t paddr, unsigned long vaddr,
 		.sz    = sz
 	};
 
-	on_each_cpu(__ic_line_inv_vaddr_helper, &ic_inv, 1);
+	smp_call(SMP_CALL_ALL, __ic_line_inv_vaddr_helper, &ic_inv, SMP_CALL_TYPE_SYNC);
 }
 
 #endif	/* CONFIG_SMP */
diff --git a/arch/arc/mm/tlb.c b/arch/arc/mm/tlb.c
index 5f71445f26bd..83aaa61c6698 100644
--- a/arch/arc/mm/tlb.c
+++ b/arch/arc/mm/tlb.c
@@ -330,13 +330,13 @@ static inline void ipi_flush_tlb_kernel_range(void *arg)
 
 void flush_tlb_all(void)
 {
-	on_each_cpu((smp_call_func_t)local_flush_tlb_all, NULL, 1);
+	smp_call(SMP_CALL_ALL, (smp_call_func_t)local_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	on_each_cpu_mask(mm_cpumask(mm), (smp_call_func_t)local_flush_tlb_mm,
-			 mm, 1);
+	smp_call_mask(mm_cpumask(mm), (smp_call_func_t)local_flush_tlb_mm,
+			 mm, SMP_CALL_TYPE_SYNC);
 }
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
@@ -346,7 +346,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 		.ta_start = uaddr
 	};
 
-	on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page, &ta, 1);
+	smp_call_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page, &ta, SMP_CALL_TYPE_SYNC);
 }
 
 void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
@@ -358,7 +358,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 		.ta_end = end
 	};
 
-	on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range, &ta, 1);
+	smp_call_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range, &ta, SMP_CALL_TYPE_SYNC);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -371,7 +371,7 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
 		.ta_end = end
 	};
 
-	on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_pmd_tlb_range, &ta, 1);
+	smp_call_mask(mm_cpumask(vma->vm_mm), ipi_flush_pmd_tlb_range, &ta, SMP_CALL_TYPE_SYNC);
 }
 #endif
 
@@ -382,7 +382,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		.ta_end = end
 	};
 
-	on_each_cpu(ipi_flush_tlb_kernel_range, &ta, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_kernel_range, &ta, SMP_CALL_TYPE_SYNC);
 }
 #endif
 
diff --git a/arch/arm/common/bL_switcher.c b/arch/arm/common/bL_switcher.c
index 9a9aa53547a6..a98a37e16595 100644
--- a/arch/arm/common/bL_switcher.c
+++ b/arch/arm/common/bL_switcher.c
@@ -541,7 +541,8 @@ int bL_switcher_trace_trigger(void)
 	preempt_disable();
 
 	bL_switcher_trace_trigger_cpu(NULL);
-	smp_call_function(bL_switcher_trace_trigger_cpu, NULL, true);
+	smp_call_mask_others(SMP_CALL_MASK_ALL, bL_switcher_trace_trigger_cpu,
+			NULL, SMP_CALL_TYPE_SYNC);
 
 	preempt_enable();
 
diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index f567032a09c0..5adf8cdb9752 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -101,7 +101,12 @@ void crash_smp_send_stop(void)
 		return;
 
 	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
-	smp_call_function(machine_crash_nonpanic_core, NULL, false);
+
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, machine_crash_nonpanic_core,
+			NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
+
 	msecs = 1000; /* Wait at most a second for the other cpus to stop */
 	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
 		mdelay(1);
diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index d4908b3736d8..f41254435d50 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -158,7 +158,10 @@ static void broadcast_tlb_a15_erratum(void)
 	if (!erratum_a15_798181())
 		return;
 
-	smp_call_function(ipi_flush_tlb_a15_erratum, NULL, 1);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, ipi_flush_tlb_a15_erratum,
+			NULL, SMP_CALL_TYPE_SYNC);
+	preempt_enable();
 }
 
 static void broadcast_tlb_mm_a15_erratum(struct mm_struct *mm)
@@ -171,14 +174,14 @@ static void broadcast_tlb_mm_a15_erratum(struct mm_struct *mm)
 
 	this_cpu = get_cpu();
 	a15_erratum_get_cpumask(this_cpu, mm, &mask);
-	smp_call_function_many(&mask, ipi_flush_tlb_a15_erratum, NULL, 1);
+	smp_call_mask_others(&mask, ipi_flush_tlb_a15_erratum, NULL, SMP_CALL_TYPE_SYNC);
 	put_cpu();
 }
 
 void flush_tlb_all(void)
 {
 	if (tlb_ops_need_broadcast())
-		on_each_cpu(ipi_flush_tlb_all, NULL, 1);
+		smp_call(SMP_CALL_ALL, ipi_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 	else
 		__flush_tlb_all();
 	broadcast_tlb_a15_erratum();
@@ -187,7 +190,7 @@ void flush_tlb_all(void)
 void flush_tlb_mm(struct mm_struct *mm)
 {
 	if (tlb_ops_need_broadcast())
-		on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
+		smp_call_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, SMP_CALL_TYPE_SYNC);
 	else
 		__flush_tlb_mm(mm);
 	broadcast_tlb_mm_a15_erratum(mm);
@@ -199,8 +202,8 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 		struct tlb_args ta;
 		ta.ta_vma = vma;
 		ta.ta_start = uaddr;
-		on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
-					&ta, 1);
+		smp_call_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
+			       &ta, SMP_CALL_TYPE_SYNC);
 	} else
 		__flush_tlb_page(vma, uaddr);
 	broadcast_tlb_mm_a15_erratum(vma->vm_mm);
@@ -211,7 +214,7 @@ void flush_tlb_kernel_page(unsigned long kaddr)
 	if (tlb_ops_need_broadcast()) {
 		struct tlb_args ta;
 		ta.ta_start = kaddr;
-		on_each_cpu(ipi_flush_tlb_kernel_page, &ta, 1);
+		smp_call(SMP_CALL_ALL, ipi_flush_tlb_kernel_page, &ta, SMP_CALL_TYPE_SYNC);
 	} else
 		__flush_tlb_kernel_page(kaddr);
 	broadcast_tlb_a15_erratum();
@@ -225,8 +228,8 @@ void flush_tlb_range(struct vm_area_struct *vma,
 		ta.ta_vma = vma;
 		ta.ta_start = start;
 		ta.ta_end = end;
-		on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range,
-					&ta, 1);
+		smp_call_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range,
+				&ta, SMP_CALL_TYPE_SYNC);
 	} else
 		local_flush_tlb_range(vma, start, end);
 	broadcast_tlb_mm_a15_erratum(vma->vm_mm);
@@ -238,7 +241,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		struct tlb_args ta;
 		ta.ta_start = start;
 		ta.ta_end = end;
-		on_each_cpu(ipi_flush_tlb_kernel_range, &ta, 1);
+		smp_call(SMP_CALL_ALL, ipi_flush_tlb_kernel_range, &ta, SMP_CALL_TYPE_SYNC);
 	} else
 		local_flush_tlb_kernel_range(start, end);
 	broadcast_tlb_a15_erratum();
@@ -247,7 +250,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 void flush_bp_all(void)
 {
 	if (tlb_ops_need_broadcast())
-		on_each_cpu(ipi_flush_bp_all, NULL, 1);
+		smp_call(SMP_CALL_ALL, ipi_flush_bp_all, NULL, SMP_CALL_TYPE_SYNC);
 	else
 		__flush_bp_all();
 }
diff --git a/arch/arm/kernel/smp_twd.c b/arch/arm/kernel/smp_twd.c
index 9a14f721a2b0..fa1b054e1398 100644
--- a/arch/arm/kernel/smp_twd.c
+++ b/arch/arm/kernel/smp_twd.c
@@ -119,8 +119,8 @@ static int twd_rate_change(struct notifier_block *nb,
 	 * changing cpu.
 	 */
 	if (flags == POST_RATE_CHANGE)
-		on_each_cpu(twd_update_frequency,
-				  (void *)&cnd->new_rate, 1);
+		smp_call(SMP_CALL_ALL, twd_update_frequency,
+			  (void *)&cnd->new_rate, SMP_CALL_TYPE_SYNC);
 
 	return NOTIFY_OK;
 }
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 7ff9feea13a6..3e8a82dfadf7 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -149,9 +149,13 @@ void __flush_ptrace_access(struct page *page, unsigned long uaddr, void *kaddr,
 			flush_icache_alias(page_to_pfn(page), uaddr, len);
 		else
 			__cpuc_coherent_kern_range(addr, addr + len);
-		if (cache_ops_need_broadcast())
-			smp_call_function(flush_ptrace_access_other,
-					  NULL, 1);
+		if (cache_ops_need_broadcast()) {
+			preempt_disable();
+			smp_call_mask_others(SMP_CALL_MASK_ALL,
+					flush_ptrace_access_other,
+					NULL, SMP_CALL_TYPE_SYNC);
+			preempt_enable();
+		}
 	}
 }
 
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 2cb355c1b5b7..dc21f87062c2 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -780,7 +780,7 @@ static int __init vfp_init(void)
 	 * following test on FPSID will succeed.
 	 */
 	if (cpu_arch >= CPU_ARCH_ARMv6)
-		on_each_cpu(vfp_enable, NULL, 1);
+		smp_call(SMP_CALL_ALL, vfp_enable, NULL, SMP_CALL_TYPE_SYNC);
 
 	/*
 	 * First check that there is a VFP that we can use.
diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c
index 6875a16b09d2..03f469dffdb6 100644
--- a/arch/arm64/kernel/armv8_deprecated.c
+++ b/arch/arm64/kernel/armv8_deprecated.c
@@ -104,9 +104,9 @@ static int run_all_cpu_set_hw_mode(struct insn_emulation *insn, bool enable)
 	if (!insn->ops->set_hw_mode)
 		return -EINVAL;
 	if (enable)
-		on_each_cpu(enable_insn_hw_mode, (void *)insn, true);
+		smp_call(SMP_CALL_ALL, enable_insn_hw_mode, (void *)insn, SMP_CALL_TYPE_SYNC);
 	else
-		on_each_cpu(disable_insn_hw_mode, (void *)insn, true);
+		smp_call(SMP_CALL_ALL, disable_insn_hw_mode, (void *)insn, SMP_CALL_TYPE_SYNC);
 	return 0;
 }
 
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index be4ef28dc1d6..c1efb0b0d8d6 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1215,7 +1215,7 @@ static int armv8pmu_proc_user_access_handler(struct ctl_table *table, int write,
 	if (ret || !write || sysctl_perf_user_access)
 		return ret;
 
-	on_each_cpu(armv8pmu_disable_user_access_ipi, NULL, 1);
+	smp_call(SMP_CALL_ALL, armv8pmu_disable_user_access_ipi, NULL, SMP_CALL_TYPE_SYNC);
 	return 0;
 }
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 523bc934fe2f..bb82cbc48626 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1728,7 +1728,7 @@ static int init_subsystems(void)
 	/*
 	 * Enable hardware so that subsystem initialisation can access EL2.
 	 */
-	on_each_cpu(_kvm_arch_hardware_enable, NULL, 1);
+	smp_call(SMP_CALL_ALL, _kvm_arch_hardware_enable, NULL, SMP_CALL_TYPE_SYNC);
 
 	/*
 	 * Register CPU lower-power notifier
@@ -1765,7 +1765,7 @@ static int init_subsystems(void)
 
 out:
 	if (err || !is_protected_kvm_enabled())
-		on_each_cpu(_kvm_arch_hardware_disable, NULL, 1);
+		smp_call(SMP_CALL_ALL, _kvm_arch_hardware_disable, NULL, SMP_CALL_TYPE_SYNC);
 
 	return err;
 }
@@ -2000,7 +2000,7 @@ static int pkvm_drop_host_privileges(void)
 	 * once the host stage 2 is installed.
 	 */
 	static_branch_enable(&kvm_protected_mode_initialized);
-	on_each_cpu(_kvm_host_prot_finalize, &ret, 1);
+	smp_call(SMP_CALL_ALL, _kvm_host_prot_finalize, &ret, SMP_CALL_TYPE_SYNC);
 	return ret;
 }
 
diff --git a/arch/csky/abiv2/cacheflush.c b/arch/csky/abiv2/cacheflush.c
index 39c51399dd81..b662f8a10db8 100644
--- a/arch/csky/abiv2/cacheflush.c
+++ b/arch/csky/abiv2/cacheflush.c
@@ -80,7 +80,7 @@ void flush_icache_mm_range(struct mm_struct *mm,
 	cpumask_andnot(&others, mm_cpumask(mm), cpumask_of(cpu));
 
 	if (mm != current->active_mm || !cpumask_empty(&others)) {
-		on_each_cpu_mask(&others, local_icache_inv_all, NULL, 1);
+		smp_call_mask(&others, local_icache_inv_all, NULL, SMP_CALL_TYPE_SYNC);
 		cpumask_clear(mask);
 	}
 
diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
index e5f18420ce64..bc8fb8a8fd28 100644
--- a/arch/csky/kernel/perf_event.c
+++ b/arch/csky/kernel/perf_event.c
@@ -1311,7 +1311,7 @@ int csky_pmu_device_probe(struct platform_device *pdev,
 	csky_pmu.plat_device = pdev;
 
 	/* Ensure the PMU has sane values out of reset. */
-	on_each_cpu(csky_pmu_reset, &csky_pmu, 1);
+	smp_call(SMP_CALL_ALL, csky_pmu_reset, &csky_pmu, SMP_CALL_TYPE_SYNC);
 
 	ret = csky_pmu_request_irq(csky_pmu_handle_irq);
 	if (ret) {
diff --git a/arch/csky/kernel/smp.c b/arch/csky/kernel/smp.c
index 6bb38bc2f39b..2b62171d8d82 100644
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -137,7 +137,7 @@ static void ipi_stop(void *unused)
 
 void smp_send_stop(void)
 {
-	on_each_cpu(ipi_stop, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_stop, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 void smp_send_reschedule(int cpu)
diff --git a/arch/csky/mm/cachev2.c b/arch/csky/mm/cachev2.c
index 7a9664adce43..19dab0d4c089 100644
--- a/arch/csky/mm/cachev2.c
+++ b/arch/csky/mm/cachev2.c
@@ -66,7 +66,7 @@ void icache_inv_range(unsigned long start, unsigned long end)
 	if (irqs_disabled())
 		local_icache_inv_range(&param);
 	else
-		on_each_cpu(local_icache_inv_range, &param, 1);
+		smp_call(SMP_CALL_ALL, local_icache_inv_range, &param, SMP_CALL_TYPE_SYNC);
 }
 #endif
 
diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c
index e628a88607bb..c3723f7b0068 100644
--- a/arch/ia64/kernel/mca.c
+++ b/arch/ia64/kernel/mca.c
@@ -712,7 +712,7 @@ ia64_mca_cmc_vector_enable (void *dummy)
 static void
 ia64_mca_cmc_vector_disable_keventd(struct work_struct *unused)
 {
-	on_each_cpu(ia64_mca_cmc_vector_disable, NULL, 0);
+	smp_call(SMP_CALL_ALL, ia64_mca_cmc_vector_disable, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 /*
@@ -724,7 +724,7 @@ ia64_mca_cmc_vector_disable_keventd(struct work_struct *unused)
 static void
 ia64_mca_cmc_vector_enable_keventd(struct work_struct *unused)
 {
-	on_each_cpu(ia64_mca_cmc_vector_enable, NULL, 0);
+	smp_call(SMP_CALL_ALL, ia64_mca_cmc_vector_enable, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 /*
diff --git a/arch/ia64/kernel/smp.c b/arch/ia64/kernel/smp.c
index 7b7b64eb3129..e7865d3d006d 100644
--- a/arch/ia64/kernel/smp.c
+++ b/arch/ia64/kernel/smp.c
@@ -285,7 +285,7 @@ smp_flush_tlb_cpumask(cpumask_t xcpumask)
 void
 smp_flush_tlb_all (void)
 {
-	on_each_cpu((void (*)(void *))local_flush_tlb_all, NULL, 1);
+	smp_call(SMP_CALL_ALL, (void (*)(void *))local_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 void
@@ -301,12 +301,14 @@ smp_flush_tlb_mm (struct mm_struct *mm)
 		return;
 	}
 	if (!alloc_cpumask_var(&cpus, GFP_ATOMIC)) {
-		smp_call_function((void (*)(void *))local_finish_flush_tlb_mm,
-			mm, 1);
+		smp_call_mask_others(SMP_CALL_MASK_ALL,
+				(void (*)(void *))local_finish_flush_tlb_mm,
+				mm, SMP_CALL_TYPE_SYNC);
 	} else {
 		cpumask_copy(cpus, mm_cpumask(mm));
-		smp_call_function_many(cpus,
-			(void (*)(void *))local_finish_flush_tlb_mm, mm, 1);
+		smp_call_mask_others(cpus,
+			(void (*)(void *))local_finish_flush_tlb_mm,
+			mm, SMP_CALL_TYPE_SYNC);
 		free_cpumask_var(cpus);
 	}
 	local_irq_disable();
diff --git a/arch/ia64/kernel/uncached.c b/arch/ia64/kernel/uncached.c
index 816803636a75..9aeb75fa3a92 100644
--- a/arch/ia64/kernel/uncached.c
+++ b/arch/ia64/kernel/uncached.c
@@ -118,7 +118,12 @@ static int uncached_add_chunk(struct uncached_pool *uc_pool, int nid)
 	status = ia64_pal_prefetch_visibility(PAL_VISIBILITY_PHYSICAL);
 	if (status == PAL_VISIBILITY_OK_REMOTE_NEEDED) {
 		atomic_set(&uc_pool->status, 0);
-		smp_call_function(uncached_ipi_visibility, uc_pool, 1);
+
+		preempt_disable();
+		smp_call_mask_others(SMP_CALL_MASK_ALL, uncached_ipi_visibility,
+				uc_pool, SMP_CALL_TYPE_SYNC);
+		preempt_enable();
+
 		if (atomic_read(&uc_pool->status))
 			goto failed;
 	} else if (status != PAL_VISIBILITY_OK)
@@ -137,7 +142,12 @@ static int uncached_add_chunk(struct uncached_pool *uc_pool, int nid)
 	if (status != PAL_STATUS_SUCCESS)
 		goto failed;
 	atomic_set(&uc_pool->status, 0);
-	smp_call_function(uncached_ipi_mc_drain, uc_pool, 1);
+
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, uncached_ipi_mc_drain,
+			uc_pool, SMP_CALL_TYPE_SYNC);
+	preempt_enable();
+
 	if (atomic_read(&uc_pool->status))
 		goto failed;
 
diff --git a/arch/mips/cavium-octeon/octeon-irq.c b/arch/mips/cavium-octeon/octeon-irq.c
index 07d7ff5a981d..558a16d3a705 100644
--- a/arch/mips/cavium-octeon/octeon-irq.c
+++ b/arch/mips/cavium-octeon/octeon-irq.c
@@ -216,7 +216,7 @@ static void octeon_irq_core_bus_sync_unlock(struct irq_data *data)
 	struct octeon_core_chip_data *cd = irq_data_get_irq_chip_data(data);
 
 	if (cd->desired_en != cd->current_en) {
-		on_each_cpu(octeon_irq_core_set_enable_local, data, 1);
+		smp_call(SMP_CALL_ALL, octeon_irq_core_set_enable_local, data, SMP_CALL_TYPE_SYNC);
 
 		cd->current_en = cd->desired_en;
 	}
@@ -1364,7 +1364,7 @@ void octeon_irq_set_ip4_handler(octeon_irq_ip4_handler_t h)
 {
 	octeon_irq_ip4 = h;
 	octeon_irq_use_ip4 = true;
-	on_each_cpu(octeon_irq_local_enable_ip4, NULL, 1);
+	smp_call(SMP_CALL_ALL, octeon_irq_local_enable_ip4, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void octeon_irq_percpu_enable(void)
diff --git a/arch/mips/cavium-octeon/setup.c b/arch/mips/cavium-octeon/setup.c
index 00bf269763cf..d8bc5142b881 100644
--- a/arch/mips/cavium-octeon/setup.c
+++ b/arch/mips/cavium-octeon/setup.c
@@ -255,8 +255,13 @@ static void octeon_generic_shutdown(void)
 static void octeon_shutdown(void)
 {
 	octeon_generic_shutdown();
+
 #ifdef CONFIG_SMP
-	smp_call_function(octeon_kexec_smp_down, NULL, 0);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, octeon_kexec_smp_down,
+			NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
+
 	smp_wmb();
 	while (num_online_cpus() > 1) {
 		cpu_relax();
@@ -469,7 +474,10 @@ static void octeon_kill_core(void *arg)
  */
 static void octeon_halt(void)
 {
-	smp_call_function(octeon_kill_core, NULL, 0);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, octeon_kill_core,
+			NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
 
 	switch (octeon_bootinfo->board_type) {
 	case CVMX_BOARD_TYPE_NAO38:
diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c
index 81845ba04835..6f5d88705a07 100644
--- a/arch/mips/kernel/crash.c
+++ b/arch/mips/kernel/crash.c
@@ -63,7 +63,11 @@ static void crash_kexec_prepare_cpus(void)
 
 	ncpus = num_online_cpus() - 1;/* Excluding the panic cpu */
 
-	smp_call_function(crash_shutdown_secondary, NULL, 0);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, crash_shutdown_secondary,
+			NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
+
 	smp_wmb();
 
 	/*
diff --git a/arch/mips/kernel/machine_kexec.c b/arch/mips/kernel/machine_kexec.c
index 432bfd3e7f22..dd823f9e291f 100644
--- a/arch/mips/kernel/machine_kexec.c
+++ b/arch/mips/kernel/machine_kexec.c
@@ -139,7 +139,10 @@ machine_shutdown(void)
 		_machine_kexec_shutdown();
 
 #ifdef CONFIG_SMP
-	smp_call_function(kexec_shutdown_secondary, NULL, 0);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, kexec_shutdown_secondary,
+			NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
 
 	while (num_online_cpus() > 1) {
 		cpu_relax();
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 1641d274fe37..8d8a4ae1ca26 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -642,8 +642,9 @@ static void hw_perf_event_destroy(struct perf_event *event)
 		 * We must not call the destroy function with interrupts
 		 * disabled.
 		 */
-		on_each_cpu(reset_counters,
-			(void *)(long)mipspmu.num_counters, 1);
+		smp_call(SMP_CALL_ALL, reset_counters,
+			(void *)(long)mipspmu.num_counters, SMP_CALL_TYPE_SYNC);
+
 		mipspmu_free_irq();
 		mutex_unlock(&pmu_reserve_mutex);
 	}
@@ -2043,7 +2044,7 @@ init_hw_perf_events(void)
 		mipspmu.write_counter = mipsxx_pmu_write_counter;
 	}
 
-	on_each_cpu(reset_counters, (void *)(long)counters, 1);
+	smp_call(SMP_CALL_ALL, reset_counters, (void *)(long)counters, SMP_CALL_TYPE_SYNC);
 
 	pr_cont("%s PMU enabled, %d %d-bit counters available to each "
 		"CPU, irq %d%s\n", mipspmu.name, counters, counter_bits, irq,
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 1986d1309410..27f94fde499d 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -397,7 +397,10 @@ static void stop_this_cpu(void *dummy)
 
 void smp_send_stop(void)
 {
-	smp_call_function(stop_this_cpu, NULL, 0);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, stop_this_cpu,
+			NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
 }
 
 void __init smp_cpus_done(unsigned int max_cpus)
@@ -472,7 +475,7 @@ void flush_tlb_all(void)
 		return;
 	}
 
-	on_each_cpu(flush_tlb_all_ipi, NULL, 1);
+	smp_call(SMP_CALL_ALL, flush_tlb_all_ipi, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void flush_tlb_mm_ipi(void *mm)
@@ -490,7 +493,7 @@ static void flush_tlb_mm_ipi(void *mm)
  */
 static inline void smp_on_other_tlbs(void (*func) (void *info), void *info)
 {
-	smp_call_function(func, info, 1);
+	smp_call_mask_others(SMP_CALL_MASK_ALL, func, info, SMP_CALL_TYPE_SYNC);
 }
 
 static inline void smp_on_each_tlb(void (*func) (void *info), void *info)
@@ -617,7 +620,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		.addr2 = end,
 	};
 
-	on_each_cpu(flush_tlb_kernel_range_ipi, &fd, 1);
+	smp_call(SMP_CALL_ALL, flush_tlb_kernel_range_ipi, &fd, SMP_CALL_TYPE_SYNC);
 }
 
 static void flush_tlb_page_ipi(void *info)
diff --git a/arch/mips/kernel/sysrq.c b/arch/mips/kernel/sysrq.c
index 9c1a2019113b..549d82bc8881 100644
--- a/arch/mips/kernel/sysrq.c
+++ b/arch/mips/kernel/sysrq.c
@@ -38,7 +38,10 @@ static void sysrq_tlbdump_single(void *dummy)
 #ifdef CONFIG_SMP
 static void sysrq_tlbdump_othercpus(struct work_struct *dummy)
 {
-	smp_call_function(sysrq_tlbdump_single, NULL, 0);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, sysrq_tlbdump_single,
+			NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
 }
 
 static DECLARE_WORK(sysrq_tlbdump, sysrq_tlbdump_othercpus);
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index ccb9e47322b0..558ce8f115a3 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -96,8 +96,8 @@ static inline void r4k_on_each_cpu(unsigned int type,
 {
 	preempt_disable();
 	if (r4k_op_needs_ipi(type))
-		smp_call_function_many(&cpu_foreign_map[smp_processor_id()],
-				       func, info, 1);
+		smp_call_mask_others(&cpu_foreign_map[smp_processor_id()],
+					func, info, SMP_CALL_TYPE_SYNC);
 	func(info);
 	preempt_enable();
 }
diff --git a/arch/mips/sibyte/common/cfe.c b/arch/mips/sibyte/common/cfe.c
index 1a504294d85f..e2e431110e3a 100644
--- a/arch/mips/sibyte/common/cfe.c
+++ b/arch/mips/sibyte/common/cfe.c
@@ -57,7 +57,10 @@ static void __noreturn cfe_linux_exit(void *arg)
 		if (!reboot_smp) {
 			/* Get CPU 0 to do the cfe_exit */
 			reboot_smp = 1;
-			smp_call_function(cfe_linux_exit, arg, 0);
+			preempt_disable();
+			smp_call_mask_others(SMP_CALL_MASK_ALL, cfe_linux_exit,
+					arg, SMP_CALL_TYPE_ASYNC);
+			preempt_enable();
 		}
 	} else {
 		printk("Passing control back to CFE...\n");
diff --git a/arch/openrisc/kernel/smp.c b/arch/openrisc/kernel/smp.c
index 27041db2c8b0..e2ed3c9ba50b 100644
--- a/arch/openrisc/kernel/smp.c
+++ b/arch/openrisc/kernel/smp.c
@@ -194,7 +194,10 @@ static void stop_this_cpu(void *dummy)
 
 void smp_send_stop(void)
 {
-	smp_call_function(stop_this_cpu, NULL, 0);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, stop_this_cpu,
+			NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
 }
 
 /* not supported, yet */
@@ -244,7 +247,7 @@ static void smp_flush_tlb_mm(struct cpumask *cmask, struct mm_struct *mm)
 		/* local cpu is the only cpu present in cpumask */
 		local_flush_tlb_mm(mm);
 	} else {
-		on_each_cpu_mask(cmask, ipi_flush_tlb_mm, mm, 1);
+		smp_call_mask(cmask, ipi_flush_tlb_mm, mm, SMP_CALL_TYPE_SYNC);
 	}
 	put_cpu();
 }
@@ -291,16 +294,16 @@ static void smp_flush_tlb_range(const struct cpumask *cmask, unsigned long start
 		fd.addr2 = end;
 
 		if ((end - start) <= PAGE_SIZE)
-			on_each_cpu_mask(cmask, ipi_flush_tlb_page, &fd, 1);
+			smp_call_mask(cmask, ipi_flush_tlb_page, &fd, SMP_CALL_TYPE_SYNC);
 		else
-			on_each_cpu_mask(cmask, ipi_flush_tlb_range, &fd, 1);
+			smp_call_mask(cmask, ipi_flush_tlb_range, &fd, SMP_CALL_TYPE_SYNC);
 	}
 	put_cpu();
 }
 
 void flush_tlb_all(void)
 {
-	on_each_cpu(ipi_flush_tlb_all, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
@@ -331,6 +334,6 @@ static void ipi_icache_page_inv(void *arg)
 
 void smp_icache_page_inv(struct page *page)
 {
-	on_each_cpu(ipi_icache_page_inv, page, 1);
+	smp_call(SMP_CALL_ALL, ipi_icache_page_inv, page, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(smp_icache_page_inv);
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index e7911225a4f8..2d51c1fef256 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -81,13 +81,13 @@ void flush_cache_all_local(void)
 void flush_cache_all(void)
 {
 	if (static_branch_likely(&parisc_has_cache))
-		on_each_cpu(cache_flush_local_cpu, NULL, 1);
+		smp_call(SMP_CALL_ALL, cache_flush_local_cpu, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static inline void flush_data_cache(void)
 {
 	if (static_branch_likely(&parisc_has_dcache))
-		on_each_cpu(flush_data_cache_local, NULL, 1);
+		smp_call(SMP_CALL_ALL, flush_data_cache_local, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 
diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
index 1dc2e88e7b04..990224e54939 100644
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -620,12 +620,10 @@ static void __init pagetable_init(void)
 
 	for (range = 0; range < npmem_ranges; range++) {
 		unsigned long start_paddr;
-		unsigned long end_paddr;
 		unsigned long size;
 
 		start_paddr = pmem_ranges[range].start_pfn << PAGE_SHIFT;
 		size = pmem_ranges[range].pages << PAGE_SHIFT;
-		end_paddr = start_paddr + size;
 
 		map_pages((unsigned long)__va(start_paddr), start_paddr,
 			  size, PAGE_KERNEL, 0);
@@ -847,7 +845,7 @@ void flush_tlb_all(void)
 	    do_recycle++;
 	}
 	spin_unlock(&sid_lock);
-	on_each_cpu(flush_tlb_all_local, NULL, 1);
+	smp_call(SMP_CALL_ALL, flush_tlb_all_local, NULL, SMP_CALL_TYPE_SYNC);
 	if (do_recycle) {
 	    spin_lock(&sid_lock);
 	    recycle_sids(recycle_ndirty,recycle_dirty_array);
diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c
index 64e423d2fe0f..68449a7f0b2b 100644
--- a/arch/powerpc/kernel/dawr.c
+++ b/arch/powerpc/kernel/dawr.c
@@ -76,8 +76,12 @@ static ssize_t dawr_write_file_bool(struct file *file,
 		return rc;
 
 	/* If we are clearing, make sure all CPUs have the DAWR cleared */
-	if (!dawr_force_enable)
-		smp_call_function(disable_dawrs_cb, NULL, 0);
+	if (!dawr_force_enable) {
+		preempt_disable();
+		smp_call_mask_others(SMP_CALL_MASK_ALL, disable_dawrs_cb,
+				NULL, SMP_CALL_TYPE_ASYNC);
+		preempt_enable();
+	}
 
 	return rc;
 }
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 6568823cf306..559023ef1952 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -666,7 +666,7 @@ static void __init kvm_use_magic_page(void)
 	u32 features;
 
 	/* Tell the host to map the magic page to -4096 on all CPUs */
-	on_each_cpu(kvm_map_magic_page, &features, 1);
+	smp_call(SMP_CALL_ALL, kvm_map_magic_page, &features, SMP_CALL_TYPE_SYNC);
 
 	/* Quick self-test to see if the mapping works */
 	if (fault_in_readable((const char __user *)KVM_MAGIC_PAGE,
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index d96fd14bd7c9..65b53535f2b9 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -607,7 +607,7 @@ void rfi_flush_enable(bool enable)
 {
 	if (enable) {
 		do_rfi_flush_fixups(enabled_flush_types);
-		on_each_cpu(do_nothing, NULL, 1);
+		smp_call(SMP_CALL_ALL, do_nothing, NULL, SMP_CALL_TYPE_SYNC);
 	} else
 		do_rfi_flush_fixups(L1D_FLUSH_NONE);
 
@@ -618,7 +618,7 @@ static void entry_flush_enable(bool enable)
 {
 	if (enable) {
 		do_entry_flush_fixups(enabled_flush_types);
-		on_each_cpu(do_nothing, NULL, 1);
+		smp_call(SMP_CALL_ALL, do_nothing, NULL, SMP_CALL_TYPE_SYNC);
 	} else {
 		do_entry_flush_fixups(L1D_FLUSH_NONE);
 	}
@@ -631,7 +631,7 @@ static void uaccess_flush_enable(bool enable)
 	if (enable) {
 		do_uaccess_flush_fixups(enabled_flush_types);
 		static_branch_enable(&uaccess_flush_key);
-		on_each_cpu(do_nothing, NULL, 1);
+		smp_call(SMP_CALL_ALL, do_nothing, NULL, SMP_CALL_TYPE_SYNC);
 	} else {
 		static_branch_disable(&uaccess_flush_key);
 		do_uaccess_flush_fixups(L1D_FLUSH_NONE);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index de0f6f09a5dd..af47670e6b26 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -655,7 +655,10 @@ void crash_smp_send_stop(void)
 #ifdef CONFIG_NMI_IPI
 	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_stop_this_cpu, 1000000);
 #else
-	smp_call_function(crash_stop_this_cpu, NULL, 0);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, crash_stop_this_cpu,
+			NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
 #endif /* CONFIG_NMI_IPI */
 }
 
@@ -711,7 +714,10 @@ void smp_send_stop(void)
 
 	stopped = true;
 
-	smp_call_function(stop_this_cpu, NULL, 0);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, stop_this_cpu,
+			NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
 }
 #endif /* CONFIG_NMI_IPI */
 
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 2069bbb90a9a..898b48ca7b5a 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -206,7 +206,7 @@ static ssize_t __used store_dscr_default(struct device *dev,
 		return -EINVAL;
 	dscr_default = val;
 
-	on_each_cpu(write_dscr, &val, 1);
+	smp_call(SMP_CALL_ALL, write_dscr, &val, SMP_CALL_TYPE_SYNC);
 
 	return count;
 }
diff --git a/arch/powerpc/kernel/tau_6xx.c b/arch/powerpc/kernel/tau_6xx.c
index 828d0f4106d2..235905acf234 100644
--- a/arch/powerpc/kernel/tau_6xx.c
+++ b/arch/powerpc/kernel/tau_6xx.c
@@ -158,7 +158,7 @@ static struct workqueue_struct *tau_workq;
 static void tau_work_func(struct work_struct *work)
 {
 	msleep(shrink_timer);
-	on_each_cpu(tau_timeout, NULL, 0);
+	smp_call(SMP_CALL_ALL, tau_timeout, NULL, SMP_CALL_TYPE_ASYNC);
 	/* schedule ourselves to be run again */
 	queue_work(tau_workq, work);
 }
@@ -204,7 +204,7 @@ static int __init TAU_init(void)
 	if (!tau_workq)
 		return -ENOMEM;
 
-	on_each_cpu(TAU_init_smp, NULL, 0);
+	smp_call(SMP_CALL_ALL, TAU_init_smp, NULL, SMP_CALL_TYPE_ASYNC);
 
 	queue_work(tau_workq, &tau_work);
 
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 6cc7793b8420..e9cd0c21a8cb 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -225,7 +225,12 @@ static void wake_offline_cpus(void)
 static void kexec_prepare_cpus(void)
 {
 	wake_offline_cpus();
-	smp_call_function(kexec_smp_down, NULL, /* wait */0);
+
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, kexec_smp_down,
+			NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
+
 	local_irq_disable();
 	hard_irq_disable();
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 0aeb51738ca9..776f9d1dfe99 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1555,7 +1555,7 @@ long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
 
 	/* Boot all CPUs out of the guest so they re-read
 	 * mmu_ready */
-	on_each_cpu(resize_hpt_boot_vcpu, NULL, 1);
+	smp_call(SMP_CALL_ALL, resize_hpt_boot_vcpu, NULL, SMP_CALL_TYPE_SYNC);
 
 	ret = -ENXIO;
 	if (!resize || (resize->order != shift))
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 052e6590f84f..ed809b93c635 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -111,7 +111,7 @@ static void do_serialize(void *arg)
 void serialize_against_pte_lookup(struct mm_struct *mm)
 {
 	smp_mb();
-	smp_call_function_many(mm_cpumask(mm), do_serialize, mm, 1);
+	smp_call_mask_others(mm_cpumask(mm), do_serialize, mm, SMP_CALL_TYPE_SYNC);
 }
 
 /*
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 7724af19ed7e..189b1b483ca5 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -464,7 +464,7 @@ static inline void _tlbiel_pid_multicast(struct mm_struct *mm,
 	struct cpumask *cpus = mm_cpumask(mm);
 	struct tlbiel_pid t = { .pid = pid, .ric = ric };
 
-	on_each_cpu_mask(cpus, do_tlbiel_pid, &t, 1);
+	smp_call_mask(cpus, do_tlbiel_pid, &t, SMP_CALL_TYPE_SYNC);
 	/*
 	 * Always want the CPU translations to be invalidated with tlbiel in
 	 * these paths, so while coprocessors must use tlbie, we can not
@@ -616,7 +616,7 @@ static inline void _tlbiel_va_multicast(struct mm_struct *mm,
 {
 	struct cpumask *cpus = mm_cpumask(mm);
 	struct tlbiel_va t = { .va = va, .pid = pid, .psize = psize, .ric = ric };
-	on_each_cpu_mask(cpus, do_tlbiel_va, &t, 1);
+	smp_call_mask(cpus, do_tlbiel_va, &t, SMP_CALL_TYPE_SYNC);
 	if (atomic_read(&mm->context.copros) > 0)
 		_tlbie_va(va, pid, psize, RIC_FLUSH_TLB);
 }
@@ -682,7 +682,7 @@ static inline void _tlbiel_va_range_multicast(struct mm_struct *mm,
 				.pid = pid, .page_size = page_size,
 				.psize = psize, .also_pwc = also_pwc };
 
-	on_each_cpu_mask(cpus, do_tlbiel_va_range, &t, 1);
+	smp_call_mask(cpus, do_tlbiel_va_range, &t, SMP_CALL_TYPE_SYNC);
 	if (atomic_read(&mm->context.copros) > 0)
 		_tlbie_va_range(start, end, pid, page_size, psize, also_pwc);
 }
@@ -827,8 +827,8 @@ static void exit_flush_lazy_tlbs(struct mm_struct *mm)
 	 * make a special powerpc IPI for flushing TLBs.
 	 * For now it's not too performance critical.
 	 */
-	smp_call_function_many(mm_cpumask(mm), do_exit_flush_lazy_tlb,
-				(void *)mm, 1);
+	smp_call_mask_others(mm_cpumask(mm), do_exit_flush_lazy_tlb,
+				(void *)mm, SMP_CALL_TYPE_SYNC);
 }
 
 #else /* CONFIG_SMP */
@@ -1064,7 +1064,7 @@ static void do_tlbiel_kernel(void *info)
 
 static inline void _tlbiel_kernel_broadcast(void)
 {
-	on_each_cpu(do_tlbiel_kernel, NULL, 1);
+	smp_call(SMP_CALL_ALL, do_tlbiel_kernel, NULL, SMP_CALL_TYPE_SYNC);
 	if (tlbie_capable) {
 		/*
 		 * Coherent accelerators don't refcount kernel memory mappings,
diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index fd2c77af5c55..5af924d35c3d 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -276,8 +276,8 @@ void flush_tlb_mm(struct mm_struct *mm)
 	if (!mm_is_core_local(mm)) {
 		struct tlb_flush_param p = { .pid = pid };
 		/* Ignores smp_processor_id() even if set. */
-		smp_call_function_many(mm_cpumask(mm),
-				       do_flush_tlb_mm_ipi, &p, 1);
+		smp_call_mask_others(mm_cpumask(mm),
+			       do_flush_tlb_mm_ipi, &p, SMP_CALL_TYPE_SYNC);
 	}
 	_tlbil_pid(pid);
  no_context:
@@ -321,8 +321,8 @@ void __flush_tlb_page(struct mm_struct *mm, unsigned long vmaddr,
 				.ind = ind,
 			};
 			/* Ignores smp_processor_id() even if set in cpu_mask */
-			smp_call_function_many(cpu_mask,
-					       do_flush_tlb_page_ipi, &p, 1);
+			smp_call_mask_others(cpu_mask,
+				       do_flush_tlb_page_ipi, &p, SMP_CALL_TYPE_SYNC);
 		}
 	}
 	_tlbil_va(vmaddr, pid, tsize, ind);
@@ -362,7 +362,8 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
 #ifdef CONFIG_SMP
 	preempt_disable();
-	smp_call_function(do_flush_tlb_mm_ipi, NULL, 1);
+	smp_call_mask_others(SMP_CALL_MASK_ALL, do_flush_tlb_mm_ipi,
+			NULL, SMP_CALL_TYPE_SYNC);
 	_tlbil_pid(0);
 	preempt_enable();
 #else
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index f42711f865f3..a597007d8cf4 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -464,7 +464,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 		 */
 		mm_ctx_set_slb_addr_limit(&mm->context, high_limit);
 
-		on_each_cpu(slice_flush_segments, mm, 1);
+		smp_call(SMP_CALL_ALL, slice_flush_segments, mm, SMP_CALL_TYPE_SYNC);
 	}
 
 	/* Sanity checks */
@@ -626,7 +626,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 		 !bitmap_empty(potential_mask.high_slices, SLICE_NUM_HIGH))) {
 		slice_convert(mm, &potential_mask, psize);
 		if (psize > MMU_PAGE_BASE)
-			on_each_cpu(slice_flush_segments, mm, 1);
+			smp_call(SMP_CALL_ALL, slice_flush_segments, mm, SMP_CALL_TYPE_SYNC);
 	}
 	return newaddr;
 
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index b5b42cf0a703..548dccc3efe1 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2529,7 +2529,7 @@ static int __init init_ppc64_pmu(void)
 {
 	if (cpu_has_feature(CPU_FTR_HVMODE) && pmu_override) {
 		pr_warn("disabling perf due to pmu_override= command line option.\n");
-		on_each_cpu(do_pmu_override, NULL, 1);
+		smp_call(SMP_CALL_ALL, do_pmu_override, NULL, SMP_CALL_TYPE_SYNC);
 		return 0;
 	}
 
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 526d4b767534..6ecf8315e928 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1583,7 +1583,7 @@ static void thread_imc_ldbar_disable(void *dummy)
 
 void thread_imc_disable(void)
 {
-	on_each_cpu(thread_imc_ldbar_disable, NULL, 1);
+	smp_call(SMP_CALL_ALL, thread_imc_ldbar_disable, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void cleanup_all_thread_imc_memory(void)
diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index a1c6a7827c8f..e36c67d7dffb 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -444,8 +444,12 @@ static void mpc85xx_smp_machine_kexec(struct kimage *image)
 	int timeout = INT_MAX;
 	int i, num_cpus = num_present_cpus();
 
-	if (image->type == KEXEC_TYPE_DEFAULT)
-		smp_call_function(mpc85xx_smp_kexec_down, NULL, 0);
+	if (image->type == KEXEC_TYPE_DEFAULT) {
+		preempt_disable();
+		smp_call_mask_others(SMP_CALL_MASK_ALL, mpc85xx_smp_kexec_down,
+				NULL, SMP_CALL_TYPE_ASYNC);
+		preempt_enable();
+	}
 
 	while ( (atomic_read(&kexec_down_cpus) != (num_cpus - 1)) &&
 		( timeout > 0 ) )
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index a6677a111aca..5bc261517d08 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -202,7 +202,7 @@ static ssize_t store_fastsleep_workaround_applyonce(struct device *dev,
 	power7_fastsleep_workaround_exit = false;
 
 	cpus_read_lock();
-	on_each_cpu(pnv_fastsleep_workaround_apply, &err, 1);
+	smp_call(SMP_CALL_ALL, pnv_fastsleep_workaround_apply, &err, SMP_CALL_TYPE_SYNC);
 	cpus_read_unlock();
 	if (err) {
 		pr_err("fastsleep_workaround_applyonce change failed while running pnv_fastsleep_workaround_apply");
diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index 2119c003fcf9..3a5d31321366 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -61,7 +61,7 @@ static unsigned long get_purr(void)
 {
 	atomic64_t purr = ATOMIC64_INIT(0);
 
-	on_each_cpu(cpu_get_purr, &purr, 1);
+	smp_call(SMP_CALL_ALL, cpu_get_purr, &purr, SMP_CALL_TYPE_SYNC);
 
 	return atomic64_read(&purr);
 }
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 6cb7d96ad9c7..8d64bb39aff2 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -21,7 +21,7 @@ void flush_icache_all(void)
 	if (IS_ENABLED(CONFIG_RISCV_SBI))
 		sbi_remote_fence_i(NULL);
 	else
-		on_each_cpu(ipi_remote_fence_i, NULL, 1);
+		smp_call(SMP_CALL_ALL, ipi_remote_fence_i, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(flush_icache_all);
 
@@ -69,7 +69,7 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
 	} else if (IS_ENABLED(CONFIG_RISCV_SBI)) {
 		sbi_remote_fence_i(&others);
 	} else {
-		on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
+		smp_call_mask(&others, ipi_remote_fence_i, NULL, SMP_CALL_TYPE_SYNC);
 	}
 
 	preempt_enable();
diff --git a/arch/s390/hypfs/hypfs_diag0c.c b/arch/s390/hypfs/hypfs_diag0c.c
index 9a2786079e3a..f513e1529b6a 100644
--- a/arch/s390/hypfs/hypfs_diag0c.c
+++ b/arch/s390/hypfs/hypfs_diag0c.c
@@ -51,7 +51,7 @@ static void *diag0c_store(unsigned int *count)
 		cpu_vec[cpu] = &diag0c_data->entry[i++];
 	}
 	/* Collect data all CPUs */
-	on_each_cpu(diag0c_fn, cpu_vec, 1);
+	smp_call(SMP_CALL_ALL, diag0c_fn, cpu_vec, SMP_CALL_TYPE_SYNC);
 	*count = cpu_count;
 	kfree(cpu_vec);
 	cpus_read_unlock();
diff --git a/arch/s390/kernel/alternative.c b/arch/s390/kernel/alternative.c
index cce0ddee2d02..2c112748a785 100644
--- a/arch/s390/kernel/alternative.c
+++ b/arch/s390/kernel/alternative.c
@@ -121,7 +121,7 @@ static void do_sync_core(void *info)
 
 void text_poke_sync(void)
 {
-	on_each_cpu(do_sync_core, NULL, 1);
+	smp_call(SMP_CALL_ALL, do_sync_core, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 void text_poke_sync_lock(void)
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index 483ab5e10164..5629b32a7f5a 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -923,7 +923,7 @@ static void cfset_all_stop(struct cfset_request *req)
 	};
 
 	cpumask_and(&req->mask, &req->mask, cpu_online_mask);
-	on_each_cpu_mask(&req->mask, cfset_ioctl_off, &p, 1);
+	smp_call_mask(&req->mask, cfset_ioctl_off, &p, SMP_CALL_TYPE_SYNC);
 }
 
 /* Release function is also called when application gets terminated without
@@ -940,7 +940,7 @@ static int cfset_release(struct inode *inode, struct file *file)
 		file->private_data = NULL;
 	}
 	if (!atomic_dec_return(&cfset_opencnt))
-		on_each_cpu(cfset_release_cpu, NULL, 1);
+		smp_call(SMP_CALL_ALL, cfset_release_cpu, NULL, SMP_CALL_TYPE_SYNC);
 	mutex_unlock(&cfset_ctrset_mutex);
 
 	hw_perf_event_destroy(NULL);
@@ -974,9 +974,9 @@ static int cfset_all_start(struct cfset_request *req)
 	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
 		return -ENOMEM;
 	cpumask_and(mask, &req->mask, cpu_online_mask);
-	on_each_cpu_mask(mask, cfset_ioctl_on, &p, 1);
+	smp_call_mask(mask, cfset_ioctl_on, &p, SMP_CALL_TYPE_SYNC);
 	if (atomic_read(&p.cpus_ack) != cpumask_weight(mask)) {
-		on_each_cpu_mask(mask, cfset_ioctl_off, &p, 1);
+		smp_call_mask(mask, cfset_ioctl_off, &p, SMP_CALL_TYPE_SYNC);
 		rc = -EIO;
 		debug_sprintf_event(cf_dbg, 4, "%s CPUs missing", __func__);
 	}
@@ -1100,7 +1100,7 @@ static int cfset_all_read(unsigned long arg, struct cfset_request *req)
 
 	p.sets = req->ctrset;
 	cpumask_and(mask, &req->mask, cpu_online_mask);
-	on_each_cpu_mask(mask, cfset_cpu_read, &p, 1);
+	smp_call_mask(mask, cfset_cpu_read, &p, SMP_CALL_TYPE_SYNC);
 	rc = cfset_all_copy(arg, mask);
 	free_cpumask_var(mask);
 	return rc;
diff --git a/arch/s390/kernel/perf_cpum_cf_common.c b/arch/s390/kernel/perf_cpum_cf_common.c
index 8ee48672233f..d46b6ee17485 100644
--- a/arch/s390/kernel/perf_cpum_cf_common.c
+++ b/arch/s390/kernel/perf_cpum_cf_common.c
@@ -105,7 +105,7 @@ int __kernel_cpumcf_begin(void)
 {
 	int flags = PMC_INIT;
 
-	on_each_cpu(cpum_cf_setup_cpu, &flags, 1);
+	smp_call(SMP_CALL_ALL, cpum_cf_setup_cpu, &flags, SMP_CALL_TYPE_SYNC);
 	irq_subclass_register(IRQ_SUBCLASS_MEASUREMENT_ALERT);
 
 	return 0;
@@ -131,7 +131,7 @@ void __kernel_cpumcf_end(void)
 {
 	int flags = PMC_RELEASE;
 
-	on_each_cpu(cpum_cf_setup_cpu, &flags, 1);
+	smp_call(SMP_CALL_ALL, cpum_cf_setup_cpu, &flags, SMP_CALL_TYPE_SYNC);
 	irq_subclass_unregister(IRQ_SUBCLASS_MEASUREMENT_ALERT);
 }
 EXPORT_SYMBOL(__kernel_cpumcf_end);
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index 332a49965130..d0e1c7acb249 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -582,14 +582,14 @@ static void release_pmc_hardware(void)
 	int flags = PMC_RELEASE;
 
 	irq_subclass_unregister(IRQ_SUBCLASS_MEASUREMENT_ALERT);
-	on_each_cpu(setup_pmc_cpu, &flags, 1);
+	smp_call(SMP_CALL_ALL, setup_pmc_cpu, &flags, SMP_CALL_TYPE_SYNC);
 }
 
 static int reserve_pmc_hardware(void)
 {
 	int flags = PMC_INIT;
 
-	on_each_cpu(setup_pmc_cpu, &flags, 1);
+	smp_call(SMP_CALL_ALL, setup_pmc_cpu, &flags, SMP_CALL_TYPE_SYNC);
 	if (flags & PMC_FAILURE) {
 		release_pmc_hardware();
 		return -ENODEV;
diff --git a/arch/s390/kernel/processor.c b/arch/s390/kernel/processor.c
index aa0e0e7fc773..04c6f2a70483 100644
--- a/arch/s390/kernel/processor.c
+++ b/arch/s390/kernel/processor.c
@@ -62,7 +62,7 @@ void s390_update_cpu_mhz(void)
 {
 	s390_adjust_jiffies();
 	if (machine_has_cpu_mhz)
-		on_each_cpu(update_cpu_mhz, NULL, 0);
+		smp_call(SMP_CALL_ALL, update_cpu_mhz, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 void notrace stop_machine_yield(const struct cpumask *cpumask)
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 30c91d565933..837e5f8a619c 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -595,7 +595,7 @@ void smp_ctl_set_clear_bit(int cr, int bit, bool set)
 	ctlreg = (ctlreg & parms.andval) | parms.orval;
 	put_abs_lowcore(cregs_save_area[cr], ctlreg);
 	spin_unlock(&ctl_lock);
-	on_each_cpu(smp_ctl_bit_callback, &parms, 1);
+	smp_call(SMP_CALL_ALL, smp_ctl_bit_callback, &parms, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(smp_ctl_set_clear_bit);
 
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index c6eecd4a5302..3df2a1f0c5fe 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -322,7 +322,7 @@ int arch_update_cpu_topology(void)
 	int cpu, rc;
 
 	rc = __arch_update_cpu_topology();
-	on_each_cpu(__arch_update_dedicated_flag, NULL, 0);
+	smp_call(SMP_CALL_ALL, __arch_update_dedicated_flag, NULL, SMP_CALL_TYPE_ASYNC);
 	for_each_online_cpu(cpu) {
 		dev = get_cpu_device(cpu);
 		if (dev)
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 2de48b2c1b04..f700843a703c 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -131,7 +131,7 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end)
 
 	spin_unlock_bh(&mm->page_table_lock);
 
-	on_each_cpu(__crst_table_upgrade, mm, 0);
+	smp_call(SMP_CALL_ALL, __crst_table_upgrade, mm, SMP_CALL_TYPE_ASYNC);
 
 	return 0;
 
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 500cd2dbdf53..136af9f32f23 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -440,7 +440,7 @@ static int __init zpci_directed_irq_init(void)
 		if (!zpci_ibv[cpu])
 			return -ENOMEM;
 	}
-	on_each_cpu(cpu_enable_directed_irq, NULL, 1);
+	smp_call(SMP_CALL_ALL, cpu_enable_directed_irq, NULL, SMP_CALL_TYPE_SYNC);
 
 	zpci_irq_chip.irq_set_affinity = zpci_set_irq_affinity;
 
diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
index 65924d9ec245..5681955e7d9e 100644
--- a/arch/sh/kernel/smp.c
+++ b/arch/sh/kernel/smp.c
@@ -263,7 +263,10 @@ void smp_send_reschedule(int cpu)
 
 void smp_send_stop(void)
 {
-	smp_call_function(stop_this_cpu, 0, 0);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, stop_this_cpu,
+			NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
 }
 
 void arch_send_call_function_ipi_mask(const struct cpumask *mask)
@@ -335,7 +338,7 @@ static void flush_tlb_all_ipi(void *info)
 
 void flush_tlb_all(void)
 {
-	on_each_cpu(flush_tlb_all_ipi, 0, 1);
+	smp_call(SMP_CALL_ALL, flush_tlb_all_ipi, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void flush_tlb_mm_ipi(void *mm)
@@ -360,7 +363,8 @@ void flush_tlb_mm(struct mm_struct *mm)
 	preempt_disable();
 
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
-		smp_call_function(flush_tlb_mm_ipi, (void *)mm, 1);
+		smp_call_mask_others(SMP_CALL_MASK_ALL, flush_tlb_mm_ipi,
+				(void *)mm, SMP_CALL_TYPE_SYNC);
 	} else {
 		int i;
 		for_each_online_cpu(i)
@@ -397,7 +401,8 @@ void flush_tlb_range(struct vm_area_struct *vma,
 		fd.vma = vma;
 		fd.addr1 = start;
 		fd.addr2 = end;
-		smp_call_function(flush_tlb_range_ipi, (void *)&fd, 1);
+		smp_call_mask_others(SMP_CALL_MASK_ALL, flush_tlb_range_ipi,
+				(void *)&fd, SMP_CALL_TYPE_SYNC);
 	} else {
 		int i;
 		for_each_online_cpu(i)
@@ -421,7 +426,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 
 	fd.addr1 = start;
 	fd.addr2 = end;
-	on_each_cpu(flush_tlb_kernel_range_ipi, (void *)&fd, 1);
+	smp_call(SMP_CALL_ALL, flush_tlb_kernel_range_ipi, (void *)&fd, SMP_CALL_TYPE_SYNC);
 }
 
 static void flush_tlb_page_ipi(void *info)
@@ -440,7 +445,8 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 
 		fd.vma = vma;
 		fd.addr1 = page;
-		smp_call_function(flush_tlb_page_ipi, (void *)&fd, 1);
+		smp_call_mask_others(SMP_CALL_MASK_ALL, flush_tlb_page_ipi,
+				(void *)&fd, SMP_CALL_TYPE_SYNC);
 	} else {
 		int i;
 		for_each_online_cpu(i)
@@ -464,7 +470,11 @@ void flush_tlb_one(unsigned long asid, unsigned long vaddr)
 	fd.addr1 = asid;
 	fd.addr2 = vaddr;
 
-	smp_call_function(flush_tlb_one_ipi, (void *)&fd, 1);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, flush_tlb_one_ipi,
+			(void *)&fd, SMP_CALL_TYPE_SYNC);
+	preempt_enable();
+
 	local_flush_tlb_one(asid, vaddr);
 }
 
diff --git a/arch/sh/mm/cache.c b/arch/sh/mm/cache.c
index 3aef78ceb820..58215011a441 100644
--- a/arch/sh/mm/cache.c
+++ b/arch/sh/mm/cache.c
@@ -49,7 +49,8 @@ static inline void cacheop_on_each_cpu(void (*func) (void *info), void *info,
 	 * even attempt IPIs unless there are other CPUs online.
 	 */
 	if (num_online_cpus() > 1)
-		smp_call_function(func, info, wait);
+		smp_call_mask_others(SMP_CALL_MASK_ALL, func, info,
+			(wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC));
 #endif
 
 	func(info);
diff --git a/arch/sparc/include/asm/mman.h b/arch/sparc/include/asm/mman.h
index 274217e7ed70..038099bcabea 100644
--- a/arch/sparc/include/asm/mman.h
+++ b/arch/sparc/include/asm/mman.h
@@ -37,8 +37,8 @@ static inline unsigned long sparc_calc_vm_prot_bits(unsigned long prot)
 			regs = task_pt_regs(current);
 			regs->tstate |= TSTATE_MCDE;
 			current->mm->context.adi = true;
-			on_each_cpu_mask(mm_cpumask(current->mm),
-					 ipi_set_tstate_mcde, current->mm, 0);
+			smp_call_mask(mm_cpumask(current->mm),
+				ipi_set_tstate_mcde, current->mm, SMP_CALL_TYPE_ASYNC);
 		}
 		return VM_SPARC_ADI;
 	} else {
diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 060fff95a305..b18a45dc0028 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -176,7 +176,10 @@ static int __init check_nmi_watchdog(void)
 
 	printk(KERN_INFO "Testing NMI watchdog ... ");
 
-	smp_call_function(nmi_cpu_busy, (void *)&endflag, 0);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, nmi_cpu_busy,
+			(void *)&endflag, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
 
 	for_each_possible_cpu(cpu)
 		prev_nmi_count[cpu] = get_nmi_count(cpu);
@@ -203,7 +206,7 @@ static int __init check_nmi_watchdog(void)
 	kfree(prev_nmi_count);
 	return 0;
 error:
-	on_each_cpu(stop_nmi_watchdog, NULL, 1);
+	smp_call(SMP_CALL_ALL, stop_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 	return err;
 }
 
@@ -235,13 +238,13 @@ static void nmi_adjust_hz_one(void *unused)
 void nmi_adjust_hz(unsigned int new_hz)
 {
 	nmi_hz = new_hz;
-	on_each_cpu(nmi_adjust_hz_one, NULL, 1);
+	smp_call(SMP_CALL_ALL, nmi_adjust_hz_one, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL_GPL(nmi_adjust_hz);
 
 static int nmi_shutdown(struct notifier_block *nb, unsigned long cmd, void *p)
 {
-	on_each_cpu(stop_nmi_watchdog, NULL, 1);
+	smp_call(SMP_CALL_ALL, stop_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 	return 0;
 }
 
@@ -253,13 +256,13 @@ int __init nmi_init(void)
 {
 	int err;
 
-	on_each_cpu(start_nmi_watchdog, NULL, 1);
+	smp_call(SMP_CALL_ALL, start_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 
 	err = check_nmi_watchdog();
 	if (!err) {
 		err = register_reboot_notifier(&nmi_reboot_notifier);
 		if (err) {
-			on_each_cpu(stop_nmi_watchdog, NULL, 1);
+			smp_call(SMP_CALL_ALL, stop_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 			atomic_set(&nmi_active, -1);
 		}
 	}
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index a58ae9c42803..a843eb4a0f2c 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1176,7 +1176,7 @@ static void perf_event_grab_pmc(void)
 	mutex_lock(&pmc_grab_mutex);
 	if (atomic_read(&active_events) == 0) {
 		if (atomic_read(&nmi_active) > 0) {
-			on_each_cpu(perf_stop_nmi_watchdog, NULL, 1);
+			smp_call(SMP_CALL_ALL, perf_stop_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 			BUG_ON(atomic_read(&nmi_active) != 0);
 		}
 		atomic_inc(&active_events);
@@ -1188,7 +1188,7 @@ static void perf_event_release_pmc(void)
 {
 	if (atomic_dec_and_mutex_lock(&active_events, &pmc_grab_mutex)) {
 		if (atomic_read(&nmi_active) == 0)
-			on_each_cpu(start_nmi_watchdog, NULL, 1);
+			smp_call(SMP_CALL_ALL, start_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 		mutex_unlock(&pmc_grab_mutex);
 	}
 }
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index a1f78e9ddaf3..3e701dccd619 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -901,7 +901,7 @@ static void tsb_sync(void *info)
 
 void smp_tsb_sync(struct mm_struct *mm)
 {
-	smp_call_function_many(mm_cpumask(mm), tsb_sync, mm, 1);
+	smp_call_mask_others(mm_cpumask(mm), tsb_sync, mm, SMP_CALL_TYPE_SYNC);
 }
 
 extern unsigned long xcall_flush_tlb_mm;
@@ -1084,8 +1084,8 @@ void smp_flush_tlb_pending(struct mm_struct *mm, unsigned long nr, unsigned long
 	info.nr = nr;
 	info.vaddrs = vaddrs;
 
-	smp_call_function_many(mm_cpumask(mm), tlb_pending_func,
-			       &info, 1);
+	smp_call_mask_others(mm_cpumask(mm), tlb_pending_func,
+		       &info, SMP_CALL_TYPE_SYNC);
 
 	__flush_tlb_pending(ctx, nr, vaddrs);
 
@@ -1522,8 +1522,12 @@ void smp_send_stop(void)
 #endif
 				prom_stopcpu_cpuid(cpu);
 		}
-	} else
-		smp_call_function(stop_this_cpu, NULL, 0);
+	} else {
+		preempt_disable();
+		smp_call_mask_others(SMP_CALL_MASK_ALL, stop_this_cpu,
+				NULL, SMP_CALL_TYPE_ASYNC);
+		preempt_enable();
+	}
 }
 
 static int __init pcpu_cpu_distance(unsigned int from, unsigned int to)
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 8b1911591581..dca8c077c8ad 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -3024,7 +3024,7 @@ void hugetlb_setup(struct pt_regs *regs)
 		spin_unlock_irq(&ctx_alloc_lock);
 
 		if (need_context_reload)
-			on_each_cpu(context_reload, mm, 0);
+			smp_call(SMP_CALL_ALL, context_reload, mm, SMP_CALL_TYPE_ASYNC);
 	}
 }
 #endif
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index eef816fc216d..03515ba1f36a 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2526,7 +2526,7 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
 	mmap_assert_write_locked(mm);
 
 	if (atomic_inc_return(&mm->context.perf_rdpmc_allowed) == 1)
-		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
+		smp_call_mask(mm_cpumask(mm), cr4_update_pce, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
@@ -2535,7 +2535,7 @@ static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *m
 		return;
 
 	if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
-		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
+		smp_call_mask(mm_cpumask(mm), cr4_update_pce, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static int x86_pmu_event_idx(struct perf_event *event)
@@ -2591,7 +2591,7 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
 		else if (x86_pmu.attr_rdpmc == 2)
 			static_branch_dec(&rdpmc_always_available_key);
 
-		on_each_cpu(cr4_update_pce, NULL, 1);
+		smp_call(SMP_CALL_ALL, cr4_update_pce, NULL, SMP_CALL_TYPE_SYNC);
 		x86_pmu.attr_rdpmc = val;
 	}
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index fc7f458eb3de..2e94f33d1aab 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5123,7 +5123,7 @@ static ssize_t freeze_on_smi_store(struct device *cdev,
 	x86_pmu.attr_freeze_on_smi = val;
 
 	cpus_read_lock();
-	on_each_cpu(flip_smm_bit, &val, 1);
+	smp_call(SMP_CALL_ALL, flip_smm_bit, &val, SMP_CALL_TYPE_SYNC);
 	cpus_read_unlock();
 done:
 	mutex_unlock(&freeze_on_smi_mutex);
@@ -5168,7 +5168,7 @@ static ssize_t set_sysctl_tfa(struct device *cdev,
 	allow_tsx_force_abort = val;
 
 	cpus_read_lock();
-	on_each_cpu(update_tfa_sched, NULL, 1);
+	smp_call(SMP_CALL_ALL, update_tfa_sched, NULL, SMP_CALL_TYPE_SYNC);
 	cpus_read_unlock();
 
 	return count;
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index d374cb3cf024..b8d93fb7c33b 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1181,7 +1181,7 @@ static void do_sync_core(void *info)
 
 void text_poke_sync(void)
 {
-	on_each_cpu(do_sync_core, NULL, 1);
+	smp_call(SMP_CALL_ALL, do_sync_core, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 struct text_poke_loc {
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 020c906f7934..4edc11afcd78 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -488,7 +488,7 @@ static __init void fix_erratum_688(void)
 	if (val & BIT(2))
 		return;
 
-	on_each_cpu(__fix_erratum_688, NULL, 0);
+	smp_call(SMP_CALL_ALL, __fix_erratum_688, NULL, SMP_CALL_TYPE_ASYNC);
 
 	pr_info("x86/cpu/AMD: CPU erratum 688 worked around\n");
 }
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b70344bf6600..87a68837db30 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -656,7 +656,7 @@ void lapic_update_tsc_freq(void)
 	 * changed. In order to avoid races, schedule the frequency
 	 * update code on each CPU.
 	 */
-	on_each_cpu(__lapic_update_tsc_freq, NULL, 0);
+	smp_call(SMP_CALL_ALL, __lapic_update_tsc_freq, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 6296e1ebed1d..a6806b73f7e3 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1091,7 +1091,7 @@ static void update_stibp_strict(void)
 	pr_info("Update user space SMT mitigation: STIBP %s\n",
 		mask & SPEC_CTRL_STIBP ? "always-on" : "off");
 	x86_spec_ctrl_base = mask;
-	on_each_cpu(update_stibp_msr, NULL, 1);
+	smp_call(SMP_CALL_ALL, update_stibp_msr, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 /* Update the static key controlling the evaluation of TIF_SPEC_IB */
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 981496e6bc0e..e8ddca5aacf1 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2207,7 +2207,7 @@ void mce_disable_bank(int bank)
 		return;
 	}
 	set_bit(bank, mce_banks_ce_disabled);
-	on_each_cpu(__mce_disable_bank, &bank, 1);
+	smp_call(SMP_CALL_ALL, __mce_disable_bank, &bank, SMP_CALL_TYPE_SYNC);
 }
 
 /*
@@ -2362,7 +2362,7 @@ static void mce_cpu_restart(void *data)
 static void mce_restart(void)
 {
 	mce_timer_delete_all();
-	on_each_cpu(mce_cpu_restart, NULL, 1);
+	smp_call(SMP_CALL_ALL, mce_cpu_restart, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 /* Toggle features for corrected errors */
@@ -2450,12 +2450,12 @@ static ssize_t set_ignore_ce(struct device *s,
 		if (new) {
 			/* disable ce features */
 			mce_timer_delete_all();
-			on_each_cpu(mce_disable_cmci, NULL, 1);
+			smp_call(SMP_CALL_ALL, mce_disable_cmci, NULL, SMP_CALL_TYPE_SYNC);
 			mca_cfg.ignore_ce = true;
 		} else {
 			/* enable ce features */
 			mca_cfg.ignore_ce = false;
-			on_each_cpu(mce_enable_ce, (void *)1, 1);
+			smp_call(SMP_CALL_ALL, mce_enable_ce, (void *)1, SMP_CALL_TYPE_SYNC);
 		}
 	}
 	mutex_unlock(&mce_sysfs_mutex);
@@ -2476,12 +2476,12 @@ static ssize_t set_cmci_disabled(struct device *s,
 	if (mca_cfg.cmci_disabled ^ !!new) {
 		if (new) {
 			/* disable cmci */
-			on_each_cpu(mce_disable_cmci, NULL, 1);
+			smp_call(SMP_CALL_ALL, mce_disable_cmci, NULL, SMP_CALL_TYPE_SYNC);
 			mca_cfg.cmci_disabled = true;
 		} else {
 			/* enable cmci */
 			mca_cfg.cmci_disabled = false;
-			on_each_cpu(mce_enable_ce, NULL, 1);
+			smp_call(SMP_CALL_ALL, mce_enable_ce, NULL, SMP_CALL_TYPE_SYNC);
 		}
 	}
 	mutex_unlock(&mce_sysfs_mutex);
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 5fbd7ffb3233..e7d5ad9d62b6 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -264,8 +264,8 @@ static void __maybe_unused raise_mce(struct mce *m)
 				 * to be sync with following raise_local
 				 */
 				preempt_disable();
-				smp_call_function_many(mce_inject_cpumask,
-					mce_irq_ipi, NULL, 0);
+				smp_call_mask_others(mce_inject_cpumask,
+					mce_irq_ipi, NULL, SMP_CALL_TYPE_ASYNC);
 				preempt_enable();
 			} else if (m->inject_flags & MCJ_NMI_BROADCAST)
 				apic->send_IPI_mask(mce_inject_cpumask,
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index 95275a5e57e0..787dfc787336 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -400,7 +400,7 @@ void cmci_rediscover(void)
 	if (!cmci_supported(&banks))
 		return;
 
-	on_each_cpu(cmci_rediscover_work_func, NULL, 1);
+	smp_call(SMP_CALL_ALL, cmci_rediscover_work_func, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index f2dd4f29af27..a8a5df0828b1 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -326,7 +326,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 	if (cpumask_test_cpu(cpu, cpu_mask))
 		rdt_ctrl_update(&msr_param);
 	/* Update resource control msr on other CPUs. */
-	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
+	smp_call_mask_others(cpu_mask, rdt_ctrl_update, &msr_param, SMP_CALL_TYPE_SYNC);
 	put_cpu();
 
 done:
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 83f901e2c2df..c6f02d1f0673 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -329,7 +329,7 @@ update_closid_rmid(const struct cpumask *cpu_mask, struct rdtgroup *r)
 
 	if (cpumask_test_cpu(cpu, cpu_mask))
 		update_cpu_closid_rmid(r);
-	smp_call_function_many(cpu_mask, update_cpu_closid_rmid, r, 1);
+	smp_call_mask_others(cpu_mask, update_cpu_closid_rmid, r, SMP_CALL_TYPE_SYNC);
 	put_cpu();
 }
 
@@ -1866,7 +1866,7 @@ static int set_cache_qos_cfg(int level, bool enable)
 	if (cpumask_test_cpu(cpu, cpu_mask))
 		update(&enable);
 	/* Update QOS_CFG MSR on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, update, &enable, 1);
+	smp_call_mask_others(cpu_mask, update, &enable, SMP_CALL_TYPE_SYNC);
 	put_cpu();
 
 	free_cpumask_var(cpu_mask);
@@ -2335,7 +2335,7 @@ static int reset_all_ctrls(struct rdt_resource *r)
 	if (cpumask_test_cpu(cpu, cpu_mask))
 		rdt_ctrl_update(&msr_param);
 	/* Update CBM on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
+	smp_call_mask_others(cpu_mask, rdt_ctrl_update, &msr_param, SMP_CALL_TYPE_SYNC);
 	put_cpu();
 
 	free_cpumask_var(cpu_mask);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8e4bc6453d26..67bdfe2c3069 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -278,8 +278,9 @@ static void sgx_encl_ewb(struct sgx_epc_page *epc_page,
 			 * miss cpus that entered the enclave between
 			 * generating the mask and incrementing epoch.
 			 */
-			on_each_cpu_mask(sgx_encl_ewb_cpumask(encl),
-					 sgx_ipi_cb, NULL, 1);
+			smp_call_mask(sgx_encl_ewb_cpumask(encl),
+				 sgx_ipi_cb, NULL, SMP_CALL_TYPE_SYNC);
+
 			ret = __sgx_encl_ewb(epc_page, va_slot, backing);
 		}
 	}
diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index ec8064c0ae03..86a9dd98d596 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -120,7 +120,7 @@ static inline void umwait_update_control(u32 maxtime, bool c02_enable)
 
 	WRITE_ONCE(umwait_control_cached, ctrl);
 	/* Propagate to all CPUs */
-	on_each_cpu(umwait_update_control_msr, NULL, 1);
+	smp_call(SMP_CALL_ALL, umwait_update_control_msr, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static ssize_t
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index c04b933f48d3..741e49bfaa1f 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -282,7 +282,7 @@ static int vmware_pv_reboot_notify(struct notifier_block *nb,
 				unsigned long code, void *unused)
 {
 	if (code == SYS_RESTART)
-		on_each_cpu(vmware_pv_guest_cpu_reboot, NULL, 1);
+		smp_call(SMP_CALL_ALL, vmware_pv_guest_cpu_reboot, NULL, SMP_CALL_TYPE_SYNC);
 	return NOTIFY_DONE;
 }
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8b1c45c9cda8..9f1623917322 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -743,7 +743,7 @@ static int kvm_pv_reboot_notify(struct notifier_block *nb,
 				unsigned long code, void *unused)
 {
 	if (code == SYS_RESTART)
-		on_each_cpu(kvm_pv_guest_cpu_reboot, NULL, 1);
+		smp_call(SMP_CALL_ALL, kvm_pv_guest_cpu_reboot, NULL, SMP_CALL_TYPE_SYNC);
 	return NOTIFY_DONE;
 }
 
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 525876e7b9f4..3243e2507d73 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -424,7 +424,7 @@ static void install_ldt(struct mm_struct *mm, struct ldt_struct *ldt)
 	smp_store_release(&mm->context.ldt, ldt);
 
 	/* Activate the LDT for all CPUs using currents mm. */
-	on_each_cpu_mask(mm_cpumask(mm), flush_ldt, mm, true);
+	smp_call_mask(mm_cpumask(mm), flush_ldt, mm, SMP_CALL_TYPE_SYNC);
 
 	mutex_unlock(&mm->context.lock);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4790f0d7d40b..691cbf38304c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7442,8 +7442,8 @@ static int kvm_emulate_wbinvd_noskip(struct kvm_vcpu *vcpu)
 		int cpu = get_cpu();
 
 		cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
-		on_each_cpu_mask(vcpu->arch.wbinvd_dirty_mask,
-				wbinvd_ipi, NULL, 1);
+		smp_call_mask(vcpu->arch.wbinvd_dirty_mask,
+				wbinvd_ipi, NULL, SMP_CALL_TYPE_SYNC);
 		put_cpu();
 		cpumask_clear(vcpu->arch.wbinvd_dirty_mask);
 	} else
diff --git a/arch/x86/lib/cache-smp.c b/arch/x86/lib/cache-smp.c
index 7c48ff4ae8d1..20d00dc2bb8c 100644
--- a/arch/x86/lib/cache-smp.c
+++ b/arch/x86/lib/cache-smp.c
@@ -15,7 +15,7 @@ EXPORT_SYMBOL(wbinvd_on_cpu);
 
 int wbinvd_on_all_cpus(void)
 {
-	on_each_cpu(__wbinvd, NULL, 1);
+	smp_call(SMP_CALL_ALL, __wbinvd, NULL, SMP_CALL_TYPE_SYNC);
 	return 0;
 }
 EXPORT_SYMBOL(wbinvd_on_all_cpus);
diff --git a/arch/x86/lib/msr-smp.c b/arch/x86/lib/msr-smp.c
index 40bbe56bde32..cbe2874835f3 100644
--- a/arch/x86/lib/msr-smp.c
+++ b/arch/x86/lib/msr-smp.c
@@ -113,7 +113,7 @@ static void __rwmsr_on_cpus(const struct cpumask *mask, u32 msr_no,
 	if (cpumask_test_cpu(this_cpu, mask))
 		msr_func(&rv);
 
-	smp_call_function_many(mask, msr_func, &rv, 1);
+	smp_call_mask_others(mask, msr_func, &rv, SMP_CALL_TYPE_SYNC);
 	put_cpu();
 }
 
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 0656db33574d..5f65746b6fc0 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -348,7 +348,7 @@ static void cpa_flush_all(unsigned long cache)
 {
 	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
 
-	on_each_cpu(__cpa_flush_all, (void *) cache, 1);
+	smp_call(SMP_CALL_ALL, __cpa_flush_all, (void *) cache, SMP_CALL_TYPE_SYNC);
 }
 
 static void __cpa_flush_tlb(void *data)
@@ -375,7 +375,7 @@ static void cpa_flush(struct cpa_data *data, int cache)
 	if (cpa->force_flush_all || cpa->numpages > tlb_single_page_flush_ceiling)
 		flush_tlb_all();
 	else
-		on_each_cpu(__cpa_flush_tlb, cpa, 1);
+		smp_call(SMP_CALL_ALL, __cpa_flush_tlb, cpa, SMP_CALL_TYPE_SYNC);
 
 	if (!cache)
 		return;
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d400b6d9d246..8771ca98af7b 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -889,10 +889,10 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
 	 * doing a speculative memory access.
 	 */
 	if (info->freed_tables)
-		on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
+		smp_call_mask(cpumask, flush_tlb_func, (void *)info, SMP_CALL_TYPE_SYNC);
 	else
-		on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
-				(void *)info, 1, cpumask);
+		smp_call_mask_cond(cpumask, flush_tlb_func, (void *)info,
+				tlb_is_not_lazy, SMP_CALL_TYPE_SYNC);
 }
 
 void flush_tlb_multi(const struct cpumask *cpumask,
@@ -1006,7 +1006,7 @@ static void do_flush_tlb_all(void *info)
 void flush_tlb_all(void)
 {
 	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
-	on_each_cpu(do_flush_tlb_all, NULL, 1);
+	smp_call(SMP_CALL_ALL, do_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void do_kernel_range_flush(void *info)
@@ -1024,14 +1024,14 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	/* Balance as user space task's flush, a bit conservative */
 	if (end == TLB_FLUSH_ALL ||
 	    (end - start) > tlb_single_page_flush_ceiling << PAGE_SHIFT) {
-		on_each_cpu(do_flush_tlb_all, NULL, 1);
+		smp_call(SMP_CALL_ALL, do_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 	} else {
 		struct flush_tlb_info *info;
 
 		preempt_disable();
 		info = get_flush_tlb_info(NULL, start, end, 0, false, 0);
 
-		on_each_cpu(do_kernel_range_flush, info, 1);
+		smp_call(SMP_CALL_ALL, do_kernel_range_flush, info, SMP_CALL_TYPE_SYNC);
 
 		put_flush_tlb_info();
 		preempt_enable();
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 00354866921b..ec37a4d2c0f9 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -948,7 +948,7 @@ static void xen_drop_mm_ref(struct mm_struct *mm)
 			cpumask_set_cpu(cpu, mask);
 	}
 
-	smp_call_function_many(mask, drop_mm_ref_this_cpu, mm, 1);
+	smp_call_mask_others(mask, drop_mm_ref_this_cpu, mm, SMP_CALL_TYPE_SYNC);
 	free_cpumask_var(mask);
 }
 #else
diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 688aa8b6ae29..bf7abf876cf9 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -428,7 +428,10 @@ static void stop_self(void *v)
 
 static void xen_pv_stop_other_cpus(int wait)
 {
-	smp_call_function(stop_self, NULL, wait);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, stop_self, NULL,
+			(wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC));
+	preempt_enable();
 }
 
 static irqreturn_t xen_irq_work_interrupt(int irq, void *dev_id)
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 1d83152c761b..94abd0bb7a26 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -67,7 +67,7 @@ void xen_arch_resume(void)
 {
 	int cpu;
 
-	on_each_cpu(xen_vcpu_notify_restore, NULL, 1);
+	smp_call(SMP_CALL_ALL, xen_vcpu_notify_restore, NULL, SMP_CALL_TYPE_SYNC);
 
 	for_each_online_cpu(cpu)
 		xen_pmu_init(cpu);
@@ -80,5 +80,5 @@ void xen_arch_suspend(void)
 	for_each_online_cpu(cpu)
 		xen_pmu_finish(cpu);
 
-	on_each_cpu(xen_vcpu_notify_suspend, NULL, 1);
+	smp_call(SMP_CALL_ALL, xen_vcpu_notify_suspend, NULL, SMP_CALL_TYPE_SYNC);
 }
diff --git a/arch/xtensa/kernel/smp.c b/arch/xtensa/kernel/smp.c
index 1254da07ead1..5878ba8f0274 100644
--- a/arch/xtensa/kernel/smp.c
+++ b/arch/xtensa/kernel/smp.c
@@ -470,7 +470,7 @@ static void ipi_flush_tlb_all(void *arg)
 
 void flush_tlb_all(void)
 {
-	on_each_cpu(ipi_flush_tlb_all, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_tlb_mm(void *arg)
@@ -480,7 +480,7 @@ static void ipi_flush_tlb_mm(void *arg)
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	on_each_cpu(ipi_flush_tlb_mm, mm, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_mm, mm, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_tlb_page(void *arg)
@@ -495,7 +495,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 		.vma = vma,
 		.addr1 = addr,
 	};
-	on_each_cpu(ipi_flush_tlb_page, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_page, &fd, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_tlb_range(void *arg)
@@ -512,7 +512,7 @@ void flush_tlb_range(struct vm_area_struct *vma,
 		.addr1 = start,
 		.addr2 = end,
 	};
-	on_each_cpu(ipi_flush_tlb_range, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_range, &fd, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_tlb_kernel_range(void *arg)
@@ -527,7 +527,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		.addr1 = start,
 		.addr2 = end,
 	};
-	on_each_cpu(ipi_flush_tlb_kernel_range, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_kernel_range, &fd, SMP_CALL_TYPE_SYNC);
 }
 
 /* Cache flush functions */
@@ -539,7 +539,7 @@ static void ipi_flush_cache_all(void *arg)
 
 void flush_cache_all(void)
 {
-	on_each_cpu(ipi_flush_cache_all, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_cache_all, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_cache_page(void *arg)
@@ -556,7 +556,7 @@ void flush_cache_page(struct vm_area_struct *vma,
 		.addr1 = address,
 		.addr2 = pfn,
 	};
-	on_each_cpu(ipi_flush_cache_page, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_cache_page, &fd, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_cache_range(void *arg)
@@ -573,7 +573,7 @@ void flush_cache_range(struct vm_area_struct *vma,
 		.addr1 = start,
 		.addr2 = end,
 	};
-	on_each_cpu(ipi_flush_cache_range, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_cache_range, &fd, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_icache_range(void *arg)
@@ -588,7 +588,7 @@ void flush_icache_range(unsigned long start, unsigned long end)
 		.addr1 = start,
 		.addr2 = end,
 	};
-	on_each_cpu(ipi_flush_icache_range, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_icache_range, &fd, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(flush_icache_range);
 
@@ -607,7 +607,7 @@ static void system_invalidate_dcache_range(unsigned long start,
 		.addr1 = start,
 		.addr2 = size,
 	};
-	on_each_cpu(ipi_invalidate_dcache_range, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_invalidate_dcache_range, &fd, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_invalidate_dcache_range(void *arg)
@@ -623,5 +623,5 @@ static void system_flush_invalidate_dcache_range(unsigned long start,
 		.addr1 = start,
 		.addr2 = size,
 	};
-	on_each_cpu(ipi_flush_invalidate_dcache_range, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_invalidate_dcache_range, &fd, SMP_CALL_TYPE_SYNC);
 }
diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index 3ffbb1c80c5c..dddb96754e0b 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -1308,7 +1308,7 @@ static void ipi_handler(void *null)
 
 void global_cache_flush(void)
 {
-	on_each_cpu(ipi_handler, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_handler, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(global_cache_flush);
 
diff --git a/drivers/clocksource/mips-gic-timer.c b/drivers/clocksource/mips-gic-timer.c
index be4175f415ba..3f0566027bf6 100644
--- a/drivers/clocksource/mips-gic-timer.c
+++ b/drivers/clocksource/mips-gic-timer.c
@@ -130,7 +130,7 @@ static int gic_clk_notifier(struct notifier_block *nb, unsigned long action,
 
 	if (action == POST_RATE_CHANGE) {
 		gic_clocksource_unstable("ref clock rate change");
-		on_each_cpu(gic_update_frequency, (void *)cnd->new_rate, 1);
+		smp_call(SMP_CALL_ALL, gic_update_frequency, (void *)cnd->new_rate, SMP_CALL_TYPE_SYNC);
 	}
 
 	return NOTIFY_OK;
diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 9bbb4b9391f8..b6ade5348c7f 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -129,8 +129,8 @@ static void boost_set_msr_each(void *p_en)
 
 static int set_boost(struct cpufreq_policy *policy, int val)
 {
-	on_each_cpu_mask(policy->cpus, boost_set_msr_each,
-			 (void *)(long)val, 1);
+	smp_call_mask(policy->cpus, boost_set_msr_each,
+			(void *)(long)val, SMP_CALL_TYPE_SYNC);
 	pr_debug("CPU %*pbl: Core Boosting %sabled.\n",
 		 cpumask_pr_args(policy->cpus), val ? "en" : "dis");
 
@@ -345,7 +345,7 @@ static void drv_write(struct acpi_cpufreq_data *data,
 	if (cpumask_test_cpu(this_cpu, mask))
 		do_drv_write(&cmd);
 
-	smp_call_function_many(mask, do_drv_write, &cmd, 1);
+	smp_call_mask_others(mask, do_drv_write, &cmd, SMP_CALL_TYPE_SYNC);
 	put_cpu();
 }
 
diff --git a/drivers/cpufreq/tegra194-cpufreq.c b/drivers/cpufreq/tegra194-cpufreq.c
index ac381db25dbe..99fc456f16b6 100644
--- a/drivers/cpufreq/tegra194-cpufreq.c
+++ b/drivers/cpufreq/tegra194-cpufreq.c
@@ -265,7 +265,7 @@ static int tegra194_cpufreq_set_target(struct cpufreq_policy *policy,
 	 * in a cluster run at same frequency which is the maximum frequency
 	 * request out of the values requested by both cores in that cluster.
 	 */
-	on_each_cpu_mask(policy->cpus, set_cpu_ndiv, tbl, true);
+	smp_call_mask(policy->cpus, set_cpu_ndiv, tbl, SMP_CALL_TYPE_SYNC);
 
 	return 0;
 }
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index f70aa17e2a8e..a7e6f95e6a9c 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -225,8 +225,8 @@ static int __cpuidle_register_driver(struct cpuidle_driver *drv)
 		return ret;
 
 	if (drv->bctimer)
-		on_each_cpu_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
-				 (void *)1, 1);
+		smp_call_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
+				 (void *)1, SMP_CALL_TYPE_SYNC);
 
 	return 0;
 }
@@ -244,8 +244,8 @@ static void __cpuidle_unregister_driver(struct cpuidle_driver *drv)
 {
 	if (drv->bctimer) {
 		drv->bctimer = 0;
-		on_each_cpu_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
-				 NULL, 1);
+		smp_call_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
+				 NULL, SMP_CALL_TYPE_SYNC);
 	}
 
 	__cpuidle_unset_driver(drv);
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 812baa48b290..69038cd22924 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -793,7 +793,7 @@ static ssize_t inject_write_store(struct device *dev,
 			"/sys/bus/machinecheck/devices/machinecheck<CPUNUM>/check_interval\n"
 			"so that you can get the error report faster.\n");
 
-	on_each_cpu(disable_caches, NULL, 1);
+	smp_call(SMP_CALL_ALL, disable_caches, NULL, SMP_CALL_TYPE_SYNC);
 
 	/* Issue 'word' and 'bit' along with the READ request */
 	amd64_write_pci_cfg(pvt->F3, F10_NB_ARRAY_DATA, word_bits);
@@ -806,7 +806,7 @@ static ssize_t inject_write_store(struct device *dev,
 		goto retry;
 	}
 
-	on_each_cpu(enable_caches, NULL, 1);
+	smp_call(SMP_CALL_ALL, enable_caches, NULL, SMP_CALL_TYPE_SYNC);
 
 	edac_dbg(0, "section=0x%x word_bits=0x%x\n", section, word_bits);
 
diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index 1e1a51510e83..691408efb912 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -101,7 +101,7 @@ static inline int sdei_do_cross_call(smp_call_func_t fn,
 	struct sdei_crosscall_args arg;
 
 	CROSSCALL_INIT(arg, event);
-	on_each_cpu(fn, &arg, true);
+	smp_call(SMP_CALL_ALL, fn, &arg, SMP_CALL_TYPE_SYNC);
 
 	return arg.first_error;
 }
@@ -359,7 +359,7 @@ static int sdei_api_shared_reset(void)
 static void sdei_mark_interface_broken(void)
 {
 	pr_err("disabling SDEI firmware interface\n");
-	on_each_cpu(&_ipi_mask_cpu, NULL, true);
+	smp_call(SMP_CALL_ALL, &_ipi_mask_cpu, NULL, SMP_CALL_TYPE_SYNC);
 	sdei_firmware_call = NULL;
 }
 
@@ -367,7 +367,7 @@ static int sdei_platform_reset(void)
 {
 	int err;
 
-	on_each_cpu(&_ipi_private_reset, NULL, true);
+	smp_call(SMP_CALL_ALL, &_ipi_private_reset, NULL, SMP_CALL_TYPE_SYNC);
 	err = sdei_api_shared_reset();
 	if (err) {
 		pr_err("Failed to reset platform: %d\n", err);
@@ -741,14 +741,14 @@ static struct notifier_block sdei_pm_nb = {
 
 static int sdei_device_suspend(struct device *dev)
 {
-	on_each_cpu(_ipi_mask_cpu, NULL, true);
+	smp_call(SMP_CALL_ALL, _ipi_mask_cpu, NULL, SMP_CALL_TYPE_SYNC);
 
 	return 0;
 }
 
 static int sdei_device_resume(struct device *dev)
 {
-	on_each_cpu(_ipi_unmask_cpu, NULL, true);
+	smp_call(SMP_CALL_ALL, _ipi_unmask_cpu, NULL, SMP_CALL_TYPE_SYNC);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/vlv_sideband.c b/drivers/gpu/drm/i915/vlv_sideband.c
index c26001300ebd..09a261d8af80 100644
--- a/drivers/gpu/drm/i915/vlv_sideband.c
+++ b/drivers/gpu/drm/i915/vlv_sideband.c
@@ -42,7 +42,7 @@ static void __vlv_punit_get(struct drm_i915_private *i915)
 	 */
 	if (IS_VALLEYVIEW(i915)) {
 		cpu_latency_qos_update_request(&i915->sb_qos, 0);
-		on_each_cpu(ping, NULL, 1);
+		smp_call(SMP_CALL_ALL, ping, NULL, SMP_CALL_TYPE_SYNC);
 	}
 }
 
diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
index 521534d5c1e5..25bf112c9aa0 100644
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -188,7 +188,7 @@ static int read_registers(struct fam15h_power_data *data)
 		cpumask_set_cpu(cpumask_any(topology_sibling_cpumask(cpu)), mask);
 	}
 
-	on_each_cpu_mask(mask, do_read_registers_on_cu, data, true);
+	smp_call_mask(mask, do_read_registers_on_cu, data, SMP_CALL_TYPE_SYNC);
 
 	cpus_read_unlock();
 	free_cpumask_var(mask);
diff --git a/drivers/irqchip/irq-mvebu-pic.c b/drivers/irqchip/irq-mvebu-pic.c
index ef3d3646ccc2..c000fd1be945 100644
--- a/drivers/irqchip/irq-mvebu-pic.c
+++ b/drivers/irqchip/irq-mvebu-pic.c
@@ -160,7 +160,7 @@ static int mvebu_pic_probe(struct platform_device *pdev)
 	irq_set_chained_handler(pic->parent_irq, mvebu_pic_handle_cascade_irq);
 	irq_set_handler_data(pic->parent_irq, pic);
 
-	on_each_cpu(mvebu_pic_enable_percpu_irq, pic, 1);
+	smp_call(SMP_CALL_ALL, mvebu_pic_enable_percpu_irq, pic, SMP_CALL_TYPE_SYNC);
 
 	platform_set_drvdata(pdev, pic);
 
@@ -171,7 +171,7 @@ static int mvebu_pic_remove(struct platform_device *pdev)
 {
 	struct mvebu_pic *pic = platform_get_drvdata(pdev);
 
-	on_each_cpu(mvebu_pic_disable_percpu_irq, pic, 1);
+	smp_call(SMP_CALL_ALL, mvebu_pic_disable_percpu_irq, pic, SMP_CALL_TYPE_SYNC);
 	irq_domain_remove(pic->domain);
 
 	return 0;
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 934f6dd90992..77042499baa4 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1473,10 +1473,10 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
 	int max_cpu = num_present_cpus();
 
 	/* Clear all Cause registers */
-	on_each_cpu(mvneta_percpu_clear_intr_cause, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_clear_intr_cause, pp, SMP_CALL_TYPE_SYNC);
 
 	/* Mask all interrupts */
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_mask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 	mvreg_write(pp, MVNETA_INTR_ENABLE, 0);
 
 	/* Enable MBUS Retry bit16 */
@@ -3704,7 +3704,7 @@ static void mvneta_start_dev(struct mvneta_port *pp)
 	}
 
 	/* Unmask interrupts. It has to be done from each CPU */
-	on_each_cpu(mvneta_percpu_unmask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_unmask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 
 	mvreg_write(pp, MVNETA_INTR_MISC_MASK,
 		    MVNETA_CAUSE_PHY_STATUS_CHANGE |
@@ -3751,10 +3751,10 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
 	mvneta_port_disable(pp);
 
 	/* Clear all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_clear_intr_cause, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_clear_intr_cause, pp, SMP_CALL_TYPE_SYNC);
 
 	/* Mask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_mask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 
 	mvneta_tx_reset(pp);
 	mvneta_rx_reset(pp);
@@ -3811,7 +3811,7 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
 	 * reallocation of the queues
 	 */
 	mvneta_stop_dev(pp);
-	on_each_cpu(mvneta_percpu_disable, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_disable, pp, SMP_CALL_TYPE_SYNC);
 
 	mvneta_cleanup_txqs(pp);
 	mvneta_cleanup_rxqs(pp);
@@ -3833,7 +3833,7 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
 		return ret;
 	}
 
-	on_each_cpu(mvneta_percpu_enable, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_enable, pp, SMP_CALL_TYPE_SYNC);
 	mvneta_start_dev(pp);
 
 	netdev_update_features(dev);
@@ -4349,7 +4349,7 @@ static int mvneta_cpu_online(unsigned int cpu, struct hlist_node *node)
 	}
 
 	/* Mask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_mask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 	napi_enable(&port->napi);
 
 	/*
@@ -4365,7 +4365,7 @@ static int mvneta_cpu_online(unsigned int cpu, struct hlist_node *node)
 	mvneta_percpu_elect(pp);
 
 	/* Unmask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_unmask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_unmask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 	mvreg_write(pp, MVNETA_INTR_MISC_MASK,
 		    MVNETA_CAUSE_PHY_STATUS_CHANGE |
 		    MVNETA_CAUSE_LINK_CHANGE);
@@ -4386,7 +4386,7 @@ static int mvneta_cpu_down_prepare(unsigned int cpu, struct hlist_node *node)
 	 */
 	spin_lock(&pp->lock);
 	/* Mask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_mask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 	spin_unlock(&pp->lock);
 
 	napi_synchronize(&port->napi);
@@ -4406,7 +4406,7 @@ static int mvneta_cpu_dead(unsigned int cpu, struct hlist_node *node)
 	mvneta_percpu_elect(pp);
 	spin_unlock(&pp->lock);
 	/* Unmask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_unmask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_unmask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 	mvreg_write(pp, MVNETA_INTR_MISC_MASK,
 		    MVNETA_CAUSE_PHY_STATUS_CHANGE |
 		    MVNETA_CAUSE_LINK_CHANGE);
@@ -4445,7 +4445,7 @@ static int mvneta_open(struct net_device *dev)
 		/* Enable per-CPU interrupt on all the CPU to handle our RX
 		 * queue interrupts
 		 */
-		on_each_cpu(mvneta_percpu_enable, pp, true);
+		smp_call(SMP_CALL_ALL, mvneta_percpu_enable, pp, SMP_CALL_TYPE_SYNC);
 
 		pp->is_stopped = false;
 		/* Register a CPU notifier to handle the case where our CPU
@@ -4484,7 +4484,7 @@ static int mvneta_open(struct net_device *dev)
 	if (pp->neta_armada3700) {
 		free_irq(pp->dev->irq, pp);
 	} else {
-		on_each_cpu(mvneta_percpu_disable, pp, true);
+		smp_call(SMP_CALL_ALL, mvneta_percpu_disable, pp, SMP_CALL_TYPE_SYNC);
 		free_percpu_irq(pp->dev->irq, pp->ports);
 	}
 err_cleanup_txqs:
@@ -4516,7 +4516,7 @@ static int mvneta_stop(struct net_device *dev)
 						    &pp->node_online);
 		cpuhp_state_remove_instance_nocalls(CPUHP_NET_MVNETA_DEAD,
 						    &pp->node_dead);
-		on_each_cpu(mvneta_percpu_disable, pp, true);
+		smp_call(SMP_CALL_ALL, mvneta_percpu_disable, pp, SMP_CALL_TYPE_SYNC);
 		free_percpu_irq(dev->irq, pp->ports);
 	} else {
 		mvneta_stop_dev(pp);
@@ -4893,7 +4893,7 @@ static int  mvneta_config_rss(struct mvneta_port *pp)
 
 	netif_tx_stop_all_queues(pp->dev);
 
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_mask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 
 	if (!pp->neta_armada3700) {
 		/* We have to synchronise on the napi of each CPU */
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 1a835b48791b..70fe2af6985f 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -3263,7 +3263,7 @@ static void mvpp2_cleanup_txqs(struct mvpp2_port *port)
 		mvpp2_txq_deinit(port, txq);
 	}
 
-	on_each_cpu(mvpp2_txq_sent_counter_clear, port, 1);
+	smp_call(SMP_CALL_ALL, mvpp2_txq_sent_counter_clear, port, SMP_CALL_TYPE_SYNC);
 
 	val &= ~MVPP2_TX_PORT_FLUSH_MASK(port->id);
 	mvpp2_write(port->priv, MVPP2_TX_PORT_FLUSH_REG, val);
@@ -3327,7 +3327,7 @@ static int mvpp2_setup_txqs(struct mvpp2_port *port)
 		}
 	}
 
-	on_each_cpu(mvpp2_txq_sent_counter_clear, port, 1);
+	smp_call(SMP_CALL_ALL, mvpp2_txq_sent_counter_clear, port, SMP_CALL_TYPE_SYNC);
 	return 0;
 
 err_cleanup:
@@ -4829,7 +4829,7 @@ static int mvpp2_open(struct net_device *dev)
 	}
 
 	/* Unmask interrupts on all CPUs */
-	on_each_cpu(mvpp2_interrupts_unmask, port, 1);
+	smp_call(SMP_CALL_ALL, mvpp2_interrupts_unmask, port, SMP_CALL_TYPE_SYNC);
 	mvpp2_shared_interrupt_mask_unmask(port, false);
 
 	mvpp2_start_dev(port);
@@ -4858,7 +4858,7 @@ static int mvpp2_stop(struct net_device *dev)
 	mvpp2_stop_dev(port);
 
 	/* Mask interrupts on all threads */
-	on_each_cpu(mvpp2_interrupts_mask, port, 1);
+	smp_call(SMP_CALL_ALL, mvpp2_interrupts_mask, port, SMP_CALL_TYPE_SYNC);
 	mvpp2_shared_interrupt_mask_unmask(port, true);
 
 	if (port->phylink)
diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c
index 4dfdbfca6841..e8cacf59357a 100644
--- a/drivers/platform/x86/intel_ips.c
+++ b/drivers/platform/x86/intel_ips.c
@@ -457,7 +457,7 @@ static void ips_enable_cpu_turbo(struct ips_driver *ips)
 		return;
 
 	if (ips->turbo_toggle_allowed)
-		on_each_cpu(do_enable_cpu_turbo, ips, 1);
+		smp_call(SMP_CALL_ALL, do_enable_cpu_turbo, ips, SMP_CALL_TYPE_SYNC);
 
 	ips->__cpu_turbo_on = true;
 }
@@ -495,7 +495,7 @@ static void ips_disable_cpu_turbo(struct ips_driver *ips)
 		return;
 
 	if (ips->turbo_toggle_allowed)
-		on_each_cpu(do_disable_cpu_turbo, ips, 1);
+		smp_call(SMP_CALL_ALL, do_disable_cpu_turbo, ips, SMP_CALL_TYPE_SYNC);
 
 	ips->__cpu_turbo_on = false;
 }
diff --git a/drivers/soc/xilinx/xlnx_event_manager.c b/drivers/soc/xilinx/xlnx_event_manager.c
index b27f8853508e..a9dc450951b8 100644
--- a/drivers/soc/xilinx/xlnx_event_manager.c
+++ b/drivers/soc/xilinx/xlnx_event_manager.c
@@ -514,7 +514,7 @@ static void xlnx_event_cleanup_sgi(struct platform_device *pdev)
 
 	cpuhp_remove_state(CPUHP_AP_ONLINE_DYN);
 
-	on_each_cpu(xlnx_disable_percpu_irq, NULL, 1);
+	smp_call(SMP_CALL_ALL, xlnx_disable_percpu_irq, NULL, SMP_CALL_TYPE_SYNC);
 
 	irq_clear_status_flags(virq_sgi, IRQ_PER_CPU);
 	free_percpu_irq(virq_sgi, &cpu_number1);
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index bbfd004449b5..7982875e647f 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -243,7 +243,10 @@ static void showacpu(void *dummy)
 
 static void sysrq_showregs_othercpus(struct work_struct *dummy)
 {
-	smp_call_function(showacpu, NULL, 0);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, showacpu,
+			NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
 }
 
 static DECLARE_WORK(sysrq_showallcpus, sysrq_showregs_othercpus);
diff --git a/drivers/watchdog/booke_wdt.c b/drivers/watchdog/booke_wdt.c
index 5e4dc1a0f2c6..b34ba999c8c5 100644
--- a/drivers/watchdog/booke_wdt.c
+++ b/drivers/watchdog/booke_wdt.c
@@ -118,7 +118,7 @@ static void __booke_wdt_set(void *data)
 
 static void booke_wdt_set(void *data)
 {
-	on_each_cpu(__booke_wdt_set, data, 0);
+	smp_call(SMP_CALL_ALL, __booke_wdt_set, data, SMP_CALL_TYPE_ASYNC);
 }
 
 static void __booke_wdt_ping(void *data)
@@ -128,7 +128,7 @@ static void __booke_wdt_ping(void *data)
 
 static int booke_wdt_ping(struct watchdog_device *wdog)
 {
-	on_each_cpu(__booke_wdt_ping, NULL, 0);
+	smp_call(SMP_CALL_ALL, __booke_wdt_ping, NULL, SMP_CALL_TYPE_ASYNC);
 
 	return 0;
 }
@@ -170,7 +170,7 @@ static void __booke_wdt_disable(void *data)
 
 static int booke_wdt_start(struct watchdog_device *wdog)
 {
-	on_each_cpu(__booke_wdt_enable, wdog, 0);
+	smp_call(SMP_CALL_ALL, __booke_wdt_enable, wdog, SMP_CALL_TYPE_ASYNC);
 	pr_debug("watchdog enabled (timeout = %u sec)\n", wdog->timeout);
 
 	return 0;
@@ -178,7 +178,7 @@ static int booke_wdt_start(struct watchdog_device *wdog)
 
 static int booke_wdt_stop(struct watchdog_device *wdog)
 {
-	on_each_cpu(__booke_wdt_disable, NULL, 0);
+	smp_call(SMP_CALL_ALL, __booke_wdt_disable, NULL, SMP_CALL_TYPE_ASYNC);
 	pr_debug("watchdog disabled\n");
 
 	return 0;
diff --git a/fs/buffer.c b/fs/buffer.c
index 2b5561ae5d0b..e02180ab3816 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1420,7 +1420,7 @@ bool has_bh_in_lru(int cpu, void *dummy)
 
 void invalidate_bh_lrus(void)
 {
-	on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1);
+	smp_call_cond(SMP_CALL_ALL, invalidate_bh_lru, NULL, has_bh_in_lru, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL_GPL(invalidate_bh_lrus);
 
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 1fb0951ca16b..40c46d2bea88 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -160,7 +160,8 @@ do {						\
 #define	SMP_CALL_TYPE_TTWU		CSD_TYPE_TTWU
 #define	SMP_CALL_TYPE_MASK		CSD_FLAG_TYPE_MASK
 
-#define	SMP_CALL_ALL	-1
+#define	SMP_CALL_ALL		-1
+#define	SMP_CALL_MASK_ALL	((struct cpumask *)-1)
 
 extern int smp_call(int cpu, smp_call_func_t func, void *info, unsigned int flags);
 
@@ -192,9 +193,6 @@ extern unsigned int total_cpus;
 int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 			     int wait);
 
-void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
-			   void *info, bool wait, const struct cpumask *mask);
-
 #define	smp_call_function_single_async(cpu, csd) \
 	smp_call_csd(cpu, csd, SMP_CALL_TYPE_ASYNC)
 
@@ -206,48 +204,6 @@ void panic_smp_self_stop(void);
 void nmi_panic_self_stop(struct pt_regs *regs);
 void crash_smp_send_stop(void);
 
-/*
- * Call a function on all processors
- */
-static inline void on_each_cpu(smp_call_func_t func, void *info, int wait)
-{
-	on_each_cpu_cond_mask(NULL, func, info, wait, cpu_online_mask);
-}
-
-/**
- * on_each_cpu_mask(): Run a function on processors specified by
- * cpumask, which may include the local processor.
- * @mask: The set of cpus to run on (only runs on online subset).
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait (atomically) until function has completed
- *        on other CPUs.
- *
- * If @wait is true, then returns once @func has returned.
- *
- * You must not call this function with disabled interrupts or from a
- * hardware interrupt handler or from a bottom half handler.  The
- * exception is that it may be used during early boot while
- * early_boot_irqs_disabled is set.
- */
-static inline void on_each_cpu_mask(const struct cpumask *mask,
-				    smp_call_func_t func, void *info, bool wait)
-{
-	on_each_cpu_cond_mask(NULL, func, info, wait, mask);
-}
-
-/*
- * Call a function on each processor for which the supplied function
- * cond_func returns a positive value. This may include the local
- * processor.  May be used during early boot while early_boot_irqs_disabled is
- * set. Use local_irq_save/restore() instead of local_irq_disable/enable().
- */
-static inline void on_each_cpu_cond(smp_cond_func_t cond_func,
-				    smp_call_func_t func, void *info, bool wait)
-{
-	on_each_cpu_cond_mask(cond_func, func, info, wait, cpu_online_mask);
-}
-
 #ifdef CONFIG_SMP
 
 #include <linux/preempt.h>
@@ -286,13 +242,6 @@ extern int __cpu_up(unsigned int cpunum, struct task_struct *tidle);
  */
 extern void smp_cpus_done(unsigned int max_cpus);
 
-/*
- * Call a function on all other processors
- */
-void smp_call_function(smp_call_func_t func, void *info, int wait);
-void smp_call_function_many(const struct cpumask *mask,
-			    smp_call_func_t func, void *info, bool wait);
-
 void kick_all_cpus_sync(void);
 void wake_up_all_idle_cpus(void);
 
diff --git a/kernel/profile.c b/kernel/profile.c
index 37640a0bd8a3..7b7a6135f443 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -179,7 +179,7 @@ static void profile_flip_buffers(void)
 	mutex_lock(&profile_flip_mutex);
 	j = per_cpu(cpu_profile_flip, get_cpu());
 	put_cpu();
-	on_each_cpu(__profile_flip_buffers, NULL, 1);
+	smp_call(SMP_CALL_ALL, __profile_flip_buffers, NULL, SMP_CALL_TYPE_SYNC);
 	for_each_online_cpu(cpu) {
 		struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[j];
 		for (i = 0; i < NR_PROFILE_HIT; ++i) {
@@ -202,7 +202,7 @@ static void profile_discard_flip_buffers(void)
 	mutex_lock(&profile_flip_mutex);
 	i = per_cpu(cpu_profile_flip, get_cpu());
 	put_cpu();
-	on_each_cpu(__profile_flip_buffers, NULL, 1);
+	smp_call(SMP_CALL_ALL, __profile_flip_buffers, NULL, SMP_CALL_TYPE_SYNC);
 	for_each_online_cpu(cpu) {
 		struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[i];
 		memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a4b8189455d5..7bf1db5ebade 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1884,7 +1884,7 @@ static noinline_for_stack bool rcu_gp_init(void)
 
 	// If strict, make all CPUs aware of new grace period.
 	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
-		on_each_cpu(rcu_strict_gp_boundary, NULL, 0);
+		smp_call(SMP_CALL_ALL, rcu_strict_gp_boundary, NULL, SMP_CALL_TYPE_ASYNC);
 
 	return true;
 }
@@ -2109,7 +2109,7 @@ static noinline void rcu_gp_cleanup(void)
 
 	// If strict, make all CPUs aware of the end of the old grace period.
 	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
-		on_each_cpu(rcu_strict_gp_boundary, NULL, 0);
+		smp_call(SMP_CALL_ALL, rcu_strict_gp_boundary, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 /*
diff --git a/kernel/scftorture.c b/kernel/scftorture.c
index dcb0410950e4..4b6dd708f1f6 100644
--- a/kernel/scftorture.c
+++ b/kernel/scftorture.c
@@ -398,7 +398,8 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 			barrier(); // Prevent race-reduction compiler optimizations.
 			scfcp->scfc_in = true;
 		}
-		smp_call_function_many(cpu_online_mask, scf_handler, scfcp, scfsp->scfs_wait);
+		smp_call_mask_others(cpu_online_mask, scf_handler, scfcp,
+			       (scfsp->scfs_wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC));
 		break;
 	case SCF_PRIM_ALL:
 		if (scfsp->scfs_wait)
@@ -409,7 +410,10 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 			barrier(); // Prevent race-reduction compiler optimizations.
 			scfcp->scfc_in = true;
 		}
-		smp_call_function(scf_handler, scfcp, scfsp->scfs_wait);
+		preempt_disable();
+		smp_call_mask_others(SMP_CALL_MASK_ALL, scf_handler, scfcp,
+			  (scfsp->scfs_wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC));
+		preempt_enable();
 		break;
 	default:
 		WARN_ON_ONCE(1);
@@ -515,7 +519,12 @@ static void scf_torture_cleanup(void)
 			torture_stop_kthread("scftorture_invoker", scf_stats_p[i].task);
 	else
 		goto end;
-	smp_call_function(scf_cleanup_handler, NULL, 0);
+
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, scf_cleanup_handler,
+			NULL, SMP_CALL_TYPE_ASYNC);
+	preempt_enable();
+
 	torture_stop_kthread(scf_torture_stats, scf_torture_stats_task);
 	scf_torture_stats_print();  // -After- the stats thread is stopped!
 	kfree(scf_stats_p);  // -After- the last stats print has completed!
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 0c5be7ebb1dc..427abc8aee02 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -291,7 +291,7 @@ static int membarrier_global_expedited(void)
 	rcu_read_unlock();
 
 	preempt_disable();
-	smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+	smp_call_mask_others(tmpmask, ipi_mb, NULL, SMP_CALL_TYPE_SYNC);
 	preempt_enable();
 
 	free_cpumask_var(tmpmask);
@@ -400,10 +400,10 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		 */
 		if (flags != MEMBARRIER_FLAG_SYNC_CORE) {
 			preempt_disable();
-			smp_call_function_many(tmpmask, ipi_func, NULL, true);
+			smp_call_mask_others(tmpmask, ipi_func, NULL, SMP_CALL_TYPE_SYNC);
 			preempt_enable();
 		} else {
-			on_each_cpu_mask(tmpmask, ipi_func, NULL, true);
+			smp_call_mask(tmpmask, ipi_func, NULL, SMP_CALL_TYPE_SYNC);
 		}
 	}
 
@@ -471,7 +471,7 @@ static int sync_runqueues_membarrier_state(struct mm_struct *mm)
 	}
 	rcu_read_unlock();
 
-	on_each_cpu_mask(tmpmask, ipi_sync_rq_state, mm, true);
+	smp_call_mask(tmpmask, ipi_sync_rq_state, mm, SMP_CALL_TYPE_SYNC);
 
 	free_cpumask_var(tmpmask);
 	cpus_read_unlock();
diff --git a/kernel/smp.c b/kernel/smp.c
index df343b1368eb..c783f5c533e2 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -641,11 +641,10 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
-static void smp_call_function_many_cond(const struct cpumask *mask,
-					smp_call_func_t func, void *info,
-					bool local_cpu,
-					bool wait,
-					smp_cond_func_t cond_func)
+static void __smp_call_mask_cond(const struct cpumask *mask,
+		smp_call_func_t func, void *info,
+		smp_cond_func_t cond_func, bool local_cpu,
+		unsigned int type)
 {
 	int cpu, last_cpu, this_cpu = smp_processor_id();
 	struct call_function_data *cfd;
@@ -666,10 +665,10 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		lockdep_assert_irqs_enabled();
 
 	/*
-	 * When @wait we can deadlock when we interrupt between llist_add() and
-	 * arch_send_call_function_ipi*(); when !@wait we can deadlock due to
-	 * csd_lock() on because the interrupt context uses the same csd
-	 * storage.
+	 * When CSD_TYPE_SYNC we can deadlock when we interrupt between
+	 * llist_add() and arch_send_call_function_ipi*(); when CSD_TYPE_ASYNC
+	 * we can deadlock due to csd_lock() because the interrupt context
+	 * uses the same csd storage.
 	 */
 	WARN_ON_ONCE(!in_task());
 
@@ -698,7 +697,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 				continue;
 
 			csd_lock(csd);
-			if (wait)
+			if (type & CSD_TYPE_SYNC)
 				csd->node.u_flags |= CSD_TYPE_SYNC;
 			csd->func = func;
 			csd->info = info;
@@ -737,11 +736,12 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		unsigned long flags;
 
 		local_irq_save(flags);
-		func(info);
+		if (likely(func != NULL))
+			func(info);
 		local_irq_restore(flags);
 	}
 
-	if (run_remote && wait) {
+	if (run_remote && (type & CSD_TYPE_SYNC)) {
 		for_each_cpu(cpu, cfd->cpumask) {
 			call_single_data_t *csd;
 
@@ -751,48 +751,6 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	}
 }
 
-/**
- * smp_call_function_many(): Run a function on a set of CPUs except the local CPU.
- * @mask: The set of cpus to run on (only runs on online subset).
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If wait is true, the call will not return until func()
- *        has completed on other CPUs.
- *
- * You must not call this function with disabled interrupts or from a
- * hardware interrupt handler or from a bottom half handler. Preemption
- * must be disabled when calling this function.
- */
-void smp_call_function_many(const struct cpumask *mask,
-			    smp_call_func_t func, void *info, bool wait)
-{
-	smp_call_function_many_cond(mask, func, info, false, wait, NULL);
-}
-EXPORT_SYMBOL(smp_call_function_many);
-
-/**
- * smp_call_function(): Run a function on all other CPUs.
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait (atomically) until function has completed
- *        on other CPUs.
- *
- * Returns 0.
- *
- * If @wait is true, then returns once @func has returned; otherwise
- * it returns just before the target cpu calls @func.
- *
- * You must not call this function with disabled interrupts or from a
- * hardware interrupt handler or from a bottom half handler.
- */
-void smp_call_function(smp_call_func_t func, void *info, int wait)
-{
-	preempt_disable();
-	smp_call_function_many(cpu_online_mask, func, info, wait);
-	preempt_enable();
-}
-EXPORT_SYMBOL(smp_call_function);
-
 /* Setup configured maximum number of CPUs to activate */
 unsigned int setup_max_cpus = NR_CPUS;
 EXPORT_SYMBOL(setup_max_cpus);
@@ -877,38 +835,6 @@ void __init smp_init(void)
 	smp_cpus_done(setup_max_cpus);
 }
 
-/*
- * on_each_cpu_cond(): Call a function on each processor which
- * the supplied function cond_func returns true, optionally waiting
- * for all the required CPUs to finish. This may include the local
- * processor.
- * @cond_func:	A callback function that is passed a cpu id and
- *		the info parameter. The function is called
- *		with preemption disabled. The function should
- *		return a blooean value indicating whether to IPI
- *		the specified CPU.
- * @func:	The function to run on all applicable CPUs.
- *		This must be fast and non-blocking.
- * @info:	An arbitrary pointer to pass to both functions.
- * @wait:	If true, wait (atomically) until function has
- *		completed on other CPUs.
- *
- * Preemption is disabled to protect against CPUs going offline but not online.
- * CPUs going online during the call will not be seen or sent an IPI.
- *
- * You must not call this function with disabled interrupts or
- * from a hardware interrupt handler or from a bottom half handler.
- */
-void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
-			   void *info, bool wait, const struct cpumask *mask)
-{
-
-	preempt_disable();
-	smp_call_function_many_cond(mask, func, info, true, wait, cond_func);
-	preempt_enable();
-}
-EXPORT_SYMBOL(on_each_cpu_cond_mask);
-
 static void do_nothing(void *unused)
 {
 }
@@ -928,7 +854,11 @@ void kick_all_cpus_sync(void)
 {
 	/* Make sure the change is visible before we kick the cpus */
 	smp_mb();
-	smp_call_function(do_nothing, NULL, 1);
+
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, do_nothing,
+			NULL, SMP_CALL_TYPE_SYNC);
+	preempt_enable();
 }
 EXPORT_SYMBOL_GPL(kick_all_cpus_sync);
 
@@ -1008,28 +938,7 @@ int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
 }
 EXPORT_SYMBOL_GPL(smp_call_on_cpu);
 
-
-void __smp_call_mask_cond(const struct cpumask *mask,
-		smp_call_func_t func, void *info,
-		smp_cond_func_t cond_func, bool local_cpu,
-		unsigned int flags)
-{
-	bool wait = false;
-
-	if (flags == SMP_CALL_TYPE_SYNC)
-		wait = true;
-
-	preempt_disable();
-
-	/*
-	 * This is temporarily hook. The function smp_call_function_many_cond()
-	 * will be inlined here with a later patch in this series.
-	 */
-	smp_call_function_many_cond(mask, func, info, local_cpu, wait, cond_func);
-	preempt_enable();
-}
-
-/**
+/*
  * Given cpumask, find an online CPU as close to the local CPU as
  * possible in the order below:
  *
@@ -1165,6 +1074,9 @@ EXPORT_SYMBOL(smp_call_mask);
  * mask: The cross call will be sent to all the CPUs in mask except
  *       the local CPU.
  *
+ *       If mask == SMP_CALL_MASK_ALL, then the cross call is sent
+ *       to all online CPUs except the local CPU.
+ *
  * Others: see smp_call().
  *
 * Besides, it is the callers' discretion as to whether preemption
@@ -1178,7 +1090,10 @@ void smp_call_mask_others(const struct cpumask *mask, smp_call_func_t func,
 		return;
 	}
 
-	__smp_call_mask_cond(mask, func, info, NULL, false, flags);
+	if (mask == SMP_CALL_MASK_ALL)
+		__smp_call_mask_cond(cpu_online_mask, func, info, NULL, false, flags);
+	else
+		__smp_call_mask_cond(mask, func, info, NULL, false, flags);
 }
 EXPORT_SYMBOL(smp_call_mask_others);
 
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 0ea8702eb516..6726e92d9a9e 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -952,7 +952,7 @@ void clock_was_set(unsigned int bases)
 		goto out_timerfd;
 
 	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
-		on_each_cpu(retrigger_next_event, NULL, 1);
+		smp_call(SMP_CALL_ALL, retrigger_next_event, NULL, SMP_CALL_TYPE_SYNC);
 		goto out_timerfd;
 	}
 
@@ -971,8 +971,9 @@ void clock_was_set(unsigned int bases)
 	}
 
 	preempt_disable();
-	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
+	smp_call_mask_others(mask, retrigger_next_event, NULL, SMP_CALL_TYPE_SYNC);
 	preempt_enable();
+
 	cpus_read_unlock();
 	free_cpumask_var(mask);
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4f1d2f5e7263..28b7f670a300 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -242,8 +242,13 @@ static void update_ftrace_function(void)
 	function_trace_op = set_function_trace_op;
 	/* Make sure the function_trace_op is visible on all CPUs */
 	smp_wmb();
+
 	/* Nasty way to force a rmb on all cpus */
-	smp_call_function(ftrace_sync_ipi, NULL, 1);
+	preempt_disable();
+	smp_call_mask_others(SMP_CALL_MASK_ALL, ftrace_sync_ipi,
+			NULL, SMP_CALL_TYPE_SYNC);
+	preempt_enable();
+
 	/* OK, we are all set to update the ftrace_trace_function now! */
 #endif /* !CONFIG_DYNAMIC_FTRACE */
 
@@ -2755,8 +2760,13 @@ void ftrace_modify_all_code(int command)
 		function_trace_op = set_function_trace_op;
 		smp_wmb();
 		/* If irqs are disabled, we are in stop machine */
-		if (!irqs_disabled())
-			smp_call_function(ftrace_sync_ipi, NULL, 1);
+		if (!irqs_disabled()) {
+			preempt_disable();
+			smp_call_mask_others(SMP_CALL_MASK_ALL,
+				ftrace_sync_ipi, NULL, SMP_CALL_TYPE_SYNC);
+			preempt_enable();
+		}
+
 		err = ftrace_update_ftrace_func(ftrace_trace_function);
 		if (FTRACE_WARN_ON(err))
 			return;
@@ -7769,7 +7779,7 @@ pid_write(struct file *filp, const char __user *ubuf,
 	 * check for those tasks that are currently running.
 	 * Always do this in case a pid was appended or removed.
 	 */
-	on_each_cpu(ignore_task_cpu, tr, 1);
+	smp_call(SMP_CALL_ALL, ignore_task_cpu, tr, SMP_CALL_TYPE_SYNC);
 
 	ftrace_update_pid_func();
 	ftrace_startup_all(0);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 05dfc7a12d3d..699d2f40e75d 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5864,8 +5864,13 @@ static __init int rb_hammer_test(void *arg)
 {
 	while (!kthread_should_stop()) {
 
+		preempt_disable();
+
 		/* Send an IPI to all cpus to write data! */
-		smp_call_function(rb_ipi, NULL, 1);
+		smp_call_mask_others(SMP_CALL_MASK_ALL, rb_ipi,
+				NULL, SMP_CALL_TYPE_SYNC);
+		preempt_enable();
+
 		/* No sleep, but for non preempt, let others run */
 		schedule();
 	}
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f4de111fa18f..ac55868475d4 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2724,10 +2724,10 @@ void trace_buffered_event_disable(void)
 	if (--trace_buffered_event_ref)
 		return;
 
-	preempt_disable();
 	/* For each CPU, set the buffer as used. */
-	smp_call_function_many(tracing_buffer_mask,
-			       disable_trace_buffered_event, NULL, 1);
+	preempt_disable();
+	smp_call_mask_others(tracing_buffer_mask,
+		       disable_trace_buffered_event, NULL, SMP_CALL_TYPE_SYNC);
 	preempt_enable();
 
 	/* Wait for all current users to finish */
@@ -2743,10 +2743,10 @@ void trace_buffered_event_disable(void)
 	 */
 	smp_wmb();
 
-	preempt_disable();
 	/* Do the work on each cpu */
-	smp_call_function_many(tracing_buffer_mask,
-			       enable_trace_buffered_event, NULL, 1);
+	preempt_disable();
+	smp_call_mask_others(tracing_buffer_mask,
+		       enable_trace_buffered_event, NULL, SMP_CALL_TYPE_SYNC);
 	preempt_enable();
 }
 
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index e11e167b7809..30441ba91790 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1995,7 +1995,7 @@ event_pid_write(struct file *filp, const char __user *ubuf,
 	 * check for those tasks that are currently running.
 	 * Always do this in case a pid was appended or removed.
 	 */
-	on_each_cpu(ignore_task_cpu, tr, 1);
+	smp_call(SMP_CALL_ALL, ignore_task_cpu, tr, SMP_CALL_TYPE_SYNC);
 
  out:
 	mutex_unlock(&event_mutex);
diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 0a9def8ce5e8..54b500a0fc4c 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -339,7 +339,7 @@ void kasan_quarantine_remove_cache(struct kmem_cache *cache)
 	 * achieves the first goal, while synchronize_srcu() achieves the
 	 * second.
 	 */
-	on_each_cpu(per_cpu_remove_cache, cache, 1);
+	smp_call(SMP_CALL_ALL, per_cpu_remove_cache, cache, SMP_CALL_TYPE_SYNC);
 
 	raw_spin_lock_irqsave(&quarantine_lock, flags);
 	for (i = 0; i < QUARANTINE_BATCHES; i++) {
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index afb7185ffdc4..5d3dfe05777c 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -142,6 +142,7 @@ static void tlb_remove_table_smp_sync(void *arg)
 
 static void tlb_remove_table_sync_one(void)
 {
+	preempt_disable();
 	/*
 	 * This isn't an RCU grace period and hence the page-tables cannot be
 	 * assumed to be actually RCU-freed.
@@ -149,7 +150,9 @@ static void tlb_remove_table_sync_one(void)
 	 * It is however sufficient for software page-table walkers that rely on
 	 * IRQ disabling.
 	 */
-	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
+	smp_call_mask_others(SMP_CALL_MASK_ALL, tlb_remove_table_smp_sync,
+			NULL, SMP_CALL_TYPE_SYNC);
+	preempt_enable();
 }
 
 static void tlb_remove_table_rcu(struct rcu_head *head)
diff --git a/mm/slab.c b/mm/slab.c
index 0edb474edef1..13c96fad5ff4 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2171,7 +2171,7 @@ static void drain_cpu_caches(struct kmem_cache *cachep)
 	int node;
 	LIST_HEAD(list);
 
-	on_each_cpu(do_drain, cachep, 1);
+	smp_call(SMP_CALL_ALL, do_drain, cachep, SMP_CALL_TYPE_SYNC);
 	check_irq_on();
 	for_each_kmem_cache_node(cachep, node, n)
 		if (n->alien)
diff --git a/net/iucv/iucv.c b/net/iucv/iucv.c
index eb0295d90039..03f72efdda84 100644
--- a/net/iucv/iucv.c
+++ b/net/iucv/iucv.c
@@ -574,7 +574,7 @@ static int iucv_enable(void)
 static void iucv_disable(void)
 {
 	cpus_read_lock();
-	on_each_cpu(iucv_retrieve_cpu, NULL, 1);
+	smp_call(SMP_CALL_ALL, iucv_retrieve_cpu, NULL, SMP_CALL_TYPE_SYNC);
 	kfree(iucv_path_table);
 	iucv_path_table = NULL;
 	cpus_read_unlock();
@@ -687,6 +687,7 @@ static void iucv_cleanup_queue(void)
 {
 	struct iucv_irq_list *p, *n;
 
+	preempt_disable();
 	/*
 	 * When a path is severed, the pathid can be reused immediately
 	 * on a iucv connect or a connection pending interrupt. Remove
@@ -696,7 +697,10 @@ static void iucv_cleanup_queue(void)
 	 * pending interrupts force them to the work queue by calling
 	 * an empty function on all cpus.
 	 */
-	smp_call_function(__iucv_cleanup_queue, NULL, 1);
+	smp_call_mask_others(SMP_CALL_MASK_ALL, __iucv_cleanup_queue,
+			NULL, SMP_CALL_TYPE_SYNC);
+	preempt_enable();
+
 	spin_lock_irq(&iucv_queue_lock);
 	list_for_each_entry_safe(p, n, &iucv_task_queue, list) {
 		/* Remove stale work items from the task queue. */
@@ -787,7 +791,7 @@ static int iucv_reboot_event(struct notifier_block *this,
 		return NOTIFY_DONE;
 
 	cpus_read_lock();
-	on_each_cpu_mask(&iucv_irq_cpumask, iucv_block_cpu, NULL, 1);
+	smp_call_mask(&iucv_irq_cpumask, iucv_block_cpu, NULL, SMP_CALL_TYPE_SYNC);
 	preempt_disable();
 	for (i = 0; i < iucv_max_pathid; i++) {
 		if (iucv_path_table[i])
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6d971fb1b08d..ffa181d2d465 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -248,7 +248,7 @@ static inline bool kvm_kick_many_cpus(struct cpumask *cpus, bool wait)
 	if (cpumask_empty(cpus))
 		return false;
 
-	smp_call_function_many(cpus, ack_flush, NULL, wait);
+	smp_call_mask_others(cpus, ack_flush, NULL, (wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC));
 	return true;
 }
 
@@ -4923,7 +4923,7 @@ static void hardware_disable_all_nolock(void)
 
 	kvm_usage_count--;
 	if (!kvm_usage_count)
-		on_each_cpu(hardware_disable_nolock, NULL, 1);
+		smp_call(SMP_CALL_ALL, hardware_disable_nolock, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void hardware_disable_all(void)
@@ -4942,7 +4942,7 @@ static int hardware_enable_all(void)
 	kvm_usage_count++;
 	if (kvm_usage_count == 1) {
 		atomic_set(&hardware_enable_failed, 0);
-		on_each_cpu(hardware_enable_nolock, NULL, 1);
+		smp_call(SMP_CALL_ALL, hardware_enable_nolock, NULL, SMP_CALL_TYPE_SYNC);
 
 		if (atomic_read(&hardware_enable_failed)) {
 			hardware_disable_all_nolock();
@@ -4966,7 +4966,7 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
 	 */
 	pr_info("kvm: exiting hardware virtualization\n");
 	kvm_rebooting = true;
-	on_each_cpu(hardware_disable_nolock, NULL, 1);
+	smp_call(SMP_CALL_ALL, hardware_disable_nolock, NULL, SMP_CALL_TYPE_SYNC);
 	return NOTIFY_OK;
 }
 
@@ -5818,7 +5818,7 @@ void kvm_exit(void)
 	unregister_syscore_ops(&kvm_syscore_ops);
 	unregister_reboot_notifier(&kvm_reboot_notifier);
 	cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
-	on_each_cpu(hardware_disable_nolock, NULL, 1);
+	smp_call(SMP_CALL_ALL, hardware_disable_nolock, NULL, SMP_CALL_TYPE_SYNC);
 	kvm_arch_hardware_unsetup();
 	kvm_arch_exit();
 	kvm_irqfd_exit();
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 09/11] smp: replace smp_call_function_single_async with smp_call_csd
  2022-05-19 20:49 [PATCH v4 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (7 preceding siblings ...)
  2022-05-19 20:49 ` [PATCH v4 08/11] smp: replace smp_call_function_many_cond with __smp_call_mask_cond Donghai Qiao
@ 2022-05-19 20:49 ` Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 10/11] smp: replace smp_call_function_single() with smp_call Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 11/11] smp: up.c to adopt the same format of cross CPU call Donghai Qiao
  10 siblings, 0 replies; 12+ messages in thread
From: Donghai Qiao @ 2022-05-19 20:49 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Replace smp_call_function_single_async with smp_call_csd
and modify all the invocations.
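
As a sketch of the pattern applied at every call site below (the csd,
handler and argument names here are illustrative only), an asynchronous
call on a preallocated csd

	INIT_CSD(&csd, my_func, my_info);
	smp_call_function_single_async(cpu, &csd);

becomes

	INIT_CSD(&csd, my_func, my_info);
	smp_call_csd(cpu, &csd, SMP_CALL_TYPE_ASYNC);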

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: Removed 'x' from the function names and changed XCALL to SMP_CALL
          in the new macros
v2 -> v3: Modified all the invocations of smp_call_function_single_async()
          to smp_call_csd()

 arch/mips/kernel/process.c                      | 2 +-
 arch/mips/kernel/smp.c                          | 2 +-
 arch/s390/pci/pci_irq.c                         | 2 +-
 arch/x86/kernel/cpuid.c                         | 2 +-
 arch/x86/lib/msr-smp.c                          | 2 +-
 block/blk-mq.c                                  | 2 +-
 drivers/clocksource/ingenic-timer.c             | 2 +-
 drivers/cpuidle/coupled.c                       | 2 +-
 drivers/net/ethernet/cavium/liquidio/lio_core.c | 2 +-
 include/linux/smp.h                             | 3 ---
 kernel/debug/debug_core.c                       | 2 +-
 kernel/sched/core.c                             | 2 +-
 kernel/sched/fair.c                             | 2 +-
 net/core/dev.c                                  | 2 +-
 14 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index c2d5f4bfe1f3..8a8012af3219 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -745,7 +745,7 @@ static void raise_backtrace(cpumask_t *mask)
 		}
 
 		csd = &per_cpu(backtrace_csd, cpu);
-		smp_call_function_single_async(cpu, csd);
+		smp_call_csd(cpu, csd, SMP_CALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 27f94fde499d..ec070378e47f 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -704,7 +704,7 @@ void tick_broadcast(const struct cpumask *mask)
 
 	for_each_cpu(cpu, mask) {
 		csd = &per_cpu(tick_broadcast_csd, cpu);
-		smp_call_function_single_async(cpu, csd);
+		smp_call_csd(cpu, csd, SMP_CALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 136af9f32f23..5cfd6c98cfe3 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -212,7 +212,7 @@ static void zpci_handle_fallback_irq(void)
 			continue;
 
 		INIT_CSD(&cpu_data->csd, zpci_handle_remote_irq, &cpu_data->scheduled);
-		smp_call_function_single_async(cpu, &cpu_data->csd);
+		smp_call_csd(cpu, &cpu_data->csd, SMP_CALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/arch/x86/kernel/cpuid.c b/arch/x86/kernel/cpuid.c
index 6f7b8cc1bc9f..760beece93e3 100644
--- a/arch/x86/kernel/cpuid.c
+++ b/arch/x86/kernel/cpuid.c
@@ -81,7 +81,7 @@ static ssize_t cpuid_read(struct file *file, char __user *buf,
 		cmd.regs.eax = pos;
 		cmd.regs.ecx = pos >> 32;
 
-		err = smp_call_function_single_async(cpu, &csd);
+		err = smp_call_csd(cpu, &csd, SMP_CALL_TYPE_ASYNC);
 		if (err)
 			break;
 		wait_for_completion(&cmd.done);
diff --git a/arch/x86/lib/msr-smp.c b/arch/x86/lib/msr-smp.c
index cbe2874835f3..cc33097651fc 100644
--- a/arch/x86/lib/msr-smp.c
+++ b/arch/x86/lib/msr-smp.c
@@ -178,7 +178,7 @@ int rdmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h)
 	init_completion(&rv.done);
 	rv.msr.msr_no = msr_no;
 
-	err = smp_call_function_single_async(cpu, &csd);
+	err = smp_call_csd(cpu, &csd, SMP_CALL_TYPE_ASYNC);
 	if (!err) {
 		wait_for_completion(&rv.done);
 		err = rv.msr.err;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 84d749511f55..5c14cdffc70d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1063,7 +1063,7 @@ static void blk_mq_complete_send_ipi(struct request *rq)
 	list = &per_cpu(blk_cpu_done, cpu);
 	if (llist_add(&rq->ipi_list, list)) {
 		INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq);
-		smp_call_function_single_async(cpu, &rq->csd);
+		smp_call_csd(cpu, &rq->csd, SMP_CALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/drivers/clocksource/ingenic-timer.c b/drivers/clocksource/ingenic-timer.c
index 24ed0f1f089b..722694666063 100644
--- a/drivers/clocksource/ingenic-timer.c
+++ b/drivers/clocksource/ingenic-timer.c
@@ -121,7 +121,7 @@ static irqreturn_t ingenic_tcu_cevt_cb(int irq, void *dev_id)
 		csd = &per_cpu(ingenic_cevt_csd, timer->cpu);
 		csd->info = (void *) &timer->cevt;
 		csd->func = ingenic_per_cpu_event_handler;
-		smp_call_function_single_async(timer->cpu, csd);
+		smp_call_csd(timer->cpu, csd, SMP_CALL_TYPE_ASYNC);
 	}
 
 	return IRQ_HANDLED;
diff --git a/drivers/cpuidle/coupled.c b/drivers/cpuidle/coupled.c
index 74068742cef3..26519f6e828a 100644
--- a/drivers/cpuidle/coupled.c
+++ b/drivers/cpuidle/coupled.c
@@ -334,7 +334,7 @@ static void cpuidle_coupled_poke(int cpu)
 	call_single_data_t *csd = &per_cpu(cpuidle_coupled_poke_cb, cpu);
 
 	if (!cpumask_test_and_set_cpu(cpu, &cpuidle_coupled_poke_pending))
-		smp_call_function_single_async(cpu, csd);
+		smp_call_csd(cpu, csd, SMP_CALL_TYPE_ASYNC);
 }
 
 /**
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c b/drivers/net/ethernet/cavium/liquidio/lio_core.c
index 73cb03266549..489777fd3e56 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -729,7 +729,7 @@ static void liquidio_napi_drv_callback(void *arg)
 		napi_schedule_irqoff(&droq->napi);
 	} else {
 		INIT_CSD(&droq->csd, napi_schedule_wrapper, &droq->napi);
-		smp_call_function_single_async(droq->cpu_id, &droq->csd);
+		smp_call_csd(droq->cpu_id, &droq->csd, SMP_CALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 40c46d2bea88..1086f8d5e18c 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -193,9 +193,6 @@ extern unsigned int total_cpus;
 int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 			     int wait);
 
-#define	smp_call_function_single_async(cpu, csd) \
-	smp_call_csd(cpu, csd, SMP_CALL_TYPE_ASYNC)
-
 /*
  * Cpus stopping functions in panic. All have default weak definitions.
  * Architecture-dependent code may override them.
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index da06a5553835..113229096a70 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -264,7 +264,7 @@ void __weak kgdb_roundup_cpus(void)
 			continue;
 		kgdb_info[cpu].rounding_up = true;
 
-		ret = smp_call_function_single_async(cpu, csd);
+		ret = smp_call_csd(cpu, csd, SMP_CALL_TYPE_ASYNC);
 		if (ret)
 			kgdb_info[cpu].rounding_up = false;
 	}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6d1d30bd6220..09d54fb3eb4c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -836,7 +836,7 @@ void hrtick_start(struct rq *rq, u64 delay)
 	if (rq == this_rq())
 		__hrtick_restart(rq);
 	else
-		smp_call_function_single_async(cpu_of(rq), &rq->hrtick_csd);
+		smp_call_csd(cpu_of(rq), &rq->hrtick_csd, SMP_CALL_TYPE_ASYNC);
 }
 
 #else
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a68482d66535..1bb83d3ae9ea 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10461,7 +10461,7 @@ static void kick_ilb(unsigned int flags)
 	 * is idle. And the softirq performing nohz idle load balance
 	 * will be run before returning from the IPI.
 	 */
-	smp_call_function_single_async(ilb_cpu, &cpu_rq(ilb_cpu)->nohz_csd);
+	smp_call_csd(ilb_cpu, &cpu_rq(ilb_cpu)->nohz_csd, SMP_CALL_TYPE_ASYNC);
 }
 
 /*
diff --git a/net/core/dev.c b/net/core/dev.c
index 1461c2d9dec8..e8319f789ed1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5788,7 +5788,7 @@ static void net_rps_send_ipi(struct softnet_data *remsd)
 		struct softnet_data *next = remsd->rps_ipi_next;
 
 		if (cpu_online(remsd->cpu))
-			smp_call_function_single_async(remsd->cpu, &remsd->csd);
+			smp_call_csd(remsd->cpu, &remsd->csd, SMP_CALL_TYPE_ASYNC);
 		remsd = next;
 	}
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 10/11] smp: replace smp_call_function_single() with smp_call
  2022-05-19 20:49 [PATCH v4 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (8 preceding siblings ...)
  2022-05-19 20:49 ` [PATCH v4 09/11] smp: replace smp_call_function_single_async with smp_call_csd Donghai Qiao
@ 2022-05-19 20:49 ` Donghai Qiao
  2022-05-19 20:49 ` [PATCH v4 11/11] smp: up.c to adopt the same format of cross CPU call Donghai Qiao
  10 siblings, 0 replies; 12+ messages in thread
From: Donghai Qiao @ 2022-05-19 20:49 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Replace smp_call_function_single() with smp_call() and
change all of its invocations.
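
As a sketch of the mapping (cpu, my_func and my_info are placeholder
names), the wait argument turns into an explicit call type:

	/* before */
	smp_call_function_single(cpu, my_func, my_info, 1);
	smp_call_function_single(cpu, my_func, my_info, 0);

	/* after */
	smp_call(cpu, my_func, my_info, SMP_CALL_TYPE_SYNC);
	smp_call(cpu, my_func, my_info, SMP_CALL_TYPE_ASYNC);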

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: Removed 'x' from the function names and changed XCALL to
          SMP_CALL in the new macros

 arch/alpha/kernel/rtc.c                       |  4 +--
 arch/arm/mach-bcm/bcm_kona_smc.c              |  2 +-
 arch/arm/mach-mvebu/pmsu.c                    |  4 +--
 arch/arm64/kernel/topology.c                  |  2 +-
 arch/csky/kernel/cpu-probe.c                  |  2 +-
 arch/ia64/kernel/palinfo.c                    |  3 +-
 arch/ia64/kernel/smpboot.c                    |  2 +-
 arch/mips/kernel/smp-bmips.c                  |  3 +-
 arch/mips/kernel/smp-cps.c                    |  8 ++---
 arch/powerpc/kernel/sysfs.c                   | 26 +++++++--------
 arch/powerpc/kernel/watchdog.c                |  4 +--
 arch/powerpc/kvm/book3s_hv.c                  |  8 ++---
 arch/powerpc/platforms/85xx/smp.c             |  6 ++--
 arch/sparc/kernel/nmi.c                       |  4 +--
 arch/x86/kernel/apic/vector.c                 |  2 +-
 arch/x86/kernel/cpu/aperfmperf.c              |  5 +--
 arch/x86/kernel/cpu/mce/amd.c                 |  4 +--
 arch/x86/kernel/cpu/mce/inject.c              |  8 ++---
 arch/x86/kernel/cpu/microcode/core.c          |  4 +--
 arch/x86/kernel/cpu/mtrr/mtrr.c               |  2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        |  2 +-
 arch/x86/kernel/kvm.c                         |  4 +--
 arch/x86/kvm/vmx/vmx.c                        |  3 +-
 arch/x86/kvm/x86.c                            |  7 ++--
 arch/x86/lib/cache-smp.c                      |  2 +-
 arch/x86/lib/msr-smp.c                        | 16 ++++-----
 arch/x86/xen/mmu_pv.c                         |  2 +-
 arch/xtensa/kernel/smp.c                      |  7 ++--
 drivers/acpi/processor_idle.c                 |  4 +--
 drivers/cpufreq/powernow-k8.c                 |  9 +++--
 drivers/cpufreq/powernv-cpufreq.c             |  2 +-
 drivers/cpufreq/sparc-us2e-cpufreq.c          |  4 +--
 drivers/cpufreq/sparc-us3-cpufreq.c           |  4 +--
 drivers/cpufreq/speedstep-ich.c               |  7 ++--
 drivers/cpufreq/tegra194-cpufreq.c            |  6 ++--
 .../hwtracing/coresight/coresight-cpu-debug.c |  3 +-
 .../coresight/coresight-etm3x-core.c          | 11 +++----
 .../coresight/coresight-etm4x-core.c          | 12 +++----
 .../coresight/coresight-etm4x-sysfs.c         |  2 +-
 drivers/hwtracing/coresight/coresight-trbe.c  |  6 ++--
 drivers/net/ethernet/marvell/mvneta.c         |  4 +--
 .../intel/speed_select_if/isst_if_mbox_msr.c  |  4 +--
 drivers/powercap/intel_rapl_common.c          |  2 +-
 drivers/powercap/intel_rapl_msr.c             |  2 +-
 drivers/regulator/qcom_spmi-regulator.c       |  3 +-
 drivers/soc/fsl/qbman/qman.c                  |  4 +--
 drivers/soc/fsl/qbman/qman_test_stash.c       |  9 ++---
 include/linux/smp.h                           |  1 +
 kernel/cpu.c                                  |  4 +--
 kernel/events/core.c                          | 10 +++---
 kernel/rcu/rcutorture.c                       |  3 +-
 kernel/rcu/tasks.h                            |  4 +--
 kernel/rcu/tree.c                             |  2 +-
 kernel/rcu/tree_exp.h                         |  4 +--
 kernel/relay.c                                |  5 ++-
 kernel/scftorture.c                           |  5 +--
 kernel/sched/membarrier.c                     |  2 +-
 kernel/smp.c                                  | 33 ++-----------------
 kernel/time/clockevents.c                     |  2 +-
 kernel/time/clocksource.c                     |  2 +-
 kernel/time/tick-common.c                     |  2 +-
 net/bpf/test_run.c                            |  4 +--
 net/iucv/iucv.c                               | 11 +++----
 virt/kvm/kvm_main.c                           |  2 +-
 64 files changed, 149 insertions(+), 191 deletions(-)

diff --git a/arch/alpha/kernel/rtc.c b/arch/alpha/kernel/rtc.c
index fb3025396ac9..2853887cc0d5 100644
--- a/arch/alpha/kernel/rtc.c
+++ b/arch/alpha/kernel/rtc.c
@@ -168,7 +168,7 @@ remote_read_time(struct device *dev, struct rtc_time *tm)
 	union remote_data x;
 	if (smp_processor_id() != boot_cpuid) {
 		x.tm = tm;
-		smp_call_function_single(boot_cpuid, do_remote_read, &x, 1);
+		smp_call(boot_cpuid, do_remote_read, &x, SMP_CALL_TYPE_SYNC);
 		return x.retval;
 	}
 	return alpha_rtc_read_time(NULL, tm);
@@ -187,7 +187,7 @@ remote_set_time(struct device *dev, struct rtc_time *tm)
 	union remote_data x;
 	if (smp_processor_id() != boot_cpuid) {
 		x.tm = tm;
-		smp_call_function_single(boot_cpuid, do_remote_set, &x, 1);
+		smp_call(boot_cpuid, do_remote_set, &x, SMP_CALL_TYPE_SYNC);
 		return x.retval;
 	}
 	return alpha_rtc_set_time(NULL, tm);
diff --git a/arch/arm/mach-bcm/bcm_kona_smc.c b/arch/arm/mach-bcm/bcm_kona_smc.c
index 43829e49ad93..75453079f79e 100644
--- a/arch/arm/mach-bcm/bcm_kona_smc.c
+++ b/arch/arm/mach-bcm/bcm_kona_smc.c
@@ -172,7 +172,7 @@ unsigned bcm_kona_smc(unsigned service_id, unsigned arg0, unsigned arg1,
 	 * Due to a limitation of the secure monitor, we must use the SMP
 	 * infrastructure to forward all secure monitor calls to Core 0.
 	 */
-	smp_call_function_single(0, __bcm_kona_smc, &data, 1);
+	smp_call(0, __bcm_kona_smc, &data, SMP_CALL_TYPE_SYNC);
 
 	return data.result;
 }
diff --git a/arch/arm/mach-mvebu/pmsu.c b/arch/arm/mach-mvebu/pmsu.c
index 73d5d72dfc3e..91ef1ba4100e 100644
--- a/arch/arm/mach-mvebu/pmsu.c
+++ b/arch/arm/mach-mvebu/pmsu.c
@@ -585,8 +585,8 @@ int mvebu_pmsu_dfs_request(int cpu)
 	writel(reg, pmsu_mp_base + PMSU_EVENT_STATUS_AND_MASK(hwcpu));
 
 	/* Trigger the DFS on the appropriate CPU */
-	smp_call_function_single(cpu, mvebu_pmsu_dfs_request_local,
-				 NULL, false);
+	smp_call(cpu, mvebu_pmsu_dfs_request_local,
+				 NULL, SMP_CALL_TYPE_ASYNC);
 
 	/* Poll until the DFS done event is generated */
 	timeout = jiffies + HZ;
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 9ab78ad826e2..c7a6d037feb4 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -331,7 +331,7 @@ int counters_read_on_cpu(int cpu, smp_call_func_t func, u64 *val)
 	if (WARN_ON_ONCE(irqs_disabled()))
 		return -EPERM;
 
-	smp_call_function_single(cpu, func, val, 1);
+	smp_call(cpu, func, val, SMP_CALL_TYPE_SYNC);
 
 	return 0;
 }
diff --git a/arch/csky/kernel/cpu-probe.c b/arch/csky/kernel/cpu-probe.c
index 5f15ca31d3e8..2af63175fd08 100644
--- a/arch/csky/kernel/cpu-probe.c
+++ b/arch/csky/kernel/cpu-probe.c
@@ -48,7 +48,7 @@ static int c_show(struct seq_file *m, void *v)
 	int cpu;
 
 	for_each_online_cpu(cpu)
-		smp_call_function_single(cpu, percpu_print, m, true);
+		smp_call(cpu, percpu_print, m, SMP_CALL_TYPE_SYNC);
 
 #ifdef CSKY_ARCH_VERSION
 	seq_printf(m, "arch-version : %s\n", CSKY_ARCH_VERSION);
diff --git a/arch/ia64/kernel/palinfo.c b/arch/ia64/kernel/palinfo.c
index 64189f04c1a4..9c1b113e155a 100644
--- a/arch/ia64/kernel/palinfo.c
+++ b/arch/ia64/kernel/palinfo.c
@@ -844,7 +844,8 @@ int palinfo_handle_smp(struct seq_file *m, pal_func_cpu_u_t *f)
 
 
 	/* will send IPI to other CPU and wait for completion of remote call */
-	if ((ret=smp_call_function_single(f->req_cpu, palinfo_smp_call, &ptr, 1))) {
+	ret = smp_call(f->req_cpu, palinfo_smp_call, &ptr, SMP_CALL_TYPE_SYNC);
+	if (ret) {
 		printk(KERN_ERR "palinfo: remote CPU call from %d to %d on function %d: "
 		       "error %d\n", smp_processor_id(), f->req_cpu, f->func_id, ret);
 		return 0;
diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c
index d10f780c13b9..e71829688bd2 100644
--- a/arch/ia64/kernel/smpboot.c
+++ b/arch/ia64/kernel/smpboot.c
@@ -295,7 +295,7 @@ ia64_sync_itc (unsigned int master)
 
 	go[MASTER] = 1;
 
-	if (smp_call_function_single(master, sync_master, NULL, 0) < 0) {
+	if (smp_call(master, sync_master, NULL, SMP_CALL_TYPE_ASYNC) < 0) {
 		printk(KERN_ERR "sync_itc: failed to get attention of CPU %u!\n", master);
 		return;
 	}
diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c
index f5d7bfa3472a..bd2df61e4f7f 100644
--- a/arch/mips/kernel/smp-bmips.c
+++ b/arch/mips/kernel/smp-bmips.c
@@ -488,8 +488,7 @@ static void bmips_set_reset_vec_remote(void *vinfo)
 
 	preempt_disable();
 	if (smp_processor_id() > 0) {
-		smp_call_function_single(0, &bmips_set_reset_vec_remote,
-					 info, 1);
+		smp_call(0, &bmips_set_reset_vec_remote, info, SMP_CALL_TYPE_SYNC);
 	} else {
 		if (info->cpu & 0x02) {
 			/* BMIPS5200 "should" use mask/shift, but it's buggy */
diff --git a/arch/mips/kernel/smp-cps.c b/arch/mips/kernel/smp-cps.c
index bcd6a944b839..3073b1e05b61 100644
--- a/arch/mips/kernel/smp-cps.c
+++ b/arch/mips/kernel/smp-cps.c
@@ -341,8 +341,7 @@ static int cps_boot_secondary(int cpu, struct task_struct *idle)
 			goto out;
 		}
 
-		err = smp_call_function_single(remote, remote_vpe_boot,
-					       NULL, 1);
+		err = smp_call(remote, remote_vpe_boot, NULL, SMP_CALL_TYPE_SYNC);
 		if (err)
 			panic("Failed to call remote CPU\n");
 		goto out;
@@ -587,9 +586,8 @@ static void cps_cpu_die(unsigned int cpu)
 		 * Have a CPU with access to the offlined CPUs registers wait
 		 * for its TC to halt.
 		 */
-		err = smp_call_function_single(cpu_death_sibling,
-					       wait_for_sibling_halt,
-					       (void *)(unsigned long)cpu, 1);
+		err = smp_call(cpu_death_sibling, wait_for_sibling_halt,
+			       (void *)(unsigned long)cpu, SMP_CALL_TYPE_SYNC);
 		if (err)
 			panic("Failed to call remote sibling CPU\n");
 	} else if (cpu_has_vp) {
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 898b48ca7b5a..b285ca10152b 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -98,7 +98,7 @@ static ssize_t show_##NAME(struct device *dev, \
 { \
 	struct cpu *cpu = container_of(dev, struct cpu, dev); \
 	unsigned long val; \
-	smp_call_function_single(cpu->dev.id, read_##NAME, &val, 1);	\
+	smp_call(cpu->dev.id, read_##NAME, &val, SMP_CALL_TYPE_SYNC);	\
 	return sprintf(buf, "%lx\n", val); \
 } \
 static ssize_t __used \
@@ -110,7 +110,7 @@ static ssize_t __used \
 	int ret = sscanf(buf, "%lx", &val); \
 	if (ret != 1) \
 		return -EINVAL; \
-	smp_call_function_single(cpu->dev.id, write_##NAME, &val, 1); \
+	smp_call(cpu->dev.id, write_##NAME, &val, SMP_CALL_TYPE_SYNC); \
 	return count; \
 }
 
@@ -262,7 +262,7 @@ static ssize_t show_pw20_state(struct device *dev,
 	u32 value;
 	unsigned int cpu = dev->id;
 
-	smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+	smp_call(cpu, do_show_pwrmgtcr0, &value, SMP_CALL_TYPE_SYNC);
 
 	value &= PWRMGTCR0_PW20_WAIT;
 
@@ -297,7 +297,7 @@ static ssize_t store_pw20_state(struct device *dev,
 	if (value > 1)
 		return -EINVAL;
 
-	smp_call_function_single(cpu, do_store_pw20_state, &value, 1);
+	smp_call(cpu, do_store_pw20_state, &value, SMP_CALL_TYPE_SYNC);
 
 	return count;
 }
@@ -312,7 +312,7 @@ static ssize_t show_pw20_wait_time(struct device *dev,
 	unsigned int cpu = dev->id;
 
 	if (!pw20_wt) {
-		smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+		smp_call(cpu, do_show_pwrmgtcr0, &value, SMP_CALL_TYPE_SYNC);
 		value = (value & PWRMGTCR0_PW20_ENT) >>
 					PWRMGTCR0_PW20_ENT_SHIFT;
 
@@ -372,8 +372,7 @@ static ssize_t store_pw20_wait_time(struct device *dev,
 
 	pw20_wt = value;
 
-	smp_call_function_single(cpu, set_pw20_wait_entry_bit,
-				&entry_bit, 1);
+	smp_call(cpu, set_pw20_wait_entry_bit, &entry_bit, SMP_CALL_TYPE_SYNC);
 
 	return count;
 }
@@ -384,7 +383,7 @@ static ssize_t show_altivec_idle(struct device *dev,
 	u32 value;
 	unsigned int cpu = dev->id;
 
-	smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+	smp_call(cpu, do_show_pwrmgtcr0, &value, SMP_CALL_TYPE_SYNC);
 
 	value &= PWRMGTCR0_AV_IDLE_PD_EN;
 
@@ -419,7 +418,7 @@ static ssize_t store_altivec_idle(struct device *dev,
 	if (value > 1)
 		return -EINVAL;
 
-	smp_call_function_single(cpu, do_store_altivec_idle, &value, 1);
+	smp_call(cpu, do_store_altivec_idle, &value, SMP_CALL_TYPE_SYNC);
 
 	return count;
 }
@@ -434,7 +433,7 @@ static ssize_t show_altivec_idle_wait_time(struct device *dev,
 	unsigned int cpu = dev->id;
 
 	if (!altivec_idle_wt) {
-		smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+		smp_call(cpu, do_show_pwrmgtcr0, &value, SMP_CALL_TYPE_SYNC);
 		value = (value & PWRMGTCR0_AV_IDLE_CNT) >>
 					PWRMGTCR0_AV_IDLE_CNT_SHIFT;
 
@@ -494,8 +493,7 @@ static ssize_t store_altivec_idle_wait_time(struct device *dev,
 
 	altivec_idle_wt = value;
 
-	smp_call_function_single(cpu, set_altivec_idle_wait_entry_bit,
-				&entry_bit, 1);
+	smp_call(cpu, set_altivec_idle_wait_entry_bit, &entry_bit, SMP_CALL_TYPE_SYNC);
 
 	return count;
 }
@@ -768,7 +766,7 @@ static ssize_t idle_purr_show(struct device *dev,
 	struct cpu *cpu = container_of(dev, struct cpu, dev);
 	u64 val;
 
-	smp_call_function_single(cpu->dev.id, read_idle_purr, &val, 1);
+	smp_call(cpu->dev.id, read_idle_purr, &val, SMP_CALL_TYPE_SYNC);
 	return sprintf(buf, "%llx\n", val);
 }
 static DEVICE_ATTR(idle_purr, 0400, idle_purr_show, NULL);
@@ -798,7 +796,7 @@ static ssize_t idle_spurr_show(struct device *dev,
 	struct cpu *cpu = container_of(dev, struct cpu, dev);
 	u64 val;
 
-	smp_call_function_single(cpu->dev.id, read_idle_spurr, &val, 1);
+	smp_call(cpu->dev.id, read_idle_spurr, &val, SMP_CALL_TYPE_SYNC);
 	return sprintf(buf, "%llx\n", val);
 }
 static DEVICE_ATTR(idle_spurr, 0400, idle_spurr_show, NULL);
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index bfc27496fe7e..1212a73a1ee3 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -499,7 +499,7 @@ static void start_watchdog(void *arg)
 
 static int start_watchdog_on_cpu(unsigned int cpu)
 {
-	return smp_call_function_single(cpu, start_watchdog, NULL, true);
+	return smp_call(cpu, start_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void stop_watchdog(void *arg)
@@ -522,7 +522,7 @@ static void stop_watchdog(void *arg)
 
 static int stop_watchdog_on_cpu(unsigned int cpu)
 {
-	return smp_call_function_single(cpu, stop_watchdog, NULL, true);
+	return smp_call(cpu, stop_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void watchdog_calc_timeouts(void)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6fa518f6501d..3a9df7302421 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1392,7 +1392,7 @@ static unsigned long kvmppc_read_dpdes(struct kvm_vcpu *vcpu)
 		 */
 		pcpu = READ_ONCE(v->cpu);
 		if (pcpu >= 0)
-			smp_call_function_single(pcpu, do_nothing, NULL, 1);
+			smp_call(pcpu, do_nothing, NULL, SMP_CALL_TYPE_SYNC);
 		if (kvmppc_doorbell_pending(v))
 			dpdes |= 1 << thr;
 	}
@@ -3082,7 +3082,7 @@ static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu)
 		struct kvm *running = *per_cpu_ptr(&cpu_in_guest, i);
 
 		if (running == kvm)
-			smp_call_function_single(i, do_nothing, NULL, 1);
+			smp_call(i, do_nothing, NULL, SMP_CALL_TYPE_SYNC);
 	}
 }
 
@@ -3136,8 +3136,8 @@ static void kvmppc_prepare_radix_vcpu(struct kvm_vcpu *vcpu, int pcpu)
 			    cpu_first_tlb_thread_sibling(pcpu))
 				radix_flush_cpu(kvm, prev_cpu, vcpu);
 
-			smp_call_function_single(prev_cpu,
-					do_migrate_away_vcpu, vcpu, 1);
+			smp_call(prev_cpu, do_migrate_away_vcpu,
+				  vcpu, SMP_CALL_TYPE_SYNC);
 		}
 		if (nested)
 			nested->prev_cpu[vcpu->arch.nested_vcpu_id] = pcpu;
diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index e36c67d7dffb..c91a6f3cc7c7 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -300,12 +300,10 @@ static int smp_85xx_kick_cpu(int nr)
 		 * the other.
 		 */
 		if (cpu_online(primary)) {
-			smp_call_function_single(primary,
-					wake_hw_thread, &nr, 1);
+			smp_call(primary, wake_hw_thread, &nr, SMP_CALL_TYPE_SYNC);
 			goto done;
 		} else if (cpu_online(primary + 1)) {
-			smp_call_function_single(primary + 1,
-					wake_hw_thread, &nr, 1);
+			smp_call(primary + 1, wake_hw_thread, &nr, SMP_CALL_TYPE_SYNC);
 			goto done;
 		}
 
diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index b18a45dc0028..24fb10b9926e 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -300,7 +300,7 @@ int watchdog_nmi_enable(unsigned int cpu)
 	if (!nmi_init_done)
 		return 0;
 
-	smp_call_function_single(cpu, start_nmi_watchdog, NULL, 1);
+	smp_call(cpu, start_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 
 	return 0;
 }
@@ -313,5 +313,5 @@ void watchdog_nmi_disable(unsigned int cpu)
 	if (atomic_read(&nmi_active) == -1)
 		pr_warn_once("NMI watchdog cannot be enabled or disabled\n");
 	else
-		smp_call_function_single(cpu, stop_nmi_watchdog, NULL, 1);
+		smp_call(cpu, stop_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 }
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 3e6f6b448f6a..4f9f973b3ef3 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -1266,7 +1266,7 @@ static void __init print_local_APICs(int maxcpu)
 	for_each_online_cpu(cpu) {
 		if (cpu >= maxcpu)
 			break;
-		smp_call_function_single(cpu, print_local_APIC, NULL, 1);
+		smp_call(cpu, print_local_APIC, NULL, SMP_CALL_TYPE_SYNC);
 	}
 	preempt_enable();
 }
diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
index 9ca008f9e9b1..5688727068a6 100644
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -77,7 +77,8 @@ static bool aperfmperf_snapshot_cpu(int cpu, ktime_t now, bool wait)
 		return true;
 
 	if (!atomic_xchg(&s->scfpending, 1) || wait)
-		smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, wait);
+		smp_call(cpu, aperfmperf_snapshot_khz, NULL,
+				(wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC));
 
 	/* Return false if the previous iteration was too long ago. */
 	return time_delta <= APERFMPERF_STALE_THRESHOLD_MS;
@@ -145,7 +146,7 @@ unsigned int arch_freq_get_on_cpu(int cpu)
 	msleep(APERFMPERF_REFRESH_DELAY_MS);
 	atomic_set(&s->scfpending, 1);
 	smp_mb(); /* ->scfpending before smp_call_function_single(). */
-	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
+	smp_call(cpu, aperfmperf_snapshot_khz, NULL, SMP_CALL_TYPE_SYNC);
 
 	return per_cpu(samples.khz, cpu);
 }
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 1940d305db1c..101ac8ef99fd 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -929,7 +929,7 @@ store_interrupt_enable(struct threshold_block *b, const char *buf, size_t size)
 	memset(&tr, 0, sizeof(tr));
 	tr.b		= b;
 
-	if (smp_call_function_single(b->cpu, threshold_restart_bank, &tr, 1))
+	if (smp_call(b->cpu, threshold_restart_bank, &tr, SMP_CALL_TYPE_SYNC))
 		return -ENODEV;
 
 	return size;
@@ -954,7 +954,7 @@ store_threshold_limit(struct threshold_block *b, const char *buf, size_t size)
 	b->threshold_limit = new;
 	tr.b = b;
 
-	if (smp_call_function_single(b->cpu, threshold_restart_bank, &tr, 1))
+	if (smp_call(b->cpu, threshold_restart_bank, &tr, SMP_CALL_TYPE_SYNC))
 		return -ENODEV;
 
 	return size;
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index e7d5ad9d62b6..60da18cb8778 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -552,19 +552,19 @@ static void do_inject(void)
 
 	i_mce.mcgstatus = mcg_status;
 	i_mce.inject_flags = inj_type;
-	smp_call_function_single(cpu, prepare_msrs, &i_mce, 0);
+	smp_call(cpu, prepare_msrs, &i_mce, SMP_CALL_TYPE_ASYNC);
 
 	toggle_hw_mce_inject(cpu, false);
 
 	switch (inj_type) {
 	case DFR_INT_INJ:
-		smp_call_function_single(cpu, trigger_dfr_int, NULL, 0);
+		smp_call(cpu, trigger_dfr_int, NULL, SMP_CALL_TYPE_ASYNC);
 		break;
 	case THR_INT_INJ:
-		smp_call_function_single(cpu, trigger_thr_int, NULL, 0);
+		smp_call(cpu, trigger_thr_int, NULL, SMP_CALL_TYPE_ASYNC);
 		break;
 	default:
-		smp_call_function_single(cpu, trigger_mce, NULL, 0);
+		smp_call(cpu, trigger_mce, NULL, SMP_CALL_TYPE_ASYNC);
 	}
 
 err:
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 239ff5fcec6a..ccae0fb0d2d7 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -332,7 +332,7 @@ static int collect_cpu_info_on_target(int cpu, struct cpu_signature *cpu_sig)
 	struct cpu_info_ctx ctx = { .cpu_sig = cpu_sig, .err = 0 };
 	int ret;
 
-	ret = smp_call_function_single(cpu, collect_cpu_info_local, &ctx, 1);
+	ret = smp_call(cpu, collect_cpu_info_local, &ctx, SMP_CALL_TYPE_SYNC);
 	if (!ret)
 		ret = ctx.err;
 
@@ -365,7 +365,7 @@ static int apply_microcode_on_target(int cpu)
 	enum ucode_state err;
 	int ret;
 
-	ret = smp_call_function_single(cpu, apply_microcode_local, &err, 1);
+	ret = smp_call(cpu, apply_microcode_local, &err, SMP_CALL_TYPE_SYNC);
 	if (!ret) {
 		if (err == UCODE_ERROR)
 			ret = 1;
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 2746cac9d8a9..1ef6068a9970 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -820,7 +820,7 @@ void mtrr_save_state(void)
 		return;
 
 	first_cpu = cpumask_first(cpu_online_mask);
-	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
+	smp_call(first_cpu, mtrr_save_fixed_ranges, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 void set_mtrr_aps_delayed_init(void)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index c6f02d1f0673..22be4468c233 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -541,7 +541,7 @@ static void _update_task_closid_rmid(void *task)
 static void update_task_closid_rmid(struct task_struct *t)
 {
 	if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
-		smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
+		smp_call(task_cpu(t), _update_task_closid_rmid, t, SMP_CALL_TYPE_SYNC);
 	else
 		_update_task_closid_rmid(t);
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 9f1623917322..a2820709b23c 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -1131,7 +1131,7 @@ void arch_haltpoll_enable(unsigned int cpu)
 	}
 
 	/* Enable guest halt poll disables host halt poll */
-	smp_call_function_single(cpu, kvm_disable_host_haltpoll, NULL, 1);
+	smp_call(cpu, kvm_disable_host_haltpoll, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL_GPL(arch_haltpoll_enable);
 
@@ -1141,7 +1141,7 @@ void arch_haltpoll_disable(unsigned int cpu)
 		return;
 
 	/* Disable guest halt poll enables host halt poll */
-	smp_call_function_single(cpu, kvm_enable_host_haltpoll, NULL, 1);
+	smp_call(cpu, kvm_enable_host_haltpoll, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL_GPL(arch_haltpoll_disable);
 #endif
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 610355b9ccce..6d66a5e524d0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -656,8 +656,7 @@ void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
 	int cpu = loaded_vmcs->cpu;
 
 	if (cpu != -1)
-		smp_call_function_single(cpu,
-			 __loaded_vmcs_clear, loaded_vmcs, 1);
+		smp_call(cpu, __loaded_vmcs_clear, loaded_vmcs, SMP_CALL_TYPE_SYNC);
 }
 
 static bool vmx_segment_cache_test_set(struct vcpu_vmx *vmx, unsigned seg,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 691cbf38304c..8015b688b32a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4551,8 +4551,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		if (static_call(kvm_x86_has_wbinvd_exit)())
 			cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
 		else if (vcpu->cpu != -1 && vcpu->cpu != cpu)
-			smp_call_function_single(vcpu->cpu,
-					wbinvd_ipi, NULL, 1);
+			smp_call(vcpu->cpu, wbinvd_ipi, NULL, SMP_CALL_TYPE_SYNC);
 	}
 
 	static_call(kvm_x86_vcpu_load)(vcpu, cpu);
@@ -8729,7 +8728,7 @@ static void __kvmclock_cpufreq_notifier(struct cpufreq_freqs *freq, int cpu)
 	 *
 	 */
 
-	smp_call_function_single(cpu, tsc_khz_changed, freq, 1);
+	smp_call(cpu, tsc_khz_changed, freq, SMP_CALL_TYPE_SYNC);
 
 	mutex_lock(&kvm_lock);
 	list_for_each_entry(kvm, &vm_list, vm_list) {
@@ -8756,7 +8755,7 @@ static void __kvmclock_cpufreq_notifier(struct cpufreq_freqs *freq, int cpu)
 		 * guest context is entered kvmclock will be updated,
 		 * so the guest will not see stale values.
 		 */
-		smp_call_function_single(cpu, tsc_khz_changed, freq, 1);
+		smp_call(cpu, tsc_khz_changed, freq, SMP_CALL_TYPE_SYNC);
 	}
 }
 
diff --git a/arch/x86/lib/cache-smp.c b/arch/x86/lib/cache-smp.c
index 20d00dc2bb8c..bbfa2ef4c1ea 100644
--- a/arch/x86/lib/cache-smp.c
+++ b/arch/x86/lib/cache-smp.c
@@ -9,7 +9,7 @@ static void __wbinvd(void *dummy)
 
 void wbinvd_on_cpu(int cpu)
 {
-	smp_call_function_single(cpu, __wbinvd, NULL, 1);
+	smp_call(cpu, __wbinvd, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(wbinvd_on_cpu);
 
diff --git a/arch/x86/lib/msr-smp.c b/arch/x86/lib/msr-smp.c
index cc33097651fc..eb1c197883b8 100644
--- a/arch/x86/lib/msr-smp.c
+++ b/arch/x86/lib/msr-smp.c
@@ -41,7 +41,7 @@ int rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h)
 	memset(&rv, 0, sizeof(rv));
 
 	rv.msr_no = msr_no;
-	err = smp_call_function_single(cpu, __rdmsr_on_cpu, &rv, 1);
+	err = smp_call(cpu, __rdmsr_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 	*l = rv.reg.l;
 	*h = rv.reg.h;
 
@@ -57,7 +57,7 @@ int rdmsrl_on_cpu(unsigned int cpu, u32 msr_no, u64 *q)
 	memset(&rv, 0, sizeof(rv));
 
 	rv.msr_no = msr_no;
-	err = smp_call_function_single(cpu, __rdmsr_on_cpu, &rv, 1);
+	err = smp_call(cpu, __rdmsr_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 	*q = rv.reg.q;
 
 	return err;
@@ -74,7 +74,7 @@ int wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h)
 	rv.msr_no = msr_no;
 	rv.reg.l = l;
 	rv.reg.h = h;
-	err = smp_call_function_single(cpu, __wrmsr_on_cpu, &rv, 1);
+	err = smp_call(cpu, __wrmsr_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 
 	return err;
 }
@@ -90,7 +90,7 @@ int wrmsrl_on_cpu(unsigned int cpu, u32 msr_no, u64 q)
 	rv.msr_no = msr_no;
 	rv.reg.q = q;
 
-	err = smp_call_function_single(cpu, __wrmsr_on_cpu, &rv, 1);
+	err = smp_call(cpu, __wrmsr_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 
 	return err;
 }
@@ -200,7 +200,7 @@ int wrmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h)
 	rv.msr_no = msr_no;
 	rv.reg.l = l;
 	rv.reg.h = h;
-	err = smp_call_function_single(cpu, __wrmsr_safe_on_cpu, &rv, 1);
+	err = smp_call(cpu, __wrmsr_safe_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 
 	return err ? err : rv.err;
 }
@@ -216,7 +216,7 @@ int wrmsrl_safe_on_cpu(unsigned int cpu, u32 msr_no, u64 q)
 	rv.msr_no = msr_no;
 	rv.reg.q = q;
 
-	err = smp_call_function_single(cpu, __wrmsr_safe_on_cpu, &rv, 1);
+	err = smp_call(cpu, __wrmsr_safe_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 
 	return err ? err : rv.err;
 }
@@ -259,7 +259,7 @@ int rdmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8])
 
 	rv.regs   = regs;
 	rv.err    = -EIO;
-	err = smp_call_function_single(cpu, __rdmsr_safe_regs_on_cpu, &rv, 1);
+	err = smp_call(cpu, __rdmsr_safe_regs_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 
 	return err ? err : rv.err;
 }
@@ -272,7 +272,7 @@ int wrmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8])
 
 	rv.regs = regs;
 	rv.err  = -EIO;
-	err = smp_call_function_single(cpu, __wrmsr_safe_regs_on_cpu, &rv, 1);
+	err = smp_call(cpu, __wrmsr_safe_regs_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 
 	return err ? err : rv.err;
 }
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index ec37a4d2c0f9..d83c5a3cce33 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -930,7 +930,7 @@ static void xen_drop_mm_ref(struct mm_struct *mm)
 		for_each_online_cpu(cpu) {
 			if (per_cpu(xen_current_cr3, cpu) != __pa(mm->pgd))
 				continue;
-			smp_call_function_single(cpu, drop_mm_ref_this_cpu, mm, 1);
+			smp_call(cpu, drop_mm_ref_this_cpu, mm, SMP_CALL_TYPE_SYNC);
 		}
 		return;
 	}
diff --git a/arch/xtensa/kernel/smp.c b/arch/xtensa/kernel/smp.c
index 5878ba8f0274..2d30ae400d5b 100644
--- a/arch/xtensa/kernel/smp.c
+++ b/arch/xtensa/kernel/smp.c
@@ -201,7 +201,7 @@ static int boot_secondary(unsigned int cpu, struct task_struct *ts)
 	system_flush_invalidate_dcache_range((unsigned long)&cpu_start_id,
 					     sizeof(cpu_start_id));
 #endif
-	smp_call_function_single(0, mx_cpu_start, (void *)cpu, 1);
+	smp_call(0, mx_cpu_start, (void *)cpu, SMP_CALL_TYPE_SYNC);
 
 	for (i = 0; i < 2; ++i) {
 		do
@@ -220,8 +220,7 @@ static int boot_secondary(unsigned int cpu, struct task_struct *ts)
 		} while (ccount && time_before(jiffies, timeout));
 
 		if (ccount) {
-			smp_call_function_single(0, mx_cpu_stop,
-						 (void *)cpu, 1);
+			smp_call(0, mx_cpu_stop, (void *)cpu, SMP_CALL_TYPE_SYNC);
 			WRITE_ONCE(cpu_start_ccount, 0);
 			return -EIO;
 		}
@@ -292,7 +291,7 @@ int __cpu_disable(void)
 
 static void platform_cpu_kill(unsigned int cpu)
 {
-	smp_call_function_single(0, mx_cpu_stop, (void *)cpu, true);
+	smp_call(0, mx_cpu_stop, (void *)cpu, SMP_CALL_TYPE_SYNC);
 }
 
 /*
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index eb95e188d62b..a0002ff8a0a5 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -155,8 +155,8 @@ static void __lapic_timer_propagate_broadcast(void *arg)
 
 static void lapic_timer_propagate_broadcast(struct acpi_processor *pr)
 {
-	smp_call_function_single(pr->id, __lapic_timer_propagate_broadcast,
-				 (void *)pr, 1);
+	smp_call(pr->id, __lapic_timer_propagate_broadcast,
+			 (void *)pr, SMP_CALL_TYPE_SYNC);
 }
 
 /* Power(C) State timer broadcast control */
diff --git a/drivers/cpufreq/powernow-k8.c b/drivers/cpufreq/powernow-k8.c
index d289036beff2..f5a83043c321 100644
--- a/drivers/cpufreq/powernow-k8.c
+++ b/drivers/cpufreq/powernow-k8.c
@@ -1025,7 +1025,7 @@ static int powernowk8_cpu_init(struct cpufreq_policy *pol)
 	struct init_on_cpu init_on_cpu;
 	int rc, cpu;
 
-	smp_call_function_single(pol->cpu, check_supported_cpu, &rc, 1);
+	smp_call(pol->cpu, check_supported_cpu, &rc, SMP_CALL_TYPE_SYNC);
 	if (rc)
 		return -ENODEV;
 
@@ -1062,8 +1062,7 @@ static int powernowk8_cpu_init(struct cpufreq_policy *pol)
 
 	/* only run on specific CPU from here on */
 	init_on_cpu.data = data;
-	smp_call_function_single(data->cpu, powernowk8_cpu_init_on_cpu,
-				 &init_on_cpu, 1);
+	smp_call(data->cpu, powernowk8_cpu_init_on_cpu, &init_on_cpu, SMP_CALL_TYPE_SYNC);
 	rc = init_on_cpu.rc;
 	if (rc != 0)
 		goto err_out_exit_acpi;
@@ -1124,7 +1123,7 @@ static unsigned int powernowk8_get(unsigned int cpu)
 	if (!data)
 		return 0;
 
-	smp_call_function_single(cpu, query_values_on_cpu, &err, true);
+	smp_call(cpu, query_values_on_cpu, &err, SMP_CALL_TYPE_SYNC);
 	if (err)
 		goto out;
 
@@ -1182,7 +1181,7 @@ static int powernowk8_init(void)
 
 	cpus_read_lock();
 	for_each_online_cpu(i) {
-		smp_call_function_single(i, check_supported_cpu, &ret, 1);
+		smp_call(i, check_supported_cpu, &ret, SMP_CALL_TYPE_SYNC);
 		if (!ret)
 			supported_cpus++;
 	}
diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 078ab1c46f73..fa2f6f868970 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -898,7 +898,7 @@ static int powernv_cpufreq_cpu_exit(struct cpufreq_policy *policy)
 
 	freq_data.pstate_id = idx_to_pstate(powernv_pstate_info.min);
 	freq_data.gpstate_id = idx_to_pstate(powernv_pstate_info.min);
-	smp_call_function_single(policy->cpu, set_pstate, &freq_data, 1);
+	smp_call(policy->cpu, set_pstate, &freq_data, SMP_CALL_TYPE_SYNC);
 	if (gpstates)
 		del_timer_sync(&gpstates->timer);
 
diff --git a/drivers/cpufreq/sparc-us2e-cpufreq.c b/drivers/cpufreq/sparc-us2e-cpufreq.c
index 92acbb25abb3..8c447bf67dc5 100644
--- a/drivers/cpufreq/sparc-us2e-cpufreq.c
+++ b/drivers/cpufreq/sparc-us2e-cpufreq.c
@@ -236,7 +236,7 @@ static unsigned int us2e_freq_get(unsigned int cpu)
 	unsigned long clock_tick, estar;
 
 	clock_tick = sparc64_get_clock_tick(cpu) / 1000;
-	if (smp_call_function_single(cpu, __us2e_freq_get, &estar, 1))
+	if (smp_call(cpu, __us2e_freq_get, &estar, SMP_CALL_TYPE_SYNC))
 		return 0;
 
 	return clock_tick / estar_to_divisor(estar);
@@ -268,7 +268,7 @@ static int us2e_freq_target(struct cpufreq_policy *policy, unsigned int index)
 {
 	unsigned int cpu = policy->cpu;
 
-	return smp_call_function_single(cpu, __us2e_freq_target, &index, 1);
+	return smp_call(cpu, __us2e_freq_target, &index, SMP_CALL_TYPE_SYNC);
 }
 
 static int __init us2e_freq_cpu_init(struct cpufreq_policy *policy)
diff --git a/drivers/cpufreq/sparc-us3-cpufreq.c b/drivers/cpufreq/sparc-us3-cpufreq.c
index e41b35b16afd..0dd7cd16bf93 100644
--- a/drivers/cpufreq/sparc-us3-cpufreq.c
+++ b/drivers/cpufreq/sparc-us3-cpufreq.c
@@ -87,7 +87,7 @@ static unsigned int us3_freq_get(unsigned int cpu)
 {
 	unsigned long reg;
 
-	if (smp_call_function_single(cpu, read_safari_cfg, &reg, 1))
+	if (smp_call(cpu, read_safari_cfg, &reg, SMP_CALL_TYPE_SYNC))
 		return 0;
 	return get_current_freq(cpu, reg);
 }
@@ -116,7 +116,7 @@ static int us3_freq_target(struct cpufreq_policy *policy, unsigned int index)
 		BUG();
 	}
 
-	return smp_call_function_single(cpu, update_safari_cfg, &new_bits, 1);
+	return smp_call(cpu, update_safari_cfg, &new_bits, SMP_CALL_TYPE_SYNC);
 }
 
 static int __init us3_freq_cpu_init(struct cpufreq_policy *policy)
diff --git a/drivers/cpufreq/speedstep-ich.c b/drivers/cpufreq/speedstep-ich.c
index f2076d72bf39..be062ac5804c 100644
--- a/drivers/cpufreq/speedstep-ich.c
+++ b/drivers/cpufreq/speedstep-ich.c
@@ -243,7 +243,7 @@ static unsigned int speedstep_get(unsigned int cpu)
 	unsigned int speed;
 
 	/* You're supposed to ensure CPU is online. */
-	BUG_ON(smp_call_function_single(cpu, get_freq_data, &speed, 1));
+	BUG_ON(smp_call(cpu, get_freq_data, &speed, SMP_CALL_TYPE_SYNC));
 
 	pr_debug("detected %u kHz as current frequency\n", speed);
 	return speed;
@@ -262,8 +262,7 @@ static int speedstep_target(struct cpufreq_policy *policy, unsigned int index)
 
 	policy_cpu = cpumask_any_and(policy->cpus, cpu_online_mask);
 
-	smp_call_function_single(policy_cpu, _speedstep_set_state, &index,
-				 true);
+	smp_call(policy_cpu, _speedstep_set_state, &index, SMP_CALL_TYPE_SYNC);
 
 	return 0;
 }
@@ -299,7 +298,7 @@ static int speedstep_cpu_init(struct cpufreq_policy *policy)
 
 	/* detect low and high frequency and transition latency */
 	gf.policy = policy;
-	smp_call_function_single(policy_cpu, get_freqs_on_cpu, &gf, 1);
+	smp_call(policy_cpu, get_freqs_on_cpu, &gf, SMP_CALL_TYPE_SYNC);
 	if (gf.ret)
 		return gf.ret;
 
diff --git a/drivers/cpufreq/tegra194-cpufreq.c b/drivers/cpufreq/tegra194-cpufreq.c
index 99fc456f16b6..09a76382f383 100644
--- a/drivers/cpufreq/tegra194-cpufreq.c
+++ b/drivers/cpufreq/tegra194-cpufreq.c
@@ -203,13 +203,13 @@ static unsigned int tegra194_get_speed(u32 cpu)
 	int ret;
 	u32 cl;
 
-	smp_call_function_single(cpu, get_cpu_cluster, &cl, true);
+	smp_call(cpu, get_cpu_cluster, &cl, SMP_CALL_TYPE_SYNC);
 
 	/* reconstruct actual cpu freq using counters */
 	rate = tegra194_calculate_speed(cpu);
 
 	/* get last written ndiv value */
-	ret = smp_call_function_single(cpu, get_cpu_ndiv, &ndiv, true);
+	ret = smp_call(cpu, get_cpu_ndiv, &ndiv, SMP_CALL_TYPE_SYNC);
 	if (WARN_ON_ONCE(ret))
 		return rate;
 
@@ -240,7 +240,7 @@ static int tegra194_cpufreq_init(struct cpufreq_policy *policy)
 	u32 cpu;
 	u32 cl;
 
-	smp_call_function_single(policy->cpu, get_cpu_cluster, &cl, true);
+	smp_call(policy->cpu, get_cpu_cluster, &cl, SMP_CALL_TYPE_SYNC);
 
 	if (cl >= data->num_clusters || !data->tables[cl])
 		return -EINVAL;
diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c b/drivers/hwtracing/coresight/coresight-cpu-debug.c
index 8845ec4b4402..f051a658f1cf 100644
--- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
+++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
@@ -590,8 +590,7 @@ static int debug_probe(struct amba_device *adev, const struct amba_id *id)
 
 	cpus_read_lock();
 	per_cpu(debug_drvdata, drvdata->cpu) = drvdata;
-	ret = smp_call_function_single(drvdata->cpu, debug_init_arch_data,
-				       drvdata, 1);
+	ret = smp_call(drvdata->cpu, debug_init_arch_data, drvdata, SMP_CALL_TYPE_SYNC);
 	cpus_read_unlock();
 
 	if (ret) {
diff --git a/drivers/hwtracing/coresight/coresight-etm3x-core.c b/drivers/hwtracing/coresight/coresight-etm3x-core.c
index 7d413ba8b823..7f11115163e2 100644
--- a/drivers/hwtracing/coresight/coresight-etm3x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm3x-core.c
@@ -518,8 +518,8 @@ static int etm_enable_sysfs(struct coresight_device *csdev)
 	 */
 	if (cpu_online(drvdata->cpu)) {
 		arg.drvdata = drvdata;
-		ret = smp_call_function_single(drvdata->cpu,
-					       etm_enable_hw_smp_call, &arg, 1);
+		ret = smp_call(drvdata->cpu, etm_enable_hw_smp_call,
+				&arg, SMP_CALL_TYPE_SYNC);
 		if (!ret)
 			ret = arg.rc;
 		if (!ret)
@@ -630,7 +630,7 @@ static void etm_disable_sysfs(struct coresight_device *csdev)
 	 * Executing etm_disable_hw on the cpu whose ETM is being disabled
 	 * ensures that register writes occur when cpu is powered.
 	 */
-	smp_call_function_single(drvdata->cpu, etm_disable_hw, drvdata, 1);
+	smp_call(drvdata->cpu, etm_disable_hw, drvdata, SMP_CALL_TYPE_SYNC);
 
 	spin_unlock(&drvdata->spinlock);
 	cpus_read_unlock();
@@ -864,8 +864,7 @@ static int etm_probe(struct amba_device *adev, const struct amba_id *id)
 	if (!desc.name)
 		return -ENOMEM;
 
-	if (smp_call_function_single(drvdata->cpu,
-				     etm_init_arch_data,  drvdata, 1))
+	if (smp_call(drvdata->cpu, etm_init_arch_data,  drvdata, SMP_CALL_TYPE_SYNC))
 		dev_err(dev, "ETM arch init failed\n");
 
 	if (etm_arch_supported(drvdata->arch) == false)
@@ -933,7 +932,7 @@ static void etm_remove(struct amba_device *adev)
 	 * CPU i ensures these call backs has consistent view
 	 * inside one call back function.
 	 */
-	if (smp_call_function_single(drvdata->cpu, clear_etmdrvdata, &drvdata->cpu, 1))
+	if (smp_call(drvdata->cpu, clear_etmdrvdata, &drvdata->cpu, SMP_CALL_TYPE_SYNC))
 		etmdrvdata[drvdata->cpu] = NULL;
 
 	cpus_read_unlock();
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index 7f416a12000e..4f136e477808 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -746,8 +746,8 @@ static int etm4_enable_sysfs(struct coresight_device *csdev)
 	 * ensures that register writes occur when cpu is powered.
 	 */
 	arg.drvdata = drvdata;
-	ret = smp_call_function_single(drvdata->cpu,
-				       etm4_enable_hw_smp_call, &arg, 1);
+	ret = smp_call(drvdata->cpu, etm4_enable_hw_smp_call,
+			&arg, SMP_CALL_TYPE_SYNC);
 	if (!ret)
 		ret = arg.rc;
 	if (!ret)
@@ -903,7 +903,7 @@ static void etm4_disable_sysfs(struct coresight_device *csdev)
 	 * Executing etm4_disable_hw on the cpu whose ETM is being disabled
 	 * ensures that register writes occur when cpu is powered.
 	 */
-	smp_call_function_single(drvdata->cpu, etm4_disable_hw, drvdata, 1);
+	smp_call(drvdata->cpu, etm4_disable_hw, drvdata, SMP_CALL_TYPE_SYNC);
 
 	spin_unlock(&drvdata->spinlock);
 	cpus_read_unlock();
@@ -1977,8 +1977,8 @@ static int etm4_probe(struct device *dev, void __iomem *base, u32 etm_pid)
 	init_arg.csa = &desc.access;
 	init_arg.pid = etm_pid;
 
-	if (smp_call_function_single(drvdata->cpu,
-				etm4_init_arch_data,  &init_arg, 1))
+	if (smp_call(drvdata->cpu, etm4_init_arch_data,
+				&init_arg, SMP_CALL_TYPE_SYNC))
 		dev_err(dev, "ETM arch init failed\n");
 
 	if (!drvdata->arch)
@@ -2118,7 +2118,7 @@ static int __exit etm4_remove_dev(struct etmv4_drvdata *drvdata)
 	 * CPU i ensures these call backs has consistent view
 	 * inside one call back function.
 	 */
-	if (smp_call_function_single(drvdata->cpu, clear_etmdrvdata, &drvdata->cpu, 1))
+	if (smp_call(drvdata->cpu, clear_etmdrvdata, &drvdata->cpu, SMP_CALL_TYPE_SYNC))
 		etmdrvdata[drvdata->cpu] = NULL;
 
 	cpus_read_unlock();
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
index 21687cc1e4e2..377f7e40b081 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
@@ -2379,7 +2379,7 @@ static u32 etmv4_cross_read(const struct etmv4_drvdata *drvdata, u32 offset)
 	 * smp cross call ensures the CPU will be powered up before
 	 * accessing the ETMv4 trace core registers
 	 */
-	smp_call_function_single(drvdata->cpu, do_smp_cross_read, &reg, 1);
+	smp_call(drvdata->cpu, do_smp_cross_read, &reg, SMP_CALL_TYPE_SYNC);
 	return reg.data;
 }
 
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 2b386bb848f8..c0e3cff19cdb 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -1350,12 +1350,12 @@ static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata)
 
 	for_each_cpu(cpu, &drvdata->supported_cpus) {
 		/* If we fail to probe the CPU, let us defer it to hotplug callbacks */
-		if (smp_call_function_single(cpu, arm_trbe_probe_cpu, drvdata, 1))
+		if (smp_call(cpu, arm_trbe_probe_cpu, drvdata, SMP_CALL_TYPE_SYNC))
 			continue;
 		if (cpumask_test_cpu(cpu, &drvdata->supported_cpus))
 			arm_trbe_register_coresight_cpu(drvdata, cpu);
 		if (cpumask_test_cpu(cpu, &drvdata->supported_cpus))
-			smp_call_function_single(cpu, arm_trbe_enable_cpu, drvdata, 1);
+			smp_call(cpu, arm_trbe_enable_cpu, drvdata, SMP_CALL_TYPE_SYNC);
 	}
 	return 0;
 }
@@ -1365,7 +1365,7 @@ static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata)
 	int cpu;
 
 	for_each_cpu(cpu, &drvdata->supported_cpus)
-		smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
+		smp_call(cpu, arm_trbe_remove_coresight_cpu, drvdata, SMP_CALL_TYPE_SYNC);
 	free_percpu(drvdata->cpudata);
 	return 0;
 }
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 77042499baa4..165c59778a33 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -4304,8 +4304,8 @@ static void mvneta_percpu_elect(struct mvneta_port *pp)
 		/* Update the interrupt mask on each CPU according the
 		 * new mapping
 		 */
-		smp_call_function_single(cpu, mvneta_percpu_unmask_interrupt,
-					 pp, true);
+		smp_call(cpu, mvneta_percpu_unmask_interrupt,
+			  pp, SMP_CALL_TYPE_SYNC);
 		i++;
 
 	}
diff --git a/drivers/platform/x86/intel/speed_select_if/isst_if_mbox_msr.c b/drivers/platform/x86/intel/speed_select_if/isst_if_mbox_msr.c
index 1b6eab071068..2ae76aa42d50 100644
--- a/drivers/platform/x86/intel/speed_select_if/isst_if_mbox_msr.c
+++ b/drivers/platform/x86/intel/speed_select_if/isst_if_mbox_msr.c
@@ -124,8 +124,8 @@ static long isst_if_mbox_proc_cmd(u8 *cmd_ptr, int *write_only, int resume)
 	 * and also with wait flag, wait for completion.
 	 * smp_call_function_single is using get_cpu() and put_cpu().
 	 */
-	ret = smp_call_function_single(action.mbox_cmd->logical_cpu,
-				       msrl_update_func, &action, 1);
+	ret = smp_call(action.mbox_cmd->logical_cpu,
+			msrl_update_func, &action, SMP_CALL_TYPE_SYNC);
 	if (ret)
 		return ret;
 
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
index 07611a00b78f..518ef59e7467 100644
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -913,7 +913,7 @@ static void package_power_limit_irq_save(struct rapl_package *rp)
 	if (!boot_cpu_has(X86_FEATURE_PTS) || !boot_cpu_has(X86_FEATURE_PLN))
 		return;
 
-	smp_call_function_single(rp->lead_cpu, power_limit_irq_save_cpu, rp, 1);
+	smp_call(rp->lead_cpu, power_limit_irq_save_cpu, rp, SMP_CALL_TYPE_SYNC);
 }
 
 /*
diff --git a/drivers/powercap/intel_rapl_msr.c b/drivers/powercap/intel_rapl_msr.c
index 1be45f36ab6c..0430bafbffed 100644
--- a/drivers/powercap/intel_rapl_msr.c
+++ b/drivers/powercap/intel_rapl_msr.c
@@ -128,7 +128,7 @@ static int rapl_msr_write_raw(int cpu, struct reg_action *ra)
 {
 	int ret;
 
-	ret = smp_call_function_single(cpu, rapl_msr_update_func, ra, 1);
+	ret = smp_call(cpu, rapl_msr_update_func, ra, SMP_CALL_TYPE_SYNC);
 	if (WARN_ON_ONCE(ret))
 		return ret;
 
diff --git a/drivers/regulator/qcom_spmi-regulator.c b/drivers/regulator/qcom_spmi-regulator.c
index 02bfce981150..222fc60bb1f3 100644
--- a/drivers/regulator/qcom_spmi-regulator.c
+++ b/drivers/regulator/qcom_spmi-regulator.c
@@ -1292,8 +1292,7 @@ spmi_regulator_saw_set_voltage(struct regulator_dev *rdev, unsigned selector)
 	}
 
 	/* Always do the SAW register writes on the first CPU */
-	return smp_call_function_single(0, spmi_saw_set_vdd, \
-					&voltage_sel, true);
+	return smp_call(0, spmi_saw_set_vdd, &voltage_sel, SMP_CALL_TYPE_SYNC);
 }
 
 static struct regulator_ops spmi_saw_ops = {};
diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index fde4edd83c14..4fb3cab1bf5d 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -2548,8 +2548,8 @@ void qman_delete_cgr_safe(struct qman_cgr *cgr)
 {
 	preempt_disable();
 	if (qman_cgr_cpus[cgr->cgrid] != smp_processor_id()) {
-		smp_call_function_single(qman_cgr_cpus[cgr->cgrid],
-					 qman_delete_cgr_smp_call, cgr, true);
+		smp_call(qman_cgr_cpus[cgr->cgrid],
+			  qman_delete_cgr_smp_call, cgr, SMP_CALL_TYPE_SYNC);
 		preempt_enable();
 		return;
 	}
diff --git a/drivers/soc/fsl/qbman/qman_test_stash.c b/drivers/soc/fsl/qbman/qman_test_stash.c
index b7e8e5ec884c..e4110c3177e5 100644
--- a/drivers/soc/fsl/qbman/qman_test_stash.c
+++ b/drivers/soc/fsl/qbman/qman_test_stash.c
@@ -507,8 +507,9 @@ static int init_phase3(void)
 				if (err)
 					return err;
 			} else {
-				smp_call_function_single(hp_cpu->processor_id,
-					init_handler_cb, hp_cpu->iterator, 1);
+				smp_call(hp_cpu->processor_id,
+					init_handler_cb, hp_cpu->iterator,
+					SMP_CALL_TYPE_SYNC);
 			}
 			preempt_enable();
 		}
@@ -607,8 +608,8 @@ int qman_test_stash(void)
 		if (err)
 			goto failed;
 	} else {
-		smp_call_function_single(special_handler->processor_id,
-					 send_first_frame_cb, NULL, 1);
+		smp_call(special_handler->processor_id,
+				 send_first_frame_cb, NULL, SMP_CALL_TYPE_SYNC);
 	}
 	preempt_enable();
 
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 1086f8d5e18c..96066dcf6eef 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -160,6 +160,7 @@ do {						\
 #define	SMP_CALL_TYPE_TTWU		CSD_TYPE_TTWU
 #define	SMP_CALL_TYPE_MASK		CSD_FLAG_TYPE_MASK
 
+/* Cross call on all online CPUs */
 #define	SMP_CALL_ALL		-1
 #define	SMP_CALL_MASK_ALL	((struct cpumask *)-1)
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index d0a9aa0b42e8..8217b989c3cf 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1091,8 +1091,8 @@ void cpuhp_report_idle_dead(void)
 	 * We cannot call complete after rcu_report_dead() so we delegate it
 	 * to an online cpu.
 	 */
-	smp_call_function_single(cpumask_first(cpu_online_mask),
-				 cpuhp_complete_idle_dead, st, 0);
+	smp_call(cpumask_first(cpu_online_mask),
+			 cpuhp_complete_idle_dead, st, SMP_CALL_TYPE_ASYNC);
 }
 
 static int cpuhp_down_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7858bafffa9d..180eeb04e5a5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -116,8 +116,8 @@ task_function_call(struct task_struct *p, remote_function_f func, void *info)
 	int ret;
 
 	for (;;) {
-		ret = smp_call_function_single(task_cpu(p), remote_function,
-					       &data, 1);
+		ret = smp_call(task_cpu(p), remote_function,
+				&data, SMP_CALL_TYPE_SYNC);
 		if (!ret)
 			ret = data.ret;
 
@@ -149,7 +149,7 @@ static int cpu_function_call(int cpu, remote_function_f func, void *info)
 		.ret	= -ENXIO, /* No such CPU */
 	};
 
-	smp_call_function_single(cpu, remote_function, &data, 1);
+	smp_call(cpu, remote_function, &data, SMP_CALL_TYPE_SYNC);
 
 	return data.ret;
 }
@@ -4513,7 +4513,7 @@ static int perf_event_read(struct perf_event *event, bool group)
 		 * Therefore, either way, we'll have an up-to-date event count
 		 * after this.
 		 */
-		(void)smp_call_function_single(event_cpu, __perf_event_read, &data, 1);
+		(void)smp_call(event_cpu, __perf_event_read, &data, SMP_CALL_TYPE_SYNC);
 		preempt_enable();
 		ret = data.ret;
 
@@ -13292,7 +13292,7 @@ static void perf_event_exit_cpu_context(int cpu)
 		ctx = &cpuctx->ctx;
 
 		mutex_lock(&ctx->mutex);
-		smp_call_function_single(cpu, __perf_event_exit_context, ctx, 1);
+		smp_call(cpu, __perf_event_exit_context, ctx, SMP_CALL_TYPE_SYNC);
 		cpuctx->online = 0;
 		mutex_unlock(&ctx->mutex);
 	}
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 55d049c39608..ad9c33487526 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2662,8 +2662,7 @@ static int rcu_torture_barrier_cbs(void *arg)
 		 * The above smp_load_acquire() ensures barrier_phase load
 		 * is ordered before the following ->call().
 		 */
-		if (smp_call_function_single(myid, rcu_torture_barrier1cb,
-					     &rcu, 1)) {
+		if (smp_call(myid, rcu_torture_barrier1cb, &rcu, SMP_CALL_TYPE_SYNC)) {
 			// IPI failed, so use direct call from current CPU.
 			cur_ops->call(&rcu, rcu_torture_barrier_cbf);
 		}
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 99cf3a13954c..428fe1451aeb 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1299,7 +1299,7 @@ static void trc_wait_for_one_reader(struct task_struct *t,
 		per_cpu(trc_ipi_to_cpu, cpu) = true;
 		t->trc_ipi_to_cpu = cpu;
 		rcu_tasks_trace.n_ipis++;
-		if (smp_call_function_single(cpu, trc_read_check_handler, t, 0)) {
+		if (smp_call(cpu, trc_read_check_handler, t, SMP_CALL_TYPE_ASYNC)) {
 			// Just in case there is some other reason for
 			// failure than the target CPU being offline.
 			WARN_ONCE(1, "%s():  smp_call_function_single() failed for CPU: %d\n",
@@ -1473,7 +1473,7 @@ static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
 	// changes, there will need to be a recheck and/or timed wait.
 	for_each_online_cpu(cpu)
 		if (WARN_ON_ONCE(smp_load_acquire(per_cpu_ptr(&trc_ipi_to_cpu, cpu))))
-			smp_call_function_single(cpu, rcu_tasks_trace_empty_fn, NULL, 1);
+			smp_call(cpu, rcu_tasks_trace_empty_fn, NULL, SMP_CALL_TYPE_SYNC);
 
 	// Remove the safety count.
 	smp_mb__before_atomic();  // Order vs. earlier atomics
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7bf1db5ebade..aac99328d193 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4100,7 +4100,7 @@ void rcu_barrier(void)
 			continue;
 		}
 		raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);
-		if (smp_call_function_single(cpu, rcu_barrier_handler, (void *)cpu, 1)) {
+		if (smp_call(cpu, rcu_barrier_handler, (void *)cpu, SMP_CALL_TYPE_SYNC)) {
 			schedule_timeout_uninterruptible(1);
 			goto retry;
 		}
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 60197ea24ceb..ef4cec63a544 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -391,7 +391,7 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
 			put_cpu();
 			continue;
 		}
-		ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
+		ret = smp_call(cpu, rcu_exp_handler, NULL, SMP_CALL_TYPE_ASYNC);
 		put_cpu();
 		/* The CPU will report the QS in response to the IPI. */
 		if (!ret)
@@ -777,7 +777,7 @@ static void sync_sched_exp_online_cleanup(int cpu)
 		return;
 	}
 	/* Quiescent state needed on some other CPU, send IPI. */
-	ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
+	ret = smp_call(cpu, rcu_exp_handler, NULL, SMP_CALL_TYPE_ASYNC);
 	put_cpu();
 	WARN_ON_ONCE(ret);
 }
diff --git a/kernel/relay.c b/kernel/relay.c
index d1a67fbb819d..0e615f5367a7 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -635,9 +635,8 @@ int relay_late_setup_files(struct rchan *chan,
 			disp.dentry = dentry;
 			smp_mb();
 			/* relay_channels_mutex must be held, so wait. */
-			err = smp_call_function_single(i,
-						       __relay_set_buf_dentry,
-						       &disp, 1);
+			err = smp_call(i, __relay_set_buf_dentry,
+					&disp, SMP_CALL_TYPE_SYNC);
 		}
 		if (unlikely(err))
 			break;
diff --git a/kernel/scftorture.c b/kernel/scftorture.c
index 4b6dd708f1f6..79bcc9bf86a5 100644
--- a/kernel/scftorture.c
+++ b/kernel/scftorture.c
@@ -351,7 +351,8 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 			barrier(); // Prevent race-reduction compiler optimizations.
 			scfcp->scfc_in = true;
 		}
-		ret = smp_call_function_single(cpu, scf_handler_1, (void *)scfcp, scfsp->scfs_wait);
+		ret = smp_call(cpu, scf_handler_1, (void *)scfcp,
+				(scfsp->scfs_wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC));
 		if (ret) {
 			if (scfsp->scfs_wait)
 				scfp->n_single_wait_ofl++;
@@ -372,7 +373,7 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 		scfcp->scfc_rpc = true;
 		barrier(); // Prevent race-reduction compiler optimizations.
 		scfcp->scfc_in = true;
-		ret = smp_call_function_single(cpu, scf_handler_1, (void *)scfcp, 0);
+		ret = smp_call(cpu, scf_handler_1, (void *)scfcp, SMP_CALL_TYPE_ASYNC);
 		if (!ret) {
 			if (use_cpus_read_lock)
 				cpus_read_unlock();
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 427abc8aee02..e5786b977bd9 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -379,7 +379,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		 * smp_call_function_single() will call ipi_func() if cpu_id
 		 * is the calling CPU.
 		 */
-		smp_call_function_single(cpu_id, ipi_func, NULL, 1);
+		smp_call(cpu_id, ipi_func, NULL, SMP_CALL_TYPE_SYNC);
 	} else {
 		/*
 		 * For regular membarrier, we can save a few cycles by
diff --git a/kernel/smp.c b/kernel/smp.c
index c783f5c533e2..ac54699575b1 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -611,36 +611,6 @@ void flush_smp_call_function_from_idle(void)
 	local_irq_restore(flags);
 }
 
-/*
- * This is a temporarily hook up. This function will be eliminated
- * with a later patch in this series.
- *
- * smp_call_function_single - Run a function on a specific CPU
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait until function has completed on other CPUs.
- *
- * Returns 0 on success, else a negative status code.
- */
-int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
-			int wait)
-{
-	unsigned int flags = 0;
-
-	if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu))
-		return -ENXIO;
-
-	if (wait)
-		flags = SMP_CALL_TYPE_SYNC;
-	else
-		flags = SMP_CALL_TYPE_ASYNC;
-
-	smp_call(cpu, func, info, flags);
-
-	return 0;
-}
-EXPORT_SYMBOL(smp_call_function_single);
-
 static void __smp_call_mask_cond(const struct cpumask *mask,
 		smp_call_func_t func, void *info,
 		smp_cond_func_t cond_func, bool local_cpu,
@@ -1190,7 +1160,8 @@ int smp_call_csd(int cpu, call_single_data_t *csd, unsigned int flags)
 		csd_lock_record(csd);
 		csd_unlock(csd);
 		local_irq_save(flags);
-		func(info);
+		if (likely(func != NULL))
+			func(info);
 		csd_lock_record(NULL);
 		local_irq_restore(flags);
 		goto out;
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 003ccf338d20..862d56cd554a 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -418,7 +418,7 @@ static int clockevents_unbind(struct clock_event_device *ced, int cpu)
 {
 	struct ce_unbind cu = { .ce = ced, .res = -ENODEV };
 
-	smp_call_function_single(cpu, __clockevents_unbind, &cu, 1);
+	smp_call(cpu, __clockevents_unbind, &cu, SMP_CALL_TYPE_SYNC);
 	return cu.res;
 }
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 95d7ca35bdf2..7830fa290474 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -355,7 +355,7 @@ void clocksource_verify_percpu(struct clocksource *cs)
 		if (cpu == testcpu)
 			continue;
 		csnow_begin = cs->read(cs);
-		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
+		smp_call(cpu, clocksource_verify_one_cpu, cs, SMP_CALL_TYPE_SYNC);
 		csnow_end = cs->read(cs);
 		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
 		if (delta < 0)
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 46789356f856..608a176552d6 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -195,7 +195,7 @@ static void tick_take_do_timer_from_boot(void)
 	int from = tick_do_timer_boot_cpu;
 
 	if (from >= 0 && from != cpu)
-		smp_call_function_single(from, giveup_do_timer, &cpu, 1);
+		smp_call(from, giveup_do_timer, &cpu, SMP_CALL_TYPE_SYNC);
 }
 #endif
 
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index af709c182674..0240c89337b9 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -823,8 +823,8 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 		 */
 		err = -ENXIO;
 	} else {
-		err = smp_call_function_single(cpu, __bpf_prog_test_run_raw_tp,
-					       &info, 1);
+		err = smp_call(cpu, __bpf_prog_test_run_raw_tp,
+				&info, SMP_CALL_TYPE_SYNC);
 	}
 	put_cpu();
 
diff --git a/net/iucv/iucv.c b/net/iucv/iucv.c
index 03f72efdda84..fe086dd0fb60 100644
--- a/net/iucv/iucv.c
+++ b/net/iucv/iucv.c
@@ -507,8 +507,7 @@ static void iucv_setmask_mp(void)
 		/* Enable all cpus with a declared buffer. */
 		if (cpumask_test_cpu(cpu, &iucv_buffer_cpumask) &&
 		    !cpumask_test_cpu(cpu, &iucv_irq_cpumask))
-			smp_call_function_single(cpu, iucv_allow_cpu,
-						 NULL, 1);
+			smp_call(cpu, iucv_allow_cpu, NULL, SMP_CALL_TYPE_SYNC);
 	cpus_read_unlock();
 }
 
@@ -526,7 +525,7 @@ static void iucv_setmask_up(void)
 	cpumask_copy(&cpumask, &iucv_irq_cpumask);
 	cpumask_clear_cpu(cpumask_first(&iucv_irq_cpumask), &cpumask);
 	for_each_cpu(cpu, &cpumask)
-		smp_call_function_single(cpu, iucv_block_cpu, NULL, 1);
+		smp_call(cpu, iucv_block_cpu, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 /*
@@ -551,7 +550,7 @@ static int iucv_enable(void)
 	/* Declare per cpu buffers. */
 	rc = -EIO;
 	for_each_online_cpu(cpu)
-		smp_call_function_single(cpu, iucv_declare_cpu, NULL, 1);
+		smp_call(cpu, iucv_declare_cpu, NULL, SMP_CALL_TYPE_SYNC);
 	if (cpumask_empty(&iucv_buffer_cpumask))
 		/* No cpu could declare an iucv buffer. */
 		goto out;
@@ -641,8 +640,8 @@ static int iucv_cpu_down_prep(unsigned int cpu)
 	iucv_retrieve_cpu(NULL);
 	if (!cpumask_empty(&iucv_irq_cpumask))
 		return 0;
-	smp_call_function_single(cpumask_first(&iucv_buffer_cpumask),
-				 iucv_allow_cpu, NULL, 1);
+	smp_call(cpumask_first(&iucv_buffer_cpumask),
+		  iucv_allow_cpu, NULL, SMP_CALL_TYPE_SYNC);
 	return 0;
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ffa181d2d465..abdb844f671d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5724,7 +5724,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	c.ret = &r;
 	c.opaque = opaque;
 	for_each_online_cpu(cpu) {
-		smp_call_function_single(cpu, check_processor_compat, &c, 1);
+		smp_call(cpu, check_processor_compat, &c, SMP_CALL_TYPE_SYNC);
 		if (r < 0)
 			goto out_free_2;
 	}
-- 
2.27.0



* [PATCH v4 11/11] smp: up.c to adopt the same format of cross CPU call
  2022-05-19 20:49 [PATCH v4 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (9 preceding siblings ...)
  2022-05-19 20:49 ` [PATCH v4 10/11] smp: replace smp_call_function_single() with smp_call Donghai Qiao
@ 2022-05-19 20:49 ` Donghai Qiao
  10 siblings, 0 replies; 12+ messages in thread
From: Donghai Qiao @ 2022-05-19 20:49 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao, kernel test robot

Since smp.c has been converted to the new interface, up.c should be
converted to provide the uniprocessor version of the cross CPU call
interface as well.

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
---
v1 -> v2: Removed 'x' from the function names and changed XCALL to
          SMP_CALL in the new macros
v2 -> v3: Fixed the UP build errors (W=1 build)
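
For reference, a minimal caller-side sketch of the new calling
convention (illustration only, not part of this patch; example_func()
and example_call() are made-up names, while smp_call() and the
SMP_CALL_TYPE_* flags are the ones introduced earlier in this series):

#include <linux/smp.h>
#include <linux/printk.h>

/* Hypothetical handler used only for this example. */
static void example_func(void *info)
{
	pr_info("cross call ran on CPU %d\n", smp_processor_id());
}

static void example_call(int cpu)
{
	/* was: smp_call_function_single(cpu, example_func, NULL, 1); */
	smp_call(cpu, example_func, NULL, SMP_CALL_TYPE_SYNC);

	/* was: smp_call_function_single(cpu, example_func, NULL, 0); */
	smp_call(cpu, example_func, NULL, SMP_CALL_TYPE_ASYNC);
}

With this patch, the UP build resolves both calls to a direct local
invocation of example_func(), so callers keep the same source-level
semantics on UP and SMP.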

 kernel/up.c | 58 +++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 41 insertions(+), 17 deletions(-)

diff --git a/kernel/up.c b/kernel/up.c
index a38b8b095251..ebecd8602884 100644
--- a/kernel/up.c
+++ b/kernel/up.c
@@ -9,8 +9,7 @@
 #include <linux/smp.h>
 #include <linux/hypervisor.h>
 
-int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
-				int wait)
+int smp_call(int cpu, void (*func) (void *info), void *info, unsigned int type)
 {
 	unsigned long flags;
 
@@ -23,37 +22,62 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 
 	return 0;
 }
-EXPORT_SYMBOL(smp_call_function_single);
+EXPORT_SYMBOL(smp_call);
 
-int smp_call_function_single_async(int cpu, struct __call_single_data *csd)
+int smp_call_cond(int cpu, smp_call_func_t func, void *info,
+		   smp_cond_func_t cond_func, unsigned int type)
 {
-	unsigned long flags;
+	int ret = 0;
+
+	preempt_disable();
+	if (!cond_func || cond_func(cpu, info))
+		ret = smp_call(cpu, func, info, type);
+
+	preempt_enable();
+
+	return ret;
+}
+EXPORT_SYMBOL(smp_call_cond);
+
+void smp_call_mask(const struct cpumask *mask, smp_call_func_t func, void *info, unsigned int type)
+{
+	if (!cpumask_test_cpu(0, mask))
+		return;
+
+	preempt_disable();
+	smp_call(0, func, info, type);
+	preempt_enable();
+}
+EXPORT_SYMBOL(smp_call_mask);
+
+int smp_call_csd(int cpu, struct __call_single_data *csd, unsigned int type)
+{
+	preempt_disable();
+
+	if (csd->func != NULL)
+		smp_call(cpu, csd->func, csd->info, type);
+
+	preempt_enable();
 
-	local_irq_save(flags);
-	csd->func(csd->info);
-	local_irq_restore(flags);
 	return 0;
 }
-EXPORT_SYMBOL(smp_call_function_single_async);
+EXPORT_SYMBOL(smp_call_csd);
 
 /*
  * Preemption is disabled here to make sure the cond_func is called under the
  * same conditions in UP and SMP.
  */
-void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
-			   void *info, bool wait, const struct cpumask *mask)
+void smp_call_mask_cond(const struct cpumask *mask, smp_call_func_t func,
+			 void *info, smp_cond_func_t cond_func,
+			 unsigned int type)
 {
-	unsigned long flags;
-
 	preempt_disable();
 	if ((!cond_func || cond_func(0, info)) && cpumask_test_cpu(0, mask)) {
-		local_irq_save(flags);
-		func(info);
-		local_irq_restore(flags);
+		smp_call(0, func, info, type);
 	}
 	preempt_enable();
 }
-EXPORT_SYMBOL(on_each_cpu_cond_mask);
+EXPORT_SYMBOL(smp_call_mask_cond);
 
 int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
 {
-- 
2.27.0



Thread overview: 12+ messages
2022-05-19 20:49 [PATCH v4 00/11] smp: cross CPU call interface Donghai Qiao
2022-05-19 20:49 ` [PATCH v4 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
2022-05-19 20:49 ` [PATCH v4 02/11] smp: the definitions of cross call interface Donghai Qiao
2022-05-19 20:49 ` [PATCH v4 03/11] smp: remove SCF_WAIT and SCF_RUN_LOCAL Donghai Qiao
2022-05-19 20:49 ` [PATCH v4 04/11] smp: replace smp_call_function_single() with smp_call() Donghai Qiao
2022-05-19 20:49 ` [PATCH v4 05/11] smp: replace smp_call_function_single_async with smp_call_csd Donghai Qiao
2022-05-19 20:49 ` [PATCH v4 06/11] smp: use smp_call_csd() from irq_work.c and core.c Donghai Qiao
2022-05-19 20:49 ` [PATCH v4 07/11] smp: eliminate smp_call_function_any Donghai Qiao
2022-05-19 20:49 ` [PATCH v4 08/11] smp: replace smp_call_function_many_cond with __smp_call_mask_cond Donghai Qiao
2022-05-19 20:49 ` [PATCH v4 09/11] smp: replace smp_call_function_single_async with smp_call_csd Donghai Qiao
2022-05-19 20:49 ` [PATCH v4 10/11] smp: replace smp_call_function_single() with smp_call Donghai Qiao
2022-05-19 20:49 ` [PATCH v4 11/11] smp: up.c to adopt the same format of cross CPU call Donghai Qiao
