* [PATCH 00/11] smp: cross CPU call interface
@ 2022-04-15  2:46 Donghai Qiao
  2022-04-15  2:46 ` [PATCH 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: Donghai Qiao @ 2022-04-15  2:46 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

The motivation for submitting this patch set is to turn the existing
cross CPU call mechanism into a more formal interface that is
friendlier to kernel developers.

Basically, the minimum set of functions below can satisfy any demand
for cross CPU calls from kernel consumers. For the sake of simplicity,
clarity and less code redundancy, the functions in this interface are
renamed, simplified, or eliminated, but they still inherit the same
semantics and parameter lists from their previous versions.

int smp_xcall(int cpu, smp_call_func_t func, void *info, unsigned int flags)

int smp_xcall_cond(int cpu, smp_call_func_t func, void *info,
                   smp_cond_func_t condf, unsigned int flags)

void smp_xcall_mask(const struct cpumask *mask, smp_call_func_t func,
                    void *info, unsigned int flags)

void smp_xcall_mask_cond(const struct cpumask *mask, smp_call_func_t func,
                         void *info, smp_cond_func_t condf, unsigned int flags)

int smp_xcall_private(int cpu, call_single_data_t *csd, unsigned int flags)

int smp_xcall_any(const struct cpumask *mask, smp_call_func_t func,
                  void *info, unsigned int flags)
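
For illustration only (not part of this series), here is a minimal
usage sketch of the interface, assuming the XCALL_TYPE_SYNC/ASYNC and
XCALL_ALL macros introduced in patch 2; the callback and the CPU
number are hypothetical:

	/* Hypothetical callback; must be fast and non-blocking. */
	static void drain_local_state(void *info)
	{
		/* runs on each destination CPU */
	}

	static void example(void)
	{
		/* Run on CPU 3 and wait until it has completed. */
		smp_xcall(3, drain_local_state, NULL, XCALL_TYPE_SYNC);

		/* Fire on all online CPUs without waiting. */
		smp_xcall(XCALL_ALL, drain_local_state, NULL, XCALL_TYPE_ASYNC);
	}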


Here is an overview of the patch set:

Patch 1: The smp cross call related structures and definitions are
consolidated from smp.c and smp_types.h into smp.h. As a result,
smp_types.h is deleted from the source tree.

Patch 2: The set of smp_xcall* functions listed above is defined,
but the implementation details are filled in by the subsequent
patches in this set.

Patch 3: Eliminated the macros SCF_WAIT and SCF_RUN_LOCAL and the
code around them. This is possible because, since commit a32a4d8a815c
("smp: Run functions concurrently in smp_call_function_many_cond()"),
smp_call_function_many_cond() has handled the local CPU call whenever
the local CPU shows up in the cpumask. So it was incorrect to force a
local CPU call in the on_each_cpu_cond_mask() code path.

This change and the changes in subsequent patches will eventually
help eliminate the set of on_each_cpu* functions.

Patch 4: Eliminated the percpu global csd_data and temporarily hooked
smp_call_function_single() up to smp_xcall().

Patch 5: Replaced smp_call_function_single_async() with smp_xcall_private()
and also extended smp_xcall_private() to support synchronous calls
with a preallocated csd structure.

Patch 6: Previously, two special cross call clients, irq_work.c and
core.c, used __smp_call_single_queue(), an smp internal function. With
some minor changes in this patch, they are able to use this interface
instead.

Patch 7: Kernel consumers could actually use smp_xcall() wherever they
want smp_call_function_any(); the extra logic handled by
smp_call_function_any() could be moved out of it and the consumers
could pick the CPU themselves. However, because quite a few consumers
need to run the cross call function on any one of the CPUs in a set,
there is some advantage in adding smp_xcall_any() to the interface,
as the sketch below illustrates.
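
As a rough sketch (illustrative only, not taken from this series), a
consumer that is happy to pick the CPU itself could do roughly the
following instead of calling smp_xcall_any(); note that this loses the
"prefer current CPU / current node" selection that smp_xcall_any()
provides:

	/* Hypothetical open-coded alternative to smp_xcall_any(). */
	unsigned int cpu = cpumask_any_and(mask, cpu_online_mask);

	if (cpu < nr_cpu_ids)
		return smp_xcall(cpu, func, info, XCALL_TYPE_SYNC);
	return -ENXIO;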

Patch 8: Eliminated smp_call_function, smp_call_function_many,
smp_call_function_many_cond, on_each_cpu, on_each_cpu_mask,
on_each_cpu_cond, on_each_cpu_cond_mask.

Patch 9: Eliminated smp_call_function_single_async.

Patch 10: Eliminated smp_call_function_single.

Patch 11: Modified up.c to adopt the same cross CPU call format.

Note: each patch in this set depends only on its preceding patch.
The kernel can be built and booted when patched with any number of
patches from 1 through 11.

Donghai Qiao (11):
  smp: consolidate the structure definitions to smp.h
  smp: cross call interface
  smp: eliminate SCF_WAIT and SCF_RUN_LOCAL
  smp: replace smp_call_function_single() with smp_xcall()
  smp: replace smp_call_function_single_async() with smp_xcall_private()
  smp: use smp_xcall_private() from irq_work.c and core.c
  smp: change smp_call_function_any() to smp_xcall_any()
  smp: replace smp_call_function_many_cond() with __smp_call_mask_cond()
  smp: replace smp_call_function_single_async with smp_xcall_private
  smp: replace smp_call_function_single() with smp_xcall()
  smp: modify up.c to adopt the same format of cross CPU call.

 arch/alpha/kernel/process.c                   |   2 +-
 arch/alpha/kernel/rtc.c                       |   4 +-
 arch/alpha/kernel/smp.c                       |  10 +-
 arch/arc/kernel/perf_event.c                  |   2 +-
 arch/arc/mm/cache.c                           |   2 +-
 arch/arc/mm/tlb.c                             |  14 +-
 arch/arm/common/bL_switcher.c                 |   2 +-
 arch/arm/kernel/machine_kexec.c               |   2 +-
 arch/arm/kernel/perf_event_v7.c               |   6 +-
 arch/arm/kernel/smp_tlb.c                     |  22 +-
 arch/arm/kernel/smp_twd.c                     |   4 +-
 arch/arm/mach-bcm/bcm_kona_smc.c              |   2 +-
 arch/arm/mach-mvebu/pmsu.c                    |   4 +-
 arch/arm/mm/flush.c                           |   4 +-
 arch/arm/vfp/vfpmodule.c                      |   2 +-
 arch/arm64/kernel/armv8_deprecated.c          |   4 +-
 arch/arm64/kernel/perf_event.c                |   8 +-
 arch/arm64/kernel/topology.c                  |   2 +-
 arch/arm64/kvm/arm.c                          |   6 +-
 arch/csky/abiv2/cacheflush.c                  |   2 +-
 arch/csky/kernel/cpu-probe.c                  |   2 +-
 arch/csky/kernel/perf_event.c                 |   2 +-
 arch/csky/kernel/smp.c                        |   2 +-
 arch/csky/mm/cachev2.c                        |   2 +-
 arch/ia64/kernel/mca.c                        |   4 +-
 arch/ia64/kernel/palinfo.c                    |   3 +-
 arch/ia64/kernel/smp.c                        |  10 +-
 arch/ia64/kernel/smpboot.c                    |   2 +-
 arch/ia64/kernel/uncached.c                   |   4 +-
 arch/mips/cavium-octeon/octeon-irq.c          |   4 +-
 arch/mips/cavium-octeon/setup.c               |   4 +-
 arch/mips/kernel/crash.c                      |   2 +-
 arch/mips/kernel/machine_kexec.c              |   2 +-
 arch/mips/kernel/perf_event_mipsxx.c          |   7 +-
 arch/mips/kernel/process.c                    |   2 +-
 arch/mips/kernel/smp-bmips.c                  |   3 +-
 arch/mips/kernel/smp-cps.c                    |   8 +-
 arch/mips/kernel/smp.c                        |  10 +-
 arch/mips/kernel/sysrq.c                      |   2 +-
 arch/mips/mm/c-r4k.c                          |   4 +-
 arch/mips/sibyte/common/cfe.c                 |   2 +-
 arch/openrisc/kernel/smp.c                    |  12 +-
 arch/parisc/kernel/cache.c                    |   4 +-
 arch/parisc/mm/init.c                         |   2 +-
 arch/powerpc/kernel/dawr.c                    |   2 +-
 arch/powerpc/kernel/kvm.c                     |   2 +-
 arch/powerpc/kernel/security.c                |   6 +-
 arch/powerpc/kernel/smp.c                     |   4 +-
 arch/powerpc/kernel/sysfs.c                   |  28 +-
 arch/powerpc/kernel/tau_6xx.c                 |   4 +-
 arch/powerpc/kernel/watchdog.c                |   4 +-
 arch/powerpc/kexec/core_64.c                  |   2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c           |   2 +-
 arch/powerpc/kvm/book3s_hv.c                  |   8 +-
 arch/powerpc/mm/book3s64/pgtable.c            |   2 +-
 arch/powerpc/mm/book3s64/radix_tlb.c          |  12 +-
 arch/powerpc/mm/nohash/tlb.c                  |  10 +-
 arch/powerpc/mm/slice.c                       |   4 +-
 arch/powerpc/perf/core-book3s.c               |   2 +-
 arch/powerpc/perf/imc-pmu.c                   |   2 +-
 arch/powerpc/platforms/85xx/smp.c             |   8 +-
 arch/powerpc/platforms/powernv/idle.c         |   2 +-
 arch/powerpc/platforms/pseries/lparcfg.c      |   2 +-
 arch/riscv/mm/cacheflush.c                    |   4 +-
 arch/s390/hypfs/hypfs_diag0c.c                |   2 +-
 arch/s390/kernel/alternative.c                |   2 +-
 arch/s390/kernel/perf_cpum_cf.c               |  10 +-
 arch/s390/kernel/perf_cpum_cf_common.c        |   4 +-
 arch/s390/kernel/perf_cpum_sf.c               |   4 +-
 arch/s390/kernel/processor.c                  |   2 +-
 arch/s390/kernel/smp.c                        |   2 +-
 arch/s390/kernel/topology.c                   |   2 +-
 arch/s390/mm/pgalloc.c                        |   2 +-
 arch/s390/pci/pci_irq.c                       |   4 +-
 arch/sh/kernel/smp.c                          |  14 +-
 arch/sh/mm/cache.c                            |   2 +-
 arch/sparc/include/asm/mman.h                 |   4 +-
 arch/sparc/kernel/nmi.c                       |  16 +-
 arch/sparc/kernel/perf_event.c                |   4 +-
 arch/sparc/kernel/smp_64.c                    |   8 +-
 arch/sparc/mm/init_64.c                       |   2 +-
 arch/x86/events/core.c                        |   6 +-
 arch/x86/events/intel/core.c                  |   4 +-
 arch/x86/kernel/alternative.c                 |   2 +-
 arch/x86/kernel/amd_nb.c                      |   2 +-
 arch/x86/kernel/apic/apic.c                   |   2 +-
 arch/x86/kernel/apic/vector.c                 |   2 +-
 arch/x86/kernel/cpu/aperfmperf.c              |   5 +-
 arch/x86/kernel/cpu/bugs.c                    |   2 +-
 arch/x86/kernel/cpu/mce/amd.c                 |   4 +-
 arch/x86/kernel/cpu/mce/core.c                |  12 +-
 arch/x86/kernel/cpu/mce/inject.c              |  14 +-
 arch/x86/kernel/cpu/mce/intel.c               |   2 +-
 arch/x86/kernel/cpu/microcode/core.c          |   4 +-
 arch/x86/kernel/cpu/mtrr/mtrr.c               |   2 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |   4 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        |   8 +-
 arch/x86/kernel/cpu/sgx/main.c                |   5 +-
 arch/x86/kernel/cpu/umwait.c                  |   2 +-
 arch/x86/kernel/cpu/vmware.c                  |   2 +-
 arch/x86/kernel/cpuid.c                       |   2 +-
 arch/x86/kernel/kvm.c                         |   6 +-
 arch/x86/kernel/ldt.c                         |   2 +-
 arch/x86/kvm/vmx/vmx.c                        |   3 +-
 arch/x86/kvm/x86.c                            |  11 +-
 arch/x86/lib/cache-smp.c                      |   4 +-
 arch/x86/lib/msr-smp.c                        |  20 +-
 arch/x86/mm/pat/set_memory.c                  |   4 +-
 arch/x86/mm/tlb.c                             |  12 +-
 arch/x86/xen/mmu_pv.c                         |   4 +-
 arch/x86/xen/smp_pv.c                         |   2 +-
 arch/x86/xen/suspend.c                        |   4 +-
 arch/xtensa/kernel/smp.c                      |  29 +-
 block/blk-mq.c                                |   2 +-
 drivers/acpi/processor_idle.c                 |   4 +-
 drivers/char/agp/generic.c                    |   2 +-
 drivers/clocksource/ingenic-timer.c           |   2 +-
 drivers/clocksource/mips-gic-timer.c          |   2 +-
 drivers/cpufreq/acpi-cpufreq.c                |  10 +-
 drivers/cpufreq/powernow-k8.c                 |   9 +-
 drivers/cpufreq/powernv-cpufreq.c             |  14 +-
 drivers/cpufreq/sparc-us2e-cpufreq.c          |   4 +-
 drivers/cpufreq/sparc-us3-cpufreq.c           |   4 +-
 drivers/cpufreq/speedstep-ich.c               |   7 +-
 drivers/cpufreq/tegra194-cpufreq.c            |   8 +-
 drivers/cpuidle/coupled.c                     |   2 +-
 drivers/cpuidle/driver.c                      |   8 +-
 drivers/edac/amd64_edac.c                     |   4 +-
 drivers/firmware/arm_sdei.c                   |  10 +-
 drivers/gpu/drm/i915/vlv_sideband.c           |   2 +-
 drivers/hwmon/fam15h_power.c                  |   2 +-
 .../hwtracing/coresight/coresight-cpu-debug.c |   3 +-
 .../coresight/coresight-etm3x-core.c          |  11 +-
 .../coresight/coresight-etm4x-core.c          |  12 +-
 .../coresight/coresight-etm4x-sysfs.c         |   2 +-
 drivers/hwtracing/coresight/coresight-trbe.c  |   6 +-
 drivers/irqchip/irq-mvebu-pic.c               |   4 +-
 .../net/ethernet/cavium/liquidio/lio_core.c   |   2 +-
 drivers/net/ethernet/marvell/mvneta.c         |  34 +-
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |   8 +-
 drivers/perf/arm_spe_pmu.c                    |   2 +-
 .../intel/speed_select_if/isst_if_mbox_msr.c  |   4 +-
 drivers/platform/x86/intel_ips.c              |   4 +-
 drivers/powercap/intel_rapl_common.c          |   2 +-
 drivers/powercap/intel_rapl_msr.c             |   2 +-
 drivers/regulator/qcom_spmi-regulator.c       |   3 +-
 drivers/soc/fsl/qbman/qman.c                  |   4 +-
 drivers/soc/fsl/qbman/qman_test_stash.c       |   9 +-
 drivers/soc/xilinx/xlnx_event_manager.c       |   2 +-
 drivers/tty/sysrq.c                           |   2 +-
 drivers/watchdog/booke_wdt.c                  |   8 +-
 fs/buffer.c                                   |   2 +-
 include/linux/irq_work.h                      |   2 +-
 include/linux/smp.h                           | 234 ++++---
 include/linux/smp_types.h                     |  69 --
 kernel/cpu.c                                  |   4 +-
 kernel/debug/debug_core.c                     |   2 +-
 kernel/events/core.c                          |  10 +-
 kernel/irq_work.c                             |   4 +-
 kernel/profile.c                              |   4 +-
 kernel/rcu/rcutorture.c                       |   3 +-
 kernel/rcu/tasks.h                            |   4 +-
 kernel/rcu/tree.c                             |   6 +-
 kernel/rcu/tree_exp.h                         |   4 +-
 kernel/relay.c                                |   5 +-
 kernel/scftorture.c                           |  13 +-
 kernel/sched/core.c                           |   4 +-
 kernel/sched/fair.c                           |   2 +-
 kernel/sched/membarrier.c                     |  14 +-
 kernel/smp.c                                  | 633 ++++++++----------
 kernel/time/clockevents.c                     |   2 +-
 kernel/time/clocksource.c                     |   2 +-
 kernel/time/hrtimer.c                         |   6 +-
 kernel/time/tick-common.c                     |   2 +-
 kernel/trace/ftrace.c                         |   6 +-
 kernel/trace/ring_buffer.c                    |   2 +-
 kernel/trace/trace.c                          |  12 +-
 kernel/trace/trace_events.c                   |   2 +-
 kernel/up.c                                   |  56 +-
 mm/kasan/quarantine.c                         |   2 +-
 mm/mmu_gather.c                               |   2 +-
 mm/slab.c                                     |   2 +-
 net/bpf/test_run.c                            |   4 +-
 net/core/dev.c                                |   2 +-
 net/iucv/iucv.c                               |  17 +-
 virt/kvm/kvm_main.c                           |  12 +-
 186 files changed, 945 insertions(+), 1009 deletions(-)
 delete mode 100644 include/linux/smp_types.h

-- 
2.27.0



* [PATCH 01/11] smp: consolidate the structure definitions to smp.h
  2022-04-15  2:46 [PATCH 00/11] smp: cross CPU call interface Donghai Qiao
@ 2022-04-15  2:46 ` Donghai Qiao
  2022-04-15  2:46 ` [PATCH 02/11] smp: cross call interface Donghai Qiao
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Donghai Qiao @ 2022-04-15  2:46 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Move the structure definitions from kernel/smp.c to
include/linux/smp.h

Move the structure definitions from include/linux/smp_types.h
to include/linux/smp.h and delete smp_types.h

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
 include/linux/irq_work.h  |   2 +-
 include/linux/smp.h       | 131 ++++++++++++++++++++++++++++++++++++--
 include/linux/smp_types.h |  69 --------------------
 kernel/smp.c              |  65 -------------------
 4 files changed, 128 insertions(+), 139 deletions(-)
 delete mode 100644 include/linux/smp_types.h

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 8cd11a223260..145af67b1cd3 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -2,7 +2,7 @@
 #ifndef _LINUX_IRQ_WORK_H
 #define _LINUX_IRQ_WORK_H
 
-#include <linux/smp_types.h>
+#include <linux/smp.h>
 #include <linux/rcuwait.h>
 
 /*
diff --git a/include/linux/smp.h b/include/linux/smp.h
index a80ab58ae3f1..31811da856a3 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -10,13 +10,74 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/list.h>
+#include <linux/llist.h>
 #include <linux/cpumask.h>
 #include <linux/init.h>
-#include <linux/smp_types.h>
 
 typedef void (*smp_call_func_t)(void *info);
 typedef bool (*smp_cond_func_t)(int cpu, void *info);
 
+enum {
+	CSD_FLAG_LOCK		= 0x01,
+
+	IRQ_WORK_PENDING	= 0x01,
+	IRQ_WORK_BUSY		= 0x02,
+	IRQ_WORK_LAZY		= 0x04, /* No IPI, wait for tick */
+	IRQ_WORK_HARD_IRQ	= 0x08, /* IRQ context on PREEMPT_RT */
+
+	IRQ_WORK_CLAIMED	= (IRQ_WORK_PENDING | IRQ_WORK_BUSY),
+
+	CSD_TYPE_ASYNC		= 0x00,
+	CSD_TYPE_SYNC		= 0x10,
+	CSD_TYPE_IRQ_WORK	= 0x20,
+	CSD_TYPE_TTWU		= 0x30,
+
+	CSD_FLAG_TYPE_MASK	= 0xF0,
+};
+
+/*
+ * struct __call_single_node is the primary type on
+ * smp.c:call_single_queue.
+ *
+ * flush_smp_call_function_queue() only reads the type from
+ * __call_single_node::u_flags as a regular load, the above
+ * (anonymous) enum defines all the bits of this word.
+ *
+ * Other bits are not modified until the type is known.
+ *
+ * CSD_TYPE_SYNC/ASYNC:
+ *	struct {
+ *		struct llist_node node;
+ *		unsigned int flags;
+ *		smp_call_func_t func;
+ *		void *info;
+ *	};
+ *
+ * CSD_TYPE_IRQ_WORK:
+ *	struct {
+ *		struct llist_node node;
+ *		atomic_t flags;
+ *		void (*func)(struct irq_work *);
+ *	};
+ *
+ * CSD_TYPE_TTWU:
+ *	struct {
+ *		struct llist_node node;
+ *		unsigned int flags;
+ *	};
+ *
+ */
+struct __call_single_node {
+	struct llist_node	llist;
+	union {
+		unsigned int	u_flags;
+		atomic_t	a_flags;
+	};
+#ifdef CONFIG_64BIT
+	u16 src, dst;
+#endif
+};
+
 /*
  * structure shares (partial) layout with struct irq_work
  */
@@ -26,13 +87,75 @@ struct __call_single_data {
 	void *info;
 };
 
-#define CSD_INIT(_func, _info) \
-	(struct __call_single_data){ .func = (_func), .info = (_info), }
-
 /* Use __aligned() to avoid to use 2 cache lines for 1 csd */
 typedef struct __call_single_data call_single_data_t
 	__aligned(sizeof(struct __call_single_data));
 
+struct cfd_percpu {
+	call_single_data_t	csd;
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+	u64	seq_queue;
+	u64	seq_ipi;
+	u64	seq_noipi;
+#endif
+};
+
+struct call_function_data {
+	struct cfd_percpu	__percpu *pcpu;
+	cpumask_var_t		cpumask;
+	cpumask_var_t		cpumask_ipi;
+};
+
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+union cfd_seq_cnt {
+	u64		val;
+	struct {
+		u64	src:16;
+		u64	dst:16;
+#define CFD_SEQ_NOCPU	0xffff
+		u64	type:4;
+#define CFD_SEQ_QUEUE	0
+#define CFD_SEQ_IPI	1
+#define CFD_SEQ_NOIPI	2
+#define CFD_SEQ_PING	3
+#define CFD_SEQ_PINGED	4
+#define CFD_SEQ_HANDLE	5
+#define CFD_SEQ_DEQUEUE	6
+#define CFD_SEQ_IDLE	7
+#define CFD_SEQ_GOTIPI	8
+#define CFD_SEQ_HDLEND	9
+		u64	cnt:28;
+	}		u;
+};
+
+static char *seq_type[] = {
+	[CFD_SEQ_QUEUE]		= "queue",
+	[CFD_SEQ_IPI]		= "ipi",
+	[CFD_SEQ_NOIPI]		= "noipi",
+	[CFD_SEQ_PING]		= "ping",
+	[CFD_SEQ_PINGED]	= "pinged",
+	[CFD_SEQ_HANDLE]	= "handle",
+	[CFD_SEQ_DEQUEUE]	= "dequeue (src CPU 0 == empty)",
+	[CFD_SEQ_IDLE]		= "idle",
+	[CFD_SEQ_GOTIPI]	= "gotipi",
+	[CFD_SEQ_HDLEND]	= "hdlend (src CPU 0 == early)",
+};
+
+struct cfd_seq_local {
+	u64	ping;
+	u64	pinged;
+	u64	handle;
+	u64	dequeue;
+	u64	idle;
+	u64	gotipi;
+	u64	hdlend;
+};
+#endif
+
+#define CSD_TYPE(_csd)	((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
+#define CSD_INIT(_func, _info) \
+	((struct __call_single_data){ .func = (_func), .info = (_info), })
+
 #define INIT_CSD(_csd, _func, _info)		\
 do {						\
 	*(_csd) = CSD_INIT((_func), (_info));	\
diff --git a/include/linux/smp_types.h b/include/linux/smp_types.h
deleted file mode 100644
index 2e8461af8df6..000000000000
--- a/include/linux/smp_types.h
+++ /dev/null
@@ -1,69 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __LINUX_SMP_TYPES_H
-#define __LINUX_SMP_TYPES_H
-
-#include <linux/llist.h>
-
-enum {
-	CSD_FLAG_LOCK		= 0x01,
-
-	IRQ_WORK_PENDING	= 0x01,
-	IRQ_WORK_BUSY		= 0x02,
-	IRQ_WORK_LAZY		= 0x04, /* No IPI, wait for tick */
-	IRQ_WORK_HARD_IRQ	= 0x08, /* IRQ context on PREEMPT_RT */
-
-	IRQ_WORK_CLAIMED	= (IRQ_WORK_PENDING | IRQ_WORK_BUSY),
-
-	CSD_TYPE_ASYNC		= 0x00,
-	CSD_TYPE_SYNC		= 0x10,
-	CSD_TYPE_IRQ_WORK	= 0x20,
-	CSD_TYPE_TTWU		= 0x30,
-
-	CSD_FLAG_TYPE_MASK	= 0xF0,
-};
-
-/*
- * struct __call_single_node is the primary type on
- * smp.c:call_single_queue.
- *
- * flush_smp_call_function_queue() only reads the type from
- * __call_single_node::u_flags as a regular load, the above
- * (anonymous) enum defines all the bits of this word.
- *
- * Other bits are not modified until the type is known.
- *
- * CSD_TYPE_SYNC/ASYNC:
- *	struct {
- *		struct llist_node node;
- *		unsigned int flags;
- *		smp_call_func_t func;
- *		void *info;
- *	};
- *
- * CSD_TYPE_IRQ_WORK:
- *	struct {
- *		struct llist_node node;
- *		atomic_t flags;
- *		void (*func)(struct irq_work *);
- *	};
- *
- * CSD_TYPE_TTWU:
- *	struct {
- *		struct llist_node node;
- *		unsigned int flags;
- *	};
- *
- */
-
-struct __call_single_node {
-	struct llist_node	llist;
-	union {
-		unsigned int	u_flags;
-		atomic_t	a_flags;
-	};
-#ifdef CONFIG_64BIT
-	u16 src, dst;
-#endif
-};
-
-#endif /* __LINUX_SMP_TYPES_H */
diff --git a/kernel/smp.c b/kernel/smp.c
index 01a7c1706a58..b2b3878f0330 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -29,73 +29,8 @@
 #include "smpboot.h"
 #include "sched/smp.h"
 
-#define CSD_TYPE(_csd)	((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
-
-#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
-union cfd_seq_cnt {
-	u64		val;
-	struct {
-		u64	src:16;
-		u64	dst:16;
-#define CFD_SEQ_NOCPU	0xffff
-		u64	type:4;
-#define CFD_SEQ_QUEUE	0
-#define CFD_SEQ_IPI	1
-#define CFD_SEQ_NOIPI	2
-#define CFD_SEQ_PING	3
-#define CFD_SEQ_PINGED	4
-#define CFD_SEQ_HANDLE	5
-#define CFD_SEQ_DEQUEUE	6
-#define CFD_SEQ_IDLE	7
-#define CFD_SEQ_GOTIPI	8
-#define CFD_SEQ_HDLEND	9
-		u64	cnt:28;
-	}		u;
-};
-
-static char *seq_type[] = {
-	[CFD_SEQ_QUEUE]		= "queue",
-	[CFD_SEQ_IPI]		= "ipi",
-	[CFD_SEQ_NOIPI]		= "noipi",
-	[CFD_SEQ_PING]		= "ping",
-	[CFD_SEQ_PINGED]	= "pinged",
-	[CFD_SEQ_HANDLE]	= "handle",
-	[CFD_SEQ_DEQUEUE]	= "dequeue (src CPU 0 == empty)",
-	[CFD_SEQ_IDLE]		= "idle",
-	[CFD_SEQ_GOTIPI]	= "gotipi",
-	[CFD_SEQ_HDLEND]	= "hdlend (src CPU 0 == early)",
-};
-
-struct cfd_seq_local {
-	u64	ping;
-	u64	pinged;
-	u64	handle;
-	u64	dequeue;
-	u64	idle;
-	u64	gotipi;
-	u64	hdlend;
-};
-#endif
-
-struct cfd_percpu {
-	call_single_data_t	csd;
-#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
-	u64	seq_queue;
-	u64	seq_ipi;
-	u64	seq_noipi;
-#endif
-};
-
-struct call_function_data {
-	struct cfd_percpu	__percpu *pcpu;
-	cpumask_var_t		cpumask;
-	cpumask_var_t		cpumask_ipi;
-};
-
 static DEFINE_PER_CPU_ALIGNED(struct call_function_data, cfd_data);
-
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);
-
 static void flush_smp_call_function_queue(bool warn_cpu_offline);
 
 int smpcfd_prepare_cpu(unsigned int cpu)
-- 
2.27.0



* [PATCH 02/11] smp: cross call interface
  2022-04-15  2:46 [PATCH 00/11] smp: cross CPU call interface Donghai Qiao
  2022-04-15  2:46 ` [PATCH 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
@ 2022-04-15  2:46 ` Donghai Qiao
  2022-04-15  2:46 ` [PATCH 03/11] smp: eliminate SCF_WAIT and SCF_RUN_LOCAL Donghai Qiao
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Donghai Qiao @ 2022-04-15  2:46 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

The functions of the cross CPU call interface are defined below:

int smp_xcall(int cpu, smp_call_func_t func, void *info,
		unsigned int flags)

int smp_xcall_cond(int cpu, smp_call_func_t func, void *info,
		smp_cond_func_t condf, unsigned int flags)

void smp_xcall_mask(const struct cpumask *mask, smp_call_func_t func,
		void *info, unsigned int flags)

void smp_xcall_mask_cond(const struct cpumask *mask, smp_call_func_t func,
		void *info, smp_cond_func_t condf, unsigned int flags)

int smp_xcall_private(int cpu, call_single_data_t *csd, unsigned int flags)

int smp_xcall_any(const struct cpumask *mask, smp_call_func_t func,
		void *info, unsigned int flags)

The motivation for submitting this patch set is to turn the existing
cross CPU call mechanism into a more formal interface that is
friendlier to kernel developers.

Basically, the minimum set of functions below can satisfy any demand
for cross CPU calls from kernel consumers. For the sake of simplicity,
clarity and less code redundancy, the functions in this interface are
renamed, simplified, or eliminated, but they still inherit the same
semantics and parameter lists from their previous versions.

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
 include/linux/smp.h |  30 +++++++++
 kernel/smp.c        | 156 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 186 insertions(+)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 31811da856a3..12d6efef34f7 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -161,6 +161,36 @@ do {						\
 	*(_csd) = CSD_INIT((_func), (_info));	\
 } while (0)
 
+
+/*
+ * smp_xcall Interface.
+ *
+ * Also see kernel/smp.c for the details.
+ */
+#define	XCALL_TYPE_SYNC		CSD_TYPE_SYNC
+#define	XCALL_TYPE_ASYNC	CSD_TYPE_ASYNC
+#define	XCALL_TYPE_IRQ_WORK	CSD_TYPE_IRQ_WORK
+#define	XCALL_TYPE_TTWU		CSD_TYPE_TTWU
+#define	XCALL_TYPE_MASK		CSD_FLAG_TYPE_MASK
+
+#define	XCALL_ALL		-1
+
+extern int smp_xcall(int cpu, smp_call_func_t func, void *info, unsigned int flags);
+
+extern int smp_xcall_cond(int cpu, smp_call_func_t func, void *info,
+		smp_cond_func_t condf, unsigned int flags);
+
+extern void smp_xcall_mask(const struct cpumask *mask, smp_call_func_t func,
+		void *info, unsigned int flags);
+
+extern void smp_xcall_mask_cond(const struct cpumask *mask, smp_call_func_t func,
+		void *info, smp_cond_func_t condf, unsigned int flags);
+
+extern int smp_xcall_private(int cpu, call_single_data_t *csd, unsigned int flags);
+
+extern int smp_xcall_any(const struct cpumask *mask, smp_call_func_t func,
+		void *info, unsigned int flags);
+
 /*
  * Enqueue a llist_node on the call_single_queue; be very careful, read
  * flush_smp_call_function_queue() in detail.
diff --git a/kernel/smp.c b/kernel/smp.c
index b2b3878f0330..6183a3586329 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -1170,3 +1170,159 @@ int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
 	return sscs.ret;
 }
 EXPORT_SYMBOL_GPL(smp_call_on_cpu);
+
+
+void __smp_call_mask_cond(const struct cpumask *mask,
+		smp_call_func_t func, void *info,
+		smp_cond_func_t cond_func,
+		unsigned int flags)
+{
+}
+
+/*
+ * smp_xcall Interface
+ *
+ * Consolidate the cross CPU call usage from the history below:
+ *
+ * Normally this interface cannot be used with interrupts disabled or
+ * from a hardware interrupt handler or from a bottom half handler.
+ * But there are two exceptions:
+ * 1) It can be used during early boot while early_boot_irqs_disabled
+ *    is set. In this scenario, you should use local_irq_save/restore()
+ *    instead of local_irq_disable/enable()
+ * 2) smp_xcall_private(cpu, csd, XCALL_TYPE_ASYNC) is an asynchronous
+ *    call with a preallocated csd structure, so it can be called from
+ *    contexts where interrupts are disabled.
+ */
+
+/*
+ * Parameters:
+ *
+ * cpu: If cpu >=0 && cpu < nr_cpu_ids, the cross call is for that cpu.
+ *      If cpu == -1, the cross call is for all the online CPUs
+ *
+ * func: It is the cross function that the destination CPUs need to execute.
+ *       This function must be fast and non-blocking.
+ *
+ * info: It is the parameter to func().
+ *
+ * flags: The flags specify whether the cross call is performed
+ *	  synchronously or asynchronously.
+ *
+ *	  A synchronous cross call does not return until all the
+ *	  destination CPUs have executed func() and responded to the call.
+ *
+ *	  An asynchronous cross call returns as soon as it has fired all
+ *	  the cross calls and run func() locally if needed, regardless of
+ *	  the status of the target CPUs.
+ *
+ * Return: %0 on success or negative errno value on error.
+ */
+int smp_xcall(int cpu, smp_call_func_t func, void *info, unsigned int flags)
+{
+	return smp_xcall_cond(cpu, func, info, NULL, flags);
+}
+EXPORT_SYMBOL(smp_xcall);
+
+/*
+ * Parameters:
+ *
+ * cond_func: This is a condition function cond_func(cpu, info) invoked by
+ *	      the underlying cross call mechanism only. If the return value
+ *	      from cond_func(cpu, info) is true, the cross call will be sent
+ *	      to that cpu, otherwise the call will not be sent.
+ *
+ * Others: see smp_xcall().
+ *
+ * Return: %0 on success or negative errno value on error.
+ */
+int smp_xcall_cond(int cpu, smp_call_func_t func, void *info,
+		    smp_cond_func_t cond_func, unsigned int flags)
+{
+	preempt_disable();
+	if (cpu == XCALL_ALL) {
+		__smp_call_mask_cond(cpu_online_mask, func, info, cond_func, flags);
+	} else if ((unsigned int)cpu < nr_cpu_ids)
+		__smp_call_mask_cond(cpumask_of(cpu), func, info, cond_func, flags);
+	else {
+		preempt_enable();
+		pr_warn("Invalid cpu ID = %d\n", cpu);
+		return -ENXIO;
+	}
+	preempt_enable();
+	return 0;
+}
+EXPORT_SYMBOL(smp_xcall_cond);
+
+/*
+ * Parameters:
+ *
+ * mask: This is the bitmap of CPUs to which the cross call will be sent.
+ *
+ * Others: see smp_xcall().
+ */
+void smp_xcall_mask(const struct cpumask *mask, smp_call_func_t func,
+		void *info, unsigned int flags)
+{
+	preempt_disable();
+	__smp_call_mask_cond(mask, func, info, NULL, flags);
+	preempt_enable();
+}
+EXPORT_SYMBOL(smp_xcall_mask);
+
+/*
+ * The combination of smp_xcall_cond() and smp_xcall_mask()
+ */
+void smp_xcall_mask_cond(const struct cpumask *mask,
+		smp_call_func_t func, void *info,
+		smp_cond_func_t cond_func,
+		unsigned int flags)
+{
+	preempt_disable();
+	__smp_call_mask_cond(mask, func, info, cond_func, flags);
+	preempt_enable();
+}
+EXPORT_SYMBOL(smp_xcall_mask_cond);
+
+/*
+ * This function provides an alternative way of sending an xcall to
+ * only one CPU with a private csd instead of using the csd resource of
+ * the xcall. But it is the caller's responsibility to set up and maintain
+ * its private call_single_data_t structure.
+ *
+ * Because the call is asynchronous with a preallocated csd structure,
+ * it can be called from contexts with disabled interrupts.
+ *
+ * Parameters
+ *
+ * cpu:   Must be a positive value less than nr_cpu_ids.
+ * csd:   The private csd provided by the caller.
+ *
+ * Others: see smp_xcall().
+ */
+int smp_xcall_private(int cpu, call_single_data_t *csd, unsigned int flags)
+{
+	return 0;
+}
+EXPORT_SYMBOL(smp_xcall_private);
+
+/*
+ * Parameters:
+ *
+ * mask:  Run func() on one of the given CPUs in mask if it is online.
+ *        CPU selection preference (from the original comments for
+ *        smp_call_function_any()) :
+ *          1) current cpu if in @mask
+ *          2) any cpu of current node if in @mask
+ *          3) any other online cpu in @mask
+ *
+ * Others, see smp_xcall().
+ *
+ * Returns 0 on success, else a negative status code (if no cpus were online).
+ */
+int smp_xcall_any(const struct cpumask *mask, smp_call_func_t func,
+		void *info, unsigned int flags)
+{
+	return 0;
+}
+EXPORT_SYMBOL(smp_xcall_any);
-- 
2.27.0



* [PATCH 03/11] smp: eliminate SCF_WAIT and SCF_RUN_LOCAL
  2022-04-15  2:46 [PATCH 00/11] smp: cross CPU call interface Donghai Qiao
  2022-04-15  2:46 ` [PATCH 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
  2022-04-15  2:46 ` [PATCH 02/11] smp: cross call interface Donghai Qiao
@ 2022-04-15  2:46 ` Donghai Qiao
  2022-04-15  2:46 ` [PATCH 04/11] smp: replace smp_call_function_single() with smp_xcall() Donghai Qiao
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Donghai Qiao @ 2022-04-15  2:46 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Commit a32a4d8a815c ("smp: Run functions concurrently in
smp_call_function_many_cond()") was meant to improve the concurrency
of cross call execution between the local and remote CPUs. The change
in smp_call_function_many_cond() did what was intended, but the new
macros SCF_WAIT and SCF_RUN_LOCAL and the code around them to handle
the local call were unnecessary, because the modified
smp_call_function_many_cond() was already able to handle the local
cross call. So these two macros can be eliminated and the code
implemented around them can be removed as well.

Also, this patch fixes an issue with a comparison between an integer
and an unsigned integer in smp_call_function_many_cond().

The changes in this patch and the changes made in subsequent patches
will eventually help eliminate the set of on_each_cpu* functions.

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
 kernel/smp.c | 32 +++++++-------------------------
 1 file changed, 7 insertions(+), 25 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 6183a3586329..3f9bc5ae7180 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -787,23 +787,13 @@ int smp_call_function_any(const struct cpumask *mask,
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
 
-/*
- * Flags to be used as scf_flags argument of smp_call_function_many_cond().
- *
- * %SCF_WAIT:		Wait until function execution is completed
- * %SCF_RUN_LOCAL:	Run also locally if local cpu is set in cpumask
- */
-#define SCF_WAIT	(1U << 0)
-#define SCF_RUN_LOCAL	(1U << 1)
-
 static void smp_call_function_many_cond(const struct cpumask *mask,
 					smp_call_func_t func, void *info,
-					unsigned int scf_flags,
+					bool wait,
 					smp_cond_func_t cond_func)
 {
 	int cpu, last_cpu, this_cpu = smp_processor_id();
 	struct call_function_data *cfd;
-	bool wait = scf_flags & SCF_WAIT;
 	bool run_remote = false;
 	bool run_local = false;
 	int nr_cpus = 0;
@@ -829,14 +819,14 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	WARN_ON_ONCE(!in_task());
 
 	/* Check if we need local execution. */
-	if ((scf_flags & SCF_RUN_LOCAL) && cpumask_test_cpu(this_cpu, mask))
+	if (cpumask_test_cpu(this_cpu, mask))
 		run_local = true;
 
 	/* Check if we need remote execution, i.e., any CPU excluding this one. */
 	cpu = cpumask_first_and(mask, cpu_online_mask);
 	if (cpu == this_cpu)
 		cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
-	if (cpu < nr_cpu_ids)
+	if ((unsigned int)cpu < nr_cpu_ids)
 		run_remote = true;
 
 	if (run_remote) {
@@ -911,12 +901,8 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
  * @mask: The set of cpus to run on (only runs on online subset).
  * @func: The function to run. This must be fast and non-blocking.
  * @info: An arbitrary pointer to pass to the function.
- * @wait: Bitmask that controls the operation. If %SCF_WAIT is set, wait
- *        (atomically) until function has completed on other CPUs. If
- *        %SCF_RUN_LOCAL is set, the function will also be run locally
- *        if the local CPU is set in the @cpumask.
- *
- * If @wait is true, then returns once @func has returned.
+ * @wait: If wait is true, the call will not return until func()
+ *        has completed on other CPUs.
  *
  * You must not call this function with disabled interrupts or from a
  * hardware interrupt handler or from a bottom half handler. Preemption
@@ -925,7 +911,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 void smp_call_function_many(const struct cpumask *mask,
 			    smp_call_func_t func, void *info, bool wait)
 {
-	smp_call_function_many_cond(mask, func, info, wait * SCF_WAIT, NULL);
+	smp_call_function_many_cond(mask, func, info, wait, NULL);
 }
 EXPORT_SYMBOL(smp_call_function_many);
 
@@ -1061,13 +1047,9 @@ void __init smp_init(void)
 void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
 			   void *info, bool wait, const struct cpumask *mask)
 {
-	unsigned int scf_flags = SCF_RUN_LOCAL;
-
-	if (wait)
-		scf_flags |= SCF_WAIT;
 
 	preempt_disable();
-	smp_call_function_many_cond(mask, func, info, scf_flags, cond_func);
+	smp_call_function_many_cond(mask, func, info, wait, cond_func);
 	preempt_enable();
 }
 EXPORT_SYMBOL(on_each_cpu_cond_mask);
-- 
2.27.0



* [PATCH 04/11] smp: replace smp_call_function_single() with smp_xcall()
  2022-04-15  2:46 [PATCH 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (2 preceding siblings ...)
  2022-04-15  2:46 ` [PATCH 03/11] smp: eliminate SCF_WAIT and SCF_RUN_LOCAL Donghai Qiao
@ 2022-04-15  2:46 ` Donghai Qiao
  2022-04-15  2:46 ` [PATCH 05/11] smp: replace smp_call_function_single_async() with smp_xcall_private() Donghai Qiao
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Donghai Qiao @ 2022-04-15  2:46 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Eliminated the percpu global csd_data and temporarily hooked
smp_call_function_single() up to smp_xcall().

There is no obvious reason or evidence that differentiating an xcall
with a single recipient from one with multiple recipients yields a
noticeable performance gain. If anything can be optimized in this
area, it is probably at the interrupt level, which is already
addressed by arch_send_call_function_single_ipi() and
arch_send_call_function_ipi_mask(). In fact, both are already taken
into account by smp_call_function_many_cond().

So it is appropriate to make this change as part of the cross
call interface.

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
 kernel/smp.c | 74 ++++++++++++++++++----------------------------------
 1 file changed, 25 insertions(+), 49 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 3f9bc5ae7180..42ecaf960963 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -399,8 +399,6 @@ static __always_inline void csd_unlock(struct __call_single_data *csd)
 	smp_store_release(&csd->node.u_flags, 0);
 }
 
-static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
-
 void __smp_call_single_queue(int cpu, struct llist_node *node)
 {
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
@@ -634,6 +632,9 @@ void flush_smp_call_function_from_idle(void)
 }
 
 /*
+ * This is a temporary hook-up. This function will be eliminated
+ * with the last patch in this series.
+ *
  * smp_call_function_single - Run a function on a specific CPU
  * @func: The function to run. This must be fast and non-blocking.
  * @info: An arbitrary pointer to pass to the function.
@@ -642,59 +643,21 @@ void flush_smp_call_function_from_idle(void)
  * Returns 0 on success, else a negative status code.
  */
 int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
-			     int wait)
+			int wait)
 {
-	call_single_data_t *csd;
-	call_single_data_t csd_stack = {
-		.node = { .u_flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC, },
-	};
-	int this_cpu;
-	int err;
-
-	/*
-	 * prevent preemption and reschedule on another processor,
-	 * as well as CPU removal
-	 */
-	this_cpu = get_cpu();
-
-	/*
-	 * Can deadlock when called with interrupts disabled.
-	 * We allow cpu's that are not yet online though, as no one else can
-	 * send smp call function interrupt to this cpu and as such deadlocks
-	 * can't happen.
-	 */
-	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
-		     && !oops_in_progress);
-
-	/*
-	 * When @wait we can deadlock when we interrupt between llist_add() and
-	 * arch_send_call_function_ipi*(); when !@wait we can deadlock due to
-	 * csd_lock() on because the interrupt context uses the same csd
-	 * storage.
-	 */
-	WARN_ON_ONCE(!in_task());
-
-	csd = &csd_stack;
-	if (!wait) {
-		csd = this_cpu_ptr(&csd_data);
-		csd_lock(csd);
-	}
-
-	csd->func = func;
-	csd->info = info;
-#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
-	csd->node.src = smp_processor_id();
-	csd->node.dst = cpu;
-#endif
+	unsigned int flags = 0;
 
-	err = generic_exec_single(cpu, csd);
+	if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu))
+		return -ENXIO;
 
 	if (wait)
-		csd_lock_wait(csd);
+		flags = XCALL_TYPE_SYNC;
+	else
+		flags = XCALL_TYPE_ASYNC;
 
-	put_cpu();
+	smp_xcall(cpu, func, info, flags);
 
-	return err;
+	return 0;
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
@@ -1159,6 +1122,19 @@ void __smp_call_mask_cond(const struct cpumask *mask,
 		smp_cond_func_t cond_func,
 		unsigned int flags)
 {
+	bool wait = false;
+
+	if (flags == XCALL_TYPE_SYNC)
+		wait = true;
+
+	preempt_disable();
+
+	/*
+	 * This is a temporary hook. The function smp_call_function_many_cond()
+	 * will be inlined here with the last patch in this series.
+	 */
+	smp_call_function_many_cond(mask, func, info, wait, cond_func);
+	preempt_enable();
 }
 
 /*
-- 
2.27.0



* [PATCH 05/11] smp: replace smp_call_function_single_async() with smp_xcall_private()
  2022-04-15  2:46 [PATCH 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (3 preceding siblings ...)
  2022-04-15  2:46 ` [PATCH 04/11] smp: replace smp_call_function_single() with smp_xcall() Donghai Qiao
@ 2022-04-15  2:46 ` Donghai Qiao
  2022-04-15  2:46 ` [PATCH 06/11] smp: use smp_xcall_private() from irq_work.c and core.c Donghai Qiao
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Donghai Qiao @ 2022-04-15  2:46 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Replaced smp_call_function_single_async() with smp_xcall_private()
and also extended smp_xcall_private() to support a single-CPU
synchronous call with a preallocated csd structure.

Ideally, the new interface smp_xcall() should be able to do what
smp_call_function_single_async() does. Because the csd is provided
and maintained by the callers, it exposes the risk of corrupting
the call_single_queue[cpu] linked list if the clients manipulate
their csd inappropriately. On the other hand, there should be no
noticeable performance advantage in providing a preallocated csd for
cross call kernel consumers. Thus, in the long run, the consumers
should change to not use this type of preallocated csd.
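
As an illustration only (not taken from this patch), a caller that
owns an embedded csd might use the new function roughly as follows;
the context structure, callback and error handling are hypothetical:

	/* Hypothetical caller-owned context embedding a csd. */
	struct my_ctx {
		call_single_data_t csd;
		int value;
	};

	static void my_remote_func(void *info)
	{
		/* runs on the destination CPU */
	}

	static int kick_cpu(struct my_ctx *ctx, int cpu)
	{
		INIT_CSD(&ctx->csd, my_remote_func, ctx);

		/* Asynchronous, so this may run with interrupts disabled. */
		return smp_xcall_private(cpu, &ctx->csd, XCALL_TYPE_ASYNC);
		/* -EBUSY means a previous call on this csd is still pending. */
	}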

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
 include/linux/smp.h |   3 +-
 kernel/smp.c        | 163 +++++++++++++++++++++-----------------------
 2 files changed, 81 insertions(+), 85 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 12d6efef34f7..8a234e707f10 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -206,7 +206,8 @@ int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
 			   void *info, bool wait, const struct cpumask *mask);
 
-int smp_call_function_single_async(int cpu, struct __call_single_data *csd);
+#define	smp_call_function_single_async(cpu, csd) \
+	smp_xcall_private(cpu, csd, XCALL_TYPE_ASYNC)
 
 /*
  * Cpus stopping functions in panic. All have default weak definitions.
diff --git a/kernel/smp.c b/kernel/smp.c
index 42ecaf960963..aef913b54f81 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -429,41 +429,6 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
 		send_call_function_single_ipi(cpu);
 }
 
-/*
- * Insert a previously allocated call_single_data_t element
- * for execution on the given CPU. data must already have
- * ->func, ->info, and ->flags set.
- */
-static int generic_exec_single(int cpu, struct __call_single_data *csd)
-{
-	if (cpu == smp_processor_id()) {
-		smp_call_func_t func = csd->func;
-		void *info = csd->info;
-		unsigned long flags;
-
-		/*
-		 * We can unlock early even for the synchronous on-stack case,
-		 * since we're doing this from the same CPU..
-		 */
-		csd_lock_record(csd);
-		csd_unlock(csd);
-		local_irq_save(flags);
-		func(info);
-		csd_lock_record(NULL);
-		local_irq_restore(flags);
-		return 0;
-	}
-
-	if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
-		csd_unlock(csd);
-		return -ENXIO;
-	}
-
-	__smp_call_single_queue(cpu, &csd->node.llist);
-
-	return 0;
-}
-
 /**
  * generic_smp_call_function_single_interrupt - Execute SMP IPI callbacks
  *
@@ -661,52 +626,6 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
-/**
- * smp_call_function_single_async() - Run an asynchronous function on a
- * 			         specific CPU.
- * @cpu: The CPU to run on.
- * @csd: Pre-allocated and setup data structure
- *
- * Like smp_call_function_single(), but the call is asynchonous and
- * can thus be done from contexts with disabled interrupts.
- *
- * The caller passes his own pre-allocated data structure
- * (ie: embedded in an object) and is responsible for synchronizing it
- * such that the IPIs performed on the @csd are strictly serialized.
- *
- * If the function is called with one csd which has not yet been
- * processed by previous call to smp_call_function_single_async(), the
- * function will return immediately with -EBUSY showing that the csd
- * object is still in progress.
- *
- * NOTE: Be careful, there is unfortunately no current debugging facility to
- * validate the correctness of this serialization.
- *
- * Return: %0 on success or negative errno value on error
- */
-int smp_call_function_single_async(int cpu, struct __call_single_data *csd)
-{
-	int err = 0;
-
-	preempt_disable();
-
-	if (csd->node.u_flags & CSD_FLAG_LOCK) {
-		err = -EBUSY;
-		goto out;
-	}
-
-	csd->node.u_flags = CSD_FLAG_LOCK;
-	smp_wmb();
-
-	err = generic_exec_single(cpu, csd);
-
-out:
-	preempt_enable();
-
-	return err;
-}
-EXPORT_SYMBOL_GPL(smp_call_function_single_async);
-
 /*
  * smp_call_function_any - Run a function on any of the given cpus
  * @mask: The mask of cpus it can run on.
@@ -1251,16 +1170,92 @@ EXPORT_SYMBOL(smp_xcall_mask_cond);
  * Because the call is asynchronous with a preallocated csd structure,
  * it can be called from contexts with disabled interrupts.
  *
- * Parameters
+ * Ideally this functionality should be part of smp_xcall_mask_cond().
+ * Because the csd is provided and maintained by the callers, merging this
+ * functionality into smp_xcall_mask_cond() will result in some extra
+ * complications in it. Until there is a better way to facilitate all
+ * kinds of xcall, let's still handle this case with a separate function.
+ *
+ * The bit CSD_FLAG_LOCK will be set to csd->node.u_flags only if the
+ * xcall is made as type CSD_TYPE_SYNC or CSD_TYPE_ASYNC.
  *
+ * Parameters:
  * cpu:   Must be a positive value less than nr_cpu_ids.
  * csd:   The private csd provided by the caller.
- *
  * Others: see smp_xcall().
+ *
+ * Return: %0 on success or negative errno value on error.
+ *
+ * The following comments are from smp_call_function_single_async():
+ *
+ *    The call is asynchronous and can thus be done from contexts with
+ *    disabled interrupts. If the function is called with one csd which
+ *    has not yet been processed by previous call, the function will
+ *    return immediately with -EBUSY showing that the csd object is
+ *    still in progress.
+ *
+ *    NOTE: Be careful, there is unfortunately no current debugging
+ *    facility to validate the correctness of this serialization.
  */
 int smp_xcall_private(int cpu, call_single_data_t *csd, unsigned int flags)
 {
-	return 0;
+	int err = 0;
+
+	if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
+		pr_warn("cpu ID must be a positive number < nr_cpu_ids and must be currently online\n");
+		return -EINVAL;
+	}
+
+	if (csd == NULL) {
+		pr_warn("csd must not be NULL\n");
+		return -EINVAL;
+	}
+
+	preempt_disable();
+	if (csd->node.u_flags & CSD_FLAG_LOCK) {
+		err = -EBUSY;
+		goto out;
+	}
+
+	/*
+	 * CSD_FLAG_LOCK is set for CSD_TYPE_SYNC or CSD_TYPE_ASYNC only.
+	 */
+	if ((flags & ~(CSD_TYPE_SYNC | CSD_TYPE_ASYNC)) == 0)
+		csd->node.u_flags = CSD_FLAG_LOCK | flags;
+	else
+		csd->node.u_flags = flags;
+
+	if (cpu == smp_processor_id()) {
+		smp_call_func_t func = csd->func;
+		void *info = csd->info;
+		unsigned long flags;
+
+		/*
+		 * We can unlock early even for the synchronous on-stack case,
+		 * since we're doing this from the same CPU..
+		 */
+		csd_lock_record(csd);
+		csd_unlock(csd);
+		local_irq_save(flags);
+		func(info);
+		csd_lock_record(NULL);
+		local_irq_restore(flags);
+		goto out;
+	}
+
+	/*
+	 * Ensure the flags are visible before the csd
+	 * goes to the queue.
+	 */
+	smp_wmb();
+
+	__smp_call_single_queue(cpu, &csd->node.llist);
+
+	if (flags & CSD_TYPE_SYNC)
+		csd_lock_wait(csd);
+out:
+	preempt_enable();
+	return err;
 }
 EXPORT_SYMBOL(smp_xcall_private);
 
-- 
2.27.0



* [PATCH 06/11] smp: use smp_xcall_private() from irq_work.c and core.c
  2022-04-15  2:46 [PATCH 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (4 preceding siblings ...)
  2022-04-15  2:46 ` [PATCH 05/11] smp: replace smp_call_function_single_async() with smp_xcall_private() Donghai Qiao
@ 2022-04-15  2:46 ` Donghai Qiao
  2022-04-15  2:46 ` [PATCH 07/11] smp: change smp_call_function_any() to smp_xcall_any() Donghai Qiao
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Donghai Qiao @ 2022-04-15  2:46 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

irq_work.c and core.c should use the cross call interface rather than
the unpublished internal function __smp_call_single_queue().

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
 kernel/irq_work.c   | 4 ++--
 kernel/sched/core.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index f7df715ec28e..dac94a625665 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -159,8 +159,8 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
 			if (!irq_work_claim(work))
 				goto out;
 		}
-
-		__smp_call_single_queue(cpu, &work->node.llist);
+		smp_xcall_private(cpu, (call_single_data_t *)&work->node.llist,
+					XCALL_TYPE_IRQ_WORK);
 	} else {
 		__irq_work_queue_local(work);
 	}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 51efaabac3e4..417355fbe32d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3780,7 +3780,7 @@ static void __ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags
 	p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);
 
 	WRITE_ONCE(rq->ttwu_pending, 1);
-	__smp_call_single_queue(cpu, &p->wake_entry.llist);
+	smp_xcall_private(cpu, (call_single_data_t *)&p->wake_entry.llist, XCALL_TYPE_TTWU);
 }
 
 void wake_up_if_idle(int cpu)
-- 
2.27.0



* [PATCH 07/11] smp: change smp_call_function_any() to smp_xcall_any()
  2022-04-15  2:46 [PATCH 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (5 preceding siblings ...)
  2022-04-15  2:46 ` [PATCH 06/11] smp: use smp_xcall_private() from irq_work.c and core.c Donghai Qiao
@ 2022-04-15  2:46 ` Donghai Qiao
  2022-04-15  2:46 ` [PATCH 08/11] smp: replace smp_call_function_many_cond() with __smp_call_mask_cond() Donghai Qiao
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Donghai Qiao @ 2022-04-15  2:46 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Rename smp_call_function_any() to smp_xcall_any() and make the
necessary changes.

Replace all invocations of smp_call_function_any() with
smp_xcall_any().

Kernel consumers could actually use smp_xcall() wherever they want
smp_call_function_any(); the extra logic handled by
smp_call_function_any() could be moved out of it and the consumers
could choose the preferred CPU themselves. But because quite a few
cross call consumers need to run their functions on just one of the
CPUs of a given CPU set, there is some advantage in adding
smp_xcall_any() to the interface.

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
 arch/arm/kernel/perf_event_v7.c           |  6 +-
 arch/arm64/kernel/perf_event.c            |  6 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  2 +-
 drivers/cpufreq/acpi-cpufreq.c            |  4 +-
 drivers/cpufreq/powernv-cpufreq.c         | 12 ++--
 drivers/perf/arm_spe_pmu.c                |  2 +-
 include/linux/smp.h                       | 12 +---
 kernel/smp.c                              | 78 ++++++++++-------------
 8 files changed, 53 insertions(+), 69 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index eb2190477da1..f07e9221019a 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1192,9 +1192,9 @@ static void armv7_read_num_pmnc_events(void *info)
 
 static int armv7_probe_num_events(struct arm_pmu *arm_pmu)
 {
-	return smp_call_function_any(&arm_pmu->supported_cpus,
-				     armv7_read_num_pmnc_events,
-				     &arm_pmu->num_events, 1);
+	return smp_xcall_any(&arm_pmu->supported_cpus,
+			     armv7_read_num_pmnc_events,
+			     &arm_pmu->num_events, XCALL_TYPE_SYNC);
 }
 
 static int armv7_a8_pmu_init(struct arm_pmu *cpu_pmu)
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index cb69ff1e6138..7e847044492b 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1186,9 +1186,9 @@ static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
 	};
 	int ret;
 
-	ret = smp_call_function_any(&cpu_pmu->supported_cpus,
-				    __armv8pmu_probe_pmu,
-				    &probe, 1);
+	ret = smp_xcall_any(&cpu_pmu->supported_cpus,
+			    __armv8pmu_probe_pmu,
+			    &probe, XCALL_TYPE_SYNC);
 	if (ret)
 		return ret;
 
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 87666275eed9..7e45da5f3c8b 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -512,7 +512,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	rr->val = 0;
 	rr->first = first;
 
-	smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
+	(void) smp_xcall_any(&d->cpu_mask, mon_event_count, rr, XCALL_TYPE_SYNC);
 }
 
 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 3d514b82d055..fd595c1cdd2f 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -312,8 +312,8 @@ static u32 drv_read(struct acpi_cpufreq_data *data, const struct cpumask *mask)
 	};
 	int err;
 
-	err = smp_call_function_any(mask, do_drv_read, &cmd, 1);
-	WARN_ON_ONCE(err);	/* smp_call_function_any() was buggy? */
+	err = smp_xcall_any(mask, do_drv_read, &cmd, XCALL_TYPE_SYNC);
+	WARN_ON_ONCE(err);	/* smp_xcall_any() was buggy? */
 	return cmd.val;
 }
 
diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index fddbd1ea1635..aa7a02e1c647 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -507,8 +507,8 @@ static unsigned int powernv_cpufreq_get(unsigned int cpu)
 {
 	struct powernv_smp_call_data freq_data;
 
-	smp_call_function_any(cpu_sibling_mask(cpu), powernv_read_cpu_freq,
-			&freq_data, 1);
+	(void) smp_xcall_any(cpu_sibling_mask(cpu), powernv_read_cpu_freq,
+			&freq_data, XCALL_TYPE_SYNC);
 
 	return freq_data.freq;
 }
@@ -820,8 +820,10 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
 	 * Use smp_call_function to send IPI and execute the
 	 * mtspr on target CPU.  We could do that without IPI
 	 * if current CPU is within policy->cpus (core)
+	 *
+	 * Shouldn't we return the value of smp_xcall_any()?
 	 */
-	smp_call_function_any(policy->cpus, set_pstate, &freq_data, 1);
+	(void) smp_xcall_any(policy->cpus, set_pstate, &freq_data, XCALL_TYPE_SYNC);
 	return 0;
 }
 
@@ -921,8 +923,8 @@ static void powernv_cpufreq_work_fn(struct work_struct *work)
 
 	cpus_read_lock();
 	cpumask_and(&mask, &chip->mask, cpu_online_mask);
-	smp_call_function_any(&mask,
-			      powernv_cpufreq_throttle_check, NULL, 0);
+	(void) smp_xcall_any(&mask, powernv_cpufreq_throttle_check,
+			     NULL, XCALL_TYPE_ASYNC);
 
 	if (!chip->restore)
 		goto out;
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index d44bcc29d99c..f81fa4a496a6 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -1108,7 +1108,7 @@ static int arm_spe_pmu_dev_init(struct arm_spe_pmu *spe_pmu)
 	cpumask_t *mask = &spe_pmu->supported_cpus;
 
 	/* Make sure we probe the hardware on a relevant CPU */
-	ret = smp_call_function_any(mask,  __arm_spe_pmu_dev_probe, spe_pmu, 1);
+	ret = smp_xcall_any(mask,  __arm_spe_pmu_dev_probe, spe_pmu, XCALL_TYPE_SYNC);
 	if (ret || !(spe_pmu->features & SPE_PMU_FEAT_DEV_PROBED))
 		return -ENXIO;
 
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 8a234e707f10..3ddd4c6107e1 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -161,6 +161,8 @@ do {						\
 	*(_csd) = CSD_INIT((_func), (_info));	\
 } while (0)
 
+extern int smp_xcall_any(const struct cpumask *mask, smp_call_func_t func,
+			 void *info, unsigned int flags);
 
 /*
  * smp_xcall Interface.
@@ -304,9 +306,6 @@ void smp_call_function(smp_call_func_t func, void *info, int wait);
 void smp_call_function_many(const struct cpumask *mask,
 			    smp_call_func_t func, void *info, bool wait);
 
-int smp_call_function_any(const struct cpumask *mask,
-			  smp_call_func_t func, void *info, int wait);
-
 void kick_all_cpus_sync(void);
 void wake_up_all_idle_cpus(void);
 
@@ -355,13 +354,6 @@ static inline void smp_send_reschedule(int cpu) { }
 			(up_smp_call_function(func, info))
 static inline void call_function_init(void) { }
 
-static inline int
-smp_call_function_any(const struct cpumask *mask, smp_call_func_t func,
-		      void *info, int wait)
-{
-	return smp_call_function_single(0, func, info, wait);
-}
-
 static inline void kick_all_cpus_sync(void) {  }
 static inline void wake_up_all_idle_cpus(void) {  }
 
diff --git a/kernel/smp.c b/kernel/smp.c
index aef913b54f81..94df3b3a38cf 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -626,49 +626,6 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
-/*
- * smp_call_function_any - Run a function on any of the given cpus
- * @mask: The mask of cpus it can run on.
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait until function has completed.
- *
- * Returns 0 on success, else a negative status code (if no cpus were online).
- *
- * Selection preference:
- *	1) current cpu if in @mask
- *	2) any cpu of current node if in @mask
- *	3) any other online cpu in @mask
- */
-int smp_call_function_any(const struct cpumask *mask,
-			  smp_call_func_t func, void *info, int wait)
-{
-	unsigned int cpu;
-	const struct cpumask *nodemask;
-	int ret;
-
-	/* Try for same CPU (cheapest) */
-	cpu = get_cpu();
-	if (cpumask_test_cpu(cpu, mask))
-		goto call;
-
-	/* Try for same node. */
-	nodemask = cpumask_of_node(cpu_to_node(cpu));
-	for (cpu = cpumask_first_and(nodemask, mask); cpu < nr_cpu_ids;
-	     cpu = cpumask_next_and(cpu, nodemask, mask)) {
-		if (cpu_online(cpu))
-			goto call;
-	}
-
-	/* Any online will do: smp_call_function_single handles nr_cpu_ids. */
-	cpu = cpumask_any_and(mask, cpu_online_mask);
-call:
-	ret = smp_call_function_single(cpu, func, info, wait);
-	put_cpu();
-	return ret;
-}
-EXPORT_SYMBOL_GPL(smp_call_function_any);
-
 static void smp_call_function_many_cond(const struct cpumask *mask,
 					smp_call_func_t func, void *info,
 					bool wait,
@@ -1276,6 +1233,39 @@ EXPORT_SYMBOL(smp_xcall_private);
 int smp_xcall_any(const struct cpumask *mask, smp_call_func_t func,
 		void *info, unsigned int flags)
 {
-	return 0;
+	int cpu;
+	const struct cpumask *nodemask;
+
+	if (mask == NULL || func == NULL ||
+	    (flags != XCALL_TYPE_SYNC && flags != XCALL_TYPE_ASYNC))
+		return -EINVAL;
+
+	/* Try for same CPU (cheapest) */
+	preempt_disable();
+	cpu = smp_processor_id();
+
+	if (cpumask_test_cpu(cpu, mask))
+		goto call;
+
+	/* Try for same node. */
+	nodemask = cpumask_of_node(cpu_to_node(cpu));
+	for (cpu = cpumask_first_and(nodemask, mask); (unsigned int)cpu < nr_cpu_ids;
+	     cpu = cpumask_next_and(cpu, nodemask, mask)) {
+		if (cpu_online(cpu))
+			goto call;
+	}
+
+	/* Any online CPU in the mask will do; bail out if there is none. */
+	cpu = cpumask_any_and(mask, cpu_online_mask);
+	if ((unsigned int)cpu >= nr_cpu_ids) {
+		preempt_enable();
+		return -ENXIO;
+	}
+
+call:
+	(void) smp_xcall(cpu, func, info, flags);
+
+	preempt_enable();
+	return 0;
 }
 EXPORT_SYMBOL(smp_xcall_any);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 08/11] smp: replace smp_call_function_many_cond() with __smp_call_mask_cond()
  2022-04-15  2:46 [PATCH 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (6 preceding siblings ...)
  2022-04-15  2:46 ` [PATCH 07/11] smp: change smp_call_function_any() to smp_xcall_any() Donghai Qiao
@ 2022-04-15  2:46 ` Donghai Qiao
  2022-04-15  2:46 ` [PATCH 09/11] smp: replace smp_call_function_single_async with smp_xcall_private Donghai Qiao
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Donghai Qiao @ 2022-04-15  2:46 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Replace smp_call_function_many_cond() with __smp_call_mask_cond()
and make the necessary changes.

Consolidate and clean up the redundant code along the paths that
invoke smp_call_function_many_cond().

on_each_cpu_cond_mask(cond_func, func, info, wait, mask) is replaced by
smp_xcall_mask_cond(mask, func, info, cond_func,
                    (wait ? XCALL_TYPE_SYNC : XCALL_TYPE_ASYNC))

smp_call_function_many(mask, func, info, wait) is replaced by
smp_xcall_mask(mask, func, info, (wait ? XCALL_TYPE_SYNC : XCALL_TYPE_ASYNC))

smp_call_function(func, info, wait) is replaced by
smp_xcall(XCALL_ALL, func, info, (wait ? XCALL_TYPE_SYNC : XCALL_TYPE_ASYNC))

on_each_cpu(func, info, wait) is replaced by
smp_xcall(XCALL_ALL, func, info, (wait ? XCALL_TYPE_SYNC : XCALL_TYPE_ASYNC))

on_each_cpu_mask(mask, func, info, wait) is replaced by
smp_xcall_mask(mask, func, info, (wait ? XCALL_TYPE_SYNC : XCALL_TYPE_ASYNC))

on_each_cpu_cond(cond_func, func, info, wait) is replaced by
smp_xcall_cond(XCALL_ALL, func, info, cond_func,
               (wait ? XCALL_TYPE_SYNC : XCALL_TYPE_ASYNC))
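
To make the mapping concrete, an illustrative conversion of a
conditional call site (not a hunk from this patch; do_drain() and
has_work() are made-up placeholders for an smp_call_func_t and an
smp_cond_func_t):

	static void do_drain(void *info)		/* placeholder callback */
	{
	}

	static bool has_work(int cpu, void *info)	/* placeholder condition */
	{
		return true;
	}

	/* before: run do_drain() on CPUs where has_work() says so, and wait */
	on_each_cpu_cond(has_work, do_drain, NULL, 1);

	/* after */
	smp_xcall_cond(XCALL_ALL, do_drain, NULL, has_work, XCALL_TYPE_SYNC);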

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
 arch/alpha/kernel/process.c                   |   2 +-
 arch/alpha/kernel/smp.c                       |  10 +-
 arch/arc/kernel/perf_event.c                  |   2 +-
 arch/arc/mm/cache.c                           |   2 +-
 arch/arc/mm/tlb.c                             |  14 +-
 arch/arm/common/bL_switcher.c                 |   2 +-
 arch/arm/kernel/machine_kexec.c               |   2 +-
 arch/arm/kernel/smp_tlb.c                     |  22 +--
 arch/arm/kernel/smp_twd.c                     |   4 +-
 arch/arm/mm/flush.c                           |   4 +-
 arch/arm/vfp/vfpmodule.c                      |   2 +-
 arch/arm64/kernel/armv8_deprecated.c          |   4 +-
 arch/arm64/kernel/perf_event.c                |   2 +-
 arch/arm64/kvm/arm.c                          |   6 +-
 arch/csky/abiv2/cacheflush.c                  |   2 +-
 arch/csky/kernel/perf_event.c                 |   2 +-
 arch/csky/kernel/smp.c                        |   2 +-
 arch/csky/mm/cachev2.c                        |   2 +-
 arch/ia64/kernel/mca.c                        |   4 +-
 arch/ia64/kernel/smp.c                        |  10 +-
 arch/ia64/kernel/uncached.c                   |   4 +-
 arch/mips/cavium-octeon/octeon-irq.c          |   4 +-
 arch/mips/cavium-octeon/setup.c               |   4 +-
 arch/mips/kernel/crash.c                      |   2 +-
 arch/mips/kernel/machine_kexec.c              |   2 +-
 arch/mips/kernel/perf_event_mipsxx.c          |   7 +-
 arch/mips/kernel/smp.c                        |   8 +-
 arch/mips/kernel/sysrq.c                      |   2 +-
 arch/mips/mm/c-r4k.c                          |   4 +-
 arch/mips/sibyte/common/cfe.c                 |   2 +-
 arch/openrisc/kernel/smp.c                    |  12 +-
 arch/parisc/kernel/cache.c                    |   4 +-
 arch/parisc/mm/init.c                         |   2 +-
 arch/powerpc/kernel/dawr.c                    |   2 +-
 arch/powerpc/kernel/kvm.c                     |   2 +-
 arch/powerpc/kernel/security.c                |   6 +-
 arch/powerpc/kernel/smp.c                     |   4 +-
 arch/powerpc/kernel/sysfs.c                   |   2 +-
 arch/powerpc/kernel/tau_6xx.c                 |   4 +-
 arch/powerpc/kexec/core_64.c                  |   2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c           |   2 +-
 arch/powerpc/mm/book3s64/pgtable.c            |   2 +-
 arch/powerpc/mm/book3s64/radix_tlb.c          |  12 +-
 arch/powerpc/mm/nohash/tlb.c                  |  10 +-
 arch/powerpc/mm/slice.c                       |   4 +-
 arch/powerpc/perf/core-book3s.c               |   2 +-
 arch/powerpc/perf/imc-pmu.c                   |   2 +-
 arch/powerpc/platforms/85xx/smp.c             |   2 +-
 arch/powerpc/platforms/powernv/idle.c         |   2 +-
 arch/powerpc/platforms/pseries/lparcfg.c      |   2 +-
 arch/riscv/mm/cacheflush.c                    |   4 +-
 arch/s390/hypfs/hypfs_diag0c.c                |   2 +-
 arch/s390/kernel/alternative.c                |   2 +-
 arch/s390/kernel/perf_cpum_cf.c               |  10 +-
 arch/s390/kernel/perf_cpum_cf_common.c        |   4 +-
 arch/s390/kernel/perf_cpum_sf.c               |   4 +-
 arch/s390/kernel/processor.c                  |   2 +-
 arch/s390/kernel/smp.c                        |   2 +-
 arch/s390/kernel/topology.c                   |   2 +-
 arch/s390/mm/pgalloc.c                        |   2 +-
 arch/s390/pci/pci_irq.c                       |   2 +-
 arch/sh/kernel/smp.c                          |  14 +-
 arch/sh/mm/cache.c                            |   2 +-
 arch/sparc/include/asm/mman.h                 |   4 +-
 arch/sparc/kernel/nmi.c                       |  12 +-
 arch/sparc/kernel/perf_event.c                |   4 +-
 arch/sparc/kernel/smp_64.c                    |   8 +-
 arch/sparc/mm/init_64.c                       |   2 +-
 arch/x86/events/core.c                        |   6 +-
 arch/x86/events/intel/core.c                  |   4 +-
 arch/x86/kernel/alternative.c                 |   2 +-
 arch/x86/kernel/amd_nb.c                      |   2 +-
 arch/x86/kernel/apic/apic.c                   |   2 +-
 arch/x86/kernel/cpu/bugs.c                    |   2 +-
 arch/x86/kernel/cpu/mce/core.c                |  12 +-
 arch/x86/kernel/cpu/mce/inject.c              |   6 +-
 arch/x86/kernel/cpu/mce/intel.c               |   2 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |   2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        |   6 +-
 arch/x86/kernel/cpu/sgx/main.c                |   5 +-
 arch/x86/kernel/cpu/umwait.c                  |   2 +-
 arch/x86/kernel/cpu/vmware.c                  |   2 +-
 arch/x86/kernel/kvm.c                         |   2 +-
 arch/x86/kernel/ldt.c                         |   2 +-
 arch/x86/kvm/x86.c                            |   4 +-
 arch/x86/lib/cache-smp.c                      |   2 +-
 arch/x86/lib/msr-smp.c                        |   2 +-
 arch/x86/mm/pat/set_memory.c                  |   4 +-
 arch/x86/mm/tlb.c                             |  12 +-
 arch/x86/xen/mmu_pv.c                         |   2 +-
 arch/x86/xen/smp_pv.c                         |   2 +-
 arch/x86/xen/suspend.c                        |   4 +-
 arch/xtensa/kernel/smp.c                      |  22 +--
 drivers/char/agp/generic.c                    |   2 +-
 drivers/clocksource/mips-gic-timer.c          |   2 +-
 drivers/cpufreq/acpi-cpufreq.c                |   6 +-
 drivers/cpufreq/tegra194-cpufreq.c            |   2 +-
 drivers/cpuidle/driver.c                      |   8 +-
 drivers/edac/amd64_edac.c                     |   4 +-
 drivers/firmware/arm_sdei.c                   |  10 +-
 drivers/gpu/drm/i915/vlv_sideband.c           |   2 +-
 drivers/hwmon/fam15h_power.c                  |   2 +-
 drivers/irqchip/irq-mvebu-pic.c               |   4 +-
 drivers/net/ethernet/marvell/mvneta.c         |  30 ++---
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |   8 +-
 drivers/platform/x86/intel_ips.c              |   4 +-
 drivers/soc/xilinx/xlnx_event_manager.c       |   2 +-
 drivers/tty/sysrq.c                           |   2 +-
 drivers/watchdog/booke_wdt.c                  |   8 +-
 fs/buffer.c                                   |   2 +-
 include/linux/smp.h                           |  53 +-------
 kernel/profile.c                              |   4 +-
 kernel/rcu/tree.c                             |   4 +-
 kernel/scftorture.c                           |   8 +-
 kernel/sched/membarrier.c                     |  12 +-
 kernel/smp.c                                  | 125 +++---------------
 kernel/time/hrtimer.c                         |   6 +-
 kernel/trace/ftrace.c                         |   6 +-
 kernel/trace/ring_buffer.c                    |   2 +-
 kernel/trace/trace.c                          |  12 +-
 kernel/trace/trace_events.c                   |   2 +-
 mm/kasan/quarantine.c                         |   2 +-
 mm/mmu_gather.c                               |   2 +-
 mm/slab.c                                     |   2 +-
 net/iucv/iucv.c                               |   6 +-
 virt/kvm/kvm_main.c                           |  10 +-
 126 files changed, 309 insertions(+), 459 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 5f8527081da9..835127b84b66 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -167,7 +167,7 @@ common_shutdown(int mode, char *restart_cmd)
 	struct halt_info args;
 	args.mode = mode;
 	args.restart_cmd = restart_cmd;
-	on_each_cpu(common_shutdown_1, &args, 0);
+	smp_xcall(XCALL_ALL, common_shutdown_1, &args, XCALL_TYPE_ASYNC);
 }
 
 void
diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index cb64e4797d2a..70df387ee9f2 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -611,7 +611,7 @@ void
 smp_imb(void)
 {
 	/* Must wait other processors to flush their icache before continue. */
-	on_each_cpu(ipi_imb, NULL, 1);
+	smp_xcall(XCALL_ALL, ipi_imb, NULL, XCALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(smp_imb);
 
@@ -626,7 +626,7 @@ flush_tlb_all(void)
 {
 	/* Although we don't have any data to pass, we do want to
 	   synchronize with the other processors.  */
-	on_each_cpu(ipi_flush_tlb_all, NULL, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_tlb_all, NULL, XCALL_TYPE_SYNC);
 }
 
 #define asn_locked() (cpu_data[smp_processor_id()].asn_lock)
@@ -661,7 +661,7 @@ flush_tlb_mm(struct mm_struct *mm)
 		}
 	}
 
-	smp_call_function(ipi_flush_tlb_mm, mm, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_tlb_mm, mm, XCALL_TYPE_SYNC);
 
 	preempt_enable();
 }
@@ -712,7 +712,7 @@ flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 	data.mm = mm;
 	data.addr = addr;
 
-	smp_call_function(ipi_flush_tlb_page, &data, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_tlb_page, &data, XCALL_TYPE_SYNC);
 
 	preempt_enable();
 }
@@ -762,7 +762,7 @@ flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
 		}
 	}
 
-	smp_call_function(ipi_flush_icache_page, mm, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_icache_page, mm, XCALL_TYPE_SYNC);
 
 	preempt_enable();
 }
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index adff957962da..ef70a4becba9 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -811,7 +811,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 						 this_cpu_ptr(&arc_pmu_cpu));
 
 			if (!ret)
-				on_each_cpu(arc_cpu_pmu_irq_init, &irq, 1);
+				smp_xcall(XCALL_ALL, arc_cpu_pmu_irq_init, &irq, XCALL_TYPE_SYNC);
 			else
 				irq = -1;
 		}
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index 8aa1231865d1..f46df918b3c7 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -569,7 +569,7 @@ static void __ic_line_inv_vaddr(phys_addr_t paddr, unsigned long vaddr,
 		.sz    = sz
 	};
 
-	on_each_cpu(__ic_line_inv_vaddr_helper, &ic_inv, 1);
+	smp_xcall(XCALL_ALL, __ic_line_inv_vaddr_helper, &ic_inv, XCALL_TYPE_SYNC);
 }
 
 #endif	/* CONFIG_SMP */
diff --git a/arch/arc/mm/tlb.c b/arch/arc/mm/tlb.c
index 5f71445f26bd..7d5b55410f4c 100644
--- a/arch/arc/mm/tlb.c
+++ b/arch/arc/mm/tlb.c
@@ -330,13 +330,13 @@ static inline void ipi_flush_tlb_kernel_range(void *arg)
 
 void flush_tlb_all(void)
 {
-	on_each_cpu((smp_call_func_t)local_flush_tlb_all, NULL, 1);
+	smp_xcall(XCALL_ALL, (smp_call_func_t)local_flush_tlb_all, NULL, XCALL_TYPE_SYNC);
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	on_each_cpu_mask(mm_cpumask(mm), (smp_call_func_t)local_flush_tlb_mm,
-			 mm, 1);
+	smp_xcall_mask(mm_cpumask(mm), (smp_call_func_t)local_flush_tlb_mm,
+			 mm, XCALL_TYPE_SYNC);
 }
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
@@ -346,7 +346,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 		.ta_start = uaddr
 	};
 
-	on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page, &ta, 1);
+	smp_xcall_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page, &ta, XCALL_TYPE_SYNC);
 }
 
 void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
@@ -358,7 +358,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 		.ta_end = end
 	};
 
-	on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range, &ta, 1);
+	smp_xcall_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range, &ta, XCALL_TYPE_SYNC);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -371,7 +371,7 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
 		.ta_end = end
 	};
 
-	on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_pmd_tlb_range, &ta, 1);
+	smp_xcall_mask(mm_cpumask(vma->vm_mm), ipi_flush_pmd_tlb_range, &ta, XCALL_TYPE_SYNC);
 }
 #endif
 
@@ -382,7 +382,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		.ta_end = end
 	};
 
-	on_each_cpu(ipi_flush_tlb_kernel_range, &ta, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_tlb_kernel_range, &ta, XCALL_TYPE_SYNC);
 }
 #endif
 
diff --git a/arch/arm/common/bL_switcher.c b/arch/arm/common/bL_switcher.c
index 9a9aa53547a6..7840aea57967 100644
--- a/arch/arm/common/bL_switcher.c
+++ b/arch/arm/common/bL_switcher.c
@@ -541,7 +541,7 @@ int bL_switcher_trace_trigger(void)
 	preempt_disable();
 
 	bL_switcher_trace_trigger_cpu(NULL);
-	smp_call_function(bL_switcher_trace_trigger_cpu, NULL, true);
+	smp_xcall(XCALL_ALL, bL_switcher_trace_trigger_cpu, NULL, XCALL_TYPE_SYNC);
 
 	preempt_enable();
 
diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index f567032a09c0..c1fed9a94fa0 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -101,7 +101,7 @@ void crash_smp_send_stop(void)
 		return;
 
 	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
-	smp_call_function(machine_crash_nonpanic_core, NULL, false);
+	smp_xcall(XCALL_ALL, machine_crash_nonpanic_core, NULL, XCALL_TYPE_ASYNC);
 	msecs = 1000; /* Wait at most a second for the other cpus to stop */
 	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
 		mdelay(1);
diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index d4908b3736d8..2113b5760003 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -158,7 +158,7 @@ static void broadcast_tlb_a15_erratum(void)
 	if (!erratum_a15_798181())
 		return;
 
-	smp_call_function(ipi_flush_tlb_a15_erratum, NULL, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_tlb_a15_erratum, NULL, XCALL_TYPE_SYNC);
 }
 
 static void broadcast_tlb_mm_a15_erratum(struct mm_struct *mm)
@@ -171,14 +171,14 @@ static void broadcast_tlb_mm_a15_erratum(struct mm_struct *mm)
 
 	this_cpu = get_cpu();
 	a15_erratum_get_cpumask(this_cpu, mm, &mask);
-	smp_call_function_many(&mask, ipi_flush_tlb_a15_erratum, NULL, 1);
+	smp_xcall_mask(&mask, ipi_flush_tlb_a15_erratum, NULL, XCALL_TYPE_SYNC);
 	put_cpu();
 }
 
 void flush_tlb_all(void)
 {
 	if (tlb_ops_need_broadcast())
-		on_each_cpu(ipi_flush_tlb_all, NULL, 1);
+		smp_xcall(XCALL_ALL, ipi_flush_tlb_all, NULL, XCALL_TYPE_SYNC);
 	else
 		__flush_tlb_all();
 	broadcast_tlb_a15_erratum();
@@ -187,7 +187,7 @@ void flush_tlb_all(void)
 void flush_tlb_mm(struct mm_struct *mm)
 {
 	if (tlb_ops_need_broadcast())
-		on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
+		smp_xcall_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, XCALL_TYPE_SYNC);
 	else
 		__flush_tlb_mm(mm);
 	broadcast_tlb_mm_a15_erratum(mm);
@@ -199,8 +199,8 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 		struct tlb_args ta;
 		ta.ta_vma = vma;
 		ta.ta_start = uaddr;
-		on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
-					&ta, 1);
+		smp_xcall_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
+			       &ta, XCALL_TYPE_SYNC);
 	} else
 		__flush_tlb_page(vma, uaddr);
 	broadcast_tlb_mm_a15_erratum(vma->vm_mm);
@@ -211,7 +211,7 @@ void flush_tlb_kernel_page(unsigned long kaddr)
 	if (tlb_ops_need_broadcast()) {
 		struct tlb_args ta;
 		ta.ta_start = kaddr;
-		on_each_cpu(ipi_flush_tlb_kernel_page, &ta, 1);
+		smp_xcall(XCALL_ALL, ipi_flush_tlb_kernel_page, &ta, XCALL_TYPE_SYNC);
 	} else
 		__flush_tlb_kernel_page(kaddr);
 	broadcast_tlb_a15_erratum();
@@ -225,8 +225,8 @@ void flush_tlb_range(struct vm_area_struct *vma,
 		ta.ta_vma = vma;
 		ta.ta_start = start;
 		ta.ta_end = end;
-		on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range,
-					&ta, 1);
+		smp_xcall_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range,
+				&ta, XCALL_TYPE_SYNC);
 	} else
 		local_flush_tlb_range(vma, start, end);
 	broadcast_tlb_mm_a15_erratum(vma->vm_mm);
@@ -238,7 +238,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		struct tlb_args ta;
 		ta.ta_start = start;
 		ta.ta_end = end;
-		on_each_cpu(ipi_flush_tlb_kernel_range, &ta, 1);
+		smp_xcall(XCALL_ALL, ipi_flush_tlb_kernel_range, &ta, XCALL_TYPE_SYNC);
 	} else
 		local_flush_tlb_kernel_range(start, end);
 	broadcast_tlb_a15_erratum();
@@ -247,7 +247,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 void flush_bp_all(void)
 {
 	if (tlb_ops_need_broadcast())
-		on_each_cpu(ipi_flush_bp_all, NULL, 1);
+		smp_xcall(XCALL_ALL, ipi_flush_bp_all, NULL, XCALL_TYPE_SYNC);
 	else
 		__flush_bp_all();
 }
diff --git a/arch/arm/kernel/smp_twd.c b/arch/arm/kernel/smp_twd.c
index 9a14f721a2b0..a279bdfe40dd 100644
--- a/arch/arm/kernel/smp_twd.c
+++ b/arch/arm/kernel/smp_twd.c
@@ -119,8 +119,8 @@ static int twd_rate_change(struct notifier_block *nb,
 	 * changing cpu.
 	 */
 	if (flags == POST_RATE_CHANGE)
-		on_each_cpu(twd_update_frequency,
-				  (void *)&cnd->new_rate, 1);
+		smp_xcall(XCALL_ALL, twd_update_frequency,
+			  (void *)&cnd->new_rate, XCALL_TYPE_SYNC);
 
 	return NOTIFY_OK;
 }
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 7ff9feea13a6..07475c23e97c 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -150,8 +150,8 @@ void __flush_ptrace_access(struct page *page, unsigned long uaddr, void *kaddr,
 		else
 			__cpuc_coherent_kern_range(addr, addr + len);
 		if (cache_ops_need_broadcast())
-			smp_call_function(flush_ptrace_access_other,
-					  NULL, 1);
+			smp_xcall(XCALL_ALL, flush_ptrace_access_other,
+				  NULL, XCALL_TYPE_SYNC);
 	}
 }
 
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 2cb355c1b5b7..41173f46bc80 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -780,7 +780,7 @@ static int __init vfp_init(void)
 	 * following test on FPSID will succeed.
 	 */
 	if (cpu_arch >= CPU_ARCH_ARMv6)
-		on_each_cpu(vfp_enable, NULL, 1);
+		smp_xcall(XCALL_ALL, vfp_enable, NULL, XCALL_TYPE_SYNC);
 
 	/*
 	 * First check that there is a VFP that we can use.
diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c
index 6875a16b09d2..5cb793e1893b 100644
--- a/arch/arm64/kernel/armv8_deprecated.c
+++ b/arch/arm64/kernel/armv8_deprecated.c
@@ -104,9 +104,9 @@ static int run_all_cpu_set_hw_mode(struct insn_emulation *insn, bool enable)
 	if (!insn->ops->set_hw_mode)
 		return -EINVAL;
 	if (enable)
-		on_each_cpu(enable_insn_hw_mode, (void *)insn, true);
+		smp_xcall(XCALL_ALL, enable_insn_hw_mode, (void *)insn, XCALL_TYPE_SYNC);
 	else
-		on_each_cpu(disable_insn_hw_mode, (void *)insn, true);
+		smp_xcall(XCALL_ALL, disable_insn_hw_mode, (void *)insn, XCALL_TYPE_SYNC);
 	return 0;
 }
 
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 7e847044492b..fc7e13dbf1f5 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1207,7 +1207,7 @@ static int armv8pmu_proc_user_access_handler(struct ctl_table *table, int write,
 	if (ret || !write || sysctl_perf_user_access)
 		return ret;
 
-	on_each_cpu(armv8pmu_disable_user_access_ipi, NULL, 1);
+	smp_xcall(XCALL_ALL, armv8pmu_disable_user_access_ipi, NULL, XCALL_TYPE_SYNC);
 	return 0;
 }
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 523bc934fe2f..68e93b87c4cf 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1728,7 +1728,7 @@ static int init_subsystems(void)
 	/*
 	 * Enable hardware so that subsystem initialisation can access EL2.
 	 */
-	on_each_cpu(_kvm_arch_hardware_enable, NULL, 1);
+	smp_xcall(XCALL_ALL, _kvm_arch_hardware_enable, NULL, XCALL_TYPE_SYNC);
 
 	/*
 	 * Register CPU lower-power notifier
@@ -1765,7 +1765,7 @@ static int init_subsystems(void)
 
 out:
 	if (err || !is_protected_kvm_enabled())
-		on_each_cpu(_kvm_arch_hardware_disable, NULL, 1);
+		smp_xcall(XCALL_ALL, _kvm_arch_hardware_disable, NULL, XCALL_TYPE_SYNC);
 
 	return err;
 }
@@ -2000,7 +2000,7 @@ static int pkvm_drop_host_privileges(void)
 	 * once the host stage 2 is installed.
 	 */
 	static_branch_enable(&kvm_protected_mode_initialized);
-	on_each_cpu(_kvm_host_prot_finalize, &ret, 1);
+	smp_xcall(XCALL_ALL, _kvm_host_prot_finalize, &ret, XCALL_TYPE_SYNC);
 	return ret;
 }
 
diff --git a/arch/csky/abiv2/cacheflush.c b/arch/csky/abiv2/cacheflush.c
index 39c51399dd81..68bfe3b618cc 100644
--- a/arch/csky/abiv2/cacheflush.c
+++ b/arch/csky/abiv2/cacheflush.c
@@ -80,7 +80,7 @@ void flush_icache_mm_range(struct mm_struct *mm,
 	cpumask_andnot(&others, mm_cpumask(mm), cpumask_of(cpu));
 
 	if (mm != current->active_mm || !cpumask_empty(&others)) {
-		on_each_cpu_mask(&others, local_icache_inv_all, NULL, 1);
+		smp_xcall_mask(&others, local_icache_inv_all, NULL, XCALL_TYPE_SYNC);
 		cpumask_clear(mask);
 	}
 
diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
index e5f18420ce64..35f4c8071c64 100644
--- a/arch/csky/kernel/perf_event.c
+++ b/arch/csky/kernel/perf_event.c
@@ -1311,7 +1311,7 @@ int csky_pmu_device_probe(struct platform_device *pdev,
 	csky_pmu.plat_device = pdev;
 
 	/* Ensure the PMU has sane values out of reset. */
-	on_each_cpu(csky_pmu_reset, &csky_pmu, 1);
+	smp_xcall(XCALL_ALL, csky_pmu_reset, &csky_pmu, XCALL_TYPE_SYNC);
 
 	ret = csky_pmu_request_irq(csky_pmu_handle_irq);
 	if (ret) {
diff --git a/arch/csky/kernel/smp.c b/arch/csky/kernel/smp.c
index 6bb38bc2f39b..728b157e8323 100644
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -137,7 +137,7 @@ static void ipi_stop(void *unused)
 
 void smp_send_stop(void)
 {
-	on_each_cpu(ipi_stop, NULL, 1);
+	smp_xcall(XCALL_ALL, ipi_stop, NULL, XCALL_TYPE_SYNC);
 }
 
 void smp_send_reschedule(int cpu)
diff --git a/arch/csky/mm/cachev2.c b/arch/csky/mm/cachev2.c
index 7a9664adce43..289ad7852c70 100644
--- a/arch/csky/mm/cachev2.c
+++ b/arch/csky/mm/cachev2.c
@@ -66,7 +66,7 @@ void icache_inv_range(unsigned long start, unsigned long end)
 	if (irqs_disabled())
 		local_icache_inv_range(&param);
 	else
-		on_each_cpu(local_icache_inv_range, &param, 1);
+		smp_xcall(XCALL_ALL, local_icache_inv_range, &param, XCALL_TYPE_SYNC);
 }
 #endif
 
diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c
index e628a88607bb..eeb2dbdf2886 100644
--- a/arch/ia64/kernel/mca.c
+++ b/arch/ia64/kernel/mca.c
@@ -712,7 +712,7 @@ ia64_mca_cmc_vector_enable (void *dummy)
 static void
 ia64_mca_cmc_vector_disable_keventd(struct work_struct *unused)
 {
-	on_each_cpu(ia64_mca_cmc_vector_disable, NULL, 0);
+	smp_xcall(XCALL_ALL, ia64_mca_cmc_vector_disable, NULL, XCALL_TYPE_ASYNC);
 }
 
 /*
@@ -724,7 +724,7 @@ ia64_mca_cmc_vector_disable_keventd(struct work_struct *unused)
 static void
 ia64_mca_cmc_vector_enable_keventd(struct work_struct *unused)
 {
-	on_each_cpu(ia64_mca_cmc_vector_enable, NULL, 0);
+	smp_xcall(XCALL_ALL, ia64_mca_cmc_vector_enable, NULL, XCALL_TYPE_ASYNC);
 }
 
 /*
diff --git a/arch/ia64/kernel/smp.c b/arch/ia64/kernel/smp.c
index 7b7b64eb3129..76a76388f1c6 100644
--- a/arch/ia64/kernel/smp.c
+++ b/arch/ia64/kernel/smp.c
@@ -285,7 +285,7 @@ smp_flush_tlb_cpumask(cpumask_t xcpumask)
 void
 smp_flush_tlb_all (void)
 {
-	on_each_cpu((void (*)(void *))local_flush_tlb_all, NULL, 1);
+	smp_xcall(XCALL_ALL, (void (*)(void *))local_flush_tlb_all, NULL, XCALL_TYPE_SYNC);
 }
 
 void
@@ -301,12 +301,12 @@ smp_flush_tlb_mm (struct mm_struct *mm)
 		return;
 	}
 	if (!alloc_cpumask_var(&cpus, GFP_ATOMIC)) {
-		smp_call_function((void (*)(void *))local_finish_flush_tlb_mm,
-			mm, 1);
+		smp_xcall(XCALL_ALL, (void (*)(void *))local_finish_flush_tlb_mm,
+			  mm, XCALL_TYPE_SYNC);
 	} else {
 		cpumask_copy(cpus, mm_cpumask(mm));
-		smp_call_function_many(cpus,
-			(void (*)(void *))local_finish_flush_tlb_mm, mm, 1);
+		smp_xcall_mask(cpus,
+			(void (*)(void *))local_finish_flush_tlb_mm, mm, XCALL_TYPE_SYNC);
 		free_cpumask_var(cpus);
 	}
 	local_irq_disable();
diff --git a/arch/ia64/kernel/uncached.c b/arch/ia64/kernel/uncached.c
index 816803636a75..a6166b4f81dd 100644
--- a/arch/ia64/kernel/uncached.c
+++ b/arch/ia64/kernel/uncached.c
@@ -118,7 +118,7 @@ static int uncached_add_chunk(struct uncached_pool *uc_pool, int nid)
 	status = ia64_pal_prefetch_visibility(PAL_VISIBILITY_PHYSICAL);
 	if (status == PAL_VISIBILITY_OK_REMOTE_NEEDED) {
 		atomic_set(&uc_pool->status, 0);
-		smp_call_function(uncached_ipi_visibility, uc_pool, 1);
+		smp_xcall(XCALL_ALL, uncached_ipi_visibility, uc_pool, XCALL_TYPE_SYNC);
 		if (atomic_read(&uc_pool->status))
 			goto failed;
 	} else if (status != PAL_VISIBILITY_OK)
@@ -137,7 +137,7 @@ static int uncached_add_chunk(struct uncached_pool *uc_pool, int nid)
 	if (status != PAL_STATUS_SUCCESS)
 		goto failed;
 	atomic_set(&uc_pool->status, 0);
-	smp_call_function(uncached_ipi_mc_drain, uc_pool, 1);
+	smp_xcall(XCALL_ALL, uncached_ipi_mc_drain, uc_pool, XCALL_TYPE_SYNC);
 	if (atomic_read(&uc_pool->status))
 		goto failed;
 
diff --git a/arch/mips/cavium-octeon/octeon-irq.c b/arch/mips/cavium-octeon/octeon-irq.c
index 07d7ff5a981d..a6a0b8e50fab 100644
--- a/arch/mips/cavium-octeon/octeon-irq.c
+++ b/arch/mips/cavium-octeon/octeon-irq.c
@@ -216,7 +216,7 @@ static void octeon_irq_core_bus_sync_unlock(struct irq_data *data)
 	struct octeon_core_chip_data *cd = irq_data_get_irq_chip_data(data);
 
 	if (cd->desired_en != cd->current_en) {
-		on_each_cpu(octeon_irq_core_set_enable_local, data, 1);
+		smp_xcall(XCALL_ALL, octeon_irq_core_set_enable_local, data, XCALL_TYPE_SYNC);
 
 		cd->current_en = cd->desired_en;
 	}
@@ -1364,7 +1364,7 @@ void octeon_irq_set_ip4_handler(octeon_irq_ip4_handler_t h)
 {
 	octeon_irq_ip4 = h;
 	octeon_irq_use_ip4 = true;
-	on_each_cpu(octeon_irq_local_enable_ip4, NULL, 1);
+	smp_xcall(XCALL_ALL, octeon_irq_local_enable_ip4, NULL, XCALL_TYPE_SYNC);
 }
 
 static void octeon_irq_percpu_enable(void)
diff --git a/arch/mips/cavium-octeon/setup.c b/arch/mips/cavium-octeon/setup.c
index 00bf269763cf..dedfc714f733 100644
--- a/arch/mips/cavium-octeon/setup.c
+++ b/arch/mips/cavium-octeon/setup.c
@@ -256,7 +256,7 @@ static void octeon_shutdown(void)
 {
 	octeon_generic_shutdown();
 #ifdef CONFIG_SMP
-	smp_call_function(octeon_kexec_smp_down, NULL, 0);
+	smp_xcall(XCALL_ALL, octeon_kexec_smp_down, NULL, XCALL_TYPE_ASYNC);
 	smp_wmb();
 	while (num_online_cpus() > 1) {
 		cpu_relax();
@@ -469,7 +469,7 @@ static void octeon_kill_core(void *arg)
  */
 static void octeon_halt(void)
 {
-	smp_call_function(octeon_kill_core, NULL, 0);
+	smp_xcall(XCALL_ALL, octeon_kill_core, NULL, XCALL_TYPE_ASYNC);
 
 	switch (octeon_bootinfo->board_type) {
 	case CVMX_BOARD_TYPE_NAO38:
diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c
index 81845ba04835..3c28e5627f89 100644
--- a/arch/mips/kernel/crash.c
+++ b/arch/mips/kernel/crash.c
@@ -63,7 +63,7 @@ static void crash_kexec_prepare_cpus(void)
 
 	ncpus = num_online_cpus() - 1;/* Excluding the panic cpu */
 
-	smp_call_function(crash_shutdown_secondary, NULL, 0);
+	smp_xcall(XCALL_ALL, crash_shutdown_secondary, NULL, XCALL_TYPE_ASYNC);
 	smp_wmb();
 
 	/*
diff --git a/arch/mips/kernel/machine_kexec.c b/arch/mips/kernel/machine_kexec.c
index 432bfd3e7f22..f995c82b0375 100644
--- a/arch/mips/kernel/machine_kexec.c
+++ b/arch/mips/kernel/machine_kexec.c
@@ -139,7 +139,7 @@ machine_shutdown(void)
 		_machine_kexec_shutdown();
 
 #ifdef CONFIG_SMP
-	smp_call_function(kexec_shutdown_secondary, NULL, 0);
+	smp_xcall(XCALL_ALL, kexec_shutdown_secondary, NULL, XCALL_TYPE_ASYNC);
 
 	while (num_online_cpus() > 1) {
 		cpu_relax();
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 1641d274fe37..88ab3bc40741 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -642,8 +642,9 @@ static void hw_perf_event_destroy(struct perf_event *event)
 		 * We must not call the destroy function with interrupts
 		 * disabled.
 		 */
-		on_each_cpu(reset_counters,
-			(void *)(long)mipspmu.num_counters, 1);
+		smp_xcall(XCALL_ALL, reset_counters,
+			(void *)(long)mipspmu.num_counters, XCALL_TYPE_SYNC);
+
 		mipspmu_free_irq();
 		mutex_unlock(&pmu_reserve_mutex);
 	}
@@ -2043,7 +2044,7 @@ init_hw_perf_events(void)
 		mipspmu.write_counter = mipsxx_pmu_write_counter;
 	}
 
-	on_each_cpu(reset_counters, (void *)(long)counters, 1);
+	smp_xcall(XCALL_ALL, reset_counters, (void *)(long)counters, XCALL_TYPE_SYNC);
 
 	pr_cont("%s PMU enabled, %d %d-bit counters available to each "
 		"CPU, irq %d%s\n", mipspmu.name, counters, counter_bits, irq,
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 1986d1309410..d5bb38bfaef5 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -397,7 +397,7 @@ static void stop_this_cpu(void *dummy)
 
 void smp_send_stop(void)
 {
-	smp_call_function(stop_this_cpu, NULL, 0);
+	smp_xcall(XCALL_ALL, stop_this_cpu, NULL, XCALL_TYPE_ASYNC);
 }
 
 void __init smp_cpus_done(unsigned int max_cpus)
@@ -472,7 +472,7 @@ void flush_tlb_all(void)
 		return;
 	}
 
-	on_each_cpu(flush_tlb_all_ipi, NULL, 1);
+	smp_xcall(XCALL_ALL, flush_tlb_all_ipi, NULL, XCALL_TYPE_SYNC);
 }
 
 static void flush_tlb_mm_ipi(void *mm)
@@ -490,7 +490,7 @@ static void flush_tlb_mm_ipi(void *mm)
  */
 static inline void smp_on_other_tlbs(void (*func) (void *info), void *info)
 {
-	smp_call_function(func, info, 1);
+	smp_xcall(XCALL_ALL, func, info, XCALL_TYPE_SYNC);
 }
 
 static inline void smp_on_each_tlb(void (*func) (void *info), void *info)
@@ -617,7 +617,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		.addr2 = end,
 	};
 
-	on_each_cpu(flush_tlb_kernel_range_ipi, &fd, 1);
+	smp_xcall(XCALL_ALL, flush_tlb_kernel_range_ipi, &fd, XCALL_TYPE_SYNC);
 }
 
 static void flush_tlb_page_ipi(void *info)
diff --git a/arch/mips/kernel/sysrq.c b/arch/mips/kernel/sysrq.c
index 9c1a2019113b..352ace351a5c 100644
--- a/arch/mips/kernel/sysrq.c
+++ b/arch/mips/kernel/sysrq.c
@@ -38,7 +38,7 @@ static void sysrq_tlbdump_single(void *dummy)
 #ifdef CONFIG_SMP
 static void sysrq_tlbdump_othercpus(struct work_struct *dummy)
 {
-	smp_call_function(sysrq_tlbdump_single, NULL, 0);
+	smp_xcall(XCALL_ALL, sysrq_tlbdump_single, NULL, XCALL_TYPE_ASYNC);
 }
 
 static DECLARE_WORK(sysrq_tlbdump, sysrq_tlbdump_othercpus);
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index ccb9e47322b0..451662ee259c 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -96,8 +96,8 @@ static inline void r4k_on_each_cpu(unsigned int type,
 {
 	preempt_disable();
 	if (r4k_op_needs_ipi(type))
-		smp_call_function_many(&cpu_foreign_map[smp_processor_id()],
-				       func, info, 1);
+		smp_xcall_mask(&cpu_foreign_map[smp_processor_id()],
+				       func, info, XCALL_TYPE_SYNC);
 	func(info);
 	preempt_enable();
 }
diff --git a/arch/mips/sibyte/common/cfe.c b/arch/mips/sibyte/common/cfe.c
index 1a504294d85f..d3bc788cec0e 100644
--- a/arch/mips/sibyte/common/cfe.c
+++ b/arch/mips/sibyte/common/cfe.c
@@ -57,7 +57,7 @@ static void __noreturn cfe_linux_exit(void *arg)
 		if (!reboot_smp) {
 			/* Get CPU 0 to do the cfe_exit */
 			reboot_smp = 1;
-			smp_call_function(cfe_linux_exit, arg, 0);
+			smp_xcall(XCALL_ALL, cfe_linux_exit, arg, XCALL_TYPE_ASYNC);
 		}
 	} else {
 		printk("Passing control back to CFE...\n");
diff --git a/arch/openrisc/kernel/smp.c b/arch/openrisc/kernel/smp.c
index 27041db2c8b0..45e35fde4030 100644
--- a/arch/openrisc/kernel/smp.c
+++ b/arch/openrisc/kernel/smp.c
@@ -194,7 +194,7 @@ static void stop_this_cpu(void *dummy)
 
 void smp_send_stop(void)
 {
-	smp_call_function(stop_this_cpu, NULL, 0);
+	smp_xcall(XCALL_ALL, stop_this_cpu, NULL, XCALL_TYPE_ASYNC);
 }
 
 /* not supported, yet */
@@ -244,7 +244,7 @@ static void smp_flush_tlb_mm(struct cpumask *cmask, struct mm_struct *mm)
 		/* local cpu is the only cpu present in cpumask */
 		local_flush_tlb_mm(mm);
 	} else {
-		on_each_cpu_mask(cmask, ipi_flush_tlb_mm, mm, 1);
+		smp_xcall_mask(cmask, ipi_flush_tlb_mm, mm, XCALL_TYPE_SYNC);
 	}
 	put_cpu();
 }
@@ -291,16 +291,16 @@ static void smp_flush_tlb_range(const struct cpumask *cmask, unsigned long start
 		fd.addr2 = end;
 
 		if ((end - start) <= PAGE_SIZE)
-			on_each_cpu_mask(cmask, ipi_flush_tlb_page, &fd, 1);
+			smp_xcall_mask(cmask, ipi_flush_tlb_page, &fd, XCALL_TYPE_SYNC);
 		else
-			on_each_cpu_mask(cmask, ipi_flush_tlb_range, &fd, 1);
+			smp_xcall_mask(cmask, ipi_flush_tlb_range, &fd, XCALL_TYPE_SYNC);
 	}
 	put_cpu();
 }
 
 void flush_tlb_all(void)
 {
-	on_each_cpu(ipi_flush_tlb_all, NULL, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_tlb_all, NULL, XCALL_TYPE_SYNC);
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
@@ -331,6 +331,6 @@ static void ipi_icache_page_inv(void *arg)
 
 void smp_icache_page_inv(struct page *page)
 {
-	on_each_cpu(ipi_icache_page_inv, page, 1);
+	smp_xcall(XCALL_ALL, ipi_icache_page_inv, page, XCALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(smp_icache_page_inv);
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index 23348199f3f8..8619591439c6 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -81,13 +81,13 @@ void flush_cache_all_local(void)
 void flush_cache_all(void)
 {
 	if (static_branch_likely(&parisc_has_cache))
-		on_each_cpu(cache_flush_local_cpu, NULL, 1);
+		smp_xcall(XCALL_ALL, cache_flush_local_cpu, NULL, XCALL_TYPE_SYNC);
 }
 
 static inline void flush_data_cache(void)
 {
 	if (static_branch_likely(&parisc_has_dcache))
-		on_each_cpu(flush_data_cache_local, NULL, 1);
+		smp_xcall(XCALL_ALL, flush_data_cache_local, NULL, XCALL_TYPE_SYNC);
 }
 
 
diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
index 1dc2e88e7b04..3cf4f06e8386 100644
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -847,7 +847,7 @@ void flush_tlb_all(void)
 	    do_recycle++;
 	}
 	spin_unlock(&sid_lock);
-	on_each_cpu(flush_tlb_all_local, NULL, 1);
+	smp_xcall(XCALL_ALL, flush_tlb_all_local, NULL, XCALL_TYPE_SYNC);
 	if (do_recycle) {
 	    spin_lock(&sid_lock);
 	    recycle_sids(recycle_ndirty,recycle_dirty_array);
diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c
index 64e423d2fe0f..58031057ec1e 100644
--- a/arch/powerpc/kernel/dawr.c
+++ b/arch/powerpc/kernel/dawr.c
@@ -77,7 +77,7 @@ static ssize_t dawr_write_file_bool(struct file *file,
 
 	/* If we are clearing, make sure all CPUs have the DAWR cleared */
 	if (!dawr_force_enable)
-		smp_call_function(disable_dawrs_cb, NULL, 0);
+		smp_xcall(XCALL_ALL, disable_dawrs_cb, NULL, XCALL_TYPE_ASYNC);
 
 	return rc;
 }
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 6568823cf306..6923156e2379 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -666,7 +666,7 @@ static void __init kvm_use_magic_page(void)
 	u32 features;
 
 	/* Tell the host to map the magic page to -4096 on all CPUs */
-	on_each_cpu(kvm_map_magic_page, &features, 1);
+	smp_xcall(XCALL_ALL, kvm_map_magic_page, &features, XCALL_TYPE_SYNC);
 
 	/* Quick self-test to see if the mapping works */
 	if (fault_in_readable((const char __user *)KVM_MAGIC_PAGE,
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index d96fd14bd7c9..31a4e9856e00 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -607,7 +607,7 @@ void rfi_flush_enable(bool enable)
 {
 	if (enable) {
 		do_rfi_flush_fixups(enabled_flush_types);
-		on_each_cpu(do_nothing, NULL, 1);
+		smp_xcall(XCALL_ALL, do_nothing, NULL, XCALL_TYPE_SYNC);
 	} else
 		do_rfi_flush_fixups(L1D_FLUSH_NONE);
 
@@ -618,7 +618,7 @@ static void entry_flush_enable(bool enable)
 {
 	if (enable) {
 		do_entry_flush_fixups(enabled_flush_types);
-		on_each_cpu(do_nothing, NULL, 1);
+		smp_xcall(XCALL_ALL, do_nothing, NULL, XCALL_TYPE_SYNC);
 	} else {
 		do_entry_flush_fixups(L1D_FLUSH_NONE);
 	}
@@ -631,7 +631,7 @@ static void uaccess_flush_enable(bool enable)
 	if (enable) {
 		do_uaccess_flush_fixups(enabled_flush_types);
 		static_branch_enable(&uaccess_flush_key);
-		on_each_cpu(do_nothing, NULL, 1);
+		smp_xcall(XCALL_ALL, do_nothing, NULL, XCALL_TYPE_SYNC);
 	} else {
 		static_branch_disable(&uaccess_flush_key);
 		do_uaccess_flush_fixups(L1D_FLUSH_NONE);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index de0f6f09a5dd..5650a1510cb7 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -655,7 +655,7 @@ void crash_smp_send_stop(void)
 #ifdef CONFIG_NMI_IPI
 	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_stop_this_cpu, 1000000);
 #else
-	smp_call_function(crash_stop_this_cpu, NULL, 0);
+	smp_xcall(XCALL_ALL, crash_stop_this_cpu, NULL, XCALL_TYPE_ASYNC);
 #endif /* CONFIG_NMI_IPI */
 }
 
@@ -711,7 +711,7 @@ void smp_send_stop(void)
 
 	stopped = true;
 
-	smp_call_function(stop_this_cpu, NULL, 0);
+	smp_xcall(XCALL_ALL, stop_this_cpu, NULL, XCALL_TYPE_ASYNC);
 }
 #endif /* CONFIG_NMI_IPI */
 
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 2069bbb90a9a..0ce6aff8eca0 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -206,7 +206,7 @@ static ssize_t __used store_dscr_default(struct device *dev,
 		return -EINVAL;
 	dscr_default = val;
 
-	on_each_cpu(write_dscr, &val, 1);
+	smp_xcall(XCALL_ALL, write_dscr, &val, XCALL_TYPE_SYNC);
 
 	return count;
 }
diff --git a/arch/powerpc/kernel/tau_6xx.c b/arch/powerpc/kernel/tau_6xx.c
index 828d0f4106d2..fac64b71fc93 100644
--- a/arch/powerpc/kernel/tau_6xx.c
+++ b/arch/powerpc/kernel/tau_6xx.c
@@ -158,7 +158,7 @@ static struct workqueue_struct *tau_workq;
 static void tau_work_func(struct work_struct *work)
 {
 	msleep(shrink_timer);
-	on_each_cpu(tau_timeout, NULL, 0);
+	smp_xcall(XCALL_ALL, tau_timeout, NULL, XCALL_TYPE_ASYNC);
 	/* schedule ourselves to be run again */
 	queue_work(tau_workq, work);
 }
@@ -204,7 +204,7 @@ static int __init TAU_init(void)
 	if (!tau_workq)
 		return -ENOMEM;
 
-	on_each_cpu(TAU_init_smp, NULL, 0);
+	smp_xcall(XCALL_ALL, TAU_init_smp, NULL, XCALL_TYPE_ASYNC);
 
 	queue_work(tau_workq, &tau_work);
 
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 6cc7793b8420..7f9f4477749a 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -225,7 +225,7 @@ static void wake_offline_cpus(void)
 static void kexec_prepare_cpus(void)
 {
 	wake_offline_cpus();
-	smp_call_function(kexec_smp_down, NULL, /* wait */0);
+	smp_xcall(XCALL_ALL, kexec_smp_down, NULL, XCALL_TYPE_ASYNC);
 	local_irq_disable();
 	hard_irq_disable();
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 0aeb51738ca9..433a04ac48dc 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1555,7 +1555,7 @@ long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
 
 	/* Boot all CPUs out of the guest so they re-read
 	 * mmu_ready */
-	on_each_cpu(resize_hpt_boot_vcpu, NULL, 1);
+	smp_xcall(XCALL_ALL, resize_hpt_boot_vcpu, NULL, XCALL_TYPE_SYNC);
 
 	ret = -ENXIO;
 	if (!resize || (resize->order != shift))
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 052e6590f84f..b7129b2bab5b 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -111,7 +111,7 @@ static void do_serialize(void *arg)
 void serialize_against_pte_lookup(struct mm_struct *mm)
 {
 	smp_mb();
-	smp_call_function_many(mm_cpumask(mm), do_serialize, mm, 1);
+	smp_xcall_mask(mm_cpumask(mm), do_serialize, mm, XCALL_TYPE_SYNC);
 }
 
 /*
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 7724af19ed7e..2e46ef59ce5b 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -464,7 +464,7 @@ static inline void _tlbiel_pid_multicast(struct mm_struct *mm,
 	struct cpumask *cpus = mm_cpumask(mm);
 	struct tlbiel_pid t = { .pid = pid, .ric = ric };
 
-	on_each_cpu_mask(cpus, do_tlbiel_pid, &t, 1);
+	smp_xcall_mask(cpus, do_tlbiel_pid, &t, XCALL_TYPE_SYNC);
 	/*
 	 * Always want the CPU translations to be invalidated with tlbiel in
 	 * these paths, so while coprocessors must use tlbie, we can not
@@ -616,7 +616,7 @@ static inline void _tlbiel_va_multicast(struct mm_struct *mm,
 {
 	struct cpumask *cpus = mm_cpumask(mm);
 	struct tlbiel_va t = { .va = va, .pid = pid, .psize = psize, .ric = ric };
-	on_each_cpu_mask(cpus, do_tlbiel_va, &t, 1);
+	smp_xcall_mask(cpus, do_tlbiel_va, &t, XCALL_TYPE_SYNC);
 	if (atomic_read(&mm->context.copros) > 0)
 		_tlbie_va(va, pid, psize, RIC_FLUSH_TLB);
 }
@@ -682,7 +682,7 @@ static inline void _tlbiel_va_range_multicast(struct mm_struct *mm,
 				.pid = pid, .page_size = page_size,
 				.psize = psize, .also_pwc = also_pwc };
 
-	on_each_cpu_mask(cpus, do_tlbiel_va_range, &t, 1);
+	smp_xcall_mask(cpus, do_tlbiel_va_range, &t, XCALL_TYPE_SYNC);
 	if (atomic_read(&mm->context.copros) > 0)
 		_tlbie_va_range(start, end, pid, page_size, psize, also_pwc);
 }
@@ -827,8 +827,8 @@ static void exit_flush_lazy_tlbs(struct mm_struct *mm)
 	 * make a special powerpc IPI for flushing TLBs.
 	 * For now it's not too performance critical.
 	 */
-	smp_call_function_many(mm_cpumask(mm), do_exit_flush_lazy_tlb,
-				(void *)mm, 1);
+	smp_xcall_mask(mm_cpumask(mm), do_exit_flush_lazy_tlb,
+				(void *)mm, XCALL_TYPE_SYNC);
 }
 
 #else /* CONFIG_SMP */
@@ -1064,7 +1064,7 @@ static void do_tlbiel_kernel(void *info)
 
 static inline void _tlbiel_kernel_broadcast(void)
 {
-	on_each_cpu(do_tlbiel_kernel, NULL, 1);
+	smp_xcall(XCALL_ALL, do_tlbiel_kernel, NULL, XCALL_TYPE_SYNC);
 	if (tlbie_capable) {
 		/*
 		 * Coherent accelerators don't refcount kernel memory mappings,
diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index fd2c77af5c55..fca3e4ebbfd1 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -276,8 +276,8 @@ void flush_tlb_mm(struct mm_struct *mm)
 	if (!mm_is_core_local(mm)) {
 		struct tlb_flush_param p = { .pid = pid };
 		/* Ignores smp_processor_id() even if set. */
-		smp_call_function_many(mm_cpumask(mm),
-				       do_flush_tlb_mm_ipi, &p, 1);
+		smp_xcall_mask(mm_cpumask(mm),
+			       do_flush_tlb_mm_ipi, &p, XCALL_TYPE_SYNC);
 	}
 	_tlbil_pid(pid);
  no_context:
@@ -321,8 +321,8 @@ void __flush_tlb_page(struct mm_struct *mm, unsigned long vmaddr,
 				.ind = ind,
 			};
 			/* Ignores smp_processor_id() even if set in cpu_mask */
-			smp_call_function_many(cpu_mask,
-					       do_flush_tlb_page_ipi, &p, 1);
+			smp_xcall_mask(cpu_mask,
+				       do_flush_tlb_page_ipi, &p, XCALL_TYPE_SYNC);
 		}
 	}
 	_tlbil_va(vmaddr, pid, tsize, ind);
@@ -362,7 +362,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
 #ifdef CONFIG_SMP
 	preempt_disable();
-	smp_call_function(do_flush_tlb_mm_ipi, NULL, 1);
+	smp_xcall(XCALL_ALL, do_flush_tlb_mm_ipi, NULL, XCALL_TYPE_SYNC);
 	_tlbil_pid(0);
 	preempt_enable();
 #else
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index f42711f865f3..a3348fc0e1fe 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -464,7 +464,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 		 */
 		mm_ctx_set_slb_addr_limit(&mm->context, high_limit);
 
-		on_each_cpu(slice_flush_segments, mm, 1);
+		smp_xcall(XCALL_ALL, slice_flush_segments, mm, XCALL_TYPE_SYNC);
 	}
 
 	/* Sanity checks */
@@ -626,7 +626,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 		 !bitmap_empty(potential_mask.high_slices, SLICE_NUM_HIGH))) {
 		slice_convert(mm, &potential_mask, psize);
 		if (psize > MMU_PAGE_BASE)
-			on_each_cpu(slice_flush_segments, mm, 1);
+			smp_xcall(XCALL_ALL, slice_flush_segments, mm, XCALL_TYPE_SYNC);
 	}
 	return newaddr;
 
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index b5b42cf0a703..d87477afe337 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2529,7 +2529,7 @@ static int __init init_ppc64_pmu(void)
 {
 	if (cpu_has_feature(CPU_FTR_HVMODE) && pmu_override) {
 		pr_warn("disabling perf due to pmu_override= command line option.\n");
-		on_each_cpu(do_pmu_override, NULL, 1);
+		smp_xcall(XCALL_ALL, do_pmu_override, NULL, XCALL_TYPE_SYNC);
 		return 0;
 	}
 
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 526d4b767534..0e367f651df7 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1583,7 +1583,7 @@ static void thread_imc_ldbar_disable(void *dummy)
 
 void thread_imc_disable(void)
 {
-	on_each_cpu(thread_imc_ldbar_disable, NULL, 1);
+	smp_xcall(XCALL_ALL, thread_imc_ldbar_disable, NULL, XCALL_TYPE_SYNC);
 }
 
 static void cleanup_all_thread_imc_memory(void)
diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index a1c6a7827c8f..ca4995a39884 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -445,7 +445,7 @@ static void mpc85xx_smp_machine_kexec(struct kimage *image)
 	int i, num_cpus = num_present_cpus();
 
 	if (image->type == KEXEC_TYPE_DEFAULT)
-		smp_call_function(mpc85xx_smp_kexec_down, NULL, 0);
+		smp_xcall(XCALL_ALL, mpc85xx_smp_kexec_down, NULL, XCALL_TYPE_ASYNC);
 
 	while ( (atomic_read(&kexec_down_cpus) != (num_cpus - 1)) &&
 		( timeout > 0 ) )
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index a6677a111aca..9b6136338df3 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -202,7 +202,7 @@ static ssize_t store_fastsleep_workaround_applyonce(struct device *dev,
 	power7_fastsleep_workaround_exit = false;
 
 	cpus_read_lock();
-	on_each_cpu(pnv_fastsleep_workaround_apply, &err, 1);
+	smp_xcall(XCALL_ALL, pnv_fastsleep_workaround_apply, &err, XCALL_TYPE_SYNC);
 	cpus_read_unlock();
 	if (err) {
 		pr_err("fastsleep_workaround_applyonce change failed while running pnv_fastsleep_workaround_apply");
diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index 2119c003fcf9..8f5fdf69155a 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -61,7 +61,7 @@ static unsigned long get_purr(void)
 {
 	atomic64_t purr = ATOMIC64_INIT(0);
 
-	on_each_cpu(cpu_get_purr, &purr, 1);
+	smp_xcall(XCALL_ALL, cpu_get_purr, &purr, XCALL_TYPE_SYNC);
 
 	return atomic64_read(&purr);
 }
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 6cb7d96ad9c7..dbfa13c52742 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -21,7 +21,7 @@ void flush_icache_all(void)
 	if (IS_ENABLED(CONFIG_RISCV_SBI))
 		sbi_remote_fence_i(NULL);
 	else
-		on_each_cpu(ipi_remote_fence_i, NULL, 1);
+		smp_xcall(XCALL_ALL, ipi_remote_fence_i, NULL, XCALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(flush_icache_all);
 
@@ -69,7 +69,7 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
 	} else if (IS_ENABLED(CONFIG_RISCV_SBI)) {
 		sbi_remote_fence_i(&others);
 	} else {
-		on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
+		smp_xcall_mask(&others, ipi_remote_fence_i, NULL, XCALL_TYPE_SYNC);
 	}
 
 	preempt_enable();
diff --git a/arch/s390/hypfs/hypfs_diag0c.c b/arch/s390/hypfs/hypfs_diag0c.c
index 9a2786079e3a..41c0a95d20d3 100644
--- a/arch/s390/hypfs/hypfs_diag0c.c
+++ b/arch/s390/hypfs/hypfs_diag0c.c
@@ -51,7 +51,7 @@ static void *diag0c_store(unsigned int *count)
 		cpu_vec[cpu] = &diag0c_data->entry[i++];
 	}
 	/* Collect data all CPUs */
-	on_each_cpu(diag0c_fn, cpu_vec, 1);
+	smp_xcall(XCALL_ALL, diag0c_fn, cpu_vec, XCALL_TYPE_SYNC);
 	*count = cpu_count;
 	kfree(cpu_vec);
 	cpus_read_unlock();
diff --git a/arch/s390/kernel/alternative.c b/arch/s390/kernel/alternative.c
index cce0ddee2d02..cf6808d25d8f 100644
--- a/arch/s390/kernel/alternative.c
+++ b/arch/s390/kernel/alternative.c
@@ -121,7 +121,7 @@ static void do_sync_core(void *info)
 
 void text_poke_sync(void)
 {
-	on_each_cpu(do_sync_core, NULL, 1);
+	smp_xcall(XCALL_ALL, do_sync_core, NULL, XCALL_TYPE_SYNC);
 }
 
 void text_poke_sync_lock(void)
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index 483ab5e10164..0787ea07c003 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -923,7 +923,7 @@ static void cfset_all_stop(struct cfset_request *req)
 	};
 
 	cpumask_and(&req->mask, &req->mask, cpu_online_mask);
-	on_each_cpu_mask(&req->mask, cfset_ioctl_off, &p, 1);
+	smp_xcall_mask(&req->mask, cfset_ioctl_off, &p, XCALL_TYPE_SYNC);
 }
 
 /* Release function is also called when application gets terminated without
@@ -940,7 +940,7 @@ static int cfset_release(struct inode *inode, struct file *file)
 		file->private_data = NULL;
 	}
 	if (!atomic_dec_return(&cfset_opencnt))
-		on_each_cpu(cfset_release_cpu, NULL, 1);
+		smp_xcall(XCALL_ALL, cfset_release_cpu, NULL, XCALL_TYPE_SYNC);
 	mutex_unlock(&cfset_ctrset_mutex);
 
 	hw_perf_event_destroy(NULL);
@@ -974,9 +974,9 @@ static int cfset_all_start(struct cfset_request *req)
 	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
 		return -ENOMEM;
 	cpumask_and(mask, &req->mask, cpu_online_mask);
-	on_each_cpu_mask(mask, cfset_ioctl_on, &p, 1);
+	smp_xcall_mask(mask, cfset_ioctl_on, &p, XCALL_TYPE_SYNC);
 	if (atomic_read(&p.cpus_ack) != cpumask_weight(mask)) {
-		on_each_cpu_mask(mask, cfset_ioctl_off, &p, 1);
+		smp_xcall_mask(mask, cfset_ioctl_off, &p, XCALL_TYPE_SYNC);
 		rc = -EIO;
 		debug_sprintf_event(cf_dbg, 4, "%s CPUs missing", __func__);
 	}
@@ -1100,7 +1100,7 @@ static int cfset_all_read(unsigned long arg, struct cfset_request *req)
 
 	p.sets = req->ctrset;
 	cpumask_and(mask, &req->mask, cpu_online_mask);
-	on_each_cpu_mask(mask, cfset_cpu_read, &p, 1);
+	smp_xcall_mask(mask, cfset_cpu_read, &p, XCALL_TYPE_SYNC);
 	rc = cfset_all_copy(arg, mask);
 	free_cpumask_var(mask);
 	return rc;
diff --git a/arch/s390/kernel/perf_cpum_cf_common.c b/arch/s390/kernel/perf_cpum_cf_common.c
index 8ee48672233f..e67211b4f518 100644
--- a/arch/s390/kernel/perf_cpum_cf_common.c
+++ b/arch/s390/kernel/perf_cpum_cf_common.c
@@ -105,7 +105,7 @@ int __kernel_cpumcf_begin(void)
 {
 	int flags = PMC_INIT;
 
-	on_each_cpu(cpum_cf_setup_cpu, &flags, 1);
+	smp_xcall(XCALL_ALL, cpum_cf_setup_cpu, &flags, XCALL_TYPE_SYNC);
 	irq_subclass_register(IRQ_SUBCLASS_MEASUREMENT_ALERT);
 
 	return 0;
@@ -131,7 +131,7 @@ void __kernel_cpumcf_end(void)
 {
 	int flags = PMC_RELEASE;
 
-	on_each_cpu(cpum_cf_setup_cpu, &flags, 1);
+	smp_xcall(XCALL_ALL, cpum_cf_setup_cpu, &flags, XCALL_TYPE_SYNC);
 	irq_subclass_unregister(IRQ_SUBCLASS_MEASUREMENT_ALERT);
 }
 EXPORT_SYMBOL(__kernel_cpumcf_end);
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index 332a49965130..b0a7e8574b4b 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -582,14 +582,14 @@ static void release_pmc_hardware(void)
 	int flags = PMC_RELEASE;
 
 	irq_subclass_unregister(IRQ_SUBCLASS_MEASUREMENT_ALERT);
-	on_each_cpu(setup_pmc_cpu, &flags, 1);
+	smp_xcall(XCALL_ALL, setup_pmc_cpu, &flags, XCALL_TYPE_SYNC);
 }
 
 static int reserve_pmc_hardware(void)
 {
 	int flags = PMC_INIT;
 
-	on_each_cpu(setup_pmc_cpu, &flags, 1);
+	smp_xcall(XCALL_ALL, setup_pmc_cpu, &flags, XCALL_TYPE_SYNC);
 	if (flags & PMC_FAILURE) {
 		release_pmc_hardware();
 		return -ENODEV;
diff --git a/arch/s390/kernel/processor.c b/arch/s390/kernel/processor.c
index 7a74ea5f7531..ebb32b5cde40 100644
--- a/arch/s390/kernel/processor.c
+++ b/arch/s390/kernel/processor.c
@@ -62,7 +62,7 @@ void s390_update_cpu_mhz(void)
 {
 	s390_adjust_jiffies();
 	if (machine_has_cpu_mhz)
-		on_each_cpu(update_cpu_mhz, NULL, 0);
+		smp_xcall(XCALL_ALL, update_cpu_mhz, NULL, XCALL_TYPE_ASYNC);
 }
 
 void notrace stop_machine_yield(const struct cpumask *cpumask)
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 30c91d565933..c42190d4bd4f 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -595,7 +595,7 @@ void smp_ctl_set_clear_bit(int cr, int bit, bool set)
 	ctlreg = (ctlreg & parms.andval) | parms.orval;
 	put_abs_lowcore(cregs_save_area[cr], ctlreg);
 	spin_unlock(&ctl_lock);
-	on_each_cpu(smp_ctl_bit_callback, &parms, 1);
+	smp_xcall(XCALL_ALL, smp_ctl_bit_callback, &parms, XCALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(smp_ctl_set_clear_bit);
 
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index c6eecd4a5302..3a4a52b9bc9e 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -322,7 +322,7 @@ int arch_update_cpu_topology(void)
 	int cpu, rc;
 
 	rc = __arch_update_cpu_topology();
-	on_each_cpu(__arch_update_dedicated_flag, NULL, 0);
+	smp_xcall(XCALL_ALL, __arch_update_dedicated_flag, NULL, XCALL_TYPE_ASYNC);
 	for_each_online_cpu(cpu) {
 		dev = get_cpu_device(cpu);
 		if (dev)
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 2de48b2c1b04..fa3afbe85f2d 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -131,7 +131,7 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end)
 
 	spin_unlock_bh(&mm->page_table_lock);
 
-	on_each_cpu(__crst_table_upgrade, mm, 0);
+	smp_xcall(XCALL_ALL, __crst_table_upgrade, mm, XCALL_TYPE_ASYNC);
 
 	return 0;
 
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 500cd2dbdf53..325c42c6ddb4 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -440,7 +440,7 @@ static int __init zpci_directed_irq_init(void)
 		if (!zpci_ibv[cpu])
 			return -ENOMEM;
 	}
-	on_each_cpu(cpu_enable_directed_irq, NULL, 1);
+	smp_xcall(XCALL_ALL, cpu_enable_directed_irq, NULL, XCALL_TYPE_SYNC);
 
 	zpci_irq_chip.irq_set_affinity = zpci_set_irq_affinity;
 
diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
index 65924d9ec245..3f1afc1b18fa 100644
--- a/arch/sh/kernel/smp.c
+++ b/arch/sh/kernel/smp.c
@@ -263,7 +263,7 @@ void smp_send_reschedule(int cpu)
 
 void smp_send_stop(void)
 {
-	smp_call_function(stop_this_cpu, 0, 0);
+	smp_xcall(XCALL_ALL, stop_this_cpu, NULL, XCALL_TYPE_ASYNC);
 }
 
 void arch_send_call_function_ipi_mask(const struct cpumask *mask)
@@ -335,7 +335,7 @@ static void flush_tlb_all_ipi(void *info)
 
 void flush_tlb_all(void)
 {
-	on_each_cpu(flush_tlb_all_ipi, 0, 1);
+	smp_xcall(XCALL_ALL, flush_tlb_all_ipi, NULL, XCALL_TYPE_SYNC);
 }
 
 static void flush_tlb_mm_ipi(void *mm)
@@ -360,7 +360,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 	preempt_disable();
 
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
-		smp_call_function(flush_tlb_mm_ipi, (void *)mm, 1);
+		smp_xcall(XCALL_ALL, flush_tlb_mm_ipi, (void *)mm, XCALL_TYPE_SYNC);
 	} else {
 		int i;
 		for_each_online_cpu(i)
@@ -397,7 +397,7 @@ void flush_tlb_range(struct vm_area_struct *vma,
 		fd.vma = vma;
 		fd.addr1 = start;
 		fd.addr2 = end;
-		smp_call_function(flush_tlb_range_ipi, (void *)&fd, 1);
+		smp_xcall(XCALL_ALL, flush_tlb_range_ipi, (void *)&fd, XCALL_TYPE_SYNC);
 	} else {
 		int i;
 		for_each_online_cpu(i)
@@ -421,7 +421,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 
 	fd.addr1 = start;
 	fd.addr2 = end;
-	on_each_cpu(flush_tlb_kernel_range_ipi, (void *)&fd, 1);
+	smp_xcall(XCALL_ALL, flush_tlb_kernel_range_ipi, (void *)&fd, XCALL_TYPE_SYNC);
 }
 
 static void flush_tlb_page_ipi(void *info)
@@ -440,7 +440,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 
 		fd.vma = vma;
 		fd.addr1 = page;
-		smp_call_function(flush_tlb_page_ipi, (void *)&fd, 1);
+		smp_xcall(XCALL_ALL, flush_tlb_page_ipi, (void *)&fd, XCALL_TYPE_SYNC);
 	} else {
 		int i;
 		for_each_online_cpu(i)
@@ -464,7 +464,7 @@ void flush_tlb_one(unsigned long asid, unsigned long vaddr)
 	fd.addr1 = asid;
 	fd.addr2 = vaddr;
 
-	smp_call_function(flush_tlb_one_ipi, (void *)&fd, 1);
+	smp_xcall(XCALL_ALL, flush_tlb_one_ipi, (void *)&fd, XCALL_TYPE_SYNC);
 	local_flush_tlb_one(asid, vaddr);
 }
 
diff --git a/arch/sh/mm/cache.c b/arch/sh/mm/cache.c
index 3aef78ceb820..67eef4bf42e3 100644
--- a/arch/sh/mm/cache.c
+++ b/arch/sh/mm/cache.c
@@ -49,7 +49,7 @@ static inline void cacheop_on_each_cpu(void (*func) (void *info), void *info,
 	 * even attempt IPIs unless there are other CPUs online.
 	 */
 	if (num_online_cpus() > 1)
-		smp_call_function(func, info, wait);
+		smp_xcall(XCALL_ALL, func, info, (wait ? XCALL_TYPE_SYNC : XCALL_TYPE_ASYNC));
 #endif
 
 	func(info);
diff --git a/arch/sparc/include/asm/mman.h b/arch/sparc/include/asm/mman.h
index 274217e7ed70..ce7a2be6df0b 100644
--- a/arch/sparc/include/asm/mman.h
+++ b/arch/sparc/include/asm/mman.h
@@ -37,8 +37,8 @@ static inline unsigned long sparc_calc_vm_prot_bits(unsigned long prot)
 			regs = task_pt_regs(current);
 			regs->tstate |= TSTATE_MCDE;
 			current->mm->context.adi = true;
-			on_each_cpu_mask(mm_cpumask(current->mm),
-					 ipi_set_tstate_mcde, current->mm, 0);
+			smp_xcall_mask(mm_cpumask(current->mm),
+				ipi_set_tstate_mcde, current->mm, XCALL_TYPE_ASYNC);
 		}
 		return VM_SPARC_ADI;
 	} else {
diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 060fff95a305..ff789082d5ab 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -176,7 +176,7 @@ static int __init check_nmi_watchdog(void)
 
 	printk(KERN_INFO "Testing NMI watchdog ... ");
 
-	smp_call_function(nmi_cpu_busy, (void *)&endflag, 0);
+	smp_xcall(XCALL_ALL, nmi_cpu_busy, (void *)&endflag, XCALL_TYPE_ASYNC);
 
 	for_each_possible_cpu(cpu)
 		prev_nmi_count[cpu] = get_nmi_count(cpu);
@@ -203,7 +203,7 @@ static int __init check_nmi_watchdog(void)
 	kfree(prev_nmi_count);
 	return 0;
 error:
-	on_each_cpu(stop_nmi_watchdog, NULL, 1);
+	smp_xcall(XCALL_ALL, stop_nmi_watchdog, NULL, XCALL_TYPE_SYNC);
 	return err;
 }
 
@@ -235,13 +235,13 @@ static void nmi_adjust_hz_one(void *unused)
 void nmi_adjust_hz(unsigned int new_hz)
 {
 	nmi_hz = new_hz;
-	on_each_cpu(nmi_adjust_hz_one, NULL, 1);
+	smp_xcall(XCALL_ALL, nmi_adjust_hz_one, NULL, XCALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL_GPL(nmi_adjust_hz);
 
 static int nmi_shutdown(struct notifier_block *nb, unsigned long cmd, void *p)
 {
-	on_each_cpu(stop_nmi_watchdog, NULL, 1);
+	smp_xcall(XCALL_ALL, stop_nmi_watchdog, NULL, XCALL_TYPE_SYNC);
 	return 0;
 }
 
@@ -253,13 +253,13 @@ int __init nmi_init(void)
 {
 	int err;
 
-	on_each_cpu(start_nmi_watchdog, NULL, 1);
+	smp_xcall(XCALL_ALL, start_nmi_watchdog, NULL, XCALL_TYPE_SYNC);
 
 	err = check_nmi_watchdog();
 	if (!err) {
 		err = register_reboot_notifier(&nmi_reboot_notifier);
 		if (err) {
-			on_each_cpu(stop_nmi_watchdog, NULL, 1);
+			smp_xcall(XCALL_ALL, stop_nmi_watchdog, NULL, XCALL_TYPE_SYNC);
 			atomic_set(&nmi_active, -1);
 		}
 	}
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index a58ae9c42803..e6038831a4f0 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1176,7 +1176,7 @@ static void perf_event_grab_pmc(void)
 	mutex_lock(&pmc_grab_mutex);
 	if (atomic_read(&active_events) == 0) {
 		if (atomic_read(&nmi_active) > 0) {
-			on_each_cpu(perf_stop_nmi_watchdog, NULL, 1);
+			smp_xcall(XCALL_ALL, perf_stop_nmi_watchdog, NULL, XCALL_TYPE_SYNC);
 			BUG_ON(atomic_read(&nmi_active) != 0);
 		}
 		atomic_inc(&active_events);
@@ -1188,7 +1188,7 @@ static void perf_event_release_pmc(void)
 {
 	if (atomic_dec_and_mutex_lock(&active_events, &pmc_grab_mutex)) {
 		if (atomic_read(&nmi_active) == 0)
-			on_each_cpu(start_nmi_watchdog, NULL, 1);
+			smp_xcall(XCALL_ALL, start_nmi_watchdog, NULL, XCALL_TYPE_SYNC);
 		mutex_unlock(&pmc_grab_mutex);
 	}
 }
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index a1f78e9ddaf3..2785d660eb2a 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -901,7 +901,7 @@ static void tsb_sync(void *info)
 
 void smp_tsb_sync(struct mm_struct *mm)
 {
-	smp_call_function_many(mm_cpumask(mm), tsb_sync, mm, 1);
+	smp_xcall_mask(mm_cpumask(mm), tsb_sync, mm, XCALL_TYPE_SYNC);
 }
 
 extern unsigned long xcall_flush_tlb_mm;
@@ -1084,8 +1084,8 @@ void smp_flush_tlb_pending(struct mm_struct *mm, unsigned long nr, unsigned long
 	info.nr = nr;
 	info.vaddrs = vaddrs;
 
-	smp_call_function_many(mm_cpumask(mm), tlb_pending_func,
-			       &info, 1);
+	smp_xcall_mask(mm_cpumask(mm), tlb_pending_func,
+		       &info, XCALL_TYPE_SYNC);
 
 	__flush_tlb_pending(ctx, nr, vaddrs);
 
@@ -1523,7 +1523,7 @@ void smp_send_stop(void)
 				prom_stopcpu_cpuid(cpu);
 		}
 	} else
-		smp_call_function(stop_this_cpu, NULL, 0);
+		smp_xcall(XCALL_ALL, stop_this_cpu, NULL, XCALL_TYPE_ASYNC);
 }
 
 static int __init pcpu_cpu_distance(unsigned int from, unsigned int to)
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 8b1911591581..68c48edd012f 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -3024,7 +3024,7 @@ void hugetlb_setup(struct pt_regs *regs)
 		spin_unlock_irq(&ctx_alloc_lock);
 
 		if (need_context_reload)
-			on_each_cpu(context_reload, mm, 0);
+			smp_xcall(XCALL_ALL, context_reload, mm, XCALL_TYPE_ASYNC);
 	}
 }
 #endif
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index eef816fc216d..bb0f3f6d8019 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2526,7 +2526,7 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
 	mmap_assert_write_locked(mm);
 
 	if (atomic_inc_return(&mm->context.perf_rdpmc_allowed) == 1)
-		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
+		smp_xcall_mask(mm_cpumask(mm), cr4_update_pce, NULL, XCALL_TYPE_SYNC);
 }
 
 static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
@@ -2535,7 +2535,7 @@ static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *m
 		return;
 
 	if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
-		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
+		smp_xcall_mask(mm_cpumask(mm), cr4_update_pce, NULL, XCALL_TYPE_SYNC);
 }
 
 static int x86_pmu_event_idx(struct perf_event *event)
@@ -2591,7 +2591,7 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
 		else if (x86_pmu.attr_rdpmc == 2)
 			static_branch_dec(&rdpmc_always_available_key);
 
-		on_each_cpu(cr4_update_pce, NULL, 1);
+		smp_xcall(XCALL_ALL, cr4_update_pce, NULL, XCALL_TYPE_SYNC);
 		x86_pmu.attr_rdpmc = val;
 	}
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index fc7f458eb3de..f2b577e6b66d 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5123,7 +5123,7 @@ static ssize_t freeze_on_smi_store(struct device *cdev,
 	x86_pmu.attr_freeze_on_smi = val;
 
 	cpus_read_lock();
-	on_each_cpu(flip_smm_bit, &val, 1);
+	smp_xcall(XCALL_ALL, flip_smm_bit, &val, XCALL_TYPE_SYNC);
 	cpus_read_unlock();
 done:
 	mutex_unlock(&freeze_on_smi_mutex);
@@ -5168,7 +5168,7 @@ static ssize_t set_sysctl_tfa(struct device *cdev,
 	allow_tsx_force_abort = val;
 
 	cpus_read_lock();
-	on_each_cpu(update_tfa_sched, NULL, 1);
+	smp_xcall(XCALL_ALL, update_tfa_sched, NULL, XCALL_TYPE_SYNC);
 	cpus_read_unlock();
 
 	return count;
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index d374cb3cf024..7721b55e58e5 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1181,7 +1181,7 @@ static void do_sync_core(void *info)
 
 void text_poke_sync(void)
 {
-	on_each_cpu(do_sync_core, NULL, 1);
+	smp_xcall(XCALL_ALL, do_sync_core, NULL, XCALL_TYPE_SYNC);
 }
 
 struct text_poke_loc {
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 020c906f7934..b6276a0a21eb 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -488,7 +488,7 @@ static __init void fix_erratum_688(void)
 	if (val & BIT(2))
 		return;
 
-	on_each_cpu(__fix_erratum_688, NULL, 0);
+	smp_xcall(XCALL_ALL, __fix_erratum_688, NULL, XCALL_TYPE_ASYNC);
 
 	pr_info("x86/cpu/AMD: CPU erratum 688 worked around\n");
 }
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b70344bf6600..56f281fed53a 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -656,7 +656,7 @@ void lapic_update_tsc_freq(void)
 	 * changed. In order to avoid races, schedule the frequency
 	 * update code on each CPU.
 	 */
-	on_each_cpu(__lapic_update_tsc_freq, NULL, 0);
+	smp_xcall(XCALL_ALL, __lapic_update_tsc_freq, NULL, XCALL_TYPE_ASYNC);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 6296e1ebed1d..d9d242993982 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1091,7 +1091,7 @@ static void update_stibp_strict(void)
 	pr_info("Update user space SMT mitigation: STIBP %s\n",
 		mask & SPEC_CTRL_STIBP ? "always-on" : "off");
 	x86_spec_ctrl_base = mask;
-	on_each_cpu(update_stibp_msr, NULL, 1);
+	smp_xcall(XCALL_ALL, update_stibp_msr, NULL, XCALL_TYPE_SYNC);
 }
 
 /* Update the static key controlling the evaluation of TIF_SPEC_IB */
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 981496e6bc0e..7e021f6fa20b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2207,7 +2207,7 @@ void mce_disable_bank(int bank)
 		return;
 	}
 	set_bit(bank, mce_banks_ce_disabled);
-	on_each_cpu(__mce_disable_bank, &bank, 1);
+	smp_xcall(XCALL_ALL, __mce_disable_bank, &bank, XCALL_TYPE_SYNC);
 }
 
 /*
@@ -2362,7 +2362,7 @@ static void mce_cpu_restart(void *data)
 static void mce_restart(void)
 {
 	mce_timer_delete_all();
-	on_each_cpu(mce_cpu_restart, NULL, 1);
+	smp_xcall(XCALL_ALL, mce_cpu_restart, NULL, XCALL_TYPE_SYNC);
 }
 
 /* Toggle features for corrected errors */
@@ -2450,12 +2450,12 @@ static ssize_t set_ignore_ce(struct device *s,
 		if (new) {
 			/* disable ce features */
 			mce_timer_delete_all();
-			on_each_cpu(mce_disable_cmci, NULL, 1);
+			smp_xcall(XCALL_ALL, mce_disable_cmci, NULL, XCALL_TYPE_SYNC);
 			mca_cfg.ignore_ce = true;
 		} else {
 			/* enable ce features */
 			mca_cfg.ignore_ce = false;
-			on_each_cpu(mce_enable_ce, (void *)1, 1);
+			smp_xcall(XCALL_ALL, mce_enable_ce, (void *)1, XCALL_TYPE_SYNC);
 		}
 	}
 	mutex_unlock(&mce_sysfs_mutex);
@@ -2476,12 +2476,12 @@ static ssize_t set_cmci_disabled(struct device *s,
 	if (mca_cfg.cmci_disabled ^ !!new) {
 		if (new) {
 			/* disable cmci */
-			on_each_cpu(mce_disable_cmci, NULL, 1);
+			smp_xcall(XCALL_ALL, mce_disable_cmci, NULL, XCALL_TYPE_SYNC);
 			mca_cfg.cmci_disabled = true;
 		} else {
 			/* enable cmci */
 			mca_cfg.cmci_disabled = false;
-			on_each_cpu(mce_enable_ce, NULL, 1);
+			smp_xcall(XCALL_ALL, mce_enable_ce, NULL, XCALL_TYPE_SYNC);
 		}
 	}
 	mutex_unlock(&mce_sysfs_mutex);
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 5fbd7ffb3233..f23445733020 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -263,10 +263,8 @@ static void __maybe_unused raise_mce(struct mce *m)
 				 * don't wait because mce_irq_ipi is necessary
 				 * to be sync with following raise_local
 				 */
-				preempt_disable();
-				smp_call_function_many(mce_inject_cpumask,
-					mce_irq_ipi, NULL, 0);
-				preempt_enable();
+				smp_xcall_mask(mce_inject_cpumask,
+					mce_irq_ipi, NULL, XCALL_TYPE_ASYNC);
 			} else if (m->inject_flags & MCJ_NMI_BROADCAST)
 				apic->send_IPI_mask(mce_inject_cpumask,
 						NMI_VECTOR);
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index 95275a5e57e0..6385fff2051b 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -400,7 +400,7 @@ void cmci_rediscover(void)
 	if (!cmci_supported(&banks))
 		return;
 
-	on_each_cpu(cmci_rediscover_work_func, NULL, 1);
+	smp_xcall(XCALL_ALL, cmci_rediscover_work_func, NULL, XCALL_TYPE_SYNC);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 7e45da5f3c8b..5daf63859c95 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -326,7 +326,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 	if (cpumask_test_cpu(cpu, cpu_mask))
 		rdt_ctrl_update(&msr_param);
 	/* Update resource control msr on other CPUs. */
-	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
+	smp_xcall_mask(cpu_mask, rdt_ctrl_update, &msr_param, XCALL_TYPE_SYNC);
 	put_cpu();
 
 done:
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 83f901e2c2df..2fee32deb701 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -329,7 +329,7 @@ update_closid_rmid(const struct cpumask *cpu_mask, struct rdtgroup *r)
 
 	if (cpumask_test_cpu(cpu, cpu_mask))
 		update_cpu_closid_rmid(r);
-	smp_call_function_many(cpu_mask, update_cpu_closid_rmid, r, 1);
+	smp_xcall_mask(cpu_mask, update_cpu_closid_rmid, r, XCALL_TYPE_SYNC);
 	put_cpu();
 }
 
@@ -1866,7 +1866,7 @@ static int set_cache_qos_cfg(int level, bool enable)
 	if (cpumask_test_cpu(cpu, cpu_mask))
 		update(&enable);
 	/* Update QOS_CFG MSR on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, update, &enable, 1);
+	smp_xcall_mask(cpu_mask, update, &enable, XCALL_TYPE_SYNC);
 	put_cpu();
 
 	free_cpumask_var(cpu_mask);
@@ -2335,7 +2335,7 @@ static int reset_all_ctrls(struct rdt_resource *r)
 	if (cpumask_test_cpu(cpu, cpu_mask))
 		rdt_ctrl_update(&msr_param);
 	/* Update CBM on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
+	smp_xcall_mask(cpu_mask, rdt_ctrl_update, &msr_param, XCALL_TYPE_SYNC);
 	put_cpu();
 
 	free_cpumask_var(cpu_mask);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8e4bc6453d26..ed9a7cf4f996 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -278,8 +278,9 @@ static void sgx_encl_ewb(struct sgx_epc_page *epc_page,
 			 * miss cpus that entered the enclave between
 			 * generating the mask and incrementing epoch.
 			 */
-			on_each_cpu_mask(sgx_encl_ewb_cpumask(encl),
-					 sgx_ipi_cb, NULL, 1);
+			smp_xcall_mask(sgx_encl_ewb_cpumask(encl),
+				 sgx_ipi_cb, NULL, XCALL_TYPE_SYNC);
+
 			ret = __sgx_encl_ewb(epc_page, va_slot, backing);
 		}
 	}
diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index ec8064c0ae03..5da5c757ce42 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -120,7 +120,7 @@ static inline void umwait_update_control(u32 maxtime, bool c02_enable)
 
 	WRITE_ONCE(umwait_control_cached, ctrl);
 	/* Propagate to all CPUs */
-	on_each_cpu(umwait_update_control_msr, NULL, 1);
+	smp_xcall(XCALL_ALL, umwait_update_control_msr, NULL, XCALL_TYPE_SYNC);
 }
 
 static ssize_t
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index c04b933f48d3..94cc41cf656e 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -282,7 +282,7 @@ static int vmware_pv_reboot_notify(struct notifier_block *nb,
 				unsigned long code, void *unused)
 {
 	if (code == SYS_RESTART)
-		on_each_cpu(vmware_pv_guest_cpu_reboot, NULL, 1);
+		smp_xcall(XCALL_ALL, vmware_pv_guest_cpu_reboot, NULL, XCALL_TYPE_SYNC);
 	return NOTIFY_DONE;
 }
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a22deb58f86d..9ba37c13df7b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -730,7 +730,7 @@ static int kvm_pv_reboot_notify(struct notifier_block *nb,
 				unsigned long code, void *unused)
 {
 	if (code == SYS_RESTART)
-		on_each_cpu(kvm_pv_guest_cpu_reboot, NULL, 1);
+		smp_xcall(XCALL_ALL, kvm_pv_guest_cpu_reboot, NULL, XCALL_TYPE_SYNC);
 	return NOTIFY_DONE;
 }
 
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 525876e7b9f4..d92b0a04b859 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -424,7 +424,7 @@ static void install_ldt(struct mm_struct *mm, struct ldt_struct *ldt)
 	smp_store_release(&mm->context.ldt, ldt);
 
 	/* Activate the LDT for all CPUs using currents mm. */
-	on_each_cpu_mask(mm_cpumask(mm), flush_ldt, mm, true);
+	smp_xcall_mask(mm_cpumask(mm), flush_ldt, mm, XCALL_TYPE_SYNC);
 
 	mutex_unlock(&mm->context.lock);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c0ca599a353..02b84f0bdff2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7443,8 +7443,8 @@ static int kvm_emulate_wbinvd_noskip(struct kvm_vcpu *vcpu)
 		int cpu = get_cpu();
 
 		cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
-		on_each_cpu_mask(vcpu->arch.wbinvd_dirty_mask,
-				wbinvd_ipi, NULL, 1);
+		smp_xcall_mask(vcpu->arch.wbinvd_dirty_mask,
+				wbinvd_ipi, NULL, XCALL_TYPE_SYNC);
 		put_cpu();
 		cpumask_clear(vcpu->arch.wbinvd_dirty_mask);
 	} else
diff --git a/arch/x86/lib/cache-smp.c b/arch/x86/lib/cache-smp.c
index 7c48ff4ae8d1..d81977b85228 100644
--- a/arch/x86/lib/cache-smp.c
+++ b/arch/x86/lib/cache-smp.c
@@ -15,7 +15,7 @@ EXPORT_SYMBOL(wbinvd_on_cpu);
 
 int wbinvd_on_all_cpus(void)
 {
-	on_each_cpu(__wbinvd, NULL, 1);
+	smp_xcall(XCALL_ALL, __wbinvd, NULL, XCALL_TYPE_SYNC);
 	return 0;
 }
 EXPORT_SYMBOL(wbinvd_on_all_cpus);
diff --git a/arch/x86/lib/msr-smp.c b/arch/x86/lib/msr-smp.c
index 40bbe56bde32..68170a28270f 100644
--- a/arch/x86/lib/msr-smp.c
+++ b/arch/x86/lib/msr-smp.c
@@ -113,7 +113,7 @@ static void __rwmsr_on_cpus(const struct cpumask *mask, u32 msr_no,
 	if (cpumask_test_cpu(this_cpu, mask))
 		msr_func(&rv);
 
-	smp_call_function_many(mask, msr_func, &rv, 1);
+	smp_xcall_mask(mask, msr_func, &rv, XCALL_TYPE_SYNC);
 	put_cpu();
 }
 
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index abf5ed76e4b7..835ab4e0a8fd 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -348,7 +348,7 @@ static void cpa_flush_all(unsigned long cache)
 {
 	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
 
-	on_each_cpu(__cpa_flush_all, (void *) cache, 1);
+	smp_xcall(XCALL_ALL, __cpa_flush_all, (void *) cache, XCALL_TYPE_SYNC);
 }
 
 static void __cpa_flush_tlb(void *data)
@@ -375,7 +375,7 @@ static void cpa_flush(struct cpa_data *data, int cache)
 	if (cpa->force_flush_all || cpa->numpages > tlb_single_page_flush_ceiling)
 		flush_tlb_all();
 	else
-		on_each_cpu(__cpa_flush_tlb, cpa, 1);
+		smp_xcall(XCALL_ALL, __cpa_flush_tlb, cpa, XCALL_TYPE_SYNC);
 
 	if (!cache)
 		return;
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d400b6d9d246..78f685c9ebb0 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -889,10 +889,10 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
 	 * doing a speculative memory access.
 	 */
 	if (info->freed_tables)
-		on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
+		smp_xcall_mask(cpumask, flush_tlb_func, (void *)info, XCALL_TYPE_SYNC);
 	else
-		on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
-				(void *)info, 1, cpumask);
+		smp_xcall_mask_cond(cpumask, flush_tlb_func, (void *)info,
+				tlb_is_not_lazy, XCALL_TYPE_SYNC);
 }
 
 void flush_tlb_multi(const struct cpumask *cpumask,
@@ -1006,7 +1006,7 @@ static void do_flush_tlb_all(void *info)
 void flush_tlb_all(void)
 {
 	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
-	on_each_cpu(do_flush_tlb_all, NULL, 1);
+	smp_xcall(XCALL_ALL, do_flush_tlb_all, NULL, XCALL_TYPE_SYNC);
 }
 
 static void do_kernel_range_flush(void *info)
@@ -1024,14 +1024,14 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	/* Balance as user space task's flush, a bit conservative */
 	if (end == TLB_FLUSH_ALL ||
 	    (end - start) > tlb_single_page_flush_ceiling << PAGE_SHIFT) {
-		on_each_cpu(do_flush_tlb_all, NULL, 1);
+		smp_xcall(XCALL_ALL, do_flush_tlb_all, NULL, XCALL_TYPE_SYNC);
 	} else {
 		struct flush_tlb_info *info;
 
 		preempt_disable();
 		info = get_flush_tlb_info(NULL, start, end, 0, false, 0);
 
-		on_each_cpu(do_kernel_range_flush, info, 1);
+		smp_xcall(XCALL_ALL, do_kernel_range_flush, info, XCALL_TYPE_SYNC);
 
 		put_flush_tlb_info();
 		preempt_enable();
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 00354866921b..1938d08b20e7 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -948,7 +948,7 @@ static void xen_drop_mm_ref(struct mm_struct *mm)
 			cpumask_set_cpu(cpu, mask);
 	}
 
-	smp_call_function_many(mask, drop_mm_ref_this_cpu, mm, 1);
+	smp_xcall_mask(mask, drop_mm_ref_this_cpu, mm, XCALL_TYPE_SYNC);
 	free_cpumask_var(mask);
 }
 #else
diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 688aa8b6ae29..6030daaa8f04 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -428,7 +428,7 @@ static void stop_self(void *v)
 
 static void xen_pv_stop_other_cpus(int wait)
 {
-	smp_call_function(stop_self, NULL, wait);
+	smp_xcall(XCALL_ALL, stop_self, NULL, (wait ? XCALL_TYPE_SYNC : XCALL_TYPE_ASYNC));
 }
 
 static irqreturn_t xen_irq_work_interrupt(int irq, void *dev_id)
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 1d83152c761b..e54ae4cb388d 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -67,7 +67,7 @@ void xen_arch_resume(void)
 {
 	int cpu;
 
-	on_each_cpu(xen_vcpu_notify_restore, NULL, 1);
+	smp_xcall(XCALL_ALL, xen_vcpu_notify_restore, NULL, XCALL_TYPE_SYNC);
 
 	for_each_online_cpu(cpu)
 		xen_pmu_init(cpu);
@@ -80,5 +80,5 @@ void xen_arch_suspend(void)
 	for_each_online_cpu(cpu)
 		xen_pmu_finish(cpu);
 
-	on_each_cpu(xen_vcpu_notify_suspend, NULL, 1);
+	smp_xcall(XCALL_ALL, xen_vcpu_notify_suspend, NULL, XCALL_TYPE_SYNC);
 }
diff --git a/arch/xtensa/kernel/smp.c b/arch/xtensa/kernel/smp.c
index 1254da07ead1..b2d126510c9f 100644
--- a/arch/xtensa/kernel/smp.c
+++ b/arch/xtensa/kernel/smp.c
@@ -470,7 +470,7 @@ static void ipi_flush_tlb_all(void *arg)
 
 void flush_tlb_all(void)
 {
-	on_each_cpu(ipi_flush_tlb_all, NULL, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_tlb_all, NULL, XCALL_TYPE_SYNC);
 }
 
 static void ipi_flush_tlb_mm(void *arg)
@@ -480,7 +480,7 @@ static void ipi_flush_tlb_mm(void *arg)
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	on_each_cpu(ipi_flush_tlb_mm, mm, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_tlb_mm, mm, XCALL_TYPE_SYNC);
 }
 
 static void ipi_flush_tlb_page(void *arg)
@@ -495,7 +495,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 		.vma = vma,
 		.addr1 = addr,
 	};
-	on_each_cpu(ipi_flush_tlb_page, &fd, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_tlb_page, &fd, XCALL_TYPE_SYNC);
 }
 
 static void ipi_flush_tlb_range(void *arg)
@@ -512,7 +512,7 @@ void flush_tlb_range(struct vm_area_struct *vma,
 		.addr1 = start,
 		.addr2 = end,
 	};
-	on_each_cpu(ipi_flush_tlb_range, &fd, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_tlb_range, &fd, XCALL_TYPE_SYNC);
 }
 
 static void ipi_flush_tlb_kernel_range(void *arg)
@@ -527,7 +527,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		.addr1 = start,
 		.addr2 = end,
 	};
-	on_each_cpu(ipi_flush_tlb_kernel_range, &fd, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_tlb_kernel_range, &fd, XCALL_TYPE_SYNC);
 }
 
 /* Cache flush functions */
@@ -539,7 +539,7 @@ static void ipi_flush_cache_all(void *arg)
 
 void flush_cache_all(void)
 {
-	on_each_cpu(ipi_flush_cache_all, NULL, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_cache_all, NULL, XCALL_TYPE_SYNC);
 }
 
 static void ipi_flush_cache_page(void *arg)
@@ -556,7 +556,7 @@ void flush_cache_page(struct vm_area_struct *vma,
 		.addr1 = address,
 		.addr2 = pfn,
 	};
-	on_each_cpu(ipi_flush_cache_page, &fd, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_cache_page, &fd, XCALL_TYPE_SYNC);
 }
 
 static void ipi_flush_cache_range(void *arg)
@@ -573,7 +573,7 @@ void flush_cache_range(struct vm_area_struct *vma,
 		.addr1 = start,
 		.addr2 = end,
 	};
-	on_each_cpu(ipi_flush_cache_range, &fd, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_cache_range, &fd, XCALL_TYPE_SYNC);
 }
 
 static void ipi_flush_icache_range(void *arg)
@@ -588,7 +588,7 @@ void flush_icache_range(unsigned long start, unsigned long end)
 		.addr1 = start,
 		.addr2 = end,
 	};
-	on_each_cpu(ipi_flush_icache_range, &fd, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_icache_range, &fd, XCALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(flush_icache_range);
 
@@ -607,7 +607,7 @@ static void system_invalidate_dcache_range(unsigned long start,
 		.addr1 = start,
 		.addr2 = size,
 	};
-	on_each_cpu(ipi_invalidate_dcache_range, &fd, 1);
+	smp_xcall(XCALL_ALL, ipi_invalidate_dcache_range, &fd, XCALL_TYPE_SYNC);
 }
 
 static void ipi_flush_invalidate_dcache_range(void *arg)
@@ -623,5 +623,5 @@ static void system_flush_invalidate_dcache_range(unsigned long start,
 		.addr1 = start,
 		.addr2 = size,
 	};
-	on_each_cpu(ipi_flush_invalidate_dcache_range, &fd, 1);
+	smp_xcall(XCALL_ALL, ipi_flush_invalidate_dcache_range, &fd, XCALL_TYPE_SYNC);
 }
diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index 3ffbb1c80c5c..dc9754f3162e 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -1308,7 +1308,7 @@ static void ipi_handler(void *null)
 
 void global_cache_flush(void)
 {
-	on_each_cpu(ipi_handler, NULL, 1);
+	smp_xcall(XCALL_ALL, ipi_handler, NULL, XCALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(global_cache_flush);
 
diff --git a/drivers/clocksource/mips-gic-timer.c b/drivers/clocksource/mips-gic-timer.c
index be4175f415ba..51754c81213f 100644
--- a/drivers/clocksource/mips-gic-timer.c
+++ b/drivers/clocksource/mips-gic-timer.c
@@ -130,7 +130,7 @@ static int gic_clk_notifier(struct notifier_block *nb, unsigned long action,
 
 	if (action == POST_RATE_CHANGE) {
 		gic_clocksource_unstable("ref clock rate change");
-		on_each_cpu(gic_update_frequency, (void *)cnd->new_rate, 1);
+		smp_xcall(XCALL_ALL, gic_update_frequency, (void *)cnd->new_rate, XCALL_TYPE_SYNC);
 	}
 
 	return NOTIFY_OK;
diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index fd595c1cdd2f..140b6f1d6078 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -129,8 +129,8 @@ static void boost_set_msr_each(void *p_en)
 
 static int set_boost(struct cpufreq_policy *policy, int val)
 {
-	on_each_cpu_mask(policy->cpus, boost_set_msr_each,
-			 (void *)(long)val, 1);
+	smp_xcall_mask(policy->cpus, boost_set_msr_each,
+			(void *)(long)val, XCALL_TYPE_SYNC);
 	pr_debug("CPU %*pbl: Core Boosting %sabled.\n",
 		 cpumask_pr_args(policy->cpus), val ? "en" : "dis");
 
@@ -340,7 +340,7 @@ static void drv_write(struct acpi_cpufreq_data *data,
 	if (cpumask_test_cpu(this_cpu, mask))
 		do_drv_write(&cmd);
 
-	smp_call_function_many(mask, do_drv_write, &cmd, 1);
+	smp_xcall_mask(mask, do_drv_write, &cmd, XCALL_TYPE_SYNC);
 	put_cpu();
 }
 
diff --git a/drivers/cpufreq/tegra194-cpufreq.c b/drivers/cpufreq/tegra194-cpufreq.c
index ac381db25dbe..4b0b7e3cb19f 100644
--- a/drivers/cpufreq/tegra194-cpufreq.c
+++ b/drivers/cpufreq/tegra194-cpufreq.c
@@ -265,7 +265,7 @@ static int tegra194_cpufreq_set_target(struct cpufreq_policy *policy,
 	 * in a cluster run at same frequency which is the maximum frequency
 	 * request out of the values requested by both cores in that cluster.
 	 */
-	on_each_cpu_mask(policy->cpus, set_cpu_ndiv, tbl, true);
+	smp_xcall_mask(policy->cpus, set_cpu_ndiv, tbl, XCALL_TYPE_SYNC);
 
 	return 0;
 }
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index f70aa17e2a8e..88c104ee76b4 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -225,8 +225,8 @@ static int __cpuidle_register_driver(struct cpuidle_driver *drv)
 		return ret;
 
 	if (drv->bctimer)
-		on_each_cpu_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
-				 (void *)1, 1);
+		smp_xcall_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
+				 (void *)1, XCALL_TYPE_SYNC);
 
 	return 0;
 }
@@ -244,8 +244,8 @@ static void __cpuidle_unregister_driver(struct cpuidle_driver *drv)
 {
 	if (drv->bctimer) {
 		drv->bctimer = 0;
-		on_each_cpu_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
-				 NULL, 1);
+		smp_xcall_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
+				 NULL, XCALL_TYPE_SYNC);
 	}
 
 	__cpuidle_unset_driver(drv);
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 812baa48b290..4d44c541fea7 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -793,7 +793,7 @@ static ssize_t inject_write_store(struct device *dev,
 			"/sys/bus/machinecheck/devices/machinecheck<CPUNUM>/check_interval\n"
 			"so that you can get the error report faster.\n");
 
-	on_each_cpu(disable_caches, NULL, 1);
+	smp_xcall(XCALL_ALL, disable_caches, NULL, XCALL_TYPE_SYNC);
 
 	/* Issue 'word' and 'bit' along with the READ request */
 	amd64_write_pci_cfg(pvt->F3, F10_NB_ARRAY_DATA, word_bits);
@@ -806,7 +806,7 @@ static ssize_t inject_write_store(struct device *dev,
 		goto retry;
 	}
 
-	on_each_cpu(enable_caches, NULL, 1);
+	smp_xcall(XCALL_ALL, enable_caches, NULL, XCALL_TYPE_SYNC);
 
 	edac_dbg(0, "section=0x%x word_bits=0x%x\n", section, word_bits);
 
diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index 1e1a51510e83..e5d79fe7fe6b 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -101,7 +101,7 @@ static inline int sdei_do_cross_call(smp_call_func_t fn,
 	struct sdei_crosscall_args arg;
 
 	CROSSCALL_INIT(arg, event);
-	on_each_cpu(fn, &arg, true);
+	smp_xcall(XCALL_ALL, fn, &arg, XCALL_TYPE_SYNC);
 
 	return arg.first_error;
 }
@@ -359,7 +359,7 @@ static int sdei_api_shared_reset(void)
 static void sdei_mark_interface_broken(void)
 {
 	pr_err("disabling SDEI firmware interface\n");
-	on_each_cpu(&_ipi_mask_cpu, NULL, true);
+	smp_xcall(XCALL_ALL, &_ipi_mask_cpu, NULL, XCALL_TYPE_SYNC);
 	sdei_firmware_call = NULL;
 }
 
@@ -367,7 +367,7 @@ static int sdei_platform_reset(void)
 {
 	int err;
 
-	on_each_cpu(&_ipi_private_reset, NULL, true);
+	smp_xcall(XCALL_ALL, &_ipi_private_reset, NULL, XCALL_TYPE_SYNC);
 	err = sdei_api_shared_reset();
 	if (err) {
 		pr_err("Failed to reset platform: %d\n", err);
@@ -741,14 +741,14 @@ static struct notifier_block sdei_pm_nb = {
 
 static int sdei_device_suspend(struct device *dev)
 {
-	on_each_cpu(_ipi_mask_cpu, NULL, true);
+	smp_xcall(XCALL_ALL, _ipi_mask_cpu, NULL, XCALL_TYPE_SYNC);
 
 	return 0;
 }
 
 static int sdei_device_resume(struct device *dev)
 {
-	on_each_cpu(_ipi_unmask_cpu, NULL, true);
+	smp_xcall(XCALL_ALL, _ipi_unmask_cpu, NULL, XCALL_TYPE_SYNC);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/vlv_sideband.c b/drivers/gpu/drm/i915/vlv_sideband.c
index c26001300ebd..3c67ba455bb7 100644
--- a/drivers/gpu/drm/i915/vlv_sideband.c
+++ b/drivers/gpu/drm/i915/vlv_sideband.c
@@ -42,7 +42,7 @@ static void __vlv_punit_get(struct drm_i915_private *i915)
 	 */
 	if (IS_VALLEYVIEW(i915)) {
 		cpu_latency_qos_update_request(&i915->sb_qos, 0);
-		on_each_cpu(ping, NULL, 1);
+		smp_xcall(XCALL_ALL, ping, NULL, XCALL_TYPE_SYNC);
 	}
 }
 
diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
index 521534d5c1e5..ce69d7cdd91d 100644
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -188,7 +188,7 @@ static int read_registers(struct fam15h_power_data *data)
 		cpumask_set_cpu(cpumask_any(topology_sibling_cpumask(cpu)), mask);
 	}
 
-	on_each_cpu_mask(mask, do_read_registers_on_cu, data, true);
+	smp_xcall_mask(mask, do_read_registers_on_cu, data, XCALL_TYPE_SYNC);
 
 	cpus_read_unlock();
 	free_cpumask_var(mask);
diff --git a/drivers/irqchip/irq-mvebu-pic.c b/drivers/irqchip/irq-mvebu-pic.c
index ef3d3646ccc2..80cee49a9a8e 100644
--- a/drivers/irqchip/irq-mvebu-pic.c
+++ b/drivers/irqchip/irq-mvebu-pic.c
@@ -160,7 +160,7 @@ static int mvebu_pic_probe(struct platform_device *pdev)
 	irq_set_chained_handler(pic->parent_irq, mvebu_pic_handle_cascade_irq);
 	irq_set_handler_data(pic->parent_irq, pic);
 
-	on_each_cpu(mvebu_pic_enable_percpu_irq, pic, 1);
+	smp_xcall(XCALL_ALL, mvebu_pic_enable_percpu_irq, pic, XCALL_TYPE_SYNC);
 
 	platform_set_drvdata(pdev, pic);
 
@@ -171,7 +171,7 @@ static int mvebu_pic_remove(struct platform_device *pdev)
 {
 	struct mvebu_pic *pic = platform_get_drvdata(pdev);
 
-	on_each_cpu(mvebu_pic_disable_percpu_irq, pic, 1);
+	smp_xcall(XCALL_ALL, mvebu_pic_disable_percpu_irq, pic, XCALL_TYPE_SYNC);
 	irq_domain_remove(pic->domain);
 
 	return 0;
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 934f6dd90992..305c18bf33c1 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1473,10 +1473,10 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
 	int max_cpu = num_present_cpus();
 
 	/* Clear all Cause registers */
-	on_each_cpu(mvneta_percpu_clear_intr_cause, pp, true);
+	smp_xcall(XCALL_ALL, mvneta_percpu_clear_intr_cause, pp, XCALL_TYPE_SYNC);
 
 	/* Mask all interrupts */
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_xcall(XCALL_ALL, mvneta_percpu_mask_interrupt, pp, XCALL_TYPE_SYNC);
 	mvreg_write(pp, MVNETA_INTR_ENABLE, 0);
 
 	/* Enable MBUS Retry bit16 */
@@ -3704,7 +3704,7 @@ static void mvneta_start_dev(struct mvneta_port *pp)
 	}
 
 	/* Unmask interrupts. It has to be done from each CPU */
-	on_each_cpu(mvneta_percpu_unmask_interrupt, pp, true);
+	smp_xcall(XCALL_ALL, mvneta_percpu_unmask_interrupt, pp, XCALL_TYPE_SYNC);
 
 	mvreg_write(pp, MVNETA_INTR_MISC_MASK,
 		    MVNETA_CAUSE_PHY_STATUS_CHANGE |
@@ -3751,10 +3751,10 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
 	mvneta_port_disable(pp);
 
 	/* Clear all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_clear_intr_cause, pp, true);
+	smp_xcall(XCALL_ALL, mvneta_percpu_clear_intr_cause, pp, XCALL_TYPE_SYNC);
 
 	/* Mask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_xcall(XCALL_ALL, mvneta_percpu_mask_interrupt, pp, XCALL_TYPE_SYNC);
 
 	mvneta_tx_reset(pp);
 	mvneta_rx_reset(pp);
@@ -3811,7 +3811,7 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
 	 * reallocation of the queues
 	 */
 	mvneta_stop_dev(pp);
-	on_each_cpu(mvneta_percpu_disable, pp, true);
+	smp_xcall(XCALL_ALL, mvneta_percpu_disable, pp, XCALL_TYPE_SYNC);
 
 	mvneta_cleanup_txqs(pp);
 	mvneta_cleanup_rxqs(pp);
@@ -3833,7 +3833,7 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
 		return ret;
 	}
 
-	on_each_cpu(mvneta_percpu_enable, pp, true);
+	smp_xcall(XCALL_ALL, mvneta_percpu_enable, pp, XCALL_TYPE_SYNC);
 	mvneta_start_dev(pp);
 
 	netdev_update_features(dev);
@@ -4349,7 +4349,7 @@ static int mvneta_cpu_online(unsigned int cpu, struct hlist_node *node)
 	}
 
 	/* Mask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_xcall(XCALL_ALL, mvneta_percpu_mask_interrupt, pp, XCALL_TYPE_SYNC);
 	napi_enable(&port->napi);
 
 	/*
@@ -4365,7 +4365,7 @@ static int mvneta_cpu_online(unsigned int cpu, struct hlist_node *node)
 	mvneta_percpu_elect(pp);
 
 	/* Unmask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_unmask_interrupt, pp, true);
+	smp_xcall(XCALL_ALL, mvneta_percpu_unmask_interrupt, pp, XCALL_TYPE_SYNC);
 	mvreg_write(pp, MVNETA_INTR_MISC_MASK,
 		    MVNETA_CAUSE_PHY_STATUS_CHANGE |
 		    MVNETA_CAUSE_LINK_CHANGE);
@@ -4386,7 +4386,7 @@ static int mvneta_cpu_down_prepare(unsigned int cpu, struct hlist_node *node)
 	 */
 	spin_lock(&pp->lock);
 	/* Mask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_xcall(XCALL_ALL, mvneta_percpu_mask_interrupt, pp, XCALL_TYPE_SYNC);
 	spin_unlock(&pp->lock);
 
 	napi_synchronize(&port->napi);
@@ -4406,7 +4406,7 @@ static int mvneta_cpu_dead(unsigned int cpu, struct hlist_node *node)
 	mvneta_percpu_elect(pp);
 	spin_unlock(&pp->lock);
 	/* Unmask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_unmask_interrupt, pp, true);
+	smp_xcall(XCALL_ALL, mvneta_percpu_unmask_interrupt, pp, XCALL_TYPE_SYNC);
 	mvreg_write(pp, MVNETA_INTR_MISC_MASK,
 		    MVNETA_CAUSE_PHY_STATUS_CHANGE |
 		    MVNETA_CAUSE_LINK_CHANGE);
@@ -4445,7 +4445,7 @@ static int mvneta_open(struct net_device *dev)
 		/* Enable per-CPU interrupt on all the CPU to handle our RX
 		 * queue interrupts
 		 */
-		on_each_cpu(mvneta_percpu_enable, pp, true);
+		smp_xcall(XCALL_ALL, mvneta_percpu_enable, pp, XCALL_TYPE_SYNC);
 
 		pp->is_stopped = false;
 		/* Register a CPU notifier to handle the case where our CPU
@@ -4484,7 +4484,7 @@ static int mvneta_open(struct net_device *dev)
 	if (pp->neta_armada3700) {
 		free_irq(pp->dev->irq, pp);
 	} else {
-		on_each_cpu(mvneta_percpu_disable, pp, true);
+		smp_xcall(XCALL_ALL, mvneta_percpu_disable, pp, XCALL_TYPE_SYNC);
 		free_percpu_irq(pp->dev->irq, pp->ports);
 	}
 err_cleanup_txqs:
@@ -4516,7 +4516,7 @@ static int mvneta_stop(struct net_device *dev)
 						    &pp->node_online);
 		cpuhp_state_remove_instance_nocalls(CPUHP_NET_MVNETA_DEAD,
 						    &pp->node_dead);
-		on_each_cpu(mvneta_percpu_disable, pp, true);
+		smp_xcall(XCALL_ALL, mvneta_percpu_disable, pp, XCALL_TYPE_SYNC);
 		free_percpu_irq(dev->irq, pp->ports);
 	} else {
 		mvneta_stop_dev(pp);
@@ -4893,7 +4893,7 @@ static int  mvneta_config_rss(struct mvneta_port *pp)
 
 	netif_tx_stop_all_queues(pp->dev);
 
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_xcall(XCALL_ALL, mvneta_percpu_mask_interrupt, pp, XCALL_TYPE_SYNC);
 
 	if (!pp->neta_armada3700) {
 		/* We have to synchronise on the napi of each CPU */
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 1a835b48791b..e94b51be503f 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -3263,7 +3263,7 @@ static void mvpp2_cleanup_txqs(struct mvpp2_port *port)
 		mvpp2_txq_deinit(port, txq);
 	}
 
-	on_each_cpu(mvpp2_txq_sent_counter_clear, port, 1);
+	smp_xcall(XCALL_ALL, mvpp2_txq_sent_counter_clear, port, XCALL_TYPE_SYNC);
 
 	val &= ~MVPP2_TX_PORT_FLUSH_MASK(port->id);
 	mvpp2_write(port->priv, MVPP2_TX_PORT_FLUSH_REG, val);
@@ -3327,7 +3327,7 @@ static int mvpp2_setup_txqs(struct mvpp2_port *port)
 		}
 	}
 
-	on_each_cpu(mvpp2_txq_sent_counter_clear, port, 1);
+	smp_xcall(XCALL_ALL, mvpp2_txq_sent_counter_clear, port, XCALL_TYPE_SYNC);
 	return 0;
 
 err_cleanup:
@@ -4829,7 +4829,7 @@ static int mvpp2_open(struct net_device *dev)
 	}
 
 	/* Unmask interrupts on all CPUs */
-	on_each_cpu(mvpp2_interrupts_unmask, port, 1);
+	smp_xcall(XCALL_ALL, mvpp2_interrupts_unmask, port, XCALL_TYPE_SYNC);
 	mvpp2_shared_interrupt_mask_unmask(port, false);
 
 	mvpp2_start_dev(port);
@@ -4858,7 +4858,7 @@ static int mvpp2_stop(struct net_device *dev)
 	mvpp2_stop_dev(port);
 
 	/* Mask interrupts on all threads */
-	on_each_cpu(mvpp2_interrupts_mask, port, 1);
+	smp_xcall(XCALL_ALL, mvpp2_interrupts_mask, port, XCALL_TYPE_SYNC);
 	mvpp2_shared_interrupt_mask_unmask(port, true);
 
 	if (port->phylink)
diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c
index 4dfdbfca6841..8a0fb76b297e 100644
--- a/drivers/platform/x86/intel_ips.c
+++ b/drivers/platform/x86/intel_ips.c
@@ -457,7 +457,7 @@ static void ips_enable_cpu_turbo(struct ips_driver *ips)
 		return;
 
 	if (ips->turbo_toggle_allowed)
-		on_each_cpu(do_enable_cpu_turbo, ips, 1);
+		smp_xcall(XCALL_ALL, do_enable_cpu_turbo, ips, XCALL_TYPE_SYNC);
 
 	ips->__cpu_turbo_on = true;
 }
@@ -495,7 +495,7 @@ static void ips_disable_cpu_turbo(struct ips_driver *ips)
 		return;
 
 	if (ips->turbo_toggle_allowed)
-		on_each_cpu(do_disable_cpu_turbo, ips, 1);
+		smp_xcall(XCALL_ALL, do_disable_cpu_turbo, ips, XCALL_TYPE_SYNC);
 
 	ips->__cpu_turbo_on = false;
 }
diff --git a/drivers/soc/xilinx/xlnx_event_manager.c b/drivers/soc/xilinx/xlnx_event_manager.c
index b27f8853508e..efb11c62f2a0 100644
--- a/drivers/soc/xilinx/xlnx_event_manager.c
+++ b/drivers/soc/xilinx/xlnx_event_manager.c
@@ -514,7 +514,7 @@ static void xlnx_event_cleanup_sgi(struct platform_device *pdev)
 
 	cpuhp_remove_state(CPUHP_AP_ONLINE_DYN);
 
-	on_each_cpu(xlnx_disable_percpu_irq, NULL, 1);
+	smp_xcall(XCALL_ALL, xlnx_disable_percpu_irq, NULL, XCALL_TYPE_SYNC);
 
 	irq_clear_status_flags(virq_sgi, IRQ_PER_CPU);
 	free_percpu_irq(virq_sgi, &cpu_number1);
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index bbfd004449b5..e8542970d93e 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -243,7 +243,7 @@ static void showacpu(void *dummy)
 
 static void sysrq_showregs_othercpus(struct work_struct *dummy)
 {
-	smp_call_function(showacpu, NULL, 0);
+	smp_xcall(XCALL_ALL, showacpu, NULL, XCALL_TYPE_ASYNC);
 }
 
 static DECLARE_WORK(sysrq_showallcpus, sysrq_showregs_othercpus);
diff --git a/drivers/watchdog/booke_wdt.c b/drivers/watchdog/booke_wdt.c
index 5e4dc1a0f2c6..45ec7bb02f2a 100644
--- a/drivers/watchdog/booke_wdt.c
+++ b/drivers/watchdog/booke_wdt.c
@@ -118,7 +118,7 @@ static void __booke_wdt_set(void *data)
 
 static void booke_wdt_set(void *data)
 {
-	on_each_cpu(__booke_wdt_set, data, 0);
+	smp_xcall(XCALL_ALL, __booke_wdt_set, data, XCALL_TYPE_ASYNC);
 }
 
 static void __booke_wdt_ping(void *data)
@@ -128,7 +128,7 @@ static void __booke_wdt_ping(void *data)
 
 static int booke_wdt_ping(struct watchdog_device *wdog)
 {
-	on_each_cpu(__booke_wdt_ping, NULL, 0);
+	smp_xcall(XCALL_ALL, __booke_wdt_ping, NULL, XCALL_TYPE_ASYNC);
 
 	return 0;
 }
@@ -170,7 +170,7 @@ static void __booke_wdt_disable(void *data)
 
 static int booke_wdt_start(struct watchdog_device *wdog)
 {
-	on_each_cpu(__booke_wdt_enable, wdog, 0);
+	smp_xcall(XCALL_ALL, __booke_wdt_enable, wdog, XCALL_TYPE_ASYNC);
 	pr_debug("watchdog enabled (timeout = %u sec)\n", wdog->timeout);
 
 	return 0;
@@ -178,7 +178,7 @@ static int booke_wdt_start(struct watchdog_device *wdog)
 
 static int booke_wdt_stop(struct watchdog_device *wdog)
 {
-	on_each_cpu(__booke_wdt_disable, NULL, 0);
+	smp_xcall(XCALL_ALL, __booke_wdt_disable, NULL, XCALL_TYPE_ASYNC);
 	pr_debug("watchdog disabled\n");
 
 	return 0;
diff --git a/fs/buffer.c b/fs/buffer.c
index 2b5561ae5d0b..5c93f3d94c74 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1420,7 +1420,7 @@ bool has_bh_in_lru(int cpu, void *dummy)
 
 void invalidate_bh_lrus(void)
 {
-	on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1);
+	smp_xcall_cond(XCALL_ALL, invalidate_bh_lru, NULL, has_bh_in_lru, XCALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL_GPL(invalidate_bh_lrus);
 
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 3ddd4c6107e1..673192e2192e 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -174,6 +174,5 @@ extern int smp_xcall_any(const struct cpumask *mask, smp_call_func_t func,
 #define	XCALL_TYPE_IRQ_WORK	CSD_TYPE_IRQ_WORK
 #define	XCALL_TYPE_TTWU		CSD_TYPE_TTWU
 #define	XCALL_TYPE_MASK		CSD_FLAG_TYPE_MASK
+#define	XCALL_ALL		-1
-
-#define	XCALL_ALL		-1
 
@@ -205,9 +206,6 @@ extern unsigned int total_cpus;
 int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 			     int wait);
 
-void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
-			   void *info, bool wait, const struct cpumask *mask);
-
 #define	smp_call_function_single_async(cpu, csd) \
 	smp_xcall_private(cpu, csd, XCALL_TYPE_ASYNC)
 
@@ -219,48 +217,6 @@ void panic_smp_self_stop(void);
 void nmi_panic_self_stop(struct pt_regs *regs);
 void crash_smp_send_stop(void);
 
-/*
- * Call a function on all processors
- */
-static inline void on_each_cpu(smp_call_func_t func, void *info, int wait)
-{
-	on_each_cpu_cond_mask(NULL, func, info, wait, cpu_online_mask);
-}
-
-/**
- * on_each_cpu_mask(): Run a function on processors specified by
- * cpumask, which may include the local processor.
- * @mask: The set of cpus to run on (only runs on online subset).
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait (atomically) until function has completed
- *        on other CPUs.
- *
- * If @wait is true, then returns once @func has returned.
- *
- * You must not call this function with disabled interrupts or from a
- * hardware interrupt handler or from a bottom half handler.  The
- * exception is that it may be used during early boot while
- * early_boot_irqs_disabled is set.
- */
-static inline void on_each_cpu_mask(const struct cpumask *mask,
-				    smp_call_func_t func, void *info, bool wait)
-{
-	on_each_cpu_cond_mask(NULL, func, info, wait, mask);
-}
-
-/*
- * Call a function on each processor for which the supplied function
- * cond_func returns a positive value. This may include the local
- * processor.  May be used during early boot while early_boot_irqs_disabled is
- * set. Use local_irq_save/restore() instead of local_irq_disable/enable().
- */
-static inline void on_each_cpu_cond(smp_cond_func_t cond_func,
-				    smp_call_func_t func, void *info, bool wait)
-{
-	on_each_cpu_cond_mask(cond_func, func, info, wait, cpu_online_mask);
-}
-
 #ifdef CONFIG_SMP
 
 #include <linux/preempt.h>
@@ -299,13 +255,6 @@ extern int __cpu_up(unsigned int cpunum, struct task_struct *tidle);
  */
 extern void smp_cpus_done(unsigned int max_cpus);
 
-/*
- * Call a function on all other processors
- */
-void smp_call_function(smp_call_func_t func, void *info, int wait);
-void smp_call_function_many(const struct cpumask *mask,
-			    smp_call_func_t func, void *info, bool wait);
-
 void kick_all_cpus_sync(void);
 void wake_up_all_idle_cpus(void);
 
diff --git a/kernel/profile.c b/kernel/profile.c
index 37640a0bd8a3..61a564372fc0 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -179,7 +179,7 @@ static void profile_flip_buffers(void)
 	mutex_lock(&profile_flip_mutex);
 	j = per_cpu(cpu_profile_flip, get_cpu());
 	put_cpu();
-	on_each_cpu(__profile_flip_buffers, NULL, 1);
+	smp_xcall(XCALL_ALL, __profile_flip_buffers, NULL, XCALL_TYPE_SYNC);
 	for_each_online_cpu(cpu) {
 		struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[j];
 		for (i = 0; i < NR_PROFILE_HIT; ++i) {
@@ -202,7 +202,7 @@ static void profile_discard_flip_buffers(void)
 	mutex_lock(&profile_flip_mutex);
 	i = per_cpu(cpu_profile_flip, get_cpu());
 	put_cpu();
-	on_each_cpu(__profile_flip_buffers, NULL, 1);
+	smp_xcall(XCALL_ALL, __profile_flip_buffers, NULL, XCALL_TYPE_SYNC);
 	for_each_online_cpu(cpu) {
 		struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[i];
 		memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a4b8189455d5..bf3a3fe88d94 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1884,7 +1884,7 @@ static noinline_for_stack bool rcu_gp_init(void)
 
 	// If strict, make all CPUs aware of new grace period.
 	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
-		on_each_cpu(rcu_strict_gp_boundary, NULL, 0);
+		smp_xcall(XCALL_ALL, rcu_strict_gp_boundary, NULL, XCALL_TYPE_ASYNC);
 
 	return true;
 }
@@ -2109,7 +2109,7 @@ static noinline void rcu_gp_cleanup(void)
 
 	// If strict, make all CPUs aware of the end of the old grace period.
 	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
-		on_each_cpu(rcu_strict_gp_boundary, NULL, 0);
+		smp_xcall(XCALL_ALL, rcu_strict_gp_boundary, NULL, XCALL_TYPE_ASYNC);
 }
 
 /*
diff --git a/kernel/scftorture.c b/kernel/scftorture.c
index dcb0410950e4..a2cb2b223997 100644
--- a/kernel/scftorture.c
+++ b/kernel/scftorture.c
@@ -398,7 +398,8 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 			barrier(); // Prevent race-reduction compiler optimizations.
 			scfcp->scfc_in = true;
 		}
-		smp_call_function_many(cpu_online_mask, scf_handler, scfcp, scfsp->scfs_wait);
+		smp_xcall_mask(cpu_online_mask, scf_handler, scfcp,
+			       (scfsp->scfs_wait ? XCALL_TYPE_SYNC : XCALL_TYPE_ASYNC));
 		break;
 	case SCF_PRIM_ALL:
 		if (scfsp->scfs_wait)
@@ -409,7 +410,8 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 			barrier(); // Prevent race-reduction compiler optimizations.
 			scfcp->scfc_in = true;
 		}
-		smp_call_function(scf_handler, scfcp, scfsp->scfs_wait);
+		smp_xcall(XCALL_ALL, scf_handler, scfcp,
+			  (scfsp->scfs_wait ? XCALL_TYPE_SYNC : XCALL_TYPE_ASYNC));
 		break;
 	default:
 		WARN_ON_ONCE(1);
@@ -515,7 +517,7 @@ static void scf_torture_cleanup(void)
 			torture_stop_kthread("scftorture_invoker", scf_stats_p[i].task);
 	else
 		goto end;
-	smp_call_function(scf_cleanup_handler, NULL, 0);
+	smp_xcall(XCALL_ALL, scf_cleanup_handler, NULL, XCALL_TYPE_ASYNC);
 	torture_stop_kthread(scf_torture_stats, scf_torture_stats_task);
 	scf_torture_stats_print();  // -After- the stats thread is stopped!
 	kfree(scf_stats_p);  // -After- the last stats print has completed!
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 0c5be7ebb1dc..9af7795f220e 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -290,9 +290,7 @@ static int membarrier_global_expedited(void)
 	}
 	rcu_read_unlock();
 
-	preempt_disable();
-	smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
-	preempt_enable();
+	smp_xcall_mask(tmpmask, ipi_mb, NULL, XCALL_TYPE_SYNC);
 
 	free_cpumask_var(tmpmask);
 	cpus_read_unlock();
@@ -399,11 +397,9 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		 * rseq critical section.
 		 */
 		if (flags != MEMBARRIER_FLAG_SYNC_CORE) {
-			preempt_disable();
-			smp_call_function_many(tmpmask, ipi_func, NULL, true);
-			preempt_enable();
+			smp_xcall_mask(tmpmask, ipi_func, NULL, XCALL_TYPE_SYNC);
 		} else {
-			on_each_cpu_mask(tmpmask, ipi_func, NULL, true);
+			smp_xcall_mask(tmpmask, ipi_func, NULL, XCALL_TYPE_SYNC);
 		}
 	}
 
@@ -471,7 +467,7 @@ static int sync_runqueues_membarrier_state(struct mm_struct *mm)
 	}
 	rcu_read_unlock();
 
-	on_each_cpu_mask(tmpmask, ipi_sync_rq_state, mm, true);
+	smp_xcall_mask(tmpmask, ipi_sync_rq_state, mm, XCALL_TYPE_SYNC);
 
 	free_cpumask_var(tmpmask);
 	cpus_read_unlock();
diff --git a/kernel/smp.c b/kernel/smp.c
index 94df3b3a38cf..fb2333218e31 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -626,10 +626,12 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
-static void smp_call_function_many_cond(const struct cpumask *mask,
-					smp_call_func_t func, void *info,
-					bool wait,
-					smp_cond_func_t cond_func)
+/*
+ * This function performs synchronous and asynchronous cross CPU calls for
+ * more than one participant.
+ */
+static void __smp_call_mask_cond(const struct cpumask *mask, smp_call_func_t func,
+		void *info, smp_cond_func_t cond_func, unsigned int flags)
 {
 	int cpu, last_cpu, this_cpu = smp_processor_id();
 	struct call_function_data *cfd;
@@ -650,10 +652,10 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		lockdep_assert_irqs_enabled();
 
 	/*
-	 * When @wait we can deadlock when we interrupt between llist_add() and
-	 * arch_send_call_function_ipi*(); when !@wait we can deadlock due to
-	 * csd_lock() on because the interrupt context uses the same csd
-	 * storage.
+	 * When CSD_TYPE_SYNC we can deadlock when we interrupt between
+	 * llist_add() and arch_send_call_function_ipi*(); when CSD_TYPE_ASYNC
+	 * we can deadlock due to csd_lock() because the interrupt context
+	 * uses the same csd storage.
 	 */
 	WARN_ON_ONCE(!in_task());
 
@@ -682,7 +684,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 				continue;
 
 			csd_lock(csd);
-			if (wait)
+			if (flags & CSD_TYPE_SYNC)
 				csd->node.u_flags |= CSD_TYPE_SYNC;
 			csd->func = func;
 			csd->info = info;
@@ -721,11 +723,12 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		unsigned long flags;
 
 		local_irq_save(flags);
-		func(info);
+		if (func != NULL)
+			func(info);
 		local_irq_restore(flags);
 	}
 
-	if (run_remote && wait) {
+	if (run_remote && (flags & CSD_TYPE_SYNC)) {
 		for_each_cpu(cpu, cfd->cpumask) {
 			call_single_data_t *csd;
 
@@ -735,48 +738,6 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	}
 }
 
-/**
- * smp_call_function_many(): Run a function on a set of CPUs.
- * @mask: The set of cpus to run on (only runs on online subset).
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If wait is true, the call will not return until func()
- *        has completed on other CPUs.
- *
- * You must not call this function with disabled interrupts or from a
- * hardware interrupt handler or from a bottom half handler. Preemption
- * must be disabled when calling this function.
- */
-void smp_call_function_many(const struct cpumask *mask,
-			    smp_call_func_t func, void *info, bool wait)
-{
-	smp_call_function_many_cond(mask, func, info, wait, NULL);
-}
-EXPORT_SYMBOL(smp_call_function_many);
-
-/**
- * smp_call_function(): Run a function on all other CPUs.
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait (atomically) until function has completed
- *        on other CPUs.
- *
- * Returns 0.
- *
- * If @wait is true, then returns once @func has returned; otherwise
- * it returns just before the target cpu calls @func.
- *
- * You must not call this function with disabled interrupts or from a
- * hardware interrupt handler or from a bottom half handler.
- */
-void smp_call_function(smp_call_func_t func, void *info, int wait)
-{
-	preempt_disable();
-	smp_call_function_many(cpu_online_mask, func, info, wait);
-	preempt_enable();
-}
-EXPORT_SYMBOL(smp_call_function);
-
 /* Setup configured maximum number of CPUs to activate */
 unsigned int setup_max_cpus = NR_CPUS;
 EXPORT_SYMBOL(setup_max_cpus);
@@ -861,38 +822,6 @@ void __init smp_init(void)
 	smp_cpus_done(setup_max_cpus);
 }
 
-/*
- * on_each_cpu_cond(): Call a function on each processor for which
- * the supplied function cond_func returns true, optionally waiting
- * for all the required CPUs to finish. This may include the local
- * processor.
- * @cond_func:	A callback function that is passed a cpu id and
- *		the info parameter. The function is called
- *		with preemption disabled. The function should
- *		return a blooean value indicating whether to IPI
- *		the specified CPU.
- * @func:	The function to run on all applicable CPUs.
- *		This must be fast and non-blocking.
- * @info:	An arbitrary pointer to pass to both functions.
- * @wait:	If true, wait (atomically) until function has
- *		completed on other CPUs.
- *
- * Preemption is disabled to protect against CPUs going offline but not online.
- * CPUs going online during the call will not be seen or sent an IPI.
- *
- * You must not call this function with disabled interrupts or
- * from a hardware interrupt handler or from a bottom half handler.
- */
-void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
-			   void *info, bool wait, const struct cpumask *mask)
-{
-
-	preempt_disable();
-	smp_call_function_many_cond(mask, func, info, wait, cond_func);
-	preempt_enable();
-}
-EXPORT_SYMBOL(on_each_cpu_cond_mask);
-
 static void do_nothing(void *unused)
 {
 }
@@ -912,7 +841,7 @@ void kick_all_cpus_sync(void)
 {
 	/* Make sure the change is visible before we kick the cpus */
 	smp_mb();
-	smp_call_function(do_nothing, NULL, 1);
+	smp_xcall(XCALL_ALL, do_nothing, NULL, XCALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL_GPL(kick_all_cpus_sync);
 
@@ -992,27 +921,6 @@ int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
 }
 EXPORT_SYMBOL_GPL(smp_call_on_cpu);
 
-
-void __smp_call_mask_cond(const struct cpumask *mask,
-		smp_call_func_t func, void *info,
-		smp_cond_func_t cond_func,
-		unsigned int flags)
-{
-	bool wait = false;
-
-	if (flags == XCALL_TYPE_SYNC)
-		wait = true;
-
-	preempt_disable();
-
-	/*
-	 * This is temporarily hook. The function smp_call_function_many_cond()
-	 * will be inlined here with the last patch in this series.
-	 */
-	smp_call_function_many_cond(mask, func, info, wait, cond_func);
-	preempt_enable();
-}
-
 /*
  * smp_xcall Interface
  *
@@ -1033,7 +941,8 @@ void __smp_call_mask_cond(const struct cpumask *mask,
  * Parameters:
  *
  * cpu: If cpu >=0 && cpu < nr_cpu_ids, the cross call is for that cpu.
- *      If cpu == -1, the cross call is for all the online CPUs
+ *      If cpu == XCALL_ALL (i.e. -1), the cross call is for all the online
+ *      CPUs.
  *
  * func: It is the cross function that the destination CPUs need to execute.
  *       This function must be fast and non-blocking.
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 0ea8702eb516..b829694d8d9d 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -952,7 +952,7 @@ void clock_was_set(unsigned int bases)
 		goto out_timerfd;
 
 	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
-		on_each_cpu(retrigger_next_event, NULL, 1);
+		smp_xcall(XCALL_ALL, retrigger_next_event, NULL, XCALL_TYPE_SYNC);
 		goto out_timerfd;
 	}
 
@@ -970,9 +970,7 @@ void clock_was_set(unsigned int bases)
 		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 	}
 
-	preempt_disable();
-	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
-	preempt_enable();
+	smp_xcall_mask(mask, retrigger_next_event, NULL, XCALL_TYPE_SYNC);
 	cpus_read_unlock();
 	free_cpumask_var(mask);
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4f1d2f5e7263..6b5f1bcb0f47 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -243,7 +243,7 @@ static void update_ftrace_function(void)
 	/* Make sure the function_trace_op is visible on all CPUs */
 	smp_wmb();
 	/* Nasty way to force a rmb on all cpus */
-	smp_call_function(ftrace_sync_ipi, NULL, 1);
+	smp_xcall(XCALL_ALL, ftrace_sync_ipi, NULL, XCALL_TYPE_SYNC);
 	/* OK, we are all set to update the ftrace_trace_function now! */
 #endif /* !CONFIG_DYNAMIC_FTRACE */
 
@@ -2756,7 +2756,7 @@ void ftrace_modify_all_code(int command)
 		smp_wmb();
 		/* If irqs are disabled, we are in stop machine */
 		if (!irqs_disabled())
-			smp_call_function(ftrace_sync_ipi, NULL, 1);
+			smp_xcall(XCALL_ALL, ftrace_sync_ipi, NULL, XCALL_TYPE_SYNC);
 		err = ftrace_update_ftrace_func(ftrace_trace_function);
 		if (FTRACE_WARN_ON(err))
 			return;
@@ -7769,7 +7769,7 @@ pid_write(struct file *filp, const char __user *ubuf,
 	 * check for those tasks that are currently running.
 	 * Always do this in case a pid was appended or removed.
 	 */
-	on_each_cpu(ignore_task_cpu, tr, 1);
+	smp_xcall(XCALL_ALL, ignore_task_cpu, tr, XCALL_TYPE_SYNC);
 
 	ftrace_update_pid_func();
 	ftrace_startup_all(0);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 05dfc7a12d3d..489a59d0cdac 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5865,7 +5865,7 @@ static __init int rb_hammer_test(void *arg)
 	while (!kthread_should_stop()) {
 
 		/* Send an IPI to all cpus to write data! */
-		smp_call_function(rb_ipi, NULL, 1);
+		smp_xcall(XCALL_ALL, rb_ipi, NULL, XCALL_TYPE_SYNC);
 		/* No sleep, but for non preempt, let others run */
 		schedule();
 	}
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f4de111fa18f..58c8984b37d8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2724,11 +2724,9 @@ void trace_buffered_event_disable(void)
 	if (--trace_buffered_event_ref)
 		return;
 
-	preempt_disable();
 	/* For each CPU, set the buffer as used. */
-	smp_call_function_many(tracing_buffer_mask,
-			       disable_trace_buffered_event, NULL, 1);
-	preempt_enable();
+	smp_xcall_mask(tracing_buffer_mask,
+		       disable_trace_buffered_event, NULL, XCALL_TYPE_SYNC);
 
 	/* Wait for all current users to finish */
 	synchronize_rcu();
@@ -2743,11 +2741,9 @@ void trace_buffered_event_disable(void)
 	 */
 	smp_wmb();
 
-	preempt_disable();
 	/* Do the work on each cpu */
-	smp_call_function_many(tracing_buffer_mask,
-			       enable_trace_buffered_event, NULL, 1);
-	preempt_enable();
+	smp_xcall_mask(tracing_buffer_mask,
+		       enable_trace_buffered_event, NULL, XCALL_TYPE_SYNC);
 }
 
 static struct trace_buffer *temp_buffer;
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index e11e167b7809..b9d6c0465077 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1995,7 +1995,7 @@ event_pid_write(struct file *filp, const char __user *ubuf,
 	 * check for those tasks that are currently running.
 	 * Always do this in case a pid was appended or removed.
 	 */
-	on_each_cpu(ignore_task_cpu, tr, 1);
+	smp_xcall(XCALL_ALL, ignore_task_cpu, tr, XCALL_TYPE_SYNC);
 
  out:
 	mutex_unlock(&event_mutex);
diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 08291ed33e93..9c851fee2f2a 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -332,7 +332,7 @@ void kasan_quarantine_remove_cache(struct kmem_cache *cache)
 	 * achieves the first goal, while synchronize_srcu() achieves the
 	 * second.
 	 */
-	on_each_cpu(per_cpu_remove_cache, cache, 1);
+	smp_xcall(XCALL_ALL, per_cpu_remove_cache, cache, XCALL_TYPE_SYNC);
 
 	raw_spin_lock_irqsave(&quarantine_lock, flags);
 	for (i = 0; i < QUARANTINE_BATCHES; i++) {
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index afb7185ffdc4..befe6183806f 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -149,7 +149,7 @@ static void tlb_remove_table_sync_one(void)
 	 * It is however sufficient for software page-table walkers that rely on
 	 * IRQ disabling.
 	 */
-	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
+	smp_xcall(XCALL_ALL, tlb_remove_table_smp_sync, NULL, XCALL_TYPE_SYNC);
 }
 
 static void tlb_remove_table_rcu(struct rcu_head *head)
diff --git a/mm/slab.c b/mm/slab.c
index b04e40078bdf..e53acef380f3 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2171,7 +2171,7 @@ static void drain_cpu_caches(struct kmem_cache *cachep)
 	int node;
 	LIST_HEAD(list);
 
-	on_each_cpu(do_drain, cachep, 1);
+	smp_xcall(XCALL_ALL, do_drain, cachep, XCALL_TYPE_SYNC);
 	check_irq_on();
 	for_each_kmem_cache_node(cachep, node, n)
 		if (n->alien)
diff --git a/net/iucv/iucv.c b/net/iucv/iucv.c
index eb0295d90039..7fba16e99665 100644
--- a/net/iucv/iucv.c
+++ b/net/iucv/iucv.c
@@ -574,7 +574,7 @@ static int iucv_enable(void)
 static void iucv_disable(void)
 {
 	cpus_read_lock();
-	on_each_cpu(iucv_retrieve_cpu, NULL, 1);
+	smp_xcall(XCALL_ALL, iucv_retrieve_cpu, NULL, XCALL_TYPE_SYNC);
 	kfree(iucv_path_table);
 	iucv_path_table = NULL;
 	cpus_read_unlock();
@@ -696,7 +696,7 @@ static void iucv_cleanup_queue(void)
 	 * pending interrupts force them to the work queue by calling
 	 * an empty function on all cpus.
 	 */
-	smp_call_function(__iucv_cleanup_queue, NULL, 1);
+	smp_xcall(XCALL_ALL, __iucv_cleanup_queue, NULL, XCALL_TYPE_SYNC);
 	spin_lock_irq(&iucv_queue_lock);
 	list_for_each_entry_safe(p, n, &iucv_task_queue, list) {
 		/* Remove stale work items from the task queue. */
@@ -787,7 +787,7 @@ static int iucv_reboot_event(struct notifier_block *this,
 		return NOTIFY_DONE;
 
 	cpus_read_lock();
-	on_each_cpu_mask(&iucv_irq_cpumask, iucv_block_cpu, NULL, 1);
+	smp_xcall_mask(&iucv_irq_cpumask, iucv_block_cpu, NULL, XCALL_TYPE_SYNC);
 	preempt_disable();
 	for (i = 0; i < iucv_max_pathid; i++) {
 		if (iucv_path_table[i])
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 70e05af5ebea..adb7cbe67ee5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -244,7 +244,7 @@ static inline bool kvm_kick_many_cpus(struct cpumask *cpus, bool wait)
 	if (cpumask_empty(cpus))
 		return false;
 
-	smp_call_function_many(cpus, ack_flush, NULL, wait);
+	smp_xcall_mask(cpus, ack_flush, NULL, (wait ? XCALL_TYPE_SYNC : XCALL_TYPE_ASYNC));
 	return true;
 }
 
@@ -4895,7 +4895,7 @@ static void hardware_disable_all_nolock(void)
 
 	kvm_usage_count--;
 	if (!kvm_usage_count)
-		on_each_cpu(hardware_disable_nolock, NULL, 1);
+		smp_xcall(XCALL_ALL, hardware_disable_nolock, NULL, XCALL_TYPE_SYNC);
 }
 
 static void hardware_disable_all(void)
@@ -4914,7 +4914,7 @@ static int hardware_enable_all(void)
 	kvm_usage_count++;
 	if (kvm_usage_count == 1) {
 		atomic_set(&hardware_enable_failed, 0);
-		on_each_cpu(hardware_enable_nolock, NULL, 1);
+		smp_xcall(XCALL_ALL, hardware_enable_nolock, NULL, XCALL_TYPE_SYNC);
 
 		if (atomic_read(&hardware_enable_failed)) {
 			hardware_disable_all_nolock();
@@ -4938,7 +4938,7 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
 	 */
 	pr_info("kvm: exiting hardware virtualization\n");
 	kvm_rebooting = true;
-	on_each_cpu(hardware_disable_nolock, NULL, 1);
+	smp_xcall(XCALL_ALL, hardware_disable_nolock, NULL, XCALL_TYPE_SYNC);
 	return NOTIFY_OK;
 }
 
@@ -5790,7 +5790,7 @@ void kvm_exit(void)
 	unregister_syscore_ops(&kvm_syscore_ops);
 	unregister_reboot_notifier(&kvm_reboot_notifier);
 	cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
-	on_each_cpu(hardware_disable_nolock, NULL, 1);
+	smp_xcall(XCALL_ALL, hardware_disable_nolock, NULL, XCALL_TYPE_SYNC);
 	kvm_arch_hardware_unsetup();
 	kvm_arch_exit();
 	kvm_irqfd_exit();
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 09/11] smp: replace smp_call_function_single_async with smp_xcall_private
  2022-04-15  2:46 [PATCH 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (7 preceding siblings ...)
  2022-04-15  2:46 ` [PATCH 08/11] smp: replace smp_call_function_many_cond() with __smp_call_mask_cond() Donghai Qiao
@ 2022-04-15  2:46 ` Donghai Qiao
  2022-04-15  2:47 ` [PATCH 10/11] smp: replace smp_call_function_single() with smp_xcall() Donghai Qiao
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Donghai Qiao @ 2022-04-15  2:46 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Replace smp_call_function_single_async() with smp_xcall_private()
and convert all of its call sites.
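
For a typical caller the preallocated csd is kept as-is and only the call
itself changes. A minimal sketch of the conversion, assuming <linux/smp.h>
and <linux/percpu.h> are included (the per-CPU csd, handler and helper
names below are illustrative only, not taken from any one call site):

	static DEFINE_PER_CPU(call_single_data_t, example_csd);

	static void example_handler(void *info)
	{
		/* runs on the target CPU in IPI context */
	}

	static int example_kick(int cpu)
	{
		call_single_data_t *csd = &per_cpu(example_csd, cpu);

		INIT_CSD(csd, example_handler, NULL);

		/* before: return smp_call_function_single_async(cpu, csd); */
		return smp_xcall_private(cpu, csd, XCALL_TYPE_ASYNC);
	}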

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
 arch/mips/kernel/process.c                      | 2 +-
 arch/mips/kernel/smp.c                          | 2 +-
 arch/s390/pci/pci_irq.c                         | 2 +-
 arch/x86/kernel/cpuid.c                         | 2 +-
 arch/x86/lib/msr-smp.c                          | 2 +-
 block/blk-mq.c                                  | 2 +-
 drivers/clocksource/ingenic-timer.c             | 2 +-
 drivers/cpuidle/coupled.c                       | 2 +-
 drivers/net/ethernet/cavium/liquidio/lio_core.c | 2 +-
 include/linux/smp.h                             | 3 ---
 kernel/debug/debug_core.c                       | 2 +-
 kernel/sched/core.c                             | 2 +-
 kernel/sched/fair.c                             | 2 +-
 net/core/dev.c                                  | 2 +-
 14 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index c2d5f4bfe1f3..5a63adccdcaf 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -745,7 +745,7 @@ static void raise_backtrace(cpumask_t *mask)
 		}
 
 		csd = &per_cpu(backtrace_csd, cpu);
-		smp_call_function_single_async(cpu, csd);
+		smp_xcall_private(cpu, csd, XCALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index d5bb38bfaef5..6202e9c1ca0c 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -701,7 +701,7 @@ void tick_broadcast(const struct cpumask *mask)
 
 	for_each_cpu(cpu, mask) {
 		csd = &per_cpu(tick_broadcast_csd, cpu);
-		smp_call_function_single_async(cpu, csd);
+		smp_xcall_private(cpu, csd, XCALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 325c42c6ddb4..37724c600d51 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -212,7 +212,7 @@ static void zpci_handle_fallback_irq(void)
 			continue;
 
 		INIT_CSD(&cpu_data->csd, zpci_handle_remote_irq, &cpu_data->scheduled);
-		smp_call_function_single_async(cpu, &cpu_data->csd);
+		smp_xcall_private(cpu, &cpu_data->csd, XCALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/arch/x86/kernel/cpuid.c b/arch/x86/kernel/cpuid.c
index 6f7b8cc1bc9f..3e75dfe07314 100644
--- a/arch/x86/kernel/cpuid.c
+++ b/arch/x86/kernel/cpuid.c
@@ -81,7 +81,7 @@ static ssize_t cpuid_read(struct file *file, char __user *buf,
 		cmd.regs.eax = pos;
 		cmd.regs.ecx = pos >> 32;
 
-		err = smp_call_function_single_async(cpu, &csd);
+		err = smp_xcall_private(cpu, &csd, XCALL_TYPE_ASYNC);
 		if (err)
 			break;
 		wait_for_completion(&cmd.done);
diff --git a/arch/x86/lib/msr-smp.c b/arch/x86/lib/msr-smp.c
index 68170a28270f..8c6b85bdc2d3 100644
--- a/arch/x86/lib/msr-smp.c
+++ b/arch/x86/lib/msr-smp.c
@@ -178,7 +178,7 @@ int rdmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h)
 	init_completion(&rv.done);
 	rv.msr.msr_no = msr_no;
 
-	err = smp_call_function_single_async(cpu, &csd);
+	err = smp_xcall_private(cpu, &csd, XCALL_TYPE_ASYNC);
 	if (!err) {
 		wait_for_completion(&rv.done);
 		err = rv.msr.err;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ed3ed86f7dd2..548960494d79 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1062,7 +1062,7 @@ static void blk_mq_complete_send_ipi(struct request *rq)
 	list = &per_cpu(blk_cpu_done, cpu);
 	if (llist_add(&rq->ipi_list, list)) {
 		INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq);
-		smp_call_function_single_async(cpu, &rq->csd);
+		smp_xcall_private(cpu, &rq->csd, XCALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/drivers/clocksource/ingenic-timer.c b/drivers/clocksource/ingenic-timer.c
index 24ed0f1f089b..30e437679ca9 100644
--- a/drivers/clocksource/ingenic-timer.c
+++ b/drivers/clocksource/ingenic-timer.c
@@ -121,7 +121,7 @@ static irqreturn_t ingenic_tcu_cevt_cb(int irq, void *dev_id)
 		csd = &per_cpu(ingenic_cevt_csd, timer->cpu);
 		csd->info = (void *) &timer->cevt;
 		csd->func = ingenic_per_cpu_event_handler;
-		smp_call_function_single_async(timer->cpu, csd);
+		smp_xcall_private(timer->cpu, csd, XCALL_TYPE_ASYNC);
 	}
 
 	return IRQ_HANDLED;
diff --git a/drivers/cpuidle/coupled.c b/drivers/cpuidle/coupled.c
index 74068742cef3..bec03fdc6edf 100644
--- a/drivers/cpuidle/coupled.c
+++ b/drivers/cpuidle/coupled.c
@@ -334,7 +334,7 @@ static void cpuidle_coupled_poke(int cpu)
 	call_single_data_t *csd = &per_cpu(cpuidle_coupled_poke_cb, cpu);
 
 	if (!cpumask_test_and_set_cpu(cpu, &cpuidle_coupled_poke_pending))
-		smp_call_function_single_async(cpu, csd);
+		smp_xcall_private(cpu, csd, XCALL_TYPE_ASYNC);
 }
 
 /**
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c b/drivers/net/ethernet/cavium/liquidio/lio_core.c
index 73cb03266549..ae97533c7f8b 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -729,7 +729,7 @@ static void liquidio_napi_drv_callback(void *arg)
 		napi_schedule_irqoff(&droq->napi);
 	} else {
 		INIT_CSD(&droq->csd, napi_schedule_wrapper, &droq->napi);
-		smp_call_function_single_async(droq->cpu_id, &droq->csd);
+		smp_xcall_private(droq->cpu_id, &droq->csd, XCALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 673192e2192e..de9b850722b3 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -206,9 +206,6 @@ extern unsigned int total_cpus;
 int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 			     int wait);
 
-#define	smp_call_function_single_async(cpu, csd) \
-	smp_xcall_private(cpu, csd, XCALL_TYPE_ASYNC)
-
 /*
  * Cpus stopping functions in panic. All have default weak definitions.
  * Architecture-dependent code may override them.
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index da06a5553835..cb69113251c9 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -264,7 +264,7 @@ void __weak kgdb_roundup_cpus(void)
 			continue;
 		kgdb_info[cpu].rounding_up = true;
 
-		ret = smp_call_function_single_async(cpu, csd);
+		ret = smp_xcall_private(cpu, csd, XCALL_TYPE_ASYNC);
 		if (ret)
 			kgdb_info[cpu].rounding_up = false;
 	}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 417355fbe32d..610e02b4c598 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -836,7 +836,7 @@ void hrtick_start(struct rq *rq, u64 delay)
 	if (rq == this_rq())
 		__hrtick_restart(rq);
 	else
-		smp_call_function_single_async(cpu_of(rq), &rq->hrtick_csd);
+		smp_xcall_private(cpu_of(rq), &rq->hrtick_csd, XCALL_TYPE_ASYNC);
 }
 
 #else
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d4bd299d67ab..1b060a64cb93 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10461,7 +10461,7 @@ static void kick_ilb(unsigned int flags)
 	 * is idle. And the softirq performing nohz idle load balance
 	 * will be run before returning from the IPI.
 	 */
-	smp_call_function_single_async(ilb_cpu, &cpu_rq(ilb_cpu)->nohz_csd);
+	smp_xcall_private(ilb_cpu, &cpu_rq(ilb_cpu)->nohz_csd, XCALL_TYPE_ASYNC);
 }
 
 /*
diff --git a/net/core/dev.c b/net/core/dev.c
index 8c6c08446556..0c8e18f6f53c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5788,7 +5788,7 @@ static void net_rps_send_ipi(struct softnet_data *remsd)
 		struct softnet_data *next = remsd->rps_ipi_next;
 
 		if (cpu_online(remsd->cpu))
-			smp_call_function_single_async(remsd->cpu, &remsd->csd);
+			smp_xcall_private(remsd->cpu, &remsd->csd, XCALL_TYPE_ASYNC);
 		remsd = next;
 	}
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 10/11] smp: replace smp_call_function_single() with smp_xcall()
  2022-04-15  2:46 [PATCH 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (8 preceding siblings ...)
  2022-04-15  2:46 ` [PATCH 09/11] smp: replace smp_call_function_single_async with smp_xcall_private Donghai Qiao
@ 2022-04-15  2:47 ` Donghai Qiao
  2022-04-15  2:47 ` [PATCH 11/11] smp: modify up.c to adopt the same format of cross CPU call Donghai Qiao
  2022-04-15 12:13 ` [PATCH 00/11] smp: cross CPU call interface Peter Zijlstra
  11 siblings, 0 replies; 13+ messages in thread
From: Donghai Qiao @ 2022-04-15  2:47 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Replace smp_call_function_single() with smp_xcall() and
convert all of its call sites.
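
The old wait argument maps directly onto the xcall type: a non-zero wait
becomes XCALL_TYPE_SYNC and a zero wait becomes XCALL_TYPE_ASYNC, with the
return value used unchanged. A minimal sketch of the conversion, assuming
<linux/smp.h> is included (example_func() and example_sync_call() are
illustrative names, not taken from any one call site):

	static void example_func(void *info)
	{
		/* runs on the chosen CPU */
	}

	static int example_sync_call(int cpu)
	{
		/* before: return smp_call_function_single(cpu, example_func, NULL, 1); */
		return smp_xcall(cpu, example_func, NULL, XCALL_TYPE_SYNC);
	}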

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
 arch/alpha/kernel/rtc.c                       |  4 +--
 arch/arm/mach-bcm/bcm_kona_smc.c              |  2 +-
 arch/arm/mach-mvebu/pmsu.c                    |  4 +--
 arch/arm64/kernel/topology.c                  |  2 +-
 arch/csky/kernel/cpu-probe.c                  |  2 +-
 arch/ia64/kernel/palinfo.c                    |  3 +-
 arch/ia64/kernel/smpboot.c                    |  2 +-
 arch/mips/kernel/smp-bmips.c                  |  3 +-
 arch/mips/kernel/smp-cps.c                    |  8 ++---
 arch/powerpc/kernel/sysfs.c                   | 26 +++++++-------
 arch/powerpc/kernel/watchdog.c                |  4 +--
 arch/powerpc/kvm/book3s_hv.c                  |  8 ++---
 arch/powerpc/platforms/85xx/smp.c             |  6 ++--
 arch/sparc/kernel/nmi.c                       |  4 +--
 arch/x86/kernel/apic/vector.c                 |  2 +-
 arch/x86/kernel/cpu/aperfmperf.c              |  5 +--
 arch/x86/kernel/cpu/mce/amd.c                 |  4 +--
 arch/x86/kernel/cpu/mce/inject.c              |  8 ++---
 arch/x86/kernel/cpu/microcode/core.c          |  4 +--
 arch/x86/kernel/cpu/mtrr/mtrr.c               |  2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        |  2 +-
 arch/x86/kernel/kvm.c                         |  4 +--
 arch/x86/kvm/vmx/vmx.c                        |  3 +-
 arch/x86/kvm/x86.c                            |  7 ++--
 arch/x86/lib/cache-smp.c                      |  2 +-
 arch/x86/lib/msr-smp.c                        | 16 ++++-----
 arch/x86/xen/mmu_pv.c                         |  2 +-
 arch/xtensa/kernel/smp.c                      |  7 ++--
 drivers/acpi/processor_idle.c                 |  4 +--
 drivers/cpufreq/powernow-k8.c                 |  9 +++--
 drivers/cpufreq/powernv-cpufreq.c             |  2 +-
 drivers/cpufreq/sparc-us2e-cpufreq.c          |  4 +--
 drivers/cpufreq/sparc-us3-cpufreq.c           |  4 +--
 drivers/cpufreq/speedstep-ich.c               |  7 ++--
 drivers/cpufreq/tegra194-cpufreq.c            |  6 ++--
 .../hwtracing/coresight/coresight-cpu-debug.c |  3 +-
 .../coresight/coresight-etm3x-core.c          | 11 +++---
 .../coresight/coresight-etm4x-core.c          | 12 +++----
 .../coresight/coresight-etm4x-sysfs.c         |  2 +-
 drivers/hwtracing/coresight/coresight-trbe.c  |  6 ++--
 drivers/net/ethernet/marvell/mvneta.c         |  4 +--
 .../intel/speed_select_if/isst_if_mbox_msr.c  |  4 +--
 drivers/powercap/intel_rapl_common.c          |  2 +-
 drivers/powercap/intel_rapl_msr.c             |  2 +-
 drivers/regulator/qcom_spmi-regulator.c       |  3 +-
 drivers/soc/fsl/qbman/qman.c                  |  4 +--
 drivers/soc/fsl/qbman/qman_test_stash.c       |  9 ++---
 include/linux/smp.h                           |  3 +-
 kernel/cpu.c                                  |  4 +--
 kernel/events/core.c                          | 10 +++---
 kernel/rcu/rcutorture.c                       |  3 +-
 kernel/rcu/tasks.h                            |  4 +--
 kernel/rcu/tree.c                             |  2 +-
 kernel/rcu/tree_exp.h                         |  4 +--
 kernel/relay.c                                |  5 ++-
 kernel/scftorture.c                           |  5 +--
 kernel/sched/membarrier.c                     |  2 +-
 kernel/smp.c                                  | 34 ++-----------------
 kernel/time/clockevents.c                     |  2 +-
 kernel/time/clocksource.c                     |  2 +-
 kernel/time/tick-common.c                     |  2 +-
 net/bpf/test_run.c                            |  4 +--
 net/iucv/iucv.c                               | 11 +++---
 virt/kvm/kvm_main.c                           |  2 +-
 64 files changed, 149 insertions(+), 194 deletions(-)

diff --git a/arch/alpha/kernel/rtc.c b/arch/alpha/kernel/rtc.c
index fb3025396ac9..bd7abb6874c6 100644
--- a/arch/alpha/kernel/rtc.c
+++ b/arch/alpha/kernel/rtc.c
@@ -168,7 +168,7 @@ remote_read_time(struct device *dev, struct rtc_time *tm)
 	union remote_data x;
 	if (smp_processor_id() != boot_cpuid) {
 		x.tm = tm;
-		smp_call_function_single(boot_cpuid, do_remote_read, &x, 1);
+		smp_xcall(boot_cpuid, do_remote_read, &x, XCALL_TYPE_SYNC);
 		return x.retval;
 	}
 	return alpha_rtc_read_time(NULL, tm);
@@ -187,7 +187,7 @@ remote_set_time(struct device *dev, struct rtc_time *tm)
 	union remote_data x;
 	if (smp_processor_id() != boot_cpuid) {
 		x.tm = tm;
-		smp_call_function_single(boot_cpuid, do_remote_set, &x, 1);
+		smp_xcall(boot_cpuid, do_remote_set, &x, XCALL_TYPE_SYNC);
 		return x.retval;
 	}
 	return alpha_rtc_set_time(NULL, tm);
diff --git a/arch/arm/mach-bcm/bcm_kona_smc.c b/arch/arm/mach-bcm/bcm_kona_smc.c
index 43829e49ad93..1121a68a4283 100644
--- a/arch/arm/mach-bcm/bcm_kona_smc.c
+++ b/arch/arm/mach-bcm/bcm_kona_smc.c
@@ -172,7 +172,7 @@ unsigned bcm_kona_smc(unsigned service_id, unsigned arg0, unsigned arg1,
 	 * Due to a limitation of the secure monitor, we must use the SMP
 	 * infrastructure to forward all secure monitor calls to Core 0.
 	 */
-	smp_call_function_single(0, __bcm_kona_smc, &data, 1);
+	smp_xcall(0, __bcm_kona_smc, &data, XCALL_TYPE_SYNC);
 
 	return data.result;
 }
diff --git a/arch/arm/mach-mvebu/pmsu.c b/arch/arm/mach-mvebu/pmsu.c
index 73d5d72dfc3e..dac1587c22b0 100644
--- a/arch/arm/mach-mvebu/pmsu.c
+++ b/arch/arm/mach-mvebu/pmsu.c
@@ -585,8 +585,8 @@ int mvebu_pmsu_dfs_request(int cpu)
 	writel(reg, pmsu_mp_base + PMSU_EVENT_STATUS_AND_MASK(hwcpu));
 
 	/* Trigger the DFS on the appropriate CPU */
-	smp_call_function_single(cpu, mvebu_pmsu_dfs_request_local,
-				 NULL, false);
+	smp_xcall(cpu, mvebu_pmsu_dfs_request_local,
+				 NULL, XCALL_TYPE_ASYNC);
 
 	/* Poll until the DFS done event is generated */
 	timeout = jiffies + HZ;
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 9ab78ad826e2..4e2651822281 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -331,7 +331,7 @@ int counters_read_on_cpu(int cpu, smp_call_func_t func, u64 *val)
 	if (WARN_ON_ONCE(irqs_disabled()))
 		return -EPERM;
 
-	smp_call_function_single(cpu, func, val, 1);
+	smp_xcall(cpu, func, val, XCALL_TYPE_SYNC);
 
 	return 0;
 }
diff --git a/arch/csky/kernel/cpu-probe.c b/arch/csky/kernel/cpu-probe.c
index 5f15ca31d3e8..860bb2233d20 100644
--- a/arch/csky/kernel/cpu-probe.c
+++ b/arch/csky/kernel/cpu-probe.c
@@ -48,7 +48,7 @@ static int c_show(struct seq_file *m, void *v)
 	int cpu;
 
 	for_each_online_cpu(cpu)
-		smp_call_function_single(cpu, percpu_print, m, true);
+		smp_xcall(cpu, percpu_print, m, XCALL_TYPE_SYNC);
 
 #ifdef CSKY_ARCH_VERSION
 	seq_printf(m, "arch-version : %s\n", CSKY_ARCH_VERSION);
diff --git a/arch/ia64/kernel/palinfo.c b/arch/ia64/kernel/palinfo.c
index 64189f04c1a4..0b2885ab08f8 100644
--- a/arch/ia64/kernel/palinfo.c
+++ b/arch/ia64/kernel/palinfo.c
@@ -844,7 +844,8 @@ int palinfo_handle_smp(struct seq_file *m, pal_func_cpu_u_t *f)
 
 
 	/* will send IPI to other CPU and wait for completion of remote call */
-	if ((ret=smp_call_function_single(f->req_cpu, palinfo_smp_call, &ptr, 1))) {
+	ret = smp_xcall(f->req_cpu, palinfo_smp_call, &ptr, XCALL_TYPE_SYNC);
+	if (ret) {
 		printk(KERN_ERR "palinfo: remote CPU call from %d to %d on function %d: "
 		       "error %d\n", smp_processor_id(), f->req_cpu, f->func_id, ret);
 		return 0;
diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c
index d10f780c13b9..f552fa6c69bb 100644
--- a/arch/ia64/kernel/smpboot.c
+++ b/arch/ia64/kernel/smpboot.c
@@ -295,7 +295,7 @@ ia64_sync_itc (unsigned int master)
 
 	go[MASTER] = 1;
 
-	if (smp_call_function_single(master, sync_master, NULL, 0) < 0) {
+	if (smp_xcall(master, sync_master, NULL, XCALL_TYPE_ASYNC) < 0) {
 		printk(KERN_ERR "sync_itc: failed to get attention of CPU %u!\n", master);
 		return;
 	}
diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c
index f5d7bfa3472a..7c5ab463bf51 100644
--- a/arch/mips/kernel/smp-bmips.c
+++ b/arch/mips/kernel/smp-bmips.c
@@ -488,8 +488,7 @@ static void bmips_set_reset_vec_remote(void *vinfo)
 
 	preempt_disable();
 	if (smp_processor_id() > 0) {
-		smp_call_function_single(0, &bmips_set_reset_vec_remote,
-					 info, 1);
+		smp_xcall(0, &bmips_set_reset_vec_remote, info, XCALL_TYPE_SYNC);
 	} else {
 		if (info->cpu & 0x02) {
 			/* BMIPS5200 "should" use mask/shift, but it's buggy */
diff --git a/arch/mips/kernel/smp-cps.c b/arch/mips/kernel/smp-cps.c
index bcd6a944b839..ad9178617167 100644
--- a/arch/mips/kernel/smp-cps.c
+++ b/arch/mips/kernel/smp-cps.c
@@ -341,8 +341,7 @@ static int cps_boot_secondary(int cpu, struct task_struct *idle)
 			goto out;
 		}
 
-		err = smp_call_function_single(remote, remote_vpe_boot,
-					       NULL, 1);
+		err = smp_xcall(remote, remote_vpe_boot, NULL, XCALL_TYPE_SYNC);
 		if (err)
 			panic("Failed to call remote CPU\n");
 		goto out;
@@ -587,9 +586,8 @@ static void cps_cpu_die(unsigned int cpu)
 		 * Have a CPU with access to the offlined CPUs registers wait
 		 * for its TC to halt.
 		 */
-		err = smp_call_function_single(cpu_death_sibling,
-					       wait_for_sibling_halt,
-					       (void *)(unsigned long)cpu, 1);
+		err = smp_xcall(cpu_death_sibling, wait_for_sibling_halt,
+			       (void *)(unsigned long)cpu, XCALL_TYPE_SYNC);
 		if (err)
 			panic("Failed to call remote sibling CPU\n");
 	} else if (cpu_has_vp) {
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 0ce6aff8eca0..77fc1c56598c 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -98,7 +98,7 @@ static ssize_t show_##NAME(struct device *dev, \
 { \
 	struct cpu *cpu = container_of(dev, struct cpu, dev); \
 	unsigned long val; \
-	smp_call_function_single(cpu->dev.id, read_##NAME, &val, 1);	\
+	smp_xcall(cpu->dev.id, read_##NAME, &val, XCALL_TYPE_SYNC);	\
 	return sprintf(buf, "%lx\n", val); \
 } \
 static ssize_t __used \
@@ -110,7 +110,7 @@ static ssize_t __used \
 	int ret = sscanf(buf, "%lx", &val); \
 	if (ret != 1) \
 		return -EINVAL; \
-	smp_call_function_single(cpu->dev.id, write_##NAME, &val, 1); \
+	smp_xcall(cpu->dev.id, write_##NAME, &val, XCALL_TYPE_SYNC); \
 	return count; \
 }
 
@@ -262,7 +262,7 @@ static ssize_t show_pw20_state(struct device *dev,
 	u32 value;
 	unsigned int cpu = dev->id;
 
-	smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+	smp_xcall(cpu, do_show_pwrmgtcr0, &value, XCALL_TYPE_SYNC);
 
 	value &= PWRMGTCR0_PW20_WAIT;
 
@@ -297,7 +297,7 @@ static ssize_t store_pw20_state(struct device *dev,
 	if (value > 1)
 		return -EINVAL;
 
-	smp_call_function_single(cpu, do_store_pw20_state, &value, 1);
+	smp_xcall(cpu, do_store_pw20_state, &value, XCALL_TYPE_SYNC);
 
 	return count;
 }
@@ -312,7 +312,7 @@ static ssize_t show_pw20_wait_time(struct device *dev,
 	unsigned int cpu = dev->id;
 
 	if (!pw20_wt) {
-		smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+		smp_xcall(cpu, do_show_pwrmgtcr0, &value, XCALL_TYPE_SYNC);
 		value = (value & PWRMGTCR0_PW20_ENT) >>
 					PWRMGTCR0_PW20_ENT_SHIFT;
 
@@ -372,8 +372,7 @@ static ssize_t store_pw20_wait_time(struct device *dev,
 
 	pw20_wt = value;
 
-	smp_call_function_single(cpu, set_pw20_wait_entry_bit,
-				&entry_bit, 1);
+	smp_xcall(cpu, set_pw20_wait_entry_bit, &entry_bit, XCALL_TYPE_SYNC);
 
 	return count;
 }
@@ -384,7 +383,7 @@ static ssize_t show_altivec_idle(struct device *dev,
 	u32 value;
 	unsigned int cpu = dev->id;
 
-	smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+	smp_xcall(cpu, do_show_pwrmgtcr0, &value, XCALL_TYPE_SYNC);
 
 	value &= PWRMGTCR0_AV_IDLE_PD_EN;
 
@@ -419,7 +418,7 @@ static ssize_t store_altivec_idle(struct device *dev,
 	if (value > 1)
 		return -EINVAL;
 
-	smp_call_function_single(cpu, do_store_altivec_idle, &value, 1);
+	smp_xcall(cpu, do_store_altivec_idle, &value, XCALL_TYPE_SYNC);
 
 	return count;
 }
@@ -434,7 +433,7 @@ static ssize_t show_altivec_idle_wait_time(struct device *dev,
 	unsigned int cpu = dev->id;
 
 	if (!altivec_idle_wt) {
-		smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+		smp_xcall(cpu, do_show_pwrmgtcr0, &value, XCALL_TYPE_SYNC);
 		value = (value & PWRMGTCR0_AV_IDLE_CNT) >>
 					PWRMGTCR0_AV_IDLE_CNT_SHIFT;
 
@@ -494,8 +493,7 @@ static ssize_t store_altivec_idle_wait_time(struct device *dev,
 
 	altivec_idle_wt = value;
 
-	smp_call_function_single(cpu, set_altivec_idle_wait_entry_bit,
-				&entry_bit, 1);
+	smp_xcall(cpu, set_altivec_idle_wait_entry_bit, &entry_bit, XCALL_TYPE_SYNC);
 
 	return count;
 }
@@ -768,7 +766,7 @@ static ssize_t idle_purr_show(struct device *dev,
 	struct cpu *cpu = container_of(dev, struct cpu, dev);
 	u64 val;
 
-	smp_call_function_single(cpu->dev.id, read_idle_purr, &val, 1);
+	smp_xcall(cpu->dev.id, read_idle_purr, &val, XCALL_TYPE_SYNC);
 	return sprintf(buf, "%llx\n", val);
 }
 static DEVICE_ATTR(idle_purr, 0400, idle_purr_show, NULL);
@@ -798,7 +796,7 @@ static ssize_t idle_spurr_show(struct device *dev,
 	struct cpu *cpu = container_of(dev, struct cpu, dev);
 	u64 val;
 
-	smp_call_function_single(cpu->dev.id, read_idle_spurr, &val, 1);
+	smp_xcall(cpu->dev.id, read_idle_spurr, &val, XCALL_TYPE_SYNC);
 	return sprintf(buf, "%llx\n", val);
 }
 static DEVICE_ATTR(idle_spurr, 0400, idle_spurr_show, NULL);
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index bfc27496fe7e..4afea23b9b28 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -499,7 +499,7 @@ static void start_watchdog(void *arg)
 
 static int start_watchdog_on_cpu(unsigned int cpu)
 {
-	return smp_call_function_single(cpu, start_watchdog, NULL, true);
+	return smp_xcall(cpu, start_watchdog, NULL, XCALL_TYPE_SYNC);
 }
 
 static void stop_watchdog(void *arg)
@@ -522,7 +522,7 @@ static void stop_watchdog(void *arg)
 
 static int stop_watchdog_on_cpu(unsigned int cpu)
 {
-	return smp_call_function_single(cpu, stop_watchdog, NULL, true);
+	return smp_xcall(cpu, stop_watchdog, NULL, XCALL_TYPE_SYNC);
 }
 
 static void watchdog_calc_timeouts(void)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6fa518f6501d..86afa4e9fce4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1392,7 +1392,7 @@ static unsigned long kvmppc_read_dpdes(struct kvm_vcpu *vcpu)
 		 */
 		pcpu = READ_ONCE(v->cpu);
 		if (pcpu >= 0)
-			smp_call_function_single(pcpu, do_nothing, NULL, 1);
+			smp_xcall(pcpu, do_nothing, NULL, XCALL_TYPE_SYNC);
 		if (kvmppc_doorbell_pending(v))
 			dpdes |= 1 << thr;
 	}
@@ -3082,7 +3082,7 @@ static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu)
 		struct kvm *running = *per_cpu_ptr(&cpu_in_guest, i);
 
 		if (running == kvm)
-			smp_call_function_single(i, do_nothing, NULL, 1);
+			smp_xcall(i, do_nothing, NULL, XCALL_TYPE_SYNC);
 	}
 }
 
@@ -3136,8 +3136,8 @@ static void kvmppc_prepare_radix_vcpu(struct kvm_vcpu *vcpu, int pcpu)
 			    cpu_first_tlb_thread_sibling(pcpu))
 				radix_flush_cpu(kvm, prev_cpu, vcpu);
 
-			smp_call_function_single(prev_cpu,
-					do_migrate_away_vcpu, vcpu, 1);
+			smp_xcall(prev_cpu, do_migrate_away_vcpu,
+				  vcpu, XCALL_TYPE_SYNC);
 		}
 		if (nested)
 			nested->prev_cpu[vcpu->arch.nested_vcpu_id] = pcpu;
diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index ca4995a39884..25f20650a024 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -300,12 +300,10 @@ static int smp_85xx_kick_cpu(int nr)
 		 * the other.
 		 */
 		if (cpu_online(primary)) {
-			smp_call_function_single(primary,
-					wake_hw_thread, &nr, 1);
+			smp_xcall(primary, wake_hw_thread, &nr, XCALL_TYPE_SYNC);
 			goto done;
 		} else if (cpu_online(primary + 1)) {
-			smp_call_function_single(primary + 1,
-					wake_hw_thread, &nr, 1);
+			smp_xcall(primary + 1, wake_hw_thread, &nr, XCALL_TYPE_SYNC);
 			goto done;
 		}
 
diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index ff789082d5ab..311e854396e2 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -297,7 +297,7 @@ int watchdog_nmi_enable(unsigned int cpu)
 	if (!nmi_init_done)
 		return 0;
 
-	smp_call_function_single(cpu, start_nmi_watchdog, NULL, 1);
+	smp_xcall(cpu, start_nmi_watchdog, NULL, XCALL_TYPE_SYNC);
 
 	return 0;
 }
@@ -310,5 +310,5 @@ void watchdog_nmi_disable(unsigned int cpu)
 	if (atomic_read(&nmi_active) == -1)
 		pr_warn_once("NMI watchdog cannot be enabled or disabled\n");
 	else
-		smp_call_function_single(cpu, stop_nmi_watchdog, NULL, 1);
+		smp_xcall(cpu, stop_nmi_watchdog, NULL, XCALL_TYPE_SYNC);
 }
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 3e6f6b448f6a..2ab86f7848a4 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -1266,7 +1266,7 @@ static void __init print_local_APICs(int maxcpu)
 	for_each_online_cpu(cpu) {
 		if (cpu >= maxcpu)
 			break;
-		smp_call_function_single(cpu, print_local_APIC, NULL, 1);
+		smp_xcall(cpu, print_local_APIC, NULL, XCALL_TYPE_SYNC);
 	}
 	preempt_enable();
 }
diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
index 9ca008f9e9b1..c83dc3148bf0 100644
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -77,7 +77,8 @@ static bool aperfmperf_snapshot_cpu(int cpu, ktime_t now, bool wait)
 		return true;
 
 	if (!atomic_xchg(&s->scfpending, 1) || wait)
-		smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, wait);
+		smp_xcall(cpu, aperfmperf_snapshot_khz, NULL,
+				(wait ? XCALL_TYPE_SYNC : XCALL_TYPE_ASYNC));
 
 	/* Return false if the previous iteration was too long ago. */
 	return time_delta <= APERFMPERF_STALE_THRESHOLD_MS;
@@ -145,7 +146,7 @@ unsigned int arch_freq_get_on_cpu(int cpu)
 	msleep(APERFMPERF_REFRESH_DELAY_MS);
 	atomic_set(&s->scfpending, 1);
 	smp_mb(); /* ->scfpending before smp_call_function_single(). */
-	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
+	smp_xcall(cpu, aperfmperf_snapshot_khz, NULL, XCALL_TYPE_SYNC);
 
 	return per_cpu(samples.khz, cpu);
 }
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 1940d305db1c..c7f928abbaf6 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -929,7 +929,7 @@ store_interrupt_enable(struct threshold_block *b, const char *buf, size_t size)
 	memset(&tr, 0, sizeof(tr));
 	tr.b		= b;
 
-	if (smp_call_function_single(b->cpu, threshold_restart_bank, &tr, 1))
+	if (smp_xcall(b->cpu, threshold_restart_bank, &tr, XCALL_TYPE_SYNC))
 		return -ENODEV;
 
 	return size;
@@ -954,7 +954,7 @@ store_threshold_limit(struct threshold_block *b, const char *buf, size_t size)
 	b->threshold_limit = new;
 	tr.b = b;
 
-	if (smp_call_function_single(b->cpu, threshold_restart_bank, &tr, 1))
+	if (smp_xcall(b->cpu, threshold_restart_bank, &tr, XCALL_TYPE_SYNC))
 		return -ENODEV;
 
 	return size;
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index f23445733020..d72076155246 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -550,19 +550,19 @@ static void do_inject(void)
 
 	i_mce.mcgstatus = mcg_status;
 	i_mce.inject_flags = inj_type;
-	smp_call_function_single(cpu, prepare_msrs, &i_mce, 0);
+	smp_xcall(cpu, prepare_msrs, &i_mce, XCALL_TYPE_ASYNC);
 
 	toggle_hw_mce_inject(cpu, false);
 
 	switch (inj_type) {
 	case DFR_INT_INJ:
-		smp_call_function_single(cpu, trigger_dfr_int, NULL, 0);
+		smp_xcall(cpu, trigger_dfr_int, NULL, XCALL_TYPE_ASYNC);
 		break;
 	case THR_INT_INJ:
-		smp_call_function_single(cpu, trigger_thr_int, NULL, 0);
+		smp_xcall(cpu, trigger_thr_int, NULL, XCALL_TYPE_ASYNC);
 		break;
 	default:
-		smp_call_function_single(cpu, trigger_mce, NULL, 0);
+		smp_xcall(cpu, trigger_mce, NULL, XCALL_TYPE_ASYNC);
 	}
 
 err:
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index f955d25076ba..3fbf83934e58 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -332,7 +332,7 @@ static int collect_cpu_info_on_target(int cpu, struct cpu_signature *cpu_sig)
 	struct cpu_info_ctx ctx = { .cpu_sig = cpu_sig, .err = 0 };
 	int ret;
 
-	ret = smp_call_function_single(cpu, collect_cpu_info_local, &ctx, 1);
+	ret = smp_xcall(cpu, collect_cpu_info_local, &ctx, XCALL_TYPE_SYNC);
 	if (!ret)
 		ret = ctx.err;
 
@@ -365,7 +365,7 @@ static int apply_microcode_on_target(int cpu)
 	enum ucode_state err;
 	int ret;
 
-	ret = smp_call_function_single(cpu, apply_microcode_local, &err, 1);
+	ret = smp_xcall(cpu, apply_microcode_local, &err, XCALL_TYPE_SYNC);
 	if (!ret) {
 		if (err == UCODE_ERROR)
 			ret = 1;
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 2746cac9d8a9..cdaab335e66d 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -820,7 +820,7 @@ void mtrr_save_state(void)
 		return;
 
 	first_cpu = cpumask_first(cpu_online_mask);
-	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
+	smp_xcall(first_cpu, mtrr_save_fixed_ranges, NULL, XCALL_TYPE_SYNC);
 }
 
 void set_mtrr_aps_delayed_init(void)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 2fee32deb701..5c742a14bfba 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -541,7 +541,7 @@ static void _update_task_closid_rmid(void *task)
 static void update_task_closid_rmid(struct task_struct *t)
 {
 	if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
-		smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
+		smp_xcall(task_cpu(t), _update_task_closid_rmid, t, XCALL_TYPE_SYNC);
 	else
 		_update_task_closid_rmid(t);
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 9ba37c13df7b..b7b1ed4c19fa 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -1118,7 +1118,7 @@ void arch_haltpoll_enable(unsigned int cpu)
 	}
 
 	/* Enable guest halt poll disables host halt poll */
-	smp_call_function_single(cpu, kvm_disable_host_haltpoll, NULL, 1);
+	smp_xcall(cpu, kvm_disable_host_haltpoll, NULL, XCALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL_GPL(arch_haltpoll_enable);
 
@@ -1128,7 +1128,7 @@ void arch_haltpoll_disable(unsigned int cpu)
 		return;
 
 	/* Disable guest halt poll enables host halt poll */
-	smp_call_function_single(cpu, kvm_enable_host_haltpoll, NULL, 1);
+	smp_xcall(cpu, kvm_enable_host_haltpoll, NULL, XCALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL_GPL(arch_haltpoll_disable);
 #endif
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 04d170c4b61e..506e3aa50d9f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -656,8 +656,7 @@ void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
 	int cpu = loaded_vmcs->cpu;
 
 	if (cpu != -1)
-		smp_call_function_single(cpu,
-			 __loaded_vmcs_clear, loaded_vmcs, 1);
+		smp_xcall(cpu, __loaded_vmcs_clear, loaded_vmcs, XCALL_TYPE_SYNC);
 }
 
 static bool vmx_segment_cache_test_set(struct vcpu_vmx *vmx, unsigned seg,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 02b84f0bdff2..278262506de0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4552,8 +4552,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		if (static_call(kvm_x86_has_wbinvd_exit)())
 			cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
 		else if (vcpu->cpu != -1 && vcpu->cpu != cpu)
-			smp_call_function_single(vcpu->cpu,
-					wbinvd_ipi, NULL, 1);
+			smp_xcall(vcpu->cpu, wbinvd_ipi, NULL, XCALL_TYPE_SYNC);
 	}
 
 	static_call(kvm_x86_vcpu_load)(vcpu, cpu);
@@ -8730,7 +8729,7 @@ static void __kvmclock_cpufreq_notifier(struct cpufreq_freqs *freq, int cpu)
 	 *
 	 */
 
-	smp_call_function_single(cpu, tsc_khz_changed, freq, 1);
+	smp_xcall(cpu, tsc_khz_changed, freq, XCALL_TYPE_SYNC);
 
 	mutex_lock(&kvm_lock);
 	list_for_each_entry(kvm, &vm_list, vm_list) {
@@ -8757,7 +8756,7 @@ static void __kvmclock_cpufreq_notifier(struct cpufreq_freqs *freq, int cpu)
 		 * guest context is entered kvmclock will be updated,
 		 * so the guest will not see stale values.
 		 */
-		smp_call_function_single(cpu, tsc_khz_changed, freq, 1);
+		smp_xcall(cpu, tsc_khz_changed, freq, XCALL_TYPE_SYNC);
 	}
 }
 
diff --git a/arch/x86/lib/cache-smp.c b/arch/x86/lib/cache-smp.c
index d81977b85228..34d4619b4c58 100644
--- a/arch/x86/lib/cache-smp.c
+++ b/arch/x86/lib/cache-smp.c
@@ -9,7 +9,7 @@ static void __wbinvd(void *dummy)
 
 void wbinvd_on_cpu(int cpu)
 {
-	smp_call_function_single(cpu, __wbinvd, NULL, 1);
+	smp_xcall(cpu, __wbinvd, NULL, XCALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(wbinvd_on_cpu);
 
diff --git a/arch/x86/lib/msr-smp.c b/arch/x86/lib/msr-smp.c
index 8c6b85bdc2d3..bff5f9b59c06 100644
--- a/arch/x86/lib/msr-smp.c
+++ b/arch/x86/lib/msr-smp.c
@@ -41,7 +41,7 @@ int rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h)
 	memset(&rv, 0, sizeof(rv));
 
 	rv.msr_no = msr_no;
-	err = smp_call_function_single(cpu, __rdmsr_on_cpu, &rv, 1);
+	err = smp_xcall(cpu, __rdmsr_on_cpu, &rv, XCALL_TYPE_SYNC);
 	*l = rv.reg.l;
 	*h = rv.reg.h;
 
@@ -57,7 +57,7 @@ int rdmsrl_on_cpu(unsigned int cpu, u32 msr_no, u64 *q)
 	memset(&rv, 0, sizeof(rv));
 
 	rv.msr_no = msr_no;
-	err = smp_call_function_single(cpu, __rdmsr_on_cpu, &rv, 1);
+	err = smp_xcall(cpu, __rdmsr_on_cpu, &rv, XCALL_TYPE_SYNC);
 	*q = rv.reg.q;
 
 	return err;
@@ -74,7 +74,7 @@ int wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h)
 	rv.msr_no = msr_no;
 	rv.reg.l = l;
 	rv.reg.h = h;
-	err = smp_call_function_single(cpu, __wrmsr_on_cpu, &rv, 1);
+	err = smp_xcall(cpu, __wrmsr_on_cpu, &rv, XCALL_TYPE_SYNC);
 
 	return err;
 }
@@ -90,7 +90,7 @@ int wrmsrl_on_cpu(unsigned int cpu, u32 msr_no, u64 q)
 	rv.msr_no = msr_no;
 	rv.reg.q = q;
 
-	err = smp_call_function_single(cpu, __wrmsr_on_cpu, &rv, 1);
+	err = smp_xcall(cpu, __wrmsr_on_cpu, &rv, XCALL_TYPE_SYNC);
 
 	return err;
 }
@@ -200,7 +200,7 @@ int wrmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h)
 	rv.msr_no = msr_no;
 	rv.reg.l = l;
 	rv.reg.h = h;
-	err = smp_call_function_single(cpu, __wrmsr_safe_on_cpu, &rv, 1);
+	err = smp_xcall(cpu, __wrmsr_safe_on_cpu, &rv, XCALL_TYPE_SYNC);
 
 	return err ? err : rv.err;
 }
@@ -216,7 +216,7 @@ int wrmsrl_safe_on_cpu(unsigned int cpu, u32 msr_no, u64 q)
 	rv.msr_no = msr_no;
 	rv.reg.q = q;
 
-	err = smp_call_function_single(cpu, __wrmsr_safe_on_cpu, &rv, 1);
+	err = smp_xcall(cpu, __wrmsr_safe_on_cpu, &rv, XCALL_TYPE_SYNC);
 
 	return err ? err : rv.err;
 }
@@ -259,7 +259,7 @@ int rdmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8])
 
 	rv.regs   = regs;
 	rv.err    = -EIO;
-	err = smp_call_function_single(cpu, __rdmsr_safe_regs_on_cpu, &rv, 1);
+	err = smp_xcall(cpu, __rdmsr_safe_regs_on_cpu, &rv, XCALL_TYPE_SYNC);
 
 	return err ? err : rv.err;
 }
@@ -272,7 +272,7 @@ int wrmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8])
 
 	rv.regs = regs;
 	rv.err  = -EIO;
-	err = smp_call_function_single(cpu, __wrmsr_safe_regs_on_cpu, &rv, 1);
+	err = smp_xcall(cpu, __wrmsr_safe_regs_on_cpu, &rv, XCALL_TYPE_SYNC);
 
 	return err ? err : rv.err;
 }
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 1938d08b20e7..e738e570284f 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -930,7 +930,7 @@ static void xen_drop_mm_ref(struct mm_struct *mm)
 		for_each_online_cpu(cpu) {
 			if (per_cpu(xen_current_cr3, cpu) != __pa(mm->pgd))
 				continue;
-			smp_call_function_single(cpu, drop_mm_ref_this_cpu, mm, 1);
+			smp_xcall(cpu, drop_mm_ref_this_cpu, mm, XCALL_TYPE_SYNC);
 		}
 		return;
 	}
diff --git a/arch/xtensa/kernel/smp.c b/arch/xtensa/kernel/smp.c
index b2d126510c9f..b36e28fb5b61 100644
--- a/arch/xtensa/kernel/smp.c
+++ b/arch/xtensa/kernel/smp.c
@@ -201,7 +201,7 @@ static int boot_secondary(unsigned int cpu, struct task_struct *ts)
 	system_flush_invalidate_dcache_range((unsigned long)&cpu_start_id,
 					     sizeof(cpu_start_id));
 #endif
-	smp_call_function_single(0, mx_cpu_start, (void *)cpu, 1);
+	smp_xcall(0, mx_cpu_start, (void *)cpu, XCALL_TYPE_SYNC);
 
 	for (i = 0; i < 2; ++i) {
 		do
@@ -220,8 +220,7 @@ static int boot_secondary(unsigned int cpu, struct task_struct *ts)
 		} while (ccount && time_before(jiffies, timeout));
 
 		if (ccount) {
-			smp_call_function_single(0, mx_cpu_stop,
-						 (void *)cpu, 1);
+			smp_xcall(0, mx_cpu_stop, (void *)cpu, XCALL_TYPE_SYNC);
 			WRITE_ONCE(cpu_start_ccount, 0);
 			return -EIO;
 		}
@@ -292,7 +291,7 @@ int __cpu_disable(void)
 
 static void platform_cpu_kill(unsigned int cpu)
 {
-	smp_call_function_single(0, mx_cpu_stop, (void *)cpu, true);
+	smp_xcall(0, mx_cpu_stop, (void *)cpu, XCALL_TYPE_SYNC);
 }
 
 /*
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 4556c86c3465..49a656d954b5 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -160,8 +160,8 @@ static void __lapic_timer_propagate_broadcast(void *arg)
 
 static void lapic_timer_propagate_broadcast(struct acpi_processor *pr)
 {
-	smp_call_function_single(pr->id, __lapic_timer_propagate_broadcast,
-				 (void *)pr, 1);
+	smp_xcall(pr->id, __lapic_timer_propagate_broadcast,
+			 (void *)pr, XCALL_TYPE_SYNC);
 }
 
 /* Power(C) State timer broadcast control */
diff --git a/drivers/cpufreq/powernow-k8.c b/drivers/cpufreq/powernow-k8.c
index d289036beff2..160e3bd1e080 100644
--- a/drivers/cpufreq/powernow-k8.c
+++ b/drivers/cpufreq/powernow-k8.c
@@ -1025,7 +1025,7 @@ static int powernowk8_cpu_init(struct cpufreq_policy *pol)
 	struct init_on_cpu init_on_cpu;
 	int rc, cpu;
 
-	smp_call_function_single(pol->cpu, check_supported_cpu, &rc, 1);
+	smp_xcall(pol->cpu, check_supported_cpu, &rc, XCALL_TYPE_SYNC);
 	if (rc)
 		return -ENODEV;
 
@@ -1062,8 +1062,7 @@ static int powernowk8_cpu_init(struct cpufreq_policy *pol)
 
 	/* only run on specific CPU from here on */
 	init_on_cpu.data = data;
-	smp_call_function_single(data->cpu, powernowk8_cpu_init_on_cpu,
-				 &init_on_cpu, 1);
+	smp_xcall(data->cpu, powernowk8_cpu_init_on_cpu, &init_on_cpu, XCALL_TYPE_SYNC);
 	rc = init_on_cpu.rc;
 	if (rc != 0)
 		goto err_out_exit_acpi;
@@ -1124,7 +1123,7 @@ static unsigned int powernowk8_get(unsigned int cpu)
 	if (!data)
 		return 0;
 
-	smp_call_function_single(cpu, query_values_on_cpu, &err, true);
+	smp_xcall(cpu, query_values_on_cpu, &err, XCALL_TYPE_SYNC);
 	if (err)
 		goto out;
 
@@ -1182,7 +1181,7 @@ static int powernowk8_init(void)
 
 	cpus_read_lock();
 	for_each_online_cpu(i) {
-		smp_call_function_single(i, check_supported_cpu, &ret, 1);
+		smp_xcall(i, check_supported_cpu, &ret, XCALL_TYPE_SYNC);
 		if (!ret)
 			supported_cpus++;
 	}
diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index aa7a02e1c647..755cebdd19a4 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -883,7 +883,7 @@ static int powernv_cpufreq_cpu_exit(struct cpufreq_policy *policy)
 
 	freq_data.pstate_id = idx_to_pstate(powernv_pstate_info.min);
 	freq_data.gpstate_id = idx_to_pstate(powernv_pstate_info.min);
-	smp_call_function_single(policy->cpu, set_pstate, &freq_data, 1);
+	smp_xcall(policy->cpu, set_pstate, &freq_data, XCALL_TYPE_SYNC);
 	if (gpstates)
 		del_timer_sync(&gpstates->timer);
 
diff --git a/drivers/cpufreq/sparc-us2e-cpufreq.c b/drivers/cpufreq/sparc-us2e-cpufreq.c
index 92acbb25abb3..848e3d2d7970 100644
--- a/drivers/cpufreq/sparc-us2e-cpufreq.c
+++ b/drivers/cpufreq/sparc-us2e-cpufreq.c
@@ -236,7 +236,7 @@ static unsigned int us2e_freq_get(unsigned int cpu)
 	unsigned long clock_tick, estar;
 
 	clock_tick = sparc64_get_clock_tick(cpu) / 1000;
-	if (smp_call_function_single(cpu, __us2e_freq_get, &estar, 1))
+	if (smp_xcall(cpu, __us2e_freq_get, &estar, XCALL_TYPE_SYNC))
 		return 0;
 
 	return clock_tick / estar_to_divisor(estar);
@@ -268,7 +268,7 @@ static int us2e_freq_target(struct cpufreq_policy *policy, unsigned int index)
 {
 	unsigned int cpu = policy->cpu;
 
-	return smp_call_function_single(cpu, __us2e_freq_target, &index, 1);
+	return smp_xcall(cpu, __us2e_freq_target, &index, XCALL_TYPE_SYNC);
 }
 
 static int __init us2e_freq_cpu_init(struct cpufreq_policy *policy)
diff --git a/drivers/cpufreq/sparc-us3-cpufreq.c b/drivers/cpufreq/sparc-us3-cpufreq.c
index e41b35b16afd..8791579286a4 100644
--- a/drivers/cpufreq/sparc-us3-cpufreq.c
+++ b/drivers/cpufreq/sparc-us3-cpufreq.c
@@ -87,7 +87,7 @@ static unsigned int us3_freq_get(unsigned int cpu)
 {
 	unsigned long reg;
 
-	if (smp_call_function_single(cpu, read_safari_cfg, &reg, 1))
+	if (smp_xcall(cpu, read_safari_cfg, &reg, XCALL_TYPE_SYNC))
 		return 0;
 	return get_current_freq(cpu, reg);
 }
@@ -116,7 +116,7 @@ static int us3_freq_target(struct cpufreq_policy *policy, unsigned int index)
 		BUG();
 	}
 
-	return smp_call_function_single(cpu, update_safari_cfg, &new_bits, 1);
+	return smp_xcall(cpu, update_safari_cfg, &new_bits, XCALL_TYPE_SYNC);
 }
 
 static int __init us3_freq_cpu_init(struct cpufreq_policy *policy)
diff --git a/drivers/cpufreq/speedstep-ich.c b/drivers/cpufreq/speedstep-ich.c
index f2076d72bf39..0f3410098a92 100644
--- a/drivers/cpufreq/speedstep-ich.c
+++ b/drivers/cpufreq/speedstep-ich.c
@@ -243,7 +243,7 @@ static unsigned int speedstep_get(unsigned int cpu)
 	unsigned int speed;
 
 	/* You're supposed to ensure CPU is online. */
-	BUG_ON(smp_call_function_single(cpu, get_freq_data, &speed, 1));
+	BUG_ON(smp_xcall(cpu, get_freq_data, &speed, XCALL_TYPE_SYNC));
 
 	pr_debug("detected %u kHz as current frequency\n", speed);
 	return speed;
@@ -262,8 +262,7 @@ static int speedstep_target(struct cpufreq_policy *policy, unsigned int index)
 
 	policy_cpu = cpumask_any_and(policy->cpus, cpu_online_mask);
 
-	smp_call_function_single(policy_cpu, _speedstep_set_state, &index,
-				 true);
+	smp_xcall(policy_cpu, _speedstep_set_state, &index, XCALL_TYPE_SYNC);
 
 	return 0;
 }
@@ -299,7 +298,7 @@ static int speedstep_cpu_init(struct cpufreq_policy *policy)
 
 	/* detect low and high frequency and transition latency */
 	gf.policy = policy;
-	smp_call_function_single(policy_cpu, get_freqs_on_cpu, &gf, 1);
+	smp_xcall(policy_cpu, get_freqs_on_cpu, &gf, XCALL_TYPE_SYNC);
 	if (gf.ret)
 		return gf.ret;
 
diff --git a/drivers/cpufreq/tegra194-cpufreq.c b/drivers/cpufreq/tegra194-cpufreq.c
index 4b0b7e3cb19f..5da89e76b95b 100644
--- a/drivers/cpufreq/tegra194-cpufreq.c
+++ b/drivers/cpufreq/tegra194-cpufreq.c
@@ -203,13 +203,13 @@ static unsigned int tegra194_get_speed(u32 cpu)
 	int ret;
 	u32 cl;
 
-	smp_call_function_single(cpu, get_cpu_cluster, &cl, true);
+	smp_xcall(cpu, get_cpu_cluster, &cl, XCALL_TYPE_SYNC);
 
 	/* reconstruct actual cpu freq using counters */
 	rate = tegra194_calculate_speed(cpu);
 
 	/* get last written ndiv value */
-	ret = smp_call_function_single(cpu, get_cpu_ndiv, &ndiv, true);
+	ret = smp_xcall(cpu, get_cpu_ndiv, &ndiv, XCALL_TYPE_SYNC);
 	if (WARN_ON_ONCE(ret))
 		return rate;
 
@@ -240,7 +240,7 @@ static int tegra194_cpufreq_init(struct cpufreq_policy *policy)
 	u32 cpu;
 	u32 cl;
 
-	smp_call_function_single(policy->cpu, get_cpu_cluster, &cl, true);
+	smp_xcall(policy->cpu, get_cpu_cluster, &cl, XCALL_TYPE_SYNC);
 
 	if (cl >= data->num_clusters || !data->tables[cl])
 		return -EINVAL;
diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c b/drivers/hwtracing/coresight/coresight-cpu-debug.c
index 8845ec4b4402..0cacd61b66ed 100644
--- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
+++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
@@ -590,8 +590,7 @@ static int debug_probe(struct amba_device *adev, const struct amba_id *id)
 
 	cpus_read_lock();
 	per_cpu(debug_drvdata, drvdata->cpu) = drvdata;
-	ret = smp_call_function_single(drvdata->cpu, debug_init_arch_data,
-				       drvdata, 1);
+	ret = smp_xcall(drvdata->cpu, debug_init_arch_data, drvdata, XCALL_TYPE_SYNC);
 	cpus_read_unlock();
 
 	if (ret) {
diff --git a/drivers/hwtracing/coresight/coresight-etm3x-core.c b/drivers/hwtracing/coresight/coresight-etm3x-core.c
index 7d413ba8b823..e0a2c4c6e90d 100644
--- a/drivers/hwtracing/coresight/coresight-etm3x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm3x-core.c
@@ -518,8 +518,8 @@ static int etm_enable_sysfs(struct coresight_device *csdev)
 	 */
 	if (cpu_online(drvdata->cpu)) {
 		arg.drvdata = drvdata;
-		ret = smp_call_function_single(drvdata->cpu,
-					       etm_enable_hw_smp_call, &arg, 1);
+		ret = smp_xcall(drvdata->cpu, etm_enable_hw_smp_call,
+				&arg, XCALL_TYPE_SYNC);
 		if (!ret)
 			ret = arg.rc;
 		if (!ret)
@@ -630,7 +630,7 @@ static void etm_disable_sysfs(struct coresight_device *csdev)
 	 * Executing etm_disable_hw on the cpu whose ETM is being disabled
 	 * ensures that register writes occur when cpu is powered.
 	 */
-	smp_call_function_single(drvdata->cpu, etm_disable_hw, drvdata, 1);
+	smp_xcall(drvdata->cpu, etm_disable_hw, drvdata, XCALL_TYPE_SYNC);
 
 	spin_unlock(&drvdata->spinlock);
 	cpus_read_unlock();
@@ -864,8 +864,7 @@ static int etm_probe(struct amba_device *adev, const struct amba_id *id)
 	if (!desc.name)
 		return -ENOMEM;
 
-	if (smp_call_function_single(drvdata->cpu,
-				     etm_init_arch_data,  drvdata, 1))
+	if (smp_xcall(drvdata->cpu, etm_init_arch_data,  drvdata, XCALL_TYPE_SYNC))
 		dev_err(dev, "ETM arch init failed\n");
 
 	if (etm_arch_supported(drvdata->arch) == false)
@@ -933,7 +932,7 @@ static void etm_remove(struct amba_device *adev)
 	 * CPU i ensures these call backs has consistent view
 	 * inside one call back function.
 	 */
-	if (smp_call_function_single(drvdata->cpu, clear_etmdrvdata, &drvdata->cpu, 1))
+	if (smp_xcall(drvdata->cpu, clear_etmdrvdata, &drvdata->cpu, XCALL_TYPE_SYNC))
 		etmdrvdata[drvdata->cpu] = NULL;
 
 	cpus_read_unlock();
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index 7f416a12000e..28d91687dcaa 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -746,8 +746,8 @@ static int etm4_enable_sysfs(struct coresight_device *csdev)
 	 * ensures that register writes occur when cpu is powered.
 	 */
 	arg.drvdata = drvdata;
-	ret = smp_call_function_single(drvdata->cpu,
-				       etm4_enable_hw_smp_call, &arg, 1);
+	ret = smp_xcall(drvdata->cpu, etm4_enable_hw_smp_call,
+			&arg, XCALL_TYPE_SYNC);
 	if (!ret)
 		ret = arg.rc;
 	if (!ret)
@@ -903,7 +903,7 @@ static void etm4_disable_sysfs(struct coresight_device *csdev)
 	 * Executing etm4_disable_hw on the cpu whose ETM is being disabled
 	 * ensures that register writes occur when cpu is powered.
 	 */
-	smp_call_function_single(drvdata->cpu, etm4_disable_hw, drvdata, 1);
+	smp_xcall(drvdata->cpu, etm4_disable_hw, drvdata, XCALL_TYPE_SYNC);
 
 	spin_unlock(&drvdata->spinlock);
 	cpus_read_unlock();
@@ -1977,8 +1977,8 @@ static int etm4_probe(struct device *dev, void __iomem *base, u32 etm_pid)
 	init_arg.csa = &desc.access;
 	init_arg.pid = etm_pid;
 
-	if (smp_call_function_single(drvdata->cpu,
-				etm4_init_arch_data,  &init_arg, 1))
+	if (smp_xcall(drvdata->cpu, etm4_init_arch_data,
+				&init_arg, XCALL_TYPE_SYNC))
 		dev_err(dev, "ETM arch init failed\n");
 
 	if (!drvdata->arch)
@@ -2118,7 +2118,7 @@ static int __exit etm4_remove_dev(struct etmv4_drvdata *drvdata)
 	 * CPU i ensures these call backs has consistent view
 	 * inside one call back function.
 	 */
-	if (smp_call_function_single(drvdata->cpu, clear_etmdrvdata, &drvdata->cpu, 1))
+	if (smp_xcall(drvdata->cpu, clear_etmdrvdata, &drvdata->cpu, XCALL_TYPE_SYNC))
 		etmdrvdata[drvdata->cpu] = NULL;
 
 	cpus_read_unlock();
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
index 21687cc1e4e2..9794600a95d8 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
@@ -2379,7 +2379,7 @@ static u32 etmv4_cross_read(const struct etmv4_drvdata *drvdata, u32 offset)
 	 * smp cross call ensures the CPU will be powered up before
 	 * accessing the ETMv4 trace core registers
 	 */
-	smp_call_function_single(drvdata->cpu, do_smp_cross_read, &reg, 1);
+	smp_xcall(drvdata->cpu, do_smp_cross_read, &reg, XCALL_TYPE_SYNC);
 	return reg.data;
 }
 
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 2b386bb848f8..b1fdd70b7ace 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -1350,12 +1350,12 @@ static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata)
 
 	for_each_cpu(cpu, &drvdata->supported_cpus) {
 		/* If we fail to probe the CPU, let us defer it to hotplug callbacks */
-		if (smp_call_function_single(cpu, arm_trbe_probe_cpu, drvdata, 1))
+		if (smp_xcall(cpu, arm_trbe_probe_cpu, drvdata, XCALL_TYPE_SYNC))
 			continue;
 		if (cpumask_test_cpu(cpu, &drvdata->supported_cpus))
 			arm_trbe_register_coresight_cpu(drvdata, cpu);
 		if (cpumask_test_cpu(cpu, &drvdata->supported_cpus))
-			smp_call_function_single(cpu, arm_trbe_enable_cpu, drvdata, 1);
+			smp_xcall(cpu, arm_trbe_enable_cpu, drvdata, XCALL_TYPE_SYNC);
 	}
 	return 0;
 }
@@ -1365,7 +1365,7 @@ static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata)
 	int cpu;
 
 	for_each_cpu(cpu, &drvdata->supported_cpus)
-		smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
+		smp_xcall(cpu, arm_trbe_remove_coresight_cpu, drvdata, XCALL_TYPE_SYNC);
 	free_percpu(drvdata->cpudata);
 	return 0;
 }
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 305c18bf33c1..8663a9f21401 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -4304,8 +4304,8 @@ static void mvneta_percpu_elect(struct mvneta_port *pp)
 		/* Update the interrupt mask on each CPU according the
 		 * new mapping
 		 */
-		smp_call_function_single(cpu, mvneta_percpu_unmask_interrupt,
-					 pp, true);
+		smp_xcall(cpu, mvneta_percpu_unmask_interrupt,
+			  pp, XCALL_TYPE_SYNC);
 		i++;
 
 	}
diff --git a/drivers/platform/x86/intel/speed_select_if/isst_if_mbox_msr.c b/drivers/platform/x86/intel/speed_select_if/isst_if_mbox_msr.c
index 1b6eab071068..f08161b8ee62 100644
--- a/drivers/platform/x86/intel/speed_select_if/isst_if_mbox_msr.c
+++ b/drivers/platform/x86/intel/speed_select_if/isst_if_mbox_msr.c
@@ -124,8 +124,8 @@ static long isst_if_mbox_proc_cmd(u8 *cmd_ptr, int *write_only, int resume)
 	 * and also with wait flag, wait for completion.
 	 * smp_call_function_single is using get_cpu() and put_cpu().
 	 */
-	ret = smp_call_function_single(action.mbox_cmd->logical_cpu,
-				       msrl_update_func, &action, 1);
+	ret = smp_xcall(action.mbox_cmd->logical_cpu,
+			msrl_update_func, &action, XCALL_TYPE_SYNC);
 	if (ret)
 		return ret;
 
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
index 07611a00b78f..448f8bf097f3 100644
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -913,7 +913,7 @@ static void package_power_limit_irq_save(struct rapl_package *rp)
 	if (!boot_cpu_has(X86_FEATURE_PTS) || !boot_cpu_has(X86_FEATURE_PLN))
 		return;
 
-	smp_call_function_single(rp->lead_cpu, power_limit_irq_save_cpu, rp, 1);
+	smp_xcall(rp->lead_cpu, power_limit_irq_save_cpu, rp, XCALL_TYPE_SYNC);
 }
 
 /*
diff --git a/drivers/powercap/intel_rapl_msr.c b/drivers/powercap/intel_rapl_msr.c
index 1be45f36ab6c..d9cbf3f94c94 100644
--- a/drivers/powercap/intel_rapl_msr.c
+++ b/drivers/powercap/intel_rapl_msr.c
@@ -128,7 +128,7 @@ static int rapl_msr_write_raw(int cpu, struct reg_action *ra)
 {
 	int ret;
 
-	ret = smp_call_function_single(cpu, rapl_msr_update_func, ra, 1);
+	ret = smp_xcall(cpu, rapl_msr_update_func, ra, XCALL_TYPE_SYNC);
 	if (WARN_ON_ONCE(ret))
 		return ret;
 
diff --git a/drivers/regulator/qcom_spmi-regulator.c b/drivers/regulator/qcom_spmi-regulator.c
index 02bfce981150..8b92ab3c9ed7 100644
--- a/drivers/regulator/qcom_spmi-regulator.c
+++ b/drivers/regulator/qcom_spmi-regulator.c
@@ -1292,8 +1292,7 @@ spmi_regulator_saw_set_voltage(struct regulator_dev *rdev, unsigned selector)
 	}
 
 	/* Always do the SAW register writes on the first CPU */
-	return smp_call_function_single(0, spmi_saw_set_vdd, \
-					&voltage_sel, true);
+	return smp_xcall(0, spmi_saw_set_vdd, &voltage_sel, XCALL_TYPE_SYNC);
 }
 
 static struct regulator_ops spmi_saw_ops = {};
diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index fde4edd83c14..d2d528f38aab 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -2548,8 +2548,8 @@ void qman_delete_cgr_safe(struct qman_cgr *cgr)
 {
 	preempt_disable();
 	if (qman_cgr_cpus[cgr->cgrid] != smp_processor_id()) {
-		smp_call_function_single(qman_cgr_cpus[cgr->cgrid],
-					 qman_delete_cgr_smp_call, cgr, true);
+		smp_xcall(qman_cgr_cpus[cgr->cgrid],
+			  qman_delete_cgr_smp_call, cgr, XCALL_TYPE_SYNC);
 		preempt_enable();
 		return;
 	}
diff --git a/drivers/soc/fsl/qbman/qman_test_stash.c b/drivers/soc/fsl/qbman/qman_test_stash.c
index b7e8e5ec884c..04b813c288d5 100644
--- a/drivers/soc/fsl/qbman/qman_test_stash.c
+++ b/drivers/soc/fsl/qbman/qman_test_stash.c
@@ -507,8 +507,9 @@ static int init_phase3(void)
 				if (err)
 					return err;
 			} else {
-				smp_call_function_single(hp_cpu->processor_id,
-					init_handler_cb, hp_cpu->iterator, 1);
+				smp_xcall(hp_cpu->processor_id,
+					init_handler_cb, hp_cpu->iterator,
+					XCALL_TYPE_SYNC);
 			}
 			preempt_enable();
 		}
@@ -607,8 +608,8 @@ int qman_test_stash(void)
 		if (err)
 			goto failed;
 	} else {
-		smp_call_function_single(special_handler->processor_id,
-					 send_first_frame_cb, NULL, 1);
+		smp_xcall(special_handler->processor_id,
+				 send_first_frame_cb, NULL, XCALL_TYPE_SYNC);
 	}
 	preempt_enable();
 
diff --git a/include/linux/smp.h b/include/linux/smp.h
index de9b850722b3..f2e6c7a1be3d 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -174,9 +174,8 @@ extern int smp_xcall_any(const struct cpumask *mask, smp_call_func_t func,
 #define	XCALL_TYPE_IRQ_WORK	CSD_TYPE_IRQ_WORK
 #define	XCALL_TYPE_TTWU		CSD_TYPE_TTWU
 #define	XCALL_TYPE_MASK		CSD_FLAG_TYPE_MASK
-#define	XCALL_ALL		-1
 
-#define	XCALL_ALL		-1
+#define	XCALL_ALL		-1	/* cross call on all online CPUs */
 
 extern int smp_xcall(int cpu, smp_call_func_t func, void *info, unsigned int flags);
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 5797c2a7a93f..06daf4f2882d 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1091,8 +1091,8 @@ void cpuhp_report_idle_dead(void)
 	 * We cannot call complete after rcu_report_dead() so we delegate it
 	 * to an online cpu.
 	 */
-	smp_call_function_single(cpumask_first(cpu_online_mask),
-				 cpuhp_complete_idle_dead, st, 0);
+	smp_xcall(cpumask_first(cpu_online_mask),
+			 cpuhp_complete_idle_dead, st, XCALL_TYPE_ASYNC);
 }
 
 static int cpuhp_down_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 23bb19716ad3..e36a2f3fe38a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -116,8 +116,8 @@ task_function_call(struct task_struct *p, remote_function_f func, void *info)
 	int ret;
 
 	for (;;) {
-		ret = smp_call_function_single(task_cpu(p), remote_function,
-					       &data, 1);
+		ret = smp_xcall(task_cpu(p), remote_function,
+				&data, XCALL_TYPE_SYNC);
 		if (!ret)
 			ret = data.ret;
 
@@ -149,7 +149,7 @@ static int cpu_function_call(int cpu, remote_function_f func, void *info)
 		.ret	= -ENXIO, /* No such CPU */
 	};
 
-	smp_call_function_single(cpu, remote_function, &data, 1);
+	smp_xcall(cpu, remote_function, &data, XCALL_TYPE_SYNC);
 
 	return data.ret;
 }
@@ -4513,7 +4513,7 @@ static int perf_event_read(struct perf_event *event, bool group)
 		 * Therefore, either way, we'll have an up-to-date event count
 		 * after this.
 		 */
-		(void)smp_call_function_single(event_cpu, __perf_event_read, &data, 1);
+		(void)smp_xcall(event_cpu, __perf_event_read, &data, XCALL_TYPE_SYNC);
 		preempt_enable();
 		ret = data.ret;
 
@@ -13292,7 +13292,7 @@ static void perf_event_exit_cpu_context(int cpu)
 		ctx = &cpuctx->ctx;
 
 		mutex_lock(&ctx->mutex);
-		smp_call_function_single(cpu, __perf_event_exit_context, ctx, 1);
+		smp_xcall(cpu, __perf_event_exit_context, ctx, XCALL_TYPE_SYNC);
 		cpuctx->online = 0;
 		mutex_unlock(&ctx->mutex);
 	}
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 55d049c39608..6c998391550b 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2662,8 +2662,7 @@ static int rcu_torture_barrier_cbs(void *arg)
 		 * The above smp_load_acquire() ensures barrier_phase load
 		 * is ordered before the following ->call().
 		 */
-		if (smp_call_function_single(myid, rcu_torture_barrier1cb,
-					     &rcu, 1)) {
+		if (smp_xcall(myid, rcu_torture_barrier1cb, &rcu, XCALL_TYPE_SYNC)) {
 			// IPI failed, so use direct call from current CPU.
 			cur_ops->call(&rcu, rcu_torture_barrier_cbf);
 		}
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 99cf3a13954c..e5248737f9b6 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1299,7 +1299,7 @@ static void trc_wait_for_one_reader(struct task_struct *t,
 		per_cpu(trc_ipi_to_cpu, cpu) = true;
 		t->trc_ipi_to_cpu = cpu;
 		rcu_tasks_trace.n_ipis++;
-		if (smp_call_function_single(cpu, trc_read_check_handler, t, 0)) {
+		if (smp_xcall(cpu, trc_read_check_handler, t, XCALL_TYPE_ASYNC)) {
 			// Just in case there is some other reason for
 			// failure than the target CPU being offline.
 			WARN_ONCE(1, "%s():  smp_call_function_single() failed for CPU: %d\n",
@@ -1473,7 +1473,7 @@ static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
 	// changes, there will need to be a recheck and/or timed wait.
 	for_each_online_cpu(cpu)
 		if (WARN_ON_ONCE(smp_load_acquire(per_cpu_ptr(&trc_ipi_to_cpu, cpu))))
-			smp_call_function_single(cpu, rcu_tasks_trace_empty_fn, NULL, 1);
+			smp_xcall(cpu, rcu_tasks_trace_empty_fn, NULL, XCALL_TYPE_SYNC);
 
 	// Remove the safety count.
 	smp_mb__before_atomic();  // Order vs. earlier atomics
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bf3a3fe88d94..8131d7662e4c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4100,7 +4100,7 @@ void rcu_barrier(void)
 			continue;
 		}
 		raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);
-		if (smp_call_function_single(cpu, rcu_barrier_handler, (void *)cpu, 1)) {
+		if (smp_xcall(cpu, rcu_barrier_handler, (void *)cpu, XCALL_TYPE_SYNC)) {
 			schedule_timeout_uninterruptible(1);
 			goto retry;
 		}
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 60197ea24ceb..fffefe6205eb 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -391,7 +391,7 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
 			put_cpu();
 			continue;
 		}
-		ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
+		ret = smp_xcall(cpu, rcu_exp_handler, NULL, XCALL_TYPE_ASYNC);
 		put_cpu();
 		/* The CPU will report the QS in response to the IPI. */
 		if (!ret)
@@ -777,7 +777,7 @@ static void sync_sched_exp_online_cleanup(int cpu)
 		return;
 	}
 	/* Quiescent state needed on some other CPU, send IPI. */
-	ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
+	ret = smp_xcall(cpu, rcu_exp_handler, NULL, XCALL_TYPE_ASYNC);
 	put_cpu();
 	WARN_ON_ONCE(ret);
 }
diff --git a/kernel/relay.c b/kernel/relay.c
index d1a67fbb819d..5d605b08c6f9 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -635,9 +635,8 @@ int relay_late_setup_files(struct rchan *chan,
 			disp.dentry = dentry;
 			smp_mb();
 			/* relay_channels_mutex must be held, so wait. */
-			err = smp_call_function_single(i,
-						       __relay_set_buf_dentry,
-						       &disp, 1);
+			err = smp_xcall(i, __relay_set_buf_dentry,
+					&disp, XCALL_TYPE_SYNC);
 		}
 		if (unlikely(err))
 			break;
diff --git a/kernel/scftorture.c b/kernel/scftorture.c
index a2cb2b223997..17d5f6b69a01 100644
--- a/kernel/scftorture.c
+++ b/kernel/scftorture.c
@@ -351,7 +351,8 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 			barrier(); // Prevent race-reduction compiler optimizations.
 			scfcp->scfc_in = true;
 		}
-		ret = smp_call_function_single(cpu, scf_handler_1, (void *)scfcp, scfsp->scfs_wait);
+		ret = smp_xcall(cpu, scf_handler_1, (void *)scfcp,
+				(scfsp->scfs_wait ? XCALL_TYPE_SYNC : XCALL_TYPE_ASYNC));
 		if (ret) {
 			if (scfsp->scfs_wait)
 				scfp->n_single_wait_ofl++;
@@ -372,7 +373,7 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 		scfcp->scfc_rpc = true;
 		barrier(); // Prevent race-reduction compiler optimizations.
 		scfcp->scfc_in = true;
-		ret = smp_call_function_single(cpu, scf_handler_1, (void *)scfcp, 0);
+		ret = smp_xcall(cpu, scf_handler_1, (void *)scfcp, XCALL_TYPE_ASYNC);
 		if (!ret) {
 			if (use_cpus_read_lock)
 				cpus_read_unlock();
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 9af7795f220e..6e12a823e2f7 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -377,7 +377,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		 * smp_call_function_single() will call ipi_func() if cpu_id
 		 * is the calling CPU.
 		 */
-		smp_call_function_single(cpu_id, ipi_func, NULL, 1);
+		smp_xcall(cpu_id, ipi_func, NULL, XCALL_TYPE_SYNC);
 	} else {
 		/*
 		 * For regular membarrier, we can save a few cycles by
diff --git a/kernel/smp.c b/kernel/smp.c
index fb2333218e31..448bde271515 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -596,36 +596,6 @@ void flush_smp_call_function_from_idle(void)
 	local_irq_restore(flags);
 }
 
-/*
- * This is a temporarily hook up. This function will be eliminated
- * with the last patch in this series.
- *
- * smp_call_function_single - Run a function on a specific CPU
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait until function has completed on other CPUs.
- *
- * Returns 0 on success, else a negative status code.
- */
-int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
-			int wait)
-{
-	unsigned int flags = 0;
-
-	if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu))
-		return -ENXIO;
-
-	if (wait)
-		flags = XCALL_TYPE_SYNC;
-	else
-		flags = XCALL_TYPE_ASYNC;
-
-	smp_xcall(cpu, func, info, flags);
-
-	return 0;
-}
-EXPORT_SYMBOL(smp_call_function_single);
-
 /*
  * This function performs synchronous and asynchronous cross CPU call for
  * more than one participants.
@@ -1068,8 +1038,8 @@ int smp_xcall_private(int cpu, call_single_data_t *csd, unsigned int flags)
 	int err = 0;
 
 	if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
-		pr_warn("cpu ID must be a positive number < nr_cpu_ids and must be currently online\n");
-		return -EINVAL;
+		pr_warn("cpu ID must be a valid, currently online CPU (< nr_cpu_ids)\n");
+		return -ENXIO;
 	}
 
 	if (csd == NULL) {
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 003ccf338d20..0ff0b32be6b2 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -418,7 +418,7 @@ static int clockevents_unbind(struct clock_event_device *ced, int cpu)
 {
 	struct ce_unbind cu = { .ce = ced, .res = -ENODEV };
 
-	smp_call_function_single(cpu, __clockevents_unbind, &cu, 1);
+	smp_xcall(cpu, __clockevents_unbind, &cu, XCALL_TYPE_SYNC);
 	return cu.res;
 }
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 95d7ca35bdf2..9d7eb5eb2fea 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -355,7 +355,7 @@ void clocksource_verify_percpu(struct clocksource *cs)
 		if (cpu == testcpu)
 			continue;
 		csnow_begin = cs->read(cs);
-		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
+		smp_xcall(cpu, clocksource_verify_one_cpu, cs, XCALL_TYPE_SYNC);
 		csnow_end = cs->read(cs);
 		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
 		if (delta < 0)
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 46789356f856..17dc8d409138 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -195,7 +195,7 @@ static void tick_take_do_timer_from_boot(void)
 	int from = tick_do_timer_boot_cpu;
 
 	if (from >= 0 && from != cpu)
-		smp_call_function_single(from, giveup_do_timer, &cpu, 1);
+		smp_xcall(from, giveup_do_timer, &cpu, XCALL_TYPE_SYNC);
 }
 #endif
 
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index e7b9c2636d10..fd48eb5606c6 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -822,8 +822,8 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 		 */
 		err = -ENXIO;
 	} else {
-		err = smp_call_function_single(cpu, __bpf_prog_test_run_raw_tp,
-					       &info, 1);
+		err = smp_xcall(cpu, __bpf_prog_test_run_raw_tp,
+				&info, XCALL_TYPE_SYNC);
 	}
 	put_cpu();
 
diff --git a/net/iucv/iucv.c b/net/iucv/iucv.c
index 7fba16e99665..f30538ee2e70 100644
--- a/net/iucv/iucv.c
+++ b/net/iucv/iucv.c
@@ -507,8 +507,7 @@ static void iucv_setmask_mp(void)
 		/* Enable all cpus with a declared buffer. */
 		if (cpumask_test_cpu(cpu, &iucv_buffer_cpumask) &&
 		    !cpumask_test_cpu(cpu, &iucv_irq_cpumask))
-			smp_call_function_single(cpu, iucv_allow_cpu,
-						 NULL, 1);
+			smp_xcall(cpu, iucv_allow_cpu, NULL, XCALL_TYPE_SYNC);
 	cpus_read_unlock();
 }
 
@@ -526,7 +525,7 @@ static void iucv_setmask_up(void)
 	cpumask_copy(&cpumask, &iucv_irq_cpumask);
 	cpumask_clear_cpu(cpumask_first(&iucv_irq_cpumask), &cpumask);
 	for_each_cpu(cpu, &cpumask)
-		smp_call_function_single(cpu, iucv_block_cpu, NULL, 1);
+		smp_xcall(cpu, iucv_block_cpu, NULL, XCALL_TYPE_SYNC);
 }
 
 /*
@@ -551,7 +550,7 @@ static int iucv_enable(void)
 	/* Declare per cpu buffers. */
 	rc = -EIO;
 	for_each_online_cpu(cpu)
-		smp_call_function_single(cpu, iucv_declare_cpu, NULL, 1);
+		smp_xcall(cpu, iucv_declare_cpu, NULL, XCALL_TYPE_SYNC);
 	if (cpumask_empty(&iucv_buffer_cpumask))
 		/* No cpu could declare an iucv buffer. */
 		goto out;
@@ -641,8 +640,8 @@ static int iucv_cpu_down_prep(unsigned int cpu)
 	iucv_retrieve_cpu(NULL);
 	if (!cpumask_empty(&iucv_irq_cpumask))
 		return 0;
-	smp_call_function_single(cpumask_first(&iucv_buffer_cpumask),
-				 iucv_allow_cpu, NULL, 1);
+	smp_xcall(cpumask_first(&iucv_buffer_cpumask),
+		  iucv_allow_cpu, NULL, XCALL_TYPE_SYNC);
 	return 0;
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index adb7cbe67ee5..2f9d1d4826cb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5696,7 +5696,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	c.ret = &r;
 	c.opaque = opaque;
 	for_each_online_cpu(cpu) {
-		smp_call_function_single(cpu, check_processor_compat, &c, 1);
+		smp_xcall(cpu, check_processor_compat, &c, XCALL_TYPE_SYNC);
 		if (r < 0)
 			goto out_free_2;
 	}
-- 
2.27.0



* [PATCH 11/11] smp: modify up.c to adopt the same format of cross CPU call.
  2022-04-15  2:46 [PATCH 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (9 preceding siblings ...)
  2022-04-15  2:47 ` [PATCH 10/11] smp: replace smp_call_function_single() with smp_xcall() Donghai Qiao
@ 2022-04-15  2:47 ` Donghai Qiao
  2022-04-15 12:13 ` [PATCH 00/11] smp: cross CPU call interface Peter Zijlstra
  11 siblings, 0 replies; 13+ messages in thread
From: Donghai Qiao @ 2022-04-15  2:47 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Since smp.c has been converted to the new interface, up.c should be
changed to provide the uniprocessor version of the cross CPU call
functions as well.

Also clean up the dead code left over after applying the preceding
patches in this set.
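
As a minimal sketch of the resulting calling convention (illustration
only, not part of this patch; the helper show_cpu() and the caller
demo_xcall() are made-up names), a converted call site reads the same
on SMP and UP builds -- on a UP kernel the callback simply runs on
CPU 0 with local interrupts disabled:

	/* Assumes <linux/smp.h> for smp_xcall() and XCALL_TYPE_SYNC. */
	static void show_cpu(void *info)
	{
		*(int *)info = smp_processor_id();
	}

	static int demo_xcall(void)
	{
		int id = -1;

		/* Synchronous call: returns only after show_cpu() has run. */
		return smp_xcall(0, show_cpu, &id, XCALL_TYPE_SYNC);
	}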

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
 include/linux/smp.h |  7 ------
 kernel/up.c         | 56 +++++++++++++++++++++++++++++++++------------
 2 files changed, 42 insertions(+), 21 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index f2e6c7a1be3d..1e29527123f8 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -202,9 +202,6 @@ extern void __smp_call_single_queue(int cpu, struct llist_node *node);
 /* total number of cpus in this system (may exceed NR_CPUS) */
 extern unsigned int total_cpus;
 
-int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
-			     int wait);
-
 /*
  * Cpus stopping functions in panic. All have default weak definitions.
  * Architecture-dependent code may override them.
@@ -290,13 +287,9 @@ static inline void smp_send_stop(void) { }
 static inline void up_smp_call_function(smp_call_func_t func, void *info)
 {
 }
-#define smp_call_function(func, info, wait) \
-			(up_smp_call_function(func, info))
 
 static inline void smp_send_reschedule(int cpu) { }
 #define smp_prepare_boot_cpu()			do {} while (0)
-#define smp_call_function_many(mask, func, info, wait) \
-			(up_smp_call_function(func, info))
 static inline void call_function_init(void) { }
 
 static inline void kick_all_cpus_sync(void) {  }
diff --git a/kernel/up.c b/kernel/up.c
index a38b8b095251..92c62c677e52 100644
--- a/kernel/up.c
+++ b/kernel/up.c
@@ -9,8 +9,7 @@
 #include <linux/smp.h>
 #include <linux/hypervisor.h>
 
-int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
-				int wait)
+int smp_xcall(int cpu, void (*func) (void *info), void *info, unsigned int type)
 {
 	unsigned long flags;
 
@@ -23,37 +22,66 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 
 	return 0;
 }
-EXPORT_SYMBOL(smp_call_function_single);
+EXPORT_SYMBOL(smp_xcall);
 
-int smp_call_function_single_async(int cpu, struct __call_single_data *csd)
+int smp_xcall_cond(int cpu, smp_call_func_t func, void *info,
+		   smp_cond_func_t cond_func, unsigned int type)
+{
+	int ret = 0;
+
+	preempt_disable();
+	if (!cond_func || cond_func(0, info))
+		ret = smp_xcall(cpu, func, info, type);
+
+	preempt_enable();
+
+	return ret;
+}
+EXPORT_SYMBOL(smp_xcall_cond);
+
+void smp_xcall_mask(const struct cpumask *mask, smp_call_func_t func, void *info, unsigned int type)
 {
 	unsigned long flags;
 
-	local_irq_save(flags);
-	csd->func(csd->info);
-	local_irq_restore(flags);
+	if (!cpumask_test_cpu(0, mask))
+		return;
+
+	preempt_disable();
+	smp_xcall(0, func, info, type);
+	preempt_enable();
+}
+EXPORT_SYMBOL(smp_xcall_mask);
+
+int smp_xcall_private(int cpu, struct __call_single_data *csd, unsigned int type)
+{
+	preempt_disable();
+
+	if (csd->func != NULL)
+		smp_xcall(cpu, csd->func, csd->info, type);
+
+	preempt_enable();
+
 	return 0;
 }
-EXPORT_SYMBOL(smp_call_function_single_async);
+EXPORT_SYMBOL(smp_xcall_private);
 
 /*
  * Preemption is disabled here to make sure the cond_func is called under the
  * same conditions in UP and SMP.
  */
-void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
-			   void *info, bool wait, const struct cpumask *mask)
+void smp_xcall_mask_cond(const struct cpumask *mask, smp_call_func_t func,
+			 void *info, smp_cond_func_t cond_func,
+			 unsigned int type)
 {
 	unsigned long flags;
 
 	preempt_disable();
 	if ((!cond_func || cond_func(0, info)) && cpumask_test_cpu(0, mask)) {
-		local_irq_save(flags);
-		func(info);
-		local_irq_restore(flags);
+		smp_xcall(0, func, info, type);
 	}
 	preempt_enable();
 }
-EXPORT_SYMBOL(on_each_cpu_cond_mask);
+EXPORT_SYMBOL(smp_xcall_mask_cond);
 
 int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
 {
-- 
2.27.0



* Re: [PATCH 00/11] smp: cross CPU call interface
  2022-04-15  2:46 [PATCH 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (10 preceding siblings ...)
  2022-04-15  2:47 ` [PATCH 11/11] smp: modify up.c to adopt the same format of cross CPU call Donghai Qiao
@ 2022-04-15 12:13 ` Peter Zijlstra
  11 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2022-04-15 12:13 UTC (permalink / raw)
  To: Donghai Qiao
  Cc: akpm, sfr, arnd, heying24, andriy.shevchenko, axboe, rdunlap,
	tglx, gor, donghai.w.qiao, linux-kernel

On Thu, Apr 14, 2022 at 10:46:50PM -0400, Donghai Qiao wrote:
> The motivation of submitting this patch set is intended to make the
> existing cross CPU call mechanism become a bit more formal interface
> and more friendly to the kernel developers.
> 
> Basically the minimum set of functions below can satisfy any demand
> for cross CPU call from kernel consumers. For the sack of simplicity
> self-explanatory and less code redundancy no ambiguity, the functions
> in this interface are renamed, simplified, or eliminated. But they
> are still inheriting the same semantics and parameter lists from their
> previous version.
> 
> int smp_xcall(int cpu, smp_call_func_t func, void *info, unsigned int flags)
> 
> int smp_xcall_cond(int cpu, smp_call_func_t func, void *info,
>                    smp_cond_func_t condf, unsigned int flags)
> 
> void smp_xcall_mask(const struct cpumask *mask, smp_call_func_t func,
>                     void *info, unsigned int flags)
> 
> void smp_xcall_mask_cond(const struct cpumask *mask, smp_call_func_t func,
>                          void *info, smp_cond_func_t condf, unsigned int flags)
> 
> int smp_xcall_private(int cpu, call_single_data_t *csd, unsigned int flags)
> 
> int smp_xcall_any(const struct cpumask *mask, smp_call_func_t func,
>                   void *info, unsigned int flags)
> 

Can we please remove that x? That's going to be horrible pain for a long
time to come.


Thread overview: 13+ messages
2022-04-15  2:46 [PATCH 00/11] smp: cross CPU call interface Donghai Qiao
2022-04-15  2:46 ` [PATCH 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
2022-04-15  2:46 ` [PATCH 02/11] smp: cross call interface Donghai Qiao
2022-04-15  2:46 ` [PATCH 03/11] smp: eliminate SCF_WAIT and SCF_RUN_LOCAL Donghai Qiao
2022-04-15  2:46 ` [PATCH 04/11] smp: replace smp_call_function_single() with smp_xcall() Donghai Qiao
2022-04-15  2:46 ` [PATCH 05/11] smp: replace smp_call_function_single_async() with smp_xcall_private() Donghai Qiao
2022-04-15  2:46 ` [PATCH 06/11] smp: use smp_xcall_private() fron irq_work.c and core.c Donghai Qiao
2022-04-15  2:46 ` [PATCH 07/11] smp: change smp_call_function_any() to smp_xcall_any() Donghai Qiao
2022-04-15  2:46 ` [PATCH 08/11] smp: replace smp_call_function_many_cond() with __smp_call_mask_cond() Donghai Qiao
2022-04-15  2:46 ` [PATCH 09/11] smp: replace smp_call_function_single_async with smp_xcall_private Donghai Qiao
2022-04-15  2:47 ` [PATCH 10/11] smp: replace smp_call_function_single() with smp_xcall() Donghai Qiao
2022-04-15  2:47 ` [PATCH 11/11] smp: modify up.c to adopt the same format of cross CPU call Donghai Qiao
2022-04-15 12:13 ` [PATCH 00/11] smp: cross CPU call interface Peter Zijlstra
