* [PATCH v2 00/11] smp: cross CPU call interface
@ 2022-04-22 20:00 Donghai Qiao
  2022-04-22 20:00 ` [PATCH v2 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
                   ` (11 more replies)
  0 siblings, 12 replies; 26+ messages in thread
From: Donghai Qiao @ 2022-04-22 20:00 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

The motivation for submitting this patch set is to turn the
existing cross CPU call mechanism into a more formal interface
that is friendlier to kernel developers.

Basically, the minimal set of functions below can satisfy any demand
for cross CPU calls from kernel consumers. For the sake of
simplicity, self-explanatory naming, less code redundancy, and no
ambiguity, the functions in this interface have been renamed,
simplified, or eliminated, but they still inherit the same semantics
and parameter lists from their previous versions.

int smp_call(int cpu, smp_call_func_t func, void *info, unsigned int flags)

int smp_call_cond(int cpu, smp_call_func_t func, void *info,
                   smp_cond_func_t condf, unsigned int flags)

void smp_call_mask(const struct cpumask *mask, smp_call_func_t func,
                    void *info, unsigned int flags)

void smp_call_mask_cond(const struct cpumask *mask, smp_call_func_t func,
                         void *info, smp_cond_func_t condf, unsigned int flags)

int smp_call_private(int cpu, call_single_data_t *csd, unsigned int flags)

int smp_call_any(const struct cpumask *mask, smp_call_func_t func,
                  void *info, unsigned int flags)
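
For illustration only, here is a minimal usage sketch of smp_call()
(hypothetical consumer code; it assumes the SMP_CALL_ALL and
SMP_CALL_TYPE_SYNC macros introduced by patch 2):

#include <linux/atomic.h>
#include <linux/smp.h>

static void count_events(void *info)
{
        atomic_inc(info);       /* runs on every targeted CPU */
}

static void count_events_on_all_cpus(atomic_t *counter)
{
        /* SMP_CALL_ALL (-1) targets all online CPUs; SYNC waits for them */
        smp_call(SMP_CALL_ALL, count_events, counter, SMP_CALL_TYPE_SYNC);
}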


Here is the explanation about the patch set:

Patch 1: The smp cross call related structures and definitions are
consolidated from smp.c and smp_types.h into smp.h. As a result,
smp_types.h is deleted from the source tree.

Patch 2: The set of smp_call* functions listed above is defined.
The implementation details are filled in by the subsequent patches
in this set.

Patch 3: Eliminated the macros SCF_WAIT and SCF_RUN_LOCAL and the
code around them. This is possible because, since commit a32a4d8a815c
("smp: Run functions concurrently in smp_call_function_many_cond()"),
smp_call_function_many_cond() has been able to handle the local CPU
call, but only when the local CPU shows up in the cpumask. So it was
incorrect to force a local CPU call in the on_each_cpu_cond_mask
code path.

This change and the changes in subsequent patches will eventually
help eliminate the set of on_each_cpu* functions; see the conversion
sketch below.
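
For instance, a conversion from the on_each_cpu* style to this
interface might look like the sketch below (hypothetical caller code;
all names are invented here):

#include <linux/smp.h>

static bool cpu_wants_work(int cpu, void *info)
{
        return true;    /* example condition, evaluated per CPU */
}

static void do_work(void *info)
{
}

static void run_on_mask(const struct cpumask *mask, void *info)
{
        /* old: on_each_cpu_cond_mask(cpu_wants_work, do_work, info, true, mask); */
        /* new: the local CPU runs do_work() only if it is set in @mask */
        smp_call_mask_cond(mask, do_work, info, cpu_wants_work, SMP_CALL_TYPE_SYNC);
}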

Patch 4: Eliminated the percpu global csd_data and temporarily
hooked smp_call_function_single() up to smp_call().

Patch 5: Replaced smp_call_function_single_async() with
smp_call_private() and also extended smp_call_private() to support
synchronous calls with a preallocated csd structure; a usage sketch
follows.
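
A minimal sketch of the extended synchronous form (hypothetical
caller code, assuming the INIT_CSD() helper consolidated into smp.h
by patch 1):

#include <linux/smp.h>

static void drain(void *info)
{
}

static call_single_data_t drain_csd;    /* caller-owned, long-lived */

static int drain_cpu(int cpu)
{
        INIT_CSD(&drain_csd, drain, NULL);
        /* synchronous: returns only after drain() has run on @cpu */
        return smp_call_private(cpu, &drain_csd, SMP_CALL_TYPE_SYNC);
}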

Patch 6: Previously, two special cross call clients, irq_work.c and
core.c, used __smp_call_single_queue(), an smp-internal function.
With some minor changes in this patch, they are able to use this
interface instead.

Patch 7: Kernel consumers can actually use smp_call() where they
would use smp_call_function_any(). The extra logic handled by
smp_call_function_any() should be moved out of it, with the kernel
consumers picking the CPU themselves. But because quite a few
consumers need to run the cross call function on any one of the
CPUs in a set, there is some advantage to adding smp_call_any()
to the interface; see the sketch below.
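
A sketch of the intended smp_call_any() usage (hypothetical caller
code):

#include <linux/smp.h>

static void read_hw_counter(void *info)
{
        /* runs on exactly one CPU selected from the mask */
}

static int read_counter_on_any(const struct cpumask *mask, u64 *val)
{
        /*
         * Prefers the current CPU, then the local node, then any
         * online CPU in @mask; fails if no CPU in @mask is online.
         */
        return smp_call_any(mask, read_hw_counter, val, SMP_CALL_TYPE_SYNC);
}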

Patch 8: Eliminated smp_call_function, smp_call_function_many,
smp_call_function_many_cond, on_each_cpu, on_each_cpu_mask,
on_each_cpu_cond, on_each_cpu_cond_mask.

Patch 9: Eliminated smp_call_function_single_async.

Patch 10: Eliminated smp_call_function_single.

Patch 11: Modified up.c to adopt the same format of cross CPU call.

Note: Each patch in this set depends only on its preceding patch.
      The kernel can be built and booted if it is patched with any
      number of patches from 1 to 11, applied in order.

Donghai Qiao (11):
  smp: consolidate the structure definitions to smp.h
  smp: define the cross call interface
  smp: eliminate SCF_WAIT and SCF_RUN_LOCAL
  smp: replace smp_call_function_single() with smp_call()
  smp: replace smp_call_function_single_async() with  smp_call_private()
  smp: use smp_call_private() from irq_work.c and core.c
  smp: change smp_call_function_any() to smp_call_any()
  smp: replace smp_call_function_many_cond() with 
    __smp_call_mask_cond()
  smp: replace smp_call_function_single_async with smp_call_private
  smp: replace smp_call_function_single() with smp_call()
  smp: modify up.c to adopt the same format of cross CPU call.

v1 -> v2: removed 'x' from the function names and changed XCALL to SMP_CALL in the new macros.

 arch/alpha/kernel/process.c                   |   2 +-
 arch/alpha/kernel/rtc.c                       |   4 +-
 arch/alpha/kernel/smp.c                       |  10 +-
 arch/arc/kernel/perf_event.c                  |   2 +-
 arch/arc/mm/cache.c                           |   2 +-
 arch/arc/mm/tlb.c                             |  14 +-
 arch/arm/common/bL_switcher.c                 |   2 +-
 arch/arm/kernel/machine_kexec.c               |   2 +-
 arch/arm/kernel/perf_event_v7.c               |   6 +-
 arch/arm/kernel/smp_tlb.c                     |  22 +-
 arch/arm/kernel/smp_twd.c                     |   4 +-
 arch/arm/mach-bcm/bcm_kona_smc.c              |   2 +-
 arch/arm/mach-mvebu/pmsu.c                    |   4 +-
 arch/arm/mm/flush.c                           |   4 +-
 arch/arm/vfp/vfpmodule.c                      |   2 +-
 arch/arm64/kernel/armv8_deprecated.c          |   4 +-
 arch/arm64/kernel/perf_event.c                |   8 +-
 arch/arm64/kernel/topology.c                  |   2 +-
 arch/arm64/kvm/arm.c                          |   6 +-
 arch/csky/abiv2/cacheflush.c                  |   2 +-
 arch/csky/kernel/cpu-probe.c                  |   2 +-
 arch/csky/kernel/perf_event.c                 |   2 +-
 arch/csky/kernel/smp.c                        |   2 +-
 arch/csky/mm/cachev2.c                        |   2 +-
 arch/ia64/kernel/mca.c                        |   4 +-
 arch/ia64/kernel/palinfo.c                    |   3 +-
 arch/ia64/kernel/smp.c                        |  10 +-
 arch/ia64/kernel/smpboot.c                    |   2 +-
 arch/ia64/kernel/uncached.c                   |   4 +-
 arch/mips/cavium-octeon/octeon-irq.c          |   4 +-
 arch/mips/cavium-octeon/setup.c               |   4 +-
 arch/mips/kernel/crash.c                      |   2 +-
 arch/mips/kernel/machine_kexec.c              |   2 +-
 arch/mips/kernel/perf_event_mipsxx.c          |   7 +-
 arch/mips/kernel/process.c                    |   2 +-
 arch/mips/kernel/smp-bmips.c                  |   3 +-
 arch/mips/kernel/smp-cps.c                    |   8 +-
 arch/mips/kernel/smp.c                        |  10 +-
 arch/mips/kernel/sysrq.c                      |   2 +-
 arch/mips/mm/c-r4k.c                          |   4 +-
 arch/mips/sibyte/common/cfe.c                 |   2 +-
 arch/openrisc/kernel/smp.c                    |  12 +-
 arch/parisc/kernel/cache.c                    |   4 +-
 arch/parisc/mm/init.c                         |   2 +-
 arch/powerpc/kernel/dawr.c                    |   2 +-
 arch/powerpc/kernel/kvm.c                     |   2 +-
 arch/powerpc/kernel/security.c                |   6 +-
 arch/powerpc/kernel/smp.c                     |   4 +-
 arch/powerpc/kernel/sysfs.c                   |  28 +-
 arch/powerpc/kernel/tau_6xx.c                 |   4 +-
 arch/powerpc/kernel/watchdog.c                |   4 +-
 arch/powerpc/kexec/core_64.c                  |   2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c           |   2 +-
 arch/powerpc/kvm/book3s_hv.c                  |   8 +-
 arch/powerpc/mm/book3s64/pgtable.c            |   2 +-
 arch/powerpc/mm/book3s64/radix_tlb.c          |  12 +-
 arch/powerpc/mm/nohash/tlb.c                  |  10 +-
 arch/powerpc/mm/slice.c                       |   4 +-
 arch/powerpc/perf/core-book3s.c               |   2 +-
 arch/powerpc/perf/imc-pmu.c                   |   2 +-
 arch/powerpc/platforms/85xx/smp.c             |   8 +-
 arch/powerpc/platforms/powernv/idle.c         |   2 +-
 arch/powerpc/platforms/pseries/lparcfg.c      |   2 +-
 arch/riscv/mm/cacheflush.c                    |   4 +-
 arch/s390/hypfs/hypfs_diag0c.c                |   2 +-
 arch/s390/kernel/alternative.c                |   2 +-
 arch/s390/kernel/perf_cpum_cf.c               |  10 +-
 arch/s390/kernel/perf_cpum_cf_common.c        |   4 +-
 arch/s390/kernel/perf_cpum_sf.c               |   4 +-
 arch/s390/kernel/processor.c                  |   2 +-
 arch/s390/kernel/smp.c                        |   2 +-
 arch/s390/kernel/topology.c                   |   2 +-
 arch/s390/mm/pgalloc.c                        |   2 +-
 arch/s390/pci/pci_irq.c                       |   4 +-
 arch/sh/kernel/smp.c                          |  14 +-
 arch/sh/mm/cache.c                            |   2 +-
 arch/sparc/include/asm/mman.h                 |   4 +-
 arch/sparc/kernel/nmi.c                       |  16 +-
 arch/sparc/kernel/perf_event.c                |   4 +-
 arch/sparc/kernel/smp_64.c                    |   8 +-
 arch/sparc/mm/init_64.c                       |   2 +-
 arch/x86/events/core.c                        |   6 +-
 arch/x86/events/intel/core.c                  |   4 +-
 arch/x86/kernel/alternative.c                 |   2 +-
 arch/x86/kernel/amd_nb.c                      |   2 +-
 arch/x86/kernel/apic/apic.c                   |   2 +-
 arch/x86/kernel/apic/vector.c                 |   2 +-
 arch/x86/kernel/cpu/aperfmperf.c              |   5 +-
 arch/x86/kernel/cpu/bugs.c                    |   2 +-
 arch/x86/kernel/cpu/mce/amd.c                 |   4 +-
 arch/x86/kernel/cpu/mce/core.c                |  12 +-
 arch/x86/kernel/cpu/mce/inject.c              |  14 +-
 arch/x86/kernel/cpu/mce/intel.c               |   2 +-
 arch/x86/kernel/cpu/microcode/core.c          |   4 +-
 arch/x86/kernel/cpu/mtrr/mtrr.c               |   2 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |   4 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        |   8 +-
 arch/x86/kernel/cpu/sgx/main.c                |   5 +-
 arch/x86/kernel/cpu/umwait.c                  |   2 +-
 arch/x86/kernel/cpu/vmware.c                  |   2 +-
 arch/x86/kernel/cpuid.c                       |   2 +-
 arch/x86/kernel/kvm.c                         |   6 +-
 arch/x86/kernel/ldt.c                         |   2 +-
 arch/x86/kvm/vmx/vmx.c                        |   3 +-
 arch/x86/kvm/x86.c                            |  11 +-
 arch/x86/lib/cache-smp.c                      |   4 +-
 arch/x86/lib/msr-smp.c                        |  20 +-
 arch/x86/mm/pat/set_memory.c                  |   4 +-
 arch/x86/mm/tlb.c                             |  12 +-
 arch/x86/xen/mmu_pv.c                         |   4 +-
 arch/x86/xen/smp_pv.c                         |   2 +-
 arch/x86/xen/suspend.c                        |   4 +-
 arch/xtensa/kernel/smp.c                      |  29 +-
 block/blk-mq.c                                |   2 +-
 drivers/acpi/processor_idle.c                 |   4 +-
 drivers/char/agp/generic.c                    |   2 +-
 drivers/clocksource/ingenic-timer.c           |   2 +-
 drivers/clocksource/mips-gic-timer.c          |   2 +-
 drivers/cpufreq/acpi-cpufreq.c                |  10 +-
 drivers/cpufreq/powernow-k8.c                 |   9 +-
 drivers/cpufreq/powernv-cpufreq.c             |  14 +-
 drivers/cpufreq/sparc-us2e-cpufreq.c          |   4 +-
 drivers/cpufreq/sparc-us3-cpufreq.c           |   4 +-
 drivers/cpufreq/speedstep-ich.c               |   7 +-
 drivers/cpufreq/tegra194-cpufreq.c            |   8 +-
 drivers/cpuidle/coupled.c                     |   2 +-
 drivers/cpuidle/driver.c                      |   8 +-
 drivers/edac/amd64_edac.c                     |   4 +-
 drivers/firmware/arm_sdei.c                   |  10 +-
 drivers/gpu/drm/i915/vlv_sideband.c           |   2 +-
 drivers/hwmon/fam15h_power.c                  |   2 +-
 .../hwtracing/coresight/coresight-cpu-debug.c |   3 +-
 .../coresight/coresight-etm3x-core.c          |  11 +-
 .../coresight/coresight-etm4x-core.c          |  12 +-
 .../coresight/coresight-etm4x-sysfs.c         |   2 +-
 drivers/hwtracing/coresight/coresight-trbe.c  |   6 +-
 drivers/irqchip/irq-mvebu-pic.c               |   4 +-
 .../net/ethernet/cavium/liquidio/lio_core.c   |   2 +-
 drivers/net/ethernet/marvell/mvneta.c         |  34 +-
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |   8 +-
 drivers/perf/arm_spe_pmu.c                    |   2 +-
 .../intel/speed_select_if/isst_if_mbox_msr.c  |   4 +-
 drivers/platform/x86/intel_ips.c              |   4 +-
 drivers/powercap/intel_rapl_common.c          |   2 +-
 drivers/powercap/intel_rapl_msr.c             |   2 +-
 drivers/regulator/qcom_spmi-regulator.c       |   3 +-
 drivers/soc/fsl/qbman/qman.c                  |   4 +-
 drivers/soc/fsl/qbman/qman_test_stash.c       |   9 +-
 drivers/soc/xilinx/xlnx_event_manager.c       |   2 +-
 drivers/tty/sysrq.c                           |   2 +-
 drivers/watchdog/booke_wdt.c                  |   8 +-
 fs/buffer.c                                   |   2 +-
 include/linux/irq_work.h                      |   2 +-
 include/linux/smp.h                           | 227 +++++--
 include/linux/smp_types.h                     |  69 --
 kernel/cpu.c                                  |   4 +-
 kernel/debug/debug_core.c                     |   2 +-
 kernel/events/core.c                          |  10 +-
 kernel/irq_work.c                             |   4 +-
 kernel/profile.c                              |   4 +-
 kernel/rcu/rcutorture.c                       |   3 +-
 kernel/rcu/tasks.h                            |   4 +-
 kernel/rcu/tree.c                             |   6 +-
 kernel/rcu/tree_exp.h                         |   4 +-
 kernel/relay.c                                |   5 +-
 kernel/scftorture.c                           |  13 +-
 kernel/sched/core.c                           |   4 +-
 kernel/sched/fair.c                           |   2 +-
 kernel/sched/membarrier.c                     |  14 +-
 kernel/smp.c                                  | 633 ++++++++----------
 kernel/time/clockevents.c                     |   2 +-
 kernel/time/clocksource.c                     |   2 +-
 kernel/time/hrtimer.c                         |   6 +-
 kernel/time/tick-common.c                     |   2 +-
 kernel/trace/ftrace.c                         |   6 +-
 kernel/trace/ring_buffer.c                    |   2 +-
 kernel/trace/trace.c                          |  12 +-
 kernel/trace/trace_events.c                   |   2 +-
 kernel/up.c                                   |  56 +-
 mm/kasan/quarantine.c                         |   2 +-
 mm/mmu_gather.c                               |   2 +-
 mm/slab.c                                     |   2 +-
 net/bpf/test_run.c                            |   4 +-
 net/core/dev.c                                |   2 +-
 net/iucv/iucv.c                               |  17 +-
 virt/kvm/kvm_main.c                           |  12 +-
 186 files changed, 945 insertions(+), 1002 deletions(-)
 delete mode 100644 include/linux/smp_types.h

-- 
2.27.0



* [PATCH v2 01/11] smp: consolidate the structure definitions to smp.h
  2022-04-22 20:00 [PATCH v2 00/11] smp: cross CPU call interface Donghai Qiao
@ 2022-04-22 20:00 ` Donghai Qiao
  2022-04-23  4:57   ` kernel test robot
                     ` (2 more replies)
  2022-04-22 20:00 ` [PATCH v2 02/11] smp: define the cross call interface Donghai Qiao
                   ` (10 subsequent siblings)
  11 siblings, 3 replies; 26+ messages in thread
From: Donghai Qiao @ 2022-04-22 20:00 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Move the structure definitions from kernel/smp.c to
include/linux/smp.h.

Move the structure definitions from include/linux/smp_types.h
to include/linux/smp.h and delete smp_types.h.

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
 include/linux/irq_work.h  |   2 +-
 include/linux/smp.h       | 131 ++++++++++++++++++++++++++++++++++++--
 include/linux/smp_types.h |  69 --------------------
 kernel/smp.c              |  65 -------------------
 4 files changed, 128 insertions(+), 139 deletions(-)
 delete mode 100644 include/linux/smp_types.h

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 8cd11a223260..145af67b1cd3 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -2,7 +2,7 @@
 #ifndef _LINUX_IRQ_WORK_H
 #define _LINUX_IRQ_WORK_H
 
-#include <linux/smp_types.h>
+#include <linux/smp.h>
 #include <linux/rcuwait.h>
 
 /*
diff --git a/include/linux/smp.h b/include/linux/smp.h
index a80ab58ae3f1..31811da856a3 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -10,13 +10,74 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/list.h>
+#include <linux/llist.h>
 #include <linux/cpumask.h>
 #include <linux/init.h>
-#include <linux/smp_types.h>
 
 typedef void (*smp_call_func_t)(void *info);
 typedef bool (*smp_cond_func_t)(int cpu, void *info);
 
+enum {
+	CSD_FLAG_LOCK		= 0x01,
+
+	IRQ_WORK_PENDING	= 0x01,
+	IRQ_WORK_BUSY		= 0x02,
+	IRQ_WORK_LAZY		= 0x04, /* No IPI, wait for tick */
+	IRQ_WORK_HARD_IRQ	= 0x08, /* IRQ context on PREEMPT_RT */
+
+	IRQ_WORK_CLAIMED	= (IRQ_WORK_PENDING | IRQ_WORK_BUSY),
+
+	CSD_TYPE_ASYNC		= 0x00,
+	CSD_TYPE_SYNC		= 0x10,
+	CSD_TYPE_IRQ_WORK	= 0x20,
+	CSD_TYPE_TTWU		= 0x30,
+
+	CSD_FLAG_TYPE_MASK	= 0xF0,
+};
+
+/*
+ * struct __call_single_node is the primary type on
+ * smp.c:call_single_queue.
+ *
+ * flush_smp_call_function_queue() only reads the type from
+ * __call_single_node::u_flags as a regular load, the above
+ * (anonymous) enum defines all the bits of this word.
+ *
+ * Other bits are not modified until the type is known.
+ *
+ * CSD_TYPE_SYNC/ASYNC:
+ *	struct {
+ *		struct llist_node node;
+ *		unsigned int flags;
+ *		smp_call_func_t func;
+ *		void *info;
+ *	};
+ *
+ * CSD_TYPE_IRQ_WORK:
+ *	struct {
+ *		struct llist_node node;
+ *		atomic_t flags;
+ *		void (*func)(struct irq_work *);
+ *	};
+ *
+ * CSD_TYPE_TTWU:
+ *	struct {
+ *		struct llist_node node;
+ *		unsigned int flags;
+ *	};
+ *
+ */
+struct __call_single_node {
+	struct llist_node	llist;
+	union {
+		unsigned int	u_flags;
+		atomic_t	a_flags;
+	};
+#ifdef CONFIG_64BIT
+	u16 src, dst;
+#endif
+};
+
 /*
  * structure shares (partial) layout with struct irq_work
  */
@@ -26,13 +87,75 @@ struct __call_single_data {
 	void *info;
 };
 
-#define CSD_INIT(_func, _info) \
-	(struct __call_single_data){ .func = (_func), .info = (_info), }
-
 /* Use __aligned() to avoid to use 2 cache lines for 1 csd */
 typedef struct __call_single_data call_single_data_t
 	__aligned(sizeof(struct __call_single_data));
 
+struct cfd_percpu {
+	call_single_data_t	csd;
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+	u64	seq_queue;
+	u64	seq_ipi;
+	u64	seq_noipi;
+#endif
+};
+
+struct call_function_data {
+	struct cfd_percpu	__percpu *pcpu;
+	cpumask_var_t		cpumask;
+	cpumask_var_t		cpumask_ipi;
+};
+
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+union cfd_seq_cnt {
+	u64		val;
+	struct {
+		u64	src:16;
+		u64	dst:16;
+#define CFD_SEQ_NOCPU	0xffff
+		u64	type:4;
+#define CFD_SEQ_QUEUE	0
+#define CFD_SEQ_IPI	1
+#define CFD_SEQ_NOIPI	2
+#define CFD_SEQ_PING	3
+#define CFD_SEQ_PINGED	4
+#define CFD_SEQ_HANDLE	5
+#define CFD_SEQ_DEQUEUE	6
+#define CFD_SEQ_IDLE	7
+#define CFD_SEQ_GOTIPI	8
+#define CFD_SEQ_HDLEND	9
+		u64	cnt:28;
+	}		u;
+};
+
+static char *seq_type[] = {
+	[CFD_SEQ_QUEUE]		= "queue",
+	[CFD_SEQ_IPI]		= "ipi",
+	[CFD_SEQ_NOIPI]		= "noipi",
+	[CFD_SEQ_PING]		= "ping",
+	[CFD_SEQ_PINGED]	= "pinged",
+	[CFD_SEQ_HANDLE]	= "handle",
+	[CFD_SEQ_DEQUEUE]	= "dequeue (src CPU 0 == empty)",
+	[CFD_SEQ_IDLE]		= "idle",
+	[CFD_SEQ_GOTIPI]	= "gotipi",
+	[CFD_SEQ_HDLEND]	= "hdlend (src CPU 0 == early)",
+};
+
+struct cfd_seq_local {
+	u64	ping;
+	u64	pinged;
+	u64	handle;
+	u64	dequeue;
+	u64	idle;
+	u64	gotipi;
+	u64	hdlend;
+};
+#endif
+
+#define CSD_TYPE(_csd)	((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
+#define CSD_INIT(_func, _info) \
+	((struct __call_single_data){ .func = (_func), .info = (_info), })
+
 #define INIT_CSD(_csd, _func, _info)		\
 do {						\
 	*(_csd) = CSD_INIT((_func), (_info));	\
diff --git a/include/linux/smp_types.h b/include/linux/smp_types.h
deleted file mode 100644
index 2e8461af8df6..000000000000
--- a/include/linux/smp_types.h
+++ /dev/null
@@ -1,69 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __LINUX_SMP_TYPES_H
-#define __LINUX_SMP_TYPES_H
-
-#include <linux/llist.h>
-
-enum {
-	CSD_FLAG_LOCK		= 0x01,
-
-	IRQ_WORK_PENDING	= 0x01,
-	IRQ_WORK_BUSY		= 0x02,
-	IRQ_WORK_LAZY		= 0x04, /* No IPI, wait for tick */
-	IRQ_WORK_HARD_IRQ	= 0x08, /* IRQ context on PREEMPT_RT */
-
-	IRQ_WORK_CLAIMED	= (IRQ_WORK_PENDING | IRQ_WORK_BUSY),
-
-	CSD_TYPE_ASYNC		= 0x00,
-	CSD_TYPE_SYNC		= 0x10,
-	CSD_TYPE_IRQ_WORK	= 0x20,
-	CSD_TYPE_TTWU		= 0x30,
-
-	CSD_FLAG_TYPE_MASK	= 0xF0,
-};
-
-/*
- * struct __call_single_node is the primary type on
- * smp.c:call_single_queue.
- *
- * flush_smp_call_function_queue() only reads the type from
- * __call_single_node::u_flags as a regular load, the above
- * (anonymous) enum defines all the bits of this word.
- *
- * Other bits are not modified until the type is known.
- *
- * CSD_TYPE_SYNC/ASYNC:
- *	struct {
- *		struct llist_node node;
- *		unsigned int flags;
- *		smp_call_func_t func;
- *		void *info;
- *	};
- *
- * CSD_TYPE_IRQ_WORK:
- *	struct {
- *		struct llist_node node;
- *		atomic_t flags;
- *		void (*func)(struct irq_work *);
- *	};
- *
- * CSD_TYPE_TTWU:
- *	struct {
- *		struct llist_node node;
- *		unsigned int flags;
- *	};
- *
- */
-
-struct __call_single_node {
-	struct llist_node	llist;
-	union {
-		unsigned int	u_flags;
-		atomic_t	a_flags;
-	};
-#ifdef CONFIG_64BIT
-	u16 src, dst;
-#endif
-};
-
-#endif /* __LINUX_SMP_TYPES_H */
diff --git a/kernel/smp.c b/kernel/smp.c
index 01a7c1706a58..b2b3878f0330 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -29,73 +29,8 @@
 #include "smpboot.h"
 #include "sched/smp.h"
 
-#define CSD_TYPE(_csd)	((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
-
-#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
-union cfd_seq_cnt {
-	u64		val;
-	struct {
-		u64	src:16;
-		u64	dst:16;
-#define CFD_SEQ_NOCPU	0xffff
-		u64	type:4;
-#define CFD_SEQ_QUEUE	0
-#define CFD_SEQ_IPI	1
-#define CFD_SEQ_NOIPI	2
-#define CFD_SEQ_PING	3
-#define CFD_SEQ_PINGED	4
-#define CFD_SEQ_HANDLE	5
-#define CFD_SEQ_DEQUEUE	6
-#define CFD_SEQ_IDLE	7
-#define CFD_SEQ_GOTIPI	8
-#define CFD_SEQ_HDLEND	9
-		u64	cnt:28;
-	}		u;
-};
-
-static char *seq_type[] = {
-	[CFD_SEQ_QUEUE]		= "queue",
-	[CFD_SEQ_IPI]		= "ipi",
-	[CFD_SEQ_NOIPI]		= "noipi",
-	[CFD_SEQ_PING]		= "ping",
-	[CFD_SEQ_PINGED]	= "pinged",
-	[CFD_SEQ_HANDLE]	= "handle",
-	[CFD_SEQ_DEQUEUE]	= "dequeue (src CPU 0 == empty)",
-	[CFD_SEQ_IDLE]		= "idle",
-	[CFD_SEQ_GOTIPI]	= "gotipi",
-	[CFD_SEQ_HDLEND]	= "hdlend (src CPU 0 == early)",
-};
-
-struct cfd_seq_local {
-	u64	ping;
-	u64	pinged;
-	u64	handle;
-	u64	dequeue;
-	u64	idle;
-	u64	gotipi;
-	u64	hdlend;
-};
-#endif
-
-struct cfd_percpu {
-	call_single_data_t	csd;
-#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
-	u64	seq_queue;
-	u64	seq_ipi;
-	u64	seq_noipi;
-#endif
-};
-
-struct call_function_data {
-	struct cfd_percpu	__percpu *pcpu;
-	cpumask_var_t		cpumask;
-	cpumask_var_t		cpumask_ipi;
-};
-
 static DEFINE_PER_CPU_ALIGNED(struct call_function_data, cfd_data);
-
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);
-
 static void flush_smp_call_function_queue(bool warn_cpu_offline);
 
 int smpcfd_prepare_cpu(unsigned int cpu)
-- 
2.27.0



* [PATCH v2 02/11] smp: define the cross call interface
  2022-04-22 20:00 [PATCH v2 00/11] smp: cross CPU call interface Donghai Qiao
  2022-04-22 20:00 ` [PATCH v2 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
@ 2022-04-22 20:00 ` Donghai Qiao
  2022-04-25  9:05   ` Peter Zijlstra
  2022-04-22 20:00 ` [PATCH v2 03/11] smp: eliminate SCF_WAIT and SCF_RUN_LOCAL Donghai Qiao
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Donghai Qiao @ 2022-04-22 20:00 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

The functions of the cross CPU call interface are defined below:

int smp_call(int cpu, smp_call_func_t func, void *info,
		unsigned int flags)

int smp_call_cond(int cpu, smp_call_func_t func, void *info,
		smp_cond_func_t condf, unsigned int flags)

void smp_call_mask(const struct cpumask *mask, smp_call_func_t func,
		void *info, unsigned int flags)

void smp_call_mask_cond(const struct cpumask *mask, smp_call_func_t func,
		void *info, smp_cond_func_t condf, unsigned int flags)

int smp_call_private(int cpu, call_single_data_t *csd, unsigned int flags)

int smp_call_any(const struct cpumask *mask, smp_call_func_t func,
		void *info, unsigned int flags)

The motivation for submitting this patch set is to turn the
existing cross CPU call mechanism into a more formal interface
that is friendlier to kernel developers.

Basically, the minimal set of functions above can satisfy any demand
for cross CPU calls from kernel consumers. For the sake of
simplicity, self-explanatory naming, less code redundancy, and no
ambiguity, the functions in this interface have been renamed,
simplified, or eliminated, but they still inherit the same semantics
and parameter lists from their previous versions.
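
As an illustration, a conditional broadcast with this interface could
look like the sketch below (hypothetical consumer code; idle_cpu() is
the existing scheduler helper):

#include <linux/sched.h>
#include <linux/smp.h>

static bool cpu_is_idle(int cpu, void *info)
{
        return idle_cpu(cpu);
}

static void kick(void *info)
{
}

static void kick_idle_cpus(void)
{
        /* kick() is sent only to the online CPUs that pass the condition */
        smp_call_cond(SMP_CALL_ALL, kick, NULL, cpu_is_idle, SMP_CALL_TYPE_ASYNC);
}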

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: removed 'x' from the function names and changed XCALL to SMP_CALL in the new macros

 include/linux/smp.h |  30 +++++++++
 kernel/smp.c        | 156 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 186 insertions(+)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 31811da856a3..bee1e6b5b2fd 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -161,6 +161,36 @@ do {						\
 	*(_csd) = CSD_INIT((_func), (_info));	\
 } while (0)
 
+
+/*
+ * smp_call Interface.
+ *
+ * Also see kernel/smp.c for the details.
+ */
+#define	SMP_CALL_TYPE_SYNC		CSD_TYPE_SYNC
+#define	SMP_CALL_TYPE_ASYNC	CSD_TYPE_ASYNC
+#define	SMP_CALL_TYPE_IRQ_WORK	CSD_TYPE_IRQ_WORK
+#define	SMP_CALL_TYPE_TTWU		CSD_TYPE_TTWU
+#define	SMP_CALL_TYPE_MASK		CSD_FLAG_TYPE_MASK
+
+#define	SMP_CALL_ALL		-1
+
+extern int smp_call(int cpu, smp_call_func_t func, void *info, unsigned int flags);
+
+extern int smp_call_cond(int cpu, smp_call_func_t func, void *info,
+		smp_cond_func_t condf, unsigned int flags);
+
+extern void smp_call_mask(const struct cpumask *mask, smp_call_func_t func,
+		void *info, unsigned int flags);
+
+extern void smp_call_mask_cond(const struct cpumask *mask, smp_call_func_t func,
+		void *info, smp_cond_func_t condf, unsigned int flags);
+
+extern int smp_call_private(int cpu, call_single_data_t *csd, unsigned int flags);
+
+extern int smp_call_any(const struct cpumask *mask, smp_call_func_t func,
+		void *info, unsigned int flags);
+
 /*
  * Enqueue a llist_node on the call_single_queue; be very careful, read
  * flush_smp_call_function_queue() in detail.
diff --git a/kernel/smp.c b/kernel/smp.c
index b2b3878f0330..b1e6a8f77a9e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -1170,3 +1170,159 @@ int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
 	return sscs.ret;
 }
 EXPORT_SYMBOL_GPL(smp_call_on_cpu);
+
+
+void __smp_call_mask_cond(const struct cpumask *mask,
+		smp_call_func_t func, void *info,
+		smp_cond_func_t cond_func,
+		unsigned int flags)
+{
+}
+
+/*
+ * smp_call Interface
+ *
+ * This consolidates the historical cross CPU call usages.
+ *
+ * Normally this interface cannot be used with interrupts disabled or
+ * from a hardware interrupt handler or from a bottom half handler.
+ * But there are two exceptions:
+ * 1) It can be used during early boot while early_boot_irqs_disabled
+ *    is set. In this scenario, you should use local_irq_save/restore()
+ *    instead of local_irq_disable/enable().
+ * 2) Because smp_call_private(cpu, csd, SMP_CALL_TYPE_ASYNC) is an
+ *    asynchronous call with a preallocated csd structure, it can be
+ *    called from contexts where interrupts are disabled.
+ */
+
+/*
+ * Parameters:
+ *
+ * cpu: If cpu >=0 && cpu < nr_cpu_ids, the cross call is for that cpu.
+ *      If cpu == -1, the cross call is for all the online CPUs
+ *
+ * func: It is the cross function that the destination CPUs need to execute.
+ *       This function must be fast and non-blocking.
+ *
+ * info: It is the parameter to func().
+ *
+ * flags: The flags specify whether the cross call is performed
+ *	  synchronously or asynchronously.
+ *
+ *	  A synchronous cross call will not return until all the
+ *	  destination CPUs have executed func() and responded to the call.
+ *
+ *	  An asynchronous cross call returns as soon as it has fired
+ *	  all the cross calls and run func() locally if needed,
+ *	  regardless of the status of the target CPUs.
+ *
+ * Return: %0 on success or negative errno value on error.
+ */
+int smp_call(int cpu, smp_call_func_t func, void *info, unsigned int flags)
+{
+	return smp_call_cond(cpu, func, info, NULL, flags);
+}
+EXPORT_SYMBOL(smp_call);
+
+/*
+ * Parameters:
+ *
+ * cond_func: This is a condition function cond_func(cpu, info) invoked by
+ *	      the underlying cross call mechanism only. If the return value
+ *	      from cond_func(cpu, info) is true, the cross call will be sent
+ *	      to that cpu, otherwise the call will not be sent.
+ *
+ * Others: see smp_call().
+ *
+ * Return: %0 on success or negative errno value on error.
+ */
+int smp_call_cond(int cpu, smp_call_func_t func, void *info,
+		    smp_cond_func_t cond_func, unsigned int flags)
+{
+	preempt_disable();
+	if (cpu == SMP_CALL_ALL) {
+		__smp_call_mask_cond(cpu_online_mask, func, info, cond_func, flags);
+	} else if ((unsigned int)cpu < nr_cpu_ids)
+		__smp_call_mask_cond(cpumask_of(cpu), func, info, cond_func, flags);
+	else {
+		preempt_enable();
+		pr_warn("Invalid cpu ID = %d\n", cpu);
+		return -ENXIO;
+	}
+	preempt_enable();
+	return 0;
+}
+EXPORT_SYMBOL(smp_call_cond);
+
+/*
+ * Parameters:
+ *
+ * mask: This is the bitmap of CPUs to which the cross call will be sent.
+ *
+ * Others: see smp_call().
+ */
+void smp_call_mask(const struct cpumask *mask, smp_call_func_t func,
+		void *info, unsigned int flags)
+{
+	preempt_disable();
+	__smp_call_mask_cond(mask, func, info, NULL, flags);
+	preempt_enable();
+}
+EXPORT_SYMBOL(smp_call_mask);
+
+/*
+ * The combination of smp_call_cond() and smp_call_mask()
+ */
+void smp_call_mask_cond(const struct cpumask *mask,
+		smp_call_func_t func, void *info,
+		smp_cond_func_t cond_func,
+		unsigned int flags)
+{
+	preempt_disable();
+	__smp_call_mask_cond(mask, func, info, cond_func, flags);
+	preempt_enable();
+}
+EXPORT_SYMBOL(smp_call_mask_cond);
+
+/*
+ * This function provides an alternative way of sending a cross call to
+ * only one CPU with a private csd instead of using the csd resource of
+ * the call. But it is the caller's responsibility to set up and maintain
+ * its private call_single_data_t structure.
+ *
+ * Because the call is asynchronous with a preallocated csd structure,
+ * it can be called from contexts with interrupts disabled.
+ *
+ * Parameters
+ *
+ * cpu:   Must be a non-negative value less than nr_cpu_ids.
+ * csd:   The private csd provided by the caller.
+ *
+ * Others: see smp_call().
+ */
+int smp_call_private(int cpu, call_single_data_t *csd, unsigned int flags)
+{
+	return 0;
+}
+EXPORT_SYMBOL(smp_call_private);
+
+/*
+ * Parameters:
+ *
+ * mask:  Run func() on one of the given CPUs in mask if it is online.
+ *        CPU selection preference (from the original comments for
+ *        smp_call_function_any()) :
+ *          1) current cpu if in @mask
+ *          2) any cpu of current node if in @mask
+ *          3) any other online cpu in @mask
+ *
+ * Others, see smp_call().
+ *
+ * Returns 0 on success, else a negative status code (if no cpus were online).
+ */
+int smp_call_any(const struct cpumask *mask, smp_call_func_t func,
+		void *info, unsigned int flags)
+{
+	return 0;
+}
+EXPORT_SYMBOL(smp_call_any);
-- 
2.27.0



* [PATCH v2 03/11] smp: eliminate SCF_WAIT and SCF_RUN_LOCAL
  2022-04-22 20:00 [PATCH v2 00/11] smp: cross CPU call interface Donghai Qiao
  2022-04-22 20:00 ` [PATCH v2 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
  2022-04-22 20:00 ` [PATCH v2 02/11] smp: define the cross call interface Donghai Qiao
@ 2022-04-22 20:00 ` Donghai Qiao
  2022-04-25  9:10   ` Peter Zijlstra
  2022-04-22 20:00 ` [PATCH v2 04/11] smp: replace smp_call_function_single() with smp_call() Donghai Qiao
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Donghai Qiao @ 2022-04-22 20:00 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Commit a32a4d8a815c ("smp: Run functions concurrently in
smp_call_function_many_cond()") improved the concurrency of the
cross call execution between the local and remote CPUs. The change
in smp_call_function_many_cond() did what was intended, but the new
macros SCF_WAIT and SCF_RUN_LOCAL and the code around them for
handling the local call became unnecessary, because the modified
smp_call_function_many_cond() was able to handle the local cross
call by itself. So these two macros can be eliminated and the code
implemented around them can be removed as well.

Also, this patch fixed an issue with a comparison between a signed
integer and an unsigned integer in smp_call_function_many_cond().

The changes in this patch, together with those made in subsequent
patches, will eventually help eliminate the set of on_each_cpu*
functions.

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: removed 'x' from the function names and changed XCALL to SMP_CALL in the new macros

 kernel/smp.c | 32 +++++++-------------------------
 1 file changed, 7 insertions(+), 25 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index b1e6a8f77a9e..22e8f2bd770b 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -787,23 +787,13 @@ int smp_call_function_any(const struct cpumask *mask,
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
 
-/*
- * Flags to be used as scf_flags argument of smp_call_function_many_cond().
- *
- * %SCF_WAIT:		Wait until function execution is completed
- * %SCF_RUN_LOCAL:	Run also locally if local cpu is set in cpumask
- */
-#define SCF_WAIT	(1U << 0)
-#define SCF_RUN_LOCAL	(1U << 1)
-
 static void smp_call_function_many_cond(const struct cpumask *mask,
 					smp_call_func_t func, void *info,
-					unsigned int scf_flags,
+					bool wait,
 					smp_cond_func_t cond_func)
 {
 	int cpu, last_cpu, this_cpu = smp_processor_id();
 	struct call_function_data *cfd;
-	bool wait = scf_flags & SCF_WAIT;
 	bool run_remote = false;
 	bool run_local = false;
 	int nr_cpus = 0;
@@ -829,14 +819,14 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	WARN_ON_ONCE(!in_task());
 
 	/* Check if we need local execution. */
-	if ((scf_flags & SCF_RUN_LOCAL) && cpumask_test_cpu(this_cpu, mask))
+	if (cpumask_test_cpu(this_cpu, mask))
 		run_local = true;
 
 	/* Check if we need remote execution, i.e., any CPU excluding this one. */
 	cpu = cpumask_first_and(mask, cpu_online_mask);
 	if (cpu == this_cpu)
 		cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
-	if (cpu < nr_cpu_ids)
+	if ((unsigned int)cpu < nr_cpu_ids)
 		run_remote = true;
 
 	if (run_remote) {
@@ -911,12 +901,8 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
  * @mask: The set of cpus to run on (only runs on online subset).
  * @func: The function to run. This must be fast and non-blocking.
  * @info: An arbitrary pointer to pass to the function.
- * @wait: Bitmask that controls the operation. If %SCF_WAIT is set, wait
- *        (atomically) until function has completed on other CPUs. If
- *        %SCF_RUN_LOCAL is set, the function will also be run locally
- *        if the local CPU is set in the @cpumask.
- *
- * If @wait is true, then returns once @func has returned.
+ * @wait: If wait is true, the call will not return until func()
+ *        has completed on other CPUs.
  *
  * You must not call this function with disabled interrupts or from a
  * hardware interrupt handler or from a bottom half handler. Preemption
@@ -925,7 +911,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 void smp_call_function_many(const struct cpumask *mask,
 			    smp_call_func_t func, void *info, bool wait)
 {
-	smp_call_function_many_cond(mask, func, info, wait * SCF_WAIT, NULL);
+	smp_call_function_many_cond(mask, func, info, wait, NULL);
 }
 EXPORT_SYMBOL(smp_call_function_many);
 
@@ -1061,13 +1047,9 @@ void __init smp_init(void)
 void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
 			   void *info, bool wait, const struct cpumask *mask)
 {
-	unsigned int scf_flags = SCF_RUN_LOCAL;
-
-	if (wait)
-		scf_flags |= SCF_WAIT;
 
 	preempt_disable();
-	smp_call_function_many_cond(mask, func, info, scf_flags, cond_func);
+	smp_call_function_many_cond(mask, func, info, wait, cond_func);
 	preempt_enable();
 }
 EXPORT_SYMBOL(on_each_cpu_cond_mask);
-- 
2.27.0



* [PATCH v2 04/11] smp: replace smp_call_function_single() with smp_call()
  2022-04-22 20:00 [PATCH v2 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (2 preceding siblings ...)
  2022-04-22 20:00 ` [PATCH v2 03/11] smp: eliminate SCF_WAIT and SCF_RUN_LOCAL Donghai Qiao
@ 2022-04-22 20:00 ` Donghai Qiao
  2022-04-25  9:33   ` Peter Zijlstra
  2022-04-22 20:00 ` [PATCH v2 05/11] smp: replace smp_call_function_single_async() with smp_call_private() Donghai Qiao
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Donghai Qiao @ 2022-04-22 20:00 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Eliminated the percpu global csd_data and temporarily hooked
smp_call_function_single() up to smp_call().

There is no obvious reason or evidence that differentiating a cross
call with a single recipient from one with multiple recipients
yields a noticeable performance gain. If anything can be optimized
in this area, it would be at the interrupt level, which is already
addressed by arch_send_call_function_single_ipi() and
arch_send_call_function_ipi_mask(). In fact, both are already taken
into account by smp_call_function_many_cond().

So, it is appropriate to make this change as part of the cross
call interface.

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: removed 'x' from the function names and changed XCALL to SMP_CALL in the new macros

 kernel/smp.c | 74 ++++++++++++++++++----------------------------------
 1 file changed, 25 insertions(+), 49 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 22e8f2bd770b..8998b97d5f72 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -399,8 +399,6 @@ static __always_inline void csd_unlock(struct __call_single_data *csd)
 	smp_store_release(&csd->node.u_flags, 0);
 }
 
-static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
-
 void __smp_call_single_queue(int cpu, struct llist_node *node)
 {
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
@@ -634,6 +632,9 @@ void flush_smp_call_function_from_idle(void)
 }
 
 /*
+ * This is a temporary hook-up. This function will be eliminated
+ * by the last patch in this series.
+ *
  * smp_call_function_single - Run a function on a specific CPU
  * @func: The function to run. This must be fast and non-blocking.
  * @info: An arbitrary pointer to pass to the function.
@@ -642,59 +643,21 @@ void flush_smp_call_function_from_idle(void)
  * Returns 0 on success, else a negative status code.
  */
 int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
-			     int wait)
+			int wait)
 {
-	call_single_data_t *csd;
-	call_single_data_t csd_stack = {
-		.node = { .u_flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC, },
-	};
-	int this_cpu;
-	int err;
-
-	/*
-	 * prevent preemption and reschedule on another processor,
-	 * as well as CPU removal
-	 */
-	this_cpu = get_cpu();
-
-	/*
-	 * Can deadlock when called with interrupts disabled.
-	 * We allow cpu's that are not yet online though, as no one else can
-	 * send smp call function interrupt to this cpu and as such deadlocks
-	 * can't happen.
-	 */
-	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
-		     && !oops_in_progress);
-
-	/*
-	 * When @wait we can deadlock when we interrupt between llist_add() and
-	 * arch_send_call_function_ipi*(); when !@wait we can deadlock due to
-	 * csd_lock() on because the interrupt context uses the same csd
-	 * storage.
-	 */
-	WARN_ON_ONCE(!in_task());
-
-	csd = &csd_stack;
-	if (!wait) {
-		csd = this_cpu_ptr(&csd_data);
-		csd_lock(csd);
-	}
-
-	csd->func = func;
-	csd->info = info;
-#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
-	csd->node.src = smp_processor_id();
-	csd->node.dst = cpu;
-#endif
+	unsigned int flags = 0;
 
-	err = generic_exec_single(cpu, csd);
+	if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu))
+		return -ENXIO;
 
 	if (wait)
-		csd_lock_wait(csd);
+		flags = SMP_CALL_TYPE_SYNC;
+	else
+		flags = SMP_CALL_TYPE_ASYNC;
 
-	put_cpu();
+	smp_call(cpu, func, info, flags);
 
-	return err;
+	return 0;
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
@@ -1159,6 +1122,19 @@ void __smp_call_mask_cond(const struct cpumask *mask,
 		smp_cond_func_t cond_func,
 		unsigned int flags)
 {
+	bool wait = false;
+
+	if (flags == SMP_CALL_TYPE_SYNC)
+		wait = true;
+
+	preempt_disable();
+
+	/*
+	 * This is a temporary hook. The function smp_call_function_many_cond()
+	 * will be inlined here by the last patch in this series.
+	 */
+	smp_call_function_many_cond(mask, func, info, wait, cond_func);
+	preempt_enable();
 }
 
 /*
-- 
2.27.0



* [PATCH v2 05/11] smp: replace smp_call_function_single_async() with  smp_call_private()
  2022-04-22 20:00 [PATCH v2 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (3 preceding siblings ...)
  2022-04-22 20:00 ` [PATCH v2 04/11] smp: replace smp_call_function_single() with smp_call() Donghai Qiao
@ 2022-04-22 20:00 ` Donghai Qiao
  2022-04-23  7:30   ` kernel test robot
  2022-04-25  9:35   ` Peter Zijlstra
  2022-04-22 20:00 ` [PATCH v2 06/11] smp: use smp_call_private() from irq_work.c and core.c Donghai Qiao
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 26+ messages in thread
From: Donghai Qiao @ 2022-04-22 20:00 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Replaced smp_call_function_single_async() with smp_call_private()
and also extended smp_call_private() to support synchronous
single-CPU calls with preallocated csd structures.

Ideally, the new interface smp_call() should be able to do what
smp_call_function_single_async() does. Because the csd is provided
and maintained by the callers, it exposes the risk of corrupting
the call_single_queue[cpu] linked list if the clients manipulate
their csd inappropriately. On the other hand, there should be no
noticeable performance advantage in providing a preallocated csd
for cross call kernel consumers. Thus, in the long run, the
consumers should be changed to not use this type of preallocated
csd; a serialization sketch follows.
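
As an illustration of the serialization rule, a client reusing one
private csd might look like this sketch (hypothetical code; INIT_CSD()
comes from patch 1):

#include <linux/smp.h>

static void my_func(void *info)
{
}

static call_single_data_t my_csd;       /* at most one call in flight */

static void my_client_init(void)
{
        INIT_CSD(&my_csd, my_func, NULL);       /* set up once */
}

static int my_client_fire(int cpu)
{
        /* returns -EBUSY while the previous call is still pending */
        return smp_call_private(cpu, &my_csd, SMP_CALL_TYPE_ASYNC);
}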

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: removed 'x' from the function names and changed XCALL to SMP_CALL in the new macros

 include/linux/smp.h |   3 +-
 kernel/smp.c        | 163 +++++++++++++++++++++-----------------------
 2 files changed, 81 insertions(+), 85 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index bee1e6b5b2fd..0301faf270bf 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -206,7 +206,8 @@ int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
 			   void *info, bool wait, const struct cpumask *mask);
 
-int smp_call_function_single_async(int cpu, struct __call_single_data *csd);
+#define	smp_call_function_single_async(cpu, csd) \
+	smp_call_private(cpu, csd, SMP_CALL_TYPE_ASYNC)
 
 /*
  * Cpus stopping functions in panic. All have default weak definitions.
diff --git a/kernel/smp.c b/kernel/smp.c
index 8998b97d5f72..51715633b4f7 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -429,41 +429,6 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
 		send_call_function_single_ipi(cpu);
 }
 
-/*
- * Insert a previously allocated call_single_data_t element
- * for execution on the given CPU. data must already have
- * ->func, ->info, and ->flags set.
- */
-static int generic_exec_single(int cpu, struct __call_single_data *csd)
-{
-	if (cpu == smp_processor_id()) {
-		smp_call_func_t func = csd->func;
-		void *info = csd->info;
-		unsigned long flags;
-
-		/*
-		 * We can unlock early even for the synchronous on-stack case,
-		 * since we're doing this from the same CPU..
-		 */
-		csd_lock_record(csd);
-		csd_unlock(csd);
-		local_irq_save(flags);
-		func(info);
-		csd_lock_record(NULL);
-		local_irq_restore(flags);
-		return 0;
-	}
-
-	if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
-		csd_unlock(csd);
-		return -ENXIO;
-	}
-
-	__smp_call_single_queue(cpu, &csd->node.llist);
-
-	return 0;
-}
-
 /**
  * generic_smp_call_function_single_interrupt - Execute SMP IPI callbacks
  *
@@ -661,52 +626,6 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
-/**
- * smp_call_function_single_async() - Run an asynchronous function on a
- * 			         specific CPU.
- * @cpu: The CPU to run on.
- * @csd: Pre-allocated and setup data structure
- *
- * Like smp_call_function_single(), but the call is asynchonous and
- * can thus be done from contexts with disabled interrupts.
- *
- * The caller passes his own pre-allocated data structure
- * (ie: embedded in an object) and is responsible for synchronizing it
- * such that the IPIs performed on the @csd are strictly serialized.
- *
- * If the function is called with one csd which has not yet been
- * processed by previous call to smp_call_function_single_async(), the
- * function will return immediately with -EBUSY showing that the csd
- * object is still in progress.
- *
- * NOTE: Be careful, there is unfortunately no current debugging facility to
- * validate the correctness of this serialization.
- *
- * Return: %0 on success or negative errno value on error
- */
-int smp_call_function_single_async(int cpu, struct __call_single_data *csd)
-{
-	int err = 0;
-
-	preempt_disable();
-
-	if (csd->node.u_flags & CSD_FLAG_LOCK) {
-		err = -EBUSY;
-		goto out;
-	}
-
-	csd->node.u_flags = CSD_FLAG_LOCK;
-	smp_wmb();
-
-	err = generic_exec_single(cpu, csd);
-
-out:
-	preempt_enable();
-
-	return err;
-}
-EXPORT_SYMBOL_GPL(smp_call_function_single_async);
-
 /*
  * smp_call_function_any - Run a function on any of the given cpus
  * @mask: The mask of cpus it can run on.
@@ -1251,16 +1170,92 @@ EXPORT_SYMBOL(smp_call_mask_cond);
 * Because the call is asynchronous with a preallocated csd structure,
 * it can be called from contexts with interrupts disabled.
  *
- * Parameters
+ * Ideally this functionality should be part of smp_call_mask_cond().
+ * Because the csd is provided and maintained by the callers, merging this
+ * functionality into smp_call_mask_cond() will result in some extra
+ * complications in it. Until there is a better way to facilitate all
+ * kinds of calls, let's still handle this case with a separate function.
+ *
+ * The bit CSD_FLAG_LOCK will be set in csd->node.u_flags only if the
+ * call is made as type CSD_TYPE_SYNC or CSD_TYPE_ASYNC.
  *
+ * Parameters:
 * cpu:   Must be a non-negative value less than nr_cpu_ids.
  * csd:   The private csd provided by the caller.
- *
  * Others: see smp_call().
+ *
+ * Return: %0 on success or negative errno value on error.
+ *
+ * The following comments are from smp_call_function_single_async():
+ *
+ *    The call is asynchronous and can thus be done from contexts with
+ *    disabled interrupts. If the function is called with one csd which
+ *    has not yet been processed by a previous call, the function will
+ *    return immediately with -EBUSY showing that the csd object is
+ *    still in progress.
+ *
+ *    NOTE: Be careful, there is unfortunately no current debugging
+ *    facility to validate the correctness of this serialization.
  */
 int smp_call_private(int cpu, call_single_data_t *csd, unsigned int flags)
 {
-	return 0;
+	int err = 0;
+
+	if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
+		pr_warn("cpu ID must be a positive number < nr_cpu_ids and must be currently online\n");
+		return -EINVAL;
+	}
+
+	if (csd == NULL) {
+		pr_warn("csd must not be NULL\n");
+		return -EINVAL;
+	}
+
+	preempt_disable();
+	if (csd->node.u_flags & CSD_FLAG_LOCK) {
+		err = -EBUSY;
+		goto out;
+	}
+
+	/*
+	 * CSD_FLAG_LOCK is set for CSD_TYPE_SYNC or CSD_TYPE_ASYNC only.
+	 */
+	if ((flags & ~(CSD_TYPE_SYNC | CSD_TYPE_ASYNC)) == 0)
+		csd->node.u_flags = CSD_FLAG_LOCK | flags;
+	else
+		csd->node.u_flags = flags;
+
+	if (cpu == smp_processor_id()) {
+		smp_call_func_t func = csd->func;
+		void *info = csd->info;
+		unsigned long flags;
+
+		/*
+		 * We can unlock early even for the synchronous on-stack case,
+		 * since we're doing this from the same CPU..
+		 */
+		csd_lock_record(csd);
+		csd_unlock(csd);
+		local_irq_save(flags);
+		func(info);
+		csd_lock_record(NULL);
+		local_irq_restore(flags);
+		goto out;
+	}
+
+	/*
+	 * Ensure the flags are visible before the csd
+	 * goes to the queue.
+	 */
+	smp_wmb();
+
+	__smp_call_single_queue(cpu, &csd->node.llist);
+
+	if (flags & CSD_TYPE_SYNC)
+		csd_lock_wait(csd);
+out:
+	preempt_enable();
+	return err;
 }
 EXPORT_SYMBOL(smp_call_private);
 
-- 
2.27.0



* [PATCH v2 06/11] smp: use smp_call_private() from irq_work.c and core.c
  2022-04-22 20:00 [PATCH v2 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (4 preceding siblings ...)
  2022-04-22 20:00 ` [PATCH v2 05/11] smp: replace smp_call_function_single_async() with smp_call_private() Donghai Qiao
@ 2022-04-22 20:00 ` Donghai Qiao
  2022-04-25  9:37   ` Peter Zijlstra
  2022-04-22 20:00 ` [PATCH v2 07/11] smp: change smp_call_function_any() to smp_call_any() Donghai Qiao
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Donghai Qiao @ 2022-04-22 20:00 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

irq_work.c and core.c should use the cross call interface rather
than the unpublished internal function __smp_call_single_queue().

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: removed 'x' from the function names and changed XCALL to SMP_CALL in the new macros

 kernel/irq_work.c   | 4 ++--
 kernel/sched/core.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index f7df715ec28e..0d17c9d03d03 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -159,8 +159,8 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
 			if (!irq_work_claim(work))
 				goto out;
 		}
-
-		__smp_call_single_queue(cpu, &work->node.llist);
+		smp_call_private(cpu, (call_single_data_t *)&work->node.llist,
+					SMP_CALL_TYPE_IRQ_WORK);
 	} else {
 		__irq_work_queue_local(work);
 	}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 51efaabac3e4..4fda3dfb887b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3780,7 +3780,7 @@ static void __ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags
 	p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);
 
 	WRITE_ONCE(rq->ttwu_pending, 1);
-	__smp_call_single_queue(cpu, &p->wake_entry.llist);
+	smp_call_private(cpu, (call_single_data_t *)&p->wake_entry.llist, SMP_CALL_TYPE_TTWU);
 }
 
 void wake_up_if_idle(int cpu)
-- 
2.27.0



* [PATCH v2 07/11] smp: change smp_call_function_any() to smp_call_any()
  2022-04-22 20:00 [PATCH v2 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (5 preceding siblings ...)
  2022-04-22 20:00 ` [PATCH v2 06/11] smp: use smp_call_private() from irq_work.c and core.c Donghai Qiao
@ 2022-04-22 20:00 ` Donghai Qiao
  2022-04-22 20:00 ` [PATCH v2 08/11] smp: replace smp_call_function_many_cond() with __smp_call_mask_cond() Donghai Qiao
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Donghai Qiao @ 2022-04-22 20:00 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Rename smp_call_function_any() to smp_call_any() and make the
necessary accompanying changes.

Replace all invocations of smp_call_function_any() with
smp_call_any().

Kernel consumers can actually use smp_call() where they would use
smp_call_function_any(). The extra logic handled by
smp_call_function_any() should be moved out of it, letting the
consumers choose the preferred CPU themselves. But because quite a
few cross call consumers need to run their functions on just one
CPU of a given CPU set, there is some advantage to adding
smp_call_any() to the interface.

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: removed 'x' from the function names and changed XCALL to SMP_CALL in the new macros

 arch/arm/kernel/perf_event_v7.c           |  6 +-
 arch/arm64/kernel/perf_event.c            |  6 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  2 +-
 drivers/cpufreq/acpi-cpufreq.c            |  4 +-
 drivers/cpufreq/powernv-cpufreq.c         | 12 ++--
 drivers/perf/arm_spe_pmu.c                |  2 +-
 include/linux/smp.h                       | 12 +---
 kernel/smp.c                              | 78 ++++++++++-------------
 8 files changed, 53 insertions(+), 69 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index eb2190477da1..ae008cba28c4 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1192,9 +1192,9 @@ static void armv7_read_num_pmnc_events(void *info)
 
 static int armv7_probe_num_events(struct arm_pmu *arm_pmu)
 {
-	return smp_call_function_any(&arm_pmu->supported_cpus,
-				     armv7_read_num_pmnc_events,
-				     &arm_pmu->num_events, 1);
+	return smp_call_any(&arm_pmu->supported_cpus,
+			     armv7_read_num_pmnc_events,
+			     &arm_pmu->num_events, SMP_CALL_TYPE_SYNC);
 }
 
 static int armv7_a8_pmu_init(struct arm_pmu *cpu_pmu)
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index cb69ff1e6138..7326c7cc67de 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1186,9 +1186,9 @@ static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
 	};
 	int ret;
 
-	ret = smp_call_function_any(&cpu_pmu->supported_cpus,
-				    __armv8pmu_probe_pmu,
-				    &probe, 1);
+	ret = smp_call_any(&cpu_pmu->supported_cpus,
+			    __armv8pmu_probe_pmu,
+			    &probe, SMP_CALL_TYPE_SYNC);
 	if (ret)
 		return ret;
 
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 87666275eed9..418c83c3c4b5 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -512,7 +512,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	rr->val = 0;
 	rr->first = first;
 
-	smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
+	(void) smp_call_any(&d->cpu_mask, mon_event_count, rr, SMP_CALL_TYPE_SYNC);
 }
 
 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 3d514b82d055..e0c4b3c6575d 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -312,8 +312,8 @@ static u32 drv_read(struct acpi_cpufreq_data *data, const struct cpumask *mask)
 	};
 	int err;
 
-	err = smp_call_function_any(mask, do_drv_read, &cmd, 1);
-	WARN_ON_ONCE(err);	/* smp_call_function_any() was buggy? */
+	err = smp_call_any(mask, do_drv_read, &cmd, SMP_CALL_TYPE_SYNC);
+	WARN_ON_ONCE(err);	/* smp_call_any() was buggy? */
 	return cmd.val;
 }
 
diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index fddbd1ea1635..d4f45bb9c419 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -507,8 +507,8 @@ static unsigned int powernv_cpufreq_get(unsigned int cpu)
 {
 	struct powernv_smp_call_data freq_data;
 
-	smp_call_function_any(cpu_sibling_mask(cpu), powernv_read_cpu_freq,
-			&freq_data, 1);
+	(void) smp_call_any(cpu_sibling_mask(cpu), powernv_read_cpu_freq,
+			&freq_data, SMP_CALL_TYPE_SYNC);
 
 	return freq_data.freq;
 }
@@ -820,8 +820,10 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
 	 * Use smp_call_function to send IPI and execute the
 	 * mtspr on target CPU.  We could do that without IPI
 	 * if current CPU is within policy->cpus (core)
+	 *
+	 * Shouldn't this return the value of smp_call_any()?
 	 */
-	smp_call_function_any(policy->cpus, set_pstate, &freq_data, 1);
+	(void) smp_call_any(policy->cpus, set_pstate, &freq_data, SMP_CALL_TYPE_SYNC);
 	return 0;
 }
 
@@ -921,8 +923,8 @@ static void powernv_cpufreq_work_fn(struct work_struct *work)
 
 	cpus_read_lock();
 	cpumask_and(&mask, &chip->mask, cpu_online_mask);
-	smp_call_function_any(&mask,
-			      powernv_cpufreq_throttle_check, NULL, 0);
+	(void) smp_call_any(&mask, powernv_cpufreq_throttle_check,
+			     NULL, SMP_CALL_TYPE_ASYNC);
 
 	if (!chip->restore)
 		goto out;
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index d44bcc29d99c..e1059c60feab 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -1108,7 +1108,7 @@ static int arm_spe_pmu_dev_init(struct arm_spe_pmu *spe_pmu)
 	cpumask_t *mask = &spe_pmu->supported_cpus;
 
 	/* Make sure we probe the hardware on a relevant CPU */
-	ret = smp_call_function_any(mask,  __arm_spe_pmu_dev_probe, spe_pmu, 1);
+	ret = smp_call_any(mask,  __arm_spe_pmu_dev_probe, spe_pmu, SMP_CALL_TYPE_SYNC);
 	if (ret || !(spe_pmu->features & SPE_PMU_FEAT_DEV_PROBED))
 		return -ENXIO;
 
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 0301faf270bf..3a6663bce18f 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -161,6 +161,8 @@ do {						\
 	*(_csd) = CSD_INIT((_func), (_info));	\
 } while (0)
 
+extern int smp_call_any(const struct cpumask *mask, smp_call_func_t func,
+			 void *info, unsigned int flags);
 
 /*
  * smp_call Interface.
@@ -304,9 +306,6 @@ void smp_call_function(smp_call_func_t func, void *info, int wait);
 void smp_call_function_many(const struct cpumask *mask,
 			    smp_call_func_t func, void *info, bool wait);
 
-int smp_call_function_any(const struct cpumask *mask,
-			  smp_call_func_t func, void *info, int wait);
-
 void kick_all_cpus_sync(void);
 void wake_up_all_idle_cpus(void);
 
@@ -355,13 +354,6 @@ static inline void smp_send_reschedule(int cpu) { }
 			(up_smp_call_function(func, info))
 static inline void call_function_init(void) { }
 
-static inline int
-smp_call_function_any(const struct cpumask *mask, smp_call_func_t func,
-		      void *info, int wait)
-{
-	return smp_call_function_single(0, func, info, wait);
-}
-
 static inline void kick_all_cpus_sync(void) {  }
 static inline void wake_up_all_idle_cpus(void) {  }
 
diff --git a/kernel/smp.c b/kernel/smp.c
index 51715633b4f7..6a43ab165aee 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -626,49 +626,6 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
-/*
- * smp_call_function_any - Run a function on any of the given cpus
- * @mask: The mask of cpus it can run on.
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait until function has completed.
- *
- * Returns 0 on success, else a negative status code (if no cpus were online).
- *
- * Selection preference:
- *	1) current cpu if in @mask
- *	2) any cpu of current node if in @mask
- *	3) any other online cpu in @mask
- */
-int smp_call_function_any(const struct cpumask *mask,
-			  smp_call_func_t func, void *info, int wait)
-{
-	unsigned int cpu;
-	const struct cpumask *nodemask;
-	int ret;
-
-	/* Try for same CPU (cheapest) */
-	cpu = get_cpu();
-	if (cpumask_test_cpu(cpu, mask))
-		goto call;
-
-	/* Try for same node. */
-	nodemask = cpumask_of_node(cpu_to_node(cpu));
-	for (cpu = cpumask_first_and(nodemask, mask); cpu < nr_cpu_ids;
-	     cpu = cpumask_next_and(cpu, nodemask, mask)) {
-		if (cpu_online(cpu))
-			goto call;
-	}
-
-	/* Any online will do: smp_call_function_single handles nr_cpu_ids. */
-	cpu = cpumask_any_and(mask, cpu_online_mask);
-call:
-	ret = smp_call_function_single(cpu, func, info, wait);
-	put_cpu();
-	return ret;
-}
-EXPORT_SYMBOL_GPL(smp_call_function_any);
-
 static void smp_call_function_many_cond(const struct cpumask *mask,
 					smp_call_func_t func, void *info,
 					bool wait,
@@ -1276,6 +1233,39 @@ EXPORT_SYMBOL(smp_call_private);
 int smp_call_any(const struct cpumask *mask, smp_call_func_t func,
 		void *info, unsigned int flags)
 {
-	return 0;
+	int cpu, ret;
+	const struct cpumask *nodemask;
+
+	if (mask == NULL || func == NULL ||
+	    (flags != SMP_CALL_TYPE_SYNC && flags != SMP_CALL_TYPE_ASYNC))
+		return -EINVAL;
+
+	/* Try for same CPU (cheapest) */
+	preempt_disable();
+	cpu = smp_processor_id();
+
+	if (cpumask_test_cpu(cpu, mask))
+		goto call;
+
+	/* Try for same node. */
+	nodemask = cpumask_of_node(cpu_to_node(cpu));
+	for (cpu = cpumask_first_and(nodemask, mask); (unsigned int)cpu < nr_cpu_ids;
+	     cpu = cpumask_next_and(cpu, nodemask, mask)) {
+		if (cpu_online(cpu))
+			goto call;
+	}
+
+	/* Any online CPU in the mask will do. */
+	cpu = cpumask_any_and(mask, cpu_online_mask);
+	if ((unsigned int)cpu >= nr_cpu_ids) {
+		preempt_enable();
+		return -ENXIO;
+	}
+
+call:
+	ret = smp_call(cpu, func, info, flags);
+
+	preempt_enable();
+	return ret;
 }
 EXPORT_SYMBOL(smp_call_any);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 08/11] smp: replace smp_call_function_many_cond() with __smp_call_mask_cond()
  2022-04-22 20:00 [PATCH v2 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (6 preceding siblings ...)
  2022-04-22 20:00 ` [PATCH v2 07/11] smp: change smp_call_function_any() to smp_call_any() Donghai Qiao
@ 2022-04-22 20:00 ` Donghai Qiao
  2022-04-22 20:00 ` [PATCH v2 09/11] smp: replace smp_call_function_single_async with smp_call_private Donghai Qiao
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Donghai Qiao @ 2022-04-22 20:00 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Replace smp_call_function_many_cond() with __smp_call_mask_cond()
and make the necessary changes.

Consolidate and clean up the redundant code along the invocation
paths of smp_call_function_many_cond().

on_each_cpu_cond_mask(cond_func, func, info, wait, mask) is replaced by
smp_call_mask_cond(mask, func, info, cond_func,
(wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC))

smp_call_function_many(mask, func, info, wait) is replaced by
smp_call_mask(mask, func, info, (wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC))

smp_call_function(func, info, wait) is replaced by
smp_call(SMP_CALL_ALL, func, info, (wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC))

on_each_cpu(func, info, wait) is replaced by
smp_call(SMP_CALL_ALL, func, info, (wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC))

on_each_cpu_mask(mask, func, info, wait) is replaced by
smp_call_mask(mask, func, info, (wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC))

on_each_cpu_cond(cond_func, func, info, wait) is replaced by
smp_call_cond(SMP_CALL_ALL, func, info, cond_func,
(wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC))
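
Where a call site still carries a bool wait, the conversion pattern
above can be wrapped in a small helper. This is an illustrative
sketch only (smp_call_type() is a hypothetical name, not part of
this patch):

	static inline unsigned int smp_call_type(bool wait)
	{
		/* map the old wait flag onto the new call type */
		return wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC;
	}

	/* e.g.: smp_call(SMP_CALL_ALL, func, info, smp_call_type(wait)); */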

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: removed 'x' from the function names and changed XCALL to SMP_CALL in the new macros

 arch/alpha/kernel/process.c                   |   2 +-
 arch/alpha/kernel/smp.c                       |  10 +-
 arch/arc/kernel/perf_event.c                  |   2 +-
 arch/arc/mm/cache.c                           |   2 +-
 arch/arc/mm/tlb.c                             |  14 +-
 arch/arm/common/bL_switcher.c                 |   2 +-
 arch/arm/kernel/machine_kexec.c               |   2 +-
 arch/arm/kernel/smp_tlb.c                     |  22 +--
 arch/arm/kernel/smp_twd.c                     |   4 +-
 arch/arm/mm/flush.c                           |   4 +-
 arch/arm/vfp/vfpmodule.c                      |   2 +-
 arch/arm64/kernel/armv8_deprecated.c          |   4 +-
 arch/arm64/kernel/perf_event.c                |   2 +-
 arch/arm64/kvm/arm.c                          |   6 +-
 arch/csky/abiv2/cacheflush.c                  |   2 +-
 arch/csky/kernel/perf_event.c                 |   2 +-
 arch/csky/kernel/smp.c                        |   2 +-
 arch/csky/mm/cachev2.c                        |   2 +-
 arch/ia64/kernel/mca.c                        |   4 +-
 arch/ia64/kernel/smp.c                        |  10 +-
 arch/ia64/kernel/uncached.c                   |   4 +-
 arch/mips/cavium-octeon/octeon-irq.c          |   4 +-
 arch/mips/cavium-octeon/setup.c               |   4 +-
 arch/mips/kernel/crash.c                      |   2 +-
 arch/mips/kernel/machine_kexec.c              |   2 +-
 arch/mips/kernel/perf_event_mipsxx.c          |   7 +-
 arch/mips/kernel/smp.c                        |   8 +-
 arch/mips/kernel/sysrq.c                      |   2 +-
 arch/mips/mm/c-r4k.c                          |   4 +-
 arch/mips/sibyte/common/cfe.c                 |   2 +-
 arch/openrisc/kernel/smp.c                    |  12 +-
 arch/parisc/kernel/cache.c                    |   4 +-
 arch/parisc/mm/init.c                         |   2 +-
 arch/powerpc/kernel/dawr.c                    |   2 +-
 arch/powerpc/kernel/kvm.c                     |   2 +-
 arch/powerpc/kernel/security.c                |   6 +-
 arch/powerpc/kernel/smp.c                     |   4 +-
 arch/powerpc/kernel/sysfs.c                   |   2 +-
 arch/powerpc/kernel/tau_6xx.c                 |   4 +-
 arch/powerpc/kexec/core_64.c                  |   2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c           |   2 +-
 arch/powerpc/mm/book3s64/pgtable.c            |   2 +-
 arch/powerpc/mm/book3s64/radix_tlb.c          |  12 +-
 arch/powerpc/mm/nohash/tlb.c                  |  10 +-
 arch/powerpc/mm/slice.c                       |   4 +-
 arch/powerpc/perf/core-book3s.c               |   2 +-
 arch/powerpc/perf/imc-pmu.c                   |   2 +-
 arch/powerpc/platforms/85xx/smp.c             |   2 +-
 arch/powerpc/platforms/powernv/idle.c         |   2 +-
 arch/powerpc/platforms/pseries/lparcfg.c      |   2 +-
 arch/riscv/mm/cacheflush.c                    |   4 +-
 arch/s390/hypfs/hypfs_diag0c.c                |   2 +-
 arch/s390/kernel/alternative.c                |   2 +-
 arch/s390/kernel/perf_cpum_cf.c               |  10 +-
 arch/s390/kernel/perf_cpum_cf_common.c        |   4 +-
 arch/s390/kernel/perf_cpum_sf.c               |   4 +-
 arch/s390/kernel/processor.c                  |   2 +-
 arch/s390/kernel/smp.c                        |   2 +-
 arch/s390/kernel/topology.c                   |   2 +-
 arch/s390/mm/pgalloc.c                        |   2 +-
 arch/s390/pci/pci_irq.c                       |   2 +-
 arch/sh/kernel/smp.c                          |  14 +-
 arch/sh/mm/cache.c                            |   2 +-
 arch/sparc/include/asm/mman.h                 |   4 +-
 arch/sparc/kernel/nmi.c                       |  12 +-
 arch/sparc/kernel/perf_event.c                |   4 +-
 arch/sparc/kernel/smp_64.c                    |   8 +-
 arch/sparc/mm/init_64.c                       |   2 +-
 arch/x86/events/core.c                        |   6 +-
 arch/x86/events/intel/core.c                  |   4 +-
 arch/x86/kernel/alternative.c                 |   2 +-
 arch/x86/kernel/amd_nb.c                      |   2 +-
 arch/x86/kernel/apic/apic.c                   |   2 +-
 arch/x86/kernel/cpu/bugs.c                    |   2 +-
 arch/x86/kernel/cpu/mce/core.c                |  12 +-
 arch/x86/kernel/cpu/mce/inject.c              |   6 +-
 arch/x86/kernel/cpu/mce/intel.c               |   2 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |   2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        |   6 +-
 arch/x86/kernel/cpu/sgx/main.c                |   5 +-
 arch/x86/kernel/cpu/umwait.c                  |   2 +-
 arch/x86/kernel/cpu/vmware.c                  |   2 +-
 arch/x86/kernel/kvm.c                         |   2 +-
 arch/x86/kernel/ldt.c                         |   2 +-
 arch/x86/kvm/x86.c                            |   4 +-
 arch/x86/lib/cache-smp.c                      |   2 +-
 arch/x86/lib/msr-smp.c                        |   2 +-
 arch/x86/mm/pat/set_memory.c                  |   4 +-
 arch/x86/mm/tlb.c                             |  12 +-
 arch/x86/xen/mmu_pv.c                         |   2 +-
 arch/x86/xen/smp_pv.c                         |   2 +-
 arch/x86/xen/suspend.c                        |   4 +-
 arch/xtensa/kernel/smp.c                      |  22 +--
 drivers/char/agp/generic.c                    |   2 +-
 drivers/clocksource/mips-gic-timer.c          |   2 +-
 drivers/cpufreq/acpi-cpufreq.c                |   6 +-
 drivers/cpufreq/tegra194-cpufreq.c            |   2 +-
 drivers/cpuidle/driver.c                      |   8 +-
 drivers/edac/amd64_edac.c                     |   4 +-
 drivers/firmware/arm_sdei.c                   |  10 +-
 drivers/gpu/drm/i915/vlv_sideband.c           |   2 +-
 drivers/hwmon/fam15h_power.c                  |   2 +-
 drivers/irqchip/irq-mvebu-pic.c               |   4 +-
 drivers/net/ethernet/marvell/mvneta.c         |  30 ++---
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |   8 +-
 drivers/platform/x86/intel_ips.c              |   4 +-
 drivers/soc/xilinx/xlnx_event_manager.c       |   2 +-
 drivers/tty/sysrq.c                           |   2 +-
 drivers/watchdog/booke_wdt.c                  |   8 +-
 fs/buffer.c                                   |   2 +-
 include/linux/smp.h                           |  53 +-------
 kernel/profile.c                              |   4 +-
 kernel/rcu/tree.c                             |   4 +-
 kernel/scftorture.c                           |   8 +-
 kernel/sched/membarrier.c                     |  12 +-
 kernel/smp.c                                  | 125 +++---------------
 kernel/time/hrtimer.c                         |   6 +-
 kernel/trace/ftrace.c                         |   6 +-
 kernel/trace/ring_buffer.c                    |   2 +-
 kernel/trace/trace.c                          |  12 +-
 kernel/trace/trace_events.c                   |   2 +-
 mm/kasan/quarantine.c                         |   2 +-
 mm/mmu_gather.c                               |   2 +-
 mm/slab.c                                     |   2 +-
 net/iucv/iucv.c                               |   6 +-
 virt/kvm/kvm_main.c                           |  10 +-
 126 files changed, 309 insertions(+), 459 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 5f8527081da9..90eeb04b435f 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -167,7 +167,7 @@ common_shutdown(int mode, char *restart_cmd)
 	struct halt_info args;
 	args.mode = mode;
 	args.restart_cmd = restart_cmd;
-	on_each_cpu(common_shutdown_1, &args, 0);
+	smp_call(SMP_CALL_ALL, common_shutdown_1, &args, SMP_CALL_TYPE_ASYNC);
 }
 
 void
diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index cb64e4797d2a..e396ab392cac 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -611,7 +611,7 @@ void
 smp_imb(void)
 {
 	/* Must wait other processors to flush their icache before continue. */
-	on_each_cpu(ipi_imb, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_imb, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(smp_imb);
 
@@ -626,7 +626,7 @@ flush_tlb_all(void)
 {
 	/* Although we don't have any data to pass, we do want to
 	   synchronize with the other processors.  */
-	on_each_cpu(ipi_flush_tlb_all, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 #define asn_locked() (cpu_data[smp_processor_id()].asn_lock)
@@ -661,7 +661,7 @@ flush_tlb_mm(struct mm_struct *mm)
 		}
 	}
 
-	smp_call_function(ipi_flush_tlb_mm, mm, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_mm, mm, SMP_CALL_TYPE_SYNC);
 
 	preempt_enable();
 }
@@ -712,7 +712,7 @@ flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 	data.mm = mm;
 	data.addr = addr;
 
-	smp_call_function(ipi_flush_tlb_page, &data, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_page, &data, SMP_CALL_TYPE_SYNC);
 
 	preempt_enable();
 }
@@ -762,7 +762,7 @@ flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
 		}
 	}
 
-	smp_call_function(ipi_flush_icache_page, mm, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_icache_page, mm, SMP_CALL_TYPE_SYNC);
 
 	preempt_enable();
 }
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index adff957962da..682db1682287 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -811,7 +811,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 						 this_cpu_ptr(&arc_pmu_cpu));
 
 			if (!ret)
-				on_each_cpu(arc_cpu_pmu_irq_init, &irq, 1);
+				smp_call(SMP_CALL_ALL, arc_cpu_pmu_irq_init, &irq, SMP_CALL_TYPE_SYNC);
 			else
 				irq = -1;
 		}
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index 8aa1231865d1..b231e2fe3ff6 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -569,7 +569,7 @@ static void __ic_line_inv_vaddr(phys_addr_t paddr, unsigned long vaddr,
 		.sz    = sz
 	};
 
-	on_each_cpu(__ic_line_inv_vaddr_helper, &ic_inv, 1);
+	smp_call(SMP_CALL_ALL, __ic_line_inv_vaddr_helper, &ic_inv, SMP_CALL_TYPE_SYNC);
 }
 
 #endif	/* CONFIG_SMP */
diff --git a/arch/arc/mm/tlb.c b/arch/arc/mm/tlb.c
index 5f71445f26bd..83aaa61c6698 100644
--- a/arch/arc/mm/tlb.c
+++ b/arch/arc/mm/tlb.c
@@ -330,13 +330,13 @@ static inline void ipi_flush_tlb_kernel_range(void *arg)
 
 void flush_tlb_all(void)
 {
-	on_each_cpu((smp_call_func_t)local_flush_tlb_all, NULL, 1);
+	smp_call(SMP_CALL_ALL, (smp_call_func_t)local_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	on_each_cpu_mask(mm_cpumask(mm), (smp_call_func_t)local_flush_tlb_mm,
-			 mm, 1);
+	smp_call_mask(mm_cpumask(mm), (smp_call_func_t)local_flush_tlb_mm,
+			 mm, SMP_CALL_TYPE_SYNC);
 }
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
@@ -346,7 +346,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 		.ta_start = uaddr
 	};
 
-	on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page, &ta, 1);
+	smp_call_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page, &ta, SMP_CALL_TYPE_SYNC);
 }
 
 void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
@@ -358,7 +358,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 		.ta_end = end
 	};
 
-	on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range, &ta, 1);
+	smp_call_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range, &ta, SMP_CALL_TYPE_SYNC);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -371,7 +371,7 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
 		.ta_end = end
 	};
 
-	on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_pmd_tlb_range, &ta, 1);
+	smp_call_mask(mm_cpumask(vma->vm_mm), ipi_flush_pmd_tlb_range, &ta, SMP_CALL_TYPE_SYNC);
 }
 #endif
 
@@ -382,7 +382,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		.ta_end = end
 	};
 
-	on_each_cpu(ipi_flush_tlb_kernel_range, &ta, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_kernel_range, &ta, SMP_CALL_TYPE_SYNC);
 }
 #endif
 
diff --git a/arch/arm/common/bL_switcher.c b/arch/arm/common/bL_switcher.c
index 9a9aa53547a6..f02c7dfca8ed 100644
--- a/arch/arm/common/bL_switcher.c
+++ b/arch/arm/common/bL_switcher.c
@@ -541,7 +541,7 @@ int bL_switcher_trace_trigger(void)
 	preempt_disable();
 
 	bL_switcher_trace_trigger_cpu(NULL);
-	smp_call_function(bL_switcher_trace_trigger_cpu, NULL, true);
+	smp_call(SMP_CALL_ALL, bL_switcher_trace_trigger_cpu, NULL, SMP_CALL_TYPE_SYNC);
 
 	preempt_enable();
 
diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index f567032a09c0..21058c8ac194 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -101,7 +101,7 @@ void crash_smp_send_stop(void)
 		return;
 
 	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
-	smp_call_function(machine_crash_nonpanic_core, NULL, false);
+	smp_call(SMP_CALL_ALL, machine_crash_nonpanic_core, NULL, SMP_CALL_TYPE_ASYNC);
 	msecs = 1000; /* Wait at most a second for the other cpus to stop */
 	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
 		mdelay(1);
diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index d4908b3736d8..d16923710800 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -158,7 +158,7 @@ static void broadcast_tlb_a15_erratum(void)
 	if (!erratum_a15_798181())
 		return;
 
-	smp_call_function(ipi_flush_tlb_a15_erratum, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_a15_erratum, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void broadcast_tlb_mm_a15_erratum(struct mm_struct *mm)
@@ -171,14 +171,14 @@ static void broadcast_tlb_mm_a15_erratum(struct mm_struct *mm)
 
 	this_cpu = get_cpu();
 	a15_erratum_get_cpumask(this_cpu, mm, &mask);
-	smp_call_function_many(&mask, ipi_flush_tlb_a15_erratum, NULL, 1);
+	smp_call_mask(&mask, ipi_flush_tlb_a15_erratum, NULL, SMP_CALL_TYPE_SYNC);
 	put_cpu();
 }
 
 void flush_tlb_all(void)
 {
 	if (tlb_ops_need_broadcast())
-		on_each_cpu(ipi_flush_tlb_all, NULL, 1);
+		smp_call(SMP_CALL_ALL, ipi_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 	else
 		__flush_tlb_all();
 	broadcast_tlb_a15_erratum();
@@ -187,7 +187,7 @@ void flush_tlb_all(void)
 void flush_tlb_mm(struct mm_struct *mm)
 {
 	if (tlb_ops_need_broadcast())
-		on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
+		smp_call_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, SMP_CALL_TYPE_SYNC);
 	else
 		__flush_tlb_mm(mm);
 	broadcast_tlb_mm_a15_erratum(mm);
@@ -199,8 +199,8 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 		struct tlb_args ta;
 		ta.ta_vma = vma;
 		ta.ta_start = uaddr;
-		on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
-					&ta, 1);
+		smp_call_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
+			       &ta, SMP_CALL_TYPE_SYNC);
 	} else
 		__flush_tlb_page(vma, uaddr);
 	broadcast_tlb_mm_a15_erratum(vma->vm_mm);
@@ -211,7 +211,7 @@ void flush_tlb_kernel_page(unsigned long kaddr)
 	if (tlb_ops_need_broadcast()) {
 		struct tlb_args ta;
 		ta.ta_start = kaddr;
-		on_each_cpu(ipi_flush_tlb_kernel_page, &ta, 1);
+		smp_call(SMP_CALL_ALL, ipi_flush_tlb_kernel_page, &ta, SMP_CALL_TYPE_SYNC);
 	} else
 		__flush_tlb_kernel_page(kaddr);
 	broadcast_tlb_a15_erratum();
@@ -225,8 +225,8 @@ void flush_tlb_range(struct vm_area_struct *vma,
 		ta.ta_vma = vma;
 		ta.ta_start = start;
 		ta.ta_end = end;
-		on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range,
-					&ta, 1);
+		smp_call_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range,
+				&ta, SMP_CALL_TYPE_SYNC);
 	} else
 		local_flush_tlb_range(vma, start, end);
 	broadcast_tlb_mm_a15_erratum(vma->vm_mm);
@@ -238,7 +238,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		struct tlb_args ta;
 		ta.ta_start = start;
 		ta.ta_end = end;
-		on_each_cpu(ipi_flush_tlb_kernel_range, &ta, 1);
+		smp_call(SMP_CALL_ALL, ipi_flush_tlb_kernel_range, &ta, SMP_CALL_TYPE_SYNC);
 	} else
 		local_flush_tlb_kernel_range(start, end);
 	broadcast_tlb_a15_erratum();
@@ -247,7 +247,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 void flush_bp_all(void)
 {
 	if (tlb_ops_need_broadcast())
-		on_each_cpu(ipi_flush_bp_all, NULL, 1);
+		smp_call(SMP_CALL_ALL, ipi_flush_bp_all, NULL, SMP_CALL_TYPE_SYNC);
 	else
 		__flush_bp_all();
 }
diff --git a/arch/arm/kernel/smp_twd.c b/arch/arm/kernel/smp_twd.c
index 9a14f721a2b0..fa1b054e1398 100644
--- a/arch/arm/kernel/smp_twd.c
+++ b/arch/arm/kernel/smp_twd.c
@@ -119,8 +119,8 @@ static int twd_rate_change(struct notifier_block *nb,
 	 * changing cpu.
 	 */
 	if (flags == POST_RATE_CHANGE)
-		on_each_cpu(twd_update_frequency,
-				  (void *)&cnd->new_rate, 1);
+		smp_call(SMP_CALL_ALL, twd_update_frequency,
+			  (void *)&cnd->new_rate, SMP_CALL_TYPE_SYNC);
 
 	return NOTIFY_OK;
 }
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 7ff9feea13a6..b80368d320f8 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -150,8 +150,8 @@ void __flush_ptrace_access(struct page *page, unsigned long uaddr, void *kaddr,
 		else
 			__cpuc_coherent_kern_range(addr, addr + len);
 		if (cache_ops_need_broadcast())
-			smp_call_function(flush_ptrace_access_other,
-					  NULL, 1);
+			smp_call(SMP_CALL_ALL, flush_ptrace_access_other,
+				  NULL, SMP_CALL_TYPE_SYNC);
 	}
 }
 
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 2cb355c1b5b7..dc21f87062c2 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -780,7 +780,7 @@ static int __init vfp_init(void)
 	 * following test on FPSID will succeed.
 	 */
 	if (cpu_arch >= CPU_ARCH_ARMv6)
-		on_each_cpu(vfp_enable, NULL, 1);
+		smp_call(SMP_CALL_ALL, vfp_enable, NULL, SMP_CALL_TYPE_SYNC);
 
 	/*
 	 * First check that there is a VFP that we can use.
diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c
index 6875a16b09d2..03f469dffdb6 100644
--- a/arch/arm64/kernel/armv8_deprecated.c
+++ b/arch/arm64/kernel/armv8_deprecated.c
@@ -104,9 +104,9 @@ static int run_all_cpu_set_hw_mode(struct insn_emulation *insn, bool enable)
 	if (!insn->ops->set_hw_mode)
 		return -EINVAL;
 	if (enable)
-		on_each_cpu(enable_insn_hw_mode, (void *)insn, true);
+		smp_call(SMP_CALL_ALL, enable_insn_hw_mode, (void *)insn, SMP_CALL_TYPE_SYNC);
 	else
-		on_each_cpu(disable_insn_hw_mode, (void *)insn, true);
+		smp_call(SMP_CALL_ALL, disable_insn_hw_mode, (void *)insn, SMP_CALL_TYPE_SYNC);
 	return 0;
 }
 
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 7326c7cc67de..7d9d3a611054 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1207,7 +1207,7 @@ static int armv8pmu_proc_user_access_handler(struct ctl_table *table, int write,
 	if (ret || !write || sysctl_perf_user_access)
 		return ret;
 
-	on_each_cpu(armv8pmu_disable_user_access_ipi, NULL, 1);
+	smp_call(SMP_CALL_ALL, armv8pmu_disable_user_access_ipi, NULL, SMP_CALL_TYPE_SYNC);
 	return 0;
 }
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 523bc934fe2f..bb82cbc48626 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1728,7 +1728,7 @@ static int init_subsystems(void)
 	/*
 	 * Enable hardware so that subsystem initialisation can access EL2.
 	 */
-	on_each_cpu(_kvm_arch_hardware_enable, NULL, 1);
+	smp_call(SMP_CALL_ALL, _kvm_arch_hardware_enable, NULL, SMP_CALL_TYPE_SYNC);
 
 	/*
 	 * Register CPU lower-power notifier
@@ -1765,7 +1765,7 @@ static int init_subsystems(void)
 
 out:
 	if (err || !is_protected_kvm_enabled())
-		on_each_cpu(_kvm_arch_hardware_disable, NULL, 1);
+		smp_call(SMP_CALL_ALL, _kvm_arch_hardware_disable, NULL, SMP_CALL_TYPE_SYNC);
 
 	return err;
 }
@@ -2000,7 +2000,7 @@ static int pkvm_drop_host_privileges(void)
 	 * once the host stage 2 is installed.
 	 */
 	static_branch_enable(&kvm_protected_mode_initialized);
-	on_each_cpu(_kvm_host_prot_finalize, &ret, 1);
+	smp_call(SMP_CALL_ALL, _kvm_host_prot_finalize, &ret, SMP_CALL_TYPE_SYNC);
 	return ret;
 }
 
diff --git a/arch/csky/abiv2/cacheflush.c b/arch/csky/abiv2/cacheflush.c
index 39c51399dd81..b662f8a10db8 100644
--- a/arch/csky/abiv2/cacheflush.c
+++ b/arch/csky/abiv2/cacheflush.c
@@ -80,7 +80,7 @@ void flush_icache_mm_range(struct mm_struct *mm,
 	cpumask_andnot(&others, mm_cpumask(mm), cpumask_of(cpu));
 
 	if (mm != current->active_mm || !cpumask_empty(&others)) {
-		on_each_cpu_mask(&others, local_icache_inv_all, NULL, 1);
+		smp_call_mask(&others, local_icache_inv_all, NULL, SMP_CALL_TYPE_SYNC);
 		cpumask_clear(mask);
 	}
 
diff --git a/arch/csky/kernel/perf_event.c b/arch/csky/kernel/perf_event.c
index e5f18420ce64..bc8fb8a8fd28 100644
--- a/arch/csky/kernel/perf_event.c
+++ b/arch/csky/kernel/perf_event.c
@@ -1311,7 +1311,7 @@ int csky_pmu_device_probe(struct platform_device *pdev,
 	csky_pmu.plat_device = pdev;
 
 	/* Ensure the PMU has sane values out of reset. */
-	on_each_cpu(csky_pmu_reset, &csky_pmu, 1);
+	smp_call(SMP_CALL_ALL, csky_pmu_reset, &csky_pmu, SMP_CALL_TYPE_SYNC);
 
 	ret = csky_pmu_request_irq(csky_pmu_handle_irq);
 	if (ret) {
diff --git a/arch/csky/kernel/smp.c b/arch/csky/kernel/smp.c
index 6bb38bc2f39b..2b62171d8d82 100644
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -137,7 +137,7 @@ static void ipi_stop(void *unused)
 
 void smp_send_stop(void)
 {
-	on_each_cpu(ipi_stop, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_stop, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 void smp_send_reschedule(int cpu)
diff --git a/arch/csky/mm/cachev2.c b/arch/csky/mm/cachev2.c
index 7a9664adce43..19dab0d4c089 100644
--- a/arch/csky/mm/cachev2.c
+++ b/arch/csky/mm/cachev2.c
@@ -66,7 +66,7 @@ void icache_inv_range(unsigned long start, unsigned long end)
 	if (irqs_disabled())
 		local_icache_inv_range(&param);
 	else
-		on_each_cpu(local_icache_inv_range, &param, 1);
+		smp_call(SMP_CALL_ALL, local_icache_inv_range, &param, SMP_CALL_TYPE_SYNC);
 }
 #endif
 
diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c
index e628a88607bb..c3723f7b0068 100644
--- a/arch/ia64/kernel/mca.c
+++ b/arch/ia64/kernel/mca.c
@@ -712,7 +712,7 @@ ia64_mca_cmc_vector_enable (void *dummy)
 static void
 ia64_mca_cmc_vector_disable_keventd(struct work_struct *unused)
 {
-	on_each_cpu(ia64_mca_cmc_vector_disable, NULL, 0);
+	smp_call(SMP_CALL_ALL, ia64_mca_cmc_vector_disable, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 /*
@@ -724,7 +724,7 @@ ia64_mca_cmc_vector_disable_keventd(struct work_struct *unused)
 static void
 ia64_mca_cmc_vector_enable_keventd(struct work_struct *unused)
 {
-	on_each_cpu(ia64_mca_cmc_vector_enable, NULL, 0);
+	smp_call(SMP_CALL_ALL, ia64_mca_cmc_vector_enable, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 /*
diff --git a/arch/ia64/kernel/smp.c b/arch/ia64/kernel/smp.c
index 7b7b64eb3129..d792c1cfea36 100644
--- a/arch/ia64/kernel/smp.c
+++ b/arch/ia64/kernel/smp.c
@@ -285,7 +285,7 @@ smp_flush_tlb_cpumask(cpumask_t xcpumask)
 void
 smp_flush_tlb_all (void)
 {
-	on_each_cpu((void (*)(void *))local_flush_tlb_all, NULL, 1);
+	smp_call(SMP_CALL_ALL, (void (*)(void *))local_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 void
@@ -301,12 +301,12 @@ smp_flush_tlb_mm (struct mm_struct *mm)
 		return;
 	}
 	if (!alloc_cpumask_var(&cpus, GFP_ATOMIC)) {
-		smp_call_function((void (*)(void *))local_finish_flush_tlb_mm,
-			mm, 1);
+		smp_call(SMP_CALL_ALL, (void (*)(void *))local_finish_flush_tlb_mm,
+			  mm, SMP_CALL_TYPE_SYNC);
 	} else {
 		cpumask_copy(cpus, mm_cpumask(mm));
-		smp_call_function_many(cpus,
-			(void (*)(void *))local_finish_flush_tlb_mm, mm, 1);
+		smp_call_mask(cpus,
+			(void (*)(void *))local_finish_flush_tlb_mm, mm, SMP_CALL_TYPE_SYNC);
 		free_cpumask_var(cpus);
 	}
 	local_irq_disable();
diff --git a/arch/ia64/kernel/uncached.c b/arch/ia64/kernel/uncached.c
index 816803636a75..c49fe1b01ec5 100644
--- a/arch/ia64/kernel/uncached.c
+++ b/arch/ia64/kernel/uncached.c
@@ -118,7 +118,7 @@ static int uncached_add_chunk(struct uncached_pool *uc_pool, int nid)
 	status = ia64_pal_prefetch_visibility(PAL_VISIBILITY_PHYSICAL);
 	if (status == PAL_VISIBILITY_OK_REMOTE_NEEDED) {
 		atomic_set(&uc_pool->status, 0);
-		smp_call_function(uncached_ipi_visibility, uc_pool, 1);
+		smp_call(SMP_CALL_ALL, uncached_ipi_visibility, uc_pool, SMP_CALL_TYPE_SYNC);
 		if (atomic_read(&uc_pool->status))
 			goto failed;
 	} else if (status != PAL_VISIBILITY_OK)
@@ -137,7 +137,7 @@ static int uncached_add_chunk(struct uncached_pool *uc_pool, int nid)
 	if (status != PAL_STATUS_SUCCESS)
 		goto failed;
 	atomic_set(&uc_pool->status, 0);
-	smp_call_function(uncached_ipi_mc_drain, uc_pool, 1);
+	smp_call(SMP_CALL_ALL, uncached_ipi_mc_drain, uc_pool, SMP_CALL_TYPE_SYNC);
 	if (atomic_read(&uc_pool->status))
 		goto failed;
 
diff --git a/arch/mips/cavium-octeon/octeon-irq.c b/arch/mips/cavium-octeon/octeon-irq.c
index 07d7ff5a981d..558a16d3a705 100644
--- a/arch/mips/cavium-octeon/octeon-irq.c
+++ b/arch/mips/cavium-octeon/octeon-irq.c
@@ -216,7 +216,7 @@ static void octeon_irq_core_bus_sync_unlock(struct irq_data *data)
 	struct octeon_core_chip_data *cd = irq_data_get_irq_chip_data(data);
 
 	if (cd->desired_en != cd->current_en) {
-		on_each_cpu(octeon_irq_core_set_enable_local, data, 1);
+		smp_call(SMP_CALL_ALL, octeon_irq_core_set_enable_local, data, SMP_CALL_TYPE_SYNC);
 
 		cd->current_en = cd->desired_en;
 	}
@@ -1364,7 +1364,7 @@ void octeon_irq_set_ip4_handler(octeon_irq_ip4_handler_t h)
 {
 	octeon_irq_ip4 = h;
 	octeon_irq_use_ip4 = true;
-	on_each_cpu(octeon_irq_local_enable_ip4, NULL, 1);
+	smp_call(SMP_CALL_ALL, octeon_irq_local_enable_ip4, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void octeon_irq_percpu_enable(void)
diff --git a/arch/mips/cavium-octeon/setup.c b/arch/mips/cavium-octeon/setup.c
index 00bf269763cf..54b54d8cd3a9 100644
--- a/arch/mips/cavium-octeon/setup.c
+++ b/arch/mips/cavium-octeon/setup.c
@@ -256,7 +256,7 @@ static void octeon_shutdown(void)
 {
 	octeon_generic_shutdown();
 #ifdef CONFIG_SMP
-	smp_call_function(octeon_kexec_smp_down, NULL, 0);
+	smp_call(SMP_CALL_ALL, octeon_kexec_smp_down, NULL, SMP_CALL_TYPE_ASYNC);
 	smp_wmb();
 	while (num_online_cpus() > 1) {
 		cpu_relax();
@@ -469,7 +469,7 @@ static void octeon_kill_core(void *arg)
  */
 static void octeon_halt(void)
 {
-	smp_call_function(octeon_kill_core, NULL, 0);
+	smp_call(SMP_CALL_ALL, octeon_kill_core, NULL, SMP_CALL_TYPE_ASYNC);
 
 	switch (octeon_bootinfo->board_type) {
 	case CVMX_BOARD_TYPE_NAO38:
diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c
index 81845ba04835..5ab095b20648 100644
--- a/arch/mips/kernel/crash.c
+++ b/arch/mips/kernel/crash.c
@@ -63,7 +63,7 @@ static void crash_kexec_prepare_cpus(void)
 
 	ncpus = num_online_cpus() - 1;/* Excluding the panic cpu */
 
-	smp_call_function(crash_shutdown_secondary, NULL, 0);
+	smp_call(SMP_CALL_ALL, crash_shutdown_secondary, NULL, SMP_CALL_TYPE_ASYNC);
 	smp_wmb();
 
 	/*
diff --git a/arch/mips/kernel/machine_kexec.c b/arch/mips/kernel/machine_kexec.c
index 432bfd3e7f22..89ace42c9648 100644
--- a/arch/mips/kernel/machine_kexec.c
+++ b/arch/mips/kernel/machine_kexec.c
@@ -139,7 +139,7 @@ machine_shutdown(void)
 		_machine_kexec_shutdown();
 
 #ifdef CONFIG_SMP
-	smp_call_function(kexec_shutdown_secondary, NULL, 0);
+	smp_call(SMP_CALL_ALL, kexec_shutdown_secondary, NULL, SMP_CALL_TYPE_ASYNC);
 
 	while (num_online_cpus() > 1) {
 		cpu_relax();
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 1641d274fe37..8d8a4ae1ca26 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -642,8 +642,9 @@ static void hw_perf_event_destroy(struct perf_event *event)
 		 * We must not call the destroy function with interrupts
 		 * disabled.
 		 */
-		on_each_cpu(reset_counters,
-			(void *)(long)mipspmu.num_counters, 1);
+		smp_call(SMP_CALL_ALL, reset_counters,
+			(void *)(long)mipspmu.num_counters, SMP_CALL_TYPE_SYNC);
+
 		mipspmu_free_irq();
 		mutex_unlock(&pmu_reserve_mutex);
 	}
@@ -2043,7 +2044,7 @@ init_hw_perf_events(void)
 		mipspmu.write_counter = mipsxx_pmu_write_counter;
 	}
 
-	on_each_cpu(reset_counters, (void *)(long)counters, 1);
+	smp_call(SMP_CALL_ALL, reset_counters, (void *)(long)counters, SMP_CALL_TYPE_SYNC);
 
 	pr_cont("%s PMU enabled, %d %d-bit counters available to each "
 		"CPU, irq %d%s\n", mipspmu.name, counters, counter_bits, irq,
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 1986d1309410..9a6f827d9325 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -397,7 +397,7 @@ static void stop_this_cpu(void *dummy)
 
 void smp_send_stop(void)
 {
-	smp_call_function(stop_this_cpu, NULL, 0);
+	smp_call(SMP_CALL_ALL, stop_this_cpu, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 void __init smp_cpus_done(unsigned int max_cpus)
@@ -472,7 +472,7 @@ void flush_tlb_all(void)
 		return;
 	}
 
-	on_each_cpu(flush_tlb_all_ipi, NULL, 1);
+	smp_call(SMP_CALL_ALL, flush_tlb_all_ipi, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void flush_tlb_mm_ipi(void *mm)
@@ -490,7 +490,7 @@ static void flush_tlb_mm_ipi(void *mm)
  */
 static inline void smp_on_other_tlbs(void (*func) (void *info), void *info)
 {
-	smp_call_function(func, info, 1);
+	smp_call(SMP_CALL_ALL, func, info, SMP_CALL_TYPE_SYNC);
 }
 
 static inline void smp_on_each_tlb(void (*func) (void *info), void *info)
@@ -617,7 +617,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		.addr2 = end,
 	};
 
-	on_each_cpu(flush_tlb_kernel_range_ipi, &fd, 1);
+	smp_call(SMP_CALL_ALL, flush_tlb_kernel_range_ipi, &fd, SMP_CALL_TYPE_SYNC);
 }
 
 static void flush_tlb_page_ipi(void *info)
diff --git a/arch/mips/kernel/sysrq.c b/arch/mips/kernel/sysrq.c
index 9c1a2019113b..2b3424997d4a 100644
--- a/arch/mips/kernel/sysrq.c
+++ b/arch/mips/kernel/sysrq.c
@@ -38,7 +38,7 @@ static void sysrq_tlbdump_single(void *dummy)
 #ifdef CONFIG_SMP
 static void sysrq_tlbdump_othercpus(struct work_struct *dummy)
 {
-	smp_call_function(sysrq_tlbdump_single, NULL, 0);
+	smp_call(SMP_CALL_ALL, sysrq_tlbdump_single, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 static DECLARE_WORK(sysrq_tlbdump, sysrq_tlbdump_othercpus);
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index ccb9e47322b0..ce17b72c425a 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -96,8 +96,8 @@ static inline void r4k_on_each_cpu(unsigned int type,
 {
 	preempt_disable();
 	if (r4k_op_needs_ipi(type))
-		smp_call_function_many(&cpu_foreign_map[smp_processor_id()],
-				       func, info, 1);
+		smp_call_mask(&cpu_foreign_map[smp_processor_id()],
+				       func, info, SMP_CALL_TYPE_SYNC);
 	func(info);
 	preempt_enable();
 }
diff --git a/arch/mips/sibyte/common/cfe.c b/arch/mips/sibyte/common/cfe.c
index 1a504294d85f..f70b2ce49115 100644
--- a/arch/mips/sibyte/common/cfe.c
+++ b/arch/mips/sibyte/common/cfe.c
@@ -57,7 +57,7 @@ static void __noreturn cfe_linux_exit(void *arg)
 		if (!reboot_smp) {
 			/* Get CPU 0 to do the cfe_exit */
 			reboot_smp = 1;
-			smp_call_function(cfe_linux_exit, arg, 0);
+			smp_call(SMP_CALL_ALL, cfe_linux_exit, arg, SMP_CALL_TYPE_ASYNC);
 		}
 	} else {
 		printk("Passing control back to CFE...\n");
diff --git a/arch/openrisc/kernel/smp.c b/arch/openrisc/kernel/smp.c
index 27041db2c8b0..ed5f3c7d86e3 100644
--- a/arch/openrisc/kernel/smp.c
+++ b/arch/openrisc/kernel/smp.c
@@ -194,7 +194,7 @@ static void stop_this_cpu(void *dummy)
 
 void smp_send_stop(void)
 {
-	smp_call_function(stop_this_cpu, NULL, 0);
+	smp_call(SMP_CALL_ALL, stop_this_cpu, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 /* not supported, yet */
@@ -244,7 +244,7 @@ static void smp_flush_tlb_mm(struct cpumask *cmask, struct mm_struct *mm)
 		/* local cpu is the only cpu present in cpumask */
 		local_flush_tlb_mm(mm);
 	} else {
-		on_each_cpu_mask(cmask, ipi_flush_tlb_mm, mm, 1);
+		smp_call_mask(cmask, ipi_flush_tlb_mm, mm, SMP_CALL_TYPE_SYNC);
 	}
 	put_cpu();
 }
@@ -291,16 +291,16 @@ static void smp_flush_tlb_range(const struct cpumask *cmask, unsigned long start
 		fd.addr2 = end;
 
 		if ((end - start) <= PAGE_SIZE)
-			on_each_cpu_mask(cmask, ipi_flush_tlb_page, &fd, 1);
+			smp_call_mask(cmask, ipi_flush_tlb_page, &fd, SMP_CALL_TYPE_SYNC);
 		else
-			on_each_cpu_mask(cmask, ipi_flush_tlb_range, &fd, 1);
+			smp_call_mask(cmask, ipi_flush_tlb_range, &fd, SMP_CALL_TYPE_SYNC);
 	}
 	put_cpu();
 }
 
 void flush_tlb_all(void)
 {
-	on_each_cpu(ipi_flush_tlb_all, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
@@ -331,6 +331,6 @@ static void ipi_icache_page_inv(void *arg)
 
 void smp_icache_page_inv(struct page *page)
 {
-	on_each_cpu(ipi_icache_page_inv, page, 1);
+	smp_call(SMP_CALL_ALL, ipi_icache_page_inv, page, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(smp_icache_page_inv);
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index 23348199f3f8..111b1d442296 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -81,13 +81,13 @@ void flush_cache_all_local(void)
 void flush_cache_all(void)
 {
 	if (static_branch_likely(&parisc_has_cache))
-		on_each_cpu(cache_flush_local_cpu, NULL, 1);
+		smp_call(SMP_CALL_ALL, cache_flush_local_cpu, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static inline void flush_data_cache(void)
 {
 	if (static_branch_likely(&parisc_has_dcache))
-		on_each_cpu(flush_data_cache_local, NULL, 1);
+		smp_call(SMP_CALL_ALL, flush_data_cache_local, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 
diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
index 1dc2e88e7b04..3b89ab4af568 100644
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -847,7 +847,7 @@ void flush_tlb_all(void)
 	    do_recycle++;
 	}
 	spin_unlock(&sid_lock);
-	on_each_cpu(flush_tlb_all_local, NULL, 1);
+	smp_call(SMP_CALL_ALL, flush_tlb_all_local, NULL, SMP_CALL_TYPE_SYNC);
 	if (do_recycle) {
 	    spin_lock(&sid_lock);
 	    recycle_sids(recycle_ndirty,recycle_dirty_array);
diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c
index 64e423d2fe0f..1e928bda191b 100644
--- a/arch/powerpc/kernel/dawr.c
+++ b/arch/powerpc/kernel/dawr.c
@@ -77,7 +77,7 @@ static ssize_t dawr_write_file_bool(struct file *file,
 
 	/* If we are clearing, make sure all CPUs have the DAWR cleared */
 	if (!dawr_force_enable)
-		smp_call_function(disable_dawrs_cb, NULL, 0);
+		smp_call(SMP_CALL_ALL, disable_dawrs_cb, NULL, SMP_CALL_TYPE_ASYNC);
 
 	return rc;
 }
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 6568823cf306..559023ef1952 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -666,7 +666,7 @@ static void __init kvm_use_magic_page(void)
 	u32 features;
 
 	/* Tell the host to map the magic page to -4096 on all CPUs */
-	on_each_cpu(kvm_map_magic_page, &features, 1);
+	smp_call(SMP_CALL_ALL, kvm_map_magic_page, &features, SMP_CALL_TYPE_SYNC);
 
 	/* Quick self-test to see if the mapping works */
 	if (fault_in_readable((const char __user *)KVM_MAGIC_PAGE,
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index d96fd14bd7c9..65b53535f2b9 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -607,7 +607,7 @@ void rfi_flush_enable(bool enable)
 {
 	if (enable) {
 		do_rfi_flush_fixups(enabled_flush_types);
-		on_each_cpu(do_nothing, NULL, 1);
+		smp_call(SMP_CALL_ALL, do_nothing, NULL, SMP_CALL_TYPE_SYNC);
 	} else
 		do_rfi_flush_fixups(L1D_FLUSH_NONE);
 
@@ -618,7 +618,7 @@ static void entry_flush_enable(bool enable)
 {
 	if (enable) {
 		do_entry_flush_fixups(enabled_flush_types);
-		on_each_cpu(do_nothing, NULL, 1);
+		smp_call(SMP_CALL_ALL, do_nothing, NULL, SMP_CALL_TYPE_SYNC);
 	} else {
 		do_entry_flush_fixups(L1D_FLUSH_NONE);
 	}
@@ -631,7 +631,7 @@ static void uaccess_flush_enable(bool enable)
 	if (enable) {
 		do_uaccess_flush_fixups(enabled_flush_types);
 		static_branch_enable(&uaccess_flush_key);
-		on_each_cpu(do_nothing, NULL, 1);
+		smp_call(SMP_CALL_ALL, do_nothing, NULL, SMP_CALL_TYPE_SYNC);
 	} else {
 		static_branch_disable(&uaccess_flush_key);
 		do_uaccess_flush_fixups(L1D_FLUSH_NONE);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index de0f6f09a5dd..6806d5b2f0f8 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -655,7 +655,7 @@ void crash_smp_send_stop(void)
 #ifdef CONFIG_NMI_IPI
 	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_stop_this_cpu, 1000000);
 #else
-	smp_call_function(crash_stop_this_cpu, NULL, 0);
+	smp_call(SMP_CALL_ALL, crash_stop_this_cpu, NULL, SMP_CALL_TYPE_ASYNC);
 #endif /* CONFIG_NMI_IPI */
 }
 
@@ -711,7 +711,7 @@ void smp_send_stop(void)
 
 	stopped = true;
 
-	smp_call_function(stop_this_cpu, NULL, 0);
+	smp_call(SMP_CALL_ALL, stop_this_cpu, NULL, SMP_CALL_TYPE_ASYNC);
 }
 #endif /* CONFIG_NMI_IPI */
 
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 2069bbb90a9a..898b48ca7b5a 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -206,7 +206,7 @@ static ssize_t __used store_dscr_default(struct device *dev,
 		return -EINVAL;
 	dscr_default = val;
 
-	on_each_cpu(write_dscr, &val, 1);
+	smp_call(SMP_CALL_ALL, write_dscr, &val, SMP_CALL_TYPE_SYNC);
 
 	return count;
 }
diff --git a/arch/powerpc/kernel/tau_6xx.c b/arch/powerpc/kernel/tau_6xx.c
index 828d0f4106d2..235905acf234 100644
--- a/arch/powerpc/kernel/tau_6xx.c
+++ b/arch/powerpc/kernel/tau_6xx.c
@@ -158,7 +158,7 @@ static struct workqueue_struct *tau_workq;
 static void tau_work_func(struct work_struct *work)
 {
 	msleep(shrink_timer);
-	on_each_cpu(tau_timeout, NULL, 0);
+	smp_call(SMP_CALL_ALL, tau_timeout, NULL, SMP_CALL_TYPE_ASYNC);
 	/* schedule ourselves to be run again */
 	queue_work(tau_workq, work);
 }
@@ -204,7 +204,7 @@ static int __init TAU_init(void)
 	if (!tau_workq)
 		return -ENOMEM;
 
-	on_each_cpu(TAU_init_smp, NULL, 0);
+	smp_call(SMP_CALL_ALL, TAU_init_smp, NULL, SMP_CALL_TYPE_ASYNC);
 
 	queue_work(tau_workq, &tau_work);
 
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 6cc7793b8420..01f2428108fc 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -225,7 +225,7 @@ static void wake_offline_cpus(void)
 static void kexec_prepare_cpus(void)
 {
 	wake_offline_cpus();
-	smp_call_function(kexec_smp_down, NULL, /* wait */0);
+	smp_call(SMP_CALL_ALL, kexec_smp_down, NULL, SMP_CALL_TYPE_ASYNC);
 	local_irq_disable();
 	hard_irq_disable();
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 0aeb51738ca9..776f9d1dfe99 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1555,7 +1555,7 @@ long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
 
 	/* Boot all CPUs out of the guest so they re-read
 	 * mmu_ready */
-	on_each_cpu(resize_hpt_boot_vcpu, NULL, 1);
+	smp_call(SMP_CALL_ALL, resize_hpt_boot_vcpu, NULL, SMP_CALL_TYPE_SYNC);
 
 	ret = -ENXIO;
 	if (!resize || (resize->order != shift))
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 052e6590f84f..35c96cae62f9 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -111,7 +111,7 @@ static void do_serialize(void *arg)
 void serialize_against_pte_lookup(struct mm_struct *mm)
 {
 	smp_mb();
-	smp_call_function_many(mm_cpumask(mm), do_serialize, mm, 1);
+	smp_call_mask(mm_cpumask(mm), do_serialize, mm, SMP_CALL_TYPE_SYNC);
 }
 
 /*
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 7724af19ed7e..9166c6ce4ad2 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -464,7 +464,7 @@ static inline void _tlbiel_pid_multicast(struct mm_struct *mm,
 	struct cpumask *cpus = mm_cpumask(mm);
 	struct tlbiel_pid t = { .pid = pid, .ric = ric };
 
-	on_each_cpu_mask(cpus, do_tlbiel_pid, &t, 1);
+	smp_call_mask(cpus, do_tlbiel_pid, &t, SMP_CALL_TYPE_SYNC);
 	/*
 	 * Always want the CPU translations to be invalidated with tlbiel in
 	 * these paths, so while coprocessors must use tlbie, we can not
@@ -616,7 +616,7 @@ static inline void _tlbiel_va_multicast(struct mm_struct *mm,
 {
 	struct cpumask *cpus = mm_cpumask(mm);
 	struct tlbiel_va t = { .va = va, .pid = pid, .psize = psize, .ric = ric };
-	on_each_cpu_mask(cpus, do_tlbiel_va, &t, 1);
+	smp_call_mask(cpus, do_tlbiel_va, &t, SMP_CALL_TYPE_SYNC);
 	if (atomic_read(&mm->context.copros) > 0)
 		_tlbie_va(va, pid, psize, RIC_FLUSH_TLB);
 }
@@ -682,7 +682,7 @@ static inline void _tlbiel_va_range_multicast(struct mm_struct *mm,
 				.pid = pid, .page_size = page_size,
 				.psize = psize, .also_pwc = also_pwc };
 
-	on_each_cpu_mask(cpus, do_tlbiel_va_range, &t, 1);
+	smp_call_mask(cpus, do_tlbiel_va_range, &t, SMP_CALL_TYPE_SYNC);
 	if (atomic_read(&mm->context.copros) > 0)
 		_tlbie_va_range(start, end, pid, page_size, psize, also_pwc);
 }
@@ -827,8 +827,8 @@ static void exit_flush_lazy_tlbs(struct mm_struct *mm)
 	 * make a special powerpc IPI for flushing TLBs.
 	 * For now it's not too performance critical.
 	 */
-	smp_call_function_many(mm_cpumask(mm), do_exit_flush_lazy_tlb,
-				(void *)mm, 1);
+	smp_call_mask(mm_cpumask(mm), do_exit_flush_lazy_tlb,
+				(void *)mm, SMP_CALL_TYPE_SYNC);
 }
 
 #else /* CONFIG_SMP */
@@ -1064,7 +1064,7 @@ static void do_tlbiel_kernel(void *info)
 
 static inline void _tlbiel_kernel_broadcast(void)
 {
-	on_each_cpu(do_tlbiel_kernel, NULL, 1);
+	smp_call(SMP_CALL_ALL, do_tlbiel_kernel, NULL, SMP_CALL_TYPE_SYNC);
 	if (tlbie_capable) {
 		/*
 		 * Coherent accelerators don't refcount kernel memory mappings,
diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index fd2c77af5c55..287c8ca8a974 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -276,8 +276,8 @@ void flush_tlb_mm(struct mm_struct *mm)
 	if (!mm_is_core_local(mm)) {
 		struct tlb_flush_param p = { .pid = pid };
 		/* Ignores smp_processor_id() even if set. */
-		smp_call_function_many(mm_cpumask(mm),
-				       do_flush_tlb_mm_ipi, &p, 1);
+		smp_call_mask(mm_cpumask(mm),
+			       do_flush_tlb_mm_ipi, &p, SMP_CALL_TYPE_SYNC);
 	}
 	_tlbil_pid(pid);
  no_context:
@@ -321,8 +321,8 @@ void __flush_tlb_page(struct mm_struct *mm, unsigned long vmaddr,
 				.ind = ind,
 			};
 			/* Ignores smp_processor_id() even if set in cpu_mask */
-			smp_call_function_many(cpu_mask,
-					       do_flush_tlb_page_ipi, &p, 1);
+			smp_call_mask(cpu_mask,
+				       do_flush_tlb_page_ipi, &p, SMP_CALL_TYPE_SYNC);
 		}
 	}
 	_tlbil_va(vmaddr, pid, tsize, ind);
@@ -362,7 +362,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
 #ifdef CONFIG_SMP
 	preempt_disable();
-	smp_call_function(do_flush_tlb_mm_ipi, NULL, 1);
+	smp_call(SMP_CALL_ALL, do_flush_tlb_mm_ipi, NULL, SMP_CALL_TYPE_SYNC);
 	_tlbil_pid(0);
 	preempt_enable();
 #else
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index f42711f865f3..a597007d8cf4 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -464,7 +464,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 		 */
 		mm_ctx_set_slb_addr_limit(&mm->context, high_limit);
 
-		on_each_cpu(slice_flush_segments, mm, 1);
+		smp_call(SMP_CALL_ALL, slice_flush_segments, mm, SMP_CALL_TYPE_SYNC);
 	}
 
 	/* Sanity checks */
@@ -626,7 +626,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 		 !bitmap_empty(potential_mask.high_slices, SLICE_NUM_HIGH))) {
 		slice_convert(mm, &potential_mask, psize);
 		if (psize > MMU_PAGE_BASE)
-			on_each_cpu(slice_flush_segments, mm, 1);
+			smp_call(SMP_CALL_ALL, slice_flush_segments, mm, SMP_CALL_TYPE_SYNC);
 	}
 	return newaddr;
 
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index b5b42cf0a703..548dccc3efe1 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2529,7 +2529,7 @@ static int __init init_ppc64_pmu(void)
 {
 	if (cpu_has_feature(CPU_FTR_HVMODE) && pmu_override) {
 		pr_warn("disabling perf due to pmu_override= command line option.\n");
-		on_each_cpu(do_pmu_override, NULL, 1);
+		smp_call(SMP_CALL_ALL, do_pmu_override, NULL, SMP_CALL_TYPE_SYNC);
 		return 0;
 	}
 
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 526d4b767534..6ecf8315e928 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1583,7 +1583,7 @@ static void thread_imc_ldbar_disable(void *dummy)
 
 void thread_imc_disable(void)
 {
-	on_each_cpu(thread_imc_ldbar_disable, NULL, 1);
+	smp_call(SMP_CALL_ALL, thread_imc_ldbar_disable, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void cleanup_all_thread_imc_memory(void)
diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index a1c6a7827c8f..fddf5e1cda6d 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -445,7 +445,7 @@ static void mpc85xx_smp_machine_kexec(struct kimage *image)
 	int i, num_cpus = num_present_cpus();
 
 	if (image->type == KEXEC_TYPE_DEFAULT)
-		smp_call_function(mpc85xx_smp_kexec_down, NULL, 0);
+		smp_call(SMP_CALL_ALL, mpc85xx_smp_kexec_down, NULL, SMP_CALL_TYPE_ASYNC);
 
 	while ( (atomic_read(&kexec_down_cpus) != (num_cpus - 1)) &&
 		( timeout > 0 ) )
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index a6677a111aca..5bc261517d08 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -202,7 +202,7 @@ static ssize_t store_fastsleep_workaround_applyonce(struct device *dev,
 	power7_fastsleep_workaround_exit = false;
 
 	cpus_read_lock();
-	on_each_cpu(pnv_fastsleep_workaround_apply, &err, 1);
+	smp_call(SMP_CALL_ALL, pnv_fastsleep_workaround_apply, &err, SMP_CALL_TYPE_SYNC);
 	cpus_read_unlock();
 	if (err) {
 		pr_err("fastsleep_workaround_applyonce change failed while running pnv_fastsleep_workaround_apply");
diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index 2119c003fcf9..3a5d31321366 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -61,7 +61,7 @@ static unsigned long get_purr(void)
 {
 	atomic64_t purr = ATOMIC64_INIT(0);
 
-	on_each_cpu(cpu_get_purr, &purr, 1);
+	smp_call(SMP_CALL_ALL, cpu_get_purr, &purr, SMP_CALL_TYPE_SYNC);
 
 	return atomic64_read(&purr);
 }
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 6cb7d96ad9c7..8d64bb39aff2 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -21,7 +21,7 @@ void flush_icache_all(void)
 	if (IS_ENABLED(CONFIG_RISCV_SBI))
 		sbi_remote_fence_i(NULL);
 	else
-		on_each_cpu(ipi_remote_fence_i, NULL, 1);
+		smp_call(SMP_CALL_ALL, ipi_remote_fence_i, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(flush_icache_all);
 
@@ -69,7 +69,7 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
 	} else if (IS_ENABLED(CONFIG_RISCV_SBI)) {
 		sbi_remote_fence_i(&others);
 	} else {
-		on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
+		smp_call_mask(&others, ipi_remote_fence_i, NULL, SMP_CALL_TYPE_SYNC);
 	}
 
 	preempt_enable();
diff --git a/arch/s390/hypfs/hypfs_diag0c.c b/arch/s390/hypfs/hypfs_diag0c.c
index 9a2786079e3a..f513e1529b6a 100644
--- a/arch/s390/hypfs/hypfs_diag0c.c
+++ b/arch/s390/hypfs/hypfs_diag0c.c
@@ -51,7 +51,7 @@ static void *diag0c_store(unsigned int *count)
 		cpu_vec[cpu] = &diag0c_data->entry[i++];
 	}
 	/* Collect data all CPUs */
-	on_each_cpu(diag0c_fn, cpu_vec, 1);
+	smp_call(SMP_CALL_ALL, diag0c_fn, cpu_vec, SMP_CALL_TYPE_SYNC);
 	*count = cpu_count;
 	kfree(cpu_vec);
 	cpus_read_unlock();
diff --git a/arch/s390/kernel/alternative.c b/arch/s390/kernel/alternative.c
index cce0ddee2d02..2c112748a785 100644
--- a/arch/s390/kernel/alternative.c
+++ b/arch/s390/kernel/alternative.c
@@ -121,7 +121,7 @@ static void do_sync_core(void *info)
 
 void text_poke_sync(void)
 {
-	on_each_cpu(do_sync_core, NULL, 1);
+	smp_call(SMP_CALL_ALL, do_sync_core, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 void text_poke_sync_lock(void)
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index 483ab5e10164..5629b32a7f5a 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -923,7 +923,7 @@ static void cfset_all_stop(struct cfset_request *req)
 	};
 
 	cpumask_and(&req->mask, &req->mask, cpu_online_mask);
-	on_each_cpu_mask(&req->mask, cfset_ioctl_off, &p, 1);
+	smp_call_mask(&req->mask, cfset_ioctl_off, &p, SMP_CALL_TYPE_SYNC);
 }
 
 /* Release function is also called when application gets terminated without
@@ -940,7 +940,7 @@ static int cfset_release(struct inode *inode, struct file *file)
 		file->private_data = NULL;
 	}
 	if (!atomic_dec_return(&cfset_opencnt))
-		on_each_cpu(cfset_release_cpu, NULL, 1);
+		smp_call(SMP_CALL_ALL, cfset_release_cpu, NULL, SMP_CALL_TYPE_SYNC);
 	mutex_unlock(&cfset_ctrset_mutex);
 
 	hw_perf_event_destroy(NULL);
@@ -974,9 +974,9 @@ static int cfset_all_start(struct cfset_request *req)
 	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
 		return -ENOMEM;
 	cpumask_and(mask, &req->mask, cpu_online_mask);
-	on_each_cpu_mask(mask, cfset_ioctl_on, &p, 1);
+	smp_call_mask(mask, cfset_ioctl_on, &p, SMP_CALL_TYPE_SYNC);
 	if (atomic_read(&p.cpus_ack) != cpumask_weight(mask)) {
-		on_each_cpu_mask(mask, cfset_ioctl_off, &p, 1);
+		smp_call_mask(mask, cfset_ioctl_off, &p, SMP_CALL_TYPE_SYNC);
 		rc = -EIO;
 		debug_sprintf_event(cf_dbg, 4, "%s CPUs missing", __func__);
 	}
@@ -1100,7 +1100,7 @@ static int cfset_all_read(unsigned long arg, struct cfset_request *req)
 
 	p.sets = req->ctrset;
 	cpumask_and(mask, &req->mask, cpu_online_mask);
-	on_each_cpu_mask(mask, cfset_cpu_read, &p, 1);
+	smp_call_mask(mask, cfset_cpu_read, &p, SMP_CALL_TYPE_SYNC);
 	rc = cfset_all_copy(arg, mask);
 	free_cpumask_var(mask);
 	return rc;
diff --git a/arch/s390/kernel/perf_cpum_cf_common.c b/arch/s390/kernel/perf_cpum_cf_common.c
index 8ee48672233f..d46b6ee17485 100644
--- a/arch/s390/kernel/perf_cpum_cf_common.c
+++ b/arch/s390/kernel/perf_cpum_cf_common.c
@@ -105,7 +105,7 @@ int __kernel_cpumcf_begin(void)
 {
 	int flags = PMC_INIT;
 
-	on_each_cpu(cpum_cf_setup_cpu, &flags, 1);
+	smp_call(SMP_CALL_ALL, cpum_cf_setup_cpu, &flags, SMP_CALL_TYPE_SYNC);
 	irq_subclass_register(IRQ_SUBCLASS_MEASUREMENT_ALERT);
 
 	return 0;
@@ -131,7 +131,7 @@ void __kernel_cpumcf_end(void)
 {
 	int flags = PMC_RELEASE;
 
-	on_each_cpu(cpum_cf_setup_cpu, &flags, 1);
+	smp_call(SMP_CALL_ALL, cpum_cf_setup_cpu, &flags, SMP_CALL_TYPE_SYNC);
 	irq_subclass_unregister(IRQ_SUBCLASS_MEASUREMENT_ALERT);
 }
 EXPORT_SYMBOL(__kernel_cpumcf_end);
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index 332a49965130..d0e1c7acb249 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -582,14 +582,14 @@ static void release_pmc_hardware(void)
 	int flags = PMC_RELEASE;
 
 	irq_subclass_unregister(IRQ_SUBCLASS_MEASUREMENT_ALERT);
-	on_each_cpu(setup_pmc_cpu, &flags, 1);
+	smp_call(SMP_CALL_ALL, setup_pmc_cpu, &flags, SMP_CALL_TYPE_SYNC);
 }
 
 static int reserve_pmc_hardware(void)
 {
 	int flags = PMC_INIT;
 
-	on_each_cpu(setup_pmc_cpu, &flags, 1);
+	smp_call(SMP_CALL_ALL, setup_pmc_cpu, &flags, SMP_CALL_TYPE_SYNC);
 	if (flags & PMC_FAILURE) {
 		release_pmc_hardware();
 		return -ENODEV;
diff --git a/arch/s390/kernel/processor.c b/arch/s390/kernel/processor.c
index 7a74ea5f7531..2c9652c0db6e 100644
--- a/arch/s390/kernel/processor.c
+++ b/arch/s390/kernel/processor.c
@@ -62,7 +62,7 @@ void s390_update_cpu_mhz(void)
 {
 	s390_adjust_jiffies();
 	if (machine_has_cpu_mhz)
-		on_each_cpu(update_cpu_mhz, NULL, 0);
+		smp_call(SMP_CALL_ALL, update_cpu_mhz, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 void notrace stop_machine_yield(const struct cpumask *cpumask)
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 30c91d565933..837e5f8a619c 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -595,7 +595,7 @@ void smp_ctl_set_clear_bit(int cr, int bit, bool set)
 	ctlreg = (ctlreg & parms.andval) | parms.orval;
 	put_abs_lowcore(cregs_save_area[cr], ctlreg);
 	spin_unlock(&ctl_lock);
-	on_each_cpu(smp_ctl_bit_callback, &parms, 1);
+	smp_call(SMP_CALL_ALL, smp_ctl_bit_callback, &parms, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(smp_ctl_set_clear_bit);
 
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index c6eecd4a5302..3df2a1f0c5fe 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -322,7 +322,7 @@ int arch_update_cpu_topology(void)
 	int cpu, rc;
 
 	rc = __arch_update_cpu_topology();
-	on_each_cpu(__arch_update_dedicated_flag, NULL, 0);
+	smp_call(SMP_CALL_ALL, __arch_update_dedicated_flag, NULL, SMP_CALL_TYPE_ASYNC);
 	for_each_online_cpu(cpu) {
 		dev = get_cpu_device(cpu);
 		if (dev)
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 2de48b2c1b04..f700843a703c 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -131,7 +131,7 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end)
 
 	spin_unlock_bh(&mm->page_table_lock);
 
-	on_each_cpu(__crst_table_upgrade, mm, 0);
+	smp_call(SMP_CALL_ALL, __crst_table_upgrade, mm, SMP_CALL_TYPE_ASYNC);
 
 	return 0;
 
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 500cd2dbdf53..136af9f32f23 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -440,7 +440,7 @@ static int __init zpci_directed_irq_init(void)
 		if (!zpci_ibv[cpu])
 			return -ENOMEM;
 	}
-	on_each_cpu(cpu_enable_directed_irq, NULL, 1);
+	smp_call(SMP_CALL_ALL, cpu_enable_directed_irq, NULL, SMP_CALL_TYPE_SYNC);
 
 	zpci_irq_chip.irq_set_affinity = zpci_set_irq_affinity;
 
diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
index 65924d9ec245..7370c301b55a 100644
--- a/arch/sh/kernel/smp.c
+++ b/arch/sh/kernel/smp.c
@@ -263,7 +263,7 @@ void smp_send_reschedule(int cpu)
 
 void smp_send_stop(void)
 {
-	smp_call_function(stop_this_cpu, 0, 0);
+	smp_call(SMP_CALL_ALL, stop_this_cpu, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 void arch_send_call_function_ipi_mask(const struct cpumask *mask)
@@ -335,7 +335,7 @@ static void flush_tlb_all_ipi(void *info)
 
 void flush_tlb_all(void)
 {
-	on_each_cpu(flush_tlb_all_ipi, 0, 1);
+	smp_call(SMP_CALL_ALL, flush_tlb_all_ipi, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void flush_tlb_mm_ipi(void *mm)
@@ -360,7 +360,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 	preempt_disable();
 
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
-		smp_call_function(flush_tlb_mm_ipi, (void *)mm, 1);
+		smp_call(SMP_CALL_ALL, flush_tlb_mm_ipi, (void *)mm, SMP_CALL_TYPE_SYNC);
 	} else {
 		int i;
 		for_each_online_cpu(i)
@@ -397,7 +397,7 @@ void flush_tlb_range(struct vm_area_struct *vma,
 		fd.vma = vma;
 		fd.addr1 = start;
 		fd.addr2 = end;
-		smp_call_function(flush_tlb_range_ipi, (void *)&fd, 1);
+		smp_call(SMP_CALL_ALL, flush_tlb_range_ipi, (void *)&fd, SMP_CALL_TYPE_SYNC);
 	} else {
 		int i;
 		for_each_online_cpu(i)
@@ -421,7 +421,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 
 	fd.addr1 = start;
 	fd.addr2 = end;
-	on_each_cpu(flush_tlb_kernel_range_ipi, (void *)&fd, 1);
+	smp_call(SMP_CALL_ALL, flush_tlb_kernel_range_ipi, (void *)&fd, SMP_CALL_TYPE_SYNC);
 }
 
 static void flush_tlb_page_ipi(void *info)
@@ -440,7 +440,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 
 		fd.vma = vma;
 		fd.addr1 = page;
-		smp_call_function(flush_tlb_page_ipi, (void *)&fd, 1);
+		smp_call(SMP_CALL_ALL, flush_tlb_page_ipi, (void *)&fd, SMP_CALL_TYPE_SYNC);
 	} else {
 		int i;
 		for_each_online_cpu(i)
@@ -464,7 +464,7 @@ void flush_tlb_one(unsigned long asid, unsigned long vaddr)
 	fd.addr1 = asid;
 	fd.addr2 = vaddr;
 
-	smp_call_function(flush_tlb_one_ipi, (void *)&fd, 1);
+	smp_call(SMP_CALL_ALL, flush_tlb_one_ipi, (void *)&fd, SMP_CALL_TYPE_SYNC);
 	local_flush_tlb_one(asid, vaddr);
 }
 
diff --git a/arch/sh/mm/cache.c b/arch/sh/mm/cache.c
index 3aef78ceb820..ac98c3f8f530 100644
--- a/arch/sh/mm/cache.c
+++ b/arch/sh/mm/cache.c
@@ -49,7 +49,7 @@ static inline void cacheop_on_each_cpu(void (*func) (void *info), void *info,
 	 * even attempt IPIs unless there are other CPUs online.
 	 */
 	if (num_online_cpus() > 1)
-		smp_call_function(func, info, wait);
+		smp_call(SMP_CALL_ALL, func, info, (wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC));
 #endif
 
 	func(info);
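
[ Where the old wait flag is only known at run time, as in
  cacheop_on_each_cpu() above, the conversion picks the call type with
  a ternary. The same pattern in isolation, illustrative only:

	unsigned int type = wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC;

	smp_call(SMP_CALL_ALL, func, info, type);
]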
diff --git a/arch/sparc/include/asm/mman.h b/arch/sparc/include/asm/mman.h
index 274217e7ed70..038099bcabea 100644
--- a/arch/sparc/include/asm/mman.h
+++ b/arch/sparc/include/asm/mman.h
@@ -37,8 +37,8 @@ static inline unsigned long sparc_calc_vm_prot_bits(unsigned long prot)
 			regs = task_pt_regs(current);
 			regs->tstate |= TSTATE_MCDE;
 			current->mm->context.adi = true;
-			on_each_cpu_mask(mm_cpumask(current->mm),
-					 ipi_set_tstate_mcde, current->mm, 0);
+			smp_call_mask(mm_cpumask(current->mm),
+				ipi_set_tstate_mcde, current->mm, SMP_CALL_TYPE_ASYNC);
 		}
 		return VM_SPARC_ADI;
 	} else {
diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 060fff95a305..02e5d1c54fcc 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -176,7 +176,7 @@ static int __init check_nmi_watchdog(void)
 
 	printk(KERN_INFO "Testing NMI watchdog ... ");
 
-	smp_call_function(nmi_cpu_busy, (void *)&endflag, 0);
+	smp_call(SMP_CALL_ALL, nmi_cpu_busy, (void *)&endflag, SMP_CALL_TYPE_ASYNC);
 
 	for_each_possible_cpu(cpu)
 		prev_nmi_count[cpu] = get_nmi_count(cpu);
@@ -203,7 +203,7 @@ static int __init check_nmi_watchdog(void)
 	kfree(prev_nmi_count);
 	return 0;
 error:
-	on_each_cpu(stop_nmi_watchdog, NULL, 1);
+	smp_call(SMP_CALL_ALL, stop_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 	return err;
 }
 
@@ -235,13 +235,13 @@ static void nmi_adjust_hz_one(void *unused)
 void nmi_adjust_hz(unsigned int new_hz)
 {
 	nmi_hz = new_hz;
-	on_each_cpu(nmi_adjust_hz_one, NULL, 1);
+	smp_call(SMP_CALL_ALL, nmi_adjust_hz_one, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL_GPL(nmi_adjust_hz);
 
 static int nmi_shutdown(struct notifier_block *nb, unsigned long cmd, void *p)
 {
-	on_each_cpu(stop_nmi_watchdog, NULL, 1);
+	smp_call(SMP_CALL_ALL, stop_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 	return 0;
 }
 
@@ -253,13 +253,13 @@ int __init nmi_init(void)
 {
 	int err;
 
-	on_each_cpu(start_nmi_watchdog, NULL, 1);
+	smp_call(SMP_CALL_ALL, start_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 
 	err = check_nmi_watchdog();
 	if (!err) {
 		err = register_reboot_notifier(&nmi_reboot_notifier);
 		if (err) {
-			on_each_cpu(stop_nmi_watchdog, NULL, 1);
+			smp_call(SMP_CALL_ALL, stop_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 			atomic_set(&nmi_active, -1);
 		}
 	}
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index a58ae9c42803..a843eb4a0f2c 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1176,7 +1176,7 @@ static void perf_event_grab_pmc(void)
 	mutex_lock(&pmc_grab_mutex);
 	if (atomic_read(&active_events) == 0) {
 		if (atomic_read(&nmi_active) > 0) {
-			on_each_cpu(perf_stop_nmi_watchdog, NULL, 1);
+			smp_call(SMP_CALL_ALL, perf_stop_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 			BUG_ON(atomic_read(&nmi_active) != 0);
 		}
 		atomic_inc(&active_events);
@@ -1188,7 +1188,7 @@ static void perf_event_release_pmc(void)
 {
 	if (atomic_dec_and_mutex_lock(&active_events, &pmc_grab_mutex)) {
 		if (atomic_read(&nmi_active) == 0)
-			on_each_cpu(start_nmi_watchdog, NULL, 1);
+			smp_call(SMP_CALL_ALL, start_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 		mutex_unlock(&pmc_grab_mutex);
 	}
 }
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index a1f78e9ddaf3..f86a4cda59dd 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -901,7 +901,7 @@ static void tsb_sync(void *info)
 
 void smp_tsb_sync(struct mm_struct *mm)
 {
-	smp_call_function_many(mm_cpumask(mm), tsb_sync, mm, 1);
+	smp_call_mask(mm_cpumask(mm), tsb_sync, mm, SMP_CALL_TYPE_SYNC);
 }
 
 extern unsigned long xcall_flush_tlb_mm;
@@ -1084,8 +1084,8 @@ void smp_flush_tlb_pending(struct mm_struct *mm, unsigned long nr, unsigned long
 	info.nr = nr;
 	info.vaddrs = vaddrs;
 
-	smp_call_function_many(mm_cpumask(mm), tlb_pending_func,
-			       &info, 1);
+	smp_call_mask(mm_cpumask(mm), tlb_pending_func,
+		       &info, SMP_CALL_TYPE_SYNC);
 
 	__flush_tlb_pending(ctx, nr, vaddrs);
 
@@ -1523,7 +1523,7 @@ void smp_send_stop(void)
 				prom_stopcpu_cpuid(cpu);
 		}
 	} else
-		smp_call_function(stop_this_cpu, NULL, 0);
+		smp_call(SMP_CALL_ALL, stop_this_cpu, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 static int __init pcpu_cpu_distance(unsigned int from, unsigned int to)
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 8b1911591581..dca8c077c8ad 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -3024,7 +3024,7 @@ void hugetlb_setup(struct pt_regs *regs)
 		spin_unlock_irq(&ctx_alloc_lock);
 
 		if (need_context_reload)
-			on_each_cpu(context_reload, mm, 0);
+			smp_call(SMP_CALL_ALL, context_reload, mm, SMP_CALL_TYPE_ASYNC);
 	}
 }
 #endif
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index eef816fc216d..03515ba1f36a 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2526,7 +2526,7 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
 	mmap_assert_write_locked(mm);
 
 	if (atomic_inc_return(&mm->context.perf_rdpmc_allowed) == 1)
-		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
+		smp_call_mask(mm_cpumask(mm), cr4_update_pce, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
@@ -2535,7 +2535,7 @@ static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *m
 		return;
 
 	if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
-		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
+		smp_call_mask(mm_cpumask(mm), cr4_update_pce, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static int x86_pmu_event_idx(struct perf_event *event)
@@ -2591,7 +2591,7 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
 		else if (x86_pmu.attr_rdpmc == 2)
 			static_branch_dec(&rdpmc_always_available_key);
 
-		on_each_cpu(cr4_update_pce, NULL, 1);
+		smp_call(SMP_CALL_ALL, cr4_update_pce, NULL, SMP_CALL_TYPE_SYNC);
 		x86_pmu.attr_rdpmc = val;
 	}
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index fc7f458eb3de..2e94f33d1aab 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5123,7 +5123,7 @@ static ssize_t freeze_on_smi_store(struct device *cdev,
 	x86_pmu.attr_freeze_on_smi = val;
 
 	cpus_read_lock();
-	on_each_cpu(flip_smm_bit, &val, 1);
+	smp_call(SMP_CALL_ALL, flip_smm_bit, &val, SMP_CALL_TYPE_SYNC);
 	cpus_read_unlock();
 done:
 	mutex_unlock(&freeze_on_smi_mutex);
@@ -5168,7 +5168,7 @@ static ssize_t set_sysctl_tfa(struct device *cdev,
 	allow_tsx_force_abort = val;
 
 	cpus_read_lock();
-	on_each_cpu(update_tfa_sched, NULL, 1);
+	smp_call(SMP_CALL_ALL, update_tfa_sched, NULL, SMP_CALL_TYPE_SYNC);
 	cpus_read_unlock();
 
 	return count;
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index d374cb3cf024..b8d93fb7c33b 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1181,7 +1181,7 @@ static void do_sync_core(void *info)
 
 void text_poke_sync(void)
 {
-	on_each_cpu(do_sync_core, NULL, 1);
+	smp_call(SMP_CALL_ALL, do_sync_core, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 struct text_poke_loc {
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 020c906f7934..4edc11afcd78 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -488,7 +488,7 @@ static __init void fix_erratum_688(void)
 	if (val & BIT(2))
 		return;
 
-	on_each_cpu(__fix_erratum_688, NULL, 0);
+	smp_call(SMP_CALL_ALL, __fix_erratum_688, NULL, SMP_CALL_TYPE_ASYNC);
 
 	pr_info("x86/cpu/AMD: CPU erratum 688 worked around\n");
 }
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b70344bf6600..87a68837db30 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -656,7 +656,7 @@ void lapic_update_tsc_freq(void)
 	 * changed. In order to avoid races, schedule the frequency
 	 * update code on each CPU.
 	 */
-	on_each_cpu(__lapic_update_tsc_freq, NULL, 0);
+	smp_call(SMP_CALL_ALL, __lapic_update_tsc_freq, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 6296e1ebed1d..a6806b73f7e3 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1091,7 +1091,7 @@ static void update_stibp_strict(void)
 	pr_info("Update user space SMT mitigation: STIBP %s\n",
 		mask & SPEC_CTRL_STIBP ? "always-on" : "off");
 	x86_spec_ctrl_base = mask;
-	on_each_cpu(update_stibp_msr, NULL, 1);
+	smp_call(SMP_CALL_ALL, update_stibp_msr, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 /* Update the static key controlling the evaluation of TIF_SPEC_IB */
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 981496e6bc0e..e8ddca5aacf1 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2207,7 +2207,7 @@ void mce_disable_bank(int bank)
 		return;
 	}
 	set_bit(bank, mce_banks_ce_disabled);
-	on_each_cpu(__mce_disable_bank, &bank, 1);
+	smp_call(SMP_CALL_ALL, __mce_disable_bank, &bank, SMP_CALL_TYPE_SYNC);
 }
 
 /*
@@ -2362,7 +2362,7 @@ static void mce_cpu_restart(void *data)
 static void mce_restart(void)
 {
 	mce_timer_delete_all();
-	on_each_cpu(mce_cpu_restart, NULL, 1);
+	smp_call(SMP_CALL_ALL, mce_cpu_restart, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 /* Toggle features for corrected errors */
@@ -2450,12 +2450,12 @@ static ssize_t set_ignore_ce(struct device *s,
 		if (new) {
 			/* disable ce features */
 			mce_timer_delete_all();
-			on_each_cpu(mce_disable_cmci, NULL, 1);
+			smp_call(SMP_CALL_ALL, mce_disable_cmci, NULL, SMP_CALL_TYPE_SYNC);
 			mca_cfg.ignore_ce = true;
 		} else {
 			/* enable ce features */
 			mca_cfg.ignore_ce = false;
-			on_each_cpu(mce_enable_ce, (void *)1, 1);
+			smp_call(SMP_CALL_ALL, mce_enable_ce, (void *)1, SMP_CALL_TYPE_SYNC);
 		}
 	}
 	mutex_unlock(&mce_sysfs_mutex);
@@ -2476,12 +2476,12 @@ static ssize_t set_cmci_disabled(struct device *s,
 	if (mca_cfg.cmci_disabled ^ !!new) {
 		if (new) {
 			/* disable cmci */
-			on_each_cpu(mce_disable_cmci, NULL, 1);
+			smp_call(SMP_CALL_ALL, mce_disable_cmci, NULL, SMP_CALL_TYPE_SYNC);
 			mca_cfg.cmci_disabled = true;
 		} else {
 			/* enable cmci */
 			mca_cfg.cmci_disabled = false;
-			on_each_cpu(mce_enable_ce, NULL, 1);
+			smp_call(SMP_CALL_ALL, mce_enable_ce, NULL, SMP_CALL_TYPE_SYNC);
 		}
 	}
 	mutex_unlock(&mce_sysfs_mutex);
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 5fbd7ffb3233..728c7df0f56f 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -263,10 +263,8 @@ static void __maybe_unused raise_mce(struct mce *m)
 				 * don't wait because mce_irq_ipi is necessary
 				 * to be sync with following raise_local
 				 */
-				preempt_disable();
-				smp_call_function_many(mce_inject_cpumask,
-					mce_irq_ipi, NULL, 0);
-				preempt_enable();
+				smp_call_mask(mce_inject_cpumask,
+					mce_irq_ipi, NULL, SMP_CALL_TYPE_ASYNC);
 			} else if (m->inject_flags & MCJ_NMI_BROADCAST)
 				apic->send_IPI_mask(mce_inject_cpumask,
 						NMI_VECTOR);
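
[ Note the pattern above: the preempt_disable()/preempt_enable() pair
  that old smp_call_function_many() callers needed is dropped, on the
  assumption that the new interface takes care of CPU pinning itself.
  Illustrative sketch with a hypothetical handler fn():

	/* old: the caller had to pin itself to a CPU around the call */
	preempt_disable();
	smp_call_function_many(mask, fn, NULL, 0);
	preempt_enable();

	/* new: smp_call_mask() can be called directly */
	smp_call_mask(mask, fn, NULL, SMP_CALL_TYPE_ASYNC);
]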
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index 95275a5e57e0..787dfc787336 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -400,7 +400,7 @@ void cmci_rediscover(void)
 	if (!cmci_supported(&banks))
 		return;
 
-	on_each_cpu(cmci_rediscover_work_func, NULL, 1);
+	smp_call(SMP_CALL_ALL, cmci_rediscover_work_func, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 418c83c3c4b5..45eb7b067436 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -326,7 +326,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 	if (cpumask_test_cpu(cpu, cpu_mask))
 		rdt_ctrl_update(&msr_param);
 	/* Update resource control msr on other CPUs. */
-	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
+	smp_call_mask(cpu_mask, rdt_ctrl_update, &msr_param, SMP_CALL_TYPE_SYNC);
 	put_cpu();
 
 done:
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 83f901e2c2df..aafa68060c45 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -329,7 +329,7 @@ update_closid_rmid(const struct cpumask *cpu_mask, struct rdtgroup *r)
 
 	if (cpumask_test_cpu(cpu, cpu_mask))
 		update_cpu_closid_rmid(r);
-	smp_call_function_many(cpu_mask, update_cpu_closid_rmid, r, 1);
+	smp_call_mask(cpu_mask, update_cpu_closid_rmid, r, SMP_CALL_TYPE_SYNC);
 	put_cpu();
 }
 
@@ -1866,7 +1866,7 @@ static int set_cache_qos_cfg(int level, bool enable)
 	if (cpumask_test_cpu(cpu, cpu_mask))
 		update(&enable);
 	/* Update QOS_CFG MSR on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, update, &enable, 1);
+	smp_call_mask(cpu_mask, update, &enable, SMP_CALL_TYPE_SYNC);
 	put_cpu();
 
 	free_cpumask_var(cpu_mask);
@@ -2335,7 +2335,7 @@ static int reset_all_ctrls(struct rdt_resource *r)
 	if (cpumask_test_cpu(cpu, cpu_mask))
 		rdt_ctrl_update(&msr_param);
 	/* Update CBM on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
+	smp_call_mask(cpu_mask, rdt_ctrl_update, &msr_param, SMP_CALL_TYPE_SYNC);
 	put_cpu();
 
 	free_cpumask_var(cpu_mask);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8e4bc6453d26..67bdfe2c3069 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -278,8 +278,9 @@ static void sgx_encl_ewb(struct sgx_epc_page *epc_page,
 			 * miss cpus that entered the enclave between
 			 * generating the mask and incrementing epoch.
 			 */
-			on_each_cpu_mask(sgx_encl_ewb_cpumask(encl),
-					 sgx_ipi_cb, NULL, 1);
+			smp_call_mask(sgx_encl_ewb_cpumask(encl),
+				 sgx_ipi_cb, NULL, SMP_CALL_TYPE_SYNC);
+
 			ret = __sgx_encl_ewb(epc_page, va_slot, backing);
 		}
 	}
diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index ec8064c0ae03..86a9dd98d596 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -120,7 +120,7 @@ static inline void umwait_update_control(u32 maxtime, bool c02_enable)
 
 	WRITE_ONCE(umwait_control_cached, ctrl);
 	/* Propagate to all CPUs */
-	on_each_cpu(umwait_update_control_msr, NULL, 1);
+	smp_call(SMP_CALL_ALL, umwait_update_control_msr, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static ssize_t
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index c04b933f48d3..741e49bfaa1f 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -282,7 +282,7 @@ static int vmware_pv_reboot_notify(struct notifier_block *nb,
 				unsigned long code, void *unused)
 {
 	if (code == SYS_RESTART)
-		on_each_cpu(vmware_pv_guest_cpu_reboot, NULL, 1);
+		smp_call(SMP_CALL_ALL, vmware_pv_guest_cpu_reboot, NULL, SMP_CALL_TYPE_SYNC);
 	return NOTIFY_DONE;
 }
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a22deb58f86d..5955093e0deb 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -730,7 +730,7 @@ static int kvm_pv_reboot_notify(struct notifier_block *nb,
 				unsigned long code, void *unused)
 {
 	if (code == SYS_RESTART)
-		on_each_cpu(kvm_pv_guest_cpu_reboot, NULL, 1);
+		smp_call(SMP_CALL_ALL, kvm_pv_guest_cpu_reboot, NULL, SMP_CALL_TYPE_SYNC);
 	return NOTIFY_DONE;
 }
 
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 525876e7b9f4..3243e2507d73 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -424,7 +424,7 @@ static void install_ldt(struct mm_struct *mm, struct ldt_struct *ldt)
 	smp_store_release(&mm->context.ldt, ldt);
 
 	/* Activate the LDT for all CPUs using currents mm. */
-	on_each_cpu_mask(mm_cpumask(mm), flush_ldt, mm, true);
+	smp_call_mask(mm_cpumask(mm), flush_ldt, mm, SMP_CALL_TYPE_SYNC);
 
 	mutex_unlock(&mm->context.lock);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c0ca599a353..9cd1dd022879 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7443,8 +7443,8 @@ static int kvm_emulate_wbinvd_noskip(struct kvm_vcpu *vcpu)
 		int cpu = get_cpu();
 
 		cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
-		on_each_cpu_mask(vcpu->arch.wbinvd_dirty_mask,
-				wbinvd_ipi, NULL, 1);
+		smp_call_mask(vcpu->arch.wbinvd_dirty_mask,
+				wbinvd_ipi, NULL, SMP_CALL_TYPE_SYNC);
 		put_cpu();
 		cpumask_clear(vcpu->arch.wbinvd_dirty_mask);
 	} else
diff --git a/arch/x86/lib/cache-smp.c b/arch/x86/lib/cache-smp.c
index 7c48ff4ae8d1..20d00dc2bb8c 100644
--- a/arch/x86/lib/cache-smp.c
+++ b/arch/x86/lib/cache-smp.c
@@ -15,7 +15,7 @@ EXPORT_SYMBOL(wbinvd_on_cpu);
 
 int wbinvd_on_all_cpus(void)
 {
-	on_each_cpu(__wbinvd, NULL, 1);
+	smp_call(SMP_CALL_ALL, __wbinvd, NULL, SMP_CALL_TYPE_SYNC);
 	return 0;
 }
 EXPORT_SYMBOL(wbinvd_on_all_cpus);
diff --git a/arch/x86/lib/msr-smp.c b/arch/x86/lib/msr-smp.c
index 40bbe56bde32..139ba6a2240a 100644
--- a/arch/x86/lib/msr-smp.c
+++ b/arch/x86/lib/msr-smp.c
@@ -113,7 +113,7 @@ static void __rwmsr_on_cpus(const struct cpumask *mask, u32 msr_no,
 	if (cpumask_test_cpu(this_cpu, mask))
 		msr_func(&rv);
 
-	smp_call_function_many(mask, msr_func, &rv, 1);
+	smp_call_mask(mask, msr_func, &rv, SMP_CALL_TYPE_SYNC);
 	put_cpu();
 }
 
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index abf5ed76e4b7..920fe7a5e8e2 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -348,7 +348,7 @@ static void cpa_flush_all(unsigned long cache)
 {
 	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
 
-	on_each_cpu(__cpa_flush_all, (void *) cache, 1);
+	smp_call(SMP_CALL_ALL, __cpa_flush_all, (void *) cache, SMP_CALL_TYPE_SYNC);
 }
 
 static void __cpa_flush_tlb(void *data)
@@ -375,7 +375,7 @@ static void cpa_flush(struct cpa_data *data, int cache)
 	if (cpa->force_flush_all || cpa->numpages > tlb_single_page_flush_ceiling)
 		flush_tlb_all();
 	else
-		on_each_cpu(__cpa_flush_tlb, cpa, 1);
+		smp_call(SMP_CALL_ALL, __cpa_flush_tlb, cpa, SMP_CALL_TYPE_SYNC);
 
 	if (!cache)
 		return;
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d400b6d9d246..8771ca98af7b 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -889,10 +889,10 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
 	 * doing a speculative memory access.
 	 */
 	if (info->freed_tables)
-		on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
+		smp_call_mask(cpumask, flush_tlb_func, (void *)info, SMP_CALL_TYPE_SYNC);
 	else
-		on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
-				(void *)info, 1, cpumask);
+		smp_call_mask_cond(cpumask, flush_tlb_func, (void *)info,
+				tlb_is_not_lazy, SMP_CALL_TYPE_SYNC);
 }
 
 void flush_tlb_multi(const struct cpumask *cpumask,
@@ -1006,7 +1006,7 @@ static void do_flush_tlb_all(void *info)
 void flush_tlb_all(void)
 {
 	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
-	on_each_cpu(do_flush_tlb_all, NULL, 1);
+	smp_call(SMP_CALL_ALL, do_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void do_kernel_range_flush(void *info)
@@ -1024,14 +1024,14 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	/* Balance as user space task's flush, a bit conservative */
 	if (end == TLB_FLUSH_ALL ||
 	    (end - start) > tlb_single_page_flush_ceiling << PAGE_SHIFT) {
-		on_each_cpu(do_flush_tlb_all, NULL, 1);
+		smp_call(SMP_CALL_ALL, do_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 	} else {
 		struct flush_tlb_info *info;
 
 		preempt_disable();
 		info = get_flush_tlb_info(NULL, start, end, 0, false, 0);
 
-		on_each_cpu(do_kernel_range_flush, info, 1);
+		smp_call(SMP_CALL_ALL, do_kernel_range_flush, info, SMP_CALL_TYPE_SYNC);
 
 		put_flush_tlb_info();
 		preempt_enable();
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 00354866921b..1122038107ff 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -948,7 +948,7 @@ static void xen_drop_mm_ref(struct mm_struct *mm)
 			cpumask_set_cpu(cpu, mask);
 	}
 
-	smp_call_function_many(mask, drop_mm_ref_this_cpu, mm, 1);
+	smp_call_mask(mask, drop_mm_ref_this_cpu, mm, SMP_CALL_TYPE_SYNC);
 	free_cpumask_var(mask);
 }
 #else
diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 688aa8b6ae29..d44f86b9c9bc 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -428,7 +428,7 @@ static void stop_self(void *v)
 
 static void xen_pv_stop_other_cpus(int wait)
 {
-	smp_call_function(stop_self, NULL, wait);
+	smp_call(SMP_CALL_ALL, stop_self, NULL, (wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC));
 }
 
 static irqreturn_t xen_irq_work_interrupt(int irq, void *dev_id)
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 1d83152c761b..94abd0bb7a26 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -67,7 +67,7 @@ void xen_arch_resume(void)
 {
 	int cpu;
 
-	on_each_cpu(xen_vcpu_notify_restore, NULL, 1);
+	smp_call(SMP_CALL_ALL, xen_vcpu_notify_restore, NULL, SMP_CALL_TYPE_SYNC);
 
 	for_each_online_cpu(cpu)
 		xen_pmu_init(cpu);
@@ -80,5 +80,5 @@ void xen_arch_suspend(void)
 	for_each_online_cpu(cpu)
 		xen_pmu_finish(cpu);
 
-	on_each_cpu(xen_vcpu_notify_suspend, NULL, 1);
+	smp_call(SMP_CALL_ALL, xen_vcpu_notify_suspend, NULL, SMP_CALL_TYPE_SYNC);
 }
diff --git a/arch/xtensa/kernel/smp.c b/arch/xtensa/kernel/smp.c
index 1254da07ead1..5878ba8f0274 100644
--- a/arch/xtensa/kernel/smp.c
+++ b/arch/xtensa/kernel/smp.c
@@ -470,7 +470,7 @@ static void ipi_flush_tlb_all(void *arg)
 
 void flush_tlb_all(void)
 {
-	on_each_cpu(ipi_flush_tlb_all, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_all, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_tlb_mm(void *arg)
@@ -480,7 +480,7 @@ static void ipi_flush_tlb_mm(void *arg)
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	on_each_cpu(ipi_flush_tlb_mm, mm, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_mm, mm, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_tlb_page(void *arg)
@@ -495,7 +495,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 		.vma = vma,
 		.addr1 = addr,
 	};
-	on_each_cpu(ipi_flush_tlb_page, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_page, &fd, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_tlb_range(void *arg)
@@ -512,7 +512,7 @@ void flush_tlb_range(struct vm_area_struct *vma,
 		.addr1 = start,
 		.addr2 = end,
 	};
-	on_each_cpu(ipi_flush_tlb_range, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_range, &fd, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_tlb_kernel_range(void *arg)
@@ -527,7 +527,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		.addr1 = start,
 		.addr2 = end,
 	};
-	on_each_cpu(ipi_flush_tlb_kernel_range, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_tlb_kernel_range, &fd, SMP_CALL_TYPE_SYNC);
 }
 
 /* Cache flush functions */
@@ -539,7 +539,7 @@ static void ipi_flush_cache_all(void *arg)
 
 void flush_cache_all(void)
 {
-	on_each_cpu(ipi_flush_cache_all, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_cache_all, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_cache_page(void *arg)
@@ -556,7 +556,7 @@ void flush_cache_page(struct vm_area_struct *vma,
 		.addr1 = address,
 		.addr2 = pfn,
 	};
-	on_each_cpu(ipi_flush_cache_page, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_cache_page, &fd, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_cache_range(void *arg)
@@ -573,7 +573,7 @@ void flush_cache_range(struct vm_area_struct *vma,
 		.addr1 = start,
 		.addr2 = end,
 	};
-	on_each_cpu(ipi_flush_cache_range, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_cache_range, &fd, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_icache_range(void *arg)
@@ -588,7 +588,7 @@ void flush_icache_range(unsigned long start, unsigned long end)
 		.addr1 = start,
 		.addr2 = end,
 	};
-	on_each_cpu(ipi_flush_icache_range, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_icache_range, &fd, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(flush_icache_range);
 
@@ -607,7 +607,7 @@ static void system_invalidate_dcache_range(unsigned long start,
 		.addr1 = start,
 		.addr2 = size,
 	};
-	on_each_cpu(ipi_invalidate_dcache_range, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_invalidate_dcache_range, &fd, SMP_CALL_TYPE_SYNC);
 }
 
 static void ipi_flush_invalidate_dcache_range(void *arg)
@@ -623,5 +623,5 @@ static void system_flush_invalidate_dcache_range(unsigned long start,
 		.addr1 = start,
 		.addr2 = size,
 	};
-	on_each_cpu(ipi_flush_invalidate_dcache_range, &fd, 1);
+	smp_call(SMP_CALL_ALL, ipi_flush_invalidate_dcache_range, &fd, SMP_CALL_TYPE_SYNC);
 }
diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index 3ffbb1c80c5c..dddb96754e0b 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -1308,7 +1308,7 @@ static void ipi_handler(void *null)
 
 void global_cache_flush(void)
 {
-	on_each_cpu(ipi_handler, NULL, 1);
+	smp_call(SMP_CALL_ALL, ipi_handler, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(global_cache_flush);
 
diff --git a/drivers/clocksource/mips-gic-timer.c b/drivers/clocksource/mips-gic-timer.c
index be4175f415ba..3f0566027bf6 100644
--- a/drivers/clocksource/mips-gic-timer.c
+++ b/drivers/clocksource/mips-gic-timer.c
@@ -130,7 +130,7 @@ static int gic_clk_notifier(struct notifier_block *nb, unsigned long action,
 
 	if (action == POST_RATE_CHANGE) {
 		gic_clocksource_unstable("ref clock rate change");
-		on_each_cpu(gic_update_frequency, (void *)cnd->new_rate, 1);
+		smp_call(SMP_CALL_ALL, gic_update_frequency, (void *)cnd->new_rate, SMP_CALL_TYPE_SYNC);
 	}
 
 	return NOTIFY_OK;
diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index e0c4b3c6575d..d54d28b6ea81 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -129,8 +129,8 @@ static void boost_set_msr_each(void *p_en)
 
 static int set_boost(struct cpufreq_policy *policy, int val)
 {
-	on_each_cpu_mask(policy->cpus, boost_set_msr_each,
-			 (void *)(long)val, 1);
+	smp_call_mask(policy->cpus, boost_set_msr_each,
+			(void *)(long)val, SMP_CALL_TYPE_SYNC);
 	pr_debug("CPU %*pbl: Core Boosting %sabled.\n",
 		 cpumask_pr_args(policy->cpus), val ? "en" : "dis");
 
@@ -340,7 +340,7 @@ static void drv_write(struct acpi_cpufreq_data *data,
 	if (cpumask_test_cpu(this_cpu, mask))
 		do_drv_write(&cmd);
 
-	smp_call_function_many(mask, do_drv_write, &cmd, 1);
+	smp_call_mask(mask, do_drv_write, &cmd, SMP_CALL_TYPE_SYNC);
 	put_cpu();
 }
 
diff --git a/drivers/cpufreq/tegra194-cpufreq.c b/drivers/cpufreq/tegra194-cpufreq.c
index ac381db25dbe..99fc456f16b6 100644
--- a/drivers/cpufreq/tegra194-cpufreq.c
+++ b/drivers/cpufreq/tegra194-cpufreq.c
@@ -265,7 +265,7 @@ static int tegra194_cpufreq_set_target(struct cpufreq_policy *policy,
 	 * in a cluster run at same frequency which is the maximum frequency
 	 * request out of the values requested by both cores in that cluster.
 	 */
-	on_each_cpu_mask(policy->cpus, set_cpu_ndiv, tbl, true);
+	smp_call_mask(policy->cpus, set_cpu_ndiv, tbl, SMP_CALL_TYPE_SYNC);
 
 	return 0;
 }
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index f70aa17e2a8e..a7e6f95e6a9c 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -225,8 +225,8 @@ static int __cpuidle_register_driver(struct cpuidle_driver *drv)
 		return ret;
 
 	if (drv->bctimer)
-		on_each_cpu_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
-				 (void *)1, 1);
+		smp_call_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
+				 (void *)1, SMP_CALL_TYPE_SYNC);
 
 	return 0;
 }
@@ -244,8 +244,8 @@ static void __cpuidle_unregister_driver(struct cpuidle_driver *drv)
 {
 	if (drv->bctimer) {
 		drv->bctimer = 0;
-		on_each_cpu_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
-				 NULL, 1);
+		smp_call_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
+				 NULL, SMP_CALL_TYPE_SYNC);
 	}
 
 	__cpuidle_unset_driver(drv);
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 812baa48b290..69038cd22924 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -793,7 +793,7 @@ static ssize_t inject_write_store(struct device *dev,
 			"/sys/bus/machinecheck/devices/machinecheck<CPUNUM>/check_interval\n"
 			"so that you can get the error report faster.\n");
 
-	on_each_cpu(disable_caches, NULL, 1);
+	smp_call(SMP_CALL_ALL, disable_caches, NULL, SMP_CALL_TYPE_SYNC);
 
 	/* Issue 'word' and 'bit' along with the READ request */
 	amd64_write_pci_cfg(pvt->F3, F10_NB_ARRAY_DATA, word_bits);
@@ -806,7 +806,7 @@ static ssize_t inject_write_store(struct device *dev,
 		goto retry;
 	}
 
-	on_each_cpu(enable_caches, NULL, 1);
+	smp_call(SMP_CALL_ALL, enable_caches, NULL, SMP_CALL_TYPE_SYNC);
 
 	edac_dbg(0, "section=0x%x word_bits=0x%x\n", section, word_bits);
 
diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index 1e1a51510e83..691408efb912 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -101,7 +101,7 @@ static inline int sdei_do_cross_call(smp_call_func_t fn,
 	struct sdei_crosscall_args arg;
 
 	CROSSCALL_INIT(arg, event);
-	on_each_cpu(fn, &arg, true);
+	smp_call(SMP_CALL_ALL, fn, &arg, SMP_CALL_TYPE_SYNC);
 
 	return arg.first_error;
 }
@@ -359,7 +359,7 @@ static int sdei_api_shared_reset(void)
 static void sdei_mark_interface_broken(void)
 {
 	pr_err("disabling SDEI firmware interface\n");
-	on_each_cpu(&_ipi_mask_cpu, NULL, true);
+	smp_call(SMP_CALL_ALL, &_ipi_mask_cpu, NULL, SMP_CALL_TYPE_SYNC);
 	sdei_firmware_call = NULL;
 }
 
@@ -367,7 +367,7 @@ static int sdei_platform_reset(void)
 {
 	int err;
 
-	on_each_cpu(&_ipi_private_reset, NULL, true);
+	smp_call(SMP_CALL_ALL, &_ipi_private_reset, NULL, SMP_CALL_TYPE_SYNC);
 	err = sdei_api_shared_reset();
 	if (err) {
 		pr_err("Failed to reset platform: %d\n", err);
@@ -741,14 +741,14 @@ static struct notifier_block sdei_pm_nb = {
 
 static int sdei_device_suspend(struct device *dev)
 {
-	on_each_cpu(_ipi_mask_cpu, NULL, true);
+	smp_call(SMP_CALL_ALL, _ipi_mask_cpu, NULL, SMP_CALL_TYPE_SYNC);
 
 	return 0;
 }
 
 static int sdei_device_resume(struct device *dev)
 {
-	on_each_cpu(_ipi_unmask_cpu, NULL, true);
+	smp_call(SMP_CALL_ALL, _ipi_unmask_cpu, NULL, SMP_CALL_TYPE_SYNC);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/vlv_sideband.c b/drivers/gpu/drm/i915/vlv_sideband.c
index c26001300ebd..09a261d8af80 100644
--- a/drivers/gpu/drm/i915/vlv_sideband.c
+++ b/drivers/gpu/drm/i915/vlv_sideband.c
@@ -42,7 +42,7 @@ static void __vlv_punit_get(struct drm_i915_private *i915)
 	 */
 	if (IS_VALLEYVIEW(i915)) {
 		cpu_latency_qos_update_request(&i915->sb_qos, 0);
-		on_each_cpu(ping, NULL, 1);
+		smp_call(SMP_CALL_ALL, ping, NULL, SMP_CALL_TYPE_SYNC);
 	}
 }
 
diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
index 521534d5c1e5..25bf112c9aa0 100644
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -188,7 +188,7 @@ static int read_registers(struct fam15h_power_data *data)
 		cpumask_set_cpu(cpumask_any(topology_sibling_cpumask(cpu)), mask);
 	}
 
-	on_each_cpu_mask(mask, do_read_registers_on_cu, data, true);
+	smp_call_mask(mask, do_read_registers_on_cu, data, SMP_CALL_TYPE_SYNC);
 
 	cpus_read_unlock();
 	free_cpumask_var(mask);
diff --git a/drivers/irqchip/irq-mvebu-pic.c b/drivers/irqchip/irq-mvebu-pic.c
index ef3d3646ccc2..c000fd1be945 100644
--- a/drivers/irqchip/irq-mvebu-pic.c
+++ b/drivers/irqchip/irq-mvebu-pic.c
@@ -160,7 +160,7 @@ static int mvebu_pic_probe(struct platform_device *pdev)
 	irq_set_chained_handler(pic->parent_irq, mvebu_pic_handle_cascade_irq);
 	irq_set_handler_data(pic->parent_irq, pic);
 
-	on_each_cpu(mvebu_pic_enable_percpu_irq, pic, 1);
+	smp_call(SMP_CALL_ALL, mvebu_pic_enable_percpu_irq, pic, SMP_CALL_TYPE_SYNC);
 
 	platform_set_drvdata(pdev, pic);
 
@@ -171,7 +171,7 @@ static int mvebu_pic_remove(struct platform_device *pdev)
 {
 	struct mvebu_pic *pic = platform_get_drvdata(pdev);
 
-	on_each_cpu(mvebu_pic_disable_percpu_irq, pic, 1);
+	smp_call(SMP_CALL_ALL, mvebu_pic_disable_percpu_irq, pic, SMP_CALL_TYPE_SYNC);
 	irq_domain_remove(pic->domain);
 
 	return 0;
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 934f6dd90992..77042499baa4 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1473,10 +1473,10 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
 	int max_cpu = num_present_cpus();
 
 	/* Clear all Cause registers */
-	on_each_cpu(mvneta_percpu_clear_intr_cause, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_clear_intr_cause, pp, SMP_CALL_TYPE_SYNC);
 
 	/* Mask all interrupts */
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_mask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 	mvreg_write(pp, MVNETA_INTR_ENABLE, 0);
 
 	/* Enable MBUS Retry bit16 */
@@ -3704,7 +3704,7 @@ static void mvneta_start_dev(struct mvneta_port *pp)
 	}
 
 	/* Unmask interrupts. It has to be done from each CPU */
-	on_each_cpu(mvneta_percpu_unmask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_unmask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 
 	mvreg_write(pp, MVNETA_INTR_MISC_MASK,
 		    MVNETA_CAUSE_PHY_STATUS_CHANGE |
@@ -3751,10 +3751,10 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
 	mvneta_port_disable(pp);
 
 	/* Clear all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_clear_intr_cause, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_clear_intr_cause, pp, SMP_CALL_TYPE_SYNC);
 
 	/* Mask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_mask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 
 	mvneta_tx_reset(pp);
 	mvneta_rx_reset(pp);
@@ -3811,7 +3811,7 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
 	 * reallocation of the queues
 	 */
 	mvneta_stop_dev(pp);
-	on_each_cpu(mvneta_percpu_disable, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_disable, pp, SMP_CALL_TYPE_SYNC);
 
 	mvneta_cleanup_txqs(pp);
 	mvneta_cleanup_rxqs(pp);
@@ -3833,7 +3833,7 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
 		return ret;
 	}
 
-	on_each_cpu(mvneta_percpu_enable, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_enable, pp, SMP_CALL_TYPE_SYNC);
 	mvneta_start_dev(pp);
 
 	netdev_update_features(dev);
@@ -4349,7 +4349,7 @@ static int mvneta_cpu_online(unsigned int cpu, struct hlist_node *node)
 	}
 
 	/* Mask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_mask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 	napi_enable(&port->napi);
 
 	/*
@@ -4365,7 +4365,7 @@ static int mvneta_cpu_online(unsigned int cpu, struct hlist_node *node)
 	mvneta_percpu_elect(pp);
 
 	/* Unmask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_unmask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_unmask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 	mvreg_write(pp, MVNETA_INTR_MISC_MASK,
 		    MVNETA_CAUSE_PHY_STATUS_CHANGE |
 		    MVNETA_CAUSE_LINK_CHANGE);
@@ -4386,7 +4386,7 @@ static int mvneta_cpu_down_prepare(unsigned int cpu, struct hlist_node *node)
 	 */
 	spin_lock(&pp->lock);
 	/* Mask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_mask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 	spin_unlock(&pp->lock);
 
 	napi_synchronize(&port->napi);
@@ -4406,7 +4406,7 @@ static int mvneta_cpu_dead(unsigned int cpu, struct hlist_node *node)
 	mvneta_percpu_elect(pp);
 	spin_unlock(&pp->lock);
 	/* Unmask all ethernet port interrupts */
-	on_each_cpu(mvneta_percpu_unmask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_unmask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 	mvreg_write(pp, MVNETA_INTR_MISC_MASK,
 		    MVNETA_CAUSE_PHY_STATUS_CHANGE |
 		    MVNETA_CAUSE_LINK_CHANGE);
@@ -4445,7 +4445,7 @@ static int mvneta_open(struct net_device *dev)
 		/* Enable per-CPU interrupt on all the CPU to handle our RX
 		 * queue interrupts
 		 */
-		on_each_cpu(mvneta_percpu_enable, pp, true);
+		smp_call(SMP_CALL_ALL, mvneta_percpu_enable, pp, SMP_CALL_TYPE_SYNC);
 
 		pp->is_stopped = false;
 		/* Register a CPU notifier to handle the case where our CPU
@@ -4484,7 +4484,7 @@ static int mvneta_open(struct net_device *dev)
 	if (pp->neta_armada3700) {
 		free_irq(pp->dev->irq, pp);
 	} else {
-		on_each_cpu(mvneta_percpu_disable, pp, true);
+		smp_call(SMP_CALL_ALL, mvneta_percpu_disable, pp, SMP_CALL_TYPE_SYNC);
 		free_percpu_irq(pp->dev->irq, pp->ports);
 	}
 err_cleanup_txqs:
@@ -4516,7 +4516,7 @@ static int mvneta_stop(struct net_device *dev)
 						    &pp->node_online);
 		cpuhp_state_remove_instance_nocalls(CPUHP_NET_MVNETA_DEAD,
 						    &pp->node_dead);
-		on_each_cpu(mvneta_percpu_disable, pp, true);
+		smp_call(SMP_CALL_ALL, mvneta_percpu_disable, pp, SMP_CALL_TYPE_SYNC);
 		free_percpu_irq(dev->irq, pp->ports);
 	} else {
 		mvneta_stop_dev(pp);
@@ -4893,7 +4893,7 @@ static int  mvneta_config_rss(struct mvneta_port *pp)
 
 	netif_tx_stop_all_queues(pp->dev);
 
-	on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+	smp_call(SMP_CALL_ALL, mvneta_percpu_mask_interrupt, pp, SMP_CALL_TYPE_SYNC);
 
 	if (!pp->neta_armada3700) {
 		/* We have to synchronise on the napi of each CPU */
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 1a835b48791b..70fe2af6985f 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -3263,7 +3263,7 @@ static void mvpp2_cleanup_txqs(struct mvpp2_port *port)
 		mvpp2_txq_deinit(port, txq);
 	}
 
-	on_each_cpu(mvpp2_txq_sent_counter_clear, port, 1);
+	smp_call(SMP_CALL_ALL, mvpp2_txq_sent_counter_clear, port, SMP_CALL_TYPE_SYNC);
 
 	val &= ~MVPP2_TX_PORT_FLUSH_MASK(port->id);
 	mvpp2_write(port->priv, MVPP2_TX_PORT_FLUSH_REG, val);
@@ -3327,7 +3327,7 @@ static int mvpp2_setup_txqs(struct mvpp2_port *port)
 		}
 	}
 
-	on_each_cpu(mvpp2_txq_sent_counter_clear, port, 1);
+	smp_call(SMP_CALL_ALL, mvpp2_txq_sent_counter_clear, port, SMP_CALL_TYPE_SYNC);
 	return 0;
 
 err_cleanup:
@@ -4829,7 +4829,7 @@ static int mvpp2_open(struct net_device *dev)
 	}
 
 	/* Unmask interrupts on all CPUs */
-	on_each_cpu(mvpp2_interrupts_unmask, port, 1);
+	smp_call(SMP_CALL_ALL, mvpp2_interrupts_unmask, port, SMP_CALL_TYPE_SYNC);
 	mvpp2_shared_interrupt_mask_unmask(port, false);
 
 	mvpp2_start_dev(port);
@@ -4858,7 +4858,7 @@ static int mvpp2_stop(struct net_device *dev)
 	mvpp2_stop_dev(port);
 
 	/* Mask interrupts on all threads */
-	on_each_cpu(mvpp2_interrupts_mask, port, 1);
+	smp_call(SMP_CALL_ALL, mvpp2_interrupts_mask, port, SMP_CALL_TYPE_SYNC);
 	mvpp2_shared_interrupt_mask_unmask(port, true);
 
 	if (port->phylink)
diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c
index 4dfdbfca6841..e8cacf59357a 100644
--- a/drivers/platform/x86/intel_ips.c
+++ b/drivers/platform/x86/intel_ips.c
@@ -457,7 +457,7 @@ static void ips_enable_cpu_turbo(struct ips_driver *ips)
 		return;
 
 	if (ips->turbo_toggle_allowed)
-		on_each_cpu(do_enable_cpu_turbo, ips, 1);
+		smp_call(SMP_CALL_ALL, do_enable_cpu_turbo, ips, SMP_CALL_TYPE_SYNC);
 
 	ips->__cpu_turbo_on = true;
 }
@@ -495,7 +495,7 @@ static void ips_disable_cpu_turbo(struct ips_driver *ips)
 		return;
 
 	if (ips->turbo_toggle_allowed)
-		on_each_cpu(do_disable_cpu_turbo, ips, 1);
+		smp_call(SMP_CALL_ALL, do_disable_cpu_turbo, ips, SMP_CALL_TYPE_SYNC);
 
 	ips->__cpu_turbo_on = false;
 }
diff --git a/drivers/soc/xilinx/xlnx_event_manager.c b/drivers/soc/xilinx/xlnx_event_manager.c
index b27f8853508e..a9dc450951b8 100644
--- a/drivers/soc/xilinx/xlnx_event_manager.c
+++ b/drivers/soc/xilinx/xlnx_event_manager.c
@@ -514,7 +514,7 @@ static void xlnx_event_cleanup_sgi(struct platform_device *pdev)
 
 	cpuhp_remove_state(CPUHP_AP_ONLINE_DYN);
 
-	on_each_cpu(xlnx_disable_percpu_irq, NULL, 1);
+	smp_call(SMP_CALL_ALL, xlnx_disable_percpu_irq, NULL, SMP_CALL_TYPE_SYNC);
 
 	irq_clear_status_flags(virq_sgi, IRQ_PER_CPU);
 	free_percpu_irq(virq_sgi, &cpu_number1);
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index bbfd004449b5..78e4014b18c1 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -243,7 +243,7 @@ static void showacpu(void *dummy)
 
 static void sysrq_showregs_othercpus(struct work_struct *dummy)
 {
-	smp_call_function(showacpu, NULL, 0);
+	smp_call(SMP_CALL_ALL, showacpu, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 static DECLARE_WORK(sysrq_showallcpus, sysrq_showregs_othercpus);
diff --git a/drivers/watchdog/booke_wdt.c b/drivers/watchdog/booke_wdt.c
index 5e4dc1a0f2c6..b34ba999c8c5 100644
--- a/drivers/watchdog/booke_wdt.c
+++ b/drivers/watchdog/booke_wdt.c
@@ -118,7 +118,7 @@ static void __booke_wdt_set(void *data)
 
 static void booke_wdt_set(void *data)
 {
-	on_each_cpu(__booke_wdt_set, data, 0);
+	smp_call(SMP_CALL_ALL, __booke_wdt_set, data, SMP_CALL_TYPE_ASYNC);
 }
 
 static void __booke_wdt_ping(void *data)
@@ -128,7 +128,7 @@ static void __booke_wdt_ping(void *data)
 
 static int booke_wdt_ping(struct watchdog_device *wdog)
 {
-	on_each_cpu(__booke_wdt_ping, NULL, 0);
+	smp_call(SMP_CALL_ALL, __booke_wdt_ping, NULL, SMP_CALL_TYPE_ASYNC);
 
 	return 0;
 }
@@ -170,7 +170,7 @@ static void __booke_wdt_disable(void *data)
 
 static int booke_wdt_start(struct watchdog_device *wdog)
 {
-	on_each_cpu(__booke_wdt_enable, wdog, 0);
+	smp_call(SMP_CALL_ALL, __booke_wdt_enable, wdog, SMP_CALL_TYPE_ASYNC);
 	pr_debug("watchdog enabled (timeout = %u sec)\n", wdog->timeout);
 
 	return 0;
@@ -178,7 +178,7 @@ static int booke_wdt_start(struct watchdog_device *wdog)
 
 static int booke_wdt_stop(struct watchdog_device *wdog)
 {
-	on_each_cpu(__booke_wdt_disable, NULL, 0);
+	smp_call(SMP_CALL_ALL, __booke_wdt_disable, NULL, SMP_CALL_TYPE_ASYNC);
 	pr_debug("watchdog disabled\n");
 
 	return 0;
diff --git a/fs/buffer.c b/fs/buffer.c
index 2b5561ae5d0b..e02180ab3816 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1420,7 +1420,7 @@ bool has_bh_in_lru(int cpu, void *dummy)
 
 void invalidate_bh_lrus(void)
 {
-	on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1);
+	smp_call_cond(SMP_CALL_ALL, invalidate_bh_lru, NULL, has_bh_in_lru, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL_GPL(invalidate_bh_lrus);
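
[ One detail worth calling out above: the predicate moves from the
  first parameter of on_each_cpu_cond() to after @info in
  smp_call_cond(), and the new call takes an explicit cpu argument.
  A sketch of a caller, with hypothetical names pending_work,
  cpu_has_work() and do_flush():

	static DEFINE_PER_CPU(int, pending_work);

	static bool cpu_has_work(int cpu, void *info)	/* smp_cond_func_t */
	{
		return per_cpu(pending_work, cpu) != 0;
	}

	static void do_flush(void *info)		/* smp_call_func_t */
	{
		this_cpu_write(pending_work, 0);
	}

	smp_call_cond(SMP_CALL_ALL, do_flush, NULL, cpu_has_work,
		      SMP_CALL_TYPE_SYNC);
]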
 
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 3a6663bce18f..5e5d5a5aa730 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -205,9 +205,6 @@ extern unsigned int total_cpus;
 int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 			     int wait);
 
-void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
-			   void *info, bool wait, const struct cpumask *mask);
-
 #define	smp_call_function_single_async(cpu, csd) \
 	smp_call_private(cpu, csd, SMP_CALL_TYPE_ASYNC)
 
@@ -219,48 +216,6 @@ void panic_smp_self_stop(void);
 void nmi_panic_self_stop(struct pt_regs *regs);
 void crash_smp_send_stop(void);
 
-/*
- * Call a function on all processors
- */
-static inline void on_each_cpu(smp_call_func_t func, void *info, int wait)
-{
-	on_each_cpu_cond_mask(NULL, func, info, wait, cpu_online_mask);
-}
-
-/**
- * on_each_cpu_mask(): Run a function on processors specified by
- * cpumask, which may include the local processor.
- * @mask: The set of cpus to run on (only runs on online subset).
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait (atomically) until function has completed
- *        on other CPUs.
- *
- * If @wait is true, then returns once @func has returned.
- *
- * You must not call this function with disabled interrupts or from a
- * hardware interrupt handler or from a bottom half handler.  The
- * exception is that it may be used during early boot while
- * early_boot_irqs_disabled is set.
- */
-static inline void on_each_cpu_mask(const struct cpumask *mask,
-				    smp_call_func_t func, void *info, bool wait)
-{
-	on_each_cpu_cond_mask(NULL, func, info, wait, mask);
-}
-
-/*
- * Call a function on each processor for which the supplied function
- * cond_func returns a positive value. This may include the local
- * processor.  May be used during early boot while early_boot_irqs_disabled is
- * set. Use local_irq_save/restore() instead of local_irq_disable/enable().
- */
-static inline void on_each_cpu_cond(smp_cond_func_t cond_func,
-				    smp_call_func_t func, void *info, bool wait)
-{
-	on_each_cpu_cond_mask(cond_func, func, info, wait, cpu_online_mask);
-}
-
 #ifdef CONFIG_SMP
 
 #include <linux/preempt.h>
@@ -299,13 +254,6 @@ extern int __cpu_up(unsigned int cpunum, struct task_struct *tidle);
  */
 extern void smp_cpus_done(unsigned int max_cpus);
 
-/*
- * Call a function on all other processors
- */
-void smp_call_function(smp_call_func_t func, void *info, int wait);
-void smp_call_function_many(const struct cpumask *mask,
-			    smp_call_func_t func, void *info, bool wait);
-
 void kick_all_cpus_sync(void);
 void wake_up_all_idle_cpus(void);
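
[ Given the smp_call_function_single_async() wrapper above, a caller
  owning a preallocated csd can also use smp_call_private() directly.
  A minimal sketch with hypothetical names my_csd, my_handler() and
  kick_cpu(); the csd must not be reused while a previous asynchronous
  call is still in flight:

	static DEFINE_PER_CPU(call_single_data_t, my_csd);

	static void my_handler(void *info)
	{
		/* runs on the target CPU in IPI context */
	}

	static void kick_cpu(int cpu)
	{
		call_single_data_t *csd = per_cpu_ptr(&my_csd, cpu);

		INIT_CSD(csd, my_handler, NULL);
		smp_call_private(cpu, csd, SMP_CALL_TYPE_ASYNC);
	}
]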
 
diff --git a/kernel/profile.c b/kernel/profile.c
index 37640a0bd8a3..7b7a6135f443 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -179,7 +179,7 @@ static void profile_flip_buffers(void)
 	mutex_lock(&profile_flip_mutex);
 	j = per_cpu(cpu_profile_flip, get_cpu());
 	put_cpu();
-	on_each_cpu(__profile_flip_buffers, NULL, 1);
+	smp_call(SMP_CALL_ALL, __profile_flip_buffers, NULL, SMP_CALL_TYPE_SYNC);
 	for_each_online_cpu(cpu) {
 		struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[j];
 		for (i = 0; i < NR_PROFILE_HIT; ++i) {
@@ -202,7 +202,7 @@ static void profile_discard_flip_buffers(void)
 	mutex_lock(&profile_flip_mutex);
 	i = per_cpu(cpu_profile_flip, get_cpu());
 	put_cpu();
-	on_each_cpu(__profile_flip_buffers, NULL, 1);
+	smp_call(SMP_CALL_ALL, __profile_flip_buffers, NULL, SMP_CALL_TYPE_SYNC);
 	for_each_online_cpu(cpu) {
 		struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[i];
 		memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a4b8189455d5..7bf1db5ebade 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1884,7 +1884,7 @@ static noinline_for_stack bool rcu_gp_init(void)
 
 	// If strict, make all CPUs aware of new grace period.
 	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
-		on_each_cpu(rcu_strict_gp_boundary, NULL, 0);
+		smp_call(SMP_CALL_ALL, rcu_strict_gp_boundary, NULL, SMP_CALL_TYPE_ASYNC);
 
 	return true;
 }
@@ -2109,7 +2109,7 @@ static noinline void rcu_gp_cleanup(void)
 
 	// If strict, make all CPUs aware of the end of the old grace period.
 	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
-		on_each_cpu(rcu_strict_gp_boundary, NULL, 0);
+		smp_call(SMP_CALL_ALL, rcu_strict_gp_boundary, NULL, SMP_CALL_TYPE_ASYNC);
 }
 
 /*
diff --git a/kernel/scftorture.c b/kernel/scftorture.c
index dcb0410950e4..13f387583a54 100644
--- a/kernel/scftorture.c
+++ b/kernel/scftorture.c
@@ -398,7 +398,8 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 			barrier(); // Prevent race-reduction compiler optimizations.
 			scfcp->scfc_in = true;
 		}
-		smp_call_function_many(cpu_online_mask, scf_handler, scfcp, scfsp->scfs_wait);
+		smp_call_mask(cpu_online_mask, scf_handler, scfcp,
+			       (scfsp->scfs_wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC));
 		break;
 	case SCF_PRIM_ALL:
 		if (scfsp->scfs_wait)
@@ -409,7 +410,8 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 			barrier(); // Prevent race-reduction compiler optimizations.
 			scfcp->scfc_in = true;
 		}
-		smp_call_function(scf_handler, scfcp, scfsp->scfs_wait);
+		smp_call(SMP_CALL_ALL, scf_handler, scfcp,
+			  (scfsp->scfs_wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC));
 		break;
 	default:
 		WARN_ON_ONCE(1);
@@ -515,7 +517,7 @@ static void scf_torture_cleanup(void)
 			torture_stop_kthread("scftorture_invoker", scf_stats_p[i].task);
 	else
 		goto end;
-	smp_call_function(scf_cleanup_handler, NULL, 0);
+	smp_call(SMP_CALL_ALL, scf_cleanup_handler, NULL, SMP_CALL_TYPE_ASYNC);
 	torture_stop_kthread(scf_torture_stats, scf_torture_stats_task);
 	scf_torture_stats_print();  // -After- the stats thread is stopped!
 	kfree(scf_stats_p);  // -After- the last stats print has completed!
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 0c5be7ebb1dc..1b230c226912 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -290,9 +290,7 @@ static int membarrier_global_expedited(void)
 	}
 	rcu_read_unlock();
 
-	preempt_disable();
-	smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
-	preempt_enable();
+	smp_call_mask(tmpmask, ipi_mb, NULL, SMP_CALL_TYPE_SYNC);
 
 	free_cpumask_var(tmpmask);
 	cpus_read_unlock();
@@ -399,11 +397,9 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		 * rseq critical section.
 		 */
 		if (flags != MEMBARRIER_FLAG_SYNC_CORE) {
-			preempt_disable();
-			smp_call_function_many(tmpmask, ipi_func, NULL, true);
-			preempt_enable();
+			smp_call_mask(tmpmask, ipi_func, NULL, SMP_CALL_TYPE_SYNC);
 		} else {
-			on_each_cpu_mask(tmpmask, ipi_func, NULL, true);
+			smp_call_mask(tmpmask, ipi_func, NULL, SMP_CALL_TYPE_SYNC);
 		}
 	}
 
@@ -471,7 +467,7 @@ static int sync_runqueues_membarrier_state(struct mm_struct *mm)
 	}
 	rcu_read_unlock();
 
-	on_each_cpu_mask(tmpmask, ipi_sync_rq_state, mm, true);
+	smp_call_mask(tmpmask, ipi_sync_rq_state, mm, SMP_CALL_TYPE_SYNC);
 
 	free_cpumask_var(tmpmask);
 	cpus_read_unlock();
diff --git a/kernel/smp.c b/kernel/smp.c
index 6a43ab165aee..25ee37f746d4 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -626,10 +626,12 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
-static void smp_call_function_many_cond(const struct cpumask *mask,
-					smp_call_func_t func, void *info,
-					bool wait,
-					smp_cond_func_t cond_func)
+/*
+ * This function performs synchronous and asynchronous cross CPU calls
+ * for more than one target CPU.
+ */
+static void __smp_call_mask_cond(const struct cpumask *mask, smp_call_func_t func,
+		void *info, smp_cond_func_t cond_func, unsigned int flags)
 {
 	int cpu, last_cpu, this_cpu = smp_processor_id();
 	struct call_function_data *cfd;
@@ -650,10 +652,10 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		lockdep_assert_irqs_enabled();
 
 	/*
-	 * When @wait we can deadlock when we interrupt between llist_add() and
-	 * arch_send_call_function_ipi*(); when !@wait we can deadlock due to
-	 * csd_lock() on because the interrupt context uses the same csd
-	 * storage.
+	 * With CSD_TYPE_SYNC we can deadlock when we interrupt between
+	 * llist_add() and arch_send_call_function_ipi*(); with CSD_TYPE_ASYNC
+	 * we can deadlock due to csd_lock() because the interrupt context
+	 * uses the same csd storage.
 	 */
 	WARN_ON_ONCE(!in_task());
 
@@ -682,7 +684,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 				continue;
 
 			csd_lock(csd);
-			if (wait)
+			if (flags & CSD_TYPE_SYNC)
 				csd->node.u_flags |= CSD_TYPE_SYNC;
 			csd->func = func;
 			csd->info = info;
@@ -721,11 +723,12 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		unsigned long flags;
 
 		local_irq_save(flags);
-		func(info);
+		if (func != NULL)
+			func(info);
 		local_irq_restore(flags);
 	}
 
-	if (run_remote && wait) {
+	if (run_remote && (flags & CSD_TYPE_SYNC)) {
 		for_each_cpu(cpu, cfd->cpumask) {
 			call_single_data_t *csd;
 
@@ -735,48 +738,6 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	}
 }
 
-/**
- * smp_call_function_many(): Run a function on a set of CPUs.
- * @mask: The set of cpus to run on (only runs on online subset).
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If wait is true, the call will not return until func()
- *        has completed on other CPUs.
- *
- * You must not call this function with disabled interrupts or from a
- * hardware interrupt handler or from a bottom half handler. Preemption
- * must be disabled when calling this function.
- */
-void smp_call_function_many(const struct cpumask *mask,
-			    smp_call_func_t func, void *info, bool wait)
-{
-	smp_call_function_many_cond(mask, func, info, wait, NULL);
-}
-EXPORT_SYMBOL(smp_call_function_many);
-
-/**
- * smp_call_function(): Run a function on all other CPUs.
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait (atomically) until function has completed
- *        on other CPUs.
- *
- * Returns 0.
- *
- * If @wait is true, then returns once @func has returned; otherwise
- * it returns just before the target cpu calls @func.
- *
- * You must not call this function with disabled interrupts or from a
- * hardware interrupt handler or from a bottom half handler.
- */
-void smp_call_function(smp_call_func_t func, void *info, int wait)
-{
-	preempt_disable();
-	smp_call_function_many(cpu_online_mask, func, info, wait);
-	preempt_enable();
-}
-EXPORT_SYMBOL(smp_call_function);
-
 /* Setup configured maximum number of CPUs to activate */
 unsigned int setup_max_cpus = NR_CPUS;
 EXPORT_SYMBOL(setup_max_cpus);
@@ -861,38 +822,6 @@ void __init smp_init(void)
 	smp_cpus_done(setup_max_cpus);
 }
 
-/*
- * on_each_cpu_cond(): Call a function on each processor for which
- * the supplied function cond_func returns true, optionally waiting
- * for all the required CPUs to finish. This may include the local
- * processor.
- * @cond_func:	A callback function that is passed a cpu id and
- *		the info parameter. The function is called
- *		with preemption disabled. The function should
- *		return a blooean value indicating whether to IPI
- *		the specified CPU.
- * @func:	The function to run on all applicable CPUs.
- *		This must be fast and non-blocking.
- * @info:	An arbitrary pointer to pass to both functions.
- * @wait:	If true, wait (atomically) until function has
- *		completed on other CPUs.
- *
- * Preemption is disabled to protect against CPUs going offline but not online.
- * CPUs going online during the call will not be seen or sent an IPI.
- *
- * You must not call this function with disabled interrupts or
- * from a hardware interrupt handler or from a bottom half handler.
- */
-void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
-			   void *info, bool wait, const struct cpumask *mask)
-{
-
-	preempt_disable();
-	smp_call_function_many_cond(mask, func, info, wait, cond_func);
-	preempt_enable();
-}
-EXPORT_SYMBOL(on_each_cpu_cond_mask);
-
 static void do_nothing(void *unused)
 {
 }
@@ -912,7 +841,7 @@ void kick_all_cpus_sync(void)
 {
 	/* Make sure the change is visible before we kick the cpus */
 	smp_mb();
-	smp_call_function(do_nothing, NULL, 1);
+	smp_call(SMP_CALL_ALL, do_nothing, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL_GPL(kick_all_cpus_sync);
 
@@ -992,27 +921,6 @@ int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
 }
 EXPORT_SYMBOL_GPL(smp_call_on_cpu);
 
-
-void __smp_call_mask_cond(const struct cpumask *mask,
-		smp_call_func_t func, void *info,
-		smp_cond_func_t cond_func,
-		unsigned int flags)
-{
-	bool wait = false;
-
-	if (flags == SMP_CALL_TYPE_SYNC)
-		wait = true;
-
-	preempt_disable();
-
-	/*
-	 * This is temporarily hook. The function smp_call_function_many_cond()
-	 * will be inlined here with the last patch in this series.
-	 */
-	smp_call_function_many_cond(mask, func, info, wait, cond_func);
-	preempt_enable();
-}
-
 /*
  * smp_call Interface
  *
@@ -1033,7 +941,8 @@ void __smp_call_mask_cond(const struct cpumask *mask,
  * Parameters:
  *
  * cpu: If cpu >=0 && cpu < nr_cpu_ids, the cross call is for that cpu.
- *      If cpu == -1, the cross call is for all the online CPUs
+ *      If cpu == SMP_CALL_ALL (i.e. -1), the cross call is sent to all
+ *      online CPUs.
  *
  * func: It is the cross function that the destination CPUs need to execute.
  *       This function must be fast and non-blocking.
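
For illustration, a minimal consumer-side sketch of the interface
documented above, using only the prototypes and macros introduced by
this series (the demo_func()/demo_kick() names are hypothetical):

	#include <linux/smp.h>
	#include <linux/printk.h>

	static void demo_func(void *info)
	{
		/* Runs on each destination CPU with interrupts disabled. */
		pr_info("cross call on CPU %d\n", smp_processor_id());
	}

	static void demo_kick(void)
	{
		/* Synchronous: returns after demo_func() completes on CPU 1. */
		smp_call(1, demo_func, NULL, SMP_CALL_TYPE_SYNC);

		/* Asynchronous fire-and-forget on every online CPU. */
		smp_call(SMP_CALL_ALL, demo_func, NULL, SMP_CALL_TYPE_ASYNC);
	}
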
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 0ea8702eb516..75cb0c8e2bfa 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -952,7 +952,7 @@ void clock_was_set(unsigned int bases)
 		goto out_timerfd;
 
 	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
-		on_each_cpu(retrigger_next_event, NULL, 1);
+		smp_call(SMP_CALL_ALL, retrigger_next_event, NULL, SMP_CALL_TYPE_SYNC);
 		goto out_timerfd;
 	}
 
@@ -970,9 +970,7 @@ void clock_was_set(unsigned int bases)
 		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 	}
 
-	preempt_disable();
-	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
-	preempt_enable();
+	smp_call_mask(mask, retrigger_next_event, NULL, SMP_CALL_TYPE_SYNC);
 	cpus_read_unlock();
 	free_cpumask_var(mask);
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4f1d2f5e7263..1762955e18d7 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -243,7 +243,7 @@ static void update_ftrace_function(void)
 	/* Make sure the function_trace_op is visible on all CPUs */
 	smp_wmb();
 	/* Nasty way to force a rmb on all cpus */
-	smp_call_function(ftrace_sync_ipi, NULL, 1);
+	smp_call(SMP_CALL_ALL, ftrace_sync_ipi, NULL, SMP_CALL_TYPE_SYNC);
 	/* OK, we are all set to update the ftrace_trace_function now! */
 #endif /* !CONFIG_DYNAMIC_FTRACE */
 
@@ -2756,7 +2756,7 @@ void ftrace_modify_all_code(int command)
 		smp_wmb();
 		/* If irqs are disabled, we are in stop machine */
 		if (!irqs_disabled())
-			smp_call_function(ftrace_sync_ipi, NULL, 1);
+			smp_call(SMP_CALL_ALL, ftrace_sync_ipi, NULL, SMP_CALL_TYPE_SYNC);
 		err = ftrace_update_ftrace_func(ftrace_trace_function);
 		if (FTRACE_WARN_ON(err))
 			return;
@@ -7769,7 +7769,7 @@ pid_write(struct file *filp, const char __user *ubuf,
 	 * check for those tasks that are currently running.
 	 * Always do this in case a pid was appended or removed.
 	 */
-	on_each_cpu(ignore_task_cpu, tr, 1);
+	smp_call(SMP_CALL_ALL, ignore_task_cpu, tr, SMP_CALL_TYPE_SYNC);
 
 	ftrace_update_pid_func();
 	ftrace_startup_all(0);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 05dfc7a12d3d..bb1f9cd85232 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5865,7 +5865,7 @@ static __init int rb_hammer_test(void *arg)
 	while (!kthread_should_stop()) {
 
 		/* Send an IPI to all cpus to write data! */
-		smp_call_function(rb_ipi, NULL, 1);
+		smp_call(SMP_CALL_ALL, rb_ipi, NULL, SMP_CALL_TYPE_SYNC);
 		/* No sleep, but for non preempt, let others run */
 		schedule();
 	}
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f4de111fa18f..1da0502e8ee8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2724,11 +2724,9 @@ void trace_buffered_event_disable(void)
 	if (--trace_buffered_event_ref)
 		return;
 
-	preempt_disable();
 	/* For each CPU, set the buffer as used. */
-	smp_call_function_many(tracing_buffer_mask,
-			       disable_trace_buffered_event, NULL, 1);
-	preempt_enable();
+	smp_call_mask(tracing_buffer_mask,
+		       disable_trace_buffered_event, NULL, SMP_CALL_TYPE_SYNC);
 
 	/* Wait for all current users to finish */
 	synchronize_rcu();
@@ -2743,11 +2741,9 @@ void trace_buffered_event_disable(void)
 	 */
 	smp_wmb();
 
-	preempt_disable();
 	/* Do the work on each cpu */
-	smp_call_function_many(tracing_buffer_mask,
-			       enable_trace_buffered_event, NULL, 1);
-	preempt_enable();
+	smp_call_mask(tracing_buffer_mask,
+		       enable_trace_buffered_event, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static struct trace_buffer *temp_buffer;
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index e11e167b7809..30441ba91790 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1995,7 +1995,7 @@ event_pid_write(struct file *filp, const char __user *ubuf,
 	 * check for those tasks that are currently running.
 	 * Always do this in case a pid was appended or removed.
 	 */
-	on_each_cpu(ignore_task_cpu, tr, 1);
+	smp_call(SMP_CALL_ALL, ignore_task_cpu, tr, SMP_CALL_TYPE_SYNC);
 
  out:
 	mutex_unlock(&event_mutex);
diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 08291ed33e93..1d06f2489aa2 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -332,7 +332,7 @@ void kasan_quarantine_remove_cache(struct kmem_cache *cache)
 	 * achieves the first goal, while synchronize_srcu() achieves the
 	 * second.
 	 */
-	on_each_cpu(per_cpu_remove_cache, cache, 1);
+	smp_call(SMP_CALL_ALL, per_cpu_remove_cache, cache, SMP_CALL_TYPE_SYNC);
 
 	raw_spin_lock_irqsave(&quarantine_lock, flags);
 	for (i = 0; i < QUARANTINE_BATCHES; i++) {
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index afb7185ffdc4..8d2ee7dc7934 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -149,7 +149,7 @@ static void tlb_remove_table_sync_one(void)
 	 * It is however sufficient for software page-table walkers that rely on
 	 * IRQ disabling.
 	 */
-	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
+	smp_call(SMP_CALL_ALL, tlb_remove_table_smp_sync, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void tlb_remove_table_rcu(struct rcu_head *head)
diff --git a/mm/slab.c b/mm/slab.c
index b04e40078bdf..5f442172e566 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2171,7 +2171,7 @@ static void drain_cpu_caches(struct kmem_cache *cachep)
 	int node;
 	LIST_HEAD(list);
 
-	on_each_cpu(do_drain, cachep, 1);
+	smp_call(SMP_CALL_ALL, do_drain, cachep, SMP_CALL_TYPE_SYNC);
 	check_irq_on();
 	for_each_kmem_cache_node(cachep, node, n)
 		if (n->alien)
diff --git a/net/iucv/iucv.c b/net/iucv/iucv.c
index eb0295d90039..4d97f6d807e7 100644
--- a/net/iucv/iucv.c
+++ b/net/iucv/iucv.c
@@ -574,7 +574,7 @@ static int iucv_enable(void)
 static void iucv_disable(void)
 {
 	cpus_read_lock();
-	on_each_cpu(iucv_retrieve_cpu, NULL, 1);
+	smp_call(SMP_CALL_ALL, iucv_retrieve_cpu, NULL, SMP_CALL_TYPE_SYNC);
 	kfree(iucv_path_table);
 	iucv_path_table = NULL;
 	cpus_read_unlock();
@@ -696,7 +696,7 @@ static void iucv_cleanup_queue(void)
 	 * pending interrupts force them to the work queue by calling
 	 * an empty function on all cpus.
 	 */
-	smp_call_function(__iucv_cleanup_queue, NULL, 1);
+	smp_call(SMP_CALL_ALL, __iucv_cleanup_queue, NULL, SMP_CALL_TYPE_SYNC);
 	spin_lock_irq(&iucv_queue_lock);
 	list_for_each_entry_safe(p, n, &iucv_task_queue, list) {
 		/* Remove stale work items from the task queue. */
@@ -787,7 +787,7 @@ static int iucv_reboot_event(struct notifier_block *this,
 		return NOTIFY_DONE;
 
 	cpus_read_lock();
-	on_each_cpu_mask(&iucv_irq_cpumask, iucv_block_cpu, NULL, 1);
+	smp_call_mask(&iucv_irq_cpumask, iucv_block_cpu, NULL, SMP_CALL_TYPE_SYNC);
 	preempt_disable();
 	for (i = 0; i < iucv_max_pathid; i++) {
 		if (iucv_path_table[i])
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 70e05af5ebea..e4b1e15f1965 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -244,7 +244,7 @@ static inline bool kvm_kick_many_cpus(struct cpumask *cpus, bool wait)
 	if (cpumask_empty(cpus))
 		return false;
 
-	smp_call_function_many(cpus, ack_flush, NULL, wait);
+	smp_call_mask(cpus, ack_flush, NULL, (wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC));
 	return true;
 }
 
@@ -4895,7 +4895,7 @@ static void hardware_disable_all_nolock(void)
 
 	kvm_usage_count--;
 	if (!kvm_usage_count)
-		on_each_cpu(hardware_disable_nolock, NULL, 1);
+		smp_call(SMP_CALL_ALL, hardware_disable_nolock, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void hardware_disable_all(void)
@@ -4914,7 +4914,7 @@ static int hardware_enable_all(void)
 	kvm_usage_count++;
 	if (kvm_usage_count == 1) {
 		atomic_set(&hardware_enable_failed, 0);
-		on_each_cpu(hardware_enable_nolock, NULL, 1);
+		smp_call(SMP_CALL_ALL, hardware_enable_nolock, NULL, SMP_CALL_TYPE_SYNC);
 
 		if (atomic_read(&hardware_enable_failed)) {
 			hardware_disable_all_nolock();
@@ -4938,7 +4938,7 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
 	 */
 	pr_info("kvm: exiting hardware virtualization\n");
 	kvm_rebooting = true;
-	on_each_cpu(hardware_disable_nolock, NULL, 1);
+	smp_call(SMP_CALL_ALL, hardware_disable_nolock, NULL, SMP_CALL_TYPE_SYNC);
 	return NOTIFY_OK;
 }
 
@@ -5790,7 +5790,7 @@ void kvm_exit(void)
 	unregister_syscore_ops(&kvm_syscore_ops);
 	unregister_reboot_notifier(&kvm_reboot_notifier);
 	cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
-	on_each_cpu(hardware_disable_nolock, NULL, 1);
+	smp_call(SMP_CALL_ALL, hardware_disable_nolock, NULL, SMP_CALL_TYPE_SYNC);
 	kvm_arch_hardware_unsetup();
 	kvm_arch_exit();
 	kvm_irqfd_exit();
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 09/11] smp: replace smp_call_function_single_async with smp_call_private
  2022-04-22 20:00 [PATCH v2 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (7 preceding siblings ...)
  2022-04-22 20:00 ` [PATCH v2 08/11] smp: replace smp_call_function_many_cond() with __smp_call_mask_cond() Donghai Qiao
@ 2022-04-22 20:00 ` Donghai Qiao
  2022-04-22 20:00 ` [PATCH v2 10/11] smp: replace smp_call_function_single() with smp_call() Donghai Qiao
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Donghai Qiao @ 2022-04-22 20:00 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Replace smp_call_function_single_async() with smp_call_private()
and convert all of its call sites.
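
The conversion is mechanical; a minimal sketch, assuming a
preallocated per-CPU csd (the my_csd/my_func/kick_cpu names are
hypothetical):

	#include <linux/smp.h>

	static DEFINE_PER_CPU(call_single_data_t, my_csd);

	static void my_func(void *info)
	{
		/* Runs on the destination CPU in interrupt context. */
	}

	static void kick_cpu(int cpu)
	{
		call_single_data_t *csd = &per_cpu(my_csd, cpu);

		INIT_CSD(csd, my_func, NULL);

		/* before: smp_call_function_single_async(cpu, csd); */
		smp_call_private(cpu, csd, SMP_CALL_TYPE_ASYNC);
	}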

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: removed 'x' from the function names and changed XCALL to SMP_CALL in the new macros

 arch/mips/kernel/process.c                      | 2 +-
 arch/mips/kernel/smp.c                          | 2 +-
 arch/s390/pci/pci_irq.c                         | 2 +-
 arch/x86/kernel/cpuid.c                         | 2 +-
 arch/x86/lib/msr-smp.c                          | 2 +-
 block/blk-mq.c                                  | 2 +-
 drivers/clocksource/ingenic-timer.c             | 2 +-
 drivers/cpuidle/coupled.c                       | 2 +-
 drivers/net/ethernet/cavium/liquidio/lio_core.c | 2 +-
 include/linux/smp.h                             | 3 ---
 kernel/debug/debug_core.c                       | 2 +-
 kernel/sched/core.c                             | 2 +-
 kernel/sched/fair.c                             | 2 +-
 net/core/dev.c                                  | 2 +-
 14 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index c2d5f4bfe1f3..41e78c4ba2cf 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -745,7 +745,7 @@ static void raise_backtrace(cpumask_t *mask)
 		}
 
 		csd = &per_cpu(backtrace_csd, cpu);
-		smp_call_function_single_async(cpu, csd);
+		smp_call_private(cpu, csd, SMP_CALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 9a6f827d9325..04f1cec09a15 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -701,7 +701,7 @@ void tick_broadcast(const struct cpumask *mask)
 
 	for_each_cpu(cpu, mask) {
 		csd = &per_cpu(tick_broadcast_csd, cpu);
-		smp_call_function_single_async(cpu, csd);
+		smp_call_private(cpu, csd, SMP_CALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 136af9f32f23..6ce877cda632 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -212,7 +212,7 @@ static void zpci_handle_fallback_irq(void)
 			continue;
 
 		INIT_CSD(&cpu_data->csd, zpci_handle_remote_irq, &cpu_data->scheduled);
-		smp_call_function_single_async(cpu, &cpu_data->csd);
+		smp_call_private(cpu, &cpu_data->csd, SMP_CALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/arch/x86/kernel/cpuid.c b/arch/x86/kernel/cpuid.c
index 6f7b8cc1bc9f..930bf45244eb 100644
--- a/arch/x86/kernel/cpuid.c
+++ b/arch/x86/kernel/cpuid.c
@@ -81,7 +81,7 @@ static ssize_t cpuid_read(struct file *file, char __user *buf,
 		cmd.regs.eax = pos;
 		cmd.regs.ecx = pos >> 32;
 
-		err = smp_call_function_single_async(cpu, &csd);
+		err = smp_call_private(cpu, &csd, SMP_CALL_TYPE_ASYNC);
 		if (err)
 			break;
 		wait_for_completion(&cmd.done);
diff --git a/arch/x86/lib/msr-smp.c b/arch/x86/lib/msr-smp.c
index 139ba6a2240a..71b8bf97450e 100644
--- a/arch/x86/lib/msr-smp.c
+++ b/arch/x86/lib/msr-smp.c
@@ -178,7 +178,7 @@ int rdmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h)
 	init_completion(&rv.done);
 	rv.msr.msr_no = msr_no;
 
-	err = smp_call_function_single_async(cpu, &csd);
+	err = smp_call_private(cpu, &csd, SMP_CALL_TYPE_ASYNC);
 	if (!err) {
 		wait_for_completion(&rv.done);
 		err = rv.msr.err;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ed3ed86f7dd2..da2c86f317df 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1062,7 +1062,7 @@ static void blk_mq_complete_send_ipi(struct request *rq)
 	list = &per_cpu(blk_cpu_done, cpu);
 	if (llist_add(&rq->ipi_list, list)) {
 		INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq);
-		smp_call_function_single_async(cpu, &rq->csd);
+		smp_call_private(cpu, &rq->csd, SMP_CALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/drivers/clocksource/ingenic-timer.c b/drivers/clocksource/ingenic-timer.c
index 24ed0f1f089b..d70f1c2d967b 100644
--- a/drivers/clocksource/ingenic-timer.c
+++ b/drivers/clocksource/ingenic-timer.c
@@ -121,7 +121,7 @@ static irqreturn_t ingenic_tcu_cevt_cb(int irq, void *dev_id)
 		csd = &per_cpu(ingenic_cevt_csd, timer->cpu);
 		csd->info = (void *) &timer->cevt;
 		csd->func = ingenic_per_cpu_event_handler;
-		smp_call_function_single_async(timer->cpu, csd);
+		smp_call_private(timer->cpu, csd, SMP_CALL_TYPE_ASYNC);
 	}
 
 	return IRQ_HANDLED;
diff --git a/drivers/cpuidle/coupled.c b/drivers/cpuidle/coupled.c
index 74068742cef3..4379dd2a363b 100644
--- a/drivers/cpuidle/coupled.c
+++ b/drivers/cpuidle/coupled.c
@@ -334,7 +334,7 @@ static void cpuidle_coupled_poke(int cpu)
 	call_single_data_t *csd = &per_cpu(cpuidle_coupled_poke_cb, cpu);
 
 	if (!cpumask_test_and_set_cpu(cpu, &cpuidle_coupled_poke_pending))
-		smp_call_function_single_async(cpu, csd);
+		smp_call_private(cpu, csd, SMP_CALL_TYPE_ASYNC);
 }
 
 /**
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c b/drivers/net/ethernet/cavium/liquidio/lio_core.c
index 73cb03266549..d293e2da4aea 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -729,7 +729,7 @@ static void liquidio_napi_drv_callback(void *arg)
 		napi_schedule_irqoff(&droq->napi);
 	} else {
 		INIT_CSD(&droq->csd, napi_schedule_wrapper, &droq->napi);
-		smp_call_function_single_async(droq->cpu_id, &droq->csd);
+		smp_call_private(droq->cpu_id, &droq->csd, SMP_CALL_TYPE_ASYNC);
 	}
 }
 
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 5e5d5a5aa730..8923a806757f 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -206,9 +206,6 @@ extern unsigned int total_cpus;
 int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 			     int wait);
 
-#define	smp_call_function_single_async(cpu, csd) \
-	smp_call_private(cpu, csd, SMP_CALL_TYPE_ASYNC)
-
 /*
  * Cpus stopping functions in panic. All have default weak definitions.
  * Architecture-dependent code may override them.
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index da06a5553835..a6dd8f2e928a 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -264,7 +264,7 @@ void __weak kgdb_roundup_cpus(void)
 			continue;
 		kgdb_info[cpu].rounding_up = true;
 
-		ret = smp_call_function_single_async(cpu, csd);
+		ret = smp_call_private(cpu, csd, SMP_CALL_TYPE_ASYNC);
 		if (ret)
 			kgdb_info[cpu].rounding_up = false;
 	}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4fda3dfb887b..c9c5e4814742 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -836,7 +836,7 @@ void hrtick_start(struct rq *rq, u64 delay)
 	if (rq == this_rq())
 		__hrtick_restart(rq);
 	else
-		smp_call_function_single_async(cpu_of(rq), &rq->hrtick_csd);
+		smp_call_private(cpu_of(rq), &rq->hrtick_csd, SMP_CALL_TYPE_ASYNC);
 }
 
 #else
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d4bd299d67ab..367c3ce1d683 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10461,7 +10461,7 @@ static void kick_ilb(unsigned int flags)
 	 * is idle. And the softirq performing nohz idle load balance
 	 * will be run before returning from the IPI.
 	 */
-	smp_call_function_single_async(ilb_cpu, &cpu_rq(ilb_cpu)->nohz_csd);
+	smp_call_private(ilb_cpu, &cpu_rq(ilb_cpu)->nohz_csd, SMP_CALL_TYPE_ASYNC);
 }
 
 /*
diff --git a/net/core/dev.c b/net/core/dev.c
index 8c6c08446556..77b9f71a5259 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5788,7 +5788,7 @@ static void net_rps_send_ipi(struct softnet_data *remsd)
 		struct softnet_data *next = remsd->rps_ipi_next;
 
 		if (cpu_online(remsd->cpu))
-			smp_call_function_single_async(remsd->cpu, &remsd->csd);
+			smp_call_private(remsd->cpu, &remsd->csd, SMP_CALL_TYPE_ASYNC);
 		remsd = next;
 	}
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 10/11] smp: replace smp_call_function_single() with smp_call()
  2022-04-22 20:00 [PATCH v2 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (8 preceding siblings ...)
  2022-04-22 20:00 ` [PATCH v2 09/11] smp: replace smp_call_function_single_async with smp_call_private Donghai Qiao
@ 2022-04-22 20:00 ` Donghai Qiao
  2022-04-22 20:00 ` [PATCH v2 11/11] smp: modify up.c to adopt the same format of cross CPU call Donghai Qiao
  2022-04-26 14:00 ` [PATCH v2 00/11] smp: cross CPU call interface Christoph Hellwig
  11 siblings, 0 replies; 26+ messages in thread
From: Donghai Qiao @ 2022-04-22 20:00 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Replace smp_call_function_single() with smp_call() and
convert all of its call sites.
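
The wait argument maps directly onto the new type flag; a minimal
sketch (my_func and my_info are hypothetical):

	/* before: wait selects the behavior */
	err = smp_call_function_single(cpu, my_func, my_info, 1); /* wait */
	err = smp_call_function_single(cpu, my_func, my_info, 0); /* no wait */

	/* after: the type flag makes the choice explicit */
	err = smp_call(cpu, my_func, my_info, SMP_CALL_TYPE_SYNC);
	err = smp_call(cpu, my_func, my_info, SMP_CALL_TYPE_ASYNC);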

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: removed 'x' from the function names and changed XCALL to SMP_CALL in the new macros

 arch/alpha/kernel/rtc.c                       |  4 +--
 arch/arm/mach-bcm/bcm_kona_smc.c              |  2 +-
 arch/arm/mach-mvebu/pmsu.c                    |  4 +--
 arch/arm64/kernel/topology.c                  |  2 +-
 arch/csky/kernel/cpu-probe.c                  |  2 +-
 arch/ia64/kernel/palinfo.c                    |  3 +-
 arch/ia64/kernel/smpboot.c                    |  2 +-
 arch/mips/kernel/smp-bmips.c                  |  3 +-
 arch/mips/kernel/smp-cps.c                    |  8 ++---
 arch/powerpc/kernel/sysfs.c                   | 26 +++++++-------
 arch/powerpc/kernel/watchdog.c                |  4 +--
 arch/powerpc/kvm/book3s_hv.c                  |  8 ++---
 arch/powerpc/platforms/85xx/smp.c             |  6 ++--
 arch/sparc/kernel/nmi.c                       |  4 +--
 arch/x86/kernel/apic/vector.c                 |  2 +-
 arch/x86/kernel/cpu/aperfmperf.c              |  5 +--
 arch/x86/kernel/cpu/mce/amd.c                 |  4 +--
 arch/x86/kernel/cpu/mce/inject.c              |  8 ++---
 arch/x86/kernel/cpu/microcode/core.c          |  4 +--
 arch/x86/kernel/cpu/mtrr/mtrr.c               |  2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        |  2 +-
 arch/x86/kernel/kvm.c                         |  4 +--
 arch/x86/kvm/vmx/vmx.c                        |  3 +-
 arch/x86/kvm/x86.c                            |  7 ++--
 arch/x86/lib/cache-smp.c                      |  2 +-
 arch/x86/lib/msr-smp.c                        | 16 ++++-----
 arch/x86/xen/mmu_pv.c                         |  2 +-
 arch/xtensa/kernel/smp.c                      |  7 ++--
 drivers/acpi/processor_idle.c                 |  4 +--
 drivers/cpufreq/powernow-k8.c                 |  9 +++--
 drivers/cpufreq/powernv-cpufreq.c             |  2 +-
 drivers/cpufreq/sparc-us2e-cpufreq.c          |  4 +--
 drivers/cpufreq/sparc-us3-cpufreq.c           |  4 +--
 drivers/cpufreq/speedstep-ich.c               |  7 ++--
 drivers/cpufreq/tegra194-cpufreq.c            |  6 ++--
 .../hwtracing/coresight/coresight-cpu-debug.c |  3 +-
 .../coresight/coresight-etm3x-core.c          | 11 +++---
 .../coresight/coresight-etm4x-core.c          | 12 +++----
 .../coresight/coresight-etm4x-sysfs.c         |  2 +-
 drivers/hwtracing/coresight/coresight-trbe.c  |  6 ++--
 drivers/net/ethernet/marvell/mvneta.c         |  4 +--
 .../intel/speed_select_if/isst_if_mbox_msr.c  |  4 +--
 drivers/powercap/intel_rapl_common.c          |  2 +-
 drivers/powercap/intel_rapl_msr.c             |  2 +-
 drivers/regulator/qcom_spmi-regulator.c       |  3 +-
 drivers/soc/fsl/qbman/qman.c                  |  4 +--
 drivers/soc/fsl/qbman/qman_test_stash.c       |  9 ++---
 include/linux/smp.h                           |  3 +-
 kernel/cpu.c                                  |  4 +--
 kernel/events/core.c                          | 10 +++---
 kernel/rcu/rcutorture.c                       |  3 +-
 kernel/rcu/tasks.h                            |  4 +--
 kernel/rcu/tree.c                             |  2 +-
 kernel/rcu/tree_exp.h                         |  4 +--
 kernel/relay.c                                |  5 ++-
 kernel/scftorture.c                           |  5 +--
 kernel/sched/membarrier.c                     |  2 +-
 kernel/smp.c                                  | 34 ++-----------------
 kernel/time/clockevents.c                     |  2 +-
 kernel/time/clocksource.c                     |  2 +-
 kernel/time/tick-common.c                     |  2 +-
 net/bpf/test_run.c                            |  4 +--
 net/iucv/iucv.c                               | 11 +++---
 virt/kvm/kvm_main.c                           |  2 +-
 64 files changed, 149 insertions(+), 194 deletions(-)

diff --git a/arch/alpha/kernel/rtc.c b/arch/alpha/kernel/rtc.c
index fb3025396ac9..2853887cc0d5 100644
--- a/arch/alpha/kernel/rtc.c
+++ b/arch/alpha/kernel/rtc.c
@@ -168,7 +168,7 @@ remote_read_time(struct device *dev, struct rtc_time *tm)
 	union remote_data x;
 	if (smp_processor_id() != boot_cpuid) {
 		x.tm = tm;
-		smp_call_function_single(boot_cpuid, do_remote_read, &x, 1);
+		smp_call(boot_cpuid, do_remote_read, &x, SMP_CALL_TYPE_SYNC);
 		return x.retval;
 	}
 	return alpha_rtc_read_time(NULL, tm);
@@ -187,7 +187,7 @@ remote_set_time(struct device *dev, struct rtc_time *tm)
 	union remote_data x;
 	if (smp_processor_id() != boot_cpuid) {
 		x.tm = tm;
-		smp_call_function_single(boot_cpuid, do_remote_set, &x, 1);
+		smp_call(boot_cpuid, do_remote_set, &x, SMP_CALL_TYPE_SYNC);
 		return x.retval;
 	}
 	return alpha_rtc_set_time(NULL, tm);
diff --git a/arch/arm/mach-bcm/bcm_kona_smc.c b/arch/arm/mach-bcm/bcm_kona_smc.c
index 43829e49ad93..75453079f79e 100644
--- a/arch/arm/mach-bcm/bcm_kona_smc.c
+++ b/arch/arm/mach-bcm/bcm_kona_smc.c
@@ -172,7 +172,7 @@ unsigned bcm_kona_smc(unsigned service_id, unsigned arg0, unsigned arg1,
 	 * Due to a limitation of the secure monitor, we must use the SMP
 	 * infrastructure to forward all secure monitor calls to Core 0.
 	 */
-	smp_call_function_single(0, __bcm_kona_smc, &data, 1);
+	smp_call(0, __bcm_kona_smc, &data, SMP_CALL_TYPE_SYNC);
 
 	return data.result;
 }
diff --git a/arch/arm/mach-mvebu/pmsu.c b/arch/arm/mach-mvebu/pmsu.c
index 73d5d72dfc3e..91ef1ba4100e 100644
--- a/arch/arm/mach-mvebu/pmsu.c
+++ b/arch/arm/mach-mvebu/pmsu.c
@@ -585,8 +585,8 @@ int mvebu_pmsu_dfs_request(int cpu)
 	writel(reg, pmsu_mp_base + PMSU_EVENT_STATUS_AND_MASK(hwcpu));
 
 	/* Trigger the DFS on the appropriate CPU */
-	smp_call_function_single(cpu, mvebu_pmsu_dfs_request_local,
-				 NULL, false);
+	smp_call(cpu, mvebu_pmsu_dfs_request_local,
+				 NULL, SMP_CALL_TYPE_ASYNC);
 
 	/* Poll until the DFS done event is generated */
 	timeout = jiffies + HZ;
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 9ab78ad826e2..c7a6d037feb4 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -331,7 +331,7 @@ int counters_read_on_cpu(int cpu, smp_call_func_t func, u64 *val)
 	if (WARN_ON_ONCE(irqs_disabled()))
 		return -EPERM;
 
-	smp_call_function_single(cpu, func, val, 1);
+	smp_call(cpu, func, val, SMP_CALL_TYPE_SYNC);
 
 	return 0;
 }
diff --git a/arch/csky/kernel/cpu-probe.c b/arch/csky/kernel/cpu-probe.c
index 5f15ca31d3e8..2af63175fd08 100644
--- a/arch/csky/kernel/cpu-probe.c
+++ b/arch/csky/kernel/cpu-probe.c
@@ -48,7 +48,7 @@ static int c_show(struct seq_file *m, void *v)
 	int cpu;
 
 	for_each_online_cpu(cpu)
-		smp_call_function_single(cpu, percpu_print, m, true);
+		smp_call(cpu, percpu_print, m, SMP_CALL_TYPE_SYNC);
 
 #ifdef CSKY_ARCH_VERSION
 	seq_printf(m, "arch-version : %s\n", CSKY_ARCH_VERSION);
diff --git a/arch/ia64/kernel/palinfo.c b/arch/ia64/kernel/palinfo.c
index 64189f04c1a4..9c1b113e155a 100644
--- a/arch/ia64/kernel/palinfo.c
+++ b/arch/ia64/kernel/palinfo.c
@@ -844,7 +844,8 @@ int palinfo_handle_smp(struct seq_file *m, pal_func_cpu_u_t *f)
 
 
 	/* will send IPI to other CPU and wait for completion of remote call */
-	if ((ret=smp_call_function_single(f->req_cpu, palinfo_smp_call, &ptr, 1))) {
+	ret = smp_call(f->req_cpu, palinfo_smp_call, &ptr, SMP_CALL_TYPE_SYNC);
+	if (ret) {
 		printk(KERN_ERR "palinfo: remote CPU call from %d to %d on function %d: "
 		       "error %d\n", smp_processor_id(), f->req_cpu, f->func_id, ret);
 		return 0;
diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c
index d10f780c13b9..e71829688bd2 100644
--- a/arch/ia64/kernel/smpboot.c
+++ b/arch/ia64/kernel/smpboot.c
@@ -295,7 +295,7 @@ ia64_sync_itc (unsigned int master)
 
 	go[MASTER] = 1;
 
-	if (smp_call_function_single(master, sync_master, NULL, 0) < 0) {
+	if (smp_call(master, sync_master, NULL, SMP_CALL_TYPE_ASYNC) < 0) {
 		printk(KERN_ERR "sync_itc: failed to get attention of CPU %u!\n", master);
 		return;
 	}
diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c
index f5d7bfa3472a..bd2df61e4f7f 100644
--- a/arch/mips/kernel/smp-bmips.c
+++ b/arch/mips/kernel/smp-bmips.c
@@ -488,8 +488,7 @@ static void bmips_set_reset_vec_remote(void *vinfo)
 
 	preempt_disable();
 	if (smp_processor_id() > 0) {
-		smp_call_function_single(0, &bmips_set_reset_vec_remote,
-					 info, 1);
+		smp_call(0, &bmips_set_reset_vec_remote, info, SMP_CALL_TYPE_SYNC);
 	} else {
 		if (info->cpu & 0x02) {
 			/* BMIPS5200 "should" use mask/shift, but it's buggy */
diff --git a/arch/mips/kernel/smp-cps.c b/arch/mips/kernel/smp-cps.c
index bcd6a944b839..3073b1e05b61 100644
--- a/arch/mips/kernel/smp-cps.c
+++ b/arch/mips/kernel/smp-cps.c
@@ -341,8 +341,7 @@ static int cps_boot_secondary(int cpu, struct task_struct *idle)
 			goto out;
 		}
 
-		err = smp_call_function_single(remote, remote_vpe_boot,
-					       NULL, 1);
+		err = smp_call(remote, remote_vpe_boot, NULL, SMP_CALL_TYPE_SYNC);
 		if (err)
 			panic("Failed to call remote CPU\n");
 		goto out;
@@ -587,9 +586,8 @@ static void cps_cpu_die(unsigned int cpu)
 		 * Have a CPU with access to the offlined CPUs registers wait
 		 * for its TC to halt.
 		 */
-		err = smp_call_function_single(cpu_death_sibling,
-					       wait_for_sibling_halt,
-					       (void *)(unsigned long)cpu, 1);
+		err = smp_call(cpu_death_sibling, wait_for_sibling_halt,
+			       (void *)(unsigned long)cpu, SMP_CALL_TYPE_SYNC);
 		if (err)
 			panic("Failed to call remote sibling CPU\n");
 	} else if (cpu_has_vp) {
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 898b48ca7b5a..b285ca10152b 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -98,7 +98,7 @@ static ssize_t show_##NAME(struct device *dev, \
 { \
 	struct cpu *cpu = container_of(dev, struct cpu, dev); \
 	unsigned long val; \
-	smp_call_function_single(cpu->dev.id, read_##NAME, &val, 1);	\
+	smp_call(cpu->dev.id, read_##NAME, &val, SMP_CALL_TYPE_SYNC);	\
 	return sprintf(buf, "%lx\n", val); \
 } \
 static ssize_t __used \
@@ -110,7 +110,7 @@ static ssize_t __used \
 	int ret = sscanf(buf, "%lx", &val); \
 	if (ret != 1) \
 		return -EINVAL; \
-	smp_call_function_single(cpu->dev.id, write_##NAME, &val, 1); \
+	smp_call(cpu->dev.id, write_##NAME, &val, SMP_CALL_TYPE_SYNC); \
 	return count; \
 }
 
@@ -262,7 +262,7 @@ static ssize_t show_pw20_state(struct device *dev,
 	u32 value;
 	unsigned int cpu = dev->id;
 
-	smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+	smp_call(cpu, do_show_pwrmgtcr0, &value, SMP_CALL_TYPE_SYNC);
 
 	value &= PWRMGTCR0_PW20_WAIT;
 
@@ -297,7 +297,7 @@ static ssize_t store_pw20_state(struct device *dev,
 	if (value > 1)
 		return -EINVAL;
 
-	smp_call_function_single(cpu, do_store_pw20_state, &value, 1);
+	smp_call(cpu, do_store_pw20_state, &value, SMP_CALL_TYPE_SYNC);
 
 	return count;
 }
@@ -312,7 +312,7 @@ static ssize_t show_pw20_wait_time(struct device *dev,
 	unsigned int cpu = dev->id;
 
 	if (!pw20_wt) {
-		smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+		smp_call(cpu, do_show_pwrmgtcr0, &value, SMP_CALL_TYPE_SYNC);
 		value = (value & PWRMGTCR0_PW20_ENT) >>
 					PWRMGTCR0_PW20_ENT_SHIFT;
 
@@ -372,8 +372,7 @@ static ssize_t store_pw20_wait_time(struct device *dev,
 
 	pw20_wt = value;
 
-	smp_call_function_single(cpu, set_pw20_wait_entry_bit,
-				&entry_bit, 1);
+	smp_call(cpu, set_pw20_wait_entry_bit, &entry_bit, SMP_CALL_TYPE_SYNC);
 
 	return count;
 }
@@ -384,7 +383,7 @@ static ssize_t show_altivec_idle(struct device *dev,
 	u32 value;
 	unsigned int cpu = dev->id;
 
-	smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+	smp_call(cpu, do_show_pwrmgtcr0, &value, SMP_CALL_TYPE_SYNC);
 
 	value &= PWRMGTCR0_AV_IDLE_PD_EN;
 
@@ -419,7 +418,7 @@ static ssize_t store_altivec_idle(struct device *dev,
 	if (value > 1)
 		return -EINVAL;
 
-	smp_call_function_single(cpu, do_store_altivec_idle, &value, 1);
+	smp_call(cpu, do_store_altivec_idle, &value, SMP_CALL_TYPE_SYNC);
 
 	return count;
 }
@@ -434,7 +433,7 @@ static ssize_t show_altivec_idle_wait_time(struct device *dev,
 	unsigned int cpu = dev->id;
 
 	if (!altivec_idle_wt) {
-		smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+		smp_call(cpu, do_show_pwrmgtcr0, &value, SMP_CALL_TYPE_SYNC);
 		value = (value & PWRMGTCR0_AV_IDLE_CNT) >>
 					PWRMGTCR0_AV_IDLE_CNT_SHIFT;
 
@@ -494,8 +493,7 @@ static ssize_t store_altivec_idle_wait_time(struct device *dev,
 
 	altivec_idle_wt = value;
 
-	smp_call_function_single(cpu, set_altivec_idle_wait_entry_bit,
-				&entry_bit, 1);
+	smp_call(cpu, set_altivec_idle_wait_entry_bit, &entry_bit, SMP_CALL_TYPE_SYNC);
 
 	return count;
 }
@@ -768,7 +766,7 @@ static ssize_t idle_purr_show(struct device *dev,
 	struct cpu *cpu = container_of(dev, struct cpu, dev);
 	u64 val;
 
-	smp_call_function_single(cpu->dev.id, read_idle_purr, &val, 1);
+	smp_call(cpu->dev.id, read_idle_purr, &val, SMP_CALL_TYPE_SYNC);
 	return sprintf(buf, "%llx\n", val);
 }
 static DEVICE_ATTR(idle_purr, 0400, idle_purr_show, NULL);
@@ -798,7 +796,7 @@ static ssize_t idle_spurr_show(struct device *dev,
 	struct cpu *cpu = container_of(dev, struct cpu, dev);
 	u64 val;
 
-	smp_call_function_single(cpu->dev.id, read_idle_spurr, &val, 1);
+	smp_call(cpu->dev.id, read_idle_spurr, &val, SMP_CALL_TYPE_SYNC);
 	return sprintf(buf, "%llx\n", val);
 }
 static DEVICE_ATTR(idle_spurr, 0400, idle_spurr_show, NULL);
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index bfc27496fe7e..1212a73a1ee3 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -499,7 +499,7 @@ static void start_watchdog(void *arg)
 
 static int start_watchdog_on_cpu(unsigned int cpu)
 {
-	return smp_call_function_single(cpu, start_watchdog, NULL, true);
+	return smp_call(cpu, start_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void stop_watchdog(void *arg)
@@ -522,7 +522,7 @@ static void stop_watchdog(void *arg)
 
 static int stop_watchdog_on_cpu(unsigned int cpu)
 {
-	return smp_call_function_single(cpu, stop_watchdog, NULL, true);
+	return smp_call(cpu, stop_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 static void watchdog_calc_timeouts(void)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6fa518f6501d..3a9df7302421 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1392,7 +1392,7 @@ static unsigned long kvmppc_read_dpdes(struct kvm_vcpu *vcpu)
 		 */
 		pcpu = READ_ONCE(v->cpu);
 		if (pcpu >= 0)
-			smp_call_function_single(pcpu, do_nothing, NULL, 1);
+			smp_call(pcpu, do_nothing, NULL, SMP_CALL_TYPE_SYNC);
 		if (kvmppc_doorbell_pending(v))
 			dpdes |= 1 << thr;
 	}
@@ -3082,7 +3082,7 @@ static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu)
 		struct kvm *running = *per_cpu_ptr(&cpu_in_guest, i);
 
 		if (running == kvm)
-			smp_call_function_single(i, do_nothing, NULL, 1);
+			smp_call(i, do_nothing, NULL, SMP_CALL_TYPE_SYNC);
 	}
 }
 
@@ -3136,8 +3136,8 @@ static void kvmppc_prepare_radix_vcpu(struct kvm_vcpu *vcpu, int pcpu)
 			    cpu_first_tlb_thread_sibling(pcpu))
 				radix_flush_cpu(kvm, prev_cpu, vcpu);
 
-			smp_call_function_single(prev_cpu,
-					do_migrate_away_vcpu, vcpu, 1);
+			smp_call(prev_cpu, do_migrate_away_vcpu,
+				  vcpu, SMP_CALL_TYPE_SYNC);
 		}
 		if (nested)
 			nested->prev_cpu[vcpu->arch.nested_vcpu_id] = pcpu;
diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index fddf5e1cda6d..c4db585a8d01 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -300,12 +300,10 @@ static int smp_85xx_kick_cpu(int nr)
 		 * the other.
 		 */
 		if (cpu_online(primary)) {
-			smp_call_function_single(primary,
-					wake_hw_thread, &nr, 1);
+			smp_call(primary, wake_hw_thread, &nr, SMP_CALL_TYPE_SYNC);
 			goto done;
 		} else if (cpu_online(primary + 1)) {
-			smp_call_function_single(primary + 1,
-					wake_hw_thread, &nr, 1);
+			smp_call(primary + 1, wake_hw_thread, &nr, SMP_CALL_TYPE_SYNC);
 			goto done;
 		}
 
diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 02e5d1c54fcc..1c5937c3199f 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -297,7 +297,7 @@ int watchdog_nmi_enable(unsigned int cpu)
 	if (!nmi_init_done)
 		return 0;
 
-	smp_call_function_single(cpu, start_nmi_watchdog, NULL, 1);
+	smp_call(cpu, start_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 
 	return 0;
 }
@@ -310,5 +310,5 @@ void watchdog_nmi_disable(unsigned int cpu)
 	if (atomic_read(&nmi_active) == -1)
 		pr_warn_once("NMI watchdog cannot be enabled or disabled\n");
 	else
-		smp_call_function_single(cpu, stop_nmi_watchdog, NULL, 1);
+		smp_call(cpu, stop_nmi_watchdog, NULL, SMP_CALL_TYPE_SYNC);
 }
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 3e6f6b448f6a..4f9f973b3ef3 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -1266,7 +1266,7 @@ static void __init print_local_APICs(int maxcpu)
 	for_each_online_cpu(cpu) {
 		if (cpu >= maxcpu)
 			break;
-		smp_call_function_single(cpu, print_local_APIC, NULL, 1);
+		smp_call(cpu, print_local_APIC, NULL, SMP_CALL_TYPE_SYNC);
 	}
 	preempt_enable();
 }
diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
index 9ca008f9e9b1..5688727068a6 100644
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -77,7 +77,8 @@ static bool aperfmperf_snapshot_cpu(int cpu, ktime_t now, bool wait)
 		return true;
 
 	if (!atomic_xchg(&s->scfpending, 1) || wait)
-		smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, wait);
+		smp_call(cpu, aperfmperf_snapshot_khz, NULL,
+				(wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC));
 
 	/* Return false if the previous iteration was too long ago. */
 	return time_delta <= APERFMPERF_STALE_THRESHOLD_MS;
@@ -145,7 +146,7 @@ unsigned int arch_freq_get_on_cpu(int cpu)
 	msleep(APERFMPERF_REFRESH_DELAY_MS);
 	atomic_set(&s->scfpending, 1);
 	smp_mb(); /* ->scfpending before smp_call_function_single(). */
-	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
+	smp_call(cpu, aperfmperf_snapshot_khz, NULL, SMP_CALL_TYPE_SYNC);
 
 	return per_cpu(samples.khz, cpu);
 }
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 1940d305db1c..101ac8ef99fd 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -929,7 +929,7 @@ store_interrupt_enable(struct threshold_block *b, const char *buf, size_t size)
 	memset(&tr, 0, sizeof(tr));
 	tr.b		= b;
 
-	if (smp_call_function_single(b->cpu, threshold_restart_bank, &tr, 1))
+	if (smp_call(b->cpu, threshold_restart_bank, &tr, SMP_CALL_TYPE_SYNC))
 		return -ENODEV;
 
 	return size;
@@ -954,7 +954,7 @@ store_threshold_limit(struct threshold_block *b, const char *buf, size_t size)
 	b->threshold_limit = new;
 	tr.b = b;
 
-	if (smp_call_function_single(b->cpu, threshold_restart_bank, &tr, 1))
+	if (smp_call(b->cpu, threshold_restart_bank, &tr, SMP_CALL_TYPE_SYNC))
 		return -ENODEV;
 
 	return size;
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 728c7df0f56f..1cf408bf45e1 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -550,19 +550,19 @@ static void do_inject(void)
 
 	i_mce.mcgstatus = mcg_status;
 	i_mce.inject_flags = inj_type;
-	smp_call_function_single(cpu, prepare_msrs, &i_mce, 0);
+	smp_call(cpu, prepare_msrs, &i_mce, SMP_CALL_TYPE_ASYNC);
 
 	toggle_hw_mce_inject(cpu, false);
 
 	switch (inj_type) {
 	case DFR_INT_INJ:
-		smp_call_function_single(cpu, trigger_dfr_int, NULL, 0);
+		smp_call(cpu, trigger_dfr_int, NULL, SMP_CALL_TYPE_ASYNC);
 		break;
 	case THR_INT_INJ:
-		smp_call_function_single(cpu, trigger_thr_int, NULL, 0);
+		smp_call(cpu, trigger_thr_int, NULL, SMP_CALL_TYPE_ASYNC);
 		break;
 	default:
-		smp_call_function_single(cpu, trigger_mce, NULL, 0);
+		smp_call(cpu, trigger_mce, NULL, SMP_CALL_TYPE_ASYNC);
 	}
 
 err:
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index f955d25076ba..c4aa5dbfea6b 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -332,7 +332,7 @@ static int collect_cpu_info_on_target(int cpu, struct cpu_signature *cpu_sig)
 	struct cpu_info_ctx ctx = { .cpu_sig = cpu_sig, .err = 0 };
 	int ret;
 
-	ret = smp_call_function_single(cpu, collect_cpu_info_local, &ctx, 1);
+	ret = smp_call(cpu, collect_cpu_info_local, &ctx, SMP_CALL_TYPE_SYNC);
 	if (!ret)
 		ret = ctx.err;
 
@@ -365,7 +365,7 @@ static int apply_microcode_on_target(int cpu)
 	enum ucode_state err;
 	int ret;
 
-	ret = smp_call_function_single(cpu, apply_microcode_local, &err, 1);
+	ret = smp_call(cpu, apply_microcode_local, &err, SMP_CALL_TYPE_SYNC);
 	if (!ret) {
 		if (err == UCODE_ERROR)
 			ret = 1;
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 2746cac9d8a9..1ef6068a9970 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -820,7 +820,7 @@ void mtrr_save_state(void)
 		return;
 
 	first_cpu = cpumask_first(cpu_online_mask);
-	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
+	smp_call(first_cpu, mtrr_save_fixed_ranges, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 void set_mtrr_aps_delayed_init(void)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index aafa68060c45..41b9b79e060c 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -541,7 +541,7 @@ static void _update_task_closid_rmid(void *task)
 static void update_task_closid_rmid(struct task_struct *t)
 {
 	if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
-		smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
+		smp_call(task_cpu(t), _update_task_closid_rmid, t, SMP_CALL_TYPE_SYNC);
 	else
 		_update_task_closid_rmid(t);
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 5955093e0deb..64b1646ed56c 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -1118,7 +1118,7 @@ void arch_haltpoll_enable(unsigned int cpu)
 	}
 
 	/* Enable guest halt poll disables host halt poll */
-	smp_call_function_single(cpu, kvm_disable_host_haltpoll, NULL, 1);
+	smp_call(cpu, kvm_disable_host_haltpoll, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL_GPL(arch_haltpoll_enable);
 
@@ -1128,7 +1128,7 @@ void arch_haltpoll_disable(unsigned int cpu)
 		return;
 
 	/* Disable guest halt poll enables host halt poll */
-	smp_call_function_single(cpu, kvm_enable_host_haltpoll, NULL, 1);
+	smp_call(cpu, kvm_enable_host_haltpoll, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL_GPL(arch_haltpoll_disable);
 #endif
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 04d170c4b61e..8e531221d810 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -656,8 +656,7 @@ void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
 	int cpu = loaded_vmcs->cpu;
 
 	if (cpu != -1)
-		smp_call_function_single(cpu,
-			 __loaded_vmcs_clear, loaded_vmcs, 1);
+		smp_call(cpu, __loaded_vmcs_clear, loaded_vmcs, SMP_CALL_TYPE_SYNC);
 }
 
 static bool vmx_segment_cache_test_set(struct vcpu_vmx *vmx, unsigned seg,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9cd1dd022879..7715911fd54f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4552,8 +4552,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		if (static_call(kvm_x86_has_wbinvd_exit)())
 			cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
 		else if (vcpu->cpu != -1 && vcpu->cpu != cpu)
-			smp_call_function_single(vcpu->cpu,
-					wbinvd_ipi, NULL, 1);
+			smp_call(vcpu->cpu, wbinvd_ipi, NULL, SMP_CALL_TYPE_SYNC);
 	}
 
 	static_call(kvm_x86_vcpu_load)(vcpu, cpu);
@@ -8730,7 +8729,7 @@ static void __kvmclock_cpufreq_notifier(struct cpufreq_freqs *freq, int cpu)
 	 *
 	 */
 
-	smp_call_function_single(cpu, tsc_khz_changed, freq, 1);
+	smp_call(cpu, tsc_khz_changed, freq, SMP_CALL_TYPE_SYNC);
 
 	mutex_lock(&kvm_lock);
 	list_for_each_entry(kvm, &vm_list, vm_list) {
@@ -8757,7 +8756,7 @@ static void __kvmclock_cpufreq_notifier(struct cpufreq_freqs *freq, int cpu)
 		 * guest context is entered kvmclock will be updated,
 		 * so the guest will not see stale values.
 		 */
-		smp_call_function_single(cpu, tsc_khz_changed, freq, 1);
+		smp_call(cpu, tsc_khz_changed, freq, SMP_CALL_TYPE_SYNC);
 	}
 }
 
diff --git a/arch/x86/lib/cache-smp.c b/arch/x86/lib/cache-smp.c
index 20d00dc2bb8c..bbfa2ef4c1ea 100644
--- a/arch/x86/lib/cache-smp.c
+++ b/arch/x86/lib/cache-smp.c
@@ -9,7 +9,7 @@ static void __wbinvd(void *dummy)
 
 void wbinvd_on_cpu(int cpu)
 {
-	smp_call_function_single(cpu, __wbinvd, NULL, 1);
+	smp_call(cpu, __wbinvd, NULL, SMP_CALL_TYPE_SYNC);
 }
 EXPORT_SYMBOL(wbinvd_on_cpu);
 
diff --git a/arch/x86/lib/msr-smp.c b/arch/x86/lib/msr-smp.c
index 71b8bf97450e..b31708a73129 100644
--- a/arch/x86/lib/msr-smp.c
+++ b/arch/x86/lib/msr-smp.c
@@ -41,7 +41,7 @@ int rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h)
 	memset(&rv, 0, sizeof(rv));
 
 	rv.msr_no = msr_no;
-	err = smp_call_function_single(cpu, __rdmsr_on_cpu, &rv, 1);
+	err = smp_call(cpu, __rdmsr_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 	*l = rv.reg.l;
 	*h = rv.reg.h;
 
@@ -57,7 +57,7 @@ int rdmsrl_on_cpu(unsigned int cpu, u32 msr_no, u64 *q)
 	memset(&rv, 0, sizeof(rv));
 
 	rv.msr_no = msr_no;
-	err = smp_call_function_single(cpu, __rdmsr_on_cpu, &rv, 1);
+	err = smp_call(cpu, __rdmsr_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 	*q = rv.reg.q;
 
 	return err;
@@ -74,7 +74,7 @@ int wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h)
 	rv.msr_no = msr_no;
 	rv.reg.l = l;
 	rv.reg.h = h;
-	err = smp_call_function_single(cpu, __wrmsr_on_cpu, &rv, 1);
+	err = smp_call(cpu, __wrmsr_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 
 	return err;
 }
@@ -90,7 +90,7 @@ int wrmsrl_on_cpu(unsigned int cpu, u32 msr_no, u64 q)
 	rv.msr_no = msr_no;
 	rv.reg.q = q;
 
-	err = smp_call_function_single(cpu, __wrmsr_on_cpu, &rv, 1);
+	err = smp_call(cpu, __wrmsr_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 
 	return err;
 }
@@ -200,7 +200,7 @@ int wrmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h)
 	rv.msr_no = msr_no;
 	rv.reg.l = l;
 	rv.reg.h = h;
-	err = smp_call_function_single(cpu, __wrmsr_safe_on_cpu, &rv, 1);
+	err = smp_call(cpu, __wrmsr_safe_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 
 	return err ? err : rv.err;
 }
@@ -216,7 +216,7 @@ int wrmsrl_safe_on_cpu(unsigned int cpu, u32 msr_no, u64 q)
 	rv.msr_no = msr_no;
 	rv.reg.q = q;
 
-	err = smp_call_function_single(cpu, __wrmsr_safe_on_cpu, &rv, 1);
+	err = smp_call(cpu, __wrmsr_safe_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 
 	return err ? err : rv.err;
 }
@@ -259,7 +259,7 @@ int rdmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8])
 
 	rv.regs   = regs;
 	rv.err    = -EIO;
-	err = smp_call_function_single(cpu, __rdmsr_safe_regs_on_cpu, &rv, 1);
+	err = smp_call(cpu, __rdmsr_safe_regs_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 
 	return err ? err : rv.err;
 }
@@ -272,7 +272,7 @@ int wrmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8])
 
 	rv.regs = regs;
 	rv.err  = -EIO;
-	err = smp_call_function_single(cpu, __wrmsr_safe_regs_on_cpu, &rv, 1);
+	err = smp_call(cpu, __wrmsr_safe_regs_on_cpu, &rv, SMP_CALL_TYPE_SYNC);
 
 	return err ? err : rv.err;
 }
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 1122038107ff..22626c6727bf 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -930,7 +930,7 @@ static void xen_drop_mm_ref(struct mm_struct *mm)
 		for_each_online_cpu(cpu) {
 			if (per_cpu(xen_current_cr3, cpu) != __pa(mm->pgd))
 				continue;
-			smp_call_function_single(cpu, drop_mm_ref_this_cpu, mm, 1);
+			smp_call(cpu, drop_mm_ref_this_cpu, mm, SMP_CALL_TYPE_SYNC);
 		}
 		return;
 	}
diff --git a/arch/xtensa/kernel/smp.c b/arch/xtensa/kernel/smp.c
index 5878ba8f0274..2d30ae400d5b 100644
--- a/arch/xtensa/kernel/smp.c
+++ b/arch/xtensa/kernel/smp.c
@@ -201,7 +201,7 @@ static int boot_secondary(unsigned int cpu, struct task_struct *ts)
 	system_flush_invalidate_dcache_range((unsigned long)&cpu_start_id,
 					     sizeof(cpu_start_id));
 #endif
-	smp_call_function_single(0, mx_cpu_start, (void *)cpu, 1);
+	smp_call(0, mx_cpu_start, (void *)cpu, SMP_CALL_TYPE_SYNC);
 
 	for (i = 0; i < 2; ++i) {
 		do
@@ -220,8 +220,7 @@ static int boot_secondary(unsigned int cpu, struct task_struct *ts)
 		} while (ccount && time_before(jiffies, timeout));
 
 		if (ccount) {
-			smp_call_function_single(0, mx_cpu_stop,
-						 (void *)cpu, 1);
+			smp_call(0, mx_cpu_stop, (void *)cpu, SMP_CALL_TYPE_SYNC);
 			WRITE_ONCE(cpu_start_ccount, 0);
 			return -EIO;
 		}
@@ -292,7 +291,7 @@ int __cpu_disable(void)
 
 static void platform_cpu_kill(unsigned int cpu)
 {
-	smp_call_function_single(0, mx_cpu_stop, (void *)cpu, true);
+	smp_call(0, mx_cpu_stop, (void *)cpu, SMP_CALL_TYPE_SYNC);
 }
 
 /*
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 4556c86c3465..2ba4b0344021 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -160,8 +160,8 @@ static void __lapic_timer_propagate_broadcast(void *arg)
 
 static void lapic_timer_propagate_broadcast(struct acpi_processor *pr)
 {
-	smp_call_function_single(pr->id, __lapic_timer_propagate_broadcast,
-				 (void *)pr, 1);
+	smp_call(pr->id, __lapic_timer_propagate_broadcast,
+			 (void *)pr, SMP_CALL_TYPE_SYNC);
 }
 
 /* Power(C) State timer broadcast control */
diff --git a/drivers/cpufreq/powernow-k8.c b/drivers/cpufreq/powernow-k8.c
index d289036beff2..f5a83043c321 100644
--- a/drivers/cpufreq/powernow-k8.c
+++ b/drivers/cpufreq/powernow-k8.c
@@ -1025,7 +1025,7 @@ static int powernowk8_cpu_init(struct cpufreq_policy *pol)
 	struct init_on_cpu init_on_cpu;
 	int rc, cpu;
 
-	smp_call_function_single(pol->cpu, check_supported_cpu, &rc, 1);
+	smp_call(pol->cpu, check_supported_cpu, &rc, SMP_CALL_TYPE_SYNC);
 	if (rc)
 		return -ENODEV;
 
@@ -1062,8 +1062,7 @@ static int powernowk8_cpu_init(struct cpufreq_policy *pol)
 
 	/* only run on specific CPU from here on */
 	init_on_cpu.data = data;
-	smp_call_function_single(data->cpu, powernowk8_cpu_init_on_cpu,
-				 &init_on_cpu, 1);
+	smp_call(data->cpu, powernowk8_cpu_init_on_cpu, &init_on_cpu, SMP_CALL_TYPE_SYNC);
 	rc = init_on_cpu.rc;
 	if (rc != 0)
 		goto err_out_exit_acpi;
@@ -1124,7 +1123,7 @@ static unsigned int powernowk8_get(unsigned int cpu)
 	if (!data)
 		return 0;
 
-	smp_call_function_single(cpu, query_values_on_cpu, &err, true);
+	smp_call(cpu, query_values_on_cpu, &err, SMP_CALL_TYPE_SYNC);
 	if (err)
 		goto out;
 
@@ -1182,7 +1181,7 @@ static int powernowk8_init(void)
 
 	cpus_read_lock();
 	for_each_online_cpu(i) {
-		smp_call_function_single(i, check_supported_cpu, &ret, 1);
+		smp_call(i, check_supported_cpu, &ret, SMP_CALL_TYPE_SYNC);
 		if (!ret)
 			supported_cpus++;
 	}
diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index d4f45bb9c419..745c01118b33 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -883,7 +883,7 @@ static int powernv_cpufreq_cpu_exit(struct cpufreq_policy *policy)
 
 	freq_data.pstate_id = idx_to_pstate(powernv_pstate_info.min);
 	freq_data.gpstate_id = idx_to_pstate(powernv_pstate_info.min);
-	smp_call_function_single(policy->cpu, set_pstate, &freq_data, 1);
+	smp_call(policy->cpu, set_pstate, &freq_data, SMP_CALL_TYPE_SYNC);
 	if (gpstates)
 		del_timer_sync(&gpstates->timer);
 
diff --git a/drivers/cpufreq/sparc-us2e-cpufreq.c b/drivers/cpufreq/sparc-us2e-cpufreq.c
index 92acbb25abb3..8c447bf67dc5 100644
--- a/drivers/cpufreq/sparc-us2e-cpufreq.c
+++ b/drivers/cpufreq/sparc-us2e-cpufreq.c
@@ -236,7 +236,7 @@ static unsigned int us2e_freq_get(unsigned int cpu)
 	unsigned long clock_tick, estar;
 
 	clock_tick = sparc64_get_clock_tick(cpu) / 1000;
-	if (smp_call_function_single(cpu, __us2e_freq_get, &estar, 1))
+	if (smp_call(cpu, __us2e_freq_get, &estar, SMP_CALL_TYPE_SYNC))
 		return 0;
 
 	return clock_tick / estar_to_divisor(estar);
@@ -268,7 +268,7 @@ static int us2e_freq_target(struct cpufreq_policy *policy, unsigned int index)
 {
 	unsigned int cpu = policy->cpu;
 
-	return smp_call_function_single(cpu, __us2e_freq_target, &index, 1);
+	return smp_call(cpu, __us2e_freq_target, &index, SMP_CALL_TYPE_SYNC);
 }
 
 static int __init us2e_freq_cpu_init(struct cpufreq_policy *policy)
diff --git a/drivers/cpufreq/sparc-us3-cpufreq.c b/drivers/cpufreq/sparc-us3-cpufreq.c
index e41b35b16afd..0dd7cd16bf93 100644
--- a/drivers/cpufreq/sparc-us3-cpufreq.c
+++ b/drivers/cpufreq/sparc-us3-cpufreq.c
@@ -87,7 +87,7 @@ static unsigned int us3_freq_get(unsigned int cpu)
 {
 	unsigned long reg;
 
-	if (smp_call_function_single(cpu, read_safari_cfg, &reg, 1))
+	if (smp_call(cpu, read_safari_cfg, &reg, SMP_CALL_TYPE_SYNC))
 		return 0;
 	return get_current_freq(cpu, reg);
 }
@@ -116,7 +116,7 @@ static int us3_freq_target(struct cpufreq_policy *policy, unsigned int index)
 		BUG();
 	}
 
-	return smp_call_function_single(cpu, update_safari_cfg, &new_bits, 1);
+	return smp_call(cpu, update_safari_cfg, &new_bits, SMP_CALL_TYPE_SYNC);
 }
 
 static int __init us3_freq_cpu_init(struct cpufreq_policy *policy)
diff --git a/drivers/cpufreq/speedstep-ich.c b/drivers/cpufreq/speedstep-ich.c
index f2076d72bf39..be062ac5804c 100644
--- a/drivers/cpufreq/speedstep-ich.c
+++ b/drivers/cpufreq/speedstep-ich.c
@@ -243,7 +243,7 @@ static unsigned int speedstep_get(unsigned int cpu)
 	unsigned int speed;
 
 	/* You're supposed to ensure CPU is online. */
-	BUG_ON(smp_call_function_single(cpu, get_freq_data, &speed, 1));
+	BUG_ON(smp_call(cpu, get_freq_data, &speed, SMP_CALL_TYPE_SYNC));
 
 	pr_debug("detected %u kHz as current frequency\n", speed);
 	return speed;
@@ -262,8 +262,7 @@ static int speedstep_target(struct cpufreq_policy *policy, unsigned int index)
 
 	policy_cpu = cpumask_any_and(policy->cpus, cpu_online_mask);
 
-	smp_call_function_single(policy_cpu, _speedstep_set_state, &index,
-				 true);
+	smp_call(policy_cpu, _speedstep_set_state, &index, SMP_CALL_TYPE_SYNC);
 
 	return 0;
 }
@@ -299,7 +298,7 @@ static int speedstep_cpu_init(struct cpufreq_policy *policy)
 
 	/* detect low and high frequency and transition latency */
 	gf.policy = policy;
-	smp_call_function_single(policy_cpu, get_freqs_on_cpu, &gf, 1);
+	smp_call(policy_cpu, get_freqs_on_cpu, &gf, SMP_CALL_TYPE_SYNC);
 	if (gf.ret)
 		return gf.ret;
 
diff --git a/drivers/cpufreq/tegra194-cpufreq.c b/drivers/cpufreq/tegra194-cpufreq.c
index 99fc456f16b6..09a76382f383 100644
--- a/drivers/cpufreq/tegra194-cpufreq.c
+++ b/drivers/cpufreq/tegra194-cpufreq.c
@@ -203,13 +203,13 @@ static unsigned int tegra194_get_speed(u32 cpu)
 	int ret;
 	u32 cl;
 
-	smp_call_function_single(cpu, get_cpu_cluster, &cl, true);
+	smp_call(cpu, get_cpu_cluster, &cl, SMP_CALL_TYPE_SYNC);
 
 	/* reconstruct actual cpu freq using counters */
 	rate = tegra194_calculate_speed(cpu);
 
 	/* get last written ndiv value */
-	ret = smp_call_function_single(cpu, get_cpu_ndiv, &ndiv, true);
+	ret = smp_call(cpu, get_cpu_ndiv, &ndiv, SMP_CALL_TYPE_SYNC);
 	if (WARN_ON_ONCE(ret))
 		return rate;
 
@@ -240,7 +240,7 @@ static int tegra194_cpufreq_init(struct cpufreq_policy *policy)
 	u32 cpu;
 	u32 cl;
 
-	smp_call_function_single(policy->cpu, get_cpu_cluster, &cl, true);
+	smp_call(policy->cpu, get_cpu_cluster, &cl, SMP_CALL_TYPE_SYNC);
 
 	if (cl >= data->num_clusters || !data->tables[cl])
 		return -EINVAL;
diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c b/drivers/hwtracing/coresight/coresight-cpu-debug.c
index 8845ec4b4402..f051a658f1cf 100644
--- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
+++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
@@ -590,8 +590,7 @@ static int debug_probe(struct amba_device *adev, const struct amba_id *id)
 
 	cpus_read_lock();
 	per_cpu(debug_drvdata, drvdata->cpu) = drvdata;
-	ret = smp_call_function_single(drvdata->cpu, debug_init_arch_data,
-				       drvdata, 1);
+	ret = smp_call(drvdata->cpu, debug_init_arch_data, drvdata, SMP_CALL_TYPE_SYNC);
 	cpus_read_unlock();
 
 	if (ret) {
diff --git a/drivers/hwtracing/coresight/coresight-etm3x-core.c b/drivers/hwtracing/coresight/coresight-etm3x-core.c
index 7d413ba8b823..7f11115163e2 100644
--- a/drivers/hwtracing/coresight/coresight-etm3x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm3x-core.c
@@ -518,8 +518,8 @@ static int etm_enable_sysfs(struct coresight_device *csdev)
 	 */
 	if (cpu_online(drvdata->cpu)) {
 		arg.drvdata = drvdata;
-		ret = smp_call_function_single(drvdata->cpu,
-					       etm_enable_hw_smp_call, &arg, 1);
+		ret = smp_call(drvdata->cpu, etm_enable_hw_smp_call,
+				&arg, SMP_CALL_TYPE_SYNC);
 		if (!ret)
 			ret = arg.rc;
 		if (!ret)
@@ -630,7 +630,7 @@ static void etm_disable_sysfs(struct coresight_device *csdev)
 	 * Executing etm_disable_hw on the cpu whose ETM is being disabled
 	 * ensures that register writes occur when cpu is powered.
 	 */
-	smp_call_function_single(drvdata->cpu, etm_disable_hw, drvdata, 1);
+	smp_call(drvdata->cpu, etm_disable_hw, drvdata, SMP_CALL_TYPE_SYNC);
 
 	spin_unlock(&drvdata->spinlock);
 	cpus_read_unlock();
@@ -864,8 +864,7 @@ static int etm_probe(struct amba_device *adev, const struct amba_id *id)
 	if (!desc.name)
 		return -ENOMEM;
 
-	if (smp_call_function_single(drvdata->cpu,
-				     etm_init_arch_data,  drvdata, 1))
+	if (smp_call(drvdata->cpu, etm_init_arch_data,  drvdata, SMP_CALL_TYPE_SYNC))
 		dev_err(dev, "ETM arch init failed\n");
 
 	if (etm_arch_supported(drvdata->arch) == false)
@@ -933,7 +932,7 @@ static void etm_remove(struct amba_device *adev)
 	 * CPU i ensures these call backs has consistent view
 	 * inside one call back function.
 	 */
-	if (smp_call_function_single(drvdata->cpu, clear_etmdrvdata, &drvdata->cpu, 1))
+	if (smp_call(drvdata->cpu, clear_etmdrvdata, &drvdata->cpu, SMP_CALL_TYPE_SYNC))
 		etmdrvdata[drvdata->cpu] = NULL;
 
 	cpus_read_unlock();
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index 7f416a12000e..4f136e477808 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -746,8 +746,8 @@ static int etm4_enable_sysfs(struct coresight_device *csdev)
 	 * ensures that register writes occur when cpu is powered.
 	 */
 	arg.drvdata = drvdata;
-	ret = smp_call_function_single(drvdata->cpu,
-				       etm4_enable_hw_smp_call, &arg, 1);
+	ret = smp_call(drvdata->cpu, etm4_enable_hw_smp_call,
+			&arg, SMP_CALL_TYPE_SYNC);
 	if (!ret)
 		ret = arg.rc;
 	if (!ret)
@@ -903,7 +903,7 @@ static void etm4_disable_sysfs(struct coresight_device *csdev)
 	 * Executing etm4_disable_hw on the cpu whose ETM is being disabled
 	 * ensures that register writes occur when cpu is powered.
 	 */
-	smp_call_function_single(drvdata->cpu, etm4_disable_hw, drvdata, 1);
+	smp_call(drvdata->cpu, etm4_disable_hw, drvdata, SMP_CALL_TYPE_SYNC);
 
 	spin_unlock(&drvdata->spinlock);
 	cpus_read_unlock();
@@ -1977,8 +1977,8 @@ static int etm4_probe(struct device *dev, void __iomem *base, u32 etm_pid)
 	init_arg.csa = &desc.access;
 	init_arg.pid = etm_pid;
 
-	if (smp_call_function_single(drvdata->cpu,
-				etm4_init_arch_data,  &init_arg, 1))
+	if (smp_call(drvdata->cpu, etm4_init_arch_data,
+				&init_arg, SMP_CALL_TYPE_SYNC))
 		dev_err(dev, "ETM arch init failed\n");
 
 	if (!drvdata->arch)
@@ -2118,7 +2118,7 @@ static int __exit etm4_remove_dev(struct etmv4_drvdata *drvdata)
 	 * CPU i ensures these call backs has consistent view
 	 * inside one call back function.
 	 */
-	if (smp_call_function_single(drvdata->cpu, clear_etmdrvdata, &drvdata->cpu, 1))
+	if (smp_call(drvdata->cpu, clear_etmdrvdata, &drvdata->cpu, SMP_CALL_TYPE_SYNC))
 		etmdrvdata[drvdata->cpu] = NULL;
 
 	cpus_read_unlock();
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
index 21687cc1e4e2..377f7e40b081 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
@@ -2379,7 +2379,7 @@ static u32 etmv4_cross_read(const struct etmv4_drvdata *drvdata, u32 offset)
 	 * smp cross call ensures the CPU will be powered up before
 	 * accessing the ETMv4 trace core registers
 	 */
-	smp_call_function_single(drvdata->cpu, do_smp_cross_read, &reg, 1);
+	smp_call(drvdata->cpu, do_smp_cross_read, &reg, SMP_CALL_TYPE_SYNC);
 	return reg.data;
 }
 
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 2b386bb848f8..c0e3cff19cdb 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -1350,12 +1350,12 @@ static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata)
 
 	for_each_cpu(cpu, &drvdata->supported_cpus) {
 		/* If we fail to probe the CPU, let us defer it to hotplug callbacks */
-		if (smp_call_function_single(cpu, arm_trbe_probe_cpu, drvdata, 1))
+		if (smp_call(cpu, arm_trbe_probe_cpu, drvdata, SMP_CALL_TYPE_SYNC))
 			continue;
 		if (cpumask_test_cpu(cpu, &drvdata->supported_cpus))
 			arm_trbe_register_coresight_cpu(drvdata, cpu);
 		if (cpumask_test_cpu(cpu, &drvdata->supported_cpus))
-			smp_call_function_single(cpu, arm_trbe_enable_cpu, drvdata, 1);
+			smp_call(cpu, arm_trbe_enable_cpu, drvdata, SMP_CALL_TYPE_SYNC);
 	}
 	return 0;
 }
@@ -1365,7 +1365,7 @@ static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata)
 	int cpu;
 
 	for_each_cpu(cpu, &drvdata->supported_cpus)
-		smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
+		smp_call(cpu, arm_trbe_remove_coresight_cpu, drvdata, SMP_CALL_TYPE_SYNC);
 	free_percpu(drvdata->cpudata);
 	return 0;
 }
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 77042499baa4..165c59778a33 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -4304,8 +4304,8 @@ static void mvneta_percpu_elect(struct mvneta_port *pp)
 		/* Update the interrupt mask on each CPU according the
 		 * new mapping
 		 */
-		smp_call_function_single(cpu, mvneta_percpu_unmask_interrupt,
-					 pp, true);
+		smp_call(cpu, mvneta_percpu_unmask_interrupt,
+			  pp, SMP_CALL_TYPE_SYNC);
 		i++;
 
 	}
diff --git a/drivers/platform/x86/intel/speed_select_if/isst_if_mbox_msr.c b/drivers/platform/x86/intel/speed_select_if/isst_if_mbox_msr.c
index 1b6eab071068..2ae76aa42d50 100644
--- a/drivers/platform/x86/intel/speed_select_if/isst_if_mbox_msr.c
+++ b/drivers/platform/x86/intel/speed_select_if/isst_if_mbox_msr.c
@@ -124,8 +124,8 @@ static long isst_if_mbox_proc_cmd(u8 *cmd_ptr, int *write_only, int resume)
 	 * and also with wait flag, wait for completion.
 	 * smp_call_function_single is using get_cpu() and put_cpu().
 	 */
-	ret = smp_call_function_single(action.mbox_cmd->logical_cpu,
-				       msrl_update_func, &action, 1);
+	ret = smp_call(action.mbox_cmd->logical_cpu,
+			msrl_update_func, &action, SMP_CALL_TYPE_SYNC);
 	if (ret)
 		return ret;
 
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
index 07611a00b78f..518ef59e7467 100644
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -913,7 +913,7 @@ static void package_power_limit_irq_save(struct rapl_package *rp)
 	if (!boot_cpu_has(X86_FEATURE_PTS) || !boot_cpu_has(X86_FEATURE_PLN))
 		return;
 
-	smp_call_function_single(rp->lead_cpu, power_limit_irq_save_cpu, rp, 1);
+	smp_call(rp->lead_cpu, power_limit_irq_save_cpu, rp, SMP_CALL_TYPE_SYNC);
 }
 
 /*
diff --git a/drivers/powercap/intel_rapl_msr.c b/drivers/powercap/intel_rapl_msr.c
index 1be45f36ab6c..0430bafbffed 100644
--- a/drivers/powercap/intel_rapl_msr.c
+++ b/drivers/powercap/intel_rapl_msr.c
@@ -128,7 +128,7 @@ static int rapl_msr_write_raw(int cpu, struct reg_action *ra)
 {
 	int ret;
 
-	ret = smp_call_function_single(cpu, rapl_msr_update_func, ra, 1);
+	ret = smp_call(cpu, rapl_msr_update_func, ra, SMP_CALL_TYPE_SYNC);
 	if (WARN_ON_ONCE(ret))
 		return ret;
 
diff --git a/drivers/regulator/qcom_spmi-regulator.c b/drivers/regulator/qcom_spmi-regulator.c
index 02bfce981150..222fc60bb1f3 100644
--- a/drivers/regulator/qcom_spmi-regulator.c
+++ b/drivers/regulator/qcom_spmi-regulator.c
@@ -1292,8 +1292,7 @@ spmi_regulator_saw_set_voltage(struct regulator_dev *rdev, unsigned selector)
 	}
 
 	/* Always do the SAW register writes on the first CPU */
-	return smp_call_function_single(0, spmi_saw_set_vdd, \
-					&voltage_sel, true);
+	return smp_call(0, spmi_saw_set_vdd, &voltage_sel, SMP_CALL_TYPE_SYNC);
 }
 
 static struct regulator_ops spmi_saw_ops = {};
diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index fde4edd83c14..4fb3cab1bf5d 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -2548,8 +2548,8 @@ void qman_delete_cgr_safe(struct qman_cgr *cgr)
 {
 	preempt_disable();
 	if (qman_cgr_cpus[cgr->cgrid] != smp_processor_id()) {
-		smp_call_function_single(qman_cgr_cpus[cgr->cgrid],
-					 qman_delete_cgr_smp_call, cgr, true);
+		smp_call(qman_cgr_cpus[cgr->cgrid],
+			  qman_delete_cgr_smp_call, cgr, SMP_CALL_TYPE_SYNC);
 		preempt_enable();
 		return;
 	}
diff --git a/drivers/soc/fsl/qbman/qman_test_stash.c b/drivers/soc/fsl/qbman/qman_test_stash.c
index b7e8e5ec884c..e4110c3177e5 100644
--- a/drivers/soc/fsl/qbman/qman_test_stash.c
+++ b/drivers/soc/fsl/qbman/qman_test_stash.c
@@ -507,8 +507,9 @@ static int init_phase3(void)
 				if (err)
 					return err;
 			} else {
-				smp_call_function_single(hp_cpu->processor_id,
-					init_handler_cb, hp_cpu->iterator, 1);
+				smp_call(hp_cpu->processor_id,
+					init_handler_cb, hp_cpu->iterator,
+					SMP_CALL_TYPE_SYNC);
 			}
 			preempt_enable();
 		}
@@ -607,8 +608,8 @@ int qman_test_stash(void)
 		if (err)
 			goto failed;
 	} else {
-		smp_call_function_single(special_handler->processor_id,
-					 send_first_frame_cb, NULL, 1);
+		smp_call(special_handler->processor_id,
+				 send_first_frame_cb, NULL, SMP_CALL_TYPE_SYNC);
 	}
 	preempt_enable();
 
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 8923a806757f..9e826d250384 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -174,9 +174,8 @@ extern int smp_call_any(const struct cpumask *mask, smp_call_func_t func,
 #define	SMP_CALL_TYPE_IRQ_WORK	CSD_TYPE_IRQ_WORK
 #define	SMP_CALL_TYPE_TTWU		CSD_TYPE_TTWU
 #define	SMP_CALL_TYPE_MASK		CSD_FLAG_TYPE_MASK
-#define	SMP_CALL_ALL		-1
 
-#define	SMP_CALL_ALL		-1
+#define	SMP_CALL_ALL		-1	/* cross call on all online CPUs */
 
 extern int smp_call(int cpu, smp_call_func_t func, void *info, unsigned int flags);
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 5797c2a7a93f..a7669693a33c 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1091,8 +1091,8 @@ void cpuhp_report_idle_dead(void)
 	 * We cannot call complete after rcu_report_dead() so we delegate it
 	 * to an online cpu.
 	 */
-	smp_call_function_single(cpumask_first(cpu_online_mask),
-				 cpuhp_complete_idle_dead, st, 0);
+	smp_call(cpumask_first(cpu_online_mask),
+			 cpuhp_complete_idle_dead, st, SMP_CALL_TYPE_ASYNC);
 }
 
 static int cpuhp_down_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 23bb19716ad3..e324f3061526 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -116,8 +116,8 @@ task_function_call(struct task_struct *p, remote_function_f func, void *info)
 	int ret;
 
 	for (;;) {
-		ret = smp_call_function_single(task_cpu(p), remote_function,
-					       &data, 1);
+		ret = smp_call(task_cpu(p), remote_function,
+				&data, SMP_CALL_TYPE_SYNC);
 		if (!ret)
 			ret = data.ret;
 
@@ -149,7 +149,7 @@ static int cpu_function_call(int cpu, remote_function_f func, void *info)
 		.ret	= -ENXIO, /* No such CPU */
 	};
 
-	smp_call_function_single(cpu, remote_function, &data, 1);
+	smp_call(cpu, remote_function, &data, SMP_CALL_TYPE_SYNC);
 
 	return data.ret;
 }
@@ -4513,7 +4513,7 @@ static int perf_event_read(struct perf_event *event, bool group)
 		 * Therefore, either way, we'll have an up-to-date event count
 		 * after this.
 		 */
-		(void)smp_call_function_single(event_cpu, __perf_event_read, &data, 1);
+		(void)smp_call(event_cpu, __perf_event_read, &data, SMP_CALL_TYPE_SYNC);
 		preempt_enable();
 		ret = data.ret;
 
@@ -13292,7 +13292,7 @@ static void perf_event_exit_cpu_context(int cpu)
 		ctx = &cpuctx->ctx;
 
 		mutex_lock(&ctx->mutex);
-		smp_call_function_single(cpu, __perf_event_exit_context, ctx, 1);
+		smp_call(cpu, __perf_event_exit_context, ctx, SMP_CALL_TYPE_SYNC);
 		cpuctx->online = 0;
 		mutex_unlock(&ctx->mutex);
 	}
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 55d049c39608..ad9c33487526 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2662,8 +2662,7 @@ static int rcu_torture_barrier_cbs(void *arg)
 		 * The above smp_load_acquire() ensures barrier_phase load
 		 * is ordered before the following ->call().
 		 */
-		if (smp_call_function_single(myid, rcu_torture_barrier1cb,
-					     &rcu, 1)) {
+		if (smp_call(myid, rcu_torture_barrier1cb, &rcu, SMP_CALL_TYPE_SYNC)) {
 			// IPI failed, so use direct call from current CPU.
 			cur_ops->call(&rcu, rcu_torture_barrier_cbf);
 		}
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 99cf3a13954c..428fe1451aeb 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1299,7 +1299,7 @@ static void trc_wait_for_one_reader(struct task_struct *t,
 		per_cpu(trc_ipi_to_cpu, cpu) = true;
 		t->trc_ipi_to_cpu = cpu;
 		rcu_tasks_trace.n_ipis++;
-		if (smp_call_function_single(cpu, trc_read_check_handler, t, 0)) {
+		if (smp_call(cpu, trc_read_check_handler, t, SMP_CALL_TYPE_ASYNC)) {
 			// Just in case there is some other reason for
 			// failure than the target CPU being offline.
 			WARN_ONCE(1, "%s():  smp_call_function_single() failed for CPU: %d\n",
@@ -1473,7 +1473,7 @@ static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
 	// changes, there will need to be a recheck and/or timed wait.
 	for_each_online_cpu(cpu)
 		if (WARN_ON_ONCE(smp_load_acquire(per_cpu_ptr(&trc_ipi_to_cpu, cpu))))
-			smp_call_function_single(cpu, rcu_tasks_trace_empty_fn, NULL, 1);
+			smp_call(cpu, rcu_tasks_trace_empty_fn, NULL, SMP_CALL_TYPE_SYNC);
 
 	// Remove the safety count.
 	smp_mb__before_atomic();  // Order vs. earlier atomics
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7bf1db5ebade..aac99328d193 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4100,7 +4100,7 @@ void rcu_barrier(void)
 			continue;
 		}
 		raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);
-		if (smp_call_function_single(cpu, rcu_barrier_handler, (void *)cpu, 1)) {
+		if (smp_call(cpu, rcu_barrier_handler, (void *)cpu, SMP_CALL_TYPE_SYNC)) {
 			schedule_timeout_uninterruptible(1);
 			goto retry;
 		}
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 60197ea24ceb..ef4cec63a544 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -391,7 +391,7 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
 			put_cpu();
 			continue;
 		}
-		ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
+		ret = smp_call(cpu, rcu_exp_handler, NULL, SMP_CALL_TYPE_ASYNC);
 		put_cpu();
 		/* The CPU will report the QS in response to the IPI. */
 		if (!ret)
@@ -777,7 +777,7 @@ static void sync_sched_exp_online_cleanup(int cpu)
 		return;
 	}
 	/* Quiescent state needed on some other CPU, send IPI. */
-	ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
+	ret = smp_call(cpu, rcu_exp_handler, NULL, SMP_CALL_TYPE_ASYNC);
 	put_cpu();
 	WARN_ON_ONCE(ret);
 }
diff --git a/kernel/relay.c b/kernel/relay.c
index d1a67fbb819d..0e615f5367a7 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -635,9 +635,8 @@ int relay_late_setup_files(struct rchan *chan,
 			disp.dentry = dentry;
 			smp_mb();
 			/* relay_channels_mutex must be held, so wait. */
-			err = smp_call_function_single(i,
-						       __relay_set_buf_dentry,
-						       &disp, 1);
+			err = smp_call(i, __relay_set_buf_dentry,
+					&disp, SMP_CALL_TYPE_SYNC);
 		}
 		if (unlikely(err))
 			break;
diff --git a/kernel/scftorture.c b/kernel/scftorture.c
index 13f387583a54..7839a2e5d141 100644
--- a/kernel/scftorture.c
+++ b/kernel/scftorture.c
@@ -351,7 +351,8 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 			barrier(); // Prevent race-reduction compiler optimizations.
 			scfcp->scfc_in = true;
 		}
-		ret = smp_call_function_single(cpu, scf_handler_1, (void *)scfcp, scfsp->scfs_wait);
+		ret = smp_call(cpu, scf_handler_1, (void *)scfcp,
+				(scfsp->scfs_wait ? SMP_CALL_TYPE_SYNC : SMP_CALL_TYPE_ASYNC));
 		if (ret) {
 			if (scfsp->scfs_wait)
 				scfp->n_single_wait_ofl++;
@@ -372,7 +373,7 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 		scfcp->scfc_rpc = true;
 		barrier(); // Prevent race-reduction compiler optimizations.
 		scfcp->scfc_in = true;
-		ret = smp_call_function_single(cpu, scf_handler_1, (void *)scfcp, 0);
+		ret = smp_call(cpu, scf_handler_1, (void *)scfcp, SMP_CALL_TYPE_ASYNC);
 		if (!ret) {
 			if (use_cpus_read_lock)
 				cpus_read_unlock();
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 1b230c226912..16138cc06f6e 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -377,7 +377,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		 * smp_call_function_single() will call ipi_func() if cpu_id
 		 * is the calling CPU.
 		 */
-		smp_call_function_single(cpu_id, ipi_func, NULL, 1);
+		smp_call(cpu_id, ipi_func, NULL, SMP_CALL_TYPE_SYNC);
 	} else {
 		/*
 		 * For regular membarrier, we can save a few cycles by
diff --git a/kernel/smp.c b/kernel/smp.c
index 25ee37f746d4..b441784d7a91 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -596,36 +596,6 @@ void flush_smp_call_function_from_idle(void)
 	local_irq_restore(flags);
 }
 
-/*
- * This is a temporarily hook up. This function will be eliminated
- * with the last patch in this series.
- *
- * smp_call_function_single - Run a function on a specific CPU
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait until function has completed on other CPUs.
- *
- * Returns 0 on success, else a negative status code.
- */
-int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
-			int wait)
-{
-	unsigned int flags = 0;
-
-	if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu))
-		return -ENXIO;
-
-	if (wait)
-		flags = SMP_CALL_TYPE_SYNC;
-	else
-		flags = SMP_CALL_TYPE_ASYNC;
-
-	smp_call(cpu, func, info, flags);
-
-	return 0;
-}
-EXPORT_SYMBOL(smp_call_function_single);
-
 /*
  * This function performs synchronous and asynchronous cross CPU call for
  * more than one participants.
@@ -1068,8 +1038,8 @@ int smp_call_private(int cpu, call_single_data_t *csd, unsigned int flags)
 	int err = 0;
 
 	if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
-		pr_warn("cpu ID must be a positive number < nr_cpu_ids and must be currently online\n");
-		return -EINVAL;
+		pr_warn("cpu ID must be >=0 && < nr_cpu_ids\n");
+		return -ENXIO;
 	}
 
 	if (csd == NULL) {
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 003ccf338d20..862d56cd554a 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -418,7 +418,7 @@ static int clockevents_unbind(struct clock_event_device *ced, int cpu)
 {
 	struct ce_unbind cu = { .ce = ced, .res = -ENODEV };
 
-	smp_call_function_single(cpu, __clockevents_unbind, &cu, 1);
+	smp_call(cpu, __clockevents_unbind, &cu, SMP_CALL_TYPE_SYNC);
 	return cu.res;
 }
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 95d7ca35bdf2..7830fa290474 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -355,7 +355,7 @@ void clocksource_verify_percpu(struct clocksource *cs)
 		if (cpu == testcpu)
 			continue;
 		csnow_begin = cs->read(cs);
-		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
+		smp_call(cpu, clocksource_verify_one_cpu, cs, SMP_CALL_TYPE_SYNC);
 		csnow_end = cs->read(cs);
 		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
 		if (delta < 0)
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 46789356f856..608a176552d6 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -195,7 +195,7 @@ static void tick_take_do_timer_from_boot(void)
 	int from = tick_do_timer_boot_cpu;
 
 	if (from >= 0 && from != cpu)
-		smp_call_function_single(from, giveup_do_timer, &cpu, 1);
+		smp_call(from, giveup_do_timer, &cpu, SMP_CALL_TYPE_SYNC);
 }
 #endif
 
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index e7b9c2636d10..d9f3c5ef05ee 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -822,8 +822,8 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 		 */
 		err = -ENXIO;
 	} else {
-		err = smp_call_function_single(cpu, __bpf_prog_test_run_raw_tp,
-					       &info, 1);
+		err = smp_call(cpu, __bpf_prog_test_run_raw_tp,
+				&info, SMP_CALL_TYPE_SYNC);
 	}
 	put_cpu();
 
diff --git a/net/iucv/iucv.c b/net/iucv/iucv.c
index 4d97f6d807e7..40d63b31051a 100644
--- a/net/iucv/iucv.c
+++ b/net/iucv/iucv.c
@@ -507,8 +507,7 @@ static void iucv_setmask_mp(void)
 		/* Enable all cpus with a declared buffer. */
 		if (cpumask_test_cpu(cpu, &iucv_buffer_cpumask) &&
 		    !cpumask_test_cpu(cpu, &iucv_irq_cpumask))
-			smp_call_function_single(cpu, iucv_allow_cpu,
-						 NULL, 1);
+			smp_call(cpu, iucv_allow_cpu, NULL, SMP_CALL_TYPE_SYNC);
 	cpus_read_unlock();
 }
 
@@ -526,7 +525,7 @@ static void iucv_setmask_up(void)
 	cpumask_copy(&cpumask, &iucv_irq_cpumask);
 	cpumask_clear_cpu(cpumask_first(&iucv_irq_cpumask), &cpumask);
 	for_each_cpu(cpu, &cpumask)
-		smp_call_function_single(cpu, iucv_block_cpu, NULL, 1);
+		smp_call(cpu, iucv_block_cpu, NULL, SMP_CALL_TYPE_SYNC);
 }
 
 /*
@@ -551,7 +550,7 @@ static int iucv_enable(void)
 	/* Declare per cpu buffers. */
 	rc = -EIO;
 	for_each_online_cpu(cpu)
-		smp_call_function_single(cpu, iucv_declare_cpu, NULL, 1);
+		smp_call(cpu, iucv_declare_cpu, NULL, SMP_CALL_TYPE_SYNC);
 	if (cpumask_empty(&iucv_buffer_cpumask))
 		/* No cpu could declare an iucv buffer. */
 		goto out;
@@ -641,8 +640,8 @@ static int iucv_cpu_down_prep(unsigned int cpu)
 	iucv_retrieve_cpu(NULL);
 	if (!cpumask_empty(&iucv_irq_cpumask))
 		return 0;
-	smp_call_function_single(cpumask_first(&iucv_buffer_cpumask),
-				 iucv_allow_cpu, NULL, 1);
+	smp_call(cpumask_first(&iucv_buffer_cpumask),
+		  iucv_allow_cpu, NULL, SMP_CALL_TYPE_SYNC);
 	return 0;
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e4b1e15f1965..d6aedfc4da56 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5696,7 +5696,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	c.ret = &r;
 	c.opaque = opaque;
 	for_each_online_cpu(cpu) {
-		smp_call_function_single(cpu, check_processor_compat, &c, 1);
+		smp_call(cpu, check_processor_compat, &c, SMP_CALL_TYPE_SYNC);
 		if (r < 0)
 			goto out_free_2;
 	}
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 11/11] smp: modify up.c to adopt the same format of cross CPU call.
  2022-04-22 20:00 [PATCH v2 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (9 preceding siblings ...)
  2022-04-22 20:00 ` [PATCH v2 10/11] smp: replace smp_call_function_single() with smp_call() Donghai Qiao
@ 2022-04-22 20:00 ` Donghai Qiao
  2022-04-23  5:17   ` kernel test robot
  2022-04-23  5:58   ` kernel test robot
  2022-04-26 14:00 ` [PATCH v2 00/11] smp: cross CPU call interface Christoph Hellwig
  11 siblings, 2 replies; 26+ messages in thread
From: Donghai Qiao @ 2022-04-22 20:00 UTC (permalink / raw)
  To: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

Since smp.c has been converted to the new interface, up.c should be
changed to provide the uniprocessor version of the cross CPU call as well.
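
The conversion follows the same pattern patch 10 applied on the SMP
side; illustratively (this shows the conversion idiom, not a hunk from
this patch):

	smp_call_function_single(0, func, info, 1)

becomes

	smp_call(0, func, info, SMP_CALL_TYPE_SYNC)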

Signed-off-by: Donghai Qiao <dqiao@redhat.com>
---
v1 -> v2: removed 'x' from the function names and changed XCALL to SMP_CALL in the new macros

 kernel/up.c | 56 +++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 42 insertions(+), 14 deletions(-)

diff --git a/kernel/up.c b/kernel/up.c
index a38b8b095251..b442e011a059 100644
--- a/kernel/up.c
+++ b/kernel/up.c
@@ -9,8 +9,7 @@
 #include <linux/smp.h>
 #include <linux/hypervisor.h>
 
-int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
-				int wait)
+int smp_call(int cpu, void (*func) (void *info), void *info, unsigned int type)
 {
 	unsigned long flags;
 
@@ -23,37 +22,66 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 
 	return 0;
 }
-EXPORT_SYMBOL(smp_call_function_single);
+EXPORT_SYMBOL(smp_call);
 
-int smp_call_function_single_async(int cpu, struct __call_single_data *csd)
+int smp_call_cond(int cpu, smp_call_func_t func, void *info,
+		   smp_cond_func_t cond_func, unsigned int type)
+{
+	int ret = 0;
+
+	preempt_disable();
+	if (!cond_func || cond_func(0, info))
+		ret = smp_call(cpu, func, info, type);
+
+	preempt_enable();
+
+	return ret;
+}
+EXPORT_SYMBOL(smp_call_cond);
+
+void smp_call_mask(const struct cpumask *mask, smp_call_func_t func, void *info, unsigned int type)
 {
 	unsigned long flags;
 
-	local_irq_save(flags);
-	csd->func(csd->info);
-	local_irq_restore(flags);
+	if (!cpumask_test_cpu(0, mask))
+		return;
+
+	preempt_disable();
+	smp_call(cpu, func, info, type);
+	preempt_enable();
+}
+EXPORT_SYMBOL(smp_call_mask);
+
+int smp_call_private(int cpu, struct __call_single_data *csd, unsigned int type)
+{
+	preempt_disable();
+
+	if (csd->func != NULL)
+		smp_call(cpu, csd->func, csd->info, type);
+
+	preempt_enable();
+
 	return 0;
 }
-EXPORT_SYMBOL(smp_call_function_single_async);
+EXPORT_SYMBOL(smp_call_private);
 
 /*
  * Preemption is disabled here to make sure the cond_func is called under the
  * same conditions in UP and SMP.
  */
-void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
-			   void *info, bool wait, const struct cpumask *mask)
+void smp_call_mask_cond(const struct cpumask *mask, smp_call_func_t func,
+			 void *info, smp_cond_func_t cond_func,
+			 unsigned int type)
 {
 	unsigned long flags;
 
 	preempt_disable();
 	if ((!cond_func || cond_func(0, info)) && cpumask_test_cpu(0, mask)) {
-		local_irq_save(flags);
-		func(info);
-		local_irq_restore(flags);
+		smp_call(cpu, func, info, type);
 	}
 	preempt_enable();
 }
-EXPORT_SYMBOL(on_each_cpu_cond_mask);
+EXPORT_SYMBOL(smp_call_mask_cond);
 
 int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
 {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 01/11] smp: consolidate the structure definitions to smp.h
  2022-04-22 20:00 ` [PATCH v2 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
@ 2022-04-23  4:57   ` kernel test robot
  2022-04-25  8:39   ` Peter Zijlstra
  2022-04-25  9:52   ` Thomas Gleixner
  2 siblings, 0 replies; 26+ messages in thread
From: kernel test robot @ 2022-04-23  4:57 UTC (permalink / raw)
  To: Donghai Qiao, akpm, sfr, arnd, heying24, andriy.shevchenko,
	axboe, rdunlap, tglx, gor
  Cc: kbuild-all, donghai.w.qiao, linux-kernel, Donghai Qiao

Hi Donghai,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on rafael-pm/linux-next linus/master v5.18-rc3 next-20220422]
[cannot apply to tip/x86/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Donghai-Qiao/smp-cross-CPU-call-interface/20220423-060436
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: sparc-randconfig-r026-20220422 (https://download.01.org/0day-ci/archive/20220423/202204231212.Os1QwhiN-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/1c14bdafe09740690b249824f3526527e4c02d2a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Donghai-Qiao/smp-cross-CPU-call-interface/20220423-060436
        git checkout 1c14bdafe09740690b249824f3526527e4c02d2a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross W=1 O=build_dir ARCH=sparc SHELL=/bin/bash lib/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/lockdep.h:14,
                    from include/linux/spinlock.h:62,
                    from include/linux/kref.h:16,
                    from include/linux/mm_types.h:8,
                    from include/linux/buildid.h:5,
                    from include/linux/module.h:14,
                    from lib/test_bitops.c:9:
>> include/linux/smp.h:131:14: error: 'seq_type' defined but not used [-Werror=unused-variable]
     131 | static char *seq_type[] = {
         |              ^~~~~~~~
   cc1: all warnings being treated as errors


vim +/seq_type +131 include/linux/smp.h

   130	
 > 131	static char *seq_type[] = {
   132		[CFD_SEQ_QUEUE]		= "queue",
   133		[CFD_SEQ_IPI]		= "ipi",
   134		[CFD_SEQ_NOIPI]		= "noipi",
   135		[CFD_SEQ_PING]		= "ping",
   136		[CFD_SEQ_PINGED]	= "pinged",
   137		[CFD_SEQ_HANDLE]	= "handle",
   138		[CFD_SEQ_DEQUEUE]	= "dequeue (src CPU 0 == empty)",
   139		[CFD_SEQ_IDLE]		= "idle",
   140		[CFD_SEQ_GOTIPI]	= "gotipi",
   141		[CFD_SEQ_HDLEND]	= "hdlend (src CPU 0 == early)",
   142	};
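
Editorial note: a static array defined in a widely included header
leaves an unused copy in every translation unit that never references
it, and this W=1 build promotes the warning to an error. Assuming only
kernel/smp.c actually uses the table, the usual fixes are to keep the
definition file-local in kernel/smp.c, or to mark it so unreferenced
copies are tolerated — a sketch, not the author's posted code:

	static const char * const seq_type[] __maybe_unused = {
		[CFD_SEQ_QUEUE]		= "queue",
		/* ... remaining entries as in the listing above ... */
	};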
   143	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 11/11] smp: modify up.c to adopt the same format of cross CPU call.
  2022-04-22 20:00 ` [PATCH v2 11/11] smp: modify up.c to adopt the same format of cross CPU call Donghai Qiao
@ 2022-04-23  5:17   ` kernel test robot
  2022-04-23  5:58   ` kernel test robot
  1 sibling, 0 replies; 26+ messages in thread
From: kernel test robot @ 2022-04-23  5:17 UTC (permalink / raw)
  To: Donghai Qiao, akpm, sfr, arnd, peterz, heying24,
	andriy.shevchenko, axboe, rdunlap, tglx, gor
  Cc: kbuild-all, donghai.w.qiao, linux-kernel, Donghai Qiao

Hi Donghai,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on rafael-pm/linux-next linus/master v5.18-rc3]
[cannot apply to tip/x86/core next-20220422]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Donghai-Qiao/smp-cross-CPU-call-interface/20220423-060436
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: x86_64-randconfig-a013 (https://download.01.org/0day-ci/archive/20220423/202204231333.D1x82f1d-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/655b898028ef1555f6bec036db8d4681b551aaa8
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Donghai-Qiao/smp-cross-CPU-call-interface/20220423-060436
        git checkout 655b898028ef1555f6bec036db8d4681b551aaa8
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/up.c: In function 'smp_call_mask':
>> kernel/up.c:50:18: error: 'cpu' undeclared (first use in this function); did you mean 'fpu'?
      50 |         smp_call(cpu, func, info, type);
         |                  ^~~
         |                  fpu
   kernel/up.c:50:18: note: each undeclared identifier is reported only once for each function it appears in
   kernel/up.c:44:23: warning: unused variable 'flags' [-Wunused-variable]
      44 |         unsigned long flags;
         |                       ^~~~~
   kernel/up.c: In function 'smp_call_mask_cond':
   kernel/up.c:80:26: error: 'cpu' undeclared (first use in this function); did you mean 'fpu'?
      80 |                 smp_call(cpu, func, info, type);
         |                          ^~~
         |                          fpu
   kernel/up.c:76:23: warning: unused variable 'flags' [-Wunused-variable]
      76 |         unsigned long flags;
         |                       ^~~~~
   In file included from include/linux/percpu.h:7,
                    from include/linux/context_tracking_state.h:5,
                    from include/linux/hardirq.h:5,
                    from include/linux/interrupt.h:11,
                    from kernel/up.c:6:
   At top level:
   include/linux/smp.h:131:14: warning: 'seq_type' defined but not used [-Wunused-variable]
     131 | static char *seq_type[] = {
         |              ^~~~~~~~


vim +50 kernel/up.c

    41	
    42	void smp_call_mask(const struct cpumask *mask, smp_call_func_t func, void *info, unsigned int type)
    43	{
    44		unsigned long flags;
    45	
    46		if (!cpumask_test_cpu(0, mask))
    47			return;
    48	
    49		preempt_disable();
  > 50		smp_call(cpu, func, info, type);
    51		preempt_enable();
    52	}
    53	EXPORT_SYMBOL(smp_call_mask);
    54	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 11/11] smp: modify up.c to adopt the same format of cross CPU call.
  2022-04-22 20:00 ` [PATCH v2 11/11] smp: modify up.c to adopt the same format of cross CPU call Donghai Qiao
  2022-04-23  5:17   ` kernel test robot
@ 2022-04-23  5:58   ` kernel test robot
  1 sibling, 0 replies; 26+ messages in thread
From: kernel test robot @ 2022-04-23  5:58 UTC (permalink / raw)
  To: Donghai Qiao, akpm, sfr, arnd, peterz, heying24,
	andriy.shevchenko, axboe, rdunlap, tglx, gor
  Cc: llvm, kbuild-all, donghai.w.qiao, linux-kernel, Donghai Qiao

Hi Donghai,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on rafael-pm/linux-next linus/master v5.18-rc3]
[cannot apply to tip/x86/core next-20220422]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Donghai-Qiao/smp-cross-CPU-call-interface/20220423-060436
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: i386-randconfig-a015 (https://download.01.org/0day-ci/archive/20220423/202204231305.905VHc1H-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5bd87350a5ae429baf8f373cb226a57b62f87280)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/655b898028ef1555f6bec036db8d4681b551aaa8
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Donghai-Qiao/smp-cross-CPU-call-interface/20220423-060436
        git checkout 655b898028ef1555f6bec036db8d4681b551aaa8
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> kernel/up.c:50:11: error: use of undeclared identifier 'cpu'
           smp_call(cpu, func, info, type);
                    ^
   kernel/up.c:80:12: error: use of undeclared identifier 'cpu'
                   smp_call(cpu, func, info, type);
                            ^
   2 errors generated.


vim +/cpu +50 kernel/up.c

    41	
    42	void smp_call_mask(const struct cpumask *mask, smp_call_func_t func, void *info, unsigned int type)
    43	{
    44		unsigned long flags;
    45	
    46		if (!cpumask_test_cpu(0, mask))
    47			return;
    48	
    49		preempt_disable();
  > 50		smp_call(cpu, func, info, type);
    51		preempt_enable();
    52	}
    53	EXPORT_SYMBOL(smp_call_mask);
    54	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 05/11] smp: replace smp_call_function_single_async() with  smp_call_private()
  2022-04-22 20:00 ` [PATCH v2 05/11] smp: replace smp_call_function_single_async() with smp_call_private() Donghai Qiao
@ 2022-04-23  7:30   ` kernel test robot
  2022-04-24 22:06       ` Nathan Chancellor
  2022-04-25  9:35   ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: kernel test robot @ 2022-04-23  7:30 UTC (permalink / raw)
  To: Donghai Qiao, akpm, sfr, arnd, peterz, heying24,
	andriy.shevchenko, axboe, rdunlap, tglx, gor
  Cc: llvm, kbuild-all, donghai.w.qiao, linux-kernel, Donghai Qiao

Hi Donghai,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on rafael-pm/linux-next linus/master v5.18-rc3 next-20220422]
[cannot apply to tip/x86/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Donghai-Qiao/smp-cross-CPU-call-interface/20220423-060436
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: i386-randconfig-a002 (https://download.01.org/0day-ci/archive/20220423/202204231532.nXbSK9Tk-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5bd87350a5ae429baf8f373cb226a57b62f87280)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/3b8a90029bebdb77e2d5c0cd16f371e83d8ed2e8
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Donghai-Qiao/smp-cross-CPU-call-interface/20220423-060436
        git checkout 3b8a90029bebdb77e2d5c0cd16f371e83d8ed2e8
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> block/blk-mq.c:1065:39: warning: passing 4-byte aligned argument to 16-byte aligned parameter 2 of 'smp_call_private' may result in an unaligned pointer access [-Walign-mismatch]
                   smp_call_function_single_async(cpu, &rq->csd);
                                                       ^
   1 warning generated.


vim +/smp_call_private +1065 block/blk-mq.c

963395269c7586 Christoph Hellwig         2020-06-11  1055  
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23  1056  static void blk_mq_complete_send_ipi(struct request *rq)
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23  1057  {
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23  1058  	struct llist_head *list;
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23  1059  	unsigned int cpu;
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23  1060  
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23  1061  	cpu = rq->mq_ctx->cpu;
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23  1062  	list = &per_cpu(blk_cpu_done, cpu);
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23  1063  	if (llist_add(&rq->ipi_list, list)) {
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23  1064  		INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq);
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23 @1065  		smp_call_function_single_async(cpu, &rq->csd);
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23  1066  	}
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23  1067  }
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23  1068  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 05/11] smp: replace smp_call_function_single_async() with  smp_call_private()
  2022-04-23  7:30   ` kernel test robot
@ 2022-04-24 22:06       ` Nathan Chancellor
  0 siblings, 0 replies; 26+ messages in thread
From: Nathan Chancellor @ 2022-04-24 22:06 UTC (permalink / raw)
  To: kernel test robot
  Cc: Donghai Qiao, akpm, sfr, arnd, peterz, heying24,
	andriy.shevchenko, axboe, rdunlap, tglx, gor, llvm, kbuild-all,
	donghai.w.qiao, linux-kernel

On Sat, Apr 23, 2022 at 03:30:18PM +0800, kernel test robot wrote:
> Hi Donghai,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on powerpc/next]
> [also build test WARNING on rafael-pm/linux-next linus/master v5.18-rc3 next-20220422]
> [cannot apply to tip/x86/core]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Donghai-Qiao/smp-cross-CPU-call-interface/20220423-060436
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: i386-randconfig-a002 (https://download.01.org/0day-ci/archive/20220423/202204231532.nXbSK9Tk-lkp@intel.com/config)
> compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5bd87350a5ae429baf8f373cb226a57b62f87280)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://github.com/intel-lab-lkp/linux/commit/3b8a90029bebdb77e2d5c0cd16f371e83d8ed2e8
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Donghai-Qiao/smp-cross-CPU-call-interface/20220423-060436
>         git checkout 3b8a90029bebdb77e2d5c0cd16f371e83d8ed2e8
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All warnings (new ones prefixed by >>):
> 
> >> block/blk-mq.c:1065:39: warning: passing 4-byte aligned argument to 16-byte aligned parameter 2 of 'smp_call_private' may result in an unaligned pointer access [-Walign-mismatch]
>                    smp_call_function_single_async(cpu, &rq->csd);
>                                                        ^
>    1 warning generated.

This warning was fixed by commit 1139aeb1c521 ("smp: Fix
smp_call_function_single_async prototype"), which is undone by this
proposed change, hence the warning.
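
For reference, the type relationship behind clang's complaint — a
sketch paraphrasing include/linux/smp.h, not new code:

	struct __call_single_data {
		struct __call_single_node node;
		smp_call_func_t func;
		void *info;
	};

	/* The typedef over-aligns the type to its own size: */
	typedef struct __call_single_data call_single_data_t
		__aligned(sizeof(struct __call_single_data));

A csd embedded in a larger structure, such as rq->csd in struct
request, is only naturally aligned, so passing its address to a
call_single_data_t * parameter is exactly what -Walign-mismatch
flags. Taking struct __call_single_data * instead, as commit
1139aeb1c521 does, avoids promising an alignment the caller cannot
guarantee.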

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 01/11] smp: consolidate the structure definitions to smp.h
  2022-04-22 20:00 ` [PATCH v2 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
  2022-04-23  4:57   ` kernel test robot
@ 2022-04-25  8:39   ` Peter Zijlstra
  2022-04-25  9:52   ` Thomas Gleixner
  2 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2022-04-25  8:39 UTC (permalink / raw)
  To: Donghai Qiao
  Cc: akpm, sfr, arnd, heying24, andriy.shevchenko, axboe, rdunlap,
	tglx, gor, donghai.w.qiao, linux-kernel

On Fri, Apr 22, 2022 at 04:00:30PM -0400, Donghai Qiao wrote:
> Move the structure definitions from kernel/smp.c to
> include/linux/smp.h
> 
> Move the structure definitions from include/linux/smp_types.h
> to include/linux/smp.h and delete smp_types.h

This Changelog is missing a *why*. Why are you exposing things that were
not exposed before?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 02/11] smp: define the cross call interface
  2022-04-22 20:00 ` [PATCH v2 02/11] smp: define the cross call interface Donghai Qiao
@ 2022-04-25  9:05   ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2022-04-25  9:05 UTC (permalink / raw)
  To: Donghai Qiao
  Cc: akpm, sfr, arnd, heying24, andriy.shevchenko, axboe, rdunlap,
	tglx, gor, donghai.w.qiao, linux-kernel

On Fri, Apr 22, 2022 at 04:00:31PM -0400, Donghai Qiao wrote:
> The functions of cross CPU call interface are defined below :
> 
> int smp_call(int cpu, smp_call_func_t func, void *info,
> 		unsigned int flags)
> 

> int smp_call_cond(int cpu, smp_call_func_t func, void *info,
> 		smp_cond_func_t condf, unsigned int flags)
> 
> void smp_call_mask(const struct cpumask *mask, smp_call_func_t func,
> 		void *info, unsigned int flags)
> 

I think these two are a mistake. See below.

> void smp_call_mask_cond(const struct cpumask *mask, smp_call_func_t func,
> 		void *info, smp_cond_func_t condf, unsigned int flags)
> 
> int smp_call_private(int cpu, call_single_data_t *csd, unsigned int flags)

I'm thinking this is a terrible name. At least the current name tells
me what it does, whereas _private tells me nothing.

> int smp_call_any(const struct cpumask *mask, smp_call_func_t func,
> 		void *info, unsigned int flags)
> 
> The motivation of submitting this patch set is intended to make the
> existing cross CPU call mechanism become a bit more formal interface
> and more friendly to the kernel developers.

Why do you think that's needed? Mostly, if at all possible, you
shouldn't IPI.

> Basically the minimum set of functions below can satisfy any demand
> for cross CPU call from kernel consumers. For the sack of simplicity

LOL - s/sack/sake/

> self-explanatory and less code redundancy no ambiguity, the functions
> in this interface are renamed, simplified, or eliminated. But they
> are still inheriting the same semantics and parameter lists from their
> previous version.
> 
> Signed-off-by: Donghai Qiao <dqiao@redhat.com>
> ---
> v1 -> v2: removed 'x' from the function names and change XCALL to SMP_CALL from the new macros
> 
>  include/linux/smp.h |  30 +++++++++
>  kernel/smp.c        | 156 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 186 insertions(+)
> 
> diff --git a/include/linux/smp.h b/include/linux/smp.h
> index 31811da856a3..bee1e6b5b2fd 100644
> --- a/include/linux/smp.h
> +++ b/include/linux/smp.h
> @@ -161,6 +161,36 @@ do {						\
>  	*(_csd) = CSD_INIT((_func), (_info));	\
>  } while (0)
>  
> +
> +/*
> + * smp_call Interface.
> + *
> + * Also see kernel/smp.c for the details.
> + */
> +#define	SMP_CALL_TYPE_SYNC		CSD_TYPE_SYNC
> +#define	SMP_CALL_TYPE_ASYNC	CSD_TYPE_ASYNC
> +#define	SMP_CALL_TYPE_IRQ_WORK	CSD_TYPE_IRQ_WORK
> +#define	SMP_CALL_TYPE_TTWU		CSD_TYPE_TTWU
> +#define	SMP_CALL_TYPE_MASK		CSD_FLAG_TYPE_MASK
> +
> +#define	SMP_CALL_ALL		-1
> +
> +extern int smp_call(int cpu, smp_call_func_t func, void *info, unsigned int flags);
> +
> +extern int smp_call_cond(int cpu, smp_call_func_t func, void *info,
> +		smp_cond_func_t condf, unsigned int flags);
> +
> +extern void smp_call_mask(const struct cpumask *mask, smp_call_func_t func,
> +		void *info, unsigned int flags);
> +
> +extern void smp_call_mask_cond(const struct cpumask *mask, smp_call_func_t func,
> +		void *info, smp_cond_func_t condf, unsigned int flags);
> +
> +extern int smp_call_private(int cpu, call_single_data_t *csd, unsigned int flags);
> +
> +extern int smp_call_any(const struct cpumask *mask, smp_call_func_t func,
> +		void *info, unsigned int flags);
> +
>  /*
>   * Enqueue a llist_node on the call_single_queue; be very careful, read
>   * flush_smp_call_function_queue() in detail.
> diff --git a/kernel/smp.c b/kernel/smp.c
> index b2b3878f0330..b1e6a8f77a9e 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -1170,3 +1170,159 @@ int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
>  	return sscs.ret;
>  }
>  EXPORT_SYMBOL_GPL(smp_call_on_cpu);
> +
> +
> +void __smp_call_mask_cond(const struct cpumask *mask,
> +		smp_call_func_t func, void *info,
> +		smp_cond_func_t cond_func,
> +		unsigned int flags)
> +{
> +}
> +
> +/*
> + * smp_call Interface
> + *
> + * Consolidate the cross CPU call usage from the history below:
> + *
> + * Normally this interface cannot be used with interrupts disabled or
> + * from a hardware interrupt handler or from a bottom half handler.
> + * But there are two exceptions:
> + * 1) It can be used during early boot while early_boot_irqs_disabled
> + *    is set. In this scenario, you should use local_irq_save/restore()
> + *    instead of local_irq_disable/enable().
> + * 2) Because smp_call_private(cpu, csd, SMP_CALL_TYPE_ASYNC) is an asynchronous
> + *    call with a preallocated csd structure, thus it can be called from
> + *    the context where interrupts are disabled.
> + */
> +
> +/*
> + * Parameters:
> + *
> + * cpu: If cpu >=0 && cpu < nr_cpu_ids, the cross call is for that cpu.

If @cpu were unsigned, that could be a single expression.
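
Roughly (untested, assuming @cpu becomes unsigned int):

	/* one compare covers both bounds; a "negative" value wraps huge */
	if (cpu < nr_cpu_ids)
		__smp_call_mask_cond(cpumask_of(cpu), func, info, cond_func, flags);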

> + *      If cpu == -1, the cross call is for all the online CPUs

URGH... no, please don't make it easy to IPI all. We shouldn't ever do
that if at all avoidable. This should be a *hard* thing to make happen.

> + *
> + * func: It is the cross function that the destination CPUs need to execute.
> + *       This function must be fast and non-blocking.
> + *
> + * info: It is the parameter to func().
> + *
> + * flags: The flags specify the manner in which the cross call is performed,
> + *	  i.e. synchronous or asynchronous.
> + *
> + *	  A synchronous cross call will not return until all the
> + *	  destination CPUs have executed func() and responded to the call.
> + *
> + *	  An asynchronous cross call will return as soon as it has fired
> + *	  all the cross calls and run func() locally if needed, regardless
> + *	  of the status of the target CPUs.
> + *
> + * Return: %0 on success or negative errno value on error.
> + */
> +int smp_call(int cpu, smp_call_func_t func, void *info, unsigned int flags)
> +{
> +	return smp_call_cond(cpu, func, info, NULL, flags);
> +}
> +EXPORT_SYMBOL(smp_call);

Given this uses smp_call_cond(), its definition should come after the
definition of smp_call_cond().

> +/*
> + * Parameters:
> + *
> + * cond_func: This is a condition function cond_func(cpu, info) invoked by
> + *	      the underlying cross call mechanism only. If the return value
> + *	      from cond_func(cpu, info) is true, the cross call will be sent
> + *	      to that cpu, otherwise the call will not be sent.
> + *
> + * Others: see smp_call().
> + *
> + * Return: %0 on success or negative errno value on error.
> + */
> +int smp_call_cond(int cpu, smp_call_func_t func, void *info,
> +		    smp_cond_func_t cond_func, unsigned int flags)
> +{
> +	preempt_disable();
> +	if (cpu == SMP_CALL_ALL) {
> +		__smp_call_mask_cond(cpu_online_mask, func, info, cond_func, flags);

Should we mandate @cond != NULL?
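
Something like this, perhaps (untested sketch; note the
preempt_disable() taken just above):

	if (cpu == SMP_CALL_ALL) {
		/* a broadcast must be gated by a condition function */
		if (WARN_ON_ONCE(!cond_func)) {
			preempt_enable();
			return -EINVAL;
		}
		__smp_call_mask_cond(cpu_online_mask, func, info, cond_func, flags);
	}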

> +	} else if ((unsigned int)cpu < nr_cpu_ids)
> +		__smp_call_mask_cond(cpumask_of(cpu), func, info, cond_func, flags);

That's a fairly useless combination. If you IPI a single CPU you had
better already know if @cond is true or not.
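
That is, a caller targeting one specific CPU can simply do (sketch):

	if (!cond_func || cond_func(cpu, info))
		smp_call(cpu, func, info, flags);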

> +	else {

Very inconsistent use of {} there.
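
Per coding style, once one branch has braces they all should, i.e.:

	} else if ((unsigned int)cpu < nr_cpu_ids) {
		__smp_call_mask_cond(cpumask_of(cpu), func, info, cond_func, flags);
	} else {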

> +		preempt_enable();
> +		pr_warn("Invalid cpu ID = %d\n", cpu);
> +		return -ENXIO;
> +	}
> +	preempt_enable();
> +	return 0;
> +}
> +EXPORT_SYMBOL(smp_call_cond);
> +
> +/*
> + * Parameters:
> + *
> + * mask: This is the bitmap of CPUs to which the cross call will be sent.
> + *
> + * Others: see smp_call().
> + */
> +void smp_call_mask(const struct cpumask *mask, smp_call_func_t func,
> +		void *info, unsigned int flags)
> +{
> +	preempt_disable();
> +	__smp_call_mask_cond(mask, func, info, NULL, flags);
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL(smp_call_mask);

Arguably, the whole preempt thing should be in __smp_call_mask_cond().
Also, I really don't see the point of having these two functions above.

> +
> +/*
> + * The combination of smp_call_cond() and smp_call_mask()
> + */
> +void smp_call_mask_cond(const struct cpumask *mask,
> +		smp_call_func_t func, void *info,
> +		smp_cond_func_t cond_func,
> +		unsigned int flags)
> +{
> +	preempt_disable();
> +	__smp_call_mask_cond(mask, func, info, cond_func, flags);
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL(smp_call_mask_cond);

This should be something like:

{
	preempt_disable();
	if (!mask) {
	WARN_ON_ONCE(!cond_func);
		mask = cpu_online_mask;
	}
	__smp_call_mask_cond(mask, func, info, cond_func, flags);
	preempt_enable();
}

So then we have smp_call(@cpu) for if you want to IPI a single specific
CPU and smp_call_mask_cond() for anything else. Additionally if you want
.mask=NULL to IPI-all then .cond *must* be set.

Because IPI-all is *BAD*, it must not be done if it can be avoided.
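
Usage would then look like (sketch of the proposed rules, not existing
code):

	smp_call(cpu, func, info, flags);			/* one specific CPU */
	smp_call_mask_cond(mask, func, info, cond, flags);	/* explicit subset */
	smp_call_mask_cond(NULL, func, info, cond, flags);	/* "all"; @cond mandatory */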

> +/*
> + * Parameters:
> + *
> + * mask:  Run func() on one of the given CPUs in mask if it is online.
> + *        CPU selection preference (from the original comments for
> + *        smp_call_function_any()) :
> + *          1) current cpu if in @mask
> + *          2) any cpu of current node if in @mask
> + *          3) any other online cpu in @mask
> + *
> + * Others, see smp_call().
> + *
> + * Returns 0 on success, else a negative status code (if no cpus were online).
> + */
> +int smp_call_any(const struct cpumask *mask, smp_call_func_t func,
> +		void *info, unsigned int flags)
> +{
> +	return 0;
> +}
> +EXPORT_SYMBOL(smp_call_any);

Seriously? Is there anybody who does something daft like this? If you
don't care who gets the IPI, maybe you shouldn't IPI.
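
And if a consumer really does need *some* CPU, it can pick one itself,
e.g. (untested):

	unsigned int cpu = cpumask_any_and(mask, cpu_online_mask);

	if (cpu < nr_cpu_ids)
		smp_call(cpu, func, info, flags);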

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 03/11] smp: eliminate SCF_WAIT and SCF_RUN_LOCAL
  2022-04-22 20:00 ` [PATCH v2 03/11] smp: eliminate SCF_WAIT and SCF_RUN_LOCAL Donghai Qiao
@ 2022-04-25  9:10   ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2022-04-25  9:10 UTC (permalink / raw)
  To: Donghai Qiao
  Cc: akpm, sfr, arnd, heying24, andriy.shevchenko, axboe, rdunlap,
	tglx, gor, donghai.w.qiao, linux-kernel

On Fri, Apr 22, 2022 at 04:00:32PM -0400, Donghai Qiao wrote:
> The commit a32a4d8a815c ("smp: Run functions concurrently in
> smp_call_function_many_cond()") was to improve the concurrent
> of the cross call execution between local and remote. The change
> in smp_call_function_many_cond() did what was intended, but the
> new macro SCF_WAIT and SCF_RUN_LOCAL and the code around them
> to handle local call were not unnecessary because the modified
> function smp_call_function_many() was able to handle the local
> cross call. So these two macros can be eliminated and the code
> implemented around that can be removed as well.

Maybe I've not had enough wake-up-juice, but I can't parse the above,
what?!


> @@ -787,23 +787,13 @@ int smp_call_function_any(const struct cpumask *mask,
>  }
>  EXPORT_SYMBOL_GPL(smp_call_function_any);
>  
> -/*
> - * Flags to be used as scf_flags argument of smp_call_function_many_cond().
> - *
> - * %SCF_WAIT:		Wait until function execution is completed
> - * %SCF_RUN_LOCAL:	Run also locally if local cpu is set in cpumask
> - */
> -#define SCF_WAIT	(1U << 0)
> -#define SCF_RUN_LOCAL	(1U << 1)
> -
>  static void smp_call_function_many_cond(const struct cpumask *mask,
>  					smp_call_func_t func, void *info,
> -					unsigned int scf_flags,
> +					bool wait,
>  					smp_cond_func_t cond_func)
>  {
>  	int cpu, last_cpu, this_cpu = smp_processor_id();
>  	struct call_function_data *cfd;
> -	bool wait = scf_flags & SCF_WAIT;
>  	bool run_remote = false;
>  	bool run_local = false;
>  	int nr_cpus = 0;
> @@ -829,14 +819,14 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
>  	WARN_ON_ONCE(!in_task());
>  
>  	/* Check if we need local execution. */
> -	if ((scf_flags & SCF_RUN_LOCAL) && cpumask_test_cpu(this_cpu, mask))
> +	if (cpumask_test_cpu(this_cpu, mask))
>  		run_local = true;
>  
>  	/* Check if we need remote execution, i.e., any CPU excluding this one. */
>  	cpu = cpumask_first_and(mask, cpu_online_mask);
>  	if (cpu == this_cpu)
>  		cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
> -	if (cpu < nr_cpu_ids)
> +	if ((unsigned int)cpu < nr_cpu_ids)
>  		run_remote = true;
>  
>  	if (run_remote) {
> @@ -911,12 +901,8 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
>   * @mask: The set of cpus to run on (only runs on online subset).
>   * @func: The function to run. This must be fast and non-blocking.
>   * @info: An arbitrary pointer to pass to the function.
> - * @wait: Bitmask that controls the operation. If %SCF_WAIT is set, wait
> - *        (atomically) until function has completed on other CPUs. If
> - *        %SCF_RUN_LOCAL is set, the function will also be run locally
> - *        if the local CPU is set in the @cpumask.
> - *
> - * If @wait is true, then returns once @func has returned.
> + * @wait: If wait is true, the call will not return until func()
> + *        has completed on other CPUs.
>   *
>   * You must not call this function with disabled interrupts or from a
>   * hardware interrupt handler or from a bottom half handler. Preemption
> @@ -925,7 +911,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
>  void smp_call_function_many(const struct cpumask *mask,
>  			    smp_call_func_t func, void *info, bool wait)
>  {
> -	smp_call_function_many_cond(mask, func, info, wait * SCF_WAIT, NULL);
> +	smp_call_function_many_cond(mask, func, info, wait, NULL);
>  }
>  EXPORT_SYMBOL(smp_call_function_many);

This changes the semantics of smp_call_function_many(): before this, if
self was in the mask it wouldn't call @func; now it will.

I appreciate you want to clean that up, but you can't do that without
auditing all callers.
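
E.g. a hypothetical caller that runs the local part itself:

	smp_call_function_many(mask, func, info, true);
	if (cpumask_test_cpu(smp_processor_id(), mask))
		func(info);

would, after this change, execute @func twice on the local CPU.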

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 04/11] smp: replace smp_call_function_single() with smp_call()
  2022-04-22 20:00 ` [PATCH v2 04/11] smp: replace smp_call_function_single() with smp_call() Donghai Qiao
@ 2022-04-25  9:33   ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2022-04-25  9:33 UTC (permalink / raw)
  To: Donghai Qiao
  Cc: akpm, sfr, arnd, heying24, andriy.shevchenko, axboe, rdunlap,
	tglx, gor, donghai.w.qiao, linux-kernel

On Fri, Apr 22, 2022 at 04:00:33PM -0400, Donghai Qiao wrote:
> Eliminated the percpu global csd_data and temporarily hooked up
> to smp_call().
> 
> There is no obvious reason or evidence that the differentiation

Using the on-stack csd seems like an obvious benefit to me. Stack is
often cache-hot, while the percpu stuff is likely a cache-miss. Worse,
the call_function_data as used by the mask functions is likely a double
miss due to the O(n^2) nature of the thing.
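
For reference, the on-stack pattern is roughly this (simplified sketch
using the kernel/smp.c internals generic_exec_single()/csd_lock_wait();
the CSD_FLAG_LOCK setup is omitted, so not the exact code):

	call_single_data_t csd;

	INIT_CSD(&csd, func, info);
	generic_exec_single(cpu, &csd);		/* enqueue and send the IPI */
	csd_lock_wait(&csd);			/* must wait: csd is on our stack */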


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 05/11] smp: replace smp_call_function_single_async() with smp_call_private()
  2022-04-22 20:00 ` [PATCH v2 05/11] smp: replace smp_call_function_single_async() with smp_call_private() Donghai Qiao
  2022-04-23  7:30   ` kernel test robot
@ 2022-04-25  9:35   ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2022-04-25  9:35 UTC (permalink / raw)
  To: Donghai Qiao
  Cc: akpm, sfr, arnd, heying24, andriy.shevchenko, axboe, rdunlap,
	tglx, gor, donghai.w.qiao, linux-kernel

On Fri, Apr 22, 2022 at 04:00:34PM -0400, Donghai Qiao wrote:
> Replaced smp_call_function_single_async() with smp_call_private()
> and also extended smp_call_private() to support one CPU synchronous
> call with preallocated csd structures.
> 
> Ideally, the new interface smp_call() should be able to do what
> smp_call_function_single_async() does. Because the csd is provided
> and maintained by the callers, it exposes the risk of corrupting
> the call_single_queue[cpu] linked list if the clients manipulate
> their csd inappropriately. On the other hand, there should be no
> noticeable performance advantage in providing a preallocated csd for
> cross call kernel consumers. Thus, in the long run, the consumers
> should change to not use this type of preallocated csd.

You're completely missing the point of this interface. *please* look at
it more carefully.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 06/11] smp: use smp_call_private() fron irq_work.c and core.c
  2022-04-22 20:00 ` [PATCH v2 06/11] smp: use smp_call_private() fron irq_work.c and core.c Donghai Qiao
@ 2022-04-25  9:37   ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2022-04-25  9:37 UTC (permalink / raw)
  To: Donghai Qiao
  Cc: akpm, sfr, arnd, heying24, andriy.shevchenko, axboe, rdunlap,
	tglx, gor, donghai.w.qiao, linux-kernel

On Fri, Apr 22, 2022 at 04:00:35PM -0400, Donghai Qiao wrote:
> irq_work.c and core.c should use the cross call interface rather than
> using an unpublished internal function __smp_call_single_queue.

They should do no such thing; using an unpublished/private interface was
on purpose. Nobody else should be able to do these things.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 01/11] smp: consolidate the structure definitions to smp.h
  2022-04-22 20:00 ` [PATCH v2 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
  2022-04-23  4:57   ` kernel test robot
  2022-04-25  8:39   ` Peter Zijlstra
@ 2022-04-25  9:52   ` Thomas Gleixner
  2 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2022-04-25  9:52 UTC (permalink / raw)
  To: Donghai Qiao, akpm, sfr, arnd, peterz, heying24,
	andriy.shevchenko, axboe, rdunlap, gor
  Cc: donghai.w.qiao, linux-kernel, Donghai Qiao

On Fri, Apr 22 2022 at 16:00, Donghai Qiao wrote:
> Move the structure definitions from kernel/smp.c to
> include/linux/smp.h

...

> +static char *seq_type[] = {
> +	[CFD_SEQ_QUEUE]		= "queue",
> +	[CFD_SEQ_IPI]		= "ipi",
> +	[CFD_SEQ_NOIPI]		= "noipi",
> +	[CFD_SEQ_PING]		= "ping",
> +	[CFD_SEQ_PINGED]	= "pinged",
> +	[CFD_SEQ_HANDLE]	= "handle",
> +	[CFD_SEQ_DEQUEUE]	= "dequeue (src CPU 0 == empty)",
> +	[CFD_SEQ_IDLE]		= "idle",
> +	[CFD_SEQ_GOTIPI]	= "gotipi",
> +	[CFD_SEQ_HDLEND]	= "hdlend (src CPU 0 == early)",
> +};

How does this qualify as a structure definition?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 00/11] smp: cross CPU call interface
  2022-04-22 20:00 [PATCH v2 00/11] smp: cross CPU call interface Donghai Qiao
                   ` (10 preceding siblings ...)
  2022-04-22 20:00 ` [PATCH v2 11/11] smp: modify up.c to adopt the same format of cross CPU call Donghai Qiao
@ 2022-04-26 14:00 ` Christoph Hellwig
  11 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2022-04-26 14:00 UTC (permalink / raw)
  To: Donghai Qiao
  Cc: akpm, sfr, arnd, peterz, heying24, andriy.shevchenko, axboe,
	rdunlap, tglx, gor, donghai.w.qiao, linux-kernel

On Fri, Apr 22, 2022 at 04:00:29PM -0400, Donghai Qiao wrote:
> The motivation of submitting this patch set is intended to make the
> existing cross CPU call mechanism become a bit more formal interface
> and more friendly to the kernel developers.

As far as I can tell it mostly does the reverse.  The new interfaces
are a lot more verbose for no good reason.  E.g. this is a common
pattern:


-	on_each_cpu(common_shutdown_1, &args, 0);
+	smp_call(SMP_CALL_ALL, common_shutdown_1, &args, SMP_CALL_TYPE_ASYNC);

or this:

-		smp_call_function_single(boot_cpuid, do_remote_read, &x, 1);
+		smp_call(boot_cpuid, do_remote_read, &x, SMP_CALL_TYPE_SYNC);


The old interface more or less made some sense.  The new one is just a mess.

> Patch 1: The smp cross call related structures and definitions are
> consolidated from smp.c smp_types.h to smp.h. As a result, smp_types.h
> is deleted from the source tree.

And a lot of definitions that used to be private are now public.


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-04-26 14:02 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-22 20:00 [PATCH v2 00/11] smp: cross CPU call interface Donghai Qiao
2022-04-22 20:00 ` [PATCH v2 01/11] smp: consolidate the structure definitions to smp.h Donghai Qiao
2022-04-23  4:57   ` kernel test robot
2022-04-25  8:39   ` Peter Zijlstra
2022-04-25  9:52   ` Thomas Gleixner
2022-04-22 20:00 ` [PATCH v2 02/11] smp: define the cross call interface Donghai Qiao
2022-04-25  9:05   ` Peter Zijlstra
2022-04-22 20:00 ` [PATCH v2 03/11] smp: eliminate SCF_WAIT and SCF_RUN_LOCAL Donghai Qiao
2022-04-25  9:10   ` Peter Zijlstra
2022-04-22 20:00 ` [PATCH v2 04/11] smp: replace smp_call_function_single() with smp_call() Donghai Qiao
2022-04-25  9:33   ` Peter Zijlstra
2022-04-22 20:00 ` [PATCH v2 05/11] smp: replace smp_call_function_single_async() with smp_call_private() Donghai Qiao
2022-04-23  7:30   ` kernel test robot
2022-04-24 22:06     ` Nathan Chancellor
2022-04-24 22:06       ` Nathan Chancellor
2022-04-25  9:35   ` Peter Zijlstra
2022-04-22 20:00 ` [PATCH v2 06/11] smp: use smp_call_private() fron irq_work.c and core.c Donghai Qiao
2022-04-25  9:37   ` Peter Zijlstra
2022-04-22 20:00 ` [PATCH v2 07/11] smp: change smp_call_function_any() to smp_call_any() Donghai Qiao
2022-04-22 20:00 ` [PATCH v2 08/11] smp: replace smp_call_function_many_cond() with __smp_call_mask_cond() Donghai Qiao
2022-04-22 20:00 ` [PATCH v2 09/11] smp: replace smp_call_function_single_async with smp_call_private Donghai Qiao
2022-04-22 20:00 ` [PATCH v2 10/11] smp: replace smp_call_function_single() with smp_call() Donghai Qiao
2022-04-22 20:00 ` [PATCH v2 11/11] smp: modify up.c to adopt the same format of cross CPU call Donghai Qiao
2022-04-23  5:17   ` kernel test robot
2022-04-23  5:58   ` kernel test robot
2022-04-26 14:00 ` [PATCH v2 00/11] smp: cross CPU call interface Christoph Hellwig
