linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: torvalds@linux-foundation.org, mingo@kernel.org, bp@alien8.de,
	tglx@linutronix.de, luto@kernel.org, namit@vmware.com,
	peterz@infradead.org
Cc: linux-kernel@vger.kernel.org, Nadav Amit <nadav.amit@gmail.com>
Subject: [PATCH v2 2/5] x86/percpu: Relax smp_processor_id()
Date: Thu, 13 Jun 2019 15:54:47 +0200	[thread overview]
Message-ID: <20190613135653.310055383@infradead.org> (raw)
In-Reply-To: 20190613135445.318096781@infradead.org

Nadav reported that since this_cpu_read() became asm-volatile, many
smp_processor_id() users generated worse code due to the extra
constraints.

However since smp_processor_id() is reading a stable value, we can use
__this_cpu_read().

While this does reduce text size somewhat, this mostly results in code
movement to .text.unlikely as a result of more/larger .cold.
subfunctions. Less text on the hotpath is good for I$.

$ ./compare.sh defconfig-build1 defconfig-build2 vmlinux.o
setup_APIC_ibs                                             90         98   -12,+20
force_ibs_eilvt_setup                                     400        413   -57,+70
pci_serr_error                                            109        104   -54,+49
pci_serr_error                                            109        104   -54,+49
unknown_nmi_error                                         125        120   -76,+71
unknown_nmi_error                                         125        120   -76,+71
io_check_error                                            125        132   -97,+104
intel_thermal_interrupt                                   730        822   +92,+0
intel_init_thermal                                        951        945   -6,+0
generic_get_mtrr                                          301        294   -7,+0
generic_get_mtrr                                          301        294   -7,+0
generic_set_all                                           749        754   -44,+49
get_fixed_ranges                                          352        360   -41,+49
x86_acpi_suspend_lowlevel                                 369        363   -6,+0
check_tsc_sync_source                                     412        412   -71,+71
irq_migrate_all_off_this_cpu                              662        674   -14,+26
clocksource_watchdog                                      748        748   -113,+113
__perf_event_account_interrupt                            204        197   -7,+0
attempt_merge                                            1748       1741   -7,+0
intel_guc_send_ct                                        1424       1409   -15,+0
__fini_doorbell                                           235        231   -4,+0
bdw_set_cdclk                                             928        923   -5,+0
gen11_dsi_disable                                        1571       1556   -15,+0
gmbus_wait                                                493        488   -5,+0
md_make_request                                           376        369   -7,+0
__split_and_process_bio                                   543        536   -7,+0
delay_tsc                                                  96         89   -7,+0
hsw_disable_pc8                                           696        691   -5,+0
tsc_verify_tsc_adjust                                     215        228   -22,+35
cpuidle_driver_unref                                       56         49   -7,+0
blk_account_io_completion                                 159        148   -11,+0
mtrr_wrmsr                                                 95         99   -29,+33
__intel_wait_for_register_fw                              401        419   +18,+0
cpuidle_driver_ref                                         43         36   -7,+0
cpuidle_get_driver                                         15          8   -7,+0
blk_account_io_done                                       535        528   -7,+0
irq_migrate_all_off_this_cpu                              662        674   -14,+26
check_tsc_sync_source                                     412        412   -71,+71
irq_wait_for_poll                                         170        163   -7,+0
generic_end_io_acct                                       329        322   -7,+0
x86_acpi_suspend_lowlevel                                 369        363   -6,+0
nohz_balance_enter_idle                                   198        191   -7,+0
generic_start_io_acct                                     254        247   -7,+0
blk_account_io_start                                      341        334   -7,+0
perf_event_task_tick                                      682        675   -7,+0
intel_init_thermal                                        951        945   -6,+0
amd_e400_c1e_apic_setup                                    47         51   -28,+32
setup_APIC_eilvt                                          350        328   -22,+0
hsw_enable_pc8                                           1611       1605   -6,+0
                                             total   12985947   12985892   -994,+939

Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/smp.h |    3 ++-
 include/linux/smp.h        |   45 +++++++++++++++++++++++++++++++--------------
 2 files changed, 33 insertions(+), 15 deletions(-)

--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -162,7 +162,8 @@ __visible void smp_call_function_single_
  * from the initial startup. We map APIC_BASE very early in page_setup(),
  * so this is correct in the x86 case.
  */
-#define raw_smp_processor_id() (this_cpu_read(cpu_number))
+#define raw_smp_processor_id()  this_cpu_read(cpu_number)
+#define __smp_processor_id() __this_cpu_read(cpu_number)
 
 #ifdef CONFIG_X86_32
 extern int safe_smp_processor_id(void);
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -181,29 +181,46 @@ static inline int get_boot_cpu_id(void)
 
 #endif /* !SMP */
 
-/*
- * smp_processor_id(): get the current CPU ID.
+/**
+ * raw_processor_id() - get the current (unstable) CPU id
+ *
+ * For then you know what you are doing and need an unstable
+ * CPU id.
+ */
+
+/**
+ * smp_processor_id() - get the current (stable) CPU id
+ *
+ * This is the normal accessor to the CPU id and should be used
+ * whenever possible.
  *
- * if DEBUG_PREEMPT is enabled then we check whether it is
- * used in a preemption-safe way. (smp_processor_id() is safe
- * if it's used in a preemption-off critical section, or in
- * a thread that is bound to the current CPU.)
+ * The CPU id is stable when:
+ *
+ *  - IRQs are disabled;
+ *  - preemption is disabled;
+ *  - the task is CPU affine.
  *
- * NOTE: raw_smp_processor_id() is for internal use only
- * (smp_processor_id() is the preferred variant), but in rare
- * instances it might also be used to turn off false positives
- * (i.e. smp_processor_id() use that the debugging code reports but
- * which use for some reason is legal). Don't use this to hack around
- * the warning message, as your code might not work under PREEMPT.
+ * When CONFIG_DEBUG_PREEMPT; we verify these assumption and WARN
+ * when smp_processor_id() is used when the CPU id is not stable.
  */
+
+/*
+ * Allow the architecture to differentiate between a stable and unstable read.
+ * For example, x86 uses an IRQ-safe asm-volatile read for the unstable but a
+ * regular asm read for the stable.
+ */
+#ifndef __smp_processor_id
+#define __smp_processor_id(x) raw_smp_processor_id(x)
+#endif
+
 #ifdef CONFIG_DEBUG_PREEMPT
   extern unsigned int debug_smp_processor_id(void);
 # define smp_processor_id() debug_smp_processor_id()
 #else
-# define smp_processor_id() raw_smp_processor_id()
+# define smp_processor_id() __smp_processor_id()
 #endif
 
-#define get_cpu()		({ preempt_disable(); smp_processor_id(); })
+#define get_cpu()		({ preempt_disable(); __smp_processor_id(); })
 #define put_cpu()		preempt_enable()
 
 /*



  parent reply	other threads:[~2019-06-13 15:09 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-13 13:54 [PATCH v2 0/5] x86/percpu semantics and fixes Peter Zijlstra
2019-06-13 13:54 ` [PATCH v2 1/5] x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}() Peter Zijlstra
2019-06-13 13:54 ` Peter Zijlstra [this message]
2019-06-13 13:54 ` [PATCH v2 3/5] x86/percpu, x86/irq: Relax {set,get}_irq_regs() Peter Zijlstra
2019-06-13 13:54 ` [PATCH v2 4/5] x86/percpu, sched/fair: Avoid local_clock() Peter Zijlstra
2019-06-13 13:54 ` [PATCH v2 5/5] x86/percpu: Optimize raw_cpu_xchg() Peter Zijlstra
2019-06-13 18:49 ` [PATCH v2 0/5] x86/percpu semantics and fixes Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190613135653.310055383@infradead.org \
    --to=peterz@infradead.org \
    --cc=bp@alien8.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=nadav.amit@gmail.com \
    --cc=namit@vmware.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).