All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/9] x86: macrofying inline asm for better compilation
@ 2018-06-12 11:50 Nadav Amit
  2018-06-12 11:50   ` Nadav Amit
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Nadav Amit @ 2018-06-12 11:50 UTC (permalink / raw)
  To: linux-kernel, x86; +Cc: Nadav Amit

This patch-set deals with an interesting yet stupid problem: kernel code
that does not get inlined despite its simplicity. There are several
causes for this behavior: "cold" attribute on __init, different function
optimization levels; conditional constant computations based on
__builtin_constant_p(); and finally large inline assembly blocks.

This patch-set deals with the inline assembly problem. I separated these
patches from the others (that were sent in the RFC) for easier
inclusion. I also separated the removal of unnecessary new-lines which
would be sent separately.

The problem with inline assembly is that inline assembly is often used
by the kernel for things that are other than code - for example,
assembly directives and data. GCC however is oblivious to the content of
the blocks and assumes their cost in space and time is proportional to
the number of the perceived assembly "instruction", according to the
number of newlines and semicolons. Alternatives, paravirt and other
mechanisms are affected, causing code not to be inlined, and degrading
compilation quality in general.

The solution that this patch-set carries for this problem is to create
an assembly macro, and then call it from the inline assembly block.  As
a result, the compiler sees a single "instruction" and assigns the more
appropriate cost to the code.

To avoid uglification of the code, as many noted, the macros are first
precompiled into an assembly file, which is later assembled together
with the C files. This also enables to avoid duplicate implementation
that was set before for the asm and C code. This can be seen in the
exception table changes.

Overall this patch-set slightly increases the kernel size (my build was
done using my Ubuntu 18.04 config + localyesconfig for the record):

   text	   data	    bss	    dec	    hex	filename
18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+0.1%)

The number of static functions in the image is reduced by 379, but
actually inlining is even better, which does not always shows in these
numbers: a function may be inlined causing the calling function not to
be inlined.

The Makefile stuff may not be too clean. Ideas for improvements are
welcome.

v3->v4: * Changed naming of macros in 2 patches (PeterZ)
	* Minor cleanup of the paravirt patch

v2->v3: * Several build issues resolved (0-day)
	* Wrong comments fix (Josh)
	* Change asm vs C order in refcount (Kees)

v1->v2:	* Compiling the macros into a separate .s file, improving
	  readability (Linus)
	* Improving assembly formatting, applying most of the comments
	  according to my judgment (Jan)
	* Adding exception-table, cpufeature and jump-labels
	* Removing new-line cleanup; to be submitted separately


Nadav Amit (9):
  Makefile: Prepare for using macros for inline asm
  x86: objtool: use asm macro for better compiler decisions
  x86: refcount: prevent gcc distortions
  x86: alternatives: macrofy locks for better inlining
  x86: bug: prevent gcc distortions
  x86: prevent inline distortion by paravirt ops
  x86: extable: use macros instead of inline assembly
  x86: cpufeature: use macros instead of inline assembly
  x86: jump-labels: use macros instead of inline assembly

 Makefile                               |  9 ++-
 arch/x86/Makefile                      | 11 ++-
 arch/x86/include/asm/alternative-asm.h | 20 ++++--
 arch/x86/include/asm/alternative.h     | 11 +--
 arch/x86/include/asm/asm.h             | 61 +++++++---------
 arch/x86/include/asm/bug.h             | 98 +++++++++++++++-----------
 arch/x86/include/asm/cpufeature.h      | 82 ++++++++++++---------
 arch/x86/include/asm/jump_label.h      | 65 ++++++++++-------
 arch/x86/include/asm/paravirt_types.h  | 56 +++++++--------
 arch/x86/include/asm/refcount.h        | 74 +++++++++++--------
 arch/x86/kernel/Makefile               |  6 ++
 arch/x86/kernel/macros.S               | 16 +++++
 include/asm-generic/bug.h              |  8 +--
 include/linux/compiler.h               | 56 +++++++++++----
 scripts/Kbuild.include                 |  4 +-
 15 files changed, 344 insertions(+), 233 deletions(-)
 create mode 100644 arch/x86/kernel/macros.S

-- 
2.17.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4 1/9] Makefile: Prepare for using macros for inline asm
  2018-06-12 11:50 [PATCH v4 0/9] x86: macrofying inline asm for better compilation Nadav Amit
@ 2018-06-12 11:50   ` Nadav Amit
  2018-06-12 11:50   ` Nadav Amit
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Nadav Amit @ 2018-06-12 11:50 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Nadav Amit, Masahiro Yamada, Michal Marek, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, linux-kbuild

Using macros for inline assembly improves both readability and
compilation decisions that are distorted by big assembly blocks that use
alternative sections. Compile macros.S and use it to assemble all C
files. Currently, only x86 will use it.

Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kbuild@vger.kernel.org

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 Makefile                 |  9 +++++++--
 arch/x86/Makefile        | 11 +++++++++--
 arch/x86/kernel/Makefile |  6 ++++++
 arch/x86/kernel/macros.S |  7 +++++++
 scripts/Kbuild.include   |  4 +++-
 5 files changed, 32 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/kernel/macros.S

diff --git a/Makefile b/Makefile
index 554dcaddbce4..026586db2f26 100644
--- a/Makefile
+++ b/Makefile
@@ -1085,7 +1085,7 @@ scripts: scripts_basic include/config/auto.conf include/config/tristate.conf \
 # version.h and scripts_basic is processed / created.
 
 # Listed in dependency order
-PHONY += prepare archprepare prepare0 prepare1 prepare2 prepare3
+PHONY += prepare archprepare macroprepare prepare0 prepare1 prepare2 prepare3
 
 # prepare3 is used to check if we are building in a separate output directory,
 # and if so do:
@@ -1109,7 +1109,9 @@ prepare1: prepare2 $(version_h) $(autoksyms_h) include/generated/utsrelease.h \
                    include/config/auto.conf
 	$(cmd_crmodverdir)
 
-archprepare: archheaders archscripts prepare1 scripts_basic
+macroprepare: prepare1 archmacros
+
+archprepare: archheaders archscripts macroprepare scripts_basic
 
 prepare0: archprepare gcc-plugins
 	$(Q)$(MAKE) $(build)=.
@@ -1214,6 +1216,9 @@ archheaders:
 PHONY += archscripts
 archscripts:
 
+PHONY += archmacros
+archmacros:
+
 PHONY += __headers
 __headers: $(version_h) scripts_basic uapi-asm-generic archheaders archscripts
 	$(Q)$(MAKE) $(build)=scripts build_unifdef
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 60135cbd905c..6b82314776fd 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -235,8 +235,8 @@ ifdef CONFIG_X86_64
 LDFLAGS += $(call ld-option, -z max-page-size=0x200000)
 endif
 
-# Speed up the build
-KBUILD_CFLAGS += -pipe
+# We cannot use -pipe flag since we give an additional .s file to the compiler
+#KBUILD_CFLAGS += -pipe
 # Workaround for a gcc prelease that unfortunately was shipped in a suse release
 KBUILD_CFLAGS += -Wno-sign-compare
 #
@@ -258,11 +258,18 @@ archscripts: scripts_basic
 archheaders:
 	$(Q)$(MAKE) $(build)=arch/x86/entry/syscalls all
 
+archmacros:
+	$(Q)$(MAKE) $(build)=arch/x86/kernel macros
+
 archprepare:
 ifeq ($(CONFIG_KEXEC_FILE),y)
 	$(Q)$(MAKE) $(build)=arch/x86/purgatory arch/x86/purgatory/kexec-purgatory.c
 endif
 
+ASM_MACRO_FLAGS = -Wa,arch/x86/kernel/macros.s
+export ASM_MACRO_FLAGS
+KBUILD_CFLAGS += $(ASM_MACRO_FLAGS)
+
 ###
 # Kernel objects
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 02d6f5cf4e70..fdb6c5b2a922 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -9,6 +9,12 @@ extra-y	+= ebda.o
 extra-y	+= platform-quirks.o
 extra-y	+= vmlinux.lds
 
+$(obj)/macros.s: $(obj)/macros.S FORCE
+	$(call if_changed_dep,cpp_s_S)
+
+macros: $(obj)/macros.s
+	@:
+
 CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)
 
 ifdef CONFIG_FUNCTION_TRACER
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
new file mode 100644
index 000000000000..cfc1c7d1a6eb
--- /dev/null
+++ b/arch/x86/kernel/macros.S
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * This file includes headers whose assembly part includes macros which are
+ * commonly used. The macros are precompiled into assmebly file which is later
+ * assembled together with each compiled file.
+ */
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 50cee534fd64..ad2c02062aa4 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -189,7 +189,9 @@ __cc-option = $(call try-run-cached,\
 
 # Do not attempt to build with gcc plugins during cc-option tests.
 # (And this uses delayed resolution so the flags will be up to date.)
-CC_OPTION_CFLAGS = $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS))
+# In addition, do not include the asm macros which are built later.
+CC_OPTION_FILTERED = $(GCC_PLUGINS_CFLAGS) $(ASM_MACRO_FLAGS)
+CC_OPTION_CFLAGS = $(filter-out $(CC_OPTION_FILTERED),$(KBUILD_CFLAGS))
 
 # cc-option
 # Usage: cflags-y += $(call cc-option,-march=winchip-c6,-march=i586)
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 1/9] Makefile: Prepare for using macros for inline asm
@ 2018-06-12 11:50   ` Nadav Amit
  0 siblings, 0 replies; 20+ messages in thread
From: Nadav Amit @ 2018-06-12 11:50 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Nadav Amit, Masahiro Yamada, Michal Marek, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, linux-kbuild

Using macros for inline assembly improves both readability and
compilation decisions that are distorted by big assembly blocks that use
alternative sections. Compile macros.S and use it to assemble all C
files. Currently, only x86 will use it.

Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kbuild@vger.kernel.org

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 Makefile                 |  9 +++++++--
 arch/x86/Makefile        | 11 +++++++++--
 arch/x86/kernel/Makefile |  6 ++++++
 arch/x86/kernel/macros.S |  7 +++++++
 scripts/Kbuild.include   |  4 +++-
 5 files changed, 32 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/kernel/macros.S

diff --git a/Makefile b/Makefile
index 554dcaddbce4..026586db2f26 100644
--- a/Makefile
+++ b/Makefile
@@ -1085,7 +1085,7 @@ scripts: scripts_basic include/config/auto.conf include/config/tristate.conf \
 # version.h and scripts_basic is processed / created.
 
 # Listed in dependency order
-PHONY += prepare archprepare prepare0 prepare1 prepare2 prepare3
+PHONY += prepare archprepare macroprepare prepare0 prepare1 prepare2 prepare3
 
 # prepare3 is used to check if we are building in a separate output directory,
 # and if so do:
@@ -1109,7 +1109,9 @@ prepare1: prepare2 $(version_h) $(autoksyms_h) include/generated/utsrelease.h \
                    include/config/auto.conf
 	$(cmd_crmodverdir)
 
-archprepare: archheaders archscripts prepare1 scripts_basic
+macroprepare: prepare1 archmacros
+
+archprepare: archheaders archscripts macroprepare scripts_basic
 
 prepare0: archprepare gcc-plugins
 	$(Q)$(MAKE) $(build)=.
@@ -1214,6 +1216,9 @@ archheaders:
 PHONY += archscripts
 archscripts:
 
+PHONY += archmacros
+archmacros:
+
 PHONY += __headers
 __headers: $(version_h) scripts_basic uapi-asm-generic archheaders archscripts
 	$(Q)$(MAKE) $(build)=scripts build_unifdef
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 60135cbd905c..6b82314776fd 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -235,8 +235,8 @@ ifdef CONFIG_X86_64
 LDFLAGS += $(call ld-option, -z max-page-size=0x200000)
 endif
 
-# Speed up the build
-KBUILD_CFLAGS += -pipe
+# We cannot use -pipe flag since we give an additional .s file to the compiler
+#KBUILD_CFLAGS += -pipe
 # Workaround for a gcc prelease that unfortunately was shipped in a suse release
 KBUILD_CFLAGS += -Wno-sign-compare
 #
@@ -258,11 +258,18 @@ archscripts: scripts_basic
 archheaders:
 	$(Q)$(MAKE) $(build)=arch/x86/entry/syscalls all
 
+archmacros:
+	$(Q)$(MAKE) $(build)=arch/x86/kernel macros
+
 archprepare:
 ifeq ($(CONFIG_KEXEC_FILE),y)
 	$(Q)$(MAKE) $(build)=arch/x86/purgatory arch/x86/purgatory/kexec-purgatory.c
 endif
 
+ASM_MACRO_FLAGS = -Wa,arch/x86/kernel/macros.s
+export ASM_MACRO_FLAGS
+KBUILD_CFLAGS += $(ASM_MACRO_FLAGS)
+
 ###
 # Kernel objects
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 02d6f5cf4e70..fdb6c5b2a922 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -9,6 +9,12 @@ extra-y	+= ebda.o
 extra-y	+= platform-quirks.o
 extra-y	+= vmlinux.lds
 
+$(obj)/macros.s: $(obj)/macros.S FORCE
+	$(call if_changed_dep,cpp_s_S)
+
+macros: $(obj)/macros.s
+	@:
+
 CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)
 
 ifdef CONFIG_FUNCTION_TRACER
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
new file mode 100644
index 000000000000..cfc1c7d1a6eb
--- /dev/null
+++ b/arch/x86/kernel/macros.S
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * This file includes headers whose assembly part includes macros which are
+ * commonly used. The macros are precompiled into assmebly file which is later
+ * assembled together with each compiled file.
+ */
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 50cee534fd64..ad2c02062aa4 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -189,7 +189,9 @@ __cc-option = $(call try-run-cached,\
 
 # Do not attempt to build with gcc plugins during cc-option tests.
 # (And this uses delayed resolution so the flags will be up to date.)
-CC_OPTION_CFLAGS = $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS))
+# In addition, do not include the asm macros which are built later.
+CC_OPTION_FILTERED = $(GCC_PLUGINS_CFLAGS) $(ASM_MACRO_FLAGS)
+CC_OPTION_CFLAGS = $(filter-out $(CC_OPTION_FILTERED),$(KBUILD_CFLAGS))
 
 # cc-option
 # Usage: cflags-y += $(call cc-option,-march=winchip-c6,-march=i586)
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 2/9] x86: objtool: use asm macro for better compiler decisions
  2018-06-12 11:50 [PATCH v4 0/9] x86: macrofying inline asm for better compilation Nadav Amit
@ 2018-06-12 11:50   ` Nadav Amit
  2018-06-12 11:50   ` Nadav Amit
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Nadav Amit @ 2018-06-12 11:50 UTC (permalink / raw)
  To: linux-kernel, x86; +Cc: Nadav Amit, Christopher Li, linux-sparse

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

In the case of objtool, this distortion is extreme, since anyhow the
annotations of objtool are discarded during linkage.

The solution is to set an assembly macro and call it from the inline
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

This patch slightly increases the kernel size.

   text	   data	    bss	    dec	    hex	filename
18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
18140970 10225412 2957312 31323694 1ddf62e ./vmlinux after (+829)

Static text symbols:
Before:	40321
After:	40302	(-19)

Cc: Christopher Li <sparse@chrisli.org>
Cc: linux-sparse@vger.kernel.org

Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/kernel/macros.S |  2 ++
 include/linux/compiler.h | 56 ++++++++++++++++++++++++++++++----------
 2 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index cfc1c7d1a6eb..cee28c3246dc 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -5,3 +5,5 @@
  * commonly used. The macros are precompiled into assmebly file which is later
  * assembled together with each compiled file.
  */
+
+#include <linux/compiler.h>
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index ab4711c63601..3efd85334774 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -99,22 +99,13 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
  * unique, to convince GCC not to merge duplicate inline asm statements.
  */
 #define annotate_reachable() ({						\
-	asm volatile("%c0:\n\t"						\
-		     ".pushsection .discard.reachable\n\t"		\
-		     ".long %c0b - .\n\t"				\
-		     ".popsection\n\t" : : "i" (__COUNTER__));		\
+	asm volatile("ANNOTATE_REACHABLE counter=%c0"			\
+		     : : "i" (__COUNTER__));				\
 })
 #define annotate_unreachable() ({					\
-	asm volatile("%c0:\n\t"						\
-		     ".pushsection .discard.unreachable\n\t"		\
-		     ".long %c0b - .\n\t"				\
-		     ".popsection\n\t" : : "i" (__COUNTER__));		\
+	asm volatile("ANNOTATE_UNREACHABLE counter=%c0"			\
+		     : : "i" (__COUNTER__));				\
 })
-#define ASM_UNREACHABLE							\
-	"999:\n\t"							\
-	".pushsection .discard.unreachable\n\t"				\
-	".long 999b - .\n\t"						\
-	".popsection\n\t"
 #else
 #define annotate_reachable()
 #define annotate_unreachable()
@@ -280,6 +271,45 @@ unsigned long read_word_at_a_time(const void *addr)
 
 #endif /* __KERNEL__ */
 
+#else /* __ASSEMBLY__ */
+
+#ifdef __KERNEL__
+#ifndef LINKER_SCRIPT
+
+#ifdef CONFIG_STACK_VALIDATION
+.macro ANNOTATE_UNREACHABLE counter:req
+\counter:
+	.pushsection .discard.unreachable
+	.long \counter\()b -.
+	.popsection
+.endm
+
+.macro ANNOTATE_REACHABLE counter:req
+\counter:
+	.pushsection .discard.reachable
+	.long \counter\()b -.
+	.popsection
+.endm
+
+.macro ASM_UNREACHABLE
+999:
+	.pushsection .discard.unreachable
+	.long 999b - .
+	.popsection
+.endm
+#else /* CONFIG_STACK_VALIDATION */
+.macro ANNOTATE_UNREACHABLE counter:req
+.endm
+
+.macro ANNOTATE_REACHABLE counter:req
+.endm
+
+.macro ASM_UNREACHABLE
+.endm
+#endif /* CONFIG_STACK_VALIDATION */
+
+#endif /* LINKER_SCRIPT */
+#endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
 
 #ifndef __optimize
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 2/9] x86: objtool: use asm macro for better compiler decisions
@ 2018-06-12 11:50   ` Nadav Amit
  0 siblings, 0 replies; 20+ messages in thread
From: Nadav Amit @ 2018-06-12 11:50 UTC (permalink / raw)
  To: linux-kernel, x86; +Cc: Nadav Amit, Christopher Li, linux-sparse

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

In the case of objtool, this distortion is extreme, since anyhow the
annotations of objtool are discarded during linkage.

The solution is to set an assembly macro and call it from the inline
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

This patch slightly increases the kernel size.

   text	   data	    bss	    dec	    hex	filename
18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
18140970 10225412 2957312 31323694 1ddf62e ./vmlinux after (+829)

Static text symbols:
Before:	40321
After:	40302	(-19)

Cc: Christopher Li <sparse@chrisli.org>
Cc: linux-sparse@vger.kernel.org

Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/kernel/macros.S |  2 ++
 include/linux/compiler.h | 56 ++++++++++++++++++++++++++++++----------
 2 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index cfc1c7d1a6eb..cee28c3246dc 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -5,3 +5,5 @@
  * commonly used. The macros are precompiled into assmebly file which is later
  * assembled together with each compiled file.
  */
+
+#include <linux/compiler.h>
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index ab4711c63601..3efd85334774 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -99,22 +99,13 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
  * unique, to convince GCC not to merge duplicate inline asm statements.
  */
 #define annotate_reachable() ({						\
-	asm volatile("%c0:\n\t"						\
-		     ".pushsection .discard.reachable\n\t"		\
-		     ".long %c0b - .\n\t"				\
-		     ".popsection\n\t" : : "i" (__COUNTER__));		\
+	asm volatile("ANNOTATE_REACHABLE counter=%c0"			\
+		     : : "i" (__COUNTER__));				\
 })
 #define annotate_unreachable() ({					\
-	asm volatile("%c0:\n\t"						\
-		     ".pushsection .discard.unreachable\n\t"		\
-		     ".long %c0b - .\n\t"				\
-		     ".popsection\n\t" : : "i" (__COUNTER__));		\
+	asm volatile("ANNOTATE_UNREACHABLE counter=%c0"			\
+		     : : "i" (__COUNTER__));				\
 })
-#define ASM_UNREACHABLE							\
-	"999:\n\t"							\
-	".pushsection .discard.unreachable\n\t"				\
-	".long 999b - .\n\t"						\
-	".popsection\n\t"
 #else
 #define annotate_reachable()
 #define annotate_unreachable()
@@ -280,6 +271,45 @@ unsigned long read_word_at_a_time(const void *addr)
 
 #endif /* __KERNEL__ */
 
+#else /* __ASSEMBLY__ */
+
+#ifdef __KERNEL__
+#ifndef LINKER_SCRIPT
+
+#ifdef CONFIG_STACK_VALIDATION
+.macro ANNOTATE_UNREACHABLE counter:req
+\counter:
+	.pushsection .discard.unreachable
+	.long \counter\()b -.
+	.popsection
+.endm
+
+.macro ANNOTATE_REACHABLE counter:req
+\counter:
+	.pushsection .discard.reachable
+	.long \counter\()b -.
+	.popsection
+.endm
+
+.macro ASM_UNREACHABLE
+999:
+	.pushsection .discard.unreachable
+	.long 999b - .
+	.popsection
+.endm
+#else /* CONFIG_STACK_VALIDATION */
+.macro ANNOTATE_UNREACHABLE counter:req
+.endm
+
+.macro ANNOTATE_REACHABLE counter:req
+.endm
+
+.macro ASM_UNREACHABLE
+.endm
+#endif /* CONFIG_STACK_VALIDATION */
+
+#endif /* LINKER_SCRIPT */
+#endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
 
 #ifndef __optimize
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 3/9] x86: refcount: prevent gcc distortions
  2018-06-12 11:50 [PATCH v4 0/9] x86: macrofying inline asm for better compilation Nadav Amit
  2018-06-12 11:50   ` Nadav Amit
  2018-06-12 11:50   ` Nadav Amit
@ 2018-06-12 11:50 ` Nadav Amit
  2018-06-12 11:50 ` [PATCH v4 4/9] x86: alternatives: macrofy locks for better inlining Nadav Amit
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Nadav Amit @ 2018-06-12 11:50 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Nadav Amit, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Kees Cook, Jan Beulich, Josh Poimboeuf

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

The solution is to set an assembly macro and call it from the inlined
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

This patch allows to inline functions such as __get_seccomp_filter().
Interestingly, this allows more aggressive inlining while reducing the
kernel size.

   text	   data	    bss	    dec	    hex	filename
18140970 10225412 2957312 31323694 1ddf62e ./vmlinux before
18140140 10225284 2957312 31322736 1ddf270 ./vmlinux after (-958)

Static text symbols:
Before:	40302
After:	40286	(-16)

Functions such as kref_get(), free_user(), fuse_file_get() now get
inlined.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Kees Cook <keescook@chromium.org>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/include/asm/refcount.h | 74 ++++++++++++++++++++-------------
 arch/x86/kernel/macros.S        |  1 +
 2 files changed, 46 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
index 4cf11d88d3b3..6b2809a4d8e9 100644
--- a/arch/x86/include/asm/refcount.h
+++ b/arch/x86/include/asm/refcount.h
@@ -4,6 +4,41 @@
  * x86-specific implementation of refcount_t. Based on PAX_REFCOUNT from
  * PaX/grsecurity.
  */
+
+#ifdef __ASSEMBLY__
+
+#include <asm/asm.h>
+#include <asm/bug.h>
+
+.macro REFCOUNT_EXCEPTION counter:req
+	.pushsection .text..refcount
+111:	lea \counter, %_ASM_CX
+112:	ud2
+	ASM_UNREACHABLE
+	.popsection
+113:	_ASM_EXTABLE_REFCOUNT(112b, 113b)
+.endm
+
+/* Trigger refcount exception if refcount result is negative. */
+.macro REFCOUNT_CHECK_LT_ZERO counter:req
+	js 111f
+	REFCOUNT_EXCEPTION counter="\counter"
+.endm
+
+/* Trigger refcount exception if refcount result is zero or negative. */
+.macro REFCOUNT_CHECK_LE_ZERO counter:req
+	jz 111f
+	REFCOUNT_CHECK_LT_ZERO counter="\counter"
+.endm
+
+/* Trigger refcount exception unconditionally. */
+.macro REFCOUNT_ERROR counter:req
+	jmp 111f
+	REFCOUNT_EXCEPTION counter="\counter"
+.endm
+
+#else /* __ASSEMBLY__ */
+
 #include <linux/refcount.h>
 
 /*
@@ -14,34 +49,11 @@
  * central refcount exception. The fixup address for the exception points
  * back to the regular execution flow in .text.
  */
-#define _REFCOUNT_EXCEPTION				\
-	".pushsection .text..refcount\n"		\
-	"111:\tlea %[counter], %%" _ASM_CX "\n"		\
-	"112:\t" ASM_UD2 "\n"				\
-	ASM_UNREACHABLE					\
-	".popsection\n"					\
-	"113:\n"					\
-	_ASM_EXTABLE_REFCOUNT(112b, 113b)
-
-/* Trigger refcount exception if refcount result is negative. */
-#define REFCOUNT_CHECK_LT_ZERO				\
-	"js 111f\n\t"					\
-	_REFCOUNT_EXCEPTION
-
-/* Trigger refcount exception if refcount result is zero or negative. */
-#define REFCOUNT_CHECK_LE_ZERO				\
-	"jz 111f\n\t"					\
-	REFCOUNT_CHECK_LT_ZERO
-
-/* Trigger refcount exception unconditionally. */
-#define REFCOUNT_ERROR					\
-	"jmp 111f\n\t"					\
-	_REFCOUNT_EXCEPTION
 
 static __always_inline void refcount_add(unsigned int i, refcount_t *r)
 {
 	asm volatile(LOCK_PREFIX "addl %1,%0\n\t"
-		REFCOUNT_CHECK_LT_ZERO
+		"REFCOUNT_CHECK_LT_ZERO counter=\"%[counter]\""
 		: [counter] "+m" (r->refs.counter)
 		: "ir" (i)
 		: "cc", "cx");
@@ -50,7 +62,7 @@ static __always_inline void refcount_add(unsigned int i, refcount_t *r)
 static __always_inline void refcount_inc(refcount_t *r)
 {
 	asm volatile(LOCK_PREFIX "incl %0\n\t"
-		REFCOUNT_CHECK_LT_ZERO
+		"REFCOUNT_CHECK_LT_ZERO counter=\"%[counter]\""
 		: [counter] "+m" (r->refs.counter)
 		: : "cc", "cx");
 }
@@ -58,7 +70,7 @@ static __always_inline void refcount_inc(refcount_t *r)
 static __always_inline void refcount_dec(refcount_t *r)
 {
 	asm volatile(LOCK_PREFIX "decl %0\n\t"
-		REFCOUNT_CHECK_LE_ZERO
+		"REFCOUNT_CHECK_LE_ZERO counter=\"%[counter]\""
 		: [counter] "+m" (r->refs.counter)
 		: : "cc", "cx");
 }
@@ -66,13 +78,15 @@ static __always_inline void refcount_dec(refcount_t *r)
 static __always_inline __must_check
 bool refcount_sub_and_test(unsigned int i, refcount_t *r)
 {
-	GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl", REFCOUNT_CHECK_LT_ZERO,
+	GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
+				  "REFCOUNT_CHECK_LT_ZERO counter=\"%0\"",
 				  r->refs.counter, "er", i, "%0", e, "cx");
 }
 
 static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
 {
-	GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl", REFCOUNT_CHECK_LT_ZERO,
+	GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
+				 "REFCOUNT_CHECK_LT_ZERO counter=\"%0\"",
 				 r->refs.counter, "%0", e, "cx");
 }
 
@@ -90,7 +104,7 @@ bool refcount_add_not_zero(unsigned int i, refcount_t *r)
 
 		/* Did we try to increment from/to an undesirable state? */
 		if (unlikely(c < 0 || c == INT_MAX || result < c)) {
-			asm volatile(REFCOUNT_ERROR
+			asm volatile("REFCOUNT_ERROR counter=\"%[counter]\""
 				     : : [counter] "m" (r->refs.counter)
 				     : "cc", "cx");
 			break;
@@ -106,4 +120,6 @@ static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
 	return refcount_add_not_zero(1, r);
 }
 
+#endif /* __ASSEMBLY__ */
+
 #endif
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index cee28c3246dc..f1fe1d570365 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -7,3 +7,4 @@
  */
 
 #include <linux/compiler.h>
+#include <asm/refcount.h>
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 4/9] x86: alternatives: macrofy locks for better inlining
  2018-06-12 11:50 [PATCH v4 0/9] x86: macrofying inline asm for better compilation Nadav Amit
                   ` (2 preceding siblings ...)
  2018-06-12 11:50 ` [PATCH v4 3/9] x86: refcount: prevent gcc distortions Nadav Amit
@ 2018-06-12 11:50 ` Nadav Amit
  2018-06-12 11:50 ` [PATCH v4 5/9] x86: bug: prevent gcc distortions Nadav Amit
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Nadav Amit @ 2018-06-12 11:50 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Nadav Amit, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Josh Poimboeuf

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

The solution is to set an assembly macro and call it from the inlined
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

This patch handles the LOCK prefix, allowing more aggresive inlining.

   text	   data	    bss	    dec	    hex	filename
18140140 10225284 2957312 31322736 1ddf270 ./vmlinux before
18146889 10225380 2957312 31329581 1de0d2d ./vmlinux after (+6845)

Static text symbols:
Before:	40286
After:	40218	(-68)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/include/asm/alternative-asm.h | 20 ++++++++++++++------
 arch/x86/include/asm/alternative.h     | 11 ++---------
 arch/x86/kernel/macros.S               |  1 +
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/alternative-asm.h b/arch/x86/include/asm/alternative-asm.h
index 31b627b43a8e..8e4ea39e55d0 100644
--- a/arch/x86/include/asm/alternative-asm.h
+++ b/arch/x86/include/asm/alternative-asm.h
@@ -7,16 +7,24 @@
 #include <asm/asm.h>
 
 #ifdef CONFIG_SMP
-	.macro LOCK_PREFIX
-672:	lock
+.macro LOCK_PREFIX_HERE
 	.pushsection .smp_locks,"a"
 	.balign 4
-	.long 672b - .
+	.long 671f - .		# offset
 	.popsection
-	.endm
+671:
+.endm
+
+.macro LOCK_PREFIX insn:vararg
+	LOCK_PREFIX_HERE
+	lock \insn
+.endm
 #else
-	.macro LOCK_PREFIX
-	.endm
+.macro LOCK_PREFIX_HERE
+.endm
+
+.macro LOCK_PREFIX insn:vararg
+.endm
 #endif
 
 /*
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 4cd6a3b71824..d7faa16622d8 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -31,15 +31,8 @@
  */
 
 #ifdef CONFIG_SMP
-#define LOCK_PREFIX_HERE \
-		".pushsection .smp_locks,\"a\"\n"	\
-		".balign 4\n"				\
-		".long 671f - .\n" /* offset */		\
-		".popsection\n"				\
-		"671:"
-
-#define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "
-
+#define LOCK_PREFIX_HERE "LOCK_PREFIX_HERE\n\t"
+#define LOCK_PREFIX "LOCK_PREFIX "
 #else /* ! CONFIG_SMP */
 #define LOCK_PREFIX_HERE ""
 #define LOCK_PREFIX ""
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index f1fe1d570365..852487a9fc56 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -8,3 +8,4 @@
 
 #include <linux/compiler.h>
 #include <asm/refcount.h>
+#include <asm/alternative-asm.h>
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 5/9] x86: bug: prevent gcc distortions
  2018-06-12 11:50 [PATCH v4 0/9] x86: macrofying inline asm for better compilation Nadav Amit
                   ` (3 preceding siblings ...)
  2018-06-12 11:50 ` [PATCH v4 4/9] x86: alternatives: macrofy locks for better inlining Nadav Amit
@ 2018-06-12 11:50 ` Nadav Amit
  2018-06-12 11:50 ` [PATCH v4 6/9] x86: prevent inline distortion by paravirt ops Nadav Amit
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Nadav Amit @ 2018-06-12 11:50 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Nadav Amit, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra, Josh Poimboeuf

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

The solution is to set an assembly macro and call it from the inlinedv
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

This patch increases the kernel size:

   text	   data	    bss	    dec	    hex	filename
18146889 10225380 2957312 31329581 1de0d2d ./vmlinux before
18147336 10226688 2957312 31331336 1de1408 ./vmlinux after (+1755)

But enables more aggressive inlining (and probably branch decisions).
The number of static text symbols in vmlinux is lower.

Static text symbols:
Before:	40218
After:	40053	(-165)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/include/asm/bug.h | 98 ++++++++++++++++++++++----------------
 arch/x86/kernel/macros.S   |  1 +
 include/asm-generic/bug.h  |  8 ++--
 3 files changed, 61 insertions(+), 46 deletions(-)

diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index 6804d6642767..5090035e6d16 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -4,6 +4,8 @@
 
 #include <linux/stringify.h>
 
+#ifndef __ASSEMBLY__
+
 /*
  * Despite that some emulators terminate on UD2, we use it for WARN().
  *
@@ -20,53 +22,15 @@
 
 #define LEN_UD2		2
 
-#ifdef CONFIG_GENERIC_BUG
-
-#ifdef CONFIG_X86_32
-# define __BUG_REL(val)	".long " __stringify(val)
-#else
-# define __BUG_REL(val)	".long " __stringify(val) " - 2b"
-#endif
-
-#ifdef CONFIG_DEBUG_BUGVERBOSE
-
-#define _BUG_FLAGS(ins, flags)						\
-do {									\
-	asm volatile("1:\t" ins "\n"					\
-		     ".pushsection __bug_table,\"aw\"\n"		\
-		     "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"	\
-		     "\t"  __BUG_REL(%c0) "\t# bug_entry::file\n"	\
-		     "\t.word %c1"        "\t# bug_entry::line\n"	\
-		     "\t.word %c2"        "\t# bug_entry::flags\n"	\
-		     "\t.org 2b+%c3\n"					\
-		     ".popsection"					\
-		     : : "i" (__FILE__), "i" (__LINE__),		\
-			 "i" (flags),					\
-			 "i" (sizeof(struct bug_entry)));		\
-} while (0)
-
-#else /* !CONFIG_DEBUG_BUGVERBOSE */
-
 #define _BUG_FLAGS(ins, flags)						\
 do {									\
-	asm volatile("1:\t" ins "\n"					\
-		     ".pushsection __bug_table,\"aw\"\n"		\
-		     "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"	\
-		     "\t.word %c0"        "\t# bug_entry::flags\n"	\
-		     "\t.org 2b+%c1\n"					\
-		     ".popsection"					\
-		     : : "i" (flags),					\
+	asm volatile("ASM_BUG ins=\"" ins "\" file=%c0 line=%c1 "	\
+		     "flags=%c2 size=%c3"				\
+		     : : "i" (__FILE__), "i" (__LINE__),                \
+			 "i" (flags),                                   \
 			 "i" (sizeof(struct bug_entry)));		\
 } while (0)
 
-#endif /* CONFIG_DEBUG_BUGVERBOSE */
-
-#else
-
-#define _BUG_FLAGS(ins, flags)  asm volatile(ins)
-
-#endif /* CONFIG_GENERIC_BUG */
-
 #define HAVE_ARCH_BUG
 #define BUG()							\
 do {								\
@@ -82,4 +46,54 @@ do {								\
 
 #include <asm-generic/bug.h>
 
+#else /* __ASSEMBLY__ */
+
+#ifdef CONFIG_GENERIC_BUG
+
+#ifdef CONFIG_X86_32
+.macro __BUG_REL val:req
+	.long \val
+.endm
+#else
+.macro __BUG_REL val:req
+	.long \val - 2b
+.endm
+#endif
+
+#ifdef CONFIG_DEBUG_BUGVERBOSE
+
+.macro ASM_BUG ins:req file:req line:req flags:req size:req
+1:	\ins
+	.pushsection __bug_table,"aw"
+2:	__BUG_REL val=1b	# bug_entry::bug_addr
+	__BUG_REL val=\file	# bug_entry::file
+	.word \line		# bug_entry::line
+	.word \flags		# bug_entry::flags
+	.org 2b+\size
+	.popsection
+.endm
+
+#else /* !CONFIG_DEBUG_BUGVERBOSE */
+
+.macro ASM_BUG ins:req file:req line:req flags:req size:req
+1:	\ins
+	.pushsection __bug_table,"aw"
+2:	__BUG_REL val=1b	# bug_entry::bug_addr
+	.word \flags		# bug_entry::flags
+	.org 2b+\size
+	.popsection
+.endm
+
+#endif /* CONFIG_DEBUG_BUGVERBOSE */
+
+#else /* CONFIG_GENERIC_BUG */
+
+.macro ASM_BUG ins:req file:req line:req flags:req size:req
+	\ins
+.endm
+
+#endif /* CONFIG_GENERIC_BUG */
+
+#endif /* __ASSEMBLY__ */
+
 #endif /* _ASM_X86_BUG_H */
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index 852487a9fc56..66ccb8e823b1 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -9,3 +9,4 @@
 #include <linux/compiler.h>
 #include <asm/refcount.h>
 #include <asm/alternative-asm.h>
+#include <asm/bug.h>
diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index a7613e1b0c87..cf7f4ef4749e 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -17,10 +17,8 @@
 #ifndef __ASSEMBLY__
 #include <linux/kernel.h>
 
-#ifdef CONFIG_BUG
-
-#ifdef CONFIG_GENERIC_BUG
 struct bug_entry {
+#ifdef CONFIG_GENERIC_BUG
 #ifndef CONFIG_GENERIC_BUG_RELATIVE_POINTERS
 	unsigned long	bug_addr;
 #else
@@ -35,8 +33,10 @@ struct bug_entry {
 	unsigned short	line;
 #endif
 	unsigned short	flags;
-};
 #endif	/* CONFIG_GENERIC_BUG */
+};
+
+#ifdef CONFIG_BUG
 
 /*
  * Don't use BUG() or BUG_ON() unless there's really no way out; one
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 6/9] x86: prevent inline distortion by paravirt ops
  2018-06-12 11:50 [PATCH v4 0/9] x86: macrofying inline asm for better compilation Nadav Amit
                   ` (4 preceding siblings ...)
  2018-06-12 11:50 ` [PATCH v4 5/9] x86: bug: prevent gcc distortions Nadav Amit
@ 2018-06-12 11:50 ` Nadav Amit
  2018-06-19 15:09     ` Juergen Gross
  2018-06-12 11:50 ` [PATCH v4 7/9] x86: extable: use macros instead of inline assembly Nadav Amit
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Nadav Amit @ 2018-06-12 11:50 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Nadav Amit, Juergen Gross, Alok Kataria, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, virtualization

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

The solution is to set an assembly macro and call it from the inlined
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

The effect of the patch is a more aggressive inlining, which also
causes a size increase of kernel.

   text	   data	    bss	    dec	    hex	filename
18147336 10226688 2957312 31331336 1de1408 ./vmlinux before
18162555 10226288 2957312 31346155 1de4deb ./vmlinux after (+14819)

Static text symbols:
Before:	40053
After:	39942	(-111)

Cc: Juergen Gross <jgross@suse.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: virtualization@lists.linux-foundation.org

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/include/asm/paravirt_types.h | 56 +++++++++++++--------------
 arch/x86/kernel/macros.S              |  1 +
 2 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 180bc0bff0fb..a3d886238b58 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -343,23 +343,11 @@ extern struct pv_lock_ops pv_lock_ops;
 #define paravirt_clobber(clobber)		\
 	[paravirt_clobber] "i" (clobber)
 
-/*
- * Generate some code, and mark it as patchable by the
- * apply_paravirt() alternate instruction patcher.
- */
-#define _paravirt_alt(insn_string, type, clobber)	\
-	"771:\n\t" insn_string "\n" "772:\n"		\
-	".pushsection .parainstructions,\"a\"\n"	\
-	_ASM_ALIGN "\n"					\
-	_ASM_PTR " 771b\n"				\
-	"  .byte " type "\n"				\
-	"  .byte 772b-771b\n"				\
-	"  .short " clobber "\n"			\
-	".popsection\n"
-
 /* Generate patchable code, with the default asm parameters. */
-#define paravirt_alt(insn_string)					\
-	_paravirt_alt(insn_string, "%c[paravirt_typenum]", "%c[paravirt_clobber]")
+#define paravirt_alt							\
+	"PARAVIRT_CALL type=\"%c[paravirt_typenum]\""			\
+	" clobber=\"%c[paravirt_clobber]\""				\
+	" pv_opptr=\"%c[paravirt_opptr]\""
 
 /* Simple instruction patching code. */
 #define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
@@ -387,16 +375,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
 
 int paravirt_disable_iospace(void);
 
-/*
- * This generates an indirect call based on the operation type number.
- * The type number, computed in PARAVIRT_PATCH, is derived from the
- * offset into the paravirt_patch_template structure, and can therefore be
- * freely converted back into a structure offset.
- */
-#define PARAVIRT_CALL					\
-	ANNOTATE_RETPOLINE_SAFE				\
-	"call *%c[paravirt_opptr];"
-
 /*
  * These macros are intended to wrap calls through one of the paravirt
  * ops structs, so that they can be later identified and patched at
@@ -534,7 +512,7 @@ int paravirt_disable_iospace(void);
 		/* since this condition will never hold */		\
 		if (sizeof(rettype) > sizeof(unsigned long)) {		\
 			asm volatile(pre				\
-				     paravirt_alt(PARAVIRT_CALL)	\
+				     paravirt_alt			\
 				     post				\
 				     : call_clbr, ASM_CALL_CONSTRAINT	\
 				     : paravirt_type(op),		\
@@ -544,7 +522,7 @@ int paravirt_disable_iospace(void);
 			__ret = (rettype)((((u64)__edx) << 32) | __eax); \
 		} else {						\
 			asm volatile(pre				\
-				     paravirt_alt(PARAVIRT_CALL)	\
+				     paravirt_alt			\
 				     post				\
 				     : call_clbr, ASM_CALL_CONSTRAINT	\
 				     : paravirt_type(op),		\
@@ -571,7 +549,7 @@ int paravirt_disable_iospace(void);
 		PVOP_VCALL_ARGS;					\
 		PVOP_TEST_NULL(op);					\
 		asm volatile(pre					\
-			     paravirt_alt(PARAVIRT_CALL)		\
+			     paravirt_alt				\
 			     post					\
 			     : call_clbr, ASM_CALL_CONSTRAINT		\
 			     : paravirt_type(op),			\
@@ -691,6 +669,26 @@ struct paravirt_patch_site {
 extern struct paravirt_patch_site __parainstructions[],
 	__parainstructions_end[];
 
+#else	/* __ASSEMBLY__ */
+
+/*
+ * This generates an indirect call based on the operation type number.
+ * The type number, computed in PARAVIRT_PATCH, is derived from the
+ * offset into the paravirt_patch_template structure, and can therefore be
+ * freely converted back into a structure offset.
+ */
+.macro PARAVIRT_CALL type:req clobber:req pv_opptr:req
+771:	ANNOTATE_RETPOLINE_SAFE
+	call *\pv_opptr
+772:	.pushsection .parainstructions,"a"
+	_ASM_ALIGN
+	_ASM_PTR 771b
+	.byte \type
+	.byte 772b-771b
+	.short \clobber
+	.popsection
+.endm
+
 #endif	/* __ASSEMBLY__ */
 
 #endif	/* _ASM_X86_PARAVIRT_TYPES_H */
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index 66ccb8e823b1..71d8b716b111 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -10,3 +10,4 @@
 #include <asm/refcount.h>
 #include <asm/alternative-asm.h>
 #include <asm/bug.h>
+#include <asm/paravirt.h>
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 7/9] x86: extable: use macros instead of inline assembly
  2018-06-12 11:50 [PATCH v4 0/9] x86: macrofying inline asm for better compilation Nadav Amit
                   ` (5 preceding siblings ...)
  2018-06-12 11:50 ` [PATCH v4 6/9] x86: prevent inline distortion by paravirt ops Nadav Amit
@ 2018-06-12 11:50 ` Nadav Amit
  2018-06-12 11:50 ` [PATCH v4 8/9] x86: cpufeature: " Nadav Amit
  2018-06-12 11:50 ` [PATCH v4 9/9] x86: jump-labels: " Nadav Amit
  8 siblings, 0 replies; 20+ messages in thread
From: Nadav Amit @ 2018-06-12 11:50 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Nadav Amit, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Josh Poimboeuf

Use assembly macros for exception-tables and call them from inline
assembly.  This not only makes the code more readable and allows to
avoid the duplicate implementation, but also improves compilation
decision, specifically inline decisions which GCC base on the number of
new lines in inline assembly.

   text	   data	    bss	    dec	    hex	filename
18162555 10226288 2957312 31346155 1de4deb ./vmlinux before
18162879 10226256 2957312 31346447 1de4f0f ./vmlinux after (+292)

This allows to inline functions such as nested_vmx_exit_reflected(),
set_segment_reg(), __copy_xstate_to_user().

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/include/asm/asm.h | 61 +++++++++++++++++---------------------
 arch/x86/kernel/macros.S   |  1 +
 2 files changed, 28 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 219faaec51df..30bc1b0058ef 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -58,28 +58,44 @@
 # define CC_OUT(c) [_cc_ ## c] "=qm"
 #endif
 
-/* Exception table entry */
 #ifdef __ASSEMBLY__
 # define _ASM_EXTABLE_HANDLE(from, to, handler)			\
-	.pushsection "__ex_table","a" ;				\
-	.balign 4 ;						\
-	.long (from) - . ;					\
-	.long (to) - . ;					\
-	.long (handler) - . ;					\
-	.popsection
+	ASM_EXTABLE_HANDLE from to handler
+
+#else /* __ASSEMBLY__ */
+
+# define _ASM_EXTABLE_HANDLE(from, to, handler)			\
+	"ASM_EXTABLE_HANDLE from=" #from " to=" #to		\
+	" handler=\"" #handler "\"\n\t"
+
+/* For C file, we already have NOKPROBE_SYMBOL macro */
+
+#endif /* __ASSEMBLY__ */
 
-# define _ASM_EXTABLE(from, to)					\
+#define _ASM_EXTABLE(from, to)					\
 	_ASM_EXTABLE_HANDLE(from, to, ex_handler_default)
 
-# define _ASM_EXTABLE_FAULT(from, to)				\
+#define _ASM_EXTABLE_FAULT(from, to)				\
 	_ASM_EXTABLE_HANDLE(from, to, ex_handler_fault)
 
-# define _ASM_EXTABLE_EX(from, to)				\
+#define _ASM_EXTABLE_EX(from, to)				\
 	_ASM_EXTABLE_HANDLE(from, to, ex_handler_ext)
 
-# define _ASM_EXTABLE_REFCOUNT(from, to)			\
+#define _ASM_EXTABLE_REFCOUNT(from, to)				\
 	_ASM_EXTABLE_HANDLE(from, to, ex_handler_refcount)
 
+/* Exception table entry */
+#ifdef __ASSEMBLY__
+
+.macro ASM_EXTABLE_HANDLE from:req to:req handler:req
+	.pushsection "__ex_table","a"
+	.balign 4
+	.long (\from) - .
+	.long (\to) - .
+	.long (\handler) - .
+	.popsection
+.endm
+
 # define _ASM_NOKPROBE(entry)					\
 	.pushsection "_kprobe_blacklist","aw" ;			\
 	_ASM_ALIGN ;						\
@@ -110,29 +126,6 @@
 	_ASM_EXTABLE(101b,103b)
 	.endm
 
-#else
-# define _EXPAND_EXTABLE_HANDLE(x) #x
-# define _ASM_EXTABLE_HANDLE(from, to, handler)			\
-	" .pushsection \"__ex_table\",\"a\"\n"			\
-	" .balign 4\n"						\
-	" .long (" #from ") - .\n"				\
-	" .long (" #to ") - .\n"				\
-	" .long (" _EXPAND_EXTABLE_HANDLE(handler) ") - .\n"	\
-	" .popsection\n"
-
-# define _ASM_EXTABLE(from, to)					\
-	_ASM_EXTABLE_HANDLE(from, to, ex_handler_default)
-
-# define _ASM_EXTABLE_FAULT(from, to)				\
-	_ASM_EXTABLE_HANDLE(from, to, ex_handler_fault)
-
-# define _ASM_EXTABLE_EX(from, to)				\
-	_ASM_EXTABLE_HANDLE(from, to, ex_handler_ext)
-
-# define _ASM_EXTABLE_REFCOUNT(from, to)			\
-	_ASM_EXTABLE_HANDLE(from, to, ex_handler_refcount)
-
-/* For C file, we already have NOKPROBE_SYMBOL macro */
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index 71d8b716b111..7baa40d5bf16 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -11,3 +11,4 @@
 #include <asm/alternative-asm.h>
 #include <asm/bug.h>
 #include <asm/paravirt.h>
+#include <asm/asm.h>
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 8/9] x86: cpufeature: use macros instead of inline assembly
  2018-06-12 11:50 [PATCH v4 0/9] x86: macrofying inline asm for better compilation Nadav Amit
                   ` (6 preceding siblings ...)
  2018-06-12 11:50 ` [PATCH v4 7/9] x86: extable: use macros instead of inline assembly Nadav Amit
@ 2018-06-12 11:50 ` Nadav Amit
  2018-06-12 11:50 ` [PATCH v4 9/9] x86: jump-labels: " Nadav Amit
  8 siblings, 0 replies; 20+ messages in thread
From: Nadav Amit @ 2018-06-12 11:50 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Nadav Amit, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

Use assembly macros for static_cpu_has() and call them from inline
assembly.  This not only makes the code more readable, but also improves
compilation decision, specifically inline decisions which GCC base on
the number of new lines in inline assembly.

The patch slightly increases the kernel size:

   text	   data	    bss	    dec	    hex	filename
18162879 10226256 2957312 31346447 1de4f0f ./vmlinux before
18163528 10226300 2957312 31347140 1de51c4 ./vmlinux after (+693)

And enables the inlining of function such as free_ldt_pgtables().

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/include/asm/cpufeature.h | 82 ++++++++++++++++++-------------
 arch/x86/kernel/macros.S          |  1 +
 2 files changed, 48 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index aced6c9290d6..7d442722ef24 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -2,10 +2,10 @@
 #ifndef _ASM_X86_CPUFEATURE_H
 #define _ASM_X86_CPUFEATURE_H
 
-#include <asm/processor.h>
-
-#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
+#ifdef __KERNEL__
+#ifndef __ASSEMBLY__
 
+#include <asm/processor.h>
 #include <asm/asm.h>
 #include <linux/bitops.h>
 
@@ -161,37 +161,10 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
  */
 static __always_inline __pure bool _static_cpu_has(u16 bit)
 {
-	asm_volatile_goto("1: jmp 6f\n"
-		 "2:\n"
-		 ".skip -(((5f-4f) - (2b-1b)) > 0) * "
-			 "((5f-4f) - (2b-1b)),0x90\n"
-		 "3:\n"
-		 ".section .altinstructions,\"a\"\n"
-		 " .long 1b - .\n"		/* src offset */
-		 " .long 4f - .\n"		/* repl offset */
-		 " .word %P[always]\n"		/* always replace */
-		 " .byte 3b - 1b\n"		/* src len */
-		 " .byte 5f - 4f\n"		/* repl len */
-		 " .byte 3b - 2b\n"		/* pad len */
-		 ".previous\n"
-		 ".section .altinstr_replacement,\"ax\"\n"
-		 "4: jmp %l[t_no]\n"
-		 "5:\n"
-		 ".previous\n"
-		 ".section .altinstructions,\"a\"\n"
-		 " .long 1b - .\n"		/* src offset */
-		 " .long 0\n"			/* no replacement */
-		 " .word %P[feature]\n"		/* feature bit */
-		 " .byte 3b - 1b\n"		/* src len */
-		 " .byte 0\n"			/* repl len */
-		 " .byte 0\n"			/* pad len */
-		 ".previous\n"
-		 ".section .altinstr_aux,\"ax\"\n"
-		 "6:\n"
-		 " testb %[bitnum],%[cap_byte]\n"
-		 " jnz %l[t_yes]\n"
-		 " jmp %l[t_no]\n"
-		 ".previous\n"
+	asm_volatile_goto("STATIC_CPU_HAS bitnum=%[bitnum] "
+			  "cap_byte=\"%[cap_byte]\" "
+			  "feature=%P[feature] t_yes=%l[t_yes] "
+			  "t_no=%l[t_no] always=%P[always]"
 		 : : [feature]  "i" (bit),
 		     [always]   "i" (X86_FEATURE_ALWAYS),
 		     [bitnum]   "i" (1 << (bit & 7)),
@@ -226,5 +199,44 @@ static __always_inline __pure bool _static_cpu_has(u16 bit)
 #define CPU_FEATURE_TYPEVAL		boot_cpu_data.x86_vendor, boot_cpu_data.x86, \
 					boot_cpu_data.x86_model
 
-#endif /* defined(__KERNEL__) && !defined(__ASSEMBLY__) */
+#else /* __ASSEMBLY__ */
+
+.macro STATIC_CPU_HAS bitnum:req cap_byte:req feature:req t_yes:req t_no:req always:req
+1:
+	jmp 6f
+2:
+	.skip -(((5f-4f) - (2b-1b)) > 0) * ((5f-4f) - (2b-1b)),0x90
+3:
+	.section .altinstructions,"a"
+	.long 1b - .		/* src offset */
+	.long 4f - .		/* repl offset */
+	.word \always		/* always replace */
+	.byte 3b - 1b		/* src len */
+	.byte 5f - 4f		/* repl len */
+	.byte 3b - 2b		/* pad len */
+	.previous
+	.section .altinstr_replacement,"ax"
+4:
+	jmp \t_no
+5:
+	.previous
+	.section .altinstructions,"a"
+	.long 1b - .		/* src offset */
+	.long 0			/* no replacement */
+	.word \feature		/* feature bit */
+	.byte 3b - 1b		/* src len */
+	.byte 0			/* repl len */
+	.byte 0			/* pad len */
+	.previous
+	.section .altinstr_aux,"ax"
+6:
+	testb \bitnum,\cap_byte
+	jnz \t_yes
+	jmp \t_no
+	.previous
+.endm
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __KERNEL__ */
 #endif /* _ASM_X86_CPUFEATURE_H */
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index 7baa40d5bf16..bf8b9c93e255 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -12,3 +12,4 @@
 #include <asm/bug.h>
 #include <asm/paravirt.h>
 #include <asm/asm.h>
+#include <asm/cpufeature.h>
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 9/9] x86: jump-labels: use macros instead of inline assembly
  2018-06-12 11:50 [PATCH v4 0/9] x86: macrofying inline asm for better compilation Nadav Amit
                   ` (7 preceding siblings ...)
  2018-06-12 11:50 ` [PATCH v4 8/9] x86: cpufeature: " Nadav Amit
@ 2018-06-12 11:50 ` Nadav Amit
  8 siblings, 0 replies; 20+ messages in thread
From: Nadav Amit @ 2018-06-12 11:50 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Nadav Amit, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Kate Stewart, Philippe Ombredanne

Use assembly macros for jump-labels and call them from inline assembly.
This not only makes the code more readable, but also improves
compilation decision, specifically inline decisions which GCC base on
the number of new lines in inline assembly.

As a result the code size is slightly increased.

   text	   data	    bss	    dec	    hex	filename
18163528 10226300 2957312 31347140 1de51c4 ./vmlinux before
18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+1128)

And functions such as intel_pstate_adjust_policy_max(),
kvm_cpu_accept_dm_intr(), kvm_register_readl() are inlined.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/include/asm/jump_label.h | 65 ++++++++++++++++++-------------
 arch/x86/kernel/macros.S          |  1 +
 2 files changed, 39 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 8c0de4282659..f321a50e6727 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -2,19 +2,6 @@
 #ifndef _ASM_X86_JUMP_LABEL_H
 #define _ASM_X86_JUMP_LABEL_H
 
-#ifndef HAVE_JUMP_LABEL
-/*
- * For better or for worse, if jump labels (the gcc extension) are missing,
- * then the entire static branch patching infrastructure is compiled out.
- * If that happens, the code in here will malfunction.  Raise a compiler
- * error instead.
- *
- * In theory, jump labels and the static branch patching infrastructure
- * could be decoupled to fix this.
- */
-#error asm/jump_label.h included on a non-jump-label kernel
-#endif
-
 #define JUMP_LABEL_NOP_SIZE 5
 
 #ifdef CONFIG_X86_64
@@ -28,18 +15,27 @@
 
 #ifndef __ASSEMBLY__
 
+#ifndef HAVE_JUMP_LABEL
+/*
+ * For better or for worse, if jump labels (the gcc extension) are missing,
+ * then the entire static branch patching infrastructure is compiled out.
+ * If that happens, the code in here will malfunction.  Raise a compiler
+ * error instead.
+ *
+ * In theory, jump labels and the static branch patching infrastructure
+ * could be decoupled to fix this.
+ */
+#error asm/jump_label.h included on a non-jump-label kernel
+#endif
+
 #include <linux/stringify.h>
 #include <linux/types.h>
 
 static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
 {
-	asm_volatile_goto("1:"
-		".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
-		".pushsection __jump_table,  \"aw\" \n\t"
-		_ASM_ALIGN "\n\t"
-		_ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
-		".popsection \n\t"
-		: :  "i" (key), "i" (branch) : : l_yes);
+	asm_volatile_goto("STATIC_BRANCH_NOP l_yes=\"%l[l_yes]\" key=\"%c0\" "
+			  "branch=\"%c1\""
+			: :  "i" (key), "i" (branch) : : l_yes);
 
 	return false;
 l_yes:
@@ -48,13 +44,8 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
 
 static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
 {
-	asm_volatile_goto("1:"
-		".byte 0xe9\n\t .long %l[l_yes] - 2f\n\t"
-		"2:\n\t"
-		".pushsection __jump_table,  \"aw\" \n\t"
-		_ASM_ALIGN "\n\t"
-		_ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
-		".popsection \n\t"
+	asm_volatile_goto("STATIC_BRANCH_JMP l_yes=\"%l[l_yes]\" key=\"%c0\" "
+			  "branch=\"%c1\""
 		: :  "i" (key), "i" (branch) : : l_yes);
 
 	return false;
@@ -108,6 +99,26 @@ struct jump_entry {
 	.popsection
 .endm
 
+.macro STATIC_BRANCH_NOP l_yes:req key:req branch:req
+1:
+	.byte STATIC_KEY_INIT_NOP
+	.pushsection __jump_table, "aw"
+	_ASM_ALIGN
+	_ASM_PTR 1b, \l_yes, \key + \branch
+	.popsection
+.endm
+
+.macro STATIC_BRANCH_JMP l_yes:req key:req branch:req
+1:
+	.byte 0xe9
+	.long \l_yes - 2f
+2:
+	.pushsection __jump_table, "aw"
+	_ASM_ALIGN
+	_ASM_PTR 1b, \l_yes, \key + \branch
+	.popsection
+.endm
+
 #endif	/* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index bf8b9c93e255..161c95059044 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -13,3 +13,4 @@
 #include <asm/paravirt.h>
 #include <asm/asm.h>
 #include <asm/cpufeature.h>
+#include <asm/jump_label.h>
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 1/9] Makefile: Prepare for using macros for inline asm
  2018-06-12 11:50   ` Nadav Amit
  (?)
@ 2018-06-13 17:43   ` Masahiro Yamada
  2018-06-13 20:36     ` Sam Ravnborg
  2018-06-13 22:19     ` Nadav Amit
  -1 siblings, 2 replies; 20+ messages in thread
From: Masahiro Yamada @ 2018-06-13 17:43 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Linux Kernel Mailing List, X86 ML, Michal Marek, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Linux Kbuild mailing list

Hi.


2018-06-12 20:50 GMT+09:00 Nadav Amit <namit@vmware.com>:
> Using macros for inline assembly improves both readability and
> compilation decisions that are distorted by big assembly blocks that use
> alternative sections. Compile macros.S and use it to assemble all C
> files. Currently, only x86 will use it.
>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Michal Marek <michal.lkml@markovi.net>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: linux-kbuild@vger.kernel.org
>
> Signed-off-by: Nadav Amit <namit@vmware.com>


I have not fully understood this series yet.

I do not have enough skill in x86 architecture,
but just some comments from the build system point of view.



I guess this will probably break the parallel building.

Kbuild can build 'prepare' and 'scripts' simultaneously.


I locally modified the following line:


diff --git a/Makefile b/Makefile
index 2dea909440..6ad484a 100644
--- a/Makefile
+++ b/Makefile
@@ -1030,7 +1030,7 @@ $(sort $(vmlinux-deps)): $(vmlinux-dirs) ;
 # Error messages still appears in the original language

 PHONY += $(vmlinux-dirs)
-$(vmlinux-dirs): prepare scripts
+$(vmlinux-dirs): scripts prepare
        $(Q)$(MAKE) $(build)=$@ need-builtin=1

 define filechk_kernel.release



masahiro@grover:~/workspace/linux-kbuild$ make  defconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  YACC    scripts/kconfig/zconf.tab.c
  LEX     scripts/kconfig/zconf.lex.c
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
*** Default configuration is based on 'x86_64_defconfig'
#
# configuration written to .config
#
masahiro@grover:~/workspace/linux-kbuild$ make all
scripts/kconfig/conf  --syncconfig Kconfig
  WRAP    arch/x86/include/generated/uapi/asm/bpf_perf_event.h
  WRAP    arch/x86/include/generated/uapi/asm/poll.h
  WRAP    arch/x86/include/generated/asm/dma-contiguous.h
  WRAP    arch/x86/include/generated/asm/early_ioremap.h
  WRAP    arch/x86/include/generated/asm/mcs_spinlock.h
  WRAP    arch/x86/include/generated/asm/mm-arch-hooks.h
  CC      scripts/mod/empty.o
Assembler messages:
Error: can't open arch/x86/kernel/macros.s for reading: No such file
or directory
make[2]: *** [scripts/Makefile.build:318: scripts/mod/empty.o] Error 1
make[1]: *** [scripts/Makefile.build:558: scripts/mod] Error 2
make: *** [Makefile:1050: scripts] Error 2







> ---
>  Makefile                 |  9 +++++++--
>  arch/x86/Makefile        | 11 +++++++++--
>  arch/x86/kernel/Makefile |  6 ++++++
>  arch/x86/kernel/macros.S |  7 +++++++
>  scripts/Kbuild.include   |  4 +++-
>  5 files changed, 32 insertions(+), 5 deletions(-)
>  create mode 100644 arch/x86/kernel/macros.S
>
> diff --git a/Makefile b/Makefile
> index 554dcaddbce4..026586db2f26 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1085,7 +1085,7 @@ scripts: scripts_basic include/config/auto.conf include/config/tristate.conf \
>  # version.h and scripts_basic is processed / created.
>
>  # Listed in dependency order
> -PHONY += prepare archprepare prepare0 prepare1 prepare2 prepare3
> +PHONY += prepare archprepare macroprepare prepare0 prepare1 prepare2 prepare3
>
>  # prepare3 is used to check if we are building in a separate output directory,
>  # and if so do:
> @@ -1109,7 +1109,9 @@ prepare1: prepare2 $(version_h) $(autoksyms_h) include/generated/utsrelease.h \
>                     include/config/auto.conf
>         $(cmd_crmodverdir)
>
> -archprepare: archheaders archscripts prepare1 scripts_basic
> +macroprepare: prepare1 archmacros
> +
> +archprepare: archheaders archscripts macroprepare scripts_basic


Are you planning to support for all architectures, or x86-specific?

Currently, there are some hooks to do arch-specific things such as
'archprepare', 'archscripts'.




>  prepare0: archprepare gcc-plugins
>         $(Q)$(MAKE) $(build)=.
> @@ -1214,6 +1216,9 @@ archheaders:
>  PHONY += archscripts
>  archscripts:
>
> +PHONY += archmacros
> +archmacros:
> +
>  PHONY += __headers
>  __headers: $(version_h) scripts_basic uapi-asm-generic archheaders archscripts
>         $(Q)$(MAKE) $(build)=scripts build_unifdef
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 60135cbd905c..6b82314776fd 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -235,8 +235,8 @@ ifdef CONFIG_X86_64
>  LDFLAGS += $(call ld-option, -z max-page-size=0x200000)
>  endif
>
> -# Speed up the build
> -KBUILD_CFLAGS += -pipe
> +# We cannot use -pipe flag since we give an additional .s file to the compiler
> +#KBUILD_CFLAGS += -pipe
>  # Workaround for a gcc prelease that unfortunately was shipped in a suse release
>  KBUILD_CFLAGS += -Wno-sign-compare
>  #
> @@ -258,11 +258,18 @@ archscripts: scripts_basic
>  archheaders:
>         $(Q)$(MAKE) $(build)=arch/x86/entry/syscalls all
>
> +archmacros:
> +       $(Q)$(MAKE) $(build)=arch/x86/kernel macros
> +
>  archprepare:
>  ifeq ($(CONFIG_KEXEC_FILE),y)
>         $(Q)$(MAKE) $(build)=arch/x86/purgatory arch/x86/purgatory/kexec-purgatory.c
>  endif
>
> +ASM_MACRO_FLAGS = -Wa,arch/x86/kernel/macros.s
> +export ASM_MACRO_FLAGS
> +KBUILD_CFLAGS += $(ASM_MACRO_FLAGS)
> +
>  ###
>  # Kernel objects
>
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 02d6f5cf4e70..fdb6c5b2a922 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -9,6 +9,12 @@ extra-y        += ebda.o
>  extra-y        += platform-quirks.o
>  extra-y        += vmlinux.lds
>
> +$(obj)/macros.s: $(obj)/macros.S FORCE
> +       $(call if_changed_dep,cpp_s_S)

This is unnecessary.

Kbuild will find the correct pattern rule
in scripts/Makefile.build





> +macros: $(obj)/macros.s
> +       @:

If you add a phony target, it should be added to 'PHONY'.

Like this:


PHONY += macros
macros: $(obj)/macros.s
       @:



>  CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)
>
>  ifdef CONFIG_FUNCTION_TRACER
> diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
> new file mode 100644
> index 000000000000..cfc1c7d1a6eb
> --- /dev/null
> +++ b/arch/x86/kernel/macros.S
> @@ -0,0 +1,7 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * This file includes headers whose assembly part includes macros which are
> + * commonly used. The macros are precompiled into assmebly file which is later
> + * assembled together with each compiled file.
> + */
> diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
> index 50cee534fd64..ad2c02062aa4 100644
> --- a/scripts/Kbuild.include
> +++ b/scripts/Kbuild.include
> @@ -189,7 +189,9 @@ __cc-option = $(call try-run-cached,\
>
>  # Do not attempt to build with gcc plugins during cc-option tests.
>  # (And this uses delayed resolution so the flags will be up to date.)
> -CC_OPTION_CFLAGS = $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS))
> +# In addition, do not include the asm macros which are built later.
> +CC_OPTION_FILTERED = $(GCC_PLUGINS_CFLAGS) $(ASM_MACRO_FLAGS)
> +CC_OPTION_CFLAGS = $(filter-out $(CC_OPTION_FILTERED),$(KBUILD_CFLAGS))
>
>  # cc-option
>  # Usage: cflags-y += $(call cc-option,-march=winchip-c6,-march=i586)
> --
> 2.17.0
>



-- 
Best Regards
Masahiro Yamada

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 1/9] Makefile: Prepare for using macros for inline asm
  2018-06-13 17:43   ` Masahiro Yamada
@ 2018-06-13 20:36     ` Sam Ravnborg
  2018-06-17  0:47       ` Masahiro Yamada
  2018-06-13 22:19     ` Nadav Amit
  1 sibling, 1 reply; 20+ messages in thread
From: Sam Ravnborg @ 2018-06-13 20:36 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Nadav Amit, Linux Kernel Mailing List, X86 ML, Michal Marek,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Linux Kbuild mailing list

> 
> > +macros: $(obj)/macros.s
> > +       @:
> 
> If you add a phony target, it should be added to 'PHONY'.

Or this part:
> +archmacros:
> +       $(Q)$(MAKE) $(build)=arch/x86/kernel macros

could be modified to specify the exact filename so this indirection is not needed.

PS. Only one file posted to kbuild, so impossible to review as context is missing.

	Sam

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 1/9] Makefile: Prepare for using macros for inline asm
  2018-06-13 17:43   ` Masahiro Yamada
  2018-06-13 20:36     ` Sam Ravnborg
@ 2018-06-13 22:19     ` Nadav Amit
  2018-06-17  0:40       ` Masahiro Yamada
  1 sibling, 1 reply; 20+ messages in thread
From: Nadav Amit @ 2018-06-13 22:19 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Linux Kernel Mailing List, X86 ML, Michal Marek, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Linux Kbuild mailing list

Thanks for your comment! I am certainly Makefile-challenged, so your help is
highly appreciated.

Please see my responses/questions inline.

at 10:43 AM, Masahiro Yamada <yamada.masahiro@socionext.com> wrote:

> 2018-06-12 20:50 GMT+09:00 Nadav Amit <namit@vmware.com>:
>> Using macros for inline assembly improves both readability and
>> compilation decisions that are distorted by big assembly blocks that use
>> alternative sections. Compile macros.S and use it to assemble all C
>> files. Currently, only x86 will use it.
>> 
>> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
>> Cc: Michal Marek <michal.lkml@markovi.net>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: x86@kernel.org
>> Cc: linux-kbuild@vger.kernel.org
>> 
>> Signed-off-by: Nadav Amit <namit@vmware.com>
> 
> 
> I have not fully understood this series yet.
> 
> I do not have enough skill in x86 architecture,
> but just some comments from the build system point of view.
> 
> 
> 
> I guess this will probably break the parallel building.
> 
> Kbuild can build 'prepare' and 'scripts' simultaneously.
> 
> 
> I locally modified the following line:
> 
> 
> diff --git a/Makefile b/Makefile
> index 2dea909440..6ad484a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1030,7 +1030,7 @@ $(sort $(vmlinux-deps)): $(vmlinux-dirs) ;
> # Error messages still appears in the original language
> 
> PHONY += $(vmlinux-dirs)
> -$(vmlinux-dirs): prepare scripts
> +$(vmlinux-dirs): scripts prepare
>        $(Q)$(MAKE) $(build)=$@ need-builtin=1
> 
> define filechk_kernel.release
> 
> 
> 
> masahiro@grover:~/workspace/linux-kbuild$ make  defconfig
>  HOSTCC  scripts/basic/fixdep
>  HOSTCC  scripts/kconfig/conf.o
>  YACC    scripts/kconfig/zconf.tab.c
>  LEX     scripts/kconfig/zconf.lex.c
>  HOSTCC  scripts/kconfig/zconf.tab.o
>  HOSTLD  scripts/kconfig/conf
> *** Default configuration is based on 'x86_64_defconfig'
> #
> # configuration written to .config
> #
> masahiro@grover:~/workspace/linux-kbuild$ make all
> scripts/kconfig/conf  --syncconfig Kconfig
>  WRAP    arch/x86/include/generated/uapi/asm/bpf_perf_event.h
>  WRAP    arch/x86/include/generated/uapi/asm/poll.h
>  WRAP    arch/x86/include/generated/asm/dma-contiguous.h
>  WRAP    arch/x86/include/generated/asm/early_ioremap.h
>  WRAP    arch/x86/include/generated/asm/mcs_spinlock.h
>  WRAP    arch/x86/include/generated/asm/mm-arch-hooks.h
>  CC      scripts/mod/empty.o
> Assembler messages:
> Error: can't open arch/x86/kernel/macros.s for reading: No such file
> or directory
> make[2]: *** [scripts/Makefile.build:318: scripts/mod/empty.o] Error 1
> make[1]: *** [scripts/Makefile.build:558: scripts/mod] Error 2
> make: *** [Makefile:1050: scripts] Error 2

You are right. I tried to filter out the use of the switch (exported as
ASM_MACRO_FLAGS) for empty.o. Any suggestions on how to do it properly?

> Are you planning to support for all architectures, or x86-specific?

Not right now. But I think that other architectures will also prefer to
separate the inline assembly chunks in a similar manner, for better
compilation, easier readability, and better code maintainability.

Thanks again,
Nadav


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 1/9] Makefile: Prepare for using macros for inline asm
  2018-06-13 22:19     ` Nadav Amit
@ 2018-06-17  0:40       ` Masahiro Yamada
  2018-06-19  7:12         ` Nadav Amit
  0 siblings, 1 reply; 20+ messages in thread
From: Masahiro Yamada @ 2018-06-17  0:40 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Linux Kernel Mailing List, X86 ML, Michal Marek, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Linux Kbuild mailing list

Hi.

2018-06-14 7:19 GMT+09:00 Nadav Amit <namit@vmware.com>:
>>
>>
>> I have not fully understood this series yet.
>>
>> I do not have enough skill in x86 architecture,
>> but just some comments from the build system point of view.
>>
>>
>>
>> I guess this will probably break the parallel building.
>>
>> Kbuild can build 'prepare' and 'scripts' simultaneously.
>>
>>
>> I locally modified the following line:
>>
>>
>> diff --git a/Makefile b/Makefile
>> index 2dea909440..6ad484a 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -1030,7 +1030,7 @@ $(sort $(vmlinux-deps)): $(vmlinux-dirs) ;
>> # Error messages still appears in the original language
>>
>> PHONY += $(vmlinux-dirs)
>> -$(vmlinux-dirs): prepare scripts
>> +$(vmlinux-dirs): scripts prepare
>>        $(Q)$(MAKE) $(build)=$@ need-builtin=1
>>
>> define filechk_kernel.release
>>
>>
>>
>> masahiro@grover:~/workspace/linux-kbuild$ make  defconfig
>>  HOSTCC  scripts/basic/fixdep
>>  HOSTCC  scripts/kconfig/conf.o
>>  YACC    scripts/kconfig/zconf.tab.c
>>  LEX     scripts/kconfig/zconf.lex.c
>>  HOSTCC  scripts/kconfig/zconf.tab.o
>>  HOSTLD  scripts/kconfig/conf
>> *** Default configuration is based on 'x86_64_defconfig'
>> #
>> # configuration written to .config
>> #
>> masahiro@grover:~/workspace/linux-kbuild$ make all
>> scripts/kconfig/conf  --syncconfig Kconfig
>>  WRAP    arch/x86/include/generated/uapi/asm/bpf_perf_event.h
>>  WRAP    arch/x86/include/generated/uapi/asm/poll.h
>>  WRAP    arch/x86/include/generated/asm/dma-contiguous.h
>>  WRAP    arch/x86/include/generated/asm/early_ioremap.h
>>  WRAP    arch/x86/include/generated/asm/mcs_spinlock.h
>>  WRAP    arch/x86/include/generated/asm/mm-arch-hooks.h
>>  CC      scripts/mod/empty.o
>> Assembler messages:
>> Error: can't open arch/x86/kernel/macros.s for reading: No such file
>> or directory
>> make[2]: *** [scripts/Makefile.build:318: scripts/mod/empty.o] Error 1
>> make[1]: *** [scripts/Makefile.build:558: scripts/mod] Error 2
>> make: *** [Makefile:1050: scripts] Error 2
>
> You are right. I tried to filter out the use of the switch (exported as
> ASM_MACRO_FLAGS) for empty.o. Any suggestions on how to do it properly?


You can do like this:


diff --git a/scripts/mod/Makefile b/scripts/mod/Makefile
index 42c5d50..a5b4af4 100644
--- a/scripts/mod/Makefile
+++ b/scripts/mod/Makefile
@@ -4,6 +4,8 @@ OBJECT_FILES_NON_STANDARD := y
 hostprogs-y    := modpost mk_elfconfig
 always         := $(hostprogs-y) empty.o

+CFLAGS_REMOVE_empty.o := $(ASM_MACRO_FLAGS)
+
 modpost-objs   := modpost.o file2alias.o sumversion.o

 devicetable-offsets-file := devicetable-offsets.h





modpost is an exceptional hostprogam - it depends on
target compiler, which makes the build order complicated.

I want to compile under scripts/ only with $(HOSTCC).

I am planning to change modpost
target-independent by detecting
32/64 bit, little/big-endianness run-time.
Then, empty.c will go away.


In the meanwhile, CFLAGS_REMOVE_ will do
as a workaround.


-- 
Best Regards
Masahiro Yamada

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 1/9] Makefile: Prepare for using macros for inline asm
  2018-06-13 20:36     ` Sam Ravnborg
@ 2018-06-17  0:47       ` Masahiro Yamada
  0 siblings, 0 replies; 20+ messages in thread
From: Masahiro Yamada @ 2018-06-17  0:47 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: Nadav Amit, Linux Kernel Mailing List, X86 ML, Michal Marek,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Linux Kbuild mailing list

2018-06-14 5:36 GMT+09:00 Sam Ravnborg <sam@ravnborg.org>:
>>
>> > +macros: $(obj)/macros.s
>> > +       @:
>>
>> If you add a phony target, it should be added to 'PHONY'.
>
> Or this part:
>> +archmacros:
>> +       $(Q)$(MAKE) $(build)=arch/x86/kernel macros
>
> could be modified to specify the exact filename so this indirection is not needed.

Yes, it could work.

Actually, two styles co-exist in arch/x86/Makefile.
(archprepare vs archscripts)




-- 
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 1/9] Makefile: Prepare for using macros for inline asm
  2018-06-17  0:40       ` Masahiro Yamada
@ 2018-06-19  7:12         ` Nadav Amit
  0 siblings, 0 replies; 20+ messages in thread
From: Nadav Amit @ 2018-06-19  7:12 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Linux Kernel Mailing List, X86 ML, Michal Marek, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Linux Kbuild mailing list

at 5:40 PM, Masahiro Yamada <yamada.masahiro@socionext.com> wrote:

> In the meanwhile, CFLAGS_REMOVE_ will do
> as a workaround.

Thanks! I appreciate your help.

I will submit tomorrow an updated version.

Regards,
Nadav

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 6/9] x86: prevent inline distortion by paravirt ops
  2018-06-12 11:50 ` [PATCH v4 6/9] x86: prevent inline distortion by paravirt ops Nadav Amit
@ 2018-06-19 15:09     ` Juergen Gross
  0 siblings, 0 replies; 20+ messages in thread
From: Juergen Gross @ 2018-06-19 15:09 UTC (permalink / raw)
  To: Nadav Amit, linux-kernel, x86
  Cc: Alok Kataria, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	virtualization

On 12/06/18 13:50, Nadav Amit wrote:
> GCC considers the number of statements in inlined assembly blocks,
> according to new-lines and semicolons, as an indication to the cost of
> the block in time and space. This data is distorted by the kernel code,
> which puts information in alternative sections. As a result, the
> compiler may perform incorrect inlining and branch optimizations.
> 
> The solution is to set an assembly macro and call it from the inlined
> assembly block. As a result GCC considers the inline assembly block as
> a single instruction.
> 
> The effect of the patch is a more aggressive inlining, which also
> causes a size increase of kernel.
> 
>    text	   data	    bss	    dec	    hex	filename
> 18147336 10226688 2957312 31331336 1de1408 ./vmlinux before
> 18162555 10226288 2957312 31346155 1de4deb ./vmlinux after (+14819)
> 
> Static text symbols:
> Before:	40053
> After:	39942	(-111)
> 
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Alok Kataria <akataria@vmware.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: virtualization@lists.linux-foundation.org
> 
> Signed-off-by: Nadav Amit <namit@vmware.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 6/9] x86: prevent inline distortion by paravirt ops
@ 2018-06-19 15:09     ` Juergen Gross
  0 siblings, 0 replies; 20+ messages in thread
From: Juergen Gross @ 2018-06-19 15:09 UTC (permalink / raw)
  To: Nadav Amit, linux-kernel, x86
  Cc: Alok Kataria, Ingo Molnar, virtualization, Thomas Gleixner,
	H. Peter Anvin

On 12/06/18 13:50, Nadav Amit wrote:
> GCC considers the number of statements in inlined assembly blocks,
> according to new-lines and semicolons, as an indication to the cost of
> the block in time and space. This data is distorted by the kernel code,
> which puts information in alternative sections. As a result, the
> compiler may perform incorrect inlining and branch optimizations.
> 
> The solution is to set an assembly macro and call it from the inlined
> assembly block. As a result GCC considers the inline assembly block as
> a single instruction.
> 
> The effect of the patch is a more aggressive inlining, which also
> causes a size increase of kernel.
> 
>    text	   data	    bss	    dec	    hex	filename
> 18147336 10226688 2957312 31331336 1de1408 ./vmlinux before
> 18162555 10226288 2957312 31346155 1de4deb ./vmlinux after (+14819)
> 
> Static text symbols:
> Before:	40053
> After:	39942	(-111)
> 
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Alok Kataria <akataria@vmware.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: virtualization@lists.linux-foundation.org
> 
> Signed-off-by: Nadav Amit <namit@vmware.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-06-19 15:09 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-12 11:50 [PATCH v4 0/9] x86: macrofying inline asm for better compilation Nadav Amit
2018-06-12 11:50 ` [PATCH v4 1/9] Makefile: Prepare for using macros for inline asm Nadav Amit
2018-06-12 11:50   ` Nadav Amit
2018-06-13 17:43   ` Masahiro Yamada
2018-06-13 20:36     ` Sam Ravnborg
2018-06-17  0:47       ` Masahiro Yamada
2018-06-13 22:19     ` Nadav Amit
2018-06-17  0:40       ` Masahiro Yamada
2018-06-19  7:12         ` Nadav Amit
2018-06-12 11:50 ` [PATCH v4 2/9] x86: objtool: use asm macro for better compiler decisions Nadav Amit
2018-06-12 11:50   ` Nadav Amit
2018-06-12 11:50 ` [PATCH v4 3/9] x86: refcount: prevent gcc distortions Nadav Amit
2018-06-12 11:50 ` [PATCH v4 4/9] x86: alternatives: macrofy locks for better inlining Nadav Amit
2018-06-12 11:50 ` [PATCH v4 5/9] x86: bug: prevent gcc distortions Nadav Amit
2018-06-12 11:50 ` [PATCH v4 6/9] x86: prevent inline distortion by paravirt ops Nadav Amit
2018-06-19 15:09   ` Juergen Gross
2018-06-19 15:09     ` Juergen Gross
2018-06-12 11:50 ` [PATCH v4 7/9] x86: extable: use macros instead of inline assembly Nadav Amit
2018-06-12 11:50 ` [PATCH v4 8/9] x86: cpufeature: " Nadav Amit
2018-06-12 11:50 ` [PATCH v4 9/9] x86: jump-labels: " Nadav Amit

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.