All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] Macrofying inline assembly for better compilation
@ 2018-05-17 16:13 ` Nadav Amit
  0 siblings, 0 replies; 39+ messages in thread
From: Nadav Amit @ 2018-05-17 16:13 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: nadav.amit, Nadav Amit, Alok Kataria, Christopher Li,
	H. Peter Anvin, Ingo Molnar, Jan Beulich, Jonathan Corbet,
	Josh Poimboeuf, Juergen Gross, Kees Cook, linux-sparse,
	Peter Zijlstra, Randy Dunlap, Thomas Gleixner, virtualization

This patch-set deals with an interesting yet stupid problem: kernel code
that does not get inlined despite its simplicity. There are several
causes for this behavior: "cold" attribute on __init, different function
optimization levels; conditional constant computations based on
__builtin_constant_p(); and finally large inline assembly blocks.

This patch-set deals with the inline assembly problem. I separated these
patches from the others (that were sent in the RFC) for easier
inclusion.

The problem with inline assembly is that inline assembly is often used
by the kernel for things that are other than code - for example,
assembly directives and data. GCC however is oblivious to the content of
the blocks and assumes their cost in space and time is proportional to
the number of the perceived assembly "instruction", according to the
number of newlines and semicolons. Alternatives, paravirt and other
mechanisms are affected, causing code not to be inlined, and degrading
compilation quality in general.

The solution that this patch-set carries for this problem is to create
an assembly macro, and then call it from the inline assembly block.  As
a result, the compiler sees a single "instruction" and assigns the more
appropriate cost to the code. In addition, this patch-set removes
unneeded new-lines from common x86 inline asm's, which "confuse" GCC
heuristics.

Overall this patch-set slightly increases the kernel size (my build was
done using my localmodconfig + localyesconfig for the record):

   text    data     bss     dec     hex filename
18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
18148888 10064016 2936832 31149736 1db4ea8 ./vmlinux after (+0.06%)

The patch-set eliminates many of the static text symbols:
Before: 40033
After:	39650	(-1%)

A microbenchmark with a loop of page-fault and MADV_DONTNEED show 2%
performance improvement with this patch-set (when PTI is off).

Changes from RFC:
- Better formatting [Jan]
- i386 build problems [0-day]
- Inline comments
- Separating __builtin_constant_p() into a different future patch-set

Cc: Alok Kataria <akataria@vmware.com>
Cc: Christopher Li <sparse@chrisli.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-sparse@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: x86@kernel.org

Nadav Amit (6):
  x86: objtool: use asm macro for better compiler decisions
  x86: bug: prevent gcc distortions
  x86: alternative: macrofy locks for better inlining
  x86: prevent inline distortion by paravirt ops
  x86: refcount: prevent gcc distortions
  x86: removing unneeded new-lines

 arch/x86/include/asm/alternative.h    | 34 +++++++++++----
 arch/x86/include/asm/asm.h            |  4 +-
 arch/x86/include/asm/bug.h            | 56 ++++++++++++++++--------
 arch/x86/include/asm/cmpxchg.h        | 10 ++---
 arch/x86/include/asm/paravirt_types.h | 63 +++++++++++++++++----------
 arch/x86/include/asm/refcount.h       | 62 ++++++++++++++++----------
 arch/x86/include/asm/special_insns.h  | 12 ++---
 include/linux/compiler.h              | 37 ++++++++++++----
 8 files changed, 183 insertions(+), 95 deletions(-)

-- 
2.17.0

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 0/6] Macrofying inline assembly for better compilation
@ 2018-05-17 16:13 ` Nadav Amit
  0 siblings, 0 replies; 39+ messages in thread
From: Nadav Amit @ 2018-05-17 16:13 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: nadav.amit, Nadav Amit, Alok Kataria, Christopher Li,
	H. Peter Anvin, Ingo Molnar, Jan Beulich, Jonathan Corbet,
	Josh Poimboeuf, Juergen Gross, Kees Cook, linux-sparse,
	Peter Zijlstra, Randy Dunlap, Thomas Gleixner, virtualization

This patch-set deals with an interesting yet stupid problem: kernel code
that does not get inlined despite its simplicity. There are several
causes for this behavior: "cold" attribute on __init, different function
optimization levels; conditional constant computations based on
__builtin_constant_p(); and finally large inline assembly blocks.

This patch-set deals with the inline assembly problem. I separated these
patches from the others (that were sent in the RFC) for easier
inclusion.

The problem with inline assembly is that inline assembly is often used
by the kernel for things that are other than code - for example,
assembly directives and data. GCC however is oblivious to the content of
the blocks and assumes their cost in space and time is proportional to
the number of the perceived assembly "instruction", according to the
number of newlines and semicolons. Alternatives, paravirt and other
mechanisms are affected, causing code not to be inlined, and degrading
compilation quality in general.

The solution that this patch-set carries for this problem is to create
an assembly macro, and then call it from the inline assembly block.  As
a result, the compiler sees a single "instruction" and assigns the more
appropriate cost to the code. In addition, this patch-set removes
unneeded new-lines from common x86 inline asm's, which "confuse" GCC
heuristics.

Overall this patch-set slightly increases the kernel size (my build was
done using my localmodconfig + localyesconfig for the record):

   text    data     bss     dec     hex filename
18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
18148888 10064016 2936832 31149736 1db4ea8 ./vmlinux after (+0.06%)

The patch-set eliminates many of the static text symbols:
Before: 40033
After:	39650	(-1%)

A microbenchmark with a loop of page-fault and MADV_DONTNEED show 2%
performance improvement with this patch-set (when PTI is off).

Changes from RFC:
- Better formatting [Jan]
- i386 build problems [0-day]
- Inline comments
- Separating __builtin_constant_p() into a different future patch-set

Cc: Alok Kataria <akataria@vmware.com>
Cc: Christopher Li <sparse@chrisli.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-sparse@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: x86@kernel.org

Nadav Amit (6):
  x86: objtool: use asm macro for better compiler decisions
  x86: bug: prevent gcc distortions
  x86: alternative: macrofy locks for better inlining
  x86: prevent inline distortion by paravirt ops
  x86: refcount: prevent gcc distortions
  x86: removing unneeded new-lines

 arch/x86/include/asm/alternative.h    | 34 +++++++++++----
 arch/x86/include/asm/asm.h            |  4 +-
 arch/x86/include/asm/bug.h            | 56 ++++++++++++++++--------
 arch/x86/include/asm/cmpxchg.h        | 10 ++---
 arch/x86/include/asm/paravirt_types.h | 63 +++++++++++++++++----------
 arch/x86/include/asm/refcount.h       | 62 ++++++++++++++++----------
 arch/x86/include/asm/special_insns.h  | 12 ++---
 include/linux/compiler.h              | 37 ++++++++++++----
 8 files changed, 183 insertions(+), 95 deletions(-)

-- 
2.17.0

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/6] x86: objtool: use asm macro for better compiler decisions
  2018-05-17 16:13 ` Nadav Amit
@ 2018-05-17 16:13   ` Nadav Amit
  -1 siblings, 0 replies; 39+ messages in thread
From: Nadav Amit @ 2018-05-17 16:13 UTC (permalink / raw)
  To: linux-kernel, x86; +Cc: nadav.amit, Nadav Amit, Christopher Li, linux-sparse

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

In the case of objtool, this distortion is extreme, since anyhow the
annotations of objtool are discarded during linkage.

The solution is to set an assembly macro and call it from the inlinedv
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

This patch slightly increases the kernel size.

   text    data     bss     dec     hex filename
18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
18126824 10067268 2936832 31130924 1db052c ./vmlinux after (+665)

But allows more aggressive inlining. Static text symbols:
Before: 40033
After: 40015 (-18)

Cc: Christopher Li <sparse@chrisli.org>
Cc: linux-sparse@vger.kernel.org

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 include/linux/compiler.h | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index ab4711c63601..6cbabc6b195a 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -97,19 +97,40 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
  * These macros help objtool understand GCC code flow for unreachable code.
  * The __COUNTER__ based labels are a hack to make each instance of the macros
  * unique, to convince GCC not to merge duplicate inline asm statements.
+ *
+ * The annotation logic is encapsulated within assembly macros, which are then
+ * called on each annotation. This hack is necessary to prevent GCC from
+ * considering the inline assembly blocks as costly in time and space, which can
+ * prevent function inlining and lead to other bad compilation decisions. GCC
+ * computes inline assembly cost according to the number of perceived number of
+ * assembly instruction, based on the number of new-lines and semicolons in the
+ * assembly block. Since the annotations will be discarded during linkage, the
+ * macros make the annotations to be considered "cheap" and let GCC to emit
+ * better code.
  */
+asm(".macro __annotate_reachable counter:req\n"
+    "\\counter:\n\t"
+    ".pushsection .discard.reachable\n\t"
+    ".long \\counter\\()b -.\n\t"
+    ".popsection\n\t"
+    ".endm");
+
 #define annotate_reachable() ({						\
-	asm volatile("%c0:\n\t"						\
-		     ".pushsection .discard.reachable\n\t"		\
-		     ".long %c0b - .\n\t"				\
-		     ".popsection\n\t" : : "i" (__COUNTER__));		\
+	asm volatile("__annotate_reachable %c0" : : "i" (__COUNTER__));	\
 })
+
+asm(".macro __annotate_unreachable counter:req\n"
+    "\\counter:\n\t"
+    ".pushsection .discard.unreachable\n\t"
+    ".long \\counter\\()b -.\n\t"
+    ".popsection\n\t"
+    ".endm");
+
 #define annotate_unreachable() ({					\
-	asm volatile("%c0:\n\t"						\
-		     ".pushsection .discard.unreachable\n\t"		\
-		     ".long %c0b - .\n\t"				\
-		     ".popsection\n\t" : : "i" (__COUNTER__));		\
+	asm volatile("__annotate_unreachable %c0" : :			\
+		     "i" (__COUNTER__));				\
 })
+
 #define ASM_UNREACHABLE							\
 	"999:\n\t"							\
 	".pushsection .discard.unreachable\n\t"				\
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 1/6] x86: objtool: use asm macro for better compiler decisions
@ 2018-05-17 16:13   ` Nadav Amit
  0 siblings, 0 replies; 39+ messages in thread
From: Nadav Amit @ 2018-05-17 16:13 UTC (permalink / raw)
  To: linux-kernel, x86; +Cc: nadav.amit, Nadav Amit, Christopher Li, linux-sparse

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

In the case of objtool, this distortion is extreme, since anyhow the
annotations of objtool are discarded during linkage.

The solution is to set an assembly macro and call it from the inlinedv
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

This patch slightly increases the kernel size.

   text    data     bss     dec     hex filename
18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
18126824 10067268 2936832 31130924 1db052c ./vmlinux after (+665)

But allows more aggressive inlining. Static text symbols:
Before: 40033
After: 40015 (-18)

Cc: Christopher Li <sparse@chrisli.org>
Cc: linux-sparse@vger.kernel.org

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 include/linux/compiler.h | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index ab4711c63601..6cbabc6b195a 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -97,19 +97,40 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
  * These macros help objtool understand GCC code flow for unreachable code.
  * The __COUNTER__ based labels are a hack to make each instance of the macros
  * unique, to convince GCC not to merge duplicate inline asm statements.
+ *
+ * The annotation logic is encapsulated within assembly macros, which are then
+ * called on each annotation. This hack is necessary to prevent GCC from
+ * considering the inline assembly blocks as costly in time and space, which can
+ * prevent function inlining and lead to other bad compilation decisions. GCC
+ * computes inline assembly cost according to the number of perceived number of
+ * assembly instruction, based on the number of new-lines and semicolons in the
+ * assembly block. Since the annotations will be discarded during linkage, the
+ * macros make the annotations to be considered "cheap" and let GCC to emit
+ * better code.
  */
+asm(".macro __annotate_reachable counter:req\n"
+    "\\counter:\n\t"
+    ".pushsection .discard.reachable\n\t"
+    ".long \\counter\\()b -.\n\t"
+    ".popsection\n\t"
+    ".endm");
+
 #define annotate_reachable() ({						\
-	asm volatile("%c0:\n\t"						\
-		     ".pushsection .discard.reachable\n\t"		\
-		     ".long %c0b - .\n\t"				\
-		     ".popsection\n\t" : : "i" (__COUNTER__));		\
+	asm volatile("__annotate_reachable %c0" : : "i" (__COUNTER__));	\
 })
+
+asm(".macro __annotate_unreachable counter:req\n"
+    "\\counter:\n\t"
+    ".pushsection .discard.unreachable\n\t"
+    ".long \\counter\\()b -.\n\t"
+    ".popsection\n\t"
+    ".endm");
+
 #define annotate_unreachable() ({					\
-	asm volatile("%c0:\n\t"						\
-		     ".pushsection .discard.unreachable\n\t"		\
-		     ".long %c0b - .\n\t"				\
-		     ".popsection\n\t" : : "i" (__COUNTER__));		\
+	asm volatile("__annotate_unreachable %c0" : :			\
+		     "i" (__COUNTER__));				\
 })
+
 #define ASM_UNREACHABLE							\
 	"999:\n\t"							\
 	".pushsection .discard.unreachable\n\t"				\
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-17 16:13 ` Nadav Amit
  (?)
  (?)
@ 2018-05-17 16:13 ` Nadav Amit
  2018-05-18  7:58   ` Peter Zijlstra
  -1 siblings, 1 reply; 39+ messages in thread
From: Nadav Amit @ 2018-05-17 16:13 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: nadav.amit, Nadav Amit, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra, Josh Poimboeuf

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

The solution is to set an assembly macro and call it from the inlinedv
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

This patch increases the kernel size:

   text	   data	    bss	    dec	    hex	filename
18126824 10067268 2936832 31130924 1db052c ./vmlinux before
18127205 10068388 2936832 31132425 1db0b09 ./vmlinux after (+1501)

But enables more aggressive inlining (and probably branch decisions).
The number of static text symbols in vmlinux is lower.

Before:	40015
After:	39860	(-155)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/include/asm/bug.h | 56 +++++++++++++++++++++++++-------------
 1 file changed, 37 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index 6804d6642767..1167e4822a34 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -30,33 +30,51 @@
 
 #ifdef CONFIG_DEBUG_BUGVERBOSE
 
-#define _BUG_FLAGS(ins, flags)						\
+/*
+ * Saving the bug data is encapsulated within an assembly macro, which is then
+ * called on each use. This hack is necessary to prevent GCC from considering
+ * the inline assembly blocks as costly in time and space, which can prevent
+ * function inlining and lead to other bad compilation decisions. GCC computes
+ * inline assembly cost according to the number of perceived number of assembly
+ * instruction, based on the number of new-lines and semicolons in the assembly
+ * block. The macro will eventually be compiled into a single instruction (and
+ * some data). This scheme allows GCC to better understand the inline asm cost.
+ */
+asm(".macro __BUG_FLAGS ins:req file:req line:req flags:req size:req\n"
+    "1:\t \\ins\n\t"
+    ".pushsection __bug_table,\"aw\"\n"
+    "2:\t "__BUG_REL(1b)		"\t# bug_entry::bug_addr\n\t"
+    __BUG_REL(\\file)			"\t# bug_entry::file\n\t"
+    ".word \\line"			"\t# bug_entry::line\n\t"
+    ".word \\flags"			"\t# bug_entry::flags\n\t"
+    ".org 2b+\\size\n\t"
+    ".popsection\n\t"
+    ".endm");
+
+#define _BUG_FLAGS(ins, flags)                                          \
 do {									\
-	asm volatile("1:\t" ins "\n"					\
-		     ".pushsection __bug_table,\"aw\"\n"		\
-		     "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"	\
-		     "\t"  __BUG_REL(%c0) "\t# bug_entry::file\n"	\
-		     "\t.word %c1"        "\t# bug_entry::line\n"	\
-		     "\t.word %c2"        "\t# bug_entry::flags\n"	\
-		     "\t.org 2b+%c3\n"					\
-		     ".popsection"					\
-		     : : "i" (__FILE__), "i" (__LINE__),		\
-			 "i" (flags),					\
+	asm volatile("__BUG_FLAGS \"" ins "\" %c0 %c1 %c2 %c3"		\
+		     : : "i" (__FILE__), "i" (__LINE__),                \
+			 "i" (flags),                                   \
 			 "i" (sizeof(struct bug_entry)));		\
 } while (0)
 
 #else /* !CONFIG_DEBUG_BUGVERBOSE */
 
+asm(".macro __BUG_FLAGS ins:req flags:req size:req\n"
+    "1:\t\\ins\n\t"
+    ".pushsection __bug_table,\"aw\"\n"
+    "2:\t" __BUG_REL(1b)		"\t# bug_entry::bug_addr\n\t"
+    ".word \\flags"			"\t# bug_entry::flags\n\t"
+    ".org 2b+\\size\n\t"
+    ".popsection\n\t"
+    ".endm");
+
 #define _BUG_FLAGS(ins, flags)						\
 do {									\
-	asm volatile("1:\t" ins "\n"					\
-		     ".pushsection __bug_table,\"aw\"\n"		\
-		     "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"	\
-		     "\t.word %c0"        "\t# bug_entry::flags\n"	\
-		     "\t.org 2b+%c1\n"					\
-		     ".popsection"					\
-		     : : "i" (flags),					\
-			 "i" (sizeof(struct bug_entry)));		\
+	asm volatile("__BUG_FLAGS \"" ins "\" %c0 %c1"			\
+		    : : "i" (flags),					\
+			"i" (sizeof(struct bug_entry)));		\
 } while (0)
 
 #endif /* CONFIG_DEBUG_BUGVERBOSE */
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/6] x86: alternative: macrofy locks for better inlining
  2018-05-17 16:13 ` Nadav Amit
                   ` (2 preceding siblings ...)
  (?)
@ 2018-05-17 16:13 ` Nadav Amit
  -1 siblings, 0 replies; 39+ messages in thread
From: Nadav Amit @ 2018-05-17 16:13 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: nadav.amit, Nadav Amit, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Josh Poimboeuf

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

The solution is to set an assembly macro and call it from the inlined
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

This patch handles the LOCK prefix, allowing more aggresive inlining.

   text	   data	    bss	    dec	    hex	filename
18127205 10068388 2936832 31132425 1db0b09 ./vmlinux before
18131468 10068488 2936832 31136788 1db1c14 ./vmlinux after (+4363)

Static text symbols:
Before:	39860
After:	39788	(-72)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/include/asm/alternative.h | 34 +++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 4cd6a3b71824..1dc47c9fd480 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -28,17 +28,35 @@
  * The very common lock prefix is handled as special case in a
  * separate table which is a pure address list without replacement ptr
  * and size information.  That keeps the table sizes small.
+ *
+ * Saving the lock data is encapsulated within an assembly macro, which is then
+ * called on each use. This hack is necessary to prevent GCC from considering
+ * the inline assembly blocks as costly in time and space, which can prevent
+ * function inlining and lead to other bad compilation decisions. GCC computes
+ * inline assembly cost according to the number of perceived number of assembly
+ * instruction, based on the number of new-lines and semicolons in the assembly
+ * block. The macro will eventually be compiled into a single instruction (and
+ * some data). This scheme allows GCC to better understand the inline asm cost.
  */
 
 #ifdef CONFIG_SMP
-#define LOCK_PREFIX_HERE \
-		".pushsection .smp_locks,\"a\"\n"	\
-		".balign 4\n"				\
-		".long 671f - .\n" /* offset */		\
-		".popsection\n"				\
-		"671:"
-
-#define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "
+
+asm(".macro __LOCK_PREFIX_HERE\n\t"
+    ".pushsection .smp_locks,\"a\"\n\t"
+    ".balign 4\n\t"
+    ".long 671f - .\n\t" /* offset */
+    ".popsection\n"
+    "671:\n\t"
+    ".endm");
+
+#define LOCK_PREFIX_HERE "__LOCK_PREFIX_HERE\n\t"
+
+asm(".macro __LOCK_PREFIX ins:vararg\n\t"
+    "__LOCK_PREFIX_HERE\n\t"
+    "lock; \\ins\n\t"
+    ".endm");
+
+#define LOCK_PREFIX "__LOCK_PREFIX "
 
 #else /* ! CONFIG_SMP */
 #define LOCK_PREFIX_HERE ""
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 4/6] x86: prevent inline distortion by paravirt ops
  2018-05-17 16:13 ` Nadav Amit
                   ` (3 preceding siblings ...)
  (?)
@ 2018-05-17 16:14 ` Nadav Amit
  -1 siblings, 0 replies; 39+ messages in thread
From: Nadav Amit @ 2018-05-17 16:14 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: nadav.amit, Nadav Amit, Juergen Gross, Alok Kataria,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, virtualization

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

The solution is to set an assembly macro and call it from the inlined
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

The effect of the patch is a more aggressive inlining, which also
causes a size increase of kernel.

   text	   data	    bss	    dec	    hex	filename
18131468 10068488 2936832 31136788 1db1c14 ./vmlinux before
18146418 10064100 2936832 31147350 1db4556 ./vmlinux after (+10562)

Static text symbols:
Before:	39788
After:	39673	(-115)

Cc: Juergen Gross <jgross@suse.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: virtualization@lists.linux-foundation.org

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/include/asm/paravirt_types.h | 63 +++++++++++++++++----------
 1 file changed, 39 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 180bc0bff0fb..ea62204c2ee6 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -346,20 +346,45 @@ extern struct pv_lock_ops pv_lock_ops;
 /*
  * Generate some code, and mark it as patchable by the
  * apply_paravirt() alternate instruction patcher.
+ *
+ * This generates an indirect call based on the operation type number.
+ * The type number, computed in PARAVIRT_PATCH, is derived from the
+ * offset into the paravirt_patch_template structure, and can therefore be
+ * freely converted back into a structure offset.
+ *
+ * The paravirtual alternative logic and data are encapsulated within an
+ * assembly macro, which is then called on each use. This hack is necessary to
+ * prevent GCC from considering the inline assembly blocks as costly in time and
+ * space, which can prevent function inlining and lead to other bad compilation
+ * decisions. GCC computes inline assembly cost according to the number of
+ * perceived number of assembly instruction, based on the number of new-lines
+ * and semicolons in the assembly block. The macro will eventually be compiled
+ * into a single instruction (and some data). This scheme allows GCC to better
+ * understand the inline asm cost.
  */
-#define _paravirt_alt(insn_string, type, clobber)	\
-	"771:\n\t" insn_string "\n" "772:\n"		\
-	".pushsection .parainstructions,\"a\"\n"	\
-	_ASM_ALIGN "\n"					\
-	_ASM_PTR " 771b\n"				\
-	"  .byte " type "\n"				\
-	"  .byte 772b-771b\n"				\
-	"  .short " clobber "\n"			\
-	".popsection\n"
+asm(".macro __paravirt_alt type:req clobber:req pv_opptr:req\n"
+    "771:\n\t"
+    ANNOTATE_RETPOLINE_SAFE "\n\t"
+    "call *\\pv_opptr\n"
+    "772:\n\t"
+    ".pushsection .parainstructions,\"a\"\n\t"
+    _ASM_ALIGN "\n\t"
+    _ASM_PTR " 771b\n\t"
+    ".byte \\type\n\t"
+    ".byte 772b-771b\n\t"
+    ".short \\clobber\n\t"
+    ".popsection\n\t"
+    ".endm");
+
+#define _paravirt_alt(type, clobber, pv_opptr)				\
+	"__paravirt_alt type=" __stringify(type)			\
+	" clobber=" __stringify(clobber)				\
+	" pv_opptr=" __stringify(pv_opptr) "\n\t"
 
 /* Generate patchable code, with the default asm parameters. */
-#define paravirt_alt(insn_string)					\
-	_paravirt_alt(insn_string, "%c[paravirt_typenum]", "%c[paravirt_clobber]")
+#define paravirt_alt							\
+	_paravirt_alt("%c[paravirt_typenum]", "%c[paravirt_clobber]",	\
+		      "%c[paravirt_opptr]")
 
 /* Simple instruction patching code. */
 #define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
@@ -387,16 +412,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
 
 int paravirt_disable_iospace(void);
 
-/*
- * This generates an indirect call based on the operation type number.
- * The type number, computed in PARAVIRT_PATCH, is derived from the
- * offset into the paravirt_patch_template structure, and can therefore be
- * freely converted back into a structure offset.
- */
-#define PARAVIRT_CALL					\
-	ANNOTATE_RETPOLINE_SAFE				\
-	"call *%c[paravirt_opptr];"
-
 /*
  * These macros are intended to wrap calls through one of the paravirt
  * ops structs, so that they can be later identified and patched at
@@ -534,7 +549,7 @@ int paravirt_disable_iospace(void);
 		/* since this condition will never hold */		\
 		if (sizeof(rettype) > sizeof(unsigned long)) {		\
 			asm volatile(pre				\
-				     paravirt_alt(PARAVIRT_CALL)	\
+				     paravirt_alt			\
 				     post				\
 				     : call_clbr, ASM_CALL_CONSTRAINT	\
 				     : paravirt_type(op),		\
@@ -544,7 +559,7 @@ int paravirt_disable_iospace(void);
 			__ret = (rettype)((((u64)__edx) << 32) | __eax); \
 		} else {						\
 			asm volatile(pre				\
-				     paravirt_alt(PARAVIRT_CALL)	\
+				     paravirt_alt			\
 				     post				\
 				     : call_clbr, ASM_CALL_CONSTRAINT	\
 				     : paravirt_type(op),		\
@@ -571,7 +586,7 @@ int paravirt_disable_iospace(void);
 		PVOP_VCALL_ARGS;					\
 		PVOP_TEST_NULL(op);					\
 		asm volatile(pre					\
-			     paravirt_alt(PARAVIRT_CALL)		\
+			     paravirt_alt				\
 			     post					\
 			     : call_clbr, ASM_CALL_CONSTRAINT		\
 			     : paravirt_type(op),			\
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 5/6] x86: refcount: prevent gcc distortions
  2018-05-17 16:13 ` Nadav Amit
                   ` (4 preceding siblings ...)
  (?)
@ 2018-05-17 16:14 ` Nadav Amit
  2018-05-19  4:27   ` kbuild test robot
  -1 siblings, 1 reply; 39+ messages in thread
From: Nadav Amit @ 2018-05-17 16:14 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: nadav.amit, Nadav Amit, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Kees Cook, Jan Beulich, Josh Poimboeuf

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

The solution is to set an assembly macro and call it from the inlined
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

This patch allows to inline functions such as __get_seccomp_filter().
The effect of the patch is as follows on the kernel size:

   text	   data	    bss	    dec	    hex	filename
18146418 10064100 2936832 31147350 1db4556 ./vmlinux before
18148228 10063968 2936832 31149028 1db4be4 ./vmlinux after (+1678)

Static text symbols:
Before:	39673
After:	39649	(-24)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Kees Cook <keescook@chromium.org>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/include/asm/refcount.h | 62 +++++++++++++++++++++------------
 1 file changed, 39 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
index 4cf11d88d3b3..3126141f2c80 100644
--- a/arch/x86/include/asm/refcount.h
+++ b/arch/x86/include/asm/refcount.h
@@ -13,35 +13,49 @@
  * the exception handler to use (in mm/extable.c), and then triggers the
  * central refcount exception. The fixup address for the exception points
  * back to the regular execution flow in .text.
+ *
+ * The logic and data are encapsulated within an assembly macro, which is then
+ * called on each use. This hack is necessary to prevent GCC from considering
+ * the inline assembly blocks as costly in time and space, which can prevent
+ * function inlining and lead to other bad compilation decisions. GCC computes
+ * inline assembly cost according to the number of perceived number of assembly
+ * instruction, based on the number of new-lines and semicolons in the assembly
+ * block. The macros will eventually be compiled into a single instruction that
+ * will be actually executed (unless an exception happens). This scheme allows
+ * GCC to better understand the inline asm cost.
  */
-#define _REFCOUNT_EXCEPTION				\
-	".pushsection .text..refcount\n"		\
-	"111:\tlea %[counter], %%" _ASM_CX "\n"		\
-	"112:\t" ASM_UD2 "\n"				\
-	ASM_UNREACHABLE					\
-	".popsection\n"					\
-	"113:\n"					\
-	_ASM_EXTABLE_REFCOUNT(112b, 113b)
+asm(".macro __REFCOUNT_EXCEPTION counter:req\n\t"
+    ".pushsection .text..refcount\n"
+    "111:\tlea \\counter, %" _ASM_CX "\n"
+    "112:\t" ASM_UD2 "\n\t"
+    ASM_UNREACHABLE
+    ".popsection\n"
+    "113:\n\t"
+    _ASM_EXTABLE_REFCOUNT(112b, 113b) "\n\t"
+    ".endm");
 
 /* Trigger refcount exception if refcount result is negative. */
-#define REFCOUNT_CHECK_LT_ZERO				\
-	"js 111f\n\t"					\
-	_REFCOUNT_EXCEPTION
+asm(".macro __REFCOUNT_CHECK_LT_ZERO counter:req\n\t"
+    "js 111f\n\t"
+    "__REFCOUNT_EXCEPTION \\counter\n\t"
+    ".endm");
 
 /* Trigger refcount exception if refcount result is zero or negative. */
-#define REFCOUNT_CHECK_LE_ZERO				\
-	"jz 111f\n\t"					\
-	REFCOUNT_CHECK_LT_ZERO
+asm(".macro __REFCOUNT_CHECK_LE_ZERO counter:req\n\t"
+    "jz 111f\n\t"
+    "__REFCOUNT_CHECK_LT_ZERO counter=\\counter\n\t"
+    ".endm");
 
 /* Trigger refcount exception unconditionally. */
-#define REFCOUNT_ERROR					\
-	"jmp 111f\n\t"					\
-	_REFCOUNT_EXCEPTION
+asm(".macro __REFCOUNT_ERROR counter:req\n\t"
+    "jmp 111f\n\t"
+    "__REFCOUNT_EXCEPTION counter=\\counter\n\t"
+    ".endm");
 
 static __always_inline void refcount_add(unsigned int i, refcount_t *r)
 {
 	asm volatile(LOCK_PREFIX "addl %1,%0\n\t"
-		REFCOUNT_CHECK_LT_ZERO
+		"__REFCOUNT_CHECK_LT_ZERO %[counter]"
 		: [counter] "+m" (r->refs.counter)
 		: "ir" (i)
 		: "cc", "cx");
@@ -50,7 +64,7 @@ static __always_inline void refcount_add(unsigned int i, refcount_t *r)
 static __always_inline void refcount_inc(refcount_t *r)
 {
 	asm volatile(LOCK_PREFIX "incl %0\n\t"
-		REFCOUNT_CHECK_LT_ZERO
+		"__REFCOUNT_CHECK_LT_ZERO %[counter]"
 		: [counter] "+m" (r->refs.counter)
 		: : "cc", "cx");
 }
@@ -58,7 +72,7 @@ static __always_inline void refcount_inc(refcount_t *r)
 static __always_inline void refcount_dec(refcount_t *r)
 {
 	asm volatile(LOCK_PREFIX "decl %0\n\t"
-		REFCOUNT_CHECK_LE_ZERO
+		"__REFCOUNT_CHECK_LE_ZERO %[counter]"
 		: [counter] "+m" (r->refs.counter)
 		: : "cc", "cx");
 }
@@ -66,13 +80,15 @@ static __always_inline void refcount_dec(refcount_t *r)
 static __always_inline __must_check
 bool refcount_sub_and_test(unsigned int i, refcount_t *r)
 {
-	GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl", REFCOUNT_CHECK_LT_ZERO,
+	GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
+				  "__REFCOUNT_CHECK_LT_ZERO %[counter]",
 				  r->refs.counter, "er", i, "%0", e, "cx");
 }
 
 static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
 {
-	GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl", REFCOUNT_CHECK_LT_ZERO,
+	GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
+				 "__REFCOUNT_CHECK_LT_ZERO %[counter]",
 				 r->refs.counter, "%0", e, "cx");
 }
 
@@ -90,7 +106,7 @@ bool refcount_add_not_zero(unsigned int i, refcount_t *r)
 
 		/* Did we try to increment from/to an undesirable state? */
 		if (unlikely(c < 0 || c == INT_MAX || result < c)) {
-			asm volatile(REFCOUNT_ERROR
+			asm volatile("__REFCOUNT_ERROR %[counter]"
 				     : : [counter] "m" (r->refs.counter)
 				     : "cc", "cx");
 			break;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 6/6] x86: removing unneeded new-lines
  2018-05-17 16:13 ` Nadav Amit
                   ` (5 preceding siblings ...)
  (?)
@ 2018-05-17 16:14 ` Nadav Amit
  -1 siblings, 0 replies; 39+ messages in thread
From: Nadav Amit @ 2018-05-17 16:14 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: nadav.amit, Nadav Amit, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Josh Poimboeuf

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

This patch removes unneeded new-lines and semicolons to prevent such
distortion.

Functions such as nfs_io_completion_put() get inlined. Its overall
effect is not shown in the absolute numbers, but it seems to enable
slightly better inlining:

   text	   data	    bss	    dec	    hex	filename
18148228 10063968 2936832 31149028 1db4be4 ./vmlinux before
18148888 10064016 2936832 31149736 1db4ea8 ./vmlinux after (+708)

Static text symbols:
Before:	39649
After:	39650	(+1)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/include/asm/asm.h           |  4 ++--
 arch/x86/include/asm/cmpxchg.h       | 10 +++++-----
 arch/x86/include/asm/special_insns.h | 12 ++++++------
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 219faaec51df..571ceec97976 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -51,10 +51,10 @@
  * The output operand must be type "bool".
  */
 #ifdef __GCC_ASM_FLAG_OUTPUTS__
-# define CC_SET(c) "\n\t/* output condition code " #c "*/\n"
+# define CC_SET(c) "\n\t/* output condition code " #c "*/"
 # define CC_OUT(c) "=@cc" #c
 #else
-# define CC_SET(c) "\n\tset" #c " %[_cc_" #c "]\n"
+# define CC_SET(c) "\n\tset" #c " %[_cc_" #c "]"
 # define CC_OUT(c) [_cc_ ## c] "=qm"
 #endif
 
diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index e3efd8a06066..2be9582fcb2e 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -44,22 +44,22 @@ extern void __add_wrong_size(void)
 	        __typeof__ (*(ptr)) __ret = (arg);			\
 		switch (sizeof(*(ptr))) {				\
 		case __X86_CASE_B:					\
-			asm volatile (lock #op "b %b0, %1\n"		\
+			asm volatile (lock #op "b %b0, %1"		\
 				      : "+q" (__ret), "+m" (*(ptr))	\
 				      : : "memory", "cc");		\
 			break;						\
 		case __X86_CASE_W:					\
-			asm volatile (lock #op "w %w0, %1\n"		\
+			asm volatile (lock #op "w %w0, %1"		\
 				      : "+r" (__ret), "+m" (*(ptr))	\
 				      : : "memory", "cc");		\
 			break;						\
 		case __X86_CASE_L:					\
-			asm volatile (lock #op "l %0, %1\n"		\
+			asm volatile (lock #op "l %0, %1"		\
 				      : "+r" (__ret), "+m" (*(ptr))	\
 				      : : "memory", "cc");		\
 			break;						\
 		case __X86_CASE_Q:					\
-			asm volatile (lock #op "q %q0, %1\n"		\
+			asm volatile (lock #op "q %q0, %1"		\
 				      : "+r" (__ret), "+m" (*(ptr))	\
 				      : : "memory", "cc");		\
 			break;						\
@@ -134,7 +134,7 @@ extern void __add_wrong_size(void)
 	__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
 
 #define __sync_cmpxchg(ptr, old, new, size)				\
-	__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
+	__raw_cmpxchg((ptr), (old), (new), (size), "lock ")
 
 #define __cmpxchg_local(ptr, old, new, size)				\
 	__raw_cmpxchg((ptr), (old), (new), (size), "")
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 317fc59b512c..9c56059aaf24 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -19,7 +19,7 @@ extern unsigned long __force_order;
 static inline unsigned long native_read_cr0(void)
 {
 	unsigned long val;
-	asm volatile("mov %%cr0,%0\n\t" : "=r" (val), "=m" (__force_order));
+	asm volatile("mov %%cr0,%0" : "=r" (val), "=m" (__force_order));
 	return val;
 }
 
@@ -31,7 +31,7 @@ static inline void native_write_cr0(unsigned long val)
 static inline unsigned long native_read_cr2(void)
 {
 	unsigned long val;
-	asm volatile("mov %%cr2,%0\n\t" : "=r" (val), "=m" (__force_order));
+	asm volatile("mov %%cr2,%0" : "=r" (val), "=m" (__force_order));
 	return val;
 }
 
@@ -43,7 +43,7 @@ static inline void native_write_cr2(unsigned long val)
 static inline unsigned long __native_read_cr3(void)
 {
 	unsigned long val;
-	asm volatile("mov %%cr3,%0\n\t" : "=r" (val), "=m" (__force_order));
+	asm volatile("mov %%cr3,%0" : "=r" (val), "=m" (__force_order));
 	return val;
 }
 
@@ -67,7 +67,7 @@ static inline unsigned long native_read_cr4(void)
 		     : "=r" (val), "=m" (__force_order) : "0" (0));
 #else
 	/* CR4 always exists on x86_64. */
-	asm volatile("mov %%cr4,%0\n\t" : "=r" (val), "=m" (__force_order));
+	asm volatile("mov %%cr4,%0" : "=r" (val), "=m" (__force_order));
 #endif
 	return val;
 }
@@ -101,7 +101,7 @@ static inline u32 __read_pkru(void)
 	 * "rdpkru" instruction.  Places PKRU contents in to EAX,
 	 * clears EDX and requires that ecx=0.
 	 */
-	asm volatile(".byte 0x0f,0x01,0xee\n\t"
+	asm volatile(".byte 0x0f,0x01,0xee"
 		     : "=a" (pkru), "=d" (edx)
 		     : "c" (ecx));
 	return pkru;
@@ -115,7 +115,7 @@ static inline void __write_pkru(u32 pkru)
 	 * "wrpkru" instruction.  Loads contents in EAX to PKRU,
 	 * requires that ecx = edx = 0.
 	 */
-	asm volatile(".byte 0x0f,0x01,0xef\n\t"
+	asm volatile(".byte 0x0f,0x01,0xef"
 		     : : "a" (pkru), "c"(ecx), "d"(edx));
 }
 #else
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-17 16:13 ` [PATCH 2/6] x86: bug: prevent gcc distortions Nadav Amit
@ 2018-05-18  7:58   ` Peter Zijlstra
  2018-05-18  8:13     ` Ingo Molnar
                       ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Peter Zijlstra @ 2018-05-18  7:58 UTC (permalink / raw)
  To: Nadav Amit
  Cc: linux-kernel, x86, nadav.amit, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Josh Poimboeuf

On Thu, May 17, 2018 at 09:13:58AM -0700, Nadav Amit wrote:
> +asm(".macro __BUG_FLAGS ins:req file:req line:req flags:req size:req\n"
> +    "1:\t \\ins\n\t"
> +    ".pushsection __bug_table,\"aw\"\n"
> +    "2:\t "__BUG_REL(1b)		"\t# bug_entry::bug_addr\n\t"
> +    __BUG_REL(\\file)			"\t# bug_entry::file\n\t"
> +    ".word \\line"			"\t# bug_entry::line\n\t"
> +    ".word \\flags"			"\t# bug_entry::flags\n\t"
> +    ".org 2b+\\size\n\t"
> +    ".popsection\n\t"
> +    ".endm");
> +
> +#define _BUG_FLAGS(ins, flags)                                          \
>  do {									\
> +	asm volatile("__BUG_FLAGS \"" ins "\" %c0 %c1 %c2 %c3"		\
> +		     : : "i" (__FILE__), "i" (__LINE__),                \
> +			 "i" (flags),                                   \
>  			 "i" (sizeof(struct bug_entry)));		\
>  } while (0)

This is an awesome hack, but is there really nothing we can do to make
it more readable? Esp, that global asm doing the macro definition is a
pain to read.

Also, can we pretty please used named operands in 'new' code?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18  7:58   ` Peter Zijlstra
@ 2018-05-18  8:13     ` Ingo Molnar
  2018-05-18 10:11       ` Borislav Petkov
  2018-05-18 14:30       ` Nadav Amit
  2018-05-18 14:22     ` Nadav Amit
  2018-05-18 16:24     ` Linus Torvalds
  2 siblings, 2 replies; 39+ messages in thread
From: Ingo Molnar @ 2018-05-18  8:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nadav Amit, linux-kernel, x86, nadav.amit, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Josh Poimboeuf, Linus Torvalds,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Denys Vlasenko


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, May 17, 2018 at 09:13:58AM -0700, Nadav Amit wrote:
> > +asm(".macro __BUG_FLAGS ins:req file:req line:req flags:req size:req\n"
> > +    "1:\t \\ins\n\t"
> > +    ".pushsection __bug_table,\"aw\"\n"
> > +    "2:\t "__BUG_REL(1b)		"\t# bug_entry::bug_addr\n\t"
> > +    __BUG_REL(\\file)			"\t# bug_entry::file\n\t"
> > +    ".word \\line"			"\t# bug_entry::line\n\t"
> > +    ".word \\flags"			"\t# bug_entry::flags\n\t"
> > +    ".org 2b+\\size\n\t"
> > +    ".popsection\n\t"
> > +    ".endm");
> > +
> > +#define _BUG_FLAGS(ins, flags)                                          \
> >  do {									\
> > +	asm volatile("__BUG_FLAGS \"" ins "\" %c0 %c1 %c2 %c3"		\
> > +		     : : "i" (__FILE__), "i" (__LINE__),                \
> > +			 "i" (flags),                                   \
> >  			 "i" (sizeof(struct bug_entry)));		\
> >  } while (0)
> 
> This is an awesome hack, but is there really nothing we can do to make
> it more readable? Esp, that global asm doing the macro definition is a
> pain to read.
> 
> Also, can we pretty please used named operands in 'new' code?

Yes, that's my main worry too about all these inlining changes:
the very, very marked reduction in the readability of assembly code.

It's bad to an extent that I'm questioning the wisdom of pandering to a compiler 
limitation to begin with?

How about asking GCC for an attribute where we can specify the inlined size of an 
asm() function? Even if we'll just approximate it due to some vagaries of actual 
code generation related to how arguments interact with GCC, an explicit byte value 
will do a heck of a better job of it than some sort of implied, vague 'number of 
newlines' heuristics ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH 0/6] Macrofying inline assembly for better compilation
  2018-05-17 16:13 ` Nadav Amit
                   ` (7 preceding siblings ...)
  (?)
@ 2018-05-18  9:20 ` David Laight
  2018-05-18 14:15   ` Nadav Amit
  -1 siblings, 1 reply; 39+ messages in thread
From: David Laight @ 2018-05-18  9:20 UTC (permalink / raw)
  To: 'Nadav Amit', linux-kernel, x86
  Cc: nadav.amit, Alok Kataria, Christopher Li, H. Peter Anvin,
	Ingo Molnar, Jan Beulich, Jonathan Corbet, Josh Poimboeuf,
	Juergen Gross, Kees Cook, linux-sparse, Peter Zijlstra,
	Randy Dunlap, Thomas Gleixner, virtualization

From: Nadav Amit
> Sent: 17 May 2018 17:14
> This patch-set deals with an interesting yet stupid problem: kernel code
> that does not get inlined despite its simplicity. There are several
> causes for this behavior: "cold" attribute on __init, different function
> optimization levels; conditional constant computations based on
> __builtin_constant_p(); and finally large inline assembly blocks.
> 
> This patch-set deals with the inline assembly problem. I separated these
> patches from the others (that were sent in the RFC) for easier
> inclusion.
> 
> The problem with inline assembly is that inline assembly is often used
> by the kernel for things that are other than code - for example,
> assembly directives and data. GCC however is oblivious to the content of
> the blocks and assumes their cost in space and time is proportional to
> the number of the perceived assembly "instruction", according to the
> number of newlines and semicolons. Alternatives, paravirt and other
> mechanisms are affected, causing code not to be inlined, and degrading
> compilation quality in general.
> 
> The solution that this patch-set carries for this problem is to create
> an assembly macro, and then call it from the inline assembly block.  As
> a result, the compiler sees a single "instruction" and assigns the more
> appropriate cost to the code. In addition, this patch-set removes
> unneeded new-lines from common x86 inline asm's, which "confuse" GCC
> heuristics.

Can't you get the same effect by using always_inline ?

	David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH 0/6] Macrofying inline assembly for better compilation
  2018-05-17 16:13 ` Nadav Amit
                   ` (6 preceding siblings ...)
  (?)
@ 2018-05-18  9:20 ` David Laight
  -1 siblings, 0 replies; 39+ messages in thread
From: David Laight @ 2018-05-18  9:20 UTC (permalink / raw)
  To: 'Nadav Amit', linux-kernel, x86
  Cc: Juergen Gross, Randy Dunlap, Kees Cook, Jonathan Corbet,
	Peter Zijlstra, Christopher Li, Josh Poimboeuf, virtualization,
	linux-sparse, Ingo Molnar, Jan Beulich, H. Peter Anvin,
	Alok Kataria, nadav.amit, Thomas Gleixner

From: Nadav Amit
> Sent: 17 May 2018 17:14
> This patch-set deals with an interesting yet stupid problem: kernel code
> that does not get inlined despite its simplicity. There are several
> causes for this behavior: "cold" attribute on __init, different function
> optimization levels; conditional constant computations based on
> __builtin_constant_p(); and finally large inline assembly blocks.
> 
> This patch-set deals with the inline assembly problem. I separated these
> patches from the others (that were sent in the RFC) for easier
> inclusion.
> 
> The problem with inline assembly is that inline assembly is often used
> by the kernel for things that are other than code - for example,
> assembly directives and data. GCC however is oblivious to the content of
> the blocks and assumes their cost in space and time is proportional to
> the number of the perceived assembly "instruction", according to the
> number of newlines and semicolons. Alternatives, paravirt and other
> mechanisms are affected, causing code not to be inlined, and degrading
> compilation quality in general.
> 
> The solution that this patch-set carries for this problem is to create
> an assembly macro, and then call it from the inline assembly block.  As
> a result, the compiler sees a single "instruction" and assigns the more
> appropriate cost to the code. In addition, this patch-set removes
> unneeded new-lines from common x86 inline asm's, which "confuse" GCC
> heuristics.

Can't you get the same effect by using always_inline ?

	David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18  8:13     ` Ingo Molnar
@ 2018-05-18 10:11       ` Borislav Petkov
  2018-05-18 14:36         ` Nadav Amit
  2018-05-18 14:30       ` Nadav Amit
  1 sibling, 1 reply; 39+ messages in thread
From: Borislav Petkov @ 2018-05-18 10:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Nadav Amit, linux-kernel, x86, nadav.amit,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Josh Poimboeuf,
	Linus Torvalds, Andy Lutomirski, Peter Zijlstra, Denys Vlasenko

On Fri, May 18, 2018 at 10:13:54AM +0200, Ingo Molnar wrote:
> Yes, that's my main worry too about all these inlining changes:
> the very, very marked reduction in the readability of assembly code.

Same reaction here: the small improvements this brings is simply not
enough to sacrifice readability so much IMO.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/6] Macrofying inline assembly for better compilation
  2018-05-18  9:20 ` David Laight
@ 2018-05-18 14:15   ` Nadav Amit
  0 siblings, 0 replies; 39+ messages in thread
From: Nadav Amit @ 2018-05-18 14:15 UTC (permalink / raw)
  To: David Laight
  Cc: linux-kernel, x86, Alok Kataria, Christopher Li, H. Peter Anvin,
	Ingo Molnar, Jan Beulich, Jonathan Corbet, Josh Poimboeuf,
	Juergen Gross, Kees Cook, linux-sparse, Peter Zijlstra,
	Randy Dunlap, Thomas Gleixner, virtualization

David Laight <David.Laight@ACULAB.COM> wrote:

> From: Nadav Amit
>> Sent: 17 May 2018 17:14
>> This patch-set deals with an interesting yet stupid problem: kernel code
>> that does not get inlined despite its simplicity. There are several
>> causes for this behavior: "cold" attribute on __init, different function
>> optimization levels; conditional constant computations based on
>> __builtin_constant_p(); and finally large inline assembly blocks.
>> 
>> This patch-set deals with the inline assembly problem. I separated these
>> patches from the others (that were sent in the RFC) for easier
>> inclusion.
>> 
>> The problem with inline assembly is that inline assembly is often used
>> by the kernel for things that are other than code - for example,
>> assembly directives and data. GCC however is oblivious to the content of
>> the blocks and assumes their cost in space and time is proportional to
>> the number of the perceived assembly "instruction", according to the
>> number of newlines and semicolons. Alternatives, paravirt and other
>> mechanisms are affected, causing code not to be inlined, and degrading
>> compilation quality in general.
>> 
>> The solution that this patch-set carries for this problem is to create
>> an assembly macro, and then call it from the inline assembly block.  As
>> a result, the compiler sees a single "instruction" and assigns the more
>> appropriate cost to the code. In addition, this patch-set removes
>> unneeded new-lines from common x86 inline asm's, which "confuse" GCC
>> heuristics.
> 
> Can't you get the same effect by using always_inline ?

I wanted and forgot to mention in the cover-letter why always_inline is not
a proper solution:

1. It is not easy to go over 400 functions and mark them as __always_inline.
   Maintaining it afterwards (i.e., removing the __always_inline if the
   function is changed and becomes “heavier") is even harder.

2. The kernel can be configured in a many ways, which would make
   functions more “cheaper” or more “expensive”, so you cannot always
   predetermine whether a function should be inlined.

3. If you mark a function __always_inline you can just cause the calling
   function not to be inlined (when it should be inlined as well). It becomes
   a whack-a-mole.

4. It is not only about inlining. The compiler also makes branch decisions
   based on the perceived cost of the code, including inlined function.

Regards,
Nadav

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18  7:58   ` Peter Zijlstra
  2018-05-18  8:13     ` Ingo Molnar
@ 2018-05-18 14:22     ` Nadav Amit
  2018-05-18 17:52       ` Joe Perches
  2018-05-18 16:24     ` Linus Torvalds
  2 siblings, 1 reply; 39+ messages in thread
From: Nadav Amit @ 2018-05-18 14:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Josh Poimboeuf

Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, May 17, 2018 at 09:13:58AM -0700, Nadav Amit wrote:
>> +asm(".macro __BUG_FLAGS ins:req file:req line:req flags:req size:req\n"
>> +    "1:\t \\ins\n\t"
>> +    ".pushsection __bug_table,\"aw\"\n"
>> +    "2:\t "__BUG_REL(1b)		"\t# bug_entry::bug_addr\n\t"
>> +    __BUG_REL(\\file)			"\t# bug_entry::file\n\t"
>> +    ".word \\line"			"\t# bug_entry::line\n\t"
>> +    ".word \\flags"			"\t# bug_entry::flags\n\t"
>> +    ".org 2b+\\size\n\t"
>> +    ".popsection\n\t"
>> +    ".endm");
>> +
>> +#define _BUG_FLAGS(ins, flags)                                          \
>> do {									\
>> +	asm volatile("__BUG_FLAGS \"" ins "\" %c0 %c1 %c2 %c3"		\
>> +		     : : "i" (__FILE__), "i" (__LINE__),                \
>> +			 "i" (flags),                                   \
>> 			 "i" (sizeof(struct bug_entry)));		\
>> } while (0)
> 
> This is an awesome hack, but is there really nothing we can do to make
> it more readable? Esp, that global asm doing the macro definition is a
> pain to read.
> 
> Also, can we pretty please used named operands in 'new' code?

It is hard to make the code readable in C, readable in the generated asm,
and to follow the coding style imposed by checkpatch (e.g., no space between
the newline and the asm argument before it).

I considered wrapping the asm macro in a C macro, but AFAIK C macros cannot
emit backslashes.

I thought of suggesting to change “ins” into a vararg and removing the
escaped double-quotes in the C macro, but you ask to use named operands.

So I am out of ideas. Do you have anything else in mind?

Thanks,
Nadav

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18  8:13     ` Ingo Molnar
  2018-05-18 10:11       ` Borislav Petkov
@ 2018-05-18 14:30       ` Nadav Amit
  1 sibling, 0 replies; 39+ messages in thread
From: Nadav Amit @ 2018-05-18 14:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-kernel, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Josh Poimboeuf, Linus Torvalds, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Denys Vlasenko

Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Thu, May 17, 2018 at 09:13:58AM -0700, Nadav Amit wrote:
>>> +asm(".macro __BUG_FLAGS ins:req file:req line:req flags:req size:req\n"
>>> +    "1:\t \\ins\n\t"
>>> +    ".pushsection __bug_table,\"aw\"\n"
>>> +    "2:\t "__BUG_REL(1b)		"\t# bug_entry::bug_addr\n\t"
>>> +    __BUG_REL(\\file)			"\t# bug_entry::file\n\t"
>>> +    ".word \\line"			"\t# bug_entry::line\n\t"
>>> +    ".word \\flags"			"\t# bug_entry::flags\n\t"
>>> +    ".org 2b+\\size\n\t"
>>> +    ".popsection\n\t"
>>> +    ".endm");
>>> +
>>> +#define _BUG_FLAGS(ins, flags)                                          \
>>> do {									\
>>> +	asm volatile("__BUG_FLAGS \"" ins "\" %c0 %c1 %c2 %c3"		\
>>> +		     : : "i" (__FILE__), "i" (__LINE__),                \
>>> +			 "i" (flags),                                   \
>>> 			 "i" (sizeof(struct bug_entry)));		\
>>> } while (0)
>> 
>> This is an awesome hack, but is there really nothing we can do to make
>> it more readable? Esp, that global asm doing the macro definition is a
>> pain to read.
>> 
>> Also, can we pretty please used named operands in 'new' code?
> 
> Yes, that's my main worry too about all these inlining changes:
> the very, very marked reduction in the readability of assembly code.
> 
> It's bad to an extent that I'm questioning the wisdom of pandering to a compiler 
> limitation to begin with?
> 
> How about asking GCC for an attribute where we can specify the inlined size of an 
> asm() function? Even if we'll just approximate it due to some vagaries of actual 
> code generation related to how arguments interact with GCC, an explicit byte value 
> will do a heck of a better job of it than some sort of implied, vague 'number of 
> newlines' heuristics ...

If it were to become a GCC feature, I think it is best to be a builtin that
says: consider the enclosed expression as “free”. The problem of poor
inlining decisions is not specific to inline asm. As I mentioned in the RFC,
when there are two code paths for constants and variables based on
__builtin_constant_p(), you can get the “cost” of the constant path for
variables.

It is not hard to add such a feature to GCC, but I don’t know how easy it is
to get new features into the compiler.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 10:11       ` Borislav Petkov
@ 2018-05-18 14:36         ` Nadav Amit
  2018-05-18 15:40           ` Borislav Petkov
  0 siblings, 1 reply; 39+ messages in thread
From: Nadav Amit @ 2018-05-18 14:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Josh Poimboeuf, Linus Torvalds,
	Andy Lutomirski, Peter Zijlstra, Denys Vlasenko

Borislav Petkov <bp@alien8.de> wrote:

> On Fri, May 18, 2018 at 10:13:54AM +0200, Ingo Molnar wrote:
>> Yes, that's my main worry too about all these inlining changes:
>> the very, very marked reduction in the readability of assembly code.
> 
> Same reaction here: the small improvements this brings is simply not
> enough to sacrifice readability so much IMO.

I didn’t try too hard to find more affected (micro)benchmarks, but I am
pretty sure there are: pretty much all the paravirt ops are currently not
inlined, which can affect kernel operations that bang on the page-tables.

Notice that this is not an x86-specific issue. Other architectures, with
not-as-good OOOE for instance, might incur higher overheads. So I don’t
think that this issue should be ignored.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 14:36         ` Nadav Amit
@ 2018-05-18 15:40           ` Borislav Petkov
  2018-05-18 15:46             ` Nadav Amit
  0 siblings, 1 reply; 39+ messages in thread
From: Borislav Petkov @ 2018-05-18 15:40 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Josh Poimboeuf, Linus Torvalds,
	Andy Lutomirski, Peter Zijlstra, Denys Vlasenko

On Fri, May 18, 2018 at 02:36:21PM +0000, Nadav Amit wrote:
> I didn’t try too hard to find more affected (micro)benchmarks, but I am
> pretty sure there are:

So you being pretty sure there are, doesn't make me go, oh, ok, then,
this is an uglification we should try to live with. It is still ugly
with no proven benefit from it.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 15:40           ` Borislav Petkov
@ 2018-05-18 15:46             ` Nadav Amit
  2018-05-18 15:53               ` Borislav Petkov
  0 siblings, 1 reply; 39+ messages in thread
From: Nadav Amit @ 2018-05-18 15:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Josh Poimboeuf, Linus Torvalds,
	Andy Lutomirski, Peter Zijlstra, Denys Vlasenko

Borislav Petkov <bp@alien8.de> wrote:

> On Fri, May 18, 2018 at 02:36:21PM +0000, Nadav Amit wrote:
>> I didn’t try too hard to find more affected (micro)benchmarks, but I am
>> pretty sure there are:
> 
> So you being pretty sure there are, doesn't make me go, oh, ok, then,
> this is an uglification we should try to live with. It is still ugly
> with no proven benefit from it.

In case you didn’t read the cover-letter: the patch-set does give a 2%
performance improvement for #PF-MADV_DONTNEED microbenchmark loop.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 15:46             ` Nadav Amit
@ 2018-05-18 15:53               ` Borislav Petkov
  2018-05-18 16:29                 ` Nadav Amit
  0 siblings, 1 reply; 39+ messages in thread
From: Borislav Petkov @ 2018-05-18 15:53 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Josh Poimboeuf, Linus Torvalds,
	Andy Lutomirski, Peter Zijlstra, Denys Vlasenko

On Fri, May 18, 2018 at 03:46:33PM +0000, Nadav Amit wrote:
> In case you didn’t read the cover-letter: the patch-set does give a 2%
> performance improvement for #PF-MADV_DONTNEED microbenchmark loop.

I saw it but *micro*-benchmark doesn't tell me a whole lot. If that
"improvement" is not visible in real workloads/benchmarks, then I'm just
as unimpressed.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18  7:58   ` Peter Zijlstra
  2018-05-18  8:13     ` Ingo Molnar
  2018-05-18 14:22     ` Nadav Amit
@ 2018-05-18 16:24     ` Linus Torvalds
  2018-05-18 17:24       ` Nadav Amit
  2 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2018-05-18 16:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: namit, Linux Kernel Mailing List, the arch/x86 maintainers,
	Nadav Amit, Thomas Gleixner, Ingo Molnar, Peter Anvin,
	Josh Poimboeuf

On Fri, May 18, 2018 at 12:59 AM Peter Zijlstra <peterz@infradead.org>
wrote:

> This is an awesome hack, but is there really nothing we can do to make
> it more readable? Esp, that global asm doing the macro definition is a
> pain to read.

I actually find that macro to be *more* legible than what we do now,
although I'm not enamored with the pseudo-operation name ("__BUG_FLAGS").

That said, the C header code itself I don't love.

I wonder if we should just introduce a new assembler header file, and get
it included when processing compiler-generated asm. We already do that for
our _real_ *.S files, with a number of our header files having constants
and code for the asm case too, not just C.

But we could have an <asm/asm-macro.h> header file that has these kinds of
macros (or "pseudo-instructions") for assembly language cases, and then we
could just rely on them in inline asm.

Because if you want to see illegible, look at what we currently generate:

     # kernel/exit.c:1761:       BUG();
     #APP
     # 1761 "kernel/exit.c" 1
         1:      .byte 0x0f, 0x0b
     .pushsection __bug_table,"aw"
     2:  .long 1b - 2b   # bug_entry::bug_addr
         .long .LC0 - 2b # bug_entry::file       #
         .word 1761      # bug_entry::line       #
         .word 0 # bug_entry::flags      #
         .org 2b+12      #
     .popsection
     # 0 "" 2
     # 1761 "kernel/exit.c" 1
         180:    #
         .pushsection .discard.unreachable
         .long 180b - .  #
         .popsection

     # 0 "" 2
     #NO_APP

and tell me that's legible.. Of course, I'm probably one of the few people
who actually look at the generated asm fairly regularly.

So a few macros that we can use in inline asm definitely wouldn't hurt
legibility. And if we actually can put them in a header file as legible
code - instead of having to wrap them in a global "asm()" macro in C code,
they'd probably be legible at a source level too.

It's not just the bug_flags cases. It's things like jump labels too:

     # ./arch/x86/include/asm/jump_label.h:36:   asm_volatile_goto("1:"
     #APP
     # 36 "./arch/x86/include/asm/jump_label.h" 1
         1:.byte 0x0f,0x1f,0x44,0x00,0
         .pushsection __jump_table,  "aw"
          .balign 8
          .quad 1b, .L71, __tracepoint_sched_process_free+8 + 0  #,,
         .popsection

     # 0 "" 2
     #NO_APP

and atomics:

     # ./arch/x86/include/asm/atomic.h:122:      GEN_UNARY_RMWcc(LOCK_PREFIX
"decl", v->counter, "%0", e);
     #APP
     # 122 "./arch/x86/include/asm/atomic.h" 1
         .pushsection .smp_locks,"a"
     .balign 4
     .long 671f - .
     .popsection
     671:
         lock; decl -2336(%rbp)  # _7->counter
         /* output condition code e*/

     # 0 "" 2
     # ./include/linux/sched/task.h:95:  if (atomic_dec_and_test(&t->usage))
     #NO_APP

where I suspect we could hide the whole "lock" magic in a macro, and make
this much more legible.

Maybe? I think it might be worth trying. It's possible that the macro games
themselves would just cause enough pain to make any gains go away.

                    Linus

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 15:53               ` Borislav Petkov
@ 2018-05-18 16:29                 ` Nadav Amit
  2018-05-18 17:41                   ` Boris Petkov
  0 siblings, 1 reply; 39+ messages in thread
From: Nadav Amit @ 2018-05-18 16:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Josh Poimboeuf, Linus Torvalds,
	Andy Lutomirski, Peter Zijlstra, Denys Vlasenko

Borislav Petkov <bp@alien8.de> wrote:

> On Fri, May 18, 2018 at 03:46:33PM +0000, Nadav Amit wrote:
>> In case you didn’t read the cover-letter: the patch-set does give a 2%
>> performance improvement for #PF-MADV_DONTNEED microbenchmark loop.
> 
> I saw it but *micro*-benchmark doesn't tell me a whole lot. If that
> "improvement" is not visible in real workloads/benchmarks, then I'm just
> as unimpressed.

Funny. I found in my mailbox that you once wrote me: "It is a dumb idea, it
doesn't bring us anything besides some superficial readability which you
don't really need.”

To the point, I think you exaggerate with the effect of the patch on the
code readability: it is not much uglier than it was before, and the change
is in a very specific point in the code.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 16:24     ` Linus Torvalds
@ 2018-05-18 17:24       ` Nadav Amit
  2018-05-18 18:25         ` Linus Torvalds
  0 siblings, 1 reply; 39+ messages in thread
From: Nadav Amit @ 2018-05-18 17:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Linux Kernel Mailing List,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Peter Anvin, Josh Poimboeuf

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, May 18, 2018 at 12:59 AM Peter Zijlstra <peterz@infradead.org>
> wrote:
> 
>> This is an awesome hack, but is there really nothing we can do to make
>> it more readable? Esp, that global asm doing the macro definition is a
>> pain to read.
> 
> I actually find that macro to be *more* legible than what we do now,
> although I'm not enamored with the pseudo-operation name ("__BUG_FLAGS").
> 
> That said, the C header code itself I don't love.
> 
> I wonder if we should just introduce a new assembler header file, and get
> it included when processing compiler-generated asm. We already do that for
> our _real_ *.S files, with a number of our header files having constants
> and code for the asm case too, not just C.
> 
> But we could have an <asm/asm-macro.h> header file that has these kinds of
> macros (or "pseudo-instructions") for assembly language cases, and then we
> could just rely on them in inline asm.

Will it be ok just to use a global inline asm to set an “.include” directive
that gas would later process? (I can probably wrap it in a C macro so it
won’t be too disgusting)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 16:29                 ` Nadav Amit
@ 2018-05-18 17:41                   ` Boris Petkov
  0 siblings, 0 replies; 39+ messages in thread
From: Boris Petkov @ 2018-05-18 17:41 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, x86, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Josh Poimboeuf, Linus Torvalds,
	Andy Lutomirski, Peter Zijlstra, Denys Vlasenko

On May 18, 2018 6:29:59 PM GMT+02:00, Nadav Amit <namit@vmware.com> wrote:
>Funny. I found in my mailbox that you once wrote me: "It is a dumb
>idea, it
>doesn't bring us anything besides some superficial readability which
>you
>don't really need.”

How about a proper quotation with the Message-id you're referring to?


-- 
Sent from a small device: formatting sux and brevity is inevitable. 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 14:22     ` Nadav Amit
@ 2018-05-18 17:52       ` Joe Perches
  0 siblings, 0 replies; 39+ messages in thread
From: Joe Perches @ 2018-05-18 17:52 UTC (permalink / raw)
  To: Nadav Amit, Peter Zijlstra
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Josh Poimboeuf

On Fri, 2018-05-18 at 14:22 +0000, Nadav Amit wrote:
> It is hard to make the code readable in C, readable in the generated asm,
> and to follow the coding style imposed by checkpatch (e.g., no space between
> the newline and the asm argument before it).

Ignore checkpatch when you know better.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 17:24       ` Nadav Amit
@ 2018-05-18 18:25         ` Linus Torvalds
  2018-05-18 18:33           ` hpa
  0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2018-05-18 18:25 UTC (permalink / raw)
  To: namit
  Cc: Peter Zijlstra, Linux Kernel Mailing List,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Peter Anvin, Josh Poimboeuf

On Fri, May 18, 2018 at 10:24 AM Nadav Amit <namit@vmware.com> wrote:

> Will it be ok just to use a global inline asm to set an “.include”
directive
> that gas would later process? (I can probably wrap it in a C macro so it
> won’t be too disgusting)

Maybe. I'd almost prefer it to be the automatic kind of thing that we
already do for C files using "-include".

                  Linus

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 18:25         ` Linus Torvalds
@ 2018-05-18 18:33           ` hpa
  2018-05-18 18:50             ` Linus Torvalds
  0 siblings, 1 reply; 39+ messages in thread
From: hpa @ 2018-05-18 18:33 UTC (permalink / raw)
  To: Linus Torvalds, namit
  Cc: Peter Zijlstra, Linux Kernel Mailing List,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Josh Poimboeuf

On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Fri, May 18, 2018 at 10:24 AM Nadav Amit <namit@vmware.com> wrote:
>
>> Will it be ok just to use a global inline asm to set an “.include”
>directive
>> that gas would later process? (I can probably wrap it in a C macro so
>it
>> won’t be too disgusting)
>
>Maybe. I'd almost prefer it to be the automatic kind of thing that we
>already do for C files using "-include".
>
>                  Linus

Unfortunately gcc doesn't guarantee that global assembly inlines will appear at the top of the file.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 18:33           ` hpa
@ 2018-05-18 18:50             ` Linus Torvalds
  2018-05-18 18:53               ` hpa
  0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2018-05-18 18:50 UTC (permalink / raw)
  To: Peter Anvin
  Cc: namit, Peter Zijlstra, Linux Kernel Mailing List,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Josh Poimboeuf

On Fri, May 18, 2018 at 11:34 AM <hpa@zytor.com> wrote:

> On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
torvalds@linux-foundation.org> wrote:

> Unfortunately gcc doesn't guarantee that global assembly inlines will
appear at the top of the file.

Yeah. It really would be better to do the "asm version of -inline".

We already do something like that for real *.S files on some architectures
(because their assembly really wants it, eg

  arch/arm/Makefile:
    KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y) -include
asm/unified.h -msoft-float

but I do want to point out that KBUILD_AFLAGS is *not* used for
compiler-generated assembly, only for actual *.S files.

Sadly, I don't actually know any way to make gcc call the 'as' phase with
particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
assembler, but there is no "-include" option for GNU as afaik.

Can you perhaps define a macro symbol for "--defsym"? Probably not.

                       Linus

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 18:50             ` Linus Torvalds
@ 2018-05-18 18:53               ` hpa
  2018-05-18 19:02                 ` Nadav Amit
  0 siblings, 1 reply; 39+ messages in thread
From: hpa @ 2018-05-18 18:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: namit, Peter Zijlstra, Linux Kernel Mailing List,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Josh Poimboeuf

On May 18, 2018 11:50:12 AM PDT, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Fri, May 18, 2018 at 11:34 AM <hpa@zytor.com> wrote:
>
>> On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
>torvalds@linux-foundation.org> wrote:
>
>> Unfortunately gcc doesn't guarantee that global assembly inlines will
>appear at the top of the file.
>
>Yeah. It really would be better to do the "asm version of -inline".
>
>We already do something like that for real *.S files on some
>architectures
>(because their assembly really wants it, eg
>
>  arch/arm/Makefile:
>KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y)
>-include
>asm/unified.h -msoft-float
>
>but I do want to point out that KBUILD_AFLAGS is *not* used for
>compiler-generated assembly, only for actual *.S files.
>
>Sadly, I don't actually know any way to make gcc call the 'as' phase
>with
>particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
>assembler, but there is no "-include" option for GNU as afaik.
>
>Can you perhaps define a macro symbol for "--defsym"? Probably not.
>
>                       Linus

I looked at this thing a long time ago; it's not there, and the best would probably be to get global asm() working properly in gcc.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 18:53               ` hpa
@ 2018-05-18 19:02                 ` Nadav Amit
  2018-05-18 19:05                   ` hpa
  2018-05-18 19:11                   ` Linus Torvalds
  0 siblings, 2 replies; 39+ messages in thread
From: Nadav Amit @ 2018-05-18 19:02 UTC (permalink / raw)
  To: hpa
  Cc: Linus Torvalds, Peter Zijlstra, Linux Kernel Mailing List,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Josh Poimboeuf

hpa@zytor.com wrote:

> On May 18, 2018 11:50:12 AM PDT, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> On Fri, May 18, 2018 at 11:34 AM <hpa@zytor.com> wrote:
>> 
>>> On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
>> torvalds@linux-foundation.org> wrote:
>> 
>>> Unfortunately gcc doesn't guarantee that global assembly inlines will
>> appear at the top of the file.
>> 
>> Yeah. It really would be better to do the "asm version of -inline".
>> 
>> We already do something like that for real *.S files on some
>> architectures
>> (because their assembly really wants it, eg
>> 
>> arch/arm/Makefile:
>> KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y)
>> -include
>> asm/unified.h -msoft-float
>> 
>> but I do want to point out that KBUILD_AFLAGS is *not* used for
>> compiler-generated assembly, only for actual *.S files.
>> 
>> Sadly, I don't actually know any way to make gcc call the 'as' phase
>> with
>> particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
>> assembler, but there is no "-include" option for GNU as afaik.
>> 
>> Can you perhaps define a macro symbol for "--defsym"? Probably not.
>> 
>>                      Linus
> 
> I looked at this thing a long time ago; it's not there, and the best would probably be to get global asm() working properly in gcc.

I can add a -Wa,[filename.s] switch. It works, but sort of undocumented.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 19:02                 ` Nadav Amit
@ 2018-05-18 19:05                   ` hpa
  2018-05-18 19:11                   ` Linus Torvalds
  1 sibling, 0 replies; 39+ messages in thread
From: hpa @ 2018-05-18 19:05 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Linus Torvalds, Peter Zijlstra, Linux Kernel Mailing List,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Josh Poimboeuf

On May 18, 2018 12:02:50 PM PDT, Nadav Amit <namit@vmware.com> wrote:
>hpa@zytor.com wrote:
>
>> On May 18, 2018 11:50:12 AM PDT, Linus Torvalds
><torvalds@linux-foundation.org> wrote:
>>> On Fri, May 18, 2018 at 11:34 AM <hpa@zytor.com> wrote:
>>> 
>>>> On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
>>> torvalds@linux-foundation.org> wrote:
>>> 
>>>> Unfortunately gcc doesn't guarantee that global assembly inlines
>will
>>> appear at the top of the file.
>>> 
>>> Yeah. It really would be better to do the "asm version of -inline".
>>> 
>>> We already do something like that for real *.S files on some
>>> architectures
>>> (because their assembly really wants it, eg
>>> 
>>> arch/arm/Makefile:
>>> KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y)
>>> -include
>>> asm/unified.h -msoft-float
>>> 
>>> but I do want to point out that KBUILD_AFLAGS is *not* used for
>>> compiler-generated assembly, only for actual *.S files.
>>> 
>>> Sadly, I don't actually know any way to make gcc call the 'as' phase
>>> with
>>> particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
>>> assembler, but there is no "-include" option for GNU as afaik.
>>> 
>>> Can you perhaps define a macro symbol for "--defsym"? Probably not.
>>> 
>>>                      Linus
>> 
>> I looked at this thing a long time ago; it's not there, and the best
>would probably be to get global asm() working properly in gcc.
>
>I can add a -Wa,[filename.s] switch. It works, but sort of
>undocumented.

Ah, I thought that would have assembled each file separately. We can request it be documented, that's usually much easier than getting a new feature implemented.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 19:02                 ` Nadav Amit
  2018-05-18 19:05                   ` hpa
@ 2018-05-18 19:11                   ` Linus Torvalds
  2018-05-18 19:18                     ` Nadav Amit
  1 sibling, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2018-05-18 19:11 UTC (permalink / raw)
  To: namit
  Cc: Peter Anvin, Peter Zijlstra, Linux Kernel Mailing List,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Josh Poimboeuf

On Fri, May 18, 2018 at 12:02 PM Nadav Amit <namit@vmware.com> wrote:

> I can add a -Wa,[filename.s] switch. It works, but sort of undocumented.

Oh, if it assembles things together, then that sounds optimal.

And yes, like hpa says, we should make sure that behavior is acknowledged
by the GNU as people, so that they then don't come back and say "hey, now
we assemble things as separate units".

That said, the "separate units" model really doesn't make much sense now
that I think about it, because 'as' supports only one output file. So I
guess the as people wouldn't have any issues with just accepting the
"concatenate all input files" syntax.

              Linus

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 19:11                   ` Linus Torvalds
@ 2018-05-18 19:18                     ` Nadav Amit
  2018-05-18 19:21                       ` Linus Torvalds
  0 siblings, 1 reply; 39+ messages in thread
From: Nadav Amit @ 2018-05-18 19:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Anvin, Peter Zijlstra, Linux Kernel Mailing List,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Josh Poimboeuf

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, May 18, 2018 at 12:02 PM Nadav Amit <namit@vmware.com> wrote:
> 
>> I can add a -Wa,[filename.s] switch. It works, but sort of undocumented.
> 
> Oh, if it assembles things together, then that sounds optimal.
> 
> And yes, like hpa says, we should make sure that behavior is acknowledged
> by the GNU as people, so that they then don't come back and say "hey, now
> we assemble things as separate units".
> 
> That said, the "separate units" model really doesn't make much sense now
> that I think about it, because 'as' supports only one output file. So I
> guess the as people wouldn't have any issues with just accepting the
> "concatenate all input files" syntax.

Gnu ASM manual says: "Each time you run as it assembles exactly one source
program. The source program is made up of one or more files.”

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 19:18                     ` Nadav Amit
@ 2018-05-18 19:21                       ` Linus Torvalds
  2018-05-18 19:22                         ` hpa
  0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2018-05-18 19:21 UTC (permalink / raw)
  To: namit
  Cc: Peter Anvin, Peter Zijlstra, Linux Kernel Mailing List,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Josh Poimboeuf

On Fri, May 18, 2018 at 12:18 PM Nadav Amit <namit@vmware.com> wrote:

> Gnu ASM manual says: "Each time you run as it assembles exactly one source
> program. The source program is made up of one or more files.”

Ok, that counts as documentation, although it's confusing as hell. Is it
one source program or multiple files again? I see what they are trying to
say, but it really could be phrased more clearly ;)

                     Linus

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 19:21                       ` Linus Torvalds
@ 2018-05-18 19:22                         ` hpa
  2018-05-18 19:36                           ` Nadav Amit
  0 siblings, 1 reply; 39+ messages in thread
From: hpa @ 2018-05-18 19:22 UTC (permalink / raw)
  To: Linus Torvalds, namit
  Cc: Peter Zijlstra, Linux Kernel Mailing List,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Josh Poimboeuf

On May 18, 2018 12:21:00 PM PDT, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Fri, May 18, 2018 at 12:18 PM Nadav Amit <namit@vmware.com> wrote:
>
>> Gnu ASM manual says: "Each time you run as it assembles exactly one
>source
>> program. The source program is made up of one or more files.”
>
>Ok, that counts as documentation, although it's confusing as hell. Is
>it
>one source program or multiple files again? I see what they are trying
>to
>say, but it really could be phrased more clearly ;)
>
>                     Linus

At least we have a solution!
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 19:22                         ` hpa
@ 2018-05-18 19:36                           ` Nadav Amit
  2018-05-18 19:41                             ` hpa
  0 siblings, 1 reply; 39+ messages in thread
From: Nadav Amit @ 2018-05-18 19:36 UTC (permalink / raw)
  To: hpa
  Cc: Linus Torvalds, Peter Zijlstra, Linux Kernel Mailing List,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Josh Poimboeuf

hpa@zytor.com wrote:

> On May 18, 2018 12:21:00 PM PDT, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> On Fri, May 18, 2018 at 12:18 PM Nadav Amit <namit@vmware.com> wrote:
>> 
>>> Gnu ASM manual says: "Each time you run as it assembles exactly one
>> source
>>> program. The source program is made up of one or more files.”
>> 
>> Ok, that counts as documentation, although it's confusing as hell. Is
>> it
>> one source program or multiple files again? I see what they are trying
>> to
>> say, but it really could be phrased more clearly ;)
>> 
>>                    Linus
> 
> At least we have a solution!

Thanks for all your help. I will try to do it over over the weekend.

Funny, I see that this “trick” (-Wa,[filename.s]) is already being used to
include arch/x86/boot/code16gcc.h so I feel “safer” to use it.

Regards,
Nadav

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] x86: bug: prevent gcc distortions
  2018-05-18 19:36                           ` Nadav Amit
@ 2018-05-18 19:41                             ` hpa
  0 siblings, 0 replies; 39+ messages in thread
From: hpa @ 2018-05-18 19:41 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Linus Torvalds, Peter Zijlstra, Linux Kernel Mailing List,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Josh Poimboeuf

On May 18, 2018 12:36:20 PM PDT, Nadav Amit <namit@vmware.com> wrote:
>hpa@zytor.com wrote:
>
>> On May 18, 2018 12:21:00 PM PDT, Linus Torvalds
><torvalds@linux-foundation.org> wrote:
>>> On Fri, May 18, 2018 at 12:18 PM Nadav Amit <namit@vmware.com>
>wrote:
>>> 
>>>> Gnu ASM manual says: "Each time you run as it assembles exactly one
>>> source
>>>> program. The source program is made up of one or more files.”
>>> 
>>> Ok, that counts as documentation, although it's confusing as hell.
>Is
>>> it
>>> one source program or multiple files again? I see what they are
>trying
>>> to
>>> say, but it really could be phrased more clearly ;)
>>> 
>>>                    Linus
>> 
>> At least we have a solution!
>
>Thanks for all your help. I will try to do it over over the weekend.
>
>Funny, I see that this “trick” (-Wa,[filename.s]) is already being used
>to
>include arch/x86/boot/code16gcc.h so I feel “safer” to use it.
>
>Regards,
>Nadav

Oh. I don't remember when we did that... might even be my doing and then I just forgot about it :)
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/6] x86: refcount: prevent gcc distortions
  2018-05-17 16:14 ` [PATCH 5/6] x86: refcount: prevent gcc distortions Nadav Amit
@ 2018-05-19  4:27   ` kbuild test robot
  0 siblings, 0 replies; 39+ messages in thread
From: kbuild test robot @ 2018-05-19  4:27 UTC (permalink / raw)
  To: Nadav Amit
  Cc: kbuild-all, linux-kernel, x86, nadav.amit, Nadav Amit,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Kees Cook,
	Jan Beulich, Josh Poimboeuf

[-- Attachment #1: Type: text/plain, Size: 1239 bytes --]

Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v4.17-rc5 next-20180517]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-objtool-use-asm-macro-for-better-compiler-decisions/20180519-045439
config: x86_64-allyesdebian (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   arch/x86/include/asm/refcount.h: Assembler messages:
>> arch/x86/include/asm/refcount.h:67: Error: too many positional arguments

vim +67 arch/x86/include/asm/refcount.h

    63	
    64	static __always_inline void refcount_inc(refcount_t *r)
    65	{
    66		asm volatile(LOCK_PREFIX "incl %0\n\t"
  > 67			"__REFCOUNT_CHECK_LT_ZERO %[counter]"
    68			: [counter] "+m" (r->refs.counter)
    69			: : "cc", "cx");
    70	}
    71	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 39420 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2018-05-19  4:28 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-17 16:13 [PATCH 0/6] Macrofying inline assembly for better compilation Nadav Amit
2018-05-17 16:13 ` Nadav Amit
2018-05-17 16:13 ` [PATCH 1/6] x86: objtool: use asm macro for better compiler decisions Nadav Amit
2018-05-17 16:13   ` Nadav Amit
2018-05-17 16:13 ` [PATCH 2/6] x86: bug: prevent gcc distortions Nadav Amit
2018-05-18  7:58   ` Peter Zijlstra
2018-05-18  8:13     ` Ingo Molnar
2018-05-18 10:11       ` Borislav Petkov
2018-05-18 14:36         ` Nadav Amit
2018-05-18 15:40           ` Borislav Petkov
2018-05-18 15:46             ` Nadav Amit
2018-05-18 15:53               ` Borislav Petkov
2018-05-18 16:29                 ` Nadav Amit
2018-05-18 17:41                   ` Boris Petkov
2018-05-18 14:30       ` Nadav Amit
2018-05-18 14:22     ` Nadav Amit
2018-05-18 17:52       ` Joe Perches
2018-05-18 16:24     ` Linus Torvalds
2018-05-18 17:24       ` Nadav Amit
2018-05-18 18:25         ` Linus Torvalds
2018-05-18 18:33           ` hpa
2018-05-18 18:50             ` Linus Torvalds
2018-05-18 18:53               ` hpa
2018-05-18 19:02                 ` Nadav Amit
2018-05-18 19:05                   ` hpa
2018-05-18 19:11                   ` Linus Torvalds
2018-05-18 19:18                     ` Nadav Amit
2018-05-18 19:21                       ` Linus Torvalds
2018-05-18 19:22                         ` hpa
2018-05-18 19:36                           ` Nadav Amit
2018-05-18 19:41                             ` hpa
2018-05-17 16:13 ` [PATCH 3/6] x86: alternative: macrofy locks for better inlining Nadav Amit
2018-05-17 16:14 ` [PATCH 4/6] x86: prevent inline distortion by paravirt ops Nadav Amit
2018-05-17 16:14 ` [PATCH 5/6] x86: refcount: prevent gcc distortions Nadav Amit
2018-05-19  4:27   ` kbuild test robot
2018-05-17 16:14 ` [PATCH 6/6] x86: removing unneeded new-lines Nadav Amit
2018-05-18  9:20 ` [PATCH 0/6] Macrofying inline assembly for better compilation David Laight
2018-05-18  9:20 ` David Laight
2018-05-18 14:15   ` Nadav Amit

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.