From: Nadav Amit <namit@vmware.com>
To: <linux-kernel@vger.kernel.org>, <x86@kernel.org>
Cc: <nadav.amit@gmail.com>, Nadav Amit <namit@vmware.com>,
	Alok Kataria <akataria@vmware.com>,
	Christopher Li <sparse@chrisli.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	Jan Beulich <JBeulich@suse.com>, Jonathan Corbet <corbet@lwn.net>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Juergen Gross <jgross@suse.com>, Kees Cook <keescook@chromium.org>,
	<linux-sparse@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	<virtualization@lists.linux-foundation.org>
Subject: [PATCH 0/6] Macrofying inline assembly for better compilation
Date: Thu, 17 May 2018 09:13:56 -0700
Message-ID: <20180517161402.78089-1-namit@vmware.com>

This patch-set deals with an interesting yet stupid problem: kernel code
that does not get inlined despite its simplicity. There are several
causes for this behavior: the "cold" attribute on __init; different
function optimization levels; conditional constant computations based on
__builtin_constant_p(); and finally large inline assembly blocks.

This patch-set deals with the inline assembly problem. I separated these
patches from the others (that were sent in the RFC) for easier
inclusion.

The problem with inline assembly is that the kernel often uses it for
things other than code, for example assembly directives and data. GCC,
however, is oblivious to the content of the blocks and assumes their
cost in space and time is proportional to the number of perceived
assembly "instructions", which it derives from the number of newlines
and semicolons. Alternatives, paravirt and other mechanisms are
affected, causing code not to be inlined and degrading compilation
quality in general.

The solution that this patch-set carries for this problem is to create
an assembly macro, and then call it from the inline assembly block. As a
result, the compiler sees a single "instruction" and assigns a more
appropriate cost to the code.

In addition, this patch-set removes unneeded new-lines from common x86
inline asm's, which "confuse" GCC heuristics.

Overall this patch-set slightly increases the kernel size (my build was
done using my localmodconfig + localyesconfig for the record):

    text     data     bss      dec     hex filename
18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
18148888 10064016 2936832 31149736 1db4ea8 ./vmlinux after (+0.06%)

The patch-set eliminates many of the static text symbols:

	Before:	40033
	After:	39650	(-1%)

A microbenchmark with a loop of page-fault and MADV_DONTNEED shows a 2%
performance improvement with this patch-set (when PTI is off).

Changes from RFC:
 - Better formatting [Jan]
 - i386 build problems [0-day]
 - Inline comments
 - Separating __builtin_constant_p() into a different future patch-set

Cc: Alok Kataria <akataria@vmware.com>
Cc: Christopher Li <sparse@chrisli.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-sparse@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: x86@kernel.org

Nadav Amit (6):
  x86: objtool: use asm macro for better compiler decisions
  x86: bug: prevent gcc distortions
  x86: alternative: macrofy locks for better inlining
  x86: prevent inline distortion by paravirt ops
  x86: refcount: prevent gcc distortions
  x86: removing unneeded new-lines

 arch/x86/include/asm/alternative.h    | 34 +++++++++++---
 arch/x86/include/asm/asm.h            |  4 +-
 arch/x86/include/asm/bug.h            | 56 ++++++++++++++++--------
 arch/x86/include/asm/cmpxchg.h        | 10 ++---
 arch/x86/include/asm/paravirt_types.h | 63 +++++++++++++++++----------
 arch/x86/include/asm/refcount.h       | 62 ++++++++++++++++----------
 arch/x86/include/asm/special_insns.h  | 12 ++---
 include/linux/compiler.h              | 37 ++++++++++++----
 8 files changed, 183 insertions(+), 95 deletions(-)

-- 
2.17.0