From: Bill Wendling <morbo@google.com>
To: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <ndesaulniers@google.com>,
Juergen Gross <jgross@suse.com>,
Peter Zijlstra <peterz@infradead.org>,
Andy Lutomirski <luto@kernel.org>,
llvm@lists.linux.dev
Cc: linux-kernel@vger.kernel.org, Bill Wendling <morbo@google.com>
Subject: [PATCH v5] x86: use builtins to read eflags
Date: Tue, 1 Mar 2022 12:19:03 -0800 [thread overview]
Message-ID: <20220301201903.4113977-1-morbo@google.com> (raw)
In-Reply-To: <20220210223134.233757-1-morbo@google.com>
This issue arose due to Clang's issue with the "=rm" constraint. Clang
chooses to be conservative in these situations and always uses memory
instead of registers, resulting in sub-optimal code. (This is a known
issue, which is currently being addressed.)
This function has gone through numerous changes over the years:
- The original version of this function used the "=g" constraint,
which has the following description:
Any register, memory or immediate integer operand is allowed,
except for registers that are not general registers.
- This was changed to "=r" in commit f1f029c7bfbf ("x86: fix assembly
constraints in native_save_fl()"), because someone noticed that:
the offset of the flags variable from the stack pointer will
change when the pushf is performed. gcc doesn't attempt to
understand that fact, and address used for pop will still be the
same. It will write to somewhere near flags on the stack but not
actually into it and overwrite some other value.
- However, commit f1f029c7bfbf ("x86: fix assembly constraints in
native_save_fl()") was partially reverted in commit ab94fcf528d1
("x86: allow "=rm" in native_save_fl()"), because the original
reporter of the issue was using a broken x86 simulator. The
justification for this change was:
"=rm" is allowed in this context, because "pop" is explicitly
defined to adjust the stack pointer *before* it evaluates its
effective address, if it has one. Thus, we do end up writing to
the correct address even if we use an on-stack memory argument.
Clang generates good code when the builtins are used. On one benchmark,
a hotspot in kmem_cache_free went from using 5.18% of cycles popping to
a memory address to 0.13% popping to a register. This benefit is
magnified given that this code is inlined in numerous places in the
kernel.
The builtins also help GCC. It allows GCC (and Clang) to reduce register
pressure and, consequently, register spills by rescheduling
instructions. It can't happen with instructions in inline assembly,
because compilers view inline assembly blocks as "black boxes," whose
instructions can't be rescheduled.
Another benefit of builtins over asm blocks is that compilers are able
to make more precise inlining decisions, since they no longer need to
rely on imprecise measures based on newline counts.
A trivial example demonstrates this code motion.
void y(void);
unsigned long x(void) {
unsigned long v = __builtin_ia32_readeflags_u64();
y();
return v;
}
GCC at -O1:
pushq %rbx
pushfq
popq %rbx
movl $0, %eax
call y
movq %rbx, %rax
popq %rbx
ret
GCC at -O2:
pushq %r12
pushfq
xorl %eax, %eax
popq %r12
call y
movq %r12, %rax
popq %r12
ret
Link: https://gist.github.com/nickdesaulniers/b4d0f6e26f8cbef0ae4c5352cfeaca67
Link: https://github.com/llvm/llvm-project/issues/20571
Link: https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints
Link: https://godbolt.org/z/5n3Eov1xT
Signed-off-by: Bill Wendling <morbo@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
---
v5: - Incorporate Nick's suggestion to limit the change to Clang >= 14.0 and
GCC.
v4: - Clang now no longer generates stack frames when using these builtins.
- Corrected misspellings.
v3: - Add blurb indicating that GCC's output hasn't changed.
v2: - Kept the original function to retain the out-of-line symbol.
- Improved the commit message.
- Note that I couldn't use Nick's suggestion of
return IS_ENABLED(CONFIG_X86_64) ? ...
because Clang complains about using __builtin_ia32_readeflags_u32 in
64-bit mode.
---
arch/x86/include/asm/irqflags.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 87761396e8cc..2eded855f6ab 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -19,6 +19,11 @@
extern inline unsigned long native_save_fl(void);
extern __always_inline unsigned long native_save_fl(void)
{
+#if defined(CC_IS_CLANG) && defined(UNWINDER_ORC) && CLANG_VERSION < 140000
+ /*
+ * Clang forced frame pointers via the builtins until Clang-14. Use
+ * this as a fall-back until the minimum Clang version is >= 14.0.
+ */
unsigned long flags;
/*
@@ -33,6 +38,11 @@ extern __always_inline unsigned long native_save_fl(void)
: "memory");
return flags;
+#elif defined(CONFIG_X86_64)
+ return __builtin_ia32_readeflags_u64();
+#else
+ return __builtin_ia32_readeflags_u32();
+#endif
}
static __always_inline void native_irq_disable(void)
--
2.35.1.574.g5d30c73bfb-goog
next prev parent reply other threads:[~2022-03-01 20:19 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-15 21:18 [PATCH] x86: use builtins to read eflags Bill Wendling
2021-12-15 22:46 ` Nathan Chancellor
2021-12-15 23:26 ` Peter Zijlstra
2021-12-16 20:00 ` Bill Wendling
2021-12-16 20:07 ` Nick Desaulniers
2021-12-16 0:57 ` Thomas Gleixner
2021-12-16 19:55 ` Bill Wendling
2021-12-17 12:48 ` Peter Zijlstra
2021-12-17 19:39 ` Thomas Gleixner
2022-03-14 23:09 ` H. Peter Anvin
2022-03-15 0:08 ` Bill Wendling
2021-12-16 19:58 ` Nick Desaulniers
2021-12-29 2:12 ` [PATCH v2] " Bill Wendling
2022-01-27 20:56 ` Bill Wendling
2022-02-04 0:16 ` Thomas Gleixner
2022-02-04 0:58 ` Bill Wendling
2022-02-04 0:57 ` [PATCH v3] " Bill Wendling
2022-02-07 22:11 ` Nick Desaulniers
2022-02-08 9:14 ` David Laight
2022-02-08 23:18 ` Bill Wendling
2022-02-14 23:53 ` Nick Desaulniers
2022-02-10 22:31 ` [PATCH v4] " Bill Wendling
2022-02-11 16:40 ` David Laight
2022-02-11 19:25 ` Bill Wendling
2022-02-11 22:09 ` David Laight
2022-02-11 23:33 ` Bill Wendling
2022-02-12 0:24 ` Nick Desaulniers
2022-02-12 9:23 ` Bill Wendling
2022-02-15 0:33 ` Nick Desaulniers
2022-03-01 20:19 ` Bill Wendling [this message]
2022-03-14 23:07 ` [PATCH v5] " Bill Wendling
[not found] ` <AC3D873E-A28B-41F1-8BF4-2F6F37BCEEB4@zytor.com>
2022-03-15 7:19 ` Bill Wendling
2022-03-17 15:43 ` H. Peter Anvin
2022-03-17 18:00 ` Nick Desaulniers
2022-03-17 18:52 ` Linus Torvalds
2022-03-17 19:45 ` Bill Wendling
2022-03-17 20:13 ` Linus Torvalds
2022-03-17 21:10 ` Bill Wendling
2022-03-17 21:21 ` Linus Torvalds
2022-03-17 21:45 ` Bill Wendling
2022-03-17 22:51 ` Linus Torvalds
2022-03-17 23:14 ` Linus Torvalds
2022-03-17 23:19 ` Segher Boessenkool
2022-03-17 23:31 ` Linus Torvalds
2022-03-18 0:05 ` Segher Boessenkool
2022-03-17 22:37 ` Segher Boessenkool
2022-03-17 20:13 ` Florian Weimer
2022-03-17 20:36 ` Linus Torvalds
2022-03-18 0:25 ` Segher Boessenkool
2022-03-18 1:21 ` Linus Torvalds
2022-03-18 1:50 ` Linus Torvalds
2022-03-17 21:05 ` Andrew Cooper
2022-03-17 21:39 ` Linus Torvalds
2022-03-18 17:59 ` Andy Lutomirski
2022-03-18 18:19 ` Linus Torvalds
2022-03-18 21:48 ` Andrew Cooper
2022-03-18 23:10 ` Linus Torvalds
2022-03-18 23:42 ` Segher Boessenkool
2022-03-19 1:13 ` Linus Torvalds
2022-03-19 23:15 ` Andy Lutomirski
2022-03-18 22:09 ` Segher Boessenkool
2022-03-18 22:33 ` H. Peter Anvin
2022-03-18 22:36 ` David Laight
2022-03-18 22:47 ` H. Peter Anvin
2022-03-18 22:43 ` David Laight
2022-03-18 23:03 ` H. Peter Anvin
2022-03-18 23:04 ` Segher Boessenkool
2022-03-18 23:52 ` David Laight
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220301201903.4113977-1-morbo@google.com \
--to=morbo@google.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=jgross@suse.com \
--cc=linux-kernel@vger.kernel.org \
--cc=llvm@lists.linux.dev \
--cc=luto@kernel.org \
--cc=mingo@redhat.com \
--cc=nathan@kernel.org \
--cc=ndesaulniers@google.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.