All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nick Desaulniers <ndesaulniers@google.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Bill Wendling <morbo@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,  Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	 "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
	<x86@kernel.org>, Nathan Chancellor <nathan@kernel.org>,
	Juergen Gross <jgross@suse.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	Andy Lutomirski <luto@kernel.org>,
	llvm@lists.linux.dev,  LKML <linux-kernel@vger.kernel.org>,
	linux-toolchains@vger.kernel.org,
	 Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v5] x86: use builtins to read eflags
Date: Thu, 17 Mar 2022 11:00:11 -0700	[thread overview]
Message-ID: <CAKwvOdkk-C8HMemKs4+yoxvNDgTLmvZG1rmwjVXBqhsQ-cED5g@mail.gmail.com> (raw)
In-Reply-To: <F5296439-4CA3-4F31-BD91-5ED1510BC382@zytor.com>

On Thu, Mar 17, 2022 at 8:43 AM H. Peter Anvin <hpa@zytor.com> wrote:
>
> On March 15, 2022 12:19:40 AM PDT, Bill Wendling <morbo@google.com> wrote:
> >On Mon, Mar 14, 2022 at 6:08 PM H. Peter Anvin <hpa@zytor.com> wrote:
> >>
> >> My only concern is this: how does this affect the placement of these sequences in relation to the things they need to protect?
> >>
> >With clang (and I assume it's similar with gcc), the
> >__builtin_ia32_readeflags_u{64|32} builtins specify that the EFLAGS
> >register is used (and the stack pointer is modified). Instructions
> >that may change EFLAGS are marked as defining EFLAGS, and clang and
> >gcc are careful not to move instructions that access EFLAGS past them.
> >
> >One change you may see due to this patch is the compiler moving the
> >"pop %..." instruction away from the "pushf" instruction. This could
> >happen if the compiler determines that it could produce better code by
> >doing so---e.g. to reduce register pressure. The "gcc -O2" example
> >below shows this code movement.
> >
> >-bw
> >
> >> On March 14, 2022 4:07:23 PM PDT, Bill Wendling <morbo@google.com> wrote:
> >>>
> >>> On Tue, Mar 1, 2022 at 12:19 PM Bill Wendling <morbo@google.com> wrote:
> >>>>
> >>>>
> >>>>  Clang generates good code when the builtins are used. On one benchmark,
> >>>>  a hotspot in kmem_cache_free went from using 5.18% of cycles popping to
> >>>>  a memory address to 0.13% popping to a register. This benefit is
> >>>>  magnified given that this code is inlined in numerous places in the
> >>>>  kernel.
> >>>>
> >>>>  The builtins also help GCC. It allows GCC (and Clang) to reduce register
> >>>>  pressure and, consequently, register spills by rescheduling
> >>>>  instructions. It can't happen with instructions in inline assembly,
> >>>>  because compilers view inline assembly blocks as "black boxes," whose
> >>>>  instructions can't be rescheduled.
> >>>>
> >>>>  Another benefit of builtins over asm blocks is that compilers are able
> >>>>  to make more precise inlining decisions, since they no longer need to
> >>>>  rely on imprecise measures based on newline counts.
> >>>>
> >>>>  A trivial example demonstrates this code motion.
> >>>>
> >>>>          void y(void);
> >>>>          unsigned long x(void) {
> >>>>                  unsigned long v = __builtin_ia32_readeflags_u64();
> >>>>                  y();
> >>>>                  return v;
> >>>>          }
> >>>>
> >>>>  GCC at -O1:
> >>>>          pushq   %rbx
> >>>>          pushfq
> >>>>          popq    %rbx
> >>>>          movl    $0, %eax
> >>>>          call    y
> >>>>          movq    %rbx, %rax
> >>>>          popq    %rbx
> >>>>          ret
> >>>>
> >>>>  GCC at -O2:
> >>>>          pushq   %r12
> >>>>          pushfq
> >>>>          xorl    %eax, %eax
> >>>>          popq    %r12
> >>>>          call    y
> >>>>          movq    %r12, %rax
> >>>>          popq    %r12
> >>>>          ret
> >>>>
> EFLAGS is a mishmash of different things, only some of which are visible to the compiler, and the semantics of which are totally different.
>
> Changing IF, changing DF, changing AC, and changing the arithmetic flags have *enormously* different properties. The compiler can't know how the semantics of a particular instance is, at least without being explicitly told (giving it a some kind of mask of bits that could change.)

EFLAGS is defined as the lower 16 bits of FLAGS register, yeah? Aren't
IF, DF, and AC all within those range of bits?  Yes; there's no way to
express finer grain resolution that you only care to read/write an
individual bitfield and not the whole register; EFLAGS is the finest
range.  Even if the compiler determined via the use of the results of
reading eflags that only a particular bitfield was needed, I'm pretty
sure there's no instructions in X86 for reading these individual
fields.  So I don't understand your point; the finest grain resolution
the compiler has to work with is the whole EFLAGS register, not
individual bits like IF, DF, or AC.  (I triple checked this with the
maintainer of LLVM's x86 backend).

> The memory barrier is needed for IF changes, for example.

Shouldn't native_irq_disable/native_irq_enable be declaring that they
clobber "cc" then?
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers-and-Scratch-Registers

> This feels like "let's fix LLVM by hacking the kernel in dangerous ways" once again!

Even if the behavior of llvm was changed, you'd _still_ get better
codegen for _both_ compilers by using the intrinsics.

Another issue is that when multiple constraints are used, the order
they're selected in is unspecified.
https://gcc.gnu.org/onlinedocs/gcc/Constraints.html#Constraints if you
want to look.
I could ask GCC folks if they want to specify that behavior; doing so
does take away some freedom from implementations, so they may choose
not to.

> We rely on "memory" as compiler barriers *all over the place*.

Ah, your concern is about removing the explicit "memory"
clobber/compiler barrier. Now you're earlier question "how does this
affect the placement of these sequences in relation to the things they
need to protect?" makes sense.  Is the concern a call to
native_save_fl() being re-ordered with respect to a call to
native_irq_disable()/native_irq_enable()? Or do you have a call site
of native_save_fl() that you have concrete concerns about?

> This is especially so since this appears to be a suboptimal code generation issue and not a correctness issue; your change is likely to promote the former (underoptimizing) to the latter (overoptimizing.)

How so? Are you going to ignore the example posted above (`x()`)? Can
you help me understand via code and disassembly that can demonstrate
your claim?
-- 
Thanks,
~Nick Desaulniers

  reply	other threads:[~2022-03-17 18:00 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-15 21:18 [PATCH] x86: use builtins to read eflags Bill Wendling
2021-12-15 22:46 ` Nathan Chancellor
2021-12-15 23:26 ` Peter Zijlstra
2021-12-16 20:00   ` Bill Wendling
2021-12-16 20:07     ` Nick Desaulniers
2021-12-16  0:57 ` Thomas Gleixner
2021-12-16 19:55   ` Bill Wendling
2021-12-17 12:48     ` Peter Zijlstra
2021-12-17 19:39     ` Thomas Gleixner
2022-03-14 23:09     ` H. Peter Anvin
2022-03-15  0:08       ` Bill Wendling
2021-12-16 19:58   ` Nick Desaulniers
2021-12-29  2:12 ` [PATCH v2] " Bill Wendling
2022-01-27 20:56   ` Bill Wendling
2022-02-04  0:16   ` Thomas Gleixner
2022-02-04  0:58     ` Bill Wendling
2022-02-04  0:57   ` [PATCH v3] " Bill Wendling
2022-02-07 22:11     ` Nick Desaulniers
2022-02-08  9:14       ` David Laight
2022-02-08 23:18         ` Bill Wendling
2022-02-14 23:53         ` Nick Desaulniers
2022-02-10 22:31     ` [PATCH v4] " Bill Wendling
2022-02-11 16:40       ` David Laight
2022-02-11 19:25         ` Bill Wendling
2022-02-11 22:09           ` David Laight
2022-02-11 23:33             ` Bill Wendling
2022-02-12  0:24           ` Nick Desaulniers
2022-02-12  9:23             ` Bill Wendling
2022-02-15  0:33               ` Nick Desaulniers
2022-03-01 20:19       ` [PATCH v5] " Bill Wendling
2022-03-14 23:07         ` Bill Wendling
     [not found]           ` <AC3D873E-A28B-41F1-8BF4-2F6F37BCEEB4@zytor.com>
2022-03-15  7:19             ` Bill Wendling
2022-03-17 15:43               ` H. Peter Anvin
2022-03-17 18:00                 ` Nick Desaulniers [this message]
2022-03-17 18:52                   ` Linus Torvalds
2022-03-17 19:45                     ` Bill Wendling
2022-03-17 20:13                       ` Linus Torvalds
2022-03-17 21:10                         ` Bill Wendling
2022-03-17 21:21                           ` Linus Torvalds
2022-03-17 21:45                             ` Bill Wendling
2022-03-17 22:51                               ` Linus Torvalds
2022-03-17 23:14                                 ` Linus Torvalds
2022-03-17 23:19                                 ` Segher Boessenkool
2022-03-17 23:31                                   ` Linus Torvalds
2022-03-18  0:05                                     ` Segher Boessenkool
2022-03-17 22:37                       ` Segher Boessenkool
2022-03-17 20:13                     ` Florian Weimer
2022-03-17 20:36                       ` Linus Torvalds
2022-03-18  0:25                         ` Segher Boessenkool
2022-03-18  1:21                           ` Linus Torvalds
2022-03-18  1:50                             ` Linus Torvalds
2022-03-17 21:05                     ` Andrew Cooper
2022-03-17 21:39                       ` Linus Torvalds
2022-03-18 17:59                         ` Andy Lutomirski
2022-03-18 18:19                           ` Linus Torvalds
2022-03-18 21:48                             ` Andrew Cooper
2022-03-18 23:10                               ` Linus Torvalds
2022-03-18 23:42                                 ` Segher Boessenkool
2022-03-19  1:13                                   ` Linus Torvalds
2022-03-19 23:15                                   ` Andy Lutomirski
2022-03-18 22:09                             ` Segher Boessenkool
2022-03-18 22:33                               ` H. Peter Anvin
2022-03-18 22:36                               ` David Laight
2022-03-18 22:47                                 ` H. Peter Anvin
2022-03-18 22:43                             ` David Laight
2022-03-18 23:03                               ` H. Peter Anvin
2022-03-18 23:04                         ` Segher Boessenkool
2022-03-18 23:52                           ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKwvOdkk-C8HMemKs4+yoxvNDgTLmvZG1rmwjVXBqhsQ-cED5g@mail.gmail.com \
    --to=ndesaulniers@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-toolchains@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=morbo@google.com \
    --cc=nathan@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.