All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Laight <David.Laight@ACULAB.COM>
To: 'Bill Wendling' <morbo@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"H . Peter Anvin" <hpa@zytor.com>,
	"Nathan Chancellor" <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Juergen Gross <jgross@suse.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Andy Lutomirski" <luto@kernel.org>,
	"llvm@lists.linux.dev" <llvm@lists.linux.dev>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH v4] x86: use builtins to read eflags
Date: Fri, 11 Feb 2022 22:09:58 +0000	[thread overview]
Message-ID: <6b83fa302b974f749c60fc6c456e055f@AcuMS.aculab.com> (raw)
In-Reply-To: <CAGG=3QXvSt=d94iqSV-Y9JVNc+pt-WOZGpSeW--fp=w2ttMvUA@mail.gmail.com>

From: Bill Wendling > Sent: 11 February 2022 19:26
> 
> On Fri, Feb 11, 2022 at 8:40 AM David Laight <David.Laight@aculab.com> wrote:
> > From: Bill Wendling
> > > Sent: 10 February 2022 22:32
> > >
> > > GCC and Clang both have builtins to read and write the EFLAGS register.
> > > This allows the compiler to determine the best way to generate this
> > > code, which can improve code generation.
> > >
> > > This issue arose due to Clang's issue with the "=rm" constraint.  Clang
> > > chooses to be conservative in these situations, and so uses memory
> > > instead of registers. This is a known issue, which is currently being
> > > addressed.
> > >
> > > However, using builtins is beneficial in general, because it removes the
> > > burden of determining what's the way to read the flags register from the
> > > programmer and places it on to the compiler, which has the information
> > > needed to make that decision.
> >
> > Except that neither gcc nor clang attempt to make that decision.
> > They always do pushf; pop ax;
> >
> It looks like both GCC and Clang pop into virtual registers. The
> register allocator is then able to determine if it can allocate a
> physical register or if a stack slot is required.

Doing:
	int fl;
	void f(void) { fl = __builtin_ia32_readeflags_u64(); }
Seems to use register.
If it pops to a virtual register it will probably never pop
into a real target location.

See https://godbolt.org/z/8aY8o8rhe

But performance wise the pop+mov is just one byte longer.
Instruction decode time might be higher for two instruction, but since
'pop mem' generates 2 uops (intel) it may be constrained to the first
decoder (I can't rememberthe exact details), but the separate pop+mov
can be decoded in parallel - so could end up faster.

Actual execution time (if that makes any sense) is really the same.
Two operations, one pop and one memory write.

I bet you'd be hard pressed to find a piece of code where it even made
a consistent difference.

> > ...
> > > v4: - Clang now no longer generates stack frames when using these builtins.
> > >     - Corrected misspellings.
> >
> > While clang 'head' has been fixed, it seems a bit premature to say
> > it is 'fixed' enough for all clang builds to use the builtin.
> >
> True, but it's been cherry-picked into the clang 14.0.0 branch, which
> is scheduled for release in March.
> 
> > Seems better to change it (back) to "=r" and comment that this
> > is currently as good as __builtin_ia32_readeflags_u64() and that
> > clang makes a 'pigs breakfast' of "=rm" - which has only marginal
> > benefit.
> >
> That would be okay as far as code generation is concerned, but it does
> place the burden of correctness back on the programmer. Also, it was
> that at some point, but was changed to "=rm" here. :-)

As I said, a comment should stop the bounce.

...
> I was able to come up with an example where GCC generates "pushf ; pop mem":
> 
>   https://godbolt.org/z/9rocjdoaK
> 
> (Clang generates a variation of "pop mem," and is horrible code, but
> it's meant for demonstration purposes only.) One interesting thing
> about the use of the builtins is that if at all possible, the "pop"
> instruction may be moved away from the "pushf" if it's safe and would
> reduce register pressure.

I wouldn't trust the compiler to get stack pointer relative accesses
right if it does move them apart.
Definitely scope for horrid bugs ;-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

  reply	other threads:[~2022-02-11 22:10 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-15 21:18 [PATCH] x86: use builtins to read eflags Bill Wendling
2021-12-15 22:46 ` Nathan Chancellor
2021-12-15 23:26 ` Peter Zijlstra
2021-12-16 20:00   ` Bill Wendling
2021-12-16 20:07     ` Nick Desaulniers
2021-12-16  0:57 ` Thomas Gleixner
2021-12-16 19:55   ` Bill Wendling
2021-12-17 12:48     ` Peter Zijlstra
2021-12-17 19:39     ` Thomas Gleixner
2022-03-14 23:09     ` H. Peter Anvin
2022-03-15  0:08       ` Bill Wendling
2021-12-16 19:58   ` Nick Desaulniers
2021-12-29  2:12 ` [PATCH v2] " Bill Wendling
2022-01-27 20:56   ` Bill Wendling
2022-02-04  0:16   ` Thomas Gleixner
2022-02-04  0:58     ` Bill Wendling
2022-02-04  0:57   ` [PATCH v3] " Bill Wendling
2022-02-07 22:11     ` Nick Desaulniers
2022-02-08  9:14       ` David Laight
2022-02-08 23:18         ` Bill Wendling
2022-02-14 23:53         ` Nick Desaulniers
2022-02-10 22:31     ` [PATCH v4] " Bill Wendling
2022-02-11 16:40       ` David Laight
2022-02-11 19:25         ` Bill Wendling
2022-02-11 22:09           ` David Laight [this message]
2022-02-11 23:33             ` Bill Wendling
2022-02-12  0:24           ` Nick Desaulniers
2022-02-12  9:23             ` Bill Wendling
2022-02-15  0:33               ` Nick Desaulniers
2022-03-01 20:19       ` [PATCH v5] " Bill Wendling
2022-03-14 23:07         ` Bill Wendling
     [not found]           ` <AC3D873E-A28B-41F1-8BF4-2F6F37BCEEB4@zytor.com>
2022-03-15  7:19             ` Bill Wendling
2022-03-17 15:43               ` H. Peter Anvin
2022-03-17 18:00                 ` Nick Desaulniers
2022-03-17 18:52                   ` Linus Torvalds
2022-03-17 19:45                     ` Bill Wendling
2022-03-17 20:13                       ` Linus Torvalds
2022-03-17 21:10                         ` Bill Wendling
2022-03-17 21:21                           ` Linus Torvalds
2022-03-17 21:45                             ` Bill Wendling
2022-03-17 22:51                               ` Linus Torvalds
2022-03-17 23:14                                 ` Linus Torvalds
2022-03-17 23:19                                 ` Segher Boessenkool
2022-03-17 23:31                                   ` Linus Torvalds
2022-03-18  0:05                                     ` Segher Boessenkool
2022-03-17 22:37                       ` Segher Boessenkool
2022-03-17 20:13                     ` Florian Weimer
2022-03-17 20:36                       ` Linus Torvalds
2022-03-18  0:25                         ` Segher Boessenkool
2022-03-18  1:21                           ` Linus Torvalds
2022-03-18  1:50                             ` Linus Torvalds
2022-03-17 21:05                     ` Andrew Cooper
2022-03-17 21:39                       ` Linus Torvalds
2022-03-18 17:59                         ` Andy Lutomirski
2022-03-18 18:19                           ` Linus Torvalds
2022-03-18 21:48                             ` Andrew Cooper
2022-03-18 23:10                               ` Linus Torvalds
2022-03-18 23:42                                 ` Segher Boessenkool
2022-03-19  1:13                                   ` Linus Torvalds
2022-03-19 23:15                                   ` Andy Lutomirski
2022-03-18 22:09                             ` Segher Boessenkool
2022-03-18 22:33                               ` H. Peter Anvin
2022-03-18 22:36                               ` David Laight
2022-03-18 22:47                                 ` H. Peter Anvin
2022-03-18 22:43                             ` David Laight
2022-03-18 23:03                               ` H. Peter Anvin
2022-03-18 23:04                         ` Segher Boessenkool
2022-03-18 23:52                           ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6b83fa302b974f749c60fc6c456e055f@AcuMS.aculab.com \
    --to=david.laight@aculab.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=morbo@google.com \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.