From: Uros Bizjak <ubizjak@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
	Andy Lutomirski <luto@kernel.org>, Ingo Molnar <mingo@kernel.org>,
	Nadav Amit <namit@vmware.com>, Brian Gerst <brgerst@gmail.com>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	"H . Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>,
	Josh Poimboeuf <jpoimboe@redhat.com>
Subject: Re: [PATCH v2 4/4] x86/percpu: Use C for percpu read/write accessors
Date: Wed, 4 Oct 2023 22:07:54 +0200
Message-ID: <CAFULd4YRHmQVnwaORm7=7kUs7DYG7SfwdTXAitDt=bxiMU5AoQ@mail.gmail.com>
In-Reply-To: <CAHk-=wjuRGzhuETLYDoi4hM6RAxHVL0ptuRb3TH-od+348Y8zA@mail.gmail.com>

On Wed, Oct 4, 2023 at 9:42 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Unrelated reaction..
>
> On Wed, 4 Oct 2023 at 12:24, Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > the code improves from:
> >
> >  65 8b 05 00 00 00 00    mov    %gs:0x0(%rip),%eax
> >  a9 00 00 0f 00          test   $0xf0000,%eax
> >
> > to:
> >
> >  65 f7 05 00 00 00 00    testl  $0xf0000,%gs:0x0(%rip)
> >  00 00 0f 00
>
> Funky.
>
> Why does gcc generate that full-width load from memory, and not demote
> it to a byte test?

It does when the LSB is accessed, i.e. when the narrowed access stays at
the same address. For example:

int m;
_Bool foo (void) { return m & 0x0f; }

compiles to:

  0:   f6 05 00 00 00 00 0f    testb  $0xf,0x0(%rip)        # 7 <foo+0x7>
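
To make the contrast concrete, here is a minimal sketch that puts the two
cases side by side. The function names are hypothetical, and
test_high_narrow() is a hand-written equivalent of the byte test that
assumes a little-endian target such as x86; it is not what any particular
compiler emits:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

int m;

/* Mask lies entirely in byte 0 of m: a byte test at the same address suffices. */
bool test_low(void) { return m & 0x0f; }

/* Mask lies in bits 16..19, i.e. byte 2 of m on a little-endian target. */
bool test_high(void) { return m & 0xf0000; }

/* Hand-narrowed equivalent of test_high(): reads only byte 2 of m.
 * Assumes little-endian byte order (an assumption of this sketch). */
bool test_high_narrow(void)
{
    unsigned char b;

    memcpy(&b, (const unsigned char *)&m + 2, 1);
    return b & 0x0f;
}
```

Compiling with "gcc -O2 -c" and inspecting the result with "objdump -d"
should show whether test_high() keeps the 32-bit access; the exact output
depends on the compiler version.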

>
> IOW, it should not be
>
>   65 f7 05 00 00 00 00 testl  $0xf0000,%gs:0x0(%rip)
>   00 00 0f 00
>
> after optimizing it, it should be three bytes shorter at
>
>   65 f6 05 00 00 00 00 testb  $0xf,%gs:0x0(%rip)
>   0f
>
> instead (this is "objdump", so it doesn't show that the relocation
> entry has changed by +2 to compensate).
>
> Now, doing the access narrowing is a bad idea for stores (because it
> can cause subsequent loads to have conflicts in the store buffer), but
> for loads it should always be a win to narrow the access.
>
> I wonder why gcc doesn't do it. This is not related to __seg_gs - I
> tried it with regular memory accesses too, and gcc kept those as
> 32-bit accesses too.
>
> And no, the assembler can't optimize that operation either, since I
> think changing the testl to a testb would change the 'P' bit in the
> resulting eflags, so this is a "the compiler could pick a better
> instruction choice" thing.
>
> I'm probably missing some reason why gcc wouldn't do this. But clang
> does seem to do this obvious optimization.

You get a store forwarding stall when you write a bigger operand to
memory and then read part of it, if the smaller part doesn't start at
the same address.
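
As a rough illustration of the condition above, the following sketch
computes, for a 32-bit mask, which byte a narrowed test would read and
whether that byte starts at the address of the full-width object. The
helper names are hypothetical and little-endian byte order is assumed;
this models the rule as stated here, not GCC's actual implementation:

```c
#include <stdint.h>

/* Returns the byte offset (0..3, little-endian) of the single byte that
 * contains every set bit of mask, or -1 if the mask is zero or spans
 * more than one byte. */
int narrow_byte_offset(uint32_t mask)
{
    for (int off = 0; off < 4; off++) {
        uint32_t byte_mask = (uint32_t)0xff << (8 * off);

        if (mask != 0 && (mask & ~byte_mask) == 0)
            return off;
    }
    return -1;
}

/* Returns 1 when the narrowed load would start at the same address as
 * the full-width object, i.e. the case where narrowing is described as
 * safe with respect to store forwarding; 0 otherwise. */
int narrowing_keeps_address(uint32_t mask)
{
    return narrow_byte_offset(mask) == 0;
}
```

For the mask 0xf0000 the offset is 2, which matches the "+2" relocation
adjustment mentioned in the quoted text, so the narrowed load would not
start at the same address as a preceding 32-bit store.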

Uros.
