linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig@amd.com>
To: Denys Vlasenko <dvlasenk@redhat.com>,
	Ingo Molnar <mingo@kernel.org>, Andy Lutomirski <luto@kernel.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Alex Deucher <alexander.deucher@amd.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Vinod Koul <vinod.koul@intel.com>,
	"Johannes Berg" <johannes.berg@intel.com>,
	Sara Sharon <sara.sharon@intel.com>,
	"Adrian Hunter" <adrian.hunter@intel.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: <x86@kernel.org>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: RFC: Petition Intel/AMD to add POPF_IF insn
Date: Wed, 17 Aug 2016 19:30:54 +0200	[thread overview]
Message-ID: <64bd8281-aeb4-391b-90c6-3f5e44f0ce07@amd.com> (raw)
In-Reply-To: <f2d39812-8e2b-ed32-398f-8bb32bd20c7f@redhat.com>

Well first of all you actually CCed AMDs graphics developers. So Alex 
and I can't say much about CPU optimizations.

Additional to that a comment below.

Am 17.08.2016 um 19:20 schrieb Denys Vlasenko:
> Last year, a proposal was floated to avoid costly POPF.
> In particular, each spin_unlock_irqrestore() does it,
> a rather common operation.
>
> https://lkml.org/lkml/2015/4/21/290
>     [RFC PATCH] x86/asm/irq: Don't use POPF but STI
>
> * Andy Lutomirski <luto@kernel.org> wrote:
>> > Another different approach would be to formally state that
>> > pv_irq_ops.save_fl() needs to return all the flags, which would
>> > make local_irq_save() safe to use in this circumstance, but that
>> > makes a hotpath longer for the sake of a single boot time check.
>>
>> ...which reminds me:
>>
>> Why does native_restore_fl restore anything other than IF?  A branch
>> and sti should be considerably faster than popf.
>
>
> Ingo agreed:
> ====
> Yes, this has come up in the past, something like the patch below?
>
> Totally untested and not signed off yet: because we'd first have to
> make sure (via irq flags debugging) that it's not used in reverse, to
> re-disable interrupts:
>     local_irq_save(flags);
>     local_irq_enable();
>     ...
>     local_irq_restore(flags); /* effective local_irq_disable() */
> I don't think we have many (any?) such patterns left, but it has to be
> checked first. If we have such cases then we'll have to use different
> primitives there.
> ====
>
> Linus replied:
> =====
> "popf" is fast for the "no changes to IF" case, and is a smaller
> instruction anyway.
> ====
>
>
> This basically shot down the proposal.
>
> But in my measurements POPF is not fast even in the case where restored
> flags are not changes at all:
>
>                 mov     $200*1000*1000, %eax
>                 pushf
>                 pop     %rbx
>                 .balign 64
> loop:           push    %rbx
>                 popf
>                 dec     %eax
>                 jnz     loop
>
> # perf stat -r20 ./popf_1g
>      4,929,012,093      cycles                    #    3.412 
> GHz                      ( +-  0.02% )
>        835,721,371      instructions              #    0.17  insn per 
> cycle           ( +-  0.02% )
>        1.446185359 seconds time 
> elapsed                                          ( +-  0.46% )
>
> If I replace POPF with a pop into an unused register, I get this:

You are comparing apples and bananas here.

The microcode path in the CPUs will probably through away the read if 
you don't actually use the register value.

Regards,
Christian.

>
> loop:           push    %rbx
>                 pop     %rcx
>                 dec     %eax
>                 jnz     loop
>
>        209,247,645      cycles                    #    3.209 
> GHz                      ( +-  0.11% )
>        801,898,016      instructions              #    3.83  insn per 
> cycle           ( +-  0.00% )
>        0.066210725 seconds time 
> elapsed                                          ( +-  0.59% )
>
>
> IOW, POPF takes at least 6 cycles.
>
>
> Linus does have a point that a "test+branch+STI" may end up not a 
> clear win because of the branch.
>
> But the need to restore IF flag exists, it is necessary not only for 
> Linux, but for any OS
> running on x86: they all have some sort of spinlock.
>
> The addition of a POPF instruction variant which looks only at IF bit 
> and changes
> only that bit in EFLAGS may be a good idea, for all OSes.
>
> I propose that we ask Intel / AMD to do that.
>
> Maybe by the "ignored prefix" trick which was used when LZCNT insn was 
> introduced
> as REPZ-prefixed BSR?
> Currently, REPZ POPF (f3 9d) insn does execute. Redefine this opcode as
> POPF_IF. Then the same kernel will work on old and new CPUs.
>
> CC'ing some @intel and @amd emails...

  reply	other threads:[~2016-08-17 20:08 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-17 17:20 RFC: Petition Intel/AMD to add POPF_IF insn Denys Vlasenko
2016-08-17 17:30 ` Christian König [this message]
2016-08-17 18:34   ` Denys Vlasenko
2016-08-17 17:32 ` Linus Torvalds
2016-08-17 18:41   ` Denys Vlasenko
     [not found]     ` <CA+55aFwmxwQBkyDjombS4cy1q=a_buhmDjDnaa8rdC8ZDaDYEA@mail.gmail.com>
2016-08-17 19:13       ` Andy Lutomirski
2016-08-17 19:26         ` Denys Vlasenko
2016-08-17 19:32           ` Linus Torvalds
2016-08-17 19:35             ` Denys Vlasenko
2016-08-17 19:54               ` Linus Torvalds
2016-08-17 19:37             ` Linus Torvalds
2016-08-17 21:26               ` Linus Torvalds
2016-08-17 21:35                 ` Linus Torvalds
2016-08-17 21:43                   ` Andy Lutomirski
2016-08-17 21:48                     ` Linus Torvalds
2016-08-18 13:26                       ` Denys Vlasenko
2016-08-18 17:24                         ` Linus Torvalds
2016-08-18 17:47                           ` Denys Vlasenko
2016-08-18 17:49                             ` Denys Vlasenko
2016-08-19 10:54                           ` Paolo Bonzini
2016-08-31 11:12                             ` Denys Vlasenko
2016-08-18  9:21                   ` Denys Vlasenko
2016-08-18 12:18                     ` Denys Vlasenko
2016-08-18 17:22                       ` Paolo Bonzini
2016-08-17 19:29         ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=64bd8281-aeb4-391b-90c6-3f5e44f0ce07@amd.com \
    --to=christian.koenig@amd.com \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.deucher@amd.com \
    --cc=dan.j.williams@intel.com \
    --cc=dvlasenk@redhat.com \
    --cc=johannes.berg@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=sara.sharon@intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vinod.koul@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).