From: Loic Dachary <loic@dachary.org>
To: Milosz Tanski <milosz@adfin.com>
Cc: Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: GCC -msse2 portability question
Date: Wed, 26 Mar 2014 19:24:13 +0100
Message-ID: <53331B4D.8030901@dachary.org>
In-Reply-To: <CANP1eJG9xoCPkFs19KXG1RPUqc-D3aO_0SBOM=4WWFRN2JtX=g@mail.gmail.com>

On 26/03/2014 18:40, Milosz Tanski wrote:
> Loic,
> 
> I don't mean to be redundant, since I already posted this comment on
> GitHub in the commit comments, but I'm not sure if you saw it.

Thanks for posting it: your comment got lost in a rebase (GitHub is not good at that...).

> Instead of doing cpuid manually you can use builtins provided in gcc
> (and in clang). There's a cpuid.h header you can include. This
> stackoverflow answer has a good summary of it:
> http://stackoverflow.com/questions/14266772/how-do-i-call-cpuid-in-linux?answertab=votes#tab-top

It is a nice improvement to have indeed. Created http://tracker.ceph.com/issues/7869
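A minimal sketch of the `<cpuid.h>` approach Milosz describes, assuming an x86 or x86_64 target built with GCC or clang. `__get_cpuid()` and the `bit_*` masks come from that header; the `sse_features` struct and `probe_sse_features()` are hypothetical names used only for illustration, not gf-complete's API.

```c
#include <cpuid.h>

/* Hypothetical summary of the features relevant to the SSE code paths. */
struct sse_features {
    int sse2;
    int ssse3;
    int sse4_1;
    int pclmul;
};

/* Read CPUID leaf 1 through the compiler-provided helper instead of
 * hand-written inline assembly. __get_cpuid() returns 0 when the leaf
 * is not supported, in which case every feature is reported as absent. */
static struct sse_features probe_sse_features(void)
{
    struct sse_features f = {0, 0, 0, 0};
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse2   = (edx & bit_SSE2) != 0;   /* EDX bit 26 */
        f.ssse3  = (ecx & bit_SSSE3) != 0;  /* ECX bit 9 */
        f.sse4_1 = (ecx & bit_SSE4_1) != 0; /* ECX bit 19 */
        f.pclmul = (ecx & bit_PCLMUL) != 0; /* ECX bit 1 */
    }
    return f;
}
```

Probing once at startup and caching the result avoids executing CPUID on every call. GCC (4.8 and later) and clang also offer `__builtin_cpu_supports("ssse3")`, which hides the bit masks entirely.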

Cheers

> On Wed, Mar 26, 2014 at 3:14 AM, Loic Dachary <loic@dachary.org> wrote:
>> Hi Kevin & Milosz,
>>
>> So it would be
>>
>> if(sse4 && sse3) => use a plugin compiled with sse2 + sse3 + sse4 activated
>> else if(sse3) => use a plugin with sse2 + sse3 activated but not sse4
>> else => fallback to not using sse at all
>>
>> like so:
>>
>> https://github.com/dachary/ceph/commit/b6e4307bd2ee1de6e8bbda0ced370d484d512114#diff-5249f49580782dfe95a1cbcc986ee5deR113
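The three-way selection above can be sketched as a pure function of the detected features. This is an illustration, not the code in the linked commit: the middle tier is keyed on SSSE3 because Kevin notes that the shuffle instruction comes from SSSE3, the plugin names are hypothetical, and actually loading the chosen library (for instance with `dlopen`) is left out.

```c
/* The three tiers discussed in this thread. */
enum plugin_variant {
    PLUGIN_GENERIC, /* built without any -msse* flag */
    PLUGIN_SSSE3,   /* built with -msse2 -mssse3, no SSE4 */
    PLUGIN_SSE4     /* built with -msse2 -mssse3 -msse4.1 -mpclmul */
};

/* Hypothetical shared-object names, indexed by enum plugin_variant. */
const char *const plugin_names[] = {
    "example_plugin_generic",
    "example_plugin_ssse3",
    "example_plugin_sse4",
};

/* Pick the most capable tier the running CPU supports. A tier is only
 * chosen when every instruction set it was compiled for is present. */
static enum plugin_variant select_plugin(int sse2, int ssse3,
                                         int sse4_1, int pclmul)
{
    if (sse2 && ssse3 && sse4_1 && pclmul)
        return PLUGIN_SSE4;
    if (sse2 && ssse3)
        return PLUGIN_SSSE3;
    return PLUGIN_GENERIC;
}
```

Keeping the decision separate from the CPUID probe makes it easy to test every combination without the corresponding hardware.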
>>
>> If I understand Laurent correctly, the right approach would be to semi-transparently generate and select the code path depending on the features at runtime, but that would require more work, so I created a ticket to track it: http://tracker.ceph.com/issues/7865
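One possible reading of Laurent's suggestion, sketched with the GCC/clang `target` function attribute and `__builtin_cpu_supports()`: the same source is compiled twice for different instruction sets inside one binary, and one version is chosen at runtime without a separate plugin. This assumes x86 and GCC 4.8 or later (or a recent clang); the function names are hypothetical, and this is not necessarily what ticket 7865 ends up implementing.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline version: compiled with the default flags, safe on any x86 CPU. */
static void region_xor_generic(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
}

/* Same source, but the compiler may emit SSSE3/SSE4.1 instructions here
 * without passing -mssse3 -msse4.1 to the whole translation unit. */
static __attribute__((target("ssse3,sse4.1")))
void region_xor_sse4(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
}

typedef void (*region_xor_fn)(uint8_t *, const uint8_t *, size_t);

/* Only hand out the SSE4 version when the running CPU supports it. */
static region_xor_fn choose_region_xor(void)
{
    __builtin_cpu_init(); /* required only before main(); harmless here */
    if (__builtin_cpu_supports("ssse3") && __builtin_cpu_supports("sse4.1"))
        return region_xor_sse4;
    return region_xor_generic;
}
```

GCC 6 and later can generate this kind of dispatch automatically with `__attribute__((target_clones(...)))`.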
>>
>> Does that sound right?
>>
>> On 25/03/2014 22:31, Kevin Greenan wrote:
>>> Hey Loic,
>>>
>>> I think we want something closer to what Milosz is proposing (3 cut-offs instead of 2). The shuffle instruction is part of SSSE3 and is the basis for the SSE split table techniques, which are super fast. With an all-or-nothing approach, many users whose CPUs support SSSE3 might not be able to take advantage of it.
>>>
>>> Make sense?
>>>
>>> -kevin
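The split table technique Kevin mentions can be illustrated in scalar form. Multiplying a byte x by a constant c in GF(2^8) is linear over XOR, so c*x = c*(x & 0x0f) ^ c*(x & 0xf0), and each half needs only a 16-entry table; SSSE3's `pshufb` (`_mm_shuffle_epi8`) performs 16 such lookups in one instruction. This sketch assumes the field polynomial 0x11d and uses hypothetical function names; it is not gf-complete's implementation, which also provides the vectorized version.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference GF(2^8) multiply, reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11d). */
static uint8_t gf8_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        uint8_t carry = a & 0x80;
        a <<= 1;
        if (carry)
            a ^= 0x1d; /* low 8 bits of 0x11d */
        b >>= 1;
    }
    return p;
}

/* Two 16-entry tables: products of c with every low nibble and every
 * high nibble. These are what pshufb would hold in two __m128i registers. */
struct split_tables {
    uint8_t lo[16];
    uint8_t hi[16];
};

static void build_split_tables(uint8_t c, struct split_tables *t)
{
    for (int i = 0; i < 16; i++) {
        t->lo[i] = gf8_mul(c, (uint8_t)i);
        t->hi[i] = gf8_mul(c, (uint8_t)(i << 4));
    }
}

/* dst[i] = c * src[i], using two nibble lookups XORed together. */
static void region_mul_split(uint8_t *dst, const uint8_t *src, size_t len,
                             const struct split_tables *t)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = t->lo[src[i] & 0x0f] ^ t->hi[src[i] >> 4];
}
```

The tables are built once per constant, so their cost is amortized over the whole region.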
>>>
>>>
>>> On Tue, Mar 25, 2014 at 12:46 PM, Milosz Tanski <milosz@adfin.com> wrote:
>>>
>>>     It gets a bit more tricky with x86_64, since the architecture dictates that
>>>     the baseline includes SSE2 (but not necessarily later extensions).
>>>
>>>     What I would do is support SSE2 (maybe in the core, without dlopen) and
>>>     then support all the others in an SSE4 version (including SSE4_PCLMUL).
>>>     I'm glossing over x86-32 here, but you could do something similar.
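Because SSE2 is part of the x86_64 baseline, an SSE2 code path can live in the core with only a compile-time check. A minimal sketch, assuming GCC or clang, which define `__SSE2__` on x86_64 by default and on 32-bit x86 only with `-msse2`; `region_xor()` is a hypothetical helper, not a gf-complete function.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* XOR src into dst. When the compiler targets SSE2 (always the case on
 * x86_64 by default), the 16-byte loop needs no runtime CPU check. */
static void region_xor(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= len; i += 16) {
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(d, s));
    }
#endif
    /* Tail bytes, or the whole buffer when SSE2 is not available. */
    for (; i < len; i++)
        dst[i] ^= src[i];
}
```

Anything beyond SSE2 (SSSE3, SSE4.1, PCLMUL) still needs the runtime detection discussed elsewhere in this thread.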
>>>
>>>     Best
>>>     - Milosz
>>>
>>>     On Tue, Mar 25, 2014 at 3:21 PM, Loic Dachary <loic@dachary.org> wrote:
>>>     >
>>>     >
>>>     > On 25/03/2014 20:13, Kevin Greenan wrote:
>>>     >> +1
>>>     >>
>>>     >> Yeah, that sounds better...  Let's keep this as simple as possible.
>>>     >
>>>     > I'll rework the pull request at https://bitbucket.org/jimplank/gf-complete/pull-request/4/defer-the-decision-to-use-a-given-sse accordingly.
>>>     >
>>>     > Would it be sensible to compile with SSE optimizations only if all of them are available (SSE2, SSSE3, SSE4, SSE4_PCLMUL), and not attempt to distinguish cases such as SSSE3 being available but not SSE4_PCLMUL? From what I understand at this point, that kind of distinction is going to be difficult to manage anyway.
>>>     >
>>>     > Is that too simplistic?
>>>     >
>>>     >>
>>>     >> -kevin
>>>     >>
>>>     >>
>>>     >> On Tue, Mar 25, 2014 at 12:08 PM, Loic Dachary <loic@dachary.org> wrote:
>>>     >>
>>>     >>     Andreas Peters suggested another approach, which makes sense to me: have one plugin with SSE optimizations enabled and another without them, and choose between the two at runtime.
>>>     >>
>>>     >>     What do you think?
>>>     >>
>>>     >>     On 23/03/2014 20:50, Loic Dachary wrote:
>>>     >>     > Hi Laurent,
>>>     >>     >
>>>     >>     > In the context of optimizing erasure code functions implemented by Kevin Greenan (cc'ed) and James Plank at https://bitbucket.org/jimplank/gf-complete/, we ran across a question you may have the answer to: can gcc -msse2 (or -msse* for that matter) have a negative impact on the portability of the compiled binary code?
>>>     >>     >
>>>     >>     > In other words, if code is compiled without -msse* and runs fine on all the Intel processors it targets, could adding -msse* when compiling the same source code generate a binary that fails on some processors? This assumes no SSE-specific functions are used in the source code.
>>>     >>     >
>>>     >>     > In gf-complete, all SSE-specific instructions are carefully protected so that they are not run on a CPU that does not support them. The runtime detection is done by checking CPUID bits (see https://bitbucket.org/jimplank/gf-complete/pull-request/7/probe-intel-sse-features-at-runtime/diff#Lsrc/gf_intel.cT28).
>>>     >>     >
>>>     >>     > The corresponding thread is at:
>>>     >>     >
>>>     >>     > https://bitbucket.org/jimplank/gf-complete/pull-request/4/defer-the-decision-to-use-a-given-sse/diff#comment-1479296
>>>     >>     >
>>>     >>     > Cheers
>>>     >>     >
>>>     >>
>>>     >>     --
>>>     >>     Loïc Dachary, Artisan Logiciel Libre
>>>     >>
>>>     >>
>>>     >
>>>     > --
>>>     > Loïc Dachary, Artisan Logiciel Libre
>>>     >
>>>
>>>
>>>
>>>     --
>>>     Milosz Tanski
>>>     CTO
>>>     10 East 53rd Street, 37th floor
>>>     New York, NY 10022
>>>
>>>     p: 646-253-9055
>>>     e: milosz@adfin.com
>>>
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>
> 
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



Thread overview: 14+ messages
2014-03-23 19:50 GCC -msse2 portability question Loic Dachary
2014-03-23 22:34 ` Laurent GUERBY
2014-03-24 21:27   ` Loic Dachary
2014-03-25  9:43     ` Laurent GUERBY
2014-03-25  9:56       ` Loic Dachary
2014-03-25 11:22         ` Laurent GUERBY
2014-03-25 14:44           ` Milosz Tanski
2014-03-25 18:45             ` Loic Dachary
2014-03-24  1:40 ` Sage Weil
2014-03-25 19:08 ` Loic Dachary
     [not found]   ` <CA+AFVBhpOZEPehsd4qHCBr4aRzv60ZW8LzRwKsduUrZmLV1wxQ@mail.gmail.com>
2014-03-25 19:21     ` Loic Dachary
2014-03-25 19:46       ` Milosz Tanski
     [not found]         ` <CA+AFVBgOEz8_fv9H-8_kOuVSJNL3KQ+36b5kscfjnRMs09DZ6Q@mail.gmail.com>
     [not found]           ` <53327E59.7060408@dachary.org>
     [not found]             ` <CANP1eJG9xoCPkFs19KXG1RPUqc-D3aO_0SBOM=4WWFRN2JtX=g@mail.gmail.com>
2014-03-26 18:24               ` Loic Dachary [this message]
     [not found]             ` <CANP1eJErc4qnRhtOCs=Cnh6VNtihLVcZxB1PSCQjpH0sFDBuWA@mail.gmail.com>
2014-03-26 22:13               ` Loic Dachary
