* GCC -msse2 portability question
@ 2014-03-23 19:50 Loic Dachary
  2014-03-23 22:34 ` Laurent GUERBY
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Loic Dachary @ 2014-03-23 19:50 UTC (permalink / raw)
  To: Laurent Guerby, Kevin Greenan; +Cc: Ceph Development

Hi Laurent,

In the context of optimizing the erasure code functions implemented by Kevin Greenan (cc'ed) and James Plank at https://bitbucket.org/jimplank/gf-complete/ we ran across a question you may have the answer to: can gcc -msse2 (or -msse* for that matter) have a negative impact on the portability of the compiled binary code?

In other words, if code is compiled without -msse* and runs fine on all the Intel processors it targets, could adding -msse* to the compilation of the same source code generate a binary that fails on some processors? This is assuming no SSE-specific functions are used in the source code.

In gf-complete, all SSE-specific instructions are carefully guarded so that they are not run on a CPU that does not support them. The runtime detection is done by checking CPUID bits ( see https://bitbucket.org/jimplank/gf-complete/pull-request/7/probe-intel-sse-features-at-runtime/diff#Lsrc/gf_intel.cT28 )
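
For readers unfamiliar with this kind of probe, here is a minimal sketch of a CPUID-based feature check. It is not the gf-complete code: the names simd_features, decode_leaf1 and probe_simd are hypothetical, and it assumes GCC or Clang, whose <cpuid.h> provides __get_cpuid. The bit positions are the ones documented for CPUID leaf 1.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper: __get_cpuid() */
#endif

/* Hypothetical container for the features discussed in this thread. */
struct simd_features {
    int sse2, sse3, pclmul, ssse3, sse41, sse42;
};

/* Decode the ECX/EDX feature words returned by CPUID leaf 1. */
struct simd_features decode_leaf1(unsigned ecx, unsigned edx)
{
    struct simd_features f;
    f.sse2   = (edx >> 26) & 1;  /* EDX bit 26: SSE2      */
    f.sse3   = (ecx >> 0) & 1;   /* ECX bit 0:  SSE3      */
    f.pclmul = (ecx >> 1) & 1;   /* ECX bit 1:  PCLMULQDQ */
    f.ssse3  = (ecx >> 9) & 1;   /* ECX bit 9:  SSSE3     */
    f.sse41  = (ecx >> 19) & 1;  /* ECX bit 19: SSE4.1    */
    f.sse42  = (ecx >> 20) & 1;  /* ECX bit 20: SSE4.2    */
    return f;
}

/* Query the running CPU; reports no features on non-x86 builds. */
struct simd_features probe_simd(void)
{
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
#if defined(__x86_64__) || defined(__i386__)
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        ecx = edx = 0;           /* leaf 1 not available */
#endif
    (void)eax;
    (void)ebx;
    return decode_leaf1(ecx, edx);
}
```

Keeping the bit decoding separate from the CPUID call makes the decoding testable with synthetic register values on any machine.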

The corresponding thread is at:

https://bitbucket.org/jimplank/gf-complete/pull-request/4/defer-the-decision-to-use-a-given-sse/diff#comment-1479296

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



* Re: GCC -msse2 portability question
  2014-03-23 19:50 GCC -msse2 portability question Loic Dachary
@ 2014-03-23 22:34 ` Laurent GUERBY
  2014-03-24 21:27   ` Loic Dachary
  2014-03-24  1:40 ` Sage Weil
  2014-03-25 19:08 ` Loic Dachary
  2 siblings, 1 reply; 14+ messages in thread
From: Laurent GUERBY @ 2014-03-23 22:34 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Kevin Greenan, Ceph Development

Hi Loic,

The GCC documentation lists the architectures that support
SSE/SSE2:

http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options

So unless you want to run your code on a very old 32-bit x86 processor,
"-msse" shouldn't be an issue. "-msse2" is similar.

-mtune=xxx, with xxx being a recent arch, could be interesting for you
because it keeps compatibility with the generic arch while tuning the
resulting code for the specific arch (for example a currently fashionable
arch like corei7).

For a library, you can choose the code you execute at load/run time
for a specific function by using the STT_GNU_IFUNC feature:

http://vger.kernel.org/~davem/cgi-bin/blog.cgi/2010/02/07
http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html#index-g_t_0040code_007bifunc_007d-attribute-2529

I believe recent glibc versions use this feature to tune
some performance/arch-sensitive functions.
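
As an illustration of the STT_GNU_IFUNC mechanism mentioned above, here is a minimal sketch. It assumes GCC (or Clang) on x86-64 with glibc and falls back to a plain function elsewhere. The function xor_fold and its variants are hypothetical stand-ins, not taken from gf-complete or glibc.

```c
#include <stdlib.h>  /* size_t; on glibc this also defines __GLIBC__ */

/* Portable baseline: XOR all bytes of a buffer together. */
static unsigned xor_fold_generic(const unsigned char *buf, size_t n)
{
    unsigned acc = 0;
    for (size_t i = 0; i < n; i++)
        acc ^= buf[i];
    return acc;
}

#if defined(__x86_64__) && defined(__GLIBC__) && defined(__GNUC__)

/* Same loop, but the compiler may emit SSE4.2 code in this function only. */
__attribute__((target("sse4.2")))
static unsigned xor_fold_sse42(const unsigned char *buf, size_t n)
{
    unsigned acc = 0;
    for (size_t i = 0; i < n; i++)
        acc ^= buf[i];
    return acc;
}

typedef unsigned (*xor_fold_fn)(const unsigned char *, size_t);

/* Resolver: called by the dynamic linker when the symbol is bound. */
static xor_fold_fn resolve_xor_fold(void)
{
    __builtin_cpu_init();  /* needed here: resolvers run before constructors */
    return __builtin_cpu_supports("sse4.2") ? xor_fold_sse42
                                            : xor_fold_generic;
}

/* Callers see one symbol; the implementation is picked at load time. */
unsigned xor_fold(const unsigned char *buf, size_t n)
    __attribute__((ifunc("resolve_xor_fold")));

#else

/* Without ifunc support, fall back to the portable version. */
unsigned xor_fold(const unsigned char *buf, size_t n)
{
    return xor_fold_generic(buf, n);
}

#endif
```

Callers simply call xor_fold(buf, n); the selection cost is paid once when the symbol is resolved rather than on every call.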

Sincerely,

Laurent




* Re: GCC -msse2 portability question
  2014-03-23 19:50 GCC -msse2 portability question Loic Dachary
  2014-03-23 22:34 ` Laurent GUERBY
@ 2014-03-24  1:40 ` Sage Weil
  2014-03-25 19:08 ` Loic Dachary
  2 siblings, 0 replies; 14+ messages in thread
From: Sage Weil @ 2014-03-24  1:40 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Laurent Guerby, Kevin Greenan, Ceph Development

Hi Loic,

FWIW something with the build or feature detection seems to be off, as I 
just got an Illegal Instruction signal on the current master:

	http://tracker.ceph.com/issues/7826

sage




* Re: GCC -msse2 portability question
  2014-03-23 22:34 ` Laurent GUERBY
@ 2014-03-24 21:27   ` Loic Dachary
  2014-03-25  9:43     ` Laurent GUERBY
  0 siblings, 1 reply; 14+ messages in thread
From: Loic Dachary @ 2014-03-24 21:27 UTC (permalink / raw)
  To: Laurent GUERBY; +Cc: Kevin Greenan, Ceph Development

On 23/03/2014 23:34, Laurent GUERBY wrote:
> Hi Loic,
> 
> The GCC documentation is here with lists of architecture supporting
> sse/sse2:
> 
> http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options
> 
> So unless you want to run your code on a very old 32-bit x86 processor,
> "-msse" shouldn't be an issue. "-msse2" is similar.

This is good to know :) Should I be worried about unintended side effects of -msse4.2, -mssse3, -msse4.1 or -mpclmul? These are the flags that gf-complete is using, specifically.

Cheers


-- 
Loïc Dachary, Artisan Logiciel Libre



* Re: GCC -msse2 portability question
  2014-03-24 21:27   ` Loic Dachary
@ 2014-03-25  9:43     ` Laurent GUERBY
  2014-03-25  9:56       ` Loic Dachary
  0 siblings, 1 reply; 14+ messages in thread
From: Laurent GUERBY @ 2014-03-25  9:43 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Kevin Greenan, Ceph Development

On Mon, 2014-03-24 at 22:27 +0100, Loic Dachary wrote:
> 
> On 23/03/2014 23:34, Laurent GUERBY wrote:
> > http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options
> > 
> > So unless you want to run your code on a very old 32-bit x86 processor,
> > "-msse" shouldn't be an issue. "-msse2" is similar.
> 
> This is good to know :) Should I be worried about unintended side effects of -msse4.2, -mssse3, -msse4.1 or -mpclmul? These are the flags that gf-complete is using, specifically.

Hi,

SSE4.2 is available only on more recent
processors, as documented on the page above.

If your library already checks for processor features
dynamically, I would advise being conservative in your
-m flags, i.e. using what Debian would use for maximum
x86 portability.

Sincerely,

Laurent



* Re: GCC -msse2 portability question
  2014-03-25  9:43     ` Laurent GUERBY
@ 2014-03-25  9:56       ` Loic Dachary
  2014-03-25 11:22         ` Laurent GUERBY
  0 siblings, 1 reply; 14+ messages in thread
From: Loic Dachary @ 2014-03-25  9:56 UTC (permalink / raw)
  To: Laurent GUERBY; +Cc: Kevin Greenan, Ceph Development

Hi Laurent,

It occurs to me that all we're after is to enable SSE functions such as _mm_set_epi32. We're not trying to have the binary optimized in any implicit way, it is all explicit. The problem seems to be that -msse4.2 will do both 

* activate _mm_set_epi32 etc functions 
* optimize the binary to use sse4.2 instructions

Do you know of a compiler flag that would only 

* activate _mm_set_epi32 etc functions 

and not

* optimize the binary to use sse4.2 instructions

? It may be a RTFM question and I apologize for that. Reading http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options it looks like this is more or less what -mtune=corei7-avx would do (because gf-complete uses PCLMUL when available). But it feels weird to specify a specific processor model when what we need is a set of features.

Thanks for your help :-)


-- 
Loïc Dachary, Artisan Logiciel Libre



* Re: GCC -msse2 portability question
  2014-03-25  9:56       ` Loic Dachary
@ 2014-03-25 11:22         ` Laurent GUERBY
  2014-03-25 14:44           ` Milosz Tanski
  0 siblings, 1 reply; 14+ messages in thread
From: Laurent GUERBY @ 2014-03-25 11:22 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Kevin Greenan, Ceph Development

On Tue, 2014-03-25 at 10:56 +0100, Loic Dachary wrote:
> Hi Laurent,

Hi Loic,

> It occurs to me that all we're after is to enable SSE functions such as _mm_set_epi32. We're not trying to have the binary optimized in any implicit way, it is all explicit. The problem seems to be that -msse4.2 will do both 
> 
> * activate _mm_set_epi32 etc functions
> * optimize the binary to use sse4.2 instructions
> 
> Do you know of a compiler flag that would only 
> 
> * activate _mm_set_epi32 etc functions 

This function is part of an Intel-defined standard for accessing processor
features; the standard will have one or more implementations depending on
your compiler/libc/OS. IIRC these functions are closely aligned with
specific processor features, so if the feature isn't there it generally
makes no sense to use them.

In the particular case of _mm_set_epi32, it seems
to be a data-formatting inline function:

/usr/lib/gcc/x86_64-linux-gnu/4.7.2/include/emmintrin.h
...
typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
...
extern __inline __m128i
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_set_epi32 (int __q3, int __q2, int __q1, int __q0)
{
  return __extension__ (__m128i)(__v4si){ __q0, __q1, __q2, __q3 };
}

The functions in this include file use GCC builtins:

http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/X86-Built-in-Functions.html#X86-Built-in-Functions

To avoid any issue I wouldn't use these functions at all
on a non-SSE machine.
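
To make the header excerpt above concrete, here is a small sketch of what _mm_set_epi32 does with its arguments: the last argument ends up in the lowest lane. The helper name pack_lanes is hypothetical. The sketch assumes GCC 4.9 or later (or Clang), where <emmintrin.h> can be included without -msse2 and the intrinsics used inside a function carrying a target attribute.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_set_epi32 */

/* Store the four lanes of _mm_set_epi32(q3, q2, q1, q0) into out[0..3]. */
__attribute__((target("sse2")))
void pack_lanes(int q3, int q2, int q1, int q0, int out[4])
{
    __m128i v = _mm_set_epi32(q3, q2, q1, q0);
    _mm_storeu_si128((__m128i *)out, v);  /* unaligned store, lowest lane first */
}

#else

/* Non-x86 stand-in with the same memory layout, so callers still build. */
void pack_lanes(int q3, int q2, int q1, int q0, int out[4])
{
    out[0] = q0;
    out[1] = q1;
    out[2] = q2;
    out[3] = q3;
}

#endif
```

On x86 the function must still only be called on a CPU that has SSE2, which is the point made above.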

Sincerely,

Laurent


* Re: GCC -msse2 portability question
  2014-03-25 11:22         ` Laurent GUERBY
@ 2014-03-25 14:44           ` Milosz Tanski
  2014-03-25 18:45             ` Loic Dachary
  0 siblings, 1 reply; 14+ messages in thread
From: Milosz Tanski @ 2014-03-25 14:44 UTC (permalink / raw)
  To: Laurent GUERBY; +Cc: Loic Dachary, Kevin Greenan, Ceph Development

Loic,

If you're already doing a runtime check of these bits before
calling the functions you want to optimize, then you can use GCC's
function-specific optimization feature (the target attribute):
http://gcc.gnu.org/onlinedocs/gcc-4.4.0/gcc/Function-Attributes.html#index-g_t_0040code_007btarget_007d-function-attribute-2259

Basically, you add a target attribute to a function, specifying which SSE
version it may use.

void my_optimized_function(void* sse_vec, size_t n)
    __attribute__ ((__target__ ("sse4.2")));

It's available from GCC 4.4 onwards. That happens to be the GCC version
on RHEL6, Debian Squeeze and Ubuntu 10.04 LTS. Hopefully that's good
enough and you can omit the optimization for people on platforms older
than that.
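
A fuller sketch of this pattern, combining the target attribute with a runtime check, might look as follows. CRC-32C is used here only as a stand-in workload because SSE4.2 has a dedicated instruction for it; the function names are hypothetical and nothing here is taken from gf-complete. It assumes GCC 4.9 or later (or Clang), where <nmmintrin.h> can be included without -msse4.2; with older GCC releases a "#pragma GCC target" before the include would likely be needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_generic(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>  /* SSE4.2 intrinsics: _mm_crc32_u8 */

/* Only this function is compiled with SSE4.2 enabled. */
__attribute__((target("sse4.2")))
static uint32_t crc32c_sse42(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++)
        crc = _mm_crc32_u8(crc, p[i]);
    return crc ^ 0xFFFFFFFFu;
}
#endif

/* Dispatcher: the runtime check guards the SSE4.2 code path. */
uint32_t crc32c(const unsigned char *p, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.2"))
        return crc32c_sse42(p, n);
#endif
    return crc32c_generic(p, n);
}
```

The rest of the translation unit is still compiled for the baseline architecture, so the compiler has no licence to emit SSE4.2 instructions outside crc32c_sse42.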

Best,
- Milosz



-- 
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022

p: 646-253-9055
e: milosz@adfin.com


* Re: GCC -msse2 portability question
  2014-03-25 14:44           ` Milosz Tanski
@ 2014-03-25 18:45             ` Loic Dachary
  0 siblings, 0 replies; 14+ messages in thread
From: Loic Dachary @ 2014-03-25 18:45 UTC (permalink / raw)
  To: Milosz Tanski; +Cc: Kevin Greenan, Ceph Development

Thanks, I did not know about this attribute :-)

-- 
Loïc Dachary, Artisan Logiciel Libre



* Re: GCC -msse2 portability question
  2014-03-23 19:50 GCC -msse2 portability question Loic Dachary
  2014-03-23 22:34 ` Laurent GUERBY
  2014-03-24  1:40 ` Sage Weil
@ 2014-03-25 19:08 ` Loic Dachary
       [not found]   ` <CA+AFVBhpOZEPehsd4qHCBr4aRzv60ZW8LzRwKsduUrZmLV1wxQ@mail.gmail.com>
  2 siblings, 1 reply; 14+ messages in thread
From: Loic Dachary @ 2014-03-25 19:08 UTC (permalink / raw)
  To: Laurent Guerby, Kevin Greenan; +Cc: Ceph Development

Andreas Peters suggested another approach, which makes sense to me: have one plugin with SSE optimizations enabled and another without them, and choose between the two at runtime.

What do you think ?
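
A minimal sketch of the selection step in such a two-plugin scheme is given here. The file names (libec_example_sse.so, libec_example_generic.so) and the function plugin_path are hypothetical, and the cpu_has_sse flag is assumed to come from a runtime CPU probe done elsewhere.

```c
#include <stdio.h>   /* snprintf */
#include <string.h>  /* strlen */

/*
 * Write the path of the plugin flavor to load into out.
 * Returns 0 on success, -1 if dir is empty or the buffer is too small.
 */
int plugin_path(char *out, size_t outlen, const char *dir, int cpu_has_sse)
{
    const char *flavor = cpu_has_sse ? "sse" : "generic";
    int n;

    if (dir == NULL || strlen(dir) == 0)
        return -1;
    n = snprintf(out, outlen, "%s/libec_example_%s.so", dir, flavor);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

The resulting path would then be handed to dlopen(). Only the SSE flavor is built with the -msse* flags, so the generic plugin cannot contain SSE instructions the compiler introduced on its own.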

-- 
Loïc Dachary, Artisan Logiciel Libre



* Re: GCC -msse2 portability question
       [not found]   ` <CA+AFVBhpOZEPehsd4qHCBr4aRzv60ZW8LzRwKsduUrZmLV1wxQ@mail.gmail.com>
@ 2014-03-25 19:21     ` Loic Dachary
  2014-03-25 19:46       ` Milosz Tanski
  0 siblings, 1 reply; 14+ messages in thread
From: Loic Dachary @ 2014-03-25 19:21 UTC (permalink / raw)
  To: Kevin Greenan; +Cc: Laurent Guerby, Ceph Development

On 25/03/2014 20:13, Kevin Greenan wrote:
> +1 
> 
> Yeah, that sounds better...  Let's keep this as simple as possible.  

I'll rework https://bitbucket.org/jimplank/gf-complete/pull-request/4/defer-the-decision-to-use-a-given-sse accordingly.

Would it be sensible to compile with SSE optimizations only if all of them are available (SSE2, SSSE3, SSE4, SSE4_PCLMUL) and not attempt to distinguish cases such as SSSE3 being available but not SSE4_PCLMUL? From what I understand at this point, that kind of distinction is going to be difficult to manage anyway.

Is that too simplistic?

> 
> -kevin

-- 
Loïc Dachary, Artisan Logiciel Libre



* Re: GCC -msse2 portability question
  2014-03-25 19:21     ` Loic Dachary
@ 2014-03-25 19:46       ` Milosz Tanski
       [not found]         ` <CA+AFVBgOEz8_fv9H-8_kOuVSJNL3KQ+36b5kscfjnRMs09DZ6Q@mail.gmail.com>
  0 siblings, 1 reply; 14+ messages in thread
From: Milosz Tanski @ 2014-03-25 19:46 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Kevin Greenan, Laurent Guerby, Ceph Development

It gets a bit more tricky with x86_64, since the architecture dictates that the
baseline has SSE2 (but not necessarily anything later).

What I would do is support both: SSE2 (maybe in core, without dlopen) and
then all the others in an SSE4 version (including SSE4_PCLMUL).
I'm glossing over x86-32 here, but you could do something similar.
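
The tiering described above could be expressed roughly as follows. This is only a sketch: the enum, the function names and the exact grouping of features into an "SSE4" tier are assumptions for illustration, and the feature flags are expected to come from a runtime CPU probe.

```c
/* Hypothetical build flavors, from most portable to most optimized. */
enum simd_tier { TIER_GENERIC = 0, TIER_SSE2 = 1, TIER_SSE4 = 2 };

/*
 * All-or-nothing policy for the top tier: use the SSE4 build only when
 * every extension it was compiled with is present on the running CPU.
 */
enum simd_tier pick_tier(int sse2, int ssse3, int sse41, int sse42, int pclmul)
{
    if (sse2 && ssse3 && sse41 && sse42 && pclmul)
        return TIER_SSE4;
    if (sse2)
        return TIER_SSE2;
    return TIER_GENERIC;
}

/* Returns 1 when built for x86-64, where SSE2 is part of the base ISA. */
int sse2_is_baseline(void)
{
#if defined(__x86_64__)
    return 1;
#else
    return 0;
#endif
}
```

A CPU with SSSE3 but without PCLMUL falls back to the SSE2 tier under this policy, which trades some performance for fewer build variants.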

Best
- Milosz

On Tue, Mar 25, 2014 at 3:21 PM, Loic Dachary <loic@dachary.org> wrote:
>
>
> On 25/03/2014 20:13, Kevin Greenan wrote:
>> +1
>>
>> Yeah, that sounds better...  Let's keep this as simple as possible.
>
> I'll rework the https://bitbucket.org/jimplank/gf-complete/pull-request/4/defer-the-decision-to-use-a-given-sse accordingly.
>
> Would it be sensible to compile with SSE optimizations only if all are available ( SSE2, SSSE3, SSE4, SSE4_PCMUL ) and not attempt to distinguish betweel SSSE3 being available but not SSE4_PCMUL etc. From what I understand at this point that kind of distinction is going to be difficult to manage anyway.
>
> Is it too simplistic ?
>
>>
>> -kevin
>>
>>
>> On Tue, Mar 25, 2014 at 12:08 PM, Loic Dachary <loic@dachary.org <mailto:loic@dachary.org>> wrote:
>>
>>     Andreas Peters suggested another approach, which makes sense to me: have one plugin with SSE optimizations enabled, another without them, and choose between the two at runtime.
>>
>>     What do you think ?
>>
>>     On 23/03/2014 20:50, Loic Dachary wrote:
>>     > Hi Laurent,
>>     >
>>     > In the context of optimizing erasure code functions implemented by Kevin Greenan (cc'ed) and James Plank at https://bitbucket.org/jimplank/gf-complete/ we ran across a question you may have the answer to: can gcc -msse2 (or -msse* for that matter) have a negative impact on the portability of the compiled binary code?
>>     >
>>     > In other words, if code is compiled without -msse* and runs fine on all the Intel processors it targets, could adding -msse* to the compilation of the same source code generate a binary that would fail on some processors? This is assuming no SSE-specific functions are used in the source code.
>>     >
>>     > In gf-complete, all sse specific instructions are carefully protected to not be run on a CPU that does not support them. The runtime detection is done by checking CPU id bits ( see https://bitbucket.org/jimplank/gf-complete/pull-request/7/probe-intel-sse-features-at-runtime/diff#Lsrc/gf_intel.cT28 )
>>     >
>>     > The corresponding thread is at:
>>     >
>>     > https://bitbucket.org/jimplank/gf-complete/pull-request/4/defer-the-decision-to-use-a-given-sse/diff#comment-1479296
>>     >
>>     > Cheers
>>     >
>>
>>     --
>>     Loïc Dachary, Artisan Logiciel Libre
>>
>>
>
> --
> Loïc Dachary, Artisan Logiciel Libre
>



-- 
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022

p: 646-253-9055
e: milosz@adfin.com

* Re: GCC -msse2 portability question
       [not found]             ` <CANP1eJG9xoCPkFs19KXG1RPUqc-D3aO_0SBOM=4WWFRN2JtX=g@mail.gmail.com>
@ 2014-03-26 18:24               ` Loic Dachary
  0 siblings, 0 replies; 14+ messages in thread
From: Loic Dachary @ 2014-03-26 18:24 UTC (permalink / raw)
  To: Milosz Tanski; +Cc: Ceph Development




On 26/03/2014 18:40, Milosz Tanski wrote:
> Loic,
> 
> I don't mean to be redundant, since I already posted this comment in
> the GitHub commit comments, but I'm not sure whether you saw it.

Thanks for posting it: your comment got lost in a rebase (GitHub is not good at that...).

> Instead of doing cpuid manually you can use builtins provided in gcc
> (and in clang). There's a cpuid.h header you can include. This
> stackoverflow answer has a good summary of it:
> http://stackoverflow.com/questions/14266772/how-do-i-call-cpuid-in-linux?answertab=votes#tab-top

It is a nice improvement to have indeed. Created http://tracker.ceph.com/issues/7869
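
For reference, a minimal sketch of the cpuid.h approach, assuming GCC or
clang on x86. __get_cpuid() and the bit_* masks come from cpuid.h; the
struct and function names are hypothetical and are not what gf-complete
or Ceph use:

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/clang header: __get_cpuid() and bit_* masks */
#endif

/* Hypothetical feature record (illustration only). */
struct sse_features {
    int sse2;
    int ssse3;
    int sse4_1;
    int sse4_2;
    int pclmul;
};

/* Probe CPUID leaf 1; every field stays 0 on non-x86 builds. */
struct sse_features probe_sse_features(void)
{
    struct sse_features f = {0, 0, 0, 0, 0};
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid() returns 0 when the requested leaf is unsupported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse2   = (edx & bit_SSE2)   != 0;
        f.ssse3  = (ecx & bit_SSSE3)  != 0;
        f.sse4_1 = (ecx & bit_SSE4_1) != 0;
        f.sse4_2 = (ecx & bit_SSE4_2) != 0;
        f.pclmul = (ecx & bit_PCLMUL) != 0;
    }
#endif
    return f;
}
```

The header replaces the hand-written inline assembly, and the feature
bits keep their documented names instead of magic constants.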

Cheers

> On Wed, Mar 26, 2014 at 3:14 AM, Loic Dachary <loic@dachary.org> wrote:
>> Hi Kevin & Milosz,
>>
>> So it would be
>>
>> if(sse4 & sse3) => use a plugin compiled with sse + sse3 + sse4 activated
>> else if(sse3) => use a plugin with sse2 + sse3 activated but not sse4
>> else => fallback to not using sse at all
>>
>> like so:
>>
>> https://github.com/dachary/ceph/commit/b6e4307bd2ee1de6e8bbda0ced370d484d512114#diff-5249f49580782dfe95a1cbcc986ee5deR113
>>
>> If I understand Laurent correctly, the right approach would be to semi-transparently generate and select the code path depending on the features at runtime. But that would require more work and I created a ticket to track this : http://tracker.ceph.com/issues/7865
>>
>> Does that sound right ?
>>
>> On 25/03/2014 22:31, Kevin Greenan wrote:
>>> Hey Loic,
>>>
>>> I think we want something closer to what Milosz is proposing (3 cut-offs instead of 2) .  The shuffle instruction is part of SSSE3 and is the basis for the SSE split table techniques, which are super fast.  By doing all-or-nothing, it is possible many users would not be able to take advantage of it when they are capable.
>>>
>>> Make sense?
>>>
>>> -kevin
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>
> 
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



* Re: GCC -msse2 portability question
       [not found]             ` <CANP1eJErc4qnRhtOCs=Cnh6VNtihLVcZxB1PSCQjpH0sFDBuWA@mail.gmail.com>
@ 2014-03-26 22:13               ` Loic Dachary
  0 siblings, 0 replies; 14+ messages in thread
From: Loic Dachary @ 2014-03-26 22:13 UTC (permalink / raw)
  To: Milosz Tanski; +Cc: Kevin Greenan, Ceph Development




On 26/03/2014 19:44, Milosz Tanski wrote:
> On Wed, Mar 26, 2014 at 3:14 AM, Loic Dachary <loic@dachary.org> wrote:
>> Hi Kevin & Milosz,
>>
>> So it would be
>>
>> if(sse4 & sse3) => use a plugin compiled with sse + sse3 + sse4 activated
>> else if(sse3) => use a plugin with sse2 + sse3 activated but not sse4
>> else => fallback to not using sse at all
> 
> Out of curiosity, does the else (generic) case fall back to SSE2 on
> x86_64? I ask because SSE2 is the guaranteed baseline on x86_64 and I'm
> guessing that most Ceph servers are x86_64.

It does not activate any -msse flag, which is the conservative choice until "erasure-code: fine grain SSE support" ( http://tracker.ceph.com/issues/7865 ) is done. I'm assuming that most Intel processors running Ceph will have SSE3 and only a few will not, based on what http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options shows. But it's a gut feeling. Do you think this is a mistake?
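
The three-way selection quoted above can be sketched as follows. This is
only an illustration: the enum, the function names and the flavour strings
are hypothetical, and reading "sse3" in the thread as SSSE3 is an
assumption based on Kevin's remark about the shuffle instruction.

```c
/* Hypothetical plugin flavours; the names are illustrative only. */
enum plugin_flavor { FLAVOR_GENERIC, FLAVOR_SSE3, FLAVOR_SSE4 };

/* Pick the most capable flavour the CPU can run.
 * has_ssse3 / has_sse4 are runtime probe results (nonzero = supported). */
enum plugin_flavor select_plugin_flavor(int has_ssse3, int has_sse4)
{
    if (has_ssse3 && has_sse4)
        return FLAVOR_SSE4;     /* built with SSE2 + SSSE3 + SSE4 flags */
    if (has_ssse3)
        return FLAVOR_SSE3;     /* built with SSE2 + SSSE3 flags only */
    return FLAVOR_GENERIC;      /* built without any -msse flag */
}

/* Suffix a loader could append to the plugin file name. */
const char *plugin_flavor_name(enum plugin_flavor f)
{
    switch (f) {
    case FLAVOR_SSE4: return "sse4";
    case FLAVOR_SSE3: return "sse3";
    default:          return "generic";
    }
}
```

With this ordering a CPU that reports SSE4 but not SSSE3 gets the generic
plugin, matching the conservative fallback described above.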

This is how it looks at the moment:

https://github.com/dachary/ceph/commit/e7875af10bf92c557b1ef97ffcd871dfe617c160

Cheers


-- 
Loïc Dachary, Artisan Logiciel Libre



end of thread, other threads:[~2014-03-26 22:13 UTC | newest]

Thread overview: 14+ messages
2014-03-23 19:50 GCC -msse2 portability question Loic Dachary
2014-03-23 22:34 ` Laurent GUERBY
2014-03-24 21:27   ` Loic Dachary
2014-03-25  9:43     ` Laurent GUERBY
2014-03-25  9:56       ` Loic Dachary
2014-03-25 11:22         ` Laurent GUERBY
2014-03-25 14:44           ` Milosz Tanski
2014-03-25 18:45             ` Loic Dachary
2014-03-24  1:40 ` Sage Weil
2014-03-25 19:08 ` Loic Dachary
     [not found]   ` <CA+AFVBhpOZEPehsd4qHCBr4aRzv60ZW8LzRwKsduUrZmLV1wxQ@mail.gmail.com>
2014-03-25 19:21     ` Loic Dachary
2014-03-25 19:46       ` Milosz Tanski
     [not found]         ` <CA+AFVBgOEz8_fv9H-8_kOuVSJNL3KQ+36b5kscfjnRMs09DZ6Q@mail.gmail.com>
     [not found]           ` <53327E59.7060408@dachary.org>
     [not found]             ` <CANP1eJG9xoCPkFs19KXG1RPUqc-D3aO_0SBOM=4WWFRN2JtX=g@mail.gmail.com>
2014-03-26 18:24               ` Loic Dachary
     [not found]             ` <CANP1eJErc4qnRhtOCs=Cnh6VNtihLVcZxB1PSCQjpH0sFDBuWA@mail.gmail.com>
2014-03-26 22:13               ` Loic Dachary
