* AVX "Sandy Bridge" hardware issue?
@ 2011-07-12 20:16 MK
  2011-07-12 21:06 ` Chris Friesen
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: MK @ 2011-07-12 20:16 UTC (permalink / raw)
  To: linux-kernel

Hi gang! I'd forgotten how busy this list is, I hope someone can help
me out.

I have a small VPS slice, run under OpenVZ, that I use for testing and
personal projects.  Recently, the provider migrated to new Xeon "Sandy
Bridge" processors, which according to Wikipedia are the first and
thus far only commercially available processors using AVX.

After the migration, a number of my Apache mod_perl applications
started dying with SIGILL.  Reproducible test case:

use Apache2::Const qw(SERVER_ERROR);

sub handler {
     return SERVER_ERROR;
};

Apache2::Const is the indirect culprit here; if I remove it and just
return 500 the module works.  Note that this is not a perl error. A
backtrace from running apache under gdb, triggering the issue, is here:

http://pastebin.com/16SrEzHM

I posted this to the mod_perl list and someone pointed me to a
backtrace identical in its final contexts, from a glibc bug
reported last year:

http://sourceware.org/bugzilla/show_bug.cgi?format=multiple&id=12113

Which involves AVX hardware.  The VPS provider has provided me with a
bare Fedora 14 slice for debugging this issue, and the "small
reproducer" available from the above bug report, verified by Ulrich
Drepper, does reproduce the issue.

So I filed a glibc bug with fedora to that effect:

https://bugzilla.redhat.com/show_bug.cgi?id=720176

In which Andreas Schwab points out (rightly or wrongly) that according
to the /proc/cpuinfo from the slice, the processor does not actually
support AVX.  However, the "model name", "Intel(R) Xeon(R) CPU
E31230", is according to this a Sandy Bridge processor with AVX:

http://en.wikipedia.org/wiki/Sandy_Bridge#Server_processors

And while I do not have access to the hardware, the provider is
unequivocal that these are Sandy Bridges, which apparently include AVX.

So I am looking for a next step to take in debugging this.  The kernel
used on the slice (N.B. OpenVZ does not let you run your own) is
2.6.32, built with gcc 4.1.2.  I think this may predate AVX support
in the kernel and gcc, but the glibc is 2.13, which apparently includes
it.

Does anyone have any idea why I would get this identical backtrace, and
a failed reproducer test, on hardware which supposedly supports AVX
(but not according to the kernel in /proc/cpuinfo)?

Sincerely, MK   

-- 
"Enthusiasm is not the enemy of the intellect." (said of Irving Howe)
"The angel of history[...]is turned toward the past." (Walter Benjamin)



* Re: AVX "Sandy Bridge" hardware issue?
  2011-07-12 20:16 AVX "Sandy Bridge" hardware issue? MK
@ 2011-07-12 21:06 ` Chris Friesen
  2011-07-15 13:06   ` MK
  2011-07-13  0:49 ` Andi Kleen
  2011-07-20 13:55 ` Andy Lutomirski
  2 siblings, 1 reply; 7+ messages in thread
From: Chris Friesen @ 2011-07-12 21:06 UTC (permalink / raw)
  To: MK; +Cc: linux-kernel

On 07/12/2011 02:16 PM, MK wrote:

> So I filed a glibc bug with fedora to that effect:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=720176
>
> In which Andreas Schwab points out (rightly or wrongly) that according
> to the /proc/cpuinfo from the slice, the processor actually does not
> support AVX.  However,  the "model name", "Intel(R) Xeon(R) CPU
> E31230", is according to this a Sandy Bridge processor with AVX:
>
> http://en.wikipedia.org/wiki/Sandy_Bridge#Server_processors
>
> And while I do not have access to the hardware, the provider is very
> unequivocal about the fact that these are Sandy Bridges, which
> apparently include AVX.
>
> So I am looking for a next step to take in debugging this.  The kernel
> used on the slice (nb, openVZ does not allow for rolling your own) is
> 2.6.32 built with gcc 4.1.2.  I think this may be prior to AVX support
> in the kernel and gcc, but the glibc is 2.13, which apparently includes
> it.
>
> Does anyone have any idea why I would get this identical backtrace, and
> a failed reproducer test, on hardware which supposedly supports AVX
> (but not according to the kernel in /proc/cpuinfo)?

For what it's worth, Intel says the E31230 supports AVX as well:

http://ark.intel.com/Product.aspx?id=52271

so it's interesting that /proc/cpuinfo doesn't mention it.  Certainly 
I'd consider that worth following up.


As far as I can tell, support for saving/restoring AVX registers was
added in 2.6.30, but gcc 4.6 is needed for proper support to actually
use the new instructions.  However, since you're using OpenVZ, every
slice runs on the same kernel, so if there is a bug in the kernel all
it would take to trigger it is one VPS slice running code built with
a new compiler.

Chris

-- 
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com


* Re: AVX "Sandy Bridge" hardware issue?
  2011-07-12 20:16 AVX "Sandy Bridge" hardware issue? MK
  2011-07-12 21:06 ` Chris Friesen
@ 2011-07-13  0:49 ` Andi Kleen
  2011-07-13 16:17   ` Chris Friesen
  2011-07-15 13:12   ` MK
  2011-07-20 13:55 ` Andy Lutomirski
  2 siblings, 2 replies; 7+ messages in thread
From: Andi Kleen @ 2011-07-13  0:49 UTC (permalink / raw)
  To: MK; +Cc: linux-kernel

MK <mk@cognitivedissonance.ca> writes:
>
> In which Andreas Schwab points out (rightly or wrongly) that according
> to the /proc/cpuinfo from the slice, the processor actually does not
> support AVX.  However,  the "model name", "Intel(R) Xeon(R) CPU
> E31230", is according to this a Sandy Bridge processor with AVX:

If it's in a VM then the VM may not expose AVX to the guest
(VMs have to do that explicitly because AVX has additional state).
If it's not in /proc/cpuinfo on the guest, that's likely the case.

However glibc should of course not use AVX in this case.
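
A quick way to see what the kernel itself advertises is to read the "flags" line of /proc/cpuinfo. The following is only a minimal sketch: the helper names cpu_flags and kernel_reports_avx are made up for illustration, and it reads just the first "flags" line it finds.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. not an x86 Linux system)


def kernel_reports_avx(cpuinfo_text):
    """True if the kernel lists 'avx' among the CPU flags."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # file not available; treated as "no flags"
    print("avx reported by kernel:", kernel_reports_avx(text))
```

If this reports no AVX on a CPU that is known to implement it, user space should treat AVX as unavailable.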

> Does anyone have any idea why I would get this identical backtrace, and
> a failed reproducer test, on hardware which supposedly supports AVX
> (but not according to the kernel in /proc/cpuinfo)?

If there's a problem then it's likely in the VM. Maybe it only
leaks AVX through partially.

For example I had a similar problem a long time ago on a system
which had inconsistent CPU features for different CPUs. The kernel
will choose the least common denominator, but an application
directly calling CPUID can sometimes see different flags and then
crash when it switches CPUs.

The symptoms would be consistent with that.
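
To illustrate the "least common denominator" point, here is a rough sketch with made-up per-CPU flag sets (the data and the function names are hypothetical, not taken from any real system). It computes the flags every CPU shares and lists the CPUs that offer something extra, which is what an application calling CPUID directly could trip over.

```python
def common_features(per_cpu_flags):
    """Flags present on every CPU: the safe 'least common denominator'."""
    sets = [set(flags) for flags in per_cpu_flags.values()]
    return set.intersection(*sets) if sets else set()


def inconsistent_features(per_cpu_flags):
    """Map each CPU id to the flags it has beyond the common set."""
    common = common_features(per_cpu_flags)
    extras = {}
    for cpu, flags in per_cpu_flags.items():
        extra = set(flags) - common
        if extra:
            extras[cpu] = extra
    return extras


if __name__ == "__main__":
    # Hypothetical machine: CPU 0 advertises avx, CPU 1 does not.
    per_cpu = {0: {"sse", "sse2", "avx"}, 1: {"sse", "sse2"}}
    print("safe on all CPUs:", sorted(common_features(per_cpu)))
    print("only on some CPUs:", inconsistent_features(per_cpu))
```

Code that picked AVX after querying CPU 0 here would die with SIGILL once scheduled on CPU 1.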

I would contact the VM vendor.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: AVX "Sandy Bridge" hardware issue?
  2011-07-13  0:49 ` Andi Kleen
@ 2011-07-13 16:17   ` Chris Friesen
  2011-07-15 13:12   ` MK
  1 sibling, 0 replies; 7+ messages in thread
From: Chris Friesen @ 2011-07-13 16:17 UTC (permalink / raw)
  To: Andi Kleen; +Cc: MK, linux-kernel

On 07/12/2011 06:49 PM, Andi Kleen wrote:
> MK<mk@cognitivedissonance.ca>  writes:
>>
>> In which Andreas Schwab points out (rightly or wrongly) that according
>> to the /proc/cpuinfo from the slice, the processor actually does not
>> support AVX.  However,  the "model name", "Intel(R) Xeon(R) CPU
>> E31230", is according to this a Sandy Bridge processor with AVX:
>
> If it's in a VM then the VM may not expose AVX to the guest
> (VMs have to do that explicitly because AVX has additional state).
> If it's not in /proc/cpuinfo on the guest, that's likely the case.

The OP mentioned that he's using OpenVZ, so it's running in a container 
(chroot on steroids), not a "true" VM.

Each instance runs the same kernel, they're isolated from the other 
instances using filesystem/process/network namespaces.

That said, /proc/cpuinfo is virtualized somehow by the containerization,
since it shows only the CPUs assigned to the container. It seems
plausible that they got the CPU flags wrong while doing it.

I think your suggestion of talking to the vendor is the correct next step.

Chris



* Re: AVX "Sandy Bridge" hardware issue?
  2011-07-12 21:06 ` Chris Friesen
@ 2011-07-15 13:06   ` MK
  0 siblings, 0 replies; 7+ messages in thread
From: MK @ 2011-07-15 13:06 UTC (permalink / raw)
  To: linux-kernel

On Tue, 12 Jul 2011 15:06:48 -0600
Chris Friesen <chris.friesen@genband.com> wrote:
> For what it's worth, Intel says the E31230 supports AVX as well:
> 
> http://ark.intel.com/Product.aspx?id=52271

Thanks for that.
 
> As far as I can tell, support for saving/restoring AVX registers was 
> added in 2.6.30, but gcc 4.6 is needed for proper support to actually 
> use the new instructions.  However, given that you're using openVZ
> then if there is a bug in the kernel all it would take is one VPS
> slice using code built with a new compiler.

I might try building a new gcc (since there is no 4.6 binary for
F14) and then testing that way.





* Re: AVX "Sandy Bridge" hardware issue?
  2011-07-13  0:49 ` Andi Kleen
  2011-07-13 16:17   ` Chris Friesen
@ 2011-07-15 13:12   ` MK
  1 sibling, 0 replies; 7+ messages in thread
From: MK @ 2011-07-15 13:12 UTC (permalink / raw)
  To: linux-kernel

On Tue, 12 Jul 2011 17:49:35 -0700
Andi Kleen <andi@firstfloor.org> wrote:
> If there's a problem then it's likely in the VM. Maybe it only
> leaks AVX through partially.
> 
> For example I had a similar problem a long time ago on a system
> which had inconsistent CPU features for different CPUs. The kernel
> will choose the least common denominator, but an application
> directly calling CPUID can sometimes see different flags and then
> crash when it switches CPUs.
> 
> The symptoms would be consistent with that.
> 
> I would contact the VM vendor.

I think I will go to the developer list for OpenVZ first and get some more
info.  The vendor is trepidatious as no one else has had a problem (I
dunno how many, if any, of the other clients use mod_perl) and this
could be an unsolvable headache for them.

Thanks all for the input!


* Re: AVX "Sandy Bridge" hardware issue?
  2011-07-12 20:16 AVX "Sandy Bridge" hardware issue? MK
  2011-07-12 21:06 ` Chris Friesen
  2011-07-13  0:49 ` Andi Kleen
@ 2011-07-20 13:55 ` Andy Lutomirski
  2 siblings, 0 replies; 7+ messages in thread
From: Andy Lutomirski @ 2011-07-20 13:55 UTC (permalink / raw)
  To: MK; +Cc: linux-kernel

On 07/12/2011 04:16 PM, MK wrote:
> Hi gang! I'd forgotten how busy this list is, I hope someone can help
> me out.
>
> I have a small VPS slice, run under openVZ, that I use for testing and
> personal projects.  Recently, the provider migrated to new Xeon "Sandy
> Bridge" processors, which according to wikipedia are the first and
> thus far only commercially available processors using AVX.
>
> After the migration, I had a number of apache mod_perl applications
> break due to SIGILL.   Reproducible test case:
>
> use Apache2::Const qw(SERVER_ERROR);
>
> sub handler {
>       return SERVER_ERROR;
> };
>
> Apache2::Const is the indirect culprit here; if I remove it and just
> return 500 the module works.  Note that this is not a perl error. A
> backtrace from running apache under gdb, triggering the issue, is here:
>
> http://pastebin.com/16SrEzHM
>
> I posted this to the mod_perl list and someone pointed me to a
> backtrace identical in its final contexts, from a glibc bug
> reported last year:
>
> http://sourceware.org/bugzilla/show_bug.cgi?format=multiple&id=12113
>
> Which involves AVX hardware.  The VPS provider has provided me with a
> bare Fedora 14 slice for debugging this issue, and the "small
> reproducer" available from the above bug report, verified by Ulrich
> Drepper, does reproduce the issue.
>
> So I filed a glibc bug with fedora to that effect:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=720176
>
> In which Andreas Schwab points out (rightly or wrongly) that according
> to the /proc/cpuinfo from the slice, the processor actually does not
> support AVX.  However,  the "model name", "Intel(R) Xeon(R) CPU
> E31230", is according to this a Sandy Bridge processor with AVX:
>
> http://en.wikipedia.org/wiki/Sandy_Bridge#Server_processors
>
> And while I do not have access to the hardware, the provider is very
> unequivocal about the fact that these are Sandy Bridges, which
> apparently include AVX.
>
> So I am looking for a next step to take in debugging this.  The kernel
> used on the slice (nb, openVZ does not allow for rolling your own) is
> 2.6.32 built with gcc 4.1.2.  I think this may be prior to AVX support
> in the kernel and gcc, but the glibc is 2.13, which apparently includes
> it.
>
> Does anyone have any idea why I would get this identical backtrace, and
> a failed reproducer test, on hardware which supposedly supports AVX
> (but not according to the kernel in /proc/cpuinfo)?

I was bored and read the manual.  It looks like glibc is buggy: it 
checks whether the CPU supports AVX but not whether the OS enables AVX.

http://sourceware.org/bugzilla/show_bug.cgi?id=13007
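
For reference, the decision logic as I read the manual can be sketched as below. This is not glibc's code, just an illustration: the function and constant names are my own, and the register values would really come from the CPUID and XGETBV instructions in native code. Here XCR0 is supplied through a callable so that it is only read when OSXSAVE is set, because XGETBV itself faults otherwise.

```python
# Bit positions per the Intel manual: CPUID leaf 1 (ECX) and XCR0.
CPUID1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE/XGETBV
CPUID1_ECX_AVX = 1 << 28      # CPU implements AVX
XCR0_SSE = 1 << 1             # OS saves/restores XMM state
XCR0_AVX = 1 << 2             # OS saves/restores the upper YMM state


def avx_usable(cpuid1_ecx, read_xcr0):
    """Return True only if the CPU has AVX *and* the OS has enabled it.

    cpuid1_ecx: ECX value returned by CPUID leaf 1.
    read_xcr0:  callable returning XCR0 (stands in for XGETBV with ECX=0).
    """
    if not cpuid1_ecx & CPUID1_ECX_AVX:
        return False  # hardware lacks AVX
    if not cpuid1_ecx & CPUID1_ECX_OSXSAVE:
        return False  # OS never enabled XSAVE; do not execute XGETBV
    needed = XCR0_SSE | XCR0_AVX
    return (read_xcr0() & needed) == needed


if __name__ == "__main__":
    # CPU with AVX under a kernel that has not enabled XSAVE.
    print(avx_usable(CPUID1_ECX_AVX, lambda: 0))  # False
    # Same CPU with OSXSAVE set and XCR0 = 0x7 (x87, SSE and AVX state).
    print(avx_usable(CPUID1_ECX_AVX | CPUID1_ECX_OSXSAVE, lambda: 0x7))  # True
```

Checking only the AVX bit amounts to the first test alone, which would match the SIGILL on a kernel that does not enable the YMM state.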

That being said, you should still bug your provider for a better kernel.
AVX is useful and should be enabled.

--Andy
