AMD FX CPU bug, not fixed by latest microcode?

* AMD FX CPU bug, not fixed by latest microcode?
@ 2012-06-10 19:24 Boszormenyi Zoltan
  2012-06-11  7:52 ` Clemens Ladisch
  2012-06-11  8:43 ` Borislav Petkov
  0 siblings, 2 replies; 14+ messages in thread
From: Boszormenyi Zoltan @ 2012-06-10 19:24 UTC
  To: linux-kernel

Hi,

I have a boxed AMD FX-8120 CPU on an ASUS M5A99X-EVO mainboard
with 32 GB of DDR3-1600 memory, running Fedora 17 (upgraded from 16).
memtest86+ shows no problems.

Still, I get occasional crashes and signal 11 (SIGSEGV) during kernel
compilation, even with a single-job make. Sometimes the compiler bails out
with a strange error message like "stray \NNN character in the source".
When I re-run make, the error doesn't happen in the same file, and the
source file doesn't contain the offending character when inspected with
an editor or hexdump.
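Assuming the escape is octal, as in GCC's "stray '\302' in program" error, a quick way to double-check that the byte really is absent from the file on disk is to search for it directly. This is a minimal sketch; find_stray_byte is a hypothetical helper, not an existing tool:

```python
import sys
from typing import List


def find_stray_byte(path: str, octal_escape: str) -> List[int]:
    """Return every offset in `path` that holds the byte named by an
    octal escape such as "\\302" (assumed to be GCC-style octal)."""
    value = int(octal_escape.lstrip("\\"), 8)  # "\302" -> 0o302 == 0xC2
    needle = bytes([value])
    with open(path, "rb") as f:
        data = f.read()
    offsets = []
    pos = data.find(needle)
    while pos != -1:                 # collect every occurrence, not just the first
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_stray.py drivers/some/file.c '\302'
    hits = find_stray_byte(sys.argv[1], sys.argv[2])
    print(f"{len(hits)} occurrence(s) at offsets: {hits}")
```

An empty result for a file that just failed to compile would be consistent with the data being corrupted on its way into the compiler rather than on disk.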

A few minutes ago I managed to catch this bug in the act. I copied the
kernel Git tree to apply a patch manually and ran "git commit -a".
Strangely, the commit contained one extra file that I hadn't touched.
git diff showed this for the extra file:

==============================

--- a/drivers/usb/gadget/fsl_usb2_udc.h
+++ b/drivers/usb/gadget/fsl_usb2_udc.h
@@ -427,7 +427,7 @@ struct ep_td_struct {
  #define  DTD_ADDR_MASK                        0xFFFFFFE0
  #define  DTD_PACKET_SIZE                      0x7FFF0000
  #define  DTD_LENGTH_BIT_POS                   16
-#define  DTD_ERROR_MASK                       (DTD_STATUS_HALTED | \
+#define  DTD_ERROR_MASK                       (DTD_STATUS_HALTED | ^Z
                                                 DTD_STATUS_DATA_BUFF_ERR | \
                                                 DTD_STATUS_TRANSACTION_ERR)
  /* Alignment requirements; must be a power of two */
==============================

The "^Z" is a control character in the file, sitting where the
line-continuation backslash should be. It is not present in the
original source tree, only in the copy.
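Caret notation only hints at the stored value (0x00 is shown as ^@, 0x1A as ^Z), so it can help to print the exact byte values wherever the copy differs from the original. A minimal sketch, assuming both the original tree and the copy are still available; diff_bytes, caret and describe are hypothetical helper names:

```python
import sys
from itertools import zip_longest
from typing import List, Optional, Tuple


def caret(b: int) -> str:
    """Render one byte roughly the way a pager shows it."""
    if b < 0x20:
        return "^" + chr(b + 0x40)  # caret notation: 0x00 -> ^@, 0x1A -> ^Z
    if b == 0x7F:
        return "^?"                 # DEL
    if b >= 0x80:
        return f"<{b:02X}>"         # high bytes shown as hex
    return chr(b)                   # printable ASCII as-is


def diff_bytes(original: str, copy: str) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """Return (offset, original_byte, copied_byte) for every mismatch.
    None marks a byte that exists in only one of the two files."""
    with open(original, "rb") as a, open(copy, "rb") as b:
        left, right = a.read(), b.read()
    return [(i, x, y)
            for i, (x, y) in enumerate(zip_longest(left, right))
            if x != y]


def describe(v: Optional[int]) -> str:
    return "EOF" if v is None else f"0x{v:02X} ({caret(v)})"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_bytes.py ORIGINAL_FILE COPIED_FILE
    for off, x, y in diff_bytes(sys.argv[1], sys.argv[2]):
        print(f"offset {off}: {describe(x)} -> {describe(y)}")
```

A single changed byte, rather than a shifted or truncated region, would suggest that individual bytes are being altered in flight.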

Similar corruption happened when copying large files on the same
machine, but the files don't need to be large: reading a large enough
total amount of data seems to be sufficient to trigger it.
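A rough way to reproduce this without a compiler is to read the same tree several times and compare checksums, since unchanged files must hash identically on every pass. A minimal sketch using Python's hashlib; reread.py, all_files and unstable_files are hypothetical names. After the first pass the data will usually come from the page cache, so this mostly exercises the CPU and memory path rather than the disk:

```python
import hashlib
import os
import sys
from typing import Iterator, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def all_files(root: str) -> Iterator[str]:
    """Yield every regular file below `root`, skipping symlinks."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            p = os.path.join(dirpath, name)
            if not os.path.islink(p) and os.path.isfile(p):
                yield p


def unstable_files(root: str, passes: int = 3) -> List[str]:
    """Re-hash every file `passes` times; return those whose digest changed."""
    paths = sorted(all_files(root))
    baseline = {p: sha256_of(p) for p in paths}
    changed = set()
    for _ in range(passes - 1):  # the baseline already counts as pass 1
        for p in paths:
            if sha256_of(p) != baseline[p]:
                changed.add(p)
    return sorted(changed)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python reread.py /path/to/linux
    flaky = unstable_files(sys.argv[1])
    print("files whose digest changed between passes:", flaky or "none")
```

On healthy hardware the result is always empty; any listed file means the same bytes read back differently.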

The mainboard has the latest (UEFI) firmware flashed, which
contains the latest AMD microcode, so microcode_ctl doesn't
need to apply it anymore. Previously, I used amd-ucode-2012-01-17.tar
from www.amd64.org/support/microcode.html, which is now
part of microcode_ctl in Fedora.
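To confirm which microcode revision is actually running, rather than relying on the firmware changelog, one can read the per-CPU "microcode" field that current x86 kernels print in /proc/cpuinfo. Whether the Fedora 17 kernel exposes this field for AMD CPUs, and that it is printed in hex, are assumptions; if the field is absent the sketch returns an empty dict. microcode_revisions is a hypothetical helper:

```python
import os
from typing import Dict


def microcode_revisions(cpuinfo_text: str) -> Dict[int, int]:
    """Map each logical CPU number to the microcode revision listed in
    /proc/cpuinfo text (assumes the field is hex, e.g. '0x...')."""
    revisions: Dict[int, int] = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")  # split on the first colon only
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "microcode" and cpu is not None:
            revisions[cpu] = int(value, 16)  # base 16 also accepts a 0x prefix
    return revisions


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        revs = microcode_revisions(f.read())
    if not revs:
        print("no 'microcode' field found in /proc/cpuinfo")
    for cpu, rev in sorted(revs.items()):
        print(f"cpu{cpu}: microcode 0x{rev:x}")
```

All cores would normally report the same revision; a mismatch between cores would itself be worth mentioning in a bug report.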

Since the error also happens while compiling a source file, not only
while copying, the bug seems to happen while *reading* data.

Does anyone know whether it's a known problem in AMD FX CPUs?
Does AMD have newer microcode that fixes this bug, or should I
return the CPU under warranty?

Thanks in advance,
Zoltán Böszörményi
