From: Johannes Bauer <dfnsonfsduifb@gmx.de>
To: Andrey Korolyov <andrey@xdel.ru>
Cc: Jan Kara <jack@suse.cz>, linux-ext4@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Frequent ext4 oopses with 4.4.0 on Intel NUC6i3SYB
Date: Tue, 4 Oct 2016 21:55:58 +0200
Message-ID: <087b53e5-b23b-d3c2-6b8e-980bdcbf75c1@gmx.de>
In-Reply-To: <CABYiri-UUT6zVGyNENp-aBJDj6Oikodc5ZA27Gzq5-bVDqjZ4g@mail.gmail.com>

On 04.10.2016 20:45, Andrey Korolyov wrote:
>> Damn bad idea to build on the instable target. Lots of gcc segfaults and
>> weird stuff, even without a kernel panic. The system appears to be
>> instable as hell. Wonder how it can even run and how much of the root fs
>> is already corrupted :-(
>>
>> Rebuilding 4.8 on a different host.
> 
> Looks like a platform itself is somewhat faulty: [1]. Also please bear
> in mind that standalone memory testers would rather not expose certain
> classes of memory failures, I`d suggest to test allocator`s work
> against gcc runs on tmpfs, almost same as you did before. Frequency of
> crashes due to wrong pointer contents of an fs cache is most probably
> a direct outcome from its relative memory footprint.

So there are some interesting new data points that I can't make sense
of. Maybe you can.

First off, 4.8.0 shows the same symptoms. When I try to build 4.8.0 in
/usr/src/linux using make -j4, I get bus errors and segfaults in gcc
pretty soon.

Doing the same thing in /dev/shm, however, builds like a charm. Three
kernels built, all ran through perfectly. Not one attempt in /usr/src
did that; every one of them failed.
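
One way to narrow down the /usr/src vs. /dev/shm difference could be a
write/read-back checksum test run against both directories. The following
is only a rough sketch, assuming Python 3 is available on the box; the
function names (write_pattern, read_digest, roundtrip_ok) and the file name
roundtrip.bin are made up for illustration. Note that posix_fadvise is
merely a hint to drop cached pages, so for a disk-backed directory the file
should ideally be larger than RAM (or caches dropped via
/proc/sys/vm/drop_caches) before the read-back means anything.

```python
import hashlib
import os


def write_pattern(path, size_mib, seed=b"seed"):
    """Write size_mib MiB of deterministic data to path; return its SHA-256."""
    digest = hashlib.sha256()
    with open(path, "wb") as f:
        for i in range(size_mib):
            # 1 MiB of reproducible pseudo-random bytes per chunk
            data = hashlib.shake_256(seed + i.to_bytes(8, "big")).digest(1 << 20)
            f.write(data)
            digest.update(data)
        f.flush()
        os.fsync(f.fileno())  # push the data out to the backing store
        if hasattr(os, "posix_fadvise"):
            # Advisory only: ask the kernel to drop the cached pages
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
    return digest.hexdigest()


def read_digest(path):
    """Read the file back in 1 MiB chunks and return its SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for data in iter(lambda: f.read(1 << 20), b""):
            digest.update(data)
    return digest.hexdigest()


def roundtrip_ok(directory, size_mib=64, rounds=3):
    """Return True if every write/read-back round in directory matches."""
    path = os.path.join(directory, "roundtrip.bin")  # hypothetical file name
    try:
        for r in range(rounds):
            expected = write_pattern(path, size_mib, seed=("round%d" % r).encode())
            if read_digest(path) != expected:
                return False
        return True
    finally:
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    for d in ("/usr/src", "/dev/shm"):
        print(d, "ok" if roundtrip_ok(d) else "MISMATCH")
```

A mismatch in one directory but not the other would point at that storage
path rather than at the build itself. A clean run proves little, since the
expected digest is computed in the same (possibly faulty) RAM.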

What could cause this? Faulty hard drive? It's brand new:

Model Family:     Western Digital Red
Device Model:     WDC WD10JFCX-68N6GN0
Firmware Version: 82.00A82

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   182   181   021    Pre-fail  Always       -       1858
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       178

Or faulty AHCI controller or driver?

[    9.746277] ahci 0000:00:17.0: version 3.0
[    9.746499] ahci 0000:00:17.0: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
[    9.746501] ahci 0000:00:17.0: flags: 64bit ncq pm led clo only pio slum part deso sadm sds apst
[    9.753844] scsi host0: ahci
[    9.754648] ata1: SATA max UDMA/133 abar m2048@0xdf14d000 port 0xdf14d100 irq 275

I'm super puzzled right now :-(

Cheers,
Johannes
