All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lionel Bouton <lionel-subscription@bouton.name>
To: "Niccolò Belli" <darkbasic@linuxsystems.it>, linux-btrfs@vger.kernel.org
Cc: Clemens Eisserer <linuxhippy@gmail.com>,
	"Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
	Patrik Lundquist <patrik.lundquist@gmail.com>,
	Chris Murphy <lists@colorremedies.com>,
	Qu Wenruo <quwenruo@cn.fujitsu.com>,
	Omar Sandoval <osandov@osandov.com>
Subject: Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
Date: Mon, 9 May 2016 21:23:07 +0200	[thread overview]
Message-ID: <5730E39B.7000403@bouton.name> (raw)
In-Reply-To: <52f0c710-d695-443d-b6d5-266e3db634f8@linuxsystems.it>

Hi,

Le 09/05/2016 16:53, Niccolò Belli a écrit :
> On domenica 8 maggio 2016 20:27:55 CEST, Patrik Lundquist wrote:
>> Are you using any power management tweaks?
>
> Yes, as stated in my very first post I use TLP with
> SATA_LINKPWR_ON_BAT=max_performance, but I managed to reproduce the
> bug even without TLP. Also in the past week I've alwyas been on AC.
>
> On lunedì 9 maggio 2016 13:52:16 CEST, Austin S. Hemmelgarn wrote:
>> Memtest doesn't replicate typical usage patterns very well.  My usual
>> testing for RAM involves not just memtest, but also booting into a
>> LiveCD (usually SystemRescueCD), pulling down a copy of the kernel
>> source, and then running as many concurrent kernel builds as cores,
>> each with as many make jobs as cores (so if you've got a quad core
>> CPU (or a dual core with hyperthreading), it would be running 4
>> builds with -j4 passed to make).  GCC seems to have memory usage
>> patterns that reliably trigger memory errors that aren't caught by
>> memtest, so this generally gives good results.
>
> Building kernel with 4 concurrent threads is not an issue for my
> system, in fact I do compile a lot and I never had any issue.

Note : I once had a server which would pass memtest86 and repeated
kernel compilations maxing out the CPU threads but couldn't at the same
time reliably compile a kernel and copy large amounts of data.
I think I lost my little automated test suite (I should definitely look
for it again or code it from scratch) but what I did on new servers
since that time was :

1/ create a file larger than the system's RAM (this makes sure you will
read and write all data from disk and not only caches and might catch
controller hardware problems too) with dd if=/dev/urandom (several
gigabytes of random data exercise many different patterns, far more than
what memtest86 would test), compute its md5 checksum
2/ launch a subprocess repeatedly compiling the kernel with more jobs
than available CPU threads and stopping as soon as the make exit code
was != 0.
3/ launch another subprocess repeatedly copying the random file to
another location and exiting when the md5 checksum didn't match the source.

Let it run as a burn-in test for as long as you can afford (from
experience after 24 hours if it's still running the probability that the
test will find a problem becomes negligible).
If one of the subprocess stopped by itself your hardware is not stable.

This actually caught a few unstable systems before it could go into
production for me.

Lionel

  parent reply	other threads:[~2016-05-09 19:23 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-04 23:21 btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair Niccolò Belli
2016-05-05  1:07 ` Chris Murphy
2016-05-05 10:36   ` Niccolò Belli
2016-05-05 17:48     ` Omar Sandoval
2016-05-06 11:38       ` Niccolò Belli
2016-05-07 15:45         ` Niccolò Belli
2016-05-07 15:58           ` Clemens Eisserer
2016-05-07 16:11             ` Niccolò Belli
2016-05-08 18:27               ` Patrik Lundquist
2016-05-09 11:52               ` Austin S. Hemmelgarn
2016-05-09 14:53                 ` Niccolò Belli
2016-05-09 16:29                   ` Zygo Blaxell
2016-05-09 18:21                     ` Austin S. Hemmelgarn
2016-05-09 19:18                       ` Duncan
2016-05-12 14:35                     ` Niccolò Belli
2016-05-12 15:43                       ` Austin S. Hemmelgarn
2016-05-13 11:07                         ` Niccolò Belli
2016-05-13 11:35                           ` Austin S. Hemmelgarn
2016-05-13 12:10                             ` Niccolò Belli
2016-05-13 21:54                               ` Chris Murphy
2016-05-12 16:48                       ` Zygo Blaxell
2016-05-09 19:23                   ` Lionel Bouton [this message]
2016-05-09 21:30                   ` Chris Murphy
2016-05-07 23:35           ` Chris Murphy
2016-05-05  4:12 ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5730E39B.7000403@bouton.name \
    --to=lionel-subscription@bouton.name \
    --cc=ahferroin7@gmail.com \
    --cc=darkbasic@linuxsystems.it \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linuxhippy@gmail.com \
    --cc=lists@colorremedies.com \
    --cc=osandov@osandov.com \
    --cc=patrik.lundquist@gmail.com \
    --cc=quwenruo@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.