linux-btrfs.vger.kernel.org archive mirror
From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: "Niccolò Belli" <darkbasic@linuxsystems.it>, linux-btrfs@vger.kernel.org
Cc: Clemens Eisserer <linuxhippy@gmail.com>
Subject: Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
Date: Mon, 9 May 2016 07:52:16 -0400
Message-ID: <799cf552-4612-56c5-b44d-59458119e2b0@gmail.com>
In-Reply-To: <f1dd07efb34a0a110f62566979530944@linuxsystems.it>

On 2016-05-07 12:11, Niccolò Belli wrote:
> On 2016-05-07 17:58, Clemens Eisserer wrote:
>> Hi Niccolo,
>>
>>> btrfs + dmcrypt + compress=lzo + autodefrag = corruption at first boot
>>
>> Just to be curious - couldn't it be a hardware issue? I use almost the
>> same setup (compress-force=lzo instead of compress=lzo) on my
>> laptop for 2-3 years and haven't experienced any issues since
>> ~kernel-3.14 or so.
>>
>> Br, Clemens Eisserer
>
> Hi,
> Which kind of hardware issue? I did a full memtest86 check, a full
> smartmontools extended check and even a badblocks -wsv.
> If this is really a hardware issue that we can identify I would be more
> than happy because Dell will replace my laptop and this nightmare will
> be finally over. I'm open to suggestions.

First, some general advice:
1. It is fully possible to have bad RAM that still passes memtest86 
consistently, and in fact, most of the time this will be the case (if 
you're seeing anything other than the bit-fade test in memtest86 fail, 
then your system probably won't boot fully).  Memtest doesn't replicate 
typical usage patterns very well.  My usual testing for RAM involves not 
just memtest, but also booting into a LiveCD (usually SystemRescueCD), 
pulling down a copy of the kernel source, and then running as many 
concurrent kernel builds as there are cores, each with as many make jobs 
as cores (so on a quad-core CPU, or a dual-core one with hyperthreading, 
that means 4 builds, each with -j4 passed to make).  GCC seems to 
have memory usage patterns that reliably trigger memory errors that 
aren't caught by memtest, so this generally gives good results. 
Secondarily, if it's a big system and I am not pressed for time, I do a 
quick Gentoo install with Xen, and then spin up twice as many Xen VMs 
as cores and run memtest in those concurrently (this seems to catch 
things a bit more reliably than just a plain memtest).
2. On a similar note, badblocks doesn't replicate filesystem-like access 
patterns; it just runs sequentially through the entire disk.  This is 
less likely to give misleading results than it is with memtest, but it's 
still important to know.  In 
particular, try running it over a dmcrypt volume a couple of times 
(preferably with a different key each time, pulling keys from 
/dev/urandom works well for this), as that will result in writing 
different data.  For what it's worth, when I'm doing initial testing of 
new disks, I always use ddrescue to copy /dev/zero over the whole disk, 
then do it twice through dmcrypt with different keys, copying from the 
disk to /dev/null after each pass.  This gives random data on disk as a 
starting point (which is good if you're going to use dmcrypt), and 
usually triggers reallocation of any bad sectors as early as possible. 
If I have time and access to an existing system I can connect the disk 
to, I often do testing with fio as well.
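
The two general checks above can be sketched as command plans.  This is
only a rough illustration, not the exact commands used in this message:
the function names, directory layout, mapping name and cipher choice are
all assumptions, and dd stands in for ddrescue to keep the sketch simple.
The disk passes destroy everything on the target device.

```python
import os
import subprocess


def plan_builds(source_dir, work_dir, cores=None):
    """Return one kernel build command per core, each with -j<cores>."""
    cores = cores or os.cpu_count() or 1
    commands = []
    for i in range(cores):
        # O=<dir> gives every build its own output directory, so the
        # concurrent builds do not overwrite each other's objects.
        out_dir = os.path.abspath(os.path.join(work_dir, f"build-{i}"))
        commands.append(["make", "-C", str(source_dir), f"O={out_dir}", f"-j{cores}"])
    return commands


def run_builds(source_dir, work_dir, cores=None):
    """Configure all builds, run them concurrently, return the exit codes."""
    commands = plan_builds(source_dir, work_dir, cores)
    for cmd in commands:
        os.makedirs(cmd[3][2:], exist_ok=True)  # strip the leading "O="
        # Each output directory needs its own .config before building.
        subprocess.run(cmd[:4] + ["defconfig"], check=True)
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


def plan_disk_passes(device, name="disktest", keyed_passes=2):
    """Return the steps: one zero pass, then dmcrypt passes with random keys.

    DESTRUCTIVE: every write step overwrites the whole device.
    """
    mapped = f"/dev/mapper/{name}"
    read_back = ["dd", f"if={device}", "of=/dev/null", "bs=1M"]
    # dd ends with "No space left on device" once the target is full;
    # that is the expected way for a full-device write to finish.
    steps = [["dd", "if=/dev/zero", f"of={device}", "bs=1M"], read_back]
    for _ in range(keyed_passes):
        steps += [
            # Plain mode with a key read from /dev/urandom: a new throwaway
            # key per pass, so the ciphertext on disk differs each time.
            ["cryptsetup", "open", "--type", "plain",
             "--cipher", "aes-xts-plain64", "--key-size", "512",
             "--key-file", "/dev/urandom", device, name],
            ["dd", "if=/dev/zero", f"of={mapped}", "bs=1M"],
            ["cryptsetup", "close", name],
            read_back,
        ]
    return steps
```

Calling run_builds() on a freshly unpacked source tree starts the builds;
a non-zero exit code from a tree that is known to build cleanly is the
kind of result that would make me suspect the hardware.  The disk steps
are returned as a list rather than executed so they can be reviewed
before anything is written to the device.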

Now, to slightly more specific advice:
1. If you have an eSATA port, try plugging your hard disk in there and 
see if things work.  If that works but having the hard drive plugged in 
internally doesn't, then the issue is probably either that specific SATA 
port (in which case your chip-set is bad and you should get a new 
system), or the SATA connector itself (or the wiring, but that's not as 
likely when it's traces on a PCB).  Normally I'd suggest just swapping 
cables and SATA ports, but that's not really possible with a laptop.
2. If you have access to a reasonably large flash drive, or to a USB to 
SATA adapter, try that as well.  If it works on that but not internally 
(or on an eSATA port), you've probably got a bad SATA controller, and 
should get a new system.
3. Try things without dmcrypt.  Adding extra layers makes it harder to 
determine what is actually wrong.  If it works without dmcrypt, try 
using different parameters for the encryption (a different cipher is what 
I would try first).  If it works reliably without dmcrypt, then it's 
either a bug in dmcrypt (which I don't think is very likely), or it's 
a bad interaction between dmcrypt and BTRFS.  If it works with some 
encryption parameters but not others, then that will help narrow down 
where the issue is.
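
A minimal sketch of how the trials in point 3 could be laid out, starting
with a baseline without dmcrypt and then one trial per cipher.  The
cipher list, key sizes, mapping name and function name are assumptions
chosen for illustration; none of them come from the report.  Every trial
reformats the target, so this is only for a disk with nothing on it.

```python
# Candidate dm-crypt cipher specs with a matching key size in bits.
# Assumed for illustration only; adjust to what the kernel supports.
CIPHERS = [
    ("aes-xts-plain64", 512),
    ("aes-cbc-essiv:sha256", 256),
    ("serpent-xts-plain64", 512),
    ("twofish-xts-plain64", 512),
]


def plan_cipher_trials(device, name="btrfs-test", ciphers=CIPHERS):
    """Return one trial per configuration, baseline (no dmcrypt) first."""
    # Baseline: BTRFS directly on the device, no extra layer in between.
    trials = [{"cipher": None, "open": None, "target": device, "close": None}]
    for spec, bits in ciphers:
        trials.append({
            "cipher": spec,
            # Plain mode with a throwaway random key: the volume cannot be
            # reopened after it is closed, so each trial is single-session.
            "open": ["cryptsetup", "open", "--type", "plain",
                     "--cipher", spec, "--key-size", str(bits),
                     "--key-file", "/dev/urandom", device, name],
            "target": f"/dev/mapper/{name}",
            "close": ["cryptsetup", "close", name],
        })
    for trial in trials:
        # DESTRUCTIVE: -f overwrites whatever is on the target.
        trial["mkfs"] = ["mkfs.btrfs", "-f", trial["target"]]
    return trials
```

For each trial, the same workload that produced the corruption would be
run on the resulting filesystem.  If the problem only shows up after a
reboot, a fixed key would be needed instead of /dev/urandom so the volume
can be reopened.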
