linux-kernel.vger.kernel.org archive mirror
* dm-crypt is broken and causes massive data corruption
@ 2006-05-08 17:20 Tillmann Steinbrecher
  2006-05-08 17:57 ` [dm-crypt] " Simpson, Brett
  2006-05-09 19:04 ` Alasdair G Kergon
  0 siblings, 2 replies; 8+ messages in thread
From: Tillmann Steinbrecher @ 2006-05-08 17:20 UTC
  To: linux-kernel, dm-crypt

Hi,

it's been many months that dm-crypt has been broken, and is known to 
cause massive data corruption.

Various people have noticed this, have lost data and wasted many hours 
trying to find the reason, and still NOTHING is being done about it. The 
problem seems to occur only in conjunction with RAID (dm-crypt on top of 
RAID) (or possibly it occurs only in conjunction with large 
filesystems). I've had issues with that for many months as well, trying 
to eliminate other possible reasons. There are none.

Let's say this loud and clear:

dm-crypt causes data corruption. Yet it is not even marked as 
"EXPERIMENTAL" in the kernel config, when in fact it's more than just 
experimental, it's "DANGEROUS/BROKEN".

Here are some more reports:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=336153
(That was for 2.6.8, but the problems are still the same in recent 
kernel versions)

http://www.ubuntuforums.org/showthread.php?t=170304
(Similar config, similar problem - this time with 2.6.12 and 2.6.15)

http://episteme.arstechnica.com/groupee/forums/a/tpc/f/96509133/m/282007248731/r/224008458731
(Again the same constellation, and the same problem.)

http://marc.theaimsgroup.com/?l=linux-kernel&m=114664786711245&w=2
(Same config, same problem. This time with 2.6.16!)

BTW the problem seems to be independent of the filesystem used; 
however, some filesystems seem to be more robust than others against 
this type of corruption. With ext3, the filesystem would mess itself up 
within hours on my system. With XFS, massive corruption (all data lost) 
had occurred after a few weeks. With ReiserFS 3, occasional problems 
occurred that were fixable using reiserfsck --rebuild-tree.

Sorry for the rant. But I think this is an important issue that needs to 
be addressed ASAP, before even more people lose their data. Keep in mind 
that crypto filesystems are typically used for systems where the data is 
sensitive and important! Something must be done about it - in the worst 
case, removing dm-crypt from the mainline kernel.

Please CC replies to me, as I'm not subscribed to either linux-kernel or 
dm-crypt.

bye,
Tillmann
-- 
Dipl.-Ing. Tillmann Steinbrecher        http://www.igd.fhg.de/~tsteinbr/
Cognitive Computing & Medical Imaging
Fraunhofer IGD, Fraunhoferstr. 5, D-64283 Darmstadt, Germany
All opinions are mine and not those of my employer.



* Re: [dm-crypt] dm-crypt is broken and causes massive data corruption
  2006-05-08 17:20 dm-crypt is broken and causes massive data corruption Tillmann Steinbrecher
@ 2006-05-08 17:57 ` Simpson, Brett
  2006-05-08 18:27   ` Christophe Saout
  2006-05-09 19:04 ` Alasdair G Kergon
  1 sibling, 1 reply; 8+ messages in thread
From: Simpson, Brett @ 2006-05-08 17:57 UTC
  To: dm-crypt; +Cc: linux-kernel, Tillmann Steinbrecher

On Mon, 2006-05-08 at 19:20 +0200, Tillmann Steinbrecher wrote:

> it's been many months that dm-crypt has been broken, and is known to 
> cause massive data corruption.
> 
> Various people have noticed this, have lost data and wasted many hours 
> trying to find the reason, and still NOTHING is being done about it. The 
> problem seems to occur only in conjunction with RAID (dm-crypt on top of 
> RAID) (or possibly it occurs only in conjunction with large 
> filesystems). I've had issues with that for many months as well, trying 
> to eliminate other possible reasons. There are none.

I've been running Gentoo for over a month with a 54GB ext3 filesystem via
dm-crypt on an IDE drive. No problems so far.

I've used Gentoo-sources 2.6.16-r1 and vanilla kernels 2.6.17-rc1
through rc3.

I've been using cryptsetup-1.0.1-i686-pc-linux-gnu-static and have it in
my initrd so I can mount my root partition.

Brett


* Re: [dm-crypt] dm-crypt is broken and causes massive data corruption
  2006-05-08 17:57 ` [dm-crypt] " Simpson, Brett
@ 2006-05-08 18:27   ` Christophe Saout
  0 siblings, 0 replies; 8+ messages in thread
From: Christophe Saout @ 2006-05-08 18:27 UTC
  To: bart; +Cc: dm-crypt, linux-kernel, Tillmann Steinbrecher

[-- Attachment #1: Type: text/plain, Size: 287 bytes --]

On Monday, 08.05.2006, at 13:57 -0400, Simpson, Brett wrote:

> I've been running Gentoo for over a month with a 54GB ext3 filesystem via
> dm-crypt on an IDE drive. No problems so far.

It's a problem with dm-crypt on top of md. I'm trying to figure out
what's going on there.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: [dm-crypt] dm-crypt is broken and causes massive data corruption
  2006-05-08 17:20 dm-crypt is broken and causes massive data corruption Tillmann Steinbrecher
  2006-05-08 17:57 ` [dm-crypt] " Simpson, Brett
@ 2006-05-09 19:04 ` Alasdair G Kergon
  2006-05-11 15:15   ` Paul Slootman
  1 sibling, 1 reply; 8+ messages in thread
From: Alasdair G Kergon @ 2006-05-09 19:04 UTC
  To: Tillmann Steinbrecher; +Cc: linux-kernel, dm-crypt

On Mon, May 08, 2006 at 07:20:12PM +0200, Tillmann Steinbrecher wrote:
> it's been many months that dm-crypt has been broken, and is known to 
> cause massive data corruption.
 
> Various people have noticed this, have lost data and wasted many hours 
> trying to find the reason, and still NOTHING is being done about it. 

Perhaps that's because it wasn't until last week that the upstream
maintainers heard of these problems?

So far there isn't much in the way of controlled experiments, but:

  All the reports agree the problem is independent of filesystem.

  One thread suggests only filesystem metadata is corrupted, not file
  data, and wonders if something's going wrong with (unsupported) write 
  barriers.

  Another report said dm-crypt over raid5 failed while raid5 
  over dm-crypt worked.

Alasdair
-- 
agk@redhat.com


* Re: [dm-crypt] dm-crypt is broken and causes massive data corruption
  2006-05-09 19:04 ` Alasdair G Kergon
@ 2006-05-11 15:15   ` Paul Slootman
  2006-05-11 15:42     ` Andrea Gelmini
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Paul Slootman @ 2006-05-11 15:15 UTC
  To: linux-kernel

Alasdair G Kergon  <agk@redhat.com> wrote:
>On Mon, May 08, 2006 at 07:20:12PM +0200, Tillmann Steinbrecher wrote:
>> it's been many months that dm-crypt has been broken, and is known to 
>> cause massive data corruption.

>So far there isn't much in the way of controlled experiments, but:
>
>  All the reports agree the problem is independent of filesystem.
>
>  One thread suggests only filesystem metadata is corrupted, not file
>  data, and wonders if something's going wrong with (unsupported) write 
>  barriers.
>
>  Another report said dm-crypt over raid5 failed while raid5 
>  over dm-crypt worked.

A data point:

I'm running my /home on reiserfs3 over dm-crypt over lvm over raid5 for
at least a year now, without any problems. Currently running 2.6.13.4
(that's my "stable" work system...).


Paul Slootman



* Re: [dm-crypt] dm-crypt is broken and causes massive data corruption
  2006-05-11 15:15   ` Paul Slootman
@ 2006-05-11 15:42     ` Andrea Gelmini
  2006-05-11 23:17     ` Christian Schmidt
  2006-05-12 21:47     ` Dan Merillat
  2 siblings, 0 replies; 8+ messages in thread
From: Andrea Gelmini @ 2006-05-11 15:42 UTC
  To: Paul Slootman; +Cc: linux-kernel

On Thu, May 11, 2006 at 03:15:29PM +0000, Paul Slootman wrote:
> A data point:
> 
> I'm running my /home on reiserfs3 over dm-crypt over lvm over raid5 for
> at least a year now, without any problems. Currently running 2.6.13.4
> (that's my "stable" work system...).

It seems the write pattern is important... I can replicate the corruption
by copying gigabytes of data from a locally attached IDE disk. Do you write
mostly from the network or from slow devices?
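
A minimal sketch of a write-then-verify test of the kind discussed here,
assuming a filesystem on the device stack under test is already mounted.
It is not taken from this thread: the function names, the 1 MiB block
size and the use of SHAKE-256 as a deterministic data generator are all
illustrative choices. It writes reproducible pseudo-random blocks to a
file, syncs them, and reports which blocks read back differently.

```python
import hashlib
import os

BLOCK_SIZE = 1024 * 1024  # 1 MiB per block (arbitrary choice for this sketch)


def block_data(seed, index, size=BLOCK_SIZE):
    """Return deterministic pseudo-random bytes for block number `index`."""
    # SHAKE-256 is only used as a convenient reproducible byte generator.
    return hashlib.shake_256(f"{seed}:{index}".encode()).digest(size)


def write_pattern(path, blocks, seed=0):
    """Write `blocks` sequential blocks to `path` and flush them to the device."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(block_data(seed, i))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, blocks, seed=0):
    """Return the indices of blocks whose contents differ from what was written."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(BLOCK_SIZE) != block_data(seed, i):
                bad.append(i)
    return bad
```

For example, write_pattern("/mnt/crypt/pattern.bin", 4096) followed later
by verify_pattern("/mnt/crypt/pattern.bin", 4096) would cover 4 GiB (the
path is hypothetical). A read-back immediately after writing may be
served from the page cache, so the filesystem should be unmounted and
remounted (or the caches dropped) between the two steps for the check to
exercise the disk.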

ciao,
gelma


* Re: [dm-crypt] dm-crypt is broken and causes massive data corruption
  2006-05-11 15:15   ` Paul Slootman
  2006-05-11 15:42     ` Andrea Gelmini
@ 2006-05-11 23:17     ` Christian Schmidt
  2006-05-12 21:47     ` Dan Merillat
  2 siblings, 0 replies; 8+ messages in thread
From: Christian Schmidt @ 2006-05-11 23:17 UTC
  To: linux-kernel; +Cc: tsteinbr

Paul Slootman wrote:
> Alasdair G Kergon  <agk@redhat.com> wrote:
>> On Mon, May 08, 2006 at 07:20:12PM +0200, Tillmann Steinbrecher wrote:
>>> it's been many months that dm-crypt has been broken, and is known to 
>>> cause massive data corruption.
> 
>> So far there isn't much in the way of controlled experiments, but:
>>
>>  All the reports agree the problem is independent of filesystem.
>>
>>  One thread suggests only filesystem metadata is corrupted, not file
>>  data, and wonders if something's going wrong with (unsupported) write 
>>  barriers.
>>
>>  Another report said dm-crypt over raid5 failed while raid5 
>>  over dm-crypt worked.
> 
> A data point:
> 
> I'm running my /home on reiserfs3 over dm-crypt over lvm over raid5 for
> at least a year now, without any problems. Currently running 2.6.13.4
> (that's my "stable" work system...).

Just so you know,

I'm running dm-crypt on top of raid-5 as well. Kernels ranging from
gentoo's hardened 2.6.11 to 2.6.15.X with gentoo patchset on AMD64. The
raid has been running since February 2005 with >1TB and survived a disk
failure with rebuild.
Cipher module was aes, now the asm-accelerated x86_64 version. The
filesystem is ext-3. Survived several hard lockups (damn cheap SATA
controllers hanging if a drive passes out), an LV/filesystem resize, and
feeding with GBytes of data in a row (at max ~30MByte/s to 2-3 files in
parallel).

Just re-checked the filesystem: no metadata errors found. I
remember I checked the crc of several bigger archives when I had to
replace a drive two months ago, and couldn't find any problems then.
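
A rough sketch of how such a CRC check could be automated, assuming a
manifest of checksums is recorded while the data is known to be good.
This is not the procedure used above; the function names are made up
for illustration. It computes a CRC-32 per file with Python's zlib
module and later lists the files whose checksum no longer matches.

```python
import os
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """Compute the CRC-32 of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def build_manifest(root):
    """Map each file's path (relative to `root`) to its CRC-32."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_crc32(full)
    return manifest


def changed_files(root, manifest):
    """Return sorted relative paths that are missing or whose CRC-32 differs."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_crc32(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

A CRC-32 only detects changes in file contents; it says nothing about
filesystem metadata, which still needs a separate fsck run.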

Best regards,
Christian


* Re: [dm-crypt] dm-crypt is broken and causes massive data corruption
  2006-05-11 15:15   ` Paul Slootman
  2006-05-11 15:42     ` Andrea Gelmini
  2006-05-11 23:17     ` Christian Schmidt
@ 2006-05-12 21:47     ` Dan Merillat
  2 siblings, 0 replies; 8+ messages in thread
From: Dan Merillat @ 2006-05-12 21:47 UTC
  To: linux-kernel

On 5/11/06, Paul Slootman <paul+nospam@wurtel.net> wrote:

> A data point:
>
> I'm running my /home on reiserfs3 over dm-crypt over lvm over raid5 for
> at least a year now, without any problems. Currently running 2.6.13.4
> (that's my "stable" work system...).

Datapoint:

Linux fileserver 2.6.15.6 #1 PREEMPT Wed Mar 8 20:26:55 EST 2006
x86_64 GNU/Linux
CONFIG_MD_RAID5=y
CONFIG_BLK_DEV_DM=y
CONFIG_DM_SNAPSHOT=y
CONFIG_CRYPTO_AES_X86_64=y

encrypted logical volume on a raid-5 MD on 4 SATA drives, mounted reiser3.

aes-cbc-plain

It's worked through multiple kernels, and moving from 32 to 64bits.
2.6.11 (64-bit) 2.6.10 (64bit) 2.6.8 (32bit) is the kernel history I
have so far.  I'm not sure when I switched from cryptoloop to dm-crypt
though, at least before May '05.

I'm not running dm-crypt directly on MD, though, the stack is
SATA->MD->DM->DM-crypt->reiser3.   That may be the difference.

I've got plenty of free space, I could make a ~75gb encrypted
partition and run any sort of write pattern test/filesystem you want
me to try.


end of thread, other threads:[~2006-05-12 21:47 UTC | newest]

Thread overview: 8+ messages
2006-05-08 17:20 dm-crypt is broken and causes massive data corruption Tillmann Steinbrecher
2006-05-08 17:57 ` [dm-crypt] " Simpson, Brett
2006-05-08 18:27   ` Christophe Saout
2006-05-09 19:04 ` Alasdair G Kergon
2006-05-11 15:15   ` Paul Slootman
2006-05-11 15:42     ` Andrea Gelmini
2006-05-11 23:17     ` Christian Schmidt
2006-05-12 21:47     ` Dan Merillat
