* reiserfs + acl corruption
@ 2010-03-28 15:29 Marco Gatti
  2010-03-30  8:02 ` Marco Gatti
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Marco Gatti @ 2010-03-28 15:29 UTC (permalink / raw)
  To: reiserfs-devel

I hope this is the right place to post.
I recently suffered filesystem corruption with reiserfs in a
production environment, and I was able to reproduce it.
The corruption started when I worked with extended attributes (POSIX
ACLs) on a partition containing hundreds of thousands of files.
To reproduce the issue, run the following (using bash) on a separate
disk, partition, or loopback-backed virtual disk:

mkfs.reiserfs /dev/sdc1
mount -o acl /dev/sdc1 /mnt
cd /mnt
mkdir dir_with_many_files
touch dir_with_many_files/{1..100000}
setfacl -R -m u:username:rw dir_with_many_files
# the system becomes very sluggish while the next command runs
setfacl -R -x u:username dir_with_many_files
setfacl -R -b dir_with_many_files
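
A note on the touch step for anyone scaling the test up: with several
hundred thousand names, the brace expansion may exceed the kernel's limit
on the argument length of a single command. A minimal sketch of one way
around that, assuming seq and xargs are installed; create_files is a
hypothetical helper name, not part of the original test:

```shell
# Hypothetical helper: create empty files named 1..COUNT in directory DIR.
# Usage: create_files DIR COUNT
# seq streams the names and xargs splits them into batches, so no single
# touch invocation receives an oversized argument list.
create_files() {
    mkdir -p "$1" || return 1
    ( cd "$1" && seq 1 "$2" | xargs touch )
}
```

For example, "create_files /mnt/dir_with_many_files 300000" would stand in
for the mkdir and touch lines above.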

With the standard Debian lenny kernel 2.6.26 (amd64 port) these commands
complete successfully and no corruption occurs.
With recent kernels (2.6.32.8, 2.6.32.9 and 2.6.32.10, x86_64), built in
several ways, from the standard configuration to optimized builds, even
without module support, I get thousands of messages like this:

REISERFS warning (device sdc1): jdm-20002 reiserfs_xattr_get: Invalid
hash for xattr (system.posix_acl_access) associated with [2 848 0x0
SD]

After that, weird things start to happen, and the more you use the
filesystem the more you damage it: it ends up corrupted!
With fewer files, say 50000, I see no corruption or errors.
The number of files needed to reproduce this behaviour varies and
seems to depend on the machine: 100000 are enough for a virtual
machine with 1GB of RAM, but I needed 300000 files on a real machine
with 4GB of RAM.
I tested other filesystems and got no corruption at all with ext2,
ext3, ext4 and xfs.
I use Debian stable or testing, with the reiserfs included in vanilla
kernels and default options.
Am I doing something wrong?
Can someone test and reproduce this behaviour?
Regards

Marco Gatti


* reiserfs + acl corruption
  2010-03-28 15:29 reiserfs + acl corruption Marco Gatti
@ 2010-03-30  8:02 ` Marco Gatti
  2010-03-30 18:16 ` Jeff Mahoney
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Marco Gatti @ 2010-03-30  8:02 UTC (permalink / raw)
  To: linux-kernel


* Re: reiserfs + acl corruption
  2010-03-28 15:29 reiserfs + acl corruption Marco Gatti
  2010-03-30  8:02 ` Marco Gatti
@ 2010-03-30 18:16 ` Jeff Mahoney
  2010-03-31 14:39 ` dimas
  2010-04-04 20:38 ` Christian Kujau
  3 siblings, 0 replies; 11+ messages in thread
From: Jeff Mahoney @ 2010-03-30 18:16 UTC (permalink / raw)
  To: Marco Gatti; +Cc: reiserfs-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/28/2010 11:29 AM, Marco Gatti wrote:
> I hope to post in the right place.
> I recently suffered a filesystem corruption with reiserfs in a
> production environment and I was able to reproduce it.
> The corruption started when i played with extended attributes, posix
> acls, with a partition containing hundreds of thousands of files.
> To reproduce the issue test it this way (using bash) in a separate
> disk, partition or virtual disk using loopback:
> 
> mkfsreiserfs /dev/sdc1
> mount -o acl /dev/sdc1 /mnt
> cd /mnt
> mkdir dir_with_many_files
> touch dir_with_many_files/{1..100000}
> setfacl -R -m u:username:rw dir_with_many_files
> setfacl -R -x u:username dir_with_many_files	(slow responsiveness of
> system during the execution of this command)
> setfacl -R -b dir_with_many_files
> 
> With a debian lenny standard kernel 2.6.26 (port amd64) these commands
> ends succesfully and no corruption occours.
> With a recent kernel, versions 2.6.32.8 - 2.6.32.9 - 2.6.32.10,
> (x86_64) compiled in different ways, from standard configuration to
> optimized versions even with no support for modules i get thousands of
> this kind of message:
> 
> REISERFS warning (device sdc1): jdm-20002 reiserfs_xattr_get: Invalid
> hash for xattr (system.posix_acl_access) associated with [2 848 0x0
> SD]
> 
> then wierd things start to happen and the more you use this filesystem
> the more you disrupt it: this leads to a corrupted filesystem!
> If you try with less files, let's say 50000, no corruption or error
> occour to me.
> The number of files to reproduce this behaviour could be different and
> it seems to be related to the machine you use: 100000 are enought for
> a virtual machine with 1GB of RAM, but i needed 300000 of files using
> a real machine with 4GB of RAM.
> I tested other filesystem but i get no corruption at all with ext2,
> ext3, ext4 and xfs.
> I use debian stable or testing environments and i'm using reiserfs
> included in vanilla kernels, with default options.
> Am I doing something wrong?
> Can someone test and reproduce this behaviour?

I'll give it a try. There was some churn after 2.6.26 when I pushed my
reiserfs patch queue to mainline but I didn't run into anything like
this in my testing.

- -Jeff

- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/

iEYEARECAAYFAkuyQA0ACgkQLPWxlyuTD7K8awCgmxIujWl86sWWXXgpKrvq5kK/
llAAn24y2iD7y6BQyMA+h0f2fxaDKxeQ
=YbfE
-----END PGP SIGNATURE-----


* Re: reiserfs + acl corruption
  2010-03-28 15:29 reiserfs + acl corruption Marco Gatti
  2010-03-30  8:02 ` Marco Gatti
  2010-03-30 18:16 ` Jeff Mahoney
@ 2010-03-31 14:39 ` dimas
  2010-04-04 20:38 ` Christian Kujau
  3 siblings, 0 replies; 11+ messages in thread
From: dimas @ 2010-03-31 14:39 UTC (permalink / raw)
  To: Marco Gatti; +Cc: reiserfs-devel

Hello, Marco!
I created a 1G partition for this test and did everything as you
suggested. setfacl -R -m ... failed on file 150000+ with "no space left".
Maybe I will try again later with a bigger partition.

Marco Gatti wrote:
> I hope to post in the right place.
> I recently suffered a filesystem corruption with reiserfs in a
> production environment and I was able to reproduce it.
> The corruption started when i played with extended attributes, posix
> acls, with a partition containing hundreds of thousands of files.
> To reproduce the issue test it this way (using bash) in a separate
> disk, partition or virtual disk using loopback:
> 
> mkfsreiserfs /dev/sdc1
> mount -o acl /dev/sdc1 /mnt
> cd /mnt
> mkdir dir_with_many_files
> touch dir_with_many_files/{1..100000}
> setfacl -R -m u:username:rw dir_with_many_files
> setfacl -R -x u:username dir_with_many_files	(slow responsiveness of
> system during the execution of this command)
> setfacl -R -b dir_with_many_files
> 
> With a debian lenny standard kernel 2.6.26 (port amd64) these commands
> ends succesfully and no corruption occours.
> With a recent kernel, versions 2.6.32.8 - 2.6.32.9 - 2.6.32.10,
> (x86_64) compiled in different ways, from standard configuration to
> optimized versions even with no support for modules i get thousands of
> this kind of message:
> 
> REISERFS warning (device sdc1): jdm-20002 reiserfs_xattr_get: Invalid
> hash for xattr (system.posix_acl_access) associated with [2 848 0x0
> SD]
> 
--
To unsubscribe from this list: send the line "unsubscribe reiserfs-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: reiserfs + acl corruption
  2010-03-28 15:29 reiserfs + acl corruption Marco Gatti
                   ` (2 preceding siblings ...)
  2010-03-31 14:39 ` dimas
@ 2010-04-04 20:38 ` Christian Kujau
  2010-04-05  1:11   ` Christian Kujau
  3 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2010-04-04 20:38 UTC (permalink / raw)
  To: Marco Gatti; +Cc: reiserfs-devel

On Sun, 28 Mar 2010 at 17:29, Marco Gatti wrote:
> To reproduce the issue test it this way (using bash) in a separate
> disk, partition or virtual disk using loopback:

Thanks for sharing this testcase. One question though: which version of 
"acl" are you using? My setfacl 2.2.49 doesn't understand "-R".

> REISERFS warning (device sdc1): jdm-20002 reiserfs_xattr_get: Invalid
> hash for xattr (system.posix_acl_access) associated with [2 848 0x0
> SD]

I reported something similar a while back for user.* ACLs; it looks 
like it's hitting the same issue: https://bugzilla.kernel.org/show_bug.cgi?id=14826

> then wierd things start to happen and the more you use this filesystem
> the more you disrupt it: this leads to a corrupted filesystem!
> If you try with less files, let's say 50000, no corruption or error
> occour to me.

I just tested again (with a variant of your script) and I was able to 
create 100000 files, set ACLs on each of them and then remove this very
ACL from each file. However, using "setfacl -b" (removing *all* ACLs from
a file) it instantly returned with:

  setfacl: ./1: Input/output error

And in the syslog:

REISERFS warning (device md0): jdm-20002 reiserfs_xattr_get: Invalid hash for xattr (system.posix_acl_access) associated with [2 5 0x0 SD]

See: http://nerdbynature.de/bits/2.6.34-rc2/reiserfs/

However, this does not happen on a freshly created reiserfs with e.g. just 
one file on it. I'm still figuring out when exactly this happens.
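
To see how many of these warnings a run produced, and for how many distinct
objects, the saved kernel log can be filtered. A minimal sketch, assuming the
log lines look like the one quoted above; both function names are
hypothetical:

```shell
# Hypothetical helper: count log lines carrying the jdm-20002 tag.
# Reads the files given as arguments, or standard input when there are none.
count_xattr_hash_warnings() {
    grep -c 'jdm-20002' "$@"
}

# Hypothetical helper: list the distinct object keys, such as [2 5 0x0 SD],
# that appear in the log.
list_affected_keys() {
    grep -o '\[[0-9]* [0-9]* 0x[0-9a-f]* SD\]' "$@" | sort -u
}
```

For example: dmesg | count_xattr_hash_warnings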

Thanks,
Christian.
-- 
BOFH excuse #308:

CD-ROM server needs recalibration


* Re: reiserfs + acl corruption
  2010-04-04 20:38 ` Christian Kujau
@ 2010-04-05  1:11   ` Christian Kujau
  2010-04-05  5:44     ` Christian Kujau
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2010-04-05  1:11 UTC (permalink / raw)
  To: Marco Gatti; +Cc: reiserfs-devel, jeffm

On Sun, 4 Apr 2010 at 13:38, Christian Kujau wrote:
> However, this does not happen on a freshly created reiserfs with e.g. just 
> one file on it, I'm still figuring out when exactly this happens.

Got it. After a few more iterations of the testscript[0] I was hitting it 
at 25601 files; 25600 files were fine. Again, setting and removing *one* 
ACL is fine, but removing *all* ACLs leads to an I/O error.

After attempting to "setfacl -b", subsequent "ls -l" commands are 
producing I/O errors as well (plus one syslog message for every error). 
"ls" with no "-l" is fine, probably due to the missing stat() calls.
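
The difference between listing names and stat'ing entries can be checked
entry by entry. A minimal sketch, assuming a non-empty directory;
count_stat_failures is a hypothetical helper name:

```shell
# Hypothetical helper: stat every entry in directory $1 and print how many
# of those calls failed. Expanding the glob only reads names, like a plain
# "ls"; "ls -ld" then stats each entry, like "ls -l" does.
count_stat_failures() {
    failures=0
    for entry in "$1"/*; do
        # Discard the listing itself; only the exit status matters here.
        ls -ld "$entry" >/dev/null 2>&1 || failures=$((failures + 1))
    done
    echo "$failures"
}
```

On a healthy directory this prints 0.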

Reiserfsck 3.6.21 did not find any corruption, yet after unmounting and 
remounting the filesystem I still get I/O errors whenever I try to stat the 
directory in question. Deleting the files is fine, and creating and stat'ing 
30001 new files in the same directory (I did not remove it!) is fine too.

Should I open a new bug for this or will this be dealt with in #14826 as 
well?

Thanks,
Christian.

[0] http://nerdbynature.de/bits/2.6.34-rc2/reiserfs/
-- 
BOFH excuse #225:

It's those computer people in X {city of world}.  They keep stuffing things up.


* Re: reiserfs + acl corruption
  2010-04-05  1:11   ` Christian Kujau
@ 2010-04-05  5:44     ` Christian Kujau
  2010-04-05 15:34       ` Jeff Mahoney
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2010-04-05  5:44 UTC (permalink / raw)
  To: Marco Gatti; +Cc: reiserfs-devel, jeffm

On Sun, 4 Apr 2010 at 18:11, Christian Kujau wrote:
> Got it. After a few more iterations of the testscript[0] I was hitting it 
> at 25601 files, 25600 file were fine. Again, setting and removing *one* 
> ACL is fine, but removing *all* ACLs lead to I/O error.

The 25601 files don't need to be in one single directory. With 25000 files in 
one and 25000 files in another (sub-)directory, the same happens. Enabling 
REISERFS_CHECK hasn't revealed any more information yet.

Christian.
-- 
BOFH excuse #233:

TCP/IP UDP alarm threshold is set too low.


* Re: reiserfs + acl corruption
  2010-04-05  5:44     ` Christian Kujau
@ 2010-04-05 15:34       ` Jeff Mahoney
  2010-04-06  8:40         ` Christian Kujau
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Mahoney @ 2010-04-05 15:34 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Marco Gatti, reiserfs-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 04/05/2010 01:44 AM, Christian Kujau wrote:
> On Sun, 4 Apr 2010 at 18:11, Christian Kujau wrote:
>> Got it. After a few more iterations of the testscript[0] I was hitting it 
>> at 25601 files, 25600 file were fine. Again, setting and removing *one* 
>> ACL is fine, but removing *all* ACLs lead to I/O error.
> 
> The 25601 files don't need to be in one single directory. With 25000 files in 
> one and 25000 files in another (sub-)directory, the same happens. Enabling 
> REISERFS_CHECK hasn't revealed any more information yet.

Ok. I'm able to reproduce this with your script. I _can't_ reproduce it
with 8 GB of memory, but I can reproduce it with mem=256m. That's a good
data point.

- -Jeff

- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/

iEYEARECAAYFAku6AxMACgkQLPWxlyuTD7K5HwCfQrKlzFoLu93sfVPAolDVPsvI
gsoAmwdMU9sT4fWIlrWw+RMnFuHJDk8U
=pJCj
-----END PGP SIGNATURE-----


* Re: reiserfs + acl corruption
  2010-04-05 15:34       ` Jeff Mahoney
@ 2010-04-06  8:40         ` Christian Kujau
  2010-04-06  9:40           ` Marco Gatti
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2010-04-06  8:40 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Marco Gatti, reiserfs-devel

On Mon, 5 Apr 2010 at 11:34, Jeff Mahoney wrote:
> Ok. I'm able to reproduce this with your script. I _can't_ reproduce it
> with 8 GB of memory, but I can reproduce it with mem=256m. That's a good
> data point.

On an 8GB machine I can reproduce it with 1000000 files; 500000 files is 
fine. I haven't found the exact threshold though.

v40z1# free -k
             total       used       free     shared    buffers     cached
Mem:       8121176    5497464    2623712          0     718264    1878760
-/+ buffers/cache:    2900440    5220736
Swap:       996024       1016     995008
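
Given one file count that works and one that fails, the threshold can be
narrowed by bisection. A minimal sketch, assuming that on a given machine
every count above the threshold fails and every count below it works, which
the reports so far suggest but do not establish. triggers_bug is a
placeholder that has to be supplied by the tester: it should recreate the
filesystem, run the setfacl sequence with the given number of files, and
return 0 if the I/O error appeared.

```shell
# Hypothetical sketch: print the smallest file count that triggers the error.
# $1 = a count known to work, $2 = a count known to fail.
# Calls triggers_bug N, which must be defined by the tester (see above).
find_threshold() {
    good=$1
    bad=$2
    # Invariant: "good" works and "bad" fails; halve the gap until adjacent.
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if triggers_bug "$mid"; then
            bad=$mid
        else
            good=$mid
        fi
    done
    echo "$bad"
}
```

With the numbers above, the call would be: find_threshold 500000 1000000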


Could you comment on the relation to #14826[0], or is this something 
completely different? I'm getting more and more jdm-20002 in my syslog and 
it's making me kinda nervous...

Thanks,
Christian.

[0] http://bugzilla.kernel.org/show_bug.cgi?id=14826
-- 
BOFH excuse #336:

the xy axis in the trackball is coordinated with the summer solstice


* Re: reiserfs + acl corruption
  2010-04-06  8:40         ` Christian Kujau
@ 2010-04-06  9:40           ` Marco Gatti
  2010-04-06 13:31             ` Marco Gatti
  0 siblings, 1 reply; 11+ messages in thread
From: Marco Gatti @ 2010-04-06  9:40 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Jeff Mahoney, reiserfs-devel

2010/4/6 Christian Kujau <lists@nerdbynature.de>:
> On Mon, 5 Apr 2010 at 11:34, Jeff Mahoney wrote:
>> Ok. I'm able to reproduce this with your script. I _can't_ reproduce it
>> with 8 GB of memory, but I can reproduce it with mem=256m. That's a good
>> data point.
>
> On a 8GB machine I can reproduce it with 1000000 files, 500000 files is
> fine. I haven't found out the exact threshold though.
>
> v40z1# free -k
>             total       used       free     shared    buffers     cached
> Mem:       8121176    5497464    2623712          0     718264    1878760
> -/+ buffers/cache:    2900440    5220736
> Swap:       996024       1016     995008
>
>
> Could you comment on the relation to #14826, or is this something
> completely different. I'm getting more and more jdm-20002 in my syslog and
> it's making me kinda nervous...
>
> Thanks,
> Christian.
>
> [0] http://bugzilla.kernel.org/show_bug.cgi?id=14826

Well, I checked bug #14826 and it seems to be just what happened to me.
I was curious, so I went back to an earlier kernel version and tested
2.6.31.12.
The bug is there: my Debian system with 4GB of RAM gives the
Input/output error when handling 300000 files. So, IMHO, this bug is
related to something that changed before 2.6.32.
If I have some time I'll check older kernels that I built previously
and let you know what I find.

If I were you, Christian, I would back up all your data asap. When
corruption happened to me, reiserfsck didn't help at all, even with the
--rebuild-tree option...
LVM snapshots saved me a lot of work and time!
For the record, I was able to verify this bug with setfacl versions
2.2.47 and 2.2.49, as packaged by Debian:
http://packages.debian.org/lenny/acl
The -R option (recurse) is there.
Marco


* Re: reiserfs + acl corruption
  2010-04-06  9:40           ` Marco Gatti
@ 2010-04-06 13:31             ` Marco Gatti
  0 siblings, 0 replies; 11+ messages in thread
From: Marco Gatti @ 2010-04-06 13:31 UTC (permalink / raw)
  To: reiserfs-devel

Ok, I ran some tests and I think I found something really relevant.
I tested various kernel versions on 2 different machines; this is my report:

2.6.26 debian flavoured - no bug
2.6.29.6 - no bug
2.6.30.9 - acl bug present
2.6.31.12 - acl bug present
2.6.32.8, 2.6.32.9, 2.6.32.10 - acl bug present

Now the really interesting part. If you run my test


mkfs.reiserfs /dev/sdc1
mount -o acl /dev/sdc1 /mnt
cd /mnt
mkdir dir_with_many_files
touch dir_with_many_files/{1..100000}
setfacl -R -m u:username:rw dir_with_many_files
setfacl -R -x u:username dir_with_many_files
setfacl -R -b dir_with_many_files


keep an eye on the space used on the partition/disk you are testing
while the setfacl commands run.
On kernels with the bug, running "setfacl -R -x ..." reduces the used
space as if there were (almost) no ACLs at all.
To make it clear:
- after creating 300000 empty files in a directory I have 60MB of space used;
- after "setfacl -R -m ..." I have 1.3GB of space used;
- after "setfacl -R -x ..." on kernels with the bug I have 153MB of space used;
- after "setfacl -R -x ..." on kernels without the bug I still have 1.3GB
of space used;
- after "setfacl -R -b ..." on kernels without the bug I have 119MB of space used.
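
The figures above can be collected the same way on other machines by
reading the used space after each step. A minimal sketch, assuming a df that
supports the POSIX -P and -k options; used_kb is a hypothetical helper name:

```shell
# Hypothetical helper: print the used space, in 1024-byte units, of the
# filesystem that contains path $1. df -P keeps each filesystem on a single
# output line, so the third field of the second line is the "used" column.
used_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}
```

For example, after each setfacl step: echo "used: $(used_kb /mnt) KB"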

It seems to me that the changes introduced in kernel 2.6.30 heavily
modified how extended attributes are handled, but I'm not a kernel
hacker, so I'll leave it to you to look through the code!
Hope this helps, and if you have a patch (ideally against 2.6.32)
I'll be glad to test it.

Cheers
Marco
