* reiserfs + acl corruption
@ 2010-03-28 15:29 Marco Gatti
2010-03-30 8:02 ` Marco Gatti
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Marco Gatti @ 2010-03-28 15:29 UTC (permalink / raw)
To: reiserfs-devel
I hope this is the right place to post.
I recently suffered filesystem corruption with reiserfs in a
production environment, and I was able to reproduce it.
The corruption started when I worked with extended attributes (POSIX
ACLs) on a partition containing hundreds of thousands of files.
To reproduce the issue, run the following (using bash) on a separate
disk, partition, or loopback virtual disk:
mkfs.reiserfs /dev/sdc1
mount -o acl /dev/sdc1 /mnt
cd /mnt
mkdir dir_with_many_files
touch dir_with_many_files/{1..100000}
setfacl -R -m u:username:rw dir_with_many_files
setfacl -R -x u:username dir_with_many_files   # the system responds slowly while this command runs
setfacl -R -b dir_with_many_files
With the standard Debian lenny kernel 2.6.26 (amd64 port), these commands
complete successfully and no corruption occurs.
With recent kernels (2.6.32.8, 2.6.32.9, and 2.6.32.10 on x86_64),
built in different ways, from the standard configuration to optimized
builds, even without module support, I get thousands of messages like
this:
REISERFS warning (device sdc1): jdm-20002 reiserfs_xattr_get: Invalid
hash for xattr (system.posix_acl_access) associated with [2 848 0x0
SD]
Then weird things start to happen, and the more you use the filesystem,
the more damage you do: it ends up corrupted!
With fewer files, say 50000, I see no corruption or errors.
The number of files needed to reproduce this behaviour varies and
seems to depend on the machine: 100000 are enough for
a virtual machine with 1GB of RAM, but I needed 300000 files on
a real machine with 4GB of RAM.
I tested other filesystems and got no corruption at all with ext2,
ext3, ext4, and xfs.
I use Debian stable or testing environments with the reiserfs
included in vanilla kernels, with default options.
Am I doing something wrong?
Can someone test and reproduce this behaviour?
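For anyone scaling the file count up (the report mentions 300000 files on a 4GB machine), the file-creation step can be sketched as a small helper. This is only an illustration: the function name make_files is hypothetical, and the motivation is an assumption about the reader's environment, namely that a very large brace expansion passed to touch may fail with "Argument list too long" on some systems. Nothing in the report says this happened.

```shell
#!/usr/bin/env bash
# Sketch only: make_files is a hypothetical helper, not part of the report.
# make_files DIR COUNT: create COUNT empty files named 1..COUNT inside DIR.
make_files() {
    local dir=$1 count=$2
    mkdir -p "$dir" || return 1
    # xargs splits the generated names into batches, so no single touch
    # invocation receives more arguments than the system allows.
    ( cd "$dir" && seq 1 "$count" | xargs touch )
}
```

Under those assumptions, `make_files /mnt/dir_with_many_files 300000` would stand in for the mkdir and touch lines of the test.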
Regards
Marco Gatti
* reiserfs + acl corruption
2010-03-28 15:29 reiserfs + acl corruption Marco Gatti
@ 2010-03-30 8:02 ` Marco Gatti
2010-03-30 18:16 ` Jeff Mahoney
` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: Marco Gatti @ 2010-03-30 8:02 UTC (permalink / raw)
To: linux-kernel
I hope this is the right place to post.
I recently suffered filesystem corruption with reiserfs in a
production environment, and I was able to reproduce it.
The corruption started when I worked with extended attributes (POSIX
ACLs) on a partition containing hundreds of thousands of files.
To reproduce the issue, run the following (using bash) on a separate
disk, partition, or loopback virtual disk:
mkfs.reiserfs /dev/sdc1
mount -o acl /dev/sdc1 /mnt
cd /mnt
mkdir dir_with_many_files
touch dir_with_many_files/{1..100000}
setfacl -R -m u:username:rw dir_with_many_files
setfacl -R -x u:username dir_with_many_files   # the system responds slowly while this command runs
setfacl -R -b dir_with_many_files
With the standard Debian lenny kernel 2.6.26 (amd64 port), these commands
complete successfully and no corruption occurs.
With recent kernels (2.6.32.8, 2.6.32.9, and 2.6.32.10 on x86_64),
built in different ways, from the standard configuration to optimized
builds, even without module support, I get thousands of messages like
this:
REISERFS warning (device sdc1): jdm-20002 reiserfs_xattr_get: Invalid
hash for xattr (system.posix_acl_access) associated with [2 848 0x0
SD]
Then weird things start to happen, and the more you use the filesystem,
the more damage you do: it ends up corrupted!
With fewer files, say 50000, I see no corruption or errors.
The number of files needed to reproduce this behaviour varies and
seems to depend on the machine: 100000 are enough for
a virtual machine with 1GB of RAM, but I needed 300000 files on
a real machine with 4GB of RAM.
I tested other filesystems and got no corruption at all with ext2,
ext3, ext4, and xfs.
I use Debian stable or testing environments with the reiserfs
included in vanilla kernels, with default options.
Am I doing something wrong?
Can someone test and reproduce this behaviour?
Regards
Marco Gatti
* Re: reiserfs + acl corruption
2010-03-28 15:29 reiserfs + acl corruption Marco Gatti
2010-03-30 8:02 ` Marco Gatti
@ 2010-03-30 18:16 ` Jeff Mahoney
2010-03-31 14:39 ` dimas
2010-04-04 20:38 ` Christian Kujau
3 siblings, 0 replies; 11+ messages in thread
From: Jeff Mahoney @ 2010-03-30 18:16 UTC (permalink / raw)
To: Marco Gatti; +Cc: reiserfs-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 03/28/2010 11:29 AM, Marco Gatti wrote:
> I hope to post in the right place.
> I recently suffered a filesystem corruption with reiserfs in a
> production environment and I was able to reproduce it.
> The corruption started when i played with extended attributes, posix
> acls, with a partition containing hundreds of thousands of files.
> To reproduce the issue test it this way (using bash) in a separate
> disk, partition or virtual disk using loopback:
>
> mkfsreiserfs /dev/sdc1
> mount -o acl /dev/sdc1 /mnt
> cd /mnt
> mkdir dir_with_many_files
> touch dir_with_many_files/{1..100000}
> setfacl -R -m u:username:rw dir_with_many_files
> setfacl -R -x u:username dir_with_many_files (slow responsiveness of
> system during the execution of this command)
> setfacl -R -b dir_with_many_files
>
> With a debian lenny standard kernel 2.6.26 (port amd64) these commands
> ends succesfully and no corruption occours.
> With a recent kernel, versions 2.6.32.8 - 2.6.32.9 - 2.6.32.10,
> (x86_64) compiled in different ways, from standard configuration to
> optimized versions even with no support for modules i get thousands of
> this kind of message:
>
> REISERFS warning (device sdc1): jdm-20002 reiserfs_xattr_get: Invalid
> hash for xattr (system.posix_acl_access) associated with [2 848 0x0
> SD]
>
> then wierd things start to happen and the more you use this filesystem
> the more you disrupt it: this leads to a corrupted filesystem!
> If you try with less files, let's say 50000, no corruption or error
> occour to me.
> The number of files to reproduce this behaviour could be different and
> it seems to be related to the machine you use: 100000 are enought for
> a virtual machine with 1GB of RAM, but i needed 300000 of files using
> a real machine with 4GB of RAM.
> I tested other filesystem but i get no corruption at all with ext2,
> ext3, ext4 and xfs.
> I use debian stable or testing environments and i'm using reiserfs
> included in vanilla kernels, with default options.
> Am I doing something wrong?
> Can someone test and reproduce this behaviour?
I'll give it a try. There was some churn after 2.6.26 when I pushed my
reiserfs patch queue to mainline but I didn't run into anything like
this in my testing.
- -Jeff
- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/
iEYEARECAAYFAkuyQA0ACgkQLPWxlyuTD7K8awCgmxIujWl86sWWXXgpKrvq5kK/
llAAn24y2iD7y6BQyMA+h0f2fxaDKxeQ
=YbfE
-----END PGP SIGNATURE-----
* Re: reiserfs + acl corruption
2010-03-28 15:29 reiserfs + acl corruption Marco Gatti
2010-03-30 8:02 ` Marco Gatti
2010-03-30 18:16 ` Jeff Mahoney
@ 2010-03-31 14:39 ` dimas
2010-04-04 20:38 ` Christian Kujau
3 siblings, 0 replies; 11+ messages in thread
From: dimas @ 2010-03-31 14:39 UTC (permalink / raw)
To: Marco Gatti; +Cc: reiserfs-devel
Hello, Marco!
I created a 1G partition for this test and did everything as you
suggested. setfacl -R -m .... failed on file 150000+ with "no space left".
Maybe I'll try again later with a bigger partition.
Marco Gatti wrote:
> I hope to post in the right place.
> I recently suffered a filesystem corruption with reiserfs in a
> production environment and I was able to reproduce it.
> The corruption started when i played with extended attributes, posix
> acls, with a partition containing hundreds of thousands of files.
> To reproduce the issue test it this way (using bash) in a separate
> disk, partition or virtual disk using loopback:
>
> mkfsreiserfs /dev/sdc1
> mount -o acl /dev/sdc1 /mnt
> cd /mnt
> mkdir dir_with_many_files
> touch dir_with_many_files/{1..100000}
> setfacl -R -m u:username:rw dir_with_many_files
> setfacl -R -x u:username dir_with_many_files (slow responsiveness of
> system during the execution of this command)
> setfacl -R -b dir_with_many_files
>
> With a debian lenny standard kernel 2.6.26 (port amd64) these commands
> ends succesfully and no corruption occours.
> With a recent kernel, versions 2.6.32.8 - 2.6.32.9 - 2.6.32.10,
> (x86_64) compiled in different ways, from standard configuration to
> optimized versions even with no support for modules i get thousands of
> this kind of message:
>
> REISERFS warning (device sdc1): jdm-20002 reiserfs_xattr_get: Invalid
> hash for xattr (system.posix_acl_access) associated with [2 848 0x0
> SD]
>
--
To unsubscribe from this list: send the line "unsubscribe reiserfs-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: reiserfs + acl corruption
2010-03-28 15:29 reiserfs + acl corruption Marco Gatti
` (2 preceding siblings ...)
2010-03-31 14:39 ` dimas
@ 2010-04-04 20:38 ` Christian Kujau
2010-04-05 1:11 ` Christian Kujau
3 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2010-04-04 20:38 UTC (permalink / raw)
To: Marco Gatti; +Cc: reiserfs-devel
On Sun, 28 Mar 2010 at 17:29, Marco Gatti wrote:
> To reproduce the issue test it this way (using bash) in a separate
> disk, partition or virtual disk using loopback:
Thanks for sharing this testcase. One question though: which version of
"acl" are you using? My setfacl 2.2.49 doesn't understand "-R".
> REISERFS warning (device sdc1): jdm-20002 reiserfs_xattr_get: Invalid
> hash for xattr (system.posix_acl_access) associated with [2 848 0x0
> SD]
I reported something similar a while back for user.* ACLs; it looks
like it's hitting the same issue: https://bugzilla.kernel.org/show_bug.cgi?id=14826
> then wierd things start to happen and the more you use this filesystem
> the more you disrupt it: this leads to a corrupted filesystem!
> If you try with less files, let's say 50000, no corruption or error
> occour to me.
I just tested again (with a variant of your script) and I was able to
create 100000 files, set ACLs on each of them and then remove this very
ACL from each file. However, using "setfacl -b" (removing *all* ACLs from
a file) it instantly returned with:
setfacl: ./1: Input/output error
And in the syslog:
REISERFS warning (device md0): jdm-20002 reiserfs_xattr_get: Invalid hash for xattr (system.posix_acl_access) associated with [2 5 0x0 SD]
See: http://nerdbynature.de/bits/2.6.34-rc2/reiserfs/
However, this does not happen on a freshly created reiserfs with e.g. just
one file on it; I'm still figuring out exactly when it happens.
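To keep track of how many of these warnings accumulate, and which objects they name, the kernel log can be filtered with something like the sketch below. The helper names count_xattr_warnings and affected_objects are hypothetical, and the pattern assumes the message format shown above ("jdm-20002 ... associated with [a b 0x0 SD]") stays the same across kernels.

```shell
#!/usr/bin/env bash
# Sketch only: both helper names are hypothetical.

# Read kernel log text on stdin and print the number of jdm-20002 lines.
count_xattr_warnings() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that.
    grep -c 'jdm-20002' || true
}

# Read kernel log text on stdin and print each distinct object key once,
# e.g. "[2 5 0x0 SD]".
affected_objects() {
    grep -o '\[[0-9]* [0-9]* 0x[0-9a-f]* SD]' | sort -u
}
```

Typical use would be `dmesg | count_xattr_warnings` before and after a test run.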
Thanks,
Christian.
--
BOFH excuse #308:
CD-ROM server needs recalibration
* Re: reiserfs + acl corruption
2010-04-04 20:38 ` Christian Kujau
@ 2010-04-05 1:11 ` Christian Kujau
2010-04-05 5:44 ` Christian Kujau
0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2010-04-05 1:11 UTC (permalink / raw)
To: Marco Gatti; +Cc: reiserfs-devel, jeffm
On Sun, 4 Apr 2010 at 13:38, Christian Kujau wrote:
> However, this does not happen on a freshly created reiserfs with e.g. just
> one file on it, I'm still figuring out when exactly this happens.
Got it. After a few more iterations of the testscript[0] I was hitting it
at 25601 files; 25600 files were fine. Again, setting and removing *one*
ACL is fine, but removing *all* ACLs leads to an I/O error.
After attempting "setfacl -b", subsequent "ls -l" commands
produce I/O errors as well (plus one syslog message for every error).
"ls" without "-l" is fine, probably because it makes no stat() calls.
Reiserfsck 3.6.21 did not find any corruption, yet even after unmounting and
remounting the filesystem I get I/O errors whenever I try to stat the directory in
question. Deleting the files is fine, and creating and stat'ing 30001 new
files in the same directory (I did not remove it!) is fine too.
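Narrowing a threshold like 25600/25601 by hand takes many iterations; a bisection over the file count is one way to automate it. The following is a sketch under stated assumptions, not the test script referenced as [0]: find_threshold is a hypothetical name, the caller supplies a command that exits 0 when a given file count triggers the I/O error, each probe is assumed to run on a fresh filesystem, and the failure is assumed to be monotonic in the file count on a given machine.

```shell
#!/usr/bin/env bash
# Sketch only: find_threshold is a hypothetical helper.
# find_threshold GOOD BAD CHECK
#   GOOD  - a file count known NOT to trigger the error
#   BAD   - a file count known to trigger the error
#   CHECK - name of a command/function; "CHECK N" must exit 0 when N files
#           trigger the error and non-zero otherwise
# Prints the smallest count in (GOOD, BAD] that triggers the error.
find_threshold() {
    local good=$1 bad=$2 check=$3 mid
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if "$check" "$mid"; then
            bad=$mid      # mid already fails: the threshold is at or below mid
        else
            good=$mid     # mid still works: the threshold is above mid
        fi
    done
    echo "$bad"
}
```

The result only holds for the machine it was measured on, since the trigger point may vary with available memory.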
Should I open a new bug for this or will this be dealt with in #14826 as
well?
Thanks,
Christian.
[0] http://nerdbynature.de/bits/2.6.34-rc2/reiserfs/
--
BOFH excuse #225:
It's those computer people in X {city of world}. They keep stuffing things up.
* Re: reiserfs + acl corruption
2010-04-05 1:11 ` Christian Kujau
@ 2010-04-05 5:44 ` Christian Kujau
2010-04-05 15:34 ` Jeff Mahoney
0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2010-04-05 5:44 UTC (permalink / raw)
To: Marco Gatti; +Cc: reiserfs-devel, jeffm
On Sun, 4 Apr 2010 at 18:11, Christian Kujau wrote:
> Got it. After a few more iterations of the testscript[0] I was hitting it
> at 25601 files, 25600 file were fine. Again, setting and removing *one*
> ACL is fine, but removing *all* ACLs lead to I/O error.
The 25601 files don't need to be in one single directory. With 25000 files in
one and 25000 files in another (sub-)directory, the same happens. Enabling
REISERFS_CHECK hasn't revealed any more information yet.
Christian.
--
BOFH excuse #233:
TCP/IP UDP alarm threshold is set too low.
* Re: reiserfs + acl corruption
2010-04-05 5:44 ` Christian Kujau
@ 2010-04-05 15:34 ` Jeff Mahoney
2010-04-06 8:40 ` Christian Kujau
0 siblings, 1 reply; 11+ messages in thread
From: Jeff Mahoney @ 2010-04-05 15:34 UTC (permalink / raw)
To: Christian Kujau; +Cc: Marco Gatti, reiserfs-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 04/05/2010 01:44 AM, Christian Kujau wrote:
> On Sun, 4 Apr 2010 at 18:11, Christian Kujau wrote:
>> Got it. After a few more iterations of the testscript[0] I was hitting it
>> at 25601 files, 25600 file were fine. Again, setting and removing *one*
>> ACL is fine, but removing *all* ACLs lead to I/O error.
>
> The 25601 files don't need to be in one single directory. With 25000 files in
> one and 25000 files in another (sub-)directory, the same happens. Enabling
> REISERFS_CHECK hasn't revealed any more information yet.
Ok. I'm able to reproduce this with your script. I _can't_ reproduce it
with 8 GB of memory, but I can reproduce it with mem=256m. That's a good
data point.
- -Jeff
- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/
iEYEARECAAYFAku6AxMACgkQLPWxlyuTD7K5HwCfQrKlzFoLu93sfVPAolDVPsvI
gsoAmwdMU9sT4fWIlrWw+RMnFuHJDk8U
=pJCj
-----END PGP SIGNATURE-----
* Re: reiserfs + acl corruption
2010-04-05 15:34 ` Jeff Mahoney
@ 2010-04-06 8:40 ` Christian Kujau
2010-04-06 9:40 ` Marco Gatti
0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2010-04-06 8:40 UTC (permalink / raw)
To: Jeff Mahoney; +Cc: Marco Gatti, reiserfs-devel
On Mon, 5 Apr 2010 at 11:34, Jeff Mahoney wrote:
> Ok. I'm able to reproduce this with your script. I _can't_ reproduce it
> with 8 GB of memory, but I can reproduce it with mem=256m. That's a good
> data point.
On an 8GB machine I can reproduce it with 1000000 files; 500000 files are
fine. I haven't found the exact threshold though.
v40z1# free -k
             total       used       free     shared    buffers     cached
Mem:       8121176    5497464    2623712          0     718264    1878760
-/+ buffers/cache:    2900440    5220736
Swap:       996024       1016     995008
Could you comment on the relation to #14826, or is this something
completely different? I'm getting more and more jdm-20002 warnings in my syslog and
it's making me kinda nervous...
Thanks,
Christian.
[0] http://bugzilla.kernel.org/show_bug.cgi?id=14826
--
BOFH excuse #336:
the xy axis in the trackball is coordinated with the summer solstice
* Re: reiserfs + acl corruption
2010-04-06 8:40 ` Christian Kujau
@ 2010-04-06 9:40 ` Marco Gatti
2010-04-06 13:31 ` Marco Gatti
0 siblings, 1 reply; 11+ messages in thread
From: Marco Gatti @ 2010-04-06 9:40 UTC (permalink / raw)
To: Christian Kujau; +Cc: Jeff Mahoney, reiserfs-devel
2010/4/6 Christian Kujau <lists@nerdbynature.de>:
> On Mon, 5 Apr 2010 at 11:34, Jeff Mahoney wrote:
>> Ok. I'm able to reproduce this with your script. I _can't_ reproduce it
>> with 8 GB of memory, but I can reproduce it with mem=256m. That's a good
>> data point.
>
> On a 8GB machine I can reproduce it with 1000000 files, 500000 files is
> fine. I haven't found out the exact threshold though.
>
> v40z1# free -k
> total used free shared buffers cached
> Mem: 8121176 5497464 2623712 0 718264 1878760
> -/+ buffers/cache: 2900440 5220736
> Swap: 996024 1016 995008
>
>
> Could you comment on the relation to #14826, or is this something
> completely different. I'm getting more and more jdm-20002 in my syslog and
> it's making me kinda nervous...
>
> Thanks,
> Christian.
>
> [0] http://bugzilla.kernel.org/show_bug.cgi?id=14826
Well, I checked bug #14826 and it looks like exactly what happened to me.
I was curious, so I went back to an earlier kernel version and tested 2.6.31.12.
The bug is there too: my Debian system with 4GB of RAM gives the
Input/output error when handling 300000 files. So, IMHO, this bug is
related to something that changed before 2.6.32.
If I have some time, I'll check earlier versions among the kernels
I built previously and let you know what I find.
If I were you, Christian, I would back up all your data ASAP. When the corruption
happened to me, reiserfsck didn't help at all, even with the rebuild-tree
option...
LVM snapshots saved me a lot of work and time!
For the record, I was able to verify this bug with setfacl versions
2.2.47 and 2.2.49, as packaged by Debian:
http://packages.debian.org/lenny/acl
The -R option is there, for recursion.
* Re: reiserfs + acl corruption
2010-04-06 9:40 ` Marco Gatti
@ 2010-04-06 13:31 ` Marco Gatti
0 siblings, 0 replies; 11+ messages in thread
From: Marco Gatti @ 2010-04-06 13:31 UTC (permalink / raw)
To: reiserfs-devel
OK, I ran some tests and I think I found something really relevant.
I tested various kernel versions on 2 different machines; this is my report:
2.6.26 debian flavoured - no bug
2.6.29.6 - no bug
2.6.30.9 - acl bug present
2.6.31.12 - acl bug present
2.6.32.8, 2.6.32.9, 2.6.32.10 - acl bug present
Then the really interesting thing. If you try my test
mkfs.reiserfs /dev/sdc1
mount -o acl /dev/sdc1 /mnt
cd /mnt
mkdir dir_with_many_files
touch dir_with_many_files/{1..100000}
setfacl -R -m u:username:rw dir_with_many_files
setfacl -R -x u:username dir_with_many_files
setfacl -R -b dir_with_many_files
keep an eye on the space used on the partition/disk you are testing
while the setfacl commands run.
On kernels with the bug, "setfacl -R -x ..." reduces the
used space almost as if there were no ACLs at all.
To make it clear:
- after creating 300000 empty files in a directory, 60MB of space is used;
- after "setfacl -R -m ...", 1.3GB is used;
- after "setfacl -R -x ..." on kernels WITH the bug, 153MB is used;
- after "setfacl -R -x ..." on kernels WITHOUT the bug, 1.3GB is still
used;
- after "setfacl -R -b ..." on kernels WITHOUT the bug, 119MB is used.
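The space figures above can be collected in a repeatable way with a small wrapper around df. This is a minimal sketch: the names used_kb and report_step are hypothetical, and it assumes the POSIX output format of `df -Pk`, where the third column of the second line is the used space in 1024-byte blocks.

```shell
#!/usr/bin/env bash
# Sketch only: used_kb and report_step are hypothetical helpers.

# used_kb MOUNTPOINT: print the used space of the filesystem holding
# MOUNTPOINT, in KiB, as reported by df in POSIX format.
used_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

# report_step LABEL MOUNTPOINT: print one line per test step.
report_step() {
    printf '%s: %s KiB used\n' "$1" "$(used_kb "$2")"
}
```

Calling, for example, `report_step "after setfacl -R -x" /mnt` after each setfacl command gives one line per step that can be compared between kernels.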
It seems to me that the changes introduced in kernel version 2.6.30
heavily modified how extended attributes are handled,
but I'm not a kernel hacker, so I'll leave it to you to look through the
code!
I hope this helps, and if you have a patch (ideally against 2.6.32),
I'll be glad to test it.
Cheers
Marco