* xfsdump-3.0.4 problems
From: Mario Bachmann @ 2010-08-16 16:22 UTC
  To: xfs

Hello, 

my kernel is
Linux x2 2.6.35.2 #1 SMP Sun Aug 15 00:32:14 CEST 2010 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 5600+ AuthenticAMD GNU/Linux

I get a lot of warnings with xfsdump-3.0.4 (both the Gentoo package 3.0.4-r1 and the git version):

x2 ~/source/d/xfsdump/dump # ./xfsdump -l0 -L "Test" - /dev/sda2 |gzip - > /mnt/data2/dump_text.gz
./xfsdump: using file dump (drive_simple) strategy
./xfsdump: version 3.0.4 (dump format 3.0) - Running single-threaded
./xfsdump: level 0 dump of x2:/home
./xfsdump: dump date: Mon Aug 16 17:34:30 2010
./xfsdump: session id: 288ed27c-2b26-4c8b-a5d4-bfd1a32f4b6f
./xfsdump: session label: "Test"
./xfsdump: ino map phase 1: constructing initial dump list
./xfsdump: ino map phase 2: skipping (no pruning necessary)
./xfsdump: ino map phase 3: skipping (only one dump stream)
./xfsdump: ino map construction complete
./xfsdump: estimated dump size: 3236335808 bytes
./xfsdump: creating dump session media file 0 (media 0, file 0)
./xfsdump: dumping ino map
./xfsdump: dumping directories
./xfsdump: WARNING: could not stat dirent .crack-attack ino 100663674: Das Argument ist ungültig: using null generation count in directory entry
./xfsdump: WARNING: could not stat dirent .kvirc4.rc ino 239: Das Argument ist ungültig: using null generation count in directory entry
./xfsdump: WARNING: could not stat dirent .recently-used ino 240: Das Argument ist ungültig: using null generation count in directory entry
./xfsdump: WARNING: could not stat dirent .DownloadManager ino 100663836: Das Argument ist ungültig: using null generation count in directory entry
./xfsdump: WARNING: could not stat dirent .distcc ino 33554725: Das Argument ist ungültig: using null generation count in directory entry
./xfsdump: WARNING: could not stat dirent .javafx_eula_accepted ino 1327: Das Argument ist ungültig: using null generation count in directory entry
./xfsdump: WARNING: could not stat dirent solar_kassel_thomas.txt ino 1671: Das Argument ist ungültig: using null generation count in directory entry

[thousands of these messages...]

./xfsdump: WARNING: could not stat dirent -07a9540007aa7500-0000000000 ino 38797391: Das Argument ist ungültig: using null generation count in directory entry
./xfsdump: dumping non-directory files
./xfsdump: ending media file
./xfsdump: media file size 3319179552 bytes
./xfsdump: dump size (non-dir files) : 3313078008 bytes
./xfsdump: dump complete: 174 seconds elapsed
./xfsdump: Dump Status: SUCCESS

The resulting file is 2.8 GB. When restored, I have 3.2 GB.
A lot of files are simply not there!

With the "old" version xfsdump-3.0.1, I get no warnings!
xfsdump -l0 -L "Test" - /dev/sda2 |gzip - > /mnt/data2/dump301_test.gz
xfsdump: using file dump (drive_simple) strategy
xfsdump: version 3.0.1 (dump format 3.0) - Running single-threaded
xfsdump: level 0 dump of x2:/home
xfsdump: dump date: Mon Aug 16 18:14:47 2010
xfsdump: session id: b75a088f-6481-4583-8020-7c67fbe92bca
xfsdump: session label: "Test"
xfsdump: ino map phase 1: constructing initial dump list
xfsdump: ino map phase 2: skipping (no pruning necessary)
xfsdump: ino map phase 3: skipping (only one dump stream)
xfsdump: ino map construction complete
xfsdump: estimated dump size: 6000020480 bytes
xfsdump: creating dump session media file 0 (media 0, file 0)
xfsdump: dumping ino map
xfsdump: dumping directories
xfsdump: dumping non-directory files
xfsdump: ending media file
xfsdump: media file size 5701306656 bytes
xfsdump: dump size (non-dir files) : 5691937016 bytes
xfsdump: dump complete: 308 seconds elapsed
xfsdump: Dump Status: SUCCESS

And the file is 4.8 GB. Everything seems to be correct!

Mario
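
To put a number on the difference between the two runs, the "media file size" lines can be compared directly. The following is only a rough sketch: the two constants are copied from the logs above, the function names are made up for illustration, and the runs were taken about forty minutes apart, so the result is an estimate rather than an exact count of lost data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* "media file size" values copied from the two xfsdump runs above. */
#define SIZE_3_0_4 3319179552ULL /* run that printed the warnings */
#define SIZE_3_0_1 5701306656ULL /* run that printed no warnings  */

/* Bytes present in the larger dump but absent from the smaller one. */
uint64_t missing_bytes(uint64_t full, uint64_t partial)
{
	return full > partial ? full - partial : 0;
}

/* Shortfall as a whole percentage of the larger dump (rounded down). */
unsigned missing_percent(uint64_t full, uint64_t partial)
{
	if (full == 0)
		return 0;
	return (unsigned)(missing_bytes(full, partial) * 100 / full);
}

/* Hypothetical helper: print a one-line summary of the shortfall. */
void report_shortfall(uint64_t full, uint64_t partial)
{
	printf("%llu bytes (%u%%) smaller\n",
	       (unsigned long long)missing_bytes(full, partial),
	       missing_percent(full, partial));
}
```

Calling report_shortfall(SIZE_3_0_1, SIZE_3_0_4) gives a shortfall of 2382127104 bytes, roughly 41% of the larger dump.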

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: xfsdump-3.0.4 problems
From: Dave Chinner @ 2010-08-16 22:30 UTC
  To: Mario Bachmann; +Cc: xfs

On Mon, Aug 16, 2010 at 06:22:36PM +0200, Mario Bachmann wrote:
> Hello, 
> 
> my kernel is
> Linux x2 2.6.35.2 #1 SMP Sun Aug 15 00:32:14 CEST 2010 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 5600+ AuthenticAMD GNU/Linux
> 
> I get a lot of warnings with xfsdump-3.0.4 (both the Gentoo package 3.0.4-r1 and the git version):
> 
> x2 ~/source/d/xfsdump/dump # ./xfsdump -l0 -L "Test" - /dev/sda2 |gzip - > /mnt/data2/dump_text.gz
.....
> ./xfsdump: WARNING: could not stat dirent .crack-attack ino 100663674: Das Argument ist ungültig: using null generation count in directory entry

bulkstat is failing, indicating an invalid option was passed.
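
For readers without a German locale: "Das Argument ist ungültig" in the warnings is the localized text for errno EINVAL ("Invalid argument"). The sketch below shows how such a line can be assembled from an errno value with strerror(). The function name and the exact format string are assumptions that imitate the log lines quoted above; this is not xfsdump's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical formatter that mimics the wording of the warnings quoted
 * above. The error text comes from strerror(), so it follows the locale
 * of the process (German in the original report).
 */
int format_stat_warning(char *buf, size_t len, const char *name,
			unsigned long long ino, int err)
{
	return snprintf(buf, len,
			"WARNING: could not stat dirent %s ino %llu: %s: "
			"using null generation count in directory entry",
			name, ino, strerror(err));
}
```

Passing EINVAL as err reproduces the shape of the reported messages, with the error text in whatever language the current locale provides.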

> With the "old" version xfsdump-3.0.1, I get no warnings!
> xfsdump -l0 -L "Test" - /dev/sda2 |gzip - > /mnt/data2/dump301_test.gz
> xfsdump: using file dump (drive_simple) strategy
> xfsdump: version 3.0.1 (dump format 3.0) - Running single-threaded
....
> xfsdump: Dump Status: SUCCESS
> 
> And the file is 4.8 GB. Everything seems to be correct!

I can't see any changes between 3.0.1 and 3.0.4 that would explain
this. Did you run them on the same machine and kernel? Did you build
them with the same compiler? If everything is the same, then perhaps
you could bisect (only a few changes so should be quick) to point
out the offending change.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: xfsdump-3.0.4 problems
From: Mario Bachmann @ 2010-08-17  6:32 UTC
  To: Dave Chinner; +Cc: xfs

On Tue, 17 Aug 2010 08:30:21 +1000, Dave Chinner <david@fromorbit.com> wrote:

> I can't see any changes between 3.0.1 and 3.0.4 that would explain
> this. Did you run them on the same machine and kernel? Did you build
> them with the same compiler? If everything is the same, then perhaps
> you could bisect (only a few changes so should be quick) to point
> out the offending change.
> 
> Cheers,
> 
> Dave.

After some testing, I think it is NOT a problem with xfsdump, but with the new kernel 2.6.35.2. First I must correct my last posting: xfsdump-3.0.1 DOES have the same problem as xfsdump-3.0.4 on kernel 2.6.35.2. It was just a coincidence that it worked once without problems...

Machines: I have two x86_64 and one x86. All machines have had the same problem since I upgraded all three kernels from 2.6.34.x to 2.6.35.2. So I believe it is a problem with 2.6.35.2 or with the combination of [2.6.35.2 & xfsdump].

Compiler: I use "gcc (Gentoo 4.4.4-r1 p1.0, pie-0.4.5) 4.4.4". 

Testing List (on one machine only):
works:   x86_64, 2.6.34.4, xfsdump-3.0.1
works:   x86_64, 2.6.34.4, xfsdump-3.0.4
failure: x86_64, 2.6.35.2, xfsdump-3.0.1 (worked only one time)
failure: x86_64, 2.6.35.2, xfsdump-3.0.4
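
The reasoning behind the list above can be written down as a small check: a factor accounts for the failures only if runs that share that factor's value never disagree on the outcome. This is an illustrative sketch with made-up names. It treats the 3.0.1 run on 2.6.35.2 as a failure, as the list does, even though it succeeded once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct run {
	const char *kernel;
	const char *xfsdump;
	bool ok;
};

/* The four results from the testing list above. */
static const struct run runs[] = {
	{ "2.6.34.4", "3.0.1", true  },
	{ "2.6.34.4", "3.0.4", true  },
	{ "2.6.35.2", "3.0.1", false },
	{ "2.6.35.2", "3.0.4", false },
};
#define NRUNS (sizeof(runs) / sizeof(runs[0]))

/* A factor explains the outcome if equal factor values never disagree. */
static bool explains(const char *(*key)(const struct run *))
{
	for (size_t i = 0; i < NRUNS; i++)
		for (size_t j = i + 1; j < NRUNS; j++)
			if (strcmp(key(&runs[i]), key(&runs[j])) == 0 &&
			    runs[i].ok != runs[j].ok)
				return false;
	return true;
}

static const char *by_kernel(const struct run *r)  { return r->kernel; }
static const char *by_xfsdump(const struct run *r) { return r->xfsdump; }

bool kernel_explains_failures(void)  { return explains(by_kernel); }
bool xfsdump_explains_failures(void) { return explains(by_xfsdump); }
```

With these four rows the kernel version separates working from failing runs cleanly, while the xfsdump version does not.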

Mario

* Re: xfsdump-3.0.4 problems
From: Dave Chinner @ 2010-08-17  7:13 UTC
  To: Mario Bachmann; +Cc: xfs

On Tue, Aug 17, 2010 at 08:32:27AM +0200, Mario Bachmann wrote:
> After some testing, I think it is NOT a problem with xfsdump, but
> with the new kernel 2.6.35.2. First I must correct my last
> posting: xfsdump-3.0.1 DOES have the same problem as xfsdump-3.0.4
> on kernel 2.6.35.2. It was just a coincidence that it worked once
> without problems...
> 
> Machines: I have two x86_64 and one x86. All machines have had the
> same problem since I upgraded all three kernels from 2.6.34.x to
> 2.6.35.2. So I believe it is a problem with 2.6.35.2 or with the
> combination of [2.6.35.2 & xfsdump].
> 
> Compiler: I use "gcc (Gentoo 4.4.4-r1 p1.0, pie-0.4.5) 4.4.4". 
> 
> Testing List (on one machine only):
> works:   x86_64, 2.6.34.4, xfsdump-3.0.1
> works:   x86_64, 2.6.34.4, xfsdump-3.0.4
> failure: x86_64, 2.6.35.2, xfsdump-3.0.1 (worked only one time)
> failure: x86_64, 2.6.35.2, xfsdump-3.0.4

Ok, that makes more sense - we changed the way bulkstat works
from 2.6.34 to 2.6.35 to correctly validate inode numbers being
passed in via bulkstat, and hence files unlinked during the dump run
could return EINVAL when validating the directory structure (as they
no longer exist). Is your system completely idle while the dump
is running, or are files being removed while the dump is running?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: xfsdump-3.0.4 problems
From: Mario Bachmann @ 2010-08-17  7:53 UTC
  To: Dave Chinner; +Cc: xfs

On Tue, 17 Aug 2010 17:13:37 +1000, Dave Chinner <david@fromorbit.com> wrote:

> Ok, that makes more sense - we changed the way bulkstat works
> from 2.6.34 to 2.6.35 to correctly validate inode numbers being
> passed in via bulkstat, and hence files unlinked during the dump run
> could return EINVAL when validating the directory structure (as they
> no longer exist). Is your system completely idle while the dump
> is running, or are files being removed while the dump is running?
> 
> Cheers,
> 
> Dave.

I would call my system idle when I use xfsdump. No rm or mv operations
are running during the dump. The first machine has a dual core 2.9 GHz and
8 GB of RAM and the filesystems are not really big (~10GB used). The second
machine has a dual core 2 GHz and 2 GB of RAM.

It does not matter whether I dump the running root partition or the separate
home partition (even with no user logged in, so there should be absolutely
no changes to the files; I also did a sync before the dump).

What I have tested now: after downgrading to 2.6.34.4 on both x86_64 machines,
xfsdump worked again, but using an old kernel is no solution!

To describe the result on 2.6.35.2 clearly again: xfsdump produces a dump in
which several gigabytes of data are simply missing. I think about 30% of all
files are missing.

Mario

* Re: xfsdump-3.0.4 problems
From: Christoph Hellwig @ 2010-08-17  9:03 UTC
  To: Dave Chinner; +Cc: Mario Bachmann, xfs

On Tue, Aug 17, 2010 at 05:13:37PM +1000, Dave Chinner wrote:
> > Testing List (on one machine only):
> > works:   x86_64, 2.6.34.4, xfsdump-3.0.1
> > works:   x86_64, 2.6.34.4, xfsdump-3.0.4
> > failure: x86_64, 2.6.35.2, xfsdump-3.0.1 (worked only one time)
> > failure: x86_64, 2.6.35.2, xfsdump-3.0.4
> 
> Ok, that makes more sense - we changed the way bulkstat works
> from 2.6.34 to 2.6.35 to correctly validate inode numbers being
> passed in via bulkstat, and hence files unlinked during the dump run
> could return EINVAL when validating the directory structure (as they
> no longer exist). Is your system completely idle while the dump
> is running, or are files being removed while the dump is running?

I've also seen related issues in xfstests recently, but even going all
the way back to kernel 2.6.34 does not solve all of them for me.

* Re: xfsdump-3.0.4 problems
From: Dave Chinner @ 2010-08-17  9:05 UTC
  To: Mario Bachmann; +Cc: xfs

On Tue, Aug 17, 2010 at 09:53:40AM +0200, Mario Bachmann wrote:
> I would call my system idle when I use xfsdump. No rm or mv operations
> are running during the dump. The first machine has a dual core 2.9 GHz and
> 8 GB of RAM and the filesystems are not really big (~10GB used). The second
> machine has a dual core 2 GHz and 2 GB of RAM.

Yup, I have reproduced it here. What is strange is that xfs_fsr uses
XFS_IOC_BULKSTAT_SINGLE, and that works fine on 2.6.35.2. The same
ioctl calls from xfsdump are failing, though, so something funny is
going on there.

I'll look into it further.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* [PATCH] Re: xfsdump-3.0.4 problems
From: Dave Chinner @ 2010-08-17 11:45 UTC
  To: Mario Bachmann; +Cc: xfs

On Tue, Aug 17, 2010 at 07:05:34PM +1000, Dave Chinner wrote:
> Yup, I have reproduced it here. What is strange is that xfs_fsr uses
> XFS_IOC_BULKSTAT_SINGLE, and that works fine on 2.6.35.2. The same
> ioctl calls from xfsdump are failing, though, so something funny is
> going on there.
> 
> I'll look into it further.

Ok, there is nothing wrong with the changes to the bulkstat code;
when all the inodes in the filesystem are hot in the inode cache
xfsdump succeeds.

When I run xfs_fsr per file to exercise the XFS_IOC_BULKSTAT_SINGLE
path like so:

$ sudo find /mnt/test -type f -exec xfs_fsr -d -v {} \;

It succeeds without any bulkstat failures. A subsequent xfsdump
invocation then succeeds without failure as well. Clearly the find
is populating the inode cache for the subsequent bulkstat calls.

Ok, so the reason this wasn't picked up is that xfs_fsr silently
ignores inodes that it gets an error from bulkstat on.

Dropping caches then running xfsdump:

$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
$ sudo xfsdump -l0 -L "Test" - /dev/vda 2> t.t |gzip - > ~/dump_test.gz

Results in failures.
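
Why a warm cache hides the problem can be shown with a toy model. In the sketch below every name is hypothetical and the cache is a simple direct-mapped table, not the kernel's inode cache. The point is only that the validator runs on a cache miss, so a broken validator goes unnoticed while every inode is already cached.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 16u

/* Toy inode cache: a direct-mapped table of inode numbers. */
struct inode_cache {
	uint64_t ino[CACHE_SLOTS];
	bool	 used[CACHE_SLOTS];
};

typedef bool (*validate_fn)(uint64_t ino);

/* Models "echo 3 > /proc/sys/vm/drop_caches": forget everything. */
void cache_drop(struct inode_cache *c)
{
	memset(c, 0, sizeof(*c));
}

/* Models a trusted lookup (for example a directory walk) warming the cache. */
void cache_warm(struct inode_cache *c, uint64_t ino)
{
	c->ino[ino % CACHE_SLOTS] = ino;
	c->used[ino % CACHE_SLOTS] = true;
}

static bool cache_hit(const struct inode_cache *c, uint64_t ino)
{
	return c->used[ino % CACHE_SLOTS] && c->ino[ino % CACHE_SLOTS] == ino;
}

/*
 * Models an untrusted by-number lookup. The validator is consulted only
 * on a cache miss, so a hot cache never exercises it.
 */
int lookup_untrusted(struct inode_cache *c, uint64_t ino, validate_fn validate)
{
	if (cache_hit(c, ino))
		return 0;
	if (!validate(ino))
		return EINVAL;
	cache_warm(c, ino);
	return 0;
}

/* Stand-in for the buggy validation: wrongly rejects a valid inode. */
bool reject_all(uint64_t ino)
{
	(void)ino;
	return false;
}
```

With reject_all as the validator, a lookup fails with EINVAL on a cold cache, succeeds once cache_warm() has seen the inode, and fails again after cache_drop(). That is the same pattern as the find and drop_caches experiment above.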

/me sighs

My fault. I screwed up the btree lookup for the inode validation.
Can you test the patch below?

Cheers,

Dave
-- 
Dave Chinner
david@fromorbit.com

xfs: fix untrusted inode number lookup

From: Dave Chinner <dchinner@redhat.com>

Commit 7124fe0a5b619d65b739477b3b55a20bf805b06d ("xfs: validate untrusted inode
numbers during lookup") changes the inode lookup code to do btree lookups for
untrusted inode numbers. This change made an invalid assumption about the
alignment of inodes and hence incorrectly calculated the first inode in the
cluster. As a result, some inode numbers were being incorrectly considered
invalid when they were actually valid.

The issue was not picked up by the xfstests suite because it always runs fsr
and dump (the two utilities that utilise the bulkstat interface) on cache hot
inodes and hence the lookup code in the cold cache path was not sufficiently
exercised to uncover this intermittent problem.

Fix the issue by relaxing the btree lookup criteria and then checking if the
record returned contains the inode number we are looking up. If we get an
incorrect record, then the inode number is invalid.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_ialloc.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index abf80ae..5371d2d 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -1213,7 +1213,6 @@ xfs_imap_lookup(
 	struct xfs_inobt_rec_incore rec;
 	struct xfs_btree_cur	*cur;
 	struct xfs_buf		*agbp;
-	xfs_agino_t		startino;
 	int			error;
 	int			i;
 
@@ -1227,13 +1226,13 @@ xfs_imap_lookup(
 	}
 
 	/*
-	 * derive and lookup the exact inode record for the given agino. If the
-	 * record cannot be found, then it's an invalid inode number and we
-	 * should abort.
+	 * Lookup the inode record for the given agino. If the record cannot be
+	 * found, then it's an invalid inode number and we should abort. Once
+	 * we have a record, we need to ensure it contains the inode number
+	 * we are looking up.
 	 */
 	cur = xfs_inobt_init_cursor(mp, tp, agbp, agno);
-	startino = agino & ~(XFS_IALLOC_INODES(mp) - 1);
-	error = xfs_inobt_lookup(cur, startino, XFS_LOOKUP_EQ, &i);
+	error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_LE, &i);
 	if (!error) {
 		if (i)
 			error = xfs_inobt_get_rec(cur, &rec, &i);
@@ -1246,6 +1245,11 @@ xfs_imap_lookup(
 	if (error)
 		return error;
 
+	/* check that the returned record contains the required inode */
+	if (rec.ir_startino > agino ||
+	    rec.ir_startino + XFS_IALLOC_INODES(mp) <= agino)
+		return EINVAL;
+
 	/* for untrusted inodes check it is allocated first */
 	if ((flags & XFS_IGET_UNTRUSTED) &&
 	    (rec.ir_free & XFS_INOBT_MASK(agino - rec.ir_startino)))
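
The difference between the old and the new lookup can be tried out in user space. The sketch below is a simplified model under stated assumptions: a sorted array stands in for the inode btree, each record is assumed to cover 64 inodes, and all names are invented for illustration. It checks only that a record contains the inode, not whether the inode is allocated (the ir_free test in the patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INODES_PER_REC 64u /* assumed record size; a power of two for the mask */

/* Hypothetical stand-in for the inode btree: start inodes, sorted ascending. */
struct rec_index {
	const uint32_t *startino;
	size_t		count;
};

/* Exact-match lookup (the XFS_LOOKUP_EQ idea): index of key, or -1. */
static long lookup_eq(const struct rec_index *idx, uint32_t key)
{
	for (size_t i = 0; i < idx->count; i++)
		if (idx->startino[i] == key)
			return (long)i;
	return -1;
}

/* Less-or-equal lookup (the XFS_LOOKUP_LE idea): last start <= key, or -1. */
static long lookup_le(const struct rec_index *idx, uint32_t key)
{
	long found = -1;

	for (size_t i = 0; i < idx->count && idx->startino[i] <= key; i++)
		found = (long)i;
	return found;
}

/* Pre-patch logic: derive the record start by masking, demand an exact hit. */
bool valid_by_mask(const struct rec_index *idx, uint32_t agino)
{
	uint32_t startino = agino & ~(INODES_PER_REC - 1);

	return lookup_eq(idx, startino) >= 0;
}

/* Patched logic: take the nearest record at or below agino, then range-check. */
bool valid_by_range(const struct rec_index *idx, uint32_t agino)
{
	long i = lookup_le(idx, agino);

	if (i < 0)
		return false;
	/* lookup_le guarantees startino[i] <= agino, so this cannot wrap. */
	return agino - idx->startino[i] < INODES_PER_REC;
}
```

For an index with records starting at 32 and 128 (made-up values, the first one deliberately not 64-aligned), inode 40 is rejected by valid_by_mask() because masking yields 0, which is not a record start, while valid_by_range() accepts it. Inode 100 lies in the gap between the two records and is rejected by the range check.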

* Re: [PATCH] Re: xfsdump-3.0.4 problems
From: Mario Bachmann @ 2010-08-17 15:47 UTC
  To: Dave Chinner; +Cc: xfs

On Tue, 17 Aug 2010 21:45:50 +1000, Dave Chinner <david@fromorbit.com> wrote:

> My fault. I screwed up the btree lookup for the inode validation.
> Can you test the patch below?
> 
> Cheers,
> 
> Dave

Your patch works here with 2.6.35.2. 
:-)

Mario

* Re: [PATCH] Re: xfsdump-3.0.4 problems
From: Christoph Hellwig @ 2010-08-18 10:10 UTC
  To: Dave Chinner; +Cc: Mario Bachmann, xfs

> xfs: fix untrusted inode number lookup
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> Commit 7124fe0a5b619d65b739477b3b55a20bf805b06d ("xfs: validate untrusted inode
> numbers during lookup") changes the inode lookup code to do btree lookups for
> untrusted inode numbers. This change made an invalid assumption about the
> alignment of inodes and hence incorrectly calculated the first inode in the
> cluster. As a result, some inode numbers were being incorrectly considered
> invalid when they were actually valid.
> 
> The issue was not picked up by the xfstests suite because it always runs fsr
> and dump (the two utilities that utilise the bulkstat interface) on cache hot
> inodes and hence the lookup code in the cold cache path was not sufficiently
> exercised to uncover this intermittent problem.
> 
> Fix the issue by relaxing the btree lookup criteria and then checking if the
> record returned contains the inode number we are looking up. If we get an
> incorrect record, then the inode number is invalid.

Looks good and fixes the dump issues I've seen in xfstests.


Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH] Re: xfsdump-3.0.4 problems
From: Iustin Pop @ 2010-08-27 11:18 UTC
  To: Dave Chinner; +Cc: Mario Bachmann, xfs

On Tue, Aug 17, 2010 at 09:45:50PM +1000, Dave Chinner wrote:
> My fault. I screwed up the btree lookup for the inode validation.
> Can you test the patch below?

I just saw that 2.6.35.4 has been released, but it doesn't include this
fix (as far as I can see). Could it be sent for inclusion in the next
stable release, please? (Yes, it fixes the issue for me too.)

thanks,
iustin

* Re: [PATCH] Re: xfsdump-3.0.4 problems
From: Dave Chinner @ 2010-08-27 11:40 UTC
  To: Mario Bachmann, xfs

On Fri, Aug 27, 2010 at 01:18:20PM +0200, Iustin Pop wrote:
> On Tue, Aug 17, 2010 at 09:45:50PM +1000, Dave Chinner wrote:
> > My fault. I screwed up the btree lookup for the inode validation.
> > Can you test the patch below?
> 
> I just saw that 2.6.35.4 has been released, but it doesn't include this
> fix (as far as I can see). Could it be sent for inclusion in the next
> stable release, please? (Yes, it fixes the issue for me too.)

The commit is now upstream; it had a "cc: stable@kernel.org" in it,
so it should get automatically queued for inclusion in the next
stable kernel release. If I don't see it appear in Greg's stable
queue once he starts processing the commits for the next stable
release, I'll chase it up....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH] Re: xfsdump-3.0.4 problems
  2010-08-27 11:40               ` Dave Chinner
@ 2010-09-24  9:53                 ` Mario Bachmann
  2010-10-03  6:20                   ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Mario Bachmann @ 2010-09-24  9:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi there, 

Is the patch in 2.6.35.5 now?

More precisely: do XFS and xfsprogs work correctly again now?
My backups rely on xfsdump, so it would be a disaster if it did not work.

Thanks for your information!

Greetings
Mario


On Fri, 27 Aug 2010 21:40:55 +1000,
Dave Chinner <david@fromorbit.com> wrote:

> On Fri, Aug 27, 2010 at 01:18:20PM +0200, Iustin Pop wrote:
> > On Tue, Aug 17, 2010 at 09:45:50PM +1000, Dave Chinner wrote:
> > > My fault. I screwed up the btree lookup for the inode validation.
> > > Can you test the patch below?
> > 
> > I just saw that 2.6.35.4 has been released, but it doesn't include this
> > fix (as far as I can see). Could it be sent for inclusion in the next
> > stable release, please? (Yes, it fixes the issue for me too.)
> 
> The commit is now upstream; it had a "cc: stable@kernel.org" in it,
> so it should get automatically queued for inclusion in the next
> stable kernel release. If I don't see it appear in Greg's stable
> queue once he starts processing the commits for the next stable
> release, I'll chase it up....
> 
> Cheers,
> 
> Dave.

* Re: [PATCH] Re: xfsdump-3.0.4 problems
  2010-09-24  9:53                 ` Mario Bachmann
@ 2010-10-03  6:20                   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2010-10-03  6:20 UTC (permalink / raw)
  To: Mario Bachmann; +Cc: xfs

On Fri, Sep 24, 2010 at 11:53:17AM +0200, Mario Bachmann wrote:
> Hi there, 
> 
> Is the patch in 2.6.35.5 now?
> 
> More precisely: do XFS and xfsprogs work correctly again now?
> My backups rely on xfsdump, so it would be a disaster if it did not work.
> 
> Thanks for your information!

I haven't seen the mail that normally gets sent for -stable inclusion, so
I suspect it hasn't been included yet.

end of thread, other threads:[~2010-10-03  6:19 UTC | newest]

Thread overview: 14+ messages
2010-08-16 16:22 xfsdump-3.0.4 problems Mario Bachmann
2010-08-16 22:30 ` Dave Chinner
2010-08-17  6:32   ` Mario Bachmann
2010-08-17  7:13     ` Dave Chinner
2010-08-17  7:53       ` Mario Bachmann
2010-08-17  9:05         ` Dave Chinner
2010-08-17 11:45           ` [PATCH] " Dave Chinner
2010-08-17 15:47             ` Mario Bachmann
2010-08-18 10:10             ` Christoph Hellwig
2010-08-27 11:18             ` Iustin Pop
2010-08-27 11:40               ` Dave Chinner
2010-09-24  9:53                 ` Mario Bachmann
2010-10-03  6:20                   ` Christoph Hellwig
2010-08-17  9:03       ` Christoph Hellwig
