xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?

* xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
@ 2011-02-02 13:30 Michael Lueck
  2011-02-02 18:32 ` Bill Kendall
  2011-04-22 12:34 ` Michael Lueck
  0 siblings, 2 replies; 23+ messages in thread
From: Michael Lueck @ 2011-02-02 13:30 UTC
  To: linux-xfs

Greetings,

Somehow a reported IRIX bug with XFS got into Ubuntu 10.04 (Lucid) starting with kernel 2.6.32-27.

I am cross posting to this list in order to receive details as to what bug got in Ubuntu which has been solved in IRIX, hoping with more details Ubuntu might make the same fix.

Ubuntu has since updated their kernel to 2.6.32-28 and someone already verified at the bug report that the problem persists with that kernel version.

"Regression between 2.6.32-27 and 2.6.32-26 xfsdump SGI_FS_BULKSTAT errno = 22"
https://bugs.launchpad.net/bugs/692848

On our Ubuntu 10.04 LTS server running x86 code, this evening a kernel
update was ready for installation. I updated the kernel, rebooted
(IPL'ed), and proceeded with the backup which utilized xfsdump as we use
the xfs filesystem. Four of the xfsdump received a never before seen
error: SGI_FS_BULKSTAT errno = 22

Output as follows:
using file dump (drive_simple) strategy
version 3.0.4 (dump format 3.0) - Running single-threaded
level 0 dump of ldslnx01:/srv
dump date: Mon Dec 20 21:59:33 2010
session id: f98f8cc0-963f-41a6-9a19-a89192502bf0
session label: "data"
ino map phase 1: constructing initial dump list
ino map phase 2: skipping (no pruning necessary)
ino map phase 3: skipping (only one dump stream)
ino map construction complete
estimated dump size: 100739707392 bytes
WARNING: no media label specified
creating dump session media file 0 (media 0, file 0)
dumping ino map
dumping directories
SGI_FS_BULKSTAT failed: Invalid argument (22)
dump size (non-dir files) : 0 bytes
NOTE: dump interrupted: 79 seconds elapsed: may resume later using -R option
Dump Status: INTERRUPT

This backup file did have some size to it. The other three, backing up
smaller amounts of data, were all zero (0) length dump files.

I rebooted to the prior kernel: $ uname -a
Linux ldslnx01 2.6.32-26-generic-pae #48-Ubuntu SMP Wed Nov 24 10:31:20 UTC 2010 i686 GNU/Linux

And the same backup gets to 100% success.

Reboot to the new kernel, same failure.

I think that fairly well illustrates that the problem exists only with
the kernel update installed this evening.

<><><><>

I did come across a reference to this problem on the SGI website:

http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0650&db=relnotes&fname=/usr/relnotes/eoe
   1.6.17  Bugs fixed in IRIX 6.5.13
     + 816457:  xfsdump SGI_FS_BULKSTAT errno = 22	cxfs

So evidently it is something that has been seen and corrected in IRIX.

-- 
Michael Lueck
Lueck Data Systems
http://www.lueckdatasystems.com/

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-02 13:30 xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26? Michael Lueck
@ 2011-02-02 18:32 ` Bill Kendall
  2011-02-02 19:03   ` Michael Lueck
  2011-02-03  4:58   ` Dave Chinner
  2011-04-22 12:34 ` Michael Lueck
  1 sibling, 2 replies; 23+ messages in thread
From: Bill Kendall @ 2011-02-02 18:32 UTC
  To: mlueck; +Cc: linux-xfs

On 02/02/2011 07:30 AM, Michael Lueck wrote:
> Greetings,
>
> Somehow a reported IRIX bug with XFS got into Ubuntu 10.04 (Lucid) starting with kernel 2.6.32-27.
>
> I am cross posting to this list in order to receive details as to what bug got in Ubuntu which has
> been solved in IRIX, hoping with more details Ubuntu might make the same fix.

Aside from the fact that the errno is the same, there's nothing to suggest
that the Ubuntu problem is the same bug. The IRIX bug is quite old.

>
> Ubuntu has since updated their kernel to 2.6.32-28 and someone already verified at the bug report
> that the problem persists with that kernel version.
>
> "Regression between 2.6.32-27 and 2.6.32-26 xfsdump SGI_FS_BULKSTAT errno = 22"
> https://bugs.launchpad.net/bugs/692848

Between 2.6.32-26 and 2.6.32-27, Ubuntu backported 4 XFS commits from
2.6.35/2.6.36. All are part of a bulkstat security fix.

% git log Ubuntu-2.6.32-26.48..Ubuntu-2.6.32-27.49 -- fs/xfs | grep commit
commit 52d2a4cfbc852da8c3d3b9fa0cac2a07b12f5cfd
     (cherry picked from commit 4536f2ad8b330453d7ebec0746c4374eadd649b1)
commit eb5ab28c8a5e4bb3f1ce05eba166c12175f6c701
     (backported from commit 7b6259e7a83647948fa33a736cc832310c8d85aa)
commit 5f8e8c6ab416bbd58d4f5df512c119a888ff923c
     (cherry picked from commit 1920779e67cbf5ea8afef317777c5bf2b8096188)
commit 52e0d703745f7110f1ecbe83c02cf06a83da82e8
     (backported from commit 7124fe0a5b619d65b739477b3b55a20bf805b06d)

I'm not aware of a similar problem upstream, so it would appear
to be a problem with Ubuntu's backport of these commits.
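
To compare each Ubuntu backport against the upstream commit it cites
(for example by running `git show` on both hashes), the filtered log
above can be paired up mechanically. A small Python sketch, assuming the
log has exactly the shape shown, with each `commit` line followed by its
"(cherry picked|backported from commit ...)" line; `pair_backports` is a
hypothetical helper, not part of git:

```python
import re

# Matches the trailer Ubuntu adds when it carries an upstream commit.
UPSTREAM = re.compile(r"\((cherry picked|backported) from commit ([0-9a-f]{40})\)")


def pair_backports(log_text):
    """Return (ubuntu_commit, upstream_commit, kind) tuples in log order.

    Assumes the `git log ... | grep commit` output shown above: a
    "commit <sha>" line followed by its cherry-pick/backport trailer.
    """
    pairs = []
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("commit "):
            current = line.split()[1]  # the Ubuntu-side commit hash
            continue
        match = UPSTREAM.search(line)
        if match and current is not None:
            pairs.append((current, match.group(2), match.group(1)))
            current = None  # each Ubuntu commit is paired at most once
    return pairs
```

By the usual Ubuntu kernel convention, "backported" (as opposed to
"cherry picked") marks a commit whose diff had to be modified to apply,
so those two are probably the first places to look.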

Bill

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-02 18:32 ` Bill Kendall
@ 2011-02-02 19:03   ` Michael Lueck
  2011-02-03  4:58   ` Dave Chinner
  1 sibling, 0 replies; 23+ messages in thread
From: Michael Lueck @ 2011-02-02 19:03 UTC
  To: linux-xfs

Bill Kendall wrote:
> I'm not aware of a similar problem upstream, so it would appear
> to be a problem with Ubuntu's backport of these commits.

Thank you SO much, Bill! :-) I will post your comments back to the Ubuntu bug report.

Sincerely,

-- 
Michael Lueck
Lueck Data Systems
http://www.lueckdatasystems.com/

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-02 18:32 ` Bill Kendall
  2011-02-02 19:03   ` Michael Lueck
@ 2011-02-03  4:58   ` Dave Chinner
  2011-02-03 14:43     ` Michael Lueck
                       ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Dave Chinner @ 2011-02-03  4:58 UTC
  To: Bill Kendall; +Cc: linux-xfs, Dann Frazier, mlueck

On Wed, Feb 02, 2011 at 12:32:59PM -0600, Bill Kendall wrote:
> On 02/02/2011 07:30 AM, Michael Lueck wrote:
> >Greetings,
> >
> >Somehow a reported IRIX bug with XFS got into Ubuntu 10.04 (Lucid) starting with kernel 2.6.32-27.
> >
> >I am cross posting to this list in order to receive details as to what bug got in Ubuntu which has
> >been solved in IRIX, hoping with more details Ubuntu might make the same fix.
> 
> Aside from the fact that the errno is the same, there's nothing to suggest
> that the Ubuntu problem is the same bug. The IRIX bug is quite old.
> 
> >
> >Ubuntu has since updated their kernel to 2.6.32-28 and someone already verified at the bug report
> >that the problem persists with that kernel version.
> >
> >"Regression between 2.6.32-27 and 2.6.32-26 xfsdump SGI_FS_BULKSTAT errno = 22"
> >https://bugs.launchpad.net/bugs/692848
> 
> Between 2.6.32-26 and 2.6.32-27, Ubuntu backported 4 XFS commits from
> 2.6.35/2.6.36. All are part of a bulkstat security fix.
> 
> % git log Ubuntu-2.6.32-26.48..Ubuntu-2.6.32-27.49 -- fs/xfs | grep commit
> commit 52d2a4cfbc852da8c3d3b9fa0cac2a07b12f5cfd
>     (cherry picked from commit 4536f2ad8b330453d7ebec0746c4374eadd649b1)
> commit eb5ab28c8a5e4bb3f1ce05eba166c12175f6c701
>     (backported from commit 7b6259e7a83647948fa33a736cc832310c8d85aa)
> commit 5f8e8c6ab416bbd58d4f5df512c119a888ff923c
>     (cherry picked from commit 1920779e67cbf5ea8afef317777c5bf2b8096188)
> commit 52e0d703745f7110f1ecbe83c02cf06a83da82e8
>     (backported from commit 7124fe0a5b619d65b739477b3b55a20bf805b06d)
> 
> I'm not aware of a similar problem upstream, so it would appear
> to be a problem with Ubuntu's backport of these commits.

<sigh>

That'll be the untrusted inode lookup fixes.

> 
> Bill
> 
> >
> >On our Ubuntu 10.04 LTS server running x86 code, this evening a kernel
> >update was ready for installation. I updated the kernel, rebooted
> >(IPL'ed), and proceeded with the backup which utilized xfsdump as we use
> >the xfs filesystem. Four of the xfsdump received a never before seen
> >error: SGI_FS_BULKSTAT errno = 22
> >
> >Output as follows:
> >using file dump (drive_simple) strategy
> >version 3.0.4 (dump format 3.0) - Running single-threaded
> >level 0 dump of ldslnx01:/srv
> >dump date: Mon Dec 20 21:59:33 2010
> >session id: f98f8cc0-963f-41a6-9a19-a89192502bf0
> >session label: "data"
> >ino map phase 1: constructing initial dump list
> >ino map phase 2: skipping (no pruning necessary)
> >ino map phase 3: skipping (only one dump stream)
> >ino map construction complete
> >estimated dump size: 100739707392 bytes
> >WARNING: no media label specified
> >creating dump session media file 0 (media 0, file 0)
> >dumping ino map
> >dumping directories
> >SGI_FS_BULKSTAT failed: Invalid argument (22)
> >dump size (non-dir files) : 0 bytes
> >NOTE: dump interrupted: 79 seconds elapsed: may resume later using -R option
> >Dump Status: INTERRUPT

So bulkstat got EINVAL returned for an inode that it was looking
up. That implies that it was racing with an unlink, which is
what the above commits catch and prevent. Can you run xfsdump with
full debug output (-v 5) so we can see what inode is being operated
on when this failure occurs?

Dann, I'm not sure whether this means there was a bug in your
backport or whether it's just xfsdump not handling a failure
gracefully....

> >This backup file did have some size to it. The other three, backing up
> >smaller amounts of data, were all zero (0) length dump files.
> >
> >I rebooted to the prior kernel: $ uname -a
> >Linux ldslnx01 2.6.32-26-generic-pae #48-Ubuntu SMP Wed Nov 24 10:31:20 UTC 2010 i686 GNU/Linux
> >
> >And the same backup gets to 100% success.
> >
> >Reboot to the new kernel, same failure.
> >
> >I think that fairly well illustrates that the problem exists only with
> >the kernel update installed this evening.
> >
> ><><><><>
> >
> >I did come across a reference to this problem on the SGI website:
> >
> >http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0650&db=relnotes&fname=/usr/relnotes/eoe
> >1.6.17 Bugs fixed in IRIX 6.5.13
> >+ 816457: xfsdump SGI_FS_BULKSTAT errno = 22 cxfs
> >
> >So evidently it is something that has been seen and corrected in IRIX.

Oh, the curse of Google.

Irix 6.5.13 was released in 2001, so I don't think this is at all
relevant for a regression reported for the latest and greatest Ubuntu
kernel....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-03  4:58   ` Dave Chinner
@ 2011-02-03 14:43     ` Michael Lueck
  2011-02-04  0:08       ` Dave Chinner
  2011-02-03 14:51     ` Michael Lueck
  2011-02-04 14:52     ` dann frazier
  2 siblings, 1 reply; 23+ messages in thread
From: Michael Lueck @ 2011-02-03 14:43 UTC
  To: Dave Chinner; +Cc: linux-xfs, Dann Frazier

Dave Chinner wrote:
> So bulkstat got EINVAL returned for an inode that it was looking
> up. That implies that it was racing with an unlink, which is
> what the above commits catch and prevent. Can you run xfsdump with
> full debug output (-v 5) so we can see what inode is being operated
> on when this failure occurs?

Thank you so much Dave!

Please find the trace output here in zipped format:

http://www.lueckdatasystems.com/pub/ldsbackup.trace.log.zip

I chose to trace the "data" backup that actually had some size to it.

Sincerely,

-- 
Michael Lueck
Lueck Data Systems
http://www.lueckdatasystems.com/

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-03  4:58   ` Dave Chinner
  2011-02-03 14:43     ` Michael Lueck
@ 2011-02-03 14:51     ` Michael Lueck
  2011-02-04 14:52     ` dann frazier
  2 siblings, 0 replies; 23+ messages in thread
From: Michael Lueck @ 2011-02-03 14:51 UTC
  To: Dave Chinner; +Cc: linux-xfs, Dann Frazier

Greetings Dave-

BTW: I ended up testing with the latest 10.04 kernel.

The size difference between the two kernel packages is disturbing!

-rw-r--r-- 1 root root 31608926 2011-01-11 13:09 linux-image-2.6.32-28-generic-pae_2.6.32-28.55_i386.deb
-rw-r--r-- 1 root root     4160 2010-11-08 05:05 linux-image-generic-pae_2.6.32.26.28_i386.deb

Perhaps that is due to the only changelog entry apticron sent me, a "bump kernel number in prep for xyz" type of change comment.

Sincerely,

-- 
Michael Lueck
Lueck Data Systems
http://www.lueckdatasystems.com/

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-03 14:43     ` Michael Lueck
@ 2011-02-04  0:08       ` Dave Chinner
  2011-02-04 14:12         ` Michael Lueck
                           ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Dave Chinner @ 2011-02-04  0:08 UTC
  To: Michael Lueck; +Cc: linux-xfs, Dann Frazier

On Thu, Feb 03, 2011 at 09:43:03AM -0500, Michael Lueck wrote:
> Dave Chinner wrote:
> >So bulkstat got EINVAL returned for an inode that it was looking
> >up. That implies that it was racing with an unlink, which is
> >what the above commits catch and prevent. Can you run xfsdump with
> >full debug output (-v 5) so we can see what inode is being operated
> >on when this failure occurs?
> 
> Thank you so much Dave!
> 
> Please find the trace output here in zipped format:
> 
> http://www.lueckdatasystems.com/pub/ldsbackup.trace.log.zip

Ok, so xfsdump is seeing a short bulkstat, then an EINVAL returned
from the next bulkstat. That's not a race condition, and makes me
think you have some kind of on-disk corruption. The inode it is
starting at when it returns EINVAL is 80508397. Can you firstly
post the output of:

# xfs_db -c "inode 80508397" -c p <dev>

And can you also run 'xfs_repair -n <dev>' on the filesystem and
post the output as well so we can see what state the filesystem is
in?
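
The pattern described here, a short batch followed by EINVAL on the
next call, falls out of how a bulkstat caller walks the filesystem: each
call resumes after the last inode number the previous call returned, so
a single inode the kernel refuses to look up stops the walk at that
point. A toy Python simulation of that loop; `fake_bulkstat` and
`walk_inodes` are hypothetical names, and the real interface (on Linux,
presumably the XFS_IOC_FSBULKSTAT ioctl behind xfsdump's SGI_FS_BULKSTAT
message) returns stat structures rather than bare inode numbers:

```python
import errno


def fake_bulkstat(allocated, rejected, lastino, icount):
    """Hypothetical stand-in for one bulkstat call.

    Returns up to icount inode numbers greater than lastino. On reaching
    an inode in `rejected`, it returns the short batch gathered so far,
    or raises EINVAL if that inode is the first one it would return.
    """
    batch = []
    for ino in sorted(allocated):
        if ino <= lastino:
            continue
        if ino in rejected:
            if batch:
                break  # hand back what we have: a short batch
            raise OSError(errno.EINVAL, "Invalid argument")
        batch.append(ino)
        if len(batch) == icount:
            break
    return batch


def walk_inodes(allocated, rejected, icount=4):
    """Bulkstat-style walk: each call resumes after the last inode returned."""
    lastino = 0
    seen = []
    while True:
        batch = fake_bulkstat(allocated, rejected, lastino, icount)
        if not batch:  # an empty batch means the walk is finished
            return seen
        seen.extend(batch)
        lastino = batch[-1]  # the real ioctl updates *lastip for the caller
```

With inodes 128 to 132 and 131 rejected, the first call returns 128
through 130 (a short batch) and the second raises EINVAL, which mirrors
what xfsdump reports. The simulation says nothing about *why* the
kernel rejected the inode; that is what the requested xfs_db and
xfs_repair output should help answer.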

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-04  0:08       ` Dave Chinner
@ 2011-02-04 14:12         ` Michael Lueck
  2011-02-04 20:49           ` Dave Chinner
  2011-02-08 17:39         ` Michael Lueck
  2011-02-08 17:39         ` Michael Lueck
  2 siblings, 1 reply; 23+ messages in thread
From: Michael Lueck @ 2011-02-04 14:12 UTC
  To: Dave Chinner; +Cc: linux-xfs, Dann Frazier

Dave Chinner wrote:
> Ok, so xfsdump is seeing a short bulkstat, then an EINVAL returned
> from the next bulkstat. That's not a race condition, and makes me
> think you have some kind of on-disk corruption.

Very odd that some kind of on-disk corruption is suddenly causing xfsdump problems starting with Ubuntu 10.04 (Lucid) kernel 2.6.32-27 and persisting in 2.6.32-28.

And there is one other person who confirmed this xfsdump problem running Lucid with kernel 2.6.32-28. They reported their "me too" in the Ubuntu bug tracker.

Could it be that 2.6.32-26 and prior managed to write something to disk corrupted, and the newer code is tripping on it?

I shall reboot the server to the 2.6.32-28 kernel and perform the tests you requested.

Sincerely,

-- 
Michael Lueck
Lueck Data Systems
http://www.lueckdatasystems.com/

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-03  4:58   ` Dave Chinner
  2011-02-03 14:43     ` Michael Lueck
  2011-02-03 14:51     ` Michael Lueck
@ 2011-02-04 14:52     ` dann frazier
  2 siblings, 0 replies; 23+ messages in thread
From: dann frazier @ 2011-02-04 14:52 UTC
  To: Dave Chinner; +Cc: linux-xfs, mlueck

On Thu, Feb 03, 2011 at 03:58:36PM +1100, Dave Chinner wrote:
> So bulkstat got EINVAL returned for an inode that it was looking
> up. That implies that it was racing with an unlink, which is
> what the above commits catch and prevent. Can you run xfsdump with
> full debug output (-v 5) so we can see what inode is being operated
> on when this failure occurs?
> 
> Dann, I'm not sure whether this means there was a bug in your
> backport or whether it's just xfsdump not handling a failure
> gracefully....

Yeah, I'm not sure either. I've attempted to reproduce it, but have so
far not been successful. I'll try to get a 32-bit test system set up in
case that makes a difference.

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-04 14:12         ` Michael Lueck
@ 2011-02-04 20:49           ` Dave Chinner
  2011-02-07 20:55             ` Bill Kendall
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2011-02-04 20:49 UTC
  To: Michael Lueck; +Cc: linux-xfs, Dann Frazier

On Fri, Feb 04, 2011 at 09:12:53AM -0500, Michael Lueck wrote:
> Dave Chinner wrote:
> >Ok, so xfsdump is seeing a short bulkstat, then an EINVAL returned
> >from the next bulkstat. That's not a race condition, and makes me
> >think you have some kind of on-disk corruption.
> 
> Very odd that some kind of on-disk corruption is suddenly causing
> xfsdump problems starting with Ubuntu 10.04 (Lucid) kernel
> 2.6.32-27 and persisting in 2.6.32-28.

Not really. The newer kernels have code in them that does more
validity checks than previous kernels, so older kernels would have
erroneously and silently returned unlinked files to xfsdump and have
them backed up. IOWs, you'd never notice such a corruption with
xfsdump. On the new kernel, xfsdump gets an EINVAL error to such
occurrences, which it should have in the first place.
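
The "more validity checks" amount to no longer trusting an inode number
handed in from userspace, such as through bulkstat. A simplified Python
model of that kind of check, assuming it finds the inode allocation
btree record at or below the AG-relative inode number and rejects the
number with EINVAL if that record's 64-inode chunk does not contain it.
Upstream `xfs_imap_lookup` performs a range test of roughly this shape,
but the helper names here are hypothetical and the model ignores the
per-record free-inode mask:

```python
import bisect
import errno

INODES_PER_CHUNK = 64  # XFS allocates inodes in chunks of 64


def validate_untrusted(agino, chunk_starts):
    """Return the start of the chunk holding agino, or raise EINVAL.

    chunk_starts: sorted AG-relative start inode numbers of the inobt
    records (a hypothetical stand-in for walking the real btree).
    """
    # Find the record with the largest start <= agino (a "lookup LE").
    i = bisect.bisect_right(chunk_starts, agino) - 1
    if i < 0:
        raise OSError(errno.EINVAL, "no inobt record at or below this inode")
    start = chunk_starts[i]
    # Reject the number unless that record's chunk actually covers it.
    if not start <= agino < start + INODES_PER_CHUNK:
        raise OSError(errno.EINVAL, "inode outside the chunk found")
    return start
```

Here `validate_untrusted(70, [0, 64, 256])` returns 64, while 130 is
rejected because the record at or below it (starting at 64) only covers
64 through 127. A kernel without any such check would not refuse the
lookup at all, which fits the point that older kernels silently
returned these inodes to xfsdump.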

> And there is one other person who confirmed this xfsdump problem
> running Lucid with kernel 2.6.32-28. They reported their "me too"
> in the Ubuntu bug tracker.
> 
> Could it be that 2.6.32-26 and prior managed to write something to
> disk corrupted, and the newer code is tripping on it?

That's what I'm trying to find out. Or it could be something as
simple as your disk has had an undetected bit error that has flipped
a bit in the inode allocation btree.

> I shall reboot the server to the 2.6.32-28 kernel and perform the
> tests you requested.

No need to change kernels to run xfs_repair....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-04 20:49           ` Dave Chinner
@ 2011-02-07 20:55             ` Bill Kendall
  2011-02-07 21:23               ` Dave Chinner
  0 siblings, 1 reply; 23+ messages in thread
From: Bill Kendall @ 2011-02-07 20:55 UTC
  To: Dave Chinner; +Cc: linux-xfs, Dann Frazier, Michael Lueck

On 02/04/2011 02:49 PM, Dave Chinner wrote:
> On Fri, Feb 04, 2011 at 09:12:53AM -0500, Michael Lueck wrote:
>> Dave Chinner wrote:
>>> Ok, so xfsdump is seeing a short bulkstat, then an EINVAL returned
>>> from the next bulkstat. That's not a race condition, and makes me
>>> think you have some kind of on-disk corruption.
>>
>> Very odd that some kind of on-disk corruption is suddenly causing
>> xfsdump problems starting with Ubuntu 10.04 (Lucid) kernel
>> 2.6.32-27 and persisting in 2.6.32-28.
>
> Not really. The newer kernels have code in them that does more
> validity checks than previous kernels, so older kernels would have
> erroneously and silently returned unlinked files to xfsdump and have
> them backed up. IOWs, you'd never notice such a corruption with
> xfsdump. On the new kernel, xfsdump gets an EINVAL error to such
> occurrences, which it should have in the first place.
>
>> And there is one other person who confirmed this xfsdump problem
>> running Lucid with kernel 2.6.32-28. They reported their "me too"
>> in the Ubuntu bug tracker.
>>
>> Could it be that 2.6.32-26 and prior managed to write something to
>> disk corrupted, and the newer code is tripping on it?
>
> That's what I'm trying to find out. Or it could be something as
> simple as your disk has had an undetected bit error that has flipped
> a bit in the inode allocation btree.
>

Hi Dave,

I am able to reproduce this on a system running Ubuntu 10.04
(2.6.32-28). I took a metadump of the filesystem and moved it to
a system running 10.10 (2.6.35-25), and was able to successfully
dump it there. Likewise it dumps fine on 2.6.38-rc1. So this
suggests an issue with the Ubuntu 10.04 kernel.

Bill

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-07 20:55             ` Bill Kendall
@ 2011-02-07 21:23               ` Dave Chinner
  2011-02-07 21:42                 ` Bill Kendall
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2011-02-07 21:23 UTC
  To: Bill Kendall; +Cc: linux-xfs, Dann Frazier, Michael Lueck

On Mon, Feb 07, 2011 at 02:55:36PM -0600, Bill Kendall wrote:
> On 02/04/2011 02:49 PM, Dave Chinner wrote:
> >On Fri, Feb 04, 2011 at 09:12:53AM -0500, Michael Lueck wrote:
> >>Dave Chinner wrote:
> >>>Ok, so xfsdump is seeing a short bulkstat, then an EINVAL returned
> >>>from the next bulkstat. That's not a race condition, and makes me
> >>>think you have some kind of on-disk corruption.
> >>
> >>Very odd that some kind of on-disk corruption is suddenly causing
> >>xfsdump problems starting with Ubuntu 10.04 (Lucid) kernel
> >>2.6.32-27 and persisting in 2.6.32-28.
> >
> >Not really. The newer kernels have code in them that does more
> >validity checks than previous kernels, so older kernels would have
> >erroneously and silently returned unlinked files to xfsdump and have
> >them backed up. IOWs, you'd never notice such a corruption with
> >xfsdump. On the new kernel, xfsdump gets an EINVAL error to such
> >occurrences, which it should have in the first place.
> >
> >>And there is one other person who confirmed this xfsdump problem
> >>running Lucid with kernel 2.6.32-28. They reported their "me too"
> >>in the Ubuntu bug tracker.
> >>
> >>Could it be that 2.6.32-26 and prior managed to write something to
> >>disk corrupted, and the newer code is tripping on it?
> >
> >That's what I'm trying to find out. Or it could be something as
> >simple as your disk has had an undetected bit error that has flipped
> >a bit in the inode allocation btree.
> >
> 
> Hi Dave,
> 
> I am able to reproduce this on a system running Ubuntu 10.04
> (2.6.32-28). I took a metadump of the filesystem and moved it to
> a system running 10.10 (2.6.35-25), and was able to successfully
> dump it there. Likewise it dumps fine on 2.6.38-rc1. So this
> suggests an issue with the Ubuntu 10.04 kernel.

2.6.35 hasn't had the untrusted inode lookup patches backported to
it, so it's no surprise that it isn't having problems - it's just
like the older 2.6.32 kernels.

Hmmm, can you find out if there is any specific pattern to the inode
numbers that are returning EINVAL? Maybe the inode allocbt freespace
record checks aren't quite correct in the backport (like the
original bogus alignment assumption I made).
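
One way to look for a pattern in the failing inode numbers is to split
each one into its allocation group and AG-relative parts. A Python
sketch, assuming the standard XFS layout in which the low `inopblog`
bits are the inode's offset within its block and the next `agblklog`
bits are the block within the AG (both values can be read from the
superblock with `xfs_db -c "sb 0" -c p <dev>`); `decode_ino` is a
hypothetical helper:

```python
INODES_PER_CHUNK = 64  # XFS allocates inodes in chunks of 64


def decode_ino(ino, agblklog, inopblog):
    """Split an XFS inode number into (agno, agino, agbno, offset, masked_chunk).

    masked_chunk rounds agino down to a multiple of 64. That equals the
    real chunk start only if inode chunks are 64-inode aligned, so it
    should be compared against the inobt record, not trusted.
    """
    agino_log = agblklog + inopblog
    agno = ino >> agino_log                   # allocation group number
    agino = ino & ((1 << agino_log) - 1)      # inode number within the AG
    agbno = agino >> inopblog                 # block within the AG
    offset = agino & ((1 << inopblog) - 1)    # inode slot within that block
    masked_chunk = agino & ~(INODES_PER_CHUNK - 1)
    return agno, agino, agbno, offset, masked_chunk
```

With made-up geometry of agblklog = 22 and inopblog = 4 (not the
geometry of the filesystem in this thread), `decode_ino(80508397, 22, 4)`
gives AG 1, agino 13399533, AG block 837470, offset 13, and a 64-aligned
chunk start of 13399488. If the inobt records covering the failing
inodes start somewhere other than the masked value, code that assumes
64-inode alignment would mishandle them, which may be the kind of
mistake the alignment remark above refers to.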

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-07 21:23               ` Dave Chinner
@ 2011-02-07 21:42                 ` Bill Kendall
  2011-02-07 22:04                   ` Dave Chinner
  0 siblings, 1 reply; 23+ messages in thread
From: Bill Kendall @ 2011-02-07 21:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Dann Frazier, Michael Lueck

On 02/07/2011 03:23 PM, Dave Chinner wrote:
> On Mon, Feb 07, 2011 at 02:55:36PM -0600, Bill Kendall wrote:
>> On 02/04/2011 02:49 PM, Dave Chinner wrote:
>>> On Fri, Feb 04, 2011 at 09:12:53AM -0500, Michael Lueck wrote:
>>>> Dave Chinner wrote:
> >>>>> Ok, so xfsdump is seeing a short bulkstat, then an EINVAL returned
> >>>>> from the next bulkstat. That's not a race condition, and makes me
>>>>> think you have some kind of on-disk corruption.
>>>>
>>>> Very odd that some kind of on-disk corruption is suddenly causing
>>>> xfsdump problems starting with Ubuntu 10.04 (Lucid) kernel
>>>> 2.6.32-27 and persisting in 2.6.32-28.
>>>
>>> Not really. The newer kernels have code in them that does more
>>> validity checks than previous kernels, so older kernels would have
>>> erroneously and silently returned unlinked files to xfsdump and have
>>> them backed up. IOWs, you'd never notice such a corruption with
>>> xfsdump. On the new kernel, xfsdump gets an EINVAL error to such
>>> occurrences, which it should have in the first place.
>>>
>>>> And there is one other person who confirmed this xfsdump problem
>>>> running Lucid with kernel 2.6.32-28. They reported their "me too"
>>>> in the Ubuntu bug tracker.
>>>>
>>>> Could it be that 2.6.32-26 and prior managed to write something to
>>>> disk corrupted, and the newer code is tripping on it?
>>>
>>> That's what I'm trying to find out. Or it could be something as
>>> simple as your disk has had an undetected bit error that has flipped
>>> a bit in the inode allocation btree.
>>>
>>
>> Hi Dave,
>>
>> I am able to reproduce this on a system running Ubuntu 10.4
>> (2.6.32-28). I took a metadump of the filesystem and moved it to
>> a system running 10.10 (2.6.35-25), and was able to successfully
>> dump it there. Likewise it dumps fine on 2.6.38-rc1. So this
>> suggests an issue with the Ubuntu 10.4 kernel.
>
> 2.6.35 hasn't had the untrusted inode lookup patches back ported to
> it, so it's no surprise that it isn't having problems - it's just
> like the older 2.6.32 kernels.

I thought it landed in 2.6.35 and then a regression was fixed in
2.6.36. The untrusted inode lookup changes are referenced here:
http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.35

>
> Hmmm, can you find out if there is any specific pattern to the inode
> numbers that are returning EINVAL? Maybe the inode allocbt freespace
> record checks aren't quite correct in the backport (like the
> original bogus alignment assumption I made).

I'll take a look.

Bill

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-07 21:42                 ` Bill Kendall
@ 2011-02-07 22:04                   ` Dave Chinner
  2011-02-09  1:24                     ` Bill Kendall
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2011-02-07 22:04 UTC (permalink / raw)
  To: Bill Kendall; +Cc: linux-xfs, Dann Frazier, Michael Lueck

On Mon, Feb 07, 2011 at 03:42:28PM -0600, Bill Kendall wrote:
> On 02/07/2011 03:23 PM, Dave Chinner wrote:
> >On Mon, Feb 07, 2011 at 02:55:36PM -0600, Bill Kendall wrote:
> >>On 02/04/2011 02:49 PM, Dave Chinner wrote:
> >>>On Fri, Feb 04, 2011 at 09:12:53AM -0500, Michael Lueck wrote:
> >>>>Dave Chinner wrote:
> >>>>>Ok, so xfsdump is seeing a short bulkstat, then an EINVAL returned
> >>>>>from the next bulkstat. That's not a race condition, and makes me
> >>>>>think you have some kind of on-disk corruption.
> >>>>
> >>>>Very odd that some kind of on-disk corruption is suddenly causing
> >>>>xfsdump problems starting with Ubuntu 10.04 (Lucid) kernel
> >>>>2.6.32-27 and persisting in 2.6.32-28.
> >>>
> >>>Not really. The newer kernels have code in them that does more
> >>>validity checks than previous kernels, so older kernels would have
> >>>erroneously and silently returned unlinked files to xfsdump and have
> >>>them backed up. IOWs, you'd never notice such a corruption with
> >>>xfsdump. On the new kernel, xfsdump gets an EINVAL error to such
> >>>occurrences, which it should have in the first place.
> >>>
> >>>>And there is one other person who confirmed this xfsdump problem
> >>>>running Lucid with kernel 2.6.32-28. They reported their "me too"
> >>>>in the Ubuntu bug tracker.
> >>>>
> >>>>Could it be that 2.6.32-26 and prior managed to write something to
> >>>>disk corrupted, and the newer code is tripping on it?
> >>>
> >>>That's what I'm trying to find out. Or it could be something as
> >>>simple as your disk has had an undetected bit error that has flipped
> >>>a bit in the inode allocation btree.
> >>>
> >>
> >>Hi Dave,
> >>
> >>I am able to reproduce this on a system running Ubuntu 10.4
> >>(2.6.32-28). I took a metadump of the filesystem and moved it to
> >>a system running 10.10 (2.6.35-25), and was able to successfully
> >>dump it there. Likewise it dumps fine on 2.6.38-rc1. So this
> >>suggests an issue with the Ubuntu 10.4 kernel.
> >
> >2.6.35 hasn't had the untrusted inode lookup patches back ported to
> >it, so it's no surprise that it isn't having problems - it's just
> >like the older 2.6.32 kernels.
> 
> I thought it landed in 2.6.35 and then a regression was fixed in
> 2.6.36. The untrusted inode lookup changes are referenced here:
> http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.35

My bad, I just checked the regression fix. I have no idea if it got
back ported to 2.6.35-stable or not - it probably didn't judging by
your results.....

> >Hmmm, can you find out if there is any specific pattern to the inode
> >numbers that are returning EINVAL? Maybe the inode allocbt freespace
> >record checks aren't quite correct in the backport (like the
> >original bogus alignment assumption I made).
> 
> I'll take a look.

Thanks.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-04  0:08       ` Dave Chinner
  2011-02-04 14:12         ` Michael Lueck
@ 2011-02-08 17:39         ` Michael Lueck
  2011-02-08 19:52           ` Dave Chinner
  2011-02-08 17:39         ` Michael Lueck
  2 siblings, 1 reply; 23+ messages in thread
From: Michael Lueck @ 2011-02-08 17:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Dann Frazier

Greetings Dave,

I am just finally getting back to this. Rebooted to the latest 10.04 kernel.

$ uname -a
Linux ldslnx01 2.6.32-28-generic-pae #55-Ubuntu SMP Mon Jan 10 22:34:08 UTC 2011 i686 GNU/Linux

Dave Chinner wrote:
> Can you firstly post the output of:
>
> # xfs_db -c "inode 80508397" -c p<dev>

Command seems grumpy...

$ sudo xfs_db -c "inode 80508397" -c p <dev>
-bash: syntax error near unexpected token `newline'

I even tried unmounting the /srv partition. I am having difficulty finding docs about the usage of xfs_db / xfs_repair.

rrrrrr?????

> And can you also run 'xfs_repair -n<dev>' on the filesystem and
> post the output as well so we can see what state the filesystem is
> in?

This one as well...

/srv$ sudo xfs_repair -n<dev>
-bash: syntax error near unexpected token `newline'

Sincerely,

-- 
Michael Lueck
Lueck Data Systems
http://www.lueckdatasystems.com/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-08 17:39         ` Michael Lueck
@ 2011-02-08 19:52           ` Dave Chinner
  2011-02-08 19:59             ` Michael Lueck
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2011-02-08 19:52 UTC (permalink / raw)
  To: Michael Lueck; +Cc: linux-xfs, Dann Frazier

On Tue, Feb 08, 2011 at 12:39:30PM -0500, Michael Lueck wrote:
> Greetings Dave,
> 
> I am just finally getting back to this. Rebooted to the latest 10.04 kernel.
> 
> $ uname -a
> Linux ldslnx01 2.6.32-28-generic-pae #55-Ubuntu SMP Mon Jan 10 22:34:08 UTC 2011 i686 GNU/Linux
> 
> Dave Chinner wrote:
> >Can you firstly post the output of:
> >
> ># xfs_db -c "inode 80508397" -c p<dev>
> 
> Command seems grumpy...
> 
> $ sudo xfs_db -c "inode 80508397" -c p <dev>
> -bash: syntax error near unexpected token `newline'
> 
> I even tried unmounting the /srv partition. I am having difficulty finding docs about the usage of xfs_db / xfs_repair.

The man pages, perhaps? i.e. 'man xfs_db' and 'man xfs_repair'

> >And can you also run 'xfs_repair -n<dev>' on the filesystem and
> >post the output as well so we can see what state the filesystem is
> >in?
> 
> This one as well...
> 
> /srv$ sudo xfs_repair -n<dev>
> -bash: syntax error near unexpected token `newline'

Substitute /dev/sda, /dev/sdb or whatever your filesystem is on for
"<dev>" in the above commands.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-08 19:52           ` Dave Chinner
@ 2011-02-08 19:59             ` Michael Lueck
  2011-02-08 20:24               ` Michael Lueck
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Lueck @ 2011-02-08 19:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Dann Frazier

Dave Chinner wrote:
> The man pages, perhaps? i.e. 'man xfs_db' and 'man xfs_repair'

I did, and oddly enough I did not find "<dev>" in the example syntax...

> Substitute /dev/sda, /dev/sdb or whatever your filesystem is on for
> "<dev>" in the above commands.

aaaahhh, THAT explains the curve ball. ;-)

I will IPL the server to the latest kernel and run the tests immediately.

Thank you so much for the explanation!

Sincerely,

-- 
Michael Lueck
Lueck Data Systems
http://www.lueckdatasystems.com/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-08 19:59             ` Michael Lueck
@ 2011-02-08 20:24               ` Michael Lueck
  2011-02-08 22:47                 ` Dave Chinner
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Lueck @ 2011-02-08 20:24 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Dann Frazier

[-- Attachment #1: Type: text/plain, Size: 207 bytes --]

Michael Lueck wrote:
> I will IPL the server to the latest kernel and run the tests immediately.

Here they are attached...

Sincerely,

-- 
Michael Lueck
Lueck Data Systems
http://www.lueckdatasystems.com/

[-- Attachment #2: xfs_db-inode_80508397.log --]
[-- Type: text/x-log, Size: 1032 bytes --]

# sudo xfs_db -c "inode 80508397" -c p /dev/sda9 2>&1 | tee xfs_db-inode_80508397.log
core.magic = 0x494e
core.mode = 040777
core.version = 2
core.format = 2 (extents)
core.nlinkv2 = 2
core.onlink = 0
core.projid = 0
core.uid = 1000
core.gid = 1000
core.flushiter = 10
core.atime.sec = Sun Feb  6 16:59:44 2011
core.atime.nsec = 088915516
core.mtime.sec = Tue Jan 11 13:50:51 2011
core.mtime.nsec = 950662167
core.ctime.sec = Tue Jan 11 13:50:51 2011
core.ctime.nsec = 950662167
core.size = 4096
core.nblocks = 1
core.extsize = 0
core.nextents = 1
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 3978985788
next_unlinked = null
u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,5052169,1,0]

[-- Attachment #3: xfs_repair-srv.log --]
[-- Type: text/x-log, Size: 988 bytes --]

# sudo xfs_repair -n /dev/sda9 2>&1 | tee xfs_repair-srv.log
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-08 20:24               ` Michael Lueck
@ 2011-02-08 22:47                 ` Dave Chinner
  2011-02-14  2:52                   ` Michael Lueck
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2011-02-08 22:47 UTC (permalink / raw)
  To: Michael Lueck; +Cc: linux-xfs, Dann Frazier

On Tue, Feb 08, 2011 at 03:24:53PM -0500, Michael Lueck wrote:
> Michael Lueck wrote:
> >I will IPL the server to the latest kernel and run the tests immediately.
> 
> Here they are attached...
>
> # sudo xfs_db -c "inode 80508397" -c p /dev/sda9 2>&1 | tee xfs_db-inode_80508397.log
> core.magic = 0x494e
> core.mode = 040777
> core.version = 2
> core.format = 2 (extents)
> core.nlinkv2 = 2
> core.onlink = 0
> core.projid = 0
> core.uid = 1000
> core.gid = 1000
> core.flushiter = 10
> core.atime.sec = Sun Feb  6 16:59:44 2011
> core.atime.nsec = 088915516
> core.mtime.sec = Tue Jan 11 13:50:51 2011
> core.mtime.nsec = 950662167
> core.ctime.sec = Tue Jan 11 13:50:51 2011
> core.ctime.nsec = 950662167
> core.size = 4096
> core.nblocks = 1
> core.extsize = 0
> core.nextents = 1
> core.naextents = 0
> core.forkoff = 0
> core.aformat = 2 (extents)
> core.dmevmask = 0
> core.dmstate = 0
> core.newrtbm = 0
> core.prealloc = 0
> core.realtime = 0
> core.immutable = 0
> core.append = 0
> core.sync = 0
> core.noatime = 0
> core.nodump = 0
> core.rtinherit = 0
> core.projinherit = 0
> core.nosymlinks = 0
> core.extsz = 0
> core.extszinherit = 0
> core.nodefrag = 0
> core.filestream = 0
> core.gen = 3978985788
> next_unlinked = null
> u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,5052169,1,0]

Nothing wrong with that inode....

> # sudo xfs_repair -n /dev/sda9 2>&1 | tee xfs_repair-srv.log
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.

And no errors detected in the filesystem. Can you run the xfsdump
again to confirm that it fails on the same inode? If it does, then
it definitely seems like an alignment problem in the untrusted inode
lookup patches....
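One way to look for a pattern, or an alignment problem, is to split a failing inode number into its parts. In XFS an inode number places the allocation group number above an AG-relative inode number (`agino`). The `agino` is the AG block number shifted left by `inopblog`, plus the inode's slot in that block.

The C sketch below assumes that layout. The geometry values in the usage note (`agblklog = 24`, `inopblog = 4`) are hypothetical. On a real filesystem they would be read from the superblock, for example with `xfs_db -r -c "sb 0" -c p /dev/sda9`. `chunk_slot_if_aligned()` is also an illustrative helper. `agino % 64` gives the inode's position in its chunk only if chunks start on 64-inode boundaries, and that is not guaranteed on every filesystem.

```c
#include <assert.h>
#include <stdint.h>

/* Parts of an XFS inode number, per the layout assumed above. */
struct ino_parts {
    uint64_t agno;    /* allocation group number */
    uint64_t agino;   /* inode number relative to its AG */
    uint64_t agbno;   /* AG-relative block that holds the inode */
    uint64_t offset;  /* slot of the inode within that block */
};

/* Split ino using the superblock's agblklog and inopblog values. */
struct ino_parts decode_ino(uint64_t ino, unsigned agblklog, unsigned inopblog)
{
    unsigned agino_log = agblklog + inopblog;
    struct ino_parts p;

    assert(agino_log < 64 && inopblog < agino_log);
    p.agno = ino >> agino_log;
    p.agino = ino & ((UINT64_C(1) << agino_log) - 1);
    p.agbno = p.agino >> inopblog;
    p.offset = p.agino & ((UINT64_C(1) << inopblog) - 1);
    return p;
}

/*
 * Illustrative helper: position within a 64-inode chunk, meaningful only
 * when inode chunks are aligned to 64-inode boundaries.
 */
unsigned chunk_slot_if_aligned(uint64_t agino)
{
    return (unsigned)(agino % 64);
}
```

With those assumed values, `decode_ino(80508397, 24, 4)` gives AG 0, agino 80508397, AG block 5031774 and slot 13, and `chunk_slot_if_aligned()` returns 45. These figures only illustrate the arithmetic. They say nothing about the real layout of /dev/sda9.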

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-07 22:04                   ` Dave Chinner
@ 2011-02-09  1:24                     ` Bill Kendall
  0 siblings, 0 replies; 23+ messages in thread
From: Bill Kendall @ 2011-02-09  1:24 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Dann Frazier, Michael Lueck

On 02/07/2011 04:04 PM, Dave Chinner wrote:
> On Mon, Feb 07, 2011 at 03:42:28PM -0600, Bill Kendall wrote:
>> On 02/07/2011 03:23 PM, Dave Chinner wrote:
>>> On Mon, Feb 07, 2011 at 02:55:36PM -0600, Bill Kendall wrote:
>>>> On 02/04/2011 02:49 PM, Dave Chinner wrote:
>>>>> On Fri, Feb 04, 2011 at 09:12:53AM -0500, Michael Lueck wrote:
>>>>>> Dave Chinner wrote:
>>>>>>> Ok, so xfsdump is seeing a short bulkstat, then an EINVAL returned
>>>>>>> from the next bulkstat. That's not a race condition, and makes me
>>>>>>> think you have some kind of on-disk corruption.
>>>>>>
>>>>>> Very odd that some kind of on-disk corruption is suddenly causing
>>>>>> xfsdump problems starting with Ubuntu 10.04 (Lucid) kernel
>>>>>> 2.6.32-27 and persisting in 2.6.32-28.
>>>>>
>>>>> Not really. The newer kernels have code in them that does more
>>>>> validity checks than previous kernels, so older kernels would have
>>>>> erroneously and silently returned unlinked files to xfsdump and have
>>>>> them backed up. IOWs, you'd never notice such a corruption with
>>>>> xfsdump. On the new kernel, xfsdump gets an EINVAL error to such
>>>>> occurrences, which it should have in the first place.
>>>>>
>>>>>> And there is one other person who confirmed this xfsdump problem
>>>>>> running Lucid with kernel 2.6.32-28. They reported their "me too"
>>>>>> in the Ubuntu bug tracker.
>>>>>>
>>>>>> Could it be that 2.6.32-26 and prior managed to write something to
>>>>>> disk corrupted, and the newer code is tripping on it?
>>>>>
>>>>> That's what I'm trying to find out. Or it could be something as
>>>>> simple as your disk has had an undetected bit error that has flipped
>>>>> a bit in the inode allocation btree.
>>>>>
>>>>
>>>> Hi Dave,
>>>>
>>>> I am able to reproduce this on a system running Ubuntu 10.4
>>>> (2.6.32-28). I took a metadump of the filesystem and moved it to
>>>> a system running 10.10 (2.6.35-25), and was able to successfully
>>>> dump it there. Likewise it dumps fine on 2.6.38-rc1. So this
>>>> suggests an issue with the Ubuntu 10.4 kernel.
>>>
>>> 2.6.35 hasn't had the untrusted inode lookup patches back ported to
>>> it, so it's no surprise that it isn't having problems - it's just
>>> like the older 2.6.32 kernels.
>>
>> I thought it landed in 2.6.35 and then a regression was fixed in
>> 2.6.36. The untrusted inode lookup changes are referenced here:
>> http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.35
>
> My bad, I just checked the regression fix. I have no idea if it got
> back ported to 2.6.35-stable or not - it probably didn't judging by
> your results.....
>
>>> Hmmm, can you find out if there is any specific pattern to the inode
>>> numbers that are returning EINVAL? Maybe the inode allocbt freespace
>>> record checks aren't quite correct in the backport (like the
>>> original bogus alignment assumption I made).
>>
>> I'll take a look.

The failing bulkstats, at least the ones I've checked so far, are 
hitting this path in xfs_bulkstat():

/*
 * Skip if this inode is free.
 */
if (XFS_INOBT_MASK(chunkidx) & irbp->ir_free) {
	lastino = ino;
	continue;
}

The backport of the 4 untrusted inode lookup commits looks okay to
me; however, I think they depend on commit
7dce11dbac54fce777eea0f5fb25b2694ccd7900 (xfs: always use iget in
bulkstat), which was checked in shortly before the untrusted
inode lookup changes. When that commit is added to the Ubuntu
2.6.32-28 kernel, xfsdump runs fine on the 2 filesystems of mine
that were exhibiting the problem.
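The free-inode check quoted above tests one bit of the inobt record's 64-bit free mask (`ir_free`) for each inode in the chunk. Below is a minimal C model of that loop under stated assumptions. `walk_chunk()` and its parameters are hypothetical, and the buffer-full handling is simplified. It does not reproduce the 2.6.32 code or its interaction with commit 7dce11db. It shows only that, on this path, the cursor (`lastino`) moves onto free inode numbers even though nothing is returned for them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64

/* Assumed to mirror the kernel's XFS_INOBT_MASK(i): bit i of ir_free. */
#define INOBT_MASK(i) ((uint64_t)1 << (i))

/*
 * Hypothetical model of scanning one inobt record. Allocated inode
 * numbers are copied to out[] (at most outlen of them); free inodes
 * are skipped but still move the *lastino cursor, as in the quoted
 * xfs_bulkstat() fragment. Returns the number of inodes copied.
 */
size_t walk_chunk(uint64_t startino, uint64_t ir_free,
                  uint64_t *out, size_t outlen, uint64_t *lastino)
{
    size_t n = 0;

    assert(lastino != NULL && (out != NULL || outlen == 0));
    for (unsigned chunkidx = 0; chunkidx < INODES_PER_CHUNK; chunkidx++) {
        uint64_t ino = startino + chunkidx;

        /* Skip if this inode is free. */
        if (INOBT_MASK(chunkidx) & ir_free) {
            *lastino = ino;
            continue;
        }
        if (n == outlen)
            break;  /* caller's buffer is full */
        out[n++] = ino;
        *lastino = ino;
    }
    return n;
}
```

Take a chunk starting at inode 128 where only slots 0 and 2 are allocated. The model returns inodes 128 and 130 and leaves `lastino` at 191, the last (free) inode in the chunk.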

Bill

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-08 22:47                 ` Dave Chinner
@ 2011-02-14  2:52                   ` Michael Lueck
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Lueck @ 2011-02-14  2:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Dann Frazier

Greetings Dave,

Dave Chinner wrote:
> And no errors detected in the filesystem. Can you run the xfsdump
> again to confirm that it fails on the same inode? If it does, then
> it definitely seems like an alignment problem in the untrusted inode
> lookup patches....

I got around to doing the recreate test just now. A larger trace file this time, but it still crashed. I rebooted into the -26 kernel and am pulling a backup with the "old faithful" kernel.

Please find the trace output here in zipped format:

http://www.lueckdatasystems.com/pub/ldsbackup.trace2.log.zip

I suppose this means that the "untrusted inode lookup patches" are further confirmed as missing? If so, any idea on how long it will take to receive an updated kernel with the xfs code corrections?

Sincerely,

-- 
Michael Lueck
Lueck Data Systems
http://www.lueckdatasystems.com/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26?
  2011-02-02 13:30 xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26? Michael Lueck
  2011-02-02 18:32 ` Bill Kendall
@ 2011-04-22 12:34 ` Michael Lueck
  1 sibling, 0 replies; 23+ messages in thread
From: Michael Lueck @ 2011-04-22 12:34 UTC (permalink / raw)
  To: linux-xfs

Greetings,

Yesterday I received, via the production Ubuntu 10.04 Lucid updates, the new kernel that addresses this issue.

Thank you so much Bill, Dave, and Dann for your assistance in getting this diagnosed and resolved so quickly! I really appreciate it!

Sincerely,

-- 
Michael Lueck
Lueck Data Systems
http://www.lueckdatasystems.com/

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2011-04-22 12:31 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
2011-02-02 13:30 xfsdump SGI_FS_BULKSTAT errno = 22, how could this IRIX bug get into Ubuntu 10.04 Lucid between kernels 2.6.32-27 and 2.6.32-26? Michael Lueck
2011-02-02 18:32 ` Bill Kendall
2011-02-02 19:03   ` Michael Lueck
2011-02-03  4:58   ` Dave Chinner
2011-02-03 14:43     ` Michael Lueck
2011-02-04  0:08       ` Dave Chinner
2011-02-04 14:12         ` Michael Lueck
2011-02-04 20:49           ` Dave Chinner
2011-02-07 20:55             ` Bill Kendall
2011-02-07 21:23               ` Dave Chinner
2011-02-07 21:42                 ` Bill Kendall
2011-02-07 22:04                   ` Dave Chinner
2011-02-09  1:24                     ` Bill Kendall
2011-02-08 17:39         ` Michael Lueck
2011-02-08 19:52           ` Dave Chinner
2011-02-08 19:59             ` Michael Lueck
2011-02-08 20:24               ` Michael Lueck
2011-02-08 22:47                 ` Dave Chinner
2011-02-14  2:52                   ` Michael Lueck
2011-02-08 17:39         ` Michael Lueck
2011-02-03 14:51     ` Michael Lueck
2011-02-04 14:52     ` dann frazier
2011-04-22 12:34 ` Michael Lueck
