* XFS corruptions
@ 2015-11-30 16:51 Sandeep Patel
  2015-11-30 17:40 ` Emmanuel Florac
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Sandeep Patel @ 2015-11-30 16:51 UTC (permalink / raw)
  To: xfs


Hi,

We have multiple 22-disk RAID 6 arrays on LSI 9280-24i4e RAID cards, all using enterprise-grade drives. The arrays are set up with the physical drive cache disabled and mounted with the inode64 and nobarrier options on Oracle Linux Server release 6.6, kernel version 3.8.13-55.1.5.el6uek.x86_64. We are suffering from corruptions that xfs_repair is unable to permanently fix. Output of dmesg from the latest corruption:

Pid: 5319, comm: glusterfsd Not tainted 3.8.13-55.1.5.el6uek.x86_64 #2
Call Trace:
[<ffffffffa02a3a9f>] xfs_error_report+0x3f/0x50 [xfs]
[<ffffffffa02ed416>] ? xfs_iread_extents+0x86/0x110 [xfs]
[<ffffffffa02a3b0e>] xfs_corruption_error+0x5e/0x90 [xfs]
[<ffffffffa02c7ac8>] xfs_bmap_read_extents+0x3f8/0x450 [xfs]
[<ffffffffa02ed416>] ? xfs_iread_extents+0x86/0x110 [xfs]
[<ffffffffa02eb94e>] ? xfs_iext_realloc_direct+0xae/0x160 [xfs]
[<ffffffffa02ed416>] xfs_iread_extents+0x86/0x110 [xfs]
[<ffffffffa02fb0cc>] ? xfs_trans_free_item_desc+0x3c/0x50 [xfs]
[<ffffffffa02c5df5>] xfs_bmap_last_extent+0x95/0xb0 [xfs]
[<ffffffffa02c5ef6>] xfs_bmap_last_offset+0x66/0xc0 [xfs]
[<ffffffffa02dc123>] ? xfs_dir_lookup+0xc3/0x170 [xfs]
[<ffffffffa02dbc96>] xfs_dir2_isblock+0x26/0x60 [xfs]
[<ffffffffa02dc123>] xfs_dir_lookup+0xc3/0x170 [xfs]
[<ffffffffa02b6497>] xfs_lookup+0x87/0x110 [xfs]
[<ffffffff811a92ed>] ? __d_alloc+0x14d/0x180
[<ffffffffa02aea84>] xfs_vn_lookup+0x54/0xa0 [xfs]
[<ffffffff8119db43>] ? lookup_dcache+0xa3/0xd0
[<ffffffff8119d3ad>] lookup_real+0x1d/0x60
[<ffffffff8119dba8>] __lookup_hash+0x38/0x50
[<ffffffff8119dc0e>] lookup_slow+0x4e/0xc0
[<ffffffff811a1d41>] path_lookupat+0x201/0x780
[<ffffffff811a22f4>] filename_lookup+0x34/0xc0
[<ffffffff811a35b9>] user_path_at_empty+0x59/0xa0
[<ffffffff811a3611>] ? user_path_at+0x11/0x20
[<ffffffff811986c0>] ? vfs_fstatat+0x50/0xb0
[<ffffffff811a3611>] user_path_at+0x11/0x20
[<ffffffff811a3698>] sys_linkat+0x78/0x250
[<ffffffff810e37e6>] ? __audit_syscall_exit+0x216/0x2c0
[<ffffffff815a25d9>] system_call_fastpath+0x16/0x1b
XFS (sdb): Corruption detected. Unmount and run xfs_repair
XFS (sdb): corrupt dinode 6442451040, (btree extents).
ffff88042615d000: 42 4d 41 50 00 00 00 fe ff ff ff ff ff ff ff ff BMAP............
XFS (sdb): Internal error xfs_bmap_read_extents(1) at line 1610 of file fs/xfs/xfs_bmap.c. Caller 0xffffffffa02ed416

What could be causing these issues?

Thanks for your help in advance.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: XFS corruptions
  2015-11-30 16:51 XFS corruptions Sandeep Patel
@ 2015-11-30 17:40 ` Emmanuel Florac
  2015-11-30 18:06   ` Sandeep Patel
  2015-11-30 21:51 ` Dave Chinner
  2015-12-10 16:46 ` Eric Sandeen
  2 siblings, 1 reply; 19+ messages in thread
From: Emmanuel Florac @ 2015-11-30 17:40 UTC (permalink / raw)
  To: Sandeep Patel; +Cc: xfs

On Mon, 30 Nov 2015 16:51:57 +0000,
Sandeep Patel <spatel@omnifone.com> wrote:

> 
> What could be causing these issues?
> 
> Thanks for your help in advance.

Hard to say; what does xfs_info /mountpoint say?

Are the corruptions occurring on all machines, or do they hit some
systems more than others?

Did you try repairing using the latest (4.2 or 4.3) version of
xfs_repair? The Oracle Linux version is probably seriously out of date.
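
If you do, note that xfs_repair -n runs in no-modify mode: it only
reports what it would change, without touching the disk. A rough,
untested sketch (substitute your actual mountpoint and device):

  # umount /mountpoint
  # xfs_repair -n /dev/sdX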


-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: XFS corruptions
  2015-11-30 17:40 ` Emmanuel Florac
@ 2015-11-30 18:06   ` Sandeep Patel
  2015-11-30 18:45     ` Emmanuel Florac
  0 siblings, 1 reply; 19+ messages in thread
From: Sandeep Patel @ 2015-11-30 18:06 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

Hi Emmanuel,

Thanks for the response.

 [root@gc003b ~]# xfs_info /dev/sdb
meta-data=/dev/sdb               isize=512    agcount=52, agsize=268435455 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0
data     =                       bsize=4096   blocks=13916176384, imaxpct=1
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

We have 18 nodes each with 2 of these arrays and we are seeing this across the board.

I have updated xfsprogs to 3.1.11-1.0.6.el6.x86_64, which is the latest version available in our yum repo.

This is a production system so upgrading to the latest version is not very easy. 

Thanks
Sandeep

-----Original Message-----
From: Emmanuel Florac [mailto:eflorac@intellique.com] 
Sent: 30 November 2015 17:41
To: Sandeep Patel
Cc: xfs@oss.sgi.com
Subject: Re: XFS corruptions

On Mon, 30 Nov 2015 16:51:57 +0000,
Sandeep Patel <spatel@omnifone.com> wrote:

> 
> What could be causing these issues?
> 
> Thanks for your help in advance.

Hard to say; what does xfs_info /mountpoint say?

Are the corruptions occurring on all machines, or do they hit some systems more than others?

Did you try repairing using the latest (4.2 or 4.3) version of xfs_repair? The Oracle Linux version is probably seriously out of date.


--
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: XFS corruptions
  2015-11-30 18:06   ` Sandeep Patel
@ 2015-11-30 18:45     ` Emmanuel Florac
  2015-12-01 12:18       ` Sandeep Patel
  0 siblings, 1 reply; 19+ messages in thread
From: Emmanuel Florac @ 2015-11-30 18:45 UTC (permalink / raw)
  To: Sandeep Patel; +Cc: xfs

On Mon, 30 Nov 2015 18:06:55 +0000,
Sandeep Patel <spatel@omnifone.com> wrote:

> Hi Emmanuel,
> 
> Thanks for the response.
> 
>  [root@gc003b ~]# xfs_info /dev/sdb
> meta-data=/dev/sdb               isize=512    agcount=52, agsize=268435455 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=0
> data     =                       bsize=4096   blocks=13916176384, imaxpct=1
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0

Looks like plain defaults... you didn't apply any customization, did you?

> 
> We have 18 nodes each with 2 of these arrays and we are seeing this
> across the board.

Hmm, strange. Are you running the latest RAID firmware on the
controllers?

Does this happen more often when the array is rebuilding, or verifying,
or when the system is under heavy IO? Or does it happen just completely
at random?
 
> I have updated the xfsprogs to 3.1.11-1.0.6.el6.x86_64 which is the
> latest version available on our yum repo.

If you're not afraid of running binaries from an unknown source, here's a
4.2.0 version I've built recently:
http://update.intellique.com/pub/xfs_repair-4.2.0.gz
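
Roughly, to use it (untested sketch; substitute your actual device):

  # gunzip xfs_repair-4.2.0.gz
  # chmod +x xfs_repair-4.2.0
  # ./xfs_repair-4.2.0 -n /dev/sdX    (-n = no-modify, report only)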
 

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: XFS corruptions
  2015-11-30 16:51 XFS corruptions Sandeep Patel
  2015-11-30 17:40 ` Emmanuel Florac
@ 2015-11-30 21:51 ` Dave Chinner
  2015-11-30 22:04   ` Darrick J. Wong
  2015-12-10 16:46 ` Eric Sandeen
  2 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2015-11-30 21:51 UTC (permalink / raw)
  To: Sandeep Patel; +Cc: xfs

On Mon, Nov 30, 2015 at 04:51:57PM +0000, Sandeep Patel wrote:
> Hi,
> 
> We have multiple 22-disk RAID 6 arrays on LSI 9280-24i4e RAID
> cards, all using enterprise-grade drives. The arrays are set up with
> the physical drive cache disabled and mounted with the inode64 and
> nobarrier options on Oracle Linux Server release 6.6, kernel
> version 3.8.13-55.1.5.el6uek.x86_64. We are suffering from
> corruptions that xfs_repair is unable to permanently fix.
> Output of dmesg from the latest corruption:
> 
> Pid: 5319, comm: glusterfsd Not tainted 3.8.13-55.1.5.el6uek.x86_64 #2

FYI, we cannot really support vendor enterprise kernels on the upstream
lists because the code base is so different from vanilla/upstream
kernels. I say exactly the same thing for bugs reported on RHEL/CentOS
kernels, and for SLES kernels - only the vendor can properly support
their own franken-kernels...

Hence I'd suggest that you report the problem to your Oracle support
contact so they can walk you through the process of finding the
problem....

[ Darrick is really going to thank me for saying this. ]

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: XFS corruptions
  2015-11-30 21:51 ` Dave Chinner
@ 2015-11-30 22:04   ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2015-11-30 22:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Sandeep Patel, xfs

On Tue, Dec 01, 2015 at 08:51:12AM +1100, Dave Chinner wrote:
> On Mon, Nov 30, 2015 at 04:51:57PM +0000, Sandeep Patel wrote:
> > Hi,
> > 
> > We have multiple 22-disk RAID 6 arrays on LSI 9280-24i4e RAID
> > cards, all using enterprise-grade drives. The arrays are set up with
> > the physical drive cache disabled and mounted with the inode64 and
> > nobarrier options on Oracle Linux Server release 6.6, kernel
> > version 3.8.13-55.1.5.el6uek.x86_64. We are suffering from
> > corruptions that xfs_repair is unable to permanently fix.
> > Output of dmesg from the latest corruption:
> > 
> > Pid: 5319, comm: glusterfsd Not tainted 3.8.13-55.1.5.el6uek.x86_64 #2
> 
> FYI, we cannot really support vendor enterprise kernels on the upstream
> lists because the code base is so different from vanilla/upstream
> kernels. I say exactly the same thing for bugs reported on RHEL/CentOS
> kernels, and for SLES kernels - only the vendor can properly support
> their own franken-kernels...
>
> Hence I'd suggest that you report the problem to your Oracle support
> contact so they can walk you through the process of finding the
> problem....
> 
> [ Darrick is really going to thank me for saying this. ]

Probably what I'd have written anyway. :)

Oracle support might just tell you to upgrade the kernel to whatever the
latest is.  3.8.13-55 is pretty far back, AFAIK.

(I'm not a support engineer, nor do I play one on TV.)

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: XFS corruptions
  2015-11-30 18:45     ` Emmanuel Florac
@ 2015-12-01 12:18       ` Sandeep Patel
  2015-12-01 13:50         ` Emmanuel Florac
  0 siblings, 1 reply; 19+ messages in thread
From: Sandeep Patel @ 2015-12-01 12:18 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

Yes, the only thing we changed from the defaults is the isize.

We are running the latest version of the firmware. We previously upgraded the firmware on these thinking it would solve the problem.

The problems seem to occur at random. The load on these units is quite minimal at the moment, and we are still having issues.

Thanks
Sandeep

-----Original Message-----
From: Emmanuel Florac [mailto:eflorac@intellique.com] 
Sent: 30 November 2015 18:46
To: Sandeep Patel
Cc: xfs@oss.sgi.com
Subject: Re: XFS corruptions

On Mon, 30 Nov 2015 18:06:55 +0000,
Sandeep Patel <spatel@omnifone.com> wrote:

> Hi Emmanuel,
> 
> Thanks for the response.
> 
>  [root@gc003b ~]# xfs_info /dev/sdb
> meta-data=/dev/sdb               isize=512    agcount=52, agsize=268435455 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=0
> data     =                       bsize=4096   blocks=13916176384, imaxpct=1
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0

Looks like plain defaults... you didn't apply any customization, did you?

> 
> We have 18 nodes each with 2 of these arrays and we are seeing this 
> across the board.

Hmm, strange. Are you running the latest RAID firmware on the controllers?

Does this happen more often when the array is rebuilding, or verifying, or when the system is under heavy IO? Or does it happen just completely at random?
 
> I have updated the xfsprogs to 3.1.11-1.0.6.el6.x86_64 which is the 
> latest version available on our yum repo.

If you're not afraid of running binaries from an unknown source, here's a
4.2.0 version I've built recently:
http://update.intellique.com/pub/xfs_repair-4.2.0.gz
 

--
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: XFS corruptions
  2015-12-01 12:18       ` Sandeep Patel
@ 2015-12-01 13:50         ` Emmanuel Florac
  2015-12-01 14:01           ` Sandeep Patel
  2015-12-10 15:07           ` Sandeep Patel
  0 siblings, 2 replies; 19+ messages in thread
From: Emmanuel Florac @ 2015-12-01 13:50 UTC (permalink / raw)
  To: Sandeep Patel; +Cc: xfs

On Tue, 1 Dec 2015 12:18:21 +0000,
Sandeep Patel <spatel@omnifone.com> wrote:

> Yes, the only thing we changed from the defaults is the isize.
> 
> We are running the latest version of the firmware. We previously
> upgraded the firmware on these thinking it would solve the problem.
> 
> The problems seem to occur at random. The load on these units is
> quite minimal at the moment, and we are still having issues.
> 

I just received a new server with an LSI controller similar to yours;
I'll run some tests, but I suspect the kernel may be at fault. What is
your firmware version, exactly?



-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: XFS corruptions
  2015-12-01 13:50         ` Emmanuel Florac
@ 2015-12-01 14:01           ` Sandeep Patel
  2015-12-10 15:07           ` Sandeep Patel
  1 sibling, 0 replies; 19+ messages in thread
From: Sandeep Patel @ 2015-12-01 14:01 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

The firmware for the raid card is 12.15.0-0205

Thanks for this. Let me know if you need any more details.

Sandeep

-----Original Message-----
From: Emmanuel Florac [mailto:eflorac@intellique.com] 
Sent: 01 December 2015 13:50
To: Sandeep Patel
Cc: xfs@oss.sgi.com
Subject: Re: XFS corruptions

On Tue, 1 Dec 2015 12:18:21 +0000,
Sandeep Patel <spatel@omnifone.com> wrote:

> Yes, the only thing we changed from the defaults is the isize.
> 
> We are running the latest version of the firmware. We previously
> upgraded the firmware on these thinking it would solve the problem.
> 
> The problems seem to occur at random. The load on these units is
> quite minimal at the moment, and we are still having issues.
> 

I just received a new server with an LSI controller similar to yours; I'll run some tests, but I suspect the kernel may be at fault. What is your firmware version, exactly?



--
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: XFS corruptions
  2015-12-01 13:50         ` Emmanuel Florac
  2015-12-01 14:01           ` Sandeep Patel
@ 2015-12-10 15:07           ` Sandeep Patel
  2015-12-10 15:46             ` Emmanuel Florac
  1 sibling, 1 reply; 19+ messages in thread
From: Sandeep Patel @ 2015-12-10 15:07 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

Hi Emmanuel,

Any luck with the testing? I am attaching a copy of my dmesg output, which keeps repeating. In this case, the error seems to be with one dinode.

Pid: 5321, comm: glusterfsd Not tainted 3.8.13-55.1.5.el6uek.x86_64 #2
Call Trace:
 [<ffffffffa02a3a9f>] xfs_error_report+0x3f/0x50 [xfs]
 [<ffffffffa02ed416>] ? xfs_iread_extents+0x86/0x110 [xfs]
 [<ffffffffa02a3b0e>] xfs_corruption_error+0x5e/0x90 [xfs]
 [<ffffffffa02c7ac8>] xfs_bmap_read_extents+0x3f8/0x450 [xfs]
 [<ffffffffa02ed416>] ? xfs_iread_extents+0x86/0x110 [xfs]
 [<ffffffffa02eb94e>] ? xfs_iext_realloc_direct+0xae/0x160 [xfs]
 [<ffffffffa02ed416>] xfs_iread_extents+0x86/0x110 [xfs]
 [<ffffffffa02fb0cc>] ? xfs_trans_free_item_desc+0x3c/0x50 [xfs]
 [<ffffffffa02c5df5>] xfs_bmap_last_extent+0x95/0xb0 [xfs]
 [<ffffffffa02c5ef6>] xfs_bmap_last_offset+0x66/0xc0 [xfs]
 [<ffffffffa02dc123>] ? xfs_dir_lookup+0xc3/0x170 [xfs]
 [<ffffffffa02dbc96>] xfs_dir2_isblock+0x26/0x60 [xfs]
 [<ffffffffa02dc123>] xfs_dir_lookup+0xc3/0x170 [xfs]
 [<ffffffffa02b6497>] xfs_lookup+0x87/0x110 [xfs]
 [<ffffffff811a92ed>] ? __d_alloc+0x14d/0x180
 [<ffffffffa02aea84>] xfs_vn_lookup+0x54/0xa0 [xfs]
 [<ffffffff8119db43>] ? lookup_dcache+0xa3/0xd0
 [<ffffffff8119d3ad>] lookup_real+0x1d/0x60
 [<ffffffff8119dba8>] __lookup_hash+0x38/0x50
 [<ffffffff8119dc0e>] lookup_slow+0x4e/0xc0
 [<ffffffff811a1d41>] path_lookupat+0x201/0x780
 [<ffffffff811a22f4>] filename_lookup+0x34/0xc0
 [<ffffffff811a35b9>] user_path_at_empty+0x59/0xa0
 [<ffffffff811a3611>] ? user_path_at+0x11/0x20
 [<ffffffff811986c0>] ? vfs_fstatat+0x50/0xb0
 [<ffffffff811a3611>] user_path_at+0x11/0x20
 [<ffffffff811a3698>] sys_linkat+0x78/0x250
 [<ffffffff810e37e6>] ? __audit_syscall_exit+0x216/0x2c0
 [<ffffffff815a25d9>] system_call_fastpath+0x16/0x1b
XFS (sdb): Corruption detected. Unmount and run xfs_repair
XFS (sdb): corrupt dinode 6442451040, (btree extents).
ffff880425182000: 42 4d 41 50 00 00 00 fe ff ff ff ff ff ff ff ff  BMAP............
XFS (sdb): Internal error xfs_bmap_read_extents(1) at line 1610 of file fs/xfs/xfs_bmap.c.  Caller 0xffffffffa02ed416

Let me know if you progress this any further. On another note, we are looking to upgrade the firmware of our LSI RAID cards to the latest version. It seems LSI released new firmware not too long ago.
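
In case it helps compare notes: the firmware actually running can
usually be read with something like the following (the MegaCli binary
name and path vary by install):

  # MegaCli64 -AdpAllInfo -aALL | grep -i 'FW Package'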

Thanks
Sandeep

-----Original Message-----
From: Sandeep Patel 
Sent: 01 December 2015 14:00
To: 'Emmanuel Florac'
Cc: xfs@oss.sgi.com
Subject: RE: XFS corruptions

The firmware for the raid card is 12.15.0-0205

Thanks for this. Let me know if you need any more details.

Sandeep

-----Original Message-----
From: Emmanuel Florac [mailto:eflorac@intellique.com]
Sent: 01 December 2015 13:50
To: Sandeep Patel
Cc: xfs@oss.sgi.com
Subject: Re: XFS corruptions

On Tue, 1 Dec 2015 12:18:21 +0000,
Sandeep Patel <spatel@omnifone.com> wrote:

> Yes, the only thing we changed from the defaults is the isize.
> 
> We are running the latest version of the firmware. We previously
> upgraded the firmware on these thinking it would solve the problem.
> 
> The problems seem to occur at random. The load on these units is
> quite minimal at the moment, and we are still having issues.
> 

I just received a new server with an LSI controller similar to yours; I'll run some tests, but I suspect the kernel may be at fault. What is your firmware version, exactly?



--
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: XFS corruptions
  2015-12-10 15:07           ` Sandeep Patel
@ 2015-12-10 15:46             ` Emmanuel Florac
  2015-12-10 19:51               ` Sandeep Patel
  2016-01-11 14:20               ` Sandeep Patel
  0 siblings, 2 replies; 19+ messages in thread
From: Emmanuel Florac @ 2015-12-10 15:46 UTC (permalink / raw)
  To: Sandeep Patel; +Cc: xfs

On Thu, 10 Dec 2015 15:07:32 +0000,
Sandeep Patel <spatel@omnifone.com> wrote:

> Any luck with the testing? I am attaching a copy of my dmesg output,
> which keeps repeating. In this case, the error seems to be with
> one dinode.
> 

I've run some tests on an 18x6 TB array, and no error occurred. However,
I'm running a much newer kernel, 3.18.24.

Could you maybe try installing a different kernel on some machines and
see if it performs differently?

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: XFS corruptions
  2015-11-30 16:51 XFS corruptions Sandeep Patel
  2015-11-30 17:40 ` Emmanuel Florac
  2015-11-30 21:51 ` Dave Chinner
@ 2015-12-10 16:46 ` Eric Sandeen
  2 siblings, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2015-12-10 16:46 UTC (permalink / raw)
  To: xfs

On 11/30/15 10:51 AM, Sandeep Patel wrote:
> Hi,

> We have multiple 22-disk RAID 6 arrays on LSI 9280-24i4e RAID
> cards, all using enterprise-grade drives. The arrays are set up with
> the physical drive cache disabled and mounted with the inode64 and
> nobarrier options on Oracle Linux Server release 6.6, kernel version
> 3.8.13-55.1.5.el6uek.x86_64. We are suffering from corruptions that
> xfs_repair is unable to permanently fix.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Does that mean it finds and fixes something "temporarily?"  If so, what?

-Eric


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: XFS corruptions
  2015-12-10 15:46             ` Emmanuel Florac
@ 2015-12-10 19:51               ` Sandeep Patel
  2016-01-11 14:20               ` Sandeep Patel
  1 sibling, 0 replies; 19+ messages in thread
From: Sandeep Patel @ 2015-12-10 19:51 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

Hi Emmanuel,

Some extra information about the volumes, which may be related. We do not partition or align the XFS-formatted volume; we mount it directly using /dev/sda. The fstab entries are below.

LABEL=brick1            /export/brick1          xfs     nobarrier,inode64 1 2
LABEL=brick2            /export/brick2          xfs     nobarrier,inode64 1 2
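
The options actually in effect can be confirmed from /proc/mounts, e.g.:

  # grep brick /proc/mounts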

Thanks
Sandeep

-----Original Message-----
From: Emmanuel Florac [mailto:eflorac@intellique.com] 
Sent: 10 December 2015 15:47
To: Sandeep Patel
Cc: xfs@oss.sgi.com
Subject: Re: XFS corruptions

On Thu, 10 Dec 2015 15:07:32 +0000,
Sandeep Patel <spatel@omnifone.com> wrote:

> Any luck with the testing? I am attaching a copy of my dmesg output,
> which keeps repeating. In this case, the error seems to be with one
> dinode.
> 

I've run some tests on an 18x6 TB array, and no error occurred. However, I'm running a much newer kernel, 3.18.24.

Could you maybe try installing a different kernel on some machines and see if it performs differently?

--
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: XFS corruptions
  2015-12-10 15:46             ` Emmanuel Florac
  2015-12-10 19:51               ` Sandeep Patel
@ 2016-01-11 14:20               ` Sandeep Patel
  2016-01-11 15:16                 ` Emmanuel Florac
  1 sibling, 1 reply; 19+ messages in thread
From: Sandeep Patel @ 2016-01-11 14:20 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

Hi Emmanuel,

I have managed to upgrade our kernel to 3.8.13-118.2.2.el6uek.x86_64, which was the latest available in our repo. I will let you know how I get on.

Thanks
Sandeep

-----Original Message-----
From: Emmanuel Florac [mailto:eflorac@intellique.com] 
Sent: 10 December 2015 15:47
To: Sandeep Patel
Cc: xfs@oss.sgi.com
Subject: Re: XFS corruptions

On Thu, 10 Dec 2015 15:07:32 +0000,
Sandeep Patel <spatel@omnifone.com> wrote:

> Any luck with the testing? I am attaching a copy of my dmesg output,
> which keeps repeating. In this case, the error seems to be with one
> dinode.
> 

I've run some tests on an 18x6 TB array, and no error occurred. However, I'm running a much newer kernel, 3.18.24.

Could you maybe try installing a different kernel on some machines and see if it performs differently?

--
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: XFS corruptions
  2016-01-11 14:20               ` Sandeep Patel
@ 2016-01-11 15:16                 ` Emmanuel Florac
  2016-03-16 13:15                   ` Sandeep Patel
  0 siblings, 1 reply; 19+ messages in thread
From: Emmanuel Florac @ 2016-01-11 15:16 UTC (permalink / raw)
  To: Sandeep Patel; +Cc: xfs

On Mon, 11 Jan 2016 14:20:12 +0000,
Sandeep Patel <spatel@omnifone.com> wrote:

> Hi Emmanuel,
> 
> I have managed to upgrade our kernel to 3.8.13-118.2.2.el6uek.x86_64,
> which was the latest available in our repo. I will let you know how I
> get on.
> 

OK. I'm pretty confident it should improve the situation :)

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: XFS corruptions
  2016-01-11 15:16                 ` Emmanuel Florac
@ 2016-03-16 13:15                   ` Sandeep Patel
  2016-03-16 14:07                     ` Emmanuel Florac
  0 siblings, 1 reply; 19+ messages in thread
From: Sandeep Patel @ 2016-03-16 13:15 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

Hi Emmanuel,

It seems you were correct: our cluster has been much more stable since the kernel upgrade.

Thank you very much for your help. Really appreciate it.

Kind Regards
Sandeep

-----Original Message-----
From: Emmanuel Florac [mailto:eflorac@intellique.com] 
Sent: 11 January 2016 15:16
To: Sandeep Patel
Cc: xfs@oss.sgi.com
Subject: Re: XFS corruptions

On Mon, 11 Jan 2016 14:20:12 +0000,
Sandeep Patel <spatel@omnifone.com> wrote:

> Hi Emmanuel,
> 
> I have managed to upgrade our kernel to 3.8.13-118.2.2.el6uek.x86_64,
> which was the latest available in our repo. I will let you know how I
> get on.
> 

OK. I'm pretty confident it should improve the situation :)

--
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: XFS corruptions
  2016-03-16 13:15                   ` Sandeep Patel
@ 2016-03-16 14:07                     ` Emmanuel Florac
  0 siblings, 0 replies; 19+ messages in thread
From: Emmanuel Florac @ 2016-03-16 14:07 UTC (permalink / raw)
  To: Sandeep Patel; +Cc: xfs

On Wed, 16 Mar 2016 13:15:59 +0000,
Sandeep Patel <spatel@omnifone.com> wrote:

> It seems you were correct: our cluster has been much more stable
> since the kernel upgrade.
> 
> Thank you very much for your help. Really appreciate it.

Good news :)

Have a nice day.

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xfs corruptions
  2008-09-04 17:11 xfs corruptions Bernd Schubert
@ 2008-09-04 23:02 ` Dave Chinner
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2008-09-04 23:02 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: linux-xfs

On Thu, Sep 04, 2008 at 07:11:48PM +0200, Bernd Schubert wrote:
> Hello,
> 
> I'm presently debugging the error handler of the MPT Fusion driver and
> therefore causing errors on the disk (Infortrend SCSI hardware RAIDs).
> When I later try to delete files and directories that were created
> before and during the failures, "rm -fr" simply says directory not empty.
> There is no message in dmesg about it, but xfs_repair reports errors (see below).
> Once xfs_repair has done its job, removing these directories works fine.
> But this shouldn't happen, should it? This is with 2.6.26.

So we have an inode that is marked free in the AGI btree, but
apparently still in use according to the shortform directory that
referenced it.

There are two possibilities here:

The first possibility is that the inode btree buffer containing the
record indicating the inode is free/used never got written to
disk while the other metadata blocks made it to disk. Seeing as
the filesystem didn't hang here, it implies that the buffer was
written so that on I/O completion the tail of the log could
move forward. That is, the I/O was issued, no error was reported,
but the I/O never made it to disk. If there was an error, you
should see something like:

Warning: Device <bdevname>, XFS metadata write error, block 0x456 in <bdev>

in the syslog indicating a write error. In this case it would be
non-fatal and XFS would try to write it again a little later.
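
A quick way to check for those is something like the following
(assuming kernel messages end up in the usual syslog file):

  # grep -i 'metadata write error' /var/log/messages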

The second possibility is that the write of the inode containing the
shortform directory never actually hit the disk, but that
implies unlinks had already taken place well before the 'rm -rf'
was executed. Perhaps your workload does that....

However, both cases imply that an I/O was reported as completing
successfully when it never got written to disk, and that points
to a bug in the error handling in the underlying driver.

That being said - it could be a bug in the XFS error handling that
is causing this, but XFS tends to be pretty noisy when errors occur.
I guess that you need to add more tracing to indicate when errors
are induced so we can check whether errors are being created against
the same buffers that the inconsistent state is being found in. That
will help show whether the errors are being reported to XFS
correctly.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* xfs corruptions
@ 2008-09-04 17:11 Bernd Schubert
  2008-09-04 23:02 ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Bernd Schubert @ 2008-09-04 17:11 UTC (permalink / raw)
  To: linux-xfs

Hello,

I'm presently debugging the error handler of the MPT Fusion driver and
therefore causing errors on the disk (Infortrend SCSI hardware RAIDs).
When I later try to delete files and directories that were created
before and during the failures, "rm -fr" simply says directory not empty.
There is no message in dmesg about it, but xfs_repair reports errors (see below).
Once xfs_repair has done its job, removing these directories works fine.
But this shouldn't happen, should it? This is with 2.6.26.

root@beo-11:~# xfs_repair /dev/inf/box-3a/disc
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
imap claims a free inode 1073741964 is in use, correcting imap and clearing inode
cleared inode 1073741964
        - agno = 2
imap claims a free inode 2147483788 is in use, correcting imap and clearing inode
cleared inode 2147483788
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
entry "9B769A18" in shortform directory 1073741962 references free inode
1073741964
junking entry "9B769A18" in directory inode 1073741962
        - agno = 2
entry "E95A1D2D" in shortform directory 2147483786 references free inode
2147483788
junking entry "E95A1D2D" in directory inode 2147483786
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-03-16 14:08 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-30 16:51 XFS corruptions Sandeep Patel
2015-11-30 17:40 ` Emmanuel Florac
2015-11-30 18:06   ` Sandeep Patel
2015-11-30 18:45     ` Emmanuel Florac
2015-12-01 12:18       ` Sandeep Patel
2015-12-01 13:50         ` Emmanuel Florac
2015-12-01 14:01           ` Sandeep Patel
2015-12-10 15:07           ` Sandeep Patel
2015-12-10 15:46             ` Emmanuel Florac
2015-12-10 19:51               ` Sandeep Patel
2016-01-11 14:20               ` Sandeep Patel
2016-01-11 15:16                 ` Emmanuel Florac
2016-03-16 13:15                   ` Sandeep Patel
2016-03-16 14:07                     ` Emmanuel Florac
2015-11-30 21:51 ` Dave Chinner
2015-11-30 22:04   ` Darrick J. Wong
2015-12-10 16:46 ` Eric Sandeen
  -- strict thread matches above, loose matches on Subject: below --
2008-09-04 17:11 xfs corruptions Bernd Schubert
2008-09-04 23:02 ` Dave Chinner
