* BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
@ 2016-10-21 17:09 Libor Klepáč
  2016-10-21 17:59 ` Brian Foster
  0 siblings, 1 reply; 18+ messages in thread
From: Libor Klepáč @ 2016-10-21 17:09 UTC (permalink / raw)
  To: linux-xfs

Hello,
Sorry for the last incomplete email (if it arrives); I hit the send button by accident.

Last week we started having problems with one virtual machine running Debian Jessie with kernel 3.16.7-ckt20-1+deb8u4.
Virtualization is done on VMware 5.5 on a Dell R610; the disks are on a PERC H700.

XFS is on the data disk (/dev/mapper/vgDisk2-lvData), which serves cyrus, mysql and apache+php.
It resides on a single-disk LVM volume group, without partitions.
#pvs
  PV         VG      Fmt  Attr PSize   PFree
  /dev/sda2  vgDisk1 lvm2 a--   15.76g    0 
  /dev/sdb   vgDisk2 lvm2 a--  410.00g    0

#lvs
  LV       VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvSwap   vgDisk1 -wi-ao----   1.86g                                                    
  lvSystem vgDisk1 -wi-ao----  13.90g                                                    
  lvData   vgDisk2 -wi-ao---- 410.00g

#grep xfs /etc/fstab 
/dev/mapper/vgDisk2-lvData      /mountpoint       xfs     noatime,logbufs=8       0       1

The filesystem was created either on Debian Squeeze (kernel 2.6.32) or on Wheezy (kernel 3.2.0).


Here are some logs. This one repeats but doesn't cause a shutdown:

Oct 14 07:02:58 vps2 kernel: [18855093.206725] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x46/0xd0 [xfs], block 0x24c17ba8
Oct 14 07:02:58 vps2 kernel: [18855093.210393] XFS (dm-2): Unmount and run xfs_repair
Oct 14 07:02:58 vps2 kernel: [18855093.211224] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Oct 14 07:02:58 vps2 kernel: [18855093.212092] ffff8801853da000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Oct 14 07:02:58 vps2 kernel: [18855093.213932] ffff8801853da010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Oct 14 07:02:58 vps2 kernel: [18855093.215915] ffff8801853da020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 14 07:02:58 vps2 kernel: [18855093.218054] ffff8801853da030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 14 07:02:58 vps2 kernel: [18855093.220317] XFS (dm-2): metadata I/O error: block 0x24c17ba8 ("xfs_trans_read_buf_map") error 117 numblks 8

Then a shutdown occurred on a different block, 0x12f63f40:
Oct 14 12:00:24 vps2 kernel: [18872956.205316] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xd5/0xe0 [xfs], block 0x12f63f40
Oct 14 12:00:24 vps2 kernel: [18872956.208382] XFS (dm-2): Unmount and run xfs_repair
Oct 14 12:00:24 vps2 kernel: [18872956.209385] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Oct 14 12:00:24 vps2 kernel: [18872956.210187] ffff88011dadd000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Oct 14 12:00:24 vps2 kernel: [18872956.211816] ffff88011dadd010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Oct 14 12:00:24 vps2 kernel: [18872956.213390] ffff88011dadd020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 14 12:00:24 vps2 kernel: [18872956.214983] ffff88011dadd030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 14 12:00:24 vps2 kernel: [18872956.216598] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1330 of file /build/linux-U7H2aZ/linux-3.16.7-ckt20/fs/xfs/xfs_buf.c.  Return address = 0xffffffffa03ef820
Oct 14 12:00:24 vps2 kernel: [18872956.217448] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
Oct 14 12:00:24 vps2 kernel: [18872956.218338] XFS (dm-2): Please umount the filesystem and rectify the problem(s)

After killing all relevant processes, unmounting some bind-mount points and remounting:

Oct 14 12:09:21 vps2 kernel: [18873494.193987] XFS (dm-2): xfs_log_force: error 5 returned.
Oct 14 12:09:28 vps2 kernel: [18873501.622426] XFS (dm-2): Mounting V4 Filesystem
Oct 14 12:09:29 vps2 kernel: [18873501.700781] XFS (dm-2): Starting recovery (logdev: internal)
Oct 14 12:09:29 vps2 kernel: [18873501.998101] XFS (dm-2): Ending recovery (logdev: internal)

The filesystem mounts OK, but after a while it logs the error on block 0x24c17ba8 again, without a shutdown:

Oct 14 12:20:31 vps2 kernel: [18874164.759507] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x46/0xd0 [xfs], block 0x24c17ba8
Oct 14 12:20:31 vps2 kernel: [18874164.764684] XFS (dm-2): Unmount and run xfs_repair
Oct 14 12:20:31 vps2 kernel: [18874164.766246] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Oct 14 12:20:31 vps2 kernel: [18874164.767802] ffff880115a49000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Oct 14 12:20:31 vps2 kernel: [18874164.770820] ffff880115a49010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Oct 14 12:20:31 vps2 kernel: [18874164.773848] ffff880115a49020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 14 12:20:31 vps2 kernel: [18874164.776839] ffff880115a49030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 14 12:20:31 vps2 kernel: [18874164.779904] XFS (dm-2): metadata I/O error: block 0x24c17ba8 ("xfs_trans_read_buf_map") error 117 numblks 8

An FS shutdown also happened on Oct 13, but I don't have logs for it.

Overnight I upgraded to Debian kernel 3.16.36-1+deb8u1, rebooted and ran xfs_repair. It repaired some metadata (sorry, I don't have those logs either).

It seems it kept logging this problem over the following week; I didn't check, as I was busy with other tasks.
Oct 16 07:05:09 vps2 kernel: [103607.064314] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x46/0xd0 [xfs], block 0x12f63f40
Oct 16 07:05:09 vps2 kernel: [103607.067200] XFS (dm-2): Unmount and run xfs_repair
Oct 16 07:05:09 vps2 kernel: [103607.068510] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Oct 16 07:05:09 vps2 kernel: [103607.069554] ffff8801262e9000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Oct 16 07:05:09 vps2 kernel: [103607.070712] ffff8801262e9010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Oct 16 07:05:09 vps2 kernel: [103607.071971] ffff8801262e9020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 16 07:05:09 vps2 kernel: [103607.072990] ffff8801262e9030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 16 07:05:09 vps2 kernel: [103607.074329] XFS (dm-2): metadata I/O error: block 0x12f63f40 ("xfs_trans_read_buf_map") error 117 numblks 8

Last night the FS shutdown occurred again, with a slightly different log:
Oct 21 01:00:06 vps2 kernel: [514098.568389] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xd5/0xe0 [xfs], block 0x12f4ca30
Oct 21 01:00:06 vps2 kernel: [514098.570073] XFS (dm-2): Unmount and run xfs_repair
Oct 21 01:00:06 vps2 kernel: [514098.571014] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Oct 21 01:00:06 vps2 kernel: [514098.571800] ffff88020e8b0000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Oct 21 01:00:06 vps2 kernel: [514098.572408] ffff88020e8b0010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Oct 21 01:00:06 vps2 kernel: [514098.573167] ffff88020e8b0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 21 01:00:06 vps2 kernel: [514098.573779] ffff88020e8b0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 21 01:00:06 vps2 kernel: [514098.574347] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1337 of file /build/linux-EZT6bx/linux-3.16.36/fs/xfs/xfs_buf.c.  Return address = 0xffffffffa03eac00
Oct 21 01:00:06 vps2 kernel: [514098.574447] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
Oct 21 01:00:06 vps2 kernel: [514098.575000] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
Oct 21 01:00:06 vps2 kernel: [514098.627574] XFS (dm-2): xfs_log_force: error 5 returned.
Oct 21 01:00:06 vps2 kernel: [514098.680405] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x46/0xd0 [xfs], block 0x12f4ca30
Oct 21 01:00:06 vps2 kernel: [514098.681555] XFS (dm-2): Unmount and run xfs_repair
Oct 21 01:00:06 vps2 kernel: [514098.682143] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Oct 21 01:00:06 vps2 kernel: [514098.682726] ffff88020e8b0000: 3c 3f 70 68 70 20 2f 2a 25 25 53 6d 61 72 74 79  <?php /*%%Smarty
Oct 21 01:00:06 vps2 kernel: [514098.683315] ffff88020e8b0010: 48 65 61 64 65 72 43 6f 64 65 3a 31 30 30 37 36  HeaderCode:10076
Oct 21 01:00:06 vps2 kernel: [514098.683930] ffff88020e8b0020: 34 36 39 39 35 35 38 30 39 33 30 37 65 30 36 33  469955809307e063
Oct 21 01:00:06 vps2 kernel: [514098.684501] ffff88020e8b0030: 37 63 30 2d 33 32 38 39 34 32 38 31 25 25 2a 2f  7c0-32894281%%*/
Oct 21 01:00:06 vps2 kernel: [514098.685064] XFS (dm-2): metadata I/O error: block 0x12f4ca30 ("xfs_trans_read_buf_map") error 117 numblks 8
Oct 21 01:00:06 vps2 kernel: [514098.745473] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.

Is there some way to stop this? Maybe by upgrading to kernel 4.7 from backports?
Is there a way to map "block 0x12f4ca30" and "block 0x24c17ba8" to a specific file?


We have another virtual machine running in almost the same configuration, but on different HW (Dell R710) in the same VM cluster.
It has had similar problems with in-memory data corruption several times a year, but without logging any problems in between.
It has run several 3.16 kernel versions (I always update to the latest package when this happens).
The log is similar:
Oct 11 14:18:01 vps1 kernel: [6376491.318342] XFS (dm-3): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xd5/0xe0 [xfs], block 0x4b060
Oct 11 14:18:01 vps1 kernel: [6376491.320972] XFS (dm-3): Unmount and run xfs_repair
Oct 11 14:18:01 vps1 kernel: [6376491.321165] XFS (dm-3): First 64 bytes of corrupted metadata buffer:
Oct 11 14:18:01 vps1 kernel: [6376491.321437] ffff88000e97a000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Oct 11 14:18:01 vps1 kernel: [6376491.321726] ffff88000e97a010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Oct 11 14:18:01 vps1 kernel: [6376491.322023] ffff88000e97a020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 11 14:18:01 vps1 kernel: [6376491.322314] ffff88000e97a030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 11 14:18:01 vps1 kernel: [6376491.322630] XFS (dm-3): xfs_do_force_shutdown(0x8) called from line 1337 of file /build/linux-7z1rSb/linux-3.16.7-ckt25/fs/xfs/xfs_buf.c.  Return address = 0xffffffffa03a3820
Oct 11 14:18:01 vps1 kernel: [6376491.323832] XFS (dm-3): Corruption of in-memory data detected.  Shutting down filesystem
Oct 11 14:18:01 vps1 kernel: [6376491.324157] XFS (dm-3): Please umount the filesystem and rectify the problem(s)
Oct 11 14:18:16 vps1 kernel: [6376506.023406] XFS (dm-3): xfs_log_force: error 5 returned.
Oct 11 14:18:46 vps1 kernel: [6376536.132491] XFS (dm-3): xfs_log_force: error 5 returned.
Oct 11 14:19:16 vps1 kernel: [6376566.241488] XFS (dm-3): xfs_log_force: error 5 returned.
Oct 11 14:19:46 vps1 kernel: [6376596.350546] XFS (dm-3): xfs_log_force: error 5 returned.
Oct 11 14:20:16 vps1 kernel: [6376626.459602] XFS (dm-3): xfs_log_force: error 5 returned.
Oct 11 14:20:47 vps1 kernel: [6376656.568708] XFS (dm-3): xfs_log_force: error 5 returned.
Oct 11 14:21:17 vps1 kernel: [6376686.677853] XFS (dm-3): xfs_log_force: error 5 returned.
Oct 11 14:21:20 vps1 kernel: [6376689.870237] XFS (dm-3): xfs_log_force: error 5 returned.
Oct 11 14:21:22 vps1 kernel: [6376692.358466] XFS (dm-3): xfs_log_force: error 5 returned.
Oct 11 14:21:25 vps1 kernel: [6376694.871370] XFS (dm-3): xfs_log_force: error 5 returned.
Oct 11 14:21:31 vps1 kernel: [6376700.985227] XFS (dm-3): Mounting V4 Filesystem
Oct 11 14:21:31 vps1 kernel: [6376701.052522] XFS (dm-3): Starting recovery (logdev: internal)
Oct 11 14:21:31 vps1 kernel: [6376701.091589] XFS (dm-3): Ending recovery (logdev: internal)


Any clues as to what might be wrong? A HW problem? It doesn't affect other hosts, though, and we use XFS on all of them for data.

With regards,

Libor


* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-10-21 17:09 BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify Libor Klepáč
@ 2016-10-21 17:59 ` Brian Foster
  2016-10-21 22:20   ` Dave Chinner
  2016-10-23  6:48   ` Libor Klepáč
  0 siblings, 2 replies; 18+ messages in thread
From: Brian Foster @ 2016-10-21 17:59 UTC (permalink / raw)
  To: Libor Klepáč; +Cc: linux-xfs

On Fri, Oct 21, 2016 at 07:09:06PM +0200, Libor Klepáč wrote:
> Hello,
> sorry for last incomplete email (if it arrives), i hit some send button by accident.
> 
> Last week we have started to have problems with one virtual machine running debian jessie, with kernel 3.16.7-ckt20-1+deb8u4.
> virtualization is done on vmware 5.5 on dell r610, disks are on perc h700.
> 
> XFS is on data disk (/dev/mapper/vgDisk2-lvData) running cyrus, mysql, apache+php.
> It resides on single disk LVM, without partitions.
> #pvs
>   PV         VG      Fmt  Attr PSize   PFree
>   /dev/sda2  vgDisk1 lvm2 a--   15.76g    0 
>   /dev/sdb   vgDisk2 lvm2 a--  410.00g    0
> 
> #lvs
>   LV       VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   lvSwap   vgDisk1 -wi-ao----   1.86g                                                    
>   lvSystem vgDisk1 -wi-ao----  13.90g                                                    
>   lvData   vgDisk2 -wi-ao---- 410.00g
> 
> #grep xfs /etc/fstab 
> /dev/mapper/vgDisk2-lvData      /mountpoint       xfs     noatime,logbufs=8       0       1
> 
> It was created in Debian Squeeze on kernel 2.6.32 OR Wheezy on 3.2.0.
> 
> 
> There are some logs, this one repeats but doesn't cause shutdown
> 
> Oct 14 07:02:58 vps2 kernel: [18855093.206725] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x46/0xd0 [xfs], block 0x24c17ba8
> Oct 14 07:02:58 vps2 kernel: [18855093.210393] XFS (dm-2): Unmount and run xfs_repair
> Oct 14 07:02:58 vps2 kernel: [18855093.211224] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Oct 14 07:02:58 vps2 kernel: [18855093.212092] ffff8801853da000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> Oct 14 07:02:58 vps2 kernel: [18855093.213932] ffff8801853da010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> Oct 14 07:02:58 vps2 kernel: [18855093.215915] ffff8801853da020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Oct 14 07:02:58 vps2 kernel: [18855093.218054] ffff8801853da030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Oct 14 07:02:58 vps2 kernel: [18855093.220317] XFS (dm-2): metadata I/O error: block 0x24c17ba8 ("xfs_trans_read_buf_map") error 117 numblks 8
> 
> Then shutdown occured on different block 0x12f63f40
> Oct 14 12:00:24 vps2 kernel: [18872956.205316] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xd5/0xe0 [xfs], block 0x12f63f40
> Oct 14 12:00:24 vps2 kernel: [18872956.208382] XFS (dm-2): Unmount and run xfs_repair
> Oct 14 12:00:24 vps2 kernel: [18872956.209385] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Oct 14 12:00:24 vps2 kernel: [18872956.210187] ffff88011dadd000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> Oct 14 12:00:24 vps2 kernel: [18872956.211816] ffff88011dadd010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> Oct 14 12:00:24 vps2 kernel: [18872956.213390] ffff88011dadd020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Oct 14 12:00:24 vps2 kernel: [18872956.214983] ffff88011dadd030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Oct 14 12:00:24 vps2 kernel: [18872956.216598] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1330 of file /build/linux-U7H2aZ/linux-3.16.7-ckt20/fs/xfs/xfs_buf.c.  Return address = 0xffffffffa03ef820
> Oct 14 12:00:24 vps2 kernel: [18872956.217448] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> Oct 14 12:00:24 vps2 kernel: [18872956.218338] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
> 

The shutdown has more to do with whether the corruption is detected on
read vs. write. E.g., we shutdown on write verifier failure to avoid
writing corrupted data to disk and causing further damage.

I suppose in this particular instance we don't really know whether the
corruption existed on disk or originated in memory. Regardless, the
corruption appears to be consistently associated with extended attribute
blocks. Are you running an application that makes heavy use of xattrs?

> after killing all relevant processes and unmounting some bind-mount points and remounting
> 
> Oct 14 12:09:21 vps2 kernel: [18873494.193987] XFS (dm-2): xfs_log_force: error 5 returned.
> Oct 14 12:09:28 vps2 kernel: [18873501.622426] XFS (dm-2): Mounting V4 Filesystem
> Oct 14 12:09:29 vps2 kernel: [18873501.700781] XFS (dm-2): Starting recovery (logdev: internal)
> Oct 14 12:09:29 vps2 kernel: [18873501.998101] XFS (dm-2): Ending recovery (logdev: internal)
> 
> filesystem mounts ok, but after while it logs again on block 0x24c17ba8, without shutdown
> 

Note that a remount isn't going to resolve on-disk corruption. We're
just going to trip over it again on the next access as we have here.

> Oct 14 12:20:31 vps2 kernel: [18874164.759507] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x46/0xd0 [xfs], block 0x24c17ba8
> Oct 14 12:20:31 vps2 kernel: [18874164.764684] XFS (dm-2): Unmount and run xfs_repair
> Oct 14 12:20:31 vps2 kernel: [18874164.766246] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Oct 14 12:20:31 vps2 kernel: [18874164.767802] ffff880115a49000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> Oct 14 12:20:31 vps2 kernel: [18874164.770820] ffff880115a49010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> Oct 14 12:20:31 vps2 kernel: [18874164.773848] ffff880115a49020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Oct 14 12:20:31 vps2 kernel: [18874164.776839] ffff880115a49030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Oct 14 12:20:31 vps2 kernel: [18874164.779904] XFS (dm-2): metadata I/O error: block 0x24c17ba8 ("xfs_trans_read_buf_map") error 117 numblks 8
> 
> FS shutdown happened on Oct 13, but i don't have logs ...
> 
> Over night i upgraded kernel to debian kernel 3.16.36-1+deb8u1 , rebooted a ran xfs_repair. It repaired some metadata (sorry, don't have logs either :(
> 

So presumably xfs_repair found and fixed some problems. What version of
xfs_repair is being used? 

> It seems it logged this problem over week, i didn't check, busy on different tasks ...
> Oct 16 07:05:09 vps2 kernel: [103607.064314] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x46/0xd0 [xfs], block 0x12f63f40
> Oct 16 07:05:09 vps2 kernel: [103607.067200] XFS (dm-2): Unmount and run xfs_repair
> Oct 16 07:05:09 vps2 kernel: [103607.068510] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Oct 16 07:05:09 vps2 kernel: [103607.069554] ffff8801262e9000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> Oct 16 07:05:09 vps2 kernel: [103607.070712] ffff8801262e9010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> Oct 16 07:05:09 vps2 kernel: [103607.071971] ffff8801262e9020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Oct 16 07:05:09 vps2 kernel: [103607.072990] ffff8801262e9030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Oct 16 07:05:09 vps2 kernel: [103607.074329] XFS (dm-2): metadata I/O error: block 0x12f63f40 ("xfs_trans_read_buf_map") error 117 numblks 8
> 

This looks like the same block that tripped over the write verifier
above.
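
To make that kind of cross-check less error-prone, the log can be grouped by block number. The following is only a rough sketch: the regular expression is an assumption derived from the message format shown in this thread, and `group_by_block` is a hypothetical helper name, not part of any XFS tool.

```python
import re
from collections import defaultdict

# Assumed message format, taken from the log lines quoted in this thread:
#   "... Metadata corruption detected at xfs_..._read_verify+0x46/0xd0 [xfs], block 0x..."
PATTERN = re.compile(
    r"Metadata corruption detected at "
    r"xfs_\w+_(?P<kind>read|write)_verify\S*.*?block (?P<block>0x[0-9a-f]+)"
)

def group_by_block(lines):
    """Return {block: [verifier kind, ...]} in the order the events were logged."""
    hits = defaultdict(list)
    for line in lines:
        m = PATTERN.search(line)
        if m:
            hits[m.group("block")].append(m.group("kind"))
    return dict(hits)

# Three lines copied from the report above.
sample = [
    "Oct 14 07:02:58 vps2 kernel: [18855093.206725] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x46/0xd0 [xfs], block 0x24c17ba8",
    "Oct 14 12:00:24 vps2 kernel: [18872956.205316] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xd5/0xe0 [xfs], block 0x12f63f40",
    "Oct 16 07:05:09 vps2 kernel: [103607.064314] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x46/0xd0 [xfs], block 0x12f63f40",
]

for block, kinds in group_by_block(sample).items():
    print(block, kinds)
```

A block that shows up under both the write and the read verifier, like 0x12f63f40 here, is the case described above.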

> This night, FS shutdown occured again, with slightly different log
> Oct 21 01:00:06 vps2 kernel: [514098.568389] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xd5/0xe0 [xfs], block 0x12f4ca30
> Oct 21 01:00:06 vps2 kernel: [514098.570073] XFS (dm-2): Unmount and run xfs_repair
> Oct 21 01:00:06 vps2 kernel: [514098.571014] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Oct 21 01:00:06 vps2 kernel: [514098.571800] ffff88020e8b0000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> Oct 21 01:00:06 vps2 kernel: [514098.572408] ffff88020e8b0010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> Oct 21 01:00:06 vps2 kernel: [514098.573167] ffff88020e8b0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Oct 21 01:00:06 vps2 kernel: [514098.573779] ffff88020e8b0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Oct 21 01:00:06 vps2 kernel: [514098.574347] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1337 of file /build/linux-EZT6bx/linux-3.16.36/fs/xfs/xfs_buf.c.  Return address = 0xffffffffa03eac00
> Oct 21 01:00:06 vps2 kernel: [514098.574447] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> Oct 21 01:00:06 vps2 kernel: [514098.575000] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
> Oct 21 01:00:06 vps2 kernel: [514098.627574] XFS (dm-2): xfs_log_force: error 5 returned.
> Oct 21 01:00:06 vps2 kernel: [514098.680405] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x46/0xd0 [xfs], block 0x12f4ca30
> Oct 21 01:00:06 vps2 kernel: [514098.681555] XFS (dm-2): Unmount and run xfs_repair
> Oct 21 01:00:06 vps2 kernel: [514098.682143] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Oct 21 01:00:06 vps2 kernel: [514098.682726] ffff88020e8b0000: 3c 3f 70 68 70 20 2f 2a 25 25 53 6d 61 72 74 79  <?php /*%%Smarty
> Oct 21 01:00:06 vps2 kernel: [514098.683315] ffff88020e8b0010: 48 65 61 64 65 72 43 6f 64 65 3a 31 30 30 37 36  HeaderCode:10076
> Oct 21 01:00:06 vps2 kernel: [514098.683930] ffff88020e8b0020: 34 36 39 39 35 35 38 30 39 33 30 37 65 30 36 33  469955809307e063
> Oct 21 01:00:06 vps2 kernel: [514098.684501] ffff88020e8b0030: 37 63 30 2d 33 32 38 39 34 32 38 31 25 25 2a 2f  7c0-32894281%%*/
> Oct 21 01:00:06 vps2 kernel: [514098.685064] XFS (dm-2): metadata I/O error: block 0x12f4ca30 ("xfs_trans_read_buf_map") error 117 numblks 8
> Oct 21 01:00:06 vps2 kernel: [514098.745473] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> 
> Is there some way to stop this? Maybe upgrading to kernel 4.7 from backports?
> Is there a way to map those "block 0x12f4ca30" , "block 0x24c17ba8" to a specific file?
> 

v3.16 is certainly kind of old. For starters though, I would suggest to
grab the most recent xfsprogs release you can (you can even grab the
source and run it right out of the build tree), run 'xfs_repair -n' and
report the results. Presumably there has been some corruption on disk
since the last run, so it might find some things you want to fix. If you
run repair without -n to actually fix the problems, I find it usually a
good idea to follow up with 'xfs_repair -n' again to make sure repair
fixed up everything it found.

With regard to mapping the block back to an inode, you may be able to
use xfs_db:

$ xfs_db <dev>
xfs_db> blockget
xfs_db> daddr 0x2309
xfs_db> blockuse
...
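
For orientation, the block numbers in the log can also be turned into byte offsets on the device. This is a minimal sketch which assumes that the logged values are disk addresses in 512-byte units, so that "numblks 8" corresponds to one 4096-byte buffer; `SECTOR_SIZE` and `daddr_to_bytes` are names made up for the illustration.

```python
SECTOR_SIZE = 512  # assumption: logged block numbers count 512-byte units

def daddr_to_bytes(daddr, numblks=8):
    """Return (byte offset, byte length) of a buffer on the block device."""
    return daddr * SECTOR_SIZE, numblks * SECTOR_SIZE

# Block numbers taken from the question above; numblks 8 is from the log.
for block in (0x12f4ca30, 0x24c17ba8):
    offset, length = daddr_to_bytes(block)
    print(f"daddr {block:#x}: byte offset {offset}, length {length}")
```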

Brian

> 
> We have another virtual running in almost same configuration, but on different HW (dell r710) in same VM cluster.
> It have had similar problems with in memory data corruption several times a year, but without logging any problems in between.
> It had several 3.16 kernel versions (i always update to latest package when this happens)
> Log is similar
> Oct 11 14:18:01 vps1 kernel: [6376491.318342] XFS (dm-3): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xd5/0xe0 [xfs], block 0x4b060
> Oct 11 14:18:01 vps1 kernel: [6376491.320972] XFS (dm-3): Unmount and run xfs_repair
> Oct 11 14:18:01 vps1 kernel: [6376491.321165] XFS (dm-3): First 64 bytes of corrupted metadata buffer:
> Oct 11 14:18:01 vps1 kernel: [6376491.321437] ffff88000e97a000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> Oct 11 14:18:01 vps1 kernel: [6376491.321726] ffff88000e97a010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> Oct 11 14:18:01 vps1 kernel: [6376491.322023] ffff88000e97a020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Oct 11 14:18:01 vps1 kernel: [6376491.322314] ffff88000e97a030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Oct 11 14:18:01 vps1 kernel: [6376491.322630] XFS (dm-3): xfs_do_force_shutdown(0x8) called from line 1337 of file /build/linux-7z1rSb/linux-3.16.7-ckt25/fs/xfs/xfs_buf.c.  Return address = 0xffffffffa03a3820
> Oct 11 14:18:01 vps1 kernel: [6376491.323832] XFS (dm-3): Corruption of in-memory data detected.  Shutting down filesystem
> Oct 11 14:18:01 vps1 kernel: [6376491.324157] XFS (dm-3): Please umount the filesystem and rectify the problem(s)
> Oct 11 14:18:16 vps1 kernel: [6376506.023406] XFS (dm-3): xfs_log_force: error 5 returned.
> Oct 11 14:18:46 vps1 kernel: [6376536.132491] XFS (dm-3): xfs_log_force: error 5 returned.
> Oct 11 14:19:16 vps1 kernel: [6376566.241488] XFS (dm-3): xfs_log_force: error 5 returned.
> Oct 11 14:19:46 vps1 kernel: [6376596.350546] XFS (dm-3): xfs_log_force: error 5 returned.
> Oct 11 14:20:16 vps1 kernel: [6376626.459602] XFS (dm-3): xfs_log_force: error 5 returned.
> Oct 11 14:20:47 vps1 kernel: [6376656.568708] XFS (dm-3): xfs_log_force: error 5 returned.
> Oct 11 14:21:17 vps1 kernel: [6376686.677853] XFS (dm-3): xfs_log_force: error 5 returned.
> Oct 11 14:21:20 vps1 kernel: [6376689.870237] XFS (dm-3): xfs_log_force: error 5 returned.
> Oct 11 14:21:22 vps1 kernel: [6376692.358466] XFS (dm-3): xfs_log_force: error 5 returned.
> Oct 11 14:21:25 vps1 kernel: [6376694.871370] XFS (dm-3): xfs_log_force: error 5 returned.
> Oct 11 14:21:31 vps1 kernel: [6376700.985227] XFS (dm-3): Mounting V4 Filesystem
> Oct 11 14:21:31 vps1 kernel: [6376701.052522] XFS (dm-3): Starting recovery (logdev: internal)
> Oct 11 14:21:31 vps1 kernel: [6376701.091589] XFS (dm-3): Ending recovery (logdev: internal)
> 
> 
> Any clues what might be wrong? HW problem? but it doesn't affect other hosts, we use XFS on all of them for data.
> 
> With regards,
> 
> Libor


* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-10-21 17:59 ` Brian Foster
@ 2016-10-21 22:20   ` Dave Chinner
  2016-10-23  6:48   ` Libor Klepáč
  1 sibling, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2016-10-21 22:20 UTC (permalink / raw)
  To: Brian Foster; +Cc: Libor Klepáč, linux-xfs

On Fri, Oct 21, 2016 at 01:59:13PM -0400, Brian Foster wrote:
> On Fri, Oct 21, 2016 at 07:09:06PM +0200, Libor Klepáč wrote:
> > Hello,
> > sorry for last incomplete email (if it arrives), i hit some send button by accident.
> > 
> > Last week we have started to have problems with one virtual machine running debian jessie, with kernel 3.16.7-ckt20-1+deb8u4.
> > virtualization is done on vmware 5.5 on dell r610, disks are on perc h700.
> > 
> > XFS is on data disk (/dev/mapper/vgDisk2-lvData) running cyrus, mysql, apache+php.
> > It resides on single disk LVM, without partitions.
> > #pvs
> >   PV         VG      Fmt  Attr PSize   PFree
> >   /dev/sda2  vgDisk1 lvm2 a--   15.76g    0 
> >   /dev/sdb   vgDisk2 lvm2 a--  410.00g    0
> > 
> > #lvs
> >   LV       VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
> >   lvSwap   vgDisk1 -wi-ao----   1.86g                                                    
> >   lvSystem vgDisk1 -wi-ao----  13.90g                                                    
> >   lvData   vgDisk2 -wi-ao---- 410.00g
> > 
> > #grep xfs /etc/fstab 
> > /dev/mapper/vgDisk2-lvData      /mountpoint       xfs     noatime,logbufs=8       0       1
> > 
> > It was created in Debian Squeeze on kernel 2.6.32 OR Wheezy on 3.2.0.
> > 
> > 
> > There are some logs, this one repeats but doesn't cause shutdown
> > 
> > Oct 14 07:02:58 vps2 kernel: [18855093.206725] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x46/0xd0 [xfs], block 0x24c17ba8
> > Oct 14 07:02:58 vps2 kernel: [18855093.210393] XFS (dm-2): Unmount and run xfs_repair
> > Oct 14 07:02:58 vps2 kernel: [18855093.211224] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> > Oct 14 07:02:58 vps2 kernel: [18855093.212092] ffff8801853da000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> > Oct 14 07:02:58 vps2 kernel: [18855093.213932] ffff8801853da010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> > Oct 14 07:02:58 vps2 kernel: [18855093.215915] ffff8801853da020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Oct 14 07:02:58 vps2 kernel: [18855093.218054] ffff8801853da030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

ffff8801853da000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
                  | forw     |  back     |magic| pad |count|usedbytes
ffff8801853da010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
              firstused|hl|pd|base0|size0|base1|size1|base2|size2|
ffff8801853da020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
                  entry data....
ffff8801853da030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

Ok, that's a completely empty attribute leaf block. It's got no
attribute entries in it, and because the entry data is zero, it's
never had any data in it. It's failing the verifier because the
entry count in the header is zero. i.e. this sort of empty buffer
should never end up on disk.
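
The same decoding can be reproduced mechanically. The sketch below unpacks the first 32 bytes of the dump following the field layout annotated above, assuming big-endian fields; the names "holes" and "pad1" are a guess at what "hl" and "pd" abbreviate, and `decode_attr_leaf_hdr` is a hypothetical helper, not kernel or xfsprogs code.

```python
import struct

# First 32 bytes of the buffer, copied from the hex dump above.
RAW = bytes.fromhex(
    "00000000" "00000000" "fbee" "0000" "0000" "0000"
    "1000" "00" "00" "0020" "0fe0" "0000" "0000" "0000" "0000"
)

# Field names follow the annotation; "holes"/"pad1" are assumed expansions.
FIELDS = ("forw", "back", "magic", "pad", "count", "usedbytes",
          "firstused", "holes", "pad1",
          "base0", "size0", "base1", "size1", "base2", "size2")

def decode_attr_leaf_hdr(buf):
    """Unpack the header fields, assuming big-endian byte order."""
    values = struct.unpack(">IIHHHHHBB6H", buf[:32])
    return dict(zip(FIELDS, values))

hdr = decode_attr_leaf_hdr(RAW)
for name in FIELDS:
    print(f"{name:10s} {hdr[name]:#x}")
```

With these bytes, count and usedbytes decode to zero, and the first free-space entry (base0 plus size0) runs from the end of the header to the 4096-byte firstused mark, which matches the description of a block that has never held an entry.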

This rings a bell, but I can't put my finger on it right now. It may
be a really old corruption that has been sitting on disk for a long
time (i.e. from whatever kernel the fs was originally created and
run on) that is only manifest now on a more recent kernel that has
better validity checking...

> > Is there some way to stop this? Maybe upgrading to kernel 4.7 from backports?
> > Is there a way to map those "block 0x12f4ca30" , "block 0x24c17ba8" to a specific file?
> > 
> 
> v3.16 is certainly kind of old. For starters though, I would suggest to
> grab the most recent xfsprogs release you can (you can even grab the
> source and run it right out of the build tree), run 'xfs_repair -n' and
> report the results.

This, please, and paste the output for us to see. If repair is not
detecting and correcting the corrupt attribute block you'll continue
to see the problem. 

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-10-21 17:59 ` Brian Foster
  2016-10-21 22:20   ` Dave Chinner
@ 2016-10-23  6:48   ` Libor Klepáč
  2016-10-24  2:40     ` Dave Chinner
  1 sibling, 1 reply; 18+ messages in thread
From: Libor Klepáč @ 2016-10-23  6:48 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

> On Fri, Oct 21, 2016 at 07:09:06PM +0200, Libor Klepáč wrote:
> > Hello,
> > Then shutdown occured on different block 0x12f63f40
> > Oct 14 12:00:24 vps2 kernel: [18872956.205316] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xd5/0xe0 [xfs], block 0x12f63f40
> > Oct 14 12:00:24 vps2 kernel: [18872956.208382] XFS (dm-2): Unmount and run xfs_repair
> > Oct 14 12:00:24 vps2 kernel: [18872956.209385] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> > Oct 14 12:00:24 vps2 kernel: [18872956.210187] ffff88011dadd000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> > Oct 14 12:00:24 vps2 kernel: [18872956.211816] ffff88011dadd010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> > Oct 14 12:00:24 vps2 kernel: [18872956.213390] ffff88011dadd020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Oct 14 12:00:24 vps2 kernel: [18872956.214983] ffff88011dadd030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Oct 14 12:00:24 vps2 kernel: [18872956.216598] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1330 of file /build/linux-U7H2aZ/linux-3.16.7-ckt20/fs/xfs/xfs_buf.c.  Return address = 0xffffffffa03ef820
> > Oct 14 12:00:24 vps2 kernel: [18872956.217448] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> > Oct 14 12:00:24 vps2 kernel: [18872956.218338] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
> The shutdown has more to do with whether the corruption is detected on
> read vs. write. E.g., we shutdown on write verifier failure to avoid
> writing corrupted data to disk and causing further damage.
> 
> I suppose in this particular instance we don't really know whether the
> corruption existed on disk or originated in memory. Regardless, the
> corruption appears to be consistently associated with extended attribute
> blocks. Are you running an application that makes heavy use of xattrs?
> 

Hello,
I think xattrs are used rarely, if at all. The filesystem is used for PHP
web hosting, a Cyrus mail server and a MySQL server.

> > Oct 14 12:20:31 vps2 kernel: [18874164.759507] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x46/0xd0 [xfs], block 0x24c17ba8
> > Oct 14 12:20:31 vps2 kernel: [18874164.764684] XFS (dm-2): Unmount and run xfs_repair
> > Oct 14 12:20:31 vps2 kernel: [18874164.766246] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> > Oct 14 12:20:31 vps2 kernel: [18874164.767802] ffff880115a49000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> > Oct 14 12:20:31 vps2 kernel: [18874164.770820] ffff880115a49010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> > Oct 14 12:20:31 vps2 kernel: [18874164.773848] ffff880115a49020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Oct 14 12:20:31 vps2 kernel: [18874164.776839] ffff880115a49030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > Oct 14 12:20:31 vps2 kernel: [18874164.779904] XFS (dm-2): metadata I/O error: block 0x24c17ba8 ("xfs_trans_read_buf_map") error 117 numblks 8
> > 
> > FS shutdown happened on Oct 13, but i don't have logs ...
> > 
> > Overnight I upgraded the kernel to Debian kernel 3.16.36-1+deb8u1, rebooted
> > and ran xfs_repair. It repaired some metadata (sorry, I don't have logs
> > either :( ).
> So presumably xfs_repair found and fixed some problems. What version of
> xfs_repair is being used?
> 

It's version 3.2.1 from stable. Is that sufficient for a 3.16 kernel, or is it
better to upgrade to 4.3.0 from testing or to the newest version from upstream?

We had another fs shutdown on vps1, the machine I mentioned at the bottom of
my first email. I will run repair there too and post the results.

Thanks Libor


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-10-23  6:48   ` Libor Klepáč
@ 2016-10-24  2:40     ` Dave Chinner
  2016-10-25  6:52       ` Libor Klepáč
  2016-10-31  8:54       ` Libor Klepáč
  0 siblings, 2 replies; 18+ messages in thread
From: Dave Chinner @ 2016-10-24  2:40 UTC (permalink / raw)
  To: Libor Klepáč; +Cc: Brian Foster, linux-xfs

On Sun, Oct 23, 2016 at 08:48:06AM +0200, Libor Klepáč wrote:
> > On Fri, Oct 21, 2016 at 07:09:06PM +0200, Libor Klepáč wrote:
> > > Hello,
> > > Then a shutdown occurred on a different block, 0x12f63f40:
> > > Oct 14 12:00:24 vps2 kernel: [18872956.205316] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xd5/0xe0 [xfs], block 0x12f63f40
> > > Oct 14 12:00:24 vps2 kernel: [18872956.208382] XFS (dm-2): Unmount and run xfs_repair
> > > Oct 14 12:00:24 vps2 kernel: [18872956.209385] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> > > Oct 14 12:00:24 vps2 kernel: [18872956.210187] ffff88011dadd000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> > > Oct 14 12:00:24 vps2 kernel: [18872956.211816] ffff88011dadd010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> > > Oct 14 12:00:24 vps2 kernel: [18872956.213390] ffff88011dadd020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > Oct 14 12:00:24 vps2 kernel: [18872956.214983] ffff88011dadd030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > Oct 14 12:00:24 vps2 kernel: [18872956.216598] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1330 of file /build/linux-U7H2aZ/linux-3.16.7-ckt20/fs/xfs/xfs_buf.c.  Return address = 0xffffffffa03ef820
> > > Oct 14 12:00:24 vps2 kernel: [18872956.217448] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> > > Oct 14 12:00:24 vps2 kernel: [18872956.218338] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
> > The shutdown has more to do with whether the corruption is detected on
> > read vs. write. E.g., we shutdown on write verifier failure to avoid
> > writing corrupted data to disk and causing further damage.
> > 
> > I suppose in this particular instance we don't really know whether the
> > corruption existed on disk or originated in memory. Regardless, the
> > corruption appears to be consistently associated with extended attribute
> > blocks. Are you running an application that makes heavy use of xattrs?
> > 
> 
> Hello,
> i think that xattrs are not used at all or rarely. It's used for php 
> webhosting, cyrus mail server, mysql server.

selinux, acls or some other security system enabled that uses
xattrs?

> > So presumably xfs_repair found and fixed some problems. What version of
> > xfs_repair is being used?
> 
> It's version 3.2.1 from stable. Is it sufficient for 3.16 kernel or is it 
> better to upgrade to 4.3.0 from testing or to newest version from upstream? 

The newer the better. 3.2.1 is really quite old now, and we've fixed
a lot of bugs in xfs_repair since then....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-10-24  2:40     ` Dave Chinner
@ 2016-10-25  6:52       ` Libor Klepáč
  2016-10-31  8:54       ` Libor Klepáč
  1 sibling, 0 replies; 18+ messages in thread
From: Libor Klepáč @ 2016-10-25  6:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, linux-xfs

Hello,
sorry for delay, Microsoft marks those emails as spam ...

On Monday, 24 October 2016 13:40:25 CEST Dave Chinner wrote:
> On Sun, Oct 23, 2016 at 08:48:06AM +0200, Libor Klepáč wrote:
> > > On Fri, Oct 21, 2016 at 07:09:06PM +0200, Libor Klepáč wrote:
> > > > Hello,
> > > > Then a shutdown occurred on a different block, 0x12f63f40:
> > > > Oct 14 12:00:24 vps2 kernel: [18872956.205316] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xd5/0xe0 [xfs], block 0x12f63f40
> > > > Oct 14 12:00:24 vps2 kernel: [18872956.208382] XFS (dm-2): Unmount and run xfs_repair
> > > > Oct 14 12:00:24 vps2 kernel: [18872956.209385] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> > > > Oct 14 12:00:24 vps2 kernel: [18872956.210187] ffff88011dadd000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
> > > > Oct 14 12:00:24 vps2 kernel: [18872956.211816] ffff88011dadd010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
> > > > Oct 14 12:00:24 vps2 kernel: [18872956.213390] ffff88011dadd020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > > Oct 14 12:00:24 vps2 kernel: [18872956.214983] ffff88011dadd030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > > Oct 14 12:00:24 vps2 kernel: [18872956.216598] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1330 of file /build/linux-U7H2aZ/linux-3.16.7-ckt20/fs/xfs/xfs_buf.c.  Return address = 0xffffffffa03ef820
> > > > Oct 14 12:00:24 vps2 kernel: [18872956.217448] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> > > > Oct 14 12:00:24 vps2 kernel: [18872956.218338] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
> > > 
> > > The shutdown has more to do with whether the corruption is detected on
> > > read vs. write. E.g., we shutdown on write verifier failure to avoid
> > > writing corrupted data to disk and causing further damage.
> > > 
> > > I suppose in this particular instance we don't really know whether the
> > > corruption existed on disk or originated in memory. Regardless, the
> > > corruption appears to be consistently associated with extended attribute
> > > blocks. Are you running an application that makes heavy use of xattrs?
> > 
> > Hello,
> > i think that xattrs are not used at all or rarely. It's used for php
> > webhosting, cyrus mail server, mysql server.
> 
> selinux, acls or some other security system enabled that uses
> xattrs?

No SELinux, but we recently started using ACLs (files are owned by separate
users who run php-fpm, and everything is made readable to the www-data user
via ACLs). BackupPC says a full backup of the server has around 4 million
files, and most of them probably have ACLs.

> 
> > > So presumably xfs_repair found and fixed some problems. What version of
> > > xfs_repair is being used?
> > 
> > It's version 3.2.1 from stable. Is it sufficient for 3.16 kernel or is it
> > better to upgrade to 4.3.0 from testing or to newest version from
> > upstream?
> 
> The newer the better. 3.2.1 is really quite old now, and we've fixed
> a lot of bugs in xfs_repair since then....
> 

OK, I will upgrade and report back. I have to coordinate with the customer,
because xfs_repair runs for about 35 minutes on this filesystem.

Thanks,
Libor

> Cheers,
> 
> Dave.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-10-24  2:40     ` Dave Chinner
  2016-10-25  6:52       ` Libor Klepáč
@ 2016-10-31  8:54       ` Libor Klepáč
  2016-10-31 11:57         ` Brian Foster
  2016-10-31 12:02         ` Dave Chinner
  1 sibling, 2 replies; 18+ messages in thread
From: Libor Klepáč @ 2016-10-31  8:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, linux-xfs

Hello,
I have upgraded the VM called vps1 in my original email (the one at the bottom
of that email) to Debian kernel 4.7.8-1~bpo8+1 and compiled xfsprogs 4.8.0.

Here is the output of xfs_repair -n and xfs_repair. Is it supposed to report
whether and what is being repaired, even though I forgot the -v option?

With regards, 
Libor

root@vps1:~# xfs_repair -n /dev/vgEOSVPS1Disk2/lvData 
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

-----

root@vps1:~# xfs_repair /dev/vgEOSVPS1Disk2/lvData 
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-10-31  8:54       ` Libor Klepáč
@ 2016-10-31 11:57         ` Brian Foster
  2016-10-31 12:02         ` Dave Chinner
  1 sibling, 0 replies; 18+ messages in thread
From: Brian Foster @ 2016-10-31 11:57 UTC (permalink / raw)
  To: Libor Klepáč; +Cc: Dave Chinner, linux-xfs

On Mon, Oct 31, 2016 at 09:54:00AM +0100, Libor Klepáč wrote:
> Hello,
> i have upgrade VM called vps1 in original email (the one on bottom of email)
> to debian kernel 4.7.8-1~bpo8+1 and compiled xfsprogs 4.8.0.
> 
> Here is output of xfs_repair -n and xfs_repair. Is it supposed to write if/
> what is being repaired, when i forgot -v option?
> 

Yes, xfs_repair will write some things to the fs when -n is not provided
(if that was your question?).

> With regards, 
> Libor
> 
> root@vps1:~# xfs_repair -n /dev/vgEOSVPS1Disk2/lvData 
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
> 

Hmm, and it looks like all is well with the fs. Have you reproduced the
original issue since updating the kernel as well?

Brian

> -----
> 
> root@vps1:~# xfs_repair /dev/vgEOSVPS1Disk2/lvData 
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> done
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-10-31  8:54       ` Libor Klepáč
  2016-10-31 11:57         ` Brian Foster
@ 2016-10-31 12:02         ` Dave Chinner
  2016-10-31 15:36           ` Libor Klepáč
  2016-11-08 11:09           ` Libor Klepáč
  1 sibling, 2 replies; 18+ messages in thread
From: Dave Chinner @ 2016-10-31 12:02 UTC (permalink / raw)
  To: Libor Klepáč; +Cc: Brian Foster, linux-xfs

On Mon, Oct 31, 2016 at 09:54:00AM +0100, Libor Klepáč wrote:
> Hello,
> i have upgrade VM called vps1 in original email (the one on bottom of email)
> to debian kernel 4.7.8-1~bpo8+1 and compiled xfsprogs 4.8.0.
> 
> Here is output of xfs_repair -n and xfs_repair. Is it supposed to write if/
> what is being repaired, when i forgot -v option?

repair will always tell you if there are errors. The logs you have
posted indicate a clean filesystem with no errors.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-10-31 12:02         ` Dave Chinner
@ 2016-10-31 15:36           ` Libor Klepáč
  2016-11-08 11:09           ` Libor Klepáč
  1 sibling, 0 replies; 18+ messages in thread
From: Libor Klepáč @ 2016-10-31 15:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, linux-xfs

Hello,

On Monday, 31 October 2016 23:02:26 CET Dave Chinner wrote:
> On Mon, Oct 31, 2016 at 09:54:00AM +0100, Libor Klepáč wrote:
> > Hello,
> > i have upgrade VM called vps1 in original email (the one on bottom of
> > email) to debian kernel 4.7.8-1~bpo8+1 and compiled xfsprogs 4.8.0.
> > 
> > Here is output of xfs_repair -n and xfs_repair. Is it supposed to write
> > if/
> > what is being repaired, when i forgot -v option?
> 
> repair will always tell you if there are errors. The logs you have
> posted indicate a clean filesystem with no errors.

Ok, so this one looks good for now.
I have to arrange an "upgrade and xfs_repair run" window with another customer
to deal with the machine called vps2 in my original email.

> 
> Cheers,
> 
> Dave.

Thanks,

Libor


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-10-31 12:02         ` Dave Chinner
  2016-10-31 15:36           ` Libor Klepáč
@ 2016-11-08 11:09           ` Libor Klepáč
  2016-11-08 11:28             ` Libor Klepáč
  1 sibling, 1 reply; 18+ messages in thread
From: Libor Klepáč @ 2016-11-08 11:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, linux-xfs

Hello,
status update on vps2.
Booted to kernel 4.7.8-1~bpo8+1, xfsprogs 4.8.0.

First check it:
vps2:~# xfs_repair -n /dev/mapper/vgDisk2-lvData
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
Metadata corruption detected at xfs_attr3_leaf block 0x12645ef8/0x1000
Metadata corruption detected at xfs_attr3_leaf block 0x12f63f40/0x1000
        - agno = 2
Metadata corruption detected at xfs_attr3_leaf block 0x24c17ba8/0x1000
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
 
then repair it
vps2:~# xfs_repair /dev/mapper/vgDisk2-lvData
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
Metadata corruption detected at xfs_attr3_leaf block 0x12645ef8/0x1000
Metadata corruption detected at xfs_attr3_leaf block 0x12f63f40/0x1000
        - agno = 2
Metadata corruption detected at xfs_attr3_leaf block 0x24c17ba8/0x1000
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Done

But after approx two hours:
Nov  8 07:14:12 vps2 kernel: [ 6215.369387] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12f63f40
Nov  8 07:14:12 vps2 kernel: [ 6215.369468] XFS (dm-2): Unmount and run xfs_repair
Nov  8 07:14:12 vps2 kernel: [ 6215.369498] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Nov  8 07:14:12 vps2 kernel: [ 6215.369750] XFS (dm-2): metadata I/O error: block 0x12f63f40 ("xfs_trans_read_buf_map") error 117 numblks 8
Nov  8 07:21:16 vps2 kernel: [ 6639.384423] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12645ef8
Nov  8 07:21:16 vps2 kernel: [ 6639.384501] XFS (dm-2): Unmount and run xfs_repair
Nov  8 07:21:16 vps2 kernel: [ 6639.384530] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Nov  8 07:21:16 vps2 kernel: [ 6639.384792] XFS (dm-2): metadata I/O error: block 0x12645ef8 ("xfs_trans_read_buf_map") error 117 numblks 8

No problem since then, for now.

What should be done next?

Thanks,
Libor


On Monday, 31 October 2016 23:02:26 CET Dave Chinner wrote:
> On Mon, Oct 31, 2016 at 09:54:00AM +0100, Libor Klepáč wrote:
> > Hello,
> > i have upgrade VM called vps1 in original email (the one on bottom of
> > email) to debian kernel 4.7.8-1~bpo8+1 and compiled xfsprogs 4.8.0.
> > 
> > Here is output of xfs_repair -n and xfs_repair. Is it supposed to write
> > if/
> > what is being repaired, when i forgot -v option?
> 
> repair will always tell you if there are errors. The logs you have
> posted indicate a clean filesystem with no errors.
> 
> Cheers,
> 
> Dave.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-11-08 11:09           ` Libor Klepáč
@ 2016-11-08 11:28             ` Libor Klepáč
  2016-11-10  5:29               ` Dave Chinner
  0 siblings, 1 reply; 18+ messages in thread
From: Libor Klepáč @ 2016-11-08 11:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, linux-xfs

Adding more output from dmesg

Nov  8 07:14:12 vps2 kernel: [ 6215.369387] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12f63f40
Nov  8 07:14:12 vps2 kernel: [ 6215.369468] XFS (dm-2): Unmount and run xfs_repair
Nov  8 07:14:12 vps2 kernel: [ 6215.369498] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Nov  8 07:14:12 vps2 kernel: [ 6215.369538] ffff88018fe02000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Nov  8 07:14:12 vps2 kernel: [ 6215.369590] ffff88018fe02010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Nov  8 07:14:12 vps2 kernel: [ 6215.369642] ffff88018fe02020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Nov  8 07:14:12 vps2 kernel: [ 6215.369693] ffff88018fe02030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Nov  8 07:14:12 vps2 kernel: [ 6215.369750] XFS (dm-2): metadata I/O error: block 0x12f63f40 ("xfs_trans_read_buf_map") error 117 numblks 8

Nov  8 07:21:16 vps2 kernel: [ 6639.384423] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12645ef8
Nov  8 07:21:16 vps2 kernel: [ 6639.384501] XFS (dm-2): Unmount and run xfs_repair
Nov  8 07:21:16 vps2 kernel: [ 6639.384530] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Nov  8 07:21:16 vps2 kernel: [ 6639.384568] ffff88010d91c000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Nov  8 07:21:16 vps2 kernel: [ 6639.384617] ffff88010d91c010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Nov  8 07:21:16 vps2 kernel: [ 6639.384665] ffff88010d91c020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Nov  8 07:21:16 vps2 kernel: [ 6639.384734] ffff88010d91c030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Nov  8 07:21:16 vps2 kernel: [ 6639.384792] XFS (dm-2): metadata I/O error: block 0x12645ef8 ("xfs_trans_read_buf_map") error 117 numblks 8
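
A side note on units: the block numbers in these messages appear to be disk addresses in 512-byte basic blocks, so "numblks 8" would be a 4096-byte buffer, which matches the "/0x1000" length in the xfs_repair output. A small sketch of that arithmetic, assuming the 512-byte unit (the helper name is made up for illustration):

```python
BASIC_BLOCK_SIZE = 512  # assumed unit of "block 0x..." and "numblks" in the log


def daddr_to_bytes(daddr, numblks):
    """Return (byte offset on the device, length in bytes) for a buffer."""
    return daddr * BASIC_BLOCK_SIZE, numblks * BASIC_BLOCK_SIZE


for daddr in (0x12F63F40, 0x12645EF8):
    offset, length = daddr_to_bytes(daddr, 8)
    print(f"daddr {daddr:#x}: byte offset {offset:#x}, length {length:#x}")
```

The byte offset is relative to the start of the block device holding the filesystem (here the LV), not to any file.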

Libor

On Tuesday, 8 November 2016 12:09:51 CET Libor Klepáč wrote:
> Hello,
> status update on vps2.
> Booted to kernel 4.7.8-1~bpo8+1, xfsprogs 4.8.0.
> 
> First check it:
> vps2:~# xfs_repair -n /dev/mapper/vgDisk2-lvData
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
> Metadata corruption detected at xfs_attr3_leaf block 0x12645ef8/0x1000
> Metadata corruption detected at xfs_attr3_leaf block 0x12f63f40/0x1000
>         - agno = 2
> Metadata corruption detected at xfs_attr3_leaf block 0x24c17ba8/0x1000
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
> 
> then repair it
> vps2:~# xfs_repair /dev/mapper/vgDisk2-lvData
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
> Metadata corruption detected at xfs_attr3_leaf block 0x12645ef8/0x1000
> Metadata corruption detected at xfs_attr3_leaf block 0x12f63f40/0x1000
>         - agno = 2
> Metadata corruption detected at xfs_attr3_leaf block 0x24c17ba8/0x1000
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> Done
> 
> But after approx two hours:
> Nov  8 07:14:12 vps2 kernel: [ 6215.369387] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12f63f40
> Nov  8 07:14:12 vps2 kernel: [ 6215.369468] XFS (dm-2): Unmount and run xfs_repair
> Nov  8 07:14:12 vps2 kernel: [ 6215.369498] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Nov  8 07:14:12 vps2 kernel: [ 6215.369750] XFS (dm-2): metadata I/O error: block 0x12f63f40 ("xfs_trans_read_buf_map") error 117 numblks 8
> Nov  8 07:21:16 vps2 kernel: [ 6639.384423] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12645ef8
> Nov  8 07:21:16 vps2 kernel: [ 6639.384501] XFS (dm-2): Unmount and run xfs_repair
> Nov  8 07:21:16 vps2 kernel: [ 6639.384530] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
> Nov  8 07:21:16 vps2 kernel: [ 6639.384792] XFS (dm-2): metadata I/O error: block 0x12645ef8 ("xfs_trans_read_buf_map") error 117 numblks 8
> 
> No problem since then, for now.
> 
> What should be done next?
> 
> Thanks,
> Libor
> 
> On Monday, 31 October 2016 23:02:26 CET Dave Chinner wrote:
> > On Mon, Oct 31, 2016 at 09:54:00AM +0100, Libor Klepáč wrote:
> > > Hello,
> > > i have upgrade VM called vps1 in original email (the one on bottom of
> > > email) to debian kernel 4.7.8-1~bpo8+1 and compiled xfsprogs 4.8.0.
> > > 
> > > Here is output of xfs_repair -n and xfs_repair. Is it supposed to write
> > > if/
> > > what is being repaired, when i forgot -v option?
> > 
> > repair will always tell you if there are errors. The logs you have
> > posted indicate a clean filesystem with no errors.
> > 
> > Cheers,
> > 
> > Dave.
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-11-08 11:28             ` Libor Klepáč
@ 2016-11-10  5:29               ` Dave Chinner
       [not found]                 ` <2152865.L3K5Xz7SXO@libor-nb>
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2016-11-10  5:29 UTC (permalink / raw)
  To: Libor Klepáč; +Cc: Brian Foster, linux-xfs

On Tue, Nov 08, 2016 at 12:28:54PM +0100, Libor Klepáč wrote:
> Adding more output from dmesg
> 
>  XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_read_verify+0x5a/0x100 [xfs], xfs_attr3_leaf block 0x12f63f40
>  XFS (dm-2): Unmount and run xfs_repair
>  XFS (dm-2): First 64 bytes of corrupted metadata buffer:
>  ffff88018fe02000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
>  ffff88018fe02010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
>  ffff88018fe02020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>  ffff88018fe02030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>  XFS (dm-2): metadata I/O error: block 0x12f63f40 ("xfs_trans_read_buf_map") error 117 numblks 8

So, again, it is empty attribute blocks that are being tripped over
at blkno 0x12f63f40 and 0x12645ef8

Which:

> > Phase 3 - for each AG...
> >         - scan (but don't clear) agi unlinked lists...
> >         - process known inodes and perform inode discovery...
> >         - agno = 0
> >         - agno = 1
> > Metadata corruption detected at xfs_attr3_leaf block 0x12645ef8/0x1000
> > Metadata corruption detected at xfs_attr3_leaf block 0x12f63f40/0x1000

These two blocks. It looks like repair didn't clean them up?
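
The match between the kernel message and the repair output can be checked mechanically. This is only a sketch: it assumes the kernel reports the block number and numblks in 512-byte basic blocks, and that the "/0x1000" suffix in the repair output is the buffer length in bytes. The errno names are the generic Linux values (XFS uses EUCLEAN for EFSCORRUPTED); the helper names are made up for illustration.

```python
import re

SECTOR_SIZE = 512  # assumption: XFS disk addresses are in 512-byte basic blocks

# Generic Linux errno values seen in this thread; XFS maps EFSCORRUPTED to EUCLEAN.
ERRNO_NAMES = {5: "EIO", 117: "EUCLEAN (EFSCORRUPTED)"}

KERNEL_LINE = ('XFS (dm-2): metadata I/O error: block 0x12f63f40 '
               '("xfs_trans_read_buf_map") error 117 numblks 8')
REPAIR_LINE = ("Metadata corruption detected at xfs_attr3_leaf "
               "block 0x12f63f40/0x1000")


def parse_kernel(line):
    """Return (block, errno, numblks) from a kernel 'metadata I/O error' line."""
    m = re.search(r"block (0x[0-9a-f]+) .*error (\d+) numblks (\d+)", line)
    return int(m.group(1), 16), int(m.group(2)), int(m.group(3))


def parse_repair(line):
    """Return (block, length_in_bytes) from an xfs_repair corruption line."""
    m = re.search(r"block (0x[0-9a-f]+)/(0x[0-9a-f]+)", line)
    return int(m.group(1), 16), int(m.group(2), 16)


k_block, k_errno, k_numblks = parse_kernel(KERNEL_LINE)
r_block, r_bytes = parse_repair(REPAIR_LINE)

# Same starting block and same length (8 * 512 bytes == 0x1000 bytes)?
same_buffer = k_block == r_block and k_numblks * SECTOR_SIZE == r_bytes

print("errno:", ERRNO_NAMES.get(k_errno, "unknown"))
print("same buffer:", same_buffer)
```

Under those assumptions, the block that repair complained about and the one the kernel trips over afterwards are the same 4096-byte buffer.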

Hmmmm - looking at the code I'm not sure that repair detects and
removes empty attr leaf blocks, which would explain why the error
showed up again. Can you provide a metadump of the filesystem so we
can dig into the exact nature of the problem you are seeing?
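
For reference, the first bytes of the dump quoted above can be decoded by hand to see why these blocks count as empty. The sketch below assumes the non-CRC attr leaf header layout (a da blkinfo of forw/back/magic/pad, then count, usedbytes, firstused, holes, pad1 and the first freemap entry, all big-endian). The field names follow my reading of the on-disk format and should be treated as an assumption, not as the kernel's definition.

```python
import struct

# First 32 bytes of the "corrupted metadata buffer" dump from the log.
dump = bytes.fromhex(
    "00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00 "
    "10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00"
)

# Assumed layout: forw(4) back(4) magic(2) pad(2)
#                 count(2) usedbytes(2) firstused(2) holes(1) pad1(1)
#                 freemap[0].base(2) freemap[0].size(2)
HEADER_FORMAT = ">IIHHHHHBBHH"

(forw, back, magic, pad,
 count, usedbytes, firstused, holes, pad1,
 base0, size0) = struct.unpack_from(HEADER_FORMAT, dump, 0)

print(f"magic     = {magic:#06x}")
print(f"count     = {count}")
print(f"usedbytes = {usedbytes}")
print(f"firstused = {firstused:#x}")
print(f"freemap0  = base {base0:#x}, size {size0:#x}")
```

Read this way, the magic number is 0xfbee and the block is internally consistent (the free space runs from the end of the header to the end of the 4096-byte block), but it holds zero entries. That is consistent with the verifier rejecting an attr leaf purely because it is empty, rather than because the contents are garbage.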

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
       [not found]                 ` <2152865.L3K5Xz7SXO@libor-nb>
@ 2016-11-10 21:30                   ` Dave Chinner
  2016-11-23 11:40                     ` Libor Klepáč
  2016-12-06  9:08                     ` Libor Klepáč
  0 siblings, 2 replies; 18+ messages in thread
From: Dave Chinner @ 2016-11-10 21:30 UTC (permalink / raw)
  To: Libor Klepáč; +Cc: Brian Foster, linux-xfs

On Thu, Nov 10, 2016 at 05:04:48PM +0100, Libor Klepáč wrote:
> On Thursday, 10 November 2016 16:29:15 CET Dave Chinner wrote:
> > Which:
> > > > Phase 3 - for each AG...
> > > > 
> > > >         - scan (but don't clear) agi unlinked lists...
> > > >         - process known inodes and perform inode discovery...
> > > >         - agno = 0
> > > >         - agno = 1
> > > > 
> > > > Metadata corruption detected at xfs_attr3_leaf block 0x12645ef8/0x1000
> > > > Metadata corruption detected at xfs_attr3_leaf block 0x12f63f40/0x1000
> > 
> > These two blocks. It looks like repair didn't clean them up?
> > 
> > Hmmmm - looking at the code I'm not sure that repair detects and
> > removes empty attr leaf blocks, which would explain why the error
> > showed up again. Can you provide a metadump of the filesystem so we
> > can dig into the exact nature of the problem you are seeing?
> 
> Sure, not a problem. How long will it take, given that xfs_repair took approx. 40 minutes?

No longer than that, with a good possibility it will be much faster
as metadump only needs 1 pass over the metadata, not three...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-11-10 21:30                   ` Dave Chinner
@ 2016-11-23 11:40                     ` Libor Klepáč
  2016-11-26  6:05                       ` Eric Sandeen
  2016-12-06  9:08                     ` Libor Klepáč
  1 sibling, 1 reply; 18+ messages in thread
From: Libor Klepáč @ 2016-11-23 11:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, linux-xfs

Hello,
a colleague of mine tried metadump, but it segfaults :-(

# xfs_metadump -g /dev/vgDisk2/lvData /mnt/xfs_dump/vps2_metadump
Copied 2588032 of 5767488 inodes (1 of 3 AGs)              Segmentation fault

He ran it twice; the segfault occurred at the same place both times.

dmesg says:
Nov 23 05:04:42 vps2 kernel: [1294386.804427] xfs_db[25156]: segfault at fffffffffffffff0 ip 00007f6bd7bc567d sp 00007ffc1a2897b8 error 7 in libc-2.19.so[7f6bd7b40000+1a1000]
Nov 23 05:20:22 vps2 kernel: [1295327.311413] xfs_db[28241]: segfault at fffffffffffffff0 ip 00007f7b7426d67d sp 00007ffddefeb4f8 error 7 in libc-2.19.so[7f7b741e8000+1a1000]
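
A quick way to confirm that both crashes hit the same instruction is to subtract the library load address from the instruction pointer in each line. This is only a sketch: the regular expression and the helper name are made up for illustration and assume the usual x86-64 "segfault at ... ip ... in lib[base+size]" dmesg format.

```python
import re

DMESG_LINES = [
    "xfs_db[25156]: segfault at fffffffffffffff0 ip 00007f6bd7bc567d "
    "sp 00007ffc1a2897b8 error 7 in libc-2.19.so[7f6bd7b40000+1a1000]",
    "xfs_db[28241]: segfault at fffffffffffffff0 ip 00007f7b7426d67d "
    "sp 00007ffddefeb4f8 error 7 in libc-2.19.so[7f7b741e8000+1a1000]",
]

PATTERN = re.compile(
    r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) .* in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def fault_info(line):
    """Return (library, offset of ip inside library, signed fault address)."""
    m = PATTERN.search(line)
    fault_addr = int(m.group(1), 16)
    ip = int(m.group(2), 16)
    base = int(m.group(4), 16)
    # Interpret the 64-bit fault address as a signed value.
    if fault_addr >= 1 << 63:
        fault_addr -= 1 << 64
    return m.group(3), ip - base, fault_addr


results = [fault_info(line) for line in DMESG_LINES]
for lib, offset, addr in results:
    print(f"{lib}: ip offset {offset:#x}, fault address {addr}")
```

Both lines resolve to the same offset inside libc-2.19.so, and the faulting address is 16 bytes below zero in both runs, which fits a deterministic failure on the same metadata rather than a random one.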

Are you interested in this partial metadump? It is over 3GB decompressed.
I will keep it here for a few days:
https://download.bcom.cz/vps2_metadump_20161123.xz

btw, it seems to contain strings from inside of files, is that normal? 
It also contains strings from mail.log which is on another (ext4) filesystem.

Thanks,
Libor


> On Thu, Nov 10, 2016 at 05:04:48PM +0100, Libor Klepáč wrote:
> > On Thursday, 10 November 2016 16:29:15 CET Dave Chinner wrote:
> > > Which:
> > > > > Phase 3 - for each AG...
> > > > > 
> > > > >         - scan (but don't clear) agi unlinked lists...
> > > > >         - process known inodes and perform inode discovery...
> > > > >         - agno = 0
> > > > >         - agno = 1
> > > > > 
> > > > > Metadata corruption detected at xfs_attr3_leaf block 0x12645ef8/0x1000
> > > > > Metadata corruption detected at xfs_attr3_leaf block 0x12f63f40/0x1000
> > > 
> > > These two blocks. It looks like repair didn't clean them up?
> > > 
> > > Hmmmm - looking at the code I'm not sure that repair detects and
> > > removes empty attr leaf blocks, which would explain why the error
> > > showed up again. Can you provide a metadump of the filesystem so we
> > > can dig into the exact nature of the problem you are seeing?
> > 
> > Sure, not a problem. How long will it take, given that xfs_repair took approx. 40 minutes?
> 
> No longer than that, with a good possibility it will be much faster
> as metadump only needs 1 pass over the metadata, not three...
> 
> Cheers,
> 
> Dave.
> 




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-11-23 11:40                     ` Libor Klepáč
@ 2016-11-26  6:05                       ` Eric Sandeen
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Sandeen @ 2016-11-26  6:05 UTC (permalink / raw)
  To: Libor Klepáč; +Cc: Dave Chinner, Brian Foster, linux-xfs

On Nov 23, 2016, at 5:40 AM, Libor Klepáč <libor.klepac@bcom.cz> wrote:
> 
> btw, it seems to contain strings from inside of files, is that normal? 
> It also contains strings from mail.log which is on another (ext4) filesystem.

Please try a newer version of metadump; it should not leak any stale data at all, and hopefully will not segfault, either.

Eric



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
  2016-11-10 21:30                   ` Dave Chinner
  2016-11-23 11:40                     ` Libor Klepáč
@ 2016-12-06  9:08                     ` Libor Klepáč
  1 sibling, 0 replies; 18+ messages in thread
From: Libor Klepáč @ 2016-12-06  9:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, linux-xfs

Hello,
did you get anything useful from the partial metadata dump?

Meanwhile, we have another VPS/machine acting like that. This one was installed as Debian Jessie,
so it was always on some version of kernel 3.16 (+xfsprogs 3.2.1).
I will upgrade to kernel 4.7.8 and xfsprogs 4.8.0 and run check, repair and metadata dump.
The error has some new lines:

Dec  6 04:00:36 vps3 kernel: [29332726.258682] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xd5/0xe0 [xfs], block 0x4878b30
Dec  6 04:00:36 vps3 kernel: [29332726.259234] XFS (dm-2): Unmount and run xfs_repair
Dec  6 04:00:36 vps3 kernel: [29332726.259598] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Dec  6 04:00:36 vps3 kernel: [29332726.259929] ffff880129d9b000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Dec  6 04:00:36 vps3 kernel: [29332726.260661] ffff880129d9b010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Dec  6 04:00:36 vps3 kernel: [29332726.261552] ffff880129d9b020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Dec  6 04:00:36 vps3 kernel: [29332726.262594] ffff880129d9b030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Dec  6 04:00:36 vps3 kernel: [29332726.263800] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1330 of file /build/linux-HklQoT/linux-3.16.7-ckt20/fs/xfs/xfs_buf.c.  Return address = 0xffffffffa0385820
Dec  6 04:00:36 vps3 kernel: [29332726.277233] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
Dec  6 04:00:36 vps3 kernel: [29332726.277926] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
Dec  6 04:00:36 vps3 kernel: [29332726.285057] Buffer I/O error on device dm-2, logical block 10636433
Dec  6 04:00:36 vps3 kernel: [29332726.285854] lost page write due to I/O error on dm-2
Dec  6 04:00:36 vps3 kernel: [29332726.285860] Buffer I/O error on device dm-2, logical block 10636434
Dec  6 04:00:36 vps3 kernel: [29332726.286580] lost page write due to I/O error on dm-2
Dec  6 04:00:36 vps3 kernel: [29332726.286602] Buffer I/O error on device dm-2, logical block 14169416
Dec  6 04:00:36 vps3 kernel: [29332726.287347] lost page write due to I/O error on dm-2
Dec  6 04:00:36 vps3 kernel: [29332726.287354] Buffer I/O error on device dm-2, logical block 13145613
Dec  6 04:00:36 vps3 kernel: [29332726.288100] lost page write due to I/O error on dm-2
Dec  6 04:00:36 vps3 kernel: [29332726.288105] Buffer I/O error on device dm-2, logical block 13145614
Dec  6 04:00:36 vps3 kernel: [29332726.288851] lost page write due to I/O error on dm-2
Dec  6 04:00:36 vps3 kernel: [29332726.288856] Buffer I/O error on device dm-2, logical block 13145615
Dec  6 04:00:36 vps3 kernel: [29332726.289611] lost page write due to I/O error on dm-2
Dec  6 04:00:36 vps3 kernel: [29332726.289615] Buffer I/O error on device dm-2, logical block 13145616
Dec  6 04:00:36 vps3 kernel: [29332726.290347] lost page write due to I/O error on dm-2
Dec  6 04:00:36 vps3 kernel: [29332726.290352] Buffer I/O error on device dm-2, logical block 13145617
Dec  6 04:00:36 vps3 kernel: [29332726.291072] lost page write due to I/O error on dm-2
Dec  6 04:00:36 vps3 kernel: [29332726.291075] Buffer I/O error on device dm-2, logical block 13145618
Dec  6 04:00:36 vps3 kernel: [29332726.291814] lost page write due to I/O error on dm-2
Dec  6 04:00:36 vps3 kernel: [29332726.291819] Buffer I/O error on device dm-2, logical block 13145619
Dec  6 04:00:36 vps3 kernel: [29332726.292535] lost page write due to I/O error on dm-2
Dec  6 04:00:48 vps3 kernel: [29332737.898720] XFS (dm-2): xfs_log_force: error 5 returned.


dm-2 is a logical volume created on a single disk without partitions.


Could it be a hardware problem? The physical servers do have ECC memory and hardware RAID.

Thanks,
Libor


On Friday, 11 November 2016 8:30:57 CET Dave Chinner wrote:
> On Thu, Nov 10, 2016 at 05:04:48PM +0100, Libor Klepáč wrote:
> > On Thursday, 10 November 2016 16:29:15 CET Dave Chinner wrote:
> > > Which:
> > > > > Phase 3 - for each AG...
> > > > > 
> > > > >         - scan (but don't clear) agi unlinked lists...
> > > > >         - process known inodes and perform inode discovery...
> > > > >         - agno = 0
> > > > >         - agno = 1
> > > > > 
> > > > > Metadata corruption detected at xfs_attr3_leaf block 0x12645ef8/0x1000
> > > > > Metadata corruption detected at xfs_attr3_leaf block 0x12f63f40/0x1000
> > > 
> > > These two blocks. It looks like repair didn't clean them up?
> > > 
> > > Hmmmm - looking at the code I'm not sure that repair detects and
> > > removes empty attr leaf blocks, which would explain why the error
> > > showed up again. Can you provide a metadump of the filesystem so we
> > > can dig into the exact nature of the problem you are seeing?
> > 
> > Sure, not a problem. How long will it take, given that xfs_repair took approx. 40 minutes?
> 
> No longer than that, with a good possibility it will be much faster
> as metadump only needs 1 pass over the metadata, not three...
> 
> Cheers,
> 
> Dave.
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify
@ 2016-10-21 12:46 Libor Klepáč
  0 siblings, 0 replies; 18+ messages in thread
From: Libor Klepáč @ 2016-10-21 12:46 UTC (permalink / raw)
  To: linux-xfs

Hello,
last week we started having problems with one virtual machine running Debian jessie, with kernel 3.16.7-ckt20-1+deb8u4.

There are some logs.


I don't have complete logs, but the filesystem shut down twice with:

Oct 14 12:00:24 bajkonur2 kernel: [18872956.205316] XFS (dm-2): Metadata corruption detected at xfs_attr3_leaf_write_verify+0xd5/0xe0 [xfs], block 0x12f63f40
Oct 14 12:00:24 bajkonur2 kernel: [18872956.208382] XFS (dm-2): Unmount and run xfs_repair
Oct 14 12:00:24 bajkonur2 kernel: [18872956.209385] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
Oct 14 12:00:24 bajkonur2 kernel: [18872956.210187] ffff88011dadd000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Oct 14 12:00:24 bajkonur2 kernel: [18872956.211816] ffff88011dadd010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Oct 14 12:00:24 bajkonur2 kernel: [18872956.213390] ffff88011dadd020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 14 12:00:24 bajkonur2 kernel: [18872956.214983] ffff88011dadd030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 14 12:00:24 bajkonur2 kernel: [18872956.216598] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1330 of file /build/linux-U7H2aZ/linux-3.16.7-ckt20/fs/xfs/xfs_buf.c.  Return address = 0xffffffffa03ef820
Oct 14 12:00:24 bajkonur2 kernel: [18872956.217448] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
Oct 14 12:00:24 bajkonur2 kernel: [18872956.218338] XFS (dm-2): Please umount the filesystem and rectify the problem(s)

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-12-06  9:09 UTC | newest]

Thread overview: 18+ messages
2016-10-21 17:09 BUG: Metadata corruption detected at xfs_attr3_leaf_read_verify Libor Klepáč
2016-10-21 17:59 ` Brian Foster
2016-10-21 22:20   ` Dave Chinner
2016-10-23  6:48   ` Libor Klepáč
2016-10-24  2:40     ` Dave Chinner
2016-10-25  6:52       ` Libor Klepáč
2016-10-31  8:54       ` Libor Klepáč
2016-10-31 11:57         ` Brian Foster
2016-10-31 12:02         ` Dave Chinner
2016-10-31 15:36           ` Libor Klepáč
2016-11-08 11:09           ` Libor Klepáč
2016-11-08 11:28             ` Libor Klepáč
2016-11-10  5:29               ` Dave Chinner
     [not found]                 ` <2152865.L3K5Xz7SXO@libor-nb>
2016-11-10 21:30                   ` Dave Chinner
2016-11-23 11:40                     ` Libor Klepáč
2016-11-26  6:05                       ` Eric Sandeen
2016-12-06  9:08                     ` Libor Klepáč
  -- strict thread matches above, loose matches on Subject: below --
2016-10-21 12:46 Libor Klepáč
