* xfs_repair segfault
@ 2015-03-09 15:50 Rui Gomes
  2015-03-09 15:55 ` Carsten Aulbert
  2015-03-09 16:14 ` Eric Sandeen
  0 siblings, 2 replies; 23+ messages in thread
From: Rui Gomes @ 2015-03-09 15:50 UTC (permalink / raw)
  To: xfs; +Cc: Ómar Hermannsson


[-- Attachment #1.1: Type: text/plain, Size: 698 bytes --]

Hello Guys, 

I have an XFS filesystem that went bad: the filesystem got corrupted without any hardware failure or power outage, which is strange enough.
After that we ran xfs_repair and it segfaulted. The system was originally CentOS 6.5; we upgraded to CentOS 7 (xfsprogs-3.2.0-0.10.alpha2.el7.x86_64) and ran the newer version of xfs_repair, with the same result: a segmentation fault.


Full output and a GDB backtrace are in the attachment. Do you guys have any advice on how we can get xfs_repair to do a clean run?



Regards 

------------------------------- 
Rui Gomes 
CTO 


RVX - Reykjavik Visual Effects 
Seljavegur 2, 
101 Reykjavik 
Iceland 


Tel: + 354 527 3330 
Mob: + 354 663 3360 

[-- Attachment #1.2.1: Type: text/html, Size: 7086 bytes --]

[-- Attachment #1.2.2: rvx-logo.png --]
[-- Type: image/png, Size: 2226 bytes --]

[-- Attachment #2: gdb.txt --]
[-- Type: text/plain, Size: 6507 bytes --]

Starting program: /usr/sbin/xfs_repair -n -P -m 500000000000000 /dev/sdb1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffb0ec9700 (LWP 16151)]
[New Thread 0x7fffb06c8700 (LWP 16152)]
[New Thread 0x7fffafec7700 (LWP 16153)]
[New Thread 0x7fffaf6c6700 (LWP 16154)]
[New Thread 0x7fffaeec5700 (LWP 16155)]
[New Thread 0x7fffae6c4700 (LWP 16156)]
[New Thread 0x7fffadec3700 (LWP 16157)]
[New Thread 0x7fffad6c2700 (LWP 16158)]
[New Thread 0x7fffacec1700 (LWP 16159)]
[New Thread 0x7fffac6c0700 (LWP 16160)]
[New Thread 0x7fffabebf700 (LWP 16161)]
[New Thread 0x7fffab6be700 (LWP 16162)]
[New Thread 0x7fffaaebd700 (LWP 16163)]
[New Thread 0x7fffaa6bc700 (LWP 16164)]
[New Thread 0x7fffa9ebb700 (LWP 16165)]
[New Thread 0x7fffa96ba700 (LWP 16166)]
[New Thread 0x7fffa8eb9700 (LWP 16167)]
[New Thread 0x7fffa86b8700 (LWP 16168)]
[New Thread 0x7fffa7eb7700 (LWP 16169)]
[New Thread 0x7fffa76b6700 (LWP 16170)]
[New Thread 0x7fffa6eb5700 (LWP 16171)]
[New Thread 0x7fffa66b4700 (LWP 16172)]
[New Thread 0x7fffa5eb3700 (LWP 16173)]
[New Thread 0x7fffa56b2700 (LWP 16174)]
[New Thread 0x7fffa4eb1700 (LWP 16175)]
[New Thread 0x7fffa46b0700 (LWP 16176)]
[New Thread 0x7fffa3eaf700 (LWP 16177)]
[New Thread 0x7fffa36ae700 (LWP 16178)]
[New Thread 0x7fffa2ead700 (LWP 16179)]
[New Thread 0x7fffa26ac700 (LWP 16180)]
[New Thread 0x7fffa1eab700 (LWP 16181)]
[New Thread 0x7fffa16aa700 (LWP 16182)]
[Thread 0x7fffac6c0700 (LWP 16160) exited]
[Thread 0x7fffa86b8700 (LWP 16168) exited]
[Thread 0x7fffacec1700 (LWP 16159) exited]
[Thread 0x7fffa76b6700 (LWP 16170) exited]
[Thread 0x7fffa56b2700 (LWP 16174) exited]
[Thread 0x7fffa6eb5700 (LWP 16171) exited]
[Thread 0x7fffa96ba700 (LWP 16166) exited]
[Thread 0x7fffa9ebb700 (LWP 16165) exited]
[Thread 0x7fffaa6bc700 (LWP 16164) exited]
[Thread 0x7fffabebf700 (LWP 16161) exited]
[Thread 0x7fffab6be700 (LWP 16162) exited]
[Thread 0x7fffa4eb1700 (LWP 16175) exited]
[Thread 0x7fffa16aa700 (LWP 16182) exited]
[Thread 0x7fffa8eb9700 (LWP 16167) exited]
[Thread 0x7fffa5eb3700 (LWP 16173) exited]
[Thread 0x7fffa2ead700 (LWP 16179) exited]
[Thread 0x7fffae6c4700 (LWP 16156) exited]
[Thread 0x7fffadec3700 (LWP 16157) exited]
[Thread 0x7fffa7eb7700 (LWP 16169) exited]
[Thread 0x7fffaeec5700 (LWP 16155) exited]
[Thread 0x7fffad6c2700 (LWP 16158) exited]
[Thread 0x7fffa1eab700 (LWP 16181) exited]
[Thread 0x7fffb0ec9700 (LWP 16151) exited]
[Thread 0x7fffafec7700 (LWP 16153) exited]
[Thread 0x7fffa26ac700 (LWP 16180) exited]
[Thread 0x7fffb06c8700 (LWP 16152) exited]
[Thread 0x7fffaaebd700 (LWP 16163) exited]
[Thread 0x7fffa3eaf700 (LWP 16177) exited]
[Thread 0x7fffa66b4700 (LWP 16172) exited]
[Thread 0x7fffaf6c6700 (LWP 16154) exited]
[Thread 0x7fffa46b0700 (LWP 16176) exited]
[Thread 0x7fffa36ae700 (LWP 16178) exited]

Program received signal SIGABRT, Aborted.
0x00007ffff74275c9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
#0  0x00007ffff74275c9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff7428cd8 in __GI_abort () at abort.c:90
#2  0x00007ffff7467db7 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff756f561 "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007ffff74ff9c7 in __GI___fortify_fail (msg=msg@entry=0x7ffff756f507 "buffer overflow detected") at fortify_fail.c:31
#4  0x00007ffff74fdb90 in __GI___chk_fail () at chk_fail.c:28
#5  0x0000000000414ea8 in memmove (__len=18446744073709551615, __src=0x1e562094, __dest=0x7fffffffd8f0) at /usr/include/bits/string3.h:57
#6  process_sf_dir2 (dirname=0x46b0e2 "", repair=<synthetic pointer>, parent=0x7fffffffdc20, dino_dirty=0x7fffffffdc18, ino_discovery=1, dip=0x1e562000, ino=260256256, mp=0x1e562091) at dir2.c:992
#7  process_dir2 (mp=mp@entry=0x7fffffffe020, ino=ino@entry=260256256, dip=dip@entry=0x1e562000, ino_discovery=ino_discovery@entry=1, dino_dirty=dino_dirty@entry=0x7fffffffdc18, dirname=dirname@entry=0x46b0e2 "", 
    parent=parent@entry=0x7fffffffdc20, blkmap=0x0) at dir2.c:1988
#8  0x000000000041189f in process_dinode_int (mp=mp@entry=0x7fffffffe020, dino=dino@entry=0x1e562000, agno=agno@entry=0, ino=ino@entry=260256256, was_free=<optimized out>, dirty=dirty@entry=0x7fffffffdc18, 
    used=used@entry=0x7fffffffdc14, verify_mode=verify_mode@entry=0, uncertain=uncertain@entry=0, ino_discovery=ino_discovery@entry=1, check_dups=check_dups@entry=0, extra_attr_check=extra_attr_check@entry=1, 
    isa_dir=isa_dir@entry=0x7fffffffdc1c, parent=parent@entry=0x7fffffffdc20) at dinode.c:2881
#9  0x00000000004124ce in process_dinode (mp=mp@entry=0x7fffffffe020, dino=dino@entry=0x1e562000, agno=agno@entry=0, ino=ino@entry=260256256, was_free=<optimized out>, dirty=dirty@entry=0x7fffffffdc18, used=used@entry=0x7fffffffdc14, 
    ino_discovery=ino_discovery@entry=1, check_dups=check_dups@entry=0, extra_attr_check=extra_attr_check@entry=1, isa_dir=isa_dir@entry=0x7fffffffdc1c, parent=parent@entry=0x7fffffffdc20) at dinode.c:2989
#10 0x000000000040b96f in process_inode_chunk (mp=mp@entry=0x7fffffffe020, agno=agno@entry=0, first_irec=first_irec@entry=0x7fff9c55b580, ino_discovery=ino_discovery@entry=1, check_dups=check_dups@entry=0, 
    extra_attr_check=extra_attr_check@entry=1, bogus=bogus@entry=0x7fffffffdca4, num_inos=<optimized out>) at dino_chunks.c:772
#11 0x000000000040cddd in process_aginodes (mp=0x7fffffffe020, pf_args=pf_args@entry=0x0, agno=agno@entry=0, ino_discovery=ino_discovery@entry=1, check_dups=check_dups@entry=0, extra_attr_check=extra_attr_check@entry=1)
    at dino_chunks.c:1025
#12 0x000000000041964e in process_ag_func (wq=0x7fffffffdd90, agno=0, arg=0x0) at phase3.c:77
#13 0x00000000004265da in prefetch_ag_range (work=0x7fffffffdd90, start_ag=<optimized out>, end_ag=32, dirs_only=false, func=0x419600 <process_ag_func>) at prefetch.c:907
#14 0x000000000042666c in do_inode_prefetch (mp=mp@entry=0x7fffffffe020, stride=0, func=func@entry=0x419600 <process_ag_func>, check_cache=check_cache@entry=false, dirs_only=dirs_only@entry=false) at prefetch.c:970
#15 0x000000000041975d in process_ags (mp=0x7fffffffe020) at phase3.c:85
#16 phase3 (mp=mp@entry=0x7fffffffe020) at phase3.c:121
#17 0x000000000040388e in main (argc=<optimized out>, argv=<optimized out>) at xfs_repair.c:785
A debugging session is active.

	Inferior 1 [process 16147] will be killed.

Quit anyway? (y or n) 

[-- Attachment #3: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs_repair segfault
  2015-03-09 15:50 xfs_repair segfault Rui Gomes
@ 2015-03-09 15:55 ` Carsten Aulbert
  2015-03-09 16:11   ` Rui Gomes
  2015-03-09 16:14 ` Eric Sandeen
  1 sibling, 1 reply; 23+ messages in thread
From: Carsten Aulbert @ 2015-03-09 15:55 UTC (permalink / raw)
  To: Rui Gomes, xfs; +Cc: Ómar Hermannsson

Hi Rui

On 03/09/2015 04:50 PM, Rui Gomes wrote:
> Full output and GDB Backtrace in the attachment, do you guys have any
> advice how can we get xfs_repair to do a clean run?
> 

At the very least (though I'm not sure if that will already fix it) I
think you need to change the -m flag:


/usr/sbin/xfs_repair -n -P -m 500000000000000 /dev/sdb1

according to man page:

 -m maxmem
        Specifies the approximate maximum amount of memory, in megabytes,
        to use for xfs_repair.  xfs_repair has its own internal block
        cache which will scale out up to the lesser of the process's
        virtual address limit or about 75% of the system's physical RAM.
        This option overrides these limits.

        NOTE: These memory limits are only approximate and may use more
        than the specified limit.

and I doubt your machine has that much memory, possibly just drop it for
now.
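
For example (the sizes here are purely illustrative), either drop the
flag entirely or pass something on the order of the machine's real RAM:

 /usr/sbin/xfs_repair -n -P /dev/sdb1
 /usr/sbin/xfs_repair -n -P -m 16384 /dev/sdb1    (assuming a ~16 GB box)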

Cheers

Carsten

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs_repair segfault
  2015-03-09 15:55 ` Carsten Aulbert
@ 2015-03-09 16:11   ` Rui Gomes
  0 siblings, 0 replies; 23+ messages in thread
From: Rui Gomes @ 2015-03-09 16:11 UTC (permalink / raw)
  To: Carsten Aulbert; +Cc: omar, xfs

[-- Attachment #1: Type: text/plain, Size: 1496 bytes --]

Hello Carsten,

Thank you for the quick reply. We tried many different combinations; without the -m flag the result is the same.
A new gdb dump is in the attachment, with -m replaced by -v.

Regards 

------------------------------- 
Rui Gomes 
CTO 


RVX - Reykjavik Visual Effects 
Seljavegur 2, 
101 Reykjavik 
Iceland 


Tel: + 354 527 3330 
Mob: + 354 663 3360

----- Original Message -----
From: "Carsten Aulbert" <Carsten.Aulbert@aei.mpg.de>
To: "Rui Gomes" <rgomes@rvx.is>, "xfs" <xfs@oss.sgi.com>
Cc: "omar" <omar@rvx.is>
Sent: Monday, 9 March, 2015 15:55:00
Subject: xfs_repair segfault

Hi Rui

On 03/09/2015 04:50 PM, Rui Gomes wrote:
> Full output and GDB Backtrace in the attachment, do you guys have any
> advice how can we get xfs_repair to do a clean run?
> 

At the very least (though I'm not sure if that will already fix it) I
think you need to change the -m flag:


/usr/sbin/xfs_repair -n -P -m 500000000000000 /dev/sdb1

according to man page:

 -m maxmem
        Specifies the approximate maximum amount of memory, in megabytes,
        to use for xfs_repair.  xfs_repair has its own internal block
        cache which will scale out up to the lesser of the process's
        virtual address limit or about 75% of the system's physical RAM.
        This option overrides these limits.

        NOTE: These memory limits are only approximate and may use more
        than the specified limit.

and I doubt your machine has that much memory, possibly just drop it for
now.

Cheers

Carsten

[-- Attachment #2: gdb2.txt --]
[-- Type: text/plain, Size: 6491 bytes --]

Starting program: /usr/sbin/xfs_repair -n -P -v /dev/sdb1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffef950700 (LWP 16520)]
[New Thread 0x7fffef14f700 (LWP 16521)]
[New Thread 0x7fffee94e700 (LWP 16522)]
[New Thread 0x7fffee14d700 (LWP 16523)]
[New Thread 0x7fffed94c700 (LWP 16524)]
[New Thread 0x7fffed14b700 (LWP 16525)]
[New Thread 0x7fffec94a700 (LWP 16526)]
[New Thread 0x7fffec149700 (LWP 16527)]
[New Thread 0x7fffeb948700 (LWP 16528)]
[New Thread 0x7fffeb147700 (LWP 16529)]
[New Thread 0x7fffea946700 (LWP 16530)]
[New Thread 0x7fffea145700 (LWP 16531)]
[New Thread 0x7fffe9944700 (LWP 16532)]
[New Thread 0x7fffe9143700 (LWP 16533)]
[New Thread 0x7fffe8942700 (LWP 16534)]
[New Thread 0x7fffe8141700 (LWP 16535)]
[New Thread 0x7fffe7940700 (LWP 16536)]
[New Thread 0x7fffe713f700 (LWP 16537)]
[New Thread 0x7fffe693e700 (LWP 16538)]
[New Thread 0x7fffe613d700 (LWP 16539)]
[New Thread 0x7fffe593c700 (LWP 16540)]
[New Thread 0x7fffe513b700 (LWP 16541)]
[New Thread 0x7fffe493a700 (LWP 16542)]
[New Thread 0x7fffe4139700 (LWP 16543)]
[New Thread 0x7fffe3938700 (LWP 16544)]
[New Thread 0x7fffe3137700 (LWP 16545)]
[New Thread 0x7fffe2936700 (LWP 16546)]
[New Thread 0x7fffe2135700 (LWP 16547)]
[New Thread 0x7fffe1934700 (LWP 16548)]
[New Thread 0x7fffe1133700 (LWP 16549)]
[New Thread 0x7fffe0932700 (LWP 16550)]
[New Thread 0x7fffe0131700 (LWP 16551)]
[Thread 0x7fffec149700 (LWP 16527) exited]
[Thread 0x7fffe613d700 (LWP 16539) exited]
[Thread 0x7fffe8942700 (LWP 16534) exited]
[Thread 0x7fffe3938700 (LWP 16544) exited]
[Thread 0x7fffe8141700 (LWP 16535) exited]
[Thread 0x7fffe9944700 (LWP 16532) exited]
[Thread 0x7fffe0932700 (LWP 16550) exited]
[Thread 0x7fffe593c700 (LWP 16540) exited]
[Thread 0x7fffea946700 (LWP 16530) exited]
[Thread 0x7fffe3137700 (LWP 16545) exited]
[Thread 0x7fffe693e700 (LWP 16538) exited]
[Thread 0x7fffea145700 (LWP 16531) exited]
[Thread 0x7fffe7940700 (LWP 16536) exited]
[Thread 0x7fffeb147700 (LWP 16529) exited]
[Thread 0x7fffe493a700 (LWP 16542) exited]
[Thread 0x7fffeb948700 (LWP 16528) exited]
[Thread 0x7fffed94c700 (LWP 16524) exited]
[Thread 0x7fffe1934700 (LWP 16548) exited]
[Thread 0x7fffec94a700 (LWP 16526) exited]
[Thread 0x7fffed14b700 (LWP 16525) exited]
[Thread 0x7fffe0131700 (LWP 16551) exited]
[Thread 0x7fffe713f700 (LWP 16537) exited]
[Thread 0x7fffef950700 (LWP 16520) exited]
[Thread 0x7fffee94e700 (LWP 16522) exited]
[Thread 0x7fffe1133700 (LWP 16549) exited]
[Thread 0x7fffe2135700 (LWP 16547) exited]
[Thread 0x7fffe9143700 (LWP 16533) exited]
[Thread 0x7fffe513b700 (LWP 16541) exited]
[Thread 0x7fffef14f700 (LWP 16521) exited]
[Thread 0x7fffee14d700 (LWP 16523) exited]
[Thread 0x7fffe2936700 (LWP 16546) exited]
[Thread 0x7fffe4139700 (LWP 16543) exited]

Program received signal SIGABRT, Aborted.
0x00007ffff74275c9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
#0  0x00007ffff74275c9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff7428cd8 in __GI_abort () at abort.c:90
#2  0x00007ffff7467db7 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff756f561 "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007ffff74ff9c7 in __GI___fortify_fail (msg=msg@entry=0x7ffff756f507 "buffer overflow detected") at fortify_fail.c:31
#4  0x00007ffff74fdb90 in __GI___chk_fail () at chk_fail.c:28
#5  0x0000000000414ea8 in memmove (__len=18446744073709551615, __src=0x1e560e94, __dest=0x7fffffffd900) at /usr/include/bits/string3.h:57
#6  process_sf_dir2 (dirname=0x46b0e2 "", repair=<synthetic pointer>, parent=0x7fffffffdc30, dino_dirty=0x7fffffffdc28, ino_discovery=1, dip=0x1e560e00, ino=260256256, mp=0x1e560e91) at dir2.c:992
#7  process_dir2 (mp=mp@entry=0x7fffffffe030, ino=ino@entry=260256256, dip=dip@entry=0x1e560e00, ino_discovery=ino_discovery@entry=1, dino_dirty=dino_dirty@entry=0x7fffffffdc28, dirname=dirname@entry=0x46b0e2 "", 
    parent=parent@entry=0x7fffffffdc30, blkmap=0x0) at dir2.c:1988
#8  0x000000000041189f in process_dinode_int (mp=mp@entry=0x7fffffffe030, dino=dino@entry=0x1e560e00, agno=agno@entry=0, ino=ino@entry=260256256, was_free=<optimized out>, dirty=dirty@entry=0x7fffffffdc28, 
    used=used@entry=0x7fffffffdc24, verify_mode=verify_mode@entry=0, uncertain=uncertain@entry=0, ino_discovery=ino_discovery@entry=1, check_dups=check_dups@entry=0, extra_attr_check=extra_attr_check@entry=1, 
    isa_dir=isa_dir@entry=0x7fffffffdc2c, parent=parent@entry=0x7fffffffdc30) at dinode.c:2881
#9  0x00000000004124ce in process_dinode (mp=mp@entry=0x7fffffffe030, dino=dino@entry=0x1e560e00, agno=agno@entry=0, ino=ino@entry=260256256, was_free=<optimized out>, dirty=dirty@entry=0x7fffffffdc28, used=used@entry=0x7fffffffdc24, 
    ino_discovery=ino_discovery@entry=1, check_dups=check_dups@entry=0, extra_attr_check=extra_attr_check@entry=1, isa_dir=isa_dir@entry=0x7fffffffdc2c, parent=parent@entry=0x7fffffffdc30) at dinode.c:2989
#10 0x000000000040b96f in process_inode_chunk (mp=mp@entry=0x7fffffffe030, agno=agno@entry=0, first_irec=first_irec@entry=0x7fffc055a3f0, ino_discovery=ino_discovery@entry=1, check_dups=check_dups@entry=0, 
    extra_attr_check=extra_attr_check@entry=1, bogus=bogus@entry=0x7fffffffdcb4, num_inos=<optimized out>) at dino_chunks.c:772
#11 0x000000000040cddd in process_aginodes (mp=0x7fffffffe030, pf_args=pf_args@entry=0x0, agno=agno@entry=0, ino_discovery=ino_discovery@entry=1, check_dups=check_dups@entry=0, extra_attr_check=extra_attr_check@entry=1)
    at dino_chunks.c:1025
#12 0x000000000041964e in process_ag_func (wq=0x7fffffffdda0, agno=0, arg=0x0) at phase3.c:77
#13 0x00000000004265da in prefetch_ag_range (work=0x7fffffffdda0, start_ag=<optimized out>, end_ag=32, dirs_only=false, func=0x419600 <process_ag_func>) at prefetch.c:907
#14 0x000000000042666c in do_inode_prefetch (mp=mp@entry=0x7fffffffe030, stride=0, func=func@entry=0x419600 <process_ag_func>, check_cache=check_cache@entry=false, dirs_only=dirs_only@entry=false) at prefetch.c:970
#15 0x000000000041975d in process_ags (mp=0x7fffffffe030) at phase3.c:85
#16 phase3 (mp=mp@entry=0x7fffffffe030) at phase3.c:121
#17 0x000000000040388e in main (argc=<optimized out>, argv=<optimized out>) at xfs_repair.c:785
A debugging session is active.

	Inferior 1 [process 16516] will be killed.

Quit anyway? (y or n) 

[-- Attachment #3: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs_repair segfault
  2015-03-09 15:50 xfs_repair segfault Rui Gomes
  2015-03-09 15:55 ` Carsten Aulbert
@ 2015-03-09 16:14 ` Eric Sandeen
  2015-03-09 16:24   ` Rui Gomes
  1 sibling, 1 reply; 23+ messages in thread
From: Eric Sandeen @ 2015-03-09 16:14 UTC (permalink / raw)
  To: Rui Gomes, xfs; +Cc: Ómar Hermannsson

On 3/9/15 11:50 AM, Rui Gomes wrote:
> Program received signal SIGABRT, Aborted.
> 0x00007ffff74275c9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> 56	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> #0  0x00007ffff74275c9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x00007ffff7428cd8 in __GI_abort () at abort.c:90
> #2  0x00007ffff7467db7 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff756f561 "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
> #3  0x00007ffff74ff9c7 in __GI___fortify_fail (msg=msg@entry=0x7ffff756f507 "buffer overflow detected") at fortify_fail.c:31
> #4  0x00007ffff74fdb90 in __GI___chk_fail () at chk_fail.c:28
> #5  0x0000000000414ea8 in memmove (__len=18446744073709551615, __src=0x1e562094, __dest=0x7fffffffd8f0) at /usr/include/bits/string3.h:57
> #6  process_sf_dir2 (dirname=0x46b0e2 "", repair=<synthetic pointer>, parent=0x7fffffffdc20, dino_dirty=0x7fffffffdc18, ino_discovery=1, dip=0x1e562000, ino=260256256, mp=0x1e562091) at dir2.c:992

That's here:

                if (junkit)  {
                        memmove(name, sfep->name, namelen); <<<<
                        name[namelen] = '\0';

and the len passed to memmove, 18446744073709551615, is 0xFFFFFFFFFFFFFFFF
or -1 according to gdb.
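
In other words, a signed -1 being converted to memmove's unsigned size_t
length argument; purely as an illustration (not code from repair):

        int namelen = -1;              /* the bogus length */
        size_t len = (size_t)namelen;  /* wraps to 18446744073709551615 on 64-bit */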

What are the few lines of xfs_repair output prior to this, i.e. messages
containing "shortform dir"?

If you'd like to create & compress an xfs_metadump & provide it to me offline,
I'll see if that recreates the segfault & look into it further.

Thanks,
-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs_repair segfault
  2015-03-09 16:14 ` Eric Sandeen
@ 2015-03-09 16:24   ` Rui Gomes
  2015-03-09 17:34     ` Eric Sandeen
  0 siblings, 1 reply; 23+ messages in thread
From: Rui Gomes @ 2015-03-09 16:24 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: omar, xfs

Hello Eric, 

I would love to send you the xfs metadump but it segfaults as well.

[root@icess8a ~]# xfs_metadump -w /dev/sdb1 xfs_metadata.dump
Metadata corruption detected at block 0x4ffed6d08/0x1000
xfs_metadump: cannot init perag data (117). Continuing anyway.
xfs_metadump: invalid block number (103589472/7327271) in inode 31138 bmapbtd root
xfs_metadump: bad number of extents 101 in inode 438913
xfs_metadump: bad number of extents 8126465 in inode 438922
xfs_metadump: bad number of extents 106 in inode 438930
xfs_metadump: bad number of extents 95 in inode 438931
xfs_metadump: bad number of extents 124 in inode 438932
xfs_metadump: bad number of extents 33648898 in inode 438933
xfs_metadump: bad number of extents 1134 in inode 438933
xfs_metadump: bad number of extents 1678966914 in inode 438942
xfs_metadump: bad number of extents 107 in inode 438947
xfs_metadump: bad number of extents 83917828 in inode 438952
xfs_metadump: bad number of extents 106 in inode 438952
xfs_metadump: bad number of extents 793014134 in inode 438957
xfs_metadump: invalid magic in dir inode 754758 block 0
xfs_metadump: invalid magic in dir inode 754767 block 0
xfs_metadump: invalid magic in dir inode 191761973 block 2
xfs_metadump: invalid magic in dir inode 191761973 block 3
xfs_metadump: invalid magic in dir inode 191761973 block 4
xfs_metadump: bad number of extents 201326593 in inode 252685314
xfs_metadump: invalid size in dir inode 252685315
xfs_metadump: zero length entry in dir inode 252685315
xfs_metadump: invalid size in dir inode 252685316
xfs_metadump: invalid size in dir inode 252685317
xfs_metadump: bad number of extents 268497404 in inode 252685318
xfs_metadump: bad number of extents 256 in inode 252685322
xfs_metadump: bad number of extents 754974721 in inode 252685326
xfs_metadump: bad number of extents 1931505779 in inode 252685349
xfs_metadump: suspicious count 860276 in bmap extent 0 in symlink ino 252685351
xfs_metadump: bad number of extents -2097020927 in inode 252685358
xfs_metadump: bad number of extents 50332161 in inode 252685374
xfs_metadump: bad number of extents 301989889 in inode 252685380
xfs_metadump: bad number of extents 301992705 in inode 252685383
xfs_metadump: entry length in dir inode 255735873 overflows space
xfs_metadump: bad number of extents 10229761 in inode 255735925
xfs_metadump: invalid attr size in inode 255735926
xfs_metadump: attr entry length in inode 255735926 overflows space
xfs_metadump: bad number of extents 301989889 in inode 259383043
xfs_metadump: bad number of extents 1952407297 in inode 259383045
xfs_metadump: bad number of extents 117440513 in inode 259383051
xfs_metadump: invalid block number 3840/16213200 (515412288720) in bmap extent 0 in symlink ino 259383052
xfs_metadump: bad number of extents 59484 in inode 259383054
xfs_metadump: entry length in dir inode 259383076 overflows space
xfs_metadump: invalid size in dir inode 259383077
xfs_metadump: entry length in dir inode 259383077 overflows space
xfs_metadump: invalid magic in dir inode 260215042 block 0
xfs_metadump: entry length in dir inode 260256256 overflows space
/usr/sbin/xfs_metadump: line 32: 16695 Segmentation fault      xfs_db$DBOPTS -i -p xfs_metadump -c "metadump$OPTS $2" $1



This is the output of xfs_repair, truncated:

entry "epl-v10.html" in shortform directory 259383076 references invalid inode 472470915093
would have junked entry "epl-v10.html" in directory inode 259383076
entry "fea�{re.xml" in shortform directory 259383076 references invalid inode 3940649933342743
would have junked entry "fea�{re.xml" in directory inode 259383076
entry "                                                                                                                                                                                                      licen�k.html" in shortform directory 259383076 references invalid inode 1275257465539072
size of last entry overflows space left in in shortform dir 259383076, would reset to 20
entry contains illegal character in shortform dir 259383076
would have junked entry "licen�k.html" in directory inode 259383076
would have corrected directory 259383076 size from 107 to 115
bogus .. inode number (44169981931271) in directory inode 259383076, would clear inode number
bad magic number 0x2741 on inode 259383077, would reset magic number
bad (negative) size -2445736072638889871 on inode 259383077
would have cleared inode 259383077
bad magic number 0x4755 on inode 259383078, would reset magic number
bad non-zero extent size 2149318656 for non-realtime/extsize inode 259383078, would reset to zero
would have cleared inode 259383078
bad magic number 0x673d on inode 259383079, would reset magic number
bad version number 0x66 on inode 259383079, would reset version number
bad inode format in inode 259383079
would have cleared inode 259383079
bad non-zero extent size 33554432 for non-realtime/extsize inode 259383080, would reset to zero
bad nblocks 792633534534647812 for inode 259383080, would reset to 4
bad anextents 36865 for inode 259383080, would reset to 0
bad magic number 0x794e on inode 259383081, would reset magic number
bad non-zero extent size 33554432 for non-realtime/extsize inode 259383081, would reset to zero
data fork in ino 259383081 claims free block 16227590
data fork in ino 259383081 claims free block 16227591
data fork in regular inode 259383081 claims used block 16228000
correcting nextents for inode 259383081
bad data fork in inode 259383081
would have cleared inode 259383081
bad magic number 0x174e on inode 259383082, would reset magic number
bad version number 0x6 on inode 259383082, would reset version number
bad non-zero extent size 100663296 for non-realtime/extsize inode 259383082, would reset to zero
bad nblocks 7493989779961282574 for inode 259383082, would reset to 14
bad anextents 16398 for inode 259383082, would reset to 0
would have cleared inode 259383082
bad magic number 0x3d4e on inode 259383083, would reset magic number
bad non-zero extent size 33554432 for non-realtime/extsize inode 259383083, would reset to zero
zero length extent (off = 562949953421312, fsbno = 0) in ino 259383083
correcting nextents for inode 259383083
bad data fork in inode 259383083
would have cleared inode 259383083
bad magic number 0xd34e on inode 259383084, would reset magic number
bad non-zero extent size 50331648 for non-realtime/extsize inode 259383084, would reset to zero
bad attr fork offset 32 in inode 259383084, max=10
would have cleared inode 259383084
bad magic number 0xf34e on inode 259383085, would reset magic number
bad version number 0x4 on inode 259383085, would reset version number
bad non-zero extent size 50331648 for non-realtime/extsize inode 259383085, would reset to zero
bad attr fork offset 32 in inode 259383085, max=19
would have cleared inode 259383085
bad magic number 0x9e4e on inode 259383086, would reset magic number
bad version number 0xa on inode 259383086, would reset version number
bad nblocks 15852670688360923137 for inode 259383086, would reset to 1
would have cleared inode 259383086
data fork in regular inode 259393564 claims used block 16230080
correcting nextents for inode 259393564
bad data fork in inode 259393564
would have cleared inode 259393564
data fork in regular inode 259393566 claims used block 16230128
correcting nextents for inode 259393566
bad data fork in inode 259393566
would have cleared inode 259393566
data fork in regular inode 259702849 claims used block 16231728
correcting nextents for inode 259702849
bad data fork in inode 259702849
would have cleared inode 259702849
Metadata corruption detected at block 0x7c14878/0x1000
bad directory block magic # 0x3d012146 in block 0 for directory inode 260215042
corrupt block 0 in directory inode 260215042
        would junk block
no . entry for directory 260215042
no .. entry for directory 260215042
problem with directory contents in inode 260215042
would have cleared inode 260215042
bad nblocks 7 for inode 260256256, would reset to 0
bad nextents 1 for inode 260256256, would reset to 0
entry "                 kchnfig" in shortform directory 260256256 references invalid inode 28428972647780227
entry contains illegal character in shortform dir 260256256
would have junked entry "kchnfig" in directory inode 260256256
entry "                                                  " in shortform directory 260256256 references invalid inode 0
size of last entry overflows space left in in shortform dir 260256256, would reset to -1
*** buffer overflow detected ***: /usr/sbin/xfs_repair terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7ffff74ff9c7]
/lib64/libc.so.6(+0x10bb90)[0x7ffff74fdb90]
/usr/sbin/xfs_repair[0x414ea8]
/usr/sbin/xfs_repair[0x41189f]
/usr/sbin/xfs_repair[0x4124ce]
/usr/sbin/xfs_repair[0x40b96f]
/usr/sbin/xfs_repair[0x40cddd]
/usr/sbin/xfs_repair[0x41964e]
/usr/sbin/xfs_repair[0x4265da]
/usr/sbin/xfs_repair[0x42666c]
/usr/sbin/xfs_repair[0x41975d]
/usr/sbin/xfs_repair[0x40388e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff7413af5]
/usr/sbin/xfs_repair[0x403f89]
======= Memory map: ========
00400000-00481000 r-xp 00000000 08:05 1075058659                         /usr/sbin/xfs_repair
00680000-00681000 r--p 00080000 08:05 1075058659                         /usr/sbin/xfs_repair
00681000-00684000 rw-p 00081000 08:05 1075058659                         /usr/sbin/xfs_repair
00684000-1e571000 rw-p 00000000 00:00 0                                  [heap]
7fff58000000-7fff5884a000 rw-p 00000000 00:00 0 
7fff5884a000-7fff5c000000 ---p 00000000 00:00 0 
7fff60000000-7fff607c4000 rw-p 00000000 00:00 0 
7fff607c4000-7fff64000000 ---p 00000000 00:00 0 
7fff64000000-7fff6471c000 rw-p 00000000 00:00 0 
7fff6471c000-7fff68000000 ---p 00000000 00:00 0 
7fff68000000-7fff68723000 rw-p 00000000 00:00 0 
7fff68723000-7fff6c000000 ---p 00000000 00:00 0 
7fff6c000000-7fff6c75b000 rw-p 00000000 00:00 0 
7fff6c75b000-7fff70000000 ---p 00000000 00:00 0 
7fff70000000-7fff7080e000 rw-p 00000000 00:00 0 
7fff7080e000-7fff74000000 ---p 00000000 00:00 0 
7fff74000000-7fff749e8000 rw-p 00000000 00:00 0 
7fff749e8000-7fff78000000 ---p 00000000 00:00 0 
7fff78000000-7fff7873d000 rw-p 00000000 00:00 0 
7fff7873d000-7fff7c000000 ---p 00000000 00:00 0 
7fff7c000000-7fff7c794000 rw-p 00000000 00:00 0 
7fff7c794000-7fff80000000 ---p 00000000 00:00 0 
7fff80000000-7fff804ed000 rw-p 00000000 00:00 0 
7fff804ed000-7fff84000000 ---p 00000000 00:00 0 
7fff84000000-7fff847bf000 rw-p 00000000 00:00 0 
7fff847bf000-7fff88000000 ---p 00000000 00:00 0 
7fff88000000-7fff8875d000 rw-p 00000000 00:00 0 
7fff8875d000-7fff8c000000 ---p 00000000 00:00 0 
7fff8c000000-7fff8c021000 rw-p 00000000 00:00 0 
7fff8c021000-7fff90000000 ---p 00000000 00:00 0 
7fff90000000-7fff90729000 rw-p 00000000 00:00 0 
7fff90729000-7fff94000000 ---p 00000000 00:00 0 
7fff94000000-7fff947bf000 rw-p 00000000 00:00 0 
7fff947bf000-7fff98000000 ---p 00000000 00:00 0 
7fff98000000-7fff9876c000 rw-p 00000000 00:00 0 
7fff9876c000-7fff9c000000 ---p 00000000 00:00 0 
7fff9c000000-7fff9c8b6000 rw-p 00000000 00:00 0 
7fff9c8b6000-7fffa0000000 ---p 00000000 00:00 0 
7fffa0000000-7fffa0759000 rw-p 00000000 00:00 0 
7fffa0759000-7fffa4000000 ---p 00000000 00:00 0 
7fffa4000000-7fffa451c000 rw-p 00000000 00:00 0 
7fffa451c000-7fffa8000000 ---p 00000000 00:00 0 
7fffa8000000-7fffa8714000 rw-p 00000000 00:00 0 
7fffa8714000-7fffac000000 ---p 00000000 00:00 0 
7fffac000000-7fffac71d000 rw-p 00000000 00:00 0 
7fffac71d000-7fffb0000000 ---p 00000000 00:00 0 
7fffb0000000-7fffb081d000 rw-p 00000000 00:00 0 
7fffb081d000-7fffb4000000 ---p 00000000 00:00 0 
7fffb4000000-7fffb469d000 rw-p 00000000 00:00 0 
7fffb469d000-7fffb8000000 ---p 00000000 00:00 0 
7fffb8000000-7fffb84b6000 rw-p 00000000 00:00 0 
7fffb84b6000-7fffbc000000 ---p 00000000 00:00 0 
7fffbc000000-7fffbc6a2000 rw-p 00000000 00:00 0 
7fffbc6a2000-7fffc0000000 ---p 00000000 00:00 0 
7fffc0000000-7fffc1f31000 rw-p 00000000 00:00 0 
7fffc1f31000-7fffc4000000 ---p 00000000 00:00 0 
7fffc4000000-7fffc47a0000 rw-p 00000000 00:00 0 
7fffc47a0000-7fffc8000000 ---p 00000000 00:00 0 
7fffc8000000-7fffc8782000 rw-p 00000000 00:00 0 
7fffc8782000-7fffcc000000 ---p 00000000 00:00 0 
7fffcc000000-7fffcc7a0000 rw-p 00000000 00:00 0 
7fffcc7a0000-7fffd0000000 ---p 00000000 00:00 0 
7fffd0000000-7fffd0719000 rw-p 00000000 00:00 0 
7fffd0719000-7fffd4000000 ---p 00000000 00:00 0 
7fffd4000000-7fffd4798000 rw-p 00000000 00:00 0 
7fffd4798000-7fffd8000000 ---p 00000000 00:00 0 
7fffd8000000-7fffd8635000 rw-p 00000000 00:00 0 
7fffd8635000-7fffdc000000 ---p 00000000 00:00 0 
7fffdf931000-7fffdf932000 ---p 00000000 00:00 0 
7fffdf932000-7fffe0132000 rw-p 00000000 00:00 0 
7fffe0132000-7fffe0133000 ---p 00000000 00:00 0 
7fffe0133000-7fffe0933000 rw-p 00000000 00:00 0 
7fffe0933000-7fffe0934000 ---p 00000000 00:00 0 
7fffe0934000-7fffe1134000 rw-p 00000000 00:00 0 
7fffe1134000-7fffe1135000 ---p 00000000 00:00 0 
7fffe1135000-7fffe1935000 rw-p 00000000 00:00 0 
7fffef73b000-7fffef750000 r-xp 00000000 08:05 1075245487                 /usr/lib64/libgcc_s-4.8.2-20140120.so.1
7fffef750000-7fffef94f000 ---p 00015000 08:05 1075245487                 /usr/lib64/libgcc_s-4.8.2-20140120.so.1
7fffef94f000-7fffef950000 r--p 00014000 08:05 1075245487                 /usr/lib64/libgcc_s-4.8.2-20140120.so.1
7fffef950000-7fffef951000 rw-p 00015000 08:05 1075245487                 /usr/lib64/libgcc_s-4.8.2-20140120.so.1
7fffef951000-7ffff0ecb000 rw-p 00000000 00:00 0 
7ffff0ecb000-7ffff73f2000 r--p 00000000 08:05 1610627723                 /usr/lib/locale/locale-archive
7ffff73f2000-7ffff75a8000 r-xp 00000000 08:05 1074635524                 /usr/lib64/libc-2.17.so
7ffff75a8000-7ffff77a8000 ---p 001b6000 08:05 1074635524                 /usr/lib64/libc-2.17.so
7ffff77a8000-7ffff77ac000 r--p 001b6000 08:05 1074635524                 /usr/lib64/libc-2.17.so
7ffff77ac000-7ffff77ae000 rw-p 001ba000 08:05 1074635524                 /usr/lib64/libc-2.17.so
7ffff77ae000-7ffff77b3000 rw-p 00000000 00:00 0 
7ffff77b3000-7ffff77c9000 r-xp 00000000 08:05 1074635679                 /usr/lib64/libpthread-2.17.so
7ffff77c9000-7ffff79c9000 ---p 00016000 08:05 1074635679                 /usr/lib64/libpthread-2.17.so
7ffff79c9000-7ffff79ca000 r--p 00016000 08:05 1074635679                 /usr/lib64/libpthread-2.17.so
7ffff79ca000-7ffff79cb000 rw-p 00017000 08:05 1074635679                 /usr/lib64/libpthread-2.17.so
7ffff79cb000-7ffff79cf000 rw-p 00000000 00:00 0 
7ffff79cf000-7ffff79d6000 r-xp 00000000 08:05 1074635683                 /usr/lib64/librt-2.17.so
7ffff79d6000-7ffff7bd5000 ---p 00007000 08:05 1074635683                 /usr/lib64/librt-2.17.so
7ffff7bd5000-7ffff7bd6000 r--p 00006000 08:05 1074635683                 /usr/lib64/librt-2.17.so
7ffff7bd6000-7ffff7bd7000 rw-p 00007000 08:05 1074635683                 /usr/lib64/librt-2.17.so
7ffff7bd7000-7ffff7bdb000 r-xp 00000000 08:05 1074635744                 /usr/lib64/libuuid.so.1.3.0
7ffff7bdb000-7ffff7dda000 ---p 00004000 08:05 1074635744                 /usr/lib64/libuuid.so.1.3.0
7ffff7dda000-7ffff7ddb000 r--p 00003000 08:05 1074635744                 /usr/lib64/libuuid.so.1.3.0
7ffff7ddb000-7ffff7ddc000 rw-p 00004000 08:05 1074635744                 /usr/lib64/libuuid.so.1.3.0
7ffff7ddc000-7ffff7dfd000 r-xp 00000000 08:05 1075245500                 /usr/lib64/ld-2.17.so
7ffff7e91000-7ffff7ff2000 rw-p 00000000 00:00 0 
7ffff7ff8000-7ffff7ffa000 rw-p 00000000 00:00 0 
7ffff7ffa000-7ffff7ffc000 r-xp 00000000 00:00 0                          [vdso]
7ffff7ffc000-7ffff7ffd000 r--p 00020000 08:05 1075245500                 /usr/lib64/ld-2.17.so
7ffff7ffd000-7ffff7ffe000 rw-p 00021000 08:05 1075245500                 /usr/lib64/ld-2.17.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0 
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Program received signal SIGABRT, Aborted.
0x00007ffff74275c9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);






Regards 

------------------------------- 
Rui Gomes 
CTO 


RVX - Reykjavik Visual Effects 
Seljavegur 2, 
101 Reykjavik 
Iceland 


Tel: + 354 527 3330 
Mob: + 354 663 3360

----- Original Message -----
From: "Eric Sandeen" <sandeen@sandeen.net>
To: "Rui Gomes" <rgomes@rvx.is>, "xfs" <xfs@oss.sgi.com>
Cc: "omar" <omar@rvx.is>
Sent: Monday, 9 March, 2015 16:14:52
Subject: Re: xfs_repair segfault

On 3/9/15 11:50 AM, Rui Gomes wrote:
> Program received signal SIGABRT, Aborted.
> 0x00007ffff74275c9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> 56	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> #0  0x00007ffff74275c9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x00007ffff7428cd8 in __GI_abort () at abort.c:90
> #2  0x00007ffff7467db7 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff756f561 "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
> #3  0x00007ffff74ff9c7 in __GI___fortify_fail (msg=msg@entry=0x7ffff756f507 "buffer overflow detected") at fortify_fail.c:31
> #4  0x00007ffff74fdb90 in __GI___chk_fail () at chk_fail.c:28
> #5  0x0000000000414ea8 in memmove (__len=18446744073709551615, __src=0x1e562094, __dest=0x7fffffffd8f0) at /usr/include/bits/string3.h:57
> #6  process_sf_dir2 (dirname=0x46b0e2 "", repair=<synthetic pointer>, parent=0x7fffffffdc20, dino_dirty=0x7fffffffdc18, ino_discovery=1, dip=0x1e562000, ino=260256256, mp=0x1e562091) at dir2.c:992

That's here:

                if (junkit)  {
                        memmove(name, sfep->name, namelen); <<<<
                        name[namelen] = '\0';

and the len passed to memmove, 18446744073709551615, is 0xFFFFFFFFFFFFFFFF
or -1 according to gdb.

What are the few lines of xfs_repair output prior to this, i.e. messages
containing "shortform dir"?

If you'd like to create & compress an xfs_metadump & provide it to me offline,
I'll see if that recreates the segfault & look into it further.

Thanks,
-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs_repair segfault
  2015-03-09 16:24   ` Rui Gomes
@ 2015-03-09 17:34     ` Eric Sandeen
  2015-03-09 17:50       ` Rui Gomes
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Sandeen @ 2015-03-09 17:34 UTC (permalink / raw)
  To: Rui Gomes; +Cc: omar, xfs

On 3/9/15 12:24 PM, Rui Gomes wrote:
> Hello Eric, 
> 
> I would love to send you the xfs metadump but it segfaults as well.

woohoo!  \o/

> This is the output of xfs_repair truncated:
> 
...

> no . entry for directory 260215042
> no .. entry for directory 260215042
> problem with directory contents in inode 260215042
> would have cleared inode 260215042
> bad nblocks 7 for inode 260256256, would reset to 0
> bad nextents 1 for inode 260256256, would reset to 0
> entry "                 kchnfig" in shortform directory 260256256 references invalid inode 28428972647780227
> entry contains illegal character in shortform dir 260256256
> would have junked entry "kchnfig" in directory inode 260256256
> entry "                                                  " in shortform directory 260256256 references invalid inode 0
> size of last entry overflows space left in in shortform dir 260256256, would reset to -1
> *** buffer overflow detected ***: /usr/sbin/xfs_repair terminated

Ok, looking at the sheer number of errors, I really wonder what happened to the fs.

You'd do well to be 100% sure that the storage is OK, and that you're not trying to
repair a filesystem on scrambled disks; but in any case, xfs_repair should not segfault.

But anyway, this:

> size of last entry overflows space left in in shortform dir 260256256, would reset to -1

is a good clue; it must be in here:

                        if (i == num_entries - 1)  {
                                namelen = ino_dir_size -
                                        ((__psint_t) &sfep->name[0] -
                                         (__psint_t) sfp);
                                do_warn(
_("size of last entry overflows space left in in shortform dir %" PRIu64 ", "),
                                        ino);
                                if (!no_modify)  {
                                        do_warn(_("resetting to %d\n"),
                                                namelen);
                                        sfep->namelen = namelen;
                                        *dino_dirty = 1;

which means the -1 namelen memmove choked on came from:

ino_dir_size - ((__psint_t) &sfep->name[0] - (__psint_t) sfp);

and those come from:

sfp = (struct xfs_dir2_sf_hdr *)XFS_DFORK_DPTR(dip) = ((char *)dip + xfs_dinode_size(dip->di_version))
ino_dir_size = be64_to_cpu(dip->di_size);
sfep = ... xfs_dir2_sf_firstentry(sfp);

We could just be defensive against a negative namelen, but maybe we should
understand a bit more clearly how we got here.
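
Being defensive would look roughly like this (just the shape of the
check, not an actual patch):

                namelen = ino_dir_size -
                        ((__psint_t) &sfep->name[0] - (__psint_t) sfp);
                if (namelen < 0) {
                        /* hypothetical guard: a negative namelen would
                         * wrap to SIZE_MAX in the memmove() below, so
                         * clamp it (or junk the entry) instead */
                        namelen = 0;
                }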

Might start by trying:

# xfs_db -c "inode 260256256" -c "p" /dev/whatever

and show us what you get.

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs_repair segfault
  2015-03-09 17:34     ` Eric Sandeen
@ 2015-03-09 17:50       ` Rui Gomes
  2015-03-09 18:18         ` Eric Sandeen
  0 siblings, 1 reply; 23+ messages in thread
From: Rui Gomes @ 2015-03-09 17:50 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: omar, xfs

Hi,

Yeah, I feel the same way about what could possibly have happened here, since no "funky" business happened on this server.

In case it helps, the underlying hardware is:
RAID controller: MegaRAID SAS 2108 [Liberator] (rev 05)
with 16 7.2k SAS 2TB hard drives in RAID 6

The output from the command:
[root@icess8a ~]# xfs_db -c "inode 260256256" -c "p" /dev/sdb1 
Metadata corruption detected at block 0x4ffed6d08/0x1000
xfs_db: cannot init perag data (117). Continuing anyway.
core.magic = 0x494e
core.mode = 040755
core.version = 2
core.format = 1 (local)
core.nlinkv2 = 2
core.onlink = 0
core.projid_lo = 0
core.projid_hi = 0
core.uid = 0
core.gid = 0
core.flushiter = 0
core.atime.sec = Fri May 16 10:52:31 2014
core.atime.nsec = 051443134
core.mtime.sec = Thu Aug 25 11:05:18 2011
core.mtime.nsec = 000000000
core.ctime.sec = Wed Feb 26 04:39:42 2014
core.ctime.nsec = 964671556
core.size = 47
core.nblocks = 7
core.extsize = 0
core.nextents = 1
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 270972429
next_unlinked = null
u.sfdir2.hdr.count = 2
u.sfdir2.hdr.i8count = 1
u.sfdir2.hdr.parent.i8 = 51672582160
u.sfdir2.list[0].namelen = 24
u.sfdir2.list[0].offset = 0x63d
u.sfdir2.list[0].name = "kchnfig\000\000\000\000\017\2032\030\b\000\210Makefi"
u.sfdir2.list[0].inumber.i8 = 7810649128743997315
u.sfdir2.list[1].namelen = 50
u.sfdir2.list[1].offset = 0x1900
u.sfdir2.list[1].name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
u.sfdir2.list[1].inumber.i8 = 0


Regards 

------------------------------- 
Rui Gomes 
CTO 


RVX - Reykjavik Visual Effects 
Seljavegur 2, 
101 Reykjavik 
Iceland 


Tel: + 354 527 3330 
Mob: + 354 663 3360

----- Original Message -----
From: "Eric Sandeen" <sandeen@sandeen.net>
To: "Rui Gomes" <rgomes@rvx.is>
Cc: "xfs" <xfs@oss.sgi.com>, "omar" <omar@rvx.is>
Sent: Monday, 9 March, 2015 17:34:13
Subject: Re: xfs_repair segfault

On 3/9/15 12:24 PM, Rui Gomes wrote:
> Hello Eric, 
> 
> I would love to send you the xfs metadump but it segfaults as well.

woohoo!  \o/

> This is the output of xfs_repair truncated:
> 
...

> no . entry for directory 260215042
> no .. entry for directory 260215042
> problem with directory contents in inode 260215042
> would have cleared inode 260215042
> bad nblocks 7 for inode 260256256, would reset to 0
> bad nextents 1 for inode 260256256, would reset to 0
> entry "                 kchnfig" in shortform directory 260256256 references invalid inode 28428972647780227
> entry contains illegal character in shortform dir 260256256
> would have junked entry "kchnfig" in directory inode 260256256
> entry "                                                  " in shortform directory 260256256 references invalid inode 0
> size of last entry overflows space left in in shortform dir 260256256, would reset to -1
> *** buffer overflow detected ***: /usr/sbin/xfs_repair terminated

Ok, looking at the sheer number of errors, I really wonder what happened to the fs.

You'd do well to be 100% sure that the storage is OK, and that you're not trying to
repair a filesystem on scrambled disks; but in any case, xfs_repair should not segfault.

But anyway, this:

> size of last entry overflows space left in in shortform dir 260256256, would reset to -1

is a good clue; it must be in here:

                        if (i == num_entries - 1)  {
                                namelen = ino_dir_size -
                                        ((__psint_t) &sfep->name[0] -
                                         (__psint_t) sfp);
                                do_warn(
_("size of last entry overflows space left in in shortform dir %" PRIu64 ", "),
                                        ino);
                                if (!no_modify)  {
                                        do_warn(_("resetting to %d\n"),
                                                namelen);
                                        sfep->namelen = namelen;
                                        *dino_dirty = 1;

which means the -1 namelen memmove choked on came from:

ino_dir_size - ((__psint_t) &sfep->name[0] - (__psint_t) sfp);

and those come from:

sfp = (struct xfs_dir2_sf_hdr *)XFS_DFORK_DPTR(dip) = ((char *)dip + xfs_dinode_size(dip->di_version))
ino_dir_size = be64_to_cpu(dip->di_size);
sfep = ... xfs_dir2_sf_firstentry(sfp);

We could just be defensive against a negative namelen, but maybe we should
understand a bit more clearly how we got here.

Might start by trying:

# xfs_db -c "inode 260256256" -c "p" /dev/whatever

and show us what you get.

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs_repair segfault
  2015-03-09 17:50       ` Rui Gomes
@ 2015-03-09 18:18         ` Eric Sandeen
  2015-03-09 18:24           ` Rui Gomes
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Sandeen @ 2015-03-09 18:18 UTC (permalink / raw)
  To: Rui Gomes; +Cc: omar, xfs

On 3/9/15 1:50 PM, Rui Gomes wrote:
> Hi,
> 
> Yeah I feel the same way what could possible happen here, since no "funky" business happen in this server.
> 
> In case this help the underline hardware is:
> Raid Controller: MegaRAID SAS 2108 [Liberator] (rev 05)
> With 16 7.2k SAS 2TB harddrives in raid6
> 
> The output from the command:
> [root@icess8a ~]# xfs_db -c "inode 260256256" -c "p" /dev/sdb1 

<snip>

Ok, that's enough to create an image which sees the same failure:

# repair/xfs_repair -n namelen.img 
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
local inode 131 attr too small (size = 0, min size = 4)
bad attribute fork in inode 131, would clear attr fork
bad nblocks 7 for inode 131, would reset to 0
bad nextents 1 for inode 131, would reset to 0
entry "aaaaaaaaaaaaaaaaaaaaaaaa" in shortform directory 131 references invalid inode 28428972647780227
would have junked entry "aaaaaaaaaaaaaaaaaaaaaaaa" in directory inode 131
entry "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb" in shortform directory 131 references invalid inode 0
size of last entry overflows space left in in shortform dir 131, would reset to -1
entry contains offset out of order in shortform dir 131
Segmentation fault

I'll see what we need to do in repair to handle this type of corruption.

(However, I don't think that it will suffice to get much of your filesystem
back ...)

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs_repair segfault
  2015-03-09 18:18         ` Eric Sandeen
@ 2015-03-09 18:24           ` Rui Gomes
  2015-03-09 20:13             ` Eric Sandeen
  0 siblings, 1 reply; 23+ messages in thread
From: Rui Gomes @ 2015-03-09 18:24 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: omar, xfs

Hello Eric, 

Thank you very much for looking into this.
Just as a curiosity: I can mount the filesystem and access a lot of the files, but some files/folders will just hang the kernel, so I have to be careful, trying almost file by file.

If I could get xfs_repair to at least clean the filesystem and allow me to browse the remaining files without hanging the kernel, it would be a bonus!

Once again, thanks for looking into this.

Regards 

------------------------------- 
Rui Gomes 
CTO 


RVX - Reykjavik Visual Effects 
Seljavegur 2, 
101 Reykjavik 
Iceland 


Tel: + 354 527 3330 
Mob: + 354 663 3360

----- Original Message -----
From: "Eric Sandeen" <sandeen@sandeen.net>
To: "Rui Gomes" <rgomes@rvx.is>
Cc: "omar" <omar@rvx.is>, "xfs" <xfs@oss.sgi.com>
Sent: Monday, 9 March, 2015 18:18:19
Subject: Re: xfs_repair segfault

On 3/9/15 1:50 PM, Rui Gomes wrote:
> Hi,
> 
> Yeah I feel the same way what could possible happen here, since no "funky" business happen in this server.
> 
> In case this help the underline hardware is:
> Raid Controller: MegaRAID SAS 2108 [Liberator] (rev 05)
> With 16 7.2k SAS 2TB harddrives in raid6
> 
> The output from the command:
> [root@icess8a ~]# xfs_db -c "inode 260256256" -c "p" /dev/sdb1 

<snip>

Ok, that's enough to create an image which sees the same failure:

# repair/xfs_repair -n namelen.img 
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
local inode 131 attr too small (size = 0, min size = 4)
bad attribute fork in inode 131, would clear attr fork
bad nblocks 7 for inode 131, would reset to 0
bad nextents 1 for inode 131, would reset to 0
entry "aaaaaaaaaaaaaaaaaaaaaaaa" in shortform directory 131 references invalid inode 28428972647780227
would have junked entry "aaaaaaaaaaaaaaaaaaaaaaaa" in directory inode 131
entry "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb" in shortform directory 131 references invalid inode 0
size of last entry overflows space left in in shortform dir 131, would reset to -1
entry contains offset out of order in shortform dir 131
Segmentation fault

I'll see what we need to do in repair to handle this type of corruption.

(However, I don't think that it will suffice to get much of your filesystem
back ...)

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs_repair segfault
  2015-03-09 18:24           ` Rui Gomes
@ 2015-03-09 20:13             ` Eric Sandeen
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Sandeen @ 2015-03-09 20:13 UTC (permalink / raw)
  To: Rui Gomes; +Cc: omar, xfs

On 3/9/15 2:24 PM, Rui Gomes wrote:
> Hello Eric, 
> 
> Thank you very much for looking in to this. Just as a curiosity, I
> can mount the filesystem and access a lot of the files, but some
> files/folders will just hang the kernel, so I need to be careful
> almost file by file tries.
> 
> If I could get the xfs_repair to at least clean the filesystem and
> allow me to browse the remain files without hanging the kernel it
> will be a bonus!
> 
> Once again thank for looking in to this.

Ok, I've sent a patch; you may wish to wait until it's reviewed
before you try it with your filesystem.

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs_repair segfault
  2013-10-08 20:23             ` Dave Chinner
  2013-10-09 18:59               ` Viet Nguyen
@ 2013-10-10 21:13               ` Viet Nguyen
  1 sibling, 0 replies; 23+ messages in thread
From: Viet Nguyen @ 2013-10-10 21:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


[-- Attachment #1.1: Type: text/plain, Size: 1764 bytes --]

Luckily the files that I want to recover have metadata in them that will
help me rebuild their names. So I'm okay with blowing away the directory
inodes. I guess I wish there were a flag I could pass to tell xfs_repair
to junk those for me instead of having to do it manually each time.


So, the compressed metadump file is 2.4G. Any suggestions on where I should
put it, and who I should send it to?

On Tue, Oct 8, 2013 at 1:23 PM, Dave Chinner <david@fromorbit.com> wrote:

> On Mon, Oct 07, 2013 at 01:09:09PM -0700, Viet Nguyen wrote:
> > Thanks. That seemed to fix that bug.
> >
> > Now I'm getting a lot of this:
> > xfs_da_do_buf(2): XFS_CORRUPTION_ERROR
>
> Right, that's blocks that are being detected as corrupt when they
> are read. You can ignore that for now.
>
> > fatal error -- can't read block 8388608 for directory inode 8628218
>
> That's a corrupted block list of some kind - it should junk the
> inode.
>
> > Then xfs_repair exits.
>
> I'm not sure why that happens. Is it exiting cleanly or crashing?
> Can you take a metadump of the filesystem and provide it for someone
> to debug the problems it causes repair?
>
> > What I've been doing is what I saw in the FAQ where I would use xfs_db
> and
> > write core.mode 0 for these inodes. But there are just so many of them.
> And
> > is that even the right thing to do?
>
> That marks the inode as "free" which effectively junks it and then
> xfs_repair will free all it's extents next time it is run. Basically
> you are removing the files from the filesystem and making them
> unrecoverable.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
>

[-- Attachment #1.2: Type: text/html, Size: 2522 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs_repair segfault
  2013-10-09 18:59               ` Viet Nguyen
@ 2013-10-09 20:15                 ` Dave Chinner
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2013-10-09 20:15 UTC (permalink / raw)
  To: Viet Nguyen; +Cc: xfs

On Wed, Oct 09, 2013 at 11:59:19AM -0700, Viet Nguyen wrote:
> On Tue, Oct 8, 2013 at 1:23 PM, Dave Chinner <david@fromorbit.com> wrote:
> 
> > On Mon, Oct 07, 2013 at 01:09:09PM -0700, Viet Nguyen wrote:
> > > Thanks. That seemed to fix that bug.
> > >
> > > Now I'm getting a lot of this:
> > > xfs_da_do_buf(2): XFS_CORRUPTION_ERROR
> >
> > Right, that's blocks that are being detected as corrupt when they
> > are read. You can ignore that for now.
> >
> > > fatal error -- can't read block 8388608 for directory inode 8628218
> >
> > That's a corrupted block list of some kind - it should junk the
> > inode.
> >
> > > Then xfs_repair exits.
> >
> > I'm not sure why that happens. Is it exiting cleanly or crashing?
> > Can you take a metadump of the filesystem and provide it for someone
> to debug the problems it causes for repair?
> >
> 
> It seems to be exiting cleanly with return code 1. I created a metadump,
> but it's 9.6GB. I suppose I can put up on a secure FTP or something like
> that, but it does seem a bit large to shuffle around.

How big is it when you compress it? It should get a lot smaller...
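
(Metadumps carry only metadata, so they tend to compress very well; even plain
bzip2 should shrink it dramatically, with the dump file name below as a
placeholder:

    $ bzip2 -9 metadump.img

xz generally does even better, if it's available.)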

> > > What I've been doing is what I saw in the FAQ where I would use xfs_db
> > and
> > > write core.mode 0 for these inodes. But there are just so many of them.
> > And
> > > is that even the right thing to do?
> >
> > That marks the inode as "free" which effectively junks it and then
> > xfs_repair will free all its extents next time it is run. Basically
> > you are removing the files from the filesystem and making them
> > unrecoverable.
> >
> 
> In the case of directories, it blows away just the directory, but xfs_repair
> later on scans for orphan files, no? Or am I mistaken about how that works.

It does do that, putting all the unreferenced files into lost+found.
But you lose all the names, and you have to manually work out what
all the files are and what they used to be named and what directory
they belonged to. So it's a mess that would be better avoided if at
all possible.
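
(If it does come to that, the orphans land in lost+found named by inode
number, so a quick survey along these lines helps scope the damage before
trying to identify files from their contents; the mount point below is a
placeholder:

    # mount /dev/sdXN /mnt
    # ls /mnt/lost+found | wc -l
    # file /mnt/lost+found/* | head

file(1) over a very large directory can be slow, so sampling a subset is
fine too.)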

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



* Re: xfs_repair segfault
  2013-10-08 20:23             ` Dave Chinner
@ 2013-10-09 18:59               ` Viet Nguyen
  2013-10-09 20:15                 ` Dave Chinner
  2013-10-10 21:13               ` Viet Nguyen
  1 sibling, 1 reply; 23+ messages in thread
From: Viet Nguyen @ 2013-10-09 18:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


[-- Attachment #1.1: Type: text/plain, Size: 1531 bytes --]

On Tue, Oct 8, 2013 at 1:23 PM, Dave Chinner <david@fromorbit.com> wrote:

> On Mon, Oct 07, 2013 at 01:09:09PM -0700, Viet Nguyen wrote:
> > Thanks. That seemed to fix that bug.
> >
> > Now I'm getting a lot of this:
> > xfs_da_do_buf(2): XFS_CORRUPTION_ERROR
>
> Right, that's blocks that are being detected as corrupt when they
> are read. You can ignore that for now.
>
> > fatal error -- can't read block 8388608 for directory inode 8628218
>
> That's a corrupted block list of some kind - it should junk the
> inode.
>
> > Then xfs_repair exits.
>
> I'm not sure why that happens. Is it exiting cleanly or crashing?
> Can you take a metadump of the filesystem and provide it for someone
> to debug the problems it causes for repair?
>

It seems to be exiting cleanly with return code 1. I created a metadump,
but it's 9.6GB. I suppose I can put up on a secure FTP or something like
that, but it does seem a bit large to shuffle around.


>
> > What I've been doing is what I saw in the FAQ where I would use xfs_db
> and
> > write core.mode 0 for these inodes. But there are just so many of them.
> And
> > is that even the right thing to do?
>
> That marks the inode as "free" which effectively junks it and then
> xfs_repair will free all its extents next time it is run. Basically
> you are removing the files from the filesystem and making them
> unrecoverable.
>

In the case of directories, it blows away just the directory, but xfs_repair
later on scans for orphan files, no? Or am I mistaken about how that works.

Thanks,
Viet



* Re: xfs_repair segfault
  2013-10-07 20:09           ` Viet Nguyen
@ 2013-10-08 20:23             ` Dave Chinner
  2013-10-09 18:59               ` Viet Nguyen
  2013-10-10 21:13               ` Viet Nguyen
  0 siblings, 2 replies; 23+ messages in thread
From: Dave Chinner @ 2013-10-08 20:23 UTC (permalink / raw)
  To: Viet Nguyen; +Cc: xfs

On Mon, Oct 07, 2013 at 01:09:09PM -0700, Viet Nguyen wrote:
> Thanks. That seemed to fix that bug.
> 
> Now I'm getting a lot of this:
> xfs_da_do_buf(2): XFS_CORRUPTION_ERROR

Right, that's blocks that are being detected as corrupt when they
are read. You can ignore that for now.

> fatal error -- can't read block 8388608 for directory inode 8628218

That's a corrupted block list of some kind - it should junk the
inode.

> Then xfs_repair exits.

I'm not sure why that happens. Is it exiting cleanly or crashing?
Can you take a metadump of the filesystem and provide it for someone
to debug the problems it causes for repair?
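
(A metadump captures only metadata, so it can be generated and compressed in
one pass, along the lines of the command Barry posted further down this page,
with the device and output path below as placeholders:

    # xfs_metadump /dev/sdXN - | bzip2 > /tmp/bad_xfs.metadump.bz2

File names should be obfuscated by default, which also helps if the directory
contents are sensitive.)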

> What I've been doing is what I saw in the FAQ where I would use xfs_db and
> write core.mode 0 for these inodes. But there are just so many of them. And
> is that even the right thing to do?

That marks the inode as "free" which effectively junks it and then
xfs_repair will free all its extents next time it is run. Basically
you are removing the files from the filesystem and making them
unrecoverable.
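
(For reference, the FAQ procedure being described boils down to something like
the following, with the inode number and device as placeholders; as noted
above it makes the file permanently unrecoverable, so only do it for inodes
you are prepared to lose:

    # xfs_db -x -c 'inode NNN' -c 'write core.mode 0' /dev/sdXN
    # xfs_repair /dev/sdXN

The second xfs_repair run is what actually frees the extents the junked inode
was claiming.)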

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



* Re: xfs_repair segfault
  2013-10-04 21:43         ` Dave Chinner
@ 2013-10-07 20:09           ` Viet Nguyen
  2013-10-08 20:23             ` Dave Chinner
  0 siblings, 1 reply; 23+ messages in thread
From: Viet Nguyen @ 2013-10-07 20:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


[-- Attachment #1.1: Type: text/plain, Size: 2508 bytes --]

Thanks. That seemed to fix that bug.

Now I'm getting a lot of this:
xfs_da_do_buf(2): XFS_CORRUPTION_ERROR

fatal error -- can't read block 8388608 for directory inode 8628218

Then xfs_repair exits.

What I've been doing is what I saw in the FAQ where I would use xfs_db and
write core.mode 0 for these inodes. But there are just so many of them. And
is that even the right thing to do?

Thanks


On Fri, Oct 4, 2013 at 2:43 PM, Dave Chinner <david@fromorbit.com> wrote:

> On Fri, Oct 04, 2013 at 10:51:50AM -0700, Viet Nguyen wrote:
> > Hi,
> >
> > I was wondering if you got a chance to look at this and if one's
> available,
> > where can I get a patch?
>
> Can you try the patch below?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
> libxfs: validity check the directory block leaf entry count
>
> From: Dave Chinner <dchinner@redhat.com>
>
> The directory block format verifier fails to check that the leaf
> entry count is in a valid range, and so if it is corrupted then it
> can lead to dereferencing a pointer outside the block buffer. While we
> can't exactly validate the count without first walking the directory
> block, we can ensure the count lands in the valid area within the
> directory block and hence avoid out-of-block references.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  libxfs/xfs_dir2_data.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/libxfs/xfs_dir2_data.c b/libxfs/xfs_dir2_data.c
> index 189699f..1b5196b 100644
> --- a/libxfs/xfs_dir2_data.c
> +++ b/libxfs/xfs_dir2_data.c
> @@ -59,6 +59,18 @@ __xfs_dir3_data_check(
>  		btp = xfs_dir2_block_tail_p(mp, hdr);
>  		lep = xfs_dir2_block_leaf_p(btp);
>  		endp = (char *)lep;
> +
> +		/*
> +		 * The number of leaf entries is limited by the size of the
> +		 * block and the amount of space used by the data entries.
> +		 * We don't know how much space is used by the data entries yet,
> +		 * so just ensure that the count falls somewhere inside the
> +		 * block right now.
> +		 */
> +		XFS_WANT_CORRUPTED_RETURN(be32_to_cpu(btp->count) >
> +				((char *)btp - (char *)p) /
> +					sizeof(struct xfs_dir2_leaf_entry));
> +
>  		break;
>  	case cpu_to_be32(XFS_DIR3_DATA_MAGIC):
>  	case cpu_to_be32(XFS_DIR2_DATA_MAGIC):
> 



* Re: xfs_repair segfault
  2013-10-04 17:51       ` Viet Nguyen
@ 2013-10-04 21:43         ` Dave Chinner
  2013-10-07 20:09           ` Viet Nguyen
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2013-10-04 21:43 UTC (permalink / raw)
  To: Viet Nguyen; +Cc: xfs

On Fri, Oct 04, 2013 at 10:51:50AM -0700, Viet Nguyen wrote:
> Hi,
> 
> I was wondering if you got a chance to look at this and if one's available,
> where can I get a patch?

Can you try the patch below?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

libxfs: validity check the directory block leaf entry count

From: Dave Chinner <dchinner@redhat.com>

The directory block format verifier fails to check that the leaf
entry count is in a valid range, and so if it is corrupted then it
can lead to dereferencing a pointer outside the block buffer. While we
can't exactly validate the count without first walking the directory
block, we can ensure the count lands in the valid area within the
directory block and hence avoid out-of-block references.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 libxfs/xfs_dir2_data.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/libxfs/xfs_dir2_data.c b/libxfs/xfs_dir2_data.c
index 189699f..1b5196b 100644
--- a/libxfs/xfs_dir2_data.c
+++ b/libxfs/xfs_dir2_data.c
@@ -59,6 +59,18 @@ __xfs_dir3_data_check(
 		btp = xfs_dir2_block_tail_p(mp, hdr);
 		lep = xfs_dir2_block_leaf_p(btp);
 		endp = (char *)lep;
+
+		/*
+		 * The number of leaf entries is limited by the size of the
+		 * block and the amount of space used by the data entries.
+		 * We don't know how much space is used by the data entries yet,
+		 * so just ensure that the count falls somewhere inside the
+		 * block right now.
+		 */
+		XFS_WANT_CORRUPTED_RETURN(be32_to_cpu(btp->count) >
+				((char *)btp - (char *)p) /
+					sizeof(struct xfs_dir2_leaf_entry));
+
 		break;
 	case cpu_to_be32(XFS_DIR3_DATA_MAGIC):
 	case cpu_to_be32(XFS_DIR2_DATA_MAGIC):



* Re: xfs_repair segfault
  2013-10-02 10:42     ` Dave Chinner
@ 2013-10-04 17:51       ` Viet Nguyen
  2013-10-04 21:43         ` Dave Chinner
  0 siblings, 1 reply; 23+ messages in thread
From: Viet Nguyen @ 2013-10-04 17:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


[-- Attachment #1.1: Type: text/plain, Size: 2984 bytes --]

Hi,

I was wondering if you got a chance to look at this and if one's available,
where can I get a patch?

Thanks!
Viet


On Wed, Oct 2, 2013 at 3:42 AM, Dave Chinner <david@fromorbit.com> wrote:

> On Tue, Oct 01, 2013 at 02:12:16PM -0700, Viet Nguyen wrote:
> > Hi again,
> > Here's the stack trace:
> >
> > #0  __xfs_dir3_data_check (dp=<value optimized out>, bp=<value optimized
> > out>) at xfs_dir2_data.c:149
> > #1  0x0000000000451d32 in xfs_dir3_block_verify (bp=0x94369210) at
> > xfs_dir2_block.c:62
> > #2  0x0000000000451ed1 in xfs_dir3_block_read_verify (bp=0x94369210) at
> > xfs_dir2_block.c:73
> > #3  0x0000000000431e2a in libxfs_readbuf (btp=0x6aaca0, blkno=5292504,
> > len=8, flags=0, ops=0x478c60) at rdwr.c:718
> > #4  0x0000000000412295 in da_read_buf (mp=0x7fffffffe090, nex=1,
> bmp=<value
> > optimized out>, ops=<value optimized out>) at dir2.c:129
> > #5  0x0000000000415c26 in process_block_dir2 (mp=0x7fffffffe090,
> > ino=8639864, dip=0x95030000, ino_discovery=1, dino_dirty=<value optimized
> > out>, dirname=0x472201 "", parent=0x7fffffffdf28, blkmap=0x7ffff0342010)
> at
> > dir2.c:1594
> > #6  process_dir2 (mp=0x7fffffffe090, ino=8639864, dip=0x95030000,
> > ino_discovery=1, dino_dirty=<value optimized out>, dirname=0x472201 "",
> > parent=0x7fffffffdf28, blkmap=0x7ffff0342010) at dir2.c:1993
> > #7  0x0000000000411e6c in process_dinode_int (mp=0x7fffffffe090,
> > dino=0x95030000, agno=1, ino=0, was_free=0, dirty=0x7fffffffdf38,
> > used=0x7fffffffdf3c, verify_mode=0, uncertain=0, ino_discovery=1,
> > check_dups=0, extra_attr_check=1, isa_dir=0x7fffffffdf34,
> >     parent=0x7fffffffdf28) at dinode.c:2859
> > #8  0x000000000041213e in process_dinode (mp=<value optimized out>,
> > dino=<value optimized out>, agno=<value optimized out>, ino=<value
> > optimized out>, was_free=<value optimized out>, dirty=<value optimized
> > out>, used=0x7fffffffdf3c, ino_discovery=1, check_dups=0,
> >     extra_attr_check=1, isa_dir=0x7fffffffdf34, parent=0x7fffffffdf28) at
> > dinode.c:2967
> > #9  0x000000000040a870 in process_inode_chunk (mp=0x7fffffffe090, agno=0,
> > num_inos=<value optimized out>, first_irec=0x7fff5d63f320,
> ino_discovery=1,
> > check_dups=0, extra_attr_check=1, bogus=0x7fffffffdfcc) at
> dino_chunks.c:772
> > #10 0x000000000040ae97 in process_aginodes (mp=0x7fffffffe090,
> pf_args=0x0,
> > agno=0, ino_discovery=1, check_dups=0, extra_attr_check=1) at
> > dino_chunks.c:1014
> > #11 0x000000000041978d in process_ag_func (wq=0x695f40, agno=0, arg=0x0)
> at
> > phase3.c:77
> > #12 0x0000000000419bac in process_ags (mp=0x7fffffffe090) at phase3.c:116
> > #13 phase3 (mp=0x7fffffffe090) at phase3.c:155
> > #14 0x000000000042d200 in main (argc=<value optimized out>, argv=<value
> > optimized out>) at xfs_repair.c:749
>
> Looks like an out-of-range entry count. It's not checked for
> validity before it is used. I'll try to whip up a fix
> tomorrow.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>



* Re: xfs_repair segfault
  2013-10-01 21:12   ` Viet Nguyen
@ 2013-10-02 10:42     ` Dave Chinner
  2013-10-04 17:51       ` Viet Nguyen
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2013-10-02 10:42 UTC (permalink / raw)
  To: Viet Nguyen; +Cc: xfs

On Tue, Oct 01, 2013 at 02:12:16PM -0700, Viet Nguyen wrote:
> Hi again,
> Here's the stack trace:
> 
> #0  __xfs_dir3_data_check (dp=<value optimized out>, bp=<value optimized
> out>) at xfs_dir2_data.c:149
> #1  0x0000000000451d32 in xfs_dir3_block_verify (bp=0x94369210) at
> xfs_dir2_block.c:62
> #2  0x0000000000451ed1 in xfs_dir3_block_read_verify (bp=0x94369210) at
> xfs_dir2_block.c:73
> #3  0x0000000000431e2a in libxfs_readbuf (btp=0x6aaca0, blkno=5292504,
> len=8, flags=0, ops=0x478c60) at rdwr.c:718
> #4  0x0000000000412295 in da_read_buf (mp=0x7fffffffe090, nex=1, bmp=<value
> optimized out>, ops=<value optimized out>) at dir2.c:129
> #5  0x0000000000415c26 in process_block_dir2 (mp=0x7fffffffe090,
> ino=8639864, dip=0x95030000, ino_discovery=1, dino_dirty=<value optimized
> out>, dirname=0x472201 "", parent=0x7fffffffdf28, blkmap=0x7ffff0342010) at
> dir2.c:1594
> #6  process_dir2 (mp=0x7fffffffe090, ino=8639864, dip=0x95030000,
> ino_discovery=1, dino_dirty=<value optimized out>, dirname=0x472201 "",
> parent=0x7fffffffdf28, blkmap=0x7ffff0342010) at dir2.c:1993
> #7  0x0000000000411e6c in process_dinode_int (mp=0x7fffffffe090,
> dino=0x95030000, agno=1, ino=0, was_free=0, dirty=0x7fffffffdf38,
> used=0x7fffffffdf3c, verify_mode=0, uncertain=0, ino_discovery=1,
> check_dups=0, extra_attr_check=1, isa_dir=0x7fffffffdf34,
>     parent=0x7fffffffdf28) at dinode.c:2859
> #8  0x000000000041213e in process_dinode (mp=<value optimized out>,
> dino=<value optimized out>, agno=<value optimized out>, ino=<value
> optimized out>, was_free=<value optimized out>, dirty=<value optimized
> out>, used=0x7fffffffdf3c, ino_discovery=1, check_dups=0,
>     extra_attr_check=1, isa_dir=0x7fffffffdf34, parent=0x7fffffffdf28) at
> dinode.c:2967
> #9  0x000000000040a870 in process_inode_chunk (mp=0x7fffffffe090, agno=0,
> num_inos=<value optimized out>, first_irec=0x7fff5d63f320, ino_discovery=1,
> check_dups=0, extra_attr_check=1, bogus=0x7fffffffdfcc) at dino_chunks.c:772
> #10 0x000000000040ae97 in process_aginodes (mp=0x7fffffffe090, pf_args=0x0,
> agno=0, ino_discovery=1, check_dups=0, extra_attr_check=1) at
> dino_chunks.c:1014
> #11 0x000000000041978d in process_ag_func (wq=0x695f40, agno=0, arg=0x0) at
> phase3.c:77
> #12 0x0000000000419bac in process_ags (mp=0x7fffffffe090) at phase3.c:116
> #13 phase3 (mp=0x7fffffffe090) at phase3.c:155
> #14 0x000000000042d200 in main (argc=<value optimized out>, argv=<value
> optimized out>) at xfs_repair.c:749

Looks like an out-of-range entry count. It's not checked for
validity before it is used. I'll try to whip up a fix
tomorrow.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



* Re: xfs_repair segfault
  2013-10-01 20:19 ` Dave Chinner
@ 2013-10-01 21:12   ` Viet Nguyen
  2013-10-02 10:42     ` Dave Chinner
  0 siblings, 1 reply; 23+ messages in thread
From: Viet Nguyen @ 2013-10-01 21:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


[-- Attachment #1.1: Type: text/plain, Size: 4232 bytes --]

Hi again,
Here's the stack trace:

#0  __xfs_dir3_data_check (dp=<value optimized out>, bp=<value optimized
out>) at xfs_dir2_data.c:149
#1  0x0000000000451d32 in xfs_dir3_block_verify (bp=0x94369210) at
xfs_dir2_block.c:62
#2  0x0000000000451ed1 in xfs_dir3_block_read_verify (bp=0x94369210) at
xfs_dir2_block.c:73
#3  0x0000000000431e2a in libxfs_readbuf (btp=0x6aaca0, blkno=5292504,
len=8, flags=0, ops=0x478c60) at rdwr.c:718
#4  0x0000000000412295 in da_read_buf (mp=0x7fffffffe090, nex=1, bmp=<value
optimized out>, ops=<value optimized out>) at dir2.c:129
#5  0x0000000000415c26 in process_block_dir2 (mp=0x7fffffffe090,
ino=8639864, dip=0x95030000, ino_discovery=1, dino_dirty=<value optimized
out>, dirname=0x472201 "", parent=0x7fffffffdf28, blkmap=0x7ffff0342010) at
dir2.c:1594
#6  process_dir2 (mp=0x7fffffffe090, ino=8639864, dip=0x95030000,
ino_discovery=1, dino_dirty=<value optimized out>, dirname=0x472201 "",
parent=0x7fffffffdf28, blkmap=0x7ffff0342010) at dir2.c:1993
#7  0x0000000000411e6c in process_dinode_int (mp=0x7fffffffe090,
dino=0x95030000, agno=1, ino=0, was_free=0, dirty=0x7fffffffdf38,
used=0x7fffffffdf3c, verify_mode=0, uncertain=0, ino_discovery=1,
check_dups=0, extra_attr_check=1, isa_dir=0x7fffffffdf34,
    parent=0x7fffffffdf28) at dinode.c:2859
#8  0x000000000041213e in process_dinode (mp=<value optimized out>,
dino=<value optimized out>, agno=<value optimized out>, ino=<value
optimized out>, was_free=<value optimized out>, dirty=<value optimized
out>, used=0x7fffffffdf3c, ino_discovery=1, check_dups=0,
    extra_attr_check=1, isa_dir=0x7fffffffdf34, parent=0x7fffffffdf28) at
dinode.c:2967
#9  0x000000000040a870 in process_inode_chunk (mp=0x7fffffffe090, agno=0,
num_inos=<value optimized out>, first_irec=0x7fff5d63f320, ino_discovery=1,
check_dups=0, extra_attr_check=1, bogus=0x7fffffffdfcc) at dino_chunks.c:772
#10 0x000000000040ae97 in process_aginodes (mp=0x7fffffffe090, pf_args=0x0,
agno=0, ino_discovery=1, check_dups=0, extra_attr_check=1) at
dino_chunks.c:1014
#11 0x000000000041978d in process_ag_func (wq=0x695f40, agno=0, arg=0x0) at
phase3.c:77
#12 0x0000000000419bac in process_ags (mp=0x7fffffffe090) at phase3.c:116
#13 phase3 (mp=0x7fffffffe090) at phase3.c:155
#14 0x000000000042d200 in main (argc=<value optimized out>, argv=<value
optimized out>) at xfs_repair.c:749




On Tue, Oct 1, 2013 at 1:19 PM, Dave Chinner <david@fromorbit.com> wrote:

> On Tue, Oct 01, 2013 at 12:57:42PM -0700, Viet Nguyen wrote:
> > Hi,
> >
> > I have a corrupted xfs partition that segfaults when I run xfs_repair, at
> > the same place every time.
> >
> > I'm using the latest version of xfs_repair that I am aware of: xfs_repair
> > version 3.2.0-alpha1
> >
> > I simply run it as so: xfs_repair -P /dev/sda1
> >
> > Here's a sample of the last few lines that are spit out:
> > correcting nextents for inode 8637985
> > correcting nblocks for inode 8637985, was 198 - counted 0
> > correcting nextents for inode 8637985, was 1 - counted 0
> > data fork in regular inode 8637987 claims used block 7847452695
> > correcting nextents for inode 8637987
> > correcting nblocks for inode 8637987, was 198 - counted 0
> > correcting nextents for inode 8637987, was 1 - counted 0
> > data fork in regular inode 8637999 claims used block 11068974204
> > correcting nextents for inode 8637999
> > correcting nblocks for inode 8637999, was 200 - counted 0
> > correcting nextents for inode 8637999, was 1 - counted 0
> > data fork in regular inode 8638002 claims used block 11873152787
> > correcting nextents for inode 8638002
> > correcting nblocks for inode 8638002, was 201 - counted 0
> > correcting nextents for inode 8638002, was 1 - counted 0
> > imap claims a free inode 8638005 is in use, correcting imap and clearing
> > inode
> > cleared inode 8638005
> > imap claims a free inode 8638011 is in use, correcting imap and clearing
> > inode
> > cleared inode 8638011
> > Segmentation fault (core dumped)
> >
> > It crashes after attempting to clear that same inode every time.
> >
> > Any advice you can give me on this?
>
> Can you run it under gdb and send the stack trace that tells us
> where it crashed?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>



* Re: xfs_repair segfault
  2013-10-01 19:57 Viet Nguyen
@ 2013-10-01 20:19 ` Dave Chinner
  2013-10-01 21:12   ` Viet Nguyen
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2013-10-01 20:19 UTC (permalink / raw)
  To: Viet Nguyen; +Cc: xfs

On Tue, Oct 01, 2013 at 12:57:42PM -0700, Viet Nguyen wrote:
> Hi,
> 
> I have a corrupted xfs partition that segfaults when I run xfs_repair, at
> the same place every time.
> 
> I'm using the latest version of xfs_repair that I am aware of: xfs_repair
> version 3.2.0-alpha1
> 
> I simply run it as so: xfs_repair -P /dev/sda1
> 
> Here's a sample of the last few lines that are spit out:
> correcting nextents for inode 8637985
> correcting nblocks for inode 8637985, was 198 - counted 0
> correcting nextents for inode 8637985, was 1 - counted 0
> data fork in regular inode 8637987 claims used block 7847452695
> correcting nextents for inode 8637987
> correcting nblocks for inode 8637987, was 198 - counted 0
> correcting nextents for inode 8637987, was 1 - counted 0
> data fork in regular inode 8637999 claims used block 11068974204
> correcting nextents for inode 8637999
> correcting nblocks for inode 8637999, was 200 - counted 0
> correcting nextents for inode 8637999, was 1 - counted 0
> data fork in regular inode 8638002 claims used block 11873152787
> correcting nextents for inode 8638002
> correcting nblocks for inode 8638002, was 201 - counted 0
> correcting nextents for inode 8638002, was 1 - counted 0
> imap claims a free inode 8638005 is in use, correcting imap and clearing
> inode
> cleared inode 8638005
> imap claims a free inode 8638011 is in use, correcting imap and clearing
> inode
> cleared inode 8638011
> Segmentation fault (core dumped)
> 
> It crashes after attempting to clear that same inode every time.
> 
> Any advice you can give me on this?

Can you run it under gdb and send the stack trace that tells us
where it crashed?
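
Something along these lines should do it, using whatever arguments you were
running xfs_repair with:

    # gdb --args xfs_repair -P /dev/sda1
    (gdb) run
    ... wait for the segfault ...
    (gdb) bt

and paste the output of bt.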

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



* xfs_repair segfault
@ 2013-10-01 19:57 Viet Nguyen
  2013-10-01 20:19 ` Dave Chinner
  0 siblings, 1 reply; 23+ messages in thread
From: Viet Nguyen @ 2013-10-01 19:57 UTC (permalink / raw)
  To: xfs


[-- Attachment #1.1: Type: text/plain, Size: 1466 bytes --]

Hi,

I have a corrupted xfs partition that segfaults when I run xfs_repair, at
the same place every time.

I'm using the latest version of xfs_repair that I am aware of: xfs_repair
version 3.2.0-alpha1

I simply run it as so: xfs_repair -P /dev/sda1

Here's a sample of the last few lines that are spit out:
correcting nextents for inode 8637985
correcting nblocks for inode 8637985, was 198 - counted 0
correcting nextents for inode 8637985, was 1 - counted 0
data fork in regular inode 8637987 claims used block 7847452695
correcting nextents for inode 8637987
correcting nblocks for inode 8637987, was 198 - counted 0
correcting nextents for inode 8637987, was 1 - counted 0
data fork in regular inode 8637999 claims used block 11068974204
correcting nextents for inode 8637999
correcting nblocks for inode 8637999, was 200 - counted 0
correcting nextents for inode 8637999, was 1 - counted 0
data fork in regular inode 8638002 claims used block 11873152787
correcting nextents for inode 8638002
correcting nblocks for inode 8638002, was 201 - counted 0
correcting nextents for inode 8638002, was 1 - counted 0
imap claims a free inode 8638005 is in use, correcting imap and clearing
inode
cleared inode 8638005
imap claims a free inode 8638011 is in use, correcting imap and clearing
inode
cleared inode 8638011
Segmentation fault (core dumped)

It crashes after attempting to clear that same inode every time.

Any advice you can give me on this?

Thanks,
Viet



* RE: xfs_repair segfault
  2007-04-03 19:11 James W. Abendschan
@ 2007-04-04  0:45 ` Barry Naujok
  0 siblings, 0 replies; 23+ messages in thread
From: Barry Naujok @ 2007-04-04  0:45 UTC (permalink / raw)
  To: 'James W. Abendschan', xfs

Hi James,

Would it be possible for you apply the patch I posted to xfs@oss
in Feb http://oss.sgi.com/archives/xfs/2007-02/msg00072.html
to the latest xfsprogs source, make and install it and run:

# xfs_metadump /dev/md1 - | bzip2 > /tmp/bad_xfs.bz2

And make the image available for me to download and analyse?

Regards,
Barry.

> -----Original Message-----
> From: xfs-bounce@oss.sgi.com [mailto:xfs-bounce@oss.sgi.com] 
> On Behalf Of James W. Abendschan
> Sent: Wednesday, 4 April 2007 5:12 AM
> To: xfs@oss.sgi.com
> Subject: xfs_repair segfault
> 
> Hi there -- I have a 6.9TB XFS volume that is acting up
> after a power failure (I understand XFS + no UPS + PC
> hardware == badness.  Not my decision.)
> 
> The machine is a dual proc x86 (intel xeon 5130) w/ 8GB RAM
> running a custom 2.6.18 kernel on top of Ubuntu 6.06.
> 
> Since xfs_check can't repair volumes of this size without
> scads of memory, I've been using xfs_repair to correct
> power-related problems before.
> 
> Unfortunately, for some reason xfs_repair is segfaulting:
> 
> # ulimit -c unlimited
> # xfs_repair -v /dev/md1
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
> zero_log: head block 8 tail block 8
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - agno = 4
>         - agno = 5
>         - agno = 6
>         - agno = 7
>         - agno = 8
>         - agno = 9
>         - agno = 10
>         - agno = 11
>         - agno = 12
>         - agno = 13
>         - agno = 14
>         - agno = 15
>         - agno = 16
>         - agno = 17
>         - agno = 18
>         - agno = 19
>         - agno = 20
>         - agno = 21
>         - agno = 22
>         - agno = 23
>         - agno = 24
>         - agno = 25
>         - agno = 26
>         - agno = 27
>         - agno = 28
>         - agno = 29
>         - agno = 30
>         - agno = 31
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - clear lost+found (if it exists) ...
>         - clearing existing "lost+found" inode
> Segmentation fault      (core dumped)
> 
> 
> gdb doesn't show anything useful (I don't know how to interpret
> the I/O error) :
> 
> 
> # gdb /sbin/xfs_repair core
> GNU gdb 6.4-debian
> Copyright 2005 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public 
> License, and you are
> welcome to change it and/or distribute copies of it under 
> certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show 
> warranty" for details.
> This GDB was configured as "i486-linux-gnu"...(no debugging 
> symbols found)
> Using host libthread_db library 
> "/lib/tls/i686/cmov/libthread_db.so.1".
> 
> (no debugging symbols found)
> Core was generated by `xfs_repair -v /dev/md1'.
> Program terminated with signal 11, Segmentation fault.
> 
> warning: Can't read pathname for load map: Input/output error.
> Reading symbols from /lib/libuuid.so.1...(no debugging 
> symbols found)...done.
> Loaded symbols for /lib/libuuid.so.1
> Reading symbols from /lib/tls/i686/cmov/libc.so.6...(no 
> debugging symbols found)
> Loaded symbols for /lib/tls/i686/cmov/libc.so.6
> Reading symbols from /lib/ld-linux.so.2...(no debugging 
> symbols found)...done.
> Loaded symbols for /lib/ld-linux.so.2
> 
> #0  0x08052f42 in ?? ()
> (gdb) bt
> #0  0x08052f42 in ?? ()
> #1  0x000088e9 in ?? ()
> #2  0x00000800 in ?? ()
> #3  0x00000080 in ?? ()
> #4  0x00000000 in ?? ()
> 
> 
> What's the next step?
> 
> Thanks,
> James
> 
> 
> 


* xfs_repair segfault
@ 2007-04-03 19:11 James W. Abendschan
  2007-04-04  0:45 ` Barry Naujok
  0 siblings, 1 reply; 23+ messages in thread
From: James W. Abendschan @ 2007-04-03 19:11 UTC (permalink / raw)
  To: xfs

Hi there -- I have a 6.9TB XFS volume that is acting up
after a power failure (I understand XFS + no UPS + PC
hardware == badness.  Not my decision.)

The machine is a dual proc x86 (intel xeon 5130) w/ 8GB RAM
running a custom 2.6.18 kernel on top of Ubuntu 6.06.

Since xfs_check can't repair volumes of this size without
scads of memory, I've been using xfs_repair to correct
power-related problems before.

Unfortunately, for some reason xfs_repair is segfaulting:

# ulimit -c unlimited
# xfs_repair -v /dev/md1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
zero_log: head block 8 tail block 8
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - clearing existing "lost+found" inode
Segmentation fault      (core dumped)


gdb doesn't show anything useful (I don't know how to interpret
the I/O error) :


# gdb /sbin/xfs_repair core
GNU gdb 6.4-debian
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".

(no debugging symbols found)
Core was generated by `xfs_repair -v /dev/md1'.
Program terminated with signal 11, Segmentation fault.

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libuuid.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libuuid.so.1
Reading symbols from /lib/tls/i686/cmov/libc.so.6...(no debugging symbols found)
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2

#0  0x08052f42 in ?? ()
(gdb) bt
#0  0x08052f42 in ?? ()
#1  0x000088e9 in ?? ()
#2  0x00000800 in ?? ()
#3  0x00000080 in ?? ()
#4  0x00000000 in ?? ()


What's the next step?

Thanks,
James
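
(A note on the backtrace above: it has only raw addresses because the packaged
binary was built without debug info. One way to get a symbolic trace is to
rebuild xfs_repair from the xfsprogs source with debugging enabled and re-run
it under gdb; the tarball name, paths and build details below are placeholders
and may vary between releases:

    $ tar xf xfsprogs-X.Y.Z.tar.gz && cd xfsprogs-X.Y.Z
    $ CFLAGS="-g -O0" ./configure && make
    # gdb --args ./repair/xfs_repair -v /dev/md1
    (gdb) run
    (gdb) bt

Alternatively, a distribution debug-symbol package for xfsprogs, where one
exists, avoids the rebuild.)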


end of thread, other threads:[~2015-03-09 20:14 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-09 15:50 xfs_repair segfault Rui Gomes
2015-03-09 15:55 ` Carsten Aulbert
2015-03-09 16:11   ` Rui Gomes
2015-03-09 16:14 ` Eric Sandeen
2015-03-09 16:24   ` Rui Gomes
2015-03-09 17:34     ` Eric Sandeen
2015-03-09 17:50       ` Rui Gomes
2015-03-09 18:18         ` Eric Sandeen
2015-03-09 18:24           ` Rui Gomes
2015-03-09 20:13             ` Eric Sandeen
  -- strict thread matches above, loose matches on Subject: below --
2013-10-01 19:57 Viet Nguyen
2013-10-01 20:19 ` Dave Chinner
2013-10-01 21:12   ` Viet Nguyen
2013-10-02 10:42     ` Dave Chinner
2013-10-04 17:51       ` Viet Nguyen
2013-10-04 21:43         ` Dave Chinner
2013-10-07 20:09           ` Viet Nguyen
2013-10-08 20:23             ` Dave Chinner
2013-10-09 18:59               ` Viet Nguyen
2013-10-09 20:15                 ` Dave Chinner
2013-10-10 21:13               ` Viet Nguyen
2007-04-03 19:11 James W. Abendschan
2007-04-04  0:45 ` Barry Naujok
