* xfs_repair segfaut in stage 6
@ 2011-09-09  8:43 Bartosz Cisek
  2011-09-09 12:01 ` Michael Monnerie
  2011-09-09 12:38 ` Christoph Hellwig
  0 siblings, 2 replies; 17+ messages in thread
From: Bartosz Cisek @ 2011-09-09  8:43 UTC (permalink / raw)
  To: xfs

Hi,

Recently we had some problems with a 2TB partition using XFS.

I/O error in filesystem ("cciss/c0d5p1") meta-data dev cciss/c0d5p1 block 0x120a0       ("xfs_trans_read_buf") error 5 buf count 4096

At first I suspected failed hardware, but the disk array wasn't reporting
anything.

The distro (Debian 5.0) version of xfs_repair finished with a segfault.
Same with one compiled from git. I filed a bug report [1]. Can someone
possibly look into it? As it's a heavily replicated Hadoop partition, I am
keeping it only to provide additional information to resolve this
segfault. I would be happy to provide as much detailed info as required,
but I need to know what :)

[1] http://oss.sgi.com/bugzilla/show_bug.cgi?id=914

Kind regards,

Bartek
-- 
Bartosz Cisek
Admin

email: bartosz.cisek@nasza-klasa.pl
tel: +48 519 300 122

Nasza Klasa Sp. z o.o.,
ul. Gen. J. Bema 2, 50-265 Wrocław

Sąd Rejonowy dla Wrocławia - Fabrycznej we Wrocławiu,
VI Wydział Gospodarczy Krajowego Rejestru Sądowego,
nr KRS:0000289629, NIP:898-21-22-104 REGON:020586020,
Kapitał zakładowy: 67 850,00 PLN

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs_repair segfaut in stage 6
  2011-09-09  8:43 xfs_repair segfaut in stage 6 Bartosz Cisek
@ 2011-09-09 12:01 ` Michael Monnerie
  2011-09-09 15:06   ` Bartosz Cisek
  2011-09-09 12:38 ` Christoph Hellwig
  1 sibling, 1 reply; 17+ messages in thread
From: Michael Monnerie @ 2011-09-09 12:01 UTC (permalink / raw)
  To: xfs; +Cc: Bartosz Cisek



On Friday, 9 September 2011 Bartosz Cisek wrote:
> Distro (Debian 5.0) version of xfs_repair finished with segfault.
> Same with compiled from git. I filled bug report [1]. Can someone
> possibly look into it? As it's hadoop heavy replicated partition I
> keep it only to provide additional information to resolve this
> segfault. I would be happy to provide as much detailed info as
> required, but I need to know what :)

Can you provide a metadump of that partition? With it, the devs can
reproduce the bug on their own machines and resolve issues much more quickly.

Which kernel and architecture are you on?

-- 
Kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531

// House for sale: http://zmi.at/langegg/


* Re: xfs_repair segfaut in stage 6
  2011-09-09  8:43 xfs_repair segfaut in stage 6 Bartosz Cisek
  2011-09-09 12:01 ` Michael Monnerie
@ 2011-09-09 12:38 ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2011-09-09 12:38 UTC (permalink / raw)
  To: Bartosz Cisek; +Cc: xfs

On Fri, Sep 09, 2011 at 10:43:37AM +0200, Bartosz Cisek wrote:
> Hi,
> 
> Recently we had some problems with 2TB partition using xfs.
> 
> I/O error in filesystem ("cciss/c0d5p1") meta-data dev cciss/c0d5p1
> block 0x120a0       ("xfs_trans_read_buf") error 5 buf count 4096
> 
> At first I suspected failed hardware, but disk array wasn't reporting
> anything.

But that error really means that we got an error from the device.

> Distro (Debian 5.0) version of xfs_repair finished with segfault. Same
> with compiled from git. I filled bug report [1]. Can someone possibly
> look into it? As it's hadoop heavy replicated partition I keep it only
> to provide additional information to resolve this segfault. I would be
> happy to provide as much detailed info as required, but I need to know
> what :)

The above pretty much guarantees a hardware (or maybe driver) issue,
as XFS only gets the EIO from the lower layers.

Of course that does not mean that xfs_repair should crash.  I'll look
into it.  As already mentioned a metadump image would be very helpful.
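
For illustration, the failing read can be probed from user space. The sketch below assumes that the block number in the kernel message (0x120a0) counts 512-byte basic blocks; the `sketch_` names are hypothetical and not part of xfsprogs. It converts the block number to a byte offset and hands back the errno from pread(), which is where an EIO from the device or driver would surface.

```c
#define _XOPEN_SOURCE 700	/* for pread() */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: block numbers are 512-byte "basic blocks" (shift of 9). */
#define SKETCH_BBSHIFT 9

/* Convert a basic-block number to a byte offset on the device. */
static int64_t sketch_bb_to_off(uint64_t blkno)
{
	return (int64_t)(blkno << SKETCH_BBSHIFT);
}

/*
 * Try to read `count` bytes at basic block `blkno` from an open fd.
 * Returns 0 on a full read, the errno value (e.g. EIO) if pread()
 * failed, or EIO if fewer bytes than requested came back.
 */
static int sketch_probe_block(int fd, uint64_t blkno, void *buf, size_t count)
{
	ssize_t got = pread(fd, buf, count, (off_t)sketch_bb_to_off(blkno));

	if (got < 0)
		return errno;
	if ((size_t)got != count)
		return EIO;
	return 0;
}
```

Under that assumption, block 0x120a0 corresponds to byte offset 37830656, and calling `sketch_probe_block(fd, 0x120a0, buf, 4096)` on the opened device would return EIO if the lower layers still fail that read.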


* Re: xfs_repair segfaut in stage 6
  2011-09-09 12:01 ` Michael Monnerie
@ 2011-09-09 15:06   ` Bartosz Cisek
  2011-09-12 15:42     ` Bartosz Cisek
  2011-09-12 16:12     ` Christoph Hellwig
  0 siblings, 2 replies; 17+ messages in thread
From: Bartosz Cisek @ 2011-09-09 15:06 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs

On 09.09.2011 14:01, Michael Monnerie wrote:

Thanks for the reply :)

> Can you provide a metadump of that partition? By this, the devs can 
> reproduce the bug on their machine, and resolve issues much quicker.

xfs_metadump -g /dev/cciss/c0d5p1 - | bzip2  > metadump.bz2

http://bartoszcisek.pl/metadump.bz2

> Which kernel and architecture are you on?

Linux hd-slave5 2.6.30-1-amd64 #1 SMP Tue Aug 18 17:39:23 CEST 2009 x86_64 GNU/Linux


* Re: xfs_repair segfaut in stage 6
  2011-09-09 15:06   ` Bartosz Cisek
@ 2011-09-12 15:42     ` Bartosz Cisek
  2011-09-12 15:58       ` Christoph Hellwig
  2011-09-12 16:12     ` Christoph Hellwig
  1 sibling, 1 reply; 17+ messages in thread
From: Bartosz Cisek @ 2011-09-12 15:42 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs

On 09.09.2011 17:06, Bartosz Cisek wrote:

>> Can you provide a metadump of that partition? By this, the devs can 
>> reproduce the bug on their machine, and resolve issues much quicker.
> 
> xfs_metadump -g /dev/cciss/c0d5p1 - | bzip2  > metadump.bz2
> 
> http://bartoszcisek.pl/metadump.bz2

Is there anything more I can get from the failed partition to help solve
this bug? Or can it be safely recreated?

* Re: xfs_repair segfaut in stage 6
  2011-09-12 15:42     ` Bartosz Cisek
@ 2011-09-12 15:58       ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2011-09-12 15:58 UTC (permalink / raw)
  To: Bartosz Cisek; +Cc: Michael Monnerie, xfs

On Mon, Sep 12, 2011 at 05:42:18PM +0200, Bartosz Cisek wrote:
> W dniu 09.09.2011 17:06, Bartosz Cisek pisze:
> 
> >> Can you provide a metadump of that partition? By this, the devs can 
> >> reproduce the bug on their machine, and resolve issues much quicker.
> > 
> > xfs_metadump -g /dev/cciss/c0d5p1 - | bzip2  > metadump.bz2
> > 
> > http://bartoszcisek.pl/metadump.bz2
> 
> Is there anything more I can get from failed partition to help solving
> this bug? Or it can be safely recreated?

Sorry for not replying earlier Bart.  I'm downloading the metadump image
now and will look at it.

Do you still have the kernel logs from when the issue happened?


* Re: xfs_repair segfaut in stage 6
  2011-09-09 15:06   ` Bartosz Cisek
  2011-09-12 15:42     ` Bartosz Cisek
@ 2011-09-12 16:12     ` Christoph Hellwig
  2011-09-14  9:38       ` Bartosz Cisek
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2011-09-12 16:12 UTC (permalink / raw)
  To: Bartosz Cisek; +Cc: Michael Monnerie, xfs

On Fri, Sep 09, 2011 at 05:06:27PM +0200, Bartosz Cisek wrote:
> W dniu 09.09.2011 14:01, Michael Monnerie pisze:
> 
> Thanks for reply :)
> 
> > Can you provide a metadump of that partition? By this, the devs can 
> > reproduce the bug on their machine, and resolve issues much quicker.
> 
> xfs_metadump -g /dev/cciss/c0d5p1 - | bzip2  > metadump.bz2
> 
> http://bartoszcisek.pl/metadump.bz2
> 
> > Which kernel and architecture are you on?
> 
> Linux hd-slave5 2.6.30-1-amd64 #1 SMP Tue Aug 18 17:39:23 CEST 2009
> x86_64 GNU/Linux

I've repaired the image fine using xfs_repair from the Debian testing
xfsprogs 3.1.5+nmu1 package.  It found two invalid blocks in a
directory, which look like the result from the hardware error you saw.

You should be able to just rebuild the current xfsprogs (from
testing/unstable or git) on Lenny and get the same result.


* Re: xfs_repair segfaut in stage 6
  2011-09-12 16:12     ` Christoph Hellwig
@ 2011-09-14  9:38       ` Bartosz Cisek
  2011-09-14 14:24         ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Bartosz Cisek @ 2011-09-14  9:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Michael Monnerie, xfs

On 12.09.2011 18:12, Christoph Hellwig wrote:
> I've repaired the image fine using xfs_repair from the Debian testing
> xfsprogs 3.1.5+nmu1 package.  It found two invalid blocks in a
> directory, which look like the result from the hardware error you saw.
> 
> You should be able to just rebuild the current xfsprogs (from
> testing/unstable or git) on Lenny and get the same result.

I had already built xfs_repair from git on lenny and got the segfault
before my first email to this list (please refer to the first email in
this thread) :)

What else can I do to find out what differs between our two cases?


hd-slave5 ~/devel/xfsprogs/repair # LC_ALL=en_EN ./xfs_repair -v /dev/cciss/c0d5p1
Phase 1 - find and verify superblock...
        - block cache size set to 3446312 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 2 tail block 2
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
xfs_repair: read failed: Input/output error
can't read block 0 for directory inode 146453
no . entry for directory 146453
no .. entry for directory 146453
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 1
xfs_repair: read failed: Input/output error
can't read block 0 for directory inode 146453
no . entry for directory 146453
no .. entry for directory 146453
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
entry "subdir49" in dir ino 437 doesn't have a .. entry, will set it in ino 146453.
xfs_repair: read failed: Input/output error
Naruszenie ochrony pamięci (segfault)
hd-slave5 ~/devel/xfsprogs/repair # LC_ALL=en_EN ./xfs_repair -V
xfs_repair version 3.1.5


* Re: xfs_repair segfaut in stage 6
  2011-09-14  9:38       ` Bartosz Cisek
@ 2011-09-14 14:24         ` Christoph Hellwig
  2011-09-14 14:57           ` Eric Sandeen
  2011-09-14 14:59           ` Bartosz Cisek
  0 siblings, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2011-09-14 14:24 UTC (permalink / raw)
  To: Bartosz Cisek; +Cc: Christoph Hellwig, Michael Monnerie, xfs

On Wed, Sep 14, 2011 at 11:38:44AM +0200, Bartosz Cisek wrote:
> W dniu 12.09.2011 18:12, Christoph Hellwig pisze:
> > I've repaired the image fine using xfs_repair from the Debian testing
> > xfsprogs 3.1.5+nmu1 package.  It found two invalid blocks in a
> > directory, which look like the result from the hardware error you saw.
> > 
> > You should be able to just rebuild the current xfsprogs (from
> > testing/unstable or git) on Lenny and get the same result.
> 
> I've build xfs_repair from git on lenny and got segfault before my first
> email to this list (please refer to first email in this thread) :)

Ooops.

> What else I can do to find what differ our two cases?

The hardware?  Given that you were getting read I/O errors from the
hardware when shutting the fs down you probably got those as well when
running repair and that caused the segfault.   Can you run xfs_repair
under gdb for me, that is:

gdb /path/to/xfs_repair

(gdb) set args /dev/cciss/c0d5p1
(gdb) run

and see what backtrace it gives you?

Please make sure to build xfs_repair in the git tree manually using
make and not the Debian packaging, as that removes the debug symbols.

You can run gdb on the xfs_repair binary just built in the tree, no
need to install it first.


* Re: xfs_repair segfaut in stage 6
  2011-09-14 14:24         ` Christoph Hellwig
@ 2011-09-14 14:57           ` Eric Sandeen
  2011-09-14 15:10             ` Bartosz Cisek
  2011-09-14 14:59           ` Bartosz Cisek
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Sandeen @ 2011-09-14 14:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Bartosz Cisek, Michael Monnerie, xfs

On 9/14/11 9:24 AM, Christoph Hellwig wrote:
> On Wed, Sep 14, 2011 at 11:38:44AM +0200, Bartosz Cisek wrote:
>> W dniu 12.09.2011 18:12, Christoph Hellwig pisze:
>>> I've repaired the image fine using xfs_repair from the Debian testing
>>> xfsprogs 3.1.5+nmu1 package.  It found two invalid blocks in a
>>> directory, which look like the result from the hardware error you saw.
>>>
>>> You should be able to just rebuild the current xfsprogs (from
>>> testing/unstable or git) on Lenny and get the same result.
>>
>> I've build xfs_repair from git on lenny and got segfault before my first
>> email to this list (please refer to first email in this thread) :)
> 
> Ooops.
> 
>> What else I can do to find what differ our two cases?
> 
> The hardware?  Given that you were getting read I/O errors from the
> hardware when shutting the fs down you probably got those as well when
> running repair and that caused the segfault.   Can you run xfs_repair
> under gdb for me, that is:
> 
> gdb /path/to.xfs_repair
> 
> (gdb) set args /dev/cciss/c0d5p1
> (gdb) run
> 
> and see what backtrace it gives you?
> 
> Please make sure to build xfs_repair in the git tree manually using
> make and not the Debian packaging, as that removes the debug symbols.
> 
> You can run gdb on the xfs_repair binary just built in the tree, no
> need to install it first.

If it worked for Christoph and not for Bartosz ....

Is the -DDEBUG / -DNDEBUG type setting the same in both cases?
Maybe Fedora is the only one doing -DNDEBUG :( but maybe worth
checking.

-Eric
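
As a rough illustration of why the setting matters: checks written as debug-only assertions vanish from a build that lacks the debug define, so the same source can stop early in one build and run on with bad data in another. The macro and function names below are hypothetical and are not the xfsprogs ASSERT; this is only a sketch of the general mechanism.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical macro for illustration only.
 * With SKETCH_DEBUG defined the check aborts on failure; without it
 * the check compiles to nothing and execution carries on.
 */
#ifdef SKETCH_DEBUG
#define SKETCH_ASSERT(expr) \
	do { \
		if (!(expr)) { \
			fprintf(stderr, "assertion failed: %s\n", #expr); \
			abort(); \
		} \
	} while (0)
#else
#define SKETCH_ASSERT(expr) ((void)0)
#endif

/* Returns 1 if `bp` is usable, 0 if it is NULL. */
static int sketch_buffer_ok(const void *bp)
{
	SKETCH_ASSERT(bp != NULL);	/* fires only in a SKETCH_DEBUG build */
	return bp != NULL;
}
```

Built with `-DSKETCH_DEBUG`, a NULL buffer aborts at the assertion; built without it, the function simply returns 0 and any caller that ignores the result continues with the bad pointer.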


* Re: xfs_repair segfaut in stage 6
  2011-09-14 14:24         ` Christoph Hellwig
  2011-09-14 14:57           ` Eric Sandeen
@ 2011-09-14 14:59           ` Bartosz Cisek
  2011-09-14 15:38             ` Christoph Hellwig
  1 sibling, 1 reply; 17+ messages in thread
From: Bartosz Cisek @ 2011-09-14 14:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Michael Monnerie, xfs

On 14.09.2011 16:24, Christoph Hellwig wrote:
> The hardware?  Given that you were getting read I/O errors from the
> hardware when shutting the fs down you probably got those as well when
> running repair and that caused the segfault.   Can you run xfs_repair
> under gdb for me, that is:
> 
> gdb /path/to.xfs_repair
> 
> (gdb) set args /dev/cciss/c0d5p1
> (gdb) run
> 
> and see what backtrace it gives you?
> 
> Please make sure to build xfs_repair in the git tree manually using
> make and not the Debian packaging, as that removes the debug symbols.
> 
> You can run gdb on the xfs_repair binary just built in the tree, no
> need to install it first.

The stack trace is pasted in the bug report [1] that is linked in the
first mail ;) Compiled by hand from git: "DEBUG=-DDEBUG make". I don't
know why some of the values are 'optimized out'.

[1] http://oss.sgi.com/bugzilla/show_bug.cgi?id=914

* Re: xfs_repair segfaut in stage 6
  2011-09-14 14:57           ` Eric Sandeen
@ 2011-09-14 15:10             ` Bartosz Cisek
  2011-09-14 15:23               ` Eric Sandeen
  0 siblings, 1 reply; 17+ messages in thread
From: Bartosz Cisek @ 2011-09-14 15:10 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Christoph Hellwig, Michael Monnerie, xfs

On 14.09.2011 16:57, Eric Sandeen wrote:
> If it worked for Christoph and not for Bartosz ....
> 
> Is the -DDEBUG / -DNDEBUG type setting the same in both cases?
> Maybe Fedora is the only one doing -DNDEBUG :( but maybe worth
> checking.

Tested both. I was wondering why I got 'value optimized out' in gdb, so
I compiled with DEBUG=-DNDEBUG and DEBUG=-DDEBUG. Both crashed, both had
'optimized out' values in the gdb backtrace.

I found two places where DEBUG may be overridden:

hd-slave5 ~/devel/xfsprogs # ack-grep NDEBUG
libxfs/Makefile
39:DEBUG = -DNDEBUG

libxlog/Makefile
16:DEBUG = -DNDEBUG

When I changed -DNDEBUG to -DDEBUG, compilation failed:

xfs_ialloc.c: In function ‘xfs_imap’:
xfs_ialloc.c:1122: warning: implicit declaration of function ‘xfs_stack_trace’
    [CC]     xfs_inode.lo
xfs_inode.c: In function ‘xfs_validate_extents’:
xfs_inode.c:45: warning: implicit declaration of function ‘get_unaligned’
xfs_inode.c: In function ‘xfs_imap_to_bp’:
xfs_inode.c:149: warning: implicit declaration of function ‘XFS_BUFTARG_NAME’
xfs_inode.c: In function ‘xfs_iextents_copy’:
xfs_inode.c:1012: warning: implicit declaration of function ‘xfs_isilocked’
xfs_inode.c:1012: error: ‘XFS_ILOCK_SHARED’ undeclared (first use in this function)
xfs_inode.c:1012: error: (Each undeclared identifier is reported only once
xfs_inode.c:1012: error: for each function it appears in.)

* Re: xfs_repair segfaut in stage 6
  2011-09-14 15:10             ` Bartosz Cisek
@ 2011-09-14 15:23               ` Eric Sandeen
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Sandeen @ 2011-09-14 15:23 UTC (permalink / raw)
  To: Bartosz Cisek; +Cc: Christoph Hellwig, Michael Monnerie, xfs

On 9/14/11 10:10 AM, Bartosz Cisek wrote:
> W dniu 14.09.2011 16:57, Eric Sandeen pisze:
>> If it worked for Christoph and not for Bartosz ....
>>
>> Is the -DDEBUG / -DNDEBUG type setting the same in both cases?
>> Maybe Fedora is the only one doing -DNDEBUG :( but maybe worth
>> checking.
> 
> Tested both. I was wodering why I got 'value optimized out' in gdb, so
> compiled with DEBUG=-DNDBEUG and DEBUG=-DDEBUG. Both crashed, both had
> 'optimized out' values in gdb backtrace.
> 
> I found two placed where DEBUG may be overwritten:
> 
> hd-slave5 ~/devel/xfsprogs # ack-grep NDEBUG
> libxfs/Makefile
> 39:DEBUG = -DNDEBUG
> 
> libxlog/Makefile
> 16:DEBUG = -DNDEBUG
> 
> When I changed -DNDEBUG to -DDEBUG/ compilation failed.

yeah, that's why it's overridden in those places.  :)

-Eric


* Re: xfs_repair segfaut in stage 6
  2011-09-14 14:59           ` Bartosz Cisek
@ 2011-09-14 15:38             ` Christoph Hellwig
  2011-09-14 16:05               ` Bartosz Cisek
  2011-09-20 20:25               ` Alex Elder
  0 siblings, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2011-09-14 15:38 UTC (permalink / raw)
  To: Bartosz Cisek; +Cc: Christoph Hellwig, Michael Monnerie, xfs

On Wed, Sep 14, 2011 at 04:59:40PM +0200, Bartosz Cisek wrote:
> Stack trace is pasted in bug issue [1] that is linked in first mail ;)
> Compiled by hand from git: "DEBUG=-DDEBUG make". I don't know why some
> of values are 'optimized out'.
> 
> [1] http://oss.sgi.com/bugzilla/show_bug.cgi?id=914

Looks like we do not handle read I/O errors very well (or rather, at all)
in phase6.  Can you see if the patch below makes a difference?

---
From: Christoph Hellwig <hch@lst.de>
Subject: repair: fix I/O error handling

Currently libxfs_trans_read_buf never returns an error, even if
libxfs_readbuf did not manage to complete the I/O.  This is different
from the kernel behaviour and can lead to segfaults in code that
doesn't expect it.  Add a new b_error member to xfs_buf (mirroring
the kernel version) and use that to propagate proper error codes
to the caller.  Also fix libxfs_readbufr to handle short reads
properly, and to not override errno values e.g. by a fprintf.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfsprogs-dev/include/libxfs.h
===================================================================
--- xfsprogs-dev.orig/include/libxfs.h	2011-09-14 11:17:42.660738577 -0400
+++ xfsprogs-dev/include/libxfs.h	2011-09-14 11:20:45.959738580 -0400
@@ -230,6 +230,7 @@ typedef struct xfs_buf {
 	void			*b_fsprivate2;
 	void			*b_fsprivate3;
 	char			*b_addr;
+	int			b_error;
 #ifdef XFS_BUF_TRACING
 	struct list_head	b_lock_list;
 	const char		*b_func;
Index: xfsprogs-dev/libxfs/rdwr.c
===================================================================
--- xfsprogs-dev.orig/libxfs/rdwr.c	2011-09-14 11:12:08.807741720 -0400
+++ xfsprogs-dev/libxfs/rdwr.c	2011-09-14 11:20:21.183238272 -0400
@@ -314,6 +314,7 @@ libxfs_initbuf(xfs_buf_t *bp, dev_t devi
 	bp->b_blkno = bno;
 	bp->b_bcount = bytes;
 	bp->b_dev = device;
+	bp->b_error = 0;
 	if (!bp->b_addr)
 		bp->b_addr = memalign(libxfs_device_alignment(), bytes);
 	if (!bp->b_addr) {
@@ -454,15 +455,17 @@ libxfs_readbufr(dev_t dev, xfs_daddr_t b
 {
 	int	fd = libxfs_device_to_fd(dev);
 	int	bytes = BBTOB(len);
+	int	error;
 
 	ASSERT(BBTOB(len) <= bp->b_bcount);
 
-	if (pread64(fd, bp->b_addr, bytes, LIBXFS_BBTOOFF64(blkno)) < 0) {
+	if (pread64(fd, bp->b_addr, bytes, LIBXFS_BBTOOFF64(blkno)) != bytes) {
+		error = errno;
 		fprintf(stderr, _("%s: read failed: %s\n"),
-			progname, strerror(errno));
+			progname, strerror(error));
 		if (flags & LIBXFS_EXIT_ON_FAILURE)
 			exit(1);
-		return errno;
+		return error;
 	}
 #ifdef IO_DEBUG
 	printf("%lx: %s: read %u bytes, blkno=%llu(%llu), %p\n",
@@ -485,10 +488,8 @@ libxfs_readbuf(dev_t dev, xfs_daddr_t bl
 	bp = libxfs_getbuf(dev, blkno, len);
 	if (bp && !(bp->b_flags & (LIBXFS_B_UPTODATE|LIBXFS_B_DIRTY))) {
 		error = libxfs_readbufr(dev, blkno, bp, len, flags);
-		if (error) {
-			libxfs_putbuf(bp);
-			return NULL;
-		}
+		if (error)
+			bp->b_error = error;
 	}
 	return bp;
 }
Index: xfsprogs-dev/libxfs/trans.c
===================================================================
--- xfsprogs-dev.orig/libxfs/trans.c	2011-09-14 11:12:08.827738490 -0400
+++ xfsprogs-dev/libxfs/trans.c	2011-09-14 11:21:19.771739416 -0400
@@ -478,9 +478,20 @@ libxfs_trans_read_buf(
 	xfs_buf_log_item_t	*bip;
 	xfs_buftarg_t		bdev;
 
+	*bpp = NULL;
+
 	if (tp == NULL) {
-		*bpp = libxfs_readbuf(dev, blkno, len, flags);
-		return 0;
+		bp = libxfs_readbuf(dev, blkno, len, flags);
+		if (!bp) {
+			return (flags & XBF_TRYLOCK) ?
+				EAGAIN : XFS_ERROR(ENOMEM);
+		}
+		if (bp->b_error) {
+			int error = bp->b_error;
+			xfs_buf_relse(bp);
+			return error;
+		}
+		goto done;
 	}
 
 	bdev.dev = dev;
@@ -490,15 +501,20 @@ libxfs_trans_read_buf(
 		ASSERT(XFS_BUF_FSPRIVATE(bp, void *) != NULL);
 		bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t*);
 		bip->bli_recur++;
-		*bpp = bp;
-		return 0;
+		goto done;
 	}
 
 	bp = libxfs_readbuf(dev, blkno, len, flags);
-       if (!bp){
-               *bpp = NULL;
-               return errno;
-       }
+	if (!bp) {
+		return (flags & XBF_TRYLOCK) ?
+			EAGAIN : XFS_ERROR(ENOMEM);
+	}
+	if (bp->b_error) {
+		int error = bp->b_error;
+		xfs_buf_relse(bp);
+		return error;
+	}
+
 #ifdef XACT_DEBUG
 	fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp);
 #endif
@@ -510,6 +526,8 @@ libxfs_trans_read_buf(
 
 	/* initialise b_fsprivate2 so we can find it incore */
 	XFS_BUF_SET_FSPRIVATE2(bp, tp);
+
+done:
 	*bpp = bp;
 	return 0;
 }
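
The idea of the patch, reduced to a stand-alone sketch: the low-level read records a failure in the buffer instead of hiding it, and the caller-facing function turns that into a return code while leaving the output pointer NULL. All `sketch_` names and the `fail_with` parameter are hypothetical stand-ins for illustration and are not the libxfs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, stripped-down buffer; only the error field matters here. */
struct sketch_buf {
	char	*addr;
	int	error;	/* 0 on success, errno value if the read failed */
};

/* Stand-in for the low-level read: `fail_with` simulates a device error. */
static struct sketch_buf *sketch_readbuf(size_t bytes, int fail_with)
{
	struct sketch_buf *bp = calloc(1, sizeof(*bp));

	if (!bp)
		return NULL;
	bp->addr = calloc(1, bytes);
	if (!bp->addr) {
		free(bp);
		return NULL;
	}
	bp->error = fail_with;	/* record the failure instead of hiding it */
	return bp;
}

/* Release a buffer obtained from sketch_readbuf(). */
static void sketch_relse(struct sketch_buf *bp)
{
	if (bp) {
		free(bp->addr);
		free(bp);
	}
}

/* Caller-facing read: *bpp is NULL on failure, the return value says why. */
static int sketch_trans_read_buf(size_t bytes, int fail_with,
				 struct sketch_buf **bpp)
{
	struct sketch_buf *bp;

	*bpp = NULL;
	bp = sketch_readbuf(bytes, fail_with);
	if (!bp)
		return ENOMEM;
	if (bp->error) {
		int error = bp->error;

		sketch_relse(bp);
		return error;
	}
	*bpp = bp;
	return 0;
}
```

A caller that checks the return value never sees a half-read buffer, and one that only checks the pointer at least gets NULL rather than stale data.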


* Re: xfs_repair segfaut in stage 6
  2011-09-14 15:38             ` Christoph Hellwig
@ 2011-09-14 16:05               ` Bartosz Cisek
  2011-09-20 20:25               ` Alex Elder
  1 sibling, 0 replies; 17+ messages in thread
From: Bartosz Cisek @ 2011-09-14 16:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Michael Monnerie, xfs

On 14.09.2011 17:38, Christoph Hellwig wrote:
> On Wed, Sep 14, 2011 at 04:59:40PM +0200, Bartosz Cisek wrote:
>> Stack trace is pasted in bug issue [1] that is linked in first mail ;)
>> Compiled by hand from git: "DEBUG=-DDEBUG make". I don't know why some
>> of values are 'optimized out'.
>>
>> [1] http://oss.sgi.com/bugzilla/show_bug.cgi?id=914
> 
> Looks like we do not handle read I/O errors very well (to say at all)
> in phase6.  Can you see if the patch below makes a difference?

It handled the I/O error and carried on. Thanks ;)

* Re: xfs_repair segfaut in stage 6
  2011-09-14 15:38             ` Christoph Hellwig
  2011-09-14 16:05               ` Bartosz Cisek
@ 2011-09-20 20:25               ` Alex Elder
  2011-09-20 20:28                 ` Christoph Hellwig
  1 sibling, 1 reply; 17+ messages in thread
From: Alex Elder @ 2011-09-20 20:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Bartosz Cisek, Michael Monnerie, xfs

On Wed, 2011-09-14 at 11:38 -0400, Christoph Hellwig wrote:
> On Wed, Sep 14, 2011 at 04:59:40PM +0200, Bartosz Cisek wrote:
> > Stack trace is pasted in bug issue [1] that is linked in first mail ;)
> > Compiled by hand from git: "DEBUG=-DDEBUG make". I don't know why some
> > of values are 'optimized out'.
> > 
> > [1] http://oss.sgi.com/bugzilla/show_bug.cgi?id=914
> 
> Looks like we do not handle read I/O errors very well (to say at all)
> in phase6.  Can you see if the patch below makes a difference?

Christoph, I'm assuming you want this reviewed
as a submitted patch.


> ---
> From: Christoph Hellwig <hch@lst.de>
> Subject: repair: fix I/O error handling
> 
> Currently libxfs_trans_read_buf never returns an error, even if
> libxfs_readbuf did not manage to complete the I/O.  This is different
> from the kernel behaviour and can lead to segfaults in code that
> doesn't expect it.  Add a new b_error member to xfs_buf (mirroring
> the kernel version) and use that to propagate proper error codes
> to the caller.  Also fix libxfs_readbufr to handle short reads
> properly, and to not override errno values e.g. by a fprintf.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Index: xfsprogs-dev/include/libxfs.h
> ===================================================================
> --- xfsprogs-dev.orig/include/libxfs.h	2011-09-14 11:17:42.660738577 -0400
> +++ xfsprogs-dev/include/libxfs.h	2011-09-14 11:20:45.959738580 -0400
> @@ -230,6 +230,7 @@ typedef struct xfs_buf {
>  	void			*b_fsprivate2;
>  	void			*b_fsprivate3;
>  	char			*b_addr;
> +	int			b_error;
>  #ifdef XFS_BUF_TRACING
>  	struct list_head	b_lock_list;
>  	const char		*b_func;
> Index: xfsprogs-dev/libxfs/rdwr.c
> ===================================================================
> --- xfsprogs-dev.orig/libxfs/rdwr.c	2011-09-14 11:12:08.807741720 -0400
> +++ xfsprogs-dev/libxfs/rdwr.c	2011-09-14 11:20:21.183238272 -0400
> @@ -314,6 +314,7 @@ libxfs_initbuf(xfs_buf_t *bp, dev_t devi
>  	bp->b_blkno = bno;
>  	bp->b_bcount = bytes;
>  	bp->b_dev = device;
> +	bp->b_error = 0;
>  	if (!bp->b_addr)
>  		bp->b_addr = memalign(libxfs_device_alignment(), bytes);
>  	if (!bp->b_addr) {
> @@ -454,15 +455,17 @@ libxfs_readbufr(dev_t dev, xfs_daddr_t b
>  {
>  	int	fd = libxfs_device_to_fd(dev);
>  	int	bytes = BBTOB(len);
> +	int	error;
>  
>  	ASSERT(BBTOB(len) <= bp->b_bcount);
>  
> -	if (pread64(fd, bp->b_addr, bytes, LIBXFS_BBTOOFF64(blkno)) < 0) {
> +	if (pread64(fd, bp->b_addr, bytes, LIBXFS_BBTOOFF64(blkno)) != bytes) {

If we reach EOF this returns 0, but errno is I think
going to be 0.  Do we want to print a "read failed"
message in that case?  Is EOF a failure, or just
a somewhat normal condition?

Also, it may not matter in the calling code (I
did only a quick check) but maybe it would be
better to set bp->b_error here, where the error
really occurred, rather than in libxfs_readbuf().

Other than that, this change looks good to me.

Reviewed-by: Alex Elder <aelder@sgi.com>
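
The distinction raised in the review can be sketched as follows, assuming POSIX pread() semantics: a return of -1 is an error with errno set, while 0 or any count smaller than requested is a short read for which errno carries no information. The names are hypothetical, and whether a short read should print "read failed" is left to the caller.

```c
#define _XOPEN_SOURCE 700	/* for pread() and mkstemp() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

enum sketch_read_result {
	SKETCH_READ_OK,		/* all requested bytes arrived */
	SKETCH_READ_SHORT,	/* EOF or partial read; errno is not meaningful */
	SKETCH_READ_ERROR	/* pread() returned -1; *errp holds errno */
};

/* Read exactly `bytes` bytes at `off`, classifying the outcome. */
static enum sketch_read_result
sketch_read_full(int fd, void *buf, size_t bytes, off_t off, int *errp)
{
	ssize_t got;

	*errp = 0;
	got = pread(fd, buf, bytes, off);
	if (got < 0) {
		*errp = errno;	/* save before any other library call */
		return SKETCH_READ_ERROR;
	}
	if ((size_t)got != bytes)
		return SKETCH_READ_SHORT;
	return SKETCH_READ_OK;
}
```

Keeping the short-read case separate means the caller can choose its own error code for it (for example EIO) instead of reporting whatever happens to be left in errno.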



* Re: xfs_repair segfaut in stage 6
  2011-09-20 20:25               ` Alex Elder
@ 2011-09-20 20:28                 ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2011-09-20 20:28 UTC (permalink / raw)
  To: Alex Elder; +Cc: Christoph Hellwig, Bartosz Cisek, Michael Monnerie, xfs

On Tue, Sep 20, 2011 at 03:25:41PM -0500, Alex Elder wrote:
> Christoph, I'm assuming you want this reviewed
> as a submitted patch.

I'll resend it split into a proper series soon.
