From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from rcsinet15.oracle.com ([148.87.113.117]:28809 "EHLO rcsinet15.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751212Ab2GJGed (ORCPT ); Tue, 10 Jul 2012 02:34:33 -0400
Message-ID: <4FFBCC1A.8020800@oracle.com>
Date: Tue, 10 Jul 2012 14:30:50 +0800
From: Anand Jain
MIME-Version: 1.0
To: Christian Volkmann
CC: linux-btrfs@vger.kernel.org
Subject: Re: btrfsck crashes
References: <4FF9B07C.8090209@cv-sv.de> <4FFA52BC.9010401@oracle.com> <4FFB4BB4.4080408@cv-sv.de>
In-Reply-To: <4FFB4BB4.4080408@cv-sv.de>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

Christian,

The line numbers are still confusing to me as well. The patch was meant
to avoid the seg fault when the csum_root node is NULL, and that might
not be the case here (assuming the original stack trace is still the
same as the one below).

---------
>>> (gdb) bt
>>> #0 0x0000000000402379 in btrfs_header_nritems (eb=0x0) at ctree.h:1426
>>> #1 0x0000000000408c14 in run_next_block (root=0x73fb40, bits=0x740d50, bits_nr=1024, last=0x7fffffffd948, pending=0x7fffffffda40,
>>> seen=0x7fffffffda50, reada=0x7fffffffda30, nodes=0x7fffffffda20, extent_cache=0x7fffffffda60) at btrfsck.c:2512
>>> #2 0x00000000004099e2 in check_extents (root=0x73fb40) at btrfsck.c:2792
>>> #3 0x0000000000409bec in main (ac=1, av=0x7fffffffdbe8) at btrfsck.c:2853
----------
>>> What I have seen: buf is "0", after read_tree_block.
>>>
>>> btrfsck.c:2511 buf = read_tree_block(root, bytenr, size, 0);
>>> 2512 nritems = btrfs_header_nritems(buf);
----------

Looking at it again (ignoring the line numbers), we already have the
extent_buffer_uptodate check on buf, so buf can't be NULL when
btrfs_header_nritems is called. That contradicts the stack trace above
if you are using the latest code, as shown below.
http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=blob;f=btrfsck.c;h=088b9f427339cde70dd6b1a457aeba5cf190ce34;hb=HEAD

-------
2526 static int run_next_block(struct btrfs_root *root,
::
2585         buf = read_tree_block(root, bytenr, size, 0);
2586         if (!extent_buffer_uptodate(buf)) {
2587                 record_bad_block_io(root->fs_info,
2588                                     extent_cache, bytenr, size);
2589                 free_extent_buffer(buf);
2590                 goto out;
2591         }
2592
2593         nritems = btrfs_header_nritems(buf); <-- Seg fault ??
-------

Thanks,
-Anand

On 10/07/12 05:23, Christian Volkmann wrote:
> Anand Jain wrote:
> >
> >
> >> What I have seen: buf is "0", after read_tree_block.
> >
> > Yes, since we are not checking extent_buffer_uptodate for the
> > csum_root tree, that will pass the NULL buf. The following patch will
> > avoid passing a NULL buffer:
> > https://patchwork.kernel.org/patch/1148831/
> >
> > However, whether --init-csum-tree will build a good csum tree will
> > still depend on the corruption, IMO.
> >
> > -Anand
> >
>
> .)
> The patch does not help.
> This is false: !extent_buffer_uptodate(info->csum_root->node)
>
> .)
> Output of btrfsck built from git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git ,
> patched at line 3552.
>
> speedy:/tmp/btrfs/btrfs-progs # gdb ./btrfsck
> GNU gdb (GDB) SUSE (7.3-41.1.2)
> Copyright (C) 2011 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-suse-linux".
> For bug reporting instructions, please see:
> ...
> Reading symbols from /tmp/btrfs/btrfs-progs/btrfsck...done.
> (gdb) r /dev/md3
> Starting program: /tmp/btrfs/btrfs-progs/btrfsck /dev/md3
> Missing separate debuginfo for /lib64/ld-linux-x86-64.so.2
> Try: zypper install -C "debuginfo(build-id)=f20c99249f5a5776e1377d3bd728502e3f455a3f"
> Missing separate debuginfo for /lib64/libuuid.so.1
> Try: zypper install -C "debuginfo(build-id)=24ae727f9cd5fb29f81b0f965859d3cf4668bf17"
> Missing separate debuginfo for /lib64/libc.so.6
> Try: zypper install -C "debuginfo(build-id)=7b169b1db50384b70e3e4b4884cd56432d5de796"
> checking extents
> checksum verify failed on 2327654400 wanted 89AAEA38 found 72
> checksum verify failed on 2327654400 wanted 89AAEA38 found 72
> checksum verify failed on 2327654400 wanted 73CDE79C found 72
> checksum verify failed on 2327654400 wanted 89AAEA38 found 72
> Csum didn't match
> owner ref check failed [2327654400 4096]
> ref mismatch on [101138354176 98304] extent item 1, found 0
> Incorrect local backref count on 101138354176 root 5 owner 1867898 offset 0 found 0 wanted 1 back 0x182ebd20
> backpointer mismatch on [101138354176 98304]
> owner ref check failed [101138354176 98304]
> ref mismatch on [101138452480 106496] extent item 1, found 0
> Incorrect local backref count on 101138452480 root 5 owner 1867899 offset 0 found 0 wanted 1 back 0xefb8d0
> backpointer mismatch on [101138452480 106496]
> owner ref check failed [101138452480 106496]
> ref mismatch on [101138558976 8192] extent item 1, found 0
> Incorrect local backref count on 101138558976 root 5 owner 1867901 offset 0 found 0 wanted 1 back 0x5a22350
> backpointer mismatch on [101138558976 8192]
> owner ref check failed [101138558976 8192]
> ref mismatch on [101138567168 16384] extent item 1, found 0
> Incorrect local backref count on 101138567168 root 5 owner 1867902 offset 0 found 0 wanted 1 back 0x5a22390
> backpointer mismatch on [101138567168 16384]
> owner ref check failed [101138567168 16384]
> ref mismatch on [101138583552 16384] extent item 1, found 0
> Incorrect local backref count
on 101138583552 root 5 owner 1867903 offset 0 found 0 wanted 1 back 0x19dfaae0
> backpointer mismatch on [101138583552 16384]
> owner ref check failed [101138583552 16384]
> Errors found in extent allocation tree
> checking fs roots
> checksum verify failed on 2327654400 wanted 89AAEA38 found 72
> checksum verify failed on 2327654400 wanted 89AAEA38 found 72
> checksum verify failed on 2327654400 wanted 73CDE79C found 72
> checksum verify failed on 2327654400 wanted 89AAEA38 found 72
> Csum didn't match
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000402264 in btrfs_header_level (eb=0x0) at ctree.h:1540
> 1540 BTRFS_SETGET_HEADER_FUNCS(header_level, struct btrfs_header, level, 8);
> (gdb)
>
>
> .)
> Against which git repository should I apply the patch?
> The repository listed on the wiki does not seem to be up to date:
> http://git.darksatanic.net/repo/btrfs-progs-unstable.git
>
> The line numbers do not match in this repository:
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
>
> .)
> What I find strange: why does the same "number" 2327654400 want
> different checksums?
>
> checksum verify failed on 2327654400 wanted 89AAEA38 found 72
> checksum verify failed on 2327654400 wanted 73CDE79C found 72
>
>
> Thanks & regards,
> Christian
>
>
>>
>> On 09/07/12 00:08, Christian Volkmann wrote:
>>> Hi there,
>>>
>>> I have a corrupted filesystem. This filesystem crashes btrfsck.
>>>
>>> A gdb analysis showed me:
>>> (gdb) bt
>>> #0 0x0000000000402379 in btrfs_header_nritems (eb=0x0) at ctree.h:1426
>>> #1 0x0000000000408c14 in run_next_block (root=0x73fb40, bits=0x740d50, bits_nr=1024, last=0x7fffffffd948, pending=0x7fffffffda40,
>>> seen=0x7fffffffda50, reada=0x7fffffffda30, nodes=0x7fffffffda20, extent_cache=0x7fffffffda60) at btrfsck.c:2512
>>> #2 0x00000000004099e2 in check_extents (root=0x73fb40) at btrfsck.c:2792
>>> #3 0x0000000000409bec in main (ac=1, av=0x7fffffffdbe8) at btrfsck.c:2853
>>>
>>> What I have seen: buf is "0", after read_tree_block.
>>>
>>> btrfsck.c:2511 buf = read_tree_block(root, bytenr, size, 0);
>>> 2512 nritems = btrfs_header_nritems(buf);
>>>
>>> So ctree.h crashes here with btrfs_header_nritems(buf)
>>> ...
>>> static inline u##bits btrfs_##name(struct extent_buffer *eb) \
>>> { \
>>>         struct btrfs_header *h = (struct btrfs_header *)eb->data; \
>>>         return le##bits##_to_cpu(h->member); \
>>> } \
>>> ...
>>>
>>> I suspect the case "eb == 0" is not handled by ctree.h.
>>> Maybe another fix is required, e.g. hardening btrfsck against "0".
>>>
>>> The file system also crashes the kernel on some accesses. I did not follow
>>> this up, because the file system is corrupt. (Using openSUSE Tumbleweed 3.4.4-31-desktop.)
>>> Maybe the kernel code also needs checks for this?
>>>
>>> Please contact me if I should do some further tests with this file system
>>> or use some tools to test a fix. (I have developer knowledge.)
>>>
>>> Another minor issue: btrfsck uses a lot of memory. But this might be normal.
>>> ( > 800MB)
>>>
>>> Best regards,
>>> Christian
>>>
>>>
>>>
>>> PS: In case anyone is interested:
>>> - History + tried: openSUSE btrfsck showed the messages below in the first step.
>>> - /sbin/btrfsck /dev/md3 --repair removed some messages, except checksum.
>>> - File system is mounted with:
>>> /backup btrfs defaults,compress=zlib,noatime 1 2
>>> - The filesystem is used to back up some Unix systems with heavy usage of:
>>> rsync -aH .... --link-dest=...
>>> So each file should normally have multiple hard links.
>>>
>>> ===
>>> Is there anybody interested in fixing this file system with me,
>>> to check btrfsck?
>>> speedy:/home/cv # /sbin/btrfsck /dev/md3
>>> checking extents
>>> checksum verify failed on 2327654400 wanted 73CDE79C found 72
>>> checksum verify failed on 2327654400 wanted 73CDE79C found 72
>>> checksum verify failed on 2327654400 wanted 73CDE79C found 72
>>> checksum verify failed on 2327654400 wanted 73CDE79C found 72
>>> Csum didn't match
>>> owner ref check failed [2327654400 4096]
>>> ref mismatch on [101138354176 98304] extent item 1, found 0
>>> Incorrect local backref count on 101138354176 root 5 owner 1867898 offset 0 found 0 wanted 1 back 0x1f076d0
>>> backpointer mismatch on [101138354176 98304]
>>> owner ref check failed [101138354176 98304]
>>> ref mismatch on [101138452480 106496] extent item 1, found 0
>>> Incorrect local backref count on 101138452480 root 5 owner 1867899 offset 0 found 0 wanted 1 back 0x6aa85d0
>>> backpointer mismatch on [101138452480 106496]
>>> owner ref check failed [101138452480 106496]
>>> ref mismatch on [101138558976 8192] extent item 1, found 0
>>> Incorrect local backref count on 101138558976 root 5 owner 1867901 offset 0 found 0 wanted 1 back 0x6aa8610
>>> backpointer mismatch on [101138558976 8192]
>>> owner ref check failed [101138558976 8192]
>>> ref mismatch on [101138567168 16384] extent item 1, found 0
>>> Incorrect local backref count on 101138567168 root 5 owner 1867902 offset 0 found 0 wanted 1 back 0x1f8fa80
>>> backpointer mismatch on [101138567168 16384]
>>> owner ref check failed [101138567168 16384]
>>> ref mismatch on [101138583552 16384] extent item 1, found 0
>>> Incorrect local backref count on 101138583552 root 5 owner 1867903 offset 0 found 0 wanted
1 back 0x1f8fac0
>>> backpointer mismatch on [101138583552 16384]
>>> owner ref check failed [101138583552 16384]
>>> Errors found in extent allocation tree
>>> checking fs roots
>>> checksum verify failed on 2327654400 wanted 73CDE79C found 72
>>> checksum verify failed on 2327654400 wanted 73CDE79C found 72
>>> checksum verify failed on 2327654400 wanted 73CDE79C found 72
>>> checksum verify failed on 2327654400 wanted 73CDE79C found 72
>>> Csum didn't match
>>> Segmentation fault
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
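
The guard discussed in this thread (check the buffer returned by the block read before reading any header field from it) can be sketched in isolation as follows. This is only a minimal illustration, not btrfs-progs code: every demo_* type and function is a hypothetical stand-in, and the NULL-tolerant up-to-date check is an assumption of the sketch rather than a statement about the real extent_buffer_uptodate.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real structures; not the btrfs-progs definitions. */
struct demo_header {
	uint32_t nritems;
	uint8_t level;
};

struct demo_extent_buffer {
	int uptodate;              /* non-zero if the block was read and verified */
	struct demo_header hdr;
};

/* Assumed for this sketch: the up-to-date check tolerates a NULL buffer. */
static int demo_buffer_uptodate(const struct demo_extent_buffer *eb)
{
	return eb != NULL && eb->uptodate;
}

/* Unchecked accessor, like the macro-generated header getters:
 * it dereferences eb, so passing NULL here would crash. */
static uint32_t demo_header_nritems(const struct demo_extent_buffer *eb)
{
	return eb->hdr.nritems;
}

/* Simulated block read: returns NULL when the block is out of range
 * or failed verification (for example a checksum mismatch). */
static struct demo_extent_buffer *demo_read_block(struct demo_extent_buffer *blocks,
						  size_t count, size_t index)
{
	if (index >= count || !blocks[index].uptodate)
		return NULL;
	return &blocks[index];
}

/* Guarded caller: stores the item count and returns 0 on success,
 * or returns -1 without touching the header if the block is unreadable. */
static int demo_count_items(struct demo_extent_buffer *blocks, size_t count,
			    size_t index, uint32_t *nritems)
{
	struct demo_extent_buffer *buf = demo_read_block(blocks, count, index);

	if (!demo_buffer_uptodate(buf)) {
		fprintf(stderr, "block %zu unreadable, skipping\n", index);
		return -1;
	}
	*nritems = demo_header_nritems(buf);
	return 0;
}
```

A caller would pass an array of blocks and an index to demo_count_items and treat a return value of -1 as "record the bad block and move on". The point of the sketch is only that every path from a failed read to a header accessor needs such a guard, which matches the two crash sites reported above (btrfs_header_nritems and btrfs_header_level, both with eb=0x0).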