* [sparc64] mkfs.btrfs bus error / align issue?
@ 2016-07-27 13:59 Anatoly Pugachev
  2016-07-27 19:56 ` David Sterba
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Anatoly Pugachev @ 2016-07-27 13:59 UTC (permalink / raw)
  To: Btrfs BTRFS

Hello!

While running the xfstests suite, I found a mkfs.btrfs bus error in the
logs. Debugging it shows the following:

mator@nvg5120:~/btrfs-progs$ git log -1 --oneline
40650bf Btrfs progs v4.6.1

root@nvg5120:/home/mator/xfstests# gdb
GNU gdb (Debian 7.11.1-2) 7.11.1
(gdb) file /opt/btrfs/bin/mkfs.btrfs
Reading symbols from /opt/btrfs/bin/mkfs.btrfs...done.
(gdb) set args -f -draid5 -mraid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
(gdb) run
Starting program: /opt/btrfs/bin/mkfs.btrfs -f -draid5 -mraid5
/dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1".
btrfs-progs v4.6.1
See http://btrfs.wiki.kernel.org for more information.

ERROR: superblock checksum mismatch
ERROR: superblock checksum mismatch
ERROR: superblock checksum mismatch
Performing full device TRIM (2.00GiB) ...
Performing full device TRIM (2.00GiB) ...
Performing full device TRIM (2.00GiB) ...
Performing full device TRIM (2.00GiB) ...

Program received signal SIGBUS, Bus error.
0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
volumes.c:2156
2156                                    *(unsigned long *)(p_eb->data + i) ^=
(gdb) bt
#0  0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570)
    at volumes.c:2156
#1  0x0000000000119b30 in write_and_map_eb (trans=0x2cc250,
root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:426
#2  0x0000000000119e74 in write_tree_block (trans=0x2cc250,
root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:459
#3  0x000000000011a4ac in __commit_transaction (trans=0x2cc250,
root=0x2c7d80) at disk-io.c:562
#4  0x000000000011a7b8 in btrfs_commit_transaction (trans=0x2cc250,
root=0x2c7d80) at disk-io.c:598
#5  0x00000000001a2b04 in main (argc=8, argv=0x7fefffff698) at mkfs.c:1786
(gdb)

Can someone help please? Thanks.

PS: the /dev/loop devices are backed by files on a ramdisk:

# mount tmpfs -t tmpfs -o size=12g /ramdisk
# fallocate -l 3.9G /ramdisk/testvol
# for i in 1 2 3 4; do fallocate -l 2G /ramdisk/scratch${i} ; done
# ls -lh /ramdisk/
total 12G
-rw-r--r-- 1 root root 2.0G Jul 27 16:16 scratch1
-rw-r--r-- 1 root root 2.0G Jul 27 16:16 scratch2
-rw-r--r-- 1 root root 2.0G Jul 27 16:16 scratch3
-rw-r--r-- 1 root root 2.0G Jul 27 16:16 scratch4
-rw-r--r-- 1 root root 3.9G Jul 27 16:15 testvol

# for i in /ramdisk/*; do echo -n "$i : "; losetup -f --show $i; done
/ramdisk/scratch1 : /dev/loop0
/ramdisk/scratch2 : /dev/loop1
/ramdisk/scratch3 : /dev/loop2
/ramdisk/scratch4 : /dev/loop3
/ramdisk/testvol : /dev/loop4

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-27 13:59 [sparc64] mkfs.btrfs bus error / align issue? Anatoly Pugachev
@ 2016-07-27 19:56 ` David Sterba
  2016-07-28  9:44   ` David Sterba
  2016-07-27 20:39 ` Patrick Baggett
  2016-07-27 20:40 ` John Paul Adrian Glaubitz
  2 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2016-07-27 19:56 UTC (permalink / raw)
  To: Anatoly Pugachev; +Cc: Btrfs BTRFS

On Wed, Jul 27, 2016 at 04:59:27PM +0300, Anatoly Pugachev wrote:
> Hello!
> 
> Running xfstests suite, got in logs mkfs.btrfs bus error, debugging it
> shows the following :
> 
> Program received signal SIGBUS, Bus error.
> 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
> volumes.c:2156
> 2156                                    *(unsigned long *)(p_eb->data + i) ^=

Yeah, a clear unaligned access. We have helpers for that, so I'll fix it. I was
looking for a way to simulate and catch this on x86, or at least get gcc to
warn, but no such thing seems to exist. That means we might accidentally
introduce it again in the future.
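
One way to get at least a debug-time check on x86 is to assert the
alignment of a pointer before casting it to a wider type. The sketch below
assumes a hypothetical helper, is_aligned(), which is not part of
btrfs-progs. Separately, recent GCC and Clang accept -fsanitize=alignment,
which reports misaligned loads and stores at run time even on targets where
the hardware tolerates them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from btrfs-progs): returns nonzero if p is a
 * multiple of align. align must be a power of two.
 */
static inline int is_aligned(const void *p, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t)p & (align - 1)) == 0;
}
```

A debug build could then place assert(is_aligned(ptr, sizeof(unsigned long)))
in front of a word-sized cast, so the problem shows up on x86 as an
assertion failure instead of only as SIGBUS on sparc64.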


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-27 13:59 [sparc64] mkfs.btrfs bus error / align issue? Anatoly Pugachev
  2016-07-27 19:56 ` David Sterba
@ 2016-07-27 20:39 ` Patrick Baggett
  2016-07-27 20:41   ` Patrick Baggett
  2016-07-27 20:40 ` John Paul Adrian Glaubitz
  2 siblings, 1 reply; 21+ messages in thread
From: Patrick Baggett @ 2016-07-27 20:39 UTC (permalink / raw)
  To: Anatoly Pugachev; +Cc: Btrfs BTRFS

On Wed, Jul 27, 2016 at 8:59 AM, Anatoly Pugachev <matorola@gmail.com> wrote:
>
> Hello!
>
> Running xfstests suite, got in logs mkfs.btrfs bus error, debugging it
> shows the following :
>
> mator@nvg5120:~/btrfs-progs$ git log -1 --oneline
> 40650bf Btrfs progs v4.6.1
>
> root@nvg5120:/home/mator/xfstests# gdb
> GNU gdb (Debian 7.11.1-2) 7.11.1
> (gdb) file /opt/btrfs/bin/mkfs.btrfs
> Reading symbols from /opt/btrfs/bin/mkfs.btrfs...done.
> (gdb) set args -f -draid5 -mraid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
> (gdb) run
> Starting program: /opt/btrfs/bin/mkfs.btrfs -f -draid5 -mraid5
> /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1".
> btrfs-progs v4.6.1
> See http://btrfs.wiki.kernel.org for more information.
>
> ERROR: superblock checksum mismatch
> ERROR: superblock checksum mismatch
> ERROR: superblock checksum mismatch
> Performing full device TRIM (2.00GiB) ...
> Performing full device TRIM (2.00GiB) ...
> Performing full device TRIM (2.00GiB) ...
> Performing full device TRIM (2.00GiB) ...
>
> Program received signal SIGBUS, Bus error.
> 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
> volumes.c:2156
> 2156                                    *(unsigned long *)(p_eb->data + i) ^=
> (gdb) bt
> #0  0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570)
>     at volumes.c:2156
> #1  0x0000000000119b30 in write_and_map_eb (trans=0x2cc250,
> root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:426
> #2  0x0000000000119e74 in write_tree_block (trans=0x2cc250,
> root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:459
> #3  0x000000000011a4ac in __commit_transaction (trans=0x2cc250,
> root=0x2c7d80) at disk-io.c:562
> #4  0x000000000011a7b8 in btrfs_commit_transaction (trans=0x2cc250,
> root=0x2c7d80) at disk-io.c:598
> #5  0x00000000001a2b04 in main (argc=8, argv=0x7fefffff698) at mkfs.c:1786
> (gdb)
>
> Can someone help please? Thanks.
>

The code that faults:

*(unsigned long *)(p_eb->data + i) ^= *(unsigned long *)(ebs[j]->data + i);

Because struct extent_buffer has 'data' as a char[], this will always
fault on sparc64 and probably a number of other RISC architectures. It
increments the address by 1, then reads an 8-byte chunk, XORs, and
writes the 8-byte chunk, repeat. In other words, 7 out of 8 reads
would fault, even if both `data` pointers were 8-byte aligned.

This would probably fix it, though it looks ugly.
unsigned long a, b;
memcpy(&a, p_eb->data + i, sizeof(a));  /* Read 8 bytes from p_eb->data+i */
memcpy(&b, ebs[j]->data + i, sizeof(b)); /* Read 8 bytes from ebs[j]->data+i */
a ^= b; /* XOR */
memcpy(p_eb->data + i, &a, sizeof(a)); /* Write back to p_eb->data+i */

I'm not familiar with btrfs, but the result seems to depend on
sizeof(unsigned long). Given that they used parentheses, I assume
it was intentional. However, if this was supposed to do an XOR
operation 8 bytes at a time, then it would need to be something like:

*(((unsigned long *)p_eb->data)+i) ^= *(((unsigned long *)ebs[j]->data) + i);

i.e. cast pointer to unsigned long*, then add i (which would index
array of unsigned long, not char).
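
For reference, the memcpy() variant above can be wrapped into a complete
loop that steps through the buffer one unsigned long at a time. This is
only a sketch: the function name xor_words_unaligned() is made up, and it
assumes the length is a multiple of sizeof(unsigned long), which holds for
the stripe_len=65536 in the backtrace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * XOR len bytes of src into dst, one unsigned long at a time.
 * Neither pointer needs any particular alignment because every word is
 * moved through a local variable with memcpy().
 * Hypothetical helper, not the btrfs-progs implementation.
 */
static void xor_words_unaligned(unsigned char *dst, const unsigned char *src,
				size_t len)
{
	size_t i;

	assert(len % sizeof(unsigned long) == 0);
	for (i = 0; i < len; i += sizeof(unsigned long)) {
		unsigned long a, b;

		memcpy(&a, dst + i, sizeof(a));	/* read word from dst */
		memcpy(&b, src + i, sizeof(b));	/* read word from src */
		a ^= b;
		memcpy(dst + i, &a, sizeof(a));	/* write result back */
	}
}
```

Compilers typically turn these fixed-size memcpy() calls into plain loads
and stores where the target allows it, so the cost on x86 should be small.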

--Patrick


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-27 13:59 [sparc64] mkfs.btrfs bus error / align issue? Anatoly Pugachev
  2016-07-27 19:56 ` David Sterba
  2016-07-27 20:39 ` Patrick Baggett
@ 2016-07-27 20:40 ` John Paul Adrian Glaubitz
  2016-07-27 20:48   ` Patrick Baggett
  2 siblings, 1 reply; 21+ messages in thread
From: John Paul Adrian Glaubitz @ 2016-07-27 20:40 UTC (permalink / raw)
  To: Anatoly Pugachev; +Cc: Btrfs BTRFS, debian-sparc

On 07/27/2016 03:59 PM, Anatoly Pugachev wrote:
> Program received signal SIGBUS, Bus error.
> 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
> volumes.c:2156
> 2156                                    *(unsigned long *)(p_eb->data + i) ^=

Well, that looks pretty much like some creative pointer arithmetic that will provoke
unaligned access. Just check what the declaration of p_eb->data is. If it's
not "unsigned long", then you know why the code breaks here.

You should be able to fix this issue by replacing the code line with:

unsigned long a, b;
memcpy(&a, p_eb->data + i, sizeof(a));   /* current parity word */
memcpy(&b, ebs[j]->data + i, sizeof(b)); /* data word to fold in */
a ^= b;
memcpy(p_eb->data + i, &a, sizeof(a));   /* store the result back */

I'm currently not sure whether this can be done more elegantly without
provoking unaligned access due to the bitwise XOR (^). In either case,
your problem is the casting and using memcpy() should fix the problem.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-27 20:39 ` Patrick Baggett
@ 2016-07-27 20:41   ` Patrick Baggett
  0 siblings, 0 replies; 21+ messages in thread
From: Patrick Baggett @ 2016-07-27 20:41 UTC (permalink / raw)
  To: Anatoly Pugachev; +Cc: Btrfs BTRFS

On Wed, Jul 27, 2016 at 3:39 PM, Patrick Baggett
<baggett.patrick@gmail.com> wrote:
> On Wed, Jul 27, 2016 at 8:59 AM, Anatoly Pugachev <matorola@gmail.com> wrote:
>>
>> Hello!
>>
>> Running xfstests suite, got in logs mkfs.btrfs bus error, debugging it
>> shows the following :
>>
>> mator@nvg5120:~/btrfs-progs$ git log -1 --oneline
>> 40650bf Btrfs progs v4.6.1
>>
>> root@nvg5120:/home/mator/xfstests# gdb
>> GNU gdb (Debian 7.11.1-2) 7.11.1
>> (gdb) file /opt/btrfs/bin/mkfs.btrfs
>> Reading symbols from /opt/btrfs/bin/mkfs.btrfs...done.
>> (gdb) set args -f -draid5 -mraid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
>> (gdb) run
>> Starting program: /opt/btrfs/bin/mkfs.btrfs -f -draid5 -mraid5
>> /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1".
>> btrfs-progs v4.6.1
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> ERROR: superblock checksum mismatch
>> ERROR: superblock checksum mismatch
>> ERROR: superblock checksum mismatch
>> Performing full device TRIM (2.00GiB) ...
>> Performing full device TRIM (2.00GiB) ...
>> Performing full device TRIM (2.00GiB) ...
>> Performing full device TRIM (2.00GiB) ...
>>
>> Program received signal SIGBUS, Bus error.
>> 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
>> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
>> volumes.c:2156
>> 2156                                    *(unsigned long *)(p_eb->data + i) ^=
>> (gdb) bt
>> #0  0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
>> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570)
>>     at volumes.c:2156
>> #1  0x0000000000119b30 in write_and_map_eb (trans=0x2cc250,
>> root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:426
>> #2  0x0000000000119e74 in write_tree_block (trans=0x2cc250,
>> root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:459
>> #3  0x000000000011a4ac in __commit_transaction (trans=0x2cc250,
>> root=0x2c7d80) at disk-io.c:562
>> #4  0x000000000011a7b8 in btrfs_commit_transaction (trans=0x2cc250,
>> root=0x2c7d80) at disk-io.c:598
>> #5  0x00000000001a2b04 in main (argc=8, argv=0x7fefffff698) at mkfs.c:1786
>> (gdb)
>>
>> Can someone help please? Thanks.
>>
>
> The code that faults:
>
> *(unsigned long *)(p_eb->data + i) ^= *(unsigned long *)(ebs[j]->data + i);
>
> Because struct extent_buffer has 'data' as a char[], this will always
> fault on sparc64 and probably a number of other RISC architectures. It
> increments the address by 1, then reads an 8-byte chunk, XORs, and
> writes the 8-byte chunk, repeat. In other words, 7 out of 8 reads
> would fault, even if both `data` pointers were 8-byte aligned.

Actually, I misread that: the code increments i by sizeof(unsigned
long), not by 1, so this will fault only if either data pointer is
misaligned.

Disregard that.

>
> This would probably fix it, though it looks ugly.
> unsigned long a, b;
> memcpy(&a, p_eb->data + i, sizeof(a));  /* Read 8 bytes from p_eb->data+i */
> memcpy(&b, ebs[j]->data + i, sizeof(b)); /* Read 8 bytes from ebs[j]->data+i */
> a ^= b; /* XOR */
> memcpy(p_eb->data + i, &a, sizeof(a)); /* Write back to p_eb->data+i */
>
> I'm not familiar with btrfs, but the results seems like they depend on
> the sizeof(unsigned long). Given that they used parentheses, I assume
> it was intentional. However, if this was supposed to do an XOR
> operation 8 bytes at a time, then it would need to be something like:
>
> *(((unsigned long *)p_eb->data)+i) ^= *(((unsigned long *)ebs[j]->data) + i);
>
> i.e. cast pointer to unsigned long*, then add i (which would index
> array of unsigned long, not char).
>
> --Patrick


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-27 20:40 ` John Paul Adrian Glaubitz
@ 2016-07-27 20:48   ` Patrick Baggett
  0 siblings, 0 replies; 21+ messages in thread
From: Patrick Baggett @ 2016-07-27 20:48 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz; +Cc: Anatoly Pugachev, Btrfs BTRFS, debian-sparc

On Wed, Jul 27, 2016 at 3:40 PM, John Paul Adrian Glaubitz
<glaubitz@physik.fu-berlin.de> wrote:
> On 07/27/2016 03:59 PM, Anatoly Pugachev wrote:
>> Program received signal SIGBUS, Bus error.
>> 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
>> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
>> volumes.c:2156
>> 2156                                    *(unsigned long *)(p_eb->data + i) ^=
>
> Well, that pretty much looks some creative pointer arithmetics that will provoke
> unaligned access. Just check what the declaration of p_eb->data is. If it's
> not "unsigned long", then you know why the code breaks here.
>

Yeah, so basically, the best solution (assuming you can't change the
alignment of `data` somehow) would look something like:

Compare the lower 'n' bits of the `data` pointers (n=2 for ILP32, n=3 for
LP64), something like (data1 & sizeof(long)-1) == (data2 &
sizeof(long)-1). If they are equal, the fast loop is possible. Not equal ->
slow loop (byte-by-byte XOR for everything).

fast loop path:
do byte-by-byte XOR until lower n bits of (data+i) are zero.

do word-by-word XOR until < 1 word

do byte-by-byte XOR until last bytes done.

This sort of processing is pretty standard for "chunking". If this was
done with some cool SIMD instruction set, it would have similar sort
of approach, I'd imagine.
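
The steps above can be sketched in C roughly as follows. The names
xor_chunked() and word_alias_t are invented for this example, and the
may_alias attribute is a GCC/Clang extension used here so that the
word-sized access to a byte buffer does not run into strict-aliasing
problems. Treat it as an illustration of the chunking idea, not as a
proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word type that may alias byte buffers (GCC/Clang extension). */
typedef unsigned long __attribute__((__may_alias__)) word_alias_t;

/* XOR len bytes of src into dst. Hypothetical helper for illustration. */
static void xor_chunked(unsigned char *dst, const unsigned char *src,
			size_t len)
{
	const uintptr_t mask = sizeof(unsigned long) - 1;
	size_t i = 0;

	assert(dst != NULL && src != NULL);

	/* Fast path only if both pointers share the same low bits. */
	if ((((uintptr_t)dst ^ (uintptr_t)src) & mask) == 0) {
		/* Head: byte-by-byte until dst + i is word-aligned. */
		while (i < len && ((uintptr_t)(dst + i) & mask) != 0) {
			dst[i] ^= src[i];
			i++;
		}
		/* Body: word-by-word while a full word remains. */
		while (len - i >= sizeof(unsigned long)) {
			*(word_alias_t *)(dst + i) ^=
				*(const word_alias_t *)(src + i);
			i += sizeof(unsigned long);
		}
	}

	/* Tail, or the whole buffer on the slow path: byte-by-byte. */
	for (; i < len; i++)
		dst[i] ^= src[i];
}
```

When the two pointers disagree in their low bits, no amount of head
processing can align both at once, which is why that case falls through to
the byte loop.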


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-27 19:56 ` David Sterba
@ 2016-07-28  9:44   ` David Sterba
  2016-07-28 11:58     ` Anatoly Pugachev
  0 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2016-07-28  9:44 UTC (permalink / raw)
  To: dsterba, Anatoly Pugachev, Btrfs BTRFS

On Wed, Jul 27, 2016 at 09:56:09PM +0200, David Sterba wrote:
> On Wed, Jul 27, 2016 at 04:59:27PM +0300, Anatoly Pugachev wrote:
> > Hello!
> > 
> > Running xfstests suite, got in logs mkfs.btrfs bus error, debugging it
> > shows the following :
> > 
> > Program received signal SIGBUS, Bus error.
> > 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
> > eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
> > volumes.c:2156
> > 2156                                    *(unsigned long *)(p_eb->data + i) ^=
> 
> Yeah, clear unaligned access. We have helpers for so I'll fix it. I was
> looking for a way to simulate and catch that on x86 or at least let gcc
> warn but no such thing seems to exist. Which means we might accidentally
> introduce that in the future.

Can you please test with the current 'devel' branch? Fixed by the patch
"btrfs-progs: fix unaligned access calculating raid56 data" (depends on
another patch in devel). Thanks.


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28  9:44   ` David Sterba
@ 2016-07-28 11:58     ` Anatoly Pugachev
  2016-07-28 12:02       ` John Paul Adrian Glaubitz
  2016-07-28 12:09       ` John Paul Adrian Glaubitz
  0 siblings, 2 replies; 21+ messages in thread
From: Anatoly Pugachev @ 2016-07-28 11:58 UTC (permalink / raw)
  To: dsterba; +Cc: Btrfs BTRFS, debian-sparc

On Thu, Jul 28, 2016 at 12:44 PM, David Sterba <dsterba@suse.cz> wrote:
> On Wed, Jul 27, 2016 at 09:56:09PM +0200, David Sterba wrote:
>> On Wed, Jul 27, 2016 at 04:59:27PM +0300, Anatoly Pugachev wrote:
>> > Hello!
>> >
>> > Running xfstests suite, got in logs mkfs.btrfs bus error, debugging it
>> > shows the following :
>> >
>> > Program received signal SIGBUS, Bus error.
>> > 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
>> > eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
>> > volumes.c:2156
>> > 2156                                    *(unsigned long *)(p_eb->data + i) ^=
>>
>> Yeah, clear unaligned access. We have helpers for so I'll fix it. I was
>> looking for a way to simulate and catch that on x86 or at least let gcc
>> warn but no such thing seems to exist. Which means we might accidentally
>> introduce that in the future.
>
> Can you please test with the current 'devel' branch? Fixed by the patch
> "btrfs-progs: fix unaligned access calculating raid56 data" (depends on
> another patch in devel). Thanks.

David,

but where do I get the devel branch of btrfs-progs?
I just tried git://repo.or.cz/btrfs-progs-unstable/devel.git, but
I still see this as the last commit in it:

mator@nvg5120:~/1/devel$ git log -1 --pretty=format:"%h %s, %an, %ad"
--date=short
40650bf Btrfs progs v4.6.1, David Sterba, 2016-06-24

So which git repo should I pull to get the latest development patches?

Thanks.


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28 11:58     ` Anatoly Pugachev
@ 2016-07-28 12:02       ` John Paul Adrian Glaubitz
  2016-07-28 12:09       ` John Paul Adrian Glaubitz
  1 sibling, 0 replies; 21+ messages in thread
From: John Paul Adrian Glaubitz @ 2016-07-28 12:02 UTC (permalink / raw)
  To: Anatoly Pugachev, dsterba; +Cc: Btrfs BTRFS, debian-sparc

On 07/28/2016 01:58 PM, Anatoly Pugachev wrote:
> but where do I get -devel branch of btrfs-progs?
> I just tried git://repo.or.cz/btrfs-progs-unstable/devel.git , but
> still seeing last commit in it:

glaubitz@ikarus:~/upstream/devel$ git checkout devel
Branch devel set up to track remote branch devel from origin.
Switched to a new branch 'devel'
glaubitz@ikarus:~/upstream/devel$ git log | head -n 10
commit 8f88fca93afc870de9783b0179c6e8d1b9f0cbf3
Merge: bf80586 366983b
Author: David Sterba <dsterba@suse.com>
Date:   Tue Jul 26 19:43:50 2016 +0200

    Merge branch 'foreign/qu/fsck-lowmem-v2.1-fixups' into devel

commit bf80586481a4fa1c27571078dddc9392537aacf0
Author: David Sterba <dsterba@suse.com>
Date:   Tue Jul 26 19:33:12 2016 +0200
glaubitz@ikarus:~/upstream/devel$

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28 11:58     ` Anatoly Pugachev
  2016-07-28 12:02       ` John Paul Adrian Glaubitz
@ 2016-07-28 12:09       ` John Paul Adrian Glaubitz
  2016-07-28 12:24         ` David Sterba
  1 sibling, 1 reply; 21+ messages in thread
From: John Paul Adrian Glaubitz @ 2016-07-28 12:09 UTC (permalink / raw)
  To: Anatoly Pugachev, dsterba; +Cc: Btrfs BTRFS, debian-sparc

Hi David!

On 07/28/2016 01:58 PM, Anatoly Pugachev wrote:
>> Can you please test with the current 'devel' branch? Fixed by the patch
>> "btrfs-progs: fix unaligned access calculating raid56 data" (depends on
>> another patch in devel). Thanks.

Are you sure you pushed these changes? I don't see them in the devel branch [1].

Adrian

> [1] http://repo.or.cz/btrfs-progs-unstable/devel.git/shortlog/refs/heads/devel

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28 12:09       ` John Paul Adrian Glaubitz
@ 2016-07-28 12:24         ` David Sterba
  2016-07-28 14:01           ` Anatoly Pugachev
  0 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2016-07-28 12:24 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz
  Cc: Anatoly Pugachev, dsterba, Btrfs BTRFS, debian-sparc

On Thu, Jul 28, 2016 at 02:09:03PM +0200, John Paul Adrian Glaubitz wrote:
> Hi David!
> 
> On 07/28/2016 01:58 PM, Anatoly Pugachev wrote:
> >> Can you please test with the current 'devel' branch? Fixed by the patch
> >> "btrfs-progs: fix unaligned access calculating raid56 data" (depends on
> >> another patch in devel). Thanks.
> 
> Are you sure you pushed these changes? I don't see them in the devel branch [1].

Oh shame, of course not. Now pushed.


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28 12:24         ` David Sterba
@ 2016-07-28 14:01           ` Anatoly Pugachev
  2016-07-28 14:25             ` John Paul Adrian Glaubitz
  0 siblings, 1 reply; 21+ messages in thread
From: Anatoly Pugachev @ 2016-07-28 14:01 UTC (permalink / raw)
  To: dsterba; +Cc: debian-sparc, Btrfs BTRFS

On Thu, Jul 28, 2016 at 3:24 PM, David Sterba <dsterba@suse.cz> wrote:
> On Thu, Jul 28, 2016 at 02:09:03PM +0200, John Paul Adrian Glaubitz wrote:
>> Hi David!
>>
>> On 07/28/2016 01:58 PM, Anatoly Pugachev wrote:
>> >> Can you please test with the current 'devel' branch? Fixed by the patch
>> >> "btrfs-progs: fix unaligned access calculating raid56 data" (depends on
>> >> another patch in devel). Thanks.
>>
>> Are you sure you pushed these changes? I don't see them in the devel branch [1].
>
> Oh shame, of course not. Now pushed.

David,

after checkout of devel tree:

mator@nvg5120:~/devel$ git describe
v4.6.1-64-g6d1564c
mator@nvg5120:~/devel$ git remote -v
origin  git://repo.or.cz/btrfs-progs-unstable/devel.git (fetch)
origin  git://repo.or.cz/btrfs-progs-unstable/devel.git (push)
mator@nvg5120:~/devel$ git branch
* devel
  master


a new faulting location turned up:

Reading symbols from /opt/btrfs/bin/mkfs.btrfs...done.
(gdb) set args -f -draid6 -mraid6 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
(gdb) run
Starting program: /opt/btrfs/bin/mkfs.btrfs -f -draid6 -mraid6
/dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1".
btrfs-progs v4.6.1-64-g6d1564c
See http://btrfs.wiki.kernel.org for more information.

ERROR: superblock checksum mismatch
ERROR: superblock checksum mismatch
ERROR: superblock checksum mismatch
Performing full device TRIM (2.00GiB) ...
Performing full device TRIM (2.00GiB) ...
Performing full device TRIM (2.00GiB) ...
Performing full device TRIM (2.00GiB) ...

Program received signal SIGBUS, Bus error.
0x0000000000177dfc in raid6_gen_syndrome (disks=4, bytes=65536,
ptrs=0x2c4510) at raid6.c:87
87                      wq0 = wp0 = *(unative_t *)&dptr[z0][d+0*NSIZE];
(gdb) bt
#0  0x0000000000177dfc in raid6_gen_syndrome (disks=4, bytes=65536,
ptrs=0x2c4510) at raid6.c:87
#1  0x000000000015e174 in write_raid56_with_parity (info=0x2b37b0,
eb=0x2c9fe0, multi=0x2c4870, stripe_len=65536, raid_map=0x2c4570)
    at volumes.c:2151
#2  0x0000000000119bd0 in write_and_map_eb (trans=0x2ce250,
root=0x2c9d80, eb=0x2c9fe0) at disk-io.c:426
#3  0x0000000000119f14 in write_tree_block (trans=0x2ce250,
root=0x2c9d80, eb=0x2c9fe0) at disk-io.c:459
#4  0x000000000011a54c in __commit_transaction (trans=0x2ce250,
root=0x2c9d80) at disk-io.c:562
#5  0x000000000011a858 in btrfs_commit_transaction (trans=0x2ce250,
root=0x2c9d80) at disk-io.c:598
#6  0x00000000001a52e8 in main (argc=8, argv=0x7fefffff698) at mkfs.c:1809
(gdb)


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28 14:01           ` Anatoly Pugachev
@ 2016-07-28 14:25             ` John Paul Adrian Glaubitz
  2016-07-28 14:28               ` John Paul Adrian Glaubitz
  0 siblings, 1 reply; 21+ messages in thread
From: John Paul Adrian Glaubitz @ 2016-07-28 14:25 UTC (permalink / raw)
  To: Anatoly Pugachev, dsterba; +Cc: debian-sparc, Btrfs BTRFS

On 07/28/2016 04:01 PM, Anatoly Pugachev wrote:
> Program received signal SIGBUS, Bus error.
> 0x0000000000177dfc in raid6_gen_syndrome (disks=4, bytes=65536,
> ptrs=0x2c4510) at raid6.c:87
> 87                      wq0 = wp0 = *(unative_t *)&dptr[z0][d+0*NSIZE];

That should be easy to fix. Just make the R values aligned with the
appropriate get_aligned functions, see David's previous commit [1]:

-                       for (i = 0; i < stripe_len; i += sizeof(unsigned long)) {
-                               *(unsigned long *)(p_eb->data + i) ^=
-                                       *(unsigned long *)(ebs[j]->data + i);
+                       for (i = 0; i < stripe_len; i += sizeof(u64)) {
+                               u64 p_eb_data;
+                               u64 ebs_data;
+
+                               p_eb_data = get_unaligned_64(p_eb->data + i);
+                               ebs_data = get_unaligned_64(ebs[j]->data + i);
+                               p_eb_data ^= ebs_data;
+                               put_unaligned_64(p_eb_data, p_eb->data + i);

> (gdb) bt

You don't need a backtrace here. It stops directly at the offending line. The
pattern is usually foo = *(new_type_t *) bar. I'm rather surprised to see such
code in here, especially given the fact that they already have all the necessary
helper macros defined in [2].

There are more lines in raid6.c which need the same fix, basically everything
with * (unative_t *).

Cheers,
Adrian

> [1] http://repo.or.cz/btrfs-progs-unstable/devel.git/blobdiff/1c47e5b03922772c1a9429c7817cc728c99e2530..24f1713777f02300d0c48ddc142b2c711e462b65:/volumes.c
> [2] http://repo.or.cz/btrfs-progs-unstable/devel.git/blob/refs/heads/devel:/kerncompat.h#l337

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28 14:25             ` John Paul Adrian Glaubitz
@ 2016-07-28 14:28               ` John Paul Adrian Glaubitz
  2016-07-28 14:32                 ` Patrick Baggett
  2016-07-28 18:04                 ` David Sterba
  0 siblings, 2 replies; 21+ messages in thread
From: John Paul Adrian Glaubitz @ 2016-07-28 14:28 UTC (permalink / raw)
  To: Anatoly Pugachev, dsterba; +Cc: debian-sparc, Btrfs BTRFS

On 07/28/2016 04:25 PM, John Paul Adrian Glaubitz wrote:
> On 07/28/2016 04:01 PM, Anatoly Pugachev wrote:
>> Program received signal SIGBUS, Bus error.
>> 0x0000000000177dfc in raid6_gen_syndrome (disks=4, bytes=65536,
>> ptrs=0x2c4510) at raid6.c:87
>> 87                      wq0 = wp0 = *(unative_t *)&dptr[z0][d+0*NSIZE];
> 
> That should be easy to fix. Just make the R values aligned with the
> appropriate get_aligned functions, see David's previous commit [1]:

Argh, those are called get_UNaligned_*, not get_aligned_*.

> There are more lines in raid6.c which need the same fix, basically everything
> with * (unative_t *).

Oh, and you will somehow need to guard this with #if BITS_PER_LONG == 64 ...
#else ... #endif respectively since you need to use different versions
(64 vs. 32) of get_unaligned_* depending on the size of unative_t.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28 14:28               ` John Paul Adrian Glaubitz
@ 2016-07-28 14:32                 ` Patrick Baggett
  2016-07-28 18:04                 ` David Sterba
  1 sibling, 0 replies; 21+ messages in thread
From: Patrick Baggett @ 2016-07-28 14:32 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz
  Cc: Anatoly Pugachev, dsterba, debian-sparc, Btrfs BTRFS

> Oh, and you will somehow need to guard this with #if BITS_PER_LONG == 64 ...
> #else ... #endif respectively since you need to use different versions
> (64 vs. 32) of get_unaligned_* depending on the size of unative_t.

Maybe a get_unaligned_unative() would be better so that preprocessor
fun is kept to a minimum. :)
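
Something along these lines, as a rough sketch. The typedef below is an
assumption made for this example (raid6.c may define unative_t
differently), and put_unaligned_unative() is likewise a made-up name.
Because the helpers use sizeof(unative_t), no #if on BITS_PER_LONG is
needed at the call sites.

```c
#include <assert.h>
#include <string.h>

/* Assumption for this sketch: the native word type is unsigned long. */
typedef unsigned long unative_t;

/* Read one native word from a possibly unaligned address. */
static inline unative_t get_unaligned_unative(const void *p)
{
	unative_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Write one native word to a possibly unaligned address. */
static inline void put_unaligned_unative(unative_t v, void *p)
{
	assert(p != NULL);
	memcpy(p, &v, sizeof(v));
}
```

The faulting line would then read something like
wq0 = wp0 = get_unaligned_unative(&dptr[z0][d+0*NSIZE]);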

--Patrick


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28 14:28               ` John Paul Adrian Glaubitz
  2016-07-28 14:32                 ` Patrick Baggett
@ 2016-07-28 18:04                 ` David Sterba
  2016-07-28 20:34                   ` Anatoly Pugachev
  2016-07-29 10:20                   ` John Paul Adrian Glaubitz
  1 sibling, 2 replies; 21+ messages in thread
From: David Sterba @ 2016-07-28 18:04 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz; +Cc: Anatoly Pugachev, debian-sparc, Btrfs BTRFS

On Thu, Jul 28, 2016 at 04:28:41PM +0200, John Paul Adrian Glaubitz wrote:
> On 07/28/2016 04:25 PM, John Paul Adrian Glaubitz wrote:
> > On 07/28/2016 04:01 PM, Anatoly Pugachev wrote:
> >> Program received signal SIGBUS, Bus error.
> >> 0x0000000000177dfc in raid6_gen_syndrome (disks=4, bytes=65536,
> >> ptrs=0x2c4510) at raid6.c:87
> >> 87                      wq0 = wp0 = *(unative_t *)&dptr[z0][d+0*NSIZE];
> > 
> > That should be easy to fix. Just make the R values aligned with the
> > appropriate get_aligned functions, see David's previous commit [1]:
> 
> Argh, those are called get_UNaligned_*, not get_aligned_*.
> 
> > There are more lines in raid6.c which need the same fix, basically everything
> > with * (unative_t *).
> 
> Oh, and you will somehow need to guard this with #if BITS_PER_LONG == 64 ...
> #else ... #endif respectively since you need to use different versions
> (64 vs. 32) of get_unaligned_* depending on the size of unative_t.

And I've fixed it that way, now pushed to devel ("btrfs-progs: fix
unaligned access in raid6 calculations" [1]). Would be great if you or
Anatoly can test it so I can add it to the 4.7 release (ETA tomorrow).

[1] https://github.com/kdave/btrfs-progs/commit/44b35e94facf820d2c2d8e1be631bf40d68dff8d


* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28 18:04                 ` David Sterba
@ 2016-07-28 20:34                   ` Anatoly Pugachev
  2016-07-29  9:40                     ` Anatoly Pugachev
  2016-07-29 12:41                     ` David Sterba
  2016-07-29 10:20                   ` John Paul Adrian Glaubitz
  1 sibling, 2 replies; 21+ messages in thread
From: Anatoly Pugachev @ 2016-07-28 20:34 UTC (permalink / raw)
  To: dsterba; +Cc: Btrfs BTRFS, debian-sparc

On Thu, Jul 28, 2016 at 9:04 PM, David Sterba <dsterba@suse.cz> wrote:
> On Thu, Jul 28, 2016 at 04:28:41PM +0200, John Paul Adrian Glaubitz wrote:
>> On 07/28/2016 04:25 PM, John Paul Adrian Glaubitz wrote:
>> > On 07/28/2016 04:01 PM, Anatoly Pugachev wrote:
>> >> Program received signal SIGBUS, Bus error.
>> >> 0x0000000000177dfc in raid6_gen_syndrome (disks=4, bytes=65536,
>> >> ptrs=0x2c4510) at raid6.c:87
>> >> 87                      wq0 = wp0 = *(unative_t *)&dptr[z0][d+0*NSIZE];
>> >
>> > That should be easy to fix. Just make the R values aligned with the
>> > appropriate get_aligned functions, see David's previous commit [1]:
>>
>> Argh, those are called get_UNaligned_*, not get_aligned_*.
>>
>> > There are more lines in raid6.c which need the same fix, basically everything
>> > with * (unative_t *).
>>
>> Oh, and you will somehow need to guard this with #if BITS_PER_LONG == 64 ...
>> #else ... #endif respectively since you need to use different versions
>> (64 vs. 32) of get_unaligned_* depending on the size of unative_t.
>
> And I've fixed it that way, now pushed to devel ("btrfs-progs: fix
> unaligned access in raid6 calculations" [1]). Would be great if you or
> Anatoly can test it so I can add it to the 4.7 release (ETA tomorrow).


David,

well, I think mkfs.btrfs is fixed, since I just tested it with :

root@nvg5120:/home/mator/xfstests# ./check 'btrfs/06?'
FSTYP         -- btrfs
PLATFORM      -- Linux/sparc64 nvg5120 4.7.0+
MKFS_OPTIONS  -- /dev/loop0
MOUNT_OPTIONS -- /dev/loop0 /mnt/scratch

btrfs/060        145s
btrfs/061        158s
btrfs/062        288s
btrfs/063        141s
btrfs/064        129s
btrfs/065        44s
btrfs/066        46s
btrfs/067        - output mismatch (see /home/mator/xfstests/results//btrfs/067.out.bad)
    --- tests/btrfs/067.out     2016-07-20 12:12:21.772228422 +0300
    +++ /home/mator/xfstests/results//btrfs/067.out.bad 2016-07-28 22:54:00.059192629 +0300
    @@ -1,2 +1,3 @@
     QA output created by 067
     Silence is golden
    +Scrub find errors in "-m single -d single" test
    ...
    (Run 'diff -u tests/btrfs/067.out /home/mator/xfstests/results//btrfs/067.out.bad'  to see the entire diff)
btrfs/068        57s
btrfs/069        45s
Ran: btrfs/060 btrfs/061 btrfs/062 btrfs/063 btrfs/064 btrfs/065
btrfs/066 btrfs/067 btrfs/068 btrfs/069
Failures: btrfs/067
Failed 1 of 10 tests


Previously (before the mkfs.btrfs fix), all tests from 06? failed.

Starting from "tests/btrfs/064", the kernel started to log a lot of TPC
(Trap Program Counter register) messages.

I put the results of this test on a webserver [1].
The output of journalctl -b (from boot) with the TPC messages is at [2].

I'm not sure what we need to do about the sparc64 btrfs module TPC messages.
Probably file a kernel bugzilla report?

Thanks.

[1] http://u163.east.ru/btrfs/xfstests-btrfs-06x-results.tar.gz
[2] http://u163.east.ru/btrfs/kernel-4.7.0+-logs-xfstests-06x.txt.gz

PS: my xfstests setup is the following:

# mount tmpfs -t tmpfs -o size=13g /ramdisk/
/ramdisk# for i in 1 2 3 4 5 6; do fallocate -l 1g scratch${i}; done
/ramdisk# fallocate -l 4g testvol1

/ramdisk# for i in *; do losetup -f $i; done
/home/mator/xfstests# losetup
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE         DIO
/dev/loop0         0      0         0  0 /ramdisk/scratch1   0
/dev/loop1         0      0         0  0 /ramdisk/scratch2   0
/dev/loop2         0      0         0  0 /ramdisk/scratch3   0
/dev/loop3         0      0         0  0 /ramdisk/scratch4   0
/dev/loop4         0      0         0  0 /ramdisk/scratch5   0
/dev/loop5         0      0         0  0 /ramdisk/scratch6   0
/dev/loop6         0      0         0  0 /ramdisk/testvol1   0

# mkfs.btrfs /dev/loop6
btrfs-progs v4.6.1-66-g4367e35
See http://btrfs.wiki.kernel.org for more information.

Performing full device TRIM (4.00GiB) ...
Label:              (null)
UUID:               6a4d5918-adfe-469c-8454-9b28545b88bc
Node size:          16384
Sector size:        8192
Filesystem size:    4.00GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         DUP             204.75MiB
  System:           DUP               8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1     4.00GiB  /dev/loop6

root@nvg5120:/home/mator/xfstests# cat local.config
export TEST_DEV=/dev/loop6
export TEST_DIR=/fst
export SCRATCH_DEV_POOL="/dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5"
export SCRATCH_MNT=/mnt/scratch

* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28 20:34                   ` Anatoly Pugachev
@ 2016-07-29  9:40                     ` Anatoly Pugachev
  2016-07-29 12:41                     ` David Sterba
  1 sibling, 0 replies; 21+ messages in thread
From: Anatoly Pugachev @ 2016-07-29  9:40 UTC (permalink / raw)
  To: dsterba; +Cc: Btrfs BTRFS, debian-sparc, fstests

On Thu, Jul 28, 2016 at 11:34 PM, Anatoly Pugachev <matorola@gmail.com> wrote:
> On Thu, Jul 28, 2016 at 9:04 PM, David Sterba <dsterba@suse.cz> wrote:
>> On Thu, Jul 28, 2016 at 04:28:41PM +0200, John Paul Adrian Glaubitz wrote:
>>> On 07/28/2016 04:25 PM, John Paul Adrian Glaubitz wrote:
>>> > On 07/28/2016 04:01 PM, Anatoly Pugachev wrote:
>>> >> Program received signal SIGBUS, Bus error.
>>> >> 0x0000000000177dfc in raid6_gen_syndrome (disks=4, bytes=65536,
>>> >> ptrs=0x2c4510) at raid6.c:87
>>> >> 87                      wq0 = wp0 = *(unative_t *)&dptr[z0][d+0*NSIZE];
>>> >
>>> > That should be easy to fix. Just make the R values aligned with the
>>> > appropriate get_aligned functions, see David's previous commit [1]:
>>>
>>> Argh, those are called get_UNaligned_*, not get_aligned_*.
>>>
>>> > There are more lines in raid6.c which need the same fix, basically everything
>>> > with * (unative_t *).
>>>
>>> Oh, and you will somehow need to guard this with #if BITS_PER_LONG == 64 ...
>>> #else ... #endif respectively since you need to use different versions
>>> (64 vs. 32) of get_unaligned_* depending on the size of unative_t.
>>
>> And I've fixed it that way, now pushed to devel ("btrfs-progs: fix
>> unaligned access in raid6 calculations" [1]). Would be great if you or
>> Anatoly can test it so I can add it to the 4.7 release (ETA tomorrow).
>
> David,
> well, I think mkfs.btrfs is fixed, since I just tested it with :
> root@nvg5120:/home/mator/xfstests# ./check 'btrfs/06?'
> [...]

Just to add, I've also run the tests from btrfs/000 to btrfs/059, with
reasonably good results:

Ran: btrfs/001 btrfs/002 btrfs/005 btrfs/006 btrfs/008 btrfs/009
btrfs/010 btrfs/012 btrfs/013 btrfs/014 btrfs/015 btrfs/016 btrfs/017
btrfs/018 btrfs/019 btrfs/020 btrfs/021 btrfs/022 btrfs/023 btrfs/024
btrfs/025 btrfs/026 btrfs/027 btrfs/028 btrfs/029 btrfs/030 btrfs/031
btrfs/032 btrfs/033 btrfs/034 btrfs/035 btrfs/036 btrfs/037 btrfs/038
btrfs/039 btrfs/040 btrfs/041 btrfs/042 btrfs/043 btrfs/044 btrfs/045
btrfs/046 btrfs/048 btrfs/049 btrfs/050 btrfs/051 btrfs/052 btrfs/053
btrfs/054 btrfs/055 btrfs/056 btrfs/057 btrfs/058 btrfs/059
Not run: btrfs/003 btrfs/004 btrfs/007 btrfs/011 btrfs/047
Failures: btrfs/010 btrfs/012 btrfs/057
Failed 3 of 54 tests

Failures:

btrfs/010 - failed with "number of extents mis-match!"
$ cat /home/mator/xfstests/results//btrfs/010.full
Create subvolume '/mnt/scratch/subvol'
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
Create a snapshot of '/mnt/scratch/subvol' in '/mnt/scratch/snap-2'
Create a snapshot of '/mnt/scratch/subvol' in '/mnt/scratch/snap-1'
/mnt/scratch/subvol/foobar:
        0: [0..79]: 24704..24783
/mnt/scratch/snap-1/foobar:
        0: [0..31]: 24672..24703
        1: [32..47]: 24656..24671
        2: [48..63]: 24608..24623
        3: [64..79]: 24592..24607
/mnt/scratch/snap-2/foobar:
        0: [0..31]: 24672..24703
        1: [32..47]: 24656..24671
        2: [48..63]: 24608..24623
        3: [64..79]: 24592..24607
1 4 4


btrfs/012 - failed with "btrfs-convert failed"
$ cat /home/mator/xfstests/results//btrfs/012.full
mke2fs 1.43.1 (08-Jun-2016)
Discarding device blocks: done
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: 98d0756e-76b6-4ab1-ac7d-a1fceb4b21b4
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

ERROR: system chunk array too big 1627389952 > 2048
ERROR: superblock checksum matches but it has invalid members
No valid Btrfs found on /dev/loop0
unable to open ctree
conversion aborted
create btrfs filesystem:
        blocksize: 4096
        nodesize:  16384
        features:  extref, skinny-metadata (default)
btrfs-convert failed
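
An observation on the number in the error above, offered as a guess
rather than a diagnosis: 1627389952 is 0x61000000, which is the value 97
(0x61) with its four bytes reversed. That is the pattern a little-endian
on-disk field shows when it is read on a big-endian host without
conversion. The thread does not confirm this as the cause; the sketch
below only checks the arithmetic, and the helper name swap32 is made up
for illustration.

```c
#include <stdint.h>

/*
 * Reverse the byte order of a 32-bit value. Hypothetical helper,
 * written out by hand so the sketch needs no platform-specific
 * byte-order headers.
 */
static uint32_t swap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8)  |
	       ((v & 0x00ff0000u) >> 8)  |
	       ((v & 0xff000000u) >> 24);
}
```

With this helper, swap32(1627389952) evaluates to 97, which is below the
2048 limit printed in the error message.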


btrfs/057 - failed: '_scratch_mkfs -b 1g --nodesize 4096'
$ cat /home/mator/xfstests/results//btrfs/057.full
# _scratch_mkfs -b 1g --nodesize 4096
ERROR: illegal nodesize 4096 (smaller than 8192)
failed: '_scratch_mkfs -b 1g --nodesize 4096'


JFYI,
mator@nvg5120:~$ getconf PAGE_SIZE
8192
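
The 057 failure lines up with these numbers: the requested node size
(4096) is below 8192, which matches both the sector size printed by
mkfs.btrfs earlier and the PAGE_SIZE above. A minimal sketch of that
kind of check follows, assuming the rule is simply "a power of two and
not smaller than the minimum". The function names are hypothetical and
this is not the actual mkfs.btrfs validation code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical check: accept a node size only if it is a power of
 * two and not smaller than min_size (8192 on this machine).
 */
static bool nodesize_ok(uint32_t nodesize, uint32_t min_size)
{
	bool pow2 = nodesize != 0 && (nodesize & (nodesize - 1)) == 0;

	return pow2 && nodesize >= min_size;
}

/* Returns the same value that `getconf PAGE_SIZE` prints. */
static long host_page_size(void)
{
	return sysconf(_SC_PAGESIZE);
}
```

Under these assumptions nodesize_ok(4096, 8192) is false, while the
default node size of 16384 shown in the mkfs.btrfs output passes.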


These 000-059 tests were run after a fresh reboot.

During test 027, the kernel started to show TPC messages like this one:

Jul 29 12:10:32 nvg5120 unknown: run fstests btrfs/027 at 2016-07-29 12:10:32
...
Jul 29 12:10:58 nvg5120 kernel: BTRFS info (device loop4): allowing degraded mounts
Jul 29 12:10:58 nvg5120 kernel: BTRFS info (device loop4): disk space caching is enabled
Jul 29 12:10:58 nvg5120 kernel: BTRFS info (device loop4): has skinny extents
Jul 29 12:10:59 nvg5120 kernel: BTRFS info (device loop4): dev_replace from <missing disk> (devid 2) to /dev/loop5 started
Jul 29 12:10:59 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs]
Jul 29 12:10:59 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs]
Jul 29 12:10:59 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs]
Jul 29 12:10:59 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs]
Jul 29 12:10:59 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs]
Jul 29 12:11:00 nvg5120 kernel: BTRFS info (device loop4): dev_replace from <missing disk> (devid 2) to /dev/loop5 finished
...
Jul 29 12:11:07 nvg5120 kernel: BTRFS info (device loop4): allowing degraded mounts
Jul 29 12:11:07 nvg5120 kernel: BTRFS info (device loop4): disk space caching is enabled
Jul 29 12:11:07 nvg5120 kernel: BTRFS info (device loop4): has skinny extents
Jul 29 12:11:08 nvg5120 kernel: BTRFS info (device loop4): dev_replace from <missing disk> (devid 2) to /dev/loop5 started
Jul 29 12:11:08 nvg5120 kernel: log_unaligned: 10616 callbacks suppressed
Jul 29 12:11:08 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs]
Jul 29 12:11:09 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs]
Jul 29 12:11:09 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs]
Jul 29 12:11:09 nvg5120 kernel: Kernel unaligned access at TPC[118e0094] __btrfs_map_block+0x3d4/0x1180 [btrfs]
Jul 29 12:11:09 nvg5120 kernel: Kernel unaligned access at TPC[118e0960] __btrfs_map_block+0xca0/0x1180 [btrfs]
Jul 29 12:11:09 nvg5120 kernel: BTRFS info (device loop4): dev_replace from <missing disk> (devid 2) to /dev/loop5 finished
Jul 29 12:11:11 nvg5120 mator[34598]: run xfstest btrfs/028

This happened only with test 027; the following tests finished without
TPC messages.

* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28 18:04                 ` David Sterba
  2016-07-28 20:34                   ` Anatoly Pugachev
@ 2016-07-29 10:20                   ` John Paul Adrian Glaubitz
  1 sibling, 0 replies; 21+ messages in thread
From: John Paul Adrian Glaubitz @ 2016-07-29 10:20 UTC (permalink / raw)
  To: dsterba, Anatoly Pugachev, debian-sparc, Btrfs BTRFS


On 07/28/2016 08:04 PM, David Sterba wrote:
> And I've fixed it that way, now pushed to devel ("btrfs-progs: fix
> unaligned access in raid6 calculations" [1]). Would be great if you or
> Anatoly can test it so I can add it to the 4.7 release (ETA tomorrow).

Awesome, thank you very much! Much appreciated!

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28 20:34                   ` Anatoly Pugachev
  2016-07-29  9:40                     ` Anatoly Pugachev
@ 2016-07-29 12:41                     ` David Sterba
  2016-07-29 15:03                       ` Anatoly Pugachev
  1 sibling, 1 reply; 21+ messages in thread
From: David Sterba @ 2016-07-29 12:41 UTC (permalink / raw)
  To: Anatoly Pugachev; +Cc: Btrfs BTRFS, debian-sparc

On Thu, Jul 28, 2016 at 11:34:58PM +0300, Anatoly Pugachev wrote:
> well, I think mkfs.btrfs is fixed, since I just tested it with :

Good news, thanks.

> Not sure what we need to do with sparc64 btrfs module TPC messages.
> Probably fill kernel bugzilla report?

Yes please.

> [1] http://u163.east.ru/btrfs/xfstests-btrfs-06x-results.tar.gz
> [2] http://u163.east.ru/btrfs/kernel-4.7.0+-logs-xfstests-06x.txt.gz

quick stats of the TPC messages:

     23 __btrfs_map_block+0x36c/0x1180
      9 __remove_rbio_from_cache+0x38/0x140
      6 lock_stripe_add+0xb0/0x360
      4 __btrfs_map_block+0x3d4/0x1180
      3 __btrfs_map_block+0xca0/0x1180

Running this in 'gdb btrfs.ko' for each of the addresses should tell us the
source locations:

gdb> l *(__btrfs_map_block+0x36c)
...
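
With several distinct addresses to look up, the gdb command can be
generated from the log lines themselves. A minimal sketch, assuming the
"TPC[addr] symbol+0xOFFSET/0xSIZE [module]" format seen in the messages
in this thread; the function name tpc_to_gdb is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Turn one "Kernel unaligned access" log line into a gdb list
 * command of the form "l *(symbol+0xOFFSET)".
 * Returns 0 on success, -1 if the line does not match.
 */
static int tpc_to_gdb(const char *line, char *out, size_t outlen)
{
	char sym[128];
	unsigned long off;
	const char *p = strstr(line, "TPC[");

	if (!p)
		return -1;
	p = strchr(p, ']');
	if (!p)
		return -1;
	/* After "] " comes "symbol+0xOFFSET/0xSIZE"; %lx accepts the 0x. */
	if (sscanf(p + 1, " %127[^+]+%lx", sym, &off) != 2)
		return -1;
	snprintf(out, outlen, "l *(%s+0x%lx)", sym, off);
	return 0;
}
```

The function size after the slash is ignored on purpose, since gdb only
needs the symbol and the offset.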

Thanks.

* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-29 12:41                     ` David Sterba
@ 2016-07-29 15:03                       ` Anatoly Pugachev
  0 siblings, 0 replies; 21+ messages in thread
From: Anatoly Pugachev @ 2016-07-29 15:03 UTC (permalink / raw)
  To: dsterba; +Cc: Btrfs BTRFS, debian-sparc

On Fri, Jul 29, 2016 at 3:41 PM, David Sterba <dsterba@suse.cz> wrote:
> On Thu, Jul 28, 2016 at 11:34:58PM +0300, Anatoly Pugachev wrote:
>> well, I think mkfs.btrfs is fixed, since I just tested it with :
>
> Good news, thanks.
>
> quick stats of the TPC messages:
>
>      23 __btrfs_map_block+0x36c/0x1180
>       9 __remove_rbio_from_cache+0x38/0x140
>       6 lock_stripe_add+0xb0/0x360
>       4 __btrfs_map_block+0x3d4/0x1180
>       3 __btrfs_map_block+0xca0/0x1180
>
> running in 'gdb btrfs.ko' for each of the addresses should tell us what are the
> locations:
>
> gdb> l *(__btrfs_map_block+0x36c)
> ...

Installed fresh btrfs-progs from git:
mator@nvg5120:~/btrfs-progs$ git describe --long
v4.7-0-g9d2ea01

Recompiled the kernel with debug info and ran xfstests/check 'btrfs/06?' again.

mator@nvg5120:~/linux-2.6$ git describe --long
v4.7-0-g523d939

The kernel is patched with [1] to enable btrfs module loading on
big-endian systems (I'm not sure whether current Linux kernel git includes
this patch; I checked out the plain v4.7 tag, which is 5 days old).

root@nvg5120:/home/mator/xfstests# ./check 'btrfs/06?'
FSTYP         -- btrfs
PLATFORM      -- Linux/sparc64 nvg5120 4.7.0+
MKFS_OPTIONS  -- /dev/loop0
MOUNT_OPTIONS -- /dev/loop0 /mnt/scratch

btrfs/060        156s
btrfs/061        182s
btrfs/062        312s
btrfs/063        162s
btrfs/064        152s
btrfs/065        61s
btrfs/066        65s
btrfs/067        158s
btrfs/068        74s
btrfs/069        65s
Ran: btrfs/060 btrfs/061 btrfs/062 btrfs/063 btrfs/064 btrfs/065
btrfs/066 btrfs/067 btrfs/068 btrfs/069
Passed all 10 tests

$ journalctl -b -k | awk '/TPC/{print $11}' | sort | uniq -c | sort -n
      4 __btrfs_map_block+0xa10/0x1100
      5 lock_stripe_add+0xb0/0x340
      7 __btrfs_map_block+0x9d4/0x1100
      9 __remove_rbio_from_cache+0x30/0x140
     29 __btrfs_map_block+0x96c/0x1100


$ gdb -q /lib/modules/4.7.0+/kernel/fs/btrfs/btrfs.ko
Reading symbols from /lib/modules/4.7.0+/kernel/fs/btrfs/btrfs.ko...done.
(gdb) l *(__btrfs_map_block+0x96c)
0x8498c is in __btrfs_map_block (fs/btrfs/volumes.c:5615).
5610                    div_u64_rem(stripe_nr, num_stripes, &rot);
5611
5612                    /* Fill in the logical address of each stripe */
5613                    tmp = stripe_nr * nr_data_stripes(map);
5614                    for (i = 0; i < nr_data_stripes(map); i++)
5615                            bbio->raid_map[(i+rot) % num_stripes] =
5616                                    em->start + (tmp + i) * map->stripe_len;
5617
5618                    bbio->raid_map[(i+rot) % map->num_stripes] = RAID5_P_STRIPE;
5619                    if (map->type & BTRFS_BLOCK_GROUP_RAID6)
(gdb) l *(__btrfs_map_block+0x9d4)
0x849f4 is in __btrfs_map_block (fs/btrfs/volumes.c:5618).
5613                    tmp = stripe_nr * nr_data_stripes(map);
5614                    for (i = 0; i < nr_data_stripes(map); i++)
5615                            bbio->raid_map[(i+rot) % num_stripes] =
5616                                    em->start + (tmp + i) * map->stripe_len;
5617
5618                    bbio->raid_map[(i+rot) % map->num_stripes] = RAID5_P_STRIPE;
5619                    if (map->type & BTRFS_BLOCK_GROUP_RAID6)
5620                            bbio->raid_map[(i+rot+1) % num_stripes] =
5621                                    RAID6_Q_STRIPE;
5622            }
(gdb) l *(__btrfs_map_block+0xa10)
0x84a30 is in __btrfs_map_block (fs/btrfs/volumes.c:5620).
5615                            bbio->raid_map[(i+rot) % num_stripes] =
5616                                    em->start + (tmp + i) * map->stripe_len;
5617
5618                    bbio->raid_map[(i+rot) % map->num_stripes] = RAID5_P_STRIPE;
5619                    if (map->type & BTRFS_BLOCK_GROUP_RAID6)
5620                            bbio->raid_map[(i+rot+1) % num_stripes] =
5621                                    RAID6_Q_STRIPE;
5622            }
5623
5624            if (rw & REQ_DISCARD) {
(gdb) l *(lock_stripe_add+0xb0)
0xe0370 is in lock_stripe_add (fs/btrfs/raid56.c:685).
680             int walk = 0;
681
682             spin_lock_irqsave(&h->lock, flags);
683             list_for_each_entry(cur, &h->hash_list, hash_list) {
684                     walk++;
685                     if (cur->bbio->raid_map[0] == rbio->bbio->raid_map[0]) {
686                             spin_lock(&cur->bio_list_lock);
687
688                             /* can we steal this cached rbio's pages? */
689                             if (bio_list_empty(&cur->bio_list) &&
(gdb) l *(__remove_rbio_from_cache+0x30)
0xdfe30 is in __remove_rbio_from_cache (include/linux/spinlock.h:302).
297             raw_spin_lock_init(&(_lock)->rlock);            \
298     } while (0)
299
300     static __always_inline void spin_lock(spinlock_t *lock)
301     {
302             raw_spin_lock(&lock->rlock);
303     }
304
305     static __always_inline void spin_lock_bh(spinlock_t *lock)
306     {
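
The three __btrfs_map_block locations and the lock_stripe_add one all
touch bbio->raid_map, an array of 64-bit values. One common way such an
array ends up misaligned is when it is carved out of a single buffer
right after an array of 4-byte ints. Whether that is what happens here
is not established in this thread; the sketch below only illustrates the
arithmetic, and all names in it are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: one buffer holding n_ints 4-byte ints followed
 * directly by an array of 64-bit values. Returns the byte offset at
 * which the 64-bit array would start.
 */
static size_t u64_array_offset(size_t n_ints)
{
	return n_ints * sizeof(int32_t);
}

/* Nonzero if an offset is suitably aligned for a 64-bit access. */
static int is_u64_aligned(size_t offset)
{
	return offset % sizeof(uint64_t) == 0;
}

/*
 * Round an offset up to the next 8-byte boundary. Padding the buffer
 * this way, or placing the 64-bit array first, keeps the accesses
 * aligned.
 */
static size_t align_up_u64(size_t offset)
{
	return (offset + sizeof(uint64_t) - 1) & ~(sizeof(uint64_t) - 1);
}
```

In this layout an odd number of ints leaves the 64-bit array on a 4-byte
boundary, which a strict-alignment CPU such as sparc64 traps on, while
an even number happens to work.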


Thanks.

[1]. http://www.spinics.net/lists/linux-btrfs/msg57193.html

Thread overview: 21+ messages
2016-07-27 13:59 [sparc64] mkfs.btrfs bus error / align issue? Anatoly Pugachev
2016-07-27 19:56 ` David Sterba
2016-07-28  9:44   ` David Sterba
2016-07-28 11:58     ` Anatoly Pugachev
2016-07-28 12:02       ` John Paul Adrian Glaubitz
2016-07-28 12:09       ` John Paul Adrian Glaubitz
2016-07-28 12:24         ` David Sterba
2016-07-28 14:01           ` Anatoly Pugachev
2016-07-28 14:25             ` John Paul Adrian Glaubitz
2016-07-28 14:28               ` John Paul Adrian Glaubitz
2016-07-28 14:32                 ` Patrick Baggett
2016-07-28 18:04                 ` David Sterba
2016-07-28 20:34                   ` Anatoly Pugachev
2016-07-29  9:40                     ` Anatoly Pugachev
2016-07-29 12:41                     ` David Sterba
2016-07-29 15:03                       ` Anatoly Pugachev
2016-07-29 10:20                   ` John Paul Adrian Glaubitz
2016-07-27 20:39 ` Patrick Baggett
2016-07-27 20:41   ` Patrick Baggett
2016-07-27 20:40 ` John Paul Adrian Glaubitz
2016-07-27 20:48   ` Patrick Baggett
