From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f196.google.com ([209.85.220.196]:34709 "EHLO mail-qk0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932328AbcG0UlX (ORCPT ); Wed, 27 Jul 2016 16:41:23 -0400 Received: by mail-qk0-f196.google.com with SMTP id p126so3458032qke.1 for ; Wed, 27 Jul 2016 13:41:22 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: From: Patrick Baggett Date: Wed, 27 Jul 2016 15:41:21 -0500 Message-ID: Subject: Re: [sparc64] mkfs.btrfs bus error / align issue? To: Anatoly Pugachev Cc: Btrfs BTRFS Content-Type: text/plain; charset=UTF-8 Sender: linux-btrfs-owner@vger.kernel.org List-ID: On Wed, Jul 27, 2016 at 3:39 PM, Patrick Baggett wrote: > On Wed, Jul 27, 2016 at 8:59 AM, Anatoly Pugachev wrote: >> >> Hello! >> >> Running xfstests suite, got in logs mkfs.btrfs bus error, debugging it >> shows the following : >> >> mator@nvg5120:~/btrfs-progs$ git log -1 --oneline >> 40650bf Btrfs progs v4.6.1 >> >> root@nvg5120:/home/mator/xfstests# gdb >> GNU gdb (Debian 7.11.1-2) 7.11.1 >> (gdb) file /opt/btrfs/bin/mkfs.btrfs >> Reading symbols from /opt/btrfs/bin/mkfs.btrfs...done. >> (gdb) set args -f -draid5 -mraid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 >> (gdb) run >> Starting program: /opt/btrfs/bin/mkfs.btrfs -f -draid5 -mraid5 >> /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 >> [Thread debugging using libthread_db enabled] >> Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1". >> btrfs-progs v4.6.1 >> See http://btrfs.wiki.kernel.org for more information. >> >> ERROR: superblock checksum mismatch >> ERROR: superblock checksum mismatch >> ERROR: superblock checksum mismatch >> Performing full device TRIM (2.00GiB) ... >> Performing full device TRIM (2.00GiB) ... >> Performing full device TRIM (2.00GiB) ... >> Performing full device TRIM (2.00GiB) ... >> >> Program received signal SIGBUS, Bus error. >> 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0, >> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at >> volumes.c:2156 >> 2156 *(unsigned long *)(p_eb->data + i) ^= >> (gdb) bt >> #0 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0, >> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) >> at volumes.c:2156 >> #1 0x0000000000119b30 in write_and_map_eb (trans=0x2cc250, >> root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:426 >> #2 0x0000000000119e74 in write_tree_block (trans=0x2cc250, >> root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:459 >> #3 0x000000000011a4ac in __commit_transaction (trans=0x2cc250, >> root=0x2c7d80) at disk-io.c:562 >> #4 0x000000000011a7b8 in btrfs_commit_transaction (trans=0x2cc250, >> root=0x2c7d80) at disk-io.c:598 >> #5 0x00000000001a2b04 in main (argc=8, argv=0x7fefffff698) at mkfs.c:1786 >> (gdb) >> >> Can someone help please? Thanks. >> > > The code that faults: > > (unsigned long *)(p_eb->data + i) ^= *(unsigned long *)(ebs[j]->data + i); > > Because struct extent_buffer has 'data' as a char[], this will always > fault on sparc64 and probably a number of other RISC architectures. It > increments the address by 1, then reads an 8-byte chunk, XORs, and > writes the 8-byte chunk, repeat. In other words, 7 out of 8 reads > would fault, even if both `data` pointers were 8-byte aligned. Actually, I misread that: the code increments i by sizeof(unsigned long), not by 1 so only if either data point is misaligned, then this will fault. Disregard that. > > This would probably fix it, though it looks ugly. > unsigned long a, b; > memcpy(&a, p_eb->data + i, sizeof(a)); /* Read 8 bytes from p_eb->data+i */ > memcpy(&b, ebs[j]->data + i, sizeof(b)); /* Read 8 bytes from ebs[j]->data+i */ > a ^= b; /* XOR */ > memcpy(p_eb->data + i, &a, sizeof(a)); /* Write back to p_eb->data+i */ > > I'm not familiar with btrfs, but the results seems like they depend on > the sizeof(unsigned long). Given that they used parentheses, I assume > it was intentional. However, if this was supposed to do an XOR > operation 8 bytes at a time, then it would need to be something like: > > *(((unsigned long *)p_eb->data)+i) ^= *(((unsigned long *)ebs[j]->data) + i); > > i.e. cast pointer to unsigned long*, then add i (which would index > array of unsigned long, not char). > > --Patrick