* [sparc64] mkfs.btrfs bus error / align issue?
@ 2016-07-27 13:59 Anatoly Pugachev
  2016-07-27 19:56 ` David Sterba
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Anatoly Pugachev @ 2016-07-27 13:59 UTC (permalink / raw)
  To: Btrfs BTRFS

Hello!

While running the xfstests suite I found an mkfs.btrfs bus error in the
logs. Debugging it shows the following:

mator@nvg5120:~/btrfs-progs$ git log -1 --oneline
40650bf Btrfs progs v4.6.1

root@nvg5120:/home/mator/xfstests# gdb
GNU gdb (Debian 7.11.1-2) 7.11.1
(gdb) file /opt/btrfs/bin/mkfs.btrfs
Reading symbols from /opt/btrfs/bin/mkfs.btrfs...done.
(gdb) set args -f -draid5 -mraid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
(gdb) run
Starting program: /opt/btrfs/bin/mkfs.btrfs -f -draid5 -mraid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1".
btrfs-progs v4.6.1
See http://btrfs.wiki.kernel.org for more information.

ERROR: superblock checksum mismatch
ERROR: superblock checksum mismatch
ERROR: superblock checksum mismatch
Performing full device TRIM (2.00GiB) ...
Performing full device TRIM (2.00GiB) ...
Performing full device TRIM (2.00GiB) ...
Performing full device TRIM (2.00GiB) ...

Program received signal SIGBUS, Bus error.
0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0, eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at volumes.c:2156
2156                            *(unsigned long *)(p_eb->data + i) ^=
(gdb) bt
#0  0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0, eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at volumes.c:2156
#1  0x0000000000119b30 in write_and_map_eb (trans=0x2cc250, root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:426
#2  0x0000000000119e74 in write_tree_block (trans=0x2cc250, root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:459
#3  0x000000000011a4ac in __commit_transaction (trans=0x2cc250, root=0x2c7d80) at disk-io.c:562
#4  0x000000000011a7b8 in btrfs_commit_transaction (trans=0x2cc250, root=0x2c7d80) at disk-io.c:598
#5  0x00000000001a2b04 in main (argc=8, argv=0x7fefffff698) at mkfs.c:1786
(gdb)

Can someone help, please?

Thanks.

PS: the /dev/loop devices are backed by files on a ramdisk:

# mount tmpfs -t tmpfs -o size=12g /ramdisk
# fallocate -l 3.9G /ramdisk/testvol
# for i in 1 2 3 4; do fallocate -l 2G /ramdisk/scratch${i} ; done
# ls -lh /ramdisk/
total 12G
-rw-r--r-- 1 root root 2.0G Jul 27 16:16 scratch1
-rw-r--r-- 1 root root 2.0G Jul 27 16:16 scratch2
-rw-r--r-- 1 root root 2.0G Jul 27 16:16 scratch3
-rw-r--r-- 1 root root 2.0G Jul 27 16:16 scratch4
-rw-r--r-- 1 root root 3.9G Jul 27 16:15 testvol

# for i in /ramdisk/*; do echo -n "$i : "; losetup -f --show $i; done
/ramdisk/scratch1 : /dev/loop0
/ramdisk/scratch2 : /dev/loop1
/ramdisk/scratch3 : /dev/loop2
/ramdisk/scratch4 : /dev/loop3
/ramdisk/testvol : /dev/loop4

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [sparc64] mkfs.btrfs bus error / align issue?
From: David Sterba @ 2016-07-27 19:56 UTC
  To: Anatoly Pugachev; +Cc: Btrfs BTRFS

On Wed, Jul 27, 2016 at 04:59:27PM +0300, Anatoly Pugachev wrote:
> Hello!
>
> Running xfstests suite, got in logs mkfs.btrfs bus error, debugging it
> shows the following :
>
> Program received signal SIGBUS, Bus error.
> 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
> volumes.c:2156
> 2156 *(unsigned long *)(p_eb->data + i) ^=

Yeah, that is a clear unaligned access. We have helpers for that, so I'll
fix it. I was looking for a way to simulate and catch this on x86, or at
least to let gcc warn about it, but no such thing seems to exist. That
means we might accidentally introduce it again in the future.
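For illustration, the kind of helper mentioned above can be sketched in portable C. This is a minimal sketch, not the btrfs-progs code: the names `load_u64_unaligned` and `store_u64_unaligned` are hypothetical, and the sketch assumes that copying through `memcpy` is an acceptable way to avoid a direct dereference of a misaligned pointer, which is what raises SIGBUS on sparc64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 64-bit value from an address of any alignment. */
static inline uint64_t load_u64_unaligned(const void *p)
{
	uint64_t v;

	/* memcpy has no alignment requirement on its arguments. */
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Hypothetical helper: write a 64-bit value to an address of any alignment. */
static inline void store_u64_unaligned(void *p, uint64_t v)
{
	memcpy(p, &v, sizeof(v));
}

/* Round-trip a value through an address that is deliberately off by one byte. */
static int unaligned_roundtrip_ok(void)
{
	uint64_t backing[3] = { 0 };
	unsigned char *p = (unsigned char *)backing + 1;
	const uint64_t want = 0x0123456789abcdefULL;

	/* Confirm the test address really is misaligned for a 64-bit access. */
	assert((uintptr_t)p % sizeof(uint64_t) != 0);

	store_u64_unaligned(p, want);
	return load_u64_unaligned(p) == want;
}
```

Calling `unaligned_roundtrip_ok()` exercises both helpers on a misaligned address without ever casting that address to `uint64_t *`. Separately, and as an assumption worth verifying against the toolchain in use: recent GCC and Clang releases document a `-fsanitize=alignment` option that reports misaligned dereferences at run time, which may help catch this class of bug on x86.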
* Re: [sparc64] mkfs.btrfs bus error / align issue?
From: David Sterba @ 2016-07-28 9:44 UTC
  To: dsterba, Anatoly Pugachev, Btrfs BTRFS

On Wed, Jul 27, 2016 at 09:56:09PM +0200, David Sterba wrote:
> On Wed, Jul 27, 2016 at 04:59:27PM +0300, Anatoly Pugachev wrote:
> > Hello!
> >
> > Running xfstests suite, got in logs mkfs.btrfs bus error, debugging it
> > shows the following :
> >
> > Program received signal SIGBUS, Bus error.
> > 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
> > eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
> > volumes.c:2156
> > 2156 *(unsigned long *)(p_eb->data + i) ^=
>
> Yeah, clear unaligned access. We have helpers for so I'll fix it. I was
> looking for a way to simulate and catch that on x86 or at least let gcc
> warn but no such thing seems to exist. Which means we might accidentally
> introduce that in the future.

Can you please test with the current 'devel' branch? Fixed by the patch
"btrfs-progs: fix unaligned access calculating raid56 data" (depends on
another patch in devel). Thanks.
* Re: [sparc64] mkfs.btrfs bus error / align issue?
From: Anatoly Pugachev @ 2016-07-28 11:58 UTC
  To: dsterba; +Cc: Btrfs BTRFS, debian-sparc

On Thu, Jul 28, 2016 at 12:44 PM, David Sterba <dsterba@suse.cz> wrote:
> On Wed, Jul 27, 2016 at 09:56:09PM +0200, David Sterba wrote:
>> On Wed, Jul 27, 2016 at 04:59:27PM +0300, Anatoly Pugachev wrote:
>> > Hello!
>> >
>> > Running xfstests suite, got in logs mkfs.btrfs bus error, debugging it
>> > shows the following :
>> >
>> > Program received signal SIGBUS, Bus error.
>> > 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
>> > eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
>> > volumes.c:2156
>> > 2156 *(unsigned long *)(p_eb->data + i) ^=
>>
>> Yeah, clear unaligned access. We have helpers for so I'll fix it. I was
>> looking for a way to simulate and catch that on x86 or at least let gcc
>> warn but no such thing seems to exist. Which means we might accidentally
>> introduce that in the future.
>
> Can you please test with the current 'devel' branch? Fixed by the patch
> "btrfs-progs: fix unaligned access calculating raid56 data" (depends on
> another patch in devel). Thanks.

David,

where do I get the devel branch of btrfs-progs?
I just tried git://repo.or.cz/btrfs-progs-unstable/devel.git , but the
last commit I see in it is still:

mator@nvg5120:~/1/devel$ git log -1 --pretty=format:"%h %s, %an, %ad" --date=short
40650bf Btrfs progs v4.6.1, David Sterba, 2016-06-24

So which git repo should I pull from to get the latest development patches?

Thanks.
* Re: [sparc64] mkfs.btrfs bus error / align issue?
From: John Paul Adrian Glaubitz @ 2016-07-28 12:02 UTC
  To: Anatoly Pugachev, dsterba; +Cc: Btrfs BTRFS, debian-sparc

On 07/28/2016 01:58 PM, Anatoly Pugachev wrote:
> but where do I get -devel branch of btrfs-progs?
> I just tried git://repo.or.cz/btrfs-progs-unstable/devel.git , but
> still seeing last commit in it:

glaubitz@ikarus:~/upstream/devel$ git checkout devel
Branch devel set up to track remote branch devel from origin.
Switched to a new branch 'devel'
glaubitz@ikarus:~/upstream/devel$ git log | head -n 10
commit 8f88fca93afc870de9783b0179c6e8d1b9f0cbf3
Merge: bf80586 366983b
Author: David Sterba <dsterba@suse.com>
Date:   Tue Jul 26 19:43:50 2016 +0200

    Merge branch 'foreign/qu/fsck-lowmem-v2.1-fixups' into devel

commit bf80586481a4fa1c27571078dddc9392537aacf0
Author: David Sterba <dsterba@suse.com>
Date:   Tue Jul 26 19:33:12 2016 +0200
glaubitz@ikarus:~/upstream/devel$

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
* Re: [sparc64] mkfs.btrfs bus error / align issue?
From: John Paul Adrian Glaubitz @ 2016-07-28 12:09 UTC
  To: Anatoly Pugachev, dsterba; +Cc: Btrfs BTRFS, debian-sparc

Hi David!

On 07/28/2016 01:58 PM, Anatoly Pugachev wrote:
>> Can you please test with the current 'devel' branch? Fixed by the patch
>> "btrfs-progs: fix unaligned access calculating raid56 data" (depends on
>> another patch in devel). Thanks.

Are you sure you pushed these changes? I don't see them in the devel branch [1].

Adrian

> [1] http://repo.or.cz/btrfs-progs-unstable/devel.git/shortlog/refs/heads/devel

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
* Re: [sparc64] mkfs.btrfs bus error / align issue?
From: David Sterba @ 2016-07-28 12:24 UTC
  To: John Paul Adrian Glaubitz
  Cc: Anatoly Pugachev, dsterba, Btrfs BTRFS, debian-sparc

On Thu, Jul 28, 2016 at 02:09:03PM +0200, John Paul Adrian Glaubitz wrote:
> Hi David!
>
> On 07/28/2016 01:58 PM, Anatoly Pugachev wrote:
> >> Can you please test with the current 'devel' branch? Fixed by the patch
> >> "btrfs-progs: fix unaligned access calculating raid56 data" (depends on
> >> another patch in devel). Thanks.
>
> Are you sure you pushed these changes? I don't see them in the devel branch [1].

Oh shame, of course not. Now pushed.
* Re: [sparc64] mkfs.btrfs bus error / align issue?
From: Anatoly Pugachev @ 2016-07-28 14:01 UTC
  To: dsterba; +Cc: debian-sparc, Btrfs BTRFS

On Thu, Jul 28, 2016 at 3:24 PM, David Sterba <dsterba@suse.cz> wrote:
> On Thu, Jul 28, 2016 at 02:09:03PM +0200, John Paul Adrian Glaubitz wrote:
>> Hi David!
>>
>> On 07/28/2016 01:58 PM, Anatoly Pugachev wrote:
>> >> Can you please test with the current 'devel' branch? Fixed by the patch
>> >> "btrfs-progs: fix unaligned access calculating raid56 data" (depends on
>> >> another patch in devel). Thanks.
>>
>> Are you sure you pushed these changes? I don't see them in the devel branch [1].
>
> Oh shame, of course not. Now pushed.

David,

after checking out the devel tree:

mator@nvg5120:~/devel$ git describe
v4.6.1-64-g6d1564c
mator@nvg5120:~/devel$ git remote -v
origin  git://repo.or.cz/btrfs-progs-unstable/devel.git (fetch)
origin  git://repo.or.cz/btrfs-progs-unstable/devel.git (push)
mator@nvg5120:~/devel$ git branch
* devel
  master

a new failure location turned up, this time with raid6:

Reading symbols from /opt/btrfs/bin/mkfs.btrfs...done.
(gdb) set args -f -draid6 -mraid6 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
(gdb) run
Starting program: /opt/btrfs/bin/mkfs.btrfs -f -draid6 -mraid6 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1".
btrfs-progs v4.6.1-64-g6d1564c
See http://btrfs.wiki.kernel.org for more information.

ERROR: superblock checksum mismatch
ERROR: superblock checksum mismatch
ERROR: superblock checksum mismatch
Performing full device TRIM (2.00GiB) ...
Performing full device TRIM (2.00GiB) ...
Performing full device TRIM (2.00GiB) ...
Performing full device TRIM (2.00GiB) ...

Program received signal SIGBUS, Bus error.
0x0000000000177dfc in raid6_gen_syndrome (disks=4, bytes=65536, ptrs=0x2c4510) at raid6.c:87
87              wq0 = wp0 = *(unative_t *)&dptr[z0][d+0*NSIZE];
(gdb) bt
#0  0x0000000000177dfc in raid6_gen_syndrome (disks=4, bytes=65536, ptrs=0x2c4510) at raid6.c:87
#1  0x000000000015e174 in write_raid56_with_parity (info=0x2b37b0, eb=0x2c9fe0, multi=0x2c4870, stripe_len=65536, raid_map=0x2c4570) at volumes.c:2151
#2  0x0000000000119bd0 in write_and_map_eb (trans=0x2ce250, root=0x2c9d80, eb=0x2c9fe0) at disk-io.c:426
#3  0x0000000000119f14 in write_tree_block (trans=0x2ce250, root=0x2c9d80, eb=0x2c9fe0) at disk-io.c:459
#4  0x000000000011a54c in __commit_transaction (trans=0x2ce250, root=0x2c9d80) at disk-io.c:562
#5  0x000000000011a858 in btrfs_commit_transaction (trans=0x2ce250, root=0x2c9d80) at disk-io.c:598
#6  0x00000000001a52e8 in main (argc=8, argv=0x7fefffff698) at mkfs.c:1809
(gdb)
* Re: [sparc64] mkfs.btrfs bus error / align issue?
From: John Paul Adrian Glaubitz @ 2016-07-28 14:25 UTC
  To: Anatoly Pugachev, dsterba; +Cc: debian-sparc, Btrfs BTRFS

On 07/28/2016 04:01 PM, Anatoly Pugachev wrote:
> Program received signal SIGBUS, Bus error.
> 0x0000000000177dfc in raid6_gen_syndrome (disks=4, bytes=65536,
> ptrs=0x2c4510) at raid6.c:87
> 87 wq0 = wp0 = *(unative_t *)&dptr[z0][d+0*NSIZE];

That should be easy to fix. Just make the R values aligned with the
appropriate get_aligned functions, see David's previous commit [1]:

-	for (i = 0; i < stripe_len; i += sizeof(unsigned long)) {
-		*(unsigned long *)(p_eb->data + i) ^=
-			*(unsigned long *)(ebs[j]->data + i);
+	for (i = 0; i < stripe_len; i += sizeof(u64)) {
+		u64 p_eb_data;
+		u64 ebs_data;
+
+		p_eb_data = get_unaligned_64(p_eb->data + i);
+		ebs_data = get_unaligned_64(ebs[j]->data + i);
+		p_eb_data ^= ebs_data;
+		put_unaligned_64(p_eb_data, p_eb->data + i);

> (gdb) bt

You don't need a backtrace here. It stops directly at the offending line.

The pattern is usually foo = *(new_type_t *)bar. I'm rather surprised to
see such code in here, especially given that all the necessary helper
macros are already defined in [2].

There are more lines in raid6.c which need the same fix, basically everything
with * (unative_t *).

Cheers,
Adrian

> [1] http://repo.or.cz/btrfs-progs-unstable/devel.git/blobdiff/1c47e5b03922772c1a9429c7817cc728c99e2530..24f1713777f02300d0c48ddc142b2c711e462b65:/volumes.c
> [2] http://repo.or.cz/btrfs-progs-unstable/devel.git/blob/refs/heads/devel:/kerncompat.h#l337

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
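The loop rewritten in the quoted diff can be sketched as a standalone function. This is an illustration under stated assumptions, not the btrfs-progs source: `xor_into` is a hypothetical name, `memcpy` stands in for the `get_unaligned_64` / `put_unaligned_64` helpers, and the length is assumed to be a multiple of 8 bytes, as the 65536-byte stripe length in the report is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * XOR len bytes of src into dst, one 64-bit word at a time.
 * Neither pointer needs any particular alignment, because every
 * word is moved through a local variable with memcpy.
 */
static void xor_into(unsigned char *dst, const unsigned char *src, size_t len)
{
	size_t i;

	/* Assumption of this sketch: whole 64-bit words only. */
	assert(len % sizeof(uint64_t) == 0);

	for (i = 0; i < len; i += sizeof(uint64_t)) {
		uint64_t d;
		uint64_t s;

		memcpy(&d, dst + i, sizeof(d));
		memcpy(&s, src + i, sizeof(s));
		d ^= s;
		memcpy(dst + i, &d, sizeof(d));
	}
}
```

Because XOR acts on each bit independently, the result does not depend on byte order, so the sketch needs no endianness handling. Calling `xor_into(a + 1, b + 1, 16)` on byte arrays works even though both addresses are odd.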
* Re: [sparc64] mkfs.btrfs bus error / align issue?
From: John Paul Adrian Glaubitz @ 2016-07-28 14:28 UTC
  To: Anatoly Pugachev, dsterba; +Cc: debian-sparc, Btrfs BTRFS

On 07/28/2016 04:25 PM, John Paul Adrian Glaubitz wrote:
> On 07/28/2016 04:01 PM, Anatoly Pugachev wrote:
>> Program received signal SIGBUS, Bus error.
>> 0x0000000000177dfc in raid6_gen_syndrome (disks=4, bytes=65536,
>> ptrs=0x2c4510) at raid6.c:87
>> 87 wq0 = wp0 = *(unative_t *)&dptr[z0][d+0*NSIZE];
>
> That should be easy to fix. Just make the R values aligned with the
> appropriate get_aligned functions, see David's previous commit [1]:

Argh, those are called get_UNaligned_*, not get_aligned_*.

> There are more lines in raid6.c which need the same fix, basically everything
> with * (unative_t *).

Oh, and you will somehow need to guard this with #if BITS_PER_LONG == 64 ...
#else ... #endif respectively since you need to use different versions
(64 vs. 32) of get_unaligned_* depending on the size of unative_t.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
* Re: [sparc64] mkfs.btrfs bus error / align issue?
From: Patrick Baggett @ 2016-07-28 14:32 UTC
  To: John Paul Adrian Glaubitz
  Cc: Anatoly Pugachev, dsterba, debian-sparc, Btrfs BTRFS

> Oh, and you will somehow need to guard this with #if BITS_PER_LONG == 64 ...
> #else ... #endif respectively since you need to use different versions
> (64 vs. 32) of get_unaligned_* depending on the size of unative_t.

Maybe a get_unaligned_unative() would be better so that preprocessor
fun is kept to a minimum. :)

--Patrick
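One way such a helper could look is sketched below. It is an illustration only: `get_unaligned_unative` and `put_unaligned_unative` are the hypothetical names from the suggestion above, and `unative_t` is assumed here to be `unsigned long`, since its real definition in raid6.c is not shown in this thread. The point of the sketch is that `sizeof` picks the width, so no `#if BITS_PER_LONG` guard is needed at the call sites.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Assumption for this sketch: the native word is unsigned long. */
typedef unsigned long unative_t;

/* The sketch expects a 32-bit or 64-bit native word. */
static_assert(sizeof(unative_t) * CHAR_BIT == 32 ||
	      sizeof(unative_t) * CHAR_BIT == 64,
	      "unexpected native word width");

/* Hypothetical helper: load one native word from any address. */
static inline unative_t get_unaligned_unative(const void *p)
{
	unative_t v;

	/* sizeof(v) follows the typedef, so 32- and 64-bit builds share this code. */
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Hypothetical helper: store one native word to any address. */
static inline void put_unaligned_unative(unative_t v, void *p)
{
	memcpy(p, &v, sizeof(v));
}
```

The argument order of `put_unaligned_unative(value, pointer)` mirrors the `put_unaligned_64(p_eb_data, p_eb->data + i)` call in the diff quoted earlier in the thread. A load such as `*(unative_t *)&dptr[z0][d+0*NSIZE]` would then become `get_unaligned_unative(&dptr[z0][d+0*NSIZE])`.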
* Re: [sparc64] mkfs.btrfs bus error / align issue?
From: David Sterba @ 2016-07-28 18:04 UTC
  To: John Paul Adrian Glaubitz; +Cc: Anatoly Pugachev, debian-sparc, Btrfs BTRFS

On Thu, Jul 28, 2016 at 04:28:41PM +0200, John Paul Adrian Glaubitz wrote:
> On 07/28/2016 04:25 PM, John Paul Adrian Glaubitz wrote:
> > On 07/28/2016 04:01 PM, Anatoly Pugachev wrote:
> >> Program received signal SIGBUS, Bus error.
> >> 0x0000000000177dfc in raid6_gen_syndrome (disks=4, bytes=65536,
> >> ptrs=0x2c4510) at raid6.c:87
> >> 87 wq0 = wp0 = *(unative_t *)&dptr[z0][d+0*NSIZE];
> >
> > That should be easy to fix. Just make the R values aligned with the
> > appropriate get_aligned functions, see David's previous commit [1]:
>
> Argh, those are called get_UNaligned_*, not get_aligned_*.
>
> > There are more lines in raid6.c which need the same fix, basically everything
> > with * (unative_t *).
>
> Oh, and you will somehow need to guard this with #if BITS_PER_LONG == 64 ...
> #else ... #endif respectively since you need to use different versions
> (64 vs. 32) of get_unaligned_* depending on the size of unative_t.

And I've fixed it that way, now pushed to devel ("btrfs-progs: fix
unaligned access in raid6 calculations" [1]). It would be great if you or
Anatoly could test it, so that I can add it to the 4.7 release (ETA tomorrow).

[1] https://github.com/kdave/btrfs-progs/commit/44b35e94facf820d2c2d8e1be631bf40d68dff8d
* Re: [sparc64] mkfs.btrfs bus error / align issue? 2016-07-28 18:04 ` David Sterba @ 2016-07-28 20:34 ` Anatoly Pugachev 2016-07-29 9:40 ` Anatoly Pugachev 2016-07-29 12:41 ` David Sterba 2016-07-29 10:20 ` John Paul Adrian Glaubitz 1 sibling, 2 replies; 21+ messages in thread From: Anatoly Pugachev @ 2016-07-28 20:34 UTC (permalink / raw) To: dsterba; +Cc: Btrfs BTRFS, debian-sparc On Thu, Jul 28, 2016 at 9:04 PM, David Sterba <dsterba@suse.cz> wrote: > On Thu, Jul 28, 2016 at 04:28:41PM +0200, John Paul Adrian Glaubitz wrote: >> On 07/28/2016 04:25 PM, John Paul Adrian Glaubitz wrote: >> > On 07/28/2016 04:01 PM, Anatoly Pugachev wrote: >> >> Program received signal SIGBUS, Bus error. >> >> 0x0000000000177dfc in raid6_gen_syndrome (disks=4, bytes=65536, >> >> ptrs=0x2c4510) at raid6.c:87 >> >> 87 wq0 = wp0 = *(unative_t *)&dptr[z0][d+0*NSIZE]; >> > >> > That should be easy to fix. Just make the R values aligned with the >> > appropriate get_aligned functions, see David's previous commit [1]: >> >> Argh, those are called get_UNaligned_*, not get_aligned_*. >> >> > There are more lines in raid6.c which need the same fix, basically everything >> > with * (unative_t *). >> >> Oh, and you will somehow need to guard this with #if BITS_PER_LONG == 64 ... >> #else ... #endif respectively since you need to use different versions >> (64 vs. 32) of get_unaligned_* depending on the size of unative_t. > > And I've fixed it that way, now pushed to devel ("btrfs-progs: fix > unaligned access in raid6 calculations" [1]). Would be great if you or > Anatoly can test it so I can add it to the 4.7 release (ETA tomorrow). David, well, I think mkfs.btrfs is fixed, since I just tested it with : root@nvg5120:/home/mator/xfstests# ./check 'btrfs/06?' 
FSTYP -- btrfs PLATFORM -- Linux/sparc64 nvg5120 4.7.0+ MKFS_OPTIONS -- /dev/loop0 MOUNT_OPTIONS -- /dev/loop0 /mnt/scratch btrfs/060 145s btrfs/061 158s btrfs/062 288s btrfs/063 141s btrfs/064 129s btrfs/065 44s btrfs/066 46s btrfs/067 - output mismatch (see /home/mator/xfstests/results//btrfs/067.out.bad) --- tests/btrfs/067.out 2016-07-20 12:12:21.772228422 +0300 +++ /home/mator/xfstests/results//btrfs/067.out.bad 2016-07-28 22:54:00.059192629 +0300 @@ -1,2 +1,3 @@ QA output created by 067 Silence is golden +Scrub find errors in "-m single -d single" test ... (Run 'diff -u tests/btrfs/067.out /home/mator/xfstests/results//btrfs/067.out.bad' to see the entire diff) btrfs/068 57s btrfs/069 45s Ran: btrfs/060 btrfs/061 btrfs/062 btrfs/063 btrfs/064 btrfs/065 btrfs/066 btrfs/067 btrfs/068 btrfs/069 Failures: btrfs/067 Failed 1 of 10 tests previously (before mkfs.btrfs fix) , all tests from 06? were bad/failed. Starting from "tests/btrfs/064" kernel started to log TPC (Trap Program Counter register) messages, a lot of them. Results of the this test i put on a webserver [1]. Output of journalctl -b (from boot) with TPC messages are at [2]. Not sure what we need to do with sparc64 btrfs module TPC messages. Probably fill kernel bugzilla report? Thanks. 
[1] http://u163.east.ru/btrfs/xfstests-btrfs-06x-results.tar.gz [2] http://u163.east.ru/btrfs/kernel-4.7.0+-logs-xfstests-06x.txt.gz PS: my xfstests setup is the following: # mount tmpfs -t tmpfs -o size=13g /ramdisk/ /ramdisk# for i in 1 2 3 4 5 6; do fallocate -l 1g scratch${i}; done /ramdisk# fallocate -l 4g testvol1 /ramdisk# for i in *; do losetup -f $i; done /home/mator/xfstests# losetup NAME SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE DIO /dev/loop0 0 0 0 0 /ramdisk/scratch1 0 /dev/loop1 0 0 0 0 /ramdisk/scratch2 0 /dev/loop2 0 0 0 0 /ramdisk/scratch3 0 /dev/loop3 0 0 0 0 /ramdisk/scratch4 0 /dev/loop4 0 0 0 0 /ramdisk/scratch5 0 /dev/loop5 0 0 0 0 /ramdisk/scratch6 0 /dev/loop6 0 0 0 0 /ramdisk/testvol1 0 # mkfs.btrfs /dev/loop6 btrfs-progs v4.6.1-66-g4367e35 See http://btrfs.wiki.kernel.org for more information. Performing full device TRIM (4.00GiB) ... Label: (null) UUID: 6a4d5918-adfe-469c-8454-9b28545b88bc Node size: 16384 Sector size: 8192 Filesystem size: 4.00GiB Block group profiles: Data: single 8.00MiB Metadata: DUP 204.75MiB System: DUP 8.00MiB SSD detected: no Incompat features: extref, skinny-metadata Number of devices: 1 Devices: ID SIZE PATH 1 4.00GiB /dev/loop6 root@nvg5120:/home/mator/xfstests# cat local.config export TEST_DEV=/dev/loop6 export TEST_DIR=/fst export SCRATCH_DEV_POOL="/dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5" export SCRATCH_MNT=/mnt/scratch ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [sparc64] mkfs.btrfs bus error / align issue? 2016-07-28 20:34 ` Anatoly Pugachev @ 2016-07-29 9:40 ` Anatoly Pugachev 2016-07-29 12:41 ` David Sterba 1 sibling, 0 replies; 21+ messages in thread From: Anatoly Pugachev @ 2016-07-29 9:40 UTC (permalink / raw) To: dsterba; +Cc: Btrfs BTRFS, debian-sparc, fstests On Thu, Jul 28, 2016 at 11:34 PM, Anatoly Pugachev <matorola@gmail.com> wrote: > On Thu, Jul 28, 2016 at 9:04 PM, David Sterba <dsterba@suse.cz> wrote: >> On Thu, Jul 28, 2016 at 04:28:41PM +0200, John Paul Adrian Glaubitz wrote: >>> On 07/28/2016 04:25 PM, John Paul Adrian Glaubitz wrote: >>> > On 07/28/2016 04:01 PM, Anatoly Pugachev wrote: >>> >> Program received signal SIGBUS, Bus error. >>> >> 0x0000000000177dfc in raid6_gen_syndrome (disks=4, bytes=65536, >>> >> ptrs=0x2c4510) at raid6.c:87 >>> >> 87 wq0 = wp0 = *(unative_t *)&dptr[z0][d+0*NSIZE]; >>> > >>> > That should be easy to fix. Just make the R values aligned with the >>> > appropriate get_aligned functions, see David's previous commit [1]: >>> >>> Argh, those are called get_UNaligned_*, not get_aligned_*. >>> >>> > There are more lines in raid6.c which need the same fix, basically everything >>> > with * (unative_t *). >>> >>> Oh, and you will somehow need to guard this with #if BITS_PER_LONG == 64 ... >>> #else ... #endif respectively since you need to use different versions >>> (64 vs. 32) of get_unaligned_* depending on the size of unative_t. >> >> And I've fixed it that way, now pushed to devel ("btrfs-progs: fix >> unaligned access in raid6 calculations" [1]). Would be great if you or >> Anatoly can test it so I can add it to the 4.7 release (ETA tomorrow). > > David, > well, I think mkfs.btrfs is fixed, since I just tested it with : > root@nvg5120:/home/mator/xfstests# ./check 'btrfs/06?' 
> FSTYP -- btrfs > PLATFORM -- Linux/sparc64 nvg5120 4.7.0+ > MKFS_OPTIONS -- /dev/loop0 > MOUNT_OPTIONS -- /dev/loop0 /mnt/scratch > > btrfs/060 145s > btrfs/061 158s > btrfs/062 288s > btrfs/063 141s > btrfs/064 129s > btrfs/065 44s > btrfs/066 46s > btrfs/067 - output mismatch (see > /home/mator/xfstests/results//btrfs/067.out.bad) > --- tests/btrfs/067.out 2016-07-20 12:12:21.772228422 +0300 > +++ /home/mator/xfstests/results//btrfs/067.out.bad 2016-07-28 > 22:54:00.059192629 +0300 > @@ -1,2 +1,3 @@ > QA output created by 067 > Silence is golden > +Scrub find errors in "-m single -d single" test > ... > (Run 'diff -u tests/btrfs/067.out > /home/mator/xfstests/results//btrfs/067.out.bad' to see the entire > diff) > btrfs/068 57s > btrfs/069 45s > Ran: btrfs/060 btrfs/061 btrfs/062 btrfs/063 btrfs/064 btrfs/065 > btrfs/066 btrfs/067 btrfs/068 btrfs/069 > Failures: btrfs/067 > Failed 1 of 10 tests > > > previously (before mkfs.btrfs fix) , all tests from 06? were bad/failed. > > Starting from "tests/btrfs/064" kernel started to log TPC (Trap > Program Counter register) messages, a lot of them. > > Results of the this test i put on a webserver [1]. > Output of journalctl -b (from boot) with TPC messages are at [2]. > > Not sure what we need to do with sparc64 btrfs module TPC messages. > Probably fill kernel bugzilla report? > > Thanks. 
> > [1] http://u163.east.ru/btrfs/xfstests-btrfs-06x-results.tar.gz > [2] http://u163.east.ru/btrfs/kernel-4.7.0+-logs-xfstests-06x.txt.gz > > PS: my xfstests setup is the following: > > # mount tmpfs -t tmpfs -o size=13g /ramdisk/ > /ramdisk# for i in 1 2 3 4 5 6; do fallocate -l 1g scratch${i}; done > /ramdisk# fallocate -l 4g testvol1 > > /ramdisk# for i in *; do losetup -f $i; done > /home/mator/xfstests# losetup > NAME SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE DIO > /dev/loop0 0 0 0 0 /ramdisk/scratch1 0 > /dev/loop1 0 0 0 0 /ramdisk/scratch2 0 > /dev/loop2 0 0 0 0 /ramdisk/scratch3 0 > /dev/loop3 0 0 0 0 /ramdisk/scratch4 0 > /dev/loop4 0 0 0 0 /ramdisk/scratch5 0 > /dev/loop5 0 0 0 0 /ramdisk/scratch6 0 > /dev/loop6 0 0 0 0 /ramdisk/testvol1 0 > > # mkfs.btrfs /dev/loop6 > btrfs-progs v4.6.1-66-g4367e35 > See http://btrfs.wiki.kernel.org for more information. > > Performing full device TRIM (4.00GiB) ... > Label: (null) > UUID: 6a4d5918-adfe-469c-8454-9b28545b88bc > Node size: 16384 > Sector size: 8192 > Filesystem size: 4.00GiB > Block group profiles: > Data: single 8.00MiB > Metadata: DUP 204.75MiB > System: DUP 8.00MiB > SSD detected: no > Incompat features: extref, skinny-metadata > Number of devices: 1 > Devices: > ID SIZE PATH > 1 4.00GiB /dev/loop6 > > root@nvg5120:/home/mator/xfstests# cat local.config > export TEST_DEV=/dev/loop6 > export TEST_DIR=/fst > export SCRATCH_DEV_POOL="/dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 > /dev/loop4 /dev/loop5" > export SCRATCH_MNT=/mnt/scratch Just to add, I've also run tests from btrfs/000 to btrfs/059, with not so bad results: Ran: btrfs/001 btrfs/002 btrfs/005 btrfs/006 btrfs/008 btrfs/009 btrfs/010 btrfs/012 btrfs/013 btrfs/014 btrfs/015 btrfs/016 btrfs/017 btrfs/018 btrfs/019 btrfs/020 btrfs/021 btrfs/022 btrfs/023 btrfs/024 btrfs/025 btrfs/026 btrfs/027 btrfs/028 btrfs/029 btrfs/030 btrfs/031 btrfs/032 btrfs/033 btrfs/034 btrfs/035 btrfs/036 btrfs/037 btrfs/038 btrfs/039 btrfs/040 btrfs/041 btrfs/042 
btrfs/043 btrfs/044 btrfs/045 btrfs/046 btrfs/048 btrfs/049 btrfs/050 btrfs/051 btrfs/052 btrfs/053 btrfs/054 btrfs/055 btrfs/056 btrfs/057 btrfs/058 btrfs/059 Not run: btrfs/003 btrfs/004 btrfs/007 btrfs/011 btrfs/047 Failures: btrfs/010 btrfs/012 btrfs/057 Failed 3 of 54 tests Failures: btrfs/010 - failed with "number of extents mis-match!" $ cat /home/mator/xfstests/results//btrfs/010.full Create subvolume '/mnt/scratch/subvol' 1+0 records in 1+0 records out 1+0 records in 1+0 records out 1+0 records in 1+0 records out 1+0 records in 1+0 records out 1+0 records in 1+0 records out Create a snapshot of '/mnt/scratch/subvol' in '/mnt/scratch/snap-2' Create a snapshot of '/mnt/scratch/subvol' in '/mnt/scratch/snap-1' /mnt/scratch/subvol/foobar: 0: [0..79]: 24704..24783 /mnt/scratch/snap-1/foobar: 0: [0..31]: 24672..24703 1: [32..47]: 24656..24671 2: [48..63]: 24608..24623 3: [64..79]: 24592..24607 /mnt/scratch/snap-2/foobar: 0: [0..31]: 24672..24703 1: [32..47]: 24656..24671 2: [48..63]: 24608..24623 3: [64..79]: 24592..24607 1 4 4 btrfs/012 - failed with "btrfs-convert failed" $ cat /home/mator/xfstests/results//btrfs/012.full mke2fs 1.43.1 (08-Jun-2016) Discarding device blocks: done Creating filesystem with 262144 4k blocks and 65536 inodes Filesystem UUID: 98d0756e-76b6-4ab1-ac7d-a1fceb4b21b4 Superblock backups stored on blocks: 32768, 98304, 163840, 229376 Allocating group tables: done Writing inode tables: done Creating journal (8192 blocks): done Writing superblocks and filesystem accounting information: done ERROR: system chunk array too big 1627389952 > 2048 ERROR: superblock checksum matches but it has invalid members No valid Btrfs found on /dev/loop0 unable to open ctree conversion aborted create btrfs filesystem: blocksize: 4096 nodesize: 16384 features: extref, skinny-metadata (default) btrfs-convert failed btrfs/057 - failed: '_scratch_mkfs -b 1g --nodesize 4096' $ cat /home/mator/xfstests/results//btrfs/057.full # _scratch_mkfs -b 1g --nodesize 4096 
ERROR: illegal nodesize 4096 (smaller than 8192) failed: '_scratch_mkfs -b 1g --nodesize 4096' JFYI, mator@nvg5120:~$ getconf PAGE_SIZE 8192 this 000-059 tests was done with fresh reboot. within 027 test, kernel started to show TPC messages, like this one: Jul 29 12:10:32 nvg5120 unknown: run fstests btrfs/027 at 2016-07-29 12:10:32 ... Jul 29 12:10:58 nvg5120 kernel: BTRFS info (device loop4): allowing degraded mounts Jul 29 12:10:58 nvg5120 kernel: BTRFS info (device loop4): disk space caching is enabled Jul 29 12:10:58 nvg5120 kernel: BTRFS info (device loop4): has skinny extents Jul 29 12:10:59 nvg5120 kernel: BTRFS info (device loop4): dev_replace from <missing disk> (devid 2) to /dev/loop5 started Jul 29 12:10:59 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs] Jul 29 12:10:59 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs] Jul 29 12:10:59 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs] Jul 29 12:10:59 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs] Jul 29 12:10:59 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs] Jul 29 12:11:00 nvg5120 kernel: BTRFS info (device loop4): dev_replace from <missing disk> (devid 2) to /dev/loop5 finished ... 
Jul 29 12:11:07 nvg5120 kernel: BTRFS info (device loop4): allowing degraded mounts Jul 29 12:11:07 nvg5120 kernel: BTRFS info (device loop4): disk space caching is enabled Jul 29 12:11:07 nvg5120 kernel: BTRFS info (device loop4): has skinny extents Jul 29 12:11:08 nvg5120 kernel: BTRFS info (device loop4): dev_replace from <missing disk> (devid 2) to /dev/loop5 started Jul 29 12:11:08 nvg5120 kernel: log_unaligned: 10616 callbacks suppressed Jul 29 12:11:08 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs] Jul 29 12:11:09 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs] Jul 29 12:11:09 nvg5120 kernel: Kernel unaligned access at TPC[118e002c] __btrfs_map_block+0x36c/0x1180 [btrfs] Jul 29 12:11:09 nvg5120 kernel: Kernel unaligned access at TPC[118e0094] __btrfs_map_block+0x3d4/0x1180 [btrfs] Jul 29 12:11:09 nvg5120 kernel: Kernel unaligned access at TPC[118e0960] __btrfs_map_block+0xca0/0x1180 [btrfs] Jul 29 12:11:09 nvg5120 kernel: BTRFS info (device loop4): dev_replace from <missing disk> (devid 2) to /dev/loop5 finished Jul 29 12:11:11 nvg5120 mator[34598]: run xfstest btrfs/028 and only with 027 test, next tests were finished without TPC messages. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28 20:34 ` Anatoly Pugachev
  2016-07-29  9:40 ` Anatoly Pugachev
@ 2016-07-29 12:41 ` David Sterba
  2016-07-29 15:03 ` Anatoly Pugachev
  1 sibling, 1 reply; 21+ messages in thread
From: David Sterba @ 2016-07-29 12:41 UTC
To: Anatoly Pugachev; +Cc: Btrfs BTRFS, debian-sparc

On Thu, Jul 28, 2016 at 11:34:58PM +0300, Anatoly Pugachev wrote:
> well, I think mkfs.btrfs is fixed, since I just tested it with :

Good news, thanks.

> Not sure what we need to do with sparc64 btrfs module TPC messages.
> Probably fill kernel bugzilla report?

Yes please.

> [1] http://u163.east.ru/btrfs/xfstests-btrfs-06x-results.tar.gz
> [2] http://u163.east.ru/btrfs/kernel-4.7.0+-logs-xfstests-06x.txt.gz

quick stats of the TPC messages:

     23 __btrfs_map_block+0x36c/0x1180
      9 __remove_rbio_from_cache+0x38/0x140
      6 lock_stripe_add+0xb0/0x360
      4 __btrfs_map_block+0x3d4/0x1180
      3 __btrfs_map_block+0xca0/0x1180

running in 'gdb btrfs.ko' for each of the addresses should tell us what are the
locations:

gdb> l *(__btrfs_map_block+0x36c)
...

Thanks.
* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-29 12:41 ` David Sterba
@ 2016-07-29 15:03 ` Anatoly Pugachev
  0 siblings, 0 replies; 21+ messages in thread
From: Anatoly Pugachev @ 2016-07-29 15:03 UTC
To: dsterba; +Cc: Btrfs BTRFS, debian-sparc

On Fri, Jul 29, 2016 at 3:41 PM, David Sterba <dsterba@suse.cz> wrote:
> On Thu, Jul 28, 2016 at 11:34:58PM +0300, Anatoly Pugachev wrote:
>> well, I think mkfs.btrfs is fixed, since I just tested it with :
>
> Good news, thanks.
>
> quick stats of the TPC messages:
>
>      23 __btrfs_map_block+0x36c/0x1180
>       9 __remove_rbio_from_cache+0x38/0x140
>       6 lock_stripe_add+0xb0/0x360
>       4 __btrfs_map_block+0x3d4/0x1180
>       3 __btrfs_map_block+0xca0/0x1180
>
> running in 'gdb btrfs.ko' for each of the addresses should tell us what are the
> locations:
>
> gdb> l *(__btrfs_map_block+0x36c)
> ...

Installed fresh btrfs-progs from git:

mator@nvg5120:~/btrfs-progs$ git describe --long
v4.7-0-g9d2ea01

Recompiled the kernel with debug info and ran xfstests/check 'btrfs/06?' again:

mator@nvg5120:~/linux-2.6$ git describe --long
v4.7-0-g523d939

The kernel is patched with [1] to enable btrfs module loading on big-endian
systems (I am not sure whether current Linux kernel git includes this patch;
I checked out the plain v4.7 tag, which is 5 days old).

root@nvg5120:/home/mator/xfstests# ./check 'btrfs/06?'
FSTYP -- btrfs PLATFORM -- Linux/sparc64 nvg5120 4.7.0+ MKFS_OPTIONS -- /dev/loop0 MOUNT_OPTIONS -- /dev/loop0 /mnt/scratch btrfs/060 156s btrfs/061 182s btrfs/062 312s btrfs/063 162s btrfs/064 152s btrfs/065 61s btrfs/066 65s btrfs/067 158s btrfs/068 74s btrfs/069 65s Ran: btrfs/060 btrfs/061 btrfs/062 btrfs/063 btrfs/064 btrfs/065 btrfs/066 btrfs/067 btrfs/068 btrfs/069 Passed all 10 tests $ journalctl -b -k | awk '/TPC/{print $11}' | sort | uniq -c | sort -n 4 __btrfs_map_block+0xa10/0x1100 5 lock_stripe_add+0xb0/0x340 7 __btrfs_map_block+0x9d4/0x1100 9 __remove_rbio_from_cache+0x30/0x140 29 __btrfs_map_block+0x96c/0x1100 $ gdb -q /lib/modules/4.7.0+/kernel/fs/btrfs/btrfs.ko Reading symbols from /lib/modules/4.7.0+/kernel/fs/btrfs/btrfs.ko...done. (gdb) l *(__btrfs_map_block+0x96c) 0x8498c is in __btrfs_map_block (fs/btrfs/volumes.c:5615). 5610 div_u64_rem(stripe_nr, num_stripes, &rot); 5611 5612 /* Fill in the logical address of each stripe */ 5613 tmp = stripe_nr * nr_data_stripes(map); 5614 for (i = 0; i < nr_data_stripes(map); i++) 5615 bbio->raid_map[(i+rot) % num_stripes] = 5616 em->start + (tmp + i) * map->stripe_len; 5617 5618 bbio->raid_map[(i+rot) % map->num_stripes] = RAID5_P_STRIPE; 5619 if (map->type & BTRFS_BLOCK_GROUP_RAID6) (gdb) l *(__btrfs_map_block+0x9d4) 0x849f4 is in __btrfs_map_block (fs/btrfs/volumes.c:5618). 5613 tmp = stripe_nr * nr_data_stripes(map); 5614 for (i = 0; i < nr_data_stripes(map); i++) 5615 bbio->raid_map[(i+rot) % num_stripes] = 5616 em->start + (tmp + i) * map->stripe_len; 5617 5618 bbio->raid_map[(i+rot) % map->num_stripes] = RAID5_P_STRIPE; 5619 if (map->type & BTRFS_BLOCK_GROUP_RAID6) 5620 bbio->raid_map[(i+rot+1) % num_stripes] = 5621 RAID6_Q_STRIPE; 5622 } (gdb) l *(__btrfs_map_block+0xa10) 0x84a30 is in __btrfs_map_block (fs/btrfs/volumes.c:5620). 
5615 bbio->raid_map[(i+rot) % num_stripes] = 5616 em->start + (tmp + i) * map->stripe_len; 5617 5618 bbio->raid_map[(i+rot) % map->num_stripes] = RAID5_P_STRIPE; 5619 if (map->type & BTRFS_BLOCK_GROUP_RAID6) 5620 bbio->raid_map[(i+rot+1) % num_stripes] = 5621 RAID6_Q_STRIPE; 5622 } 5623 5624 if (rw & REQ_DISCARD) { (gdb) l *(lock_stripe_add+0xb0) 0xe0370 is in lock_stripe_add (fs/btrfs/raid56.c:685). 680 int walk = 0; 681 682 spin_lock_irqsave(&h->lock, flags); 683 list_for_each_entry(cur, &h->hash_list, hash_list) { 684 walk++; 685 if (cur->bbio->raid_map[0] == rbio->bbio->raid_map[0]) { 686 spin_lock(&cur->bio_list_lock); 687 688 /* can we steal this cached rbio's pages? */ 689 if (bio_list_empty(&cur->bio_list) && (gdb) l *(__remove_rbio_from_cache+0x30) 0xdfe30 is in __remove_rbio_from_cache (include/linux/spinlock.h:302). 297 raw_spin_lock_init(&(_lock)->rlock); \ 298 } while (0) 299 300 static __always_inline void spin_lock(spinlock_t *lock) 301 { 302 raw_spin_lock(&lock->rlock); 303 } 304 305 static __always_inline void spin_lock_bh(spinlock_t *lock) 306 { Thanks. [1]. http://www.spinics.net/lists/linux-btrfs/msg57193.html ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-28 18:04 ` David Sterba
  2016-07-28 20:34 ` Anatoly Pugachev
@ 2016-07-29 10:20 ` John Paul Adrian Glaubitz
  1 sibling, 0 replies; 21+ messages in thread
From: John Paul Adrian Glaubitz @ 2016-07-29 10:20 UTC
To: dsterba, Anatoly Pugachev, debian-sparc, Btrfs BTRFS

On 07/28/2016 08:04 PM, David Sterba wrote:
> And I've fixed it that way, now pushed to devel ("btrfs-progs: fix
> unaligned access in raid6 calculations" [1]). Would be great if you or
> Anatoly can test it so I can add it to the 4.7 release (ETA tomorrow).

Awesome, thank you very much! Much appreciated!

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
* Re: [sparc64] mkfs.btrfs bus error / align issue? 2016-07-27 13:59 [sparc64] mkfs.btrfs bus error / align issue? Anatoly Pugachev 2016-07-27 19:56 ` David Sterba @ 2016-07-27 20:39 ` Patrick Baggett 2016-07-27 20:41 ` Patrick Baggett 2016-07-27 20:40 ` John Paul Adrian Glaubitz 2 siblings, 1 reply; 21+ messages in thread From: Patrick Baggett @ 2016-07-27 20:39 UTC (permalink / raw) To: Anatoly Pugachev; +Cc: Btrfs BTRFS On Wed, Jul 27, 2016 at 8:59 AM, Anatoly Pugachev <matorola@gmail.com> wrote: > > Hello! > > Running xfstests suite, got in logs mkfs.btrfs bus error, debugging it > shows the following : > > mator@nvg5120:~/btrfs-progs$ git log -1 --oneline > 40650bf Btrfs progs v4.6.1 > > root@nvg5120:/home/mator/xfstests# gdb > GNU gdb (Debian 7.11.1-2) 7.11.1 > (gdb) file /opt/btrfs/bin/mkfs.btrfs > Reading symbols from /opt/btrfs/bin/mkfs.btrfs...done. > (gdb) set args -f -draid5 -mraid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 > (gdb) run > Starting program: /opt/btrfs/bin/mkfs.btrfs -f -draid5 -mraid5 > /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1". > btrfs-progs v4.6.1 > See http://btrfs.wiki.kernel.org for more information. > > ERROR: superblock checksum mismatch > ERROR: superblock checksum mismatch > ERROR: superblock checksum mismatch > Performing full device TRIM (2.00GiB) ... > Performing full device TRIM (2.00GiB) ... > Performing full device TRIM (2.00GiB) ... > Performing full device TRIM (2.00GiB) ... > > Program received signal SIGBUS, Bus error. 
> 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0, > eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at > volumes.c:2156 > 2156 *(unsigned long *)(p_eb->data + i) ^= > (gdb) bt > #0 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0, > eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) > at volumes.c:2156 > #1 0x0000000000119b30 in write_and_map_eb (trans=0x2cc250, > root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:426 > #2 0x0000000000119e74 in write_tree_block (trans=0x2cc250, > root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:459 > #3 0x000000000011a4ac in __commit_transaction (trans=0x2cc250, > root=0x2c7d80) at disk-io.c:562 > #4 0x000000000011a7b8 in btrfs_commit_transaction (trans=0x2cc250, > root=0x2c7d80) at disk-io.c:598 > #5 0x00000000001a2b04 in main (argc=8, argv=0x7fefffff698) at mkfs.c:1786 > (gdb) > > Can someone help please? Thanks. > The code that faults: (unsigned long *)(p_eb->data + i) ^= *(unsigned long *)(ebs[j]->data + i); Because struct extent_buffer has 'data' as a char[], this will always fault on sparc64 and probably a number of other RISC architectures. It increments the address by 1, then reads an 8-byte chunk, XORs, and writes the 8-byte chunk, repeat. In other words, 7 out of 8 reads would fault, even if both `data` pointers were 8-byte aligned. This would probably fix it, though it looks ugly. unsigned long a, b; memcpy(&a, p_eb->data + i, sizeof(a)); /* Read 8 bytes from p_eb->data+i */ memcpy(&b, ebs[j]->data + i, sizeof(b)); /* Read 8 bytes from ebs[j]->data+i */ a ^= b; /* XOR */ memcpy(p_eb->data + i, &a, sizeof(a)); /* Write back to p_eb->data+i */ I'm not familiar with btrfs, but the results seems like they depend on the sizeof(unsigned long). Given that they used parentheses, I assume it was intentional. 
However, if this was supposed to do an XOR operation 8 bytes at a time,
then it would need to be something like:

    *(((unsigned long *)p_eb->data)+i) ^= *(((unsigned long *)ebs[j]->data) + i);

i.e. cast pointer to unsigned long*, then add i (which would index
array of unsigned long, not char).

--Patrick
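The memcpy() approach described in this message can be written as a small self-contained helper. This is only a minimal sketch and not necessarily the fix that went into btrfs-progs: the name xor_words_unaligned() and the flat char-buffer interface are assumptions made for illustration, and the sketch handles whole words only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * XOR `len` bytes of `src` into `dst`, one unsigned long at a time,
 * without ever dereferencing a possibly misaligned unsigned long pointer.
 * xor_words_unaligned() is a hypothetical helper name, not btrfs-progs code.
 */
static void xor_words_unaligned(char *dst, const char *src, size_t len)
{
	size_t i;

	/* Assumption of this sketch: the length is a whole number of words. */
	assert(len % sizeof(unsigned long) == 0);

	for (i = 0; i < len; i += sizeof(unsigned long)) {
		unsigned long a, b;

		memcpy(&a, dst + i, sizeof(a));	/* read a word from dst + i */
		memcpy(&b, src + i, sizeof(b));	/* read a word from src + i */
		a ^= b;				/* XOR in registers */
		memcpy(dst + i, &a, sizeof(a));	/* write the result back */
	}
}
```

Compilers commonly turn fixed-size memcpy() calls like these into plain loads and stores where the target allows unaligned access, and into byte-wise accesses where it does not, so the cost on strict-alignment machines is paid only where it is needed.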
* Re: [sparc64] mkfs.btrfs bus error / align issue? 2016-07-27 20:39 ` Patrick Baggett @ 2016-07-27 20:41 ` Patrick Baggett 0 siblings, 0 replies; 21+ messages in thread From: Patrick Baggett @ 2016-07-27 20:41 UTC (permalink / raw) To: Anatoly Pugachev; +Cc: Btrfs BTRFS On Wed, Jul 27, 2016 at 3:39 PM, Patrick Baggett <baggett.patrick@gmail.com> wrote: > On Wed, Jul 27, 2016 at 8:59 AM, Anatoly Pugachev <matorola@gmail.com> wrote: >> >> Hello! >> >> Running xfstests suite, got in logs mkfs.btrfs bus error, debugging it >> shows the following : >> >> mator@nvg5120:~/btrfs-progs$ git log -1 --oneline >> 40650bf Btrfs progs v4.6.1 >> >> root@nvg5120:/home/mator/xfstests# gdb >> GNU gdb (Debian 7.11.1-2) 7.11.1 >> (gdb) file /opt/btrfs/bin/mkfs.btrfs >> Reading symbols from /opt/btrfs/bin/mkfs.btrfs...done. >> (gdb) set args -f -draid5 -mraid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 >> (gdb) run >> Starting program: /opt/btrfs/bin/mkfs.btrfs -f -draid5 -mraid5 >> /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 >> [Thread debugging using libthread_db enabled] >> Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1". >> btrfs-progs v4.6.1 >> See http://btrfs.wiki.kernel.org for more information. >> >> ERROR: superblock checksum mismatch >> ERROR: superblock checksum mismatch >> ERROR: superblock checksum mismatch >> Performing full device TRIM (2.00GiB) ... >> Performing full device TRIM (2.00GiB) ... >> Performing full device TRIM (2.00GiB) ... >> Performing full device TRIM (2.00GiB) ... >> >> Program received signal SIGBUS, Bus error. 
>> 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0, >> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at >> volumes.c:2156 >> 2156 *(unsigned long *)(p_eb->data + i) ^= >> (gdb) bt >> #0 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0, >> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) >> at volumes.c:2156 >> #1 0x0000000000119b30 in write_and_map_eb (trans=0x2cc250, >> root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:426 >> #2 0x0000000000119e74 in write_tree_block (trans=0x2cc250, >> root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:459 >> #3 0x000000000011a4ac in __commit_transaction (trans=0x2cc250, >> root=0x2c7d80) at disk-io.c:562 >> #4 0x000000000011a7b8 in btrfs_commit_transaction (trans=0x2cc250, >> root=0x2c7d80) at disk-io.c:598 >> #5 0x00000000001a2b04 in main (argc=8, argv=0x7fefffff698) at mkfs.c:1786 >> (gdb) >> >> Can someone help please? Thanks. >> > > The code that faults: > > (unsigned long *)(p_eb->data + i) ^= *(unsigned long *)(ebs[j]->data + i); > > Because struct extent_buffer has 'data' as a char[], this will always > fault on sparc64 and probably a number of other RISC architectures. It > increments the address by 1, then reads an 8-byte chunk, XORs, and > writes the 8-byte chunk, repeat. In other words, 7 out of 8 reads > would fault, even if both `data` pointers were 8-byte aligned. Actually, I misread that: the code increments i by sizeof(unsigned long), not by 1 so only if either data point is misaligned, then this will fault. Disregard that. > > This would probably fix it, though it looks ugly. > unsigned long a, b; > memcpy(&a, p_eb->data + i, sizeof(a)); /* Read 8 bytes from p_eb->data+i */ > memcpy(&b, ebs[j]->data + i, sizeof(b)); /* Read 8 bytes from ebs[j]->data+i */ > a ^= b; /* XOR */ > memcpy(p_eb->data + i, &a, sizeof(a)); /* Write back to p_eb->data+i */ > > I'm not familiar with btrfs, but the results seems like they depend on > the sizeof(unsigned long). 
Given that they used parentheses, I assume > it was intentional. However, if this was supposed to do an XOR > operation 8 bytes at a time, then it would need to be something like: > > *(((unsigned long *)p_eb->data)+i) ^= *(((unsigned long *)ebs[j]->data) + i); > > i.e. cast pointer to unsigned long*, then add i (which would index > array of unsigned long, not char). > > --Patrick ^ permalink raw reply [flat|nested] 21+ messages in thread
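The difference between the two casts discussed above is only in how the index is scaled. The following is a minimal sketch with hypothetical helper names (offset_add_then_cast() and offset_cast_then_add() are not taken from btrfs-progs) that reports the byte offset each form reaches:

```c
#include <stddef.h>

/*
 * "Add, then cast": (unsigned long *)(base + i).
 * Here i counts bytes. The caller must pass a multiple of
 * sizeof(unsigned long), otherwise the resulting pointer is misaligned.
 */
static size_t offset_add_then_cast(unsigned long *words, size_t i)
{
	char *base = (char *)words;
	unsigned long *p = (unsigned long *)(void *)(base + i);

	return (size_t)((char *)p - base);
}

/*
 * "Cast, then add": ((unsigned long *)base) + i.
 * Here i counts words, so the byte offset is i * sizeof(unsigned long).
 */
static size_t offset_cast_then_add(unsigned long *words, size_t i)
{
	char *base = (char *)words;
	unsigned long *p = ((unsigned long *)(void *)base) + i;

	return (size_t)((char *)p - base);
}
```

With a loop that advances i by sizeof(unsigned long), as the faulting code does, the first form visits every word exactly once, so it is correct as long as the base address itself is word-aligned.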
* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-27 13:59 [sparc64] mkfs.btrfs bus error / align issue? Anatoly Pugachev
  2016-07-27 19:56 ` David Sterba
  2016-07-27 20:39 ` Patrick Baggett
@ 2016-07-27 20:40 ` John Paul Adrian Glaubitz
  2016-07-27 20:48 ` Patrick Baggett
  2 siblings, 1 reply; 21+ messages in thread
From: John Paul Adrian Glaubitz @ 2016-07-27 20:40 UTC
To: Anatoly Pugachev; +Cc: Btrfs BTRFS, debian-sparc

On 07/27/2016 03:59 PM, Anatoly Pugachev wrote:
> Program received signal SIGBUS, Bus error.
> 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
> volumes.c:2156
> 2156 *(unsigned long *)(p_eb->data + i) ^=

Well, that pretty much looks like some creative pointer arithmetic that will
provoke unaligned access. Just check what the declaration of p_eb->data is.
If it's not "unsigned long", then you know why the code breaks here.

You should be able to fix this issue by replacing the code line with:

    unsigned long tmp, src;

    memcpy(&tmp, p_eb->data + i, sizeof(unsigned long));
    memcpy(&src, ebs[j]->data + i, sizeof(unsigned long));
    tmp ^= src;
    memcpy(p_eb->data + i, &tmp, sizeof(unsigned long));

I'm currently not sure whether this can be done more elegantly without
provoking unaligned access due to the bitwise XOR (^).

In either case, your problem is the casting and using memcpy() should fix
the problem.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
* Re: [sparc64] mkfs.btrfs bus error / align issue?
  2016-07-27 20:40 ` John Paul Adrian Glaubitz
@ 2016-07-27 20:48 ` Patrick Baggett
  0 siblings, 0 replies; 21+ messages in thread
From: Patrick Baggett @ 2016-07-27 20:48 UTC
To: John Paul Adrian Glaubitz; +Cc: Anatoly Pugachev, Btrfs BTRFS, debian-sparc

On Wed, Jul 27, 2016 at 3:40 PM, John Paul Adrian Glaubitz
<glaubitz@physik.fu-berlin.de> wrote:
> On 07/27/2016 03:59 PM, Anatoly Pugachev wrote:
>> Program received signal SIGBUS, Bus error.
>> 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
>> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
>> volumes.c:2156
>> 2156 *(unsigned long *)(p_eb->data + i) ^=
>
> Well, that pretty much looks some creative pointer arithmetics that will provoke
> unaligned access. Just check what the declaration of p_eb->data is. If it's
> not "unsigned long", then you know why the code breaks here.
>

Yeah, so basically, the best solution (assuming you can't change the
alignment of `data` somehow) would look something like:

Compare the lower 'n' bits of the `data` pointers (n=2 for ILP32, n=3 for
LP64), something like
(data1 & (sizeof(long)-1)) == (data2 & (sizeof(long)-1)).
If they are equal, a fast loop is possible. Not equal -> slow loop (all
byte-by-byte XOR).

fast loop path:
  do byte-by-byte XOR until the lower n bits of (data+i) are zero.
  do word-by-word XOR until < 1 word remains.
  do byte-by-byte XOR until the last bytes are done.

This sort of processing is pretty standard for "chunking". If this was
done with some cool SIMD instruction set, it would have a similar sort of
approach, I'd imagine.
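The head/body/tail scheme outlined in this message could look roughly like the following. It is a minimal sketch under stated assumptions, not code from btrfs-progs: xor_buf() and xor_word_t are hypothetical names, and the may_alias attribute is a GCC/Clang extension used here so that word-sized accesses to byte buffers do not run into strict-aliasing rules.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a word type that is allowed to alias byte storage. */
typedef unsigned long __attribute__((__may_alias__)) xor_word_t;

/* XOR `len` bytes of `src` into `dst`; xor_buf() is a hypothetical helper. */
static void xor_buf(unsigned char *dst, const unsigned char *src, size_t len)
{
	const uintptr_t mask = sizeof(xor_word_t) - 1;
	size_t i = 0;

	/*
	 * Slow path: the low address bits differ, so dst + i and src + i
	 * can never be word-aligned at the same time.
	 */
	if (((uintptr_t)dst & mask) != ((uintptr_t)src & mask)) {
		for (; i < len; i++)
			dst[i] ^= src[i];
		return;
	}

	/* Head: byte-by-byte until dst + i (and therefore src + i) is aligned. */
	for (; i < len && ((uintptr_t)(dst + i) & mask) != 0; i++)
		dst[i] ^= src[i];

	/* Body: word-by-word while at least one whole word remains. */
	for (; len - i >= sizeof(xor_word_t); i += sizeof(xor_word_t))
		*(xor_word_t *)(void *)(dst + i) ^=
			*(const xor_word_t *)(const void *)(src + i);

	/* Tail: remaining bytes. */
	for (; i < len; i++)
		dst[i] ^= src[i];
}
```

The slow path could also use the memcpy() technique from earlier in the thread to keep word-sized XORs; the byte loop is kept here because it matches the description above most directly.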
Thread overview: 21+ messages (end of thread, newest: 2016-07-29 15:03 UTC)
2016-07-27 13:59 [sparc64] mkfs.btrfs bus error / align issue? Anatoly Pugachev
2016-07-27 19:56 ` David Sterba
2016-07-28  9:44 ` David Sterba
2016-07-28 11:58 ` Anatoly Pugachev
2016-07-28 12:02 ` John Paul Adrian Glaubitz
2016-07-28 12:09 ` John Paul Adrian Glaubitz
2016-07-28 12:24 ` David Sterba
2016-07-28 14:01 ` Anatoly Pugachev
2016-07-28 14:25 ` John Paul Adrian Glaubitz
2016-07-28 14:28 ` John Paul Adrian Glaubitz
2016-07-28 14:32 ` Patrick Baggett
2016-07-28 18:04 ` David Sterba
2016-07-28 20:34 ` Anatoly Pugachev
2016-07-29  9:40 ` Anatoly Pugachev
2016-07-29 12:41 ` David Sterba
2016-07-29 15:03 ` Anatoly Pugachev
2016-07-29 10:20 ` John Paul Adrian Glaubitz
2016-07-27 20:39 ` Patrick Baggett
2016-07-27 20:41 ` Patrick Baggett
2016-07-27 20:40 ` John Paul Adrian Glaubitz
2016-07-27 20:48 ` Patrick Baggett