linux-btrfs.vger.kernel.org archive mirror
* "bad tree block start" when trying to mount on ARM
@ 2019-05-21  8:34 Erik Jensen
  2019-05-21  8:56 ` Patrik Lundquist
                   ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Erik Jensen @ 2019-05-21  8:34 UTC (permalink / raw)
  To: linux-btrfs

I have a 5-drive btrfs filesystem. (raid-5 data, dup metadata). I can
mount it fine on my x86_64 system, and running `btrfs check` there
reveals no errors. However, I am not able to mount the filesystem on
my 32-bit ARM board, which I am hoping to use for lower-power file
serving. dmesg shows the following:

[   83.066301] BTRFS info (device dm-3): disk space caching is enabled
[   83.072817] BTRFS info (device dm-3): has skinny extents
[   83.553973] BTRFS error (device dm-3): bad tree block start, want
17628726968320 have 396461950000496896
[   83.554089] BTRFS error (device dm-3): bad tree block start, want
17628727001088 have 5606876608493751477
[   83.601176] BTRFS error (device dm-3): bad tree block start, want
17628727001088 have 5606876608493751477
[   83.610811] BTRFS error (device dm-3): failed to verify dev extents
against chunks: -5
[   83.639058] BTRFS error (device dm-3): open_ctree failed

Is this expected to work? I did notice that there are gotchas on the
wiki related to filesystems over 8TiB on 32-bit systems, but it
sounded like they were mostly related to running the tools, as opposed
to the filesystem driver itself. (Each of the five drives is
8TB/7.28TiB)
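
As an aside (not something established in the thread), one quick sanity check on the 32-bit angle: a 32-bit kernel indexes the page cache with a 32-bit `unsigned long`, so with 4 KiB pages the largest byte offset it can reach is 2^32 × 4096 = 16 TiB. The failing "want" bytenr above happens to lie just beyond that boundary. A minimal sketch of the arithmetic, assuming a typical 32-bit ARM configuration:

```python
# Back-of-the-envelope check (assumptions: 4 KiB pages and a 32-bit
# page-cache index, as on a typical 32-bit ARM kernel).
PAGE_SIZE = 4096
PAGE_CACHE_LIMIT = (1 << 32) * PAGE_SIZE  # largest reachable byte offset

want = 17628726968320  # first "want" bytenr from the dmesg output above

print(f"page-cache limit: {PAGE_CACHE_LIMIT}")  # 17592186044416 (16 TiB)
print(f"failing bytenr:   {want}")
print(f"beyond the limit: {want > PAGE_CACHE_LIMIT}")  # True
```

If the unreadable metadata really sits past the 16 TiB mark, that would point at a 32-bit addressing limitation rather than on-disk damage; this is speculation layered on the thread, not a conclusion reached by the participants here.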

If this isn't expected, what should I do to help track down the issue?

Also potentially relevant: The x86_64 system is currently running
4.19.27, while the ARM system is running 5.1.3.

Finally, just in case it's relevant, I just finished reencrypting the
array, which involved doing a `btrfs replace` on each device in the
array.

Any pointers would be appreciated.

Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2019-05-21  8:34 "bad tree block start" when trying to mount on ARM Erik Jensen
@ 2019-05-21  8:56 ` Patrik Lundquist
  2019-05-21  9:01   ` Erik Jensen
  2019-05-21  9:18 ` Hugo Mills
  2019-05-21 10:17 ` Qu Wenruo
  2 siblings, 1 reply; 44+ messages in thread
From: Patrik Lundquist @ 2019-05-21  8:56 UTC (permalink / raw)
  To: Erik Jensen; +Cc: linux-btrfs

On Tue, 21 May 2019 at 10:35, Erik Jensen <erikjensen@rkjnsn.net> wrote:
>
> I have a 5-drive btrfs filesystem. (raid-5 data, dup metadata).

I don't know about ARM but you should use raid1 for the metadata since
dup can place both copies on the same drive.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2019-05-21  8:56 ` Patrik Lundquist
@ 2019-05-21  9:01   ` Erik Jensen
  0 siblings, 0 replies; 44+ messages in thread
From: Erik Jensen @ 2019-05-21  9:01 UTC (permalink / raw)
  To: Patrik Lundquist; +Cc: linux-btrfs

Whoops, sorry. I actually meant RAID1. Data is RAID5, Metadata and
System are RAID1.

On Tue, May 21, 2019 at 1:56 AM Patrik Lundquist
<patrik.lundquist@gmail.com> wrote:
>
> On Tue, 21 May 2019 at 10:35, Erik Jensen <erikjensen@rkjnsn.net> wrote:
> >
> > I have a 5-drive btrfs filesystem. (raid-5 data, dup metadata).
>
> I don't know about ARM but you should use raid1 for the metadata since
> dup can place both copies on the same drive.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2019-05-21  8:34 "bad tree block start" when trying to mount on ARM Erik Jensen
  2019-05-21  8:56 ` Patrik Lundquist
@ 2019-05-21  9:18 ` Hugo Mills
  2019-05-22 16:02   ` Erik Jensen
  2019-05-21 10:17 ` Qu Wenruo
  2 siblings, 1 reply; 44+ messages in thread
From: Hugo Mills @ 2019-05-21  9:18 UTC (permalink / raw)
  To: Erik Jensen; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2714 bytes --]

On Tue, May 21, 2019 at 01:34:42AM -0700, Erik Jensen wrote:
> I have a 5-drive btrfs filesystem. (raid-5 data, dup metadata). I can
> mount it fine on my x86_64 system, and running `btrfs check` there
> reveals no errors. However, I am not able to mount the filesystem on
> my 32-bit ARM board, which I am hoping to use for lower-power file
> serving. dmesg shows the following:
> 
> [   83.066301] BTRFS info (device dm-3): disk space caching is enabled
> [   83.072817] BTRFS info (device dm-3): has skinny extents
> [   83.553973] BTRFS error (device dm-3): bad tree block start, want
> 17628726968320 have 396461950000496896
> [   83.554089] BTRFS error (device dm-3): bad tree block start, want
> 17628727001088 have 5606876608493751477
> [   83.601176] BTRFS error (device dm-3): bad tree block start, want
> 17628727001088 have 5606876608493751477
> [   83.610811] BTRFS error (device dm-3): failed to verify dev extents
> against chunks: -5
> [   83.639058] BTRFS error (device dm-3): open_ctree failed
> 
> Is this expected to work? I did notice that there are gotchas on the
> wiki related to filesystems over 8TiB on 32-bit systems, but it
> sounded like they were mostly related to running the tools, as opposed
> to the filesystem driver itself. (Each of the five drives is
> 8TB/7.28TiB)

   Yes, it should work. We had problems with ARM several years ago,
because of its unusual behaviour with unaligned word accesses, but
those were in userspace, and, as far as I know, fixed now. Looking at
the want/have numbers, it doesn't look like an endianness problem or
an ARM-unaligned-access problem.
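
To illustrate the kind of signature an endianness bug would leave (a sketch for illustration only, using the bytenr from the log above): the tree-block header bytenr is stored on disk as a little-endian u64, so byte-order confusion would produce exactly the byte-swapped value.

```python
import struct

want = 17628726968320  # a "want" bytenr from the dmesg output above

# btrfs stores the header bytenr on disk as a little-endian u64 ...
on_disk = struct.pack("<Q", want)

# ... so a kernel with broken endian handling would effectively read it
# with the opposite byte order:
misread = struct.unpack(">Q", on_disk)[0]

print(f"correct read: {want}")
print(f"byte-swapped: {misread}")

# The "have" values in the log (e.g. 396461950000496896) are not the
# byte-swap of "want", which supports ruling out an endianness bug.
print(misread == 396461950000496896)  # False
```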

> If this isn't expected, what should I do to help track down the issue?

   Can you show us the output of "btrfs check --readonly", on both the
x86_64 machine and the ARM machine? It might give some more insight
into the nature of the breakage.

   Possibly also "btrfs inspect dump-super" on both machines.

> Also potentially relevant: The x86_64 system is currently running
> 4.19.27, while the ARM system is running 5.1.3.

   Shouldn't make a difference.

> Finally, just in case it's relevant, I just finished reencrypting the
> array, which involved doing a `btrfs replace` on each device in the
> array.

   If you can still mount on x86_64, then the FS is at least
reasonably complete and undamaged. I don't think this will make a
difference.  However, it's worth checking whether there are any
funnies about your encryption layer on ARM (I wouldn't expect any,
since it's recognising the decrypted device as btrfs, rather than
random crud).

   Hugo.

-- 
Hugo Mills             | Prisoner unknown: Return to Zenda.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |



* Re: "bad tree block start" when trying to mount on ARM
  2019-05-21  8:34 "bad tree block start" when trying to mount on ARM Erik Jensen
  2019-05-21  8:56 ` Patrik Lundquist
  2019-05-21  9:18 ` Hugo Mills
@ 2019-05-21 10:17 ` Qu Wenruo
  2 siblings, 0 replies; 44+ messages in thread
From: Qu Wenruo @ 2019-05-21 10:17 UTC (permalink / raw)
  To: Erik Jensen, linux-btrfs





On 2019/5/21 下午4:34, Erik Jensen wrote:
> I have a 5-drive btrfs filesystem. (raid-5 data, dup metadata). I can
> mount it fine on my x86_64 system, and running `btrfs check` there
> reveals no errors. However, I am not able to mount the filesystem on
> my 32-bit ARM board, which I am hoping to use for lower-power file
> serving. dmesg shows the following:

Have you ever tried btrfs check on the ARM board?

I have an Odroid C2 board at hand, but have never tried an armhf build
on it, only aarch64.
It may be an interesting adventure.

Thanks,
Qu

> 
> [   83.066301] BTRFS info (device dm-3): disk space caching is enabled
> [   83.072817] BTRFS info (device dm-3): has skinny extents
> [   83.553973] BTRFS error (device dm-3): bad tree block start, want
> 17628726968320 have 396461950000496896
> [   83.554089] BTRFS error (device dm-3): bad tree block start, want
> 17628727001088 have 5606876608493751477
> [   83.601176] BTRFS error (device dm-3): bad tree block start, want
> 17628727001088 have 5606876608493751477
> [   83.610811] BTRFS error (device dm-3): failed to verify dev extents
> against chunks: -5
> [   83.639058] BTRFS error (device dm-3): open_ctree failed
> 
> Is this expected to work? I did notice that there are gotchas on the
> wiki related to filesystems over 8TiB on 32-bit systems, but it
> sounded like they were mostly related to running the tools, as opposed
> to the filesystem driver itself. (Each of the five drives is
> 8TB/7.28TiB)
> 
> If this isn't expected, what should I do to help track down the issue?
> 
> Also potentially relevant: The x86_64 system is currently running
> 4.19.27, while the ARM system is running 5.1.3.
> 
> Finally, just in case it's relevant, I just finished reencrypting the
> array, which involved doing a `btrfs replace` on each device in the
> array.
> 
> Any pointers would be appreciated.
> 
> Thanks.
> 




* Re: "bad tree block start" when trying to mount on ARM
  2019-05-21  9:18 ` Hugo Mills
@ 2019-05-22 16:02   ` Erik Jensen
  2019-06-26  7:04     ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2019-05-22 16:02 UTC (permalink / raw)
  To: Hugo Mills, Erik Jensen, linux-btrfs

On Tue, May 21, 2019 at 2:18 AM Hugo Mills <hugo@carfax.org.uk> wrote:
>
> On Tue, May 21, 2019 at 01:34:42AM -0700, Erik Jensen wrote:
> > I have a 5-drive btrfs filesystem. (raid-5 data, dup metadata). I can
> > mount it fine on my x86_64 system, and running `btrfs check` there
> > reveals no errors. However, I am not able to mount the filesystem on
> > my 32-bit ARM board, which I am hoping to use for lower-power file
> > serving. dmesg shows the following:
> >
> > [   83.066301] BTRFS info (device dm-3): disk space caching is enabled
> > [   83.072817] BTRFS info (device dm-3): has skinny extents
> > [   83.553973] BTRFS error (device dm-3): bad tree block start, want
> > 17628726968320 have 396461950000496896
> > [   83.554089] BTRFS error (device dm-3): bad tree block start, want
> > 17628727001088 have 5606876608493751477
> > [   83.601176] BTRFS error (device dm-3): bad tree block start, want
> > 17628727001088 have 5606876608493751477
> > [   83.610811] BTRFS error (device dm-3): failed to verify dev extents
> > against chunks: -5
> > [   83.639058] BTRFS error (device dm-3): open_ctree failed
> >
> > Is this expected to work? I did notice that there are gotchas on the
> > wiki related to filesystems over 8TiB on 32-bit systems, but it
> > sounded like they were mostly related to running the tools, as opposed
> > to the filesystem driver itself. (Each of the five drives is
> > 8TB/7.28TiB)
>
>    Yes, it should work. We had problems with ARM several years ago,
> because of its unusual behaviour with unaligned word accesses, but
> those were in userspace, and, as far as I know, fixed now. Looking at
> the want/have numbers, it doesn't look like an endianness problem or
> an ARM-unaligned-access problem.
>
> > If this isn't expected, what should I do to help track down the issue?
>
>    Can you show us the output of "btrfs check --readonly", on both the
> x86_64 machine and the ARM machine? It might give some more insight
> into the nature of the breakage.

On x86_64:
Opening filesystem to check...
Checking filesystem on /dev/mapper/storage1
UUID: aafd9149-9cfe-4970-ae21-f1065c36ed63
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 17647861833728 bytes used, no error found
total csum bytes: 17211131512
total tree bytes: 19333480448
total fs tree bytes: 202801152
total extent tree bytes: 183894016
btree space waste bytes: 1474174626
file data blocks allocated: 17628822319104
 referenced 17625817141248

On ARM:
Opening filesystem to check...
Checking filesystem on /dev/mapper/storage1
UUID: aafd9149-9cfe-4970-ae21-f1065c36ed63
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 17647861833728 bytes used, no error found
total csum bytes: 17211131512
total tree bytes: 19333480448
total fs tree bytes: 202801152
total extent tree bytes: 183894016
btree space waste bytes: 1474174626
file data blocks allocated: 17628822319104
 referenced 17625817141248

>    Possibly also "btrfs inspect dump-super" on both machines.

On x86_64:
superblock: bytenr=65536, device=/dev/dm-5
---------------------------------------------------------
csum_type        0 (crc32c)
csum_size        4
csum            0x737fcf72 [match]
bytenr            65536
flags            0x1
            ( WRITTEN )
magic            _BHRfS_M [match]
fsid            aafd9149-9cfe-4970-ae21-f1065c36ed63
label            Storage
generation        97532
root            30687232
sys_array_size        129
chunk_root_generation    97526
root_level        1
chunk_root        20971520
chunk_root_level    1
log_root        0
log_root_transid    0
log_root_level        0
total_bytes        40007732224000
bytes_used        17647861833728
sectorsize        4096
nodesize        16384
leafsize (deprecated)        16384
stripesize        4096
root_dir        6
num_devices        5
compat_flags        0x0
compat_ro_flags        0x0
incompat_flags        0x1e1
            ( MIXED_BACKREF |
              BIG_METADATA |
              EXTENDED_IREF |
              RAID56 |
              SKINNY_METADATA )
cache_generation    97532
uuid_tree_generation    97532
dev_item.uuid        39a9463d-65f5-499b-bca8-dae6b52eb729
dev_item.fsid        aafd9149-9cfe-4970-ae21-f1065c36ed63 [match]
dev_item.type        0
dev_item.total_bytes    8001546444800
dev_item.bytes_used    4436709605376
dev_item.io_align    4096
dev_item.io_width    4096
dev_item.sector_size    4096
dev_item.devid        5
dev_item.dev_group    0
dev_item.seek_speed    0
dev_item.bandwidth    0
dev_item.generation    0

On ARM:
superblock: bytenr=65536, device=/dev/dm-2
---------------------------------------------------------
csum_type        0 (crc32c)
csum_size        4
csum            0x737fcf72 [match]
bytenr            65536
flags            0x1
            ( WRITTEN )
magic            _BHRfS_M [match]
fsid            aafd9149-9cfe-4970-ae21-f1065c36ed63
metadata_uuid        aafd9149-9cfe-4970-ae21-f1065c36ed63
label            Storage
generation        97532
root            30687232
sys_array_size        129
chunk_root_generation    97526
root_level        1
chunk_root        20971520
chunk_root_level    1
log_root        0
log_root_transid    0
log_root_level        0
total_bytes        40007732224000
bytes_used        17647861833728
sectorsize        4096
nodesize        16384
leafsize (deprecated)    16384
stripesize        4096
root_dir        6
num_devices        5
compat_flags        0x0
compat_ro_flags        0x0
incompat_flags        0x1e1
            ( MIXED_BACKREF |
              BIG_METADATA |
              EXTENDED_IREF |
              RAID56 |
              SKINNY_METADATA )
cache_generation    97532
uuid_tree_generation    97532
dev_item.uuid        39a9463d-65f5-499b-bca8-dae6b52eb729
dev_item.fsid        aafd9149-9cfe-4970-ae21-f1065c36ed63 [match]
dev_item.type        0
dev_item.total_bytes    8001546444800
dev_item.bytes_used    4436709605376
dev_item.io_align    4096
dev_item.io_width    4096
dev_item.sector_size    4096
dev_item.devid        5
dev_item.dev_group    0
dev_item.seek_speed    0
dev_item.bandwidth    0
dev_item.generation    0

The only difference appears to be the extra metadata_uuid line on ARM.
I assume that's because the ARM system is running btrfs-progs v4.20.2
vs. v4.19 on the x86_64 system.
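
This kind of field-by-field comparison is easy to script; the sketch below (added for illustration, using hypothetical two-field excerpts rather than the full outputs) diffs the two superblock dumps with Python's difflib.

```python
import difflib

# Hypothetical excerpts standing in for the two dump-super outputs.
x86_super = [
    "fsid            aafd9149-9cfe-4970-ae21-f1065c36ed63",
    "label           Storage",
]
arm_super = [
    "fsid            aafd9149-9cfe-4970-ae21-f1065c36ed63",
    "metadata_uuid   aafd9149-9cfe-4970-ae21-f1065c36ed63",
    "label           Storage",
]

diff = list(difflib.unified_diff(x86_super, arm_super,
                                 fromfile="x86_64", tofile="arm",
                                 lineterm=""))
print("\n".join(diff))  # only the metadata_uuid line shows up as added
```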

> > Also potentially relevant: The x86_64 system is currently running
> > 4.19.27, while the ARM system is running 5.1.3.
>
>    Shouldn't make a difference.
>
> > Finally, just in case it's relevant, I just finished reencrypting the
> > array, which involved doing a `btrfs replace` on each device in the
> > array.
>
>    If you can still mount on x86_64, then the FS is at least
> reasonably complete and undamaged. I don't think this will make a
> difference.  However, it's worth checking whether there are any
> funnies about your encryption layer on ARM (I wouldn't expect any,
> since it's recognising the decrypted device as btrfs, rather than
> random crud).

I took the sha256 hash of the first GiB of plaintext on each drive,
and got the same result on both systems, so I think things should be
okay, there.
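
The check described above can be scripted; the snippet below is one way to hash the first GiB of a block device (an illustrative sketch: the device path is a placeholder, and the chunked read is simply to avoid holding 1 GiB in memory).

```python
import hashlib

def hash_first_gib(path, chunk_size=1024 * 1024, total=1024 ** 3):
    """SHA-256 of the first GiB (or the whole file, if shorter) of path."""
    digest = hashlib.sha256()
    remaining = total
    with open(path, "rb") as f:
        while remaining > 0:
            data = f.read(min(chunk_size, remaining))
            if not data:  # shorter than a GiB; stop at EOF
                break
            digest.update(data)
            remaining -= len(data)
    return digest.hexdigest()

# Run the same call on both machines and compare the digests, e.g.:
# print(hash_first_gib("/dev/mapper/storage1"))  # placeholder path
```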

>    Hugo.
>
> --
> Hugo Mills             | Prisoner unknown: Return to Zenda.
> hugo@... carfax.org.uk |
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          |


* Re: "bad tree block start" when trying to mount on ARM
  2019-05-22 16:02   ` Erik Jensen
@ 2019-06-26  7:04     ` Erik Jensen
  2019-06-26  8:10       ` Qu Wenruo
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2019-06-26  7:04 UTC (permalink / raw)
  To: Hugo Mills, Erik Jensen, linux-btrfs

I'm still seeing this. Anything else I can try?

On Wed, May 22, 2019 at 9:02 AM Erik Jensen <erikjensen@rkjnsn.net> wrote:
>
> On Tue, May 21, 2019 at 2:18 AM Hugo Mills <hugo@carfax.org.uk> wrote:
> >
> > On Tue, May 21, 2019 at 01:34:42AM -0700, Erik Jensen wrote:
> > > I have a 5-drive btrfs filesystem. (raid-5 data, dup metadata). I can
> > > mount it fine on my x86_64 system, and running `btrfs check` there
> > > reveals no errors. However, I am not able to mount the filesystem on
> > > my 32-bit ARM board, which I am hoping to use for lower-power file
> > > serving. dmesg shows the following:
> > >
> > > [   83.066301] BTRFS info (device dm-3): disk space caching is enabled
> > > [   83.072817] BTRFS info (device dm-3): has skinny extents
> > > [   83.553973] BTRFS error (device dm-3): bad tree block start, want
> > > 17628726968320 have 396461950000496896
> > > [   83.554089] BTRFS error (device dm-3): bad tree block start, want
> > > 17628727001088 have 5606876608493751477
> > > [   83.601176] BTRFS error (device dm-3): bad tree block start, want
> > > 17628727001088 have 5606876608493751477
> > > [   83.610811] BTRFS error (device dm-3): failed to verify dev extents
> > > against chunks: -5
> > > [   83.639058] BTRFS error (device dm-3): open_ctree failed
> > >
> > > Is this expected to work? I did notice that there are gotchas on the
> > > wiki related to filesystems over 8TiB on 32-bit systems, but it
> > > sounded like they were mostly related to running the tools, as opposed
> > > to the filesystem driver itself. (Each of the five drives is
> > > 8TB/7.28TiB)
> >
> >    Yes, it should work. We had problems with ARM several years ago,
> > because of its unusual behaviour with unaligned word accesses, but
> > those were in userspace, and, as far as I know, fixed now. Looking at
> > the want/have numbers, it doesn't look like an endianness problem or
> > an ARM-unaligned-access problem.
> >
> > > If this isn't expected, what should I do to help track down the issue?
> >
> >    Can you show us the output of "btrfs check --readonly", on both the
> > x86_64 machine and the ARM machine? It might give some more insight
> > into the nature of the breakage.
>
> On x86_64:
> Opening filesystem to check...
> Checking filesystem on /dev/mapper/storage1
> UUID: aafd9149-9cfe-4970-ae21-f1065c36ed63
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> [5/7] checking only csums items (without verifying data)
> [6/7] checking root refs
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 17647861833728 bytes used, no error found
> total csum bytes: 17211131512
> total tree bytes: 19333480448
> total fs tree bytes: 202801152
> total extent tree bytes: 183894016
> btree space waste bytes: 1474174626
> file data blocks allocated: 17628822319104
>  referenced 17625817141248
>
> On ARM:
> Opening filesystem to check...
> Checking filesystem on /dev/mapper/storage1
> UUID: aafd9149-9cfe-4970-ae21-f1065c36ed63
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> [5/7] checking only csums items (without verifying data)
> [6/7] checking root refs
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 17647861833728 bytes used, no error found
> total csum bytes: 17211131512
> total tree bytes: 19333480448
> total fs tree bytes: 202801152
> total extent tree bytes: 183894016
> btree space waste bytes: 1474174626
> file data blocks allocated: 17628822319104
>  referenced 17625817141248
>
> >    Possibly also "btrfs inspect dump-super" on both machines.
>
> On x86_64:
> superblock: bytenr=65536, device=/dev/dm-5
> ---------------------------------------------------------
> csum_type        0 (crc32c)
> csum_size        4
> csum            0x737fcf72 [match]
> bytenr            65536
> flags            0x1
>             ( WRITTEN )
> magic            _BHRfS_M [match]
> fsid            aafd9149-9cfe-4970-ae21-f1065c36ed63
> label            Storage
> generation        97532
> root            30687232
> sys_array_size        129
> chunk_root_generation    97526
> root_level        1
> chunk_root        20971520
> chunk_root_level    1
> log_root        0
> log_root_transid    0
> log_root_level        0
> total_bytes        40007732224000
> bytes_used        17647861833728
> sectorsize        4096
> nodesize        16384
> leafsize (deprecated)        16384
> stripesize        4096
> root_dir        6
> num_devices        5
> compat_flags        0x0
> compat_ro_flags        0x0
> incompat_flags        0x1e1
>             ( MIXED_BACKREF |
>               BIG_METADATA |
>               EXTENDED_IREF |
>               RAID56 |
>               SKINNY_METADATA )
> cache_generation    97532
> uuid_tree_generation    97532
> dev_item.uuid        39a9463d-65f5-499b-bca8-dae6b52eb729
> dev_item.fsid        aafd9149-9cfe-4970-ae21-f1065c36ed63 [match]
> dev_item.type        0
> dev_item.total_bytes    8001546444800
> dev_item.bytes_used    4436709605376
> dev_item.io_align    4096
> dev_item.io_width    4096
> dev_item.sector_size    4096
> dev_item.devid        5
> dev_item.dev_group    0
> dev_item.seek_speed    0
> dev_item.bandwidth    0
> dev_item.generation    0
>
> On ARM:
> superblock: bytenr=65536, device=/dev/dm-2
> ---------------------------------------------------------
> csum_type        0 (crc32c)
> csum_size        4
> csum            0x737fcf72 [match]
> bytenr            65536
> flags            0x1
>             ( WRITTEN )
> magic            _BHRfS_M [match]
> fsid            aafd9149-9cfe-4970-ae21-f1065c36ed63
> metadata_uuid        aafd9149-9cfe-4970-ae21-f1065c36ed63
> label            Storage
> generation        97532
> root            30687232
> sys_array_size        129
> chunk_root_generation    97526
> root_level        1
> chunk_root        20971520
> chunk_root_level    1
> log_root        0
> log_root_transid    0
> log_root_level        0
> total_bytes        40007732224000
> bytes_used        17647861833728
> sectorsize        4096
> nodesize        16384
> leafsize (deprecated)    16384
> stripesize        4096
> root_dir        6
> num_devices        5
> compat_flags        0x0
> compat_ro_flags        0x0
> incompat_flags        0x1e1
>             ( MIXED_BACKREF |
>               BIG_METADATA |
>               EXTENDED_IREF |
>               RAID56 |
>               SKINNY_METADATA )
> cache_generation    97532
> uuid_tree_generation    97532
> dev_item.uuid        39a9463d-65f5-499b-bca8-dae6b52eb729
> dev_item.fsid        aafd9149-9cfe-4970-ae21-f1065c36ed63 [match]
> dev_item.type        0
> dev_item.total_bytes    8001546444800
> dev_item.bytes_used    4436709605376
> dev_item.io_align    4096
> dev_item.io_width    4096
> dev_item.sector_size    4096
> dev_item.devid        5
> dev_item.dev_group    0
> dev_item.seek_speed    0
> dev_item.bandwidth    0
> dev_item.generation    0
>
> The only difference appears to be the extra metadata_uuid line on ARM.
> I assume that's because the ARM system is running btrfs-progs v4.20.2
> vs. v4.19 on the x86_64 system.
>
> > > Also potentially relevant: The x86_64 system is currently running
> > > 4.19.27, while the ARM system is running 5.1.3.
> >
> >    Shouldn't make a difference.
> >
> > > Finally, just in case it's relevant, I just finished reencrypting the
> > > array, which involved doing a `btrfs replace` on each device in the
> > > array.
> >
> >    If you can still mount on x86_64, then the FS is at least
> > reasonably complete and undamaged. I don't think this will make a
> > difference.  However, it's worth checking whether there are any
> > funnies about your encryption layer on ARM (I wouldn't expect any,
> > since it's recognising the decrypted device as btrfs, rather than
> > random crud).
>
> I took the sha256 hash of the first GiB of plaintext on each drive,
> and got the same result on both systems, so I think things should be
> okay, there.
>
> >    Hugo.
> >
> > --
> > Hugo Mills             | Prisoner unknown: Return to Zenda.
> > hugo@... carfax.org.uk |
> > http://carfax.org.uk/  |
> > PGP: E2AB1DE4          |


* Re: "bad tree block start" when trying to mount on ARM
  2019-06-26  7:04     ` Erik Jensen
@ 2019-06-26  8:10       ` Qu Wenruo
       [not found]         ` <CAMj6ewO229vq6=s+T7GhUegwDADv4dzhqPiM0jo10QiKujvytA@mail.gmail.com>
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2019-06-26  8:10 UTC (permalink / raw)
  To: Erik Jensen, Hugo Mills, linux-btrfs





On 2019/6/26 下午3:04, Erik Jensen wrote:
> I'm still seeing this. Anything else I can try?
[...]
>>>>
>>>> [   83.066301] BTRFS info (device dm-3): disk space caching is enabled
>>>> [   83.072817] BTRFS info (device dm-3): has skinny extents
>>>> [   83.553973] BTRFS error (device dm-3): bad tree block start, want
>>>> 17628726968320 have 396461950000496896
>>>> [   83.554089] BTRFS error (device dm-3): bad tree block start, want
>>>> 17628727001088 have 5606876608493751477
>>>> [   83.601176] BTRFS error (device dm-3): bad tree block start, want
>>>> 17628727001088 have 5606876608493751477
>>>> [   83.610811] BTRFS error (device dm-3): failed to verify dev extents
>>>> against chunks: -5
>>>> [   83.639058] BTRFS error (device dm-3): open_ctree failed

Since your fsck reports no error, I'd say your on-disk data is
completely fine.

So either the block layer is reading something wrong from the disk, or
the btrfs layer isn't doing the endian conversion correctly.

Could you dump the following data? (x86 and ARM should output the same
content, so one output is enough.)
# btrfs ins dump-tree -b 17628726968320 /dev/dm-3
# btrfs ins dump-tree -b 17628727001088 /dev/dm-3

Then, on the ARM system, please apply the following diff and try
mounting again.
The diff adds extra debug info to examine the vital members of a tree block.

Correct fs should output something like:
  BTRFS error (device dm-4): bad tree block start, want 30408704 have 0
  tree block gen=4 owner=5 nritems=2 level=0
  csum:
a304e483-0000-0000-0000-00000000000000000000-0000-0000-0000-000000000000

The csum line is the most important one: if it isn't mostly zeros, it
means btrfs got a bunch of garbage at that moment, and we can then
debug further.

Thanks,
Qu


diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index deb74a8c191a..e9d11d501b7b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -618,8 +618,16 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 
        found_start = btrfs_header_bytenr(eb);
        if (found_start != eb->start) {
+               u8 csum[BTRFS_CSUM_SIZE];
+
                btrfs_err_rl(fs_info, "bad tree block start, want %llu have %llu",
                             eb->start, found_start);
+               pr_info("tree block gen=%llu owner=%llu nritems=%u level=%u\n",
+                       btrfs_header_generation(eb), btrfs_header_owner(eb),
+                       btrfs_header_nritems(eb), btrfs_header_level(eb));
+               read_extent_buffer(eb, csum, 0, BTRFS_CSUM_SIZE);
+               pr_info("csum: %pU%-pU\n", csum, csum + 16);
+
                ret = -EIO;
                goto err;
        }




* Re: "bad tree block start" when trying to mount on ARM
       [not found]         ` <CAMj6ewO229vq6=s+T7GhUegwDADv4dzhqPiM0jo10QiKujvytA@mail.gmail.com>
@ 2019-06-28  8:15           ` Qu Wenruo
  2021-01-18 10:50             ` Erik Jensen
       [not found]             ` <CAMj6ewMqXLtrBQgTJuz04v3MBZ0W95fU4pT0jP6kFhuP830TuA@mail.gmail.com>
  0 siblings, 2 replies; 44+ messages in thread
From: Qu Wenruo @ 2019-06-28  8:15 UTC (permalink / raw)
  To: Erik Jensen; +Cc: Hugo Mills, linux-btrfs





On 2019/6/28 下午4:00, Erik Jensen wrote:
>> So either the block layer is reading something wrong from the disk, or
>> the btrfs layer isn't doing the endian conversion correctly.
> 
> My ARM board is running in little endian mode, so it doesn't seem like
> endianness should be an issue. (It is 32-bit versus my desktop's 64,
> though.) I've also tried exporting the drives via NBD to my x86_64
> system, and that worked fine, so if the problem is under btrfs, it
> would have to be in the encryption layer, but fsck succeeding on the
> ARM board would seem to rule that out, as well.
> 
>> Could you dump the following data? (x86 and ARM should output the same
>> content, so one output is enough.)
>> # btrfs ins dump-tree -b 17628726968320 /dev/dm-3
>> # btrfs ins dump-tree -b 17628727001088 /dev/dm-3
> 
> Attached, and also 17628705964032, since that's the block mentioned in
> my most recent mount attempt (see below).

The trees are completely fine.

So it should be something else causing the problem.

> 
>> Then, on the ARM system, please apply the following diff and try
>> mounting again.
>> The diff adds extra debug info to examine the vital members of a tree block.
>>
>> Correct fs should output something like:
>>   BTRFS error (device dm-4): bad tree block start, want 30408704 have 0
>>   tree block gen=4 owner=5 nritems=2 level=0
>>   csum:
>> a304e483-0000-0000-0000-00000000000000000000-0000-0000-0000-000000000000
>>
>> The csum line is the most important one: if it isn't mostly zeros, it
>> means btrfs got a bunch of garbage at that moment, and we can then
>> debug further.
> 
> [  131.725573] BTRFS info (device dm-1): disk space caching is enabled
> [  131.731884] BTRFS info (device dm-1): has skinny extents
> [  133.046145] BTRFS error (device dm-1): bad tree block start, want
> 17628705964032 have 2807793151171243621
> [  133.055775] tree block gen=7888986126946982446
> owner=11331573954727661546 nritems=4191910623 level=112
> [  133.065661] csum:
> 416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc

Complete garbage here, so I'd say the data we got isn't what we want.

> [  133.108383] BTRFS error (device dm-1): bad tree block start, want
> 17628705964032 have 2807793151171243621
> [  133.117999] tree block gen=7888986126946982446
> owner=11331573954727661546 nritems=4191910623 level=112
> [  133.127756] csum:
> 416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc

But strangely, the 2nd try still gives us the same result. If it were
really random garbage, we would expect a different result each time.
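(Editorial aside, arithmetic only and not a conclusion drawn in the thread at this point: the deterministic repetition would fit a systematic address truncation better than random corruption. Notably, every failing "want" address in these logs lies above 2^44 bytes = 16 TiB, the largest byte offset a 32-bit unsigned long page index can reach with 4 KiB pages:)

```python
# Illustrative arithmetic only: on a 32-bit kernel the page cache index is
# an unsigned long, so with 4 KiB pages an address space can only be
# indexed up to 2^44 bytes = 16 TiB.
PAGE_SHIFT = 12                          # 4 KiB pages
LIMIT_32BIT = (1 << 32) << PAGE_SHIFT    # 17592186044416 bytes = 16 TiB

wants = [17628726968320, 17628727001088, 17628705964032]
print(LIMIT_32BIT)                       # 17592186044416
print([w > LIMIT_32BIT for w in wants])  # [True, True, True]
```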

> [  133.136241] BTRFS error (device dm-1): failed to verify dev extents
> against chunks: -5

You can try to skip the dev extents verification by commenting out the
btrfs_verify_dev_extents() call in disk-io.c::open_ctree().

It may fail at another location though.

The stranger part is that the device tree root node was read out
without problem.

Thanks,
Qu

> [  133.166165] BTRFS error (device dm-1): open_ctree failed
> 
> I copied some files over last time I had it mounted on my desktop,
> which may be why it's now failing at a different block.
> 
> Thanks!
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2019-06-28  8:15           ` Qu Wenruo
@ 2021-01-18 10:50             ` Erik Jensen
       [not found]             ` <CAMj6ewMqXLtrBQgTJuz04v3MBZ0W95fU4pT0jP6kFhuP830TuA@mail.gmail.com>
  1 sibling, 0 replies; 44+ messages in thread
From: Erik Jensen @ 2021-01-18 10:50 UTC (permalink / raw)
  To: linux-btrfs

I ended up having other priorities occupying my time since 2019, and
the "solution" of exporting the individual drives on my NAS using NBD
and mounting them on my desktop worked, even if it wasn't pretty.

However, I am currently looking into Syncthing, which I would like to
run on the NAS directly. That would, of course, require accessing the
filesystem directly on the NAS rather than just exporting the raw
devices, which means circling back to this issue.

After updating my NAS, I have determined that the issue still occurs
with Linux 5.8.

What's the next best step for debugging the issue? Ideally, I'd like
to help track down the issue to find a proper fix, rather than just
trying to bypass the issue. I wasn't sure if the suggestion to comment
out btrfs_verify_dev_extents() was more geared toward the former or
the latter.


On Fri, Jun 28, 2019 at 1:15 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2019/6/28 下午4:00, Erik Jensen wrote:
> >> So it's either the block layer reading some wrong from the disk or btrfs
> >> layer doesn't do correct endian convert.
> >
> > My ARM board is running in little endian mode, so it doesn't seem like
> > endianness should be an issue. (It is 32-bits versus my desktop's 64,
> > though.) I've also tried exporting the drives via NBD to my x86_64
> > system, and that worked fine, so if the problem is under btrfs, it
> > would have to be in the encryption layer, but fsck succeeding on the
> > ARM board would seem to rule that out, as well.
> >
> >> Would you dump the following data (X86 and ARM should output the same
> >> content, thus one output is enough).
> >> # btrfs ins dump-tree -b 17628726968320 /dev/dm-3
> >> # btrfs ins dump-tree -b 17628727001088 /dev/dm-3
> >
> > Attached, and also 17628705964032, since that's the block mentioned in
> > my most recent mount attempt (see below).
>
> The trees are completely fine.
>
> So it should be something else causing the problem.
>
> >
> >> And then, for the ARM system, please apply the following diff, and try
> >> mount again.
> >> The diff adds extra debug info, to exam the vital members of a tree block.
> >>
> >> Correct fs should output something like:
> >>   BTRFS error (device dm-4): bad tree block start, want 30408704 have 0
> >>   tree block gen=4 owner=5 nritems=2 level=0
> >>   csum:
> >> a304e483-0000-0000-0000-00000000000000000000-0000-0000-0000-000000000000
> >>
> >> The csum one is the most important one, if there aren't so many zeros,
> >> it means at that timing, btrfs just got a bunch of garbage, thus we
> >> could do further debug.
> >
> > [  131.725573] BTRFS info (device dm-1): disk space caching is enabled
> > [  131.731884] BTRFS info (device dm-1): has skinny extents
> > [  133.046145] BTRFS error (device dm-1): bad tree block start, want
> > 17628705964032 have 2807793151171243621
> > [  133.055775] tree block gen=7888986126946982446
> > owner=11331573954727661546 nritems=4191910623 level=112
> > [  133.065661] csum:
> > 416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc
>
> Completely garbage here, so I'd say the data we got isn't what we want.
>
> > [  133.108383] BTRFS error (device dm-1): bad tree block start, want
> > 17628705964032 have 2807793151171243621
> > [  133.117999] tree block gen=7888986126946982446
> > owner=11331573954727661546 nritems=4191910623 level=112
> > [  133.127756] csum:
> > 416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc
>
> But strangely, the 2nd try still gives us the same result, if it's
> really some garbage, we should get some different result.
>
> > [  133.136241] BTRFS error (device dm-1): failed to verify dev extents
> > against chunks: -5
>
> You can try to skip the dev extents verification by commenting out the
> btrfs_verify_dev_extents() call in disk-io.c::open_ctree().
>
> It may fail at another location though.
>
> The more strange part is, we have the device tree root node read out
> without problem.
>
> Thanks,
> Qu
>
> > [  133.166165] BTRFS error (device dm-1): open_ctree failed
> >
> > I copied some files over last time I had it mounted on my desktop,
> > which may be why it's now failing at a different block.
> >
> > Thanks!
> >
>


* Re: "bad tree block start" when trying to mount on ARM
       [not found]             ` <CAMj6ewMqXLtrBQgTJuz04v3MBZ0W95fU4pT0jP6kFhuP830TuA@mail.gmail.com>
@ 2021-01-18 11:07               ` Qu Wenruo
  2021-01-18 11:55                 ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-01-18 11:07 UTC (permalink / raw)
  To: Erik Jensen; +Cc: Hugo Mills, linux-btrfs



On 2021/1/18 下午6:33, Erik Jensen wrote:
> I ended up having other priorities occupying my time since 2019, and the
> "solution" of exporting the individual drives on my NAS using NBD and
> mounting them on my desktop worked, even if it wasn't pretty.
>
> However, I am currently looking into Syncthing, which I would like to
> run on the NAS directly. That would, of course, require accessing the
> filesystem directly on the NAS rather than just exporting the raw
> devices, which means circling back to this issue.
>
> After updating my NAS, I have determined that the issue still occurs
> with Linux 5.8.
>
> What's the next best step for debugging the issue? Ideally, I'd like to
> help track down the issue to find a proper fix, rather than just trying
> to bypass the issue. I wasn't sure if the suggestion to comment out
> btrfs_verify_dev_extents() was more geared toward the former or the latter.

After refreshing my memory on this case, the problem is really that the
btrfs kernel code on ARM is reading garbage, while the user-space tools
work as expected on both x86 and ARM.

Can you recompile the kernel on your ARM board to add extra debugging
messages?
If possible, we can try adding some extra debug points, even if they
bombard your dmesg.

Or do you have other ARM boards to test the same fs?


Thanks,
Qu


>
> On Fri, Jun 28, 2019 at 1:15 AM Qu Wenruo <quwenruo.btrfs@gmx.com
> <mailto:quwenruo.btrfs@gmx.com>> wrote:
>
>
>
>     On 2019/6/28 下午4:00, Erik Jensen wrote:
>      >> So it's either the block layer reading some wrong from the disk
>     or btrfs
>      >> layer doesn't do correct endian convert.
>      >
>      > My ARM board is running in little endian mode, so it doesn't seem
>     like
>      > endianness should be an issue. (It is 32-bits versus my desktop's 64,
>      > though.) I've also tried exporting the drives via NBD to my x86_64
>      > system, and that worked fine, so if the problem is under btrfs, it
>      > would have to be in the encryption layer, but fsck succeeding on the
>      > ARM board would seem to rule that out, as well.
>      >
>      >> Would you dump the following data (X86 and ARM should output the
>     same
>      >> content, thus one output is enough).
>      >> # btrfs ins dump-tree -b 17628726968320 /dev/dm-3
>      >> # btrfs ins dump-tree -b 17628727001088 /dev/dm-3
>      >
>      > Attached, and also 17628705964032, since that's the block
>     mentioned in
>      > my most recent mount attempt (see below).
>
>     The trees are completely fine.
>
>     So it should be something else causing the problem.
>
>      >
>      >> And then, for the ARM system, please apply the following diff,
>     and try
>      >> mount again.
>      >> The diff adds extra debug info, to exam the vital members of a
>     tree block.
>      >>
>      >> Correct fs should output something like:
>      >>   BTRFS error (device dm-4): bad tree block start, want 30408704
>     have 0
>      >>   tree block gen=4 owner=5 nritems=2 level=0
>      >>   csum:
>      >>
>     a304e483-0000-0000-0000-00000000000000000000-0000-0000-0000-000000000000
>      >>
>      >> The csum one is the most important one, if there aren't so many
>     zeros,
>      >> it means at that timing, btrfs just got a bunch of garbage, thus we
>      >> could do further debug.
>      >
>      > [  131.725573] BTRFS info (device dm-1): disk space caching is
>     enabled
>      > [  131.731884] BTRFS info (device dm-1): has skinny extents
>      > [  133.046145] BTRFS error (device dm-1): bad tree block start, want
>      > 17628705964032 have 2807793151171243621
>      > [  133.055775] tree block gen=7888986126946982446
>      > owner=11331573954727661546 nritems=4191910623 level=112
>      > [  133.065661] csum:
>      >
>     416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc
>
>     Completely garbage here, so I'd say the data we got isn't what we want.
>
>      > [  133.108383] BTRFS error (device dm-1): bad tree block start, want
>      > 17628705964032 have 2807793151171243621
>      > [  133.117999] tree block gen=7888986126946982446
>      > owner=11331573954727661546 nritems=4191910623 level=112
>      > [  133.127756] csum:
>      >
>     416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc
>
>     But strangely, the 2nd try still gives us the same result, if it's
>     really some garbage, we should get some different result.
>
>      > [  133.136241] BTRFS error (device dm-1): failed to verify dev
>     extents
>      > against chunks: -5
>
>     You can try to skip the dev extents verification by commenting out the
>     btrfs_verify_dev_extents() call in disk-io.c::open_ctree().
>
>     It may fail at another location though.
>
>     The more strange part is, we have the device tree root node read out
>     without problem.
>
>     Thanks,
>     Qu
>
>      > [  133.166165] BTRFS error (device dm-1): open_ctree failed
>      >
>      > I copied some files over last time I had it mounted on my desktop,
>      > which may be why it's now failing at a different block.
>      >
>      > Thanks!
>      >
>


* Re: "bad tree block start" when trying to mount on ARM
  2021-01-18 11:07               ` Qu Wenruo
@ 2021-01-18 11:55                 ` Erik Jensen
  2021-01-18 12:01                   ` Qu Wenruo
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-01-18 11:55 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Hugo Mills, linux-btrfs

On Mon, Jan 18, 2021 at 3:07 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2021/1/18 下午6:33, Erik Jensen wrote:
> > I ended up having other priorities occupying my time since 2019, and the
> > "solution" of exporting the individual drives on my NAS using NBD and
> > mounting them on my desktop worked, even if it wasn't pretty.
> >
> > However, I am currently looking into Syncthing, which I would like to
> > run on the NAS directly. That would, of course, require accessing the
> > filesystem directly on the NAS rather than just exporting the raw
> > devices, which means circling back to this issue.
> >
> > After updating my NAS, I have determined that the issue still occurs
> > with Linux 5.8.
> >
> > What's the next best step for debugging the issue? Ideally, I'd like to
> > help track down the issue to find a proper fix, rather than just trying
> > to bypass the issue. I wasn't sure if the suggestion to comment out
> > btrfs_verify_dev_extents() was more geared toward the former or the latter.
>
> After rewinding my memory on this case, the problem is really that the
> ARM btrfs kernel is reading garbage, while X86 or ARM user space tool
> works as expected.
>
> Can you recompile your kernel on the ARM board to add extra debugging
> messages?
> If possible, we can try to add some extra debug points to bombarding
> your dmesg.
>
> Or do you have other ARM boards to test the same fs?
>
>
> Thanks,
> Qu

It's pretty easy to build a kernel with custom patches applied, though
the actual building takes a while, so I'd be happy to add whatever
debug messages would be useful. I also have an old Raspberry Pi
(original model B) I can dig out and try to get going tomorrow. I
can't hook it up to the drives directly, but I should be able to
access them via NBD like I was doing from my desktop. If I can't get
that going for whatever reason, I could also try running an emulated
ARM system with QEMU.

> >
> > On Fri, Jun 28, 2019 at 1:15 AM Qu Wenruo <quwenruo.btrfs@gmx.com
> > <mailto:quwenruo.btrfs@gmx.com>> wrote:
> >
> >
> >
> >     On 2019/6/28 下午4:00, Erik Jensen wrote:
> >      >> So it's either the block layer reading some wrong from the disk
> >     or btrfs
> >      >> layer doesn't do correct endian convert.
> >      >
> >      > My ARM board is running in little endian mode, so it doesn't seem
> >     like
> >      > endianness should be an issue. (It is 32-bits versus my desktop's 64,
> >      > though.) I've also tried exporting the drives via NBD to my x86_64
> >      > system, and that worked fine, so if the problem is under btrfs, it
> >      > would have to be in the encryption layer, but fsck succeeding on the
> >      > ARM board would seem to rule that out, as well.
> >      >
> >      >> Would you dump the following data (X86 and ARM should output the
> >     same
> >      >> content, thus one output is enough).
> >      >> # btrfs ins dump-tree -b 17628726968320 /dev/dm-3
> >      >> # btrfs ins dump-tree -b 17628727001088 /dev/dm-3
> >      >
> >      > Attached, and also 17628705964032, since that's the block
> >     mentioned in
> >      > my most recent mount attempt (see below).
> >
> >     The trees are completely fine.
> >
> >     So it should be something else causing the problem.
> >
> >      >
> >      >> And then, for the ARM system, please apply the following diff,
> >     and try
> >      >> mount again.
> >      >> The diff adds extra debug info, to exam the vital members of a
> >     tree block.
> >      >>
> >      >> Correct fs should output something like:
> >      >>   BTRFS error (device dm-4): bad tree block start, want 30408704
> >     have 0
> >      >>   tree block gen=4 owner=5 nritems=2 level=0
> >      >>   csum:
> >      >>
> >     a304e483-0000-0000-0000-00000000000000000000-0000-0000-0000-000000000000
> >      >>
> >      >> The csum one is the most important one, if there aren't so many
> >     zeros,
> >      >> it means at that timing, btrfs just got a bunch of garbage, thus we
> >      >> could do further debug.
> >      >
> >      > [  131.725573] BTRFS info (device dm-1): disk space caching is
> >     enabled
> >      > [  131.731884] BTRFS info (device dm-1): has skinny extents
> >      > [  133.046145] BTRFS error (device dm-1): bad tree block start, want
> >      > 17628705964032 have 2807793151171243621
> >      > [  133.055775] tree block gen=7888986126946982446
> >      > owner=11331573954727661546 nritems=4191910623 level=112
> >      > [  133.065661] csum:
> >      >
> >     416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc
> >
> >     Completely garbage here, so I'd say the data we got isn't what we want.
> >
> >      > [  133.108383] BTRFS error (device dm-1): bad tree block start, want
> >      > 17628705964032 have 2807793151171243621
> >      > [  133.117999] tree block gen=7888986126946982446
> >      > owner=11331573954727661546 nritems=4191910623 level=112
> >      > [  133.127756] csum:
> >      >
> >     416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc
> >
> >     But strangely, the 2nd try still gives us the same result, if it's
> >     really some garbage, we should get some different result.
> >
> >      > [  133.136241] BTRFS error (device dm-1): failed to verify dev
> >     extents
> >      > against chunks: -5
> >
> >     You can try to skip the dev extents verification by commenting out the
> >     btrfs_verify_dev_extents() call in disk-io.c::open_ctree().
> >
> >     It may fail at another location though.
> >
> >     The more strange part is, we have the device tree root node read out
> >     without problem.
> >
> >     Thanks,
> >     Qu
> >
> >      > [  133.166165] BTRFS error (device dm-1): open_ctree failed
> >      >
> >      > I copied some files over last time I had it mounted on my desktop,
> >      > which may be why it's now failing at a different block.
> >      >
> >      > Thanks!
> >      >
> >


* Re: "bad tree block start" when trying to mount on ARM
  2021-01-18 11:55                 ` Erik Jensen
@ 2021-01-18 12:01                   ` Qu Wenruo
  2021-01-18 12:12                     ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-01-18 12:01 UTC (permalink / raw)
  To: Erik Jensen; +Cc: Hugo Mills, linux-btrfs



On 2021/1/18 下午7:55, Erik Jensen wrote:
> On Mon, Jan 18, 2021 at 3:07 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> On 2021/1/18 下午6:33, Erik Jensen wrote:
>>> I ended up having other priorities occupying my time since 2019, and the
>>> "solution" of exporting the individual drives on my NAS using NBD and
>>> mounting them on my desktop worked, even if it wasn't pretty.
>>>
>>> However, I am currently looking into Syncthing, which I would like to
>>> run on the NAS directly. That would, of course, require accessing the
>>> filesystem directly on the NAS rather than just exporting the raw
>>> devices, which means circling back to this issue.
>>>
>>> After updating my NAS, I have determined that the issue still occurs
>>> with Linux 5.8.
>>>
>>> What's the next best step for debugging the issue? Ideally, I'd like to
>>> help track down the issue to find a proper fix, rather than just trying
>>> to bypass the issue. I wasn't sure if the suggestion to comment out
>>> btrfs_verify_dev_extents() was more geared toward the former or the latter.
>>
>> After rewinding my memory on this case, the problem is really that the
>> ARM btrfs kernel is reading garbage, while X86 or ARM user space tool
>> works as expected.
>>
>> Can you recompile your kernel on the ARM board to add extra debugging
>> messages?
>> If possible, we can try to add some extra debug points to bombarding
>> your dmesg.
>>
>> Or do you have other ARM boards to test the same fs?
>>
>>
>> Thanks,
>> Qu
>
> It's pretty easy to build a kernel with custom patches applied, though
> the actual building takes a while, so I'd be happy to add whatever
> debug messages would be useful. I also have an old Raspberry Pi
> (original model B) I can dig out and try to get going, tomorrow. I
> can't hook it up to the drives directly, but I should be able to
> access them via NBD like I was doing from my desktop.

An RPi 1B would be a little slow, but it should be enough to expose the
problem if the problem affects all ARM builds (as long as you're also
using ARMv7 for the offending system).

Thanks,
Qu

> If I can't get
> that going for whatever reason, I could also try running an emulated
> ARM system with QEMU.
>
>>>
>>> On Fri, Jun 28, 2019 at 1:15 AM Qu Wenruo <quwenruo.btrfs@gmx.com
>>> <mailto:quwenruo.btrfs@gmx.com>> wrote:
>>>
>>>
>>>
>>>      On 2019/6/28 下午4:00, Erik Jensen wrote:
>>>       >> So it's either the block layer reading some wrong from the disk
>>>      or btrfs
>>>       >> layer doesn't do correct endian convert.
>>>       >
>>>       > My ARM board is running in little endian mode, so it doesn't seem
>>>      like
>>>       > endianness should be an issue. (It is 32-bits versus my desktop's 64,
>>>       > though.) I've also tried exporting the drives via NBD to my x86_64
>>>       > system, and that worked fine, so if the problem is under btrfs, it
>>>       > would have to be in the encryption layer, but fsck succeeding on the
>>>       > ARM board would seem to rule that out, as well.
>>>       >
>>>       >> Would you dump the following data (X86 and ARM should output the
>>>      same
>>>       >> content, thus one output is enough).
>>>       >> # btrfs ins dump-tree -b 17628726968320 /dev/dm-3
>>>       >> # btrfs ins dump-tree -b 17628727001088 /dev/dm-3
>>>       >
>>>       > Attached, and also 17628705964032, since that's the block
>>>      mentioned in
>>>       > my most recent mount attempt (see below).
>>>
>>>      The trees are completely fine.
>>>
>>>      So it should be something else causing the problem.
>>>
>>>       >
>>>       >> And then, for the ARM system, please apply the following diff,
>>>      and try
>>>       >> mount again.
>>>       >> The diff adds extra debug info, to exam the vital members of a
>>>      tree block.
>>>       >>
>>>       >> Correct fs should output something like:
>>>       >>   BTRFS error (device dm-4): bad tree block start, want 30408704
>>>      have 0
>>>       >>   tree block gen=4 owner=5 nritems=2 level=0
>>>       >>   csum:
>>>       >>
>>>      a304e483-0000-0000-0000-00000000000000000000-0000-0000-0000-000000000000
>>>       >>
>>>       >> The csum one is the most important one, if there aren't so many
>>>      zeros,
>>>       >> it means at that timing, btrfs just got a bunch of garbage, thus we
>>>       >> could do further debug.
>>>       >
>>>       > [  131.725573] BTRFS info (device dm-1): disk space caching is
>>>      enabled
>>>       > [  131.731884] BTRFS info (device dm-1): has skinny extents
>>>       > [  133.046145] BTRFS error (device dm-1): bad tree block start, want
>>>       > 17628705964032 have 2807793151171243621
>>>       > [  133.055775] tree block gen=7888986126946982446
>>>       > owner=11331573954727661546 nritems=4191910623 level=112
>>>       > [  133.065661] csum:
>>>       >
>>>      416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc
>>>
>>>      Completely garbage here, so I'd say the data we got isn't what we want.
>>>
>>>       > [  133.108383] BTRFS error (device dm-1): bad tree block start, want
>>>       > 17628705964032 have 2807793151171243621
>>>       > [  133.117999] tree block gen=7888986126946982446
>>>       > owner=11331573954727661546 nritems=4191910623 level=112
>>>       > [  133.127756] csum:
>>>       >
>>>      416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc
>>>
>>>      But strangely, the 2nd try still gives us the same result, if it's
>>>      really some garbage, we should get some different result.
>>>
>>>       > [  133.136241] BTRFS error (device dm-1): failed to verify dev
>>>      extents
>>>       > against chunks: -5
>>>
>>>      You can try to skip the dev extents verification by commenting out the
>>>      btrfs_verify_dev_extents() call in disk-io.c::open_ctree().
>>>
>>>      It may fail at another location though.
>>>
>>>      The more strange part is, we have the device tree root node read out
>>>      without problem.
>>>
>>>      Thanks,
>>>      Qu
>>>
>>>       > [  133.166165] BTRFS error (device dm-1): open_ctree failed
>>>       >
>>>       > I copied some files over last time I had it mounted on my desktop,
>>>       > which may be why it's now failing at a different block.
>>>       >
>>>       > Thanks!
>>>       >
>>>


* Re: "bad tree block start" when trying to mount on ARM
  2021-01-18 12:01                   ` Qu Wenruo
@ 2021-01-18 12:12                     ` Erik Jensen
  2021-01-19  5:22                       ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-01-18 12:12 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Hugo Mills, linux-btrfs

The offending system is indeed ARMv7 (specifically a Marvell ARMADA®
388), but I believe the Broadcom BCM2835 in my Raspberry Pi is
actually ARMv6 (with hardware float support).

On Mon, Jan 18, 2021 at 4:01 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2021/1/18 下午7:55, Erik Jensen wrote:
> > On Mon, Jan 18, 2021 at 3:07 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >> On 2021/1/18 下午6:33, Erik Jensen wrote:
> >>> I ended up having other priorities occupying my time since 2019, and the
> >>> "solution" of exporting the individual drives on my NAS using NBD and
> >>> mounting them on my desktop worked, even if it wasn't pretty.
> >>>
> >>> However, I am currently looking into Syncthing, which I would like to
> >>> run on the NAS directly. That would, of course, require accessing the
> >>> filesystem directly on the NAS rather than just exporting the raw
> >>> devices, which means circling back to this issue.
> >>>
> >>> After updating my NAS, I have determined that the issue still occurs
> >>> with Linux 5.8.
> >>>
> >>> What's the next best step for debugging the issue? Ideally, I'd like to
> >>> help track down the issue to find a proper fix, rather than just trying
> >>> to bypass the issue. I wasn't sure if the suggestion to comment out
> >>> btrfs_verify_dev_extents() was more geared toward the former or the latter.
> >>
> >> After rewinding my memory on this case, the problem is really that the
> >> ARM btrfs kernel is reading garbage, while X86 or ARM user space tool
> >> works as expected.
> >>
> >> Can you recompile your kernel on the ARM board to add extra debugging
> >> messages?
> >> If possible, we can try to add some extra debug points to bombarding
> >> your dmesg.
> >>
> >> Or do you have other ARM boards to test the same fs?
> >>
> >>
> >> Thanks,
> >> Qu
> >
> > It's pretty easy to build a kernel with custom patches applied, though
> > the actual building takes a while, so I'd be happy to add whatever
> > debug messages would be useful. I also have an old Raspberry Pi
> > (original model B) I can dig out and try to get going, tomorrow. I
> > can't hook it up to the drives directly, but I should be able to
> > access them via NBD like I was doing from my desktop.
>
> RPI 1B would be a little slow but should be enough to expose the
> problem, if the problem is for all arm builds (as long as you're also
> using armv7 for the offending system).
>
> Thanks,
> Qu
>
> > If I can't get
> > that going for whatever reason, I could also try running an emulated
> > ARM system with QEMU.
> >
> >>>
> >>> On Fri, Jun 28, 2019 at 1:15 AM Qu Wenruo <quwenruo.btrfs@gmx.com
> >>> <mailto:quwenruo.btrfs@gmx.com>> wrote:
> >>>
> >>>
> >>>
> >>>      On 2019/6/28 下午4:00, Erik Jensen wrote:
> >>>       >> So it's either the block layer reading some wrong from the disk
> >>>      or btrfs
> >>>       >> layer doesn't do correct endian convert.
> >>>       >
> >>>       > My ARM board is running in little endian mode, so it doesn't seem
> >>>      like
> >>>       > endianness should be an issue. (It is 32-bits versus my desktop's 64,
> >>>       > though.) I've also tried exporting the drives via NBD to my x86_64
> >>>       > system, and that worked fine, so if the problem is under btrfs, it
> >>>       > would have to be in the encryption layer, but fsck succeeding on the
> >>>       > ARM board would seem to rule that out, as well.
> >>>       >
> >>>       >> Would you dump the following data (X86 and ARM should output the
> >>>      same
> >>>       >> content, thus one output is enough).
> >>>       >> # btrfs ins dump-tree -b 17628726968320 /dev/dm-3
> >>>       >> # btrfs ins dump-tree -b 17628727001088 /dev/dm-3
> >>>       >
> >>>       > Attached, and also 17628705964032, since that's the block
> >>>      mentioned in
> >>>       > my most recent mount attempt (see below).
> >>>
> >>>      The trees are completely fine.
> >>>
> >>>      So it should be something else causing the problem.
> >>>
> >>>       >
> >>>       >> And then, for the ARM system, please apply the following diff,
> >>>      and try
> >>>       >> mount again.
> >>>       >> The diff adds extra debug info, to exam the vital members of a
> >>>      tree block.
> >>>       >>
> >>>       >> Correct fs should output something like:
> >>>       >>   BTRFS error (device dm-4): bad tree block start, want 30408704
> >>>      have 0
> >>>       >>   tree block gen=4 owner=5 nritems=2 level=0
> >>>       >>   csum:
> >>>       >>
> >>>      a304e483-0000-0000-0000-00000000000000000000-0000-0000-0000-000000000000
> >>>       >>
> >>>       >> The csum one is the most important one, if there aren't so many
> >>>      zeros,
> >>>       >> it means at that timing, btrfs just got a bunch of garbage, thus we
> >>>       >> could do further debug.
> >>>       >
> >>>       > [  131.725573] BTRFS info (device dm-1): disk space caching is
> >>>      enabled
> >>>       > [  131.731884] BTRFS info (device dm-1): has skinny extents
> >>>       > [  133.046145] BTRFS error (device dm-1): bad tree block start, want
> >>>       > 17628705964032 have 2807793151171243621
> >>>       > [  133.055775] tree block gen=7888986126946982446
> >>>       > owner=11331573954727661546 nritems=4191910623 level=112
> >>>       > [  133.065661] csum:
> >>>       >
> >>>      416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc
> >>>
> >>>      Completely garbage here, so I'd say the data we got isn't what we want.
> >>>
> >>>       > [  133.108383] BTRFS error (device dm-1): bad tree block start, want
> >>>       > 17628705964032 have 2807793151171243621
> >>>       > [  133.117999] tree block gen=7888986126946982446
> >>>       > owner=11331573954727661546 nritems=4191910623 level=112
> >>>       > [  133.127756] csum:
> >>>       >
> >>>      416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc
> >>>
> >>>      But strangely, the 2nd try still gives us the same result, if it's
> >>>      really some garbage, we should get some different result.
> >>>
> >>>       > [  133.136241] BTRFS error (device dm-1): failed to verify dev
> >>>      extents
> >>>       > against chunks: -5
> >>>
> >>>      You can try to skip the dev extents verification by commenting out the
> >>>      btrfs_verify_dev_extents() call in disk-io.c::open_ctree().
> >>>
> >>>      It may fail at another location though.
> >>>
> >>>      The more strange part is, we have the device tree root node read out
> >>>      without problem.
> >>>
> >>>      Thanks,
> >>>      Qu
> >>>
> >>>       > [  133.166165] BTRFS error (device dm-1): open_ctree failed
> >>>       >
> >>>       > I copied some files over last time I had it mounted on my desktop,
> >>>       > which may be why it's now failing at a different block.
> >>>       >
> >>>       > Thanks!
> >>>       >
> >>>


* Re: "bad tree block start" when trying to mount on ARM
  2021-01-18 12:12                     ` Erik Jensen
@ 2021-01-19  5:22                       ` Erik Jensen
  2021-01-19  9:28                         ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-01-19  5:22 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Hugo Mills, linux-btrfs

On Mon, Jan 18, 2021 at 4:12 AM Erik Jensen <erikjensen@rkjnsn.net> wrote:
>
> The offending system is indeed ARMv7 (specifically a Marvell ARMADA®
> 388), but I believe the Broadcom BCM2835 in my Raspberry Pi is
> actually ARMv6 (with hardware float support).

Using NBD, I have verified that I receive the same error when
attempting to mount the filesystem on my ARMv6 Raspberry Pi:
[ 3491.339572] BTRFS info (device dm-4): disk space caching is enabled
[ 3491.394584] BTRFS info (device dm-4): has skinny extents
[ 3492.385095] BTRFS error (device dm-4): bad tree block start, want
26207780683776 have 3395945502747707095
[ 3492.514071] BTRFS error (device dm-4): bad tree block start, want
26207780683776 have 3395945502747707095
[ 3492.553599] BTRFS warning (device dm-4): failed to read tree root
[ 3492.865368] BTRFS error (device dm-4): open_ctree failed

The Raspberry Pi is running Linux 5.4.83.

> On Mon, Jan 18, 2021 at 4:01 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >
> >
> >
> > On 2021/1/18 下午7:55, Erik Jensen wrote:
> > > On Mon, Jan 18, 2021 at 3:07 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> > >> On 2021/1/18 下午6:33, Erik Jensen wrote:
> > >>> I ended up having other priorities occupying my time since 2019, and the
> > >>> "solution" of exporting the individual drives on my NAS using NBD and
> > >>> mounting them on my desktop worked, even if it wasn't pretty.
> > >>>
> > >>> However, I am currently looking into Syncthing, which I would like to
> > >>> run on the NAS directly. That would, of course, require accessing the
> > >>> filesystem directly on the NAS rather than just exporting the raw
> > >>> devices, which means circling back to this issue.
> > >>>
> > >>> After updating my NAS, I have determined that the issue still occurs
> > >>> with Linux 5.8.
> > >>>
> > >>> What's the next best step for debugging the issue? Ideally, I'd like to
> > >>> help track down the issue to find a proper fix, rather than just trying
> > >>> to bypass the issue. I wasn't sure if the suggestion to comment out
> > >>> btrfs_verify_dev_extents() was more geared toward the former or the latter.
> > >>
> > >> After rewinding my memory on this case, the problem is really that the
> > >> ARM btrfs kernel is reading garbage, while X86 or ARM user space tool
> > >> works as expected.
> > >>
> > >> Can you recompile your kernel on the ARM board to add extra debugging
> > >> messages?
> > >> If possible, we can try to add some extra debug points to bombarding
> > >> your dmesg.
> > >>
> > >> Or do you have other ARM boards to test the same fs?
> > >>
> > >>
> > >> Thanks,
> > >> Qu
> > >
> > > It's pretty easy to build a kernel with custom patches applied, though
> > > the actual building takes a while, so I'd be happy to add whatever
> > > debug messages would be useful. I also have an old Raspberry Pi
> > > (original model B) I can dig out and try to get going, tomorrow. I
> > > can't hook it up to the drives directly, but I should be able to
> > > access them via NBD like I was doing from my desktop.
> >
> > RPI 1B would be a little slow but should be enough to expose the
> > problem, if the problem is for all arm builds (as long as you're also
> > using armv7 for the offending system).
> >
> > Thanks,
> > Qu
> >
> > > If I can't get
> > > that going for whatever reason, I could also try running an emulated
> > > ARM system with QEMU.
> > >
> > >>>
> > >>> On Fri, Jun 28, 2019 at 1:15 AM Qu Wenruo <quwenruo.btrfs@gmx.com
> > >>> <mailto:quwenruo.btrfs@gmx.com>> wrote:
> > >>>
> > >>>
> > >>>
> > >>>      On 2019/6/28 下午4:00, Erik Jensen wrote:
> > >>>       >> So it's either the block layer reading some wrong from the disk
> > >>>      or btrfs
> > >>>       >> layer doesn't do correct endian convert.
> > >>>       >
> > >>>       > My ARM board is running in little endian mode, so it doesn't seem
> > >>>      like
> > >>>       > endianness should be an issue. (It is 32-bits versus my desktop's 64,
> > >>>       > though.) I've also tried exporting the drives via NBD to my x86_64
> > >>>       > system, and that worked fine, so if the problem is under btrfs, it
> > >>>       > would have to be in the encryption layer, but fsck succeeding on the
> > >>>       > ARM board would seem to rule that out, as well.
> > >>>       >
> > >>>       >> Would you dump the following data (X86 and ARM should output the
> > >>>      same
> > >>>       >> content, thus one output is enough).
> > >>>       >> # btrfs ins dump-tree -b 17628726968320 /dev/dm-3
> > >>>       >> # btrfs ins dump-tree -b 17628727001088 /dev/dm-3
> > >>>       >
> > >>>       > Attached, and also 17628705964032, since that's the block
> > >>>      mentioned in
> > >>>       > my most recent mount attempt (see below).
> > >>>
> > >>>      The trees are completely fine.
> > >>>
> > >>>      So it should be something else causing the problem.
> > >>>
> > >>>       >
> > >>>       >> And then, for the ARM system, please apply the following diff,
> > >>>      and try
> > >>>       >> mount again.
> > >>>       >> The diff adds extra debug info, to exam the vital members of a
> > >>>      tree block.
> > >>>       >>
> > >>>       >> Correct fs should output something like:
> > >>>       >>   BTRFS error (device dm-4): bad tree block start, want 30408704
> > >>>      have 0
> > >>>       >>   tree block gen=4 owner=5 nritems=2 level=0
> > >>>       >>   csum:
> > >>>       >>
> > >>>      a304e483-0000-0000-0000-00000000000000000000-0000-0000-0000-000000000000
> > >>>       >>
> > >>>       >> The csum one is the most important one, if there aren't so many
> > >>>      zeros,
> > >>>       >> it means at that timing, btrfs just got a bunch of garbage, thus we
> > >>>       >> could do further debug.
> > >>>       >
> > >>>       > [  131.725573] BTRFS info (device dm-1): disk space caching is
> > >>>      enabled
> > >>>       > [  131.731884] BTRFS info (device dm-1): has skinny extents
> > >>>       > [  133.046145] BTRFS error (device dm-1): bad tree block start, want
> > >>>       > 17628705964032 have 2807793151171243621
> > >>>       > [  133.055775] tree block gen=7888986126946982446
> > >>>       > owner=11331573954727661546 nritems=4191910623 level=112
> > >>>      [...]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-01-19  5:22                       ` Erik Jensen
@ 2021-01-19  9:28                         ` Erik Jensen
  2021-01-20  8:21                           ` Qu Wenruo
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-01-19  9:28 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Hugo Mills, linux-btrfs

On Mon, Jan 18, 2021 at 9:22 PM Erik Jensen <erikjensen@rkjnsn.net> wrote:
>
> [...]
>

Okay, after some more testing, ARM seems to be irrelevant, and 32-bit
is the key factor. On a whim, I booted up an i686, 5.8.14 kernel in a
VM, attached the drives via NBD, ran cryptsetup, tried to mount, and…
I got the exact same error message.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-01-19  9:28                         ` Erik Jensen
@ 2021-01-20  8:21                           ` Qu Wenruo
  2021-01-20  8:30                             ` Qu Wenruo
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-01-20  8:21 UTC (permalink / raw)
  To: Erik Jensen; +Cc: Hugo Mills, linux-btrfs



On 2021/1/19 下午5:28, Erik Jensen wrote:
> On Mon, Jan 18, 2021 at 9:22 PM Erik Jensen <erikjensen@rkjnsn.net> wrote:
>>
>> [...]
>
> Okay, after some more testing, ARM seems to be irrelevant, and 32-bit
> is the key factor. On a whim, I booted up an i686, 5.8.14 kernel in a
> VM, attached the drives via NBD, ran cryptsetup, tried to mount, and…
> I got the exact same error message.
>
My educated guess is that on 32-bit platforms we pass an incorrect
sector into the bio, which gives us garbage.

Is this bug happening only on this fs, or can other btrfs filesystems
also trigger similar problems on 32-bit platforms?

Thanks,
Qu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-01-20  8:21                           ` Qu Wenruo
@ 2021-01-20  8:30                             ` Qu Wenruo
       [not found]                               ` <CAMj6ewOqCJTGjykDijun9_LWYELA=92HrE+KjGo-ehJTutR_+w@mail.gmail.com>
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-01-20  8:30 UTC (permalink / raw)
  To: Erik Jensen; +Cc: Hugo Mills, linux-btrfs



On 2021/1/20 下午4:21, Qu Wenruo wrote:
>
>
> On 2021/1/19 下午5:28, Erik Jensen wrote:
>> [...]
>>
>> Okay, after some more testing, ARM seems to be irrelevant, and 32-bit
>> is the key factor. On a whim, I booted up an i686, 5.8.14 kernel in a
>> VM, attached the drives via NBD, ran cryptsetup, tried to mount, and…
>> I got the exact same error message.
>>
> My educated guess is on 32bit platforms, we passed incorrect sector into
> bio, thus gave us garbage.

To prove that, you can use a bcc tool to verify it.
biosnoop can do that:
https://github.com/iovisor/bcc/blob/master/tools/biosnoop_example.txt

Just try mounting the fs with biosnoop running.
With "btrfs ins dump-tree -t chunk <dev>", we can manually calculate the
offset of each read to see if they match.
If they don't match, that would prove my assumption and give us a pretty
good clue for a fix.

Thanks,
Qu

>
> Is this bug happening only on the fs, or any other btrfs can also
> trigger similar problems on 32bit platforms?
>
> Thanks,
> Qu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
       [not found]                               ` <CAMj6ewOqCJTGjykDijun9_LWYELA=92HrE+KjGo-ehJTutR_+w@mail.gmail.com>
@ 2021-01-26  4:54                                 ` Erik Jensen
  2021-01-29  6:39                                   ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-01-26  4:54 UTC (permalink / raw)
  To: Qu Wenruo, Hugo Mills, linux-btrfs

On Wed, Jan 20, 2021 at 1:08 AM Erik Jensen <erikjensen@rkjnsn.net> wrote:
>
> On Wed, Jan 20, 2021 at 12:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> > On 2021/1/20 下午4:21, Qu Wenruo wrote:
> > >> [...]
> > > My educated guess is on 32bit platforms, we passed incorrect sector into
> > > bio, thus gave us garbage.
> >
> > To prove that, you can use bcc tool to verify it.
> > biosnoop can do that:
> > https://github.com/iovisor/bcc/blob/master/tools/biosnoop_example.txt
> >
> > Just try mount the fs with biosnoop running.
> > With "btrfs ins dump-tree -t chunk <dev>", we can manually calculate the
> > offset of each read to see if they matches.
> > If not match, it would prove my assumption and give us a pretty good
> > clue to fix.
> >
> > Thanks,
> > Qu
> >
> > >
> > > Is this bug happening only on the fs, or any other btrfs can also
> > > trigger similar problems on 32bit platforms?
> > >
> > > Thanks,
> > > Qu
>
> I have only observed this error on this file system. Additionally, the
> error mounting with the NAS only started after I did a `btrfs replace`
> on all five 8TB drives using an x86_64 system. (Ironically, I did this
> with the goal of making it faster to use the filesystem on the NAS by
> re-encrypting the drives to use a cipher supported by my NAS's crypto
> accelerator.)
>
> Maybe this process of shuffling 40TB around caused some value in the
> filesystem to increment to the point that a calculation using it
> overflows on 32-bit systems?
>
> I should be able to try biosnoop later this week, and I'll report back
> with the results.

Okay, I tried running biosnoop, but I seem to be running into this
bug: https://github.com/iovisor/bcc/issues/3241 (That bug was reported
for cpudist, but I'm seeing the same error when I try to run
biosnoop.)

Anything else I can try?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-01-26  4:54                                 ` Erik Jensen
@ 2021-01-29  6:39                                   ` Erik Jensen
  2021-02-01  2:35                                     ` Qu Wenruo
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-01-29  6:39 UTC (permalink / raw)
  To: Qu Wenruo, Hugo Mills, linux-btrfs

On Mon, Jan 25, 2021 at 8:54 PM Erik Jensen <erikjensen@rkjnsn.net> wrote:
> On Wed, Jan 20, 2021 at 1:08 AM Erik Jensen <erikjensen@rkjnsn.net> wrote:
> > On Wed, Jan 20, 2021 at 12:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> > > On 2021/1/20 下午4:21, Qu Wenruo wrote:
> > > > On 2021/1/19 下午5:28, Erik Jensen wrote:
> > > >> [...]
> > > > My educated guess is on 32bit platforms, we passed incorrect sector into
> > > > bio, thus gave us garbage.
> > >
> > > To prove that, you can use bcc tool to verify it.
> > > biosnoop can do that:
> > > https://github.com/iovisor/bcc/blob/master/tools/biosnoop_example.txt
> > >
> > > Just try mount the fs with biosnoop running.
> > > With "btrfs ins dump-tree -t chunk <dev>", we can manually calculate the
> > > offset of each read to see if they matches.
> > > If not match, it would prove my assumption and give us a pretty good
> > > clue to fix.
> > >
> > > Thanks,
> > > Qu
> > >
> > > >
> > > > Is this bug happening only on the fs, or any other btrfs can also
> > > > trigger similar problems on 32bit platforms?
> > > >
> > > > Thanks,
> > > > Qu
> >
> > I have only observed this error on this file system. Additionally, the
> > error mounting with the NAS only started after I did a `btrfs replace`
> > on all five 8TB drives using an x86_64 system. (Ironically, I did this
> > with the goal of making it faster to use the filesystem on the NAS by
> > re-encrypting the drives to use a cipher supported by my NAS's crypto
> > accelerator.)
> >
> > Maybe this process of shuffling 40TB around caused some value in the
> > filesystem to increment to the point that a calculation using it
> > overflows on 32-bit systems?
> >
> > I should be able to try biosnoop later this week, and I'll report back
> > with the results.
>
> Okay, I tried running biosnoop, but I seem to be running into this
> bug: https://github.com/iovisor/bcc/issues/3241 (That bug was reported
> for cpudist, but I'm seeing the same error when I try to run
> biosnoop.)
>
> Anything else I can try?

Is it possible to add printks to retrieve the same data?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-01-29  6:39                                   ` Erik Jensen
@ 2021-02-01  2:35                                     ` Qu Wenruo
  2021-02-01  5:49                                       ` Su Yue
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-02-01  2:35 UTC (permalink / raw)
  To: Erik Jensen, Hugo Mills, linux-btrfs



On 2021/1/29 下午2:39, Erik Jensen wrote:
> On Mon, Jan 25, 2021 at 8:54 PM Erik Jensen <erikjensen@rkjnsn.net> wrote:
>> On Wed, Jan 20, 2021 at 1:08 AM Erik Jensen <erikjensen@rkjnsn.net> wrote:
>>> On Wed, Jan 20, 2021 at 12:31 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>> On 2021/1/20 下午4:21, Qu Wenruo wrote:
>>>>> On 2021/1/19 下午5:28, Erik Jensen wrote:
>>>>>> [...]
>>>>> My educated guess is on 32bit platforms, we passed incorrect sector into
>>>>> bio, thus gave us garbage.
>>>>
>>>> To prove that, you can use bcc tool to verify it.
>>>> biosnoop can do that:
>>>> https://github.com/iovisor/bcc/blob/master/tools/biosnoop_example.txt
>>>>
>>>> Just try mount the fs with biosnoop running.
>>>> With "btrfs ins dump-tree -t chunk <dev>", we can manually calculate the
>>>> offset of each read to see if they matches.
>>>> If not match, it would prove my assumption and give us a pretty good
>>>> clue to fix.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> Is this bug happening only on the fs, or any other btrfs can also
>>>>> trigger similar problems on 32bit platforms?
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>
>>> I have only observed this error on this file system. Additionally, the
>>> error mounting with the NAS only started after I did a `btrfs replace`
>>> on all five 8TB drives using an x86_64 system. (Ironically, I did this
>>> with the goal of making it faster to use the filesystem on the NAS by
>>> re-encrypting the drives to use a cipher supported by my NAS's crypto
>>> accelerator.)
>>>
>>> Maybe this process of shuffling 40TB around caused some value in the
>>> filesystem to increment to the point that a calculation using it
>>> overflows on 32-bit systems?
>>>
>>> I should be able to try biosnoop later this week, and I'll report back
>>> with the results.
>>
>> Okay, I tried running biosnoop, but I seem to be running into this
>> bug: https://github.com/iovisor/bcc/issues/3241 (That bug was reported
>> for cpudist, but I'm seeing the same error when I try to run
>> biosnoop.)
>>
>> Anything else I can try?
>
> Is it possible to add printks to retrieve the same data?
>
Sorry for the late reply; I've been busy testing the subpage patchset
(and unfortunately without much progress).

If bcc is not possible, you can still use ftrace events, but
unfortunately I didn't find a good enough one. (In fact, the trace
events for the block layer are pretty limited.)

You can try adding printk()s in blk_account_io_done() to emulate what
is done in biosnoop's trace_req_completion().

The time delta is not important; we only need the device name, sector,
and length.
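
A minimal sketch of such a printk (written against a 5.x-era
block/blk-core.c; the exact signature and surrounding code differ
between kernel versions, so treat this as an approximation, not a
ready-made patch):

```c
/* Debug-only sketch: add at the top of blk_account_io_done() in
 * block/blk-core.c to log one line per completed request. */
static void blk_account_io_done(struct request *req, u64 now)
{
	pr_info("blk done: dev=%s sector=%llu len=%u\n",
		req->rq_disk ? req->rq_disk->disk_name : "?",
		(unsigned long long)blk_rq_pos(req),
		blk_rq_bytes(req));

	/* ... existing accounting code continues unchanged ... */
}
```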

Thanks,
Qu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-01  2:35                                     ` Qu Wenruo
@ 2021-02-01  5:49                                       ` Su Yue
  2021-02-04  6:16                                         ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Su Yue @ 2021-02-01  5:49 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Erik Jensen, Hugo Mills, linux-btrfs


On Mon 01 Feb 2021 at 10:35, Qu Wenruo <quwenruo.btrfs@gmx.com> 
wrote:

> On 2021/1/29 下午2:39, Erik Jensen wrote:
>> On Mon, Jan 25, 2021 at 8:54 PM Erik Jensen 
>> <erikjensen@rkjnsn.net> wrote:
>>> On Wed, Jan 20, 2021 at 1:08 AM Erik Jensen 
>>> <erikjensen@rkjnsn.net> wrote:
>>>> On Wed, Jan 20, 2021 at 12:31 AM Qu Wenruo 
>>>> <quwenruo.btrfs@gmx.com> wrote:
>>>>> On 2021/1/20 下午4:21, Qu Wenruo wrote:
>>>>>>> [...]
>>>
>>> Okay, I tried running biosnoop, but I seem to be running into 
>>> this
>>> bug: https://github.com/iovisor/bcc/issues/3241 (That bug was 
>>> reported
>>> for cpudist, but I'm seeing the same error when I try to run
>>> biosnoop.)
>>>
>>> Anything else I can try?
>>
>> Is it possible to add printks to retrieve the same data?
>>
> Sorry for the late reply; I've been busy testing the subpage patchset
> (and unfortunately without much progress).
>
> If bcc is not possible, you can still use ftrace events, but
> unfortunately I didn't find a good enough one. (In fact, the trace
> events for the block layer are pretty limited.)
>
> You can try adding printk()s in blk_account_io_done() to emulate
> what is done in biosnoop's trace_req_completion().
>
> The time delta is not important; we only need the device name,
> sector, and length.
>

Tip: there are ftrace events called block:block_rq_issue and
block:block_rq_complete that provide that information, so there is no
need to add printk()s.
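
For reference, a typical tracefs session using those two events might
look like this (run as root; the tracefs mount point and exact event
fields can vary between kernels, so this is only a sketch):

```sh
cd /sys/kernel/tracing                       # or /sys/kernel/debug/tracing
echo 1 > events/block/block_rq_issue/enable
echo 1 > events/block/block_rq_complete/enable
echo 1 > tracing_on

mount /dev/mapper/<name> /mnt                # reproduce the failing mount

echo 0 > tracing_on
head -n 40 trace    # each event line includes dev, sector, and nr_sector
```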

>
> Thanks,
> Qu


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-01  5:49                                       ` Su Yue
@ 2021-02-04  6:16                                         ` Erik Jensen
  2021-02-06  1:57                                           ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-02-04  6:16 UTC (permalink / raw)
  To: Su Yue; +Cc: Qu Wenruo, Hugo Mills, linux-btrfs

On Sun, Jan 31, 2021 at 9:50 PM Su Yue <l@damenly.su> wrote:
> On Mon 01 Feb 2021 at 10:35, Qu Wenruo <quwenruo.btrfs@gmx.com>
> wrote:
> > On 2021/1/29 下午2:39, Erik Jensen wrote:
> >> On Mon, Jan 25, 2021 at 8:54 PM Erik Jensen
> >> <erikjensen@rkjnsn.net> wrote:
> >>> On Wed, Jan 20, 2021 at 1:08 AM Erik Jensen
> >>> <erikjensen@rkjnsn.net> wrote:
> >>>> On Wed, Jan 20, 2021 at 12:31 AM Qu Wenruo
> >>>> <quwenruo.btrfs@gmx.com> wrote:
> >>>>> On 2021/1/20 下午4:21, Qu Wenruo wrote:
> >>>>>> On 2021/1/19 下午5:28, Erik Jensen wrote:
> >>>>>>> On Mon, Jan 18, 2021 at 9:22 PM Erik Jensen
> >>>>>>> <erikjensen@rkjnsn.net>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> On Mon, Jan 18, 2021 at 4:12 AM Erik Jensen
> >>>>>>>> <erikjensen@rkjnsn.net>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> The offending system is indeed ARMv7 (specifically a
> >>>>>>>>> Marvell ARMADA®
> >>>>>>>>> 388), but I believe the Broadcom BCM2835 in my Raspberry
> >>>>>>>>> Pi is
> >>>>>>>>> actually ARMv6 (with hardware float support).
> >>>>>>>>
> >>>>>>>> Using NBD, I have verified that I receive the same error
> >>>>>>>> when
> >>>>>>>> attempting to mount the filesystem on my ARMv6 Raspberry
> >>>>>>>> Pi:
> >>>>>>>> [ 3491.339572] BTRFS info (device dm-4): disk space
> >>>>>>>> caching is enabled
> >>>>>>>> [ 3491.394584] BTRFS info (device dm-4): has skinny
> >>>>>>>> extents
> >>>>>>>> [ 3492.385095] BTRFS error (device dm-4): bad tree block
> >>>>>>>> start, want
> >>>>>>>> 26207780683776 have 3395945502747707095
> >>>>>>>> [ 3492.514071] BTRFS error (device dm-4): bad tree block
> >>>>>>>> start, want
> >>>>>>>> 26207780683776 have 3395945502747707095
> >>>>>>>> [ 3492.553599] BTRFS warning (device dm-4): failed to
> >>>>>>>> read tree root
> >>>>>>>> [ 3492.865368] BTRFS error (device dm-4): open_ctree
> >>>>>>>> failed
> >>>>>>>>
> >>>>>>>> The Raspberry Pi is running Linux 5.4.83.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Okay, after some more testing, ARM seems to be irrelevant,
> >>>>>>> and 32-bit
> >>>>>>> is the key factor. On a whim, I booted up an i686, 5.8.14
> >>>>>>> kernel in a
> >>>>>>> VM, attached the drives via NBD, ran cryptsetup, tried to
> >>>>>>> mount, and…
> >>>>>>> I got the exact same error message.
> >>>>>>>
> >>>>>> My educated guess is on 32bit platforms, we passed
> >>>>>> incorrect sector into
> >>>>>> bio, thus gave us garbage.
> >>>>>
> >>>>> To prove that, you can use bcc tool to verify it.
> >>>>> biosnoop can do that:
> >>>>> https://github.com/iovisor/bcc/blob/master/tools/biosnoop_example.txt
> >>>>>
> >>>>> Just try mount the fs with biosnoop running.
> >>>>> With "btrfs ins dump-tree -t chunk <dev>", we can manually
> >>>>> calculate the
> >>>>> offset of each read to see if they matches.
> >>>>> If not match, it would prove my assumption and give us a
> >>>>> pretty good
> >>>>> clue to fix.
> >>>>>
> >>>>> Thanks,
> >>>>> Qu
> >>>>>
> >>>>>>
> >>>>>> Is this bug happening only on the fs, or any other btrfs
> >>>>>> can also
> >>>>>> trigger similar problems on 32bit platforms?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Qu
> >>>>
> >>>> I have only observed this error on this file system.
> >>>> Additionally, the
> >>>> error mounting with the NAS only started after I did a `btrfs
> >>>> replace`
> >>>> on all five 8TB drives using an x86_64 system. (Ironically, I
> >>>> did this
> >>>> with the goal of making it faster to use the filesystem on
> >>>> the NAS by
> >>>> re-encrypting the drives to use a cipher supported by my
> >>>> NAS's crypto
> >>>> accelerator.)
> >>>>
> >>>> Maybe this process of shuffling 40TB around caused some value
> >>>> in the
> >>>> filesystem to increment to the point that a calculation using
> >>>> it
> >>>> overflows on 32-bit systems?
> >>>>
> >>>> I should be able to try biosnoop later this week, and I'll
> >>>> report back
> >>>> with the results.
> >>>
> >>> Okay, I tried running biosnoop, but I seem to be running into
> >>> this
> >>> bug: https://github.com/iovisor/bcc/issues/3241 (That bug was
> >>> reported
> >>> for cpudist, but I'm seeing the same error when I try to run
> >>> biosnoop.)
> >>>
> >>> Anything else I can try?
> >>
> >> Is it possible to add printks to retrieve the same data?
> >>
> > Sorry for the late reply, busying testing subpage patchset. (And
> > unfortunately no much process).
> >
> > If bcc is not possible, you can still use ftrace events, but
> > unfortunately I didn't find good enough one. (In fact, the trace
> > events
> > for block layer is pretty limited).
> >
> > You can try to add printk()s in function blk_account_io_done()
> > to
> > emulate what's done in function trace_req_completion() of
> > biosnoop.
> >
> > The time delta is not important, we only need the device name,
> > sector
> > and length.
> >
>
> Tips: There are ftrace events called block:block_rq_issue and
> block:block_rq_complete to fetch those infomation. No need to
> add printk().
>
> >
> > Thanks,
> > Qu
>

Okay, here's the output of the trace:
https://gist.github.com/rkjnsn/4cf606874962b5a0284249b2f2e934f5

And here's the output of dump-tree:
https://gist.github.com/rkjnsn/630b558eaf90369478d670a1cb54b40f

One important note is that ftrace only captured requests at the
underlying block device (nbd, in this case), not at the device mapper
level. The encryption header on these drives is 16 MiB, so the offset
reported in the trace will be 16777216 bytes larger than the offset
btrfs was actually trying to read at the time.

In case it's helpful, I believe this is the mapping of which
(encrypted) nbd device node in the trace corresponds to which
(decrypted) filesystem device:
43,0    33c75e20-26f2-4328-a565-5ef3484832aa
43,32   9bdfdb8f-abfb-47c5-90af-d360d754a958
43,64   39a9463d-65f5-499b-bca8-dae6b52eb729
43,96   f1174dea-ea10-42f2-96b4-4589a2980684
43,128  e669d804-6ea2-4516-8536-1d266f88ebad
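
To make that conversion concrete, here's a small Python helper (the function
name is mine, purely for illustration) that maps a sector number from the
nbd-level trace back to the byte offset btrfs requested on the decrypted
device, assuming the 16 MiB header sits entirely before the payload:

```python
LUKS_HEADER = 16 * 1024 * 1024  # 16 MiB encryption header on each drive
SECTOR = 512                    # trace sectors are 512-byte units

def trace_sector_to_btrfs_offset(sector):
    """Convert a sector from the nbd-level trace to the byte offset
    btrfs actually requested on the decrypted device."""
    return sector * SECTOR - LUKS_HEADER

# e.g. a read reported at sector 32768 on the nbd device corresponds
# to byte 0 of the btrfs device:
print(trace_sector_to_btrfs_offset(32768))  # → 0
```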

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-04  6:16                                         ` Erik Jensen
@ 2021-02-06  1:57                                           ` Erik Jensen
  2021-02-10  5:47                                             ` Qu Wenruo
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-02-06  1:57 UTC (permalink / raw)
  To: Su Yue; +Cc: Qu Wenruo, Hugo Mills, linux-btrfs

On Wed, Feb 3, 2021 at 10:16 PM Erik Jensen <erikjensen@rkjnsn.net> wrote:
> On Sun, Jan 31, 2021 at 9:50 PM Su Yue <l@damenly.su> wrote:
> > [...]
>
> Okay, here's the output of the trace:
> https://gist.github.com/rkjnsn/4cf606874962b5a0284249b2f2e934f5
>
> And here's the output dump-tree:
> https://gist.github.com/rkjnsn/630b558eaf90369478d670a1cb54b40f
>
> One important note is that ftrace only captured requests at the
> underlying block device (nbd, in this case), not at the device mapper
> level. The encryption header on these drives is 16 MiB, so the offset
> reported in the trace will be 16777216 bytes larger than the offset
> brtfs was actually trying to read at the time.
>
> In case it's helpful, I believe this is the mapping of which
> (encrypted) nbd device node in the trace corresponds to which
> (decrypted) filesystem device:
> 43,0    33c75e20-26f2-4328-a565-5ef3484832aa
> 43,32   9bdfdb8f-abfb-47c5-90af-d360d754a958
> 43,64   39a9463d-65f5-499b-bca8-dae6b52eb729
> 43,96   f1174dea-ea10-42f2-96b4-4589a2980684
> 43,128  e669d804-6ea2-4516-8536-1d266f88ebad

What are the chances it's something simple like a long getting used
somewhere in the code that should actually be a 64-bit int?
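
One quick arithmetic sanity check of that guess (my own back-of-the-envelope
Python, and whether this particular limit is actually the culprit is pure
speculation at this point in the thread): a 32-bit page-cache index with
4 KiB pages can only address 16 TiB, and the offending logical address is
well past that:

```python
PAGE_SIZE = 4096   # page size on both the ARM board and the i686 VM
WORD_MAX = 2**32   # range of an unsigned 32-bit long (e.g. pgoff_t)

offending = 26207780683776  # logical address btrfs wanted (~23.8 TiB)

# Largest byte offset a 32-bit page-cache index can address:
limit = WORD_MAX * PAGE_SIZE  # 17592186044416 bytes = 16 TiB

print(offending > limit)            # → True: the tree block lies beyond 16 TiB
print(offending >> 12 > WORD_MAX)   # → True: its page index overflows 32 bits
```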

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-06  1:57                                           ` Erik Jensen
@ 2021-02-10  5:47                                             ` Qu Wenruo
  2021-02-10 22:17                                               ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-02-10  5:47 UTC (permalink / raw)
  To: Erik Jensen, Su Yue; +Cc: Hugo Mills, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 7403 bytes --]



On 2021/2/6 上午9:57, Erik Jensen wrote:
> On Wed, Feb 3, 2021 at 10:16 PM Erik Jensen <erikjensen@rkjnsn.net> wrote:
>> [...]
>>
>> Okay, here's the output of the trace:
>> https://gist.github.com/rkjnsn/4cf606874962b5a0284249b2f2e934f5
>>
>> And here's the output dump-tree:
>> https://gist.github.com/rkjnsn/630b558eaf90369478d670a1cb54b40f
>>
>> One important note is that ftrace only captured requests at the
>> underlying block device (nbd, in this case), not at the device mapper
>> level. The encryption header on these drives is 16 MiB, so the offset
>> reported in the trace will be 16777216 bytes larger than the offset
>> brtfs was actually trying to read at the time.
>>
>> In case it's helpful, I believe this is the mapping of which
>> (encrypted) nbd device node in the trace corresponds to which
>> (decrypted) filesystem device:
>> 43,0    33c75e20-26f2-4328-a565-5ef3484832aa
>> 43,32   9bdfdb8f-abfb-47c5-90af-d360d754a958
>> 43,64   39a9463d-65f5-499b-bca8-dae6b52eb729
>> 43,96   f1174dea-ea10-42f2-96b4-4589a2980684
>> 43,128  e669d804-6ea2-4516-8536-1d266f88ebad
>
> What are the chances it's something simple like a long getting used
> somewhere in the code that should actually be a 64-bit int?
>
That's what I expected, but I didn't find anything obviously suspicious yet.

Unfortunately I didn't get much useful info from the trace events,
as a lot of the values don't even make sense to me...

But the chunk tree dump proves to be more useful.

Firstly, the offending tree block doesn't even fall inside any chunk range.

The offending tree block is 26207780683776, but the tree dump doesn't
have any range there.

The highest chunk is at 5958289850368 + 4294967296, still one digit
short of the expected value.

I'm surprised we didn't even get an error for that, which may
indicate our chunk mapping is incorrect too.

Would you please try the following diff on the 32bit system and report
back the dmesg?

The diff adds the following debug output:
- when we try to read one tree block
- when a bio is mapped to read device
- when a new chunk is added to chunk tree

Thanks,
Qu

[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 2049 bytes --]

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7f689ad7709c..a97399f5ac6b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5573,6 +5573,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
 		return 0;
 
+	pr_info("%s: eb->start=%llu mirror=%d\n", __func__, eb->start,
+			mirror_num);
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index badb972919eb..03dd432b9812 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6374,6 +6374,11 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
 	btrfs_io_bio(bio)->device = dev;
 	bio->bi_end_io = btrfs_end_bio;
 	bio->bi_iter.bi_sector = physical >> 9;
+
+	pr_info("%s: rw %d 0x%x, phy=%llu sector=%llu dev_id=%llu size=%u\n", __func__,
+		bio_op(bio), bio->bi_opf, ((u64)bio->bi_iter.bi_sector) << 9,
+		bio->bi_iter.bi_sector,
+		dev->devid, bio->bi_iter.bi_size);
 	btrfs_debug_in_rcu(fs_info,
 	"btrfs_map_bio: rw %d 0x%x, sector=%llu, dev=%lu (%s id %llu), size=%u",
 		bio_op(bio), bio->bi_opf, bio->bi_iter.bi_sector,
@@ -6670,6 +6675,8 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf,
 		return -ENOMEM;
 	}
 
+	pr_info("%s: chunk start=%llu len=%llu num_stripes=%d type=0x%llx\n", __func__,
+		logical, length, num_stripes, btrfs_chunk_type(leaf, chunk));
 	set_bit(EXTENT_FLAG_FS_MAPPING, &em->flags);
 	em->map_lookup = map;
 	em->start = logical;
@@ -6694,6 +6701,9 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf,
 		read_extent_buffer(leaf, uuid, (unsigned long)
 				   btrfs_stripe_dev_uuid_nr(chunk, i),
 				   BTRFS_UUID_SIZE);
+		pr_info("%s:    stripe %u phy=%llu devid=%llu\n", __func__,
+			i, btrfs_stripe_offset_nr(leaf, chunk, i),
+			devid);
 		map->stripes[i].dev = btrfs_find_device(fs_info->fs_devices,
 							devid, uuid, NULL);
 		if (!map->stripes[i].dev &&

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-10  5:47                                             ` Qu Wenruo
@ 2021-02-10 22:17                                               ` Erik Jensen
  2021-02-10 23:47                                                 ` Qu Wenruo
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-02-10 22:17 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Su Yue, Hugo Mills, linux-btrfs

On Tue, Feb 9, 2021 at 9:47 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2021/2/6 上午9:57, Erik Jensen wrote:
> > [...]
> >
> > What are the chances it's something simple like a long getting used
> > somewhere in the code that should actually be a 64-bit int?
> >
> That's what I expected, but I didn't find anything obviously suspicious yet.
>
> Unfortunately I didn't get much useful info from the trace events.
> As a lot of the values doesn't even make sense to me....
>
> But the chunk tree dump proves to be more useful.
>
> Firstly, the offending tree block doesn't even occur in chunk chunk ranges.
>
> The offending tree block is 26207780683776, but the tree dump doesn't
> have any range there.
>
> The highest chunk is at 5958289850368 + 4294967296, still one digit
> lower than the expected value.
>
> I'm surprised we didn't even get any error for that, thus it may
> indicate our chunk mapping is incorrect too.
>
> Would you please try the following diff on the 32bit system and report
> back the dmesg?
>
> The diff adds the following debug output:
> - when we try to read one tree block
> - when a bio is mapped to read device
> - when a new chunk is added to chunk tree
>
> Thanks,
> Qu

Okay, here's the dmesg output from attempting to mount the filesystem:
https://gist.github.com/rkjnsn/914651efdca53c83199029de6bb61e20

I captured this on my 32-bit x86 VM, as it's much faster to rebuild
the kernel there than on my ARM board, and it fails with the same
error.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-10 22:17                                               ` Erik Jensen
@ 2021-02-10 23:47                                                 ` Qu Wenruo
  2021-02-18  1:24                                                   ` Qu Wenruo
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-02-10 23:47 UTC (permalink / raw)
  To: Erik Jensen; +Cc: Su Yue, Hugo Mills, linux-btrfs



On 2021/2/11 上午6:17, Erik Jensen wrote:
> On Tue, Feb 9, 2021 at 9:47 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
[...]
>
> Okay, here's the dmesg output from attempting to mount the filesystem:
> https://gist.github.com/rkjnsn/914651efdca53c83199029de6bb61e20
>
> I captured this on my 32-bit x86 VM, as it's much faster to rebuild
> the kernel there than on my ARM board, and it fails with the same
> error.
>

This is indeed much better.

The involved things are:

[   84.463147] read_one_chunk: chunk start=26207148048384 len=1073741824 num_stripes=2 type=0x14
[   84.463148] read_one_chunk:    stripe 0 phy=6477927415808 devid=5
[   84.463149] read_one_chunk:    stripe 1 phy=6477927415808 devid=4

Above is the chunk for the offending tree block.

[   84.463724] read_extent_buffer_pages: eb->start=26207780683776 mirror=0
[   84.463731] submit_stripe_bio: rw 0 0x1000, phy=2118735708160 sector=4138155680 dev_id=3 size=16384
[   84.470793] BTRFS error (device dm-4): bad tree block start, want 26207780683776 have 3395945502747707095

But when the metadata read happens, the physical address and devid are
completely wrong.

The chunk doesn't have devid 3 in it at all, but we still get the wrong
mapping.

Furthermore, that physical address and devid belong to chunk 8614760677376,
which is a raid5 data chunk.

So there is definitely something wrong in btrfs chunk mapping on 32-bit.

I'll craft a new debug diff for you once I've pinned down what could be
wrong.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-10 23:47                                                 ` Qu Wenruo
@ 2021-02-18  1:24                                                   ` Qu Wenruo
  2021-02-18  4:03                                                     ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-02-18  1:24 UTC (permalink / raw)
  To: Erik Jensen; +Cc: Su Yue, Hugo Mills, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3019 bytes --]



On 2021/2/11 7:47 AM, Qu Wenruo wrote:
> [...]
> So there is definitely something wrong in btrfs chunk mapping on 32bit.
>
> I'll craft a newer debug diff for you after I pinned down which can be
> wrong.

Sorry for the delay, mostly due to the lunar new year vacation.

Here is the new diff; it should be applied on top of the previous one.

This new diff adds extra debug info inside __btrfs_map_block().

BTW, you only need to rebuild the btrfs module to test it; hopefully this
saves you some time.

Although the best case would be if I could get a small enough image to
reproduce the problem locally...

Thanks,
Qu

[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 875 bytes --]

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b8fab44394f5..95b4815dc04b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6230,6 +6230,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 
 	map = em->map_lookup;
 
+	pr_info("%s: logical=%llu chunk start=%llu len=%llu type=0x%llx\n", __func__,
+		logical, em->start, em->len, map->type);
 	*length = geom.len;
 	stripe_len = geom.stripe_len;
 	stripe_nr = geom.stripe_nr;
@@ -6372,6 +6374,9 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 		bbio->stripes[i].physical = map->stripes[stripe_index].physical +
 			stripe_offset + stripe_nr * map->stripe_len;
 		bbio->stripes[i].dev = map->stripes[stripe_index].dev;
+		pr_info("%s: stripe[%d] devid=%llu phy=%llu\n", __func__, i,
+				bbio->stripes[i].dev->devid,
+				bbio->stripes[i].physical);
 		stripe_index++;
 	}
 

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-18  1:24                                                   ` Qu Wenruo
@ 2021-02-18  4:03                                                     ` Erik Jensen
  2021-02-18  5:24                                                       ` Qu Wenruo
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-02-18  4:03 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Su Yue, Hugo Mills, linux-btrfs

On Wed, Feb 17, 2021 at 5:24 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> [...]
> Sorry for the delay, mostly due to lunar new year vocation.
>
> Here is the new diff, it should be applied upon previous diff.
>
> This new diff would add extra debug info inside __btrfs_map_block().
>
> BTW, you only need to rebuild btrfs module to test it, hopes this saves
> you some time.
>
> Although if I could got a small enough image to reproduce locally, it
> would be the best case...
>
> Thanks,
> Qu

Okay, here is the output with both patches applied:
https://gist.github.com/rkjnsn/7139eaf855687c6bd4ce371f88e28a9e

I've only run into the issue on this filesystem, which is quite large,
so I'm not sure how I would even attempt to make a reduced test case.

Thanks!

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-18  4:03                                                     ` Erik Jensen
@ 2021-02-18  5:24                                                       ` Qu Wenruo
  2021-02-18  5:49                                                         ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-02-18  5:24 UTC (permalink / raw)
  To: Erik Jensen, Qu Wenruo; +Cc: Su Yue, Hugo Mills, linux-btrfs



On 2021/2/18 12:03 PM, Erik Jensen wrote:
> [...]
>
> Okay, here is the output with both patches applied:
> https://gist.github.com/rkjnsn/7139eaf855687c6bd4ce371f88e28a9e

Got it now.

[  295.249182] read_extent_buffer_pages: eb->start=26207780683776 mirror=0
[  295.249188] __btrfs_map_block: logical=8615594639360 chunk
start=8614760677376 len=4294967296 type=0x81
[  295.249189] __btrfs_map_block: stripe[0] devid=3 phy=2118735708160

Note that the initial request is to read from 26207780683776,
but inside btrfs_map_block() we end up reading from 8615594639360.

This is totally screwed up in an unexpected way.

26207780683776 = 0x17d5f9754000
8615594639360  = 0x07d5f9754000

Note the missing leading 1, which screws up the result.

The problem is likely in the calculation of the logical address, which
doesn't do a proper u64 conversion.

Would you like to test the single-line fix below?

Thanks,
Qu

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b8fab44394f5..69d728f5ff9e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6575,7 +6575,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 {
         struct btrfs_device *dev;
         struct bio *first_bio = bio;
-       u64 logical = bio->bi_iter.bi_sector << 9;
+       u64 logical = ((u64)bio->bi_iter.bi_sector) << 9;
         u64 length = 0;
         u64 map_length;
         int ret;


>

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-18  5:24                                                       ` Qu Wenruo
@ 2021-02-18  5:49                                                         ` Erik Jensen
  2021-02-18  6:09                                                           ` Qu Wenruo
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-02-18  5:49 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Su Yue, Hugo Mills, linux-btrfs

On Wed, Feb 17, 2021 at 9:24 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> [...]
> Would you like to test the single line fix below?
>
> Thanks,
> Qu
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index b8fab44394f5..69d728f5ff9e 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6575,7 +6575,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>  {
>          struct btrfs_device *dev;
>          struct bio *first_bio = bio;
> -       u64 logical = bio->bi_iter.bi_sector << 9;
> +       u64 logical = ((u64)bio->bi_iter.bi_sector) << 9;
>          u64 length = 0;
>          u64 map_length;
>          int ret;

So… it appears my kernel tree (Arch32's 5.10.14-arch1) already has that:

blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
                           int mirror_num)
{
        struct btrfs_device *dev;
        struct bio *first_bio = bio;
        u64 logical = (u64)bio->bi_iter.bi_sector << 9;
        u64 length = 0;
        u64 map_length;
        int ret;
        int dev_nr;
        int total_devs;
        struct btrfs_bio *bbio = NULL;

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-18  5:49                                                         ` Erik Jensen
@ 2021-02-18  6:09                                                           ` Qu Wenruo
  2021-02-18  6:59                                                             ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-02-18  6:09 UTC (permalink / raw)
  To: Erik Jensen; +Cc: Su Yue, Hugo Mills, linux-btrfs



On 2021/2/18 1:49 PM, Erik Jensen wrote:
> [...]
> So… it appears my kernel tree (Arch32's 5.10.14-arch1) already has that:
>

I also noticed that since the v5.2 kernel, bi_sector should already be
a u64.

So it's really strange that the left shift loses the high bits,
especially since the missing part is at bit 45, not at a 32-bit boundary.

Then what about this diff? It multiplies instead of using the dangerous
left shift.

(Also, I recommend keeping the previous debug diffs applied, so if this
doesn't work we still have a chance to see what's going wrong.)

Thanks,
Qu

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b8fab44394f5..15cea408a51f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6575,7 +6575,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 {
         struct btrfs_device *dev;
         struct bio *first_bio = bio;
-       u64 logical = bio->bi_iter.bi_sector << 9;
+       u64 logical = bio->bi_iter.bi_sector * 512ULL;
         u64 length = 0;
         u64 map_length;
         int ret;




^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-18  6:09                                                           ` Qu Wenruo
@ 2021-02-18  6:59                                                             ` Erik Jensen
  2021-02-18  7:24                                                               ` Qu Wenruo
  2021-02-18  7:25                                                               ` Erik Jensen
  0 siblings, 2 replies; 44+ messages in thread
From: Erik Jensen @ 2021-02-18  6:59 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Su Yue, Hugo Mills, linux-btrfs

On Wed, Feb 17, 2021 at 10:09 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> [...]
> Then what about this diff? It goes multiplying other than using
> dangerous left shift.
>
> (Also, it's recommended to still use previous debug diffs, so if it
> doesn't work we still have a chance to know what's going wrong).
>
> Thanks,
> Qu

No change. I added an extra debug line in btrfs_map_bio, and got the following:

btrfs_map_bio: bio->bi_iter.bi_sector=16827333280, logical=8615594639360

bio->bi_iter.bi_sector is 16827333280, not 51187071648, so it looks
like the top bit is already missing before the shift / multiplication.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-18  6:59                                                             ` Erik Jensen
@ 2021-02-18  7:24                                                               ` Qu Wenruo
  2021-02-18  7:59                                                                 ` Erik Jensen
  2021-02-18  7:25                                                               ` Erik Jensen
  1 sibling, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-02-18  7:24 UTC (permalink / raw)
  To: Erik Jensen; +Cc: Su Yue, Hugo Mills, linux-btrfs



On 2021/2/18 2:59 PM, Erik Jensen wrote:
> [...]
> No change. I added an extra debug line in btrfs_map_bio, and get the following:
>
> btrfs_map_bio: bio->bi_iter.bi_sector=16827333280, logical=8615594639360
>
> bio->bi_iter.bi_sector is 16827333280, not 51187071648, so it looks
> like the top bit is already missing before the shift / multiplication.
>
Special thanks to Su, who pointed out that page->index is still just an
unsigned long, which is not guaranteed to be 64 bits.

Thus page_offset(page), which takes page->index and left-shifts it, can
easily go wrong.

Mind testing the following debug diff?

Thanks,
Qu

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4dfb3ead1175..794f97d6eda7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -6001,6 +6001,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
                         }

                         ClearPageError(page);
+                       pr_info("%s: eb start=%llu i=%d page_offset=%llu\n",
+                               __func__, eb->start, i, page_offset(page));
                         err = submit_extent_page(REQ_OP_READ | REQ_META, NULL,
                                          page, page_offset(page), PAGE_SIZE, 0,
                                          &bio, end_bio_extent_readpage,

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-18  6:59                                                             ` Erik Jensen
  2021-02-18  7:24                                                               ` Qu Wenruo
@ 2021-02-18  7:25                                                               ` Erik Jensen
  1 sibling, 0 replies; 44+ messages in thread
From: Erik Jensen @ 2021-02-18  7:25 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Su Yue, Hugo Mills, linux-btrfs

On Wed, Feb 17, 2021 at 10:59 PM Erik Jensen <erikjensen@rkjnsn.net> wrote:
> [...]
> No change. I added an extra debug line in btrfs_map_bio, and get the following:
>
> btrfs_map_bio: bio->bi_iter.bi_sector=16827333280, logical=8615594639360
>
> bio->bi_iter.bi_sector is 16827333280, not 51187071648, so it looks
> like the top bit is already missing before the shift / multiplication.

Possibly relevant observation: if you take 26207780683776 and divide
it by 4096, you get 6398383956, which is a 33-bit number. If you
truncate that to 32 bits and then multiply by 4096, you get
8615594639360. Not sure whether 4096 is relevant here because it's
the kernel page size, because the block device has a 4096-byte sector
size (both physical and logical), something else, or whether it's a
red herring.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: "bad tree block start" when trying to mount on ARM
  2021-02-18  7:24                                                               ` Qu Wenruo
@ 2021-02-18  7:59                                                                 ` Erik Jensen
  2021-02-18  8:38                                                                   ` Qu Wenruo
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-02-18  7:59 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Su Yue, Hugo Mills, linux-btrfs

On Wed, Feb 17, 2021 at 11:24 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2021/2/18 2:59 PM, Erik Jensen wrote:
> > On Wed, Feb 17, 2021 at 10:09 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >> On 2021/2/18 1:49 PM, Erik Jensen wrote:
> >>> On Wed, Feb 17, 2021 at 9:24 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>>> Got it now.
> >>>>
> >>>> [  295.249182] read_extent_buffer_pages: eb->start=26207780683776 mirror=0
> >>>> [  295.249188] __btrfs_map_block: logical=8615594639360 chunk
> >>>> start=8614760677376 len=4294967296 type=0x81
> >>>> [  295.249189] __btrfs_map_block: stripe[0] devid=3 phy=2118735708160
> >>>>
> >>>> Note that, the initial request is to read from 26207780683776.
> >>>> But inside btrfs_map_block(), we want to read from 8615594639360.
> >>>>
> >>>> This is totally screwed up in an unexpected way.
> >>>>
> >>>> 26207780683776 = 0x17d5f9754000
> >>>> 8615594639360  = 0x07d5f9754000
> >>>>
> >>>> See the missing leading 1, which screws up the result.
> >>>>
> >>>> The problem should be in the logical address calculation, which
> >>>> doesn't do a proper u64 conversion.
> >>>>
> >>>> Would you like to test the single line fix below?
> >>>>
> >>>> Thanks,
> >>>> Qu
> >>>>
> >>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >>>> index b8fab44394f5..69d728f5ff9e 100644
> >>>> --- a/fs/btrfs/volumes.c
> >>>> +++ b/fs/btrfs/volumes.c
> >>>> @@ -6575,7 +6575,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info
> >>>> *fs_info, struct bio *bio,
> >>>>     {
> >>>>            struct btrfs_device *dev;
> >>>>            struct bio *first_bio = bio;
> >>>> -       u64 logical = bio->bi_iter.bi_sector << 9;
> >>>> +       u64 logical = ((u64)bio->bi_iter.bi_sector) << 9;
> >>>>            u64 length = 0;
> >>>>            u64 map_length;
> >>>>            int ret;
> >>>
> >>> So… it appears my kernel tree (Arch32's 5.10.14-arch1) already has that:
> >>>
> >>
> >> And I also noticed that since v5.2 kernel, we should already have
> >> bi_sector as u64.
> >>
> >> So why that left shift would get higher bits missing is really strange.
> >> Especially the missing part is just at the 45 bit, not 32 bit boundary.
> >>
> >> Then what about this diff? It uses multiplication instead of the
> >> dangerous left shift.
> >>
> >> (Also, it's recommended to still use previous debug diffs, so if it
> >> doesn't work we still have a chance to know what's going wrong).
> >>
> >> Thanks,
> >> Qu
> >
> > No change. I added an extra debug line in btrfs_map_bio, and get the following:
> >
> > btrfs_map_bio: bio->bi_iter.bi_sector=16827333280, logical=8615594639360
> >
> > bio->bi_iter.bi_sector is 16827333280, not 51187071648, so it looks
> > like the top bit is already missing before the shift / multiplication.
> >
> Special thanks to Su, who pointed out that page->index is still just
> an unsigned long, which is not guaranteed to be 64 bits.
>
> Thus page_offset(page), which takes page->index and left-shifts it,
> can easily go wrong.
>
> Mind to test the following debug diff?
>
> Thanks,
> Qu
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 4dfb3ead1175..794f97d6eda7 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -6001,6 +6001,8 @@ int read_extent_buffer_pages(struct extent_buffer
> *eb, int wait, int mirror_num)
>                          }
>
>                          ClearPageError(page);
> +                       pr_info("%s: eb start=%llu i=%d page_offset=%llu\n",
> +                               __func__, eb->start, i, page_offset(page));
>                          err = submit_extent_page(REQ_OP_READ |
> REQ_META, NULL,
>                                           page, page_offset(page),
> PAGE_SIZE, 0,
>                                           &bio, end_bio_extent_readpage,

Here's the new dmesg log:
https://gist.github.com/rkjnsn/5153682d5be865c13966d342ea7cbe9e

Relevant looking new lines:

[   52.903379] read_extent_buffer_pages: eb->start=26207780683776 mirror=0
[   52.903380] read_extent_buffer_pages: eb start=26207780683776 i=0
page_offset=8615594639360
[   52.903400] read_extent_buffer_pages: eb start=26207780683776 i=1
page_offset=8615594643456
[   52.903403] read_extent_buffer_pages: eb start=26207780683776 i=2
page_offset=8615594647552
[   52.903403] read_extent_buffer_pages: eb start=26207780683776 i=3
page_offset=8615594651648


* Re: "bad tree block start" when trying to mount on ARM
  2021-02-18  7:59                                                                 ` Erik Jensen
@ 2021-02-18  8:38                                                                   ` Qu Wenruo
  2021-02-18  8:52                                                                     ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-02-18  8:38 UTC (permalink / raw)
  To: Erik Jensen; +Cc: Su Yue, Hugo Mills, linux-btrfs



On 2021/2/18 3:59 PM, Erik Jensen wrote:
> On Wed, Feb 17, 2021 at 11:24 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> On 2021/2/18 2:59 PM, Erik Jensen wrote:
>>> On Wed, Feb 17, 2021 at 10:09 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>> On 2021/2/18 1:49 PM, Erik Jensen wrote:
>>>>> On Wed, Feb 17, 2021 at 9:24 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>> Got it now.
>>>>>>
>>>>>> [  295.249182] read_extent_buffer_pages: eb->start=26207780683776 mirror=0
>>>>>> [  295.249188] __btrfs_map_block: logical=8615594639360 chunk
>>>>>> start=8614760677376 len=4294967296 type=0x81
>>>>>> [  295.249189] __btrfs_map_block: stripe[0] devid=3 phy=2118735708160
>>>>>>
>>>>>> Note that, the initial request is to read from 26207780683776.
>>>>>> But inside btrfs_map_block(), we want to read from 8615594639360.
>>>>>>
> >>>>>> This is totally screwed up in an unexpected way.
>>>>>>
>>>>>> 26207780683776 = 0x17d5f9754000
>>>>>> 8615594639360  = 0x07d5f9754000
>>>>>>
>>>>>> See the missing leading 1, which screws up the result.
>>>>>>
> >>>>>> The problem should be in the logical address calculation, which
> >>>>>> doesn't do a proper u64 conversion.
>>>>>>
>>>>>> Would you like to test the single line fix below?
>>>>>>
>>>>>> Thanks,
>>>>>> Qu
>>>>>>
>>>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>>>> index b8fab44394f5..69d728f5ff9e 100644
>>>>>> --- a/fs/btrfs/volumes.c
>>>>>> +++ b/fs/btrfs/volumes.c
>>>>>> @@ -6575,7 +6575,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info
>>>>>> *fs_info, struct bio *bio,
>>>>>>      {
>>>>>>             struct btrfs_device *dev;
>>>>>>             struct bio *first_bio = bio;
>>>>>> -       u64 logical = bio->bi_iter.bi_sector << 9;
>>>>>> +       u64 logical = ((u64)bio->bi_iter.bi_sector) << 9;
>>>>>>             u64 length = 0;
>>>>>>             u64 map_length;
>>>>>>             int ret;
>>>>>
>>>>> So… it appears my kernel tree (Arch32's 5.10.14-arch1) already has that:
>>>>>
>>>>
>>>> And I also noticed that since v5.2 kernel, we should already have
>>>> bi_sector as u64.
>>>>
>>>> So why that left shift would get higher bits missing is really strange.
>>>> Especially the missing part is just at the 45 bit, not 32 bit boundary.
>>>>
> >>>> Then what about this diff? It uses multiplication instead of the
> >>>> dangerous left shift.
>>>>
>>>> (Also, it's recommended to still use previous debug diffs, so if it
>>>> doesn't work we still have a chance to know what's going wrong).
>>>>
>>>> Thanks,
>>>> Qu
>>>
>>> No change. I added an extra debug line in btrfs_map_bio, and get the following:
>>>
>>> btrfs_map_bio: bio->bi_iter.bi_sector=16827333280, logical=8615594639360
>>>
>>> bio->bi_iter.bi_sector is 16827333280, not 51187071648, so it looks
>>> like the top bit is already missing before the shift / multiplication.
>>>
>> Special thanks to Su, who pointed out that page->index is still just
>> an unsigned long, which is not guaranteed to be 64 bits.
>>
>> Thus page_offset(page), which takes page->index and left-shifts it,
>> can easily go wrong.
>>
>> Mind to test the following debug diff?
>>
>> Thanks,
>> Qu
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 4dfb3ead1175..794f97d6eda7 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -6001,6 +6001,8 @@ int read_extent_buffer_pages(struct extent_buffer
>> *eb, int wait, int mirror_num)
>>                           }
>>
>>                           ClearPageError(page);
>> +                       pr_info("%s: eb start=%llu i=%d page_offset=%llu\n",
>> +                               __func__, eb->start, i, page_offset(page));
>>                           err = submit_extent_page(REQ_OP_READ |
>> REQ_META, NULL,
>>                                            page, page_offset(page),
>> PAGE_SIZE, 0,
>>                                            &bio, end_bio_extent_readpage,
>
> Here's the new dmesg log:
> https://gist.github.com/rkjnsn/5153682d5be865c13966d342ea7cbe9e
>
> Relevant looking new lines:
>
> [   52.903379] read_extent_buffer_pages: eb->start=26207780683776 mirror=0
> [   52.903380] read_extent_buffer_pages: eb start=26207780683776 i=0
> page_offset=8615594639360
> [   52.903400] read_extent_buffer_pages: eb start=26207780683776 i=1
> page_offset=8615594643456
> [   52.903403] read_extent_buffer_pages: eb start=26207780683776 i=2
> page_offset=8615594647552
> [   52.903403] read_extent_buffer_pages: eb start=26207780683776 i=3
> page_offset=8615594651648
>
We got it!

The eb->start mismatches page_offset(); this means something is wrong
with page->index.

Since page->index is just an unsigned long, we truncate some high bits
when we initialize it from a real u64.

And when we convert it back to u64, the truncated bits lead to the
result above.

The fix would be pretty tricky, would need the MM folks involved, and
may take a much longer time.

I guess this is a known bug, as the page->index limit means we can't
handle files over 4T on 32-bit systems, even if the underlying fs can
handle them (just like what you hit).

Thanks,
Qu


* Re: "bad tree block start" when trying to mount on ARM
  2021-02-18  8:38                                                                   ` Qu Wenruo
@ 2021-02-18  8:52                                                                     ` Erik Jensen
  2021-02-18  8:59                                                                       ` Qu Wenruo
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-02-18  8:52 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Su Yue, Hugo Mills, linux-btrfs

On Thu, Feb 18, 2021 at 12:38 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> We got it!
>
> The eb->start mismatches page_offset(); this means something is wrong
> with page->index.
>
> Since page->index is just an unsigned long, we truncate some high bits
> when we initialize it from a real u64.
>
> And when we convert it back to u64, the truncated bits lead to the
> result above.
>
> The fix would be pretty tricky, would need the MM folks involved, and
> may take a much longer time.
>
> I guess this is a known bug, as the page->index limit means we can't
> handle files over 4T on 32-bit systems, even if the underlying fs can
> handle them (just like what you hit).
>
> Thanks,
> Qu

Thanks for digging into it! Is there an existing bug or discussion I
can follow, or any other way I can be of assistance?


* Re: "bad tree block start" when trying to mount on ARM
  2021-02-18  8:52                                                                     ` Erik Jensen
@ 2021-02-18  8:59                                                                       ` Qu Wenruo
  2021-02-20  2:47                                                                         ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-02-18  8:59 UTC (permalink / raw)
  To: Erik Jensen; +Cc: Su Yue, Hugo Mills, linux-btrfs



On 2021/2/18 4:52 PM, Erik Jensen wrote:
> On Thu, Feb 18, 2021 at 12:38 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> We got it!
>>
>> The eb->start mismatches page_offset(); this means something is wrong
>> with page->index.
>>
>> Since page->index is just an unsigned long, we truncate some high bits
>> when we initialize it from a real u64.
>>
>> And when we convert it back to u64, the truncated bits lead to the
>> result above.
>>
>> The fix would be pretty tricky, would need the MM folks involved, and
>> may take a much longer time.
>>
>> I guess this is a known bug, as the page->index limit means we can't
>> handle files over 4T on 32-bit systems, even if the underlying fs can
>> handle them (just like what you hit).
>>
>> Thanks,
>> Qu
>
> Thanks for digging into it! Is there an existing bug or discussion I
> can follow, or any other way I can be of assistance?
>

Just sent a mail to the fsdevel mailing list, titled "page->index
limitation on 32bit system?".

I guess your experience as a real-world user would definitely bring
more weight to the discussion.

Thanks,
Qu


* Re: "bad tree block start" when trying to mount on ARM
  2021-02-18  8:59                                                                       ` Qu Wenruo
@ 2021-02-20  2:47                                                                         ` Erik Jensen
  2021-02-20  3:16                                                                           ` Qu Wenruo
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-02-20  2:47 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Su Yue, Hugo Mills, linux-btrfs

On Thu, Feb 18, 2021 at 12:59 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> Just sent a mail to the fsdevel mailing list, titled "page->index
> limitation on 32bit system?".
>
> I guess your experience as a real-world user would definitely bring
> more weight to the discussion.
>
> Thanks,
> Qu

Given that it sounds like the issue is the metadata address space, and
given that I surely don't actually have 16TiB of metadata on a 24TiB
file system (indeed, Metadata, RAID1: total=30.00GiB, used=28.91GiB),
is there any way I could compact the metadata offsets into the lower
16TiB of the virtual metadata inode? Perhaps that could be something
balance could be taught to do? (Obviously, the initial run of such a
balance would have to be performed using a 64-bit system.)

Perhaps, on 32-bit, btrfs itself or some monitoring tool could even
kick off such a metadata balance automatically when the offset hits
10TiB to hopefully avoid ever reaching 16TiB?


* Re: "bad tree block start" when trying to mount on ARM
  2021-02-20  2:47                                                                         ` Erik Jensen
@ 2021-02-20  3:16                                                                           ` Qu Wenruo
  2021-02-20  4:28                                                                             ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-02-20  3:16 UTC (permalink / raw)
  To: Erik Jensen; +Cc: Su Yue, Hugo Mills, linux-btrfs



On 2021/2/20 10:47 AM, Erik Jensen wrote:
> On Thu, Feb 18, 2021 at 12:59 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> Just send a mail to the fs-devel mail list, titled "page->index
>> limitation on 32bit system?".
>>
>> I guess your experience as a real world user would definitely bring more
>> weight to the discussion.
>>
>> Thanks,
>> Qu
>
> Given that it sounds like the issue is the metadata address space, and
> given that I surely don't actually have 16TiB of metadata on a 24TiB
> file system (indeed, Metadata, RAID1: total=30.00GiB, used=28.91GiB),
> is there any way I could compact the metadata offsets into the lower
> 16TiB of the virtual metadata inode? Perhaps that could be something
> balance could be taught to do? (Obviously, the initial run of such a
> balance would have to be performed using a 64-bit system.)

Unfortunately, no.

Btrfs relies on increasing bytenr in the logical address space for
things like balance, thus we can't relocate chunks to smaller bytenr.

>
> Perhaps, on 32-bit, btrfs itself or some monitoring tool could even
> kick off such a metadata balance automatically when the offset hits
> 10TiB to hopefully avoid ever reaching 16TiB?
>
That would make it worse: each balanced block group can only move to a
higher bytenr, never a lower one, so it would accelerate the problem.

Thanks,
Qu


* Re: "bad tree block start" when trying to mount on ARM
  2021-02-20  3:16                                                                           ` Qu Wenruo
@ 2021-02-20  4:28                                                                             ` Erik Jensen
  2021-02-20  6:01                                                                               ` Qu Wenruo
  0 siblings, 1 reply; 44+ messages in thread
From: Erik Jensen @ 2021-02-20  4:28 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Su Yue, Hugo Mills, linux-btrfs

On Fri, Feb 19, 2021 at 7:16 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2021/2/20 10:47 AM, Erik Jensen wrote:
> > Given that it sounds like the issue is the metadata address space, and
> > given that I surely don't actually have 16TiB of metadata on a 24TiB
> > file system (indeed, Metadata, RAID1: total=30.00GiB, used=28.91GiB),
> > is there any way I could compact the metadata offsets into the lower
> > 16TiB of the virtual metadata inode? Perhaps that could be something
> > balance could be taught to do? (Obviously, the initial run of such a
> > balance would have to be performed using a 64-bit system.)
>
> Unfortunately, no.
>
> Btrfs relies on increasing bytenr in the logical address space for
> things like balance, thus we can't relocate chunks to smaller bytenr.

That's… unfortunate. How much relies on the assumption that bytenr is monotonic?

Brainstorming some ideas, is compacting the address space something
that could be done offline? E.g., maybe some two-pass process: first
something balance-like that bumps all of the metadata up to a compact
region of address space, starting at a new 16TiB boundary, and then a
follow up pass that just strips the top bits off?

Or maybe once all of the bytenrs are brought within 16TiB of each
other by balance, btrfs could just keep track of an offset that needs
to be applied when mapping page cache indexes?

Or maybe btrfs could use multiple virtual inodes on 32-bit systems,
one for each 16TiB block of address space with metadata in it? If this
were to ever grow to need more than a handful of virtual inodes, it
seems like a balance *would* actually help in this case by compacting
the metadata higher in the address space, allowing the virtual inodes
for lower in the address space to be dropped.

Or maybe btrfs could just not use the page cache for the metadata
inode once the offset exceeds 16TiB, and only cache at the block
layer? This would surely hurt performance, but at least the filesystem
could still be accessed.

Given that this issue appears to be not due to the size of the
filesystem, but merely how much I've used it, having the only solution
be to copy all of the data off, reformat the drives, and then restore
every time filesystem usage exceeds a certain threshold is… not very
satisfying.

Finally, I've never done kernel dev before, but I do have some C
experience, so if there is a solution that falls into the category of
seeming reasonable, likely to be accepted if implemented, but being
unlikely to get implemented given the low priority of supporting
32-bit systems, let me know and maybe I can carve out some time to
give it a try.


* Re: "bad tree block start" when trying to mount on ARM
  2021-02-20  4:28                                                                             ` Erik Jensen
@ 2021-02-20  6:01                                                                               ` Qu Wenruo
  2021-02-21  5:36                                                                                 ` Erik Jensen
  0 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2021-02-20  6:01 UTC (permalink / raw)
  To: Erik Jensen; +Cc: Su Yue, Hugo Mills, linux-btrfs



On 2021/2/20 12:28 PM, Erik Jensen wrote:
> On Fri, Feb 19, 2021 at 7:16 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> On 2021/2/20 10:47 AM, Erik Jensen wrote:
>>> Given that it sounds like the issue is the metadata address space, and
>>> given that I surely don't actually have 16TiB of metadata on a 24TiB
>>> file system (indeed, Metadata, RAID1: total=30.00GiB, used=28.91GiB),
>>> is there any way I could compact the metadata offsets into the lower
>>> 16TiB of the virtual metadata inode? Perhaps that could be something
>>> balance could be taught to do? (Obviously, the initial run of such a
>>> balance would have to be performed using a 64-bit system.)
>>
>> Unfortunately, no.
>>
>> Btrfs relies on increasing bytenr in the logical address space for
>> things like balance, thus we can't relocate chunks to smaller bytenr.
>
> That's… unfortunate. How much relies on the assumption that bytenr is monotonic?

IIRC mostly balance itself.

>
> Brainstorming some ideas, is compacting the address space something
> that could be done offline? E.g., maybe some two-pass process: first
> something balance-like that bumps all of the metadata up to a compact
> region of address space, starting at a new 16TiB boundary, and then a
> follow up pass that just strips the top bits off?

We need btrfs-progs support for off-line balancing.

I used to have this idea, but saw very limited use for it.

This would be the safest bet, but needs a lot of work, although in user
space.

>
> Or maybe once all of the bytenrs are brought within 16TiB of each
> other by balance, btrfs could just keep track of an offset that needs
> to be applied when mapping page cache indexes?

But further balance/new chunk allocation can still go beyond the limit.

This is the biggest problem, one that other filesystems don't need to
deal with: we can dynamically allocate chunks while they can't.

>
> Or maybe btrfs could use multiple virtual inodes on 32-bit systems,
> one for each 16TiB block of address space with metadata in it? If this
> were to ever grow to need more than a handful of virtual inodes, it
> seems like a balance *would* actually help in this case by compacting
> the metadata higher in the address space, allowing the virtual inodes
> for lower in the address space to be dropped.

This may be a good idea.

But the problem of test coverage is always here.

We can spend tons of lines, but in the end it will not really be well
tested, as such setups are really hard to cover.
>
> Or maybe btrfs could just not use the page cache for the metadata
> inode once the offset exceeds 16TiB, and only cache at the block
> layer? This would surely hurt performance, but at least the filesystem
> could still be accessed.

I don't believe it's really possible, unless we completely override the
XArray structure provided by MM and implement a btrfs-only one.

That's too costly.

>
> Given that this issue appears to be not due to the size of the
> filesystem, but merely how much I've used it, having the only solution
> be to copy all of the data off, reformat the drives, and then restore
> > every time filesystem usage exceeds a certain threshold is… not very
> satisfying.

Yeah, definitely not a good experience.

>
> Finally, I've never done kernel dev before, but I do have some C
> experience, so if there is a solution that falls into the category of
> seeming reasonable, likely to be accepted if implemented, but being
> unlikely to get implemented given the low priority of supporting
> 32-bit systems, let me know and maybe I can carve out some time to
> give it a try.
>
BTW, if you want things like a 64K page size while still keeping the 4K
sector size of your existing btrfs, then I guess you may be interested
in the recent subpage support, which allows btrfs to mount a 4K sector
size fs with a 64K page size.

Unfortunately it's still WIP, but it may fit your use case, as ARM
supports multiple page sizes (4K, 16K, 64K).
(Although we are only going to support 64K pages for now.)

Thanks,
Qu


* Re: "bad tree block start" when trying to mount on ARM
  2021-02-20  6:01                                                                               ` Qu Wenruo
@ 2021-02-21  5:36                                                                                 ` Erik Jensen
  0 siblings, 0 replies; 44+ messages in thread
From: Erik Jensen @ 2021-02-21  5:36 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Su Yue, Hugo Mills, linux-btrfs

On Fri, Feb 19, 2021 at 10:01 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2021/2/20 12:28 PM, Erik Jensen wrote:
> > [...]
> > Brainstorming some ideas, is compacting the address space something
> > that could be done offline? E.g., maybe some two-pass process: first
> > something balance-like that bumps all of the metadata up to a compact
> > region of address space, starting at a new 16TiB boundary, and then a
> > follow up pass that just strips the top bits off?
>
> We need btrfs-progs support for off-line balancing.
>
> I used to have this idea, but saw very limited use for it.
>
> This would be the safest bet, but needs a lot of work, although in user
> space.

Would any of the chunks have to actually be physically moved on disk
like happens in a real balance, or would it just be a matter of
adjusting the bytenrs in the relevant data structures? If the latter,
it seems like it could do something relatively straightforward like
start with the lowest in-use bytenr, adjust it to the first possible
bytenr, adjust the second-lowest to be just after it, et cetera.

While I'm sure this would still be a complex challenge, and would need
to take precautions like marking the filesystem unmountable while it's
working and keeping a journal of its progress in case of interruption,
maybe it'd less onerous than reimplementing all of the rebalance logic
in userspace?

> > Or maybe once all of the bytenrs are brought within 16TiB of each
> > other by balance, btrfs could just keep track of an offset that needs
> > to be applied when mapping page cache indexes?
>
> But further balance/new chunk allocation can still go beyond the limit.
>
>> This is the biggest problem, one that other filesystems don't need to
>> deal with: we can dynamically allocate chunks while they can't.

That's true, but no more so than for the offline address-space
compaction option above, or for doing a backup, format, restore cycle.
Obviously it would be ideal if the issue didn't occur in the first
place, but given that it does, it would be nice if there was *some*
way to get the filesystem back into a usable state for a while at
least, even if it required temporarily hooking the drives up to a
64-bit system to do so.

Now, if I had known about the issue beforehand, I probably would have
unmounted the filesystem and used dd when changing my drive
encryption, rather than calling btrfs replace a bunch of times, in
which case I probably never would have triggered the issue in the
first place. :)

> > Or maybe btrfs could use multiple virtual inodes on 32-bit systems,
> > one for each 16TiB block of address space with metadata in it? If this
> > were to ever grow to need more than a handful of virtual inodes, it
> > seems like a balance *would* actually help in this case by compacting
> > the metadata higher in the address space, allowing the virtual inodes
> > for lower in the address space to be dropped.
>
> This may be a good idea.
>
> But the problem of test coverage is always here.
>
>> We can spend tons of lines, but in the end it will not really be well
>> tested, as such setups are really hard to cover.

I guess this would involve replacing btrfs_fs_info::btree_inode with
an xarray of inodes on 32-bit systems, and allocating inodes as
needed? It looks like inode structs have a lot going on, and I
definitely don't have the knowledge base to judge if this would be a
tractable change to implement or not. (E.g., would calling
new_inode(fs_info->sb) whenever needed cause any issues, or would it
just work as expected?) It looks like chunk metadata can span more
than one page, so another question is whether those can ever be
allocated such that they cross a 16 TiB boundary? If so, it appears
that would be much harder to try to make work. (Presumably such
boundary-spanning allocations could be prevented going forward, but
there could still be existing filesystems that would have to be
rejected.)

> > Or maybe btrfs could just not use the page cache for the metadata
> > inode once the offset exceeds 16TiB, and only cache at the block
> > layer? This would surely hurt performance, but at least the filesystem
> > could still be accessed.
>
>> I don't believe it's really possible, unless we completely override the
>> XArray structure provided by MM and implement a btrfs-only one.
>>
>> That's too costly.

Makes sense.

> > Given that this issue appears to be not due to the size of the
> > filesystem, but merely how much I've used it, having the only solution
> > be to copy all of the data off, reformat the drives, and then restore
> > every time filesystem usage exceeds a certain threshold is… not very
> > satisfying.
>
> Yeah, definitely not a good experience.
>
> >
> > Finally, I've never done kernel dev before, but I do have some C
> > experience, so if there is a solution that falls into the category of
> > seeming reasonable, likely to be accepted if implemented, but being
> > unlikely to get implemented given the low priority of supporting
> > 32-bit systems, let me know and maybe I can carve out some time to
> > give it a try.
> >
> BTW, if you want things like a 64K page size while still keeping the 4K
> sector size of your existing btrfs, then I guess you may be interested
> in the recent subpage support, which allows btrfs to mount a 4K sector
> size fs with a 64K page size.
>
> Unfortunately it's still WIP, but it may fit your use case, as ARM
> supports multiple page sizes (4K, 16K, 64K).
> (Although we are only going to support 64K pages for now.)

So, basically I'd need this change plus the Bootlin large page patch,
and then hope I never cross the 256 TiB mark for chunk metadata? (Or
at least, not until I find an AArch64 board that fits my needs.) Would
this conflict with your graceful error/warning patch at all? Is there
an easy way to see what my highest bytenr is today?

Also, I see read-only support went into 5.12. Do you have any idea
when write support will be ready for general use?

Thanks!


end of thread, other threads:[~2021-02-21  5:38 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-21  8:34 "bad tree block start" when trying to mount on ARM Erik Jensen
2019-05-21  8:56 ` Patrik Lundquist
2019-05-21  9:01   ` Erik Jensen
2019-05-21  9:18 ` Hugo Mills
2019-05-22 16:02   ` Erik Jensen
2019-06-26  7:04     ` Erik Jensen
2019-06-26  8:10       ` Qu Wenruo
     [not found]         ` <CAMj6ewO229vq6=s+T7GhUegwDADv4dzhqPiM0jo10QiKujvytA@mail.gmail.com>
2019-06-28  8:15           ` Qu Wenruo
2021-01-18 10:50             ` Erik Jensen
     [not found]             ` <CAMj6ewMqXLtrBQgTJuz04v3MBZ0W95fU4pT0jP6kFhuP830TuA@mail.gmail.com>
2021-01-18 11:07               ` Qu Wenruo
2021-01-18 11:55                 ` Erik Jensen
2021-01-18 12:01                   ` Qu Wenruo
2021-01-18 12:12                     ` Erik Jensen
2021-01-19  5:22                       ` Erik Jensen
2021-01-19  9:28                         ` Erik Jensen
2021-01-20  8:21                           ` Qu Wenruo
2021-01-20  8:30                             ` Qu Wenruo
     [not found]                               ` <CAMj6ewOqCJTGjykDijun9_LWYELA=92HrE+KjGo-ehJTutR_+w@mail.gmail.com>
2021-01-26  4:54                                 ` Erik Jensen
2021-01-29  6:39                                   ` Erik Jensen
2021-02-01  2:35                                     ` Qu Wenruo
2021-02-01  5:49                                       ` Su Yue
2021-02-04  6:16                                         ` Erik Jensen
2021-02-06  1:57                                           ` Erik Jensen
2021-02-10  5:47                                             ` Qu Wenruo
2021-02-10 22:17                                               ` Erik Jensen
2021-02-10 23:47                                                 ` Qu Wenruo
2021-02-18  1:24                                                   ` Qu Wenruo
2021-02-18  4:03                                                     ` Erik Jensen
2021-02-18  5:24                                                       ` Qu Wenruo
2021-02-18  5:49                                                         ` Erik Jensen
2021-02-18  6:09                                                           ` Qu Wenruo
2021-02-18  6:59                                                             ` Erik Jensen
2021-02-18  7:24                                                               ` Qu Wenruo
2021-02-18  7:59                                                                 ` Erik Jensen
2021-02-18  8:38                                                                   ` Qu Wenruo
2021-02-18  8:52                                                                     ` Erik Jensen
2021-02-18  8:59                                                                       ` Qu Wenruo
2021-02-20  2:47                                                                         ` Erik Jensen
2021-02-20  3:16                                                                           ` Qu Wenruo
2021-02-20  4:28                                                                             ` Erik Jensen
2021-02-20  6:01                                                                               ` Qu Wenruo
2021-02-21  5:36                                                                                 ` Erik Jensen
2021-02-18  7:25                                                               ` Erik Jensen
2019-05-21 10:17 ` Qu Wenruo
