On 2021/2/11 7:47 AM, Qu Wenruo wrote:
>
>
> On 2021/2/11 6:17 AM, Erik Jensen wrote:
>> On Tue, Feb 9, 2021 at 9:47 PM Qu Wenruo wrote:
> [...]
>>>
>>> Unfortunately I didn't get much useful info from the trace events,
>>> as a lot of the values don't even make sense to me....
>>>
>>> But the chunk tree dump proves to be more useful.
>>>
>>> Firstly, the offending tree block doesn't even occur in any chunk
>>> range.
>>>
>>> The offending tree block is 26207780683776, but the tree dump doesn't
>>> have any range there.
>>>
>>> The highest chunk is at 5958289850368 + 4294967296, still one digit
>>> lower than the expected value.
>>>
>>> I'm surprised we didn't even get any error for that, thus it may
>>> indicate our chunk mapping is incorrect too.
>>>
>>> Would you please try the following diff on the 32bit system and report
>>> back the dmesg?
>>>
>>> The diff adds the following debug output:
>>> - when we try to read one tree block
>>> - when a bio is mapped to read device
>>> - when a new chunk is added to chunk tree
>>>
>>> Thanks,
>>> Qu
>>
>> Okay, here's the dmesg output from attempting to mount the filesystem:
>> https://gist.github.com/rkjnsn/914651efdca53c83199029de6bb61e20
>>
>> I captured this on my 32-bit x86 VM, as it's much faster to rebuild
>> the kernel there than on my ARM board, and it fails with the same
>> error.
>>
>
> This is indeed much better.
>
> The involved things are:
>
> [   84.463147] read_one_chunk: chunk start=26207148048384 len=1073741824 num_stripes=2 type=0x14
> [   84.463148] read_one_chunk:    stripe 0 phy=6477927415808 devid=5
> [   84.463149] read_one_chunk:    stripe 1 phy=6477927415808 devid=4
>
> Above is the chunk for the offending tree block.
>
> [   84.463724] read_extent_buffer_pages: eb->start=26207780683776 mirror=0
> [   84.463731] submit_stripe_bio: rw 0 0x1000, phy=2118735708160 sector=4138155680 dev_id=3 size=16384
> [   84.470793] BTRFS error (device dm-4): bad tree block start, want 26207780683776 have 3395945502747707095
>
> But when the metadata read happens, the physical address and dev id are
> completely insane.
>
> The chunk doesn't have dev 3 in it at all, but we still get the wrong
> mapping.
>
> Furthermore, that physical address and devid belong to chunk 8614760677376,
> which is a raid5 data chunk.
>
> So there is definitely something wrong in btrfs chunk mapping on 32bit.
>
> I'll craft a newer debug diff for you after I pin down what can be
> wrong.

Sorry for the delay, mostly due to the Lunar New Year vacation.

Here is the new diff; it should be applied on top of the previous diff.

This new diff adds extra debug info inside __btrfs_map_block().

BTW, you only need to rebuild the btrfs module to test it; hope this
saves you some time.

Although if I could get a small enough image to reproduce locally, that
would be the best case...

Thanks,
Qu
>
> Thanks,
> Qu
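
For anyone following along, the mismatch in the quoted dmesg lines can be checked by hand. Below is a minimal Python sketch, not kernel code. It assumes the chunk is RAID1 (type=0x14 read as METADATA | RAID1), that each mirror stores the chunk contiguously from its stripe's physical start, and that the "sector" field is in 512-byte units. The helper name expected_raid1_targets() is made up for illustration; all numbers are copied from the log lines above.

```python
# Values copied from the quoted read_one_chunk / submit_stripe_bio lines.
CHUNK_START = 26207148048384
CHUNK_LEN = 1073741824
STRIPES = {5: 6477927415808, 4: 6477927415808}  # devid -> physical start
TREE_BLOCK = 26207780683776  # eb->start of the offending tree block

OBSERVED_DEVID = 3
OBSERVED_PHYS = 2118735708160
OBSERVED_SECTOR = 4138155680
SECTOR_SIZE = 512  # assumption: bio sectors are 512-byte units


def expected_raid1_targets(logical, chunk_start, chunk_len, stripes):
    """Return {devid: physical} for a logical address in a RAID1 chunk.

    Hypothetical helper. Assumes every mirror holds the whole chunk,
    laid out linearly from that stripe's physical start.
    """
    if not chunk_start <= logical < chunk_start + chunk_len:
        raise ValueError("logical address is outside the chunk")
    offset = logical - chunk_start
    return {devid: phys + offset for devid, phys in stripes.items()}


expected = expected_raid1_targets(TREE_BLOCK, CHUNK_START, CHUNK_LEN, STRIPES)
for devid, phys in sorted(expected.items()):
    print(f"expected: devid={devid} phy={phys}")
print(f"observed: devid={OBSERVED_DEVID} phy={OBSERVED_PHYS}")
print("observed devid is in chunk:", OBSERVED_DEVID in expected)
print("sector matches phy:", OBSERVED_SECTOR * SECTOR_SIZE == OBSERVED_PHYS)
```

Under those assumptions the read should have gone to devid 4 or 5 at phy=6478560051200. The logged bio is internally consistent (sector * 512 equals phy), so the wrong target appears to come from the mapping step rather than from the bio setup, which is what the extra debug output in __btrfs_map_block() is meant to narrow down.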