From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Erik Jensen <erikjensen@rkjnsn.net>, Su Yue <l@damenly.su>
Cc: Hugo Mills <hugo@carfax.org.uk>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: "bad tree block start" when trying to mount on ARM
Date: Wed, 10 Feb 2021 13:47:39 +0800
Message-ID: <7d32a06e-dc2e-c2c4-ddce-1f2693980c5b@gmx.com>
In-Reply-To: <CAMj6ewMSw+UzZHhEEN=rhxN8O3pN9gWA05usAodk2xX5+s-Qjw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 7403 bytes --]



On 2021/2/6 9:57 AM, Erik Jensen wrote:
> On Wed, Feb 3, 2021 at 10:16 PM Erik Jensen <erikjensen@rkjnsn.net> wrote:
>> On Sun, Jan 31, 2021 at 9:50 PM Su Yue <l@damenly.su> wrote:
>>> On Mon 01 Feb 2021 at 10:35, Qu Wenruo <quwenruo.btrfs@gmx.com>
>>> wrote:
>>>> On 2021/1/29 2:39 PM, Erik Jensen wrote:
>>>>> On Mon, Jan 25, 2021 at 8:54 PM Erik Jensen
>>>>> <erikjensen@rkjnsn.net> wrote:
>>>>>> On Wed, Jan 20, 2021 at 1:08 AM Erik Jensen
>>>>>> <erikjensen@rkjnsn.net> wrote:
>>>>>>> On Wed, Jan 20, 2021 at 12:31 AM Qu Wenruo
>>>>>>> <quwenruo.btrfs@gmx.com> wrote:
>>>>>>>> On 2021/1/20 4:21 PM, Qu Wenruo wrote:
>>>>>>>>> On 2021/1/19 5:28 PM, Erik Jensen wrote:
>>>>>>>>>> On Mon, Jan 18, 2021 at 9:22 PM Erik Jensen
>>>>>>>>>> <erikjensen@rkjnsn.net>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 18, 2021 at 4:12 AM Erik Jensen
>>>>>>>>>>> <erikjensen@rkjnsn.net>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> The offending system is indeed ARMv7 (specifically a
>>>>>>>>>>>> Marvell ARMADA®
>>>>>>>>>>>> 388), but I believe the Broadcom BCM2835 in my Raspberry
>>>>>>>>>>>> Pi is
>>>>>>>>>>>> actually ARMv6 (with hardware float support).
>>>>>>>>>>>
>>>>>>>>>>> Using NBD, I have verified that I receive the same error
>>>>>>>>>>> when
>>>>>>>>>>> attempting to mount the filesystem on my ARMv6 Raspberry
>>>>>>>>>>> Pi:
>>>>>>>>>>> [ 3491.339572] BTRFS info (device dm-4): disk space
>>>>>>>>>>> caching is enabled
>>>>>>>>>>> [ 3491.394584] BTRFS info (device dm-4): has skinny
>>>>>>>>>>> extents
>>>>>>>>>>> [ 3492.385095] BTRFS error (device dm-4): bad tree block
>>>>>>>>>>> start, want
>>>>>>>>>>> 26207780683776 have 3395945502747707095
>>>>>>>>>>> [ 3492.514071] BTRFS error (device dm-4): bad tree block
>>>>>>>>>>> start, want
>>>>>>>>>>> 26207780683776 have 3395945502747707095
>>>>>>>>>>> [ 3492.553599] BTRFS warning (device dm-4): failed to
>>>>>>>>>>> read tree root
>>>>>>>>>>> [ 3492.865368] BTRFS error (device dm-4): open_ctree
>>>>>>>>>>> failed
>>>>>>>>>>>
>>>>>>>>>>> The Raspberry Pi is running Linux 5.4.83.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Okay, after some more testing, ARM seems to be irrelevant,
>>>>>>>>>> and 32-bit
>>>>>>>>>> is the key factor. On a whim, I booted up an i686, 5.8.14
>>>>>>>>>> kernel in a
>>>>>>>>>> VM, attached the drives via NBD, ran cryptsetup, tried to
>>>>>>>>>> mount, and…
>>>>>>>>>> I got the exact same error message.
>>>>>>>>>>
>>>>>>>>> My educated guess is that on 32-bit platforms we pass an
>>>>>>>>> incorrect sector into the bio, thus getting garbage back.
>>>>>>>>
>>>>>>>> To prove that, you can use bcc tool to verify it.
>>>>>>>> biosnoop can do that:
>>>>>>>> https://github.com/iovisor/bcc/blob/master/tools/biosnoop_example.txt
>>>>>>>>
>>>>>>>> Just try mounting the fs with biosnoop running.
>>>>>>>> With "btrfs ins dump-tree -t chunk <dev>", we can manually
>>>>>>>> calculate the
>>>>>>>> offset of each read to see if they match.
>>>>>>>> If not match, it would prove my assumption and give us a
>>>>>>>> pretty good
>>>>>>>> clue to fix.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Qu
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Is this bug happening only on this fs, or can other btrfs
>>>>>>>>> filesystems also
>>>>>>>>> trigger similar problems on 32-bit platforms?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Qu
>>>>>>>
>>>>>>> I have only observed this error on this file system.
>>>>>>> Additionally, the
>>>>>>> error mounting with the NAS only started after I did a `btrfs
>>>>>>> replace`
>>>>>>> on all five 8TB drives using an x86_64 system. (Ironically, I
>>>>>>> did this
>>>>>>> with the goal of making it faster to use the filesystem on
>>>>>>> the NAS by
>>>>>>> re-encrypting the drives to use a cipher supported by my
>>>>>>> NAS's crypto
>>>>>>> accelerator.)
>>>>>>>
>>>>>>> Maybe this process of shuffling 40TB around caused some value
>>>>>>> in the
>>>>>>> filesystem to increment to the point that a calculation using
>>>>>>> it
>>>>>>> overflows on 32-bit systems?
>>>>>>>
>>>>>>> I should be able to try biosnoop later this week, and I'll
>>>>>>> report back
>>>>>>> with the results.
>>>>>>
>>>>>> Okay, I tried running biosnoop, but I seem to be running into
>>>>>> this
>>>>>> bug: https://github.com/iovisor/bcc/issues/3241 (That bug was
>>>>>> reported
>>>>>> for cpudist, but I'm seeing the same error when I try to run
>>>>>> biosnoop.)
>>>>>>
>>>>>> Anything else I can try?
>>>>>
>>>>> Is it possible to add printks to retrieve the same data?
>>>>>
>>>> Sorry for the late reply, I've been busy testing the subpage
>>>> patchset. (And unfortunately not much progress yet.)
>>>>
>>>> If bcc is not possible, you can still use ftrace events, but
>>>> unfortunately I didn't find a good enough one. (In fact, the trace
>>>> events for the block layer are pretty limited.)
>>>>
>>>> You can try adding printk()s in blk_account_io_done() to
>>>> emulate what's done in trace_req_completion() of biosnoop.
>>>>
>>>> The time delta is not important; we only need the device name,
>>>> sector and length.
>>>>
>>>
>>> Tip: there are ftrace events called block:block_rq_issue and
>>> block:block_rq_complete to fetch that information. No need to
>>> add printk().
>>>
>>>>
>>>> Thanks,
>>>> Qu
>>>
>>
>> Okay, here's the output of the trace:
>> https://gist.github.com/rkjnsn/4cf606874962b5a0284249b2f2e934f5
>>
>> And here's the output dump-tree:
>> https://gist.github.com/rkjnsn/630b558eaf90369478d670a1cb54b40f
>>
>> One important note is that ftrace only captured requests at the
>> underlying block device (nbd, in this case), not at the device mapper
>> level. The encryption header on these drives is 16 MiB, so the offset
>> reported in the trace will be 16777216 bytes larger than the offset
>> btrfs was actually trying to read at the time.
>>
>> In case it's helpful, I believe this is the mapping of which
>> (encrypted) nbd device node in the trace corresponds to which
>> (decrypted) filesystem device:
>> 43,0    33c75e20-26f2-4328-a565-5ef3484832aa
>> 43,32   9bdfdb8f-abfb-47c5-90af-d360d754a958
>> 43,64   39a9463d-65f5-499b-bca8-dae6b52eb729
>> 43,96   f1174dea-ea10-42f2-96b4-4589a2980684
>> 43,128  e669d804-6ea2-4516-8536-1d266f88ebad
>
> What are the chances it's something simple like a long getting used
> somewhere in the code that should actually be a 64-bit int?
>
That's what I expected too, but I haven't found anything obviously suspicious yet.

Unfortunately I didn't get much useful info from the trace events, as a
lot of the values don't even make sense to me...

But the chunk tree dump proves to be more useful.

Firstly, the offending tree block doesn't even fall inside any chunk range.

The offending tree block is at 26207780683776, but the chunk dump doesn't
have any range covering that address.

The highest chunk ends at 5958289850368 + 4294967296, still one digit
shorter than the offending bytenr.
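(Spelling that out: 5958289850368 + 4294967296 = 5962584817664, which is
still far below 26207780683776, so no chunk item covers that bytenr.)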

I'm surprised we didn't even get an error for that; it may indicate our
chunk mapping is incorrect too.

Would you please try the following diff on the 32-bit system and report
back the dmesg?

The diff adds the following debug output:
- when we try to read a tree block
- when a bio is mapped to a device for reading
- when a chunk is read from the chunk tree and added to the chunk mapping
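
In case it helps with cross-checking, for the simple SINGLE/DUP/RAID1
chunk profiles the manual logical-to-physical mapping is just
stripe_phys + (logical - chunk_start). A tiny sketch of that check
(illustration only, not part of the diff; it skips the stripe rotation
that RAID0/10/5/6 would need):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/*
 * Illustration only: for SINGLE/DUP/RAID1 chunks, a logical bytenr inside
 * [chunk_start, chunk_start + chunk_len) should map to
 * stripe_phys + (logical - chunk_start) on that stripe's device, which is
 * what the "phy=" value printed by the diff can be compared against.
 */
static uint64_t simple_chunk_phys(uint64_t logical, uint64_t chunk_start,
				  uint64_t chunk_len, uint64_t stripe_phys)
{
	if (logical < chunk_start || logical >= chunk_start + chunk_len)
		return 0;	/* not covered by this chunk */
	return stripe_phys + (logical - chunk_start);
}

int main(void)
{
	/* Made-up numbers, purely for illustration. */
	printf("expected phy=%" PRIu64 "\n",
	       simple_chunk_phys(30539776, 30408704, 1073741824, 1048576));
	return 0;
}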

Thanks,
Qu

[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 2049 bytes --]

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7f689ad7709c..a97399f5ac6b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5573,6 +5573,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
 		return 0;
 
+	pr_info("%s: eb->start=%llu mirror=%d\n", __func__, eb->start,
+			mirror_num);
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index badb972919eb..03dd432b9812 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6374,6 +6374,11 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
 	btrfs_io_bio(bio)->device = dev;
 	bio->bi_end_io = btrfs_end_bio;
 	bio->bi_iter.bi_sector = physical >> 9;
+
+	pr_info("%s: rw %d 0x%x, phy=%llu sector=%llu dev_id=%llu size=%u\n", __func__,
+		bio_op(bio), bio->bi_opf, ((u64)bio->bi_iter.bi_sector) << 9,
+		bio->bi_iter.bi_sector,
+		dev->devid, bio->bi_iter.bi_size);
 	btrfs_debug_in_rcu(fs_info,
 	"btrfs_map_bio: rw %d 0x%x, sector=%llu, dev=%lu (%s id %llu), size=%u",
 		bio_op(bio), bio->bi_opf, bio->bi_iter.bi_sector,
@@ -6670,6 +6675,8 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf,
 		return -ENOMEM;
 	}
 
+	pr_info("%s: chunk start=%llu len=%llu num_stripes=%d type=0x%llx\n", __func__,
+		logical, length, num_stripes, btrfs_chunk_type(leaf, chunk));
 	set_bit(EXTENT_FLAG_FS_MAPPING, &em->flags);
 	em->map_lookup = map;
 	em->start = logical;
@@ -6694,6 +6701,9 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf,
 		read_extent_buffer(leaf, uuid, (unsigned long)
 				   btrfs_stripe_dev_uuid_nr(chunk, i),
 				   BTRFS_UUID_SIZE);
+		pr_info("%s:    stripe %u phy=%llu devid=%llu\n", __func__,
+			i, btrfs_stripe_offset_nr(leaf, chunk, i),
+			devid);
 		map->stripes[i].dev = btrfs_find_device(fs_info->fs_devices,
 							devid, uuid, NULL);
 		if (!map->stripes[i].dev &&
