Linux-BTRFS Archive on lore.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Justin Brown <Justin.Brown@fandingo.org>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Access Beyond End of Device & Input/Output Errors
Date: Sat, 1 Aug 2020 17:31:00 +0800
Message-ID: <2061ec67-a5a4-07c6-fe5e-8464feb272aa@gmx.com>
In-Reply-To: <CAKZK7uzmg19NDjGPPAxXKu7LJ-7ZdHu2cad22csj_chr2qxMJg@mail.gmail.com>

On 2020/8/1 4:30 PM, Justin Brown wrote:
> Hi Qu,
> 
> Thanks for the help.
> 
> Here is the `lsblk -b` output:
> 
> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> sda 8:0 0 2000398934016 0 disk
> └─sda1 8:1 0 2000397868544 0 part
> sdb 8:16 0 8001563222016 0 disk
> └─sdb1 8:17 0 8001562156544 0 part
> sdc 8:32 0 120034123776 0 disk
> ├─sdc1 8:33 0 1048576 0 part
> ├─sdc2 8:34 0 524288000 0 part /boot
> └─sdc3 8:35 0 119507255296 0 part /home
> sdd 8:48 0 8001563222016 0 disk
> └─sdd1 8:49 0 8001562156544 0 part
> sde 8:64 0 2000398934016 0 disk
> └─sde1 8:65 0 2000397868544 0 part
> sdf 8:80 0 2000398934016 0 disk
> └─sdf1 8:81 0 2000397868544 0 part /var/media
> sdg 8:96 1 2000398934016 0 disk
> └─sdg1 8:97 1 2000397868544 0 part
> 
> The `btrfs ins...` output is quite long. I've attached it as a txt and
> also uploaded it at
> https://gist.github.com/fandingo/aa345d6c6fa97162f810e86c9ab20d6a


Thanks, this already shows some device size differences.

But in every case the size recorded by btrfs is just a little smaller than
the real device size, so it should be fine.
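
To make that comparison concrete, here is a minimal sketch, assuming the per-device sizes btrfs recorded (the `total_bytes` value of each DEV_ITEM in the chunk-tree dump) are copied by hand into a dictionary. The function name `check_device_sizes` and its input are hypothetical; only the partition sizes are taken from the `lsblk -b` output quoted above.

```python
from __future__ import annotations

# Partition sizes in bytes, copied from the `lsblk -b` output quoted above.
PARTITION_BYTES = {
    "sda1": 2000397868544,
    "sdb1": 8001562156544,
    "sdd1": 8001562156544,
    "sde1": 2000397868544,
    "sdf1": 2000397868544,
    "sdg1": 2000397868544,
}


def check_device_sizes(btrfs_bytes: dict[str, int]) -> dict[str, bool]:
    """Map each partition name to True if the size btrfs recorded for it
    fits inside the real partition (recorded <= actual), else False.

    `btrfs_bytes` is hypothetical input: partition name -> DEV_ITEM
    total_bytes, filled in by hand from the chunk-tree dump.
    """
    return {
        name: recorded <= PARTITION_BYTES[name]
        for name, recorded in btrfs_bytes.items()
    }
```

A `False` entry would mean btrfs thinks the device is larger than the partition really is, so a chunk placed near the end could map past it; a recorded size slightly smaller than the partition, as seen here, is expected to be fine.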

Another problem I found is that either the size or the start of some
partitions does not appear to be aligned to 4K.

That may be a problem for hard disks with 4K physical sectors, so it may
be worth looking into after the btrfs problem is solved.
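
A minimal sketch of such an alignment check, assuming the standard sysfs files `/sys/class/block/<partition>/start` and `/sys/class/block/<partition>/size`, which report the partition start and length in 512-byte units. The helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

SECTOR_BYTES = 512   # sysfs start/size values are always in 512-byte units
ALIGN_BYTES = 4096   # 4K alignment target


def is_4k_aligned(start_sectors: int, size_bytes: int) -> bool:
    """True only if both the partition start and its size are 4 KiB multiples."""
    start_ok = (start_sectors * SECTOR_BYTES) % ALIGN_BYTES == 0
    size_ok = size_bytes % ALIGN_BYTES == 0
    return start_ok and size_ok


def partition_geometry(name: str) -> tuple[int, int]:
    """Return (start in 512-byte sectors, size in bytes) for e.g. 'sdf1'."""
    base = Path("/sys/class/block") / name
    start = int((base / "start").read_text())
    size_sectors = int((base / "size").read_text())
    return start, size_sectors * SECTOR_BYTES
```

On the machine itself this would be called as `is_4k_aligned(*partition_geometry("sdf1"))`. The sizes alone already show part of the issue: `2000397868544 % 4096` and `8001562156544 % 4096` are both 3584, so none of the six btrfs partitions listed by `lsblk -b` has a size that is a multiple of 4 KiB.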

Would you please also provide some extra dumps?
- btrfs check /dev/sda1
  It should detect any problems I missed

- btrfs ins dump-super <device> | grep dev_item.uuid
  It's a little hard to tell which device corresponds to which device ID,
  so we need this dump from each btrfs device to be sure.
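
Collecting that field from every member device can be scripted. A minimal sketch, assuming `btrfs` is in `PATH`, the script runs as root, and the relevant output line starts with `dev_item.uuid` followed by whitespace and the UUID (the same line the `grep` above selects). The function names are hypothetical.

```python
from __future__ import annotations

import re
import subprocess

# Selects the same line as `grep dev_item.uuid` and captures the UUID.
DEV_UUID_RE = re.compile(r"^dev_item\.uuid\s+(\S+)", re.MULTILINE)


def parse_dev_uuid(dump_super_output: str) -> str | None:
    """Return the dev_item.uuid value from dump-super text, or None if absent."""
    match = DEV_UUID_RE.search(dump_super_output)
    return match.group(1) if match else None


def collect_dev_uuids(devices: list[str]) -> dict[str, str | None]:
    """Run `btrfs inspect-internal dump-super` on each device (needs root)."""
    uuids = {}
    for dev in devices:
        out = subprocess.run(
            ["btrfs", "inspect-internal", "dump-super", dev],
            capture_output=True, text=True, check=True,
        ).stdout
        uuids[dev] = parse_dev_uuid(out)
    return uuids
```

For example, `collect_dev_uuids(["/dev/sda1", "/dev/sdb1", "/dev/sdd1", "/dev/sde1", "/dev/sdf1", "/dev/sdg1"])`. Each UUID can then be matched against the DEV_ITEM entries in the chunk-tree dump to find its devid; the mount log below already ties devid 1 to uuid cb05aae6-6c03-49d3-b46d-bf51a0eb8cd0.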

Thanks,
Qu


> 
> Thanks,
> Justin
> 
> On Sat, Aug 1, 2020 at 2:02 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2020/8/1 2:58 PM, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/8/1 2:51 PM, Justin Brown wrote:
>>>> Hello,
>>>>
>>>> I've run into a strange problem that I haven't seen before, and I need
>>>> some help. I started getting generic "input/output" errors on a couple
>>>> of files, and when I looked deeper, the kernel logs are full of
>>>> messages like:
>>>>
>>>>     sd 5:0:0:0: [sdf] tag#29 access beyond end of device
>>>
>>> We have a new fix for trim, but judging by your kernel messages, that
>>> doesn't look like the case here.
>>>
>>> (Nothing in the messages shows that it's a trim/discard request.)
>>>
>>>>
>>>> I've never seen anything like this before with any FS, so I figured it
>>>> was worth asking before I consider running the standard btrfs tools.
>>>> (I briefly started a scrub, but it was going crazy with uncorrectable
>>>> errors, so I cancelled it.)
>>>>
>>>> Here's my system info:
>>>>
>>>> Fedora 32, kernel 5.7.7-200.fc32.x86_64
>>>> btrfs-progs v5.7
>>>>
>>>> /etc/fstab entry:
>>>> LABEL=media /var/media btrfs subvol=media,discard 0 2
>>>>
>>>> btrfs fi show /var/media/
>>>> Label: 'media' uuid: 51eef0c7-2977-4037-b271-3270ea22c7d9
>>>> Total devices 6 FS bytes used 4.68TiB
>>>> devid 1 size 1.82TiB used 963.00GiB path /dev/sdf1
>>>> devid 2 size 1.82TiB used 962.00GiB path /dev/sde1
>>>> devid 4 size 1.82TiB used 963.00GiB path /dev/sdg1
>>>> devid 6 size 1.82TiB used 962.03GiB path /dev/sda1
>>>> devid 7 size 7.28TiB used 967.03GiB path /dev/sdb1
>>>> devid 8 size 7.28TiB used 967.03GiB path /dev/sdd1
>>>>
>>>> btrfs fi df /var/media/
>>>> Data, RAID5: total=4.69TiB, used=4.68TiB
>>>> System, RAID1C3: total=32.00MiB, used=304.00KiB
>>>> Metadata, RAID1C3: total=6.00GiB, used=4.94GiB
>>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>>>
>>>> I can only mount -o degraded now. Here are the logs when mounting:
>>>>
>>>> Aug 01 01:15:26 spaceman.fandingo.org sudo[275572]: justin : TTY=pts/0
>>>> ; PWD=/home/justin ; USER=root ; COMMAND=/usr/bin/mount -t btrfs -o
>>>> degraded /dev/sda1 /var/media/
>>>> Aug 01 01:15:26 spaceman.fandingo.org kernel: sd 5:0:0:0: [sdf] tag#30
>>>> access beyond end of device
>>>> Aug 01 01:15:26 spaceman.fandingo.org kernel: blk_update_request: I/O
>>>> error, dev sdf, sector 2176 op 0x0:(READ) flags 0x0 phys_seg 1 prio
>>>> class 0
>>>
>>> OK, it's a read, not a DISCARD, so this is a completely different problem.
>>>
>>>
>>>> Aug 01 01:15:26 spaceman.fandingo.org kernel: Buffer I/O error on dev
>>>> sdf1, logical block 16, async page read
>>>> Aug 01 01:15:26 spaceman.fandingo.org kernel: BTRFS info (device
>>>> sde1): allowing degraded mounts
>>>> Aug 01 01:15:26 spaceman.fandingo.org kernel: BTRFS info (device
>>>> sde1): disk space caching is enabled
>>>> Aug 01 01:15:26 spaceman.fandingo.org kernel: BTRFS warning (device
>>>> sde1): devid 1 uuid cb05aae6-6c03-49d3-b46d-bf51a0eb8cd0 is missing
>>>> Aug 01 01:15:26 spaceman.fandingo.org kernel: BTRFS info (device
>>>> sde1): bdev /dev/sdf1 errs: wr 4458026, rd 14571, flush 0, corrupt 0,
>>>> gen 0
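
The two sector numbers in this log can be cross-checked with a little arithmetic. The sketch below assumes that `sector` in the `blk_update_request` line counts 512-byte units on the whole disk and that `logical block` in the `Buffer I/O error` line counts 4096-byte blocks within sdf1; the 4096-byte block size is an assumption, not something the log states. The 64 KiB primary superblock offset is the standard btrfs layout.

```python
SECTOR_BYTES = 512              # unit of "sector" in blk_update_request messages
BLOCK_BYTES = 4096              # assumed block size of the sdf1 page-cache read
BTRFS_SUPER_OFFSET = 64 * 1024  # btrfs primary superblock: 64 KiB into the device

failed_disk_sector = 2176       # "I/O error, dev sdf, sector 2176"
failed_part_block = 16          # "Buffer I/O error on dev sdf1, logical block 16"

# Byte offset of the failed read inside the partition: 16 * 4096 = 65536.
offset_in_partition = failed_part_block * BLOCK_BYTES

# Subtracting that offset (in sectors) from the disk sector gives where sdf1
# would start on sdf: 2176 - 65536 / 512 = 2176 - 128 = 2048.
partition_start_sector = failed_disk_sector - offset_in_partition // SECTOR_BYTES

is_primary_superblock = offset_in_partition == BTRFS_SUPER_OFFSET
print(offset_in_partition, partition_start_sector, is_primary_superblock)
# prints: 65536 2048 True
```

Under those assumptions the failed read is the primary superblock copy, about 1 MiB into sdf (2048 sectors plus 64 KiB), far below the 2000398934016 bytes `lsblk -b` reports for that disk.
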
>>>>
>>>> It seems like only relatively recently written files are encountering
>>>> I/O errors. If I `cat` one of the problematic files when the FS is
>>>> mounted normally, I see a ton of this:
>>>>
>>>> Aug 01 01:13:49 spaceman.fandingo.org kernel: sd 5:0:0:0: [sdf] tag#26
>>>> access beyond end of device
>>>> Aug 01 01:13:49 spaceman.fandingo.org kernel: sd 5:0:0:0: [sdf] tag#27
>>>> access beyond end of device
>>>> Aug 01 01:13:49 spaceman.fandingo.org kernel: sd 5:0:0:0: [sdf] tag#28
>>>> access beyond end of device
>>>> Aug 01 01:13:49 spaceman.fandingo.org kernel: sd 5:0:0:0: [sdf] tag#29
>>>> access beyond end of device
>>>> Aug 01 01:13:49 spaceman.fandingo.org kernel: sd 5:0:0:0: [sdf] tag#30
>>>> access beyond end of device
>>>> Aug 01 01:13:49 spaceman.fandingo.org kernel: sd 5:0:0:0: [sdf] tag#0
>>>> access beyond end of device
>>>> Aug 01 01:13:49 spaceman.fandingo.org kernel: sd 5:0:0:0: [sdf] tag#1
>>>> access beyond end of device
>>>> Aug 01 01:13:49 spaceman.fandingo.org kernel: sd 5:0:0:0: [sdf] tag#13
>>>> access beyond end of device
>>>> Aug 01 01:13:49 spaceman.fandingo.org kernel: sd 5:0:0:0: [sdf] tag#2
>>>> access beyond end of device
>>>>
>>>> Now that I've remounted with -o degraded, I'm getting more comprehensible
>>>> warnings, but they still result in I/O read failures:
>>>>
>>>> Aug 01 01:31:53 spaceman.fandingo.org kernel: BTRFS warning (device
>>>> sde1): csum failed root 2820 ino 747435 off 99942400 csum 0x8941f998
>>>> expected csum 0xbe3f80a4 mirror 2
>>>> Aug 01 01:31:53 spaceman.fandingo.org kernel: BTRFS warning (device
>>>> sde1): csum failed root 2820 ino 747435 off 99946496 csum 0x8941f998
>>>> expected csum 0x9c36a6b4 mirror 2
>>>> Aug 01 01:31:53 spaceman.fandingo.org kernel: BTRFS warning (device
>>>> sde1): csum failed root 2820 ino 747435 off 99950592 csum 0x8941f998
>>>> expected csum 0x44d30ca2 mirror 2
>>>> Aug 01 01:31:53 spaceman.fandingo.org kernel: BTRFS warning (device
>>>> sde1): csum failed root 2820 ino 747435 off 99958784 csum 0x8941f998
>>>> expected csum 0xc0f08acc mirror 2
>>>> Aug 01 01:31:53 spaceman.fandingo.org kernel: BTRFS warning (device
>>>> sde1): csum failed root 2820 ino 747435 off 99954688 csum 0x8941f998
>>>> expected csum 0xcb11db59 mirror 2
>>>> Aug 01 01:31:53 spaceman.fandingo.org kernel: BTRFS warning (device
>>>> sde1): csum failed root 2820 ino 747435 off 99962880 csum 0x8941f998
>>>> expected csum 0x8a4ee0aa mirror 2
>>>> Aug 01 01:31:53 spaceman.fandingo.org kernel: BTRFS warning (device
>>>> sde1): csum failed root 2820 ino 747435 off 99971072 csum 0x8941f998
>>>> expected csum 0xdfb79e85 mirror 2
>>>> Aug 01 01:31:53 spaceman.fandingo.org kernel: BTRFS warning (device
>>>> sde1): csum failed root 2820 ino 747435 off 99966976 csum 0x8941f998
>>>> expected csum 0xc14921a0 mirror 2
>>>> Aug 01 01:31:53 spaceman.fandingo.org kernel: BTRFS warning (device
>>>> sde1): csum failed root 2820 ino 747435 off 99975168 csum 0x8941f998
>>>> expected csum 0xf2fe8774 mirror 2
>>>> Aug 01 01:31:53 spaceman.fandingo.org kernel: BTRFS warning (device
>>>> sde1): csum failed root 2820 ino 747435 off 99979264 csum 0x8941f998
>>>> expected csum 0xae1cafd6 mirror 2
>>>>
>>>> While trying to research this problem, I came across a GitHub issue
>>>> https://github.com/kdave/btrfs-progs/issues/282 and a patch from Qu
>>>> from yesterday ([PATCH] btrfs: trim: fix underflow in trim length to
>>>> prevent access beyond device boundary). I do use the discard mount
>>>> option, and I have a weekly fstrim.timer enabled. I did replace 2x2TB
>>>> drives with the 2x8TB drives about 1 month ago, which involved a
>>>> conversion to -d raid5 -m raid1c3, which I suppose could hit the same
>>>> code paths that resize2fs would?
>>>
>>> The problem doesn't look like a trim one, but more likely some device
>>> boundary bug.
>>>
>>> Would you please provide the following info?
>>> - btrfs ins dump-tree -t chunk /dev/sde1
>>>   It contains the device info and the chunk tree dump, and doesn't
>>>   contain any confidential info.
>>>   We can use this info to determine whether some chunk really is beyond
>>>   the device boundary.
>>>   I guess some chunks are somehow already beyond the device boundary.
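
One way to run that boundary check once the numbers are extracted: a minimal sketch, assuming each stripe has been reduced by hand to a (devid, physical offset, length on device) triple, for example from the CHUNK_ITEM stripes in the chunk-tree dump or from the DEV_EXTENT items of the device tree. The class and function names are hypothetical. The per-device stripe length depends on the profile: for RAID5 it is the chunk length divided by (num_stripes - 1), for RAID1C3 it is the full chunk length.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DevStripe:
    """One stripe of a chunk as placed on a single device."""
    devid: int
    offset: int   # physical byte offset on that device
    length: int   # bytes this stripe occupies on that device


def stripe_length(chunk_length: int, profile: str, num_stripes: int) -> int:
    """Bytes each device holds for one chunk; only the profiles used here."""
    if profile == "RAID5":
        # One stripe's worth of capacity goes to parity.
        return chunk_length // (num_stripes - 1)
    if profile == "RAID1C3":
        # Every device holds a full copy.
        return chunk_length
    raise ValueError(f"profile not handled in this sketch: {profile}")


def stripes_past_end(stripes: list[DevStripe],
                     device_bytes: dict[int, int]) -> list[DevStripe]:
    """Return stripes whose end (offset + length) lies beyond their device size.

    `device_bytes` maps devid -> size in bytes; using the real partition
    size from `lsblk -b` flags stripes that cannot physically be read.
    """
    return [s for s in stripes if s.offset + s.length > device_bytes[s.devid]]
```

Any stripe returned by `stripes_past_end` would be consistent with "access beyond end of device" errors on that devid.
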
>>
>> And the `lsblk -b` output.
>>
>> It's possible that the device size recorded in btrfs doesn't match the
>> real device...
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>> Any advice on how to proceed would be greatly appreciated.
>>>>
>>>> Thanks,
>>>> Justin
>>>>
>>>
>>



Thread overview: 6+ messages
2020-08-01  6:51 Justin Brown
2020-08-01  6:58 ` Qu Wenruo
2020-08-01  7:02   ` Qu Wenruo
     [not found]     ` <CAKZK7uzmg19NDjGPPAxXKu7LJ-7ZdHu2cad22csj_chr2qxMJg@mail.gmail.com>
2020-08-01  9:31       ` Qu Wenruo [this message]
2020-08-01 11:56         ` Justin Brown
2020-08-01 23:30           ` Qu Wenruo
