Re: Access Beyond End of Device & Input/Output Errors

From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: chainofflowers <chainofflowers@neuromante.net>,
	linux-btrfs@vger.kernel.org
Subject: Re: Access Beyond End of Device & Input/Output Errors
Date: Mon, 18 Jan 2021 08:11:36 +0800
Message-ID: <09596ccd-56b4-d55e-ad06-26d5c84b9ab6@gmx.com>
In-Reply-To: <5975832.dRgAyDc8OP@luna>



On 2021/1/18 7:38 AM, chainofflowers wrote:
> Hi all,
> Hi Qu,
>
> I am also getting this very same error on my system.
> I have been experiencing this ever since the following old bug was introduced, and even after it was fixed:
> https://lore.kernel.org/linux-btrfs/20190521190023.GA68070@glet/T/
>
> That (dm-related) bug was reported as fixed, and users confirmed that their btrfs partitions were working correctly again, but I still hit issues from time to time, and only on SSDs.
>
> Just to clarify: I am using btrfs volumes on encrypted LUKS partitions, where every partition is encrypted individually.
> I am *NOT* using LVM at all: just btrfs directly on top of LUKS (which is different from the users' setup in the above-mentioned bug reports).
> I trim the partitions only via fstrim: the mount points use the "nodiscard" option, and the LUKS volumes in /etc/crypttab use "discard", so that discards are passed through when I run fstrim.
>
> Unlike Justin's, my partitions are all aligned.
>
> When this happens on my root partition, I can no longer run any command because the file system stops responding (e.g. I get "ls: command not found"). In practice I cannot do anything at all, because the system console is flooded with messages like:
>
>   "sd <....> [sdX] tag#29 access beyond end of device"

The best way to debug such a problem is to recompile the kernel with some
extra debug output.
(Maybe it can be done with bpftrace, but I'm not yet familiar with that.)

If you're able to recompile the kernel (using abs + makepkg for an
Arch-based kernel), please try the following diff.

This adds debug output showing where the offending length comes from:
extent discard or unallocated space discard.
From that output we can continue our investigation.

Thanks,
Qu

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 30b1a630dc2f..7451fa0b14b9 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5776,6 +5776,7 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)

         ret = 0;

+       pr_info("%s: enter devid=%llu\n", __func__, device->devid);
         while (1) {
                 struct btrfs_fs_info *fs_info = device->fs_info;
                 u64 bytes;
@@ -5820,6 +5821,8 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
                         break;
                 }

+               pr_info("%s: devid=%llu start=%llu len=%llu\n",
+                       __func__, device->devid, start, len);
                 ret = btrfs_issue_discard(device->bdev, start, len,
                                           &bytes);
                 if (!ret)
@@ -5842,6 +5845,7 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
                 cond_resched();
         }

+       pr_info("%s: done devid=%llu ret=%d\n", __func__, device->devid, ret);
         return ret;
  }

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 379bef967e1d..03046fca53a2 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3772,6 +3772,8 @@ int btrfs_trim_block_group(struct btrfs_block_group *block_group,
                 spin_unlock(&block_group->lock);
                 return 0;
         }
+       pr_info("%s: enter bg start=%llu start=%llu end=%llu minlen=%llu\n",
+               __func__, block_group->start, start, end, minlen);
         btrfs_freeze_block_group(block_group);
         spin_unlock(&block_group->lock);

@@ -3786,6 +3788,8 @@ int btrfs_trim_block_group(struct btrfs_block_group *block_group,
                 reset_trimming_bitmap(ctl, offset_to_bitmap(ctl, end));
  out:
         btrfs_unfreeze_block_group(block_group);
+       pr_info("%s: done bg start=%llu ret=%d\n",
+               __func__, block_group->start, ret);
         return ret;
  }
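For reading the output of that diff, here is a minimal sketch in C. It is not part of the patch, and the helper names discard_beyond_end() and parse_trim_line() are hypothetical. It assumes the lines come from the "devid=... start=... len=..." pr_info() in btrfs_trim_free_extents(), that start and len are byte offsets on the btrfs device (here the dm-crypt mapping, not the raw partition), and that dev_size is that device's size in bytes, for example as reported by `blockdev --getsize64`. A discard that ends past dev_size is the kind of request one would expect to trigger "access beyond end of device".

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: true if the range [start, start + len) extends
 * past dev_size. Written without computing start + len, so it cannot
 * wrap around on u64 overflow.
 */
static bool discard_beyond_end(uint64_t start, uint64_t len, uint64_t dev_size)
{
	if (len == 0)
		return false;
	return start > dev_size || len > dev_size - start;
}

/*
 * Hypothetical helper: extract devid/start/len from a log line printed
 * by the debug pr_info() in btrfs_trim_free_extents(). The "enter" and
 * "done" lines also contain "devid=" but lack start/len, so sscanf()
 * matches fewer than 3 fields and they are rejected.
 */
static bool parse_trim_line(const char *line, unsigned long long *devid,
			    unsigned long long *start, unsigned long long *len)
{
	const char *p = strstr(line, "devid=");

	if (!p)
		return false;
	return sscanf(p, "devid=%llu start=%llu len=%llu",
		      devid, start, len) == 3;
}
```

Lines from btrfs_trim_block_group() print logical addresses rather than device offsets, so this check only makes sense for the btrfs_trim_free_extents() lines; a hit there would suggest the unallocated space discard path.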


>
> and that "clogs" the system. Since the root fs is unusable, the system log cannot store those messages, so I can't find them at the next reboot.
> I can only soft-reset (CTRL-ALT-DEL); it is the "cleanest" (and only) way to get back to a working system.
>
> When the system restarts, it takes 2 seconds longer than usual to mount the file systems, and then I can use the PC again.
> Immediately after login, if I run btrfs scrub, I get no errors (I scrub ALL of my partitions: they're all fine). So, it seems that at least the auto-recovery capability of btrfs works fine, thanks to you devs :-)
>
> Then, if I boot from an external device and run btrfs check on the unmounted file systems, it also reports NO errors, unlike when the dm bug was still open. To me, this means that btrfs today can recover from this issue (which was not always the case in 2019, when the dm bug was opened).
> I have not yet tried booting from an external device directly after the issue occurs, i.e. running btrfs check without first running btrfs scrub. I will do that next time and see what output I get.
>
> All my partitions are snapshotted, which may help with recovery.
>
> What I have noticed is that when this bug happens, it ALWAYS happens after I have purged the old snapshots, that is, when the root partition has only one "fresh" (read-only) snapshot. It never happens when I have more than one snapshot. Maybe it means nothing, but it seems systematic to me.
>
> I have attached a file with my setup.
> Could you maybe spot anything weird there? It looks fine to me. The USER and SCRATCH volumes are in RAID-0.
>
> I am unable to provide any dmesg output or system log because, as said, when it happens it does not write anything to the SYS partition (where /var/log is). I will move at least /var/log/journal to another device, so hopefully next time I will be able to provide some useful info.
>
> One more thing: I have rebuilt the system SSD from scratch (twice!) to make sure the problem was not caused by some exotic issue, each time on a brand-new device.
> So, this issue has been happening with a SanDisk Ultra II and with two different Samsung EVO 860.
>
> Is it possible that what we are seeing is still an effect of that dm bug, i.e. that it was not completely fixed?
>
>
> Thanks for your help, and for reading till here :)
>
> (c)
>