* IO failure without other (device) error
@ 2021-05-09 22:14 Martin Raiber
  2021-06-18 16:18 ` Martin Raiber
  0 siblings, 1 reply; 15+ messages in thread
From: Martin Raiber @ 2021-05-09 22:14 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I get this (rare) issue where btrfs reports an IO error in run_delayed_refs or finish_ordered_io with no underlying device errors being reported. This is with 5.10.26 but with a few patches like the pcpu ENOMEM fix or work-arounds for btrfs ENOSPC issues:

[1885197.101981] systemd-sysv-generator[2324776]: SysV service '/etc/init.d/exim4' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
[2260628.156893] BTRFS: error (device dm-0) in btrfs_finish_ordered_io:2736: errno=-5 IO failure
[2260628.156980] BTRFS info (device dm-0): forced readonly

This issue occurred on two different machines now (on one twice). Both with ECC RAM. One bare metal (where dm-0 is on an NVMe) and one in a VM (where dm-0 is a Ceph volume).

Regards,
Martin Raiber


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: IO failure without other (device) error
  2021-05-09 22:14 IO failure without other (device) error Martin Raiber
@ 2021-06-18 16:18 ` Martin Raiber
  2021-06-18 16:28   ` Roman Mamedov
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Martin Raiber @ 2021-06-18 16:18 UTC (permalink / raw)
  To: linux-btrfs

On 10.05.2021 00:14 Martin Raiber wrote:
> I get this (rare) issue where btrfs reports an IO error in run_delayed_refs or finish_ordered_io with no underlying device errors being reported. This is with 5.10.26 but with a few patches like the pcpu ENOMEM fix or work-arounds for btrfs ENOSPC issues:
>
> [1885197.101981] systemd-sysv-generator[2324776]: SysV service '/etc/init.d/exim4' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
> [2260628.156893] BTRFS: error (device dm-0) in btrfs_finish_ordered_io:2736: errno=-5 IO failure
> [2260628.156980] BTRFS info (device dm-0): forced readonly
>
> This issue occurred on two different machines now (on one twice). Both with ECC RAM. One bare metal (where dm-0 is on an NVMe) and one in a VM (where dm-0 is a Ceph volume).

Just got it again (5.10.43). So I guess the question is how can I trace where this error comes from... The error message points at btrfs_csum_file_blocks but nothing beyond that. Grep for EIO and put a WARN_ON at each location?
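
I.e., something like this at each site grep turns up (illustrative sketch
only, not an actual patch; the real sites would be whatever grep finds):

    ret = btrfs_csum_file_blocks(trans, csum_root, sum); /* example site */
    WARN_ON(ret == -EIO); /* dump a backtrace so the path shows in dmesg */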


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: IO failure without other (device) error
  2021-06-18 16:18 ` Martin Raiber
@ 2021-06-18 16:28   ` Roman Mamedov
  2021-06-18 17:17     ` Martin Raiber
  2021-06-18 17:36   ` Roman Mamedov
  2021-06-30 18:40   ` Martin Raiber
  2 siblings, 1 reply; 15+ messages in thread
From: Roman Mamedov @ 2021-06-18 16:28 UTC (permalink / raw)
  To: Martin Raiber; +Cc: linux-btrfs

On Fri, 18 Jun 2021 16:18:40 +0000
Martin Raiber <martin@urbackup.org> wrote:

> On 10.05.2021 00:14 Martin Raiber wrote:
> > I get this (rare) issue where btrfs reports an IO error in run_delayed_refs or finish_ordered_io with no underlying device errors being reported. This is with 5.10.26 but with a few patches like the pcpu ENOMEM fix or work-arounds for btrfs ENOSPC issues:
> >
> > [1885197.101981] systemd-sysv-generator[2324776]: SysV service '/etc/init.d/exim4' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
> > [2260628.156893] BTRFS: error (device dm-0) in btrfs_finish_ordered_io:2736: errno=-5 IO failure
> > [2260628.156980] BTRFS info (device dm-0): forced readonly
> >
> > This issue occurred on two different machines now (on one twice). Both with ECC RAM. One bare metal (where dm-0 is on an NVMe) and one in a VM (where dm-0 is a Ceph volume).
> 
> Just got it again (5.10.43). So I guess the question is how can I trace where this error comes from... The error message points at btrfs_csum_file_blocks but nothing beyond that. Grep for EIO and put a WARN_ON at each location?

Is this the complete 'dmesg'? Usually it would print a backtrace that could be
useful for figuring out the reason.

-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: IO failure without other (device) error
  2021-06-18 16:28   ` Roman Mamedov
@ 2021-06-18 17:17     ` Martin Raiber
  0 siblings, 0 replies; 15+ messages in thread
From: Martin Raiber @ 2021-06-18 17:17 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: linux-btrfs

On 18.06.2021 18:28 Roman Mamedov wrote:
> On Fri, 18 Jun 2021 16:18:40 +0000
> Martin Raiber <martin@urbackup.org> wrote:
>
>> On 10.05.2021 00:14 Martin Raiber wrote:
>>> I get this (rare) issue where btrfs reports an IO error in run_delayed_refs or finish_ordered_io with no underlying device errors being reported. This is with 5.10.26 but with a few patches like the pcpu ENOMEM fix or work-arounds for btrfs ENOSPC issues:
>>>
>>> [1885197.101981] systemd-sysv-generator[2324776]: SysV service '/etc/init.d/exim4' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
>>> [2260628.156893] BTRFS: error (device dm-0) in btrfs_finish_ordered_io:2736: errno=-5 IO failure
>>> [2260628.156980] BTRFS info (device dm-0): forced readonly
>>>
>>> This issue occurred on two different machines now (on one twice). Both with ECC RAM. One bare metal (where dm-0 is on an NVMe) and one in a VM (where dm-0 is a Ceph volume).
>> Just got it again (5.10.43). So I guess the question is how can I trace where this error comes from... The error message points at btrfs_csum_file_blocks but nothing beyond that. Grep for EIO and put a WARN_ON at each location?
> Is this the complete 'dmesg'? Usually it would print a backtrace that could be
> useful for figuring out the reason.

Yes, that's why I included the systemd stuff... This is the newest one:

[86764.516395] systemd-journald[892771]: Received client request to flush runtime journal.
[182813.980004] BTRFS: error (device dm-0) in btrfs_finish_ordered_io:2735: errno=-5 IO failure
[182813.980059] BTRFS info (device dm-0): forced readonly
[182814.174438] BTRFS warning (device dm-0): Skipping commit of aborted transaction.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: IO failure without other (device) error
  2021-06-18 16:18 ` Martin Raiber
  2021-06-18 16:28   ` Roman Mamedov
@ 2021-06-18 17:36   ` Roman Mamedov
  2021-06-18 18:24     ` Martin Raiber
  2021-06-30 18:40   ` Martin Raiber
  2 siblings, 1 reply; 15+ messages in thread
From: Roman Mamedov @ 2021-06-18 17:36 UTC (permalink / raw)
  To: Martin Raiber; +Cc: linux-btrfs

On Fri, 18 Jun 2021 16:18:40 +0000
Martin Raiber <martin@urbackup.org> wrote:

> On 10.05.2021 00:14 Martin Raiber wrote:
> the question is how can I trace where this error comes from... The error message points at btrfs_csum_file_blocks but nothing beyond that

Not entirely nothing: in theory the number afterwards is the line number in
the source code (fs/btrfs/inode.c in this case) where the error occurred. But
you need to check your specific patched sources, since if you patch the Btrfs
code the number won't match mainline.
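
The function name and line in the message come from btrfs_abort_transaction(),
which is a macro that records __func__ and __LINE__ at the call site.
Simplified from 5.10's fs/btrfs/ctree.h (the report-once logic is elided):

    #define btrfs_abort_transaction(trans, errno)          \
    do {                                                   \
            /* ... report first abort since mount ... */   \
            __btrfs_abort_transaction((trans), __func__,   \
                                      __LINE__, (errno));  \
    } while (0)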

-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: IO failure without other (device) error
  2021-06-18 17:36   ` Roman Mamedov
@ 2021-06-18 18:24     ` Martin Raiber
  0 siblings, 0 replies; 15+ messages in thread
From: Martin Raiber @ 2021-06-18 18:24 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: linux-btrfs

On 18.06.2021 19:36 Roman Mamedov wrote:
> On Fri, 18 Jun 2021 16:18:40 +0000
> Martin Raiber <martin@urbackup.org> wrote:
>
>> On 10.05.2021 00:14 Martin Raiber wrote:
>> the question is how can I trace where this error comes from... The error message points at btrfs_csum_file_blocks but nothing beyond that
> Not entirely nothing: in theory the number afterwards is the line number in
> the source code (fs/btrfs/inode.c in this case) where the error occurred. But
> you need to check your specific patched sources, since if you patch the Btrfs
> code the number won't match mainline.
>
Yeah, the line points at this in inode.c:

    ret = add_pending_csums(trans, &ordered_extent->list);
    if (ret) {
        -->btrfs_abort_transaction(trans, ret);
        goto out;
    }

and the only thing that can fail in add_pending_csums is btrfs_csum_file_blocks, which is why I said it points at that function.
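
For reference, add_pending_csums is roughly this in 5.10 (paraphrased, check
your own tree):

    static int add_pending_csums(struct btrfs_trans_handle *trans,
                                 struct list_head *list)
    {
        struct btrfs_ordered_sum *sum;
        int ret;

        list_for_each_entry(sum, list, list) {
            trans->adding_csums = true;
            /* the only call in here that can return an error */
            ret = btrfs_csum_file_blocks(trans,
                                         trans->fs_info->csum_root, sum);
            trans->adding_csums = false;
            if (ret)
                return ret;
        }
        return 0;
    }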


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: IO failure without other (device) error
  2021-06-18 16:18 ` Martin Raiber
  2021-06-18 16:28   ` Roman Mamedov
  2021-06-18 17:36   ` Roman Mamedov
@ 2021-06-30 18:40   ` Martin Raiber
  2021-07-01  1:40     ` Qu Wenruo
  2 siblings, 1 reply; 15+ messages in thread
From: Martin Raiber @ 2021-06-30 18:40 UTC (permalink / raw)
  To: linux-btrfs

On 18.06.2021 18:18 Martin Raiber wrote:
> On 10.05.2021 00:14 Martin Raiber wrote:
>> I get this (rare) issue where btrfs reports an IO error in run_delayed_refs or finish_ordered_io with no underlying device errors being reported. This is with 5.10.26 but with a few patches like the pcpu ENOMEM fix or work-arounds for btrfs ENOSPC issues:
>>
>> [1885197.101981] systemd-sysv-generator[2324776]: SysV service '/etc/init.d/exim4' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
>> [2260628.156893] BTRFS: error (device dm-0) in btrfs_finish_ordered_io:2736: errno=-5 IO failure
>> [2260628.156980] BTRFS info (device dm-0): forced readonly
>>
>> This issue occurred on two different machines now (on one twice). Both with ECC RAM. One bare metal (where dm-0 is on an NVMe) and one in a VM (where dm-0 is a Ceph volume).
> Just got it again (5.10.43). So I guess the question is how can I trace where this error comes from... The error message points at btrfs_csum_file_blocks but nothing beyond that. Grep for EIO and put a WARN_ON at each location?
>
Added the WARN_ONs at the -EIO locations, and hit one. It points at read_extent_buffer_pages (this time), at this part before unlock_exit:

    for (i = 0; i < num_pages; i++) {
        page = eb->pages[i];
        wait_on_page_locked(page);
        if (!PageUptodate(page))
            -->ret = -EIO;
    }

Complete dmesg output below. In this instance it seems unable to read a csum. It doesn't go read-only in this case... Maybe it should?

[Wed Jun 30 10:31:11 2021] systemd[1]: Started Journal Service.
[Wed Jun 30 10:43:22 2021] ------------[ cut here ]------------
[Wed Jun 30 10:43:22 2021] WARNING: CPU: 13 PID: 3965190 at fs/btrfs/extent_io.c:5597 read_extent_buffer_pages+0x346/0x360
[Wed Jun 30 10:43:22 2021] Modules linked in: zram bcache crc64 loop dm_crypt bfq xfs dm_mod st sr_mod cdrom bridge stp llc intel_powerclamp coretemp kvm_intel snd_pcm mgag200 snd_timer kvm snd drm_kms_helper iTCO_wdt soundcore dcdbas irqbypass serio_raw pcspkr joydev iTCO_vendor_support evdev i2c_algo_bit i7core_edac sg ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter button ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm configfs ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid0 multipath linear raid1 md_mod ses enclosure sd_mod hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd ahci cryptd glue_helper libahci uhci_hcd ehci_pci psmouse mpt3sas ehci_hcd raid_class lpc_ich libata nvme scsi_transport_sas mfd_core nvme_core usbcore t10_pi scsi_mod bnx2
[Wed Jun 30 10:43:22 2021] CPU: 13 PID: 3965190 Comm: io_wqe_worker-0 Tainted: G          I       5.10.44 #1
[Wed Jun 30 10:43:22 2021] Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.13.0 03/02/2018
[Wed Jun 30 10:43:22 2021] RIP: 0010:read_extent_buffer_pages+0x346/0x360
[Wed Jun 30 10:43:22 2021] Code: 48 8b 43 08 a8 01 48 8d 78 ff 48 0f 44 fb 31 f6 e8 8f d5 db ff 48 8b 43 08 48 8d 50 ff a8 01 48 0f 45 da 48 8b 03 a8 04 75 ab <0f> 0b 41 be fb ff ff ff eb a1 e8 5b a2 56 00 66 66 2e 0f 1f 84 00
[Wed Jun 30 10:43:22 2021] RSP: 0018:ffffc900215ab5f0 EFLAGS: 00010246
[Wed Jun 30 10:43:22 2021] RAX: 06ffff80000020a3 RBX: ffffea0008a1b680 RCX: 0000000000000000
[Wed Jun 30 10:43:22 2021] RDX: dead0000000000ff RSI: ffffc900215ab588 RDI: ffffffff82208780
[Wed Jun 30 10:43:22 2021] RBP: ffff88844c924d68 R08: 000000000000000d R09: 00000000000003bb
[Wed Jun 30 10:43:22 2021] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88844c924e10
[Wed Jun 30 10:43:22 2021] R13: ffffea00091094c0 R14: 0000000000000000 R15: ffff88844c924e10
[Wed Jun 30 10:43:22 2021] FS:  0000000000000000(0000) GS:ffff888713b80000(0000) knlGS:0000000000000000
[Wed Jun 30 10:43:22 2021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Jun 30 10:43:22 2021] CR2: 00007f4c7a7fb00a CR3: 000000012f1ba003 CR4: 00000000000206e0
[Wed Jun 30 10:43:22 2021] Call Trace:
[Wed Jun 30 10:43:22 2021]  btree_read_extent_buffer_pages+0x9f/0x110
[Wed Jun 30 10:43:22 2021]  read_tree_block+0x36/0x60
[Wed Jun 30 10:43:22 2021]  read_block_for_search.isra.0+0x1ab/0x360
[Wed Jun 30 10:43:22 2021]  btrfs_search_slot+0x232/0x970
[Wed Jun 30 10:43:22 2021]  btrfs_lookup_csum+0x75/0x170
[Wed Jun 30 10:43:22 2021]  btrfs_lookup_bio_sums+0x23a/0x620
[Wed Jun 30 10:43:22 2021]  btrfs_submit_compressed_read+0x44f/0x4e0
[Wed Jun 30 10:43:22 2021]  btrfs_submit_data_bio+0x170/0x180
[Wed Jun 30 10:43:22 2021]  submit_one_bio+0x44/0x70
[Wed Jun 30 10:43:22 2021]  extent_readahead+0x374/0x3a0
[Wed Jun 30 10:43:22 2021]  ? xas_load+0x5/0x70
[Wed Jun 30 10:43:22 2021]  ? xa_get_order+0x93/0xd0
[Wed Jun 30 10:43:22 2021]  ? __mod_memcg_lruvec_state+0x21/0xe0
[Wed Jun 30 10:43:22 2021]  ? __add_to_page_cache_locked+0x19d/0x3f0
[Wed Jun 30 10:43:22 2021]  read_pages+0x83/0x1e0
[Wed Jun 30 10:43:22 2021]  page_cache_ra_unbounded+0x1aa/0x1f0
[Wed Jun 30 10:43:22 2021]  ? find_get_entry+0xd5/0x140
[Wed Jun 30 10:43:22 2021]  force_page_cache_ra+0xdc/0x140
[Wed Jun 30 10:43:22 2021]  generic_file_buffered_read+0x5d9/0x8e0
[Wed Jun 30 10:43:22 2021]  btrfs_file_read_iter+0x38/0xc0
[Wed Jun 30 10:43:22 2021]  generic_file_splice_read+0xfc/0x1b0
[Wed Jun 30 10:43:22 2021]  do_splice+0x670/0x720
[Wed Jun 30 10:43:22 2021]  io_issue_sqe+0xde7/0xfa0
[Wed Jun 30 10:43:22 2021]  ? pick_next_task_fair+0x32/0x380
[Wed Jun 30 10:43:22 2021]  ? __switch_to+0x7b/0x3d0
[Wed Jun 30 10:43:22 2021]  io_wq_submit_work+0x50/0x1a0
[Wed Jun 30 10:43:22 2021]  io_worker_handle_work+0x1a1/0x570
[Wed Jun 30 10:43:22 2021]  io_wqe_worker+0x2c1/0x360
[Wed Jun 30 10:43:22 2021]  ? io_worker_handle_work+0x570/0x570
[Wed Jun 30 10:43:22 2021]  kthread+0x11b/0x140
[Wed Jun 30 10:43:22 2021]  ? __kthread_bind_mask+0x60/0x60
[Wed Jun 30 10:43:22 2021]  ret_from_fork+0x1f/0x30
[Wed Jun 30 10:43:22 2021] ---[ end trace 32a38f7d02f704b2 ]---
[Wed Jun 30 10:43:22 2021] BTRFS info (device dm-0): no csum found for inode 23468615 start 10428416
[Wed Jun 30 10:43:22 2021] BTRFS warning (device dm-0): csum failed root 1571 ino 23468615 off 586692173824 csum 0xc0a0c537 expected csum 0x00000000 mirror 1
[Wed Jun 30 10:43:22 2021] BTRFS error (device dm-0): bdev /dev/mapper/LUKS-RC-75aba4438f084377be2fc6d7d1f8eba0 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: IO failure without other (device) error
  2021-06-30 18:40   ` Martin Raiber
@ 2021-07-01  1:40     ` Qu Wenruo
  2021-07-01 17:25       ` Martin Raiber
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2021-07-01  1:40 UTC (permalink / raw)
  To: Martin Raiber, linux-btrfs



On 2021/7/1 2:40 AM, Martin Raiber wrote:
> On 18.06.2021 18:18 Martin Raiber wrote:
>> On 10.05.2021 00:14 Martin Raiber wrote:
>>> I get this (rare) issue where btrfs reports an IO error in run_delayed_refs or finish_ordered_io with no underlying device errors being reported. This is with 5.10.26 but with a few patches like the pcpu ENOMEM fix or work-arounds for btrfs ENOSPC issues:
>>>
>>> [1885197.101981] systemd-sysv-generator[2324776]: SysV service '/etc/init.d/exim4' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
>>> [2260628.156893] BTRFS: error (device dm-0) in btrfs_finish_ordered_io:2736: errno=-5 IO failure
>>> [2260628.156980] BTRFS info (device dm-0): forced readonly
>>>
>>> This issue occurred on two different machines now (on one twice). Both with ECC RAM. One bare metal (where dm-0 is on an NVMe) and one in a VM (where dm-0 is a Ceph volume).
>> Just got it again (5.10.43). So I guess the question is how can I trace where this error comes from... The error message points at btrfs_csum_file_blocks but nothing beyond that. Grep for EIO and put a WARN_ON at each location?
>>
> Added the WARN_ONs at the -EIO locations, and hit one. It points at read_extent_buffer_pages (this time), at this part before unlock_exit:

Well, this is quite different from your initial report.

Your initial report is an EIO in btrfs_finish_ordered_io(), which happens
after all data has been written back to disk.

But in this particular case, it happens before we submit the data to disk.

Here we search the csum tree first, to find the csum for the range we want
to read, before submitting the read bio.

Thus they are on completely different paths.
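
I.e., from the call trace you posted, the read side goes roughly:

    btrfs_submit_data_bio()
      -> btrfs_submit_compressed_read()
        -> btrfs_lookup_bio_sums()   /* find the csums for the bio range */
          -> btrfs_lookup_csum()
            -> btrfs_search_slot()   /* may read csum tree blocks */
              -> ... -> read_extent_buffer_pages()   /* your -EIO site */

Only after that does the actual data read bio get submitted.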

>
>      for (i = 0; i < num_pages; i++) {
>          page = eb->pages[i];
>          wait_on_page_locked(page);
>          if (!PageUptodate(page))
>              -->ret = -EIO;
>      }
>
> Complete dmesg output below. In this instance it seems unable to read a csum. It doesn't go read-only in this case... Maybe it should?
>
> [Wed Jun 30 10:31:11 2021] systemd[1]: Started Journal Service.
> [Wed Jun 30 10:43:22 2021] ------------[ cut here ]------------
> [Wed Jun 30 10:43:22 2021] WARNING: CPU: 13 PID: 3965190 at fs/btrfs/extent_io.c:5597 read_extent_buffer_pages+0x346/0x360
> [Wed Jun 30 10:43:22 2021] Modules linked in: zram bcache crc64 loop dm_crypt bfq xfs dm_mod st sr_mod cdrom bridge stp llc intel_powerclamp coretemp kvm_intel snd_pcm mgag200 snd_timer kvm snd drm_kms_helper iTCO_wdt soundcore dcdbas irqbypass serio_raw pcspkr joydev iTCO_vendor_support evdev i2c_algo_bit i7core_edac sg ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter button ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm configfs ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid0 multipath linear raid1 md_mod ses enclosure sd_mod hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd ahci cryptd glue_helper libahci uhci_hcd ehci_pci psmouse mpt3sas ehci_hcd raid_class lpc_ich libata nvme scsi_transport_sas mfd_core nvme_core usbcore t10_pi scsi_mod bnx2
> [Wed Jun 30 10:43:22 2021] CPU: 13 PID: 3965190 Comm: io_wqe_worker-0 Tainted: G          I       5.10.44 #1
> [Wed Jun 30 10:43:22 2021] Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.13.0 03/02/2018
> [Wed Jun 30 10:43:22 2021] RIP: 0010:read_extent_buffer_pages+0x346/0x360
> [Wed Jun 30 10:43:22 2021] Code: 48 8b 43 08 a8 01 48 8d 78 ff 48 0f 44 fb 31 f6 e8 8f d5 db ff 48 8b 43 08 48 8d 50 ff a8 01 48 0f 45 da 48 8b 03 a8 04 75 ab <0f> 0b 41 be fb ff ff ff eb a1 e8 5b a2 56 00 66 66 2e 0f 1f 84 00
> [Wed Jun 30 10:43:22 2021] RSP: 0018:ffffc900215ab5f0 EFLAGS: 00010246
> [Wed Jun 30 10:43:22 2021] RAX: 06ffff80000020a3 RBX: ffffea0008a1b680 RCX: 0000000000000000
> [Wed Jun 30 10:43:22 2021] RDX: dead0000000000ff RSI: ffffc900215ab588 RDI: ffffffff82208780
> [Wed Jun 30 10:43:22 2021] RBP: ffff88844c924d68 R08: 000000000000000d R09: 00000000000003bb
> [Wed Jun 30 10:43:22 2021] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88844c924e10
> [Wed Jun 30 10:43:22 2021] R13: ffffea00091094c0 R14: 0000000000000000 R15: ffff88844c924e10
> [Wed Jun 30 10:43:22 2021] FS:  0000000000000000(0000) GS:ffff888713b80000(0000) knlGS:0000000000000000
> [Wed Jun 30 10:43:22 2021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Wed Jun 30 10:43:22 2021] CR2: 00007f4c7a7fb00a CR3: 000000012f1ba003 CR4: 00000000000206e0
> [Wed Jun 30 10:43:22 2021] Call Trace:
> [Wed Jun 30 10:43:22 2021]  btree_read_extent_buffer_pages+0x9f/0x110
> [Wed Jun 30 10:43:22 2021]  read_tree_block+0x36/0x60
> [Wed Jun 30 10:43:22 2021]  read_block_for_search.isra.0+0x1ab/0x360
> [Wed Jun 30 10:43:22 2021]  btrfs_search_slot+0x232/0x970
> [Wed Jun 30 10:43:22 2021]  btrfs_lookup_csum+0x75/0x170
> [Wed Jun 30 10:43:22 2021]  btrfs_lookup_bio_sums+0x23a/0x620
> [Wed Jun 30 10:43:22 2021]  btrfs_submit_compressed_read+0x44f/0x4e0
> [Wed Jun 30 10:43:22 2021]  btrfs_submit_data_bio+0x170/0x180
> [Wed Jun 30 10:43:22 2021]  submit_one_bio+0x44/0x70
> [Wed Jun 30 10:43:22 2021]  extent_readahead+0x374/0x3a0
> [Wed Jun 30 10:43:22 2021]  ? xas_load+0x5/0x70
> [Wed Jun 30 10:43:22 2021]  ? xa_get_order+0x93/0xd0
> [Wed Jun 30 10:43:22 2021]  ? __mod_memcg_lruvec_state+0x21/0xe0
> [Wed Jun 30 10:43:22 2021]  ? __add_to_page_cache_locked+0x19d/0x3f0
> [Wed Jun 30 10:43:22 2021]  read_pages+0x83/0x1e0
> [Wed Jun 30 10:43:22 2021]  page_cache_ra_unbounded+0x1aa/0x1f0
> [Wed Jun 30 10:43:22 2021]  ? find_get_entry+0xd5/0x140
> [Wed Jun 30 10:43:22 2021]  force_page_cache_ra+0xdc/0x140
> [Wed Jun 30 10:43:22 2021]  generic_file_buffered_read+0x5d9/0x8e0
> [Wed Jun 30 10:43:22 2021]  btrfs_file_read_iter+0x38/0xc0
> [Wed Jun 30 10:43:22 2021]  generic_file_splice_read+0xfc/0x1b0
> [Wed Jun 30 10:43:22 2021]  do_splice+0x670/0x720
> [Wed Jun 30 10:43:22 2021]  io_issue_sqe+0xde7/0xfa0
> [Wed Jun 30 10:43:22 2021]  ? pick_next_task_fair+0x32/0x380
> [Wed Jun 30 10:43:22 2021]  ? __switch_to+0x7b/0x3d0
> [Wed Jun 30 10:43:22 2021]  io_wq_submit_work+0x50/0x1a0
> [Wed Jun 30 10:43:22 2021]  io_worker_handle_work+0x1a1/0x570
> [Wed Jun 30 10:43:22 2021]  io_wqe_worker+0x2c1/0x360
> [Wed Jun 30 10:43:22 2021]  ? io_worker_handle_work+0x570/0x570
> [Wed Jun 30 10:43:22 2021]  kthread+0x11b/0x140
> [Wed Jun 30 10:43:22 2021]  ? __kthread_bind_mask+0x60/0x60
> [Wed Jun 30 10:43:22 2021]  ret_from_fork+0x1f/0x30
> [Wed Jun 30 10:43:22 2021] ---[ end trace 32a38f7d02f704b2 ]---
> [Wed Jun 30 10:43:22 2021] BTRFS info (device dm-0): no csum found for inode 23468615 start 10428416
> [Wed Jun 30 10:43:22 2021] BTRFS warning (device dm-0): csum failed root 1571 ino 23468615 off 586692173824 csum 0xc0a0c537 expected csum 0x00000000 mirror 1

For this particular case, btrfs first can't find the csum for the read
range, so it just leaves the csum as all zeros and continues.

Then the data read from disk will definitely cause a csum mismatch.

This normally means csum tree corruption.

Can you run btrfs-check on that fs?

Thanks,
Qu
> [Wed Jun 30 10:43:22 2021] BTRFS error (device dm-0): bdev /dev/mapper/LUKS-RC-75aba4438f084377be2fc6d7d1f8eba0 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: IO failure without other (device) error
  2021-07-01  1:40     ` Qu Wenruo
@ 2021-07-01 17:25       ` Martin Raiber
  2021-07-01 22:19         ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Martin Raiber @ 2021-07-01 17:25 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 01.07.2021 03:40 Qu Wenruo wrote:
>
>
> On 2021/7/1 2:40 AM, Martin Raiber wrote:
>> On 18.06.2021 18:18 Martin Raiber wrote:
>>> On 10.05.2021 00:14 Martin Raiber wrote:
>>>> I get this (rare) issue where btrfs reports an IO error in run_delayed_refs or finish_ordered_io with no underlying device errors being reported. This is with 5.10.26 but with a few patches like the pcpu ENOMEM fix or work-arounds for btrfs ENOSPC issues:
>>>>
>>>> [1885197.101981] systemd-sysv-generator[2324776]: SysV service '/etc/init.d/exim4' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
>>>> [2260628.156893] BTRFS: error (device dm-0) in btrfs_finish_ordered_io:2736: errno=-5 IO failure
>>>> [2260628.156980] BTRFS info (device dm-0): forced readonly
>>>>
>>>> This issue occurred on two different machines now (on one twice). Both with ECC RAM. One bare metal (where dm-0 is on an NVMe) and one in a VM (where dm-0 is a Ceph volume).
>>> Just got it again (5.10.43). So I guess the question is how can I trace where this error comes from... The error message points at btrfs_csum_file_blocks but nothing beyond that. Grep for EIO and put a WARN_ON at each location?
>>>
>> Added the WARN_ONs at the -EIO locations, and hit one. It points at read_extent_buffer_pages (this time), at this part before unlock_exit:
>
> Well, this is quite different from your initial report.
>
> Your initial report is an EIO in btrfs_finish_ordered_io(), which happens
> after all data has been written back to disk.
>
> But in this particular case, it happens before we submit the data to disk.
>
> Here we search the csum tree first, to find the csum for the range we want
> to read, before submitting the read bio.
>
> Thus they are on completely different paths.
Yes, it fails to read the csum because read_extent_buffer_pages returns -EIO. I made the (I think reasonable) assumption that there is only one issue in btrfs where -EIO happens without an actual IO error on the underlying device. The original issue has line numbers that point at btrfs_csum_file_blocks, which calls btrfs_lookup_csum, which is in the call path of this issue. I can't confirm it's the same issue because the original report didn't have the WARN_ONs in there, so feel free to treat them as separate issues.
>
>>
>>      for (i = 0; i < num_pages; i++) {
>>          page = eb->pages[i];
>>          wait_on_page_locked(page);
>>          if (!PageUptodate(page))
>>              -->ret = -EIO;
>>      }
>>
>> Complete dmesg output below. In this instance it seems unable to read a csum. It doesn't go read-only in this case... Maybe it should?
>>
>> [Wed Jun 30 10:31:11 2021] kernel log
>
> For this particular case, btrfs first can't find the csum for the read
> range, so it just leaves the csum as all zeros and continues.
>
> Then the data read from disk will definitely cause a csum mismatch.
>
> This normally means csum tree corruption.
>
> Can you run btrfs-check on that fs?

It didn't "find" the csum because it has an -EIO error reading the extent where the csum is supposed to be stored. It is not a csum tree corruption because that would cause different log messages like transid not matching or csum of tree nodes being wrong, I think.

Sorry, the file is long deleted. Scrub comes back as clean and I guess the -EIO error causing the csum read failure was only transient anyway.

I'm obviously not sufficiently familiar with the btrfs/block device/mm subsystems, but here is one guess at what could be wrong.

It waits for completion for the read of the extent buffer page like this:

wait_on_page_locked(page);
if (!PageUptodate(page))
    ret = -EIO;

while in filemap.c it reads a page like this:

wait_on_page_locked(page);
if (PageUptodate(page))
    goto out;
lock_page(page);
if (!page->mapping) {
        unlock_page(page);
        put_page(page);
        goto repeat;
}
/* Someone else locked and filled the page in a very small window */
if (PageUptodate(page)) {
        unlock_page(page);
        goto out;
}

With the comment:

> /*
> * Page is not up to date and may be locked due to one of the following
> * case a: Page is being filled and the page lock is held
> * case b: Read/write error clearing the page uptodate status
> * case c: Truncation in progress (page locked)
> * case d: Reclaim in progress
> *
> * Case a, the page will be up to date when the page is unlocked.
> * There is no need to serialise on the page lock here as the page
> * is pinned so the lock gives no additional protection. Even if the
> * page is truncated, the data is still valid if PageUptodate as
> * it's a race vs truncate race.
> * Case b, the page will not be up to date
> * Case c, the page may be truncated but in itself, the data may still
> * be valid after IO completes as it's a read vs truncate race. The
> * operation must restart if the page is not uptodate on unlock but
> * otherwise serialising on page lock to stabilise the mapping gives
> * no additional guarantees to the caller as the page lock is
> * released before return.
> * Case d, similar to truncation. If reclaim holds the page lock, it
> * will be a race with remove_mapping that determines if the mapping
> * is valid on unlock but otherwise the data is valid and there is
> * no need to serialise with page lock.
> *
> * As the page lock gives no additional guarantee, we optimistically
> * wait on the page to be unlocked and check if it's up to date and
> * use the page if it is. Otherwise, the page lock is required to
> * distinguish between the different cases. The motivation is that we
> * avoid spurious serialisations and wakeups when multiple processes
> * wait on the same page for IO to complete.
> */
So maybe the extent buffer page gets e.g. reclaimed in the small window between unlock and PageUptodate check?

Another option is case b (read/write error), but the NVMe/dm subsystem doesn't log any error for some reason.

I guess I could add the lock and check for mapping and PageError(page) to narrow it down further?
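
Something like this (untested sketch):

    wait_on_page_locked(page);
    if (!PageUptodate(page)) {
        lock_page(page); /* serialise, the way filemap.c does */
        if (!page->mapping)
            pr_info("btrfs dbg: eb page lost its mapping\n");
        else if (PageError(page))
            pr_info("btrfs dbg: eb page has PG_error set\n");
        else if (PageUptodate(page))
            pr_info("btrfs dbg: eb page became uptodate after relock\n");
        unlock_page(page);
        if (!PageUptodate(page))
            ret = -EIO;
    }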


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: IO failure without other (device) error
  2021-07-01 17:25       ` Martin Raiber
@ 2021-07-01 22:19         ` Qu Wenruo
  2021-07-02 16:29           ` Martin Raiber
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2021-07-01 22:19 UTC (permalink / raw)
  To: Martin Raiber, Qu Wenruo, linux-btrfs



On 2021/7/2 1:25 AM, Martin Raiber wrote:
> On 01.07.2021 03:40 Qu Wenruo wrote:
>>
>>
>> On 2021/7/1 2:40 AM, Martin Raiber wrote:
>>> On 18.06.2021 18:18 Martin Raiber wrote:
>>>> On 10.05.2021 00:14 Martin Raiber wrote:
>>>>> I get this (rare) issue where btrfs reports an IO error in run_delayed_refs or finish_ordered_io with no underlying device errors being reported. This is with 5.10.26 but with a few patches like the pcpu ENOMEM fix or work-arounds for btrfs ENOSPC issues:
>>>>>
>>>>> [1885197.101981] systemd-sysv-generator[2324776]: SysV service '/etc/init.d/exim4' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
>>>>> [2260628.156893] BTRFS: error (device dm-0) in btrfs_finish_ordered_io:2736: errno=-5 IO failure
>>>>> [2260628.156980] BTRFS info (device dm-0): forced readonly
>>>>>
>>>>> This issue occurred on two different machines now (on one twice). Both with ECC RAM. One bare metal (where dm-0 is on an NVMe) and one in a VM (where dm-0 is a Ceph volume).
>>>> Just got it again (5.10.43). So I guess the question is how can I trace where this error comes from... The error message points at btrfs_csum_file_blocks but nothing beyond that. Grep for EIO and put a WARN_ON at each location?
>>>>
>>> Added the WARN_ONs at the -EIO locations, and hit one. It points at read_extent_buffer_pages (this time), at this part before unlock_exit:
>>
>> Well, this is quite different from your initial report.
>>
>> Your initial report is an EIO in btrfs_finish_ordered_io(), which happens
>> after all data has been written back to disk.
>>
>> But in this particular case, it happens before we submit the data to disk.
>>
>> Here we search the csum tree first, to find the csum for the range we want
>> to read, before submitting the read bio.
>>
>> Thus they are on completely different paths.
> Yes, it fails to read the csum because read_extent_buffer_pages returns -EIO. I made the (I think reasonable) assumption that there is only one issue in btrfs where -EIO happens without an actual IO error on the underlying device. The original issue has line numbers that point at btrfs_csum_file_blocks, which calls btrfs_lookup_csum, which is in the call path of this issue. I can't confirm it's the same issue because the original report didn't have the WARN_ONs in there, so feel free to treat them as separate issues.
>>
>>>
>>>       for (i = 0; i < num_pages; i++) {
>>>           page = eb->pages[i];
>>>           wait_on_page_locked(page);
>>>           if (!PageUptodate(page))
>>>               -->ret = -EIO;
>>>       }
>>>
>>> Complete dmesg output below. In this instance it seems unable to read a csum. It doesn't go read-only in this case... Maybe it should?
>>>
>>> [Wed Jun 30 10:31:11 2021] kernel log
>>
>> For this particular case, btrfs first can't find the csum for the read
>> range, so it just leaves the csum as all zeros and continues.
>>
>> Then the data read from disk will definitely cause a csum mismatch.
>>
>> This normally means csum tree corruption.
>>
>> Can you run btrfs-check on that fs?
>
> It didn't "find" the csum because it has an -EIO error reading the extent where the csum is supposed to be stored. It is not a csum tree corruption because that would cause different log messages like transid not matching or csum of tree nodes being wrong, I think.

Yes, that's what I would expect as well, and that's what feels strange about it.

>
> Sorry, the file is long deleted. Scrub comes back as clean and I guess the -EIO error causing the csum read failure was only transient anyway.
>
> I'm obviously not sufficiently familiar with the btrfs/block device/mm subsystems, but here is one guess at what could be wrong.
>
> It waits for completion for the read of the extent buffer page like this:
>
> wait_on_page_locked(page);
> if (!PageUptodate(page))
>      ret = -EIO;
>
> while in filemap.c it reads a page like this:
>
> wait_on_page_locked(page);
> if (PageUptodate(page))
>      goto out;
> lock_page(page);
> if (!page->mapping) {
>          unlock_page(page);
>          put_page(page);
>          goto repeat;
> }

Yes, that's what we do for the data read path: each time a page gets
unlocked, the page invalidator can trigger for it, and when we re-lock the
page it may have been invalidated.

Although the above check has been updated to do extra checks, including
page->mapping and page->private, to be extra sure.

> /* Someone else locked and filled the page in a very small window */
> if (PageUptodate(page)) {
>          unlock_page(page);
>          goto out;
>
> }
>
> With the comment:
>
>> /*
>> * Page is not up to date and may be locked due to one of the following
>> * case a: Page is being filled and the page lock is held
>> * case b: Read/write error clearing the page uptodate status
>> * case c: Truncation in progress (page locked)
>> * case d: Reclaim in progress
>> *
>> * Case a, the page will be up to date when the page is unlocked.
>> * There is no need to serialise on the page lock here as the page
>> * is pinned so the lock gives no additional protection. Even if the
>> * page is truncated, the data is still valid if PageUptodate as
>> * it's a race vs truncate race.
>> * Case b, the page will not be up to date
>> * Case c, the page may be truncated but in itself, the data may still
>> * be valid after IO completes as it's a read vs truncate race. The
>> * operation must restart if the page is not uptodate on unlock but
>> * otherwise serialising on page lock to stabilise the mapping gives
>> * no additional guarantees to the caller as the page lock is
>> * released before return.
>> * Case d, similar to truncation. If reclaim holds the page lock, it
>> * will be a race with remove_mapping that determines if the mapping
>> * is valid on unlock but otherwise the data is valid and there is
>> * no need to serialise with page lock.
>> *
>> * As the page lock gives no additional guarantee, we optimistically
>> * wait on the page to be unlocked and check if it's up to date and
>> * use the page if it is. Otherwise, the page lock is required to
>> * distinguish between the different cases. The motivation is that we
>> * avoid spurious serialisations and wakeups when multiple processes
>> * wait on the same page for IO to complete.
>> */
> So maybe the extent buffer page gets e.g. reclaimed in the small window between unlock and PageUptodate check?

But for the metadata case, unlike the data path, we have very limited ways
to invalidate/release a page.

Unlike the data path, a metadata page uses its page->private as a pointer
to the extent buffer.

And each time we want to drop a metadata page, we can only do that if the
extent buffer owning the page can be removed from the extent buffer cache.

Thus an unlocked metadata page getting released halfway is not expected
behavior at all.
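
Roughly, the release path checks this (try_release_extent_buffer() in 5.10,
simplified):

    spin_lock(&page->mapping->private_lock);
    eb = (struct extent_buffer *)page->private;
    spin_lock(&eb->refs_lock);
    if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb)) {
        /* still referenced or under IO: the page is not released */
        spin_unlock(&eb->refs_lock);
        spin_unlock(&page->mapping->private_lock);
        return 0;
    }

So as long as anyone holds a reference to the eb, its pages stay.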

>
> Another option is case b (read/write error), but the NVMe/dm subsystem doesn't log any error for some reason.

I don't believe that's the case either, or we would have a csum mismatch
report from btrfs.

>
> I guess I could add the lock and check for mapping and PageError(page) to narrow it down further?
>
If you have a proper way to reproduce the bug reliably, I could craft a
diff for you to debug (with everything output to the ftrace buffer).

Thanks,
Qu

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: IO failure without other (device) error
  2021-07-01 22:19         ` Qu Wenruo
@ 2021-07-02 16:29           ` Martin Raiber
  2021-07-02 22:46             ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Martin Raiber @ 2021-07-02 16:29 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 02.07.2021 00:19 Qu Wenruo wrote:
>
>
> On 2021/7/2 1:25 AM, Martin Raiber wrote:
>> On 01.07.2021 03:40 Qu Wenruo wrote:
>>>
>>>
>>> On 2021/7/1 2:40 AM, Martin Raiber wrote:
>>>> On 18.06.2021 18:18 Martin Raiber wrote:
>>>>> On 10.05.2021 00:14 Martin Raiber wrote:
>>>>>> I get this (rare) issue where btrfs reports an IO error in run_delayed_refs or finish_ordered_io with no underlying device errors being reported. This is with 5.10.26 but with a few patches like the pcpu ENOMEM fix or work-arounds for btrfs ENOSPC issues:
>>>>>>
>>>>>> [1885197.101981] systemd-sysv-generator[2324776]: SysV service '/etc/init.d/exim4' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
>>>>>> [2260628.156893] BTRFS: error (device dm-0) in btrfs_finish_ordered_io:2736: errno=-5 IO failure
>>>>>> [2260628.156980] BTRFS info (device dm-0): forced readonly
>>>>>>
>>>>>> This issue occurred on two different machines now (on one twice). Both with ECC RAM. One bare metal (where dm-0 is on an NVMe) and one in a VM (where dm-0 is a Ceph volume).
>>>>> Just got it again (5.10.43). So I guess the question is how can I trace where this error comes from... The error message points at btrfs_csum_file_blocks but nothing beyond that. Grep for EIO and put a WARN_ON at each location?
>>>>>
>>>> Added the WARN_ONs at the -EIO locations, and hit one. It points at read_extent_buffer_pages (this time), at this part before unlock_exit:
>>>
>>> Well, this is quite different from your initial report.
>>>
>>> Your initial report is an EIO in btrfs_finish_ordered_io(), which happens
>>> after all data has been written back to disk.
>>>
>>> But in this particular case, it happens before we submit the data to disk.
>>>
>>> Here we search the csum tree first, to find the csum for the range we want
>>> to read, before submitting the read bio.
>>>
>>> Thus they are on completely different paths.
>> Yes, it fails to read the csum because read_extent_buffer_pages returns -EIO. I made the (I think reasonable) assumption that there is only one issue in btrfs where -EIO happens without an actual IO error on the underlying device. The original issue has line numbers that point at btrfs_csum_file_blocks, which calls btrfs_lookup_csum, which is in the call path of this issue. I can't confirm it's the same issue because the original report didn't have the WARN_ONs in there, so feel free to treat them as separate issues.
>>>
>>>>
>>>>       for (i = 0; i < num_pages; i++) {
>>>>           page = eb->pages[i];
>>>>           wait_on_page_locked(page);
>>>>           if (!PageUptodate(page))
>>>>               -->ret = -EIO;
>>>>       }
>>>>
>>>> Complete dmesg output below. In this instance it seems unable to read a csum. It doesn't go read-only in this case... Maybe it should?
>>>>
>>>> [Wed Jun 30 10:31:11 2021] kernel log
>>>
>>> For this particular case, btrfs first can't find the csum for the read
>>> range, so it just leaves the csum as all zeros and continues.
>>>
>>> Then the data read from disk will definitely cause a csum mismatch.
>>>
>>> This normally means csum tree corruption.
>>>
>>> Can you run btrfs-check on that fs?
>>
>> It didn't "find" the csum because it has an -EIO error reading the extent where the csum is supposed to be stored. It is not a csum tree corruption because that would cause different log messages like transid not matching or csum of tree nodes being wrong, I think.
>
> Yes, that's what I would expect as well, and that's what feels strange about it.
>
>>
>> Sorry, the file is long deleted. Scrub comes back as clean and I guess the -EIO error causing the csum read failure was only transient anyway.
>>
>> I'm obviously not sufficiently familiar with the btrfs/block device/mm subsystems, but here is one guess at what could be wrong.
>>
>> It waits for completion for the read of the extent buffer page like this:
>>
>> wait_on_page_locked(page);
>> if (!PageUptodate(page))
>>      ret = -EIO;
>>
>> while in filemap.c it reads a page like this:
>>
>> wait_on_page_locked(page);
>> if (PageUptodate(page))
>>      goto out;
>> lock_page(page);
>> if (!page->mapping) {
>>          unlock_page(page);
>>          put_page(page);
>>          goto repeat;
>> }
>
> Yes, that's what we do for the data read path: each time a page gets
> unlocked, the page invalidator can trigger for it, and when we re-lock the
> page it may have been invalidated.
>
> Although the above check has been updated to do extra checks, including
> page->mapping and page->private, to be extra sure.
>
>> /* Someone else locked and filled the page in a very small window */
>> if (PageUptodate(page)) {
>>          unlock_page(page);
>>          goto out;
>>
>> }
>>
>> With the comment:
>>
>>> /*
>>> * Page is not up to date and may be locked due to one of the following
>>> * case a: Page is being filled and the page lock is held
>>> * case b: Read/write error clearing the page uptodate status
>>> * case c: Truncation in progress (page locked)
>>> * case d: Reclaim in progress
>>> *  [...]
>>> */
>> So maybe the extent buffer page gets e.g. reclaimed in the small window between unlock and PageUptodate check?
>
>> But for the metadata case, unlike the data path, we have very limited ways
>> to invalidate/release a page.
>>
>> Unlike the data path, a metadata page uses its page->private as a pointer
>> to the extent buffer.
>>
>> And each time we want to drop a metadata page, we can only do that if the
>> extent buffer owning the page can be removed from the extent buffer cache.
>>
>> Thus an unlocked metadata page getting released halfway is not expected
>> behavior at all.
I see. It mainly checks extent_buffer->refs, and that gets incremented right after allocating/finding the extent buffer... No further idea what the problem could be...
>
>>
>> Another option is case b (read/write error), but the NVMe/dm subsystem doesn't log any error for some reason.
>
>> I don't believe that's the case either, or we would have a csum mismatch
>> report from btrfs.
>
>>
>> I guess I could add the lock and check for mapping and PageError(page) to narrow it down further?
>>
>> If you have a proper way to reproduce the bug reliably, I could craft a
>> diff for you to debug (with everything output to the ftrace buffer).

Unfortunately I can't reproduce it, and it is really rare. I could add further output at that location, e.g. checking for mapping presence, whether the page has an error, whether it was submitted for read, or whether it was Uptodate before?

For now I'll just extend the retry logic in btree_read_extent_buffer_pages to try the same mirror multiple times (the problematic btrfs filesystems have single metadata). That should do as a work-around.
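
Roughly like this (untested sketch; same_mirror_retries is my addition, not
existing code):

    /* in btree_read_extent_buffer_pages() */
    int same_mirror_retries = 0;

    while (1) {
        ret = read_extent_buffer_pages(eb, WAIT_COMPLETE, mirror_num);
        if (!ret)
            break;
        /* work-around: retry the same mirror a few times before the
           existing next-mirror/failure handling kicks in */
        if (ret == -EIO && ++same_mirror_retries < 3)
            continue;
        /* ... existing mirror_num/failure handling ... */
    }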


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: IO failure without other (device) error
  2021-07-02 16:29           ` Martin Raiber
@ 2021-07-02 22:46             ` Qu Wenruo
  2021-07-08 16:14               ` Martin Raiber
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2021-07-02 22:46 UTC (permalink / raw)
  To: Martin Raiber, linux-btrfs



On 2021/7/3 12:29 AM, Martin Raiber wrote:
> On 02.07.2021 00:19 Qu Wenruo wrote:
>>
>>
>> On 2021/7/2 1:25 AM, Martin Raiber wrote:
>>> On 01.07.2021 03:40 Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2021/7/1 2:40 AM, Martin Raiber wrote:
>>>>> On 18.06.2021 18:18 Martin Raiber wrote:
>>>>>> On 10.05.2021 00:14 Martin Raiber wrote:
>>>>>>> I get this (rare) issue where btrfs reports an IO error in run_delayed_refs or finish_ordered_io with no underlying device errors being reported. This is with 5.10.26 but with a few patches like the pcpu ENOMEM fix or work-arounds for btrfs ENOSPC issues:
>>>>>>>
>>>>>>> [1885197.101981] systemd-sysv-generator[2324776]: SysV service '/etc/init.d/exim4' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
>>>>>>> [2260628.156893] BTRFS: error (device dm-0) in btrfs_finish_ordered_io:2736: errno=-5 IO failure
>>>>>>> [2260628.156980] BTRFS info (device dm-0): forced readonly
>>>>>>>
>>>>>>> This issue occurred on two different machines now (on one twice). Both with ECC RAM. One bare metal (where dm-0 is on an NVMe) and one in a VM (where dm-0 is a Ceph volume).
>>>>>> Just got it again (5.10.43). So I guess the question is how can I trace where this error comes from... The error message points at btrfs_csum_file_blocks but nothing beyond that. Grep for EIO and put a WARN_ON at each location?
>>>>>>
>>>>> Added the WARN_ONs at the -EIO locations, and hit one. It points at read_extent_buffer_pages (this time), at this part before unlock_exit:
>>>>
>>>> Well, this is quite different from your initial report.
>>>>
>>>> Your initial report is an EIO in btrfs_finish_ordered_io(), which happens
>>>> after all data has been written back to disk.
>>>>
>>>> But in this particular case, it happens before we submit the data to disk.
>>>>
>>>> Here we search the csum tree first, to find the csum for the range we want
>>>> to read, before submitting the read bio.
>>>>
>>>> Thus they are on completely different paths.
>>> Yes, it fails to read the csum because read_extent_buffer_pages returns -EIO. I made the (I think reasonable) assumption that there is only one issue in btrfs where -EIO happens without an actual IO error on the underlying device. The original issue has line numbers that point at btrfs_csum_file_blocks, which calls btrfs_lookup_csum, which is in the call path of this issue. I can't confirm it's the same issue because the original report didn't have the WARN_ONs in there, so feel free to treat them as separate issues.
>>>>
>>>>>
>>>>>        for (i = 0; i < num_pages; i++) {
>>>>>            page = eb->pages[i];
>>>>>            wait_on_page_locked(page);
>>>>>            if (!PageUptodate(page))
>>>>>                -->ret = -EIO;
>>>>>        }
>>>>>
>>>>> Complete dmesg output below. In this instance it seems unable to read a csum. It doesn't go read-only in this case... Maybe it should?
>>>>>
>>>>> [Wed Jun 30 10:31:11 2021] kernel log
>>>>
>>>> For this particular case, btrfs first can't find the csum for the read
>>>> range, so it just leaves the csum as all zeros and continues.
>>>>
>>>> Then the data read from disk will definitely cause a csum mismatch.
>>>>
>>>> This normally means csum tree corruption.
>>>>
>>>> Can you run btrfs-check on that fs?
>>>
>>> It didn't "find" the csum because it has an -EIO error reading the extent where the csum is supposed to be stored. It is not a csum tree corruption because that would cause different log messages like transid not matching or csum of tree nodes being wrong, I think.
>>
>> Yes, that's what I would expect as well, and that's what feels strange about it.
>>
>>>
>>> Sorry, the file is long deleted. Scrub comes back as clean and I guess the -EIO error causing the csum read failure was only transient anyway.
>>>
>>> I'm obviously not sufficiently familiar with the btrfs/block device/mm subsystems, but here is one guess at what could be wrong.
>>>
>>> It waits for completion for the read of the extent buffer page like this:
>>>
>>> wait_on_page_locked(page);
>>> if (!PageUptodate(page))
>>>       ret = -EIO;
>>>
>>> while in filemap.c it reads a page like this:
>>>
>>> wait_on_page_locked(page);
>>> if (PageUptodate(page))
>>>       goto out;
>>> lock_page(page);
>>> if (!page->mapping) {
>>>           unlock_page(page);
>>>           put_page(page);
>>>           goto repeat;
>>> }
>>
>> Yes, that's what we do for the data read path: each time a page gets
>> unlocked, the page invalidator can trigger for it, and when we re-lock the
>> page it may have been invalidated.
>>
>> Although the above check has been updated to do extra checks, including
>> page->mapping and page->private, to be extra sure.
>>
>>> /* Someone else locked and filled the page in a very small window */
>>> if (PageUptodate(page)) {
>>>           unlock_page(page);
>>>           goto out;
>>>
>>> }
>>>
>>> With the comment:
>>>
>>>> /*
>>>> * Page is not up to date and may be locked due to one of the following
>>>> * case a: Page is being filled and the page lock is held
>>>> * case b: Read/write error clearing the page uptodate status
>>>> * case c: Truncation in progress (page locked)
>>>> * case d: Reclaim in progress
>>>> *  [...]
>>>> */
>>> So maybe the extent buffer page gets e.g. reclaimed in the small window between unlock and PageUptodate check?
>>
>>> But for the metadata case, unlike the data path, we have very limited ways
>>> to invalidate/release a page.
>>>
>>> Unlike the data path, a metadata page uses its page->private as a pointer
>>> to the extent buffer.
>>>
>>> And each time we want to drop a metadata page, we can only do that if the
>>> extent buffer owning the page can be removed from the extent buffer cache.
>>>
>>> Thus an unlocked metadata page getting released halfway is not expected
>>> behavior at all.
> I see. It mainly checks extent_buffer->refs, and that gets incremented right after allocating/finding the extent buffer... No further idea what the problem could be...
>>
>>>
>>> Another option is case b (read/write error), but the NVMe/dm subsystem doesn't log any error for some reason.
>>
>>> I don't believe that's the case either, or we would have a csum mismatch
>>> report from btrfs.
>>
>>>
>>> I guess I could add the lock and check for mapping and PageError(page) to narrow it down further?
>>>
>>> If you have a proper way to reproduce the bug reliably, I could craft a
>>> diff for you to debug (with everything output to the ftrace buffer).
>
> Unfortunately I can't reproduce it, and it is really rare. I could add further output at that location, e.g. checking for mapping presence, whether the page has an error, whether it was submitted for read, or whether it was Uptodate before?
>
> For now I'll just extend the retry logic in btree_read_extent_buffer_pages to try the same mirror multiple times (the problematic btrfs filesystems have single metadata). That should do as a work-around.
>
My recommendation for debugging is to add extra trace_printk() calls in
btree_releasepage(), to see when a metadata page really gets freed, and in
read_extent_buffer_pages(), to see what the possible race is.
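
E.g. (just a sketch):

    /* in btree_releasepage() */
    trace_printk("btree releasepage: page index %lu mapping %p\n",
                 page->index, page->mapping);

    /* at the -EIO site in read_extent_buffer_pages() */
    trace_printk("eb %llu: page %d not uptodate after wait\n",
                 eb->start, i);

Then dump /sys/kernel/debug/tracing/trace when you hit the problem.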

Another idea is to add extra debug output in end_bio_extent_readpage()
for metadata pages.

I'm still wondering whether something wrong is being detected by the btrfs
module but without any error message.

In that case, we definitely want to add a proper error message.

Finally, it's never a bad idea to run btrfs-check, maybe it can find
some hidden problems.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: IO failure without other (device) error
  2021-07-02 22:46             ` Qu Wenruo
@ 2021-07-08 16:14               ` Martin Raiber
  2021-07-08 23:32                 ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Martin Raiber @ 2021-07-08 16:14 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 03.07.2021 00:46 Qu Wenruo wrote:
>
>
> On 2021/7/3 12:29 AM, Martin Raiber wrote:
>> On 02.07.2021 00:19 Qu Wenruo wrote:
>>>
>>>
>>> On 2021/7/2 1:25 AM, Martin Raiber wrote:
>>>> On 01.07.2021 03:40 Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2021/7/1 2:40 AM, Martin Raiber wrote:
>>>>>> On 18.06.2021 18:18 Martin Raiber wrote:
>>>>>>> On 10.05.2021 00:14 Martin Raiber wrote:
>>>>>>>> I get this (rare) issue where btrfs reports an IO error in run_delayed_refs or finish_ordered_io with no underlying device errors being reported. This is with 5.10.26 but with a few patches like the pcpu ENOMEM fix or work-arounds for btrfs ENOSPC issues:
>>>>>>>>
>>>>>>>> [1885197.101981] systemd-sysv-generator[2324776]: SysV service '/etc/init.d/exim4' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
>>>>>>>> [2260628.156893] BTRFS: error (device dm-0) in btrfs_finish_ordered_io:2736: errno=-5 IO failure
>>>>>>>> [2260628.156980] BTRFS info (device dm-0): forced readonly
>>>>>>>>
>>>>>>>> This issue occurred on two different machines now (on one twice). Both with ECC RAM. One bare metal (where dm-0 is on an NVMe) and one in a VM (where dm-0 is a Ceph volume).
>>>>>>> Just got it again (5.10.43). So I guess the question is how can I trace where this error comes from... The error message points at btrfs_csum_file_blocks but nothing beyond that. Grep for EIO and put a WARN_ON at each location?
>>>>>>>
>>>>>> Added the WARN_ONs at the -EIO locations, and hit one. It points at read_extent_buffer_pages (this time), at this part before unlock_exit:
>>>>>
>>>>> Well, this is quite different from your initial report.
>>>>>
>>>>> Your initial report is an EIO in btrfs_finish_ordered_io(), which happens
>>>>> after all data has been written back to disk.
>>>>>
>>>>> But in this particular case, it happens before we submit the data to disk.
>>>>>
>>>>> Here we search the csum tree first, to find the csum for the range we want
>>>>> to read, before submitting the read bio.
>>>>>
>>>>> Thus they are on completely different paths.
>>>> Yes, it fails to read the csum because read_extent_buffer_pages returns -EIO. I made the (I think reasonable) assumption that there is only one issue in btrfs where -EIO happens without an actual IO error on the underlying device. The original issue has line numbers that point at btrfs_csum_file_blocks, which calls btrfs_lookup_csum, which is in the call path of this issue. I can't confirm it's the same issue because the original report didn't have the WARN_ONs in there, so feel free to treat them as separate issues.
>>>>>
>>>>>>
>>>>>>        for (i = 0; i < num_pages; i++) {
>>>>>>            page = eb->pages[i];
>>>>>>            wait_on_page_locked(page);
>>>>>>            if (!PageUptodate(page))
>>>>>>                -->ret = -EIO;
>>>>>>        }
>>>>>>
>>>>>> Complete dmesg output below. In this instance it seems unable to read a csum. It doesn't go read-only in this case... Maybe it should?
>>>>>>
>>>>>> [Wed Jun 30 10:31:11 2021] kernel log
>>>>>
>>>>> For this particular case, btrfs first can't find the csum for the read
>>>>> range, so it just leaves the csum as all zeros and continues.
>>>>>
>>>>> Then the data read from disk will definitely cause a csum mismatch.
>>>>>
>>>>> This normally means csum tree corruption.
>>>>>
>>>>> Can you run btrfs-check on that fs?
>>>>
>>>> It didn't "find" the csum because it has an -EIO error reading the extent where the csum is supposed to be stored. It is not a csum tree corruption because that would cause different log messages like transid not matching or csum of tree nodes being wrong, I think.
>>>
>>> Yes, that's what I would expect as well, and that's what feels strange about it.
>>>
>>>>
>>>> Sorry, the file has long been deleted. Scrub comes back clean, and I guess the -EIO causing the csum read failure was only transient anyway.
>>>>
>>>> I'm not sufficiently familiar with btrfs/block device/mm subsystem obviously but here is one guess what could be wrong.
>>>>
>>>> It waits for completion for the read of the extent buffer page like this:
>>>>
>>>> wait_on_page_locked(page);
>>>> if (!PageUptodate(page))
>>>>       ret = -EIO;
>>>>
>>>> while in filemap.c it reads a page like this:
>>>>
>>>> wait_on_page_locked(page);
>>>> if (PageUptodate(page))
>>>>       goto out;
>>>> lock_page(page);
>>>> if (!page->mapping) {
>>>>           unlock_page(page);
>>>>           put_page(page);
>>>>           goto repeat;
>>>> }
>>>
>>> Yes, that's what we do for the data read path: each time a page gets
>>> unlocked, the page invalidator can trigger for it, and when we re-lock
>>> the page, it may have been invalidated.
>>>
>>> Although the above check has been updated to do extra checks, including
>>> page->mapping and page->private, to be extra sure.
>>>
>>>> /* Someone else locked and filled the page in a very small window */
>>>> if (PageUptodate(page)) {
>>>>           unlock_page(page);
>>>>           goto out;
>>>>
>>>> }
>>>>
>>>> With the comment:
>>>>
>>>>> /*
>>>>> * Page is not up to date and may be locked due to one of the following
>>>>> * case a: Page is being filled and the page lock is held
>>>>> * case b: Read/write error clearing the page uptodate status
>>>>> * case c: Truncation in progress (page locked)
>>>>> * case d: Reclaim in progress
>>>>> *  [...]
>>>>> */
>>>> So maybe the extent buffer page gets reclaimed, for example, in the small window between the unlock and the PageUptodate check?
>>>
>>> But for the metadata case, unlike the data path, we have very limited
>>> ways to invalidate/release a page.
>>>
>>> Unlike the data path, a metadata page uses its page->private as a
>>> pointer to the extent buffer.
>>>
>>> And each time we want to drop a metadata page, we can only do that if
>>> the extent buffer owning the page can be removed from the extent buffer
>>> cache.
>>>
>>> Thus an unlocked metadata page getting released halfway through is not
>>> expected behavior at all.
>> I see. It mainly checks extent_buffer->refs, and that is incremented right after allocating/finding the extent buffer... No further idea what the problem could be...
>>>
>>>>
>>>> Another option is case b (read/write error), but the NVMe/dm subsystem doesn't log any error for some reason.
>>>
>>> I don't believe that's the case either, or we would have a csum mismatch
>>> report from btrfs.
>>>
>>>>
>>>> I guess I could add the lock and check page->mapping and PageError(page) to narrow it down further?
>>>>
>>> If you have a proper way to reproduce the bug reliably, I could craft a
>>> diff for you (with all debug output going to the ftrace buffer).
>>
>> Unfortunately I can't reproduce it, and it is really rare. I could add further output at that location, e.g. checking whether the mapping is present, whether the page has an error, and whether it was submitted for read or was Uptodate before?
>>
>> For now I'll just extend the retry logic in btree_read_extent_buffer_pages to try the same mirror multiple times (the problematic btrfs filesystems have the single metadata profile). That should do as a workaround.
>>
> My recommendation for debugging is to add extra trace_printk() calls in
> btree_releasepage(), to see when a metadata page really gets freed, and in
> read_extent_buffer_pages(), to see what the possible race is.
>
> Another idea is to add extra debug output in end_bio_extent_readpage()
> for metadata pages.
>
> As I'm still wondering whether the btrfs module detects something wrong
> but emits no error message for it.
>
> In that case, we definitely want to add a proper error message.
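
Re the trace_printk() suggestion: my untested reading of it, against 5.10, would be a hunk like this in fs/btrfs/disk-io.c (the output then shows up in /sys/kernel/debug/tracing/trace and can be correlated with the read failure timestamps):

--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ static int btree_releasepage(struct page *page, gfp_t gfp_flags)
 	if (PageWriteback(page) || PageDirty(page))
 		return 0;
 
+	/* log every metadata page release attempt to the ftrace buffer */
+	trace_printk("btree_releasepage offset=%llu\n", (u64)page_offset(page));
 	return try_release_extent_buffer(page);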

I guess this would be a proper error message:

--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -6505,8 +6505,14 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
        for (i = 0; i < num_pages; i++) {
                page = eb->pages[i];
                wait_on_page_locked(page);
-               if (!PageUptodate(page))
+               if (!PageUptodate(page)) {
+                       btrfs_err_rl(eb->fs_info,
+                               "error reading extent buffer "
+                               "start %llu len %lu PageError %d",
+                               eb->start, eb->len,
+                               PageError(page) ? 1 : 0);
                        ret = -EIO;
+               }
        }

        return ret;

I haven't added this to the kernel I'm running yet. It currently still has a WARN_ON instead of the error message.
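
The WARN_ON variant is roughly the same hunk with a stack dump instead of the message, i.e. (from memory):

	if (!PageUptodate(page)) {
		WARN_ON(1); /* dump a call trace at the failure site */
		ret = -EIO;
	}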

I also added the retry like this:

--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -385,6 +385,7 @@ static int btree_read_extent_buffer_pages(struct extent_buffer *eb,
        int num_copies = 0;
        int mirror_num = 0;
        int failed_mirror = 0;
+       int tries = 2;

        io_tree = &BTRFS_I(fs_info->btree_inode)->io_tree;
        while (1) {
@@ -403,6 +404,10 @@ static int btree_read_extent_buffer_pages(struct extent_buffer *eb,

                num_copies = btrfs_num_copies(fs_info,
                                              eb->start, eb->len);
+
+               if (num_copies == 1 && tries-- > 0)
+                       continue;
+
                if (num_copies == 1)
                        break;
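
(Note: the continue only fires when num_copies == 1, i.e. the single metadata profile; with DUP/RAID metadata the existing mirror loop already retries the other copies, so this effectively re-reads the same copy up to two extra times before giving up.)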

On Tuesday I got this:

[Tue Jul  6 14:49:17 2021] ------------[ cut here ]------------
[Tue Jul  6 14:49:17 2021] WARNING: CPU: 13 PID: 2265463 at fs/btrfs/extent_io.c:5597 read_extent_buffer_pages+0x346/0x360
[Tue Jul  6 14:49:17 2021] Modules linked in: zram bcache crc64 loop dm_crypt bfq xfs dm_mod st sr_mod cdrom bridge stp llc intel_powerclamp coretemp snd_pcm kvm_intel snd_timer snd mgag200 kvm soundcore drm_kms_helper iTCO_wdt dcdbas irqbypass pcspkr serio_raw iTCO_vendor_support i2c_algo_bit evdev joydev i7core_edac sg ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter button ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm configfs ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid0 multipath linear raid1 md_mod ses enclosure sd_mod hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ahci libahci uhci_hcd ehci_pci psmouse mpt3sas ehci_hcd lpc_ich raid_class libata nvme scsi_transport_sas mfd_core nvme_core usbcore t10_pi scsi_mod bnx2
[Tue Jul  6 14:49:17 2021] CPU: 13 PID: 2265463 Comm: btrfs Tainted: G          I       5.10.47 #1
[Tue Jul  6 14:49:17 2021] Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.13.0 03/02/2018
[Tue Jul  6 14:49:17 2021] RIP: 0010:read_extent_buffer_pages+0x346/0x360
[Tue Jul  6 14:49:17 2021] Code: 48 8b 43 08 a8 01 48 8d 78 ff 48 0f 44 fb 31 f6 e8 cf d0 db ff 48 8b 43 08 48 8d 50 ff a8 01 48 0f 45 da 48 8b 03 a8 04 75 ab <0f> 0b 41 be fb ff ff ff eb a1 e8 4b af 56 00 66 66 2e 0f 1f 84 00
[Tue Jul  6 14:49:17 2021] RSP: 0018:ffffc9007562bb40 EFLAGS: 00010246
[Tue Jul  6 14:49:17 2021] RAX: 06ffff80000020e3 RBX: ffffea00095a5fc0 RCX: 0000000000000000
[Tue Jul  6 14:49:17 2021] RDX: dead0000000000ff RSI: 0000000000000020 RDI: ffff888713babd40
[Tue Jul  6 14:49:17 2021] RBP: ffff88869a7d6f78 R08: 0000f5a746752ed4 R09: 0000000000000483
[Tue Jul  6 14:49:17 2021] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88869a7d7020
[Tue Jul  6 14:49:17 2021] R13: ffffea0005f51f80 R14: 0000000000000000 R15: ffff88869a7d7020
[Tue Jul  6 14:49:17 2021] FS:  00007f748ddc18c0(0000) GS:ffff888713b80000(0000) knlGS:0000000000000000
[Tue Jul  6 14:49:17 2021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Jul  6 14:49:17 2021] CR2: 00005618b6037058 CR3: 000000011feca004 CR4: 00000000000206e0
[Tue Jul  6 14:49:17 2021] Call Trace:
[Tue Jul  6 14:49:17 2021]  btree_read_extent_buffer_pages+0x66/0x130
[Tue Jul  6 14:49:17 2021]  read_tree_block+0x36/0x60
[Tue Jul  6 14:49:17 2021]  btrfs_read_node_slot+0xc0/0x110
[Tue Jul  6 14:49:17 2021]  btrfs_search_forward+0x1db/0x350
[Tue Jul  6 14:49:17 2021]  search_ioctl+0x19e/0x250
[Tue Jul  6 14:49:17 2021]  btrfs_ioctl_tree_search+0x63/0xc0
[Tue Jul  6 14:49:17 2021]  btrfs_ioctl+0x1874/0x3060
[Tue Jul  6 14:49:17 2021]  ? page_add_new_anon_rmap+0xa3/0x1f0
[Tue Jul  6 14:49:17 2021]  ? handle_mm_fault+0xf6c/0x1950
[Tue Jul  6 14:49:17 2021]  ? __x64_sys_ioctl+0x83/0xb0
[Tue Jul  6 14:49:17 2021]  __x64_sys_ioctl+0x83/0xb0
[Tue Jul  6 14:49:17 2021]  do_syscall_64+0x33/0x80
[Tue Jul  6 14:49:17 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Jul  6 14:49:17 2021] RIP: 0033:0x7f748deb8cc7
[Tue Jul  6 14:49:17 2021] Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 91 0c 00 f7 d8 64 89 01 48
[Tue Jul  6 14:49:17 2021] RSP: 002b:00007ffc7e07f098 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Tue Jul  6 14:49:17 2021] RAX: ffffffffffffffda RBX: 00007ffc7e0801d8 RCX: 00007f748deb8cc7
[Tue Jul  6 14:49:17 2021] RDX: 00007ffc7e07f0f8 RSI: 00000000d0009411 RDI: 0000000000000005
[Tue Jul  6 14:49:17 2021] RBP: 0000000000bafaad R08: 000000000000000b R09: 00007f748df82be0
[Tue Jul  6 14:49:17 2021] R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000005
[Tue Jul  6 14:49:17 2021] R13: 00007ffc7e07ffe1 R14: 000000000000000b R15: 00007ffc7e07f160
[Tue Jul  6 14:49:17 2021] ---[ end trace 81f64eb2e9ceb4de ]---

with no other message afterwards. This means the retry is working and "fixed" the problem for me. I'll keep monitoring this and will try to catch the -EIO from other areas, like the original btrfs_finish_ordered_io.
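
To catch the original btrfs_finish_ordered_io path, one option (untested; assuming the abort sites in 5.10's fs/btrfs/inode.c follow the usual "if (ret) { btrfs_abort_transaction(trans, ret); goto out; }" pattern) would be:

--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ btrfs_finish_ordered_io()
 	if (ret) {
+		/* dump a call trace for the transient -EIO before aborting */
+		WARN_ON(ret == -EIO);
 		btrfs_abort_transaction(trans, ret);
 		goto out;
 	}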


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: IO failure without other (device) error
  2021-07-08 16:14               ` Martin Raiber
@ 2021-07-08 23:32                 ` Qu Wenruo
  2021-07-14 16:32                   ` Martin Raiber
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2021-07-08 23:32 UTC (permalink / raw)
  To: Martin Raiber, linux-btrfs


[...]
>> My recommendation for debugging is to add extra trace_printk() calls in
>> btree_releasepage(), to see when a metadata page really gets freed, and in
>> read_extent_buffer_pages(), to see what the possible race is.
>>
>> Another idea is to add extra debug output in end_bio_extent_readpage()
>> for metadata pages.
>>
>> As I'm still wondering whether the btrfs module detects something wrong
>> but emits no error message for it.
>>
>> In that case, we definitely want to add a proper error message.
>
> I guess this would be a proper error message:
>
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -6505,8 +6505,14 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
>          for (i = 0; i < num_pages; i++) {
>                  page = eb->pages[i];
>                  wait_on_page_locked(page);
> -               if (!PageUptodate(page))
> +               if (!PageUptodate(page)) {
> +                       btrfs_err_rl(eb->fs_info,
> +                               "error reading extent buffer "
> +                               "start %llu len %lu PageError %d",
> +                               eb->start, eb->len,
> +                               PageError(page) ? 1 : 0);

What I originally meant is that in end_bio_extent_readpage() we may hit a
case where @uptodate is 0 but we don't have any error message for it.

Your error message is fine: it has PageError() to indicate whether we're
really hitting a case like the above, or whether someone is releasing the
metadata page improperly.

Combined with the following diff, it can rule out or catch the error in
the metadata read path directly, as I'm still wondering whether it's an
error from the bio or from improperly released metadata pages.

Thanks,
Qu

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b5f5de7e4a29..b1f8862ba539 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3047,6 +3047,16 @@ static void end_bio_extent_readpage(struct bio *bio)
                 if (likely(uptodate))
                         goto readpage_ok;

+               if (!is_data_inode(inode)) {
+                       struct extent_buffer *eb;
+
+                       eb = (struct extent_buffer *)page->private;
+                       btrfs_err_rl(fs_info,
+               "metadata read error, page=%llu eb=%llu bio_status=%d ret=%d",
+                                    page_offset(page), eb->start,
+                                    bio->bi_status, ret);
+               }
+
                 if (is_data_inode(inode)) {
                         /*
                          * btrfs_submit_read_repair() will handle all the good

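With both messages in place, the two candidate causes should be
distinguishable: if the metadata read error line fires with a non-zero
bio_status, the -EIO really comes from the bio/block layer; if
read_extent_buffer_pages() still sees !PageUptodate without this line ever
firing, the page was most likely released or invalidated in between.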

>                          ret = -EIO;
> +               }
>          }
>
>          return ret;
>
> I haven't added this to the kernel I'm running yet. It currently still has a WARN_ON instead of the error message.
>
> I also added the retry like this:
>
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -385,6 +385,7 @@ static int btree_read_extent_buffer_pages(struct extent_buffer *eb,
>          int num_copies = 0;
>          int mirror_num = 0;
>          int failed_mirror = 0;
> +       int tries = 2;
>
>          io_tree = &BTRFS_I(fs_info->btree_inode)->io_tree;
>          while (1) {
> @@ -403,6 +404,10 @@ static int btree_read_extent_buffer_pages(struct extent_buffer *eb,
>
>                  num_copies = btrfs_num_copies(fs_info,
>                                                eb->start, eb->len);
> +
> +               if (num_copies == 1 && tries-- > 0)
> +                       continue;
> +
>                  if (num_copies == 1)
>                          break;
>
> On Tuesday I got this:
>
> [Tue Jul  6 14:49:17 2021] ------------[ cut here ]------------
> [Tue Jul  6 14:49:17 2021] WARNING: CPU: 13 PID: 2265463 at fs/btrfs/extent_io.c:5597 read_extent_buffer_pages+0x346/0x360
> [Tue Jul  6 14:49:17 2021] Modules linked in: zram bcache crc64 loop dm_crypt bfq xfs dm_mod st sr_mod cdrom bridge stp llc intel_powerclamp coretemp snd_pcm kvm_intel snd_timer snd mgag200 kvm soundcore drm_kms_helper iTCO_wdt dcdbas irqbypass pcspkr serio_raw iTCO_vendor_support i2c_algo_bit evdev joydev i7core_edac sg ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter button ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm configfs ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid0 multipath linear raid1 md_mod ses enclosure sd_mod hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ahci libahci uhci_hcd ehci_pci psmouse mpt3sas ehci_hcd lpc_ich raid_class libata nvme scsi_transport_sas mfd_core nvme_core usbcore t10_pi scsi_mod bnx2
> [Tue Jul  6 14:49:17 2021] CPU: 13 PID: 2265463 Comm: btrfs Tainted: G          I       5.10.47 #1
> [Tue Jul  6 14:49:17 2021] Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.13.0 03/02/2018
> [Tue Jul  6 14:49:17 2021] RIP: 0010:read_extent_buffer_pages+0x346/0x360
> [Tue Jul  6 14:49:17 2021] Code: 48 8b 43 08 a8 01 48 8d 78 ff 48 0f 44 fb 31 f6 e8 cf d0 db ff 48 8b 43 08 48 8d 50 ff a8 01 48 0f 45 da 48 8b 03 a8 04 75 ab <0f> 0b 41 be fb ff ff ff eb a1 e8 4b af 56 00 66 66 2e 0f 1f 84 00
> [Tue Jul  6 14:49:17 2021] RSP: 0018:ffffc9007562bb40 EFLAGS: 00010246
> [Tue Jul  6 14:49:17 2021] RAX: 06ffff80000020e3 RBX: ffffea00095a5fc0 RCX: 0000000000000000
> [Tue Jul  6 14:49:17 2021] RDX: dead0000000000ff RSI: 0000000000000020 RDI: ffff888713babd40
> [Tue Jul  6 14:49:17 2021] RBP: ffff88869a7d6f78 R08: 0000f5a746752ed4 R09: 0000000000000483
> [Tue Jul  6 14:49:17 2021] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88869a7d7020
> [Tue Jul  6 14:49:17 2021] R13: ffffea0005f51f80 R14: 0000000000000000 R15: ffff88869a7d7020
> [Tue Jul  6 14:49:17 2021] FS:  00007f748ddc18c0(0000) GS:ffff888713b80000(0000) knlGS:0000000000000000
> [Tue Jul  6 14:49:17 2021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Tue Jul  6 14:49:17 2021] CR2: 00005618b6037058 CR3: 000000011feca004 CR4: 00000000000206e0
> [Tue Jul  6 14:49:17 2021] Call Trace:
> [Tue Jul  6 14:49:17 2021]  btree_read_extent_buffer_pages+0x66/0x130
> [Tue Jul  6 14:49:17 2021]  read_tree_block+0x36/0x60
> [Tue Jul  6 14:49:17 2021]  btrfs_read_node_slot+0xc0/0x110
> [Tue Jul  6 14:49:17 2021]  btrfs_search_forward+0x1db/0x350
> [Tue Jul  6 14:49:17 2021]  search_ioctl+0x19e/0x250
> [Tue Jul  6 14:49:17 2021]  btrfs_ioctl_tree_search+0x63/0xc0
> [Tue Jul  6 14:49:17 2021]  btrfs_ioctl+0x1874/0x3060
> [Tue Jul  6 14:49:17 2021]  ? page_add_new_anon_rmap+0xa3/0x1f0
> [Tue Jul  6 14:49:17 2021]  ? handle_mm_fault+0xf6c/0x1950
> [Tue Jul  6 14:49:17 2021]  ? __x64_sys_ioctl+0x83/0xb0
> [Tue Jul  6 14:49:17 2021]  __x64_sys_ioctl+0x83/0xb0
> [Tue Jul  6 14:49:17 2021]  do_syscall_64+0x33/0x80
> [Tue Jul  6 14:49:17 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [Tue Jul  6 14:49:17 2021] RIP: 0033:0x7f748deb8cc7
> [Tue Jul  6 14:49:17 2021] Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 91 0c 00 f7 d8 64 89 01 48
> [Tue Jul  6 14:49:17 2021] RSP: 002b:00007ffc7e07f098 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [Tue Jul  6 14:49:17 2021] RAX: ffffffffffffffda RBX: 00007ffc7e0801d8 RCX: 00007f748deb8cc7
> [Tue Jul  6 14:49:17 2021] RDX: 00007ffc7e07f0f8 RSI: 00000000d0009411 RDI: 0000000000000005
> [Tue Jul  6 14:49:17 2021] RBP: 0000000000bafaad R08: 000000000000000b R09: 00007f748df82be0
> [Tue Jul  6 14:49:17 2021] R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000005
> [Tue Jul  6 14:49:17 2021] R13: 00007ffc7e07ffe1 R14: 000000000000000b R15: 00007ffc7e07f160
> [Tue Jul  6 14:49:17 2021] ---[ end trace 81f64eb2e9ceb4de ]---
>
> with no other message afterwards. This means the retry is working and "fixed" the problem for me. I'll keep monitoring this and will try to catch the -EIO from other areas, like the original btrfs_finish_ordered_io.
>

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: IO failure without other (device) error
  2021-07-08 23:32                 ` Qu Wenruo
@ 2021-07-14 16:32                   ` Martin Raiber
  0 siblings, 0 replies; 15+ messages in thread
From: Martin Raiber @ 2021-07-14 16:32 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 09.07.2021 01:32 Qu Wenruo wrote:
>
> [...]
>>> My recommendation for debugging is to add extra trace_printk() calls in
>>> btree_releasepage(), to see when a metadata page really gets freed, and in
>>> read_extent_buffer_pages(), to see what the possible race is.
>>>
>>> Another idea is to add extra debug output in end_bio_extent_readpage()
>>> for metadata pages.
>>>
>>> As I'm still wondering whether the btrfs module detects something wrong
>>> but emits no error message for it.
>>>
>>> In that case, we definitely want to add a proper error message.
>>
>> I guess this would be a proper error message:
>>
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -6505,8 +6505,14 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
>>          for (i = 0; i < num_pages; i++) {
>>                  page = eb->pages[i];
>>                  wait_on_page_locked(page);
>> -               if (!PageUptodate(page))
>> +               if (!PageUptodate(page)) {
>> +                       btrfs_err_rl(eb->fs_info,
>> +                               "error reading extent buffer "
>> +                               "start %llu len %lu PageError %d",
>> +                               eb->start, eb->len,
>> +                               PageError(page) ? 1 : 0);
>
> What I originally meant is that in end_bio_extent_readpage() we may hit a
> case where @uptodate is 0 but we don't have any error message for it.
>
> Your error message is fine: it has PageError() to indicate whether we're
> really hitting a case like the above, or whether someone is releasing the
> metadata page improperly.
>
> Combined with the following diff, it can rule out or catch the error in
> the metadata read path directly, as I'm still wondering whether it's an
> error from the bio or from improperly released metadata pages.
>
> Thanks,
> Qu
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index b5f5de7e4a29..b1f8862ba539 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3047,6 +3047,16 @@ static void end_bio_extent_readpage(struct bio *bio)
>                 if (likely(uptodate))
>                         goto readpage_ok;
>
> +               if (!is_data_inode(inode)) {
> +                       struct extent_buffer *eb;
> +
> +                       eb = (struct extent_buffer *)page->private;
> +                       btrfs_err_rl(fs_info,
> +               "metadata read error, page=%llu eb=%llu bio_status=%d ret=%d",
> +                                    page_offset(page), eb->start,
> +                                    bio->bi_status, ret);
> +               }
> +
>                 if (is_data_inode(inode)) {
>                         /*
>                          * btrfs_submit_read_repair() will handle all the good
>
Added this to the kernel as well, but it doesn't get hit:

[199571.686344] BTRFS error (device dm-0): error reading extent buffer start 97231241216 len 16384 PageError 0
[199571.686423] ------------[ cut here ]------------
[199571.686432] WARNING: CPU: 22 PID: 3451967 at fs/btrfs/extent_io.c:5611 read_extent_buffer_pages+0x308/0x380
[199571.686433] Modules linked in: zram bcache crc64 loop dm_crypt bfq xfs dm_mod st sr_mod cdrom bridge stp llc intel_powerclamp coretemp kvm_intel kvm snd_pcm snd_timer snd dcdbas irqbypass soundcore mgag200 serio_raw pcspkr drm_kms_helper evdev joydev iTCO_wdt iTCO_vendor_support i2c_algo_bit i7core_edac sg ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter button ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm configfs ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear md_mod ses enclosure sd_mod hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ahci ehci_pci uhci_hcd libahci psmouse mpt3sas ehci_hcd raid_class lpc_ich libata nvme scsi_transport_sas mfd_core usbcore nvme_core t10_pi scsi_mod bnx2
[199571.686505] CPU: 22 PID: 3451967 Comm: kworker/u49:9 Tainted: G          I       5.10.48 #1
[199571.686506] Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.13.0 03/02/2018
[199571.686510] Workqueue: btrfs-endio-write btrfs_work_helper
[199571.686513] RIP: 0010:read_extent_buffer_pages+0x308/0x380
[199571.686516] Code: e9 8a fe ff ff 85 c9 75 05 83 fb 01 74 32 41 89 ce eb b3 48 c7 c6 b0 54 e4 81 48 c7 c7 80 d5 65 82 e8 2c b0 1a 00 85 c0 75 65 <0f> 0b 41 be fb ff ff ff 48 83 04 24 08 48 8b 04 24 4c 39 e0 74 86
[199571.686517] RSP: 0018:ffffc9004ad93a58 EFLAGS: 00010282
[199571.686519] RAX: 0000000000000000 RBX: ffffea001fbbc300 RCX: 0000000000000000
[199571.686520] RDX: 0000000000000000 RSI: ffff888e13cd8b80 RDI: ffff888e13cd8b80
[199571.686521] RBP: ffff888b616af7b8 R08: ffffffff825e21c8 R09: 0000000000027ffb
[199571.686522] R10: 00000000ffff8000 R11: 3fffffffffffffff R12: ffff888b616af860
[199571.686523] R13: ffffea0031592dc0 R14: 0000000000000000 R15: ffff888b616af860
[199571.686525] FS:  0000000000000000(0000) GS:ffff888e13cc0000(0000) knlGS:0000000000000000
[199571.686526] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[199571.686527] CR2: 00007f034897600a CR3: 000000011f344006 CR4: 00000000000206e0
[199571.686528] Call Trace:
[199571.686536]  btree_read_extent_buffer_pages+0x66/0x130
[199571.686539]  read_tree_block+0x36/0x60
[199571.686543]  read_block_for_search.isra.0+0x1ab/0x360
[199571.686547]  ? ttwu_do_wakeup+0x17/0x130
[199571.686550]  btrfs_search_slot+0x232/0x970
[199571.686552]  btrfs_lookup_csum+0x75/0x170
[199571.686557]  ? kmem_cache_alloc+0xda/0x1d0
[199571.686559]  btrfs_csum_file_blocks+0x17d/0x690
[199571.686561]  ? insert_reserved_file_extent+0x1cb/0x210
[199571.686564]  btrfs_finish_ordered_io.isra.0+0x49d/0x8d0
[199571.686567]  btrfs_work_helper+0xe0/0x310
[199571.686572]  process_one_work+0x1b6/0x350
[199571.686574]  worker_thread+0x53/0x3e0
[199571.686575]  ? process_one_work+0x350/0x350
[199571.686577]  kthread+0x11b/0x140
[199571.686579]  ? __kthread_bind_mask+0x60/0x60
[199571.686584]  ret_from_fork+0x1f/0x30
[199571.686586] ---[ end trace 05ee2703b7a5772f ]---

PageError isn't set either, so it looks like the page is unlocked without an error being set, but also without being set Uptodate?

(I left the WARN_ON in to get a call trace, and it retries, so there is no further error message now in my kernel.)
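
To narrow this down further, next time I could re-check under the page lock, e.g. (untested sketch on top of my earlier error-message hunk):

--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 		page = eb->pages[i];
 		wait_on_page_locked(page);
 		if (!PageUptodate(page)) {
+			/*
+			 * Re-check under the lock: did the page lose its
+			 * mapping (released/invalidated) or hit a read error?
+			 */
+			lock_page(page);
+			btrfs_err_rl(eb->fs_info,
+		"eb %llu page %d not uptodate: mapping %d PageError %d PageUptodate %d",
+				     eb->start, i, page->mapping != NULL,
+				     PageError(page) ? 1 : 0,
+				     PageUptodate(page) ? 1 : 0);
+			unlock_page(page);
 			ret = -EIO;
 		}
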
>
>>                          ret = -EIO;
>> +               }
>>          }
>>
>>          return ret;
>>
>> I haven't added this to the kernel I'm running yet. It currently still has a WARN_ON instead of the error message.
>>
>> I also added the retry like this:
>>
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -385,6 +385,7 @@ static int btree_read_extent_buffer_pages(struct extent_buffer *eb,
>>          int num_copies = 0;
>>          int mirror_num = 0;
>>          int failed_mirror = 0;
>> +       int tries = 2;
>>
>>          io_tree = &BTRFS_I(fs_info->btree_inode)->io_tree;
>>          while (1) {
>> @@ -403,6 +404,10 @@ static int btree_read_extent_buffer_pages(struct extent_buffer *eb,
>>
>>                  num_copies = btrfs_num_copies(fs_info,
>>                                                eb->start, eb->len);
>> +
>> +               if (num_copies == 1 && tries-- > 0)
>> +                       continue;
>> +
>>                  if (num_copies == 1)
>>                          break;
>>
>> On Tuesday I got this:
>>
>> [Tue Jul  6 14:49:17 2021] ------------[ cut here ]------------
>> [Tue Jul  6 14:49:17 2021] WARNING: CPU: 13 PID: 2265463 at fs/btrfs/extent_io.c:5597 read_extent_buffer_pages+0x346/0x360
>> [Tue Jul  6 14:49:17 2021] Modules linked in: zram bcache crc64 loop dm_crypt bfq xfs dm_mod st sr_mod cdrom bridge stp llc intel_powerclamp coretemp snd_pcm kvm_intel snd_timer snd mgag200 kvm soundcore drm_kms_helper iTCO_wdt dcdbas irqbypass pcspkr serio_raw iTCO_vendor_support i2c_algo_bit evdev joydev i7core_edac sg ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter button ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm configfs ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid0 multipath linear raid1 md_mod ses enclosure sd_mod hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ahci libahci uhci_hcd ehci_pci psmouse mpt3sas ehci_hcd lpc_ich raid_class libata nvme scsi_transport_sas mfd_core nvme_core usbcore t10_pi scsi_mod bnx2
>> [Tue Jul  6 14:49:17 2021] CPU: 13 PID: 2265463 Comm: btrfs Tainted: G          I       5.10.47 #1
>> [Tue Jul  6 14:49:17 2021] Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.13.0 03/02/2018
>> [Tue Jul  6 14:49:17 2021] RIP: 0010:read_extent_buffer_pages+0x346/0x360
>> [Tue Jul  6 14:49:17 2021] Code: 48 8b 43 08 a8 01 48 8d 78 ff 48 0f 44 fb 31 f6 e8 cf d0 db ff 48 8b 43 08 48 8d 50 ff a8 01 48 0f 45 da 48 8b 03 a8 04 75 ab <0f> 0b 41 be fb ff ff ff eb a1 e8 4b af 56 00 66 66 2e 0f 1f 84 00
>> [Tue Jul  6 14:49:17 2021] RSP: 0018:ffffc9007562bb40 EFLAGS: 00010246
>> [Tue Jul  6 14:49:17 2021] RAX: 06ffff80000020e3 RBX: ffffea00095a5fc0 RCX: 0000000000000000
>> [Tue Jul  6 14:49:17 2021] RDX: dead0000000000ff RSI: 0000000000000020 RDI: ffff888713babd40
>> [Tue Jul  6 14:49:17 2021] RBP: ffff88869a7d6f78 R08: 0000f5a746752ed4 R09: 0000000000000483
>> [Tue Jul  6 14:49:17 2021] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88869a7d7020
>> [Tue Jul  6 14:49:17 2021] R13: ffffea0005f51f80 R14: 0000000000000000 R15: ffff88869a7d7020
>> [Tue Jul  6 14:49:17 2021] FS:  00007f748ddc18c0(0000) GS:ffff888713b80000(0000) knlGS:0000000000000000
>> [Tue Jul  6 14:49:17 2021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [Tue Jul  6 14:49:17 2021] CR2: 00005618b6037058 CR3: 000000011feca004 CR4: 00000000000206e0
>> [Tue Jul  6 14:49:17 2021] Call Trace:
>> [Tue Jul  6 14:49:17 2021]  btree_read_extent_buffer_pages+0x66/0x130
>> [Tue Jul  6 14:49:17 2021]  read_tree_block+0x36/0x60
>> [Tue Jul  6 14:49:17 2021]  btrfs_read_node_slot+0xc0/0x110
>> [Tue Jul  6 14:49:17 2021]  btrfs_search_forward+0x1db/0x350
>> [Tue Jul  6 14:49:17 2021]  search_ioctl+0x19e/0x250
>> [Tue Jul  6 14:49:17 2021]  btrfs_ioctl_tree_search+0x63/0xc0
>> [Tue Jul  6 14:49:17 2021]  btrfs_ioctl+0x1874/0x3060
>> [Tue Jul  6 14:49:17 2021]  ? page_add_new_anon_rmap+0xa3/0x1f0
>> [Tue Jul  6 14:49:17 2021]  ? handle_mm_fault+0xf6c/0x1950
>> [Tue Jul  6 14:49:17 2021]  ? __x64_sys_ioctl+0x83/0xb0
>> [Tue Jul  6 14:49:17 2021]  __x64_sys_ioctl+0x83/0xb0
>> [Tue Jul  6 14:49:17 2021]  do_syscall_64+0x33/0x80
>> [Tue Jul  6 14:49:17 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [Tue Jul  6 14:49:17 2021] RIP: 0033:0x7f748deb8cc7
>> [Tue Jul  6 14:49:17 2021] Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 91 0c 00 f7 d8 64 89 01 48
>> [Tue Jul  6 14:49:17 2021] RSP: 002b:00007ffc7e07f098 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>> [Tue Jul  6 14:49:17 2021] RAX: ffffffffffffffda RBX: 00007ffc7e0801d8 RCX: 00007f748deb8cc7
>> [Tue Jul  6 14:49:17 2021] RDX: 00007ffc7e07f0f8 RSI: 00000000d0009411 RDI: 0000000000000005
>> [Tue Jul  6 14:49:17 2021] RBP: 0000000000bafaad R08: 000000000000000b R09: 00007f748df82be0
>> [Tue Jul  6 14:49:17 2021] R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000005
>> [Tue Jul  6 14:49:17 2021] R13: 00007ffc7e07ffe1 R14: 000000000000000b R15: 00007ffc7e07f160
>> [Tue Jul  6 14:49:17 2021] ---[ end trace 81f64eb2e9ceb4de ]---
>>
>> with no other message afterwards. This means the retry is working and "fixed" the problem for me. I'll keep monitoring this and will try to catch the -EIO from other areas, like the original btrfs_finish_ordered_io.
>>


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-07-14 16:32 UTC | newest]

Thread overview: 15+ messages
2021-05-09 22:14 IO failure without other (device) error Martin Raiber
2021-06-18 16:18 ` Martin Raiber
2021-06-18 16:28   ` Roman Mamedov
2021-06-18 17:17     ` Martin Raiber
2021-06-18 17:36   ` Roman Mamedov
2021-06-18 18:24     ` Martin Raiber
2021-06-30 18:40   ` Martin Raiber
2021-07-01  1:40     ` Qu Wenruo
2021-07-01 17:25       ` Martin Raiber
2021-07-01 22:19         ` Qu Wenruo
2021-07-02 16:29           ` Martin Raiber
2021-07-02 22:46             ` Qu Wenruo
2021-07-08 16:14               ` Martin Raiber
2021-07-08 23:32                 ` Qu Wenruo
2021-07-14 16:32                   ` Martin Raiber
