linux-raid.vger.kernel.org archive mirror
* Linux RAID with btrfs stuck and consume 100 % CPU
@ 2020-07-22 20:47 Vojtech Myslivec
  2020-07-22 22:00 ` antlists
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Vojtech Myslivec @ 2020-07-22 20:47 UTC (permalink / raw)
  To: linux-btrfs, linux-raid; +Cc: Michal Moravec

[-- Attachment #1: Type: text/plain, Size: 4791 bytes --]

Hello,

I would like to report an issue related to a btrfs filesystem on an md
RAID. I am not sure which of the two is the cause of the problem, so I am
writing to both mailing lists at once.

My colleague, who has spent a lot of time investigating and trying to
work around the issue, is in CC.

Thank you for any help,
Vojtech Myslivec



The issue
=========

For the last several weeks I have been experiencing an annoying issue on
my physical server with Linux software RAID 6 with a write journal and a
btrfs filesystem on top of it. The disk setup is described below.

This host serves as a backup server and it runs regular backup tasks.
When a backup is performed, one (read only) snapshot of one of my
subvolumes on the btrfs filesystem is created and one snapshot is
deleted afterwards.

Once every several days (irregularly, as far as I have noticed), the
`md1_raid6` process starts to consume 100 % of one CPU core, and during
that time creating a snapshot (as part of the regular backup process) of
a btrfs subvolume gets stuck. User space processes accessing this
particular subvolume then start to hang in *disk sleep* state. Access to
other subvolumes seems to be unaffected until another backup process
tries to create another snapshot (of a different subvolume).

In most cases, after several "IO" actions like listing files (ls),
accessing btrfs information (`btrfs filesystem`, `btrfs subvolume`), or
accessing the device (with `dd` or whatever), the filesystem gets
magically unstuck and the `md1_raid6` process is released from its "live
lock" (or whatever it is spinning in). Snapshots are then created as
expected and all processes finish their job.

Approximately once a week, it takes tens of minutes to unstick these
processes. During that period, I try to access the affected btrfs
subvolumes from several shell sessions to wake them up.

If I leave the system untouched and just let it be, it unsticks itself
after several hours. This often happens during the night when I am not
awake to unstick the filesystem manually.

Sometimes the worst case happens and this "magic unsticking" does not
work. I am then not able to even kill the processes stuck in disk sleep
state and I am forced to restart the system.

When the issue happens, I very often find messages like the following in
the kernel dmesg log:

    INFO: task md1_reclaim:910 blocked for more than 120 seconds.

with included call trace.

However, there are some more "blocked" tasks, like `btrfs` and
`btrfs-transaction`, with call traces included as well.


Questions
=========

1. What could be the cause of this problem?
2. What should I do to mitigate this issue?
3. Could it be a hardware problem? How can I track this down?


What I have done so far
=======================

- I keep the system up to date, with the latest stable kernel provided by
  Debian packages

- I have run both `btrfs scrub` and `fsck.btrfs` to exclude a btrfs
  filesystem issue.

- I have read all the physical disks (with the dd command) and performed
  SMART self-tests to exclude a disk issue (though read/write badblocks
  checks have not been done yet).

- I have also moved all the files out of the affected filesystem, created
  a new btrfs filesystem (with recent btrfs-progs) and moved the files
  back. Nonetheless, the issue appeared again.

- I have tried to attach strace to the spinning md1 process, but
  unsuccessfully (is it even possible to strace a running kernel thread?)

- Of course, I have tried to find similar issues around the web, but I
  was not successful.


Some detailed facts
===================

OS
--

- Debian 10 buster (stable release)
- Linux kernel 5.5 (from Debian backports)
- btrfs-progs 5.2.1 (from Debian backports)

Hardware
--------

- 8 core/16 threads amd64 processor (AMD EPYC 7251)
- 6 SATA HDD disks (Seagate Enterprise Capacity)
- 2 SSD disks (Intel D3-S4610)

RAID
----

- All Linux MD RAID (no HW RAID used)
- RAID1 over 2 SSD (md0)
    - On top of this RAID1 is a LVM with logical volumes for:
        - root filesystem
        - swap
        - RAID6 write journal
- RAID6 over 6 HDD and 1 LV as write journal (md1)
    - This is the affected device (see the sketch below)
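
For reference, a rough sketch of how an array like md1 is assembled (the
six HDD device names are illustrative placeholders; the VG/LV names
follow the layout above):

    # illustrative only -- HDD device names are placeholders
    lvcreate -L 1G -n journal_md1 vg0
    mdadm --create /dev/md1 --level=6 --raid-devices=6 \
          --write-journal=/dev/vg0/journal_md1 \
          /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf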

btrfs
-----
- Several subvolumes, tens of snapshots
- Default mount options: rw,noatime,space_cache,subvolid=5,subvol=/
- No compression, autodefrag or similar options
- I have tried to use quotas in the past but they have been disabled for
  a long time

Usage
-----

- The affected RAID6 block device is directly formatted as btrfs
- This filesystem is used to store backups
- Backups are performed via rsnapshot
- rsnapshot is configured to use btrfs snapshots for hourly and daily
  backups and rsync to copy new backups

Other IO operations
-------------------

- There is ongoing monitoring through Icinga which uses iostat
  internally to monitor I/O
- SMART selftests run periodically (on weekly and monthly basis)



[-- Attachment #2: dmesg-btrfs.log --]
[-- Type: text/x-log, Size: 2444 bytes --]

[  +0,000032] INFO: task btrfs:14183 blocked for more than 120 seconds.
[  +0,000030]       Not tainted 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
[  +0,000033] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0,000036] btrfs           D    0 14183  14177 0x00004000
[  +0,000002] Call Trace:
[  +0,000003]  ? __schedule+0x2db/0x700
[  +0,000001]  ? mempool_alloc+0x67/0x190
[  +0,000002]  schedule+0x40/0xb0
[  +0,000007]  md_write_start+0x14b/0x220 [md_mod]
[  +0,000002]  ? finish_wait+0x80/0x80
[  +0,000002]  ? finish_wait+0x80/0x80
[  +0,000005]  raid5_make_request+0x83/0xb50 [raid456]
[  +0,000004]  ? bio_add_page+0x39/0x90
[  +0,000023]  ? submit_extent_page+0xd8/0x200 [btrfs]
[  +0,000002]  ? finish_wait+0x80/0x80
[  +0,000002]  ? __blk_queue_split+0x403/0x4c0
[  +0,000002]  ? finish_wait+0x80/0x80
[  +0,000005]  md_handle_request+0x119/0x190 [md_mod]
[  +0,000006]  md_make_request+0x8a/0x190 [md_mod]
[  +0,000003]  generic_make_request+0xcf/0x310
[  +0,000026]  ? btrfs_dev_stat_inc_and_print+0x50/0x50 [btrfs]
[  +0,000001]  submit_bio+0x42/0x1c0
[  +0,000025]  btrfs_map_bio+0x1c0/0x380 [btrfs]
[  +0,000022]  btrfs_submit_bio_hook+0x8c/0x170 [btrfs]
[  +0,000024]  submit_one_bio+0x31/0x50 [btrfs]
[  +0,000024]  extent_writepages+0x6b/0xa0 [btrfs]
[  +0,000002]  do_writepages+0x41/0xd0
[  +0,000024]  ? merge_state.part.43+0x3f/0x160 [btrfs]
[  +0,000002]  __filemap_fdatawrite_range+0xcb/0x100
[  +0,000023]  btrfs_fdatawrite_range+0x1b/0x50 [btrfs]
[  +0,000025]  __btrfs_write_out_cache+0x467/0x4c0 [btrfs]
[  +0,000026]  btrfs_write_out_cache+0x92/0xe0 [btrfs]
[  +0,000023]  btrfs_start_dirty_block_groups+0x270/0x4e0 [btrfs]
[  +0,000023]  btrfs_commit_transaction+0xc5/0xa40 [btrfs]
[  +0,000020]  ? btrfs_record_root_in_trans+0x56/0x60 [btrfs]
[  +0,000021]  ? start_transaction+0xbc/0x4d0 [btrfs]
[  +0,000024]  btrfs_mksubvol+0x4f0/0x530 [btrfs]
[  +0,000026]  btrfs_ioctl_snap_create_transid+0x170/0x180 [btrfs]
[  +0,000024]  btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
[  +0,000024]  btrfs_ioctl+0x11c2/0x2de0 [btrfs]
[  +0,000004]  ? tomoyo_path_number_perm+0x68/0x1e0
[  +0,000005]  ? do_vfs_ioctl+0xa4/0x680
[  +0,000001]  do_vfs_ioctl+0xa4/0x680
[  +0,000003]  ksys_ioctl+0x60/0x90
[  +0,000001]  __x64_sys_ioctl+0x16/0x20
[  +0,000004]  do_syscall_64+0x52/0x170
[  +0,000002]  entry_SYSCALL_64_after_hwframe+0x44/0xa9


[-- Attachment #3: dmesg-btrfs-transaction.log --]
[-- Type: text/x-log, Size: 1208 bytes --]

[  +0,000008] INFO: task btrfs-transacti:949 blocked for more than 120 seconds.
[  +0,000032]       Not tainted 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
[  +0,000033] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0,000035] btrfs-transacti D    0   949      2 0x80004000
[  +0,000002] Call Trace:
[  +0,000003]  ? __schedule+0x2db/0x700
[  +0,000002]  schedule+0x40/0xb0
[  +0,000002]  io_schedule+0x12/0x40
[  +0,000004]  wait_on_page_bit+0x15c/0x230
[  +0,000002]  ? file_fdatawait_range+0x20/0x20
[  +0,000002]  __filemap_fdatawait_range+0x89/0xf0
[  +0,000027]  ? __clear_extent_bit+0x2bd/0x440 [btrfs]
[  +0,000002]  filemap_fdatawait_range+0xe/0x20
[  +0,000022]  __btrfs_wait_marked_extents.isra.20+0xc2/0x100 [btrfs]
[  +0,000022]  btrfs_write_and_wait_transaction.isra.24+0x67/0xd0 [btrfs]
[  +0,000022]  btrfs_commit_transaction+0x754/0xa40 [btrfs]
[  +0,000022]  ? start_transaction+0xbc/0x4d0 [btrfs]
[  +0,000022]  transaction_kthread+0x144/0x170 [btrfs]
[  +0,000021]  ? btrfs_cleanup_transaction+0x5e0/0x5e0 [btrfs]
[  +0,000002]  kthread+0x112/0x130
[  +0,000001]  ? kthread_park+0x80/0x80
[  +0,000002]  ret_from_fork+0x22/0x40



[-- Attachment #4: dmesg-md1_reclaim.log --]
[-- Type: text/x-log, Size: 1441 bytes --]

[čen 1 13:05] INFO: task md1_reclaim:910 blocked for more than 120 seconds.
[  +0,000057]       Not tainted 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
[  +0,000033] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0,000035] md1_reclaim     D    0   910      2 0x80004000
[  +0,000003] Call Trace:
[  +0,000012]  ? __schedule+0x2db/0x700
[  +0,000004]  ? check_preempt_curr+0x62/0x90
[  +0,000004]  ? wbt_exit+0x30/0x30
[  +0,000001]  ? __wbt_done+0x30/0x30
[  +0,000002]  schedule+0x40/0xb0
[  +0,000002]  io_schedule+0x12/0x40
[  +0,000003]  rq_qos_wait+0xfa/0x170
[  +0,000002]  ? karma_partition+0x210/0x210
[  +0,000002]  ? wbt_exit+0x30/0x30
[  +0,000001]  wbt_wait+0x99/0xe0
[  +0,000002]  __rq_qos_throttle+0x23/0x30
[  +0,000004]  blk_mq_make_request+0x12a/0x5d0
[  +0,000003]  generic_make_request+0xcf/0x310
[  +0,000002]  submit_bio+0x42/0x1c0
[  +0,000010]  ? md_super_write.part.70+0x98/0x120 [md_mod]
[  +0,000007]  md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
[  +0,000008]  r5l_do_reclaim+0x32a/0x3b0 [raid456]
[  +0,000003]  ? schedule_timeout+0x161/0x310
[  +0,000002]  ? prepare_to_wait_event+0xb6/0x140
[  +0,000006]  ? md_register_thread+0xd0/0xd0 [md_mod]
[  +0,000004]  md_thread+0x94/0x150 [md_mod]
[  +0,000002]  ? finish_wait+0x80/0x80
[  +0,000003]  kthread+0x112/0x130
[  +0,000002]  ? kthread_park+0x80/0x80
[  +0,000002]  ret_from_fork+0x22/0x40



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-07-22 20:47 Linux RAID with btrfs stuck and consume 100 % CPU Vojtech Myslivec
@ 2020-07-22 22:00 ` antlists
  2020-07-23  2:08 ` Chris Murphy
  2020-07-29 21:06 ` Guoqing Jiang
  2 siblings, 0 replies; 22+ messages in thread
From: antlists @ 2020-07-22 22:00 UTC (permalink / raw)
  To: Vojtech Myslivec, linux-btrfs, linux-raid; +Cc: Michal Moravec

On 22/07/2020 21:47, Vojtech Myslivec wrote:
> - RAID6 over 6 HDD and 1 LV as write journal (md1)
>      - This is the affected device

What hard drives do you have?

https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

Dunno whether that is your problem, but it's one of the first things to 
check.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-07-22 20:47 Linux RAID with btrfs stuck and consume 100 % CPU Vojtech Myslivec
  2020-07-22 22:00 ` antlists
@ 2020-07-23  2:08 ` Chris Murphy
       [not found]   ` <29509e08-e373-b352-d696-fcb9f507a545@xmyslivec.cz>
  2020-07-29 21:06 ` Guoqing Jiang
  2 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-07-23  2:08 UTC (permalink / raw)
  To: Vojtech Myslivec; +Cc: Btrfs BTRFS, Linux-RAID, Michal Moravec

On Wed, Jul 22, 2020 at 2:55 PM Vojtech Myslivec <vojtech@xmyslivec.cz> wrote:
>
> This host serves as a backup server and it runs regular backup tasks.
> When a backup is performed, one (read only) snapshot of one of my
> subvolumes on the btrfs filesystem is created and one snapshot is
> deleted afterwards.

This is likely to be a decently metadata centric workflow, lots of
small file changes, and metadata (file system) changes. Parity raid
performance in such workloads is often not great. It's just the way it
goes. But what does iostat tell you about drive utilization during
these backups? And during the problem? Are they balanced? Are they
nearly fully utilized?
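
For example, something like this shows per-member utilization while a
backup runs (device list and interval are illustrative):

    # watch %util and the await columns of the members vs. the md device
    iostat -dxm 5 /dev/sd[a-f] /dev/md1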



>
> Once in several days (irregularly, as I noticed), the `md1_raid6`
> process starts to consume 100 % of one CPU core and during that time,
> creating a snapshot (during the regular backup process) of a btrfs
> subvolume get stucked. User space processes accessing this particular
> subvolume then start to hang in *disk sleep* state. Access to other
> subvolumes seems to be unaffected until another backup process tries to
> create another snapshot (of different subvolume).

Snapshot creation results in a flush. And snapshot deletion results in
the btrfs-cleaner process doing work, which involves a lot of reads and
writes to track down the extents to be freed. But your call traces seem stuck in snapshot
creation.

Can you provide mdadm -E and -D output respectively? I wonder if the
setup is just not well suited for the workload. Default mdadm 512KiB
chunk may not align well with this workload.

Also, a complete dmesg might be useful.



>
> In most cases, after several "IO" actions like listing files (ls),
> accessing btrfs information (`btrfs filesystem`, `btrfs subvolume`), or
> accessing the device (with `dd` or whatever), the filesystem gets
> magically unstucked and `md1_raid6` process released from its "live
> lock" (or whatever it is cycled in). Snapshots are then created as
> expected and all processes finish their job.
>
> Once in a week approximately, it takes tens of minutes to unstuck these
> processes. During that period, I try to access affected btrfs subvolumes
> in several shell sessions to wake it up.

Could be lock contention on the subvolume.


> However, there are some more "blocked" tasks, like `btrfs` and
> `btrfs-transaction` with call trace also included.
>
>
> Questions
> =========
>
> 1. What should be the cause of this problem?
> 2. What should I do to mitigate this issue?
> 3. Could it be a hardware problem? How can I track this?

Not sure yet. Need more info:

dmesg
mdadm -E
mdadm -D
btrfs filesystem usage /mountpoint
btrfs device stats /mountpoint
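
For example (device names and mountpoint are illustrative; run mdadm -E
against every member of md1):

    dmesg > dmesg.txt
    mdadm -E /dev/sda                     # repeat for each member device
    mdadm -D /dev/md1
    btrfs filesystem usage /mnt/data
    btrfs device stats /mnt/data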

> What I have done so far
> =======================
>
> - I keep the system up-to-date, with latest stable kernel provided by
>   Debian packages

5.5 is fairly recent and OK. It should be fine, except you're having a
problem, so...it could be a bug that's fixed already or a new bug. Or
it could be suboptimal configuration for the workload - which can be
difficult to figure out.

>
> - I run both `btrfs scrub` and `fsck.btrfs` to exclude btrfs filesystem
>   issue.
>
> - I have read all the physical disks (with dd command) and perform SMART
>   self tests to exclude disks issue (though read/write badblocks were
>   not checked yet).

I wouldn't worry too much about badblocks. More important is
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

But you report using enterprise drives. They will invariably have an
SCT ERC time of ~70 deciseconds, which is well below that of the
kernel's SCSI command timer, ergo not a problem. But it's fine to
double check that.
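
A quick way to double check, assuming smartmontools is installed (drive
names are illustrative):

    for d in /dev/sd[a-f]; do
        echo "== $d"
        smartctl -l scterc "$d"           # read/write SCT ERC timeouts
    done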


> - I have also moved all the files out of the affected filesystem, create
>   a new btrfs filesystem (with recent btrfs-progs) and moved files
>   back. This issue, none the less, appeared again.

Exactly the same configuration? Anything different at all?


>
> - I have tried to attach strace to cycled md1 process, but
>   unsuccessfully (is it even possible to strace running kernel thread?)

You want to do 'cat /proc/<pid>/stack'


> Some detailed facts
> ===================
>
> OS
> --
>
> - Debian 10 buster (stable release)
> - Linux kernel 5.5 (from Debian backports)
> - btrfs-progs 5.2.1 (from Debian backports)

btrfs-progs 5.2.1 is ok, but I suggest something newer before using
'btrfs check --repair'. Just to be clear --repair is NOT indicated
right now.


> Hardware
> --------
>
> - 8 core/16 threads amd64 processor (AMD EPYC 7251)
> - 6 SATA HDD disks (Seagate Enterprise Capacity)
> - 2 SSD disks (Intel D3-S4610)

It's not related, but your workload might benefit from
'compress=zstd:1' mount option. Compress everything across the board.
Chances are these backups contain a lot of compressible data. This
isn't important to do right now. Fix the problem first. Optimize
later. But you have significant CPU capacity relative to the hardware.
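
If you try it later, it is just a mount option, e.g. (mountpoint is
illustrative; compression only applies to data written after the
remount):

    mount -o remount,compress=zstd:1 /mnt/data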


> btrfs
> -----
> - Several subvolumes, tens of snapshots
> - Default mount options: rw,noatime,space_cache,subvolid=5,subvol=/
> - No compression, autodefrag or so
> - I have tried to use quotas in the past but they are disabled for
>   a long time

I don't think this is the only thing going on, but consider
space_cache=v2. You can mount with '-o clear_cache'  then umount and
then mount again with 'o space_cache=v2' to convert. And it will be
persistent (unless invalidated by a repair and then default v1 version
is used again). v2 will soon be the default.
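
Spelled out, that conversion sequence looks roughly like this (device and
mountpoint are illustrative; the filesystem has to be unmounted in
between):

    umount /mnt/data
    mount -o clear_cache /dev/md1 /mnt/data      # drop the old v1 cache
    umount /mnt/data
    mount -o space_cache=v2 /dev/md1 /mnt/data   # build the free space tree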




>
> Usage
> -----
>
> - Affected RAID6 block device is directly formatted to btrfs
> - This filesystem is used to store backups
> - Backups are performed via rsnapshot
> - rsnapshot is configured to use btrfs snapshots for hourly and daily
>   backups and rsync to copy new backups

How many rsnapshot and rsync tasks are happening concurrently for a
subvolume at the time the  subvolume becomes unresponsive?




-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
       [not found]   ` <29509e08-e373-b352-d696-fcb9f507a545@xmyslivec.cz>
@ 2020-07-28 20:23     ` Chris Murphy
       [not found]       ` <695936b4-67a2-c862-9cb6-5545b4ab3c42@xmyslivec.cz>
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-07-28 20:23 UTC (permalink / raw)
  To: Vojtech Myslivec; +Cc: Chris Murphy, Michal Moravec, Btrfs BTRFS, Linux-RAID

On Tue, Jul 28, 2020 at 7:31 AM Vojtech Myslivec <vojtech@xmyslivec.cz> wrote:

> > dmesg
> > mdadm -E
> > mdadm -D
> > btrfs filesystem usage /mountpoint
> > btrfs device stats /mountpoint

These all look good.


> > SCT Error Recovery Control:
> >            Read:    100 (10.0 seconds)
> >           Write:    100 (10.0 seconds)
>
> It is higher than you expect, yet still below kernel 30 s timeout, right?

It's good.


> > It's not related, but your workload might benefit from
> > 'compress=zstd:1' mount option. Compress everything across the board.
> > Chances are these backups contain a lot of compressible data. This
> > isn't important to do right now. Fix the problem first. Optimize
> > later. But you have significant CPU capacity relative to the hardware.
>
> OK, thanks for the tip. Overall CPU utilization is not high at the
> moment. The server is dedicated to backups so I can try this.
>
> In fact, I am a bit scared of any compression related to btrfs. I do
> not mean to blame anyone, I just read some recommendations about
> disabling compression on btrfs (Debian wiki, kernel wiki, ...).

That's based on ancient kernels. Also the last known bug was really
obscure, I never hit it. You had to have some combination of inline
extents and also holes. You're using 5.5, and that has all bug fixes
for that. At least Facebook folks are using compress=zstd:1 pretty
much across the board and have a metric s ton of machines they're
doing this with, so it's reliable.

> In most cases backups are pretty fast and only one runs at a time.
> From the logs on the server, I can see it gets stuck when only one
> backup process is running.
>
> But I am not able to tell if a background btrfs-cleaner process is
> running at that moment. I can focus on this if it helps.

Your dmesg contains
[ 9667.449898] INFO: task md1_reclaim:910 blocked for more than 120 seconds.

It might be helpful to reproduce and take sysrq+w at the time of the
blocking. Sometimes it's best to have the sysrq trigger command ready
in a shell, but don't hit enter until the blocked task happens.
Sometimes during blocked tasks it takes forever to issue a command.
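
A minimal sketch of that (Debian ships a restricted sysrq mask by
default, so the first line may be needed):

    echo 1 > /proc/sys/kernel/sysrq       # allow all sysrq functions
    # keep this ready, press enter only once tasks are blocked:
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 300 > sysrq-w.txt     # collect the blocked-task dump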

It would be nice if an md kernel developer can comment on what's going on.

Does this often happen when a btrfs snapshot is created? That will
cause a flush to happen and I wonder if that's instigating the problem
in the lower layers.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-07-22 20:47 Linux RAID with btrfs stuck and consume 100 % CPU Vojtech Myslivec
  2020-07-22 22:00 ` antlists
  2020-07-23  2:08 ` Chris Murphy
@ 2020-07-29 21:06 ` Guoqing Jiang
  2020-07-29 21:48   ` Chris Murphy
                     ` (2 more replies)
  2 siblings, 3 replies; 22+ messages in thread
From: Guoqing Jiang @ 2020-07-29 21:06 UTC (permalink / raw)
  To: Vojtech Myslivec, linux-btrfs, linux-raid; +Cc: Michal Moravec, Song Liu

Hi,

On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
> 1. What should be the cause of this problem?

Just a quick glance based on the stacks which you attached, I guess it
could be a deadlock issue of raid5 cache super write.

Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in superblock
write") didn't fix the problem completely.  Cc Song.

And I am curious why md thread is not waked if mddev_trylock fails, you can
give it a try but I can't promise it helps ...

--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1337,8 +1337,10 @@ static void r5l_write_super_and_discard_space(struct r5l_log *log,
          */
         set_mask_bits(&mddev->sb_flags, 0,
                 BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
-       if (!mddev_trylock(mddev))
+       if (!mddev_trylock(mddev)) {
+               md_wakeup_thread(mddev->thread);
                 return;
+       }
         md_update_sb(mddev, 1);
         mddev_unlock(mddev);


Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-07-29 21:06 ` Guoqing Jiang
@ 2020-07-29 21:48   ` Chris Murphy
  2020-08-12 14:19     ` Vojtech Myslivec
  2020-07-30  6:45   ` Song Liu
  2020-08-12 13:58   ` Vojtech Myslivec
  2 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-07-29 21:48 UTC (permalink / raw)
  To: Guoqing Jiang
  Cc: Vojtech Myslivec, Btrfs BTRFS, Linux-RAID, Michal Moravec, Song Liu

On Wed, Jul 29, 2020 at 3:06 PM Guoqing Jiang
<guoqing.jiang@cloud.ionos.com> wrote:
>
> Hi,
>
> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
> > 1. What should be the cause of this problem?
>
> Just a quick glance based on the stacks which you attached, I guess it
> could be
> a deadlock issue of raid5 cache super write.
>
> Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in superblock
> write") didn't fix the problem completely.  Cc Song.

That references discards, and it made me look again at mdadm -D, which
shows a journal device:

       0     253        2        -      journal   /dev/dm-2

Vojtech, can you confirm this device is an SSD? There are a couple
SSDs that show up in the dmesg if I recall correctly.

What is the default discard hinting for this SSD when it's used as a
journal device for mdadm? And what is the write behavior of the
journal? I'm not familiar with this feature at all, whether it's
treated as a raw block device for the journal or if the journal
resides on a file system. So I get kinda curious what might happen
long term if this is a very busy file system, very busy raid5/6
journal on this SSD, without any discard hints? Is it possible the SSD
runs out of ready-to-write erase blocks, and the firmware has become
super slow doing erasure/garbage collection on demand? And the journal
is now having a hard time flushing?


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-07-29 21:06 ` Guoqing Jiang
  2020-07-29 21:48   ` Chris Murphy
@ 2020-07-30  6:45   ` Song Liu
  2020-08-12 13:58   ` Vojtech Myslivec
  2 siblings, 0 replies; 22+ messages in thread
From: Song Liu @ 2020-07-30  6:45 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Vojtech Myslivec, linux-btrfs, linux-raid, Michal Moravec



> On Jul 29, 2020, at 2:06 PM, Guoqing Jiang <guoqing.jiang@cloud.ionos.com> wrote:
> 
> Hi,
> 
> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
>> 1. What should be the cause of this problem?
> 
> Just a quick glance based on the stacks which you attached, I guess it could be
> a deadlock issue of raid5 cache super write.
> 
> Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in superblock
> write") didn't fix the problem completely.  Cc Song.
> 
> And I am curious why md thread is not waked if mddev_trylock fails, you can
> give it a try but I can't promise it helps ...
> 
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -1337,8 +1337,10 @@ static void r5l_write_super_and_discard_space(struct r5l_log *log,
>          */
>         set_mask_bits(&mddev->sb_flags, 0,
>                 BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
> -       if (!mddev_trylock(mddev))
> +       if (!mddev_trylock(mddev)) {
> +               md_wakeup_thread(mddev->thread);
>                 return;
> +       }
>         md_update_sb(mddev, 1);
>         mddev_unlock(mddev);
> 

Thanks Guoqing!

I am not sure whether we hit the mddev_trylock() failure. Looks like the 
md1_raid6 thread is already running at 100%. 

A few questions: 

1. I see wbt_wait in the stack trace. Are we using write back throttling here?
2. Could you please get the /proc/<pid>/stack for <pid> of md1_raid6? We may
   want to sample it multiple times (a sketch follows below).
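
A minimal way to sample it repeatedly, assuming the kernel thread is
still named md1_raid6 (pgrep -x matches the thread by name; needs root):

    pid=$(pgrep -x md1_raid6)
    for i in $(seq 1 20); do
        date
        cat "/proc/$pid/stack"
        sleep 3
    done > md1_raid6-stacks.txt 2>&1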

Thanks,
Song

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-07-29 21:06 ` Guoqing Jiang
  2020-07-29 21:48   ` Chris Murphy
  2020-07-30  6:45   ` Song Liu
@ 2020-08-12 13:58   ` Vojtech Myslivec
  2 siblings, 0 replies; 22+ messages in thread
From: Vojtech Myslivec @ 2020-08-12 13:58 UTC (permalink / raw)
  To: Guoqing Jiang, linux-btrfs, linux-raid; +Cc: Michal Moravec, Song Liu

Hello,

On 29. 07. 20 23:06, Guoqing Jiang wrote:
> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
>> 1. What should be the cause of this problem?
> 
> Just a quick glance based on the stacks which you attached, I guess it
> could be
> a deadlock issue of raid5 cache super write.
> 
> Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in superblock
> write") didn't fix the problem completely.  Cc Song.
> 
> And I am curious why md thread is not waked if mddev_trylock fails, you can
> give it a try but I can't promise it helps ...

Just to make sure: we use RAID 6 across 6 HDDs (I am not sure how the
RAID 5 implementation relates to RAID 6).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-07-29 21:48   ` Chris Murphy
@ 2020-08-12 14:19     ` Vojtech Myslivec
  2020-08-12 14:19       ` Vojtech Myslivec
  0 siblings, 1 reply; 22+ messages in thread
From: Vojtech Myslivec @ 2020-08-12 14:19 UTC (permalink / raw)
  To: Chris Murphy, Guoqing Jiang
  Cc: Btrfs BTRFS, Linux-RAID, Michal Moravec, Song Liu

[-- Attachment #1: Type: text/plain, Size: 2848 bytes --]



On 29. 07. 20 23:48, Chris Murphy wrote:
> On Wed, Jul 29, 2020 at 3:06 PM Guoqing Jiang
> <guoqing.jiang@cloud.ionos.com> wrote:
>> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
>>> 1. What should be the cause of this problem?
>>
>> Just a quick glance based on the stacks which you attached, I guess it
>> could be
>> a deadlock issue of raid5 cache super write.
>>
>> Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in superblock
>> write") didn't fix the problem completely.  Cc Song.
> 
> That references discards, and it make me relook at mdadm -D which
> shows a journal device:
> 
>        0     253        2        -      journal   /dev/dm-2
> 
> Vojtech, can you confirm this device is an SSD? There are a couple
> SSDs that show up in the dmesg if I recall correctly.

I tried to explain this in my first post. It's a logical volume in a
volume group on top of a RAID 1 over 2 SSDs.

My colleague replied with more details:

On 05. 08. 2020 Michal Moravec wrote:
>> On 29 Jul 2020, Chris Murphy wrote:
>> Vojtech, can you confirm this device is an SSD? There are a couple
>> SSDs that show up in the dmesg if I recall correctly.
>
> Yes. We have a pair (sdg, sdh) of INTEL D3-S4610 240 GB SSDs
> (SSDSC2KG240G8).
> We use them for the OS and the raid6 journal.
> They are configured as the md0 RAID array with LVM on top of it.
> Logical volume vg0-journal_md1 (1G in size) is used as the journal
> device for the md1 array (where the problem with the md1_raid6 process
> consuming 100% CPU and blocking btrfs operations happens).


>> What is the default discard hinting for this SSD when it's used as
>> a journal device for mdadm?
>
> What do you mean by discard hinting?
> We have an issue_discards = 1 configuration in /etc/lvm/lvm.conf


>> And what is the write behavior of the journal?
>
> That would be journal_mode set to write-through, right?


>> I'm not familiar with this feature at all, whether it's treated as a
>> raw block device for the journal or if the journal resides on a file
>> system.
>
> From the lsblk output I see no filesystem on vg0-journal_md1. It looks
> like a plain logical volume to me.

[my comment]: yes, it's an LV block device; there is no filesystem on it.


>> So I get kinda curious what might happen long term if this is a very
>> busy file system, very busy raid5/6 journal on this SSD, without any
>> discard hints?
>> Is it possible the SSD runs out of ready-to-write erase blocks, and
>> the firmware has become super slow doing erasure/garbage collection
>> on demand?
>> And the journal is now having a hard time flushing?
>
> What kind of information could we gather to verify/reject any of these
> ideas?


[my question]: Is the LVM configuration (above) enough? Sadly, there is
not much information about RAID 6 journaling on the kernel wiki. There is
some info in mdadm(8), but nothing about discard/trim operations.
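
For a start, I guess we could collect something like this (a sketch;
paths follow the naming above -- journal_mode is the raid5-cache sysfs
knob, and lsblk -D shows whether discards propagate through LVM/MD):

    cat /sys/block/md1/md/journal_mode       # write-through or write-back
    lsblk -D /dev/mapper/vg0-journal_md1     # discard granularity seen by the LV
    lsblk -D /dev/sdg /dev/sdh               # discard support on the SSDs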

[-- Attachment #2: lsblk-output.txt --]
[-- Type: text/plain, Size: 1076 bytes --]

NAME                  MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdg                     8:96   1 223,6G  0 disk  
├─sdg1                  8:97   1  37,3G  0 part  
│ └─md0                 9:0    0  37,2G  0 raid1 
│   ├─vg0-swap        253:0    0   3,7G  0 lvm   [SWAP]
│   ├─vg0-root        253:1    0  14,9G  0 lvm   /
│   └─vg0-journal_md1 253:2    0     1G  0 lvm   
│     └─md1             9:1    0  29,1T  0 raid6 /mnt/data
├─sdg2                  8:98   1     1K  0 part  
└─sdg5                  8:101  1 186,3G  0 part  
sdh                     8:112  1 223,6G  0 disk  
├─sdh1                  8:113  1  37,3G  0 part  
│ └─md0                 9:0    0  37,2G  0 raid1 
│   ├─vg0-swap        253:0    0   3,7G  0 lvm   [SWAP]
│   ├─vg0-root        253:1    0  14,9G  0 lvm   /
│   └─vg0-journal_md1 253:2    0     1G  0 lvm   
│     └─md1             9:1    0  29,1T  0 raid6 /mnt/data
├─sdh2                  8:114  1     1K  0 part  
└─sdh5                  8:117  1 186,3G  0 part  

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
       [not found]       ` <695936b4-67a2-c862-9cb6-5545b4ab3c42@xmyslivec.cz>
@ 2020-08-14 20:04         ` Chris Murphy
       [not found]           ` <2f2f1c21-c81b-55aa-6f77-e2d3f32d32cb@xmyslivec.cz>
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-08-14 20:04 UTC (permalink / raw)
  To: Vojtech Myslivec; +Cc: Michal Moravec, Btrfs BTRFS, Linux-RAID, Song Liu

[-- Attachment #1: Type: text/plain, Size: 27730 bytes --]

On Wed, Aug 12, 2020 at 7:54 AM Vojtech Myslivec <vojtech@xmyslivec.cz> wrote:
>
> My colleague Michal replied to that On 05. 08. 2020:

I can't find any reply
https://lore.kernel.org/linux-raid/20200812141931.bT6w5ZM3DAPlTLvmUW5CIJWTqeVTrI7MAxdY33Z4JT8@z/

> I attach sysrq logs again here

Song requested " /proc/<pid>/stack for <pid> of md1_raid6? We may want
to sample it multiple times. "

But the tarball shows both copies of /proc/909/stack as empty.

sysrq+w I'll also try to attach as a file because I bet gmail will
screw up the wrapping in the paste:

Jul 31 16:31:40 backup1 kernel: sysrq: Show Blocked State
Jul 31 16:31:40 backup1 kernel:   task                        PC stack   pid father
Jul 31 16:31:40 backup1 kernel: md1_reclaim     D    0   910      2 0x80004000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  ? __switch_to_asm+0x34/0x70
Jul 31 16:31:40 backup1 kernel:  ? __switch_to_asm+0x34/0x70
Jul 31 16:31:40 backup1 kernel:  ? wbt_exit+0x30/0x30
Jul 31 16:31:40 backup1 kernel:  ? __wbt_done+0x30/0x30
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  io_schedule+0x12/0x40
Jul 31 16:31:40 backup1 kernel:  rq_qos_wait+0xfa/0x170
Jul 31 16:31:40 backup1 kernel:  ? __switch_to_asm+0x34/0x70
Jul 31 16:31:40 backup1 kernel:  ? karma_partition+0x210/0x210
Jul 31 16:31:40 backup1 kernel:  ? wbt_exit+0x30/0x30
Jul 31 16:31:40 backup1 kernel:  wbt_wait+0x99/0xe0
Jul 31 16:31:40 backup1 kernel:  __rq_qos_throttle+0x23/0x30
Jul 31 16:31:40 backup1 kernel:  blk_mq_make_request+0x12a/0x5d0
Jul 31 16:31:40 backup1 kernel:  generic_make_request+0xcf/0x310
Jul 31 16:31:40 backup1 kernel:  submit_bio+0x42/0x1c0
Jul 31 16:31:40 backup1 kernel:  ? md_super_write.part.70+0x98/0x120 [md_mod]
Jul 31 16:31:40 backup1 kernel:  md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
Jul 31 16:31:40 backup1 kernel:  r5l_do_reclaim+0x32a/0x3b0 [raid456]
Jul 31 16:31:40 backup1 kernel:  ? schedule_timeout+0x161/0x310
Jul 31 16:31:40 backup1 kernel:  ? prepare_to_wait_event+0xb6/0x140
Jul 31 16:31:40 backup1 kernel:  ? md_register_thread+0xd0/0xd0 [md_mod]
Jul 31 16:31:40 backup1 kernel:  md_thread+0x94/0x150 [md_mod]
Jul 31 16:31:40 backup1 kernel:  ? finish_wait+0x80/0x80
Jul 31 16:31:40 backup1 kernel:  kthread+0x112/0x130
Jul 31 16:31:40 backup1 kernel:  ? kthread_park+0x80/0x80
Jul 31 16:31:40 backup1 kernel:  ret_from_fork+0x22/0x40
Jul 31 16:31:40 backup1 kernel: btrfs-transacti D    0 10200      2 0x80004000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  io_schedule+0x12/0x40
Jul 31 16:31:40 backup1 kernel:  wait_on_page_bit+0x15c/0x230
Jul 31 16:31:40 backup1 kernel:  ? file_fdatawait_range+0x20/0x20
Jul 31 16:31:40 backup1 kernel:  __filemap_fdatawait_range+0x89/0xf0
Jul 31 16:31:40 backup1 kernel:  ? __clear_extent_bit+0x2bd/0x440 [btrfs]
Jul 31 16:31:40 backup1 kernel:  filemap_fdatawait_range+0xe/0x20
Jul 31 16:31:40 backup1 kernel:  __btrfs_wait_marked_extents.isra.20+0xc2/0x100 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_write_and_wait_transaction.isra.24+0x67/0xd0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_commit_transaction+0x754/0xa40 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? start_transaction+0xbc/0x4d0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  transaction_kthread+0x144/0x170 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? btrfs_cleanup_transaction+0x5e0/0x5e0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  kthread+0x112/0x130
Jul 31 16:31:40 backup1 kernel:  ? kthread_park+0x80/0x80
Jul 31 16:31:40 backup1 kernel:  ret_from_fork+0x22/0x40
Jul 31 16:31:40 backup1 kernel: btrfs           D    0 30038  30032 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  wait_for_commit+0x58/0x80 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? finish_wait+0x80/0x80
Jul 31 16:31:40 backup1 kernel:  btrfs_commit_transaction+0x169/0xa40 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? btrfs_record_root_in_trans+0x56/0x60 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? start_transaction+0xbc/0x4d0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_mksubvol+0x4f0/0x530 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_create_transid+0x170/0x180 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x11c2/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fe919fb4427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffe760eac98 EFLAGS:
00000202 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX:
0000000000000000 RCX: 00007fe919fb4427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffe760eacd8 RSI:
0000000050009417 RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000559cb3d14260 R08:
0000000000000008 R09: 35315f31332d3730
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11:
0000000000000202 R12: 0000559cb3d142a0
Jul 31 16:31:40 backup1 kernel: R13: 0000559cb3d142a0 R14:
0000000000000003 R15: 0000000000000004
Jul 31 16:31:40 backup1 kernel: btrfs           D    0 30247  30241 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  ? prep_new_page+0x90/0x140
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fd2159c9427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fff421698e8 EFLAGS:
00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX:
0000000000000000 RCX: 00007fd2159c9427
Jul 31 16:31:40 backup1 kernel: RDX: 00007fff42169920 RSI:
000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08:
00005631102c22c8 R09: 00007fd215a53e80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11:
0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14:
00007fff42169920 R15: 00005631102c22c8
Jul 31 16:31:40 backup1 kernel: btrfs           D    0 30586  30580 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f4e6e35a427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffb6891ab8 EFLAGS:
00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX:
0000000000000000 RCX: 00007f4e6e35a427
Jul 31 16:31:40 backup1 kernel: RDX: 00007fffb6891af0 RSI:
000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08:
000055a899f7f2ca R09: 00007f4e6e3e4e80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11:
0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14:
00007fffb6891af0 R15: 000055a899f7f2ca
Jul 31 16:31:40 backup1 kernel: btrfs           D    0 30613  30607 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f527437f427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffe0dcd07d8 EFLAGS:
00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX:
0000000000000000 RCX: 00007f527437f427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffe0dcd0810 RSI:
000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08:
000055bf05f592c6 R09: 00007f5274409e80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11:
0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14:
00007ffe0dcd0810 R15: 000055bf05f592c6
Jul 31 16:31:40 backup1 kernel: btrfs           D    0 31443  31437 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f6e74eb1427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffe270079f8 EFLAGS:
00000202 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX:
0000000000000000 RCX: 00007f6e74eb1427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffe27007a30 RSI:
000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08:
000056037d0422ca R09: 00007f6e74f3be80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11:
0000000000000202 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14:
00007ffe27007a30 R15: 000056037d0422ca
Jul 31 16:31:40 backup1 kernel: btrfs           D    0 31650  31644 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fb16f9ae427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffd266be828 EFLAGS:
00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX:
0000000000000000 RCX: 00007fb16f9ae427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffd266be860 RSI:
000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08:
0000563ccec6d2ca R09: 00007fb16fa38e80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11:
0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14:
00007ffd266be860 R15: 0000563ccec6d2ca
Jul 31 16:31:40 backup1 kernel: btrfs           D    0 31813  31807 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f09dee75427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffeed6bea48 EFLAGS:
00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX:
0000000000000000 RCX: 00007f09dee75427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffeed6bea80 RSI:
000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08:
000055ebcdb522c8 R09: 00007f09deeffe80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11:
0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14:
00007ffeed6bea80 R15: 000055ebcdb522c8
Jul 31 16:31:40 backup1 kernel: btrfs           D    0   417    411 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f614eb62427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffcb2850518 EFLAGS:
00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX:
0000000000000000 RCX: 00007f614eb62427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffcb2850550 RSI:
000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08:
000055f42641e2ca R09: 00007f614ebece80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11:
0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14:
00007ffcb2850550 R15: 000055f42641e2ca
Jul 31 16:31:40 backup1 kernel: btrfs           D    0   573    567 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f40626c0427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffa93d57b8 EFLAGS:
00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX:
0000000000000000 RCX: 00007f40626c0427
Jul 31 16:31:40 backup1 kernel: RDX: 00007fffa93d57f0 RSI:
000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08:
000055bba6bae2c8 R09: 00007f406274ae80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11:
0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14:
00007fffa93d57f0 R15: 000055bba6bae2c8
Jul 31 16:31:40 backup1 kernel: btrfs           D    0   811    805 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f434d293427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffb46312c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f434d293427
Jul 31 16:31:40 backup1 kernel: RDX: 00007fffb4631300 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000056166b0872c8 R09: 00007f434d31de80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fffb4631300 R15: 000056166b0872c8
Jul 31 16:31:40 backup1 kernel: btrfs           D    0  2200   2194 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fd0c0e04427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffcdcda5d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd0c0e04427
Jul 31 16:31:40 backup1 kernel: RDX: 00007fffcdcda610 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 0000559a611ce2c7 R09: 00007fd0c0e8ee80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fffcdcda610 R15: 0000559a611ce2c7
Jul 31 16:31:40 backup1 kernel: btrfs           D    0  2699   2693 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f9d1010e427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffd08c48a48 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9d1010e427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffd08c48a80 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 00005639550432dd R09: 00007f9d10198e80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffd08c48a80 R15: 00005639550432dd
Jul 31 16:31:40 backup1 kernel: btrfs           D    0  2807   2801 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f0c33875427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffeb2ed3ac8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0c33875427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffeb2ed3b00 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 0000558a0ba742de R09: 00007f0c338ffe80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffeb2ed3b00 R15: 0000558a0ba742de
Jul 31 16:32:01 backup1 CRON[5113]: pam_unix(cron:session): session opened for user root by (uid=0)
Jul 31 16:32:01 backup1 CRON[5114]: (root) CMD (/opt/scripts/unstuck.sh)
Jul 31 16:32:01 backup1 CRON[5113]: pam_unix(cron:session): session closed for user root
Jul 31 16:32:03 backup1 sudo[5137]:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_btrfs --allocated-critical-percent 90 --allocated-warning-percent 75 --mountpoint /mnt/data
Jul 31 16:32:03 backup1 sudo[5137]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jul 31 16:32:03 backup1 sudo[5137]: pam_unix(sudo:session): session closed for user root
Jul 31 16:32:09 backup1 sudo[5157]:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_ipmi_sensor
Jul 31 16:32:09 backup1 sudo[5157]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jul 31 16:32:10 backup1 sudo[5157]: pam_unix(sudo:session): session closed for user root


One difficulty is that this is the upstream list, and kernel 5.5 is
end of life. Ordinarily, older kernels are the distribution's
responsibility; upstream development is mostly interested in current
stable kernels at the oldest, and in mainline development kernels.
But hopefully someone will still have an idea what's going on, and
whether it's a bug or possibly a misconfiguration.


--
Chris Murphy

[-- Attachment #2: w1_trace1.txt --]
[-- Type: text/plain, Size: 27172 bytes --]

Jul 31 16:31:40 backup1 kernel: sysrq: Show Blocked State
Jul 31 16:31:40 backup1 kernel:   task                        PC stack   pid father
Jul 31 16:31:40 backup1 kernel: md1_reclaim     D    0   910      2 0x80004000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  ? __switch_to_asm+0x34/0x70
Jul 31 16:31:40 backup1 kernel:  ? __switch_to_asm+0x34/0x70
Jul 31 16:31:40 backup1 kernel:  ? wbt_exit+0x30/0x30
Jul 31 16:31:40 backup1 kernel:  ? __wbt_done+0x30/0x30
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  io_schedule+0x12/0x40
Jul 31 16:31:40 backup1 kernel:  rq_qos_wait+0xfa/0x170
Jul 31 16:31:40 backup1 kernel:  ? __switch_to_asm+0x34/0x70
Jul 31 16:31:40 backup1 kernel:  ? karma_partition+0x210/0x210
Jul 31 16:31:40 backup1 kernel:  ? wbt_exit+0x30/0x30
Jul 31 16:31:40 backup1 kernel:  wbt_wait+0x99/0xe0
Jul 31 16:31:40 backup1 kernel:  __rq_qos_throttle+0x23/0x30
Jul 31 16:31:40 backup1 kernel:  blk_mq_make_request+0x12a/0x5d0
Jul 31 16:31:40 backup1 kernel:  generic_make_request+0xcf/0x310
Jul 31 16:31:40 backup1 kernel:  submit_bio+0x42/0x1c0
Jul 31 16:31:40 backup1 kernel:  ? md_super_write.part.70+0x98/0x120 [md_mod]
Jul 31 16:31:40 backup1 kernel:  md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
Jul 31 16:31:40 backup1 kernel:  r5l_do_reclaim+0x32a/0x3b0 [raid456]
Jul 31 16:31:40 backup1 kernel:  ? schedule_timeout+0x161/0x310
Jul 31 16:31:40 backup1 kernel:  ? prepare_to_wait_event+0xb6/0x140
Jul 31 16:31:40 backup1 kernel:  ? md_register_thread+0xd0/0xd0 [md_mod]
Jul 31 16:31:40 backup1 kernel:  md_thread+0x94/0x150 [md_mod]
Jul 31 16:31:40 backup1 kernel:  ? finish_wait+0x80/0x80
Jul 31 16:31:40 backup1 kernel:  kthread+0x112/0x130
Jul 31 16:31:40 backup1 kernel:  ? kthread_park+0x80/0x80
Jul 31 16:31:40 backup1 kernel:  ret_from_fork+0x22/0x40
Jul 31 16:31:40 backup1 kernel: btrfs-transacti D    0 10200      2 0x80004000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  io_schedule+0x12/0x40
Jul 31 16:31:40 backup1 kernel:  wait_on_page_bit+0x15c/0x230
Jul 31 16:31:40 backup1 kernel:  ? file_fdatawait_range+0x20/0x20
Jul 31 16:31:40 backup1 kernel:  __filemap_fdatawait_range+0x89/0xf0
Jul 31 16:31:40 backup1 kernel:  ? __clear_extent_bit+0x2bd/0x440 [btrfs]
Jul 31 16:31:40 backup1 kernel:  filemap_fdatawait_range+0xe/0x20
Jul 31 16:31:40 backup1 kernel:  __btrfs_wait_marked_extents.isra.20+0xc2/0x100 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_write_and_wait_transaction.isra.24+0x67/0xd0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_commit_transaction+0x754/0xa40 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? start_transaction+0xbc/0x4d0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  transaction_kthread+0x144/0x170 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? btrfs_cleanup_transaction+0x5e0/0x5e0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  kthread+0x112/0x130
Jul 31 16:31:40 backup1 kernel:  ? kthread_park+0x80/0x80
Jul 31 16:31:40 backup1 kernel:  ret_from_fork+0x22/0x40
Jul 31 16:31:40 backup1 kernel: btrfs           D    0 30038  30032 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  wait_for_commit+0x58/0x80 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? finish_wait+0x80/0x80
Jul 31 16:31:40 backup1 kernel:  btrfs_commit_transaction+0x169/0xa40 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? btrfs_record_root_in_trans+0x56/0x60 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? start_transaction+0xbc/0x4d0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_mksubvol+0x4f0/0x530 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_create_transid+0x170/0x180 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x11c2/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fe919fb4427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffe760eac98 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe919fb4427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffe760eacd8 RSI: 0000000050009417 RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000559cb3d14260 R08: 0000000000000008 R09: 35315f31332d3730
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000559cb3d142a0
Jul 31 16:31:40 backup1 kernel: R13: 0000559cb3d142a0 R14: 0000000000000003 R15: 0000000000000004
Jul 31 16:31:40 backup1 kernel: btrfs           D    0 30247  30241 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  ? prep_new_page+0x90/0x140
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fd2159c9427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fff421698e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd2159c9427
Jul 31 16:31:40 backup1 kernel: RDX: 00007fff42169920 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 00005631102c22c8 R09: 00007fd215a53e80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fff42169920 R15: 00005631102c22c8
Jul 31 16:31:40 backup1 kernel: btrfs           D    0 30586  30580 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f4e6e35a427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffb6891ab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4e6e35a427
Jul 31 16:31:40 backup1 kernel: RDX: 00007fffb6891af0 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055a899f7f2ca R09: 00007f4e6e3e4e80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fffb6891af0 R15: 000055a899f7f2ca
Jul 31 16:31:40 backup1 kernel: btrfs           D    0 30613  30607 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f527437f427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffe0dcd07d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f527437f427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffe0dcd0810 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055bf05f592c6 R09: 00007f5274409e80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffe0dcd0810 R15: 000055bf05f592c6
Jul 31 16:31:40 backup1 kernel: btrfs           D    0 31443  31437 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f6e74eb1427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffe270079f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6e74eb1427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffe27007a30 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000056037d0422ca R09: 00007f6e74f3be80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffe27007a30 R15: 000056037d0422ca
Jul 31 16:31:40 backup1 kernel: btrfs           D    0 31650  31644 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fb16f9ae427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffd266be828 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb16f9ae427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffd266be860 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 0000563ccec6d2ca R09: 00007fb16fa38e80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffd266be860 R15: 0000563ccec6d2ca
Jul 31 16:31:40 backup1 kernel: btrfs           D    0 31813  31807 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f09dee75427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffeed6bea48 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f09dee75427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffeed6bea80 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055ebcdb522c8 R09: 00007f09deeffe80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffeed6bea80 R15: 000055ebcdb522c8
Jul 31 16:31:40 backup1 kernel: btrfs           D    0   417    411 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f614eb62427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffcb2850518 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f614eb62427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffcb2850550 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055f42641e2ca R09: 00007f614ebece80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffcb2850550 R15: 000055f42641e2ca
Jul 31 16:31:40 backup1 kernel: btrfs           D    0   573    567 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f40626c0427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffa93d57b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f40626c0427
Jul 31 16:31:40 backup1 kernel: RDX: 00007fffa93d57f0 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055bba6bae2c8 R09: 00007f406274ae80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fffa93d57f0 R15: 000055bba6bae2c8
Jul 31 16:31:40 backup1 kernel: btrfs           D    0   811    805 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f434d293427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffb46312c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f434d293427
Jul 31 16:31:40 backup1 kernel: RDX: 00007fffb4631300 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000056166b0872c8 R09: 00007f434d31de80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fffb4631300 R15: 000056166b0872c8
Jul 31 16:31:40 backup1 kernel: btrfs           D    0  2200   2194 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fd0c0e04427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffcdcda5d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd0c0e04427
Jul 31 16:31:40 backup1 kernel: RDX: 00007fffcdcda610 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 0000559a611ce2c7 R09: 00007fd0c0e8ee80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fffcdcda610 R15: 0000559a611ce2c7
Jul 31 16:31:40 backup1 kernel: btrfs           D    0  2699   2693 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f9d1010e427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffd08c48a48 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9d1010e427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffd08c48a80 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 00005639550432dd R09: 00007f9d10198e80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffd08c48a80 R15: 00005639550432dd
Jul 31 16:31:40 backup1 kernel: btrfs           D    0  2807   2801 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel:  ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel:  schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel:  rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel:  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel:  btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel:  ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel:  ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel:  ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel:  __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel:  do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f0c33875427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffeb2ed3ac8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0c33875427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffeb2ed3b00 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 0000558a0ba742de R09: 00007f0c338ffe80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffeb2ed3b00 R15: 0000558a0ba742de
Jul 31 16:32:01 backup1 CRON[5113]: pam_unix(cron:session): session opened for user root by (uid=0)
Jul 31 16:32:01 backup1 CRON[5114]: (root) CMD (/opt/scripts/unstuck.sh)
Jul 31 16:32:01 backup1 CRON[5113]: pam_unix(cron:session): session closed for user root
Jul 31 16:32:03 backup1 sudo[5137]:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_btrfs --allocated-critical-percent 90 --allocated-warning-percent 75 --mountpoint /mnt/data
Jul 31 16:32:03 backup1 sudo[5137]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jul 31 16:32:03 backup1 sudo[5137]: pam_unix(sudo:session): session closed for user root
Jul 31 16:32:09 backup1 sudo[5157]:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_ipmi_sensor
Jul 31 16:32:09 backup1 sudo[5157]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jul 31 16:32:10 backup1 sudo[5157]: pam_unix(sudo:session): session closed for user root


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
       [not found]           ` <2f2f1c21-c81b-55aa-6f77-e2d3f32d32cb@xmyslivec.cz>
@ 2020-08-19 22:58             ` Chris Murphy
  2020-08-26 15:35               ` Vojtech Myslivec
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-08-19 22:58 UTC (permalink / raw)
  To: Vojtech Myslivec
  Cc: Chris Murphy, Michal Moravec, Btrfs BTRFS, Linux-RAID, Song Liu

On Wed, Aug 19, 2020 at 11:29 AM Vojtech Myslivec <vojtech@xmyslivec.cz> wrote:
>
> Linux backup1 5.7.0-0.bpo.2-amd64 #1 SMP Debian 5.7.10-1~bpo10+1

Should be new enough; I don't see raid-related md changes between
5.7.10 and 5.7.16. I haven't looked at 5.8, but 5.7 is probably recent
enough to tell whether there are relevant changes in 5.8 worth
testing.
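
One way to double-check is to list the stable history that touches the
md/raid code between the two releases, e.g. (just a sketch, assuming a
local clone of the linux-stable tree with the v5.7.x tags):

    # commits touching md/raid code between the two stable releases
    git log --oneline v5.7.10..v5.7.16 -- drivers/md/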

>
> - `5.7_profs.txt`
>   - dump of the whole /proc when the issue happened

The problem here I think is that /proc/pid/stack is empty. You might
have to hammer on it a bunch of times to get a stack. I can't tell if
the sysrq+w is enough information to conclusively tell if this is
strictly an md problem or if there's something else going on.

But I do see in the sysrq+w evidence of a Btrfs snapshot happening,
which will result in a flush of the file system. The mdadm raid
journal is on two SSDs, though, which should be fast enough to accept
the metadata changes before actually doing the flush.
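
For reference, the journal placement can be confirmed from the array
details, e.g. (a sketch; /dev/md1 is assumed from the md1_* thread
names in the traces):

    # journal device(s) show up with a "journal" role in the output
    mdadm --detail /dev/md1
    cat /proc/mdstat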


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-08-19 22:58             ` Chris Murphy
@ 2020-08-26 15:35               ` Vojtech Myslivec
  2020-08-26 18:07                 ` Chris Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Vojtech Myslivec @ 2020-08-26 15:35 UTC (permalink / raw)
  To: Chris Murphy, Song Liu; +Cc: Michal Moravec, Btrfs BTRFS, Linux-RAID

[-- Attachment #1: Type: text/plain, Size: 2898 bytes --]

Hi again,

On 20. 08. 20 0:58, Chris Murphy wrote:
> The problem here I think is that /proc/pid/stack is empty. You might
> have to hammer on it a bunch of times to get a stack. I can't tell if
> the sysrq+w is enough information to conclusively tell if this is
> strictly an md problem or if there's something else going on.
> 
> But I do see in the sysrq+w evidence of a Btrfs snapshot happening,
> which will result in a flush of the file system. The mdadm raid
> journal is on two SSDs, though, which should be fast enough to accept
> the metadata changes before actually doing the flush.

We were lucky to dump some stacks yesterday, when the issue happened again.


In `sysrq.log`, there is the dmesg/kern.log output of the

    echo w > /proc/sysrq-trigger

command, captured when `md1_raid6` started to consume 100% CPU and all
btrfs-snapshot-related commands got stuck in the *disk sleep* state.

We issued the sysrq+w command several times and I have included all
the unique tasks that appeared there.
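
A minimal sketch of capturing such a dump (the sysctl line is only
needed if sysrq is not already enabled):

    # allow sysrq, dump blocked tasks, then save the kernel log
    sysctl -w kernel.sysrq=1
    echo w > /proc/sysrq-trigger
    dmesg -T > sysrq.log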


In `stack_md1_reclaim.txt`, `stack_btrfs-transacti.txt` and
`stack_btrfs*.txt`, there are outputs of

    cat /proc/<pid>/stack

of the given processes during the same time. The output stayed the
same when we issued this command several times.


In `stack_md1_raid6-[0-3].txt`, there are outputs of the same command
where pid is the `md1_raid6` process id. We issued it several times and
the output sometimes differs a bit. If I get it right, it differs only
in the instruction offsets of the given functions. I include all the
combinations we encountered in case it matters.

We have dumped the stack of this process in a while-true loop just
after we performed the manual "unstuck" workaround (I described this
action in the first e-mail). I can send it to the mailing list as well
if needed.
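
A rough sketch of such a sampling loop (the exact script may differ;
$PID is assumed to hold the `md1_raid6` pid):

    # sample the kernel stack of the stuck thread once per second
    while true; do
        date
        cat /proc/$PID/stack
        sleep 1
    done >> stack_md1_raid6.log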



I hope these outputs include what you, Song, requested on July 30th
(and I hope it's OK to continue in this thread).

On 30. 07. 20 8:45, Song Liu wrote:
>> On Jul 29, 2020, at 2:06 PM, Guoqing Jiang wrote:
>>
>> Hi,
>>
>> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
>>> 1. What should be the cause of this problem?
>>
>> Just a quick glance based on the stacks which you attached, I guess
>> it could be a deadlock issue of raid5 cache super write.
>>
>> Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in
>> superblock write") didn't fix the problem completely.  Cc Song.
>>
>> And I am curious why md thread is not waked if mddev_trylock fails,
>> you can give it a try but I can't promise it helps ...
>
> Thanks Guoqing!
>
> I am not sure whether we hit the mddev_trylock() failure. Looks like
> the md1_raid6 thread is already running at 100%.
>
> A few questions:
>
> 1. I see wbt_wait in the stack trace. Are we using write back
>    throttling here?
> 2. Could you please get the /proc/<pid>/stack for <pid> of md1_raid6?
>    We may want to sample it multiple times.
>
> Thanks,
> Song
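
Regarding the wbt_wait question above: write-back throttling can be
inspected (and disabled for a test) per member device via sysfs, e.g.
(a sketch; sdX stands for one of the array's member disks):

    # current latency target in usec; 0 means wbt is disabled
    cat /sys/block/sdX/queue/wbt_lat_usec
    # disable wbt for a test; echoing -1 restores the default
    echo 0 > /sys/block/sdX/queue/wbt_lat_usec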


Thanks,
Vojtech and Michal

[-- Attachment #2: stack_md1_raid6-0.txt --]
[-- Type: text/plain, Size: 266 bytes --]

[<0>] ops_run_io+0x40/0xd80 [raid456]
[<0>] handle_stripe+0xc32/0x2230 [raid456]
[<0>] handle_active_stripes.isra.71+0x3b8/0x590 [raid456]
[<0>] raid5d+0x392/0x5b0 [raid456]
[<0>] md_thread+0x94/0x150 [md_mod]
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x22/0x40

[-- Attachment #3: stack_md1_raid6-1.txt --]
[-- Type: text/plain, Size: 266 bytes --]

[<0>] ops_run_io+0x40/0xd80 [raid456]
[<0>] handle_stripe+0x136/0x2230 [raid456]
[<0>] handle_active_stripes.isra.71+0x3b8/0x590 [raid456]
[<0>] raid5d+0x392/0x5b0 [raid456]
[<0>] md_thread+0x94/0x150 [md_mod]
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x22/0x40

[-- Attachment #4: stack_md1_raid6-2.txt --]
[-- Type: text/plain, Size: 266 bytes --]

[<0>] ops_run_io+0x40/0xd80 [raid456]
[<0>] handle_stripe+0xc32/0x2230 [raid456]
[<0>] handle_active_stripes.isra.71+0x485/0x590 [raid456]
[<0>] raid5d+0x392/0x5b0 [raid456]
[<0>] md_thread+0x94/0x150 [md_mod]
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x22/0x40

[-- Attachment #5: stack_md1_raid6-3.txt --]
[-- Type: text/plain, Size: 266 bytes --]

[<0>] ops_run_io+0x40/0xd80 [raid456]
[<0>] handle_stripe+0x136/0x2230 [raid456]
[<0>] handle_active_stripes.isra.71+0x420/0x590 [raid456]
[<0>] raid5d+0x392/0x5b0 [raid456]
[<0>] md_thread+0x94/0x150 [md_mod]
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x22/0x40

[-- Attachment #6: stack_md1_reclaim.txt --]
[-- Type: text/plain, Size: 375 bytes --]

[<0>] rq_qos_wait+0xfa/0x170
[<0>] wbt_wait+0x98/0xe0
[<0>] __rq_qos_throttle+0x23/0x30
[<0>] blk_mq_make_request+0x12a/0x5d0
[<0>] generic_make_request+0xcf/0x310
[<0>] submit_bio+0x42/0x1c0
[<0>] md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
[<0>] r5l_do_reclaim+0x32a/0x3b0 [raid456]
[<0>] md_thread+0x94/0x150 [md_mod]
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x22/0x40

[-- Attachment #7: stack_btrfs-transacti.txt --]
[-- Type: text/plain, Size: 395 bytes --]

[<0>] wait_on_page_bit+0x15c/0x230
[<0>] __filemap_fdatawait_range+0x89/0xf0
[<0>] filemap_fdatawait_range+0xe/0x20
[<0>] __btrfs_wait_marked_extents.isra.19+0xc2/0x100 [btrfs]
[<0>] btrfs_write_and_wait_transaction.isra.23+0x67/0xd0 [btrfs]
[<0>] btrfs_commit_transaction+0x755/0xa50 [btrfs]
[<0>] transaction_kthread+0x144/0x170 [btrfs]
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x22/0x40

[-- Attachment #8: stack_btrfs32445.txt --]
[-- Type: text/plain, Size: 165 bytes --]

[<0>] do_wait+0x1c7/0x230
[<0>] kernel_wait4+0xa6/0x140
[<0>] __do_sys_wait4+0x83/0x90
[<0>] do_syscall_64+0x52/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[-- Attachment #9: stack_btrfs32277.txt --]
[-- Type: text/plain, Size: 319 bytes --]

[<0>] rwsem_down_write_slowpath+0x329/0x4e0
[<0>] btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[<0>] btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[<0>] btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[<0>] ksys_ioctl+0x86/0xc0
[<0>] __x64_sys_ioctl+0x16/0x20
[<0>] do_syscall_64+0x52/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[-- Attachment #10: stack_btrfs31357.txt --]
[-- Type: text/plain, Size: 319 bytes --]

[<0>] rwsem_down_write_slowpath+0x329/0x4e0
[<0>] btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[<0>] btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[<0>] btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[<0>] ksys_ioctl+0x86/0xc0
[<0>] __x64_sys_ioctl+0x16/0x20
[<0>] do_syscall_64+0x52/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[-- Attachment #11: stack_btrfs31094.txt --]
[-- Type: text/plain, Size: 319 bytes --]

[<0>] rwsem_down_write_slowpath+0x329/0x4e0
[<0>] btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[<0>] btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[<0>] btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[<0>] ksys_ioctl+0x86/0xc0
[<0>] __x64_sys_ioctl+0x16/0x20
[<0>] do_syscall_64+0x52/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[-- Attachment #12: stack_btrfs30168.txt --]
[-- Type: text/plain, Size: 411 bytes --]

[<0>] wait_for_commit+0x58/0x80 [btrfs]
[<0>] btrfs_commit_transaction+0x169/0xa50 [btrfs]
[<0>] btrfs_mksubvol+0x2e9/0x450 [btrfs]
[<0>] __btrfs_ioctl_snap_create+0x167/0x170 [btrfs]
[<0>] btrfs_ioctl_snap_create_v2+0xc5/0xe0 [btrfs]
[<0>] btrfs_ioctl+0xf61/0x2ef0 [btrfs]
[<0>] ksys_ioctl+0x86/0xc0
[<0>] __x64_sys_ioctl+0x16/0x20
[<0>] do_syscall_64+0x52/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[-- Attachment #13: sysrq.log --]
[-- Type: text/x-log, Size: 20285 bytes --]

[Tue Aug 25 16:41:16 2020] sysrq: Show Blocked State
[Tue Aug 25 16:41:16 2020]   task                        PC stack   pid father

[Tue Aug 25 16:41:16 2020] md1_reclaim     D    0   806      2 0x80004000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  ? __switch_to_asm+0x34/0x70
[Tue Aug 25 16:41:16 2020]  ? __switch_to_asm+0x34/0x70
[Tue Aug 25 16:41:16 2020]  ? wbt_exit+0x30/0x30
[Tue Aug 25 16:41:16 2020]  ? __wbt_done+0x30/0x30
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  io_schedule+0x12/0x40
[Tue Aug 25 16:41:16 2020]  rq_qos_wait+0xfa/0x170
[Tue Aug 25 16:41:16 2020]  ? __switch_to_asm+0x34/0x70
[Tue Aug 25 16:41:16 2020]  ? karma_partition+0x1e0/0x1e0
[Tue Aug 25 16:41:16 2020]  ? wbt_exit+0x30/0x30
[Tue Aug 25 16:41:16 2020]  wbt_wait+0x98/0xe0
[Tue Aug 25 16:41:16 2020]  __rq_qos_throttle+0x23/0x30
[Tue Aug 25 16:41:16 2020]  blk_mq_make_request+0x12a/0x5d0
[Tue Aug 25 16:41:16 2020]  generic_make_request+0xcf/0x310
[Tue Aug 25 16:41:16 2020]  submit_bio+0x42/0x1c0
[Tue Aug 25 16:41:16 2020]  ? md_super_write.part.70+0x98/0x120 [md_mod]
[Tue Aug 25 16:41:16 2020]  md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
[Tue Aug 25 16:41:16 2020]  r5l_do_reclaim+0x32a/0x3b0 [raid456]
[Tue Aug 25 16:41:16 2020]  ? schedule_timeout+0x162/0x340
[Tue Aug 25 16:41:16 2020]  ? prepare_to_wait_event+0xb6/0x140
[Tue Aug 25 16:41:16 2020]  ? md_register_thread+0xd0/0xd0 [md_mod]
[Tue Aug 25 16:41:16 2020]  md_thread+0x94/0x150 [md_mod]
[Tue Aug 25 16:41:16 2020]  ? finish_wait+0x80/0x80
[Tue Aug 25 16:41:16 2020]  kthread+0x112/0x130
[Tue Aug 25 16:41:16 2020]  ? kthread_park+0x80/0x80
[Tue Aug 25 16:41:16 2020]  ret_from_fork+0x22/0x40

[Tue Aug 25 16:41:16 2020] btrfs-transacti D    0   893      2 0x80004000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  io_schedule+0x12/0x40
[Tue Aug 25 16:41:16 2020]  wait_on_page_bit+0x15c/0x230
[Tue Aug 25 16:41:16 2020]  ? file_fdatawait_range+0x20/0x20
[Tue Aug 25 16:41:16 2020]  __filemap_fdatawait_range+0x89/0xf0
[Tue Aug 25 16:41:16 2020]  filemap_fdatawait_range+0xe/0x20
[Tue Aug 25 16:41:16 2020]  __btrfs_wait_marked_extents.isra.19+0xc2/0x100 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_write_and_wait_transaction.isra.23+0x67/0xd0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_commit_transaction+0x755/0xa50 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? start_transaction+0xd5/0x540 [btrfs]
[Tue Aug 25 16:41:16 2020]  transaction_kthread+0x144/0x170 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? btrfs_cleanup_transaction+0x590/0x590 [btrfs]
[Tue Aug 25 16:41:16 2020]  kthread+0x112/0x130
[Tue Aug 25 16:41:16 2020]  ? kthread_park+0x80/0x80
[Tue Aug 25 16:41:16 2020]  ret_from_fork+0x22/0x40

[Tue Aug 25 16:41:16 2020] btrfs           D    0 30168  30162 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  wait_for_commit+0x58/0x80 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? finish_wait+0x80/0x80
[Tue Aug 25 16:41:16 2020]  btrfs_commit_transaction+0x169/0xa50 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? btrfs_record_root_in_trans+0x56/0x60 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? start_transaction+0xd5/0x540 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_mksubvol+0x2e9/0x450 [btrfs]
[Tue Aug 25 16:41:16 2020]  __btrfs_ioctl_snap_create+0x167/0x170 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_create_v2+0xc5/0xe0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xf61/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f273259b427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffe48178418 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f273259b427
[Tue Aug 25 16:41:16 2020] RDX: 00007ffe48178458 RSI: 0000000050009417 RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 00005637741dc260 R08: 0000000000000008 R09: 0000000035322d38
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000206 R12: 00005637741dc2b0
[Tue Aug 25 16:41:16 2020] R13: 00005637741dc2b0 R14: 0000000000000003 R15: 0000000000000004

[Tue Aug 25 16:41:16 2020] btrfs           D    0 31094  31088 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7ff3fa6a3427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffe92fc4058 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff3fa6a3427
[Tue Aug 25 16:41:16 2020] RDX: 00007ffe92fc4090 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 000055b4e34ab2ca R09: 00007ff3fa72de80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007ffe92fc4090 R15: 000055b4e34ab2ca

[Tue Aug 25 16:41:16 2020] btrfs           D    0 31357  31351 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f9c14290427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007fff4c9f8ed8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9c14290427
[Tue Aug 25 16:41:16 2020] RDX: 00007fff4c9f8f10 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 000055ea83aa82c8 R09: 00007f9c1431ae80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007fff4c9f8f10 R15: 000055ea83aa82c8

[Tue Aug 25 16:41:16 2020] btrfs           D    0 32277  32271 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f0b35736427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007fff96ad7c48 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0b35736427
[Tue Aug 25 16:41:16 2020] RDX: 00007fff96ad7c80 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 000055e2598b62c8 R09: 00007f0b357c0e80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007fff96ad7c80 R15: 000055e2598b62c8

[Tue Aug 25 16:41:16 2020] btrfs           D    0 32451  32445 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f0517ff8427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffc862d3c98 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0517ff8427
[Tue Aug 25 16:41:16 2020] RDX: 00007ffc862d3cd0 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 0000555d6d7872ca R09: 00007f0518082e80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007ffc862d3cd0 R15: 0000555d6d7872ca

[Tue Aug 25 16:41:16 2020] btrfs           D    0   596    590 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7ffa2b954427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffc88d95748 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffa2b954427
[Tue Aug 25 16:41:16 2020] RDX: 00007ffc88d95780 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 0000558d2b97f2c8 R09: 00007ffa2b9dee80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007ffc88d95780 R15: 0000558d2b97f2c8

[Tue Aug 25 16:41:16 2020] btrfs           D    0   792    786 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f74f06d2427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007fffbb878738 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f74f06d2427
[Tue Aug 25 16:41:16 2020] RDX: 00007fffbb878770 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 000055f062e442c7 R09: 00007f74f075ce80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007fffbb878770 R15: 000055f062e442c7

[Tue Aug 25 16:41:16 2020] btrfs           D    0   833    827 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f65fee13427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffc45aa0508 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f65fee13427
[Tue Aug 25 16:41:16 2020] RDX: 00007ffc45aa0540 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 000055d819de32c6 R09: 00007f65fee9de80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007ffc45aa0540 R15: 000055d819de32c6

[Tue Aug 25 16:41:16 2020] btrfs           D    0 10542  10515 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f48b0d4a427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffdf1ab4fd8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f48b0d4a427
[Tue Aug 25 16:41:16 2020] RDX: 00007ffdf1ab5010 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 00005631cce8a2c8 R09: 00007f48b0dd4e80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007ffdf1ab5010 R15: 00005631cce8a2c8

[Tue Aug 25 16:41:16 2020] btrfs           D    0  1667   1661 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f759eff1427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffcc4fb06c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f759eff1427
[Tue Aug 25 16:41:16 2020] RDX: 00007ffcc4fb0700 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 00005584f7c5a2ca R09: 00007f759f07be80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007ffcc4fb0700 R15: 00005584f7c5a2ca

[Tue Aug 25 16:41:16 2020] btrfs           D    0  2473   2467 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f278ea1a427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffd784d5318 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f278ea1a427
[Tue Aug 25 16:41:16 2020] RDX: 00007ffd784d5350 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 000055b8f4d412de R09: 00007f278eaa4e80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007ffd784d5350 R15: 000055b8f4d412de

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-08-26 15:35               ` Vojtech Myslivec
@ 2020-08-26 18:07                 ` Chris Murphy
  2020-09-16  9:42                   ` Vojtech Myslivec
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-08-26 18:07 UTC (permalink / raw)
  To: Vojtech Myslivec
  Cc: Chris Murphy, Song Liu, Michal Moravec, Btrfs BTRFS, Linux-RAID

OK so from the attachments..

cat /proc/<pid>/stack for md1_raid6

[<0>] rq_qos_wait+0xfa/0x170
[<0>] wbt_wait+0x98/0xe0
[<0>] __rq_qos_throttle+0x23/0x30
[<0>] blk_mq_make_request+0x12a/0x5d0
[<0>] generic_make_request+0xcf/0x310
[<0>] submit_bio+0x42/0x1c0
[<0>] md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
[<0>] r5l_do_reclaim+0x32a/0x3b0 [raid456]
[<0>] md_thread+0x94/0x150 [md_mod]
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x22/0x40


Btrfs snapshot flushing might instigate the problem but it seems to me
there's some kind of contention or blocking happening within md, and
that's why everything stalls. But I can't tell why.

Do you have any iostat output at the time of this problem? I'm
wondering if md is waiting on disks. If not, try `iostat -dxm 5` and
share a few minutes before and after the freeze/hang.
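
If it's hard to catch live, something along these lines could be left
running in a screen/tmux session so the minutes around the next hang get
captured with timestamps (assuming the sysstat package provides iostat;
the log path is just an example):

    iostat -dxm -t 5 >> /var/log/iostat-md1.log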


--
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-08-26 18:07                 ` Chris Murphy
@ 2020-09-16  9:42                   ` Vojtech Myslivec
  2020-09-17 17:08                     ` Chris Murphy
  2020-09-17 17:43                     ` Chris Murphy
  0 siblings, 2 replies; 22+ messages in thread
From: Vojtech Myslivec @ 2020-09-16  9:42 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Song Liu, Michal Moravec, Btrfs BTRFS, Linux-RAID

[-- Attachment #1: Type: text/plain, Size: 2012 bytes --]

Hello,

it seems my last e-mail was filtered as I can't find it in the archives.
So I will resend it and include all attachments in one tarball.


On 26. 08. 20 20:07, Chris Murphy wrote:
> OK so from the attachments..
>
> cat /proc/<pid>/stack for md1_raid6
>
> [<0>] rq_qos_wait+0xfa/0x170
> [<0>] wbt_wait+0x98/0xe0
> [<0>] __rq_qos_throttle+0x23/0x30
> [<0>] blk_mq_make_request+0x12a/0x5d0
> [<0>] generic_make_request+0xcf/0x310
> [<0>] submit_bio+0x42/0x1c0
> [<0>] md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
> [<0>] r5l_do_reclaim+0x32a/0x3b0 [raid456]
> [<0>] md_thread+0x94/0x150 [md_mod]
> [<0>] kthread+0x112/0x130
> [<0>] ret_from_fork+0x22/0x40
>
>
> Btrfs snapshot flushing might instigate the problem but it seems to me
> there's some kind of contention or blocking happening within md, and
> that's why everything stalls. But I can't tell why.
>
> Do you have any iostat output at the time of this problem? I'm
> wondering if md is waiting on disks. If not, try `iostat -dxm 5` and
> share a few minutes before and after the freeze/hang.
We detected the issue on Monday 31.08.2020 at 15:24. It must have happened
sometime between 15:22 and 15:24, as we monitor the state every 2 minutes.

We have recorded the stacks of the blocked processes, the sysrq+w output,
and the requested `iostat`. Then, at 15:45, we performed the manual
"unstuck" procedure by accessing the md1 device with a dd command (reading
a few random blocks).
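
For illustration, the "unstuck" command is something like the following,
with the exact block size, count and offset varying; we just read a bit
of the array directly:

    dd if=/dev/md1 of=/dev/null bs=1M count=64 skip=$((RANDOM % 1024))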

I hope the attached file names are self-explanatory.

Please let me know if we can do anything more to track the issue or if I
have forgotten something.

Thanks a lot,
Vojtech and Michal



Description of the devices in iostat, just for recap:
- sda-sdf: 6 HDD disks
- sdg, sdh: 2 SSD disks

- md0: raid1 over sdg1 and sdh1 ("SSD RAID", Physical Volume for LVM)
- md1: our "problematic" raid6 over sda-sdf ("HDD RAID", btrfs
       formatted)

- Logical volumes over md0 Physical Volume (on SSD RAID)
    - dm-0: 4G  LV for SWAP
    - dm-1: 16G LV for root file system (ext4 formatted)
    - dm-2: 1G  LV for md1 journal


[-- Attachment #2: mdraid-btrfs-issue.tgz --]
[-- Type: application/x-compressed-tar, Size: 43544 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-09-16  9:42                   ` Vojtech Myslivec
@ 2020-09-17 17:08                     ` Chris Murphy
  2020-09-17 17:20                       ` Chris Murphy
  2020-09-17 17:43                     ` Chris Murphy
  1 sibling, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-09-17 17:08 UTC (permalink / raw)
  To: Vojtech Myslivec
  Cc: Chris Murphy, Song Liu, Michal Moravec, Btrfs BTRFS, Linux-RAID

On Wed, Sep 16, 2020 at 3:42 AM Vojtech Myslivec <vojtech@xmyslivec.cz> wrote:
>
> Hello,
>
> it seems my last e-mail was filtered as I can't find it in the archives.
> So I will resend it and include all attachments in one tarball.
>
>
> On 26. 08. 20 20:07, Chris Murphy wrote:
> > OK so from the attachments..
> >
> > cat /proc/<pid>/stack for md1_raid6
> >
> > [<0>] rq_qos_wait+0xfa/0x170
> > [<0>] wbt_wait+0x98/0xe0
> > [<0>] __rq_qos_throttle+0x23/0x30
> > [<0>] blk_mq_make_request+0x12a/0x5d0
> > [<0>] generic_make_request+0xcf/0x310
> > [<0>] submit_bio+0x42/0x1c0
> > [<0>] md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
> > [<0>] r5l_do_reclaim+0x32a/0x3b0 [raid456]
> > [<0>] md_thread+0x94/0x150 [md_mod]
> > [<0>] kthread+0x112/0x130
> > [<0>] ret_from_fork+0x22/0x40
> >
> >
> > Btrfs snapshot flushing might instigate the problem but it seems to me
> > there's some kind of contention or blocking happening within md, and
> > that's why everything stalls. But I can't tell why.
> >
> > Do you have any iostat output at the time of this problem? I'm
> > wondering if md is waiting on disks. If not, try `iostat -dxm 5` and
> > share a few minutes before and after the freeze/hang.
> We detected the issue on Monday 31.08.2020 at 15:24. It must have happened
> sometime between 15:22 and 15:24, as we monitor the state every 2 minutes.
>
> We have recorded the stacks of the blocked processes, the sysrq+w output,
> and the requested `iostat`. Then, at 15:45, we performed the manual
> "unstuck" procedure by accessing the md1 device with a dd command (reading
> a few random blocks).
>
> I hope the attached file names are self-explanatory.
>
> Please let me know if we can do anything more to track the issue or if I
> have forgotten something.
>
> Thanks a lot,
> Vojtech and Michal
>
>
>
> Description of the devices in iostat, just for recap:
> - sda-sdf: 6 HDD disks
> - sdg, sdh: 2 SSD disks
>
> - md0: raid1 over sdg1 and sdh1 ("SSD RAID", Physical Volume for LVM)
> - md1: our "problematic" raid6 over sda-sdf ("HDD RAID", btrfs
>        formatted)
>
> - Logical volumes over md0 Physical Volume (on SSD RAID)
>     - dm-0: 4G  LV for SWAP
>     - dm-1: 16G LV for root file system (ext4 formatted)
>     - dm-2: 1G  LV for md1 journal
>

It's kind of a complicated setup. When this problem happens, can you
check swap pressure?

/sys/fs/cgroup/memory.stat

pgfault and maybe also pgmajfault - see if they're going up; or also
you can look at vmstat and see how heavily swap is being used at the
time. The thing is:

Because any heavy eviction means writes to dm-0->md0 raid1->sdg+sdh
SSDs, which are the same SSDs that you have the md1 raid6 mdadm
journal going to. So if you have any kind of swap pressure, it very
likely will stop the journal or at least substantially slow it down,
and now you get blocked tasks as the pressure builds more and more
because now you have a ton of dirty writes in Btrfs that can't make it
to disk.
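
A rough way to watch for that while a stall is in progress (using
/proc/vmstat as a system-wide fallback if the cgroup file isn't there):

    # counters since boot; sample twice and compare the deltas
    grep -E '^(pgfault|pgmajfault|pswpin|pswpout) ' /proc/vmstat
    # live view; the si/so columns are swap-in/swap-out
    vmstat 5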

If there is minimal swap usage, then this hypothesis is false and
something else is going on. I also don't have an explanation why your
work around works.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-09-17 17:08                     ` Chris Murphy
@ 2020-09-17 17:20                       ` Chris Murphy
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Murphy @ 2020-09-17 17:20 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Vojtech Myslivec, Song Liu, Michal Moravec, Btrfs BTRFS, Linux-RAID

The iostat isn't particularly revealing; I don't see especially high
%util for any device. SSD write MB/s gets up to 42, which is
reasonable.


--
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-09-16  9:42                   ` Vojtech Myslivec
  2020-09-17 17:08                     ` Chris Murphy
@ 2020-09-17 17:43                     ` Chris Murphy
  2020-09-23 18:14                       ` Vojtech Myslivec
  1 sibling, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-09-17 17:43 UTC (permalink / raw)
  To: Vojtech Myslivec
  Cc: Chris Murphy, Song Liu, Michal Moravec, Btrfs BTRFS, Linux-RAID

[Mon Aug 31 15:31:55 2020] sysrq: Show Blocked State
[Mon Aug 31 15:31:55 2020]   task                        PC stack   pid father

[Mon Aug 31 15:31:55 2020] md1_reclaim     D    0   806      2 0x80004000
[Mon Aug 31 15:31:55 2020] Call Trace:
[Mon Aug 31 15:31:55 2020]  __schedule+0x2dd/0x710
[Mon Aug 31 15:31:55 2020]  ? finish_wait+0x80/0x80
[Mon Aug 31 15:31:55 2020]  ? wbt_exit+0x30/0x30
[Mon Aug 31 15:31:55 2020]  ? __wbt_done+0x30/0x30
[Mon Aug 31 15:31:55 2020]  schedule+0x40/0xb0
[Mon Aug 31 15:31:55 2020]  io_schedule+0x12/0x40
[Mon Aug 31 15:31:55 2020]  rq_qos_wait+0xfa/0x170
[Mon Aug 31 15:31:55 2020]  ? karma_partition+0x1e0/0x1e0
[Mon Aug 31 15:31:55 2020]  ? wbt_exit+0x30/0x30
[Mon Aug 31 15:31:55 2020]  wbt_wait+0x98/0xe0
[Mon Aug 31 15:31:55 2020]  __rq_qos_throttle+0x23/0x30
[Mon Aug 31 15:31:55 2020]  blk_mq_make_request+0x12a/0x5d0
[Mon Aug 31 15:31:55 2020]  generic_make_request+0xcf/0x310
[Mon Aug 31 15:31:55 2020]  submit_bio+0x42/0x1c0
[Mon Aug 31 15:31:55 2020]  ? md_super_write.part.70+0x98/0x120 [md_mod]
[Mon Aug 31 15:31:55 2020]  md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
[Mon Aug 31 15:31:55 2020]  r5l_do_reclaim+0x32a/0x3b0 [raid456]
[Mon Aug 31 15:31:55 2020]  ? schedule_timeout+0x162/0x340
[Mon Aug 31 15:31:55 2020]  ? prepare_to_wait_event+0xb6/0x140
[Mon Aug 31 15:31:55 2020]  ? md_register_thread+0xd0/0xd0 [md_mod]
[Mon Aug 31 15:31:55 2020]  md_thread+0x94/0x150 [md_mod]
[Mon Aug 31 15:31:55 2020]  ? finish_wait+0x80/0x80
[Mon Aug 31 15:31:55 2020]  kthread+0x112/0x130
[Mon Aug 31 15:31:55 2020]  ? kthread_park+0x80/0x80
[Mon Aug 31 15:31:55 2020]  ret_from_fork+0x22/0x40


*shrug*

These SSDs should be able to handle > 500MB/s. And > 130K IOPS. Swap
would have to be pretty heavy to slow down journal writes.

I'm not sure I have any good advice. My remaining ideas involve
changing configuration just to see if the problem goes away, rather
than actually understanding the cause of the problem.

--
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-09-17 17:43                     ` Chris Murphy
@ 2020-09-23 18:14                       ` Vojtech Myslivec
  2021-02-11  3:14                         ` Manuel Riel
  0 siblings, 1 reply; 22+ messages in thread
From: Vojtech Myslivec @ 2020-09-23 18:14 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Michal Moravec, Btrfs BTRFS, Linux-RAID


On 17. 09. 20 19:08, Chris Murphy wrote:
>
> On Wed, Sep 16, 2020 at 3:42 AM Vojtech Myslivec wrote:
>>
>> Description of the devices in iostat, just for recap:
>> - sda-sdf: 6 HDD disks
>> - sdg, sdh: 2 SSD disks
>>
>> - md0: raid1 over sdg1 and sdh1 ("SSD RAID", Physical Volume for LVM)
>> - md1: our "problematic" raid6 over sda-sdf ("HDD RAID", btrfs
>>        formatted)
>>
>> - Logical volumes over md0 Physical Volume (on SSD RAID)
>>     - dm-0: 4G  LV for SWAP
>>     - dm-1: 16G LV for root file system (ext4 formatted)
>>     - dm-2: 1G  LV for md1 journal
>
> It's kind of a complicated setup. When this problem happens, can you
> check swap pressure?
>
> /sys/fs/cgroup/memory.stat
>
> pgfault and maybe also pgmajfault - see if they're going up; or also
> you can look at vmstat and see how heavily swap is being used at the
> time. The thing is:
>
> Because any heavy eviction means writes to dm-0->md0 raid1->sdg+sdh
> SSDs, which are the same SSDs that you have the md1 raid6 mdadm
> journal going to. So if you have any kind of swap pressure, it very
> likely will stop the journal or at least substantially slow it down,
> and now you get blocked tasks as the pressure builds more and more
> because now you have a ton of dirty writes in Btrfs that can't make it
> to disk.
>
> If there is minimal swap usage, then this hypothesis is false and
> something else is going on. I also don't have an explanation why your
> work around works.

On 17. 09. 20 19:20, Chris Murphy wrote:
> The iostat isn't particularly revealing; I don't see especially high
> %util for any device. SSD write MB/s gets up to 42, which is
> reasonable.

On 17. 09. 20 19:43, Chris Murphy wrote:
> [Mon Aug 31 15:31:55 2020] sysrq: Show Blocked State
> [Mon Aug 31 15:31:55 2020]   task                        PC stack   pid father
> 
> [Mon Aug 31 15:31:55 2020] md1_reclaim     D    0   806      2 0x80004000
> [Mon Aug 31 15:31:55 2020] Call Trace:
> ...
> 
> *shrug*
> 
> These SSDs should be able to handle > 500MB/s. And > 130K IOPS. Swap
> would have to be pretty heavy to slow down journal writes.
> 
> I'm not sure I have any good advice. My remaining ideas involve
> changing configuration just to see if the problem goes away, rather
> than actually understanding the cause of the problem.

OK, I see.

This is a physical server with 32 GB of RAM, dedicated to backup tasks.
Our monitoring shows there is (almost) no swap usage at any time, so I
hope this is not the problem. However, I will look at the stats you
mentioned and, for a start, I will disable swap for several days. It is
there only as a "backup" just in case, and it is not used at all most of
the time.
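
(Concretely, that will probably just be `swapoff -a` now, plus commenting
the swap entry out of /etc/fstab so it stays off across reboots.)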

Sadly, I am not able to _disable the journal_: if I do (just by removing
the device from the array), the MD device instantly fails and the btrfs
volume remounts read-only. I cannot find any other way to disable the
journal; it seems it is not supported. I can see only an `--add-journal`
option and no corresponding `--delete-journal` one.

I welcome any advice on how to exchange the write journal for an internal
bitmap.
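
(For reference, adding the bitmap itself on a journal-less array should
just be `mdadm --grow /dev/md1 --bitmap=internal`; the part I cannot do
is getting rid of the journal first.)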

Any other possible changes that come to my mind are:
- Enlarge the write journal
- Move the write journal to the physical sdg/sdh SSDs (out of the md0
  raid1 device).

I find the latter a bit risky, as the write journal is not redundant
then. That is the reason we chose to put the write journal on a RAID
device.

Vojtech

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-09-23 18:14                       ` Vojtech Myslivec
@ 2021-02-11  3:14                         ` Manuel Riel
  2021-02-28  8:34                           ` Manuel Riel
  0 siblings, 1 reply; 22+ messages in thread
From: Manuel Riel @ 2021-02-11  3:14 UTC (permalink / raw)
  To: Vojtech Myslivec
  Cc: Chris Murphy, Michal Moravec, Linux-RAID, songliubraving, guoqing.jiang

I'm also hitting the exact same problem with XFS on RAID6 using a RAID1 
write journal on two NVMes. CentOS 8, 4.18.0-240.10.1.el8_3.x86_64.

Symptoms:

- high CPU usage of md4_raid6 process
- IO wait goes up
- IO on that file system locks up for tens of minutes and the kernel reports:

[Wed Feb 10 23:23:05 2021] INFO: task md4_reclaim:1070 blocked for more than 20 seconds.
[Wed Feb 10 23:23:05 2021]       Not tainted 4.18.0-240.10.1.el8_3.x86_64 #1
[Wed Feb 10 23:23:05 2021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Feb 10 23:23:05 2021] md4_reclaim     D    0  1070      2 0x80004000

Already confirmed it's not a timeout mismatch. No drive errors reported. SCT Error Recovery
Control is set to 7 seconds.
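
(For anyone checking the same thing, smartctl is the usual way to read
and set this, with values given in tenths of a second, e.g.:

    smartctl -l scterc /dev/sda          # show current read/write ERC timeouts
    smartctl -l scterc,70,70 /dev/sda    # set both to 7.0 seconds

run against each member disk; /dev/sda is just a placeholder.)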

>> It's kind of a complicated setup. When this problem happens, can you
>> check swap pressure?

There is a RAID1 SWAP partition, but it's almost unused, since the server has ample RAM.

>> 
>> /sys/fs/cgroup/memory.stat
>> 
>> pgfault and maybe also pgmajfault - see if they're going up; or also
>> you can look at vmstat and see how heavily swap is being used at the
>> time. The thing is:
>> 
>> Because any heavy eviction means writes to dm-0->md0 raid1->sdg+sdh
>> SSDs, which are the same SSDs that you have the md1 raid6 mdadm
>> journal going to. So if you have any kind of swap pressure, it very
>> likely will stop the journal or at least substantially slow it down,
>> and now you get blocked tasks as the pressure builds more and more
>> because now you have a ton of dirty writes in Btrfs that can't make it
>> to disk.

I've disabled SWAP to test this theory.

>> If there is minimal swap usage, then this hypothesis is false and
>> something else is going on. I also don't have an explanation why your
>> work around works.
> 
> Sadly, I am not able to _disable the journal_: if I do (just by removing
> the device from the array), the MD device instantly fails and the btrfs
> volume remounts read-only. I cannot find any other way to disable the
> journal; it seems it is not supported. I can see only an `--add-journal`
> option and no corresponding `--delete-journal` one.
>
> I welcome any advice on how to exchange the write journal for an internal
> bitmap.

I read that the array needs to be in read-only mode; then you can fail and
replace the write journal (not tested):

# mdadm --readonly /dev/md0
# mdadm /dev/md0 --fail /dev/<journal>
# mdadm --manage /dev/md0 --add-journal /dev/<new-journal>
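
If that works, `mdadm --detail /dev/md0` should afterwards list the new
device in the journal role (equally untested).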

> Any other possible changes that come to my mind are:
> - Enlarge the write journal

My write journal is about 180 GB.

> - Move the write journal to the physical sdg/sdh SSDs (out of the md0
>   raid1 device).

I may try this, but as you say it's risky, especially when using "write-back"
journal mode.

> I find the latter a bit risky, as the write journal is not redundant
> then. That is the reason we chose to put the write journal on a RAID
> device.

I'm also experimenting with write-back/write-through mode and different
stripe_cache_size values, hoping to find something. If nothing helps, it may
simply not be possible/supported to put a write journal on another RAID device.
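
For anyone trying the same, these are the knobs I mean (md4 is my array,
and the values are only what I happen to be testing):

    echo write-through > /sys/block/md4/md/journal_mode    # or write-back
    echo 8192 > /sys/block/md4/md/stripe_cache_size        # default is 256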



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2021-02-11  3:14                         ` Manuel Riel
@ 2021-02-28  8:34                           ` Manuel Riel
       [not found]                             ` <56AD80D0-6853-4E3A-A94C-AD1477D3FDA4@snapdragon.cc>
  0 siblings, 1 reply; 22+ messages in thread
From: Manuel Riel @ 2021-02-28  8:34 UTC (permalink / raw)
  To: Vojtech Myslivec
  Cc: Chris Murphy, Michal Moravec, Linux-RAID, songliubraving, guoqing.jiang

Hit another mdadm "hang" today. No reads were possible and md4_raid6 was stuck at 100% CPU.

I've now moved the write journal off the RAID1 device. So it's not a "nested" RAID any more. Hope this will help.

With only one hardware device used as write cache, I suppose only write-through mode[1] is suggested now.


1: https://www.kernel.org/doc/Documentation/md/raid5-cache.txt

> On Feb 11, 2021, at 11:14, Manuel Riel <manu@snapdragon.cc> wrote:
> 
> I'm also hitting the exact same problem with XFS on RAID6 using a RAID1 
> write journal on two NVMes. CentOS 8, 4.18.0-240.10.1.el8_3.x86_64.
> 
> Symptoms:
> 
> - high CPU usage of md4_raid6 process
> - IO wait goes up
> - IO on that file system locks up for tens of minutes and the kernel reports:
> 
> [Wed Feb 10 23:23:05 2021] INFO: task md4_reclaim:1070 blocked for more than 20 seconds.
> [Wed Feb 10 23:23:05 2021]       Not tainted 4.18.0-240.10.1.el8_3.x86_64 #1
> [Wed Feb 10 23:23:05 2021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Wed Feb 10 23:23:05 2021] md4_reclaim     D    0  1070      2 0x80004000
> 
> Already confirmed it's not a timeout mismatch. No drive errors reported. SCT Error Recovery
> Control is set to 7 seconds.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Linux RAID with btrfs stuck and consume 100 % CPU
       [not found]                             ` <56AD80D0-6853-4E3A-A94C-AD1477D3FDA4@snapdragon.cc>
@ 2021-03-17 15:55                               ` Vojtech Myslivec
  0 siblings, 0 replies; 22+ messages in thread
From: Vojtech Myslivec @ 2021-03-17 15:55 UTC (permalink / raw)
  To: Manuel Riel
  Cc: Chris Murphy, Michal Moravec, songliubraving, guoqing.jiang,
	Btrfs BTRFS, Linux-RAID

Thanks a lot Manuel for your findings and information.

It's good to know btrfs is not causing this issue and that the common
factor is an MD journal placed on another RAID device.

I have moved the journal from a logical volume on the RAID1 to a plain
partition on an SSD and will monitor the state.

Vojtech



On 17. 03. 21 5:35, Manuel Riel wrote:
> Final update on this issue for anyone who encounters a similar problem in the future:
> 
> I didn't observe any "hanging" RAID devices after using an ordinary NVMe partition as journal. So using e.g. another md-RAID1 array as journal doesn't seem to be supported.
> 
> The docs[1] say "This means the cache disk must be ... sustainable." The sustainable part motivated me to use an md-RAID1 array. I think the docs should mention that the journal can't be on another RAID array.
> 
> I'm sending in a patch to emphasize this in the docs.
> 
> 
> 1: https://www.kernel.org/doc/html/latest/driver-api/md/raid5-cache.html
> 
>> On Feb 28, 2021, at 4:34 PM, Manuel Riel <manu@snapdragon.cc> wrote:
>>
>> Hit another mdadm "hang" today. No reads were possible and md4_raid6 was stuck at 100% CPU.
>>
>> I've now moved the write journal off the RAID1 device. So it's not a "nested" RAID any more. Hope this will help.
>>
>> With only one hardware device used as write cache, I suppose only write-through mode[1] is suggested now.
>>
>>
>> 1: https://www.kernel.org/doc/Documentation/md/raid5-cache.txt
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-03-17 16:05 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-22 20:47 Linux RAID with btrfs stuck and consume 100 % CPU Vojtech Myslivec
2020-07-22 22:00 ` antlists
2020-07-23  2:08 ` Chris Murphy
     [not found]   ` <29509e08-e373-b352-d696-fcb9f507a545@xmyslivec.cz>
2020-07-28 20:23     ` Chris Murphy
     [not found]       ` <695936b4-67a2-c862-9cb6-5545b4ab3c42@xmyslivec.cz>
2020-08-14 20:04         ` Chris Murphy
     [not found]           ` <2f2f1c21-c81b-55aa-6f77-e2d3f32d32cb@xmyslivec.cz>
2020-08-19 22:58             ` Chris Murphy
2020-08-26 15:35               ` Vojtech Myslivec
2020-08-26 18:07                 ` Chris Murphy
2020-09-16  9:42                   ` Vojtech Myslivec
2020-09-17 17:08                     ` Chris Murphy
2020-09-17 17:20                       ` Chris Murphy
2020-09-17 17:43                     ` Chris Murphy
2020-09-23 18:14                       ` Vojtech Myslivec
2021-02-11  3:14                         ` Manuel Riel
2021-02-28  8:34                           ` Manuel Riel
     [not found]                             ` <56AD80D0-6853-4E3A-A94C-AD1477D3FDA4@snapdragon.cc>
2021-03-17 15:55                               ` Vojtech Myslivec
2020-07-29 21:06 ` Guoqing Jiang
2020-07-29 21:48   ` Chris Murphy
2020-08-12 14:19     ` Vojtech Myslivec
2020-08-12 14:19       ` Vojtech Myslivec
2020-07-30  6:45   ` Song Liu
2020-08-12 13:58   ` Vojtech Myslivec

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).