* Linux RAID with btrfs stuck and consume 100 % CPU
@ 2020-07-22 20:47 Vojtech Myslivec
  2020-07-22 22:00 ` antlists
  ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Vojtech Myslivec @ 2020-07-22 20:47 UTC (permalink / raw)
To: linux-btrfs, linux-raid; +Cc: Michal Moravec

[-- Attachment #1: Type: text/plain, Size: 4791 bytes --]

Hello,

I would like to report an issue related to a btrfs filesystem on top of
an md RAID. I am not sure which of the two is the cause of the problem,
so I am writing to both mailing lists at once.

My colleague, who has spent a lot of time investigating and trying to
work around the issue, is in CC.

Thank you for any help,
Vojtech Myslivec


The issue
=========

For the last several weeks, I have been experiencing an annoying issue
on my physical server, which runs a Linux software RAID 6 with a write
journal and a btrfs filesystem on top of it. The disk setup is described
below.

This host serves as a backup server and runs regular backup tasks. When
a backup is performed, one (read-only) snapshot of one of my subvolumes
on the btrfs filesystem is created, and one snapshot is deleted
afterwards.

Once in several days (irregularly, as far as I have noticed), the
`md1_raid6` process starts to consume 100 % of one CPU core, and during
that time, creating a snapshot of a btrfs subvolume (during the regular
backup process) gets stuck. User-space processes accessing this
particular subvolume then start to hang in *disk sleep* state. Access to
other subvolumes seems to be unaffected until another backup process
tries to create another snapshot (of a different subvolume).

In most cases, after several "IO" actions like listing files (ls),
accessing btrfs information (`btrfs filesystem`, `btrfs subvolume`), or
accessing the device (with `dd` or whatever), the filesystem magically
gets unstuck and the `md1_raid6` process is released from its "live
lock" (or whatever loop it is spinning in). Snapshots are then created
as expected and all processes finish their job.
Approximately once a week, it takes tens of minutes to unstick these
processes. During that period, I try to access the affected btrfs
subvolumes in several shell sessions to wake them up. If I leave the
system untouched and just let it be, it unsticks itself after several
hours. This often happens during the night, when I am not awake to
unstick the filesystem manually.

Sometimes an even worse case happens, when this "magic unsticking" does
not succeed at all. I am then unable even to kill the processes stuck in
disk sleep state, and I am forced to restart the system.

When the issue happens, I very often find similar messages in the kernel
dmesg log:

    INFO: task md1_reclaim:910 blocked for more than 120 seconds.

with a call trace included. There are also some more "blocked" tasks,
like `btrfs` and `btrfs-transaction`, with their call traces included as
well.


Questions
=========

1. What could be the cause of this problem?
2. What should I do to mitigate this issue?
3. Could it be a hardware problem? How can I track this down?


What I have done so far
=======================

- I keep the system up to date, with the latest stable kernel provided
  by Debian packages.

- I have run both `btrfs scrub` and `fsck.btrfs` to rule out a btrfs
  filesystem issue.

- I have read all the physical disks (with the dd command) and performed
  SMART self-tests to rule out a disk issue (though read/write badblocks
  were not checked yet).

- I have also moved all the files off the affected filesystem, created a
  new btrfs filesystem (with recent btrfs-progs), and moved the files
  back. Nonetheless, the issue appeared again.

- I have tried to attach strace to the spinning md1 process, but without
  success (is it even possible to strace a running kernel thread?).

- Of course, I have tried to find similar issues around the web, but I
  was not successful.
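[Editor's note: strace cannot attach to a kernel thread at all; the closest equivalent is sampling the thread's in-kernel call stack from procfs. A minimal sketch, assuming the PID of the spinning thread is known (910 is taken from the hung-task report below; the fallback to the current shell's own PID is only there so the sketch runs as-is):]

```shell
# Sample the in-kernel call stack of a (kernel) thread a few times.
# Reading /proc/<pid>/stack requires root. PID 910 (md1_reclaim) is
# from the hung-task report; substitute the spinning thread's PID.
# In practice, add a 'sleep' between samples.
pid=${TARGET_PID:-$$}        # default to our own PID so the sketch runs as-is
stackfile="/proc/$pid/stack"
samples=3
i=1
while [ "$i" -le "$samples" ]; do
    echo "--- sample $i of $samples: $stackfile ---"
    cat "$stackfile" 2>/dev/null || echo "(cannot read $stackfile - need root?)"
    i=$((i + 1))
done
```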
Some detailed facts
===================

OS
--

- Debian 10 buster (stable release)
- Linux kernel 5.5 (from Debian backports)
- btrfs-progs 5.2.1 (from Debian backports)

Hardware
--------

- 8-core/16-thread amd64 processor (AMD EPYC 7251)
- 6 SATA HDDs (Seagate Enterprise Capacity)
- 2 SSDs (Intel D3-S4610)

RAID
----

- All Linux MD RAID (no HW RAID used)
- RAID1 over the 2 SSDs (md0)
  - On top of this RAID1 is an LVM with logical volumes for:
    - the root filesystem
    - swap
    - the RAID6 write journal
- RAID6 over the 6 HDDs with 1 LV as write journal (md1)
  - This is the affected device

btrfs
-----

- Several subvolumes, tens of snapshots
- Default mount options: rw,noatime,space_cache,subvolid=5,subvol=/
- No compression, autodefrag, or the like
- I tried quotas in the past, but they have been disabled for a long
  time

Usage
-----

- The affected RAID6 block device is directly formatted as btrfs
- This filesystem is used to store backups
- Backups are performed via rsnapshot
- rsnapshot is configured to use btrfs snapshots for hourly and daily
  backups and rsync to copy new backups

Other IO operations
-------------------

- There is ongoing monitoring through Icinga, which uses iostat
  internally to monitor I/O
- SMART self-tests run periodically (on a weekly and monthly basis)

[-- Attachment #2: dmesg-btrfs.log --]
[-- Type: text/x-log, Size: 2444 bytes --]

[ +0,000032] INFO: task btrfs:14183 blocked for more than 120 seconds.
[ +0,000030] Not tainted 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
[ +0,000033] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0,000036] btrfs D 0 14183 14177 0x00004000
[ +0,000002] Call Trace:
[ +0,000003]  ? __schedule+0x2db/0x700
[ +0,000001]  ? mempool_alloc+0x67/0x190
[ +0,000002]  schedule+0x40/0xb0
[ +0,000007]  md_write_start+0x14b/0x220 [md_mod]
[ +0,000002]  ? finish_wait+0x80/0x80
[ +0,000002]  ? finish_wait+0x80/0x80
[ +0,000005]  raid5_make_request+0x83/0xb50 [raid456]
[ +0,000004]  ? bio_add_page+0x39/0x90
[ +0,000023]  ? submit_extent_page+0xd8/0x200 [btrfs]
[ +0,000002]  ? finish_wait+0x80/0x80
[ +0,000002]  ? __blk_queue_split+0x403/0x4c0
[ +0,000002]  ? finish_wait+0x80/0x80
[ +0,000005]  md_handle_request+0x119/0x190 [md_mod]
[ +0,000006]  md_make_request+0x8a/0x190 [md_mod]
[ +0,000003]  generic_make_request+0xcf/0x310
[ +0,000026]  ? btrfs_dev_stat_inc_and_print+0x50/0x50 [btrfs]
[ +0,000001]  submit_bio+0x42/0x1c0
[ +0,000025]  btrfs_map_bio+0x1c0/0x380 [btrfs]
[ +0,000022]  btrfs_submit_bio_hook+0x8c/0x170 [btrfs]
[ +0,000024]  submit_one_bio+0x31/0x50 [btrfs]
[ +0,000024]  extent_writepages+0x6b/0xa0 [btrfs]
[ +0,000002]  do_writepages+0x41/0xd0
[ +0,000024]  ? merge_state.part.43+0x3f/0x160 [btrfs]
[ +0,000002]  __filemap_fdatawrite_range+0xcb/0x100
[ +0,000023]  btrfs_fdatawrite_range+0x1b/0x50 [btrfs]
[ +0,000025]  __btrfs_write_out_cache+0x467/0x4c0 [btrfs]
[ +0,000026]  btrfs_write_out_cache+0x92/0xe0 [btrfs]
[ +0,000023]  btrfs_start_dirty_block_groups+0x270/0x4e0 [btrfs]
[ +0,000023]  btrfs_commit_transaction+0xc5/0xa40 [btrfs]
[ +0,000020]  ? btrfs_record_root_in_trans+0x56/0x60 [btrfs]
[ +0,000021]  ? start_transaction+0xbc/0x4d0 [btrfs]
[ +0,000024]  btrfs_mksubvol+0x4f0/0x530 [btrfs]
[ +0,000026]  btrfs_ioctl_snap_create_transid+0x170/0x180 [btrfs]
[ +0,000024]  btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
[ +0,000024]  btrfs_ioctl+0x11c2/0x2de0 [btrfs]
[ +0,000004]  ? tomoyo_path_number_perm+0x68/0x1e0
[ +0,000005]  ? do_vfs_ioctl+0xa4/0x680
[ +0,000001]  do_vfs_ioctl+0xa4/0x680
[ +0,000003]  ksys_ioctl+0x60/0x90
[ +0,000001]  __x64_sys_ioctl+0x16/0x20
[ +0,000004]  do_syscall_64+0x52/0x170
[ +0,000002]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[-- Attachment #3: dmesg-btrfs-transaction.log --]
[-- Type: text/x-log, Size: 1208 bytes --]

[ +0,000008] INFO: task btrfs-transacti:949 blocked for more than 120 seconds.
[ +0,000032] Not tainted 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
[ +0,000033] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0,000035] btrfs-transacti D 0 949 2 0x80004000
[ +0,000002] Call Trace:
[ +0,000003]  ? __schedule+0x2db/0x700
[ +0,000002]  schedule+0x40/0xb0
[ +0,000002]  io_schedule+0x12/0x40
[ +0,000004]  wait_on_page_bit+0x15c/0x230
[ +0,000002]  ? file_fdatawait_range+0x20/0x20
[ +0,000002]  __filemap_fdatawait_range+0x89/0xf0
[ +0,000027]  ? __clear_extent_bit+0x2bd/0x440 [btrfs]
[ +0,000002]  filemap_fdatawait_range+0xe/0x20
[ +0,000022]  __btrfs_wait_marked_extents.isra.20+0xc2/0x100 [btrfs]
[ +0,000022]  btrfs_write_and_wait_transaction.isra.24+0x67/0xd0 [btrfs]
[ +0,000022]  btrfs_commit_transaction+0x754/0xa40 [btrfs]
[ +0,000022]  ? start_transaction+0xbc/0x4d0 [btrfs]
[ +0,000022]  transaction_kthread+0x144/0x170 [btrfs]
[ +0,000021]  ? btrfs_cleanup_transaction+0x5e0/0x5e0 [btrfs]
[ +0,000002]  kthread+0x112/0x130
[ +0,000001]  ? kthread_park+0x80/0x80
[ +0,000002]  ret_from_fork+0x22/0x40

[-- Attachment #4: dmesg-md1_reclaim.log --]
[-- Type: text/x-log, Size: 1441 bytes --]

[čen 1 13:05] INFO: task md1_reclaim:910 blocked for more than 120 seconds.
[ +0,000057] Not tainted 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
[ +0,000033] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0,000035] md1_reclaim D 0 910 2 0x80004000
[ +0,000003] Call Trace:
[ +0,000012]  ? __schedule+0x2db/0x700
[ +0,000004]  ? check_preempt_curr+0x62/0x90
[ +0,000004]  ? wbt_exit+0x30/0x30
[ +0,000001]  ? __wbt_done+0x30/0x30
[ +0,000002]  schedule+0x40/0xb0
[ +0,000002]  io_schedule+0x12/0x40
[ +0,000003]  rq_qos_wait+0xfa/0x170
[ +0,000002]  ? karma_partition+0x210/0x210
[ +0,000002]  ? wbt_exit+0x30/0x30
[ +0,000001]  wbt_wait+0x99/0xe0
[ +0,000002]  __rq_qos_throttle+0x23/0x30
[ +0,000004]  blk_mq_make_request+0x12a/0x5d0
[ +0,000003]  generic_make_request+0xcf/0x310
[ +0,000002]  submit_bio+0x42/0x1c0
[ +0,000010]  ? md_super_write.part.70+0x98/0x120 [md_mod]
[ +0,000007]  md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
[ +0,000008]  r5l_do_reclaim+0x32a/0x3b0 [raid456]
[ +0,000003]  ? schedule_timeout+0x161/0x310
[ +0,000002]  ? prepare_to_wait_event+0xb6/0x140
[ +0,000006]  ? md_register_thread+0xd0/0xd0 [md_mod]
[ +0,000004]  md_thread+0x94/0x150 [md_mod]
[ +0,000002]  ? finish_wait+0x80/0x80
[ +0,000003]  kthread+0x112/0x130
[ +0,000002]  ? kthread_park+0x80/0x80
[ +0,000002]  ret_from_fork+0x22/0x40

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-07-22 20:47 Linux RAID with btrfs stuck and consume 100 % CPU Vojtech Myslivec
@ 2020-07-22 22:00 ` antlists
  2020-07-23  2:08 ` Chris Murphy
  2020-07-29 21:06 ` Guoqing Jiang
  2 siblings, 0 replies; 22+ messages in thread
From: antlists @ 2020-07-22 22:00 UTC (permalink / raw)
To: Vojtech Myslivec, linux-btrfs, linux-raid; +Cc: Michal Moravec

On 22/07/2020 21:47, Vojtech Myslivec wrote:
> - RAID6 over 6 HDD and 1 LV as write journal (md1)
>   - This is the affected device

What hard drives do you have?

https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

Dunno whether that is your problem, but it's one of the first things to
check.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 22+ messages in thread
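[Editor's note: the timeout-mismatch check referenced above boils down to comparing the drive's SCT ERC limit (reported by smartctl in deciseconds) against the kernel's per-device SCSI command timer, which defaults to 30 seconds. A minimal sketch of the arithmetic; the ERC value is hard-coded as an assumed example rather than read from a live drive:]

```shell
# Timeout-mismatch sanity check: the drive must give up on a bad sector
# (SCT ERC) before the kernel's SCSI command timer expires, otherwise md
# may kick the whole drive out of the array after a single bad read.
# 100 deciseconds is an assumed example; on a real system read it with
# 'smartctl -l scterc /dev/sdX' and compare against
# /sys/block/sdX/device/timeout.
erc_ds=100            # drive error recovery limit, in deciseconds
kernel_timeout_s=30   # kernel SCSI command timer default, in seconds
erc_s=$((erc_ds / 10))
if [ "$erc_s" -lt "$kernel_timeout_s" ]; then
    echo "ok: drive gives up after ${erc_s}s, below the ${kernel_timeout_s}s kernel timer"
else
    echo "mismatch: raise the kernel timer or lower the drive's ERC limit"
fi
```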
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-07-22 20:47 Linux RAID with btrfs stuck and consume 100 % CPU Vojtech Myslivec
  2020-07-22 22:00 ` antlists
@ 2020-07-23  2:08 ` Chris Murphy
  [not found]   ` <29509e08-e373-b352-d696-fcb9f507a545@xmyslivec.cz>
  2020-07-29 21:06 ` Guoqing Jiang
  2 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-07-23  2:08 UTC (permalink / raw)
To: Vojtech Myslivec; +Cc: Btrfs BTRFS, Linux-RAID, Michal Moravec

On Wed, Jul 22, 2020 at 2:55 PM Vojtech Myslivec <vojtech@xmyslivec.cz> wrote:
>
> This host serves as a backup server and it runs regular backup tasks.
> When a backup is performed, one (read only) snapshot of one of my
> subvolumes on the btrfs filesystem is created and one snapshot is
> deleted afterwards.

This is likely to be a fairly metadata-centric workload: lots of small
file changes and metadata (file system) changes. Parity raid
performance in such workloads is often not great. It's just the way it
goes. But what does iostat tell you about drive utilization during
these backups? And during the problem? Are they balanced? Are they
nearly fully utilized?

> Once in several days (irregularly, as I noticed), the `md1_raid6`
> process starts to consume 100 % of one CPU core and during that time,
> creating a snapshot (during the regular backup process) of a btrfs
> subvolume gets stuck. User space processes accessing this particular
> subvolume then start to hang in *disk sleep* state. Access to other
> subvolumes seems to be unaffected until another backup process tries to
> create another snapshot (of a different subvolume).

A snapshot results in a flush. And a snapshot delete results in the
btrfs-cleaner process, which involves a lot of reads and writes to
track down the extents to be freed. But your call traces seem stuck in
snapshot creation.

Can you provide mdadm -E and -D output respectively? I wonder if the
setup is just not well suited for the workload.
The default mdadm 512 KiB chunk may not align well with this workload.
Also, a complete dmesg might be useful.

> In most cases, after several "IO" actions like listing files (ls),
> accessing btrfs information (`btrfs filesystem`, `btrfs subvolume`), or
> accessing the device (with `dd` or whatever), the filesystem gets
> magically unstuck and the `md1_raid6` process is released from its
> "live lock" (or whatever loop it is spinning in). Snapshots are then
> created as expected and all processes finish their job.
>
> Approximately once a week, it takes tens of minutes to unstick these
> processes. During that period, I try to access the affected btrfs
> subvolumes in several shell sessions to wake them up.

Could be lock contention on the subvolume.

> However, there are some more "blocked" tasks, like `btrfs` and
> `btrfs-transaction`, with their call traces also included.
>
> Questions
> =========
>
> 1. What could be the cause of this problem?
> 2. What should I do to mitigate this issue?
> 3. Could it be a hardware problem? How can I track this?

Not sure yet. Need more info.

dmesg
mdadm -E
mdadm -D
btrfs filesystem usage /mountpoint
btrfs device stats /mountpoint

> What I have done so far
> =======================
>
> - I keep the system up-to-date, with the latest stable kernel provided
>   by Debian packages

5.5 is fairly recent and OK. It should be fine, except you're having a
problem, so... it could be a bug that's already fixed, or a new bug. Or
it could be a suboptimal configuration for the workload - which can be
difficult to figure out.

> - I run both `btrfs scrub` and `fsck.btrfs` to rule out a btrfs
>   filesystem issue.
>
> - I have read all the physical disks (with the dd command) and performed
>   SMART self-tests to rule out a disk issue (though read/write badblocks
>   were not checked yet).

I wouldn't worry too much about badblocks. More important is
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

But you report using enterprise drives.
They will invariably have an SCT ERC time of ~70 deciseconds, which is
well below the kernel's SCSI command timer, ergo not a problem. But it's
fine to double-check that.

> - I have also moved all the files off the affected filesystem, created
>   a new btrfs filesystem (with recent btrfs-progs) and moved the files
>   back. Nonetheless, this issue appeared again.

Exactly the same configuration? Anything different at all?

> - I have tried to attach strace to the spinning md1 process, but
>   without success (is it even possible to strace a running kernel
>   thread?)

You want to do 'cat /proc/<pid>/stack'

> Some detailed facts
> ===================
>
> OS
> --
>
> - Debian 10 buster (stable release)
> - Linux kernel 5.5 (from Debian backports)
> - btrfs-progs 5.2.1 (from Debian backports)

btrfs-progs 5.2.1 is OK, but I suggest something newer before using
'btrfs check --repair'. Just to be clear, --repair is NOT indicated
right now.

> Hardware
> --------
>
> - 8 core/16 threads amd64 processor (AMD EPYC 7251)
> - 6 SATA HDD disks (Seagate Enterprise Capacity)
> - 2 SSD disks (Intel D3-S4610)

It's not related, but your workload might benefit from the
'compress=zstd:1' mount option. Compress everything across the board.
Chances are these backups contain a lot of compressible data. This
isn't important to do right now. Fix the problem first. Optimize later.
But you have significant CPU capacity relative to the hardware.

> btrfs
> -----
>
> - Several subvolumes, tens of snapshots
> - Default mount options: rw,noatime,space_cache,subvolid=5,subvol=/
> - No compression, autodefrag or so
> - I have tried to use quotas in the past but they have been disabled
>   for a long time

I don't think this is the only thing going on, but consider
space_cache=v2. You can mount with '-o clear_cache', then umount, and
then mount again with '-o space_cache=v2' to convert. It will be
persistent (unless invalidated by a repair, in which case the default
v1 version is used again). v2 will soon be the default.
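[Editor's note: the conversion sequence just described can be sketched as a dry run. /dev/md1 and /mnt/backup are assumed names, and the commands are printed rather than executed here because they unmount a live filesystem:]

```shell
# Dry-run sketch of the one-time space_cache v1 -> v2 conversion
# described above. /dev/md1 and /mnt/backup are assumptions; replace
# run() with direct execution (and run as root) to actually apply it.
dev=/dev/md1
mnt=/mnt/backup
run() { echo "+ $*"; }                       # print instead of executing

run umount "$mnt"
run mount -o clear_cache "$dev" "$mnt"       # drop the stale v1 cache
run umount "$mnt"
run mount -o space_cache=v2 "$dev" "$mnt"    # v2 persists from now on
```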
> Usage
> -----
>
> - Affected RAID6 block device is directly formatted to btrfs
> - This filesystem is used to store backups
> - Backups are performed via rsnapshot
> - rsnapshot is configured to use btrfs snapshots for hourly and daily
>   backups and rsync to copy new backups

How many rsnapshot and rsync tasks are happening concurrently for a
subvolume at the time the subvolume becomes unresponsive?

--
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread
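[Editor's note: to put a number on the chunk-size concern raised earlier in this reply: with the default 512 KiB chunk and 6 devices in RAID6 (2 of them parity), a full stripe is (6 - 2) * 512 KiB = 2 MiB. A small sketch of that arithmetic:]

```shell
# Full-stripe width for the array in this thread: 6 devices in RAID6
# (2 devices' worth of parity) with the default mdadm 512 KiB chunk.
# Writes smaller than a full stripe force read-modify-write cycles,
# which is painful for metadata-heavy (small-write) workloads.
devices=6
parity=2
chunk_kib=512
full_stripe_kib=$(( (devices - parity) * chunk_kib ))
echo "full stripe: ${full_stripe_kib} KiB ($((full_stripe_kib / 1024)) MiB)"
```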
[parent not found: <29509e08-e373-b352-d696-fcb9f507a545@xmyslivec.cz>]
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  [not found]   ` <29509e08-e373-b352-d696-fcb9f507a545@xmyslivec.cz>
@ 2020-07-28 20:23     ` Chris Murphy
  [not found]       ` <695936b4-67a2-c862-9cb6-5545b4ab3c42@xmyslivec.cz>
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-07-28 20:23 UTC (permalink / raw)
To: Vojtech Myslivec; +Cc: Chris Murphy, Michal Moravec, Btrfs BTRFS, Linux-RAID

On Tue, Jul 28, 2020 at 7:31 AM Vojtech Myslivec <vojtech@xmyslivec.cz> wrote:
>
> > dmesg
> > mdadm -E
> > mdadm -D
> > btrfs filesystem usage /mountpoint
> > btrfs device stats /mountpoint

These all look good.

> > SCT Error Recovery Control:
> >            Read:    100 (10.0 seconds)
> >            Write:   100 (10.0 seconds)
>
> It is higher than you expect, yet still below the kernel's 30 s
> timeout, right?

It's good.

> > It's not related, but your workload might benefit from
> > 'compress=zstd:1' mount option. Compress everything across the board.
> > Chances are these backups contain a lot of compressible data. This
> > isn't important to do right now. Fix the problem first. Optimize
> > later. But you have significant CPU capacity relative to the hardware.
>
> OK, thanks for the tip. Overall CPU utilization is not high at the
> moment. The server is dedicated to backups, so I can try this.
>
> In fact, I am a bit scared of anything compression-related on btrfs. I
> do not mean to blame anyone; I have just read some recommendations
> about disabling compression on btrfs (Debian wiki, kernel wiki, ...).

That's based on ancient kernels. Also, the last known bug was really
obscure; I never hit it. You had to have some combination of inline
extents and also holes. You're using 5.5, and that has all the bug
fixes for that. At least the Facebook folks are using compress=zstd:1
pretty much across the board, on a metric ton of machines, so it's
reliable.

> In most cases backups are pretty fast and only one runs at a time.
> From the logs on the server, I can see it gets stuck when only one
> backup process is running.
>
> But I am not able to tell if a background btrfs-cleaner process is
> running at that moment. I can focus on this if it helps.

Your dmesg contains

[ 9667.449898] INFO: task md1_reclaim:910 blocked for more than 120 seconds.

It might be helpful to reproduce the problem and take sysrq+w at the
time of the blocking. Sometimes it's best to have the sysrq trigger
command ready in a shell, but don't hit enter until the blocked task
happens. Sometimes during blocked tasks it takes forever to issue a
command.

It would be nice if an md kernel developer could comment on what's
going on. Does this often happen when a btrfs snapshot is created? That
will cause a flush, and I wonder if that's instigating the problem in
the lower layers.

--
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread
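[Editor's note: having the sysrq trigger "ready in a shell" can look like the sketch below. The command string is only composed and printed here, since actually writing to /proc/sysrq-trigger requires root and affects the whole system:]

```shell
# Prepare the sysrq+w trigger in advance; 'w' dumps all tasks in
# uninterruptible (blocked) state to the kernel log. The command is
# printed rather than executed - paste it into a root shell and press
# enter only once the hang is underway, then read the output in dmesg.
sysrq_file=/proc/sysrq-trigger
cmd="echo w > $sysrq_file"
echo "ready to fire: $cmd"
```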
[parent not found: <695936b4-67a2-c862-9cb6-5545b4ab3c42@xmyslivec.cz>]
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  [not found]       ` <695936b4-67a2-c862-9cb6-5545b4ab3c42@xmyslivec.cz>
@ 2020-08-14 20:04         ` Chris Murphy
  [not found]           ` <2f2f1c21-c81b-55aa-6f77-e2d3f32d32cb@xmyslivec.cz>
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-08-14 20:04 UTC (permalink / raw)
To: Vojtech Myslivec; +Cc: Michal Moravec, Btrfs BTRFS, Linux-RAID, Song Liu

[-- Attachment #1: Type: text/plain, Size: 27730 bytes --]

On Wed, Aug 12, 2020 at 7:54 AM Vojtech Myslivec <vojtech@xmyslivec.cz> wrote:
>
> My colleague Michal replied to that on 05. 08. 2020:

I can't find any reply at
https://lore.kernel.org/linux-raid/20200812141931.bT6w5ZM3DAPlTLvmUW5CIJWTqeVTrI7MAxdY33Z4JT8@z/

> I attach sysrq logs again here

Song requested "/proc/<pid>/stack for <pid> of md1_raid6? We may want
to sample it multiple times." But the tarball shows both copies of
/proc/909/stack empty.

sysrq+w

I'll also try to attach it as a file, because I bet gmail will screw up
the wrapping in the paste:

Jul 31 16:31:40 backup1 kernel: sysrq: Show Blocked State
Jul 31 16:31:40 backup1 kernel: task PC stack pid father
Jul 31 16:31:40 backup1 kernel: md1_reclaim D 0 910 2 0x80004000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel: ? __switch_to_asm+0x34/0x70
Jul 31 16:31:40 backup1 kernel: ? __switch_to_asm+0x34/0x70
Jul 31 16:31:40 backup1 kernel: ? wbt_exit+0x30/0x30
Jul 31 16:31:40 backup1 kernel: ? __wbt_done+0x30/0x30
Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel: io_schedule+0x12/0x40
Jul 31 16:31:40 backup1 kernel: rq_qos_wait+0xfa/0x170
Jul 31 16:31:40 backup1 kernel: ? __switch_to_asm+0x34/0x70
Jul 31 16:31:40 backup1 kernel: ? karma_partition+0x210/0x210
Jul 31 16:31:40 backup1 kernel: ? wbt_exit+0x30/0x30
Jul 31 16:31:40 backup1 kernel: wbt_wait+0x99/0xe0
Jul 31 16:31:40 backup1 kernel: __rq_qos_throttle+0x23/0x30
Jul 31 16:31:40 backup1 kernel: blk_mq_make_request+0x12a/0x5d0
Jul 31 16:31:40 backup1 kernel: generic_make_request+0xcf/0x310
Jul 31 16:31:40 backup1 kernel: submit_bio+0x42/0x1c0
Jul 31 16:31:40 backup1 kernel: ? md_super_write.part.70+0x98/0x120 [md_mod]
Jul 31 16:31:40 backup1 kernel: md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
Jul 31 16:31:40 backup1 kernel: r5l_do_reclaim+0x32a/0x3b0 [raid456]
Jul 31 16:31:40 backup1 kernel: ? schedule_timeout+0x161/0x310
Jul 31 16:31:40 backup1 kernel: ? prepare_to_wait_event+0xb6/0x140
Jul 31 16:31:40 backup1 kernel: ? md_register_thread+0xd0/0xd0 [md_mod]
Jul 31 16:31:40 backup1 kernel: md_thread+0x94/0x150 [md_mod]
Jul 31 16:31:40 backup1 kernel: ? finish_wait+0x80/0x80
Jul 31 16:31:40 backup1 kernel: kthread+0x112/0x130
Jul 31 16:31:40 backup1 kernel: ? kthread_park+0x80/0x80
Jul 31 16:31:40 backup1 kernel: ret_from_fork+0x22/0x40
Jul 31 16:31:40 backup1 kernel: btrfs-transacti D 0 10200 2 0x80004000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel: io_schedule+0x12/0x40
Jul 31 16:31:40 backup1 kernel: wait_on_page_bit+0x15c/0x230
Jul 31 16:31:40 backup1 kernel: ? file_fdatawait_range+0x20/0x20
Jul 31 16:31:40 backup1 kernel: __filemap_fdatawait_range+0x89/0xf0
Jul 31 16:31:40 backup1 kernel: ? __clear_extent_bit+0x2bd/0x440 [btrfs]
Jul 31 16:31:40 backup1 kernel: filemap_fdatawait_range+0xe/0x20
Jul 31 16:31:40 backup1 kernel: __btrfs_wait_marked_extents.isra.20+0xc2/0x100 [btrfs]
Jul 31 16:31:40 backup1 kernel: btrfs_write_and_wait_transaction.isra.24+0x67/0xd0 [btrfs]
Jul 31 16:31:40 backup1 kernel: btrfs_commit_transaction+0x754/0xa40 [btrfs]
Jul 31 16:31:40 backup1 kernel: ? start_transaction+0xbc/0x4d0 [btrfs]
Jul 31 16:31:40 backup1 kernel: transaction_kthread+0x144/0x170 [btrfs]
Jul 31 16:31:40 backup1 kernel: ? btrfs_cleanup_transaction+0x5e0/0x5e0 [btrfs]
Jul 31 16:31:40 backup1 kernel: kthread+0x112/0x130
Jul 31 16:31:40 backup1 kernel: ? kthread_park+0x80/0x80
Jul 31 16:31:40 backup1 kernel: ret_from_fork+0x22/0x40
Jul 31 16:31:40 backup1 kernel: btrfs D 0 30038 30032 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel: wait_for_commit+0x58/0x80 [btrfs]
Jul 31 16:31:40 backup1 kernel: ? finish_wait+0x80/0x80
Jul 31 16:31:40 backup1 kernel: btrfs_commit_transaction+0x169/0xa40 [btrfs]
Jul 31 16:31:40 backup1 kernel: ? btrfs_record_root_in_trans+0x56/0x60 [btrfs]
Jul 31 16:31:40 backup1 kernel: ? start_transaction+0xbc/0x4d0 [btrfs]
Jul 31 16:31:40 backup1 kernel: btrfs_mksubvol+0x4f0/0x530 [btrfs]
Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_create_transid+0x170/0x180 [btrfs]
Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x11c2/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fe919fb4427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffe760eac98 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe919fb4427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffe760eacd8 RSI: 0000000050009417 RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000559cb3d14260 R08: 0000000000000008 R09: 35315f31332d3730
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000559cb3d142a0
Jul 31 16:31:40 backup1 kernel: R13: 0000559cb3d142a0 R14: 0000000000000003 R15: 0000000000000004
Jul 31 16:31:40 backup1 kernel: btrfs D 0 30247 30241 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel: ? prep_new_page+0x90/0x140
Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fd2159c9427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fff421698e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd2159c9427
Jul 31 16:31:40 backup1 kernel: RDX: 00007fff42169920 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 00005631102c22c8 R09: 00007fd215a53e80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fff42169920 R15: 00005631102c22c8
Jul 31 16:31:40 backup1 kernel: btrfs D 0 30586 30580 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f4e6e35a427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffb6891ab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4e6e35a427
Jul 31 16:31:40 backup1 kernel: RDX: 00007fffb6891af0 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055a899f7f2ca R09: 00007f4e6e3e4e80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fffb6891af0 R15: 000055a899f7f2ca
Jul 31 16:31:40 backup1 kernel: btrfs D 0 30613 30607 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f527437f427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffe0dcd07d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f527437f427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffe0dcd0810 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055bf05f592c6 R09: 00007f5274409e80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffe0dcd0810 R15: 000055bf05f592c6
Jul 31 16:31:40 backup1 kernel: btrfs D 0 31443 31437 0x00000000
Jul 31 16:31:40 backup1 kernel: Call Trace:
Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700
Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0
Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0
Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50
Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs]
Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs]
Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0
Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680
Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90
Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20
Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170
Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f6e74eb1427
Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value.
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffe270079f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6e74eb1427 Jul 31 16:31:40 backup1 kernel: RDX: 00007ffe27007a30 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000056037d0422ca R09: 00007f6e74f3be80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffe27007a30 R15: 000056037d0422ca Jul 31 16:31:40 backup1 kernel: btrfs D 0 31650 31644 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fb16f9ae427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. 
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffd266be828 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb16f9ae427 Jul 31 16:31:40 backup1 kernel: RDX: 00007ffd266be860 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 0000563ccec6d2ca R09: 00007fb16fa38e80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffd266be860 R15: 0000563ccec6d2ca Jul 31 16:31:40 backup1 kernel: btrfs D 0 31813 31807 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f09dee75427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. 
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffeed6bea48 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f09dee75427 Jul 31 16:31:40 backup1 kernel: RDX: 00007ffeed6bea80 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055ebcdb522c8 R09: 00007f09deeffe80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffeed6bea80 R15: 000055ebcdb522c8 Jul 31 16:31:40 backup1 kernel: btrfs D 0 417 411 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f614eb62427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. 
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffcb2850518 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f614eb62427 Jul 31 16:31:40 backup1 kernel: RDX: 00007ffcb2850550 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055f42641e2ca R09: 00007f614ebece80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffcb2850550 R15: 000055f42641e2ca Jul 31 16:31:40 backup1 kernel: btrfs D 0 573 567 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f40626c0427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. 
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffa93d57b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f40626c0427 Jul 31 16:31:40 backup1 kernel: RDX: 00007fffa93d57f0 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055bba6bae2c8 R09: 00007f406274ae80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fffa93d57f0 R15: 000055bba6bae2c8 Jul 31 16:31:40 backup1 kernel: btrfs D 0 811 805 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f434d293427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. 
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffb46312c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f434d293427 Jul 31 16:31:40 backup1 kernel: RDX: 00007fffb4631300 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000056166b0872c8 R09: 00007f434d31de80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fffb4631300 R15: 000056166b0872c8 Jul 31 16:31:40 backup1 kernel: btrfs D 0 2200 2194 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fd0c0e04427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. 
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffcdcda5d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd0c0e04427 Jul 31 16:31:40 backup1 kernel: RDX: 00007fffcdcda610 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 0000559a611ce2c7 R09: 00007fd0c0e8ee80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fffcdcda610 R15: 0000559a611ce2c7 Jul 31 16:31:40 backup1 kernel: btrfs D 0 2699 2693 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f9d1010e427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. 
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffd08c48a48 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9d1010e427 Jul 31 16:31:40 backup1 kernel: RDX: 00007ffd08c48a80 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 00005639550432dd R09: 00007f9d10198e80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffd08c48a80 R15: 00005639550432dd Jul 31 16:31:40 backup1 kernel: btrfs D 0 2807 2801 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f0c33875427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. 
Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffeb2ed3ac8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0c33875427
Jul 31 16:31:40 backup1 kernel: RDX: 00007ffeb2ed3b00 RSI: 000000005000940f RDI: 0000000000000003
Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 0000558a0ba742de R09: 00007f0c338ffe80
Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003
Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffeb2ed3b00 R15: 0000558a0ba742de
Jul 31 16:32:01 backup1 CRON[5113]: pam_unix(cron:session): session opened for user root by (uid=0)
Jul 31 16:32:01 backup1 CRON[5114]: (root) CMD (/opt/scripts/unstuck.sh)
Jul 31 16:32:01 backup1 CRON[5113]: pam_unix(cron:session): session closed for user root
Jul 31 16:32:03 backup1 sudo[5137]: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_btrfs --allocated-critical-percent 90 --allocated-warning-percent 75 --mountpoint /mnt/data
Jul 31 16:32:03 backup1 sudo[5137]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jul 31 16:32:03 backup1 sudo[5137]: pam_unix(sudo:session): session closed for user root
Jul 31 16:32:09 backup1 sudo[5157]: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_ipmi_sensor
Jul 31 16:32:09 backup1 sudo[5157]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jul 31 16:32:10 backup1 sudo[5157]: pam_unix(sudo:session): session closed for user root

One difficulty is that this is the upstream list, and kernel 5.5 is end of life; older kernels are ordinarily the distribution's responsibility. Upstream development is mostly interested in current stable kernels at the oldest, and in mainline development kernels. But hopefully someone will still have an idea what is going on, and whether it is a bug or possibly a misconfiguration.
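For readers wanting to capture the same evidence on their own systems: the attached trace begins with "sysrq: Show Blocked State", which is the kernel's magic-sysrq `w` dump. A minimal sketch of the standard interfaces involved follows; the `sda` device name in step 3 is an assumption (substitute the relevant array member or journal device), and the commands that write to `/proc` require root:

```shell
# 1. Ask the kernel to log a stack trace for every blocked (D-state) task,
#    producing the "sysrq: Show Blocked State" dump seen in the attachment.
#    Guarded so the script is a no-op without root:
if [ -w /proc/sysrq-trigger ]; then
    echo w > /proc/sysrq-trigger   # dump lands in dmesg / the journal
fi

# 2. List processes currently stuck in uninterruptible (D) sleep:
ps -eo pid,stat,comm | awk '$2 ~ /^D/'

# 3. md1_reclaim above is parked in wbt_wait(), i.e. block-layer writeback
#    throttling on the device receiving the md superblock write. The
#    per-device latency target can be inspected here ("sda" is a
#    placeholder; writing 0 disables wbt, for testing only):
if [ -r /sys/block/sda/queue/wbt_lat_usec ]; then
    cat /sys/block/sda/queue/wbt_lat_usec
fi
```

This does not assert that wbt is the cause; it only reflects that the reclaim thread's stack passes through `wbt_wait()` in the trace above.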
--
Chris Murphy

[-- Attachment #2: w1_trace1.txt --]
[-- Type: text/plain, Size: 27172 bytes --]

Jul 31 16:31:40 backup1 kernel: sysrq: Show Blocked State Jul 31 16:31:40 backup1 kernel: task PC stack pid father Jul 31 16:31:40 backup1 kernel: md1_reclaim D 0 910 2 0x80004000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: ? __switch_to_asm+0x34/0x70 Jul 31 16:31:40 backup1 kernel: ? __switch_to_asm+0x34/0x70 Jul 31 16:31:40 backup1 kernel: ? wbt_exit+0x30/0x30 Jul 31 16:31:40 backup1 kernel: ? __wbt_done+0x30/0x30 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: io_schedule+0x12/0x40 Jul 31 16:31:40 backup1 kernel: rq_qos_wait+0xfa/0x170 Jul 31 16:31:40 backup1 kernel: ? __switch_to_asm+0x34/0x70 Jul 31 16:31:40 backup1 kernel: ? karma_partition+0x210/0x210 Jul 31 16:31:40 backup1 kernel: ? wbt_exit+0x30/0x30 Jul 31 16:31:40 backup1 kernel: wbt_wait+0x99/0xe0 Jul 31 16:31:40 backup1 kernel: __rq_qos_throttle+0x23/0x30 Jul 31 16:31:40 backup1 kernel: blk_mq_make_request+0x12a/0x5d0 Jul 31 16:31:40 backup1 kernel: generic_make_request+0xcf/0x310 Jul 31 16:31:40 backup1 kernel: submit_bio+0x42/0x1c0 Jul 31 16:31:40 backup1 kernel: ? md_super_write.part.70+0x98/0x120 [md_mod] Jul 31 16:31:40 backup1 kernel: md_update_sb.part.71+0x3c0/0x8f0 [md_mod] Jul 31 16:31:40 backup1 kernel: r5l_do_reclaim+0x32a/0x3b0 [raid456] Jul 31 16:31:40 backup1 kernel: ? schedule_timeout+0x161/0x310 Jul 31 16:31:40 backup1 kernel: ? prepare_to_wait_event+0xb6/0x140 Jul 31 16:31:40 backup1 kernel: ? md_register_thread+0xd0/0xd0 [md_mod] Jul 31 16:31:40 backup1 kernel: md_thread+0x94/0x150 [md_mod] Jul 31 16:31:40 backup1 kernel: ? finish_wait+0x80/0x80 Jul 31 16:31:40 backup1 kernel: kthread+0x112/0x130 Jul 31 16:31:40 backup1 kernel: ? 
kthread_park+0x80/0x80 Jul 31 16:31:40 backup1 kernel: ret_from_fork+0x22/0x40 Jul 31 16:31:40 backup1 kernel: btrfs-transacti D 0 10200 2 0x80004000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: io_schedule+0x12/0x40 Jul 31 16:31:40 backup1 kernel: wait_on_page_bit+0x15c/0x230 Jul 31 16:31:40 backup1 kernel: ? file_fdatawait_range+0x20/0x20 Jul 31 16:31:40 backup1 kernel: __filemap_fdatawait_range+0x89/0xf0 Jul 31 16:31:40 backup1 kernel: ? __clear_extent_bit+0x2bd/0x440 [btrfs] Jul 31 16:31:40 backup1 kernel: filemap_fdatawait_range+0xe/0x20 Jul 31 16:31:40 backup1 kernel: __btrfs_wait_marked_extents.isra.20+0xc2/0x100 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_write_and_wait_transaction.isra.24+0x67/0xd0 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_commit_transaction+0x754/0xa40 [btrfs] Jul 31 16:31:40 backup1 kernel: ? start_transaction+0xbc/0x4d0 [btrfs] Jul 31 16:31:40 backup1 kernel: transaction_kthread+0x144/0x170 [btrfs] Jul 31 16:31:40 backup1 kernel: ? btrfs_cleanup_transaction+0x5e0/0x5e0 [btrfs] Jul 31 16:31:40 backup1 kernel: kthread+0x112/0x130 Jul 31 16:31:40 backup1 kernel: ? kthread_park+0x80/0x80 Jul 31 16:31:40 backup1 kernel: ret_from_fork+0x22/0x40 Jul 31 16:31:40 backup1 kernel: btrfs D 0 30038 30032 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: wait_for_commit+0x58/0x80 [btrfs] Jul 31 16:31:40 backup1 kernel: ? finish_wait+0x80/0x80 Jul 31 16:31:40 backup1 kernel: btrfs_commit_transaction+0x169/0xa40 [btrfs] Jul 31 16:31:40 backup1 kernel: ? btrfs_record_root_in_trans+0x56/0x60 [btrfs] Jul 31 16:31:40 backup1 kernel: ? 
start_transaction+0xbc/0x4d0 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_mksubvol+0x4f0/0x530 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_create_transid+0x170/0x180 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x11c2/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fe919fb4427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffe760eac98 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe919fb4427 Jul 31 16:31:40 backup1 kernel: RDX: 00007ffe760eacd8 RSI: 0000000050009417 RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000559cb3d14260 R08: 0000000000000008 R09: 35315f31332d3730 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000559cb3d142a0 Jul 31 16:31:40 backup1 kernel: R13: 0000559cb3d142a0 R14: 0000000000000003 R15: 0000000000000004 Jul 31 16:31:40 backup1 kernel: btrfs D 0 30247 30241 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: ? prep_new_page+0x90/0x140 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? 
d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fd2159c9427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fff421698e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd2159c9427 Jul 31 16:31:40 backup1 kernel: RDX: 00007fff42169920 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 00005631102c22c8 R09: 00007fd215a53e80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fff42169920 R15: 00005631102c22c8 Jul 31 16:31:40 backup1 kernel: btrfs D 0 30586 30580 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? 
do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f4e6e35a427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffb6891ab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4e6e35a427 Jul 31 16:31:40 backup1 kernel: RDX: 00007fffb6891af0 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055a899f7f2ca R09: 00007f4e6e3e4e80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fffb6891af0 R15: 000055a899f7f2ca Jul 31 16:31:40 backup1 kernel: btrfs D 0 30613 30607 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? 
do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f527437f427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffe0dcd07d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f527437f427 Jul 31 16:31:40 backup1 kernel: RDX: 00007ffe0dcd0810 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055bf05f592c6 R09: 00007f5274409e80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffe0dcd0810 R15: 000055bf05f592c6 Jul 31 16:31:40 backup1 kernel: btrfs D 0 31443 31437 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? 
do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f6e74eb1427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffe270079f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6e74eb1427 Jul 31 16:31:40 backup1 kernel: RDX: 00007ffe27007a30 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000056037d0422ca R09: 00007f6e74f3be80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffe27007a30 R15: 000056037d0422ca Jul 31 16:31:40 backup1 kernel: btrfs D 0 31650 31644 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? 
do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fb16f9ae427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffd266be828 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb16f9ae427 Jul 31 16:31:40 backup1 kernel: RDX: 00007ffd266be860 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 0000563ccec6d2ca R09: 00007fb16fa38e80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffd266be860 R15: 0000563ccec6d2ca Jul 31 16:31:40 backup1 kernel: btrfs D 0 31813 31807 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? 
do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f09dee75427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffeed6bea48 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f09dee75427 Jul 31 16:31:40 backup1 kernel: RDX: 00007ffeed6bea80 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055ebcdb522c8 R09: 00007f09deeffe80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffeed6bea80 R15: 000055ebcdb522c8 Jul 31 16:31:40 backup1 kernel: btrfs D 0 417 411 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? 
do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f614eb62427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffcb2850518 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f614eb62427 Jul 31 16:31:40 backup1 kernel: RDX: 00007ffcb2850550 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055f42641e2ca R09: 00007f614ebece80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffcb2850550 R15: 000055f42641e2ca Jul 31 16:31:40 backup1 kernel: btrfs D 0 573 567 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? 
do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f40626c0427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffa93d57b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f40626c0427 Jul 31 16:31:40 backup1 kernel: RDX: 00007fffa93d57f0 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000055bba6bae2c8 R09: 00007f406274ae80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fffa93d57f0 R15: 000055bba6bae2c8 Jul 31 16:31:40 backup1 kernel: btrfs D 0 811 805 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? 
do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f434d293427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffb46312c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f434d293427 Jul 31 16:31:40 backup1 kernel: RDX: 00007fffb4631300 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 000056166b0872c8 R09: 00007f434d31de80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fffb4631300 R15: 000056166b0872c8 Jul 31 16:31:40 backup1 kernel: btrfs D 0 2200 2194 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? 
do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7fd0c0e04427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007fffcdcda5d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd0c0e04427 Jul 31 16:31:40 backup1 kernel: RDX: 00007fffcdcda610 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 0000559a611ce2c7 R09: 00007fd0c0e8ee80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007fffcdcda610 R15: 0000559a611ce2c7 Jul 31 16:31:40 backup1 kernel: btrfs D 0 2699 2693 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? 
do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f9d1010e427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffd08c48a48 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9d1010e427 Jul 31 16:31:40 backup1 kernel: RDX: 00007ffd08c48a80 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 00005639550432dd R09: 00007f9d10198e80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffd08c48a80 R15: 00005639550432dd Jul 31 16:31:40 backup1 kernel: btrfs D 0 2807 2801 0x00000000 Jul 31 16:31:40 backup1 kernel: Call Trace: Jul 31 16:31:40 backup1 kernel: ? __schedule+0x2db/0x700 Jul 31 16:31:40 backup1 kernel: schedule+0x40/0xb0 Jul 31 16:31:40 backup1 kernel: rwsem_down_write_slowpath+0x329/0x4e0 Jul 31 16:31:40 backup1 kernel: btrfs_delete_subvolume+0x90/0x5f0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? d_lookup+0x25/0x50 Jul 31 16:31:40 backup1 kernel: btrfs_ioctl_snap_destroy+0x2cd/0x330 [btrfs] Jul 31 16:31:40 backup1 kernel: btrfs_ioctl+0x166c/0x2de0 [btrfs] Jul 31 16:31:40 backup1 kernel: ? tomoyo_path_number_perm+0x68/0x1e0 Jul 31 16:31:40 backup1 kernel: ? 
do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: do_vfs_ioctl+0xa4/0x680 Jul 31 16:31:40 backup1 kernel: ksys_ioctl+0x60/0x90 Jul 31 16:31:40 backup1 kernel: __x64_sys_ioctl+0x16/0x20 Jul 31 16:31:40 backup1 kernel: do_syscall_64+0x52/0x170 Jul 31 16:31:40 backup1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 31 16:31:40 backup1 kernel: RIP: 0033:0x7f0c33875427 Jul 31 16:31:40 backup1 kernel: Code: Bad RIP value. Jul 31 16:31:40 backup1 kernel: RSP: 002b:00007ffeb2ed3ac8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 Jul 31 16:31:40 backup1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0c33875427 Jul 31 16:31:40 backup1 kernel: RDX: 00007ffeb2ed3b00 RSI: 000000005000940f RDI: 0000000000000003 Jul 31 16:31:40 backup1 kernel: RBP: 0000000000000000 R08: 0000558a0ba742de R09: 00007f0c338ffe80 Jul 31 16:31:40 backup1 kernel: R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003 Jul 31 16:31:40 backup1 kernel: R13: 0000000000000003 R14: 00007ffeb2ed3b00 R15: 0000558a0ba742de Jul 31 16:32:01 backup1 CRON[5113]: pam_unix(cron:session): session opened for user root by (uid=0) Jul 31 16:32:01 backup1 CRON[5114]: (root) CMD (/opt/scripts/unstuck.sh) Jul 31 16:32:01 backup1 CRON[5113]: pam_unix(cron:session): session closed for user root Jul 31 16:32:03 backup1 sudo[5137]: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_btrfs --allocated-critical-percent 90 --allocated-warning-percent 75 --mountpoint /mnt/data Jul 31 16:32:03 backup1 sudo[5137]: pam_unix(sudo:session): session opened for user root by (uid=0) Jul 31 16:32:03 backup1 sudo[5137]: pam_unix(sudo:session): session closed for user root Jul 31 16:32:09 backup1 sudo[5157]: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_ipmi_sensor Jul 31 16:32:09 backup1 sudo[5157]: pam_unix(sudo:session): session opened for user root by (uid=0) Jul 31 16:32:10 backup1 sudo[5157]: pam_unix(sudo:session): session closed for user 
root ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <2f2f1c21-c81b-55aa-6f77-e2d3f32d32cb@xmyslivec.cz>]
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  [not found] ` <2f2f1c21-c81b-55aa-6f77-e2d3f32d32cb@xmyslivec.cz>
@ 2020-08-19 22:58   ` Chris Murphy
  2020-08-26 15:35     ` Vojtech Myslivec
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-08-19 22:58 UTC (permalink / raw)
To: Vojtech Myslivec
Cc: Chris Murphy, Michal Moravec, Btrfs BTRFS, Linux-RAID, Song Liu

On Wed, Aug 19, 2020 at 11:29 AM Vojtech Myslivec <vojtech@xmyslivec.cz> wrote:
>
> Linux backup1 5.7.0-0.bpo.2-amd64 #1 SMP Debian 5.7.10-1~bpo10+1

Should be new enough; I don't see raid-related md changes between 5.7.10
and 5.7.16. I haven't looked at 5.8, but 5.7 is probably recent enough
to tell whether there are relevant changes in 5.8 worth testing.

>
> - `5.7_profs.txt`
> - dump of the whole /proc when the issue happened

The problem here, I think, is that /proc/pid/stack is empty. You might
have to hammer on it a bunch of times to get a stack. I can't tell
whether the sysrq+w output is enough to conclusively say this is
strictly an md problem or whether something else is going on.

But I do see in the sysrq+w output evidence of a Btrfs snapshot
happening, which will result in a flush of the file system. And the
mdadm raid journal is on two SSDs, which should be fast enough to accept
the metadata changes before the flush actually happens.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread
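[A minimal sketch of the "hammer on it a bunch of times" sampling Chris suggests: read the kernel stack of a thread in a loop and keep only the distinct traces. The function name `sample_stack` and the `STACK_FILE`/`STACK_DELAY` override variables are illustrative assumptions, not an existing tool; reading `/proc/<pid>/stack` for real normally requires root.]

```shell
# Repeatedly read /proc/<pid>/stack and print each distinct trace once.
# Usage: sample_stack <pid> [samples]
sample_stack() {
    pid="$1"
    n="${2:-20}"
    # Overrides exist only so this sketch can be exercised without root:
    file="${STACK_FILE:-/proc/$pid/stack}"
    delay="${STACK_DELAY:-1}"
    i=0
    while [ "$i" -lt "$n" ]; do
        cat "$file" 2>/dev/null   # empty reads simply produce no sample
        echo                      # blank line terminates one sample
        sleep "$delay"
        i=$((i + 1))
    done | awk 'BEGIN { RS = "" } !seen[$0]++ { print $0; print "" }'
}
```

[Run against the pid of `md1_raid6`, a loop of this kind would yield the handful of near-identical traces, differing only in instruction offsets, that are attached further down in the thread.]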
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-08-19 22:58 ` Chris Murphy
@ 2020-08-26 15:35   ` Vojtech Myslivec
  2020-08-26 18:07     ` Chris Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Vojtech Myslivec @ 2020-08-26 15:35 UTC (permalink / raw)
To: Chris Murphy, Song Liu; +Cc: Michal Moravec, Btrfs BTRFS, Linux-RAID

[-- Attachment #1: Type: text/plain, Size: 2898 bytes --]

Hi again,

On 20. 08. 20 0:58, Chris Murphy wrote:
> The problem here I think is that /proc/pid/stack is empty. You might
> have to hammer on it a bunch of times to get a stack. I can't tell if
> the sysrq+w is enough information to conclusively tell if this is
> strictly an md problem or if there's something else going on.
>
> But I do see in the sysrq+w evidence of a Btrfs snapshot happening,
> which will result in a flush of the file system. Since the mdadm raid
> journal is on two SSDs which should be fast enough to accept the
> metadata changes before actually doing the flush.

We were lucky to dump some stacks yesterday, when the issue happened
again.

`sysrq.log` contains the dmesg/kern.log output of the

    echo w > /proc/sysrq-trigger

command from the moment `md1_raid6` started to consume 100% CPU and all
btrfs-snapshot-related commands got stuck in the *disk sleep* state. We
issued the sysrq+w command several times and I have included every
unique task that appeared there.

`stack_md1_reclaim.txt`, `stack_btrfs-transacti.txt` and
`stack_btrfs*.txt` contain the output of

    cat /proc/<pid>/stack

for the given processes during the same period. The output stayed the
same when we issued this command several times.

`stack_md1_raid6-[0-3].txt` contain the output of the same command,
where pid is the `md1_raid6` process id. We issued it several times and
the output sometimes differs slightly; if I read it right, it differs
only in the instruction offsets within the given functions. I include
all the combinations we encountered in case it matters.
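[Extracting the unique blocked tasks from such a sysrq+w dump can be done mechanically. A rough sketch, assuming the dmesg line format quoted in sysrq.log below; the helper name `list_blocked` is made up for illustration:]

```shell
# Read dmesg text on stdin; print each task name that the sysrq-w dump
# reported in the D (uninterruptible sleep) state, de-duplicated.
list_blocked() {
    sed 's/^\[[^]]*\][[:space:]]*//' |   # drop the "[Tue Aug 25 ...]" prefix
        awk '$2 == "D" { print $1 }' |   # task header: "<name> D 0 <pid> ..."
        sort -u
}
```

[Typical use would be `echo w > /proc/sysrq-trigger` as root, then `dmesg | list_blocked`.]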
We dumped the stack of this process in a while-true loop just after we
performed the manual "unstuck" workaround (I described this action in
the first e-mail). I can send it to the mailing list as well if needed.

I hope these outputs include what you, Song, requested on July the 30th
(and I hope it is OK to continue in this thread).

On 30. 07. 20 8:45, Song Liu wrote:
>> On Jul 29, 2020, at 2:06 PM, Guoqing Jiang wrote:
>>
>> Hi,
>>
>> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
>>> 1. What should be the cause of this problem?
>>
>> Just a quick glance based on the stacks which you attached, I guess
>> it could be a deadlock issue of raid5 cache super write.
>>
>> Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in
>> superblock write") didn't fix the problem completely. Cc Song.
>>
>> And I am curious why md thread is not waked if mddev_trylock fails,
>> you can give it a try but I can't promise it helps ...
>
> Thanks Guoqing!
>
> I am not sure whether we hit the mddev_trylock() failure. Looks like
> the md1_raid6 thread is already running at 100%.
>
> A few questions:
>
> 1. I see wbt_wait in the stack trace. Are we using write back
>    throttling here?
> 2. Could you please get the /proc/<pid>/stack for <pid> of md1_raid6?
>    We may want to sample it multiple times.
>
> Thanks,
> Song

Thanks,
Vojtech and Michal

[-- Attachment #2: stack_md1_raid6-0.txt --]
[-- Type: text/plain, Size: 266 bytes --]

[<0>] ops_run_io+0x40/0xd80 [raid456]
[<0>] handle_stripe+0xc32/0x2230 [raid456]
[<0>] handle_active_stripes.isra.71+0x3b8/0x590 [raid456]
[<0>] raid5d+0x392/0x5b0 [raid456]
[<0>] md_thread+0x94/0x150 [md_mod]
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x22/0x40

[-- Attachment #3: stack_md1_raid6-1.txt --]
[-- Type: text/plain, Size: 266 bytes --]

[<0>] ops_run_io+0x40/0xd80 [raid456]
[<0>] handle_stripe+0x136/0x2230 [raid456]
[<0>] handle_active_stripes.isra.71+0x3b8/0x590 [raid456]
[<0>] raid5d+0x392/0x5b0 [raid456]
[<0>] md_thread+0x94/0x150 [md_mod]
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x22/0x40

[-- Attachment #4: stack_md1_raid6-2.txt --]
[-- Type: text/plain, Size: 266 bytes --]

[<0>] ops_run_io+0x40/0xd80 [raid456]
[<0>] handle_stripe+0xc32/0x2230 [raid456]
[<0>] handle_active_stripes.isra.71+0x485/0x590 [raid456]
[<0>] raid5d+0x392/0x5b0 [raid456]
[<0>] md_thread+0x94/0x150 [md_mod]
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x22/0x40

[-- Attachment #5: stack_md1_raid6-3.txt --]
[-- Type: text/plain, Size: 266 bytes --]

[<0>] ops_run_io+0x40/0xd80 [raid456]
[<0>] handle_stripe+0x136/0x2230 [raid456]
[<0>] handle_active_stripes.isra.71+0x420/0x590 [raid456]
[<0>] raid5d+0x392/0x5b0 [raid456]
[<0>] md_thread+0x94/0x150 [md_mod]
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x22/0x40

[-- Attachment #6: stack_md1_reclaim.txt --]
[-- Type: text/plain, Size: 375 bytes --]

[<0>] rq_qos_wait+0xfa/0x170
[<0>] wbt_wait+0x98/0xe0
[<0>] __rq_qos_throttle+0x23/0x30
[<0>] blk_mq_make_request+0x12a/0x5d0
[<0>] generic_make_request+0xcf/0x310
[<0>] submit_bio+0x42/0x1c0
[<0>] md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
[<0>] r5l_do_reclaim+0x32a/0x3b0 [raid456]
[<0>] md_thread+0x94/0x150 [md_mod]
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x22/0x40

[-- Attachment #7: stack_btrfs-transacti.txt --]
[-- Type: text/plain, Size: 395 bytes --]

[<0>] wait_on_page_bit+0x15c/0x230
[<0>] __filemap_fdatawait_range+0x89/0xf0
[<0>] filemap_fdatawait_range+0xe/0x20
[<0>] __btrfs_wait_marked_extents.isra.19+0xc2/0x100 [btrfs]
[<0>] btrfs_write_and_wait_transaction.isra.23+0x67/0xd0 [btrfs]
[<0>] btrfs_commit_transaction+0x755/0xa50 [btrfs]
[<0>] transaction_kthread+0x144/0x170 [btrfs]
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x22/0x40

[-- Attachment #8: stack_btrfs32445.txt --]
[-- Type: text/plain, Size: 165 bytes --]

[<0>] do_wait+0x1c7/0x230
[<0>] kernel_wait4+0xa6/0x140
[<0>] __do_sys_wait4+0x83/0x90
[<0>] do_syscall_64+0x52/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[-- Attachment #9: stack_btrfs32277.txt --]
[-- Type: text/plain, Size: 319 bytes --]

[<0>] rwsem_down_write_slowpath+0x329/0x4e0
[<0>] btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[<0>] btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[<0>] btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[<0>] ksys_ioctl+0x86/0xc0
[<0>] __x64_sys_ioctl+0x16/0x20
[<0>] do_syscall_64+0x52/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[-- Attachment #10: stack_btrfs31357.txt --]
[-- Type: text/plain, Size: 319 bytes --]

[<0>] rwsem_down_write_slowpath+0x329/0x4e0
[<0>] btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[<0>] btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[<0>] btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[<0>] ksys_ioctl+0x86/0xc0
[<0>] __x64_sys_ioctl+0x16/0x20
[<0>] do_syscall_64+0x52/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[-- Attachment #11: stack_btrfs31094.txt --]
[-- Type: text/plain, Size: 319 bytes --]

[<0>] rwsem_down_write_slowpath+0x329/0x4e0
[<0>] btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[<0>] btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[<0>] btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[<0>] ksys_ioctl+0x86/0xc0
[<0>] __x64_sys_ioctl+0x16/0x20
[<0>] do_syscall_64+0x52/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[-- Attachment #12: stack_btrfs30168.txt --]
[-- Type: text/plain, Size: 411 bytes --]

[<0>] wait_for_commit+0x58/0x80
[btrfs] [<0>] btrfs_commit_transaction+0x169/0xa50 [btrfs] [<0>] btrfs_mksubvol+0x2e9/0x450 [btrfs] [<0>] __btrfs_ioctl_snap_create+0x167/0x170 [btrfs] [<0>] btrfs_ioctl_snap_create_v2+0xc5/0xe0 [btrfs] [<0>] btrfs_ioctl+0xf61/0x2ef0 [btrfs] [<0>] ksys_ioctl+0x86/0xc0 [<0>] __x64_sys_ioctl+0x16/0x20 [<0>] do_syscall_64+0x52/0x170 [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [-- Attachment #13: sysrq.log --] [-- Type: text/x-log, Size: 20285 bytes --] [Tue Aug 25 16:41:16 2020] sysrq: Show Blocked State [Tue Aug 25 16:41:16 2020] task PC stack pid father [Tue Aug 25 16:41:16 2020] md1_reclaim D 0 806 2 0x80004000 [Tue Aug 25 16:41:16 2020] Call Trace: [Tue Aug 25 16:41:16 2020] __schedule+0x2dd/0x710 [Tue Aug 25 16:41:16 2020] ? __switch_to_asm+0x34/0x70 [Tue Aug 25 16:41:16 2020] ? __switch_to_asm+0x34/0x70 [Tue Aug 25 16:41:16 2020] ? wbt_exit+0x30/0x30 [Tue Aug 25 16:41:16 2020] ? __wbt_done+0x30/0x30 [Tue Aug 25 16:41:16 2020] schedule+0x40/0xb0 [Tue Aug 25 16:41:16 2020] io_schedule+0x12/0x40 [Tue Aug 25 16:41:16 2020] rq_qos_wait+0xfa/0x170 [Tue Aug 25 16:41:16 2020] ? __switch_to_asm+0x34/0x70 [Tue Aug 25 16:41:16 2020] ? karma_partition+0x1e0/0x1e0 [Tue Aug 25 16:41:16 2020] ? wbt_exit+0x30/0x30 [Tue Aug 25 16:41:16 2020] wbt_wait+0x98/0xe0 [Tue Aug 25 16:41:16 2020] __rq_qos_throttle+0x23/0x30 [Tue Aug 25 16:41:16 2020] blk_mq_make_request+0x12a/0x5d0 [Tue Aug 25 16:41:16 2020] generic_make_request+0xcf/0x310 [Tue Aug 25 16:41:16 2020] submit_bio+0x42/0x1c0 [Tue Aug 25 16:41:16 2020] ? md_super_write.part.70+0x98/0x120 [md_mod] [Tue Aug 25 16:41:16 2020] md_update_sb.part.71+0x3c0/0x8f0 [md_mod] [Tue Aug 25 16:41:16 2020] r5l_do_reclaim+0x32a/0x3b0 [raid456] [Tue Aug 25 16:41:16 2020] ? schedule_timeout+0x162/0x340 [Tue Aug 25 16:41:16 2020] ? prepare_to_wait_event+0xb6/0x140 [Tue Aug 25 16:41:16 2020] ? md_register_thread+0xd0/0xd0 [md_mod] [Tue Aug 25 16:41:16 2020] md_thread+0x94/0x150 [md_mod] [Tue Aug 25 16:41:16 2020] ? 
finish_wait+0x80/0x80 [Tue Aug 25 16:41:16 2020] kthread+0x112/0x130 [Tue Aug 25 16:41:16 2020] ? kthread_park+0x80/0x80 [Tue Aug 25 16:41:16 2020] ret_from_fork+0x22/0x40 [Tue Aug 25 16:41:16 2020] btrfs-transacti D 0 893 2 0x80004000 [Tue Aug 25 16:41:16 2020] Call Trace: [Tue Aug 25 16:41:16 2020] __schedule+0x2dd/0x710 [Tue Aug 25 16:41:16 2020] schedule+0x40/0xb0 [Tue Aug 25 16:41:16 2020] io_schedule+0x12/0x40 [Tue Aug 25 16:41:16 2020] wait_on_page_bit+0x15c/0x230 [Tue Aug 25 16:41:16 2020] ? file_fdatawait_range+0x20/0x20 [Tue Aug 25 16:41:16 2020] __filemap_fdatawait_range+0x89/0xf0 [Tue Aug 25 16:41:16 2020] filemap_fdatawait_range+0xe/0x20 [Tue Aug 25 16:41:16 2020] __btrfs_wait_marked_extents.isra.19+0xc2/0x100 [btrfs] [Tue Aug 25 16:41:16 2020] btrfs_write_and_wait_transaction.isra.23+0x67/0xd0 [btrfs] [Tue Aug 25 16:41:16 2020] btrfs_commit_transaction+0x755/0xa50 [btrfs] [Tue Aug 25 16:41:16 2020] ? start_transaction+0xd5/0x540 [btrfs] [Tue Aug 25 16:41:16 2020] transaction_kthread+0x144/0x170 [btrfs] [Tue Aug 25 16:41:16 2020] ? btrfs_cleanup_transaction+0x590/0x590 [btrfs] [Tue Aug 25 16:41:16 2020] kthread+0x112/0x130 [Tue Aug 25 16:41:16 2020] ? kthread_park+0x80/0x80 [Tue Aug 25 16:41:16 2020] ret_from_fork+0x22/0x40 [Tue Aug 25 16:41:16 2020] btrfs D 0 30168 30162 0x00000000 [Tue Aug 25 16:41:16 2020] Call Trace: [Tue Aug 25 16:41:16 2020] __schedule+0x2dd/0x710 [Tue Aug 25 16:41:16 2020] schedule+0x40/0xb0 [Tue Aug 25 16:41:16 2020] wait_for_commit+0x58/0x80 [btrfs] [Tue Aug 25 16:41:16 2020] ? finish_wait+0x80/0x80 [Tue Aug 25 16:41:16 2020] btrfs_commit_transaction+0x169/0xa50 [btrfs] [Tue Aug 25 16:41:16 2020] ? btrfs_record_root_in_trans+0x56/0x60 [btrfs] [Tue Aug 25 16:41:16 2020] ? 
start_transaction+0xd5/0x540 [btrfs] [Tue Aug 25 16:41:16 2020] btrfs_mksubvol+0x2e9/0x450 [btrfs] [Tue Aug 25 16:41:16 2020] __btrfs_ioctl_snap_create+0x167/0x170 [btrfs] [Tue Aug 25 16:41:16 2020] btrfs_ioctl_snap_create_v2+0xc5/0xe0 [btrfs] [Tue Aug 25 16:41:16 2020] btrfs_ioctl+0xf61/0x2ef0 [btrfs] [Tue Aug 25 16:41:16 2020] ? ksys_ioctl+0x86/0xc0 [Tue Aug 25 16:41:16 2020] ksys_ioctl+0x86/0xc0 [Tue Aug 25 16:41:16 2020] __x64_sys_ioctl+0x16/0x20 [Tue Aug 25 16:41:16 2020] do_syscall_64+0x52/0x170 [Tue Aug 25 16:41:16 2020] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f273259b427 [Tue Aug 25 16:41:16 2020] Code: Bad RIP value. [Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffe48178418 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 [Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f273259b427 [Tue Aug 25 16:41:16 2020] RDX: 00007ffe48178458 RSI: 0000000050009417 RDI: 0000000000000003 [Tue Aug 25 16:41:16 2020] RBP: 00005637741dc260 R08: 0000000000000008 R09: 0000000035322d38 [Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000206 R12: 00005637741dc2b0 [Tue Aug 25 16:41:16 2020] R13: 00005637741dc2b0 R14: 0000000000000003 R15: 0000000000000004 [Tue Aug 25 16:41:16 2020] btrfs D 0 31094 31088 0x00000000 [Tue Aug 25 16:41:16 2020] Call Trace: [Tue Aug 25 16:41:16 2020] __schedule+0x2dd/0x710 [Tue Aug 25 16:41:16 2020] schedule+0x40/0xb0 [Tue Aug 25 16:41:16 2020] rwsem_down_write_slowpath+0x329/0x4e0 [Tue Aug 25 16:41:16 2020] ? kmem_cache_alloc_trace+0x15e/0x230 [Tue Aug 25 16:41:16 2020] btrfs_delete_subvolume+0x90/0x5f0 [btrfs] [Tue Aug 25 16:41:16 2020] ? d_lookup+0x25/0x50 [Tue Aug 25 16:41:16 2020] btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs] [Tue Aug 25 16:41:16 2020] btrfs_ioctl+0xa5f/0x2ef0 [btrfs] [Tue Aug 25 16:41:16 2020] ? 
ksys_ioctl+0x86/0xc0 [Tue Aug 25 16:41:16 2020] ksys_ioctl+0x86/0xc0 [Tue Aug 25 16:41:16 2020] __x64_sys_ioctl+0x16/0x20 [Tue Aug 25 16:41:16 2020] do_syscall_64+0x52/0x170 [Tue Aug 25 16:41:16 2020] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [Tue Aug 25 16:41:16 2020] RIP: 0033:0x7ff3fa6a3427 [Tue Aug 25 16:41:16 2020] Code: Bad RIP value. [Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffe92fc4058 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff3fa6a3427 [Tue Aug 25 16:41:16 2020] RDX: 00007ffe92fc4090 RSI: 000000005000940f RDI: 0000000000000003 [Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 000055b4e34ab2ca R09: 00007ff3fa72de80 [Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003 [Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007ffe92fc4090 R15: 000055b4e34ab2ca [Tue Aug 25 16:41:16 2020] btrfs D 0 31357 31351 0x00000000 [Tue Aug 25 16:41:16 2020] Call Trace: [Tue Aug 25 16:41:16 2020] __schedule+0x2dd/0x710 [Tue Aug 25 16:41:16 2020] schedule+0x40/0xb0 [Tue Aug 25 16:41:16 2020] rwsem_down_write_slowpath+0x329/0x4e0 [Tue Aug 25 16:41:16 2020] ? kmem_cache_alloc_trace+0x15e/0x230 [Tue Aug 25 16:41:16 2020] btrfs_delete_subvolume+0x90/0x5f0 [btrfs] [Tue Aug 25 16:41:16 2020] ? d_lookup+0x25/0x50 [Tue Aug 25 16:41:16 2020] btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs] [Tue Aug 25 16:41:16 2020] btrfs_ioctl+0xa5f/0x2ef0 [btrfs] [Tue Aug 25 16:41:16 2020] ? ksys_ioctl+0x86/0xc0 [Tue Aug 25 16:41:16 2020] ksys_ioctl+0x86/0xc0 [Tue Aug 25 16:41:16 2020] __x64_sys_ioctl+0x16/0x20 [Tue Aug 25 16:41:16 2020] do_syscall_64+0x52/0x170 [Tue Aug 25 16:41:16 2020] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f9c14290427 [Tue Aug 25 16:41:16 2020] Code: Bad RIP value. 
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007fff4c9f8ed8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 [Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9c14290427 [Tue Aug 25 16:41:16 2020] RDX: 00007fff4c9f8f10 RSI: 000000005000940f RDI: 0000000000000003 [Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 000055ea83aa82c8 R09: 00007f9c1431ae80 [Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 [Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007fff4c9f8f10 R15: 000055ea83aa82c8 [Tue Aug 25 16:41:16 2020] btrfs D 0 32277 32271 0x00000000 [Tue Aug 25 16:41:16 2020] Call Trace: [Tue Aug 25 16:41:16 2020] __schedule+0x2dd/0x710 [Tue Aug 25 16:41:16 2020] schedule+0x40/0xb0 [Tue Aug 25 16:41:16 2020] rwsem_down_write_slowpath+0x329/0x4e0 [Tue Aug 25 16:41:16 2020] ? kmem_cache_alloc_trace+0x15e/0x230 [Tue Aug 25 16:41:16 2020] btrfs_delete_subvolume+0x90/0x5f0 [btrfs] [Tue Aug 25 16:41:16 2020] ? d_lookup+0x25/0x50 [Tue Aug 25 16:41:16 2020] btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs] [Tue Aug 25 16:41:16 2020] btrfs_ioctl+0xa5f/0x2ef0 [btrfs] [Tue Aug 25 16:41:16 2020] ? ksys_ioctl+0x86/0xc0 [Tue Aug 25 16:41:16 2020] ksys_ioctl+0x86/0xc0 [Tue Aug 25 16:41:16 2020] __x64_sys_ioctl+0x16/0x20 [Tue Aug 25 16:41:16 2020] do_syscall_64+0x52/0x170 [Tue Aug 25 16:41:16 2020] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f0b35736427 [Tue Aug 25 16:41:16 2020] Code: Bad RIP value. 
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007fff96ad7c48 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 [Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0b35736427 [Tue Aug 25 16:41:16 2020] RDX: 00007fff96ad7c80 RSI: 000000005000940f RDI: 0000000000000003 [Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 000055e2598b62c8 R09: 00007f0b357c0e80 [Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003 [Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007fff96ad7c80 R15: 000055e2598b62c8 [Tue Aug 25 16:41:16 2020] btrfs D 0 32451 32445 0x00000000 [Tue Aug 25 16:41:16 2020] Call Trace: [Tue Aug 25 16:41:16 2020] __schedule+0x2dd/0x710 [Tue Aug 25 16:41:16 2020] schedule+0x40/0xb0 [Tue Aug 25 16:41:16 2020] rwsem_down_write_slowpath+0x329/0x4e0 [Tue Aug 25 16:41:16 2020] ? kmem_cache_alloc_trace+0x15e/0x230 [Tue Aug 25 16:41:16 2020] btrfs_delete_subvolume+0x90/0x5f0 [btrfs] [Tue Aug 25 16:41:16 2020] ? d_lookup+0x25/0x50 [Tue Aug 25 16:41:16 2020] btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs] [Tue Aug 25 16:41:16 2020] btrfs_ioctl+0xa5f/0x2ef0 [btrfs] [Tue Aug 25 16:41:16 2020] ? ksys_ioctl+0x86/0xc0 [Tue Aug 25 16:41:16 2020] ksys_ioctl+0x86/0xc0 [Tue Aug 25 16:41:16 2020] __x64_sys_ioctl+0x16/0x20 [Tue Aug 25 16:41:16 2020] do_syscall_64+0x52/0x170 [Tue Aug 25 16:41:16 2020] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f0517ff8427 [Tue Aug 25 16:41:16 2020] Code: Bad RIP value. 
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffc862d3c98 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0517ff8427 [Tue Aug 25 16:41:16 2020] RDX: 00007ffc862d3cd0 RSI: 000000005000940f RDI: 0000000000000003 [Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 0000555d6d7872ca R09: 00007f0518082e80 [Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003 [Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007ffc862d3cd0 R15: 0000555d6d7872ca [Tue Aug 25 16:41:16 2020] btrfs D 0 596 590 0x00000000 [Tue Aug 25 16:41:16 2020] Call Trace: [Tue Aug 25 16:41:16 2020] __schedule+0x2dd/0x710 [Tue Aug 25 16:41:16 2020] schedule+0x40/0xb0 [Tue Aug 25 16:41:16 2020] rwsem_down_write_slowpath+0x329/0x4e0 [Tue Aug 25 16:41:16 2020] ? kmem_cache_alloc_trace+0x15e/0x230 [Tue Aug 25 16:41:16 2020] btrfs_delete_subvolume+0x90/0x5f0 [btrfs] [Tue Aug 25 16:41:16 2020] ? d_lookup+0x25/0x50 [Tue Aug 25 16:41:16 2020] btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs] [Tue Aug 25 16:41:16 2020] btrfs_ioctl+0xa5f/0x2ef0 [btrfs] [Tue Aug 25 16:41:16 2020] ? ksys_ioctl+0x86/0xc0 [Tue Aug 25 16:41:16 2020] ksys_ioctl+0x86/0xc0 [Tue Aug 25 16:41:16 2020] __x64_sys_ioctl+0x16/0x20 [Tue Aug 25 16:41:16 2020] do_syscall_64+0x52/0x170 [Tue Aug 25 16:41:16 2020] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [Tue Aug 25 16:41:16 2020] RIP: 0033:0x7ffa2b954427 [Tue Aug 25 16:41:16 2020] Code: Bad RIP value. 
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffc88d95748 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffa2b954427
[Tue Aug 25 16:41:16 2020] RDX: 00007ffc88d95780 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 0000558d2b97f2c8 R09: 00007ffa2b9dee80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007ffc88d95780 R15: 0000558d2b97f2c8
[Tue Aug 25 16:41:16 2020] btrfs           D    0   792    786 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f74f06d2427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007fffbb878738 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f74f06d2427
[Tue Aug 25 16:41:16 2020] RDX: 00007fffbb878770 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 000055f062e442c7 R09: 00007f74f075ce80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007fffbb878770 R15: 000055f062e442c7
[Tue Aug 25 16:41:16 2020] btrfs           D    0   833    827 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f65fee13427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffc45aa0508 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f65fee13427
[Tue Aug 25 16:41:16 2020] RDX: 00007ffc45aa0540 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 000055d819de32c6 R09: 00007f65fee9de80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007ffc45aa0540 R15: 000055d819de32c6
[Tue Aug 25 16:41:16 2020] btrfs           D    0 10542  10515 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f48b0d4a427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffdf1ab4fd8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f48b0d4a427
[Tue Aug 25 16:41:16 2020] RDX: 00007ffdf1ab5010 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 00005631cce8a2c8 R09: 00007f48b0dd4e80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007ffdf1ab5010 R15: 00005631cce8a2c8
[Tue Aug 25 16:41:16 2020] btrfs           D    0  1667   1661 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f759eff1427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffcc4fb06c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f759eff1427
[Tue Aug 25 16:41:16 2020] RDX: 00007ffcc4fb0700 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 00005584f7c5a2ca R09: 00007f759f07be80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007ffcc4fb0700 R15: 00005584f7c5a2ca
[Tue Aug 25 16:41:16 2020] btrfs           D    0  2473   2467 0x00000000
[Tue Aug 25 16:41:16 2020] Call Trace:
[Tue Aug 25 16:41:16 2020]  __schedule+0x2dd/0x710
[Tue Aug 25 16:41:16 2020]  schedule+0x40/0xb0
[Tue Aug 25 16:41:16 2020]  rwsem_down_write_slowpath+0x329/0x4e0
[Tue Aug 25 16:41:16 2020]  ? kmem_cache_alloc_trace+0x15e/0x230
[Tue Aug 25 16:41:16 2020]  btrfs_delete_subvolume+0x90/0x5f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? d_lookup+0x25/0x50
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl_snap_destroy+0x5b3/0x6f0 [btrfs]
[Tue Aug 25 16:41:16 2020]  btrfs_ioctl+0xa5f/0x2ef0 [btrfs]
[Tue Aug 25 16:41:16 2020]  ? ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  ksys_ioctl+0x86/0xc0
[Tue Aug 25 16:41:16 2020]  __x64_sys_ioctl+0x16/0x20
[Tue Aug 25 16:41:16 2020]  do_syscall_64+0x52/0x170
[Tue Aug 25 16:41:16 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 25 16:41:16 2020] RIP: 0033:0x7f278ea1a427
[Tue Aug 25 16:41:16 2020] Code: Bad RIP value.
[Tue Aug 25 16:41:16 2020] RSP: 002b:00007ffd784d5318 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[Tue Aug 25 16:41:16 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f278ea1a427
[Tue Aug 25 16:41:16 2020] RDX: 00007ffd784d5350 RSI: 000000005000940f RDI: 0000000000000003
[Tue Aug 25 16:41:16 2020] RBP: 0000000000000000 R08: 000055b8f4d412de R09: 00007f278eaa4e80
[Tue Aug 25 16:41:16 2020] R10: fffffffffffffb66 R11: 0000000000000206 R12: 0000000000000003
[Tue Aug 25 16:41:16 2020] R13: 0000000000000003 R14: 00007ffd784d5350 R15: 000055b8f4d412de

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-08-26 15:35   ` Vojtech Myslivec
@ 2020-08-26 18:07     ` Chris Murphy
  2020-09-16  9:42       ` Vojtech Myslivec
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-08-26 18:07 UTC (permalink / raw)
  To: Vojtech Myslivec
  Cc: Chris Murphy, Song Liu, Michal Moravec, Btrfs BTRFS, Linux-RAID

OK so from the attachments..

cat /proc/<pid>/stack for md1_raid6

[<0>] rq_qos_wait+0xfa/0x170
[<0>] wbt_wait+0x98/0xe0
[<0>] __rq_qos_throttle+0x23/0x30
[<0>] blk_mq_make_request+0x12a/0x5d0
[<0>] generic_make_request+0xcf/0x310
[<0>] submit_bio+0x42/0x1c0
[<0>] md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
[<0>] r5l_do_reclaim+0x32a/0x3b0 [raid456]
[<0>] md_thread+0x94/0x150 [md_mod]
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x22/0x40

Btrfs snapshot flushing might instigate the problem, but it seems to me
there's some kind of contention or blocking happening within md, and
that's why everything stalls. But I can't tell why.

Do you have any iostat output at the time of this problem? I'm
wondering if md is waiting on disks. If not, try `iostat -dxm 5` and
share a few minutes before and after the freeze/hang.

--
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread
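The capture Chris asks for can be left running unattended so the window around a freeze is covered automatically. A rough sketch (the md thread PID, the pass count and the log path are placeholders; on the affected host you would run it as an endless loop and let `iostat -dxm 5` provide the pacing):

```shell
# Hedged sketch of a capture loop: log a timestamp, the md thread's
# kernel stack (readable only as root), and one quick iostat sample
# per pass.  MD_PID, the pass count and the log path are placeholders.
MD_PID=${MD_PID:-1}
LOG=./md-hang.log
: > "$LOG"
for pass in 1 2 3; do                      # on the server: while :; do
    date >> "$LOG"
    # kernel stack of the (possibly stuck) md thread
    { [ -r /proc/"$MD_PID"/stack ] && cat /proc/"$MD_PID"/stack >> "$LOG"; } || true
    # one since-boot sample here; use `iostat -dxm 5` continuously on the server
    { command -v iostat > /dev/null && iostat -dxm >> "$LOG"; } || true
done
```

The resulting log can then be correlated with the hung-task timestamps in dmesg.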
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-08-26 18:07     ` Chris Murphy
@ 2020-09-16  9:42       ` Vojtech Myslivec
  2020-09-17 17:08         ` Chris Murphy
  2020-09-17 17:43         ` Chris Murphy
  0 siblings, 2 replies; 22+ messages in thread
From: Vojtech Myslivec @ 2020-09-16  9:42 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Song Liu, Michal Moravec, Btrfs BTRFS, Linux-RAID

[-- Attachment #1: Type: text/plain, Size: 2012 bytes --]

Hello,

it seems my last e-mail was filtered out, as I can't find it in the
archives, so I will resend it and include all attachments in one tarball.

On 26. 08. 20 20:07, Chris Murphy wrote:
> OK so from the attachments..
>
> cat /proc/<pid>/stack for md1_raid6
>
> [<0>] rq_qos_wait+0xfa/0x170
> [<0>] wbt_wait+0x98/0xe0
> [<0>] __rq_qos_throttle+0x23/0x30
> [<0>] blk_mq_make_request+0x12a/0x5d0
> [<0>] generic_make_request+0xcf/0x310
> [<0>] submit_bio+0x42/0x1c0
> [<0>] md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
> [<0>] r5l_do_reclaim+0x32a/0x3b0 [raid456]
> [<0>] md_thread+0x94/0x150 [md_mod]
> [<0>] kthread+0x112/0x130
> [<0>] ret_from_fork+0x22/0x40
>
> Btrfs snapshot flushing might instigate the problem but it seems to me
> there's some kind of contention or blocking happening within md, and
> that's why everything stalls. But I can't tell why.
>
> Do you have any iostat output at the time of this problem? I'm
> wondering if md is waiting on disks. If not, try `iostat -dxm 5` and
> share a few minutes before and after the freeze/hang.

We detected the issue on Monday 31. 08. 2020 at 15:24. It must have
happened sometime between 15:22 and 15:24, as we monitor the state every
2 minutes.

We have recorded the stacks of the blocked processes, the sysrq+w
output, and the requested `iostat`. Then, at 15:45, we performed the
manual "unstick" procedure by accessing the md1 device with a dd command
(reading a few random blocks).

I hope the attached file names are self-explanatory. Please let me know
if we can do anything more to track the issue down or if I forgot
something.

Thanks a lot,
Vojtech and Michal


Description of the devices in iostat, just for recap:
- sda-sdf: 6 HDD disks
- sdg, sdh: 2 SSD disks

- md0: raid1 over sdg1 and sdh1 ("SSD RAID", Physical Volume for LVM)
- md1: our "problematic" raid6 over sda-sdf ("HDD RAID", btrfs
  formatted)

- Logical volumes over the md0 Physical Volume (on the SSD RAID):
  - dm-0: 4G LV for SWAP
  - dm-1: 16G LV for the root file system (ext4 formatted)
  - dm-2: 1G LV for the md1 journal

[-- Attachment #2: mdraid-btrfs-issue.tgz --]
[-- Type: application/x-compressed-tar, Size: 43544 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread
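The manual "unstick" described above (reading a few random blocks off the array with dd) can be written down as a small sketch. /dev/md1 is this thread's array; the offsets are arbitrary, and the scratch-file fallback exists only so the sketch can be exercised off the server:

```shell
# Hedged sketch of the dd "unstick" workaround from this thread.
DEV=${DEV:-/dev/md1}
if [ ! -e "$DEV" ]; then
    # stand-in so the commands can be run outside the affected host
    DEV=./scratch.img
    dd if=/dev/zero of="$DEV" bs=4096 count=16384 2> /dev/null   # 64 MiB
fi
for off in 100 5000 9000; do             # arbitrary 4 KiB block offsets
    dd if="$DEV" of=/dev/null bs=4096 count=8 skip="$off" 2> /dev/null
done
```

Why reading the device sometimes wakes the stalled `md1_raid6` thread is exactly the open question of this thread; the sketch only records the workaround, not an explanation.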
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-09-16  9:42       ` Vojtech Myslivec
@ 2020-09-17 17:08         ` Chris Murphy
  2020-09-17 17:20           ` Chris Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-09-17 17:08 UTC (permalink / raw)
  To: Vojtech Myslivec
  Cc: Chris Murphy, Song Liu, Michal Moravec, Btrfs BTRFS, Linux-RAID

On Wed, Sep 16, 2020 at 3:42 AM Vojtech Myslivec <vojtech@xmyslivec.cz> wrote:
>
> Hello,
>
> it seems my last e-mail was filtered out, as I can't find it in the
> archives, so I will resend it and include all attachments in one
> tarball.
>
> On 26. 08. 20 20:07, Chris Murphy wrote:
> > OK so from the attachments..
> >
> > cat /proc/<pid>/stack for md1_raid6
> >
> > [<0>] rq_qos_wait+0xfa/0x170
> > [<0>] wbt_wait+0x98/0xe0
> > [<0>] __rq_qos_throttle+0x23/0x30
> > [<0>] blk_mq_make_request+0x12a/0x5d0
> > [<0>] generic_make_request+0xcf/0x310
> > [<0>] submit_bio+0x42/0x1c0
> > [<0>] md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
> > [<0>] r5l_do_reclaim+0x32a/0x3b0 [raid456]
> > [<0>] md_thread+0x94/0x150 [md_mod]
> > [<0>] kthread+0x112/0x130
> > [<0>] ret_from_fork+0x22/0x40
> >
> > Btrfs snapshot flushing might instigate the problem but it seems to me
> > there's some kind of contention or blocking happening within md, and
> > that's why everything stalls. But I can't tell why.
> >
> > Do you have any iostat output at the time of this problem? I'm
> > wondering if md is waiting on disks. If not, try `iostat -dxm 5` and
> > share a few minutes before and after the freeze/hang.
>
> We detected the issue on Monday 31. 08. 2020 at 15:24. It must have
> happened sometime between 15:22 and 15:24, as we monitor the state
> every 2 minutes.
>
> We have recorded the stacks of the blocked processes, the sysrq+w
> output, and the requested `iostat`. Then, at 15:45, we performed the
> manual "unstick" procedure by accessing the md1 device with a dd
> command (reading a few random blocks).
>
> I hope the attached file names are self-explanatory. Please let me know
> if we can do anything more to track the issue down or if I forgot
> something.
>
> Thanks a lot,
> Vojtech and Michal
>
> Description of the devices in iostat, just for recap:
> - sda-sdf: 6 HDD disks
> - sdg, sdh: 2 SSD disks
>
> - md0: raid1 over sdg1 and sdh1 ("SSD RAID", Physical Volume for LVM)
> - md1: our "problematic" raid6 over sda-sdf ("HDD RAID", btrfs
>   formatted)
>
> - Logical volumes over the md0 Physical Volume (on the SSD RAID):
>   - dm-0: 4G LV for SWAP
>   - dm-1: 16G LV for the root file system (ext4 formatted)
>   - dm-2: 1G LV for the md1 journal

It's kind of a complicated setup. When this problem happens, can you
check swap pressure?

/sys/fs/cgroup/memory.stat

pgfault and maybe also pgmajfault - see if they're going up; or you can
also look at vmstat and see how heavily swap is being used at the time.

The thing is: any heavy eviction means writes to dm-0 -> md0 raid1 ->
sdg+sdh SSDs, which are the same SSDs that the md1 raid6 mdadm journal
goes to. So if you have any kind of swap pressure, it very likely will
stall the journal, or at least substantially slow it down, and now you
get blocked tasks as the pressure builds more and more, because now you
have a ton of dirty writes in Btrfs that can't make it to disk.

If there is minimal swap usage, then this hypothesis is false and
something else is going on. I also don't have an explanation for why
your workaround works.

--
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread
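The swap-pressure check Chris suggests boils down to sampling a few counters twice and looking at the deltas. A sketch with awk follows; the two /proc/vmstat snapshots are inlined with made-up numbers so the arithmetic is visible, but on the host you would capture them a known number of seconds apart (e.g. with `grep -E 'pswp|pgmajfault' /proc/vmstat`):

```shell
# Sketch: per-second swap-out and major-fault rates from two
# /proc/vmstat snapshots.  Snapshot values and the interval are
# made up for illustration.
snap1='pswpin 10
pswpout 250
pgmajfault 1200'
snap2='pswpin 10
pswpout 8250
pgmajfault 1450'
field() { printf '%s\n' "$1" | awk -v k="$2" '$1 == k { print $2 }'; }
dt=10                                   # seconds between the snapshots
swap_rate=$(( ($(field "$snap2" pswpout) - $(field "$snap1" pswpout)) / dt ))
fault_rate=$(( ($(field "$snap2" pgmajfault) - $(field "$snap1" pgmajfault)) / dt ))
echo "pswpout: $swap_rate pages/s, pgmajfault: $fault_rate faults/s" \
    | tee ./vmstat-deltas.txt
```

A sustained nonzero pswpout rate while the array is stuck would support the hypothesis; rates near zero would argue against it, which is what Vojtech later reports.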
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-09-17 17:08         ` Chris Murphy
@ 2020-09-17 17:20           ` Chris Murphy
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Murphy @ 2020-09-17 17:20 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Vojtech Myslivec, Song Liu, Michal Moravec, Btrfs BTRFS, Linux-RAID

The iostat isn't particularly revealing; I don't see especially high
%util for any device. SSD write MB/s gets up to 42, which is reasonable.

--
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-09-16  9:42       ` Vojtech Myslivec
  2020-09-17 17:08         ` Chris Murphy
@ 2020-09-17 17:43         ` Chris Murphy
  2020-09-23 18:14           ` Vojtech Myslivec
  1 sibling, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-09-17 17:43 UTC (permalink / raw)
  To: Vojtech Myslivec
  Cc: Chris Murphy, Song Liu, Michal Moravec, Btrfs BTRFS, Linux-RAID

[Mon Aug 31 15:31:55 2020] sysrq: Show Blocked State
[Mon Aug 31 15:31:55 2020]   task                        PC stack   pid father
[Mon Aug 31 15:31:55 2020] md1_reclaim     D    0   806      2 0x80004000
[Mon Aug 31 15:31:55 2020] Call Trace:
[Mon Aug 31 15:31:55 2020]  __schedule+0x2dd/0x710
[Mon Aug 31 15:31:55 2020]  ? finish_wait+0x80/0x80
[Mon Aug 31 15:31:55 2020]  ? wbt_exit+0x30/0x30
[Mon Aug 31 15:31:55 2020]  ? __wbt_done+0x30/0x30
[Mon Aug 31 15:31:55 2020]  schedule+0x40/0xb0
[Mon Aug 31 15:31:55 2020]  io_schedule+0x12/0x40
[Mon Aug 31 15:31:55 2020]  rq_qos_wait+0xfa/0x170
[Mon Aug 31 15:31:55 2020]  ? karma_partition+0x1e0/0x1e0
[Mon Aug 31 15:31:55 2020]  ? wbt_exit+0x30/0x30
[Mon Aug 31 15:31:55 2020]  wbt_wait+0x98/0xe0
[Mon Aug 31 15:31:55 2020]  __rq_qos_throttle+0x23/0x30
[Mon Aug 31 15:31:55 2020]  blk_mq_make_request+0x12a/0x5d0
[Mon Aug 31 15:31:55 2020]  generic_make_request+0xcf/0x310
[Mon Aug 31 15:31:55 2020]  submit_bio+0x42/0x1c0
[Mon Aug 31 15:31:55 2020]  ? md_super_write.part.70+0x98/0x120 [md_mod]
[Mon Aug 31 15:31:55 2020]  md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
[Mon Aug 31 15:31:55 2020]  r5l_do_reclaim+0x32a/0x3b0 [raid456]
[Mon Aug 31 15:31:55 2020]  ? schedule_timeout+0x162/0x340
[Mon Aug 31 15:31:55 2020]  ? prepare_to_wait_event+0xb6/0x140
[Mon Aug 31 15:31:55 2020]  ? md_register_thread+0xd0/0xd0 [md_mod]
[Mon Aug 31 15:31:55 2020]  md_thread+0x94/0x150 [md_mod]
[Mon Aug 31 15:31:55 2020]  ? finish_wait+0x80/0x80
[Mon Aug 31 15:31:55 2020]  kthread+0x112/0x130
[Mon Aug 31 15:31:55 2020]  ? kthread_park+0x80/0x80
[Mon Aug 31 15:31:55 2020]  ret_from_fork+0x22/0x40

*shrug*

These SSDs should be able to handle > 500 MB/s and > 130K IOPS. Swap
would have to be pretty heavy to slow down journal writes.

I'm not sure I have any good advice. My remaining ideas involve changing
the configuration just to see if the problem goes away, rather than
actually understanding the cause of the problem.

--
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-09-17 17:43         ` Chris Murphy
@ 2020-09-23 18:14           ` Vojtech Myslivec
  2021-02-11  3:14             ` Manuel Riel
  0 siblings, 1 reply; 22+ messages in thread
From: Vojtech Myslivec @ 2020-09-23 18:14 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Michal Moravec, Btrfs BTRFS, Linux-RAID

On 17. 09. 20 19:08, Chris Murphy wrote:
>
> On Wed, Sep 16, 2020 at 3:42 AM Vojtech Myslivec wrote:
>>
>> Description of the devices in iostat, just for recap:
>> - sda-sdf: 6 HDD disks
>> - sdg, sdh: 2 SSD disks
>>
>> - md0: raid1 over sdg1 and sdh1 ("SSD RAID", Physical Volume for LVM)
>> - md1: our "problematic" raid6 over sda-sdf ("HDD RAID", btrfs
>>   formatted)
>>
>> - Logical volumes over the md0 Physical Volume (on the SSD RAID):
>>   - dm-0: 4G LV for SWAP
>>   - dm-1: 16G LV for the root file system (ext4 formatted)
>>   - dm-2: 1G LV for the md1 journal
>
> It's kind of a complicated setup. When this problem happens, can you
> check swap pressure?
>
> /sys/fs/cgroup/memory.stat
>
> pgfault and maybe also pgmajfault - see if they're going up; or you can
> also look at vmstat and see how heavily swap is being used at the time.
>
> The thing is: any heavy eviction means writes to dm-0 -> md0 raid1 ->
> sdg+sdh SSDs, which are the same SSDs that the md1 raid6 mdadm journal
> goes to. So if you have any kind of swap pressure, it very likely will
> stall the journal, or at least substantially slow it down, and now you
> get blocked tasks as the pressure builds more and more, because now you
> have a ton of dirty writes in Btrfs that can't make it to disk.
>
> If there is minimal swap usage, then this hypothesis is false and
> something else is going on. I also don't have an explanation for why
> your workaround works.

On 17. 09. 20 19:20, Chris Murphy wrote:
> The iostat isn't particularly revealing; I don't see especially high
> %util for any device. SSD write MB/s gets up to 42, which is
> reasonable.

On 17. 09. 20 19:43, Chris Murphy wrote:
> [Mon Aug 31 15:31:55 2020] sysrq: Show Blocked State
> [Mon Aug 31 15:31:55 2020]   task                        PC stack   pid father
> [Mon Aug 31 15:31:55 2020] md1_reclaim     D    0   806      2 0x80004000
> [Mon Aug 31 15:31:55 2020] Call Trace:
> ...
>
> *shrug*
>
> These SSDs should be able to handle > 500 MB/s and > 130K IOPS. Swap
> would have to be pretty heavy to slow down journal writes.
>
> I'm not sure I have any good advice. My remaining ideas involve
> changing the configuration just to see if the problem goes away,
> rather than actually understanding the cause of the problem.

OK, I see.

This is a physical server with 32 GB RAM, dedicated to backup tasks. Our
monitoring shows there is (almost) no swap usage at any time, so I hope
this should not be the problem. However, I will look at the stats you
mentioned and, for a start, I will disable the swap for several days.
It's there only as a "backup" for emergencies and is not used at all
most of the time.

Sadly, I am not able to disable the journal: if I do (just by removing
the device from the array), the MD device instantly fails and the btrfs
volume is remounted read-only. I cannot find any other way to disable
the journal; it seems it is simply not supported. I can see only an
`--add-journal` option and no corresponding `--delete-journal` one.

I welcome any advice on how to exchange the write-journal for an
internal bitmap.

Any other possible changes that come to my mind are:
- Enlarge the write-journal
- Move the write-journal to the physical sdg/sdh SSDs (out of the md0
  raid1 device)

I find the latter a bit risky, as the write-journal would not be
redundant then. That's the reason we chose to put the write journal on a
RAID device.

Vojtech

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-09-23 18:14           ` Vojtech Myslivec
@ 2021-02-11  3:14             ` Manuel Riel
  2021-02-28  8:34               ` Manuel Riel
  0 siblings, 1 reply; 22+ messages in thread
From: Manuel Riel @ 2021-02-11  3:14 UTC (permalink / raw)
  To: Vojtech Myslivec
  Cc: Chris Murphy, Michal Moravec, Linux-RAID, songliubraving, guoqing.jiang

I'm also hitting the exact same problem with XFS on RAID6, using a RAID1
write journal on two NVMes. CentOS 8, 4.18.0-240.10.1.el8_3.x86_64.

Symptoms:

- high CPU usage of the md4_raid6 process
- IO wait goes up
- IO on that file system locks up for tens of minutes and the kernel
  reports:

[Wed Feb 10 23:23:05 2021] INFO: task md4_reclaim:1070 blocked for more than 20 seconds.
[Wed Feb 10 23:23:05 2021]       Not tainted 4.18.0-240.10.1.el8_3.x86_64 #1
[Wed Feb 10 23:23:05 2021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Feb 10 23:23:05 2021] md4_reclaim     D    0  1070      2 0x80004000

I already confirmed it's not a timeout mismatch. No drive errors are
reported, and SCT Error Recovery Control is set to 7 seconds.

>> It's kind of a complicated setup. When this problem happens, can you
>> check swap pressure?

There is a RAID1 SWAP partition, but it's almost unused, since the
server has ample RAM.

>> /sys/fs/cgroup/memory.stat
>>
>> pgfault and maybe also pgmajfault - see if they're going up; or you
>> can also look at vmstat and see how heavily swap is being used at the
>> time.
>>
>> The thing is: any heavy eviction means writes to dm-0 -> md0 raid1 ->
>> sdg+sdh SSDs, which are the same SSDs that the md1 raid6 mdadm
>> journal goes to. So if you have any kind of swap pressure, it very
>> likely will stall the journal, or at least substantially slow it
>> down, and now you get blocked tasks as the pressure builds more and
>> more, because now you have a ton of dirty writes in Btrfs that can't
>> make it to disk.

I've disabled SWAP to test this theory.

>> If there is minimal swap usage, then this hypothesis is false and
>> something else is going on. I also don't have an explanation for why
>> your workaround works.
>
> Sadly, I am not able to disable the journal: if I do (just by removing
> the device from the array), the MD device instantly fails and the
> btrfs volume is remounted read-only. I cannot find any other way to
> disable the journal; it seems it is simply not supported. I can see
> only an `--add-journal` option and no corresponding `--delete-journal`
> one.
>
> I welcome any advice on how to exchange the write-journal for an
> internal bitmap.

I read that the array needs to be in read-only mode first. Then you can
fail and replace the write journal (not tested):

# mdadm --readonly /dev/md0
# mdadm /dev/md0 --fail /dev/<journal>
# mdadm --manage /dev/md0 --add-journal /dev/<new-journal>

> Any other possible changes that come to my mind are:
> - Enlarge the write-journal

My write journal is about 180 GB.

> - Move the write-journal to the physical sdg/sdh SSDs (out of the md0
>   raid1 device)

I may try this, but as you say it's risky, especially when using the
"write-back" journal mode.

> I find the latter a bit risky, as the write-journal would not be
> redundant then. That's the reason we chose to put the write journal on
> a RAID device.

I'm also experimenting with write-back/write-through mode and different
stripe_cache_size values, hoping to find something. If nothing helps, it
may simply not be possible/supported to put a write journal on another
RAID device.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2021-02-11  3:14             ` Manuel Riel
@ 2021-02-28  8:34               ` Manuel Riel
       [not found]                 ` <56AD80D0-6853-4E3A-A94C-AD1477D3FDA4@snapdragon.cc>
  0 siblings, 1 reply; 22+ messages in thread
From: Manuel Riel @ 2021-02-28  8:34 UTC (permalink / raw)
  To: Vojtech Myslivec
  Cc: Chris Murphy, Michal Moravec, Linux-RAID, songliubraving, guoqing.jiang

Hit another mdadm "hanger" today: no more reading possible and md4_raid6
stuck at 100% CPU.

I've now moved the write journal off the RAID1 device, so it's not a
"nested" RAID any more. Hope this will help.

With only one hardware device used as write cache, I suppose only the
write-through mode[1] is advisable now.

1: https://www.kernel.org/doc/Documentation/md/raid5-cache.txt

> On Feb 11, 2021, at 11:14, Manuel Riel <manu@snapdragon.cc> wrote:
>
> I'm also hitting the exact same problem with XFS on RAID6, using a
> RAID1 write journal on two NVMes. CentOS 8,
> 4.18.0-240.10.1.el8_3.x86_64.
>
> Symptoms:
>
> - high CPU usage of the md4_raid6 process
> - IO wait goes up
> - IO on that file system locks up for tens of minutes and the kernel
>   reports:
>
> [Wed Feb 10 23:23:05 2021] INFO: task md4_reclaim:1070 blocked for more than 20 seconds.
> [Wed Feb 10 23:23:05 2021]       Not tainted 4.18.0-240.10.1.el8_3.x86_64 #1
> [Wed Feb 10 23:23:05 2021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Wed Feb 10 23:23:05 2021] md4_reclaim     D    0  1070      2 0x80004000
>
> I already confirmed it's not a timeout mismatch. No drive errors are
> reported, and SCT Error Recovery Control is set to 7 seconds.

^ permalink raw reply	[flat|nested] 22+ messages in thread
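The write-through/write-back switch Manuel mentions is exposed through sysfs, per the raid5-cache document linked above. A sketch follows (md4 is this thread's array name, adjust to yours; the fake-directory fallback exists only so the commands can be exercised off the server, where on real sysfs the active mode is the one shown in brackets):

```shell
# Sketch: inspect and switch the raid5/6 journal mode via the sysfs
# knob described in raid5-cache.txt.  Needs root on the real host.
SYS=/sys/block/md4/md
if [ ! -w "$SYS/journal_mode" ]; then
    SYS=./fake-md                       # stand-in for demonstration only
    mkdir -p "$SYS"
    echo '[write-back] write-through' > "$SYS/journal_mode"
fi
cat "$SYS/journal_mode"
echo write-through > "$SYS/journal_mode"
cat "$SYS/journal_mode"
```

Write-through trades throughput for not having to trust a single cache device with unflushed data, which matches Manuel's reasoning after dropping the RAID1 journal.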
[parent not found: <56AD80D0-6853-4E3A-A94C-AD1477D3FDA4@snapdragon.cc>]
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
       [not found]                 ` <56AD80D0-6853-4E3A-A94C-AD1477D3FDA4@snapdragon.cc>
@ 2021-03-17 15:55                   ` Vojtech Myslivec
  0 siblings, 0 replies; 22+ messages in thread
From: Vojtech Myslivec @ 2021-03-17 15:55 UTC (permalink / raw)
  To: Manuel Riel
  Cc: Chris Murphy, Michal Moravec, songliubraving, guoqing.jiang,
	Btrfs BTRFS, Linux-RAID

Thanks a lot, Manuel, for your findings and information. It's good to
know btrfs is not causing this issue and that the common symptom is an
MD journal on another RAID device.

I have moved the journal from a logical volume on RAID1 to a plain
partition on an SSD and I will monitor the state.

Vojtech

On 17. 03. 21 5:35, Manuel Riel wrote:
> Final update on this issue for anyone who encounters a similar problem
> in the future:
>
> I didn't observe any "hanging" RAID devices after using an ordinary
> NVMe partition as the journal. So using e.g. another md-RAID1 array as
> the journal doesn't seem to be supported.
>
> The docs[1] say "This means the cache disk must be ... sustainable."
> The "sustainable" part motivated me to use a md-RAID1 array. I think
> the docs should mention that the journal can't be on another RAID
> array.
>
> I'm sending in a patch to emphasize this in the docs.
>
> 1: https://www.kernel.org/doc/html/latest/driver-api/md/raid5-cache.html
>
>> On Feb 28, 2021, at 4:34 PM, Manuel Riel <manu@snapdragon.cc> wrote:
>>
>> Hit another mdadm "hanger" today. No more reading possible and
>> md4_raid6 stuck at 100% CPU.
>>
>> I've now moved the write journal off the RAID1 device. So it's not a
>> "nested" RAID any more. Hope this will help.
>>
>> With only one hardware device used as write cache, I suppose only
>> write-through mode[1] is advisable now.
>>
>> 1: https://www.kernel.org/doc/Documentation/md/raid5-cache.txt

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-07-22 20:47 Linux RAID with btrfs stuck and consume 100 % CPU Vojtech Myslivec
  2020-07-22 22:00 ` antlists
  2020-07-23  2:08 ` Chris Murphy
@ 2020-07-29 21:06 ` Guoqing Jiang
  2020-07-29 21:48   ` Chris Murphy
                     ` (2 more replies)
  2 siblings, 3 replies; 22+ messages in thread
From: Guoqing Jiang @ 2020-07-29 21:06 UTC (permalink / raw)
  To: Vojtech Myslivec, linux-btrfs, linux-raid; +Cc: Michal Moravec, Song Liu

Hi,

On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
> 1. What should be the cause of this problem?

Just a quick glance based on the stacks which you attached: I guess it
could be a deadlock issue of the raid5-cache super write.

Maybe commit 8e018c21da3f ("raid5-cache: fix a deadlock in superblock
write") didn't fix the problem completely. Cc Song.

And I am curious why the md thread is not woken up if mddev_trylock
fails. You can give the change below a try, but I can't promise it
helps ...

--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1337,8 +1337,10 @@ static void r5l_write_super_and_discard_space(struct r5l_log *log,
 	 */
 	set_mask_bits(&mddev->sb_flags, 0,
 		      BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
-	if (!mddev_trylock(mddev))
+	if (!mddev_trylock(mddev)) {
+		md_wakeup_thread(mddev->thread);
 		return;
+	}
 	md_update_sb(mddev, 1);
 	mddev_unlock(mddev);

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-07-29 21:06 ` Guoqing Jiang
@ 2020-07-29 21:48   ` Chris Murphy
  2020-08-12 14:19     ` Vojtech Myslivec
  2020-07-30  6:45   ` Song Liu
  2020-08-12 13:58   ` Vojtech Myslivec
  2 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2020-07-29 21:48 UTC (permalink / raw)
  To: Guoqing Jiang
  Cc: Vojtech Myslivec, Btrfs BTRFS, Linux-RAID, Michal Moravec, Song Liu

On Wed, Jul 29, 2020 at 3:06 PM Guoqing Jiang
<guoqing.jiang@cloud.ionos.com> wrote:
>
> Hi,
>
> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
> > 1. What should be the cause of this problem?
>
> Just a quick glance based on the stacks which you attached, I guess it
> could be a deadlock issue of raid5 cache super write.
>
> Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in
> superblock write") didn't fix the problem completely. Cc Song.

That commit references discards, and it made me look again at mdadm -D,
which shows a journal device:

    0     253        2        -      journal   /dev/dm-2

Vojtech, can you confirm this device is an SSD? There are a couple of
SSDs that show up in the dmesg, if I recall correctly.

What is the default discard hinting for this SSD when it's used as a
journal device for mdadm? And what is the write behavior of the
journal? I'm not familiar with this feature at all - whether the
journal is treated as a raw block device or whether it resides on a
file system.

So I get kinda curious what might happen long term if this is a very
busy file system and a very busy raid5/6 journal on this SSD, without
any discard hints. Is it possible the SSD runs out of ready-to-write
erase blocks, and the firmware has become super slow doing
erasure/garbage collection on demand? And the journal is now having a
hard time flushing?

--
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread
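Whether discards can reach the SSD at all through the LVM-on-md stack can be checked layer by layer. A diagnostic sketch (device names are the ones from this thread; adjust to yours; a granularity of 0, or DISC-GRAN/DISC-MAX of 0 in `lsblk -D`, means discards are not passed through at that layer):

```shell
# Sketch: check discard propagation for each layer under the journal.
# All commands are guarded so the sketch degrades gracefully where the
# devices do not exist.
OUT=./discard-check.txt
: > "$OUT"
{ command -v lsblk > /dev/null && lsblk -D >> "$OUT" 2> /dev/null; } || true
for f in /sys/block/sdg/queue/discard_granularity \
         /sys/block/md0/queue/discard_granularity \
         /sys/block/dm-2/queue/discard_granularity; do
    { [ -r "$f" ] && printf '%s: %s\n' "$f" "$(cat "$f")" >> "$OUT"; } || true
done
echo "discard check done" >> "$OUT"
cat "$OUT"
```

If the dm-2 layer reports zero granularity while the raw SSDs report a nonzero one, that would support Chris's garbage-collection hypothesis, since the journal's overwrites would never be hinted to the drive.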
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-07-29 21:48   ` Chris Murphy
@ 2020-08-12 14:19     ` Vojtech Myslivec
  0 siblings, 0 replies; 22+ messages in thread
From: Vojtech Myslivec @ 2020-08-12 14:19 UTC (permalink / raw)
  To: Chris Murphy, Guoqing Jiang
  Cc: Btrfs BTRFS, Linux-RAID, Michal Moravec, Song Liu

[-- Attachment #1: Type: text/plain, Size: 2848 bytes --]

On 29. 07. 20 23:48, Chris Murphy wrote:
> On Wed, Jul 29, 2020 at 3:06 PM Guoqing Jiang
> <guoqing.jiang@cloud.ionos.com> wrote:
>> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
>>> 1. What should be the cause of this problem?
>>
>> Just a quick glance based on the stacks which you attached, I guess it
>> could be a deadlock issue of raid5 cache super write.
>>
>> Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in
>> superblock write") didn't fix the problem completely. Cc Song.
>
> That references discards, and it made me look again at mdadm -D, which
> shows a journal device:
>
>     0     253        2        -      journal   /dev/dm-2
>
> Vojtech, can you confirm this device is an SSD? There are a couple of
> SSDs that show up in the dmesg, if I recall correctly.

I tried to explain this in my first post: it's a logical volume in a
volume group over RAID 1 over 2 SSDs. My colleague replied with more
details.

On 05. 08. 2020, Michal Moravec wrote:
>> On 29 Jul 2020, Chris Murphy wrote:
>> Vojtech, can you confirm this device is an SSD? There are a couple of
>> SSDs that show up in the dmesg, if I recall correctly.
>
> Yes. We have a pair (sdg, sdh) of INTEL D3-S4610 240 GB SSDs
> (SSDSC2KG240G8). We use them for the OS and the raid6 journal. They
> are configured as the md0 raid array with LVM on top of it.
>
> The logical volume vg0-journal_md1 (of 1G size) is used as the journal
> device for the md1 array (where the problem with the md1_raid6 process
> consuming 100% CPU and blocking btrfs operations is happening).
>
>> What is the default discard hinting for this SSD when it's used as a
>> journal device for mdadm?
>
> What do you mean by discard hinting? We have an issue_discards = 1
> configuration in /etc/lvm/lvm.conf.
>
>> And what is the write behavior of the journal?
>
> That would be journal_mode set to write-through, right?
>
>> I'm not familiar with this feature at all, whether it's treated as a
>> raw block device for the journal or if the journal resides on a file
>> system.
>
> From the lsblk output I see no filesystem on vg0-journal_md1. It looks
> like a plain logical volume to me.

[my comment]: yes, it's an LV block device, no filesystem here.

>> So I get kinda curious what might happen long term if this is a very
>> busy file system, very busy raid5/6 journal on this SSD, without any
>> discard hints?
>> Is it possible the SSD runs out of ready-to-write erase blocks, and
>> the firmware has become super slow doing erasure/garbage collection
>> on demand?
>> And the journal is now having a hard time flushing?
>
> What kind of information could we gather to verify/reject any of these
> ideas?

[my question]: Is the LVM configuration (above) enough? Sadly, there is
not much information about RAID 6 journaling on the kernel wiki. There
is some info in mdadm(8), but nothing about the discard/trim operation.
[-- Attachment #2: lsblk-output.txt --]
[-- Type: text/plain, Size: 1076 bytes --]

NAME                  MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdg                     8:96   1 223,6G  0 disk
├─sdg1                  8:97   1  37,3G  0 part
│ └─md0                 9:0    0  37,2G  0 raid1
│   ├─vg0-swap        253:0    0   3,7G  0 lvm   [SWAP]
│   ├─vg0-root        253:1    0  14,9G  0 lvm   /
│   └─vg0-journal_md1 253:2    0     1G  0 lvm
│     └─md1             9:1    0  29,1T  0 raid6 /mnt/data
├─sdg2                  8:98   1     1K  0 part
└─sdg5                  8:101  1 186,3G  0 part
sdh                     8:112  1 223,6G  0 disk
├─sdh1                  8:113  1  37,3G  0 part
│ └─md0                 9:0    0  37,2G  0 raid1
│   ├─vg0-swap        253:0    0   3,7G  0 lvm   [SWAP]
│   ├─vg0-root        253:1    0  14,9G  0 lvm   /
│   └─vg0-journal_md1 253:2    0     1G  0 lvm
│     └─md1             9:1    0  29,1T  0 raid6 /mnt/data
├─sdh2                  8:114  1     1K  0 part
└─sdh5                  8:117  1 186,3G  0 part

^ permalink raw reply	[flat|nested] 22+ messages in thread
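The facts Chris asks about (discard pass-through and journal write
mode) can be gathered with a short script. This is only a sketch: the
array name md1 comes from the lsblk output above, and the sysfs/LVM
paths are the usual locations on recent kernels, so adjust for your
system:

```shell
#!/bin/sh
# Sketch: collect the discard/journal information discussed above.
# md1 and /etc/lvm/lvm.conf are taken from this thread; they will
# differ on other machines.

# Does each layer of the stack advertise discard support?
# (non-zero DISC-GRAN / DISC-MAX means discards pass through)
lsblk --discard 2>/dev/null || echo "lsblk not available"

# Is LVM configured to issue discards when LVs are removed or shrunk?
lvm_conf=/etc/lvm/lvm.conf
if [ -r "$lvm_conf" ]; then
    grep -n 'issue_discards' "$lvm_conf" \
        || echo "issue_discards not set in $lvm_conf"
else
    echo "$lvm_conf not readable on this host"
fi

# Write mode of the md journal (write-through vs write-back)
journal_mode=/sys/block/md1/md/journal_mode
if [ -r "$journal_mode" ]; then
    echo "md1 journal mode: $(cat "$journal_mode")"
else
    echo "no $journal_mode on this host (no md1 array or no journal)"
fi
```

Note that `issue_discards` in lvm.conf only affects discards issued by
LVM itself on lvremove/lvreduce; it says nothing about whether the md
journal code ever discards the journal area.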
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-07-29 21:06       ` Guoqing Jiang
  2020-07-29 21:48         ` Chris Murphy
@ 2020-07-30  6:45         ` Song Liu
  2020-08-12 13:58         ` Vojtech Myslivec
  2 siblings, 0 replies; 22+ messages in thread
From: Song Liu @ 2020-07-30 6:45 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Vojtech Myslivec, linux-btrfs, linux-raid, Michal Moravec

> On Jul 29, 2020, at 2:06 PM, Guoqing Jiang <guoqing.jiang@cloud.ionos.com> wrote:
>
> Hi,
>
> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
>> 1. What should be the cause of this problem?
>
> Just a quick glance based on the stacks which you attached, I guess it
> could be a deadlock issue of raid5 cache super write.
>
> Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in
> superblock write") didn't fix the problem completely. Cc Song.
>
> And I am curious why the md thread is not woken if mddev_trylock
> fails; you can give it a try but I can't promise it helps ...
>
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -1337,8 +1337,10 @@ static void r5l_write_super_and_discard_space(struct r5l_log *log,
>          */
>         set_mask_bits(&mddev->sb_flags, 0,
>                       BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
> -       if (!mddev_trylock(mddev))
> +       if (!mddev_trylock(mddev)) {
> +               md_wakeup_thread(mddev->thread);
>                 return;
> +       }
>         md_update_sb(mddev, 1);
>         mddev_unlock(mddev);

Thanks Guoqing!

I am not sure whether we hit the mddev_trylock() failure. It looks like
the md1_raid6 thread is already running at 100 %.

A few questions:

1. I see wbt_wait in the stack trace. Are we using write-back
   throttling here?
2. Could you please get /proc/<pid>/stack for the <pid> of md1_raid6?
   We may want to sample it multiple times.

Thanks,
Song
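The two checks Song asks for can be scripted along these lines. This is
a sketch under the assumption of a standard sysfs layout: the thread
name md1_raid6 is from this report, and wbt_lat_usec is the usual
write-back-throttling knob on recent kernels:

```shell
#!/bin/sh
# Sketch of Song's two diagnostics; adjust device/thread names for
# your system.

# 1. Is write-back throttling (WBT) active? 0 means disabled.
for f in /sys/block/*/queue/wbt_lat_usec; do
    if [ -r "$f" ]; then
        echo "$f: $(cat "$f")"
    fi
done

# 2. Sample the kernel stack of the md1_raid6 thread a few times,
#    one second apart, while it is spinning at 100 % CPU.
pid=$(pgrep -x md1_raid6 | head -n 1)
if [ -n "$pid" ]; then
    for i in 1 2 3; do
        echo "=== sample $i ==="
        cat "/proc/$pid/stack"   # usually requires root
        sleep 1
    done
else
    echo "md1_raid6 is not running on this host"
fi
```

Sampling the stack repeatedly (rather than once) is what distinguishes
a thread stuck sleeping at one point from one cycling through a busy
loop.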
* Re: Linux RAID with btrfs stuck and consume 100 % CPU
  2020-07-29 21:06       ` Guoqing Jiang
  2020-07-29 21:48         ` Chris Murphy
  2020-07-30  6:45         ` Song Liu
@ 2020-08-12 13:58         ` Vojtech Myslivec
  2 siblings, 0 replies; 22+ messages in thread
From: Vojtech Myslivec @ 2020-08-12 13:58 UTC (permalink / raw)
  To: Guoqing Jiang, linux-btrfs, linux-raid; +Cc: Michal Moravec, Song Liu

Hello,

On 29. 07. 20 23:06, Guoqing Jiang wrote:
> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
>> 1. What should be the cause of this problem?
>
> Just a quick glance based on the stacks which you attached, I guess it
> could be a deadlock issue of raid5 cache super write.
>
> Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in
> superblock write") didn't fix the problem completely. Cc Song.
>
> And I am curious why the md thread is not woken if mddev_trylock
> fails; you can give it a try but I can't promise it helps ...

Just to make sure: we use RAID 6 across 6 HDDs (I am not sure how the
RAID 5 implementation relates to RAID 6).
end of thread, other threads:[~2021-03-17 16:05 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-22 20:47 Linux RAID with btrfs stuck and consume 100 % CPU Vojtech Myslivec
2020-07-22 22:00 ` antlists
2020-07-23  2:08   ` Chris Murphy
     [not found]     ` <29509e08-e373-b352-d696-fcb9f507a545@xmyslivec.cz>
2020-07-28 20:23       ` Chris Murphy
     [not found]         ` <695936b4-67a2-c862-9cb6-5545b4ab3c42@xmyslivec.cz>
2020-08-14 20:04           ` Chris Murphy
     [not found]             ` <2f2f1c21-c81b-55aa-6f77-e2d3f32d32cb@xmyslivec.cz>
2020-08-19 22:58               ` Chris Murphy
2020-08-26 15:35                 ` Vojtech Myslivec
2020-08-26 18:07                   ` Chris Murphy
2020-09-16  9:42                     ` Vojtech Myslivec
2020-09-17 17:08                       ` Chris Murphy
2020-09-17 17:20                         ` Chris Murphy
2020-09-17 17:43                           ` Chris Murphy
2020-09-23 18:14                             ` Vojtech Myslivec
2021-02-11  3:14                               ` Manuel Riel
2021-02-28  8:34                                 ` Manuel Riel
     [not found]                                   ` <56AD80D0-6853-4E3A-A94C-AD1477D3FDA4@snapdragon.cc>
2021-03-17 15:55                                     ` Vojtech Myslivec
2020-07-29 21:06 ` Guoqing Jiang
2020-07-29 21:48   ` Chris Murphy
2020-08-12 14:19     ` Vojtech Myslivec
2020-08-12 14:19       ` Vojtech Myslivec
2020-07-30  6:45   ` Song Liu
2020-08-12 13:58     ` Vojtech Myslivec