linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Drazen Kacar <drazen.kacar@oradian.com>
To: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Deadlock between btrfs-transacti and userland on 5.2.1
Date: Thu, 18 Jul 2019 12:23:37 +0000	[thread overview]
Message-ID: <DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com> (raw)
In-Reply-To: <DB8PR03MB56287A9E527898E5727BB41599C80@DB8PR03MB5628.eurprd03.prod.outlook.com>

[-- Attachment #1: Type: text/plain, Size: 5319 bytes --]

Hello,
 
I think I've encountered a deadlock between btrfs-transacti and postgres
process(es). This is system information (btrfs fi usage obtained after
poweroff and boot):

# cat /etc/redhat-release  
CentOS Linux release 7.6.1810 (Core) 
  
# uname -a
Linux prod-dbsnap-01 5.2.1-1.el7.elrepo.x86_64 #1 SMP Sun Jul 14 08:15:04 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux 
 
# btrfs --version
btrfs-progs v5.2 
 
# btrfs filesystem usage /data/pg_data 
 Overall:
     Device size:                   2.00TiB
     Device allocated:            345.03GiB
     Device unallocated:            1.66TiB
     Device missing:                  0.00B
     Used:                        338.07GiB
     Free (estimated):              1.67TiB      (min: 854.27GiB)
     Data ratio:                       1.00
     Metadata ratio:                   2.00
     Global reserve:              512.00MiB      (used: 0.00B)
 
 Data,RAID0: Size:332.00GiB, Used:329.22GiB
    /dev/sdb       83.00GiB
    /dev/sdc       83.00GiB
    /dev/sdd       83.00GiB
    /dev/sde       83.00GiB
 
 Metadata,RAID10: Size:6.50GiB, Used:4.42GiB
    /dev/sdb        3.25GiB
    /dev/sdc        3.25GiB
    /dev/sdd        3.25GiB
    /dev/sde        3.25GiB
 
 System,RAID10: Size:16.00MiB, Used:48.00KiB
    /dev/sdb        8.00MiB
    /dev/sdc        8.00MiB
    /dev/sdd        8.00MiB
    /dev/sde        8.00MiB
 
 Unallocated:
    /dev/sdb      425.74GiB
    /dev/sdc      425.74GiB
    /dev/sdd      425.74GiB
    /dev/sde      425.74GiB
 
There were three btrfs subvolumes and on each one there was a Postgres
database slave doing recovery (single threaded writing). But there was a
lot of writing. And prior to starting Postgres slaves I was restoring base
backup from the backup server, which was being done by a number of
parallel rsync processes (12 at most, I think).
 
The file system is mounted with:
 
# grep btrfs /proc/mounts
/dev/sdd /data/pg_data btrfs rw,noatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/ 0 0
 
After several hours I got this in /var/log/messages:

# grep 'INFO: task .*blocked for more than' messages
 Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task btrfs-transacti:1361 blocked for more than 122 seconds.
 Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:62682 blocked for more than 122 seconds.
 Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:80145 blocked for more than 122 seconds.
 Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:87299 blocked for more than 122 seconds.
 Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:93108 blocked for more than 122 seconds.
 Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task btrfs-transacti:1361 blocked for more than 245 seconds.
 Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:62682 blocked for more than 245 seconds.
 Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:80145 blocked for more than 245 seconds.
 Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:87299 blocked for more than 245 seconds.
 Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:93108 blocked for more than 245 seconds.
 
Full log is in the attachment. It seems to me that there is a deadlock
between btrfs-transacti and any process which is trying to write
something (not sure about reading). While this was going on the cpu
usage (according to top) was non-existant. There are 4 cpus (it's a virtual
machine in VmWare) and 3 were 100% idle. The 4th was 100% in
waiting. (I didn't find out which process was on that cpu, unfortunately.)
 
I powered off the machine (yesterday), booted this morning and things
are working without errors. I stopped postgres clusters, though.

I have a few questions:

1.  After something like this happens and the machine is rebooted is there
a procedure which would lower the probablity of encountering the deadlock
again (maybe btrfs scrub or btrfs defragment or something like that)? This
happened after a heavy write activity,  so maybe fragmentation had
something to do with it.

2. Should I run btrfsck (or something else) to verify on-disk integrity after a
problem like this? Or it's just an in-memory problem, so I can assume that
nothing bad happened to the data on disks.

3. I'd like to write a watchdog program to catch deadlocks and reboot
(probably power-cycle) the VM, but I'm not sure what would the appropriate
check be. Does it have to write something to the disk or reading would be
sufficient? And how to bypass the OS  buffer cache (fsync() or O_DIRECT
should do it for writing, but I'm not sure about reading)?

What would the appropriate timeout be? (If the operation doesn't
complete in xx seconds a reboot should be triggered, but I don't know how
many seconds to wait when there's  a heavy load and things are just slow,
but there's no deadlock.)

Should I put watchdog process in real-time class (or however the equivalent
is called on Linux)? Since this is a mainline kernel, I'm not sure if I could assume
that real-time support won't have  bugs of its own.
 
And last, but not least, is there additional data that could help with debugging
issues like this? (If possible, something that could be programmed into
watchdog service.)

[-- Attachment #2: messages_deadlock.txt --]
[-- Type: text/plain, Size: 24455 bytes --]

Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task btrfs-transacti:1361 blocked for more than 122 seconds.
Jul 17 16:47:09 prod-dbsnap-01 kernel:      Not tainted 5.2.1-1.el7.elrepo.x86_64 #1
Jul 17 16:47:09 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs-transacti D    0  1361      2 0x80004000
Jul 17 16:47:09 prod-dbsnap-01 kernel: Call Trace:
Jul 17 16:47:09 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? blk_flush_plug_list+0xd1/0x100
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? bit_wait+0x50/0x50
Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule+0x38/0xa0
Jul 17 16:47:09 prod-dbsnap-01 kernel: io_schedule+0x16/0x40
Jul 17 16:47:09 prod-dbsnap-01 kernel: bit_wait_io+0x11/0x50
Jul 17 16:47:09 prod-dbsnap-01 kernel: __wait_on_bit+0x6b/0xa0
Jul 17 16:47:09 prod-dbsnap-01 kernel: out_of_line_wait_on_bit+0x91/0xb0
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? init_wait_var_entry+0x50/0x50
Jul 17 16:47:09 prod-dbsnap-01 kernel: lock_extent_buffer_for_io+0x12f/0x320 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: btree_write_cache_pages+0x207/0x420 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: btree_writepages+0x5d/0x70 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: do_writepages+0x1f/0x70
Jul 17 16:47:09 prod-dbsnap-01 kernel: __filemap_fdatawrite_range+0xc6/0x100
Jul 17 16:47:09 prod-dbsnap-01 kernel: filemap_fdatawrite_range+0x13/0x20
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_write_marked_extents+0x107/0x150 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_write_and_wait_transaction.isra.26+0x4d/0xa0 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_commit_transaction+0x7b4/0xab0 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: transaction_kthread+0x157/0x190 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: kthread+0x105/0x140
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? btrfs_cleanup_transaction+0x580/0x580 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? kthread_bind+0x20/0x20
Jul 17 16:47:09 prod-dbsnap-01 kernel: ret_from_fork+0x35/0x40
Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:62682 blocked for more than 122 seconds.
Jul 17 16:47:09 prod-dbsnap-01 kernel:      Not tainted 5.2.1-1.el7.elrepo.x86_64 #1
Jul 17 16:47:09 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 17 16:47:09 prod-dbsnap-01 kernel: postgres        D    0 62682  20074 0x00004000
Jul 17 16:47:09 prod-dbsnap-01 kernel: Call Trace:
Jul 17 16:47:09 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650
Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule+0x38/0xa0
Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10
Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? __switch_to_asm+0x40/0x70
Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20
Jul 17 16:47:09 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x70c/0xdd0 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? prepare_to_wait_event+0x82/0x170
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80
Jul 17 16:47:09 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70
Jul 17 16:47:09 prod-dbsnap-01 kernel: __x64_sys_fdatasync+0x17/0x20
Jul 17 16:47:09 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0
Jul 17 16:47:09 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 17 16:47:09 prod-dbsnap-01 kernel: RIP: 0033:0x7fdbacc98310
Jul 17 16:47:09 prod-dbsnap-01 kernel: Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 8d 6c 2d 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 2e 6a 01 00 48 89 04 24
Jul 17 16:47:09 prod-dbsnap-01 kernel: RSP: 002b:00007fff8e7c2908 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
Jul 17 16:47:09 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 0000000000018cbc RCX: 00007fdbacc98310
Jul 17 16:47:09 prod-dbsnap-01 kernel: RDX: 0000000000000027 RSI: 0000000000018cbc RDI: 0000000000000004
Jul 17 16:47:09 prod-dbsnap-01 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Jul 17 16:47:09 prod-dbsnap-01 kernel: R10: 0000000000004000 R11: 0000000000000246 R12: 0000000000000004
Jul 17 16:47:09 prod-dbsnap-01 kernel: R13: 0000000001228189 R14: 00007fdb9caf9280 R15: 0000000000000100
Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:80145 blocked for more than 122 seconds.
Jul 17 16:47:09 prod-dbsnap-01 kernel:      Not tainted 5.2.1-1.el7.elrepo.x86_64 #1
Jul 17 16:47:09 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 17 16:47:09 prod-dbsnap-01 kernel: postgres        D    0 80145  80126 0x00004000
Jul 17 16:47:09 prod-dbsnap-01 kernel: Call Trace:
Jul 17 16:47:09 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650
Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule+0x38/0xa0
Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10
Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? __switch_to_asm+0x40/0x70
Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20
Jul 17 16:47:09 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x163/0xdd0 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? prepare_to_wait_event+0x82/0x170
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80
Jul 17 16:47:09 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70
Jul 17 16:47:09 prod-dbsnap-01 kernel: __x64_sys_fsync+0x14/0x20
Jul 17 16:47:09 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0
Jul 17 16:47:09 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 17 16:47:09 prod-dbsnap-01 kernel: RIP: 0033:0x7f1de34bdd00
Jul 17 16:47:09 prod-dbsnap-01 kernel: Code: Bad RIP value.
Jul 17 16:47:09 prod-dbsnap-01 kernel: RSP: 002b:00007fff3d26d058 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
Jul 17 16:47:09 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f1de34bdd00
Jul 17 16:47:09 prod-dbsnap-01 kernel: RDX: 0000000000000120 RSI: 00007f1dc4a9e580 RDI: 0000000000000003
Jul 17 16:47:09 prod-dbsnap-01 kernel: RBP: 00007f1de38c97f0 R08: 00007f1dc4a9e690 R09: 00007f1dc4a9e588
Jul 17 16:47:09 prod-dbsnap-01 kernel: R10: 00007f1de1303c8c R11: 0000000000000246 R12: 00000086e3333668
Jul 17 16:47:09 prod-dbsnap-01 kernel: R13: 0000000000000001 R14: 00007fff3d26d560 R15: 0000000000000010
Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:87299 blocked for more than 122 seconds.
Jul 17 16:47:09 prod-dbsnap-01 kernel:      Not tainted 5.2.1-1.el7.elrepo.x86_64 #1
Jul 17 16:47:09 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 17 16:47:09 prod-dbsnap-01 kernel: postgres        D    0 87299  80126 0x00004000
Jul 17 16:47:09 prod-dbsnap-01 kernel: Call Trace:
Jul 17 16:47:09 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650
Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule+0x38/0xa0
Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10
Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? __switch_to_asm+0x40/0x70
Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20
Jul 17 16:47:09 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x70c/0xdd0 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? prepare_to_wait_event+0x82/0x170
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80
Jul 17 16:47:09 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70
Jul 17 16:47:09 prod-dbsnap-01 kernel: __x64_sys_fdatasync+0x17/0x20
Jul 17 16:47:09 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0
Jul 17 16:47:09 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 17 16:47:09 prod-dbsnap-01 kernel: RIP: 0033:0x7f1de1031310
Jul 17 16:47:09 prod-dbsnap-01 kernel: Code: Bad RIP value.
Jul 17 16:47:09 prod-dbsnap-01 kernel: RSP: 002b:00007fff3d26d208 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
Jul 17 16:47:09 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 00000000000086e3 RCX: 00007f1de1031310
Jul 17 16:47:09 prod-dbsnap-01 kernel: RDX: 0000000000000027 RSI: 00000000000086e3 RDI: 0000000000000004
Jul 17 16:47:09 prod-dbsnap-01 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Jul 17 16:47:09 prod-dbsnap-01 kernel: R10: 0000000000004000 R11: 0000000000000246 R12: 0000000000000004
Jul 17 16:47:09 prod-dbsnap-01 kernel: R13: 0000000001650289 R14: 00007f1dd0e92280 R15: 00000000000000e0
Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:93108 blocked for more than 122 seconds.
Jul 17 16:47:09 prod-dbsnap-01 kernel:      Not tainted 5.2.1-1.el7.elrepo.x86_64 #1
Jul 17 16:47:09 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 17 16:47:09 prod-dbsnap-01 kernel: postgres        D    0 93108  93106 0x00004000
Jul 17 16:47:09 prod-dbsnap-01 kernel: Call Trace:
Jul 17 16:47:09 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650
Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule+0x38/0xa0
Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10
Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? btrfs_must_commit_transaction+0x60/0x70 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20
Jul 17 16:47:09 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x70c/0xdd0 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30
Jul 17 16:47:09 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs]
Jul 17 16:47:09 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80
Jul 17 16:47:09 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70
Jul 17 16:47:09 prod-dbsnap-01 kernel: __x64_sys_fsync+0x14/0x20
Jul 17 16:47:09 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0
Jul 17 16:47:09 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 17 16:47:09 prod-dbsnap-01 kernel: RIP: 0033:0x7f489051dd00
Jul 17 16:47:09 prod-dbsnap-01 kernel: Code: Bad RIP value.
Jul 17 16:47:09 prod-dbsnap-01 kernel: RSP: 002b:00007fff39abd838 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
Jul 17 16:47:09 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f489051dd00
Jul 17 16:47:09 prod-dbsnap-01 kernel: RDX: 0000000000000180 RSI: 0000000000000002 RDI: 0000000000000003
Jul 17 16:47:09 prod-dbsnap-01 kernel: RBP: 0000000000000000 R08: 3839303030303030 R09: 00007f488dfe920d
Jul 17 16:47:09 prod-dbsnap-01 kernel: R10: 00007f488e363c8c R11: 0000000000000246 R12: 00007fff39abe200
Jul 17 16:47:09 prod-dbsnap-01 kernel: R13: 0000000000000000 R14: 0000000000000014 R15: 00007fff39abe200
Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task btrfs-transacti:1361 blocked for more than 245 seconds.
Jul 17 16:49:12 prod-dbsnap-01 kernel:      Not tainted 5.2.1-1.el7.elrepo.x86_64 #1
Jul 17 16:49:12 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs-transacti D    0  1361      2 0x80004000
Jul 17 16:49:12 prod-dbsnap-01 kernel: Call Trace:
Jul 17 16:49:12 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? blk_flush_plug_list+0xd1/0x100
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? bit_wait+0x50/0x50
Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule+0x38/0xa0
Jul 17 16:49:12 prod-dbsnap-01 kernel: io_schedule+0x16/0x40
Jul 17 16:49:12 prod-dbsnap-01 kernel: bit_wait_io+0x11/0x50
Jul 17 16:49:12 prod-dbsnap-01 kernel: __wait_on_bit+0x6b/0xa0
Jul 17 16:49:12 prod-dbsnap-01 kernel: out_of_line_wait_on_bit+0x91/0xb0
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? init_wait_var_entry+0x50/0x50
Jul 17 16:49:12 prod-dbsnap-01 kernel: lock_extent_buffer_for_io+0x12f/0x320 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: btree_write_cache_pages+0x207/0x420 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: btree_writepages+0x5d/0x70 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: do_writepages+0x1f/0x70
Jul 17 16:49:12 prod-dbsnap-01 kernel: __filemap_fdatawrite_range+0xc6/0x100
Jul 17 16:49:12 prod-dbsnap-01 kernel: filemap_fdatawrite_range+0x13/0x20
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_write_marked_extents+0x107/0x150 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_write_and_wait_transaction.isra.26+0x4d/0xa0 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_commit_transaction+0x7b4/0xab0 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: transaction_kthread+0x157/0x190 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: kthread+0x105/0x140
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? btrfs_cleanup_transaction+0x580/0x580 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? kthread_bind+0x20/0x20
Jul 17 16:49:12 prod-dbsnap-01 kernel: ret_from_fork+0x35/0x40
Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:62682 blocked for more than 245 seconds.
Jul 17 16:49:12 prod-dbsnap-01 kernel:      Not tainted 5.2.1-1.el7.elrepo.x86_64 #1
Jul 17 16:49:12 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 17 16:49:12 prod-dbsnap-01 kernel: postgres        D    0 62682  20074 0x00004000
Jul 17 16:49:12 prod-dbsnap-01 kernel: Call Trace:
Jul 17 16:49:12 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650
Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule+0x38/0xa0
Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10
Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? __switch_to_asm+0x40/0x70
Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20
Jul 17 16:49:12 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x70c/0xdd0 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? prepare_to_wait_event+0x82/0x170
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80
Jul 17 16:49:12 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70
Jul 17 16:49:12 prod-dbsnap-01 kernel: __x64_sys_fdatasync+0x17/0x20
Jul 17 16:49:12 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0
Jul 17 16:49:12 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 17 16:49:12 prod-dbsnap-01 kernel: RIP: 0033:0x7fdbacc98310
Jul 17 16:49:12 prod-dbsnap-01 kernel: Code: Bad RIP value.
Jul 17 16:49:12 prod-dbsnap-01 kernel: RSP: 002b:00007fff8e7c2908 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
Jul 17 16:49:12 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 0000000000018cbc RCX: 00007fdbacc98310
Jul 17 16:49:12 prod-dbsnap-01 kernel: RDX: 0000000000000027 RSI: 0000000000018cbc RDI: 0000000000000004
Jul 17 16:49:12 prod-dbsnap-01 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Jul 17 16:49:12 prod-dbsnap-01 kernel: R10: 0000000000004000 R11: 0000000000000246 R12: 0000000000000004
Jul 17 16:49:12 prod-dbsnap-01 kernel: R13: 0000000001228189 R14: 00007fdb9caf9280 R15: 0000000000000100
Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:80145 blocked for more than 245 seconds.
Jul 17 16:49:12 prod-dbsnap-01 kernel:      Not tainted 5.2.1-1.el7.elrepo.x86_64 #1
Jul 17 16:49:12 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 17 16:49:12 prod-dbsnap-01 kernel: postgres        D    0 80145  80126 0x00004000
Jul 17 16:49:12 prod-dbsnap-01 kernel: Call Trace:
Jul 17 16:49:12 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650
Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule+0x38/0xa0
Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10
Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? __switch_to_asm+0x40/0x70
Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20
Jul 17 16:49:12 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x163/0xdd0 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? prepare_to_wait_event+0x82/0x170
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80
Jul 17 16:49:12 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70
Jul 17 16:49:12 prod-dbsnap-01 kernel: __x64_sys_fsync+0x14/0x20
Jul 17 16:49:12 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0
Jul 17 16:49:12 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 17 16:49:12 prod-dbsnap-01 kernel: RIP: 0033:0x7f1de34bdd00
Jul 17 16:49:12 prod-dbsnap-01 kernel: Code: Bad RIP value.
Jul 17 16:49:12 prod-dbsnap-01 kernel: RSP: 002b:00007fff3d26d058 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
Jul 17 16:49:12 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f1de34bdd00
Jul 17 16:49:12 prod-dbsnap-01 kernel: RDX: 0000000000000120 RSI: 00007f1dc4a9e580 RDI: 0000000000000003
Jul 17 16:49:12 prod-dbsnap-01 kernel: RBP: 00007f1de38c97f0 R08: 00007f1dc4a9e690 R09: 00007f1dc4a9e588
Jul 17 16:49:12 prod-dbsnap-01 kernel: R10: 00007f1de1303c8c R11: 0000000000000246 R12: 00000086e3333668
Jul 17 16:49:12 prod-dbsnap-01 kernel: R13: 0000000000000001 R14: 00007fff3d26d560 R15: 0000000000000010
Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:87299 blocked for more than 245 seconds.
Jul 17 16:49:12 prod-dbsnap-01 kernel:      Not tainted 5.2.1-1.el7.elrepo.x86_64 #1
Jul 17 16:49:12 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 17 16:49:12 prod-dbsnap-01 kernel: postgres        D    0 87299  80126 0x00004000
Jul 17 16:49:12 prod-dbsnap-01 kernel: Call Trace:
Jul 17 16:49:12 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650
Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule+0x38/0xa0
Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10
Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? __switch_to_asm+0x40/0x70
Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20
Jul 17 16:49:12 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x70c/0xdd0 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? prepare_to_wait_event+0x82/0x170
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80
Jul 17 16:49:12 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70
Jul 17 16:49:12 prod-dbsnap-01 kernel: __x64_sys_fdatasync+0x17/0x20
Jul 17 16:49:12 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0
Jul 17 16:49:12 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 17 16:49:12 prod-dbsnap-01 kernel: RIP: 0033:0x7f1de1031310
Jul 17 16:49:12 prod-dbsnap-01 kernel: Code: Bad RIP value.
Jul 17 16:49:12 prod-dbsnap-01 kernel: RSP: 002b:00007fff3d26d208 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
Jul 17 16:49:12 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 00000000000086e3 RCX: 00007f1de1031310
Jul 17 16:49:12 prod-dbsnap-01 kernel: RDX: 0000000000000027 RSI: 00000000000086e3 RDI: 0000000000000004
Jul 17 16:49:12 prod-dbsnap-01 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Jul 17 16:49:12 prod-dbsnap-01 kernel: R10: 0000000000004000 R11: 0000000000000246 R12: 0000000000000004
Jul 17 16:49:12 prod-dbsnap-01 kernel: R13: 0000000001650289 R14: 00007f1dd0e92280 R15: 00000000000000e0
Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:93108 blocked for more than 245 seconds.
Jul 17 16:49:12 prod-dbsnap-01 kernel:      Not tainted 5.2.1-1.el7.elrepo.x86_64 #1
Jul 17 16:49:12 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 17 16:49:12 prod-dbsnap-01 kernel: postgres        D    0 93108  93106 0x00004000
Jul 17 16:49:12 prod-dbsnap-01 kernel: Call Trace:
Jul 17 16:49:12 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650
Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule+0x38/0xa0
Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10
Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? btrfs_must_commit_transaction+0x60/0x70 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20
Jul 17 16:49:12 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x70c/0xdd0 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30
Jul 17 16:49:12 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs]
Jul 17 16:49:12 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80
Jul 17 16:49:12 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70
Jul 17 16:49:12 prod-dbsnap-01 kernel: __x64_sys_fsync+0x14/0x20
Jul 17 16:49:12 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0
Jul 17 16:49:12 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 17 16:49:12 prod-dbsnap-01 kernel: RIP: 0033:0x7f489051dd00
Jul 17 16:49:12 prod-dbsnap-01 kernel: Code: Bad RIP value.
Jul 17 16:49:12 prod-dbsnap-01 kernel: RSP: 002b:00007fff39abd838 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
Jul 17 16:49:12 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f489051dd00
Jul 17 16:49:12 prod-dbsnap-01 kernel: RDX: 0000000000000180 RSI: 0000000000000002 RDI: 0000000000000003
Jul 17 16:49:12 prod-dbsnap-01 kernel: RBP: 0000000000000000 R08: 3839303030303030 R09: 00007f488dfe920d
Jul 17 16:49:12 prod-dbsnap-01 kernel: R10: 00007f488e363c8c R11: 0000000000000246 R12: 00007fff39abe200
Jul 17 16:49:12 prod-dbsnap-01 kernel: R13: 0000000000000000 R14: 0000000000000014 R15: 00007fff39abe200

       reply	other threads:[~2019-07-18 12:23 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <DB8PR03MB56287A9E527898E5727BB41599C80@DB8PR03MB5628.eurprd03.prod.outlook.com>
2019-07-18 12:23 ` Drazen Kacar [this message]
2019-07-19 12:12   ` Deadlock between btrfs-transacti and userland on 5.2.1 Drazen Kacar
2019-09-11 14:59   ` Filipe Manana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com \
    --to=drazen.kacar@oradian.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).