On Jul 6, 2021, at 7:11 PM, Joseph Qi wrote:

Hi Gang,

Could you please describe the issue in the following way?

Node 1          Node 2          Node 3
...             ...             ...

That would make it clearer for discussion.

Thanks,
Joseph

On 7/1/21 6:56 PM, Gang He wrote:

Hi Guys,

We have a three-node ocfs2 cluster, and the hang happens when a user runs the
reflink command while ocfs2 is doing node recovery (e.g. after one node has
been fenced). The hang is caused by a dlm deadlock between rksaph18 and
rksaph19; the two processes involved are shown below.

Jun 01 12:33:10 rksaph18 kernel: task:reflink state:D stack: 0 pid: 7515 ppid: 7439 flags:0x00004000
Jun 01 12:33:10 rksaph18 kernel: Call Trace:
Jun 01 12:33:10 rksaph18 kernel: __schedule+0x2fd/0x750
Jun 01 12:33:10 rksaph18 kernel: schedule+0x2f/0xa0
Jun 01 12:33:10 rksaph18 kernel: schedule_timeout+0x1cc/0x310
Jun 01 12:33:10 rksaph18 kernel: ? ocfs2_control_cfu+0x50/0x50 [ocfs2_stack_user]
Jun 01 12:33:10 rksaph18 kernel: ? 0xffffffffc08f7000
Jun 01 12:33:10 rksaph18 kernel: wait_for_completion+0xba/0x140
Jun 01 12:33:10 rksaph18 kernel: ? wake_up_q+0xa0/0xa0
Jun 01 12:33:10 rksaph18 kernel: __ocfs2_cluster_lock.isra.41+0x3b5/0x820 [ocfs2]
Jun 01 12:33:10 rksaph18 kernel: ? ocfs2_inode_lock_full_nested+0x1fc/0x960 [ocfs2]
Jun 01 12:33:10 rksaph18 kernel: ocfs2_inode_lock_full_nested+0x1fc/0x960 [ocfs2]
Jun 01 12:33:10 rksaph18 kernel: ocfs2_mv_orphaned_inode_to_new+0x346/0x7e0 [ocfs2]
Jun 01 12:33:10 rksaph18 kernel: ? _raw_spin_unlock_irqrestore+0x14/0x20
Jun 01 12:33:10 rksaph18 kernel: ocfs2_reflink+0x335/0x4c0 [ocfs2]
Jun 01 12:33:10 rksaph18 kernel: ? ocfs2_reflink_ioctl+0x2ca/0x360 [ocfs2]
Jun 01 12:33:10 rksaph18 kernel: ocfs2_reflink_ioctl+0x2ca/0x360 [ocfs2]
Jun 01 12:33:10 rksaph18 kernel: ocfs2_ioctl+0x25e/0x670 [ocfs2]
Jun 01 12:33:10 rksaph18 kernel: do_vfs_ioctl+0xa0/0x680
Jun 01 12:33:10 rksaph18 kernel: ksys_ioctl+0x70/0x80
Jun 01 12:33:10 rksaph18 kernel: __x64_sys_ioctl+0x16/0x20
Jun 01 12:33:10 rksaph18 kernel: do_syscall_64+0x5b/0x1e0
Jun 01 12:33:10 rksaph18 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 01 12:33:10 rksaph18 kernel: RIP: 0033:0x7f1bb2aaf9e7

The reflink process acquires the orphan_dir_inode dlm lock (a per-node ocfs2
system file) after it has already acquired the reflink target file's inode dlm
lock; the reflink target file is placed in the orphan_dir_inode directory
while the reflink operation is in progress.

Jun 01 12:33:17 rksaph19 kernel: Workqueue: ocfs2_wq ocfs2_complete_recovery [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: Call Trace:
Jun 01 12:33:17 rksaph19 kernel: __schedule+0x2fd/0x750
Jun 01 12:33:17 rksaph19 kernel: schedule+0x2f/0xa0
Jun 01 12:33:17 rksaph19 kernel: schedule_timeout+0x1cc/0x310
Jun 01 12:33:17 rksaph19 kernel: ? ocfs2_control_cfu+0x50/0x50 [ocfs2_stack_user]
Jun 01 12:33:17 rksaph19 kernel: ? 0xffffffffc0862000
Jun 01 12:33:17 rksaph19 kernel: wait_for_completion+0xba/0x140
Jun 01 12:33:17 rksaph19 kernel: ? wake_up_q+0xa0/0xa0
Jun 01 12:33:17 rksaph19 kernel: __ocfs2_cluster_lock.isra.41+0x3b5/0x820 [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: ? ocfs2_inode_lock_full_nested+0x1fc/0x960 [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: ocfs2_inode_lock_full_nested+0x1fc/0x960 [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: ocfs2_evict_inode+0x18a/0x810 [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: evict+0xca/0x1b0
Jun 01 12:33:17 rksaph19 kernel: ocfs2_orphan_filldir+0x92/0x140 [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: ocfs2_dir_foreach_blk+0x4b2/0x570 [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: ? ocfs2_inode_lock_full_nested+0x487/0x960 [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: ocfs2_dir_foreach+0x54/0x80 [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: ocfs2_queue_orphans+0xf2/0x1f0 [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: ? __ocfs2_recovery_map_test+0x50/0x50 [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: ? chacha_block_generic+0x6c/0xb0
Jun 01 12:33:17 rksaph19 kernel: ? ocfs2_recover_orphans+0x12d/0x4f0 [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: ocfs2_recover_orphans+0x12d/0x4f0 [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: ? internal_add_timer+0x4e/0x70
Jun 01 12:33:17 rksaph19 kernel: ocfs2_complete_recovery+0x19a/0x450 [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: ? queue_delayed_work_on+0x2a/0x40
Jun 01 12:33:17 rksaph19 kernel: ? ocfs2_orphan_scan_work+0x110/0x2c0 [ocfs2]
Jun 01 12:33:17 rksaph19 kernel: process_one_work+0x1f4/0x3e0
Jun 01 12:33:17 rksaph19 kernel: worker_thread+0x2d/0x3e0
Jun 01 12:33:17 rksaph19 kernel: ? process_one_work+0x3e0/0x3e0
Jun 01 12:33:17 rksaph19 kernel: kthread+0x10d/0x130
Jun 01 12:33:17 rksaph19 kernel: ? kthread_park+0xa0/0xa0
Jun 01 12:33:17 rksaph19 kernel: ret_from_fork+0x22/0x40

[Inline reply from Wengang on the rksaph19 trace:]

The stack on rksaph19 doesn't look reasonable. The iput() shouldn't drop the
last reference there. I am wondering if f5785283dd64867a711ca1fb1f5bb172f252ecdf
fixes your problem.

thanks,
wengang

[Gang He's mail continues:]

The ocfs2_complete_recovery routine acquires that orphan file's inode dlm lock
after it has already acquired the orphan_dir_inode dlm lock. So the hang looks
like an ABBA deadlock.

So far I cannot reproduce this race condition, but according to the backtraces
and the related source code, the problem does exist. The triggering condition
is a node running the reflink command while another node is fenced/down (which
starts the dlm recovery routine).

Any comments?

Thanks
Gang

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
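To make the ABBA ordering described above concrete, here is a minimal
userspace sketch of the same inversion. It is not ocfs2 code: plain pthread
mutexes stand in for the two dlm locks, and the names reflink_path,
recovery_path, target_inode_lock and orphan_dir_lock are illustrative only.
One thread takes the target inode lock and then the orphan_dir lock (the
reflink side on rksaph18); the other takes the orphan_dir lock and then the
inode lock (the recovery side on rksaph19); with unlucky timing both block
forever.

/*
 * Minimal ABBA deadlock sketch (illustration only, not ocfs2 code).
 * Build: gcc -pthread abba.c -o abba
 * With the sleep() widening the race window, this program hangs,
 * mirroring the two stuck tasks in the traces above.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* stand-ins for the two cluster locks involved */
static pthread_mutex_t target_inode_lock = PTHREAD_MUTEX_INITIALIZER; /* lock A */
static pthread_mutex_t orphan_dir_lock   = PTHREAD_MUTEX_INITIALIZER; /* lock B */

/* reflink side: holds the target inode lock, then wants the orphan_dir
 * lock to move the inode out of the orphan directory. */
static void *reflink_path(void *arg)
{
    pthread_mutex_lock(&target_inode_lock);           /* A */
    printf("reflink:  holds inode lock, wants orphan_dir lock\n");
    sleep(1);                                          /* widen the race window */
    pthread_mutex_lock(&orphan_dir_lock);              /* B -- blocks forever */
    printf("reflink:  got both locks\n");
    pthread_mutex_unlock(&orphan_dir_lock);
    pthread_mutex_unlock(&target_inode_lock);
    return NULL;
}

/* recovery side: holds the orphan_dir lock while scanning it, then the
 * final iput()/evict wants the orphan inode's lock. */
static void *recovery_path(void *arg)
{
    pthread_mutex_lock(&orphan_dir_lock);              /* B */
    printf("recovery: holds orphan_dir lock, wants inode lock\n");
    sleep(1);
    pthread_mutex_lock(&target_inode_lock);            /* A -- blocks forever */
    printf("recovery: got both locks\n");
    pthread_mutex_unlock(&target_inode_lock);
    pthread_mutex_unlock(&orphan_dir_lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, reflink_path, NULL);
    pthread_create(&t2, NULL, recovery_path, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

The usual ways out of such an inversion are to make both paths take the two
locks in the same order, or to have one side release its first lock (or use a
trylock and back off) before taking the second.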