From: Gang He <ghe@suse.com>
To: Joseph Qi <joseph.qi@linux.alibaba.com>, mark@fasheh.com, jlbec@evilplan.org, Wengang Wang <wen.gang.wang@oracle.com>
Cc: linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com, akpm@linux-foundation.org
Subject: Re: [PATCH] ocfs2: reflink deadlock when clone file to the same directory simultaneously
Date: Thu, 19 Aug 2021 14:31:36 +0800
Message-ID: <5a1af56c-3eab-5baf-62a3-1c98bac104ba@suse.com>
In-Reply-To: <4ba3b404-824b-90a3-ef43-9ab6510ee073@linux.alibaba.com>

On 2021/8/19 10:02, Joseph Qi wrote:
>
>
> On 8/19/21 9:51 AM, Gang He wrote:
>>
>>
>> On 2021/8/18 19:20, Joseph Qi wrote:
>>>
>>>
>>> On 8/18/21 5:20 PM, Gang He wrote:
>>>>
>>>>
>>>> On 2021/8/13 17:54, Joseph Qi wrote:
>>>>>
>>>>>
>>>>> On 8/9/21 6:08 PM, Gang He wrote:
>>>>>> Hi Joseph and All,
>>>>>>
>>>>>> The deadlock is caused by self-locking on one node.
>>>>>> There is a three-node cluster (mounted to /mnt/shared), and the user runs the reflink command to clone a file to the same directory repeatedly,
>>>>>> e.g.
>>>>>>   reflink "/mnt/shared/test" \
>>>>>>   "/mnt/shared/.snapshots/test.`date +%m%d%H%M%S`.`hostname`"
>>>>>>
>>>>>> After a while, the reflink process on each node hangs, and the file system cannot be listed.
>>>>>> The problematic reflink process is blocked by itself, e.g. the reflink process hung on ghe-sle15sp2-nd2:
>>>>>> kernel: task:reflink state:D stack: 0 pid:16992 ppid: 4530
>>>>>> kernel: Call Trace:
>>>>>> kernel: __schedule+0x2fd/0x750
>>>>>> kernel: ? try_to_wake_up+0x17b/0x4e0
>>>>>> kernel: schedule+0x2f/0xa0
>>>>>> kernel: schedule_timeout+0x1cc/0x310
>>>>>> kernel: ? __wake_up_common+0x74/0x120
>>>>>> kernel: wait_for_completion+0xba/0x140
>>>>>> kernel: ? wake_up_q+0xa0/0xa0
>>>>>> kernel: __ocfs2_cluster_lock.isra.41+0x3b5/0x820 [ocfs2]
>>>>>> kernel: ? ocfs2_inode_lock_full_nested+0x1fc/0x960 [ocfs2]
>>>>>> kernel: ocfs2_inode_lock_full_nested+0x1fc/0x960 [ocfs2]
>>>>>> kernel: ocfs2_init_security_and_acl+0xbe/0x1d0 [ocfs2]
>>>>>> kernel: ocfs2_reflink+0x436/0x4c0 [ocfs2]
>>>>>> kernel: ? ocfs2_reflink_ioctl+0x2ca/0x360 [ocfs2]
>>>>>> kernel: ocfs2_reflink_ioctl+0x2ca/0x360 [ocfs2]
>>>>>> kernel: ocfs2_ioctl+0x25e/0x670 [ocfs2]
>>>>>> kernel: do_vfs_ioctl+0xa0/0x680
>>>>>> kernel: ksys_ioctl+0x70/0x80
>>>>>>
>>>>>> In fact, the destination directory (.snapshots) inode dlm lock was acquired by ghe-sle15sp2-nd2. Next, a bast message arrived from the other nodes asking ghe-sle15sp2-nd2 to downconvert the lock, but the operation failed, and the kernel printed messages like:
>>>>>> kernel: (ocfs2dc-AA35DD9,2560,3):ocfs2_downconvert_lock:3660 ERROR: DLM error -16 while calling ocfs2_dlm_lock on resource M0000000000000000046e0200000000
>>>>>> kernel: (ocfs2dc-AA35DD9,2560,3):ocfs2_unblock_lock:3904 ERROR: status = -16
>>>>>> kernel: (ocfs2dc-AA35DD9,2560,3):ocfs2_process_blocked_lock:4303 ERROR: status = -16
>>>>>>
>>>>>> Then the reflink process tries to acquire this directory inode dlm lock and is blocked. The dlm lock resource in memory looks like:
>>>>>>
>>>>>>     l_name = "M0000000000000000046e0200000000",
>>>>>>     l_ro_holders = 0,
>>>>>>     l_ex_holders = 0,
>>>>>>     l_level = 5 '\005',
>>>>>>     l_requested = 0 '\000',
>>>>>>     l_blocking = 5 '\005',
>>>>>>     l_type = 0 '\000',
>>>>>>     l_action = 0 '\000',
>>>>>>     l_unlock_action = 0 '\000',
>>>>>>     l_pending_gen = 645948,
>>>>>>
>>>>>>
>>>>>> So far, I do not know what makes the dlm lock function fail. It also looks like we do not handle this failure case in the dlmglue layer, but I can always reproduce this hang with my test script, e.g.
>>>>>>
>>>>>>   loop=1
>>>>>>   while ((loop++)) ; do
>>>>>>     for i in `seq 1 100`; do
>>>>>>       reflink "/mnt/shared/test" "/mnt/shared/.snapshots/test.${loop}.${i}.`date +%m%d%H%M%S`.`hostname`"
>>>>>>     done
>>>>>>     usleep 500000
>>>>>>     rm -f /mnt/shared/.snapshots/testnode1.qcow2.*.`hostname`
>>>>>>   done
>>>>>>
>>>>>> My patch avoids acquiring the dest directory inode dlm lock multiple times in the ocfs2_reflink function, which prevents the hang from happening again. The code change can also improve reflink performance in this case.
>>>>>>
>>>>>> Thanks
>>>>>> Gang
>>>>>
>>>>> 'status = -16' implies DLM_CANCELGRANT.
>>>>> Do you use stack user instead of o2cb? If yes, can you try o2cb with
>>>>> your reproducer?
>>>>
>>>> I set up o2cb-based ocfs2 clusters with sle15sp2 and oracleLinux8u4.
>>>> After two days of testing with the same script, I did not encounter the dlm_lock downconvert failure, and the hang did not happen.
>>>> After my patch was applied, there was no side effect, and reflink performance was doubled in this case.
>>>>
>>>
>>> Do you mean the hang only happens on stack user?
>> Yes.
>> Why? Because the o2cb-based dlm_lock did not return error -16 when downconverting the dlm lock during the whole test.
>> But the pcmk-based dlm_lock returned error -16 during the test, and we did not handle this error further in the dlmglue layer, so we hit the hang the next time dlm_lock tried to acquire the lock. Maybe there is a race condition when using dlm_lock/dlm_unlock(cancel) in the dlmglue layer.
>> Anyway, the problem belongs to ocfs2's own parts.
>>
> I meant if DLM_CANCELGRANT is not the expected return code, we'd
> better fix the issue in stack_user.c but not dlmglue, e.g. some specific
> wrapper.

We cannot wrap (or ignore) this error in stack_user; otherwise it will lead to a hang when the next dlm_lock is invoked.
Based on comments from the fs/dlm maintainer, error -16 is returned by dlm_lock when ocfs2 calls dlm_unlock(CANCEL) to cancel an in-progress dlm_lock() request.
In fact, the code comments in dlmglue.c also discuss a similar situation, but I feel the current code still has a race condition that triggers dlm_lock to return the -16 error.
For the o2cb stack, its dlm_lock did not expose this error; maybe the dlm implementation is different.

Thanks
Gang

>
> Thanks,
> Joseph
>
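As a reading aid for the lock resource dump quoted above, the following is a minimal sketch that maps the numeric mode fields (l_level, l_requested, l_blocking) to the conventional DLM mode names. It assumes the fields carry the standard DLM_LOCK_* values (IV = -1, NL = 0, CR = 1, CW = 2, PR = 3, PW = 4, EX = 5); the function names dlm_mode_name and describe_lockres are hypothetical helpers for illustration, not part of ocfs2 or any tool.

```shell
#!/bin/sh
# Map a numeric DLM lock mode to its conventional name.
# Assumption: the dump fields use the standard DLM_LOCK_* numbering.
dlm_mode_name() {
    case "$1" in
        -1) echo "IV" ;;   # invalid / no mode
        0)  echo "NL" ;;   # null
        1)  echo "CR" ;;   # concurrent read
        2)  echo "CW" ;;   # concurrent write
        3)  echo "PR" ;;   # protected read
        4)  echo "PW" ;;   # protected write
        5)  echo "EX" ;;   # exclusive
        *)  echo "unknown" ;;
    esac
}

# Hypothetical helper: summarize l_level, l_requested and l_blocking.
describe_lockres() {
    printf 'level=%s requested=%s blocking=%s\n' \
        "$(dlm_mode_name "$1")" \
        "$(dlm_mode_name "$2")" \
        "$(dlm_mode_name "$3")"
}

# Values taken from the dump in the message: l_level=5, l_requested=0, l_blocking=5.
describe_lockres 5 0 5
```

Under that assumption, the dump would read as: the node still holds the directory inode lock at EX while another node is blocked wanting EX, which matches the description of a downconvert that never completed.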