From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gang He
Date: Fri, 13 Aug 2021 14:49:04 +0800
Subject: [Cluster-devel] Why does the dlm_lock function fail when downconverting a dlm lock?
In-Reply-To: <20210812174523.GC1757@redhat.com>
References: <74009531-f6ef-4ef9-b969-353684006ddc@suse.com> <20210812174523.GC1757@redhat.com>
Message-ID: 
List-Id: 
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi David,

On 2021/8/13 1:45, David Teigland wrote:
> On Thu, Aug 12, 2021 at 01:44:53PM +0800, Gang He wrote:
>> In fact, I can reproduce this problem reliably.
>> First, I want to know whether this error is expected, since the test
>> does not apply any extreme pressure.
>> Second, how should we handle these error cases? Call dlm_lock()
>> again? The function may fail again, and repeated retries could lead
>> to a kernel soft lockup.
>
> What's probably happening is that ocfs2 calls dlm_unlock(CANCEL) to cancel
> an in-progress dlm_lock() request. Before the cancel completes (or the
> original request completes), ocfs2 calls dlm_lock() again on the same
> resource. This dlm_lock() returns -EBUSY because the previous request has
> not completed, either normally or by cancellation.

This is expected. Are these dlm_lock() and dlm_unlock() calls invoked
on the same node, or on different nodes?

>
> A couple options to try: wait for the original request to complete
> (normally or by cancellation) before calling dlm_lock() again, or retry
> dlm_lock() on -EBUSY.

If I retry dlm_lock() repeatedly, I wonder whether that will lead to a
kernel soft lockup or waste a lot of CPU.
Also, if dlm_lock() returns -EAGAIN, how should we handle that case?
Retry repeatedly as well? (Rough sketches of both options are appended
below.)

Thanks
Gang

>
> Dave
>
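
For reference, a minimal sketch of the first option: wait for the
in-flight request's AST to fire (normal grant, or sb_status ==
-DLM_ECANCEL after a cancel) before issuing the next dlm_lock(). This
is not ocfs2's actual code; the struct, helper names, and the setup of
the lockspace and lksb are hypothetical, and only the
dlm_lock()/dlm_unlock() calls follow the real kernel API.

#include <linux/completion.h>
#include <linux/dlm.h>
#include <linux/string.h>

struct my_lock {
	struct dlm_lksb lksb;		/* holds sb_lkid / sb_status */
	struct completion ast_done;	/* init_completion() at setup time */
};

static void my_lock_ast(void *astarg)
{
	struct my_lock *ml = astarg;

	/* Fires on a normal grant, and with sb_status == -DLM_ECANCEL
	 * when a pending request was cancelled. */
	complete(&ml->ast_done);
}

static int relock_after_cancel(dlm_lockspace_t *ls, struct my_lock *ml,
			       const char *name, int mode)
{
	int rv;

	/* Ask the DLM to cancel the in-progress request... */
	rv = dlm_unlock(ls, ml->lksb.sb_lkid, DLM_LKF_CANCEL,
			&ml->lksb, ml);
	if (rv && rv != -EBUSY)
		return rv;

	/* ...and wait for its AST before touching the resource again,
	 * so the next dlm_lock() cannot hit -EBUSY. */
	wait_for_completion(&ml->ast_done);
	reinit_completion(&ml->ast_done);

	/* Re-issue the conversion on the same lkid. */
	return dlm_lock(ls, mode, &ml->lksb, DLM_LKF_CONVERT,
			(void *)name, strlen(name), 0,
			my_lock_ast, ml, NULL);
}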
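
And a sketch of the second option: a bounded retry on -EBUSY. The retry
cap and sleep interval are arbitrary assumptions; the point is that
sleeping between attempts yields the CPU, so the loop neither busy-waits
nor trips the soft-lockup watchdog.

#include <linux/delay.h>
#include <linux/dlm.h>
#include <linux/errno.h>

#define RELOCK_MAX_TRIES 100	/* arbitrary cap; give up eventually */

static int dlm_lock_retry(dlm_lockspace_t *ls, int mode,
			  struct dlm_lksb *lksb, uint32_t flags,
			  void *name, unsigned int namelen,
			  void (*ast)(void *astarg), void *astarg)
{
	int tries, rv;

	for (tries = 0; tries < RELOCK_MAX_TRIES; tries++) {
		rv = dlm_lock(ls, mode, lksb, flags, name, namelen,
			      0 /* parent_lkid */, ast, astarg, NULL);
		if (rv != -EBUSY)
			return rv;	/* success, or a real error */
		msleep(20);	/* sleep between retries; do not spin */
	}
	return -EBUSY;
}

The same loop could treat -EAGAIN (the DLM status for a request that
cannot be granted immediately when DLM_LKF_NOQUEUE is set) the same
way, though giving up sooner probably makes more sense in that case.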