From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gang He
Date: Fri, 13 Aug 2021 14:49:04 +0800
Subject: [Cluster-devel] Why does the dlm_lock function fail when downconverting a dlm lock?
In-Reply-To: <20210812174523.GC1757@redhat.com>
References: <74009531-f6ef-4ef9-b969-353684006ddc@suse.com> <20210812174523.GC1757@redhat.com>
Message-ID: 
List-Id: 
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi David,

On 2021/8/13 1:45, David Teigland wrote:
> On Thu, Aug 12, 2021 at 01:44:53PM +0800, Gang He wrote:
>> In fact, I can reproduce this problem reliably.
>> First, I want to know whether this error is expected, since the test
>> does not apply any extreme pressure.
>> Second, how should we handle these error cases? Call dlm_lock()
>> again? The function may fail again, and repeated retries could lead
>> to a kernel soft lockup.
>
> What's probably happening is that ocfs2 calls dlm_unlock(CANCEL) to cancel
> an in-progress dlm_lock() request. Before the cancel completes (or the
> original request completes), ocfs2 calls dlm_lock() again on the same
> resource. This dlm_lock() returns -EBUSY because the previous request has
> not completed, either normally or by cancellation.

This is expected. Are these dlm_lock() and dlm_unlock() calls invoked
on the same node, or on different nodes?

>
> A couple options to try: wait for the original request to complete
> (normally or by cancellation) before calling dlm_lock() again, or retry
> dlm_lock() on -EBUSY.

If I retry dlm_lock() repeatedly, I wonder whether that will lead to a
kernel soft lockup or waste a lot of CPU.
Also, if dlm_lock() returns -EAGAIN, how should we handle that case?
Retry repeatedly as well? (Rough sketches of both options are appended
below.)

Thanks
Gang

>
> Dave
>
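
For reference, a minimal sketch of the first option: wait for the
in-flight request's AST to fire (normal grant, or sb_status ==
-DLM_ECANCEL after a cancel) before issuing the next dlm_lock(). This
is not ocfs2's actual code; the struct, helper names, and the setup of
the lockspace and lksb are hypothetical, and only the
dlm_lock()/dlm_unlock() calls follow the real kernel API.

#include <linux/completion.h>
#include <linux/dlm.h>
#include <linux/string.h>

struct my_lock {
	struct dlm_lksb lksb;		/* holds sb_lkid / sb_status */
	struct completion ast_done;	/* init_completion() at setup time */
};

static void my_lock_ast(void *astarg)
{
	struct my_lock *ml = astarg;

	/* Fires on a normal grant, and with sb_status == -DLM_ECANCEL
	 * when a pending request was cancelled. */
	complete(&ml->ast_done);
}

static int relock_after_cancel(dlm_lockspace_t *ls, struct my_lock *ml,
			       const char *name, int mode)
{
	int rv;

	/* Ask the DLM to cancel the in-progress request... */
	rv = dlm_unlock(ls, ml->lksb.sb_lkid, DLM_LKF_CANCEL,
			&ml->lksb, ml);
	if (rv && rv != -EBUSY)
		return rv;

	/* ...and wait for its AST before touching the resource again,
	 * so the next dlm_lock() cannot hit -EBUSY. */
	wait_for_completion(&ml->ast_done);
	reinit_completion(&ml->ast_done);

	/* Re-issue the conversion on the same lkid. */
	return dlm_lock(ls, mode, &ml->lksb, DLM_LKF_CONVERT,
			(void *)name, strlen(name), 0,
			my_lock_ast, ml, NULL);
}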
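
And a sketch of the second option: a bounded retry on -EBUSY. The retry
cap and sleep interval are arbitrary assumptions; the point is that
sleeping between attempts yields the CPU, so the loop neither busy-waits
nor trips the soft-lockup watchdog.

#include <linux/delay.h>
#include <linux/dlm.h>
#include <linux/errno.h>

#define RELOCK_MAX_TRIES 100	/* arbitrary cap; give up eventually */

static int dlm_lock_retry(dlm_lockspace_t *ls, int mode,
			  struct dlm_lksb *lksb, uint32_t flags,
			  void *name, unsigned int namelen,
			  void (*ast)(void *astarg), void *astarg)
{
	int tries, rv;

	for (tries = 0; tries < RELOCK_MAX_TRIES; tries++) {
		rv = dlm_lock(ls, mode, lksb, flags, name, namelen,
			      0 /* parent_lkid */, ast, astarg, NULL);
		if (rv != -EBUSY)
			return rv;	/* success, or a real error */
		msleep(20);	/* sleep between retries; do not spin */
	}
	return -EBUSY;
}

The same loop could treat -EAGAIN (the DLM status for a request that
cannot be granted immediately when DLM_LKF_NOQUEUE is set) the same
way, though giving up sooner probably makes more sense in that case.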