From: Junxiao Bi <junxiao.bi@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] ocfs2: A race between refmap setting and clearing
Date: Tue, 19 Jan 2016 11:03:19 +0800
Message-ID: <569DA777.4000909@oracle.com>
In-Reply-To: <569C8F39.6090802@huawei.com>

Hi Jiufei & Joseph,

On 01/18/2016 03:07 PM, xuejiufei wrote:
> 
> 
> On 2016/1/18 12:28, Junxiao Bi wrote:
>> On 01/13/2016 04:24 PM, Joseph Qi wrote:
>>> Hi Junxiao,
>>>
>>> On 2016/1/13 15:00, Junxiao Bi wrote:
>>>> On 01/13/2016 02:21 PM, xuejiufei wrote:
>>>>> Hi Junxiao,
>>>>> I have not described the issue clearly.
>>>>>
>>>>> Node 1                               Node 2(master)
>>>>> dlmlock
>>>>> dlm_do_master_request
>>>>>                                 dlm_master_request_handler
>>>>>                                 -> dlm_lockres_set_refmap_bit
>>>>> dlmlock succeed
>>>>> dlmunlock succeed
>>>>>
>>>>> dlm_purge_lockres
>>>>>                                 dlm_deref_handler
>>>>>                                 -> find lock resource is in
>>>>>                                    DLM_LOCK_RES_SETREF_INPROG state,
>>>>>                                    so dispatch a deref work
>>>>> dlm_purge_lockres succeed.
>>>>>
>>>>> call dlmlock again
>>>>> dlm_do_master_request
>>>>>                                 dlm_master_request_handler
>>>>>                                 -> dlm_lockres_set_refmap_bit
>>>>>
>>>>>                                 deref work trigger, call
>>>>>                                 dlm_lockres_clear_refmap_bit
>>>>>                                 to clear Node 1 from refmap
>>>>>
>>>>>                                 dlm_purge_lockres succeed
>>>>>
>>>>> dlm_send_remote_lock_request
>>>>>                                 return DLM_IVLOCKID because
>>>>>                                 the lockres does not exist
>>>> Clearer now, thank you.
>>>> This is a very complicated race, and I don't have a good solution for it
>>>> yet. Your fix looks workable, but I am afraid that if we keep fixing these
>>>> kinds of races case by case, we will make the dlm harder to understand and
>>>> easier to introduce bugs into; maybe we should think about refactoring the dlm.
>>>>
>>> Agree. IMO, the root cause is that a bit op cannot handle such a case.
>>> I wonder if we have to change it to a refcount, which may require a much
>>> bigger refactoring.
>> One bit for each node seems reasonable, as the lockres is per node. I
>> think the cause is the disordered set/clear; I am trying to see whether
>> they can be made to happen in order.
>>
> Agree. Solution 1) in my first mail adds a new message to keep the set
> and clear in order. Other nodes can purge the lock resource only after
> the refmap on the master is cleared.
I am taking another approach: delete deref_lockres_worker and make the
refmap deref happen directly in the deref handler. This requires finding
and fixing the out-of-order part. As far as I can see, the only possible
disorder is do_assert_master setting the refmap bit after
deref_lockres_handler has cleared it?
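
Roughly what I have in mind for the handler side is below. This is an
untested sketch, not a patch, and whether it is OK to block in the
handler like this is one of the things that needs checking:

        /* Sketch against dlm_deref_lockres_handler() in
         * fs/ocfs2/dlm/dlmmaster.c: instead of dispatching
         * dlm_deref_lockres_worker when DLM_LOCK_RES_SETREF_INPROG is
         * set, wait for the flag to clear and drop the refmap bit
         * inline, so a clear can never slip behind a later set. */
        int cleared = 0;

        spin_lock(&res->spinlock);
        /* may drop and retake res->spinlock while waiting */
        __dlm_wait_on_lockres_flags(res, DLM_LOCK_RES_SETREF_INPROG);
        BUG_ON(res->state & DLM_LOCK_RES_DROPPING_REF);
        if (test_bit(node, res->refmap)) {
                dlm_lockres_clear_refmap_bit(dlm, res, node);
                cleared = 1;
        }
        spin_unlock(&res->spinlock);

        if (cleared)
                dlm_lockres_calc_usage(dlm, res);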

For this to work, we need to make sure MASTERY_REF is not returned to the
owner after purge. SETREF_INPROG is designed to do this: it is set by
assert_master_handler, and purge waits for it to clear before dereffing
the lockres. There are two cases where SETREF_INPROG is not set at purge
time:

1. The assert master message has not arrived by the time of purge().
This happens when the lockres owner already exists when we ask for a
lockres. The owner is learned from the master request message, and
dlm_get_lock_resource returns. Soon dlmlock/dlmunlock is done, and the
purge is done. Then the assert master message arrives, but since the MLE
has already been deleted in dlm_get_lock_resource, it will not return
MASTERY_REF to the owner to reset the refmap. So there is no problem in
this case.

2. The assert master handler sets SETREF_INPROG too late.
SETREF_INPROG is set after the lockres owner has been updated and
dlm->spinlock has been released. That lets dlm_get_lock_resource wake up
too early, so the purge may happen before SETREF_INPROG is set. This is a
bug and can be fixed by the following patch.

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 9477d6e1de37..83cd65b128d0 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -1965,6 +1965,7 @@ ok:
                if (res) {
                        int wake = 0;
                        spin_lock(&res->spinlock);
+                       res->state |= DLM_LOCK_RES_SETREF_INPROG;
                        if (mle->type == DLM_MLE_MIGRATION) {
                                mlog(0, "finishing off migration of
lockres %.*s, "
                                        "from %u to %u\n",
@@ -2029,12 +2030,8 @@ ok:

 done:
        ret = 0;
-       if (res) {
-               spin_lock(&res->spinlock);
-               res->state |= DLM_LOCK_RES_SETREF_INPROG;
-               spin_unlock(&res->spinlock);
+       if (res)
                *ret_data = (void *)res;
-       }
        dlm_put(dlm);
        if (master_request) {
                mlog(0, "need to tell master to reassert\n");


Besides this patch, if we revert commit
f3f854648de64c4b6f13f6f13113bc9525c621e5 ("ocfs2_dlm: Ensure correct
ordering of set/clear refmap bit on lockres"), can that fix this issue?
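
For reference, the purge-side ordering this whole argument relies on
looks roughly like the following (quoted from memory of
fs/ocfs2/dlm/dlmthread.c, so treat it as a sketch rather than the exact
code):

        /* Non-master path of dlm_purge_lockres(): before asking the
         * master to drop our refmap bit, wait for any in-flight
         * assert_master to finish setting it. This is why setting
         * SETREF_INPROG too late (case 2 above) breaks the
         * set-before-clear ordering. */
        spin_lock(&res->spinlock);
        /* This ensures that clear refmap is sent after the set refmap */
        __dlm_wait_on_lockres_flags(res, DLM_LOCK_RES_SETREF_INPROG);
        spin_unlock(&res->spinlock);

        /* clear our bit from the master's refmap, ignore errors */
        ret = dlm_drop_lockres_ref(dlm, res);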

Thanks,
Junxiao.



> 
> Thanks
> Jiufei
> 
>> Thanks,
>> Junxiao.
>>>
>>> Thanks,
>>> Joseph
>>>
>>>> Thanks,
>>>> Junxiao.
>>>>
>>>>> This will BUG if the lockres is $RECOVERY.
>>>>>
>>>>> On 2016/1/13 10:46, Junxiao Bi wrote:
>>>>>> On 01/12/2016 03:16 PM, xuejiufei wrote:
>>>>>>> Hi, Junxiao
>>>>>>>
>>>>>>> On 2016/1/12 12:03, Junxiao Bi wrote:
>>>>>>>> Hi Jiufei,
>>>>>>>>
>>>>>>>> On 01/11/2016 10:46 AM, xuejiufei wrote:
>>>>>>>>> Hi all,
>>>>>>>>> We have found a race between refmap setting and clearing which
>>>>>>>>> causes the lock resource on the master to be freed before other nodes
>>>>>>>>> purge it.
>>>>>>>>>
>>>>>>>>> Node 1                               Node 2(master)
>>>>>>>>> dlm_do_master_request
>>>>>>>>>                                 dlm_master_request_handler
>>>>>>>>>                                 -> dlm_lockres_set_refmap_bit
>>>>>>>>> call dlm_purge_lockres after unlock
>>>>>>>>>                                 dlm_deref_handler
>>>>>>>>>                                 -> find lock resource is in
>>>>>>>>>                                    DLM_LOCK_RES_SETREF_INPROG state,
>>>>>>>>>                                    so dispatch a deref work
>>>>>>>>> dlm_purge_lockres succeed.
>>>>>>>>>
>>>>>>>>> dlm_do_master_request
>>>>>>>>>                                 dlm_master_request_handler
>>>>>>>>>                                 -> dlm_lockres_set_refmap_bit
>>>>>>>>>
>>>>>>>>>                                 deref work trigger, call
>>>>>>>>>                                 dlm_lockres_clear_refmap_bit
>>>>>>>>>                                 to clear Node 1 from refmap
>>>>>>>>>
>>>>>>>>> Now Node 2 can purge the lock resource, but the owner of the lock resource
>>>>>>>>> on Node 1 is still Node 2, which may trigger a BUG if the lock resource
>>>>>>>>> is $RECOVERY, or cause other problems.
>>>>>>>>>
>>>>>>>>> We have discussed 2 solutions:
>>>>>>>>> 1) The master returns an error to Node 1 if DLM_LOCK_RES_SETREF_INPROG
>>>>>>>>> is set. Node 1 does not retry; instead, the master sends another message
>>>>>>>>> to Node 1 after clearing the refmap. Node 1 can purge the lock resource
>>>>>>>>> after the refmap on the master is cleared.
>>>>>>>>> 2) The master returns an error to Node 1 if DLM_LOCK_RES_SETREF_INPROG
>>>>>>>>> is set, and Node 1 retries the deref of the lockres.
>>>>>>>>>
>>>>>>>>> Does anybody have better ideas?
>>>>>>>>>
>>>>>>>> dlm_purge_lockres() will wait to drop the ref until
>>>>>>>> DLM_LOCK_RES_SETREF_INPROG is cleared. So if we set this flag when the
>>>>>>>> master is found during the master request, and then clear it when the
>>>>>>>> assert master message is received, can this fix the issue?
>>>>>>>>
>>>>>>> I don't think this can fix it. Before doing the master request, the lock
>>>>>>> resource is already purged. The master should clear the refmap before the
>>>>>>> client purges it.
>>>>>> inflight_locks is increased in dlm_get_lock_resource(), which will stop
>>>>>> the lockres from being purged? If we set DLM_LOCK_RES_SETREF_INPROG when
>>>>>> the lockres owner is found during the master request, will that stop the
>>>>>> lockres from being purged after unlock?
>>>>>>
>>>>>> Thanks,
>>>>>> Junxiao.
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jiufei
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Junxiao.
>>>>>>>>> Thanks,
>>>>>>>>> --Jiufei
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
> 
