* Potential race in dlm based messaging md-cluster.c
       [not found]     ` <CAE3Hb8pJ=0MB6EX5jVch28gj-gnf0Mp1wyzxBfWjzLf=SuV4sQ@mail.gmail.com>
@ 2015-04-30 18:36       ` Abhijit Bhopatkar
  2015-04-30 18:47         ` Abhijit Bhopatkar
  2015-05-05  9:22         ` Lidong Zhong
  0 siblings, 2 replies; 9+ messages in thread
From: Abhijit Bhopatkar @ 2015-04-30 18:36 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-raid

There is a possibility of a receiver losing messages in certain
corner cases. One such case arises when two senders are ready with
messages to send. Sender 1 gets the TOKEN lock first and proceeds.
After its initial processing, Sender 1 _will_ release TOKEN as soon as
the receiver releases ACK; it does not wait until the receiver has
re-acquired CR on ACK.

To illustrate the problem, consider the timeline for two senders and one
receiver (we will ignore the receive path on the Sender2 node).

Sender1              Sender2                  Receiver
Get EX on TOKEN      Get EX on TOKEN
<Granted>            <Wait till granted>

Get EX on MSG
write LVB
down MSG to CR
Get EX of ACK
<wait till granted>                           BAST for ACK
                                               Get CR on MSG
                                               read LVB
                                               process
                                               release ACK
AST for ACK
down ACK to CR
release MSG
release TOKEN
                     <granted>
                     Get EX on MSG
                     <... proceed ...>
                     release TOKEN
  <lost one message>
^^^^^^^^^^^^^^^^^
                                               Get EX on MSG
                                               Get CR on ACK
                                               release MSG


Abhijit

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Potential race in dlm based messaging md-cluster.c
  2015-04-30 18:36       ` Potential race in dlm based messaging md-cluster.c Abhijit Bhopatkar
@ 2015-04-30 18:47         ` Abhijit Bhopatkar
  2015-04-30 18:51           ` Abhijit Bhopatkar
  2015-05-05  9:22         ` Lidong Zhong
  1 sibling, 1 reply; 9+ messages in thread
From: Abhijit Bhopatkar @ 2015-04-30 18:47 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-raid

On 01/05/15 12:06 am, Abhijit Bhopatkar wrote:
> There is a possibility of a receiver losing out on messages in certain
> corner conditions. One of the buggy case is if there is are two sender
> ready with messages to be sent. Sender 1 initially gets the TOKEN lock
> and proceeds.
> After initial processing the sender of message 1 _will_ release TOKEN as
> soon as receiver releases ACK, it does not wait till ACK CR is
> re-acquired by receiver.
>
I could not come up with any solution other than adding one more lock
resource, which for now we will call "SYNC".

Sender 1             Sender2                  Receiver
Get EX on TOKEN      Get EX on TOKEN
Get EX on SYNC       <Wait till granted>
<Granted>

Get EX on MSG
write LVB
down MSG to CR
Get EX of ACK
<wait till granted>                           BAST for ACK
                                               Get CR on MSG
                                               read LVB
                                               <process>
                                               Queue EX on SYNC
                                               release ACK
AST for ACK
down ACK to CR
release MSG
release SYNC
release TOKEN
                                                SYNC  granted
                     <granted>
                     Get EX on SYNC
                     <wait till grant>
                                                Get EX on MSG
                                                Get CR on ACK
                                                release MSG
                                                release SYNC

                     Get EX on MSG
                     <....proceed rest>
                     release TOKEN

The key thing to note here is that, in the receiver path, the SYNC lock
request is only queued at first; the receiver waits for the grant later.
Having worked on dlm before, I am confident this will work as expected.

Abhijit
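
The ordering argument in the SYNC timeline above can be sketched with a toy, single-process model. This is only an illustration under stated assumptions: `struct toy_sync`, `toy_queue_ex()` and `toy_release()` are hypothetical names, not part of dlm or md-cluster, and the model assumes waiters are granted in FIFO order.

```c
#include <assert.h>

#define TOY_MAX_WAITERS 8

/* Hypothetical node identifiers for this sketch only. */
enum toy_node { TOY_NONE = 0, TOY_SENDER1, TOY_SENDER2, TOY_RECEIVER };

/* Toy exclusive lock: one holder plus a FIFO of queued EX requests. */
struct toy_sync {
	enum toy_node holder;                   /* TOY_NONE when free */
	enum toy_node waiters[TOY_MAX_WAITERS]; /* oldest request first */
	int nwaiters;
};

/*
 * Queue an EX request and return at once, without waiting for the grant.
 * Returns 1 if granted immediately, 0 if queued, -1 if the queue is full.
 */
int toy_queue_ex(struct toy_sync *s, enum toy_node who)
{
	if (s->holder == TOY_NONE) {
		s->holder = who;
		return 1;
	}
	if (s->nwaiters == TOY_MAX_WAITERS)
		return -1;
	s->waiters[s->nwaiters++] = who;
	return 0;
}

/* Release the lock and hand it to the oldest waiter, if there is one. */
void toy_release(struct toy_sync *s)
{
	s->holder = TOY_NONE;
	if (s->nwaiters > 0) {
		s->holder = s->waiters[0];
		for (int i = 1; i < s->nwaiters; i++)
			s->waiters[i - 1] = s->waiters[i];
		s->nwaiters--;
	}
}
```

In this model the receiver queues its request while Sender1 still holds SYNC, so when Sender1 releases, the receiver becomes the holder and a later request from Sender2 has to wait until the receiver releases in turn.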



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Potential race in dlm based messaging md-cluster.c
  2015-04-30 18:47         ` Abhijit Bhopatkar
@ 2015-04-30 18:51           ` Abhijit Bhopatkar
  0 siblings, 0 replies; 9+ messages in thread
From: Abhijit Bhopatkar @ 2015-04-30 18:51 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-raid

On 01/05/15 12:17 am, Abhijit Bhopatkar wrote:
> On 01/05/15 12:06 am, Abhijit Bhopatkar wrote:
>> There is a possibility of a receiver losing out on messages in certain
>> corner conditions. One of the buggy case is if there is are two sender
>> ready with messages to be sent. Sender 1 initially gets the TOKEN lock
>> and proceeds.
>> After initial processing the sender of message 1 _will_ release TOKEN as
>> soon as receiver releases ACK, it does not wait till ACK CR is
>> re-acquired by receiver.
>>
> I could not come up with any solution except to add one more lock
> resource for now we will call it "SYNC"
>
Here is a POC patch (completely untested, not even compiled; sent only as an RFC).
If the solution is agreed upon, I will go ahead and test it.

Abhijit
---
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index fcfc4b9..addbbb4 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -62,6 +62,7 @@ struct md_cluster_info {
  	struct dlm_lock_resource *ack_lockres;
  	struct dlm_lock_resource *message_lockres;
  	struct dlm_lock_resource *token_lockres;
+	struct dlm_lock_resource *sync_lockres;
  	struct dlm_lock_resource *no_new_dev_lockres;
  	struct md_thread *recv_thread;
  	struct completion newdisk_completion;
@@ -94,7 +95,7 @@ static void sync_ast(void *arg)
  	complete(&res->completion);
  }
  
-static int dlm_lock_sync(struct dlm_lock_resource *res, int mode)
+static int dlm_lock_queue(struct dlm_lock_resource *res, int mode)
  {
  	int ret = 0;
  
@@ -102,12 +103,26 @@ static int dlm_lock_sync(struct dlm_lock_resource *res, int mode)
  	ret = dlm_lock(res->ls, mode, &res->lksb,
  			res->flags, res->name, strlen(res->name),
  			0, sync_ast, res, res->bast);
-	if (ret)
-		return ret;
+	return ret;
+}
+
+static int dlm_wait_for_lock_grant(struct dlm_lock_resource *res)
+{
  	wait_for_completion(&res->completion);
  	return res->lksb.sb_status;
  }
  
+static int dlm_lock_sync(struct dlm_lock_resource *res, int mode)
+{
+	int ret = 0;
+	ret = dlm_lock_queue(res, mode);
+
+	if (ret)
+		return ret;
+	ret = dlm_wait_for_lock_grant(res);
+	return ret;
+}
+
  static int dlm_unlock_sync(struct dlm_lock_resource *res)
  {
  	return dlm_lock_sync(res, DLM_LOCK_NL);
@@ -466,6 +481,7 @@ static void recv_daemon(struct md_thread *thread)
  	struct md_cluster_info *cinfo = thread->mddev->cluster_info;
  	struct dlm_lock_resource *ack_lockres = cinfo->ack_lockres;
  	struct dlm_lock_resource *message_lockres = cinfo->message_lockres;
+	struct dlm_lock_resource *sync_lockres = cinfo->sync_lockres;
  	struct cluster_msg msg;
  
  	/*get CR on Message*/
@@ -478,6 +494,9 @@ static void recv_daemon(struct md_thread *thread)
  	memcpy(&msg, message_lockres->lksb.sb_lvbptr, sizeof(struct cluster_msg));
  	process_recvd_msg(thread->mddev, &msg);
  
+	/*queue EX on SYNC; this blocks new senders till we acquire CR on ACK */
+	dlm_lock_queue(sync_lockres, DLM_LOCK_EX);
+
  	/*release CR on ack_lockres*/
  	dlm_unlock_sync(ack_lockres);
  	/*up-convert to EX on message_lockres*/
@@ -486,6 +505,11 @@ static void recv_daemon(struct md_thread *thread)
  	dlm_lock_sync(ack_lockres, DLM_LOCK_CR);
  	/*release CR on message_lockres*/
  	dlm_unlock_sync(message_lockres);
+
+	/*wait till EX on sync is granted */
+	dlm_wait_for_lock_grant(sync_lockres);
+	/*release EX on sync_lockres*/
+	dlm_unlock_sync(sync_lockres);
  }
  
  /* lock_comm()
@@ -500,11 +524,16 @@ static int lock_comm(struct md_cluster_info *cinfo)
  	if (error)
  		pr_err("md-cluster(%s:%d): failed to get EX on TOKEN (%d)\n",
  				__func__, __LINE__, error);
+	error = dlm_lock_sync(cinfo->sync_lockres, DLM_LOCK_EX);
+	if (error)
+		pr_err("md-cluster(%s:%d): failed to get EX on SYNC (%d)\n",
+				__func__, __LINE__, error);
  	return error;
  }
  
  static void unlock_comm(struct md_cluster_info *cinfo)
  {
+	dlm_unlock_sync(cinfo->sync_lockres);
  	dlm_unlock_sync(cinfo->token_lockres);
  }
  
@@ -673,6 +702,9 @@ static int join(struct mddev *mddev, int nodes)
  	cinfo->token_lockres = lockres_init(mddev, "token", NULL, 0);
  	if (!cinfo->token_lockres)
  		goto err;
+	cinfo->sync_lockres = lockres_init(mddev, "sync", NULL, 0);
+	if (!cinfo->sync_lockres)
+		goto err;
  	cinfo->ack_lockres = lockres_init(mddev, "ack", ack_bast, 0);
  	if (!cinfo->ack_lockres)
  		goto err;
@@ -711,6 +743,7 @@ static int join(struct mddev *mddev, int nodes)
  err:
  	lockres_free(cinfo->message_lockres);
  	lockres_free(cinfo->token_lockres);
+	lockres_free(cinfo->sync_lockres);
  	lockres_free(cinfo->ack_lockres);
  	lockres_free(cinfo->no_new_dev_lockres);
  	lockres_free(cinfo->bitmap_lockres);
@@ -733,6 +766,7 @@ static int leave(struct mddev *mddev)
  	md_unregister_thread(&cinfo->recv_thread);
  	lockres_free(cinfo->message_lockres);
  	lockres_free(cinfo->token_lockres);
+	lockres_free(cinfo->sync_lockres);
  	lockres_free(cinfo->ack_lockres);
  	lockres_free(cinfo->no_new_dev_lockres);
  	lockres_free(cinfo->sb_lock);


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: Potential race in dlm based messaging md-cluster.c
  2015-04-30 18:36       ` Potential race in dlm based messaging md-cluster.c Abhijit Bhopatkar
  2015-04-30 18:47         ` Abhijit Bhopatkar
@ 2015-05-05  9:22         ` Lidong Zhong
  2015-05-05  9:44           ` Abhijit Bhopatkar
  1 sibling, 1 reply; 9+ messages in thread
From: Lidong Zhong @ 2015-05-05  9:22 UTC (permalink / raw)
  To: Abhijit Bhopatkar, Goldwyn Rodrigues; +Cc: linux-raid

>>> On 5/1/2015 at 02:36 AM, in message <5542763C.90202@cisco.com>, Abhijit
Bhopatkar <abhopatk@cisco.com> wrote: 
> There is a possibility of a receiver losing out on messages in certain  
> corner conditions. One of the buggy case is if there is are two sender  
> ready with messages to be sent. Sender 1 initially gets the TOKEN lock  
> and proceeds. 
> After initial processing the sender of message 1 _will_ release TOKEN as  
> soon as receiver releases ACK, it does not wait till ACK CR is  
> re-acquired by receiver. 
>  
> To illustrate the problem consider timeline for two senders and one  
> receiver (we will ignore receive part for Sender2 node) 
>  
> Sender1              Sender2                         Receiver 
> Get EX on TOKEN       Get EX on TOKEN 
> <Granted>                    <Wait till granted> 
>  
> Get EX on MSG 
> write LVB 
> down MSG to CR 
> Get EX of ACK 
> <wait till granted>                                                      
>       BAST for ACK 
>                                                              Get CR on MSG 
>                      read LVB 
>                      process 
>                      release ACK 
> AST for ACK 
> down ACK to CR 
> release MSG 
> release TOKEN 
>                     <granted> 
>                     Get EX on MSG 

I am afraid this corner case can never occur. Sender2 will be blocked when requesting the
EX lock on the MSG resource until the receivers release their locks. The receivers' requests to
up-convert CR to EX on MSG should be placed on the convert queue before Sender2's
request is placed on the wait queue, because Sender2 has to wait until the EX on TOKEN
is released.

Regards,
Lidong
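
The blocking described above follows from the lock mode compatibility rules: a request for EX cannot be granted while any other lock on the resource is held in CR. A minimal userspace sketch of that check, assuming the standard six-mode compatibility table (the `toy_` names are hypothetical and local to this sketch, not the dlm API):

```c
#include <assert.h>
#include <stdbool.h>

/* Lock modes, listed from weakest to strongest; names are local to this sketch. */
enum toy_mode { TOY_NL, TOY_CR, TOY_CW, TOY_PR, TOY_PW, TOY_EX, TOY_NMODES };

/* compat[held][requested] is true when both modes may be granted together. */
static const bool compat[TOY_NMODES][TOY_NMODES] = {
	/*         NL CR CW PR PW EX */
	/* NL */ { 1, 1, 1, 1, 1, 1 },
	/* CR */ { 1, 1, 1, 1, 1, 0 },
	/* CW */ { 1, 1, 1, 0, 0, 0 },
	/* PR */ { 1, 1, 0, 1, 0, 0 },
	/* PW */ { 1, 1, 0, 0, 0, 0 },
	/* EX */ { 1, 0, 0, 0, 0, 0 },
};

bool toy_modes_compatible(enum toy_mode held, enum toy_mode requested)
{
	return compat[held][requested];
}

/* A request can be granted only if it is compatible with every granted lock. */
bool toy_can_grant(const enum toy_mode *granted, int n, enum toy_mode requested)
{
	for (int i = 0; i < n; i++)
		if (!toy_modes_compatible(granted[i], requested))
			return false;
	return true;
}
```

With two receivers holding CR on MSG, `toy_can_grant()` reports that an EX request must wait, while a further CR request could be granted at once.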
 
>                     <... proceed ...> 
>                     release TOKEN 
>   <lost one message> 
> ^^^^^^^^^^^^^^^^^ 
>                                                               Get EX on MSG 
>                                                               Get CR on ACK 
> release MSG 
>  
>  
> Abhijit 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in 
> the body of a message to majordomo@vger.kernel.org 
> More majordomo info at  http://vger.kernel.org/majordomo-info.html 
>  
>  



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Potential race in dlm based messaging md-cluster.c
  2015-05-05  9:22         ` Lidong Zhong
@ 2015-05-05  9:44           ` Abhijit Bhopatkar
  2015-05-05 12:10             ` Abhijit Bhopatkar
  0 siblings, 1 reply; 9+ messages in thread
From: Abhijit Bhopatkar @ 2015-05-05  9:44 UTC (permalink / raw)
  To: Lidong Zhong, Goldwyn Rodrigues; +Cc: linux-raid

On 05/05/15 2:52 pm, Lidong Zhong wrote:
>>>> On 5/1/2015 at 02:36 AM, in message <5542763C.90202@cisco.com>, Abhijit
> Bhopatkar <abhopatk@cisco.com> wrote:
>> There is a possibility of a receiver losing out on messages in certain
>> corner conditions. One of the buggy case is if there is are two sender
>> ready with messages to be sent. Sender 1 initially gets the TOKEN lock
>> and proceeds.
>> After initial processing the sender of message 1 _will_ release TOKEN as
>> soon as receiver releases ACK, it does not wait till ACK CR is
>> re-acquired by receiver.
>>
>> To illustrate the problem consider timeline for two senders and one
>> receiver (we will ignore receive part for Sender2 node)
>>
>> Sender1              Sender2                         Receiver
>> Get EX on TOKEN       Get EX on TOKEN
>> <Granted>                    <Wait till granted>
>>
>> Get EX on MSG
>> write LVB
>> down MSG to CR
>> Get EX of ACK
>> <wait till granted>
>>        BAST for ACK
>>                                                               Get CR on MSG
>>                       read LVB
>>                       process
>>                       release ACK
>> AST for ACK
>> down ACK to CR
>> release MSG
>> release TOKEN
>>                      <granted>
>>                      Get EX on MSG
>
> I am afraid this corner case could not be achieved ever. Sender2 will be blocked on getting
> EX lock on MSG resource until the receivers release the lock. The receivers' request on
> upconverting CR to EX on MSG should be put into the convert queue before Sender2's
> request being put into the wait queue, because sender2 has to wait until the EX on TOKEN
> is released.
>
Yes, my initial thought about losing a message was not correct. The EX on MESSAGE won't be granted
immediately to Sender2. However, there is still a deadlock.

Perhaps I am missing something, but as far as I can see nothing prevents Sender2 from acquiring
EX on TOKEN _and_ requesting EX on MESSAGE __before__ the up-convert from the receiver is queued.
Consider an unusual delay right after ACK is released on the receiver. Sender1 will immediately release
MESSAGE and TOKEN. The receiver is still delayed for whatever reason. Sender2 gets the TOKEN grant
and immediately queues EX for MESSAGE (note this is before EX for MESSAGE is queued by the receiver).

DLM will (should?) return an error for the up-convert saying there is a deadlock (-EDEADLK ??)

This also assumes the BAST on MESSAGE is a NOP and the receiver does not let go of its CR on MESSAGE.

Abhijit

> Regards,
> Lidong


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Potential race in dlm based messaging md-cluster.c
  2015-05-05  9:44           ` Abhijit Bhopatkar
@ 2015-05-05 12:10             ` Abhijit Bhopatkar
  2015-05-07  2:43               ` Lidong Zhong
  0 siblings, 1 reply; 9+ messages in thread
From: Abhijit Bhopatkar @ 2015-05-05 12:10 UTC (permalink / raw)
  To: Lidong Zhong, Goldwyn Rodrigues; +Cc: linux-raid

On 05/05/15 3:14 pm, Abhijit Bhopatkar wrote:
> On 05/05/15 2:52 pm, Lidong Zhong wrote:
>>>>> On 5/1/2015 at 02:36 AM, in message <5542763C.90202@cisco.com>, Abhijit
>> Bhopatkar <abhopatk@cisco.com> wrote:

<snip>

>>>
>>> To illustrate the problem consider timeline for two senders and one
>>> receiver (we will ignore receive part for Sender2 node)
>>>
>>> Sender1              Sender2                         Receiver
>>> Get EX on TOKEN       Get EX on TOKEN
>>> <Granted>                    <Wait till granted>
>>>
>>> Get EX on MSG
>>> write LVB
>>> down MSG to CR
>>> Get EX of ACK
>>> <wait till granted>
>>>        BAST for ACK
>>>                                                               Get CR on MSG
>>>                       read LVB
>>>                       process
>>>                       release ACK
>>> AST for ACK
>>> down ACK to CR
>>> release MSG
>>> release TOKEN
>>>                      <granted>
>>>                      Get EX on MSG
>>
>> I am afraid this corner case could not be achieved ever. Sender2 will be blocked on getting
>> EX lock on MSG resource until the receivers release the lock. The receivers' request on
>> upconverting CR to EX on MSG should be put into the convert queue before Sender2's
>> request being put into the wait queue, because sender2 has to wait until the EX on TOKEN
>> is released.
>>
> Yes my initial though of losing a message is not correct. The EX on message won't be granted
> immediately to Sender2 However there is still a deadlock.
>
> Perhaps i am missing something, but according to me nothing prevents Sender2 from acquiring
> EX on TOKEN _and_ MESSAGE __before__ up convert from reciever is queued.  Consider adding
> unusual delay right after ACK is released on receiver. The Sender1 will immediately release
> MESSAGE and TOKEN. The receiver is still delayed for whatever reason. Sender2 gets TOKEN grant
> and immediately queues EX for MESSAGE (note this is before EX for MESSAGE is queued by receiver).
>
> DLM will (should?) return error for the up convert saying there is deadlock (-EDEADLK ??)
>

On further investigation of the dlm code: since we do not set the DLM_LKF_CONVDEADLK flag on our locks,
in the above deadlock case the receiver's request to up-convert will simply be canceled. The code
will then proceed as expected, since the receiver still holds CR on MESSAGE, and after the processing
we will release the CR.

So my question now becomes:

Why do we up-convert MESSAGE to EX in the first place?

Was the receiver's EX on MESSAGE intended to serialize all receivers before taking CR on ACK?

Since there is a possibility that we might lose this up-convert in a race condition, can
we simply eliminate the up-conversion? (CR is preventing the next sender from taking
EX on MESSAGE anyway.)

Regards,
Abhijit


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Potential race in dlm based messaging md-cluster.c
  2015-05-05 12:10             ` Abhijit Bhopatkar
@ 2015-05-07  2:43               ` Lidong Zhong
  2015-05-07  9:14                 ` Abhijit Bhopatkar
  0 siblings, 1 reply; 9+ messages in thread
From: Lidong Zhong @ 2015-05-07  2:43 UTC (permalink / raw)
  To: Abhijit Bhopatkar, Goldwyn Rodrigues; +Cc: linux-raid

>>> On 5/5/2015 at 08:10 PM, in message <5548B32B.5070904@cisco.com>, Abhijit
Bhopatkar <abhopatk@cisco.com> wrote: 
> On 05/05/15 3:14 pm, Abhijit Bhopatkar wrote: 
> > On 05/05/15 2:52 pm, Lidong Zhong wrote: 
> >>>>> On 5/1/2015 at 02:36 AM, in message <5542763C.90202@cisco.com>, Abhijit 
> >> Bhopatkar <abhopatk@cisco.com> wrote: 
>  
> <snip> 
>  
> >>> 
> >>> To illustrate the problem consider timeline for two senders and one 
> >>> receiver (we will ignore receive part for Sender2 node) 
> >>> 
> >>> Sender1              Sender2                         Receiver 
> >>> Get EX on TOKEN       Get EX on TOKEN 
> >>> <Granted>                    <Wait till granted> 
> >>> 
> >>> Get EX on MSG 
> >>> write LVB 
> >>> down MSG to CR 
> >>> Get EX of ACK 
> >>> <wait till granted> 
> >>>        BAST for ACK 
> >>>                                                               Get CR on MSG 
> >>>                       read LVB 
> >>>                       process 
> >>>                       release ACK 
> >>> AST for ACK 
> >>> down ACK to CR 
> >>> release MSG 
> >>> release TOKEN 
> >>>                      <granted> 
> >>>                      Get EX on MSG 
> >> 
> >> I am afraid this corner case could not be achieved ever. Sender2 will be  
> blocked on getting 
> >> EX lock on MSG resource until the receivers release the lock. The  
> receivers' request on 
> >> upconverting CR to EX on MSG should be put into the convert queue before  
> Sender2's 
> >> request being put into the wait queue, because sender2 has to wait until  
> the EX on TOKEN 
> >> is released. 
> >> 
> > Yes my initial though of losing a message is not correct. The EX on message  
> won't be granted 
> > immediately to Sender2 However there is still a deadlock. 
> > 
> > Perhaps i am missing something, but according to me nothing prevents  
> Sender2 from acquiring 
> > EX on TOKEN _and_ MESSAGE __before__ up convert from reciever is queued.   
> Consider adding 
> > unusual delay right after ACK is released on receiver. The Sender1 will  
> immediately release 
> > MESSAGE and TOKEN. The receiver is still delayed for whatever reason.  
> Sender2 gets TOKEN grant 
> > and immediately queues EX for MESSAGE (note this is before EX for MESSAGE  
> is queued by receiver). 
> > 

Yes, a deadlock is possible here.
> > DLM will (should?) return error for the up convert saying there is deadlock  
> (-EDEADLK ??) 
> > 
>  
> On further investigation in dlm code. Since we do not set DLM_LKF_CONVDEADLK  
> flag on our locks, 
> in above deadlock case receiver's request to up convert will be simply  
> canceled. And the code 
> will proceed as expected since receiver still holds CR on MESSAGE. And then  
> after the processing 
> we will release the CR. 
>  
> So now my question is changed to; 
>  
> Why do we up convert the MESSAGE to EX in the first place? 
>  
> Was receiver EX on MESSAGE intended to serialize all receivers before taking  
> CR on ACK? 
>  

Yes, it is. Otherwise, each receiver may get duplicate messages when it tries to
get CR on ACK while the sender has not yet down-converted EX on ACK.

One way I can think of to fix the deadlock is to set the DLM_LKF_NOQUEUE
flag when the sender tries to get EX on MESSAGE. The sender should keep retrying until all the
receivers release their locks on MESSAGE. Do you have a better idea that does not add
more lock resources? We already have three for transmitting messages.

Regards,
Lidong
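
The retry idea above can be sketched as a try-lock loop. This is a toy userspace model, not md-cluster code: `struct toy_res`, `toy_try_lock_ex()`, `toy_sender_get_ex()` and the backoff callback are hypothetical, and the only behaviour borrowed from DLM_LKF_NOQUEUE is that a request that cannot be granted immediately fails with -EAGAIN instead of being queued.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy MESSAGE resource: a count of CR holders and at most one EX holder. */
struct toy_res {
	int cr_holders;
	int ex_held;
};

/* NOQUEUE-style request: grant EX immediately or fail; never wait. */
static int toy_try_lock_ex(struct toy_res *r)
{
	if (r->cr_holders > 0 || r->ex_held)
		return -EAGAIN;
	r->ex_held = 1;
	return 0;
}

/*
 * Keep retrying until EX is granted or max_tries is exhausted.
 * backoff() stands in for "let some time pass" between attempts.
 * Returns the number of attempts used, or -EAGAIN on giving up.
 */
int toy_sender_get_ex(struct toy_res *r, int max_tries,
		      void (*backoff)(struct toy_res *))
{
	for (int attempt = 1; attempt <= max_tries; attempt++) {
		if (toy_try_lock_ex(r) == 0)
			return attempt;
		if (backoff)
			backoff(r);
	}
	return -EAGAIN;
}

/* Example backoff for the model: one receiver drops its CR per interval. */
void toy_one_receiver_releases(struct toy_res *r)
{
	if (r->cr_holders > 0)
		r->cr_holders--;
}
```

Because the sender never sits on the wait queue in this scheme, its request cannot end up queued ahead of a receiver's up-convert. A real implementation would also need to bound or pace the retries, which this sketch reduces to `max_tries`.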


> Since there is a possibility that we might lose out on this up convert in a  
> race  condition, can 
> we simply eliminate this up conversion? (since CR is preventing the next  
> Sender from taking 
> EX on MESSAGE anyway). 
>  
> Regards, 
> Abhijit 
>  


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Potential race in dlm based messaging md-cluster.c
  2015-05-07  2:43               ` Lidong Zhong
@ 2015-05-07  9:14                 ` Abhijit Bhopatkar
  2015-05-08  5:06                   ` Lidong Zhong
  0 siblings, 1 reply; 9+ messages in thread
From: Abhijit Bhopatkar @ 2015-05-07  9:14 UTC (permalink / raw)
  To: Lidong Zhong, Goldwyn Rodrigues; +Cc: linux-raid

On 07/05/15 8:13 am, Lidong Zhong wrote:
>>>> On 5/5/2015 at 08:10 PM, in message <5548B32B.5070904@cisco.com>, Abhijit
> Bhopatkar <abhopatk@cisco.com> wrote:
>> On 05/05/15 3:14 pm, Abhijit Bhopatkar wrote:
>>> On 05/05/15 2:52 pm, Lidong Zhong wrote:
>>>>>>> On 5/1/2015 at 02:36 AM, in message <5542763C.90202@cisco.com>, Abhijit
>>>> Bhopatkar <abhopatk@cisco.com> wrote:
>>
>> <snip>
>>
>>>>>
>>>>> To illustrate the problem consider timeline for two senders and one
>>>>> receiver (we will ignore receive part for Sender2 node)
>>>>>
>>>>> Sender1              Sender2                         Receiver
>>>>> Get EX on TOKEN       Get EX on TOKEN
>>>>> <Granted>                    <Wait till granted>
>>>>>
>>>>> Get EX on MSG
>>>>> write LVB
>>>>> down MSG to CR
>>>>> Get EX of ACK
>>>>> <wait till granted>
>>>>>         BAST for ACK
>>>>>                                                                Get CR on MSG
>>>>>                        read LVB
>>>>>                        process
>>>>>                        release ACK
>>>>> AST for ACK
>>>>> down ACK to CR
>>>>> release MSG
>>>>> release TOKEN
>>>>>                       <granted>
>>>>>                       Get EX on MSG
>>>>
>>>> I am afraid this corner case could not be achieved ever. Sender2 will be
>> blocked on getting
>>>> EX lock on MSG resource until the receivers release the lock. The
>> receivers' request on
>>>> upconverting CR to EX on MSG should be put into the convert queue before
>> Sender2's
>>>> request being put into the wait queue, because sender2 has to wait until
>> the EX on TOKEN
>>>> is released.
>>>>
>>> Yes my initial though of losing a message is not correct. The EX on message
>> won't be granted
>>> immediately to Sender2 However there is still a deadlock.
>>>
>>> Perhaps i am missing something, but according to me nothing prevents
>> Sender2 from acquiring
>>> EX on TOKEN _and_ MESSAGE __before__ up convert from reciever is queued.
>> Consider adding
>>> unusual delay right after ACK is released on receiver. The Sender1 will
>> immediately release
>>> MESSAGE and TOKEN. The receiver is still delayed for whatever reason.
>> Sender2 gets TOKEN grant
>>> and immediately queues EX for MESSAGE (note this is before EX for MESSAGE
>> is queued by receiver).
>>>
>
> Yes, there is a possibility leading to deadlock here.
>>> DLM will (should?) return error for the up convert saying there is deadlock
>> (-EDEADLK ??)
>>>
>>
>> On further investigation in dlm code. Since we do not set DLM_LKF_CONVDEADLK
>> flag on our locks,
>> in above deadlock case receiver's request to up convert will be simply
>> canceled. And the code
>> will proceed as expected since receiver still holds CR on MESSAGE. And then
>> after the processing
>> we will release the CR.
>>
>> So now my question is changed to;
>>
>> Why do we up convert the MESSAGE to EX in the first place?
>>
>> Was receiver EX on MESSAGE intended to serialize all receivers before taking
>> CR on ACK?
>>
>
> Yes, it is. Otherwise, each receiver may get duplicate messages when they try to
> get CR on ACK while the sender doesn't downconvert EX on ACK in time.

If I am reading this right, are we afraid of getting a second BAST call on the receiver?
The sender is holding EX on ACK, and the receiver has released CR on ACK after processing the message.
But the sender is delayed in releasing EX on ACK. The receiver re-queues CR on ACK, which
might trigger a BAST? (Note the receiver won't get the CR grant until the sender has released EX.)

A new CR request by the receiver on ACK will _not_ trigger a BAST call on the receiver. Instead, no AST
will be delivered until the original EX on ACK held by the sender is released. A BAST is delivered only
to locks that are already granted. Since we trigger message processing only on BAST, I don't
see a possibility of a duplicate message here.
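
The rule relied on above (a blocking AST goes to locks that are already granted, never to the request that is still waiting) can be illustrated with a small model. It is a sketch under assumptions: only CR and EX are modelled, and `struct toy_lock` and `toy_request()` are hypothetical names rather than dlm interfaces.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_mode { TOY_CR, TOY_EX };

/* One lock on the toy ACK resource, with a counter of BASTs it has received. */
struct toy_lock {
	enum toy_mode mode;
	bool granted;
	int basts;
};

/* With only CR and EX modelled, two locks conflict when either one is EX. */
static bool toy_conflict(enum toy_mode a, enum toy_mode b)
{
	return a == TOY_EX || b == TOY_EX;
}

/*
 * Submit locks[req] against the other locks on the resource.
 * Every granted lock that blocks the request receives a BAST;
 * the requester itself never does. Returns true if granted.
 */
bool toy_request(struct toy_lock *locks, int n, int req)
{
	bool blocked = false;

	for (int i = 0; i < n; i++) {
		if (i == req || !locks[i].granted)
			continue;
		if (toy_conflict(locks[i].mode, locks[req].mode)) {
			locks[i].basts++;
			blocked = true;
		}
	}
	locks[req].granted = !blocked;
	return locks[req].granted;
}
```

In the scenario discussed here, the sender holds EX and the receiver asks for CR: the receiver's request stays ungranted and its BAST counter stays at zero, so in this model nothing on the receiver side would be triggered a second time.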

>
> What I can think of a way to fix the deadlock now is setting the DLM_LKF_NOQUEUE
> flag when the sender tries to get EX on MESSAGE. It should keep trying until all the
> receivers release their locks on MESSAGE. Do you have any better idea without adding
> more lock resources? Since we already have three for transmitting messages.
>
It's exactly what I was thinking about, and it sounds like a good solution. However,
as said above, I don't think receiver EX on ACK is really needed.

Regards,
Abhijit

> Regards,
> Lidong
>
>
>> Since there is a possibility that we might lose out on this up convert in a
>> race  condition, can
>> we simply eliminate this up conversion? (since CR is preventing the next
>> Sender from taking
>> EX on MESSAGE anyway).
>>
>> Regards,
>> Abhijit
>>
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Potential race in dlm based messaging md-cluster.c
  2015-05-07  9:14                 ` Abhijit Bhopatkar
@ 2015-05-08  5:06                   ` Lidong Zhong
  0 siblings, 0 replies; 9+ messages in thread
From: Lidong Zhong @ 2015-05-08  5:06 UTC (permalink / raw)
  To: Abhijit Bhopatkar, Goldwyn Rodrigues; +Cc: linux-raid

>>> On 5/7/2015 at 05:14 PM, in message <554B2CED.5050903@cisco.com>, Abhijit
Bhopatkar <abhopatk@cisco.com> wrote: 
> On 07/05/15 8:13 am, Lidong Zhong wrote: 
>>>>> On 5/5/2015 at 08:10 PM, in message <5548B32B.5070904@cisco.com>, Abhijit 
> > Bhopatkar <abhopatk@cisco.com> wrote: 
> >> On 05/05/15 3:14 pm, Abhijit Bhopatkar wrote: 
> >>> On 05/05/15 2:52 pm, Lidong Zhong wrote: 
> >>>>>>> On 5/1/2015 at 02:36 AM, in message <5542763C.90202@cisco.com>, Abhijit 
> >>>> Bhopatkar <abhopatk@cisco.com> wrote: 
> >> 
> >> <snip> 
> >> 
> >>>>> 
> >>>>> To illustrate the problem consider timeline for two senders and one 
> >>>>> receiver (we will ignore receive part for Sender2 node) 
> >>>>> 
> >>>>> Sender1              Sender2                         Receiver 
> >>>>> Get EX on TOKEN       Get EX on TOKEN 
> >>>>> <Granted>                    <Wait till granted> 
> >>>>> 
> >>>>> Get EX on MSG 
> >>>>> write LVB 
> >>>>> down MSG to CR 
> >>>>> Get EX of ACK 
> >>>>> <wait till granted> 
> >>>>>         BAST for ACK 
> >>>>>                                                                Get CR on MSG
> >>>>>                        read LVB 
> >>>>>                        process 
> >>>>>                        release ACK 
> >>>>> AST for ACK 
> >>>>> down ACK to CR 
> >>>>> release MSG 
> >>>>> release TOKEN 
> >>>>>                       <granted> 
> >>>>>                       Get EX on MSG 
> >>>> 
> >>>> I am afraid this corner case could not be achieved ever. Sender2 will be 
> >> blocked on getting 
> >>>> EX lock on MSG resource until the receivers release the lock. The 
> >> receivers' request on 
> >>>> upconverting CR to EX on MSG should be put into the convert queue before 
> >> Sender2's 
> >>>> request being put into the wait queue, because sender2 has to wait until 
> >> the EX on TOKEN 
> >>>> is released. 
> >>>> 
> >>> Yes my initial though of losing a message is not correct. The EX on message 
> >> won't be granted 
> >>> immediately to Sender2 However there is still a deadlock. 
> >>> 
> >>> Perhaps i am missing something, but according to me nothing prevents 
> >> Sender2 from acquiring 
> >>> EX on TOKEN _and_ MESSAGE __before__ up convert from reciever is queued. 
> >> Consider adding 
> >>> unusual delay right after ACK is released on receiver. The Sender1 will 
> >> immediately release 
> >>> MESSAGE and TOKEN. The receiver is still delayed for whatever reason. 
> >> Sender2 gets TOKEN grant 
> >>> and immediately queues EX for MESSAGE (note this is before EX for MESSAGE 
> >> is queued by receiver). 
> >>> 
> > 
> > Yes, there is a possibility leading to deadlock here. 
> >>> DLM will (should?) return error for the up convert saying there is deadlock 
> >> (-EDEADLK ??) 
> >>> 
> >> 
> >> On further investigation in dlm code. Since we do not set  
> DLM_LKF_CONVDEADLK 
> >> flag on our locks, 
> >> in above deadlock case receiver's request to up convert will be simply 
> >> canceled. And the code 
> >> will proceed as expected since receiver still holds CR on MESSAGE. And then 
> >> after the processing 
> >> we will release the CR. 
> >> 
> >> So now my question is changed to; 
> >> 
> >> Why do we up convert the MESSAGE to EX in the first place? 
> >> 
> >> Was receiver EX on MESSAGE intended to serialize all receivers before  
> taking 
> >> CR on ACK? 
> >> 
> > 
> > Yes, it is. Otherwise, each receiver may get duplicate messages when they  
> try to 
> > get CR on ACK while the sender doesn't downconvert EX on ACK in time. 
>  
> If I am reading this right, are we afraid of getting second BAST call on  
> receiver? 
> Sender is holding EX on ACK, receiver releases CR of ACK after processing  
> the message. 
> But sender is delayed in releasing EX on ACK. Receiver re-queues CR on ACK,  
> which 
> might trigger BAST? (Note receiver won't get CR grant until sender released  
> EX). 
>  

Yes, and I think you have explained the reason well enough. Actually, at first we
did as you said, but we found that the receiver would sometimes get duplicate
messages, so we made this change.

> A new CR by receiver on ACK will _not_ trigger BAST call. Instead no AST  
> will be called 
> until the original EX on ACK by sender is not released. BAST is called only  
> on locks 

The description here seems right to me, but I don't see the connection, sorry.

> that are already granted. Since we trigger message processing only on BAST I  
> don't 
> see a possibility of duplicate message here. 
>  
> > 
> > What I can think of a way to fix the deadlock now is setting the  
> DLM_LKF_NOQUEUE 
> > flag when the sender tries to get EX on MESSAGE. It should keep trying  
> until all the 
> > receivers release their locks on MESSAGE. Do you have any better idea  
> without adding 
> > more lock resources? Since we already have three for transmitting messages. 
> > 
> Its exactly what I was thinking about and sounds like a good solution.  
> However 
> as said  above I don't think receiver EX on ACK is really needed. 
>  

As already explained. If you see no other problem, then what is needed here is
a patch to fix the potential deadlock.

Regards,
Lidong
> Regards, 
> Abhijit 
>  
> > Regards, 
> > Lidong 
> > 
> > 
> >> Since there is a possibility that we might lose out on this up convert in a 
> >> race  condition, can 
> >> we simply eliminate this up conversion? (since CR is preventing the next 
> >> Sender from taking 
> >> EX on MESSAGE anyway). 
> >> 
> >> Regards, 
> >> Abhijit 
> >> 
> > 
>  
>  
>  


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-05-08  5:06 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAE3Hb8oss1JZ2u7g7OQQgrEtgQ1vbQou04isiS6eEqbS=uzbhw@mail.gmail.com>
     [not found] ` <CAE3Hb8qNczD30RrcHFYCR90Jf9QFD-XH=x89MAu4Dpmm80se0A@mail.gmail.com>
     [not found]   ` <554251EA.3000807@suse.com>
     [not found]     ` <CAE3Hb8pJ=0MB6EX5jVch28gj-gnf0Mp1wyzxBfWjzLf=SuV4sQ@mail.gmail.com>
2015-04-30 18:36       ` Potential race in dlm based messaging md-cluster.c Abhijit Bhopatkar
2015-04-30 18:47         ` Abhijit Bhopatkar
2015-04-30 18:51           ` Abhijit Bhopatkar
2015-05-05  9:22         ` Lidong Zhong
2015-05-05  9:44           ` Abhijit Bhopatkar
2015-05-05 12:10             ` Abhijit Bhopatkar
2015-05-07  2:43               ` Lidong Zhong
2015-05-07  9:14                 ` Abhijit Bhopatkar
2015-05-08  5:06                   ` Lidong Zhong
