All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sunrpc: cancel delayed connect working when conncet success
@ 2010-08-18  9:49 Mi Jinlong
  2010-08-30 19:18 ` Trond Myklebust
  0 siblings, 1 reply; 4+ messages in thread
From: Mi Jinlong @ 2010-08-18  9:49 UTC (permalink / raw)
  To: NFSv3 list, Trond.Myklebust, J. Bruce Fields, Chuck Lever, Jeff Layton

As network partition or some other reason, when client connect 
success, maybe there is some delayed connect working in connect_work list.

Aug  2 12:51:32 TEST-M kernel: RPC:       xs_connect delayed xprt ccc4c800 for 96 seconds
Aug  2 12:51:32 TEST-M kernel: RPC:       xs_error_report client ccc4c800...
Aug  2 12:51:32 TEST-M kernel: RPC:       error 111
 ... snip ...
Aug  2 12:53:08 TEST-M kernel: RPC:       disconnected transport ccc4c800
Aug  2 12:53:08 TEST-M kernel: RPC:       worker connecting xprt ccc4c800 via tcp to 192.168.0.21 (port 2049)
Aug  2 12:53:08 TEST-M kernel: RPC:       ccc4c800 connect status 115 connected 0 sock state 2
Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect_status: retrying
Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_prepare_transmit
Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_transmit(136)
Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_send_request(136) = -11
Aug  2 12:53:08 TEST-M kernel: RPC:   228 xmit incomplete (136 left of 136)
Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect xprt ccc4c800 is not connected
Aug  2 12:53:08 TEST-M kernel: RPC:       xs_connect delayed xprt ccc4c800 for 192 seconds
Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_state_change client ccc4c800...
Aug  2 12:53:08 TEST-M kernel: RPC:       state 1 conn 0 dead 0 zapped 1
Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect_status: retrying
Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_prepare_transmit
Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_transmit(136)
Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_send_request(136) = 136
Aug  2 12:53:08 TEST-M kernel: RPC:   228 xmit complete
Aug  2 12:53:08 TEST-M kernel: RPC:   229 xprt_prepare_transmit

As the debug message show, "xs_connect delayed xprt ccc4c800 for 192 seconds"
means a connecting work have be delayed at connect_worker list.
"state 1 conn 0 dead 0 zapped 1" shows the connect have successed 
but a delayed work still alive at connect_worker list.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>

---
 net/sunrpc/xprtsock.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 49a62f0..823f1db 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1324,6 +1324,10 @@ static void xs_tcp_state_change(struct sock *sk)
 			transport->tcp_flags =
 				TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
 
+			if (xprt_connecting(xprt) &&
+			      cancel_delayed_work(&transport->connect_worker))
+				xprt_clear_connecting(xprt);
+
 			xprt_wake_pending_tasks(xprt, -EAGAIN);
 		}
 		spin_unlock_bh(&xprt->transport_lock);
-- 
1.7.0



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] sunrpc: cancel delayed connect working when conncet success
  2010-08-18  9:49 [PATCH] sunrpc: cancel delayed connect working when conncet success Mi Jinlong
@ 2010-08-30 19:18 ` Trond Myklebust
  2010-08-31  0:47   ` Mi Jinlong
  0 siblings, 1 reply; 4+ messages in thread
From: Trond Myklebust @ 2010-08-30 19:18 UTC (permalink / raw)
  To: Mi Jinlong; +Cc: NFSv3 list, J. Bruce Fields, Chuck Lever, Jeff Layton

On Wed, 2010-08-18 at 17:49 +0800, Mi Jinlong wrote:
> As network partition or some other reason, when client connect 
> success, maybe there is some delayed connect working in connect_work list.
> 
> Aug  2 12:51:32 TEST-M kernel: RPC:       xs_connect delayed xprt ccc4c800 for 96 seconds
> Aug  2 12:51:32 TEST-M kernel: RPC:       xs_error_report client ccc4c800...
> Aug  2 12:51:32 TEST-M kernel: RPC:       error 111
>  ... snip ...
> Aug  2 12:53:08 TEST-M kernel: RPC:       disconnected transport ccc4c800
> Aug  2 12:53:08 TEST-M kernel: RPC:       worker connecting xprt ccc4c800 via tcp to 192.168.0.21 (port 2049)
> Aug  2 12:53:08 TEST-M kernel: RPC:       ccc4c800 connect status 115 connected 0 sock state 2
> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect_status: retrying
> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_prepare_transmit
> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_transmit(136)
> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_send_request(136) = -11
> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xmit incomplete (136 left of 136)
> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect xprt ccc4c800 is not connected
> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_connect delayed xprt ccc4c800 for 192 seconds
> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_state_change client ccc4c800...
> Aug  2 12:53:08 TEST-M kernel: RPC:       state 1 conn 0 dead 0 zapped 1
> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect_status: retrying
> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_prepare_transmit
> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_transmit(136)
> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_send_request(136) = 136
> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xmit complete
> Aug  2 12:53:08 TEST-M kernel: RPC:   229 xprt_prepare_transmit
> 
> As the debug message show, "xs_connect delayed xprt ccc4c800 for 192 seconds"
> means a connecting work have be delayed at connect_worker list.
> "state 1 conn 0 dead 0 zapped 1" shows the connect have successed 
> but a delayed work still alive at connect_worker list.
> 
> Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
> 
> ---
>  net/sunrpc/xprtsock.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 49a62f0..823f1db 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1324,6 +1324,10 @@ static void xs_tcp_state_change(struct sock *sk)
>  			transport->tcp_flags =
>  				TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
>  
> +			if (xprt_connecting(xprt) &&
> +			      cancel_delayed_work(&transport->connect_worker))
> +				xprt_clear_connecting(xprt);
> +
>  			xprt_wake_pending_tasks(xprt, -EAGAIN);
>  		}
>  		spin_unlock_bh(&xprt->transport_lock);

Wait... According to the above trace, the connect request is _failing_
due to an ECONNREFUSED error. In that case, we _want_ to delay the
reconnection in order to give the server time to set itself up.

Cheers
  Trond


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sunrpc: cancel delayed connect working when conncet success
  2010-08-30 19:18 ` Trond Myklebust
@ 2010-08-31  0:47   ` Mi Jinlong
  2010-09-09 10:11     ` Mi Jinlong
  0 siblings, 1 reply; 4+ messages in thread
From: Mi Jinlong @ 2010-08-31  0:47 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: NFSv3 list, J. Bruce Fields, Chuck Lever, Jeff Layton

Hi Trond,

Trond Myklebust 写道:
> On Wed, 2010-08-18 at 17:49 +0800, Mi Jinlong wrote:
>> As network partition or some other reason, when client connect 
>> success, maybe there is some delayed connect working in connect_work list.
>>
>> Aug  2 12:51:32 TEST-M kernel: RPC:       xs_connect delayed xprt ccc4c800 for 96 seconds
>> Aug  2 12:51:32 TEST-M kernel: RPC:       xs_error_report client ccc4c800...
>> Aug  2 12:51:32 TEST-M kernel: RPC:       error 111
>>  ... snip ...
>> Aug  2 12:53:08 TEST-M kernel: RPC:       disconnected transport ccc4c800
>> Aug  2 12:53:08 TEST-M kernel: RPC:       worker connecting xprt ccc4c800 via tcp to 192.168.0.21 (port 2049)
>> Aug  2 12:53:08 TEST-M kernel: RPC:       ccc4c800 connect status 115 connected 0 sock state 2
>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect_status: retrying
>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_prepare_transmit
>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_transmit(136)
>> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_send_request(136) = -11
>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xmit incomplete (136 left of 136)
>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect xprt ccc4c800 is not connected
>> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_connect delayed xprt ccc4c800 for 192 seconds
>> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_state_change client ccc4c800...
>> Aug  2 12:53:08 TEST-M kernel: RPC:       state 1 conn 0 dead 0 zapped 1
>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect_status: retrying
>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_prepare_transmit
>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_transmit(136)
>> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_send_request(136) = 136
>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xmit complete
>> Aug  2 12:53:08 TEST-M kernel: RPC:   229 xprt_prepare_transmit
>>
>> As the debug message show, "xs_connect delayed xprt ccc4c800 for 192 seconds"
>> means a connecting work have be delayed at connect_worker list.
>> "state 1 conn 0 dead 0 zapped 1" shows the connect have successed 
>> but a delayed work still alive at connect_worker list.
>>
>> Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
>>
>> ---
>>  net/sunrpc/xprtsock.c |    4 ++++
>>  1 files changed, 4 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>> index 49a62f0..823f1db 100644
>> --- a/net/sunrpc/xprtsock.c
>> +++ b/net/sunrpc/xprtsock.c
>> @@ -1324,6 +1324,10 @@ static void xs_tcp_state_change(struct sock *sk)
>>  			transport->tcp_flags =
>>  				TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
>>  
>> +			if (xprt_connecting(xprt) &&
>> +			      cancel_delayed_work(&transport->connect_worker))
>> +				xprt_clear_connecting(xprt);
>> +
>>  			xprt_wake_pending_tasks(xprt, -EAGAIN);
>>  		}
>>  		spin_unlock_bh(&xprt->transport_lock);
> 
> Wait... According to the above trace, the connect request is _failing_
> due to an ECONNREFUSED error. In that case, we _want_ to delay the
> reconnection in order to give the server time to set itself up.

  Yes, that's right.

  But, the important part of the trace is 
  "
   Aug  2 12:53:08 TEST-M kernel: RPC:   228 xmit incomplete (136 left of 136)
   Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect xprt ccc4c800 is not connected
   Aug  2 12:53:08 TEST-M kernel: RPC:       xs_connect delayed xprt ccc4c800 for 192 seconds
   Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_state_change client ccc4c800...
   Aug  2 12:53:08 TEST-M kernel: RPC:       state 1 conn 0 dead 0 zapped 1
   Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect_status: retrying
  ".

  The SUNRPC's TCP connecting is asynchronous, but the tcp_connect()
  only send a SYN but don't waiting for the ACK reply.

    CLIENT                                   SERVER

  1. The first connecting

    |-xs_connect()
     |-kernel_connect(O_NONBLOCK)
       |-tcp_connet()  -------- SYN --------->

   xs_connect() return with EINPROGRESS and the ACK have not reply. 

  2. a reconnecting of 1
    |-xs_connect()
        queue_delayed_work(rpciod_workqueue,
                            &transport->connect_worker,
                            xprt->reestablish_timeout);

              <---------------ACK SYN--------------

   After the reconnecting have put the connect working to connect_worker,
   the ACK-SYN for the first connecting reply, the connect is OK now.

At this instance, a delayed connect working will be exist at connect_worker
after connecting success, we should cancel this working.

thanks,
Mi Jinlong


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sunrpc: cancel delayed connect working when conncet success
  2010-08-31  0:47   ` Mi Jinlong
@ 2010-09-09 10:11     ` Mi Jinlong
  0 siblings, 0 replies; 4+ messages in thread
From: Mi Jinlong @ 2010-09-09 10:11 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: NFSv3 list, J. Bruce Fields, Chuck Lever, Jeff Layton

Hi Trond,

  what do you think about this problem ?

thanks,
Mi Jinlong

Mi Jinlong =E5=86=99=E9=81=93:
> Hi Trond,
>=20
> Trond Myklebust =E5=86=99=E9=81=93:
>> On Wed, 2010-08-18 at 17:49 +0800, Mi Jinlong wrote:
>>> As network partition or some other reason, when client connect=20
>>> success, maybe there is some delayed connect working in connect_wor=
k list.
>>>
>>> Aug  2 12:51:32 TEST-M kernel: RPC:       xs_connect delayed xprt c=
cc4c800 for 96 seconds
>>> Aug  2 12:51:32 TEST-M kernel: RPC:       xs_error_report client cc=
c4c800...
>>> Aug  2 12:51:32 TEST-M kernel: RPC:       error 111
>>>  ... snip ...
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       disconnected transport cc=
c4c800
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       worker connecting xprt cc=
c4c800 via tcp to 192.168.0.21 (port 2049)
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       ccc4c800 connect status 1=
15 connected 0 sock state 2
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect_status: retr=
ying
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_prepare_transmit
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_transmit(136)
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_send_request(136) =
=3D -11
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xmit incomplete (136 left=
 of 136)
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect xprt ccc4c80=
0 is not connected
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_connect delayed xprt c=
cc4c800 for 192 seconds
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_state_change clien=
t ccc4c800...
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       state 1 conn 0 dead 0 zap=
ped 1
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect_status: retr=
ying
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_prepare_transmit
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_transmit(136)
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_send_request(136) =
=3D 136
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xmit complete
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   229 xprt_prepare_transmit
>>>
>>> As the debug message show, "xs_connect delayed xprt ccc4c800 for 19=
2 seconds"
>>> means a connecting work have be delayed at connect_worker list.
>>> "state 1 conn 0 dead 0 zapped 1" shows the connect have successed=20
>>> but a delayed work still alive at connect_worker list.
>>>
>>> Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
>>>
>>> ---
>>>  net/sunrpc/xprtsock.c |    4 ++++
>>>  1 files changed, 4 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>>> index 49a62f0..823f1db 100644
>>> --- a/net/sunrpc/xprtsock.c
>>> +++ b/net/sunrpc/xprtsock.c
>>> @@ -1324,6 +1324,10 @@ static void xs_tcp_state_change(struct sock =
*sk)
>>>  			transport->tcp_flags =3D
>>>  				TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
>>> =20
>>> +			if (xprt_connecting(xprt) &&
>>> +			      cancel_delayed_work(&transport->connect_worker))
>>> +				xprt_clear_connecting(xprt);
>>> +
>>>  			xprt_wake_pending_tasks(xprt, -EAGAIN);
>>>  		}
>>>  		spin_unlock_bh(&xprt->transport_lock);
>> Wait... According to the above trace, the connect request is _failin=
g_
>> due to an ECONNREFUSED error. In that case, we _want_ to delay the
>> reconnection in order to give the server time to set itself up.
>=20
>   Yes, that's right.
>=20
>   But, the important part of the trace is=20
>   "
>    Aug  2 12:53:08 TEST-M kernel: RPC:   228 xmit incomplete (136 lef=
t of 136)
>    Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect xprt ccc4c8=
00 is not connected
>    Aug  2 12:53:08 TEST-M kernel: RPC:       xs_connect delayed xprt =
ccc4c800 for 192 seconds
>    Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_state_change clie=
nt ccc4c800...
>    Aug  2 12:53:08 TEST-M kernel: RPC:       state 1 conn 0 dead 0 za=
pped 1
>    Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect_status: ret=
rying
>   ".
>=20
>   The SUNRPC's TCP connecting is asynchronous, but the tcp_connect()
>   only send a SYN but don't waiting for the ACK reply.
>=20
>     CLIENT                                   SERVER
>=20
>   1. The first connecting
>=20
>     |-xs_connect()
>      |-kernel_connect(O_NONBLOCK)
>        |-tcp_connet()  -------- SYN --------->
>=20
>    xs_connect() return with EINPROGRESS and the ACK have not reply.=20
>=20
>   2. a reconnecting of 1
>     |-xs_connect()
>         queue_delayed_work(rpciod_workqueue,
>                             &transport->connect_worker,
>                             xprt->reestablish_timeout);
>=20
>               <---------------ACK SYN--------------
>=20
>    After the reconnecting have put the connect working to connect_wor=
ker,
>    the ACK-SYN for the first connecting reply, the connect is OK now.
>=20
> At this instance, a delayed connect working will be exist at connect_=
worker
> after connecting success, we should cancel this working.
>=20
> thanks,
> Mi Jinlong
>=20
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>=20


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2010-09-09 10:09 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-18  9:49 [PATCH] sunrpc: cancel delayed connect working when conncet success Mi Jinlong
2010-08-30 19:18 ` Trond Myklebust
2010-08-31  0:47   ` Mi Jinlong
2010-09-09 10:11     ` Mi Jinlong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.