* [Ocfs2-devel] Is it an issue and whether the code changed correct? Thanks a lot
@ 2013-07-16  9:06 Guozhonghua
  2013-07-17  7:55 ` Jeff Liu
  0 siblings, 1 reply; 5+ messages in thread
From: Guozhonghua @ 2013-07-16  9:06 UTC (permalink / raw)
  To: ocfs2-devel

Hi everyone, is this an issue?

The server runs kernel 3.2.0-23 on Ubuntu 12.04.

There are 4 nodes in the OCFS2 cluster, using three iSCSI LUNs, and each LUN is one OCFS2 domain mounted by three nodes.

When the network used by a node goes down and comes back up, the TCP connections between the nodes are shut down and then re-established.
But there is one scenario where the node with the lower node number shuts down the TCP connection to a node with a higher node number, and the higher-numbered node never reconnects to the lower-numbered node.
Conversely, if the higher-numbered node shuts down the connection to the lower-numbered node, the higher-numbered node does reconnect to the lower-numbered node successfully.

Such as below:
The server1 syslog is as below:
Jul  9 17:46:10 server1 kernel: [5199872.576027] o2net: Connection to node server2 (num 2) at 192.168.70.20:7100 shutdown, state 8
Jul  9 17:46:10 server1 kernel: [5199872.576111] o2net: No longer connected to node server2 (num 2) at 192.168.70.20:7100
Jul  9 17:46:10 server1 kernel: [5199872.576149] (ocfs2dc,14358,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x3671059b) to node 2
Jul  9 17:46:10 server1 kernel: [5199872.576162] o2dlm: Waiting on the death of node 2 in domain 3656D53908DC4149983BDB1DBBDF1291
Jul  9 17:46:10 server1 kernel: [5199872.576428] o2net: Accepted connection from node server2 (num 2) at 192.168.70.20:7100
Jul  9 17:46:11 server1 kernel: [5199872.995898] o2net: Connection to node server3 (num 3) at 192.168.70.30:7100 has been idle for 30.100 secs, shutting it down.
Jul  9 17:46:11 server1 kernel: [5199872.995987] o2net: No longer connected to node server3 (num 3) at 192.168.70.30:7100
Jul  9 17:46:11 server1 kernel: [5199873.069666] o2net: Connection to node server4 (num 4) at 192.168.70.40:7100 shutdown, state 8
Jul  9 17:46:11 server1 kernel: [5199873.069700] o2net: No longer connected to node server4 (num 4) at 192.168.70.40:7100
Jul  9 17:46:11 server1 kernel: [5199873.070385] o2net: Accepted connection from node server4 (num 4) at 192.168.70.40:7100

server1 shut down the TCP connection to server3, but server3 never reconnected to server1.

The server3 syslog is as below:
Jul  9 17:44:12 server3 kernel: [3971907.332698] o2net: Connection to node server1 (num 1) at 192.168.70.10:7100 shutdown, state 8
Jul  9 17:44:12 server3 kernel: [3971907.332748] o2net: No longer connected to node server1 (num 1) at 192.168.70.10:7100
Jul  9 17:44:42 server3 kernel: [3971937.355419] o2net: No connection established with node 1 after 30.0 seconds, giving up.
Jul  9 17:45:01 server3 CRON[52349]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul  9 17:45:12 server3 kernel: [3971967.421656] o2net: No connection established with node 1 after 30.0 seconds, giving up.
Jul  9 17:45:42 server3 kernel: [3971997.487949] o2net: No connection established with node 1 after 30.0 seconds, giving up.
Jul  9 17:46:12 server3 kernel: [3972027.554258] o2net: No connection established with node 1 after 30.0 seconds, giving up.
Jul  9 17:46:42 server3 kernel: [3972057.620496] o2net: No connection established with node 1 after 30.0 seconds, giving up.

server2 and server4 shut down their connections to server1 and reconnected to it successfully.

I reviewed the OCFS2 kernel code and found what may be a bug.

Since server1 did not receive any message from server3, it shut down the connection to server3 and set the timeout flag to 1.
Because server1's node number is lower than server3's, it then waits for the connect request from server3.
static void o2net_idle_timer(unsigned long data)
{
        ...
        printk(KERN_NOTICE "o2net: Connection to " SC_NODEF_FMT " has been "
               "idle for %lu.%lu secs, shutting it down.\n", SC_NODEF_ARGS(sc),
               msecs / 1000, msecs % 1000);
        ...
        atomic_set(&nn->nn_timeout, 1);
        o2net_sc_queue_work(sc, &sc->sc_shutdown_work);
}
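
For reference, as far as I can tell o2net only lets the node with the greater node number initiate the TCP connection; the lower-numbered node just accepts. The check near the top of o2net_start_connect() looks roughly like this (a sketch from memory of the 3.2-era fs/ocfs2/cluster/tcp.c, not an exact copy):

static void o2net_start_connect(struct work_struct *work)
{
        struct o2net_node *nn =
                container_of(work, struct o2net_node, nn_connect_work.work);
        ...
        /* if we're greater we initiate tx, otherwise we accept */
        if (o2nm_this_node() <= o2net_num_from_nn(nn))
                goto out;
        ...
}

So once server1 (the lower-numbered node) has torn the connection down, only server3 can rebuild it.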

But server3 only sees the TCP connection state change, shuts the connection down again, and never reconnects to server1, because its nn->nn_timeout is still 0.

static void o2net_state_change(struct sock *sk)
{
        ...
        switch (sk->sk_state) {
        ...
        default:
                printk(KERN_INFO "AAAAA o2net: Connection to " SC_NODEF_FMT
                       " shutdown, state %d\n",
                       SC_NODEF_ARGS(sc), sk->sk_state);
                o2net_sc_queue_work(sc, &sc->sc_shutdown_work);
                break;
        }
        ...
}

I also tried keeping the TCP connection up without any shutdown between the nodes, but sending messages still failed because the connection state was wrong.


I changed the code that triggers the reconnect in o2net_set_nn_state() and o2net_start_connect(), and the connection is now re-established correctly.
Could anyone review whether the change is correct? Thanks a lot.

root@gzh-dev:~/ocfs2# diff -p -C 10 ./ocfs2_org/cluster/tcp.c ocfs2_rep/cluster/tcp.c
*** ./ocfs2_org/cluster/tcp.c 2012-10-29 19:33:19.534200000 +0800
--- ocfs2_rep/cluster/tcp.c      2013-07-16 16:58:31.380452531 +0800
*************** static void o2net_set_nn_state(struct o2
*** 567,586 ****
--- 567,590 ----
      if (!valid && o2net_wq) {
              unsigned long delay;
              /* delay if we're within a RECONNECT_DELAY of the
               * last attempt */
              delay = (nn->nn_last_connect_attempt +
                       msecs_to_jiffies(o2net_reconnect_delay()))
                      - jiffies;
              if (delay > msecs_to_jiffies(o2net_reconnect_delay()))
                      delay = 0;
              mlog(ML_CONN, "queueing conn attempt in %lu jiffies\n", delay);
+
+             /** Trigger the reconnection */
+             atomic_set(&nn->nn_timeout, 1);
+
              queue_delayed_work(o2net_wq, &nn->nn_connect_work, delay);

              /*
               * Delay the expired work after idle timeout.
               *
               * We might have lots of failed connection attempts that run
               * through here but we only cancel the connect_expired work when
               * a connection attempt succeeds.  So only the first enqueue of
               * the connect_expired work will do anything.  The rest will see
               * that it's already queued and do nothing.
*************** static void o2net_start_connect(struct w
*** 1691,1710 ****
--- 1695,1719 ----
      remoteaddr.sin_family = AF_INET;
      remoteaddr.sin_addr.s_addr = node->nd_ipv4_address;
      remoteaddr.sin_port = node->nd_ipv4_port;

      ret = sc->sc_sock->ops->connect(sc->sc_sock,
                                      (struct sockaddr *)&remoteaddr,
                                      sizeof(remoteaddr),
                                      O_NONBLOCK);
      if (ret == -EINPROGRESS)
              ret = 0;
+
+     /** Reset the timeout with 0 to avoid connection again, Just for test the tcp connection */
+         if (ret == 0) {
+                 atomic_set(&nn->nn_timeout, 0);
+         }

  out:
      if (ret) {
              printk(KERN_NOTICE "o2net: Connect attempt to " SC_NODEF_FMT
                     " failed with errno %d\n", SC_NODEF_ARGS(sc), ret);
              /* 0 err so that another will be queued and attempted
               * from set_nn_state */
              if (sc)
                      o2net_ensure_shutdown(nn, sc, 0);
      }

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Ocfs2-devel] Is it an issue and whether the code changed correct? Thanks a lot
  2013-07-16  9:06 [Ocfs2-devel] Is it an issue and whether the code changed correct? Thanks a lot Guozhonghua
@ 2013-07-17  7:55 ` Jeff Liu
  2013-07-24  0:59   ` Srinivas Eeda
  2013-07-27  9:27   ` [Ocfs2-devel] 答复: " Guozhonghua
  0 siblings, 2 replies; 5+ messages in thread
From: Jeff Liu @ 2013-07-17  7:55 UTC (permalink / raw)
  To: ocfs2-devel

[Add Srinivas/Xiaofei to CC list as they are investigating OCFS2 net related issues]

Hi Guo,

Thanks for your reports and analysis!

On 07/16/2013 05:06 PM, Guozhonghua wrote:

> Hi everyone, is this an issue?
> 

That is an issue, because we should keep attempting to reconnect
until the connection is established or a disk heartbeat down event
is detected.

This strategy has been described at upstream commit:
	5cc3bf2786f63cceb191c3c02ddd83c6f38a7d64
    		ocfs2:  Reconnect after idle time out.


> The server runs kernel 3.2.0-23 on Ubuntu 12.04.

Generally speaking, we investigate potential problems against an
up-to-date mainline source tree; linux-next is fine for OCFS2.
One important reason is that an issue seen on an old release
might already have been fixed.

> There are 4 nodes in the OCFS2 cluster, using three iSCSI LUNs, and
> each LUN is one OCFS2 domain mounted by three nodes.
> 
> When the network used by a node goes down and comes back up, the TCP
> connections between the nodes are shut down and then re-established.


> But there is one scenario where the node with the lower node number
> shuts down the TCP connection to a node with a higher node number, and
> the higher-numbered node never reconnects to the lower-numbered node.
> 
> Conversely, if the higher-numbered node shuts down the connection to
> the lower-numbered node, the higher-numbered node does reconnect to
> the lower-numbered node successfully.

Could you please clarify your test scenario in a bit more detail?

Anyway, re-initializing the timeout to trigger the reconnection looks fair to me,
but I'd like to see some comments from Srinivas and Xiaofei.

Btw, it would be better if you made the patch with git and set up your email
client following the instructions in Documentation/email-clients.txt; please feel
free to drop me an offline email if you have any questions about this.


Thanks,
-Jeff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Ocfs2-devel] Is it an issue and whether the code changed correct? Thanks a lot
  2013-07-17  7:55 ` Jeff Liu
@ 2013-07-24  0:59   ` Srinivas Eeda
  2013-07-27  9:27   ` [Ocfs2-devel] 答复: " Guozhonghua
  1 sibling, 0 replies; 5+ messages in thread
From: Srinivas Eeda @ 2013-07-24  0:59 UTC (permalink / raw)
  To: ocfs2-devel

When a network timeout happens, one node can time out before the other.
The node that runs into it first runs o2net_idle_timer, which initiates a
socket shutdown; the shutdown causes the other end to see the socket go
to TCP_CLOSE.

If o2net_idle_timer fired on the lower-numbered node, then nn->nn_timeout
never gets set on the higher-numbered node, because that node saw TCP_CLOSE
before its own timeout fired. Since nn->nn_timeout is not set to 1, it does
not initiate a reconnect.

So the fix is to set nn->nn_timeout to 1. We should either move
"atomic_set(&nn->nn_timeout, 1)" from o2net_idle_timer to
o2net_set_nn_state, or set it in o2net_state_change as well.
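
A minimal sketch of the second option (setting it in o2net_state_change as
well) might look like the following; this is only an illustration of the idea
against the 3.2-era tcp.c, not the actual patch, and the nn lookup via
o2net_nn_from_num() is added here just for the sketch:

static void o2net_state_change(struct sock *sk)
{
        struct o2net_sock_container *sc = sk->sk_user_data;
        struct o2net_node *nn = o2net_nn_from_num(sc->sc_node->nd_num);
        ...
        switch (sk->sk_state) {
        case TCP_SYN_SENT:
        case TCP_SYN_RECV:
                break;
        case TCP_ESTABLISHED:
                o2net_sc_queue_work(sc, &sc->sc_connect_work);
                break;
        default:
                printk(KERN_INFO "o2net: Connection to " SC_NODEF_FMT
                       " shutdown, state %d\n",
                       SC_NODEF_ARGS(sc), sk->sk_state);
                /* treat the peer closing the socket like an idle timeout,
                 * so o2net_set_nn_state() queues nn_connect_work and the
                 * higher-numbered node initiates the reconnect */
                atomic_set(&nn->nn_timeout, 1);
                o2net_sc_queue_work(sc, &sc->sc_shutdown_work);
                break;
        }
        ...
}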

We made this patch along with a few other changes and will send it shortly,
or you could send a proper patch based on Jeff's comments.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Ocfs2-devel] 答复:  Is it an issue and whether the code changed correct? Thanks a lot
  2013-07-17  7:55 ` Jeff Liu
  2013-07-24  0:59   ` Srinivas Eeda
@ 2013-07-27  9:27   ` Guozhonghua
  2013-07-27 17:23     ` Srinivas Eeda
  1 sibling, 1 reply; 5+ messages in thread
From: Guozhonghua @ 2013-07-27  9:27 UTC (permalink / raw)
  To: ocfs2-devel

Hi Liu,
Sorry for the delayed response, and I am very glad to receive your email.

We are using OCFS2 to make full use of IP-SAN or FC-SAN storage and to make iSCSI/FC storage management more convenient.

The test scenario is that there are several nodes in the OCFS2 cluster.
All the nodes have two network interfaces: one is the management network, e.g. 192.168.0.12, and the other is the network connected to the iSCSI storage, e.g. 192.168.10.12.
The management IP 192.168.0.12 is the one OCFS2 uses to set up its TCP connections, as configured in the /etc/ocfs2/cluster.conf file (an illustrative snippet follows this paragraph).
The scenario is essentially the same with an FC SAN: the management network is what OCFS2 uses to set up the TCP connections over which it exchanges information.
When we set up bonding for the management network on the switch directly connected to the node, the node's network interface goes down and comes back up; the OCFS2 kernel detects this, and the TCP connection may be dropped without being re-established. But the storage network, connected to the iSCSI or FC SAN, is still fine, so the OCFS2 disk heartbeat written to the iSCSI/FC SAN keeps working.
As a result, messages such as DLM messages cannot be exchanged between the nodes, so the OCFS2 cluster may block for some time.
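
For illustration only, the relevant part of /etc/ocfs2/cluster.conf looks roughly like this (the cluster name and values below are examples, not our exact configuration; each node has its own node: stanza):

cluster:
        node_count = 4
        name = mycluster

node:
        ip_port = 7100
        ip_address = 192.168.0.12
        number = 1
        name = server1
        cluster = mycluster

OCFS2/o2net uses the ip_address listed here, i.e. the management IP, for its TCP connections.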

I reviewed the code and set up a cluster to test with, trying to find out why the whole cluster, or several nodes of it, blocks on the storage disk.

There is code to handle reconnecting between nodes, but when the lower-numbered node shuts down the connection, the reconnect that should be initiated by the higher-numbered node is not triggered, as described in my earlier email.

And there is another issue that blocks use of the cluster: the cluster hangs and several nodes cannot access the OCFS2 storage.
The node does not even respond to packets such as ping; this issue may be in the DLM.

Thanks a lot

Guozhonghua


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Ocfs2-devel] 答复:  Is it an issue and whether the code changed correct? Thanks a lot
  2013-07-27  9:27   ` [Ocfs2-devel] 答复: " Guozhonghua
@ 2013-07-27 17:23     ` Srinivas Eeda
  0 siblings, 0 replies; 5+ messages in thread
From: Srinivas Eeda @ 2013-07-27 17:23 UTC (permalink / raw)
  To: ocfs2-devel

I would like to understand what is causing the ocfs2 network heartbeat
to time out. If you can reproduce the issue, can you please run the
following tcpdump command on all nodes and provide me with the output?

tcpdump -Z root -i $DEVICE -C 50 -W 10 -s 2500 -Sw /tmp/`hostname -s`_tcpdump.log -ttt 'port 7777' &

Please run and capture "top -2" output as well.



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-07-27 17:23 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
2013-07-16  9:06 [Ocfs2-devel] Is it an issue and whether the code changed correct? Thanks a lot Guozhonghua
2013-07-17  7:55 ` Jeff Liu
2013-07-24  0:59   ` Srinivas Eeda
2013-07-27  9:27   ` [Ocfs2-devel] 答复: " Guozhonghua
2013-07-27 17:23     ` Srinivas Eeda
