From mboxrd@z Thu Jan 1 00:00:00 1970
From: FeldHost™ Admin
Date: Thu, 8 Mar 2018 09:24:48 +0100
Subject: [Cluster-devel] [ClusterLabs] DLM connection channel switch take too long time (> 5mins)
In-Reply-To: <5AA160E3020000F9000ADC97@prv-mh.provo.novell.com>
References: <5AA15C35020000F9000ADC78@prv-mh.provo.novell.com>
 <33D91F74-44BF-4624-83F8-5E35E902DC00@feldhost.cz>
 <5AA160E3020000F9000ADC97@prv-mh.provo.novell.com>
Message-ID: <61F298DD-9157-4EBB-B3F7-2C3C8AB335BA@feldhost.cz>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

so try using active mode.
https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_installation_terms.html

The SCTP fixes I mentioned are ones I saw in the 4.14.* kernels.

> On 8 Mar 2018, at 09:12, Gang He wrote:
>
> Hi Feldhost,
>
>
>>>>
>> Hello Gang He,
>>
>> Which type of corosync rrp_mode do you use? Passive or Active?
> clvm1:/etc/corosync # cat corosync.conf | grep rrp_mode
> rrp_mode: passive
>
> Did you try testing both?
> No, only this mode.
> Also, what kernel version do you use? I see some SCTP fixes in the latest kernels.
> clvm1:/etc/corosync # uname -r
> 4.4.114-94.11-default
> It looks like the sock->ops->connect() function is blocked for too long before returning when the network is broken.
> On a normal network, the sock->ops->connect() function returns very quickly.
>
> Thanks
> Gang
>
>>
>>> On 8 Mar 2018, at 08:52, Gang He wrote:
>>>
>>> Hello list and David Teigland,
>>>
>>> I got a problem on a two-ring cluster; the problem can be reproduced
>>> with the steps below.
>>> 1) Set up a two-ring cluster with two nodes.
>>> e.g.
>>> clvm1 (nodeid 172204569) addr_list eth0 10.67.162.25 eth1 192.168.152.240
>>> clvm2 (nodeid 172204570) addr_list eth0 10.67.162.26 eth1 192.168.152.103
>>>
>>> 2) The whole cluster works well; then I take eth0 down on node clvm2 and
>>> restart the pacemaker service on that node.
>>> ifconfig eth0 down
>>> rcpacemaker restart
>>>
>>> 3) The whole cluster still works well (that means corosync switches to the
>>> other ring very smoothly).
>>> Then, I can mount the ocfs2 file system on node clvm2 quickly with the command
>>> mount /dev/sda /mnt/ocfs2
>>>
>>> 4) Next, I do the same mount on node clvm1; the mount command hangs for
>>> about 5 minutes before it finally completes.
>>> But if we set up an ocfs2 file system resource in pacemaker,
>>> the pacemaker resource agent will consider the ocfs2 file system resource
>>> startup a failure before this command returns,
>>> and pacemaker will fence node clvm1.
>>> This problem is impacting our customer's evaluation, since they expect the
>>> two rings to switch over smoothly.
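For comparison with the rrp_mode: passive setting quoted above, here is a minimal sketch of what the active-mode totem section suggested at the top of this message might look like. It is illustrative only: the networks and ports are assumptions derived from the interface addresses quoted in this thread, not the actual configuration, and with udpu transport a matching nodelist with ring0_addr/ring1_addr entries per node is also required.

    totem {
            version: 2
            # active: send on both rings at once; passive: alternate/fail over between rings
            rrp_mode: active
            transport: udpu

            interface {
                    ringnumber: 0
                    bindnetaddr: 10.67.162.0        # eth0 network, assumed from the addresses above
                    mcastport: 5405
            }
            interface {
                    ringnumber: 1
                    bindnetaddr: 192.168.152.0      # eth1 network, assumed from the addresses above
                    mcastport: 5407
            }
    }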
>>>
>>> Digging into this problem, I can see the mount command hangs with the
>>> back trace below:
>>> clvm1:/ # cat /proc/6688/stack
>>> [] new_lockspace+0x92d/0xa70 [dlm]
>>> [] dlm_new_lockspace+0x69/0x160 [dlm]
>>> [] user_cluster_connect+0xc8/0x350 [ocfs2_stack_user]
>>> [] ocfs2_cluster_connect+0x192/0x240 [ocfs2_stackglue]
>>> [] ocfs2_dlm_init+0x31c/0x570 [ocfs2]
>>> [] ocfs2_fill_super+0xb33/0x1200 [ocfs2]
>>> [] mount_bdev+0x1a0/0x1e0
>>> [] mount_fs+0x3a/0x170
>>> [] vfs_kern_mount+0x62/0x110
>>> [] do_mount+0x213/0xcd0
>>> [] SyS_mount+0x85/0xd0
>>> [] entry_SYSCALL_64_fastpath+0x1e/0xb6
>>> [] 0xffffffffffffffff
>>>
>>> The root cause is in the sctp_connect_to_sock() function in lowcomms.c:
>>> 1075
>>> 1076         log_print("connecting to %d", con->nodeid);
>>> 1077
>>> 1078         /* Turn off Nagle's algorithm */
>>> 1079         kernel_setsockopt(sock, SOL_TCP, TCP_NODELAY, (char *)&one,
>>> 1080                           sizeof(one));
>>> 1081
>>> 1082         result = sock->ops->connect(sock, (struct sockaddr *)&daddr, addr_len,
>>> 1083                                     O_NONBLOCK);  <<= here, this call takes > 5 mins before returning ETIMEDOUT (-110).
>>> 1084         printk(KERN_ERR "sctp_connect_to_sock connect: %d\n", result);
>>> 1085
>>> 1086         if (result == -EINPROGRESS)
>>> 1087                 result = 0;
>>> 1088         if (result == 0)
>>> 1089                 goto out;
>>>
>>> So, I want to know whether this problem has been found/fixed before.
>>> It looks like DLM cannot switch to the second ring quickly, and this will
>>> prevent the applications above it (e.g. CLVM, ocfs2) from creating a new
>>> lock space during their startup.
>>>
>>> Thanks
>>> Gang
>>>
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
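As a footnote on the EINPROGRESS check visible in the quoted lowcomms.c excerpt: the usual way to bound a connect is to mark the socket non-blocking, expect connect() to return immediately with EINPROGRESS, and enforce the timeout with poll(). The sketch below is a plain userspace C illustration of that pattern, not the kernel code and not a proposed fix; the use of a TCP socket, the address 192.168.152.103, port 21064, and the 5-second cap are illustrative assumptions.

    /*
     * Userspace sketch: bound a connect() by making the socket non-blocking,
     * expecting EINPROGRESS, and waiting with poll() for a limited time.
     */
    #include <errno.h>
    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int connect_with_timeout(const char *ip, int port, int timeout_ms)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        /* Make the socket non-blocking so connect() cannot stall the caller. */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

        struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(port) };
        inet_pton(AF_INET, ip, &sa.sin_addr);

        int rc = connect(fd, (struct sockaddr *)&sa, sizeof(sa));
        if (rc < 0 && errno != EINPROGRESS) {
            close(fd);
            return -1;                      /* immediate failure */
        }

        /* Wait for writability, but never longer than timeout_ms. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        rc = poll(&pfd, 1, timeout_ms);
        if (rc <= 0) {                      /* timeout (0) or poll error (<0) */
            close(fd);
            return -1;
        }

        /* Check whether the deferred connect actually succeeded. */
        int err = 0;
        socklen_t len = sizeof(err);
        getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
        if (err != 0) {
            close(fd);
            errno = err;
            return -1;
        }
        return fd;
    }

    int main(void)
    {
        int fd = connect_with_timeout("192.168.152.103", 21064, 5000); /* 5 s cap */
        if (fd < 0)
            perror("connect_with_timeout");
        else
            close(fd);
        return 0;
    }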