* [Cluster-devel] cluster.cman.nodename vanish on config reload
@ 2012-07-10 11:33 Dietmar Maurer
  2012-07-10 11:43 ` Fabio M. Di Nitto
  0 siblings, 1 reply; 14+ messages in thread
From: Dietmar Maurer @ 2012-07-10 11:33 UTC (permalink / raw)
  To: cluster-devel.redhat.com

I just updated from 3.1.8 to latest STABLE32:

I use this cluster.conf:

# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="235" name="test">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"/>
  <clusternodes>
    <clusternode name="maui" nodeid="3" votes="1"/>
    <clusternode name="cnode1" nodeid="1" votes="1"/>
  </clusternodes>
  <rm>
    <pvevm autostart="0" vmid="100"/>
  </rm>
</cluster>

cman service starts without problems:

# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... [  OK  ]
   Starting fenced... [  OK  ]
   Starting dlm_controld... [  OK  ]
Starting GFS2 Control Daemon: gfs_controld.
   Unfencing self... [  OK  ]
   Joining fence domain... [  OK  ]

And the corosync objdb contains:

# corosync-objctl|grep cluster.cman
cluster.cman.keyfile=/var/lib/pve-cluster/corosync.authkey
cluster.cman.transport=udpu
cluster.cman.nodename=maui
cluster.cman.cluster_id=1678

Note: there is a value for 'nodename' and 'cluster_id'

Now I simply increase the version inside cluster.conf (on both nodes):

# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="236" name="test">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"/>
  <clusternodes>
    <clusternode name="maui" nodeid="3" votes="1"/>
    <clusternode name="cnode1" nodeid="1" votes="1"/>
  </clusternodes>
  <rm>
    <pvevm autostart="0" vmid="100"/>
  </rm>
</cluster>
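
The version bump above was done by hand. A minimal sketch of the same step as a shell helper follows, assuming the version lives in a `config_version` attribute on the `<cluster>` element as in the file shown; `bump_config_version` is a hypothetical name, not a cman tool, and it only edits the local copy of the file.

```shell
# Sketch: bump config_version in a cluster.conf-style file by one.
# bump_config_version is a hypothetical helper, not part of cman.
bump_config_version() {
    conf="$1"
    # Read the current version from the <cluster ...> element.
    cur=$(sed -n 's/.*<cluster[^>]*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1)
    if [ -z "$cur" ]; then
        echo "no config_version found in $conf" >&2
        return 1
    fi
    new=$((cur + 1))
    # Rewrite into a temp file first, then copy back so file permissions are kept.
    tmp=$(mktemp) || return 1
    sed "s/\(<cluster[^>]*config_version=\"\)$cur\"/\1$new\"/" "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$conf"
    rm -f "$tmp"
    # Print the new version for the caller.
    echo "$new"
}
```

Run against a copy of the file first. On a real cluster the change would still have to be made on every node before running `cman_tool version -r -S`.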

And trigger a reload:

# cman_tool version -r -S
cman_tool: Error loading configuration in corosync/cman

And the syslog has more details:

Jul 10 13:28:25 maui corosync[488675]:   [CMAN  ] cman was unable to determine our node name!
Jul 10 13:28:25 maui corosync[488675]:   [CMAN  ] Can't get updated config version: Successfully read config from /etc/cluster/cluster.conf#012.
Jul 10 13:28:25 maui corosync[488675]:   [CMAN  ] Continuing activity with old configuration

Somehow the nodename and cluster_id values are removed from the corosync objdb:

# corosync-objctl|grep cluster.cman
cluster.cman.keyfile=/var/lib/pve-cluster/corosync.authkey
cluster.cman.transport=udpu
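
To make the difference between the two dumps explicit, the comparison can be sketched as a small shell helper. This assumes the `corosync-objctl` output from before and after the reload has been saved to two files, one `key=value` pair per line as shown above; `vanished_keys` is a hypothetical name.

```shell
# Sketch: list keys present in a "before" dump but missing from an "after" dump.
# Both files are assumed to hold saved corosync-objctl output (key=value lines).
# vanished_keys is a hypothetical helper, not part of corosync or cman.
vanished_keys() {
    before="$1"
    after="$2"
    keys_before=$(mktemp) || return 1
    keys_after=$(mktemp) || { rm -f "$keys_before"; return 1; }
    # Keep only the key (text before the first '=') and sort for comm.
    cut -d= -f1 "$before" | sort -u > "$keys_before"
    cut -d= -f1 "$after" | sort -u > "$keys_after"
    # comm -23 prints lines that appear only in the first file.
    comm -23 "$keys_before" "$keys_after"
    rm -f "$keys_before" "$keys_after"
}
```

With the two dumps from this message saved as files, the helper would be expected to print `cluster.cman.cluster_id` and `cluster.cman.nodename`.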


Any idea why that happens?

- Dietmar



* [Cluster-devel] cluster.cman.nodename vanish on config reload
  2012-07-10 11:33 [Cluster-devel] cluster.cman.nodename vanish on config reload Dietmar Maurer
@ 2012-07-10 11:43 ` Fabio M. Di Nitto
  2012-07-10 12:09   ` Dietmar Maurer
  0 siblings, 1 reply; 14+ messages in thread
From: Fabio M. Di Nitto @ 2012-07-10 11:43 UTC (permalink / raw)
  To: cluster-devel.redhat.com

If you are running STABLE32 from git, can you please revert:

commit 8975bd6341b2d94c1f89279b1b00d4360da1f5ff

and see if it's still a problem?

Thanks
Fabio


* [Cluster-devel] cluster.cman.nodename vanish on config reload
  2012-07-10 11:43 ` Fabio M. Di Nitto
@ 2012-07-10 12:09   ` Dietmar Maurer
  2012-07-10 12:26     ` Fabio M. Di Nitto
  0 siblings, 1 reply; 14+ messages in thread
From: Dietmar Maurer @ 2012-07-10 12:09 UTC (permalink / raw)
  To: cluster-devel.redhat.com

> If you are running STABLE32 from git, can you please revert:
> 
> commit 8975bd6341b2d94c1f89279b1b00d4360da1f5ff
> 
> and see if it's still a problem?

Yes, same problem.

- Dietmar





* [Cluster-devel] cluster.cman.nodename vanish on config reload
  2012-07-10 12:09   ` Dietmar Maurer
@ 2012-07-10 12:26     ` Fabio M. Di Nitto
  2012-07-11  7:36       ` Dietmar Maurer
  2012-07-11  7:37       ` Dietmar Maurer
  0 siblings, 2 replies; 14+ messages in thread
From: Fabio M. Di Nitto @ 2012-07-10 12:26 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 7/10/2012 2:09 PM, Dietmar Maurer wrote:
>> If you are running STABLE32 from git, can you please revert:
>>
>> commit 8975bd6341b2d94c1f89279b1b00d4360da1f5ff
>>
>> and see if it's still a problem?
> 
> Yes, same problem.
> 
> - Dietmar
> 
> 


OK, then please file a bugzilla. I'll need to bisect and see when the
problem was introduced (unless you want to give bisect a shot).

Fabio




* [Cluster-devel] cluster.cman.nodename vanish on config reload
  2012-07-10 12:26     ` Fabio M. Di Nitto
@ 2012-07-11  7:36       ` Dietmar Maurer
  2012-07-11  7:37       ` Dietmar Maurer
  1 sibling, 0 replies; 14+ messages in thread
From: Dietmar Maurer @ 2012-07-11  7:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

> >> If you are running STABLE32 from git, can you please revert:
> >>
> >> commit 8975bd6341b2d94c1f89279b1b00d4360da1f5ff
> >>
> >> and see if it's still a problem?
> >
> > Yes, same problem.
> >
> > - Dietmar
> >
> >
> 
> 
> OK, then please file a bugzilla. I'll need to bisect and see when the problem
> was introduced (unless you want to give bisect a shot).

OK, I bisected it myself.

This led directly to commit f3f4499d4ace7a3bf5fe09ce6d9f04ed6d8958f6

But this is just the check you introduced. If I revert that patch, everything works
as before, but I noticed that it still deletes the values from the corosync objdb after
a config reload - even in 3.1.8!

Both cluster.cman.nodename and cluster.cman.cluster_id get removed.

Testing with earlier versions now.

- Dietmar







* [Cluster-devel] cluster.cman.nodename vanish on config reload
  2012-07-10 12:26     ` Fabio M. Di Nitto
  2012-07-11  7:36       ` Dietmar Maurer
@ 2012-07-11  7:37       ` Dietmar Maurer
  2012-07-11  8:14         ` Fabio M. Di Nitto
  1 sibling, 1 reply; 14+ messages in thread
From: Dietmar Maurer @ 2012-07-11  7:37 UTC (permalink / raw)
  To: cluster-devel.redhat.com

> OK, I bisected it myself.
> 
> This led directly to commit f3f4499d4ace7a3bf5fe09ce6d9f04ed6d8958f6
> 
> But this is just the check you introduced. If I revert that patch, everything
> works as before, but I noticed that it still deletes the values from the
> corosync objdb after a config reload - even in 3.1.8!
> 
> Both cluster.cman.nodename and cluster.cman.cluster_id get removed.
> 
> Testing with earlier versions now.

That even happens with 3.1.4 (I can't easily test with older versions).

Any ideas?

- Dietmar





* [Cluster-devel] cluster.cman.nodename vanish on config reload
  2012-07-11  7:37       ` Dietmar Maurer
@ 2012-07-11  8:14         ` Fabio M. Di Nitto
  2012-07-11  8:20           ` Fabio M. Di Nitto
  2012-07-11  8:21           ` Dietmar Maurer
  0 siblings, 2 replies; 14+ messages in thread
From: Fabio M. Di Nitto @ 2012-07-11  8:14 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 7/11/2012 9:37 AM, Dietmar Maurer wrote:
>> OK, I bisected it myself.
>>
>> This led directly to commit f3f4499d4ace7a3bf5fe09ce6d9f04ed6d8958f6
>>
>> But this is just the check you introduced. If I revert that patch, everything
>> works as before, but I noticed that it still deletes the values from the
>> corosync objdb after a config reload - even in 3.1.8!
>>
>> Both cluster.cman.nodename and cluster.cman.cluster_id get removed.
>>
>> Testing with earlier versions now.
> 
> That even happens with 3.1.4 (I can't easily test with older versions).
> 
> Any ideas?

No, not yet, but what kind of operational problem do you get? Does it
affect runtime? If so, how?

Fabio




* [Cluster-devel] cluster.cman.nodename vanish on config reload
  2012-07-11  8:14         ` Fabio M. Di Nitto
@ 2012-07-11  8:20           ` Fabio M. Di Nitto
  2012-07-11  8:21           ` Dietmar Maurer
  1 sibling, 0 replies; 14+ messages in thread
From: Fabio M. Di Nitto @ 2012-07-11  8:20 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 7/11/2012 10:14 AM, Fabio M. Di Nitto wrote:
> On 7/11/2012 9:37 AM, Dietmar Maurer wrote:
>>> OK, I bisected it myself.
>>>
>>> This led directly to commit f3f4499d4ace7a3bf5fe09ce6d9f04ed6d8958f6
>>>
>>> But this is just the check you introduced. If I revert that patch, everything
>>> works as before, but I noticed that it still deletes the values from the
>>> corosync objdb after a config reload - even in 3.1.8!
>>>
>>> Both cluster.cman.nodename and cluster.cman.cluster_id get removed.
>>>
>>> Testing with earlier versions now.
>>
>> That even happens with 3.1.4 (I can't easily test with older versions).
>>
>> Any ideas?
> 
> No, not yet, but what kind of operational problem do you get? Does it
> affect runtime? If so, how?
> 
> Fabio
> 


Never mind, I answered my own question.

Fabio




* [Cluster-devel] cluster.cman.nodename vanish on config reload
  2012-07-11  8:14         ` Fabio M. Di Nitto
  2012-07-11  8:20           ` Fabio M. Di Nitto
@ 2012-07-11  8:21           ` Dietmar Maurer
  2012-07-11  8:27             ` Fabio M. Di Nitto
  1 sibling, 1 reply; 14+ messages in thread
From: Dietmar Maurer @ 2012-07-11  8:21 UTC (permalink / raw)
  To: cluster-devel.redhat.com

> >> This led directly to commit f3f4499d4ace7a3bf5fe09ce6d9f04ed6d8958f6
> >>
> >> But this is just the check you introduced. If I revert that patch,
> >> everything works as before, but I noticed that it still deletes the
> >> values from the corosync objdb after a config reload - even in 3.1.8!
> >>
> >> Both cluster.cman.nodename and cluster.cman.cluster_id get removed.
> >>
> >> Testing with earlier versions now.
> >
> > That even happens with 3.1.4 (I can't easily test with older versions).
> >
> > Any ideas?
> 
> No, not yet, but what kind of operational problem do you get? Does it affect
> runtime? If so, how?

I cannot change/reload the configuration with commit f3f4499d4ace7a3bf5fe09ce6d9f04ed6d8958f6

When I revert that commit everything works fine.

I just wonder why those values get removed from the corosync objdb?

Note: You added that check, so I guess it has negative side effects when there is no nodename (why did you add that check)?

- Dietmar





* [Cluster-devel] cluster.cman.nodename vanish on config reload
  2012-07-11  8:21           ` Dietmar Maurer
@ 2012-07-11  8:27             ` Fabio M. Di Nitto
  2012-07-11  8:32               ` Dietmar Maurer
  0 siblings, 1 reply; 14+ messages in thread
From: Fabio M. Di Nitto @ 2012-07-11  8:27 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 7/11/2012 10:21 AM, Dietmar Maurer wrote:
>>>> This led directly to commit f3f4499d4ace7a3bf5fe09ce6d9f04ed6d8958f6
>>>>
>>>> But this is just the check you introduced. If I revert that patch,
>>>> everything works as before, but I noticed that it still deletes the
>>>> values from the corosync objdb after a config reload - even in 3.1.8!
>>>>
>>>> Both cluster.cman.nodename and cluster.cman.cluster_id get removed.
>>>>
>>>> Testing with earlier versions now.
>>>
>>> That even happens with 3.1.4 (I can't easily test with older versions).
>>>
>>> Any ideas?
>>
>> No, not yet, but what kind of operational problem do you get? Does it affect
>> runtime? If so, how?
> 
> I cannot change/reload the configuration with commit f3f4499d4ace7a3bf5fe09ce6d9f04ed6d8958f6
> 
> When I revert that commit everything works fine.
> 
> I just wonder why those values get removed from the corosync objdb?

That's the root cause of the issue.

> 
> Note: You added that check, so I guess it has negative side effects when there is no nodename (why did you add that check)?

Well yes, it is an error if we can't determine our nodename.

The issue now is to understand why it fails for you but doesn't fail for
me using git.

Fabio




* [Cluster-devel] cluster.cman.nodename vanish on config reload
  2012-07-11  8:27             ` Fabio M. Di Nitto
@ 2012-07-11  8:32               ` Dietmar Maurer
  2012-07-11  8:35                 ` Fabio M. Di Nitto
  2012-07-11  9:48                 ` Fabio M. Di Nitto
  0 siblings, 2 replies; 14+ messages in thread
From: Dietmar Maurer @ 2012-07-11  8:32 UTC (permalink / raw)
  To: cluster-devel.redhat.com

> Well yes, it is an error if we can't determine our nodename.
> 
> The issue now is to understand why it fails for you but doesn't fail for me
> using git.

Oh, you can't reproduce the bug?





* [Cluster-devel] cluster.cman.nodename vanish on config reload
  2012-07-11  8:32               ` Dietmar Maurer
@ 2012-07-11  8:35                 ` Fabio M. Di Nitto
  2012-07-11  9:48                 ` Fabio M. Di Nitto
  1 sibling, 0 replies; 14+ messages in thread
From: Fabio M. Di Nitto @ 2012-07-11  8:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 7/11/2012 10:32 AM, Dietmar Maurer wrote:
>> Well yes, it is an error if we can't determine our nodename.
>>
>> The issue now is to understand why it fails for you but doesn't fail for me
>> using git.
> 
> Oh, you can't reproduce the bug?
> 


Found it. It is triggered only when cluster.conf has a
<cman ...> section.
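
For anyone checking whether their own configuration meets that condition, a minimal shell sketch follows. It assumes a plain-text match on the element name is good enough for a cluster.conf like the one earlier in the thread; `has_cman_section` is a hypothetical name, and the check says nothing about how the bug itself works.

```shell
# Sketch: report whether a cluster.conf-style file contains a <cman ...> element,
# the condition named above as triggering the problem.
# has_cman_section is a hypothetical helper, not part of cman.
has_cman_section() {
    # Match "<cman" followed by whitespace, "/", ">" or end of line,
    # so that a longer element name starting with "cman" does not count.
    grep -Eq '<cman([[:space:]/>]|$)' "$1"
}
```

The helper returns success when the element is present, so it can be used as `if has_cman_section /etc/cluster/cluster.conf; then ...; fi`.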

Working on a fix now.

Fabio




* [Cluster-devel] cluster.cman.nodename vanish on config reload
  2012-07-11  8:32               ` Dietmar Maurer
  2012-07-11  8:35                 ` Fabio M. Di Nitto
@ 2012-07-11  9:48                 ` Fabio M. Di Nitto
  2012-07-11 10:11                   ` Dietmar Maurer
  1 sibling, 1 reply; 14+ messages in thread
From: Fabio M. Di Nitto @ 2012-07-11  9:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 7/11/2012 10:32 AM, Dietmar Maurer wrote:
>> Well yes, it is an error if we can't determine our nodename.
>>
>> The issue now is to understand why it fails for you but doesn't fail for me
>> using git.
> 
> Oh, you can't reproduce the bug?
> 
> 


Can you please try the patch I just posted to the list? It works for me,
but a couple of extra eyes won't hurt.

Thanks
fabio




* [Cluster-devel] cluster.cman.nodename vanish on config reload
  2012-07-11  9:48                 ` Fabio M. Di Nitto
@ 2012-07-11 10:11                   ` Dietmar Maurer
  0 siblings, 0 replies; 14+ messages in thread
From: Dietmar Maurer @ 2012-07-11 10:11 UTC (permalink / raw)
  To: cluster-devel.redhat.com

> Can you please try the patch I just posted to the list? It works for me, but a
> couple of extra eyes won't hurt.

OK, it seems to work here.

Many thanks for your help!

- Dietmar



