All of lore.kernel.org
* Does anyone understand Calamari??
@ 2015-05-12 23:34 Bruce McFarland
  2015-05-13  0:02 ` [ceph-calamari] " Gregory Meno
  0 siblings, 1 reply; 9+ messages in thread
From: Bruce McFarland @ 2015-05-12 23:34 UTC (permalink / raw)
  To: ceph-calamari-idqoXFIVOFJgJs9I8MT0rw, ceph-users-Qp0mS5GaXlQ,
	ceph-devel (ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)


[-- Attachment #1.1: Type: text/plain, Size: 495 bytes --]

Increasing the audience since ceph-calamari is not responsive. What salt event/info does the Calamari Master expect to see from the ceph-mon to determine there is a working cluster? I had to change servers hosting the calamari master and can't get the new machine to recognize the cluster. The 'salt \* ceph.get_heartbeats' returns monmap, fsid, ver, epoch, etc. for the monitor and all of the osd's. Can anyone point me to docs or code that might enlighten me to what I'm overlooking? Thanks.

[-- Attachment #1.2: Type: text/html, Size: 2320 bytes --]

[-- Attachment #2: Type: text/plain, Size: 178 bytes --]

_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [ceph-calamari] Does anyone understand Calamari??
  2015-05-12 23:34 Does anyone understand Calamari?? Bruce McFarland
@ 2015-05-13  0:02 ` Gregory Meno
       [not found]   ` <607C9580-DF69-4CA0-9D65-550700893CA8-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Gregory Meno @ 2015-05-13  0:02 UTC (permalink / raw)
  To: Bruce McFarland
  Cc: ceph-calamari, ceph-users, ceph-devel (ceph-devel@vger.kernel.org)

Bruce,

It is great to hear that salt is reporting status from all the nodes in the cluster.

Let me see if I understand your question:

You want to know what conditions cause us to recognize a working cluster?

see https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L135

https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L349

and

https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/cluster_monitor.py
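
As a quick way to confirm those code paths are even getting input, you could watch the salt event bus on the calamari master while the minions are up; if I have the wiring right, cthulhu builds its picture of the cluster from the heartbeat data the minions return there. A rough check (assuming a reasonably current salt) is:

salt-run state.event pretty=True

where you should see periodic returns from the ceph heartbeat function arriving from each minion.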


Let’s check whether you actually need to be digging into that level of detail:

You switched to a new instance of calamari and it is not recognizing the cluster.

You want to know what you are overlooking. Would you please clarify with some hostnames?

i.e. let's say that your old calamari node was called calamariA and that your new node is calamariB.

From which node are you running the get_heartbeats?

What is the master setting in the minion config files out on the nodes of the cluster? If things are set up correctly they would look like this:

[root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf 
master: calamariB


If that is the case, the next thing I would check is whether the http://calamariB/api/v2/cluster endpoint is reporting anything.
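
Something along the lines of this (it may want you to authenticate first, depending on how the install is set up):

curl -s http://calamariB/api/v2/cluster

On a working setup that should come back as a JSON list containing your cluster's fsid and name; an empty list would mean cthulhu has never registered the cluster.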

hope this helps,
Gregory

> On May 12, 2015, at 4:34 PM, Bruce McFarland <Bruce.McFarland@taec.toshiba.com> wrote:
> 
> Increasing the audience since ceph-calamari is not responsive. What salt event/info does the Calamari Master expect to see from the ceph-mon to determine there is an working cluster? I had to change servers hosting the calamari master and can’t get the new machine to recognize the cluster. The ‘salt \* ceph.get_heartbeats’ returns monmap, fsid, ver, epoch, etc for the monitor and all of the osd’s. Can anyone point me to docs or code that might enlighten me to what I’m overlooking? Thanks.
> _______________________________________________
> ceph-calamari mailing list
> ceph-calamari@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [ceph-calamari] Does anyone understand Calamari??
       [not found]   ` <607C9580-DF69-4CA0-9D65-550700893CA8-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-05-13  0:18     ` Bruce McFarland
       [not found]       ` <7E8CF9C16F722345A89076330719CB1F4A825343-S73WLWeSKkaudDcc988YZay9ae4OIm1A@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Bruce McFarland @ 2015-05-13  0:18 UTC (permalink / raw)
  To: Gregory Meno
  Cc: ceph-calamari-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel (ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
	ceph-users-Qp0mS5GaXlQ

Master was ess68 and now it's essperf3. 

On all cluster nodes the following files now have 'master: essperf3'
/etc/salt/minion 
/etc/salt/minion/calamari.conf 
/etc/diamond/diamond.conf

The 'salt \* ceph.get_heartbeats' is being run on essperf3 - here's a 'salt \* test.ping' from the essperf3 Calamari Master to the cluster. I've also included a quick cluster sanity check with the output of ceph -s and ceph osd tree, and, for your reading pleasure, the output of 'salt octeon109 ceph.get_heartbeats', since I suspect there might be a missing field in the monitor response.

root@essperf3:/etc/ceph# salt \* test.ping
octeon108:
    True
octeon114:
    True
octeon111:
    True
octeon101:
    True
octeon106:
    True
octeon109:
    True
octeon118:
    True
root@essperf3:/etc/ceph# ceph osd tree
# id	weight	type name	up/down	reweight
-1	7	root default
-4	1		host octeon108
0	1			osd.0	up	1	
-2	1		host octeon111
1	1			osd.1	up	1	
-5	1		host octeon115
2	1			osd.2	DNE		
-6	1		host octeon118
3	1			osd.3	up	1	
-7	1		host octeon114
4	1			osd.4	up	1	
-8	1		host octeon106
5	1			osd.5	up	1	
-9	1		host octeon101
6	1			osd.6	up	1	
root@essperf3:/etc/ceph# ceph -s 
    cluster 868bfacc-e492-11e4-89fa-000fb711110c
     health HEALTH_OK
     monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election epoch 1, quorum 0 octeon109
     osdmap e80: 6 osds: 6 up, 6 in
      pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects
            60604 MB used, 2734 GB / 2793 GB avail
                 728 active+clean
root@essperf3:/etc/ceph#

root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats
octeon109:
    ----------
    - boot_time:
        1430784431
    - ceph_version:
        0.80.8-0.el6
    - services:
        ----------
        ceph-mon.octeon109:
            ----------
            cluster:
                ceph
            fsid:
                868bfacc-e492-11e4-89fa-000fb711110c
            id:
                octeon109
            status:
                ----------
                election_epoch:
                    1
                extra_probe_peers:
                monmap:
                    ----------
                    created:
                        2015-04-16 23:50:52.412686
                    epoch:
                        1
                    fsid:
                        868bfacc-e492-11e4-89fa-000fb711110c
                    modified:
                        2015-04-16 23:50:52.412686
                    mons:
                        ----------
                        - addr:
                            209.243.160.70:6789/0
                        - name:
                            octeon109
                        - rank:
                            0
                name:
                    octeon109
                outside_quorum:
                quorum:
                    - 0
                rank:
                    0
                state:
                    leader
                sync_provider:
            type:
                mon
            version:
                0.86
    ----------
    - 868bfacc-e492-11e4-89fa-000fb711110c:
        ----------
        fsid:
            868bfacc-e492-11e4-89fa-000fb711110c
        name:
            ceph
        versions:
            ----------
            config:
                87f175c60e5c7ec06c263c556056fbcb
            health:
                a907d0ec395713369b4843381ec31bc2
            mds_map:
                1
            mon_map:
                1
            mon_status:
                1
            osd_map:
                80
            pg_summary:
                7e29d7cc93cfced8f3f146cc78f5682f
root@essperf3:/etc/ceph#



> -----Original Message-----
> From: Gregory Meno [mailto:gmeno@redhat.com]
> Sent: Tuesday, May 12, 2015 5:03 PM
> To: Bruce McFarland
> Cc: ceph-calamari@lists.ceph.com; ceph-users@ceph.com; ceph-devel
> (ceph-devel@vger.kernel.org)
> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
> 
> Bruce,
> 
> It is great to hear that salt is reporting status from all the nodes in the
> cluster.
> 
> Let me see if I understand your question:
> 
> You want to know what conditions cause us to recognize a working cluster?
> 
> see
> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/
> manager.py#L135
> 
> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/
> manager.py#L349
> 
> and
> 
> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/c
> luster_monitor.py
> 
> 
> Let’s check that you need to be digging into that level of detail:
> 
> You switched to a new instance of calamari and it is not recognizing the
> cluster.
> 
> You what to know what you are overlooking? Would you please clarify with
> some hostnames?
> 
> i.e. Let say that your old calamari node was called calamariA and that your
> new node is calamariB
> 
> from which are you running the get_heartbeats?
> 
> what is the master setting in the minion config files out on the nodes of the
> cluster if things are setup correctly they would look like this:
> 
> [root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf
> master: calamariB
> 
> 
> If this is the case the thing I would check is the
> http://calamariB/api/v2/cluster endpoint is reporting anything?
> 
> hope this helps,
> Gregory
> 
> > On May 12, 2015, at 4:34 PM, Bruce McFarland
> <Bruce.McFarland@taec.toshiba.com> wrote:
> >
> > Increasing the audience since ceph-calamari is not responsive. What salt
> event/info does the Calamari Master expect to see from the ceph-mon to
> determine there is an working cluster? I had to change servers hosting the
> calamari master and can’t get the new machine to recognize the cluster.
> The ‘salt \* ceph.get_heartbeats’ returns monmap, fsid, ver, epoch, etc for
> the monitor and all of the osd’s. Can anyone point me to docs or code that
> might enlighten me to what I’m overlooking? Thanks.
> > _______________________________________________
> > ceph-calamari mailing list
> > ceph-calamari@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [ceph-calamari] Does anyone understand Calamari??
       [not found]       ` <7E8CF9C16F722345A89076330719CB1F4A825343-S73WLWeSKkaudDcc988YZay9ae4OIm1A@public.gmane.org>
@ 2015-05-13  0:58         ` Gregory Meno
       [not found]           ` <95DFDC74-A336-46C4-B72E-24E9747CDD6F-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-05-13  1:39           ` Bruce McFarland
  0 siblings, 2 replies; 9+ messages in thread
From: Gregory Meno @ 2015-05-13  0:58 UTC (permalink / raw)
  To: Bruce McFarland
  Cc: ceph-calamari-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel (ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
	ceph-users-Qp0mS5GaXlQ

All that looks fine.

There must be some state where the cluster is known to calamari but it is failing to actually show it.

If you have time to debug I would love to see the logs at debug level.

If you don’t, we could try cleaning out calamari’s state:
sudo supervisorctl shutdown
sudo service httpd stop
sudo calamari-ctl clear --yes-i-am-sure
sudo calamari-ctl initialize

then
sudo service supervisord start
sudo service httpd start

and see what the API and UI say then.

regards,
Gregory 
> On May 12, 2015, at 5:18 PM, Bruce McFarland <Bruce.McFarland@taec.toshiba.com> wrote:
> 
> Master was ess68 and now it's essperf3. 
> 
> On all cluster nodes the following files now have 'master: essperf3'
> /etc/salt/minion 
> /etc/salt/minion/calamari.conf 
> /etc/diamond/diamond.conf
> 
> The 'salt \* ceph.get_heartbeats' is being run on essperf3 - heres a 'salt \* test.ping' from essperf3 Calamari Master to the cluster. I've also included a quick cluster sanity test with the output of ceph -s and ceph osd tree. And for your reading pleasure the output of 'salt octeon109 ceph.get_heartbeats' since I suspect there might be a missing field in the monitor response. 
> 
> oot@essperf3:/etc/ceph# salt \* test.ping
> octeon108:
>    True
> octeon114:
>    True
> octeon111:
>    True
> octeon101:
>    True
> octeon106:
>    True
> octeon109:
>    True
> octeon118:
>    True
> root@essperf3:/etc/ceph# ceph osd tree
> # id	weight	type name	up/down	reweight
> -1	7	root default
> -4	1		host octeon108
> 0	1			osd.0	up	1	
> -2	1		host octeon111
> 1	1			osd.1	up	1	
> -5	1		host octeon115
> 2	1			osd.2	DNE		
> -6	1		host octeon118
> 3	1			osd.3	up	1	
> -7	1		host octeon114
> 4	1			osd.4	up	1	
> -8	1		host octeon106
> 5	1			osd.5	up	1	
> -9	1		host octeon101
> 6	1			osd.6	up	1	
> root@essperf3:/etc/ceph# ceph -s 
>    cluster 868bfacc-e492-11e4-89fa-000fb711110c
>     health HEALTH_OK
>     monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election epoch 1, quorum 0 octeon109
>     osdmap e80: 6 osds: 6 up, 6 in
>      pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects
>            60604 MB used, 2734 GB / 2793 GB avail
>                 728 active+clean
> root@essperf3:/etc/ceph#
> 
> root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats
> octeon109:
>    ----------
>    - boot_time:
>        1430784431
>    - ceph_version:
>        0.80.8-0.el6
>    - services:
>        ----------
>        ceph-mon.octeon109:
>            ----------
>            cluster:
>                ceph
>            fsid:
>                868bfacc-e492-11e4-89fa-000fb711110c
>            id:
>                octeon109
>            status:
>                ----------
>                election_epoch:
>                    1
>                extra_probe_peers:
>                monmap:
>                    ----------
>                    created:
>                        2015-04-16 23:50:52.412686
>                    epoch:
>                        1
>                    fsid:
>                        868bfacc-e492-11e4-89fa-000fb711110c
>                    modified:
>                        2015-04-16 23:50:52.412686
>                    mons:
>                        ----------
>                        - addr:
>                            209.243.160.70:6789/0
>                        - name:
>                            octeon109
>                        - rank:
>                            0
>                name:
>                    octeon109
>                outside_quorum:
>                quorum:
>                    - 0
>                rank:
>                    0
>                state:
>                    leader
>                sync_provider:
>            type:
>                mon
>            version:
>                0.86
>    ----------
>    - 868bfacc-e492-11e4-89fa-000fb711110c:
>        ----------
>        fsid:
>            868bfacc-e492-11e4-89fa-000fb711110c
>        name:
>            ceph
>        versions:
>            ----------
>            config:
>                87f175c60e5c7ec06c263c556056fbcb
>            health:
>                a907d0ec395713369b4843381ec31bc2
>            mds_map:
>                1
>            mon_map:
>                1
>            mon_status:
>                1
>            osd_map:
>                80
>            pg_summary:
>                7e29d7cc93cfced8f3f146cc78f5682f
> root@essperf3:/etc/ceph#
> 
> 
> 
>> -----Original Message-----
>> From: Gregory Meno [mailto:gmeno@redhat.com]
>> Sent: Tuesday, May 12, 2015 5:03 PM
>> To: Bruce McFarland
>> Cc: ceph-calamari@lists.ceph.com; ceph-users@ceph.com; ceph-devel
>> (ceph-devel@vger.kernel.org)
>> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
>> 
>> Bruce,
>> 
>> It is great to hear that salt is reporting status from all the nodes in the
>> cluster.
>> 
>> Let me see if I understand your question:
>> 
>> You want to know what conditions cause us to recognize a working cluster?
>> 
>> see
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/
>> manager.py#L135
>> 
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/
>> manager.py#L349
>> 
>> and
>> 
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/c
>> luster_monitor.py
>> 
>> 
>> Let’s check that you need to be digging into that level of detail:
>> 
>> You switched to a new instance of calamari and it is not recognizing the
>> cluster.
>> 
>> You what to know what you are overlooking? Would you please clarify with
>> some hostnames?
>> 
>> i.e. Let say that your old calamari node was called calamariA and that your
>> new node is calamariB
>> 
>> from which are you running the get_heartbeats?
>> 
>> what is the master setting in the minion config files out on the nodes of the
>> cluster if things are setup correctly they would look like this:
>> 
>> [root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf
>> master: calamariB
>> 
>> 
>> If this is the case the thing I would check is the
>> http://calamariB/api/v2/cluster endpoint is reporting anything?
>> 
>> hope this helps,
>> Gregory
>> 
>>> On May 12, 2015, at 4:34 PM, Bruce McFarland
>> <Bruce.McFarland@taec.toshiba.com> wrote:
>>> 
>>> Increasing the audience since ceph-calamari is not responsive. What salt
>> event/info does the Calamari Master expect to see from the ceph-mon to
>> determine there is an working cluster? I had to change servers hosting the
>> calamari master and can’t get the new machine to recognize the cluster.
>> The ‘salt \* ceph.get_heartbeats’ returns monmap, fsid, ver, epoch, etc for
>> the monitor and all of the osd’s. Can anyone point me to docs or code that
>> might enlighten me to what I’m overlooking? Thanks.
>>> _______________________________________________
>>> ceph-calamari mailing list
>>> ceph-calamari@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [ceph-calamari] Does anyone understand Calamari??
       [not found]           ` <95DFDC74-A336-46C4-B72E-24E9747CDD6F-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-05-13  1:11             ` Bruce McFarland
       [not found]               ` <7E8CF9C16F722345A89076330719CB1F4A8253D3-S73WLWeSKkaudDcc988YZay9ae4OIm1A@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Bruce McFarland @ 2015-05-13  1:11 UTC (permalink / raw)
  To: Gregory Meno
  Cc: ceph-calamari-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel (ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
	ceph-users-Qp0mS5GaXlQ

Which logs? I'm assuming /var/log/salt/minion, since the rest on the minions are relatively empty. Possibly the Cthulhu log from the master?

I'm running on Ubuntu 14.04 and don't have an httpd service; I had been starting/stopping apache2. Likewise there is no supervisord service, and I've been using supervisorctl to start/stop Cthulhu.

I've performed the calamari-ctl clear/init sequence more than twice, also stopping/starting apache2 and Cthulhu.
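
For reference, the Ubuntu flavour of your sequence that I have been running looks roughly like this (service names are my best guess: apache is 'apache2' here, and I believe the supervisor init script is called 'supervisor' rather than 'supervisord'):

sudo supervisorctl shutdown
sudo service apache2 stop
sudo calamari-ctl clear --yes-i-am-sure
sudo calamari-ctl initialize
sudo service supervisor start
sudo service apache2 start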

> -----Original Message-----
> From: Gregory Meno [mailto:gmeno@redhat.com]
> Sent: Tuesday, May 12, 2015 5:58 PM
> To: Bruce McFarland
> Cc: ceph-calamari@lists.ceph.com; ceph-users@ceph.com; ceph-devel
> (ceph-devel@vger.kernel.org)
> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
> 
> All that looks fine.
> 
> There must be some state where the cluster is known to calamari and it is
> failing to actually show it.
> 
> If you have time to debug I would love to see the logs at debug level.
> 
> If you don’t we could try cleaning out calamari’s state.
> sudo supervisorctl shutdown
> sudo service httpd stop
> sudo calamari-ctl clear --yes-i-am-sure
> sudo calamari-ctl initialize
> 
> then
> sudo service supervisord start
> sudo service httpd start
> 
> see what the API and UI says then.
> 
> regards,
> Gregory
> > On May 12, 2015, at 5:18 PM, Bruce McFarland
> <Bruce.McFarland@taec.toshiba.com> wrote:
> >
> > Master was ess68 and now it's essperf3.
> >
> > On all cluster nodes the following files now have 'master: essperf3'
> > /etc/salt/minion
> > /etc/salt/minion/calamari.conf
> > /etc/diamond/diamond.conf
> >
> > The 'salt \* ceph.get_heartbeats' is being run on essperf3 - heres a 'salt \*
> test.ping' from essperf3 Calamari Master to the cluster. I've also included a
> quick cluster sanity test with the output of ceph -s and ceph osd tree. And for
> your reading pleasure the output of 'salt octeon109 ceph.get_heartbeats'
> since I suspect there might be a missing field in the monitor response.
> >
> > oot@essperf3:/etc/ceph# salt \* test.ping
> > octeon108:
> >    True
> > octeon114:
> >    True
> > octeon111:
> >    True
> > octeon101:
> >    True
> > octeon106:
> >    True
> > octeon109:
> >    True
> > octeon118:
> >    True
> > root@essperf3:/etc/ceph# ceph osd tree
> > # id	weight	type name	up/down	reweight
> > -1	7	root default
> > -4	1		host octeon108
> > 0	1			osd.0	up	1
> > -2	1		host octeon111
> > 1	1			osd.1	up	1
> > -5	1		host octeon115
> > 2	1			osd.2	DNE
> > -6	1		host octeon118
> > 3	1			osd.3	up	1
> > -7	1		host octeon114
> > 4	1			osd.4	up	1
> > -8	1		host octeon106
> > 5	1			osd.5	up	1
> > -9	1		host octeon101
> > 6	1			osd.6	up	1
> > root@essperf3:/etc/ceph# ceph -s
> >    cluster 868bfacc-e492-11e4-89fa-000fb711110c
> >     health HEALTH_OK
> >     monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election
> epoch 1, quorum 0 octeon109
> >     osdmap e80: 6 osds: 6 up, 6 in
> >      pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects
> >            60604 MB used, 2734 GB / 2793 GB avail
> >                 728 active+clean
> > root@essperf3:/etc/ceph#
> >
> > root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats
> > octeon109:
> >    ----------
> >    - boot_time:
> >        1430784431
> >    - ceph_version:
> >        0.80.8-0.el6
> >    - services:
> >        ----------
> >        ceph-mon.octeon109:
> >            ----------
> >            cluster:
> >                ceph
> >            fsid:
> >                868bfacc-e492-11e4-89fa-000fb711110c
> >            id:
> >                octeon109
> >            status:
> >                ----------
> >                election_epoch:
> >                    1
> >                extra_probe_peers:
> >                monmap:
> >                    ----------
> >                    created:
> >                        2015-04-16 23:50:52.412686
> >                    epoch:
> >                        1
> >                    fsid:
> >                        868bfacc-e492-11e4-89fa-000fb711110c
> >                    modified:
> >                        2015-04-16 23:50:52.412686
> >                    mons:
> >                        ----------
> >                        - addr:
> >                            209.243.160.70:6789/0
> >                        - name:
> >                            octeon109
> >                        - rank:
> >                            0
> >                name:
> >                    octeon109
> >                outside_quorum:
> >                quorum:
> >                    - 0
> >                rank:
> >                    0
> >                state:
> >                    leader
> >                sync_provider:
> >            type:
> >                mon
> >            version:
> >                0.86
> >    ----------
> >    - 868bfacc-e492-11e4-89fa-000fb711110c:
> >        ----------
> >        fsid:
> >            868bfacc-e492-11e4-89fa-000fb711110c
> >        name:
> >            ceph
> >        versions:
> >            ----------
> >            config:
> >                87f175c60e5c7ec06c263c556056fbcb
> >            health:
> >                a907d0ec395713369b4843381ec31bc2
> >            mds_map:
> >                1
> >            mon_map:
> >                1
> >            mon_status:
> >                1
> >            osd_map:
> >                80
> >            pg_summary:
> >                7e29d7cc93cfced8f3f146cc78f5682f
> > root@essperf3:/etc/ceph#
> >
> >
> >
> >> -----Original Message-----
> >> From: Gregory Meno [mailto:gmeno@redhat.com]
> >> Sent: Tuesday, May 12, 2015 5:03 PM
> >> To: Bruce McFarland
> >> Cc: ceph-calamari@lists.ceph.com; ceph-users@ceph.com; ceph-devel
> >> (ceph-devel@vger.kernel.org)
> >> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
> >>
> >> Bruce,
> >>
> >> It is great to hear that salt is reporting status from all the nodes
> >> in the cluster.
> >>
> >> Let me see if I understand your question:
> >>
> >> You want to know what conditions cause us to recognize a working
> cluster?
> >>
> >> see
> >>
> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/
> >> manager.py#L135
> >>
> >>
> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/
> >> manager.py#L349
> >>
> >> and
> >>
> >>
> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/
> >> c
> >> luster_monitor.py
> >>
> >>
> >> Let’s check that you need to be digging into that level of detail:
> >>
> >> You switched to a new instance of calamari and it is not recognizing
> >> the cluster.
> >>
> >> You what to know what you are overlooking? Would you please clarify
> >> with some hostnames?
> >>
> >> i.e. Let say that your old calamari node was called calamariA and
> >> that your new node is calamariB
> >>
> >> from which are you running the get_heartbeats?
> >>
> >> what is the master setting in the minion config files out on the
> >> nodes of the cluster if things are setup correctly they would look like this:
> >>
> >> [root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf
> >> master: calamariB
> >>
> >>
> >> If this is the case the thing I would check is the
> >> http://calamariB/api/v2/cluster endpoint is reporting anything?
> >>
> >> hope this helps,
> >> Gregory
> >>
> >>> On May 12, 2015, at 4:34 PM, Bruce McFarland
> >> <Bruce.McFarland@taec.toshiba.com> wrote:
> >>>
> >>> Increasing the audience since ceph-calamari is not responsive. What
> >>> salt
> >> event/info does the Calamari Master expect to see from the ceph-mon
> >> to determine there is an working cluster? I had to change servers
> >> hosting the calamari master and can’t get the new machine to recognize
> the cluster.
> >> The ‘salt \* ceph.get_heartbeats’ returns monmap, fsid, ver, epoch,
> >> etc for the monitor and all of the osd’s. Can anyone point me to docs
> >> or code that might enlighten me to what I’m overlooking? Thanks.
> >>> _______________________________________________
> >>> ceph-calamari mailing list
> >>> ceph-calamari@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com
> >

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [ceph-calamari] Does anyone understand Calamari??
  2015-05-13  0:58         ` Gregory Meno
       [not found]           ` <95DFDC74-A336-46C4-B72E-24E9747CDD6F-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-05-13  1:39           ` Bruce McFarland
  1 sibling, 0 replies; 9+ messages in thread
From: Bruce McFarland @ 2015-05-13  1:39 UTC (permalink / raw)
  To: Gregory Meno
  Cc: ceph-calamari-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel (ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
	ceph-users-Qp0mS5GaXlQ

/var/log/salt/minion doesn't really look very interesting after that sequence. I issued 'salt octeon109 ceph.get_heartbeats' from the master. The logs are much more interesting when I clear calamari and stop salt-minion. Looking at the endpoint from http://essperf2/api/v2/cluster doesn't show anything: it reports HTTP 200 OK and Vary: Accept, but there is nothing in the body of the output, i.e. no update_time, id, or name is being reported.

root@octeon109:/var/log/salt# tail -f /var/log/salt/minion
2015-05-13 01:31:19,066 [salt.crypt                               ][DEBUG   ][4699] Failed to authenticate message
2015-05-13 01:31:19,068 [salt.minion                              ][DEBUG   ][4699] Attempting to authenticate with the Salt Master at 209.243.160.35
2015-05-13 01:31:19,069 [salt.crypt                               ][DEBUG   ][4699] Re-using SAuth for ('/etc/salt/pki/minion', 'octeon109', 'tcp://209.243.160.35:4506')
2015-05-13 01:31:19,294 [salt.crypt                               ][DEBUG   ][4699] Decrypting the current master AES key
2015-05-13 01:31:19,296 [salt.crypt                               ][DEBUG   ][4699] Loaded minion key: /etc/salt/pki/minion/minion.pem
2015-05-13 01:31:20,026 [salt.crypt                               ][DEBUG   ][4699] Loaded minion key: /etc/salt/pki/minion/minion.pem
2015-05-13 01:33:04,027 [salt.minion                              ][INFO    ][4699] User root Executing command ceph.get_heartbeats with jid 20150512183304482562
2015-05-13 01:33:04,028 [salt.minion                              ][DEBUG   ][4699] Command details {'tgt_type': 'glob', 'jid': '20150512183304482562', 'tgt': 'octeon109', 'ret': '', 'user': 'root', 'arg': [], 'fun': 'ceph.get_heartbeats'}
2015-05-13 01:33:04,043 [salt.minion                              ][INFO    ][5912] Starting a new job with PID 5912
2015-05-13 01:33:04,053 [salt.utils.lazy                          ][DEBUG   ][5912] LazyLoaded ceph.get_heartbeats
2015-05-13 01:33:04,209 [salt.utils.lazy                          ][DEBUG   ][5912] LazyLoaded pkg.version
2015-05-13 01:33:04,212 [salt.utils.lazy                          ][DEBUG   ][5912] LazyLoaded pkg_resource.version
2015-05-13 01:33:04,217 [salt.utils.lazy                          ][DEBUG   ][5912] LazyLoaded cmd.run_stdout
2015-05-13 01:33:04,219 [salt.loaded.int.module.cmdmod            ][INFO    ][5912] Executing command ['dpkg-query', '--showformat', '${Status} ${Package} ${Version} ${Architecture}\n', '-W'] in directory '/root'
2015-05-13 01:33:05,432 [salt.minion                              ][INFO    ][5912] Returning information for job: 20150512183304482562
2015-05-13 01:33:05,434 [salt.crypt                               ][DEBUG   ][5912] Re-using SAuth for ('/etc/salt/pki/minion', 'octeon109', 'tcp://209.243.160.35:4506')


> -----Original Message-----
> From: Bruce McFarland
> Sent: Tuesday, May 12, 2015 6:11 PM
> To: 'Gregory Meno'
> Cc: ceph-calamari@lists.ceph.com; ceph-users@ceph.com; ceph-devel
> (ceph-devel@vger.kernel.org)
> Subject: RE: [ceph-calamari] Does anyone understand Calamari??
> 
> Which logs? I'm assuming /var/log/salt/minon since the rest on the minions
> are relatively empty. Possibly Cthulhu from the master?
> 
> I'm running on Ubuntu 14.04 and don't have an httpd service. I had been
> start/stopping apache2. Likewise there is no supervisord service and I've
> been using supervisorctl to start/stop Cthulhu.
> 
> I've performed the calamari-ctl clear/init sequence more than twice with
> also stopping/starting apache2 and Cthulhu.
> 
> > -----Original Message-----
> > From: Gregory Meno [mailto:gmeno@redhat.com]
> > Sent: Tuesday, May 12, 2015 5:58 PM
> > To: Bruce McFarland
> > Cc: ceph-calamari@lists.ceph.com; ceph-users@ceph.com; ceph-devel
> > (ceph-devel@vger.kernel.org)
> > Subject: Re: [ceph-calamari] Does anyone understand Calamari??
> >
> > All that looks fine.
> >
> > There must be some state where the cluster is known to calamari and it
> > is failing to actually show it.
> >
> > If you have time to debug I would love to see the logs at debug level.
> >
> > If you don’t we could try cleaning out calamari’s state.
> > sudo supervisorctl shutdown
> > sudo service httpd stop
> > sudo calamari-ctl clear --yes-i-am-sure
> > sudo calamari-ctl initialize
> > 
> > then
> > sudo service supervisord start
> > sudo service httpd start
> >
> > see what the API and UI says then.
> >
> > regards,
> > Gregory
> > > On May 12, 2015, at 5:18 PM, Bruce McFarland
> > <Bruce.McFarland@taec.toshiba.com> wrote:
> > >
> > > Master was ess68 and now it's essperf3.
> > >
> > > On all cluster nodes the following files now have 'master: essperf3'
> > > /etc/salt/minion
> > > /etc/salt/minion/calamari.conf
> > > /etc/diamond/diamond.conf
> > >
> > > The 'salt \* ceph.get_heartbeats' is being run on essperf3 - heres a
> > > 'salt \*
> > test.ping' from essperf3 Calamari Master to the cluster. I've also
> > included a quick cluster sanity test with the output of ceph -s and
> > ceph osd tree. And for your reading pleasure the output of 'salt octeon109
> ceph.get_heartbeats'
> > since I suspect there might be a missing field in the monitor response.
> > >
> > > oot@essperf3:/etc/ceph# salt \* test.ping
> > > octeon108:
> > >    True
> > > octeon114:
> > >    True
> > > octeon111:
> > >    True
> > > octeon101:
> > >    True
> > > octeon106:
> > >    True
> > > octeon109:
> > >    True
> > > octeon118:
> > >    True
> > > root@essperf3:/etc/ceph# ceph osd tree
> > > # id	weight	type name	up/down	reweight
> > > -1	7	root default
> > > -4	1		host octeon108
> > > 0	1			osd.0	up	1
> > > -2	1		host octeon111
> > > 1	1			osd.1	up	1
> > > -5	1		host octeon115
> > > 2	1			osd.2	DNE
> > > -6	1		host octeon118
> > > 3	1			osd.3	up	1
> > > -7	1		host octeon114
> > > 4	1			osd.4	up	1
> > > -8	1		host octeon106
> > > 5	1			osd.5	up	1
> > > -9	1		host octeon101
> > > 6	1			osd.6	up	1
> > > root@essperf3:/etc/ceph# ceph -s
> > >    cluster 868bfacc-e492-11e4-89fa-000fb711110c
> > >     health HEALTH_OK
> > >     monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election
> > epoch 1, quorum 0 octeon109
> > >     osdmap e80: 6 osds: 6 up, 6 in
> > >      pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects
> > >            60604 MB used, 2734 GB / 2793 GB avail
> > >                 728 active+clean
> > > root@essperf3:/etc/ceph#
> > >
> > > root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats
> > > octeon109:
> > >    ----------
> > >    - boot_time:
> > >        1430784431
> > >    - ceph_version:
> > >        0.80.8-0.el6
> > >    - services:
> > >        ----------
> > >        ceph-mon.octeon109:
> > >            ----------
> > >            cluster:
> > >                ceph
> > >            fsid:
> > >                868bfacc-e492-11e4-89fa-000fb711110c
> > >            id:
> > >                octeon109
> > >            status:
> > >                ----------
> > >                election_epoch:
> > >                    1
> > >                extra_probe_peers:
> > >                monmap:
> > >                    ----------
> > >                    created:
> > >                        2015-04-16 23:50:52.412686
> > >                    epoch:
> > >                        1
> > >                    fsid:
> > >                        868bfacc-e492-11e4-89fa-000fb711110c
> > >                    modified:
> > >                        2015-04-16 23:50:52.412686
> > >                    mons:
> > >                        ----------
> > >                        - addr:
> > >                            209.243.160.70:6789/0
> > >                        - name:
> > >                            octeon109
> > >                        - rank:
> > >                            0
> > >                name:
> > >                    octeon109
> > >                outside_quorum:
> > >                quorum:
> > >                    - 0
> > >                rank:
> > >                    0
> > >                state:
> > >                    leader
> > >                sync_provider:
> > >            type:
> > >                mon
> > >            version:
> > >                0.86
> > >    ----------
> > >    - 868bfacc-e492-11e4-89fa-000fb711110c:
> > >        ----------
> > >        fsid:
> > >            868bfacc-e492-11e4-89fa-000fb711110c
> > >        name:
> > >            ceph
> > >        versions:
> > >            ----------
> > >            config:
> > >                87f175c60e5c7ec06c263c556056fbcb
> > >            health:
> > >                a907d0ec395713369b4843381ec31bc2
> > >            mds_map:
> > >                1
> > >            mon_map:
> > >                1
> > >            mon_status:
> > >                1
> > >            osd_map:
> > >                80
> > >            pg_summary:
> > >                7e29d7cc93cfced8f3f146cc78f5682f
> > > root@essperf3:/etc/ceph#
> > >
> > >
> > >
> > >> -----Original Message-----
> > >> From: Gregory Meno [mailto:gmeno@redhat.com]
> > >> Sent: Tuesday, May 12, 2015 5:03 PM
> > >> To: Bruce McFarland
> > >> Cc: ceph-calamari@lists.ceph.com; ceph-users@ceph.com; ceph-devel
> > >> (ceph-devel@vger.kernel.org)
> > >> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
> > >>
> > >> Bruce,
> > >>
> > >> It is great to hear that salt is reporting status from all the
> > >> nodes in the cluster.
> > >>
> > >> Let me see if I understand your question:
> > >>
> > >> You want to know what conditions cause us to recognize a working
> > cluster?
> > >>
> > >> see
> > >>
> >
> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/
> > >> manager.py#L135
> > >>
> > >>
> >
> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/
> > >> manager.py#L349
> > >>
> > >> and
> > >>
> > >>
> >
> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/
> > >> c
> > >> luster_monitor.py
> > >>
> > >>
> > >> Let’s check that you need to be digging into that level of detail:
> > >>
> > >> You switched to a new instance of calamari and it is not
> > >> recognizing the cluster.
> > >>
> > >> You what to know what you are overlooking? Would you please clarify
> > >> with some hostnames?
> > >>
> > >> i.e. Let say that your old calamari node was called calamariA and
> > >> that your new node is calamariB
> > >>
> > >> from which are you running the get_heartbeats?
> > >>
> > >> what is the master setting in the minion config files out on the
> > >> nodes of the cluster if things are setup correctly they would look like
> this:
> > >>
> > >> [root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf
> > >> master: calamariB
> > >>
> > >>
> > >> If this is the case the thing I would check is the
> > >> http://calamariB/api/v2/cluster endpoint is reporting anything?
> > >>
> > >> hope this helps,
> > >> Gregory
> > >>
> > >>> On May 12, 2015, at 4:34 PM, Bruce McFarland
> > >> <Bruce.McFarland@taec.toshiba.com> wrote:
> > >>>
> > >>> Increasing the audience since ceph-calamari is not responsive.
> > >>> What salt
> > >> event/info does the Calamari Master expect to see from the ceph-mon
> > >> to determine there is an working cluster? I had to change servers
> > >> hosting the calamari master and can’t get the new machine to
> > >> recognize
> > the cluster.
> > >> The ‘salt \* ceph.get_heartbeats’ returns monmap, fsid, ver, epoch,
> > >> etc for the monitor and all of the osd’s. Can anyone point me to
> > >> docs or code that might enlighten me to what I’m overlooking? Thanks.
> > >>> _______________________________________________
> > >>> ceph-calamari mailing list
> > >>> ceph-calamari@lists.ceph.com
> > >>> http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com
> > >

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [ceph-calamari] Does anyone understand Calamari??
       [not found]               ` <7E8CF9C16F722345A89076330719CB1F4A8253D3-S73WLWeSKkaudDcc988YZay9ae4OIm1A@public.gmane.org>
@ 2015-05-13  2:08                 ` Gregory Meno
       [not found]                   ` <D363AFC4-84C1-42EB-A5F0-792A88B685C4-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Gregory Meno @ 2015-05-13  2:08 UTC (permalink / raw)
  To: Bruce McFarland
  Cc: ceph-calamari-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel (ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
	ceph-users-Qp0mS5GaXlQ

Ideally I would like everything in /var/log/calamari.

be sure to set calamari.conf like so:
[shadow_man@vpm107 ~]$ grep DEBUG /etc/calamari/calamari.conf 
log_level = DEBUG
db_log_level = DEBUG
log_level = DEBUG

then restart cthulhu and apache
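
(on your Ubuntu box that would presumably be something like

sudo supervisorctl restart cthulhu
sudo service apache2 restart

assuming the supervisor program for cthulhu is actually named 'cthulhu')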

visit http://essperf3/api/v2/cluster
and http://essperf3

and then share the logs here. Hopefully something obvious will show up in either the calamari or cthulhu log.

regards,
Gregory

> On May 12, 2015, at 6:11 PM, Bruce McFarland <Bruce.McFarland@taec.toshiba.com> wrote:
> 
> Which logs? I'm assuming /var/log/salt/minon since the rest on the minions are relatively empty. Possibly Cthulhu from the master?
> 
> I'm running on Ubuntu 14.04 and don't have an httpd service. I had been start/stopping apache2. Likewise there is no supervisord service and I've been using supervisorctl to start/stop Cthulhu. 
> 
> I've performed the calamari-ctl clear/init sequence more than twice with also stopping/starting apache2 and Cthulhu.
> 
>> -----Original Message-----
>> From: Gregory Meno [mailto:gmeno@redhat.com]
>> Sent: Tuesday, May 12, 2015 5:58 PM
>> To: Bruce McFarland
>> Cc: ceph-calamari@lists.ceph.com; ceph-users@ceph.com; ceph-devel
>> (ceph-devel@vger.kernel.org)
>> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
>> 
>> All that looks fine.
>> 
>> There must be some state where the cluster is known to calamari and it is
>> failing to actually show it.
>> 
>> If you have time to debug I would love to see the logs at debug level.
>> 
>> If you don’t we could try cleaning out calamari’s state.
>> sudo supervisorctl shutdown
>> sudo service httpd stop
>> sudo calamari-ctl clear --yes-i-am-sure
>> sudo calamari-ctl initialize
>> 
>> then
>> sudo service supervisord start
>> sudo service httpd start
>> 
>> see what the API and UI says then.
>> 
>> regards,
>> Gregory
>>> On May 12, 2015, at 5:18 PM, Bruce McFarland
>> <Bruce.McFarland@taec.toshiba.com> wrote:
>>> 
>>> Master was ess68 and now it's essperf3.
>>> 
>>> On all cluster nodes the following files now have 'master: essperf3'
>>> /etc/salt/minion
>>> /etc/salt/minion/calamari.conf
>>> /etc/diamond/diamond.conf
>>> 
>>> The 'salt \* ceph.get_heartbeats' is being run on essperf3 - heres a 'salt \*
>> test.ping' from essperf3 Calamari Master to the cluster. I've also included a
>> quick cluster sanity test with the output of ceph -s and ceph osd tree. And for
>> your reading pleasure the output of 'salt octeon109 ceph.get_heartbeats'
>> since I suspect there might be a missing field in the monitor response.
>>> 
>>> oot@essperf3:/etc/ceph# salt \* test.ping
>>> octeon108:
>>>  True
>>> octeon114:
>>>  True
>>> octeon111:
>>>  True
>>> octeon101:
>>>  True
>>> octeon106:
>>>  True
>>> octeon109:
>>>  True
>>> octeon118:
>>>  True
>>> root@essperf3:/etc/ceph# ceph osd tree
>>> # id	weight	type name	up/down	reweight
>>> -1	7	root default
>>> -4	1		host octeon108
>>> 0	1			osd.0	up	1
>>> -2	1		host octeon111
>>> 1	1			osd.1	up	1
>>> -5	1		host octeon115
>>> 2	1			osd.2	DNE
>>> -6	1		host octeon118
>>> 3	1			osd.3	up	1
>>> -7	1		host octeon114
>>> 4	1			osd.4	up	1
>>> -8	1		host octeon106
>>> 5	1			osd.5	up	1
>>> -9	1		host octeon101
>>> 6	1			osd.6	up	1
>>> root@essperf3:/etc/ceph# ceph -s
>>>  cluster 868bfacc-e492-11e4-89fa-000fb711110c
>>>   health HEALTH_OK
>>>   monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election
>> epoch 1, quorum 0 octeon109
>>>   osdmap e80: 6 osds: 6 up, 6 in
>>>    pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects
>>>          60604 MB used, 2734 GB / 2793 GB avail
>>>               728 active+clean
>>> root@essperf3:/etc/ceph#
>>> 
>>> root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats
>>> octeon109:
>>>  ----------
>>>  - boot_time:
>>>      1430784431
>>>  - ceph_version:
>>>      0.80.8-0.el6
>>>  - services:
>>>      ----------
>>>      ceph-mon.octeon109:
>>>          ----------
>>>          cluster:
>>>              ceph
>>>          fsid:
>>>              868bfacc-e492-11e4-89fa-000fb711110c
>>>          id:
>>>              octeon109
>>>          status:
>>>              ----------
>>>              election_epoch:
>>>                  1
>>>              extra_probe_peers:
>>>              monmap:
>>>                  ----------
>>>                  created:
>>>                      2015-04-16 23:50:52.412686
>>>                  epoch:
>>>                      1
>>>                  fsid:
>>>                      868bfacc-e492-11e4-89fa-000fb711110c
>>>                  modified:
>>>                      2015-04-16 23:50:52.412686
>>>                  mons:
>>>                      ----------
>>>                      - addr:
>>>                          209.243.160.70:6789/0
>>>                      - name:
>>>                          octeon109
>>>                      - rank:
>>>                          0
>>>              name:
>>>                  octeon109
>>>              outside_quorum:
>>>              quorum:
>>>                  - 0
>>>              rank:
>>>                  0
>>>              state:
>>>                  leader
>>>              sync_provider:
>>>          type:
>>>              mon
>>>          version:
>>>              0.86
>>>  ----------
>>>  - 868bfacc-e492-11e4-89fa-000fb711110c:
>>>      ----------
>>>      fsid:
>>>          868bfacc-e492-11e4-89fa-000fb711110c
>>>      name:
>>>          ceph
>>>      versions:
>>>          ----------
>>>          config:
>>>              87f175c60e5c7ec06c263c556056fbcb
>>>          health:
>>>              a907d0ec395713369b4843381ec31bc2
>>>          mds_map:
>>>              1
>>>          mon_map:
>>>              1
>>>          mon_status:
>>>              1
>>>          osd_map:
>>>              80
>>>          pg_summary:
>>>              7e29d7cc93cfced8f3f146cc78f5682f
>>> root@essperf3:/etc/ceph#
>>> 
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Gregory Meno [mailto:gmeno@redhat.com]
>>>> Sent: Tuesday, May 12, 2015 5:03 PM
>>>> To: Bruce McFarland
>>>> Cc: ceph-calamari@lists.ceph.com; ceph-users@ceph.com; ceph-devel
>>>> (ceph-devel@vger.kernel.org)
>>>> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
>>>> 
>>>> Bruce,
>>>> 
>>>> It is great to hear that salt is reporting status from all the nodes
>>>> in the cluster.
>>>> 
>>>> Let me see if I understand your question:
>>>> 
>>>> You want to know what conditions cause us to recognize a working
>> cluster?
>>>> 
>>>> see
>>>> 
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/
>>>> manager.py#L135
>>>> 
>>>> 
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/
>>>> manager.py#L349
>>>> 
>>>> and
>>>> 
>>>> 
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/
>>>> c
>>>> luster_monitor.py
>>>> 
>>>> 
>>>> Let’s check that you need to be digging into that level of detail:
>>>> 
>>>> You switched to a new instance of calamari and it is not recognizing
>>>> the cluster.
>>>> 
>>>> You what to know what you are overlooking? Would you please clarify
>>>> with some hostnames?
>>>> 
>>>> i.e. Let say that your old calamari node was called calamariA and
>>>> that your new node is calamariB
>>>> 
>>>> from which are you running the get_heartbeats?
>>>> 
>>>> what is the master setting in the minion config files out on the
>>>> nodes of the cluster if things are setup correctly they would look like this:
>>>> 
>>>> [root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf
>>>> master: calamariB
>>>> 
>>>> 
>>>> If this is the case the thing I would check is the
>>>> http://calamariB/api/v2/cluster endpoint is reporting anything?
>>>> 
>>>> hope this helps,
>>>> Gregory
>>>> 
>>>>> On May 12, 2015, at 4:34 PM, Bruce McFarland
>>>> <Bruce.McFarland@taec.toshiba.com> wrote:
>>>>> 
>>>>> Increasing the audience since ceph-calamari is not responsive. What
>>>>> salt
>>>> event/info does the Calamari Master expect to see from the ceph-mon
>>>> to determine there is an working cluster? I had to change servers
>>>> hosting the calamari master and can’t get the new machine to recognize
>> the cluster.
>>>> The ‘salt \* ceph.get_heartbeats’ returns monmap, fsid, ver, epoch,
>>>> etc for the monitor and all of the osd’s. Can anyone point me to docs
>>>> or code that might enlighten me to what I’m overlooking? Thanks.
>>>>> _______________________________________________
>>>>> ceph-calamari mailing list
>>>>> ceph-calamari@lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com
>>> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [ceph-calamari] Does anyone understand Calamari??
       [not found]                   ` <D363AFC4-84C1-42EB-A5F0-792A88B685C4-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-05-13  9:23                     ` Steffen W Sørensen
       [not found]                       ` <666078A3-277A-4442-B2E4-29774B8470BB-BUHhN+a2lJ4@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Steffen W Sørensen @ 2015-05-13  9:23 UTC (permalink / raw)
  To: Gregory Meno
  Cc: ceph-calamari-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel (ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
	ceph-users-Qp0mS5GaXlQ


[-- Attachment #1.1: Type: text/plain, Size: 5366 bytes --]


> On 13/05/2015, at 04.08, Gregory Meno <gmeno-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> Ideally I would like everything in /var/log/calmari
> 
> be sure to set calamari.conf like so:
> [shadow_man@vpm107 ~]$ grep DEBUG /etc/calamari/calamari.conf 
> log_level = DEBUG
> db_log_level = DEBUG
> log_level = DEBUG
> 
> then restart cthulhu and apache
> 
> visit http://essperf3/api/v2/cluster
> and http://essperf3
> 
> and then share the logs here. Hopefully something obvious will be off in either calamari or cthulhu log

Since I've got a similar issue, I'll sneak my log data in here as well, no offence…

-rw-r--r-- 1 www-data www-data   1554 May 13 11:14 info.log
-rw-r--r-- 1 root     root      29311 May 13 11:14 httpd_error.log
-rw-r--r-- 1 root     root       4599 May 13 11:14 httpd_access.log
-rw-r--r-- 1 www-data www-data    739 May 13 11:14 calamari.log
-rw-r--r-- 1 root     root     238047 May 13 11:14 cthulhu.log

root@node1:/var/log/calamari# cat calamari.log
2015-05-13 04:05:40,787 - metric_access - django.request Not Found: /favicon.ico
2015-05-13 04:14:02,263 - DEBUG - django.request.profile [17.5249576569ms] /api/v2/cluster
2015-05-13 04:14:02,263 - DEBUG - django.request.profile RPC timing for 'list_clusters': 3.89504432678/3.89504432678/3.89504432678 avg/min/max ms
2015-05-13 04:14:02,263 - DEBUG - django.request.profile Total time in RPC: 3.89504432678ms
2015-05-13 04:14:06,172 - DEBUG - django.request.profile [15.8069133759ms] /api/v2/cluster
2015-05-13 04:14:06,173 - DEBUG - django.request.profile RPC timing for 'list_clusters': 2.44808197021/2.44808197021/2.44808197021 avg/min/max ms
2015-05-13 04:14:06,173 - DEBUG - django.request.profile Total time in RPC: 2.44808197021ms

root@node1:/var/log/calamari# tail cthulhu.log
2015-05-13 11:14:46,694 - DEBUG - cthulhu nivcsw: 102
2015-05-13 11:14:46,709 - DEBUG - cthulhu Eventer.on_tick
2015-05-13 11:14:46,710 - INFO - cthulhu Eventer._emit: 2015-05-13 09:14:46.710030+00:00/WARNING/Cluster 'ceph' is late reporting in
2015-05-13 11:14:46,710 - INFO - sqlalchemy.engine.base.Engine BEGIN (implicit)
2015-05-13 11:14:46,711 - INFO - sqlalchemy.engine.base.Engine INSERT INTO cthulhu_event ("when", severity, message, fsid, fqdn, service_type, service_id) VALUES (%(when)s, %(severity)s, %(message)s, %(fsid)s, %(fqdn)s, %(service_type)s, %(service_id)s) RETURNING cthulhu_event.id
2015-05-13 11:14:46,711 - INFO - sqlalchemy.engine.base.Engine {'severity': 3, 'when': datetime.datetime(2015, 5, 13, 9, 14, 46, 710030, tzinfo=tzutc()), 'fqdn': None, 'service_type': None, 'service_id': None, 'message': "Cluster 'ceph' is late reporting in", 'fsid': u'16fe2dcf-2629-422f-a649-871deba78bcd'}
2015-05-13 11:14:46,713 - DEBUG - sqlalchemy.engine.base.Engine Col ('id',)
2015-05-13 11:14:46,714 - DEBUG - sqlalchemy.engine.base.Engine Row (54,)
2015-05-13 11:14:46,714 - INFO - sqlalchemy.engine.base.Engine COMMIT
2015-05-13 11:14:56,710 - DEBUG - cthulhu Eventer.on_tick

root@node1:/var/log/calamari# tail httpd_error.log
[Wed May 13 04:14:05 2015] [warn]   File "/usr/lib/python2.7/dist-packages/git/__init__.py", line 20, in _init_externals
[Wed May 13 04:14:05 2015] [warn]     import gitdb
[Wed May 13 04:14:05 2015] [warn]   File "/usr/lib/python2.7/dist-packages/gitdb/__init__.py", line 25, in <module>
[Wed May 13 04:14:05 2015] [warn]     _init_externals()
[Wed May 13 04:14:05 2015] [warn]   File "/usr/lib/python2.7/dist-packages/gitdb/__init__.py", line 17, in _init_externals
[Wed May 13 04:14:05 2015] [warn]     __import__(module)
[Wed May 13 04:14:05 2015] [warn]   File "/usr/lib/python2.7/dist-packages/async/__init__.py", line 36, in <module>
[Wed May 13 04:14:05 2015] [warn]     _init_signals()
[Wed May 13 04:14:05 2015] [warn]   File "/usr/lib/python2.7/dist-packages/async/__init__.py", line 26, in _init_signals
[Wed May 13 04:14:05 2015] [warn]     signal.signal(signal.SIGINT, thread_interrupt_handler)

root@node1:/var/log/calamari# tail httpd_access.log
<ip> - - [13/May/2015:11:14:02 +0200] "GET /static/rest_framework/js/default.js HTTP/1.1" 304 209
<ip> - - [13/May/2015:11:14:04 +0200] "GET /api/v2/cluster HTTP/1.1" 200 2258
<ip> - - [13/May/2015:11:14:06 +0200] "GET /static/rest_framework/css/bootstrap.min.css HTTP/1.1" 304 211
<ip> - - [13/May/2015:11:14:06 +0200] "GET /static/rest_framework/css/bootstrap-tweaks.css HTTP/1.1" 304 209
<ip> - - [13/May/2015:11:14:06 +0200] "GET /static/rest_framework/css/prettify.css HTTP/1.1" 304 209
<ip> - - [13/May/2015:11:14:06 +0200] "GET /static/rest_framework/js/jquery-1.8.1-min.js HTTP/1.1" 304 211
<ip> - - [13/May/2015:11:14:06 +0200] "GET /static/rest_framework/js/bootstrap.min.js HTTP/1.1" 304 210
<ip> - - [13/May/2015:11:14:06 +0200] "GET /static/rest_framework/css/default.css HTTP/1.1" 304 209
<ip> - - [13/May/2015:11:14:06 +0200] "GET /static/rest_framework/js/prettify-min.js HTTP/1.1" 304 210
<ip> - - [13/May/2015:11:14:06 +0200] "GET /static/rest_framework/js/default.js HTTP/1.1" 304 209

wget /api/v2/cluster returns:

GET /api/v2/cluster
HTTP 200 OK
Vary: Accept
Content-Type: text/html; charset=utf-8
Allow: GET, HEAD, OPTIONS

[
    {
        "update_time": "2015-05-13T09:13:46.607104+00:00", 
        "id": "16fe2dcf-2629-422f-a649-871deba78bcd", 
        "name": "ceph"
    }
]


[-- Attachment #1.2: Type: text/html, Size: 10760 bytes --]

[-- Attachment #2: Type: text/plain, Size: 178 bytes --]

_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [ceph-calamari] Does anyone understand Calamari??
       [not found]                       ` <666078A3-277A-4442-B2E4-29774B8470BB-BUHhN+a2lJ4@public.gmane.org>
@ 2015-05-13 10:55                         ` Steffen W Sørensen
  0 siblings, 0 replies; 9+ messages in thread
From: Steffen W Sørensen @ 2015-05-13 10:55 UTC (permalink / raw)
  To: Gregory Meno
  Cc: ceph-calamari-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel (ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
	ceph-users-Qp0mS5GaXlQ


[-- Attachment #1.1: Type: text/plain, Size: 1117 bytes --]


> On 13/05/2015, at 11.23, Steffen W Sørensen <stefws-BUHhN+a2lJ4@public.gmane.org> wrote:
> 
> 
>> On 13/05/2015, at 04.08, Gregory Meno <gmeno-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org <mailto:gmeno-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>> wrote:
>> 
>> Ideally I would like everything in /var/log/calmari
>> 
>> be sure to set calamari.conf like so:
>> [shadow_man@vpm107 ~]$ grep DEBUG /etc/calamari/calamari.conf 
>> log_level = DEBUG
>> db_log_level = DEBUG
>> log_level = DEBUG
>> 
>> then restart cthulhu and apache
>> 
>> visit http://essperf3/api/v2/cluster <http://essperf3/api/v2/cluster>
>> and http://essperf3 <http://essperf3/>
>> 
>> and then share the logs here. Hopefully something obvious will be off in either calamari or cthulhu log
> 
> Since I got similar issue, I sneak my log data in here as well, no offence…
… had a similar issue; dunno what changed, but just revisiting our calamari UI it seems to be working again… it knows of our cluster at least :)
Only it won't update the [health] state, which seems stuck, but IO and other stats are updated fine.



[-- Attachment #1.2: Type: text/html, Size: 2104 bytes --]

[-- Attachment #2: Type: text/plain, Size: 178 bytes --]

_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-05-13 10:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-12 23:34 Does anyone understand Calamari?? Bruce McFarland
2015-05-13  0:02 ` [ceph-calamari] " Gregory Meno
     [not found]   ` <607C9580-DF69-4CA0-9D65-550700893CA8-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-13  0:18     ` Bruce McFarland
     [not found]       ` <7E8CF9C16F722345A89076330719CB1F4A825343-S73WLWeSKkaudDcc988YZay9ae4OIm1A@public.gmane.org>
2015-05-13  0:58         ` Gregory Meno
     [not found]           ` <95DFDC74-A336-46C4-B72E-24E9747CDD6F-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-13  1:11             ` Bruce McFarland
     [not found]               ` <7E8CF9C16F722345A89076330719CB1F4A8253D3-S73WLWeSKkaudDcc988YZay9ae4OIm1A@public.gmane.org>
2015-05-13  2:08                 ` Gregory Meno
     [not found]                   ` <D363AFC4-84C1-42EB-A5F0-792A88B685C4-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-13  9:23                     ` Steffen W Sørensen
     [not found]                       ` <666078A3-277A-4442-B2E4-29774B8470BB-BUHhN+a2lJ4@public.gmane.org>
2015-05-13 10:55                         ` Steffen W Sørensen
2015-05-13  1:39           ` Bruce McFarland
