* Feeding pool utilization data to time series for trending
       [not found] ` <c0ef7893-2bd2-d11f-b008-db566145ce84@redhat.com>
@ 2016-12-20  4:19   ` Shubhendu Tripathi
  2016-12-20  8:59     ` Wido den Hollander
                       ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Shubhendu Tripathi @ 2016-12-20  4:19 UTC (permalink / raw)
  To: ceph-devel

Hi Team,

Our team is currently working on a project named "tendrl" [1][2].
Tendrl is a management platform for software-defined storage systems 
such as Ceph and Gluster.

As part of tendrl we are integrating with collectd to collect 
performance data, and we maintain the time series data in Graphite.

I have a question at this juncture regarding pool utilization data.
Our current plan is to parse the output of the "ceph df" command to 
extract pool utilization data and push it to Graphite using collectd.
The question is: what would the performance impact of running the 
"ceph df" command on Ceph nodes be? I feel we should run this command 
only on the mon nodes.

I wanted to verify with the team here whether this approach is in the 
right direction and, if so, what the ideal frequency of running 
"ceph df" from collectd would be.

This is just our point of view, and we are open to any other robust 
solution.

Kindly guide us.

Regards,
Shubhendu Tripathi

[1] http://tendrl.org/
[2] https://github.com/tendrl/


* Re: Feeding pool utilization data to time series for trending
  2016-12-20  4:19   ` Feeding pool utilization data to time series for trending Shubhendu Tripathi
@ 2016-12-20  8:59     ` Wido den Hollander
  2016-12-20 10:17       ` Shubhendu Tripathi
  2016-12-20 10:22     ` John Spray
  2016-12-20 11:43     ` Ruben Kerkhof
  2 siblings, 1 reply; 5+ messages in thread
From: Wido den Hollander @ 2016-12-20  8:59 UTC (permalink / raw)
  To: ceph-devel, Shubhendu Tripathi


> Op 20 december 2016 om 5:19 schreef Shubhendu Tripathi <shtripat@redhat.com>:
> 
> 
> Hi Team,
> 
> Our team is currently working on project named "tendrl" [1][2].
> Tendrl is a management platform for software defined storage system like 
> Ceph, Gluster etc.
> 
> As part of tendrl we are integrating with collectd to collect 
> performance data and we maintain the time series data in graphite.
> 
> I have a question at this juncture regarding pool utilization data.
> As our thought process goes, we think of using output from command "ceph 
> df" and parse it to figure out pool utilization data and push it to 
> graphite using collectd.
> The question here is what is/would be performance impact of running 
> "ceph df" command on ceph nodes. We should be running this command only 
> on mon nodes I feel.
> 

Correct, that data comes from the MONs and is not that heavy.

> Wanted to verify with the team here if this thought process is in right 
> direction and if so what ideally should be frequency of running the 
> command "ceph df" from collectd.
> 

Running the command means forking a process every time, and also going through the whole cephx authentication and client <-> MON handshake on each run.

> This is just from our point of view and we are open to any other 
> foolproof solution (if any).

The best approach would be to keep an open connection to a MON and run the 'df' command directly against the MONs in a loop.

I wrote something like that in Python a while ago for 'ceph status': https://gist.github.com/wido/ac53ae01d661dd57f4a8

cmd = {"prefix":"status", "format":"json"}

If you change that to:

cmd = {"prefix":"df", "format":"json"}

You ask the MON for 'df' and get JSON back. Run that in a loop that sleeps 1 to 5 seconds between iterations and you will have near-real-time information.
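A minimal sketch of that loop, assuming the standard `rados` Python bindings that ship with Ceph; the per-pool field names (`pools`, `stats`, `bytes_used`) are taken from typical `ceph df --format=json` output and may differ between releases:

```python
import json
import time

try:
    import rados  # Ceph's librados Python bindings (needed only for polling)
except ImportError:
    rados = None


def pool_bytes_used(df_json):
    """Reduce 'df' JSON output to {pool_name: bytes_used}."""
    df = json.loads(df_json)
    return {p["name"]: p["stats"]["bytes_used"] for p in df.get("pools", [])}


def poll_df(conffile="/etc/ceph/ceph.conf", interval=5):
    """Keep one open MON connection and poll 'df' in a loop,
    avoiding a fork and cephx handshake on every sample."""
    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    cmd = json.dumps({"prefix": "df", "format": "json"})
    try:
        while True:
            ret, outbuf, outs = cluster.mon_command(cmd, b"")
            if ret == 0:
                print(pool_bytes_used(outbuf))
            time.sleep(interval)
    finally:
        cluster.shutdown()
```

The `pool_bytes_used` helper is where you would hand the values off to collectd or Graphite instead of printing them.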

Wido



* Re: Feeding pool utilization data to time series for trending
  2016-12-20  8:59     ` Wido den Hollander
@ 2016-12-20 10:17       ` Shubhendu Tripathi
  0 siblings, 0 replies; 5+ messages in thread
From: Shubhendu Tripathi @ 2016-12-20 10:17 UTC (permalink / raw)
  To: Wido den Hollander, ceph-devel

On 12/20/2016 02:29 PM, Wido den Hollander wrote:
>> Op 20 december 2016 om 5:19 schreef Shubhendu Tripathi <shtripat@redhat.com>:
>>
>>
>> Hi Team,
>>
>> Our team is currently working on project named "tendrl" [1][2].
>> Tendrl is a management platform for software defined storage system like
>> Ceph, Gluster etc.
>>
>> As part of tendrl we are integrating with collectd to collect
>> performance data and we maintain the time series data in graphite.
>>
>> I have a question at this juncture regarding pool utilization data.
>> As our thought process goes, we think of using output from command "ceph
>> df" and parse it to figure out pool utilization data and push it to
>> graphite using collectd.
>> The question here is what is/would be performance impact of running
>> "ceph df" command on ceph nodes. We should be running this command only
>> on mon nodes I feel.
>>
> Correct, that data comes from the MONs and is not that heavy.
>
>> Wanted to verify with the team here if this thought process is in right
>> direction and if so what ideally should be frequency of running the
>> command "ceph df" from collectd.
>>
> Running the command means forking a process every time and also going through the whole cephx authentication and client <> MON process.
>
>> This is just from our point of view and we are open to any other
>> foolproof solution (if any).
> The best would be to keep a open connection to a MON and run the 'df' command directly on the MONs in a loop.
>
> I wrote something like that in Python a while ago for 'ceph status': https://gist.github.com/wido/ac53ae01d661dd57f4a8
>
> cmd = {"prefix":"status", "format":"json"}
>
> If you change that to:
>
> cmd = {"prefix":"df", "format":"json"}
>
> You ask the MON for 'df' and get back a JSON. Run that in a loop where you sleep every 1 or 5 seconds and you should have very real-time information.

Thanks Wido, this is certainly a good suggestion.
I will try to incorporate it and test it out.

Regards,
Shubhendu




* Re: Feeding pool utilization data to time series for trending
  2016-12-20  4:19   ` Feeding pool utilization data to time series for trending Shubhendu Tripathi
  2016-12-20  8:59     ` Wido den Hollander
@ 2016-12-20 10:22     ` John Spray
  2016-12-20 11:43     ` Ruben Kerkhof
  2 siblings, 0 replies; 5+ messages in thread
From: John Spray @ 2016-12-20 10:22 UTC (permalink / raw)
  To: Shubhendu Tripathi; +Cc: ceph-devel

On Tue, Dec 20, 2016 at 4:19 AM, Shubhendu Tripathi <shtripat@redhat.com> wrote:
> Hi Team,
>
> Our team is currently working on project named "tendrl" [1][2].
> Tendrl is a management platform for software defined storage system like
> Ceph, Gluster etc.
>
> As part of tendrl we are integrating with collectd to collect performance
> data and we maintain the time series data in graphite.
>
> I have a question at this juncture regarding pool utilization data.
> As our thought process goes, we think of using output from command "ceph df"
> and parse it to figure out pool utilization data and push it to graphite
> using collectd.

From Kraken onwards it's simpler to write a ceph-mgr module that sends
the data straight to your time series store -- mgr plugins have access
to in-memory copies of this data without having to do any polling.

If you need to be backwards compatible with Jewel, you can do what the
existing stats collector does:
https://github.com/ceph/Diamond/blob/calamari/src/collectors/ceph/ceph.py

Note that the existing collector sends commands to the mons using
librados: no need to literally wrap the command line.

> The question here is what is/would be performance impact of running "ceph
> df" command on ceph nodes. We should be running this command only on mon
> nodes I feel.

The Ceph command line connects to mons over the network -- you can run
it from wherever you like.  However, you only actually need to run it
from one place: it's redundant to collect the same data from multiple
nodes.  The existing stats collector runs on all mons, but decides
whether to collect the cluster-wide data (such as free space) based on
whether its local mon is the leader or not (see
_collect_cluster_stats).
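The leader check described above can be sketched as follows; the `quorum_leader_name` field name is assumed from typical `ceph quorum_status` JSON output:

```python
import json


def is_leader(quorum_status_json, local_mon_name):
    """True if the local mon is the quorum leader, i.e. this node should
    be the one collecting cluster-wide stats such as free space."""
    qs = json.loads(quorum_status_json)
    return qs.get("quorum_leader_name") == local_mon_name
```

Each collector instance runs this against its local mon and only the one that sees itself as leader reports the cluster-wide data, so the same numbers are never collected twice.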

This problem goes away with ceph-mgr because it takes care of
instantiating your plugin in just one place.

> Wanted to verify with the team here if this thought process is in right
> direction and if so what ideally should be frequency of running the command
> "ceph df" from collectd.

No more frequently than the data is collected internally from OSDs
(osd_mon_report_interval_min, which is 5 seconds by default).

John



* Re: Feeding pool utilization data to time series for trending
  2016-12-20  4:19   ` Feeding pool utilization data to time series for trending Shubhendu Tripathi
  2016-12-20  8:59     ` Wido den Hollander
  2016-12-20 10:22     ` John Spray
@ 2016-12-20 11:43     ` Ruben Kerkhof
  2 siblings, 0 replies; 5+ messages in thread
From: Ruben Kerkhof @ 2016-12-20 11:43 UTC (permalink / raw)
  To: Shubhendu Tripathi; +Cc: ceph-devel

On Tue, Dec 20, 2016 at 5:19 AM, Shubhendu Tripathi <shtripat@redhat.com> wrote:
> Hi Team,
>
> Our team is currently working on project named "tendrl" [1][2].
> Tendrl is a management platform for software defined storage system like
> Ceph, Gluster etc.
>
> As part of tendrl we are integrating with collectd to collect performance
> data and we maintain the time series data in graphite.
>
> I have a question at this juncture regarding pool utilization data.
> As our thought process goes, we think of using output from command "ceph df"
> and parse it to figure out pool utilization data and push it to graphite
> using collectd.
> The question here is what is/would be performance impact of running "ceph
> df" command on ceph nodes. We should be running this command only on mon
> nodes I feel.
>
> Wanted to verify with the team here if this thought process is in right
> direction and if so what ideally should be frequency of running the command
> "ceph df" from collectd.

Have you looked at collectd's Ceph plugin?
(https://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_ceph)
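For reference, a minimal configuration for that plugin might look like the following; the daemon name and socket path are examples only, since the plugin reads statistics from each Ceph daemon's local admin socket:

```
LoadPlugin ceph
<Plugin ceph>
  <Daemon "mon.a">
    SocketPath "/var/run/ceph/ceph-mon.a.asok"
  </Daemon>
</Plugin>
```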

Kind regards,

Ruben Kerkhof

