* config on mons
@ 2017-11-10 15:30 Sage Weil
  2017-11-13  0:27 ` Patrick Donnelly
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Sage Weil @ 2017-11-10 15:30 UTC (permalink / raw)
  To: ceph-devel

I've started on this long-discussed feature!  I haven't gotten too far but 
you can see what's there so far at

	https://github.com/ceph/ceph/pull/18856

The first thing perhaps is to finalize what flexibility we want to 
support.  I've a quick summary at

	http://pad.ceph.com/p/config

Namely,

 config/option = value               # like [global]
 config/$type/option = value         # like [mon]
 config/$type.$id/option = value     # like [mon.a]

There are two new things:

 config/.../class:$classname/option = value

For OSDs, this matches the device_class.  So you can do something like

 config/osd/class:ssd/bluestore_cache_size = 10737418240  # 10GB, woohoo!

You can also match the crush location:

 config/.../$crushtype:$crushvalue/option = value

e.g.,

 config/osd/rack:foo/debug_osd = 10    # hunting some issue

This obviously makes sense for OSDs.  We can also make it make sense for 
non-OSDs, since everybody (clients and daemons) has a concept of a 
crush_location: a set of key/value pairs like "host=foo rack=bar" 
which match the CRUSH hierarchy.  In this case, my plan is to make the 
initial mon authentication step include the hostname of the host you're 
connecting from and then extract the rest of the location by looking 
up the host in the CRUSH map.
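
To make the matching idea concrete, here is a sketch of how a stored key's mask segments might be checked against a daemon's identity. This is illustrative only; the function, field names, and entity structure are made up for the example and are not Ceph code:

```python
# Sketch: match the mask segments of a config key (the parts between
# "config/" and the option name) against a daemon's identity.
# All names here are hypothetical, for illustration only.

def mask_matches(mask_parts, entity):
    """mask_parts: e.g. ['osd', 'class:ssd'] for config/osd/class:ssd/option.
    entity: dict with 'type', 'id', 'device_class', 'crush_location'."""
    for part in mask_parts:
        if ":" in part:
            key, value = part.split(":", 1)
            if key == "class":              # device class match (OSDs)
                if entity.get("device_class") != value:
                    return False
            elif entity.get("crush_location", {}).get(key) != value:
                return False                # CRUSH bucket match, e.g. rack:foo
        elif "." in part:                   # $type.$id, e.g. mon.a
            if part != "%s.%s" % (entity["type"], entity["id"]):
                return False
        elif part != entity["type"]:        # bare $type, e.g. osd
            return False
    return True                             # empty mask = global, matches all

osd = {"type": "osd", "id": "3", "device_class": "ssd",
       "crush_location": {"host": "foo", "rack": "bar"}}
print(mask_matches(["osd", "class:ssd"], osd))  # True
print(mask_matches(["osd", "rack:baz"], osd))   # False
```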

The precedence for these is described here:

	https://github.com/ceph/ceph/pull/18856/commits/5abbd0c9e279022f185787238d21eabbbe28e336#diff-344645b5339d494e1839ff1fcaa5cb7dR15


Lots of other thorny issues to consider.  For example:

- What about monitor configs?  If they store their config in paxos, and you 
set an option that breaks paxos, how can you change/fix it?  For the 
moment I'm just ignoring the mons.

- What about ceph.conf?  My thought here is to mark which options are 
legal for bootstrap (i.e., used during the initial connection to mon to 
authenticate and fetch config), and warn on anything other than that in 
ceph.conf.  But what about after you connect?  Do these options get reset 
to default?

- Bootstrapping/upgrade: So far my best idea is to make the client share 
its config with the mon on startup, and the first time a given daemon 
connects the mon will use that to populate its config database.  
Thereafter it will be ignored.
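
The first-connect rule could look something like this on the mon side (a hypothetical sketch of the behavior described above, not the actual implementation; class and method names are made up):

```python
# Sketch of the upgrade path: the first time a daemon connects, its
# reported options seed the config database; later reports are ignored.
# Hypothetical names, for illustration only.

class ConfigDB:
    def __init__(self):
        self.options = {}      # (who, option) -> value
        self.seen = set()      # daemons whose config was already assimilated

    def handle_report(self, who, reported_options):
        if who in self.seen:
            return False       # already populated once; ignore thereafter
        for opt, val in reported_options.items():
            self.options[(who, opt)] = val
        self.seen.add(who)
        return True

db = ConfigDB()
db.handle_report("osd.0", {"debug_osd": "5"})   # first contact: stored
db.handle_report("osd.0", {"debug_osd": "20"})  # later contact: ignored
print(db.options[("osd.0", "debug_osd")])       # 5
```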

- OSD startup: lots of stuff happens before we authenticate.  I think 
there will be a new initial step to fetch config, then do all that work, 
then start up for real.  And a new option to bypass mon configuration 
to avoid that (and for old school folks who don't want centralized 
configs... e.g. mon_config = false and everything works as before).
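
The opt-out might look like this in ceph.conf (the option name follows the proposal above and is not final):

```ini
# Hypothetical opt-out: keep reading all config from local files,
# as before, and never fetch config from the mons.
[global]
    mon_config = false
```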

Feedback welcome!
sage

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: config on mons
  2017-11-10 15:30 config on mons Sage Weil
@ 2017-11-13  0:27 ` Patrick Donnelly
  2017-11-13  1:43 ` Yehuda Sadeh-Weinraub
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: Patrick Donnelly @ 2017-11-13  0:27 UTC (permalink / raw)
  To: Sage Weil; +Cc: Ceph Development

On Sat, Nov 11, 2017 at 2:30 AM, Sage Weil <sweil@redhat.com> wrote:
> - What about ceph.conf?  My thought here is to mark which options are
> legal for bootstrap (i.e., used during the initial connection to mon to
> authenticate and fetch config), and warn on anything other than that in
> ceph.conf.  But what about after you connect?  Do these options get reset
> to default?

Perhaps we should deprecate ceph.conf and mandate an alternate
bootstrap file for connecting to the mons.
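
A bootstrap-only file along those lines might contain nothing beyond what is needed to reach and authenticate with the mons; the values below are made up for illustration:

```ini
# Hypothetical minimal bootstrap file: enough to find the mons and
# authenticate; everything else would come from the config database.
[global]
    mon_host = 10.0.0.1, 10.0.0.2, 10.0.0.3
    keyring = /etc/ceph/ceph.client.admin.keyring
```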

-- 
Patrick Donnelly


* Re: config on mons
  2017-11-10 15:30 config on mons Sage Weil
  2017-11-13  0:27 ` Patrick Donnelly
@ 2017-11-13  1:43 ` Yehuda Sadeh-Weinraub
  2017-11-13  9:57   ` John Spray
  2017-11-13  4:30 ` Christian Wuerdig
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2017-11-13  1:43 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On Fri, Nov 10, 2017 at 7:30 AM, Sage Weil <sweil@redhat.com> wrote:
> I've started on this long-discussed feature!  I haven't gotten too far but
> you can see what's there so far at
>
>         https://github.com/ceph/ceph/pull/18856
>
> The first thing perhaps is to finalize what flexibility we want to
> support.  I've a quick summary at
>
>         http://pad.ceph.com/p/config
>
> Namely,
>
>  config/option = value               # like [global]
>  config/$type/option = value         # like [mon]
>  config/$type.$id/option = value     # like [mon.a]
>
> There are two new things:
>
>  config/.../class:$classname/option = value
>
> For OSDs, this matches the device_class.  So you can do something like
>
>  config/osd/class:ssd/bluestore_cache_size = 10737418240  # 10GB, woohoo!
>
> You can also match the crush location:
>
>  config/.../$crushtype:$crushvalue/option = value
>
> e.g.,
>
>  config/osd/rack:foo/debug_osd = 10    # hunting some issue
>
> This obviously makes sense for OSDs.  We can also make it make sense for
> non-OSDs, since everybody (clients and daemons) has a concept of a
> crush_location: a set of key/value pairs like "host=foo rack=bar"
> which match the CRUSH hierarchy.  In this case, my plan is to make the
> initial mon authentication step include the hostname of the host you're
> connecting from and then extract the rest of the location by looking
> up the host in the CRUSH map.
>
> The precedence for these is described here:
>
>         https://github.com/ceph/ceph/pull/18856/commits/5abbd0c9e279022f185787238d21eabbbe28e336#diff-344645b5339d494e1839ff1fcaa5cb7dR15
>
>
> Lots of other thorny issues to consider.  For example:
>
> - What about monitor configs?  If they store their config in paxos, and you
> set an option that breaks paxos, how can you change/fix it?  For the
> moment I'm just ignoring the mons.
>
> - What about ceph.conf?  My thought here is to mark which options are
> legal for bootstrap (i.e., used during the initial connection to mon to
> authenticate and fetch config), and warn on anything other than that in
> ceph.conf.  But what about after you connect?  Do these options get reset
> to default?

And also, what about configurables passed in as args? I think that any
local configuration (ceph.conf, args) should still be used to
override config from the mons. We can add warnings to flag
when such configuration exists, but we should not lose it.
>
> - Bootstrapping/upgrade: So far my best idea is to make the client share
> its config with the mon on startup, and the first time a given daemon
> connects the mon will use that to populate its config database.
> Thereafter it will be ignored.

Maybe there could be some flag that we could pass in to select the
client's behavior. By default it'd take the mon config if that exists.
Other options would be to take the local config, or overlay local over
mon.
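
The three modes could resolve like this (a sketch; the mode names and function are hypothetical, chosen just to illustrate the proposal):

```python
# Sketch of the three startup modes: "mon" takes the mon config when it
# exists, "local" uses only local files, "overlay" lays local settings
# over the mon-provided ones. Hypothetical names, for illustration only.

def resolve_config(mode, mon_cfg, local_cfg):
    if mode == "local" or (mode == "mon" and not mon_cfg):
        return dict(local_cfg)         # local only, or mon absent
    if mode == "mon":
        return dict(mon_cfg)           # mon config wins outright
    if mode == "overlay":
        merged = dict(mon_cfg)
        merged.update(local_cfg)       # local wins on conflicts
        return merged
    raise ValueError("unknown mode: %s" % mode)

mon = {"debug_osd": "5", "osd_max_backfills": "1"}
local = {"debug_osd": "20"}
print(resolve_config("mon", mon, local))      # mon values only
print(resolve_config("overlay", mon, local))  # local debug_osd over mon
```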

Yehuda

>
> - OSD startup: lots of stuff happens before we authenticate.  I think
> there will be a new initial step to fetch config, then do all that work,
> then start up for real.  And a new option to bypass mon configuration
> to avoid that (and for old school folks who don't want centralized
> configs... e.g. mon_config = false and everything works as before).
>
> Feedback welcome!
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: config on mons
  2017-11-10 15:30 config on mons Sage Weil
  2017-11-13  0:27 ` Patrick Donnelly
  2017-11-13  1:43 ` Yehuda Sadeh-Weinraub
@ 2017-11-13  4:30 ` Christian Wuerdig
  2017-11-13 10:00   ` John Spray
  2017-11-13 13:23 ` John Spray
  2017-11-14 22:21 ` Sage Weil
  4 siblings, 1 reply; 26+ messages in thread
From: Christian Wuerdig @ 2017-11-13  4:30 UTC (permalink / raw)
  To: Sage Weil; +Cc: Ceph Development

Hm, have you guys considered utilizing existing key-value stores like
Consul or etcd for this instead of rolling your own? Not sure about
the details of etcd, but the Consul API is quite nice, supporting long
polling and transactions. The obvious downside is that you depend on
a separate service, but that can also be an advantage.

On Sat, Nov 11, 2017 at 4:30 AM, Sage Weil <sweil@redhat.com> wrote:
> I've started on this long-discussed feature!  I haven't gotten too far but
> you can see what's there so far at
>
>         https://github.com/ceph/ceph/pull/18856
>
> The first thing perhaps is to finalize what flexibility we want to
> support.  I've a quick summary at
>
>         http://pad.ceph.com/p/config
>
> Namely,
>
>  config/option = value               # like [global]
>  config/$type/option = value         # like [mon]
>  config/$type.$id/option = value     # like [mon.a]
>
> There are two new things:
>
>  config/.../class:$classname/option = value
>
> For OSDs, this matches the device_class.  So you can do something like
>
>  config/osd/class:ssd/bluestore_cache_size = 10737418240  # 10GB, woohoo!
>
> You can also match the crush location:
>
>  config/.../$crushtype:$crushvalue/option = value
>
> e.g.,
>
>  config/osd/rack:foo/debug_osd = 10    # hunting some issue
>
> This obviously makes sense for OSDs.  We can also make it make sense for
> non-OSDs, since everybody (clients and daemons) has a concept of a
> crush_location: a set of key/value pairs like "host=foo rack=bar"
> which match the CRUSH hierarchy.  In this case, my plan is to make the
> initial mon authentication step include the hostname of the host you're
> connecting from and then extract the rest of the location by looking
> up the host in the CRUSH map.
>
> The precedence for these is described here:
>
>         https://github.com/ceph/ceph/pull/18856/commits/5abbd0c9e279022f185787238d21eabbbe28e336#diff-344645b5339d494e1839ff1fcaa5cb7dR15
>
>
> Lots of other thorny issues to consider.  For example:
>
> - What about monitor configs?  If they store their config in paxos, and you
> set an option that breaks paxos, how can you change/fix it?  For the
> moment I'm just ignoring the mons.
>
> - What about ceph.conf?  My thought here is to mark which options are
> legal for bootstrap (i.e., used during the initial connection to mon to
> authenticate and fetch config), and warn on anything other than that in
> ceph.conf.  But what about after you connect?  Do these options get reset
> to default?
>
> - Bootstrapping/upgrade: So far my best idea is to make the client share
> its config with the mon on startup, and the first time a given daemon
> connects the mon will use that to populate its config database.
> Thereafter it will be ignored.
>
> - OSD startup: lots of stuff happens before we authenticate.  I think
> there will be a new initial step to fetch config, then do all that work,
> then start up for real.  And a new option to bypass mon configuration
> to avoid that (and for old school folks who don't want centralized
> configs... e.g. mon_config = false and everything works as before).
>
> Feedback welcome!
> sage


* Re: config on mons
  2017-11-13  1:43 ` Yehuda Sadeh-Weinraub
@ 2017-11-13  9:57   ` John Spray
  2017-11-13 16:29     ` Yehuda Sadeh-Weinraub
  0 siblings, 1 reply; 26+ messages in thread
From: John Spray @ 2017-11-13  9:57 UTC (permalink / raw)
  To: Yehuda Sadeh-Weinraub; +Cc: Sage Weil, ceph-devel

On Mon, Nov 13, 2017 at 1:43 AM, Yehuda Sadeh-Weinraub
<ysadehwe@redhat.com> wrote:
>> - What about ceph.conf?  My thought here is to mark which options are
>> legal for bootstrap (i.e., used during the initial connection to mon to
>> authenticate and fetch config), and warn on anything other than that in
>> ceph.conf.  But what about after you connect?  Do these options get reset
>> to default?
>
> And also, what about configurables passed in as args? I think that any
> local configuration (ceph.conf, args) should still be used to
> override config from the mons. We can add warnings to flag
> when such configuration exists, but we should not lose it.

This comes up whenever we talk about the centralized config so I guess
it never quite got put to rest...

The big downside to letting services selectively ignore the mons is
that anyone building a user interface is pretty much screwed if they
want to show the current value of a config setting, unless we make the
MonClient config subscription a two-way thing that enables services to
*set* their own config (from their ceph.conf) in addition to receiving
it.

John

>> - Bootstrapping/upgrade: So far my best idea is to make the client share
>> its config with the mon on startup, and the first time a given daemon
>> connects the mon will use that to populate its config database.
>> Thereafter it will be ignored.
>
> Maybe there could be some flag that we could pass in to select the
> client's behavior. By default it'd take the mon config if that exists.
> Other options would be to take the local config, or overlay local over
> mon.
>
> Yehuda
>
>>
>> - OSD startup: lots of stuff happens before we authenticate.  I think
>> there will be a new initial step to fetch config, then do all that work,
>> then start up for real.  And a new option to bypass mon configuration
>> to avoid that (and for old school folks who don't want centralized
>> configs... e.g. mon_config = false and everything works as before).
>>
>> Feedback welcome!
>> sage


* Re: config on mons
  2017-11-13  4:30 ` Christian Wuerdig
@ 2017-11-13 10:00   ` John Spray
  2017-11-13 16:45     ` Mark Nelson
  0 siblings, 1 reply; 26+ messages in thread
From: John Spray @ 2017-11-13 10:00 UTC (permalink / raw)
  To: Christian Wuerdig; +Cc: Sage Weil, Ceph Development

On Mon, Nov 13, 2017 at 4:30 AM, Christian Wuerdig
<christian.wuerdig@gmail.com> wrote:
> Hm, have you guys considered utilizing existing key-value stores like
> Consul or etcd for this instead of rolling your own? Not sure about
> the details of etcd, but the Consul API is quite nice, supporting long
> polling and transactions. The obvious downside is that you depend on
> a separate service, but that can also be an advantage.

When it comes to putting and getting values, Consul and etcd don't
really offer much that the ceph mons don't already do.  As you say, it
would be a new dependency, but more importantly it would also be a
whole new network comms path with its own authentication, ports, etc.

This is one of those situations where using something off the shelf is
actually way more effort (for developers and for users) than building
it in.

John

>
> On Sat, Nov 11, 2017 at 4:30 AM, Sage Weil <sweil@redhat.com> wrote:
>> I've started on this long-discussed feature!  I haven't gotten too far but
>> you can see what's there so far at
>>
>>         https://github.com/ceph/ceph/pull/18856
>>
>> The first thing perhaps is to finalize what flexibility we want to
>> support.  I've a quick summary at
>>
>>         http://pad.ceph.com/p/config
>>
>> Namely,
>>
>>  config/option = value               # like [global]
>>  config/$type/option = value         # like [mon]
>>  config/$type.$id/option = value     # like [mon.a]
>>
>> There are two new things:
>>
>>  config/.../class:$classname/option = value
>>
>> For OSDs, this matches the device_class.  So you can do something like
>>
>>  config/osd/class:ssd/bluestore_cache_size = 10737418240  # 10GB, woohoo!
>>
>> You can also match the crush location:
>>
>>  config/.../$crushtype:$crushvalue/option = value
>>
>> e.g.,
>>
>>  config/osd/rack:foo/debug_osd = 10    # hunting some issue
>>
>> This obviously makes sense for OSDs.  We can also make it make sense for
>> non-OSDs, since everybody (clients and daemons) has a concept of a
>> crush_location: a set of key/value pairs like "host=foo rack=bar"
>> which match the CRUSH hierarchy.  In this case, my plan is to make the
>> initial mon authentication step include the hostname of the host you're
>> connecting from and then extract the rest of the location by looking
>> up the host in the CRUSH map.
>>
>> The precedence for these is described here:
>>
>>         https://github.com/ceph/ceph/pull/18856/commits/5abbd0c9e279022f185787238d21eabbbe28e336#diff-344645b5339d494e1839ff1fcaa5cb7dR15
>>
>>
>> Lots of other thorny issues to consider.  For example:
>>
>> - What about monitor configs?  If they store their config in paxos, and you
>> set an option that breaks paxos, how can you change/fix it?  For the
>> moment I'm just ignoring the mons.
>>
>> - What about ceph.conf?  My thought here is to mark which options are
>> legal for bootstrap (i.e., used during the initial connection to mon to
>> authenticate and fetch config), and warn on anything other than that in
>> ceph.conf.  But what about after you connect?  Do these options get reset
>> to default?
>>
>> - Bootstrapping/upgrade: So far my best idea is to make the client share
>> its config with the mon on startup, and the first time a given daemon
>> connects the mon will use that to populate its config database.
>> Thereafter it will be ignored.
>>
>> - OSD startup: lots of stuff happens before we authenticate.  I think
>> there will be a new initial step to fetch config, then do all that work,
>> then start up for real.  And a new option to bypass mon configuration
>> to avoid that (and for old school folks who don't want centralized
>> configs... e.g. mon_config = false and everything works as before).
>>
>> Feedback welcome!
>> sage


* Re: config on mons
  2017-11-10 15:30 config on mons Sage Weil
                   ` (2 preceding siblings ...)
  2017-11-13  4:30 ` Christian Wuerdig
@ 2017-11-13 13:23 ` John Spray
  2017-11-14 22:21 ` Sage Weil
  4 siblings, 0 replies; 26+ messages in thread
From: John Spray @ 2017-11-13 13:23 UTC (permalink / raw)
  To: Sage Weil; +Cc: Ceph Development

On Fri, Nov 10, 2017 at 3:30 PM, Sage Weil <sweil@redhat.com> wrote:
> I've started on this long-discussed feature!  I haven't gotten too far but
> you can see what's there so far at
>
>         https://github.com/ceph/ceph/pull/18856

Woohoo!

> The first thing perhaps is to finalize what flexibility we want to
> support.  I've a quick summary at
>
>         http://pad.ceph.com/p/config
>
> Namely,
>
>  config/option = value               # like [global]
>  config/$type/option = value         # like [mon]
>  config/$type.$id/option = value     # like [mon.a]
>
> There are two new things:
>
>  config/.../class:$classname/option = value
>
> For OSDs, this matches the device_class.  So you can do something like
>
>  config/osd/class:ssd/bluestore_cache_size = 10737418240  # 10GB, woohoo!
>
> You can also match the crush location:
>
>  config/.../$crushtype:$crushvalue/option = value
>
> e.g.,
>
>  config/osd/rack:foo/debug_osd = 10    # hunting some issue
>
> This obviously makes sense for OSDs.  We can also make it make sense for
> non-OSDs, since everybody (clients and daemons) has a concept of a
> crush_location: a set of key/value pairs like "host=foo rack=bar"
> which match the CRUSH hierarchy.  In this case, my plan is to make the
> initial mon authentication step include the hostname of the host you're
> connecting from and then extract the rest of the location by looking
> up the host in the CRUSH map.
>
> The precedence for these is described here:
>
>         https://github.com/ceph/ceph/pull/18856/commits/5abbd0c9e279022f185787238d21eabbbe28e336#diff-344645b5339d494e1839ff1fcaa5cb7dR15
>
>
> Lots of other thorny issues to consider.  For example:
>
> - What about monitor configs?  If they store their config in paxos, and you
> set an option that breaks paxos, how can you change/fix it?  For the
> moment I'm just ignoring the mons.
>
> - What about ceph.conf?  My thought here is to mark which options are
> legal for bootstrap (i.e., used during the initial connection to mon to
> authenticate and fetch config), and warn on anything other than that in
> ceph.conf.  But what about after you connect?  Do these options get reset
> to default?

I can't immediately think of an example of something that would be
needed for bootstrap but would also be sane to change later.  In
general, if something is needed for bootstrap I would imagine that the
local setting would be authoritative, but I suspect (because you're
bringing it up) that there are cases where this doesn't apply...

> - Bootstrapping/upgrade: So far my best idea is to make the client share
> its config with the mon on startup, and the first time a given daemon
> connects the mon will use that to populate its config database.
> Thereafter it will be ignored.

I hate upgrades :-)

This sounds like a sane thing to do.  We certainly have to do
*something* or we'll have really nasty issues like we had when the
crush_location_hook setting changed names.

For our two-major-versions commitment from Mimic onwards, I guess that
means we leave this mechanism in for the N release too, and then
eventually remove it in the O release.

BTW I learned about the "cockeyed squid" aka "strawberry squid"
yesterday, so I think that's a strong candidate for the S name when we
get there, just thinking ahead :-)

> - OSD startup: lots of stuff happens before we authenticate.  I think
> there will be a new initial step to fetch config, then do all that work,
> then start up for real.  And a new option to bypass mon configuration
> to avoid that (and for old school folks who don't want centralized
> configs... e.g. mon_config = false and everything works as before).

I know I'm on the opinionated end of the spectrum here, but I'm not
quite convinced we should leave in a "mon_config = false" option.  If
we continue to let people use the local file interface through this
version, then it's at least another three versions before we can
ultimately remove it, whereas if we disable it now (apart from the
initial load on upgrade) then we are starting the clock for ultimately
removing that plumbing.

We do need to support the upgrade path, but if we enable it to
optionally run with local config on an ongoing basis then we might be
undermining the motivations for building the centralized
infrastructure (the confidence/certainty that the value set is a
validated thing, and that it is really what is in effect).

John

>
> Feedback welcome!
> sage


* Re: config on mons
  2017-11-13  9:57   ` John Spray
@ 2017-11-13 16:29     ` Yehuda Sadeh-Weinraub
  0 siblings, 0 replies; 26+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2017-11-13 16:29 UTC (permalink / raw)
  To: John Spray; +Cc: Sage Weil, ceph-devel

On Mon, Nov 13, 2017 at 1:57 AM, John Spray <jspray@redhat.com> wrote:
> On Mon, Nov 13, 2017 at 1:43 AM, Yehuda Sadeh-Weinraub
> <ysadehwe@redhat.com> wrote:
>>> - What about ceph.conf?  My thought here is to mark which options are
>>> legal for bootstrap (i.e., used during the initial connection to mon to
>>> authenticate and fetch config), and warn on anything other than that in
>>> ceph.conf.  But what about after you connect?  Do these options get reset
>>> to default?
>>
>> And also, what about configurables passed in as args? I think that any
>> local configuration (ceph.conf, args) should still be used to
>> override config from the mons. We can add warnings to flag
>> when such configuration exists, but we should not lose it.
>
> This comes up whenever we talk about the centralized config so I guess
> it never quite got put to rest...
>
> The big downside to letting services selectively ignore the mons is
> that anyone building a user interface is pretty much screwed if they
> want to show the current value of a config setting, unless we make the
> MonClient config subscription a two-way thing that enables services to
> *set* their own config (from their ceph.conf) in addition to receiving
> it.

More like have them report it, not necessarily set it. We should have that.
I don't like the idea of not being able to modify config without going
to the monitors. There might be cases where going through the monitors
is impractical or cumbersome, or where you just want to try different
values quickly, e.g. in a test or dev environment. And given that we
need the local config subsystem working anyway, since we need it for
bootstrapping, and not everything we run even connects (or should
connect) to the cluster, I think keeping it would also make logical
sense.

Yehuda

>
> John
>
>>> - Bootstrapping/upgrade: So far my best idea is to make the client share
>>> its config with the mon on startup, and the first time a given daemon
>>> connects the mon will use that to populate its config database.
>>> Thereafter it will be ignored.
>>
>> Maybe there could be some flag that we could pass in to select the
>> client's behavior. By default it'd take the mon config if that exists.
>> Other options would be to take the local config, or overlay local over
>> mon.
>>
>> Yehuda
>>
>>>
>>> - OSD startup: lots of stuff happens before we authenticate.  I think
>>> there will be a new initial step to fetch config, then do all that work,
>>> then start up for real.  And a new option to bypass mon configuration
>>> to avoid that (and for old school folks who don't want centralized
>>> configs... e.g. mon_config = false and everything works as before).
>>>
>>> Feedback welcome!
>>> sage


* Re: config on mons
  2017-11-13 10:00   ` John Spray
@ 2017-11-13 16:45     ` Mark Nelson
  2017-11-13 18:20       ` Kyle Bader
  0 siblings, 1 reply; 26+ messages in thread
From: Mark Nelson @ 2017-11-13 16:45 UTC (permalink / raw)
  To: John Spray, Christian Wuerdig; +Cc: Sage Weil, Ceph Development



On 11/13/2017 04:00 AM, John Spray wrote:
> On Mon, Nov 13, 2017 at 4:30 AM, Christian Wuerdig
> <christian.wuerdig@gmail.com> wrote:
>> Hm, have you guys considered utilizing existing key-value stores like
>> Consul or etcd for this instead of rolling your own? Not sure about
>> the details of etcd, but the Consul API is quite nice, supporting long
>> polling and transactions. The obvious downside is that you depend on
>> a separate service, but that can also be an advantage.
>
> When it comes to putting and getting values, Consul and etcd don't
> really offer much that the ceph mons don't already do.  As you say, it
> would be a new dependency, but more importantly it would also be a
> whole new network comms path with its own authentication, ports, etc.
>
> This is one of those situations where using something off the shelf is
> actually way more effort (for developers and for users) than building
> it in.
>
> John
>

I don't disagree, but I can imagine there are a number of sysadmins
who want Ceph to play nice with whatever they are currently using for
everything else they maintain.  Whatever we do here, we probably want
to be mindful of that (i.e., I'd argue that deprecating ceph.conf might
not be well liked by folks who are happy with their current setup).

Mark

>>
>> On Sat, Nov 11, 2017 at 4:30 AM, Sage Weil <sweil@redhat.com> wrote:
>>> I've started on this long-discussed feature!  I haven't gotten too far but
>>> you can see what's there so far at
>>>
>>>         https://github.com/ceph/ceph/pull/18856
>>>
>>> The first thing perhaps is to finalize what flexibility we want to
>>> support.  I've a quick summary at
>>>
>>>         http://pad.ceph.com/p/config
>>>
>>> Namely,
>>>
>>>  config/option = value               # like [global]
>>>  config/$type/option = value         # like [mon]
>>>  config/$type.$id/option = value     # like [mon.a]
>>>
>>> There are two new things:
>>>
>>>  config/.../class:$classname/option = value
>>>
>>> For OSDs, this matches the device_class.  So you can do something like
>>>
>>>  config/osd/class:ssd/bluestore_cache_size = 10737418240  # 10GB, woohoo!
>>>
>>> You can also match the crush location:
>>>
>>>  config/.../$crushtype:$crushvalue/option = value
>>>
>>> e.g.,
>>>
>>>  config/osd/rack:foo/debug_osd = 10    # hunting some issue
>>>
>>> This obviously makes sense for OSDs.  We can also make it make sense for
>>> non-OSDs, since everybody (clients and daemons) has a concept of a
>>> crush_location: a set of key/value pairs like "host=foo rack=bar"
>>> which match the CRUSH hierarchy.  In this case, my plan is to make the
>>> initial mon authentication step include the hostname of the host you're
>>> connecting from and then extract the rest of the location by looking
>>> up the host in the CRUSH map.
>>>
>>> The precedence for these is described here:
>>>
>>>         https://github.com/ceph/ceph/pull/18856/commits/5abbd0c9e279022f185787238d21eabbbe28e336#diff-344645b5339d494e1839ff1fcaa5cb7dR15
>>>
>>>
>>> Lots of other thorny issues to consider.  For example:
>>>
>>> - What about monitor configs?  If they store their config in paxos, and you
>>> set an option that breaks paxos, how can you change/fix it?  For the
>>> moment I'm just ignoring the mons.
>>>
>>> - What about ceph.conf?  My thought here is to mark which options are
>>> legal for bootstrap (i.e., used during the initial connection to mon to
>>> authenticate and fetch config), and warn on anything other than that in
>>> ceph.conf.  But what about after you connect?  Do these options get reset
>>> to default?
>>>
>>> - Bootstrapping/upgrade: So far my best idea is to make the client share
>>> its config with the mon on startup, and the first time a given daemon
>>> connects the mon will use that to populate its config database.
>>> Thereafter it will be ignored.
>>>
>>> - OSD startup: lots of stuff happens before we authenticate.  I think
>>> there will be a new initial step to fetch config, then do all that work,
>>> then start up for real.  And a new option to bypass mon configuration
>>> to avoid that (and for old school folks who don't want centralized
>>> configs... e.g. mon_config = false and everything works as before).
>>>
>>> Feedback welcome!
>>> sage
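
The mask matching described in this message can be sketched as a small
precedence resolver. This is only an illustration, not Sage's
implementation: the exact precedence order is defined in the linked
commit, and the order assumed below (global, then daemon type, then
crush location, then device class, then daemon name) is a guess.

```python
# Sketch of config-key masking. The precedence order used here is an
# assumption for illustration; the real order is in the linked commit.

def resolve(db, daemon_type, daemon_id, device_class=None, crush_loc=None):
    """Return {option: value} as seen by one daemon; the most specific mask wins."""
    crush_loc = crush_loc or {}
    masks = [
        "config",                                   # like [global]
        f"config/{daemon_type}",                    # like [osd]
    ]
    # crush location matches, e.g. config/osd/rack:foo
    for bucket_type, bucket in crush_loc.items():
        masks.append(f"config/{daemon_type}/{bucket_type}:{bucket}")
    if device_class:                                # e.g. config/osd/class:ssd
        masks.append(f"config/{daemon_type}/class:{device_class}")
    masks.append(f"config/{daemon_type}.{daemon_id}")  # like [osd.1]

    result = {}
    for mask in masks:                              # later (more specific) wins
        for opt, val in db.get(mask, {}).items():
            result[opt] = val
    return result

db = {
    "config": {"debug_osd": "0"},
    "config/osd/rack:foo": {"debug_osd": "10"},     # hunting some issue
    "config/osd.1": {"debug_osd": "20"},
}
print(resolve(db, "osd", "1", crush_loc={"rack": "foo"}))   # daemon mask wins
print(resolve(db, "osd", "2", crush_loc={"rack": "foo"}))   # rack mask applies
```
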
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: config on mons
  2017-11-13 16:45     ` Mark Nelson
@ 2017-11-13 18:20       ` Kyle Bader
  2017-11-13 18:40         ` John Spray
  0 siblings, 1 reply; 26+ messages in thread
From: Kyle Bader @ 2017-11-13 18:20 UTC (permalink / raw)
  To: Mark Nelson; +Cc: John Spray, Christian Wuerdig, Sage Weil, Ceph Development

Configuration files are often driven by configuration management, with
previous versions stored in some kind of version control system. We
should make sure that if configuration moves to the monitors, you still
have some form of history and rollback capability. It might be worth
modeling it on network switch configuration shells, à la Junos:

* change configuration
* require an explicit commit for configuration changes
* ability to roll back N configuration changes
* ability to diff two configuration versions

That way an admin can figure out when the last configuration change
was, what changed, and roll back if necessary.
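
A toy sketch of that commit/rollback/diff workflow (the class and
method names here are hypothetical, not an actual Ceph or Junos
interface):

```python
# Toy versioned config store illustrating the commit/rollback/diff
# workflow suggested above. Names and semantics are invented.

class ConfigStore:
    def __init__(self):
        self.history = [{}]          # committed versions, oldest first
        self.pending = {}            # uncommitted edits

    def set(self, key, value):
        self.pending[key] = value    # staged until commit

    def commit(self):
        new = dict(self.history[-1])
        new.update(self.pending)
        self.history.append(new)
        self.pending = {}

    def rollback(self, n=1):
        """Discard the last n committed versions (n >= 1)."""
        if n >= 1:
            del self.history[-n:]
        if not self.history:
            self.history = [{}]

    def diff(self, a, b):
        """Changed keys between version indices a and b."""
        va, vb = self.history[a], self.history[b]
        return {k: (va.get(k), vb.get(k))
                for k in set(va) | set(vb) if va.get(k) != vb.get(k)}

store = ConfigStore()
store.set("osd_max_backfills", "1"); store.commit()
store.set("osd_max_backfills", "8"); store.commit()
print(store.diff(1, 2))      # what changed in the last commit
store.rollback()             # back to the previous version
print(store.history[-1])
```
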


On Mon, Nov 13, 2017 at 8:45 AM, Mark Nelson <mnelson@redhat.com> wrote:
>
>
> On 11/13/2017 04:00 AM, John Spray wrote:
>>
>> On Mon, Nov 13, 2017 at 4:30 AM, Christian Wuerdig
>> <christian.wuerdig@gmail.com> wrote:
>>>
>>> Hm, have you guys considered utilizing existing key-value stores like
>>> Consul or etcd for this instead of rolling your own? Not sure about
>>> the details of etcd but the Consul API is quite nice, supports long
>>> polling and transactional support. Obvious downside is that you depend
>>> on a separate service but that can also be an advantage.
>>
>>
>> When it comes to putting and getting values, Consul and etcd don't
>> really offer much that the ceph mons don't already do.  As you say, it
>> would be a new dependency, but more importantly it would also be a
>> whole new network comms path with its own authentication, ports, etc.
>>
>> This is one of those situations where using something off the shelf is
>> actually way more effort (for developers and for users) than building
>> it in.
>>
>> John
>>
>
> I don't disagree, but I could imagine there are a number of sysadmins that
> want Ceph to play nice with whatever they are currently using for everything
> else they maintain.  Whatever we do here, we probably want to be mindful (ie
> I'd argue that deprecating ceph.conf might not be well liked by folks that
> are happy with their current setup).
>
> Mark


* Re: config on mons
  2017-11-13 18:20       ` Kyle Bader
@ 2017-11-13 18:40         ` John Spray
  2017-11-14 10:18           ` Piotr Dałek
  0 siblings, 1 reply; 26+ messages in thread
From: John Spray @ 2017-11-13 18:40 UTC (permalink / raw)
  To: Kyle Bader; +Cc: Mark Nelson, Christian Wuerdig, Sage Weil, Ceph Development

On Mon, Nov 13, 2017 at 6:20 PM, Kyle Bader <kyle.bader@gmail.com> wrote:
> Configuration files are often driven by configuration management, with
> previous versions stored in some kind of version control systems. We
> should make sure that if configuration moves to the monitors that you
> have some form of history and rollback capabilities. It might be worth
> modeling it similar to network switch configuration shells, a la
> Junos.
>
> * change configuration
> * require commit configuration change
> * ability to rollback N configuration changes
> * ability to diff to configuration versions
>
> That way an admin can figure out when the last configuration change
> was, what changed, and rollback if necessary.

That is an extremely good idea.

As a minimal thing, it should be pretty straightforward to implement a
snapshot/rollback.

I imagine many users today are not so disciplined as to version
control their configs, but this is a good opportunity to push that as
the norm by building it in.

John



* Re: config on mons
  2017-11-13 18:40         ` John Spray
@ 2017-11-14 10:18           ` Piotr Dałek
  2017-11-14 11:36             ` John Spray
  2017-11-14 13:48             ` Mark Nelson
  0 siblings, 2 replies; 26+ messages in thread
From: Piotr Dałek @ 2017-11-14 10:18 UTC (permalink / raw)
  To: John Spray, Kyle Bader
  Cc: Mark Nelson, Christian Wuerdig, Sage Weil, Ceph Development

On 17-11-13 07:40 PM, John Spray wrote:
> On Mon, Nov 13, 2017 at 6:20 PM, Kyle Bader <kyle.bader@gmail.com> wrote:
>> Configuration files are often driven by configuration management, with
>> previous versions stored in some kind of version control systems. We
>> should make sure that if configuration moves to the monitors that you
>> have some form of history and rollback capabilities. It might be worth
>> modeling it similar to network switch configuration shells, a la
>> Junos.
>>
>> * change configuration
>> * require commit configuration change
>> * ability to rollback N configuration changes
>> * ability to diff to configuration versions
>>
>> That way an admin can figure out when the last configuration change
>> was, what changed, and rollback if necessary.
> 
> That is an extremely good idea.
> 
> As a minimal thing, it should be pretty straightforward to implement a
> snapshot/rollback.

https://thedailywtf.com/articles/The_Complicator_0x27_s_Gloves

> I imagine many users today are not so disciplined as to version
> control their configs, but this is a good opportunity to push that as
> the norm by building it in.

Using Ceph at any decent scale actually requires at least Puppet or a
similar tool. I wouldn't add unnecessary complexity to already complex
code just for novice users, who are going to have a hard time using
Ceph anyway once a disk breaks and needs to be replaced, or when
performance goes to hell because users are free to create and remove
snapshots every 5 minutes.
And I can already imagine clusters breaking down once the config
database/history breaks for whatever reason, including early
implementation bugs.

Distributing configs through the mon isn't a bad idea by itself: I can
imagine changes to runtime-changeable settings being propagated to OSDs
without the extra step of actually injecting them and without the need
for a restart. But for anything else, there are already good tools, and
I see no value in trying to mimic them.

-- 
Piotr Dałek
piotr.dalek@corp.ovh.com
https://www.ovh.com/us/


* Re: config on mons
  2017-11-14 10:18           ` Piotr Dałek
@ 2017-11-14 11:36             ` John Spray
  2017-11-14 13:58               ` Piotr Dałek
  2017-11-14 14:33               ` Mark Nelson
  2017-11-14 13:48             ` Mark Nelson
  1 sibling, 2 replies; 26+ messages in thread
From: John Spray @ 2017-11-14 11:36 UTC (permalink / raw)
  To: Piotr Dałek
  Cc: Kyle Bader, Mark Nelson, Christian Wuerdig, Sage Weil, Ceph Development

On Tue, Nov 14, 2017 at 10:18 AM, Piotr Dałek <piotr.dalek@corp.ovh.com> wrote:
> On 17-11-13 07:40 PM, John Spray wrote:
>>
>> On Mon, Nov 13, 2017 at 6:20 PM, Kyle Bader <kyle.bader@gmail.com> wrote:
>>>
>>> Configuration files are often driven by configuration management, with
>>> previous versions stored in some kind of version control systems. We
>>> should make sure that if configuration moves to the monitors that you
>>> have some form of history and rollback capabilities. It might be worth
>>> modeling it similar to network switch configuration shells, a la
>>> Junos.
>>>
>>> * change configuration
>>> * require commit configuration change
>>> * ability to rollback N configuration changes
>>> * ability to diff to configuration versions
>>>
>>> That way an admin can figure out when the last configuration change
>>> was, what changed, and rollback if necessary.
>>
>>
>> That is an extremely good idea.
>>
>> As a minimal thing, it should be pretty straightforward to implement a
>> snapshot/rollback.
>
>
> https://thedailywtf.com/articles/The_Complicator_0x27_s_Gloves
>
>> I imagine many users today are not so disciplined as to version
>> control their configs, but this is a good opportunity to push that as
>> the norm by building it in.
>
>
> Using Ceph on any decent scale actually requires one to use at least Puppet
> or similar tool, I wouldn't add any unnecessary complexity to already
> complex code just because of novice users that are going to have hard time
> using Ceph anyway once a disk breaks and needs to be replaced, or when
> performance goes to hell because users are free to create and remove
> snapshots every 5 minutes.

All of the experienced users were novice users once -- making Ceph
work well for those people is worthwhile.  It's not easy to build
things that are easy enough for a newcomer but also powerful enough
for the general case, but it is worth doing.

When we have to trade internal complexity vs. complexity at
interfaces, it's generally better to keep the interfaces simple.
Currently a Ceph cluster with 1000 OSDs has 1000 places to input the
configuration, and no one place that a person can ask "what is setting
X on my OSDs?".  Even when they look at a ceph.conf file, they can't
be sure that those are really the values in use (has the service
restarted since the file was updated?) or that they will ever be (are
they invalid values that Ceph will reject on load?).

The "dump a text file in /etc" interface looks simple on the face of
it, but is actually quite complex when you look to automate a Ceph
cluster from a central user interface, or build more intelligence into
Ceph for avoiding dangerous configurations.  It's also painful for
non-expert users who are required to type precisely correct syntax
into that text file.

> And I can already imagine clusters breaking down once config
> database/history breaks for whatever reason, including early implementation
> bugs.
>
> Distributing configs through mon isn't bad idea by itself, I can imagine
> having changes to runtime-changeable settings propagated to OSDs without the
> need for extra step (actually injecting them) and without the need for
> restart, but for anything else, there are already good tools and I see no
> value in trying to mimic them.

Remember that the goal here is not to just invent an alternative way
of distributing ceph.conf.  Even Puppet is overkill for that!  The
goal is to change the way configuration is defined in Ceph, so that
there is a central point of truth for how the cluster is configured,
which will enable us to create a user experience that is more robust,
and an interface that enables building better interactive tooling on
top of Ceph.

When it comes to using something like Puppet as that central point of
truth, there are two major problems with that:
 - If someone wants to write a GUI, they would need to integrate with
your Puppet, someone else's Chef, someone else's Ansible, etc -- a lot
of work, and in many cases the interfaces for doing it don't even
exist (believe me, I've tried writing dashboards that drove Puppet in
the past).
 - If Ceph wants to validate configuration options, and say "No, that
setting is no good" when someone tries to change something, we can't,
because we're not hooked in to Puppet at the point that the user is
changing the setting.
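
That second point, set-time validation, might look roughly like this
(the schema and option bounds below are invented for illustration;
real option metadata would live in Ceph itself):

```python
# Sketch of validating an option at the moment a user tries to set it,
# rather than after a daemon fails to parse its config. The schema is
# invented for this example.

SCHEMA = {
    "osd_max_backfills":    {"type": int, "min": 1},
    "mon_data_avail_warn":  {"type": int, "min": 0, "max": 100},
    "bluestore_cache_size": {"type": int, "min": 0},
}

def validate(option, raw_value):
    """Return the parsed value, or raise ValueError with a reason."""
    meta = SCHEMA.get(option)
    if meta is None:
        raise ValueError(f"unknown option '{option}'")
    try:
        value = meta["type"](raw_value)
    except (TypeError, ValueError):
        raise ValueError(f"'{raw_value}' is not a valid {meta['type'].__name__}")
    if "min" in meta and value < meta["min"]:
        raise ValueError(f"{option} must be >= {meta['min']}")
    if "max" in meta and value > meta["max"]:
        raise ValueError(f"{option} must be <= {meta['max']}")
    return value

print(validate("osd_max_backfills", "4"))        # accepted
try:
    validate("mon_data_avail_warn", "150")       # rejected at set time
except ValueError as e:
    print(e)
```
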

The ultimate benefit to you is that by making Ceph easier to use, we
grow our community, and we grow the population of people who want to
invest in Ceph (all of it, not just the new user friendly bits).

John


* Re: config on mons
  2017-11-14 10:18           ` Piotr Dałek
  2017-11-14 11:36             ` John Spray
@ 2017-11-14 13:48             ` Mark Nelson
  1 sibling, 0 replies; 26+ messages in thread
From: Mark Nelson @ 2017-11-14 13:48 UTC (permalink / raw)
  To: Piotr Dałek, John Spray, Kyle Bader
  Cc: Christian Wuerdig, Sage Weil, Ceph Development



On 11/14/2017 04:18 AM, Piotr Dałek wrote:
>
> https://thedailywtf.com/articles/The_Complicator_0x27_s_Gloves
>
>> I imagine many users today are not so disciplined as to version
>> control their configs, but this is a good opportunity to push that as
>> the norm by building it in.
>
> Using Ceph on any decent scale actually requires one to use at least
> Puppet or similar tool, I wouldn't add any unnecessary complexity to
> already complex code just because of novice users that are going to have
> hard time using Ceph anyway once a disk breaks and needs to be replaced,
> or when performance goes to hell because users are free to create and
> remove snapshots every 5 minutes.
> And I can already imagine clusters breaking down once config
> database/history breaks for whatever reason, including early
> implementation bugs.
>
> Distributing configs through mon isn't bad idea by itself, I can imagine
> having changes to runtime-changeable settings propagated to OSDs without
> the need for extra step (actually injecting them) and without the need
> for restart, but for anything else, there are already good tools and I
> see no value in trying to mimic them.
>

Those were more or less my thoughts as well.

Mark


* Re: config on mons
  2017-11-14 11:36             ` John Spray
@ 2017-11-14 13:58               ` Piotr Dałek
  2017-11-14 16:24                 ` Sage Weil
  2017-11-14 14:33               ` Mark Nelson
  1 sibling, 1 reply; 26+ messages in thread
From: Piotr Dałek @ 2017-11-14 13:58 UTC (permalink / raw)
  To: John Spray
  Cc: Kyle Bader, Mark Nelson, Christian Wuerdig, Sage Weil, Ceph Development

On 17-11-14 12:36 PM, John Spray wrote:
> On Tue, Nov 14, 2017 at 10:18 AM, Piotr Dałek <piotr.dalek@corp.ovh.com> wrote:
>> On 17-11-13 07:40 PM, John Spray wrote:
>>> I imagine many users today are not so disciplined as to version
>>> control their configs, but this is a good opportunity to push that as
>>> the norm by building it in.
>>
>> Using Ceph on any decent scale actually requires one to use at least Puppet
>> or similar tool, I wouldn't add any unnecessary complexity to already
>> complex code just because of novice users that are going to have hard time
>> using Ceph anyway once a disk breaks and needs to be replaced, or when
>> performance goes to hell because users are free to create and remove
>> snapshots every 5 minutes.
> 
> All of the experienced users were novice users once -- making Ceph
> work well for those people is worthwhile.  It's not easy to build
> things that are easy enough for a newcomer but also powerful enough
> for the general case, but it is worth doing.
> 
> When we have to trade internal complexity vs. complexity at
> interfaces, it's generally better to keep the interfaces simple.
> Currently a Ceph cluster with 1000 OSDs has 1000 places to input the
> configuration, and no one place that a person can ask "what is setting
> X on my OSDs?".  Even when they look at a ceph.conf file, they can't
> be sure that those are really the values in use (has the service
> restarted since the file was updated?) or that they will ever be (are
> they invalid values that Ceph will reject on load?).

Well, at least I understand now why my config diff patch 
(https://github.com/ceph/ceph/pull/18586) is not interesting to reviewers. ;)

> The "dump a text file in /etc" interface looks simple on the face of
> it, but is actually quite complex when you look to automate a Ceph
> cluster from a central user interface, or build more intelligence into
> Ceph for avoiding dangerous configurations.  It's also painful for
> non-expert users who are required to type precisely correct syntax
> into that text file.

Anybody who is overwhelmed by an ini-style config file should be kept
100 km away from any datacentre and have their shell access revoked
ASAP. Using Ceph (or any kind of SDN-like software) in production
requires a few years as an admin under one's belt, and trying to change
that will only cause more grief and frustration for future new users.
Ceph already has a feature designed with network switch configuration
newbies in mind -- it shouldn't.

>> And I can already imagine clusters breaking down once config
>> database/history breaks for whatever reason, including early implementation
>> bugs.
>>
>> Distributing configs through mon isn't bad idea by itself, I can imagine
>> having changes to runtime-changeable settings propagated to OSDs without the
>> need for extra step (actually injecting them) and without the need for
>> restart, but for anything else, there are already good tools and I see no
>> value in trying to mimic them.
> 
> Remember that the goal here is not to just invent an alternative way
> of distributing ceph.conf.  Even Puppet is overkill for that!  The

Of course! This bash oneliner:

for i in {1..4}; do scp ~/cluster_dev/ceph.conf ceph@node$i:/etc/ceph/; done;

is more than enough to distribute a config from some central place to 4
nodes. But nobody sane does this, because anything that's not automated
is prone to human error. So no, using Puppet is not overkill: it does
its job and is a familiar way of doing this for many more users than
just Ceph users. Still, I'm not opposed to distributing Ceph configs
through the mons, because that's actually useful.

> goal is to change the way configuration is defined in Ceph, so that
> there is a central point of truth for how the cluster is configured,
> which will enable us to create a user experience that is more robust,
> and an interface that enables building better interactive tooling on
> top of Ceph.
> 
> When it comes to using something like Puppet as that central point of
> truth, there are two major problems with that:
>   - If someone wants to write a GUI, they would need to integrate with
> your Puppet, someone else's Chef, someone else's Ansible, etc -- a lot
> of work, and in many cases the interfaces for doing it don't even
> exist (believe me, I've tried writing dashboards that drove Puppet in
> the past).

Usually when someone needs a GUI to deploy Ceph cluster, they need to deploy 
much more than just Ceph. They need to configure network interfaces, 
storage, kernel, monitoring, etc. etc., so they need to deal with Puppet or 
Chef (or anything) anyway.

>   - If Ceph wants to validate configuration options, and say "No, that
> setting is no good" when someone tries to change something, we can't,
> because we're not hooked in to Puppet at the point that the user is
> changing the setting.

One can use the ceph-conf tool to validate config syntax, because it
shares the config code with the daemons. And with recent config code
changes, it's even possible to validate values. But it's true that
validating configuration before pushing it to production is tricky at
the moment.

> The ultimate benefit to you is that by making Ceph easier to use, we
> grow our community, and we grow the population of people who want to
> invest in Ceph (all of it, not just the new user friendly bits).

True, more users mean a tighter bug sieve. But attracting users with
ease of use is one thing; reinventing wheels AND asking existing users
to adopt the reinvented wheels at the same time is another. Remember
that I was responding to the idea of a built-in mini-git/mini-svn.

-- 
Piotr Dałek
piotr.dalek@corp.ovh.com
https://www.ovh.com/us/


* Re: config on mons
  2017-11-14 11:36             ` John Spray
  2017-11-14 13:58               ` Piotr Dałek
@ 2017-11-14 14:33               ` Mark Nelson
  2017-11-14 16:37                 ` Kyle Bader
  1 sibling, 1 reply; 26+ messages in thread
From: Mark Nelson @ 2017-11-14 14:33 UTC (permalink / raw)
  To: John Spray, Piotr Dałek; +Cc: Kyle Bader, Sage Weil, Ceph Development



On 11/14/2017 05:36 AM, John Spray wrote:
>
> All of the experienced users were novice users once -- making Ceph
> work well for those people is worthwhile.  It's not easy to build
> things that are easy enough for a newcomer but also powerful enough
> for the general case, but it is worth doing.
>
> When we have to trade internal complexity vs. complexity at
> interfaces, it's generally better to keep the interfaces simple.

I've seen too many examples both in our code and in other projects where 
that kind of internal complexity leaks out and makes things worse.  If 
we want to reduce complexity we need to actually reduce it, not just 
move it.  I'm not against having the mon centrally report state.  I 
think it's a great idea.  Management I'm not sold on; see below.

> Currently a Ceph cluster with 1000 OSDs has 1000 places to input the
> configuration, and no one place that a person can ask "what is setting
> X on my OSDs?".  Even when they look at a ceph.conf file, they can't
> be sure that those are really the values in use (has the service
> restarted since the file was updated?) or that they will ever be (are
> they invalid values that Ceph will reject on load?).

How many folks with 1000 OSD clusters are manually managing 
configuration files, though?  These are the kinds of customers that have 
dedicated Linux/storage administrators on staff who have preferences 
about how they do things.  When I was managing distributed storage 
systems few things angered me more than trying to deal with each storage 
vendor's custom management systems.  I was never particularly concerned 
with being able to manage (user-facing) state on my own.  What I was 
*very* concerned about was bug-ridden code that got shipped out at the 
last minute so the vendor could checkbox a feature that I couldn't 
easily work around.  One particular vendor's Lustre HA 
management/STONITH solution comes to mind.  They weren't the only 
one, though; we had a variety of interesting and horrific issues with 
other non-lustre storage too.  The worst cases were the ones where the 
solution could have been fast/easy but we had to go through all kinds of 
gymnastics to circumvent the vendor's bad behavior.

> The "dump a text file in /etc" interface looks simple on the face of
> it, but is actually quite complex when you look to automate a Ceph
> cluster from a central user interface, or build more intelligence into
> Ceph for avoiding dangerous configurations.  It's also painful for
> non-expert users who are required to type precisely correct syntax
> into that text file.
>

This feels a bit like a proxy war over whether we are designing a 
storage appliance or a traditional linux style service.  I'm not 
convinced we can do both well at the same time.  If we want both, maybe 
we need to think about each as independent products with their own 
goals/management/code/etc.

>> And I can already imagine clusters breaking down once config
>> database/history breaks for whatever reason, including early implementation
>> bugs.
>>
>> Distributing configs through mon isn't bad idea by itself, I can imagine
>> having changes to runtime-changeable settings propagated to OSDs without the
>> need for extra step (actually injecting them) and without the need for
>> restart, but for anything else, there are already good tools and I see no
>> value in trying to mimic them.
>
> Remember that the goal here is not to just invent an alternative way
> of distributing ceph.conf.  Even Puppet is overkill for that!  The
> goal is to change the way configuration is defined in Ceph, so that
> there is a central point of truth for how the cluster is configured,
> which will enable us to create a user experience that is more robust,
> and an interface that enables building better interactive tooling on
> top of Ceph.
>
> When it comes to using something like Puppet as that central point of
> truth, there are two major problems with that:
>  - If someone wants to write a GUI, they would need to integrate with
> your Puppet, someone else's Chef, someone else's Ansible, etc -- a lot
> of work, and in many cases the interfaces for doing it don't even
> exist (believe me, I've tried writing dashboards that drove Puppet in
> the past).
>  - If Ceph wants to validate configuration options, and say "No, that
> setting is no good" when someone tries to change something, we can't,
> because we're not hooked in to Puppet at the point that the user is
> changing the setting.
>
> The ultimate benefit to you is that by making Ceph easier to use, we
> grow our community, and we grow the population of people who want to
> invest in Ceph (all of it, not just the new user friendly bits).
>
> John
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: config on mons
  2017-11-14 13:58               ` Piotr Dałek
@ 2017-11-14 16:24                 ` Sage Weil
  0 siblings, 0 replies; 26+ messages in thread
From: Sage Weil @ 2017-11-14 16:24 UTC (permalink / raw)
  To: Piotr Dałek
  Cc: John Spray, Kyle Bader, Mark Nelson, Christian Wuerdig, Ceph Development

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3227 bytes --]

On Tue, 14 Nov 2017, Piotr Dałek wrote:
> On 17-11-14 12:36 PM, John Spray wrote:
> > On Tue, Nov 14, 2017 at 10:18 AM, Piotr Dałek <piotr.dalek@corp.ovh.com>
> > wrote:
> > > On 17-11-13 07:40 PM, John Spray wrote:
> > > > I imagine many users today are not so disciplined as to version
> > > > control their configs, but this is a good opportunity to push that as
> > > > the norm by building it in.
> > > 
> > > Using Ceph on any decent scale actually requires one to use at least 
> > > Puppet or similar tool, I wouldn't add any unnecessary complexity to 
> > > already complex code just because of novice users that are going to 
> > > have hard time using Ceph anyway once a disk breaks and needs to be 
> > > replaced, or when performance goes to hell because users are free to 
> > > create and remove snapshots every 5 minutes.
> > 
> > All of the experienced users were novice users once -- making Ceph
> > work well for those people is worthwhile.  It's not easy to build
> > things that are easy enough for a newcomer but also powerful enough
> > for the general case, but it is worth doing.
> > 
> > When we have to trade internal complexity vs. complexity at
> > interfaces, it's generally better to keep the interfaces simple.
> > Currently a Ceph cluster with 1000 OSDs has 1000 places to input the
> > configuration, and no one place that a person can ask "what is setting
> > X on my OSDs?".  Even when they look at a ceph.conf file, they can't
> > be sure that those are really the values in use (has the service
> > restarted since the file was updated?) or that they will ever be (are
> > they invalid values that Ceph will reject on load?).
> 
> Well, at least I understand now why my config diff patch
> (https://github.com/ceph/ceph/pull/18586) is not interesting to reviewers. ;)

Oh, I hadn't seen this.  (Don't read too much into a lack of reviews or
comments!) I like the diff local command, not sure about the file one.
I'm in the midst of rewriting a bunch of this code in a preparatory
cleanup for the other config changes... I'll post a PR with just the      
cleanup portions shortly.

I think the big question is whether we can go all-in on mon configs or
whether we need to maintain a traditional ceph.conf option as well.  I'm
of two minds here.  I think it's pretty straightforward to get the
transparency/reporting that John is after by making daemons report running
config but not necessarily pull mon configs... and we probably want/need
that anyway to allow e.g. 'ceph daemon <name> config ...' overrides, and
for the upgrade path.  I'm just worried about an ever-expanding menu of
options.

We can't simply throw up our hands and say this is out of scope and that
administrators need to handle it on their own.  I had that
attitude for a long time, and as a result Ceph has a reputation for being
hard to install, hard to configure, and hard to manage.  This limits     
adoption, makes it easy for users to make mistakes, and hurts the project.
We can do better.  Lots of other projects and storage systems *do* do     
better.

Let's just be smart about what we implement so that we solve the
usability and transparency problems without hobbling power users.

sage

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: config on mons
  2017-11-14 14:33               ` Mark Nelson
@ 2017-11-14 16:37                 ` Kyle Bader
  2017-11-14 18:01                   ` Alfredo Deza
  0 siblings, 1 reply; 26+ messages in thread
From: Kyle Bader @ 2017-11-14 16:37 UTC (permalink / raw)
  To: Mark Nelson; +Cc: John Spray, Piotr Dałek, Sage Weil, Ceph Development

>>> Using Ceph on any decent scale actually requires one to use at least
>>> Puppet
>>> or similar tool, I wouldn't add any unnecessary complexity to already
>>> complex code just because of novice users that are going to have hard
>>> time
>>> using Ceph anyway once a disk breaks and needs to be replaced, or when
>>> performance goes to hell because users are free to create and remove
>>> snapshots every 5 minutes.

This discussion reminds me of a heated debate we had in the early days
about whether configuration management should handle the provisioning
of OSDs, or whether Ceph should have a tool to hide the ugliness. At
the time, I was staunchly on the configuration management side. We
used this horribleness to create new OSDs:

https://github.com/dreamhost-cookbooks/ceph/blob/de5929eb45bda50785aa01181b281e25af0d1785/recipes/osd.rb

Today we have ceph-disk (and soon ceph-volume)!  I still have my
reservations about the level of udev wizardry, which is tricky to
debug, but it generally works and makes the experience better for the
vast majority of operators regardless.  This led to a single,
configuration-management-agnostic method of preparing OSDs.  Nowadays all
the Ansible/Chef/Puppet thingers use ceph-disk.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: config on mons
  2017-11-14 16:37                 ` Kyle Bader
@ 2017-11-14 18:01                   ` Alfredo Deza
  0 siblings, 0 replies; 26+ messages in thread
From: Alfredo Deza @ 2017-11-14 18:01 UTC (permalink / raw)
  To: Kyle Bader
  Cc: Mark Nelson, John Spray, Piotr Dałek, Sage Weil, Ceph Development

On Tue, Nov 14, 2017 at 11:37 AM, Kyle Bader <kyle.bader@gmail.com> wrote:
>>>> Using Ceph on any decent scale actually requires one to use at least
>>>> Puppet
>>>> or similar tool, I wouldn't add any unnecessary complexity to already
>>>> complex code just because of novice users that are going to have hard
>>>> time
>>>> using Ceph anyway once a disk breaks and needs to be replaced, or when
>>>> performance goes to hell because users are free to create and remove
>>>> snapshots every 5 minutes.
>
> This discussion reminds me of a heated debate we had in the early days
> about whether configuration management should handle the provisioning
> of OSDs, or whether Ceph should have a tool to hide the ugliness. At
> the time, I was staunchly on the configuration management side. We
> used this horribleness to create new OSDs:
>
> https://github.com/dreamhost-cookbooks/ceph/blob/de5929eb45bda50785aa01181b281e25af0d1785/recipes/osd.rb
>
> Today we have ceph-disk (and soon to be ceph-volume)! I still have my
> reservations about the level of udev wizardry, which is tricky to
> debug, but it generally works and makes the experience better the vast
> majority of operators, reglardless . This lead to a single method to
> prepare OSDs that was configuration management agnostic. Nowadays all
> the Ansible/Chef/Puppet thingers use ceph-disk.

There is a separation needed here, I think, between tools and
abstractions that work at a local (or almost always local) level.
ceph-disk and ceph-volume are good examples of this, since they operate
in the context of local devices.  At some point during the process they
do need to inform the cluster of their operations, though (e.g., there
is a new OSD; register it as part of the cluster).

So configuration that applies to a localized service like ceph-volume
(or ceph-disk) belongs on the server itself.  That is why
Puppet/Chef/Ansible provide abstractions for these tools: those systems
are cluster-aware and just delegate to the localized services.

For configuration it might make sense to have this sort of duality,
where some settings belong to the server while the rest belong to the
cluster.

I'm not sure that everything must (exclusively) be file-based or live on
the monitors.  If we are trying to make sure users are happy with these
changes, let's accept/embrace views like Piotr's, which doesn't mean
throwing away ideas about where we should be headed.

> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: config on mons
  2017-11-10 15:30 config on mons Sage Weil
                   ` (3 preceding siblings ...)
  2017-11-13 13:23 ` John Spray
@ 2017-11-14 22:21 ` Sage Weil
  2017-11-14 23:45   ` John Spray
  4 siblings, 1 reply; 26+ messages in thread
From: Sage Weil @ 2017-11-14 22:21 UTC (permalink / raw)
  To: ceph-devel

I've updated the pad at

	http://pad.ceph.com/p/config

After thinking about this a bit more, I think we may need to abandon the 
idea of a pure ceph.conf-less world.  Lots of people already have tooling 
around ceph.conf, getting rid of it will be an awkward process (even for a 
one-time upgrade), and I'm not sure we can eliminate it entirely anyway 
since many options affect the bootstrapping phase, authentication, and so 
on.

Instead, I'm currently partial to giving processes a more nuanced view of 
their config based on where the value comes from.  A single option may 
include

1- a default value (compiled in)
2- a value from the mon
3- a value from ceph.conf
4- a value set via command line, 'ceph tell', 'ceph daemon ... config set 
...', etc.

We would always use the highest-priority value on that list.  This means 
that ceph.conf can override the mon, just like a command-line argument 
overrides ceph.conf.
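In pseudocode, that precedence rule might look like this (a sketch with assumed names, not Ceph's actual implementation):

```python
# A sketch of the precedence rule above: each option may carry values
# from several sources, and the highest-priority source present wins.

PRIORITY = ["default", "mon", "file", "override"]  # lowest -> highest

def effective_value(values):
    """values maps source name -> value; any source may be absent."""
    for source in reversed(PRIORITY):
        if source in values:
            return values[source], source
    raise KeyError("option has no value from any source")

# ceph.conf ("file") overrides the mon, just as a CLI / 'ceph tell'
# override ("override") beats ceph.conf:
print(effective_value({"default": "1G", "mon": "4G", "file": "2G"}))
# -> ('2G', 'file')
```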

On the flip side of this, all of these values are also reported to the mgr 
and tracked along with the other daemon state.  So regardless of where 
config values come from, it is all still visible via the CLI, GUI, or 
whatever else.

Further, we can then make the GUI (or CLI or whatever) act on that 
information to, say,

- assimilate ceph.conf values into the mon so that ceph.conf can be 
removed/abbreviated (i.e., the upgrade/transition path to centralized 
config)
- see override values set via cli (i.e., in gui)
- clear override values (i.e., ceph tell <daemon> config rm <name>)
- surface a HEALTH_WARN if a CLI or 'config set' override has been 
set on one or more daemons (so the operator knows the running config is 
not persistent).
- surface a HEALTH_WARN if a mon option is overridden by a daemon's local 
ceph.conf file.
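The last of those warnings could be derived from daemon-reported config along these lines (a sketch with a hypothetical report layout, not mgr code):

```python
# Daemons report each option's per-source values to the mgr, which can
# then flag options where a local ceph.conf value masks the mon's value.

def find_conf_overrides(reported):
    """reported: {daemon_name: {option: {source: value}}}."""
    warnings = []
    for daemon, options in sorted(reported.items()):
        for opt, values in sorted(options.items()):
            mon_val, file_val = values.get("mon"), values.get("file")
            if mon_val is not None and file_val is not None \
                    and mon_val != file_val:
                warnings.append(
                    f"{daemon}: {opt}={file_val!r} from local ceph.conf "
                    f"overrides {mon_val!r} from the mon")
    return warnings

report = {
    "osd.3": {"debug_osd": {"mon": "5", "file": "20"}},  # disparity
    "osd.4": {"debug_osd": {"mon": "5", "file": "5"}},   # in agreement
}
for w in find_conf_overrides(report):
    print("HEALTH_WARN:", w)
```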

Notably, the user can also do nothing and the cluster can continue to 
operate as it always has.  The mgr will still have the new visibility into 
running daemon options, so the GUI experience will still be 
consistent--they just won't be able to change configs centrally (or 
rather, those settings won't have any effect if old ceph.conf's override 
them).

I think Kyle's revision history suggestion is a great one.  I don't have 
any bright ideas about how this should be managed on the mon side yet, but 
I agree that it is an important function and should be baked in from day 
1.
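Kyle's commit/rollback/diff model can be sketched as a versioned key/value store (illustrative only; how the mon would actually persist this is exactly the open question):

```python
# Every commit snapshots the full config, so rollback and diff are cheap.

class VersionedConfig:
    def __init__(self):
        self.history = [{}]            # history[-1] is the current version

    def commit(self, changes):
        """Apply a dict of changes as a new committed version."""
        snap = dict(self.history[-1])
        snap.update(changes)
        self.history.append(snap)

    def rollback(self, n=1):
        """Discard the last n committed versions."""
        del self.history[-n:]

    def diff(self, a, b):
        """Return {key: (value_at_a, value_at_b)} for keys that changed."""
        va, vb = self.history[a], self.history[b]
        return {k: (va.get(k), vb.get(k))
                for k in set(va) | set(vb) if va.get(k) != vb.get(k)}

cfg = VersionedConfig()
cfg.commit({"osd/debug_osd": "10"})
cfg.commit({"osd/debug_osd": "20"})
print(cfg.diff(1, 2))  # {'osd/debug_osd': ('10', '20')}
cfg.rollback()         # back to debug_osd=10
```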

Thoughts?
sage

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: config on mons
  2017-11-14 22:21 ` Sage Weil
@ 2017-11-14 23:45   ` John Spray
  2017-11-15 13:32     ` Sage Weil
  0 siblings, 1 reply; 26+ messages in thread
From: John Spray @ 2017-11-14 23:45 UTC (permalink / raw)
  To: Sage Weil; +Cc: Ceph Development

On Tue, Nov 14, 2017 at 10:21 PM, Sage Weil <sage@newdream.net> wrote:
> I've updated the pad at
>
>         http://pad.ceph.com/p/config
>
> After thinking about this a bit more, I think we may need to abandon the
> idea of a pure ceph.conf-less world.  Lots of people already have tooling
> around ceph.conf, getting rid of it will be an awkward process (even for a
> one-time upgrade), and I'm not sure we can eliminate it entirely anyway
> since many options affect the bootstrapping phase, authentication, and so
> on.
>
> Instead, I'm currently partial to giving processes a more nuanced view of
> their config based on where the value comes from.  A single option may
> include
>
> 1- a default value (compiled in)
> 2- a value from the mon
> 3- a value from ceph.conf
> 4- a value set via command line, 'ceph tell', 'ceph daemon ... config set
> ...', etc.
>
> We would always use the highest-priority value on that list.  This means
> that ceph.conf can override the mon, just like a command-line argument
> overrides ceph.conf.

I think that if there are some folks who just cannot work without
loading local configs onto their nodes, I want to insulate folks
working on user interfaces from having to handle the resulting
complexity.  The folks pushing config files out to their nodes
presumably have their own preferred way of dealing with this stuff, so
they shouldn't miss it from the Ceph UI.

In that spirit, I think that we don't need to have a per-setting
granularity of what is overridden and what isn't: daemons should just
flag whether they are consuming the mon config (default), or whether
they are using local ceph.conf.  That way, folks building UIs can grey
things out at a whole-page level if the cluster is not using
centralized config.  It sacrifices some flexibility for the people who
want to use local conf for some things but central conf for others (do
those people exist?) but I think it's worth it to avoid having a
complicated UI that has to worry about displaying and communicating
the subtle distinctions between those 1/2/3/4 values which might all
be different.

The upshot would be that UI developers could build elements that work
as expected by default for systems that use the central config, but
safely disable themselves on systems where the user has gone their own
way and pushed out local configuration.

> On the flip side of this, all of these values are also reported to the mgr
> and tracked along with the other daemon state.  So regardless of where
> config values come from, it is all still visible via the CLI, GUI, or
> whatever else.
>
> Further, we can then make the GUI (or CLI or whatever) act on that
> information to, say,
>
> - assimilate ceph.conf values into the mon so that ceph.conf can be
> removed/abbreviated (i.e., the upgrade/transition path to centralized
> config)

> - see override values set via cli (i.e., in gui)
> - clear override values (i.e., ceph tell <daemon> config rm <name>)
> - surface a HEALTH_WARN if a CLI or 'config set' override has been
> set on one or more daemons (so the operator knows the running config is
> not persistent).

This comes back to our recurring discussion about whether a
HEALTH_INFO level should exist: I'm increasingly of the opinion that
when we run into things like this, it's nature's way of telling us
that maybe our underlying model is weird (in this case, maybe we
didn't need to have the concept of ephemeral configuration settings in
the system at all).

Maybe ephemeral config changes should be treated the same way I
propose to treat local overrides: the daemon reports just that it has
been overridden, and the GUI goes hands-off and does not attempt to
communicate the story to the user "Well, you see, it's currently set
to xyz until the next restart, at which point it will revert to abc,
that is unless you have a local ceph.conf in which case...".

The ability to rollback config changes seems like it would be the
"right way" to accomplish having some config settings that we set and
then subsequently revert, rather than having the revert happen
implicitly when the daemon next restarts (intentionally or not).

> - surface a HEALTH_WARN if a mon option is overridden by a daemon's local
> ceph.conf file.

Hmm, this makes me a bit confused, as if you're still thinking of the
local ceph.conf being a deprecated/upgrade thing?  If it's really
permitted in general than it wouldn't make sense for it to be a WARN.

John

> Notably, the user can also do nothing and the cluster can continue to
> operate as it always has.  The mgr will still have the new visibility into
> running daemon options, so the GUI experience will still be
> consistent--they just won't be able to change configs centrally (or
> rather, those settings won't have any effect if old ceph.conf's override
> them).
>
> I think Kyle's revision history suggestion is a great one.  I don't have
> any bright ideas about how this should be managed on the mon side yet, but
> I agree that it is an important function and should be baked in from day
> 1.
>
> Thoughts?
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: config on mons
  2017-11-14 23:45   ` John Spray
@ 2017-11-15 13:32     ` Sage Weil
  2017-11-15 17:16       ` Lars Marowsky-Bree
  0 siblings, 1 reply; 26+ messages in thread
From: Sage Weil @ 2017-11-15 13:32 UTC (permalink / raw)
  To: John Spray; +Cc: Ceph Development

On Tue, 14 Nov 2017, John Spray wrote:
> On Tue, Nov 14, 2017 at 10:21 PM, Sage Weil <sage@newdream.net> wrote:
> > I've updated the pad at
> >
> >         http://pad.ceph.com/p/config
> >
> > After thinking about this a bit more, I think we may need to abandon the
> > idea of a pure ceph.conf-less world.  Lots of people already have tooling
> > around ceph.conf, getting rid of it will be an awkward process (even for a
> > one-time upgrade), and I'm not sure we can eliminate it entirely anyway
> > since many options affect the bootstrapping phase, authentication, and so
> > on.
> >
> > Instead, I'm currently partial to giving processes a more nuanced view of
> > their config based on where the value comes from.  A single option may
> > include
> >
> > 1- a default value (compiled in)
> > 2- a value from the mon
> > 3- a value from ceph.conf
> > 4- a value set via command line, 'ceph tell', 'ceph daemon ... config set
> > ...', etc.
> >
> > We would always use the highest-priority value on that list.  This means
> > that ceph.conf can override the mon, just like a command-line argument
> > overrides ceph.conf.
> 
> I think that if there are some folks who just cannot work without
> loading local configs onto their nodes, I want to insulate folks
> working on user interfaces from having to handle the resulting
> complexity.  The folks pushing config files out to their nodes
> presumably have their own preferred way of dealing with this stuff, so
> they shouldn't miss it from the Ceph UI.
> 
> In that spirit, I think that we don't need to have a per-setting
> granularity of what is overridden and what isn't: daemons should just
> flag whether they are consuming the mon config (default), or whether
> they are using local ceph.conf.  That way, folks building UIs can grey
> things out at a whole-page level if the cluster is not using
> centralized config.  It sacrifices some flexibility for the people who
> want to use local conf for some things but central conf for others (do
> those people exist?) but I think it's worth it to avoid having a
> complicated UI that has to worry about displaying and communicating
> the subtle distinctions between those 1/2/3/4 values which might all
> be different.

The problem is, I think, that non-trivial ceph.confs are still going to 
be required in many valid situations, since there are a load of settings that 
affect how to connect and authenticate with the mon.  For most users the 
defaults will do and it will just be 'mon_host' (or maybe they 
use DNS for this), but any nontrivial authentication settings (e.g., 
kerberos is coming) or messenger types will require something.
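In that world the residual local file could be as small as this sketch (the addresses are placeholder examples):

```ini
[global]
        mon_host = 10.0.0.1,10.0.0.2,10.0.0.3
        # plus any non-default auth/messenger settings needed to reach the mons
```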

(We also have to allow local overrides to ensure that the mon config can't 
brick the cluster by setting the internal mon settings, like paxos_*, to 
some bad value.)

I could see us combining 3-4 to simplify, though; the fact that a setting 
will go away on daemon restart isn't that interesting or normal, and 
presumably the cluster is *already* in a state where the mon and ceph.conf 
configs aren't fighting each other, so any disparity there will be seen 
for what it is.

I think the reporting to mgr to make a distinction needs to be there, 
though, because (1) to make a transition we want to see the delta between 
what the daemon has running and what the mon wants, and (2) I don't think 
we should make things like 'ceph daemon ... config set ...' turn into a 
request to the monitor to set a config so that the daemon will get a 
corresponding config update.  These are low-level commands that are 
important for debugging/fixing issues, and we shouldn't break them.

> The upshot would be that UI developers could build elements that work
> as expected by default for systems that use the central config, but
> safely disable themselves on systems where the user has gone their own
> way and pushed out local configuration.

I think the scenarios aren't too complex for the UI:

- the mon config doesn't match running config.
  - button to update mon config to match running config, and/or
  - button to clear running config so that it matches mon config
- the mon config is overridden by local ceph.conf
  - button to update/abbreviate/remove local ceph.conf settings so that 
    mon can drive

We can either keep the distinction for 3-4 and implement both, or blur 
them and the 'clear running config' just won't do anything.

Or the UI can not implement those buttons at all and just show that there 
is a disparity and leave it to the user to fix (or not)...?

> > On the flip side of this, all of these values are also reported to the mgr
> > and tracked along with the other daemon state.  So regardless of where
> > config values come from, it is all still visible via the CLI, GUI, or
> > whatever else.
> >
> > Further, we can then make the GUI (or CLI or whatever) act on that
> > information to, say,
> >
> > - assimilate ceph.conf values into the mon so that ceph.conf can be
> > removed/abbreviated (i.e., the upgrade/transition path to centralized
> > config)
> 
> > - see override values set via cli (i.e., in gui)
> > - clear override values (i.e., ceph tell <daemon> config rm <name>)
> > - surface a HEALTH_WARN if a CLI or 'config set' override has been
> > set on one or more daemons (so the operator knows the running config is
> > not persistent).
> 
> This comes back to our recurring discussion about whether a
> HEALTH_INFO level should exist: I'm increasingly of the opinion that
> when we run into things like this, it's nature's way of telling us
> that maybe our underlying model is weird (in this case, maybe we
> didn't need to have the concept of ephemeral configuration settings in
> the system at all).
> 
> Maybe ephemeral config changes should be treated the same way I
> propose to treat local overrides: the daemon reports just that it has
> been overridden, and the GUI goes hands-off and does not attempt to
> communicate the story to the user "Well, you see, it's currently set
> to xyz until the next restart, at which point it will revert to abc,
> that is unless you have a local ceph.conf in which case...".

I don't think the restart subtlety needs to be communicated...

> The ability to rollback config changes seems like it would be the
> "right way" to accomplish having some config settings that we set and
> then subsequently revert, rather than having the revert happen
> implicitly when the daemon next restarts (intentionally or not).

Agreed.  We should be setting things like debug_osd=20 for diagnosing 
an issue via the mon.

> > - surface a HEALTH_WARN if a mon option is overridden by a daemon's local
> > ceph.conf file.
> 
> Hmm, this makes me a bit confused, as if you're still thinking of the
> local ceph.conf being a deprecated/upgrade thing?  If it's really
> permitted in general than it wouldn't make sense for it to be a WARN.

This warning would only appear if the mon sets option foo to A and the 
conf sets the same option to B...

sage

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: config on mons
  2017-11-15 13:32     ` Sage Weil
@ 2017-11-15 17:16       ` Lars Marowsky-Bree
  2017-11-15 21:26         ` Sage Weil
  0 siblings, 1 reply; 26+ messages in thread
From: Lars Marowsky-Bree @ 2017-11-15 17:16 UTC (permalink / raw)
  To: Ceph Development

On 2017-11-15T13:32:55, Sage Weil <sage@newdream.net> wrote:

I am strongly in favor of moving the config to the MONs and
deprecating ceph.conf - maybe keeping a ceph-bootstrap.conf for
connecting to the MONs to get it, but that's it.

In a previous life, I helped design a Cluster Information Base to reduce
config drift - a central information store is vastly superior to files
copied around, whether that happens manually or from a config management
system.

It's always outdated *somewhere*, and Ceph already has the concept of
the MONs having maps and a concurrency/consistency algorithm for them
(beloved PAXOS), so it doesn't add any significant complexity.

So for once, I vote for building it in. Don't add etcd/consul. We want
strong consistency here, and we can build on stuff that's already there.
If Ceph needed to invent this from scratch, sure, but as it stands we can
build on something existing that needs to work anyway or we're screwed.

> > > 1- a default value (compiled in)
> > > 2- a value from the mon
> > > 3- a value from ceph.conf
> > > 4- a value set via command line, 'ceph tell', 'ceph daemon ... config set
> > > ...', etc.

I'm opposed to 3 and 4.

I *can* see the need to override a value on a per-host or on a
per-daemon instance basis (including combinations thereof, e.g., all
OSDs on node X). (Back when, we also expected these to be way more
frequently needed; to this day, I can count on my fingers the times I
needed per-host overrides, I think; really the only use case where this
happens more often is debug flags.)

But if you want any sort of consistency, those modify the settings in
the respective map on the MON, and the daemon *then* gets that one from
the single authoritative source of truth.

> (We also have to allow local overrides to ensure that the mon config can't 
> brick the cluster by setting the internal mon settings, like paxos_*, to 
> some bad value.)

Valid point; but perhaps this could be solved by allowing the MONs to
start up in a "safe" mode, too?

> I think the reporting to mgr to make a distinction needs to be there, 
> though, because (1) to make a transition we want to see the delta between 
> what the daemon has running and what the mon wants, and (2) I don't think 
> we should make things like 'ceph daemon ... config set ...' turn into a 
> request to the monitor to set a config so that the daemon will get a 
> corresponding config update.  These are low-level commands that are 
> important for debugging/fixing issues and I think we shouldn't break them.

I'm not perfectly sure about this one, see above. I think having a
single channel through which config updates reach daemons might be worth
it.

> Or the UI can not implement those buttons at all and just show that there 
> is a disparity and leave it to the user to fix (or not)...?

... or make the disparity go away through a single source of truth.


Regards,
    Lars

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: config on mons
  2017-11-15 17:16       ` Lars Marowsky-Bree
@ 2017-11-15 21:26         ` Sage Weil
  2017-11-30 22:31           ` Gregory Farnum
  0 siblings, 1 reply; 26+ messages in thread
From: Sage Weil @ 2017-11-15 21:26 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: Ceph Development

On Wed, 15 Nov 2017, Lars Marowsky-Bree wrote:
> On 2017-11-15T13:32:55, Sage Weil <sage@newdream.net> wrote:
> > > > 1- a default value (compiled in)
> > > > 2- a value from the mon
> > > > 3- a value from ceph.conf
> > > > 4- a value set via command line, 'ceph tell', 'ceph daemon ... config set
> > > > ...', etc.
> 
> I'm opposed to 3 and 4.
> 
> I *can* see the need to override a value on a per-host or on a
> per-daemon instance basis (including combinations thereof, e.g., all
> OSDs on node X). (Back when, we also expected these to be way more
> frequently needed; to this day, I can count on my fingers the times I
> needed per-host overrides, I think; really the only use case where this
> happens more often are debug flags.)
> 
> But if you want any sort of consistency, those modify the settings in
> the respective map on the MON, and the daemon *then* gets that one from
> the single authoritative source of truth.

The problem is that this makes the system more fragile; with a complex 
distributed system, and the types of things I've needed to diagnose and 
debug in the past, I am very nervous about taking away the ability to 
force a config value locally (e.g., via 'ceph daemon ...', when it is 
having trouble pulling config from the mon for whatever reason).

...

As far as broad principles go, I think we are mostly in alignment: (1) we 
want centrally managed config, (2) managed by the mons, for (3) a 
simplified user experience, and (4) an easy upgrade path to get there.  
I think the implementation required to get that is roughly what I 
described, and although it sounds complicated, none of the key pieces can 
really be taken away.

1. Daemons report running config to mgr.  We need some form of this no 
matter what for the upgrade/transition.  Beyond that, I think it's still 
important in order to tell whether the "single source of truth" is 
something that even can be true: (1) some options cannot be changed at 
runtime and require a restart, (2) some options may have illegal/invalid 
values, (3) the set of allowed options may change from build to build, so 
something that used to be valid may not be anymore, or may not be if the 
daemon is newer or older than the mon.
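
For illustration, the mgr-side comparison in point 1 could be as simple
as the following sketch (hypothetical helper, not actual code; config
values are shown as plain strings):

```python
# Hypothetical mgr-side helper, just to illustrate point 1: diff a
# daemon's reported running config against what the mon intends for it.

def config_delta(running, desired):
    """Return {option: (running_value, desired_value)} for every option
    the daemon is not (yet) respecting."""
    return {opt: (running.get(opt), want)
            for opt, want in desired.items()
            if running.get(opt) != want}
```

Anything in the returned delta is exactly the set of options the GUI
would have to flag as not (or not yet) in effect.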

2. Local overrides are possible.  This can/should be rare and reserved 
for extraordinary circumstances, but I don't feel comfortable removing 
this.  In a complex system there are many things that could prevent the daemon 
from speaking to the mon to get an updated config.

3. ceph.conf is allowed in at least some cases.  This is more or less a 
given on the mon in order to handle bootstrapping and to resolve bad 
changes to the mon config (that, say, break paxos itself).  There are also 
still cases where initial options are needed to fetch the rest of the 
config from the mon.  And during the transition period it is required.

I think the real question is whether, post-nautilus, we continue to 
encourage or allow ceph.conf for daemons.  I think this is a decision that 
amounts to turning it off in certain circumstances to force users into a 
better world, but it's not something we can do away with to simplify the 
world today.  We can still ignore this possibility from the GUI, perhaps, 
but I think we're better off lumping it together with #2 and doing 
something extremely simple like, say, putting a (!) icon next to options 
that the daemon isn't respecting (because they have overridden it, or need 
to restart, or it is not valid, or whatever else).

I can't see a way to change 1-3 above without a very different approach 
(like, using something external to the mons).  Am I missing something?

sage


* Re: config on mons
  2017-11-15 21:26         ` Sage Weil
@ 2017-11-30 22:31           ` Gregory Farnum
  2017-12-01 17:53             ` Sage Weil
  0 siblings, 1 reply; 26+ messages in thread
From: Gregory Farnum @ 2017-11-30 22:31 UTC (permalink / raw)
  To: Sage Weil, John Spray; +Cc: Ceph Development

I'm resurrecting this thread since it wasn't clear a consensus was
reached, I was out on vacation while it was happening, and it doesn't
look like there's been much work done yet to render any discussion
obsolete.

Mostly, I agree with Sage's last email, but I think I have a few other
points to raise. :)

On Wed, Nov 15, 2017 at 1:26 PM, Sage Weil <sage@newdream.net> wrote:
> On Wed, 15 Nov 2017, Lars Marowsky-Bree wrote:
>> On 2017-11-15T13:32:55, Sage Weil <sage@newdream.net> wrote:
>> > > > 1- a default value (compiled in)
>> > > > 2- a value from the mon
>> > > > 3- a value from ceph.conf
>> > > > 4- a value set via command line, 'ceph tell', 'ceph daemon ... config set
>> > > > ...', etc.
>>
>> I'm opposed to 3 and 4.
>>
>> I *can* see the need to override a value on a per-host or on a
>> per-daemon instance basis (including combinations thereof, e.g., all
>> OSDs on node X). (Back when, we also expected these to be way more
>> frequently needed; to this day, I can count on my fingers the times I
>> needed per-host overrides, I think; really the only use case where this
>> happens more often are debug flags.)
>>
>> But if you want any sort of consistency, those modify the settings in
>> the respective map on the MON, and the daemon *then* gets that one from
>> the single authoritative source of truth.
>
> The problem is that this makes the system more fragile; with a complex
> distributed system, and the types of things I've needed to diagnose and
> debug in the past, I am very nervous about taking away the ability to
> force a config value locally (e.g., via 'ceph daemon ...', when it is
> having trouble pulling config from the mon for whatever reason).

Yes, we definitely need a local override. For one thing, we need to be
able to turn on and configure OSDs in disconnected modes (eg, journal
flushes with FileStore) that involve turning on an awful lot of the
full system. Remembering to mark specific config options as
"allowed-to-set-locally" is just not practical or maintainable.

>
> ...
>
> As far as broad principles go, I think we are mostly in alignment: (1) we
> want centrally managed config, (2) managed by the mons, for (3) a
> simplified user experience, and (4) an easy upgrade path to get there.
> I think the implementation required to get that is roughly what I
> described, and although it sounds complicated, none of the key pieces can
> really be taken away.
>
> 1. Daemons report running config to mgr.  We need some form of this no
> matter what for the upgrade/transition.  Beyond that, I think it's still
> important in order to tell whether the "single source of truth" is
> something that even can be true: (1) some options cannot be changed at
> runtime and require a restart, (2) some options may have illegal/invalid
> values, (3) the set of allowed options may change from build to build, so
> something that used to be valid may not be anymore, or may not be if the
> daemon is newer or older than the mon.
>
> 2. Local overrides are possible.  This can/should be rare and reserved
> for extraordinary circumstances, but I don't feel comfortable removing
> this.  In a complex system there are many things that could prevent the daemon
> from speaking to the mon to get an updated config.
>
> 3. ceph.conf is allowed in at least some cases.  This is more or less a
> given on the mon in order to handle bootstrapping and to resolve bad
> changes to the mon config (that, say, break paxos itself).  There are also
> still cases where initial options are needed to fetch the rest of the
> config from the mon.  And during the transition period it is required.
>
> I think the real question is whether, post-nautilus, we continue to
> encourage or allow ceph.conf for daemons.  I think this is a decision that
> amounts to turning it off in certain circumstances to force users into a
> better world, but it's not something we can do away with to simplify the
> world today.  We can still ignore this possibility from the GUI, perhaps,
> but I think we're better off lumping it together with #2 and doing
> something extremely simple like, say, putting a (!) icon next to options
> that the daemon isn't respecting (because they have overridden it, or need
> to restart, or it is not valid, or whatever else).
>
> I can't see a way to change 1-3 above without a very different approach
> (like, using something external to the mons).  Am I missing something?

I think you're correct about these three statements.

My inclination would be to shift the documentation and expectation to
using the central config service, but that we don't break anything
which users might already have. As long as we expose that daemons have
differing config values from the central service, ceph-mgr can be as
clever or dumb as it wants about handling that.

By the same token, though, I don't think we need to take central
responsibility for removing or editing configs which aren't in the
central mon store. Doing that parsing is a pain in the butt and
presumably anybody who set up a real ceph.conf can manage to remove it
themselves.
One thing we could maybe do is identify the "local config" settings in
Nautilus (that is, stuff specifying specific disks and paths, or
otherwise necessary to make the daemon turn on) and offer a one-click
"delete the ceph.conf and replace it with the minimal set", but that
would just be a one-time option to make life better for upgraders, not
something we want to commit to.


Now, starting from the beginning of the thread, a few other things...

On Fri, Nov 10, 2017 at 7:30 AM, Sage Weil <sweil@redhat.com> wrote:
> Namely,
>
>  config/option = value               # like [global]
>  config/$type/option = value         # like [mon]
>  config/$type.$id/option = value     # like [mon.a]

I am finding this really difficult to work with. Do you expect
users to manipulate this directly? I can imagine this being the
internal schema, but I hope the CLI commands and GUI are about setting
options on buckets which are pretty-printed in the "osd tree" command!

> There are two new things:
>
>  config/.../class:$classname/option = value
>
> For OSDs, this matches the device_class.  So you can do something like
>
>  config/osd/class:ssd/bluestore_cache_size = 10737418240  # 10GB, woohoo!
>
> You can also match the crush location:
>
>  config/.../$crushtype:$crushvalue/option = value
>
> e.g.,
>
>  config/osd/rack:foo/debug_osd = 10    # hunting some issue
>
> This obviously makes sense for OSDs.  We can also make it make sense for
> non-OSDs since everybody (clients and daemons) has a concept of
> crush_location that is a set of key/value pairs like "host=foo rack=bar"
> which match the CRUSH hierarchy.

I am not understanding this at all — I don't think we can have any
expectation that clients know where they are in relationship to the
CRUSH tree. Frequently they are not sharing any of the specified
resources, and they are much more likely to shift locations than OSDs
are. (eg, rbd running in compute boxes in different domains from the
storage nodes, possibly getting live migrated...)

On Mon, Nov 13, 2017 at 10:40 AM, John Spray <jspray@redhat.com> wrote:
> On Mon, Nov 13, 2017 at 6:20 PM, Kyle Bader <kyle.bader@gmail.com> wrote:
>> Configuration files are often driven by configuration management, with
>> previous versions stored in some kind of version control systems. We
>> should make sure that if configuration moves to the monitors that you
>> have some form of history and rollback capabilities. It might be worth
>> modeling it similar to network switch configuration shells, a la
>> Junos.
>>
>> * change configuration
>> * require commit configuration change
>> * ability to rollback N configuration changes
>> * ability to diff to configuration versions
>>
>> That way an admin can figure out when the last configuration change
>> was, what changed, and rollback if necessary.
>
> That is an extremely good idea.
>
> As a minimal thing, it should be pretty straightforward to implement a
> snapshot/rollback.
>
> I imagine many users today are not so disciplined as to version
> control their configs, but this is a good opportunity to push that as
> the norm by building it in.

I get the appeal of snapshotting, but I am definitely not convinced
this is something we should build directly into the monitors. Do you
have an implementation in mind?
It seems to me like this is something we can implement pretty easily
in ceph-mgr (either by restricting the snapshotting to mechanisms that
make changes via the manager, or by subscribing to config changes),
and that for admins using orchestration frameworks they already get
rollbackability from their own version control. Why not take advantage
of those easier development environments, which are easy to adjust
later if we find new requirements or issues?

On Tue, Nov 14, 2017 at 3:45 PM, John Spray <jspray@redhat.com> wrote:
> This comes back to our recurring discussion about whether a
> HEALTH_INFO level should exist: I'm increasingly of the opinion that
> when we run into things like this, it's nature's way of telling us
> that maybe our underlying model is weird (in this case, maybe we
> didn't need to have the concept of ephemeral configuration settings in
> the system at all).
>
> Maybe ephemeral config changes should be treated the same way I
> propose to treat local overrides: the daemon reports just that it has
> been overridden, and the GUI goes hands-off and does not attempt to
> communicate the story to the user "Well, you see, it's currently set
> to xyz until the next restart, at which point it will revert to abc,
> that is unless you have a local ceph.conf in which case...".

I'm with you on this — I don't think there's a reason for the central
config to distinguish between *kinds* of disagreement. We probably
want to expose which daemons are disagreeing on which options, but I'm
not seeing the utility of diagnosing *where* the disagreement was
injected.

We can do a lot with those reported config options and their
disagreements that I think will be of value, though!
*) we can specify that certain config options must not be overridden —
heartbeat timeouts, for instance — and we boot anybody who does so
*) we can be selective about which configs we care about matching in
the GUI. If we roll out a new AwesomeMessenger, we may want to let
users switch to it incrementally and expose that in the GUI. We may
get ambitious someday and have a one-click "convert this OSD to
Bluestore" button. etc. But maybe we just ignore all filestore config
settings, since we're moving to BlueStore and don't care how those may
be set differently for different classes of OSDs. We can deal with the
fact that sometimes a support tech will tell customers to restart an
OSD with debug settings on the command line, and we don't want to
disable part of their dashboard gui when that happens.
*) we can recommend importing differences into the central config
store (eg on upgrade) when they match some heuristic standard of
"makes sense"

-Greg


* Re: config on mons
  2017-11-30 22:31           ` Gregory Farnum
@ 2017-12-01 17:53             ` Sage Weil
  0 siblings, 0 replies; 26+ messages in thread
From: Sage Weil @ 2017-12-01 17:53 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: John Spray, Ceph Development


On Thu, 30 Nov 2017, Gregory Farnum wrote:
> I'm resurrecting this thread since it wasn't clear a consensus was
> reached, I was out on vacation while it was happening, and it doesn't
> look like there's been much work done yet to render any discussion
> obsolete.

Thanks!

> My inclination would be to shift the documentation and expectation to
> using the central config service, but that we don't break anything
> which users might already have. As long as we expose that daemons have
> differing config values from the central service, ceph-mgr can be as
> clever or dumb as it wants about handling that.

+1
 
> By the same token, though, I don't think we need to take central
> responsibility for removing or editing configs which aren't in the
> central mon store. Doing that parsing is a pain in the butt and
> presumably anybody who set up a real ceph.conf can manage to remove it
> themselves.
> One thing we could maybe do is identify the "local config" settings in
> Nautilus (that is, stuff specifying specific disks and paths, or
> otherwise necessary to make the daemon turn on) and offer a one-click
> "delete the ceph.conf and replace it with the minimal set", but that
> would just be a one-time option to make life better for upgraders, not
> something we want to commit to.

Yeah, I view this as TBD.  I want there to be *some* transition path but 
I'm not sure how magic it should be.  Among other issues, daemons run as 
user ceph and won't be able to overwrite /etc/ceph/ceph.conf (usually 
owned by root), so... yeah.


> On Fri, Nov 10, 2017 at 7:30 AM, Sage Weil <sweil@redhat.com> wrote:
> > Namely,
> >
> >  config/option = value               # like [global]
> >  config/$type/option = value         # like [mon]
> >  config/$type.$id/option = value     # like [mon.a]
> 
> I am finding this really difficult to work with. Do you expect for
> users to manipulate this directly? I can imagine this being the
> internal schema, but I hope the CLI commands and GUI are about setting
> options on buckets which are pretty-printed in the "osd tree" command!

The plan is to *store* these in config-key, but have a new, higher-level 
CLI interface (ceph config ...) to them.  That interface would do the 
validation to make sure you are not talking nonsense: verify that 
values are legal, that the config option exists, that it is not being set 
on a daemon that doesn't care, and that it isn't something that is 
ceph.conf-only, 
etc.  It would also have the 'show' commands that would dump the running 
config for a daemon and so on.
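
To make the key matching concrete, here is a rough sketch (illustrative
only, nothing like the real implementation; a plain dict stands in for
config-key, and the ordering between class: and crush masks is an
assumption here) of how such an interface might resolve the effective
value of one option:

```python
# Rough sketch, not the real implementation: resolve the effective value
# of one option from keys in the config/... schema discussed above.
# `store` is a plain dict standing in for config-key.

def effective_value(store, option, entity_type, entity_id,
                    device_class=None, crush_location=None):
    """Return (value, source_key), or (None, None) if nothing matches."""
    # Candidate keys from lowest to highest precedence (assumed order).
    candidates = [
        "config/%s" % option,                          # like [global]
        "config/%s/%s" % (entity_type, option),        # like [osd]
    ]
    if device_class:
        candidates.append("config/%s/class:%s/%s"
                          % (entity_type, device_class, option))
    for ctype, cvalue in (crush_location or {}).items():
        candidates.append("config/%s/%s:%s/%s"
                          % (entity_type, ctype, cvalue, option))
    candidates.append("config/%s.%s/%s"
                      % (entity_type, entity_id, option))  # like [osd.3]

    result = (None, None)
    for key in candidates:          # later (more specific) keys win
        if key in store:
            result = (store[key], key)
    return result
```

So a rack:foo mask beats the [global] value for an OSD in that rack,
and the 'show' command would report both the value and which key it
came from.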

> > There are two new things:
> >
> >  config/.../class:$classname/option = value
> >
> > For OSDs, this matches the device_class.  So you can do something like
> >
> >  config/osd/class:ssd/bluestore_cache_size = 10737418240  # 10GB, woohoo!
> >
> > You can also match the crush location:
> >
> >  config/.../$crushtype:$crushvalue/option = value
> >
> > e.g.,
> >
> >  config/osd/rack:foo/debug_osd = 10    # hunting some issue
> >
> > This obviously makes sense for OSDs.  We can also make it make sense for
> > non-OSDs since everybody (clients and daemons) has a concept of
> > crush_location that is a set of key/value pairs like "host=foo rack=bar"
> > which match the CRUSH hierarchy.
> 
> I am not understanding this at all — I don't think we can have any
> expectation that clients know where they are in relationship to the
> CRUSH tree. Frequently they are not sharing any of the specified
> resources, and they are much more likely to shift locations than OSDs
> are. (eg, rbd running in compute boxes in different domains from the
> storage nodes, possibly getting live migrated...)

The idea is that *everyone* knows their hostname, which (if the CRUSH 
hierarchy is populated) is enough to tell us the crush location.  
Obviously some clients will be on hosts not in the map and won't 
know--that's fine.  Generally daemons will be, or can be, if we make an 
effort to place hosts that have mon/mgr/mds/rgw/etc daemons but not OSDs 
in the map.

But even if we ignore that and only make it work for OSDs, that's pretty 
useful too.
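
As a sketch of the hostname-based lookup (made-up structures, not Ceph
code): given a child -> (parent_type, parent_name) view of the CRUSH
hierarchy, walking up from host=<hostname> yields the location
key/value pairs, and unknown hosts simply get none:

```python
# Illustrative sketch (not Ceph code): derive a crush location from a
# hostname by walking a child -> (parent_type, parent_name) view of the
# CRUSH hierarchy.

def crush_location_from_host(parents, hostname):
    """Return e.g. {"host": "foo", "rack": "bar", "root": "default"},
    or {} when the host isn't in the map (fine for clients)."""
    node = ("host", hostname)
    if node not in parents:
        return {}                 # host unknown to CRUSH: no location
    location = {"host": hostname}
    while node in parents:        # walk up until we fall off the top
        ptype, pname = parents[node]
        location[ptype] = pname
        node = (ptype, pname)
    return location
```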


> On Mon, Nov 13, 2017 at 10:40 AM, John Spray <jspray@redhat.com> wrote:
> > On Mon, Nov 13, 2017 at 6:20 PM, Kyle Bader <kyle.bader@gmail.com> wrote:
> >> Configuration files are often driven by configuration management, with
> >> previous versions stored in some kind of version control systems. We
> >> should make sure that if configuration moves to the monitors that you
> >> have some form of history and rollback capabilities. It might be worth
> >> modeling it similar to network switch configuration shells, a la
> >> Junos.
> >>
> >> * change configuration
> >> * require commit configuration change
> >> * ability to rollback N configuration changes
> >> * ability to diff to configuration versions
> >>
> >> That way an admin can figure out when the last configuration change
> >> was, what changed, and rollback if necessary.
> >
> > That is an extremely good idea.
> >
> > As a minimal thing, it should be pretty straightforward to implement a
> > snapshot/rollback.
> >
> > I imagine many users today are not so disciplined as to version
> > control their configs, but this is a good opportunity to push that as
> > the norm by building it in.
> 
> I get the appeal of snapshotting, but I am definitely not convinced
> this is something we should build directly into the monitors. Do you
> have an implementation in mind?
> It seems to me like this is something we can implement pretty easily
> in ceph-mgr (either by restricting the snapshotting to mechanisms that
> make changes via the manager, or by subscribing to config changes),
> and that for admins using orchestration frameworks they already get
> rollbackability from their own version control. Why not take advantage
> of those easier development environments, which are easy to adjust
> later if we find new requirements or issues?

I have no good implementation ideas yet, so I'm just ignoring it for the 
moment.  I think a ceph-based interface would be valuable, though.  Say,

 ceph config checkpoint foo
 ceph config set osd.0 debug_osd 20
 ...
 ceph config rollback foo

or even

 ceph config rollback foo osd.0   # just rollback osd.0's config

Even a pretty basic implementation like encoding all of config/ in a map 
and stuffing it into a config/checkpoint/foo key (compressed even?) would 
be sufficient for that sort of thing.
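
Something along these lines, say (deliberately naive sketch with
hypothetical helper names; in reality the blob would live in the mon's
kv store, not a Python dict):

```python
# Naive sketch of the checkpoint idea above: serialize everything under
# config/ into one compressed blob stored under config/checkpoint/<name>,
# and restore it on rollback.  Helper names are hypothetical.
import json
import zlib

CKPT = "config/checkpoint/"

def checkpoint(store, name):
    snap = {k: v for k, v in store.items()
            if k.startswith("config/") and not k.startswith(CKPT)}
    store[CKPT + name] = zlib.compress(json.dumps(snap).encode())

def rollback(store, name, prefix="config/"):
    # `prefix` lets you roll back a subset, e.g. just osd.0's keys
    snap = json.loads(zlib.decompress(store[CKPT + name]))
    for key in [k for k in store
                if k.startswith(prefix) and not k.startswith(CKPT)]:
        del store[key]
    store.update({k: v for k, v in snap.items() if k.startswith(prefix)})
```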

Alternatively, a complete config changelog/history could also support the 
above and would let you do a 'ceph config history [osd.0]' type command 
that tells you how the config has changed, and when, going backwards in 
time.

Of course, having all of that doesn't prevent you from using your 
existing external tools to manage configs and history.  Perhaps a 'ceph 
config import' type operation that takes a dump of everything 
(efficiently) is appropriate for supporting that well.


> On Tue, Nov 14, 2017 at 3:45 PM, John Spray <jspray@redhat.com> wrote:
> > This comes back to our recurring discussion about whether a
> > HEALTH_INFO level should exist: I'm increasingly of the opinion that
> > when we run into things like this, it's nature's way of telling us
> > that maybe our underlying model is weird (in this case, maybe we
> > didn't need to have the concept of ephemeral configuration settings in
> > the system at all).
> >
> > Maybe ephemeral config changes should be treated the same way I
> > propose to treat local overrides: the daemon reports just that it has
> > been overridden, and the GUI goes hands-off and does not attempt to
> > communicate the story to the user "Well, you see, it's currently set
> > to xyz until the next restart, at which point it will revert to abc,
> > that is unless you have a local ceph.conf in which case...".
> 
> I'm with you on this — I don't think there's a reason for the central
> config to distinguish between *kinds* of disagreement. We probably
> want to expose which daemons are disagreeing on which options, but I'm
> not seeing the utility of diagnosing *where* the disagreement was
> injected.

Having an active/not-active indication on the mgr/mon seems fine; I think 
it's mostly a matter of how much effort we want to invest in that interface.

I plan to make the 'ceph daemon X config diff' show the complete story 
(from the daemon's perspective), indicating each source (default, conf, 
mon, override) and value that is in play, along with the effective result.
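
Roughly this shape, for illustration (hypothetical; the source names
mirror the list above, and values are shown as plain strings):

```python
# Hypothetical shape for 'config diff': each option carries the value
# seen from every source in play, and the effective result is the
# highest-precedence source present.

SOURCES = ["default", "conf", "mon", "override"]  # lowest to highest

def config_diff(per_source):
    """per_source: {option: {source: value}} -> adds 'final' per option."""
    story = {}
    for option, values in per_source.items():
        final = None
        for src in SOURCES:       # later sources override earlier ones
            if src in values:
                final = values[src]
        story[option] = dict(values, final=final)
    return story
```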

sage


end of thread, other threads:[~2017-12-01 17:53 UTC | newest]

Thread overview: 26+ messages
2017-11-10 15:30 config on mons Sage Weil
2017-11-13  0:27 ` Patrick Donnelly
2017-11-13  1:43 ` Yehuda Sadeh-Weinraub
2017-11-13  9:57   ` John Spray
2017-11-13 16:29     ` Yehuda Sadeh-Weinraub
2017-11-13  4:30 ` Christian Wuerdig
2017-11-13 10:00   ` John Spray
2017-11-13 16:45     ` Mark Nelson
2017-11-13 18:20       ` Kyle Bader
2017-11-13 18:40         ` John Spray
2017-11-14 10:18           ` Piotr Dałek
2017-11-14 11:36             ` John Spray
2017-11-14 13:58               ` Piotr Dałek
2017-11-14 16:24                 ` Sage Weil
2017-11-14 14:33               ` Mark Nelson
2017-11-14 16:37                 ` Kyle Bader
2017-11-14 18:01                   ` Alfredo Deza
2017-11-14 13:48             ` Mark Nelson
2017-11-13 13:23 ` John Spray
2017-11-14 22:21 ` Sage Weil
2017-11-14 23:45   ` John Spray
2017-11-15 13:32     ` Sage Weil
2017-11-15 17:16       ` Lars Marowsky-Bree
2017-11-15 21:26         ` Sage Weil
2017-11-30 22:31           ` Gregory Farnum
2017-12-01 17:53             ` Sage Weil
