* config on mons
@ 2017-11-10 15:30 Sage Weil
  2017-11-13  0:27 ` Patrick Donnelly
  ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread

From: Sage Weil @ 2017-11-10 15:30 UTC (permalink / raw)
To: ceph-devel

I've started on this long-discussed feature!  I haven't gotten too far but
you can see what's there so far at

	https://github.com/ceph/ceph/pull/18856

The first thing perhaps is to finalize what flexibility we want to
support.  I have a quick summary at

	http://pad.ceph.com/p/config

Namely,

	config/option = value            # like [global]
	config/$type/option = value      # like [mon]
	config/$type.$id/option = value  # like [mon.a]

There are two new things:

	config/.../class:$classname/option = value

For OSDs, this matches the device_class.  So you can do something like

	config/osd/class:ssd/bluestore_cache_size = 10737418240  # 10gb, woohoo!

You can also match the crush location:

	config/.../$crushtype:$crushvalue/option = value

e.g.,

	config/osd/rack:foo/debug_osd = 10   # hunting some issue

This obviously makes sense for OSDs.  We can also make it make sense for
non-OSDs, since everybody (clients and daemons) has a concept of
crush_location: a set of key/value pairs like "host=foo rack=bar" that
match the CRUSH hierarchy.  In this case, my plan is to make the initial
mon authentication step include the hostname of the host you're
connecting from, and then extract the rest of the location by looking up
the host in the CRUSH map.

The precedence for these is described here:

	https://github.com/ceph/ceph/pull/18856/commits/5abbd0c9e279022f185787238d21eabbbe28e336#diff-344645b5339d494e1839ff1fcaa5cb7dR15

Lots of other thorny issues to consider.  For example:

- What about monitor configs?  If they store their config in paxos, and
you set an option that breaks paxos, how can you change/fix it?  For the
moment I'm just ignoring the mons.

- What about ceph.conf?  My thought here is to mark which options are
legal for bootstrap (i.e., used during the initial connection to the mon
to authenticate and fetch config), and warn on anything other than that
in ceph.conf.  But what about after you connect?  Do these options get
reset to default?

- Bootstrapping/upgrade: So far my best idea is to make the client share
its config with the mon on startup; the first time a given daemon
connects, the mon will use that to populate its config database.
Thereafter it will be ignored.

- OSD startup: lots of stuff happens before we authenticate.  I think
there will be a new initial step to fetch config, then do all that work,
then start up for real.  And a new option to bypass mon configuration
to avoid that (and for old-school folks who don't want centralized
configs... e.g. mon_config = false and everything works as before).

Feedback welcome!
sage

^ permalink raw reply	[flat|nested] 26+ messages in thread
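The mask matching above can be sketched as a toy model (illustrative only:
the section-rank ordering, option values, and daemon metadata below are
assumptions for the example, not the actual precedence rules, which are the
ones linked in the PR):

```python
# Toy model of mask-based config matching.  A section is "global",
# "$type", or "$type.$id", optionally followed by masks such as
# "class:ssd" or "rack:foo".

def section_rank(section, daemon):
    """Specificity rank of a section for this daemon, or None if the
    section does not apply.  Assumed order, least to most specific:
    global < type < crush location < device class < name."""
    if section == "global":
        return 0
    parts = section.split("/")
    base = parts[0]
    if base == daemon["type"]:
        rank = 1
    elif base == daemon["name"]:
        rank = 4
    else:
        return None
    for mask in parts[1:]:
        key, _, val = mask.partition(":")
        if key == "class":
            if daemon.get("device_class") != val:
                return None
            rank = max(rank, 3)
        else:  # a crush location mask, e.g. rack:foo or host:bar
            if daemon.get("crush_location", {}).get(key) != val:
                return None
            rank = max(rank, 2)
    return rank

def resolve(db, daemon, option):
    """Pick the value from the most specific applicable section."""
    best = None
    for (section, opt), value in db.items():
        if opt != option:
            continue
        rank = section_rank(section, daemon)
        if rank is not None and (best is None or rank > best[0]):
            best = (rank, value)
    return best[1] if best else None

db = {
    ("global", "debug_osd"): "0/0",
    ("osd", "debug_osd"): "1/5",
    ("osd/rack:foo", "debug_osd"): "10",
    ("osd/class:ssd", "bluestore_cache_size"): "10737418240",
}
osd3 = {"type": "osd", "name": "osd.3", "device_class": "ssd",
        "crush_location": {"host": "node1", "rack": "foo"}}
print(resolve(db, osd3, "debug_osd"))             # -> 10
print(resolve(db, osd3, "bluestore_cache_size"))  # -> 10737418240
```

Here the rack:foo mask beats the plain [osd] section for debug_osd, while a
client with no matching type falls through to the global value.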
* Re: config on mons
  2017-11-10 15:30 config on mons Sage Weil
@ 2017-11-13  0:27 ` Patrick Donnelly
  2017-11-13  1:43 ` Yehuda Sadeh-Weinraub
  ` (3 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread

From: Patrick Donnelly @ 2017-11-13 0:27 UTC (permalink / raw)
To: Sage Weil; +Cc: Ceph Development

On Sat, Nov 11, 2017 at 2:30 AM, Sage Weil <sweil@redhat.com> wrote:
> - What about ceph.conf?  My thought here is to mark which options are
> legal for bootstrap (i.e., used during the initial connection to mon to
> authenticate and fetch config), and warn on anything other than that in
> ceph.conf.  But what about after you connect?  Do these options get reset
> to default?

Perhaps we should deprecate ceph.conf and mandate an alternate bootstrap
file for connecting to the mons.

-- 
Patrick Donnelly

^ permalink raw reply	[flat|nested] 26+ messages in thread
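The "warn on non-bootstrap options" idea could look roughly like this
(a sketch; the bootstrap whitelist here is hypothetical — the real set
would be flagged in the option definitions themselves):

```python
# Flag ceph.conf options that are not needed for bootstrap and would
# instead be managed by the mons.  BOOTSTRAP_OPTIONS is an assumed set.
import configparser
import sys

BOOTSTRAP_OPTIONS = {"mon_host", "keyring", "fsid", "log_file"}

conf = """
[global]
mon_host = 10.0.0.1,10.0.0.2
keyring = /etc/ceph/keyring
debug_osd = 20
"""

cp = configparser.ConfigParser()
cp.read_string(conf)

non_bootstrap = []
for section in cp.sections():
    for opt in cp[section]:
        if opt not in BOOTSTRAP_OPTIONS:
            non_bootstrap.append((section, opt))
            print(f"warning: option '{opt}' in [{section}] is not a "
                  f"bootstrap option; it would be managed by the mons",
                  file=sys.stderr)
```

With this input, only debug_osd is flagged; mon_host and keyring are the
kind of options a minimal bootstrap file would keep.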
* Re: config on mons
  2017-11-10 15:30 config on mons Sage Weil
  2017-11-13  0:27 ` Patrick Donnelly
@ 2017-11-13  1:43 ` Yehuda Sadeh-Weinraub
  2017-11-13  9:57   ` John Spray
  2017-11-13  4:30 ` Christian Wuerdig
  ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread

From: Yehuda Sadeh-Weinraub @ 2017-11-13 1:43 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel

On Fri, Nov 10, 2017 at 7:30 AM, Sage Weil <sweil@redhat.com> wrote:
[...]
> - What about ceph.conf?  My thought here is to mark which options are
> legal for bootstrap (i.e., used during the initial connection to mon to
> authenticate and fetch config), and warn on anything other than that in
> ceph.conf.  But what about after you connect?  Do these options get reset
> to default?

And also, what about configurables passed in as args?  I think that any
local configuration (ceph.conf, args) should still be used to override
config from the mons.  We can add warnings and whistles to flag when such
configuration exists, but we should not lose it.

> - Bootstrapping/upgrade: So far my best idea is to make the client share
> its config with the mon on startup; the first time a given daemon
> connects, the mon will use that to populate its config database.
> Thereafter it will be ignored.

Maybe there could be some flag that we could pass in to select the
client's behavior.  By default it'd take the mon config if that exists.
Other options would be to take local config, or overlay local over
mon.

Yehuda

> [...]
> Feedback welcome!
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread
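The three client behaviors Yehuda suggests (mon config by default, local
only, overlay local over mon) amount to a simple merge policy, sketched
here in Python (illustrative, not actual Ceph code):

```python
# Merge mon-provided config with local config (ceph.conf + args)
# under the three policies discussed.

def effective_config(mon_cfg, local_cfg, policy="mon"):
    if policy == "mon":        # take the mon config when it exists
        return dict(mon_cfg) if mon_cfg else dict(local_cfg)
    if policy == "local":      # ignore the mons entirely
        return dict(local_cfg)
    if policy == "overlay":    # local values override mon values
        merged = dict(mon_cfg)
        merged.update(local_cfg)
        return merged
    raise ValueError(f"unknown policy: {policy}")

mon = {"debug_osd": "0", "osd_max_backfills": "1"}
local = {"debug_osd": "20"}  # e.g. set on the command line while debugging
print(effective_config(mon, local, "overlay"))
# -> {'debug_osd': '20', 'osd_max_backfills': '1'}
```

The overlay case is the one that preserves local overrides without losing
the centrally managed values.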
* Re: config on mons
  2017-11-13  1:43 ` Yehuda Sadeh-Weinraub
@ 2017-11-13  9:57 ` John Spray
  2017-11-13 16:29   ` Yehuda Sadeh-Weinraub
  0 siblings, 1 reply; 26+ messages in thread

From: John Spray @ 2017-11-13 9:57 UTC (permalink / raw)
To: Yehuda Sadeh-Weinraub; +Cc: Sage Weil, ceph-devel

On Mon, Nov 13, 2017 at 1:43 AM, Yehuda Sadeh-Weinraub
<ysadehwe@redhat.com> wrote:
> And also, what about configurables passed in as args?  I think that any
> local configuration (ceph.conf, args) should still be used to override
> config from the mons.  We can add warnings and whistles to flag when
> such configuration exists, but we should not lose it.

This comes up whenever we talk about the centralized config, so I guess
it never quite got put to rest...

The big downside to letting services selectively ignore the mons is
that anyone building a user interface is pretty much screwed if they
want to show the current value of a config setting, unless we make the
MonClient config subscription a two-way thing that enables services to
*set* their own config (from their ceph.conf) in addition to receiving
it.

John

[...]

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: config on mons
  2017-11-13  9:57 ` John Spray
@ 2017-11-13 16:29 ` Yehuda Sadeh-Weinraub
  0 siblings, 0 replies; 26+ messages in thread

From: Yehuda Sadeh-Weinraub @ 2017-11-13 16:29 UTC (permalink / raw)
To: John Spray; +Cc: Sage Weil, ceph-devel

On Mon, Nov 13, 2017 at 1:57 AM, John Spray <jspray@redhat.com> wrote:
> This comes up whenever we talk about the centralized config, so I guess
> it never quite got put to rest...
>
> The big downside to letting services selectively ignore the mons is
> that anyone building a user interface is pretty much screwed if they
> want to show the current value of a config setting, unless we make the
> MonClient config subscription a two-way thing that enables services to
> *set* their own config (from their ceph.conf) in addition to receiving
> it.

More like have them report it, not necessarily set it.  We should have
that.

I don't like the idea of not being able to modify config without going
to the monitors.  There might be cases where doing it via the monitors
is impractical or cumbersome, or you'd just want to try different values
quickly, or you're running in a test or dev environment, etc.

And given that we need to have that subsystem working anyway (we need it
for bootstrapping, and not everything that we run even connects or
should connect to the cluster), I think it would also make logical
sense.

Yehuda

[...]

^ permalink raw reply	[flat|nested] 26+ messages in thread
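The "report, not set" idea — daemons keeping local overrides but telling
the mon what they are actually running with, so a UI can display the
effective value — could be sketched like this (illustrative; the report
field names are made up):

```python
# A daemon builds a report of its effective config: the mon-provided
# value unless overridden locally, with the source of each value noted.

def build_config_report(mon_cfg, local_overrides):
    report = {}
    for opt, mon_val in mon_cfg.items():
        report[opt] = {
            "value": local_overrides.get(opt, mon_val),
            "source": "local" if opt in local_overrides else "mon",
        }
    # options set only locally still get reported
    for opt, val in local_overrides.items():
        report.setdefault(opt, {"value": val, "source": "local"})
    return report

mon = {"debug_osd": "0", "osd_max_backfills": "1"}
local = {"debug_osd": "20"}
report = build_config_report(mon, local)
print(report["debug_osd"])  # -> {'value': '20', 'source': 'local'}
```

A UI consuming such reports can show both the stored value and the live
one, which addresses John's concern without forcing all changes through
the monitors.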
* Re: config on mons
  2017-11-10 15:30 config on mons Sage Weil
  2017-11-13  0:27 ` Patrick Donnelly
  2017-11-13  1:43 ` Yehuda Sadeh-Weinraub
@ 2017-11-13  4:30 ` Christian Wuerdig
  2017-11-13 10:00   ` John Spray
  2017-11-13 13:23 ` John Spray
  2017-11-14 22:21 ` Sage Weil
  4 siblings, 1 reply; 26+ messages in thread

From: Christian Wuerdig @ 2017-11-13 4:30 UTC (permalink / raw)
To: Sage Weil; +Cc: Ceph Development

Hm, have you guys considered utilizing an existing key-value store like
Consul or etcd for this instead of rolling your own?  Not sure about the
details of etcd, but the Consul API is quite nice and supports long
polling and transactions.  The obvious downside is that you depend on a
separate service, but that can also be an advantage.

On Sat, Nov 11, 2017 at 4:30 AM, Sage Weil <sweil@redhat.com> wrote:
> I've started on this long-discussed feature!  I haven't gotten too far but
> you can see what's there so far at
>
> 	https://github.com/ceph/ceph/pull/18856
[...]

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: config on mons
  2017-11-13  4:30 ` Christian Wuerdig
@ 2017-11-13 10:00 ` John Spray
  2017-11-13 16:45   ` Mark Nelson
  0 siblings, 1 reply; 26+ messages in thread

From: John Spray @ 2017-11-13 10:00 UTC (permalink / raw)
To: Christian Wuerdig; +Cc: Sage Weil, Ceph Development

On Mon, Nov 13, 2017 at 4:30 AM, Christian Wuerdig
<christian.wuerdig@gmail.com> wrote:
> Hm, have you guys considered utilizing an existing key-value store like
> Consul or etcd for this instead of rolling your own?  Not sure about the
> details of etcd, but the Consul API is quite nice and supports long
> polling and transactions.  The obvious downside is that you depend on a
> separate service, but that can also be an advantage.

When it comes to putting and getting values, Consul and etcd don't
really offer much that the ceph mons don't already do.  As you say, it
would be a new dependency, but more importantly it would also be a
whole new network comms path with its own authentication, ports, etc.

This is one of those situations where using something off the shelf is
actually way more effort (for developers and for users) than building
it in.

John

[...]

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: config on mons
  2017-11-13 10:00 ` John Spray
@ 2017-11-13 16:45 ` Mark Nelson
  2017-11-13 18:20   ` Kyle Bader
  0 siblings, 1 reply; 26+ messages in thread

From: Mark Nelson @ 2017-11-13 16:45 UTC (permalink / raw)
To: John Spray, Christian Wuerdig; +Cc: Sage Weil, Ceph Development

On 11/13/2017 04:00 AM, John Spray wrote:
> When it comes to putting and getting values, Consul and etcd don't
> really offer much that the ceph mons don't already do.  As you say, it
> would be a new dependency, but more importantly it would also be a
> whole new network comms path with its own authentication, ports, etc.
>
> This is one of those situations where using something off the shelf is
> actually way more effort (for developers and for users) than building
> it in.
>
> John

I don't disagree, but I can imagine there are a number of sysadmins who
want Ceph to play nice with whatever they are currently using for
everything else they maintain.  Whatever we do here, we probably want to
be mindful of that (i.e., I'd argue that deprecating ceph.conf might not
be well liked by folks who are happy with their current setup).

Mark

[...]

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: config on mons
  2017-11-13 16:45 ` Mark Nelson
@ 2017-11-13 18:20 ` Kyle Bader
  2017-11-13 18:40   ` John Spray
  0 siblings, 1 reply; 26+ messages in thread

From: Kyle Bader @ 2017-11-13 18:20 UTC (permalink / raw)
To: Mark Nelson; +Cc: John Spray, Christian Wuerdig, Sage Weil, Ceph Development

Configuration files are often driven by configuration management, with
previous versions stored in some kind of version control system.  We
should make sure that if configuration moves to the monitors, there is
some form of history and rollback capability.  It might be worth
modeling it on network switch configuration shells, a la Junos:

* change configuration
* require a commit to apply a configuration change
* ability to roll back N configuration changes
* ability to diff two configuration versions

That way an admin can figure out when the last configuration change
was, what changed, and roll back if necessary.

On Mon, Nov 13, 2017 at 8:45 AM, Mark Nelson <mnelson@redhat.com> wrote:
> I don't disagree, but I can imagine there are a number of sysadmins who
> want Ceph to play nice with whatever they are currently using for
> everything else they maintain.  Whatever we do here, we probably want to
> be mindful of that (i.e., I'd argue that deprecating ceph.conf might not
> be well liked by folks who are happy with their current setup).
[...]

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: config on mons 2017-11-13 18:20 ` Kyle Bader @ 2017-11-13 18:40 ` John Spray 2017-11-14 10:18 ` Piotr Dałek 0 siblings, 1 reply; 26+ messages in thread From: John Spray @ 2017-11-13 18:40 UTC (permalink / raw) To: Kyle Bader; +Cc: Mark Nelson, Christian Wuerdig, Sage Weil, Ceph Development On Mon, Nov 13, 2017 at 6:20 PM, Kyle Bader <kyle.bader@gmail.com> wrote: > Configuration files are often driven by configuration management, with > previous versions stored in some kind of version control systems. We > should make sure that if configuration moves to the monitors that you > have some form of history and rollback capabilities. It might be worth > modeling it similar to network switch configuration shells, a la > Junos. > > * change configuration > * require commit configuration change > * ability to rollback N configuration changes > * ability to diff to configuration versions > > That way an admin can figure out when the last configuration change > was, what changed, and rollback if necessary. That is an extremely good idea. As a minimal thing, it should be pretty straightforward to implement a snapshot/rollback. I imagine many users today are not so disciplined as to version control their configs, but this is a good opportunity to push that as the norm by building it in. John > > > On Mon, Nov 13, 2017 at 8:45 AM, Mark Nelson <mnelson@redhat.com> wrote: >> >> >> On 11/13/2017 04:00 AM, John Spray wrote: >>> >>> On Mon, Nov 13, 2017 at 4:30 AM, Christian Wuerdig >>> <christian.wuerdig@gmail.com> wrote: >>>> >>>> Hm, have you guys considered utilizing existing key-value stores like >>>> Consul or etcd for this instead of rolling your own? Not sure about >>>> the details of etcd but the Consul API is quite nice, supports long >>>> polling and transactional support. Obvious downside is that you depend >>>> on a separate service but that can also be an advantage. 
>>> >>> >>> When it comes to putting and getting values, Consul and etcd don't >>> really offer much that the ceph mons don't already do. As you say, it >>> would be a new dependency, but more importantly it would also be a >>> whole new network comms path with its own authentication, ports, etc. >>> >>> This is one of those situations where using something off the shelf is >>> actually way more effort (for developers and for users) than building >>> it in. >>> >>> John >>> >> >> I don't disagree, but I could imagine there are a number of sysadmins that >> want Ceph to play nice with whatever they are currently using for everything >> else they maintain. Whatever we do here, we probably want to be mindful (ie >> I'd argue that deprecating ceph.conf might not be well liked by folks that >> are happy with their current setup). >> >> Mark ^ permalink raw reply [flat|nested] 26+ messages in thread
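The Junos-style workflow Kyle describes -- stage a change, commit it, roll back N revisions, diff two revisions -- can be sketched in a few lines. This is a hypothetical illustration (names invented, not Ceph code); the real work would be in persistence and in wiring commit into the mon's transaction machinery:

```python
class VersionedConfig:
    """Toy versioned config store: stage, commit, rollback, diff."""

    def __init__(self):
        self.history = [{}]           # committed revisions, oldest first
        self.staged = {}              # changes not yet committed

    def set(self, key, value):
        self.staged[key] = value      # stage only; nothing takes effect yet

    def commit(self):
        rev = dict(self.history[-1])
        rev.update(self.staged)
        self.history.append(rev)
        self.staged = {}
        return len(self.history) - 1  # new revision number

    def rollback(self, n=1):
        # drop the last n revisions, always keeping the initial one
        self.history = self.history[:max(1, len(self.history) - n)]

    def diff(self, a, b):
        # keys whose value differs between revisions a and b
        ra, rb = self.history[a], self.history[b]
        return {k: (ra.get(k), rb.get(k))
                for k in set(ra) | set(rb) if ra.get(k) != rb.get(k)}

    def current(self):
        return self.history[-1]
```

An admin shell on top of this gets all four of Kyle's operations almost for free; "when did this change, and what changed?" is just a diff between adjacent revisions.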
* Re: config on mons 2017-11-13 18:40 ` John Spray @ 2017-11-14 10:18 ` Piotr Dałek 2017-11-14 11:36 ` John Spray 2017-11-14 13:48 ` Mark Nelson 0 siblings, 2 replies; 26+ messages in thread From: Piotr Dałek @ 2017-11-14 10:18 UTC (permalink / raw) To: John Spray, Kyle Bader Cc: Mark Nelson, Christian Wuerdig, Sage Weil, Ceph Development On 17-11-13 07:40 PM, John Spray wrote: > On Mon, Nov 13, 2017 at 6:20 PM, Kyle Bader <kyle.bader@gmail.com> wrote: >> Configuration files are often driven by configuration management, with >> previous versions stored in some kind of version control systems. We >> should make sure that if configuration moves to the monitors that you >> have some form of history and rollback capabilities. It might be worth >> modeling it similar to network switch configuration shells, a la >> Junos. >> >> * change configuration >> * require commit configuration change >> * ability to rollback N configuration changes >> * ability to diff to configuration versions >> >> That way an admin can figure out when the last configuration change >> was, what changed, and rollback if necessary. > > That is an extremely good idea. > > As a minimal thing, it should be pretty straightforward to implement a > snapshot/rollback. https://thedailywtf.com/articles/The_Complicator_0x27_s_Gloves > I imagine many users today are not so disciplined as to version > control their configs, but this is a good opportunity to push that as > the norm by building it in. Using Ceph on any decent scale actually requires one to use at least Puppet or a similar tool; I wouldn't add any unnecessary complexity to already complex code just because of novice users that are going to have a hard time using Ceph anyway once a disk breaks and needs to be replaced, or when performance goes to hell because users are free to create and remove snapshots every 5 minutes.
And I can already imagine clusters breaking down once the config database/history breaks for whatever reason, including early implementation bugs. Distributing configs through the mon isn't a bad idea by itself; I can imagine having changes to runtime-changeable settings propagated to OSDs without the need for an extra step (actually injecting them) and without the need for a restart, but for anything else, there are already good tools and I see no value in trying to mimic them. -- Piotr Dałek piotr.dalek@corp.ovh.com https://www.ovh.com/us/ ^ permalink raw reply [flat|nested] 26+ messages in thread
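The propagation Piotr describes -- runtime-changeable settings reaching live daemons without injectargs or a restart -- is essentially a publish/subscribe problem. A minimal hypothetical sketch (the option split and all names are invented, not Ceph's mechanism):

```python
class ConfigStore:
    """Mon-side store that pushes changes to subscribed daemons."""

    def __init__(self):
        self.values = {}
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def set(self, key, value):
        self.values[key] = value
        for cb in self.subscribers:
            cb(key, value)            # push, instead of waiting to be asked


class Daemon:
    # invented split: only these options may change at runtime
    RUNTIME_CHANGEABLE = {"debug_osd", "osd_max_backfills"}

    def __init__(self, store):
        self.config = {}
        store.subscribe(self.on_config_change)

    def on_config_change(self, key, value):
        if key in self.RUNTIME_CHANGEABLE:
            self.config[key] = value  # applied live, no restart
        # other options would only be picked up at the next restart
```

The point of the split is that a setting the daemon cannot safely apply live is stored but deferred, while debug levels and similar knobs take effect on the next operation.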
* Re: config on mons 2017-11-14 10:18 ` Piotr Dałek @ 2017-11-14 11:36 ` John Spray 2017-11-14 13:58 ` Piotr Dałek 2017-11-14 14:33 ` Mark Nelson 2017-11-14 13:48 ` Mark Nelson 1 sibling, 2 replies; 26+ messages in thread From: John Spray @ 2017-11-14 11:36 UTC (permalink / raw) To: Piotr Dałek Cc: Kyle Bader, Mark Nelson, Christian Wuerdig, Sage Weil, Ceph Development On Tue, Nov 14, 2017 at 10:18 AM, Piotr Dałek <piotr.dalek@corp.ovh.com> wrote: > On 17-11-13 07:40 PM, John Spray wrote: >> >> On Mon, Nov 13, 2017 at 6:20 PM, Kyle Bader <kyle.bader@gmail.com> wrote: >>> >>> Configuration files are often driven by configuration management, with >>> previous versions stored in some kind of version control systems. We >>> should make sure that if configuration moves to the monitors that you >>> have some form of history and rollback capabilities. It might be worth >>> modeling it similar to network switch configuration shells, a la >>> Junos. >>> >>> * change configuration >>> * require commit configuration change >>> * ability to rollback N configuration changes >>> * ability to diff to configuration versions >>> >>> That way an admin can figure out when the last configuration change >>> was, what changed, and rollback if necessary. >> >> >> That is an extremely good idea. >> >> As a minimal thing, it should be pretty straightforward to implement a >> snapshot/rollback. > > > https://thedailywtf.com/articles/The_Complicator_0x27_s_Gloves > >> I imagine many users today are not so disciplined as to version >> control their configs, but this is a good opportunity to push that as >> the norm by building it in. 
> > > Using Ceph on any decent scale actually requires one to use at least Puppet > or similar tool, I wouldn't add any unnecessary complexity to already > complex code just because of novice users that are going to have hard time > using Ceph anyway once a disk breaks and needs to be replaced, or when > performance goes to hell because users are free to create and remove > snapshots every 5 minutes. All of the experienced users were novice users once -- making Ceph work well for those people is worthwhile. It's not easy to build things that are easy enough for a newcomer but also powerful enough for the general case, but it is worth doing. When we have to trade internal complexity vs. complexity at interfaces, it's generally better to keep the interfaces simple. Currently a Ceph cluster with 1000 OSDs has 1000 places to input the configuration, and no one place that a person can ask "what is setting X on my OSDs?". Even when they look at a ceph.conf file, they can't be sure that those are really the values in use (has the service restarted since the file was updated?) or that they will ever be (are they invalid values that Ceph will reject on load?). The "dump a text file in /etc" interface looks simple on the face of it, but is actually quite complex when you look to automate a Ceph cluster from a central user interface, or build more intelligence into Ceph for avoiding dangerous configurations. It's also painful for non-expert users who are required to type precisely correct syntax into that text file. > And I can already imagine clusters breaking down once config > database/history breaks for whatever reason, including early implementation > bugs. 
> > Distributing configs through mon isn't bad idea by itself, I can imagine > having changes to runtime-changeable settings propagated to OSDs without the > need for extra step (actually injecting them) and without the need for > restart, but for anything else, there are already good tools and I see no > value in trying to mimic them. Remember that the goal here is not to just invent an alternative way of distributing ceph.conf. Even Puppet is overkill for that! The goal is to change the way configuration is defined in Ceph, so that there is a central point of truth for how the cluster is configured, which will enable us to create a user experience that is more robust, and an interface that enables building better interactive tooling on top of Ceph. When it comes to using something like Puppet as that central point of truth, there are two major problems with that: - If someone wants to write a GUI, they would need to integrate with your Puppet, someone else's Chef, someone else's Ansible, etc -- a lot of work, and in many cases the interfaces for doing it don't even exist (believe me, I've tried writing dashboards that drove Puppet in the past). - If Ceph wants to validate configuration options, and say "No, that setting is no good" when someone tries to change something, we can't, because we're not hooked in to Puppet at the point that the user is changing the setting. The ultimate benefit to you is that by making Ceph easier to use, we grow our community, and we grow the population of people who want to invest in Ceph (all of it, not just the new user friendly bits). John ^ permalink raw reply [flat|nested] 26+ messages in thread
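The set-time validation John mentions is only possible when the change flows through a component that knows each option's schema. A hypothetical sketch of the idea, with an invented option table (not Ceph's actual option metadata):

```python
# Invented schema entries, for illustration only.
SCHEMA = {
    "debug_osd":         {"type": int, "min": 0, "max": 20},
    "osd_max_backfills": {"type": int, "min": 1, "max": 64},
}

def validate(option, raw):
    """Reject a bad value at change time, before any daemon sees it."""
    spec = SCHEMA.get(option)
    if spec is None:
        raise ValueError(f"unknown option: {option}")
    try:
        value = spec["type"](raw)
    except (TypeError, ValueError):
        raise ValueError(f"{option}: {raw!r} is not a {spec['type'].__name__}")
    if spec["min"] is not None and value < spec["min"]:
        raise ValueError(f"{option}: {value} is below minimum {spec['min']}")
    if spec["max"] is not None and value > spec["max"]:
        raise ValueError(f"{option}: {value} is above maximum {spec['max']}")
    return value
```

A configuration-management tool that writes ceph.conf can't give this feedback at edit time; a central store that owns the schema can say "no, that setting is no good" before committing it.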
* Re: config on mons 2017-11-14 11:36 ` John Spray @ 2017-11-14 13:58 ` Piotr Dałek 2017-11-14 16:24 ` Sage Weil 2017-11-14 14:33 ` Mark Nelson 1 sibling, 1 reply; 26+ messages in thread From: Piotr Dałek @ 2017-11-14 13:58 UTC (permalink / raw) To: John Spray Cc: Kyle Bader, Mark Nelson, Christian Wuerdig, Sage Weil, Ceph Development On 17-11-14 12:36 PM, John Spray wrote: > On Tue, Nov 14, 2017 at 10:18 AM, Piotr Dałek <piotr.dalek@corp.ovh.com> wrote: >> On 17-11-13 07:40 PM, John Spray wrote: >>> I imagine many users today are not so disciplined as to version >>> control their configs, but this is a good opportunity to push that as >>> the norm by building it in. >> >> Using Ceph on any decent scale actually requires one to use at least Puppet >> or similar tool, I wouldn't add any unnecessary complexity to already >> complex code just because of novice users that are going to have hard time >> using Ceph anyway once a disk breaks and needs to be replaced, or when >> performance goes to hell because users are free to create and remove >> snapshots every 5 minutes. > > All of the experienced users were novice users once -- making Ceph > work well for those people is worthwhile. It's not easy to build > things that are easy enough for a newcomer but also powerful enough > for the general case, but it is worth doing. > > When we have to trade internal complexity vs. complexity at > interfaces, it's generally better to keep the interfaces simple. > Currently a Ceph cluster with 1000 OSDs has 1000 places to input the > configuration, and no one place that a person can ask "what is setting > X on my OSDs?". Even when they look at a ceph.conf file, they can't > be sure that those are really the values in use (has the service > restarted since the file was updated?) or that they will ever be (are > they invalid values that Ceph will reject on load?). 
Well, at least I understand now why my config diff patch (https://github.com/ceph/ceph/pull/18586) is not interesting to reviewers. ;) > The "dump a text file in /etc" interface looks simple on the face of > it, but is actually quite complex when you look to automate a Ceph > cluster from a central user interface, or build more intelligence into > Ceph for avoiding dangerous configurations. It's also painful for > non-expert users who are required to type precisely correct syntax > into that text file. Anybody who is overwhelmed by an ini-style config file should be kept 100km away from any datacentre and have their shell access rights revoked ASAP. Using Ceph (or any kind of SDN-like software) in production requires a few years as an admin under one's belt, and trying to change that will only cause more grief and frustration for future new users. Ceph already has a feature designed with network switch configuration newbies in mind -- it shouldn't. >> And I can already imagine clusters breaking down once config >> database/history breaks for whatever reason, including early implementation >> bugs. >> >> Distributing configs through mon isn't bad idea by itself, I can imagine >> having changes to runtime-changeable settings propagated to OSDs without the >> need for extra step (actually injecting them) and without the need for >> restart, but for anything else, there are already good tools and I see no >> value in trying to mimic them. > > Remember that the goal here is not to just invent an alternative way > of distributing ceph.conf. Even Puppet is overkill for that! The Of course! This bash oneliner: for i in {1..4}; do scp ~/cluster_dev/ceph.conf ceph@node$i:/etc/ceph/; done; is more than enough to distribute config from some central place to 4 nodes. But nobody sane does this, because anything that's not automated is prone to human error.
So no, using Puppet is not overkill, because it does its job and is a familiar way of doing this for many more users than just users of Ceph. Still, I'm not opposing distributing Ceph configs through mons, because that's actually useful. > goal is to change the way configuration is defined in Ceph, so that > there is a central point of truth for how the cluster is configured, > which will enable us to create a user experience that is more robust, > and an interface that enables building better interactive tooling on > top of Ceph. > > When it comes to using something like Puppet as that central point of > truth, there are two major problems with that: > - If someone wants to write a GUI, they would need to integrate with > your Puppet, someone else's Chef, someone else's Ansible, etc -- a lot > of work, and in many cases the interfaces for doing it don't even > exist (believe me, I've tried writing dashboards that drove Puppet in > the past). Usually when someone needs a GUI to deploy a Ceph cluster, they need to deploy much more than just Ceph. They need to configure network interfaces, storage, kernel, monitoring, etc. etc., so they need to deal with Puppet or Chef (or anything) anyway. > - If Ceph wants to validate configuration options, and say "No, that > setting is no good" when someone tries to change something, we can't, > because we're not hooked in to Puppet at the point that the user is > changing the setting. One can use the ceph-conf tool to validate config syntax, because it shares the config code with the daemons. And with recent config code changes, it's even possible to validate values. But it's true, validating configuration before pushing it to production is tricky at the moment. > The ultimate benefit to you is that by making Ceph easier to use, we > grow our community, and we grow the population of people who want to > invest in Ceph (all of it, not just the new user friendly bits). True, more users mean a tighter bug sieve.
But attracting users with ease of use is one thing; reinventing wheels AND asking existing users to use those reinvented wheels at the same time is another. Remember that I was responding to the idea of a built-in mini-git/mini-svn. -- Piotr Dałek piotr.dalek@corp.ovh.com https://www.ovh.com/us/ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: config on mons 2017-11-14 13:58 ` Piotr Dałek @ 2017-11-14 16:24 ` Sage Weil 0 siblings, 0 replies; 26+ messages in thread From: Sage Weil @ 2017-11-14 16:24 UTC (permalink / raw) To: Piotr Dałek Cc: John Spray, Kyle Bader, Mark Nelson, Christian Wuerdig, Ceph Development [-- Attachment #1: Type: TEXT/PLAIN, Size: 3227 bytes --] On Tue, 14 Nov 2017, Piotr Dałek wrote: > On 17-11-14 12:36 PM, John Spray wrote: > > On Tue, Nov 14, 2017 at 10:18 AM, Piotr Dałek <piotr.dalek@corp.ovh.com> > > wrote: > > > On 17-11-13 07:40 PM, John Spray wrote: > > > > I imagine many users today are not so disciplined as to version > > > > control their configs, but this is a good opportunity to push that as > > > > the norm by building it in. > > > > > > Using Ceph on any decent scale actually requires one to use at least > > > Puppet or similar tool, I wouldn't add any unnecessary complexity to > > > already complex code just because of novice users that are going to > > > have hard time using Ceph anyway once a disk breaks and needs to be > > > replaced, or when performance goes to hell because users are free to > > > create and remove snapshots every 5 minutes. > > > > All of the experienced users were novice users once -- making Ceph > > work well for those people is worthwhile. It's not easy to build > > things that are easy enough for a newcomer but also powerful enough > > for the general case, but it is worth doing. > > > > When we have to trade internal complexity vs. complexity at > > interfaces, it's generally better to keep the interfaces simple. > > Currently a Ceph cluster with 1000 OSDs has 1000 places to input the > > configuration, and no one place that a person can ask "what is setting > > X on my OSDs?". Even when they look at a ceph.conf file, they can't > > be sure that those are really the values in use (has the service > > restarted since the file was updated?) or that they will ever be (are > > they invalid values that Ceph will reject on load?). 
> > Well, at least I understand now why my config diff patch > (https://github.com/ceph/ceph/pull/18586) is not interesting to reviewers. ;) Oh, I hadn't seen this. (Don't read too much into a lack of reviews or comments!) I like the diff local command, not sure about the file one. I'm in the midst of rewriting a bunch of this code in a preparatory cleanup for the other config changes... I'll post a PR with just the cleanup portions shortly. I think the big question is whether we can go all-in on mon configs or whether we need to maintain a traditional ceph.conf option as well. I'm of two minds here. I think it's pretty straightforward to get the transparency/reporting that John is after by making daemons report running config but not necessarily pull mon configs... and we probably want/need that anyway to allow e.g. 'ceph daemon <name> config ...' overrides, and for the upgrade path. I'm just worried about an ever-expanding menu of options. We can't simply throw up our hands and say this is out of scope and administrators need to be able to handle this on their own. I had this attitude for a long time, and as a result Ceph has a reputation for being hard to install, hard to configure, and hard to manage. This limits adoption, makes it easy for users to make mistakes, and hurts the project. We can do better. Lots of other projects and storage systems *do* do better. Let's just be smart about what we implement so that we're solving the usability and transparency problems without hobbling power users. sage ^ permalink raw reply [flat|nested] 26+ messages in thread
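The "daemons report running config" idea can be sketched as a mon-side map of reported configs, diffed against the central database -- which is enough to answer "what is setting X on my OSDs?" and to flag daemons running stale values. A hypothetical illustration, not the planned implementation:

```python
class Mon:
    """Toy mon that collects each daemon's reported running config."""

    def __init__(self, config_db):
        self.config_db = config_db    # desired values in the central store
        self.running = {}             # daemon name -> last reported config

    def report(self, daemon, config):
        # a daemon would send this on startup and after applying changes
        self.running[daemon] = dict(config)

    def setting(self, option):
        # answers "what is setting X on my daemons?" cluster-wide
        return {d: cfg.get(option) for d, cfg in self.running.items()}

    def stale(self, daemon):
        # options whose running value differs from the central store
        cfg = self.running.get(daemon, {})
        return {k: (cfg.get(k), want)
                for k, want in self.config_db.items()
                if cfg.get(k) != want}
```

Note this gives the transparency without requiring daemons to *pull* config from the mon: the central store can be advisory while still exposing a single place to ask what is actually in effect.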
* Re: config on mons 2017-11-14 11:36 ` John Spray 2017-11-14 13:58 ` Piotr Dałek @ 2017-11-14 14:33 ` Mark Nelson 2017-11-14 16:37 ` Kyle Bader 1 sibling, 1 reply; 26+ messages in thread From: Mark Nelson @ 2017-11-14 14:33 UTC (permalink / raw) To: John Spray, Piotr Dałek; +Cc: Kyle Bader, Sage Weil, Ceph Development On 11/14/2017 05:36 AM, John Spray wrote: > On Tue, Nov 14, 2017 at 10:18 AM, Piotr Dałek <piotr.dalek@corp.ovh.com> wrote: >> On 17-11-13 07:40 PM, John Spray wrote: >>> >>> On Mon, Nov 13, 2017 at 6:20 PM, Kyle Bader <kyle.bader@gmail.com> wrote: >>>> >>>> Configuration files are often driven by configuration management, with >>>> previous versions stored in some kind of version control systems. We >>>> should make sure that if configuration moves to the monitors that you >>>> have some form of history and rollback capabilities. It might be worth >>>> modeling it similar to network switch configuration shells, a la >>>> Junos. >>>> >>>> * change configuration >>>> * require commit configuration change >>>> * ability to rollback N configuration changes >>>> * ability to diff to configuration versions >>>> >>>> That way an admin can figure out when the last configuration change >>>> was, what changed, and rollback if necessary. >>> >>> >>> That is an extremely good idea. >>> >>> As a minimal thing, it should be pretty straightforward to implement a >>> snapshot/rollback. >> >> >> https://thedailywtf.com/articles/The_Complicator_0x27_s_Gloves >> >>> I imagine many users today are not so disciplined as to version >>> control their configs, but this is a good opportunity to push that as >>> the norm by building it in. 
>> >> Using Ceph on any decent scale actually requires one to use at least Puppet >> or similar tool, I wouldn't add any unnecessary complexity to already >> complex code just because of novice users that are going to have hard time >> using Ceph anyway once a disk breaks and needs to be replaced, or when >> performance goes to hell because users are free to create and remove >> snapshots every 5 minutes. > > All of the experienced users were novice users once -- making Ceph > work well for those people is worthwhile. It's not easy to build > things that are easy enough for a newcomer but also powerful enough > for the general case, but it is worth doing. > > When we have to trade internal complexity vs. complexity at > interfaces, it's generally better to keep the interfaces simple. I've seen too many examples both in our code and in other projects where that kind of internal complexity leaks out and makes things worse. If we want to reduce complexity, we need to reduce complexity. I'm not against having the mon centrally report state. I think it's a great idea. Management I'm not sold on; see below. > Currently a Ceph cluster with 1000 OSDs has 1000 places to input the > configuration, and no one place that a person can ask "what is setting > X on my OSDs?". Even when they look at a ceph.conf file, they can't > be sure that those are really the values in use (has the service > restarted since the file was updated?) or that they will ever be (are > they invalid values that Ceph will reject on load?). How many folks with 1000 OSD clusters are manually managing configuration files though? These are the kinds of customers that have dedicated linux/storage administrators on staff that have preferences regarding how they do things. When I was managing distributed storage systems, few things angered me more than trying to deal with each storage vendor's custom management systems.
I was never particularly concerned with being able to manage (user-facing) state on my own. What I was *very* concerned about was bug-ridden code, shipped out at the last minute so the vendor could checkbox a feature, that I couldn't easily work around. A particular vendor's Lustre HA management/stonith solution comes to mind. They weren't the only one though. We had a variety of interesting and horrific issues with other non-Lustre storage too. The worst cases were the ones where the solution could have been fast/easy but we had to go through all kinds of gymnastics to circumvent the vendor's bad behavior. > The "dump a text file in /etc" interface looks simple on the face of > it, but is actually quite complex when you look to automate a Ceph > cluster from a central user interface, or build more intelligence into > Ceph for avoiding dangerous configurations. It's also painful for > non-expert users who are required to type precisely correct syntax > into that text file. > This feels a bit like a proxy war over whether we are designing a storage appliance or a traditional Linux-style service. I'm not convinced we can do both well at the same time. If we want both, maybe we need to think about each as independent products with their own goals/management/code/etc.
The > goal is to change the way configuration is defined in Ceph, so that > there is a central point of truth for how the cluster is configured, > which will enable us to create a user experience that is more robust, > and an interface that enables building better interactive tooling on > top of Ceph. > > When it comes to using something like Puppet as that central point of > truth, there are two major problems with that: > - If someone wants to write a GUI, they would need to integrate with > your Puppet, someone else's Chef, someone else's Ansible, etc -- a lot > of work, and in many cases the interfaces for doing it don't even > exist (believe me, I've tried writing dashboards that drove Puppet in > the past). > - If Ceph wants to validate configuration options, and say "No, that > setting is no good" when someone tries to change something, we can't, > because we're not hooked in to Puppet at the point that the user is > changing the setting. > > The ultimate benefit to you is that by making Ceph easier to use, we > grow our community, and we grow the population of people who want to > invest in Ceph (all of it, not just the new user friendly bits). > > John > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: config on mons 2017-11-14 14:33 ` Mark Nelson @ 2017-11-14 16:37 ` Kyle Bader 2017-11-14 18:01 ` Alfredo Deza 0 siblings, 1 reply; 26+ messages in thread From: Kyle Bader @ 2017-11-14 16:37 UTC (permalink / raw) To: Mark Nelson; +Cc: John Spray, Piotr Dałek, Sage Weil, Ceph Development >>> Using Ceph on any decent scale actually requires one to use at least >>> Puppet >>> or similar tool, I wouldn't add any unnecessary complexity to already >>> complex code just because of novice users that are going to have hard >>> time >>> using Ceph anyway once a disk breaks and needs to be replaced, or when >>> performance goes to hell because users are free to create and remove >>> snapshots every 5 minutes. This discussion reminds me of a heated debate we had in the early days about whether configuration management should handle the provisioning of OSDs, or whether Ceph should have a tool to hide the ugliness. At the time, I was staunchly on the configuration management side. We used this horribleness to create new OSDs: https://github.com/dreamhost-cookbooks/ceph/blob/de5929eb45bda50785aa01181b281e25af0d1785/recipes/osd.rb Today we have ceph-disk (and soon to be ceph-volume)! I still have my reservations about the level of udev wizardry, which is tricky to debug, but it generally works and makes the experience better for the vast majority of operators, regardless. This led to a single method to prepare OSDs that was configuration management agnostic. Nowadays all the Ansible/Chef/Puppet thingers use ceph-disk. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: config on mons 2017-11-14 16:37 ` Kyle Bader @ 2017-11-14 18:01 ` Alfredo Deza 0 siblings, 0 replies; 26+ messages in thread From: Alfredo Deza @ 2017-11-14 18:01 UTC (permalink / raw) To: Kyle Bader Cc: Mark Nelson, John Spray, Piotr Dałek, Sage Weil, Ceph Development On Tue, Nov 14, 2017 at 11:37 AM, Kyle Bader <kyle.bader@gmail.com> wrote: >>>> Using Ceph on any decent scale actually requires one to use at least >>>> Puppet >>>> or similar tool, I wouldn't add any unnecessary complexity to already >>>> complex code just because of novice users that are going to have hard >>>> time >>>> using Ceph anyway once a disk breaks and needs to be replaced, or when >>>> performance goes to hell because users are free to create and remove >>>> snapshots every 5 minutes. > > This discussion reminds me of a heated debate we had in the early days > about whether configuration management should handle the provisioning > of OSDs, or whether Ceph should have a tool to hide the ugliness. At > the time, I was staunchly on the configuration management side. We > used this horribleness to create new OSDs: > > https://github.com/dreamhost-cookbooks/ceph/blob/de5929eb45bda50785aa01181b281e25af0d1785/recipes/osd.rb > > Today we have ceph-disk (and soon to be ceph-volume)! I still have my > reservations about the level of udev wizardry, which is tricky to > debug, but it generally works and makes the experience better the vast > majority of operators, reglardless . This lead to a single method to > prepare OSDs that was configuration management agnostic. Nowadays all > the Ansible/Chef/Puppet thingers use ceph-disk. There is a separation needed here I think, where there are tools and abstractions that work at a local (or close to always local) level. ceph-disk and ceph-volume are good examples of this, since they operate with the context of local devices. At some point during the process they do need to inform the cluster of their operations though (e.g. 
there is a new OSD, register as part of the cluster). So configuration that makes sense for a localized service like ceph-volume (or ceph-disk) makes sense to be on the server itself. That is why there are abstractions via Puppet/Chef/Ansible for these tools, because these tools are cluster-aware, and they are just delegating to localized services. For configuration it might make sense to have this sort of duality, where some settings and configuration make sense for the server but the rest is for the cluster. I'm not sure that everything (exclusively) must be file-based or on the monitors. If we are trying to make sure users are happy with these changes, let's accept/embrace views like the one from Piotr, which doesn't mean throwing away ideas on where we should be headed. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: config on mons 2017-11-14 10:18 ` Piotr Dałek 2017-11-14 11:36 ` John Spray @ 2017-11-14 13:48 ` Mark Nelson 1 sibling, 0 replies; 26+ messages in thread From: Mark Nelson @ 2017-11-14 13:48 UTC (permalink / raw) To: Piotr Dałek, John Spray, Kyle Bader Cc: Christian Wuerdig, Sage Weil, Ceph Development On 11/14/2017 04:18 AM, Piotr Dałek wrote: > On 17-11-13 07:40 PM, John Spray wrote: >> On Mon, Nov 13, 2017 at 6:20 PM, Kyle Bader <kyle.bader@gmail.com> wrote: >>> Configuration files are often driven by configuration management, with >>> previous versions stored in some kind of version control systems. We >>> should make sure that if configuration moves to the monitors that you >>> have some form of history and rollback capabilities. It might be worth >>> modeling it similar to network switch configuration shells, a la >>> Junos. >>> >>> * change configuration >>> * require commit configuration change >>> * ability to rollback N configuration changes >>> * ability to diff to configuration versions >>> >>> That way an admin can figure out when the last configuration change >>> was, what changed, and rollback if necessary. >> >> That is an extremely good idea. >> >> As a minimal thing, it should be pretty straightforward to implement a >> snapshot/rollback. > > https://thedailywtf.com/articles/The_Complicator_0x27_s_Gloves > >> I imagine many users today are not so disciplined as to version >> control their configs, but this is a good opportunity to push that as >> the norm by building it in. > > Using Ceph on any decent scale actually requires one to use at least > Puppet or similar tool, I wouldn't add any unnecessary complexity to > already complex code just because of novice users that are going to have > hard time using Ceph anyway once a disk breaks and needs to be replaced, > or when performance goes to hell because users are free to create and > remove snapshots every 5 minutes. 
> And I can already imagine clusters breaking down once config > database/history breaks for whatever reason, including early > implementation bugs. > > Distributing configs through mon isn't bad idea by itself, I can imagine > having changes to runtime-changeable settings propagated to OSDs without > the need for extra step (actually injecting them) and without the need > for restart, but for anything else, there are already good tools and I > see no value in trying to mimic them. > Those were more or less my thoughts as well. Mark ^ permalink raw reply [flat|nested] 26+ messages in thread
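Kyle's Junos-style workflow quoted earlier in the thread (stage a change, commit it, diff revisions, roll back) can be modeled with very little machinery. The following is a minimal, hypothetical sketch in Python; it is not Ceph code, and every name in it is invented for illustration:

```python
class VersionedConfig:
    """Config store that keeps every committed revision, so an admin can
    see when the config last changed, diff two revisions, and roll back."""

    def __init__(self):
        self._revisions = [{}]   # revision 0 is the empty config
        self._pending = {}       # staged changes, not yet in effect

    def set(self, key, value):
        self._pending[key] = value            # stage only; commit activates

    def commit(self):
        new = dict(self._revisions[-1])
        new.update(self._pending)
        self._revisions.append(new)
        self._pending = {}
        return len(self._revisions) - 1       # new revision id

    def rollback(self, n=1):
        """Revert the last n commits by committing an older revision anew,
        so the rollback itself stays in the history."""
        target = self._revisions[max(0, len(self._revisions) - 1 - n)]
        self._revisions.append(dict(target))
        return len(self._revisions) - 1

    def diff(self, a, b):
        """Return {key: (old, new)} for keys that differ between revisions."""
        ra, rb = self._revisions[a], self._revisions[b]
        keys = set(ra) | set(rb)
        return {k: (ra.get(k), rb.get(k))
                for k in keys if ra.get(k) != rb.get(k)}

cfg = VersionedConfig()
cfg.set("osd/debug_osd", "10")
r1 = cfg.commit()
cfg.set("osd/debug_osd", "20")
r2 = cfg.commit()
print(cfg.diff(r1, r2))   # {'osd/debug_osd': ('10', '20')}
cfg.rollback(1)           # back to debug_osd=10, recorded as a new revision
```

Recording a rollback as a fresh commit (rather than truncating history) mirrors how network-OS config shells behave: the audit trail of "what was in effect when" is never destroyed.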
* Re: config on mons 2017-11-10 15:30 config on mons Sage Weil ` (2 preceding siblings ...) 2017-11-13 4:30 ` Christian Wuerdig @ 2017-11-13 13:23 ` John Spray 2017-11-14 22:21 ` Sage Weil 4 siblings, 0 replies; 26+ messages in thread From: John Spray @ 2017-11-13 13:23 UTC (permalink / raw) To: Sage Weil; +Cc: Ceph Development On Fri, Nov 10, 2017 at 3:30 PM, Sage Weil <sweil@redhat.com> wrote: > I've started on this long-discussed feature! I haven't gotten too far but > you can see what's there so far at > > https://github.com/ceph/ceph/pull/18856 Woohoo! > The first thing perhaps is to finalize what flexibility we want to > support. I've a quick summary at > > http://pad.ceph.com/p/config > > Namely, > > config/option = value # like [global] > config/$type/option = value # like [mon] > config/$type.$id/option = value # like [mon.a] > > There are two new things: > > config/.../class:$classname/option = value > > For OSDs, this matches the device_class. So you can do something like > > config/osd/class:ssd/bluestore_cache_size = 10737418240 # 10gb, woohoo! > > You can also match the crush location: > > config/.../$crushtype:$crushvalue/option = value > > e.g., > > config/osd/rack:foo/debug_osd = 10 # hunting some issue > > This obviously makes sense for OSDs. We can also make it make sense for > non-OSDs since everybody (clients and daemons) has a concept of > crush_location that is a set of key/value pairs like "host=foo rack=bar" > which match the CRUSH hierarchy. In this case, my plan is to make the > initial mon authentication step include the hostname of the host you're > connecting from and then extract the rest of the location by looking > up the host in the CRUSH map. > > The precedence for these is described here: > > https://github.com/ceph/ceph/pull/18856/commits/5abbd0c9e279022f185787238d21eabbbe28e336#diff-344645b5339d494e1839ff1fcaa5cb7dR15 > > > Lots of other thorny issues to consider. For example: > > - What about monitor configs?
If they store their config in paxos, and you > set an option that breaks paxos, how can you change/fix it? For the > moment I'm just ignoring the mons. > > - What about ceph.conf? My thought here is to mark which options are > legal for bootstrap (i.e., used during the initial connection to mon to > authenticate and fetch config), and warn on anything other than that in > ceph.conf. But what about after you connect? Do these options get reset > to default? I can't immediately think of examples of something that would be needed for bootstrap but would also be sane to change later? In general if something is needed for bootstrap I would imagine that the local setting would be authoritative, but I suspect (because you're bringing it up) that there are cases where this doesn't apply... > - Bootstrapping/upgrade: So far my best idea is to make the client share > its config with the mon on startup, and the first time a given daemon > connects the mon will use that to populate its config database. > Thereafter it will be ignored. I hate upgrades :-) This sounds like a sane thing to do. We certainly have to do *something* or we'll have really nasty issues like we had when the crush_location_hook setting changed names. For our two-major-versions commitment from Mimic onwards, I guess that means we leave this mechanism in for the N release too, and then eventually remove in the O release. BTW I learned about the "cockeyed squid" aka "strawberry squid" yesterday, so I think that's a strong candidate for the S name when we get there, just thinking ahead :-) > - OSD startup: lots of stuff happens before we authenticate. I think > there will be a new initial step to fetch config, then do all that work, > then start up for real. And a new option to bypass mon configuration > to avoid that (and for old school folks who don't want centralized > configs... e.g. mon_config = false and everything works as before).
I know I'm on the opinionated end of the spectrum here, but I'm not quite convinced we should leave in a "mon_config = false" option. If we continue to let people use the local file interface through this version, then it's at least another three versions before we can ultimately remove it, whereas if we disable it now (apart from the initial load on upgrade) then we are starting the clock for ultimately removing that plumbing. We do need to support the upgrade path, but if we enable it to optionally run with local config on an ongoing basis then we might be undermining the motivations for building the centralized infrastructure (the confidence/certainty that the value set is a validated thing, and that it is really what is in effect). John > > Feedback welcome! > sage > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 26+ messages in thread
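For concreteness, the mask matching and precedence from the quoted proposal can be sketched as follows. The ordering used here (global < [type] < crush-location mask < device-class mask < [type.id]) is an assumption based on my reading of the thread; the authoritative precedence lives in the PR commit linked above, and all names below are illustrative, not Ceph code:

```python
def effective_value(entries, daemon, option):
    """entries: {section: {option: value}}; daemon describes who is asking.
    Returns the value from the most specific section that matches."""
    def rank(section):
        parts = section.split("/")
        who, masks = parts[0], parts[1:]
        me = f'{daemon["type"]}.{daemon["id"]}'
        if who not in ("global", daemon["type"], me):
            return None                       # section is for someone else
        score = {"global": 0, daemon["type"]: 1, me: 4}[who]
        for m in masks:
            key, _, val = m.partition(":")
            if key == "class":                # device-class mask, OSDs only
                if daemon.get("device_class") != val:
                    return None
                score = max(score, 3)
            else:                             # crush mask, e.g. rack:foo
                if daemon.get("crush_location", {}).get(key) != val:
                    return None
                score = max(score, 2)
        return score

    best, best_rank = None, -1
    for section, opts in entries.items():
        if option not in opts:
            continue
        r = rank(section)
        if r is not None and r > best_rank:
            best, best_rank = opts[option], r
    return best

osd3 = {"type": "osd", "id": "3", "device_class": "ssd",
        "crush_location": {"rack": "foo", "host": "bar"}}
cfg = {
    "global":        {"debug_osd": "0"},
    "osd":           {"debug_osd": "1"},
    "osd/rack:foo":  {"debug_osd": "10"},
    "osd/class:ssd": {"debug_osd": "15"},
    "osd.3":         {"debug_osd": "20"},
}
print(effective_value(cfg, osd3, "debug_osd"))  # 20 (daemon-specific wins)
```

A daemon whose device class or crush location doesn't match a mask simply never sees that section, which is what lets one database serve every daemon in the cluster.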
* Re: config on mons 2017-11-10 15:30 config on mons Sage Weil ` (3 preceding siblings ...) 2017-11-13 13:23 ` John Spray @ 2017-11-14 22:21 ` Sage Weil 2017-11-14 23:45 ` John Spray 4 siblings, 1 reply; 26+ messages in thread From: Sage Weil @ 2017-11-14 22:21 UTC (permalink / raw) To: ceph-devel I've updated the pad at http://pad.ceph.com/p/config After thinking about this a bit more, I think we may need to abandon the idea of a pure ceph.conf-less world. Lots of people already have tooling around ceph.conf; getting rid of it will be an awkward process (even for a one-time upgrade), and I'm not sure we can eliminate it entirely anyway since many options affect the bootstrapping phase, authentication, and so on. Instead, I'm currently partial to giving processes a more nuanced view of their config based on where the value comes from. A single option may include 1- a default value (compiled in) 2- a value from the mon 3- a value from ceph.conf 4- a value set via command line, 'ceph tell', 'ceph daemon ... config set ...', etc. We would always use the highest-priority value on that list. This means that ceph.conf can override the mon, just like a command-line argument overrides ceph.conf. On the flip side of this, all of these values are also reported to the mgr and tracked along with the other daemon state. So regardless of where config values come from, it is all still visible via the CLI, GUI, or whatever else. Further, we can then make the GUI (or CLI or whatever) act on that information to, say, - assimilate ceph.conf values into the mon so that ceph.conf can be removed/abbreviated (i.e., the upgrade/transition path to centralized config) - see override values set via cli (i.e., in gui) - clear override values (i.e., ceph tell <daemon> config rm <name>) - surface a HEALTH_WARN if a CLI or 'config set' override has been set on one or more daemons (so the operator knows the running config is not persistent).
- surface a HEALTH_WARN if a mon option is overridden by a daemon's local ceph.conf file. Notably, the user can also do nothing and the cluster can continue to operate as it always has. The mgr will still have the new visibility into running daemon options, so the GUI experience will still be consistent--they just won't be able to change configs centrally (or rather, those settings won't have any effect if old ceph.confs override them). I think Kyle's revision history suggestion is a great one. I don't have any bright ideas about how this should be managed on the mon side yet, but I agree that it is an important function and should be baked in from day 1. Thoughts? sage ^ permalink raw reply [flat|nested] 26+ messages in thread
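The four source levels above reduce to a simple resolution rule: take the value from the highest-priority source that has one. A hypothetical sketch (not Ceph code), including a helper for the proposed HEALTH_WARN condition where a mon-set value is shadowed by a local source:

```python
# Ascending priority, per the 1-4 list above: compiled-in default,
# then mon database, then local ceph.conf, then runtime override.
SOURCES = ("default", "mon", "file", "override")

def resolve(option_values):
    """option_values: {source: value}. Returns (value, source) in effect."""
    for src in reversed(SOURCES):             # highest priority first
        if src in option_values:
            return option_values[src], src
    raise KeyError("no value from any source")

def shadowed_by_local(option_values):
    """True when the mon set a value but something local wins: this is
    the condition the proposed HEALTH_WARN would surface."""
    return "mon" in option_values and resolve(option_values)[1] != "mon"

# One option as a daemon might report it to the mgr:
debug_osd = {"default": "0/5", "mon": "10", "file": "20"}
value, source = resolve(debug_osd)
print(value, source)   # 20 file -- the local ceph.conf overrides the mon
```

Because the daemon reports the whole {source: value} map rather than just the effective value, the mgr can both display what is running and explain why (which is exactly what the assimilate/clear/warn actions in the list need).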
* Re: config on mons 2017-11-14 22:21 ` Sage Weil @ 2017-11-14 23:45 ` John Spray 2017-11-15 13:32 ` Sage Weil 0 siblings, 1 reply; 26+ messages in thread From: John Spray @ 2017-11-14 23:45 UTC (permalink / raw) To: Sage Weil; +Cc: Ceph Development On Tue, Nov 14, 2017 at 10:21 PM, Sage Weil <sage@newdream.net> wrote: > I've updated the pad at > > http://pad.ceph.com/p/config > > After thinking about this a bit more, I think we may need to abandon the > idea of a pure ceph.conf-less world. Lots of people already have tooling > around ceph.conf, getting rid of it will be an awkward process (even for a > one-time upgrade), and I'm not sure we can eliminate it entirely anyway > since many options affect the bootstrapping phase, authentication, and so > on. > > Instead, I'm currently partial to giving processes a more nuanced view of > their config based on where the value comes from. A single option may > include > > 1- a default value (compiled in) > 2- a value from the mon > 3- a value from ceph.conf > 4- a value set via command line, 'ceph tell', 'ceph daemon ... config set > ...', etc. > > We would always use the highest-priority value on that list. This means > that ceph.conf can override the mon, just like a command-line argument > overrides ceph.conf. I think that if there are some folks who just cannot work without loading local configs onto their nodes, I want to insulate folks working on user interfaces from having to handle the resulting complexity. The folks pushing config files out to their nodes presumably have their own preferred way of dealing with this stuff, so they shouldn't miss it from the Ceph UI. In that spirit, I think that we don't need to have a per-setting granularity of what is overridden and what isn't: daemons should just flag whether they are consuming the mon config (default), or whether they are using local ceph.conf. That way, folks building UIs can grey things out at a whole-page level if the cluster is not using centralized config. 
It sacrifices some flexibility for the people who want to use local conf for some things but central conf for others (do those people exist?) but I think it's worth it to avoid having a complicated UI that has to worry about displaying and communicating the subtle distinctions between those 1/2/3/4 values which might all be different. The upshot would be that UI developers could build elements that work as expected by default for systems that use the central config, but safely disable themselves on systems where the user has gone their own way and pushed out local configuration. > On the flip side of this, all of these values are also reported to the mgr > and tracked along with the other daemon state. So regardless of where > config values come from, it is all still visible via the CLI, GUI, or > whatever else. > > Further, we can then make the GUI (or CLI or whatever) act on that > information to, say, > > - assimilate ceph.conf values into the mon so that ceph.cong can be > removed/abbreviated (i.e., the upgrade/transition path to centralized > config) > - see override values set via cli (i.e., in gui) > - clear override values (i.e., ceph tell <daemon> config rm <name>) > - surface a HEALTH_WARN if a CLI or 'config set' override has been > set on one or more daemons (so the operator knows the running config is > not persistent). This comes back to our recurring discussion about whether a HEALTH_INFO level should exist: I'm increasingly of the opinion that when we run into things like this, it's nature's way of telling us that maybe our underlying model is weird (in this case, maybe we didn't need to have the concept of ephemeral configuration settings in the system at all). 
Maybe ephemeral config changes should be treated the same way I propose to treat local overrides: the daemon reports just that it has been overridden, and the GUI goes hands-off and does not attempt to communicate the story to the user "Well, you see, it's currently set to xyz until the next restart, at which point it will revert to abc, that is unless you have a local ceph.conf in which case...". The ability to roll back config changes seems like it would be the "right way" to accomplish having some config settings that we set and then subsequently revert, rather than having the revert happen implicitly when the daemon next restarts (intentionally or not). > - surface a HEALTH_WARN if a mon option is overridden by a daemon's local > ceph.conf file. Hmm, this makes me a bit confused, as if you're still thinking of the local ceph.conf being a deprecated/upgrade thing? If it's really permitted in general then it wouldn't make sense for it to be a WARN. John > Notably, the user can also do nothing and the cluster can continue to > operate as it always has. The mgr will still have the new visibility into > running daemon options, so the GUI experience will still be > consistent--they just won't be able to change configs centrally (or > rather, those settings won't have any effect if old ceph.confs override > them). > > I think Kyle's revision history suggestion is a great one. I don't have > any bright ideas about how this should be managed on the mon side yet, but > I agree that it is an important function and should be baked in from day > 1. > > Thoughts? > sage ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: config on mons 2017-11-14 23:45 ` John Spray @ 2017-11-15 13:32 ` Sage Weil 2017-11-15 17:16 ` Lars Marowsky-Bree 0 siblings, 1 reply; 26+ messages in thread From: Sage Weil @ 2017-11-15 13:32 UTC (permalink / raw) To: John Spray; +Cc: Ceph Development On Tue, 14 Nov 2017, John Spray wrote: > On Tue, Nov 14, 2017 at 10:21 PM, Sage Weil <sage@newdream.net> wrote: > > I've updated the pad at > > > > http://pad.ceph.com/p/config > > > > After thinking about this a bit more, I think we may need to abandon the > > idea of a pure ceph.conf-less world. Lots of people already have tooling > > around ceph.conf, getting rid of it will be an awkward process (even for a > > one-time upgrade), and I'm not sure we can eliminate it entirely anyway > > since many options affect the bootstrapping phase, authentication, and so > > on. > > > > Instead, I'm currently partial to giving processes a more nuanced view of > > their config based on where the value comes from. A single option may > > include > > > > 1- a default value (compiled in) > > 2- a value from the mon > > 3- a value from ceph.conf > > 4- a value set via command line, 'ceph tell', 'ceph daemon ... config set > > ...', etc. > > > > We would always use the highest-priority value on that list. This means > > that ceph.conf can override the mon, just like a command-line argument > > overrides ceph.conf. > > I think that if there are some folks who just cannot work without > loading local configs onto their nodes, I want to insulate folks > working on user interfaces from having to handle the resulting > complexity. The folks pushing config files out to their nodes > presumably have their own preferred way of dealing with this stuff, so > they shouldn't miss it from the Ceph UI. 
> > In that spirit, I think that we don't need to have a per-setting > granularity of what is overridden and what isn't: daemons should just > flag whether they are consuming the mon config (default), or whether > they are using local ceph.conf. That way, folks building UIs can grey > things out at a whole-page level if the cluster is not using > centralized config. It sacrifices some flexibility for the people who > want to use local conf for some things but central conf for others (do > those people exist?) but I think it's worth it to avoid having a > complicated UI that has to worry about displaying and communicating > the subtle distinctions between those 1/2/3/4 values which might all > be different. The problem is I think non-trivial ceph.confs are going to still be required in many valid situations, since there are a load of settings that affect how to connect and authenticate with the mon. For most users the defaults will do and it will just be 'mon_host' (or maybe they use DNS for this), but any nontrivial authentication settings (e.g., kerberos is coming) or messenger types will require something. (We also have to allow local overrides to ensure that the mon config can't brick the cluster by setting the internal mon settings, like paxos_*, to some bad value.) I could see us combining 3-4 to simplify, though; the fact that a setting will go away on daemon restart isn't that interesting or normal, and presumably the cluster is *already* in a state where the mon and ceph.conf configs aren't fighting each other, so any disparity there will be seen for what it is. I think the reporting to mgr to make a distinction needs to be there, though, because (1) to make a transition we want to see the delta between what the daemon has running and what the mon wants, and (2) I don't think we should make things like 'ceph daemon ... config set ...' turn into a request to the monitor to set a config so that the daemon will get a corresponding config update. 
These are low-level commands that are important for debugging/fixing issues and we shouldn't break them. > The upshot would be that UI developers could build elements that work > as expected by default for systems that use the central config, but > safely disable themselves on systems where the user has gone their own > way and pushed out local configuration. I think the scenarios aren't too complex for the UI: - the mon config doesn't match running config. - button to update mon config to match running config, and/or - button to clear running config so that it matches mon config - the mon config is overridden by local ceph.conf - button to update/abbreviate/remove local ceph.conf settings so that mon can drive We can either keep the distinction for 3-4 and implement both, or blur them and the 'clear running config' just won't do anything. Or the UI can not implement those buttons at all and just show that there is a disparity and leave it to the user to fix (or not)...? > > On the flip side of this, all of these values are also reported to the mgr > > and tracked along with the other daemon state. So regardless of where > > config values come from, it is all still visible via the CLI, GUI, or > > whatever else. > > > > Further, we can then make the GUI (or CLI or whatever) act on that > > information to, say, > > > > - assimilate ceph.conf values into the mon so that ceph.conf can be > > removed/abbreviated (i.e., the upgrade/transition path to centralized > > config) > > > - see override values set via cli (i.e., in gui) > > - clear override values (i.e., ceph tell <daemon> config rm <name>) > > - surface a HEALTH_WARN if a CLI or 'config set' override has been > > set on one or more daemons (so the operator knows the running config is > > not persistent).
> > This comes back to our recurring discussion about whether a > HEALTH_INFO level should exist: I'm increasingly of the opinion that > when we run into things like this, it's nature's way of telling us > that maybe our underlying model is weird (in this case, maybe we > didn't need to have the concept of ephemeral configuration settings in > the system at all). > > Maybe ephemeral config changes should be treated the same way I > propose to treat local overrides: the daemon reports just that it has > been overridden, and the GUI goes hands-off and does not attempt to > communicate the story to the user "Well, you see, it's currently set > to xyz until the next restart, at which point it will revert to abc, > that is unless you have a local ceph.conf in which case...". I don't think the restart subtlety needs to be communicated... > The ability to roll back config changes seems like it would be the > "right way" to accomplish having some config settings that we set and > then subsequently revert, rather than having the revert happen > implicitly when the daemon next restarts (intentionally or not). Agreed. We should be setting things like debug_osd=20 for diagnosing an issue via the mon. > > - surface a HEALTH_WARN if a mon option is overridden by a daemon's local > > ceph.conf file. > > Hmm, this makes me a bit confused, as if you're still thinking of the > local ceph.conf being a deprecated/upgrade thing? If it's really > permitted in general then it wouldn't make sense for it to be a WARN. This warning would only appear if the mon sets option foo to A and the conf sets the same option to B... sage ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: config on mons 2017-11-15 13:32 ` Sage Weil @ 2017-11-15 17:16 ` Lars Marowsky-Bree 2017-11-15 21:26 ` Sage Weil 0 siblings, 1 reply; 26+ messages in thread From: Lars Marowsky-Bree @ 2017-11-15 17:16 UTC (permalink / raw) To: Ceph Development On 2017-11-15T13:32:55, Sage Weil <sage@newdream.net> wrote: I am strongly in favor of moving the config to the MONs, and deprecating ceph.conf - maybe a ceph-bootstrap.conf for connecting to the MONs to get it, but that's it. In a previous life, I helped design a Cluster Information Base to reduce config drift - a central information store is vastly superior to files copied around, whether that happens manually or from a config management system. It's always outdated *somewhere*, and Ceph already has the concept of the MONs having maps and a concurrency/consistency algorithm for them (beloved PAXOS), so it doesn't add any significant complexity. So for once, I vote for building it in. Don't add etcd/consul. We want strong consistency here, and can build on stuff already there. If Ceph needed to invent this from scratch, sure, but here we can build on something existing that needs to work anyway or we're screwed. > > > 1- a default value (compiled in) > > > 2- a value from the mon > > > 3- a value from ceph.conf > > > 4- a value set via command line, 'ceph tell', 'ceph daemon ... config set > > > ...', etc. I'm opposed to 3 and 4. I *can* see the need to override a value on a per-host or on a per-daemon instance basis (including combinations thereof, e.g., all OSDs on node X). (Back when, we also expected these to be way more frequently needed; to this day, I can count on my fingers the times I needed per-host overrides, I think; really the only use case where this happens more often are debug flags.) But if you want any sort of consistency, those modify the settings in the respective map on the MON, and the daemon *then* gets that one from the single authoritative source of truth.
> (We also have to allow local overrides to ensure that the mon config can't > brick the cluster by setting the internal mon settings, like paxos_*, to > some bad value.) Valid point; but perhaps this could be solved by allowing the MONs to start up in a "safe" mode, too? > I think the reporting to mgr to make a distinction needs to be there, > though, because (1) to make a transition we want to see the delta between > what the daemon has running and what the mon wants, and (2) I don't think > we should make things like 'ceph daemon ... config set ...' turn into a > request to the monitor to set a config so that the daemon will get a > corresponding config update. These are low-level commands that are > important for debugging/fixing issues and I we shouldn't break them. I'm not perfectly sure about this one, see above. I think having a single channel through which config updates reach daemons might be worth it. > Or the UI can not implement those buttons at all and just show that there > is a disaparity and leave it to the user to fix (or not)...? ... or make the disparity go away through a single source of truth. Regards, Lars -- SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) "Experience is the name everyone gives to their mistakes." -- Oscar Wilde ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: config on mons 2017-11-15 17:16 ` Lars Marowsky-Bree @ 2017-11-15 21:26 ` Sage Weil 2017-11-30 22:31 ` Gregory Farnum 0 siblings, 1 reply; 26+ messages in thread From: Sage Weil @ 2017-11-15 21:26 UTC (permalink / raw) To: Lars Marowsky-Bree; +Cc: Ceph Development On Wed, 15 Nov 2017, Lars Marowsky-Bree wrote: > On 2017-11-15T13:32:55, Sage Weil <sage@newdream.net> wrote: > > > > 1- a default value (compiled in) > > > > 2- a value from the mon > > > > 3- a value from ceph.conf > > > > 4- a value set via command line, 'ceph tell', 'ceph daemon ... config set > > > > ...', etc. > > I'm opposed to 3 and 4. > > I *can* see the need to override a value on a per-host or on a > per-daemon instance basis (including combinations thereof, e.g., all > OSDs on node X). (Back when, we also expected these to be way more > frequently needed; to this day, I can count on my fingers the times I > needed per-host overrides, I think; really the only use case where this > happens more often are debug flags.) > > But if you want any sort of consistency, those modify the settings in > the respective map on the MON, and the daemon *then* gets that one from > the single authoritative source of truth. The problem is this makes the system more fragile; in a complex distributed system, with the types of things I've needed to diagnose and debug in the past, I am very nervous about taking away the ability to force a config value locally (e.g., via 'ceph daemon ...', when it is having trouble pulling config from the mon for whatever reason). ... As far as broad principles go, I think we are mostly in alignment: (1) we want centrally managed config, (2) managed by the mons, for (3) a simplified user experience, and (4) an easy upgrade path to get there. I think the implementation required to get that is roughly what I described, and although it sounds complicated, none of the key pieces can really be taken away. 1. Daemons report running config to mgr.
We need some form of this no matter what for the upgrade/transition. Beyond that, I think it's still important in order to tell whether the "single source of truth" is something that even can be true: (1) some options cannot be changed at runtime and require a restart, (2) some options may have illegal/invalid values, (3) the set of allowed options may change build to build, so something that used to be valid may not be anymore or may not be if the daemon is newer or older than the mon. 2. Local overrides are possible. This can/should be rare and reserved for extraordinary circumstances, but I don't feel comfortable removing this. In a complex system there are many things that could prevent the daemon from speaking to the mon to get an updated config. 3. ceph.conf is allowed in at least some cases. This is more or less a given on the mon in order to handle bootstrapping and to resolve bad changes to the mon config (that, say, break paxos itself). There are also still cases where initial options are needed to fetch the rest of the config from the mon. And during the transition period it is required. I think the real question is whether, post-nautilus, we continue to encourage or allow ceph.conf for daemons. I think this is a decision that amounts to turning it off in certain circumstances to force users into a better world, but it's not something we can do away with to simplify the world today. We can still ignore this possibility from the GUI, perhaps, but I think we're better off lumping it together with #2 and doing something extremely simple like, say, putting a (!) icon next to options that the daemon isn't respecting (because they have overridden it, or need to restart, or it is not valid, or whatever else). I can't see a way to change 1-3 above without a very different approach (like, using something external to the mons). Am I missing something? sage ^ permalink raw reply [flat|nested] 26+ messages in thread
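Point 1 implies the mgr can compute a delta between the config the mon wants a daemon to run and what the daemon reports it is actually running, covering the restart-required, invalid-value, and unknown-option cases listed. A hypothetical sketch; the function, field names, and reason strings are all invented for illustration:

```python
def config_delta(desired, running, needs_restart=()):
    """Compare the mon's desired config against a daemon's reported
    running config. Returns (option, desired, running, reason) tuples
    for every discrepancy, so a UI can flag options the daemon isn't
    respecting (the proposed (!) icon)."""
    delta = []
    for opt, want in desired.items():
        have = running.get(opt)
        if have == want:
            continue
        if have is None:
            reason = "unknown to daemon"       # e.g. older/newer build
        elif opt in needs_restart:
            reason = "restart required"        # not runtime-changeable
        else:
            reason = "overridden or not applied"
        delta.append((opt, want, have, reason))
    return delta

desired = {"debug_osd": "10", "osd_max_backfills": "2", "new_fancy_opt": "x"}
running = {"debug_osd": "10", "osd_max_backfills": "1"}
for row in config_delta(desired, running, needs_restart={"osd_max_backfills"}):
    print(row)
# ('osd_max_backfills', '2', '1', 'restart required')
# ('new_fancy_opt', 'x', None, 'unknown to daemon')
```

An empty delta is what would let the mgr assert that the "single source of truth" actually holds for a given daemon; a non-empty one is exactly the transition-period view needed while assimilating old ceph.conf files.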
* Re: config on mons 2017-11-15 21:26 ` Sage Weil @ 2017-11-30 22:31 ` Gregory Farnum 2017-12-01 17:53 ` Sage Weil 0 siblings, 1 reply; 26+ messages in thread From: Gregory Farnum @ 2017-11-30 22:31 UTC (permalink / raw) To: Sage Weil, John Spray; +Cc: Ceph Development I'm resurrecting this thread since it wasn't clear a consensus was reached, I was out on vacation while it was happening, and it doesn't look like there's been much work done yet to render any discussion obsolete. Mostly, I agree with Sage's last email, but I think I have a few other points to raise. :) On Wed, Nov 15, 2017 at 1:26 PM, Sage Weil <sage@newdream.net> wrote: > On Wed, 15 Nov 2017, Lars Marowsky-Bree wrote: >> On 2017-11-15T13:32:55, Sage Weil <sage@newdream.net> wrote: >> > > > 1- a default value (compiled in) >> > > > 2- a value from the mon >> > > > 3- a value from ceph.conf >> > > > 4- a value set via command line, 'ceph tell', 'ceph daemon ... config set >> > > > ...', etc. >> >> I'm opposed to 3 and 4. >> >> I *can* see the need to override a value on a per-host or on a >> per-daemon instance basis (including combinations thereof, e.g., all >> OSDs on node X). (Back when, we also expected these to be way more >> frequently needed; to this day, I can count on my fingers the times I >> needed per-host overrides, I think; really the only use case where this >> happens more often are debug flags.) >> >> But if you want any sort of consistency, those modify the settings in >> the respective map on the MON, and the daemon *then* gets that one from >> the single authoritative source of truth. > > The problem is that this makes the system more fragile, and with a > complex distributed system, and the types of things I've needed to diagnose and > debug in the past, I am very nervous about taking away the ability to > force a config value locally (e.g., via 'ceph daemon ...', when it is > having trouble pulling config from the mon for whatever reason). Yes, we definitely need a local override. 
For one thing, we need to be able to turn on and configure OSDs in disconnected modes (eg, journal flushes with FileStore) that involve turning on an awful lot of the full system. Remembering to mark specific config options as "allowed-to-set-locally" is just not practical or maintainable. > > ... > > As far as broad principles go, I think we are mostly in alignment: (1) we > want centrally managed config, (2) managed by the mons, for (3) a > simplified user experience, and (4) an easy upgrade path to get there. > I think the implementation required to get that is roughly what I > described, and although it sounds complicated, none of the key pieces can > really be taken away. > > 1. Daemons report running config to mgr. We need some form of this no > matter what for the upgrade/transition. Beyond that, I think it's still > important in order to tell whether the "single source of truth" is > something that even can be true: (1) some options cannot be changed at > runtime and require a restart, (2) some options may have illegal/invalid > values, (3) the set of allowed options may change build to build, so > something that used to be valid may not be anymore or may not be if the > daemon is newer or older than the mon. > > 2. Local overrides are possible. This can/should be rare and reserved > for extraordinary circumstances, but I don't feel comfortable removing > this. In a complex system there are many things that could prevent the daemon > from speaking to the mon to get an updated config. > > 3. ceph.conf is allowed in at least some cases. This is more or less a > given on the mon in order to handle bootstrapping and to resolve bad > changes to the mon config (that, say, break paxos itself). There are also > still cases where initial options are needed to fetch the rest of the > config from the mon. And during the transition period it is required. > > I think the real question is whether, post-nautilus, we continue to > encourage or allow ceph.conf for daemons. 
I think this is a decision that > amounts to turning it off in certain circumstances to force users into a > better world, but it's not something we can do away with to simplify the > world today. We can still ignore this possibility from the GUI, perhaps, > but I think we're better off lumping it together with #2 and doing > something extremely simple like, say, putting a (!) icon next to options > that the daemon isn't respecting (because they have overridden it, or need > to restart, or it is not valid, or whatever else). > > I can't see a way to change 1-3 above without a very different approach > (like, using something external to the mons). Am I missing something? I think you're correct about these three statements. My inclination would be to shift the documentation and expectation to using the central config service, but that we don't break anything which users might already have. As long as we expose that daemons have differing config values from the central service, ceph-mgr can be as clever or dumb as it wants about handling that. By the same token, though, I don't think we need to take central responsibility for removing or editing configs which aren't in the central mon store. Doing that parsing is a pain in the butt and presumably anybody who set up a real ceph.conf can manage to remove it themselves. One thing we could maybe do is identify the "local config" settings in Nautilus (that is, stuff specifying specific disks and paths, or otherwise necessary to make the daemon turn on) and offer a one-click "delete the ceph.conf and replace it with the minimal set", but that would just be a one-time option to make life better for upgraders, not something we want to commit to. Now, starting from the beginning of the thread, a few other things... 
On Fri, Nov 10, 2017 at 7:30 AM, Sage Weil <sweil@redhat.com> wrote: > Namely, > > config/option = value # like [global] > config/$type/option = value # like [mon] > config/$type.$id/option = value # like [mon.a] I am finding this really difficult to work with. Do you expect users to manipulate this directly? I can imagine this being the internal schema, but I hope the CLI commands and GUI are about setting options on buckets which are pretty-printed in the "osd tree" command! > There are two new things: > > config/.../class:$classname/option = value > > For OSDs, this matches the device_class. So you can do something like > > config/osd/class:ssd/bluestore_cache_size = 10737418240 # 10gb, woohoo! > > You can also match the crush location: > > config/.../$crushtype:$crushvalue/option = value > > e.g., > > config/osd/rack:foo/debug_osd = 10 # hunting some issue > > This obviously makes sense for OSDs. We can also make it make sense for > non-OSDs since everybody (clients and daemons) has a concept of > crush_location that is a set of key/value pairs like "host=foo rack=bar" > which match the CRUSH hierarchy. I am not understanding this at all — I don't think we can have any expectation that clients know where they are in relationship to the CRUSH tree. Frequently they are not sharing any of the specified resources, and they are much more likely to shift locations than OSDs are. (eg, rbd running in compute boxes in different domains from the storage nodes, possibly getting live migrated...) On Mon, Nov 13, 2017 at 10:40 AM, John Spray <jspray@redhat.com> wrote: > On Mon, Nov 13, 2017 at 6:20 PM, Kyle Bader <kyle.bader@gmail.com> wrote: >> Configuration files are often driven by configuration management, with >> previous versions stored in some kind of version control systems. We >> should make sure that if configuration moves to the monitors that you >> have some form of history and rollback capabilities. 
It might be worth >> modeling it similar to network switch configuration shells, a la >> Junos. >> >> * change configuration >> * require commit configuration change >> * ability to rollback N configuration changes >> * ability to diff two configuration versions >> >> That way an admin can figure out when the last configuration change >> was, what changed, and rollback if necessary. > > That is an extremely good idea. > > As a minimal thing, it should be pretty straightforward to implement a > snapshot/rollback. > > I imagine many users today are not so disciplined as to version > control their configs, but this is a good opportunity to push that as > the norm by building it in. I get the appeal of snapshotting, but I am definitely not convinced this is something we should build directly into the monitors. Do you have an implementation in mind? It seems to me like this is something we can implement pretty easily in ceph-mgr (either by restricting the snapshotting to mechanisms that make changes via the manager, or by subscribing to config changes), and that for admins using orchestration frameworks they already get rollbackability from their own version control. Why not take advantage of those easier development environments, which are easy to adjust later if we find new requirements or issues? On Tue, Nov 14, 2017 at 3:45 PM, John Spray <jspray@redhat.com> wrote: > This comes back to our recurring discussion about whether a > HEALTH_INFO level should exist: I'm increasingly of the opinion that > when we run into things like this, it's nature's way of telling us > that maybe our underlying model is weird (in this case, maybe we > didn't need to have the concept of ephemeral configuration settings in > the system at all). 
> > Maybe ephemeral config changes should be treated the same way I > propose to treat local overrides: the daemon reports just that it has > been overridden, and the GUI goes hands-off and does not attempt to > communicate the story to the user "Well, you see, it's currently set > to xyz until the next restart, at which point it will revert to abc, > that is unless you have a local ceph.conf in which case...". I'm with you on this — I don't think there's a reason for the central config to distinguish between *kinds* of disagreement. We probably want to expose which daemons are disagreeing on which options, but I'm not seeing the utility of diagnosing *where* the disagreement was injected. We can do a lot with those reported config options and their disagreements that I think will be of value, though! *) we can specify that certain config options must not be overridden — heartbeat timeouts, for instance — and we boot anybody who does so *) we can be selective about which configs we care about matching in the GUI. If we roll out a new AwesomeMessenger, we may want to let users switch to it incrementally and expose that in the GUI. We may get ambitious someday and have a one-click "convert this OSD to Bluestore" button. etc. But maybe we just ignore all filestore config settings, since we're moving to BlueStore and don't care how those may be set differently for different classes of OSDs. We can deal with the fact that sometimes a support tech will tell customers to restart an OSD with debug settings on the command line, and we don't want to disable part of their dashboard gui when that happens. *) we can recommend importing differences into the central config store (eg on upgrade) when they match some heuristic standard of "makes sense" -Greg ^ permalink raw reply [flat|nested] 26+ messages in thread
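The key schema under debate here (config/option, config/$type/option, config/$type.$id/option, plus class: and crush-location masks) can be pictured as a small parser that splits each key into who it targets, an optional mask, and the option name. This is purely an illustrative sketch of the idea, with hypothetical names; it is not Ceph's parsing code:

```python
# Illustrative parser for the discussed config-key schema, e.g.:
#   config/option
#   config/$type/option
#   config/$type.$id/option
#   config/$type/class:$classname/option
#   config/$type/rack:$rackname/option
# Names and return shape are assumptions for the sketch, not Ceph's API.

def parse_config_key(key):
    """Split a config-key path into (who, mask, option)."""
    parts = key.split("/")
    assert parts[0] == "config"
    option = parts[-1]          # option name is always the last component
    who = None                  # e.g. "osd", "mon.a"; None means [global]
    mask = None                 # e.g. ("class", "ssd") or ("rack", "foo")
    for token in parts[1:-1]:
        if ":" in token:
            mask = tuple(token.split(":", 1))
        else:
            who = token
    return who, mask, option

print(parse_config_key("config/osd/class:ssd/bluestore_cache_size"))
# -> ('osd', ('class', 'ssd'), 'bluestore_cache_size')
```

A higher-level CLI or GUI could then present these parsed pieces on the CRUSH buckets Greg mentions, rather than exposing the raw paths to users.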
* Re: config on mons 2017-11-30 22:31 ` Gregory Farnum @ 2017-12-01 17:53 ` Sage Weil 0 siblings, 0 replies; 26+ messages in thread From: Sage Weil @ 2017-12-01 17:53 UTC (permalink / raw) To: Gregory Farnum; +Cc: John Spray, Ceph Development [-- Attachment #1: Type: TEXT/PLAIN, Size: 8422 bytes --] On Thu, 30 Nov 2017, Gregory Farnum wrote: > I'm resurrecting this thread since it wasn't clear a consensus was > reached, I was out on vacation while it was happening, and it doesn't > look like there's been much work done yet to render any discussion > obsolete. Thanks! > My inclination would be to shift the documentation and expectation to > using the central config service, but that we don't break anything > which users might already have. As long as we expose that daemons have > differing config values from the central service, ceph-mgr can be as > clever or dumb as it wants about handling that. +1 > By the same token, though, I don't think we need to take central > responsibility for removing or editing configs which aren't in the > central mon store. Doing that parsing is a pain in the butt and > presumably anybody who set up a real ceph.conf can manage to remove it > themselves. > One thing we could maybe do is identify the "local config" settings in > Nautilus (that is, stuff specifying specific disks and paths, or > otherwise necessary to make the daemon turn on) and offer a one-click > "delete the ceph.conf and replace it with the minimal set", but that > would just be a one-time option to make life better for upgraders, not > something we want to commit to. Yeah, I view this as TBD. I want there to be *some* transition path but I'm not sure how magic it should be. Among other issues, daemons run as user ceph and won't be able to overwrite /etc/ceph/ceph.conf (usually owned by root), so... yeah. 
> On Fri, Nov 10, 2017 at 7:30 AM, Sage Weil <sweil@redhat.com> wrote: > > Namely, > > > > config/option = value # like [global] > > config/$type/option = value # like [mon] > > config/$type.$id/option = value # like [mon.a] > > I am finding this really difficult to work with. Do you expect > users to manipulate this directly? I can imagine this being the > internal schema, but I hope the CLI commands and GUI are about setting > options on buckets which are pretty-printed in the "osd tree" command! The plan is to *store* these in config-key, but have a new, higher-level CLI interface (ceph config ...) to them. That interface would do the validation to make sure you are not talking nonsense: verify values are legal, config option exists, is not being set on a daemon that doesn't care, isn't something that is ceph.conf-only, etc. It would also have the 'show' commands that would dump the running config for a daemon and so on. > > There are two new things: > > > > config/.../class:$classname/option = value > > > > For OSDs, this matches the device_class. So you can do something like > > > > config/osd/class:ssd/bluestore_cache_size = 10737418240 # 10gb, woohoo! > > > > You can also match the crush location: > > > > config/.../$crushtype:$crushvalue/option = value > > > > e.g., > > > > config/osd/rack:foo/debug_osd = 10 # hunting some issue > > > > This obviously makes sense for OSDs. We can also make it make sense for > > non-OSDs since everybody (clients and daemons) has a concept of > > crush_location that is a set of key/value pairs like "host=foo rack=bar" > > which match the CRUSH hierarchy. > > I am not understanding this at all — I don't think we can have any > expectation that clients know where they are in relationship to the > CRUSH tree. Frequently they are not sharing any of the specified > resources, and they are much more likely to shift locations than OSDs > are. 
(eg, rbd running in compute boxes in different domains from the > storage nodes, possibly getting live migrated...) The idea is that *everyone* knows their hostname, which (if the CRUSH hierarchy is populated) is enough to tell us the crush location. Obviously some clients will be on hosts not in the map and won't know--that's fine. Generally daemons will be, or can be, if we make an effort to place hosts that have mon/mgr/mds/rgw/etc daemons but not OSDs in the map. But even if we ignore that and only make it work for OSDs, that's pretty useful too. > On Mon, Nov 13, 2017 at 10:40 AM, John Spray <jspray@redhat.com> wrote: > > On Mon, Nov 13, 2017 at 6:20 PM, Kyle Bader <kyle.bader@gmail.com> wrote: > >> Configuration files are often driven by configuration management, with > >> previous versions stored in some kind of version control systems. We > >> should make sure that if configuration moves to the monitors that you > >> have some form of history and rollback capabilities. It might be worth > >> modeling it similar to network switch configuration shells, a la > >> Junos. > >> > >> * change configuration > >> * require commit configuration change > >> * ability to rollback N configuration changes > >> * ability to diff two configuration versions > >> > >> That way an admin can figure out when the last configuration change > >> was, what changed, and rollback if necessary. > > > > That is an extremely good idea. > > > > As a minimal thing, it should be pretty straightforward to implement a > > snapshot/rollback. > > > > I imagine many users today are not so disciplined as to version > > control their configs, but this is a good opportunity to push that as > > the norm by building it in. > > I get the appeal of snapshotting, but I am definitely not convinced > this is something we should build directly into the monitors. Do you > have an implementation in mind? 
> It seems to me like this is something we can implement pretty easily > in ceph-mgr (either by restricting the snapshotting to mechanisms that > make changes via the manager, or by subscribing to config changes), > and that for admins using orchestration frameworks they already get > rollbackability from their own version control. Why not take advantage > of those easier development environments, which are easy to adjust > later if we find new requirements or issues? I have no good implementation ideas yet, so I'm just ignoring it for the moment. I think a ceph-based interface would be valuable, though. Say, ceph config checkpoint foo ceph config set osd.0 debug_osd 20 ... ceph config rollback foo or even ceph config rollback foo osd.0 # just rollback osd.0's config Even a pretty basic implementation like encoding all of config/ in a map and stuffing it into a config/checkpoint/foo key (compressed even?) would be sufficient for that sort of thing. Alternatively, a complete config changelog/history could also support the above and would let you do a 'ceph config history [osd.0]' type command that tells you how the config has changed, and when, going backwards in time. Of course, having all of that doesn't prevent you from using your existing external tools to manage configs and history. Perhaps a 'ceph config import' type operation that takes a dump of everything (efficiently) is appropriate for supporting that well. > On Tue, Nov 14, 2017 at 3:45 PM, John Spray <jspray@redhat.com> wrote: > > This comes back to our recurring discussion about whether a > > HEALTH_INFO level should exist: I'm increasingly of the opinion that > > when we run into things like this, it's nature's way of telling us > > that maybe our underlying model is weird (in this case, maybe we > > didn't need to have the concept of ephemeral configuration settings in > > the system at all). 
> > > > Maybe ephemeral config changes should be treated the same way I > > propose to treat local overrides: the daemon reports just that it has > > been overridden, and the GUI goes hands-off and does not attempt to > > communicate the story to the user "Well, you see, it's currently set > > to xyz until the next restart, at which point it will revert to abc, > > that is unless you have a local ceph.conf in which case...". > > I'm with you on this — I don't think there's a reason for the central > config to distinguish between *kinds* of disagreement. We probably > want to expose which daemons are disagreeing on which options, but I'm > not seeing the utility of diagnosing *where* the disagreement was > injected. Having an active/not-active indicator on the mgr/mon seems fine; I think it's mostly a matter of how much effort we want to invest in that interface. I plan to make the 'ceph daemon X config diff' show the complete story (from the daemon's perspective), indicating each source (default, conf, mon, override) and value that is in play, along with the effective result. sage ^ permalink raw reply [flat|nested] 26+ messages in thread
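Sage's "pretty basic implementation" of checkpoint/rollback (encode the whole config/ map and stash it under a checkpoint key) can be sketched in a few lines. This is a toy in-memory model to make the idea concrete; the class and method names are hypothetical and it ignores persistence, compression, and the mon/mgr split entirely:

```python
# Toy sketch of the checkpoint/rollback idea floated above: snapshot the
# entire config/ map under a named checkpoint, restore it on rollback.
# Illustrative only; not the Ceph implementation.
import copy

class ConfigStore:
    def __init__(self):
        self.config = {}        # live key -> value map
        self.checkpoints = {}   # checkpoint name -> frozen copy of config

    def set(self, key, value):
        self.config[key] = value

    def checkpoint(self, name):
        # e.g. stored under a config/checkpoint/<name> key in practice
        self.checkpoints[name] = copy.deepcopy(self.config)

    def rollback(self, name):
        self.config = copy.deepcopy(self.checkpoints[name])

# Mirrors the CLI sequence in the message:
#   ceph config checkpoint foo; ceph config set osd.0 debug_osd 20;
#   ceph config rollback foo
store = ConfigStore()
store.set("config/osd.0/debug_osd", "0/5")
store.checkpoint("foo")
store.set("config/osd.0/debug_osd", "20")
store.rollback("foo")
print(store.config["config/osd.0/debug_osd"])   # -> 0/5
```

A changelog-based variant would instead record each (time, key, old, new) tuple, which also supports the 'ceph config history' style queries Sage mentions.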
end of thread, other threads:[~2017-12-01 17:53 UTC | newest] Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-11-10 15:30 config on mons Sage Weil 2017-11-13 0:27 ` Patrick Donnelly 2017-11-13 1:43 ` Yehuda Sadeh-Weinraub 2017-11-13 9:57 ` John Spray 2017-11-13 16:29 ` Yehuda Sadeh-Weinraub 2017-11-13 4:30 ` Christian Wuerdig 2017-11-13 10:00 ` John Spray 2017-11-13 16:45 ` Mark Nelson 2017-11-13 18:20 ` Kyle Bader 2017-11-13 18:40 ` John Spray 2017-11-14 10:18 ` Piotr Dałek 2017-11-14 11:36 ` John Spray 2017-11-14 13:58 ` Piotr Dałek 2017-11-14 16:24 ` Sage Weil 2017-11-14 14:33 ` Mark Nelson 2017-11-14 16:37 ` Kyle Bader 2017-11-14 18:01 ` Alfredo Deza 2017-11-14 13:48 ` Mark Nelson 2017-11-13 13:23 ` John Spray 2017-11-14 22:21 ` Sage Weil 2017-11-14 23:45 ` John Spray 2017-11-15 13:32 ` Sage Weil 2017-11-15 17:16 ` Lars Marowsky-Bree 2017-11-15 21:26 ` Sage Weil 2017-11-30 22:31 ` Gregory Farnum 2017-12-01 17:53 ` Sage Weil