ceph-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Rook orchestrator module
@ 2020-09-29 19:31 Travis Nielsen
  2020-09-29 19:50 ` Jason Dillaman
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Travis Nielsen @ 2020-09-29 19:31 UTC (permalink / raw)
  To: Sebastian Wagner, Patrick Donnelly, Varsha Rao, Sebastien Han,
	Ceph Development List

Sebastian and fellow orchestrators,

Some questions have come up recently about issues in the Rook
orchestrator module and its state of disrepair. Patrick, Varsha, and I
have been discussing these recently as Varsha has been working on the
module. Before we fix all the issues that are being found, I want to
start a higher level conversation. I’ll join the leads meeting
tomorrow to discuss, and would be good to include in the Monday
orchestrator agenda as well, which unfortunately I haven’t been able
to attend recently...

First, Rook is driven by the K8s APIs, including CRDs, an operator,
the CSI driver, etc. When the admin needs to configure the Ceph
cluster, they create the CRDs and other resources directly with the
K8s tools such as kubectl. Rook does everything with K8s patterns so
that the admin doesn’t need to leave their standard administration
sandbox in order to configure Rook or Ceph. If any Ceph-specific
command needs to be run, the rook toolbox can be used. However, we
prefer to avoid the toolbox for common scenarios that should have CRDs
for declaring desired state.

The fundamental question then is, **what scenarios require the Rook
orchestrator mgr module**? The module is not enabled by default in
Rook clusters and I am not aware of upstream users consuming it.

The purpose of the orchestrator module was originally to provide a
common entry point either for Ceph CLI tools or the dashboard. This
would provide the constant interface to work with both Rook or cephadm
clusters. Patrick pointed out that the dashboard isn’t really a
scenario anymore for the orchestrator module. If so, the only
remaining usage is for CLI tools. And if we only have the CLI
scenario, this means that the CLI commands would be run from the
toolbox. But we are trying to avoid the toolbox. We should be putting
our effort into the CRDs, CSI driver, etc.

If the orchestrator module is creating CRs, we are likely doing
something wrong. We expect the cluster admin to create CRs.

Thus, I’d like to understand the scenarios where the rook orchestrator
module is needed. If there isn’t a need anymore since dashboard
requirements have changed, I’d propose the module can be removed.

Thanks,
Travis
Rook


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Rook orchestrator module
  2020-09-29 19:31 Rook orchestrator module Travis Nielsen
@ 2020-09-29 19:50 ` Jason Dillaman
       [not found]   ` <CAByD1q89xGQGGj4=ySAw_hrHCq+t3zp9u8CkY-ey0_oo-7ntxA@mail.gmail.com>
  2020-10-07 15:25 ` Denis Kondratenko
  2020-10-07 15:52 ` Patrick Donnelly
  2 siblings, 1 reply; 5+ messages in thread
From: Jason Dillaman @ 2020-09-29 19:50 UTC (permalink / raw)
  To: Travis Nielsen
  Cc: Sebastian Wagner, Patrick Donnelly, Varsha Rao, Sebastien Han,
	Ceph Development List

On Tue, Sep 29, 2020 at 3:33 PM Travis Nielsen <tnielsen@redhat.com> wrote:
>
> Sebastian and fellow orchestrators,
>
> Some questions have come up recently about issues in the Rook
> orchestrator module and its state of disrepair. Patrick, Varsha, and I
> have been discussing these recently as Varsha has been working on the
> module. Before we fix all the issues that are being found, I want to
> start a higher level conversation. I’ll join the leads meeting
> tomorrow to discuss, and would be good to include in the Monday
> orchestrator agenda as well, which unfortunately I haven’t been able
> to attend recently...
>
> First, Rook is driven by the K8s APIs, including CRDs, an operator,
> the CSI driver, etc. When the admin needs to configure the Ceph
> cluster, they create the CRDs and other resources directly with the
> K8s tools such as kubectl. Rook does everything with K8s patterns so
> that the admin doesn’t need to leave their standard administration
> sandbox in order to configure Rook or Ceph. If any Ceph-specific
> command needs to be run, the rook toolbox can be used. However, we
> prefer to avoid the toolbox for common scenarios that should have CRDs
> for declaring desired state.
>
> The fundamental question then is, **what scenarios require the Rook
> orchestrator mgr module**? The module is not enabled by default in
> Rook clusters and I am not aware of upstream users consuming it.
>
> The purpose of the orchestrator module was originally to provide a
> common entry point either for Ceph CLI tools or the dashboard. This
> would provide the constant interface to work with both Rook or cephadm
> clusters. Patrick pointed out that the dashboard isn’t really a
> scenario anymore for the orchestrator module.

Is that true? [1]

> If so, the only
> remaining usage is for CLI tools. And if we only have the CLI
> scenario, this means that the CLI commands would be run from the
> toolbox. But we are trying to avoid the toolbox. We should be putting
> our effort into the CRDs, CSI driver, etc.
>
> If the orchestrator module is creating CRs, we are likely doing
> something wrong. We expect the cluster admin to create CRs.
>
> Thus, I’d like to understand the scenarios where the rook orchestrator
> module is needed. If there isn’t a need anymore since dashboard
> requirements have changed, I’d propose the module can be removed.

I don't have a current stake in the outcome, but I could foresee the
future need/desire for letting the Ceph cluster itself spin up
resources on-demand in k8s via Rook. Let's say that I want to convert
an XFS on RBD image to CephFS, the MGR could instruct the orchestrator
to kick off a job to translate between the two formats. I'd imagine
the same could be argued for on-demand NFS/SMB gateways or anywhere
else there is a delta between a storage administrator setting up the
basic Ceph system and Ceph attempting to self-regulate/optimize.

> Thanks,
> Travis
> Rook
>

[1] https://tracker.ceph.com/issues/46756

-- 
Jason


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Rook orchestrator module
       [not found]     ` <CA+aFP1Bxt9NgybrEKGRG2QDsxaoMqcHYyFOzLFeVZqc_AQW1_w@mail.gmail.com>
@ 2020-09-29 21:05       ` Travis Nielsen
  0 siblings, 0 replies; 5+ messages in thread
From: Travis Nielsen @ 2020-09-29 21:05 UTC (permalink / raw)
  To: Dillaman, Jason, Sebastian Wagner, Patrick Donnelly, Varsha Rao,
	Sebastien Han, Ceph Development List

Adding reply-all this time...

On Tue, Sep 29, 2020 at 2:53 PM Jason Dillaman <jdillama@redhat.com> wrote:
>
> On Tue, Sep 29, 2020 at 4:47 PM Travis Nielsen <tnielsen@redhat.com> wrote:
> >
> > On Tue, Sep 29, 2020 at 1:50 PM Jason Dillaman <jdillama@redhat.com> wrote:
> > >
> > > On Tue, Sep 29, 2020 at 3:33 PM Travis Nielsen <tnielsen@redhat.com> wrote:
> > > >
> > > > Sebastian and fellow orchestrators,
> > > >
> > > > Some questions have come up recently about issues in the Rook
> > > > orchestrator module and its state of disrepair. Patrick, Varsha, and I
> > > > have been discussing these recently as Varsha has been working on the
> > > > module. Before we fix all the issues that are being found, I want to
> > > > start a higher level conversation. I’ll join the leads meeting
> > > > tomorrow to discuss, and would be good to include in the Monday
> > > > orchestrator agenda as well, which unfortunately I haven’t been able
> > > > to attend recently...
> > > >
> > > > First, Rook is driven by the K8s APIs, including CRDs, an operator,
> > > > the CSI driver, etc. When the admin needs to configure the Ceph
> > > > cluster, they create the CRDs and other resources directly with the
> > > > K8s tools such as kubectl. Rook does everything with K8s patterns so
> > > > that the admin doesn’t need to leave their standard administration
> > > > sandbox in order to configure Rook or Ceph. If any Ceph-specific
> > > > command needs to be run, the rook toolbox can be used. However, we
> > > > prefer to avoid the toolbox for common scenarios that should have CRDs
> > > > for declaring desired state.
> > > >
> > > > The fundamental question then is, **what scenarios require the Rook
> > > > orchestrator mgr module**? The module is not enabled by default in
> > > > Rook clusters and I am not aware of upstream users consuming it.
> > > >
> > > > The purpose of the orchestrator module was originally to provide a
> > > > common entry point either for Ceph CLI tools or the dashboard. This
> > > > would provide the constant interface to work with both Rook or cephadm
> > > > clusters. Patrick pointed out that the dashboard isn’t really a
> > > > scenario anymore for the orchestrator module.
> > >
> > > Is that true? [1]
> >
> > Perhaps I misunderstood. If the dashboard is still a requirement, the
> > requirements will certainly be much higher to maintain support.
> >
> > >
> > > > If so, the only
> > > > remaining usage is for CLI tools. And if we only have the CLI
> > > > scenario, this means that the CLI commands would be run from the
> > > > toolbox. But we are trying to avoid the toolbox. We should be putting
> > > > our effort into the CRDs, CSI driver, etc.
> > > >
> > > > If the orchestrator module is creating CRs, we are likely doing
> > > > something wrong. We expect the cluster admin to create CRs.
> > > >
> > > > Thus, I’d like to understand the scenarios where the rook orchestrator
> > > > module is needed. If there isn’t a need anymore since dashboard
> > > > requirements have changed, I’d propose the module can be removed.
> > >
> > > I don't have a current stake in the outcome, but I could foresee the
> > > future need/desire for letting the Ceph cluster itself spin up
> > > resources on-demand in k8s via Rook. Let's say that I want to convert
> > > an XFS on RBD image to CephFS, the MGR could instruct the orchestrator
> > > to kick off a job to translate between the two formats. I'd imagine
> > > the same could be argued for on-demand NFS/SMB gateways or anywhere
> > > else there is a delta between a storage administrator setting up the
> > > basic Ceph system and Ceph attempting to self-regulate/optimize.
> >
> > If Ceph needs to self regulate, I could certainly see the module as
> > useful, such as auto-scaling the daemons when load is high. But at the
> > same time, the operator could watch for Ceph events, metrics, or other
> > indicators and perform the self-regulation according to the CR
> > settings, instead of it happening inside the mgr module.
>
> But then you would be embedding low-level business logic about Ceph
> inside Rook? Or if you are saying Rook would wait for a special event
> / alert hook from Ceph to perform some action. If that's the case, it
> sounds a lot like what the orchestrator purports to do (at least to me
> and at least as an end-state goal).

Agreed we don't want to embed Ceph logic in Rook. But yes, if Rook can
have a hook into Ceph to perform the action, the operator could handle
it. Then if cephadm needed to handle the same scenario, it might use a
mgr module to implement. But no need for a rook module in that case.

>
> > At the end of the day, I want to make sure we actually need an
> > orchestrator interface. K8s and cephadm are very different
> > environments and their features probably won't ever be at parity with
> > each other. It may be more appropriate to define the rook and cephadm
> > module separately. Or at least we need to be very clear why we need
> > the common interface, that it's tested, and supported.
>
> Not going to disagree with that last point.
>
> > >
> > > > Thanks,
> > > > Travis
> > > > Rook
> > > >
> > >
> > > [1] https://tracker.ceph.com/issues/46756
> > >
> > > --
> > > Jason
> > >
> >
>
>
> --
> Jason
>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Rook orchestrator module
  2020-09-29 19:31 Rook orchestrator module Travis Nielsen
  2020-09-29 19:50 ` Jason Dillaman
@ 2020-10-07 15:25 ` Denis Kondratenko
  2020-10-07 15:52 ` Patrick Donnelly
  2 siblings, 0 replies; 5+ messages in thread
From: Denis Kondratenko @ 2020-10-07 15:25 UTC (permalink / raw)
  To: Travis Nielsen, Sebastian Wagner, Patrick Donnelly, Varsha Rao,
	Sebastien Han, Ceph Development List



On 9/29/20 9:31 PM, Travis Nielsen wrote:

> The purpose of the orchestrator module was originally to provide a
> common entry point either for Ceph CLI tools or the dashboard. This
> would provide the constant interface to work with both Rook or cephadm
> clusters. Patrick pointed out that the dashboard isn’t really a
> scenario anymore for the orchestrator module. If so, the only
> remaining usage is for CLI tools. And if we only have the CLI
> scenario, this means that the CLI commands would be run from the
> toolbox. But we are trying to avoid the toolbox. We should be putting
> our effort into the CRDs, CSI driver, etc.

I though it is exactly that, providing CLI to change something with the
same unified (cephadm and Rook) orch interface. Like checking what
resources are available, adding daemons, modifying DriveGroups, updating
Ceph version and discovering that there are new updates in the registry.

Sure node management probably is better on k8s level, and yes many of
those tasks ppl could do modifying CRs and applying them.
But it might be that Ceph admin would be inside toolbox to watch cluster
state, fixing it and modifying things for a quite some time.

So it (as idea) make sense for Ceph specific actions (services, drive
configs, osd configs, troubleshoot, Ceph versions) to use Ceph CLI.
And for k8s specific actions (labels, nodes, toleration, CRs creation
and etc) to use k8s CLI.

But that is more dev/eng point of view, don't know if that correlates to
much with user experience.

> 
> If the orchestrator module is creating CRs, we are likely doing
> something wrong. We expect the cluster admin to create CRs.

I would echo that. But changing CR for "cephVersion" looks like a good idea.

BTW how update workflow of cephVersion is designed? There is no Helm
chart (and that looks logical) and there is no other way to changed but
edit the CR directly or by applying some manual changes in some self
managed "yaml" file.
In case of user, maybe I would like to control that mostly myself. But
as vendor, vendor would like to control Ceph version, to limit user to
supported images.

> 
> Thus, I’d like to understand the scenarios where the rook orchestrator
> module is needed. If there isn’t a need anymore since dashboard
> requirements have changed, I’d propose the module can be removed.

Maybe, but that also the question to the end user. cephadm is not widely
adopted and there is not much feedback if extended orch is useful and it
might be better unified with Rook.

Also many devs are busy with cephadm so not much time to extend orch for
Rook.

> 
> Thanks,
> Travis
> Rook
> 

Thanks,
-- 
Denis Kondratenko
Engineering Manager SUSE Linux Enterprise Storage

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nuremberg
Germany

(HRB 36809, AG Nürnberg)
Managing Director: Felix Imendörffer


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Rook orchestrator module
  2020-09-29 19:31 Rook orchestrator module Travis Nielsen
  2020-09-29 19:50 ` Jason Dillaman
  2020-10-07 15:25 ` Denis Kondratenko
@ 2020-10-07 15:52 ` Patrick Donnelly
  2 siblings, 0 replies; 5+ messages in thread
From: Patrick Donnelly @ 2020-10-07 15:52 UTC (permalink / raw)
  To: Travis Nielsen
  Cc: Sebastian Wagner, Varsha Rao, Sebastien Han,
	Ceph Development List, dev, Venky Shankar

Adding in dev@ceph.io. ceph-devel is now for kernel development but
I'm keeping it in the cc list because a lot of discussion already
happened there.

Also for those interested, there's a recording of a meeting we had on
this topic here: https://www.youtube.com/watch?v=1OSQySElojg

On Tue, Sep 29, 2020 at 12:32 PM Travis Nielsen <tnielsen@redhat.com> wrote:
>
> Sebastian and fellow orchestrators,
>
> Some questions have come up recently about issues in the Rook
> orchestrator module and its state of disrepair. Patrick, Varsha, and I
> have been discussing these recently as Varsha has been working on the
> module. Before we fix all the issues that are being found, I want to
> start a higher level conversation. I’ll join the leads meeting
> tomorrow to discuss, and would be good to include in the Monday
> orchestrator agenda as well, which unfortunately I haven’t been able
> to attend recently...
>
> First, Rook is driven by the K8s APIs, including CRDs, an operator,
> the CSI driver, etc. When the admin needs to configure the Ceph
> cluster, they create the CRDs and other resources directly with the
> K8s tools such as kubectl. Rook does everything with K8s patterns so
> that the admin doesn’t need to leave their standard administration
> sandbox in order to configure Rook or Ceph. If any Ceph-specific
> command needs to be run, the rook toolbox can be used. However, we
> prefer to avoid the toolbox for common scenarios that should have CRDs
> for declaring desired state.

We're at a crossroads here. Ceph is increasingly learning to manage
itself with a primary goal of increasing user friendliness. Awareness
of the deployment technology is key to that.

> The fundamental question then is, **what scenarios require the Rook
> orchestrator mgr module**? The module is not enabled by default in
> Rook clusters and I am not aware of upstream users consuming it.
>
> The purpose of the orchestrator module was originally to provide a
> common entry point either for Ceph CLI tools or the dashboard. This
> would provide the constant interface to work with both Rook or cephadm
> clusters. Patrick pointed out that the dashboard isn’t really a
> scenario anymore for the orchestrator module.

As Lenz pointed out in another reply, my understanding was wrong here.
Dashboard has been using the orchestrator for displaying information
from the orchestrator.

> If so, the only
> remaining usage is for CLI tools. And if we only have the CLI
> scenario, this means that the CLI commands would be run from the
> toolbox. But we are trying to avoid the toolbox. We should be putting
> our effort into the CRDs, CSI driver, etc.

I think we need to be careful about looking at the CLI as the sole
entry point for the orchestrator. The mgr modules (including the
dashboard) are increasingly using the orchestrator to do tasks. As we
discussed in the orchestrator meeting (youtube linked earlier in this
mail) CephFS is planning these scenarios for Pacific:

- mds_autoscaler plugin deploys MDS in response to file system
degradation (increased max_mds, insufficient standby). Future work [1]
will look at deploying MDS with more memory in response to load on the
file system. (Think lots of small file systems with small MDS to
start.)

- volumes plugin deploys NFS clusters configured via the `ceph nfs
...` command suite.

- cephfs-mirror daemons deployed to geo-replicate CephFS file systems.

- (Still TBD:) volumes plugin to use an rsync container to copy data
between two CephFS subvolumes (encrypted or not). Probably include RBD
mounted images as source or destination, at some point.

> If the orchestrator module is creating CRs, we are likely doing
> something wrong. We expect the cluster admin to create CRs.
>
> Thus, I’d like to understand the scenarios where the rook orchestrator
> module is needed. If there isn’t a need anymore since dashboard
> requirements have changed, I’d propose the module can be removed.

Outside of this thread I think we already decided not to do this but
I'm still interested to hear everyone's thoughts. Hopefully broader
exposure on dev@ceph.io will get us more voices.

[1] https://tracker.ceph.com/issues/46680

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-10-07 15:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-29 19:31 Rook orchestrator module Travis Nielsen
2020-09-29 19:50 ` Jason Dillaman
     [not found]   ` <CAByD1q89xGQGGj4=ySAw_hrHCq+t3zp9u8CkY-ey0_oo-7ntxA@mail.gmail.com>
     [not found]     ` <CA+aFP1Bxt9NgybrEKGRG2QDsxaoMqcHYyFOzLFeVZqc_AQW1_w@mail.gmail.com>
2020-09-29 21:05       ` Travis Nielsen
2020-10-07 15:25 ` Denis Kondratenko
2020-10-07 15:52 ` Patrick Donnelly

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).