* A Transformation of our Global Context
@ 2016-06-10 18:50 Adam C. Emerson
  2016-06-10 19:47 ` Gregory Farnum
  0 siblings, 1 reply; 28+ messages in thread
From: Adam C. Emerson @ 2016-06-10 18:50 UTC (permalink / raw)
  To: The Sacred Order of the Squid Cybernetic

Terrifically Tentacular Teammates,

I come not with a multilateral trade deal, but with a request for
comments.

In our code base we have CephContext. In daemons and official Ceph
Utilities, we have g_ceph_context. I believe CephContext does not show
proper separation of concerns.

Some of its contents (debugging flags, file locations, the log levels
used by the dout macro, initialized cryptographic subsystems, the
admin socket, and others) are relevant to the process as a whole.

Others (such as the monitor address, fsid, cluster and public network,
CrushLocation, and so forth) are relevant to the cluster.

I would like to split CephContext between its process-relevant subset
and its cluster-relevant subset and move the process-relevant portion
into a global variable in libcommon. Cluster-relevant parts would be
plumbed into subsystems the way CephContext is now.

The improved separation of concerns would make it easier to write
utilities that connect to multiple clusters, which sounds like it will
be increasingly important with the work people have done on
replication and the notion of 'pop-up' Ceph clusters for specific
workloads.

It would also make the code more modular and reusable. Several pieces
that could be reasonably made into libraries (such as the MDS or OSD)
or incorporated into non-daemon code (like the erasure code system)
can't be now. They could be re-plumbed, but having to thread a
CephContext around just to log messages is unappealing.

It would also be cleaner not having to plumb a CephContext into every
point in libcommon that checks the time or prints to the log.
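
Roughly, the shape I have in mind (just a sketch; the names and
members here are provisional):

  // Sketch only; every name and member is provisional.
  // Process-wide state becomes a single global in libcommon:
  struct ProcessContext {
    md_config_t proc_conf;     // debug flags, log levels, file locations
    Log *log;                  // the back end dout writes to
    CryptoHandler *crypto;     // initialized crypto subsystems
    AdminSocket *asok;         // possibly; could stay per-cluster instead
  };
  extern ProcessContext g_process_context;

  // Cluster-specific state keeps being plumbed explicitly, the way
  // CephContext is today:
  struct ClusterContext {
    md_config_t cluster_conf;  // mon addresses, fsid, networks, ...
    CrushLocation crush_location;
  };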

If I were to do this, my plan would be:

1. Separate configuration variables into process and cluster
2. Create a global 'ProcessContext' class. Move some elements of
   CephContext into it and the process-relevant configuration
   options. Have the configuration parser write to both locations, and
   have CephContext carry references to alias the relevant members of
   the global.
3. Create a ClusterContext (perhaps as a base to CephContext holding
   all non-aliased members).
4. Subsystem by subsystem remove uses of CephContext, replacing some
   with ClusterContext.
5. When all uses of CephContext are removed, remove the class.

Does this seem reasonable?

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


* Re: A Transformation of our Global Context
  2016-06-10 18:50 A Transformation of our Global Context Adam C. Emerson
@ 2016-06-10 19:47 ` Gregory Farnum
  2016-06-10 19:58   ` Adam C. Emerson
  0 siblings, 1 reply; 28+ messages in thread
From: Gregory Farnum @ 2016-06-10 19:47 UTC (permalink / raw)
  To: The Sacred Order of the Squid Cybernetic

On Fri, Jun 10, 2016 at 11:50 AM, Adam C. Emerson <aemerson@redhat.com> wrote:
> Terrifically Tentacular Teammates,
>
> I come not with a multilateral trade deal, but with a request for
> comments.
>
> In our code base we have CephContext. In daemons and official Ceph
> Utilities, we have g_ceph_context. I believe CephContext does not show
> proper separation of concerns.
>
> Some of its contents (debugging flags, file locations, the log levels
> used by the dout macro, initialized cryptographic subsystems, the
> admin socket, and others) are relevant to the process as a whole.
>
> Others (such as the monitor address, fsid, cluster and public network,
> CrushLocation, and so forth) are relevant to the cluster.
>
> I would like to split CephContext between its process-relevant subset
> and its cluster-relevant subset and move the process-relevant portion
> into a global variable in libcommon. Cluster-relevant parts would be
> plumbed into subsystems the way CephContext is now.
>
> The improved separation of concerns would make it easier to write
> utilities that connect to multiple clusters, which sounds like it will
> be increasingly important with the work people have done on
> replication and the notion of 'pop-up' Ceph clusters for specific
> workloads.

Maybe I'm misunderstanding the scope of what you're proposing, but I
don't think this is a good reason to go through this work right now.
Lots of things that seem process-specific turn out to be essentially
cluster properties, despite being configurable on both ends, and will
need to be referenced from all over the stack. And in general if we're
addressing multiple Ceph clusters within a utility we want a full
RadosClient etc for each cluster, so the split doesn't do us much good
on its own, does it? (If we don't want separate admin sockets for each
one, we'd need to extend all of those interfaces to support two
entries for the same type of thing, etc)

> It would also make the code more modular and reusable. Several pieces
> that could be reasonably made into libraries (such as the MDS or OSD)
> or incorporated into non-daemon code (like the erasure code system)
> can't be now. They could be re-plumbed, but having to thread a
> CephContext around just to log messages is unappealing.

If this is your actual goal, I think the unappealing work of replacing
g_ceph_context with a CephContext *cct is the way to go — we'd need to
do this much to support multiple ClusterContexts within a process
anyway. Once we have a motivating use case for splitting stuff up, we
can address that. But if we try and do a generic solution it'll be
much more difficult, and probably will miss some key points we don't
run into until trying to make use of it.

Am I missing something?
-Greg

>
> It would also be cleaner not having to plumb a CephContext into every
> point in libcommon that checks the time or prints to the log.
>
> If I were to do this, my plan would be:
>
> 1. Separate configuration variables into process and cluster
> 2. Create a global 'ProcessContext' class. Move some elements of
>    CephContext into it and the process-relevant configuration
>    options. Have the configuration parser write to both locations, and
>    have CephContext carry references to alias the relevant members of
>    the global.
> 3. Create a ClusterContext (perhaps as a base to CephContext holding
>    all non-aliased members).
> 4. Subsystem by subsystem remove uses of CephContext, replacing some
>    with ClusterContext.
> 5. When all uses of CephContext are removed, remove the class.
>
> Does this seem reasonable?
>
> --
> Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
> IRC: Aemerson@{RedHat, OFTC, Freenode}
> 0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


* Re: A Transformation of our Global Context
  2016-06-10 19:47 ` Gregory Farnum
@ 2016-06-10 19:58   ` Adam C. Emerson
  2016-06-10 20:06     ` Gregory Farnum
  2016-06-10 20:28     ` Allen Samuels
  0 siblings, 2 replies; 28+ messages in thread
From: Adam C. Emerson @ 2016-06-10 19:58 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: The Sacred Order of the Squid Cybernetic

On 10/06/2016, Gregory Farnum wrote:
> Maybe I'm misunderstanding the scope of what you're proposing, but I
> don't think this is a good reason to go through this work right now.
> Lots of things that seem process-specific turn out to be essentially
> cluster properties, despite being configurable on both ends, and will
> need to be referenced from all over the stack. And in general if we're
> addressing multiple Ceph clusters within a utility we want a full
> RadosClient etc for each cluster, so the split doesn't do us much good
> on its own, does it? (If we don't want separate admin sockets for each
> one, we'd need to extend all of those interfaces to support two
> entries for the same type of thing, etc)

Finding out exactly which things are process-specific is one reason
I'm soliciting feedback. AdminSocket doesn't have to be
included. Really, I think we'd get a lot of benefit if we made JUST
some debug options and dout support process specific and refactored it
that way. Other things could move later on if people wanted them to
but wouldn't have to.


> If this is your actual goal, I think the unappealing work of replacing
> g_ceph_context with a CephContext *cct is the way to go — we'd need to
> do this much to support multiple ClusterContexts within a process
> anyway. Once we have a motivating use case for splitting stuff up, we
> can address that. But if we try and do a generic solution it'll be
> much more difficult, and probably will miss some key points we don't
> run into until trying to make use of it.
> 
> Am I missing something?
> -Greg

I have been working on the unappealing work of replacing
g_ceph_context with CephContext* cct. It's not just unappealing work;
it ends up with a result ugly enough that I don't think I'd want to
merge it if someone presented it to me. It's very invasive in that it
changes a whole lot of everything. The split would ALSO change a whole
lot of everything, but likely less. You would at least be able to
avoid changing the constructors of a whole bunch of unrelated things
just to pass in a variable that only some member several classes deep
cares about so it can log a message.


-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


* Re: A Transformation of our Global Context
  2016-06-10 19:58   ` Adam C. Emerson
@ 2016-06-10 20:06     ` Gregory Farnum
  2016-06-10 20:15       ` Adam C. Emerson
  2016-06-10 20:38       ` Sage Weil
  2016-06-10 20:28     ` Allen Samuels
  1 sibling, 2 replies; 28+ messages in thread
From: Gregory Farnum @ 2016-06-10 20:06 UTC (permalink / raw)
  To: The Sacred Order of the Squid Cybernetic

On Fri, Jun 10, 2016 at 12:58 PM, Adam C. Emerson <aemerson@redhat.com> wrote:
> On 10/06/2016, Gregory Farnum wrote:
>> Maybe I'm misunderstanding the scope of what you're proposing, but I
>> don't think this is a good reason to go through this work right now.
>> Lots of things that seem process-specific turn out to be essentially
>> cluster properties, despite being configurable on both ends, and will
>> need to be referenced from all over the stack. And in general if we're
>> addressing multiple Ceph clusters within a utility we want a full
>> RadosClient etc for each cluster, so the split doesn't do us much good
>> on its own, does it? (If we don't want separate admin sockets for each
>> one, we'd need to extend all of those interfaces to support two
>> entries for the same type of thing, etc)
>
> Finding out exactly which things are process-specific is one reason
> I'm soliciting feedback. AdminSocket doesn't have to be
> included. Really, I think we'd get a lot of benefit if we made JUST
> some debug options and dout support process specific and refactored it
> that way. Other things could move later on if people wanted them to
> but wouldn't have to.
>
>
>> If this is your actual goal, I think the unappealing work of replacing
>> g_ceph_context with a CephContext *cct is the way to go — we'd need to
>> do this much to support multiple ClusterContexts within a process
>> anyway. Once we have a motivating use case for splitting stuff up, we
>> can address that. But if we try and do a generic solution it'll be
>> much more difficult, and probably will miss some key points we don't
>> run into until trying to make use of it.
>>
>> Am I missing something?
>> -Greg
>
> I have been working on the unappealing work of replacing
> g_ceph_context with CephContext* cct. It's not just unappealing work;
> it ends up with a result ugly enough that I don't think I'd want to
> merge it if someone presented it to me. It's very invasive in that it
> changes a whole lot of everything. The split would ALSO change a whole
> lot of everything, but likely less. You would at least be able to
> avoid changing the constructors of a whole bunch of unrelated things
> just to pass in a variable that only some member several classes deep
> cares about so it can log a message.

So you want everything to stay with/go back to having a global member
for the ProcessContext?
I'm surprised it needs to find its way into so many random things;
most objects have references to their parent that let them go back to
the MDSRank/OSD/whatever main class there is.

>
>
> --
> Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
> IRC: Aemerson@{RedHat, OFTC, Freenode}
> 0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


* Re: A Transformation of our Global Context
  2016-06-10 20:06     ` Gregory Farnum
@ 2016-06-10 20:15       ` Adam C. Emerson
  2016-06-10 20:40         ` Sage Weil
  2016-06-10 20:38       ` Sage Weil
  1 sibling, 1 reply; 28+ messages in thread
From: Adam C. Emerson @ 2016-06-10 20:15 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: The Sacred Order of the Squid Cybernetic

On 10/06/2016, Gregory Farnum wrote:
> So you want everything to stay with/go back to having a global member
> for the ProcessContext?

Precisely. I want a ProcessContext to handle dout and the like in
libcommon (since it doesn't vary by cluster I don't think there's any
reason to restrict it to libglobal/daemon/whatever) and let everything
else be passed in.

> I'm surprised it needs to find its way into so many random things;
> most objects have references to their parent that let them go back to
> the MDSRank/OSD/whatever main class there is.

A lot of them do have a link back to the main class, but a lot
don't. A lot of 'light weight' data classes in the MDS, for example,
don't have a link back to anything but still need g_ceph_context to
tell time or dout something. A lot of callbacks do that, too. Other
places link back, but the cct member is protected (which I just
switched to public in my version, but that isn't ideal).

After having tried to do it, I think having a fairly minimal
(process-specific) part that's global in libcommon and the rest that
Isn't Global Anywhere is much less disruptive.

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


* RE: A Transformation of our Global Context
  2016-06-10 19:58   ` Adam C. Emerson
  2016-06-10 20:06     ` Gregory Farnum
@ 2016-06-10 20:28     ` Allen Samuels
  2016-06-10 20:43       ` Adam C. Emerson
  1 sibling, 1 reply; 28+ messages in thread
From: Allen Samuels @ 2016-06-10 20:28 UTC (permalink / raw)
  To: Adam C. Emerson, Gregory Farnum; +Cc: The Sacred Order of the Squid Cybernetic

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Adam C. Emerson
> Sent: Friday, June 10, 2016 12:59 PM
> To: Gregory Farnum <gfarnum@redhat.com>
> Cc: The Sacred Order of the Squid Cybernetic <ceph-devel@vger.kernel.org>
> Subject: Re: A Transformation of our Global Context
> 
> On 10/06/2016, Gregory Farnum wrote:
> > Maybe I'm misunderstanding the scope of what you're proposing, but I
> > don't think this is a good reason to go through this work right now.
> > Lots of things that seem process-specific turn out to be essentially
> > cluster properties, despite being configurable on both ends, and will
> > need to be referenced from all over the stack. And in general if we're
> > addressing multiple Ceph clusters within a utility we want a full
> > RadosClient etc for each cluster, so the split doesn't do us much good
> > on its own, does it? (If we don't want separate admin sockets for each
> > one, we'd need to extend all of those interfaces to support two
> > entries for the same type of thing, etc)
> 
> Finding out exactly which things are process-specific is one reason I'm
> soliciting feedback. AdminSocket doesn't have to be included. Really, I
> think we'd get a lot of benefit if we made JUST some debug options and dout
> support process specific and refactored it that way. Other things could move
> later on if people wanted them to but wouldn't have to.
> 
> 
> > If this is your actual goal, I think the unappealing work of replacing
> > g_ceph_context with a CephContext *cct is the way to go — we'd need to
> > do this much to support multiple ClusterContexts within a process
> > anyway. Once we have a motivating use case for splitting stuff up, we
> > can address that. But if we try and do a generic solution it'll be
> > much more difficult, and probably will miss some key points we don't
> > run into until trying to make use of it.
> >
> > Am I missing something?
> > -Greg
> 
> I have been working on the unappealing work of replacing g_ceph_context
> with CephContext* cct. It's not just unappealing work; it ends up with a
> result ugly enough that I don't think I'd want to merge it if someone
> presented it to me. It's very invasive in that it changes a whole lot of
> everything. The split would ALSO change a whole lot of everything, but
> likely less. You would at least be able to avoid changing the constructors
> of a whole bunch of unrelated things just to pass in a variable that only
> some member several classes deep cares about so it can log a message.

Have you considered thread-local storage for this? I suspect it's a *much* easier way to go. Very little source code will need modification.
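
For illustration, something like this (untested sketch, names made up):

  // One thread-local pointer; most code keeps referring to a single
  // name and needs no signature changes.
  thread_local CephContext *tls_cct = nullptr;

  struct ScopedContext {          // RAII set/restore around entry points
    CephContext *prev;
    explicit ScopedContext(CephContext *c) : prev(tls_cct) { tls_cct = c; }
    ~ScopedContext() { tls_cct = prev; }
  };

  // At each point that knows its cluster:
  //   ScopedContext sc(client->cct);
  //   ... downstream code reads tls_cct instead of g_ceph_context ...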

> 
> 
> --
> Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
> IRC: Aemerson@{RedHat, OFTC, Freenode}
> 0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


* Re: A Transformation of our Global Context
  2016-06-10 20:06     ` Gregory Farnum
  2016-06-10 20:15       ` Adam C. Emerson
@ 2016-06-10 20:38       ` Sage Weil
  2016-06-10 21:04         ` Adam C. Emerson
  1 sibling, 1 reply; 28+ messages in thread
From: Sage Weil @ 2016-06-10 20:38 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: The Sacred Order of the Squid Cybernetic


On Fri, 10 Jun 2016, Gregory Farnum wrote:
> On Fri, Jun 10, 2016 at 12:58 PM, Adam C. Emerson <aemerson@redhat.com> wrote:
> > On 10/06/2016, Gregory Farnum wrote:
> >> Maybe I'm misunderstanding the scope of what you're proposing, but I
> >> don't think this is a good reason to go through this work right now.
> >> Lots of things that seem process-specific turn out to be essentially
> >> cluster properties, despite being configurable on both ends, and will
> >> need to be referenced from all over the stack. And in general if we're
> >> addressing multiple Ceph clusters within a utility we want a full
> >> RadosClient etc for each cluster, so the split doesn't do us much good
> >> on its own, does it? (If we don't want separate admin sockets for each
> >> one, we'd need to extend all of those interfaces to support two
> >> entries for the same type of thing, etc)
> >
> > Finding out exactly which things are process-specific is one reason
> > I'm soliciting feedback. AdminSocket doesn't have to be
> > included. Really, I think we'd get a lot of benefit if we made JUST
> > some debug options and dout support process specific and refactored it
> > that way. Other things could move later on if people wanted them to
> > but wouldn't have to.
> >
> >
> >> If this is your actual goal, I think the unappealing work of replacing
> >> g_ceph_context with a CephContext *cct is the way to go — we'd need to
> >> do this much to support multiple ClusterContexts within a process
> >> anyway. Once we have a motivating use case for splitting stuff up, we
> >> can address that. But if we try and do a generic solution it'll be
> >> much more difficult, and probably will miss some key points we don't
> >> run into until trying to make use of it.
> >>
> >> Am I missing something?
> >> -Greg
> >
> > I have been working on the unappealing work of replacing
> > g_ceph_context with CephContext* cct. It's not just unappealing work;
> > it ends up with a result ugly enough that I don't think I'd want to
> > merge it if someone presented it to me. It's very invasive in that it
> > changes a whole lot of everything. The split would ALSO change a whole
> > lot of everything, but likely less. You would at least be able to
> > avoid changing the constructors of a whole bunch of unrelated things
> > just to pass in a variable that only some member several classes deep
> > cares about so it can log a message.
> 
> So you want everything to stay with/go back to having a global member
> for the ProcessContext?
> I'm surprised it needs to find its way into so many random things;
> most objects have references to their parent that let them go back to
> the MDSRank/OSD/whatever main class there is.

I think the crux of the issue is the debug/logging code.  Almost all the 
other uses of cct are related to config options that are per-cluster.

And with logging... it seems like even this is really a per-cluster (well, 
per entity) thing.  Sorting through a log file that combines output from 
two different objecters sounds horrific, and when we cram multiple
OSDs into one process, we're going to want them to still feed into 
different log files.  (And pid files. And admin sockets.)

There are a few things that really are process-bound (like the crypto 
libraries) and are obvious candidates, but I think even those we have to 
be careful with; using globals for anything that is dynamically linked 
(i.e., included in librados or libcommon) is generally a big no-no.  IIRC 
lockdep is the main place where we break that rule (for better or for 
worse).

How bad is the cct plumbing patch?  I fear this might still be the best 
path forward...

sage


* Re: A Transformation of our Global Context
  2016-06-10 20:15       ` Adam C. Emerson
@ 2016-06-10 20:40         ` Sage Weil
  2016-06-10 20:45           ` Adam C. Emerson
  0 siblings, 1 reply; 28+ messages in thread
From: Sage Weil @ 2016-06-10 20:40 UTC (permalink / raw)
  To: Adam C. Emerson; +Cc: Gregory Farnum, The Sacred Order of the Squid Cybernetic

On Fri, 10 Jun 2016, Adam C. Emerson wrote:
> don't. A lot of 'light weight' data classes in the MDS for example
> don't have a link back to anything but still need g_ceph_context to
> tell time or dout something. A lot of callbacks do that, too. other

FWIW the time thing is a bit of a red herring.  The only reason time takes 
cct is for the configurable clock skew.  That was added for debugging 
purposes but has never (to my knowledge) been used.  I'd be happy to rip 
it out.
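
From memory, the clock path is roughly this (simplified):

  utime_t ceph_clock_now(CephContext *cct)
  {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    utime_t n(&tv);
    if (cct)
      n += cct->_conf->clock_offset;  // the debugging skew; almost never set
    return n;
  }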

There is an implicit dependency on the host tz in the render-to-string 
method, but that's handled by glibc.

sage


* Re: A Transformation of our Global Context
  2016-06-10 20:28     ` Allen Samuels
@ 2016-06-10 20:43       ` Adam C. Emerson
  2016-06-10 22:38         ` Allen Samuels
  0 siblings, 1 reply; 28+ messages in thread
From: Adam C. Emerson @ 2016-06-10 20:43 UTC (permalink / raw)
  To: Allen Samuels; +Cc: Gregory Farnum, The Sacred Order of the Squid Cybernetic

On 10/06/2016, Allen Samuels wrote:
> Have you considered thread-local storage for this? I suspect it's a
> *much* easier way to go. Very little source code will need modification.

I thought about it briefly, but I don't think it would work for tools
that spawn two RadosClients each pointing to a different cluster. While
each would launch threads of its own, they'd both share the application
threads for outgoing operations (at least until they hit the messenger). So I
think we'd still need to tease apart the class somewhat to make thread local
storage work.

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


* Re: A Transformation of our Global Context
  2016-06-10 20:40         ` Sage Weil
@ 2016-06-10 20:45           ` Adam C. Emerson
  2016-06-10 20:53             ` Sage Weil
  0 siblings, 1 reply; 28+ messages in thread
From: Adam C. Emerson @ 2016-06-10 20:45 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, The Sacred Order of the Squid Cybernetic

On 10/06/2016, Sage Weil wrote:
> FWIW the time thing is a bit of a red herring.  The only reason time takes 
> cct is for the configurable clock skew.  That was added for debugging 
> purposes but has never (to my knowledge) been used.  I'd be happy to rip 
> it out.

I wasn't aware of that. I'd asked once if the clock skew debug option
was really important (back with the original time patch) and was told
it was. The person I was talking to might have thought I was talking
about monitor clock skew or something.

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


* Re: A Transformation of our Global Context
  2016-06-10 20:45           ` Adam C. Emerson
@ 2016-06-10 20:53             ` Sage Weil
  2016-06-10 23:57               ` Joao Eduardo Luis
  0 siblings, 1 reply; 28+ messages in thread
From: Sage Weil @ 2016-06-10 20:53 UTC (permalink / raw)
  To: Adam C. Emerson; +Cc: Gregory Farnum, The Sacred Order of the Squid Cybernetic

On Fri, 10 Jun 2016, Adam C. Emerson wrote:
> On 10/06/2016, Sage Weil wrote:
> > FWIW the time thing is a bit of a red herring.  The only reason time takes 
> > cct is for the configurable clock skew.  That was added for debugging
> > purposes but has never (to my knowledge) been used.  I'd be happy to rip 
> > it out.
> 
> I wasn't aware of that. I'd asked once if the clock skew debug option
> was really important (back with the original time patch) and was told
> it was. The person I was talking to might have thought I was talking
> about monitor clock skew or something.

Ah, there is *one* teuthology test that sets this:

$ git grep clock\ offset | cat
suites/rados/multimon/tasks/mon_clock_with_skews.yaml:        clock offset: 10

I'd be quite happy to drop this test, or kludge around it some other way 
(e.g., put a mon on a different host and change the system time), if it 
saves us as much code churn as I suspect it will...

sage


* Re: A Transformation of our Global Context
  2016-06-10 20:38       ` Sage Weil
@ 2016-06-10 21:04         ` Adam C. Emerson
  2016-06-10 21:20           ` Sage Weil
  0 siblings, 1 reply; 28+ messages in thread
From: Adam C. Emerson @ 2016-06-10 21:04 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, The Sacred Order of the Squid Cybernetic

On 10/06/2016, Sage Weil wrote:
> I think the crux of the issue is the debug/logging code.  Almost all
> the other uses of cct are related to config options that are
> per-cluster.

That is correct. I thought we might pull out some other things (like
the Crypto stuff) while we were at it, but that was more "As long as
I'm here anyway I might as well see if there's anything more to do..."

> And with logging... it seems like even this is really a per-cluster
> (well, per entity) thing.  Sorting through a log file that combines
> output from two different objecters sounds horrific, and when we
> cram multiple OSDs into one process, we're going to want them to
> still feed into different log files.

This is a good point that I had not considered. Though, I'm not sure
if the un-separated case is that much better. I think part of the
savings we would want from multi-OSD work (at least in principle) is
to be able to have them share messengers and other overhead. In that
case giving each OSD its own fully fledged CephContext doesn't make
sense, I don't think.

Some form of steering might work, perhaps making the log system smart
enough to shove things around based on subsystem or an entity
identifier or something.
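
E.g., something shaped like this (pure sketch, hypothetical types and
paths):

  // Pure sketch: tag each entry with its entity and fan out to
  // per-entity sinks inside one shared log back end.
  #include <fstream>
  #include <map>
  #include <string>

  struct Entry {
    std::string entity;     // "osd.1", "client.admin", ...
    int subsys, level;
    std::string msg;
  };

  class EntityLog {
    std::map<std::string, std::ofstream> sinks;  // entity -> log file
  public:
    void submit(const Entry &e) {
      std::ofstream &out = sinks[e.entity];      // opened lazily
      if (!out.is_open())
        out.open("/var/log/ceph/" + e.entity + ".log", std::ios::app);
      out << e.subsys << '/' << e.level << ' ' << e.msg << '\n';
    }
  };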

> How bad is the cct plumbing patch?  I fear this might still be the
> best path forward...

It's not great? Really at all. I wrote it and I don't like it. I
could pull it up and smack it into shape and make sure it passes tests
if it were necessary.

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


* Re: A Transformation of our Global Context
  2016-06-10 21:04         ` Adam C. Emerson
@ 2016-06-10 21:20           ` Sage Weil
  2016-06-10 22:51             ` Adam C. Emerson
  2016-11-11 21:31             ` Yehuda Sadeh-Weinraub
  0 siblings, 2 replies; 28+ messages in thread
From: Sage Weil @ 2016-06-10 21:20 UTC (permalink / raw)
  To: Adam C. Emerson; +Cc: Gregory Farnum, The Sacred Order of the Squid Cybernetic

On Fri, 10 Jun 2016, Adam C. Emerson wrote:
> On 10/06/2016, Sage Weil wrote:
> > I think the crux of the issue is the debug/logging code.  Almost all
> > the other uses of cct are related to config options that are
> > per-cluster.
> 
> That is correct. I thought we might pull out some other things (like
> the Crypto stuff) while we were at it, but that was more "As long as
> I'm here anyway I might as well see if there's anything more to do..."
> 
> > And with logging... it seems like even this is really a per-cluster
> > (well, per entity) thing.  Sorting through a log file that combines
> > output from two different objecters sounds horrific, and when we
> > cram multiple OSDs into one process, we're going to want them to
> > still feed into different log files.
> 
> This is a good point that I had not considered. Though, I'm not sure
> if the un-separated case is that much better. I think part of the
> savings we would want from multi-OSD work (at least in principle) is
> to be able to have them share messengers and other overhead. In that
> case giving each OSD its own fully fledged CephContext doesn't make
> sense, I don't think.

The messengers aren't tied to CephContext, so I don't think this changes 
things in that regard.  I'm guessing we'll end up with something where 
each entity has a thing that implements the Messenger interface but many 
of those are sharing all their resources behind the scenes (thread pools, 
multiplexing their connections, etc.).  We'll have to figure out how the 
config options related to messenger itself are handled, but I think that 
oddity needs to be done explicitly anyway.  We wouldn't, for instance, want 
to assume that all clusters in the same process must share the same 
messenger options.  (Maybe we invent an entity that governs the shared 
resources (process.NNN) and somehow direct config options/admin socket 
commands/whatever toward that?  Who knows.)
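
In other words, something shaped like this (hand-waving sketch,
made-up types):

  // Hand-waving sketch: many entity-facing messengers, one shared core.
  #include <memory>
  #include <string>

  struct MsgrCore {
    // Shared thread pools, multiplexed connections, etc. live here.
    void submit(const std::string &from, const std::string &to,
                const std::string &payload) { /* ... */ }
  };

  class EntityMessenger {            // what each OSD/client entity holds
    std::shared_ptr<MsgrCore> core;  // shared behind the scenes
    std::string whoami;              // e.g. "osd.1", or "process.123"
  public:
    EntityMessenger(std::shared_ptr<MsgrCore> c, std::string who)
      : core(std::move(c)), whoami(std::move(who)) {}
    void send(const std::string &to, const std::string &payload) {
      core->submit(whoami, to, payload);
    }
  };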

> Some form of steering might work, perhaps making the log system smart
> enough to shove things around based on subsystem or an entity
> identifier or something.
> 
> > How bad is the cct plumbing patch?  I fear this might still be the
> > best path forward...
> 
> It's not great? Really at all. I wrote it and I don't like it. I
> could pull it up and smack it into shape and make sure it passes tests
> if it were necessary.

Is it pushed somewhere?  Maybe much of the pain is related to the clock 
thing...

sage


* RE: A Transformation of our Global Context
  2016-06-10 20:43       ` Adam C. Emerson
@ 2016-06-10 22:38         ` Allen Samuels
  2016-06-10 22:44           ` Adam C. Emerson
  0 siblings, 1 reply; 28+ messages in thread
From: Allen Samuels @ 2016-06-10 22:38 UTC (permalink / raw)
  To: Adam C. Emerson; +Cc: Gregory Farnum, The Sacred Order of the Squid Cybernetic

Not sure why you'd want two RadosClient instances to share outgoing threads etc. Seems like a bad idea to me.


Allen Samuels
SanDisk |a Western Digital brand
2880 Junction Avenue, San Jose, CA 95134
T: +1 408 801 7030| M: +1 408 780 6416
allen.samuels@SanDisk.com


> -----Original Message-----
> From: Adam C. Emerson [mailto:aemerson@redhat.com]
> Sent: Friday, June 10, 2016 1:43 PM
> To: Allen Samuels <Allen.Samuels@sandisk.com>
> Cc: Gregory Farnum <gfarnum@redhat.com>; The Sacred Order of the Squid
> Cybernetic <ceph-devel@vger.kernel.org>
> Subject: Re: A Transformation of our Global Context
> 
> On 10/06/2016, Allen Samuels wrote:
> > Have you considered thread-local storage for this? I suspect it's a
> > *much* easier way to go. Very little source code will need modification.
> 
> I thought about it briefly, but I don't think it would work for tools that spawn
> two RadosClients each pointing to a different cluster. While each would
> launch threads of its own, they'd both share the application threads for
> outgoing operations (at least until they hit the messenger). So I think we'd
> still need to tease apart the class somewhat to make thread local storage
> work.
> 
> --
> Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
> IRC: Aemerson@{RedHat, OFTC, Freenode}
> 0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


* Re: A Transformation of our Global Context
  2016-06-10 22:38         ` Allen Samuels
@ 2016-06-10 22:44           ` Adam C. Emerson
  0 siblings, 0 replies; 28+ messages in thread
From: Adam C. Emerson @ 2016-06-10 22:44 UTC (permalink / raw)
  To: Allen Samuels; +Cc: Gregory Farnum, The Sacred Order of the Squid Cybernetic

On 10/06/2016, Allen Samuels wrote:
> Not sure why you'd want two RadosClient instances to share outgoing threads etc. Seems like a bad idea to me.

No, I meant that when I, as an application, say to write an object, everything
happens in my (the application's) thread down through the objecter until we
hit the messenger. So it's not that I /want/ them to share threads, it's just
that they do on the outgoing operation path. (And having them not do so would
involve a bunch of overhead.)
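
Concretely (librados sketch; config paths made up, error handling
elided):

  #include <rados/librados.hpp>

  int main() {
    librados::Rados a, b;                  // two clusters, one process
    a.init(nullptr);
    b.init(nullptr);
    a.conf_read_file("/etc/ceph/a.conf");  // hypothetical paths
    b.conf_read_file("/etc/ceph/b.conf");
    a.connect();
    b.connect();
    // Any I/O issued from here runs in *this* application thread down
    // through the objecter, for either client, so a thread-local
    // context can't tell which cluster's context should apply.
    a.shutdown();
    b.shutdown();
    return 0;
  }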

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


* Re: A Transformation of our Global Context
  2016-06-10 21:20           ` Sage Weil
@ 2016-06-10 22:51             ` Adam C. Emerson
  2016-11-11 21:31             ` Yehuda Sadeh-Weinraub
  1 sibling, 0 replies; 28+ messages in thread
From: Adam C. Emerson @ 2016-06-10 22:51 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, The Sacred Order of the Squid Cybernetic

On 10/06/2016, Sage Weil wrote:
> The messengers aren't tied to CephContext, so I don't think this changes 

I'd thought each Messenger had a CephContext of its own, and you'd want it to
line up with that of the caller if you're interested in making sure logs don't
go in the wrong places.

> things in that regard.  I'm guessing we'll end up with something where 
> each entity has a thing that implements the Messenger interface but many 
> of those are sharing all their resources behind the scenes (thread pools, 
> multiplexing their connections, etc.).  We'll have to figure out how the 
> config options related to messenger itself are handled, but I think that 
> oddity needs to be done explicitly anyway.  We wouldn't for instance, want 
> to assume that all clusters in the same process must share the same 
> messenger options.  (Maybe we invent an entity that governs the shared 
> resources (process.NNN) and somehow direct config options/admin socket 
> commands/whatever toward that?  Who knows.)

That's also true; you can replumb things a fair bit.

> Is it pushed somewhere?  Maybe much of the pain is related to the clock 
> thing...

https://github.com/adamemerson/ceph/tree/wip-g_ceph_context-exterminate

There's more to be done on it, but that's what's there. The Clock thing might
help some, but the logging does seem to kill us.

It ended up becoming much more of a time sink than I expected, so even if we
do end up going that route I need to do other things before I put much more
work into it. (I'm going to look at that reweighting bug, for one thing.)

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


* Re: A Transformation of our Global Context
  2016-06-10 20:53             ` Sage Weil
@ 2016-06-10 23:57               ` Joao Eduardo Luis
  0 siblings, 0 replies; 28+ messages in thread
From: Joao Eduardo Luis @ 2016-06-10 23:57 UTC (permalink / raw)
  To: Sage Weil, Adam C. Emerson
  Cc: Gregory Farnum, The Sacred Order of the Squid Cybernetic

On 06/10/2016 09:53 PM, Sage Weil wrote:
> On Fri, 10 Jun 2016, Adam C. Emerson wrote:
>> On 10/06/2016, Sage Weil wrote:
>>> FWIW the time thing is a bit of a red herring.  The only reason time takes
>>> cct is for the configurable clock skew.  That was added for debugging
>>> purposes but has never (to my knowledge) been used.  I'd be happy to rip
>>> it out.
>>
>> I wasn't aware of that. I'd asked once if the clock skew debug option
>> was really important (back with the original time patch) and was told
>> it was. The person I was talking to might have thought I was talking
>> about monitor clock skew or something.
>
> Ah, there is *one* teuthology test that sets this:
>
> $ git grep clock\ offset | cat
> suites/rados/multimon/tasks/mon_clock_with_skews.yaml:        clock offset: 10
>
> I'd be quite happy to drop this test, or kludge around it some other way
> (e.g., put a mon on a different host and change the system time), if it
> saves us as much code churn as I suspect it will...

I've been hacking a mon debug interface via the ceph tool to test 
features and do all sorts of nasty things (provided a given experimental 
feature is enabled and the user is really, really sure about it), and I 
think we could then use it to test clock skews just as well. It'd be a 
matter of telling a given monitor to just skew the response to the 
leader during the timecheck round.

   -Joao



* Re: A Transformation of our Global Context
  2016-06-10 21:20           ` Sage Weil
  2016-06-10 22:51             ` Adam C. Emerson
@ 2016-11-11 21:31             ` Yehuda Sadeh-Weinraub
  2016-11-11 22:17               ` Bassam Tabbara
  1 sibling, 1 reply; 28+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2016-11-11 21:31 UTC (permalink / raw)
  To: Weil, Sage
  Cc: Adam C. Emerson, Gregory Farnum,
	The Sacred Order of the Squid Cybernetic

Resurrecting this thread.

Trying to extract a simple and practical plan. Here are my thoughts:

We need to have different ceph contexts for:
 - process (e.g., ceph-osd)
 - entity (e.g., osd.1, osd.2)
 - cluster (? -- maybe in the case where it's a client)

We can make a thread local storage variable that will reference the
context this thread needs to use.
Threads that deal with common stuff should use the process or entity
context. This should be set explicitly in these threads. Upon thread
creation we need to set the context for that thread. By default the
context that will be used will be the process context. g_ceph_context
can be changed to be a function (via a macro) that handles the context
selection logic.
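
Roughly (sketch):

  // Sketch: per-thread context selection behind the existing name.
  thread_local CephContext *tls_cct = nullptr;  // set at thread creation
  CephContext *g_process_cct = nullptr;         // the process context

  inline CephContext *current_ceph_context() {
    return tls_cct ? tls_cct : g_process_cct;   // default to process
  }
  // Existing call sites keep compiling unchanged:
  #define g_ceph_context current_ceph_context()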

Thoughts?

Yehuda

On Fri, Jun 10, 2016 at 2:20 PM, Sage Weil <sage@newdream.net> wrote:
> On Fri, 10 Jun 2016, Adam C. Emerson wrote:
>> On 10/06/2016, Sage Weil wrote:
>> > I think the crux of the issue is the debug/logging code.  Almost all
>> > the other uses of cct are related to config options that are
>> > per-cluster.
>>
>> That is correct. I thought we might pull out some other things (like
>> the Crypto stuff) while we were at it, but that was more "As long as
>> I'm here anyway I might as well see if there's anything more to do..."
>>
>> > And with logging... it seems like even this is really a per-cluster
>> > (well, per entity) thing.  Sorting through a log file that combines
>> > output from two different objecters sounds horrific, and when we
>> > cram multiple OSDs into one process, we're going to want them to
>> > still feed into different log files.
>>
>> This is a good point that I had not considered. Though, I'm not sure
>> if the un-separated case is that much better. I think part of the
>> savings we would want from multi-OSD work (at least in principle) is
>> to be able to have them share messengers and other overhead. In that
>> case giving each OSD its own fully fledged CephContext doesn't make
>> sense, I don't think.
>
> The messengers aren't tied to CephContext, so I don't think this changes
> things in that regard.  I'm guessing we'll end up with something where
> each entity has a thing that implements the Messenger interface but many
> of those are sharing all their resources behind the scenes (thread pools,
> multiplexing their connections, etc.).  We'll have to figure out how the
> config options related to messenger itself are handled, but I think that
> oddity needs to be done explicitly anyway.  We wouldn't for instance, want
> to assume that all clusters in the same process must share the same
> messenger options.  (Maybe we invent an entity that governs the shared
> resources (process.NNN) and somehow direct config options/admin socket
> commands/whatever toward that?  Who knows.)
>
>> Some form of steering might work, perhaps making the log system smart
>> enough to shove things around based on subsystem or an entity
>> identifier or something.
>>
>> > How bad is the cct plumbing patch?  I fear this might still be the
>> > best path forward...
>>
>> It's not great? Really at all. I wrote it and I don't like it. I
>> could pull it up and smack it into shape and make sure it passes tests
>> if it were necessary.
>
> Is it pushed somewhere?  Maybe much of the pain is related to the clock
> thing...
>
> sage


* Re: A Transformation of our Global Context
  2016-11-11 21:31             ` Yehuda Sadeh-Weinraub
@ 2016-11-11 22:17               ` Bassam Tabbara
  2016-11-11 22:53                 ` Yehuda Sadeh-Weinraub
  0 siblings, 1 reply; 28+ messages in thread
From: Bassam Tabbara @ 2016-11-11 22:17 UTC (permalink / raw)
  To: Yehuda Sadeh-Weinraub
  Cc: Sage Weil, Adam C. Emerson, Gregory Farnum,
	The Sacred Order of the Squid Cybernetic

Sorry, I’m late to the discussion, but why not just pass the context to every class that needs it? It’s a painful one-time change, but the compiler will keep you honest. Also, if we can figure out a way around the clock skew (which seems like a corner case), I think the surface area is significantly reduced. Using TLS and other mechanisms seems like more complication.
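
i.e., plain constructor injection (sketch, with a made-up class):

  // Sketch: every class that logs or reads config takes the context in
  // its constructor, so a missing context is a compile error.
  class Flusher {
    CephContext *cct;            // made-up example class
  public:
    explicit Flusher(CephContext *cct_) : cct(cct_) {}
    void flush() {
      // ldout(cct, 10) << "flushing" << dendl;
    }
  };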

Thanks!
Bassam

> On Nov 11, 2016, at 1:31 PM, Yehuda Sadeh-Weinraub <yehuda@redhat.com> wrote:
> 
> Resurrecting this thread.
> 
> Trying to extract a simple and practical plan. Here are my thoughts:
> 
> We need to have different ceph contexts for:
> - process (e.g., ceph-osd)
> - entity (e.g., osd.1, osd.2)
> - cluster (? -- maybe in the case where it's a client)
> 
> We can make a thread local storage variable that will reference the
> context this thread needs to use.
> Threads that deal with common stuff should use the process or entity
> context. This should be set explicitly in these threads. Upon thread
> creation we need to set the context for that thread. By default the
> context that will be used will be the process context. g_ceph_context
> can be changed to be a function (via a macro) that handles the context
> selection logic.
> 
> Thoughts?
> 
> Yehuda
> 
> On Fri, Jun 10, 2016 at 2:20 PM, Sage Weil <sage@newdream.net> wrote:
>> On Fri, 10 Jun 2016, Adam C. Emerson wrote:
>>> On 10/06/2016, Sage Weil wrote:
>>>> I think the crux of the issue is the debug/logging code.  Almost all
>>>> the other uses of cct are related to config options that are
>>>> per-cluster.
>>> 
>>> That is correct. I thought we might pull out some other things (like
>>> the Crypto stuff) while we were at it, but that was more "As long as
>>> I'm here anyway I might as well see if there's anything more to do..."
>>> 
>>>> And with logging... it seems like even this is really a per-cluster
>>>> (well, per entity) thing.  Sorting through a log file that combines
>>>> output from two different objecters sounds horrific, and when we
>>>> cram multiple OSDs into one process, we're going to want them to
>>>> still feed into different log files.
>>> 
>>> This is a good point that I had not considered. Though, I'm not sure
>>> if the un-separated case is that much better. I think part of the
>>> savings we would want from multi-OSD work (at least in principle) is
>>> to be able to have them share messengers and other overhead. In that
>>> case giving each OSD its own fully fledged CephContext doesn't make
>>> sense, I don't think.
>> 
>> The messengers aren't tied to CephContext, so I don't think this changes
>> things in that regard.  I'm guessing we'll end up with something where
>> each entity has a thing that implements the Messenger interface but many
>> of those are sharing all their resources behind the scenes (thread pools,
>> multiplexing their connections, etc.).  We'll have to figure out how the
>> config options related to messenger itself are handled, but I think that
>> oddity needs to be done explicitly anyway.  We wouldn't for instance, want
>> to assume that all clusters in the same process must share the same
>> messenger options.  (Maybe we invent an entity that governs the shared
>> resources (process.NNN) and somehow direct config options/admin socket
>> commands/whatever toward that?  Who knows.)
>> 
>>> Some form of steering might work, perhaps making the log system smart
>>> enough to shove things around based on subsystem or an entity
>>> identifier or something.
>>> 
>>>> How bad is the cct plumbing patch?  I fear this might still be the
>>>> best path forward...
>>> 
>>> It's not great? Really at all. I wrote it and I don't like it. I
>>> could pull it up and smack it into shape and make sure it passes tests
>>> if it were necessary.
>> 
>> Is it pushed somewhere?  Maybe much of the pain is related to the clock
>> thing...
>> 
>> sage




* Re: A Transformation of our Global Context
  2016-11-11 22:17               ` Bassam Tabbara
@ 2016-11-11 22:53                 ` Yehuda Sadeh-Weinraub
  2016-11-11 23:29                   ` Bassam Tabbara
       [not found]                   ` <BLUPR0301MB200478F7C6DC349474A0B5F8A7BB0@BLUPR0301MB2004.namprd03.prod.outlook.com>
  0 siblings, 2 replies; 28+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2016-11-11 22:53 UTC (permalink / raw)
  To: Bassam Tabbara
  Cc: Sage Weil, Adam C. Emerson, Gregory Farnum,
	The Sacred Order of the Squid Cybernetic

On Fri, Nov 11, 2016 at 2:17 PM, Bassam Tabbara
<Bassam.Tabbara@quantum.com> wrote:
> Sorry, I’m late to the discussion, but why not just pass the context to every class that needs it? It’s a painful one-time change, but the compiler will keep you honest. Also, if we can figure out a way around the clock skew (which seems like a corner case), I think the surface area is significantly reduced. Using TLS and other mechanisms seems like more complication.

Because then you end up passing around an extra param that needs to
get trickled down to the most basic functions. That's basically what
we are doing now, and it's a huge annoyance.

Yehuda

>
> Thanks!
> Bassam
>
>> On Nov 11, 2016, at 1:31 PM, Yehuda Sadeh-Weinraub <yehuda@redhat.com> wrote:
>>
>> Resurrecting this thread.
>>
>> Trying to extract a simple and practical plan. Here are my thoughts:
>>
>> We need to have different ceph contexts for:
>> - process (e.g., ceph-osd)
>> - entity (e.g., osd.1, osd.2)
>> - cluster (? -- maybe in the case where it's a client)
>>
>> We can make a thread local storage variable that will reference the
>> context this thread needs to use.
>> Threads that deal with common stuff should use the process or entity
>> context. This should be set explicitly in these threads. Upon thread
>> creation we need to set the context for that thread. By default the
>> context that will be used will be the process context. g_ceph_context
>> can be changed to be a function (via a macro) that handles the context
>> selection logic.
>>
>> Thoughts?
>>
>> Yehuda
>>
>> On Fri, Jun 10, 2016 at 2:20 PM, Sage Weil <sage@newdream.net> wrote:
>>> On Fri, 10 Jun 2016, Adam C. Emerson wrote:
>>>> On 10/06/2016, Sage Weil wrote:
>>>>> I think the crux of the issue is the debug/logging code.  Almost all
>>>>> the other uses of cct are related to config options that are
>>>>> per-cluster.
>>>>
>>>> That is correct. I thought we might pull out some other things (like
>>>> the Crypto stuff) while we were at it, but that was more "As long as
>>>> I'm here anyway I might as well see if there's anything more to do..."
>>>>
>>>>> And with logging... it seems like even this is really a per-cluster
>>>>> (well, per entity) thing.  Sorting through a log file that combines
>>>>> output from two different objecters sounds horrific, and when we
>>>>> cram multiple OSDs into one process, we're going to want them to
>>>>> still feed into different log files.
>>>>
>>>> This is a good point that I had not considered. Though, I'm not sure
>>>> if the un-separated case is that much better. I think part of the
>>>> savings we would want from multi-OSD work (at least in principle) is
>>>> to be able to have them share messengers and other overhead. In that
>>>> case giving each OSD its own fully fledged CephContext doesn't make
>>>> sense, I don't think.
>>>
>>> The messengers aren't tied to CephContext, so I don't think this changes
>>> things in that regard.  I'm guessing we'll end up with something where
>>> each entity has a thing that implements the Messenger interface but many
>>> of those are sharing all their resources behind the scenes (thread pools,
>>> multiplexing their connections, etc.).  We'll have to figure out how the
>>> config options related to messenger itself are handled, but I think that
>>> oddity needs to be done explicitly anyway.  We wouldn't for instance, want
>>> to assume that all clusters in the same process must share the same
>>> messenger options.  (Maybe we invent an entity that governs the shared
>>> resources (process.NNN) and somehow direct config options/admin socket
>>> commands/whatever toward that?  Who knows.)
>>>
>>>> Some form of steering might work, perhaps making the log system smart
>>>> enough to shove things around based on subsystem or an entity
>>>> identifier or something.
>>>>
>>>>> How bad is the cct plumbing patch?  I fear this might still be the
>>>>> best path forward...
>>>>
>>>> It's not great? Really at all. I wrote it and I don't like it. I
>>>> could pull it up and smack it into shape and make sure it passes tests
>>>> if it were necessary.
>>>
>>> Is it pushed somewhere?  Maybe much of the pain is related to the clock
>>> thing...
>>>
>>> sage
>
>


* Re: A Transformation of our Global Context
  2016-11-11 22:53                 ` Yehuda Sadeh-Weinraub
@ 2016-11-11 23:29                   ` Bassam Tabbara
  2016-11-11 23:45                     ` Sage Weil
  2016-11-12  0:33                     ` Matt Benjamin
       [not found]                   ` <BLUPR0301MB200478F7C6DC349474A0B5F8A7BB0@BLUPR0301MB2004.namprd03.prod.outlook.com>
  1 sibling, 2 replies; 28+ messages in thread
From: Bassam Tabbara @ 2016-11-11 23:29 UTC (permalink / raw)
  To: Yehuda Sadeh-Weinraub
  Cc: Sage Weil, Adam C. Emerson, Gregory Farnum,
	The Sacred Order of the Squid Cybernetic

Yehuda, when I looked at this a few months ago, I thought there might be a way to make it less of an annoyance. If we can remove the clock_skew and figure out a way around dout/logging (i.e., make those process-wide), then the number of classes we would need to pass the context to is greatly reduced. This is roughly the path we take for the clients (librados, for example). I’ll try to prototype this over the next few weeks, as we would love to be able to run multiple OSDs and MONs in the same process.
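
For illustration, here is a minimal sketch of what "process-wide" logging
could look like.  Every name in it (ProcessLog, set_debug_level) is invented
for the sketch -- none of this is existing Ceph API.  The log is a process
singleton, and callers name their entity at the call site instead of
threading a context through:

  #include <iostream>
  #include <mutex>
  #include <string>

  class ProcessLog {
  public:
    static ProcessLog& instance() {      // one log for the whole process
      static ProcessLog log;
      return log;
    }
    // Callers pass their entity name explicitly; no CephContext needed.
    void write(const std::string& entity, int level, const std::string& msg) {
      std::lock_guard<std::mutex> l(lock_);
      if (level <= debug_level_)
        std::cerr << entity << " [" << level << "] " << msg << "\n";
    }
    void set_debug_level(int lvl) {
      std::lock_guard<std::mutex> l(lock_);
      debug_level_ = lvl;
    }
  private:
    std::mutex lock_;
    int debug_level_ = 5;                // process-wide, not per-entity
  };

  int main() {
    ProcessLog::instance().write("osd.1", 1, "booting");
    ProcessLog::instance().write("osd.2", 1, "booting");
  }

The trade-off, raised below, is that the debug level becomes per-process
rather than per-entity.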

> On Nov 11, 2016, at 2:53 PM, Yehuda Sadeh-Weinraub <yehuda@redhat.com> wrote:
> 
> On Fri, Nov 11, 2016 at 2:17 PM, Bassam Tabbara
> <Bassam.Tabbara@quantum.com> wrote:
>> Sorry, I’m late to the discussion, but why not just pass the context to every class that needs it? It's a painful one-time change, but the compiler will keep you honest. Also, if we can figure out a way around the clock skew (which seems like a corner case), I think the surface area is significantly reduced. Using TLS and other mechanisms seems like more complication.
> 
> Because then you end up passing around an extra param that needs
> to get trickled down to the most basic functions. That's basically
> what we are doing now, and it's a huge annoyance.
> 
> Yehuda
> 
>> 
>> Thanks!
>> Bassam
>> 
>>> On Nov 11, 2016, at 1:31 PM, Yehuda Sadeh-Weinraub <yehuda@redhat.com> wrote:
>>> 
>>> Resurrecting this thread.
>>> 
>>> Trying to extract a simple and practical plan. Here are my thoughts:
>>> 
>>> We need to have different ceph contexts for:
>>> - process (e.g., ceph-osd)
>>> - entity (e.g., osd.1, osd.2)
>>> - cluster (? -- maybe in the case where it's a client)
>>> 
>>> We can make a thread local storage variable that will reference the
>>> context this thread needs to use.
>>> Threads that deal with common stuff should use the process or entity
>>> context. This should be set explicitly in these threads. Upon thread
>>> creation we need to set the context for that thread. By default the
>>> context that will be used will be the process context. g_ceph_context
>>> can be changed to be a function (via a macro) that handles the context
>>> selection logic.
>>> 
>>> Thoughts?
>>> 
>>> Yehuda
>>> 
>>> On Fri, Jun 10, 2016 at 2:20 PM, Sage Weil <sage@newdream.net> wrote:
>>>> On Fri, 10 Jun 2016, Adam C. Emerson wrote:
>>>>> On 10/06/2016, Sage Weil wrote:
>>>>>> I think the crux of the issue is the debug/logging code.  Almost all
>>>>>> the other uses of cct are related to config options that are
>>>>>> per-cluster.
>>>>> 
>>>>> That is correct. I thought we might pull out some things (like the
>>>>> Crypto stuff) while we were at it, but that was more "As long as I'm
>>>>> here anyway, I might as well see if there's anything more to do..."
>>>>> 
>>>>>> And with logging... it seems like even this is really a per-cluster
>>>>>> (well, per entity) thing.  Sorting through a log file that combines
>>>>>> output from two different objecters sounds horrific, and when we
>>>>>> cram multiple OSDs into one process, we're going to want them to
>>>>>> still feed into different log files.
>>>>> 
>>>>> This is a good point that I had not considered. Though, I'm not sure
>>>>> if the un-separated case is that much better. I think part of the
>>>>> savings we would want from multi-OSD work (at least in principle) is
>>>>> to be able to have them share messengers and other overhead. In that
>>>>> case giving each OSD its own fully fledged CephContext doesn't make
>>>>> sense, I don't think.
>>>> 
>>>> The messengers aren't tied to CephContext, so I don't think this changes
>>>> things in that regard.  I'm guessing we'll end up with something where
>>>> each entity has a thing that implements the Messenger interface but many
>>>> of those are sharing all their resources behind the scenes (thread pools,
>>>> multiplexing their connections, etc.).  We'll have to figure out how the
>>>> config options related to messenger itself are handled, but I think that
>>>> oddity needs to be done explicitly anyway.  We wouldn't for instance, want
>>>> to assume that all clusters in the same process must share the same
>>>> messenger options.  (Maybe we invent an entity that governs the shared
>>>> resources (process.NNN) and somehow direct config options/admin socket
>>>> commands/whatever toward that?  Who knows.)
>>>> 
>>>>> Some form of steering might work, perhaps making the log system smart
>>>>> enough to shove things around based on subsystem or an entity
>>>>> identifier or something.
>>>>> 
>>>>>> How bad is the cct plumbing patch?  I fear this might still be the
>>>>>> best path forward...
>>>>> 
>>>>> It's not great? Really at all. I wrote it and I don't like it. I
>>>>> could pull it up and smack it into shape and make sure it passes tests
>>>>> if it were necessary.
>>>> 
>>>> Is it pushed somewhere?  Maybe much of the pain is related to the clock
>>>> thing...
>>>> 
>>>> sage


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: A Transformation of our Global Context
  2016-11-11 23:29                   ` Bassam Tabbara
@ 2016-11-11 23:45                     ` Sage Weil
  2016-11-11 23:51                       ` Bassam Tabbara
  2016-11-12  0:33                     ` Matt Benjamin
  1 sibling, 1 reply; 28+ messages in thread
From: Sage Weil @ 2016-11-11 23:45 UTC (permalink / raw)
  To: Bassam Tabbara
  Cc: Yehuda Sadeh-Weinraub, Adam C. Emerson, Gregory Farnum,
	The Sacred Order of the Squid Cybernetic

On Fri, 11 Nov 2016, Bassam Tabbara wrote:
> Yehuda, when I looked at this a few months ago, I thought there might be 
> a way to make it less of an annoyance. If we can remove the clock_skew 

Happy to remove that, BTW.  We don't use it.

> and figure out a way around dout/logging (i.e. make those process wide) 
> then the number of classes we would need to pass the context to is 
> greatly reduced. This is roughly the path we take for the clients 
> (librados for example). I’ll try to prototype this over the next few 
> weeks as we would love to be able to run multiple OSDs and MONs in the 
> same process.

The problem here is that currently debugging levels are per-entity. We'd 
need to make the choice that per-process is sufficient for our purposes in 
order to make that leap...

But either way, our options are basically another global for debug levels, 
or some thread-local shenanigans, or a cct argument everywhere.  Right?

sage


> 
> > On Nov 11, 2016, at 2:53 PM, Yehuda Sadeh-Weinraub <yehuda@redhat.com> wrote:
> > 
> > On Fri, Nov 11, 2016 at 2:17 PM, Bassam Tabbara
> > <Bassam.Tabbara@quantum.com> wrote:
> >> Sorry, I’m late to the discussion, but why not just pass the context to every class that needs it? It's a painful one-time change, but the compiler will keep you honest. Also, if we can figure out a way around the clock skew (which seems like a corner case), I think the surface area is significantly reduced. Using TLS and other mechanisms seems like more complication.
> > 
> > Because then you end up passing around an extra param that needs
> > to get trickled down to the most basic functions. That's basically
> > what we are doing now, and it's a huge annoyance.
> > 
> > Yehuda
> > 
> >> 
> >> Thanks!
> >> Bassam
> >> 
> >>> On Nov 11, 2016, at 1:31 PM, Yehuda Sadeh-Weinraub <yehuda@redhat.com> wrote:
> >>> 
> >>> Resurrecting this thread.
> >>> 
> >>> Trying to extract a simple and practical plan. Here are my thoughts:
> >>> 
> >>> We need to have different ceph contexts for:
> >>> - process (e.g., ceph-osd)
> >>> - entity (e.g., osd.1, osd.2)
> >>> - cluster (? -- maybe in the case where it's a client)
> >>> 
> >>> We can make a thread local storage variable that will reference the
> >>> context this thread needs to use.
> >>> Threads that deal with common stuff should use the process or entity
> >>> context. This should be set explicitly in these threads. Upon thread
> >>> creation we need to set the context for that thread. By default the
> >>> context that will be used will be the process context. g_ceph_context
> >>> can be changed to be a function (via a macro) that handles the context
> >>> selection logic.
> >>> 
> >>> Thoughts?
> >>> 
> >>> Yehuda
> >>> 
> >>> On Fri, Jun 10, 2016 at 2:20 PM, Sage Weil <sage@newdream.net> wrote:
> >>>> On Fri, 10 Jun 2016, Adam C. Emerson wrote:
> >>>>> On 10/06/2016, Sage Weil wrote:
> >>>>>> I think the crux of the issue is the debug/logging code.  Almost all
> >>>>>> the other uses of cct are related to config options that are
> >>>>>> per-cluster.
> >>>>> 
> >>>>> That is correct. I thought we might pull out some things (like the
> >>>>> Crypto stuff) while we were at it, but that was more "As long as I'm
> >>>>> here anyway, I might as well see if there's anything more to do..."
> >>>>> 
> >>>>>> And with logging... it seems like even this is really a per-cluster
> >>>>>> (well, per entity) thing.  Sorting through a log file that combines
> >>>>>> output from two different objecters sounds horrific, and when we
> >>>>>> cram multiple OSDs into one process, we're going to want them to
> >>>>>> still feed into different log files.
> >>>>> 
> >>>>> This is a good point that I had not considered. Though, I'm not sure
> >>>>> if the un-separated case is that much better. I think part of the
> >>>>> savings we would want from multi-OSD work (at least in principle) is
> >>>>> to be able to have them share messengers and other overhead. In that
> >>>>> case giving each OSD its own fully fledged CephContext doesn't make
> >>>>> sense, I don't think.
> >>>> 
> >>>> The messengers aren't tied to CephContext, so I don't think this changes
> >>>> things in that regard.  I'm guessing we'll end up with something where
> >>>> each entity has a thing that implements the Messenger interface but many
> >>>> of those are sharing all their resources behind the scenes (thread pools,
> >>>> multiplexing their connections, etc.).  We'll have to figure out how the
> >>>> config options related to messenger itself are handled, but I think that
> >>>> oddity needs to be done explicitly anyway.  We wouldn't for instance, want
> >>>> to assume that all clusters in the same process must share the same
> >>>> messenger options.  (Maybe we invent an entity that governs the shared
> >>>> resources (process.NNN) and somehow direct config options/admin socket
> >>>> commands/whatever toward that?  Who knows.)
> >>>> 
> >>>>> Some form of steering might work, perhaps making the log system smart
> >>>>> enough to shove things around based on subsystem or an entity
> >>>>> identifier or something.
> >>>>> 
> >>>>>> How bad is the cct plumbing patch?  I fear this might still be the
> >>>>>> best path forward...
> >>>>> 
> >>>>> It's not great? Really at all. I wrote it and I don't like it. I
> >>>>> could pull it up and smack it into shape and make sure it passes tests
> >>>>> if it were necessary.
> >>>> 
> >>>> Is it pushed somewhere?  Maybe much of the pain is related to the clock
> >>>> thing...
> >>>> 
> >>>> sage

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: A Transformation of our Global Context
  2016-11-11 23:45                     ` Sage Weil
@ 2016-11-11 23:51                       ` Bassam Tabbara
  0 siblings, 0 replies; 28+ messages in thread
From: Bassam Tabbara @ 2016-11-11 23:51 UTC (permalink / raw)
  To: Sage Weil
  Cc: Yehuda Sadeh-Weinraub, Adam C. Emerson, Gregory Farnum,
	The Sacred Order of the Squid Cybernetic

> But either way, our options are basically another global for debug levels, 
> or some thread-local shenanigans, or a cct argument everywhere.  Right?

Yes. Another global for debug levels seems like it would go a long way and doesn't make assumptions about the threading model.
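
As a sketch of that option (all names invented; this is not existing Ceph
code), the global could be a small per-subsystem table that a dout-style
check consults directly, with no CephContext in sight:

  #include <cstdio>

  enum Subsys { SUBSYS_OSD, SUBSYS_MON, SUBSYS_MAX };

  // Per-subsystem debug levels for the whole process.  A real version
  // would want atomics or a lock so the admin socket could change them
  // at runtime.
  static int g_debug[SUBSYS_MAX] = { 5, 5 };

  static inline bool dout_enabled(Subsys s, int level) {
    return level <= g_debug[s];
  }

  int main() {
    g_debug[SUBSYS_OSD] = 10;
    if (dout_enabled(SUBSYS_OSD, 7))
      std::printf("osd: would log at level 7\n");
    return 0;
  }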

> On Nov 11, 2016, at 3:45 PM, Sage Weil <sweil@redhat.com> wrote:
> 
> On Fri, 11 Nov 2016, Bassam Tabbara wrote:
>> Yehuda, when I looked at this a few months ago, I thought there might be 
>> a way to make it less of an annoyance. If we can remove the clock_skew 
> 
> Happy to remove that, BTW.  We don't use it.
> 
>> and figure out a way around dout/logging (i.e. make those process wide) 
>> then the number of classes we would need to pass the context to is 
>> greatly reduced. This is roughly the path we take for the clients 
>> (librados for example). I’ll try to prototype this over the next few 
>> weeks as we would love to be able to run multiple OSDs and MONs in the 
>> same process.
> 
> The problem here is that currently debugging levels are per-entity. We'd 
> need to make the choice that per-process is sufficient for our purposes in 
> order to make that leap...
> 
> But either way, our options are basically another global for debug levels, 
> or some thread-local shenanigans, or a cct argument everywhere.  Right?
> 
> sage
> 
> 
>> 
>>> On Nov 11, 2016, at 2:53 PM, Yehuda Sadeh-Weinraub <yehuda@redhat.com> wrote:
>>> 
>>> On Fri, Nov 11, 2016 at 2:17 PM, Bassam Tabbara
>>> <Bassam.Tabbara@quantum.com> wrote:
>>>> Sorry, I’m late to the discussion, but why not just pass the context to every class that needs it? It's a painful one-time change, but the compiler will keep you honest. Also, if we can figure out a way around the clock skew (which seems like a corner case), I think the surface area is significantly reduced. Using TLS and other mechanisms seems like more complication.
>>> 
>>> Because then you end up passing around an extra param that needs
>>> to get trickled down to the most basic functions. That's basically
>>> what we are doing now, and it's a huge annoyance.
>>> 
>>> Yehuda
>>> 
>>>> 
>>>> Thanks!
>>>> Bassam
>>>> 
>>>>> On Nov 11, 2016, at 1:31 PM, Yehuda Sadeh-Weinraub <yehuda@redhat.com> wrote:
>>>>> 
>>>>> Resurrecting this thread.
>>>>> 
>>>>> Trying to extract a simple and practical plan. Here are my thoughts:
>>>>> 
>>>>> We need to have different ceph contexts for:
>>>>> - process (e.g., ceph-osd)
>>>>> - entity (e.g., osd.1, osd.2)
>>>>> - cluster (? -- maybe in the case where it's a client)
>>>>> 
>>>>> We can make a thread local storage variable that will reference the
>>>>> context this thread needs to use.
>>>>> Threads that deal with common stuff should use the process or entity
>>>>> context. This should be set explicitly in these threads. Upon thread
>>>>> creation we need to set the context for that thread. By default the
>>>>> context that will be used will be the process context. g_ceph_context
>>>>> can be changed to be a function (via a macro) that handles the context
>>>>> selection logic.
>>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> Yehuda
>>>>> 
>>>>> On Fri, Jun 10, 2016 at 2:20 PM, Sage Weil <sage@newdream.net> wrote:
>>>>>> On Fri, 10 Jun 2016, Adam C. Emerson wrote:
>>>>>>> On 10/06/2016, Sage Weil wrote:
>>>>>>>> I think the crux of the issue is the debug/logging code.  Almost all
>>>>>>>> the other uses of cct are related to config options that are
>>>>>>>> per-cluster.
>>>>>>> 
>>>>>>> That is correct. I thought we might pull out some things (like the
>>>>>>> Crypto stuff) while we were at it, but that was more "As long as I'm
>>>>>>> here anyway, I might as well see if there's anything more to do..."
>>>>>>> 
>>>>>>>> And with logging... it seems like even this is really a per-cluster
>>>>>>>> (well, per entity) thing.  Sorting through a log file that combines
>>>>>>>> output from two different objecters sounds horrific, and when we
>>>>>>>> cram multiple OSDs into one process, we're going to want them to
>>>>>>>> still feed into different log files.
>>>>>>> 
>>>>>>> This is a good point that I had not considered. Though, I'm not sure
>>>>>>> if the un-separated case is that much better. I think part of the
>>>>>>> savings we would want from multi-OSD work (at least in principle) is
>>>>>>> to be able to have them share messengers and other overhead. In that
>>>>>>> case giving each OSD its own fully fledged CephContext doesn't make
>>>>>>> sense, I don't think.
>>>>>> 
>>>>>> The messengers aren't tied to CephContext, so I don't think this changes
>>>>>> things in that regard.  I'm guessing we'll end up with something where
>>>>>> each entity has a thing that implements the Messenger interface but many
>>>>>> of those are sharing all their resources behind the scenes (thread pools,
>>>>>> multiplexing their connections, etc.).  We'll have to figure out how the
>>>>>> config options related to messenger itself are handled, but I think that
>>>>>> oddity needs to be done explicitly anyway.  We wouldn't for instance, want
>>>>>> to assume that all clusters in the same process must share the same
>>>>>> messenger options.  (Maybe we invent an entity that governs the shared
>>>>>> resources (process.NNN) and somehow direct config options/admin socket
>>>>>> commands/whatever toward that?  Who knows.)
>>>>>> 
>>>>>>> Some form of steering might work, perhaps making the log system smart
>>>>>>> enough to shove things around based on subsystem or an entity
>>>>>>> identifier or something.
>>>>>>> 
>>>>>>>> How bad is the cct plumbing patch?  I fear this might still be the
>>>>>>>> best path forward...
>>>>>>> 
>>>>>>> It's not great? Really at all. I wrote it and I don't like it. I
>>>>>>> could pull it up and smack it into shape and make sure it passes tests
>>>>>>> if it were necessary.
>>>>>> 
>>>>>> Is it pushed somewhere?  Maybe much of the pain is related to the clock
>>>>>> thing...
>>>>>> 
>>>>>> sage


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: A Transformation of our Global Context
  2016-11-11 23:29                   ` Bassam Tabbara
  2016-11-11 23:45                     ` Sage Weil
@ 2016-11-12  0:33                     ` Matt Benjamin
  2016-11-12  0:40                       ` Bassam Tabbara
  2016-11-14 16:10                       ` Adam C. Emerson
  1 sibling, 2 replies; 28+ messages in thread
From: Matt Benjamin @ 2016-11-12  0:33 UTC (permalink / raw)
  To: Bassam Tabbara
  Cc: Yehuda Sadeh-Weinraub, Sage Weil, Adam C. Emerson,
	Gregory Farnum, The Sacred Order of the Squid Cybernetic

Hi,

----- Original Message -----
> From: "Bassam Tabbara" <Bassam.Tabbara@quantum.com>
> To: "Yehuda Sadeh-Weinraub" <yehuda@redhat.com>
> Cc: "Sage Weil" <sweil@redhat.com>, "Adam C. Emerson" <aemerson@redhat.com>, "Gregory Farnum" <gfarnum@redhat.com>,
> "The Sacred Order of the Squid Cybernetic" <ceph-devel@vger.kernel.org>
> Sent: Friday, November 11, 2016 6:29:19 PM
> Subject: Re: A Transformation of our Global Context
> 
> Yehuda, when I looked at this a few months ago, I thought there might be a
> way to make it less of an annoyance. If we can remove the clock_skew and
> figure out a way around dout/logging (i.e. make those process wide) then the
> number of classes we would need to pass the context to is greatly reduced.
> This is roughly the path we take for the clients (librados for example).
> I’ll try to prototype this over the next few weeks as we would love to be
> able to run multiple OSDs and MONs in the same process.

Well, I certainly want to run multiple OSDs in the same process.  My bigger
interest is in colocating other things with an OSD.  Our cohortfs codebase
made use of this.  So far as I can tell, what you're working on would be
compatible.  The other piece that we relied on was a DirectMessenger, and sync
and async APIs built on top of that.
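
(As a rough illustration of the idea -- this is not the actual
DirectMessenger API, just its shape: two colocated endpoints hand messages
to each other by direct call instead of over a socket.)

  #include <functional>
  #include <iostream>
  #include <string>

  struct Endpoint {
    std::string name;
    // The peer's receive hook, wired up at connect time.
    std::function<void(const std::string&)> deliver_to_peer;

    void send(const std::string& msg) {
      if (deliver_to_peer) deliver_to_peer(msg);  // direct call, no socket
    }
    void receive(const std::string& msg) {
      std::cout << name << " got: " << msg << "\n";
    }
  };

  // Cross-wire two colocated endpoints (e.g. a client and an OSD).
  void connect(Endpoint& a, Endpoint& b) {
    a.deliver_to_peer = [&b](const std::string& m) { b.receive(m); };
    b.deliver_to_peer = [&a](const std::string& m) { a.receive(m); };
  }

  int main() {
    Endpoint client{"client"}, osd{"osd.0"};
    connect(client, osd);
    client.send("ping");
    osd.send("pong");
  }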

> 
> > On Nov 11, 2016, at 2:53 PM, Yehuda Sadeh-Weinraub <yehuda@redhat.com>
> > wrote:
> > 
> > On Fri, Nov 11, 2016 at 2:17 PM, Bassam Tabbara
> > <Bassam.Tabbara@quantum.com> wrote:
> >> Sorry, I’m late to the discussion, but why not just pass the context to
> >> every class that needs it? It's a painful one-time change, but the compiler
> >> will keep you honest. Also, if we can figure out a way around the clock
> >> skew (which seems like a corner case), I think the surface area is
> >> significantly reduced. Using TLS and other mechanisms seems like more
> >> complication.
> > 
> > Because then you end up passing around an extra param that needs
> > to get trickled down to the most basic functions. That's basically
> > what we are doing now, and it's a huge annoyance.

Well, I guess.  We had removed g_ceph_context, and apart from some duplication,
the pain wasn't very noticeable.

Matt

> > 
> > Yehuda
> > 
> >> 
> >> Thanks!
> >> Bassam
> >> 
> >>> On Nov 11, 2016, at 1:31 PM, Yehuda Sadeh-Weinraub <yehuda@redhat.com>
> >>> wrote:
> >>> 
> >>> Resurrecting this thread.
> >>> 
> >>> Trying to extract a simple and practical plan. Here are my thoughts:
> >>> 
> >>> We need to have different ceph contexts for:
> >>> - process (e.g., ceph-osd)
> >>> - entity (e.g., osd.1, osd.2)
> >>> - cluster (? -- maybe in the case where it's a client)
> >>> 
> >>> We can make a thread local storage variable that will reference the
> >>> context this thread needs to use.
> >>> Threads that deal with common stuff should use the process or entity
> >>> context. This should be set explicitly in these threads. Upon thread
> >>> creation we need to set the context for that thread. By default the
> >>> context that will be used will be the process context. g_ceph_context
> >>> can be changed to be a function (via a macro) that handles the context
> >>> selection logic.
> >>> 
> >>> Thoughts?
> >>> 
> >>> Yehuda
> >>> 
> >>> On Fri, Jun 10, 2016 at 2:20 PM, Sage Weil <sage@newdream.net> wrote:
> >>>> On Fri, 10 Jun 2016, Adam C. Emerson wrote:
> >>>>> On 10/06/2016, Sage Weil wrote:
> >>>>>> I think the crux of the issue is the debug/logging code.  Almost all
> >>>>>> the other uses of cct are related to config options that are
> >>>>>> per-cluster.
> >>>>> 
> >>>>> That is correct. I thought we might pull out some things (like the
> >>>>> Crypto stuff) while we were at it, but that was more "As long as I'm
> >>>>> here anyway, I might as well see if there's anything more to do..."
> >>>>> 
> >>>>>> And with logging... it seems like even this is really a per-cluster
> >>>>>> (well, per entity) thing.  Sorting through a log file that combines
> >>>>>> output from two different objecters sounds horrific, and when we
> >>>>>> cram multiple OSDs into one process, we're going to want them to
> >>>>>> still feed into different log files.
> >>>>> 
> >>>>> This is a good point that I had not considered. Though, I'm not sure
> >>>>> if the un-separated case is that much better. I think part of the
> >>>>> savings we would want from multi-OSD work (at least in principle) is
> >>>>> to be able to have them share messengers and other overhead. In that
> >>>>> case giving each OSD its own fully fledged CephContext doesn't make
> >>>>> sense, I don't think.
> >>>> 
> >>>> The messengers aren't tied to CephContext, so I don't think this changes
> >>>> things in that regard.  I'm guessing we'll end up with something where
> >>>> each entity has a thing that implements the Messenger interface but many
> >>>> of those are sharing all their resources behind the scenes (thread
> >>>> pools,
> >>>> multiplexing their connections, etc.).  We'll have to figure out how the
> >>>> config options related to messenger itself are handled, but I think that
> >>>> oddity needs to be done explicitly anyway.  We wouldn't for instance,
> >>>> want
> >>>> to assume that all clusters in the same process must share the same
> >>>> messenger options.  (Maybe we invent an entity that governs the shared
> >>>> resources (process.NNN) and somehow direct config options/admin socket
> >>>> commands/whatever toward that?  Who knows.)
> >>>> 
> >>>>> Some form of steering might work, perhaps making the log system smart
> >>>>> enough to shove things around based on subsystem or an entity
> >>>>> identifier or something.
> >>>>> 
> >>>>>> How bad is the cct plumbing patch?  I fear this might still be the
> >>>>>> best path forward...
> >>>>> 
> >>>>> It's not great? Really at all. I wrote it and I don't like it. I
> >>>>> could pull it up and smack it into shape and make sure it passes tests
> >>>>> if it were necessary.
> >>>> 
> >>>> Is it pushed somewhere?  Maybe much of the pain is related to the clock
> >>>> thing...
> >>>> 
> >>>> sage

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: A Transformation of our Global Context
  2016-11-12  0:33                     ` Matt Benjamin
@ 2016-11-12  0:40                       ` Bassam Tabbara
  2016-11-12  2:37                         ` Matt Benjamin
  2016-11-14 16:10                       ` Adam C. Emerson
  1 sibling, 1 reply; 28+ messages in thread
From: Bassam Tabbara @ 2016-11-12  0:40 UTC (permalink / raw)
  To: Matt Benjamin
  Cc: Yehuda Sadeh-Weinraub, Sage Weil, Adam C. Emerson,
	Gregory Farnum, The Sacred Order of the Squid Cybernetic

> Well, I guess.  We had removed g_ceph_context, and apart from some duplication,
> the pain wasn't very noticeable.

Matt, is this work you can put back upstream? Do you have a pointer to this work?


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: A Transformation of our Global Context
  2016-11-12  0:40                       ` Bassam Tabbara
@ 2016-11-12  2:37                         ` Matt Benjamin
  0 siblings, 0 replies; 28+ messages in thread
From: Matt Benjamin @ 2016-11-12  2:37 UTC (permalink / raw)
  To: Bassam Tabbara
  Cc: Yehuda Sadeh-Weinraub, Sage Weil, Adam C. Emerson,
	Gregory Farnum, The Sacred Order of the Squid Cybernetic

Hi Bassam,

----- Original Message -----
> From: "Bassam Tabbara" <Bassam.Tabbara@quantum.com>
> To: "Matt Benjamin" <mbenjamin@redhat.com>
> Cc: "Yehuda Sadeh-Weinraub" <yehuda@redhat.com>, "Sage Weil" <sweil@redhat.com>, "Adam C. Emerson"
> <aemerson@redhat.com>, "Gregory Farnum" <gfarnum@redhat.com>, "The Sacred Order of the Squid Cybernetic"
> <ceph-devel@vger.kernel.org>
> Sent: Friday, November 11, 2016 7:40:18 PM
> Subject: Re: A Transformation of our Global Context
> 
> > Well, I guess.  We had removed g_ceph_context, and apart from some
> > duplication, the pain wasn't very noticeable.
> 
> Matt, is this work you can put back upstream? Do you have a pointer to this
> work?

Of course, it's all out on GitHub.  The actual code isn't that interesting; it perhaps shows
where a couple of the pitfalls were, for a subset of an older Ceph.  Adam's email starting the
original thread points out where the old change is insufficient as written.

Matt


-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: A Transformation of our Global Context
       [not found]                   ` <BLUPR0301MB200478F7C6DC349474A0B5F8A7BB0@BLUPR0301MB2004.namprd03.prod.outlook.com>
@ 2016-11-14 16:08                     ` Adam C. Emerson
  0 siblings, 0 replies; 28+ messages in thread
From: Adam C. Emerson @ 2016-11-14 16:08 UTC (permalink / raw)
  To: Bassam Tabbara
  Cc: Yehuda Sadeh-Weinraub, Sage Weil, Gregory Farnum,
	The Sacred Order of the Squid Cybernetic

On 11/11/2016, Bassam Tabbara wrote:
> Yehuda, when I looked at this a few months ago, I thought there
> might be a way to make it less of an annoyance. If we can remove the
> clock_skew and figure out a way around dout/logging (i.e. make those
> process wide) then the number of classes we would need to pass the
> context to is greatly reduced. This is roughly the path we take for
> the clients (librados for example). I’ll try to prototype this over
> the next few weeks as we would love to be able to run multiple OSDs
> and MONs in the same process.

That was something like my original idea: that we could factor the
dout support out of CephContext and have it be a per-thread variable,
ideally set up as a fluid (a dynamically scoped variable) so we could
annotate the call path. (So that some function way down at the bottom
of the call stack would log things and still know that it was called
by osd.3 or mds.5 or what have you.)
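
Concretely, a sketch of that per-thread fluid (names invented; none of this
is existing Ceph code) would be a thread-local label plus an RAII guard that
saves and restores it across a call:

  #include <iostream>

  thread_local const char* tls_entity = "process";  // default context

  class EntityScope {            // restores the previous value on exit,
    const char* saved_;          // so nested scopes behave like a fluid
  public:
    explicit EntityScope(const char* e) : saved_(tls_entity) {
      tls_entity = e;
    }
    ~EntityScope() { tls_entity = saved_; }
  };

  void deep_helper() {
    // No context parameter, yet the log line still names its caller.
    std::cout << tls_entity << ": doing low-level work\n";
  }

  int main() {
    deep_helper();               // logs as "process"
    {
      EntityScope scope("osd.3");
      deep_helper();             // logs as "osd.3"
    }
    deep_helper();               // back to "process"
  }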

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: A Transformation of our Global Context
  2016-11-12  0:33                     ` Matt Benjamin
  2016-11-12  0:40                       ` Bassam Tabbara
@ 2016-11-14 16:10                       ` Adam C. Emerson
  1 sibling, 0 replies; 28+ messages in thread
From: Adam C. Emerson @ 2016-11-14 16:10 UTC (permalink / raw)
  To: Matt Benjamin
  Cc: Bassam Tabbara, Yehuda Sadeh-Weinraub, Sage Weil, Gregory Farnum,
	The Sacred Order of the Squid Cybernetic

On 11/11/2016, Matt Benjamin wrote:
> Well, I guess.  We had removed g_ceph_context, and apart from some duplication,
> it wasn't very noticeable pain.

The difference there was that I felt more free to throw out the most
painful cases of dout.

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread

Thread overview: 28+ messages
2016-06-10 18:50 A Transformation of our Global Context Adam C. Emerson
2016-06-10 19:47 ` Gregory Farnum
2016-06-10 19:58   ` Adam C. Emerson
2016-06-10 20:06     ` Gregory Farnum
2016-06-10 20:15       ` Adam C. Emerson
2016-06-10 20:40         ` Sage Weil
2016-06-10 20:45           ` Adam C. Emerson
2016-06-10 20:53             ` Sage Weil
2016-06-10 23:57               ` Joao Eduardo Luis
2016-06-10 20:38       ` Sage Weil
2016-06-10 21:04         ` Adam C. Emerson
2016-06-10 21:20           ` Sage Weil
2016-06-10 22:51             ` Adam C. Emerson
2016-11-11 21:31             ` Yehuda Sadeh-Weinraub
2016-11-11 22:17               ` Bassam Tabbara
2016-11-11 22:53                 ` Yehuda Sadeh-Weinraub
2016-11-11 23:29                   ` Bassam Tabbara
2016-11-11 23:45                     ` Sage Weil
2016-11-11 23:51                       ` Bassam Tabbara
2016-11-12  0:33                     ` Matt Benjamin
2016-11-12  0:40                       ` Bassam Tabbara
2016-11-12  2:37                         ` Matt Benjamin
2016-11-14 16:10                       ` Adam C. Emerson
     [not found]                   ` <BLUPR0301MB200478F7C6DC349474A0B5F8A7BB0@BLUPR0301MB2004.namprd03.prod.outlook.com>
2016-11-14 16:08                     ` Adam C. Emerson
2016-06-10 20:28     ` Allen Samuels
2016-06-10 20:43       ` Adam C. Emerson
2016-06-10 22:38         ` Allen Samuels
2016-06-10 22:44           ` Adam C. Emerson
