* wip-addr
@ 2015-08-24 18:01 Sage Weil
  2015-08-25 10:06 ` wip-addr Marcus Watts
  0 siblings, 1 reply; 9+ messages in thread
From: Sage Weil @ 2015-08-24 18:01 UTC (permalink / raw)
  To: mwatts, ceph-devel

Hi Marcus,

I looked over the wip-addr branch a bit.  I have two basic 
questions/concerns:

1) In commit
	https://github.com/ceph/ceph/commit/73b09090466a43d5aceb979a4802de3f3f5bf24a 
we switch the type field from sa_family to transport_type.  This seems 
like the way to go, but we need to deal with the fact that lots of 
clusters out there are IPv6 and have AF_INET6 filled in here.  Probably 
both types should be interpreted to mean "existing/legacy IP messenger" or 
whatever we want to call SimpleMessenger/AsyncMessenger's protocol.

I think encode needs to make sure it fills in that value for the type when 
encoding the legacy entity_addr_t encoding, but could use a/the single 
valid value for the new encoding.  And any get_transport_type() accessor 
should also return the single valid value.

2) In the later commit
	https://github.com/ceph/ceph/commit/9d203a2058f76414703b4fc212a1a0a960d0c672 
you introduce a grammar for printing/parsing the addrs.  This also makes 
sense since e.g. xio uses an IP to identify an endpoint.  I think we 
should identify these based on the *protocol* and not the implementation, 
though... whether we use SimpleMessenger or AsyncMessenger is not a 
property of the address.  Maybe "tcp://" makes more sense here?  Or 
perhaps no prefix at all (a bare IP address), so that this looks the same 
as it did before in the case where the default protocol(s) are in use.

I assume the xio protocol (whether it is rdma or tcp) is closely tied to 
libxio itself.. is that right?  If so, using xio in that prefix makes 
sense.  I'd include xio somewhere in the rdma prefix though (xrdma:// and 
xtcp://)?

What do you think?

Logistically, I think the steps for getting this ready for merge are:

1) Separate out the preliminary patches that pass a feature to the addr 
encoding.. without any of the other cohortfs patches that are currently on 
this branch.  Once this builds we can merge it separately from the rest...

2) The entity_addrvec_t type.

3) The type -> transport_type switch.

4) We should make the new entity_addr_t encoding encode sockaddr piece 
more compactly instead of eating up a full 80-byte sockaddr_storage even 
for the ~8-byte IPv4 sockaddr_in.  Maybe just need to encode an 
explicit length for the sockaddr_* piece?
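
A minimal sketch of the length-prefixed idea in item 4, for illustration
only.  The helper names (sockaddr_len_for, encode_sockaddr_compact,
decode_sockaddr_compact) and the byte layout (one length byte followed by
the raw sockaddr bytes) are assumptions, not what the branch does, and
copying raw bytes is not portable across hosts with different sockaddr
layouts.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical helper: how many bytes of the storage are meaningful
// for the address family stored in it.
inline std::size_t sockaddr_len_for(const sockaddr_storage& ss) {
  switch (ss.ss_family) {
    case AF_INET:  return sizeof(sockaddr_in);
    case AF_INET6: return sizeof(sockaddr_in6);
    default:       return sizeof(sockaddr_storage);  // unknown family: keep it all
  }
}

// Hypothetical wire layout: [1 byte length][length raw bytes].
inline std::vector<std::uint8_t> encode_sockaddr_compact(const sockaddr_storage& ss) {
  const std::size_t len = sockaddr_len_for(ss);
  std::vector<std::uint8_t> out(1 + len);
  out[0] = static_cast<std::uint8_t>(len);
  std::memcpy(out.data() + 1, &ss, len);
  return out;
}

// Returns false if the buffer is truncated or the length is implausible.
inline bool decode_sockaddr_compact(const std::vector<std::uint8_t>& in,
                                    sockaddr_storage* ss) {
  if (in.empty()) return false;
  const std::size_t len = in[0];
  if (len > sizeof(*ss) || in.size() < 1 + len) return false;
  std::memset(ss, 0, sizeof(*ss));  // bytes past len stay zero
  std::memcpy(ss, in.data() + 1, len);
  return true;
}
```

With this layout an IPv4 address costs 1 + sizeof(sockaddr_in) bytes on
the wire instead of a full sockaddr_storage.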

Something like that?
sage


* Re: wip-addr
  2015-08-24 18:01 wip-addr Sage Weil
@ 2015-08-25 10:06 ` Marcus Watts
  2015-08-25 14:08   ` wip-addr Sage Weil
  0 siblings, 1 reply; 9+ messages in thread
From: Marcus Watts @ 2015-08-25 10:06 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On Mon, Aug 24, 2015 at 11:01:45AM -0700, Sage Weil wrote:
...
> I looked over the wip-addr branch a bit.  I have two basic 
> questions/concerns:
> 
> 1) In commit
> 	https://github.com/ceph/ceph/commit/73b09090466a43d5aceb979a4802de3f3f5bf24a 
> we switch the type field from sa_family to transport_type.  This seems 
> like the way to go, but we need to deal with the fact that lots of 
> clusters out there are IPv6 and have AF_INET6 filled in here.  Probably 
> both types should be interpreted to mean "existing/legacy IP messenger" or 
> whatever we want to call SimpleMessenger/AsyncMessenger's protocol.
> 
> I think encode needs to make sure it fills in that value for the type when 
> encoding the legacy entity_addr_t encoding, but could use a/the single 
> valid value for the new encoding.  And any get_transport_type() accessor 
> should also return the single valid value.

With cohortfs I had the luxury of ignoring existing installations.

I think I concluded at the time that probably most real systems
had '0' stored here.
Yes, if there are things that have "junk" here, they'll need to be handled
appropriately - I think it can be mostly fixed by just having decode map
transport_type "AF_INET6" (which is probably 10) to 0.  AF_INET is 2,
so maybe that too.  People switching *back* to older code after using
new code won't be so happy - would probably need a patch for older code
if that's a concern.
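
A minimal sketch of that decode-side mapping, assuming a hypothetical
constant TRANSPORT_LEGACY for the value 0 and a hypothetical helper name;
neither is taken from the branch.

```cpp
#include <cstdint>
#include <sys/socket.h>

// Hypothetical name for the single valid "legacy IP messenger" value.
constexpr std::uint32_t TRANSPORT_LEGACY = 0;

// Map stale address-family values found in old encodings to the legacy
// transport type.  Any other value is passed through unchanged.
// Consequence: new transport ids must never reuse the numeric values of
// AF_INET or AF_INET6, or they would be swallowed here.
inline std::uint32_t normalize_transport_type(std::uint32_t raw) {
  if (raw == static_cast<std::uint32_t>(AF_INET) ||
      raw == static_cast<std::uint32_t>(AF_INET6)) {
    return TRANSPORT_LEGACY;
  }
  return raw;
}
```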

If there's code out there that thinks get_transport_type() returns an address
family, that will need fixing.  something like,
	get_address_family(){ return in_addr->ss_family; }
but I didn't find any code that actually needed this.

> 
> 2) In the later commit
> 	https://github.com/ceph/ceph/commit/9d203a2058f76414703b4fc212a1a0a960d0c672 
> you introduce a grammar for printing/parsing the addrs.  This also makes 
> sense since e.g. xio uses an IP to identify an endpoint.  I think we 
> should identify these based on the *protocol* and not the implementation, 
> though... whether we use SimpleMessenger or AsyncMessenger is not a 
> property of the address.  Maybe "tcp://" makes more sense here?  Or 
> perhaps no prefix at all (a bare IP address), so that this looks the same 
> as it did before in the case where the default protocol(s) are in use.
> 
> I assume the xio protocol (whether it is rdma or tcp) is closely tied to 
> libxio itself.. is that right?  If so, using xio in that prefix makes 
> sense.  I'd include xio somewhere in the rdma prefix though (xrdma:// and 
> xtcp://)?

Code base I had didn't have async messenger as a transport type; if that
needs to be addressable independently of simple messenger it will need a
prefix.  I had a prefixless address decode as simple messenger - I
can make this more obvious than "unsigned type = 0;".

I had these prefixes,
	sm
	rdma
	xtcp
but I can easily pick others.  I figured rdma didn't need an x since
there didn't seem to be much chance anybody would want a non-xio version
of rdma.

> 
> What do you think?
> 
> Logistically, I think the steps for getting this ready for merge are:
> 
> 1) Separate out the preliminary patches that pass a feature to the addr 
> encoding.. without any of the other cohortfs patches that are currently on 
> this branch.  Once this builds we can merge it separately from the rest...
> 
> 2) The entity_addrvec_t type.
> 
> 3) The type -> transport_type switch.

There's a lot of cohortfs glop that still needs to go away in this branch.

I hadn't thought of structuring the patch that way.  I can do it,
but the encoding feature will be tedious to split out.
There's a bunch of cascading stuff after entity_addrvec_t which
could go into a series of further patches, monmap, and so forth,
not all of which exist yet.

> 
> 4) We should make the new entity_addr_t encoding encode sockaddr piece 
> more compactly instead of eating up a full 80-byte sockaddr_storage even 
> for the ~8-byte IPv4 sockaddr_in.  Maybe just need to encode an 
> explicit length for the sockaddr_* piece?

Yes, this could be made more memory efficient.  It's mainly a matter
of making sure the constructor gets the proper amount of memory.
There's a C hack to do this, but I remember seeing a c++ technique
that was a bit less hackish.

A sockaddr_in is actually 16 bytes of which the last 8 are dummy
padding (sin_zero).  (And there are some systems that
actually check to be sure they're 0.)

BSD4.4 derived systems have a length coded into sockaddr too - but on
linux it's usually possible to get by without storing the length, and
I don't think ceph will need to do so.
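
For reference, a small sketch of the padding point, assuming a Linux or
BSD style sockaddr_in that has a sin_zero member.  The helper names are
made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

// Build an IPv4 sockaddr with every byte cleared first, so the sin_zero
// padding is guaranteed to be 0 for systems that check it.
inline sockaddr_in make_ipv4(std::uint32_t host_order_addr,
                             std::uint16_t host_order_port) {
  sockaddr_in sin;
  std::memset(&sin, 0, sizeof(sin));
  sin.sin_family = AF_INET;
  sin.sin_port = htons(host_order_port);
  sin.sin_addr.s_addr = htonl(host_order_addr);
  return sin;
}

// Bytes that actually carry information: family + port + address.
// Everything else in sizeof(sockaddr_in) is padding.
inline std::size_t ipv4_payload_bytes() {
  return sizeof(sa_family_t) + sizeof(in_port_t) + sizeof(in_addr);
}
```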

...

					-Marcus Watts


* Re: wip-addr
  2015-08-25 10:06 ` wip-addr Marcus Watts
@ 2015-08-25 14:08   ` Sage Weil
  0 siblings, 0 replies; 9+ messages in thread
From: Sage Weil @ 2015-08-25 14:08 UTC (permalink / raw)
  To: Marcus Watts; +Cc: ceph-devel

On Tue, 25 Aug 2015, Marcus Watts wrote:
> On Mon, Aug 24, 2015 at 11:01:45AM -0700, Sage Weil wrote:
> ...
> > I looked over the wip-addr branch a bit.  I have two basic 
> > questions/concerns:
> > 
> > 1) In commit
> > 	https://github.com/ceph/ceph/commit/73b09090466a43d5aceb979a4802de3f3f5bf24a 
> > we switch the type field from sa_family to transport_type.  This seems 
> > like the way to go, but we need to deal with the fact that lots of 
> > clusters out there are IPv6 and have AF_INET6 filled in here.  Probably 
> > both types should be interpreted to mean "existing/legacy IP messenger" or 
> > whatever we want to call SimpleMessenger/AsyncMessenger's protocol.
> > 
> > I think encode needs to make sure it fills in that value for the type when 
> > encoding the legacy entity_addr_t encoding, but could use a/the single 
> > valid value for the new encoding.  And any get_transport_type() accessor 
> > should also return the single valid value.
> 
> With cohortfs I had the luxury of ignoring existing installations.
> 
> I think I concluded at the time that probably most real systems
> had '0' stored here.

Oh, you're right--this is always 0 currently, so we're in good shape.

> Yes, if there are things that have "junk" here, they'll need to be handled
> appropriately - I think it can be mostly fixed by just having decode map
> transport_type "AF_INET6" (which is probably 10) to 0.  AF_INET is 2,
> so maybe that too.  People switching *back* to older code after using
> new code won't be so happy - would probably need a patch for older code
> if that's a concern.
> 
> If there's code out there that thinks get_transport_type() returns an address
> family, that will need fixing.  something like,
> 	get_address_family(){ return in_addr->ss_family; }
> but I didn't find any code that actually needed this.

There's already a {get,set}_family() that return ss_family, so I think 
we're okay there.

The main thing with entity_addr_t (and entity_addrvec_t) is that when the 
new feature is not present we'll want to encode in the old format.  I 
wonder if the best thing to do here is silently drop the type if it is 
non-zero here.  We could assert and blow up, but that is a bit dramatic.

The underlying strategy here is to replace (the old) entity_addr_t with 
entity_addrvec_t in most places, and when encoding the addrvec without the 
new feature, encode only the first addr in its compat form.  That way, as 
long as the first addr in the list is always IPv4 or IPv6 and the standard 
protocol, old clients will be happy.

If we make the addrvec ordered by preference, that may be problematic, 
since we'd always prefer the legacy protocol in rdma environments.  
Perhaps it would be better to make the compat addrvec encode pick out the 
first type 0 addr?  That's more robust anyway...
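
Roughly, as a sketch: the struct and function names below are
hypothetical stand-ins, and "id" replaces the real sockaddr payload to
keep the example short.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one address in the vector.
struct addr_sketch {
  std::uint32_t type;  // transport type; 0 = legacy protocol
  std::uint32_t id;    // placeholder for the actual sockaddr payload
};

// Pick the address an old peer should see: the first type-0 entry,
// regardless of where it sits in the preference order.
// Returns nullptr when the vector holds no legacy address at all.
inline const addr_sketch* pick_compat_addr(const std::vector<addr_sketch>& v) {
  auto it = std::find_if(v.begin(), v.end(),
                         [](const addr_sketch& a) { return a.type == 0; });
  return it == v.end() ? nullptr : &*it;
}
```

The caller still has to decide what to encode when nullptr comes back,
e.g. a blank addr.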

> > 2) In the later commit
> > 	https://github.com/ceph/ceph/commit/9d203a2058f76414703b4fc212a1a0a960d0c672 
> > you introduce a grammar for printing/parsing the addrs.  This also makes 
> > sense since e.g. xio uses an IP to identify an endpoint.  I think we 
> > should identify these based on the *protocol* and not the implementation, 
> > though... whether we use SimpleMessenger or AsyncMessenger is not a 
> > property of the address.  Maybe "tcp://" makes more sense here?  Or 
> > perhaps no prefix at all (a bare IP address), so that this looks the same 
> > as it did before in the case where the default protocol(s) are in use.
> > 
> > I assume the xio protocol (whether it is rdma or tcp) is closely tied to 
> > libxio itself.. is that right?  If so, using xio in that prefix makes 
> > sense.  I'd include xio somewhere in the rdma prefix though (xrdma:// and 
> > xtcp://)?
> 
> Code base I had didn't have async messenger as a transport type; if that
> needs to be addressable independently of simple messenger it will need a
> prefix.  I had a prefixless address decode as simple messenger - I
> can make this more obvious than "unsigned type = 0;".
> 
> I had these prefixes,
> 	sm
> 	rdma
> 	xtcp
> but I can easily pick others.  I figured rdma didn't need an x since
> there didn't seem to be much chance anybody would want a non-xio version
> of rdma.

The key thing is that the addr property is the *protocol* not the 
implementation, so from addr's perspective SimpleMessenger and 
AsyncMessenger are indistinguishable.  How about

  tcp    - legacy protocol (simple or async, whatever the user has configured)
  xrdma  - xio over rdma
  xtcp   - xio over tcp

where the tcp:// is parsed (if present) but left off when rendering 
output, so when using the legacy protocol nothing looks 
different?
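
A sketch of that parse/render asymmetry.  The enum, struct and function
names are hypothetical, and the part after the prefix is kept as an
opaque string rather than parsed into an IP and port.

```cpp
#include <string>

enum class transport { tcp, xrdma, xtcp };

struct parsed_addr {
  transport proto;
  std::string rest;  // everything after the optional "scheme://"
};

// Accepts "tcp://", "xrdma://", "xtcp://" or no prefix at all.
// A bare address is treated as the legacy tcp protocol.
inline bool parse_addr(const std::string& s, parsed_addr* out) {
  const std::string sep = "://";
  const std::string::size_type pos = s.find(sep);
  if (pos == std::string::npos) {
    out->proto = transport::tcp;
    out->rest = s;
    return true;
  }
  const std::string scheme = s.substr(0, pos);
  if (scheme == "tcp") out->proto = transport::tcp;
  else if (scheme == "xrdma") out->proto = transport::xrdma;
  else if (scheme == "xtcp") out->proto = transport::xtcp;
  else return false;  // unknown prefix
  out->rest = s.substr(pos + sep.size());
  return true;
}

// The legacy protocol is rendered without a prefix so existing output
// does not change; the xio variants keep theirs.
inline std::string render_addr(const parsed_addr& a) {
  switch (a.proto) {
    case transport::xrdma: return "xrdma://" + a.rest;
    case transport::xtcp:  return "xtcp://" + a.rest;
    case transport::tcp:   break;
  }
  return a.rest;
}
```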

> > Logistically, I think the steps for getting this ready for merge are:
> > 
> > 1) Separate out the preliminary patches that pass a feature to the addr 
> > encoding.. without any of the other cohortfs patches that are currently on 
> > this branch.  Once this builds we can merge it separately from the rest...
> > 
> > 2) The entity_addrvec_t type.
> > 
> > 3) The type -> transport_type switch.
> 
> There's a lot of cohortfs glop that still needs to go away in this branch.
> 
> I hadn't thought of structuring the patch that way.  I can do it,
> but the encoding feature will be tedious to split out.
> There's a bunch of cascading stuff after entity_addrvec_t which
> could go into a series of further patches, monmap, and so forth,
> not all of which exist yet.

It looks like c728926a86e1410f959011d24700bb07bad1dc2c adds 
entity_addrvec_t and adds feature to encoding all over the place... but 
doesn't actually use addrvec at all yet.  I'd separate the addrvec 
declaration into its own patch (just ignore it for now), and build a 
new branch on master that

 1- makes entity_addr_t require features when encoding, and
 2- cherry-picks the patches adding the feature arg

Once that whole cleanup is complete and builds, the rest will be much 
easier... and IMO we should keep the two separate.  This first piece will 
effectively be a no-op (entity_addr_t encoding doesn't yet depend on 
the feature).  Unfortunately it's the hardest part since we need to make 
sure we have the peer features available in all sorts of awkward 
places...

> > 4) We should make the new entity_addr_t encoding encode sockaddr piece 
> > more compactly instead of eating up a full 80-byte sockaddr_storage even 
> > for the ~8-byte IPv4 sockaddr_in.  Maybe just need to encode an 
> > explicit length for the sockaddr_* piece?
> 
> Yes, this could be made more memory efficient.  It's mainly a matter
> of making sure the constructor gets the proper amount of memory.
> There's a C hack to do this, but I remember seeing a c++ technique
> that was a bit less hackish.
> 
> A sockaddr_in is actually 16 bytes of which the last 8 are dummy
> padding (sin_zero).  (And there are some systems that
> actually check to be sure they're 0.)
> 
> BSD4.4 derived systems have a length coded into sockaddr too - but on
> linux it's usually possible to get by without storing the length, and
> I don't think ceph will need to do so.

I suspect the only way to really do this is put the sockaddr_whatever on 
the heap.  Most entity_addr_t users put it on the stack or embed it in 
other structures, so having it variable sized is not an option.  That has 
a cost, although with an 80-byte sockaddr_storage it may well be worth 
it...  We should probably embed a sockaddr_in[6] and transparently spill 
out to the heap only when exotic addr types are used?
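
Something along these lines, as a rough sketch only.  The class name and
interface are hypothetical, and using unique_ptr makes the type
move-only, which real entity_addr_t users would not tolerate without a
copy constructor.

```cpp
#include <cstring>
#include <memory>
#include <netinet/in.h>
#include <sys/socket.h>

// Keeps IPv4/IPv6 addresses inline in a sockaddr_in6-sized buffer and
// spills any other family to a heap-allocated sockaddr_storage.
class compact_sockaddr {
 public:
  compact_sockaddr() { std::memset(&inline_, 0, sizeof(inline_)); }

  void set(const sockaddr_storage& ss) {
    std::memset(&inline_, 0, sizeof(inline_));
    if (ss.ss_family == AF_INET) {
      // sockaddr_in is no larger than sockaddr_in6, so it fits inline.
      std::memcpy(&inline_, &ss, sizeof(sockaddr_in));
      spill_.reset();
    } else if (ss.ss_family == AF_INET6) {
      std::memcpy(&inline_, &ss, sizeof(sockaddr_in6));
      spill_.reset();
    } else {
      spill_.reset(new sockaddr_storage(ss));  // exotic family: heap copy
    }
  }

  bool on_heap() const { return spill_ != nullptr; }

  // The family field sits at the same offset in sockaddr_in and
  // sockaddr_in6, so reading sin6_family works for both inline cases.
  sa_family_t family() const {
    return spill_ ? spill_->ss_family : inline_.sin6_family;
  }

 private:
  sockaddr_in6 inline_;
  std::unique_ptr<sockaddr_storage> spill_;
};
```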

Anyway, that part can come later... the first hurdle is the feature bits 
passed to encode().

sage


* Re: wip-addr
  2015-10-12 17:42   ` wip-addr David Zafman
@ 2015-10-12 18:04     ` Sage Weil
  0 siblings, 0 replies; 9+ messages in thread
From: Sage Weil @ 2015-10-12 18:04 UTC (permalink / raw)
  To: David Zafman; +Cc: Marcus Watts, ceph-devel, haomai

On Mon, 12 Oct 2015, David Zafman wrote:
> I don't understand how encode/decode of entity_addr_t is changing without
> versioning in the encode/decode.  This means that this branch is changing the
> ceph-objectstore-tool export format if CEPH_FEATURE_MSG_ADDR2 is part of the
> features.  So we could bump super_header::super_ver if the export format must
> change.
> 
> Now that I look at it, I'm sure I can clear the watchers and old_watchers in
> object_info_t during export because that is dynamic information and it happens
> to include entity_addr_t.  I need to verify this, but that may be the only
> reason that the objectstore tool needs a valid features value to be passed
> there.

Ah, yeah... clearing watchers (perhaps optionally, though) sounds fine.  

sage

> 
> David
> 
> On 10/9/15 2:49 PM, Sage Weil wrote:
> > > 2.
> > > (about line 2067 in src/tools/ceph_objectstore_tool.cc)
> > > (use via ceph cmd?) tools - "object store tool".
> > > This has a way to serialize objects which includes a watch list
> > > which includes an address.  There should be an option here to say
> > > whether to include exported addresses.
> > I think it's safe to use defaults here.. what do you think, David?
> 
> 


* Re: wip-addr
  2015-10-09 21:49 ` wip-addr Sage Weil
       [not found]   ` <CACJqLyZh4WQZv_3isjXJy=t6YL700C2GjWuen-QG1D5=RkKHYw@mail.gmail.com>
@ 2015-10-12 17:42   ` David Zafman
  2015-10-12 18:04     ` wip-addr Sage Weil
  1 sibling, 1 reply; 9+ messages in thread
From: David Zafman @ 2015-10-12 17:42 UTC (permalink / raw)
  To: Sage Weil, Marcus Watts; +Cc: ceph-devel, haomai


I don't understand how encode/decode of entity_addr_t is changing 
without versioning in the encode/decode.  This means that this branch is 
changing the ceph-objectstore-tool export format if 
CEPH_FEATURE_MSG_ADDR2 is part of the features.  So we could bump 
super_header::super_ver if the export format must change.

Now that I look at it, I'm sure I can clear the watchers and 
old_watchers in object_info_t during export because that is dynamic 
information and it happens to include entity_addr_t.  I need to verify 
this, but that may be the only reason that the objectstore tool needs a 
valid features value to be passed there.
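
A sketch of what clearing the dynamic state before export could look
like.  The structs below are simplified, hypothetical stand-ins and not
the real object_info_t or watch_info_t definitions.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a watcher entry; the real one carries an
// entity_addr_t, which is what drags the features argument into export.
struct watch_info_sketch {
  std::string addr;
  std::uint32_t timeout_seconds;
};

// Hypothetical, heavily reduced stand-in for object_info_t.
struct object_info_sketch {
  std::uint64_t size = 0;
  std::map<std::uint64_t, watch_info_sketch> watchers;
  std::map<std::uint64_t, watch_info_sketch> old_watchers;
};

// Return a copy suitable for export: persistent fields are kept and
// the watcher lists, which are runtime state, are dropped.
inline object_info_sketch strip_for_export(object_info_sketch oi) {
  oi.watchers.clear();
  oi.old_watchers.clear();
  return oi;
}
```

Because the function takes its argument by value, the in-memory object
keeps its watchers; only the exported copy loses them.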

David

On 10/9/15 2:49 PM, Sage Weil wrote:
> > 2.
> > (about line 2067 in src/tools/ceph_objectstore_tool.cc)
> > (use via ceph cmd?) tools - "object store tool".
> > This has a way to serialize objects which includes a watch list
> > which includes an address.  There should be an option here to say
> > whether to include exported addresses.
> I think it's safe to use defaults here.. what do you think, David?



* Re: wip-addr
       [not found]   ` <CACJqLyZh4WQZv_3isjXJy=t6YL700C2GjWuen-QG1D5=RkKHYw@mail.gmail.com>
  2015-10-10  3:20     ` wip-addr Haomai Wang
@ 2015-10-10 12:07     ` Sage Weil
  1 sibling, 0 replies; 9+ messages in thread
From: Sage Weil @ 2015-10-10 12:07 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Marcus Watts, ceph-devel, dzafman


On Sat, 10 Oct 2015, Haomai Wang wrote:
> On Sat, Oct 10, 2015 at 5:49 AM, Sage Weil <sage@newdream.net> wrote:
> > Hey Marcus,
> >
> > On Fri, 2 Oct 2015, Marcus Watts wrote:
> > > wip-addr
> > >
> > > 1. where is it?
> > > 2. current state
> > > 3. more info
> > > 4. cheap fixes
> > > 5. in case you were wondering why?
> > >
> > > ____ 1. where is it?
> > >
> > > I've just pushed another update to wip-addr:
> > >
> > > git@github.com:linuxbox2/linuxbox-ceph.git
> > > wip-addr
> > >
> > > ____ 2. current state
> > >
> > > This version
> > > 1/ compiles
> > > 2/ ran an extremely limited set of tests successfully
> > >       (was able to bring up ceph-mon, ceph-osd).
> > >
> > > In theory, it should do everything a recent "master" branch copy of ceph
> > > can do and little or nothing past that.  Internally it adds "address vector"
> > > support, some parsing/print logic, and lots of encoding rules to pass them
> > > around, but there's nothing that can create and little that makes any
> > > sensible use of this.  So this is just the back end encoding and storage rules.
> >
> > This is looking pretty good. I left some comments.  There are still a few
> > XXX's left... but not many.  Haomai, can you help with the async msgr one?
> > (Also, Marcus, can you check if the msg/Simple/Pipe.cc connect() and
> > accept() code is doing the right thing?)
>
> I had a quick look through all the commits; this looks like a great
> improvement for future enhancements.
>
> > One minor thing.. please put the subsystem as the prefix to the commit
> > message instead of the branch name (e.g., mds: add features to event
> > types).
> >
> > > Phase 2 is to add logic to actually make it useful.
> > >       (the very start of this is on linuxbox2 "wip-addr-p2",
> > >       just monmap changes so far...)
> > > ____ 3. more info
> > >
> > > There's an etherpad document that describes this in more detail,
> > >
> > > http://pad.ceph.com/p/wip_addr
> > >
> > > ____ 4. cheap fixes
> > >
> > > a couple of minor issues that should be easy to resolve,
> > > 1.
> > > AsyncConnection.cc
> > > this passes addresses back and forth as it's setting up the connection,
> > > and it also exchanges features.  As best I can tell, it looks like
> > > it exchanges addresses before it knows what features the other end
> > > supports.  There should be something in here that
> > > does this after knowing what features the other end supports.
> >
> > Copying Haomai.
>
> Right, it should be the same as simple messenger. The "features" bit is
> exchanged in "ceph_msg_connect" and "ceph_msg_connect_reply".
>
> I'm afraid that moving the features exchange ahead of the addr exchange
> won't be a smooth change.  Maybe we need an intermediate release to help
> with the format migration.  Or we could add a retry mechanism, so that the
> side using new-style addrs has a proper way to detect the peer's format.

Yeah... it seems like we need the initial exchange to share the addr that 
we accepted on, and then a second tag + addrvec to share the full list 
after that if both sides have the new feature bit.
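
As a sketch of that two-step exchange: the feature bit value, the struct
and the function name below are placeholders for illustration, and
addresses are plain strings to keep it short.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Placeholder bit; the real feature bit is defined elsewhere.
constexpr std::uint64_t FEATURE_ADDRVEC_SKETCH = 1ULL << 0;

struct handshake_plan {
  std::string first_addr;            // always sent, in the legacy format
  std::vector<std::string> addrvec;  // sent afterwards, only if both sides support it
};

// Step 1: share the single addr the connection was accepted on.
// Step 2: once features are known, follow up with the full list, but
// only when the feature bit is set on both ends.
inline handshake_plan plan_handshake(const std::string& accepted_on,
                                     const std::vector<std::string>& all_addrs,
                                     std::uint64_t local_features,
                                     std::uint64_t peer_features) {
  handshake_plan p;
  p.first_addr = accepted_on;
  const std::uint64_t common = local_features & peer_features;
  if (common & FEATURE_ADDRVEC_SKETCH) {
    p.addrvec = all_addrs;
  }
  return p;
}
```

An old peer never sees the second message, so nothing changes for it.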

In any case, I think in both these cases encoding with 0 and leaving a 
FIXME comment is fine for the time being.

sage


* Re: wip-addr
       [not found]   ` <CACJqLyZh4WQZv_3isjXJy=t6YL700C2GjWuen-QG1D5=RkKHYw@mail.gmail.com>
@ 2015-10-10  3:20     ` Haomai Wang
  2015-10-10 12:07     ` wip-addr Sage Weil
  1 sibling, 0 replies; 9+ messages in thread
From: Haomai Wang @ 2015-10-10  3:20 UTC (permalink / raw)
  To: Sage Weil; +Cc: Marcus Watts, ceph-devel, dzafman

resend to ML

On Sat, Oct 10, 2015 at 11:20 AM, Haomai Wang <haomaiwang@gmail.com> wrote:
>
>
> On Sat, Oct 10, 2015 at 5:49 AM, Sage Weil <sage@newdream.net> wrote:
>>
>> Hey Marcus,
>>
>> On Fri, 2 Oct 2015, Marcus Watts wrote:
>> > wip-addr
>> >
>> > 1. where is it?
>> > 2. current state
>> > 3. more info
>> > 4. cheap fixes
>> > 5. in case you were wondering why?
>> >
>> > ____ 1. where is it?
>> >
>> > I've just pushed another update to wip-addr:
>> >
>> > git@github.com:linuxbox2/linuxbox-ceph.git
>> > wip-addr
>> >
>> > ____ 2. current state
>> >
>> > This version
>> > 1/ compiles
>> > 2/ ran an extremely limited set of tests successfully
>> >       (was able to bring up ceph-mon, ceph-osd).
>> >
>> > In theory, it should do everything a recent "master" branch copy of ceph
>> > can do and little or nothing past that.  Internally it adds "address
>> > vector"
>> > support, some parsing/print logic, and lots of encoding rules to pass
>> > them
>> > around, but there's nothing that can create and little that makes any
>> > sensible use of this.  So this is just the back end encoding and storage
>> > rules.
>>
>> This is looking pretty good. I left some comments.  There are still a few
>> XXX's left... but not many.  Haomai, can you help with the async msgr one?
>> (Also, Marcus, can you check if the msg/Simple/Pipe.cc connect() and
>> accept() code is doing the right thing?)
>
>
> I had a quick look through all the commits; this looks like a great
> improvement for future enhancements.
>
>>
>>
>> One minor thing.. please put the subsystem as the prefix to the commit
>> message instead of the branch name (e.g., mds: add features to event
>> types).
>>
>> > Phase 2 is to add logic to actually make it useful.
>> >       (the very start of this is on linuxbox2 "wip-addr-p2",
>> >       just monmap changes so far...)
>> > ____ 3. more info
>> >
>> > There's an etherpad document that describes this in more detail,
>> >
>> > http://pad.ceph.com/p/wip_addr
>> >
>> > ____ 4. cheap fixes
>> >
>> > a couple of minor issues that should be easy to resolve,
>> > 1.
>> > AsyncConnection.cc
>> > this passes addresses back and forth as it's setting up the connection,
>> > and it also exchanges features.  As best I can tell, it looks like
>> > it exchanges addresses before it knows what features the other end
>> > supports.  There should be something in here that
>> > does this after knowing what features the other end supports.
>>
>> Copying Haomai.
>
>
> Right, it should be the same as simple messenger. The "features" bit is
> exchanged in "ceph_msg_connect" and "ceph_msg_connect_reply".
>
> I'm afraid that moving the features exchange ahead of the addr exchange
> won't be a smooth change.  Maybe we need an intermediate release to help
> with the format migration.  Or we could add a retry mechanism, so that the
> side using new-style addrs has a proper way to detect the peer's format.
>
>>
>>
>> > 2.
>> > (about line 2067 in src/tools/ceph_objectstore_tool.cc)
>> > (use via ceph cmd?) tools - "object store tool".
>> > This has a way to serialize objects which includes a watch list
>> > which includes an address.  There should be an option here to say
>> > whether to include exported addresses.
>>
>> I think it's safe to use defaults here.. what do you think, David?
>>
>> Thanks!
>> sage
>>
>
>
>
> --
>
> Best Regards,
>
> Wheat



-- 
Best Regards,

Wheat


* Re: wip-addr
  2015-10-02 23:24 wip-addr Marcus Watts
@ 2015-10-09 21:49 ` Sage Weil
       [not found]   ` <CACJqLyZh4WQZv_3isjXJy=t6YL700C2GjWuen-QG1D5=RkKHYw@mail.gmail.com>
  2015-10-12 17:42   ` wip-addr David Zafman
  0 siblings, 2 replies; 9+ messages in thread
From: Sage Weil @ 2015-10-09 21:49 UTC (permalink / raw)
  To: Marcus Watts; +Cc: ceph-devel, haomai, dzafman

Hey Marcus,

On Fri, 2 Oct 2015, Marcus Watts wrote:
> wip-addr
> 
> 1. where is it?
> 2. current state
> 3. more info
> 4. cheap fixes
> 5. in case you were wondering why?
> 
> ____ 1. where is it?
> 
> I've just pushed another update to wip-addr:
> 
> git@github.com:linuxbox2/linuxbox-ceph.git
> wip-addr
> 
> ____ 2. current state
> 
> This version
> 1/ compiles
> 2/ ran an extremely limited set of tests successfully
> 	(was able to bring up ceph-mon, ceph-osd).
> 
> In theory, it should do everything a recent "master" branch copy of ceph
> can do and little or nothing past that.  Internally it adds "address vector"
> support, some parsing/print logic, and lots of encoding rules to pass them
> around, but there's nothing that can create and little that makes any
> sensible use of this.  So this is just the back end encoding and storage rules.

This is looking pretty good. I left some comments.  There are still a few 
XXX's left... but not many.  Haomai, can you help with the async msgr one? 
(Also, Marcus, can you check if the msg/Simple/Pipe.cc connect() and 
accept() code is doing the right thing?)

One minor thing.. please put the subsystem as the prefix to the commit
message instead of the branch name (e.g., mds: add features to event 
types).

> Phase 2 is to add logic to actually make it useful.
> 	(the very start of this is on linuxbox2 "wip-addr-p2",
> 	just monmap changes so far...)
> ____ 3. more info
> 
> There's an etherpad document that describes this in more detail,
> 
> http://pad.ceph.com/p/wip_addr
> 
> ____ 4. cheap fixes
> 
> a couple of minor issues that should be easy to resolve,
> 1.
> AsyncConnection.cc
> this passes addresses back and forth as it's setting up the connection,
> and it also exchanges features.  As best I can tell, it looks like
> it exchanges addresses before it knows what features the other end
> supports.  There should be something in here that
> does this after knowing what features the other end supports.

Copying Haomai.
 
> 2.
> (about line 2067 in src/tools/ceph_objectstore_tool.cc)
> (use via ceph cmd?) tools - "object store tool".
> This has a way to serialize objects which includes a watch list
> which includes an address.  There should be an option here to say
> whether to include exported addresses.

I think it's safe to use defaults here.. what do you think, David?

Thanks!
sage



* wip-addr
@ 2015-10-02 23:24 Marcus Watts
  2015-10-09 21:49 ` wip-addr Sage Weil
  0 siblings, 1 reply; 9+ messages in thread
From: Marcus Watts @ 2015-10-02 23:24 UTC (permalink / raw)
  To: ceph-devel

wip-addr

1. where is it?
2. current state
3. more info
4. cheap fixes
5. in case you were wondering why?

____ 1. where is it?

I've just pushed another update to wip-addr:

git@github.com:linuxbox2/linuxbox-ceph.git
wip-addr

____ 2. current state

This version
1/ compiles
2/ ran an extremely limited set of tests successfully
	(was able to bring up ceph-mon, ceph-osd).

In theory, it should do everything a recent "master" branch copy of ceph
can do and little or nothing past that.  Internally it adds "address vector"
support, some parsing/print logic, and lots of encoding rules to pass them
around, but there's nothing that can create and little that makes any
sensible use of this.  So this is just the back end encoding and storage rules.

Phase 2 is to add logic to actually make it useful.
	(the very start of this is on linuxbox2 "wip-addr-p2",
	just monmap changes so far...)

____ 3. more info

There's an etherpad document that describes this in more detail,

http://pad.ceph.com/p/wip_addr

____ 4. cheap fixes

a couple of minor issues that should be easy to resolve,
1.
AsyncConnection.cc
this passes addresses back and forth as it's setting up the connection,
and it also exchanges features.  As best I can tell, it looks like
it exchanges addresses before it knows what features the other end
supports.  There should be something in here that
does this after knowing what features the other end supports.

2.
(about line 2067 in src/tools/ceph_objectstore_tool.cc)
(use via ceph cmd?) tools - "object store tool".
This has a way to serialize objects which includes a watch list
which includes an address.  There should be an option here to say
whether to include exported addresses.

____ 5. in case you were wondering why?

The main current interest for this is to work with accelio,
which introduces 2 more transport types (accelio, via tcp or
infiniband).  So that means actually 6 possible choices,
{ ipv4 or ipv6 } x { simple messenger, tcp, infiniband }.
Infiniband typically would work within a data center but not
between them, so there are also reachability issues which
may not be obvious to the client.  (ipv6 has this too).

					-Marcus Watts
