* wip-addr
@ 2015-08-24 18:01 Sage Weil
2015-08-25 10:06 ` wip-addr Marcus Watts
0 siblings, 1 reply; 9+ messages in thread
From: Sage Weil @ 2015-08-24 18:01 UTC (permalink / raw)
To: mwatts, ceph-devel
Hi Marcus,
I looked over the wip-addr branch a bit. I have two basic
questions/concerns:
1) In commit
https://github.com/ceph/ceph/commit/73b09090466a43d5aceb979a4802de3f3f5bf24a
we switch the type field from sa_family to transport_type. This seems
like the way to go, but we need to deal with the fact that lots of
clusters out there are IPv6 and have AF_INET6 filled in here. Probably
both types should be interpreted to mean "existing/legacy IP messenger" or
whatever we want to call SimpleMessenger/AsyncMessenger's protocol.
I think encode needs to make sure it fills in that value for the type when
encoding the legacy entity_addr_t encoding, but could use a/the single
valid value for the new encoding. And any get_transport_type() accessor
should also return the single valid value.
2) In the later commit
https://github.com/ceph/ceph/commit/9d203a2058f76414703b4fc212a1a0a960d0c672
you introduce a grammar for printing/parsing the addrs. This also makes
sense since e.g. xio uses an IP to identify an endpoint. I think we
should identify these based on the *protocol* and not the implementation,
though... whether we use SimpleMessenger or AsyncMessenger is not a
property of the address. Maybe "tcp://" makes more sense here? Or
perhaps no prefix at all (a bare IP address), so that this looks the same
as it did before in the case where the default protocol(s) are in use.
I assume the xio protocol (whether it is rdma or tcp) is closely tied to
libxio itself.. is that right? If so, using xio in that prefix makes
sense. I'd include xio somewhere in the rdma prefix though (xrdma:// and
xtcp://)?
What do you think?
Logistically, I think the steps for getting this ready for merge are:
1) Separate out the preliminary patches that pass a feature to the addr
encoding.. without any of the other cohortfs patches that are currently on
this branch. Once this builds we can merge it separately from the rest...
2) The entity_addrvec_t type.
3) The type -> transport_type switch.
4) We should make the new entity_addr_t encoding encode sockaddr piece
more compactly instead of eating up a full 80-byte sockaddr_storage even
for the ~8-byte IPv4 sockaddr_in. Maybe just need to encode an
explicit length for the sockaddr_* piece?
Something like that?
sage
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: wip-addr
2015-08-24 18:01 wip-addr Sage Weil
@ 2015-08-25 10:06 ` Marcus Watts
2015-08-25 14:08 ` wip-addr Sage Weil
0 siblings, 1 reply; 9+ messages in thread
From: Marcus Watts @ 2015-08-25 10:06 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On Mon, Aug 24, 2015 at 11:01:45AM -0700, Sage Weil wrote:
...
> I looked over the wip-addr branch a bit. I have two basic
> questions/concerns:
>
> 1) In commit
> https://github.com/ceph/ceph/commit/73b09090466a43d5aceb979a4802de3f3f5bf24a
> we switch the type field from sa_family to transport_type. This seems
> like the way to go, but we need to deal with the fact that lots of
> clusters out there are IPv6 and have AF_INET6 filled in here. Probably
> both types should be interpreted to mean "existing/legacy IP messenger" or
> whatever we want to call SimpleMessenger/AsyncMessenger's protocol.
>
> I think encode needs to make sure it fills in that value for the type when
> encoding the legacy entity_addr_t encoding, but could use a/the single
> valid value for the new encoding. And any get_transport_type() accessor
> should also return the single valid value.
With cohortfs I had the luxury of ignoring existing installations.
I think I concluded at the time that probably most real systems
had '0' stored here.
Yes, if there are things that have "junk" here, they'll need to be handled
appropriately - I think it can be mostly fixed by just having decode map
transport_type "AF_INET6" (which is probably 10) to 0. AF_INET is 2,
so maybe that too. People switching *back* to older code after using
new code won't be so happy - would probably need a patch for older code
if that's a concern.
If there's code out there that thinks get_transport_type() returns an address
family, that will need fixing. something like,
get_address_family(){ return in_addr->ss_family; }
but I didn't find any code that actually needed this.
>
> 2) In the later commit
> https://github.com/ceph/ceph/commit/9d203a2058f76414703b4fc212a1a0a960d0c672
> you introduce a grammar for printing/parsing the addrs. This also makes
> sense since e.g. xio uses an IP to identify an endpoint. I think we
> should identify these based on the *protocol* and not the implementation,
> though... whether we use SimpleMessenger or AsyncMessenger is not a
> property of the address. Maybe "tcp://" makes more sense here? Or
> perhaps no prefix at all (a bare IP address), so that this looks the same
> as it did before in the case where the default protocol(s) are in use.
>
> I assume the xio protocol (whether it is rdma or tcp) is closely tied to
> libxio itself.. is that right? If so, using xio in that prefix makes
> sense. I'd include xio somewhere in the rdma prefix though (xrdma:// and
> xtcp://)?
Code base I had didn't have async messenger as a transport type; if that
needs to be addressable independently of simple messenger it will need a
prefix. I had a prefixless address decode as simple messenger - I
can make this more obvious than "unsigned type = 0;".
I had these prefixes,
sm
rdma
xtcp
but I can easily pick others. I figured rdma didn't need an x since
there didn't seem to be much chance anybody would want a non-xio version
of rdma.
>
> What do you think?
>
> Logistically, I think the steps for getting this ready for merge are:
>
> 1) Separate out the preliminary patches that pass a feature to the addr
> encoding.. without any of the other cohortfs patches that are currently on
> this branch. Once this builds we can merge it separately from the rest...
>
> 2) The entity_addrvec_t type.
>
> 3) The type -> transport_type switch.
There's a lot of cohortfs glop that still needs to go away in this branch.
I hadn't thought of structuring the patch that way. I can do it,
but the encoding feature will be tedious to split out.
There's a bunch of cascading stuff after entity_addrvec_t which
could go into a series of further patches, monmap, and so forth,
not all of which exist yet.
>
> 4) We should make the new entity_addr_t encoding encode sockaddr piece
> more compactly instead of eating up a full 80-byte sockaddr_storage even
> for the ~8-byte IPv4 sockaddr_in. Maybe just need to encode an
> explicit length for the sockaddr_* piece?
Yes, this could be made more memory efficient. It's mainly a matter
of making sure the constructor gets the proper amount of memory.
There's a C hack to do this, but I remember seeing a c++ technique
that was a bit less hackish.
A sockaddr_in is actually 16 bytes of which the last 8 are dummy
padding (sin_zero). (And there there are some systems that
actually check to be sure they're 0.)
BSD4.4 derived systems have a length coded into sockaddr too - but on
linux it's usually possible to get by without storing the length, and
I don't think ceph will need to do so.
...
-Marcus Watts
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: wip-addr
2015-08-25 10:06 ` wip-addr Marcus Watts
@ 2015-08-25 14:08 ` Sage Weil
0 siblings, 0 replies; 9+ messages in thread
From: Sage Weil @ 2015-08-25 14:08 UTC (permalink / raw)
To: Marcus Watts; +Cc: ceph-devel
On Tue, 25 Aug 2015, Marcus Watts wrote:
> On Mon, Aug 24, 2015 at 11:01:45AM -0700, Sage Weil wrote:
> ...
> > I looked over the wip-addr branch a bit. I have two basic
> > questions/concerns:
> >
> > 1) In commit
> > https://github.com/ceph/ceph/commit/73b09090466a43d5aceb979a4802de3f3f5bf24a
> > we switch the type field from sa_family to transport_type. This seems
> > like the way to go, but we need to deal with the fact that lots of
> > clusters out there are IPv6 and have AF_INET6 filled in here. Probably
> > both types should be interpreted to mean "existing/legacy IP messenger" or
> > whatever we want to call SimpleMessenger/AsyncMessenger's protocol.
> >
> > I think encode needs to make sure it fills in that value for the type when
> > encoding the legacy entity_addr_t encoding, but could use a/the single
> > valid value for the new encoding. And any get_transport_type() accessor
> > should also return the single valid value.
>
> With cohortfs I had the luxury of ignoring existing installations.
>
> I think I concluded at the time that probably most real systems
> had '0' stored here.
Oh, you're right--this is always 0 currently, so we're in good shape.
> Yes, if there are things that have "junk" here, they'll need to be handled
> appropriately - I think it can be mostly fixed by just having decode map
> transport_type "AF_INET6" (which is probably 10) to 0. AF_INET is 2,
> so maybe that too. People switching *back* to older code after using
> new code won't be so happy - would probably need a patch for older code
> if that's a concern.
>
> If there's code out there that thinks get_transport_type() returns an address
> family, that will need fixing. something like,
> get_address_family(){ return in_addr->ss_family; }
> but I didn't find any code that actually needed this.
There's already a {get,set}_family() that return ss_family, so I think
we're okay there.
The main thing with entity_addr_t (and entity_addrvec_t) is that when the
new feature is not present we'll want to encode in the old format. I
wonder if the best thing to do here is silently drop the type if it is
non-zero here. We could assert and blow up, but that is a bit dramatic.
The underlying strategy here is to replace (the old) entity_addr_t with
entity_addrvec_t in most places, and when encoding the addrvec without the
new feature, encode only the first addr in its compat form. That way, as
long as the first addr in the list is always IPv4 or IPv6 and the standard
protocol, old clients will be happy.
If we make the addrvec ordered by preference, that may be problematic,
since we'd always prefer the legacy protocol in rdma environments.
Perhaps it would be better to make the compat addrvec encode pick out the
first type 0 addr? That's more robust anyway...
> > 2) In the later commit
> > https://github.com/ceph/ceph/commit/9d203a2058f76414703b4fc212a1a0a960d0c672
> > you introduce a grammar for printing/parsing the addrs. This also makes
> > sense since e.g. xio uses an IP to identify an endpoint. I think we
> > should identify these based on the *protocol* and not the implementation,
> > though... whether we use SimpleMessenger or AsyncMessenger is not a
> > property of the address. Maybe "tcp://" makes more sense here? Or
> > perhaps no prefix at all (a bare IP address), so that this looks the same
> > as it did before in the case where the default protocol(s) are in use.
> >
> > I assume the xio protocol (whether it is rdma or tcp) is closely tied to
> > libxio itself.. is that right? If so, using xio in that prefix makes
> > sense. I'd include xio somewhere in the rdma prefix though (xrdma:// and
> > xtcp://)?
>
> Code base I had didn't have async messenger as a transport type; if that
> needs to be addressable independently of simple messenger it will need a
> prefix. I had a prefixless address decode as simple messenger - I
> can make this more obvious than "unsigned type = 0;".
>
> I had these prefixes,
> sm
> rdma
> xtcp
> but I can easily pick others. I figured rdma didn't need an x since
> there didn't seem to be much chance anybody would want a non-xio version
> of rdma.
The key thing is that the addr property is the *protocol* not the
implementation, so from addr's perspective SimpleMessenger and
AsyncMessenger are indistinguishable. How about
tcp - legacy protocol (simple or async, whatever use has configured)
xrdma - xio over rdma
xtcp - xio over tcp
where the tcp:// is parsed (if present) but left off when rendering
output, so when using the legacy protocol nothing looks
different?
> > Logistically, I think the steps for getting this ready for merge are:
> >
> > 1) Separate out the preliminary patches that pass a feature to the addr
> > encoding.. without any of the other cohortfs patches that are currently on
> > this branch. Once this builds we can merge it separately from the rest...
> >
> > 2) The entity_addrvec_t type.
> >
> > 3) The type -> transport_type switch.
>
> There's a lot of cohortfs glop that still needs to go away in this branch.
>
> I hadn't thought of structuring the patch that way. I can do it,
> but the encoding feature will be tedious to split out.
> There's a bunch of cascading stuff after entity_addrvec_t which
> could go into a series of further patches, monmap, and so forth,
> not all of which exist yet.
It looks like c728926a86e1410f959011d24700bb07bad1dc2c adds
entity_addrvec_t and adds feature to encoding all over the place... but
doesn't actually use addrvec at all yet. I'd separate the addrvec
declaration into its own patch (just ignore it for now), and build a
new branch on master that
1- makes entity_addr_t require featuers when encoding, and
2- cherry-pick the patches adding the feature arg
Once that whole cleanup is complete and builds, the rest will be much
easier... and IMO we should keep the two separate. This first piece will
effectively be a no-op (entity_addr_t encoding doesn't yet depend on
the feature). Unfortunately it's the hardest part since we need to make
sure we have the peer features available in all sorts of awkward
places...
> > 4) We should make the new entity_addr_t encoding encode sockaddr piece
> > more compactly instead of eating up a full 80-byte sockaddr_storage even
> > for the ~8-byte IPv4 sockaddr_in. Maybe just need to encode an
> > explicit length for the sockaddr_* piece?
>
> Yes, this could be made more memory efficient. It's mainly a matter
> of making sure the constructor gets the proper amount of memory.
> There's a C hack to do this, but I remember seeing a c++ technique
> that was a bit less hackish.
>
> A sockaddr_in is actually 16 bytes of which the last 8 are dummy
> padding (sin_zero). (And there there are some systems that
> actually check to be sure they're 0.)
>
> BSD4.4 derived systems have a length coded into sockaddr too - but on
> linux it's usually possible to get by without storing the length, and
> I don't think ceph will need to do so.
I suspect the only way to really do this is put the sockaddr_whatever on
the heap. Most entity_addr_t users put it on the stack or embed it in
other structures, so having it variable sized is not an option. That has
a cost, although with an 80-byte sockaddr_storage it may well be worth
it... We should probably embed a sockaddr_in[6] and transparently spill
out to the heap only when exotic addr types are used?
Anyway, that part can come later... the first hurdle is the feature bits
passed to encode().
sage
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: wip-addr
2015-10-12 17:42 ` wip-addr David Zafman
@ 2015-10-12 18:04 ` Sage Weil
0 siblings, 0 replies; 9+ messages in thread
From: Sage Weil @ 2015-10-12 18:04 UTC (permalink / raw)
To: David Zafman; +Cc: Marcus Watts, ceph-devel, haomai
On Mon, 12 Oct 2015, David Zafman wrote:
> I don't understand how encode/decode of entity_addr_t is changing without
> versioning in the encode/decode. This means that this branch is changing the
> ceph-objectstore-tool export format if CEPH_FEATURE_MSG_ADDR2 is part of the
> features. So we could bump super_header::super_ver if the export format must
> change.
>
> Now that I look at it, I'm sure I can clear the watchers and old_watchers in
> object_info_t during export because that is dynamic information and it happens
> to include entity_addr_t. I need to verify this, but that may be the only
> reason that the objectstore tool needs a valid features value to be passed
> there.
Ah, yeah... clearing watchers (perhaps optionally, though) sounds fine.
sage
>
> David
>
> On 10/9/15 2:49 PM, Sage Weil wrote:
> > > 2.
> > > >(about line 2067 in src/tools/ceph_objectstore_tool.cc)
> > > >(use via ceph cmd?) tools - "object store tool".
> > > >This has a way to serialize objects which includes a watch list
> > > >which includes an address. There should be an option here to say
> > > >whether to include exported addresses.
> > I think it's safe to use defaults here.. what do you think, David?
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: wip-addr
2015-10-09 21:49 ` wip-addr Sage Weil
[not found] ` <CACJqLyZh4WQZv_3isjXJy=t6YL700C2GjWuen-QG1D5=RkKHYw@mail.gmail.com>
@ 2015-10-12 17:42 ` David Zafman
2015-10-12 18:04 ` wip-addr Sage Weil
1 sibling, 1 reply; 9+ messages in thread
From: David Zafman @ 2015-10-12 17:42 UTC (permalink / raw)
To: Sage Weil, Marcus Watts; +Cc: ceph-devel, haomai
I don't understand how encode/decode of entity_addr_t is changing
without versioning in the encode/decode. This means that this branch is
changing the ceph-objectstore-tool export format if
CEPH_FEATURE_MSG_ADDR2 is part of the features. So we could bump
super_header::super_ver if the export format must change.
Now that I look at it, I'm sure I can clear the watchers and
old_watchers in object_info_t during export because that is dynamic
information and it happens to include entity_addr_t. I need to verify
this, but that may be the only reason that the objectstore tool needs a
valid features value to be passed there.
David
On 10/9/15 2:49 PM, Sage Weil wrote:
>> 2.
>> >(about line 2067 in src/tools/ceph_objectstore_tool.cc)
>> >(use via ceph cmd?) tools - "object store tool".
>> >This has a way to serialize objects which includes a watch list
>> >which includes an address. There should be an option here to say
>> >whether to include exported addresses.
> I think it's safe to use defaults here.. what do you think, David?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: wip-addr
[not found] ` <CACJqLyZh4WQZv_3isjXJy=t6YL700C2GjWuen-QG1D5=RkKHYw@mail.gmail.com>
2015-10-10 3:20 ` wip-addr Haomai Wang
@ 2015-10-10 12:07 ` Sage Weil
1 sibling, 0 replies; 9+ messages in thread
From: Sage Weil @ 2015-10-10 12:07 UTC (permalink / raw)
To: Haomai Wang; +Cc: Marcus Watts, ceph-devel, dzafman
[-- Attachment #1: Type: TEXT/PLAIN, Size: 3613 bytes --]
On Sat, 10 Oct 2015, Haomai Wang wrote:
> On Sat, Oct 10, 2015 at 5:49 AM, Sage Weil <sage@newdream.net> wrote:
> Hey Marcus,
>
> On Fri, 2 Oct 2015, Marcus Watts wrote:
> > wip-addr
> >
> > 1. where is it?
> > 2. current state
> > 3. more info
> > 4. cheap fixes
> > 5. in case you were wondering why?
> >
> > ____ 1. where is it?
> >
> > I've just pushed another update to wip-addr:
> >
> > git@github.com:linuxbox2/linuxbox-ceph.git
> > wip-addr
> >
> > ____ 2. current state
> >
> > This version
> > 1/ compiles
> > 2/ ran an extremely limited set of tests successfully
> > (was able to bring up ceph-mon, ceph-osd).
> >
> > In theory, it should do everything a recent "master" branch
> copy of ceph
> > can do and little or nothing past that. Internally it adds
> "address vector"
> > support, some parsing/print logic, and lots of encoding rules
> to pass them
> > around, but there's nothing that can create and little that
> makes any
> > sensible use of this. So this is just the back end encoding
> and storage rules.
>
> This is looking pretty good. I left some comments. There are
> still a few
> XXX's left... but not many. Haomai, can you help with the async
> msgr one?
> (Also, Marcus, can you check if the msg/Simple/Pipe.cc connect()
> and
> accept() code doing the right thing?)
>
>
> I have a quick view among all commits, looks a great improvement for the
> future enhancing.
>
>
> One minor thing.. please put the subsystem as the prefix to the
> commit
> message instead of the branch name (e.g., mds: add features to
> event
> types).
>
> > Phase 2 is to add logic to actually make it useful.
> > (the very start of this is on linuxbox2 "wip-addr-p2",
> > just monmap changes so far...)
> > ____ 3. more info
> >
> > There's an etherpad document that describes this in more
> detail,
> >
> > http://pad.ceph.com/p/wip_addr
> >
> > ____ 4. cheap fixes
> >
> > a couple of minor issues that should be easy to resolve,
> > 1.
> > AsyncConnection.cc
> > this passes addresses back and forth as it's setting up the
> connection,
> > and it also exchanges features. As best I can tell, it looks
> like
> > it exchanges addresses before it knows what features the other
> end
> > supports. There should be something in here that
> > does this after knowing what features the other end supports.
>
> Copying Haomai.
>
>
> Right, it should be the same as simple messenger. The "features" bit is
> exchanged in "ceph_msg_connect" and "ceph_msg_connect_reply".
>
> I'm afraid that making features before addr exchange isn't a smooth way.
> Maybe we need a middle release to help format migrating. Or we need to
> add retry mechanism, we could add proper way to let new-style addr side
> detect peer format.
Yeah... it seems like we need the initial exchange to share the addr that
we accepted on, and then a second tag + addrvec to share the full list
after that if both sides have the new feature bit.
In any case, I think in both these cases encoding wth 0 with a FIXME
comment is fine for the time being.
sage
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: wip-addr
[not found] ` <CACJqLyZh4WQZv_3isjXJy=t6YL700C2GjWuen-QG1D5=RkKHYw@mail.gmail.com>
@ 2015-10-10 3:20 ` Haomai Wang
2015-10-10 12:07 ` wip-addr Sage Weil
1 sibling, 0 replies; 9+ messages in thread
From: Haomai Wang @ 2015-10-10 3:20 UTC (permalink / raw)
To: Sage Weil; +Cc: Marcus Watts, ceph-devel, dzafman
resend to ML
On Sat, Oct 10, 2015 at 11:20 AM, Haomai Wang <haomaiwang@gmail.com> wrote:
>
>
> On Sat, Oct 10, 2015 at 5:49 AM, Sage Weil <sage@newdream.net> wrote:
>>
>> Hey Marcus,
>>
>> On Fri, 2 Oct 2015, Marcus Watts wrote:
>> > wip-addr
>> >
>> > 1. where is it?
>> > 2. current state
>> > 3. more info
>> > 4. cheap fixes
>> > 5. in case you were wondering why?
>> >
>> > ____ 1. where is it?
>> >
>> > I've just pushed another update to wip-addr:
>> >
>> > git@github.com:linuxbox2/linuxbox-ceph.git
>> > wip-addr
>> >
>> > ____ 2. current state
>> >
>> > This version
>> > 1/ compiles
>> > 2/ ran an extremely limited set of tests successfully
>> > (was able to bring up ceph-mon, ceph-osd).
>> >
>> > In theory, it should do everything a recent "master" branch copy of ceph
>> > can do and little or nothing past that. Internally it adds "address
>> > vector"
>> > support, some parsing/print logic, and lots of encoding rules to pass
>> > them
>> > around, but there's nothing that can create and little that makes any
>> > sensible use of this. So this is just the back end encoding and storage
>> > rules.
>>
>> This is looking pretty good. I left some comments. There are still a few
>> XXX's left... but not many. Haomai, can you help with the async msgr one?
>> (Also, Marcus, can you check if the msg/Simple/Pipe.cc connect() and
>> accept() code doing the right thing?)
>
>
> I have a quick view among all commits, looks a great improvement for the
> future enhancing.
>
>>
>>
>> One minor thing.. please put the subsystem as the prefix to the commit
>> message instead of the branch name (e.g., mds: add features to event
>> types).
>>
>> > Phase 2 is to add logic to actually make it useful.
>> > (the very start of this is on linuxbox2 "wip-addr-p2",
>> > just monmap changes so far...)
>> > ____ 3. more info
>> >
>> > There's an etherpad document that describes this in more detail,
>> >
>> > http://pad.ceph.com/p/wip_addr
>> >
>> > ____ 4. cheap fixes
>> >
>> > a couple of minor issues that should be easy to resolve,
>> > 1.
>> > AsyncConnection.cc
>> > this passes addresses back and forth as it's setting up the connection,
>> > and it also exchanges features. As best I can tell, it looks like
>> > it exchanges addresses before it knows what features the other end
>> > supports. There should be something in here that
>> > does this after knowing what features the other end supports.
>>
>> Copying Haomai.
>
>
> Right, it should be the same as simple messenger. The "features" bit is
> exchanged in "ceph_msg_connect" and "ceph_msg_connect_reply".
>
> I'm afraid that making features before addr exchange isn't a smooth way.
> Maybe we need a middle release to help format migrating. Or we need to add
> retry mechanism, we could add proper way to let new-style addr side detect
> peer format.
>
>>
>>
>> > 2.
>> > (about line 2067 in src/tools/ceph_objectstore_tool.cc)
>> > (use via ceph cmd?) tools - "object store tool".
>> > This has a way to serialize objects which includes a watch list
>> > which includes an address. There should be an option here to say
>> > whether to include exported addresses.
>>
>> I think it's safe to use defaults here.. what do you think, David?
>>
>> Thanks!
>> sage
>>
>
>
>
> --
>
> Best Regards,
>
> Wheat
--
Best Regards,
Wheat
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: wip-addr
2015-10-02 23:24 wip-addr Marcus Watts
@ 2015-10-09 21:49 ` Sage Weil
[not found] ` <CACJqLyZh4WQZv_3isjXJy=t6YL700C2GjWuen-QG1D5=RkKHYw@mail.gmail.com>
2015-10-12 17:42 ` wip-addr David Zafman
0 siblings, 2 replies; 9+ messages in thread
From: Sage Weil @ 2015-10-09 21:49 UTC (permalink / raw)
To: Marcus Watts; +Cc: ceph-devel, haomai, dzafman
Hey Marcus,
On Fri, 2 Oct 2015, Marcus Watts wrote:
> wip-addr
>
> 1. where is it?
> 2. current state
> 3. more info
> 4. cheap fixes
> 5. in case you were wondering why?
>
> ____ 1. where is it?
>
> I've just pushed another update to wip-addr:
>
> git@github.com:linuxbox2/linuxbox-ceph.git
> wip-addr
>
> ____ 2. current state
>
> This version
> 1/ compiles
> 2/ ran an extremely limited set of tests successfully
> (was able to bring up ceph-mon, ceph-osd).
>
> In theory, it should do everything a recent "master" branch copy of ceph
> can do and little or nothing past that. Internally it adds "address vector"
> support, some parsing/print logic, and lots of encoding rules to pass them
> around, but there's nothing that can create and little that makes any
> sensible use of this. So this is just the back end encoding and storage rules.
This is looking pretty good. I left some comments. There are still a few
XXX's left... but not many. Haomai, can you help with the async msgr one?
(Also, Marcus, can you check if the msg/Simple/Pipe.cc connect() and
accept() code doing the right thing?)
One minor thing.. please put the subsystem as the prefix to the commit
message instead of the branch name (e.g., mds: add features to event
types).
> Phase 2 is to add logic to actually make it useful.
> (the very start of this is on linuxbox2 "wip-addr-p2",
> just monmap changes so far...)
> ____ 3. more info
>
> There's an etherpad document that describes this in more detail,
>
> http://pad.ceph.com/p/wip_addr
>
> ____ 4. cheap fixes
>
> a couple of minor issues that should be easy to resolve,
> 1.
> AsyncConnection.cc
> this passes addresses back and forth as it's setting up the connection,
> and it also exchanges features. As best I can tell, it looks like
> it exchanges addresses before it knows what features the other end
> supports. There should be something in here that
> does this after knowing what features the other end supports.
Copying Haomai.
> 2.
> (about line 2067 in src/tools/ceph_objectstore_tool.cc)
> (use via ceph cmd?) tools - "object store tool".
> This has a way to serialize objects which includes a watch list
> which includes an address. There should be an option here to say
> whether to include exported addresses.
I think it's safe to use defaults here.. what do you think, David?
Thanks!
sage
^ permalink raw reply [flat|nested] 9+ messages in thread
* wip-addr
@ 2015-10-02 23:24 Marcus Watts
2015-10-09 21:49 ` wip-addr Sage Weil
0 siblings, 1 reply; 9+ messages in thread
From: Marcus Watts @ 2015-10-02 23:24 UTC (permalink / raw)
To: ceph-devel
wip-addr
1. where is it?
2. current state
3. more info
4. cheap fixes
5. in case you were wondering why?
____ 1. where is it?
I've just pushed another update to wip-addr:
git@github.com:linuxbox2/linuxbox-ceph.git
wip-addr
____ 2. current state
This version
1/ compiles
2/ ran an extremely limited set of tests successfully
(was able to bring up ceph-mon, ceph-osd).
In theory, it should do everything a recent "master" branch copy of ceph
can do and little or nothing past that. Internally it adds "address vector"
support, some parsing/print logic, and lots of encoding rules to pass them
around, but there's nothing that can create and little that makes any
sensible use of this. So this is just the back end encoding and storage rules.
Phase 2 is to add logic to actually make it useful.
(the very start of this is on linuxbox2 "wip-addr-p2",
just monmap changes so far...)
____ 3. more info
There's an etherpad document that describes this in more detail,
http://pad.ceph.com/p/wip_addr
____ 4. cheap fixes
a couple of minor issues that should be easy to resolve,
1.
AsyncConnection.cc
this passes addresses back and forth as it's setting up the connection,
and it also exchanges features. As best I can tell, it looks like
it exchanges addresses before it knows what features the other end
supports. There should be something in here that
does this after knowing what features the other end supports.
2.
(about line 2067 in src/tools/ceph_objectstore_tool.cc)
(use via ceph cmd?) tools - "object store tool".
This has a way to serialize objects which includes a watch list
which includes an address. There should be an option here to say
whether to include exported addresses.
____ 5. in case you were wondering why?
The main current interest for this it to work with accelio,
which introduces 2 more transport types (accelio, via tcp or
infiniband.) So that means actually 6 possible choices,
{ ipv6 or ipv6 } x { simple messenger, tcp, infiniband. }.
Infiniband typically would work within a data center but not
between them, so there are also reachability issues which
may not be obvious to the client. (ipv6 has this too).
-Marcus Watts
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-10-12 18:04 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-24 18:01 wip-addr Sage Weil
2015-08-25 10:06 ` wip-addr Marcus Watts
2015-08-25 14:08 ` wip-addr Sage Weil
2015-10-02 23:24 wip-addr Marcus Watts
2015-10-09 21:49 ` wip-addr Sage Weil
[not found] ` <CACJqLyZh4WQZv_3isjXJy=t6YL700C2GjWuen-QG1D5=RkKHYw@mail.gmail.com>
2015-10-10 3:20 ` wip-addr Haomai Wang
2015-10-10 12:07 ` wip-addr Sage Weil
2015-10-12 17:42 ` wip-addr David Zafman
2015-10-12 18:04 ` wip-addr Sage Weil
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.