From: Haomai Wang <haomai@xsky.com>
To: Sage Weil <sweil@redhat.com>
Cc: Yehuda Sadeh-Weinraub <yehuda@redhat.com>,
	ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: msgr2 protocol
Date: Thu, 2 Jun 2016 23:59:35 +0800
Message-ID: <CACJqLybY4Y1t787arDojNa=zLD+LdNMDukEO8yZcV+E2NSxUvA@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.11.1606021137190.6221@cpach.fuggernut.com>

On Thu, Jun 2, 2016 at 11:43 PM, Sage Weil <sweil@redhat.com> wrote:
> Based on the discussion during CDM yesterday I wrote up a nicer-looking
> spec of the protocol in rst:
>
>         https://github.com/ceph/ceph/pull/9461
>
> Please let me know if this looks right.  I have two questions:
>
> 1. Is TAG_START really necessary?  I guess it doesn't hurt, and it makes
> it easy to add flags later.
>
> 2. We don't explicitly have anything here that indicates a session is
> stateless or stateful.  Currently this is determined by the Policy stuff
> on either end and the peers just happen to agree.  Setting/asserting
> it explicitly as part of the handshake seems like a good idea.  Maybe a
> flags field in the TAG_IDENT message, with flags for lossy/lossless and
> whether we initiate connections (true for clients or p2p servers)?

We already have the CEPH_MSG_CONNECT_LOSSY flag in the handshake.
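The existing CEPH_MSG_CONNECT_LOSSY flag already covers the lossy bit. A minimal Python sketch of asserting the session policy explicitly in TAG_IDENT might look like the following; the 64-bit little-endian flags field, the IDENT_FLAG_* names, and their bit values are hypothetical and not taken from the spec in the PR:

```python
import struct

# Hypothetical TAG_IDENT flag bits (names and values are assumptions).
IDENT_FLAG_LOSSY = 1 << 0      # stateless session: messages may be dropped on reset
IDENT_FLAG_INITIATOR = 1 << 1  # this peer (re)initiates connections, e.g. a client


def encode_ident_flags(lossy: bool, initiator: bool) -> bytes:
    """Pack the flags into a little-endian 64-bit field."""
    flags = 0
    if lossy:
        flags |= IDENT_FLAG_LOSSY
    if initiator:
        flags |= IDENT_FLAG_INITIATOR
    return struct.pack("<Q", flags)


def decode_ident_flags(data: bytes) -> tuple[bool, bool]:
    """Return (lossy, initiator) from a packed flags field."""
    (flags,) = struct.unpack("<Q", data)
    return bool(flags & IDENT_FLAG_LOSSY), bool(flags & IDENT_FLAG_INITIATOR)


def policies_agree(local_lossy: bool, peer_flags: bytes) -> bool:
    """Check the peer's advertised lossy bit against our own Policy,
    instead of relying on both ends happening to be configured alike."""
    peer_lossy, _ = decode_ident_flags(peer_flags)
    return peer_lossy == local_lossy
```

With a field like this, a lossy/lossless mismatch could be rejected during the handshake rather than left implicit.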

>
> sage
>
>
> On Sat, 28 May 2016, Yehuda Sadeh-Weinraub wrote:
>
>> On Fri, May 27, 2016 at 10:37 AM, Sage Weil <sweil@redhat.com> wrote:
>> > On Fri, 27 May 2016, Yehuda Sadeh-Weinraub wrote:
>> >> On Thu, May 26, 2016 at 11:17 AM, Sage Weil <sweil@redhat.com> wrote:
>> >> > I wrote up a basic proposal for the new msgr2 protocol:
>> >> >
>> >> >         http://pad.ceph.com/p/msgr2
>> >> >
>> >> > It is pretty similar to the current protocol, with a few key changes:
>> >> >
>> >> > 1. The initial banner has a version number for protocol features supported
>> >> > and required.  This will allow optional behavior later.  The current
>> >> > protocol doesn't allow this (the banner string is fixed and has to match
>> >> > verbatim).
>> >> >
>> >> > 2. The auth handshake is a low-level msgr exchange now.  This more or less
>> >> > matches the MAuth and MAuthReply exchange with the mon.  Also, the
>> >> > authenticator/ticket presentation for established clients can be sent here
>> >> > as part of this exchange, instead of as part of the msg_connect and
>> >> > msg_connect_reply exchange.
>> >> >
>> >> > 3. The identification of peers during connect is moved to the TAG_IDENT
>> >> > stage.  This way it could happen after authentication and/or encryption,
>> >> > if we like.  (Not sure it matters.)
>> >> >
>> >> > 4. Signatures are a separate message now that follows the previous
>> >> > message.  If a message doesn't have a signature that follows, it is
>> >> > dropped.  Once authenticated we can sign all the other handshake exchanges
>> >> > (TAG_IDENT, etc.) as well as the messages themselves.
>> >> >
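Item 1 (supported and required protocol features in the banner) can be illustrated with a small Python sketch. The magic string, the two little-endian 64-bit masks, and the helper names are assumptions chosen for illustration, not the wire format from the pad:

```python
import struct
from typing import Optional

# Assumed banner layout: magic prefix, then (supported, required) feature masks.
BANNER_MAGIC = b"ceph v2\n"
BANNER_FMT = "<QQ"


def encode_banner(supported: int, required: int) -> bytes:
    return BANNER_MAGIC + struct.pack(BANNER_FMT, supported, required)


def decode_banner(data: bytes) -> tuple[int, int]:
    """Return (supported, required). Unlike the current fixed banner string,
    only the magic prefix has to match verbatim."""
    if not data.startswith(BANNER_MAGIC):
        raise ValueError("not a msgr2 banner")
    return struct.unpack_from(BANNER_FMT, data, len(BANNER_MAGIC))


def negotiate(local_supported: int, local_required: int,
              peer_banner: bytes) -> Optional[int]:
    """Return the feature set both ends will use, or None if incompatible."""
    peer_supported, peer_required = decode_banner(peer_banner)
    # Each side must support everything the other side requires.
    if peer_required & ~local_supported or local_required & ~peer_supported:
        return None
    # Optional behavior is enabled only if both ends support it.
    return local_supported & peer_supported
```

In this sketch, new optional behavior only needs a new bit in the supported mask, and a peer can insist on a feature by also setting it in the required mask.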
>> >>
>> >> Is there a reason why the signature needs to be a separate message? It
>> >> would add extra overhead, and it seems to me that it would complicate
>> >> implementation (in terms of message state and such).
>> >
>> > It doesn't have to be--I was just wanting to keep things simple.  We could
>> > similarly make it part of the underlying format, e.g.,
>> >
>> >  tag byte
>> >  8 byte signature
>> >  payload
>>
>> The signature should come after the payload, but yeah. We might need to define
>> an extended envelope to allow future extensions.
>>
>> >
>> > or whatever.  That's basically the same thing, except we save 1 byte.
>> >
>> >> > 5. The reconnect behavior for stateful connections is a separate
>> >> > exchange. This keeps the stateless connections free of clutter.
>> >> >
>> >> > 6. A few changes in the auth_none and cephx integration will be needed.
>> >> > For example, all the current stubs assume that authentication happens over
>> >> > the MAuth message and authorization happens in an authorizer blob in
>> >> > ceph_msg_connect.  Now both are part of TAG_AUTH_REQUEST, so we'll need to
>> >> > multiplex the cephx message blobs. Also, because the IDENT exchange
>> >> > happens later, we may need to pass additional info in the auth handshake
>> >> > messages (like the peer type, or whatever else is needed).
>> >> >
>> >> > 7. Lots of messages can go either way, and I tried to avoid a strict
>> >> > request/response model so that things could be pipelined, and we'd spend a
>> >> > minimal amount of time waiting for a response from the other end.  For
>> >> > example,
>> >> >
>> >> > C:
>> >> >  initiates connection
>> >> > S:
>> >> >  accepts connection
>> >> >  -> banner
>> >> >  -> TAG_AUTH_METHODS
>> >> > C:
>> >> >  -> banner
>> >> >  -> TAG_AUTH_SET_METHOD
>> >> >  -> TAG_AUTH_AUTH_REQUEST
>> >> > S:
>> >> >  -> TAG_AUTH_REPLY
>> >> > C:
>> >> >  -> TAG_ENCRYPT_BEGIN
>> >> >  -> TAG_IDENT
>> >> >  -> TAG_SIGNATURE
>> >>
>> >> Can we have the client start authenticating with some predetermined
>> >> auth params, and fall back to having the server respond with
>> >> AUTH_METHODS only if it doesn't support the method selected by the
>> >> client? Even if it isn't preconfigured, the auth method usually
>> >> doesn't change across connection instances, so the client can
>> >> cache that info per server. That would look something like this:
>> >>
>> >> a first connection:
>> >>
>> >> C:
>> >>  initiates connection
>> >>  -> banner
>> >>  -> TAG_AUTH_GET_METHODS <-- be explicit
>> >>  -> TAG_AUTH_SET_METHOD  <-- opportunistically trying a specific
>> >> method type anyway
>> >>  -> TAG_AUTH_AUTH_REQUEST
>> >>
>> >> S:
>> >>  accepts connection
>> >>  -> banner
>> >>  -> TAG_AUTH_REPLY
>> >>
>> >>
>> >> a followup connection:
>> >>
>> >>
>> >> C:
>> >>  initiates connection
>> >>  -> banner
>> >>  -> TAG_AUTH_SET_METHOD
>> >>  -> TAG_AUTH_AUTH_REQUEST
>> >>
>> >> S:
>> >>  accepts connection
>> >>  -> banner
>> >>  -> TAG_AUTH_REPLY
>> >
>> > Yeah... or even just make the initial connection try its preferred method
>> > and only do the GET_METHODS if it is rejected.
>> >
>>
>> Right. In any case, the protocol should enable this flexibility.
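The optimistic approach discussed above (try the cached or preferred method, and only learn the server's methods on rejection) can be modeled in a few lines of Python. The Server and Client classes are toy stand-ins, and the assumption that a rejection reply carries the server's method list, playing the role of TAG_AUTH_METHODS, is illustrative only:

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy server that accepts auth only for the methods it supports."""
    methods: list[str]

    def auth_request(self, method: str) -> tuple[bool, list[str]]:
        # Assumption: a rejection carries the server's method list.
        if method in self.methods:
            return True, []
        return False, list(self.methods)


@dataclass
class Client:
    preferred: list[str]                                 # best method first
    cache: dict[str, str] = field(default_factory=dict)  # server name -> method

    def connect(self, name: str, server: Server) -> tuple[str, int]:
        """Authenticate and return (method used, auth round trips)."""
        method = self.cache.get(name, self.preferred[0])
        ok, offered = server.auth_request(method)  # optimistic SET_METHOD + AUTH_REQUEST
        if ok:
            self.cache[name] = method
            return method, 1
        for m in self.preferred:                   # retry with the best common method
            if m in offered and server.auth_request(m)[0]:
                self.cache[name] = m
                return m, 2
        raise ConnectionError("no common auth method")
```

Only the first connection to a server that rejects the preferred method pays the extra round trip; later connections to that server hit the cache.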
>>
>>
>> > If you do a connect and immediately write a few bytes to the TCP stream,
>> > does that actually translate to fewer packets?  I was guessing that the
>> > server writing the first bytes of the exchange would be fine but if it
>> > speeds things up for the client to optimistically start the exchange too
>> > we may as well...
>> >
>>
>> While I haven't really looked at it recently, I don't think it'd be
>> possible to embed data in the SYN packet using the plain vanilla TCP
>> implementation. However, I believe that doing connect() and sending
>> data immediately following it should improve things, specifically if
>> doing async connect (as with the async messenger), but this still
>> needs to be proven.
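Whether this saves packets would still need measuring. The pattern itself (asynchronous connect, then writing the client's opening frames as soon as the socket is writable, without first waiting for the server's banner) can be sketched in Python with the standard socket, selectors, and threading modules; the payload bytes are placeholders, not real msgr2 frames:

```python
import selectors
import socket
import threading


def connect_and_send(addr: tuple[str, int], payload: bytes, timeout: float = 5.0) -> None:
    """Non-blocking connect, then send as soon as the socket is writable."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setblocking(False)
        sock.connect_ex(addr)  # usually returns EINPROGRESS immediately
        with selectors.DefaultSelector() as sel:
            sel.register(sock, selectors.EVENT_WRITE)
            if not sel.select(timeout):
                raise TimeoutError("connect timed out")
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err:
            raise OSError(err, "connect failed")
        sock.setblocking(True)
        # Client frames go out without reading anything from the server first.
        sock.sendall(payload)
    finally:
        sock.close()


def serve_once(listener: socket.socket, received: list) -> None:
    """Accept one connection and record everything the client sent."""
    conn, _ = listener.accept()
    with conn:
        chunks = []
        while data := conn.recv(4096):
            chunks.append(data)
        received.append(b"".join(chunks))


def demo(payload: bytes) -> bytes:
    """Run a loopback server and client; return what the server received."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        received: list = []
        t = threading.Thread(target=serve_once, args=(listener, received))
        t.start()
        connect_and_send(listener.getsockname(), payload)
        t.join(timeout=5)
    return received[0]
```

Any packet-level savings would have to be confirmed with a packet capture; the sketch only demonstrates the ordering of operations.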
>>
>> Yehuda
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
