From: Xiubo Li <xiubli@redhat.com>
To: Jeff Layton <jlayton@kernel.org>, Ilya Dryomov <idryomov@gmail.com>
Cc: Ceph Development <ceph-devel@vger.kernel.org>,
	Patrick Donnelly <pdonnell@redhat.com>
Subject: Re: [PATCH v2] ceph: implement updated ceph_mds_request_head structure
Date: Mon, 21 Dec 2020 13:41:09 +0800
Message-ID: <d20a27e2-0c0e-65a8-3a67-b9ba9a49de87@redhat.com>
In-Reply-To: <cf73e0bde0eb4e43c0eeb0e884a468002a9f9889.camel@kernel.org>

On 2020/12/17 5:04, Jeff Layton wrote:
> On Wed, 2020-12-16 at 21:16 +0100, Ilya Dryomov wrote:
>> On Wed, Dec 16, 2020 at 8:50 PM Jeff Layton <jlayton@kernel.org> wrote:
>>> When we added the btime feature in mainline ceph, we had to extend
>>> struct ceph_mds_request_args so that it could be set. Implement the same
>>> in the kernel client.
>>>
>>> Rename ceph_mds_request_head with a _old extension, and add a union
>>> ceph_mds_request_args_ext to allow for the extended size of the new
>>> header format.
>>>
>>> Add the appropriate code to handle both formats in
>>> create_request_message and key the behavior on whether the peer supports
>>> CEPH_FEATURE_FS_BTIME.
>>>
>>> The gid_list field in the payload is now populated from the saved
>>> credential. For now, we don't add any support for setting the btime via
>>> setattr, but this does enable us to add that in the future.
>>>
>>> [ idryomov: break unnecessarily long lines ]
>>>
>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>>   fs/ceph/mds_client.c         | 101 ++++++++++++++++++++++++++---------
>>>   include/linux/ceph/ceph_fs.h |  32 ++++++++++-
>>>   2 files changed, 108 insertions(+), 25 deletions(-)
>>>
>>>   v2: fix encoding of unsafe request resends
>>>       add encode_payload_tail helper
>>>       rework find_old_request_head to take a "legacy" flag argument
>>>
>>> Ilya,
>>>
>>> I'll go ahead and merge this into testing, but your call on whether we
>>> should take this for v5.11, or wait for v5.12. We don't have anything
>>> blocked on this just yet.
>>>
>>> I dropped your SoB and Xiubo's Reviewed-by tags as well, as the patch is
>>> a bit different from the original.
>>>
>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>> index 0ff76f21466a..cd0cc5d8c4f0 100644
>>> --- a/fs/ceph/mds_client.c
>>> +++ b/fs/ceph/mds_client.c
>>> @@ -2475,15 +2475,46 @@ static int set_request_path_attr(struct inode *rinode, struct dentry *rdentry,
>>>          return r;
>>>   }
>>>
>>> +static struct ceph_mds_request_head_old *
>>> +find_old_request_head(void *p, bool legacy)
>>> +{
>>> +       struct ceph_mds_request_head *new_head;
>>> +
>>> +       if (legacy)
>>> +               return (struct ceph_mds_request_head_old *)p;
>>> +       new_head = (struct ceph_mds_request_head *)p;
>>> +       return (struct ceph_mds_request_head_old *)&new_head->oldest_client_tid;
>>> +}
>>> +
>>> +static void encode_payload_tail(void **p, struct ceph_mds_request *req, bool legacy)
>>> +{
>>> +       struct ceph_timespec ts;
>>> +
>>> +       ceph_encode_timespec64(&ts, &req->r_stamp);
>>> +       ceph_encode_copy(p, &ts, sizeof(ts));
>>> +
>>> +       /* gid list */
>>> +       if (!legacy) {
>>> +               int i;
>>> +
>>> +               ceph_encode_32(p, req->r_cred->group_info->ngroups);
>>> +               for (i = 0; i < req->r_cred->group_info->ngroups; i++)
>>> +                       ceph_encode_64(p, from_kgid(&init_user_ns,
>>> +                                      req->r_cred->group_info->gid[i]));
>>> +       }
>>> +}
>>> +
>>>   /*
>>>    * called under mdsc->mutex
>>>    */
>>> -static struct ceph_msg *create_request_message(struct ceph_mds_client *mdsc,
>>> +static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>                                                 struct ceph_mds_request *req,
>>> -                                              int mds, bool drop_cap_releases)
>>> +                                              bool drop_cap_releases)
>>>   {
>>> +       int mds = session->s_mds;
>>> +       struct ceph_mds_client *mdsc = session->s_mdsc;
>>>          struct ceph_msg *msg;
>>> -       struct ceph_mds_request_head *head;
>>> +       struct ceph_mds_request_head_old *head;
>>>          const char *path1 = NULL;
>>>          const char *path2 = NULL;
>>>          u64 ino1 = 0, ino2 = 0;
>>> @@ -2493,6 +2524,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_client *mdsc,
>>>          u16 releases;
>>>          void *p, *end;
>>>          int ret;
>>> +       bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
>>>
>>>          ret = set_request_path_attr(req->r_inode, req->r_dentry,
>>>                                req->r_parent, req->r_path1, req->r_ino1.ino,
>>> @@ -2514,14 +2546,23 @@ static struct ceph_msg *create_request_message(struct ceph_mds_client *mdsc,
>>>                  goto out_free1;
>>>          }
>>>
>>> -       len = sizeof(*head) +
>>> -               pathlen1 + pathlen2 + 2*(1 + sizeof(u32) + sizeof(u64)) +
>>> +       if (legacy) {
>>> +               /* Old style */
>>> +               len = sizeof(*head);
>>> +       } else {
>>> +               /* New style: add gid_list and any later fields */
>>> +               len = sizeof(struct ceph_mds_request_head) + sizeof(u32) +
>>> +                     (sizeof(u64) * req->r_cred->group_info->ngroups);
>>> +       }
>>> +
>>> +       len += pathlen1 + pathlen2 + 2*(1 + sizeof(u32) + sizeof(u64)) +
>>>                  sizeof(struct ceph_timespec);
>>>
>>>          /* calculate (max) length for cap releases */
>>>          len += sizeof(struct ceph_mds_request_release) *
>>>                  (!!req->r_inode_drop + !!req->r_dentry_drop +
>>>                   !!req->r_old_inode_drop + !!req->r_old_dentry_drop);
>>> +
>>>          if (req->r_dentry_drop)
>>>                  len += pathlen1;
>>>          if (req->r_old_dentry_drop)
>>> @@ -2533,11 +2574,25 @@ static struct ceph_msg *create_request_message(struct ceph_mds_client *mdsc,
>>>                  goto out_free2;
>>>          }
>>>
>>> -       msg->hdr.version = cpu_to_le16(3);
>>>          msg->hdr.tid = cpu_to_le64(req->r_tid);
>>>
>>> -       head = msg->front.iov_base;
>>> -       p = msg->front.iov_base + sizeof(*head);
>>> +       /*
>>> +        * The old ceph_mds_request_header didn't contain a version field, and
>>> +        * one was added when we moved the message version from 3->4.
>>> +        */
>>> +       if (legacy) {
>>> +               msg->hdr.version = cpu_to_le16(3);
>>> +               p = msg->front.iov_base + sizeof(*head);
>>> +       } else {
>>> +               struct ceph_mds_request_head *new_head = msg->front.iov_base;
>>> +
>>> +               msg->hdr.version = cpu_to_le16(4);
>>> +               new_head->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
>>> +               p = msg->front.iov_base + sizeof(*new_head);
>>> +       }
>>> +
>>> +       head = find_old_request_head(msg->front.iov_base, legacy);
>>> +
>>>          end = msg->front.iov_base + msg->front.iov_len;
>>>
>>>          head->mdsmap_epoch = cpu_to_le32(mdsc->mdsmap->m_epoch);
>>> @@ -2583,12 +2638,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_client *mdsc,
>>>
>>>          head->num_releases = cpu_to_le16(releases);
>>>
>>> -       /* time stamp */
>>> -       {
>>> -               struct ceph_timespec ts;
>>> -               ceph_encode_timespec64(&ts, &req->r_stamp);
>>> -               ceph_encode_copy(&p, &ts, sizeof(ts));
>>> -       }
>>> +       encode_payload_tail(&p, req, legacy);
>>>
>>>          if (WARN_ON_ONCE(p > end)) {
>>>                  ceph_msg_put(msg);
>>> @@ -2642,9 +2692,10 @@ static int __prepare_send_request(struct ceph_mds_session *session,
>>>   {
>>>          int mds = session->s_mds;
>>>          struct ceph_mds_client *mdsc = session->s_mdsc;
>>> -       struct ceph_mds_request_head *rhead;
>>> +       struct ceph_mds_request_head_old *rhead;
>>>          struct ceph_msg *msg;
>>>          int flags = 0;
>>> +       bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
>>>
>>>          req->r_attempts++;
>>>          if (req->r_inode) {
>>> @@ -2661,6 +2712,7 @@ static int __prepare_send_request(struct ceph_mds_session *session,
>>>
>>>          if (test_bit(CEPH_MDS_R_GOT_UNSAFE, &req->r_req_flags)) {
>>>                  void *p;
>>> +
>>>                  /*
>>>                   * Replay.  Do not regenerate message (and rebuild
>>>                   * paths, etc.); just use the original message.
>>> @@ -2668,7 +2720,7 @@ static int __prepare_send_request(struct ceph_mds_session *session,
>>>                   * d_move mangles the src name.
>>>                   */
>>>                  msg = req->r_request;
>>> -               rhead = msg->front.iov_base;
>>> +               rhead = find_old_request_head(msg->front.iov_base, legacy);
>>>
>>>                  flags = le32_to_cpu(rhead->flags);
>>>                  flags |= CEPH_MDS_FLAG_REPLAY;
>>> @@ -2682,13 +2734,14 @@ static int __prepare_send_request(struct ceph_mds_session *session,
>>>                  /* remove cap/dentry releases from message */
>>>                  rhead->num_releases = 0;
>>>
>>> -               /* time stamp */
>>> +               /* verify that we haven't got mixed-feature MDSs */
>>> +               if (legacy)
>>> +                       WARN_ON_ONCE(le16_to_cpu(msg->hdr.version) >= 4);
>>> +               else
>>> +                       WARN_ON_ONCE(le16_to_cpu(msg->hdr.version) < 4);
>> As mentioned in the ticket, I already did a minimal encode_gid_list()
>> helper -- just haven't pushed because I had to leave.  Looking at this,
>> I'm not clear on why we even bother with v3 vs v4 and the legacy branch.
>> The only thing that is conditional on CEPH_FEATURE_FS_BTIME is the head;
>> we can encode gids and set the version to 4 unconditionally.
>>
> We can't because legacy servers won't understand the newer struct
> ceph_mds_request_head. I'm pretty sure anything pre-luminous will barf
> on it, which is why we needed the feature bit test.
>
Yeah, agreed. Older ceph versions may do something like drop the
message on the floor or crash the MDS daemons, as I hit before with the
metric features.
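
To make the compatibility point concrete, here is a minimal userspace
sketch of the feature-bit gating and the head lookup that the patch
does. All of the sketch_* names, the trimmed-down structs, and the
placeholder feature bit are assumptions for illustration only; the real
definitions live in include/linux/ceph/ceph_fs.h and carry more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-in for the legacy (v3) request head. */
struct sketch_head_old {
	uint64_t oldest_client_tid;
	uint32_t mdsmap_epoch;
	uint32_t flags;
} __attribute__((packed));

/* Hypothetical stand-in for the new (v4) head: a version field comes first. */
struct sketch_head {
	uint16_t version;
	uint64_t oldest_client_tid;
	uint32_t mdsmap_epoch;
	uint32_t flags;
} __attribute__((packed));

/* Placeholder bit, NOT the real CEPH_FEATURE_FS_BTIME value. */
#define SKETCH_FEATURE_FS_BTIME (1ULL << 0)

/* A peer that does not advertise the feature gets the old format. */
static bool sketch_is_legacy(uint64_t peer_features)
{
	return !(peer_features & SKETCH_FEATURE_FS_BTIME);
}

/* Message header version: 3 for the old head, 4 for the new one. */
static uint16_t sketch_msg_version(bool legacy)
{
	return legacy ? 3 : 4;
}

/*
 * Return a pointer to the part of the buffer laid out like the old head.
 * For the new format that part starts right after the version field.
 */
static struct sketch_head_old *sketch_find_old_head(void *p, bool legacy)
{
	if (legacy)
		return p;
	return (struct sketch_head_old *)
		((char *)p + offsetof(struct sketch_head, oldest_client_tid));
}
```

With these stand-ins, a legacy peer sees the shared fields at offset 0,
while a new-format peer sees them two bytes in, after the version field.
An old MDS that parses a v4 front as if it were v3 would therefore read
every field shifted by two bytes, which is why sending the new head
unconditionally is not safe.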

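The front length calculation in the quoted hunk can be sketched the same
way. This is a simplified userspace model, assuming the head and
timespec sizes are passed in by the caller. sketch_sizes and
sketch_front_len are hypothetical names, and the cap release space that
the real code adds afterwards is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the structure sizes the caller supplies. */
struct sketch_sizes {
	size_t old_head;   /* size of the legacy head */
	size_t new_head;   /* size of the new head */
	size_t timespec;   /* size of the encoded timestamp */
};

/*
 * Front length before cap releases are added, mirroring the hunk:
 * the head (plus a u32 count and one u64 per gid for the new format),
 * then two paths, each with 1 + u32 + u64 bytes of fixed overhead,
 * then the timestamp.
 */
static size_t sketch_front_len(const struct sketch_sizes *sz, bool legacy,
			       size_t pathlen1, size_t pathlen2,
			       uint32_t ngroups)
{
	size_t len;

	if (legacy)
		len = sz->old_head;
	else
		len = sz->new_head + sizeof(uint32_t) +
		      sizeof(uint64_t) * (size_t)ngroups;

	len += pathlen1 + pathlen2 +
	       2 * (1 + sizeof(uint32_t) + sizeof(uint64_t));
	len += sz->timespec;
	return len;
}
```

Note that ngroups only contributes in the non-legacy branch, which
matches encode_payload_tail() skipping the gid list for legacy peers.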
