From: Jeff Layton <jlayton@kernel.org>
To: xiubli@redhat.com
Cc: idryomov@gmail.com, vshankar@redhat.com, khiremat@redhat.com,
	pdonnell@redhat.com, ceph-devel@vger.kernel.org
Subject: Re: [PATCH v2 0/4] ceph: size handling for the fscrypt
Date: Wed, 20 Oct 2021 11:32:48 -0400
Message-ID: <d88365035eb11560425e67aa34444086c80c628f.camel@kernel.org>
In-Reply-To: <20211020132813.543695-1-xiubli@redhat.com>

On Wed, 2021-10-20 at 21:28 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> This patch series is based on the fscrypt_size_handling branch in
> https://github.com/lxbsz/linux.git, which is based on Jeff's
> ceph-fscrypt-content-experimental branch in
> https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git,
> with one useless commit reverted and some upstream commits added.
> 
> I will keep this patch set as simple as possible to ease review, since
> it is still framework code. It works but is still under development,
> and I need feedback and suggestions for the two corner cases below.
> 
> ====
> 
> This approach is based on the discussion of V1: when truncating to a
> smaller size that is not aligned to BLOCK SIZE, the client sends the
> encrypted contents of the last block to the MDS along with the
> truncate request.
> 
> The MDS side patch is raised in PR
> https://github.com/ceph/ceph/pull/43588, which is also based on Jeff's
> previous great work in PR https://github.com/ceph/ceph/pull/41284.
> 
> The MDS will use the filer.write_trunc(), which could update and
> truncate the file in one shot, instead of filer.truncate().
> 
> I have removed the inline data related code since we are removing
> this feature; for more detail please see:
> https://tracker.ceph.com/issues/52916
> 
> 
> Note: There are still two CORNER cases we need to deal with:
> 
> 1) If a truncate request carrying the last block is sent to the MDS and
> another client updates that last block just before the MDS acquires
> the xlock for the FILE lock, we will overwrite the last block with
> old data.
> 
> For this case we could send the old encrypted last block data along
> with the truncate request, have the MDS read the block and compare
> it just before updating it, and, if the comparison fails, fail the
> truncate and let the kclient retry it.

Right -- this is the tricky bit. We're doing a truncate with a read-
modify-write cycle for the last block rolled in. We _must_ gate the
truncate+write vs. intervening changes to that extent.

You may be able to use the object version instead of comparing the old
block. The ceph-fscrypt-content branch has a patch that adds support for
CEPH_OSD_OP_ASSERT_VER, but I seem to recall that the OSD supports a way
to assert that an extent hasn't changed.

So, basically my thinking was something like:

1) The client reads the data from the object and fetches the object
   version.
2) The client sends the object version along with the last block.
3) The MDS's write+truncate operation asserts on that version.

The catch here is that tracking those object versions is sort of nasty.
Having it do comparisons of the extent contents might be simpler.
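
To make the two options concrete, here is a minimal user-space sketch
in C. It is not kernel, MDS, or OSD code: struct toy_object, the toy_*
helpers, TOY_BLOCK_SIZE, and the use of -EAGAIN as the "lost the race"
error are all hypothetical stand-ins chosen for illustration. It only
shows the gating idea: the write+truncate is applied only if the object
version (option 1) or the old last-block contents (option 2) still
match what the client saw.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 4096u              /* assumed crypto block size */
#define TOY_OBJ_MAX (4u * TOY_BLOCK_SIZE) /* arbitrary toy capacity */

/* Hypothetical stand-in for an object: contents plus a version counter. */
struct toy_object {
	uint64_t version;                 /* bumped on every mutation */
	uint64_t size;                    /* current object size in bytes */
	unsigned char data[TOY_OBJ_MAX];
};

/* Plain write, as any other client might issue; bumps the version. */
static int toy_write(struct toy_object *obj, uint64_t off,
		     const void *buf, uint64_t len)
{
	if (off > TOY_OBJ_MAX || len > TOY_OBJ_MAX - off)
		return -EFBIG;
	memcpy(obj->data + off, buf, len);
	if (off + len > obj->size)
		obj->size = off + len;
	obj->version++;
	return 0;
}

/* Write the new last block at block_off and truncate right after it. */
static void toy_apply(struct toy_object *obj, uint64_t block_off,
		      const void *new_block)
{
	assert(block_off % TOY_BLOCK_SIZE == 0);
	assert(block_off + TOY_BLOCK_SIZE <= TOY_OBJ_MAX);
	memcpy(obj->data + block_off, new_block, TOY_BLOCK_SIZE);
	obj->size = block_off + TOY_BLOCK_SIZE;
	obj->version++;
}

/* Option 1: apply only if the object version is the one the client read. */
static int toy_write_trunc_assert_ver(struct toy_object *obj,
				      uint64_t expected_ver,
				      uint64_t block_off,
				      const void *new_block)
{
	if (obj->version != expected_ver)
		return -EAGAIN;           /* hypothetical: caller retries */
	toy_apply(obj, block_off, new_block);
	return 0;
}

/* Option 2: apply only if the old block contents are unchanged. */
static int toy_write_trunc_cmp_block(struct toy_object *obj,
				     const void *old_block,
				     uint64_t block_off,
				     const void *new_block)
{
	if (block_off + TOY_BLOCK_SIZE > obj->size ||
	    memcmp(obj->data + block_off, old_block, TOY_BLOCK_SIZE) != 0)
		return -EAGAIN;           /* hypothetical: caller retries */
	toy_apply(obj, block_off, new_block);
	return 0;
}
```

In both variants a racing write makes the operation fail without
touching the object, so the caller can re-read the last block and try
again. The version check needs only one integer on the wire but forces
someone to track versions; the content check carries a whole extra
block but needs no extra state.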

> 2) If another client has buffered the last block, we should flush
> it first. I am still thinking about how to do this. Any ideas are welcome.
> 

I think by asserting on the contents of the last block or the object
version, this problem is also solved.

> Thanks.
> 
> 
> Xiubo Li (4):
>   ceph: add __ceph_get_caps helper support
>   ceph: add __ceph_sync_read helper support
>   ceph: return the real size readed when hit EOF
>   ceph: add truncate size handling support for fscrypt
> 
>  fs/ceph/caps.c  |  28 ++++---
>  fs/ceph/file.c  |  41 ++++++----
>  fs/ceph/inode.c | 210 ++++++++++++++++++++++++++++++++++++++++++------
>  fs/ceph/super.h |   4 +
>  4 files changed, 234 insertions(+), 49 deletions(-)
> 

-- 
Jeff Layton <jlayton@kernel.org>

