ceph-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Xiubo Li <xiubli@redhat.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: idryomov@gmail.com, ukernel@gmail.com, pdonnell@redhat.com,
	ceph-devel@vger.kernel.org
Subject: Re: [PATCH RFC 0/2] ceph: size handling for the fscrypt
Date: Thu, 9 Sep 2021 16:12:47 +0800	[thread overview]
Message-ID: <701c4074-404d-42a4-d2e3-1999b7ffbea6@redhat.com> (raw)
In-Reply-To: <87d643da3e1c3bdd7fc24c697b01f42519c12dd5.camel@kernel.org>


On 9/8/21 10:12 PM, Jeff Layton wrote:
> On Wed, 2021-09-08 at 19:16 +0800, Xiubo Li wrote:
>> On 9/8/21 4:58 AM, Jeff Layton wrote:
>>> On Tue, 2021-09-07 at 21:19 +0800, Xiubo Li wrote:
>>>> On 9/7/21 8:35 PM, Jeff Layton wrote:
>>>>> On Fri, 2021-09-03 at 16:15 +0800, xiubli@redhat.com wrote:
>>>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>>>
>>>>>> This patch series is based Jeff's ceph-fscrypt-size-experimental
>>>>>> branch in https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git.
>>>>>>
>>>>>> This is just a draft patch and need to rebase or recode after Jeff
>>>>>> finished his huge patch set.
>>>>>>
>>>>>> Post the patch out for advices and ideas. Thanks.
>>>>>>
>>>>> I'll take a look. Going forward though, it'd probably be best for you to
>>>>> just take over development of the entire ceph-fscrypt-size series
>>>>> instead of trying to develop on top of my branch.
>>>>>
>>>>> That branch is _very_ rough anyway. Just clone the branch into your tree
>>>>> and then you can drop or change patches in it as you see fit.
>>>> Sure.
>>>>
>>>>
>>>>>> ====
>>>>>>
>>>>>> This approach will not do the rmw immediately after the file is
>>>>>> truncated. If the truncate size is aligned to the BLOCK SIZE, so
>>>>>> there no need to do the rmw and only in unaligned case will the
>>>>>> rmw is needed.
>>>>>>
>>>>>> And the 'fscrypt_file' field will be cleared after the rmw is done.
>>>>>> If the 'fscrypt_file' is none zero that means after the kclient
>>>>>> reading that block to local buffer or pagecache it needs to do the
>>>>>> zeroing of that block in range of [fscrypt_file, round_up(fscrypt_file,
>>>>>> BLOCK SIZE)).
>>>>>>
>>>>>> Once any kclient has dirty that block and write it back to ceph, the
>>>>>> 'fscrypt_file' field will be cleared and set to 0. More detail please
>>>>>> see the commit comments in the second patch.
>>>>>>
>>>>> That sounds odd. How do you know where the file ends once you zero out
>>>>> fscrypt_file?
>>>>>
>>>>> /me goes to look at the patches...
>>>> The code in the ceph_fill_inode() is not handling well for multiple
>>>> ftruncates case, need to be fixed.
>>>>
>>> Ok. It'd probably be best to do that fix first in a separate patch and
>>> do the fscrypt work on top.
>>>
>>> FWIW, I'd really like to see the existing truncate code simplified (or
>>> at least, better documented). I'm very leery of adding yet more fields
>>> to the inode to handle truncate/size. So far, we have all of this:
>>>
>>>           struct mutex i_truncate_mutex;
>>>           u32 i_truncate_seq;        /* last truncate to smaller size */
>>>           u64 i_truncate_size;       /*  and the size we last truncated down to */
>>>           int i_truncate_pending;    /*  still need to call vmtruncate */
>>>
>>>           u64 i_max_size;            /* max file size authorized by mds */
>>>           u64 i_reported_size; /* (max_)size reported to or requested of mds */
>>>           u64 i_wanted_max_size;     /* offset we'd like to write too */
>>>           u64 i_requested_max_size;  /* max_size we've requested */
>>>
>>> Your patchset adds yet another new field with its own logic. I think we
>>> need to aim to simplify this code rather than just piling more logic on
>>> top.
>> Yeah, makes sense.
>>
>>
>>>> Maybe we need to change the 'fscrypt_file' field's logic and make it
>>>> opaqueness for MDS, then the MDS will use it to do the truncate instead
>>>> as I mentioned in the previous reply in your patch set.
>>>>
>>>> Then we can do the defer rmw in any kclient when necessary as this patch
>>>> does.
>>>>
>>> I think you can't defer the rmw unless you have Fb caps. In that case,
>>> you'd probably want to just truncate it in the pagecache, dirty the last
>>> page in the inode, and issue the truncate to the MDS.
>>> In the case where you don't have Fb caps, then I think you don't want to
>>> defer anything, as you can't guarantee another client won't get in there
>>> to read the object. On a truncate, you'll want to issue the truncate to
>>> the MDS and do the RMW on the last page. I'm not sure what order you'd
>>> want to do that in though. Maybe you can issue them simultaneously?
>> I am not sure I correctly understand this. If my understanding is correct:
>>
>> If one kclient will ftruncate a file, the fscrypt_file will be recorded
>> in the metadata. So after that this kclient could just release the Fwb
>> caps if it has. And later for any kclient it should first get the
>> fscrypt_file, so when:
>>
>> A), reading, it should be granted the Fr caps, then we it always zero
>> that specified block, which the contents needs to be truncated, just
>> after the readed data dencrypted by using the fscrypt_file.
>>
> Yes. Also, the end of the block beyond the offset in fscrypt_file should
> already contain all zeroes after it has been decrypted.
>
>> B), writing, if kclient wants to write data back to a file, it should
>> always do the read-modify-write, right ? It will read the data to the
>> local page buffers first by zeroing that specified block. Since it can
>> buffer the data it should already have been granted the Fb caps. If that
>> specified block will be updated, then it should update that whole block
>> contents, and that whole block has already been truncated and modified.
>> Then we can reset the fscrypt_file value in MDS, and at the same time we
>> need to hold the Fx too. That means if encryption is enabled, when
>> writing it should always get the Fx caps.
>>
> Yes.
>
> When we are operating with Fb caps, the client is already doing page-
> aligned I/Os. Note that we already do a rmw in the case where we have Fb
> caps. See ceph_write_begin().
>
> The first thing it does before writing the new data to the pagecache is
> update the parts of the page that are not being overwritten (if
> necessary).  So it does a read to fill the unwritten parts of the page,
> writes the data to the page, and eventually it gets flushed back out to
> the OSD during writeback.
>
>> If one kclient have held the Fb caps, will MDS allow any other kclient
>> to hold the Fr caps ?
>>
> It should not, but I don't have the firmest grasp of the MDS locks.c
> logic.
>
> AIUI, if the MDS hands out Fb caps, then it needs to ensure that no
> other clients are doing reads at the same time and it does that by
> withholding Fr from other clients. Only once Fb caps have been returned
> to the MDS should it hand out Fr to another client.

Yeah, this is also what I understand the cephfs caps.

Thanks.

>
>> For the Fb cap did I miss something ?
>>
>>
>>
>>>>>> There also need on small work in Jeff's MDS PR in cap flushing code
>>>>>> to clear the 'fscrypt_file'.
>>>>>>
>>>>>>
>>>>>> Xiubo Li (2):
>>>>>>      Revert "ceph: make client zero partial trailing block on truncate"
>>>>>>      ceph: truncate the file contents when needed when file scrypted
>>>>>>
>>>>>>     fs/ceph/addr.c  | 19 ++++++++++++++-
>>>>>>     fs/ceph/caps.c  | 24 ++++++++++++++++++
>>>>>>     fs/ceph/file.c  | 65 ++++++++++++++++++++++++++++++++++++++++++++++---
>>>>>>     fs/ceph/inode.c | 48 +++++++++++++++++++-----------------
>>>>>>     fs/ceph/super.h | 13 +++++++---
>>>>>>     5 files changed, 138 insertions(+), 31 deletions(-)
>>>>>>


  reply	other threads:[~2021-09-09  8:12 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-03  8:15 [PATCH RFC 0/2] ceph: size handling for the fscrypt xiubli
2021-09-03  8:15 ` [PATCH RFC 1/2] Revert "ceph: make client zero partial trailing block on truncate" xiubli
2021-09-03  8:15 ` [PATCH RFC 2/2] ceph: truncate the file contents when needed when file scrypted xiubli
2021-09-07 16:26   ` Jeff Layton
2021-09-08  9:37     ` Xiubo Li
2021-09-08 13:57       ` Jeff Layton
2021-09-09  3:38         ` Xiubo Li
2021-09-09 12:48           ` Jeff Layton
2021-09-10  2:30             ` Xiubo Li
2021-09-10 11:46               ` Jeff Layton
2021-09-13  5:42                 ` Xiubo Li
2021-09-13 14:05                   ` Jeff Layton
2021-09-14  5:43                     ` Xiubo Li
2021-09-13 19:34                   ` Jeff Layton
2021-09-14  5:40                     ` Xiubo Li
2021-09-14 14:24                       ` Jeff Layton
2021-09-16 10:02                         ` Xiubo Li
2021-09-17 17:19                           ` Jeff Layton
2021-09-20 14:32                             ` Xiubo Li
2021-09-20 19:24                               ` Jeff Layton
2021-09-22  2:23                                 ` Xiubo Li
2021-09-24 18:52                                   ` Jeff Layton
2021-09-25  1:02                                     ` Xiubo Li
2021-09-24 15:01                     ` Xiubo Li
2021-09-25  9:56                     ` Xiubo Li
2021-10-11 13:29                       ` Jeff Layton
2021-10-11 15:16                         ` Xiubo Li
2021-09-07 12:35 ` [PATCH RFC 0/2] ceph: size handling for the fscrypt Jeff Layton
2021-09-07 13:19   ` Xiubo Li
2021-09-07 20:58     ` Jeff Layton
2021-09-08 11:16       ` Xiubo Li
2021-09-08 14:12         ` Jeff Layton
2021-09-09  8:12           ` Xiubo Li [this message]
2021-09-08 11:17       ` Xiubo Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=701c4074-404d-42a4-d2e3-1999b7ffbea6@redhat.com \
    --to=xiubli@redhat.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=idryomov@gmail.com \
    --cc=jlayton@kernel.org \
    --cc=pdonnell@redhat.com \
    --cc=ukernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).