From: Xiubo Li <xiubli@redhat.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: idryomov@gmail.com, ukernel@gmail.com, pdonnell@redhat.com,
ceph-devel@vger.kernel.org, Douglas Fuller <dfuller@redhat.com>
Subject: Re: [PATCH RFC 2/2] ceph: truncate the file contents when needed when file scrypted
Date: Mon, 11 Oct 2021 23:16:39 +0800 [thread overview]
Message-ID: <05c0cca9-5447-7199-afcb-b02a15cb6ec7@redhat.com> (raw)
In-Reply-To: <126f465b60fd02e14dcc1d6901a80065424c1f17.camel@kernel.org>
On 10/11/21 9:29 PM, Jeff Layton wrote:
> On Sat, 2021-09-25 at 17:56 +0800, Xiubo Li wrote:
>> On 9/14/21 3:34 AM, Jeff Layton wrote:
>>> On Mon, 2021-09-13 at 13:42 +0800, Xiubo Li wrote:
>>>> On 9/10/21 7:46 PM, Jeff Layton wrote:
>> [...]
>>>>> Are you certain that Fw caps is enough to ensure that no other client
>>>>> holds Fr caps?
>>>> I spent hours over the weekend going through the mds Locker-related
>>>> code.
>>>>
>>>> From the mds/locks.c state tables, in the LOCK_MIX state (and some of
>>>> the interim states on the way to LOCK_MIX) the mds filelock allows
>>>> different clients to hold Fw or Fr caps at the same time, while Fb/Fc
>>>> are disabled. From the mds/Locker.cc code, the filelock can switch to
>>>> the LOCK_MIX state in some cases when one client wants Fw and another
>>>> client wants either Fr or Fw.
>>>>
>>>> In that case I think Linux advisory or mandatory locks are necessary
>>>> to keep the file contents consistent. In multi-process concurrent
>>>> read/write or write/write cases without advisory/mandatory locks, the
>>>> consistency of the file contents isn't guaranteed anyway, so isn't
>>>> the logic the same here?
>>>>
>>>> If so, couldn't we just assume that Fw vs Fw and Fr vs Fw caps are
>>>> exclusive in any correct use case? For example, just after the mds
>>>> filelock switches to LOCK_MIX, if clientA gets the advisory file lock
>>>> and the Fw caps, then even if another clientB is successfully issued
>>>> the Fr caps, clientB won't do any reads because it will still be
>>>> blocked waiting for the advisory file lock.
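To make that concrete, here is a minimal sketch of the LOCK_MIX cap behaviour I was describing. The bit names and the helper are made up for illustration; the real definitions live in the ceph sources:

```cpp
#include <cstdint>

// Simplified stand-ins for the file-cap component bits (illustrative
// only, not the real CEPH_CAP_G* values).
enum : uint32_t {
  CAP_FR = 1 << 0,  // Fr: sync reads allowed
  CAP_FC = 1 << 1,  // Fc: cache reads in the page cache
  CAP_FW = 1 << 2,  // Fw: sync writes allowed
  CAP_FB = 1 << 3,  // Fb: buffered writes
};

// In LOCK_MIX the MDS may hand Fr to one client and Fw to another at
// the same time, but never Fc or Fb, so every read/write goes
// synchronously to the OSDs and no client caches stale data.
constexpr uint32_t MIX_ISSUABLE = CAP_FR | CAP_FW;

inline bool issuable_in_mix(uint32_t wanted) {
  return (wanted & ~MIX_ISSUABLE) == 0;
}
```

This is exactly the point of the question above: holding Fw does not exclude another client holding Fr, so Fw alone cannot guarantee that there is no concurrent reader.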
>>>>
>>> I'm not sure I like that idea. Basically, that would change the meaning
>>> of what the Frw caps represent, in a way that is not really consistent
>>> with how they have been used before.
>>>
>>> We could gate that new behavior on the new feature flags, but it sounds
>>> pretty tough.
>>>
>>> I think we have a couple of options:
>>>
>>> 1) we could just make the clients request and wait on Fx caps when they
>>> do a truncate. They might stall for a bit if there is contention, but it
>>> would ensure consistency and the client could be completely in charge of
>>> the truncate. [a]
>>>
>>> 2) we could rev the protocol, and have the client send along the last
>>> block to be written along with the SETATTR request.
>> I am also considering sending the last block along with the SETATTR
>> request, but the MDS must journal that last block too, so I am afraid
>> it could consume a lot of space in the cephfs meta pool in corner
>> cases, such as when a client sends massive numbers of truncate
>> requests in a short time, as in this bug:
>> https://tracker.ceph.com/issues/52280.
>>
>>
> Good point.
>
> Yes, we'd need to buffer the last block on a truncate like this, but we
> could limit the amount of truncates with "last block" operations that
> run concurrently. We'd probably also want to cap the size of the "last
> block" too.
Okay, so far this seems like the best approach.
I will try it tomorrow.
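Something along these lines, I guess; the struct and both limits here are hypothetical, just to show how the journalled payload would stay bounded:

```cpp
#include <cstddef>

// Hypothetical tunables: cap both the number of in-flight
// truncate+last-block requests and the size of the buffered block,
// so the journalled data in the metadata pool stays bounded.
constexpr std::size_t MAX_LAST_BLOCK = 4096;  // e.g. one fscrypt block
constexpr int MAX_INFLIGHT_TRUNCATES = 16;

struct TruncateThrottle {
  int in_flight = 0;

  // A truncate carrying last_block_len bytes may proceed only if the
  // block fits under the cap and a slot is free; otherwise the caller
  // has to wait and retry.
  bool try_start(std::size_t last_block_len) {
    if (last_block_len > MAX_LAST_BLOCK)
      return false;
    if (in_flight >= MAX_INFLIGHT_TRUNCATES)
      return false;
    ++in_flight;
    return true;
  }

  void finish() { --in_flight; }
};
```

With limits like these, the truncate payload sitting in the journal can never exceed MAX_INFLIGHT_TRUNCATES * MAX_LAST_BLOCK bytes at a time, even under a burst like the one in the tracker bug.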
>
>>> Maybe we even
>>> consider just adding a new TRUNCATE call independent of SETATTR. The MDS
>>> would remain in complete control of it at that point.
>> Maybe we can just do this:
>>
>> When the MDS receives a SETATTR request with a size change from
>> clientA, it will try to take xlock(filelock), during which the MDS
>> will only allow Fcb caps for all clients, so another client could
>> still be buffering the last block.
>>
>> I think we can just nudge the journal log for this request in the MDS
>> and skip the early reply, so clientA's truncate request has to wait.
>> Once the journal log has been flushed successfully, and before
>> releasing xlock(filelock), we can tell clientA to do the RMW for the
>> last block. Currently no client can get the Frw caps while the xlock
>> is held, so we would need to add one interim xlock state that allows
>> only the xlocker, clientA, to have Frw, such as:
>>
>> [LOCK_XLOCKDONE_TRUNC] = { LOCK_LOCK, false, LOCK_LOCK, 0, XCL, 0,
>>                            0, 0, 0, 0, 0, 0, CEPH_CAP_GRD|CEPH_CAP_GWR, 0 },
>>
>> Then it will be safe for clientA to do the RMW, after which the MDS
>> will finish the truncate request with a safe reply only.
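To spell out the flow I had in mind above, roughly (the names are made up; this is a sketch of the idea, not real Locker code):

```cpp
// Sketch of the proposed split truncate: the MDS journals the SETATTR
// under xlock, then hands Frw to the xlocking client alone for the
// last-block RMW, and only replies (safe reply) once that is done.
enum class TruncPhase {
  Idle,       // no truncate in progress
  Xlocked,    // xlock(filelock) taken; only Fcb issuable to clients
  Journaled,  // journal flushed; enter LOCK_XLOCKDONE_TRUNC, where
              // Frw is issuable to the xlocker (clientA) only
  ClientRMW,  // clientA rewrites the last encrypted block
  Done,       // xlock dropped, safe reply sent to clientA
};

inline TruncPhase next(TruncPhase p) {
  switch (p) {
    case TruncPhase::Idle:      return TruncPhase::Xlocked;
    case TruncPhase::Xlocked:   return TruncPhase::Journaled;
    case TruncPhase::Journaled: return TruncPhase::ClientRMW;
    case TruncPhase::ClientRMW: return TruncPhase::Done;
    case TruncPhase::Done:      return TruncPhase::Done;
  }
  return TruncPhase::Idle;  // unreachable
}
```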
>>
>>
> This sounds pretty fragile. I worry about splitting responsibility for
> truncates across two different entities (MDS and client). That means a
> lot more complex failure cases.
Yeah, handling the failure cases would make this much more complex.
> What will you do if you do this, and then the client dies before it can
> finish the RMW? How will you know when the client's RMW cycle is
> complete? I assume it'll have to send a "truncate complete" message to
> the MDS in that case to know when it can release the xlock?
>
Okay, I didn't foresee this case; that would make things very complex...
>>> The other ideas I've considered seem more complex and don't offer any
>>> significant advantages that I can see.
>>>
>>> [a]: Side question: why does buffering a truncate require Fx and not Fb?
>>> How do Fx and Fb interact?
>>>
[...]
Thread overview: 34+ messages
2021-09-03 8:15 [PATCH RFC 0/2] ceph: size handling for the fscrypt xiubli
2021-09-03 8:15 ` [PATCH RFC 1/2] Revert "ceph: make client zero partial trailing block on truncate" xiubli
2021-09-03 8:15 ` [PATCH RFC 2/2] ceph: truncate the file contents when needed when file scrypted xiubli
2021-09-07 16:26 ` Jeff Layton
2021-09-08 9:37 ` Xiubo Li
2021-09-08 13:57 ` Jeff Layton
2021-09-09 3:38 ` Xiubo Li
2021-09-09 12:48 ` Jeff Layton
2021-09-10 2:30 ` Xiubo Li
2021-09-10 11:46 ` Jeff Layton
2021-09-13 5:42 ` Xiubo Li
2021-09-13 14:05 ` Jeff Layton
2021-09-14 5:43 ` Xiubo Li
2021-09-13 19:34 ` Jeff Layton
2021-09-14 5:40 ` Xiubo Li
2021-09-14 14:24 ` Jeff Layton
2021-09-16 10:02 ` Xiubo Li
2021-09-17 17:19 ` Jeff Layton
2021-09-20 14:32 ` Xiubo Li
2021-09-20 19:24 ` Jeff Layton
2021-09-22 2:23 ` Xiubo Li
2021-09-24 18:52 ` Jeff Layton
2021-09-25 1:02 ` Xiubo Li
2021-09-24 15:01 ` Xiubo Li
2021-09-25 9:56 ` Xiubo Li
2021-10-11 13:29 ` Jeff Layton
2021-10-11 15:16 ` Xiubo Li [this message]
2021-09-07 12:35 ` [PATCH RFC 0/2] ceph: size handling for the fscrypt Jeff Layton
2021-09-07 13:19 ` Xiubo Li
2021-09-07 20:58 ` Jeff Layton
2021-09-08 11:16 ` Xiubo Li
2021-09-08 14:12 ` Jeff Layton
2021-09-09 8:12 ` Xiubo Li
2021-09-08 11:17 ` Xiubo Li