ceph-devel.vger.kernel.org archive mirror
From: Xiubo Li <xiubli@redhat.com>
To: Patrick Donnelly <pdonnell@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>,
	Ilya Dryomov <idryomov@gmail.com>,
	Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: [PATCH 4/5] ceph: flush the mdlog before waiting on unsafe reqs
Date: Sat, 3 Jul 2021 09:33:24 +0800
Message-ID: <b32a4f22-52f5-9c1d-a89f-373093c84dca@redhat.com>
In-Reply-To: <CA+2bHPbtUchykAeDcH1rh5YXzJHRMLPtOaHy7f332scX+9wmHw@mail.gmail.com>


On 7/3/21 2:14 AM, Patrick Donnelly wrote:
> On Fri, Jul 2, 2021 at 6:17 AM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 7/2/21 7:46 AM, Patrick Donnelly wrote:
>>> On Wed, Jun 30, 2021 at 11:18 PM Xiubo Li <xiubli@redhat.com> wrote:
>>>> And just now I ran it again with timestamps added:
>>>>
>>>>> fd = open("/path")
>>>>> openat(fd, "foo")
>>>>> renameat(fd, "foo", fd, "bar")
>>>>> fstat(fd)
>>>>> fsync(fd)
>>>> lxb ----- before renameat ---> Current time is Thu Jul  1 13:28:52 2021
>>>> lxb ----- after renameat ---> Current time is Thu Jul  1 13:28:52 2021
>>>> lxb ----- before fstat ---> Current time is Thu Jul  1 13:28:52 2021
>>>> lxb ----- after fstat ---> Current time is Thu Jul  1 13:28:52 2021
>>>> lxb ----- before fsync ---> Current time is Thu Jul  1 13:28:52 2021
>>>> lxb ----- after fsync ---> Current time is Thu Jul  1 13:28:56 2021
>>>>
>>>> We can see that even after 'fstat(fd)', the 'fsync(fd)' still waits around 4s.
>>>>
>>>> Your test probably worked because the MDS's tick thread and the 'fstat(fd)' sometimes ran almost simultaneously. I could also see the 'fsync(fd)' finish very quickly sometimes:
>>>>
>>>> lxb ----- before renameat ---> Current time is Thu Jul  1 13:29:51 2021
>>>> lxb ----- after renameat ---> Current time is Thu Jul  1 13:29:51 2021
>>>> lxb ----- before fstat ---> Current time is Thu Jul  1 13:29:51 2021
>>>> lxb ----- after fstat ---> Current time is Thu Jul  1 13:29:51 2021
>>>> lxb ----- before fsync ---> Current time is Thu Jul  1 13:29:51 2021
>>>> lxb ----- after fsync ---> Current time is Thu Jul  1 13:29:51 2021
>>> Actually, I did a lot more testing on this. It's a behavior unique to
>>> the case where the directory is /. You will see a getattr force a
>>> flush of the journal:
>>>
>>> 2021-07-01T23:42:18.095+0000 7fcc7741c700  7 mds.0.server
>>> dispatch_client_request client_request(client.4257:74 getattr
>>> pAsLsXsFs #0x1 2021-07-01T23:42:18.095884+0000 caller_uid=1147,
>>> caller_gid=1147{1000,1147,}) v5
>>> ...
>>> 2021-07-01T23:42:18.096+0000 7fcc7741c700 10 mds.0.locker nudge_log
>>> (ifile mix->sync w=2) on [inode 0x1 [...2,head] / auth v34 pv39 ap=6
>>> snaprealm=0x564734479600 DIRTYPARENT f(v0
>>> m2021-07-01T23:38:00.418466+0000 3=1+2) n(v6
>>> rc2021-07-01T23:38:15.692076+0000 b65536 7=2+5)/n(v0
>>> rc2021-07-01T19:31:40.924877+0000 1=0+1) (iauth sync r=1) (isnap sync
>>> r=4) (inest mix w=3) (ipolicy sync r=2) (ifile mix->sync w=2)
>>> (iversion lock w=3) caps={4257=pAsLsXs/-@32} | dirtyscattered=0
>>> request=1 lock=6 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1
>>> authpin=1 0x56473913a580]
>>>
>>> You don't see that getattr for directories other than root. That's
>>> probably because the client has been issued more caps than what the
>>> MDS is willing to normally hand out for root.
>> For the root dir, when doing the 'rename', wrlock_start('ifile lock')
>> changes the lock state 'SYNC' --> 'MIX'. The MDS then issues
>> 'pAsLsXs' on inode 0x1 to clients. So when the client sends a 'getattr'
>> request wanting the 'AsXsFs' caps, the MDS tries to switch the 'ifile
>> lock' state back to 'SYNC' to grant the 'Fs' cap. Since
>> rdlock_start('ifile lock') needs a lock state transition, it waits and
>> triggers 'nudge_log'.
>>
>> The reason wrlock_start('ifile lock') changes the lock state
>> 'SYNC' --> 'MIX' above is that inode '0x1' has a subtree. If my
>> understanding is correct, the root dir is very likely to be shared by
>> multiple MDSes, so it chooses to switch to MIX.
>>
>> This is why sending a 'getattr' request works for the root dir.
>>
>>
>> For non-root directories, the client is bumped to loner and the
>> 'ifile/iauth/ixattr' locks switch to the EXCL state instead. In this
>> lock state the MDS issues the 'pAsxLsXsxFsx' caps. So when the client
>> does 'getattr(AsXsFs)', nothing is sent, since the needed caps have
>> already been issued. This is why we couldn't see a getattr request
>> being sent out.
>>
>> Even if we 'force' the getattr call, it can get the rdlock immediately
>> with no need to gather or do a lock state transition, so 'nudge_log' is
>> not called. When a non-root directory is in loner mode, the locks are
>> in the 'EXCL' state, which allows 'pAsxLsXsxFsxrwb' by default. So even
>> if we 'force' a getattr('pAsxLsXsxFsxrwb') call in fsync, the MDS side
>> still won't do the lock state transition.
>>
>>
>>> I'm not really sure why there is a difference. I even experimented
>>> with redundant getattr ("forced") calls to cause a journal flush on
>>> non-root directories but didn't get anywhere. Maybe you can
>>> investigate further? It'd be optimal if we could nudge the log just by
>>> doing a getattr.
>> So in the above case, from my tests and reading the Locker code, I
>> haven't yet figured out how the getattr could work for this issue.
>>
>> Patrick,
>>
>> Did I miss something about the Locker?
> No, your analysis looks right. Thanks.
>
> I suppose this flush_mdlog message is the best tool we have to fix this.
>
Cool.

I will post the second version of this patch series, which sends the
mdlog flush requests only to the relevant and auth MDSes. Later I will
also fix the fuse client, which currently sends the mdlog flush to all
the MDSes.
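
For reference, the lock analysis quoted above can be summarised as a toy
model. This is only a sketch of the reasoning in this thread, not the MDS
Locker code: parse_caps() and getattr_outcome() are hypothetical helper
names, and the rule "a getattr that reaches the MDS nudges the log only
when the ifile lock is in MIX" is a simplifying assumption covering just
the two cases discussed here (root dir in MIX, non-root dir in EXCL with
a loner client).

```python
def parse_caps(text):
    """Split a cap string such as 'pAsxLsXsxFsx' into a set of single caps.

    An uppercase letter starts a group (A, L, X, F); each following
    lowercase letter is one cap in that group. A lowercase letter seen
    before any group (the leading 'p') is kept on its own.
    """
    caps, group = set(), None
    for ch in text:
        if ch.isupper():
            group = ch
        elif group is None:
            caps.add(ch)
        else:
            caps.add(group + ch)
    return caps


def getattr_outcome(ifile_state, issued, wanted, forced=False):
    """Toy model of what a client getattr does in the cases from the thread.

    ifile_state: 'MIX' (root dir after the rename) or 'EXCL' (loner client).
    issued:      caps the client currently holds, e.g. 'pAsLsXs'.
    wanted:      caps the getattr asks for, e.g. 'AsXsFs'.
    forced:      send the request even if the wanted caps are already held.
    """
    missing = parse_caps(wanted) - parse_caps(issued)
    request_sent = forced or bool(missing)
    # Assumption taken from the analysis above: only the MIX -> SYNC
    # transition makes the rdlock wait and trigger nudge_log.
    nudge_log = request_sent and ifile_state == "MIX"
    return {"request_sent": request_sent, "nudge_log": nudge_log}
```

With the values from the thread, getattr_outcome("MIX", "pAsLsXs",
"AsXsFs") models the root dir case, and getattr_outcome("EXCL",
"pAsxLsXsxFsx", "AsXsFs", forced=True) models the 'forced' getattr on a
non-root directory.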

Thanks Patrick.

BRs
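
P.S. For anyone who wants to repeat the timing test quoted at the top, a
minimal sketch of the open/openat/renameat/fstat/fsync sequence with
timestamps follows. It is written in Python for brevity and is not the
program used for the numbers above; timed() and rename_then_fsync() are
hypothetical helper names, and the file names "foo" and "bar" are taken
from the quoted pseudocode.

```python
import os
import time


def timed(label, fn, *args, **kwargs):
    """Call fn, printing a wall-clock timestamp before and after it.

    Returns (result, elapsed_seconds).
    """
    print(f"----- before {label} ---> Current time is {time.ctime()}")
    start = time.monotonic()
    result = fn(*args, **kwargs)
    elapsed = time.monotonic() - start
    print(f"----- after {label} ---> Current time is {time.ctime()}")
    return result, elapsed


def rename_then_fsync(dirpath, old="foo", new="bar"):
    """Create `old` in dirpath, rename it to `new`, then fstat and fsync
    the directory fd. Returns the elapsed seconds of each timed step."""
    dirfd = os.open(dirpath, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Equivalent of openat(dirfd, old, O_CREAT | O_WRONLY, 0644).
        fd = os.open(old, os.O_CREAT | os.O_WRONLY, 0o644, dir_fd=dirfd)
        os.close(fd)

        timings = {}
        # Equivalent of renameat(dirfd, old, dirfd, new).
        _, timings["renameat"] = timed(
            "renameat", os.rename, old, new,
            src_dir_fd=dirfd, dst_dir_fd=dirfd)
        _, timings["fstat"] = timed("fstat", os.fstat, dirfd)
        _, timings["fsync"] = timed("fsync", os.fsync, dirfd)
        return timings
    finally:
        os.close(dirfd)
```

Call rename_then_fsync() with a directory on the mount under test and
compare the "fsync" entry of the returned dict between the root and a
non-root directory.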


