ceph-devel.vger.kernel.org archive mirror
From: Xiubo Li <xiubli@redhat.com>
To: Patrick Donnelly <pdonnell@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>,
	Ilya Dryomov <idryomov@gmail.com>,
	Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: [PATCH 4/5] ceph: flush the mdlog before waiting on unsafe reqs
Date: Fri, 2 Jul 2021 21:17:13 +0800
Message-ID: <838be760-4d61-9fc7-be8c-59deea9d0e98@redhat.com>
In-Reply-To: <CA+2bHPY=xyqW48RfuGX8C9Br7vRUArF66AK5yDTOKH4Ewdt8dg@mail.gmail.com>


On 7/2/21 7:46 AM, Patrick Donnelly wrote:
> On Wed, Jun 30, 2021 at 11:18 PM Xiubo Li <xiubli@redhat.com> wrote:
>> And just now I ran it again with timestamps added:
>>
>>> fd = open("/path")
>>> fopenat(fd, "foo")
>>> renameat(fd, "foo", fd, "bar")
>>> fstat(fd)
>>> fsync(fd)
>> lxb ----- before renameat ---> Current time is Thu Jul  1 13:28:52 2021
>> lxb ----- after renameat ---> Current time is Thu Jul  1 13:28:52 2021
>> lxb ----- before fstat ---> Current time is Thu Jul  1 13:28:52 2021
>> lxb ----- after fstat ---> Current time is Thu Jul  1 13:28:52 2021
>> lxb ----- before fsync ---> Current time is Thu Jul  1 13:28:52 2021
>> lxb ----- after fsync ---> Current time is Thu Jul  1 13:28:56 2021
>>
>> We can see that even after 'fstat(fd)', the 'fsync(fd)' still waits around 4s.
>>
>> Your test probably worked because the MDS's tick thread and the 'fstat(fd)' sometimes ran almost simultaneously. I could also see the 'fsync(fd)' finish very fast sometimes:
>>
>> lxb ----- before renameat ---> Current time is Thu Jul  1 13:29:51 2021
>> lxb ----- after renameat ---> Current time is Thu Jul  1 13:29:51 2021
>> lxb ----- before fstat ---> Current time is Thu Jul  1 13:29:51 2021
>> lxb ----- after fstat ---> Current time is Thu Jul  1 13:29:51 2021
>> lxb ----- before fsync ---> Current time is Thu Jul  1 13:29:51 2021
>> lxb ----- after fsync ---> Current time is Thu Jul  1 13:29:51 2021
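
The timed sequence quoted above can be approximated with a short
script. This is only a sketch, not the program that produced the output
above: it assumes the 'fopenat' in the quoted steps means openat(2),
the helper names 'timed' and 'run_sequence' are made up for
illustration, and the temporary directory is a placeholder that would
need to be replaced by a directory on a CephFS mount to observe the
delay.

```python
import os
import tempfile
import time


def timed(label, fn, *args, **kwargs):
    """Call fn, print a timestamp before and after, return (result, elapsed seconds)."""
    print(f"----- before {label} ---> Current time is {time.ctime()}")
    start = time.monotonic()
    result = fn(*args, **kwargs)
    elapsed = time.monotonic() - start
    print(f"----- after {label} ---> Current time is {time.ctime()} ({elapsed:.3f}s)")
    return result, elapsed


def run_sequence(dir_path):
    """Create 'foo' in dir_path, rename it to 'bar', then fstat and fsync the directory fd."""
    timings = {}
    dir_fd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Assumed equivalent of the quoted 'fopenat(fd, "foo")' step.
        file_fd = os.open("foo", os.O_CREAT | os.O_WRONLY, 0o644, dir_fd=dir_fd)
        os.close(file_fd)
        _, timings["renameat"] = timed(
            "renameat", os.rename, "foo", "bar",
            src_dir_fd=dir_fd, dst_dir_fd=dir_fd)
        _, timings["fstat"] = timed("fstat", os.fstat, dir_fd)
        _, timings["fsync"] = timed("fsync", os.fsync, dir_fd)
    finally:
        os.close(dir_fd)
    return timings


if __name__ == "__main__":
    # Placeholder directory; use a directory on a CephFS mount for a real test.
    with tempfile.TemporaryDirectory() as tmp:
        print(run_sequence(tmp))
```

On a local filesystem all three steps should return almost immediately;
the gap between "before fsync" and "after fsync" is the number to watch
on CephFS.
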
> Actually, I did a lot more testing on this. It's a behavior unique to
> the case where the directory is /. You will see a getattr force a
> flush of the journal:
>
> 2021-07-01T23:42:18.095+0000 7fcc7741c700  7 mds.0.server
> dispatch_client_request client_request(client.4257:74 getattr
> pAsLsXsFs #0x1 2021-07-01T23:42:18.095884+0000 caller_uid=1147,
> caller_gid=1147{1000,1147,}) v5
> ...
> 2021-07-01T23:42:18.096+0000 7fcc7741c700 10 mds.0.locker nudge_log
> (ifile mix->sync w=2) on [inode 0x1 [...2,head] / auth v34 pv39 ap=6
> snaprealm=0x564734479600 DIRTYPARENT f(v0
> m2021-07-01T23:38:00.418466+0000 3=1+2) n(v6
> rc2021-07-01T23:38:15.692076+0000 b65536 7=2+5)/n(v0
> rc2021-07-01T19:31:40.924877+0000 1=0+1) (iauth sync r=1) (isnap sync
> r=4) (inest mix w=3) (ipolicy sync r=2) (ifile mix->sync w=2)
> (iversion lock w=3) caps={4257=pAsLsXs/-@32} | dirtyscattered=0
> request=1 lock=6 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1
> authpin=1 0x56473913a580]
>
> You don't see that getattr for directories other than root. That's
> probably because the client has been issued more caps than what the
> MDS is willing to normally hand out for root.

For the root dir, the wrlock_start('ifile lock') done by the 'rename' 
changes the lock state from 'SYNC' to 'MIX', and inode 0x1 then issues 
'pAsLsXs' to clients. So when the client sends a 'getattr' request 
wanting the 'AsXsFs' caps, the MDS tries to switch the 'ifile lock' 
back to 'SYNC' in order to grant the 'Fs' cap. Since 
rdlock_start('ifile lock') needs that lock state transition, it has to 
wait, and that triggers the 'nudge_log'.

The reason wrlock_start('ifile lock') changes the lock state from 
'SYNC' to 'MIX' above is that inode '0x1' has a subtree. If my 
understanding is correct, the root dir is very likely to be shared by 
multiple MDSes, so it chooses to switch to MIX.

This is why sending a 'getattr' request works for the root dir.


For non-root directories, the inode will bump to loner and the 
'ifile/iauth/ixattr locks' switch to the EXCL state instead. In this 
lock state the MDS issues the 'pAsxLsXsxFsx' caps. So when the client 
does 'getattr(AsXsFs)', nothing happens because the needed caps have 
already been issued. This is why we couldn't see any getattr request 
being sent out.
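
The difference between the two cases can be shown with a toy model of
the client-side check. This is a sketch and not the kernel client code:
'parse_caps' and 'needs_getattr' are hypothetical names, and it assumes
a cap string reads as 'p' for the pin cap plus group letters
(A, L, X, F), each followed by its lowercase bits.

```python
def parse_caps(cap_string):
    """Split a cap string such as 'pAsxLsXsxFsx' into a set like {'p', 'As', 'Ax', ...}."""
    caps = set()
    group = None
    for ch in cap_string:
        if ch == "p":
            caps.add("p")            # the pin cap has no group letter
        elif ch.isupper():
            group = ch               # start of a new group: A, L, X or F
        elif group is not None:
            caps.add(group + ch)     # one bit inside the current group
        else:
            raise ValueError(f"bit {ch!r} appears before any group letter")
    return caps


def needs_getattr(issued, wanted):
    """Return True if some wanted cap is not already covered by the issued caps."""
    return not parse_caps(wanted) <= parse_caps(issued)


if __name__ == "__main__":
    # Root dir after the rename: 'Fs' is missing, so a getattr goes to the MDS.
    print(needs_getattr("pAsLsXs", "AsXsFs"))
    # Non-root dir in loner mode: everything wanted is already issued.
    print(needs_getattr("pAsxLsXsxFsx", "AsXsFs"))
```

In this model only the root dir case needs a request, and the missing
cap there is 'Fs', which matches the lock transition described above.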

Even if we 'force' the getattr call, it can get the rdlock immediately, 
with no need to gather or do a lock state transition, so 'nudge_log' is 
never called. When a non-root directory is in loner mode, the locks are 
in the 'EXCL' state, which allows 'pAsxLsXsxFsxrwb' by default. So even 
if fsync 'forces' a getattr('pAsxLsXsxFsxrwb') call, the MDS side still 
won't do any lock state transition.


>
> I'm not really sure why there is a difference. I even experimented
> with redundant getattr ("forced") calls to cause a journal flush on
> non-root directories but didn't get anywhere. Maybe you can
> investigate further? It'd be optimal if we could nudge the log just by
> doing a getattr.

So in the above case, from my tests and from reading the Locker code, I 
haven't figured out yet how the getattr could work for this issue.

Patrick,

Did I miss something about the Locker?


BRs

Xiubo



