From: Jeff Layton <jlayton@kernel.org>
To: Xiubo Li <xiubli@redhat.com>
Cc: idryomov@gmail.com, pdonnell@redhat.com, ceph-devel@vger.kernel.org
Subject: Re: [PATCH 4/5] ceph: flush the mdlog before waiting on unsafe reqs
Date: Wed, 30 Jun 2021 08:13:23 -0400	[thread overview]
Message-ID: <2e8aabad80e166d7c628fde9d820fc5f403e034f.camel@kernel.org> (raw)
In-Reply-To: <4f2f6de6-eb1f-1527-de73-2378f262228b@redhat.com>

On Wed, 2021-06-30 at 09:26 +0800, Xiubo Li wrote:
> On 6/29/21 9:25 PM, Jeff Layton wrote:
> > On Tue, 2021-06-29 at 12:42 +0800, xiubli@redhat.com wrote:
> > > From: Xiubo Li <xiubli@redhat.com>
> > > 
> > > For client requests that will get both unsafe and safe replies from
> > > the MDS daemons, the MDS side won't flush the mdlog (journal log)
> > > immediately, because it considers that unnecessary. That's true for
> > > most cases, but not all, e.g. the fsync request: fsync must wait
> > > until all the unsafely-replied requests have been safely replied.
> > > 
> > > Normally, if multiple threads or clients are running, the whole mdlog
> > > in the MDS daemons gets flushed in time, because any request can
> > > trigger the mdlog submit thread, so we usually won't see normal
> > > operations get stuck for long. But when only one client with a single
> > > thread is running, the stall can be obvious, and in the worst case it
> > > must wait up to 5 seconds for the mdlog to be flushed by the MDS's
> > > periodic tick thread.
> > > 
> > > This patch manually triggers an mdlog flush on all the MDSes just
> > > before waiting for the unsafe requests to finish.
> > > 
> > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > ---
> > >   fs/ceph/caps.c | 9 +++++++++
> > >   1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > index c6a3352a4d52..6e80e4649c7a 100644
> > > --- a/fs/ceph/caps.c
> > > +++ b/fs/ceph/caps.c
> > > @@ -2286,6 +2286,7 @@ static int caps_are_flushed(struct inode *inode, u64 flush_tid)
> > >    */
> > >   static int unsafe_request_wait(struct inode *inode)
> > >   {
> > > +	struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
> > >   	struct ceph_inode_info *ci = ceph_inode(inode);
> > >   	struct ceph_mds_request *req1 = NULL, *req2 = NULL;
> > >   	int ret, err = 0;
> > > @@ -2305,6 +2306,14 @@ static int unsafe_request_wait(struct inode *inode)
> > >   	}
> > >   	spin_unlock(&ci->i_unsafe_lock);
> > >   
> > > +	/*
> > > +	 * Manually trigger a flush of the journal logs in all the
> > > +	 * MDSes; otherwise, in the worst case, we must wait up to 5
> > > +	 * seconds for the MDSes to flush them periodically.
> > > +	 */
> > > +	if (req1 || req2)
> > > +		flush_mdlog(mdsc);
> > > +
> > So this is called on fsync(). Do we really need to flush all of the mds
> > logs on every fsync? That sounds like it might have some performance
> > impact. Would it be possible to just flush the mdlog on the MDS that's
> > authoritative for this inode?
> 
> I hit a case before where mds.0 was the auth MDS, but the client had 
> sent the request to mds.2; when mds.2 tried to gather the rdlocks, it 
> got stuck waiting for mds.0 to flush the mdlog. I think mds.0 could 
> also get stuck the same way, even though it's the auth MDS.
> 

It sounds like mds.0 should flush its own mdlog in this situation once
mds.2 started requesting locks that mds.0 was holding. Shouldn't it?

> Normally the mdlog submit thread is triggered by each MDS's tick, 
> which fires every 5 seconds. But in practice it runs more often than 
> that, because any other client request can trigger the mdlog submit 
> thread at any time. Since fsync is not running all the time, IMO the 
> performance impact should be okay.
> 
> 

I'm not sure I'm convinced.

Consider a situation where we have a large(ish) ceph cluster with
several MDSes. One client is writing to a file that is on mds.0 and there
is little other activity there. Several other clients are doing heavy
I/O on other inodes (of which mds.1 is auth).

The first client then calls fsync, and now the other clients stall for a
bit while mds.1 unnecessarily flushes its mdlog. I think we need to take
care to flush the mdlog only on the MDSes we actually care about here.
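
Something like this completely untested sketch is closer to what I have
in mind. Note that send_flush_mdlog() is just a placeholder for whatever
per-session flush helper this would need, and for simplicity it only
looks at the tail request on each unsafe list (i.e. req1/req2 above):

static void flush_mdlog_for_unsafe_reqs(struct ceph_inode_info *ci)
{
	struct ceph_mds_session *s1 = NULL, *s2 = NULL;
	struct ceph_mds_request *req;

	/* take session refs under the spinlock; send the flushes after */
	spin_lock(&ci->i_unsafe_lock);
	if (!list_empty(&ci->i_unsafe_dirops)) {
		req = list_last_entry(&ci->i_unsafe_dirops,
				      struct ceph_mds_request,
				      r_unsafe_dir_item);
		s1 = ceph_get_mds_session(req->r_session);
	}
	if (!list_empty(&ci->i_unsafe_iops)) {
		req = list_last_entry(&ci->i_unsafe_iops,
				      struct ceph_mds_request,
				      r_unsafe_target_item);
		s2 = ceph_get_mds_session(req->r_session);
	}
	spin_unlock(&ci->i_unsafe_lock);

	if (s1) {
		send_flush_mdlog(s1);	/* placeholder helper */
		ceph_put_mds_session(s1);
	}
	if (s2) {
		if (s2 != s1)	/* don't flush the same MDS twice */
			send_flush_mdlog(s2);
		ceph_put_mds_session(s2);
	}
}

That way a busy mds.1 in the scenario above never sees a flush request
it doesn't need to handle.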


> > 
> > >   	dout("unsafe_request_wait %p wait on tid %llu %llu\n",
> > >   	     inode, req1 ? req1->r_tid : 0ULL, req2 ? req2->r_tid : 0ULL);
> > >   	if (req1) {
> 

-- 
Jeff Layton <jlayton@kernel.org>



Thread overview: 35+ messages
2021-06-29  4:42 [PATCH 0/5] flush the mdlog before waiting on unsafe reqs xiubli
2021-06-29  4:42 ` [PATCH 1/5] ceph: export ceph_create_session_msg xiubli
2021-06-29 13:12   ` Jeff Layton
2021-06-29 13:27     ` Xiubo Li
2021-06-30 12:17       ` Ilya Dryomov
2021-07-01  1:50         ` Xiubo Li
2021-06-29  4:42 ` [PATCH 2/5] ceph: export iterate_sessions xiubli
2021-06-29 15:39   ` Jeff Layton
2021-06-30  0:55     ` Xiubo Li
2021-06-29  4:42 ` [PATCH 3/5] ceph: flush mdlog before umounting xiubli
2021-06-29 15:34   ` Jeff Layton
2021-06-30  0:36     ` Xiubo Li
2021-06-30 12:39   ` Ilya Dryomov
2021-07-01  1:18     ` Xiubo Li
2021-06-29  4:42 ` [PATCH 4/5] ceph: flush the mdlog before waiting on unsafe reqs xiubli
2021-06-29 13:25   ` Jeff Layton
2021-06-30  1:26     ` Xiubo Li
2021-06-30 12:13       ` Jeff Layton [this message]
2021-07-01  1:16         ` Xiubo Li
2021-07-01  3:27           ` Patrick Donnelly
     [not found]             ` <e917a3e1-2902-604b-5154-98086c95357f@redhat.com>
2021-07-01 23:46               ` Patrick Donnelly
2021-07-02  0:01                 ` Xiubo Li
2021-07-02 13:17                 ` Xiubo Li
2021-07-02 18:14                   ` Patrick Donnelly
2021-07-03  1:33                     ` Xiubo Li
2021-06-29  4:42 ` [PATCH 5/5] ceph: fix ceph feature bits xiubli
2021-06-29 15:38   ` Jeff Layton
2021-06-30  0:52     ` Xiubo Li
2021-06-30 12:05       ` Jeff Layton
2021-06-30 12:52         ` Ilya Dryomov
2021-07-01  1:07           ` Xiubo Li
2021-07-01  1:08           ` Xiubo Li
2021-07-01  3:35           ` Xiubo Li
2021-06-29 15:27 ` [PATCH 0/5] flush the mdlog before waiting on unsafe reqs Jeff Layton
2021-06-30  0:35   ` Xiubo Li
