linux-nfs.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: "jack@suse.cz" <jack@suse.cz>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"nfbrown@suse.com" <nfbrown@suse.com>,
	"bfields@redhat.com" <bfields@redhat.com>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>
Subject: Re: Performance regression with random IO pattern from the client
Date: Wed, 30 Mar 2022 18:14:32 +0200	[thread overview]
Message-ID: <20220330161432.xn6z5lyez2iizwj2@quack3.lan> (raw)
In-Reply-To: <64a4832afd830d7c831ab687bc7a72cc791c2f0c.camel@hammerspace.com>

On Wed 30-03-22 15:03:30, Trond Myklebust wrote:
> On Wed, 2022-03-30 at 12:34 +0200, Jan Kara wrote:
> > Hello,
> > 
> > during our performance testing we have noticed that commit
> > b6669305d35a ("nfsd: Reduce the number of calls to nfsd_file_gc()")
> > has introduced a performance regression when a client does random
> > buffered writes. The workload on the NFS client is fio running 4
> > processes doing random buffered writes to 4 different files, and the
> > files are large enough to hit dirty limits and force writeback from
> > the client. In particular, the invocation is like:
> > 
> > fio --direct=0 --ioengine=sync --thread --directory=/mnt/mnt1 \
> >     --invalidate=1 --group_reporting=1 --runtime=300 \
> >     --fallocate=posix --ramp_time=10 \
> >     --name=RandomReads-128000-4k-4 --new_group --rw=randwrite \
> >     --size=4000m --numjobs=4 --bs=4k \
> >     --filename_format=FioWorkloads.\$jobnum --end_fsync=1
> > 
> > The reason why commit b6669305d35a regresses performance is the
> > filemap_flush() call it adds into nfsd_file_put(). Before this commit,
> > writeback on the server happened from nfsd_commit() code, resulting in
> > rather long semisequential streams of 4k writes. After commit
> > b6669305d35a, all the writeback happens from filemap_flush() calls,
> > resulting in a much longer average seek distance (IO from different
> > files is more interleaved) and about a 16-20% regression in the
> > achieved writeback throughput when the backing store is rotational
> > storage.
> > 
> > I think the filemap_flush() from nfsd_file_put() is indeed rather
> > aggressive, and we'd be better off just leaving writeback to either
> > nfsd_commit() or the standard dirty page cleaning happening on the
> > system. I assume the rationale for the filemap_flush() call was to
> > make it more likely that the file can be evicted during the garbage
> > collection run? Was there any particular problem leading to the
> > addition of this call, or was it just an "it seemed like a good idea"
> > thing?
> > 
> > Thanks in advance for ideas.
> 
> It was mainly introduced to reduce the amount of work that
> nfsd_file_free() needs to do. In particular when re-exporting NFS, the
> call to filp_close() can be expensive because it synchronously flushes
> out dirty pages. That again means that some of the calls to
> nfsd_file_dispose_list() can end up being very expensive (particularly
> the ones run by the garbage collector itself).

I see, thanks for the info. So I'm pondering what options we have for
fixing the performance regression. The problem is that the filemap_flush()
call in nfsd_file_put() is just too aggressive: it doesn't allow enough
dirty data to accumulate in the page cache to form a reasonable IO pattern.
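To illustrate why accumulation matters on rotational storage, here is a
toy model in C. It is not kernel code: the disk layout (4 files, each a
contiguous region of FILE_BLOCKS blocks), the round-robin arrival order
and the LCG are all assumptions. It compares the mean seek distance when
each 4k write reaches the disk about as soon as it arrives (roughly what
flushing on every nfsd_file_put() tends towards) with the distance when
dirty blocks accumulate and are written back file by file in offset
order. The two policies are deliberately extreme bounds; real writeback
from nfsd_commit() lies somewhere in between.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy layout (assumption): NFILES files, each occupying a contiguous
 * region of FILE_BLOCKS blocks on a single rotational disk. */
#define NFILES      4
#define FILE_BLOCKS 100000L

/* Deterministic LCG so the model is reproducible. */
static unsigned long long next_rand(unsigned long long *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return *state >> 33;
}

/* Random 4k writes arriving round-robin from one writer per file. */
static long *gen_writes(int n, unsigned long long seed)
{
    long *lba = malloc((size_t)n * sizeof(*lba));
    assert(lba != NULL);
    for (int i = 0; i < n; i++) {
        long file = i % NFILES;
        lba[i] = file * FILE_BLOCKS + (long)(next_rand(&seed) % FILE_BLOCKS);
    }
    return lba;
}

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Mean absolute distance between consecutively submitted blocks. */
static double avg_seek(const long *lba, int n)
{
    assert(n >= 2);
    double total = 0.0;
    for (int i = 1; i < n; i++)
        total += (double)labs(lba[i] - lba[i - 1]);
    return total / (n - 1);
}

/* Policy A ("flush on every put"): each write reaches the disk about as
 * soon as it arrives, so the disk sees the interleaved arrival order. */
double seek_flush_each(int n, unsigned long long seed)
{
    long *lba = gen_writes(n, seed);
    double d = avg_seek(lba, n);
    free(lba);
    return d;
}

/* Policy B ("let dirty data accumulate"): all dirty blocks are written
 * back file by file in offset order. Files are laid out in file order,
 * so sorting the whole set produces exactly that submission order. */
double seek_accumulated(int n, unsigned long long seed)
{
    long *lba = gen_writes(n, seed);
    qsort(lba, (size_t)n, sizeof(*lba), cmp_long);
    double d = avg_seek(lba, n);
    free(lba);
    return d;
}
```

With the same arguments, e.g. seek_flush_each(4096, 12345) and
seek_accumulated(4096, 12345), the accumulated case stays below 100
blocks (the whole span divided by the number of writes), while the
interleaved case is on the order of FILE_BLOCKS. Only the gap between
the two is meaningful, not the absolute numbers.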

E.g. if the concern is just an excessively long nfsd_file_dispose_list()
runtime when there are more files in the dispose list, we could do two
passes there: the first one walks all the files and starts async
writeback for all of them, and the second one drops the references, which
among other things may end up in ->flush() doing the synchronous
writeback (but that will now have little left to do). This is how
generic writeback actually handles synchronous writeback, because it is
much faster than a loop that submits one file and waits for it before
moving to the next when there are multiple files to write. Would
something like this be acceptable for you?
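A minimal userspace sketch of the proposed two-pass disposal, assuming
hypothetical stand-ins (struct cached_file, start_writeback(),
put_file(), run_demo()) rather than the real nfsd structures. In the
kernel, the first pass could presumably use something that starts
writeback without waiting for it, such as filemap_fdatawrite() or
filemap_flush(), and the second pass would be the existing reference
drop. The event log only records call order: 'S' for starting writeback,
'W' for the synchronous flush and wait on the final put.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a file-cache entry (not struct nfsd_file). */
struct cached_file {
    int dirty_pages;     /* dirty, not yet submitted for writeback */
    int inflight_pages;  /* submitted, not yet completed */
    int refs;
};

#define MAX_FILES 32
/* Call-order log: 'S' = writeback started, 'W' = synchronous flush+wait. */
static char event_log[2 * MAX_FILES + 1];
static size_t nevents;

static void log_event(char c)
{
    assert(nevents < 2 * MAX_FILES);
    event_log[nevents++] = c;
}

/* Start writeback without waiting for it. */
static void start_writeback(struct cached_file *f)
{
    f->inflight_pages += f->dirty_pages;
    f->dirty_pages = 0;
    log_event('S');
}

/* Drop a reference; the final put flushes whatever is still dirty and
 * waits for all IO on the file, like a synchronous ->flush(). */
static void put_file(struct cached_file *f)
{
    if (--f->refs == 0) {
        f->dirty_pages = 0;
        f->inflight_pages = 0;
        log_event('W');
    }
}

/* One-at-a-time pattern: submit one file, wait for it, move on. */
void dispose_list_serial(struct cached_file *files, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        start_writeback(&files[i]);
        put_file(&files[i]);
    }
}

/* Two-pass pattern: queue IO for every file first, then drop the
 * references, so each final put mostly waits on IO already in flight. */
void dispose_list_two_pass(struct cached_file *files, size_t n)
{
    for (size_t i = 0; i < n; i++)
        start_writeback(&files[i]);
    for (size_t i = 0; i < n; i++)
        put_file(&files[i]);
}

/* Build n dirty files holding one reference each, dispose of them and
 * return the resulting call-order log. */
const char *run_demo(size_t n, int two_pass)
{
    struct cached_file files[MAX_FILES];
    assert(n <= MAX_FILES);
    memset(event_log, 0, sizeof(event_log));
    nevents = 0;
    for (size_t i = 0; i < n; i++)
        files[i] = (struct cached_file){ .dirty_pages = 8,
                                         .inflight_pages = 0,
                                         .refs = 1 };
    if (two_pass)
        dispose_list_two_pass(files, n);
    else
        dispose_list_serial(files, n);
    return event_log;
}
```

run_demo(4, 1) returns "SSSSWWWW", i.e. IO for all four files is queued
before the first wait, whereas the one-at-a-time loop, run_demo(4, 0),
returns "SWSWSWSW".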

If something like this is not enough, we could also add a separate delayed
work item that walks unused files and starts writeback for them.
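A sketch of what the body of such a periodic job might look like, again
with a hypothetical struct cache_entry, scan_unused_files() and
tick-based timestamps as assumptions. Only entries that are unused,
dirty and idle for at least idle_ticks get writeback started, so files
still being actively written keep accumulating dirty data, while idle
files are mostly clean by the time garbage collection evicts them. In
the kernel this would presumably run from a delayed work item that
re-queues itself (e.g. via queue_delayed_work()); the threshold value is
left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a file-cache entry (not the real struct nfsd_file). */
struct cache_entry {
    int refs;        /* 0 means cached but currently unused */
    bool dirty;      /* has dirty pages not yet under writeback */
    long last_used;  /* time of the last put, in arbitrary ticks */
};

/* One scan of the hypothetical periodic job: start (but do not wait
 * for) writeback on every entry that is unused, dirty and has been idle
 * for at least idle_ticks. Returns how many entries were kicked. */
size_t scan_unused_files(struct cache_entry *e, size_t n,
                         long now, long idle_ticks)
{
    size_t started = 0;

    assert(idle_ticks >= 0);
    for (size_t i = 0; i < n; i++) {
        if (e[i].refs != 0 || !e[i].dirty)
            continue;                   /* in use, or nothing to write */
        if (now - e[i].last_used < idle_ticks)
            continue;                   /* recently used: let data accumulate */
        e[i].dirty = false;             /* stands in for starting async writeback */
        started++;
    }
    return started;
}
```

For example, with now = 200 and idle_ticks = 10, an unused dirty entry
last used at tick 100 is kicked, while an in-use entry, a clean entry
and an unused dirty entry last used at tick 195 are all left alone.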

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
