From: Amir Goldstein <amir73il@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>,
Dave Chinner <david@fromorbit.com>,
Christoph Hellwig <hch@lst.de>,
Matthew Wilcox <willy@infradead.org>,
linux-xfs <linux-xfs@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [POC][PATCH] xfs: reduce ilock contention on buffered randrw workload
Date: Tue, 21 Jun 2022 10:49:48 +0300
Message-ID: <CAOQ4uxg+uY5PdcU1=RyDWCxbP4gJB3jH1zkAj=RpfndH9czXbg@mail.gmail.com>
In-Reply-To: <20220620091136.4uosazpwkmt65a5d@quack3.lan>
> > > > Hi Jan, Dave,
> > > >
> > > > Trying to circle back to this after 3 years!
> > > > Seeing that there is no progress with range locks and
> > > > that the mixed rw workload performance issue still very much exists.
> > > >
> > > > Is the situation now different than 3 years ago with invalidate_lock?
> > >
> > > Yes, I've implemented invalidate_lock exactly to fix the issues you've
> > > pointed out without regressing the mixed rw workloads (because
> > > invalidate_lock is taken in shared mode only for reads and usually not at
> > > all for writes).
> > >
> > > > Would my approach of pre-warm page cache before taking IOLOCK
> > > > be safe if page cache is pre-warmed with invalidate_lock held?
> > >
> > > Why would it be needed? But yes, with invalidate_lock you could presumably
> > > make that idea safe...
> >
> > To remind you, the context in which I pointed you to the punch hole race
> > issue in "other file systems" was a discussion about trying to relax the
> > "atomic write" POSIX semantics [1] of xfs.
>
> Ah, I see. Sorry, I already forgot :-|
Understandable. It has been 3 years ;-)
>
> > There were a lot of discussions around range locks and changing the
> > fairness of rwsem readers and writers, but none of this changes the fact
> > that as long as the lock is file wide (and it does not look like that is
> > going to change in the near future), it is better for lock contention to
> > perform the serialization on page cache read/write and not on disk
> > read/write.
> >
> > Therefore, *if* it is acceptable to pre-warm the page cache for buffered reads
> > under invalidate_lock, that is a simple way to bring xfs performance on a
> > random rw mix workload on par with ext4 without losing the
> > atomic write POSIX semantics. So everyone can be happy?
>
> So to spell out your proposal so that we are on the same page: you want to
> use invalidate_lock + page locks to achieve "writes are atomic wrt reads"
> property XFS currently has without holding i_rwsem in shared mode during
> reads. Am I getting it correct?
Not exactly.
>
> How exactly do you imagine the synchronization of buffered read against
> buffered write would work? Lock all pages for the read range in the page
> cache? You'd need to be careful to not bring the machine OOM when someone
> asks to read a huge range...
I imagine that the atomic r/w synchronisation will remain *exactly* as it is
today, by taking XFS_IOLOCK_SHARED around generic_file_read_iter()
when reading data into the user buffer, but before that, I would like to issue
and wait for reads of the pages in the range, to reduce the probability
of doing the read I/O under XFS_IOLOCK_SHARED.
The pre-warm of the page cache does not need to abide by the atomic read
semantics, and it is also tolerable if some pages are evicted between the
pre-warm and the read into the user buffer; in the worst case this will result
in I/O amplification, but in the common case it will be a big win for
mixed random r/w performance on xfs.
To reduce the risk of page cache thrashing, we can cap this optimization
at a maximum number of pre-warmed pages per read.
The questions are:
1. Does this plan sound reasonable?
2. Is there an existing helper (force_page_cache_readahead?) that
I can use, which takes the required page/invalidate locks?
Thanks,
Amir.
Thread overview: 38+ messages
2019-04-04 16:57 [POC][PATCH] xfs: reduce ilock contention on buffered randrw workload Amir Goldstein
2019-04-04 21:17 ` Dave Chinner
2019-04-05 14:02 ` Amir Goldstein
2019-04-07 23:27 ` Dave Chinner
2019-04-08 9:02 ` Amir Goldstein
2019-04-08 14:11 ` Jan Kara
2019-04-08 17:41 ` Amir Goldstein
2019-04-09 8:26 ` Jan Kara
2022-06-17 14:48 ` Amir Goldstein
2022-06-17 15:11 ` Jan Kara
2022-06-18 8:38 ` Amir Goldstein
2022-06-20 9:11 ` Jan Kara
2022-06-21 7:49 ` Amir Goldstein [this message]
2022-06-21 8:59 ` Jan Kara
2022-06-21 12:53 ` Amir Goldstein
2022-06-22 3:23 ` Matthew Wilcox
2022-06-22 9:00 ` Amir Goldstein
2022-06-22 9:34 ` Jan Kara
2022-06-22 16:26 ` Amir Goldstein
2022-09-13 14:40 ` Amir Goldstein
2022-09-14 16:01 ` Darrick J. Wong
2022-09-14 16:29 ` Amir Goldstein
2022-09-14 17:39 ` Darrick J. Wong
2022-09-19 23:09 ` Dave Chinner
2022-09-20 2:24 ` Dave Chinner
2022-09-20 3:08 ` Amir Goldstein
2022-09-21 11:20 ` Amir Goldstein
2019-04-08 11:03 ` Jan Kara
2019-04-22 10:55 ` Boaz Harrosh
2019-04-08 10:33 ` Jan Kara
2019-04-08 16:37 ` Davidlohr Bueso
2019-04-11 1:11 ` Dave Chinner
2019-04-16 12:22 ` Dave Chinner
2019-04-18 3:10 ` Dave Chinner
2019-04-18 18:21 ` Davidlohr Bueso
2019-04-20 23:54 ` Dave Chinner
2019-05-03 4:17 ` Dave Chinner
2019-05-03 5:17 ` Dave Chinner