Linux-Fsdevel Archive on lore.kernel.org
From: Amir Goldstein <amir73il@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	Matthew Wilcox <willy@infradead.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [POC][PATCH] xfs: reduce ilock contention on buffered randrw workload
Date: Fri, 5 Apr 2019 17:02:33 +0300
Message-ID: <CAOQ4uxjQNmxqmtA_VbYW0Su9rKRk2zobJmahcyeaEVOFKVQ5dw@mail.gmail.com>
In-Reply-To: <20190404211730.GD26298@dastard>

On Fri, Apr 5, 2019 at 12:17 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Apr 04, 2019 at 07:57:37PM +0300, Amir Goldstein wrote:
> > This patch improves performance of mixed random rw workload
> > on xfs without relaxing the atomic buffered read/write guaranty
> > that xfs has always provided.
> >
> > We achieve that by calling generic_file_read_iter() twice.
> > Once with a discard iterator to warm up page cache before taking
> > the shared ilock and once again under shared ilock.
>
> This will race with things like truncate, hole punching, etc that
> serialise IO and invalidate the page cache for data integrity
> reasons under the IOLOCK. These rely on there being no IO to the
> inode in progress at all to work correctly, which this patch
> violates. IOWs, while this is fast, it is not safe and so not a
> viable approach to solving the problem.
>

This statement leaves me wondering: if ext4 does not take
i_rwsem in generic_file_read_iter(), how does ext4 (or any other
fs for that matter) guarantee buffered read synchronization with
truncate, hole punching, etc.?
The answer in the ext4 case is i_mmap_sem, which is read-locked
in the page fault handler.

xfs does the same type of synchronization with MMAPLOCK.
So while my patch may well be unsafe, I cannot see why from your
explanation. Please explain if I am missing something.
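
To make the discussion concrete, here is a rough user-space model of the
two-pass read from the patch description. It is only a sketch, not the
kernel code: struct toy_inode, toy_read() and toy_read_two_pass() are
hypothetical names, a plain counter stands in for the shared ilock, and a
memcpy() stands in for the I/O that fills the page cache. It is
single-threaded, so it says nothing about the race with truncate or hole
punching; it only shows that the slow cache fill happens in the first,
unlocked pass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct toy_inode {
    const char *disk;       /* backing store */
    size_t size;            /* file size in bytes, at most sizeof(cache) */
    char cache[64];         /* stands in for the page cache */
    int cached;             /* nonzero once the cache has been filled */
    int shared_holders;     /* stands in for the shared ilock (not a real lock) */
    int fills_under_lock;   /* cache fills that happened with the lock held */
};

/* One read pass. dst == NULL discards the data and only warms the cache. */
static size_t toy_read(struct toy_inode *ip, char *dst, size_t len)
{
    assert(ip->size <= sizeof(ip->cache));
    if (len > ip->size)
        len = ip->size;
    if (!ip->cached) {
        /* The slow part: "I/O" from disk into the cache. */
        memcpy(ip->cache, ip->disk, ip->size);
        ip->cached = 1;
        if (ip->shared_holders > 0)
            ip->fills_under_lock++;
    }
    if (dst)
        memcpy(dst, ip->cache, len);
    return len;
}

static size_t toy_read_two_pass(struct toy_inode *ip, char *dst, size_t len)
{
    size_t n;

    /* Pass 1: no lock held, data discarded, cache warmed. */
    (void)toy_read(ip, NULL, len);

    /* Pass 2: copy to the caller under the (modeled) shared lock. */
    ip->shared_holders++;
    n = toy_read(ip, dst, len);
    ip->shared_holders--;
    return n;
}
```

In this model, a caller of toy_read_two_pass() on a cold inode should see
fills_under_lock stay at zero, because the second pass finds the cache
already populated. The open question in this thread is what happens when
something invalidates the cache between the two passes.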

One thing that Darrick mentioned earlier was that IOLOCK is also
used by xfs to synchronize pNFS leases (probably listed under
'etc' in your explanation). I concede that my patch does not look safe
w.r.t. pNFS leases, but that can be sorted out with a hammer
#ifndef CONFIG_EXPORTFS_BLOCK_OPS
or with finer instruments.
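
By "hammer" I mean something along these lines. This is only a sketch of
the compile-time gate: CONFIG_EXPORTFS_BLOCK_OPS is the existing Kconfig
symbol, while toy_unlocked_warmup_allowed() is a hypothetical helper that
is not part of the patch.

```c
#include <stdbool.h>

/*
 * CONFIG_EXPORTFS_BLOCK_OPS is the Kconfig symbol named in this mail;
 * toy_unlocked_warmup_allowed() is a hypothetical helper for illustration.
 */
static bool toy_unlocked_warmup_allowed(void)
{
#ifndef CONFIG_EXPORTFS_BLOCK_OPS
    return true;   /* pNFS block layout export ops are not built in */
#else
    return false;  /* keep every read pass under the IOLOCK */
#endif
}
```

The read path would then skip the unlocked warm-up pass whenever this
returns false, which is coarse because it disables the optimization for
every file on a kernel built with block layout export support.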

> FYI, I'm working on a range lock implementation that should both
> solve the performance issue and the reader starvation issue at the
> same time by allowing concurrent buffered reads and writes to
> different file ranges.
>
> IO range locks will allow proper exclusion for other extent
> manipulation operations like fallocate and truncate, and eventually
> even allow truncate, hole punch, file extension, etc to run
> concurrently with other non-overlapping IO. They solve more than
> just the performance issue you are seeing....
>

I'm glad to hear that. IO range locks are definitely a more complete
solution to the problem going forward.

However, I am still interested in continuing the discussion on my POC
patch. One reason is that I am guessing it would be much easier for
distros to backport and pick up to solve performance issues.

Even if my patch is neither applied upstream nor picked up by distros,
I would still like to understand its flaws and limitations. I know...
if I break it, I get to keep the pieces, but the information that you
provide helps me make my risk assessments.

Thanks,
Amir.


Thread overview: 19+ messages
2019-04-04 16:57 Amir Goldstein
2019-04-04 21:17 ` Dave Chinner
2019-04-05 14:02   ` Amir Goldstein [this message]
2019-04-07 23:27     ` Dave Chinner
2019-04-08  9:02       ` Amir Goldstein
2019-04-08 14:11         ` Jan Kara
2019-04-08 17:41           ` Amir Goldstein
2019-04-09  8:26             ` Jan Kara
2019-04-08 11:03       ` Jan Kara
2019-04-22 10:55         ` Boaz Harrosh
2019-04-08 10:33   ` Jan Kara
2019-04-08 16:37     ` Davidlohr Bueso
2019-04-11  1:11       ` Dave Chinner
2019-04-16 12:22         ` Dave Chinner
2019-04-18  3:10           ` Dave Chinner
2019-04-18 18:21             ` Davidlohr Bueso
2019-04-20 23:54               ` Dave Chinner
2019-05-03  4:17                 ` Dave Chinner
2019-05-03  5:17                   ` Dave Chinner
