From: Amir Goldstein <firstname.lastname@example.org>
To: Dave Chinner <email@example.com>
Cc: "Darrick J . Wong" <firstname.lastname@example.org>,
	Christoph Hellwig <email@example.com>,
	Matthew Wilcox <firstname.lastname@example.org>,
	linux-xfs <email@example.com>,
	linux-fsdevel <firstname.lastname@example.org>
Subject: Re: [POC][PATCH] xfs: reduce ilock contention on buffered randrw workload
Date: Fri, 5 Apr 2019 17:02:33 +0300
Message-ID: <CAOQ4uxjQNmxqmtA_VbYW0Su9rKRk2zobJmahcyeaEVOFKVQ5dw@mail.gmail.com>
In-Reply-To: <20190404211730.GD26298@dastard>

On Fri, Apr 5, 2019 at 12:17 AM Dave Chinner <email@example.com> wrote:
>
> On Thu, Apr 04, 2019 at 07:57:37PM +0300, Amir Goldstein wrote:
> > This patch improves performance of mixed random rw workload
> > on xfs without relaxing the atomic buffered read/write guaranty
> > that xfs has always provided.
> >
> > We achieve that by calling generic_file_read_iter() twice.
> > Once with a discard iterator to warm up page cache before taking
> > the shared ilock and once again under shared ilock.
>
> This will race with thing like truncate, hole punching, etc that
> serialise IO and invalidate the page cache for data integrity
> reasons under the IOLOCK. These rely on there being no IO to the
> inode in progress at all to work correctly, which this patch
> violates. IOWs, while this is fast, it is not safe and so not a
> viable approach to solving the problem.
>

This statement leaves me wondering: if ext4 does not take i_rwsem
in generic_file_read_iter(), how does ext4 (or any other fs, for that
matter) guarantee buffered read synchronization with truncate, hole
punching, etc.?

The answer in the ext4 case is i_mmap_sem, which is read locked in the
page fault handler. xfs does the same type of synchronization with
MMAPLOCK, so while my patch may not be safe, I cannot follow why from
your explanation. Please explain if I am missing something.

One thing that Darrick mentioned earlier was that IOLOCK is also used
by xfs to synchronize pNFS leases (probably listed under 'etc' in your
explanation). I concede that my patch does not look safe w.r.t. pNFS
leases, but that can be sorted out with a hammer,
#ifndef CONFIG_EXPORTFS_BLOCK_OPS, or with finer instruments.

> FYI, I'm working on a range lock implementation that should both
> solve the performance issue and the reader starvation issue at the
> same time by allowing concurrent buffered reads and writes to
> different file ranges.
>
> IO range locks will allow proper exclusion for other extent
> manipulation operations like fallocate and truncate, and eventually
> even allow truncate, hole punch, file extension, etc to run
> concurrently with other non-overlapping IO. They solve more than
> just the performance issue you are seeing....
>

I'm glad to hear that. IO range locks are definitely a more holistic
solution to the problem going forward.

However, I am still interested in continuing the discussion of my POC
patch. One reason is that I am guessing it would be much easier for
distros to backport and pick up to solve performance issues.

Even if my patch is neither applied upstream nor picked up by distros,
I would still like to understand its flaws and limitations. I know...
if I break it, I get to keep the pieces, but the information that you
provide helps me make my risk assessments.

Thanks,
Amir.
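
For readers following the range lock idea quoted above, here is a
minimal user-space sketch of the conflict rule only: two requests
conflict when their byte ranges overlap and at least one of them is
exclusive. This is not Dave's implementation and not xfs code. All
names (range_lock, rl_trylock, rl_unlock, MAX_HELD) are hypothetical,
the model is single-threaded, and it fails instead of blocking. A real
range lock would need its own internal locking, waiters and fairness.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD 16 /* hypothetical fixed capacity, for the toy model only */

enum rl_mode { RL_SHARED, RL_EXCL };

/* One held byte range, half-open: [start, end). */
struct rl_range {
	size_t start;
	size_t end;
	enum rl_mode mode;
	bool used;
};

/* Toy range lock: just a table of currently held ranges. */
struct range_lock {
	struct rl_range held[MAX_HELD];
};

/* Overlapping ranges conflict unless both sides are shared. */
static bool rl_conflicts(const struct rl_range *h, size_t start,
			 size_t end, enum rl_mode mode)
{
	bool overlap = h->start < end && start < h->end;

	return overlap && (h->mode == RL_EXCL || mode == RL_EXCL);
}

/*
 * Try to take [start, end) in the given mode.
 * Returns a slot index >= 0 on success, or -1 if the request
 * conflicts with a held range or the table is full.
 */
int rl_trylock(struct range_lock *rl, size_t start, size_t end,
	       enum rl_mode mode)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_HELD; i++) {
		if (!rl->held[i].used) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (rl_conflicts(&rl->held[i], start, end, mode))
			return -1;
	}
	if (free_slot < 0)
		return -1;

	rl->held[free_slot] = (struct rl_range){ start, end, mode, true };
	return free_slot;
}

/* Release a range previously returned by rl_trylock(). */
void rl_unlock(struct range_lock *rl, int slot)
{
	if (slot >= 0 && slot < MAX_HELD)
		rl->held[slot].used = false;
}
```

In this model a buffered write would take RL_EXCL on the bytes it
modifies and a buffered read would take RL_SHARED on the bytes it
copies out, so reads and writes to different parts of the file never
contend. An operation like truncate would take RL_EXCL from the new
EOF to the end of the file, which excludes only IO that overlaps
the truncated region.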