From: Matthew Wilcox <firstname.lastname@example.org> To: Dave Chinner <email@example.com> Cc: "Darrick J. Wong" <firstname.lastname@example.org>, Zhengyuan Liu <email@example.com>, firstname.lastname@example.org, Zhengyuan Liu <email@example.com>, Christoph Hellwig <firstname.lastname@example.org> Subject: Re: [Question] About XFS random buffer write performance Date: Fri, 31 Jul 2020 21:47:13 +0100 Message-ID: <20200731204713.GA24067@casper.infradead.org> (raw) In-Reply-To: <20200730234517.GM23808@casper.infradead.org> On Fri, Jul 31, 2020 at 12:45:17AM +0100, Matthew Wilcox wrote: > On Fri, Jul 31, 2020 at 08:08:57AM +1000, Dave Chinner wrote: > > On Thu, Jul 30, 2020 at 02:50:40PM +0100, Matthew Wilcox wrote: > > > On Thu, Jul 30, 2020 at 09:05:03AM +1000, Dave Chinner wrote: > > > > On Wed, Jul 29, 2020 at 07:50:35PM +0100, Matthew Wilcox wrote: > > > > > I had a bit of a misunderstanding. Let's discard that proposal > > > > > and discuss what we want to optimise for, ignoring THPs. We don't > > > > > need to track any per-block state, of course. We could implement > > > > > __iomap_write_begin() by reading in the entire page (skipping the last > > > > > few blocks if they lie outside i_size, of course) and then marking the > > > > > entire page Uptodate. > > > > > > > > __iomap_write_begin() already does read-around for sub-page writes. > > > > And, if necessary, it does zeroing of unwritten extents, newly > > > > allocated ranges and ranges beyond EOF and marks them uptodate > > > > appropriately. > > > > > > But it doesn't read in the entire page, just the blocks in the page which > > > will be touched by the write. > > > > Ah, you are right, I got my page/offset macros mixed up. > > > > In which case, you just identified why the uptodate array is > > necessary and can't be removed. If we do a sub-page write() the page > > is not fully initialised, and so if we then mmap it readpage needs > > to know what part of the page requires initialisation to bring the > > page uptodate before it is exposed to userspace. > > You snipped the part of my mail where I explained how we could handle > that without the uptodate array ;-( Essentially, we do as you thought > it worked, we read the entire page (or at least the portion of it that > isn't going to be overwritten. Once all the bytes have been transferred, > we can mark the page Uptodate. We'll need to wait for the transfer to > happen if the write overlaps a block boundary, but we do that right now. OK, so this turns out to be Hard. We enter the iomap code with iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops) which does: ret = iomap_apply(inode, pos, iov_iter_count(iter), IOMAP_WRITE, ops, iter, iomap_write_actor); so iomap_write_actor doesn't get told about the blocks in the page before the starting pos. They might be a hole or mapped; we have no idea. We could allocate pages _here_ and call iomap_readpage() for the pages which overlap the beginning and end of the I/O, but I'm not entirely convinced that the iomap_ops being passed in will appreciate being called for a read that has no intent to write the portions of the page outside pos. Bleh. I'm going to give up on removing the uptodate bit array and go back to making the THP patchset better.
next prev parent reply index Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-07-28 11:34 Zhengyuan Liu 2020-07-28 15:34 ` Darrick J. Wong 2020-07-28 15:47 ` Matthew Wilcox 2020-07-29 1:54 ` Dave Chinner 2020-07-29 2:12 ` Matthew Wilcox 2020-07-29 5:19 ` Dave Chinner 2020-07-29 18:50 ` Matthew Wilcox 2020-07-29 23:05 ` Dave Chinner 2020-07-30 13:50 ` Matthew Wilcox 2020-07-30 22:08 ` Dave Chinner 2020-07-30 23:45 ` Matthew Wilcox 2020-07-31 2:05 ` Dave Chinner 2020-07-31 2:37 ` Matthew Wilcox 2020-07-31 20:47 ` Matthew Wilcox [this message] 2020-07-31 22:13 ` Dave Chinner 2020-07-31 6:55 ` Christoph Hellwig 2020-07-29 13:02 ` Zhengyuan Liu
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20200731204713.GA24067@casper.infradead.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Linux-XFS Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/linux-xfs/0 linux-xfs/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 linux-xfs linux-xfs/ https://lore.kernel.org/linux-xfs \ firstname.lastname@example.org public-inbox-index linux-xfs Example config snippet for mirrors Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.linux-xfs AGPL code for this site: git clone https://public-inbox.org/public-inbox.git