linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Jan Kara <jack@suse.cz>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	ocfs2-devel@oss.oracle.com,
	cluster-devel <cluster-devel@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [Cluster-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
Date: Mon, 3 May 2021 12:29:04 +0200	[thread overview]
Message-ID: <20210503102904.GC2994@quack2.suse.cz> (raw)
In-Reply-To: <a69fa4bc-ffe7-204b-6a1f-6a166c6971a4@oracle.com>

On Fri 30-04-21 14:18:15, Junxiao Bi wrote:
> On 4/30/21 5:47 AM, Jan Kara wrote:
> 
> > On Thu 29-04-21 11:07:15, Junxiao Bi wrote:
> > > On 4/29/21 10:14 AM, Andreas Gruenbacher wrote:
> > > > On Tue, Apr 27, 2021 at 4:44 AM Junxiao Bi <junxiao.bi@oracle.com> wrote:
> > > > > When doing truncate/fallocate for some filesytem like ocfs2, it
> > > > > will zero some pages that are out of inode size and then later
> > > > > update the inode size, so it needs this api to writeback eof
> > > > > pages.
> > > > is this in reaction to Jan's "[PATCH 0/12 v4] fs: Hole punch vs page
> > > > cache filling races" patch set [*]? It doesn't look like the kind of
> > > > patch Christoph would be happy with.
> > > Thank you for pointing the patch set. I think that is fixing a different
> > > issue.
> > > 
> > > The issue here is when extending file size with fallocate/truncate, if the
> > > original inode size
> > > 
> > > is in the middle of the last cluster block(1M), eof part will be zeroed with
> > > buffer write first,
> > > 
> > > and then new inode size is updated, so there is a window that dirty pages is
> > > out of inode size,
> > > 
> > > if writeback is kicked in, block_write_full_page will drop all those eof
> > > pages.
> > I agree that the buffers describing part of the cluster beyond i_size won't
> > be written. But page cache will remain zeroed out so that is fine. So you
> > only need to zero out the on disk contents. Since this is actually
> > physically contiguous range of blocks why don't you just use
> > sb_issue_zeroout() to zero out the tail of the cluster? It will be more
> > efficient than going through the page cache and you also won't have to
> > tweak block_write_full_page()...
> 
> Thanks for the review.
> 
> The physical blocks to be zeroed were continuous only when sparse mode is
> enabled, if sparse mode is disabled, unwritten extent was not supported for
> ocfs2, then all the blocks to the new size will be zeroed by the buffer
> write, since sb_issue_zeroout() will need waiting io done, there will be a
> lot of delay when extending file size. Use writeback to flush async seemed
> more efficient?

It depends. Higher end storage (e.g. NVME or NAS, maybe some better SATA
flash disks as well) do support WRITE_ZERO command so you don't actually
have to write all those zeros. The storage will just internally mark all
those blocks as having zeros. This is rather fast so I'd expect the overall
result to be faster that zeroing page cache and then writing all those
pages with zeroes on transaction commit. But I agree that for lower end
storage this may be slower because of synchronous writing of zeroes. That
being said your transaction commit has to write those zeroes anyway so the
cost is only mostly shifted but it could still make a difference for some
workloads. Not sure if that matters, that is your call I'd say.

Also note that you could submit those zeroing bios asynchronously but that
would be more coding and you need to make sure they are completed on
transaction commit so probably it isn't worth the complexity.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2021-05-03 10:29 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-26 22:05 [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback Junxiao Bi
2021-04-26 22:05 ` [PATCH 2/3] ocfs2: allow writing back pages out of inode size Junxiao Bi
2021-04-28 16:00   ` Junxiao Bi
2021-04-29 13:09   ` [Ocfs2-devel] " Joseph Qi
2021-04-26 22:05 ` [PATCH 3/3] gfs2: fix out of inode size writeback Junxiao Bi
2021-04-28 16:02   ` Junxiao Bi
2021-04-29 11:58 ` [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback Joseph Qi
2021-04-29 17:14 ` [Cluster-devel] " Andreas Gruenbacher
2021-04-29 18:07   ` Junxiao Bi
2021-04-30 12:47     ` Jan Kara
2021-04-30 21:18       ` Junxiao Bi
2021-05-03 10:29         ` Jan Kara [this message]
2021-05-03 17:25           ` Junxiao Bi
2021-05-04  9:02             ` Jan Kara
2021-05-04 23:35               ` Junxiao Bi
2021-05-05 11:43                 ` Jan Kara
2021-05-05 15:54                   ` Junxiao Bi
2021-05-09 23:23 ` [Ocfs2-devel] " Andrew Morton
2021-05-10 22:15   ` Junxiao Bi
2021-05-11 12:19     ` Bob Peterson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210503102904.GC2994@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=agruenba@redhat.com \
    --cc=cluster-devel@redhat.com \
    --cc=hch@lst.de \
    --cc=junxiao.bi@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).