ocfs2-devel.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Junxiao Bi <junxiao.bi@oracle.com>
To: Jan Kara <jack@suse.cz>
Cc: cluster-devel <cluster-devel@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] [Cluster-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
Date: Mon, 3 May 2021 10:25:31 -0700	[thread overview]
Message-ID: <72cde802-bd8a-9ce5-84d7-57b34a6a8b03@oracle.com> (raw)
In-Reply-To: <20210503102904.GC2994@quack2.suse.cz>


On 5/3/21 3:29 AM, Jan Kara wrote:
> On Fri 30-04-21 14:18:15, Junxiao Bi wrote:
>> On 4/30/21 5:47 AM, Jan Kara wrote:
>>
>>> On Thu 29-04-21 11:07:15, Junxiao Bi wrote:
>>>> On 4/29/21 10:14 AM, Andreas Gruenbacher wrote:
>>>>> On Tue, Apr 27, 2021 at 4:44 AM Junxiao Bi <junxiao.bi@oracle.com> wrote:
>>>>>> When doing truncate/fallocate for some filesytem like ocfs2, it
>>>>>> will zero some pages that are out of inode size and then later
>>>>>> update the inode size, so it needs this api to writeback eof
>>>>>> pages.
>>>>> is this in reaction to Jan's "[PATCH 0/12 v4] fs: Hole punch vs page
>>>>> cache filling races" patch set [*]? It doesn't look like the kind of
>>>>> patch Christoph would be happy with.
>>>> Thank you for pointing the patch set. I think that is fixing a different
>>>> issue.
>>>>
>>>> The issue here is when extending file size with fallocate/truncate, if the
>>>> original inode size
>>>>
>>>> is in the middle of the last cluster block(1M), eof part will be zeroed with
>>>> buffer write first,
>>>>
>>>> and then new inode size is updated, so there is a window that dirty pages is
>>>> out of inode size,
>>>>
>>>> if writeback is kicked in, block_write_full_page will drop all those eof
>>>> pages.
>>> I agree that the buffers describing part of the cluster beyond i_size won't
>>> be written. But page cache will remain zeroed out so that is fine. So you
>>> only need to zero out the on disk contents. Since this is actually
>>> physically contiguous range of blocks why don't you just use
>>> sb_issue_zeroout() to zero out the tail of the cluster? It will be more
>>> efficient than going through the page cache and you also won't have to
>>> tweak block_write_full_page()...
>> Thanks for the review.
>>
>> The physical blocks to be zeroed were continuous only when sparse mode is
>> enabled, if sparse mode is disabled, unwritten extent was not supported for
>> ocfs2, then all the blocks to the new size will be zeroed by the buffer
>> write, since sb_issue_zeroout() will need waiting io done, there will be a
>> lot of delay when extending file size. Use writeback to flush async seemed
>> more efficient?
> It depends. Higher end storage (e.g. NVME or NAS, maybe some better SATA
> flash disks as well) do support WRITE_ZERO command so you don't actually
> have to write all those zeros. The storage will just internally mark all
> those blocks as having zeros. This is rather fast so I'd expect the overall
> result to be faster that zeroing page cache and then writing all those
> pages with zeroes on transaction commit. But I agree that for lower end
> storage this may be slower because of synchronous writing of zeroes. That
> being said your transaction commit has to write those zeroes anyway so the
> cost is only mostly shifted but it could still make a difference for some
> workloads. Not sure if that matters, that is your call I'd say.

Ocfs2 is mostly used with SAN, i don't think it's common for SAN storage 
to support WRITE_ZERO command.

Anything bad to add a new api to support eof writeback?

Thanks,

Junxiao.

>
> Also note that you could submit those zeroing bios asynchronously but that
> would be more coding and you need to make sure they are completed on
> transaction commit so probably it isn't worth the complexity.
>
> 								Honza

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

  reply	other threads:[~2021-05-03 17:33 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-26 22:05 [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback Junxiao Bi
2021-04-26 22:05 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: allow writing back pages out of inode size Junxiao Bi
2021-04-28 16:00   ` Junxiao Bi
2021-04-29 13:09   ` Joseph Qi
2021-04-26 22:05 ` [Ocfs2-devel] [PATCH 3/3] gfs2: fix out of inode size writeback Junxiao Bi
2021-04-28 16:02   ` Junxiao Bi
2021-04-29 11:58 ` [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback Joseph Qi
2021-04-29 17:14 ` [Ocfs2-devel] [Cluster-devel] " Andreas Gruenbacher
2021-04-29 18:07   ` Junxiao Bi
2021-04-30 12:47     ` Jan Kara
2021-04-30 21:18       ` Junxiao Bi
2021-05-03 10:29         ` Jan Kara
2021-05-03 17:25           ` Junxiao Bi [this message]
2021-05-04  9:02             ` Jan Kara
2021-05-04 23:35               ` Junxiao Bi
2021-05-05 11:43                 ` Jan Kara
2021-05-05 15:54                   ` Junxiao Bi
2021-05-09 23:23 ` [Ocfs2-devel] " Andrew Morton
2021-05-10 22:15   ` Junxiao Bi
2021-05-11 12:19     ` Bob Peterson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=72cde802-bd8a-9ce5-84d7-57b34a6a8b03@oracle.com \
    --to=junxiao.bi@oracle.com \
    --cc=agruenba@redhat.com \
    --cc=cluster-devel@redhat.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).