Re: [External Mail]Re: [PATCH v3 09/13] ext4: fast-commit commit path changes

From: "Xiaohui1 Li 李晓辉" <lixiaohui1@xiaomi.com>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: "linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"harshadshirwadkar@gmail.com" <harshadshirwadkar@gmail.com>
Subject: Re: [External Mail]Re: [PATCH v3 09/13] ext4: fast-commit commit path changes
Date: Wed, 30 Oct 2019 04:28:42 +0000
Message-ID: <1572409673853.43507@xiaomi.com>
In-Reply-To: <20191029213553.GD4404@mit.edu>

Thanks to the iJournaling USENIX paper, I now see that the long fsync
latency caused by entangled dependencies, and by having to wait for
inode data in jbd2 ordered mode, can be fixed.

I understood the entangled dependencies problem from your kind reply.
Having to wait for file data in jbd2 ordered mode is also a serious
problem, and it causes long-latency fsync calls.

As pointed out in the iJournaling paper, the problem shows up when three
conditions hold at the same time:
1: ordered mode is used, not writeback mode.
2: ext4's delayed block allocation is used.
3: there are too many background buffered writes.

Because the periodic flush interval used with delayed block allocation is
30s on Android (a bit too long), the amount of inode data to flush can be
very large by the time the flush begins. And because the default ext4 data
mode is ordered (not writeback), when fsync is called we face a difficult
situation: the jbd2 thread has to wait until all of that inode data (not
just the metadata) has been flushed to disk.

There is no way around this, since ordered mode requires it, so in some
extreme conditions the time spent waiting for inode data to reach the disk
is very long. That is what causes the long-latency fsync calls.
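A toy model may make the ordered-mode cost concrete. It is not jbd2 code: `toy_txn`, `toy_inode` and the functions below are hypothetical names, and counting metadata blocks per inode is a simplification (real inodes share inode table and bitmap blocks). What it shows is that in ordered mode the commit must write the data of every inode attached to the running transaction before the commit record, no matter which inode called fsync().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of an ordered-mode commit; not jbd2's structures. */
#define TOY_MAX_INODES 64

struct toy_inode {
    unsigned long ino;
    size_t dirty_data_blocks; /* data that must reach disk before commit */
    size_t metadata_blocks;   /* journaled metadata (simplified per inode) */
};

struct toy_txn {
    struct toy_inode *inodes[TOY_MAX_INODES]; /* inodes attached to the txn */
    size_t nr_inodes;
};

static void toy_txn_attach(struct toy_txn *t, struct toy_inode *inode)
{
    assert(t->nr_inodes < TOY_MAX_INODES);
    t->inodes[t->nr_inodes++] = inode;
}

/*
 * Ordered mode: the data of every attached inode is written before the
 * commit record, plus the journaled metadata. Which inode called fsync()
 * does not matter, because the whole transaction is committed.
 */
static size_t ordered_commit_blocks(const struct toy_txn *t)
{
    size_t total = 0;

    for (size_t i = 0; i < t->nr_inodes; i++)
        total += t->inodes[i]->dirty_data_blocks +
                 t->inodes[i]->metadata_blocks;
    return total;
}

/*
 * Writeback mode, for contrast: other inodes' data is not forced out
 * before the commit. (fsync() still writes the file's own data
 * separately; that is not modelled here.)
 */
static size_t writeback_commit_blocks(const struct toy_txn *t)
{
    size_t total = 0;

    for (size_t i = 0; i < t->nr_inodes; i++)
        total += t->inodes[i]->metadata_blocks;
    return total;
}
```

For example, attaching a 1-block file that is being fsync'ed and a 10000-block file written in the background gives ordered_commit_blocks() == 10005, while writeback_commit_blocks() == 4.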

Thank you for your reply; I will try to fix this problem in my spare time.

Below is an excerpt from the iJournaling paper that may help anyone
(possibly including me) who is not familiar with why delayed block
allocation causes long-latency fsync calls:

The delayed block allocation technique of ext4 aggravates the CTX
problem (which shows up in the fsync call).

However, if an fsync is called just after the flush kernel thread
invocation, as shown in the example in Figure 1(a), the flush thread
will allocate data blocks for dirty pages, and register several modified
inodes in the running transaction during the delayed block allocation.
Then, the commit operation of the journal transaction will generate many
write requests into storage.
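The paper's point can be sketched the same way. This is again a hypothetical model, not ext4 code: it assumes a file is registered in the running transaction only once its blocks are allocated, and it ignores that the flush thread also submits the data I/O itself. Under those assumptions, an fsync that runs before the periodic flush waits only for its own file, while one that runs right after the flush waits for every file the flush allocated blocks for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of delayed allocation; not ext4 code. */
struct toy_file {
    size_t delalloc_pages;  /* dirty pages with no blocks allocated yet */
    size_t allocated_dirty; /* dirty pages whose blocks are allocated */
    bool in_running_txn;    /* registered in the running transaction? */
};

/*
 * Allocating blocks changes metadata (bitmaps, extent tree, inode), so in
 * this model the file is registered in the running transaction.
 */
static void toy_allocate_blocks(struct toy_file *f)
{
    assert(f != NULL);
    if (f->delalloc_pages == 0)
        return;
    f->allocated_dirty += f->delalloc_pages;
    f->delalloc_pages = 0;
    f->in_running_txn = true;
}

/* The periodic flush thread allocates blocks for every dirty file. */
static void toy_flush_thread(struct toy_file *files, size_t n)
{
    for (size_t i = 0; i < n; i++)
        toy_allocate_blocks(&files[i]);
}

/* fsync() writes back (and so allocates blocks for) only its own file. */
static void toy_fsync_writeback(struct toy_file *f)
{
    toy_allocate_blocks(f);
}

/* Data pages an ordered-mode commit has to wait for. */
static size_t toy_commit_wait_pages(const struct toy_file *files, size_t n)
{
    size_t pages = 0;

    for (size_t i = 0; i < n; i++)
        if (files[i].in_running_txn)
            pages += files[i].allocated_dirty;
    return pages;
}
```

With three files holding 2, 500 and 700 delayed pages, an fsync of the first file waits for 2 pages if it runs before the flush, but for 1202 pages if it runs right after it.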

Could someone explain in more detail why ext4's delayed block allocation
causes long-latency fsync calls?
Many thanks.

________________________________________
From: Theodore Y. Ts'o <tytso@mit.edu>
Sent: October 30, 2019 5:35
To: Xiaohui1 Li 李晓辉
Cc: linux-ext4@vger.kernel.org; harshadshirwadkar@gmail.com
Subject: [External Mail]Re: [PATCH v3 09/13] ext4: fast-commit commit path changes

On Tue, Oct 29, 2019 at 11:43:54AM +0000, Xiaohui1 Li 李晓辉 wrote:
> > We don't actually have to do this.  Strictly speaking, we only have to
> > write out the specific inode being fsync'ed, or the specific inode for
> > which ext4_nfs_commit_metadata() has been called.  For an fsync()
> > workload, especially one where for example, we might have hundreds of
> > modified inodes, all of which are fc-eligible --- for example, because
> > a kernel build is happening in the background, and a single file which
> > is being fsync'ed --- for example because the programmer has just
> > saved a source file in emacs --- we only need to include that single
> > inode in the fast commit.  Including *all* of the inodes in the
> > i_fc_list in the fast commit, is wasted effort, especially since the
> > inodes in question will be committed within the next 5 seconds.
>
> You said we only need to include that single inode in the fast commit.
> Do you mean creating a fast-commit transaction that only needs to
> commit the data and metadata of that specific inode?  But in your last
> email, you said "we can't just separate out some of the handles from
> others in one transaction".
>
> So if we include only that single inode (i.e. the one being fsync'ed)
> in the fast commit: in the traditional ext4 path, the metadata of the
> inode being fsync'ed has to be mixed with that of other inodes that
> are not being fsync'ed (which may be doing buffered writes) in one
> transaction, and flushed to disk together, because of the entangled
> dependencies you described in your last reply.
>
> But with the fast-commit patches applied, how can the metadata and
> data of that single fsync'ed inode be extracted from all the file
> metadata changes made during the same time range?

Did you read the iJournaling Usenix paper[1] which I referenced
earlier?  It's described in there.

[1] https://www.usenix.org/conference/atc17/technical-sessions/presentation/park

The trick is that we track whether the inode has changes which we
can't represent in the fast commit "logical journal".  In the logical
journal, we record changes since the last full commit, not as the full
physical metadata block, but just bits of the logical metadata that
have changed.  If that inode has changed in ways that we can't
represent in the fast commit journal, then we do a normal full commit.

So we avoid entangled dependencies in two ways.  First of all, we
only journal the logical change.  Hence, if there is a change in
another part of the metadata block (say, another inode in the inode
table) there won't be an issue, since we only update that one inode.
Secondly, if the inode has some entanglements, either (a) with other
inodes, or (b) changes in the inode which we can't reflect in the fast
commit log, then we fall back to doing a full commit.
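The difference between the physical and the logical journal can be illustrated with an inode table block. This assumes 4 KiB blocks and 256-byte on-disk inodes (16 inodes per block), and `toy_fc_inode_rec` is a hypothetical record layout, not the on-disk fast-commit format. Copying the whole block drags along an unrelated change to another inode in the same block; the logical record carries only the fsync'ed inode's fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed geometry: 4 KiB blocks and 256-byte on-disk inodes. */
#define TOY_BLOCK_SIZE 4096
#define TOY_INODE_SIZE 256
#define TOY_INODES_PER_BLOCK (TOY_BLOCK_SIZE / TOY_INODE_SIZE)

/* Hypothetical logical record: only the fields that changed. */
struct toy_fc_inode_rec {
    uint32_t ino;
    uint32_t new_mtime;
    uint64_t new_size;
};

/*
 * Physical journaling copies the whole inode table block, so a change to
 * any other inode in the same block is carried into the same commit.
 */
static void toy_physical_log(uint8_t *journal, const uint8_t *itable_block)
{
    assert(journal != itable_block);
    memcpy(journal, itable_block, TOY_BLOCK_SIZE);
}

/* Logical journaling records just this one inode's change. */
static struct toy_fc_inode_rec toy_logical_log(uint32_t ino, uint32_t mtime,
                                               uint64_t size)
{
    struct toy_fc_inode_rec rec = {
        .ino = ino, .new_mtime = mtime, .new_size = size,
    };
    return rec;
}
```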

So basically, we only deal with the simple, common cases, where it's
easy to log changes to the fast commit log.  Now, those changes are
also logged in the normal physical commit, so once we do a full
commit, all of the entries in the fast commit log are no longer needed
--- the fast commit just contains the small, simple changes since the
last full commit.
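Because the fast-commit log only holds changes since the last full commit, its lifecycle can be modelled as an append-only array that a full commit empties. This is a hypothetical sketch: the entry format, the fixed capacity and the "log full means do a full commit" rule are assumptions for illustration. Replay applies entries in order on top of the state from the last full commit, so a later entry for the same inode wins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fast-commit log; the capacity is an arbitrary assumption. */
#define TOY_FC_LOG_CAP 128

struct toy_fc_entry {
    uint32_t ino;
    uint64_t new_size;
};

struct toy_fc_log {
    struct toy_fc_entry entries[TOY_FC_LOG_CAP];
    size_t len;
};

/*
 * Append one small logical change made since the last full commit.
 * Returns 0, or -1 if the log is full (in this model the caller would
 * then fall back to a full commit).
 */
static int toy_fc_append(struct toy_fc_log *log, uint32_t ino, uint64_t size)
{
    if (log->len == TOY_FC_LOG_CAP)
        return -1;
    log->entries[log->len].ino = ino;
    log->entries[log->len].new_size = size;
    log->len++;
    return 0;
}

/*
 * A full (physical) commit already contains every change in the
 * fast-commit log, so the log entries are no longer needed.
 */
static void toy_full_commit(struct toy_fc_log *log)
{
    log->len = 0;
}

/* Recovery: replay entries in order over the last full commit's state. */
static void toy_fc_replay(const struct toy_fc_log *log, uint64_t *sizes,
                          size_t nr_inodes)
{
    for (size_t i = 0; i < log->len; i++) {
        assert(log->entries[i].ino < nr_inodes);
        sizes[log->entries[i].ino] = log->entries[i].new_size;
    }
}
```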

Cheers,

                                                - Ted