linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Chao Yu <yuchao0@huawei.com>
Cc: linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [f2fs-dev] [PATCH v3] f2fs: fix performance issue observed with multi-thread sequential read
Date: Tue, 14 Aug 2018 10:28:37 -0700	[thread overview]
Message-ID: <20180814172837.GD56510@jaegeuk-macbookpro.roam.corp.google.com> (raw)
In-Reply-To: <7194f3d2-b875-376a-4c46-37c598bf3e8e@huawei.com>

On 08/14, Chao Yu wrote:
> On 2018/8/14 12:04, Jaegeuk Kim wrote:
> > On 08/14, Chao Yu wrote:
> >> On 2018/8/14 4:11, Jaegeuk Kim wrote:
> >>> On 08/13, Chao Yu wrote:
> >>>> Hi Jaegeuk,
> >>>>
> >>>> On 2018/8/11 2:56, Jaegeuk Kim wrote:
> >>>>> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
> >>>>> to fix the drop in sequential read throughput.
> >>>>>
> >>>>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
> >>>>> device: UFS
> >>>>>
> >>>>> Before -
> >>>>> read throughput: 185 MB/s
> >>>>> total read requests: 85177 (of these ~80000 are 4KB size requests).
> >>>>> total write requests: 2546 (of these ~2208 requests are written in 512KB).
> >>>>>
> >>>>> After -
> >>>>> read throughput: 758 MB/s
> >>>>> total read requests: 2417 (of these ~2042 are 512KB reads).
> >>>>> total write requests: 2701 (of these ~2034 requests are written in 512KB).
> >>>>
> >>>> IMO, it only impact sequential read performance in a large file which may be
> >>>> fragmented during multi-thread writing.
> >>>>
> >>>> In android environment, mostly, the large file should be cold type, such as apk,
> >>>> mp3, rmvb, jpeg..., so I think we only need to serialize writepages() for cold
> >>>> data area writer.
> >>>>
> >>>> So how about adding a mount option to serialize writepage() for different type
> >>>> of log, e.g. in android, using serialize=4; by default, using serialize=7
> >>>> HOT_DATA	1
> >>>> WARM_DATA	2
> >>>> COLD_DATA	4
> >>>
> >>> Well, I don't think we need to give too many mount options for this fragmented
> >>> case. How about doing this for the large files only like this?
> >>
> >> Thread A write 512 pages			Thread B write 8 pages
> >>
> >> - writepages()
> >>  - mutex_lock(&sbi->writepages);
> >>   - writepage();
> >> ...
> >> 						- writepages()
> >> 						 - writepage()
> >> 						  ....
> >>   - writepage();
> >> ...
> >>  - mutex_unlock(&sbi->writepages);
> >>
> >> Above case will also cause fragmentation since we didn't serialize all
> >> concurrent IO with the lock.
> >>
> >> Do we need to consider such case?
> > 
> > We can simply allow 512 and 8 in the same segment, which would not a big deal,
> > when considering starvation of Thread B.
> 
> Yeah, but in reality, there would be more threads competing in same log header,
> so I worry that the effect of defragmenting will not so good as we expect,
> anyway, for benchmark, it's enough.

Basically, I think this is not a benchmark issue. :) It just reveals the issue
much easily. Let me think about three cases:
1) WB_SYNC_NONE & WB_SYNC_NONE
 -> can simply use mutex_lock

2) WB_SYNC_ALL & WB_SYNC_NONE
 -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, while WB_SYNC_NONE
    will skip writing blocks

3) WB_SYNC_ALL & WB_SYNC_ALL
 -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, in order to avoid
    starvation.


I've been testing the below.

if (!S_ISDIR(inode->i_mode) && (wbc->sync_mode != WB_SYNC_ALL ||
                get_dirty_pages(inode) <= SM_I(sbi)->min_seq_blocks)) {
        mutex_lock(&sbi->writepages);
        locked = true;
}

Thanks,

> 
> Thanks,
> 
> > 
> >>
> >> Thanks,
> >>
> >>>
> >>> >From 4fea0b6e4da8512a72dd52afc7a51beb35966ad9 Mon Sep 17 00:00:00 2001
> >>> From: Jaegeuk Kim <jaegeuk@kernel.org>
> >>> Date: Thu, 9 Aug 2018 17:53:34 -0700
> >>> Subject: [PATCH] f2fs: fix performance issue observed with multi-thread
> >>>  sequential read
> >>>
> >>> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
> >>> to fix the drop in sequential read throughput.
> >>>
> >>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
> >>> device: UFS
> >>>
> >>> Before -
> >>> read throughput: 185 MB/s
> >>> total read requests: 85177 (of these ~80000 are 4KB size requests).
> >>> total write requests: 2546 (of these ~2208 requests are written in 512KB).
> >>>
> >>> After -
> >>> read throughput: 758 MB/s
> >>> total read requests: 2417 (of these ~2042 are 512KB reads).
> >>> total write requests: 2701 (of these ~2034 requests are written in 512KB).
> >>>
> >>> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> >>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> >>> ---
> >>>  Documentation/ABI/testing/sysfs-fs-f2fs |  8 ++++++++
> >>>  fs/f2fs/data.c                          | 10 ++++++++++
> >>>  fs/f2fs/f2fs.h                          |  2 ++
> >>>  fs/f2fs/segment.c                       |  1 +
> >>>  fs/f2fs/super.c                         |  1 +
> >>>  fs/f2fs/sysfs.c                         |  2 ++
> >>>  6 files changed, 24 insertions(+)
> >>>
> >>> diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
> >>> index 9b0123388f18..94a24aedcdb2 100644
> >>> --- a/Documentation/ABI/testing/sysfs-fs-f2fs
> >>> +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
> >>> @@ -51,6 +51,14 @@ Description:
> >>>  		 Controls the dirty page count condition for the in-place-update
> >>>  		 policies.
> >>>  
> >>> +What:		/sys/fs/f2fs/<disk>/min_seq_blocks
> >>> +Date:		August 2018
> >>> +Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
> >>> +Description:
> >>> +		 Controls the dirty page count condition for batched sequential
> >>> +		 writes in ->writepages.
> >>> +
> >>> +
> >>>  What:		/sys/fs/f2fs/<disk>/min_hot_blocks
> >>>  Date:		March 2017
> >>>  Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
> >>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> >>> index 45f043ee48bd..f09231b1cc74 100644
> >>> --- a/fs/f2fs/data.c
> >>> +++ b/fs/f2fs/data.c
> >>> @@ -2132,6 +2132,7 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
> >>>  	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> >>>  	struct blk_plug plug;
> >>>  	int ret;
> >>> +	bool locked = false;
> >>>  
> >>>  	/* deal with chardevs and other special file */
> >>>  	if (!mapping->a_ops->writepage)
> >>> @@ -2162,10 +2163,19 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
> >>>  	else if (atomic_read(&sbi->wb_sync_req[DATA]))
> >>>  		goto skip_write;
> >>>  
> >>> +	if (!S_ISDIR(inode->i_mode) &&
> >>> +			get_dirty_pages(inode) <= SM_I(sbi)->min_seq_blocks) {
> >>> +		mutex_lock(&sbi->writepages);
> >>> +		locked = true;
> >>> +	}
> >>> +
> >>>  	blk_start_plug(&plug);
> >>>  	ret = f2fs_write_cache_pages(mapping, wbc, io_type);
> >>>  	blk_finish_plug(&plug);
> >>>  
> >>> +	if (locked)
> >>> +		mutex_unlock(&sbi->writepages);
> >>> +
> >>>  	if (wbc->sync_mode == WB_SYNC_ALL)
> >>>  		atomic_dec(&sbi->wb_sync_req[DATA]);
> >>>  	/*
> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>> index 375aa9f30cfa..098bdedc28bf 100644
> >>> --- a/fs/f2fs/f2fs.h
> >>> +++ b/fs/f2fs/f2fs.h
> >>> @@ -913,6 +913,7 @@ struct f2fs_sm_info {
> >>>  	unsigned int ipu_policy;	/* in-place-update policy */
> >>>  	unsigned int min_ipu_util;	/* in-place-update threshold */
> >>>  	unsigned int min_fsync_blocks;	/* threshold for fsync */
> >>> +	unsigned int min_seq_blocks;	/* threshold for sequential blocks */
> >>>  	unsigned int min_hot_blocks;	/* threshold for hot block allocation */
> >>>  	unsigned int min_ssr_sections;	/* threshold to trigger SSR allocation */
> >>>  
> >>> @@ -1133,6 +1134,7 @@ struct f2fs_sb_info {
> >>>  	struct rw_semaphore sb_lock;		/* lock for raw super block */
> >>>  	int valid_super_block;			/* valid super block no */
> >>>  	unsigned long s_flag;				/* flags for sbi */
> >>> +	struct mutex writepages;		/* mutex for writepages() */
> >>>  
> >>>  #ifdef CONFIG_BLK_DEV_ZONED
> >>>  	unsigned int blocks_per_blkz;		/* F2FS blocks per zone */
> >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >>> index 63fc647f9ac2..ffea2d1303bd 100644
> >>> --- a/fs/f2fs/segment.c
> >>> +++ b/fs/f2fs/segment.c
> >>> @@ -4131,6 +4131,7 @@ int f2fs_build_segment_manager(struct f2fs_sb_info *sbi)
> >>>  		sm_info->ipu_policy = 1 << F2FS_IPU_FSYNC;
> >>>  	sm_info->min_ipu_util = DEF_MIN_IPU_UTIL;
> >>>  	sm_info->min_fsync_blocks = DEF_MIN_FSYNC_BLOCKS;
> >>> +	sm_info->min_seq_blocks = sbi->blocks_per_seg * sbi->segs_per_sec;
> >>>  	sm_info->min_hot_blocks = DEF_MIN_HOT_BLOCKS;
> >>>  	sm_info->min_ssr_sections = reserved_sections(sbi);
> >>>  
> >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >>> index be41dbd7b261..53d70b64fea1 100644
> >>> --- a/fs/f2fs/super.c
> >>> +++ b/fs/f2fs/super.c
> >>> @@ -2842,6 +2842,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>  	/* init f2fs-specific super block info */
> >>>  	sbi->valid_super_block = valid_super_block;
> >>>  	mutex_init(&sbi->gc_mutex);
> >>> +	mutex_init(&sbi->writepages);
> >>>  	mutex_init(&sbi->cp_mutex);
> >>>  	init_rwsem(&sbi->node_write);
> >>>  	init_rwsem(&sbi->node_change);
> >>> diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> >>> index cd2e030e47b8..81c0e5337443 100644
> >>> --- a/fs/f2fs/sysfs.c
> >>> +++ b/fs/f2fs/sysfs.c
> >>> @@ -397,6 +397,7 @@ F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, batched_trim_sections, trim_sections);
> >>>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, ipu_policy, ipu_policy);
> >>>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_ipu_util, min_ipu_util);
> >>>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_fsync_blocks, min_fsync_blocks);
> >>> +F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_seq_blocks, min_seq_blocks);
> >>>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_hot_blocks, min_hot_blocks);
> >>>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_ssr_sections, min_ssr_sections);
> >>>  F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, ram_thresh, ram_thresh);
> >>> @@ -449,6 +450,7 @@ static struct attribute *f2fs_attrs[] = {
> >>>  	ATTR_LIST(ipu_policy),
> >>>  	ATTR_LIST(min_ipu_util),
> >>>  	ATTR_LIST(min_fsync_blocks),
> >>> +	ATTR_LIST(min_seq_blocks),
> >>>  	ATTR_LIST(min_hot_blocks),
> >>>  	ATTR_LIST(min_ssr_sections),
> >>>  	ATTR_LIST(max_victim_search),
> >>>
> > 
> > .
> > 

  reply	other threads:[~2018-08-14 17:28 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-10  2:37 [PATCH] f2fs: fix performance issue observed with multi-thread sequential read Jaegeuk Kim
2018-08-10  2:48 ` [PATCH v2] " Jaegeuk Kim
2018-08-10 18:56   ` [f2fs-dev] [PATCH v3] " Jaegeuk Kim
2018-08-13  2:20     ` Chao Yu
2018-08-13 20:11       ` Jaegeuk Kim
2018-08-14  3:49         ` Chao Yu
2018-08-14  4:04           ` Jaegeuk Kim
2018-08-14  6:18             ` Chao Yu
2018-08-14 17:28               ` Jaegeuk Kim [this message]
2018-08-15  1:52                 ` Chao Yu
2018-08-15  2:15                   ` Jaegeuk Kim
2018-08-15  3:44                     ` Chao Yu
2018-08-16  1:34                       ` Jaegeuk Kim
2018-08-16 11:39                         ` Chao Yu
2018-08-17 18:29     ` [f2fs-dev] [PATCH v4] " Jaegeuk Kim
2018-08-20  1:21       ` Chao Yu
2018-08-21  2:28         ` Jaegeuk Kim
2018-08-21  3:26           ` Chao Yu
2018-08-21  2:36       ` [f2fs-dev] [PATCH v5] " Jaegeuk Kim
2018-08-21  3:30         ` Chao Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180814172837.GD56510@jaegeuk-macbookpro.roam.corp.google.com \
    --to=jaegeuk@kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=yuchao0@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).