From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [f2fs-dev] [PATCH v3] f2fs: fix performance issue observed with
 multi-thread sequential read
To: Jaegeuk Kim
CC: ,
References: <20180810023758.46974-1-jaegeuk@kernel.org>
 <20180810024859.GA48219@jaegeuk-macbookpro.roam.corp.google.com>
 <20180810185640.GA63079@jaegeuk-macbookpro.roam.corp.google.com>
 <57149b77-3576-87ed-2cae-a3bc2e8088f2@huawei.com>
 <20180813201150.GA27044@jaegeuk-macbookpro.roam.corp.google.com>
 <0e07e79b-5b67-9355-1fa0-402a5d118bd1@huawei.com>
 <20180814040434.GA52730@jaegeuk-macbookpro.roam.corp.google.com>
 <7194f3d2-b875-376a-4c46-37c598bf3e8e@huawei.com>
 <20180814172837.GD56510@jaegeuk-macbookpro.roam.corp.google.com>
 <167ce8f1-ee4d-2ffc-3518-32850465cd0c@huawei.com>
 <20180815021524.GA84720@jaegeuk-macbookpro.roam.corp.google.com>
From: Chao Yu
Message-ID:
Date: Wed, 15 Aug 2018 11:44:11 +0800
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <20180815021524.GA84720@jaegeuk-macbookpro.roam.corp.google.com>
Content-Type: text/plain; charset="windows-1252"
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2018/8/15 10:15, Jaegeuk Kim wrote:
> On 08/15, Chao Yu wrote:
>> On 2018/8/15 1:28, Jaegeuk Kim wrote:
>>> On 08/14, Chao Yu wrote:
>>>> On 2018/8/14 12:04, Jaegeuk Kim wrote:
>>>>> On 08/14, Chao Yu wrote:
>>>>>> On 2018/8/14 4:11, Jaegeuk Kim wrote:
>>>>>>> On 08/13, Chao Yu wrote:
>>>>>>>> Hi Jaegeuk,
>>>>>>>>
>>>>>>>> On 2018/8/11 2:56, Jaegeuk Kim wrote:
>>>>>>>>> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
>>>>>>>>> to fix the drop in sequential read throughput.
>>>>>>>>>
>>>>>>>>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
>>>>>>>>> device: UFS
>>>>>>>>>
>>>>>>>>> Before -
>>>>>>>>> read throughput: 185 MB/s
>>>>>>>>> total read requests: 85177 (of these ~80000 are 4KB size requests).
>>>>>>>>> total write requests: 2546 (of these ~2208 requests are written in 512KB).
>>>>>>>>>
>>>>>>>>> After -
>>>>>>>>> read throughput: 758 MB/s
>>>>>>>>> total read requests: 2417 (of these ~2042 are 512KB reads).
>>>>>>>>> total write requests: 2701 (of these ~2034 requests are written in 512KB).
>>>>>>>>
>>>>>>>> IMO, it only impacts sequential read performance on a large file which may be
>>>>>>>> fragmented during multi-threaded writing.
>>>>>>>>
>>>>>>>> In an Android environment, large files are mostly of cold type, such as apk,
>>>>>>>> mp3, rmvb, jpeg..., so I think we only need to serialize writepages() for the
>>>>>>>> cold data area writer.
>>>>>>>>
>>>>>>>> So how about adding a mount option to serialize writepage() for different log
>>>>>>>> types, e.g. in Android, using serialize=4; by default, using serialize=7:
>>>>>>>> HOT_DATA	1
>>>>>>>> WARM_DATA	2
>>>>>>>> COLD_DATA	4
>>>>>>>
>>>>>>> Well, I don't think we need to give too many mount options for this fragmented
>>>>>>> case. How about doing this for the large files only, like this?
>>>>>>
>>>>>> Thread A writes 512 pages			Thread B writes 8 pages
>>>>>>
>>>>>> - writepages()
>>>>>>  - mutex_lock(&sbi->writepages);
>>>>>>   - writepage();
>>>>>> ...
>>>>>>						- writepages()
>>>>>>						 - writepage()
>>>>>> ....
>>>>>>   - writepage();
>>>>>> ...
>>>>>>  - mutex_unlock(&sbi->writepages);
>>>>>>
>>>>>> The above case will also cause fragmentation, since we didn't serialize all
>>>>>> concurrent IO with the lock.
>>>>>>
>>>>>> Do we need to consider such a case?
>>>>>
>>>>> We can simply allow 512 and 8 in the same segment, which would not be a big deal
>>>>> when considering starvation of Thread B.
>>>>
>>>> Yeah, but in reality there would be more threads competing for the same log header,
>>>> so I worry that the effect of defragmenting will not be as good as we expect;
>>>> anyway, for the benchmark, it's enough.
>>>
>>> Basically, I think this is not a benchmark issue. :) It just reveals the issue
>>> more easily. Let me think about three cases:
>>> 1) WB_SYNC_NONE & WB_SYNC_NONE
>>>  -> can simply use mutex_lock
>>>
>>> 2) WB_SYNC_ALL & WB_SYNC_NONE
>>>  -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, while WB_SYNC_NONE
>>>     will skip writing blocks
>>>
>>> 3) WB_SYNC_ALL & WB_SYNC_ALL
>>>  -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, in order to avoid
>>>     starvation.
>>>
>>>
>>> I've been testing the below.
>>>
>>> 	if (!S_ISDIR(inode->i_mode) && (wbc->sync_mode != WB_SYNC_ALL ||
>>> 			get_dirty_pages(inode) <= SM_I(sbi)->min_seq_blocks)) {
>>> 		mutex_lock(&sbi->writepages);
>>> 		locked = true;
>>
>> Does this just cover buffered IO? How about covering direct IO and atomic
>> writes as well?
>
> I'd expect direct IO to do in-place updates, and I'm not sure whether we need to

For initial writes, they are not IPU.

> add another point of lock contention between buffered and direct IO. Atomic writes
> would be covered by ->min_seq_blocks.

Okay. :)

Thanks,

>
>>
>> Thanks,
>>
>>> 	}
>>>
>>> Thanks,
>>>
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>
>>>>>>> >From 4fea0b6e4da8512a72dd52afc7a51beb35966ad9 Mon Sep 17 00:00:00 2001
>>>>>>> From: Jaegeuk Kim
>>>>>>> Date: Thu, 9 Aug 2018 17:53:34 -0700
>>>>>>> Subject: [PATCH] f2fs: fix performance issue observed with multi-thread
>>>>>>>  sequential read
>>>>>>>
>>>>>>> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
>>>>>>> to fix the drop in sequential read throughput.
>>>>>>>
>>>>>>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
>>>>>>> device: UFS
>>>>>>>
>>>>>>> Before -
>>>>>>> read throughput: 185 MB/s
>>>>>>> total read requests: 85177 (of these ~80000 are 4KB size requests).
>>>>>>> total write requests: 2546 (of these ~2208 requests are written in 512KB).
>>>>>>>
>>>>>>> After -
>>>>>>> read throughput: 758 MB/s
>>>>>>> total read requests: 2417 (of these ~2042 are 512KB reads).
>>>>>>> total write requests: 2701 (of these ~2034 requests are written in 512KB).
>>>>>>>
>>>>>>> Signed-off-by: Sahitya Tummala
>>>>>>> Signed-off-by: Jaegeuk Kim
>>>>>>> ---
>>>>>>>  Documentation/ABI/testing/sysfs-fs-f2fs |  8 ++++++++
>>>>>>>  fs/f2fs/data.c                          | 10 ++++++++++
>>>>>>>  fs/f2fs/f2fs.h                          |  2 ++
>>>>>>>  fs/f2fs/segment.c                       |  1 +
>>>>>>>  fs/f2fs/super.c                         |  1 +
>>>>>>>  fs/f2fs/sysfs.c                         |  2 ++
>>>>>>>  6 files changed, 24 insertions(+)
>>>>>>>
>>>>>>> diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
>>>>>>> index 9b0123388f18..94a24aedcdb2 100644
>>>>>>> --- a/Documentation/ABI/testing/sysfs-fs-f2fs
>>>>>>> +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
>>>>>>> @@ -51,6 +51,14 @@ Description:
>>>>>>>  		Controls the dirty page count condition for the in-place-update
>>>>>>>  		policies.
>>>>>>>
>>>>>>> +What:		/sys/fs/f2fs//min_seq_blocks
>>>>>>> +Date:		August 2018
>>>>>>> +Contact:	"Jaegeuk Kim"
>>>>>>> +Description:
>>>>>>> +		Controls the dirty page count condition for batched sequential
>>>>>>> +		writes in ->writepages.
>>>>>>> +
>>>>>>> +
>>>>>>>  What:		/sys/fs/f2fs//min_hot_blocks
>>>>>>>  Date:		March 2017
>>>>>>>  Contact:	"Jaegeuk Kim"
>>>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>>>>> index 45f043ee48bd..f09231b1cc74 100644
>>>>>>> --- a/fs/f2fs/data.c
>>>>>>> +++ b/fs/f2fs/data.c
>>>>>>> @@ -2132,6 +2132,7 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
>>>>>>>  	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>>>>>>>  	struct blk_plug plug;
>>>>>>>  	int ret;
>>>>>>> +	bool locked = false;
>>>>>>>
>>>>>>>  	/* deal with chardevs and other special file */
>>>>>>>  	if (!mapping->a_ops->writepage)
>>>>>>> @@ -2162,10 +2163,19 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
>>>>>>>  	else if (atomic_read(&sbi->wb_sync_req[DATA]))
>>>>>>>  		goto skip_write;
>>>>>>>
>>>>>>> +	if (!S_ISDIR(inode->i_mode) &&
>>>>>>> +			get_dirty_pages(inode) <= SM_I(sbi)->min_seq_blocks) {
>>>>>>> +		mutex_lock(&sbi->writepages);
>>>>>>> +		locked = true;
>>>>>>> +	}
>>>>>>> +
>>>>>>>  	blk_start_plug(&plug);
>>>>>>>  	ret = f2fs_write_cache_pages(mapping, wbc, io_type);
>>>>>>>  	blk_finish_plug(&plug);
>>>>>>>
>>>>>>> +	if (locked)
>>>>>>> +		mutex_unlock(&sbi->writepages);
>>>>>>> +
>>>>>>>  	if (wbc->sync_mode == WB_SYNC_ALL)
>>>>>>>  		atomic_dec(&sbi->wb_sync_req[DATA]);
>>>>>>>  	/*
>>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>>>> index 375aa9f30cfa..098bdedc28bf 100644
>>>>>>> --- a/fs/f2fs/f2fs.h
>>>>>>> +++ b/fs/f2fs/f2fs.h
>>>>>>> @@ -913,6 +913,7 @@ struct f2fs_sm_info {
>>>>>>>  	unsigned int ipu_policy;	/* in-place-update policy */
>>>>>>>  	unsigned int min_ipu_util;	/* in-place-update threshold */
>>>>>>>  	unsigned int min_fsync_blocks;	/* threshold for fsync */
>>>>>>> +	unsigned int min_seq_blocks;	/* threshold for sequential blocks */
>>>>>>>  	unsigned int min_hot_blocks;	/* threshold for hot block allocation */
>>>>>>>  	unsigned int min_ssr_sections;	/* threshold to trigger SSR allocation */
>>>>>>>
>>>>>>> @@ -1133,6 +1134,7 @@ struct f2fs_sb_info {
>>>>>>>  	struct rw_semaphore sb_lock;		/* lock for raw super block */
>>>>>>>  	int valid_super_block;			/* valid super block no */
>>>>>>>  	unsigned long s_flag;			/* flags for sbi */
>>>>>>> +	struct mutex writepages;		/* mutex for writepages() */
>>>>>>>
>>>>>>>  #ifdef CONFIG_BLK_DEV_ZONED
>>>>>>>  	unsigned int blocks_per_blkz;		/* F2FS blocks per zone */
>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>>> index 63fc647f9ac2..ffea2d1303bd 100644
>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>> @@ -4131,6 +4131,7 @@ int f2fs_build_segment_manager(struct f2fs_sb_info *sbi)
>>>>>>>  	sm_info->ipu_policy = 1 << F2FS_IPU_FSYNC;
>>>>>>>  	sm_info->min_ipu_util = DEF_MIN_IPU_UTIL;
>>>>>>>  	sm_info->min_fsync_blocks = DEF_MIN_FSYNC_BLOCKS;
>>>>>>> +	sm_info->min_seq_blocks = sbi->blocks_per_seg * sbi->segs_per_sec;
>>>>>>>  	sm_info->min_hot_blocks = DEF_MIN_HOT_BLOCKS;
>>>>>>>  	sm_info->min_ssr_sections = reserved_sections(sbi);
>>>>>>>
>>>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>>>>> index be41dbd7b261..53d70b64fea1 100644
>>>>>>> --- a/fs/f2fs/super.c
>>>>>>> +++ b/fs/f2fs/super.c
>>>>>>> @@ -2842,6 +2842,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>>>  	/* init f2fs-specific super block info */
>>>>>>>  	sbi->valid_super_block = valid_super_block;
>>>>>>>  	mutex_init(&sbi->gc_mutex);
>>>>>>> +	mutex_init(&sbi->writepages);
>>>>>>>  	mutex_init(&sbi->cp_mutex);
>>>>>>>  	init_rwsem(&sbi->node_write);
>>>>>>>  	init_rwsem(&sbi->node_change);
>>>>>>> diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
>>>>>>> index cd2e030e47b8..81c0e5337443 100644
>>>>>>> --- a/fs/f2fs/sysfs.c
>>>>>>> +++ b/fs/f2fs/sysfs.c
>>>>>>> @@ -397,6 +397,7 @@ F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, batched_trim_sections, trim_sections);
>>>>>>>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, ipu_policy, ipu_policy);
>>>>>>>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_ipu_util, min_ipu_util);
>>>>>>>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_fsync_blocks, min_fsync_blocks);
>>>>>>> +F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_seq_blocks, min_seq_blocks);
>>>>>>>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_hot_blocks, min_hot_blocks);
>>>>>>>  F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_ssr_sections, min_ssr_sections);
>>>>>>>  F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, ram_thresh, ram_thresh);
>>>>>>> @@ -449,6 +450,7 @@ static struct attribute *f2fs_attrs[] = {
>>>>>>>  	ATTR_LIST(ipu_policy),
>>>>>>>  	ATTR_LIST(min_ipu_util),
>>>>>>>  	ATTR_LIST(min_fsync_blocks),
>>>>>>> +	ATTR_LIST(min_seq_blocks),
>>>>>>>  	ATTR_LIST(min_hot_blocks),
>>>>>>>  	ATTR_LIST(min_ssr_sections),
>>>>>>>  	ATTR_LIST(max_victim_search),
>>>>>>>
>>>>>
>>>>> .
>>>>>
>>>
>>> .
>>>
>
> .
>
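
For reference, the serialize=<mask> idea floated earlier in the thread could be
prototyped along the lines of the sketch below: the mount option carries a
bitmask (HOT_DATA=1, WARM_DATA=2, COLD_DATA=4) and writepages() would only take
the serialization mutex when the inode's data temperature is selected by the
mask. This is only an illustrative, standalone userspace sketch; the enum, the
need_serialize() helper and the mask variables are assumptions made for
illustration, not code from the posted patch or from f2fs itself.

	#include <stdbool.h>
	#include <stdio.h>

	/* Bit values follow the proposal above: serialize=7 covers all data
	 * logs, serialize=4 covers only cold data (large apk/mp3/rmvb files). */
	#define SERIALIZE_HOT_DATA	0x1
	#define SERIALIZE_WARM_DATA	0x2
	#define SERIALIZE_COLD_DATA	0x4

	/* Illustrative temperature type, mirroring the hot/warm/cold data logs. */
	enum temp_type { HOT = 0, WARM, COLD };

	/* Would the writepages mutex be taken for this data temperature? */
	static bool need_serialize(unsigned int serialize_mask, enum temp_type temp)
	{
		return serialize_mask & (1u << temp);
	}

	int main(void)
	{
		unsigned int android_mask = SERIALIZE_COLD_DATA;	/* serialize=4 */
		unsigned int default_mask = SERIALIZE_HOT_DATA |
					    SERIALIZE_WARM_DATA |
					    SERIALIZE_COLD_DATA;	/* serialize=7 */

		printf("serialize=4: hot=%d cold=%d\n",
		       need_serialize(android_mask, HOT),
		       need_serialize(android_mask, COLD));
		printf("serialize=7: hot=%d cold=%d\n",
		       need_serialize(default_mask, HOT),
		       need_serialize(default_mask, COLD));
		return 0;
	}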