From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from szxga03-in.huawei.com ([45.249.212.189]:2088 "EHLO huawei.com"
        rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
        id S1751037AbeAVI2B (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
        Mon, 22 Jan 2018 03:28:01 -0500
Subject: Re: [PATCH RFC] f2fs: add PRE2 to mark segments free to one
 checkpoint but obsolete to the other
To: guoweichao <guoweichao@huawei.com>, Chao Yu <chao@kernel.org>,
        "jaegeuk@kernel.org" <jaegeuk@kernel.org>
CC: "linux-f2fs-devel@lists.sourceforge.net"
        <linux-f2fs-devel@lists.sourceforge.net>,
        "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
        heyunlei <heyunlei@huawei.com>
References: <20180119232903.40922-1-guoweichao@huawei.com>
 <6a06586a-7432-03c5-912a-410e4aa3b129@kernel.org>
 <927e0099-0bad-1a1c-1b03-ac5f935f3df5@kernel.org>
 <db2b36d3-3f36-7a58-cc1b-55430e3e90b2@huawei.com>
 <78761541-c44f-f0dc-77b2-3782f39d9c5b@huawei.com>
 <8dbafe22-4bd6-7e26-5a7f-802f0b550b98@huawei.com>
From: Chao Yu <yuchao0@huawei.com>
Message-ID: <8ee8ebd6-6f31-2405-9bca-ab96d4afb94d@huawei.com>
Date: Mon, 22 Jan 2018 16:27:51 +0800
MIME-Version: 1.0
In-Reply-To: <8dbafe22-4bd6-7e26-5a7f-802f0b550b98@huawei.com>
Content-Type: text/plain; charset="windows-1252"
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On 2018/1/22 16:03, guoweichao wrote:
> 
> 
> On 2018/1/22 14:19, Chao Yu wrote:
>> On 2018/1/22 10:40, guoweichao wrote:
>>> Hi Chao,
>>>
>>> On 2018/1/21 10:34, Chao Yu wrote:
>>>> Hi Weichao,
>>>>
>>>> On 2018/1/20 23:50, guoweichao wrote:
>>>>> Hi Chao,
>>>>>
>>>>> Yes, it is exactly what I mean.
>>>>> It seems that F2FS has no responsibility to cover hardware problems.
>>>>> However, file systems usually choose redundancy for super block fault tolerance.
>>>>> So I think we actually have considered some external errors when designing a file system.
>>>>> Our dual checkpoint mechanism is mainly designed to keep at least one stable
>>>>> CP pack while creating a new one in case of SPO. And it has fault tolerant effects.
>>>>> As CP pack is also very critical to F2FS, why not make checkpoint more robustness
>>>>
>>>> I think you're trying to change basic design of A/B upgrade system to A/B/C one,
>>>> which can keep always two checkpoint valid. There will be no simple modification
>>>> to implement that one, in where we should cover not only prefree case but also
>>>> SSR case.
>>>>
>>> Nope, I do not intent to add another checkpoint. My main idea is reusing the invalid blocks
>>> after two checkpoints are recorded them as invalid. We could mark or record the blocks that
>>> are free to one checkpoint and valid to the other one as PARTIAL_FREE. And when set prefree
>>> segments as free or use invalid blocks in SSR mode, just igore PARTIAL_FREE blocks. If there
>>> is no other blocks can be used, just write a new checkpoint.
>>
>> Actually, your implementation looks like a variant of A/B/C upgrade system
>> (or A/B upgrade system), as checkpoint C (or inflight A/B) uses main area
>> separated from checkpoint A/B, and its inflight dirty metadata will persist
>> into checkpoint A or B since we haven't actually allocated checkpoint C
>> area during mkfs.
>>
>> Anyway, I don't think we have a strong reason to implement this one.
>>
>>>
>>>> IMO, the biggest problem there is available space, since in checkpoint C, we can
>>>> only use invalid blocks in both checkpoint A and B, so in some cases there will
>>>> almost be no valid space we can use during allocation, result in frequently
>>>> checkpoint.
>>>>
>>>> IMO, what we can do is trying to keep last valid checkpoint being integrity as
>>>> possible as we can. One way is that we can add mirror or parity for the
>>>> checkpoint which can help to do recovery once checkpoint is corrupted. At
>>>> least, I hope that with it in debug version we can help hardware staff to fix
>>>> their issue instead of wasting much time to troubleshoot filesystem issue.
>>>>
>>> Yes, I agree. Adding a backup for the latest checkpoint in debug version is OK. Sometimes,
>>
>> Parity will be better due to light overhead?
> Yes, backup may also cause disk layout changes. How about adding a parity block for the blocks

Right,

> in a checkpoint similar to RAID-5 or other erasure code methods?

Yes, I think we can.

Thanks,

> 
> Thanks,
> 
>>
>> Thanks,
>>
>>> hardware issues are hard to uncover. We should try to separate them from F2FS.
>>>
>>> Thanks,
>>>
>>>> Thanks,
>>>>
>>>>> with simple modification in current design and little overhead except for FG_GC.
>>>>> Of cause, keep unchanged is OK. I just want to discuss this proposal. :)
>>>>>
>>>>> Thanks,
>>>>> *From:*Chao Yu
>>>>> *To:*guoweichao,jaegeuk@kernel.org,
>>>>> *Cc:*linux-f2fs-devel@lists.sourceforge.net,linux-fsdevel@vger.kernel.org,heyunlei,
>>>>> *Date:*2018-01-20 15:43:23
>>>>> *Subject:*Re: [PATCH RFC] f2fs: add PRE2 to mark segments free to one checkpoint but obsolete to the other
>>>>>
>>>>> Hi Weichao,
>>>>>
>>>>> On 2018/1/20 7:29, Weichao Guo wrote:
>>>>>> Currently, we set prefree segments as free ones after writing
>>>>>> a checkpoint, then believe the segments could be used safely.
>>>>>> However, if a sudden power-off coming and the newer checkpoint
>>>>>> corrupted due to hardware issues at the same time, we will try
>>>>>> to use the old checkpoint and may face an inconsistent file
>>>>>> system status.
>>>>>
>>>>> IIUC, you mean:
>>>>>
>>>>> 1. write nodes into segment x;
>>>>> 2. write checkpoint A;
>>>>> 3. remove nodes in segment x;
>>>>> 4. write checkpoint B;
>>>>> 5. issue discard or write datas into segment x;
>>>>> 6. sudden power-cut
>>>>>
>>>>> But after reboot, we found checkpoint B is corrupted due to hardware, and
>>>>> then start to use checkpoint A, but nodes in segment x recorded as valid
>>>>> data in checkpoint A has been overcovered in step 5), so we will encounter
>>>>> inconsistent meta data, right?
>>>>>
>>>>> Thanks,
>>>>>
>>>>>>
>>>>>> How about add an PRE2 status for prefree segments, and make
>>>>>> sure the segments could be used safely to both checkpoints?
>>>>>> Or any better solutions? Or this is not a problem?
>>>>>>
>>>>>> Look forward to your comments!
>>>>>>
>>>>>> Signed-off-by: Weichao Guo <guoweichao@huawei.com>
>>>>>> ---
>>>>>>  fs/f2fs/gc.c      | 11 +++++++++--
>>>>>>  fs/f2fs/segment.c | 21 ++++++++++++++++++---
>>>>>>  fs/f2fs/segment.h |  6 ++++++
>>>>>>  3 files changed, 33 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>>>>> index 33e7969..153e3ea 100644
>>>>>> --- a/fs/f2fs/gc.c
>>>>>> +++ b/fs/f2fs/gc.c
>>>>>> @@ -1030,7 +1030,12 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>>>>>>    * threshold, we can make them free by checkpoint. Then, we
>>>>>>    * secure free segments which doesn't need fggc any more.
>>>>>>    */
>>>>>> - if (prefree_segments(sbi)) {
>>>>>> + if (prefree_segments(sbi) || prefree2_segments(sbi)) {
>>>>>> + ret = write_checkpoint(sbi, &cpc);
>>>>>> + if (ret)
>>>>>> + goto stop;
>>>>>> + }
>>>>>> + if (has_not_enough_free_secs(sbi, 0, 0) && prefree2_segments(sbi)) {
>>>>>>   ret = write_checkpoint(sbi, &cpc);
>>>>>>   if (ret)
>>>>>>   goto stop;
>>>>>> @@ -1063,8 +1068,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>>>>>>   goto gc_more;
>>>>>>   }
>>>>>>  
>>>>>> - if (gc_type == FG_GC)
>>>>>> + if (gc_type == FG_GC) {
>>>>>> + ret = write_checkpoint(sbi, &cpc);
>>>>>>   ret = write_checkpoint(sbi, &cpc);
>>>>>> + }
>>>>>>   }
>>>>>>  stop:
>>>>>>   SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>> index 2e8e054d..9dec445 100644
>>>>>> --- a/fs/f2fs/segment.c
>>>>>> +++ b/fs/f2fs/segment.c
>>>>>> @@ -1606,7 +1606,7 @@ static void set_prefree_as_free_segments(struct f2fs_sb_info *sbi)
>>>>>>   unsigned int segno;
>>>>>>  
>>>>>>   mutex_lock(&dirty_i->seglist_lock);
>>>>>> - for_each_set_bit(segno, dirty_i->dirty_segmap[PRE], MAIN_SEGS(sbi))
>>>>>> + for_each_set_bit(segno, dirty_i->dirty_segmap[PRE2], MAIN_SEGS(sbi))
>>>>>>   __set_test_and_free(sbi, segno);
>>>>>>   mutex_unlock(&dirty_i->seglist_lock);
>>>>>>  }
>>>>>> @@ -1617,13 +1617,17 @@ void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>>>>>   struct list_head *head = &dcc->entry_list;
>>>>>>   struct discard_entry *entry, *this;
>>>>>>   struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
>>>>>> - unsigned long *prefree_map = dirty_i->dirty_segmap[PRE];
>>>>>> + unsigned long *prefree_map;
>>>>>>   unsigned int start = 0, end = -1;
>>>>>>   unsigned int secno, start_segno;
>>>>>>   bool force = (cpc->reason & CP_DISCARD);
>>>>>> + int phase = 0;
>>>>>> + enum dirty_type dirty_type = PRE2;
>>>>>>  
>>>>>>   mutex_lock(&dirty_i->seglist_lock);
>>>>>>  
>>>>>> +next_step:
>>>>>> + prefree_map = dirty_i->dirty_segmap[dirty_type];
>>>>>>   while (1) {
>>>>>>   int i;
>>>>>>   start = find_next_bit(prefree_map, MAIN_SEGS(sbi), end + 1);
>>>>>> @@ -1635,7 +1639,7 @@ void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>>>>>   for (i = start; i < end; i++)
>>>>>>   clear_bit(i, prefree_map);
>>>>>>  
>>>>>> - dirty_i->nr_dirty[PRE] -= end - start;
>>>>>> + dirty_i->nr_dirty[dirty_type] -= end - start;
>>>>>>  
>>>>>>   if (!test_opt(sbi, DISCARD))
>>>>>>   continue;
>>>>>> @@ -1663,6 +1667,16 @@ void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>>>>>   else
>>>>>>   end = start - 1;
>>>>>>   }
>>>>>> + if (phase == 0) {
>>>>>> + /* status change: PRE -> PRE2 */
>>>>>> + for_each_set_bit(segno, dirty_i->dirty_segmap[PRE], MAIN_SEGS(sbi))
>>>>>> + if (!test_and_set_bit(segno, prefree_map))
>>>>>> + dirty_i->nr_dirty[PRE2]++;
>>>>>> +
>>>>>> + phase = 1;
>>>>>> + dirty_type = PRE;
>>>>>> + goto next_step;
>>>>>> + }
>>>>>>   mutex_unlock(&dirty_i->seglist_lock);
>>>>>>  
>>>>>>   /* send small discards */
>>>>>> @@ -2245,6 +2259,7 @@ static void change_curseg(struct f2fs_sb_info *sbi, int type)
>>>>>>  
>>>>>>   mutex_lock(&dirty_i->seglist_lock);
>>>>>>   __remove_dirty_segment(sbi, new_segno, PRE);
>>>>>> + __remove_dirty_segment(sbi, new_segno, PRE2);
>>>>>>   __remove_dirty_segment(sbi, new_segno, DIRTY);
>>>>>>   mutex_unlock(&dirty_i->seglist_lock);
>>>>>>  
>>>>>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
>>>>>> index 71a2aaa..f702726 100644
>>>>>> --- a/fs/f2fs/segment.h
>>>>>> +++ b/fs/f2fs/segment.h
>>>>>> @@ -263,6 +263,7 @@ enum dirty_type {
>>>>>>   DIRTY_COLD_NODE, /* dirty segments assigned as cold node logs */
>>>>>>   DIRTY, /* to count # of dirty segments */
>>>>>>   PRE, /* to count # of entirely obsolete segments */
>>>>>> + PRE2, /* to count # of the segments free to one checkpoint but obsolete to the other checkpoint */
>>>>>>   NR_DIRTY_TYPE
>>>>>>  };
>>>>>>  
>>>>>> @@ -477,6 +478,11 @@ static inline unsigned int prefree_segments(struct f2fs_sb_info *sbi)
>>>>>>   return DIRTY_I(sbi)->nr_dirty[PRE];
>>>>>>  }
>>>>>>  
>>>>>> +static inline unsigned int prefree2_segments(struct f2fs_sb_info *sbi)
>>>>>> +{
>>>>>> + return DIRTY_I(sbi)->nr_dirty[PRE2];
>>>>>> +}
>>>>>> +
>>>>>>  static inline unsigned int dirty_segments(struct f2fs_sb_info *sbi)
>>>>>>  {
>>>>>>   return DIRTY_I(sbi)->nr_dirty[DIRTY_HOT_DATA] +
>>>>>>
>>>>
>>>> .
>>>>
>>>
>>>
>>> .
>>>
>>
>>
>> .
>>
> 
> 
> .
> 

From mboxrd@z Thu Jan  1 00:00:00 1970
From: Chao Yu <yuchao0@huawei.com>
Subject: Re: [PATCH RFC] f2fs: add PRE2 to mark segments free to
 one checkpoint but obsolete to the other
Date: Mon, 22 Jan 2018 16:27:51 +0800
Message-ID: <8ee8ebd6-6f31-2405-9bca-ab96d4afb94d@huawei.com>
References: <20180119232903.40922-1-guoweichao@huawei.com>
 <6a06586a-7432-03c5-912a-410e4aa3b129@kernel.org>
 <927e0099-0bad-1a1c-1b03-ac5f935f3df5@kernel.org>
 <db2b36d3-3f36-7a58-cc1b-55430e3e90b2@huawei.com>
 <78761541-c44f-f0dc-77b2-3782f39d9c5b@huawei.com>
 <8dbafe22-4bd6-7e26-5a7f-802f0b550b98@huawei.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <linux-f2fs-devel-bounces@lists.sourceforge.net>
Received: from sfi-mx-1.v28.ch3.sourceforge.com ([172.29.28.191]
 helo=mx.sourceforge.net)
 by sfs-ml-4.v29.ch3.sourceforge.com with esmtps
 (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.89)
 (envelope-from <yuchao0@huawei.com>) id 1edXSb-0002E6-00
 for linux-f2fs-devel@lists.sourceforge.net; Mon, 22 Jan 2018 08:28:09 +0000
Received: from szxga03-in.huawei.com ([45.249.212.189] helo=huawei.com)
 by sfi-mx-1.v28.ch3.sourceforge.com with esmtps
 (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.89)
 id 1edXSY-0007Bq-M3
 for linux-f2fs-devel@lists.sourceforge.net; Mon, 22 Jan 2018 08:28:08 +0000
In-Reply-To: <8dbafe22-4bd6-7e26-5a7f-802f0b550b98@huawei.com>
Content-Language: en-US
List-Id: <linux-f2fs-devel.lists.sourceforge.net>
List-Unsubscribe: <https://lists.sourceforge.net/lists/options/linux-f2fs-devel>,
 <mailto:linux-f2fs-devel-request@lists.sourceforge.net?subject=unsubscribe>
List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=linux-f2fs-devel>
List-Post: <mailto:linux-f2fs-devel@lists.sourceforge.net>
List-Help: <mailto:linux-f2fs-devel-request@lists.sourceforge.net?subject=help>
List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel>,
 <mailto:linux-f2fs-devel-request@lists.sourceforge.net?subject=subscribe>
Errors-To: linux-f2fs-devel-bounces@lists.sourceforge.net
To: guoweichao <guoweichao@huawei.com>, Chao Yu <chao@kernel.org>, "jaegeuk@kernel.org" <jaegeuk@kernel.org>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>, heyunlei <heyunlei@huawei.com>, "linux-f2fs-devel@lists.sourceforge.net" <linux-f2fs-devel@lists.sourceforge.net>

On 2018/1/22 16:03, guoweichao wrote:
> 
> 
> On 2018/1/22 14:19, Chao Yu wrote:
>> On 2018/1/22 10:40, guoweichao wrote:
>>> Hi Chao,
>>>
>>> On 2018/1/21 10:34, Chao Yu wrote:
>>>> Hi Weichao,
>>>>
>>>> On 2018/1/20 23:50, guoweichao wrote:
>>>>> Hi Chao,
>>>>>
>>>>> Yes, it is exactly what I mean.
>>>>> It seems that F2FS has no responsibility to cover hardware problems.
>>>>> However, file systems usually choose redundancy for super block fault tolerance.
>>>>> So I think we actually have considered some external errors when designing a file system.
>>>>> Our dual checkpoint mechanism is mainly designed to keep at least one stable
>>>>> CP pack while creating a new one in case of SPO. And it has fault tolerant effects.
>>>>> As CP pack is also very critical to F2FS, why not make checkpoint more robustness
>>>>
>>>> I think you're trying to change basic design of A/B upgrade system to A/B/C one,
>>>> which can keep always two checkpoint valid. There will be no simple modification
>>>> to implement that one, in where we should cover not only prefree case but also
>>>> SSR case.
>>>>
>>> Nope, I do not intent to add another checkpoint. My main idea is reusing the invalid blocks
>>> after two checkpoints are recorded them as invalid. We could mark or record the blocks that
>>> are free to one checkpoint and valid to the other one as PARTIAL_FREE. And when set prefree
>>> segments as free or use invalid blocks in SSR mode, just igore PARTIAL_FREE blocks. If there
>>> is no other blocks can be used, just write a new checkpoint.
>>
>> Actually, your implementation looks like a variant of A/B/C upgrade system
>> (or A/B upgrade system), as checkpoint C (or inflight A/B) uses main area
>> separated from checkpoint A/B, and its inflight dirty metadata will persist
>> into checkpoint A or B since we haven't actually allocated checkpoint C
>> area during mkfs.
>>
>> Anyway, I don't think we have a strong reason to implement this one.
>>
>>>
>>>> IMO, the biggest problem there is available space, since in checkpoint C, we can
>>>> only use invalid blocks in both checkpoint A and B, so in some cases there will
>>>> almost be no valid space we can use during allocation, result in frequently
>>>> checkpoint.
>>>>
>>>> IMO, what we can do is trying to keep last valid checkpoint being integrity as
>>>> possible as we can. One way is that we can add mirror or parity for the
>>>> checkpoint which can help to do recovery once checkpoint is corrupted. At
>>>> least, I hope that with it in debug version we can help hardware staff to fix
>>>> their issue instead of wasting much time to troubleshoot filesystem issue.
>>>>
>>> Yes, I agree. Adding a backup for the latest checkpoint in debug version is OK. Sometimes,
>>
>> Parity will be better due to light overhead?
> Yes, backup may also cause disk layout changes. How about adding a parity block for the blocks

Right,

> in a checkpoint similar to RAID-5 or other erasure code methods?

Yes, I think we can.

Thanks,

> 
> Thanks,
> 
>>
>> Thanks,
>>
>>> hardware issues are hard to uncover. We should try to separate them from F2FS.
>>>
>>> Thanks,
>>>
>>>> Thanks,
>>>>
>>>>> with simple modification in current design and little overhead except for FG_GC.
>>>>> Of cause, keep unchanged is OK. I just want to discuss this proposal. :)
>>>>>
>>>>> Thanks,
>>>>> *From:*Chao Yu
>>>>> *To:*guoweichao,jaegeuk@kernel.org,
>>>>> *Cc:*linux-f2fs-devel@lists.sourceforge.net,linux-fsdevel@vger.kernel.org,heyunlei,
>>>>> *Date:*2018-01-20 15:43:23
>>>>> *Subject:*Re: [PATCH RFC] f2fs: add PRE2 to mark segments free to one checkpoint but obsolete to the other
>>>>>
>>>>> Hi Weichao,
>>>>>
>>>>> On 2018/1/20 7:29, Weichao Guo wrote:
>>>>>> Currently, we set prefree segments as free ones after writing
>>>>>> a checkpoint, then believe the segments could be used safely.
>>>>>> However, if a sudden power-off coming and the newer checkpoint
>>>>>> corrupted due to hardware issues at the same time, we will try
>>>>>> to use the old checkpoint and may face an inconsistent file
>>>>>> system status.
>>>>>
>>>>> IIUC, you mean:
>>>>>
>>>>> 1. write nodes into segment x;
>>>>> 2. write checkpoint A;
>>>>> 3. remove nodes in segment x;
>>>>> 4. write checkpoint B;
>>>>> 5. issue discard or write datas into segment x;
>>>>> 6. sudden power-cut
>>>>>
>>>>> But after reboot, we found checkpoint B is corrupted due to hardware, and
>>>>> then start to use checkpoint A, but nodes in segment x recorded as valid
>>>>> data in checkpoint A has been overcovered in step 5), so we will encounter
>>>>> inconsistent meta data, right?
>>>>>
>>>>> Thanks,
>>>>>
>>>>>>
>>>>>> How about add an PRE2 status for prefree segments, and make
>>>>>> sure the segments could be used safely to both checkpoints?
>>>>>> Or any better solutions? Or this is not a problem?
>>>>>>
>>>>>> Look forward to your comments!
>>>>>>
>>>>>> Signed-off-by: Weichao Guo <guoweichao@huawei.com>
>>>>>> ---
>>>>>>  fs/f2fs/gc.c      | 11 +++++++++--
>>>>>>  fs/f2fs/segment.c | 21 ++++++++++++++++++---
>>>>>>  fs/f2fs/segment.h |  6 ++++++
>>>>>>  3 files changed, 33 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>>>>> index 33e7969..153e3ea 100644
>>>>>> --- a/fs/f2fs/gc.c
>>>>>> +++ b/fs/f2fs/gc.c
>>>>>> @@ -1030,7 +1030,12 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>>>>>>    * threshold, we can make them free by checkpoint. Then, we
>>>>>>    * secure free segments which doesn't need fggc any more.
>>>>>>    */
>>>>>> - if (prefree_segments(sbi)) {
>>>>>> + if (prefree_segments(sbi) || prefree2_segments(sbi)) {
>>>>>> + ret = write_checkpoint(sbi, &cpc);
>>>>>> + if (ret)
>>>>>> + goto stop;
>>>>>> + }
>>>>>> + if (has_not_enough_free_secs(sbi, 0, 0) && prefree2_segments(sbi)) {
>>>>>>   ret = write_checkpoint(sbi, &cpc);
>>>>>>   if (ret)
>>>>>>   goto stop;
>>>>>> @@ -1063,8 +1068,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>>>>>>   goto gc_more;
>>>>>>   }
>>>>>>  
>>>>>> - if (gc_type == FG_GC)
>>>>>> + if (gc_type == FG_GC) {
>>>>>> + ret = write_checkpoint(sbi, &cpc);
>>>>>>   ret = write_checkpoint(sbi, &cpc);
>>>>>> + }
>>>>>>   }
>>>>>>  stop:
>>>>>>   SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>> index 2e8e054d..9dec445 100644
>>>>>> --- a/fs/f2fs/segment.c
>>>>>> +++ b/fs/f2fs/segment.c
>>>>>> @@ -1606,7 +1606,7 @@ static void set_prefree_as_free_segments(struct f2fs_sb_info *sbi)
>>>>>>   unsigned int segno;
>>>>>>  
>>>>>>   mutex_lock(&dirty_i->seglist_lock);
>>>>>> - for_each_set_bit(segno, dirty_i->dirty_segmap[PRE], MAIN_SEGS(sbi))
>>>>>> + for_each_set_bit(segno, dirty_i->dirty_segmap[PRE2], MAIN_SEGS(sbi))
>>>>>>   __set_test_and_free(sbi, segno);
>>>>>>   mutex_unlock(&dirty_i->seglist_lock);
>>>>>>  }
>>>>>> @@ -1617,13 +1617,17 @@ void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>>>>>   struct list_head *head = &dcc->entry_list;
>>>>>>   struct discard_entry *entry, *this;
>>>>>>   struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
>>>>>> - unsigned long *prefree_map = dirty_i->dirty_segmap[PRE];
>>>>>> + unsigned long *prefree_map;
>>>>>>   unsigned int start = 0, end = -1;
>>>>>>   unsigned int secno, start_segno;
>>>>>>   bool force = (cpc->reason & CP_DISCARD);
>>>>>> + int phase = 0;
>>>>>> + enum dirty_type dirty_type = PRE2;
>>>>>>  
>>>>>>   mutex_lock(&dirty_i->seglist_lock);
>>>>>>  
>>>>>> +next_step:
>>>>>> + prefree_map = dirty_i->dirty_segmap[dirty_type];
>>>>>>   while (1) {
>>>>>>   int i;
>>>>>>   start = find_next_bit(prefree_map, MAIN_SEGS(sbi), end + 1);
>>>>>> @@ -1635,7 +1639,7 @@ void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>>>>>   for (i = start; i < end; i++)
>>>>>>   clear_bit(i, prefree_map);
>>>>>>  
>>>>>> - dirty_i->nr_dirty[PRE] -= end - start;
>>>>>> + dirty_i->nr_dirty[dirty_type] -= end - start;
>>>>>>  
>>>>>>   if (!test_opt(sbi, DISCARD))
>>>>>>   continue;
>>>>>> @@ -1663,6 +1667,16 @@ void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>>>>>   else
>>>>>>   end = start - 1;
>>>>>>   }
>>>>>> + if (phase == 0) {
>>>>>> + /* status change: PRE -> PRE2 */
>>>>>> + for_each_set_bit(segno, dirty_i->dirty_segmap[PRE], MAIN_SEGS(sbi))
>>>>>> + if (!test_and_set_bit(segno, prefree_map))
>>>>>> + dirty_i->nr_dirty[PRE2]++;
>>>>>> +
>>>>>> + phase = 1;
>>>>>> + dirty_type = PRE;
>>>>>> + goto next_step;
>>>>>> + }
>>>>>>   mutex_unlock(&dirty_i->seglist_lock);
>>>>>>  
>>>>>>   /* send small discards */
>>>>>> @@ -2245,6 +2259,7 @@ static void change_curseg(struct f2fs_sb_info *sbi, int type)
>>>>>>  
>>>>>>   mutex_lock(&dirty_i->seglist_lock);
>>>>>>   __remove_dirty_segment(sbi, new_segno, PRE);
>>>>>> + __remove_dirty_segment(sbi, new_segno, PRE2);
>>>>>>   __remove_dirty_segment(sbi, new_segno, DIRTY);
>>>>>>   mutex_unlock(&dirty_i->seglist_lock);
>>>>>>  
>>>>>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
>>>>>> index 71a2aaa..f702726 100644
>>>>>> --- a/fs/f2fs/segment.h
>>>>>> +++ b/fs/f2fs/segment.h
>>>>>> @@ -263,6 +263,7 @@ enum dirty_type {
>>>>>>   DIRTY_COLD_NODE, /* dirty segments assigned as cold node logs */
>>>>>>   DIRTY, /* to count # of dirty segments */
>>>>>>   PRE, /* to count # of entirely obsolete segments */
>>>>>> + PRE2, /* to count # of the segments free to one checkpoint but obsolete to the other checkpoint */
>>>>>>   NR_DIRTY_TYPE
>>>>>>  };
>>>>>>  
>>>>>> @@ -477,6 +478,11 @@ static inline unsigned int prefree_segments(struct f2fs_sb_info *sbi)
>>>>>>   return DIRTY_I(sbi)->nr_dirty[PRE];
>>>>>>  }
>>>>>>  
>>>>>> +static inline unsigned int prefree2_segments(struct f2fs_sb_info *sbi)
>>>>>> +{
>>>>>> + return DIRTY_I(sbi)->nr_dirty[PRE2];
>>>>>> +}
>>>>>> +
>>>>>>  static inline unsigned int dirty_segments(struct f2fs_sb_info *sbi)
>>>>>>  {
>>>>>>   return DIRTY_I(sbi)->nr_dirty[DIRTY_HOT_DATA] +
>>>>>>
>>>>
>>>> .
>>>>
>>>
>>>
>>> .
>>>
>>
>>
>> .
>>
> 
> 
> .
> 


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot