All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
@ 2013-10-08 11:30 ` Yuan Zhong
  0 siblings, 0 replies; 13+ messages in thread
From: Yuan Zhong @ 2013-10-08 11:30 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Jaegeuk Kim, linux-f2fs-devel, linux-kernel, linux-fsdevel, shu tan

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=windows-1252, Size: 5263 bytes --]

Hi Gu,

> Hi Yuan,
> On 10/08/2013 04:30 PM, Yuan Zhong wrote:

> > Previously, do_checkpoint() will call congestion_wait() for waiting the pages (previous submitted node/meta/data pages) to be written back.
> > Because congestion_wait() will set a regular period (e.g. HZ / 50 ) for waiting.
> > For this reason, there is a situation that after the pages have been written back, 
> > but the checkpoint thread still wait for congestion_wait to exit.

> How do you confirm this issue? 

  I traced the execution path.
  In f2fs_end_io_write, dec_page_count(p->sbi, F2FS_WRITEBACK) will be called.
  And I found that, when pages of F2FS_WRITEBACK has been zero, but
  checkpoint thread still congestion_wait for pages of F2FS_WRITEBACK to be zero.	
  So, I think this point could be improved.
  And I wrote a simple test case and tested on Micro-SD card, the steps as following:
      (a) create a fixed-size file (4KB)
      (b) go on to sync the file 
      (c) go back to step #a (fixed numbers of cycling:1024)	
   The results indicated that the execution time is reduced greatly by using this patch.  


> I suspect that the block-core does not have a wake-up mechanism
> when the back device is uncongested.


  Yes, you are right.
  So I wake up the checkpoint thread by myself, when pages of F2FS_WRITEBACK to be zero.
  In f2fs_end_io_write, f2fs_writeback_wait is called.
  you cloud find this code in my patch. 


> > This is a problem here, especially, when sync a large number of small files or dirs.
> > In order to avoid this, a wait_list is introduced, 
> > the checkpoint thread will be dropped into the wait_list if the pages have not been written back, 
> > and will be waked up by contrast.

> Please pay some attention to the mail form, this mail is out of format in my mail client.

> Regards,
> Gu

Regards,
Yuan

> > 
> > Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
> > ---  
> >  fs/f2fs/checkpoint.c |    3 +--
> >  fs/f2fs/f2fs.h       |   19 +++++++++++++++++++
> >  fs/f2fs/segment.c    |    1 +
> >  fs/f2fs/super.c      |    1 +
> >  4 files changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index ca39442..5d69ae0 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -758,8 +758,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
> >  	f2fs_put_page(cp_page, 1);
> >  
> >  	/* wait for previous submitted node/meta pages writeback */
> > -	while (get_pages(sbi, F2FS_WRITEBACK))
> > -		congestion_wait(BLK_RW_ASYNC, HZ / 50);
> > +	f2fs_writeback_wait(sbi);
> >  
> >  	filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
> >  	filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 7fd99d8..4b0d70e 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -18,6 +18,8 @@
> >  #include <linux/crc32.h>
> >  #include <linux/magic.h>
> >  #include <linux/kobject.h>
> > +#include <linux/wait.h>
> > +#include <linux/sched.h>
> >  
> >  /*
> >   * For mount options
> > @@ -368,6 +370,7 @@ struct f2fs_sb_info {
> >  	struct mutex fs_lock[NR_GLOBAL_LOCKS];	/* blocking FS operations */
> >  	struct mutex node_write;		/* locking node writes */
> >  	struct mutex writepages;		/* mutex for writepages() */
> > +	wait_queue_head_t writeback_wqh;	/* wait_queue for writeback */
> >  	unsigned char next_lock_num;		/* round-robin global locks */
> >  	int por_doing;				/* recovery is doing or not */
> >  	int on_build_free_nids;			/* build_free_nids is doing */
> > @@ -961,6 +964,22 @@ static inline int f2fs_readonly(struct super_block *sb)
> >  	return sb->s_flags & MS_RDONLY;
> >  }
> >  
> > +static inline void f2fs_writeback_wait(struct f2fs_sb_info *sbi)
> > +{
> > +	DEFINE_WAIT(wait);
> > +
> > +	prepare_to_wait(&sbi->writeback_wqh, &wait, TASK_UNINTERRUPTIBLE);
> > +	if (get_pages(sbi, F2FS_WRITEBACK))
> > +		io_schedule();
> > +	finish_wait(&sbi->writeback_wqh, &wait);
> > +}
> > +
> > +static inline void f2fs_writeback_wake(struct f2fs_sb_info *sbi)
> > +{
> > +	if (!get_pages(sbi, F2FS_WRITEBACK))
> > +		wake_up_all(&sbi->writeback_wqh);
> > +}
> > +
> >  /*
> >   * file.c
> >   */
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index bd79bbe..0708aa9 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -597,6 +597,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
> >  
> >  	if (p->is_sync)
> >  		complete(p->wait);
> > +	f2fs_writeback_wake(p->sbi);
> >  	kfree(p);
> >  	bio_put(bio);
> >  }
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index 094ccc6..3ac6d85 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -835,6 +835,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >  	mutex_init(&sbi->gc_mutex);
> >  	mutex_init(&sbi->writepages);
> >  	mutex_init(&sbi->cp_mutex);
> > +	init_waitqueue_head(&sbi->writeback_wqh);
> >  	for (i = 0; i < NR_GLOBAL_LOCKS; i++)
> >  		mutex_init(&sbi->fs_lock[i]);
> >  	mutex_init(&sbi->node_write);ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
@ 2013-10-08 11:30 ` Yuan Zhong
  0 siblings, 0 replies; 13+ messages in thread
From: Yuan Zhong @ 2013-10-08 11:30 UTC (permalink / raw)
  To: Gu Zheng; +Cc: linux-fsdevel, shu tan, linux-kernel, linux-f2fs-devel

Hi Gu,

> Hi Yuan,
> On 10/08/2013 04:30 PM, Yuan Zhong wrote:

> > Previously, do_checkpoint() will call congestion_wait() for waiting the pages (previous submitted node/meta/data pages) to be written back.
> > Because congestion_wait() will set a regular period (e.g. HZ / 50 ) for waiting.
> > For this reason, there is a situation that after the pages have been written back, 
> > but the checkpoint thread still wait for congestion_wait to exit.

> How do you confirm this issue? 

  I traced the execution path.
  In f2fs_end_io_write, dec_page_count(p->sbi, F2FS_WRITEBACK) will be called.
  And I found that, when pages of F2FS_WRITEBACK has been zero, but
  checkpoint thread still congestion_wait for pages of F2FS_WRITEBACK to be zero.	
  So, I think this point could be improved.
  And I wrote a simple test case and tested on Micro-SD card, the steps as following:
      (a) create a fixed-size file (4KB)
      (b) go on to sync the file 
      (c) go back to step #a (fixed numbers of cycling:1024)	
   The results indicated that the execution time is reduced greatly by using this patch.  


> I suspect that the block-core does not have a wake-up mechanism
> when the back device is uncongested.


  Yes, you are right.
  So I wake up the checkpoint thread by myself, when pages of F2FS_WRITEBACK to be zero.
  In f2fs_end_io_write, f2fs_writeback_wait is called.
  you cloud find this code in my patch. 


> > This is a problem here, especially, when sync a large number of small files or dirs.
> > In order to avoid this, a wait_list is introduced, 
> > the checkpoint thread will be dropped into the wait_list if the pages have not been written back, 
> > and will be waked up by contrast.

> Please pay some attention to the mail form, this mail is out of format in my mail client.

> Regards,
> Gu

Regards,
Yuan

> > 
> > Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
> > ---  
> >  fs/f2fs/checkpoint.c |    3 +--
> >  fs/f2fs/f2fs.h       |   19 +++++++++++++++++++
> >  fs/f2fs/segment.c    |    1 +
> >  fs/f2fs/super.c      |    1 +
> >  4 files changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index ca39442..5d69ae0 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -758,8 +758,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
> >  	f2fs_put_page(cp_page, 1);
> >  
> >  	/* wait for previous submitted node/meta pages writeback */
> > -	while (get_pages(sbi, F2FS_WRITEBACK))
> > -		congestion_wait(BLK_RW_ASYNC, HZ / 50);
> > +	f2fs_writeback_wait(sbi);
> >  
> >  	filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
> >  	filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 7fd99d8..4b0d70e 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -18,6 +18,8 @@
> >  #include <linux/crc32.h>
> >  #include <linux/magic.h>
> >  #include <linux/kobject.h>
> > +#include <linux/wait.h>
> > +#include <linux/sched.h>
> >  
> >  /*
> >   * For mount options
> > @@ -368,6 +370,7 @@ struct f2fs_sb_info {
> >  	struct mutex fs_lock[NR_GLOBAL_LOCKS];	/* blocking FS operations */
> >  	struct mutex node_write;		/* locking node writes */
> >  	struct mutex writepages;		/* mutex for writepages() */
> > +	wait_queue_head_t writeback_wqh;	/* wait_queue for writeback */
> >  	unsigned char next_lock_num;		/* round-robin global locks */
> >  	int por_doing;				/* recovery is doing or not */
> >  	int on_build_free_nids;			/* build_free_nids is doing */
> > @@ -961,6 +964,22 @@ static inline int f2fs_readonly(struct super_block *sb)
> >  	return sb->s_flags & MS_RDONLY;
> >  }
> >  
> > +static inline void f2fs_writeback_wait(struct f2fs_sb_info *sbi)
> > +{
> > +	DEFINE_WAIT(wait);
> > +
> > +	prepare_to_wait(&sbi->writeback_wqh, &wait, TASK_UNINTERRUPTIBLE);
> > +	if (get_pages(sbi, F2FS_WRITEBACK))
> > +		io_schedule();
> > +	finish_wait(&sbi->writeback_wqh, &wait);
> > +}
> > +
> > +static inline void f2fs_writeback_wake(struct f2fs_sb_info *sbi)
> > +{
> > +	if (!get_pages(sbi, F2FS_WRITEBACK))
> > +		wake_up_all(&sbi->writeback_wqh);
> > +}
> > +
> >  /*
> >   * file.c
> >   */
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index bd79bbe..0708aa9 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -597,6 +597,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
> >  
> >  	if (p->is_sync)
> >  		complete(p->wait);
> > +	f2fs_writeback_wake(p->sbi);
> >  	kfree(p);
> >  	bio_put(bio);
> >  }
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index 094ccc6..3ac6d85 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -835,6 +835,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >  	mutex_init(&sbi->gc_mutex);
> >  	mutex_init(&sbi->writepages);
> >  	mutex_init(&sbi->cp_mutex);
> > +	init_waitqueue_head(&sbi->writeback_wqh);
> >  	for (i = 0; i < NR_GLOBAL_LOCKS; i++)
> >  		mutex_init(&sbi->fs_lock[i]);
> >  	mutex_init(&sbi->node_write);
------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
  2013-10-08 11:30 ` Yuan Zhong
@ 2013-10-09  4:04   ` Gu Zheng
  -1 siblings, 0 replies; 13+ messages in thread
From: Gu Zheng @ 2013-10-09  4:04 UTC (permalink / raw)
  To: yuan.mark.zhong
  Cc: Jaegeuk Kim, linux-f2fs-devel, linux-kernel, linux-fsdevel, shu tan

Hi Yuan,
On 10/08/2013 07:30 PM, Yuan Zhong wrote:

> Hi Gu,
> 
>> Hi Yuan,
>> On 10/08/2013 04:30 PM, Yuan Zhong wrote:
> 
>>> Previously, do_checkpoint() will call congestion_wait() for waiting the pages (previous submitted node/meta/data pages) to be written back.
>>> Because congestion_wait() will set a regular period (e.g. HZ / 50 ) for waiting.
>>> For this reason, there is a situation that after the pages have been written back, 
>>> but the checkpoint thread still wait for congestion_wait to exit.
> 
>> How do you confirm this issue? 
> 
>   I traced the execution path.
>   In f2fs_end_io_write, dec_page_count(p->sbi, F2FS_WRITEBACK) will be called.
>   And I found that, when pages of F2FS_WRITEBACK has been zero, but
>   checkpoint thread still congestion_wait for pages of F2FS_WRITEBACK to be zero.

Yes, it maybe. Congestion_wait add the task to a global wait queue which related to
all back devices, so if F2FS_WRITEBACK has been zero, but other io may be still going on.
Anyway, using a private wait queue to hold is a better choose.:)

	
>   So, I think this point could be improved.
>   And I wrote a simple test case and tested on Micro-SD card, the steps as following:
>       (a) create a fixed-size file (4KB)
>       (b) go on to sync the file 
>       (c) go back to step #a (fixed numbers of cycling:1024)	
>    The results indicated that the execution time is reduced greatly by using this patch.

Yes, the change is an improvement if the issue is existent.

  
> 
> 
>> I suspect that the block-core does not have a wake-up mechanism
>> when the back device is uncongested.
> 
> 
>   Yes, you are right.
>   So I wake up the checkpoint thread by myself, when pages of F2FS_WRITEBACK to be zero.
>   In f2fs_end_io_write, f2fs_writeback_wait is called.
>   you cloud find this code in my patch. 

Saw it.:)
But one problem is that the checkpoint routine always is singleton, so the wait queue just
services only one body, it seems not very worthy. How about just schedule and wake up it
directly? See the following one.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
 fs/f2fs/checkpoint.c |   11 +++++++++--
 fs/f2fs/f2fs.h       |    1 +
 fs/f2fs/segment.c    |    4 ++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d808827..2a5999d 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -757,8 +757,15 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
 	f2fs_put_page(cp_page, 1);
 
 	/* wait for previous submitted node/meta pages writeback */
-	while (get_pages(sbi, F2FS_WRITEBACK))
-		congestion_wait(BLK_RW_ASYNC, HZ / 50);
+	sbi->cp_task = current;
+	while (get_pages(sbi, F2FS_WRITEBACK)) {
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		if (!get_pages(sbi, F2FS_WRITEBACK))
+			break;
+		io_schedule();
+	}
+	__set_current_state(TASK_RUNNING);
+	sbi->cp_task = NULL;
 
 	filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
 	filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index a955a59..408ace7 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -365,6 +365,7 @@ struct f2fs_sb_info {
 	struct mutex writepages;		/* mutex for writepages() */
 	int por_doing;				/* recovery is doing or not */
 	int on_build_free_nids;			/* build_free_nids is doing */
+	struct task_struct *cp_task;		/* checkpoint task */
 
 	/* for orphan inode management */
 	struct list_head orphan_inode_list;	/* orphan inode list */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index bd79bbe..3b20359 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -597,6 +597,10 @@ static void f2fs_end_io_write(struct bio *bio, int err)
 
 	if (p->is_sync)
 		complete(p->wait);
+
+	if (!get_pages(p->sbi, F2FS_WRITEBACK) && p->sbi->cp_task)
+		wake_up_process(p->sbi->cp_task);
+
 	kfree(p);
 	bio_put(bio);
 }
-- 
1.7.7

Regards,
Gu 

> 
> 
>>> This is a problem here, especially, when sync a large number of small files or dirs.
>>> In order to avoid this, a wait_list is introduced, 
>>> the checkpoint thread will be dropped into the wait_list if the pages have not been written back, 
>>> and will be waked up by contrast.
> 
>> Please pay some attention to the mail form, this mail is out of format in my mail client.
> 
>> Regards,
>> Gu
> 
> Regards,
> Yuan
> 
>>>
>>> Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
>>> ---  
>>>  fs/f2fs/checkpoint.c |    3 +--
>>>  fs/f2fs/f2fs.h       |   19 +++++++++++++++++++
>>>  fs/f2fs/segment.c    |    1 +
>>>  fs/f2fs/super.c      |    1 +
>>>  4 files changed, 22 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>> index ca39442..5d69ae0 100644
>>> --- a/fs/f2fs/checkpoint.c
>>> +++ b/fs/f2fs/checkpoint.c
>>> @@ -758,8 +758,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
>>>  	f2fs_put_page(cp_page, 1);
>>>  
>>>  	/* wait for previous submitted node/meta pages writeback */
>>> -	while (get_pages(sbi, F2FS_WRITEBACK))
>>> -		congestion_wait(BLK_RW_ASYNC, HZ / 50);
>>> +	f2fs_writeback_wait(sbi);
>>>  
>>>  	filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
>>>  	filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 7fd99d8..4b0d70e 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -18,6 +18,8 @@
>>>  #include <linux/crc32.h>
>>>  #include <linux/magic.h>
>>>  #include <linux/kobject.h>
>>> +#include <linux/wait.h>
>>> +#include <linux/sched.h>
>>>  
>>>  /*
>>>   * For mount options
>>> @@ -368,6 +370,7 @@ struct f2fs_sb_info {
>>>  	struct mutex fs_lock[NR_GLOBAL_LOCKS];	/* blocking FS operations */
>>>  	struct mutex node_write;		/* locking node writes */
>>>  	struct mutex writepages;		/* mutex for writepages() */
>>> +	wait_queue_head_t writeback_wqh;	/* wait_queue for writeback */
>>>  	unsigned char next_lock_num;		/* round-robin global locks */
>>>  	int por_doing;				/* recovery is doing or not */
>>>  	int on_build_free_nids;			/* build_free_nids is doing */
>>> @@ -961,6 +964,22 @@ static inline int f2fs_readonly(struct super_block *sb)
>>>  	return sb->s_flags & MS_RDONLY;
>>>  }
>>>  
>>> +static inline void f2fs_writeback_wait(struct f2fs_sb_info *sbi)
>>> +{
>>> +	DEFINE_WAIT(wait);
>>> +
>>> +	prepare_to_wait(&sbi->writeback_wqh, &wait, TASK_UNINTERRUPTIBLE);
>>> +	if (get_pages(sbi, F2FS_WRITEBACK))
>>> +		io_schedule();
>>> +	finish_wait(&sbi->writeback_wqh, &wait);
>>> +}
>>> +
>>> +static inline void f2fs_writeback_wake(struct f2fs_sb_info *sbi)
>>> +{
>>> +	if (!get_pages(sbi, F2FS_WRITEBACK))
>>> +		wake_up_all(&sbi->writeback_wqh);
>>> +}
>>> +
>>>  /*
>>>   * file.c
>>>   */
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>> index bd79bbe..0708aa9 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -597,6 +597,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
>>>  
>>>  	if (p->is_sync)
>>>  		complete(p->wait);
>>> +	f2fs_writeback_wake(p->sbi);
>>>  	kfree(p);
>>>  	bio_put(bio);
>>>  }
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>> index 094ccc6..3ac6d85 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -835,6 +835,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>  	mutex_init(&sbi->gc_mutex);
>>>  	mutex_init(&sbi->writepages);
>>>  	mutex_init(&sbi->cp_mutex);
>>> +	init_waitqueue_head(&sbi->writeback_wqh);
>>>  	for (i = 0; i < NR_GLOBAL_LOCKS; i++)
>>>  		mutex_init(&sbi->fs_lock[i]);
>>>  	mutex_init(&sbi->node_write);N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±‘êçzX§¶\x17›¡Ü¨}©ž²Æ zÚ&j:+v‰¨¾\a«‘êçzZ+€Ê+zf£¢·hšˆ§~†­†Ûiÿûàz¹\x1e®w¥¢¸?™¨è­Ú&¢)ߢ^[f”ù^jÇ«y§m…á@A«a¶Ú\x7fÿ\f0¶ìh®\x0få’i\x7f



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
@ 2013-10-09  4:04   ` Gu Zheng
  0 siblings, 0 replies; 13+ messages in thread
From: Gu Zheng @ 2013-10-09  4:04 UTC (permalink / raw)
  To: yuan.mark.zhong
  Cc: Jaegeuk Kim, linux-f2fs-devel, linux-kernel, linux-fsdevel, shu tan

Hi Yuan,
On 10/08/2013 07:30 PM, Yuan Zhong wrote:

> Hi Gu,
> 
>> Hi Yuan,
>> On 10/08/2013 04:30 PM, Yuan Zhong wrote:
> 
>>> Previously, do_checkpoint() will call congestion_wait() for waiting the pages (previous submitted node/meta/data pages) to be written back.
>>> Because congestion_wait() will set a regular period (e.g. HZ / 50 ) for waiting.
>>> For this reason, there is a situation that after the pages have been written back, 
>>> but the checkpoint thread still wait for congestion_wait to exit.
> 
>> How do you confirm this issue? 
> 
>   I traced the execution path.
>   In f2fs_end_io_write, dec_page_count(p->sbi, F2FS_WRITEBACK) will be called.
>   And I found that, when pages of F2FS_WRITEBACK has been zero, but
>   checkpoint thread still congestion_wait for pages of F2FS_WRITEBACK to be zero.

Yes, it maybe. Congestion_wait add the task to a global wait queue which related to
all back devices, so if F2FS_WRITEBACK has been zero, but other io may be still going on.
Anyway, using a private wait queue to hold is a better choose.:)

	
>   So, I think this point could be improved.
>   And I wrote a simple test case and tested on Micro-SD card, the steps as following:
>       (a) create a fixed-size file (4KB)
>       (b) go on to sync the file 
>       (c) go back to step #a (fixed numbers of cycling:1024)	
>    The results indicated that the execution time is reduced greatly by using this patch.

Yes, the change is an improvement if the issue is existent.

  
> 
> 
>> I suspect that the block-core does not have a wake-up mechanism
>> when the back device is uncongested.
> 
> 
>   Yes, you are right.
>   So I wake up the checkpoint thread by myself, when pages of F2FS_WRITEBACK to be zero.
>   In f2fs_end_io_write, f2fs_writeback_wait is called.
>   you cloud find this code in my patch. 

Saw it.:)
But one problem is that the checkpoint routine always is singleton, so the wait queue just
services only one body, it seems not very worthy. How about just schedule and wake up it
directly? See the following one.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
 fs/f2fs/checkpoint.c |   11 +++++++++--
 fs/f2fs/f2fs.h       |    1 +
 fs/f2fs/segment.c    |    4 ++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d808827..2a5999d 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -757,8 +757,15 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
 	f2fs_put_page(cp_page, 1);
 
 	/* wait for previous submitted node/meta pages writeback */
-	while (get_pages(sbi, F2FS_WRITEBACK))
-		congestion_wait(BLK_RW_ASYNC, HZ / 50);
+	sbi->cp_task = current;
+	while (get_pages(sbi, F2FS_WRITEBACK)) {
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		if (!get_pages(sbi, F2FS_WRITEBACK))
+			break;
+		io_schedule();
+	}
+	__set_current_state(TASK_RUNNING);
+	sbi->cp_task = NULL;
 
 	filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
 	filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index a955a59..408ace7 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -365,6 +365,7 @@ struct f2fs_sb_info {
 	struct mutex writepages;		/* mutex for writepages() */
 	int por_doing;				/* recovery is doing or not */
 	int on_build_free_nids;			/* build_free_nids is doing */
+	struct task_struct *cp_task;		/* checkpoint task */
 
 	/* for orphan inode management */
 	struct list_head orphan_inode_list;	/* orphan inode list */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index bd79bbe..3b20359 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -597,6 +597,10 @@ static void f2fs_end_io_write(struct bio *bio, int err)
 
 	if (p->is_sync)
 		complete(p->wait);
+
+	if (!get_pages(p->sbi, F2FS_WRITEBACK) && p->sbi->cp_task)
+		wake_up_process(p->sbi->cp_task);
+
 	kfree(p);
 	bio_put(bio);
 }
-- 
1.7.7

Regards,
Gu 

> 
> 
>>> This is a problem here, especially, when sync a large number of small files or dirs.
>>> In order to avoid this, a wait_list is introduced, 
>>> the checkpoint thread will be dropped into the wait_list if the pages have not been written back, 
>>> and will be waked up by contrast.
> 
>> Please pay some attention to the mail form, this mail is out of format in my mail client.
> 
>> Regards,
>> Gu
> 
> Regards,
> Yuan
> 
>>>
>>> Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
>>> ---  
>>>  fs/f2fs/checkpoint.c |    3 +--
>>>  fs/f2fs/f2fs.h       |   19 +++++++++++++++++++
>>>  fs/f2fs/segment.c    |    1 +
>>>  fs/f2fs/super.c      |    1 +
>>>  4 files changed, 22 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>> index ca39442..5d69ae0 100644
>>> --- a/fs/f2fs/checkpoint.c
>>> +++ b/fs/f2fs/checkpoint.c
>>> @@ -758,8 +758,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
>>>  	f2fs_put_page(cp_page, 1);
>>>  
>>>  	/* wait for previous submitted node/meta pages writeback */
>>> -	while (get_pages(sbi, F2FS_WRITEBACK))
>>> -		congestion_wait(BLK_RW_ASYNC, HZ / 50);
>>> +	f2fs_writeback_wait(sbi);
>>>  
>>>  	filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
>>>  	filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 7fd99d8..4b0d70e 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -18,6 +18,8 @@
>>>  #include <linux/crc32.h>
>>>  #include <linux/magic.h>
>>>  #include <linux/kobject.h>
>>> +#include <linux/wait.h>
>>> +#include <linux/sched.h>
>>>  
>>>  /*
>>>   * For mount options
>>> @@ -368,6 +370,7 @@ struct f2fs_sb_info {
>>>  	struct mutex fs_lock[NR_GLOBAL_LOCKS];	/* blocking FS operations */
>>>  	struct mutex node_write;		/* locking node writes */
>>>  	struct mutex writepages;		/* mutex for writepages() */
>>> +	wait_queue_head_t writeback_wqh;	/* wait_queue for writeback */
>>>  	unsigned char next_lock_num;		/* round-robin global locks */
>>>  	int por_doing;				/* recovery is doing or not */
>>>  	int on_build_free_nids;			/* build_free_nids is doing */
>>> @@ -961,6 +964,22 @@ static inline int f2fs_readonly(struct super_block *sb)
>>>  	return sb->s_flags & MS_RDONLY;
>>>  }
>>>  
>>> +static inline void f2fs_writeback_wait(struct f2fs_sb_info *sbi)
>>> +{
>>> +	DEFINE_WAIT(wait);
>>> +
>>> +	prepare_to_wait(&sbi->writeback_wqh, &wait, TASK_UNINTERRUPTIBLE);
>>> +	if (get_pages(sbi, F2FS_WRITEBACK))
>>> +		io_schedule();
>>> +	finish_wait(&sbi->writeback_wqh, &wait);
>>> +}
>>> +
>>> +static inline void f2fs_writeback_wake(struct f2fs_sb_info *sbi)
>>> +{
>>> +	if (!get_pages(sbi, F2FS_WRITEBACK))
>>> +		wake_up_all(&sbi->writeback_wqh);
>>> +}
>>> +
>>>  /*
>>>   * file.c
>>>   */
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>> index bd79bbe..0708aa9 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -597,6 +597,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
>>>  
>>>  	if (p->is_sync)
>>>  		complete(p->wait);
>>> +	f2fs_writeback_wake(p->sbi);
>>>  	kfree(p);
>>>  	bio_put(bio);
>>>  }
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>> index 094ccc6..3ac6d85 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -835,6 +835,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>  	mutex_init(&sbi->gc_mutex);
>>>  	mutex_init(&sbi->writepages);
>>>  	mutex_init(&sbi->cp_mutex);
>>> +	init_waitqueue_head(&sbi->writeback_wqh);
>>>  	for (i = 0; i < NR_GLOBAL_LOCKS; i++)
>>>  		mutex_init(&sbi->fs_lock[i]);
>>>  	mutex_init(&sbi->node_write);N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±‘êçzX§¶\x17›¡Ü¨}©ž²Æ zÚ&j:+v‰¨¾\a«‘êçzZ+€Ê+zf£¢·hšˆ§~†­†Ûiÿûàz¹\x1e®w¥¢¸?™¨è­Ú&¢)ߢ^[f”ù^jÇ«y§m…á@A«a¶Ú\x7fÿ\f0¶ìh®\x0få’i\x7f

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
  2013-10-09  4:04   ` Gu Zheng
  (?)
@ 2013-10-10  8:09   ` Jin Xu
  2013-10-10  8:11       ` Gu Zheng
  -1 siblings, 1 reply; 13+ messages in thread
From: Jin Xu @ 2013-10-10  8:09 UTC (permalink / raw)
  To: Gu Zheng, yuan.mark.zhong
  Cc: linux-fsdevel, shu tan, linux-kernel, linux-f2fs-devel


[-- Attachment #1.1: Type: text/plain, Size: 7215 bytes --]

Hi Gu,
I have a comment below.

> Date: Wed, 9 Oct 2013 12:04:09 +0800
> From: guz.fnst@cn.fujitsu.com
> To: yuan.mark.zhong@samsung.com
> CC: jaegeuk.kim@samsung.com; linux-f2fs-devel@lists.sourceforge.net; linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; shu.tan@samsung.com
> Subject: Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
> 
> Hi Yuan,
> On 10/08/2013 07:30 PM, Yuan Zhong wrote:
> 
...
> 
> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> ---
>  fs/f2fs/checkpoint.c |   11 +++++++++--
>  fs/f2fs/f2fs.h       |    1 +
>  fs/f2fs/segment.c    |    4 ++++
>  3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index d808827..2a5999d 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -757,8 +757,15 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
>  	f2fs_put_page(cp_page, 1);
>  
>  	/* wait for previous submitted node/meta pages writeback */
> -	while (get_pages(sbi, F2FS_WRITEBACK))
> -		congestion_wait(BLK_RW_ASYNC, HZ / 50);
> +	sbi->cp_task = current;
> +	while (get_pages(sbi, F2FS_WRITEBACK)) {
> +		set_current_state(TASK_UNINTERRUPTIBLE);
> +		if (!get_pages(sbi, F2FS_WRITEBACK))
> +			break;
> +		io_schedule();
> +	}
> +	__set_current_state(TASK_RUNNING);
> +	sbi->cp_task = NULL;
>  
>  	filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
>  	filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index a955a59..408ace7 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -365,6 +365,7 @@ struct f2fs_sb_info {
>  	struct mutex writepages;		/* mutex for writepages() */
>  	int por_doing;				/* recovery is doing or not */
>  	int on_build_free_nids;			/* build_free_nids is doing */
> +	struct task_struct *cp_task;		/* checkpoint task */
>  
>  	/* for orphan inode management */
>  	struct list_head orphan_inode_list;	/* orphan inode list */
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index bd79bbe..3b20359 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -597,6 +597,10 @@ static void f2fs_end_io_write(struct bio *bio, int err)
>  
>  	if (p->is_sync)
>  		complete(p->wait);
> +
> +	if (!get_pages(p->sbi, F2FS_WRITEBACK) && p->sbi->cp_task)
> +		wake_up_process(p->sbi->cp_task);
There is a risk of dereferencing a NULL pointer because here simply comparing thecp_task against NULL is not enough to avoid race in multi-thread environment.Another thread could have assigned it to NULL in the window between the comparisonand waking up.
Regards,Jin
> +
>  	kfree(p);
>  	bio_put(bio);
>  }
> -- 
> 1.7.7
> 
> Regards,
> Gu 
> 
> > 
> > 
> >>> This is a problem here, especially, when sync a large number of small files or dirs.
> >>> In order to avoid this, a wait_list is introduced, 
> >>> the checkpoint thread will be dropped into the wait_list if the pages have not been written back, 
> >>> and will be waked up by contrast.
> > 
> >> Please pay some attention to the mail form, this mail is out of format in my mail client.
> > 
> >> Regards,
> >> Gu
> > 
> > Regards,
> > Yuan
> > 
> >>>
> >>> Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
> >>> ---  
> >>>  fs/f2fs/checkpoint.c |    3 +--
> >>>  fs/f2fs/f2fs.h       |   19 +++++++++++++++++++
> >>>  fs/f2fs/segment.c    |    1 +
> >>>  fs/f2fs/super.c      |    1 +
> >>>  4 files changed, 22 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> >>> index ca39442..5d69ae0 100644
> >>> --- a/fs/f2fs/checkpoint.c
> >>> +++ b/fs/f2fs/checkpoint.c
> >>> @@ -758,8 +758,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
> >>>  	f2fs_put_page(cp_page, 1);
> >>>  
> >>>  	/* wait for previous submitted node/meta pages writeback */
> >>> -	while (get_pages(sbi, F2FS_WRITEBACK))
> >>> -		congestion_wait(BLK_RW_ASYNC, HZ / 50);
> >>> +	f2fs_writeback_wait(sbi);
> >>>  
> >>>  	filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
> >>>  	filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>> index 7fd99d8..4b0d70e 100644
> >>> --- a/fs/f2fs/f2fs.h
> >>> +++ b/fs/f2fs/f2fs.h
> >>> @@ -18,6 +18,8 @@
> >>>  #include <linux/crc32.h>
> >>>  #include <linux/magic.h>
> >>>  #include <linux/kobject.h>
> >>> +#include <linux/wait.h>
> >>> +#include <linux/sched.h>
> >>>  
> >>>  /*
> >>>   * For mount options
> >>> @@ -368,6 +370,7 @@ struct f2fs_sb_info {
> >>>  	struct mutex fs_lock[NR_GLOBAL_LOCKS];	/* blocking FS operations */
> >>>  	struct mutex node_write;		/* locking node writes */
> >>>  	struct mutex writepages;		/* mutex for writepages() */
> >>> +	wait_queue_head_t writeback_wqh;	/* wait_queue for writeback */
> >>>  	unsigned char next_lock_num;		/* round-robin global locks */
> >>>  	int por_doing;				/* recovery is doing or not */
> >>>  	int on_build_free_nids;			/* build_free_nids is doing */
> >>> @@ -961,6 +964,22 @@ static inline int f2fs_readonly(struct super_block *sb)
> >>>  	return sb->s_flags & MS_RDONLY;
> >>>  }
> >>>  
> >>> +static inline void f2fs_writeback_wait(struct f2fs_sb_info *sbi)
> >>> +{
> >>> +	DEFINE_WAIT(wait);
> >>> +
> >>> +	prepare_to_wait(&sbi->writeback_wqh, &wait, TASK_UNINTERRUPTIBLE);
> >>> +	if (get_pages(sbi, F2FS_WRITEBACK))
> >>> +		io_schedule();
> >>> +	finish_wait(&sbi->writeback_wqh, &wait);
> >>> +}
> >>> +
> >>> +static inline void f2fs_writeback_wake(struct f2fs_sb_info *sbi)
> >>> +{
> >>> +	if (!get_pages(sbi, F2FS_WRITEBACK))
> >>> +		wake_up_all(&sbi->writeback_wqh);
> >>> +}
> >>> +
> >>>  /*
> >>>   * file.c
> >>>   */
> >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >>> index bd79bbe..0708aa9 100644
> >>> --- a/fs/f2fs/segment.c
> >>> +++ b/fs/f2fs/segment.c
> >>> @@ -597,6 +597,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
> >>>  
> >>>  	if (p->is_sync)
> >>>  		complete(p->wait);
> >>> +	f2fs_writeback_wake(p->sbi);
> >>>  	kfree(p);
> >>>  	bio_put(bio);
> >>>  }
> >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >>> index 094ccc6..3ac6d85 100644
> >>> --- a/fs/f2fs/super.c
> >>> +++ b/fs/f2fs/super.c
> >>> @@ -835,6 +835,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>  	mutex_init(&sbi->gc_mutex);
> >>>  	mutex_init(&sbi->writepages);
> >>>  	mutex_init(&sbi->cp_mutex);
> >>> +	init_waitqueue_head(&sbi->writeback_wqh);
> >>>  	for (i = 0; i < NR_GLOBAL_LOCKS; i++)
> >>>  		mutex_init(&sbi->fs_lock[i]);
> >>>  	mutex_init(&sbi->node_write);N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±‘êçzX§¶\x17›¡Ü¨}©ž²Æ zÚ&j:+v‰¨¾\a«‘êçzZ+€Ê+zf£¢·hšˆ§~†­†Ûiÿûàz¹\x1e®w¥¢¸?™¨è­Ú&¢)ߢ^[f”ù^jÇ«y§m…á@A«a¶Ú\x7fÿ\f0¶ìh®\x0få’i\x7f
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
 		 	   		  

[-- Attachment #1.2: Type: text/html, Size: 9624 bytes --]

[-- Attachment #2: Type: text/plain, Size: 416 bytes --]

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk

[-- Attachment #3: Type: text/plain, Size: 179 bytes --]

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
  2013-10-10  8:09   ` Jin Xu
@ 2013-10-10  8:11       ` Gu Zheng
  0 siblings, 0 replies; 13+ messages in thread
From: Gu Zheng @ 2013-10-10  8:11 UTC (permalink / raw)
  To: Jin Xu
  Cc: yuan.mark.zhong, Jaegeuk Kim, linux-f2fs-devel, linux-kernel,
	linux-fsdevel, shu tan

Hi Jin,

On 10/10/2013 04:09 PM, Jin Xu wrote:

> Hi Gu,
> 
> I have a comment below.
> 
>> Date: Wed, 9 Oct 2013 12:04:09 +0800
>> From: guz.fnst@cn.fujitsu.com
>> To: yuan.mark.zhong@samsung.com
>> CC: jaegeuk.kim@samsung.com; linux-f2fs-devel@lists.sourceforge.net; linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; shu.tan@samsung.com
>> Subject: Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
>>
>> Hi Yuan,
>> On 10/08/2013 07:30 PM, Yuan Zhong wrote:
>>
> ...
>>
>> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> ---
>> fs/f2fs/checkpoint.c | 11 +++++++++--
>> fs/f2fs/f2fs.h | 1 +
>> fs/f2fs/segment.c | 4 ++++
>> 3 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>> index d808827..2a5999d 100644
>> --- a/fs/f2fs/checkpoint.c
>> +++ b/fs/f2fs/checkpoint.c
>> @@ -757,8 +757,15 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
>> f2fs_put_page(cp_page, 1);
>>
>> /* wait for previous submitted node/meta pages writeback */
>> - while (get_pages(sbi, F2FS_WRITEBACK))
>> - congestion_wait(BLK_RW_ASYNC, HZ / 50);
>> + sbi->cp_task = current;
>> + while (get_pages(sbi, F2FS_WRITEBACK)) {
>> + set_current_state(TASK_UNINTERRUPTIBLE);
>> + if (!get_pages(sbi, F2FS_WRITEBACK))
>> + break;
>> + io_schedule();
>> + }
>> + __set_current_state(TASK_RUNNING);
>> + sbi->cp_task = NULL;
>>
>> filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
>> filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index a955a59..408ace7 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -365,6 +365,7 @@ struct f2fs_sb_info {
>> struct mutex writepages; /* mutex for writepages() */
>> int por_doing; /* recovery is doing or not */
>> int on_build_free_nids; /* build_free_nids is doing */
>> + struct task_struct *cp_task; /* checkpoint task */
>>
>> /* for orphan inode management */
>> struct list_head orphan_inode_list; /* orphan inode list */
>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> index bd79bbe..3b20359 100644
>> --- a/fs/f2fs/segment.c
>> +++ b/fs/f2fs/segment.c
>> @@ -597,6 +597,10 @@ static void f2fs_end_io_write(struct bio *bio, int err)
>>
>> if (p->is_sync)
>> complete(p->wait);
>> +
>> + if (!get_pages(p->sbi, F2FS_WRITEBACK) && p->sbi->cp_task)
>> + wake_up_process(p->sbi->cp_task);
> 
> There is a risk of dereferencing a NULL pointer because here simply comparing the
> cp_task against NULL is not enough to avoid race in multi-thread environment.
> Another thread could have assigned it to NULL in the window between the comparison
> and waking up.

Can not be that, checkpoint routine is always singleton and protected by cp_mutex and cp_rwsem.

Thanks,
Gu

> 
> Regards,
> Jin
>> +
>> kfree(p);
>> bio_put(bio);
>> }
>> --
>> 1.7.7
>>
>> Regards,
>> Gu
>>
>> >
>> >
>> >>> This is a problem here, especially, when sync a large number of small files or dirs.
>> >>> In order to avoid this, a wait_list is introduced,
>> >>> the checkpoint thread will be dropped into the wait_list if the pages have not been written back,
>> >>> and will be waked up by contrast.
>> >
>> >> Please pay some attention to the mail form, this mail is out of format in my mail client.
>> >
>> >> Regards,
>> >> Gu
>> >
>> > Regards,
>> > Yuan
>> >
>> >>>
>> >>> Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
>> >>> ---
>> >>> fs/f2fs/checkpoint.c | 3 +--
>> >>> fs/f2fs/f2fs.h | 19 +++++++++++++++++++
>> >>> fs/f2fs/segment.c | 1 +
>> >>> fs/f2fs/super.c | 1 +
>> >>> 4 files changed, 22 insertions(+), 2 deletions(-)
>> >>>
>> >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>> >>> index ca39442..5d69ae0 100644
>> >>> --- a/fs/f2fs/checkpoint.c
>> >>> +++ b/fs/f2fs/checkpoint.c
>> >>> @@ -758,8 +758,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
>> >>> f2fs_put_page(cp_page, 1);
>> >>>
>> >>> /* wait for previous submitted node/meta pages writeback */
>> >>> - while (get_pages(sbi, F2FS_WRITEBACK))
>> >>> - congestion_wait(BLK_RW_ASYNC, HZ / 50);
>> >>> + f2fs_writeback_wait(sbi);
>> >>>
>> >>> filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
>> >>> filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
>> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> >>> index 7fd99d8..4b0d70e 100644
>> >>> --- a/fs/f2fs/f2fs.h
>> >>> +++ b/fs/f2fs/f2fs.h
>> >>> @@ -18,6 +18,8 @@
>> >>> #include <linux/crc32.h>
>> >>> #include <linux/magic.h>
>> >>> #include <linux/kobject.h>
>> >>> +#include <linux/wait.h>
>> >>> +#include <linux/sched.h>
>> >>>
>> >>> /*
>> >>> * For mount options
>> >>> @@ -368,6 +370,7 @@ struct f2fs_sb_info {
>> >>> struct mutex fs_lock[NR_GLOBAL_LOCKS]; /* blocking FS operations */
>> >>> struct mutex node_write; /* locking node writes */
>> >>> struct mutex writepages; /* mutex for writepages() */
>> >>> + wait_queue_head_t writeback_wqh; /* wait_queue for writeback */
>> >>> unsigned char next_lock_num; /* round-robin global locks */
>> >>> int por_doing; /* recovery is doing or not */
>> >>> int on_build_free_nids; /* build_free_nids is doing */
>> >>> @@ -961,6 +964,22 @@ static inline int f2fs_readonly(struct super_block *sb)
>> >>> return sb->s_flags & MS_RDONLY;
>> >>> }
>> >>>
>> >>> +static inline void f2fs_writeback_wait(struct f2fs_sb_info *sbi)
>> >>> +{
>> >>> + DEFINE_WAIT(wait);
>> >>> +
>> >>> + prepare_to_wait(&sbi->writeback_wqh, &wait, TASK_UNINTERRUPTIBLE);
>> >>> + if (get_pages(sbi, F2FS_WRITEBACK))
>> >>> + io_schedule();
>> >>> + finish_wait(&sbi->writeback_wqh, &wait);
>> >>> +}
>> >>> +
>> >>> +static inline void f2fs_writeback_wake(struct f2fs_sb_info *sbi)
>> >>> +{
>> >>> + if (!get_pages(sbi, F2FS_WRITEBACK))
>> >>> + wake_up_all(&sbi->writeback_wqh);
>> >>> +}
>> >>> +
>> >>> /*
>> >>> * file.c
>> >>> */
>> >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> >>> index bd79bbe..0708aa9 100644
>> >>> --- a/fs/f2fs/segment.c
>> >>> +++ b/fs/f2fs/segment.c
>> >>> @@ -597,6 +597,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
>> >>>
>> >>> if (p->is_sync)
>> >>> complete(p->wait);
>> >>> + f2fs_writeback_wake(p->sbi);
>> >>> kfree(p);
>> >>> bio_put(bio);
>> >>> }
>> >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>> >>> index 094ccc6..3ac6d85 100644
>> >>> --- a/fs/f2fs/super.c
>> >>> +++ b/fs/f2fs/super.c
>> >>> @@ -835,6 +835,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>> >>> mutex_init(&sbi->gc_mutex);
>> >>> mutex_init(&sbi->writepages);
>> >>> mutex_init(&sbi->cp_mutex);
>> >>> + init_waitqueue_head(&sbi->writeback_wqh);
>> >>> for (i = 0; i < NR_GLOBAL_LOCKS; i++)
>> >>> mutex_init(&sbi->fs_lock[i]);
>> >>> mutex_init(&sbi->node_write);N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±‘êçzX§¶\x17›¡Ü¨}©ž²Æ zÚ&j:+v‰¨¾\a«‘êçzZ+€Ê+zf£¢·hšˆ§~†­†Ûiÿûàz¹\x1e®w¥¢¸?™¨è­Ú&¢)ߢ^[f”ù^jÇ«y§m…á@A«a¶Ú\x7fÿ\f0¶ìh®\x0få’i\x7f
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
@ 2013-10-10  8:11       ` Gu Zheng
  0 siblings, 0 replies; 13+ messages in thread
From: Gu Zheng @ 2013-10-10  8:11 UTC (permalink / raw)
  To: Jin Xu; +Cc: shu tan, linux-kernel, linux-f2fs-devel, linux-fsdevel

Hi Jin,

On 10/10/2013 04:09 PM, Jin Xu wrote:

> Hi Gu,
> 
> I have a comment below.
> 
>> Date: Wed, 9 Oct 2013 12:04:09 +0800
>> From: guz.fnst@cn.fujitsu.com
>> To: yuan.mark.zhong@samsung.com
>> CC: jaegeuk.kim@samsung.com; linux-f2fs-devel@lists.sourceforge.net; linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; shu.tan@samsung.com
>> Subject: Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
>>
>> Hi Yuan,
>> On 10/08/2013 07:30 PM, Yuan Zhong wrote:
>>
> ...
>>
>> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> ---
>> fs/f2fs/checkpoint.c | 11 +++++++++--
>> fs/f2fs/f2fs.h | 1 +
>> fs/f2fs/segment.c | 4 ++++
>> 3 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>> index d808827..2a5999d 100644
>> --- a/fs/f2fs/checkpoint.c
>> +++ b/fs/f2fs/checkpoint.c
>> @@ -757,8 +757,15 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
>> f2fs_put_page(cp_page, 1);
>>
>> /* wait for previous submitted node/meta pages writeback */
>> - while (get_pages(sbi, F2FS_WRITEBACK))
>> - congestion_wait(BLK_RW_ASYNC, HZ / 50);
>> + sbi->cp_task = current;
>> + while (get_pages(sbi, F2FS_WRITEBACK)) {
>> + set_current_state(TASK_UNINTERRUPTIBLE);
>> + if (!get_pages(sbi, F2FS_WRITEBACK))
>> + break;
>> + io_schedule();
>> + }
>> + __set_current_state(TASK_RUNNING);
>> + sbi->cp_task = NULL;
>>
>> filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
>> filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index a955a59..408ace7 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -365,6 +365,7 @@ struct f2fs_sb_info {
>> struct mutex writepages; /* mutex for writepages() */
>> int por_doing; /* recovery is doing or not */
>> int on_build_free_nids; /* build_free_nids is doing */
>> + struct task_struct *cp_task; /* checkpoint task */
>>
>> /* for orphan inode management */
>> struct list_head orphan_inode_list; /* orphan inode list */
>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> index bd79bbe..3b20359 100644
>> --- a/fs/f2fs/segment.c
>> +++ b/fs/f2fs/segment.c
>> @@ -597,6 +597,10 @@ static void f2fs_end_io_write(struct bio *bio, int err)
>>
>> if (p->is_sync)
>> complete(p->wait);
>> +
>> + if (!get_pages(p->sbi, F2FS_WRITEBACK) && p->sbi->cp_task)
>> + wake_up_process(p->sbi->cp_task);
> 
> There is a risk of dereferencing a NULL pointer because here simply comparing the
> cp_task against NULL is not enough to avoid race in multi-thread environment.
> Another thread could have assigned it to NULL in the window between the comparison
> and waking up.

Can not be that, checkpoint routine is always singleton and protected by cp_mutex and cp_rwsem.

Thanks,
Gu

> 
> Regards,
> Jin
>> +
>> kfree(p);
>> bio_put(bio);
>> }
>> --
>> 1.7.7
>>
>> Regards,
>> Gu
>>
>> >
>> >
>> >>> This is a problem here, especially, when sync a large number of small files or dirs.
>> >>> In order to avoid this, a wait_list is introduced,
>> >>> the checkpoint thread will be dropped into the wait_list if the pages have not been written back,
>> >>> and will be waked up by contrast.
>> >
>> >> Please pay some attention to the mail form, this mail is out of format in my mail client.
>> >
>> >> Regards,
>> >> Gu
>> >
>> > Regards,
>> > Yuan
>> >
>> >>>
>> >>> Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
>> >>> ---
>> >>> fs/f2fs/checkpoint.c | 3 +--
>> >>> fs/f2fs/f2fs.h | 19 +++++++++++++++++++
>> >>> fs/f2fs/segment.c | 1 +
>> >>> fs/f2fs/super.c | 1 +
>> >>> 4 files changed, 22 insertions(+), 2 deletions(-)
>> >>>
>> >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>> >>> index ca39442..5d69ae0 100644
>> >>> --- a/fs/f2fs/checkpoint.c
>> >>> +++ b/fs/f2fs/checkpoint.c
>> >>> @@ -758,8 +758,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
>> >>> f2fs_put_page(cp_page, 1);
>> >>>
>> >>> /* wait for previous submitted node/meta pages writeback */
>> >>> - while (get_pages(sbi, F2FS_WRITEBACK))
>> >>> - congestion_wait(BLK_RW_ASYNC, HZ / 50);
>> >>> + f2fs_writeback_wait(sbi);
>> >>>
>> >>> filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
>> >>> filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
>> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> >>> index 7fd99d8..4b0d70e 100644
>> >>> --- a/fs/f2fs/f2fs.h
>> >>> +++ b/fs/f2fs/f2fs.h
>> >>> @@ -18,6 +18,8 @@
>> >>> #include <linux/crc32.h>
>> >>> #include <linux/magic.h>
>> >>> #include <linux/kobject.h>
>> >>> +#include <linux/wait.h>
>> >>> +#include <linux/sched.h>
>> >>>
>> >>> /*
>> >>> * For mount options
>> >>> @@ -368,6 +370,7 @@ struct f2fs_sb_info {
>> >>> struct mutex fs_lock[NR_GLOBAL_LOCKS]; /* blocking FS operations */
>> >>> struct mutex node_write; /* locking node writes */
>> >>> struct mutex writepages; /* mutex for writepages() */
>> >>> + wait_queue_head_t writeback_wqh; /* wait_queue for writeback */
>> >>> unsigned char next_lock_num; /* round-robin global locks */
>> >>> int por_doing; /* recovery is doing or not */
>> >>> int on_build_free_nids; /* build_free_nids is doing */
>> >>> @@ -961,6 +964,22 @@ static inline int f2fs_readonly(struct super_block *sb)
>> >>> return sb->s_flags & MS_RDONLY;
>> >>> }
>> >>>
>> >>> +static inline void f2fs_writeback_wait(struct f2fs_sb_info *sbi)
>> >>> +{
>> >>> + DEFINE_WAIT(wait);
>> >>> +
>> >>> + prepare_to_wait(&sbi->writeback_wqh, &wait, TASK_UNINTERRUPTIBLE);
>> >>> + if (get_pages(sbi, F2FS_WRITEBACK))
>> >>> + io_schedule();
>> >>> + finish_wait(&sbi->writeback_wqh, &wait);
>> >>> +}
>> >>> +
>> >>> +static inline void f2fs_writeback_wake(struct f2fs_sb_info *sbi)
>> >>> +{
>> >>> + if (!get_pages(sbi, F2FS_WRITEBACK))
>> >>> + wake_up_all(&sbi->writeback_wqh);
>> >>> +}
>> >>> +
>> >>> /*
>> >>> * file.c
>> >>> */
>> >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> >>> index bd79bbe..0708aa9 100644
>> >>> --- a/fs/f2fs/segment.c
>> >>> +++ b/fs/f2fs/segment.c
>> >>> @@ -597,6 +597,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
>> >>>
>> >>> if (p->is_sync)
>> >>> complete(p->wait);
>> >>> + f2fs_writeback_wake(p->sbi);
>> >>> kfree(p);
>> >>> bio_put(bio);
>> >>> }
>> >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>> >>> index 094ccc6..3ac6d85 100644
>> >>> --- a/fs/f2fs/super.c
>> >>> +++ b/fs/f2fs/super.c
>> >>> @@ -835,6 +835,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>> >>> mutex_init(&sbi->gc_mutex);
>> >>> mutex_init(&sbi->writepages);
>> >>> mutex_init(&sbi->cp_mutex);
>> >>> + init_waitqueue_head(&sbi->writeback_wqh);
>> >>> for (i = 0; i < NR_GLOBAL_LOCKS; i++)
>> >>> mutex_init(&sbi->fs_lock[i]);
>> >>> mutex_init(&sbi->node_write);N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±‘êçzX§¶\x17›¡Ü¨}©ž²Æ zÚ&j:+v‰¨¾\a«‘êçzZ+€Ê+zf£¢·hšˆ§~†­†Ûiÿûàz¹\x1e®w¥¢¸?™¨è­Ú&¢)ߢ^[f”ù^jÇ«y§m…á@A«a¶Ú\x7fÿ\f0¶ìh®\x0få’i\x7f
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html



------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
  2013-10-10  8:11       ` Gu Zheng
  (?)
@ 2013-10-10 23:54       ` Jin Xu
  2013-10-11  2:16           ` Gu Zheng
  -1 siblings, 1 reply; 13+ messages in thread
From: Jin Xu @ 2013-10-10 23:54 UTC (permalink / raw)
  To: Gu Zheng; +Cc: shu tan, linux-kernel, linux-f2fs-devel, linux-fsdevel


[-- Attachment #1.1: Type: text/plain, Size: 8784 bytes --]

> Date: Thu, 10 Oct 2013 16:11:53 +0800
> From: guz.fnst@cn.fujitsu.com
> To: jinuxstyle@live.com
> CC: yuan.mark.zhong@samsung.com; jaegeuk.kim@samsung.com; linux-f2fs-devel@lists.sourceforge.net; linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; shu.tan@samsung.com
> Subject: Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
> 
> Hi Jin,
> 
> On 10/10/2013 04:09 PM, Jin Xu wrote:
> 
> > Hi Gu,
> > 
> > I have a comment below.
> > 
> >> Date: Wed, 9 Oct 2013 12:04:09 +0800
> >> From: guz.fnst@cn.fujitsu.com
> >> To: yuan.mark.zhong@samsung.com
> >> CC: jaegeuk.kim@samsung.com; linux-f2fs-devel@lists.sourceforge.net; linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; shu.tan@samsung.com
> >> Subject: Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
> >>
> >> Hi Yuan,
> >> On 10/08/2013 07:30 PM, Yuan Zhong wrote:
> >>
> > ...
> >>
> >> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> >> ---
> >> fs/f2fs/checkpoint.c | 11 +++++++++--
> >> fs/f2fs/f2fs.h | 1 +
> >> fs/f2fs/segment.c | 4 ++++
> >> 3 files changed, 14 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> >> index d808827..2a5999d 100644
> >> --- a/fs/f2fs/checkpoint.c
> >> +++ b/fs/f2fs/checkpoint.c
> >> @@ -757,8 +757,15 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
> >> f2fs_put_page(cp_page, 1);
> >>
> >> /* wait for previous submitted node/meta pages writeback */
> >> - while (get_pages(sbi, F2FS_WRITEBACK))
> >> - congestion_wait(BLK_RW_ASYNC, HZ / 50);
> >> + sbi->cp_task = current;
> >> + while (get_pages(sbi, F2FS_WRITEBACK)) {
> >> + set_current_state(TASK_UNINTERRUPTIBLE);
> >> + if (!get_pages(sbi, F2FS_WRITEBACK))
> >> + break;
> >> + io_schedule();
> >> + }
> >> + __set_current_state(TASK_RUNNING);
> >> + sbi->cp_task = NULL;
> >>
> >> filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
> >> filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
> >> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >> index a955a59..408ace7 100644
> >> --- a/fs/f2fs/f2fs.h
> >> +++ b/fs/f2fs/f2fs.h
> >> @@ -365,6 +365,7 @@ struct f2fs_sb_info {
> >> struct mutex writepages; /* mutex for writepages() */
> >> int por_doing; /* recovery is doing or not */
> >> int on_build_free_nids; /* build_free_nids is doing */
> >> + struct task_struct *cp_task; /* checkpoint task */
> >>
> >> /* for orphan inode management */
> >> struct list_head orphan_inode_list; /* orphan inode list */
> >> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >> index bd79bbe..3b20359 100644
> >> --- a/fs/f2fs/segment.c
> >> +++ b/fs/f2fs/segment.c
> >> @@ -597,6 +597,10 @@ static void f2fs_end_io_write(struct bio *bio, int err)
> >>
> >> if (p->is_sync)
> >> complete(p->wait);
> >> +
> >> + if (!get_pages(p->sbi, F2FS_WRITEBACK) && p->sbi->cp_task)
> >> + wake_up_process(p->sbi->cp_task);
> > 
> > There is a risk of dereferencing a NULL pointer because here simply comparing the
> > cp_task against NULL is not enough to avoid race in multi-thread environment.
> > Another thread could have assigned it to NULL in the window between the comparison
> > and waking up.
> 
> Can not be that, checkpoint routine is always singleton and protected by cp_mutex and cp_rwsem.
> 
The race could happen like this for example:On a SMP environment, thread 1 wakes up the checkpoint thread, thenthread 2 comes to the f2fs_end_io_write, compared the cp_task as not NULL,but at the same time, the checkpoint thread just assigned the cp_task to NULL.When thread 2 gets to the wake_up_process, dereferencing to NULL pointerhappens.
Jin
> Thanks,
> Gu 
> 
> > 
> > Regards,
> > Jin
> >> +
> >> kfree(p);
> >> bio_put(bio);
> >> }
> >> --
> >> 1.7.7
> >>
> >> Regards,
> >> Gu
> >>
> >> >
> >> >
> >> >>> This is a problem here, especially, when sync a large number of small files or dirs.
> >> >>> In order to avoid this, a wait_list is introduced,
> >> >>> the checkpoint thread will be dropped into the wait_list if the pages have not been written back,
> >> >>> and will be waked up by contrast.
> >> >
> >> >> Please pay some attention to the mail form, this mail is out of format in my mail client.
> >> >
> >> >> Regards,
> >> >> Gu
> >> >
> >> > Regards,
> >> > Yuan
> >> >
> >> >>>
> >> >>> Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
> >> >>> ---
> >> >>> fs/f2fs/checkpoint.c | 3 +--
> >> >>> fs/f2fs/f2fs.h | 19 +++++++++++++++++++
> >> >>> fs/f2fs/segment.c | 1 +
> >> >>> fs/f2fs/super.c | 1 +
> >> >>> 4 files changed, 22 insertions(+), 2 deletions(-)
> >> >>>
> >> >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> >> >>> index ca39442..5d69ae0 100644
> >> >>> --- a/fs/f2fs/checkpoint.c
> >> >>> +++ b/fs/f2fs/checkpoint.c
> >> >>> @@ -758,8 +758,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
> >> >>> f2fs_put_page(cp_page, 1);
> >> >>>
> >> >>> /* wait for previous submitted node/meta pages writeback */
> >> >>> - while (get_pages(sbi, F2FS_WRITEBACK))
> >> >>> - congestion_wait(BLK_RW_ASYNC, HZ / 50);
> >> >>> + f2fs_writeback_wait(sbi);
> >> >>>
> >> >>> filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
> >> >>> filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
> >> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >> >>> index 7fd99d8..4b0d70e 100644
> >> >>> --- a/fs/f2fs/f2fs.h
> >> >>> +++ b/fs/f2fs/f2fs.h
> >> >>> @@ -18,6 +18,8 @@
> >> >>> #include <linux/crc32.h>
> >> >>> #include <linux/magic.h>
> >> >>> #include <linux/kobject.h>
> >> >>> +#include <linux/wait.h>
> >> >>> +#include <linux/sched.h>
> >> >>>
> >> >>> /*
> >> >>> * For mount options
> >> >>> @@ -368,6 +370,7 @@ struct f2fs_sb_info {
> >> >>> struct mutex fs_lock[NR_GLOBAL_LOCKS]; /* blocking FS operations */
> >> >>> struct mutex node_write; /* locking node writes */
> >> >>> struct mutex writepages; /* mutex for writepages() */
> >> >>> + wait_queue_head_t writeback_wqh; /* wait_queue for writeback */
> >> >>> unsigned char next_lock_num; /* round-robin global locks */
> >> >>> int por_doing; /* recovery is doing or not */
> >> >>> int on_build_free_nids; /* build_free_nids is doing */
> >> >>> @@ -961,6 +964,22 @@ static inline int f2fs_readonly(struct super_block *sb)
> >> >>> return sb->s_flags & MS_RDONLY;
> >> >>> }
> >> >>>
> >> >>> +static inline void f2fs_writeback_wait(struct f2fs_sb_info *sbi)
> >> >>> +{
> >> >>> + DEFINE_WAIT(wait);
> >> >>> +
> >> >>> + prepare_to_wait(&sbi->writeback_wqh, &wait, TASK_UNINTERRUPTIBLE);
> >> >>> + if (get_pages(sbi, F2FS_WRITEBACK))
> >> >>> + io_schedule();
> >> >>> + finish_wait(&sbi->writeback_wqh, &wait);
> >> >>> +}
> >> >>> +
> >> >>> +static inline void f2fs_writeback_wake(struct f2fs_sb_info *sbi)
> >> >>> +{
> >> >>> + if (!get_pages(sbi, F2FS_WRITEBACK))
> >> >>> + wake_up_all(&sbi->writeback_wqh);
> >> >>> +}
> >> >>> +
> >> >>> /*
> >> >>> * file.c
> >> >>> */
> >> >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >> >>> index bd79bbe..0708aa9 100644
> >> >>> --- a/fs/f2fs/segment.c
> >> >>> +++ b/fs/f2fs/segment.c
> >> >>> @@ -597,6 +597,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
> >> >>>
> >> >>> if (p->is_sync)
> >> >>> complete(p->wait);
> >> >>> + f2fs_writeback_wake(p->sbi);
> >> >>> kfree(p);
> >> >>> bio_put(bio);
> >> >>> }
> >> >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >> >>> index 094ccc6..3ac6d85 100644
> >> >>> --- a/fs/f2fs/super.c
> >> >>> +++ b/fs/f2fs/super.c
> >> >>> @@ -835,6 +835,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >> >>> mutex_init(&sbi->gc_mutex);
> >> >>> mutex_init(&sbi->writepages);
> >> >>> mutex_init(&sbi->cp_mutex);
> >> >>> + init_waitqueue_head(&sbi->writeback_wqh);
> >> >>> for (i = 0; i < NR_GLOBAL_LOCKS; i++)
> >> >>> mutex_init(&sbi->fs_lock[i]);
> >> >>> mutex_init(&sbi->node_write);N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±‘êçzX§¶\x17›¡Ü¨}©ž²Æ zÚ&j:+v‰¨¾\a«‘êçzZ+€Ê+zf£¢·hšˆ§~†­†Ûiÿûàz¹\x1e®w¥¢¸?™¨è­Ú&¢)ߢ^[f”ù^jÇ«y§m…á@A«a¶Ú\x7fÿ\f0¶ìh®\x0få’i\x7f
> >>
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
 		 	   		  

[-- Attachment #1.2: Type: text/html, Size: 12660 bytes --]

[-- Attachment #2: Type: text/plain, Size: 416 bytes --]

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk

[-- Attachment #3: Type: text/plain, Size: 179 bytes --]

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
  2013-10-10 23:54       ` Jin Xu
@ 2013-10-11  2:16           ` Gu Zheng
  0 siblings, 0 replies; 13+ messages in thread
From: Gu Zheng @ 2013-10-11  2:16 UTC (permalink / raw)
  To: Jin Xu
  Cc: yuan.mark.zhong, Jaegeuk Kim, linux-f2fs-devel, linux-kernel,
	linux-fsdevel, shu tan

Hi Jin,
On 10/11/2013 07:54 AM, Jin Xu wrote:

>> Date: Thu, 10 Oct 2013 16:11:53 +0800
>> From: guz.fnst@cn.fujitsu.com
>> To: jinuxstyle@live.com
>> CC: yuan.mark.zhong@samsung.com; jaegeuk.kim@samsung.com; linux-f2fs-devel@lists.sourceforge.net; linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; shu.tan@samsung.com
>> Subject: Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
>>
>> Hi Jin,
>>
>> On 10/10/2013 04:09 PM, Jin Xu wrote:
>>
>> > Hi Gu,
>> >
>> > I have a comment below.
>> >
>> >> Date: Wed, 9 Oct 2013 12:04:09 +0800
>> >> From: guz.fnst@cn.fujitsu.com
>> >> To: yuan.mark.zhong@samsung.com
>> >> CC: jaegeuk.kim@samsung.com; linux-f2fs-devel@lists.sourceforge.net; linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; shu.tan@samsung.com
>> >> Subject: Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
>> >>
>> >> Hi Yuan,
>> >> On 10/08/2013 07:30 PM, Yuan Zhong wrote:
>> >>
>> > ...
>> >>
>> >> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> >> ---
>> >> fs/f2fs/checkpoint.c | 11 +++++++++--
>> >> fs/f2fs/f2fs.h | 1 +
>> >> fs/f2fs/segment.c | 4 ++++
>> >> 3 files changed, 14 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>> >> index d808827..2a5999d 100644
>> >> --- a/fs/f2fs/checkpoint.c
>> >> +++ b/fs/f2fs/checkpoint.c
>> >> @@ -757,8 +757,15 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
>> >> f2fs_put_page(cp_page, 1);
>> >>
>> >> /* wait for previous submitted node/meta pages writeback */
>> >> - while (get_pages(sbi, F2FS_WRITEBACK))
>> >> - congestion_wait(BLK_RW_ASYNC, HZ / 50);
>> >> + sbi->cp_task = current;
>> >> + while (get_pages(sbi, F2FS_WRITEBACK)) {
>> >> + set_current_state(TASK_UNINTERRUPTIBLE);
>> >> + if (!get_pages(sbi, F2FS_WRITEBACK))
>> >> + break;
>> >> + io_schedule();
>> >> + }
>> >> + __set_current_state(TASK_RUNNING);
>> >> + sbi->cp_task = NULL;
>> >>
>> >> filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
>> >> filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
>> >> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> >> index a955a59..408ace7 100644
>> >> --- a/fs/f2fs/f2fs.h
>> >> +++ b/fs/f2fs/f2fs.h
>> >> @@ -365,6 +365,7 @@ struct f2fs_sb_info {
>> >> struct mutex writepages; /* mutex for writepages() */
>> >> int por_doing; /* recovery is doing or not */
>> >> int on_build_free_nids; /* build_free_nids is doing */
>> >> + struct task_struct *cp_task; /* checkpoint task */
>> >>
>> >> /* for orphan inode management */
>> >> struct list_head orphan_inode_list; /* orphan inode list */
>> >> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> >> index bd79bbe..3b20359 100644
>> >> --- a/fs/f2fs/segment.c
>> >> +++ b/fs/f2fs/segment.c
>> >> @@ -597,6 +597,10 @@ static void f2fs_end_io_write(struct bio *bio, int err)
>> >>
>> >> if (p->is_sync)
>> >> complete(p->wait);
>> >> +
>> >> + if (!get_pages(p->sbi, F2FS_WRITEBACK) && p->sbi->cp_task)
>> >> + wake_up_process(p->sbi->cp_task);
>> >
>> > There is a risk of dereferencing a NULL pointer because here simply comparing the
>> > cp_task against NULL is not enough to avoid race in multi-thread environment.
>> > Another thread could have assigned it to NULL in the window between the comparison
>> > and waking up.
>>
>> Can not be that, checkpoint routine is always singleton and protected by cp_mutex and cp_rwsem.
>> 
> 
> The race could happen like this for example:
> On a SMP environment, thread 1 wakes up the checkpoint thread, then
> thread 2 comes to the f2fs_end_io_write, compared the cp_task as not NULL,
> but at the same time, the checkpoint thread just assigned the cp_task to NULL.
> When thread 2 gets to the wake_up_process, dereferencing to NULL pointer
> happens.

The case means that two or more IO are going on. If thread 1 wake up checkpoint, it'll check
get_pages(p->sbi, F2FS_WRITEBACK) before going down to assign the cp_task to NULL, so when
thread 2 gets to the wake_up_process the cp_task is still valid. 
The "while (get_pages(sbi, F2FS_WRITEBACK))" loop is used to avoid this issue.

Thanks,
Gu

> 
> Jin
> 
>> Thanks,
>> Gu 
>>
>> >
>> > Regards,
>> > Jin
>> >> +
>> >> kfree(p);
>> >> bio_put(bio);
>> >> }
>> >> --
>> >> 1.7.7
>> >>
>> >> Regards,
>> >> Gu
>> >>
>> >> >
>> >> >
>> >> >>> This is a problem here, especially, when sync a large number of small files or dirs.
>> >> >>> In order to avoid this, a wait_list is introduced,
>> >> >>> the checkpoint thread will be dropped into the wait_list if the pages have not been written back,
>> >> >>> and will be waked up by contrast.
>> >> >
>> >> >> Please pay some attention to the mail form, this mail is out of format in my mail client.
>> >> >
>> >> >> Regards,
>> >> >> Gu
>> >> >
>> >> > Regards,
>> >> > Yuan
>> >> >
>> >> >>>
>> >> >>> Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
>> >> >>> ---
>> >> >>> fs/f2fs/checkpoint.c | 3 +--
>> >> >>> fs/f2fs/f2fs.h | 19 +++++++++++++++++++
>> >> >>> fs/f2fs/segment.c | 1 +
>> >> >>> fs/f2fs/super.c | 1 +
>> >> >>> 4 files changed, 22 insertions(+), 2 deletions(-)
>> >> >>>
>> >> >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>> >> >>> index ca39442..5d69ae0 100644
>> >> >>> --- a/fs/f2fs/checkpoint.c
>> >> >>> +++ b/fs/f2fs/checkpoint.c
>> >> >>> @@ -758,8 +758,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
>> >> >>> f2fs_put_page(cp_page, 1);
>> >> >>>
>> >> >>> /* wait for previous submitted node/meta pages writeback */
>> >> >>> - while (get_pages(sbi, F2FS_WRITEBACK))
>> >> >>> - congestion_wait(BLK_RW_ASYNC, HZ / 50);
>> >> >>> + f2fs_writeback_wait(sbi);
>> >> >>>
>> >> >>> filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
>> >> >>> filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
>> >> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> >> >>> index 7fd99d8..4b0d70e 100644
>> >> >>> --- a/fs/f2fs/f2fs.h
>> >> >>> +++ b/fs/f2fs/f2fs.h
>> >> >>> @@ -18,6 +18,8 @@
>> >> >>> #include <linux/crc32.h>
>> >> >>> #include <linux/magic.h>
>> >> >>> #include <linux/kobject.h>
>> >> >>> +#include <linux/wait.h>
>> >> >>> +#include <linux/sched.h>
>> >> >>>
>> >> >>> /*
>> >> >>> * For mount options
>> >> >>> @@ -368,6 +370,7 @@ struct f2fs_sb_info {
>> >> >>> struct mutex fs_lock[NR_GLOBAL_LOCKS]; /* blocking FS operations */
>> >> >>> struct mutex node_write; /* locking node writes */
>> >> >>> struct mutex writepages; /* mutex for writepages() */
>> >> >>> + wait_queue_head_t writeback_wqh; /* wait_queue for writeback */
>> >> >>> unsigned char next_lock_num; /* round-robin global locks */
>> >> >>> int por_doing; /* recovery is doing or not */
>> >> >>> int on_build_free_nids; /* build_free_nids is doing */
>> >> >>> @@ -961,6 +964,22 @@ static inline int f2fs_readonly(struct super_block *sb)
>> >> >>> return sb->s_flags & MS_RDONLY;
>> >> >>> }
>> >> >>>
>> >> >>> +static inline void f2fs_writeback_wait(struct f2fs_sb_info *sbi)
>> >> >>> +{
>> >> >>> + DEFINE_WAIT(wait);
>> >> >>> +
>> >> >>> + prepare_to_wait(&sbi->writeback_wqh, &wait, TASK_UNINTERRUPTIBLE);
>> >> >>> + if (get_pages(sbi, F2FS_WRITEBACK))
>> >> >>> + io_schedule();
>> >> >>> + finish_wait(&sbi->writeback_wqh, &wait);
>> >> >>> +}
>> >> >>> +
>> >> >>> +static inline void f2fs_writeback_wake(struct f2fs_sb_info *sbi)
>> >> >>> +{
>> >> >>> + if (!get_pages(sbi, F2FS_WRITEBACK))
>> >> >>> + wake_up_all(&sbi->writeback_wqh);
>> >> >>> +}
>> >> >>> +
>> >> >>> /*
>> >> >>> * file.c
>> >> >>> */
>> >> >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> >> >>> index bd79bbe..0708aa9 100644
>> >> >>> --- a/fs/f2fs/segment.c
>> >> >>> +++ b/fs/f2fs/segment.c
>> >> >>> @@ -597,6 +597,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
>> >> >>>
>> >> >>> if (p->is_sync)
>> >> >>> complete(p->wait);
>> >> >>> + f2fs_writeback_wake(p->sbi);
>> >> >>> kfree(p);
>> >> >>> bio_put(bio);
>> >> >>> }
>> >> >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>> >> >>> index 094ccc6..3ac6d85 100644
>> >> >>> --- a/fs/f2fs/super.c
>> >> >>> +++ b/fs/f2fs/super.c
>> >> >>> @@ -835,6 +835,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>> >> >>> mutex_init(&sbi->gc_mutex);
>> >> >>> mutex_init(&sbi->writepages);
>> >> >>> mutex_init(&sbi->cp_mutex);
>> >> >>> + init_waitqueue_head(&sbi->writeback_wqh);
>> >> >>> for (i = 0; i < NR_GLOBAL_LOCKS; i++)
>> >> >>> mutex_init(&sbi->fs_lock[i]);
>> >> >>> mutex_init(&sbi->node_write);N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±‘êçzX§¶\x17›¡Ü¨}©ž²Æ zÚ&j:+v‰¨¾\a«‘êçzZ+€Ê+zf£¢·hšˆ§~†­†Ûiÿûàz¹\x1e®w¥¢¸?™¨è­Ú&¢)ߢ^[f”ù^jÇ«y§m…á@A«a¶Ú\x7fÿ\f0¶ìh®\x0få’i\x7f
>> >>
>> >>
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
@ 2013-10-11  2:16           ` Gu Zheng
  0 siblings, 0 replies; 13+ messages in thread
From: Gu Zheng @ 2013-10-11  2:16 UTC (permalink / raw)
  To: Jin Xu
  Cc: yuan.mark.zhong, Jaegeuk Kim, linux-f2fs-devel, linux-kernel,
	linux-fsdevel, shu tan

Hi Jin,
On 10/11/2013 07:54 AM, Jin Xu wrote:

>> Date: Thu, 10 Oct 2013 16:11:53 +0800
>> From: guz.fnst@cn.fujitsu.com
>> To: jinuxstyle@live.com
>> CC: yuan.mark.zhong@samsung.com; jaegeuk.kim@samsung.com; linux-f2fs-devel@lists.sourceforge.net; linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; shu.tan@samsung.com
>> Subject: Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
>>
>> Hi Jin,
>>
>> On 10/10/2013 04:09 PM, Jin Xu wrote:
>>
>> > Hi Gu,
>> >
>> > I have a comment below.
>> >
>> >> Date: Wed, 9 Oct 2013 12:04:09 +0800
>> >> From: guz.fnst@cn.fujitsu.com
>> >> To: yuan.mark.zhong@samsung.com
>> >> CC: jaegeuk.kim@samsung.com; linux-f2fs-devel@lists.sourceforge.net; linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; shu.tan@samsung.com
>> >> Subject: Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
>> >>
>> >> Hi Yuan,
>> >> On 10/08/2013 07:30 PM, Yuan Zhong wrote:
>> >>
>> > ...
>> >>
>> >> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> >> ---
>> >> fs/f2fs/checkpoint.c | 11 +++++++++--
>> >> fs/f2fs/f2fs.h | 1 +
>> >> fs/f2fs/segment.c | 4 ++++
>> >> 3 files changed, 14 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>> >> index d808827..2a5999d 100644
>> >> --- a/fs/f2fs/checkpoint.c
>> >> +++ b/fs/f2fs/checkpoint.c
>> >> @@ -757,8 +757,15 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
>> >> f2fs_put_page(cp_page, 1);
>> >>
>> >> /* wait for previous submitted node/meta pages writeback */
>> >> - while (get_pages(sbi, F2FS_WRITEBACK))
>> >> - congestion_wait(BLK_RW_ASYNC, HZ / 50);
>> >> + sbi->cp_task = current;
>> >> + while (get_pages(sbi, F2FS_WRITEBACK)) {
>> >> + set_current_state(TASK_UNINTERRUPTIBLE);
>> >> + if (!get_pages(sbi, F2FS_WRITEBACK))
>> >> + break;
>> >> + io_schedule();
>> >> + }
>> >> + __set_current_state(TASK_RUNNING);
>> >> + sbi->cp_task = NULL;
>> >>
>> >> filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
>> >> filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
>> >> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> >> index a955a59..408ace7 100644
>> >> --- a/fs/f2fs/f2fs.h
>> >> +++ b/fs/f2fs/f2fs.h
>> >> @@ -365,6 +365,7 @@ struct f2fs_sb_info {
>> >> struct mutex writepages; /* mutex for writepages() */
>> >> int por_doing; /* recovery is doing or not */
>> >> int on_build_free_nids; /* build_free_nids is doing */
>> >> + struct task_struct *cp_task; /* checkpoint task */
>> >>
>> >> /* for orphan inode management */
>> >> struct list_head orphan_inode_list; /* orphan inode list */
>> >> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> >> index bd79bbe..3b20359 100644
>> >> --- a/fs/f2fs/segment.c
>> >> +++ b/fs/f2fs/segment.c
>> >> @@ -597,6 +597,10 @@ static void f2fs_end_io_write(struct bio *bio, int err)
>> >>
>> >> if (p->is_sync)
>> >> complete(p->wait);
>> >> +
>> >> + if (!get_pages(p->sbi, F2FS_WRITEBACK) && p->sbi->cp_task)
>> >> + wake_up_process(p->sbi->cp_task);
>> >
>> > There is a risk of dereferencing a NULL pointer because here simply comparing the
>> > cp_task against NULL is not enough to avoid race in multi-thread environment.
>> > Another thread could have assigned it to NULL in the window between the comparison
>> > and waking up.
>>
>> Can not be that, checkpoint routine is always singleton and protected by cp_mutex and cp_rwsem.
>> 
> 
> The race could happen like this for example:
> On a SMP environment, thread 1 wakes up the checkpoint thread, then
> thread 2 comes to the f2fs_end_io_write, compared the cp_task as not NULL,
> but at the same time, the checkpoint thread just assigned the cp_task to NULL.
> When thread 2 gets to the wake_up_process, dereferencing to NULL pointer
> happens.

The case means that two or more IO are going on. If thread 1 wake up checkpoint, it'll check
get_pages(p->sbi, F2FS_WRITEBACK) before going down to assign the cp_task to NULL, so when
thread 2 gets to the wake_up_process the cp_task is still valid. 
The "while (get_pages(sbi, F2FS_WRITEBACK))" loop is used to avoid this issue.

Thanks,
Gu

> 
> Jin
> 
>> Thanks,
>> Gu 
>>
>> >
>> > Regards,
>> > Jin
>> >> +
>> >> kfree(p);
>> >> bio_put(bio);
>> >> }
>> >> --
>> >> 1.7.7
>> >>
>> >> Regards,
>> >> Gu
>> >>
>> >> >
>> >> >
>> >> >>> This is a problem here, especially, when sync a large number of small files or dirs.
>> >> >>> In order to avoid this, a wait_list is introduced,
>> >> >>> the checkpoint thread will be dropped into the wait_list if the pages have not been written back,
>> >> >>> and will be waked up by contrast.
>> >> >
>> >> >> Please pay some attention to the mail form, this mail is out of format in my mail client.
>> >> >
>> >> >> Regards,
>> >> >> Gu
>> >> >
>> >> > Regards,
>> >> > Yuan
>> >> >
>> >> >>>
>> >> >>> Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
>> >> >>> ---
>> >> >>> fs/f2fs/checkpoint.c | 3 +--
>> >> >>> fs/f2fs/f2fs.h | 19 +++++++++++++++++++
>> >> >>> fs/f2fs/segment.c | 1 +
>> >> >>> fs/f2fs/super.c | 1 +
>> >> >>> 4 files changed, 22 insertions(+), 2 deletions(-)
>> >> >>>
>> >> >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>> >> >>> index ca39442..5d69ae0 100644
>> >> >>> --- a/fs/f2fs/checkpoint.c
>> >> >>> +++ b/fs/f2fs/checkpoint.c
>> >> >>> @@ -758,8 +758,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
>> >> >>> f2fs_put_page(cp_page, 1);
>> >> >>>
>> >> >>> /* wait for previous submitted node/meta pages writeback */
>> >> >>> - while (get_pages(sbi, F2FS_WRITEBACK))
>> >> >>> - congestion_wait(BLK_RW_ASYNC, HZ / 50);
>> >> >>> + f2fs_writeback_wait(sbi);
>> >> >>>
>> >> >>> filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
>> >> >>> filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
>> >> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> >> >>> index 7fd99d8..4b0d70e 100644
>> >> >>> --- a/fs/f2fs/f2fs.h
>> >> >>> +++ b/fs/f2fs/f2fs.h
>> >> >>> @@ -18,6 +18,8 @@
>> >> >>> #include <linux/crc32.h>
>> >> >>> #include <linux/magic.h>
>> >> >>> #include <linux/kobject.h>
>> >> >>> +#include <linux/wait.h>
>> >> >>> +#include <linux/sched.h>
>> >> >>>
>> >> >>> /*
>> >> >>> * For mount options
>> >> >>> @@ -368,6 +370,7 @@ struct f2fs_sb_info {
>> >> >>> struct mutex fs_lock[NR_GLOBAL_LOCKS]; /* blocking FS operations */
>> >> >>> struct mutex node_write; /* locking node writes */
>> >> >>> struct mutex writepages; /* mutex for writepages() */
>> >> >>> + wait_queue_head_t writeback_wqh; /* wait_queue for writeback */
>> >> >>> unsigned char next_lock_num; /* round-robin global locks */
>> >> >>> int por_doing; /* recovery is doing or not */
>> >> >>> int on_build_free_nids; /* build_free_nids is doing */
>> >> >>> @@ -961,6 +964,22 @@ static inline int f2fs_readonly(struct super_block *sb)
>> >> >>> return sb->s_flags & MS_RDONLY;
>> >> >>> }
>> >> >>>
>> >> >>> +static inline void f2fs_writeback_wait(struct f2fs_sb_info *sbi)
>> >> >>> +{
>> >> >>> + DEFINE_WAIT(wait);
>> >> >>> +
>> >> >>> + prepare_to_wait(&sbi->writeback_wqh, &wait, TASK_UNINTERRUPTIBLE);
>> >> >>> + if (get_pages(sbi, F2FS_WRITEBACK))
>> >> >>> + io_schedule();
>> >> >>> + finish_wait(&sbi->writeback_wqh, &wait);
>> >> >>> +}
>> >> >>> +
>> >> >>> +static inline void f2fs_writeback_wake(struct f2fs_sb_info *sbi)
>> >> >>> +{
>> >> >>> + if (!get_pages(sbi, F2FS_WRITEBACK))
>> >> >>> + wake_up_all(&sbi->writeback_wqh);
>> >> >>> +}
>> >> >>> +
>> >> >>> /*
>> >> >>> * file.c
>> >> >>> */
>> >> >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> >> >>> index bd79bbe..0708aa9 100644
>> >> >>> --- a/fs/f2fs/segment.c
>> >> >>> +++ b/fs/f2fs/segment.c
>> >> >>> @@ -597,6 +597,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
>> >> >>>
>> >> >>> if (p->is_sync)
>> >> >>> complete(p->wait);
>> >> >>> + f2fs_writeback_wake(p->sbi);
>> >> >>> kfree(p);
>> >> >>> bio_put(bio);
>> >> >>> }
>> >> >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>> >> >>> index 094ccc6..3ac6d85 100644
>> >> >>> --- a/fs/f2fs/super.c
>> >> >>> +++ b/fs/f2fs/super.c
>> >> >>> @@ -835,6 +835,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>> >> >>> mutex_init(&sbi->gc_mutex);
>> >> >>> mutex_init(&sbi->writepages);
>> >> >>> mutex_init(&sbi->cp_mutex);
>> >> >>> + init_waitqueue_head(&sbi->writeback_wqh);
>> >> >>> for (i = 0; i < NR_GLOBAL_LOCKS; i++)
>> >> >>> mutex_init(&sbi->fs_lock[i]);
>> >> >>> mutex_init(&sbi->node_write);N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±‘êçzX§¶\x17›¡Ü¨}©ž²Æ zÚ&j:+v‰¨¾\a«‘êçzZ+€Ê+zf£¢·hšˆ§~†­†Ûiÿûàz¹\x1e®w¥¢¸?™¨è­Ú&¢)ߢ^[f”ù^jÇ«y§m…á@A«a¶Ú\x7fÿ\f0¶ìh®\x0få’i\x7f
>> >>
>> >>
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
@ 2013-10-09  5:58 Yuan Zhong
  0 siblings, 0 replies; 13+ messages in thread
From: Yuan Zhong @ 2013-10-09  5:58 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Jaegeuk Kim, linux-f2fs-devel, linux-kernel, linux-fsdevel, shu tan

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=windows-1252, Size: 8286 bytes --]

Hi Gu,

> Hi Yuan,
> On 10/08/2013 07:30 PM, Yuan Zhong wrote:
>
>> Hi Gu,
>> 
>>> Hi Yuan,
>>> On 10/08/2013 04:30 PM, Yuan Zhong wrote:
>> 
>>>> Previously, do_checkpoint() will call congestion_wait() for waiting the pages (previous submitted node/meta/data pages) to be written back.
>>>> Because congestion_wait() will set a regular period (e.g. HZ / 50 ) for waiting.
>>>> For this reason, there is a situation that after the pages have been written back, 
>>>> but the checkpoint thread still wait for congestion_wait to exit.
>> 
>>> How do you confirm this issue? 
>> 
>>   I traced the execution path.
>>   In f2fs_end_io_write, dec_page_count(p->sbi, F2FS_WRITEBACK) will be called.
>>   And I found that, when pages of F2FS_WRITEBACK has been zero, but
>>   checkpoint thread still congestion_wait for pages of F2FS_WRITEBACK to be zero.
>
>Yes, it maybe. Congestion_wait add the task to a global wait queue which related to
>all back devices, so if F2FS_WRITEBACK has been zero, but other io may be still going on.
>Anyway, using a private wait queue to hold is a better choose.:)
>
>	
>>   So, I think this point could be improved.
>>   And I wrote a simple test case and tested on Micro-SD card, the steps as following:
>>       (a) create a fixed-size file (4KB)
>>       (b) go on to sync the file 
>>       (c) go back to step #a (fixed numbers of cycling:1024)	
>>    The results indicated that the execution time is reduced greatly by using this patch.
>
>Yes, the change is an improvement if the issue is existent.
>
>  
>> 
>> 
>>> I suspect that the block-core does not have a wake-up mechanism
>>> when the back device is uncongested.
>> 
>> 
>>   Yes, you are right.
>>   So I wake up the checkpoint thread by myself, when pages of F2FS_WRITEBACK to be zero.
>>   In f2fs_end_io_write, f2fs_writeback_wake is called.
>>   you cloud find this code in my patch. 
>
>Saw it.:)
>But one problem is that the checkpoint routine always is singleton, so the wait queue just
>services only one body, it seems not very worthy. How about just schedule and wake up it
>directly? See the following one.


Yes, your point is right.
My reason for using wait queue is that I am influenced by congestion_wait function.
The inner function of congesiton_wait is also using wait_queue.
And, I think, your patch is also a more efficient method.


>
>Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
>---
> fs/f2fs/checkpoint.c |   11 +++++++++--
> fs/f2fs/f2fs.h       |    1 +
> fs/f2fs/segment.c    |    4 ++++
> 3 files changed, 14 insertions(+), 2 deletions(-)
>
>diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>index d808827..2a5999d 100644
>--- a/fs/f2fs/checkpoint.c
>+++ b/fs/f2fs/checkpoint.c
>@@ -757,8 +757,15 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
> 	f2fs_put_page(cp_page, 1);
> 
> 	/* wait for previous submitted node/meta pages writeback */
>-	while (get_pages(sbi, F2FS_WRITEBACK))
>-		congestion_wait(BLK_RW_ASYNC, HZ / 50);
>+	sbi->cp_task = current;
>+	while (get_pages(sbi, F2FS_WRITEBACK)) {
>+		set_current_state(TASK_UNINTERRUPTIBLE);
>+		if (!get_pages(sbi, F2FS_WRITEBACK))
>+			break;
>+		io_schedule();
>+	}
>+	__set_current_state(TASK_RUNNING);
>+	sbi->cp_task = NULL;
> 
> 	filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
> 	filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
>diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>index a955a59..408ace7 100644
>--- a/fs/f2fs/f2fs.h
>+++ b/fs/f2fs/f2fs.h
>@@ -365,6 +365,7 @@ struct f2fs_sb_info {
> 	struct mutex writepages;		/* mutex for writepages() */
> 	int por_doing;				/* recovery is doing or not */
> 	int on_build_free_nids;			/* build_free_nids is doing */
>+	struct task_struct *cp_task;		/* checkpoint task */
> 
> 	/* for orphan inode management */
> 	struct list_head orphan_inode_list;	/* orphan inode list */
>diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>index bd79bbe..3b20359 100644
>--- a/fs/f2fs/segment.c
>+++ b/fs/f2fs/segment.c
>@@ -597,6 +597,10 @@ static void f2fs_end_io_write(struct bio *bio, int err)
> 
> 	if (p->is_sync)
> 		complete(p->wait);
>+
>+	if (!get_pages(p->sbi, F2FS_WRITEBACK) && p->sbi->cp_task)
>+		wake_up_process(p->sbi->cp_task);
>+
> 	kfree(p);
> 	bio_put(bio);
> }
>-- 
>1.7.7
>
>Regards,
>Gu 
>


Regards,
Yuan


>> 
>> 
>>>> This is a problem here, especially, when sync a large number of small files or dirs.
>>>> In order to avoid this, a wait_list is introduced, 
>>>> the checkpoint thread will be dropped into the wait_list if the pages have not been written back, 
>>>> and will be waked up by contrast.
>> 
>>> Please pay some attention to the mail form, this mail is out of format in my mail client.
>> 
>>> Regards,
>>> Gu
>> 
>> Regards,
>> Yuan
>> 
>>>>
>>>> Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
>>>> ---  
>>>>  fs/f2fs/checkpoint.c |    3 +--
>>>>  fs/f2fs/f2fs.h       |   19 +++++++++++++++++++
>>>>  fs/f2fs/segment.c    |    1 +
>>>>  fs/f2fs/super.c      |    1 +
>>>>  4 files changed, 22 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>>> index ca39442..5d69ae0 100644
>>>> --- a/fs/f2fs/checkpoint.c
>>>> +++ b/fs/f2fs/checkpoint.c
>>>> @@ -758,8 +758,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
>>>>  	f2fs_put_page(cp_page, 1);
>>>>  
>>>>  	/* wait for previous submitted node/meta pages writeback */
>>>> -	while (get_pages(sbi, F2FS_WRITEBACK))
>>>> -		congestion_wait(BLK_RW_ASYNC, HZ / 50);
>>>> +	f2fs_writeback_wait(sbi);
>>>>  
>>>>  	filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
>>>>  	filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>> index 7fd99d8..4b0d70e 100644
>>>> --- a/fs/f2fs/f2fs.h
>>>> +++ b/fs/f2fs/f2fs.h
>>>> @@ -18,6 +18,8 @@
>>>>  #include <linux/crc32.h>
>>>>  #include <linux/magic.h>
>>>>  #include <linux/kobject.h>
>>>> +#include <linux/wait.h>
>>>> +#include <linux/sched.h>
>>>>  
>>>>  /*
>>>>   * For mount options
>>>> @@ -368,6 +370,7 @@ struct f2fs_sb_info {
>>>>  	struct mutex fs_lock[NR_GLOBAL_LOCKS];	/* blocking FS operations */
>>>>  	struct mutex node_write;		/* locking node writes */
>>>>  	struct mutex writepages;		/* mutex for writepages() */
>>>> +	wait_queue_head_t writeback_wqh;	/* wait_queue for writeback */
>>>>  	unsigned char next_lock_num;		/* round-robin global locks */
>>>>  	int por_doing;				/* recovery is doing or not */
>>>>  	int on_build_free_nids;			/* build_free_nids is doing */
>>>> @@ -961,6 +964,22 @@ static inline int f2fs_readonly(struct super_block *sb)
>>>>  	return sb->s_flags & MS_RDONLY;
>>>>  }
>>>>  
>>>> +static inline void f2fs_writeback_wait(struct f2fs_sb_info *sbi)
>>>> +{
>>>> +	DEFINE_WAIT(wait);
>>>> +
>>>> +	prepare_to_wait(&sbi->writeback_wqh, &wait, TASK_UNINTERRUPTIBLE);
>>>> +	if (get_pages(sbi, F2FS_WRITEBACK))
>>>> +		io_schedule();
>>>> +	finish_wait(&sbi->writeback_wqh, &wait);
>>>> +}
>>>> +
>>>> +static inline void f2fs_writeback_wake(struct f2fs_sb_info *sbi)
>>>> +{
>>>> +	if (!get_pages(sbi, F2FS_WRITEBACK))
>>>> +		wake_up_all(&sbi->writeback_wqh);
>>>> +}
>>>> +
>>>>  /*
>>>>   * file.c
>>>>   */
>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>> index bd79bbe..0708aa9 100644
>>>> --- a/fs/f2fs/segment.c
>>>> +++ b/fs/f2fs/segment.c
>>>> @@ -597,6 +597,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
>>>>  
>>>>  	if (p->is_sync)
>>>>  		complete(p->wait);
>>>> +	f2fs_writeback_wake(p->sbi);
>>>>  	kfree(p);
>>>>  	bio_put(bio);
>>>>  }
>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>> index 094ccc6..3ac6d85 100644
>>>> --- a/fs/f2fs/super.c
>>>> +++ b/fs/f2fs/super.c
>>>> @@ -835,6 +835,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>  	mutex_init(&sbi->gc_mutex);
>>>>  	mutex_init(&sbi->writepages);
>>>>  	mutex_init(&sbi->cp_mutex);
>>>> +	init_waitqueue_head(&sbi->writeback_wqh);
>>>>  	for (i = 0; i < NR_GLOBAL_LOCKS; i++)
>>>>  		mutex_init(&sbi->fs_lock[i]);
>>>>  	mutex_init(&sbi->node_write);
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
  2013-10-08  8:30 Yuan Zhong
@ 2013-10-08  9:37 ` Gu Zheng
  0 siblings, 0 replies; 13+ messages in thread
From: Gu Zheng @ 2013-10-08  9:37 UTC (permalink / raw)
  To: yuan.mark.zhong
  Cc: jaegeuk.kim, linux-f2fs-devel, linux-kernel, linux-fsdevel, shu.tan

Hi Yuan,
On 10/08/2013 04:30 PM, Yuan Zhong wrote:

> Previously, do_checkpoint() will call congestion_wait() for waiting the pages (previous submitted node/meta/data pages) to be written back.
> Because congestion_wait() will set a regular period (e.g. HZ / 50 ) for waiting.
> For this reason, there is a situation that after the pages have been written back, but the checkpoint thread still wait for congestion_wait to exit.

How do you confirm this issue? I suspect that the block-core does not have a wake-up mechanism
when the back device is uncongested.

> This is a problem here, especially, when sync a large number of small files or dirs.
> In order to avoid this, a wait_list is introduced, the checkpoint thread will be dropped into the wait_list if the pages have not been written back, and will be waked up by contrast.

Please pay some attention to the mail form, this mail is out of format in my mail client.

Regards,
Gu

> 
> Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
> ---  
>  fs/f2fs/checkpoint.c |    3 +--
>  fs/f2fs/f2fs.h       |   19 +++++++++++++++++++
>  fs/f2fs/segment.c    |    1 +
>  fs/f2fs/super.c      |    1 +
>  4 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index ca39442..5d69ae0 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -758,8 +758,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
>  	f2fs_put_page(cp_page, 1);
>  
>  	/* wait for previous submitted node/meta pages writeback */
> -	while (get_pages(sbi, F2FS_WRITEBACK))
> -		congestion_wait(BLK_RW_ASYNC, HZ / 50);
> +	f2fs_writeback_wait(sbi);
>  
>  	filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
>  	filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 7fd99d8..4b0d70e 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -18,6 +18,8 @@
>  #include <linux/crc32.h>
>  #include <linux/magic.h>
>  #include <linux/kobject.h>
> +#include <linux/wait.h>
> +#include <linux/sched.h>
>  
>  /*
>   * For mount options
> @@ -368,6 +370,7 @@ struct f2fs_sb_info {
>  	struct mutex fs_lock[NR_GLOBAL_LOCKS];	/* blocking FS operations */
>  	struct mutex node_write;		/* locking node writes */
>  	struct mutex writepages;		/* mutex for writepages() */
> +	wait_queue_head_t writeback_wqh;	/* wait_queue for writeback */
>  	unsigned char next_lock_num;		/* round-robin global locks */
>  	int por_doing;				/* recovery is doing or not */
>  	int on_build_free_nids;			/* build_free_nids is doing */
> @@ -961,6 +964,22 @@ static inline int f2fs_readonly(struct super_block *sb)
>  	return sb->s_flags & MS_RDONLY;
>  }
>  
> +static inline void f2fs_writeback_wait(struct f2fs_sb_info *sbi)
> +{
> +	DEFINE_WAIT(wait);
> +
> +	prepare_to_wait(&sbi->writeback_wqh, &wait, TASK_UNINTERRUPTIBLE);
> +	if (get_pages(sbi, F2FS_WRITEBACK))
> +		io_schedule();
> +	finish_wait(&sbi->writeback_wqh, &wait);
> +}
> +
> +static inline void f2fs_writeback_wake(struct f2fs_sb_info *sbi)
> +{
> +	if (!get_pages(sbi, F2FS_WRITEBACK))
> +		wake_up_all(&sbi->writeback_wqh);
> +}
> +
>  /*
>   * file.c
>   */
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index bd79bbe..0708aa9 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -597,6 +597,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
>  
>  	if (p->is_sync)
>  		complete(p->wait);
> +	f2fs_writeback_wake(p->sbi);
>  	kfree(p);
>  	bio_put(bio);
>  }
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 094ccc6..3ac6d85 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -835,6 +835,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  	mutex_init(&sbi->gc_mutex);
>  	mutex_init(&sbi->writepages);
>  	mutex_init(&sbi->cp_mutex);
> +	init_waitqueue_head(&sbi->writeback_wqh);
>  	for (i = 0; i < NR_GLOBAL_LOCKS; i++)
>  		mutex_init(&sbi->fs_lock[i]);
>  	mutex_init(&sbi->node_write);N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±‘êçzX§¶\x17›¡Ü¨}©ž²Æ zÚ&j:+v‰¨¾\a«‘êçzZ+€Ê+zf£¢·hšˆ§~†­†Ûiÿûàz¹\x1e®w¥¢¸?™¨è­Ú&¢)ߢ^[f”ù^jÇ«y§m…á@A«a¶Ú\x7fÿ\f0¶ìh®\x0få’i\x7f



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance
@ 2013-10-08  8:30 Yuan Zhong
  2013-10-08  9:37 ` Gu Zheng
  0 siblings, 1 reply; 13+ messages in thread
From: Yuan Zhong @ 2013-10-08  8:30 UTC (permalink / raw)
  To: jaegeuk.kim; +Cc: linux-f2fs-devel, linux-kernel, linux-fsdevel, shu.tan

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=windows-1252, Size: 3701 bytes --]

Previously, do_checkpoint() will call congestion_wait() for waiting the pages (previous submitted node/meta/data pages) to be written back.
Because congestion_wait() will set a regular period (e.g. HZ / 50 ) for waiting.
For this reason, there is a situation that after the pages have been written back, but the checkpoint thread still wait for congestion_wait to exit.
This is a problem here, especially, when sync a large number of small files or dirs.
In order to avoid this, a wait_list is introduced, the checkpoint thread will be dropped into the wait_list if the pages have not been written back, and will be waked up by contrast.

Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
---  
 fs/f2fs/checkpoint.c |    3 +--
 fs/f2fs/f2fs.h       |   19 +++++++++++++++++++
 fs/f2fs/segment.c    |    1 +
 fs/f2fs/super.c      |    1 +
 4 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index ca39442..5d69ae0 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -758,8 +758,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
 	f2fs_put_page(cp_page, 1);
 
 	/* wait for previous submitted node/meta pages writeback */
-	while (get_pages(sbi, F2FS_WRITEBACK))
-		congestion_wait(BLK_RW_ASYNC, HZ / 50);
+	f2fs_writeback_wait(sbi);
 
 	filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
 	filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 7fd99d8..4b0d70e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -18,6 +18,8 @@
 #include <linux/crc32.h>
 #include <linux/magic.h>
 #include <linux/kobject.h>
+#include <linux/wait.h>
+#include <linux/sched.h>
 
 /*
  * For mount options
@@ -368,6 +370,7 @@ struct f2fs_sb_info {
 	struct mutex fs_lock[NR_GLOBAL_LOCKS];	/* blocking FS operations */
 	struct mutex node_write;		/* locking node writes */
 	struct mutex writepages;		/* mutex for writepages() */
+	wait_queue_head_t writeback_wqh;	/* wait_queue for writeback */
 	unsigned char next_lock_num;		/* round-robin global locks */
 	int por_doing;				/* recovery is doing or not */
 	int on_build_free_nids;			/* build_free_nids is doing */
@@ -961,6 +964,22 @@ static inline int f2fs_readonly(struct super_block *sb)
 	return sb->s_flags & MS_RDONLY;
 }
 
+static inline void f2fs_writeback_wait(struct f2fs_sb_info *sbi)
+{
+	DEFINE_WAIT(wait);
+
+	prepare_to_wait(&sbi->writeback_wqh, &wait, TASK_UNINTERRUPTIBLE);
+	if (get_pages(sbi, F2FS_WRITEBACK))
+		io_schedule();
+	finish_wait(&sbi->writeback_wqh, &wait);
+}
+
+static inline void f2fs_writeback_wake(struct f2fs_sb_info *sbi)
+{
+	if (!get_pages(sbi, F2FS_WRITEBACK))
+		wake_up_all(&sbi->writeback_wqh);
+}
+
 /*
  * file.c
  */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index bd79bbe..0708aa9 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -597,6 +597,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
 
 	if (p->is_sync)
 		complete(p->wait);
+	f2fs_writeback_wake(p->sbi);
 	kfree(p);
 	bio_put(bio);
 }
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 094ccc6..3ac6d85 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -835,6 +835,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 	mutex_init(&sbi->gc_mutex);
 	mutex_init(&sbi->writepages);
 	mutex_init(&sbi->cp_mutex);
+	init_waitqueue_head(&sbi->writeback_wqh);
 	for (i = 0; i < NR_GLOBAL_LOCKS; i++)
 		mutex_init(&sbi->fs_lock[i]);
 	mutex_init(&sbi->node_write);ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-10-11  2:21 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-08 11:30 [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance Yuan Zhong
2013-10-08 11:30 ` Yuan Zhong
2013-10-09  4:04 ` [f2fs-dev] " Gu Zheng
2013-10-09  4:04   ` Gu Zheng
2013-10-10  8:09   ` Jin Xu
2013-10-10  8:11     ` [f2fs-dev] " Gu Zheng
2013-10-10  8:11       ` Gu Zheng
2013-10-10 23:54       ` Jin Xu
2013-10-11  2:16         ` [f2fs-dev] " Gu Zheng
2013-10-11  2:16           ` Gu Zheng
  -- strict thread matches above, loose matches on Subject: below --
2013-10-09  5:58 Yuan Zhong
2013-10-08  8:30 Yuan Zhong
2013-10-08  9:37 ` Gu Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.