linux-kernel.vger.kernel.org archive mirror
* [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
@ 2012-09-11 18:28 OGAWA Hirofumi
  2012-09-12  2:42 ` Fengguang Wu
  2012-09-13  6:39 ` Fengguang Wu
  0 siblings, 2 replies; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-11 18:28 UTC (permalink / raw)
  To: viro; +Cc: jack, hch, fengguang.wu, linux-kernel


If a bdi has BDI_CAP_NO_WRITEBACK, bdi_forker_thread() doesn't start a
writeback thread. This means there is no consumer for work items made
by bdi_queue_work().

This adds a check for !bdi_cap_writeback_dirty(sb->s_bdi) before
calling bdi_queue_work(); otherwise queued work is never consumed.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
---

 fs/fs-writeback.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff -puN fs/fs-writeback.c~noop_backing_dev_info-check-fix fs/fs-writeback.c
--- linux/fs/fs-writeback.c~noop_backing_dev_info-check-fix	2012-09-11 06:12:30.000000000 +0900
+++ linux-hirofumi/fs/fs-writeback.c	2012-09-11 06:12:30.000000000 +0900
@@ -120,6 +120,9 @@ __bdi_start_writeback(struct backing_dev
 {
 	struct wb_writeback_work *work;
 
+	if (!bdi_cap_writeback_dirty(bdi))
+		return;
+
 	/*
 	 * This is WB_SYNC_NONE writeback, so if allocation fails just
 	 * wakeup the thread for old dirty data writeback
@@ -1310,7 +1313,7 @@ void writeback_inodes_sb_nr(struct super
 		.reason			= reason,
 	};
 
-	if (sb->s_bdi == &noop_backing_dev_info)
+	if (!bdi_cap_writeback_dirty(sb->s_bdi))
 		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 	bdi_queue_work(sb->s_bdi, &work);
@@ -1396,7 +1399,7 @@ void sync_inodes_sb(struct super_block *
 	};
 
 	/* Nothing to do? */
-	if (sb->s_bdi == &noop_backing_dev_info)
+	if (!bdi_cap_writeback_dirty(sb->s_bdi))
 		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
_

-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-11 18:28 [PATCH] Fix queueing work if !bdi_cap_writeback_dirty() OGAWA Hirofumi
@ 2012-09-12  2:42 ` Fengguang Wu
  2012-09-12  8:00   ` OGAWA Hirofumi
  2012-09-13  6:39 ` Fengguang Wu
  1 sibling, 1 reply; 29+ messages in thread
From: Fengguang Wu @ 2012-09-12  2:42 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: viro, jack, hch, linux-kernel

On Wed, Sep 12, 2012 at 03:28:42AM +0900, OGAWA Hirofumi wrote:
> 
> If a bdi has BDI_CAP_NO_WRITEBACK, bdi_forker_thread() doesn't start a
> writeback thread. This means there is no consumer for work items made
> by bdi_queue_work().
> 
> This adds a check for !bdi_cap_writeback_dirty(sb->s_bdi) before
> calling bdi_queue_work(); otherwise queued work is never consumed.

Thanks for catching this! Does this bug have any side effects other
than leaking memory? Some callers may actually expect it to do some
work in order to make progress, and will otherwise block indefinitely.
If so, we'll need to fix those callers.

Thanks,
Fengguang

> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> ---
> 
>  fs/fs-writeback.c |    7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff -puN fs/fs-writeback.c~noop_backing_dev_info-check-fix fs/fs-writeback.c
> --- linux/fs/fs-writeback.c~noop_backing_dev_info-check-fix	2012-09-11 06:12:30.000000000 +0900
> +++ linux-hirofumi/fs/fs-writeback.c	2012-09-11 06:12:30.000000000 +0900
> @@ -120,6 +120,9 @@ __bdi_start_writeback(struct backing_dev
>  {
>  	struct wb_writeback_work *work;
>  
> +	if (!bdi_cap_writeback_dirty(bdi))
> +		return;
> +
>  	/*
>  	 * This is WB_SYNC_NONE writeback, so if allocation fails just
>  	 * wakeup the thread for old dirty data writeback
> @@ -1310,7 +1313,7 @@ void writeback_inodes_sb_nr(struct super
>  		.reason			= reason,
>  	};
>  
> -	if (sb->s_bdi == &noop_backing_dev_info)
> +	if (!bdi_cap_writeback_dirty(sb->s_bdi))
>  		return;
>  	WARN_ON(!rwsem_is_locked(&sb->s_umount));
>  	bdi_queue_work(sb->s_bdi, &work);
> @@ -1396,7 +1399,7 @@ void sync_inodes_sb(struct super_block *
>  	};
>  
>  	/* Nothing to do? */
> -	if (sb->s_bdi == &noop_backing_dev_info)
> +	if (!bdi_cap_writeback_dirty(sb->s_bdi))
>  		return;
>  	WARN_ON(!rwsem_is_locked(&sb->s_umount));
>  
> _
> 
> -- 
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-12  2:42 ` Fengguang Wu
@ 2012-09-12  8:00   ` OGAWA Hirofumi
  2012-09-13  0:33     ` Fengguang Wu
  0 siblings, 1 reply; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-12  8:00 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: viro, jack, hch, linux-kernel

Fengguang Wu <fengguang.wu@intel.com> writes:

> On Wed, Sep 12, 2012 at 03:28:42AM +0900, OGAWA Hirofumi wrote:
>> 
>> If a bdi has BDI_CAP_NO_WRITEBACK, bdi_forker_thread() doesn't start a
>> writeback thread. This means there is no consumer for work items made
>> by bdi_queue_work().
>> 
>> This adds a check for !bdi_cap_writeback_dirty(sb->s_bdi) before
>> calling bdi_queue_work(); otherwise queued work is never consumed.
>
> Thanks for catching this! Does this bug have any side effects other
> than leaking memory?
>
> Some callers may actually expect it to do some work in order to make
> progress, and will otherwise block indefinitely.  If so, we'll need
> to fix those callers.

If a custom bdi with BDI_CAP_NO_WRITEBACK is used, wait_for_completion()
(e.g. in sync_inodes_sb()) will block forever.

I tested with a custom bdi with BDI_CAP_NO_WRITEBACK - sync(2) blocked
forever for this reason.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-12  8:00   ` OGAWA Hirofumi
@ 2012-09-13  0:33     ` Fengguang Wu
  2012-09-13  5:41       ` OGAWA Hirofumi
  0 siblings, 1 reply; 29+ messages in thread
From: Fengguang Wu @ 2012-09-13  0:33 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: viro, jack, hch, linux-kernel

On Wed, Sep 12, 2012 at 05:00:48PM +0900, OGAWA Hirofumi wrote:
> Fengguang Wu <fengguang.wu@intel.com> writes:
> 
> > On Wed, Sep 12, 2012 at 03:28:42AM +0900, OGAWA Hirofumi wrote:
> >> 
> >> If a bdi has BDI_CAP_NO_WRITEBACK, bdi_forker_thread() doesn't start a
> >> writeback thread. This means there is no consumer for work items made
> >> by bdi_queue_work().
> >> 
> >> This adds a check for !bdi_cap_writeback_dirty(sb->s_bdi) before
> >> calling bdi_queue_work(); otherwise queued work is never consumed.
> >
> > Thanks for catching this! Does this bug have any side effects other
> > than leaking memory?
> >
> > Some callers may actually expect it to do some work in order to make
> > progress, and will otherwise block indefinitely.  If so, we'll need
> > to fix those callers.
> 
> If a custom bdi with BDI_CAP_NO_WRITEBACK is used, wait_for_completion()
> (e.g. in sync_inodes_sb()) will block forever.

So the sync(2) blocking cannot be fixed by this patch?

> I tested with a custom bdi with BDI_CAP_NO_WRITEBACK - sync(2) blocked
> forever for this reason.

What's your test script? How do you create/use that custom bdi?

Thanks,
Fengguang


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-13  0:33     ` Fengguang Wu
@ 2012-09-13  5:41       ` OGAWA Hirofumi
  2012-09-13  6:03         ` Fengguang Wu
  0 siblings, 1 reply; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-13  5:41 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: viro, jack, hch, linux-kernel

Fengguang Wu <fengguang.wu@intel.com> writes:

>> If a custom bdi with BDI_CAP_NO_WRITEBACK is used, wait_for_completion()
>> (e.g. in sync_inodes_sb()) will block forever.
>
> So the sync(2) blocking cannot be fixed by this patch?

This patch fixes the blocking problem too.

>> I tested with a custom bdi with BDI_CAP_NO_WRITEBACK - sync(2) blocked
>> forever for this reason.
>
> What's your test script? How do you create/use that custom bdi?

Ah, I wrote my own kernel module to test it. I guess there are no users
in the current kernel for now, because it doesn't work.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-13  5:41       ` OGAWA Hirofumi
@ 2012-09-13  6:03         ` Fengguang Wu
  2012-09-13  6:31           ` OGAWA Hirofumi
  0 siblings, 1 reply; 29+ messages in thread
From: Fengguang Wu @ 2012-09-13  6:03 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: viro, jack, hch, linux-kernel

On Thu, Sep 13, 2012 at 02:41:31PM +0900, OGAWA Hirofumi wrote:
> Fengguang Wu <fengguang.wu@intel.com> writes:
> 
> >> If a custom bdi with BDI_CAP_NO_WRITEBACK is used, wait_for_completion()
> >> (e.g. in sync_inodes_sb()) will block forever.
> >
> > So the sync(2) blocking cannot be fixed by this patch?
> 
> This patch fixes the blocking problem too.

Ah OK, my view was somehow limited by the screen size and I only saw
the first chunk...

> >> I tested with a custom bdi with BDI_CAP_NO_WRITEBACK - sync(2) blocked
> >> forever for this reason.
> >
> > What's your test script? How do you create/use that custom bdi?
> 
> Ah, I wrote my own kernel module to test it. I guess there are no users
> in the current kernel for now, because it doesn't work.

OK. Got it. So I'd suggest changing the first chunk to a VM_BUG_ON(),
which should be enough to catch new bad callers. The other two chunks
look fine.

Thanks,
Fengguang


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-13  6:03         ` Fengguang Wu
@ 2012-09-13  6:31           ` OGAWA Hirofumi
  0 siblings, 0 replies; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-13  6:31 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: viro, jack, hch, linux-kernel

Fengguang Wu <fengguang.wu@intel.com> writes:

>> >> I tested with a custom bdi with BDI_CAP_NO_WRITEBACK - sync(2) blocked
>> >> forever for this reason.
>> >
>> > What's your test script? How do you create/use that custom bdi?
>> 
>> Ah, I wrote my own kernel module to test it. I guess there are no users
>> in the current kernel for now, because it doesn't work.
>
> OK. Got it. So I'd suggest changing the first chunk to a VM_BUG_ON(),
> which should be enough to catch new bad callers. The other two chunks
> look fine.

Hm, if it is changed to a VM_BUG_ON(), we can't use BDI_CAP_NO_WRITEBACK
to disable the writeback task (while still allowing inode cache reclaim)
after all.

Or did you mean that we don't allow disabling the writeback task?

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-11 18:28 [PATCH] Fix queueing work if !bdi_cap_writeback_dirty() OGAWA Hirofumi
  2012-09-12  2:42 ` Fengguang Wu
@ 2012-09-13  6:39 ` Fengguang Wu
  2012-09-13  7:53   ` OGAWA Hirofumi
  1 sibling, 1 reply; 29+ messages in thread
From: Fengguang Wu @ 2012-09-13  6:39 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: viro, jack, hch, linux-kernel

On Wed, Sep 12, 2012 at 03:28:42AM +0900, OGAWA Hirofumi wrote:
> 
> If a bdi has BDI_CAP_NO_WRITEBACK, bdi_forker_thread() doesn't start a
> writeback thread. This means there is no consumer for work items made
> by bdi_queue_work().
> 
> This adds a check for !bdi_cap_writeback_dirty(sb->s_bdi) before
> calling bdi_queue_work(); otherwise queued work is never consumed.
> 
> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> ---
> 
>  fs/fs-writeback.c |    7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff -puN fs/fs-writeback.c~noop_backing_dev_info-check-fix fs/fs-writeback.c
> --- linux/fs/fs-writeback.c~noop_backing_dev_info-check-fix	2012-09-11 06:12:30.000000000 +0900
> +++ linux-hirofumi/fs/fs-writeback.c	2012-09-11 06:12:30.000000000 +0900
> @@ -120,6 +120,9 @@ __bdi_start_writeback(struct backing_dev
>  {
>  	struct wb_writeback_work *work;
>  
> +	if (!bdi_cap_writeback_dirty(bdi))
> +		return;

Will someone in the current kernel actually call
__bdi_start_writeback() on a BDI_CAP_NO_WRITEBACK bdi?

If the answer is no, VM_BUG_ON(!bdi_cap_writeback_dirty(bdi)) looks better.

Thanks,
Fengguang


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-13  6:39 ` Fengguang Wu
@ 2012-09-13  7:53   ` OGAWA Hirofumi
  2012-09-14 11:13     ` OGAWA Hirofumi
  2012-09-14 11:14     ` Fengguang Wu
  0 siblings, 2 replies; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-13  7:53 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: viro, jack, hch, linux-kernel

Fengguang Wu <fengguang.wu@intel.com> writes:

> On Wed, Sep 12, 2012 at 03:28:42AM +0900, OGAWA Hirofumi wrote:
>> 
>> If a bdi has BDI_CAP_NO_WRITEBACK, bdi_forker_thread() doesn't start a
>> writeback thread. This means there is no consumer for work items made
>> by bdi_queue_work().
>> 
>> This adds a check for !bdi_cap_writeback_dirty(sb->s_bdi) before
>> calling bdi_queue_work(); otherwise queued work is never consumed.
>> 
>> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
>> ---
>> 
>>  fs/fs-writeback.c |    7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>> 
>> diff -puN fs/fs-writeback.c~noop_backing_dev_info-check-fix fs/fs-writeback.c
>> --- linux/fs/fs-writeback.c~noop_backing_dev_info-check-fix	2012-09-11 06:12:30.000000000 +0900
>> +++ linux-hirofumi/fs/fs-writeback.c	2012-09-11 06:12:30.000000000 +0900
>> @@ -120,6 +120,9 @@ __bdi_start_writeback(struct backing_dev
>>  {
>>  	struct wb_writeback_work *work;
>>  
>> +	if (!bdi_cap_writeback_dirty(bdi))
>> +		return;
>
> Will someone in the current kernel actually call
> __bdi_start_writeback() on a BDI_CAP_NO_WRITEBACK bdi?
>
> If the answer is no, VM_BUG_ON(!bdi_cap_writeback_dirty(bdi)) looks better.

I guess nobody calls it in the current kernel, though. Hmm, but we also
have a check in __mark_inode_dirty(); nobody should be using that
either, right?

If we define it as a bug, I can't see what BDI_CAP_NO_WRITEBACK is
actually for.  Are we not going to allow disabling the writeback task?

I was going to use this to disable the writeback task in the FS I'm
developing...

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-13  7:53   ` OGAWA Hirofumi
@ 2012-09-14 11:13     ` OGAWA Hirofumi
  2012-09-14 11:18       ` Fengguang Wu
  2012-09-14 11:14     ` Fengguang Wu
  1 sibling, 1 reply; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-14 11:13 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: viro, jack, hch, linux-kernel

OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> writes:

> Fengguang Wu <fengguang.wu@intel.com> writes:
>
>> On Wed, Sep 12, 2012 at 03:28:42AM +0900, OGAWA Hirofumi wrote:
>>> 
>>> If a bdi has BDI_CAP_NO_WRITEBACK, bdi_forker_thread() doesn't start a
>>> writeback thread. This means there is no consumer for work items made
>>> by bdi_queue_work().
>>> 
>>> This adds a check for !bdi_cap_writeback_dirty(sb->s_bdi) before
>>> calling bdi_queue_work(); otherwise queued work is never consumed.
>>> 
>>> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
>>> ---
>>> 
>>>  fs/fs-writeback.c |    7 +++++--
>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>> 
>>> diff -puN fs/fs-writeback.c~noop_backing_dev_info-check-fix fs/fs-writeback.c
>>> --- linux/fs/fs-writeback.c~noop_backing_dev_info-check-fix	2012-09-11 06:12:30.000000000 +0900
>>> +++ linux-hirofumi/fs/fs-writeback.c	2012-09-11 06:12:30.000000000 +0900
>>> @@ -120,6 +120,9 @@ __bdi_start_writeback(struct backing_dev
>>>  {
>>>  	struct wb_writeback_work *work;
>>>  
>>> +	if (!bdi_cap_writeback_dirty(bdi))
>>> +		return;
>>
>> Will someone in the current kernel actually call
>> __bdi_start_writeback() on a BDI_CAP_NO_WRITEBACK bdi?
>>
>> If the answer is no, VM_BUG_ON(!bdi_cap_writeback_dirty(bdi)) looks better.
>
> I guess nobody calls it in the current kernel, though. Hmm, but we
> also have a check in __mark_inode_dirty(); nobody should be using
> that either, right?
>
> If we define it as a bug, I can't see what BDI_CAP_NO_WRITEBACK is
> actually for.  Are we not going to allow disabling the writeback task?

ping.

Are you saying we should change the bdi_cap_writeback_dirty(bdi) check
in __mark_inode_dirty() to a VM_BUG_ON() too? And, presumably, you are
going to remove the use of BDI_CAP_NO_WRITEBACK for sb->s_bdi?

> I was going to use this to disable the writeback task in the FS I'm
> developing...
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-13  7:53   ` OGAWA Hirofumi
  2012-09-14 11:13     ` OGAWA Hirofumi
@ 2012-09-14 11:14     ` Fengguang Wu
  2012-09-14 12:12       ` OGAWA Hirofumi
  1 sibling, 1 reply; 29+ messages in thread
From: Fengguang Wu @ 2012-09-14 11:14 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: viro, jack, hch, linux-kernel

> >> @@ -120,6 +120,9 @@ __bdi_start_writeback(struct backing_dev
> >>  {
> >>  	struct wb_writeback_work *work;
> >>  
> >> +	if (!bdi_cap_writeback_dirty(bdi))
> >> +		return;
> >
> > Will someone in the current kernel actually call
> > __bdi_start_writeback() on a BDI_CAP_NO_WRITEBACK bdi?
> >
> > If the answer is no, VM_BUG_ON(!bdi_cap_writeback_dirty(bdi)) looks better.
> 
> I guess nobody calls it in the current kernel, though. Hmm, but we
> also have a check in __mark_inode_dirty(); nobody should be using
> that either, right?
> 
> If we define it as a bug, I can't see what BDI_CAP_NO_WRITEBACK is
> actually for.  Are we not going to allow disabling the writeback task?

> I was going to use this to disable the writeback task in the FS I'm
> developing...

That sounds like an interesting use case. Can you elaborate a bit more?
Note that even if you disable __bdi_start_writeback() here, the kernel
may also start writeback in the page reclaim path, the fsync() path,
and perhaps more.

Thanks,
Fengguang


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-14 11:13     ` OGAWA Hirofumi
@ 2012-09-14 11:18       ` Fengguang Wu
  0 siblings, 0 replies; 29+ messages in thread
From: Fengguang Wu @ 2012-09-14 11:18 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: viro, jack, hch, linux-kernel

On Fri, Sep 14, 2012 at 08:13:05PM +0900, OGAWA Hirofumi wrote:
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> writes:
> 
> > Fengguang Wu <fengguang.wu@intel.com> writes:
> >
> >> On Wed, Sep 12, 2012 at 03:28:42AM +0900, OGAWA Hirofumi wrote:
> >>> 
> >>> If a bdi has BDI_CAP_NO_WRITEBACK, bdi_forker_thread() doesn't start a
> >>> writeback thread. This means there is no consumer for work items made
> >>> by bdi_queue_work().
> >>> 
> >>> This adds a check for !bdi_cap_writeback_dirty(sb->s_bdi) before
> >>> calling bdi_queue_work(); otherwise queued work is never consumed.
> >>> 
> >>> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> >>> ---
> >>> 
> >>>  fs/fs-writeback.c |    7 +++++--
> >>>  1 file changed, 5 insertions(+), 2 deletions(-)
> >>> 
> >>> diff -puN fs/fs-writeback.c~noop_backing_dev_info-check-fix fs/fs-writeback.c
> >>> --- linux/fs/fs-writeback.c~noop_backing_dev_info-check-fix	2012-09-11 06:12:30.000000000 +0900
> >>> +++ linux-hirofumi/fs/fs-writeback.c	2012-09-11 06:12:30.000000000 +0900
> >>> @@ -120,6 +120,9 @@ __bdi_start_writeback(struct backing_dev
> >>>  {
> >>>  	struct wb_writeback_work *work;
> >>>  
> >>> +	if (!bdi_cap_writeback_dirty(bdi))
> >>> +		return;
> >>
> >> Will someone in the current kernel actually call
> >> __bdi_start_writeback() on a BDI_CAP_NO_WRITEBACK bdi?
> >>
> >> If the answer is no, VM_BUG_ON(!bdi_cap_writeback_dirty(bdi)) looks better.
> >
> > I guess nobody calls it in the current kernel, though. Hmm, but we
> > also have a check in __mark_inode_dirty(); nobody should be using
> > that either, right?
> >
> > If we define it as a bug, I can't see what BDI_CAP_NO_WRITEBACK is
> > actually for.  Are we not going to allow disabling the writeback task?
> 
> ping.
> 
> Are you saying we should change the bdi_cap_writeback_dirty(bdi) check
> in __mark_inode_dirty() to a VM_BUG_ON() too? And, presumably, you are
> going to remove the use of BDI_CAP_NO_WRITEBACK for sb->s_bdi?

No, I'm talking about this chunk only. My feeling is that adding the
bdi_cap_writeback_dirty() check in __bdi_start_writeback() comes too
late. If the test is ever false, it may well indicate a bug in the
callers. Obviously the callers all assume success, because
__bdi_start_writeback() does not even return a value.

Thanks,
Fengguang


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-14 11:14     ` Fengguang Wu
@ 2012-09-14 12:12       ` OGAWA Hirofumi
  2012-09-14 12:53         ` Fengguang Wu
  2012-09-14 13:19         ` Jan Kara
  0 siblings, 2 replies; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-14 12:12 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: viro, jack, hch, linux-kernel

Fengguang Wu <fengguang.wu@intel.com> writes:

>> >> @@ -120,6 +120,9 @@ __bdi_start_writeback(struct backing_dev
>> >>  {
>> >>  	struct wb_writeback_work *work;
>> >>  
>> >> +	if (!bdi_cap_writeback_dirty(bdi))
>> >> +		return;
>> >
>> > Will someone in the current kernel actually call
>> > __bdi_start_writeback() on a BDI_CAP_NO_WRITEBACK bdi?
>> >
>> > If the answer is no, VM_BUG_ON(!bdi_cap_writeback_dirty(bdi)) looks better.
>> 
>> I guess nobody calls it in the current kernel, though. Hmm, but we
>> also have a check in __mark_inode_dirty(); nobody should be using
>> that either, right?
>> 
>> If we define it as a bug, I can't see what BDI_CAP_NO_WRITEBACK is
>> actually for.  Are we not going to allow disabling the writeback task?
>
>> I was going to use this to disable the writeback task in the FS I'm
>> developing...
>
> That sounds like an interesting use case. Can you elaborate a bit more?
>
> Note that even if you disable __bdi_start_writeback() here, the kernel
> may also start writeback in the page reclaim path, the fsync() path,
> and perhaps more.

The page reclaim and fsync() paths go through FS handlers, so the FS
can control those.

A modern FS has to control its flushing carefully. Many FSes already
ignore writeback if wbc->sync_mode != WB_SYNC_ALL (e.g.
ext3_write_inode(), nilfs_writepages()), and have their own FS task to
flush.

The writeback task is always invoked with sync_mode != WB_SYNC_ALL
except from sync_inodes_sb(), and the FS has the sb->s_op->sync_fs()
handler for the sync_inodes_sb() path. So the writeback task just gets
in the way of the FS controlling its own flushing.

The FS also wants to control when the inode cache is reclaimable,
because it has to control flushing, wants to use inodes from its own
FS task, and knows when an inode is clean and can be reclaimed.

I thought there were two options: 1) pin the inode with iget(), and
iput() it from the FS's own task, or 2) disable the writeback task and
handle inode reclaim via the dirty flags.

(1) was complex (e.g. the inode can become an orphan inode), and seems
to be an ineffective workaround just to survive with the writeback
task.

So I want to do (2).

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-14 12:12       ` OGAWA Hirofumi
@ 2012-09-14 12:53         ` Fengguang Wu
  2012-09-14 13:07           ` OGAWA Hirofumi
  2012-09-14 13:19         ` Jan Kara
  1 sibling, 1 reply; 29+ messages in thread
From: Fengguang Wu @ 2012-09-14 12:53 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: viro, jack, hch, linux-kernel

On Fri, Sep 14, 2012 at 09:12:02PM +0900, OGAWA Hirofumi wrote:
> Fengguang Wu <fengguang.wu@intel.com> writes:
> 
> >> >> @@ -120,6 +120,9 @@ __bdi_start_writeback(struct backing_dev
> >> >>  {
> >> >>  	struct wb_writeback_work *work;
> >> >>  
> >> >> +	if (!bdi_cap_writeback_dirty(bdi))
> >> >> +		return;
> >> >
> >> > Will someone in the current kernel actually call
> >> > __bdi_start_writeback() on a BDI_CAP_NO_WRITEBACK bdi?
> >> >
> >> > If the answer is no, VM_BUG_ON(!bdi_cap_writeback_dirty(bdi)) looks better.
> >> 
> >> I guess nobody call it in current kernel though. Hmm.., but we also have
> >> check in __mark_inode_dirty(), nobody should be using it, right?
> >> 
> >> If we defined it as the bug, I can't see what BDI_CAP_NO_WRITEBACK wants
> >> to do actually.  We are not going to allow to disable the writeback task?
> >
> >> I was going to use this to disable writeback task on my developing FS...
> >
> > That sounds like an interesting use case. Can you elaborate a bit more?
> >
> > Note that even if you disable __bdi_start_writeback() here, the kernel
> > may also start writeback in the page reclaim path, the fsync() path,
> > and perhaps more.
> 
> The page reclaim and fsync() paths go through FS handlers, so the FS
> can control those.
> 
> A modern FS has to control its flushing carefully. Many FSes already
> ignore writeback if wbc->sync_mode != WB_SYNC_ALL (e.g.
> ext3_write_inode(), nilfs_writepages()), and have their own FS task to
> flush.

Yeah, that test is mainly to improve IO efficiency for
non-data-integrity writes.

> The writeback task is always invoked with sync_mode != WB_SYNC_ALL
> except from sync_inodes_sb(), and the FS has the sb->s_op->sync_fs()
> handler for the sync_inodes_sb() path. So the writeback task just gets
> in the way of the FS controlling its own flushing.
> 
> The FS also wants to control when the inode cache is reclaimable,
> because it has to control flushing, wants to use inodes from its own
> FS task, and knows when an inode is clean and can be reclaimed.
> 
> I thought there were two options: 1) pin the inode with iget(), and
> iput() it from the FS's own task, or 2) disable the writeback task and
> handle inode reclaim via the dirty flags.
> 
> (1) was complex (e.g. the inode can become an orphan inode), and seems
> to be an ineffective workaround just to survive with the writeback
> task.

In principle, the VFS should of course give enough flexibility to the
FS. But it's the details that matter. As for the BDI_CAP_NO_WRITEBACK
approach, I'm afraid you won't get the expected "FS control" through
it, because the flusher thread may already have a long queue of works
which will take a long time to finish. It even has internal
background/periodic works that are not controllable this way; see
wb_check_background_flush().

And BDI_CAP_NO_WRITEBACK is expected to be a static/constant flag that
always evaluates to true/false for a given bdi.  There will be
correctness problems if you change the BDI_CAP_NO_WRITEBACK flag
dynamically.

Thanks,
Fengguang


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-14 12:53         ` Fengguang Wu
@ 2012-09-14 13:07           ` OGAWA Hirofumi
  2012-09-14 13:33             ` Fengguang Wu
  0 siblings, 1 reply; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-14 13:07 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: viro, jack, hch, linux-kernel

Fengguang Wu <fengguang.wu@intel.com> writes:

>> The writeback task is always invoked with sync_mode != WB_SYNC_ALL
>> except from sync_inodes_sb(), and the FS has the sb->s_op->sync_fs()
>> handler for the sync_inodes_sb() path. So the writeback task just gets
>> in the way of the FS controlling its own flushing.
>> 
>> The FS also wants to control when the inode cache is reclaimable,
>> because it has to control flushing, wants to use inodes from its own
>> FS task, and knows when an inode is clean and can be reclaimed.
>> 
>> I thought there were two options: 1) pin the inode with iget(), and
>> iput() it from the FS's own task, or 2) disable the writeback task and
>> handle inode reclaim via the dirty flags.
>> 
>> (1) was complex (e.g. the inode can become an orphan inode), and seems
>> to be an ineffective workaround just to survive with the writeback
>> task.
>
> In principle, the VFS should of course give enough flexibility to the
> FS. But it's the details that matter. As for the BDI_CAP_NO_WRITEBACK
> approach, I'm afraid you won't get the expected "FS control" through
> it, because the flusher thread may already have a long queue of works
> which will take a long time to finish. It even has internal
> background/periodic works that are not controllable this way; see
> wb_check_background_flush().

wb_check_background_flush() is called from,

bdi_forker_thread()
    bdi_writeback_thread()
        wb_do_writeback()
            wb_check_background_flush()

But, bdi_forker_thread() never starts bdi_writeback_thread() if
!bdi_cap_writeback_dirty(bdi).

Or am I seeing something wrong here?

> And BDI_CAP_NO_WRITEBACK is expected to be a static/constant flag that
> always evaluates to true/false for a given bdi.  There will be
> correctness problems if you change the BDI_CAP_NO_WRITEBACK flag
> dynamically.

I'm going to use it as a static or per-sb flag, initialized in
fill_super(). And it always uses BDI_CAP_NO_WRITEBACK if the sb is
available, because the FS's own task flushes instead.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-14 12:12       ` OGAWA Hirofumi
  2012-09-14 12:53         ` Fengguang Wu
@ 2012-09-14 13:19         ` Jan Kara
  2012-09-14 13:44           ` OGAWA Hirofumi
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Kara @ 2012-09-14 13:19 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Fengguang Wu, viro, jack, hch, linux-kernel

On Fri 14-09-12 21:12:02, OGAWA Hirofumi wrote:
> Fengguang Wu <fengguang.wu@intel.com> writes:
> >> >> @@ -120,6 +120,9 @@ __bdi_start_writeback(struct backing_dev
> >> >>  {
> >> >>  	struct wb_writeback_work *work;
> >> >>  
> >> >> +	if (!bdi_cap_writeback_dirty(bdi))
> >> >> +		return;
> >> >
> >> > Will someone in the current kernel actually call
> >> > __bdi_start_writeback() on a BDI_CAP_NO_WRITEBACK bdi?
> >> >
> >> > If the answer is no, VM_BUG_ON(!bdi_cap_writeback_dirty(bdi)) looks better.
> >> 
> >> I guess nobody call it in current kernel though. Hmm.., but we also have
> >> check in __mark_inode_dirty(), nobody should be using it, right?
> >> 
> >> If we defined it as the bug, I can't see what BDI_CAP_NO_WRITEBACK wants
> >> to do actually.  We are not going to allow to disable the writeback task?
> >
> >> I was going to use this to disable writeback task on my developing FS...
> >
> > That sounds like an interesting use case. Can you elaborate a bit more?
> >
> > Note that even if you disable __bdi_start_writeback() here, the kernel
> > may also start writeback in the page reclaim path, the fsync() path,
> > and perhaps more.
> 
> The page reclaim and fsync() paths go through FS handlers, so the FS
> can control those.
> 
> A modern FS has to control its flushing carefully. Many FSes already
> ignore writeback if wbc->sync_mode != WB_SYNC_ALL (e.g.
> ext3_write_inode(), nilfs_writepages()), and have their own FS task to
> flush.
  Out of curiosity, what exactly do you need to control in your filesystem
that makes the flusher thread unusable for you? You still have a lot of
flexibility with the ->write_inode() and ->writepages() callbacks...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-14 13:07           ` OGAWA Hirofumi
@ 2012-09-14 13:33             ` Fengguang Wu
  2012-09-14 13:49               ` OGAWA Hirofumi
  0 siblings, 1 reply; 29+ messages in thread
From: Fengguang Wu @ 2012-09-14 13:33 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: viro, jack, hch, linux-kernel

On Fri, Sep 14, 2012 at 10:07:48PM +0900, OGAWA Hirofumi wrote:
> Fengguang Wu <fengguang.wu@intel.com> writes:
> 
> >> The writeback task is always invoked with sync_mode != WB_SYNC_ALL
> >> except from sync_inodes_sb(), and the FS has the sb->s_op->sync_fs()
> >> handler for the sync_inodes_sb() path. So the writeback task just gets
> >> in the way of the FS controlling its own flushing.
> >> 
> >> The FS also wants to control when the inode cache is reclaimable,
> >> because it has to control flushing, wants to use inodes from its own
> >> FS task, and knows when an inode is clean and can be reclaimed.
> >> 
> >> I thought there were two options: 1) pin the inode with iget(), and
> >> iput() it from the FS's own task, or 2) disable the writeback task and
> >> handle inode reclaim via the dirty flags.
> >> 
> >> (1) was complex (e.g. the inode can become an orphan inode), and seems
> >> to be an ineffective workaround just to survive with the writeback
> >> task.
> >
> > In principle, the VFS should of course give the FS enough flexibility.
> > But it's the details that matter. As for the BDI_CAP_NO_WRITEBACK
> > approach, I'm afraid you won't get the expected "FS control" through it,
> > because the flusher thread may already have a long queue of works which
> > will take a long time to finish. It even has internal background/periodic
> > works that are not controllable this way; see wb_check_background_flush().
> 
> wb_check_background_flush() is called from,
> 
> bdi_forker_thread()
>     bdi_writeback_thread()
>         wb_do_writeback()
>             wb_check_background_flush()
> 
> But bdi_forker_thread() never starts bdi_writeback_thread() if
> !bdi_cap_writeback_dirty(bdi).
> 
> Or am I seeing something wrong here?

Nothing wrong.

> > And BDI_CAP_NO_WRITEBACK is expected to be a static/constant flag that
> > always evaluates to true/false for a given bdi.  There will be
> > correctness problems if you change the BDI_CAP_NO_WRITEBACK flag
> > dynamically.
> 
> I'm going to use it as static, per-sb, initialized in fill_super(). And
> it always uses BDI_CAP_NO_WRITEBACK while the sb is available, because
> the FS's own task flushes instead.

Ah OK, sorry I didn't quite catch your use case.

But then, if you set BDI_CAP_NO_WRITEBACK from the beginning, how would
__bdi_start_writeback() be called at all?

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-14 13:19         ` Jan Kara
@ 2012-09-14 13:44           ` OGAWA Hirofumi
  2012-09-14 14:45             ` Jan Kara
  0 siblings, 1 reply; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-14 13:44 UTC (permalink / raw)
  To: Jan Kara; +Cc: Fengguang Wu, viro, hch, linux-kernel

Jan Kara <jack@suse.cz> writes:

>> The page reclaim and fsync paths go through FS handlers. So the FS can
>> control those.
>> 
>> Modern FSes have to control flushing carefully. Many FSes already ignore
>> writeback when wbc->sync_mode != WB_SYNC_ALL (e.g. ext3_write_inode,
>> nilfs_writepages), and have their own FS task to flush.
>   Out of curiosity, what exactly do you need to control in your filesystem
> that makes flusher thread unusable for you? You still have a lot of
> flexibility with ->write_inode() and ->writepages() callbacks...

If the flusher is working, it clears the inode's dirty flags. But if
those handlers can't flush at that time, we have to redirty the inode or
do something similar to prevent reclaim.

This work has no benefit beyond working around the flusher, and it is
complex and racy.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-14 13:33             ` Fengguang Wu
@ 2012-09-14 13:49               ` OGAWA Hirofumi
  0 siblings, 0 replies; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-14 13:49 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: viro, jack, hch, linux-kernel

Fengguang Wu <fengguang.wu@intel.com> writes:

>> > And BDI_CAP_NO_WRITEBACK is expected to be a static/constant flag that
>> > always evaluates to true/false for a given bdi.  There will be
>> > correctness problems if you change the BDI_CAP_NO_WRITEBACK flag
>> > dynamically.
>> 
>> I'm going to use it as static, per-sb, initialized in fill_super(). And
>> it always uses BDI_CAP_NO_WRITEBACK while the sb is available, because
>> the FS's own task flushes instead.
>
> Ah OK, sorry I didn't quite catch your use case.
>
> But then, if you set BDI_CAP_NO_WRITEBACK from the beginning, how would
> __bdi_start_writeback() be called at all?

If we call mark_inode_dirty(inode), the inode goes onto bdi->wb.b_dirty.
And sync(2) calls __bdi_start_writeback() for every bdi whose
bdi->wb.b_* lists are not empty.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-14 13:44           ` OGAWA Hirofumi
@ 2012-09-14 14:45             ` Jan Kara
  2012-09-14 15:10               ` OGAWA Hirofumi
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2012-09-14 14:45 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Jan Kara, Fengguang Wu, viro, hch, linux-kernel

On Fri 14-09-12 22:44:28, OGAWA Hirofumi wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> >> page reclaim and fsync path have FS handler. So, FS can control those.
> >> 
> >> The modern FS have to control to flush carefully. Many FSes are already
> >> ignoring if wbc->sync_mode != WB_SYNC_ALL (e.g. ext3_write_inode,
> >> nilfs_writepages), and have own FS task to flush.
> >   Out of curiosity, what exactly do you need to control in your filesystem
> > that makes flusher thread unusable for you? You still have a lot of
> > flexibility with ->write_inode() and ->writepages() callbacks...
> 
> If the flusher is working, it clears the inode's dirty flags. But if
> those handlers can't flush at that time, we have to redirty the inode or
> do something similar to prevent reclaim.
  Well, if this is your only problem then I see better options than just
disabling the flusher thread. If the inability to write an inode is rare,
then redirtying seems like a reasonable option (though I agree it's a bit
ugly). If the inability to write is common, then you'll probably have to
track dirty inodes yourself in some list and expose inodes to the VM only
when they are ready to be written. Or you handle writing of inodes yourself
but leave writing of pages to the flusher thread...

Because when you disable the flusher thread completely, you have to put all
the smarts (avoiding livelocks, keeping fairness among processes, writing
old data, keeping the number of dirty pages under control) into your
filesystem, which leads to a lot of duplication.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-14 14:45             ` Jan Kara
@ 2012-09-14 15:10               ` OGAWA Hirofumi
  2012-09-16 21:49                 ` Jan Kara
  0 siblings, 1 reply; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-14 15:10 UTC (permalink / raw)
  To: Jan Kara; +Cc: Fengguang Wu, viro, hch, linux-kernel

Jan Kara <jack@suse.cz> writes:

>> If the flusher is working, it clears the inode's dirty flags. But if
>> those handlers can't flush at that time, we have to redirty the inode or
>> do something similar to prevent reclaim.
>   Well, if this is your only problem then I see better options than just
> disabling the flusher thread. If the inability to write an inode is rare,
> then redirtying seems like a reasonable option (though I agree it's a bit
> ugly). If the inability to write is common, then you'll probably have to
> track dirty inodes yourself in some list and expose inodes to the VM only
> when they are ready to be written. Or you handle writing of inodes yourself
> but leave writing of pages to the flusher thread...

Basically all data can be data-integrity writes, like data logging, so it
would be more than common. And ->writepages() will also ignore WB_SYNC_NONE.

> Because when you disable the flusher thread completely, you have to put all
> the smarts (avoiding livelocks, keeping fairness among processes, writing
> old data, keeping the number of dirty pages under control) into your
> filesystem, which leads to a lot of duplication.

I'm not sure what you meant, though. What is the difference from ignoring
WB_SYNC_NONE?

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-14 15:10               ` OGAWA Hirofumi
@ 2012-09-16 21:49                 ` Jan Kara
  2012-09-16 23:24                   ` OGAWA Hirofumi
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2012-09-16 21:49 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Jan Kara, Fengguang Wu, viro, hch, linux-kernel

On Sat 15-09-12 00:10:53, OGAWA Hirofumi wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> >> If the flusher is working, it clears the inode's dirty flags. But if
> >> those handlers can't flush at that time, we have to redirty the inode or
> >> do something similar to prevent reclaim.
> >   Well, if this is your only problem then I see better options than just
> > disabling the flusher thread. If the inability to write an inode is rare,
> > then redirtying seems like a reasonable option (though I agree it's a bit
> > ugly). If the inability to write is common, then you'll probably have to
> > track dirty inodes yourself in some list and expose inodes to the VM only
> > when they are ready to be written. Or you handle writing of inodes yourself
> > but leave writing of pages to the flusher thread...
> 
> Basically all data can be data-integrity writes, like data logging, so it
> would be more than common. And ->writepages() will also ignore WB_SYNC_NONE.
> 
> > Because when you disable the flusher thread completely, you have to put all
> > the smarts (avoiding livelocks, keeping fairness among processes, writing
> > old data, keeping the number of dirty pages under control) into your
> > filesystem, which leads to a lot of duplication.
> 
> I'm not sure what you meant, though. What is the difference from ignoring
> WB_SYNC_NONE?
  When you completely ignore WB_SYNC_NONE writeback, you'll soon drive the
machine close to dirty limits and processes dirtying pages will get
throttled. Because flusher threads won't be able to write pages - they
do WB_SYNC_NONE writeback when we have too many dirty pages - processes
will be throttled until somebody calls sync(1) or someone writes the data
for some other reason... So I suspect things won't really work as you
expect.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-16 21:49                 ` Jan Kara
@ 2012-09-16 23:24                   ` OGAWA Hirofumi
  2012-09-17  8:48                     ` Jan Kara
  0 siblings, 1 reply; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-16 23:24 UTC (permalink / raw)
  To: Jan Kara; +Cc: Fengguang Wu, viro, hch, linux-kernel

Jan Kara <jack@suse.cz> writes:

>> I'm not sure what you meant, though. What is the difference from ignoring
>> WB_SYNC_NONE?
>   When you completely ignore WB_SYNC_NONE writeback, you'll soon drive the
> machine close to dirty limits and processes dirtying pages will get
> throttled. Because flusher threads won't be able to write pages - they
> do WB_SYNC_NONE writeback when we have too many dirty pages - processes
> will be throttled until somebody calls sync(1) or someone writes the data
> for some other reason... So I suspect things won't really work as you
> expect.

I think you know how to solve that, though. You can add a periodic flush
in your own task. And you can check bdi->dirty_exceeded in the handlers.

Well, OK. The alternative plan, a bigger change though, is to add a
handler to the writeback task path. This would be the better way, and the
core would still be able to request flushing in the usual way (I guess
this is what you are concerned about).  And I believe some FSes could
implement a simpler and more efficient writeback path.

But this would look like what reiser4 submitted in the past (before bdi
was introduced), which unfortunately was never accepted.

Since the situation has changed, would we accept it now?

OK, why does my FS require it? Because the basic strategy tries to keep
the consistency of the user's view, not only internal metadata
consistency.  I.e. it works by flushing a snapshot of the user's view.

So flushing metadata/data in an arbitrary order, as the current writeback
task does, is unacceptable (except, of course, when requested by the
user). And the writeback task can never know the FS's correct order.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-16 23:24                   ` OGAWA Hirofumi
@ 2012-09-17  8:48                     ` Jan Kara
  2012-09-17  9:39                       ` OGAWA Hirofumi
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2012-09-17  8:48 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Jan Kara, Fengguang Wu, viro, hch, linux-kernel

On Mon 17-09-12 08:24:08, OGAWA Hirofumi wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> >> I'm not sure what you meant, though. What is the difference from ignoring
> >> WB_SYNC_NONE?
> >   When you completely ignore WB_SYNC_NONE writeback, you'll soon drive the
> > machine close to dirty limits and processes dirtying pages will get
> > throttled. Because flusher threads won't be able to write pages - they
> > do WB_SYNC_NONE writeback when we have too many dirty pages - processes
> > will be throttled until somebody calls sync(1) or someone writes the data
> > for some other reason... So I suspect things won't really work as you
> > expect.
> 
> I think you know how to solve that, though. You can add a periodic flush
> in your own task. And you can check bdi->dirty_exceeded in the handlers.
  Sure, you can have your private thread. That is possible but you will
have to duplicate flusher logic and you will still get odd behavior e.g.
when your filesystem is on one partition and another filesystem is on a
different partition of the same disk.

> Well, OK. The alternative plan, a bigger change though, is to add a
> handler to the writeback task path. This would be the better way, and the
> core would still be able to request flushing in the usual way (I guess
> this is what you are concerned about).  And I believe some FSes could
> implement a simpler and more efficient writeback path.
> 
> But this would look like what reiser4 submitted in the past (before bdi
> was introduced), which unfortunately was never accepted.
> 
> Since the situation has changed, would we accept it now?
> 
> OK, why does my FS require it? Because the basic strategy tries to keep
> the consistency of the user's view, not only internal metadata
> consistency.  I.e. it works by flushing a snapshot of the user's view.
> 
> So flushing metadata/data in an arbitrary order, as the current writeback
> task does, is unacceptable (except, of course, when requested by the
> user). And the writeback task can never know the FS's correct order.
  OK, thanks for explanation. Now I understand what you are trying to do.
Would it be enough if you could track dirty inodes inside your filesystem
and provide some callback for flusher so that you can queue these inodes in
the IO queue?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-17  8:48                     ` Jan Kara
@ 2012-09-17  9:39                       ` OGAWA Hirofumi
  2012-09-17  9:56                         ` Jan Kara
  0 siblings, 1 reply; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-17  9:39 UTC (permalink / raw)
  To: Jan Kara; +Cc: Fengguang Wu, viro, hch, linux-kernel

Jan Kara <jack@suse.cz> writes:

>> I think you know how to solve that, though. You can add a periodic flush
>> in your own task. And you can check bdi->dirty_exceeded in the handlers.
>   Sure, you can have your private thread. That is possible but you will
> have to duplicate flusher logic and you will still get odd behavior e.g.
> when your filesystem is on one partition and another filesystem is on a
> different partition of the same disk.

Right. But that is what current FSes are doing, more or less.

>> Well, OK. The alternative plan, a bigger change though, is to add a
>> handler to the writeback task path. This would be the better way, and the
>> core would still be able to request flushing in the usual way (I guess
>> this is what you are concerned about).  And I believe some FSes could
>> implement a simpler and more efficient writeback path.
>> 
>> But this would look like what reiser4 submitted in the past (before bdi
>> was introduced), which unfortunately was never accepted.
>> 
>> Since the situation has changed, would we accept it now?
>> 
>> OK, why does my FS require it? Because the basic strategy tries to keep
>> the consistency of the user's view, not only internal metadata
>> consistency.  I.e. it works by flushing a snapshot of the user's view.
>> 
>> So flushing metadata/data in an arbitrary order, as the current writeback
>> task does, is unacceptable (except, of course, when requested by the
>> user). And the writeback task can never know the FS's correct order.
>   OK, thanks for explanation. Now I understand what you are trying to do.
> Would it be enough if you could track dirty inodes inside your filesystem
> and provide some callback for flusher so that you can queue these inodes in
> the IO queue?

Yes, I guess so. I haven't experimented with this plan yet, so I'm not
sure, though.  If we provide a callback, something like
->writeback_sb_inodes(), it would work.

And the better design would be to remove the duplication of dirty inode
tracking between the writeback task and the FS. (However, this is quite
optional.)

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-17  9:39                       ` OGAWA Hirofumi
@ 2012-09-17  9:56                         ` Jan Kara
  2012-09-17 10:37                           ` OGAWA Hirofumi
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2012-09-17  9:56 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Jan Kara, Fengguang Wu, viro, hch, linux-kernel

On Mon 17-09-12 18:39:05, OGAWA Hirofumi wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> >> I think you know how to solve it though. You can add the periodic flush
> >> in own task. And you can check bdi->dirty_exceeded in any handlers.
> >   Sure, you can have your private thread. That is possible but you will
> > have to duplicate flusher logic and you will still get odd behavior e.g.
> > when your filesystem is on one partition and another filesystem is on a
> > different partition of the same disk.
> 
> Right. But that is what current FSes are doing, more or less.
  It's not. Page writeback is respected by all filesystems in most cases,
AFAIK. Inode writeback is a different issue, but that's not so interesting
from the mm point of view...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-17  9:56                         ` Jan Kara
@ 2012-09-17 10:37                           ` OGAWA Hirofumi
  2012-09-17 15:54                             ` Jan Kara
  0 siblings, 1 reply; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-17 10:37 UTC (permalink / raw)
  To: Jan Kara; +Cc: Fengguang Wu, viro, hch, linux-kernel

Jan Kara <jack@suse.cz> writes:

> On Mon 17-09-12 18:39:05, OGAWA Hirofumi wrote:
>> Jan Kara <jack@suse.cz> writes:
>> 
>> >> I think you know how to solve it though. You can add the periodic flush
>> >> in own task. And you can check bdi->dirty_exceeded in any handlers.
>> >   Sure, you can have your private thread. That is possible but you will
>> > have to duplicate flusher logic and you will still get odd behavior e.g.
>> > when your filesystem is on one partition and another filesystem is on a
>> > different partition of the same disk.
>> 
>> Right. But that is what current FSes are doing, more or less.
>   It's not. Page writeback is respected by all filesystems in most cases,
> AFAIK. Inode writeback is a different issue, but that's not so interesting
> from the mm point of view...

Duplicate flusher: many FSes have their own task to flush.  Odd behavior
in the partition case: agreed, but I'm not sure why that is acceptable
for metadata yet not considered odd behavior there.

Sorry, I'm not sure of your point in the latest comment. Are you just
saying the FS must flush pages in writepages()?

And if the alternative plan is acceptable, maybe I will lose interest in
this approach anyway.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-17 10:37                           ` OGAWA Hirofumi
@ 2012-09-17 15:54                             ` Jan Kara
  2012-09-17 16:55                               ` OGAWA Hirofumi
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2012-09-17 15:54 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Jan Kara, Fengguang Wu, viro, hch, linux-kernel

On Mon 17-09-12 19:37:46, OGAWA Hirofumi wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> > On Mon 17-09-12 18:39:05, OGAWA Hirofumi wrote:
> >> Jan Kara <jack@suse.cz> writes:
> >> 
> >> >> I think you know how to solve it though. You can add the periodic flush
> >> >> in own task. And you can check bdi->dirty_exceeded in any handlers.
> >> >   Sure, you can have your private thread. That is possible but you will
> >> > have to duplicate flusher logic and you will still get odd behavior e.g.
> >> > when your filesystem is on one partition and another filesystem is on a
> >> > different partition of the same disk.
> >> 
> >> Right. But that is what current FSes are doing, more or less.
> >   It's not. Page writeback is respected by all filesystems in most cases,
> > AFAIK. Inode writeback is a different issue, but that's not so interesting
> > from the mm point of view...
> 
> Duplicate flusher: many FSes have their own task to flush.  Odd behavior
> in the partition case: agreed, but I'm not sure why that is acceptable
> for metadata yet not considered odd behavior there.
  Well, because there are many more data pages than there is metadata. So
when you do strange things (like refusing to write / reclaim) with
metadata, it usually ends up in the noise. But when you start doing
similar things with data pages, people will notice.

> Sorry, I'm not sure of your point in the latest comment. Are you just
> saying the FS must flush pages in writepages()?
  Yes.

> And if the alternative plan is acceptable, maybe I will lose interest in
> this approach anyway.
  Yes, the alternative plan looks better to me. But all in all I don't want
to stop you from your experiments :) I mostly just wanted to point out that
disabling the flusher thread for a filesystem has complex consequences
which IMHO bring more bad than good.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()
  2012-09-17 15:54                             ` Jan Kara
@ 2012-09-17 16:55                               ` OGAWA Hirofumi
  0 siblings, 0 replies; 29+ messages in thread
From: OGAWA Hirofumi @ 2012-09-17 16:55 UTC (permalink / raw)
  To: Jan Kara; +Cc: Fengguang Wu, viro, hch, linux-kernel

Jan Kara <jack@suse.cz> writes:

>> Duplicate flusher: many FSes have their own task to flush.  Odd behavior
>> in the partition case: agreed, but I'm not sure why that is acceptable
>> for metadata yet not considered odd behavior there.
>   Well, because there are many more data pages than there is metadata. So
> when you do strange things (like refusing to write / reclaim) with
> metadata, it usually ends up in the noise. But when you start doing
> similar things with data pages, people will notice.

Could you explain more? So you are thinking we have to fix the current
behavior for metadata-intensive applications? And what will people
notice?

>> Sorry, I'm not sure of your point in the latest comment. Are you just
>> saying the FS must flush pages in writepages()?
>   Yes.
>
>> And if the alternative plan is acceptable, maybe I will lose interest in
>> this approach anyway.
>   Yes, the alternative plan looks better to me. But all in all I don't want
> to stop you from your experiments :) I mostly just wanted to point out that
> disabling the flusher thread for a filesystem has complex consequences
> which IMHO bring more bad than good.

OK, thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2012-09-17 16:55 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-11 18:28 [PATCH] Fix queueing work if !bdi_cap_writeback_dirty() OGAWA Hirofumi
2012-09-12  2:42 ` Fengguang Wu
2012-09-12  8:00   ` OGAWA Hirofumi
2012-09-13  0:33     ` Fengguang Wu
2012-09-13  5:41       ` OGAWA Hirofumi
2012-09-13  6:03         ` Fengguang Wu
2012-09-13  6:31           ` OGAWA Hirofumi
2012-09-13  6:39 ` Fengguang Wu
2012-09-13  7:53   ` OGAWA Hirofumi
2012-09-14 11:13     ` OGAWA Hirofumi
2012-09-14 11:18       ` Fengguang Wu
2012-09-14 11:14     ` Fengguang Wu
2012-09-14 12:12       ` OGAWA Hirofumi
2012-09-14 12:53         ` Fengguang Wu
2012-09-14 13:07           ` OGAWA Hirofumi
2012-09-14 13:33             ` Fengguang Wu
2012-09-14 13:49               ` OGAWA Hirofumi
2012-09-14 13:19         ` Jan Kara
2012-09-14 13:44           ` OGAWA Hirofumi
2012-09-14 14:45             ` Jan Kara
2012-09-14 15:10               ` OGAWA Hirofumi
2012-09-16 21:49                 ` Jan Kara
2012-09-16 23:24                   ` OGAWA Hirofumi
2012-09-17  8:48                     ` Jan Kara
2012-09-17  9:39                       ` OGAWA Hirofumi
2012-09-17  9:56                         ` Jan Kara
2012-09-17 10:37                           ` OGAWA Hirofumi
2012-09-17 15:54                             ` Jan Kara
2012-09-17 16:55                               ` OGAWA Hirofumi
