linux-kernel.vger.kernel.org archive mirror
* [PATCH] writeback: inode cgroup wb switch should skip inode with zero i_count
@ 2016-06-13 22:37 Tahsin Erdogan
  2016-06-15 12:26 ` Jan Kara
  0 siblings, 1 reply; 5+ messages in thread
From: Tahsin Erdogan @ 2016-06-13 22:37 UTC
  To: Jens Axboe, Tejun Heo, Alexander Viro
  Cc: Jan Kara, linux-fsdevel, linux-kernel, Tahsin Erdogan

Asynchronous wb switching of inodes takes an additional reference on the
inode to make sure it remains valid until the switchover is completed.

However, inode->i_count may already have reached zero while the inode is
still on the writeback queue, in which case ihold() triggers this warning:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 917 at fs/inode.c:397 ihold+0x2b/0x30
CPU: 1 PID: 917 Comm: kworker/u4:5 Not tainted 4.7.0-rc2+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: writeback wb_workfn (flush-8:16)
 0000000000000000 ffff88007ca0fb58 ffffffff805990af 0000000000000000
 0000000000000000 ffff88007ca0fb98 ffffffff80268702 0000018d000004e2
 ffff88007cef40e8 ffff88007c9b89a8 ffff880079e3a740 0000000000000003
Call Trace:
 [<ffffffff805990af>] dump_stack+0x4d/0x6e
 [<ffffffff80268702>] __warn+0xc2/0xe0
 [<ffffffff802687d8>] warn_slowpath_null+0x18/0x20
 [<ffffffff8035b4ab>] ihold+0x2b/0x30
 [<ffffffff80367ecc>] inode_switch_wbs+0x11c/0x180
 [<ffffffff80369110>] wbc_detach_inode+0x170/0x1a0
 [<ffffffff80369abc>] writeback_sb_inodes+0x21c/0x530
 [<ffffffff80369f7e>] wb_writeback+0xee/0x1e0
 [<ffffffff8036a147>] wb_workfn+0xd7/0x280
 [<ffffffff80287531>] ? try_to_wake_up+0x1b1/0x2b0
 [<ffffffff8027bb09>] process_one_work+0x129/0x300
 [<ffffffff8027be06>] worker_thread+0x126/0x480
 [<ffffffff8098cde7>] ? __schedule+0x1c7/0x561
 [<ffffffff8027bce0>] ? process_one_work+0x300/0x300
 [<ffffffff80280ff4>] kthread+0xc4/0xe0
 [<ffffffff80335578>] ? kfree+0xc8/0x100
 [<ffffffff809903cf>] ret_from_fork+0x1f/0x40
 [<ffffffff80280f30>] ? __kthread_parkme+0x70/0x70
---[ end trace aaefd2fd9f306bc4 ]---

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
---
 fs/fs-writeback.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 989a2ce..b44ede0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -478,14 +478,15 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 	spin_lock(&inode->i_lock);
 	if (!(inode->i_sb->s_flags & MS_ACTIVE) ||
 	    inode->i_state & (I_WB_SWITCH | I_FREEING) ||
+	    atomic_read(&inode->i_count) == 0 ||
 	    inode_to_wb(inode) == isw->new_wb) {
 		spin_unlock(&inode->i_lock);
 		goto out_free;
 	}
 	inode->i_state |= I_WB_SWITCH;
+	ihold(inode);
 	spin_unlock(&inode->i_lock);
 
-	ihold(inode);
 	isw->inode = inode;
 
 	atomic_inc(&isw_nr_in_flight);
-- 
2.8.0.rc3.226.g39d4020
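
To make the warning concrete, here is a minimal userspace sketch, not
kernel code. It assumes ihold() warns whenever the incremented count is
below 2 (which is what the trace above suggests), and it models the extra
i_count check this patch adds. The names model_inode, model_ihold,
model_try_start_switch and the MODEL_* flags are made up for illustration;
the MS_ACTIVE and inode_to_wb() tests and the i_lock locking are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the I_FREEING and I_WB_SWITCH state bits. */
#define MODEL_I_FREEING   (1u << 0)
#define MODEL_I_WB_SWITCH (1u << 1)

/* Userspace stand-in for struct inode; only refcount and state are modelled. */
struct model_inode {
	atomic_int   i_count;
	unsigned int i_state;
};

/* Counts how often the modelled WARN_ON() would have fired. */
static int model_warn_count;

static void model_inode_init(struct model_inode *inode, int count,
			     unsigned int state)
{
	atomic_init(&inode->i_count, count);
	inode->i_state = state;
}

/*
 * Assumed ihold() behaviour: take a reference and warn if the caller did
 * not already hold one, i.e. if the new count is below 2.
 */
static void model_ihold(struct model_inode *inode)
{
	int new_count = atomic_fetch_add(&inode->i_count, 1) + 1;

	if (new_count < 2)
		model_warn_count++;
}

/*
 * Models the patched gate in inode_switch_wbs(). In the kernel this whole
 * function body would run with inode->i_lock held; the lock is not modelled.
 * Returns true when a switch was scheduled and a reference was taken.
 */
static bool model_try_start_switch(struct model_inode *inode)
{
	if ((inode->i_state & (MODEL_I_WB_SWITCH | MODEL_I_FREEING)) ||
	    atomic_load(&inode->i_count) == 0)
		return false;	/* corresponds to "goto out_free" */

	inode->i_state |= MODEL_I_WB_SWITCH;
	model_ihold(inode);	/* safe: the count is known to be >= 1 here */
	return true;
}
```

In this model, calling model_ihold() on an inode whose count is zero bumps
model_warn_count, while model_try_start_switch() refuses such an inode and
leaves both its count and its state untouched.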

* Re: [PATCH] writeback: inode cgroup wb switch should skip inode with zero i_count
  2016-06-13 22:37 [PATCH] writeback: inode cgroup wb switch should skip inode with zero i_count Tahsin Erdogan
@ 2016-06-15 12:26 ` Jan Kara
  2016-06-15 14:55   ` Tejun Heo
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2016-06-15 12:26 UTC
  To: Tahsin Erdogan
  Cc: Jens Axboe, Tejun Heo, Alexander Viro, Jan Kara, linux-fsdevel,
	linux-kernel

On Mon 13-06-16 15:37:09, Tahsin Erdogan wrote:
> Asynchronous wb switching of inodes takes an additional reference on the
> inode to make sure it remains valid until the switchover is completed.
> 
> However, inode->i_count may already have reached zero while the inode is
> still on the writeback queue, in which case ihold() triggers this warning:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 917 at fs/inode.c:397 ihold+0x2b/0x30
> CPU: 1 PID: 917 Comm: kworker/u4:5 Not tainted 4.7.0-rc2+ #49
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Workqueue: writeback wb_workfn (flush-8:16)
>  0000000000000000 ffff88007ca0fb58 ffffffff805990af 0000000000000000
>  0000000000000000 ffff88007ca0fb98 ffffffff80268702 0000018d000004e2
>  ffff88007cef40e8 ffff88007c9b89a8 ffff880079e3a740 0000000000000003
> Call Trace:
>  [<ffffffff805990af>] dump_stack+0x4d/0x6e
>  [<ffffffff80268702>] __warn+0xc2/0xe0
>  [<ffffffff802687d8>] warn_slowpath_null+0x18/0x20
>  [<ffffffff8035b4ab>] ihold+0x2b/0x30
>  [<ffffffff80367ecc>] inode_switch_wbs+0x11c/0x180
>  [<ffffffff80369110>] wbc_detach_inode+0x170/0x1a0
>  [<ffffffff80369abc>] writeback_sb_inodes+0x21c/0x530
>  [<ffffffff80369f7e>] wb_writeback+0xee/0x1e0
>  [<ffffffff8036a147>] wb_workfn+0xd7/0x280
>  [<ffffffff80287531>] ? try_to_wake_up+0x1b1/0x2b0
>  [<ffffffff8027bb09>] process_one_work+0x129/0x300
>  [<ffffffff8027be06>] worker_thread+0x126/0x480
>  [<ffffffff8098cde7>] ? __schedule+0x1c7/0x561
>  [<ffffffff8027bce0>] ? process_one_work+0x300/0x300
>  [<ffffffff80280ff4>] kthread+0xc4/0xe0
>  [<ffffffff80335578>] ? kfree+0xc8/0x100
>  [<ffffffff809903cf>] ret_from_fork+0x1f/0x40
>  [<ffffffff80280f30>] ? __kthread_parkme+0x70/0x70
> ---[ end trace aaefd2fd9f306bc4 ]---
> 
> Acked-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Tahsin Erdogan <tahsin@google.com>

Ugh, this looks ugly. An inode with i_count == 0 and without I_FREEING set
is sitting on the inode LRU list. It may get reused, at which point it would
actually be good if it had been switched to the right wb, no?

Since we hold i_lock and have checked that the inode is not being freed, we
can just use __iget() to grab the inode reference. That avoids the warning
and fixes the race as well. Something like:

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 989a2ce..b44ede0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -478,14 +478,15 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 		goto out_free;
 	}
 	inode->i_state |= I_WB_SWITCH;
+	__iget(inode);
 	spin_unlock(&inode->i_lock);
 
-	ihold(inode);
 	isw->inode = inode;
 
 	atomic_inc(&isw_nr_in_flight);

Thoughts?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
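
The __iget() variant suggested above can be sketched in userspace as
follows. This is only a model under stated assumptions: __iget() is taken to
be a plain increment of i_count that is valid while i_lock is held, and the
names sketch_inode, sketch_iget, sketch_try_start_switch and the SKETCH_*
flags are invented for illustration. The MS_ACTIVE and inode_to_wb() tests
and the lock itself are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the I_FREEING and I_WB_SWITCH state bits. */
#define SKETCH_I_FREEING   (1u << 0)
#define SKETCH_I_WB_SWITCH (1u << 1)

/* Userspace stand-in for struct inode; only refcount and state are modelled. */
struct sketch_inode {
	atomic_int   i_count;
	unsigned int i_state;
};

static void sketch_inode_init(struct sketch_inode *inode, int count,
			      unsigned int state)
{
	atomic_init(&inode->i_count, count);
	inode->i_state = state;
}

/*
 * Assumed __iget() behaviour: an unconditional increment with no warning.
 * The kernel caller must hold i_lock and must have ruled out I_FREEING.
 */
static void sketch_iget(struct sketch_inode *inode)
{
	atomic_fetch_add(&inode->i_count, 1);
}

/*
 * Models the gate with the reference taken via the __iget() stand-in.
 * In the kernel the whole body would run under inode->i_lock.
 * Returns true when a switch was scheduled and a reference was taken.
 */
static bool sketch_try_start_switch(struct sketch_inode *inode)
{
	if (inode->i_state & (SKETCH_I_WB_SWITCH | SKETCH_I_FREEING))
		return false;	/* corresponds to "goto out_free" */

	inode->i_state |= SKETCH_I_WB_SWITCH;
	sketch_iget(inode);	/* fine even when the count was 0 */
	return true;
}
```

Unlike a check for i_count == 0, this version still schedules the switch for
an unreferenced inode sitting on the LRU, so the inode is already attached
to the new wb if it gets reused later.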

* Re: [PATCH] writeback: inode cgroup wb switch should skip inode with zero i_count
  2016-06-15 12:26 ` Jan Kara
@ 2016-06-15 14:55   ` Tejun Heo
  0 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2016-06-15 14:55 UTC
  To: Jan Kara
  Cc: Tahsin Erdogan, Jens Axboe, Alexander Viro, Jan Kara,
	linux-fsdevel, linux-kernel

Hello, Jan.

On Wed, Jun 15, 2016 at 02:26:40PM +0200, Jan Kara wrote:
> Ugh, this looks ugly. An inode with i_count == 0 and without I_FREEING set
> is sitting on the inode LRU list. It may get reused, at which point it would
> actually be good if it had been switched to the right wb, no?

Yes, that'd be better, but the switching is a heuristics-driven, best-effort
thing anyway, so occasionally failing to switch isn't critical.

> Since we hold i_lock and have checked that the inode is not being freed, we
> can just use __iget() to grab the inode reference. That avoids the warning
> and fixes the race as well. Something like:
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 989a2ce..b44ede0 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -478,14 +478,15 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
>  		goto out_free;
>  	}
>  	inode->i_state |= I_WB_SWITCH;
> +	__iget(inode);
>  	spin_unlock(&inode->i_lock);
>  
> -	ihold(inode);
>  	isw->inode = inode;
>  
>  	atomic_inc(&isw_nr_in_flight);

That said, the above looks better.

Thanks!

-- 
tejun

* Re: [PATCH] writeback: inode cgroup wb switch should skip inode with zero i_count
  2016-06-09  2:59 Tahsin Erdogan
@ 2016-06-13 22:11 ` Tejun Heo
  0 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2016-06-13 22:11 UTC
  To: Tahsin Erdogan; +Cc: Alexander Viro, linux-fsdevel, linux-kernel

On Wed, Jun 08, 2016 at 07:59:28PM -0700, Tahsin Erdogan wrote:
> Asynchronous wb switching of inodes takes an additional reference on the
> inode to make sure it remains valid until the switchover is completed.
> 
> However, inode->i_count may already have reached zero while the inode is
> still on the writeback queue, in which case ihold() triggers this warning:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 917 at fs/inode.c:397 ihold+0x2b/0x30
> CPU: 1 PID: 917 Comm: kworker/u4:5 Not tainted 4.7.0-rc2+ #49
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Workqueue: writeback wb_workfn (flush-8:16)
>  0000000000000000 ffff88007ca0fb58 ffffffff805990af 0000000000000000
>  0000000000000000 ffff88007ca0fb98 ffffffff80268702 0000018d000004e2
>  ffff88007cef40e8 ffff88007c9b89a8 ffff880079e3a740 0000000000000003
> Call Trace:
>  [<ffffffff805990af>] dump_stack+0x4d/0x6e
>  [<ffffffff80268702>] __warn+0xc2/0xe0
>  [<ffffffff802687d8>] warn_slowpath_null+0x18/0x20
>  [<ffffffff8035b4ab>] ihold+0x2b/0x30
>  [<ffffffff80367ecc>] inode_switch_wbs+0x11c/0x180
>  [<ffffffff80369110>] wbc_detach_inode+0x170/0x1a0
>  [<ffffffff80369abc>] writeback_sb_inodes+0x21c/0x530
>  [<ffffffff80369f7e>] wb_writeback+0xee/0x1e0
>  [<ffffffff8036a147>] wb_workfn+0xd7/0x280
>  [<ffffffff80287531>] ? try_to_wake_up+0x1b1/0x2b0
>  [<ffffffff8027bb09>] process_one_work+0x129/0x300
>  [<ffffffff8027be06>] worker_thread+0x126/0x480
>  [<ffffffff8098cde7>] ? __schedule+0x1c7/0x561
>  [<ffffffff8027bce0>] ? process_one_work+0x300/0x300
>  [<ffffffff80280ff4>] kthread+0xc4/0xe0
>  [<ffffffff80335578>] ? kfree+0xc8/0x100
>  [<ffffffff809903cf>] ret_from_fork+0x1f/0x40
>  [<ffffffff80280f30>] ? __kthread_parkme+0x70/0x70
> ---[ end trace aaefd2fd9f306bc4 ]---
> 
> Signed-off-by: Tahsin Erdogan <tahsin@google.com>

Acked-by: Tejun Heo <tj@kernel.org>

Can you please repost the patch to Jens Axboe <axboe@kernel.dk> with
acked-by added?  Please also cc Jan Kara <jack@suse.com>.

Thanks.

-- 
tejun

* [PATCH] writeback: inode cgroup wb switch should skip inode with zero i_count
@ 2016-06-09  2:59 Tahsin Erdogan
  2016-06-13 22:11 ` Tejun Heo
  0 siblings, 1 reply; 5+ messages in thread
From: Tahsin Erdogan @ 2016-06-09  2:59 UTC
  To: Tejun Heo, Alexander Viro; +Cc: linux-fsdevel, linux-kernel, Tahsin Erdogan

Asynchronous wb switching of inodes takes an additional reference on the
inode to make sure it remains valid until the switchover is completed.

However, inode->i_count may already have reached zero while the inode is
still on the writeback queue, in which case ihold() triggers this warning:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 917 at fs/inode.c:397 ihold+0x2b/0x30
CPU: 1 PID: 917 Comm: kworker/u4:5 Not tainted 4.7.0-rc2+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: writeback wb_workfn (flush-8:16)
 0000000000000000 ffff88007ca0fb58 ffffffff805990af 0000000000000000
 0000000000000000 ffff88007ca0fb98 ffffffff80268702 0000018d000004e2
 ffff88007cef40e8 ffff88007c9b89a8 ffff880079e3a740 0000000000000003
Call Trace:
 [<ffffffff805990af>] dump_stack+0x4d/0x6e
 [<ffffffff80268702>] __warn+0xc2/0xe0
 [<ffffffff802687d8>] warn_slowpath_null+0x18/0x20
 [<ffffffff8035b4ab>] ihold+0x2b/0x30
 [<ffffffff80367ecc>] inode_switch_wbs+0x11c/0x180
 [<ffffffff80369110>] wbc_detach_inode+0x170/0x1a0
 [<ffffffff80369abc>] writeback_sb_inodes+0x21c/0x530
 [<ffffffff80369f7e>] wb_writeback+0xee/0x1e0
 [<ffffffff8036a147>] wb_workfn+0xd7/0x280
 [<ffffffff80287531>] ? try_to_wake_up+0x1b1/0x2b0
 [<ffffffff8027bb09>] process_one_work+0x129/0x300
 [<ffffffff8027be06>] worker_thread+0x126/0x480
 [<ffffffff8098cde7>] ? __schedule+0x1c7/0x561
 [<ffffffff8027bce0>] ? process_one_work+0x300/0x300
 [<ffffffff80280ff4>] kthread+0xc4/0xe0
 [<ffffffff80335578>] ? kfree+0xc8/0x100
 [<ffffffff809903cf>] ret_from_fork+0x1f/0x40
 [<ffffffff80280f30>] ? __kthread_parkme+0x70/0x70
---[ end trace aaefd2fd9f306bc4 ]---

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
---
 fs/fs-writeback.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 989a2ce..b44ede0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -478,14 +478,15 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 	spin_lock(&inode->i_lock);
 	if (!(inode->i_sb->s_flags & MS_ACTIVE) ||
 	    inode->i_state & (I_WB_SWITCH | I_FREEING) ||
+	    atomic_read(&inode->i_count) == 0 ||
 	    inode_to_wb(inode) == isw->new_wb) {
 		spin_unlock(&inode->i_lock);
 		goto out_free;
 	}
 	inode->i_state |= I_WB_SWITCH;
+	ihold(inode);
 	spin_unlock(&inode->i_lock);
 
-	ihold(inode);
 	isw->inode = inode;
 
 	atomic_inc(&isw_nr_in_flight);
-- 
2.8.0.rc3.226.g39d4020

end of thread, other threads:[~2016-06-15 14:55 UTC | newest]

Thread overview: 5+ messages
2016-06-13 22:37 [PATCH] writeback: inode cgroup wb switch should skip inode with zero i_count Tahsin Erdogan
2016-06-15 12:26 ` Jan Kara
2016-06-15 14:55   ` Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2016-06-09  2:59 Tahsin Erdogan
2016-06-13 22:11 ` Tejun Heo
