Linux-Block Archive on lore.kernel.org
* [PATCH block/for-linus] cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
@ 2019-11-08 20:18 Tejun Heo
  2019-11-08 20:33 ` Dennis Zhou
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Tejun Heo @ 2019-11-08 20:18 UTC
  To: Jens Axboe
  Cc: linux-block, cgroups, kernel-team, Li Zefan, Johannes Weiner,
	Jan Kara, Konstantin Khlebnikov, Dennis Zhou

cgroup writeback tries to refresh the associated wb immediately if the
current wb is dead.  This is to avoid continuing to issue IOs on the
stale wb after the memcg - blkcg association has changed (i.e. when
blkcg got disabled / enabled higher up in the hierarchy).

Unfortunately, the logic gets triggered spuriously on inodes which are
associated with dead cgroups.  When the logic is triggered on dead
cgroups, the attempt fails only after doing quite a bit of work
allocating and initializing a new wb.

c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping has
no dirty pages") alleviated the issue significantly, as the logic now
only triggers when the inode has dirty pages.  However, the condition
can still be triggered before the inode is switched to a different
cgroup, and the logic simply doesn't make sense.

Skip the immediate switching if the associated memcg is dying.

This is a simplified version of the following two patches:

 * https://lore.kernel.org/linux-mm/20190513183053.GA73423@dennisz-mbp/
 * http://lkml.kernel.org/r/156355839560.2063.5265687291430814589.stgit@buzz

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: e8a7abf5a5bd ("writeback: disassociate inodes from dying bdi_writebacks")
---
 fs/fs-writeback.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 8461a6322039..335607b8c5c0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -576,10 +576,13 @@ void wbc_attach_and_unlock_inode(struct writeback_control *wbc,
 	spin_unlock(&inode->i_lock);
 
 	/*
-	 * A dying wb indicates that the memcg-blkcg mapping has changed
-	 * and a new wb is already serving the memcg.  Switch immediately.
+	 * A dying wb indicates that either the blkcg associated with the
+	 * memcg changed or the associated memcg is dying.  In the first
+	 * case, a replacement wb should already be available and we should
+	 * refresh the wb immediately.  In the second case, trying to
+	 * refresh will keep failing.
 	 */
-	if (unlikely(wb_dying(wbc->wb)))
+	if (unlikely(wb_dying(wbc->wb) && !css_is_dying(wbc->wb->memcg_css)))
 		inode_switch_wbs(inode, wbc->wb_id);
 }
 EXPORT_SYMBOL_GPL(wbc_attach_and_unlock_inode);
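
For illustration only, the effect of the changed condition can be
modeled in userspace.  This is a minimal sketch, not kernel code: the
types fake_css and fake_wb and the helpers should_switch_old() and
should_switch_new() are hypothetical stand-ins for the real wb and css
objects and for wb_dying() / css_is_dying().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the memcg css; not a kernel type. */
struct fake_css {
	bool dying;			/* models css_is_dying(memcg_css) */
};

/* Hypothetical stand-in for struct bdi_writeback; not a kernel type. */
struct fake_wb {
	bool dying;			/* models wb_dying(wb) */
	struct fake_css *memcg_css;	/* the memcg this wb serves */
};

/* Condition before the patch: switch whenever the wb is dying. */
static bool should_switch_old(const struct fake_wb *wb)
{
	return wb->dying;
}

/*
 * Condition after the patch: switch only if the wb is dying while its
 * memcg is still alive, i.e. the memcg - blkcg mapping changed.
 */
static bool should_switch_new(const struct fake_wb *wb)
{
	return wb->dying && !wb->memcg_css->dying;
}
```

In this model the two conditions differ in exactly one case: a dying
wb whose memcg is also dying.  The old condition attempts the switch
there, while the new one skips it.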


* Re: [PATCH block/for-linus] cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
  2019-11-08 20:18 [PATCH block/for-linus] cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead Tejun Heo
@ 2019-11-08 20:33 ` Dennis Zhou
  2019-11-08 20:37 ` Jens Axboe
  2019-11-11 13:15 ` Michal Hocko
  2 siblings, 0 replies; 6+ messages in thread
From: Dennis Zhou @ 2019-11-08 20:33 UTC
  To: Tejun Heo
  Cc: Jens Axboe, linux-block, cgroups, kernel-team, Li Zefan,
	Johannes Weiner, Jan Kara, Konstantin Khlebnikov, Dennis Zhou

On Fri, Nov 08, 2019 at 12:18:29PM -0800, Tejun Heo wrote:
> cgroup writeback tries to refresh the associated wb immediately if the
> current wb is dead.  This is to avoid continuing to issue IOs on the
> stale wb after the memcg - blkcg association has changed (i.e. when
> blkcg got disabled / enabled higher up in the hierarchy).
> 
> Unfortunately, the logic gets triggered spuriously on inodes which are
> associated with dead cgroups.  When the logic is triggered on dead
> cgroups, the attempt fails only after doing quite a bit of work
> allocating and initializing a new wb.
> 
> c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping has
> no dirty pages") alleviated the issue significantly, as the logic now
> only triggers when the inode has dirty pages.  However, the condition
> can still be triggered before the inode is switched to a different
> cgroup, and the logic simply doesn't make sense.
> 
> Skip the immediate switching if the associated memcg is dying.
> 
> This is a simplified version of the following two patches:
> 
>  * https://lore.kernel.org/linux-mm/20190513183053.GA73423@dennisz-mbp/
>  * http://lkml.kernel.org/r/156355839560.2063.5265687291430814589.stgit@buzz
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Dennis Zhou <dennis@kernel.org>
> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: e8a7abf5a5bd ("writeback: disassociate inodes from dying bdi_writebacks")
> ---
>  fs/fs-writeback.c |    9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 8461a6322039..335607b8c5c0 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -576,10 +576,13 @@ void wbc_attach_and_unlock_inode(struct writeback_control *wbc,
>  	spin_unlock(&inode->i_lock);
>  
>  	/*
> -	 * A dying wb indicates that the memcg-blkcg mapping has changed
> -	 * and a new wb is already serving the memcg.  Switch immediately.
> +	 * A dying wb indicates that either the blkcg associated with the
> +	 * memcg changed or the associated memcg is dying.  In the first
> +	 * case, a replacement wb should already be available and we should
> +	 * refresh the wb immediately.  In the second case, trying to
> +	 * refresh will keep failing.
>  	 */
> -	if (unlikely(wb_dying(wbc->wb)))
> +	if (unlikely(wb_dying(wbc->wb) && !css_is_dying(wbc->wb->memcg_css)))
>  		inode_switch_wbs(inode, wbc->wb_id);
>  }
>  EXPORT_SYMBOL_GPL(wbc_attach_and_unlock_inode);

Acked-by: Dennis Zhou <dennis@kernel.org>

Thanks,
Dennis


* Re: [PATCH block/for-linus] cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
  2019-11-08 20:18 [PATCH block/for-linus] cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead Tejun Heo
  2019-11-08 20:33 ` Dennis Zhou
@ 2019-11-08 20:37 ` Jens Axboe
  2019-11-11 13:15 ` Michal Hocko
  2 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2019-11-08 20:37 UTC
  To: Tejun Heo
  Cc: linux-block, cgroups, kernel-team, Li Zefan, Johannes Weiner,
	Jan Kara, Konstantin Khlebnikov, Dennis Zhou

On 11/8/19 1:18 PM, Tejun Heo wrote:
> cgroup writeback tries to refresh the associated wb immediately if the
> current wb is dead.  This is to avoid continuing to issue IOs on the
> stale wb after the memcg - blkcg association has changed (i.e. when
> blkcg got disabled / enabled higher up in the hierarchy).
> 
> Unfortunately, the logic gets triggered spuriously on inodes which are
> associated with dead cgroups.  When the logic is triggered on dead
> cgroups, the attempt fails only after doing quite a bit of work
> allocating and initializing a new wb.
> 
> c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping has
> no dirty pages") alleviated the issue significantly, as the logic now
> only triggers when the inode has dirty pages.  However, the condition
> can still be triggered before the inode is switched to a different
> cgroup, and the logic simply doesn't make sense.
> 
> Skip the immediate switching if the associated memcg is dying.
> 
> This is a simplified version of the following two patches:
> 
>   * https://lore.kernel.org/linux-mm/20190513183053.GA73423@dennisz-mbp/
>   * http://lkml.kernel.org/r/156355839560.2063.5265687291430814589.stgit@buzz

Applied for 5.4, thanks.

-- 
Jens Axboe



* Re: [PATCH block/for-linus] cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
  2019-11-08 20:18 [PATCH block/for-linus] cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead Tejun Heo
  2019-11-08 20:33 ` Dennis Zhou
  2019-11-08 20:37 ` Jens Axboe
@ 2019-11-11 13:15 ` Michal Hocko
  2019-11-11 16:18   ` Tejun Heo
  2 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2019-11-11 13:15 UTC
  To: Tejun Heo
  Cc: Jens Axboe, linux-block, cgroups, kernel-team, Li Zefan,
	Johannes Weiner, Jan Kara, Konstantin Khlebnikov, Dennis Zhou

On Fri 08-11-19 12:18:29, Tejun Heo wrote:
> cgroup writeback tries to refresh the associated wb immediately if the
> current wb is dead.  This is to avoid continuing to issue IOs on the
> stale wb after the memcg - blkcg association has changed (i.e. when
> blkcg got disabled / enabled higher up in the hierarchy).
> 
> Unfortunately, the logic gets triggered spuriously on inodes which are
> associated with dead cgroups.  When the logic is triggered on dead
> cgroups, the attempt fails only after doing quite a bit of work
> allocating and initializing a new wb.
> 
> c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping has
> no dirty pages") alleviated the issue significantly, as the logic now
> only triggers when the inode has dirty pages.  However, the condition
> can still be triggered before the inode is switched to a different
> cgroup, and the logic simply doesn't make sense.
> 
> Skip the immediate switching if the associated memcg is dying.
> 
> This is a simplified version of the following two patches:
> 
>  * https://lore.kernel.org/linux-mm/20190513183053.GA73423@dennisz-mbp/
>  * http://lkml.kernel.org/r/156355839560.2063.5265687291430814589.stgit@buzz
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Dennis Zhou <dennis@kernel.org>
> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: e8a7abf5a5bd ("writeback: disassociate inodes from dying bdi_writebacks")

Is this a stable material?
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH block/for-linus] cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
  2019-11-11 13:15 ` Michal Hocko
@ 2019-11-11 16:18   ` Tejun Heo
  2019-11-11 16:34     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2019-11-11 16:18 UTC
  To: Greg Kroah-Hartman, Sasha Levin, stable
  Cc: Jens Axboe, linux-block, cgroups, kernel-team, Li Zefan,
	Johannes Weiner, Jan Kara, Konstantin Khlebnikov, Dennis Zhou,
	Michal Hocko

Hello, Michal.

On Mon, Nov 11, 2019 at 02:15:44PM +0100, Michal Hocko wrote:
> > Signed-off-by: Tejun Heo <tj@kernel.org>
> > Cc: Dennis Zhou <dennis@kernel.org>
> > Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > Fixes: e8a7abf5a5bd ("writeback: disassociate inodes from dying bdi_writebacks")
> 
> Is this a stable material?

c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping has
no dirty pages") likely addresses the larger part of the problem, but
yeah, it probably makes sense to backport both for -stable.

Greg, Sasha, can you pick the following two commits for -stable?

* c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping has
  no dirty pages")

* 65de03e25138 ("cgroup,writeback: don't switch wbs immediately on
  dead wbs if the memcg is dead")

Both are fixes for e8a7abf5a5bd ("writeback: disassociate inodes from
dying bdi_writebacks") - v4.2+.

Thanks.

-- 
tejun


* Re: [PATCH block/for-linus] cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
  2019-11-11 16:18   ` Tejun Heo
@ 2019-11-11 16:34     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 6+ messages in thread
From: Greg Kroah-Hartman @ 2019-11-11 16:34 UTC
  To: Tejun Heo
  Cc: Sasha Levin, stable, Jens Axboe, linux-block, cgroups,
	kernel-team, Li Zefan, Johannes Weiner, Jan Kara,
	Konstantin Khlebnikov, Dennis Zhou, Michal Hocko

On Mon, Nov 11, 2019 at 08:18:16AM -0800, Tejun Heo wrote:
> Hello, Michal.
> 
> On Mon, Nov 11, 2019 at 02:15:44PM +0100, Michal Hocko wrote:
> > > Signed-off-by: Tejun Heo <tj@kernel.org>
> > > Cc: Dennis Zhou <dennis@kernel.org>
> > > Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > > Fixes: e8a7abf5a5bd ("writeback: disassociate inodes from dying bdi_writebacks")
> > 
> > Is this a stable material?
> 
> c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping has
> no dirty pages") likely addresses the larger part of the problem, but
> yeah, it probably makes sense to backport both for -stable.
> 
> Greg, Sasha, can you pick the following two commits for -stable?
> 
> * c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping has
>   no dirty pages")
> 
> * 65de03e25138 ("cgroup,writeback: don't switch wbs immediately on
>   dead wbs if the memcg is dead")
> 
> Both are fixes for e8a7abf5a5bd ("writeback: disassociate inodes from
> dying bdi_writebacks") - v4.2+.

Now queued up, thanks.

greg k-h
