linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET block/for-linus] Assorted blkcg fixes
@ 2019-06-13 22:30 Tejun Heo
  2019-06-13 22:30 ` [PATCH 1/5] blk-iolatency: clear use_delay when io.latency is set to zero Tejun Heo
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Tejun Heo @ 2019-06-13 22:30 UTC (permalink / raw)
  To: axboe, jbacik; +Cc: linux-kernel, linux-block, kernel-team, dennis, jack

Hello,

This patchset contains the following blkcg fixes.

 0001-blk-iolatency-clear-use_delay-when-io.latency-is-set.patch
 0002-blkcg-update-blkcg_print_stat-to-handle-larger-outpu.patch
 0003-blkcg-perpcu_ref-init-exit-should-be-done-from-blkg_.patch
 0004-blkcg-blkcg_activate_policy-should-initialize-ancest.patch
 0005-blkcg-writeback-dead-memcgs-shouldn-t-contribute-to-.patch

Please refer to each patch's description for details.  Patchset is
also available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-blkcg-fixes

Thanks.  diffstat follows.

 block/blk-cgroup.c    |   24 ++++++++++++------------
 block/blk-iolatency.c |    4 +++-
 fs/fs-writeback.c     |    8 +++++++-
 3 files changed, 22 insertions(+), 14 deletions(-)

--
tejun


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/5] blk-iolatency: clear use_delay when io.latency is set to zero
  2019-06-13 22:30 [PATCHSET block/for-linus] Assorted blkcg fixes Tejun Heo
@ 2019-06-13 22:30 ` Tejun Heo
  2019-06-13 22:30 ` [PATCH 2/5] blkcg: update blkcg_print_stat() to handle larger outputs Tejun Heo
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2019-06-13 22:30 UTC (permalink / raw)
  To: axboe, jbacik
  Cc: linux-kernel, linux-block, kernel-team, dennis, jack, Tejun Heo, stable

If use_delay was non-zero when the latency target of a cgroup was set
to zero, it will stay stuck until io.latency is enabled on the cgroup
again.  This keeps readahead disabled for the cgroup impacting
performance negatively.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <jbacik@fb.com>
Fixes: d70675121546 ("block: introduce blk-iolatency io controller")
Cc: stable@vger.kernel.org # v4.19+
---
 block/blk-iolatency.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index d22e61bced86..17896bb3aaf2 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -778,8 +778,10 @@ static int iolatency_set_min_lat_nsec(struct blkcg_gq *blkg, u64 val)
 
 	if (!oldval && val)
 		return 1;
-	if (oldval && !val)
+	if (oldval && !val) {
+		blkcg_clear_delay(blkg);
 		return -1;
+	}
 	return 0;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/5] blkcg: update blkcg_print_stat() to handle larger outputs
  2019-06-13 22:30 [PATCHSET block/for-linus] Assorted blkcg fixes Tejun Heo
  2019-06-13 22:30 ` [PATCH 1/5] blk-iolatency: clear use_delay when io.latency is set to zero Tejun Heo
@ 2019-06-13 22:30 ` Tejun Heo
  2019-06-13 22:30 ` [PATCH 3/5] blkcg: perpcu_ref init/exit should be done from blkg_alloc/free() Tejun Heo
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2019-06-13 22:30 UTC (permalink / raw)
  To: axboe, jbacik
  Cc: linux-kernel, linux-block, kernel-team, dennis, jack, Tejun Heo, stable

Depending on the number of devices, blkcg stats can go over the
default seqfile buf size.  seqfile normally retries with a larger
buffer but since the ->pd_stat() addition, blkcg_print_stat() doesn't
tell seqfile that overflow has happened and the output gets printed
truncated.  Fix it by calling seq_commit() w/ -1 on possible
overflows.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 903d23f0a354 ("blk-cgroup: allow controllers to output their own stats")
Cc: stable@vger.kernel.org # v4.19+
Cc: Josef Bacik <jbacik@fb.com>
---
 block/blk-cgroup.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1f7127b03490..e4715b35d42c 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1006,8 +1006,12 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
 		}
 next:
 		if (has_stats) {
-			off += scnprintf(buf+off, size-off, "\n");
-			seq_commit(sf, off);
+			if (off < size - 1) {
+				off += scnprintf(buf+off, size-off, "\n");
+				seq_commit(sf, off);
+			} else {
+				seq_commit(sf, -1);
+			}
 		}
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/5] blkcg: perpcu_ref init/exit should be done from blkg_alloc/free()
  2019-06-13 22:30 [PATCHSET block/for-linus] Assorted blkcg fixes Tejun Heo
  2019-06-13 22:30 ` [PATCH 1/5] blk-iolatency: clear use_delay when io.latency is set to zero Tejun Heo
  2019-06-13 22:30 ` [PATCH 2/5] blkcg: update blkcg_print_stat() to handle larger outputs Tejun Heo
@ 2019-06-13 22:30 ` Tejun Heo
  2019-06-13 22:30 ` [PATCH 4/5] blkcg: blkcg_activate_policy() should initialize ancestors first Tejun Heo
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2019-06-13 22:30 UTC (permalink / raw)
  To: axboe, jbacik
  Cc: linux-kernel, linux-block, kernel-team, dennis, jack, Tejun Heo

blkg alloc is performed as a separate step from the rest of blkg
creation so that GFP_KERNEL allocations can be used when creating
blkgs from configuration file writes because otherwise user actions
may fail due to failures of opportunistic GFP_NOWAIT allocations.

While making blkgs use percpu_ref, 7fcf2b033b84 ("blkcg: change blkg
reference counting to use percpu_ref") incorrectly added unconditional
opportunistic percpu_ref_init() to blkg_create() breaking this
guarantee.

This patch moves percpu_ref_init() to blkg_alloc() so makes it use
@gfp_mask that blkg_alloc() is called with.  Also, percpu_ref_exit()
is moved to blkg_free() for consistency.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7fcf2b033b84 ("blkcg: change blkg reference counting to use percpu_ref")
Cc: Dennis Zhou <dennis@kernel.org>
---
 block/blk-cgroup.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index e4715b35d42c..04d286934c5e 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -79,6 +79,7 @@ static void blkg_free(struct blkcg_gq *blkg)
 
 	blkg_rwstat_exit(&blkg->stat_ios);
 	blkg_rwstat_exit(&blkg->stat_bytes);
+	percpu_ref_exit(&blkg->refcnt);
 	kfree(blkg);
 }
 
@@ -86,8 +87,6 @@ static void __blkg_release(struct rcu_head *rcu)
 {
 	struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head);
 
-	percpu_ref_exit(&blkg->refcnt);
-
 	/* release the blkcg and parent blkg refs this blkg has been holding */
 	css_put(&blkg->blkcg->css);
 	if (blkg->parent)
@@ -132,6 +131,9 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q,
 	if (!blkg)
 		return NULL;
 
+	if (percpu_ref_init(&blkg->refcnt, blkg_release, 0, gfp_mask))
+		goto err_free;
+
 	if (blkg_rwstat_init(&blkg->stat_bytes, gfp_mask) ||
 	    blkg_rwstat_init(&blkg->stat_ios, gfp_mask))
 		goto err_free;
@@ -244,11 +246,6 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 		blkg_get(blkg->parent);
 	}
 
-	ret = percpu_ref_init(&blkg->refcnt, blkg_release, 0,
-			      GFP_NOWAIT | __GFP_NOWARN);
-	if (ret)
-		goto err_cancel_ref;
-
 	/* invoke per-policy init */
 	for (i = 0; i < BLKCG_MAX_POLS; i++) {
 		struct blkcg_policy *pol = blkcg_policy[i];
@@ -281,8 +278,6 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 	blkg_put(blkg);
 	return ERR_PTR(ret);
 
-err_cancel_ref:
-	percpu_ref_exit(&blkg->refcnt);
 err_put_congested:
 	wb_congested_put(wb_congested);
 err_put_css:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/5] blkcg: blkcg_activate_policy() should initialize ancestors first
  2019-06-13 22:30 [PATCHSET block/for-linus] Assorted blkcg fixes Tejun Heo
                   ` (2 preceding siblings ...)
  2019-06-13 22:30 ` [PATCH 3/5] blkcg: perpcu_ref init/exit should be done from blkg_alloc/free() Tejun Heo
@ 2019-06-13 22:30 ` Tejun Heo
  2019-06-13 22:30 ` [PATCH 5/5] blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration Tejun Heo
  2019-06-15  7:40 ` [PATCHSET block/for-linus] Assorted blkcg fixes Jens Axboe
  5 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2019-06-13 22:30 UTC (permalink / raw)
  To: axboe, jbacik
  Cc: linux-kernel, linux-block, kernel-team, dennis, jack, Tejun Heo

When blkcg_activate_policy() is creating blkg_policy_data for existing
blkgs, it did in the wrong order - descendants first.  Fix it.  None
of the existing controllers seem affected by this.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 04d286934c5e..440797293235 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1390,7 +1390,8 @@ int blkcg_activate_policy(struct request_queue *q,
 
 	spin_lock_irq(&q->queue_lock);
 
-	list_for_each_entry(blkg, &q->blkg_list, q_node) {
+	/* blkg_list is pushed at the head, reverse walk to init parents first */
+	list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
 		struct blkg_policy_data *pd;
 
 		if (blkg->pd[pol->plid])
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/5] blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration
  2019-06-13 22:30 [PATCHSET block/for-linus] Assorted blkcg fixes Tejun Heo
                   ` (3 preceding siblings ...)
  2019-06-13 22:30 ` [PATCH 4/5] blkcg: blkcg_activate_policy() should initialize ancestors first Tejun Heo
@ 2019-06-13 22:30 ` Tejun Heo
  2019-06-19 11:20   ` Jan Kara
  2019-06-15  7:40 ` [PATCHSET block/for-linus] Assorted blkcg fixes Jens Axboe
  5 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2019-06-13 22:30 UTC (permalink / raw)
  To: axboe, jbacik
  Cc: linux-kernel, linux-block, kernel-team, dennis, jack, Tejun Heo

wbc_account_io() collects information on cgroup ownership of writeback
pages to determine which cgroup should own the inode.  Pages can stay
associated with dead memcgs but we want to avoid attributing IOs to
dead blkcgs as much as possible as the association is likely to be
stale.  However, currently, pages associated with dead memcgs
contribute to the accounting delaying and/or confusing the
arbitration.

Fix it by ignoring pages associated with dead memcgs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e41cbe8e81b9..9ebfb1b28430 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -715,6 +715,7 @@ void wbc_detach_inode(struct writeback_control *wbc)
 void wbc_account_io(struct writeback_control *wbc, struct page *page,
 		    size_t bytes)
 {
+	struct cgroup_subsys_state *css;
 	int id;
 
 	/*
@@ -726,7 +727,12 @@ void wbc_account_io(struct writeback_control *wbc, struct page *page,
 	if (!wbc->wb)
 		return;
 
-	id = mem_cgroup_css_from_page(page)->id;
+	css = mem_cgroup_css_from_page(page);
+	/* dead cgroups shouldn't contribute to inode ownership arbitration */
+	if (!(css->flags & CSS_ONLINE))
+		return;
+
+	id = css->id;
 
 	if (id == wbc->wb_id) {
 		wbc->wb_bytes += bytes;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCHSET block/for-linus] Assorted blkcg fixes
  2019-06-13 22:30 [PATCHSET block/for-linus] Assorted blkcg fixes Tejun Heo
                   ` (4 preceding siblings ...)
  2019-06-13 22:30 ` [PATCH 5/5] blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration Tejun Heo
@ 2019-06-15  7:40 ` Jens Axboe
  2019-06-15 15:50   ` Tejun Heo
  5 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2019-06-15  7:40 UTC (permalink / raw)
  To: Tejun Heo, jbacik; +Cc: linux-kernel, linux-block, kernel-team, dennis, jack

On 6/13/19 4:30 PM, Tejun Heo wrote:
> Hello,
> 
> This patchset contains the following blkcg fixes.
> 
>   0001-blk-iolatency-clear-use_delay-when-io.latency-is-set.patch
>   0002-blkcg-update-blkcg_print_stat-to-handle-larger-outpu.patch
>   0003-blkcg-perpcu_ref-init-exit-should-be-done-from-blkg_.patch
>   0004-blkcg-blkcg_activate_policy-should-initialize-ancest.patch
>   0005-blkcg-writeback-dead-memcgs-shouldn-t-contribute-to-.patch
> 
> Please refer to each patch's description for details.  Patchset is
> also available in the following git branch.
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-blkcg-fixes
> 
> Thanks.  diffstat follows.

Are you fine with these hitting 5.3?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHSET block/for-linus] Assorted blkcg fixes
  2019-06-15  7:40 ` [PATCHSET block/for-linus] Assorted blkcg fixes Jens Axboe
@ 2019-06-15 15:50   ` Tejun Heo
  2019-06-15 16:40     ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2019-06-15 15:50 UTC (permalink / raw)
  To: Jens Axboe; +Cc: jbacik, linux-kernel, linux-block, kernel-team, dennis, jack

Hello,

On Sat, Jun 15, 2019 at 01:40:50AM -0600, Jens Axboe wrote:
> > Please refer to each patch's description for details.  Patchset is
> > also available in the following git branch.
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-blkcg-fixes
> > 
> > Thanks.  diffstat follows.
> 
> Are you fine with these hitting 5.3?

Yeah, none of them are very urgent.  5.3 should be fine.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHSET block/for-linus] Assorted blkcg fixes
  2019-06-15 15:50   ` Tejun Heo
@ 2019-06-15 16:40     ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2019-06-15 16:40 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jbacik, linux-kernel, linux-block, kernel-team, dennis, jack

On 6/15/19 9:50 AM, Tejun Heo wrote:
> Hello,
> 
> On Sat, Jun 15, 2019 at 01:40:50AM -0600, Jens Axboe wrote:
>>> Please refer to each patch's description for details.  Patchset is
>>> also available in the following git branch.
>>>
>>>    git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-blkcg-fixes
>>>
>>> Thanks.  diffstat follows.
>>
>> Are you fine with these hitting 5.3?
> 
> Yeah, none of them are very urgent.  5.3 should be fine.

OK great, added for 5.3, thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 5/5] blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration
  2019-06-13 22:30 ` [PATCH 5/5] blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration Tejun Heo
@ 2019-06-19 11:20   ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2019-06-19 11:20 UTC (permalink / raw)
  To: Tejun Heo
  Cc: axboe, jbacik, linux-kernel, linux-block, kernel-team, dennis, jack

On Thu 13-06-19 15:30:41, Tejun Heo wrote:
> wbc_account_io() collects information on cgroup ownership of writeback
> pages to determine which cgroup should own the inode.  Pages can stay
> associated with dead memcgs but we want to avoid attributing IOs to
> dead blkcgs as much as possible as the association is likely to be
> stale.  However, currently, pages associated with dead memcgs
> contribute to the accounting delaying and/or confusing the
> arbitration.
> 
> Fix it by ignoring pages associated with dead memcgs.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Jan Kara <jack@suse.cz>

I see Jens has already pulled the changes so this is mostly informative but
the patch looks good to me.

								Honza
> ---
>  fs/fs-writeback.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index e41cbe8e81b9..9ebfb1b28430 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -715,6 +715,7 @@ void wbc_detach_inode(struct writeback_control *wbc)
>  void wbc_account_io(struct writeback_control *wbc, struct page *page,
>  		    size_t bytes)
>  {
> +	struct cgroup_subsys_state *css;
>  	int id;
>  
>  	/*
> @@ -726,7 +727,12 @@ void wbc_account_io(struct writeback_control *wbc, struct page *page,
>  	if (!wbc->wb)
>  		return;
>  
> -	id = mem_cgroup_css_from_page(page)->id;
> +	css = mem_cgroup_css_from_page(page);
> +	/* dead cgroups shouldn't contribute to inode ownership arbitration */
> +	if (!(css->flags & CSS_ONLINE))
> +		return;
> +
> +	id = css->id;
>  
>  	if (id == wbc->wb_id) {
>  		wbc->wb_bytes += bytes;
> -- 
> 2.17.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-06-19 11:20 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-13 22:30 [PATCHSET block/for-linus] Assorted blkcg fixes Tejun Heo
2019-06-13 22:30 ` [PATCH 1/5] blk-iolatency: clear use_delay when io.latency is set to zero Tejun Heo
2019-06-13 22:30 ` [PATCH 2/5] blkcg: update blkcg_print_stat() to handle larger outputs Tejun Heo
2019-06-13 22:30 ` [PATCH 3/5] blkcg: perpcu_ref init/exit should be done from blkg_alloc/free() Tejun Heo
2019-06-13 22:30 ` [PATCH 4/5] blkcg: blkcg_activate_policy() should initialize ancestors first Tejun Heo
2019-06-13 22:30 ` [PATCH 5/5] blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration Tejun Heo
2019-06-19 11:20   ` Jan Kara
2019-06-15  7:40 ` [PATCHSET block/for-linus] Assorted blkcg fixes Jens Axboe
2019-06-15 15:50   ` Tejun Heo
2019-06-15 16:40     ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).