All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4.6-rc] writeback: Fix performance regression in wb_over_bg_thresh()
@ 2016-05-05 19:51 Miklos Szeredi
  2016-05-05 21:43 ` Sedat Dilek
  2016-05-05 21:45 ` Jens Axboe
  0 siblings, 2 replies; 3+ messages in thread
From: Miklos Szeredi @ 2016-05-05 19:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Howard Cochran, Jakob Unterwurzacher, Tejun Heo, Ashish Sangwan,
	Sedat Dilek, Linux-Fsdevel, Kernel Mailing List

Hi Jens,

This fix seems to have been missed; it should go into v4.6.

Please apply.

Thanks,
Miklos


From: Howard Cochran <hcochran@kernelspring.com>
Subject: writeback: Fix performance regression in wb_over_bg_thresh()
Date: Thu, 10 Mar 2016 01:12:39 -0500

Commit 947e9762a8dd ("writeback: update wb_over_bg_thresh() to use
wb_domain aware operations") unintentionally changed this function's
meaning from "are there more dirty pages than the background writeback
threshold" to "are there more dirty pages than the writeback threshold".
The background writeback threshold is typically half of the writeback
threshold, so this had the effect of raising the number of dirty pages
required to cause a writeback worker to perform background writeout.

This can cause a very severe performance regression when a BDI uses
BDI_CAP_STRICTLIMIT because balance_dirty_pages() and the writeback worker
can now disagree on whether writeback should be initiated.

For example, in a system having 1GB of RAM, a single spinning disk, and a
"pass-through" FUSE filesystem mounted over the disk, application code
mmapped a 128MB file on the disk and was randomly dirtying pages in that
mapping.

Because FUSE uses strictlimit and has a default max_ratio of only 1%, in
balance_dirty_pages, thresh is ~200, bg_thresh is ~100, and the
dirty_freerun_ceiling is the average of those, ~150. So, it pauses the
dirtying processes when we have 151 dirty pages and wakes up a background
writeback worker. But the worker tests the wrong threshold (200 instead of
100), so it does not initiate writeback and just returns.

Thus, balance_dirty_pages keeps looping, sleeping and then waking up the
worker who will do nothing. It remains stuck in this state until the few
dirty pages that we have finally expire and we write them back for that
reason. Then the whole process repeats, resulting in near-zero throughput
through the FUSE BDI.

The fix is to call the parameterized variant of wb_calc_thresh, so that the
worker will do writeback if the bg_thresh is exceeded which was the
behavior before the referenced commit.

Fixes: 947e9762a8dd ("writeback: update wb_over_bg_thresh() to use wb_domain aware operations")
Signed-off-by: Howard Cochran <hcochran@kernelspring.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> # v4.2+
---
 mm/page-writeback.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1910,7 +1910,8 @@ bool wb_over_bg_thresh(struct bdi_writeb
 	if (gdtc->dirty > gdtc->bg_thresh)
 		return true;
 
-	if (wb_stat(wb, WB_RECLAIMABLE) > __wb_calc_thresh(gdtc))
+	if (wb_stat(wb, WB_RECLAIMABLE) >
+	    wb_calc_thresh(gdtc->wb, gdtc->bg_thresh))
 		return true;
 
 	if (mdtc) {
@@ -1924,7 +1925,8 @@ bool wb_over_bg_thresh(struct bdi_writeb
 		if (mdtc->dirty > mdtc->bg_thresh)
 			return true;
 
-		if (wb_stat(wb, WB_RECLAIMABLE) > __wb_calc_thresh(mdtc))
+		if (wb_stat(wb, WB_RECLAIMABLE) >
+		    wb_calc_thresh(mdtc->wb, mdtc->bg_thresh))
 			return true;
 	}
 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v4.6-rc] writeback: Fix performance regression in wb_over_bg_thresh()
  2016-05-05 19:51 [PATCH v4.6-rc] writeback: Fix performance regression in wb_over_bg_thresh() Miklos Szeredi
@ 2016-05-05 21:43 ` Sedat Dilek
  2016-05-05 21:45 ` Jens Axboe
  1 sibling, 0 replies; 3+ messages in thread
From: Sedat Dilek @ 2016-05-05 21:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Howard Cochran, Jakob Unterwurzacher, Tejun Heo,
	Ashish Sangwan, Linux-Fsdevel, Kernel Mailing List

On 5/5/16, Miklos Szeredi <miklos@szeredi.hu> wrote:
> Hi Jens,
>
> This fix seems to have been missed; it should go into v4.6.
>
> Please apply.
>
> Thanks,
> Miklos
>
>
> From: Howard Cochran <hcochran@kernelspring.com>
> Subject: writeback: Fix performance regression in wb_over_bg_thresh()
> Date: Thu, 10 Mar 2016 01:12:39 -0500
>
> Commit 947e9762a8dd ("writeback: update wb_over_bg_thresh() to use
> wb_domain aware operations") unintentionally changed this function's
> meaning from "are there more dirty pages than the background writeback
> threshold" to "are there more dirty pages than the writeback threshold".
> The background writeback threshold is typically half of the writeback
> threshold, so this had the effect of raising the number of dirty pages
> required to cause a writeback worker to perform background writeout.
>
> This can cause a very severe performance regression when a BDI uses
> BDI_CAP_STRICTLIMIT because balance_dirty_pages() and the writeback worker
> can now disagree on whether writeback should be initiated.
>
> For example, in a system having 1GB of RAM, a single spinning disk, and a
> "pass-through" FUSE filesystem mounted over the disk, application code
> mmapped a 128MB file on the disk and was randomly dirtying pages in that
> mapping.
>
> Because FUSE uses strictlimit and has a default max_ratio of only 1%, in
> balance_dirty_pages, thresh is ~200, bg_thresh is ~100, and the
> dirty_freerun_ceiling is the average of those, ~150. So, it pauses the
> dirtying processes when we have 151 dirty pages and wakes up a background
> writeback worker. But the worker tests the wrong threshold (200 instead of
> 100), so it does not initiate writeback and just returns.
>
> Thus, balance_dirty_pages keeps looping, sleeping and then waking up the
> worker who will do nothing. It remains stuck in this state until the few
> dirty pages that we have finally expire and we write them back for that
> reason. Then the whole process repeats, resulting in near-zero throughput
> through the FUSE BDI.
>
> The fix is to call the parameterized variant of wb_calc_thresh, so that the
> worker will do writeback if the bg_thresh is exceeded which was the
> behavior before the referenced commit.
>
> Fixes: 947e9762a8dd ("writeback: update wb_over_bg_thresh() to use wb_domain
> aware operations")
> Signed-off-by: Howard Cochran <hcochran@kernelspring.com>
> Acked-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> Cc: <stable@vger.kernel.org> # v4.2+

Fell free to add my...

      Tested-by Sedat Dilek <sedat.dilek@gmail.com>

- sed@ -

> ---
>  mm/page-writeback.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1910,7 +1910,8 @@ bool wb_over_bg_thresh(struct bdi_writeb
>  	if (gdtc->dirty > gdtc->bg_thresh)
>  		return true;
>
> -	if (wb_stat(wb, WB_RECLAIMABLE) > __wb_calc_thresh(gdtc))
> +	if (wb_stat(wb, WB_RECLAIMABLE) >
> +	    wb_calc_thresh(gdtc->wb, gdtc->bg_thresh))
>  		return true;
>
>  	if (mdtc) {
> @@ -1924,7 +1925,8 @@ bool wb_over_bg_thresh(struct bdi_writeb
>  		if (mdtc->dirty > mdtc->bg_thresh)
>  			return true;
>
> -		if (wb_stat(wb, WB_RECLAIMABLE) > __wb_calc_thresh(mdtc))
> +		if (wb_stat(wb, WB_RECLAIMABLE) >
> +		    wb_calc_thresh(mdtc->wb, mdtc->bg_thresh))
>  			return true;
>  	}
>
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v4.6-rc] writeback: Fix performance regression in wb_over_bg_thresh()
  2016-05-05 19:51 [PATCH v4.6-rc] writeback: Fix performance regression in wb_over_bg_thresh() Miklos Szeredi
  2016-05-05 21:43 ` Sedat Dilek
@ 2016-05-05 21:45 ` Jens Axboe
  1 sibling, 0 replies; 3+ messages in thread
From: Jens Axboe @ 2016-05-05 21:45 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Howard Cochran, Jakob Unterwurzacher, Tejun Heo, Ashish Sangwan,
	Sedat Dilek, Linux-Fsdevel, Kernel Mailing List

On 05/05/2016 01:51 PM, Miklos Szeredi wrote:
> Hi Jens,
>
> This fix seems to have been missed; it should go into v4.6.

Indeed. Thanks, I have queued it up.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-05-05 21:45 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-05 19:51 [PATCH v4.6-rc] writeback: Fix performance regression in wb_over_bg_thresh() Miklos Szeredi
2016-05-05 21:43 ` Sedat Dilek
2016-05-05 21:45 ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.