* Re: Interactivity regression since v3.11 in mm/vmscan.c
@ 2014-06-07 12:35 zhdxzx
  2014-06-07 15:23   ` Felipe Contreras
  2014-06-07 18:24   ` Linus Torvalds
  0 siblings, 2 replies; 33+ messages in thread
From: zhdxzx @ 2014-06-07 12:35 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: Michal Hocko, linux-kernel, linux-mm, Andrew Morton,
	Linus Torvalds, Mel Gorman, dhillf, hillf.zj

----- Original Message -----
From: Felipe Contreras <felipe.contreras@gmail.com>

>On Fri, Jun 6, 2014 at 4:58 AM,  <zhdxzx@sina.com> wrote:
>> Alternatively can we try wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10) ?
>>
> I see the same problem with that code.
>
The comments around the congestion_wait,
[1]
	 *
	 * Once a zone is flagged ZONE_WRITEBACK, kswapd will count the number
	 * of pages under pages flagged for immediate reclaim and stall if any
	 * are encountered in the nr_immediate check below.
	 */
	if (nr_writeback && nr_writeback == nr_taken)
		zone_set_flag(zone, ZONE_WRITEBACK);


[2]
		/*
		 * If dirty pages are scanned that are not queued for IO, it
		 * implies that flushers are not keeping up. In this case, flag
		 * the zone ZONE_TAIL_LRU_DIRTY and kswapd will start writing
		 * pages from reclaim context. It will forcibly stall in the
		 * next check.
		 */
		if (nr_unqueued_dirty == nr_taken)
			zone_set_flag(zone, ZONE_TAIL_LRU_DIRTY);

The "force stall" in [2] conflicts with "start writing pages" in [2], and
conflicts with "nr_immediate check below" in [1] as well, IIUC.

Would you please try again based only on comment [1] (based on v3.15-rc8)?
thanks
Hillf

--- a/mm/vmscan.c	Sat Jun  7 18:38:08 2014
+++ b/mm/vmscan.c	Sat Jun  7 20:08:36 2014
@@ -1566,7 +1566,7 @@ shrink_inactive_list(unsigned long nr_to
 		 * implies that pages are cycling through the LRU faster than
 		 * they are written so also forcibly stall.
 		 */
-		if (nr_unqueued_dirty == nr_taken || nr_immediate)
+		if (nr_immediate)
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
 	}
 
--
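
For readers following the patch, the change in the stall condition can be
sketched in plain userspace C. This is only a model, not kernel code: the
struct and function names are invented for illustration, and only the
counter names (nr_taken, nr_unqueued_dirty, nr_immediate) come from the
diff above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-batch counters, named after the variables in the diff. */
struct batch_counts {
	unsigned long nr_taken;          /* pages isolated from the LRU */
	unsigned long nr_unqueued_dirty; /* dirty pages not queued for IO */
	unsigned long nr_immediate;      /* writeback pages flagged for immediate reclaim */
};

/* Condition before the patch: also stall when the whole batch is unqueued dirty. */
static bool stalls_before_patch(const struct batch_counts *c)
{
	/* A batch cannot contain more unqueued dirty pages than it took. */
	assert(c->nr_unqueued_dirty <= c->nr_taken);
	return c->nr_unqueued_dirty == c->nr_taken || c->nr_immediate != 0;
}

/* Condition after the patch: stall only on the nr_immediate case. */
static bool stalls_after_patch(const struct batch_counts *c)
{
	return c->nr_immediate != 0;
}
```

For a batch with nr_taken == nr_unqueued_dirty == 32 and nr_immediate == 0,
stalls_before_patch() returns true while stalls_after_patch() returns false,
which is the case the patch stops waiting on.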


* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-07 12:35 Interactivity regression since v3.11 in mm/vmscan.c zhdxzx
@ 2014-06-07 15:23   ` Felipe Contreras
  2014-06-07 18:24   ` Linus Torvalds
  1 sibling, 0 replies; 33+ messages in thread
From: Felipe Contreras @ 2014-06-07 15:23 UTC (permalink / raw)
  To: zhdxzx
  Cc: Michal Hocko, linux-kernel, linux-mm, Andrew Morton,
	Linus Torvalds, Mel Gorman, dhillf, hillf.zj

On Sat, Jun 7, 2014 at 7:35 AM,  <zhdxzx@sina.com> wrote:

> Would you please try again based only on comment [1](based on v3.15-rc8)?
> thanks
> Hillf
>
> --- a/mm/vmscan.c       Sat Jun  7 18:38:08 2014
> +++ b/mm/vmscan.c       Sat Jun  7 20:08:36 2014
> @@ -1566,7 +1566,7 @@ shrink_inactive_list(unsigned long nr_to
>                  * implies that pages are cycling through the LRU faster than
>                  * they are written so also forcibly stall.
>                  */
> -               if (nr_unqueued_dirty == nr_taken || nr_immediate)
> +               if (nr_immediate)
>                         congestion_wait(BLK_RW_ASYNC, HZ/10);
>         }

That actually seems to work correctly on v3.15-rc8.

-- 
Felipe Contreras


* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-07 12:35 Interactivity regression since v3.11 in mm/vmscan.c zhdxzx
@ 2014-06-07 18:24   ` Linus Torvalds
  2014-06-07 18:24   ` Linus Torvalds
  1 sibling, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2014-06-07 18:24 UTC (permalink / raw)
  To: zhdxzx
  Cc: Felipe Contreras, Michal Hocko, linux-kernel, linux-mm,
	Andrew Morton, Mel Gorman, dhillf, hillf.zj

So we very recently (as in this merge window) merged a change to this
very area, but that change was very specific to one case.

Hillf's patch (below) apparently fixes the problem Felipe sees, and I
have to say, his problem sounds a *lot* like the kind of horrible
performance I've seen with writing to USB devices. I blamed
non-working per-bdi throttling, but this implies it is more generic
than that. The fact that the very same code also made nfsd very
unhappy makes me think that the code is just fundamentally broken.

And quite frankly, the whole logic is a bit questionable. That

  "nr_unqueued_dirty == nr_taken"

test is claimed to be "implies that flushers are not keeping up", but
that's not actually true at all. It just means that

 (a) all the pages we isolated are dirty
 (b) .. and none of them are under writeback

and it's very possible that none of them are under writeback because
nobody has even decided to start writeback on them yet, because nobody
has even walked through the list yet, so they were all still marked as
referenced. I guess you could say that "flushers are not keeping up",
but *we're* one of the flushers, and it's not that we aren't keeping
up, it's that we haven't even scanned things yet.
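
Points (a) and (b) can be made concrete with a small userspace model. It is
a sketch under assumptions: page_model, scan_result, scan_batch() and
old_test_fires() are invented names, and the per-page state is reduced to
two booleans. It only illustrates that the counters cannot tell "flushers
are behind" apart from "nobody has started writeback yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the per-page state the scan looks at. */
struct page_model {
	bool dirty;
	bool writeback;
};

struct scan_result {
	size_t nr_taken;
	size_t nr_unqueued_dirty;
	size_t nr_writeback;
};

/* Tally one isolated batch the way the thread describes the counters. */
static struct scan_result scan_batch(const struct page_model *pages, size_t n)
{
	struct scan_result r = { .nr_taken = n };

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (pages[i].dirty && !pages[i].writeback)
			r.nr_unqueued_dirty++;   /* dirty, nothing queued yet */
		if (pages[i].writeback)
			r.nr_writeback++;
	}
	return r;
}

/* The test under discussion: every isolated page is dirty and unqueued. */
static bool old_test_fires(const struct scan_result *r)
{
	return r->nr_unqueued_dirty == r->nr_taken;
}
```

A batch of pages that were simply dirtied and never looked at (dirty = true,
writeback = false for all of them) makes old_test_fires() return true, even
though no flusher has had a chance to fall behind.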

So what do we do when we haven't scanned the list enough to see any
non-referenced pages? Do we scan it a bit more? No. We decide to
congestion-wait.

That sounds completely and utterly stupid and broken. Does it make any
sense at all? No it doesn't. It just seems to delay starting any
writeback at all.

I suspect the code comes from "let's not spend too much time scanning
the dirty lists when everything is dirty", and is trying to avoid CPU
use. But what it seems to do is actually to avoid even starting
writeback in the first place, and just "congestion-waiting" even when
nothing is being written back (here "nothing" is not absolute - we're
only looking at a part of the dirty pages, obviously, but we're
looking at the *old* dirty pages, so it's a fairly important part of
it).

So I really get the feeling that this code is broken, and that the
patch to remove that "nr_unqueued_dirty == nr_taken" is correct.

In particular, doesn't that congestion wait - which is supposed to
wait for kswapd - end up waiting even when the process in question
*is* kswapd?

So it's not just processes like nfsd that got throttled down (which no
longer happens because of the recent commit 399ba0b95670), it seems
like kswapd itself gets throttled down because of this test.

So at the *very* least I feel like the new current_may_throttle()
needs to say that "kswapd must not be throttled", but I wonder if that
whole thing just needs to go.

And maybe that recent commit 399ba0b95670 is actually broken, and
wanted to fix just this part too. Maybe it *should* wait for the
"nr_immediate" case, which is the one that is currently aimed at
*only* throttling down kswapd itself. Maybe we should remove the
"current_is_kswapd()" test in the nr_immediate code instead, and make
everybody throttle when they hit the actual _real_ congestion case of
the whole zone being under writeback?

Comments? Mel, this code is mostly attributed to you, I'd like to hear
what you think in particular.

                    Linus

On Sat, Jun 7, 2014 at 5:35 AM,  <zhdxzx@sina.com> wrote:
>
> The comments around the congestion_wait,
> [1]
>          *
>          * Once a zone is flagged ZONE_WRITEBACK, kswapd will count the number
>          * of pages under pages flagged for immediate reclaim and stall if any
>          * are encountered in the nr_immediate check below.
>          */
>         if (nr_writeback && nr_writeback == nr_taken)
>                 zone_set_flag(zone, ZONE_WRITEBACK);
>
>
> [2]
>                 /*
>                  * If dirty pages are scanned that are not queued for IO, it
>                  * implies that flushers are not keeping up. In this case, flag
>                  * the zone ZONE_TAIL_LRU_DIRTY and kswapd will start writing
>                  * pages from reclaim context. It will forcibly stall in the
>                  * next check.
>                  */
>                 if (nr_unqueued_dirty == nr_taken)
>                         zone_set_flag(zone, ZONE_TAIL_LRU_DIRTY);
>
> The "force stall" in [2] conflicts with "start writing pages" in [2], and
> conflicts with "nr_immediate check below" in [1] as well, IIUC.
>
> Would you please try again based only on comment [1](based on v3.15-rc8)?
> thanks
> Hillf
>
> --- a/mm/vmscan.c       Sat Jun  7 18:38:08 2014
> +++ b/mm/vmscan.c       Sat Jun  7 20:08:36 2014
> @@ -1566,7 +1566,7 @@ shrink_inactive_list(unsigned long nr_to
>                  * implies that pages are cycling through the LRU faster than
>                  * they are written so also forcibly stall.
>                  */
> -               if (nr_unqueued_dirty == nr_taken || nr_immediate)
> +               if (nr_immediate)
>                         congestion_wait(BLK_RW_ASYNC, HZ/10);
>         }
>
> --


* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-07 18:24   ` Linus Torvalds
@ 2014-06-08 21:33     ` Linus Torvalds
  -1 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2014-06-08 21:33 UTC (permalink / raw)
  To: zhdxzx
  Cc: Felipe Contreras, Michal Hocko, linux-kernel, linux-mm,
	Andrew Morton, Mel Gorman, dhillf, hillf.zj

On Sat, Jun 7, 2014 at 11:24 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Comments? Mel, this code is mostly attributed to you, I'd like to hear
> what you think in particular.

In the meantime, I've removed the "nr_unqueued_dirty == nr_taken"
check for congestion_wait(), since I can't see how it can possibly be
sensible, and Felipe confirmed that it fixes his interactivity issue.

Nobody commented on it, but let's see if we get reactions to the
behavior changing..

             Linus


* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-07 18:24   ` Linus Torvalds
@ 2014-06-09 12:58     ` Mel Gorman
  -1 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2014-06-09 12:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: zhdxzx, Felipe Contreras, Michal Hocko, linux-kernel, linux-mm,
	Andrew Morton, dhillf, hillf.zj

On Sat, Jun 07, 2014 at 11:24:56AM -0700, Linus Torvalds wrote:
> So we very recently (as in this merge window) merged a change to this
> very area, but that change was very specific to one case.
> 
> Hillf's patch (below) apparently fixes the problem Felipe sees, and I
> have to say, his problem sounds a *lot* like the kind of horrible
> performance I've seen with writing to USB devices. I blamed
> non-working per-bdi throttling, but this implies it is more generic
> than that. The fact that the very same code also made nfsd very
> unhappy makes me think that the code is just fundamentally broken.
> 
> And quite frankly, the whole logic is a bit questionable. That
> 
>   "nr_unqueued_dirty == nr_taken"
> 
> test is claimed to be "implies that flushers are not keeping up", but
> that's not actually true at all. It just means that
> 
>  (a) all the pages we isolated are dirty
>  (b) .. and none of them are under writeback
> 
> and it's very possible that none of them are under writeback because
> nobody has even decided to start writeback on them yet, because nobody
> has even walked through the list yet, so they were all still marked as
> referenced. I guess you could say that "flushers are not keeping up",
> but *we're* one of the flushers, and it's not that we aren't keeping
> up, it's that we haven't even scanned things yet.
> 
> So what do we do when we haven't scanned the list enough to see any
> non-referenced pages? Do we scan it a bit more? No. We decide to
> congestion-wait.
> 
> That sounds completely and utterly stupid and broken. Does it make any
> sense at all? No it doesn't. It just seems to delay starting any
> writeback at all.
> 

The original intent was moving away from direct reclaimers and kswapd just
blocking on congestion for the sake of it and avoiding excessive swapping
during IO. That was not a smooth road.

> I suspect the code comes from "let's not spend too much time scanning
> the dirty lists when everything is dirty", and is trying to avoid CPU
> use.

Yes. At the time we moved away from calling congestion_wait() for all
sorts of reasons, there were a number of bugs with different root causes
that all looked like kswapd using 99% of CPU during heavy IO.

> But what it seems to do is actually to avoid even starting
> writeback in the first place, and just "congestion-waiting" even when
> nothing is being written back (here "nothing" is not absolute - we're
> only looking at a part of the dirty pages, obviously, but we're
> looking at the *old* dirty pages, so it's a fairly important part of
> it).
> 
> So I really get the feeling that this code is broken, and that the
> patch to remove that "nr_unqueued_dirty == nr_taken" is correct.
> 

I cannot think of a reason to disagree with that. It was a mistake because
it also failed to take into account that writeback might have been delayed
because the dirty expire limit had not been reached.

> In particular, doesn't that congestion wait - which is supposed to
> wait for kswapd - end up waiting even when the process in question
> *is* kswapd?
> 
> So it's not just processes like nfsd that got throttled down (which no
> longer happens because of the recent commit 399ba0b95670), it seems
> like kswapd itself gets throttled down because of this test.
> 
> So at the *very* least I feel like the new current_may_throttle()
> needs to say that "kswapd must not be throttled", but I wonder if that
> whole thing just needs to go.
> 
> And maybe that recent commit 399ba0b95670 is actually broken, and
> wanted to fix just this part too. Maybe it *should* wait for the
> "nr_immediate" case, which is the one that is currently aimed at
> *only* throttling down kswapd itself. Maybe we should remove the
> "current_is_kswapd()" test in the nr_immediate code instead, and make
> everybody throttle when they hit the actual _real_ congestion case of
> the whole zone being under writeback?
> 
> Comments? Mel, this code is mostly attributed to you, I'd like to hear
> what you think in particular.
> 

I've no problem with your patch, so let's go with it, with the caveat that
there are three bugs to watch out for. The first is excessive CPU usage
during reclaim by direct reclaimers or kswapd, which should still be
controlled but is worth watching for anyway. The second is excessive
writeback from kswapd context resulting in poor IO efficiency; the
performance impact is harder to measure, but it shows up as high counts
for nr_vmscan_write. The third is excessive swapping during IO: the file
LRUs are scanned with mostly dirty pages and the reclaimer swaps anonymous
pages instead, which looks like interactivity stalls due to swapping
during heavy IO. I can keep an eye out for all three when my regression
tests pick up 3.16-rc1.
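
One way to watch the second and third symptoms is to sample counters from
/proc/vmstat before and after a heavy copy. The sketch below is an
assumption-laden helper, not anything from the regression tests mentioned
above: vmstat_lookup() and vmstat_read() are invented names, and pswpout is
a counter picked here for the swapping case (only nr_vmscan_write is named
in this message).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in text formatted like /proc/vmstat
 * ("name value" per line). Returns 0 on success, -1 if not found.
 */
static int vmstat_lookup(const char *text, const char *name,
			 unsigned long long *out)
{
	size_t name_len;
	const char *line = text;

	assert(text != NULL && name != NULL && out != NULL);
	name_len = strlen(name);

	while (line && *line) {
		/* Match the full name only, so "nr_vmscan" does not match
		 * "nr_vmscan_write". */
		if (strncmp(line, name, name_len) == 0 && line[name_len] == ' ')
			return sscanf(line + name_len, "%llu", out) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Read the live file into buf (NUL-terminated); returns 0 on success. */
static int vmstat_read(char *buf, size_t size)
{
	FILE *f = fopen("/proc/vmstat", "r");
	size_t n;

	if (!f || size == 0) {
		if (f)
			fclose(f);
		return -1;
	}
	n = fread(buf, 1, size - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}
```

Calling vmstat_read() twice around the workload and comparing the
nr_vmscan_write and pswpout values from vmstat_lookup() gives a rough delta
for each symptom; CPU usage of kswapd has to be watched separately.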

Thanks.

-- 
Mel Gorman
SUSE Labs


* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-09  7:53             ` Michal Hocko
@ 2014-06-09 10:02               ` Felipe Contreras
  -1 siblings, 0 replies; 33+ messages in thread
From: Felipe Contreras @ 2014-06-09 10:02 UTC (permalink / raw)
  To: Michal Hocko, Felipe Contreras
  Cc: Linux Kernel Mailing List, linux-mm, Andrew Morton,
	Linus Torvalds, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

Michal Hocko wrote:
> On Fri 06-06-14 18:11:14, Felipe Contreras wrote:
> > On Fri, Jun 6, 2014 at 5:33 AM, Felipe Contreras
> > <felipe.contreras@gmail.com> wrote:
> > > On Fri, Jun 6, 2014 at 4:16 AM, Michal Hocko <mhocko@suse.cz> wrote:
> > >
> > >> Mel has a nice systemtap script (attached) to watch for stalls. Maybe
> > >> you can give it a try?
> > >
> > > Is there any special configurations I should enable?
> > >
> > > I get this:
> > > semantic error: unresolved arity-1 global array name, missing global
> > > declaration?: identifier 'name' at /tmp/stapd6pu9A:4:2
> > >         source: name[t]=execname()
> > >                 ^
> > >
> > > Pass 2: analysis failed.  [man error::pass2]
> > > Number of similar error messages suppressed: 71.
> > > Rerun with -v to see them.
> > > Unexpected exit of STAP script at
> > > /home/felipec/Downloads/watch-dstate-new.pl line 320.
> > 
> > Actually I debugged the problem, and it's that the format of the
> > script is DOS, not UNIX. After changing the format the script works.
> 
> Oops, I've downloaded it from our bugzilla so maybe it just did some
> tricks with the script.
> 
> > However, it's not returning anything. It's running, but doesn't seem
> > to find any stalls.
> 
> Interesting. It was quite good at pointing at stalls. How are you
> measuring those stalls during your testing?

I'm not measuring them; I simply grab a GUI window and move it around
while the big file is being copied. When the issue happens, the window
stops moving, along with the mouse: everything hangs for a time, then it
resumes, then hangs again... the interactivity is bad.
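
A rough way to put a number on this kind of hang, offered only as a sketch
and not something used in this thread, is a userspace probe that sleeps for
a short interval and records how late each wakeup is. The function names
are invented, and a late wakeup is only a proxy for the GUI stalls
described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Sleep for interval_ms and return how many ms late the wakeup was. */
static long wakeup_delay_ms(long interval_ms)
{
	struct timespec req, before, after;
	long long elapsed_ns;

	assert(interval_ms >= 0);
	req.tv_sec = interval_ms / 1000;
	req.tv_nsec = (interval_ms % 1000) * 1000000L;

	clock_gettime(CLOCK_MONOTONIC, &before);
	nanosleep(&req, NULL);
	clock_gettime(CLOCK_MONOTONIC, &after);

	elapsed_ns = (long long)(after.tv_sec - before.tv_sec) * 1000000000LL
		   + (after.tv_nsec - before.tv_nsec);
	return (long)(elapsed_ns / 1000000LL) - interval_ms;
}

/* Run `rounds` probes and return the worst delay seen, in ms. */
static long worst_delay_ms(long interval_ms, int rounds)
{
	long worst = 0;

	for (int i = 0; i < rounds; i++) {
		long d = wakeup_delay_ms(interval_ms);

		if (d > worst)
			worst = d;
	}
	return worst;
}
```

Running worst_delay_ms(10, 1000) while the big file is being copied, and
again on an idle system, would show whether wakeups are being held up for
hundreds of milliseconds during the copy.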

-- 
Felipe Contreras


* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-06 23:11           ` Felipe Contreras
@ 2014-06-09  7:53             ` Michal Hocko
  -1 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2014-06-09  7:53 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: Linux Kernel Mailing List, linux-mm, Andrew Morton,
	Linus Torvalds, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

On Fri 06-06-14 18:11:14, Felipe Contreras wrote:
> On Fri, Jun 6, 2014 at 5:33 AM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
> > On Fri, Jun 6, 2014 at 4:16 AM, Michal Hocko <mhocko@suse.cz> wrote:
> >
> >> Mel has a nice systemtap script (attached) to watch for stalls. Maybe
> >> you can give it a try?
> >
> > Are there any special configurations I should enable?
> >
> > I get this:
> > semantic error: unresolved arity-1 global array name, missing global
> > declaration?: identifier 'name' at /tmp/stapd6pu9A:4:2
> >         source: name[t]=execname()
> >                 ^
> >
> > Pass 2: analysis failed.  [man error::pass2]
> > Number of similar error messages suppressed: 71.
> > Rerun with -v to see them.
> > Unexpected exit of STAP script at
> > /home/felipec/Downloads/watch-dstate-new.pl line 320.
> 
> Actually I debugged the problem, and it's that the format of the
> script is DOS, not UNIX. After changing the format the script works.

Ups, I've downloaded it from our bugzilla so maybe it just did some
tricks with the script.

> However, it's not returning anything. It's running, but doesn't seem
> to find any stalls.

Interesting. It was quite good at pointing at stalls. How are you
measuring those stalls during your testing?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-06 10:33         ` Felipe Contreras
@ 2014-06-06 23:11           ` Felipe Contreras
  -1 siblings, 0 replies; 33+ messages in thread
From: Felipe Contreras @ 2014-06-06 23:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux Kernel Mailing List, linux-mm, Andrew Morton,
	Linus Torvalds, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

On Fri, Jun 6, 2014 at 5:33 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Fri, Jun 6, 2014 at 4:16 AM, Michal Hocko <mhocko@suse.cz> wrote:
>
>> Mel has a nice systemtap script (attached) to watch for stalls. Maybe
>> you can give it a try?
>
> Are there any special configurations I should enable?
>
> I get this:
> semantic error: unresolved arity-1 global array name, missing global
> declaration?: identifier 'name' at /tmp/stapd6pu9A:4:2
>         source: name[t]=execname()
>                 ^
>
> Pass 2: analysis failed.  [man error::pass2]
> Number of similar error messages suppressed: 71.
> Rerun with -v to see them.
> Unexpected exit of STAP script at
> /home/felipec/Downloads/watch-dstate-new.pl line 320.

Actually I debugged the problem, and it's that the format of the
script is DOS, not UNIX. After changing the format the script works.
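
The format change above is usually done with a tool such as dos2unix. As
a minimal sketch of what the conversion amounts to (the function name
crlf_to_lf is made up for illustration and is not part of the script):

```c
#include <stddef.h>

/*
 * Convert DOS (CRLF) line endings to UNIX (LF) in place.
 * Only a '\r' immediately followed by '\n' is dropped; a lone '\r' is kept.
 * Returns the new length of the data in buf.
 */
size_t crlf_to_lf(char *buf, size_t len)
{
	size_t in, out = 0;

	for (in = 0; in < len; in++) {
		if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
			continue;	/* skip the CR of a CRLF pair */
		buf[out++] = buf[in];
	}
	return out;
}
```

The DOS format probably matters here because the script looks for a line
that is exactly "__END__" after chomp(); with CRLF endings the line still
carries a trailing '\r', so the framework part holding the global
declarations would never be copied into the generated stap script.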

However, it's not returning anything. It's running, but doesn't seem
to find any stalls.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-06 11:03           ` Michal Hocko
@ 2014-06-06 12:54             ` Felipe Contreras
  -1 siblings, 0 replies; 33+ messages in thread
From: Felipe Contreras @ 2014-06-06 12:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux Kernel Mailing List, linux-mm, Andrew Morton,
	Linus Torvalds, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

On Fri, Jun 6, 2014 at 6:03 AM, Michal Hocko <mhocko@suse.cz> wrote:
> On Fri 06-06-14 05:33:28, Felipe Contreras wrote:
>> On Fri, Jun 6, 2014 at 4:16 AM, Michal Hocko <mhocko@suse.cz> wrote:
>>
>> > Mel has a nice systemtap script (attached) to watch for stalls. Maybe
>> > you can give it a try?
>>
>> Are there any special configurations I should enable?
>
> You need debuginfo and systemtap AFAIK. I haven't used this script
> myself.

I have both.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-05 13:37   ` Michal Hocko
@ 2014-06-06 12:27     ` Felipe Contreras
  -1 siblings, 0 replies; 33+ messages in thread
From: Felipe Contreras @ 2014-06-06 12:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux Kernel Mailing List, linux-mm, Andrew Morton,
	Linus Torvalds, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

On Thu, Jun 5, 2014 at 8:37 AM, Michal Hocko <mhocko@suse.cz> wrote:

> We had a similar report for opensuse. The common part was that there was
> an IO to a slow USB device going on.

Actually I've managed to narrow down my synthetic test, and all I need
is to copy a big file, and it even happens reading and writing to the
SSD (although the stall is less severe).

Here's the test:
http://pastie.org/9264124

Just pass a big file as the first argument.

I don't have much memory in this machine, so I guess running out of
memory is the trigger.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-06 10:33         ` Felipe Contreras
@ 2014-06-06 11:03           ` Michal Hocko
  -1 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2014-06-06 11:03 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: Linux Kernel Mailing List, linux-mm, Andrew Morton,
	Linus Torvalds, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

On Fri 06-06-14 05:33:28, Felipe Contreras wrote:
> On Fri, Jun 6, 2014 at 4:16 AM, Michal Hocko <mhocko@suse.cz> wrote:
> 
> > Mel has a nice systemtap script (attached) to watch for stalls. Maybe
> > you can give it a try?
> 
> Are there any special configurations I should enable?

You need debuginfo and systemtap AFAIK. I haven't used this script
myself.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-06  9:16     ` Michal Hocko
@ 2014-06-06 10:33         ` Felipe Contreras
  0 siblings, 0 replies; 33+ messages in thread
From: Felipe Contreras @ 2014-06-06 10:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux Kernel Mailing List, linux-mm, Andrew Morton,
	Linus Torvalds, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

On Fri, Jun 6, 2014 at 4:16 AM, Michal Hocko <mhocko@suse.cz> wrote:

> Mel has a nice systemtap script (attached) to watch for stalls. Maybe
> you can give it a try?

Are there any special configurations I should enable?

I get this:
semantic error: unresolved arity-1 global array name, missing global
declaration?: identifier 'name' at /tmp/stapd6pu9A:4:2
        source: name[t]=execname()
                ^

Pass 2: analysis failed.  [man error::pass2]
Number of similar error messages suppressed: 71.
Rerun with -v to see them.
Unexpected exit of STAP script at
/home/felipec/Downloads/watch-dstate-new.pl line 320.



-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-06  9:58 zhdxzx
@ 2014-06-06 10:24   ` Felipe Contreras
  0 siblings, 0 replies; 33+ messages in thread
From: Felipe Contreras @ 2014-06-06 10:24 UTC (permalink / raw)
  To: zhdxzx
  Cc: Michal Hocko, linux-kernel, linux-mm, Andrew Morton,
	Linus Torvalds, Mel Gorman, dhillf

On Fri, Jun 6, 2014 at 4:58 AM,  <zhdxzx@sina.com> wrote:

> Alternatively can we try wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10) ?

I see the same problem with that code.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Interactivity regression since v3.11 in mm/vmscan.c
@ 2014-06-06  9:58 zhdxzx
  2014-06-06 10:24   ` Felipe Contreras
  0 siblings, 1 reply; 33+ messages in thread
From: zhdxzx @ 2014-06-06  9:58 UTC (permalink / raw)
  To: Michal Hocko, Felipe Contreras
  Cc: linux-kernel, linux-mm, Andrew Morton, Linus Torvalds,
	Mel Gorman, dhillf

----- Original Message -----
From: Michal Hocko <mhocko@suse.cz>
To: Felipe Contreras <felipe.contreras@gmail.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, "linux-mm@kvack.org" <linux-mm@kvack.org>, Andrew Morton <akpm@linux-foundation.org>, Linus Torvalds <torvalds@linux-foundation.org>, Mel Gorman <mgorman@suse.de>, KAMEZAWA Hiroyuki <kamezawa.hiro
Subject: Re: Interactivity regression since v3.11 in mm/vmscan.c
Date: 2014-06-06 17:16

On Thu 05-06-14 09:00:10, Felipe Contreras wrote:
> On Thu, Jun 5, 2014 at 8:37 AM, Michal Hocko <mhocko@suse.cz> wrote:
> > On Thu 05-06-14 06:33:40, Felipe Contreras wrote:
> 
> >> For a while I've noticed that my machine bogs down in certain
> >> situations, usually while doing heavy I/O operations, it is not just the
> >> I/O operations, but everything, including the graphical interface, even
> >> the mouse pointer.
> >>
> >> As far as I can recall this did not happen in the past.
> >>
> >> I noticed this especially on certain operations, for example updating
> >> a game on Steam (to an external USB 3.0 device), or copying TV episodes
> >> to a USB memory stick (probably flash-based).
> >
> > We had a similar report for opensuse. The common part was that there was
> > an IO to a slow USB device going on.
> 
> Well, it's a USB 3.0 device, I can write at 250 MB/s, so it's not
> really that slow.
> 
> And in fact, when I read and write to and from the same USB 3.0
> device, I don't see the issue.
> 
> >> Then I went back to the latest stable version (v3.14.5), and commented
> >> out the line I think is causing the slow down:
> >>
> >>   if (nr_unqueued_dirty == nr_taken || nr_immediate)
> >>         congestion_wait(BLK_RW_ASYNC, HZ/10);
> >
> > Yes, I came to the same check. I didn't have any confirmation yet so
> > thanks for your confirmation. I've suggested to reduce this
> > congestion_wait only to kswapd:
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 32c661d66a45..ef6a1c0e788c 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1566,7 +1566,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
> >                  * implies that pages are cycling through the LRU faster than
> >                  * they are written so also forcibly stall.
> >                  */
> > -               if (nr_unqueued_dirty == nr_taken || nr_immediate)
> > +               if ((nr_unqueued_dirty == nr_taken || nr_immediate) && current_is_kswapd())
> >                         congestion_wait(BLK_RW_ASYNC, HZ/10);
> >         }
> 
> Unfortunately that doesn't fix the issue for me.
>
> That is really interesting. So removing the test completely helps but
> reducing it to kswapd doesn't. I would expect stalls coming from direct
> reclaimers not the kswapd.
>
> Mel has a nice systemtap script (attached) to watch for stalls. Maybe
> you can give it a try?
>
Alternatively can we try wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10) ?

Hillf

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-05 14:00     ` Felipe Contreras
  (?)
@ 2014-06-06  9:16     ` Michal Hocko
  2014-06-06 10:33         ` Felipe Contreras
  -1 siblings, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2014-06-06  9:16 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: Linux Kernel Mailing List, linux-mm, Andrew Morton,
	Linus Torvalds, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

[-- Attachment #1: Type: text/plain, Size: 2426 bytes --]

On Thu 05-06-14 09:00:10, Felipe Contreras wrote:
> On Thu, Jun 5, 2014 at 8:37 AM, Michal Hocko <mhocko@suse.cz> wrote:
> > On Thu 05-06-14 06:33:40, Felipe Contreras wrote:
> 
> >> For a while I've noticed that my machine bogs down in certain
> >> situations, usually while doing heavy I/O operations, it is not just the
> >> I/O operations, but everything, including the graphical interface, even
> >> the mouse pointer.
> >>
> >> As far as I can recall this did not happen in the past.
> >>
> >> I noticed this especially on certain operations, for example updating
> >> a game on Steam (to an external USB 3.0 device), or copying TV episodes
> >> to a USB memory stick (probably flash-based).
> >
> > We had a similar report for opensuse. The common part was that there was
> > an IO to a slow USB device going on.
> 
> Well, it's a USB 3.0 device, I can write at 250 MB/s, so it's not
> really that slow.
> 
> And in fact, when I read and write to and from the same USB 3.0
> device, I don't see the issue.
> 
> >> Then I went back to the latest stable version (v3.14.5), and commented
> >> out the line I think is causing the slow down:
> >>
> >>   if (nr_unqueued_dirty == nr_taken || nr_immediate)
> >>         congestion_wait(BLK_RW_ASYNC, HZ/10);
> >
> > Yes, I came to the same check. I didn't have any confirmation yet so
> > thanks for your confirmation. I've suggested to reduce this
> > congestion_wait only to kswapd:
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 32c661d66a45..ef6a1c0e788c 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1566,7 +1566,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
> >                  * implies that pages are cycling through the LRU faster than
> >                  * they are written so also forcibly stall.
> >                  */
> > -               if (nr_unqueued_dirty == nr_taken || nr_immediate)
> > +               if ((nr_unqueued_dirty == nr_taken || nr_immediate) && current_is_kswapd())
> >                         congestion_wait(BLK_RW_ASYNC, HZ/10);
> >         }
> 
> Unfortunately that doesn't fix the issue for me.

That is really interesting. So removing the test completely helps but
reducing it to kswapd doesn't. I would expect stalls coming from direct
reclaimers not the kswapd.
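
For reference, the two variants of the check can be modelled in user
space as plain predicates. This is only a sketch: the function names and
the is_kswapd parameter (standing in for current_is_kswapd()) are made up
for illustration, and in the kernel a true result leads to
congestion_wait(BLK_RW_ASYNC, HZ/10).

```c
#include <stdbool.h>

/* The check as it stands in shrink_inactive_list() (quoted above). */
bool stalls_original(unsigned long nr_unqueued_dirty, unsigned long nr_taken,
		     unsigned long nr_immediate)
{
	return nr_unqueued_dirty == nr_taken || nr_immediate;
}

/* The same check restricted to kswapd, as in the patch quoted above. */
bool stalls_kswapd_only(unsigned long nr_unqueued_dirty, unsigned long nr_taken,
			unsigned long nr_immediate, bool is_kswapd)
{
	return (nr_unqueued_dirty == nr_taken || nr_immediate) && is_kswapd;
}
```

With the kswapd-only variant a direct reclaimer (is_kswapd false) never
takes the wait, whatever the counters say, so if direct reclaim were the
source of the stalls the patch should have made a difference.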

Mel has a nice systemtap script (attached) to watch for stalls. Maybe
you can give it a try?

-- 
Michal Hocko
SUSE Labs

[-- Attachment #2: watch-dstate-new.pl --]
[-- Type: text/x-perl, Size: 11166 bytes --]

#!/usr/bin/perl
# This script is a combined perl and systemtap script to collect information
# on a system stalling in writeback. Ordinarily, one would expect that all
# information be collected in a STAP script. Unfortunately, in practice the
# stack unwinder in systemtap may not work with a current kernel version,
# have trouble collecting all the data necessary or some other oddities.
# Hence this hack. A systemtap script is run and as it records interesting
# events, the remaining information is collected from the script. This means
# that the data is *never* exact but can be better than nothing and easier
# than a fully manual check
#
# Copyright Mel Gorman <mgorman@suse.de> 2011

use File::Temp qw/mkstemp/;
use File::Find;
use FindBin qw($Bin);
use Getopt::Long;
use strict;

my @trace_functions = (
	# "get_request_wait" is now special cased unfortunately
	"wait_for_completion",
	"wait_on_page_bit",
	"wait_on_page_bit_killable",
	"try_to_free_pages"
	);

my @completion_functions=(
	"handle_mm_fault",
	"sys_select",
	"__wake_up",
	"wake_up_bit",
	"__alloc_pages_nodemask",
	"balance_pgdat",
	"kmem_cache_alloc");

my @trace_conditional = (
	"sync_page",
	"sync_buffer",
	"sleep_on_buffer",
	"try_to_compact_pages",
	"balance_dirty_pages_ratelimited_nr",
	"balance_dirty_pages",
	);

# Information on each stall is gathered and stored in a hash table for
# dumping later. Define some constants for the table lookup to avoid
# blinding headaches
use constant VMSTAT_AT_STALL       => 0;
use constant VMSTAT_AT_COMPLETE    => 1;
use constant BLOCKSTAT_AT_STALL    => 2;
use constant BLOCKSTAT_AT_COMPLETE => 3;
use constant PROCNAME              => 4;
use constant STACKTRACE            => 5;
use constant STALLFUNCTION         => 6;

use constant NR_WRITEBACK => 0;
use constant NR_DIRTY     => 2;
use constant VMSCAN_WRITE => 1;

sub usage() {
	print("In general, this script is not supported and that includes help.\n");
	exit(0);
}

# Option variables
my $opt_help;
my $opt_output;
my $opt_stapout;
my $opt_accurate_stall = 1;
my $opt_accurate_stack = 0;
GetOptions(
	'help|h'		=> \$opt_help,
	'output|o=s'		=> \$opt_output,
	'stapout|s=s'		=> \$opt_stapout,
	'accurate-stack|a'	=> \$opt_accurate_stack,
	# No short alias here: "-a" is already taken by accurate-stack above
	'accurate-stall'	=> \$opt_accurate_stall,
);

usage() if $opt_help;
if ($opt_output) {
	open(OUTPUT, ">$opt_output") || die("Failed to open $opt_output for writing");
}
if ($opt_stapout) {
	open(OUTPUT, ">$opt_stapout") || die("Failed to open $opt_stapout for writing");
}

if ($opt_accurate_stack) {
	$opt_accurate_stall = 0;
}
if ($opt_accurate_stall) {
	$opt_accurate_stack = 0;
}

# Handle cleanup of temp files
my $stappid;
my ($handle, $stapscript) = mkstemp("/tmp/stapdXXXXX");
sub cleanup {
	if (defined($stappid)) {
		kill INT => $stappid;
	}
	if (defined($opt_output)) {
		close(OUTPUT);
	}
	unlink($stapscript);
}
sub sigint_handler {
	close(STAP);
	cleanup();
	exit(0);
}
$SIG{INT} = "sigint_handler";

# Build a list of stat files to read. Obviously this is not great if device
# hotplug occurs but that is not expected for the moment and this is lighter
# than running find every time
my @block_iostat_files;
sub d {
	my $file = $File::Find::name;
	return if $file eq "/sys/block";
	push(@block_iostat_files, "$file/stat");
}
find(\&d, ("/sys/block/"));

##
# Read the current stack of a given pid
sub read_stacktrace($) {
	open(STACK, "/proc/$_[0]/stack") || return "Stack unavailable";
	my $stack = do {
		local $/;
		<STACK>;
	};
	close(STACK);
	return $stack;
}

##
# Read the relevant counters from /proc/vmstat
sub read_vmstat {
	if (!open(VMSTAT, "/proc/vmstat")) {
		cleanup();
		die("Failed to read /proc/vmstat");
	}

	my $vmstat;
	my ($key, $value);
	my @values;
	while (!eof(VMSTAT)) {
		$vmstat = <VMSTAT>;
		($key, $value) = split(/\s+/, $vmstat);
		chomp($value);

		if ($key eq "nr_writeback") {
			$values[NR_WRITEBACK] = $value;
		}
		if ($key eq "nr_dirty") {
			$values[NR_DIRTY] = $value;
		}
		if ($key eq "nr_vmscan_write") {
			$values[VMSCAN_WRITE] = $value;
		}
	}

	return \@values;
}

##
# Read information from all /sys/block stat files
sub read_blockstat($) {
	my $prefix = $_[0];
	my $stat;
	my $ret;
	
	foreach $stat (@block_iostat_files) {
		if (open(STAT, $stat)) {
			$ret .= sprintf "%s%20s %s", $prefix, $stat, <STAT>;
			close(STAT);
		}
	}
	return $ret;
}

##
# Record a line of output
sub log_output {
	if (defined($opt_output)) {
		print OUTPUT @_;
	}
	print @_;
}

sub log_printf {
	if (defined($opt_output)) {
		printf OUTPUT @_;
	}
	printf @_;
}

sub log_stap {
	if (defined($opt_stapout)) {
		print OUTPUT @_;
	}
}

# Crude as hell, do not really care
my %found_alts;
sub search_kallsyms {
	my @search_symbols = @_;

	# Read kernel symbols and add conditional trace functions if they exist
	open(KALLSYMS, "/proc/kallsyms") || die("Failed to open /proc/kallsyms");
	while (<KALLSYMS>) {
		my ($dummy, $dummy, $symbol) = split(/\s+/, $_);
		my $conditional;
		if ($symbol eq "get_request_wait" || $symbol eq "shrink_zone") {
			push(@trace_functions, $symbol);
			$found_alts{$symbol} = 1;
			next;
		}
		foreach $conditional (@search_symbols) {
			if ($symbol eq $conditional) {
				push(@trace_functions, $symbol);
				last;
			}
		}
	}
	close(KALLSYMS);
}
search_kallsyms(@trace_conditional);
if ($found_alts{"get_request_wait"} != 1) {
	push(@trace_functions, "get_request");
}

if ($found_alts{"shrink_zone"} != 1) {
	my @alt_shrinks = ("shrink_zones",
			"kswapd_shrink_zone",
			"__zone_reclaim",
			"balance_pgdat");
	search_kallsyms(@alt_shrinks);
}

# Extract the framework script and fill in the rest
open(SELF, "$0") || die("Failed to open running script");
while (<SELF>) {
	s/\r?\n\z//;	# strip LF or CRLF so a DOS-format copy still matches __END__
	if ($_ ne "__END__") {
		next;
	}
	while (<SELF>) {
		print $handle $_;
	}
}
foreach(@trace_functions) {
	print $handle "probe kprobe.function(\"$_\")
{ 
	t=tid()
	name[t]=execname()
	stalled_at[t]=time()
	where[t]=\"$_\"
	delete stalled[t]
}";
}

if ($opt_accurate_stall) {
	# In an ideal world, we would always use a retprobe to catch exactly when
	# the function exited and get a stall time from it. Unfortunately, it mangles
	# the stack trace so we have the option of either accurately tracking stalls
	# or accurately tracking stacks
	foreach(@trace_functions) {
		print $handle "probe kprobe.function(\"$_\").return
{
	t=tid()

	if ([t] in stalled) {
		stall_time = time() - stalled_at[t]
		printf(\"C %d (%s) %d %s %s\\n\", t, name[t], stall_time, time_units, where[t])
	}

	delete stalled[t]
	delete name[t]
	delete stalled_at[t]
	delete where[t]
}"
	}
} else {
	# Alternatively, we try to catch when a stall completes by probing
	# commonly used functions and guessing that when they are called
	# that the operation completed
	foreach(@completion_functions) {
		print $handle "probe kprobe.function(\"$_\").return
{
	t=tid()

	if ([t] in stalled) {
		stall_time = time() - stalled_at[t]
		printf(\"C %d (%s) %d %s %s\\n\", t, name[t], stall_time, time_units, where[t])
	}

	delete stalled[t]
	delete name[t]
	delete stalled_at[t]
	delete where[t]
}";
	}
}

close($handle);

# Contact
$stappid = open(STAP, "stap $stapscript|");
if (!defined($stappid)) {
	die("Failed to execute stap script");
}

# Collect information until interrupted
my %stalled;
while (1) {
	if (eof(STAP)) {
		cleanup();
		die("Unexpected exit of STAP script");
	}

	my $input = <STAP>;
	log_stap($input);
	if ($input !~ /([CS]) ([0-9]*) \((.*)\) ([0-9]*) ms (.*)/) {
		cleanup();
		die("Failed to parse input from stap script\n");
	}

	my $event    = $1;
	my $pid      = $2;
	my $name     = $3;
	my $stalled  = $4;
	my $where    = $5;
	my $recursed = 0;
	
	# Check if we have recursively stalled. This is "impossible" but unless
	# we are using kretprobes, we cannot reliably catch when stalls complete
	if (defined($stalled{$pid}->{NAME}) && $event eq "S") {
		$recursed = 1;
		if ($opt_accurate_stall) {
			cleanup();
			print("Apparently recursing, missing kretprobes.\n");
			print("Process:  $pid ($name)\n");
			print("Stalled:  " . $stalled{$pid}->{STALLFUNCTION} . "\n");
			print($stalled{$pid}->{STACKTRACE});
			print("Stalling: $where\n");
			exit(-1);
		}
	}

	# Record information related to stalls.
	if ($event eq "C" || $recursed) {
		if ($name ne $stalled{$pid}->{NAME}) {
			cleanup();
			die("Processes are changing their identity.");
		}
		if ($where ne $stalled{$pid}->{STALLFUNCTION}) {
			$recursed = 1;
			if ($opt_accurate_stall) {
				cleanup();
				die("The stalling function teleported.");
			}
		}

		# Do not even pretend the stall time is accurate
		if ($recursed) {
			$stalled = -1;
		}

		$stalled{$pid}->{VMSTAT_AT_COMPLETE} = read_vmstat();
		$stalled{$pid}->{BLOCKSTAT_AT_COMPLETE} = read_blockstat("+");
		my $delta_writeback    = $stalled{$pid}->{VMSTAT_AT_COMPLETE}[NR_WRITEBACK] - $stalled{$pid}->{VMSTAT_AT_STALL}[NR_WRITEBACK];
		my $delta_dirty        = $stalled{$pid}->{VMSTAT_AT_COMPLETE}[NR_DIRTY]     - $stalled{$pid}->{VMSTAT_AT_STALL}[NR_DIRTY];
		my $delta_vmscan_write = $stalled{$pid}->{VMSTAT_AT_COMPLETE}[VMSCAN_WRITE] - $stalled{$pid}->{VMSTAT_AT_STALL}[VMSCAN_WRITE];

		# Blind stab in the dark as to what is going on
		my $status;
		if ($where eq "balance_dirty_pages") {
			$status = "DirtyThrottled";
		} else {
			$status = "IO";
		}
		if ($delta_writeback < 0) {
			$status = "${status}_WritebackInProgress";
		}
		if ($delta_writeback > 0) {
			$status = "${status}_WritebackSlow";
		}

		log_output("time " . time() . ": $pid ($name) Stalled: $stalled ms: $where\n");
		log_output("Guessing: $status\n");
		log_printf("-%-15s %12d\n", "nr_dirty",        $stalled{$pid}->{VMSTAT_AT_STALL}[NR_DIRTY]);
		log_printf("-%-15s %12d\n", "nr_writeback",    $stalled{$pid}->{VMSTAT_AT_STALL}[NR_WRITEBACK]);
		log_printf("-%-15s %12d\n", "nr_vmscan_write", $stalled{$pid}->{VMSTAT_AT_STALL}[VMSCAN_WRITE]);
		log_printf("%s", $stalled{$pid}->{BLOCKSTAT_AT_STALL});
		log_printf("+%-15s %12d %12d\n", "nr_dirty",
			$stalled{$pid}->{VMSTAT_AT_COMPLETE}[NR_DIRTY], $delta_dirty);
		log_printf("+%-15s %12d %12d\n", "nr_writeback",
			$stalled{$pid}->{VMSTAT_AT_COMPLETE}[NR_WRITEBACK], $delta_writeback);
		log_printf("+%-15s %12d %12d\n", "nr_vmscan_write",
			$stalled{$pid}->{VMSTAT_AT_COMPLETE}[VMSCAN_WRITE],
			$delta_vmscan_write);
		log_printf("%s", $stalled{$pid}->{BLOCKSTAT_AT_COMPLETE});
		log_output($stalled{$pid}->{STACKTRACE});

		delete($stalled{$pid});
	}

	if ($event eq "S") {
		$stalled{$pid}->{NAME} = $name;
		$stalled{$pid}->{STACKTRACE} = read_stacktrace($pid);
		$stalled{$pid}->{VMSTAT_AT_STALL} = read_vmstat();
		$stalled{$pid}->{BLOCKSTAT_AT_STALL} = read_blockstat("-");
		$stalled{$pid}->{STALLFUNCTION} = $where;
	}
}

cleanup();
exit(0);
__END__
function time () { return gettimeofday_ms() }
global stall_threshold = 1000
global time_units = "ms"
global name, stalled_at, stalled, where

probe timer.profile {
	foreach (tid+ in stalled_at) {
		if ([tid] in stalled) continue

		stall_time = time() - stalled_at[tid]
		if (stall_time >= stall_threshold) {
			printf ("S %d (%s) %d %s %s\n", tid, name[tid], stall_time, time_units, where[tid])
			stalled[tid] = 1 # defer further reports to wakeup
		}
	}
}
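
The Perl loop above expects every line from stap in the form
"<S|C> <tid> (<name>) <time> ms <function>". As a rough illustration of that
line format only, here is a user-space C sketch. The struct and the function
name parse_stall_event are made up for this example and are not part of the
script, and unlike the Perl regex it stops at the first ')' in the name.

```c
#include <assert.h>
#include <stdio.h>

/* One event line as printed by the stap script (hypothetical layout). */
struct stall_event {
	char event;       /* 'S' = stall reported, 'C' = stall completed */
	int  tid;         /* thread id */
	char name[64];    /* process name, taken from between the parentheses */
	long stalled_ms;  /* stall time in milliseconds */
	char where[128];  /* function the task stalled in */
};

/* Returns 1 if the line matched the expected format, 0 otherwise. */
static int parse_stall_event(const char *line, struct stall_event *ev)
{
	assert(line && ev);
	int n = sscanf(line, " %c %d (%63[^)]) %ld ms %127[^\n]",
		       &ev->event, &ev->tid, ev->name,
		       &ev->stalled_ms, ev->where);
	return n == 5 && (ev->event == 'S' || ev->event == 'C');
}
```

Like the script, a caller would treat a return value of 0 as a fatal parse
error rather than skipping the line.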


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-05 13:37   ` Michal Hocko
@ 2014-06-05 14:00     ` Felipe Contreras
  -1 siblings, 0 replies; 33+ messages in thread
From: Felipe Contreras @ 2014-06-05 14:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux Kernel Mailing List, linux-mm, Andrew Morton,
	Linus Torvalds, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

On Thu, Jun 5, 2014 at 8:37 AM, Michal Hocko <mhocko@suse.cz> wrote:
> On Thu 05-06-14 06:33:40, Felipe Contreras wrote:

>> For a while I've noticed that my machine bogs down in certain
>> situations, usually while doing heavy I/O operations. It is not just the
>> I/O operations, but everything, including the graphical interface, even
>> the mouse pointer.
>>
>> As far as I can recall this did not happen in the past.
>>
>> I noticed this especially on certain operations, for example updating a
>> game on Steam (to an external USB 3.0 device), or copying TV episodes
>> to a USB memory stick (probably flash-based).
>
> We had a similar report for opensuse. The common part was that there was
> an IO to a slow USB device going on.

Well, it's a USB 3.0 device, I can write at 250 MB/s, so it's not
really that slow.

And in fact, when I read and write to and from the same USB 3.0
device, I don't see the issue.

>> Then I went back to the latest stable version (v3.14.5), and commented
>> out the line I think is causing the slow down:
>>
>>   if (nr_unqueued_dirty == nr_taken || nr_immediate)
>>         congestion_wait(BLK_RW_ASYNC, HZ/10);
>
> Yes, I came to the same check. I didn't have any confirmation yet so
> thanks for your confirmation. I've suggested restricting this
> congestion_wait to kswapd only:
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 32c661d66a45..ef6a1c0e788c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1566,7 +1566,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>                  * implies that pages are cycling through the LRU faster than
>                  * they are written so also forcibly stall.
>                  */
> -               if (nr_unqueued_dirty == nr_taken || nr_immediate)
> +               if ((nr_unqueued_dirty == nr_taken || nr_immediate) && current_is_kswapd())
>                         congestion_wait(BLK_RW_ASYNC, HZ/10);
>         }

Unfortunately that doesn't fix the issue for me.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Interactivity regression since v3.11 in mm/vmscan.c
  2014-06-05 11:33 ` Felipe Contreras
@ 2014-06-05 13:37   ` Michal Hocko
  -1 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2014-06-05 13:37 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-kernel, linux-mm, Andrew Morton, Linus Torvalds,
	Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

On Thu 05-06-14 06:33:40, Felipe Contreras wrote:
> Hi,

Hi,
 
> For a while I've noticed that my machine bogs down in certain
> situations, usually while doing heavy I/O operations. It is not just the
> I/O operations, but everything, including the graphical interface, even
> the mouse pointer.
> 
> As far as I can recall this did not happen in the past.
> 
> I noticed this especially on certain operations, for example updating a
> game on Steam (to an external USB 3.0 device), or copying TV episodes
> to a USB memory stick (probably flash-based).

We had a similar report for opensuse. The common part was that there was
an IO to a slow USB device going on.
 
> Today I decided to finally hunt down the problem, so I created a
> synthetic test that basically consists of copying a bunch of files from
> one drive to another (from an SSD to an external USB 3.0). This is
> pretty similar to what I noticed; the graphical interface slows down.
> 
> Then I bisected the issue and it turns out that indeed it wasn't
> happening in the past, it started happening in v3.11, and it was
> triggered by this commit:
> 
>   e2be15f (mm: vmscan: stall page reclaim and writeback pages based on
>   dirty/writepage pages encountered)
> 
> Then I went back to the latest stable version (v3.14.5), and commented
> out the line I think is causing the slow down:
> 
>   if (nr_unqueued_dirty == nr_taken || nr_immediate)
> 	  congestion_wait(BLK_RW_ASYNC, HZ/10);

Yes, I came to the same check. I didn't have any confirmation yet so
thanks for your confirmation. I've suggested restricting this
congestion_wait to kswapd only:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 32c661d66a45..ef6a1c0e788c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1566,7 +1566,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		 * implies that pages are cycling through the LRU faster than
 		 * they are written so also forcibly stall.
 		 */
-		if (nr_unqueued_dirty == nr_taken || nr_immediate)
+		if ((nr_unqueued_dirty == nr_taken || nr_immediate) && current_is_kswapd())
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
 	}

But I am still not sure whether this is the right way to fix it. Direct
reclaimers can be throttled later on wait_iff_congested. I guess the
original intention was to throttle kswapd to not scan LRU full of dirty
pages like crazy. So I think it makes some sense to reduce the
congestion_wait only to kswapd.
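
To make the difference between the two checks concrete, here is a user-space
sketch, for illustration only. The struct reclaim_batch and the is_kswapd
parameter are invented for this example (is_kswapd stands in for
current_is_kswapd()); the real logic lives in shrink_inactive_list().

```c
#include <assert.h>
#include <stdbool.h>

/* Counters for one batch of pages isolated from the inactive LRU.
 * Field names mirror the variables in the quoted check. */
struct reclaim_batch {
	unsigned long nr_taken;          /* pages isolated in this batch */
	unsigned long nr_unqueued_dirty; /* dirty pages not yet queued for IO */
	unsigned long nr_immediate;      /* pages flagged for immediate reclaim */
};

/* Existing check: any reclaimer, direct or kswapd, may stall. */
static bool stall_original(const struct reclaim_batch *b)
{
	assert(b);
	return b->nr_unqueued_dirty == b->nr_taken || b->nr_immediate;
}

/* Patched check: same condition, but only kswapd is made to wait. */
static bool stall_kswapd_only(const struct reclaim_batch *b, bool is_kswapd)
{
	return stall_original(b) && is_kswapd;
}
```

With a batch where every isolated page is dirty and unqueued, stall_original()
is true for a direct reclaimer while stall_kswapd_only() is false unless the
caller is kswapd.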

 
> After that I don't notice the slow down any more.
> 
> Does anybody have any ideas on how to fix the issue properly?
> 
> -- 
> Felipe Contreras
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Interactivity regression since v3.11 in mm/vmscan.c
@ 2014-06-05 11:33 ` Felipe Contreras
  0 siblings, 0 replies; 33+ messages in thread
From: Felipe Contreras @ 2014-06-05 11:33 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Andrew Morton, Linus Torvalds, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel

Hi,

For a while I've noticed that my machine bogs down in certain
situations, usually while doing heavy I/O operations. It is not just the
I/O operations, but everything, including the graphical interface, even
the mouse pointer.

As far as I can recall this did not happen in the past.

I noticed this especially on certain operations, for example updating a
game on Steam (to an external USB 3.0 device), or copying TV episodes
to a USB memory stick (probably flash-based).

Today I decided to finally hunt down the problem, so I created a
synthetic test that basically consists of copying a bunch of files from
one drive to another (from an SSD to an external USB 3.0). This is
pretty similar to what I noticed; the graphical interface slows down.

Then I bisected the issue and it turns out that indeed it wasn't
happening in the past, it started happening in v3.11, and it was
triggered by this commit:

  e2be15f (mm: vmscan: stall page reclaim and writeback pages based on
  dirty/writepage pages encountered)

Then I went back to the latest stable version (v3.14.5), and commented
out the line I think is causing the slow down:

  if (nr_unqueued_dirty == nr_taken || nr_immediate)
	  congestion_wait(BLK_RW_ASYNC, HZ/10);

After that I don't notice the slow down any more.
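
For scale: the second argument to congestion_wait() is a timeout in jiffies,
so HZ/10 means a wait of up to roughly a tenth of a second, whatever HZ the
kernel was built with. A small user-space sketch of that arithmetic follows.
The helper names are made up for this example and are not kernel functions.

```c
#include <assert.h>

/* Convert a jiffies count to milliseconds for a given tick rate.
 * hz is the kernel tick rate (ticks per second, i.e. CONFIG_HZ). */
static unsigned int timeout_ms(unsigned int jiffies, unsigned int hz)
{
	assert(hz > 0);
	return jiffies * 1000u / hz;
}

/* Upper bound of the wait in the check above: HZ/10 jiffies. */
static unsigned int stall_timeout_ms(unsigned int hz)
{
	return timeout_ms(hz / 10, hz);
}
```

For the common tick rates of 100, 250, 300 and 1000 this works out to 100 ms
per call, and a reclaimer that hits the check repeatedly pays that each time.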

Does anybody have any ideas on how to fix the issue properly?

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2014-06-09 12:58 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-07 12:35 Interactivity regression since v3.11 in mm/vmscan.c zhdxzx
2014-06-07 15:23 ` Felipe Contreras
2014-06-07 15:23   ` Felipe Contreras
2014-06-07 18:24 ` Linus Torvalds
2014-06-07 18:24   ` Linus Torvalds
2014-06-08 21:33   ` Linus Torvalds
2014-06-08 21:33     ` Linus Torvalds
2014-06-09 12:58   ` Mel Gorman
2014-06-09 12:58     ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2014-06-06  9:58 zhdxzx
2014-06-06 10:24 ` Felipe Contreras
2014-06-06 10:24   ` Felipe Contreras
2014-06-05 11:33 Felipe Contreras
2014-06-05 11:33 ` Felipe Contreras
2014-06-05 13:37 ` Michal Hocko
2014-06-05 13:37   ` Michal Hocko
2014-06-05 14:00   ` Felipe Contreras
2014-06-05 14:00     ` Felipe Contreras
2014-06-06  9:16     ` Michal Hocko
2014-06-06 10:33       ` Felipe Contreras
2014-06-06 10:33         ` Felipe Contreras
2014-06-06 11:03         ` Michal Hocko
2014-06-06 11:03           ` Michal Hocko
2014-06-06 12:54           ` Felipe Contreras
2014-06-06 12:54             ` Felipe Contreras
2014-06-06 23:11         ` Felipe Contreras
2014-06-06 23:11           ` Felipe Contreras
2014-06-09  7:53           ` Michal Hocko
2014-06-09  7:53             ` Michal Hocko
2014-06-09 10:02             ` Felipe Contreras
2014-06-09 10:02               ` Felipe Contreras
2014-06-06 12:27   ` Felipe Contreras
2014-06-06 12:27     ` Felipe Contreras
