From: Peter Zijlstra <peterz@infradead.org>
To: Michael Wang <wangyun@linux.vnet.ibm.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>, Mike Galbraith <efault@gmx.de>,
	Alex Shi <alex.shi@intel.com>, Namhyung Kim <namhyung@kernel.org>,
	Paul Turner <pjt@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
	Ram Pai <linuxram@us.ibm.com>
Subject: Re: [PATCH v2] sched: wake-affine throttle
Date: Wed, 22 May 2013 10:49:47 +0200
Message-ID: <20130522084947.GQ26912@twins.programming.kicks-ass.net>
In-Reply-To: <519AE7F2.706@linux.vnet.ibm.com>

On Tue, May 21, 2013 at 11:20:18AM +0800, Michael Wang wrote:
> 
> The wake-affine logic always tries to pull the wakee close to the waker. In
> theory, this helps when the waker's CPU holds cache-hot data for the wakee,
> or in the extreme ping-pong case; testing shows it can improve hackbench by
> up to 15%.
> 
> However, the whole feature is somewhat blind: load balance is the only factor
> it guarantees, and since the logic itself is time-consuming, some workloads
> suffer; testing shows it can hurt pgbench by up to 41%.
> 
> The feature is currently in mainline, which means the scheduler forcibly
> sacrifices some workloads to benefit others; that is definitely unfair.
> 
> Thus, this patch provides a way to throttle wake-affine, so that the gain
> and loss can be adjusted according to demand.
> 
> The patch introduces a new knob, 'sysctl_sched_wake_affine_interval', with a
> default value of 1ms (the default minimum balance interval), which means
> wake-affine stays silent for 1ms after it fails.
> 
> By tuning the new knob, pgbench shows up to 41% improvement compared with
> mainline, which currently uses wake-affine blindly.
> 
> Link:
> 	Analysis from Mike Galbraith about the improvement:
> 		https://lkml.org/lkml/2013/4/11/54
> 
> 	Analysis of why to throttle after a failure:
> 		https://lkml.org/lkml/2013/5/3/31
> 
> Test:
> 	Tested on a 12-CPU x86 server with tip 3.10.0-rc1.
> 
>                                 (default)
>                       base    1ms interval     10ms interval    100ms interval
> | db_size | clients |  tps  |   |  tps  |        |  tps  |         |  tps  |
> +---------+---------+-------+   +-------+        +-------+         +-------+
> | 22 MB   |       1 | 10828 |   | 10850 |        | 10795 |         | 10845 |
> | 22 MB   |       2 | 21434 |   | 21469 |        | 21463 |         | 21455 |
> | 22 MB   |       4 | 41563 |   | 41826 |        | 41789 |         | 41779 |
> | 22 MB   |       8 | 53451 |   | 54917 |        | 59250 |         | 59097 |
> | 22 MB   |      12 | 48681 |   | 50454 |        | 53248 |         | 54881 |
> | 22 MB   |      16 | 46352 |   | 49627 | +7.07% | 54029 | +16.56% | 55935 | +20.67%
> | 22 MB   |      24 | 44200 |   | 46745 | +5.76% | 52106 | +17.89% | 57907 | +31.01%
> | 22 MB   |      32 | 43567 |   | 45264 | +3.90% | 51463 | +18.12% | 57122 | +31.11%
> | 7484 MB |       1 |  8926 |   |  8959 |        |  8765 |         |  8682 |
> | 7484 MB |       2 | 19308 |   | 19470 |        | 19397 |         | 19409 |
> | 7484 MB |       4 | 37269 |   | 37501 |        | 37552 |         | 37470 |
> | 7484 MB |       8 | 47277 |   | 48452 |        | 51535 |         | 52095 |
> | 7484 MB |      12 | 42815 |   | 45347 |        | 48478 |         | 49256 |
> | 7484 MB |      16 | 40951 |   | 44063 | +7.60% | 48536 | +18.52% | 51141 | +24.88%
> | 7484 MB |      24 | 37389 |   | 39620 | +5.97% | 47052 | +25.84% | 52720 | +41.00%
> | 7484 MB |      32 | 36705 |   | 38109 | +3.83% | 45932 | +25.14% | 51456 | +40.19%
> | 15 GB   |       1 |  8642 |   |  8850 |        |  9092 |         |  8560 |
> | 15 GB   |       2 | 19256 |   | 19285 |        | 19362 |         | 19322 |
> | 15 GB   |       4 | 37114 |   | 37131 |        | 37221 |         | 37257 |
> | 15 GB   |       8 | 47120 |   | 48053 |        | 50845 |         | 50923 |
> | 15 GB   |      12 | 42386 |   | 44748 |        | 47868 |         | 48875 |
> | 15 GB   |      16 | 40624 |   | 43414 | +6.87% | 48169 | +18.57% | 50814 | +25.08%
> | 15 GB   |      24 | 37110 |   | 39096 | +5.35% | 46594 | +25.56% | 52477 | +41.41%
> | 15 GB   |      32 | 36252 |   | 37316 | +2.94% | 45327 | +25.03% | 51217 | +41.28%
> 
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Mike Galbraith <efault@gmx.de>
> CC: Alex Shi <alex.shi@intel.com>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>

So I utterly hate this patch. I hate it worse than your initial buddy
patch :/

And I know it's got a Suggested-by there, but that was when you led me to
believe that wake_affine() itself was expensive to run. It's not; it's the
result of those runs you don't like.

While we have a ton of scheduler tunables (too many, to be sure), users
shouldn't ever need to actually touch those. It's just that every time we
have to make a random choice, it's as easy to make it a debug knob as to
hardcode it.

The problem with this patch is that users _have_ to frob knobs and while
doing so potentially wreck other workloads.

To make it worse, the knob isn't anything fundamental; it's a random
hack.

So I would really either improve the smarts of wake_affine, with for
example your wake buddy relation thing (and simply exempt [Soft]IRQs) or
kill wake_affine and be done with it.

Either avenue has the risk of regressing some workload, but at least
when that happens (and people report it) we'll have a counter-example to
learn from and incorporate.
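
For readers following the thread, the throttling idea in the quoted patch
description (stay silent for a fixed interval after a wake-affine attempt
fails) might look roughly like the userspace sketch below. This is not the
code from the patch: the struct, the function names, and the choice to keep
the state in a standalone object are all assumptions made for illustration,
and only the interval value stands in for the
'sysctl_sched_wake_affine_interval' knob.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical throttle state; the names are illustrative, not from the patch. */
struct wake_affine_throttle {
	uint64_t interval_ns;	/* quiet period after a failed attempt (the knob) */
	uint64_t next_try_ns;	/* earliest time the next attempt is allowed */
};

static inline void throttle_init(struct wake_affine_throttle *t,
				 uint64_t interval_ns)
{
	assert(t != NULL);
	t->interval_ns = interval_ns;
	t->next_try_ns = 0;	/* no failure recorded yet, so attempts are allowed */
}

/* Returns true if an affine wakeup may be attempted at time now_ns. */
static inline bool throttle_allows(const struct wake_affine_throttle *t,
				   uint64_t now_ns)
{
	return now_ns >= t->next_try_ns;
}

/* Record the outcome of an attempt; only a failure starts the quiet period. */
static inline void throttle_record(struct wake_affine_throttle *t,
				   uint64_t now_ns, bool succeeded)
{
	if (!succeeded)
		t->next_try_ns = now_ns + t->interval_ns;
}
```

With the interval set to 1000000 ns (1ms), a failure recorded at time T makes
throttle_allows() return false until T + 1ms. A larger interval suppresses
more attempts, which is the trade-off a user of the knob would have to tune
per workload, and that is the objection raised above.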
