* balance storm
@ 2014-05-26 3:04 Libo Chen
2014-05-26 5:11 ` Mike Galbraith
2014-05-26 7:56 ` Mike Galbraith
0 siblings, 2 replies; 33+ messages in thread
From: Libo Chen @ 2014-05-26 3:04 UTC (permalink / raw)
To: tglx, mingo, LKML; +Cc: Greg KH, Li Zefan
hi,

my box has 16 CPUs (E5-2658: 8 cores, 2 threads per core). I ran a test on
3.4.24-stable, starting up 50 identical processes; each process is simply:
#include <unistd.h>

int main(void)
{
    for (;;) {
        unsigned int i = 0;
        while (i < 100)
            i++;
        usleep(100);
    }
    return 0;
}
The result: each process uses 15% CPU time, and perf shows about 700,000 migrations in 5 seconds.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4374 root 20 0 6020 332 256 S 15 0.0 0:03.73 a2.out
4371 root 20 0 6020 332 256 S 15 0.0 0:03.71 a2.out
4373 root 20 0 6020 332 256 R 15 0.0 0:03.72 a2.out
4377 root 20 0 6020 332 256 R 15 0.0 0:03.72 a2.out
4389 root 20 0 6020 328 256 S 15 0.0 0:03.71 a2.out
4391 root 20 0 6020 332 256 S 15 0.0 0:03.72 a2.out
4394 root 20 0 6020 332 256 S 15 0.0 0:03.70 a2.out
4398 root 20 0 6020 328 256 S 15 0.0 0:03.71 a2.out
4403 root 20 0 6020 332 256 S 15 0.0 0:03.71 a2.out
4405 root 20 0 6020 328 256 S 15 0.0 0:03.72 a2.out
4407 root 20 0 6020 332 256 S 15 0.0 0:03.73 a2.out
4369 root 20 0 6020 332 256 S 15 0.0 0:03.72 a2.out
4370 root 20 0 6020 332 256 S 15 0.0 0:03.70 a2.out
4372 root 20 0 6020 332 256 S 15 0.0 0:03.71 a2.out
4375 root 20 0 6020 332 256 S 15 0.0 0:03.70 a2.out
4378 root 20 0 6020 332 256 S 15 0.0 0:03.71 a2.out
4379 root 20 0 6020 332 256 S 15 0.0 0:03.71 a2.out
4380 root 20 0 6020 332 256 S 15 0.0 0:03.72 a2.out
4381 root 20 0 6020 332 256 S 15 0.0 0:03.71 a2.out
4383 root 20 0 6020 332 256 S 15 0.0 0:03.69 a2.out
4384 root 20 0 6020 332 256 S 15 0.0 0:03.72 a2.out
4386 root 20 0 6020 332 256 S 15 0.0 0:03.71 a2.out
4387 root 20 0 6020 328 256 S 15 0.0 0:03.70 a2.out
4388 root 20 0 6020 332 256 R 15 0.0 0:03.72 a2.out
4390 root 20 0 6020 332 256 S 15 0.0 0:03.70 a2.out
4392 root 20 0 6020 332 256 S 15 0.0 0:03.72 a2.out
4393 root 20 0 6020 332 256 S 15 0.0 0:03.72 a2.out
4395 root 20 0 6020 332 256 S 15 0.0 0:03.70 a2.out
4396 root 20 0 6020 328 256 S 15 0.0 0:03.71 a2.out
4397 root 20 0 6020 332 256 S 15 0.0 0:03.70 a2.out
4399 root 20 0 6020 332 256 R 15 0.0 0:03.72 a2.out
4400 root 20 0 6020 332 256 S 15 0.0 0:03.71 a2.out
4402 root 20 0 6020 332 256 S 15 0.0 0:03.70 a2.out
4404 root 20 0 6020 332 256 R 15 0.0 0:03.69 a2.out
4406 root 20 0 6020 332 256 S 15 0.0 0:03.71 a2.out
4408 root 20 0 6020 328 256 R 15 0.0 0:03.71 a2.out
4409 root 20 0 6020 332 256 R 15 0.0 0:03.71 a2.out
4410 root 20 0 6020 328 256 S 15 0.0 0:03.72 a2.out
4411 root 20 0 6020 332 256 S 15 0.0 0:03.71 a2.out
===========================================================================
When I revert commit 908a3283728d92df36e0c7cd63304fd35e93a8a9:
diff --git a/kernel/sched.c b/kernel/sched.c
index 1874c74..4cdc91c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5138,7 +5138,20 @@ EXPORT_SYMBOL(task_nice);
*/
int idle_cpu(int cpu)
{
- return cpu_curr(cpu) == cpu_rq(cpu)->idle;
+ struct rq *rq = cpu_rq(cpu);
+
+ if (rq->curr != rq->idle)
+ return 0;
+
+ if (rq->nr_running)
+ return 0;
+
+#ifdef CONFIG_SMP
+ if (!llist_empty(&rq->wake_list))
+ return 0;
+#endif
+
+ return 1;
}
The result: each process uses 3-5% CPU time, and perf shows only about 1,000 migrations in 5 seconds.
4444 root 20 0 6020 328 256 S 5 0.0 2:18.49 a2.out
4469 root 20 0 6020 328 256 S 5 0.0 2:15.93 a2.out
4423 root 20 0 6020 328 256 S 5 0.0 2:14.36 a2.out
4433 root 20 0 6020 332 256 S 5 0.0 2:15.81 a2.out
4466 root 20 0 6020 328 256 S 4 0.0 2:17.62 a2.out
4428 root 20 0 6020 332 256 S 4 0.0 2:13.92 a2.out
4457 root 20 0 6020 332 256 R 4 0.0 2:15.30 a2.out
4429 root 20 0 6020 328 256 R 4 0.0 2:17.13 a2.out
4431 root 20 0 6020 332 256 S 3 0.0 2:15.91 a2.out
4438 root 20 0 6020 332 256 S 3 0.0 2:14.04 a2.out
4439 root 20 0 6020 332 256 S 3 0.0 2:15.94 a2.out
4462 root 20 0 6020 332 256 R 3 0.0 2:16.40 a2.out
4422 root 20 0 6020 328 256 S 3 0.0 2:17.41 a2.out
4434 root 20 0 6020 332 256 R 3 0.0 2:15.67 a2.out
4440 root 20 0 6020 332 256 S 3 0.0 2:14.40 a2.out
4447 root 20 0 6020 332 256 S 3 0.0 2:16.02 a2.out
4448 root 20 0 6020 332 256 S 3 0.0 2:16.40 a2.out
4453 root 20 0 6020 332 256 R 3 0.0 2:15.75 a2.out
4459 root 20 0 6020 328 256 S 3 0.0 2:16.66 a2.out
4461 root 20 0 6020 332 256 S 3 0.0 2:15.77 a2.out
4471 root 20 0 6020 328 256 S 3 0.0 2:20.68 a2.out
4424 root 20 0 6020 328 256 S 3 0.0 2:15.90 a2.out
4427 root 20 0 6020 332 256 S 3 0.0 2:14.28 a2.out
4432 root 20 0 6020 332 256 S 3 0.0 2:14.63 a2.out
4435 root 20 0 6020 328 256 S 3 0.0 2:15.32 a2.out
4436 root 20 0 6020 328 256 S 3 0.0 2:15.40 a2.out
4437 root 20 0 6020 332 256 S 3 0.0 2:15.42 a2.out
4441 root 20 0 6020 332 256 S 3 0.0 2:18.59 a2.out
4443 root 20 0 6020 332 256 S 3 0.0 2:14.82 a2.out
4445 root 20 0 6020 332 256 R 3 0.0 2:13.12 a2.out
4449 root 20 0 6020 332 256 R 3 0.0 2:21.37 a2.out
4450 root 20 0 6020 332 256 S 3 0.0 2:15.78 a2.out
4451 root 20 0 6020 332 256 S 3 0.0 2:16.25 a2.out
4455 root 20 0 6020 332 256 S 3 0.0 2:18.58 a2.out
4456 root 20 0 6020 332 256 S 3 0.0 2:16.37 a2.out
4458 root 20 0 6020 328 256 S 3 0.0 2:18.03 a2.out
4460 root 20 0 6020 332 256 S 3 0.0 2:14.04 a2.out
4463 root 20 0 6020 328 256 S 3 0.0 2:16.74 a2.out
4464 root 20 0 6020 328 256 S 3 0.0 2:18.11 a2.out
I guessed task migration was taking up a lot of CPU, so I did another test: using
taskset to bind each task to a fixed CPU. The results were in line with
expectations; CPU usage dropped to 5%.

Other tests:
- 3.15 upstream has the same problem as 3.4.24.
- SUSE SP2 has low CPU usage, about 5%.

So I think 15% CPU usage and this many migration events are too high. How can this be fixed?
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-26 3:04 balance storm Libo Chen
@ 2014-05-26 5:11 ` Mike Galbraith
2014-05-26 12:16 ` Libo Chen
2014-05-26 7:56 ` Mike Galbraith
1 sibling, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2014-05-26 5:11 UTC (permalink / raw)
To: Libo Chen; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan
On Mon, 2014-05-26 at 11:04 +0800, Libo Chen wrote:
> hi,
> my box has 16 cpu (E5-2658,8 core, 2 thread per core), i did a test on
> 3.4.24stable, startup 50 same process, every process is sample:
>
> #include <unistd.h>
>
> int main()
> {
> for (;;)
> {
> unsigned int i = 0;
> while (i< 100){
> i++;
> }
> usleep(100);
> }
>
> return 0;
> }
>
> The result: each process uses 15% CPU time, and perf shows about 700,000 migrations in 5 seconds.
See e0a79f52 sched: Fix select_idle_sibling() bouncing cow syndrome
That commit will fix expensive as hell bouncing for most real loads, but
it won't fix your test. Doing nothing but wake, select_idle_sibling()
will be traversing all cores/siblings mightily, taking L2 misses as it
traverses, bouncing wakees that do _nothing_ when an idle CPU is found.
Your synthetic test is the absolute worst case scenario. There has to
be work between wakeups for select_idle_sibling() to have any chance
whatsoever of turning in a win. At 0 work, it becomes 100% overhead.
> I guess task migration takes up a lot of cpu, so i did another test. use taskset tool to bind
> a task to a fixed cpu. Results in line with expectations, cpu usage is down to 5%.
>
> other test:
> - 3.15upstream has the same problem with 3.4.24.
> - suse sp2 has low cpu usage about 5%.
SLE11-SP2 has a patch which fixes that behavior, but of course at the
expense of other load types. A trade. It also throttles nohz, which
can have substantial cost when cross CPU scheduling.
> so I think 15% cpu usage and migration event are too high, how to fixed?
You can't for free, low latency wakeup can be worth one hell of a lot.
You could do a decayed hit/miss or such to shut the thing off when the
price is just too high. Restricting migrations per unit time per task
also helps cut the cost, but hurts tasks that could have gotten to the
CPU quicker, and started your next bit of work. Anything you do there
is going to be a rob Peter to pay Paul thing.
-Mike
* Re: balance storm
2014-05-26 3:04 balance storm Libo Chen
2014-05-26 5:11 ` Mike Galbraith
@ 2014-05-26 7:56 ` Mike Galbraith
2014-05-26 11:49 ` Libo Chen
1 sibling, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2014-05-26 7:56 UTC (permalink / raw)
To: Libo Chen; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan
On Mon, 2014-05-26 at 11:04 +0800, Libo Chen wrote:
> hi,
> my box has 16 cpu (E5-2658,8 core, 2 thread per core), i did a test on
> 3.4.24stable, startup 50 same process, every process is sample:
>
> #include <unistd.h>
>
> int main()
> {
> for (;;)
> {
> unsigned int i = 0;
> while (i< 100){
> i++;
> }
> usleep(100);
> }
>
> return 0;
> }
>
> The result: each process uses 15% CPU time, and perf shows about 700,000 migrations in 5 seconds.
My 8 socket 64 core DL980 running 256 copies (3.14-rt5) munches ~4%/copy
per top, and does roughly 1 sh*tload migrations, nano-work loop or not.
Turn SD_SHARE_PKG_RESOURCES off at MC (not a noop here), and consumption
drops to ~2%/copy, and migrations ('course) mostly go away.
vogelweide:/abuild/mike/:[0]# perf stat -a -e sched:sched_migrate_task -- sleep 5
Performance counter stats for 'system wide':
3108 sched:sched_migrate_task
5.001367910 seconds time elapsed
(turns SD_SHARE_PKG_RESOURCES back on)
vogelweide:/abuild/mike/:[0]# perf stat -a -e sched:sched_migrate_task -- sleep 5
Performance counter stats for 'system wide':
4182334 sched:sched_migrate_task
5.001365023 seconds time elapsed
vogelweide:/abuild/mike/:[0]#
* Re: balance storm
2014-05-26 7:56 ` Mike Galbraith
@ 2014-05-26 11:49 ` Libo Chen
2014-05-26 14:03 ` Mike Galbraith
2014-05-27 9:48 ` Peter Zijlstra
0 siblings, 2 replies; 33+ messages in thread
From: Libo Chen @ 2014-05-26 11:49 UTC (permalink / raw)
To: Mike Galbraith; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan, peterz
On 2014/5/26 15:56, Mike Galbraith wrote:
> On Mon, 2014-05-26 at 11:04 +0800, Libo Chen wrote:
>> hi,
>> my box has 16 cpu (E5-2658,8 core, 2 thread per core), i did a test on
>> 3.4.24stable, startup 50 same process, every process is sample:
>>
>> #include <unistd.h>
>>
>> int main()
>> {
>> for (;;)
>> {
>> unsigned int i = 0;
>> while (i< 100){
>> i++;
>> }
>> usleep(100);
>> }
>>
>> return 0;
>> }
>>
>> The result: each process uses 15% CPU time, and perf shows about 700,000 migrations in 5 seconds.
>
> My 8 socket 64 core DL980 running 256 copies (3.14-rt5) munches ~4%/copy
> per top, and does roughly 1 sh*tload migrations, nano-work loop or not.
> Turn SD_SHARE_PKG_RESOURCES off at MC (not a noop here), and consumption
> drops to ~2%/copy, and migrations ('course) mostly go away.
How do I turn off SD_SHARE_PKG_RESOURCES from userspace?
>
> vogelweide:/abuild/mike/:[0]# perf stat -a -e sched:sched_migrate_task -- sleep 5
>
> Performance counter stats for 'system wide':
>
> 3108 sched:sched_migrate_task
>
> 5.001367910 seconds time elapsed
>
> (turns SD_SHARE_PKG_RESOURCES back on)
>
> vogelweide:/abuild/mike/:[0]# perf stat -a -e sched:sched_migrate_task -- sleep 5
>
> Performance counter stats for 'system wide':
>
> 4182334 sched:sched_migrate_task
>
> 5.001365023 seconds time elapsed
>
> vogelweide:/abuild/mike/:[0]#
>
>
>
* Re: balance storm
2014-05-26 5:11 ` Mike Galbraith
@ 2014-05-26 12:16 ` Libo Chen
2014-05-26 14:19 ` Mike Galbraith
0 siblings, 1 reply; 33+ messages in thread
From: Libo Chen @ 2014-05-26 12:16 UTC (permalink / raw)
To: Mike Galbraith; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan, peterz
On 2014/5/26 13:11, Mike Galbraith wrote:
> On Mon, 2014-05-26 at 11:04 +0800, Libo Chen wrote:
>> hi,
>> my box has 16 cpu (E5-2658,8 core, 2 thread per core), i did a test on
>> 3.4.24stable, startup 50 same process, every process is sample:
>>
>> #include <unistd.h>
>>
>> int main()
>> {
>> for (;;)
>> {
>> unsigned int i = 0;
>> while (i< 100){
>> i++;
>> }
>> usleep(100);
>> }
>>
>> return 0;
>> }
>>
>> The result: each process uses 15% CPU time, and perf shows about 700,000 migrations in 5 seconds.
>
> See e0a79f52 sched: Fix select_idle_sibling() bouncing cow syndrome
>
> That commit will fix expensive as hell bouncing for most real loads, but
> it won't fix your test. Doing nothing but wake, select_idle_sibling()
> will be traversing all cores/siblings mightily, taking L2 misses as it
> traverses, bouncing wakees that do _nothing_ when an idle CPU is found.
>
> Your synthetic test is the absolute worst case scenario. There has to
> be work between wakeups for select_idle_sibling() to have any chance
> whatsoever of turning in a win. At 0 work, it becomes 100% overhead.
It is not synthetic; it is a real problem in our product. Under no load, it wastes
a lot of CPU time.
>
>> I guess task migration takes up a lot of cpu, so i did another test. use taskset tool to bind
>> a task to a fixed cpu. Results in line with expectations, cpu usage is down to 5%.
>>
>> other test:
>> - 3.15upstream has the same problem with 3.4.24.
>> - suse sp2 has low cpu usage about 5%.
>
> SLE11-SP2 has a patch which fixes that behavior, but of course at the
> expense of other load types. A trade. It also throttles nohz, which
> can have substantial cost when cross CPU scheduling.
Which patch?
>
>> so I think 15% cpu usage and migration event are too high, how to fixed?
>
> You can't for free, low latency wakeup can be worth one hell of a lot.
>
> You could do a decayed hit/miss or such to shut the thing off when the
> price is just too high. Restricting migrations per unit time per task
> also helps cut the cost, but hurts tasks that could have gotten to the
> CPU quicker, and started your next bit of work. Anything you do there
> is going to be a rob Peter to pay Paul thing.
>
I have tried changing sched_migration_cost and sched_nr_migrate in /proc, but it
made no difference. Any other suggestions?

I still think this is a problem in the scheduler. It would be better to solve the
issue directly instead of working around it.
thanks,
Libo
> -Mike
>
* Re: balance storm
2014-05-26 11:49 ` Libo Chen
@ 2014-05-26 14:03 ` Mike Galbraith
2014-05-27 7:44 ` Libo Chen
2014-05-27 9:48 ` Peter Zijlstra
1 sibling, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2014-05-26 14:03 UTC (permalink / raw)
To: Libo Chen; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan, peterz
On Mon, 2014-05-26 at 19:49 +0800, Libo Chen wrote:
> how to turn off SD_SHARE_PKG_RESOURCES in userspace ?
I use a script Ingo gave me years and years ago to
twiddle /proc/sys/kernel/sched_domain/cpuN/domainN/flags domain wise.
Doing that won't do you any good without a handler to build/tear down
sd_llc when you poke at flags though. You can easily add a sched
feature to play with it.
-Mike
* Re: balance storm
2014-05-26 12:16 ` Libo Chen
@ 2014-05-26 14:19 ` Mike Galbraith
2014-05-27 7:56 ` Libo Chen
0 siblings, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2014-05-26 14:19 UTC (permalink / raw)
To: Libo Chen; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan, peterz
On Mon, 2014-05-26 at 20:16 +0800, Libo Chen wrote:
> On 2014/5/26 13:11, Mike Galbraith wrote:
> > Your synthetic test is the absolute worst case scenario. There has to
> > be work between wakeups for select_idle_sibling() to have any chance
> > whatsoever of turning in a win. At 0 work, it becomes 100% overhead.
>
> not synthetic, it is a real problem in our product. under no load, waste
> much cpu time.
What happens in your product if you apply the commit I pointed out?
> >> so I think 15% cpu usage and migration event are too high, how to fixed?
> >
> > You can't for free, low latency wakeup can be worth one hell of a lot.
> >
> > You could do a decayed hit/miss or such to shut the thing off when the
> > price is just too high. Restricting migrations per unit time per task
> > also helps cut the cost, but hurts tasks that could have gotten to the
> > CPU quicker, and started your next bit of work. Anything you do there
> > is going to be a rob Peter to pay Paul thing.
> >
>
> I had tried to change sched_migration_cost and sched_nr_migrate in /proc,
> but no use. any other suggestion?
>
> I still think this is a problem to schedular. it is better to directly solve
> this issue instead of a workaroud
I didn't say it wasn't a problem, it is. I said whatever you do will be
a tradeoff.
-Mike
* Re: balance storm
2014-05-26 14:03 ` Mike Galbraith
@ 2014-05-27 7:44 ` Libo Chen
2014-05-27 8:12 ` Mike Galbraith
0 siblings, 1 reply; 33+ messages in thread
From: Libo Chen @ 2014-05-27 7:44 UTC (permalink / raw)
To: Mike Galbraith; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan, peterz
On 2014/5/26 22:03, Mike Galbraith wrote:
> On Mon, 2014-05-26 at 19:49 +0800, Libo Chen wrote:
>
>> how to turn off SD_SHARE_PKG_RESOURCES in userspace ?
>
> I use a script Ingo gave me years and years ago to
> twiddle /proc/sys/kernel/sched_domain/cpuN/domainN/flags domain wise.
> Doing that won't do you any good without a handler to build/tear down
> sd_llc when you poke at flags though. You can easily add a sched
> feature to play with it.
I made a simple script:

for ((i = 0; i <= 15; i++))
do
    echo 4143 > /proc/sys/kernel/sched_domain/cpu$i/domain1/flags
done

In our kernel SD_SHARE_PKG_RESOURCES is 0x0200 and the original flags value is 4655;
domain1's name is MC.

But the migration events don't drop like yours did. What is the problem? I would
rather not recompile the kernel :(
>
> -Mike
>
>
>
* Re: balance storm
2014-05-26 14:19 ` Mike Galbraith
@ 2014-05-27 7:56 ` Libo Chen
2014-05-27 9:55 ` Mike Galbraith
0 siblings, 1 reply; 33+ messages in thread
From: Libo Chen @ 2014-05-27 7:56 UTC (permalink / raw)
To: Mike Galbraith; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan, peterz
On 2014/5/26 22:19, Mike Galbraith wrote:
> On Mon, 2014-05-26 at 20:16 +0800, Libo Chen wrote:
>> On 2014/5/26 13:11, Mike Galbraith wrote:
>
>>> Your synthetic test is the absolute worst case scenario. There has to
>>> be work between wakeups for select_idle_sibling() to have any chance
>>> whatsoever of turning in a win. At 0 work, it becomes 100% overhead.
>>
>> not synthetic, it is a real problem in our product. under no load, waste
>> much cpu time.
>
> What happens in your product if you apply the commit I pointed out?
Under no load, CPU usage goes up to 60%, but the same apps cost 10% on
SUSE SP1. The apps use a lot of timers.

I am not sure that commit is the root cause, but there is clearly a difference in
CPU usage between 3.4.24 and SUSE SP1, e.g. in my synthetic test above.
>
>>>> so I think 15% cpu usage and migration event are too high, how to fixed?
>>>
>>> You can't for free, low latency wakeup can be worth one hell of a lot.
>>>
>>> You could do a decayed hit/miss or such to shut the thing off when the
>>> price is just too high. Restricting migrations per unit time per task
>>> also helps cut the cost, but hurts tasks that could have gotten to the
>>> CPU quicker, and started your next bit of work. Anything you do there
>>> is going to be a rob Peter to pay Paul thing.
>>>
>>
>> I had tried to change sched_migration_cost and sched_nr_migrate in /proc,
>> but no use. any other suggestion?
>>
>> I still think this is a problem to schedular. it is better to directly solve
>> this issue instead of a workaroud
>
> I didn't say it wasn't a problem, it is. I said whatever you do will be
> a tradeoff.
>
> -Mike
>
>
>
* Re: balance storm
2014-05-27 7:44 ` Libo Chen
@ 2014-05-27 8:12 ` Mike Galbraith
0 siblings, 0 replies; 33+ messages in thread
From: Mike Galbraith @ 2014-05-27 8:12 UTC (permalink / raw)
To: Libo Chen; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan, peterz
On Tue, 2014-05-27 at 15:44 +0800, Libo Chen wrote:
> On 2014/5/26 22:03, Mike Galbraith wrote:
> > On Mon, 2014-05-26 at 19:49 +0800, Libo Chen wrote:
> >
> >> how to turn off SD_SHARE_PKG_RESOURCES in userspace ?
> >
> > I use a script Ingo gave me years and years ago to
> > twiddle /proc/sys/kernel/sched_domain/cpuN/domainN/flags domain wise.
> > Doing that won't do you any good without a handler to build/tear down
> > sd_llc when you poke at flags though. You can easily add a sched
> > feature to play with it.
>
>
> I make a simple script:
>
> for ((i=0;i<=15;i++))
> do
> echo 4143 > /proc/sys/kernel/sched_domain/cpu$i/domain1/flags
> done
>
> In our kernel SD_SHARE_PKG_RESOURCE is 0x0200, the original flag value is 4655,
> domain1's name is MC.
>
> but migrations event doesn't reduce like yours, what problem? I wouldn't like
> recompile kernel :(
Hm, I thought you were a kernel hacker, but guess not since that would
be a really weird thing for a kernel hacker to say :) Problem is that
there's no handler in your kernel to convert your flag poking to sd_llc
poking.
I could send you a patchlet, but that ain't gonna work, neither that nor
the commit I pointed out will seep into the kernel via osmosis. There
should be a kernel hacker somewhere near you, look down the hall by the
water cooler, when you find one, he/she should be able to help.
-Mike
* Re: balance storm
2014-05-26 11:49 ` Libo Chen
2014-05-26 14:03 ` Mike Galbraith
@ 2014-05-27 9:48 ` Peter Zijlstra
2014-05-27 10:05 ` Mike Galbraith
2014-05-27 12:55 ` Libo Chen
1 sibling, 2 replies; 33+ messages in thread
From: Peter Zijlstra @ 2014-05-27 9:48 UTC (permalink / raw)
To: Libo Chen; +Cc: Mike Galbraith, tglx, mingo, LKML, Greg KH, Li Zefan
On Mon, May 26, 2014 at 07:49:10PM +0800, Libo Chen wrote:
> On 2014/5/26 15:56, Mike Galbraith wrote:
> > On Mon, 2014-05-26 at 11:04 +0800, Libo Chen wrote:
> >> hi,
> >> my box has 16 cpu (E5-2658,8 core, 2 thread per core), i did a test on
> >> 3.4.24stable, startup 50 same process, every process is sample:
> >>
> >> #include <unistd.h>
> >>
> >> int main()
> >> {
> >> for (;;)
> >> {
> >> unsigned int i = 0;
> >> while (i< 100){
> >> i++;
> >> }
> >> usleep(100);
> >> }
> >>
> >> return 0;
> >> }
> >>
> >> The result: each process uses 15% CPU time, and perf shows about 700,000 migrations in 5 seconds.
> >
> > My 8 socket 64 core DL980 running 256 copies (3.14-rt5) munches ~4%/copy
> > per top, and does roughly 1 sh*tload migrations, nano-work loop or not.
> > Turn SD_SHARE_PKG_RESOURCES off at MC (not a noop here), and consumption
> > drops to ~2%/copy, and migrations ('course) mostly go away.
So:
1) what kind of weird ass workload is that? Why are you waking up so
often to do no work?
2) turning on/off share_pkg_resource is a horrid hack whichever way
around you turn it.
So I suppose this is due to the select_idle_sibling() nonsense again,
where we assumes L3 is a fair compromise between cheap enough and
effective enough.
Of course, Intel keeps growing the cpu count covered by L3 to ridiculous
sizes, 8 cores isn't nowhere near their top silly, which shifts the
balance, and there's always going to be pathological cases (like the
proposed workload) where its just always going to suck eggs.
Also, when running 50 such things on a 16 cpu machine, you get roughly 3
per cpu, since their runtime is stupid low, I would expect it to pretty
much always hit an idle cpu, which in turn should inhibit the migration.
Then again, maybe the timer slack is causing you grief, resulting in all
3 being woken at the same time, instead of having them staggered.
In any case, I'm not sure what the 'regression' report is against, as
there's only a single kernel version mentioned: 3.4, and that's almost a
dinosaur.
* Re: balance storm
2014-05-27 7:56 ` Libo Chen
@ 2014-05-27 9:55 ` Mike Galbraith
2014-05-27 12:50 ` Libo Chen
0 siblings, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2014-05-27 9:55 UTC (permalink / raw)
To: Libo Chen; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan, peterz
On Tue, 2014-05-27 at 15:56 +0800, Libo Chen wrote:
> On 2014/5/26 22:19, Mike Galbraith wrote:
> > On Mon, 2014-05-26 at 20:16 +0800, Libo Chen wrote:
> >> On 2014/5/26 13:11, Mike Galbraith wrote:
> >
> >>> Your synthetic test is the absolute worst case scenario. There has to
> >>> be work between wakeups for select_idle_sibling() to have any chance
> >>> whatsoever of turning in a win. At 0 work, it becomes 100% overhead.
> >>
> >> not synthetic, it is a real problem in our product. under no load, waste
> >> much cpu time.
> >
> > What happens in your product if you apply the commit I pointed out?
>
> under no load, cpu usage is up to 60%, but the same apps cost 10% on
> susp sp1. The apps use a lot of timer.
Something is rotten. 3.14-rt contains that commit, I ran your test with
256 threads on 64 core box, saw ~4%.
Putting master/nopreempt config on box and doing the same test, box is
chewing up truckloads of CPU, but not from migrations.
perf top -g --sort=symbol
Samples: 7M of event 'cycles', Event count (approx.): 1316249172581
-  82.56%  [k] _raw_spin_lock_irqsave
   - _raw_spin_lock_irqsave
      - 96.59% __nanosleep_nocancel
           100.00% __libc_start_main
        2.88% __poll
+   1.56%  [k] native_write_msr_safe
+   1.21%  [k] update_cfs_shares
+   0.92%  [k] __schedule
+   0.88%  [k] _raw_spin_lock
+   0.73%  [k] update_cfs_rq_blocked_load
+   0.62%  [k] idle_cpu
+   0.47%  [.] usleep
+   0.41%  [k] cpuidle_enter_state
+   0.37%  [k] set_task_cpu
Oh, 256 * usleep(100) is not a great idea.
-Mike
* Re: balance storm
2014-05-27 9:48 ` Peter Zijlstra
@ 2014-05-27 10:05 ` Mike Galbraith
2014-05-27 10:43 ` Peter Zijlstra
2014-05-27 12:55 ` Libo Chen
1 sibling, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2014-05-27 10:05 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Libo Chen, tglx, mingo, LKML, Greg KH, Li Zefan
On Tue, 2014-05-27 at 11:48 +0200, Peter Zijlstra wrote:
> So I suppose this is due to the select_idle_sibling() nonsense again,
> where we assumes L3 is a fair compromise between cheap enough and
> effective enough.
Nodz.
> Of course, Intel keeps growing the cpu count covered by L3 to ridiculous
> sizes, 8 cores isn't nowhere near their top silly, which shifts the
> balance, and there's always going to be pathological cases (like the
> proposed workload) where its just always going to suck eggs.
Test is as pathological as it gets. 15 core + SMT wouldn't be pretty.
-Mike
* Re: balance storm
2014-05-27 10:05 ` Mike Galbraith
@ 2014-05-27 10:43 ` Peter Zijlstra
2014-05-27 10:55 ` Mike Galbraith
0 siblings, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2014-05-27 10:43 UTC (permalink / raw)
To: Mike Galbraith; +Cc: Libo Chen, tglx, mingo, LKML, Greg KH, Li Zefan
On Tue, May 27, 2014 at 12:05:33PM +0200, Mike Galbraith wrote:
> On Tue, 2014-05-27 at 11:48 +0200, Peter Zijlstra wrote:
>
> > So I suppose this is due to the select_idle_sibling() nonsense again,
> > where we assumes L3 is a fair compromise between cheap enough and
> > effective enough.
>
> Nodz.
>
> > Of course, Intel keeps growing the cpu count covered by L3 to ridiculous
> > sizes, 8 cores isn't nowhere near their top silly, which shifts the
> > balance, and there's always going to be pathological cases (like the
> > proposed workload) where its just always going to suck eggs.
>
> Test is as pathological as it gets. 15 core + SMT wouldn't be pretty.
So one thing we could maybe do is measure the cost of
select_idle_sibling(), just like we do for idle_balance() and compare
this against the tasks avg runtime.
We can go all crazy and do reduced searches; like test every n-th cpu in
the mask, or make it statistical and do a full search ever n wakeups.
Not sure what's a good approach. But L3 spanning more and more CPUs is
not something that's going to get cured anytime soon I'm afraid.
Not to mention bloody SMT which makes the whole mess worse.
* Re: balance storm
2014-05-27 10:43 ` Peter Zijlstra
@ 2014-05-27 10:55 ` Mike Galbraith
2014-05-27 12:56 ` Libo Chen
0 siblings, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2014-05-27 10:55 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Libo Chen, tglx, mingo, LKML, Greg KH, Li Zefan
On Tue, 2014-05-27 at 12:43 +0200, Peter Zijlstra wrote:
> On Tue, May 27, 2014 at 12:05:33PM +0200, Mike Galbraith wrote:
> > On Tue, 2014-05-27 at 11:48 +0200, Peter Zijlstra wrote:
> >
> > > So I suppose this is due to the select_idle_sibling() nonsense again,
> > > where we assumes L3 is a fair compromise between cheap enough and
> > > effective enough.
> >
> > Nodz.
> >
> > > Of course, Intel keeps growing the cpu count covered by L3 to ridiculous
> > > sizes, 8 cores isn't nowhere near their top silly, which shifts the
> > > balance, and there's always going to be pathological cases (like the
> > > proposed workload) where its just always going to suck eggs.
> >
> > Test is as pathological as it gets. 15 core + SMT wouldn't be pretty.
>
> So one thing we could maybe do is measure the cost of
> select_idle_sibling(), just like we do for idle_balance() and compare
> this against the tasks avg runtime.
>
> We can go all crazy and do reduced searches; like test every n-th cpu in
> the mask, or make it statistical and do a full search ever n wakeups.
>
> Not sure what's a good approach. But L3 spanning more and more CPUs is
> not something that's going to get cured anytime soon I'm afraid.
>
> Not to mention bloody SMT which makes the whole mess worse.
I think we should keep it dirt simple and above all dirt cheap. The per
task migration cap per unit time should meet that bill, limit the damage
potential, while also limiting the good, but that's tough. I don't see
any way to make it perfect, so I'll settle for good enough.
-Mike
* Re: balance storm
2014-05-27 9:55 ` Mike Galbraith
@ 2014-05-27 12:50 ` Libo Chen
2014-05-27 13:20 ` Mike Galbraith
2014-05-27 20:53 ` Thomas Gleixner
0 siblings, 2 replies; 33+ messages in thread
From: Libo Chen @ 2014-05-27 12:50 UTC (permalink / raw)
To: Mike Galbraith; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan, peterz
On 2014/5/27 17:55, Mike Galbraith wrote:
> On Tue, 2014-05-27 at 15:56 +0800, Libo Chen wrote:
>> > On 2014/5/26 22:19, Mike Galbraith wrote:
>>> > > On Mon, 2014-05-26 at 20:16 +0800, Libo Chen wrote:
>>>> > >> On 2014/5/26 13:11, Mike Galbraith wrote:
>>> > >
>>>>> > >>> Your synthetic test is the absolute worst case scenario. There has to
>>>>> > >>> be work between wakeups for select_idle_sibling() to have any chance
>>>>> > >>> whatsoever of turning in a win. At 0 work, it becomes 100% overhead.
>>>> > >>
>>>> > >> not synthetic, it is a real problem in our product. under no load, waste
>>>> > >> much cpu time.
>>> > >
>>> > > What happens in your product if you apply the commit I pointed out?
>> >
>> > under no load, cpu usage is up to 60%, but the same apps cost 10% on
>> > susp sp1. The apps use a lot of timer.
> Something is rotten. 3.14-rt contains that commit, I ran your test with
> 256 threads on 64 core box, saw ~4%.
>
> Putting master/nopreempt config on box and doing the same test, box is
> chewing up truckloads of CPU, but not from migrations.
>
> perf top -g --sort=symbol
On my box:
perf top -g --sort=symbol
Events: 3K cycles
73.27% [k] read_hpet
4.30% [k] _raw_spin_lock_irqsave
1.88% [k] __schedule
1.00% [k] idle_cpu
0.91% [k] native_write_msr_safe
0.68% [k] select_task_rq_fair
0.51% [k] module_get_kallsym
0.49% [.] sem_post
0.44% [.] main
0.41% [k] menu_select
0.39% [k] _raw_spin_lock
0.38% [k] __switch_to
0.33% [k] _raw_spin_lock_irq
0.32% [k] format_decode
0.29% [.] usleep
0.28% [.] symbols__insert
0.27% [k] tick_nohz_stop_sched_tick
0.27% [k] update_stats_wait_end
0.26% [k] apic_timer_interrupt
0.25% [k] enqueue_entity
0.25% [k] sched_clock_local
0.24% [k] _raw_spin_unlock_irqrestore
0.24% [k] select_idle_sibling
0.22% [k] number
0.22% [k] kallsyms_expand_symbol
0.21% [k] rcu_irq_exit
0.20% [k] ktime_get
0.20% [k] rb_insert_color
0.20% [k] set_next_entity
0.19% [k] vsnprintf
0.19% [k] try_to_wake_up
0.18% [k] __hrtimer_start_range_ns
0.18% [k] update_cfs_load
0.17% [k] rcu_idle_exit_common
0.17% [k] do_nanosleep
0.17% [.] __GI___libc_nanosleep
0.17% [k] trace_hardirqs_off
0.16% [k] irq_exit
0.16% [k] timerqueue_add
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-27 9:48 ` Peter Zijlstra
2014-05-27 10:05 ` Mike Galbraith
@ 2014-05-27 12:55 ` Libo Chen
2014-05-27 13:13 ` Peter Zijlstra
1 sibling, 1 reply; 33+ messages in thread
From: Libo Chen @ 2014-05-27 12:55 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Mike Galbraith, tglx, mingo, LKML, Greg KH, Li Zefan
On 2014/5/27 17:48, Peter Zijlstra wrote:
> So:
>
> 1) what kind of weird ass workload is that? Why are you waking up so
> often to do no work?
It's just a test case; I agree it doesn't exist in the real world.
>
> 2) turning on/off share_pkg_resource is a horrid hack whichever way
> aruond you turn it.
>
> So I suppose this is due to the select_idle_sibling() nonsense again,
> where we assumes L3 is a fair compromise between cheap enough and
> effective enough.
>
> Of course, Intel keeps growing the cpu count covered by L3 to ridiculous
> sizes, 8 cores isn't nowhere near their top silly, which shifts the
> balance, and there's always going to be pathological cases (like the
> proposed workload) where its just always going to suck eggs.
>
> Also, when running 50 such things on a 16 cpu machine, you get roughly 3
> per cpu, since their runtime is stupid low, I would expect it to pretty
> much always hit an idle cpu, which in turn should inhibit the migration.
>
> Then again, maybe the timer slack is causing you grief, resulting in all
> 3 being woken at the same time, instead of having them staggered.
>
> In any case, I'm not sure what the 'regression' report is against, as
> there's only a single kernel version mentioned: 3.4, and that's almost a
Upstream has the same problem, as I mentioned before.
> dinosaur.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-27 10:55 ` Mike Galbraith
@ 2014-05-27 12:56 ` Libo Chen
0 siblings, 0 replies; 33+ messages in thread
From: Libo Chen @ 2014-05-27 12:56 UTC (permalink / raw)
To: Mike Galbraith, Peter Zijlstra; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan
On 2014/5/27 18:55, Mike Galbraith wrote:
> On Tue, 2014-05-27 at 12:43 +0200, Peter Zijlstra wrote:
>> On Tue, May 27, 2014 at 12:05:33PM +0200, Mike Galbraith wrote:
>>> On Tue, 2014-05-27 at 11:48 +0200, Peter Zijlstra wrote:
>>>
>>>> So I suppose this is due to the select_idle_sibling() nonsense again,
>>>> where we assumes L3 is a fair compromise between cheap enough and
>>>> effective enough.
>>>
>>> Nodz.
>>>
>>>> Of course, Intel keeps growing the cpu count covered by L3 to ridiculous
>>>> sizes, 8 cores isn't nowhere near their top silly, which shifts the
>>>> balance, and there's always going to be pathological cases (like the
>>>> proposed workload) where its just always going to suck eggs.
>>>
>>> Test is as pathological as it gets. 15 core + SMT wouldn't be pretty.
>>
>> So one thing we could maybe do is measure the cost of
>> select_idle_sibling(), just like we do for idle_balance() and compare
>> this against the tasks avg runtime.
>>
>> We can go all crazy and do reduced searches; like test every n-th cpu in
>> the mask, or make it statistical and do a full search ever n wakeups.
>>
>> Not sure what's a good approach. But L3 spanning more and more CPUs is
>> not something that's going to get cured anytime soon I'm afraid.
>>
>> Not to mention bloody SMT which makes the whole mess worse.
>
> I think we should keep it dirt simple and above all dirt cheap. The per
> task migration cap per unit time should meet that bill, limit the damage
> potential, while also limiting the good, but that's tough. I don't see
Agreed.
> any way to make it perfect, so I'll settle for good enough.
>
> -Mike
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-27 12:55 ` Libo Chen
@ 2014-05-27 13:13 ` Peter Zijlstra
0 siblings, 0 replies; 33+ messages in thread
From: Peter Zijlstra @ 2014-05-27 13:13 UTC (permalink / raw)
To: Libo Chen; +Cc: Mike Galbraith, tglx, mingo, LKML, Greg KH, Li Zefan
On Tue, May 27, 2014 at 08:55:20PM +0800, Libo Chen wrote:
> On 2014/5/27 17:48, Peter Zijlstra wrote:
> > In any case, I'm not sure what the 'regression' report is against, as
> > there's only a single kernel version mentioned: 3.4, and that's almost a
> upstream has the same problem, I have mentioned before.
Not on anything that landed in my inbox I think, but that's not the
point. For a regression report you need _2_ kernel versions, one with
and one without the 'problem'.
Providing one (or two) that have a problem doesn't qualify.
In any case, I didn't see the original email, but I got the impression
that it was complaining about 'new' behaviour from the bits I did see as
quoted.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-27 12:50 ` Libo Chen
@ 2014-05-27 13:20 ` Mike Galbraith
2014-05-28 1:04 ` Libo Chen
2014-05-27 20:53 ` Thomas Gleixner
1 sibling, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2014-05-27 13:20 UTC (permalink / raw)
To: Libo Chen; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan, peterz
On Tue, 2014-05-27 at 20:50 +0800, Libo Chen wrote:
> in my box:
>
> perf top -g --sort=symbol
>
> Events: 3K cycles
> 73.27% [k] read_hpet
> 4.30% [k] _raw_spin_lock_irqsave
> 1.88% [k] __schedule
> 1.00% [k] idle_cpu
> 0.91% [k] native_write_msr_safe
> 0.68% [k] select_task_rq_fair
> 0.51% [k] module_get_kallsym
> 0.49% [.] sem_post
> 0.44% [.] main
> 0.41% [k] menu_select
> 0.39% [k] _raw_spin_lock
> 0.38% [k] __switch_to
> 0.33% [k] _raw_spin_lock_irq
> 0.32% [k] format_decode
> 0.29% [.] usleep
> 0.28% [.] symbols__insert
> 0.27% [k] tick_nohz_stop_sched_tick
> 0.27% [k] update_stats_wait_end
> 0.26% [k] apic_timer_interrupt
> 0.25% [k] enqueue_entity
> 0.25% [k] sched_clock_local
> 0.24% [k] _raw_spin_unlock_irqrestore
> 0.24% [k] select_idle_sibling
read_hpet? Are you booting box notsc or something? Migration cost is
the least of your worries.
-Mike
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-27 12:50 ` Libo Chen
2014-05-27 13:20 ` Mike Galbraith
@ 2014-05-27 20:53 ` Thomas Gleixner
2014-05-28 1:06 ` Libo Chen
1 sibling, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2014-05-27 20:53 UTC (permalink / raw)
To: Libo Chen; +Cc: Mike Galbraith, mingo, LKML, Greg KH, Li Zefan, peterz
On Tue, 27 May 2014, Libo Chen wrote:
> On 2014/5/27 17:55, Mike Galbraith wrote:
> > On Tue, 2014-05-27 at 15:56 +0800, Libo Chen wrote:
> > > On 2014/5/26 22:19, Mike Galbraith wrote:
> > > > On Mon, 2014-05-26 at 20:16 +0800, Libo Chen wrote:
> > > > > On 2014/5/26 13:11, Mike Galbraith wrote:
> > > > >
> > > > > > Your synthetic test is the absolute worst case scenario. There has to
> > > > > > be work between wakeups for select_idle_sibling() to have any chance
> > > > > > whatsoever of turning in a win. At 0 work, it becomes 100% overhead.
> > > > >
> > > > > not synthetic, it is a real problem in our product. under no load, waste
> > > > > much cpu time.
> > > >
> > > > What happens in your product if you apply the commit I pointed out?
> > >
> > > under no load, cpu usage is up to 60%, but the same apps cost 10% on
> > > susp sp1. The apps use a lot of timer.
> > Something is rotten. 3.14-rt contains that commit, I ran your test with
> > 256 threads on 64 core box, saw ~4%.
> >
> > Putting master/nopreempt config on box and doing the same test, box is
> > chewing up truckloads of CPU, but not from migrations.
> >
> > perf top -g --sort=symbol
> in my box:
>
> perf top -g --sort=symbol
>
> Events: 3K cycles
> 73.27% [k] read_hpet
Why is that machine using read_hpet() ?
Please provide the output of
# dmesg | grep -i tsc
and
# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
and
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
Thanks,
tglx
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-27 13:20 ` Mike Galbraith
@ 2014-05-28 1:04 ` Libo Chen
2014-05-28 1:53 ` Mike Galbraith
0 siblings, 1 reply; 33+ messages in thread
From: Libo Chen @ 2014-05-28 1:04 UTC (permalink / raw)
To: Mike Galbraith; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan, peterz, Huang Qiang
On 2014/5/27 21:20, Mike Galbraith wrote:
> On Tue, 2014-05-27 at 20:50 +0800, Libo Chen wrote:
>
>> in my box:
>>
>> perf top -g --sort=symbol
>>
>> Events: 3K cycles
>> 73.27% [k] read_hpet
>> 4.30% [k] _raw_spin_lock_irqsave
>> 1.88% [k] __schedule
>> 1.00% [k] idle_cpu
>> 0.91% [k] native_write_msr_safe
>> 0.68% [k] select_task_rq_fair
>> 0.51% [k] module_get_kallsym
>> 0.49% [.] sem_post
>> 0.44% [.] main
>> 0.41% [k] menu_select
>> 0.39% [k] _raw_spin_lock
>> 0.38% [k] __switch_to
>> 0.33% [k] _raw_spin_lock_irq
>> 0.32% [k] format_decode
>> 0.29% [.] usleep
>> 0.28% [.] symbols__insert
>> 0.27% [k] tick_nohz_stop_sched_tick
>> 0.27% [k] update_stats_wait_end
>> 0.26% [k] apic_timer_interrupt
>> 0.25% [k] enqueue_entity
>> 0.25% [k] sched_clock_local
>> 0.24% [k] _raw_spin_unlock_irqrestore
>> 0.24% [k] select_idle_sibling
>
> read_hpet? Are you booting box notsc or something? Migration cost is
> the least of your worries.
Oh yes, no TSC, only HPET in my box. I don't know why read_hpet is so hot,
but when I bind the tasks to specific CPUs, the cost drops rapidly, yet perf
still shows read_hpet as hot.
after bind
Events: 561K cycles
64.18% [kernel] [k] read_hpet
5.51% usleep [.] main
2.71% [kernel] [k] __schedule
1.82% [kernel] [k] _raw_spin_lock_irqsave
1.56% libc-2.11.3.so [.] usleep
1.07% [kernel] [k] apic_timer_interrupt
0.89% libc-2.11.3.so [.] __GI___libc_nanosleep
0.82% [kernel] [k] native_write_msr_safe
0.82% [kernel] [k] ktime_get
0.71% [kernel] [k] trace_hardirqs_off
0.63% [kernel] [k] __switch_to
0.60% [kernel] [k] _raw_spin_unlock_irqrestore
0.47% [kernel] [k] menu_select
0.46% [kernel] [k] _raw_spin_lock
0.45% [kernel] [k] enqueue_entity
0.45% [kernel] [k] sched_clock_local
0.43% [kernel] [k] try_to_wake_up
0.42% [kernel] [k] hrtimer_nanosleep
0.36% [kernel] [k] do_nanosleep
0.35% [kernel] [k] _raw_spin_lock_irq
0.34% [kernel] [k] rb_insert_color
0.29% [kernel] [k] update_curr
0.29% [kernel] [k] native_sched_clock
0.28% [kernel] [k] hrtimer_interrupt
0.28% [kernel] [k] rcu_idle_exit_common
0.27% [kernel] [k] hrtimer_init
0.27% [kernel] [k] __hrtimer_start_range_ns
0.26% [kernel] [k] __rb_erase_color
0.26% [kernel] [k] lock_hrtimer_base
0.25% [kernel] [k] trace_hardirqs_on
0.23% [kernel] [k] rcu_idle_enter_common
0.23% [kernel] [k] cpuidle_idle_call
0.23% [kernel] [k] finish_task_switch
0.22% [kernel] [k] set_next_entity
0.22% [kernel] [k] cpuacct_charge
0.22% [kernel] [k] pick_next_task_fair
0.21% [kernel] [k] sys_nanosleep
0.20% [kernel] [k] rb_next
0.20% [kernel] [k] start_critical_timings
>
> -Mike
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-27 20:53 ` Thomas Gleixner
@ 2014-05-28 1:06 ` Libo Chen
0 siblings, 0 replies; 33+ messages in thread
From: Libo Chen @ 2014-05-28 1:06 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Mike Galbraith, mingo, LKML, Greg KH, Li Zefan, peterz, Huang Qiang
On 2014/5/28 4:53, Thomas Gleixner wrote:
> On Tue, 27 May 2014, Libo Chen wrote:
>> On 2014/5/27 17:55, Mike Galbraith wrote:
>>> On Tue, 2014-05-27 at 15:56 +0800, Libo Chen wrote:
>>>> On 2014/5/26 22:19, Mike Galbraith wrote:
>>>>> On Mon, 2014-05-26 at 20:16 +0800, Libo Chen wrote:
>>>>>> On 2014/5/26 13:11, Mike Galbraith wrote:
>>>>>>
>>>>>>> Your synthetic test is the absolute worst case scenario. There has to
>>>>>>> be work between wakeups for select_idle_sibling() to have any chance
>>>>>>> whatsoever of turning in a win. At 0 work, it becomes 100% overhead.
>>>>>>
>>>>>> not synthetic, it is a real problem in our product. under no load, waste
>>>>>> much cpu time.
>>>>>
>>>>> What happens in your product if you apply the commit I pointed out?
>>>>
>>>> under no load, cpu usage is up to 60%, but the same apps cost 10% on
>>>> susp sp1. The apps use a lot of timer.
>>> Something is rotten. 3.14-rt contains that commit, I ran your test with
>>> 256 threads on 64 core box, saw ~4%.
>>>
>>> Putting master/nopreempt config on box and doing the same test, box is
>>> chewing up truckloads of CPU, but not from migrations.
>>>
>>> perf top -g --sort=symbol
>> in my box:
>>
>> perf top -g --sort=symbol
>>
>> Events: 3K cycles
>> 73.27% [k] read_hpet
>
> Why is that machine using read_hpet() ?
>
> Please provide the output of
>
> # dmesg | grep -i tsc
>
Euler:/home # dmesg | grep -i tsc
[ 0.000000] Fast TSC calibration using PIT
[ 0.226921] TSC synchronization [CPU#0 -> CPU#1]:
[ 0.227142] Measured 1053728 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.008000] Marking TSC unstable due to check_tsc_sync_source failed
> and
>
> # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm
>
> and
>
> # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
>
> Thanks,
>
> tglx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-28 1:04 ` Libo Chen
@ 2014-05-28 1:53 ` Mike Galbraith
2014-05-28 6:54 ` Libo Chen
0 siblings, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2014-05-28 1:53 UTC (permalink / raw)
To: Libo Chen; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan, peterz, Huang Qiang
On Wed, 2014-05-28 at 09:04 +0800, Libo Chen wrote:
> oh yes, no tsc only hpet in my box.
Making poor E5-2658 box a crippled wreck.
-Mike
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-28 1:53 ` Mike Galbraith
@ 2014-05-28 6:54 ` Libo Chen
2014-05-28 8:16 ` Mike Galbraith
2014-05-28 9:08 ` Thomas Gleixner
0 siblings, 2 replies; 33+ messages in thread
From: Libo Chen @ 2014-05-28 6:54 UTC (permalink / raw)
To: Mike Galbraith; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan, peterz, Huang Qiang
On 2014/5/28 9:53, Mike Galbraith wrote:
> On Wed, 2014-05-28 at 09:04 +0800, Libo Chen wrote:
>
>> oh yes, no tsc only hpet in my box.
>
> Making poor E5-2658 box a crippled wreck.
Yes, it is. But CPU usage drops from 15% to 5% when binding to CPUs, so maybe
read_hpet is not the root cause.
thanks,
Libo
>
> -Mike
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-28 6:54 ` Libo Chen
@ 2014-05-28 8:16 ` Mike Galbraith
2014-05-28 9:08 ` Thomas Gleixner
1 sibling, 0 replies; 33+ messages in thread
From: Mike Galbraith @ 2014-05-28 8:16 UTC (permalink / raw)
To: Libo Chen; +Cc: tglx, mingo, LKML, Greg KH, Li Zefan, peterz, Huang Qiang
On Wed, 2014-05-28 at 14:54 +0800, Libo Chen wrote:
> On 2014/5/28 9:53, Mike Galbraith wrote:
> > On Wed, 2014-05-28 at 09:04 +0800, Libo Chen wrote:
> >
> >> oh yes, no tsc only hpet in my box.
> >
> > Making poor E5-2658 box a crippled wreck.
>
> yes,it is. But cpu usage will be down from 15% to 5% when binding cpu, so maybe read_hpet
> is not the root cause.
I don't think anyone will be particularly interested in making kernel
changes based upon the behavior of a broken box. The problem we were
discussing is real enough though, it's just a question of severity.
-Mike
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-28 6:54 ` Libo Chen
2014-05-28 8:16 ` Mike Galbraith
@ 2014-05-28 9:08 ` Thomas Gleixner
2014-05-28 10:30 ` Peter Zijlstra
` (2 more replies)
1 sibling, 3 replies; 33+ messages in thread
From: Thomas Gleixner @ 2014-05-28 9:08 UTC (permalink / raw)
To: Libo Chen
Cc: Mike Galbraith, mingo, LKML, Greg KH, Li Zefan, peterz, Huang Qiang
On Wed, 28 May 2014, Libo Chen wrote:
> On 2014/5/28 9:53, Mike Galbraith wrote:
> > On Wed, 2014-05-28 at 09:04 +0800, Libo Chen wrote:
> >
> >> oh yes, no tsc only hpet in my box.
> >
> > Making poor E5-2658 box a crippled wreck.
>
> yes,it is. But cpu usage will be down from 15% to 5% when binding
> cpu, so maybe read_hpet is not the root cause.
Definitely hpet _IS_ the root cause on a machine as large as this,
simply because everything gets serialized on the hpet access.
Binding stuff to cpus just makes the timing behaviour different, so
the hpet serialization is not that prominent, but still bad enough.
Talk to your HW/BIOS vendor. The kernel cannot do anything about
defunct hardware.
Thanks,
tglx
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-28 9:08 ` Thomas Gleixner
@ 2014-05-28 10:30 ` Peter Zijlstra
2014-05-28 10:52 ` Borislav Petkov
2014-05-28 11:43 ` Libo Chen
2014-05-29 7:57 ` Libo Chen
2 siblings, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2014-05-28 10:30 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Libo Chen, Mike Galbraith, mingo, LKML, Greg KH, Li Zefan,
Huang Qiang, bp
[-- Attachment #1: Type: text/plain, Size: 3758 bytes --]
On Wed, May 28, 2014 at 11:08:40AM +0200, Thomas Gleixner wrote:
> On Wed, 28 May 2014, Libo Chen wrote:
>
> > On 2014/5/28 9:53, Mike Galbraith wrote:
> > > On Wed, 2014-05-28 at 09:04 +0800, Libo Chen wrote:
> > >
> > >> oh yes, no tsc only hpet in my box.
> > >
> > > Making poor E5-2658 box a crippled wreck.
> >
> > yes,it is. But cpu usage will be down from 15% to 5% when binding
> > cpu, so maybe read_hpet is not the root cause.
>
> Definitely hpet _IS_ the root cause on a machine as large as this,
> simply because everything gets serialized on the hpet access.
>
> Binding stuff to cpus just makes the timing behaviour different, so
> the hpet serialization is not that prominent, but still bad enough.
>
> Talk to your HW/BIOS vendor. The kernel cannot do anything about
> defunct hardware.
---
Subject: x86: FW_BUG when the TSC goes funny on hardware where it really should be stable
It happens far too often on 'consumer' grade hardware, and sometimes on
'enterprise' too that the TSC gets marked unstable due to FW fuckage,
complain more loudly in this case.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
arch/x86/include/asm/tsc.h | 1 +
arch/x86/kernel/cpu/amd.c | 4 +++-
arch/x86/kernel/cpu/intel.c | 4 +++-
arch/x86/kernel/tsc.c | 7 +++++++
4 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 94605c0e9cee..e33853ee0416 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -52,6 +52,7 @@ extern int check_tsc_unstable(void);
extern int check_tsc_disabled(void);
extern unsigned long native_calibrate_tsc(void);
+extern int tsc_should_be_reliable;
extern int tsc_clocksource_reliable;
/*
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index ce8b8ff0e0ef..46012d2ca5a1 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -483,8 +483,10 @@ static void early_init_amd(struct cpuinfo_x86 *c)
if (c->x86_power & (1 << 8)) {
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
- if (!check_tsc_unstable())
+ if (!check_tsc_unstable()) {
+ tsc_should_be_reliable = 1;
set_sched_clock_stable();
+ }
}
#ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index a80029035bf2..2273ca1166bc 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -88,8 +88,10 @@ static void early_init_intel(struct cpuinfo_x86 *c)
if (c->x86_power & (1 << 8)) {
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
- if (!check_tsc_unstable())
+ if (!check_tsc_unstable()) {
+ tsc_should_be_reliable = 1;
set_sched_clock_stable();
+ }
}
/* Penwell and Cloverview have the TSC which doesn't sleep on S3 */
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 57e5ce126d5a..1f93827561d8 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -40,6 +40,7 @@ static int __read_mostly tsc_disabled = -1;
static struct static_key __use_tsc = STATIC_KEY_INIT;
+int tsc_should_be_reliable;
int tsc_clocksource_reliable;
/*
@@ -994,6 +995,12 @@ void mark_tsc_unstable(char *reason)
clear_sched_clock_stable();
disable_sched_clock_irqtime();
pr_info("Marking TSC unstable due to %s\n", reason);
+
+ if (tsc_should_be_reliable) {
+ pr_err(FW_BUG "TSC unstable even though it should be; "
+ "HW/BIOS broken, contact your vendor.\n");
+ }
+
/* Change only the rating, when not registered */
if (clocksource_tsc.mult)
clocksource_mark_unstable(&clocksource_tsc);
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-28 10:30 ` Peter Zijlstra
@ 2014-05-28 10:52 ` Borislav Petkov
0 siblings, 0 replies; 33+ messages in thread
From: Borislav Petkov @ 2014-05-28 10:52 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Gleixner, Libo Chen, Mike Galbraith, mingo, LKML, Greg KH,
Li Zefan, Huang Qiang
On Wed, May 28, 2014 at 12:30:19PM +0200, Peter Zijlstra wrote:
> On Wed, May 28, 2014 at 11:08:40AM +0200, Thomas Gleixner wrote:
> > On Wed, 28 May 2014, Libo Chen wrote:
> >
> > > On 2014/5/28 9:53, Mike Galbraith wrote:
> > > > On Wed, 2014-05-28 at 09:04 +0800, Libo Chen wrote:
> > > >
> > > >> oh yes, no tsc only hpet in my box.
> > > >
> > > > Making poor E5-2658 box a crippled wreck.
> > >
> > > yes,it is. But cpu usage will be down from 15% to 5% when binding
> > > cpu, so maybe read_hpet is not the root cause.
> >
> > Definitely hpet _IS_ the root cause on a machine as large as this,
> > simply because everything gets serialized on the hpet access.
> >
> > Binding stuff to cpus just makes the timing behaviour different, so
> > the hpet serialization is not that prominent, but still bad enough.
> >
> > Talk to your HW/BIOS vendor. The kernel cannot do anything about
> > defunct hardware.
>
> ---
> Subject: x86: FW_BUG when the TSC goes funny on hardware where it really should be stable
>
> It happens far too often on 'consumer' grade hardware, and sometimes on
> 'enterprise' too that the TSC gets marked unstable due to FW fuckage,
> complain more loudly in this case.
>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Borislav Petkov <bp@suse.de>
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-28 9:08 ` Thomas Gleixner
2014-05-28 10:30 ` Peter Zijlstra
@ 2014-05-28 11:43 ` Libo Chen
2014-05-28 11:55 ` Mike Galbraith
2014-05-29 7:57 ` Libo Chen
2 siblings, 1 reply; 33+ messages in thread
From: Libo Chen @ 2014-05-28 11:43 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Mike Galbraith, mingo, LKML, Greg KH, Li Zefan, peterz,
Huang Qiang, Peter Zijlstra, Borislav Petkov, Greg KH
On 2014/5/28 17:08, Thomas Gleixner wrote:
> On Wed, 28 May 2014, Libo Chen wrote:
>
>> On 2014/5/28 9:53, Mike Galbraith wrote:
>>> On Wed, 2014-05-28 at 09:04 +0800, Libo Chen wrote:
>>>
>>>> oh yes, no tsc only hpet in my box.
>>>
>>> Making poor E5-2658 box a crippled wreck.
>>
>> yes,it is. But cpu usage will be down from 15% to 5% when binding
>> cpu, so maybe read_hpet is not the root cause.
>
> Definitely hpet _IS_ the root cause on a machine as large as this,
> simply because everything gets serialized on the hpet access.
>
> Binding stuff to cpus just makes the timing behaviour different, so
> the hpet serialization is not that prominent, but still bad enough.
>
> Talk to your HW/BIOS vendor. The kernel cannot do anything about
> defunct hardware.
Thank you for your reply, but SUSE SP2 behaves well in this scenario.
Could there have been a bug that the community fixed later, so that
this is just a coincidence?
Libo
>
> Thanks,
>
> tglx
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-28 11:43 ` Libo Chen
@ 2014-05-28 11:55 ` Mike Galbraith
2014-05-29 7:58 ` Libo Chen
0 siblings, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2014-05-28 11:55 UTC (permalink / raw)
To: Libo Chen
Cc: Thomas Gleixner, mingo, LKML, Greg KH, Li Zefan, peterz,
Huang Qiang, Borislav Petkov
On Wed, 2014-05-28 at 19:43 +0800, Libo Chen wrote:
> On 2014/5/28 17:08, Thomas Gleixner wrote:
> > On Wed, 28 May 2014, Libo Chen wrote:
> >
> >> On 2014/5/28 9:53, Mike Galbraith wrote:
> >>> On Wed, 2014-05-28 at 09:04 +0800, Libo Chen wrote:
> >>>
> >>>> oh yes, no tsc only hpet in my box.
> >>>
> >>> Making poor E5-2658 box a crippled wreck.
> >>
> >> yes,it is. But cpu usage will be down from 15% to 5% when binding
> >> cpu, so maybe read_hpet is not the root cause.
> >
> > Definitely hpet _IS_ the root cause on a machine as large as this,
> > simply because everything gets serialized on the hpet access.
> >
> > Binding stuff to cpus just makes the timing behaviour different, so
> > the hpet serialization is not that prominent, but still bad enough.
> >
> > Talk to your HW/BIOS vendor. The kernel cannot do anything about
> > defunct hardware.
>
> thank you for your reply.but suse sp2 is very good in this scene.
> Can it be said there has a bug, then community fix it later,
> so it's just a coincidence?
I'm quite sure it's because of the patches I mentioned.
-Mike
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-28 9:08 ` Thomas Gleixner
2014-05-28 10:30 ` Peter Zijlstra
2014-05-28 11:43 ` Libo Chen
@ 2014-05-29 7:57 ` Libo Chen
2 siblings, 0 replies; 33+ messages in thread
From: Libo Chen @ 2014-05-29 7:57 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Mike Galbraith, mingo, LKML, Greg KH, Li Zefan, peterz, Huang Qiang
On 2014/5/28 17:08, Thomas Gleixner wrote:
> On Wed, 28 May 2014, Libo Chen wrote:
>
>> On 2014/5/28 9:53, Mike Galbraith wrote:
>>> On Wed, 2014-05-28 at 09:04 +0800, Libo Chen wrote:
>>>
>>>> oh yes, no tsc only hpet in my box.
>>>
>>> Making poor E5-2658 box a crippled wreck.
>>
>> yes,it is. But cpu usage will be down from 15% to 5% when binding
>> cpu, so maybe read_hpet is not the root cause.
>
> Definitely hpet _IS_ the root cause on a machine as large as this,
> simply because everything gets serialized on the hpet access.
>
> Binding stuff to cpus just makes the timing behaviour different, so
> the hpet serialization is not that prominent, but still bad enough.
>
> Talk to your HW/BIOS vendor. The kernel cannot do anything about
> defunct hardware.
I got it!
thanks,
Libo
>
> Thanks,
>
> tglx
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: balance storm
2014-05-28 11:55 ` Mike Galbraith
@ 2014-05-29 7:58 ` Libo Chen
0 siblings, 0 replies; 33+ messages in thread
From: Libo Chen @ 2014-05-29 7:58 UTC (permalink / raw)
To: Mike Galbraith
Cc: Thomas Gleixner, mingo, LKML, Greg KH, Li Zefan, peterz,
Huang Qiang, Borislav Petkov
On 2014/5/28 19:55, Mike Galbraith wrote:
> On Wed, 2014-05-28 at 19:43 +0800, Libo Chen wrote:
>> On 2014/5/28 17:08, Thomas Gleixner wrote:
>>> On Wed, 28 May 2014, Libo Chen wrote:
>>>
>>>> On 2014/5/28 9:53, Mike Galbraith wrote:
>>>>> On Wed, 2014-05-28 at 09:04 +0800, Libo Chen wrote:
>>>>>
>>>>>> oh yes, no tsc only hpet in my box.
>>>>>
>>>>> Making poor E5-2658 box a crippled wreck.
>>>>
>>>> yes,it is. But cpu usage will be down from 15% to 5% when binding
>>>> cpu, so maybe read_hpet is not the root cause.
>>>
>>> Definitely hpet _IS_ the root cause on a machine as large as this,
>>> simply because everything gets serialized on the hpet access.
>>>
>>> Binding stuff to cpus just makes the timing behaviour different, so
>>> the hpet serialization is not that prominent, but still bad enough.
>>>
>>> Talk to your HW/BIOS vendor. The kernel cannot do anything about
>>> defunct hardware.
>>
>> thank you for your reply.but suse sp2 is very good in this scene.
>> Can it be said there has a bug, then community fix it later,
>> so it's just a coincidence?
>
> I'm quite sure it's because of the patches I mentioned.
I see
thanks,
Libo
>
> -Mike
^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2014-05-29 7:59 UTC | newest]
Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-26 3:04 balance storm Libo Chen
2014-05-26 5:11 ` Mike Galbraith
2014-05-26 12:16 ` Libo Chen
2014-05-26 14:19 ` Mike Galbraith
2014-05-27 7:56 ` Libo Chen
2014-05-27 9:55 ` Mike Galbraith
2014-05-27 12:50 ` Libo Chen
2014-05-27 13:20 ` Mike Galbraith
2014-05-28 1:04 ` Libo Chen
2014-05-28 1:53 ` Mike Galbraith
2014-05-28 6:54 ` Libo Chen
2014-05-28 8:16 ` Mike Galbraith
2014-05-28 9:08 ` Thomas Gleixner
2014-05-28 10:30 ` Peter Zijlstra
2014-05-28 10:52 ` Borislav Petkov
2014-05-28 11:43 ` Libo Chen
2014-05-28 11:55 ` Mike Galbraith
2014-05-29 7:58 ` Libo Chen
2014-05-29 7:57 ` Libo Chen
2014-05-27 20:53 ` Thomas Gleixner
2014-05-28 1:06 ` Libo Chen
2014-05-26 7:56 ` Mike Galbraith
2014-05-26 11:49 ` Libo Chen
2014-05-26 14:03 ` Mike Galbraith
2014-05-27 7:44 ` Libo Chen
2014-05-27 8:12 ` Mike Galbraith
2014-05-27 9:48 ` Peter Zijlstra
2014-05-27 10:05 ` Mike Galbraith
2014-05-27 10:43 ` Peter Zijlstra
2014-05-27 10:55 ` Mike Galbraith
2014-05-27 12:56 ` Libo Chen
2014-05-27 12:55 ` Libo Chen
2014-05-27 13:13 ` Peter Zijlstra