* /proc/stat information incorrect
@ 2012-04-16 8:04 Bas van der Oest
2012-04-16 12:36 ` Martin Schwidefsky
0 siblings, 1 reply; 5+ messages in thread
From: Bas van der Oest @ 2012-04-16 8:04 UTC (permalink / raw)
To: linux-kernel
Hi,
I am trying to figure out what my benchmark system is spending CPU
time on by looking at the /proc/stat information provided by the
kernel.
I manually pin my benchmark application (in this case an
Ethernet benchmark) to CPUs 4-7, and the interrupts are assigned to CPU5.
I am running kernel version 3.2.8 on an 8-core Nehalem system
(2x Intel Xeon E5540).
The interrupts are generated by a 10Gb Ethernet adapter (Intel AT2).
I benchmark for a ten second period and take the /proc/stat
information before and after this period.
The /proc/stat information looks like this:
before:
cpu  10384 0 136128 14417335 15115 6 21414 0 0 0
cpu0 371 0 1136 1808846 14606 3 398 0 0 0
cpu1 238 0 451 1827606 126 0 50 0 0 0
cpu2 89 0 345 1828021 39 0 15 0 0 0
cpu3 314 0 524 1827629 26 0 10 0 0 0
cpu4 268 0 3008 1804530 106 2 1827 0 0 0
cpu5 2320 0 33947 1767454 99 0 18941 0 0 0
cpu6 3429 0 48610 1776287 70 0 76 0 0 0
cpu7 3352 0 48105 1776957 41 0 95 0 0 0
after:
cpu  10389 0 136562 14425011 15115 6 21467 0 0 0
cpu0 371 0 1136 1809913 14606 3 398 0 0 0
cpu1 238 0 452 1828676 126 0 50 0 0 0
cpu2 89 0 345 1829092 39 0 15 0 0 0
cpu3 315 0 524 1828699 26 0 10 0 0 0
cpu4 269 0 3102 1805504 106 2 1827 0 0 0
cpu5 2322 0 34039 1767989 99 0 18993 0 0 0
cpu6 3429 0 48692 1777274 70 0 76 0 0 0
cpu7 3353 0 48270 1777862 41 0 95 0 0 0
After dropping some all-zero columns and taking the difference between
the two snapshots, the reformatted result is:
user nice system idle iowait irq softirq sum
cpu 5 0 434 7676 0 0 53 8168
cpu0 0 0 0 1067 0 0 0 1067
cpu1 0 0 1 1070 0 0 0 1071
cpu2 0 0 0 1071 0 0 0 1071
cpu3 1 0 0 1070 0 0 0 1071
cpu4 1 0 94 974 0 0 0 1069
cpu5 2 0 92 535 0 0 52 681
cpu6 0 0 82 987 0 0 0 1069
cpu7 1 0 165 905 0 0 0 1071
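For reference, the delta table can be reproduced from two /proc/stat snapshots with a short script (a sketch; it keeps only the first seven counters, matching the columns above):

```python
# Diff two /proc/stat snapshots and append a "sum" column per CPU.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]

def parse(snapshot):
    """Map 'cpuN' -> list of the first seven counters."""
    stats = {}
    for line in snapshot.strip().splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu"):
            stats[parts[0]] = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return stats

def diff(before, after):
    """Per-CPU counter deltas, with the total appended as a sum column."""
    b, a = parse(before), parse(after)
    out = {}
    for cpu in b:
        delta = [y - x for x, y in zip(b[cpu], a[cpu])]
        out[cpu] = delta + [sum(delta)]
    return out

# The cpu5 row from the snapshots above:
before = "cpu5 2320 0 33947 1767454 99 0 18941 0 0 0"
after  = "cpu5 2322 0 34039 1767989 99 0 18993 0 0 0"
print(diff(before, after)["cpu5"])  # -> [2, 0, 92, 535, 0, 0, 52, 681]
```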
I added a sum column that totals the time spent in the different
modes, so the table shows how long each CPU spent in each mode.
Now I am wondering how it is possible that CPU5 accounts for much less
time than all the other CPUs. I expected all CPUs to account for
roughly the same total time (10s). This total includes idle time, so
it is not explained by the difference between active and idle CPUs.
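As a back-of-the-envelope check (assuming USER_HZ = 100, the usual value): each CPU should accumulate about window x USER_HZ ticks across all modes, so rows of ~1070 suggest the actual window was closer to 10.7 s, and cpu5's 681 is clearly short:

```python
# /proc/stat counts in USER_HZ ticks; 100 per second is the typical
# value (the exact figure is sysconf(_SC_CLK_TCK)).
USER_HZ = 100   # assumption for this system
window_s = 10   # nominal measurement window from the experiment
cpus = 8

per_cpu = window_s * USER_HZ    # expected ticks per CPU for the window
total = per_cpu * cpus          # expected ticks in the aggregate line
print(per_cpu, total)           # -> 1000 8000
```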
I know for a fact that this effect is tied to which CPU handles my
IRQs: whichever CPU I map the interrupts to shows the same behaviour.
I looked up the scheduler's statistics handling in the kernel source
but was not able to find any cause for the above mentioned effect.
Can anyone reproduce this behaviour?
Does anyone know where/what might be the cause of this?
Regards,
Bas
* Re: /proc/stat information incorrect
2012-04-16 8:04 /proc/stat information incorrect Bas van der Oest
@ 2012-04-16 12:36 ` Martin Schwidefsky
2012-04-17 8:04 ` Bas van der Oest
0 siblings, 1 reply; 5+ messages in thread
From: Martin Schwidefsky @ 2012-04-16 12:36 UTC (permalink / raw)
To: Bas van der Oest; +Cc: linux-kernel
On Mon, 16 Apr 2012 10:04:48 +0200
Bas van der Oest <bassvdo@gmail.com> wrote:
> After dropping some zero columns and taking the difference between the
> statistics the reformatted result is:
> user nice system idle iowait irq softirq sum
> cpu 5 0 434 7676 0 0 53 8168
> cpu0 0 0 0 1067 0 0 0 1067
> cpu1 0 0 1 1070 0 0 0 1071
> cpu2 0 0 0 1071 0 0 0 1071
> cpu3 1 0 0 1070 0 0 0 1071
> cpu4 1 0 94 974 0 0 0 1069
> cpu5 2 0 92 535 0 0 52 681
> cpu6 0 0 82 987 0 0 0 1069
> cpu7 1 0 165 905 0 0 0 1071
>
> I added a sum column that totals the time spent in the different
> modes, so the table shows how long each CPU spent in each mode.
> Now I am wondering how it is possible that CPU5 accounts for much less
> time than all the other CPUs. I expected all CPUs to account for
> roughly the same total time (10s). This total includes idle time, so
> it is not explained by the difference between active and idle CPUs.
>
> I know for a fact that this effect is tied to which CPU handles my
> IRQs: whichever CPU I map the interrupts to shows the same behaviour.
> I looked up the scheduler's statistics handling in the kernel source
> but was not able to find any cause for the above mentioned effect.
>
> Can anyone reproduce this behaviour?
> Does anyone know where/what might be the cause of this?
Assuming that you are on a recent kernel (>= 3.2), could you please
try to revert git commit a25cac5198d4ff28 "proc: Consider NO_HZ when
printing idle and iowait times" and try again?
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
* Re: /proc/stat information incorrect
2012-04-16 12:36 ` Martin Schwidefsky
@ 2012-04-17 8:04 ` Bas van der Oest
2012-04-20 12:06 ` Martin Schwidefsky
0 siblings, 1 reply; 5+ messages in thread
From: Bas van der Oest @ 2012-04-17 8:04 UTC (permalink / raw)
To: Martin Schwidefsky; +Cc: linux-kernel
> Assuming that you are on a recent kernel (>= 3.2), could you please
> try to revert git commit a25cac5198d4ff28 "proc: Consider NO_HZ when
> printing idle and iowait times" and try again?
I reverted the commit and this certainly had an effect; however, I
still doubt that the result is completely correct.
It now seems that CPU4 (which is now handling the IRQs) accounts for
more time than the other CPUs:
user nice system idle iowait irq softirq sum
cpu 4 0 412 8186 0 0 79 8681
cpu0 0 0 1 1074 0 0 1 1076
cpu1 0 0 2 1075 0 0 0 1077
cpu2 0 0 1 1075 0 0 0 1076
cpu3 0 0 1 1079 0 0 0 1080
cpu4 1 0 115 941 0 0 78 1135
cpu5 2 0 99 984 0 0 0 1085
cpu6 0 0 98 977 0 0 0 1075
cpu7 0 0 95 980 0 0 0 1075
Any suggestions where this small difference is coming from?
* Re: /proc/stat information incorrect
2012-04-17 8:04 ` Bas van der Oest
@ 2012-04-20 12:06 ` Martin Schwidefsky
2012-04-25 7:24 ` Bas van der Oest
0 siblings, 1 reply; 5+ messages in thread
From: Martin Schwidefsky @ 2012-04-20 12:06 UTC (permalink / raw)
To: Bas van der Oest; +Cc: linux-kernel
On Tue, 17 Apr 2012 10:04:35 +0200
Bas van der Oest <bassvdo@gmail.com> wrote:
> > Assuming that you are on a recent kernel (>= 3.2), could you please
> > try to revert git commit a25cac5198d4ff28 "proc: Consider NO_HZ when
> > printing idle and iowait times" and try again?
>
> I reverted the commit and this certainly had an effect; however, I
> still doubt that the result is completely correct.
> It now seems that CPU4 (which is now handling the IRQs) accounts for
> more time than the other CPUs:
>
> user nice system idle iowait irq softirq sum
> cpu 4 0 412 8186 0 0 79 8681
> cpu0 0 0 1 1074 0 0 1 1076
> cpu1 0 0 2 1075 0 0 0 1077
> cpu2 0 0 1 1075 0 0 0 1076
> cpu3 0 0 1 1079 0 0 0 1080
> cpu4 1 0 115 941 0 0 78 1135
> cpu5 2 0 99 984 0 0 0 1085
> cpu6 0 0 98 977 0 0 0 1075
> cpu7 0 0 95 980 0 0 0 1075
>
> Any suggestions where this small difference is coming from?
CONFIG_IRQ_TIME_ACCOUNTING comes to mind. Is this enabled in your
kernel?
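One way to check without rebuilding is to look the option up in the kernel config (a sketch; config paths vary by distribution, and /proc/config.gz only exists when CONFIG_IKCONFIG_PROC is set):

```python
# Look up a kernel config option in the usual config locations.
import gzip
import os
import platform

def config_option(name):
    """Return the option's value ('y', 'm', ...) or None if not found."""
    candidates = ["/proc/config.gz",
                  "/boot/config-" + platform.release()]
    for path in candidates:
        if not os.path.exists(path):
            continue
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as f:
            for line in f:
                if line.startswith(name + "="):
                    return line.strip().split("=", 1)[1]
    return None  # option unset, or no config file available

print(config_option("CONFIG_IRQ_TIME_ACCOUNTING"))
```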
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
* Re: /proc/stat information incorrect
2012-04-20 12:06 ` Martin Schwidefsky
@ 2012-04-25 7:24 ` Bas van der Oest
0 siblings, 0 replies; 5+ messages in thread
From: Bas van der Oest @ 2012-04-25 7:24 UTC (permalink / raw)
To: Martin Schwidefsky; +Cc: linux-kernel
Thanks for your consideration Martin.
> CONFIG_IRQ_TIME_ACCOUNTING comes to mind. Is this enabled in your
> kernel?
This option was not yet enabled.
I tried it both with and without the commit you mentioned
(a25cac5198d4ff28).
The results are still not entirely as expected:
With patch, with CONFIG_IRQ_TIME_ACCOUNTING:
user nice system idle iowait irq softirq sum
cpu 5 0 373 7610 0 55 382 8425
cpu0 0 0 1 1042 0 18 1 1062
cpu1 0 0 1 1058 0 1 0 1060
cpu2 0 0 1 1060 0 0 0 1061
cpu3 0 0 0 1061 0 1 0 1062
cpu4 2 0 75 505 0 36 380 998
cpu5 1 0 82 983 0 0 0 1066
cpu6 1 0 139 911 0 0 0 1051
cpu7 1 0 74 988 0 0 0 1063
Patch reverted, with CONFIG_IRQ_TIME_ACCOUNTING:
user nice system idle iowait irq softirq sum
cpu 8 0 461 7854 0 55 448 8826
cpu0 0 0 1 1056 0 15 1 1073
cpu1 0 0 1 1073 0 0 1 1075
cpu2 1 0 1 1073 0 0 0 1075
cpu3 0 0 1 1217 0 0 1 1219
cpu4 3 0 147 511 0 39 447 1147
cpu5 1 0 106 967 0 0 0 1074
cpu6 1 0 99 972 0 0 0 1072
cpu7 1 0 106 985 0 1 1 1094
Looking at the sum column, some unexpected behaviour is still visible.
CONFIG_IRQ_TIME_ACCOUNTING seems to help, but the interrupt-handling
CPU (CPU4) still appears "lazy" when the original code is used (patch
applied).
With the patch reverted, some other CPUs seem to account for more time
than is actually available.
Neither scenario should be possible, imho.