linux-kernel.vger.kernel.org archive mirror
* /proc/stat information incorrect
@ 2012-04-16  8:04 Bas van der Oest
  2012-04-16 12:36 ` Martin Schwidefsky
  0 siblings, 1 reply; 5+ messages in thread
From: Bas van der Oest @ 2012-04-16  8:04 UTC (permalink / raw)
  To: linux-kernel

Hi,

I am trying to figure out what my benchmark system is spending CPU
time on by looking at the /proc/stat information provided by the
kernel.
I manually pin my benchmark application (in this case an
Ethernet benchmark) to CPU4-7, and the interrupts are assigned to CPU5.
I am running kernel version 3.2.8 on an 8-core Nehalem-architecture
system (2x Intel Xeon E5540).
The interrupts are generated by a 10Gb Ethernet adapter (Intel AT2).
I benchmark for a ten-second period and take the /proc/stat
information before and after this period.

The /proc/stat information looks like this:
before:
cpu     10384   0       136128  14417335  15115   6       21414   0       0       0
cpu0    371     0       1136    1808846   14606   3       398     0       0       0
cpu1    238     0       451     1827606   126     0       50      0       0       0
cpu2    89      0       345     1828021   39      0       15      0       0       0
cpu3    314     0       524     1827629   26      0       10      0       0       0
cpu4    268     0       3008    1804530   106     2       1827    0       0       0
cpu5    2320    0       33947   1767454   99      0       18941   0       0       0
cpu6    3429    0       48610   1776287   70      0       76      0       0       0
cpu7    3352    0       48105   1776957   41      0       95      0       0       0
after:
cpu     10389   0       136562  14425011  15115   6       21467   0       0       0
cpu0    371     0       1136    1809913   14606   3       398     0       0       0
cpu1    238     0       452     1828676   126     0       50      0       0       0
cpu2    89      0       345     1829092   39      0       15      0       0       0
cpu3    315     0       524     1828699   26      0       10      0       0       0
cpu4    269     0       3102    1805504   106     2       1827    0       0       0
cpu5    2322    0       34039   1767989   99      0       18993   0       0       0
cpu6    3429    0       48692   1777274   70      0       76      0       0       0
cpu7    3353    0       48270   1777862   41      0       95      0       0       0

After dropping the all-zero columns and taking the difference between
the two snapshots, the reformatted result is:
       user    nice    system  idle    iowait  irq     softirq sum
cpu     5       0       434     7676    0       0       53      8168
cpu0    0       0       0       1067    0       0       0       1067
cpu1    0       0       1       1070    0       0       0       1071
cpu2    0       0       0       1071    0       0       0       1071
cpu3    1       0       0       1070    0       0       0       1071
cpu4    1       0       94      974     0       0       0       1069
cpu5    2       0       92      535     0       0       52      681
cpu6    0       0       82      987     0       0       0       1069
cpu7    1       0       165     905     0       0       0       1071

I added a sum column which totals the time spent in the different
modes; the table now shows how many ticks each CPU spent in each mode.
Now I am wondering how it is possible that CPU5 accumulated much less
time than all the other CPUs. I expected every CPU to account for
roughly the same total time (10 s). Since this total includes idle
time, the difference cannot be explained by some CPUs being busier
than others.

I know for a fact that this effect is tied to which CPU handles my
IRQs; the effect follows whichever CPU I map the interrupts to.
I looked at the scheduler's statistics handling in the kernel source
but was not able to find a cause for the effect described above.

Can anyone reproduce this behaviour?
Does anyone know where/what might be the cause of this?

Regards,

Bas


* Re: /proc/stat information incorrect
  2012-04-16  8:04 /proc/stat information incorrect Bas van der Oest
@ 2012-04-16 12:36 ` Martin Schwidefsky
  2012-04-17  8:04   ` Bas van der Oest
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Schwidefsky @ 2012-04-16 12:36 UTC (permalink / raw)
  To: Bas van der Oest; +Cc: linux-kernel

On Mon, 16 Apr 2012 10:04:48 +0200
Bas van der Oest <bassvdo@gmail.com> wrote:

> After dropping the all-zero columns and taking the difference between
> the two snapshots, the reformatted result is:
>        user    nice    system  idle    iowait  irq     softirq sum
> cpu     5       0       434     7676    0       0       53      8168
> cpu0    0       0       0       1067    0       0       0       1067
> cpu1    0       0       1       1070    0       0       0       1071
> cpu2    0       0       0       1071    0       0       0       1071
> cpu3    1       0       0       1070    0       0       0       1071
> cpu4    1       0       94      974     0       0       0       1069
> cpu5    2       0       92      535     0       0       52      681
> cpu6    0       0       82      987     0       0       0       1069
> cpu7    1       0       165     905     0       0       0       1071
> 
> I added a sum column which totals the time spent in the different
> modes; the table now shows how many ticks each CPU spent in each mode.
> Now I am wondering how it is possible that CPU5 accumulated much less
> time than all the other CPUs. I expected every CPU to account for
> roughly the same total time (10 s). Since this total includes idle
> time, the difference cannot be explained by some CPUs being busier
> than others.
> 
> I know for a fact that this effect is tied to which CPU handles my
> IRQs; the effect follows whichever CPU I map the interrupts to.
> I looked at the scheduler's statistics handling in the kernel source
> but was not able to find a cause for the effect described above.
> 
> Can anyone reproduce this behaviour?
> Does anyone know where/what might be the cause of this?

Assuming that you are on a recent kernel (>= 3.2), could you please
try to revert git commit a25cac5198d4ff28 "proc: Consider NO_HZ when
printing idle and iowait times" and try again?

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: /proc/stat information incorrect
  2012-04-16 12:36 ` Martin Schwidefsky
@ 2012-04-17  8:04   ` Bas van der Oest
  2012-04-20 12:06     ` Martin Schwidefsky
  0 siblings, 1 reply; 5+ messages in thread
From: Bas van der Oest @ 2012-04-17  8:04 UTC (permalink / raw)
  To: Martin Schwidefsky; +Cc: linux-kernel

> Assuming that you are on a recent kernel (>= 3.2), could you please
> try to revert git commit a25cac5198d4ff28 "proc: Consider NO_HZ when
> printing idle and iowait times" and try again?

I reverted the patch and it certainly had an effect; however, I still
doubt that the result is completely correct.
It now seems that CPU4 (which is now handling the IRQs) accounts for
more time than the other CPUs:

        user    nice    system  idle    iowait  irq   softirq sum
cpu     4       0       412     8186    0       0       79      8681
cpu0    0       0       1       1074    0       0       1       1076
cpu1    0       0       2       1075    0       0       0       1077
cpu2    0       0       1       1075    0       0       0       1076
cpu3    0       0       1       1079    0       0       0       1080
cpu4    1       0       115     941     0       0       78      1135
cpu5    2       0       99      984     0       0       0       1085
cpu6    0       0       98      977     0       0       0       1075
cpu7    0       0       95      980     0       0       0       1075

Any suggestions where this small difference is coming from?
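
One way to put a number on "more time than the other CPUs" is to
compare each CPU's tick total against the median across all CPUs (a
quick sketch; the values are the sum column from the table above, and
the helper name is illustrative — /proc/stat counts in USER_HZ ticks,
typically 100 per second):

```python
import statistics

def deviation_from_typical(sums):
    """Ticks each CPU deviates from the median total across all CPUs."""
    typical = statistics.median(sums.values())
    return {cpu: total - typical for cpu, total in sums.items()}

# Sum column from the table above (ticks over the ~10 s window):
sums = {"cpu0": 1076, "cpu1": 1077, "cpu2": 1076, "cpu3": 1080,
        "cpu4": 1135, "cpu5": 1085, "cpu6": 1075, "cpu7": 1075}
```

Here cpu4, the IRQ-handling CPU, sits about 58 ticks (roughly 0.6 s at
USER_HZ=100) above the median.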


* Re: /proc/stat information incorrect
  2012-04-17  8:04   ` Bas van der Oest
@ 2012-04-20 12:06     ` Martin Schwidefsky
  2012-04-25  7:24       ` Bas van der Oest
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Schwidefsky @ 2012-04-20 12:06 UTC (permalink / raw)
  To: Bas van der Oest; +Cc: linux-kernel

On Tue, 17 Apr 2012 10:04:35 +0200
Bas van der Oest <bassvdo@gmail.com> wrote:

> > Assuming that you are on a recent kernel (>= 3.2), could you please
> > try to revert git commit a25cac5198d4ff28 "proc: Consider NO_HZ when
> > printing idle and iowait times" and try again?
> 
> I reverted the patch and it certainly had an effect; however, I still
> doubt that the result is completely correct.
> It now seems that CPU4 (which is now handling the IRQs) accounts for
> more time than the other CPUs:
> 
>         user    nice    system  idle    iowait  irq   softirq sum
> cpu     4       0       412     8186    0       0       79      8681
> cpu0    0       0       1       1074    0       0       1       1076
> cpu1    0       0       2       1075    0       0       0       1077
> cpu2    0       0       1       1075    0       0       0       1076
> cpu3    0       0       1       1079    0       0       0       1080
> cpu4    1       0       115     941     0       0       78      1135
> cpu5    2       0       99      984     0       0       0       1085
> cpu6    0       0       98      977     0       0       0       1075
> cpu7    0       0       95      980     0       0       0       1075
> 
> Any suggestions where this small difference is coming from?

CONFIG_IRQ_TIME_ACCOUNTING comes to mind. Is this enabled in your
kernel?
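
For anyone following along, the option can be looked up in the running
kernel's config (e.g. /boot/config-$(uname -r), or /proc/config.gz if
CONFIG_IKCONFIG_PROC is set); a small sketch of the parsing, with a
hypothetical helper name:

```python
# Check a kernel config option in text such as /boot/config-$(uname -r).
def config_state(config_text, option):
    """Return the option's value ('y', 'm', a number, ...),
    'n' for an explicit '# ... is not set' line, or None if absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None
```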

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: /proc/stat information incorrect
  2012-04-20 12:06     ` Martin Schwidefsky
@ 2012-04-25  7:24       ` Bas van der Oest
  0 siblings, 0 replies; 5+ messages in thread
From: Bas van der Oest @ 2012-04-25  7:24 UTC (permalink / raw)
  To: Martin Schwidefsky; +Cc: linux-kernel

Thanks for your consideration, Martin.
> CONFIG_IRQ_TIME_ACCOUNTING comes to mind. Is this enabled in your
> kernel?

This configuration option was not yet enabled.
I tried it both with and without the patch you mentioned
(commit a25cac5198d4ff28).
The results are still not completely as expected:

With the patch and with CONFIG_IRQ_TIME_ACCOUNTING:
        user    nice    system  idle    iowait  irq     softirq sum
cpu     5       0       373     7610    0       55      382     8425
cpu0    0       0       1       1042    0       18      1       1062
cpu1    0       0       1       1058    0       1       0       1060
cpu2    0       0       1       1060    0       0       0       1061
cpu3    0       0       0       1061    0       1       0       1062
cpu4    2       0       75      505     0       36      380     998
cpu5    1       0       82      983     0       0       0       1066
cpu6    1       0       139     911     0       0       0       1051
cpu7    1       0       74      988     0       0       0       1063

With the patch reverted and with CONFIG_IRQ_TIME_ACCOUNTING:
        user    nice    system  idle    iowait  irq     softirq sum
cpu     8       0       461     7854    0       55      448     8826
cpu0    0       0       1       1056    0       15      1       1073
cpu1    0       0       1       1073    0       0       1       1075
cpu2    1       0       1       1073    0       0       0       1075
cpu3    0       0       1       1217    0       0       1       1219
cpu4    3       0       147     511     0       39      447     1147
cpu5    1       0       106     967     0       0       0       1074
cpu6    1       0       99      972     0       0       0       1072
cpu7    1       0       106     985     0       1       1       1094

Looking at the sum column, some unexpected behaviour can still be
seen. CONFIG_IRQ_TIME_ACCOUNTING seems to help, but the
interrupt-handling CPU (CPU4) still accounts for too little time when
the original code is used (with the patch).
When the patch is reverted, some other CPUs seem to account for more
time than is actually available.
Neither scenario should be possible, imho.

