* bogus utime and stime in /proc/<PID>/stat - possibly related to fs/proc/array.c change
From: Alec Matusis @ 2012-07-10 22:17 UTC
  To: linux-kernel

I run a number of Linux servers and have noticed an interesting bug, possibly
related to a recent change in fs/proc/array.c.

After upgrading from Ubuntu kernel 2.6.24-26 to 2.6.32-40 (and higher), I
noticed that about once per month a user process causing the main load on a
given machine suddenly disappears from "top", although it continues to run
normally (perhaps with a slight performance decrease). After this, the load
average of the system remains the same, but top shows no running processes
causing the load. This happened on a variety of new IBM System X machines,
all running different tasks (httpd 2.2, mysqld 5.1, Twisted Python TCP
servers).

I looked at a problematic process, and discovered that ps -o pcpu showed
crazily large numbers:

# ps -o pcpu,pid,cmd -p1587
%CPU   PID CMD
317713124 1587 /nail/encap/mysql-5.1.60/libexec/mysqld

Then I looked at: 

# cat /proc/1587/stat
 1587 (mysqld) S 1212 1088 1088 0 -1 4202752 14307313 0 162 0 85773299069
4611685932654088833 0 0 20 0 52 0 3549 27255418880 5483524
18446744073709551615 4194304 11111617 140733749236976 140733749235984
8858659 0 552967 4102 26345 18446744073709551615 0 0 17 5 0 0 0 0
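
As a minimal sketch, the utime and stime values can be pulled out of such a
line in Python, assuming the field layout documented in proc(5). The helper
name parse_stat_line is hypothetical, and SAMPLE is just the line quoted
above, rejoined after mail line-wrapping.

```python
def parse_stat_line(line):
    """Split one /proc/<PID>/stat line into a list of string fields.

    Index 0 is field 1 (pid) in proc(5) numbering, so utime (field 14)
    is at index 13 and stime (field 15) is at index 14.
    """
    # comm (field 2) is wrapped in parentheses and may itself contain spaces
    # or ')' characters, so split around the last ')' instead of on whitespace.
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition("(")
    return [pid.strip(), comm] + tail.split()


# The line quoted in the report, rejoined into a single string.
SAMPLE = (
    "1587 (mysqld) S 1212 1088 1088 0 -1 4202752 14307313 0 162 0 85773299069 "
    "4611685932654088833 0 0 20 0 52 0 3549 27255418880 5483524 "
    "18446744073709551615 4194304 11111617 140733749236976 140733749235984 "
    "8858659 0 552967 4102 26345 18446744073709551615 0 0 17 5 0 0 0 0"
)

fields = parse_stat_line(SAMPLE)
utime, stime = int(fields[13]), int(fields[14])
print("comm:", fields[1], "utime:", utime, "stime:", stime)
```

For a live process, the same function can be fed the contents of
/proc/<PID>/stat read with open(...).read().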

I noticed that the 14th and 15th entries, 85773299069 and 4611685932654088833
(utime and stime), had become abnormally large and were stuck. When the
server is in the normal state (i.e. the load-causing process shows up in top
and ps -o pcpu shows a reasonable %CPU), these numbers are up to 13 orders of
magnitude smaller, e.g. 416786 and 602262, and they advance by about 10 per
second.
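
To make the scale concrete, here is a rough plausibility check in Python. It
is only a sketch under stated assumptions: USER_HZ is taken to be 100 ticks
per second (verify with `getconf CLK_TCK`), and the uptime and CPU count are
hypothetical, since neither is given here. The function names are
illustrative too. The idea is simply that a process cannot have consumed
more CPU time than uptime multiplied by the number of CPUs.

```python
USER_HZ = 100  # assumption: clock ticks per second; check with `getconf CLK_TCK`


def ticks_to_seconds(ticks, hz=USER_HZ):
    """Convert a utime/stime tick count from /proc/<PID>/stat to CPU seconds."""
    return ticks / hz


def is_plausible(utime, stime, uptime_seconds, ncpus, hz=USER_HZ):
    """Return True if utime + stime fits within uptime_seconds * ncpus."""
    cpu_seconds = ticks_to_seconds(utime + stime, hz)
    return 0 <= cpu_seconds <= uptime_seconds * ncpus


# (utime, stime) pairs quoted in the report.
BOGUS = (85773299069, 4611685932654088833)
NORMAL = (416786, 602262)

# Hypothetical machine: 30 days of uptime and 16 CPUs (not stated in the report).
UPTIME_SECONDS = 30 * 86400
NCPUS = 16

for label, (utime, stime) in (("bogus", BOGUS), ("normal", NORMAL)):
    ok = is_plausible(utime, stime, UPTIME_SECONDS, NCPUS)
    cpu_seconds = ticks_to_seconds(utime + stime)
    print(label, cpu_seconds, "plausible" if ok else "implausible")
```

Under these assumptions the normal pair amounts to a few hours of CPU time,
while the stuck stime alone corresponds to more than a billion years.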

I do not understand what causes this problem, except that machines running
2.6.24-26 or earlier do not show this behavior, and fs/proc/array.c has
changed since then.

I wrote this up in detail at
http://serverfault.com/questions/406489/load-causing-processes-disappearing-from-top-ps-o-pcpu-shows-bogus-numbers

If you have any comment on this, it'd be highly appreciated.

Thank you.


Alec Matusis



