Q: dlm_recoverd takes 100%

From: Heiko Nardmann @ 2012-08-28  8:54 UTC
  To: linux-kernel

Hi all,

if this is the wrong list for this question, could someone tell me which 
mailing list I should contact instead?

In a two-node cluster I see 'dlm_recoverd' using 100% of one CPU for 
around 6 minutes. Here is a small excerpt from the 'top' output during 
that period:

top - 10:51:01 up 3 days, 17:21,  5 users,  load average: 10.19, 5.39, 2.76
Tasks: 536 total,   3 running, 533 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.2%us,  6.6%sy,  0.0%ni, 92.1%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  12183344k total, 11827540k used,   355804k free,   160332k buffers
Swap: 14417912k total,        0k used, 14417912k free,  8299364k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
  3121 root      20   0     0    0    0 R 100.0  0.0   3:36.15 dlm_recoverd

The cluster nodes use a shared SAN with GFS2. The second node was being 
rebooted while I observed this behaviour. The real problem is that my 
application is unable to open a file on the SAN during these 6 minutes. 
Once the second node has finished rebooting, everything is fine again 
and the application succeeds in opening the file. I am not sure what 
causes these two symptoms.

Thanks in advance for any hint!

Kind regards,

     Heiko
