All of lore.kernel.org
 help / color / mirror / Atom feed
* Fwd: cgroup OOM killer loop causes system to lockup (possible fix included)
@ 2011-05-29 21:50 ` Cal Leeming [Simplicity Media Ltd]
  0 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-29 21:50 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users

  First of all, my apologies if I have submitted this problem to the 
wrong place, spent 20 minutes trying to figure out where it needs to be 
sent, and was still none the wiser.

The problem is related to applying memory limitations within a cgroup. 
If the OOM killer kicks in, it gets stuck in a loop where it tries to 
kill a process which has an oom_adj of -17. This causes an infinite 
loop, which in turn locks up the system.

May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of 
memory: kill process 6016 (java) score 0 or a child
May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of 
memory: kill process 6016 (java) score 0 or a child
May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of 
memory: kill process 6016 (java) score 0 or a child
May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of 
memory: kill process 6016 (java) score 0 or a child
May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of 
memory: kill process 6016 (java) score 0 or a child


  root@vicky [/home/foxx] > uname -a
Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
GNU/Linux
(this happens on both the grsec patched and non patched 2.6.32.41 kernel)

When this is encountered, the memory usage across the whole server is 
still within limits (not even hitting swap).

The memory configuration for the cgroup/lxc is:
lxc.cgroup.memory.limit_in_bytes = 3000M
lxc.cgroup.memory.memsw.limit_in_bytes = 3128M

Now, what is even more strange, is that when running under the 2.6.32.28 
kernel (both patched and unpatched), this problem doesn't happen. 
However, there is a slight difference between the two kernels. The 
2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj, where as 
the 2.6.32.41 gives a default of -17. I suspect this is the root cause 
of why it's showing in the later kernel, but not the earlier.

To test this theory, I started up the lxc on both servers, and then ran 
a one liner which showed me all the processes with an oom_adj of -17:

(the below is the older/working kernel)
root@courtney.internal [/mnt/encstore/lxc] > uname -a
Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT 
2011 x86_64 GNU/Linux
root@courtney.internal [/mnt/encstore/lxc] > for x in `find /proc -iname 
'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do ps -p $x 
--no-headers ; done
grep: /proc/1411/task/1411/oom_adj: No such file or directory
grep: /proc/1411/oom_adj: No such file or directory
   804 ?        00:00:00 udevd
   804 ?        00:00:00 udevd
25536 ?        00:00:00 sshd
25536 ?        00:00:00 sshd
31861 ?        00:00:00 sshd
31861 ?        00:00:00 sshd
32173 ?        00:00:00 udevd
32173 ?        00:00:00 udevd
32174 ?        00:00:00 udevd
32174 ?        00:00:00 udevd

(the below is the newer/broken kernel)
  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > uname -a
Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
GNU/Linux
  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > for x in `find 
/proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; 
do ps -p $x --no-headers ; done
grep: /proc/3118/task/3118/oom_adj: No such file or directory
grep: /proc/3118/oom_adj: No such file or directory
   895 ?        00:00:00 udevd
   895 ?        00:00:00 udevd
  1091 ?        00:00:00 udevd
  1091 ?        00:00:00 udevd
  1092 ?        00:00:00 udevd
  1092 ?        00:00:00 udevd
  2596 ?        00:00:00 sshd
  2596 ?        00:00:00 sshd
  2608 ?        00:00:00 sshd
  2608 ?        00:00:00 sshd
  2613 ?        00:00:00 sshd
  2613 ?        00:00:00 sshd
  2614 pts/0    00:00:00 bash
  2614 pts/0    00:00:00 bash
  2620 pts/0    00:00:00 sudo
  2620 pts/0    00:00:00 sudo
  2621 pts/0    00:00:00 su
  2621 pts/0    00:00:00 su
  2622 pts/0    00:00:00 bash
  2622 pts/0    00:00:00 bash
  2685 ?        00:00:00 lxc-start
  2685 ?        00:00:00 lxc-start
  2699 ?        00:00:00 init
  2699 ?        00:00:00 init
  2939 ?        00:00:00 rc
  2939 ?        00:00:00 rc
  2942 ?        00:00:00 startpar
  2942 ?        00:00:00 startpar
  2964 ?        00:00:00 rsyslogd
  2964 ?        00:00:00 rsyslogd
  2964 ?        00:00:00 rsyslogd
  2964 ?        00:00:00 rsyslogd
  2980 ?        00:00:00 startpar
  2980 ?        00:00:00 startpar
  2981 ?        00:00:00 ctlscript.sh
  2981 ?        00:00:00 ctlscript.sh
  3016 ?        00:00:00 cron
  3016 ?        00:00:00 cron
  3025 ?        00:00:00 mysqld_safe
  3025 ?        00:00:00 mysqld_safe
  3032 ?        00:00:00 sshd
  3032 ?        00:00:00 sshd
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3113 ?        00:00:00 ctl.sh
  3113 ?        00:00:00 ctl.sh
  3115 ?        00:00:00 sleep
  3115 ?        00:00:00 sleep
  3116 ?        00:00:00 .memcached.bin
  3116 ?        00:00:00 .memcached.bin


As you can see, it is clear that the newer kernel is setting -17 by 
default, which in turn is causing the OOM killer loop.

So I began to try and find what may have caused this problem by 
comparing the two sources...

I checked the code for all references to 'oom_adj' and 'oom_adjust' in 
both code sets, but found no obvious differences:
grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e oom_adj

Then I checked for references to "-17" in all .c and .h files, and found 
a couple of matches, but only one obvious one:
grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
./include/linux/oom.h:#define OOM_DISABLE (-17)

But again, a search for OOM_DISABLE came up with nothing obvious...

In a last ditch attempt, I did a search for all references to 'oom' 
(case-insensitive) in both code bases, then compared the two:
  root@annabelle [~/lol/linux-2.6.32.28] > grep -i -R "oom" . | sort -n 
 > /tmp/annabelle.oom_adj
  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > grep -i -R 
"oom" . | sort -n > /tmp/vicky.oom_adj

and this brought back (yet again) nothing obvious..


  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
./include/linux/oom.h
2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h

  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./include/linux/oom.h
2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h

  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
./mm/oom_kill.c
1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c

  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./mm/oom_kill.c
1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c



Could anyone please shed some light as to why the default oom_adj is set 
to -17 now (and where it is actually set)? From what I can tell, the fix 
for this issue will either be:

   1. Allow OOM killer to override the decision of ignoring oom_adj ==
      -17 if an unrecoverable loop is encountered.
   2. Change the default back to 0.

Again, my apologies if this bug report is slightly unorthodox, or 
doesn't follow usual procedure etc. I can assure you I have tried my 
absolute best to give all the necessary information though.

Cal


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Fwd: cgroup OOM killer loop causes system to lockup (possible fix included)
@ 2011-05-29 21:50 ` Cal Leeming [Simplicity Media Ltd]
  0 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-29 21:50 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users

  First of all, my apologies if I have submitted this problem to the 
wrong place, spent 20 minutes trying to figure out where it needs to be 
sent, and was still none the wiser.

The problem is related to applying memory limitations within a cgroup. 
If the OOM killer kicks in, it gets stuck in a loop where it tries to 
kill a process which has an oom_adj of -17. This causes an infinite 
loop, which in turn locks up the system.

May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of 
memory: kill process 6016 (java) score 0 or a child
May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of 
memory: kill process 6016 (java) score 0 or a child
May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of 
memory: kill process 6016 (java) score 0 or a child
May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of 
memory: kill process 6016 (java) score 0 or a child
May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of 
memory: kill process 6016 (java) score 0 or a child


  root@vicky [/home/foxx] > uname -a
Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
GNU/Linux
(this happens on both the grsec patched and non patched 2.6.32.41 kernel)

When this is encountered, the memory usage across the whole server is 
still within limits (not even hitting swap).

The memory configuration for the cgroup/lxc is:
lxc.cgroup.memory.limit_in_bytes = 3000M
lxc.cgroup.memory.memsw.limit_in_bytes = 3128M

Now, what is even more strange, is that when running under the 2.6.32.28 
kernel (both patched and unpatched), this problem doesn't happen. 
However, there is a slight difference between the two kernels. The 
2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj, where as 
the 2.6.32.41 gives a default of -17. I suspect this is the root cause 
of why it's showing in the later kernel, but not the earlier.

To test this theory, I started up the lxc on both servers, and then ran 
a one liner which showed me all the processes with an oom_adj of -17:

(the below is the older/working kernel)
root@courtney.internal [/mnt/encstore/lxc] > uname -a
Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT 
2011 x86_64 GNU/Linux
root@courtney.internal [/mnt/encstore/lxc] > for x in `find /proc -iname 
'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do ps -p $x 
--no-headers ; done
grep: /proc/1411/task/1411/oom_adj: No such file or directory
grep: /proc/1411/oom_adj: No such file or directory
   804 ?        00:00:00 udevd
   804 ?        00:00:00 udevd
25536 ?        00:00:00 sshd
25536 ?        00:00:00 sshd
31861 ?        00:00:00 sshd
31861 ?        00:00:00 sshd
32173 ?        00:00:00 udevd
32173 ?        00:00:00 udevd
32174 ?        00:00:00 udevd
32174 ?        00:00:00 udevd

(the below is the newer/broken kernel)
  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > uname -a
Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
GNU/Linux
  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > for x in `find 
/proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; 
do ps -p $x --no-headers ; done
grep: /proc/3118/task/3118/oom_adj: No such file or directory
grep: /proc/3118/oom_adj: No such file or directory
   895 ?        00:00:00 udevd
   895 ?        00:00:00 udevd
  1091 ?        00:00:00 udevd
  1091 ?        00:00:00 udevd
  1092 ?        00:00:00 udevd
  1092 ?        00:00:00 udevd
  2596 ?        00:00:00 sshd
  2596 ?        00:00:00 sshd
  2608 ?        00:00:00 sshd
  2608 ?        00:00:00 sshd
  2613 ?        00:00:00 sshd
  2613 ?        00:00:00 sshd
  2614 pts/0    00:00:00 bash
  2614 pts/0    00:00:00 bash
  2620 pts/0    00:00:00 sudo
  2620 pts/0    00:00:00 sudo
  2621 pts/0    00:00:00 su
  2621 pts/0    00:00:00 su
  2622 pts/0    00:00:00 bash
  2622 pts/0    00:00:00 bash
  2685 ?        00:00:00 lxc-start
  2685 ?        00:00:00 lxc-start
  2699 ?        00:00:00 init
  2699 ?        00:00:00 init
  2939 ?        00:00:00 rc
  2939 ?        00:00:00 rc
  2942 ?        00:00:00 startpar
  2942 ?        00:00:00 startpar
  2964 ?        00:00:00 rsyslogd
  2964 ?        00:00:00 rsyslogd
  2964 ?        00:00:00 rsyslogd
  2964 ?        00:00:00 rsyslogd
  2980 ?        00:00:00 startpar
  2980 ?        00:00:00 startpar
  2981 ?        00:00:00 ctlscript.sh
  2981 ?        00:00:00 ctlscript.sh
  3016 ?        00:00:00 cron
  3016 ?        00:00:00 cron
  3025 ?        00:00:00 mysqld_safe
  3025 ?        00:00:00 mysqld_safe
  3032 ?        00:00:00 sshd
  3032 ?        00:00:00 sshd
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3097 ?        00:00:00 mysqld.bin
  3113 ?        00:00:00 ctl.sh
  3113 ?        00:00:00 ctl.sh
  3115 ?        00:00:00 sleep
  3115 ?        00:00:00 sleep
  3116 ?        00:00:00 .memcached.bin
  3116 ?        00:00:00 .memcached.bin


As you can see, it is clear that the newer kernel is setting -17 by 
default, which in turn is causing the OOM killer loop.

So I began to try and find what may have caused this problem by 
comparing the two sources...

I checked the code for all references to 'oom_adj' and 'oom_adjust' in 
both code sets, but found no obvious differences:
grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e oom_adj

Then I checked for references to "-17" in all .c and .h files, and found 
a couple of matches, but only one obvious one:
grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
./include/linux/oom.h:#define OOM_DISABLE (-17)

But again, a search for OOM_DISABLE came up with nothing obvious...

In a last ditch attempt, I did a search for all references to 'oom' 
(case-insensitive) in both code bases, then compared the two:
  root@annabelle [~/lol/linux-2.6.32.28] > grep -i -R "oom" . | sort -n 
 > /tmp/annabelle.oom_adj
  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > grep -i -R 
"oom" . | sort -n > /tmp/vicky.oom_adj

and this brought back (yet again) nothing obvious..


  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
./include/linux/oom.h
2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h

  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./include/linux/oom.h
2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h

  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
./mm/oom_kill.c
1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c

  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./mm/oom_kill.c
1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c



Could anyone please shed some light as to why the default oom_adj is set 
to -17 now (and where it is actually set)? From what I can tell, the fix 
for this issue will either be:

   1. Allow OOM killer to override the decision of ignoring oom_adj ==
      -17 if an unrecoverable loop is encountered.
   2. Change the default back to 0.

Again, my apologies if this bug report is slightly unorthodox, or 
doesn't follow usual procedure etc. I can assure you I have tried my 
absolute best to give all the necessary information though.

Cal


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Fwd: cgroup OOM killer loop causes system to lockup (possible fix included)
  2011-05-29 21:50 ` Cal Leeming [Simplicity Media Ltd]
@ 2011-05-29 22:24   ` Cal Leeming [Simplicity Media Ltd]
  -1 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-29 22:24 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users

Some further logs:
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283


On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
>  First of all, my apologies if I have submitted this problem to the 
> wrong place, spent 20 minutes trying to figure out where it needs to 
> be sent, and was still none the wiser.
>
> The problem is related to applying memory limitations within a cgroup. 
> If the OOM killer kicks in, it gets stuck in a loop where it tries to 
> kill a process which has an oom_adj of -17. This causes an infinite 
> loop, which in turn locks up the system.
>
> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
>
>
>  root@vicky [/home/foxx] > uname -a
> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
> GNU/Linux
> (this happens on both the grsec patched and non patched 2.6.32.41 kernel)
>
> When this is encountered, the memory usage across the whole server is 
> still within limits (not even hitting swap).
>
> The memory configuration for the cgroup/lxc is:
> lxc.cgroup.memory.limit_in_bytes = 3000M
> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
>
> Now, what is even more strange, is that when running under the 
> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't 
> happen. However, there is a slight difference between the two kernels. 
> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj, 
> where as the 2.6.32.41 gives a default of -17. I suspect this is the 
> root cause of why it's showing in the later kernel, but not the earlier.
>
> To test this theory, I started up the lxc on both servers, and then 
> ran a one liner which showed me all the processes with an oom_adj of -17:
>
> (the below is the older/working kernel)
> root@courtney.internal [/mnt/encstore/lxc] > uname -a
> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT 
> 2011 x86_64 GNU/Linux
> root@courtney.internal [/mnt/encstore/lxc] > for x in `find /proc 
> -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do 
> ps -p $x --no-headers ; done
> grep: /proc/1411/task/1411/oom_adj: No such file or directory
> grep: /proc/1411/oom_adj: No such file or directory
>   804 ?        00:00:00 udevd
>   804 ?        00:00:00 udevd
> 25536 ?        00:00:00 sshd
> 25536 ?        00:00:00 sshd
> 31861 ?        00:00:00 sshd
> 31861 ?        00:00:00 sshd
> 32173 ?        00:00:00 udevd
> 32173 ?        00:00:00 udevd
> 32174 ?        00:00:00 udevd
> 32174 ?        00:00:00 udevd
>
> (the below is the newer/broken kernel)
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > uname -a
> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
> GNU/Linux
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > for x in 
> `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print 
> $3}'` ; do ps -p $x --no-headers ; done
> grep: /proc/3118/task/3118/oom_adj: No such file or directory
> grep: /proc/3118/oom_adj: No such file or directory
>   895 ?        00:00:00 udevd
>   895 ?        00:00:00 udevd
>  1091 ?        00:00:00 udevd
>  1091 ?        00:00:00 udevd
>  1092 ?        00:00:00 udevd
>  1092 ?        00:00:00 udevd
>  2596 ?        00:00:00 sshd
>  2596 ?        00:00:00 sshd
>  2608 ?        00:00:00 sshd
>  2608 ?        00:00:00 sshd
>  2613 ?        00:00:00 sshd
>  2613 ?        00:00:00 sshd
>  2614 pts/0    00:00:00 bash
>  2614 pts/0    00:00:00 bash
>  2620 pts/0    00:00:00 sudo
>  2620 pts/0    00:00:00 sudo
>  2621 pts/0    00:00:00 su
>  2621 pts/0    00:00:00 su
>  2622 pts/0    00:00:00 bash
>  2622 pts/0    00:00:00 bash
>  2685 ?        00:00:00 lxc-start
>  2685 ?        00:00:00 lxc-start
>  2699 ?        00:00:00 init
>  2699 ?        00:00:00 init
>  2939 ?        00:00:00 rc
>  2939 ?        00:00:00 rc
>  2942 ?        00:00:00 startpar
>  2942 ?        00:00:00 startpar
>  2964 ?        00:00:00 rsyslogd
>  2964 ?        00:00:00 rsyslogd
>  2964 ?        00:00:00 rsyslogd
>  2964 ?        00:00:00 rsyslogd
>  2980 ?        00:00:00 startpar
>  2980 ?        00:00:00 startpar
>  2981 ?        00:00:00 ctlscript.sh
>  2981 ?        00:00:00 ctlscript.sh
>  3016 ?        00:00:00 cron
>  3016 ?        00:00:00 cron
>  3025 ?        00:00:00 mysqld_safe
>  3025 ?        00:00:00 mysqld_safe
>  3032 ?        00:00:00 sshd
>  3032 ?        00:00:00 sshd
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3113 ?        00:00:00 ctl.sh
>  3113 ?        00:00:00 ctl.sh
>  3115 ?        00:00:00 sleep
>  3115 ?        00:00:00 sleep
>  3116 ?        00:00:00 .memcached.bin
>  3116 ?        00:00:00 .memcached.bin
>
>
> As you can see, it is clear that the newer kernel is setting -17 by 
> default, which in turn is causing the OOM killer loop.
>
> So I began to try and find what may have caused this problem by 
> comparing the two sources...
>
> I checked the code for all references to 'oom_adj' and 'oom_adjust' in 
> both code sets, but found no obvious differences:
> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e 
> oom_adj
>
> Then I checked for references to "-17" in all .c and .h files, and 
> found a couple of matches, but only one obvious one:
> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
> ./include/linux/oom.h:#define OOM_DISABLE (-17)
>
> But again, a search for OOM_DISABLE came up with nothing obvious...
>
> In a last ditch attempt, I did a search for all references to 'oom' 
> (case-insensitive) in both code bases, then compared the two:
>  root@annabelle [~/lol/linux-2.6.32.28] > grep -i -R "oom" . | sort -n 
> > /tmp/annabelle.oom_adj
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > grep -i -R 
> "oom" . | sort -n > /tmp/vicky.oom_adj
>
> and this brought back (yet again) nothing obvious..
>
>
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
> ./include/linux/oom.h
> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>
>  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./include/linux/oom.h
> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
> ./mm/oom_kill.c
> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>
>  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./mm/oom_kill.c
> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>
>
>
> Could anyone please shed some light as to why the default oom_adj is 
> set to -17 now (and where it is actually set)? From what I can tell, 
> the fix for this issue will either be:
>
>   1. Allow OOM killer to override the decision of ignoring oom_adj ==
>      -17 if an unrecoverable loop is encountered.
>   2. Change the default back to 0.
>
> Again, my apologies if this bug report is slightly unorthodox, or 
> doesn't follow usual procedure etc. I can assure you I have tried my 
> absolute best to give all the necessary information though.
>
> Cal
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Fwd: cgroup OOM killer loop causes system to lockup (possible fix included)
@ 2011-05-29 22:24   ` Cal Leeming [Simplicity Media Ltd]
  0 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-29 22:24 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users

Some further logs:
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283


On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
>  First of all, my apologies if I have submitted this problem to the 
> wrong place, spent 20 minutes trying to figure out where it needs to 
> be sent, and was still none the wiser.
>
> The problem is related to applying memory limitations within a cgroup. 
> If the OOM killer kicks in, it gets stuck in a loop where it tries to 
> kill a process which has an oom_adj of -17. This causes an infinite 
> loop, which in turn locks up the system.
>
> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
>
>
>  root@vicky [/home/foxx] > uname -a
> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
> GNU/Linux
> (this happens on both the grsec patched and non patched 2.6.32.41 kernel)
>
> When this is encountered, the memory usage across the whole server is 
> still within limits (not even hitting swap).
>
> The memory configuration for the cgroup/lxc is:
> lxc.cgroup.memory.limit_in_bytes = 3000M
> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
>
> Now, what is even more strange, is that when running under the 
> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't 
> happen. However, there is a slight difference between the two kernels. 
> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj, 
> where as the 2.6.32.41 gives a default of -17. I suspect this is the 
> root cause of why it's showing in the later kernel, but not the earlier.
>
> To test this theory, I started up the lxc on both servers, and then 
> ran a one liner which showed me all the processes with an oom_adj of -17:
>
> (the below is the older/working kernel)
> root@courtney.internal [/mnt/encstore/lxc] > uname -a
> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT 
> 2011 x86_64 GNU/Linux
> root@courtney.internal [/mnt/encstore/lxc] > for x in `find /proc 
> -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do 
> ps -p $x --no-headers ; done
> grep: /proc/1411/task/1411/oom_adj: No such file or directory
> grep: /proc/1411/oom_adj: No such file or directory
>   804 ?        00:00:00 udevd
>   804 ?        00:00:00 udevd
> 25536 ?        00:00:00 sshd
> 25536 ?        00:00:00 sshd
> 31861 ?        00:00:00 sshd
> 31861 ?        00:00:00 sshd
> 32173 ?        00:00:00 udevd
> 32173 ?        00:00:00 udevd
> 32174 ?        00:00:00 udevd
> 32174 ?        00:00:00 udevd
>
> (the below is the newer/broken kernel)
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > uname -a
> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
> GNU/Linux
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > for x in 
> `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print 
> $3}'` ; do ps -p $x --no-headers ; done
> grep: /proc/3118/task/3118/oom_adj: No such file or directory
> grep: /proc/3118/oom_adj: No such file or directory
>   895 ?        00:00:00 udevd
>   895 ?        00:00:00 udevd
>  1091 ?        00:00:00 udevd
>  1091 ?        00:00:00 udevd
>  1092 ?        00:00:00 udevd
>  1092 ?        00:00:00 udevd
>  2596 ?        00:00:00 sshd
>  2596 ?        00:00:00 sshd
>  2608 ?        00:00:00 sshd
>  2608 ?        00:00:00 sshd
>  2613 ?        00:00:00 sshd
>  2613 ?        00:00:00 sshd
>  2614 pts/0    00:00:00 bash
>  2614 pts/0    00:00:00 bash
>  2620 pts/0    00:00:00 sudo
>  2620 pts/0    00:00:00 sudo
>  2621 pts/0    00:00:00 su
>  2621 pts/0    00:00:00 su
>  2622 pts/0    00:00:00 bash
>  2622 pts/0    00:00:00 bash
>  2685 ?        00:00:00 lxc-start
>  2685 ?        00:00:00 lxc-start
>  2699 ?        00:00:00 init
>  2699 ?        00:00:00 init
>  2939 ?        00:00:00 rc
>  2939 ?        00:00:00 rc
>  2942 ?        00:00:00 startpar
>  2942 ?        00:00:00 startpar
>  2964 ?        00:00:00 rsyslogd
>  2964 ?        00:00:00 rsyslogd
>  2964 ?        00:00:00 rsyslogd
>  2964 ?        00:00:00 rsyslogd
>  2980 ?        00:00:00 startpar
>  2980 ?        00:00:00 startpar
>  2981 ?        00:00:00 ctlscript.sh
>  2981 ?        00:00:00 ctlscript.sh
>  3016 ?        00:00:00 cron
>  3016 ?        00:00:00 cron
>  3025 ?        00:00:00 mysqld_safe
>  3025 ?        00:00:00 mysqld_safe
>  3032 ?        00:00:00 sshd
>  3032 ?        00:00:00 sshd
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3113 ?        00:00:00 ctl.sh
>  3113 ?        00:00:00 ctl.sh
>  3115 ?        00:00:00 sleep
>  3115 ?        00:00:00 sleep
>  3116 ?        00:00:00 .memcached.bin
>  3116 ?        00:00:00 .memcached.bin
>
>
> As you can see, it is clear that the newer kernel is setting -17 by 
> default, which in turn is causing the OOM killer loop.
>
> So I began to try and find what may have caused this problem by 
> comparing the two sources...
>
> I checked the code for all references to 'oom_adj' and 'oom_adjust' in 
> both code sets, but found no obvious differences:
> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e 
> oom_adj
>
> Then I checked for references to "-17" in all .c and .h files, and 
> found a couple of matches, but only one obvious one:
> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
> ./include/linux/oom.h:#define OOM_DISABLE (-17)
>
> But again, a search for OOM_DISABLE came up with nothing obvious...
>
> In a last ditch attempt, I did a search for all references to 'oom' 
> (case-insensitive) in both code bases, then compared the two:
>  root@annabelle [~/lol/linux-2.6.32.28] > grep -i -R "oom" . | sort -n 
> > /tmp/annabelle.oom_adj
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > grep -i -R 
> "oom" . | sort -n > /tmp/vicky.oom_adj
>
> and this brought back (yet again) nothing obvious..
>
>
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
> ./include/linux/oom.h
> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>
>  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./include/linux/oom.h
> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
> ./mm/oom_kill.c
> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>
>  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./mm/oom_kill.c
> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>
>
>
> Could anyone please shed some light as to why the default oom_adj is 
> set to -17 now (and where it is actually set)? From what I can tell, 
> the fix for this issue will either be:
>
>   1. Allow OOM killer to override the decision of ignoring oom_adj ==
>      -17 if an unrecoverable loop is encountered.
>   2. Change the default back to 0.
>
> Again, my apologies if this bug report is slightly unorthodox, or 
> doesn't follow usual procedure etc. I can assure you I have tried my 
> absolute best to give all the necessary information though.
>
> Cal
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Fwd: cgroup OOM killer loop causes system to lockup (possible fix included)
  2011-05-29 22:24   ` Cal Leeming [Simplicity Media Ltd]
@ 2011-05-30  2:23     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-05-30  2:23 UTC (permalink / raw)
  To: Cal Leeming [Simplicity Media Ltd]
  Cc: linux-kernel, linux-rt-users, linux-mm, kosaki.motohiro,
	minchan.kim, rientjes


Thank you. memory cgroup and OOM troubles are handled in linux-mm.

On Sun, 29 May 2011 23:24:07 +0100
"Cal Leeming [Simplicity Media Ltd]"  <cal.leeming@simplicitymedialtd.co.uk> wrote:

> Some further logs:
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> 
> 

hmm, in short, applications has -17 oom_adj in default with 2.6.32.41 ?
AFAIK, no kernel has such crazy settings as default..

Does your 2 kernel uses the same environment/distribution ?

Thanks,
-Kame

> On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
> >  First of all, my apologies if I have submitted this problem to the 
> > wrong place, spent 20 minutes trying to figure out where it needs to 
> > be sent, and was still none the wiser.
> >
> > The problem is related to applying memory limitations within a cgroup. 
> > If the OOM killer kicks in, it gets stuck in a loop where it tries to 
> > kill a process which has an oom_adj of -17. This causes an infinite 
> > loop, which in turn locks up the system.
> >
> > May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of 
> > memory: kill process 6016 (java) score 0 or a child
> > May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of 
> > memory: kill process 6016 (java) score 0 or a child
> > May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of 
> > memory: kill process 6016 (java) score 0 or a child
> > May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of 
> > memory: kill process 6016 (java) score 0 or a child
> > May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of 
> > memory: kill process 6016 (java) score 0 or a child
> >
> >
> >  root@vicky [/home/foxx] > uname -a
> > Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
> > GNU/Linux
> > (this happens on both the grsec patched and non patched 2.6.32.41 kernel)
> >
> > When this is encountered, the memory usage across the whole server is 
> > still within limits (not even hitting swap).
> >
> > The memory configuration for the cgroup/lxc is:
> > lxc.cgroup.memory.limit_in_bytes = 3000M
> > lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
> >
> > Now, what is even more strange, is that when running under the 
> > 2.6.32.28 kernel (both patched and unpatched), this problem doesn't 
> > happen. However, there is a slight difference between the two kernels. 
> > The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj, 
> > where as the 2.6.32.41 gives a default of -17. I suspect this is the 
> > root cause of why it's showing in the later kernel, but not the earlier.
> >
> > To test this theory, I started up the lxc on both servers, and then 
> > ran a one liner which showed me all the processes with an oom_adj of -17:
> >
> > (the below is the older/working kernel)
> > root@courtney.internal [/mnt/encstore/lxc] > uname -a
> > Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT 
> > 2011 x86_64 GNU/Linux
> > root@courtney.internal [/mnt/encstore/lxc] > for x in `find /proc 
> > -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do 
> > ps -p $x --no-headers ; done
> > grep: /proc/1411/task/1411/oom_adj: No such file or directory
> > grep: /proc/1411/oom_adj: No such file or directory
> >   804 ?        00:00:00 udevd
> >   804 ?        00:00:00 udevd
> > 25536 ?        00:00:00 sshd
> > 25536 ?        00:00:00 sshd
> > 31861 ?        00:00:00 sshd
> > 31861 ?        00:00:00 sshd
> > 32173 ?        00:00:00 udevd
> > 32173 ?        00:00:00 udevd
> > 32174 ?        00:00:00 udevd
> > 32174 ?        00:00:00 udevd
> >
> > (the below is the newer/broken kernel)
> >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > uname -a
> > Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
> > GNU/Linux
> >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > for x in 
> > `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print 
> > $3}'` ; do ps -p $x --no-headers ; done
> > grep: /proc/3118/task/3118/oom_adj: No such file or directory
> > grep: /proc/3118/oom_adj: No such file or directory
> >   895 ?        00:00:00 udevd
> >   895 ?        00:00:00 udevd
> >  1091 ?        00:00:00 udevd
> >  1091 ?        00:00:00 udevd
> >  1092 ?        00:00:00 udevd
> >  1092 ?        00:00:00 udevd
> >  2596 ?        00:00:00 sshd
> >  2596 ?        00:00:00 sshd
> >  2608 ?        00:00:00 sshd
> >  2608 ?        00:00:00 sshd
> >  2613 ?        00:00:00 sshd
> >  2613 ?        00:00:00 sshd
> >  2614 pts/0    00:00:00 bash
> >  2614 pts/0    00:00:00 bash
> >  2620 pts/0    00:00:00 sudo
> >  2620 pts/0    00:00:00 sudo
> >  2621 pts/0    00:00:00 su
> >  2621 pts/0    00:00:00 su
> >  2622 pts/0    00:00:00 bash
> >  2622 pts/0    00:00:00 bash
> >  2685 ?        00:00:00 lxc-start
> >  2685 ?        00:00:00 lxc-start
> >  2699 ?        00:00:00 init
> >  2699 ?        00:00:00 init
> >  2939 ?        00:00:00 rc
> >  2939 ?        00:00:00 rc
> >  2942 ?        00:00:00 startpar
> >  2942 ?        00:00:00 startpar
> >  2964 ?        00:00:00 rsyslogd
> >  2964 ?        00:00:00 rsyslogd
> >  2964 ?        00:00:00 rsyslogd
> >  2964 ?        00:00:00 rsyslogd
> >  2980 ?        00:00:00 startpar
> >  2980 ?        00:00:00 startpar
> >  2981 ?        00:00:00 ctlscript.sh
> >  2981 ?        00:00:00 ctlscript.sh
> >  3016 ?        00:00:00 cron
> >  3016 ?        00:00:00 cron
> >  3025 ?        00:00:00 mysqld_safe
> >  3025 ?        00:00:00 mysqld_safe
> >  3032 ?        00:00:00 sshd
> >  3032 ?        00:00:00 sshd
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3113 ?        00:00:00 ctl.sh
> >  3113 ?        00:00:00 ctl.sh
> >  3115 ?        00:00:00 sleep
> >  3115 ?        00:00:00 sleep
> >  3116 ?        00:00:00 .memcached.bin
> >  3116 ?        00:00:00 .memcached.bin
> >
> >
> > As you can see, it is clear that the newer kernel is setting -17 by 
> > default, which in turn is causing the OOM killer loop.
> >
> > So I began to try and find what may have caused this problem by 
> > comparing the two sources...
> >
> > I checked the code for all references to 'oom_adj' and 'oom_adjust' in 
> > both code sets, but found no obvious differences:
> > grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e 
> > oom_adj
> >
> > Then I checked for references to "-17" in all .c and .h files, and 
> > found a couple of matches, but only one obvious one:
> > grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
> > ./include/linux/oom.h:#define OOM_DISABLE (-17)
> >
> > But again, a search for OOM_DISABLE came up with nothing obvious...
> >
> > In a last ditch attempt, I did a search for all references to 'oom' 
> > (case-insensitive) in both code bases, then compared the two:
> >  root@annabelle [~/lol/linux-2.6.32.28] > grep -i -R "oom" . | sort -n 
> > > /tmp/annabelle.oom_adj
> >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > grep -i -R 
> > "oom" . | sort -n > /tmp/vicky.oom_adj
> >
> > and this brought back (yet again) nothing obvious..
> >
> >
> >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
> > ./include/linux/oom.h
> > 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> >
> >  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./include/linux/oom.h
> > 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> >
> >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
> > ./mm/oom_kill.c
> > 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> >
> >  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./mm/oom_kill.c
> > 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> >
> >
> >
> > Could anyone please shed some light as to why the default oom_adj is 
> > set to -17 now (and where it is actually set)? From what I can tell, 
> > the fix for this issue will either be:
> >
> >   1. Allow OOM killer to override the decision of ignoring oom_adj ==
> >      -17 if an unrecoverable loop is encountered.
> >   2. Change the default back to 0.
> >
> > Again, my apologies if this bug report is slightly unorthodox, or 
> > doesn't follow usual procedure etc. I can assure you I have tried my 
> > absolute best to give all the necessary information though.
> >
> > Cal
> >
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Fwd: cgroup OOM killer loop causes system to lockup (possible fix included)
@ 2011-05-30  2:23     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-05-30  2:23 UTC (permalink / raw)
  To: Cal Leeming [Simplicity Media Ltd]
  Cc: linux-kernel, linux-rt-users, linux-mm, kosaki.motohiro,
	minchan.kim, rientjes


Thank you. memory cgroup and OOM troubles are handled in linux-mm.

On Sun, 29 May 2011 23:24:07 +0100
"Cal Leeming [Simplicity Media Ltd]"  <cal.leeming@simplicitymedialtd.co.uk> wrote:

> Some further logs:
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server 
> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]  
> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> 
> 

hmm, in short, applications has -17 oom_adj in default with 2.6.32.41 ?
AFAIK, no kernel has such crazy settings as default..

Does your 2 kernel uses the same environment/distribution ?

Thanks,
-Kame

> On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
> >  First of all, my apologies if I have submitted this problem to the 
> > wrong place, spent 20 minutes trying to figure out where it needs to 
> > be sent, and was still none the wiser.
> >
> > The problem is related to applying memory limitations within a cgroup. 
> > If the OOM killer kicks in, it gets stuck in a loop where it tries to 
> > kill a process which has an oom_adj of -17. This causes an infinite 
> > loop, which in turn locks up the system.
> >
> > May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of 
> > memory: kill process 6016 (java) score 0 or a child
> > May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of 
> > memory: kill process 6016 (java) score 0 or a child
> > May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of 
> > memory: kill process 6016 (java) score 0 or a child
> > May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of 
> > memory: kill process 6016 (java) score 0 or a child
> > May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of 
> > memory: kill process 6016 (java) score 0 or a child
> >
> >
> >  root@vicky [/home/foxx] > uname -a
> > Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
> > GNU/Linux
> > (this happens on both the grsec patched and non patched 2.6.32.41 kernel)
> >
> > When this is encountered, the memory usage across the whole server is 
> > still within limits (not even hitting swap).
> >
> > The memory configuration for the cgroup/lxc is:
> > lxc.cgroup.memory.limit_in_bytes = 3000M
> > lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
> >
> > Now, what is even more strange, is that when running under the 
> > 2.6.32.28 kernel (both patched and unpatched), this problem doesn't 
> > happen. However, there is a slight difference between the two kernels. 
> > The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj, 
> > where as the 2.6.32.41 gives a default of -17. I suspect this is the 
> > root cause of why it's showing in the later kernel, but not the earlier.
> >
> > To test this theory, I started up the lxc on both servers, and then 
> > ran a one liner which showed me all the processes with an oom_adj of -17:
> >
> > (the below is the older/working kernel)
> > root@courtney.internal [/mnt/encstore/lxc] > uname -a
> > Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT 
> > 2011 x86_64 GNU/Linux
> > root@courtney.internal [/mnt/encstore/lxc] > for x in `find /proc 
> > -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do 
> > ps -p $x --no-headers ; done
> > grep: /proc/1411/task/1411/oom_adj: No such file or directory
> > grep: /proc/1411/oom_adj: No such file or directory
> >   804 ?        00:00:00 udevd
> >   804 ?        00:00:00 udevd
> > 25536 ?        00:00:00 sshd
> > 25536 ?        00:00:00 sshd
> > 31861 ?        00:00:00 sshd
> > 31861 ?        00:00:00 sshd
> > 32173 ?        00:00:00 udevd
> > 32173 ?        00:00:00 udevd
> > 32174 ?        00:00:00 udevd
> > 32174 ?        00:00:00 udevd
> >
> > (the below is the newer/broken kernel)
> >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > uname -a
> > Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
> > GNU/Linux
> >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > for x in 
> > `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print 
> > $3}'` ; do ps -p $x --no-headers ; done
> > grep: /proc/3118/task/3118/oom_adj: No such file or directory
> > grep: /proc/3118/oom_adj: No such file or directory
> >   895 ?        00:00:00 udevd
> >   895 ?        00:00:00 udevd
> >  1091 ?        00:00:00 udevd
> >  1091 ?        00:00:00 udevd
> >  1092 ?        00:00:00 udevd
> >  1092 ?        00:00:00 udevd
> >  2596 ?        00:00:00 sshd
> >  2596 ?        00:00:00 sshd
> >  2608 ?        00:00:00 sshd
> >  2608 ?        00:00:00 sshd
> >  2613 ?        00:00:00 sshd
> >  2613 ?        00:00:00 sshd
> >  2614 pts/0    00:00:00 bash
> >  2614 pts/0    00:00:00 bash
> >  2620 pts/0    00:00:00 sudo
> >  2620 pts/0    00:00:00 sudo
> >  2621 pts/0    00:00:00 su
> >  2621 pts/0    00:00:00 su
> >  2622 pts/0    00:00:00 bash
> >  2622 pts/0    00:00:00 bash
> >  2685 ?        00:00:00 lxc-start
> >  2685 ?        00:00:00 lxc-start
> >  2699 ?        00:00:00 init
> >  2699 ?        00:00:00 init
> >  2939 ?        00:00:00 rc
> >  2939 ?        00:00:00 rc
> >  2942 ?        00:00:00 startpar
> >  2942 ?        00:00:00 startpar
> >  2964 ?        00:00:00 rsyslogd
> >  2964 ?        00:00:00 rsyslogd
> >  2964 ?        00:00:00 rsyslogd
> >  2964 ?        00:00:00 rsyslogd
> >  2980 ?        00:00:00 startpar
> >  2980 ?        00:00:00 startpar
> >  2981 ?        00:00:00 ctlscript.sh
> >  2981 ?        00:00:00 ctlscript.sh
> >  3016 ?        00:00:00 cron
> >  3016 ?        00:00:00 cron
> >  3025 ?        00:00:00 mysqld_safe
> >  3025 ?        00:00:00 mysqld_safe
> >  3032 ?        00:00:00 sshd
> >  3032 ?        00:00:00 sshd
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3097 ?        00:00:00 mysqld.bin
> >  3113 ?        00:00:00 ctl.sh
> >  3113 ?        00:00:00 ctl.sh
> >  3115 ?        00:00:00 sleep
> >  3115 ?        00:00:00 sleep
> >  3116 ?        00:00:00 .memcached.bin
> >  3116 ?        00:00:00 .memcached.bin
> >
> >
> > As you can see, it is clear that the newer kernel is setting -17 by 
> > default, which in turn is causing the OOM killer loop.
> >
> > So I began to try and find what may have caused this problem by 
> > comparing the two sources...
> >
> > I checked the code for all references to 'oom_adj' and 'oom_adjust' in 
> > both code sets, but found no obvious differences:
> > grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e 
> > oom_adj
> >
> > Then I checked for references to "-17" in all .c and .h files, and 
> > found a couple of matches, but only one obvious one:
> > grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
> > ./include/linux/oom.h:#define OOM_DISABLE (-17)
> >
> > But again, a search for OOM_DISABLE came up with nothing obvious...
> >
> > In a last ditch attempt, I did a search for all references to 'oom' 
> > (case-insensitive) in both code bases, then compared the two:
> >  root@annabelle [~/lol/linux-2.6.32.28] > grep -i -R "oom" . | sort -n 
> > > /tmp/annabelle.oom_adj
> >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > grep -i -R 
> > "oom" . | sort -n > /tmp/vicky.oom_adj
> >
> > and this brought back (yet again) nothing obvious..
> >
> >
> >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
> > ./include/linux/oom.h
> > 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> >
> >  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./include/linux/oom.h
> > 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> >
> >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
> > ./mm/oom_kill.c
> > 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> >
> >  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./mm/oom_kill.c
> > 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> >
> >
> >
> > Could anyone please shed some light as to why the default oom_adj is 
> > set to -17 now (and where it is actually set)? From what I can tell, 
> > the fix for this issue will either be:
> >
> >   1. Allow OOM killer to override the decision of ignoring oom_adj ==
> >      -17 if an unrecoverable loop is encountered.
> >   2. Change the default back to 0.
> >
> > Again, my apologies if this bug report is slightly unorthodox, or 
> > doesn't follow usual procedure etc. I can assure you I have tried my 
> > absolute best to give all the necessary information though.
> >
> > Cal
> >
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Fwd: cgroup OOM killer loop causes system to lockup (possible fix included)
  2011-05-30  2:23     ` KAMEZAWA Hiroyuki
  (?)
@ 2011-05-30 10:51     ` Cal Leeming [Simplicity Media Ltd]
  2011-05-30 10:52         ` Cal Leeming [Simplicity Media Ltd]
  -1 siblings, 1 reply; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-30 10:51 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-kernel, linux-rt-users, linux-mm, kosaki.motohiro,
	minchan.kim, rientjes

[-- Attachment #1: Type: text/plain, Size: 13467 bytes --]

@Kame
Thanks for the reply!

Both kernels used the same env/dist, but which slightly different packages.

After many frustrating hours, I have pin pointed this down to a dodgy Debian
package which appears to continue affecting the system, even after purging.
I'm still yet to pin point the package down (I'm doing several reinstall
tests, along with tripwire analysis after each reboot).

@Hiroyuki
Thank you for sending this to the right people!

@linux-mm
On a side note, would someone mind taking a few minutes to give a brief
explanation as to how the default oom_adj is set, and under what conditions
it is given -17 by default? Is this defined by the application? I looked
through the kernel source, and noticed some of the code was defaulted to set
oom_adj to OOM_DISABLE (which is defined in the headers as -17).

Assuming the debian problem is resolved, this might be another call for the
oom-killer to be modified so that if it encounters the unrecoverable loop,
it ignores the -17 rule (with some exceptions, such as kernel processes, and
other critical things). If this is going to be a relatively simple task, I
wouldn't mind spending a few hours patching this?

Cal

On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki <
kamezawa.hiroyu@jp.fujitsu.com> wrote:

>
> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
>
> On Sun, 29 May 2011 23:24:07 +0100
> "Cal Leeming [Simplicity Media Ltd]"  <
> cal.leeming@simplicitymedialtd.co.uk> wrote:
>
> > Some further logs:
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >
> >
>
> hmm, in short, applications has -17 oom_adj in default with 2.6.32.41 ?
> AFAIK, no kernel has such crazy settings as default..
>
> Does your 2 kernel uses the same environment/distribution ?
>
> Thanks,
> -Kame
>
> > On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
> > >  First of all, my apologies if I have submitted this problem to the
> > > wrong place, spent 20 minutes trying to figure out where it needs to
> > > be sent, and was still none the wiser.
> > >
> > > The problem is related to applying memory limitations within a cgroup.
> > > If the OOM killer kicks in, it gets stuck in a loop where it tries to
> > > kill a process which has an oom_adj of -17. This causes an infinite
> > > loop, which in turn locks up the system.
> > >
> > > May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > >
> > >
> > >  root@vicky [/home/foxx] > uname -a
> > > Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
> > > GNU/Linux
> > > (this happens on both the grsec patched and non patched 2.6.32.41
> kernel)
> > >
> > > When this is encountered, the memory usage across the whole server is
> > > still within limits (not even hitting swap).
> > >
> > > The memory configuration for the cgroup/lxc is:
> > > lxc.cgroup.memory.limit_in_bytes = 3000M
> > > lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
> > >
> > > Now, what is even more strange, is that when running under the
> > > 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
> > > happen. However, there is a slight difference between the two kernels.
> > > The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
> > > where as the 2.6.32.41 gives a default of -17. I suspect this is the
> > > root cause of why it's showing in the later kernel, but not the
> earlier.
> > >
> > > To test this theory, I started up the lxc on both servers, and then
> > > ran a one liner which showed me all the processes with an oom_adj of
> -17:
> > >
> > > (the below is the older/working kernel)
> > > root@courtney.internal [/mnt/encstore/lxc] > uname -a
> > > Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT
> > > 2011 x86_64 GNU/Linux
> > > root@courtney.internal [/mnt/encstore/lxc] > for x in `find /proc
> > > -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do
> > > ps -p $x --no-headers ; done
> > > grep: /proc/1411/task/1411/oom_adj: No such file or directory
> > > grep: /proc/1411/oom_adj: No such file or directory
> > >   804 ?        00:00:00 udevd
> > >   804 ?        00:00:00 udevd
> > > 25536 ?        00:00:00 sshd
> > > 25536 ?        00:00:00 sshd
> > > 31861 ?        00:00:00 sshd
> > > 31861 ?        00:00:00 sshd
> > > 32173 ?        00:00:00 udevd
> > > 32173 ?        00:00:00 udevd
> > > 32174 ?        00:00:00 udevd
> > > 32174 ?        00:00:00 udevd
> > >
> > > (the below is the newer/broken kernel)
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > uname -a
> > > Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
> > > GNU/Linux
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > for x in
> > > `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print
> > > $3}'` ; do ps -p $x --no-headers ; done
> > > grep: /proc/3118/task/3118/oom_adj: No such file or directory
> > > grep: /proc/3118/oom_adj: No such file or directory
> > >   895 ?        00:00:00 udevd
> > >   895 ?        00:00:00 udevd
> > >  1091 ?        00:00:00 udevd
> > >  1091 ?        00:00:00 udevd
> > >  1092 ?        00:00:00 udevd
> > >  1092 ?        00:00:00 udevd
> > >  2596 ?        00:00:00 sshd
> > >  2596 ?        00:00:00 sshd
> > >  2608 ?        00:00:00 sshd
> > >  2608 ?        00:00:00 sshd
> > >  2613 ?        00:00:00 sshd
> > >  2613 ?        00:00:00 sshd
> > >  2614 pts/0    00:00:00 bash
> > >  2614 pts/0    00:00:00 bash
> > >  2620 pts/0    00:00:00 sudo
> > >  2620 pts/0    00:00:00 sudo
> > >  2621 pts/0    00:00:00 su
> > >  2621 pts/0    00:00:00 su
> > >  2622 pts/0    00:00:00 bash
> > >  2622 pts/0    00:00:00 bash
> > >  2685 ?        00:00:00 lxc-start
> > >  2685 ?        00:00:00 lxc-start
> > >  2699 ?        00:00:00 init
> > >  2699 ?        00:00:00 init
> > >  2939 ?        00:00:00 rc
> > >  2939 ?        00:00:00 rc
> > >  2942 ?        00:00:00 startpar
> > >  2942 ?        00:00:00 startpar
> > >  2964 ?        00:00:00 rsyslogd
> > >  2964 ?        00:00:00 rsyslogd
> > >  2964 ?        00:00:00 rsyslogd
> > >  2964 ?        00:00:00 rsyslogd
> > >  2980 ?        00:00:00 startpar
> > >  2980 ?        00:00:00 startpar
> > >  2981 ?        00:00:00 ctlscript.sh
> > >  2981 ?        00:00:00 ctlscript.sh
> > >  3016 ?        00:00:00 cron
> > >  3016 ?        00:00:00 cron
> > >  3025 ?        00:00:00 mysqld_safe
> > >  3025 ?        00:00:00 mysqld_safe
> > >  3032 ?        00:00:00 sshd
> > >  3032 ?        00:00:00 sshd
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3113 ?        00:00:00 ctl.sh
> > >  3113 ?        00:00:00 ctl.sh
> > >  3115 ?        00:00:00 sleep
> > >  3115 ?        00:00:00 sleep
> > >  3116 ?        00:00:00 .memcached.bin
> > >  3116 ?        00:00:00 .memcached.bin
> > >
> > >
> > > As you can see, it is clear that the newer kernel is setting -17 by
> > > default, which in turn is causing the OOM killer loop.
> > >
> > > So I began to try and find what may have caused this problem by
> > > comparing the two sources...
> > >
> > > I checked the code for all references to 'oom_adj' and 'oom_adjust' in
> > > both code sets, but found no obvious differences:
> > > grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
> > > oom_adj
> > >
> > > Then I checked for references to "-17" in all .c and .h files, and
> > > found a couple of matches, but only one obvious one:
> > > grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
> > > ./include/linux/oom.h:#define OOM_DISABLE (-17)
> > >
> > > But again, a search for OOM_DISABLE came up with nothing obvious...
> > >
> > > In a last ditch attempt, I did a search for all references to 'oom'
> > > (case-insensitive) in both code bases, then compared the two:
> > >  root@annabelle [~/lol/linux-2.6.32.28] > grep -i -R "oom" . | sort -n
> > > > /tmp/annabelle.oom_adj
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > grep -i -R
> > > "oom" . | sort -n > /tmp/vicky.oom_adj
> > >
> > > and this brought back (yet again) nothing obvious..
> > >
> > >
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum
> > > ./include/linux/oom.h
> > > 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> > >
> > >  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./include/linux/oom.h
> > > 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> > >
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum
> > > ./mm/oom_kill.c
> > > 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> > >
> > >  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./mm/oom_kill.c
> > > 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> > >
> > >
> > >
> > > Could anyone please shed some light as to why the default oom_adj is
> > > set to -17 now (and where it is actually set)? From what I can tell,
> > > the fix for this issue will either be:
> > >
> > >   1. Allow OOM killer to override the decision of ignoring oom_adj ==
> > >      -17 if an unrecoverable loop is encountered.
> > >   2. Change the default back to 0.
> > >
> > > Again, my apologies if this bug report is slightly unorthodox, or
> > > doesn't follow usual procedure etc. I can assure you I have tried my
> > > absolute best to give all the necessary information though.
> > >
> > > Cal
> > >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel"
> in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
>
>

[-- Attachment #2: Type: text/html, Size: 16646 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* cgroup OOM killer loop causes system to lockup (possible fix included)
  2011-05-30 10:51     ` Cal Leeming [Simplicity Media Ltd]
  2011-05-30 10:52         ` Cal Leeming [Simplicity Media Ltd]
@ 2011-05-30 10:52         ` Cal Leeming [Simplicity Media Ltd]
  0 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-30 10:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users, linux-mm, kosaki.motohiro,
	minchan.kim, rientjes, kamezawa.hiroyu

-resent due to incorrect formatting, sorry if this dups!

@Kame
Thanks for the reply!
Both kernels used the same env/dist, but which slightly different packages.
After many frustrating hours, I have pin pointed this down to a dodgy
Debian package which appears to continue affecting the system, even
after purging. I'm still yet to pin point the package down (I'm doing
several reinstall tests, along with tripwire analysis after each
reboot).

@Hiroyuki
Thank you for sending this to the right people!

@linux-mm
On a side note, would someone mind taking a few minutes to give a
brief explanation as to how the default oom_adj is set, and under what
conditions it is given -17 by default? Is this defined by the
application? I looked through the kernel source,
and noticed some of the code was defaulted to set oom_adj to
OOM_DISABLE (which is defined in the headers as -17).

Assuming the debian problem is resolved, this might be another call
for the oom-killer to be modified so that if it encounters the
unrecoverable loop, it ignores the -17 rule (with some exceptions,
such as kernel processes, and other critical things). If this is going
to be a relatively simple task, I wouldn't mind spending a few hours
patching this?

Cal

On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
>
> On Sun, 29 May 2011 23:24:07 +0100
> "Cal Leeming [Simplicity Media Ltd]"  <cal.leeming@simplicitymedialtd.co.uk> wrote:
>
> > Some further logs:
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >
> >
>
> hmm, in short, applications has -17 oom_adj in default with 2.6.32.41 ?
> AFAIK, no kernel has such crazy settings as default..
>
> Does your 2 kernel uses the same environment/distribution ?
>
> Thanks,
> -Kame
>
> > On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
> > >  First of all, my apologies if I have submitted this problem to the
> > > wrong place, spent 20 minutes trying to figure out where it needs to
> > > be sent, and was still none the wiser.
> > >
> > > The problem is related to applying memory limitations within a cgroup.
> > > If the OOM killer kicks in, it gets stuck in a loop where it tries to
> > > kill a process which has an oom_adj of -17. This causes an infinite
> > > loop, which in turn locks up the system.
> > >
> > > May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > >
> > >
> > >  root@vicky [/home/foxx] > uname -a
> > > Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
> > > GNU/Linux
> > > (this happens on both the grsec patched and non patched 2.6.32.41 kernel)
> > >
> > > When this is encountered, the memory usage across the whole server is
> > > still within limits (not even hitting swap).
> > >
> > > The memory configuration for the cgroup/lxc is:
> > > lxc.cgroup.memory.limit_in_bytes = 3000M
> > > lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
> > >
> > > Now, what is even more strange, is that when running under the
> > > 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
> > > happen. However, there is a slight difference between the two kernels.
> > > The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
> > > where as the 2.6.32.41 gives a default of -17. I suspect this is the
> > > root cause of why it's showing in the later kernel, but not the earlier.
> > >
> > > To test this theory, I started up the lxc on both servers, and then
> > > ran a one liner which showed me all the processes with an oom_adj of -17:
> > >
> > > (the below is the older/working kernel)
> > > root@courtney.internal [/mnt/encstore/lxc] > uname -a
> > > Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT
> > > 2011 x86_64 GNU/Linux
> > > root@courtney.internal [/mnt/encstore/lxc] > for x in `find /proc
> > > -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do
> > > ps -p $x --no-headers ; done
> > > grep: /proc/1411/task/1411/oom_adj: No such file or directory
> > > grep: /proc/1411/oom_adj: No such file or directory
> > >   804 ?        00:00:00 udevd
> > >   804 ?        00:00:00 udevd
> > > 25536 ?        00:00:00 sshd
> > > 25536 ?        00:00:00 sshd
> > > 31861 ?        00:00:00 sshd
> > > 31861 ?        00:00:00 sshd
> > > 32173 ?        00:00:00 udevd
> > > 32173 ?        00:00:00 udevd
> > > 32174 ?        00:00:00 udevd
> > > 32174 ?        00:00:00 udevd
> > >
> > > (the below is the newer/broken kernel)
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > uname -a
> > > Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
> > > GNU/Linux
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > for x in
> > > `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print
> > > $3}'` ; do ps -p $x --no-headers ; done
> > > grep: /proc/3118/task/3118/oom_adj: No such file or directory
> > > grep: /proc/3118/oom_adj: No such file or directory
> > >   895 ?        00:00:00 udevd
> > >   895 ?        00:00:00 udevd
> > >  1091 ?        00:00:00 udevd
> > >  1091 ?        00:00:00 udevd
> > >  1092 ?        00:00:00 udevd
> > >  1092 ?        00:00:00 udevd
> > >  2596 ?        00:00:00 sshd
> > >  2596 ?        00:00:00 sshd
> > >  2608 ?        00:00:00 sshd
> > >  2608 ?        00:00:00 sshd
> > >  2613 ?        00:00:00 sshd
> > >  2613 ?        00:00:00 sshd
> > >  2614 pts/0    00:00:00 bash
> > >  2614 pts/0    00:00:00 bash
> > >  2620 pts/0    00:00:00 sudo
> > >  2620 pts/0    00:00:00 sudo
> > >  2621 pts/0    00:00:00 su
> > >  2621 pts/0    00:00:00 su
> > >  2622 pts/0    00:00:00 bash
> > >  2622 pts/0    00:00:00 bash
> > >  2685 ?        00:00:00 lxc-start
> > >  2685 ?        00:00:00 lxc-start
> > >  2699 ?        00:00:00 init
> > >  2699 ?        00:00:00 init
> > >  2939 ?        00:00:00 rc
> > >  2939 ?        00:00:00 rc
> > >  2942 ?        00:00:00 startpar
> > >  2942 ?        00:00:00 startpar
> > >  2964 ?        00:00:00 rsyslogd
> > >  2964 ?        00:00:00 rsyslogd
> > >  2964 ?        00:00:00 rsyslogd
> > >  2964 ?        00:00:00 rsyslogd
> > >  2980 ?        00:00:00 startpar
> > >  2980 ?        00:00:00 startpar
> > >  2981 ?        00:00:00 ctlscript.sh
> > >  2981 ?        00:00:00 ctlscript.sh
> > >  3016 ?        00:00:00 cron
> > >  3016 ?        00:00:00 cron
> > >  3025 ?        00:00:00 mysqld_safe
> > >  3025 ?        00:00:00 mysqld_safe
> > >  3032 ?        00:00:00 sshd
> > >  3032 ?        00:00:00 sshd
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3113 ?        00:00:00 ctl.sh
> > >  3113 ?        00:00:00 ctl.sh
> > >  3115 ?        00:00:00 sleep
> > >  3115 ?        00:00:00 sleep
> > >  3116 ?        00:00:00 .memcached.bin
> > >  3116 ?        00:00:00 .memcached.bin
> > >
> > >
> > > As you can see, it is clear that the newer kernel is setting -17 by
> > > default, which in turn is causing the OOM killer loop.
> > >
> > > So I began to try and find what may have caused this problem by
> > > comparing the two sources...
> > >
> > > I checked the code for all references to 'oom_adj' and 'oom_adjust' in
> > > both code sets, but found no obvious differences:
> > > grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
> > > oom_adj
> > >
> > > Then I checked for references to "-17" in all .c and .h files, and
> > > found a couple of matches, but only one obvious one:
> > > grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
> > > ./include/linux/oom.h:#define OOM_DISABLE (-17)
> > >
> > > But again, a search for OOM_DISABLE came up with nothing obvious...
> > >
> > > In a last ditch attempt, I did a search for all references to 'oom'
> > > (case-insensitive) in both code bases, then compared the two:
> > >  root@annabelle [~/lol/linux-2.6.32.28] > grep -i -R "oom" . | sort -n
> > > > /tmp/annabelle.oom_adj
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > grep -i -R
> > > "oom" . | sort -n > /tmp/vicky.oom_adj
> > >
> > > and this brought back (yet again) nothing obvious..
> > >
> > >
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum
> > > ./include/linux/oom.h
> > > 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> > >
> > >  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./include/linux/oom.h
> > > 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> > >
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum
> > > ./mm/oom_kill.c
> > > 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> > >
> > >  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./mm/oom_kill.c
> > > 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> > >
> > >
> > >
> > > Could anyone please shed some light as to why the default oom_adj is
> > > set to -17 now (and where it is actually set)? From what I can tell,
> > > the fix for this issue will either be:
> > >
> > >   1. Allow OOM killer to override the decision of ignoring oom_adj ==
> > >      -17 if an unrecoverable loop is encountered.
> > >   2. Change the default back to 0.
> > >
> > > Again, my apologies if this bug report is slightly unorthodox, or
> > > doesn't follow usual procedure etc. I can assure you I have tried my
> > > absolute best to give all the necessary information though.
> > >
> > > Cal
> > >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* cgroup OOM killer loop causes system to lockup (possible fix included)
@ 2011-05-30 10:52         ` Cal Leeming [Simplicity Media Ltd]
  0 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-30 10:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users, linux-mm, kosaki.motohiro, 

-resent due to incorrect formatting, sorry if this dups!

@Kame
Thanks for the reply!
Both kernels used the same env/dist, but which slightly different packages.
After many frustrating hours, I have pin pointed this down to a dodgy
Debian package which appears to continue affecting the system, even
after purging. I'm still yet to pin point the package down (I'm doing
several reinstall tests, along with tripwire analysis after each
reboot).

@Hiroyuki
Thank you for sending this to the right people!

@linux-mm
On a side note, would someone mind taking a few minutes to give a
brief explanation as to how the default oom_adj is set, and under what
conditions it is given -17 by default? Is this defined by the
application? I looked through the kernel source,
and noticed some of the code was defaulted to set oom_adj to
OOM_DISABLE (which is defined in the headers as -17).

Assuming the debian problem is resolved, this might be another call
for the oom-killer to be modified so that if it encounters the
unrecoverable loop, it ignores the -17 rule (with some exceptions,
such as kernel processes, and other critical things). If this is going
to be a relatively simple task, I wouldn't mind spending a few hours
patching this?

Cal

On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
>
> On Sun, 29 May 2011 23:24:07 +0100
> "Cal Leeming [Simplicity Media Ltd]"  <cal.leeming@simplicitymedialtd.co.uk> wrote:
>
> > Some further logs:
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >
> >
>
> hmm, in short, applications has -17 oom_adj in default with 2.6.32.41 ?
> AFAIK, no kernel has such crazy settings as default..
>
> Does your 2 kernel uses the same environment/distribution ?
>
> Thanks,
> -Kame
>
> > On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
> > >  First of all, my apologies if I have submitted this problem to the
> > > wrong place, spent 20 minutes trying to figure out where it needs to
> > > be sent, and was still none the wiser.
> > >
> > > The problem is related to applying memory limitations within a cgroup.
> > > If the OOM killer kicks in, it gets stuck in a loop where it tries to
> > > kill a process which has an oom_adj of -17. This causes an infinite
> > > loop, which in turn locks up the system.
> > >
> > > May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > >
> > >
> > >  root@vicky [/home/foxx] > uname -a
> > > Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
> > > GNU/Linux
> > > (this happens on both the grsec patched and non patched 2.6.32.41 kernel)
> > >
> > > When this is encountered, the memory usage across the whole server is
> > > still within limits (not even hitting swap).
> > >
> > > The memory configuration for the cgroup/lxc is:
> > > lxc.cgroup.memory.limit_in_bytes = 3000M
> > > lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
> > >
> > > Now, what is even more strange, is that when running under the
> > > 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
> > > happen. However, there is a slight difference between the two kernels.
> > > The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
> > > where as the 2.6.32.41 gives a default of -17. I suspect this is the
> > > root cause of why it's showing in the later kernel, but not the earlier.
> > >
> > > To test this theory, I started up the lxc on both servers, and then
> > > ran a one liner which showed me all the processes with an oom_adj of -17:
> > >
> > > (the below is the older/working kernel)
> > > root@courtney.internal [/mnt/encstore/lxc] > uname -a
> > > Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT
> > > 2011 x86_64 GNU/Linux
> > > root@courtney.internal [/mnt/encstore/lxc] > for x in `find /proc
> > > -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do
> > > ps -p $x --no-headers ; done
> > > grep: /proc/1411/task/1411/oom_adj: No such file or directory
> > > grep: /proc/1411/oom_adj: No such file or directory
> > >   804 ?        00:00:00 udevd
> > >   804 ?        00:00:00 udevd
> > > 25536 ?        00:00:00 sshd
> > > 25536 ?        00:00:00 sshd
> > > 31861 ?        00:00:00 sshd
> > > 31861 ?        00:00:00 sshd
> > > 32173 ?        00:00:00 udevd
> > > 32173 ?        00:00:00 udevd
> > > 32174 ?        00:00:00 udevd
> > > 32174 ?        00:00:00 udevd
> > >
> > > (the below is the newer/broken kernel)
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > uname -a
> > > Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
> > > GNU/Linux
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > for x in
> > > `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print
> > > $3}'` ; do ps -p $x --no-headers ; done
> > > grep: /proc/3118/task/3118/oom_adj: No such file or directory
> > > grep: /proc/3118/oom_adj: No such file or directory
> > >   895 ?        00:00:00 udevd
> > >   895 ?        00:00:00 udevd
> > >  1091 ?        00:00:00 udevd
> > >  1091 ?        00:00:00 udevd
> > >  1092 ?        00:00:00 udevd
> > >  1092 ?        00:00:00 udevd
> > >  2596 ?        00:00:00 sshd
> > >  2596 ?        00:00:00 sshd
> > >  2608 ?        00:00:00 sshd
> > >  2608 ?        00:00:00 sshd
> > >  2613 ?        00:00:00 sshd
> > >  2613 ?        00:00:00 sshd
> > >  2614 pts/0    00:00:00 bash
> > >  2614 pts/0    00:00:00 bash
> > >  2620 pts/0    00:00:00 sudo
> > >  2620 pts/0    00:00:00 sudo
> > >  2621 pts/0    00:00:00 su
> > >  2621 pts/0    00:00:00 su
> > >  2622 pts/0    00:00:00 bash
> > >  2622 pts/0    00:00:00 bash
> > >  2685 ?        00:00:00 lxc-start
> > >  2685 ?        00:00:00 lxc-start
> > >  2699 ?        00:00:00 init
> > >  2699 ?        00:00:00 init
> > >  2939 ?        00:00:00 rc
> > >  2939 ?        00:00:00 rc
> > >  2942 ?        00:00:00 startpar
> > >  2942 ?        00:00:00 startpar
> > >  2964 ?        00:00:00 rsyslogd
> > >  2964 ?        00:00:00 rsyslogd
> > >  2964 ?        00:00:00 rsyslogd
> > >  2964 ?        00:00:00 rsyslogd
> > >  2980 ?        00:00:00 startpar
> > >  2980 ?        00:00:00 startpar
> > >  2981 ?        00:00:00 ctlscript.sh
> > >  2981 ?        00:00:00 ctlscript.sh
> > >  3016 ?        00:00:00 cron
> > >  3016 ?        00:00:00 cron
> > >  3025 ?        00:00:00 mysqld_safe
> > >  3025 ?        00:00:00 mysqld_safe
> > >  3032 ?        00:00:00 sshd
> > >  3032 ?        00:00:00 sshd
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3113 ?        00:00:00 ctl.sh
> > >  3113 ?        00:00:00 ctl.sh
> > >  3115 ?        00:00:00 sleep
> > >  3115 ?        00:00:00 sleep
> > >  3116 ?        00:00:00 .memcached.bin
> > >  3116 ?        00:00:00 .memcached.bin
> > >
> > >
> > > As you can see, it is clear that the newer kernel is setting -17 by
> > > default, which in turn is causing the OOM killer loop.
> > >
> > > So I began to try and find what may have caused this problem by
> > > comparing the two sources...
> > >
> > > I checked the code for all references to 'oom_adj' and 'oom_adjust' in
> > > both code sets, but found no obvious differences:
> > > grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
> > > oom_adj
> > >
> > > Then I checked for references to "-17" in all .c and .h files, and
> > > found a couple of matches, but only one obvious one:
> > > grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
> > > ./include/linux/oom.h:#define OOM_DISABLE (-17)
> > >
> > > But again, a search for OOM_DISABLE came up with nothing obvious...
> > >
> > > In a last ditch attempt, I did a search for all references to 'oom'
> > > (case-insensitive) in both code bases, then compared the two:
> > >  root@annabelle [~/lol/linux-2.6.32.28] > grep -i -R "oom" . | sort -n
> > > > /tmp/annabelle.oom_adj
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > grep -i -R
> > > "oom" . | sort -n > /tmp/vicky.oom_adj
> > >
> > > and this brought back (yet again) nothing obvious..
> > >
> > >
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum
> > > ./include/linux/oom.h
> > > 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> > >
> > >  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./include/linux/oom.h
> > > 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> > >
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum
> > > ./mm/oom_kill.c
> > > 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> > >
> > >  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./mm/oom_kill.c
> > > 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> > >
> > >
> > >
> > > Could anyone please shed some light as to why the default oom_adj is
> > > set to -17 now (and where it is actually set)? From what I can tell,
> > > the fix for this issue will either be:
> > >
> > >   1. Allow OOM killer to override the decision of ignoring oom_adj ==
> > >      -17 if an unrecoverable loop is encountered.
> > >   2. Change the default back to 0.
> > >
> > > Again, my apologies if this bug report is slightly unorthodox, or
> > > doesn't follow usual procedure etc. I can assure you I have tried my
> > > absolute best to give all the necessary information though.
> > >
> > > Cal
> > >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* cgroup OOM killer loop causes system to lockup (possible fix included)
@ 2011-05-30 10:52         ` Cal Leeming [Simplicity Media Ltd]
  0 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-30 10:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users, linux-mm, kosaki.motohiro,
	minchan.kim, rientjes, kamezawa.hiroyu

-resent due to incorrect formatting, sorry if this dups!

@Kame
Thanks for the reply!
Both kernels used the same env/dist, but which slightly different packages.
After many frustrating hours, I have pin pointed this down to a dodgy
Debian package which appears to continue affecting the system, even
after purging. I'm still yet to pin point the package down (I'm doing
several reinstall tests, along with tripwire analysis after each
reboot).

@Hiroyuki
Thank you for sending this to the right people!

@linux-mm
On a side note, would someone mind taking a few minutes to give a
brief explanation as to how the default oom_adj is set, and under what
conditions it is given -17 by default? Is this defined by the
application? I looked through the kernel source,
and noticed some of the code was defaulted to set oom_adj to
OOM_DISABLE (which is defined in the headers as -17).

Assuming the debian problem is resolved, this might be another call
for the oom-killer to be modified so that if it encounters the
unrecoverable loop, it ignores the -17 rule (with some exceptions,
such as kernel processes, and other critical things). If this is going
to be a relatively simple task, I wouldn't mind spending a few hours
patching this?

Cal

On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
>
> On Sun, 29 May 2011 23:24:07 +0100
> "Cal Leeming [Simplicity Media Ltd]"  <cal.leeming@simplicitymedialtd.co.uk> wrote:
>
> > Some further logs:
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server
> > invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> > ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
> > [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >
> >
>
> hmm, in short, applications has -17 oom_adj in default with 2.6.32.41 ?
> AFAIK, no kernel has such crazy settings as default..
>
> Does your 2 kernel uses the same environment/distribution ?
>
> Thanks,
> -Kame
>
> > On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
> > >  First of all, my apologies if I have submitted this problem to the
> > > wrong place, spent 20 minutes trying to figure out where it needs to
> > > be sent, and was still none the wiser.
> > >
> > > The problem is related to applying memory limitations within a cgroup.
> > > If the OOM killer kicks in, it gets stuck in a loop where it tries to
> > > kill a process which has an oom_adj of -17. This causes an infinite
> > > loop, which in turn locks up the system.
> > >
> > > May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > > May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
> > > memory: kill process 6016 (java) score 0 or a child
> > >
> > >
> > >  root@vicky [/home/foxx] > uname -a
> > > Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
> > > GNU/Linux
> > > (this happens on both the grsec patched and non patched 2.6.32.41 kernel)
> > >
> > > When this is encountered, the memory usage across the whole server is
> > > still within limits (not even hitting swap).
> > >
> > > The memory configuration for the cgroup/lxc is:
> > > lxc.cgroup.memory.limit_in_bytes = 3000M
> > > lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
> > >
> > > Now, what is even more strange, is that when running under the
> > > 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
> > > happen. However, there is a slight difference between the two kernels.
> > > The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
> > > where as the 2.6.32.41 gives a default of -17. I suspect this is the
> > > root cause of why it's showing in the later kernel, but not the earlier.
> > >
> > > To test this theory, I started up the lxc on both servers, and then
> > > ran a one liner which showed me all the processes with an oom_adj of -17:
> > >
> > > (the below is the older/working kernel)
> > > root@courtney.internal [/mnt/encstore/lxc] > uname -a
> > > Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT
> > > 2011 x86_64 GNU/Linux
> > > root@courtney.internal [/mnt/encstore/lxc] > for x in `find /proc
> > > -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do
> > > ps -p $x --no-headers ; done
> > > grep: /proc/1411/task/1411/oom_adj: No such file or directory
> > > grep: /proc/1411/oom_adj: No such file or directory
> > >   804 ?        00:00:00 udevd
> > >   804 ?        00:00:00 udevd
> > > 25536 ?        00:00:00 sshd
> > > 25536 ?        00:00:00 sshd
> > > 31861 ?        00:00:00 sshd
> > > 31861 ?        00:00:00 sshd
> > > 32173 ?        00:00:00 udevd
> > > 32173 ?        00:00:00 udevd
> > > 32174 ?        00:00:00 udevd
> > > 32174 ?        00:00:00 udevd
> > >
> > > (the below is the newer/broken kernel)
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > uname -a
> > > Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
> > > GNU/Linux
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > for x in
> > > `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print
> > > $3}'` ; do ps -p $x --no-headers ; done
> > > grep: /proc/3118/task/3118/oom_adj: No such file or directory
> > > grep: /proc/3118/oom_adj: No such file or directory
> > >   895 ?        00:00:00 udevd
> > >   895 ?        00:00:00 udevd
> > >  1091 ?        00:00:00 udevd
> > >  1091 ?        00:00:00 udevd
> > >  1092 ?        00:00:00 udevd
> > >  1092 ?        00:00:00 udevd
> > >  2596 ?        00:00:00 sshd
> > >  2596 ?        00:00:00 sshd
> > >  2608 ?        00:00:00 sshd
> > >  2608 ?        00:00:00 sshd
> > >  2613 ?        00:00:00 sshd
> > >  2613 ?        00:00:00 sshd
> > >  2614 pts/0    00:00:00 bash
> > >  2614 pts/0    00:00:00 bash
> > >  2620 pts/0    00:00:00 sudo
> > >  2620 pts/0    00:00:00 sudo
> > >  2621 pts/0    00:00:00 su
> > >  2621 pts/0    00:00:00 su
> > >  2622 pts/0    00:00:00 bash
> > >  2622 pts/0    00:00:00 bash
> > >  2685 ?        00:00:00 lxc-start
> > >  2685 ?        00:00:00 lxc-start
> > >  2699 ?        00:00:00 init
> > >  2699 ?        00:00:00 init
> > >  2939 ?        00:00:00 rc
> > >  2939 ?        00:00:00 rc
> > >  2942 ?        00:00:00 startpar
> > >  2942 ?        00:00:00 startpar
> > >  2964 ?        00:00:00 rsyslogd
> > >  2964 ?        00:00:00 rsyslogd
> > >  2964 ?        00:00:00 rsyslogd
> > >  2964 ?        00:00:00 rsyslogd
> > >  2980 ?        00:00:00 startpar
> > >  2980 ?        00:00:00 startpar
> > >  2981 ?        00:00:00 ctlscript.sh
> > >  2981 ?        00:00:00 ctlscript.sh
> > >  3016 ?        00:00:00 cron
> > >  3016 ?        00:00:00 cron
> > >  3025 ?        00:00:00 mysqld_safe
> > >  3025 ?        00:00:00 mysqld_safe
> > >  3032 ?        00:00:00 sshd
> > >  3032 ?        00:00:00 sshd
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3097 ?        00:00:00 mysqld.bin
> > >  3113 ?        00:00:00 ctl.sh
> > >  3113 ?        00:00:00 ctl.sh
> > >  3115 ?        00:00:00 sleep
> > >  3115 ?        00:00:00 sleep
> > >  3116 ?        00:00:00 .memcached.bin
> > >  3116 ?        00:00:00 .memcached.bin
> > >
> > >
> > > As you can see, it is clear that the newer kernel is setting -17 by
> > > default, which in turn is causing the OOM killer loop.
> > >
> > > So I began to try and find what may have caused this problem by
> > > comparing the two sources...
> > >
> > > I checked the code for all references to 'oom_adj' and 'oom_adjust' in
> > > both code sets, but found no obvious differences:
> > > grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
> > > oom_adj
> > >
> > > Then I checked for references to "-17" in all .c and .h files, and
> > > found a couple of matches, but only one obvious one:
> > > grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
> > > ./include/linux/oom.h:#define OOM_DISABLE (-17)
> > >
> > > But again, a search for OOM_DISABLE came up with nothing obvious...
> > >
> > > In a last ditch attempt, I did a search for all references to 'oom'
> > > (case-insensitive) in both code bases, then compared the two:
> > >  root@annabelle [~/lol/linux-2.6.32.28] > grep -i -R "oom" . | sort -n
> > > > /tmp/annabelle.oom_adj
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > grep -i -R
> > > "oom" . | sort -n > /tmp/vicky.oom_adj
> > >
> > > and this brought back (yet again) nothing obvious..
> > >
> > >
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum
> > > ./include/linux/oom.h
> > > 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> > >
> > >  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./include/linux/oom.h
> > > 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> > >
> > >  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum
> > > ./mm/oom_kill.c
> > > 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> > >
> > >  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./mm/oom_kill.c
> > > 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> > >
> > >
> > >
> > > Could anyone please shed some light as to why the default oom_adj is
> > > set to -17 now (and where it is actually set)? From what I can tell,
> > > the fix for this issue will either be:
> > >
> > >   1. Allow OOM killer to override the decision of ignoring oom_adj ==
> > >      -17 if an unrecoverable loop is encountered.
> > >   2. Change the default back to 0.
> > >
> > > Again, my apologies if this bug report is slightly unorthodox, or
> > > doesn't follow usual procedure etc. I can assure you I have tried my
> > > absolute best to give all the necessary information though.
> > >
> > > Cal
> > >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: cgroup OOM killer loop causes system to lockup (possible fix included)
  2011-05-30 10:52         ` Cal Leeming [Simplicity Media Ltd]
  (?)
@ 2011-05-30 14:27           ` Cal Leeming [Simplicity Media Ltd]
  -1 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-30 14:27 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users, linux-mm, kosaki.motohiro,
	minchan.kim, rientjes, kamezawa.hiroyu

I FOUND THE PROBLEM!!!

Explicit details can be found on the Debian kernel mailing list, but to 
cut short, it's caused by the firmware-bnx2 kernel module:

The broken server uses 'firmware-bnx2'.. so I purged the bnx2 package, 
removed the bnx*.ko files from /lib/modules, ran update-initramfs, and 
then rebooted (i then confirmed it was removed by checking ifconfig and 
lsmod).

And guess what.. IT WORKED.

So, this problem seems to be caused by the firmware-bnx2 module being 
loaded.. some how, that module is causing -17 oom_adj to be set for 
everything..

WTF?!?! Surely a bug?? Could someone please forward this to the 
appropriate person for the bnx2 kernel module, as I wouldn't even know 
where to begin :S

Cal

On 30/05/2011 11:52, Cal Leeming [Simplicity Media Ltd] wrote:
> -resent due to incorrect formatting, sorry if this dups!
>
> @Kame
> Thanks for the reply!
> Both kernels used the same env/dist, but which slightly different packages.
> After many frustrating hours, I have pin pointed this down to a dodgy
> Debian package which appears to continue affecting the system, even
> after purging. I'm still yet to pin point the package down (I'm doing
> several reinstall tests, along with tripwire analysis after each
> reboot).
>
> @Hiroyuki
> Thank you for sending this to the right people!
>
> @linux-mm
> On a side note, would someone mind taking a few minutes to give a
> brief explanation as to how the default oom_adj is set, and under what
> conditions it is given -17 by default? Is this defined by the
> application? I looked through the kernel source,
> and noticed some of the code was defaulted to set oom_adj to
> OOM_DISABLE (which is defined in the headers as -17).
>
> Assuming the debian problem is resolved, this might be another call
> for the oom-killer to be modified so that if it encounters the
> unrecoverable loop, it ignores the -17 rule (with some exceptions,
> such as kernel processes, and other critical things). If this is going
> to be a relatively simple task, I wouldn't mind spending a few hours
> patching this?
>
> Cal
>
> On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com>  wrote:
>> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
>>
>> On Sun, 29 May 2011 23:24:07 +0100
>> "Cal Leeming [Simplicity Media Ltd]"<cal.leeming@simplicitymedialtd.co.uk>  wrote:
>>
>>> Some further logs:
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>
>>>
>> hmm, in short, applications has -17 oom_adj in default with 2.6.32.41 ?
>> AFAIK, no kernel has such crazy settings as default..
>>
>> Does your 2 kernel uses the same environment/distribution ?
>>
>> Thanks,
>> -Kame
>>
>>> On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
>>>>   First of all, my apologies if I have submitted this problem to the
>>>> wrong place, spent 20 minutes trying to figure out where it needs to
>>>> be sent, and was still none the wiser.
>>>>
>>>> The problem is related to applying memory limitations within a cgroup.
>>>> If the OOM killer kicks in, it gets stuck in a loop where it tries to
>>>> kill a process which has an oom_adj of -17. This causes an infinite
>>>> loop, which in turn locks up the system.
>>>>
>>>> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>>
>>>>
>>>>   root@vicky [/home/foxx]>  uname -a
>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
>>>> GNU/Linux
>>>> (this happens on both the grsec patched and non patched 2.6.32.41 kernel)
>>>>
>>>> When this is encountered, the memory usage across the whole server is
>>>> still within limits (not even hitting swap).
>>>>
>>>> The memory configuration for the cgroup/lxc is:
>>>> lxc.cgroup.memory.limit_in_bytes = 3000M
>>>> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
>>>>
>>>> Now, what is even more strange, is that when running under the
>>>> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
>>>> happen. However, there is a slight difference between the two kernels.
>>>> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
>>>> where as the 2.6.32.41 gives a default of -17. I suspect this is the
>>>> root cause of why it's showing in the later kernel, but not the earlier.
>>>>
>>>> To test this theory, I started up the lxc on both servers, and then
>>>> ran a one liner which showed me all the processes with an oom_adj of -17:
>>>>
>>>> (the below is the older/working kernel)
>>>> root@courtney.internal [/mnt/encstore/lxc]>  uname -a
>>>> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT
>>>> 2011 x86_64 GNU/Linux
>>>> root@courtney.internal [/mnt/encstore/lxc]>  for x in `find /proc
>>>> -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do
>>>> ps -p $x --no-headers ; done
>>>> grep: /proc/1411/task/1411/oom_adj: No such file or directory
>>>> grep: /proc/1411/oom_adj: No such file or directory
>>>>    804 ?        00:00:00 udevd
>>>>    804 ?        00:00:00 udevd
>>>> 25536 ?        00:00:00 sshd
>>>> 25536 ?        00:00:00 sshd
>>>> 31861 ?        00:00:00 sshd
>>>> 31861 ?        00:00:00 sshd
>>>> 32173 ?        00:00:00 udevd
>>>> 32173 ?        00:00:00 udevd
>>>> 32174 ?        00:00:00 udevd
>>>> 32174 ?        00:00:00 udevd
>>>>
>>>> (the below is the newer/broken kernel)
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  uname -a
>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
>>>> GNU/Linux
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  for x in
>>>> `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print
>>>> $3}'` ; do ps -p $x --no-headers ; done
>>>> grep: /proc/3118/task/3118/oom_adj: No such file or directory
>>>> grep: /proc/3118/oom_adj: No such file or directory
>>>>    895 ?        00:00:00 udevd
>>>>    895 ?        00:00:00 udevd
>>>>   1091 ?        00:00:00 udevd
>>>>   1091 ?        00:00:00 udevd
>>>>   1092 ?        00:00:00 udevd
>>>>   1092 ?        00:00:00 udevd
>>>>   2596 ?        00:00:00 sshd
>>>>   2596 ?        00:00:00 sshd
>>>>   2608 ?        00:00:00 sshd
>>>>   2608 ?        00:00:00 sshd
>>>>   2613 ?        00:00:00 sshd
>>>>   2613 ?        00:00:00 sshd
>>>>   2614 pts/0    00:00:00 bash
>>>>   2614 pts/0    00:00:00 bash
>>>>   2620 pts/0    00:00:00 sudo
>>>>   2620 pts/0    00:00:00 sudo
>>>>   2621 pts/0    00:00:00 su
>>>>   2621 pts/0    00:00:00 su
>>>>   2622 pts/0    00:00:00 bash
>>>>   2622 pts/0    00:00:00 bash
>>>>   2685 ?        00:00:00 lxc-start
>>>>   2685 ?        00:00:00 lxc-start
>>>>   2699 ?        00:00:00 init
>>>>   2699 ?        00:00:00 init
>>>>   2939 ?        00:00:00 rc
>>>>   2939 ?        00:00:00 rc
>>>>   2942 ?        00:00:00 startpar
>>>>   2942 ?        00:00:00 startpar
>>>>   2964 ?        00:00:00 rsyslogd
>>>>   2964 ?        00:00:00 rsyslogd
>>>>   2964 ?        00:00:00 rsyslogd
>>>>   2964 ?        00:00:00 rsyslogd
>>>>   2980 ?        00:00:00 startpar
>>>>   2980 ?        00:00:00 startpar
>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>   3016 ?        00:00:00 cron
>>>>   3016 ?        00:00:00 cron
>>>>   3025 ?        00:00:00 mysqld_safe
>>>>   3025 ?        00:00:00 mysqld_safe
>>>>   3032 ?        00:00:00 sshd
>>>>   3032 ?        00:00:00 sshd
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3113 ?        00:00:00 ctl.sh
>>>>   3113 ?        00:00:00 ctl.sh
>>>>   3115 ?        00:00:00 sleep
>>>>   3115 ?        00:00:00 sleep
>>>>   3116 ?        00:00:00 .memcached.bin
>>>>   3116 ?        00:00:00 .memcached.bin
>>>>
>>>>
>>>> As you can see, it is clear that the newer kernel is setting -17 by
>>>> default, which in turn is causing the OOM killer loop.
>>>>
>>>> So I began to try and find what may have caused this problem by
>>>> comparing the two sources...
>>>>
>>>> I checked the code for all references to 'oom_adj' and 'oom_adjust' in
>>>> both code sets, but found no obvious differences:
>>>> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
>>>> oom_adj
>>>>
>>>> Then I checked for references to "-17" in all .c and .h files, and
>>>> found a couple of matches, but only one obvious one:
>>>> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
>>>> ./include/linux/oom.h:#define OOM_DISABLE (-17)
>>>>
>>>> But again, a search for OOM_DISABLE came up with nothing obvious...
>>>>
>>>> In a last ditch attempt, I did a search for all references to 'oom'
>>>> (case-insensitive) in both code bases, then compared the two:
>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  grep -i -R "oom" . | sort -n
>>>>> /tmp/annabelle.oom_adj
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  grep -i -R
>>>> "oom" . | sort -n>  /tmp/vicky.oom_adj
>>>>
>>>> and this brought back (yet again) nothing obvious..
>>>>
>>>>
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>> ./include/linux/oom.h
>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>
>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum ./include/linux/oom.h
>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>> ./mm/oom_kill.c
>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>
>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum ./mm/oom_kill.c
>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>
>>>>
>>>>
>>>> Could anyone please shed some light as to why the default oom_adj is
>>>> set to -17 now (and where it is actually set)? From what I can tell,
>>>> the fix for this issue will either be:
>>>>
>>>>    1. Allow OOM killer to override the decision of ignoring oom_adj ==
>>>>       -17 if an unrecoverable loop is encountered.
>>>>    2. Change the default back to 0.
>>>>
>>>> Again, my apologies if this bug report is slightly unorthodox, or
>>>> doesn't follow usual procedure etc. I can assure you I have tried my
>>>> absolute best to give all the necessary information though.
>>>>
>>>> Cal
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at  http://www.tux.org/lkml/
>>>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: cgroup OOM killer loop causes system to lockup (possible fix included)
@ 2011-05-30 14:27           ` Cal Leeming [Simplicity Media Ltd]
  0 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-30 14:27 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users, linux-mm, kosaki.motohiro, 

I FOUND THE PROBLEM!!!

Explicit details can be found on the Debian kernel mailing list, but to 
cut short, it's caused by the firmware-bnx2 kernel module:

The broken server uses 'firmware-bnx2'.. so I purged the bnx2 package, 
removed the bnx*.ko files from /lib/modules, ran update-initramfs, and 
then rebooted (i then confirmed it was removed by checking ifconfig and 
lsmod).

And guess what.. IT WORKED.

So, this problem seems to be caused by the firmware-bnx2 module being 
loaded.. some how, that module is causing -17 oom_adj to be set for 
everything..

WTF?!?! Surely a bug?? Could someone please forward this to the 
appropriate person for the bnx2 kernel module, as I wouldn't even know 
where to begin :S

Cal

On 30/05/2011 11:52, Cal Leeming [Simplicity Media Ltd] wrote:
> -resent due to incorrect formatting, sorry if this dups!
>
> @Kame
> Thanks for the reply!
> Both kernels used the same env/dist, but which slightly different packages.
> After many frustrating hours, I have pin pointed this down to a dodgy
> Debian package which appears to continue affecting the system, even
> after purging. I'm still yet to pin point the package down (I'm doing
> several reinstall tests, along with tripwire analysis after each
> reboot).
>
> @Hiroyuki
> Thank you for sending this to the right people!
>
> @linux-mm
> On a side note, would someone mind taking a few minutes to give a
> brief explanation as to how the default oom_adj is set, and under what
> conditions it is given -17 by default? Is this defined by the
> application? I looked through the kernel source,
> and noticed some of the code was defaulted to set oom_adj to
> OOM_DISABLE (which is defined in the headers as -17).
>
> Assuming the debian problem is resolved, this might be another call
> for the oom-killer to be modified so that if it encounters the
> unrecoverable loop, it ignores the -17 rule (with some exceptions,
> such as kernel processes, and other critical things). If this is going
> to be a relatively simple task, I wouldn't mind spending a few hours
> patching this?
>
> Cal
>
> On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com>  wrote:
>> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
>>
>> On Sun, 29 May 2011 23:24:07 +0100
>> "Cal Leeming [Simplicity Media Ltd]"<cal.leeming@simplicitymedialtd.co.uk>  wrote:
>>
>>> Some further logs:
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>
>>>
>> hmm, in short, applications has -17 oom_adj in default with 2.6.32.41 ?
>> AFAIK, no kernel has such crazy settings as default..
>>
>> Does your 2 kernel uses the same environment/distribution ?
>>
>> Thanks,
>> -Kame
>>
>>> On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
>>>>   First of all, my apologies if I have submitted this problem to the
>>>> wrong place, spent 20 minutes trying to figure out where it needs to
>>>> be sent, and was still none the wiser.
>>>>
>>>> The problem is related to applying memory limitations within a cgroup.
>>>> If the OOM killer kicks in, it gets stuck in a loop where it tries to
>>>> kill a process which has an oom_adj of -17. This causes an infinite
>>>> loop, which in turn locks up the system.
>>>>
>>>> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>>
>>>>
>>>>   root@vicky [/home/foxx]>  uname -a
>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
>>>> GNU/Linux
>>>> (this happens on both the grsec patched and non patched 2.6.32.41 kernel)
>>>>
>>>> When this is encountered, the memory usage across the whole server is
>>>> still within limits (not even hitting swap).
>>>>
>>>> The memory configuration for the cgroup/lxc is:
>>>> lxc.cgroup.memory.limit_in_bytes = 3000M
>>>> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
>>>>
>>>> Now, what is even more strange, is that when running under the
>>>> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
>>>> happen. However, there is a slight difference between the two kernels.
>>>> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
>>>> where as the 2.6.32.41 gives a default of -17. I suspect this is the
>>>> root cause of why it's showing in the later kernel, but not the earlier.
>>>>
>>>> To test this theory, I started up the lxc on both servers, and then
>>>> ran a one liner which showed me all the processes with an oom_adj of -17:
>>>>
>>>> (the below is the older/working kernel)
>>>> root@courtney.internal [/mnt/encstore/lxc]>  uname -a
>>>> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT
>>>> 2011 x86_64 GNU/Linux
>>>> root@courtney.internal [/mnt/encstore/lxc]>  for x in `find /proc
>>>> -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do
>>>> ps -p $x --no-headers ; done
>>>> grep: /proc/1411/task/1411/oom_adj: No such file or directory
>>>> grep: /proc/1411/oom_adj: No such file or directory
>>>>    804 ?        00:00:00 udevd
>>>>    804 ?        00:00:00 udevd
>>>> 25536 ?        00:00:00 sshd
>>>> 25536 ?        00:00:00 sshd
>>>> 31861 ?        00:00:00 sshd
>>>> 31861 ?        00:00:00 sshd
>>>> 32173 ?        00:00:00 udevd
>>>> 32173 ?        00:00:00 udevd
>>>> 32174 ?        00:00:00 udevd
>>>> 32174 ?        00:00:00 udevd
>>>>
>>>> (the below is the newer/broken kernel)
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  uname -a
>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
>>>> GNU/Linux
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  for x in
>>>> `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print
>>>> $3}'` ; do ps -p $x --no-headers ; done
>>>> grep: /proc/3118/task/3118/oom_adj: No such file or directory
>>>> grep: /proc/3118/oom_adj: No such file or directory
>>>>    895 ?        00:00:00 udevd
>>>>    895 ?        00:00:00 udevd
>>>>   1091 ?        00:00:00 udevd
>>>>   1091 ?        00:00:00 udevd
>>>>   1092 ?        00:00:00 udevd
>>>>   1092 ?        00:00:00 udevd
>>>>   2596 ?        00:00:00 sshd
>>>>   2596 ?        00:00:00 sshd
>>>>   2608 ?        00:00:00 sshd
>>>>   2608 ?        00:00:00 sshd
>>>>   2613 ?        00:00:00 sshd
>>>>   2613 ?        00:00:00 sshd
>>>>   2614 pts/0    00:00:00 bash
>>>>   2614 pts/0    00:00:00 bash
>>>>   2620 pts/0    00:00:00 sudo
>>>>   2620 pts/0    00:00:00 sudo
>>>>   2621 pts/0    00:00:00 su
>>>>   2621 pts/0    00:00:00 su
>>>>   2622 pts/0    00:00:00 bash
>>>>   2622 pts/0    00:00:00 bash
>>>>   2685 ?        00:00:00 lxc-start
>>>>   2685 ?        00:00:00 lxc-start
>>>>   2699 ?        00:00:00 init
>>>>   2699 ?        00:00:00 init
>>>>   2939 ?        00:00:00 rc
>>>>   2939 ?        00:00:00 rc
>>>>   2942 ?        00:00:00 startpar
>>>>   2942 ?        00:00:00 startpar
>>>>   2964 ?        00:00:00 rsyslogd
>>>>   2964 ?        00:00:00 rsyslogd
>>>>   2964 ?        00:00:00 rsyslogd
>>>>   2964 ?        00:00:00 rsyslogd
>>>>   2980 ?        00:00:00 startpar
>>>>   2980 ?        00:00:00 startpar
>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>   3016 ?        00:00:00 cron
>>>>   3016 ?        00:00:00 cron
>>>>   3025 ?        00:00:00 mysqld_safe
>>>>   3025 ?        00:00:00 mysqld_safe
>>>>   3032 ?        00:00:00 sshd
>>>>   3032 ?        00:00:00 sshd
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3113 ?        00:00:00 ctl.sh
>>>>   3113 ?        00:00:00 ctl.sh
>>>>   3115 ?        00:00:00 sleep
>>>>   3115 ?        00:00:00 sleep
>>>>   3116 ?        00:00:00 .memcached.bin
>>>>   3116 ?        00:00:00 .memcached.bin
>>>>
>>>>
>>>> As you can see, it is clear that the newer kernel is setting -17 by
>>>> default, which in turn is causing the OOM killer loop.
>>>>
>>>> So I began to try and find what may have caused this problem by
>>>> comparing the two sources...
>>>>
>>>> I checked the code for all references to 'oom_adj' and 'oom_adjust' in
>>>> both code sets, but found no obvious differences:
>>>> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
>>>> oom_adj
>>>>
>>>> Then I checked for references to "-17" in all .c and .h files, and
>>>> found a couple of matches, but only one obvious one:
>>>> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
>>>> ./include/linux/oom.h:#define OOM_DISABLE (-17)
>>>>
>>>> But again, a search for OOM_DISABLE came up with nothing obvious...
>>>>
>>>> In a last ditch attempt, I did a search for all references to 'oom'
>>>> (case-insensitive) in both code bases, then compared the two:
>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  grep -i -R "oom" . | sort -n
>>>>> /tmp/annabelle.oom_adj
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  grep -i -R
>>>> "oom" . | sort -n>  /tmp/vicky.oom_adj
>>>>
>>>> and this brought back (yet again) nothing obvious..
>>>>
>>>>
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>> ./include/linux/oom.h
>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>
>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum ./include/linux/oom.h
>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>> ./mm/oom_kill.c
>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>
>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum ./mm/oom_kill.c
>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>
>>>>
>>>>
>>>> Could anyone please shed some light as to why the default oom_adj is
>>>> set to -17 now (and where it is actually set)? From what I can tell,
>>>> the fix for this issue will either be:
>>>>
>>>>    1. Allow OOM killer to override the decision of ignoring oom_adj ==
>>>>       -17 if an unrecoverable loop is encountered.
>>>>    2. Change the default back to 0.
>>>>
>>>> Again, my apologies if this bug report is slightly unorthodox, or
>>>> doesn't follow usual procedure etc. I can assure you I have tried my
>>>> absolute best to give all the necessary information though.
>>>>
>>>> Cal
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at  http://www.tux.org/lkml/
>>>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: cgroup OOM killer loop causes system to lockup (possible fix included)
@ 2011-05-30 14:27           ` Cal Leeming [Simplicity Media Ltd]
  0 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-30 14:27 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users, linux-mm, kosaki.motohiro,
	minchan.kim, rientjes, kamezawa.hiroyu

I FOUND THE PROBLEM!!!

Explicit details can be found on the Debian kernel mailing list, but to 
cut short, it's caused by the firmware-bnx2 kernel module:

The broken server uses 'firmware-bnx2'.. so I purged the bnx2 package, 
removed the bnx*.ko files from /lib/modules, ran update-initramfs, and 
then rebooted (i then confirmed it was removed by checking ifconfig and 
lsmod).

And guess what.. IT WORKED.

So, this problem seems to be caused by the firmware-bnx2 module being 
loaded.. some how, that module is causing -17 oom_adj to be set for 
everything..

WTF?!?! Surely a bug?? Could someone please forward this to the 
appropriate person for the bnx2 kernel module, as I wouldn't even know 
where to begin :S

Cal

On 30/05/2011 11:52, Cal Leeming [Simplicity Media Ltd] wrote:
> -resent due to incorrect formatting, sorry if this dups!
>
> @Kame
> Thanks for the reply!
> Both kernels used the same env/dist, but which slightly different packages.
> After many frustrating hours, I have pin pointed this down to a dodgy
> Debian package which appears to continue affecting the system, even
> after purging. I'm still yet to pin point the package down (I'm doing
> several reinstall tests, along with tripwire analysis after each
> reboot).
>
> @Hiroyuki
> Thank you for sending this to the right people!
>
> @linux-mm
> On a side note, would someone mind taking a few minutes to give a
> brief explanation as to how the default oom_adj is set, and under what
> conditions it is given -17 by default? Is this defined by the
> application? I looked through the kernel source,
> and noticed some of the code was defaulted to set oom_adj to
> OOM_DISABLE (which is defined in the headers as -17).
>
> Assuming the debian problem is resolved, this might be another call
> for the oom-killer to be modified so that if it encounters the
> unrecoverable loop, it ignores the -17 rule (with some exceptions,
> such as kernel processes, and other critical things). If this is going
> to be a relatively simple task, I wouldn't mind spending a few hours
> patching this?
>
> Cal
>
> On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com>  wrote:
>> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
>>
>> On Sun, 29 May 2011 23:24:07 +0100
>> "Cal Leeming [Simplicity Media Ltd]"<cal.leeming@simplicitymedialtd.co.uk>  wrote:
>>
>>> Some further logs:
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server
>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>
>>>
>> hmm, in short, applications has -17 oom_adj in default with 2.6.32.41 ?
>> AFAIK, no kernel has such crazy settings as default..
>>
>> Does your 2 kernel uses the same environment/distribution ?
>>
>> Thanks,
>> -Kame
>>
>>> On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
>>>>   First of all, my apologies if I have submitted this problem to the
>>>> wrong place, spent 20 minutes trying to figure out where it needs to
>>>> be sent, and was still none the wiser.
>>>>
>>>> The problem is related to applying memory limitations within a cgroup.
>>>> If the OOM killer kicks in, it gets stuck in a loop where it tries to
>>>> kill a process which has an oom_adj of -17. This causes an infinite
>>>> loop, which in turn locks up the system.
>>>>
>>>> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
>>>> memory: kill process 6016 (java) score 0 or a child
>>>>
>>>>
>>>>   root@vicky [/home/foxx]>  uname -a
>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
>>>> GNU/Linux
>>>> (this happens on both the grsec patched and non patched 2.6.32.41 kernel)
>>>>
>>>> When this is encountered, the memory usage across the whole server is
>>>> still within limits (not even hitting swap).
>>>>
>>>> The memory configuration for the cgroup/lxc is:
>>>> lxc.cgroup.memory.limit_in_bytes = 3000M
>>>> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
>>>>
>>>> Now, what is even more strange, is that when running under the
>>>> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
>>>> happen. However, there is a slight difference between the two kernels.
>>>> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
>>>> where as the 2.6.32.41 gives a default of -17. I suspect this is the
>>>> root cause of why it's showing in the later kernel, but not the earlier.
>>>>
>>>> To test this theory, I started up the lxc on both servers, and then
>>>> ran a one liner which showed me all the processes with an oom_adj of -17:
>>>>
>>>> (the below is the older/working kernel)
>>>> root@courtney.internal [/mnt/encstore/lxc]>  uname -a
>>>> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT
>>>> 2011 x86_64 GNU/Linux
>>>> root@courtney.internal [/mnt/encstore/lxc]>  for x in `find /proc
>>>> -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do
>>>> ps -p $x --no-headers ; done
>>>> grep: /proc/1411/task/1411/oom_adj: No such file or directory
>>>> grep: /proc/1411/oom_adj: No such file or directory
>>>>    804 ?        00:00:00 udevd
>>>>    804 ?        00:00:00 udevd
>>>> 25536 ?        00:00:00 sshd
>>>> 25536 ?        00:00:00 sshd
>>>> 31861 ?        00:00:00 sshd
>>>> 31861 ?        00:00:00 sshd
>>>> 32173 ?        00:00:00 udevd
>>>> 32173 ?        00:00:00 udevd
>>>> 32174 ?        00:00:00 udevd
>>>> 32174 ?        00:00:00 udevd
>>>>
>>>> (the below is the newer/broken kernel)
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  uname -a
>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
>>>> GNU/Linux
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  for x in
>>>> `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print
>>>> $3}'` ; do ps -p $x --no-headers ; done
>>>> grep: /proc/3118/task/3118/oom_adj: No such file or directory
>>>> grep: /proc/3118/oom_adj: No such file or directory
>>>>    895 ?        00:00:00 udevd
>>>>    895 ?        00:00:00 udevd
>>>>   1091 ?        00:00:00 udevd
>>>>   1091 ?        00:00:00 udevd
>>>>   1092 ?        00:00:00 udevd
>>>>   1092 ?        00:00:00 udevd
>>>>   2596 ?        00:00:00 sshd
>>>>   2596 ?        00:00:00 sshd
>>>>   2608 ?        00:00:00 sshd
>>>>   2608 ?        00:00:00 sshd
>>>>   2613 ?        00:00:00 sshd
>>>>   2613 ?        00:00:00 sshd
>>>>   2614 pts/0    00:00:00 bash
>>>>   2614 pts/0    00:00:00 bash
>>>>   2620 pts/0    00:00:00 sudo
>>>>   2620 pts/0    00:00:00 sudo
>>>>   2621 pts/0    00:00:00 su
>>>>   2621 pts/0    00:00:00 su
>>>>   2622 pts/0    00:00:00 bash
>>>>   2622 pts/0    00:00:00 bash
>>>>   2685 ?        00:00:00 lxc-start
>>>>   2685 ?        00:00:00 lxc-start
>>>>   2699 ?        00:00:00 init
>>>>   2699 ?        00:00:00 init
>>>>   2939 ?        00:00:00 rc
>>>>   2939 ?        00:00:00 rc
>>>>   2942 ?        00:00:00 startpar
>>>>   2942 ?        00:00:00 startpar
>>>>   2964 ?        00:00:00 rsyslogd
>>>>   2964 ?        00:00:00 rsyslogd
>>>>   2964 ?        00:00:00 rsyslogd
>>>>   2964 ?        00:00:00 rsyslogd
>>>>   2980 ?        00:00:00 startpar
>>>>   2980 ?        00:00:00 startpar
>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>   3016 ?        00:00:00 cron
>>>>   3016 ?        00:00:00 cron
>>>>   3025 ?        00:00:00 mysqld_safe
>>>>   3025 ?        00:00:00 mysqld_safe
>>>>   3032 ?        00:00:00 sshd
>>>>   3032 ?        00:00:00 sshd
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3097 ?        00:00:00 mysqld.bin
>>>>   3113 ?        00:00:00 ctl.sh
>>>>   3113 ?        00:00:00 ctl.sh
>>>>   3115 ?        00:00:00 sleep
>>>>   3115 ?        00:00:00 sleep
>>>>   3116 ?        00:00:00 .memcached.bin
>>>>   3116 ?        00:00:00 .memcached.bin
>>>>
>>>>
>>>> As you can see, it is clear that the newer kernel is setting -17 by
>>>> default, which in turn is causing the OOM killer loop.
>>>>
>>>> So I began to try and find what may have caused this problem by
>>>> comparing the two sources...
>>>>
>>>> I checked the code for all references to 'oom_adj' and 'oom_adjust' in
>>>> both code sets, but found no obvious differences:
>>>> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
>>>> oom_adj
>>>>
>>>> Then I checked for references to "-17" in all .c and .h files, and
>>>> found a couple of matches, but only one obvious one:
>>>> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
>>>> ./include/linux/oom.h:#define OOM_DISABLE (-17)
>>>>
>>>> But again, a search for OOM_DISABLE came up with nothing obvious...
>>>>
>>>> In a last ditch attempt, I did a search for all references to 'oom'
>>>> (case-insensitive) in both code bases, then compared the two:
>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  grep -i -R "oom" . | sort -n
>>>>> /tmp/annabelle.oom_adj
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  grep -i -R
>>>> "oom" . | sort -n>  /tmp/vicky.oom_adj
>>>>
>>>> and this brought back (yet again) nothing obvious..
>>>>
>>>>
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>> ./include/linux/oom.h
>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>
>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum ./include/linux/oom.h
>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>
>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>> ./mm/oom_kill.c
>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>
>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum ./mm/oom_kill.c
>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>
>>>>
>>>>
>>>> Could anyone please shed some light as to why the default oom_adj is
>>>> set to -17 now (and where it is actually set)? From what I can tell,
>>>> the fix for this issue will either be:
>>>>
>>>>    1. Allow OOM killer to override the decision of ignoring oom_adj ==
>>>>       -17 if an unrecoverable loop is encountered.
>>>>    2. Change the default back to 0.
>>>>
>>>> Again, my apologies if this bug report is slightly unorthodox, or
>>>> doesn't follow usual procedure etc. I can assure you I have tried my
>>>> absolute best to give all the necessary information though.
>>>>
>>>> Cal
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at  http://www.tux.org/lkml/
>>>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: cgroup OOM killer loop causes system to lockup (possible fix included)
  2011-05-30 14:27           ` Cal Leeming [Simplicity Media Ltd]
@ 2011-05-30 17:36             ` Cal Leeming [Simplicity Media Ltd]
  -1 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-30 17:36 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kosaki.motohiro, minchan.kim, rientjes,
	kamezawa.hiroyu

Here is an strace of the SSH process (which is somehow inheriting the 
-17 oom_adj on all forked user instances)

(broken server - with bnx2 module loaded)
[pid  2200] [    7f13a09c9cb0] open("/proc/self/oom_adj", 
O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
[pid  2120] [    7f13a09c9f00] write(7, "\0\0\2\240\n\n\n\nPort 
22\n\n\n\nProtocol 2\n\nH"..., 680 <unfinished ...>
[pid  2200] [    7f13a09c9cb0] <... open resumed> ) = 9
[pid  2120] [    7f13a09c9f00] <... write resumed> ) = 680
[pid  2120] [    7f13a09c9e40] close(7 <unfinished ...>
[pid  2200] [    7f13a09c9844] fstat(9, <unfinished ...>
[pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0
[pid  2200] [    7f13a09c9844] <... fstat resumed> 
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid  2120] [    7f13a09c9e40] close(8 <unfinished ...>
[pid  2200] [    7f13a09d2a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0
[pid  2200] [    7f13a09d2a2a] <... mmap resumed> ) = 0x7f13a25a6000
[pid  2120] [    7f13a09c9e40] close(4 <unfinished ...>
[pid  2200] [    7f13a09c9f00] write(9, "-17\n", 4 <unfinished ...>


(working server - with bnx2 module unloaded)
[pid  1323] [    7fae577fbe40] close(7) = 0
[pid  1631] [    7fae577fbcb0] open("/proc/self/oom_adj", 
O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
[pid  1323] [    7fae577fbf00] write(8, "\0\0\2\217\0", 5 <unfinished ...>
[pid  1631] [    7fae577fbcb0] <... open resumed> ) = 10
[pid  1323] [    7fae577fbf00] <... write resumed> ) = 5
[pid  1323] [    7fae577fbf00] write(8, "\0\0\2\206\n\n\n\nPort 
22\n\n\n\nProtocol 2\n\nH"..., 654 <unfinished ...>
[pid  1631] [    7fae577fb844] fstat(10, <unfinished ...>
[pid  1323] [    7fae577fbf00] <... write resumed> ) = 654
[pid  1631] [    7fae577fb844] <... fstat resumed> 
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid  1323] [    7fae577fbe40] close(8) = 0
[pid  1631] [    7fae57804a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid  1323] [    7fae577fbe40] close(9 <unfinished ...>
[pid  1631] [    7fae57804a2a] <... mmap resumed> ) = 0x7fae593d9000
[pid  1323] [    7fae577fbe40] <... close resumed> ) = 0
[pid  1323] [    7fae577fbe40] close(5 <unfinished ...>
[pid  1631] [    7fae577fbf00] write(10, "0\n", 2 <unfinished ...>

The two servers are *EXACT* duplicates of each other, completely fresh 
Debian installs, with exactly the same packages installed.

As you can see, the working server sends "0" into the oom_adj and the 
broken one sends "-17".


On 30/05/2011 15:27, Cal Leeming [Simplicity Media Ltd] wrote:
> I FOUND THE PROBLEM!!!
>
> Explicit details can be found on the Debian kernel mailing list, but 
> to cut short, it's caused by the firmware-bnx2 kernel module:
>
> The broken server uses 'firmware-bnx2'.. so I purged the bnx2 package, 
> removed the bnx*.ko files from /lib/modules, ran update-initramfs, and 
> then rebooted (i then confirmed it was removed by checking ifconfig 
> and lsmod).
>
> And guess what.. IT WORKED.
>
> So, this problem seems to be caused by the firmware-bnx2 module being 
> loaded.. some how, that module is causing -17 oom_adj to be set for 
> everything..
>
> WTF?!?! Surely a bug?? Could someone please forward this to the 
> appropriate person for the bnx2 kernel module, as I wouldn't even know 
> where to begin :S
>
> Cal
>
> On 30/05/2011 11:52, Cal Leeming [Simplicity Media Ltd] wrote:
>> -resent due to incorrect formatting, sorry if this dups!
>>
>> @Kame
>> Thanks for the reply!
>> Both kernels used the same env/dist, but which slightly different 
>> packages.
>> After many frustrating hours, I have pin pointed this down to a dodgy
>> Debian package which appears to continue affecting the system, even
>> after purging. I'm still yet to pin point the package down (I'm doing
>> several reinstall tests, along with tripwire analysis after each
>> reboot).
>>
>> @Hiroyuki
>> Thank you for sending this to the right people!
>>
>> @linux-mm
>> On a side note, would someone mind taking a few minutes to give a
>> brief explanation as to how the default oom_adj is set, and under what
>> conditions it is given -17 by default? Is this defined by the
>> application? I looked through the kernel source,
>> and noticed some of the code was defaulted to set oom_adj to
>> OOM_DISABLE (which is defined in the headers as -17).
>>
>> Assuming the debian problem is resolved, this might be another call
>> for the oom-killer to be modified so that if it encounters the
>> unrecoverable loop, it ignores the -17 rule (with some exceptions,
>> such as kernel processes, and other critical things). If this is going
>> to be a relatively simple task, I wouldn't mind spending a few hours
>> patching this?
>>
>> Cal
>>
>> On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com>  wrote:
>>> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
>>>
>>> On Sun, 29 May 2011 23:24:07 +0100
>>> "Cal Leeming [Simplicity Media 
>>> Ltd]"<cal.leeming@simplicitymedialtd.co.uk>  wrote:
>>>
>>>> Some further logs:
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>
>>>>
>>> hmm, in short, applications has -17 oom_adj in default with 2.6.32.41 ?
>>> AFAIK, no kernel has such crazy settings as default..
>>>
>>> Does your 2 kernel uses the same environment/distribution ?
>>>
>>> Thanks,
>>> -Kame
>>>
>>>> On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
>>>>>   First of all, my apologies if I have submitted this problem to the
>>>>> wrong place, spent 20 minutes trying to figure out where it needs to
>>>>> be sent, and was still none the wiser.
>>>>>
>>>>> The problem is related to applying memory limitations within a 
>>>>> cgroup.
>>>>> If the OOM killer kicks in, it gets stuck in a loop where it tries to
>>>>> kill a process which has an oom_adj of -17. This causes an infinite
>>>>> loop, which in turn locks up the system.
>>>>>
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>
>>>>>
>>>>>   root@vicky [/home/foxx]>  uname -a
>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 
>>>>> x86_64
>>>>> GNU/Linux
>>>>> (this happens on both the grsec patched and non patched 2.6.32.41 
>>>>> kernel)
>>>>>
>>>>> When this is encountered, the memory usage across the whole server is
>>>>> still within limits (not even hitting swap).
>>>>>
>>>>> The memory configuration for the cgroup/lxc is:
>>>>> lxc.cgroup.memory.limit_in_bytes = 3000M
>>>>> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
>>>>>
>>>>> Now, what is even more strange, is that when running under the
>>>>> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
>>>>> happen. However, there is a slight difference between the two 
>>>>> kernels.
>>>>> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
>>>>> where as the 2.6.32.41 gives a default of -17. I suspect this is the
>>>>> root cause of why it's showing in the later kernel, but not the 
>>>>> earlier.
>>>>>
>>>>> To test this theory, I started up the lxc on both servers, and then
>>>>> ran a one liner which showed me all the processes with an oom_adj 
>>>>> of -17:
>>>>>
>>>>> (the below is the older/working kernel)
>>>>> root@courtney.internal [/mnt/encstore/lxc]>  uname -a
>>>>> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 
>>>>> GMT
>>>>> 2011 x86_64 GNU/Linux
>>>>> root@courtney.internal [/mnt/encstore/lxc]>  for x in `find /proc
>>>>> -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do
>>>>> ps -p $x --no-headers ; done
>>>>> grep: /proc/1411/task/1411/oom_adj: No such file or directory
>>>>> grep: /proc/1411/oom_adj: No such file or directory
>>>>>    804 ?        00:00:00 udevd
>>>>>    804 ?        00:00:00 udevd
>>>>> 25536 ?        00:00:00 sshd
>>>>> 25536 ?        00:00:00 sshd
>>>>> 31861 ?        00:00:00 sshd
>>>>> 31861 ?        00:00:00 sshd
>>>>> 32173 ?        00:00:00 udevd
>>>>> 32173 ?        00:00:00 udevd
>>>>> 32174 ?        00:00:00 udevd
>>>>> 32174 ?        00:00:00 udevd
>>>>>
>>>>> (the below is the newer/broken kernel)
>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  uname -a
>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 
>>>>> x86_64
>>>>> GNU/Linux
>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  for x in
>>>>> `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' 
>>>>> '{print
>>>>> $3}'` ; do ps -p $x --no-headers ; done
>>>>> grep: /proc/3118/task/3118/oom_adj: No such file or directory
>>>>> grep: /proc/3118/oom_adj: No such file or directory
>>>>>    895 ?        00:00:00 udevd
>>>>>    895 ?        00:00:00 udevd
>>>>>   1091 ?        00:00:00 udevd
>>>>>   1091 ?        00:00:00 udevd
>>>>>   1092 ?        00:00:00 udevd
>>>>>   1092 ?        00:00:00 udevd
>>>>>   2596 ?        00:00:00 sshd
>>>>>   2596 ?        00:00:00 sshd
>>>>>   2608 ?        00:00:00 sshd
>>>>>   2608 ?        00:00:00 sshd
>>>>>   2613 ?        00:00:00 sshd
>>>>>   2613 ?        00:00:00 sshd
>>>>>   2614 pts/0    00:00:00 bash
>>>>>   2614 pts/0    00:00:00 bash
>>>>>   2620 pts/0    00:00:00 sudo
>>>>>   2620 pts/0    00:00:00 sudo
>>>>>   2621 pts/0    00:00:00 su
>>>>>   2621 pts/0    00:00:00 su
>>>>>   2622 pts/0    00:00:00 bash
>>>>>   2622 pts/0    00:00:00 bash
>>>>>   2685 ?        00:00:00 lxc-start
>>>>>   2685 ?        00:00:00 lxc-start
>>>>>   2699 ?        00:00:00 init
>>>>>   2699 ?        00:00:00 init
>>>>>   2939 ?        00:00:00 rc
>>>>>   2939 ?        00:00:00 rc
>>>>>   2942 ?        00:00:00 startpar
>>>>>   2942 ?        00:00:00 startpar
>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>   2980 ?        00:00:00 startpar
>>>>>   2980 ?        00:00:00 startpar
>>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>>   3016 ?        00:00:00 cron
>>>>>   3016 ?        00:00:00 cron
>>>>>   3025 ?        00:00:00 mysqld_safe
>>>>>   3025 ?        00:00:00 mysqld_safe
>>>>>   3032 ?        00:00:00 sshd
>>>>>   3032 ?        00:00:00 sshd
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3113 ?        00:00:00 ctl.sh
>>>>>   3113 ?        00:00:00 ctl.sh
>>>>>   3115 ?        00:00:00 sleep
>>>>>   3115 ?        00:00:00 sleep
>>>>>   3116 ?        00:00:00 .memcached.bin
>>>>>   3116 ?        00:00:00 .memcached.bin
>>>>>
>>>>>
>>>>> As you can see, it is clear that the newer kernel is setting -17 by
>>>>> default, which in turn is causing the OOM killer loop.
>>>>>
>>>>> So I began to try and find what may have caused this problem by
>>>>> comparing the two sources...
>>>>>
>>>>> I checked the code for all references to 'oom_adj' and 
>>>>> 'oom_adjust' in
>>>>> both code sets, but found no obvious differences:
>>>>> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
>>>>> oom_adj
>>>>>
>>>>> Then I checked for references to "-17" in all .c and .h files, and
>>>>> found a couple of matches, but only one obvious one:
>>>>> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
>>>>> ./include/linux/oom.h:#define OOM_DISABLE (-17)
>>>>>
>>>>> But again, a search for OOM_DISABLE came up with nothing obvious...
>>>>>
>>>>> In a last ditch attempt, I did a search for all references to 'oom'
>>>>> (case-insensitive) in both code bases, then compared the two:
>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  grep -i -R "oom" . | 
>>>>> sort -n
>>>>>> /tmp/annabelle.oom_adj
>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  grep -i -R
>>>>> "oom" . | sort -n>  /tmp/vicky.oom_adj
>>>>>
>>>>> and this brought back (yet again) nothing obvious..
>>>>>
>>>>>
>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>>> ./include/linux/oom.h
>>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>>
>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum 
>>>>> ./include/linux/oom.h
>>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>>
>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>>> ./mm/oom_kill.c
>>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>>
>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum ./mm/oom_kill.c
>>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>>
>>>>>
>>>>>
>>>>> Could anyone please shed some light as to why the default oom_adj is
>>>>> set to -17 now (and where it is actually set)? From what I can tell,
>>>>> the fix for this issue will either be:
>>>>>
>>>>>    1. Allow OOM killer to override the decision of ignoring 
>>>>> oom_adj ==
>>>>>       -17 if an unrecoverable loop is encountered.
>>>>>    2. Change the default back to 0.
>>>>>
>>>>> Again, my apologies if this bug report is slightly unorthodox, or
>>>>> doesn't follow usual procedure etc. I can assure you I have tried my
>>>>> absolute best to give all the necessary information though.
>>>>>
>>>>> Cal
>>>>>
>>>> -- 
>>>> To unsubscribe from this list: send the line "unsubscribe 
>>>> linux-kernel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>> Please read the FAQ at  http://www.tux.org/lkml/
>>>>
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: cgroup OOM killer loop causes system to lockup (possible fix included)
@ 2011-05-30 17:36             ` Cal Leeming [Simplicity Media Ltd]
  0 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-30 17:36 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kosaki.motohiro, minchan.kim, rientjes,
	kamezawa.hiroyu

Here is an strace of the SSH process (which is somehow inheriting the 
-17 oom_adj on all forked user instances)

(broken server - with bnx2 module loaded)
[pid  2200] [    7f13a09c9cb0] open("/proc/self/oom_adj", 
O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
[pid  2120] [    7f13a09c9f00] write(7, "\0\0\2\240\n\n\n\nPort 
22\n\n\n\nProtocol 2\n\nH"..., 680 <unfinished ...>
[pid  2200] [    7f13a09c9cb0] <... open resumed> ) = 9
[pid  2120] [    7f13a09c9f00] <... write resumed> ) = 680
[pid  2120] [    7f13a09c9e40] close(7 <unfinished ...>
[pid  2200] [    7f13a09c9844] fstat(9, <unfinished ...>
[pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0
[pid  2200] [    7f13a09c9844] <... fstat resumed> 
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid  2120] [    7f13a09c9e40] close(8 <unfinished ...>
[pid  2200] [    7f13a09d2a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0
[pid  2200] [    7f13a09d2a2a] <... mmap resumed> ) = 0x7f13a25a6000
[pid  2120] [    7f13a09c9e40] close(4 <unfinished ...>
[pid  2200] [    7f13a09c9f00] write(9, "-17\n", 4 <unfinished ...>


(working server - with bnx2 module unloaded)
[pid  1323] [    7fae577fbe40] close(7) = 0
[pid  1631] [    7fae577fbcb0] open("/proc/self/oom_adj", 
O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
[pid  1323] [    7fae577fbf00] write(8, "\0\0\2\217\0", 5 <unfinished ...>
[pid  1631] [    7fae577fbcb0] <... open resumed> ) = 10
[pid  1323] [    7fae577fbf00] <... write resumed> ) = 5
[pid  1323] [    7fae577fbf00] write(8, "\0\0\2\206\n\n\n\nPort 
22\n\n\n\nProtocol 2\n\nH"..., 654 <unfinished ...>
[pid  1631] [    7fae577fb844] fstat(10, <unfinished ...>
[pid  1323] [    7fae577fbf00] <... write resumed> ) = 654
[pid  1631] [    7fae577fb844] <... fstat resumed> 
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid  1323] [    7fae577fbe40] close(8) = 0
[pid  1631] [    7fae57804a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid  1323] [    7fae577fbe40] close(9 <unfinished ...>
[pid  1631] [    7fae57804a2a] <... mmap resumed> ) = 0x7fae593d9000
[pid  1323] [    7fae577fbe40] <... close resumed> ) = 0
[pid  1323] [    7fae577fbe40] close(5 <unfinished ...>
[pid  1631] [    7fae577fbf00] write(10, "0\n", 2 <unfinished ...>

The two servers are *EXACT* duplicates of each other, completely fresh 
Debian installs, with exactly the same packages installed.

As you can see, the working server sends "0" into the oom_adj and the 
broken one sends "-17".


On 30/05/2011 15:27, Cal Leeming [Simplicity Media Ltd] wrote:
> I FOUND THE PROBLEM!!!
>
> Explicit details can be found on the Debian kernel mailing list, but 
> to cut short, it's caused by the firmware-bnx2 kernel module:
>
> The broken server uses 'firmware-bnx2'.. so I purged the bnx2 package, 
> removed the bnx*.ko files from /lib/modules, ran update-initramfs, and 
> then rebooted (i then confirmed it was removed by checking ifconfig 
> and lsmod).
>
> And guess what.. IT WORKED.
>
> So, this problem seems to be caused by the firmware-bnx2 module being 
> loaded.. some how, that module is causing -17 oom_adj to be set for 
> everything..
>
> WTF?!?! Surely a bug?? Could someone please forward this to the 
> appropriate person for the bnx2 kernel module, as I wouldn't even know 
> where to begin :S
>
> Cal
>
> On 30/05/2011 11:52, Cal Leeming [Simplicity Media Ltd] wrote:
>> -resent due to incorrect formatting, sorry if this dups!
>>
>> @Kame
>> Thanks for the reply!
>> Both kernels used the same env/dist, but which slightly different 
>> packages.
>> After many frustrating hours, I have pin pointed this down to a dodgy
>> Debian package which appears to continue affecting the system, even
>> after purging. I'm still yet to pin point the package down (I'm doing
>> several reinstall tests, along with tripwire analysis after each
>> reboot).
>>
>> @Hiroyuki
>> Thank you for sending this to the right people!
>>
>> @linux-mm
>> On a side note, would someone mind taking a few minutes to give a
>> brief explanation as to how the default oom_adj is set, and under what
>> conditions it is given -17 by default? Is this defined by the
>> application? I looked through the kernel source,
>> and noticed some of the code was defaulted to set oom_adj to
>> OOM_DISABLE (which is defined in the headers as -17).
>>
>> Assuming the debian problem is resolved, this might be another call
>> for the oom-killer to be modified so that if it encounters the
>> unrecoverable loop, it ignores the -17 rule (with some exceptions,
>> such as kernel processes, and other critical things). If this is going
>> to be a relatively simple task, I wouldn't mind spending a few hours
>> patching this?
>>
>> Cal
>>
>> On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com>  wrote:
>>> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
>>>
>>> On Sun, 29 May 2011 23:24:07 +0100
>>> "Cal Leeming [Simplicity Media 
>>> Ltd]"<cal.leeming@simplicitymedialtd.co.uk>  wrote:
>>>
>>>> Some further logs:
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server
>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>
>>>>
>>> hmm, in short, applications has -17 oom_adj in default with 2.6.32.41 ?
>>> AFAIK, no kernel has such crazy settings as default..
>>>
>>> Does your 2 kernel uses the same environment/distribution ?
>>>
>>> Thanks,
>>> -Kame
>>>
>>>> On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
>>>>>   First of all, my apologies if I have submitted this problem to the
>>>>> wrong place, spent 20 minutes trying to figure out where it needs to
>>>>> be sent, and was still none the wiser.
>>>>>
>>>>> The problem is related to applying memory limitations within a 
>>>>> cgroup.
>>>>> If the OOM killer kicks in, it gets stuck in a loop where it tries to
>>>>> kill a process which has an oom_adj of -17. This causes an infinite
>>>>> loop, which in turn locks up the system.
>>>>>
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>
>>>>>
>>>>>   root@vicky [/home/foxx]>  uname -a
>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 
>>>>> x86_64
>>>>> GNU/Linux
>>>>> (this happens on both the grsec patched and non patched 2.6.32.41 
>>>>> kernel)
>>>>>
>>>>> When this is encountered, the memory usage across the whole server is
>>>>> still within limits (not even hitting swap).
>>>>>
>>>>> The memory configuration for the cgroup/lxc is:
>>>>> lxc.cgroup.memory.limit_in_bytes = 3000M
>>>>> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
>>>>>
>>>>> Now, what is even more strange, is that when running under the
>>>>> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
>>>>> happen. However, there is a slight difference between the two 
>>>>> kernels.
>>>>> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
>>>>> where as the 2.6.32.41 gives a default of -17. I suspect this is the
>>>>> root cause of why it's showing in the later kernel, but not the 
>>>>> earlier.
>>>>>
>>>>> To test this theory, I started up the lxc on both servers, and then
>>>>> ran a one liner which showed me all the processes with an oom_adj 
>>>>> of -17:
>>>>>
>>>>> (the below is the older/working kernel)
>>>>> root@courtney.internal [/mnt/encstore/lxc]>  uname -a
>>>>> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 
>>>>> GMT
>>>>> 2011 x86_64 GNU/Linux
>>>>> root@courtney.internal [/mnt/encstore/lxc]>  for x in `find /proc
>>>>> -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do
>>>>> ps -p $x --no-headers ; done
>>>>> grep: /proc/1411/task/1411/oom_adj: No such file or directory
>>>>> grep: /proc/1411/oom_adj: No such file or directory
>>>>>    804 ?        00:00:00 udevd
>>>>>    804 ?        00:00:00 udevd
>>>>> 25536 ?        00:00:00 sshd
>>>>> 25536 ?        00:00:00 sshd
>>>>> 31861 ?        00:00:00 sshd
>>>>> 31861 ?        00:00:00 sshd
>>>>> 32173 ?        00:00:00 udevd
>>>>> 32173 ?        00:00:00 udevd
>>>>> 32174 ?        00:00:00 udevd
>>>>> 32174 ?        00:00:00 udevd
>>>>>
>>>>> (the below is the newer/broken kernel)
>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  uname -a
>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 
>>>>> x86_64
>>>>> GNU/Linux
>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  for x in
>>>>> `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' 
>>>>> '{print
>>>>> $3}'` ; do ps -p $x --no-headers ; done
>>>>> grep: /proc/3118/task/3118/oom_adj: No such file or directory
>>>>> grep: /proc/3118/oom_adj: No such file or directory
>>>>>    895 ?        00:00:00 udevd
>>>>>    895 ?        00:00:00 udevd
>>>>>   1091 ?        00:00:00 udevd
>>>>>   1091 ?        00:00:00 udevd
>>>>>   1092 ?        00:00:00 udevd
>>>>>   1092 ?        00:00:00 udevd
>>>>>   2596 ?        00:00:00 sshd
>>>>>   2596 ?        00:00:00 sshd
>>>>>   2608 ?        00:00:00 sshd
>>>>>   2608 ?        00:00:00 sshd
>>>>>   2613 ?        00:00:00 sshd
>>>>>   2613 ?        00:00:00 sshd
>>>>>   2614 pts/0    00:00:00 bash
>>>>>   2614 pts/0    00:00:00 bash
>>>>>   2620 pts/0    00:00:00 sudo
>>>>>   2620 pts/0    00:00:00 sudo
>>>>>   2621 pts/0    00:00:00 su
>>>>>   2621 pts/0    00:00:00 su
>>>>>   2622 pts/0    00:00:00 bash
>>>>>   2622 pts/0    00:00:00 bash
>>>>>   2685 ?        00:00:00 lxc-start
>>>>>   2685 ?        00:00:00 lxc-start
>>>>>   2699 ?        00:00:00 init
>>>>>   2699 ?        00:00:00 init
>>>>>   2939 ?        00:00:00 rc
>>>>>   2939 ?        00:00:00 rc
>>>>>   2942 ?        00:00:00 startpar
>>>>>   2942 ?        00:00:00 startpar
>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>   2980 ?        00:00:00 startpar
>>>>>   2980 ?        00:00:00 startpar
>>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>>   3016 ?        00:00:00 cron
>>>>>   3016 ?        00:00:00 cron
>>>>>   3025 ?        00:00:00 mysqld_safe
>>>>>   3025 ?        00:00:00 mysqld_safe
>>>>>   3032 ?        00:00:00 sshd
>>>>>   3032 ?        00:00:00 sshd
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>   3113 ?        00:00:00 ctl.sh
>>>>>   3113 ?        00:00:00 ctl.sh
>>>>>   3115 ?        00:00:00 sleep
>>>>>   3115 ?        00:00:00 sleep
>>>>>   3116 ?        00:00:00 .memcached.bin
>>>>>   3116 ?        00:00:00 .memcached.bin
>>>>>
>>>>>
>>>>> As you can see, it is clear that the newer kernel is setting -17 by
>>>>> default, which in turn is causing the OOM killer loop.
>>>>>
>>>>> So I began to try and find what may have caused this problem by
>>>>> comparing the two sources...
>>>>>
>>>>> I checked the code for all references to 'oom_adj' and 
>>>>> 'oom_adjust' in
>>>>> both code sets, but found no obvious differences:
>>>>> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
>>>>> oom_adj
>>>>>
>>>>> Then I checked for references to "-17" in all .c and .h files, and
>>>>> found a couple of matches, but only one obvious one:
>>>>> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
>>>>> ./include/linux/oom.h:#define OOM_DISABLE (-17)
>>>>>
>>>>> But again, a search for OOM_DISABLE came up with nothing obvious...
>>>>>
>>>>> In a last ditch attempt, I did a search for all references to 'oom'
>>>>> (case-insensitive) in both code bases, then compared the two:
>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  grep -i -R "oom" . | 
>>>>> sort -n
>>>>>> /tmp/annabelle.oom_adj
>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  grep -i -R
>>>>> "oom" . | sort -n>  /tmp/vicky.oom_adj
>>>>>
>>>>> and this brought back (yet again) nothing obvious..
>>>>>
>>>>>
>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>>> ./include/linux/oom.h
>>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>>
>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum 
>>>>> ./include/linux/oom.h
>>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>>
>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>>> ./mm/oom_kill.c
>>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>>
>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum ./mm/oom_kill.c
>>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>>
>>>>>
>>>>>
>>>>> Could anyone please shed some light as to why the default oom_adj is
>>>>> set to -17 now (and where it is actually set)? From what I can tell,
>>>>> the fix for this issue will either be:
>>>>>
>>>>>    1. Allow OOM killer to override the decision of ignoring 
>>>>> oom_adj ==
>>>>>       -17 if an unrecoverable loop is encountered.
>>>>>    2. Change the default back to 0.
>>>>>
>>>>> Again, my apologies if this bug report is slightly unorthodox, or
>>>>> doesn't follow usual procedure etc. I can assure you I have tried my
>>>>> absolute best to give all the necessary information though.
>>>>>
>>>>> Cal
>>>>>
>>>> -- 
>>>> To unsubscribe from this list: send the line "unsubscribe 
>>>> linux-kernel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>> Please read the FAQ at  http://www.tux.org/lkml/
>>>>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: cgroup OOM killer loop causes system to lockup (possible fix included)
  2011-05-30 17:36             ` Cal Leeming [Simplicity Media Ltd]
@ 2011-05-30 21:36               ` Cal Leeming [Simplicity Media Ltd]
  -1 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-30 21:36 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kosaki.motohiro, minchan.kim, rientjes,
	kamezawa.hiroyu

FYI everyone, I found a bug within openssh-server which caused this problem.

I've patched and submitted to the openssh list.

You can find details of this by googling for:
"port-linux.c bug with oom_adjust_restore() - causes real bad oom_adj - 
which can cause DoS conditions"

It's extremely strange.. :S

Cal

On 30/05/2011 18:36, Cal Leeming [Simplicity Media Ltd] wrote:
> Here is an strace of the SSH process (which is somehow inheriting the 
> -17 oom_adj on all forked user instances)
>
> (broken server - with bnx2 module loaded)
> [pid  2200] [    7f13a09c9cb0] open("/proc/self/oom_adj", 
> O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
> [pid  2120] [    7f13a09c9f00] write(7, "\0\0\2\240\n\n\n\nPort 
> 22\n\n\n\nProtocol 2\n\nH"..., 680 <unfinished ...>
> [pid  2200] [    7f13a09c9cb0] <... open resumed> ) = 9
> [pid  2120] [    7f13a09c9f00] <... write resumed> ) = 680
> [pid  2120] [    7f13a09c9e40] close(7 <unfinished ...>
> [pid  2200] [    7f13a09c9844] fstat(9, <unfinished ...>
> [pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0
> [pid  2200] [    7f13a09c9844] <... fstat resumed> 
> {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> [pid  2120] [    7f13a09c9e40] close(8 <unfinished ...>
> [pid  2200] [    7f13a09d2a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> [pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0
> [pid  2200] [    7f13a09d2a2a] <... mmap resumed> ) = 0x7f13a25a6000
> [pid  2120] [    7f13a09c9e40] close(4 <unfinished ...>
> [pid  2200] [    7f13a09c9f00] write(9, "-17\n", 4 <unfinished ...>
>
>
> (working server - with bnx2 module unloaded)
> [pid  1323] [    7fae577fbe40] close(7) = 0
> [pid  1631] [    7fae577fbcb0] open("/proc/self/oom_adj", 
> O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
> [pid  1323] [    7fae577fbf00] write(8, "\0\0\2\217\0", 5 <unfinished 
> ...>
> [pid  1631] [    7fae577fbcb0] <... open resumed> ) = 10
> [pid  1323] [    7fae577fbf00] <... write resumed> ) = 5
> [pid  1323] [    7fae577fbf00] write(8, "\0\0\2\206\n\n\n\nPort 
> 22\n\n\n\nProtocol 2\n\nH"..., 654 <unfinished ...>
> [pid  1631] [    7fae577fb844] fstat(10, <unfinished ...>
> [pid  1323] [    7fae577fbf00] <... write resumed> ) = 654
> [pid  1631] [    7fae577fb844] <... fstat resumed> 
> {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> [pid  1323] [    7fae577fbe40] close(8) = 0
> [pid  1631] [    7fae57804a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> [pid  1323] [    7fae577fbe40] close(9 <unfinished ...>
> [pid  1631] [    7fae57804a2a] <... mmap resumed> ) = 0x7fae593d9000
> [pid  1323] [    7fae577fbe40] <... close resumed> ) = 0
> [pid  1323] [    7fae577fbe40] close(5 <unfinished ...>
> [pid  1631] [    7fae577fbf00] write(10, "0\n", 2 <unfinished ...>
>
> The two servers are *EXACT* duplicates of each other, completely fresh 
> Debian installs, with exactly the same packages installed.
>
> As you can see, the working server sends "0" into the oom_adj and the 
> broken one sends "-17".
>
>
> On 30/05/2011 15:27, Cal Leeming [Simplicity Media Ltd] wrote:
>> I FOUND THE PROBLEM!!!
>>
>> Explicit details can be found on the Debian kernel mailing list, but 
>> to cut short, it's caused by the firmware-bnx2 kernel module:
>>
>> The broken server uses 'firmware-bnx2'.. so I purged the bnx2 
>> package, removed the bnx*.ko files from /lib/modules, ran 
>> update-initramfs, and then rebooted (i then confirmed it was removed 
>> by checking ifconfig and lsmod).
>>
>> And guess what.. IT WORKED.
>>
>> So, this problem seems to be caused by the firmware-bnx2 module being 
>> loaded.. some how, that module is causing -17 oom_adj to be set for 
>> everything..
>>
>> WTF?!?! Surely a bug?? Could someone please forward this to the 
>> appropriate person for the bnx2 kernel module, as I wouldn't even 
>> know where to begin :S
>>
>> Cal
>>
>> On 30/05/2011 11:52, Cal Leeming [Simplicity Media Ltd] wrote:
>>> -resent due to incorrect formatting, sorry if this dups!
>>>
>>> @Kame
>>> Thanks for the reply!
>>> Both kernels used the same env/dist, but which slightly different 
>>> packages.
>>> After many frustrating hours, I have pin pointed this down to a dodgy
>>> Debian package which appears to continue affecting the system, even
>>> after purging. I'm still yet to pin point the package down (I'm doing
>>> several reinstall tests, along with tripwire analysis after each
>>> reboot).
>>>
>>> @Hiroyuki
>>> Thank you for sending this to the right people!
>>>
>>> @linux-mm
>>> On a side note, would someone mind taking a few minutes to give a
>>> brief explanation as to how the default oom_adj is set, and under what
>>> conditions it is given -17 by default? Is this defined by the
>>> application? I looked through the kernel source,
>>> and noticed some of the code was defaulted to set oom_adj to
>>> OOM_DISABLE (which is defined in the headers as -17).
>>>
>>> Assuming the debian problem is resolved, this might be another call
>>> for the oom-killer to be modified so that if it encounters the
>>> unrecoverable loop, it ignores the -17 rule (with some exceptions,
>>> such as kernel processes, and other critical things). If this is going
>>> to be a relatively simple task, I wouldn't mind spending a few hours
>>> patching this?
>>>
>>> Cal
>>>
>>> On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
>>> <kamezawa.hiroyu@jp.fujitsu.com>  wrote:
>>>> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
>>>>
>>>> On Sun, 29 May 2011 23:24:07 +0100
>>>> "Cal Leeming [Simplicity Media 
>>>> Ltd]"<cal.leeming@simplicitymedialtd.co.uk>  wrote:
>>>>
>>>>> Some further logs:
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>>
>>>>>
>>>> hmm, in short, applications has -17 oom_adj in default with 
>>>> 2.6.32.41 ?
>>>> AFAIK, no kernel has such crazy settings as default..
>>>>
>>>> Does your 2 kernel uses the same environment/distribution ?
>>>>
>>>> Thanks,
>>>> -Kame
>>>>
>>>>> On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
>>>>>>   First of all, my apologies if I have submitted this problem to the
>>>>>> wrong place, spent 20 minutes trying to figure out where it needs to
>>>>>> be sent, and was still none the wiser.
>>>>>>
>>>>>> The problem is related to applying memory limitations within a 
>>>>>> cgroup.
>>>>>> If the OOM killer kicks in, it gets stuck in a loop where it 
>>>>>> tries to
>>>>>> kill a process which has an oom_adj of -17. This causes an infinite
>>>>>> loop, which in turn locks up the system.
>>>>>>
>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>>
>>>>>>
>>>>>>   root@vicky [/home/foxx]>  uname -a
>>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 
>>>>>> x86_64
>>>>>> GNU/Linux
>>>>>> (this happens on both the grsec patched and non patched 2.6.32.41 
>>>>>> kernel)
>>>>>>
>>>>>> When this is encountered, the memory usage across the whole 
>>>>>> server is
>>>>>> still within limits (not even hitting swap).
>>>>>>
>>>>>> The memory configuration for the cgroup/lxc is:
>>>>>> lxc.cgroup.memory.limit_in_bytes = 3000M
>>>>>> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
>>>>>>
>>>>>> Now, what is even more strange, is that when running under the
>>>>>> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
>>>>>> happen. However, there is a slight difference between the two 
>>>>>> kernels.
>>>>>> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
>>>>>> where as the 2.6.32.41 gives a default of -17. I suspect this is the
>>>>>> root cause of why it's showing in the later kernel, but not the 
>>>>>> earlier.
>>>>>>
>>>>>> To test this theory, I started up the lxc on both servers, and then
>>>>>> ran a one liner which showed me all the processes with an oom_adj 
>>>>>> of -17:
>>>>>>
>>>>>> (the below is the older/working kernel)
>>>>>> root@courtney.internal [/mnt/encstore/lxc]>  uname -a
>>>>>> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 
>>>>>> 16:09:07 GMT
>>>>>> 2011 x86_64 GNU/Linux
>>>>>> root@courtney.internal [/mnt/encstore/lxc]>  for x in `find /proc
>>>>>> -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` 
>>>>>> ; do
>>>>>> ps -p $x --no-headers ; done
>>>>>> grep: /proc/1411/task/1411/oom_adj: No such file or directory
>>>>>> grep: /proc/1411/oom_adj: No such file or directory
>>>>>>    804 ?        00:00:00 udevd
>>>>>>    804 ?        00:00:00 udevd
>>>>>> 25536 ?        00:00:00 sshd
>>>>>> 25536 ?        00:00:00 sshd
>>>>>> 31861 ?        00:00:00 sshd
>>>>>> 31861 ?        00:00:00 sshd
>>>>>> 32173 ?        00:00:00 udevd
>>>>>> 32173 ?        00:00:00 udevd
>>>>>> 32174 ?        00:00:00 udevd
>>>>>> 32174 ?        00:00:00 udevd
>>>>>>
>>>>>> (the below is the newer/broken kernel)
>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  uname -a
>>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 
>>>>>> x86_64
>>>>>> GNU/Linux
>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  for x in
>>>>>> `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' 
>>>>>> '{print
>>>>>> $3}'` ; do ps -p $x --no-headers ; done
>>>>>> grep: /proc/3118/task/3118/oom_adj: No such file or directory
>>>>>> grep: /proc/3118/oom_adj: No such file or directory
>>>>>>    895 ?        00:00:00 udevd
>>>>>>    895 ?        00:00:00 udevd
>>>>>>   1091 ?        00:00:00 udevd
>>>>>>   1091 ?        00:00:00 udevd
>>>>>>   1092 ?        00:00:00 udevd
>>>>>>   1092 ?        00:00:00 udevd
>>>>>>   2596 ?        00:00:00 sshd
>>>>>>   2596 ?        00:00:00 sshd
>>>>>>   2608 ?        00:00:00 sshd
>>>>>>   2608 ?        00:00:00 sshd
>>>>>>   2613 ?        00:00:00 sshd
>>>>>>   2613 ?        00:00:00 sshd
>>>>>>   2614 pts/0    00:00:00 bash
>>>>>>   2614 pts/0    00:00:00 bash
>>>>>>   2620 pts/0    00:00:00 sudo
>>>>>>   2620 pts/0    00:00:00 sudo
>>>>>>   2621 pts/0    00:00:00 su
>>>>>>   2621 pts/0    00:00:00 su
>>>>>>   2622 pts/0    00:00:00 bash
>>>>>>   2622 pts/0    00:00:00 bash
>>>>>>   2685 ?        00:00:00 lxc-start
>>>>>>   2685 ?        00:00:00 lxc-start
>>>>>>   2699 ?        00:00:00 init
>>>>>>   2699 ?        00:00:00 init
>>>>>>   2939 ?        00:00:00 rc
>>>>>>   2939 ?        00:00:00 rc
>>>>>>   2942 ?        00:00:00 startpar
>>>>>>   2942 ?        00:00:00 startpar
>>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>>   2980 ?        00:00:00 startpar
>>>>>>   2980 ?        00:00:00 startpar
>>>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>>>   3016 ?        00:00:00 cron
>>>>>>   3016 ?        00:00:00 cron
>>>>>>   3025 ?        00:00:00 mysqld_safe
>>>>>>   3025 ?        00:00:00 mysqld_safe
>>>>>>   3032 ?        00:00:00 sshd
>>>>>>   3032 ?        00:00:00 sshd
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3113 ?        00:00:00 ctl.sh
>>>>>>   3113 ?        00:00:00 ctl.sh
>>>>>>   3115 ?        00:00:00 sleep
>>>>>>   3115 ?        00:00:00 sleep
>>>>>>   3116 ?        00:00:00 .memcached.bin
>>>>>>   3116 ?        00:00:00 .memcached.bin
>>>>>>
>>>>>>
>>>>>> As you can see, it is clear that the newer kernel is setting -17 by
>>>>>> default, which in turn is causing the OOM killer loop.
>>>>>>
>>>>>> So I began to try and find what may have caused this problem by
>>>>>> comparing the two sources...
>>>>>>
>>>>>> I checked the code for all references to 'oom_adj' and 
>>>>>> 'oom_adjust' in
>>>>>> both code sets, but found no obvious differences:
>>>>>> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
>>>>>> oom_adj
>>>>>>
>>>>>> Then I checked for references to "-17" in all .c and .h files, and
>>>>>> found a couple of matches, but only one obvious one:
>>>>>> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
>>>>>> ./include/linux/oom.h:#define OOM_DISABLE (-17)
>>>>>>
>>>>>> But again, a search for OOM_DISABLE came up with nothing obvious...
>>>>>>
>>>>>> In a last ditch attempt, I did a search for all references to 'oom'
>>>>>> (case-insensitive) in both code bases, then compared the two:
>>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  grep -i -R "oom" . | 
>>>>>> sort -n
>>>>>>> /tmp/annabelle.oom_adj
>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  grep -i -R
>>>>>> "oom" . | sort -n>  /tmp/vicky.oom_adj
>>>>>>
>>>>>> and this brought back (yet again) nothing obvious..
>>>>>>
>>>>>>
>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>>>> ./include/linux/oom.h
>>>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>>>
>>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum 
>>>>>> ./include/linux/oom.h
>>>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>>>
>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>>>> ./mm/oom_kill.c
>>>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>>>
>>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum ./mm/oom_kill.c
>>>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>>>
>>>>>>
>>>>>>
>>>>>> Could anyone please shed some light as to why the default oom_adj is
>>>>>> set to -17 now (and where it is actually set)? From what I can tell,
>>>>>> the fix for this issue will either be:
>>>>>>
>>>>>>    1. Allow OOM killer to override the decision of ignoring 
>>>>>> oom_adj ==
>>>>>>       -17 if an unrecoverable loop is encountered.
>>>>>>    2. Change the default back to 0.
>>>>>>
>>>>>> Again, my apologies if this bug report is slightly unorthodox, or
>>>>>> doesn't follow usual procedure etc. I can assure you I have tried my
>>>>>> absolute best to give all the necessary information though.
>>>>>>
>>>>>> Cal
>>>>>>
>>>>> -- 
>>>>> To unsubscribe from this list: send the line "unsubscribe 
>>>>> linux-kernel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>> Please read the FAQ at  http://www.tux.org/lkml/
>>>>>
>>
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: cgroup OOM killer loop causes system to lockup (possible fix included)
@ 2011-05-30 21:36               ` Cal Leeming [Simplicity Media Ltd]
  0 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-30 21:36 UTC (permalink / raw)
  To: linux-kernel, linux-mm, kosaki.motohiro, minchan.kim, rientjes,
	kamezawa.hiroyu

FYI everyone, I found a bug within openssh-server which caused this problem.

I've patched and submitted to the openssh list.

You can find details of this by googling for:
"port-linux.c bug with oom_adjust_restore() - causes real bad oom_adj - 
which can cause DoS conditions"

It's extremely strange.. :S

Cal

On 30/05/2011 18:36, Cal Leeming [Simplicity Media Ltd] wrote:
> Here is an strace of the SSH process (which is somehow inheriting the 
> -17 oom_adj on all forked user instances)
>
> (broken server - with bnx2 module loaded)
> [pid  2200] [    7f13a09c9cb0] open("/proc/self/oom_adj", 
> O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
> [pid  2120] [    7f13a09c9f00] write(7, "\0\0\2\240\n\n\n\nPort 
> 22\n\n\n\nProtocol 2\n\nH"..., 680 <unfinished ...>
> [pid  2200] [    7f13a09c9cb0] <... open resumed> ) = 9
> [pid  2120] [    7f13a09c9f00] <... write resumed> ) = 680
> [pid  2120] [    7f13a09c9e40] close(7 <unfinished ...>
> [pid  2200] [    7f13a09c9844] fstat(9, <unfinished ...>
> [pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0
> [pid  2200] [    7f13a09c9844] <... fstat resumed> 
> {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> [pid  2120] [    7f13a09c9e40] close(8 <unfinished ...>
> [pid  2200] [    7f13a09d2a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> [pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0
> [pid  2200] [    7f13a09d2a2a] <... mmap resumed> ) = 0x7f13a25a6000
> [pid  2120] [    7f13a09c9e40] close(4 <unfinished ...>
> [pid  2200] [    7f13a09c9f00] write(9, "-17\n", 4 <unfinished ...>
>
>
> (working server - with bnx2 module unloaded)
> [pid  1323] [    7fae577fbe40] close(7) = 0
> [pid  1631] [    7fae577fbcb0] open("/proc/self/oom_adj", 
> O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
> [pid  1323] [    7fae577fbf00] write(8, "\0\0\2\217\0", 5 <unfinished 
> ...>
> [pid  1631] [    7fae577fbcb0] <... open resumed> ) = 10
> [pid  1323] [    7fae577fbf00] <... write resumed> ) = 5
> [pid  1323] [    7fae577fbf00] write(8, "\0\0\2\206\n\n\n\nPort 
> 22\n\n\n\nProtocol 2\n\nH"..., 654 <unfinished ...>
> [pid  1631] [    7fae577fb844] fstat(10, <unfinished ...>
> [pid  1323] [    7fae577fbf00] <... write resumed> ) = 654
> [pid  1631] [    7fae577fb844] <... fstat resumed> 
> {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> [pid  1323] [    7fae577fbe40] close(8) = 0
> [pid  1631] [    7fae57804a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> [pid  1323] [    7fae577fbe40] close(9 <unfinished ...>
> [pid  1631] [    7fae57804a2a] <... mmap resumed> ) = 0x7fae593d9000
> [pid  1323] [    7fae577fbe40] <... close resumed> ) = 0
> [pid  1323] [    7fae577fbe40] close(5 <unfinished ...>
> [pid  1631] [    7fae577fbf00] write(10, "0\n", 2 <unfinished ...>
>
> The two servers are *EXACT* duplicates of each other, completely fresh 
> Debian installs, with exactly the same packages installed.
>
> As you can see, the working server sends "0" into the oom_adj and the 
> broken one sends "-17".
>
>
> On 30/05/2011 15:27, Cal Leeming [Simplicity Media Ltd] wrote:
>> I FOUND THE PROBLEM!!!
>>
>> Explicit details can be found on the Debian kernel mailing list, but 
>> to cut short, it's caused by the firmware-bnx2 kernel module:
>>
>> The broken server uses 'firmware-bnx2'.. so I purged the bnx2 
>> package, removed the bnx*.ko files from /lib/modules, ran 
>> update-initramfs, and then rebooted (i then confirmed it was removed 
>> by checking ifconfig and lsmod).
>>
>> And guess what.. IT WORKED.
>>
>> So, this problem seems to be caused by the firmware-bnx2 module being 
>> loaded.. some how, that module is causing -17 oom_adj to be set for 
>> everything..
>>
>> WTF?!?! Surely a bug?? Could someone please forward this to the 
>> appropriate person for the bnx2 kernel module, as I wouldn't even 
>> know where to begin :S
>>
>> Cal
>>
>> On 30/05/2011 11:52, Cal Leeming [Simplicity Media Ltd] wrote:
>>> -resent due to incorrect formatting, sorry if this dups!
>>>
>>> @Kame
>>> Thanks for the reply!
>>> Both kernels used the same env/dist, but which slightly different 
>>> packages.
>>> After many frustrating hours, I have pin pointed this down to a dodgy
>>> Debian package which appears to continue affecting the system, even
>>> after purging. I'm still yet to pin point the package down (I'm doing
>>> several reinstall tests, along with tripwire analysis after each
>>> reboot).
>>>
>>> @Hiroyuki
>>> Thank you for sending this to the right people!
>>>
>>> @linux-mm
>>> On a side note, would someone mind taking a few minutes to give a
>>> brief explanation as to how the default oom_adj is set, and under what
>>> conditions it is given -17 by default? Is this defined by the
>>> application? I looked through the kernel source,
>>> and noticed some of the code was defaulted to set oom_adj to
>>> OOM_DISABLE (which is defined in the headers as -17).
>>>
>>> Assuming the debian problem is resolved, this might be another call
>>> for the oom-killer to be modified so that if it encounters the
>>> unrecoverable loop, it ignores the -17 rule (with some exceptions,
>>> such as kernel processes, and other critical things). If this is going
>>> to be a relatively simple task, I wouldn't mind spending a few hours
>>> patching this?
>>>
>>> Cal
>>>
>>> On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
>>> <kamezawa.hiroyu@jp.fujitsu.com>  wrote:
>>>> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
>>>>
>>>> On Sun, 29 May 2011 23:24:07 +0100
>>>> "Cal Leeming [Simplicity Media 
>>>> Ltd]"<cal.leeming@simplicitymedialtd.co.uk>  wrote:
>>>>
>>>>> Some further logs:
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] 
>>>>> redis-server
>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>>
>>>>>
>>>> hmm, in short, applications has -17 oom_adj in default with 
>>>> 2.6.32.41 ?
>>>> AFAIK, no kernel has such crazy settings as default..
>>>>
>>>> Does your 2 kernel uses the same environment/distribution ?
>>>>
>>>> Thanks,
>>>> -Kame
>>>>
>>>>> On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
>>>>>>   First of all, my apologies if I have submitted this problem to the
>>>>>> wrong place, spent 20 minutes trying to figure out where it needs to
>>>>>> be sent, and was still none the wiser.
>>>>>>
>>>>>> The problem is related to applying memory limitations within a 
>>>>>> cgroup.
>>>>>> If the OOM killer kicks in, it gets stuck in a loop where it 
>>>>>> tries to
>>>>>> kill a process which has an oom_adj of -17. This causes an infinite
>>>>>> loop, which in turn locks up the system.
>>>>>>
>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>>
>>>>>>
>>>>>>   root@vicky [/home/foxx]>  uname -a
>>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 
>>>>>> x86_64
>>>>>> GNU/Linux
>>>>>> (this happens on both the grsec patched and non patched 2.6.32.41 
>>>>>> kernel)
>>>>>>
>>>>>> When this is encountered, the memory usage across the whole 
>>>>>> server is
>>>>>> still within limits (not even hitting swap).
>>>>>>
>>>>>> The memory configuration for the cgroup/lxc is:
>>>>>> lxc.cgroup.memory.limit_in_bytes = 3000M
>>>>>> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
>>>>>>
>>>>>> Now, what is even more strange, is that when running under the
>>>>>> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
>>>>>> happen. However, there is a slight difference between the two 
>>>>>> kernels.
>>>>>> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
>>>>>> where as the 2.6.32.41 gives a default of -17. I suspect this is the
>>>>>> root cause of why it's showing in the later kernel, but not the 
>>>>>> earlier.
>>>>>>
>>>>>> To test this theory, I started up the lxc on both servers, and then
>>>>>> ran a one liner which showed me all the processes with an oom_adj 
>>>>>> of -17:
>>>>>>
>>>>>> (the below is the older/working kernel)
>>>>>> root@courtney.internal [/mnt/encstore/lxc]>  uname -a
>>>>>> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 
>>>>>> 16:09:07 GMT
>>>>>> 2011 x86_64 GNU/Linux
>>>>>> root@courtney.internal [/mnt/encstore/lxc]>  for x in `find /proc
>>>>>> -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` 
>>>>>> ; do
>>>>>> ps -p $x --no-headers ; done
>>>>>> grep: /proc/1411/task/1411/oom_adj: No such file or directory
>>>>>> grep: /proc/1411/oom_adj: No such file or directory
>>>>>>    804 ?        00:00:00 udevd
>>>>>>    804 ?        00:00:00 udevd
>>>>>> 25536 ?        00:00:00 sshd
>>>>>> 25536 ?        00:00:00 sshd
>>>>>> 31861 ?        00:00:00 sshd
>>>>>> 31861 ?        00:00:00 sshd
>>>>>> 32173 ?        00:00:00 udevd
>>>>>> 32173 ?        00:00:00 udevd
>>>>>> 32174 ?        00:00:00 udevd
>>>>>> 32174 ?        00:00:00 udevd
>>>>>>
>>>>>> (the below is the newer/broken kernel)
>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  uname -a
>>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 
>>>>>> x86_64
>>>>>> GNU/Linux
>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  for x in
>>>>>> `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' 
>>>>>> '{print
>>>>>> $3}'` ; do ps -p $x --no-headers ; done
>>>>>> grep: /proc/3118/task/3118/oom_adj: No such file or directory
>>>>>> grep: /proc/3118/oom_adj: No such file or directory
>>>>>>    895 ?        00:00:00 udevd
>>>>>>    895 ?        00:00:00 udevd
>>>>>>   1091 ?        00:00:00 udevd
>>>>>>   1091 ?        00:00:00 udevd
>>>>>>   1092 ?        00:00:00 udevd
>>>>>>   1092 ?        00:00:00 udevd
>>>>>>   2596 ?        00:00:00 sshd
>>>>>>   2596 ?        00:00:00 sshd
>>>>>>   2608 ?        00:00:00 sshd
>>>>>>   2608 ?        00:00:00 sshd
>>>>>>   2613 ?        00:00:00 sshd
>>>>>>   2613 ?        00:00:00 sshd
>>>>>>   2614 pts/0    00:00:00 bash
>>>>>>   2614 pts/0    00:00:00 bash
>>>>>>   2620 pts/0    00:00:00 sudo
>>>>>>   2620 pts/0    00:00:00 sudo
>>>>>>   2621 pts/0    00:00:00 su
>>>>>>   2621 pts/0    00:00:00 su
>>>>>>   2622 pts/0    00:00:00 bash
>>>>>>   2622 pts/0    00:00:00 bash
>>>>>>   2685 ?        00:00:00 lxc-start
>>>>>>   2685 ?        00:00:00 lxc-start
>>>>>>   2699 ?        00:00:00 init
>>>>>>   2699 ?        00:00:00 init
>>>>>>   2939 ?        00:00:00 rc
>>>>>>   2939 ?        00:00:00 rc
>>>>>>   2942 ?        00:00:00 startpar
>>>>>>   2942 ?        00:00:00 startpar
>>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>>   2980 ?        00:00:00 startpar
>>>>>>   2980 ?        00:00:00 startpar
>>>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>>>   3016 ?        00:00:00 cron
>>>>>>   3016 ?        00:00:00 cron
>>>>>>   3025 ?        00:00:00 mysqld_safe
>>>>>>   3025 ?        00:00:00 mysqld_safe
>>>>>>   3032 ?        00:00:00 sshd
>>>>>>   3032 ?        00:00:00 sshd
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>   3113 ?        00:00:00 ctl.sh
>>>>>>   3113 ?        00:00:00 ctl.sh
>>>>>>   3115 ?        00:00:00 sleep
>>>>>>   3115 ?        00:00:00 sleep
>>>>>>   3116 ?        00:00:00 .memcached.bin
>>>>>>   3116 ?        00:00:00 .memcached.bin
>>>>>>
>>>>>>
>>>>>> As you can see, it is clear that the newer kernel is setting -17 by
>>>>>> default, which in turn is causing the OOM killer loop.
>>>>>>
>>>>>> So I began to try and find what may have caused this problem by
>>>>>> comparing the two sources...
>>>>>>
>>>>>> I checked the code for all references to 'oom_adj' and 
>>>>>> 'oom_adjust' in
>>>>>> both code sets, but found no obvious differences:
>>>>>> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
>>>>>> oom_adj
>>>>>>
>>>>>> Then I checked for references to "-17" in all .c and .h files, and
>>>>>> found a couple of matches, but only one obvious one:
>>>>>> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
>>>>>> ./include/linux/oom.h:#define OOM_DISABLE (-17)
>>>>>>
>>>>>> But again, a search for OOM_DISABLE came up with nothing obvious...
>>>>>>
>>>>>> In a last ditch attempt, I did a search for all references to 'oom'
>>>>>> (case-insensitive) in both code bases, then compared the two:
>>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  grep -i -R "oom" . | 
>>>>>> sort -n
>>>>>>> /tmp/annabelle.oom_adj
>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  grep -i -R
>>>>>> "oom" . | sort -n>  /tmp/vicky.oom_adj
>>>>>>
>>>>>> and this brought back (yet again) nothing obvious..
>>>>>>
>>>>>>
>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>>>> ./include/linux/oom.h
>>>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>>>
>>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum 
>>>>>> ./include/linux/oom.h
>>>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>>>
>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>>>> ./mm/oom_kill.c
>>>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>>>
>>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum ./mm/oom_kill.c
>>>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>>>
>>>>>>
>>>>>>
>>>>>> Could anyone please shed some light as to why the default oom_adj is
>>>>>> set to -17 now (and where it is actually set)? From what I can tell,
>>>>>> the fix for this issue will either be:
>>>>>>
>>>>>>    1. Allow OOM killer to override the decision of ignoring 
>>>>>> oom_adj ==
>>>>>>       -17 if an unrecoverable loop is encountered.
>>>>>>    2. Change the default back to 0.
>>>>>>
>>>>>> Again, my apologies if this bug report is slightly unorthodox, or
>>>>>> doesn't follow usual procedure etc. I can assure you I have tried my
>>>>>> absolute best to give all the necessary information though.
>>>>>>
>>>>>> Cal
>>>>>>
>>>>> -- 
>>>>> To unsubscribe from this list: send the line "unsubscribe 
>>>>> linux-kernel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>> Please read the FAQ at  http://www.tux.org/lkml/
>>>>>
>>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: cgroup OOM killer loop causes system to lockup (possible fix included)
  2011-05-30 21:36               ` Cal Leeming [Simplicity Media Ltd]
@ 2011-05-30 23:50                 ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-05-30 23:50 UTC (permalink / raw)
  To: Cal Leeming [Simplicity Media Ltd]
  Cc: linux-kernel, linux-mm, kosaki.motohiro, minchan.kim, rientjes

On Mon, 30 May 2011 22:36:10 +0100
"Cal Leeming [Simplicity Media Ltd]" <cal.leeming@simplicitymedialtd.co.uk> wrote:

> FYI everyone, I found a bug within openssh-server which caused this problem.
> 
> I've patched and submitted to the openssh list.
> 
> You can find details of this by googling for:
> "port-linux.c bug with oom_adjust_restore() - causes real bad oom_adj - 
> which can cause DoS conditions"
> 

Thank you.

> It's extremely strange.. :S
> 

yes...

Thanks,
-Kame

> Cal
> 
> On 30/05/2011 18:36, Cal Leeming [Simplicity Media Ltd] wrote:
> > Here is an strace of the SSH process (which is somehow inheriting the 
> > -17 oom_adj on all forked user instances)
> >
> > (broken server - with bnx2 module loaded)
> > [pid  2200] [    7f13a09c9cb0] open("/proc/self/oom_adj", 
> > O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
> > [pid  2120] [    7f13a09c9f00] write(7, "\0\0\2\240\n\n\n\nPort 
> > 22\n\n\n\nProtocol 2\n\nH"..., 680 <unfinished ...>
> > [pid  2200] [    7f13a09c9cb0] <... open resumed> ) = 9
> > [pid  2120] [    7f13a09c9f00] <... write resumed> ) = 680
> > [pid  2120] [    7f13a09c9e40] close(7 <unfinished ...>
> > [pid  2200] [    7f13a09c9844] fstat(9, <unfinished ...>
> > [pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0
> > [pid  2200] [    7f13a09c9844] <... fstat resumed> 
> > {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > [pid  2120] [    7f13a09c9e40] close(8 <unfinished ...>
> > [pid  2200] [    7f13a09d2a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
> > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > [pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0
> > [pid  2200] [    7f13a09d2a2a] <... mmap resumed> ) = 0x7f13a25a6000
> > [pid  2120] [    7f13a09c9e40] close(4 <unfinished ...>
> > [pid  2200] [    7f13a09c9f00] write(9, "-17\n", 4 <unfinished ...>
> >
> >
> > (working server - with bnx2 module unloaded)
> > [pid  1323] [    7fae577fbe40] close(7) = 0
> > [pid  1631] [    7fae577fbcb0] open("/proc/self/oom_adj", 
> > O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
> > [pid  1323] [    7fae577fbf00] write(8, "\0\0\2\217\0", 5 <unfinished 
> > ...>
> > [pid  1631] [    7fae577fbcb0] <... open resumed> ) = 10
> > [pid  1323] [    7fae577fbf00] <... write resumed> ) = 5
> > [pid  1323] [    7fae577fbf00] write(8, "\0\0\2\206\n\n\n\nPort 
> > 22\n\n\n\nProtocol 2\n\nH"..., 654 <unfinished ...>
> > [pid  1631] [    7fae577fb844] fstat(10, <unfinished ...>
> > [pid  1323] [    7fae577fbf00] <... write resumed> ) = 654
> > [pid  1631] [    7fae577fb844] <... fstat resumed> 
> > {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > [pid  1323] [    7fae577fbe40] close(8) = 0
> > [pid  1631] [    7fae57804a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
> > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > [pid  1323] [    7fae577fbe40] close(9 <unfinished ...>
> > [pid  1631] [    7fae57804a2a] <... mmap resumed> ) = 0x7fae593d9000
> > [pid  1323] [    7fae577fbe40] <... close resumed> ) = 0
> > [pid  1323] [    7fae577fbe40] close(5 <unfinished ...>
> > [pid  1631] [    7fae577fbf00] write(10, "0\n", 2 <unfinished ...>
> >
> > The two servers are *EXACT* duplicates of each other, completely fresh 
> > Debian installs, with exactly the same packages installed.
> >
> > As you can see, the working server sends "0" into the oom_adj and the 
> > broken one sends "-17".
> >
> >
> > On 30/05/2011 15:27, Cal Leeming [Simplicity Media Ltd] wrote:
> >> I FOUND THE PROBLEM!!!
> >>
> >> Explicit details can be found on the Debian kernel mailing list, but 
> >> to cut short, it's caused by the firmware-bnx2 kernel module:
> >>
> >> The broken server uses 'firmware-bnx2'.. so I purged the bnx2 
> >> package, removed the bnx*.ko files from /lib/modules, ran 
> >> update-initramfs, and then rebooted (i then confirmed it was removed 
> >> by checking ifconfig and lsmod).
> >>
> >> And guess what.. IT WORKED.
> >>
> >> So, this problem seems to be caused by the firmware-bnx2 module being 
> >> loaded.. some how, that module is causing -17 oom_adj to be set for 
> >> everything..
> >>
> >> WTF?!?! Surely a bug?? Could someone please forward this to the 
> >> appropriate person for the bnx2 kernel module, as I wouldn't even 
> >> know where to begin :S
> >>
> >> Cal
> >>
> >> On 30/05/2011 11:52, Cal Leeming [Simplicity Media Ltd] wrote:
> >>> -resent due to incorrect formatting, sorry if this dups!
> >>>
> >>> @Kame
> >>> Thanks for the reply!
> >>> Both kernels used the same env/dist, but which slightly different 
> >>> packages.
> >>> After many frustrating hours, I have pin pointed this down to a dodgy
> >>> Debian package which appears to continue affecting the system, even
> >>> after purging. I'm still yet to pin point the package down (I'm doing
> >>> several reinstall tests, along with tripwire analysis after each
> >>> reboot).
> >>>
> >>> @Hiroyuki
> >>> Thank you for sending this to the right people!
> >>>
> >>> @linux-mm
> >>> On a side note, would someone mind taking a few minutes to give a
> >>> brief explanation as to how the default oom_adj is set, and under what
> >>> conditions it is given -17 by default? Is this defined by the
> >>> application? I looked through the kernel source,
> >>> and noticed some of the code was defaulted to set oom_adj to
> >>> OOM_DISABLE (which is defined in the headers as -17).
> >>>
> >>> Assuming the debian problem is resolved, this might be another call
> >>> for the oom-killer to be modified so that if it encounters the
> >>> unrecoverable loop, it ignores the -17 rule (with some exceptions,
> >>> such as kernel processes, and other critical things). If this is going
> >>> to be a relatively simple task, I wouldn't mind spending a few hours
> >>> patching this?
> >>>
> >>> Cal
> >>>
> >>> On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
> >>> <kamezawa.hiroyu@jp.fujitsu.com>  wrote:
> >>>> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
> >>>>
> >>>> On Sun, 29 May 2011 23:24:07 +0100
> >>>> "Cal Leeming [Simplicity Media 
> >>>> Ltd]"<cal.leeming@simplicitymedialtd.co.uk>  wrote:
> >>>>
> >>>>> Some further logs:
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>>
> >>>>>
> >>>> hmm, in short, applications has -17 oom_adj in default with 
> >>>> 2.6.32.41 ?
> >>>> AFAIK, no kernel has such crazy settings as default..
> >>>>
> >>>> Does your 2 kernel uses the same environment/distribution ?
> >>>>
> >>>> Thanks,
> >>>> -Kame
> >>>>
> >>>>> On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
> >>>>>>   First of all, my apologies if I have submitted this problem to the
> >>>>>> wrong place, spent 20 minutes trying to figure out where it needs to
> >>>>>> be sent, and was still none the wiser.
> >>>>>>
> >>>>>> The problem is related to applying memory limitations within a 
> >>>>>> cgroup.
> >>>>>> If the OOM killer kicks in, it gets stuck in a loop where it 
> >>>>>> tries to
> >>>>>> kill a process which has an oom_adj of -17. This causes an infinite
> >>>>>> loop, which in turn locks up the system.
> >>>>>>
> >>>>>> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
> >>>>>> memory: kill process 6016 (java) score 0 or a child
> >>>>>> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
> >>>>>> memory: kill process 6016 (java) score 0 or a child
> >>>>>> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
> >>>>>> memory: kill process 6016 (java) score 0 or a child
> >>>>>> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
> >>>>>> memory: kill process 6016 (java) score 0 or a child
> >>>>>> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
> >>>>>> memory: kill process 6016 (java) score 0 or a child
> >>>>>>
> >>>>>>
> >>>>>>   root@vicky [/home/foxx]>  uname -a
> >>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 
> >>>>>> x86_64
> >>>>>> GNU/Linux
> >>>>>> (this happens on both the grsec patched and non patched 2.6.32.41 
> >>>>>> kernel)
> >>>>>>
> >>>>>> When this is encountered, the memory usage across the whole 
> >>>>>> server is
> >>>>>> still within limits (not even hitting swap).
> >>>>>>
> >>>>>> The memory configuration for the cgroup/lxc is:
> >>>>>> lxc.cgroup.memory.limit_in_bytes = 3000M
> >>>>>> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
> >>>>>>
> >>>>>> Now, what is even more strange, is that when running under the
> >>>>>> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
> >>>>>> happen. However, there is a slight difference between the two 
> >>>>>> kernels.
> >>>>>> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
> >>>>>> where as the 2.6.32.41 gives a default of -17. I suspect this is the
> >>>>>> root cause of why it's showing in the later kernel, but not the 
> >>>>>> earlier.
> >>>>>>
> >>>>>> To test this theory, I started up the lxc on both servers, and then
> >>>>>> ran a one liner which showed me all the processes with an oom_adj 
> >>>>>> of -17:
> >>>>>>
> >>>>>> (the below is the older/working kernel)
> >>>>>> root@courtney.internal [/mnt/encstore/lxc]>  uname -a
> >>>>>> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 
> >>>>>> 16:09:07 GMT
> >>>>>> 2011 x86_64 GNU/Linux
> >>>>>> root@courtney.internal [/mnt/encstore/lxc]>  for x in `find /proc
> >>>>>> -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` 
> >>>>>> ; do
> >>>>>> ps -p $x --no-headers ; done
> >>>>>> grep: /proc/1411/task/1411/oom_adj: No such file or directory
> >>>>>> grep: /proc/1411/oom_adj: No such file or directory
> >>>>>>    804 ?        00:00:00 udevd
> >>>>>>    804 ?        00:00:00 udevd
> >>>>>> 25536 ?        00:00:00 sshd
> >>>>>> 25536 ?        00:00:00 sshd
> >>>>>> 31861 ?        00:00:00 sshd
> >>>>>> 31861 ?        00:00:00 sshd
> >>>>>> 32173 ?        00:00:00 udevd
> >>>>>> 32173 ?        00:00:00 udevd
> >>>>>> 32174 ?        00:00:00 udevd
> >>>>>> 32174 ?        00:00:00 udevd
> >>>>>>
> >>>>>> (the below is the newer/broken kernel)
> >>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  uname -a
> >>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 
> >>>>>> x86_64
> >>>>>> GNU/Linux
> >>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  for x in
> >>>>>> `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' 
> >>>>>> '{print
> >>>>>> $3}'` ; do ps -p $x --no-headers ; done
> >>>>>> grep: /proc/3118/task/3118/oom_adj: No such file or directory
> >>>>>> grep: /proc/3118/oom_adj: No such file or directory
> >>>>>>    895 ?        00:00:00 udevd
> >>>>>>    895 ?        00:00:00 udevd
> >>>>>>   1091 ?        00:00:00 udevd
> >>>>>>   1091 ?        00:00:00 udevd
> >>>>>>   1092 ?        00:00:00 udevd
> >>>>>>   1092 ?        00:00:00 udevd
> >>>>>>   2596 ?        00:00:00 sshd
> >>>>>>   2596 ?        00:00:00 sshd
> >>>>>>   2608 ?        00:00:00 sshd
> >>>>>>   2608 ?        00:00:00 sshd
> >>>>>>   2613 ?        00:00:00 sshd
> >>>>>>   2613 ?        00:00:00 sshd
> >>>>>>   2614 pts/0    00:00:00 bash
> >>>>>>   2614 pts/0    00:00:00 bash
> >>>>>>   2620 pts/0    00:00:00 sudo
> >>>>>>   2620 pts/0    00:00:00 sudo
> >>>>>>   2621 pts/0    00:00:00 su
> >>>>>>   2621 pts/0    00:00:00 su
> >>>>>>   2622 pts/0    00:00:00 bash
> >>>>>>   2622 pts/0    00:00:00 bash
> >>>>>>   2685 ?        00:00:00 lxc-start
> >>>>>>   2685 ?        00:00:00 lxc-start
> >>>>>>   2699 ?        00:00:00 init
> >>>>>>   2699 ?        00:00:00 init
> >>>>>>   2939 ?        00:00:00 rc
> >>>>>>   2939 ?        00:00:00 rc
> >>>>>>   2942 ?        00:00:00 startpar
> >>>>>>   2942 ?        00:00:00 startpar
> >>>>>>   2964 ?        00:00:00 rsyslogd
> >>>>>>   2964 ?        00:00:00 rsyslogd
> >>>>>>   2964 ?        00:00:00 rsyslogd
> >>>>>>   2964 ?        00:00:00 rsyslogd
> >>>>>>   2980 ?        00:00:00 startpar
> >>>>>>   2980 ?        00:00:00 startpar
> >>>>>>   2981 ?        00:00:00 ctlscript.sh
> >>>>>>   2981 ?        00:00:00 ctlscript.sh
> >>>>>>   3016 ?        00:00:00 cron
> >>>>>>   3016 ?        00:00:00 cron
> >>>>>>   3025 ?        00:00:00 mysqld_safe
> >>>>>>   3025 ?        00:00:00 mysqld_safe
> >>>>>>   3032 ?        00:00:00 sshd
> >>>>>>   3032 ?        00:00:00 sshd
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3113 ?        00:00:00 ctl.sh
> >>>>>>   3113 ?        00:00:00 ctl.sh
> >>>>>>   3115 ?        00:00:00 sleep
> >>>>>>   3115 ?        00:00:00 sleep
> >>>>>>   3116 ?        00:00:00 .memcached.bin
> >>>>>>   3116 ?        00:00:00 .memcached.bin
> >>>>>>
> >>>>>>
> >>>>>> As you can see, it is clear that the newer kernel is setting -17 by
> >>>>>> default, which in turn is causing the OOM killer loop.
> >>>>>>
> >>>>>> So I began to try and find what may have caused this problem by
> >>>>>> comparing the two sources...
> >>>>>>
> >>>>>> I checked the code for all references to 'oom_adj' and 
> >>>>>> 'oom_adjust' in
> >>>>>> both code sets, but found no obvious differences:
> >>>>>> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
> >>>>>> oom_adj
> >>>>>>
> >>>>>> Then I checked for references to "-17" in all .c and .h files, and
> >>>>>> found a couple of matches, but only one obvious one:
> >>>>>> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
> >>>>>> ./include/linux/oom.h:#define OOM_DISABLE (-17)
> >>>>>>
> >>>>>> But again, a search for OOM_DISABLE came up with nothing obvious...
> >>>>>>
> >>>>>> In a last ditch attempt, I did a search for all references to 'oom'
> >>>>>> (case-insensitive) in both code bases, then compared the two:
> >>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  grep -i -R "oom" . | 
> >>>>>> sort -n
> >>>>>>> /tmp/annabelle.oom_adj
> >>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  grep -i -R
> >>>>>> "oom" . | sort -n>  /tmp/vicky.oom_adj
> >>>>>>
> >>>>>> and this brought back (yet again) nothing obvious..
> >>>>>>
> >>>>>>
> >>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
> >>>>>> ./include/linux/oom.h
> >>>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> >>>>>>
> >>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum 
> >>>>>> ./include/linux/oom.h
> >>>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> >>>>>>
> >>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
> >>>>>> ./mm/oom_kill.c
> >>>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> >>>>>>
> >>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum ./mm/oom_kill.c
> >>>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Could anyone please shed some light as to why the default oom_adj is
> >>>>>> set to -17 now (and where it is actually set)? From what I can tell,
> >>>>>> the fix for this issue will either be:
> >>>>>>
> >>>>>>    1. Allow OOM killer to override the decision of ignoring 
> >>>>>> oom_adj ==
> >>>>>>       -17 if an unrecoverable loop is encountered.
> >>>>>>    2. Change the default back to 0.
> >>>>>>
> >>>>>> Again, my apologies if this bug report is slightly unorthodox, or
> >>>>>> doesn't follow usual procedure etc. I can assure you I have tried my
> >>>>>> absolute best to give all the necessary information though.
> >>>>>>
> >>>>>> Cal
> >>>>>>
> >>>>> -- 
> >>>>> To unsubscribe from this list: send the line "unsubscribe 
> >>>>> linux-kernel" in
> >>>>> the body of a message to majordomo@vger.kernel.org
> >>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>> Please read the FAQ at  http://www.tux.org/lkml/
> >>>>>
> >>
> >
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: cgroup OOM killer loop causes system to lockup (possible fix included)
@ 2011-05-30 23:50                 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-05-30 23:50 UTC (permalink / raw)
  To: Cal Leeming [Simplicity Media Ltd]
  Cc: linux-kernel, linux-mm, kosaki.motohiro, minchan.kim, rientjes

On Mon, 30 May 2011 22:36:10 +0100
"Cal Leeming [Simplicity Media Ltd]" <cal.leeming@simplicitymedialtd.co.uk> wrote:

> FYI everyone, I found a bug within openssh-server which caused this problem.
> 
> I've patched and submitted to the openssh list.
> 
> You can find details of this by googling for:
> "port-linux.c bug with oom_adjust_restore() - causes real bad oom_adj - 
> which can cause DoS conditions"
> 

Thank you.

> It's extremely strange.. :S
> 

yes...

Thanks,
-Kame

> Cal
> 
> On 30/05/2011 18:36, Cal Leeming [Simplicity Media Ltd] wrote:
> > Here is an strace of the SSH process (which is somehow inheriting the 
> > -17 oom_adj on all forked user instances)
> >
> > (broken server - with bnx2 module loaded)
> > [pid  2200] [    7f13a09c9cb0] open("/proc/self/oom_adj", 
> > O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
> > [pid  2120] [    7f13a09c9f00] write(7, "\0\0\2\240\n\n\n\nPort 
> > 22\n\n\n\nProtocol 2\n\nH"..., 680 <unfinished ...>
> > [pid  2200] [    7f13a09c9cb0] <... open resumed> ) = 9
> > [pid  2120] [    7f13a09c9f00] <... write resumed> ) = 680
> > [pid  2120] [    7f13a09c9e40] close(7 <unfinished ...>
> > [pid  2200] [    7f13a09c9844] fstat(9, <unfinished ...>
> > [pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0
> > [pid  2200] [    7f13a09c9844] <... fstat resumed> 
> > {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > [pid  2120] [    7f13a09c9e40] close(8 <unfinished ...>
> > [pid  2200] [    7f13a09d2a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
> > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > [pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0
> > [pid  2200] [    7f13a09d2a2a] <... mmap resumed> ) = 0x7f13a25a6000
> > [pid  2120] [    7f13a09c9e40] close(4 <unfinished ...>
> > [pid  2200] [    7f13a09c9f00] write(9, "-17\n", 4 <unfinished ...>
> >
> >
> > (working server - with bnx2 module unloaded)
> > [pid  1323] [    7fae577fbe40] close(7) = 0
> > [pid  1631] [    7fae577fbcb0] open("/proc/self/oom_adj", 
> > O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
> > [pid  1323] [    7fae577fbf00] write(8, "\0\0\2\217\0", 5 <unfinished 
> > ...>
> > [pid  1631] [    7fae577fbcb0] <... open resumed> ) = 10
> > [pid  1323] [    7fae577fbf00] <... write resumed> ) = 5
> > [pid  1323] [    7fae577fbf00] write(8, "\0\0\2\206\n\n\n\nPort 
> > 22\n\n\n\nProtocol 2\n\nH"..., 654 <unfinished ...>
> > [pid  1631] [    7fae577fb844] fstat(10, <unfinished ...>
> > [pid  1323] [    7fae577fbf00] <... write resumed> ) = 654
> > [pid  1631] [    7fae577fb844] <... fstat resumed> 
> > {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > [pid  1323] [    7fae577fbe40] close(8) = 0
> > [pid  1631] [    7fae57804a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
> > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > [pid  1323] [    7fae577fbe40] close(9 <unfinished ...>
> > [pid  1631] [    7fae57804a2a] <... mmap resumed> ) = 0x7fae593d9000
> > [pid  1323] [    7fae577fbe40] <... close resumed> ) = 0
> > [pid  1323] [    7fae577fbe40] close(5 <unfinished ...>
> > [pid  1631] [    7fae577fbf00] write(10, "0\n", 2 <unfinished ...>
> >
> > The two servers are *EXACT* duplicates of each other, completely fresh 
> > Debian installs, with exactly the same packages installed.
> >
> > As you can see, the working server sends "0" into the oom_adj and the 
> > broken one sends "-17".
> >
> >
> > On 30/05/2011 15:27, Cal Leeming [Simplicity Media Ltd] wrote:
> >> I FOUND THE PROBLEM!!!
> >>
> >> Explicit details can be found on the Debian kernel mailing list, but 
> >> to cut short, it's caused by the firmware-bnx2 kernel module:
> >>
> >> The broken server uses 'firmware-bnx2'.. so I purged the bnx2 
> >> package, removed the bnx*.ko files from /lib/modules, ran 
> >> update-initramfs, and then rebooted (i then confirmed it was removed 
> >> by checking ifconfig and lsmod).
> >>
> >> And guess what.. IT WORKED.
> >>
> >> So, this problem seems to be caused by the firmware-bnx2 module being 
> >> loaded.. some how, that module is causing -17 oom_adj to be set for 
> >> everything..
> >>
> >> WTF?!?! Surely a bug?? Could someone please forward this to the 
> >> appropriate person for the bnx2 kernel module, as I wouldn't even 
> >> know where to begin :S
> >>
> >> Cal
> >>
> >> On 30/05/2011 11:52, Cal Leeming [Simplicity Media Ltd] wrote:
> >>> -resent due to incorrect formatting, sorry if this dups!
> >>>
> >>> @Kame
> >>> Thanks for the reply!
> >>> Both kernels used the same env/dist, but which slightly different 
> >>> packages.
> >>> After many frustrating hours, I have pin pointed this down to a dodgy
> >>> Debian package which appears to continue affecting the system, even
> >>> after purging. I'm still yet to pin point the package down (I'm doing
> >>> several reinstall tests, along with tripwire analysis after each
> >>> reboot).
> >>>
> >>> @Hiroyuki
> >>> Thank you for sending this to the right people!
> >>>
> >>> @linux-mm
> >>> On a side note, would someone mind taking a few minutes to give a
> >>> brief explanation as to how the default oom_adj is set, and under what
> >>> conditions it is given -17 by default? Is this defined by the
> >>> application? I looked through the kernel source,
> >>> and noticed some of the code was defaulted to set oom_adj to
> >>> OOM_DISABLE (which is defined in the headers as -17).
> >>>
> >>> Assuming the debian problem is resolved, this might be another call
> >>> for the oom-killer to be modified so that if it encounters the
> >>> unrecoverable loop, it ignores the -17 rule (with some exceptions,
> >>> such as kernel processes, and other critical things). If this is going
> >>> to be a relatively simple task, I wouldn't mind spending a few hours
> >>> patching this?
> >>>
> >>> Cal
> >>>
> >>> On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
> >>> <kamezawa.hiroyu@jp.fujitsu.com>  wrote:
> >>>> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
> >>>>
> >>>> On Sun, 29 May 2011 23:24:07 +0100
> >>>> "Cal Leeming [Simplicity Media 
> >>>> Ltd]"<cal.leeming@simplicitymedialtd.co.uk>  wrote:
> >>>>
> >>>>> Some further logs:
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] 
> >>>>> redis-server
> >>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
> >>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
> >>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
> >>>>>
> >>>>>
> >>>> hmm, in short, applications has -17 oom_adj in default with 
> >>>> 2.6.32.41 ?
> >>>> AFAIK, no kernel has such crazy settings as default..
> >>>>
> >>>> Does your 2 kernel uses the same environment/distribution ?
> >>>>
> >>>> Thanks,
> >>>> -Kame
> >>>>
> >>>>> On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
> >>>>>>   First of all, my apologies if I have submitted this problem to the
> >>>>>> wrong place, spent 20 minutes trying to figure out where it needs to
> >>>>>> be sent, and was still none the wiser.
> >>>>>>
> >>>>>> The problem is related to applying memory limitations within a 
> >>>>>> cgroup.
> >>>>>> If the OOM killer kicks in, it gets stuck in a loop where it 
> >>>>>> tries to
> >>>>>> kill a process which has an oom_adj of -17. This causes an infinite
> >>>>>> loop, which in turn locks up the system.
> >>>>>>
> >>>>>> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
> >>>>>> memory: kill process 6016 (java) score 0 or a child
> >>>>>> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
> >>>>>> memory: kill process 6016 (java) score 0 or a child
> >>>>>> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
> >>>>>> memory: kill process 6016 (java) score 0 or a child
> >>>>>> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
> >>>>>> memory: kill process 6016 (java) score 0 or a child
> >>>>>> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
> >>>>>> memory: kill process 6016 (java) score 0 or a child
> >>>>>>
> >>>>>>
> >>>>>>   root@vicky [/home/foxx]>  uname -a
> >>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 
> >>>>>> x86_64
> >>>>>> GNU/Linux
> >>>>>> (this happens on both the grsec patched and non patched 2.6.32.41 
> >>>>>> kernel)
> >>>>>>
> >>>>>> When this is encountered, the memory usage across the whole 
> >>>>>> server is
> >>>>>> still within limits (not even hitting swap).
> >>>>>>
> >>>>>> The memory configuration for the cgroup/lxc is:
> >>>>>> lxc.cgroup.memory.limit_in_bytes = 3000M
> >>>>>> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
> >>>>>>
> >>>>>> Now, what is even more strange, is that when running under the
> >>>>>> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
> >>>>>> happen. However, there is a slight difference between the two 
> >>>>>> kernels.
> >>>>>> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
> >>>>>> where as the 2.6.32.41 gives a default of -17. I suspect this is the
> >>>>>> root cause of why it's showing in the later kernel, but not the 
> >>>>>> earlier.
> >>>>>>
> >>>>>> To test this theory, I started up the lxc on both servers, and then
> >>>>>> ran a one liner which showed me all the processes with an oom_adj 
> >>>>>> of -17:
> >>>>>>
> >>>>>> (the below is the older/working kernel)
> >>>>>> root@courtney.internal [/mnt/encstore/lxc]>  uname -a
> >>>>>> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 
> >>>>>> 16:09:07 GMT
> >>>>>> 2011 x86_64 GNU/Linux
> >>>>>> root@courtney.internal [/mnt/encstore/lxc]>  for x in `find /proc
> >>>>>> -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` 
> >>>>>> ; do
> >>>>>> ps -p $x --no-headers ; done
> >>>>>> grep: /proc/1411/task/1411/oom_adj: No such file or directory
> >>>>>> grep: /proc/1411/oom_adj: No such file or directory
> >>>>>>    804 ?        00:00:00 udevd
> >>>>>>    804 ?        00:00:00 udevd
> >>>>>> 25536 ?        00:00:00 sshd
> >>>>>> 25536 ?        00:00:00 sshd
> >>>>>> 31861 ?        00:00:00 sshd
> >>>>>> 31861 ?        00:00:00 sshd
> >>>>>> 32173 ?        00:00:00 udevd
> >>>>>> 32173 ?        00:00:00 udevd
> >>>>>> 32174 ?        00:00:00 udevd
> >>>>>> 32174 ?        00:00:00 udevd
> >>>>>>
> >>>>>> (the below is the newer/broken kernel)
> >>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  uname -a
> >>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 
> >>>>>> x86_64
> >>>>>> GNU/Linux
> >>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  for x in
> >>>>>> `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' 
> >>>>>> '{print
> >>>>>> $3}'` ; do ps -p $x --no-headers ; done
> >>>>>> grep: /proc/3118/task/3118/oom_adj: No such file or directory
> >>>>>> grep: /proc/3118/oom_adj: No such file or directory
> >>>>>>    895 ?        00:00:00 udevd
> >>>>>>    895 ?        00:00:00 udevd
> >>>>>>   1091 ?        00:00:00 udevd
> >>>>>>   1091 ?        00:00:00 udevd
> >>>>>>   1092 ?        00:00:00 udevd
> >>>>>>   1092 ?        00:00:00 udevd
> >>>>>>   2596 ?        00:00:00 sshd
> >>>>>>   2596 ?        00:00:00 sshd
> >>>>>>   2608 ?        00:00:00 sshd
> >>>>>>   2608 ?        00:00:00 sshd
> >>>>>>   2613 ?        00:00:00 sshd
> >>>>>>   2613 ?        00:00:00 sshd
> >>>>>>   2614 pts/0    00:00:00 bash
> >>>>>>   2614 pts/0    00:00:00 bash
> >>>>>>   2620 pts/0    00:00:00 sudo
> >>>>>>   2620 pts/0    00:00:00 sudo
> >>>>>>   2621 pts/0    00:00:00 su
> >>>>>>   2621 pts/0    00:00:00 su
> >>>>>>   2622 pts/0    00:00:00 bash
> >>>>>>   2622 pts/0    00:00:00 bash
> >>>>>>   2685 ?        00:00:00 lxc-start
> >>>>>>   2685 ?        00:00:00 lxc-start
> >>>>>>   2699 ?        00:00:00 init
> >>>>>>   2699 ?        00:00:00 init
> >>>>>>   2939 ?        00:00:00 rc
> >>>>>>   2939 ?        00:00:00 rc
> >>>>>>   2942 ?        00:00:00 startpar
> >>>>>>   2942 ?        00:00:00 startpar
> >>>>>>   2964 ?        00:00:00 rsyslogd
> >>>>>>   2964 ?        00:00:00 rsyslogd
> >>>>>>   2964 ?        00:00:00 rsyslogd
> >>>>>>   2964 ?        00:00:00 rsyslogd
> >>>>>>   2980 ?        00:00:00 startpar
> >>>>>>   2980 ?        00:00:00 startpar
> >>>>>>   2981 ?        00:00:00 ctlscript.sh
> >>>>>>   2981 ?        00:00:00 ctlscript.sh
> >>>>>>   3016 ?        00:00:00 cron
> >>>>>>   3016 ?        00:00:00 cron
> >>>>>>   3025 ?        00:00:00 mysqld_safe
> >>>>>>   3025 ?        00:00:00 mysqld_safe
> >>>>>>   3032 ?        00:00:00 sshd
> >>>>>>   3032 ?        00:00:00 sshd
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3097 ?        00:00:00 mysqld.bin
> >>>>>>   3113 ?        00:00:00 ctl.sh
> >>>>>>   3113 ?        00:00:00 ctl.sh
> >>>>>>   3115 ?        00:00:00 sleep
> >>>>>>   3115 ?        00:00:00 sleep
> >>>>>>   3116 ?        00:00:00 .memcached.bin
> >>>>>>   3116 ?        00:00:00 .memcached.bin
> >>>>>>
> >>>>>>
> >>>>>> As you can see, it is clear that the newer kernel is setting -17 by
> >>>>>> default, which in turn is causing the OOM killer loop.
> >>>>>>
> >>>>>> So I began to try and find what may have caused this problem by
> >>>>>> comparing the two sources...
> >>>>>>
> >>>>>> I checked the code for all references to 'oom_adj' and 
> >>>>>> 'oom_adjust' in
> >>>>>> both code sets, but found no obvious differences:
> >>>>>> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
> >>>>>> oom_adj
> >>>>>>
> >>>>>> Then I checked for references to "-17" in all .c and .h files, and
> >>>>>> found a couple of matches, but only one obvious one:
> >>>>>> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
> >>>>>> ./include/linux/oom.h:#define OOM_DISABLE (-17)
> >>>>>>
> >>>>>> But again, a search for OOM_DISABLE came up with nothing obvious...
> >>>>>>
> >>>>>> In a last ditch attempt, I did a search for all references to 'oom'
> >>>>>> (case-insensitive) in both code bases, then compared the two:
> >>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  grep -i -R "oom" . | 
> >>>>>> sort -n
> >>>>>>> /tmp/annabelle.oom_adj
> >>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  grep -i -R
> >>>>>> "oom" . | sort -n>  /tmp/vicky.oom_adj
> >>>>>>
> >>>>>> and this brought back (yet again) nothing obvious..
> >>>>>>
> >>>>>>
> >>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
> >>>>>> ./include/linux/oom.h
> >>>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> >>>>>>
> >>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum 
> >>>>>> ./include/linux/oom.h
> >>>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
> >>>>>>
> >>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
> >>>>>> ./mm/oom_kill.c
> >>>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> >>>>>>
> >>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum ./mm/oom_kill.c
> >>>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Could anyone please shed some light as to why the default oom_adj is
> >>>>>> set to -17 now (and where it is actually set)? From what I can tell,
> >>>>>> the fix for this issue will either be:
> >>>>>>
> >>>>>>    1. Allow OOM killer to override the decision of ignoring 
> >>>>>> oom_adj ==
> >>>>>>       -17 if an unrecoverable loop is encountered.
> >>>>>>    2. Change the default back to 0.
> >>>>>>
> >>>>>> Again, my apologies if this bug report is slightly unorthodox, or
> >>>>>> doesn't follow usual procedure etc. I can assure you I have tried my
> >>>>>> absolute best to give all the necessary information though.
> >>>>>>
> >>>>>> Cal
> >>>>>>
> >>>>> -- 
> >>>>> To unsubscribe from this list: send the line "unsubscribe 
> >>>>> linux-kernel" in
> >>>>> the body of a message to majordomo@vger.kernel.org
> >>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>> Please read the FAQ at  http://www.tux.org/lkml/
> >>>>>
> >>
> >
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: cgroup OOM killer loop causes system to lockup (possible fix included)
  2011-05-30 21:36               ` Cal Leeming [Simplicity Media Ltd]
  (?)
  (?)
@ 2011-05-31 19:29               ` Cal Leeming [Simplicity Media Ltd]
  -1 siblings, 0 replies; 20+ messages in thread
From: Cal Leeming [Simplicity Media Ltd] @ 2011-05-31 19:29 UTC (permalink / raw)
  To: Gert Doering, linux-mm, kosaki.motohiro, minchan.kim, rientjes,
	kamezawa.hiroyu

[-- Attachment #1: Type: text/plain, Size: 24675 bytes --]

This is now being handed back to linux-mm for re-assessment.. OpenSSH 
devs are saying this is not a fault in their code..

It would appear that the loadable bnx2 module is causing strange oom_adj 
behavior.. and if its affecting this, I wonder what else it might be 
affecting? Bug can only be reproduced when the module is in use by 
actual hardware.

Here is the latest conversation with openssh devs, which confirms this 
definitely falls within the remit of debian or kernel-mm.

On 31/05/2011 13:25, Gert Doering wrote:
 > Hi,
 >
 > On Tue, May 31, 2011 at 12:11:13PM +0100, Cal Leeming [Simplicity 
Media Ltd] wrote:
 >> Could you point out the line of code where oom_adj_save is set to the
 >> original value, because I've looked everywhere, and from what I can
 >> tell, it's only ever set to INT_MIN, and no where else is it changed.
 >> (C is not my strongest language tho, so I most likely have overlooked
 >> something). This is where I got thrown off.
 >
 > oom_adjust_setup() does this:
 >
 >                 if ((fp = fopen(oom_adj_path, "r+")) != NULL) {
 >                         if (fscanf(fp, "%d", &oom_adj_save) != 1)
 >                                 verbose("error reading %s: %s", 
oom_adj_path,
 >                                     strerror(errno));
 >
 > the "fscanf()" call will read an integer ("%d") from the file named,
 > and write that number into the variable being pointed to 
(&oom_adj_save).
 >
 > The loop is a bit tricky to read as it takes different paths into
 > account, and will exit after the first successful update.
 >
 > fscanf() will return the number of successful conversions, so if it
 > was able to read "one number", the return value is "1", and it will
 > jump to the else block
 >
 >                         else {
 >                                 rewind(fp);
 >                                 if (fprintf(fp, "%d\n", value) <= 0)
 >                                         verbose("error writing %s: %s",
 >                                            oom_adj_path, 
strerror(errno));
 >                                 else
 >                                         verbose("Set %s from %d to %d",
 >                                            oom_adj_path, 
oom_adj_save, value);
 >                         }
 >
 > where it will overwrite what is in that file with the new value
 > ("value"), and then print the "Set ... from -17 to -17" message that
 > you saw.

Ah, thank you for explaining this. Makes a lot more sense now :)

 >
 >
 >>> The question here is why sshd is sometimes started with -17 and 
sometimes
 >>> with 0 - that's the bug, not that sshd keeps what it's given.
 >>>
 >>> (Ask yourself: if sshd had no idea about oom_adj at all, would this 
make
 >>> it buggy by not changing the value?)
 >>
 >> This was what I was trying to pinpoint down before. I had came to this
 >> conclusion myself, sent it to the Debian bug list, and they dismissed
 >> on the grounds that it was an openssh problem...
 >
 > I must admit that I have no idea what is causing it, but from the logs,
 > it very much looks like sshd is started with "-17" in there - but only
 > in the problem case.
 >
 >
 >> So far, the buck has been passed from kernel-mm to debian-kernel, to
 >> openssh, and now back to debian-kernel lol. The most annoying thing,
 >> is that you can't get this bug to happen unless you physically test on
 >> a machine which requires the bnx2 firmwire, so I get the feeling this
 >> won't get resolved :/
 >
 > And *that* strongly points to a bug in the bnx2 stuff - if other 
programs
 > change their behaviour based on the existance of a given driver, that
 > does not smell very healthy.

Agreed.. I was thinking of adding some debug into the fs/proc/ code 
which does a kprint on every oom_adj read/write, but I couldn't figure 
out how to extract the pid from the task (pointer?).

 >
 > [..]
 >>> Anyway, as a workaround for your system, you can certainly set
 >>>
 >>>  oom_adj_save = 0;
 >>>
 >>> in the beginning of port-linux.c / oom_adjust_restore(), to claim that
 >>> "hey, this was the saved value to start with" and "restore" oom_adj 
to 0
 >>> then - but that's just hiding the bug, not fixing it.
 >>
 >> I'm disappointed this wasn't the correct fix, I honestly thought I had
 >> patched it right :(
 >
 > Well, that's the short hand - "just ignore everything that happened at
 > init / save time, and forcibly write back '0', no matter what was there
 > before".
 >
 >> But, on the other hand, ssh users should really never have a default
 >> oom_adj of -17, so maybe 0 should be set as default anyway? If this is
 >> not the case, could you give reasons why??
 >
 > Well, I would say "the default value in there is a matter of local 
policy",
 > so what if someone wants to make sure that whatever is run from sshd is
 > always privileged regarding oom, even if a local firefox etc. is running
 > amock and you need to ssh-in and kill the GUI stuff...
 >
 > One might opt to run sshd (and all its children) at "-5" (slightly 
special
 > treatment), or "0" (no special treatment), or even at "-17" - but that's
 > local policy.

Ah, okay that's make sense.

 >
 >
 > Mmmh.
 >
 > Since this seems to be inherited, it might even work if you just change
 > the sshd startup script, and insert
 >
 >   echo 0 >/proc/self/oom_adj
 >
 > in there, right before it starts the sshd...  "local policy at work".

Yeah I was going to do this, but then I thought "well if this problem is 
occurring for openssh, then what else could it be affecting?". As you 
pointed out above, having the oom_adj changed based on the existence of 
a driver is really not good.

I will paste this convo trail into the debian ticket, and hopefully 
it'll help convince them this issue needs fixing.

 >
 > gert

Thanks again for taking the time to reply!



On 30/05/2011 22:36, Cal Leeming [Simplicity Media Ltd] wrote:
> FYI everyone, I found a bug within openssh-server which caused this 
> problem.
>
> I've patched and submitted to the openssh list.
>
> You can find details of this by googling for:
> "port-linux.c bug with oom_adjust_restore() - causes real bad oom_adj 
> - which can cause DoS conditions"
>
> It's extremely strange.. :S
>
> Cal
>
> On 30/05/2011 18:36, Cal Leeming [Simplicity Media Ltd] wrote:
>> Here is an strace of the SSH process (which is somehow inheriting the 
>> -17 oom_adj on all forked user instances)
>>
>> (broken server - with bnx2 module loaded)
>> [pid  2200] [    7f13a09c9cb0] open("/proc/self/oom_adj", 
>> O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
>> [pid  2120] [    7f13a09c9f00] write(7, "\0\0\2\240\n\n\n\nPort 
>> 22\n\n\n\nProtocol 2\n\nH"..., 680 <unfinished ...>
>> [pid  2200] [    7f13a09c9cb0] <... open resumed> ) = 9
>> [pid  2120] [    7f13a09c9f00] <... write resumed> ) = 680
>> [pid  2120] [    7f13a09c9e40] close(7 <unfinished ...>
>> [pid  2200] [    7f13a09c9844] fstat(9, <unfinished ...>
>> [pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0
>> [pid  2200] [    7f13a09c9844] <... fstat resumed> 
>> {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>> [pid  2120] [    7f13a09c9e40] close(8 <unfinished ...>
>> [pid  2200] [    7f13a09d2a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
>> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
>> [pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0
>> [pid  2200] [    7f13a09d2a2a] <... mmap resumed> ) = 0x7f13a25a6000
>> [pid  2120] [    7f13a09c9e40] close(4 <unfinished ...>
>> [pid  2200] [    7f13a09c9f00] write(9, "-17\n", 4 <unfinished ...>
>>
>>
>> (working server - with bnx2 module unloaded)
>> [pid  1323] [    7fae577fbe40] close(7) = 0
>> [pid  1631] [    7fae577fbcb0] open("/proc/self/oom_adj", 
>> O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
>> [pid  1323] [    7fae577fbf00] write(8, "\0\0\2\217\0", 5 <unfinished 
>> ...>
>> [pid  1631] [    7fae577fbcb0] <... open resumed> ) = 10
>> [pid  1323] [    7fae577fbf00] <... write resumed> ) = 5
>> [pid  1323] [    7fae577fbf00] write(8, "\0\0\2\206\n\n\n\nPort 
>> 22\n\n\n\nProtocol 2\n\nH"..., 654 <unfinished ...>
>> [pid  1631] [    7fae577fb844] fstat(10, <unfinished ...>
>> [pid  1323] [    7fae577fbf00] <... write resumed> ) = 654
>> [pid  1631] [    7fae577fb844] <... fstat resumed> 
>> {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>> [pid  1323] [    7fae577fbe40] close(8) = 0
>> [pid  1631] [    7fae57804a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
>> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
>> [pid  1323] [    7fae577fbe40] close(9 <unfinished ...>
>> [pid  1631] [    7fae57804a2a] <... mmap resumed> ) = 0x7fae593d9000
>> [pid  1323] [    7fae577fbe40] <... close resumed> ) = 0
>> [pid  1323] [    7fae577fbe40] close(5 <unfinished ...>
>> [pid  1631] [    7fae577fbf00] write(10, "0\n", 2 <unfinished ...>
>>
>> The two servers are *EXACT* duplicates of each other, completely 
>> fresh Debian installs, with exactly the same packages installed.
>>
>> As you can see, the working server sends "0" into the oom_adj and the 
>> broken one sends "-17".
>>
>>
>> On 30/05/2011 15:27, Cal Leeming [Simplicity Media Ltd] wrote:
>>> I FOUND THE PROBLEM!!!
>>>
>>> Explicit details can be found on the Debian kernel mailing list, but 
>>> to cut short, it's caused by the firmware-bnx2 kernel module:
>>>
>>> The broken server uses 'firmware-bnx2'.. so I purged the bnx2 
>>> package, removed the bnx*.ko files from /lib/modules, ran 
>>> update-initramfs, and then rebooted (i then confirmed it was removed 
>>> by checking ifconfig and lsmod).
>>>
>>> And guess what.. IT WORKED.
>>>
>>> So, this problem seems to be caused by the firmware-bnx2 module 
>>> being loaded.. some how, that module is causing -17 oom_adj to be 
>>> set for everything..
>>>
>>> WTF?!?! Surely a bug?? Could someone please forward this to the 
>>> appropriate person for the bnx2 kernel module, as I wouldn't even 
>>> know where to begin :S
>>>
>>> Cal
>>>
>>> On 30/05/2011 11:52, Cal Leeming [Simplicity Media Ltd] wrote:
>>>> -resent due to incorrect formatting, sorry if this dups!
>>>>
>>>> @Kame
>>>> Thanks for the reply!
>>>> Both kernels used the same env/dist, but which slightly different 
>>>> packages.
>>>> After many frustrating hours, I have pin pointed this down to a dodgy
>>>> Debian package which appears to continue affecting the system, even
>>>> after purging. I'm still yet to pin point the package down (I'm doing
>>>> several reinstall tests, along with tripwire analysis after each
>>>> reboot).
>>>>
>>>> @Hiroyuki
>>>> Thank you for sending this to the right people!
>>>>
>>>> @linux-mm
>>>> On a side note, would someone mind taking a few minutes to give a
>>>> brief explanation as to how the default oom_adj is set, and under what
>>>> conditions it is given -17 by default? Is this defined by the
>>>> application? I looked through the kernel source,
>>>> and noticed some of the code was defaulted to set oom_adj to
>>>> OOM_DISABLE (which is defined in the headers as -17).
>>>>
>>>> Assuming the debian problem is resolved, this might be another call
>>>> for the oom-killer to be modified so that if it encounters the
>>>> unrecoverable loop, it ignores the -17 rule (with some exceptions,
>>>> such as kernel processes, and other critical things). If this is going
>>>> to be a relatively simple task, I wouldn't mind spending a few hours
>>>> patching this?
>>>>
>>>> Cal
>>>>
>>>> On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
>>>> <kamezawa.hiroyu@jp.fujitsu.com>  wrote:
>>>>> Thank you. memory cgroup and OOM troubles are handled in linux-mm.
>>>>>
>>>>> On Sun, 29 May 2011 23:24:07 +0100
>>>>> "Cal Leeming [Simplicity Media 
>>>>> Ltd]"<cal.leeming@simplicitymedialtd.co.uk>  wrote:
>>>>>
>>>>>> Some further logs:
>>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] 
>>>>>> redis-server
>>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
>>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] 
>>>>>> redis-server
>>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
>>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] 
>>>>>> redis-server
>>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
>>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] 
>>>>>> redis-server
>>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
>>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] 
>>>>>> redis-server
>>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
>>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] 
>>>>>> redis-server
>>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>>> ./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
>>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] 
>>>>>> redis-server
>>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
>>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] 
>>>>>> redis-server
>>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
>>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] 
>>>>>> redis-server
>>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
>>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] 
>>>>>> redis-server
>>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
>>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] 
>>>>>> redis-server
>>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
>>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] 
>>>>>> redis-server
>>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
>>>>>> ./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
>>>>>> [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
>>>>>>
>>>>>>
>>>>> hmm, in short, applications has -17 oom_adj in default with 
>>>>> 2.6.32.41 ?
>>>>> AFAIK, no kernel has such crazy settings as default..
>>>>>
>>>>> Does your 2 kernel uses the same environment/distribution ?
>>>>>
>>>>> Thanks,
>>>>> -Kame
>>>>>
>>>>>> On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
>>>>>>>   First of all, my apologies if I have submitted this problem to 
>>>>>>> the
>>>>>>> wrong place, spent 20 minutes trying to figure out where it 
>>>>>>> needs to
>>>>>>> be sent, and was still none the wiser.
>>>>>>>
>>>>>>> The problem is related to applying memory limitations within a 
>>>>>>> cgroup.
>>>>>>> If the OOM killer kicks in, it gets stuck in a loop where it 
>>>>>>> tries to
>>>>>>> kill a process which has an oom_adj of -17. This causes an infinite
>>>>>>> loop, which in turn locks up the system.
>>>>>>>
>>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
>>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
>>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
>>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
>>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>>> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
>>>>>>> memory: kill process 6016 (java) score 0 or a child
>>>>>>>
>>>>>>>
>>>>>>>   root@vicky [/home/foxx]>  uname -a
>>>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 
>>>>>>> x86_64
>>>>>>> GNU/Linux
>>>>>>> (this happens on both the grsec patched and non patched 
>>>>>>> 2.6.32.41 kernel)
>>>>>>>
>>>>>>> When this is encountered, the memory usage across the whole 
>>>>>>> server is
>>>>>>> still within limits (not even hitting swap).
>>>>>>>
>>>>>>> The memory configuration for the cgroup/lxc is:
>>>>>>> lxc.cgroup.memory.limit_in_bytes = 3000M
>>>>>>> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
>>>>>>>
>>>>>>> Now, what is even more strange, is that when running under the
>>>>>>> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
>>>>>>> happen. However, there is a slight difference between the two 
>>>>>>> kernels.
>>>>>>> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
>>>>>>> where as the 2.6.32.41 gives a default of -17. I suspect this is 
>>>>>>> the
>>>>>>> root cause of why it's showing in the later kernel, but not the 
>>>>>>> earlier.
>>>>>>>
>>>>>>> To test this theory, I started up the lxc on both servers, and then
>>>>>>> ran a one liner which showed me all the processes with an 
>>>>>>> oom_adj of -17:
>>>>>>>
>>>>>>> (the below is the older/working kernel)
>>>>>>> root@courtney.internal [/mnt/encstore/lxc]>  uname -a
>>>>>>> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 
>>>>>>> 16:09:07 GMT
>>>>>>> 2011 x86_64 GNU/Linux
>>>>>>> root@courtney.internal [/mnt/encstore/lxc]>  for x in `find /proc
>>>>>>> -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` 
>>>>>>> ; do
>>>>>>> ps -p $x --no-headers ; done
>>>>>>> grep: /proc/1411/task/1411/oom_adj: No such file or directory
>>>>>>> grep: /proc/1411/oom_adj: No such file or directory
>>>>>>>    804 ?        00:00:00 udevd
>>>>>>>    804 ?        00:00:00 udevd
>>>>>>> 25536 ?        00:00:00 sshd
>>>>>>> 25536 ?        00:00:00 sshd
>>>>>>> 31861 ?        00:00:00 sshd
>>>>>>> 31861 ?        00:00:00 sshd
>>>>>>> 32173 ?        00:00:00 udevd
>>>>>>> 32173 ?        00:00:00 udevd
>>>>>>> 32174 ?        00:00:00 udevd
>>>>>>> 32174 ?        00:00:00 udevd
>>>>>>>
>>>>>>> (the below is the newer/broken kernel)
>>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  uname -a
>>>>>>> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 
>>>>>>> x86_64
>>>>>>> GNU/Linux
>>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  for x in
>>>>>>> `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' 
>>>>>>> '{print
>>>>>>> $3}'` ; do ps -p $x --no-headers ; done
>>>>>>> grep: /proc/3118/task/3118/oom_adj: No such file or directory
>>>>>>> grep: /proc/3118/oom_adj: No such file or directory
>>>>>>>    895 ?        00:00:00 udevd
>>>>>>>    895 ?        00:00:00 udevd
>>>>>>>   1091 ?        00:00:00 udevd
>>>>>>>   1091 ?        00:00:00 udevd
>>>>>>>   1092 ?        00:00:00 udevd
>>>>>>>   1092 ?        00:00:00 udevd
>>>>>>>   2596 ?        00:00:00 sshd
>>>>>>>   2596 ?        00:00:00 sshd
>>>>>>>   2608 ?        00:00:00 sshd
>>>>>>>   2608 ?        00:00:00 sshd
>>>>>>>   2613 ?        00:00:00 sshd
>>>>>>>   2613 ?        00:00:00 sshd
>>>>>>>   2614 pts/0    00:00:00 bash
>>>>>>>   2614 pts/0    00:00:00 bash
>>>>>>>   2620 pts/0    00:00:00 sudo
>>>>>>>   2620 pts/0    00:00:00 sudo
>>>>>>>   2621 pts/0    00:00:00 su
>>>>>>>   2621 pts/0    00:00:00 su
>>>>>>>   2622 pts/0    00:00:00 bash
>>>>>>>   2622 pts/0    00:00:00 bash
>>>>>>>   2685 ?        00:00:00 lxc-start
>>>>>>>   2685 ?        00:00:00 lxc-start
>>>>>>>   2699 ?        00:00:00 init
>>>>>>>   2699 ?        00:00:00 init
>>>>>>>   2939 ?        00:00:00 rc
>>>>>>>   2939 ?        00:00:00 rc
>>>>>>>   2942 ?        00:00:00 startpar
>>>>>>>   2942 ?        00:00:00 startpar
>>>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>>>   2964 ?        00:00:00 rsyslogd
>>>>>>>   2980 ?        00:00:00 startpar
>>>>>>>   2980 ?        00:00:00 startpar
>>>>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>>>>   2981 ?        00:00:00 ctlscript.sh
>>>>>>>   3016 ?        00:00:00 cron
>>>>>>>   3016 ?        00:00:00 cron
>>>>>>>   3025 ?        00:00:00 mysqld_safe
>>>>>>>   3025 ?        00:00:00 mysqld_safe
>>>>>>>   3032 ?        00:00:00 sshd
>>>>>>>   3032 ?        00:00:00 sshd
>>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>>   3097 ?        00:00:00 mysqld.bin
>>>>>>>   3113 ?        00:00:00 ctl.sh
>>>>>>>   3113 ?        00:00:00 ctl.sh
>>>>>>>   3115 ?        00:00:00 sleep
>>>>>>>   3115 ?        00:00:00 sleep
>>>>>>>   3116 ?        00:00:00 .memcached.bin
>>>>>>>   3116 ?        00:00:00 .memcached.bin
>>>>>>>
>>>>>>>
>>>>>>> As you can see, it is clear that the newer kernel is setting -17 by
>>>>>>> default, which in turn is causing the OOM killer loop.
>>>>>>>
>>>>>>> So I began to try and find what may have caused this problem by
>>>>>>> comparing the two sources...
>>>>>>>
>>>>>>> I checked the code for all references to 'oom_adj' and 
>>>>>>> 'oom_adjust' in
>>>>>>> both code sets, but found no obvious differences:
>>>>>>> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e 
>>>>>>> oom_adjust -e
>>>>>>> oom_adj
>>>>>>>
>>>>>>> Then I checked for references to "-17" in all .c and .h files, and
>>>>>>> found a couple of matches, but only one obvious one:
>>>>>>> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
>>>>>>> ./include/linux/oom.h:#define OOM_DISABLE (-17)
>>>>>>>
>>>>>>> But again, a search for OOM_DISABLE came up with nothing obvious...
>>>>>>>
>>>>>>> In a last ditch attempt, I did a search for all references to 'oom'
>>>>>>> (case-insensitive) in both code bases, then compared the two:
>>>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  grep -i -R "oom" . | 
>>>>>>> sort -n
>>>>>>>> /tmp/annabelle.oom_adj
>>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  grep 
>>>>>>> -i -R
>>>>>>> "oom" . | sort -n>  /tmp/vicky.oom_adj
>>>>>>>
>>>>>>> and this brought back (yet again) nothing obvious..
>>>>>>>
>>>>>>>
>>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>>>>> ./include/linux/oom.h
>>>>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>>>>
>>>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum 
>>>>>>> ./include/linux/oom.h
>>>>>>> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>>>>>>>
>>>>>>>   root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]>  md5sum
>>>>>>> ./mm/oom_kill.c
>>>>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>>>>
>>>>>>>   root@annabelle [~/lol/linux-2.6.32.28]>  md5sum ./mm/oom_kill.c
>>>>>>> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Could anyone please shed some light as to why the default 
>>>>>>> oom_adj is
>>>>>>> set to -17 now (and where it is actually set)? From what I can 
>>>>>>> tell,
>>>>>>> the fix for this issue will either be:
>>>>>>>
>>>>>>>    1. Allow OOM killer to override the decision of ignoring 
>>>>>>> oom_adj ==
>>>>>>>       -17 if an unrecoverable loop is encountered.
>>>>>>>    2. Change the default back to 0.
>>>>>>>
>>>>>>> Again, my apologies if this bug report is slightly unorthodox, or
>>>>>>> doesn't follow usual procedure etc. I can assure you I have 
>>>>>>> tried my
>>>>>>> absolute best to give all the necessary information though.
>>>>>>>
>>>>>>> Cal
>>>>>>>
>>>>>> -- 
>>>>>> To unsubscribe from this list: send the line "unsubscribe 
>>>>>> linux-kernel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>> Please read the FAQ at  http://www.tux.org/lkml/
>>>>>>
>>>
>>
>


[-- Attachment #2: Type: text/html, Size: 45183 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2011-05-31 19:29 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-29 21:50 Fwd: cgroup OOM killer loop causes system to lockup (possible fix included) Cal Leeming [Simplicity Media Ltd]
2011-05-29 21:50 ` Cal Leeming [Simplicity Media Ltd]
2011-05-29 22:24 ` Cal Leeming [Simplicity Media Ltd]
2011-05-29 22:24   ` Cal Leeming [Simplicity Media Ltd]
2011-05-30  2:23   ` KAMEZAWA Hiroyuki
2011-05-30  2:23     ` KAMEZAWA Hiroyuki
2011-05-30 10:51     ` Cal Leeming [Simplicity Media Ltd]
2011-05-30 10:52       ` Cal Leeming [Simplicity Media Ltd]
2011-05-30 10:52         ` Cal Leeming [Simplicity Media Ltd]
2011-05-30 10:52         ` Cal Leeming [Simplicity Media Ltd]
2011-05-30 14:27         ` Cal Leeming [Simplicity Media Ltd]
2011-05-30 14:27           ` Cal Leeming [Simplicity Media Ltd]
2011-05-30 14:27           ` Cal Leeming [Simplicity Media Ltd]
2011-05-30 17:36           ` Cal Leeming [Simplicity Media Ltd]
2011-05-30 17:36             ` Cal Leeming [Simplicity Media Ltd]
2011-05-30 21:36             ` Cal Leeming [Simplicity Media Ltd]
2011-05-30 21:36               ` Cal Leeming [Simplicity Media Ltd]
2011-05-30 23:50               ` KAMEZAWA Hiroyuki
2011-05-30 23:50                 ` KAMEZAWA Hiroyuki
2011-05-31 19:29               ` Cal Leeming [Simplicity Media Ltd]

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.