* "make -j" with memory.(memsw.)limit_in_bytes smaller than required -> livelock,  even for unlimited processes
@ 2011-06-21 14:51 Lutz Vieweg
  2011-06-21 16:01 ` Ying Han
  2011-06-22  0:10 ` KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 11+ messages in thread
From: Lutz Vieweg @ 2011-06-21 14:51 UTC (permalink / raw)
  To: Balbir Singh, Daisuke Nishimura, KAMEZAWA Hiroyuki; +Cc: linux-mm, lvml

[-- Attachment #1: Type: text/plain, Size: 2527 bytes --]

Dear Memory Resource Controller maintainers,

by using per-user control groups with a limit on memory (and swap) I am
trying to secure a shared development server against memory exhaustion
by any one single user - as it happened before when somebody imprudently
issued "make -j" (which has the infamous habit to spawn an unlimited
number of processes) on a large software project with many source files.
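(As an aside on the user side of the problem: GNU make can be told to bound its parallelism instead of leaving it unlimited. A minimal sketch, assuming GNU make and coreutils' nproc; the fallback value of 4 is an arbitrary assumption:)

```shell
# Bound "make -j" instead of leaving it unlimited (GNU make):
#   -j N  runs at most N jobs in parallel
#   -l L  starts no new jobs while the load average exceeds L
jobs=$(nproc 2>/dev/null || echo 4)   # assume 4 CPUs if nproc is missing
echo "make -j${jobs} -l${jobs}"       # print the command this sketch would run
```

Of course this only helps against imprudent users, not malicious ones, which is why the cgroup limit is still wanted.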

The memory limitation using control groups works just fine when
only a few processes together exceed the limits - those processes
are then OOM-killed, and the other users are unaffected.

But the original cause, a "make -j" on many source files, leads to
the following ugly symptom:

- make starts numerous (~ 100 < x < 200) gcc processes

- some of those gcc processes get OOM-killed quickly, then
   a few more are killed, but with increasing pauses in between

- then after a few seconds, no more gcc processes are killed, but
   the "make" process and its children do not show any progress anymore

- at this time, top indicates 100% "system" CPU usage, mostly by
   "[kworker/*]" threads (one per CPU). But processes from other
   users, that only require CPU, proceed to run.

- but also at this time, if any other user (who has not exhausted
   his memory limits) tries to access any file (at least on /tmp/,
   as e.g. gcc does), even a simple "ls /tmp/", this operation
   waits forever. (But "iostat" does not indicate any I/O activity.)

- as soon as you press "CTRL-C" to abort the "make -j", everything
   goes back to normal, quickly - also the other users' processes proceed.
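(For anyone trying to diagnose a similar hang, the two observations above - busy kworkers and stuck file accesses - can be captured with standard tools. A sketch, assuming procps-style ps with the state/wchan output columns; on systems without that ps variant it just prints a note:)

```shell
# List tasks in uninterruptible sleep (state D) together with the kernel
# function they are blocked in (wchan) - a stuck "ls" would show up here.
out=$(ps -eo state,pid,wchan:32,comm 2>/dev/null | awk 'NR==1 || $1 == "D"')
if [ -n "$out" ]; then
    printf '%s\n' "$out"
else
    echo "ps with -eo not available here"
fi
```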


To reproduce the problem, copy the attached "Makefile" to a directory
on a filesystem with at least 70MB of free space, then

  mount -t cgroup -o memory none /cgroup
  mkdir /cgroup/test
  echo 64M >/cgroup/test/memory.limit_in_bytes
  echo 64M >/cgroup/test/memory.memsw.limit_in_bytes

  cd /somewhere/with/70mb/free
  echo $$ >/cgroup/test/tasks
  make sources
  make -j compile

Notice that "make sources" will create 200 bogus "*.c" files from
/dev/urandom to make sure that "gcc" will use up some memory.

The "make -j compile" step reliably reproduces the above-mentioned
symptoms here.
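(When experimenting, remember that a cgroup directory cannot be removed while it still has member tasks. A hedged teardown sketch for the setup above, using the same hypothetical /cgroup paths; it requires root and does nothing unless such a mount actually exists:)

```shell
CG=/cgroup/test
if [ -d "$CG" ]; then
    # move remaining tasks back to the root group, then remove the test group
    while read -r pid; do
        echo "$pid" > /cgroup/tasks 2>/dev/null
    done < "$CG/tasks"
    rmdir "$CG" && umount /cgroup
else
    echo "nothing mounted at $CG; nothing to tear down"
fi
```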

Please note that the livelock only happens with a significant
number of parallel compiler runs - for me it did e.g. not happen
with only 100 - and it also did not happen when I started "make"
under "strace" - so timing seems to be a factor here.

Thanks for any hints towards a solution of this issue in advance!

Regards,

Lutz Vieweg

[-- Attachment #2: Makefile --]
[-- Type: text/plain, Size: 542 bytes --]


all:
	echo "first 'make sources', then 'make -j compile'"


N=200 

clean:
	rm -f file_*.o lib.so


mrproper:
	rm -f file_*.c file_*.o lib.so
	

sources: clean
	for (( I=0 ; $$I < $(N) ; I=`expr $$I + 1` )) ; do \
		echo $$I; \
		echo "char array_$$I [] = " >file_$$I.c ;\
		dd if=/dev/urandom bs=256k count=1 | base64 | sed 's/^.*/"\0"/g' >>file_$$I.c ;\
		echo ";" >>file_$$I.c ;\
	done


OBJ = $(addsuffix .o, $(basename $(notdir $(wildcard file_*.c))))

compile: $(OBJ)
	gcc -shared -O3 -o lib.so $(OBJ)	

%.o: ./%.c
	gcc -O3 -c $< -o $@
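(The interesting part of the Makefile is the "sources" rule: each generated file is one large char array, so every gcc invocation has to parse a few hundred kB of initializer data. One iteration of that rule can be exercised standalone - a sketch, with the file name, sizes, and pipeline taken verbatim from the Makefile above:)

```shell
# Generate one bogus source file exactly as the "sources" rule does:
# a char array initialized from base64-encoded random data.  Each base64
# output line is wrapped in quotes, giving one long concatenated C string.
cd "$(mktemp -d)"                     # work in a scratch directory
I=0
echo "char array_$I [] = "  > file_$I.c
dd if=/dev/urandom bs=256k count=1 2>/dev/null | base64 | sed 's/^.*/"\0"/g' >> file_$I.c
echo ";" >> file_$I.c

head -1 file_$I.c                     # the array declaration
wc -c < file_$I.c                     # roughly 350 kB of C source per file
```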


* Re: "make -j" with memory.(memsw.)limit_in_bytes smaller than required -> livelock, even for unlimited processes
  2011-06-21 14:51 "make -j" with memory.(memsw.)limit_in_bytes smaller than required -> livelock, even for unlimited processes Lutz Vieweg
@ 2011-06-21 16:01 ` Ying Han
  2011-06-21 16:19   ` Lutz Vieweg
  2011-06-22  0:10 ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 11+ messages in thread
From: Ying Han @ 2011-06-21 16:01 UTC (permalink / raw)
  To: Lutz Vieweg; +Cc: Balbir Singh, Daisuke Nishimura, KAMEZAWA Hiroyuki, linux-mm

On Tue, Jun 21, 2011 at 7:51 AM, Lutz Vieweg <lvml@5t9.de> wrote:
> Dear Memory Resource Controller maintainers,
>
> by using per-user control groups with a limit on memory (and swap) I am
> trying to secure a shared development server against memory exhaustion
> by any one single user - as it happened before when somebody imprudently
> issued "make -j" (which has the infamous habit to spawn an unlimited
> number of processes) on a large software project with many source files.
>
> The memory limitation using control groups works just fine when
> only a few processes sum up to a usage that exceeds the limits - the
> processes are OOM-killed, then, and the other users are unaffected.
>
> But the original cause, a "make -j" on many source files, leads to
> the following ugly symptom:
>
> - make starts numerous (~ 100 < x < 200) gcc processes
>
> - some of those gcc processes get OOM-killed quickly, then
>  a few more are killed, but with increasing pauses in between
>
> - then after a few seconds, no more gcc processes are killed, but
>  the "make" process and its children do not show any progress anymore
>
> - at this time, top indicates 100% "system" CPU usage, mostly by
>  "[kworker/*]" threads (one per CPU). But processes from other
>  users, that only require CPU, proceed to run.

The following patch might not address the root cause of the livelock, but
it should reduce the [kworker/*] activity in your case.

==


* Re: "make -j" with memory.(memsw.)limit_in_bytes smaller than required -> livelock, even for unlimited processes
  2011-06-21 16:01 ` Ying Han
@ 2011-06-21 16:19   ` Lutz Vieweg
  2011-06-21 16:28     ` Ying Han
  0 siblings, 1 reply; 11+ messages in thread
From: Lutz Vieweg @ 2011-06-21 16:19 UTC (permalink / raw)
  To: Ying Han; +Cc: Balbir Singh, Daisuke Nishimura, KAMEZAWA Hiroyuki, linux-mm

On 06/21/2011 06:01 PM, Ying Han wrote:
> The following patch might not be the root-cause of livelock, but
> should reduce the [kworker/*] in your case.
>
>  From d1372da4d3c6f8051b5b1cf7b5e8b45a8094b388 Mon Sep 17 00:00:00 2001
>
> Can you give a try?

I will first need to move this test to a machine (like my laptop)
that I can reboot more aggressively without disturbing the
developers on the shared hardware. Will do that ASAP.

> I don't know which kernel you are using in case
> you don't have this patched yet.

2.6.39.
5 out of 6 hunks in your patch apply to this version, 1 is rejected -
so I guess I should upgrade to a more recent kernel, first.
Would 2.6.39.1 be sufficient or would some non-release kernel
(from which git repository?) be required?

Regards,

Lutz Vieweg

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .


* Re: "make -j" with memory.(memsw.)limit_in_bytes smaller than required -> livelock, even for unlimited processes
  2011-06-21 16:19   ` Lutz Vieweg
@ 2011-06-21 16:28     ` Ying Han
  2011-06-21 16:35       ` Lutz Vieweg
  0 siblings, 1 reply; 11+ messages in thread
From: Ying Han @ 2011-06-21 16:28 UTC (permalink / raw)
  To: Lutz Vieweg; +Cc: Balbir Singh, Daisuke Nishimura, KAMEZAWA Hiroyuki, linux-mm

On Tue, Jun 21, 2011 at 9:19 AM, Lutz Vieweg <lvml@5t9.de> wrote:
> On 06/21/2011 06:01 PM, Ying Han wrote:
>>
>> The following patch might not be the root-cause of livelock, but
>> should reduce the [kworker/*] in your case.
>>
>>  From d1372da4d3c6f8051b5b1cf7b5e8b45a8094b388 Mon Sep 17 00:00:00 2001
>>
>> Can you give a try?
>
> I will first need to move this test to a machine (like my Laptop)
> that I can more aggressively reboot without disturbing the
> developers on the shared hardware. Will do that asap.
>
>> I don't know which kernel you are using in case
>> you don't have this patched yet.
>
> 2.6.39.
> 5 out of 6 hunks in your patch apply to this version, 1 is rejected -
> so I guess I should upgrade to a more recent kernel, first.
> Would 2.6.39.1 be sufficient or would some non-release kernel
> (from which git repository?) be required?

The last time I tried, it was built on mmotm-2011-05-12-15-52 with the
patch, but I assume you can also apply it on top of 2.6.39.

Meantime, I am trying to reproduce your livelock on my host with kernbench.

--Ying
>
> Regards,
>
> Lutz Vieweg
>



* Re: "make -j" with memory.(memsw.)limit_in_bytes smaller than required -> livelock, even for unlimited processes
  2011-06-21 16:28     ` Ying Han
@ 2011-06-21 16:35       ` Lutz Vieweg
  0 siblings, 0 replies; 11+ messages in thread
From: Lutz Vieweg @ 2011-06-21 16:35 UTC (permalink / raw)
  To: Ying Han; +Cc: Balbir Singh, Daisuke Nishimura, KAMEZAWA Hiroyuki, linux-mm

On 06/21/2011 06:28 PM, Ying Han wrote:
> Last time I tried was build on mmotm-2011-05-12-15-52 with the patch.
> But I assume you can also
> patch it on top of 2.6.39.

Ok, thanks for that info.

> Meantime, I am trying to reproduce your livelock on my host with kernbench.

I'm not sure a kernel compile will ever spawn enough compile jobs
in parallel to reproduce the problem.

It may be much easier to use the Makefile I attached to my initial
problem report...

Regards,

Lutz Vieweg



* Re: "make -j" with memory.(memsw.)limit_in_bytes smaller than required -> livelock,  even for unlimited processes
  2011-06-21 14:51 "make -j" with memory.(memsw.)limit_in_bytes smaller than required -> livelock, even for unlimited processes Lutz Vieweg
  2011-06-21 16:01 ` Ying Han
@ 2011-06-22  0:10 ` KAMEZAWA Hiroyuki
  2011-06-22  1:06   ` KAMEZAWA Hiroyuki
  2011-06-22  9:53   ` Lutz Vieweg
  1 sibling, 2 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-06-22  0:10 UTC (permalink / raw)
  To: Lutz Vieweg; +Cc: Balbir Singh, Daisuke Nishimura, linux-mm

On Tue, 21 Jun 2011 16:51:18 +0200
Lutz Vieweg <lvml@5t9.de> wrote:

> Dear Memory Resource Controller maintainers,
> 
> by using per-user control groups with a limit on memory (and swap) I am
> trying to secure a shared development server against memory exhaustion
> by any one single user - as it happened before when somebody imprudently
> issued "make -j" (which has the infamous habit to spawn an unlimited
> number of processes) on a large software project with many source files.
> 
> The memory limitation using control groups works just fine when
> only a few processes sum up to a usage that exceeds the limits - the
> processes are OOM-killed, then, and the other users are unaffected.
> 
> But the original cause, a "make -j" on many source files, leads to
> the following ugly symptom:
> 
> - make starts numerous (~ 100 < x < 200) gcc processes
> 
> - some of those gcc processes get OOM-killed quickly, then
>    a few more are killed, but with increasing pauses in between
> 
> - then after a few seconds, no more gcc processes are killed, but
>    the "make" process and its children do not show any progress anymore
> 

This is the famous fork-bomb problem. I posted a fork-bomb-killer patch set
once, but it was not welcomed. And there is OOM-killer trouble in the kernel,
too. (I think it was recently fixed.)



Do you run your test set under some cpu cgroup as well?
If so, you can see a deadlock in some versions of the kernel.


Then, you can stop OOM-killing with echo 1 > .../memory.oom_control.
All processes under the memcg will then be blocked, and you can kill all
processes under the memcg by hand.

> - at this time, top indicates 100% "system" CPU usage, mostly by
>    "[kworker/*]" threads (one per CPU). But processes from other
>    users, that only require CPU, proceed to run.
> 

This is a known bug and it's now fixed.


> - but also at this time, if any other user (who has not exhausted
>    his memory limits) tries to access any file (at least on /tmp/,
>    as e.g. gcc does), even a simple "ls /tmp/", this operation
>    waits forever. (But "iostat" does not indicate any I/O activity.)
> 

Hmm, that means your 'ls' takes some lock and waits for it. Which lock
are you waiting for? What wchan is shown in 'ps -elf'?


> - as soon as you press "CTRL-C" to abort the "make -j", everything
>    goes back to normal, quickly - also the other users' processes proceed.
> 

yes.


> 
> To reproduce the problem, the attached "Makefile" to a directory
> on a filesystem with at least 70MB free space, then
> 
>   mount -o memory none /cgroup
>   mkdir /cgroup/test
>   echo 64M >/cgroup/test/memory.limit_in_bytes
>   echo 64M >/cgroup/test/memory.memsw.limit_in_bytes
> 

64M is a crazily small limit for "make -j"; I use 300M for my tests...



>   cd /somewhere/with/70mb/free
>   echo $$ >/cgroup/test/tasks
>   make sources
>   make -j compile
> 
> Notice that "make sources" will create 200 bogus "*.c" files from
> /dev/urandom to make sure that "gcc" will use up some memory.
> 
> The "make -j compile" reliably reproduces the above mentioned syndrome,
> here.
> 
> Please notice that the livelock does happen only with a significant
> number of parallel compiler runs - it did e.g. not happen with
> only 100 for me, and it also did not happen when I started "make"
> with "strace" - so timing seems to be an issue, here.
> 
> Thanks for any hints towards a solution of this issue in advance!
> 

I think most of the problem comes from the OOM-killer logic.

Anyway, please post the OOM-killer log,

and please see what happens when

 echo 1 > .../memory.oom_control
 (see Documentation/cgroups/memory.txt)
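(For reference, a hedged sketch of what that knob looks like from the shell - the /cgroup/test path is the one from the reproduction recipe earlier in this thread, and nothing runs unless such a memcg actually exists:)

```shell
CG=/cgroup/test    # hypothetical cgroup path from the reproduction recipe
if [ -w "$CG/memory.oom_control" ]; then
    echo 1 > "$CG/memory.oom_control"   # disable the memcg OOM killer
    cat "$CG/memory.oom_control"        # shows oom_kill_disable and under_oom
else
    echo "no writable memcg at $CG; mount the memory controller first"
fi
```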




Thanks,
-Kame





* Re: "make -j" with memory.(memsw.)limit_in_bytes smaller than required -> livelock,  even for unlimited processes
  2011-06-22  0:10 ` KAMEZAWA Hiroyuki
@ 2011-06-22  1:06   ` KAMEZAWA Hiroyuki
  2011-06-22 10:20     ` KAMEZAWA Hiroyuki
  2011-06-22 14:37     ` Michal Hocko
  2011-06-22  9:53   ` Lutz Vieweg
  1 sibling, 2 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-06-22  1:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Lutz Vieweg, Balbir Singh, Daisuke Nishimura, linux-mm

On Wed, 22 Jun 2011 09:10:18 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Tue, 21 Jun 2011 16:51:18 +0200
> Lutz Vieweg <lvml@5t9.de> wrote:

> > - but also at this time, if any other user (who has not exhausted
> >    his memory limits) tries to access any file (at least on /tmp/,
> >    as e.g. gcc does), even a simple "ls /tmp/", this operation
> >    waits forever. (But "iostat" does not indicate any I/O activity.)
> > 
> 
> Hmm, it means your 'ls' gets some lock and wait for it. Then, what lock
> you wait for ? what w_chan is shown in 'ps -elf' ?
> 

I reproduced it and checked sysrq-t.

At first, some OOM kills happen.
Second, the OOM killer stops for some reason. (I think there is a KILLED
process in the memcg that does not exit. I'll check the memcg bypass logic.)

Third, 'ls /tmp' stops.

Here is the sysrq-t log:
==
Jun 22 10:04:29 bluextal kernel: [ 1366.149012] ls              D 0000000000000082  5448 22307   2799 0x10000000
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  ffff880623a7bb08 0000000000000086 ffff88033fffcc08 ffff88033fffbe70
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  ffff8805fa70c530 0000000000012880 ffff880623a7bfd8 ffff880623a7a010
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  ffff880623a7bfd8 0000000000012880 ffff8805f9c30000 ffff8805fa70c530
Jun 22 10:04:29 bluextal kernel: [ 1366.149012] Call Trace:
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810ee900>] ? sleep_on_page+0x20/0x20
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff8154f40c>] io_schedule+0x8c/0xd0
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810ee90e>] sleep_on_page_killable+0xe/0x40
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff8154fddf>] __wait_on_bit+0x5f/0x90
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810eeab5>] wait_on_page_bit_killable+0x75/0x80
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810791f0>] ? autoremove_wake_function+0x40/0x40
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810eec35>] __lock_page_or_retry+0x95/0xc0
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810efe7f>] filemap_fault+0x2df/0x4b0
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81115685>] __do_fault+0x55/0x530
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81119150>] ? unmap_region+0x110/0x140
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81115c57>] handle_pte_fault+0xf7/0xb50
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff8112f87a>] ? alloc_pages_current+0xaa/0x110
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff8103a857>] ? pte_alloc_one+0x37/0x50
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81011ea9>] ? sched_clock+0x9/0x10
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810c0ec9>] ? trace_clock_local+0x9/0x10
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81116885>] handle_mm_fault+0x1d5/0x350
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81555090>] do_page_fault+0x140/0x470
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810ce4c3>] ? trace_nowake_buffer_unlock_commit+0x43/0x60
Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81015c83>] ? ftrace_raw_event_sys_exit+0xb3/
==

So it is waiting for some page bit... I/O on mapped libc pages?

Hmm, this seems like buggy behavior. Okay, I'll dig into this.

Thanks,
-Kame




* Re: "make -j" with memory.(memsw.)limit_in_bytes smaller than required -> livelock,  even for unlimited processes
  2011-06-22  0:10 ` KAMEZAWA Hiroyuki
  2011-06-22  1:06   ` KAMEZAWA Hiroyuki
@ 2011-06-22  9:53   ` Lutz Vieweg
  2011-06-23  6:13     ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 11+ messages in thread
From: Lutz Vieweg @ 2011-06-22  9:53 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Balbir Singh, Daisuke Nishimura, linux-mm

On 06/22/2011 02:10 AM, KAMEZAWA Hiroyuki wrote:

> This is a famous fork-bomb problem.

Well, the classical fork-bomb would probably try to spawn an infinite
amount of processes, while the number of processes spawned by "make -j"
is limited to the amount of source files (200 in my reproduction Makefile)
and "make" will not restart any processes that got OOM-killed, so it
should terminate after a (not really long) while.

> Don't you use your test set under some cpu cgroup ?

I use the "cpu" controller, too, but haven't seen adverse
effects from doing that so far.
Even in the situation of the livelock I reported, processes
of other users that do not try I/O get their fair share
of CPU time.


> Then, you can stop oom-kill by echo 1>  .../memory.oom_control.
> All processes under memcg will be blocked. you can kill all process under memcg
> by you hands.

Well, but automatic OOM-killing of the processes of the memory hog was exactly
the desired behaviour I was looking for :-)


>>    echo 64M >/cgroup/test/memory.limit_in_bytes
>>    echo 64M >/cgroup/test/memory.memsw.limit_in_bytes
>
> 64M is crazy small limit for make -j , I use 300M for my test...

Just as well - in our real-world use case, both limits are set
to 16G (which still isn't enough for a "make -j" on our huge source tree).
I intentionally set a rather low limit for the test Makefile because
I wanted to spare others from first having to write 16G of bogus
source files to their local storage before the symptom can be reproduced.


> and please see what happens when
>
>   echo 1 > .../memory.oom_control

When I do this before the "make -j", the make children are stopped, and
processes of other users proceed normally.

But of course this lets the user who did the "make -j" assume
the machine is just busy with the compilation, instead of telling
him "you used too much memory".
And further processes started by the same user will mysteriously
stop, too...


> Then, waiting for some page bit...I/O of libc mapped pages ?
>
> Hmm. it seems buggy behavior. Okay, I'll dig this.

Thanks a lot for investigating!

Regards,

Lutz Vieweg




* Re: "make -j" with memory.(memsw.)limit_in_bytes smaller than required -> livelock,  even for unlimited processes
  2011-06-22  1:06   ` KAMEZAWA Hiroyuki
@ 2011-06-22 10:20     ` KAMEZAWA Hiroyuki
  2011-06-22 14:37     ` Michal Hocko
  1 sibling, 0 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-06-22 10:20 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Lutz Vieweg, Balbir Singh, Daisuke Nishimura, linux-mm

On Wed, 22 Jun 2011 10:06:15 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Wed, 22 Jun 2011 09:10:18 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > On Tue, 21 Jun 2011 16:51:18 +0200
> > Lutz Vieweg <lvml@5t9.de> wrote:
> 
> > > - but also at this time, if any other user (who has not exhausted
> > >    his memory limits) tries to access any file (at least on /tmp/,
> > >    as e.g. gcc does), even a simple "ls /tmp/", this operation
> > >    waits forever. (But "iostat" does not indicate any I/O activity.)
> > > 
> > 
> > Hmm, it means your 'ls' gets some lock and wait for it. Then, what lock
> > you wait for ? what w_chan is shown in 'ps -elf' ?
> > 
> 
> I reproduced. And checked sysrq t.
> 
> At first, some oom killers run.
> Second, oom killer stops by some reason. (I think there are a KILLED process in memcg
> but it doesn't exit. I'll check memcg' bypass logic.)
> 
> Third, ls /tmp stops.
> 
> Here is sysrq log.
> ==
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012] ls              D 0000000000000082  5448 22307   2799 0x10000000
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  ffff880623a7bb08 0000000000000086 ffff88033fffcc08 ffff88033fffbe70
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  ffff8805fa70c530 0000000000012880 ffff880623a7bfd8 ffff880623a7a010
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  ffff880623a7bfd8 0000000000012880 ffff8805f9c30000 ffff8805fa70c530
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012] Call Trace:
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810ee900>] ? sleep_on_page+0x20/0x20
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff8154f40c>] io_schedule+0x8c/0xd0
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810ee90e>] sleep_on_page_killable+0xe/0x40
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff8154fddf>] __wait_on_bit+0x5f/0x90
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810eeab5>] wait_on_page_bit_killable+0x75/0x80
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810791f0>] ? autoremove_wake_function+0x40/0x40
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810eec35>] __lock_page_or_retry+0x95/0xc0
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810efe7f>] filemap_fault+0x2df/0x4b0
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81115685>] __do_fault+0x55/0x530
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81119150>] ? unmap_region+0x110/0x140
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81115c57>] handle_pte_fault+0xf7/0xb50
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff8112f87a>] ? alloc_pages_current+0xaa/0x110
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff8103a857>] ? pte_alloc_one+0x37/0x50
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81011ea9>] ? sched_clock+0x9/0x10
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810c0ec9>] ? trace_clock_local+0x9/0x10
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81116885>] handle_mm_fault+0x1d5/0x350
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81555090>] do_page_fault+0x140/0x470
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810ce4c3>] ? trace_nowake_buffer_unlock_commit+0x43/0x60
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81015c83>] ? ftrace_raw_event_sys_exit+0xb3/
> ==
> 
> Then, waiting for some page bit...I/O of libc mapped pages ?
> 
> Hmm. it seems buggy behavior. Okay, I'll dig this.
> 

IIUC, because of the many, many threads in a small memcg, it seems a page
loaded from the filesystem is dropped again before it is mapped. Yes, that
is very bad for shared pages such as libc.

I'll consider a way to handle shared caches in a clean and scalable way.
(One idea is to avoid pageout and instead put the page into a victim
 cache / uncharge it, to avoid ping-pong.)

But it seems there is no instant fix; please wait a little.
I may be missing something...

Thanks,
-Kame







* Re: "make -j" with memory.(memsw.)limit_in_bytes smaller than required -> livelock,  even for unlimited processes
  2011-06-22  1:06   ` KAMEZAWA Hiroyuki
  2011-06-22 10:20     ` KAMEZAWA Hiroyuki
@ 2011-06-22 14:37     ` Michal Hocko
  1 sibling, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2011-06-22 14:37 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Lutz Vieweg, Balbir Singh, Daisuke Nishimura, linux-mm

On Wed 22-06-11 10:06:15, KAMEZAWA Hiroyuki wrote:
> On Wed, 22 Jun 2011 09:10:18 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > On Tue, 21 Jun 2011 16:51:18 +0200
> > Lutz Vieweg <lvml@5t9.de> wrote:
> 
> > > - but also at this time, if any other user (who has not exhausted
> > >    his memory limits) tries to access any file (at least on /tmp/,
> > >    as e.g. gcc does), even a simple "ls /tmp/", this operation
> > >    waits forever. (But "iostat" does not indicate any I/O activity.)
> > > 
> > 
> > Hmm, it means your 'ls' gets some lock and wait for it. Then, what lock
> > you wait for ? what w_chan is shown in 'ps -elf' ?
> > 
> 
> I reproduced. And checked sysrq t.
> 
> At first, some oom killers run.
> Second, oom killer stops by some reason. (I think there are a KILLED process in memcg
> but it doesn't exit. I'll check memcg' bypass logic.)
> 
> Third, ls /tmp stops.
> 
> Here is sysrq log.
> ==
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012] ls              D 0000000000000082  5448 22307   2799 0x10000000
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  ffff880623a7bb08 0000000000000086 ffff88033fffcc08 ffff88033fffbe70
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  ffff8805fa70c530 0000000000012880 ffff880623a7bfd8 ffff880623a7a010
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  ffff880623a7bfd8 0000000000012880 ffff8805f9c30000 ffff8805fa70c530
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012] Call Trace:
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810ee900>] ? sleep_on_page+0x20/0x20
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff8154f40c>] io_schedule+0x8c/0xd0
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810ee90e>] sleep_on_page_killable+0xe/0x40
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff8154fddf>] __wait_on_bit+0x5f/0x90
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810eeab5>] wait_on_page_bit_killable+0x75/0x80
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810791f0>] ? autoremove_wake_function+0x40/0x40
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810eec35>] __lock_page_or_retry+0x95/0xc0
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810efe7f>] filemap_fault+0x2df/0x4b0
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81115685>] __do_fault+0x55/0x530
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81119150>] ? unmap_region+0x110/0x140
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81115c57>] handle_pte_fault+0xf7/0xb50
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff8112f87a>] ? alloc_pages_current+0xaa/0x110
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff8103a857>] ? pte_alloc_one+0x37/0x50
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81011ea9>] ? sched_clock+0x9/0x10
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810c0ec9>] ? trace_clock_local+0x9/0x10
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81116885>] handle_mm_fault+0x1d5/0x350
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81555090>] do_page_fault+0x140/0x470
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff810ce4c3>] ? trace_nowake_buffer_unlock_commit+0x43/0x60
> Jun 22 10:04:29 bluextal kernel: [ 1366.149012]  [<ffffffff81015c83>] ? ftrace_raw_event_sys_exit+0xb3/
> ==
> 
> Then, waiting for some page bit...I/O of libc mapped pages ?
> 
> Hmm. it seems buggy behavior. Okay, I'll dig this.

I have seen similar behavior and posted a patch just today:
https://lkml.org/lkml/2011/6/22/163

The point is that the original faulted-in page is still locked when we try
to charge a new COW page, and things can get bad when we reach a
long-taking OOM.
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic



* Re: "make -j" with memory.(memsw.)limit_in_bytes smaller than required -> livelock,  even for unlimited processes
  2011-06-22  9:53   ` Lutz Vieweg
@ 2011-06-23  6:13     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-06-23  6:13 UTC (permalink / raw)
  To: Lutz Vieweg; +Cc: Balbir Singh, Daisuke Nishimura, linux-mm

On Wed, 22 Jun 2011 11:53:10 +0200
Lutz Vieweg <lvml@5t9.de> wrote:

> On 06/22/2011 02:10 AM, KAMEZAWA Hiroyuki wrote:

> > Then, waiting for some page bit...I/O of libc mapped pages ?
> >
> > Hmm. it seems buggy behavior. Okay, I'll dig this.
> 
> Thanks a lot for investigating!
> 

This patch works for me. Please see the thread
https://lkml.org/lkml/2011/6/22/163, too.
==


end of thread, other threads:[~2011-06-23  6:20 UTC | newest]

Thread overview: 11+ messages
-- links below jump to the message on this page --
2011-06-21 14:51 "make -j" with memory.(memsw.)limit_in_bytes smaller than required -> livelock, even for unlimited processes Lutz Vieweg
2011-06-21 16:01 ` Ying Han
2011-06-21 16:19   ` Lutz Vieweg
2011-06-21 16:28     ` Ying Han
2011-06-21 16:35       ` Lutz Vieweg
2011-06-22  0:10 ` KAMEZAWA Hiroyuki
2011-06-22  1:06   ` KAMEZAWA Hiroyuki
2011-06-22 10:20     ` KAMEZAWA Hiroyuki
2011-06-22 14:37     ` Michal Hocko
2011-06-22  9:53   ` Lutz Vieweg
2011-06-23  6:13     ` KAMEZAWA Hiroyuki
