linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH RESEND v2 1/1] fix a dead loop when in heavy low memory
  2015-12-27  9:14 [PATCH RESEND v2 1/1] fix a dead loop when in heavy low memory Figo Zhang
@ 2015-12-27  3:58 ` Greg KH
  2015-12-27  6:47   ` Zhang, Tianfei
  2015-12-28  1:27 ` David Rientjes
  1 sibling, 1 reply; 8+ messages in thread
From: Greg KH @ 2015-12-27  3:58 UTC (permalink / raw)
  To: Figo Zhang
  Cc: mhocko, rientjes, linux-kernel, arve, anton.vorontsov,
	kirill.shutemov, riandrews, devel

On Sun, Dec 27, 2015 at 05:14:51PM +0800, Figo Zhang wrote:
> Android System UI hang when run heavy monkey stress test.
> 
> V2: add more detail about how to re-produce this issue, the
> important is install more than 100 apps/games.
> 
> Re-produce step:
> Run this monkey stress test script with more than 100
> apps/games installed:
> 
> adb shell "monkey --ignore-crashes --ignore-timeouts
> --kill-process-after-error --ignore-security-exceptions
> --throttle 200 -v 20000000"
> 
> kernel log:
> [ 1526.272125] lowmem_scan start: 128, 213da, ofree -9849 34419, ma 529
> [ 1526.272260] lowmemorykiller: select 'dTi-lm' (27289), adj 647, size 10630, to kill
> [ 1526.272299] lowmem_d_timeout=4296194081
> [ 1526.272303] Killing 'dTi-lm' (27289), adj 647,
> [ 1526.272303]    to free 42520kB on behalf of 'servicemanager' (2365) because
> [ 1526.272303]    cache 137676kB is below limit 221184kB for oom_score_adj 529
> [ 1526.272303]    Free memory is -39396kB above reserved
> [ 1526.272304] lowmem_scan end: 128, 213da, return 10630
> [ 1526.272710] lowmem_scan start: 128, 213da, ofree -9849 34373, ma 529
> [ 1526.272832] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193081, 4296194081
> [ 1526.274450] lowmem_scan start: 128, 280da, ofree -9601 34327, ma 529
> [ 1526.274695] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193083, 4296194081
> [ 1526.282292] lowmem_scan start: 128, 213da, ofree -9703 34327, ma 529
> [ 1526.282727] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193090, 4296194081
> [ 1526.316888] lowmem_scan start: 128, 213da, ofree -9766 34465, ma 529
> [ 1526.317019] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081
> [ 1526.319311] lowmem_scan start: 128, 213da, ofree -9856 34419, ma 529
> [ 1526.319442] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081
> [ 1526.322026] lowmem_scan start: 128, 280da, ofree -9841 34327, ma 529
> [ 1526.360831] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193166, 4296194081
> [ 1526.532233] lowmem_scan start: 128, 213da, ofree -9846 34511, ma 529
> [ 1526.644046] lowmem_scan start: 128, 213da, ofree -9785 34235, ma 529
> [ 1527.437578] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194246, 4296195109
> [ 1527.442559] lowmem_scan start: 128, 213da, ofree -9850 41884, ma 529
> [ 1527.459540] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194268, 4296195109
> [ 1527.500352] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194309, 4296195109
> 
> when this happened, the android system UI will hang, no process can be
> select to kill.
> 
> i found the the value of "lowmem_deathpending_timeout" will be modified
> strangely, like in last killing, the value is 4296194081, but why not it
> had changed to 4296195109? so it will cause the deadloop in low memory
> state which will cause the android system UI hang, because no process will
>  be kill.
> 
> commit e5d7965f88a3 ("staging: android: lowmemorykiller: Don't wait more
> than one second for a process to die") said wait 1 seconds, i think it no
> need to wait 1 second, because we don't select the process with
> "TIF_MEMDIE" to kill.
> 
> Signed-off-by: Figo Zhang <tianfei.zhang@intel.com>

As you ignored my instructions to you, I'm going to just ignore this
patch, sorry.  Now discarded.

greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH RESEND v2 1/1] fix a dead loop when in heavy low memory
  2015-12-27  3:58 ` Greg KH
@ 2015-12-27  6:47   ` Zhang, Tianfei
  0 siblings, 0 replies; 8+ messages in thread
From: Zhang, Tianfei @ 2015-12-27  6:47 UTC (permalink / raw)
  To: Greg KH, Wu, Fengguang
  Cc: mhocko, rientjes, linux-kernel, arve, anton.vorontsov,
	kirill.shutemov, riandrews, devel

> >
> > Signed-off-by: Figo Zhang <tianfei.zhang@intel.com>
> 
> As you ignored my instructions to you, I'm going to just ignore this patch,
> sorry.  Now discarded.
> 
I am no at intel kernel group (SSG OTC), I am at CCG CCE. I will involve the intel open source guys if they like to help.


Hi fengguang:

Would you like to help review or involve the related intel guys? This issue can reproduce on android devices, I think
This current upstream code maybe some problem. Your technical comments are welcome.

Best,
Figo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH RESEND v2 1/1] fix a dead loop when in heavy low memory
@ 2015-12-27  9:14 Figo Zhang
  2015-12-27  3:58 ` Greg KH
  2015-12-28  1:27 ` David Rientjes
  0 siblings, 2 replies; 8+ messages in thread
From: Figo Zhang @ 2015-12-27  9:14 UTC (permalink / raw)
  To: gregkh, mhocko, rientjes, tianfei.zhang, linux-kernel, arve,
	anton.vorontsov, kirill.shutemov, riandrews, devel

Android System UI hang when run heavy monkey stress test.

V2: add more detail about how to re-produce this issue, the
important is install more than 100 apps/games.

Re-produce step:
Run this monkey stress test script with more than 100
apps/games installed:

adb shell "monkey --ignore-crashes --ignore-timeouts
--kill-process-after-error --ignore-security-exceptions
--throttle 200 -v 20000000"

kernel log:
[ 1526.272125] lowmem_scan start: 128, 213da, ofree -9849 34419, ma 529
[ 1526.272260] lowmemorykiller: select 'dTi-lm' (27289), adj 647, size 10630, to kill
[ 1526.272299] lowmem_d_timeout=4296194081
[ 1526.272303] Killing 'dTi-lm' (27289), adj 647,
[ 1526.272303]    to free 42520kB on behalf of 'servicemanager' (2365) because
[ 1526.272303]    cache 137676kB is below limit 221184kB for oom_score_adj 529
[ 1526.272303]    Free memory is -39396kB above reserved
[ 1526.272304] lowmem_scan end: 128, 213da, return 10630
[ 1526.272710] lowmem_scan start: 128, 213da, ofree -9849 34373, ma 529
[ 1526.272832] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193081, 4296194081
[ 1526.274450] lowmem_scan start: 128, 280da, ofree -9601 34327, ma 529
[ 1526.274695] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193083, 4296194081
[ 1526.282292] lowmem_scan start: 128, 213da, ofree -9703 34327, ma 529
[ 1526.282727] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193090, 4296194081
[ 1526.316888] lowmem_scan start: 128, 213da, ofree -9766 34465, ma 529
[ 1526.317019] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081
[ 1526.319311] lowmem_scan start: 128, 213da, ofree -9856 34419, ma 529
[ 1526.319442] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081
[ 1526.322026] lowmem_scan start: 128, 280da, ofree -9841 34327, ma 529
[ 1526.360831] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193166, 4296194081
[ 1526.532233] lowmem_scan start: 128, 213da, ofree -9846 34511, ma 529
[ 1526.644046] lowmem_scan start: 128, 213da, ofree -9785 34235, ma 529
[ 1527.437578] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194246, 4296195109
[ 1527.442559] lowmem_scan start: 128, 213da, ofree -9850 41884, ma 529
[ 1527.459540] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194268, 4296195109
[ 1527.500352] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194309, 4296195109

when this happened, the android system UI will hang, no process can be
select to kill.

i found the the value of "lowmem_deathpending_timeout" will be modified
strangely, like in last killing, the value is 4296194081, but why not it
had changed to 4296195109? so it will cause the deadloop in low memory
state which will cause the android system UI hang, because no process will
 be kill.

commit e5d7965f88a3 ("staging: android: lowmemorykiller: Don't wait more
than one second for a process to die") said wait 1 seconds, i think it no
need to wait 1 second, because we don't select the process with
"TIF_MEMDIE" to kill.

Signed-off-by: Figo Zhang <tianfei.zhang@intel.com>
---
 drivers/staging/android/lowmemorykiller.c |    9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index e679d84..13c7182 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -59,8 +59,6 @@ static int lowmem_minfree[6] = {
 };
 static int lowmem_minfree_size = 4;
 
-static unsigned long lowmem_deathpending_timeout;
-
 #define lowmem_print(level, x...)			\
 	do {						\
 		if (lowmem_debug_level >= (level))	\
@@ -128,11 +126,9 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
 		if (!p)
 			continue;
 
-		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
-		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
+		if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
 			task_unlock(p);
-			rcu_read_unlock();
-			return 0;
+			continue;
 		}
 		oom_score_adj = p->signal->oom_score_adj;
 		if (oom_score_adj < min_score_adj) {
@@ -170,7 +166,6 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
 		lowmem_print(1, "send sigkill to %d (%s), adj %hd, size %d\n",
 			     selected->pid, selected->comm,
 			     selected_oom_score_adj, selected_tasksize);
-		lowmem_deathpending_timeout = jiffies + HZ;
 		rem += selected_tasksize;
 	}
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH RESEND v2 1/1] fix a dead loop when in heavy low memory
  2015-12-27  9:14 [PATCH RESEND v2 1/1] fix a dead loop when in heavy low memory Figo Zhang
  2015-12-27  3:58 ` Greg KH
@ 2015-12-28  1:27 ` David Rientjes
  2015-12-29  4:58   ` Zhang, Tianfei
  1 sibling, 1 reply; 8+ messages in thread
From: David Rientjes @ 2015-12-28  1:27 UTC (permalink / raw)
  To: Figo Zhang
  Cc: gregkh, mhocko, linux-kernel, arve, anton.vorontsov,
	kirill.shutemov, riandrews, devel

On Sun, 27 Dec 2015, Figo Zhang wrote:

> Android System UI hang when run heavy monkey stress test.
> 
> V2: add more detail about how to re-produce this issue, the
> important is install more than 100 apps/games.
> 
> Re-produce step:
> Run this monkey stress test script with more than 100
> apps/games installed:
> 
> adb shell "monkey --ignore-crashes --ignore-timeouts
> --kill-process-after-error --ignore-security-exceptions
> --throttle 200 -v 20000000"
> 
> kernel log:
> [ 1526.272125] lowmem_scan start: 128, 213da, ofree -9849 34419, ma 529
> [ 1526.272260] lowmemorykiller: select 'dTi-lm' (27289), adj 647, size 10630, to kill
> [ 1526.272299] lowmem_d_timeout=4296194081
> [ 1526.272303] Killing 'dTi-lm' (27289), adj 647,
> [ 1526.272303]    to free 42520kB on behalf of 'servicemanager' (2365) because
> [ 1526.272303]    cache 137676kB is below limit 221184kB for oom_score_adj 529
> [ 1526.272303]    Free memory is -39396kB above reserved
> [ 1526.272304] lowmem_scan end: 128, 213da, return 10630
> [ 1526.272710] lowmem_scan start: 128, 213da, ofree -9849 34373, ma 529
> [ 1526.272832] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193081, 4296194081
> [ 1526.274450] lowmem_scan start: 128, 280da, ofree -9601 34327, ma 529
> [ 1526.274695] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193083, 4296194081
> [ 1526.282292] lowmem_scan start: 128, 213da, ofree -9703 34327, ma 529
> [ 1526.282727] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193090, 4296194081
> [ 1526.316888] lowmem_scan start: 128, 213da, ofree -9766 34465, ma 529
> [ 1526.317019] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081
> [ 1526.319311] lowmem_scan start: 128, 213da, ofree -9856 34419, ma 529
> [ 1526.319442] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081
> [ 1526.322026] lowmem_scan start: 128, 280da, ofree -9841 34327, ma 529
> [ 1526.360831] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193166, 4296194081
> [ 1526.532233] lowmem_scan start: 128, 213da, ofree -9846 34511, ma 529
> [ 1526.644046] lowmem_scan start: 128, 213da, ofree -9785 34235, ma 529
> [ 1527.437578] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194246, 4296195109
> [ 1527.442559] lowmem_scan start: 128, 213da, ofree -9850 41884, ma 529
> [ 1527.459540] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194268, 4296195109
> [ 1527.500352] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194309, 4296195109
> 
> when this happened, the android system UI will hang, no process can be
> select to kill.
> 
> i found the the value of "lowmem_deathpending_timeout" will be modified
> strangely, like in last killing, the value is 4296194081, but why not it
> had changed to 4296195109? so it will cause the deadloop in low memory
> state which will cause the android system UI hang, because no process will
>  be kill.
> 

I'm assuming that you are loading the lowmem killer as a module since 
that's how you would modify lowmem_debug_level.  It appears that 
lowmem_debug_level is 2 from your kernel log, otherwise part of the log is 
missing.

I can tell this since you have a

	[ 1526.272260] lowmemorykiller: select 'dTi-lm' (27289), adj 647, size 10630, to kill

line but not a line matching "send sigkill to %d (%s), adj %hd, size %d\n" 
with loglevel 1.

I think changing lowmem_debug_level to 1 would help to understand this 
issue better.

I think lowmem_deathpending_timeout is getting changed to 4296195109 at

	[ 1526.532233] lowmem_scan start: 128, 213da, ofree -9846 34511, ma 529
	> HERE <
	[ 1526.644046] lowmem_scan start: 128, 213da, ofree -9785 34235, ma 529

However, it appears that the same process, dTi-lm, is still chosen for oom 
kill because lowmem_deathpending_timeout has expired.

So this looks like a problem if the constantly chosen process cannot exit.  
It would have been helpful to have the stack of pid 27289 in the log to 
see where it was stuck.  But I think it may be unrelated to 
lowmem_deathpending_timeout itself.  We'd be better off selecting a 
different process to kill with something like this:

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -128,11 +128,15 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
 		if (!p)
 			continue;
 
-		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
-		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
-			task_unlock(p);
-			rcu_read_unlock();
-			return 0;
+		if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
+			if (time_before_eq(jiffies,
+					   lowmem_deathpending_timeout)) {
+				task_unlock(p);
+				rcu_read_unlock();
+				return 0;
+			}
+			/* Need to select a different process to kill */
+			continue;
 		}
 		oom_score_adj = p->signal->oom_score_adj;
 		if (oom_score_adj < min_score_adj) {

But we need more information.  Please make sure that lowmem_debug_level is 
1, try to get a complete kernel log, and if possible please try to capture 
the stack of the process that can't exit (use /proc/<pid>/stack) before 
trying the above patch.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH RESEND v2 1/1] fix a dead loop when in heavy low memory
  2015-12-28  1:27 ` David Rientjes
@ 2015-12-29  4:58   ` Zhang, Tianfei
  0 siblings, 0 replies; 8+ messages in thread
From: Zhang, Tianfei @ 2015-12-29  4:58 UTC (permalink / raw)
  To: David Rientjes
  Cc: gregkh, mhocko, arve, anton.vorontsov, kirill.shutemov,
	riandrews, devel, Wu, Fengguang, linux-kernel

> However, it appears that the same process, dTi-lm, is still chosen for oom kill
> because lowmem_deathpending_timeout has expired.
> 
> So this looks like a problem if the constantly chosen process cannot exit.
> It would have been helpful to have the stack of pid 27289 in the log to see
> where it was stuck.  But I think it may be unrelated to
> lowmem_deathpending_timeout itself.  We'd be better off selecting a
> different process to kill with something like this:
> 
> diff --git a/drivers/staging/android/lowmemorykiller.c
> b/drivers/staging/android/lowmemorykiller.c
> --- a/drivers/staging/android/lowmemorykiller.c
> +++ b/drivers/staging/android/lowmemorykiller.c
> @@ -128,11 +128,15 @@ static unsigned long lowmem_scan(struct shrinker
> *s, struct shrink_control *sc)
>  		if (!p)
>  			continue;
> 
> -		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
> -		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
> -			task_unlock(p);
> -			rcu_read_unlock();
> -			return 0;
> +		if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
> +			if (time_before_eq(jiffies,
> +					   lowmem_deathpending_timeout)) {
> +				task_unlock(p);
> +				rcu_read_unlock();
> +				return 0;
> +			}
> +			/* Need to select a different process to kill */
> +			continue;
>  		}
>  		oom_score_adj = p->signal->oom_score_adj;
>  		if (oom_score_adj < min_score_adj) {
> 
> But we need more information.  Please make sure that
> lowmem_debug_level is 1, try to get a complete kernel log, and if possible
> please try to capture the stack of the process that can't exit (use
> /proc/<pid>/stack) before trying the above patch.

Hi Rientjes:
I re-test the monkey stress test on your patches, it seems better than current mainline code.

The kernel log is a little big, more than 10 MB. I send to you directly.

Best
tianfei






^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH RESEND v2 1/1] fix a dead loop when in heavy low memory
@ 2015-12-27  8:34 Figo.zhang
  2015-12-27  0:55 ` Greg KH
  0 siblings, 1 reply; 8+ messages in thread
From: Figo.zhang @ 2015-12-27  8:34 UTC (permalink / raw)
  To: gregkh, mhocko, rientjes, tianfei.zhang, linux-kernel, arve,
	anton.vorontsov, kirill.shutemov, riandrews, devel

From: Figo <tianfei.zhang@intel.com>


Android System UI hang when run heavy monkey stress test.

V2: add more detail about how to re-produce this issue, the
important is install more than 100 apps/games.

Re-produce step:
Run this monkey stress test script with more than 100
apps/games installed:

adb shell "monkey --ignore-crashes --ignore-timeouts
--kill-process-after-error --ignore-security-exceptions
--throttle 200 -v 20000000"

kernel log:
[ 1526.272125] lowmem_scan start: 128, 213da, ofree -9849 34419, ma 529
[ 1526.272260] lowmemorykiller: select 'dTi-lm' (27289), adj 647, size 10630, to kill
[ 1526.272299] lowmem_d_timeout=4296194081
[ 1526.272303] Killing 'dTi-lm' (27289), adj 647,
[ 1526.272303]    to free 42520kB on behalf of 'servicemanager' (2365) because
[ 1526.272303]    cache 137676kB is below limit 221184kB for oom_score_adj 529
[ 1526.272303]    Free memory is -39396kB above reserved
[ 1526.272304] lowmem_scan end: 128, 213da, return 10630
[ 1526.272710] lowmem_scan start: 128, 213da, ofree -9849 34373, ma 529
[ 1526.272832] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193081, 4296194081
[ 1526.274450] lowmem_scan start: 128, 280da, ofree -9601 34327, ma 529
[ 1526.274695] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193083, 4296194081
[ 1526.282292] lowmem_scan start: 128, 213da, ofree -9703 34327, ma 529
[ 1526.282727] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193090, 4296194081
[ 1526.316888] lowmem_scan start: 128, 213da, ofree -9766 34465, ma 529
[ 1526.317019] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081
[ 1526.319311] lowmem_scan start: 128, 213da, ofree -9856 34419, ma 529
[ 1526.319442] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081
[ 1526.322026] lowmem_scan start: 128, 280da, ofree -9841 34327, ma 529
[ 1526.360831] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193166, 4296194081
[ 1526.532233] lowmem_scan start: 128, 213da, ofree -9846 34511, ma 529
[ 1526.644046] lowmem_scan start: 128, 213da, ofree -9785 34235, ma 529
[ 1527.437578] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194246, 4296195109
[ 1527.442559] lowmem_scan start: 128, 213da, ofree -9850 41884, ma 529
[ 1527.459540] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194268, 4296195109
[ 1527.500352] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194309, 4296195109

when this happened, the android system UI will hang, no process can be
select to kill.

i found the the value of "lowmem_deathpending_timeout" will be modified
strangely, like in last killing, the value is 4296194081, but why not it
had changed to 4296195109? so it will cause the deadloop in low memory
state which will cause the android system UI hang, because no process will
 be kill.

commit e5d7965f88a3 ("staging: android: lowmemorykiller: Don't wait more
than one second for a process to die") said wait 1 seconds, i think it no
need to wait 1 second, because we don't select the process with
"TIF_MEMDIE" to kill.

Signed-off-by: Figo.zhang <tianfei.zhang@intel.com>
---
 drivers/staging/android/lowmemorykiller.c |    9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index e679d84..13c7182 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -59,8 +59,6 @@ static int lowmem_minfree[6] = {
 };
 static int lowmem_minfree_size = 4;
 
-static unsigned long lowmem_deathpending_timeout;
-
 #define lowmem_print(level, x...)			\
 	do {						\
 		if (lowmem_debug_level >= (level))	\
@@ -128,11 +126,9 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
 		if (!p)
 			continue;
 
-		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
-		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
+		if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
 			task_unlock(p);
-			rcu_read_unlock();
-			return 0;
+			continue;
 		}
 		oom_score_adj = p->signal->oom_score_adj;
 		if (oom_score_adj < min_score_adj) {
@@ -170,7 +166,6 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
 		lowmem_print(1, "send sigkill to %d (%s), adj %hd, size %d\n",
 			     selected->pid, selected->comm,
 			     selected_oom_score_adj, selected_tasksize);
-		lowmem_deathpending_timeout = jiffies + HZ;
 		rem += selected_tasksize;
 	}
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH RESEND v2 1/1] fix a dead loop when in heavy low memory
@ 2015-12-27  8:18 Figo.zhang
  0 siblings, 0 replies; 8+ messages in thread
From: Figo.zhang @ 2015-12-27  8:18 UTC (permalink / raw)
  To: gregkh, mhocko, rientjes, tianfei.zhang, linux-kernel, arve,
	anton.vorontsov, kirill.shutemov, riandrews, devel

From: Figo <tianfei.zhang@intel.com>

Android System UI hang when run heavy monkey stress test.

V2: add more detail about how to re-produce this issue, the
important is install more than 100 apps/games.

Re-produce step:
Run this monkey stress test script with more than 100
apps/games installed:
--kill-process-after-error --ignore-security-exceptions
 --throttle 200 -v 20000000"

kernel log:
[ 1526.272125] lowmem_scan start: 128, 213da, ofree -9849 34419, ma 529
[ 1526.272260] lowmemorykiller: select 'dTi-lm' (27289), adj 647, size 10630, to kill
[ 1526.272299] lowmem_d_timeout=4296194081
[ 1526.272303] Killing 'dTi-lm' (27289), adj 647,
[ 1526.272303]    to free 42520kB on behalf of 'servicemanager' (2365) because
[ 1526.272303]    cache 137676kB is below limit 221184kB for oom_score_adj 529
[ 1526.272303]    Free memory is -39396kB above reserved
[ 1526.272304] lowmem_scan end: 128, 213da, return 10630
[ 1526.272710] lowmem_scan start: 128, 213da, ofree -9849 34373, ma 529
[ 1526.272832] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193081, 4296194081
[ 1526.274450] lowmem_scan start: 128, 280da, ofree -9601 34327, ma 529
[ 1526.274695] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193083, 4296194081
[ 1526.282292] lowmem_scan start: 128, 213da, ofree -9703 34327, ma 529
[ 1526.282727] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193090, 4296194081
[ 1526.316888] lowmem_scan start: 128, 213da, ofree -9766 34465, ma 529
[ 1526.317019] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081
[ 1526.319311] lowmem_scan start: 128, 213da, ofree -9856 34419, ma 529
[ 1526.319442] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081
[ 1526.322026] lowmem_scan start: 128, 280da, ofree -9841 34327, ma 529
[ 1526.360831] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193166, 4296194081
[ 1526.532233] lowmem_scan start: 128, 213da, ofree -9846 34511, ma 529
[ 1526.644046] lowmem_scan start: 128, 213da, ofree -9785 34235, ma 529
[ 1527.437578] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194246, 4296195109
[ 1527.442559] lowmem_scan start: 128, 213da, ofree -9850 41884, ma 529
[ 1527.459540] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194268, 4296195109
[ 1527.500352] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194309, 4296195109

when this happened, the android system UI will hang, no process can be
select to kill.

i found the the value of "lowmem_deathpending_timeout" will be modified
strangely, like in last killing, the value is 4296194081, but why not it
had changed to 4296195109? so it will cause the deadloop in low memory
state which will cause the android system UI hang, because no process will
 be kill.

commit e5d7965f88a3 ("staging: android: lowmemorykiller: Don't wait more
than one second for a process to die") said wait 1 seconds, i think it no
need to wait 1 second, because we don't select the process with
"TIF_MEMDIE" to kill.

Signed-off-by: Figo.zhang <tianfei.zhang@intel.com>
---
 drivers/staging/android/lowmemorykiller.c |    9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index e679d84..13c7182 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -59,8 +59,6 @@ static int lowmem_minfree[6] = {
 };
 static int lowmem_minfree_size = 4;
 
-static unsigned long lowmem_deathpending_timeout;
-
 #define lowmem_print(level, x...)			\
 	do {						\
 		if (lowmem_debug_level >= (level))	\
@@ -128,11 +126,9 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
 		if (!p)
 			continue;
 
-		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
-		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
+		if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
 			task_unlock(p);
-			rcu_read_unlock();
-			return 0;
+			continue;
 		}
 		oom_score_adj = p->signal->oom_score_adj;
 		if (oom_score_adj < min_score_adj) {
@@ -170,7 +166,6 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
 		lowmem_print(1, "send sigkill to %d (%s), adj %hd, size %d\n",
 			     selected->pid, selected->comm,
 			     selected_oom_score_adj, selected_tasksize);
-		lowmem_deathpending_timeout = jiffies + HZ;
 		rem += selected_tasksize;
 	}
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH RESEND v2 1/1] fix a dead loop when in heavy low memory
  2015-12-27  8:34 Figo.zhang
@ 2015-12-27  0:55 ` Greg KH
  0 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2015-12-27  0:55 UTC (permalink / raw)
  To: Figo.zhang
  Cc: mhocko, rientjes, linux-kernel, arve, anton.vorontsov,
	kirill.shutemov, riandrews, devel

On Sun, Dec 27, 2015 at 04:34:56PM +0800, Figo.zhang wrote:
> From: Figo <tianfei.zhang@intel.com>

This doesn't match your signed-off-by name :(

> Signed-off-by: Figo.zhang <tianfei.zhang@intel.com>


I doubt that you have a '.' in your name :(

Again, please consult the Intel Linux kernel group for how to do this
correctly, I will not accept the patch from you unless you have someone
from that group signed off on it as well.

Just slow down and get it right, don't waste everyone's time here
please.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-12-29  4:58 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-27  9:14 [PATCH RESEND v2 1/1] fix a dead loop when in heavy low memory Figo Zhang
2015-12-27  3:58 ` Greg KH
2015-12-27  6:47   ` Zhang, Tianfei
2015-12-28  1:27 ` David Rientjes
2015-12-29  4:58   ` Zhang, Tianfei
  -- strict thread matches above, loose matches on Subject: below --
2015-12-27  8:34 Figo.zhang
2015-12-27  0:55 ` Greg KH
2015-12-27  8:18 Figo.zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).