From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754952Ab1EXLYg (ORCPT ); Tue, 24 May 2011 07:24:36 -0400
Received: from mail-pz0-f46.google.com ([209.85.210.46]:48452 "EHLO
	mail-pz0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753866Ab1EXLYf convert rfc822-to-8bit (ORCPT );
	Tue, 24 May 2011 07:24:35 -0400
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:from:date
	:x-google-sender-auth:message-id:subject:to:cc:content-type
	:content-transfer-encoding;
	b=pF1mTsf8cgA4fqVLON/9FfW1eTp4yKNkNIdqaXMOvlX3tzEkSyPDFL5dyeqTXB7hE4
	hkSl10ab2SwgGbQs8LfTtvo9GayYHaFGjJ3RwmH9H9KvgexyBtmm3zi/loaa8ImZ49Yj
	rb1Yt4r7KQRCrudo3IJZVXsJGHNo9bd2TYQko=
MIME-Version: 1.0
In-Reply-To:
References: <4DD5DC06.6010204@jp.fujitsu.com>
	<20110520140856.fdf4d1c8.kamezawa.hiroyu@jp.fujitsu.com>
	<20110520101120.GC11729@random.random>
	<20110520153346.GA1843@barrios-desktop>
	<20110520161934.GA2386@barrios-desktop>
From: Andrew Lutomirski
Date: Tue, 24 May 2011 07:24:15 -0400
X-Google-Sender-Auth: lCL0kLgvu8Ful3_n5LQGF-3Qbh0
Message-ID:
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)
To: Minchan Kim
Cc: KOSAKI Motohiro, Andrea Arcangeli, KAMEZAWA Hiroyuki,
	fengguang.wu@intel.com, andi@firstfloor.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, mgorman@suse.de, hannes@cmpxchg.org,
	riel@redhat.com
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 23, 2011 at 9:34 PM, Minchan Kim wrote:
> On Tue, May 24, 2011 at 10:19 AM, Andrew Lutomirski wrote:
>> On Sun, May 22, 2011 at 7:12 PM, Minchan Kim wrote:
>>> Could you test below patch based on vanilla 2.6.38.6?
>>> The expect result is that system hang never should happen.
>>> I hope this is last test about hang.
>>>
>>> Thanks.
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 292582c..1663d24 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>         if (scanned == 0)
>>>                 scanned = SWAP_CLUSTER_MAX;
>>>
>>> -       if (!down_read_trylock(&shrinker_rwsem))
>>> -               return 1;       /* Assume we'll be able to shrink next time */
>>> +       if (!down_read_trylock(&shrinker_rwsem)) {
>>> +               /* Assume we'll be able to shrink next time */
>>> +               ret = 1;
>>> +               goto out;
>>> +       }
>>>
>>>         list_for_each_entry(shrinker, &shrinker_list, list) {
>>>                 unsigned long long delta;
>>> @@ -286,6 +289,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>                 shrinker->nr += total_scan;
>>>         }
>>>         up_read(&shrinker_rwsem);
>>> +out:
>>> +       cond_resched();
>>>         return ret;
>>>  }
>>>
>>> @@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
>>>          * must be balanced
>>>          */
>>>         if (order)
>>> -               return pgdat_balanced(pgdat, balanced, classzone_idx);
>>> +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
>>>         else
>>>                 return !all_zones_ok;
>>>  }
>>
>> So far with this patch I can't reproduce the hang or the bogus OOM.
>>
>> To be completely clear, I have COMPACTION, MIGRATION, and THP off, I'm
>> running 2.6.38.6, and I have exactly two patches applied.  One is the
>> attached patch and the other is a the fpu.ko/aesni_intel.ko merger
>> which I need to get dracut to boot my box.
>>
>> For fun, I also upgraded to 8GB of RAM and it still works.
>>
>
> Hmm. Could you test it with enable thp and 2G RAM?
> Isn't it a original test environment?
> Please don't change test environment.
> :)

The test that passed last night was an environment (hardware and
config) that I had confirmed earlier as failing without the patch.

I just re-tested my original config (from a backup -- migration,
compaction, and thp "always" are enabled).  I get bogus OOMs but not a
hang.  (I'm running with mem=2G right now -- I'll swap the DIMMs back
out later on if you want.)

I attached the bogus OOM (actually several that happened in sequence).
They look readahead-related.  There was plenty of free swap space.

--Andy
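
A minimal userspace sketch of the sleeping_prematurely() hunk in the
quoted patch, for readers who want to see the effect of the one-character
change in isolation. It is not kernel code and rests on assumptions:
node_balanced is a hypothetical stand-in for the result of
pgdat_balanced(), and the _old/_new function names are invented here
purely for the comparison.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the '-' line in the quoted hunk. For a high-order request
 * (order != 0) it returns the balance state directly, so a balanced
 * node is reported as "sleeping prematurely".
 * node_balanced is an assumed stand-in for pgdat_balanced(...).
 */
bool sleeping_prematurely_old(int order, bool node_balanced, bool all_zones_ok)
{
	if (order)
		return node_balanced;
	else
		return !all_zones_ok;
}

/*
 * Model of the '+' line in the quoted hunk. The result is negated, so
 * the sleep counts as premature only when the node is NOT balanced.
 */
bool sleeping_prematurely_new(int order, bool node_balanced, bool all_zones_ok)
{
	if (order)
		return !node_balanced;
	else
		return !all_zones_ok;
}

/* Small self-check covering the cases where the two variants differ. */
void check_sleeping_prematurely_models(void)
{
	/* High order, balanced node: only the old variant calls it premature. */
	assert(sleeping_prematurely_old(1, true, true) == true);
	assert(sleeping_prematurely_new(1, true, true) == false);

	/* High order, unbalanced node: only the new variant calls it premature. */
	assert(sleeping_prematurely_old(1, false, true) == false);
	assert(sleeping_prematurely_new(1, false, true) == true);

	/* Order 0: both variants agree, since that branch is untouched. */
	assert(sleeping_prematurely_old(0, true, false) ==
	       sleeping_prematurely_new(0, true, false));
}
```

In this model the two variants disagree on every high-order call and
agree on every order-0 call, which matches the hunk touching only the
"if (order)" branch.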