From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1755181Ab0JSCDk (ORCPT ); Mon, 18 Oct 2010 22:03:40 -0400
Received: from fgwmail6.fujitsu.co.jp ([192.51.44.36]:35062 "EHLO
 fgwmail6.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1754621Ab0JSCDj convert rfc822-to-8bit (ORCPT );
 Mon, 18 Oct 2010 22:03:39 -0400
X-SecurityPolicyCheck-FJ: OK by FujitsuOutboundMailChecker v1.3.1
From: KOSAKI Motohiro
To: Minchan Kim
Subject: Re: Deadlock possibly caused by too_many_isolated.
Cc: kosaki.motohiro@jp.fujitsu.com, Andrew Morton, Neil Brown,
 Wu Fengguang, Rik van Riel, KAMEZAWA Hiroyuki,
 "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", "Li, Shaohua"
In-Reply-To:
References: <20101019102114.A1B9.A69D9226@jp.fujitsu.com>
Message-Id: <20101019105257.A1C6.A69D9226@jp.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8BIT
X-Mailer: Becky! ver. 2.50.07 [ja]
Date: Tue, 19 Oct 2010 11:03:35 +0900 (JST)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

> On Tue, Oct 19, 2010 at 10:21 AM, KOSAKI Motohiro wrote:
> >> On Tue, Oct 19, 2010 at 9:57 AM, KOSAKI Motohiro wrote:
> >> >> > I think there are two bugs here.
> >> >> > The raid1 bug that Torsten mentions is certainly real (and has been around
> >> >> > for an embarrassingly long time).
> >> >> > The bug that I identified in too_many_isolated is also a real bug and can be
> >> >> > triggered without md/raid1 in the mix.
> >> >> > So this is not a 'full fix' for every bug in the kernel :-), but it could
> >> >> > well be a full fix for this particular bug.
> >> >> >
> >> >>
> >> >> Can we just delete the too_many_isolated() logic? (Crappy comment
> >> >> describes what the code does but not why it does it).
> >> >
> >> > If I remember correctly, about 1-2 years ago we got a bug report that LTP
> >> > could trigger mysterious OOM killer invocations. If too many processes are
> >> > in the reclaim path, all of the reclaimable pages can be isolated, so the
> >> > last reclaimer finds that the system has no reclaimable pages left and
> >> > invokes the OOM killer.
> >> > We had strong motivation to avoid false positive OOMs, and some discussion
> >> > then produced this patch.
> >> >
> >> > If I remember incorrectly, I hope Wu or Rik will correct me.
> >>
> >> AFAIR, it's right.
> >>
> >> How about this?
> >>
> >> It throttles more aggressively than the old code (i.e., it works at zone
> >> granularity rather than LRU-type granularity).
> >> But I think it can prevent the unnecessary OOM problem and solve the
> >> deadlock problem.
> >
> > Can you please elaborate on your intention? Do you think Wu's approach is wrong?
>
> No. I think Wu's patch may work well. But I agree with Andrew.
> Couldn't we remove the too_many_isolated logic? If we could, we would solve
> the problem simply.
> But if we remove the logic, we will hit the old problem again.
> So my patch's intention is to prevent both the OOM and the deadlock problem
> with a simple patch, without adding a new heuristic to too_many_isolated.

But your patch has a much higher chance of false positives and false negatives,
because the point where pages are isolated and the too_many_isolated_zone()
call site are far apart.

So, unless someone says Wu's patch is wrong, I prefer his.
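
For readers following the thread, the two throttling granularities discussed above can be modelled roughly as below. This is only a userspace sketch under stated assumptions, not the kernel code and not either posted patch: `struct zone_counters` and `too_many_isolated_lru()` are hypothetical names, and the body of `too_many_isolated_zone()` (summing both LRU types) is one possible reading of "zone granularity", since the patch itself is not quoted in this message. The sketch also leaves out any exemptions the real check applies (for example for kswapd).

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the per-zone LRU statistics. In the kernel
 * these values would come from per-zone page state counters; here they
 * are plain fields so the example is self-contained.
 */
struct zone_counters {
    unsigned long inactive_file;
    unsigned long isolated_file;
    unsigned long inactive_anon;
    unsigned long isolated_anon;
};

/*
 * LRU-type granularity: a direct reclaimer scanning one list type is
 * throttled when more pages of that type are isolated than remain on
 * the corresponding inactive list.
 */
static bool too_many_isolated_lru(const struct zone_counters *z, bool file)
{
    unsigned long inactive = file ? z->inactive_file : z->inactive_anon;
    unsigned long isolated = file ? z->isolated_file : z->isolated_anon;

    return isolated > inactive;
}

/*
 * Zone granularity (ASSUMED semantics): compare the totals of both
 * list types, so heavy isolation of one type can throttle reclaimers
 * of the other type as well.
 */
static bool too_many_isolated_zone(const struct zone_counters *z)
{
    unsigned long inactive = z->inactive_file + z->inactive_anon;
    unsigned long isolated = z->isolated_file + z->isolated_anon;

    return isolated > inactive;
}
```

With counters of 100 inactive / 90 isolated file pages and 10 inactive / 50 isolated anon pages, the per-type check does not throttle a file reclaimer, while the zone-wide check in this model does; that is the kind of difference the "more aggressive" remark refers to.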