From: Minchan Kim <minchan.kim@gmail.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Neil Brown <neilb@suse.de>, Rik van Riel <riel@redhat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"Li, Shaohua" <shaohua.li@intel.com>
Subject: Re: Deadlock possibly caused by too_many_isolated.
Date: Tue, 19 Oct 2010 14:11:07 +0900
Message-ID: <AANLkTi=Akuku=Sz7kw0JRB-bzP8cmirbX4XJ8qyVqmze@mail.gmail.com>
In-Reply-To: <20101019121321.A1E1.A69D9226@jp.fujitsu.com>

On Tue, Oct 19, 2010 at 12:13 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> On Tue, Oct 19, 2010 at 12:05 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> > On Tue, Oct 19, 2010 at 10:52:47AM +0800, Minchan Kim wrote:
>> >> Hi Wu,
>> >>
>> >> On Tue, Oct 19, 2010 at 11:35 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> >> >> @@ -2054,10 +2069,11 @@ rebalance:
>> >> >>                 goto got_pg;
>> >> >>
>> >> >>         /*
>> >> >> -        * If we failed to make any progress reclaiming, then we are
>> >> >> -        * running out of options and have to consider going OOM
>> >> >> +        * If we failed to make any progress reclaiming and there aren't
>> >> >> +        * many parallel reclaiming, then we are unning out of options and
>> >> >> +        * have to consider going OOM
>> >> >>          */
>> >> >> -       if (!did_some_progress) {
>> >> >> +       if (!did_some_progress && !too_many_isolated_zone(preferred_zone)) {
>> >> >>                 if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
>> >> >>                         if (oom_killer_disabled)
>> >> >>                                 goto nopage;
>> >> >
>> >> > This is simply wrong.
>> >> >
>> >> > It disabled this block for 99% system because there won't be enough
>> >> > tasks to make (!too_many_isolated_zone == true). As a result the LRU
>> >> > will be scanned like mad and no task get OOMed when it should be.
>> >>
>> >> If !too_many_isolated_zone is false, it means there are already many
>> >> direct reclaiming tasks.
>> >> So they could exit reclaim path and !too_many_isolated_zone will be true.
>> >> What am I missing now?
>> >
>> > Ah sorry, my brain get short circuited.. but I still feel uneasy with
>> > this change. It's not fixing the root cause and won't prevent too many
>> > LRU pages be isolated. It's too late to test too_many_isolated_zone()
>> > after direct reclaim returns (after sleeping for a long time).
>> >
>>
>> Intend to agree.
>> I think root cause is a infinite looping in too_many_isolated holding FS lock.
>> Would it be simple that too_many_isolated would be bail out after some try?
>
> How?
> A lot of caller don't have good recover logic when memory allocation fail occur.
>

I mean the following.

1. shrink_inactive_list
2. If too_many_isolated loops more than 5 times, it sets a flag to
indicate that this failure is caused by concurrent reclaim, and bails out.
3. __alloc_pages_slowpath sees that did_some_progress is zero and that
the flag indicating a bailout due to concurrent reclaim is set.
4. Instead of going OOM, it calls congestion_wait and goes back to rebalance.
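
The four steps above can be sketched as a small user-space model. This is
only an illustration of the control flow, not kernel code: every name
ending in _sketch, the two structs, the enum and MAX_ISOLATED_RETRIES are
hypothetical stand-ins, and the real congestion_wait() is modeled by a
simple counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limit, taken from the "5 times" in step 2. */
#define MAX_ISOLATED_RETRIES 5

/* Outcome of one pass through the modeled slow path. */
enum alloc_result { ALLOC_OK, ALLOC_RETRY, ALLOC_OOM };

/* Hypothetical stand-in for a zone: not the kernel's struct zone. */
struct zone_sketch {
	int busy_rounds;	/* how many checks still report "too many isolated" */
	int reclaimable;	/* pages that reclaim could free */
};

/* Hypothetical per-allocation state carrying the bailout mark. */
struct reclaim_state_sketch {
	bool bailed_on_concurrent_reclaim;
	int congestion_waits;	/* models calls to congestion_wait() */
};

/* Models too_many_isolated(): true while other reclaimers are busy. */
static bool too_many_isolated_sketch(struct zone_sketch *z)
{
	if (z->busy_rounds > 0) {
		z->busy_rounds--;
		return true;
	}
	return false;
}

/* Steps 1 and 2: give up after more than MAX_ISOLATED_RETRIES loops. */
static int shrink_inactive_list_sketch(struct zone_sketch *z,
				       struct reclaim_state_sketch *rs)
{
	int tries = 0;
	int reclaimed;

	assert(z && rs);
	rs->bailed_on_concurrent_reclaim = false;
	while (too_many_isolated_sketch(z)) {
		if (++tries > MAX_ISOLATED_RETRIES) {
			/* Mark the failure as "concurrent reclaim", then bail out. */
			rs->bailed_on_concurrent_reclaim = true;
			return 0;
		}
	}
	reclaimed = z->reclaimable;
	z->reclaimable = 0;
	return reclaimed;
}

/* Steps 3 and 4: no progress plus the mark means wait and retry, not OOM. */
static enum alloc_result alloc_slowpath_step_sketch(struct zone_sketch *z,
						    struct reclaim_state_sketch *rs)
{
	int did_some_progress = shrink_inactive_list_sketch(z, rs);

	if (did_some_progress)
		return ALLOC_OK;
	if (rs->bailed_on_concurrent_reclaim) {
		rs->congestion_waits++;	/* stands in for congestion_wait() */
		return ALLOC_RETRY;	/* caller would go back to rebalance */
	}
	return ALLOC_OOM;
}
```

In this model a zone that stays busy yields ALLOC_RETRY, while a zone
with no concurrent reclaim and nothing reclaimable still yields
ALLOC_OOM. The extra state that has to travel between the two functions
is the part that makes the real change intrusive.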

While implementing it, I realized that it makes the code rather ugly,
and I think the loss is bigger than the gain.

Okay. I will drop this idea.

Thanks for the advice, Wu, KOSAKI.
-- 
Kind regards,
Minchan Kim


