All of lore.kernel.org
 help / color / mirror / Atom feed
From: Minchan Kim <minchan.kim@gmail.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>
Subject: Re: [BUGFIX][PATCH 1/4] memcg: fix limit estimation at reclaim for hugepage
Date: Sun, 30 Jan 2011 11:26:18 +0900	[thread overview]
Message-ID: <AANLkTik=yJnHoZbyjc8bMp_vbGaNdgzvAXY1P5qZ8W6W@mail.gmail.com> (raw)
In-Reply-To: <20110128173655.7c1d9ebd.kamezawa.hiroyu@jp.fujitsu.com>

On Fri, Jan 28, 2011 at 5:36 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Fri, 28 Jan 2011 17:25:58 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> Hi Hannes,
>>
>> On Fri, Jan 28, 2011 at 5:17 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> > On Fri, Jan 28, 2011 at 05:04:16PM +0900, Minchan Kim wrote:
>> >> Hi Kame,
>> >>
>> >> On Fri, Jan 28, 2011 at 1:58 PM, KAMEZAWA Hiroyuki
>> >> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> >> > How about this ?
>> >> > ==
>> >> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> >> >
>> >> > Current memory cgroup's code tends to assume page_size == PAGE_SIZE
>> >> > and arrangement for THP is not enough yet.
>> >> >
>> >> > This is one of fixes for supporing THP. This adds
>> >> > mem_cgroup_check_margin() and checks whether there are required amount of
>> >> > free resource after memory reclaim. By this, THP page allocation
>> >> > can know whether it really succeeded or not and avoid infinite-loop
>> >> > and hangup.
>> >> >
>> >> > Total fixes for do_charge()/reclaim memory will follow this patch.
>> >>
>> >> If this patch is only related to THP, I think patch order isn't good.
>> >> Before applying [2/4], huge page allocation will retry without
>> >> reclaiming and loop forever by below part.
>> >>
>> >> @@ -1854,9 +1858,6 @@ static int __mem_cgroup_do_charge(struct
>> >>       } else
>> >>               mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
>> >>
>> >> -     if (csize > PAGE_SIZE) /* change csize and retry */
>> >> -             return CHARGE_RETRY;
>> >> -
>> >>       if (!(gfp_mask & __GFP_WAIT))
>> >>               return CHARGE_WOULDBLOCK;
>> >>
>> >> Am I missing something?
>> >
>> > No, you are correct.  But I am not sure the order really matters in
>> > theory: you have two endless loops that need independent fixing.
>>
>> That's why I ask a question.
>> Two endless loop?
>>
>> One is what I mentioned. The other is what?
>> Maybe this patch solve the other.
>> But I can't guess it by only this description. Stupid..
>>
>> Please open my eyes.
>>
>
> One is.
>
>  if (csize > PAGE_SIZE)
>        return CHARGE_RETRY;
>
> By this, reclaim will never be called.
>
>
> Another is a check after memory reclaim.
> ==
>       ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, NULL,
>                                        gfp_mask, flags);
>        /*
>         * try_to_free_mem_cgroup_pages() might not give us a full
>         * picture of reclaim. Some pages are reclaimed and might be
>         * moved to swap cache or just unmapped from the cgroup.
>         * Check the limit again to see if the reclaim reduced the
>         * current usage of the cgroup before giving up
>         */
>        if (ret || mem_cgroup_check_under_limit(mem_over_limit))
>                return CHARGE_RETRY;
> ==
>
> ret != 0 if one page is reclaimed. Then, khupaged will retry charge and
> cannot get enough room, reclaim, one page -> again. SO, in busy memcg,
> HPAGE_SIZE allocation never fails.
>
> Even if khupaged luckly allocates HPAGE_SIZE, because khugepaged walks vmas
> one by one and try to collapse each pmd, under mmap_sem(), this seems a hang by
> khugepaged, infinite loop.
>
>
> Thanks,
> -Kame
>
>

Kame, Hannes, Thanks.

I understood yours opinion. :)
As I said earlier, at least, it can help patch review.
When I saw only [1/4] firstly, I felt it doesn't affect anything since
THP allocation would return earlier before reaching the your patch so
infinite loop still happens.

Of course, when we apply [2/4], the problem will be gone.
But I can't know the fact until I read [2/4]. It makes reviewers confuse.

So I suggest [2/4] is ahead of [1/4] and includes following as in [2/4].
"This patch still has a infinite problem in case of xxxx. Next patch solves it"

I hope if it doesn't have a problem on bisect, patch order would be
changed if you don't mind. When I review Hannes's version, it's same.
:(

I will review again when Hannes resends the series.

-- 
Kind regards,
Minchan Kim

WARNING: multiple messages have this Message-ID (diff)
From: Minchan Kim <minchan.kim@gmail.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>
Subject: Re: [BUGFIX][PATCH 1/4] memcg: fix limit estimation at reclaim for hugepage
Date: Sun, 30 Jan 2011 11:26:18 +0900	[thread overview]
Message-ID: <AANLkTik=yJnHoZbyjc8bMp_vbGaNdgzvAXY1P5qZ8W6W@mail.gmail.com> (raw)
In-Reply-To: <20110128173655.7c1d9ebd.kamezawa.hiroyu@jp.fujitsu.com>

On Fri, Jan 28, 2011 at 5:36 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Fri, 28 Jan 2011 17:25:58 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> Hi Hannes,
>>
>> On Fri, Jan 28, 2011 at 5:17 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> > On Fri, Jan 28, 2011 at 05:04:16PM +0900, Minchan Kim wrote:
>> >> Hi Kame,
>> >>
>> >> On Fri, Jan 28, 2011 at 1:58 PM, KAMEZAWA Hiroyuki
>> >> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> >> > How about this ?
>> >> > ==
>> >> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> >> >
>> >> > Current memory cgroup's code tends to assume page_size == PAGE_SIZE
>> >> > and arrangement for THP is not enough yet.
>> >> >
>> >> > This is one of fixes for supporing THP. This adds
>> >> > mem_cgroup_check_margin() and checks whether there are required amount of
>> >> > free resource after memory reclaim. By this, THP page allocation
>> >> > can know whether it really succeeded or not and avoid infinite-loop
>> >> > and hangup.
>> >> >
>> >> > Total fixes for do_charge()/reclaim memory will follow this patch.
>> >>
>> >> If this patch is only related to THP, I think patch order isn't good.
>> >> Before applying [2/4], huge page allocation will retry without
>> >> reclaiming and loop forever by below part.
>> >>
>> >> @@ -1854,9 +1858,6 @@ static int __mem_cgroup_do_charge(struct
>> >>       } else
>> >>               mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
>> >>
>> >> -     if (csize > PAGE_SIZE) /* change csize and retry */
>> >> -             return CHARGE_RETRY;
>> >> -
>> >>       if (!(gfp_mask & __GFP_WAIT))
>> >>               return CHARGE_WOULDBLOCK;
>> >>
>> >> Am I missing something?
>> >
>> > No, you are correct.  But I am not sure the order really matters in
>> > theory: you have two endless loops that need independent fixing.
>>
>> That's why I ask a question.
>> Two endless loop?
>>
>> One is what I mentioned. The other is what?
>> Maybe this patch solve the other.
>> But I can't guess it by only this description. Stupid..
>>
>> Please open my eyes.
>>
>
> One is.
>
>  if (csize > PAGE_SIZE)
>        return CHARGE_RETRY;
>
> By this, reclaim will never be called.
>
>
> Another is a check after memory reclaim.
> ==
>       ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, NULL,
>                                        gfp_mask, flags);
>        /*
>         * try_to_free_mem_cgroup_pages() might not give us a full
>         * picture of reclaim. Some pages are reclaimed and might be
>         * moved to swap cache or just unmapped from the cgroup.
>         * Check the limit again to see if the reclaim reduced the
>         * current usage of the cgroup before giving up
>         */
>        if (ret || mem_cgroup_check_under_limit(mem_over_limit))
>                return CHARGE_RETRY;
> ==
>
> ret != 0 if one page is reclaimed. Then, khupaged will retry charge and
> cannot get enough room, reclaim, one page -> again. SO, in busy memcg,
> HPAGE_SIZE allocation never fails.
>
> Even if khupaged luckly allocates HPAGE_SIZE, because khugepaged walks vmas
> one by one and try to collapse each pmd, under mmap_sem(), this seems a hang by
> khugepaged, infinite loop.
>
>
> Thanks,
> -Kame
>
>

Kame, Hannes, Thanks.

I understood yours opinion. :)
As I said earlier, at least, it can help patch review.
When I saw only [1/4] firstly, I felt it doesn't affect anything since
THP allocation would return earlier before reaching the your patch so
infinite loop still happens.

Of course, when we apply [2/4], the problem will be gone.
But I can't know the fact until I read [2/4]. It makes reviewers confuse.

So I suggest [2/4] is ahead of [1/4] and includes following as in [2/4].
"This patch still has a infinite problem in case of xxxx. Next patch solves it"

I hope if it doesn't have a problem on bisect, patch order would be
changed if you don't mind. When I review Hannes's version, it's same.
:(

I will review again when Hannes resends the series.

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2011-01-30  2:26 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-28  3:22 [BUGFIX][PATCH 0/4] Fixes for memcg with THP KAMEZAWA Hiroyuki
2011-01-28  3:22 ` KAMEZAWA Hiroyuki
2011-01-28  3:24 ` [BUGFIX][PATCH 1/4] memcg: fix limit estimation at reclaim for hugepage KAMEZAWA Hiroyuki
2011-01-28  3:24   ` KAMEZAWA Hiroyuki
2011-01-28  4:40   ` Daisuke Nishimura
2011-01-28  4:40     ` Daisuke Nishimura
2011-01-28  4:49     ` KAMEZAWA Hiroyuki
2011-01-28  4:49       ` KAMEZAWA Hiroyuki
2011-01-28  4:58     ` KAMEZAWA Hiroyuki
2011-01-28  4:58       ` KAMEZAWA Hiroyuki
2011-01-28  5:36       ` Daisuke Nishimura
2011-01-28  5:36         ` Daisuke Nishimura
2011-01-28  8:04       ` Minchan Kim
2011-01-28  8:04         ` Minchan Kim
2011-01-28  8:17         ` Johannes Weiner
2011-01-28  8:17           ` Johannes Weiner
2011-01-28  8:25           ` Minchan Kim
2011-01-28  8:25             ` Minchan Kim
2011-01-28  8:36             ` KAMEZAWA Hiroyuki
2011-01-28  8:36               ` KAMEZAWA Hiroyuki
2011-01-30  2:26               ` Minchan Kim [this message]
2011-01-30  2:26                 ` Minchan Kim
2011-01-28  8:41             ` Johannes Weiner
2011-01-28  8:41               ` Johannes Weiner
2011-01-28  8:24         ` KAMEZAWA Hiroyuki
2011-01-28  8:24           ` KAMEZAWA Hiroyuki
2011-01-28  8:37           ` Minchan Kim
2011-01-28  8:37             ` Minchan Kim
2011-01-28  7:52   ` Johannes Weiner
2011-01-28  7:52     ` Johannes Weiner
2011-01-28  8:06     ` KAMEZAWA Hiroyuki
2011-01-28  8:06       ` KAMEZAWA Hiroyuki
2011-01-28  3:26 ` [BUGFIX][PATCH 2/4] memcg: fix charge path for THP and allow early retirement KAMEZAWA Hiroyuki
2011-01-28  3:26   ` KAMEZAWA Hiroyuki
2011-01-28  5:37   ` Daisuke Nishimura
2011-01-28  5:37     ` Daisuke Nishimura
2011-01-28  7:57   ` Johannes Weiner
2011-01-28  7:57     ` Johannes Weiner
2011-01-28  8:14     ` KAMEZAWA Hiroyuki
2011-01-28  8:14       ` KAMEZAWA Hiroyuki
2011-01-28  9:02       ` Johannes Weiner
2011-01-28  9:02         ` Johannes Weiner
2011-01-28  9:16         ` KAMEZAWA Hiroyuki
2011-01-28  9:16           ` KAMEZAWA Hiroyuki
2011-01-28  3:27 ` [BUGFIX][PATCH 3/4] mecg: fix oom flag at THP charge KAMEZAWA Hiroyuki
2011-01-28  3:27   ` KAMEZAWA Hiroyuki
2011-01-28  5:39   ` Daisuke Nishimura
2011-01-28  5:39     ` Daisuke Nishimura
2011-01-28  5:50     ` KAMEZAWA Hiroyuki
2011-01-28  5:50       ` KAMEZAWA Hiroyuki
2011-01-28  8:02   ` Johannes Weiner
2011-01-28  8:02     ` Johannes Weiner
2011-01-28  8:21     ` KAMEZAWA Hiroyuki
2011-01-28  8:21       ` KAMEZAWA Hiroyuki
2011-01-31  7:41       ` Balbir Singh
2011-01-31  7:41         ` Balbir Singh
2011-01-28  3:28 ` [BUGFIX][PATCH 4/4] memcg: fix khugepaged should skip busy memcg KAMEZAWA Hiroyuki
2011-01-28  3:28   ` KAMEZAWA Hiroyuki
2011-01-28  8:20   ` Daisuke Nishimura
2011-01-28  8:20     ` Daisuke Nishimura
2011-01-28  8:30     ` KAMEZAWA Hiroyuki
2011-01-28  8:30       ` KAMEZAWA Hiroyuki
2011-01-29 12:47 ` [BUGFIX][PATCH 0/4] Fixes for memcg with THP Balbir Singh
2011-01-29 12:47   ` Balbir Singh
2011-01-30 23:55   ` KAMEZAWA Hiroyuki
2011-01-30 23:55     ` KAMEZAWA Hiroyuki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='AANLkTik=yJnHoZbyjc8bMp_vbGaNdgzvAXY1P5qZ8W6W@mail.gmail.com' \
    --to=minchan.kim@gmail.com \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nishimura@mxp.nes.nec.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.