From: Andrea Arcangeli <aarcange@redhat.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, Mel Gorman <mel@csn.ul.ie>,
	Rik van Riel <riel@redhat.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC][BUGFIX][PATCH 1/2] memcg: fix charge bypass route of migration
Date: Thu, 15 Apr 2010 10:17:43 +0200
Message-ID: <20100415081743.GP32034@random.random>
In-Reply-To: <20100415155611.da707913.kamezawa.hiroyu@jp.fujitsu.com>

On Thu, Apr 15, 2010 at 03:56:11PM +0900, KAMEZAWA Hiroyuki wrote:
> Ok, ignore this patch.

Ok so I'll stick to my original patch on aa.git:

http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=patch;h=f0a05fea58501298ab7b800ac8220f017c66f427

I have also already merged Mel's move of two files from /proc to
debugfs. So now I have to:

1) finish the generic doc in Documentation/ (mostly taken from
   transparent hugepage core changeset comments here:
   http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=commit;h=b901f7e1ab412241d4299954ae28505f2206af1d
   )

2) add alloc_pages_vma for numa awareness in the huge page faults

3) have the kernel stack 2M aligned and grow vm_start down in 2M
   chunks when enabled=always. I doubt it makes sense to decouple this
   feature from enabled=always and to add a special sysfs control for
   it, plus I don't like adding too many APIs and it can always be
   decoupled later.

4) I think I will not add a prctl to achieve Ingo's per-process enable
   for now. I'm quite convinced that in real life madvise is enough and
   enabled=always|madvise|never is more than enough for testing
   without having to add a prctl. This is the same issue as with KSM
   after all: KSM is also missing a prctl to enable merging on a
   per-process basis and that's fine. prctl really looks very much
   like libhugetlbfs to me, so I'm not very attracted to it: I
   strongly doubt its usefulness, and if I add it, it becomes a
   forever-existing API (actually even worse than the sysfs layout
   from the kernel API point of view), so there has to be a strong
   reason for it. And I don't think there's any point in adding a
   madvise(MADV_NO_HUGEPAGE) or a prctl to selectively _disable_
   hugepages on mappings or processes when enabled=always. It makes no
   sense to use enabled=always and then to disable hugepages in a few
   apps. The opposite makes sense, to save memory of course! I don't
   want to add kernel APIs in prctl that are useful only for testing
   and benchmarking. It can always be added later anyway...

5) Ulrich sent me a _three_ liner that will make glibc fully cooperate
   and guarantee all anon RAM goes in hugepages without using
   khugepaged (just like libhugetlbfs would cooperate with
   hugetlbfs). For POSIX threads it won't work yet, and for that we
   may need to add a MAP_ALIGN flag to mmap (suggested by him) to be
   optimal and not waste address space on 32bit archs. That's no big
   deal, it's still orders of magnitude simpler than backing an
   mmap(4k) with a 2M page and collecting the still unmapped parts of
   the 2M pages when the system is low on memory. Furthermore
   MAP_ALIGN will only involve the mmap paths that take mmap_sem in
   write mode, which aren't really fast paths, while the mmap(4k)
   backed by 2M would slow down do_anonymous_page and other core fast
   paths that are much more performance critical than the mmap paths.
   So I think this is the way to go. And if somebody doesn't want to
   risk wasting memory, the default should be enabled=madvise, with
   madvise then added where needed. One has to choose between
   performance and memory, and I don't want intermediate options like
   "a bit faster but not as fast as it can be, but waste a little less
   memory", which also complicate the code a lot and micro-slow down
   the fast paths.

6) add a config option at kernel configuration time to select the
   transparent hugepage default between always/madvise/never
   (the in-kernel set_recommended_min_free_kbytes late_initcall() will
   run only for always/madvise, as it already checks the build-time
   default and it won't run unless enabled=always|madvise).
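
To make points 4) and 5) a bit more concrete from the userland side,
here is a rough sketch of what an app can do without any MAP_ALIGN
flag: over-allocate by one hugepage, trim the unaligned head and the
unused tail, then madvise the aligned range. This is _not_ Ulrich's
glibc change (that isn't shown here), map_2m_aligned() is a made-up
helper name, the 2M size assumes x86-64, and MADV_HUGEPAGE is assumed
to be provided by the installed headers (the #ifdef skips the hint
otherwise).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL << 20)	/* assumption: 2M hugepages (x86-64) */

/*
 * Hypothetical helper: return an anonymous mapping of len bytes whose
 * start address is 2M aligned, or NULL on failure. len must be a
 * non-zero multiple of HPAGE_SIZE.
 */
static void *map_2m_aligned(size_t len)
{
	if (len == 0 || len % HPAGE_SIZE)
		return NULL;

	/* Over-allocate by one hugepage so an aligned start always fits. */
	size_t span = len + HPAGE_SIZE;
	char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	/* Round the start up to the next 2M boundary. */
	uintptr_t start = ((uintptr_t)raw + HPAGE_SIZE - 1) &
			  ~(uintptr_t)(HPAGE_SIZE - 1);
	size_t head = start - (uintptr_t)raw;
	size_t tail = span - head - len;

	/* Give back the address space we do not need. */
	if (head)
		munmap(raw, head);
	if (tail)
		munmap((char *)start + len, tail);

#ifdef MADV_HUGEPAGE
	/* Only a hint, relevant with enabled=madvise; failure is ignored. */
	madvise((void *)start, len, MADV_HUGEPAGE);
#endif
	return (void *)start;
}
```

A caller would do something like p = map_2m_aligned(4 * HPAGE_SIZE)
and later munmap(p, 4 * HPAGE_SIZE). The extra 2M of address space
reserved temporarily per mapping is exactly the kind of waste on 32bit
archs that a MAP_ALIGN flag would avoid.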

