From: TSUKADA Koutaro <tsukada@ascade.co.jp>
To: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Jonathan Corbet <corbet@lwn.net>,
	"Luis R. Rodriguez" <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Roman Gushchin <guro@fb.com>,
	David Rientjes <rientjes@google.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Anshuman Khandual <khandual@linux.vnet.ibm.com>,
	Marc-Andre Lureau <marcandre.lureau@redhat.com>,
	Punit Agrawal <punit.agrawal@arm.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg
Date: Thu, 24 May 2018 21:58:49 +0900	[thread overview]
Message-ID: <b2afbff6-b59f-7105-3808-64d41bd4a3a8@ascade.co.jp> (raw)
In-Reply-To: <20180524082044.GW20441@dhcp22.suse.cz>

On 2018/05/24 17:20, Michal Hocko wrote:
> On Thu 24-05-18 13:39:59, TSUKADA Koutaro wrote:
>> On 2018/05/23 3:54, Michal Hocko wrote:
> [...]
>>> I am also quite confused why you keep distinguishing surplus hugetlb
>>> pages from regular preallocated ones. Being a surplus page is an
>>> implementation detail that we use for internal accounting, rather than
>>> something to expose to userspace even more than we currently do.
>>
>> I apologize for the confusion.
>>
>> Hugetlb pages obtained from the pool do not deplete the buddy allocator.
> 
> That is because they have already been allocated from the buddy
> allocator, so the end result is the same.
> 
>> On
>> the other hand, surplus hugetlb pages do deplete the buddy allocator.
>> Given this difference in behavior, I thought the two could be
>> distinguished.
> 
> But this is simply not correct. Surplus pages are fluid. If you increase
> the hugetlb pool size they will become regular persistent hugetlb pages.

I really cannot see what is wrong with this. Such a page is obviously
released before being added to the persistent pool, and at that point it
is uncharged from the memcg to which the task belongs (this assumes my
patch set). After that, when the same page is obtained from the pool it is
no longer a surplus hugepage, so it will not be charged to memcg again.
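
To make the lifecycle concrete, here is a minimal userspace sketch of how
a surplus hugepage comes into existence in the first place. It assumes the
administrator has raised vm.nr_overcommit_hugepages, that the static pool
is exhausted, and a 2MB default hugepage size; it is only an illustration,
not part of the patch set:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL * 1024 * 1024)	/* assumes 2MB default hugepages */

int main(void)
{
	/*
	 * With the static pool empty and vm.nr_overcommit_hugepages > 0,
	 * this mapping is backed by a surplus hugepage taken from the
	 * buddy allocator; /proc/meminfo shows it under HugePages_Surp
	 * while the mapping lives.
	 */
	void *p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap(MAP_HUGETLB)");	/* pool empty, overcommit off? */
		return 1;
	}
	memset(p, 0, HPAGE_SIZE);	/* touch the hugepage */

	/* Raising /proc/sys/vm/nr_hugepages at this point triggers the
	 * surplus-to-persistent conversion being debated above. */
	getchar();			/* pause here to inspect the counters */

	munmap(p, HPAGE_SIZE);
	return 0;
}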

>> Although my memcg knowledge is extremely limited, memcg accounts for the
>> various kinds of pages that the tasks belonging to it obtain from the
>> buddy pool. I would argue that surplus hugepages are special in that they
>> too are obtained from the buddy pool, and that charging them to memcg
>> should therefore be permitted as a special case.
> 
> Not really. Memcg accounts primarily for reclaimable memory. We do
> account for some non-reclaimable slabs, but their lifetime should be at
> least bound to a process lifetime. Otherwise the memcg oom killer
> behavior is not guaranteed to unclutter the situation. Hugetlb pages are
> simply persistent. Well, to be completely honest, tmpfs pages have a
> similar problem, but lacking swap space for them is kind of a
> configuration bug.

You are absolutely right, but consider mlock(2), for example: can
mlock(2)ed pages be swapped out by reclaim? (In that respect, what is the
difference between mlock(2)ed pages and hugetlb pages?)
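
For comparison, here is a minimal sketch of the mlock(2) case mentioned
above (purely illustrative): the locked pages are excluded from reclaim
for as long as the lock is held, and yet they are charged to the memcg of
the task that faults them in.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 16UL * 1024 * 1024;	/* 16MB of ordinary pages */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	if (mlock(p, len)) {		/* reclaim must skip these pages now */
		perror("mlock");	/* may fail if RLIMIT_MEMLOCK is low */
		return 1;
	}
	memset(p, 0, len);	/* fault the pages in; they are memcg-charged */

	/* ... the pages stay resident until munlock()/munmap()/exit ... */

	munlock(p, len);
	munmap(p, len);
	return 0;
}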

>> It may seem very strange to charge a hugetlb page to memcg, but
>> essentially it only charges the compound page obtained from the buddy
>> pool; even if that page is later used as a hugetlb page, memcg is not
>> interested in that.
> 
> Ohh, it is very much interested. The primary goal of memcg is to enforce
> the limit. How are you going to do that in the absence of reclaimable
> memory? And quite a lot of it, because hugetlb pages usually consume a
> lot of memory.

Simply kill one of the tasks belonging to that memcg. Presumably, nobody
expects reclaim at the moment a surplus hugepage is charged.
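
For reference, a sketch of the setup being discussed, assuming a cgroup-v1
memory controller mounted at the usual path; the "test" group name and the
256MB limit are only illustrative. Once the limit is hit with no
reclaimable memory left, the memcg OOM killer picks a task in the group,
which is the behavior relied on above:

#include <stdio.h>
#include <unistd.h>

/* write a value into a cgroupfs control file */
static int echo(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fputs(val, f);
	return fclose(f);
}

int main(void)
{
	char pid[16];

	/* assumes "test" was already created with mkdir under the
	 * cgroup-v1 memory hierarchy; hard limit of 256MB */
	echo("/sys/fs/cgroup/memory/test/memory.limit_in_bytes", "268435456");

	snprintf(pid, sizeof(pid), "%d", getpid());
	echo("/sys/fs/cgroup/memory/test/tasks", pid);

	/* from here on, this task's charged allocations are bounded by
	 * the limit; exceeding it with nothing reclaimable left invokes
	 * the memcg OOM killer */
	return 0;
}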

[...]
>> I could not understand the intention of this question, sorry. When the
>> pool is resized, I think the number of surplus hugepages in use does not
>> change. Could you explain what you are concerned about?
> 
> It does change when you change the hugetlb pool size or migrate pages
> between per-NUMA pools (have a look at adjust_pool_surplus).

Having looked at it, adjust_pool_surplus merely manipulates statistics
counters, so what kind of fatal problem would charging surplus hugepages
to memcg cause there?
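
For reference, below is a simplified paraphrase of adjust_pool_surplus()
from mm/hugetlb.c of this era (the per-node round-robin selection is
elided, and the struct is reduced to the relevant counters so the sketch
is self-contained). The surplus-to-persistent conversion is nothing more
than this counter update; the page itself does not move anywhere:

/* minimal stand-ins for the kernel types, for illustration only */
#define MAX_NODES 64	/* stand-in for MAX_NUMNODES */

struct hstate {
	unsigned long nr_huge_pages_node[MAX_NODES];
	unsigned long surplus_huge_pages;
	unsigned long surplus_huge_pages_node[MAX_NODES];
};

/*
 * delta is +1 to mark a page surplus (pool shrink) or -1 to make it
 * persistent (pool grow).
 */
static int adjust_pool_surplus_sketch(struct hstate *h, int node, int delta)
{
	if (delta < 0 && !h->surplus_huge_pages_node[node])
		return 0;	/* no surplus page on this node to convert */
	if (delta > 0 &&
	    h->surplus_huge_pages_node[node] >= h->nr_huge_pages_node[node])
		return 0;	/* every page on this node is already surplus */

	h->surplus_huge_pages += delta;
	h->surplus_huge_pages_node[node] += delta;
	return 1;
}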

-- 
Thanks,
Tsukada

Thread overview: 28+ messages
2018-05-18  4:27 [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg TSUKADA Koutaro
2018-05-18  4:29 ` [PATCH v2 1/7] hugetlb: introduce charge_surplus_huge_pages to struct hstate TSUKADA Koutaro
2018-05-18  4:32 ` [PATCH v2 2/7] hugetlb: support migrate charging for surplus hugepages TSUKADA Koutaro
2018-05-18  4:34 ` [PATCH v2 3/7] memcg: use compound_order rather than hpage_nr_pages TSUKADA Koutaro
2018-05-18 17:46   ` Punit Agrawal
2018-05-18 17:51     ` Punit Agrawal
2018-05-21  3:48       ` TSUKADA Koutaro
2018-05-21 14:53         ` Punit Agrawal
2018-05-18  4:36 ` [PATCH v2 4/7] mm, sysctl: make charging surplus hugepages controllable TSUKADA Koutaro
2018-05-18  4:37 ` [PATCH v2 5/7] hugetlb: add charge_surplus_hugepages attribute TSUKADA Koutaro
2018-05-18  4:39 ` [PATCH v2 6/7] Documentation, hugetlb: describe about charge_surplus_hugepages, TSUKADA Koutaro
2018-05-18  4:41 ` [PATCH v2 7/7] memcg: supports movement of surplus hugepages statistics TSUKADA Koutaro
2018-05-21 14:52 ` [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg Punit Agrawal
2018-05-22 12:56   ` TSUKADA Koutaro
2018-05-21 18:07 ` Mike Kravetz
2018-05-22 13:04   ` TSUKADA Koutaro
2018-05-22 18:54     ` Michal Hocko
2018-05-24  4:39       ` TSUKADA Koutaro
2018-05-24  8:20         ` Michal Hocko
2018-05-24 12:58           ` TSUKADA Koutaro [this message]
2018-05-24 13:24             ` Michal Hocko
2018-05-25  1:51               ` TSUKADA Koutaro
2018-05-22 20:28     ` Mike Kravetz
2018-05-22 13:51 ` Michal Hocko
2018-05-24  4:26   ` TSUKADA Koutaro
2018-05-24  8:27     ` Michal Hocko
2018-05-24 17:45     ` Mike Kravetz
2018-05-25  1:55       ` TSUKADA Koutaro
