Linux-kselftest Archive on lore.kernel.org
 help / color / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Mina Almasry <almasrymina@google.com>, aneesh.kumar@linux.vnet.ibm.com
Cc: shuah@kernel.org, rientjes@google.com, shakeelb@google.com,
	gthelen@google.com, akpm@linux-foundation.org,
	khalid.aziz@oracle.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	cgroups@vger.kernel.org, mkoutny@suse.com
Subject: Re: [PATCH v5 0/7] hugetlb_cgroup: Add hugetlb_cgroup reservation limits
Date: Mon, 23 Sep 2019 10:47:30 -0700
Message-ID: <3c73d2b7-f8d0-16bf-b0f0-86673c3e9ce3@oracle.com> (raw)
In-Reply-To: <20190919222421.27408-1-almasrymina@google.com>

On 9/19/19 3:24 PM, Mina Almasry wrote:
> Patch series implements hugetlb_cgroup reservation usage and limits, which
> track hugetlb reservations rather than hugetlb memory faulted in. Details of
> the approach is 1/7.

Thanks for your continued efforts Mina.

One thing that has bothered me with this approach from the beginning is that
hugetlb reservations are related to, but somewhat distinct from hugetlb
allocations.  The original (existing) huegtlb cgroup implementation does not
take reservations into account.  This is an issue you are trying to address
by adding a cgroup support for hugetlb reservations.  However, this new
reservation cgroup ignores hugetlb allocations at fault time.

I 'think' the whole purpose of any hugetlb cgroup is to manage the allocation
of hugetlb pages.  Both the existing cgroup code and the reservation approach
have what I think are some serious flaws.  Consider a system with 100 hugetlb
pages available.  A sysadmin, has two groups A and B and wants to limit hugetlb
usage to 50 pages each.

With the existing implementation, a task in group A could create a mmap of
100 pages in size and reserve all 100 pages.  Since the pages are 'reserved',
nobody in group B can allocate ANY huge pages.  This is true even though
no pages have been allocated in A (or B).

With the reservation implementation, a task in group A could use MAP_NORESERVE
and allocate all 100 pages without taking any reservations.

As mentioned in your documentation, it would be possible to use both the
existing (allocation) and new reservation cgroups together.  Perhaps if both
are setup for the 50/50 split things would work a little better.

However, instead of creating a new reservation crgoup how about adding
reservation support to the existing allocation cgroup support.  One could
even argue that a reservation is an allocation as it sets aside huge pages
that can only be used for a specific purpose.  Here is something that
may work.

Starting with the existing allocation cgroup.
- When hugetlb pages are reserved, the cgroup of the task making the
  reservations is charged.  Tracking for the charged cgroup is done in the
  reservation map in the same way proposed by this patch set.
- At page fault time,
  - If a reservation already exists for that specific area do not charge the
    faulting task.  No tracking in page, just the reservation map.
  - If no reservation exists, charge the group of the faulting task.  Tracking
    of this information is in the page itself as implemented today.
- When the hugetlb object is removed, compare the reservation map with any
  allocated pages.  If cgroup tracking information exists in page, uncharge
  that group.  Otherwise, unharge the group (if any) in the reservation map.

One of the advantages of a separate reservation cgroup is that the existing
code is unmodified.  Combining the two provides a more complete/accurate
solution IMO.  But, it has the potential to break existing users.

I really would like to get feedback from anyone that knows how the existing
hugetlb cgroup controller may be used today.  Comments from Aneesh would
be very welcome to know if reservations were considered in development of the
existing code.
-- 
Mike Kravetz

  parent reply index

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-19 22:24 Mina Almasry
2019-09-19 22:24 ` [PATCH v5 1/7] hugetlb_cgroup: Add hugetlb_cgroup reservation counter Mina Almasry
2019-09-19 22:24 ` [PATCH v5 2/7] hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations Mina Almasry
2019-09-19 22:24 ` [PATCH v5 3/7] hugetlb_cgroup: add reservation accounting for private mappings Mina Almasry
2019-09-19 22:24 ` [PATCH v5 4/7] hugetlb: disable region_add file_region coalescing Mina Almasry
2019-09-27 21:44   ` Mike Kravetz
2019-09-27 22:33     ` Mina Almasry
2019-09-19 22:24 ` [PATCH v5 5/7] hugetlb_cgroup: add accounting for shared mappings Mina Almasry
2019-09-19 22:24 ` [PATCH v5 6/7] hugetlb_cgroup: Add hugetlb_cgroup reservation tests Mina Almasry
2019-09-19 22:24 ` [PATCH v5 7/7] hugetlb_cgroup: Add hugetlb_cgroup reservation docs Mina Almasry
2019-09-23 17:47 ` Mike Kravetz [this message]
2019-09-23 19:18   ` [PATCH v5 0/7] hugetlb_cgroup: Add hugetlb_cgroup reservation limits Mina Almasry
2019-09-23 21:27     ` Mike Kravetz
2019-09-24 22:42       ` Mina Almasry
2019-09-26 19:28         ` David Rientjes
2019-09-26 21:23           ` Mike Kravetz
2019-09-27  0:55             ` Mina Almasry
2019-09-27 21:59               ` Mike Kravetz
2019-09-27 22:51                 ` Mina Almasry
2019-09-27 22:56                   ` Mike Kravetz
2019-09-30 15:12               ` Michal Koutný
2019-10-11 19:10   ` Mina Almasry
2019-10-11 20:41     ` Mina Almasry
2019-10-14 17:33       ` Mike Kravetz
2019-10-14 18:01         ` Mina Almasry

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3c73d2b7-f8d0-16bf-b0f0-86673c3e9ce3@oracle.com \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=cgroups@vger.kernel.org \
    --cc=gthelen@google.com \
    --cc=khalid.aziz@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mkoutny@suse.com \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-kselftest Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-kselftest/0 linux-kselftest/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-kselftest linux-kselftest/ https://lore.kernel.org/linux-kselftest \
		linux-kselftest@vger.kernel.org
	public-inbox-index linux-kselftest

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kselftest


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git