linux-mm.kvack.org archive mirror
From: Mina Almasry <almasrymina@google.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Michal Hocko" <mhocko@kernel.org>, shuah <shuah@kernel.org>,
	"David Rientjes" <rientjes@google.com>,
	"Shakeel Butt" <shakeelb@google.com>,
	"Greg Thelen" <gthelen@google.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	khalid.aziz@oracle.com,
	"open list" <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	cgroups@vger.kernel.org,
	"Aneesh Kumar" <aneesh.kumar@linux.vnet.ibm.com>,
	"Michal Koutný" <mkoutny@suse.com>, "Tejun Heo" <tj@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Li Zefan" <lizefan@huawei.com>
Subject: Re: [PATCH v3 0/6] hugetlb_cgroup: Add hugetlb_cgroup reservation limits
Date: Thu, 5 Sep 2019 13:07:30 -0700
Message-ID: <CAHS8izMCA9+sY+dxHxuFgANCLD2oNznPqGYvi1+C2xOkv=7EYw@mail.gmail.com>
In-Reply-To: <e7f91a50-5957-249c-8756-25ea87c77fc4@oracle.com>

On Tue, Sep 3, 2019 at 4:46 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 9/3/19 10:57 AM, Mike Kravetz wrote:
> > On 8/29/19 12:18 AM, Michal Hocko wrote:
> >> [Cc cgroups maintainers]
> >>
> >> On Wed 28-08-19 10:58:00, Mina Almasry wrote:
> >>> On Wed, Aug 28, 2019 at 4:23 AM Michal Hocko <mhocko@kernel.org> wrote:
> >>>>
> >>>> On Mon 26-08-19 16:32:34, Mina Almasry wrote:
> >>>>>  mm/hugetlb.c                                  | 493 ++++++++++++------
> >>>>>  mm/hugetlb_cgroup.c                           | 187 +++++--
> >>>>
> >>>> This is a lot of changes to already subtle code, which hugetlb
> >>>> reservations undoubtedly are.
> >>>
> >>> For what it's worth, I think this patch series is a net decrease in
> >>> the complexity of the reservation code, especially the region_*
> >>> functions, which is where a lot of the complexity lies. I removed the
> >>> race between region_del and region_{add|chg}, refactored the main
> >>> logic into smaller functions, moved common code to helpers and deleted
> >>> the duplicates, and finally added lots of comments to the
> >>> hard-to-understand pieces. I hope that when folks review the changes
> >>> they will see that! :)
> >>
> >> Post those improvements as standalone patches and sell them as
> >> improvements. We can talk about the net additional complexity of the
> >> controller much easier then.
> >
> > All such changes appear to be in patch 4 of this series.  The commit message
> > says "region_add() and region_chg() are heavily refactored in this commit
> > to make the code easier to understand and remove duplication.".  However, the
> > modifications were also added to accommodate the new cgroup reservation
> > accounting.  I think it would be helpful to explain why the existing code does
> > not work with the new accounting.  For example, one change is because
> > "existing code coalesces resv_map entries for shared mappings.  New cgroup
> > accounting requires that resv_map entries be kept separate for proper
> > uncharging."
> >
> > I am starting to review the changes, but it would help if there was a high
> > level description.  I also like Michal's idea of calling out the region_*
> > changes separately.  If not a standalone patch, at least the first patch of
> > the series.  This new code will be exercised even if cgroup reservation
> > accounting is not enabled, so it is very important that no subtle
> > regressions be introduced.
>
> While looking at the region_* changes, I started thinking about this no-
> coalesce change for shared mappings, which I think is necessary.  Am I
> mistaken, or is this a requirement?
>

Not coalescing is a requirement, yes. The idea is that task A can reserve
range [0-1], and task B can reserve range [1-2]. We want the code to
create 2 regions:

1. [0-1], with cgroup information that points to task A's cgroup.
2. [1-2], with cgroup information that points to task B's cgroup.

If the regions are coalesced, you end up with one region [0-2] that
carries the cgroup information of only one of those cgroups, and one of
the cgroups is uncharged incorrectly when the reservation is freed.
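To make the uncharging problem concrete, here is a minimal userspace sketch. It assumes each region is a half-open page range (so [0-1] is one page) tagged with the id of the cgroup that was charged for it; struct cg_region and uncharge_regions() are hypothetical names for illustration, not the kernel's struct file_region or its uncharge path.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of one reservation region: the half-open
 * page range [from, to) plus the id of the cgroup that was charged for it.
 * Illustration only; this is not the kernel's struct file_region.
 */
struct cg_region {
	long from;
	long to;
	int cgroup_id;
};

/*
 * Free every region in regions[0..n): subtract each region's page count
 * from the charge of the cgroup recorded in that region. charges[] is
 * indexed by cgroup id.
 */
static void uncharge_regions(const struct cg_region *regions, size_t n,
			     long *charges)
{
	for (size_t i = 0; i < n; i++) {
		assert(regions[i].from < regions[i].to);
		charges[regions[i].cgroup_id] -= regions[i].to - regions[i].from;
	}
}
```

With A as cgroup 0 and B as cgroup 1, each charged one page, freeing the two separate regions {0, 1, 0} and {1, 2, 1} brings both charges back to zero. Freeing a single coalesced region {0, 2, 0} instead uncharges two pages from A and none from B.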

Technically, we can still coalesce when the cgroup information is the
same, and I can do that, but the region_* code becomes more
complicated, and you mentioned on an earlier patchset that you were
concerned about how complicated the region_* functions already are.
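The "coalesce only when the cgroup information matches" rule can be expressed as a small predicate. This is a sketch under the same assumptions as a half-open [from, to) range tagged with a cgroup id; struct cg_region, can_coalesce() and try_coalesce() are hypothetical helpers, not functions from the patch series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical region: half-open page range [from, to) and owning cgroup. */
struct cg_region {
	long from;
	long to;
	int cgroup_id;
};

/* Two regions may be merged only if they touch and share the same cgroup. */
static bool can_coalesce(const struct cg_region *a, const struct cg_region *b)
{
	return a->to == b->from && a->cgroup_id == b->cgroup_id;
}

/* Extend a to cover b when allowed; returns true if the merge happened. */
static bool try_coalesce(struct cg_region *a, const struct cg_region *b)
{
	if (!can_coalesce(a, b))
		return false;
	a->to = b->to;
	return true;
}
```

The extra check is cheap by itself; the complexity Mina mentions comes from applying it at every point where region_* code currently merges or splits entries.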

> If it is a requirement, then think about some of the possible scenarios
> such as:
> - There is a hugetlbfs file of size 10 huge pages.
> - Task A has reservations for pages at offset 1 3 5 7 and 9
> - Task B then mmaps the entire file which should result in reservations
>   at 0 2 4 6 and 8.
> - region_chg will return 5, but will also need to allocate 5 resv_map
>   entries for the subsequent region_add, which cannot fail.  Correct?
>   The code does not appear to handle this.
>
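The arithmetic in Mike's scenario can be checked with a small sketch: walk the existing reservations and count both the unreserved pages in the requested range and the number of separate gaps they form. Without coalescing, each gap needs its own new region entry. struct range and count_gaps() are hypothetical names; the existing reservations are assumed sorted by start and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical half-open page range [from, to). */
struct range {
	long from;
	long to;
};

/*
 * Given existing reservations resv[0..n), sorted and non-overlapping,
 * count the pages in [f, t) that are not yet reserved (*pages) and the
 * number of separate gaps they form (*entries). Without coalescing, each
 * gap needs one new region entry.
 */
static void count_gaps(const struct range *resv, size_t n, long f, long t,
		       long *pages, long *entries)
{
	long cursor = f;	/* first page not yet accounted for */

	assert(f <= t);
	*pages = 0;
	*entries = 0;
	for (size_t i = 0; i < n && cursor < t; i++) {
		if (resv[i].to <= cursor)
			continue;	/* entirely before the cursor */
		if (resv[i].from >= t)
			break;		/* entirely past the requested range */
		if (resv[i].from > cursor) {
			*pages += resv[i].from - cursor;
			(*entries)++;
		}
		cursor = resv[i].to;
	}
	if (cursor < t) {
		*pages += t - cursor;
		(*entries)++;
	}
}
```

For existing reservations at pages 1, 3, 5, 7 and 9 and a request for [0, 10), this reports 5 pages and 5 entries, matching the "return 5, allocate 5 entries" expectation above.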

I thought the code did handle this. region_chg calls
allocate_enough_cache_for_range_and_lock(), which in this scenario
will put 5 entries in resv_map->region_cache. region_add will use
these 5 region_cache entries to record the new regions.
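The chg/add split described here is a two-phase pattern: the first step preallocates everything that may be needed and is the only step allowed to fail, and the second step only consumes what was preallocated. A minimal userspace sketch follows; struct entry_cache, cache_reserve(), cache_take() and cache_destroy() are hypothetical stand-ins for resv_map->region_cache and allocate_enough_cache_for_range_and_lock(), and the locking and list_head plumbing of the real code are omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical preallocated region entry and per-map cache of them. */
struct entry {
	long from;
	long to;
	struct entry *next;
};

struct entry_cache {
	struct entry *head;
	long count;
};

/* "chg" step: ensure at least 'needed' entries are cached. May fail. */
static int cache_reserve(struct entry_cache *c, long needed)
{
	while (c->count < needed) {
		struct entry *e = malloc(sizeof(*e));

		if (!e)
			return -1;	/* caller must back out; nothing was added */
		e->next = c->head;
		c->head = e;
		c->count++;
	}
	return 0;
}

/* "add" step: take one preallocated entry. Must not fail. */
static struct entry *cache_take(struct entry_cache *c)
{
	struct entry *e = c->head;

	assert(e != NULL);	/* an empty cache means chg under-reserved */
	c->head = e->next;
	c->count--;
	return e;
}

/* Free any entries left in the cache. */
static void cache_destroy(struct entry_cache *c)
{
	while (c->head) {
		struct entry *e = c->head;

		c->head = e->next;
		free(e);
	}
	c->count = 0;
}
```

In the 10-page scenario, the chg step would call cache_reserve(cache, 5) and the add step would call cache_take() once per gap, so the add step never needs to allocate.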

I'll add a test in my suite to test this case to make sure.

> BTW, this series will BUG when running libhugetlbfs test suite.  It will
> hit this in resv_map_release().
>
>         VM_BUG_ON(resv_map->adds_in_progress);
>
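For context on the check Mike quotes, here is a minimal model of the bookkeeping it enforces. It assumes that region_chg() increments resv_map->adds_in_progress and that each region_chg() is later matched by exactly one region_add() or region_abort(), which decrements it; struct resv_map_model and the model_* helpers are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one resv_map field involved. */
struct resv_map_model {
	long adds_in_progress;
};

/* A chg opens a pending add... */
static void model_chg(struct resv_map_model *m) { m->adds_in_progress++; }

/* ...which must be closed by exactly one add or one abort. */
static void model_add(struct resv_map_model *m) { m->adds_in_progress--; }
static void model_abort(struct resv_map_model *m) { m->adds_in_progress--; }

/* Mirrors VM_BUG_ON(resv_map->adds_in_progress) at release time. */
static bool model_release_ok(const struct resv_map_model *m)
{
	return m->adds_in_progress == 0;
}
```

Under this model, the BUG fires when some path performs a chg without a matching add or abort (or closes one twice), so an unbalanced pairing introduced by the refactor would be one place to look.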

Sorry about that, I've been having trouble running the libhugetlbfs
tests, but I'm still on it. I'll get to the bottom of this by the next
patch series.

> --
> Mike Kravetz


