All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Mina Almasry <almasrymina@google.com>
Cc: shuah@kernel.org, rientjes@google.com, shakeelb@google.com,
	gthelen@google.com, akpm@linux-foundation.org,
	khalid.aziz@oracle.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	cgroups@vger.kernel.org, aneesh.kumar@linux.vnet.ibm.com,
	mkoutny@suse.com
Subject: Re: [PATCH v6 5/9] hugetlb: disable region_add file_region coalescing
Date: Mon, 21 Oct 2019 12:02:46 -0700	[thread overview]
Message-ID: <982efbbc-f795-5819-83a8-7d328537e070@oracle.com> (raw)
In-Reply-To: <20191013003024.215429-5-almasrymina@google.com>

On 10/12/19 5:30 PM, Mina Almasry wrote:
> A follow up patch in this series adds hugetlb cgroup uncharge info the
> file_region entries in resv->regions. The cgroup uncharge info may
> differ for different regions, so they can no longer be coalesced at
> region_add time. So, disable region coalescing in region_add in this
> patch.
> 
> Behavior change:
> 
> Say a resv_map exists like this [0->1], [2->3], and [5->6].
> 
> Then a region_chg/add call comes in region_chg/add(f=0, t=5).
> 
> Old code would generate resv->regions: [0->5], [5->6].
> New code would generate resv->regions: [0->1], [1->2], [2->3], [3->5],
> [5->6].
> 
> Special care needs to be taken to handle the resv->adds_in_progress
> variable correctly. In the past, only 1 region would be added for every
> region_chg and region_add call. But now, each call may add multiple
> regions, so we can no longer increment adds_in_progress by 1 in region_chg,
> or decrement adds_in_progress by 1 after region_add or region_abort. Instead,
> region_chg calls add_reservation_in_range() to count the number of regions
> needed and allocates those, and that info is passed to region_add and
> region_abort to decrement adds_in_progress correctly.
> 
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> 
> ---
> 
> Changes in v6:
> - Fix bug in number of region_caches allocated by region_chg
> 
> ---
>  mm/hugetlb.c | 256 +++++++++++++++++++++++++++++----------------------
>  1 file changed, 147 insertions(+), 109 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4a60d7d44b4c3..f9c1947925bb9 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
<snip>
> -static long region_chg(struct resv_map *resv, long f, long t)
> +static long region_chg(struct resv_map *resv, long f, long t,
> +		       long *out_regions_needed)
>  {
> +	struct file_region *trg = NULL;
>  	long chg = 0;
> 
> +	/* Allocate the maximum number of regions we're going to need for this
> +	 * reservation. The maximum number of regions we're going to need is
> +	 * (t - f) / 2 + 1, which corresponds to a region with alternating
> +	 * reserved and unreserved pages.
> +	 */
> +	*out_regions_needed = (t - f) / 2 + 1;
> +
>  	spin_lock(&resv->lock);
> -retry_locked:
> -	resv->adds_in_progress++;
> +
> +	resv->adds_in_progress += *out_regions_needed;
> 
>  	/*
>  	 * Check for sufficient descriptors in the cache to accommodate
>  	 * the number of in progress add operations.
>  	 */
> -	if (resv->adds_in_progress > resv->region_cache_count) {
> -		struct file_region *trg;
> -
> -		VM_BUG_ON(resv->adds_in_progress - resv->region_cache_count > 1);
> +	while (resv->region_cache_count < resv->adds_in_progress) {
>  		/* Must drop lock to allocate a new descriptor. */
> -		resv->adds_in_progress--;
>  		spin_unlock(&resv->lock);
> -
>  		trg = kmalloc(sizeof(*trg), GFP_KERNEL);
>  		if (!trg)
>  			return -ENOMEM;
> @@ -393,9 +395,9 @@ static long region_chg(struct resv_map *resv, long f, long t)
>  		spin_lock(&resv->lock);
>  		list_add(&trg->link, &resv->region_cache);
>  		resv->region_cache_count++;
> -		goto retry_locked;
>  	}


I know that I suggested allocating the worst case number of entries, but this
is going to be too much of a hit for existing hugetlbfs users.  It is not
uncommon for DBs to have a shared areas in excess of 1TB mapped by hugetlbfs.
With this new scheme, the above while loop will allocate over a half million
file region entries and end up only using one.

I think we need to step back and come up with a different approach.  Let me
give it some more thought before throwing out ideas that may waste more of
your time.  Sorry.
-- 
Mike Kravetz

  reply	other threads:[~2019-10-21 19:03 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-13  0:30 [PATCH v6 1/9] hugetlb_cgroup: Add hugetlb_cgroup reservation counter Mina Almasry
2019-10-13  0:30 ` Mina Almasry
2019-10-13  0:30 ` [PATCH v6 2/9] hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations Mina Almasry
2019-10-13  0:30   ` Mina Almasry
2019-10-13  0:30 ` [PATCH v6 3/9] hugetlb_cgroup: add cgroup-v2 support Mina Almasry
2019-10-13  0:30   ` Mina Almasry
2019-10-13  0:30 ` [PATCH v6 4/9] hugetlb_cgroup: add reservation accounting for private mappings Mina Almasry
2019-10-13  0:30   ` Mina Almasry
2019-10-13  0:30 ` [PATCH v6 5/9] hugetlb: disable region_add file_region coalescing Mina Almasry
2019-10-13  0:30   ` Mina Almasry
2019-10-21 19:02   ` Mike Kravetz [this message]
2019-10-21 19:29     ` Mina Almasry
2019-10-13  0:30 ` [PATCH v6 6/9] hugetlb_cgroup: add accounting for shared mappings Mina Almasry
2019-10-13  0:30   ` Mina Almasry
2019-10-13  0:30 ` [PATCH v6 7/9] hugetlb_cgroup: support noreserve mappings Mina Almasry
2019-10-13  0:30   ` Mina Almasry
2019-10-13  0:30 ` [PATCH v6 8/9] hugetlb_cgroup: Add hugetlb_cgroup reservation tests Mina Almasry
2019-10-13  0:30   ` Mina Almasry
2019-10-13  0:30 ` [PATCH v6 9/9] hugetlb_cgroup: Add hugetlb_cgroup reservation docs Mina Almasry
2019-10-13  0:30   ` Mina Almasry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=982efbbc-f795-5819-83a8-7d328537e070@oracle.com \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=cgroups@vger.kernel.org \
    --cc=gthelen@google.com \
    --cc=khalid.aziz@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mkoutny@suse.com \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.