Linux-kselftest Archive on lore.kernel.org
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Mina Almasry <almasrymina@google.com>
Cc: shuah@kernel.org, rientjes@google.com, shakeelb@google.com,
	gthelen@google.com, akpm@linux-foundation.org,
	khalid.aziz@oracle.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	cgroups@vger.kernel.org, aneesh.kumar@linux.vnet.ibm.com,
	mkoutny@suse.com
Subject: Re: [PATCH v4 4/9] hugetlb: region_chg provides only cache entry
Date: Mon, 16 Sep 2019 15:17:35 -0700
Message-ID: <b2fa24c3-c92f-6834-e6da-b952ac25e5f3@oracle.com>
In-Reply-To: <20190910233146.206080-5-almasrymina@google.com>

On 9/10/19 4:31 PM, Mina Almasry wrote:
> Current behavior is that region_chg provides both a cache entry in
> resv->region_cache, AND a placeholder entry in resv->regions. region_add
> first tries to use the placeholder, and if it finds that the placeholder
> has been deleted by a racing region_del call, it uses the cache entry.
> 
> This behavior is completely unnecessary and is removed in this patch for
> a couple of reasons:
> 
> 1. region_add needs to either find a cached file_region entry in
>    resv->region_cache, or find an entry in resv->regions to expand. It
>    does not need both.
> 2. region_chg adding a placeholder entry in resv->regions opens up
>    a possible race with region_del, where region_chg adds a placeholder
>    region in resv->regions, and this region is deleted by a racing call
>    to region_del during region_chg execution or before region_add is
>    called. Removing the race makes the code easier to reason about and
>    maintain.
> 
> In addition, a follow-up patch in this series disables region
> coalescing, which would be further complicated if the race with
> region_del exists.
> 
> Signed-off-by: Mina Almasry <almasrymina@google.com>

Thanks.  I like this modification as it does simplify the code and could
be added as a general cleanup independent of the other changes.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz

> ---
>  mm/hugetlb.c | 63 +++++++++-------------------------------------------
>  1 file changed, 11 insertions(+), 52 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index fbd7c52e17348..bea51ae422f63 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -246,14 +246,10 @@ struct file_region {
> 
>  /*
>   * Add the huge page range represented by [f, t) to the reserve
> - * map.  In the normal case, existing regions will be expanded
> - * to accommodate the specified range.  Sufficient regions should
> - * exist for expansion due to the previous call to region_chg
> - * with the same range.  However, it is possible that region_del
> - * could have been called after region_chg and modifed the map
> - * in such a way that no region exists to be expanded.  In this
> - * case, pull a region descriptor from the cache associated with
> - * the map and use that for the new range.
> + * map.  Existing regions will be expanded to accommodate the specified
> + * range, or a region will be taken from the cache.  Sufficient regions
> + * must exist in the cache due to the previous call to region_chg with
> + * the same range.
>   *
>   * Return the number of new huge pages added to the map.  This
>   * number is greater than or equal to zero.
> @@ -272,9 +268,8 @@ static long region_add(struct resv_map *resv, long f, long t)
> 
>  	/*
>  	 * If no region exists which can be expanded to include the
> -	 * specified range, the list must have been modified by an
> -	 * interleving call to region_del().  Pull a region descriptor
> -	 * from the cache and use it for this range.
> +	 * specified range, pull a region descriptor from the cache
> +	 * and use it for this range.
>  	 */
>  	if (&rg->link == head || t < rg->from) {
>  		VM_BUG_ON(resv->region_cache_count <= 0);
> @@ -339,15 +334,9 @@ static long region_add(struct resv_map *resv, long f, long t)
>   * call to region_add that will actually modify the reserve
>   * map to add the specified range [f, t).  region_chg does
>   * not change the number of huge pages represented by the
> - * map.  However, if the existing regions in the map can not
> - * be expanded to represent the new range, a new file_region
> - * structure is added to the map as a placeholder.  This is
> - * so that the subsequent region_add call will have all the
> - * regions it needs and will not fail.
> - *
> - * Upon entry, region_chg will also examine the cache of region descriptors
> - * associated with the map.  If there are not enough descriptors cached, one
> - * will be allocated for the in progress add operation.
> + * map.  A new file_region structure is added to the cache
> + * as a placeholder, so that the subsequent region_add
> + * call will have all the regions it needs and will not fail.
>   *
>   * Returns the number of huge pages that need to be added to the existing
>   * reservation map for the range [f, t).  This number is greater or equal to
> @@ -357,10 +346,9 @@ static long region_add(struct resv_map *resv, long f, long t)
>  static long region_chg(struct resv_map *resv, long f, long t)
>  {
>  	struct list_head *head = &resv->regions;
> -	struct file_region *rg, *nrg = NULL;
> +	struct file_region *rg;
>  	long chg = 0;
> 
> -retry:
>  	spin_lock(&resv->lock);
>  retry_locked:
>  	resv->adds_in_progress++;
> @@ -378,10 +366,8 @@ static long region_chg(struct resv_map *resv, long f, long t)
>  		spin_unlock(&resv->lock);
> 
>  		trg = kmalloc(sizeof(*trg), GFP_KERNEL);
> -		if (!trg) {
> -			kfree(nrg);
> +		if (!trg)
>  			return -ENOMEM;
> -		}
> 
>  		spin_lock(&resv->lock);
>  		list_add(&trg->link, &resv->region_cache);
> @@ -394,28 +380,6 @@ static long region_chg(struct resv_map *resv, long f, long t)
>  		if (f <= rg->to)
>  			break;
> 
> -	/* If we are below the current region then a new region is required.
> -	 * Subtle, allocate a new region at the position but make it zero
> -	 * size such that we can guarantee to record the reservation. */
> -	if (&rg->link == head || t < rg->from) {
> -		if (!nrg) {
> -			resv->adds_in_progress--;
> -			spin_unlock(&resv->lock);
> -			nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> -			if (!nrg)
> -				return -ENOMEM;
> -
> -			nrg->from = f;
> -			nrg->to   = f;
> -			INIT_LIST_HEAD(&nrg->link);
> -			goto retry;
> -		}
> -
> -		list_add(&nrg->link, rg->link.prev);
> -		chg = t - f;
> -		goto out_nrg;
> -	}
> -
>  	/* Round our left edge to the current segment if it encloses us. */
>  	if (f > rg->from)
>  		f = rg->from;
> @@ -439,11 +403,6 @@ static long region_chg(struct resv_map *resv, long f, long t)
>  	}
> 
>  out:
> -	spin_unlock(&resv->lock);
> -	/*  We already know we raced and no longer need the new region */
> -	kfree(nrg);
> -	return chg;
> -out_nrg:
>  	spin_unlock(&resv->lock);
>  	return chg;
>  }
> --
> 2.23.0.162.g0b9fbb3734-goog
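
For readers following the thread, the two-phase scheme the patch settles on
(region_chg only guarantees a cached descriptor, region_add consumes it) can
be sketched in user space. This is a minimal model under stated assumptions,
not the kernel code: the struct layouts and the singly linked lists are
hypothetical, resv->lock is omitted, -1 stands in for -ENOMEM, and region_add
here always takes a descriptor from the cache and merges neighbours into it
instead of expanding an existing region in place.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model; names mirror the patch, layouts do not. */
struct region {
	long from, to;          /* half-open range [from, to) */
	struct region *next;
};

struct resv_map {
	struct region *regions; /* sorted, non-overlapping */
	struct region *cache;   /* preallocated descriptors */
	long cache_count;
	long adds_in_progress;
};

/* Phase 1: count pages of [f, t) not yet in the map and make sure one
 * cached descriptor exists for the add that follows.  The map itself
 * is not modified, so there is no placeholder for anyone to delete. */
static long region_chg(struct resv_map *resv, long f, long t)
{
	long chg = t - f;
	struct region *rg;

	if (resv->cache_count <= resv->adds_in_progress) {
		rg = malloc(sizeof(*rg));
		if (!rg)
			return -1;      /* stands in for -ENOMEM */
		rg->next = resv->cache;
		resv->cache = rg;
		resv->cache_count++;
	}
	resv->adds_in_progress++;

	for (rg = resv->regions; rg; rg = rg->next) {
		long lo = rg->from > f ? rg->from : f;
		long hi = rg->to < t ? rg->to : t;

		if (lo < hi)
			chg -= hi - lo; /* already reserved */
	}
	return chg;
}

/* Phase 2: cannot fail, because phase 1 left a descriptor in the cache. */
static long region_add(struct resv_map *resv, long f, long t)
{
	struct region **pp = &resv->regions, *rg, *nrg;
	long added = t - f;

	assert(resv->cache_count > 0);  /* plays the role of VM_BUG_ON */
	nrg = resv->cache;
	resv->cache = nrg->next;
	resv->cache_count--;
	resv->adds_in_progress--;
	nrg->from = f;
	nrg->to = t;

	/* Skip regions that end strictly before the new range. */
	while (*pp && (*pp)->to < f)
		pp = &(*pp)->next;

	/* Absorb every region that overlaps or touches [f, t). */
	while ((rg = *pp) && rg->from <= t) {
		long lo = rg->from > f ? rg->from : f;
		long hi = rg->to < t ? rg->to : t;

		if (lo < hi)
			added -= hi - lo;
		if (rg->from < nrg->from)
			nrg->from = rg->from;
		if (rg->to > nrg->to)
			nrg->to = rg->to;
		*pp = rg->next;
		free(rg);
	}
	nrg->next = *pp;
	*pp = nrg;
	return added;
}
```

In this model, calling region_chg and then region_add for [0, 4) on an empty
map reports 4 new pages each time; a following pair for [2, 6) reports 2 and
leaves a single region [0, 6).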

Thread overview: 19+ messages
2019-09-10 23:31 [PATCH v4 0/9] hugetlb_cgroup: Add hugetlb_cgroup reservation limits Mina Almasry
2019-09-10 23:31 ` [PATCH v4 1/9] hugetlb_cgroup: Add hugetlb_cgroup reservation counter Mina Almasry
2019-09-16 23:43   ` shuah
2019-09-10 23:31 ` [PATCH v4 2/9] hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations Mina Almasry
2019-09-17  1:29   ` shuah
2019-09-10 23:31 ` [PATCH v4 3/9] hugetlb_cgroup: add reservation accounting for private mappings Mina Almasry
2019-09-10 23:31 ` [PATCH v4 4/9] hugetlb: region_chg provides only cache entry Mina Almasry
2019-09-16 22:17   ` Mike Kravetz [this message]
2019-09-10 23:31 ` [PATCH v4 5/9] hugetlb: remove duplicated code Mina Almasry
2019-09-16 22:25   ` Mike Kravetz
2019-09-10 23:31 ` [PATCH v4 6/9] hugetlb: disable region_add file_region coalescing Mina Almasry
2019-09-16 23:57   ` Mike Kravetz
2019-09-17  0:16     ` Mina Almasry
2019-09-10 23:31 ` [PATCH v4 7/9] hugetlb_cgroup: add accounting for shared mappings Mina Almasry
2019-09-10 23:31 ` [PATCH v4 8/9] hugetlb_cgroup: Add hugetlb_cgroup reservation tests Mina Almasry
2019-09-17  1:52   ` shuah
2019-09-19  1:53     ` Mina Almasry
2019-09-10 23:31 ` [PATCH v4 9/9] hugetlb_cgroup: Add hugetlb_cgroup reservation docs Mina Almasry
2019-09-17  1:58   ` shuah
