linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	David Hildenbrand <david@redhat.com>,
	Michal Hocko <mhocko@suse.com>, Zi Yan <ziy@nvidia.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	David Rientjes <rientjes@google.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v3 3/5] hugetlb: be sure to free demoted CMA pages to CMA
Date: Tue, 5 Oct 2021 11:57:54 -0700	[thread overview]
Message-ID: <f1f011cc-5c5a-7c4f-5701-929918fa2dbb@oracle.com> (raw)
In-Reply-To: <20211005093320.GC20412@linux>

On 10/5/21 2:33 AM, Oscar Salvador wrote:
> On Fri, Oct 01, 2021 at 10:52:08AM -0700, Mike Kravetz wrote:
>> When huge page demotion is fully implemented, gigantic pages can be
>> demoted to a smaller huge page size.  For example, on x86 a 1G page
>> can be demoted to 512 2M pages.  However, gigantic pages can potentially
>> be allocated from CMA.  If a gigantic page which was allocated from CMA
>> is demoted, the corresponding demoted pages needs to be returned to CMA.
>>
>> Use the new interface cma_pages_valid() to determine if a non-gigantic
>> hugetlb page should be freed to CMA.  Also, clear mapping field of these
>> pages as expected by cma_release.
>>
>> This also requires a change to CMA reservations for gigantic pages.
>> Currently, the 'order_per_bit' is set to the gigantic page size.
>> However, if gigantic pages can be demoted this needs to be set to the
>> order of the smallest huge page.  At CMA reservation time we do not know
> 
> to the smallest, or to the next smaller? Would you mind elaborating why?
> 

It is the smallest.

CMA uses a per-region bit map to track allocations.  When setting up the
region, you specify how many pages each bit represents.  Currently,
only gigantic pages are allocated/freed from CMA so the region is set up
such that one bit represents a gigantic page size allocation.

With demote, a gigantic page (allocation) could be split into smaller
size pages.  And, these smaller size pages will be freed to CMA.  So,
since the per-region bit map needs to be set up to represent the smallest
allocation/free size, it now needs to be set to the smallest huge page
size which can be freed to CMA.

Unfortunately, we set up the CMA region for huge pages before we set up
huge pages sizes (hstates).  So, technically we do not know the smallest
huge page size as this can change via command line options and
architecture specific code.  Therefore, at region setup time we need some
constant value for smallest possible huge page size.  That is why
HUGETLB_PAGE_ORDER is used.

I should probably add all that to the changelog for clarity?

>> @@ -3003,7 +3020,8 @@ static void __init hugetlb_init_hstates(void)
>>  		 *   is not supported.
>>  		 */
>>  		if (!hstate_is_gigantic(h) ||
>> -		    gigantic_page_runtime_supported()) {
>> +		    gigantic_page_runtime_supported() ||
>> +		    !hugetlb_cma_size || !(h->order <= HUGETLB_PAGE_ORDER)) {
> 
> I am bit lost in the CMA area, so bear with me.
> We do not allow to demote if we specify we want hugetlb pages from the CMA?

We limit the size of the order which can be demoted if hugetlb pages can
be allocated from CMA.

> Also, can h->order be smaller than HUGETLB_PAGE_ORDER? I though
> HUGETLB_PAGE_ORDER was the smallest one.

Nope,
arm64 with 64K PAGE_SIZE is one example.  huge page sizes/orders are:
CONT_PMD_SHIFT	34	16.0 GiB
PMD_SHIFT	29	512 MiB
CONT_PTE_SHIFT	21	2.00 MiB

#define HPAGE_SHIFT	PMD_SHIFT
#define	HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)

So, HUGETLB_PAGE_ORDER is associated with the 512 MiB huge page size,
but there is also a (smaller) 2.00 MiB huge page size.

After your comment yesterday about rewriting this code for clarity,  this
now becomes:

		/*
		 * Set demote order for each hstate.  Note that
		 * h->demote_order is initially 0.
		 * - We can not demote gigantic pages if runtime freeing
		 *   is not supported, so skip this.
		 * - If CMA allocation is possible, we can not demote
		 *   HUGETLB_PAGE_ORDER or smaller size pages.
		 */
		if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
			continue;
		if (hugetlb_cma_size && h->order <= HUGETLB_PAGE_ORDER)
			continue;
		for_each_hstate(h2) {
			if (h2 == h)
				continue;
			if (h2->order < h->order &&
			    h2->order > h->demote_order)
				h->demote_order = h2->order;
		}

Hopefully, that is more clear.

> 
> The check for HUGETLB_PAGE_ORDER can probably be squashed into patch#1.
> 
> 
>>  			for_each_hstate(h2) {
>>  				if (h2 == h)
>>  					continue;
>> @@ -3555,6 +3573,8 @@ static ssize_t demote_size_store(struct kobject *kobj,
>>  	if (!t_hstate)
>>  		return -EINVAL;
>>  	demote_order = t_hstate->order;
>> +	if (demote_order < HUGETLB_PAGE_ORDER)
>> +		return -EINVAL;
> 
> This could probably go in the first patch.
> 
> 

Both of the above HUGETLB_PAGE_ORDER checks 'could' go into the first
patch.  However, the code which actually makes them necessary is in this
patch.  I would prefer to leave them together here.
-- 
Mike Kravetz


  reply	other threads:[~2021-10-05 18:58 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-01 17:52 [PATCH v3 0/5] hugetlb: add demote/split page functionality Mike Kravetz
2021-10-01 17:52 ` [PATCH v3 1/5] hugetlb: add demote hugetlb page sysfs interfaces Mike Kravetz
2021-10-04 13:00   ` Oscar Salvador
2021-10-04 18:27     ` Mike Kravetz
2021-10-05  8:23   ` Oscar Salvador
2021-10-05 16:58     ` Mike Kravetz
2021-10-01 17:52 ` [PATCH v3 2/5] mm/cma: add cma_pages_valid to determine if pages are in CMA Mike Kravetz
2021-10-05  8:45   ` Oscar Salvador
2021-10-05 17:06     ` Mike Kravetz
2021-10-05  8:48   ` David Hildenbrand
2021-10-01 17:52 ` [PATCH v3 3/5] hugetlb: be sure to free demoted CMA pages to CMA Mike Kravetz
2021-10-05  9:33   ` Oscar Salvador
2021-10-05 18:57     ` Mike Kravetz [this message]
2021-10-06  7:54       ` Oscar Salvador
2021-10-06 18:27         ` Mike Kravetz
2021-10-01 17:52 ` [PATCH v3 4/5] hugetlb: add demote bool to gigantic page routines Mike Kravetz
2021-10-01 17:52 ` [PATCH v3 5/5] hugetlb: add hugetlb demote page support Mike Kravetz
2021-10-06  8:41   ` Oscar Salvador
2021-10-06 18:52     ` Mike Kravetz
2021-10-07  7:52       ` Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f1f011cc-5c5a-7c4f-5701-929918fa2dbb@oracle.com \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=osalvador@suse.de \
    --cc=rientjes@google.com \
    --cc=songmuchun@bytedance.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).