linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	David Hildenbrand <david@redhat.com>,
	Michal Hocko <mhocko@suse.com>,
	Oscar Salvador <osalvador@suse.de>, Zi Yan <ziy@nvidia.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	David Rientjes <rientjes@google.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Subject: Re: [PATCH v2 1/4] hugetlb: add demote hugetlb page sysfs interfaces
Date: Thu, 23 Sep 2021 15:08:06 -0700	[thread overview]
Message-ID: <fc27f1a8-6a53-e7a6-ec6c-e0c185912c1f@oracle.com> (raw)
In-Reply-To: <20210923142426.8930bd1cfcabc782a2152c18@linux-foundation.org>

On 9/23/21 2:24 PM, Andrew Morton wrote:
> On Thu, 23 Sep 2021 10:53:44 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
>> Two new sysfs files are added to demote hugtlb pages.  These files are
>> both per-hugetlb page size and per node.  Files are:
>>   demote_size - The size in Kb that pages are demoted to. (read-write)
>>   demote - The number of huge pages to demote. (write-only)
>>
>> By default, demote_size is the next smallest huge page size.  Valid huge
>> page sizes less than huge page size may be written to this file.  When
>> huge pages are demoted, they are demoted to this size.
>>
>> Writing a value to demote will result in an attempt to demote that
>> number of hugetlb pages to an appropriate number of demote_size pages.
>>
>> NOTE: Demote interfaces are only provided for huge page sizes if there
>> is a smaller target demote huge page size.  For example, on x86 1GB huge
>> pages will have demote interfaces.  2MB huge pages will not have demote
>> interfaces.
>>
>> This patch does not provide full demote functionality.  It only provides
>> the sysfs interfaces.
>>
>> It also provides documentation for the new interfaces.
>>
>> ...
>>
>> +static ssize_t demote_store(struct kobject *kobj,
>> +	       struct kobj_attribute *attr, const char *buf, size_t len)
>> +{
>> +	unsigned long nr_demote;
>> +	unsigned long nr_available;
>> +	nodemask_t nodes_allowed, *n_mask;
>> +	struct hstate *h;
>> +	int err;
>> +	int nid;
>> +
>> +	err = kstrtoul(buf, 10, &nr_demote);
>> +	if (err)
>> +		return err;
>> +	h = kobj_to_hstate(kobj, &nid);
>> +
>> +	/* Synchronize with other sysfs operations modifying huge pages */
>> +	mutex_lock(&h->resize_lock);
>> +
>> +	spin_lock_irq(&hugetlb_lock);
>> +	if (nid != NUMA_NO_NODE) {
>> +		nr_available = h->free_huge_pages_node[nid];
>> +		init_nodemask_of_node(&nodes_allowed, nid);
>> +		n_mask = &nodes_allowed;
>> +	} else {
>> +		nr_available = h->free_huge_pages;
>> +		n_mask = &node_states[N_MEMORY];
>> +	}
>> +	nr_available -= h->resv_huge_pages;
>> +	if (nr_available <= 0)
>> +		goto out;
>> +	nr_demote = min(nr_available, nr_demote);
>> +
>> +	while (nr_demote) {
>> +		if (!demote_pool_huge_page(h, n_mask))
>> +			break;
>> +
>> +		/*
>> +		 * We may have dropped the lock in the routines to
>> +		 * demote/free a page.  Recompute nr_demote as counts could
>> +		 * have changed and we want to make sure we do not demote
>> +		 * a reserved huge page.
>> +		 */
> 
> This comment doesn't become true until patch #4, and is a bit confusing
> in patch #1.  Also, saying "the lock" is far less helpful than saying
> "hugetlb_lock"!

Right.  That is the result of slicing and dicing working code to create
individual patches.  Sorry.  I will correct.

The comment is also not 100% accurate.  demote_pool_huge_page will
always drop hugetlb_lock except in the quick error case which is not
really interesting.  This helps answer your next question.

> 
> 
>> +		nr_demote--;
>> +		if (nid != NUMA_NO_NODE)
>> +			nr_available = h->free_huge_pages_node[nid];
>> +		else
>> +			nr_available = h->free_huge_pages;
>> +		nr_available -= h->resv_huge_pages;
>> +		if (nr_available <= 0)
>> +			nr_demote = 0;
>> +		else
>> +			nr_demote = min(nr_available, nr_demote);
>> +	}
>> +
>> +out:
>> +	spin_unlock_irq(&hugetlb_lock);
> 
> How long can we spend with IRQs disabled here (after patch #4!)?

Not very long.  We will drop the lock on page demote.  This is because
we need to potentially allocate vmemmap pages.  We will actually go
through quite a few acquire/drop lock cycles for each demoted page.
Something like:
	dequeue page to be demoted
	drop lock
	potentially allocate vmemmap pages
	for each page of demoted size
		prep page
		acquire lock
		enqueue page to new pool
		drop lock
	reacquire lock

This is 'no worse' than the lock cycling that happens with existing pool
adjustment mechanisms such as "echo > nr_hugepages".

The updated comment will point out that there is little need to worry
about lock hold/irq disable time.
-- 
Mike Kravetz

>> +	mutex_unlock(&h->resize_lock);
>> +
>> +	return len;
>> +}
>> +HSTATE_ATTR_WO(demote);
>> +
> 

  reply	other threads:[~2021-09-23 22:08 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-23 17:53 [PATCH v2 0/4] hugetlb: add demote/split page functionality Mike Kravetz
2021-09-23 17:53 ` [PATCH v2 1/4] hugetlb: add demote hugetlb page sysfs interfaces Mike Kravetz
2021-09-23 21:24   ` Andrew Morton
2021-09-23 22:08     ` Mike Kravetz [this message]
2021-09-24  7:08   ` Aneesh Kumar K.V
2021-09-29 18:22     ` Mike Kravetz
2021-09-24  9:28   ` David Hildenbrand
2021-09-29 19:34     ` Mike Kravetz
2021-09-23 17:53 ` [PATCH v2 2/4] hugetlb: add HPageCma flag and code to free non-gigantic pages in CMA Mike Kravetz
2021-09-24  9:36   ` David Hildenbrand
2021-09-29 19:42     ` Mike Kravetz
2021-09-29 23:21       ` Mike Kravetz
2021-10-01 17:50         ` Mike Kravetz
2021-09-23 17:53 ` [PATCH v2 3/4] hugetlb: add demote bool to gigantic page routines Mike Kravetz
2021-09-23 17:53 ` [PATCH v2 4/4] hugetlb: add hugetlb demote page support Mike Kravetz
2021-09-24  9:44   ` David Hildenbrand
2021-09-29 19:54     ` Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fc27f1a8-6a53-e7a6-ec6c-e0c185912c1f@oracle.com \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=osalvador@suse.de \
    --cc=rientjes@google.com \
    --cc=songmuchun@bytedance.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).