From: Mike Kravetz <mike.kravetz@oracle.com>
To: Muchun Song <songmuchun@bytedance.com>
Cc: corbet@lwn.net, akpm@linux-foundation.org, mcgrof@kernel.org,
	keescook@chromium.org, yzaikin@google.com, osalvador@suse.de,
	david@redhat.com, masahiroy@kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, duanxiongchun@bytedance.com,
	smuchun@gmail.com
Subject: Re: [PATCH v9 4/4] mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl
Date: Fri, 6 May 2022 09:50:53 -0700	[thread overview]
Message-ID: <e199085c-d89d-093b-4257-0102980861bc@oracle.com> (raw)
In-Reply-To: <YnSM0Wd/lEc9wnwJ@FVFYT0MHHV2J.googleapis.com>

On 5/5/22 19:49, Muchun Song wrote:
> On Thu, May 05, 2022 at 09:48:34AM -0700, Mike Kravetz wrote:
>> On 5/5/22 01:02, Muchun Song wrote:
>>> On Wed, May 04, 2022 at 08:36:00PM -0700, Mike Kravetz wrote:
>>>> On 5/4/22 19:35, Muchun Song wrote:
>>>>> On Wed, May 04, 2022 at 03:12:39PM -0700, Mike Kravetz wrote:
>>>>>> On 4/29/22 05:18, Muchun Song wrote:
>>>>>>> +static void vmemmap_optimize_mode_switch(enum vmemmap_optimize_mode to)
>>>>>>> +{
>>>>>>> +	if (vmemmap_optimize_mode == to)
>>>>>>> +		return;
>>>>>>> +
>>>>>>> +	if (to == VMEMMAP_OPTIMIZE_OFF)
>>>>>>> +		static_branch_dec(&hugetlb_optimize_vmemmap_key);
>>>>>>> +	else
>>>>>>> +		static_branch_inc(&hugetlb_optimize_vmemmap_key);
>>>>>>> +	vmemmap_optimize_mode = to;
>>>>>>> +}
>>>>>>> +
>>>>>>>  static int __init hugetlb_vmemmap_early_param(char *buf)
>>>>>>>  {
>>>>>>>  	bool enable;
>>>>>>> +	enum vmemmap_optimize_mode mode;
>>>>>>>  
>>>>>>>  	if (kstrtobool(buf, &enable))
>>>>>>>  		return -EINVAL;
>>>>>>>  
>>>>>>> -	if (enable)
>>>>>>> -		static_branch_enable(&hugetlb_optimize_vmemmap_key);
>>>>>>> -	else
>>>>>>> -		static_branch_disable(&hugetlb_optimize_vmemmap_key);
>>>>>>> +	mode = enable ? VMEMMAP_OPTIMIZE_ON : VMEMMAP_OPTIMIZE_OFF;
>>>>>>> +	vmemmap_optimize_mode_switch(mode);
>>>>>>>  
>>>>>>>  	return 0;
>>>>>>>  }
>>>>>>> @@ -60,6 +80,8 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
>>>>>>>  	vmemmap_end	= vmemmap_addr + (vmemmap_pages << PAGE_SHIFT);
>>>>>>>  	vmemmap_reuse	= vmemmap_addr - PAGE_SIZE;
>>>>>>>  
>>>>>>> +	VM_BUG_ON_PAGE(!vmemmap_pages, head);
>>>>>>> +
>>>>>>>  	/*
>>>>>>>  	 * The pages which the vmemmap virtual address range [@vmemmap_addr,
>>>>>>>  	 * @vmemmap_end) are mapped to are freed to the buddy allocator, and
>>>>>>> @@ -69,8 +91,10 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
>>>>>>>  	 */
>>>>>>>  	ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
>>>>>>>  				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
>>>>>>> -	if (!ret)
>>>>>>> +	if (!ret) {
>>>>>>>  		ClearHPageVmemmapOptimized(head);
>>>>>>> +		static_branch_dec(&hugetlb_optimize_vmemmap_key);
>>>>>>> +	}
>>>>>>>  
>>>>>>>  	return ret;
>>>>>>>  }
>>>>>>> @@ -84,6 +108,8 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
>>>>>>>  	if (!vmemmap_pages)
>>>>>>>  		return;
>>>>>>>  
>>>>>>> +	static_branch_inc(&hugetlb_optimize_vmemmap_key);
>>>>>>
>>>>>> Can you explain the reasoning behind doing the static_branch_inc here in free,
>>>>>> and static_branch_dec in alloc?
>>>>>> IIUC, they may not be absolutely necessary but you could use the count to
>>>>>> know how many optimized pages are in use?  Or, I may just be missing
>>>>>> something.
>>>>>>
>>>>>
>>>>> Partly right. One 'count' is not enough. I implemented a similar approach
>>>>> in v6 [1]. Besides the 'count', we also need a lock for synchronization.
>>>>> However, both the count and the synchronization are already included in the
>>>>> static_key_inc/dec infrastructure. It is simpler to use static_key_inc/dec directly, right?
>>>>>
>>>>> [1] https://lore.kernel.org/all/20220330153745.20465-5-songmuchun@bytedance.com/
>>>>>
>>>>
>>>> Sorry, but I am a little confused.
>>>>
>>>> vmemmap_optimize_mode_switch will static_key_inc to enable and static_key_dec
>>>> to disable.  In addition each time we optimize (allocate) a hugetlb page after
>>>> enabling we will static_key_inc.
>>>>
>>>> Suppose we have 1 hugetlb page optimized.  So static count == 2 IIUC.
>>>> Then someone turns off optimization via sysctl.  static count == 1 ???
>>>
>>> Definitely right.
>>>
>>>> If we then add another hugetlb page via nr_hugepages it seems that it
>>>> would be optimized as static count == 1.  Is that correct?  Do we need
>>>
>>> I'm wrong.
>>>
>>>> to free all hugetlb pages with optimization before we can add new pages
>>>> without optimization?
>>>>
>>>
>>> My bad. I think the following code would fix this.
>>>
>>> Thanks for your careful review.
>>>
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index 5820a681a724..997e192aeed7 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -105,7 +105,7 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
>>>         unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
>>>
>>>         vmemmap_pages = hugetlb_optimize_vmemmap_pages(h);
>>> -       if (!vmemmap_pages)
>>> +       if (!vmemmap_pages || READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
>>>                 return;
>>>
>>>         static_branch_inc(&hugetlb_optimize_vmemmap_key);
>>>  
>>
>> If vmemmap_optimize_mode == VMEMMAP_OPTIMIZE_OFF is sufficient for turning
>> off optimizations, do we really need to static_branch_inc/dec for each
>> hugetlb page?
>>
> 
> static_branch_inc/dec is necessary since the user could change
> vmemmap_optimize_mode to off after the 'if' check.
> 
> CPU0:				CPU1:
> // Assume vmemmap_optimize_mode == 1
> // and static_key_count == 1
> if (vmemmap_optimize_mode == VMEMMAP_OPTIMIZE_OFF)
> 	return;
> 				hugetlb_optimize_vmemmap_handler();
> 					vmemmap_optimize_mode = 0;
> 					static_branch_dec();
> 					// static_key_count == 0
> // Enable static_key if necessary
> static_branch_inc();
> 
> Does this make sense to you?

Yes, it makes sense and is required because hugetlb_optimize_vmemmap_pages()
performs two functions:
1) It determines if vmemmap optimization is enabled
2) It specifies how many vmemmap pages can be saved with optimization
hugetlb_optimize_vmemmap_pages returns 0 if static_key_count == 0, so this
would cause problems in places such as the hugetlb free path
(hugetlb_vmemmap_alloc).  I hope my understanding is correct?

Would it make the code more clear if we did not do the check for
vmemmap_optimization in hugetlb_optimize_vmemmap_pages()?  Instead:
- hugetlb_optimize_vmemmap_pages ALWAYS returns the number of vmemmap pages
  that can be freed/optimized
- At hugetlb allocation time (hugetlb_vmemmap_free) we only check
  hugetlb_optimize_vmemmap_enabled() to determine if optimization should
  be performed.
- After hugetlb_vmemmap_free, we can use HPageVmemmapOptimized to determine
  if vmemmap pages need to be allocated in hugetlb freeing paths.

Perhaps there is something wrong with the above suggestion?

I know you have always had hugetlb_optimize_vmemmap_pages perform the two
functions.  So, splitting the functionality may not seem clearer to you.  I am
OK leaving the code as is (key inc/dec for each page).  Just wanted to get your
(and perhaps others') thoughts on splitting the functionality as described above.
-- 
Mike Kravetz

Thread overview: 18+ messages
2022-04-29 12:18 [PATCH v9 0/4] add hugetlb_optimize_vmemmap sysctl Muchun Song
2022-04-29 12:18 ` [PATCH v9 1/4] mm: hugetlb_vmemmap: disable hugetlb_optimize_vmemmap when struct page crosses page boundaries Muchun Song
2022-05-03 21:49   ` Mike Kravetz
2022-04-29 12:18 ` [PATCH v9 2/4] mm: memory_hotplug: override memmap_on_memory when hugetlb_free_vmemmap=on Muchun Song
2022-05-04  0:23   ` Mike Kravetz
2022-05-04  3:37     ` Muchun Song
2022-05-07 21:05       ` Andrew Morton
2022-04-29 12:18 ` [PATCH v9 3/4] mm: hugetlb_vmemmap: use kstrtobool for hugetlb_vmemmap param parsing Muchun Song
2022-05-04  0:42   ` Mike Kravetz
2022-04-29 12:18 ` [PATCH v9 4/4] mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl Muchun Song
2022-05-04 22:12   ` Mike Kravetz
2022-05-05  2:35     ` Muchun Song
2022-05-05  3:36       ` Mike Kravetz
2022-05-05  8:02         ` Muchun Song
2022-05-05 16:48           ` Mike Kravetz
2022-05-06  2:49             ` Muchun Song
2022-05-06 16:50               ` Mike Kravetz [this message]
2022-05-07 13:10                 ` Muchun Song
