Re: [PATCH v2] mm: hugetlb: support for shared memory policy

From: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
To: Albert Huang <huangjie.albert@bytedance.com>, mike.kravetz@oracle.com
Cc: Jonathan Corbet <corbet@lwn.net>,
	Muchun Song <songmuchun@bytedance.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v2] mm: hugetlb: support for shared memory policy
Date: Wed, 19 Oct 2022 17:19:05 +0530
Message-ID: <e391aeec-08b6-12e4-42e1-e556860e49c5@linux.ibm.com>
In-Reply-To: <20221019092928.44146-1-huangjie.albert@bytedance.com>

On 10/19/22 2:59 PM, Albert Huang wrote:
> From: "huangjie.albert" <huangjie.albert@bytedance.com>
> 
> Implement get/set_policy for hugetlb_vm_ops to support shared policy.
> This ensures that the mempolicy of all processes sharing this huge page
> file is consistent.
> 
> In some scenarios where huge pages are shared:
> if we need to limit the memory usage of a VM to node0, we bind qemu's
> mempolicy to node0. But if another process (such as virtiofsd) shares
> memory with the VM and a page fault is triggered by virtiofsd, the
> allocated memory may go to node1, depending on virtiofsd. Although we
> can use the memory prealloc provided by qemu to avoid this issue, this
> method significantly increases the creation time of the VM (a few
> seconds, depending on memory size).
> 
> After hooking up hugetlb_vm_ops (set/get_policy), shared memory
> segments created by shmget() with the SHM_HUGETLB flag and by
> mmap(MAP_SHARED|MAP_HUGETLB) also support shared policy.
> 
> v1->v2:
> 1. hugetlb shares the memory policy when the vma has the VM_SHARED flag.
> 2. Update the documentation.
> 
> Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
> ---
>  .../admin-guide/mm/numa_memory_policy.rst     | 20 +++++++++------
>  mm/hugetlb.c                                  | 25 +++++++++++++++++++
>  2 files changed, 37 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
> index 5a6afecbb0d0..5672a6c2d2ef 100644
> --- a/Documentation/admin-guide/mm/numa_memory_policy.rst
> +++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
> @@ -133,14 +133,18 @@ Shared Policy
>  	the object share the policy, and all pages allocated for the
>  	shared object, by any task, will obey the shared policy.
>  
> -	As of 2.6.22, only shared memory segments, created by shmget() or
> -	mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy.  When shared
> -	policy support was added to Linux, the associated data structures were
> -	added to hugetlbfs shmem segments.  At the time, hugetlbfs did not
> -	support allocation at fault time--a.k.a lazy allocation--so hugetlbfs
> -	shmem segments were never "hooked up" to the shared policy support.
> -	Although hugetlbfs segments now support lazy allocation, their support
> -	for shared policy has not been completed.
> +	As of 2.6.22, only shared memory segments, created by shmget() without
> +	SHM_HUGETLB flag or mmap(MAP_ANONYMOUS|MAP_SHARED) without MAP_HUGETLB
> +	flag, support shared policy. When shared policy support was added to Linux,
> +	the associated data structures were added to hugetlbfs shmem segments.
> +	At the time, hugetlbfs did not support allocation at fault time--a.k.a
> +	lazy allocation--so hugetlbfs shmem segments were never "hooked up" to
> +	the shared policy support. Although hugetlbfs segments now support lazy
> +	allocation, their support for shared policy has not been completed.
> +
> +	after we hooked up hugetlb_vm_ops(set/get_policy):
> +	both the shared memory segments created by shmget() with SHM_HUGETLB flag
> +	and mmap(MAP_SHARED|MAP_HUGETLB), also support shared policy.
>  
>  	As mentioned above in :ref:`VMA policies <vma_policy>` section,
>  	allocations of page cache pages for regular files mmap()ed
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 87d875e5e0a9..fc7038931832 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4632,6 +4632,27 @@ static vm_fault_t hugetlb_vm_op_fault(struct vm_fault *vmf)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_NUMA
> +int hugetlb_vm_op_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol)
> +{
> +	struct inode *inode = file_inode(vma->vm_file);
> +
> +	if (!(vma->vm_flags & VM_SHARED))
> +		return 0;
> +
> +	return mpol_set_shared_policy(&HUGETLBFS_I(inode)->policy, vma, mpol);
> +}
> +
> +struct mempolicy *hugetlb_vm_op_get_policy(struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct inode *inode = file_inode(vma->vm_file);
> +	pgoff_t index;
> +
> +	index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> +	return mpol_shared_policy_lookup(&HUGETLBFS_I(inode)->policy, index);
> +}
> +#endif
> +
>  /*
>   * When a new function is introduced to vm_operations_struct and added
>   * to hugetlb_vm_ops, please consider adding the function to shm_vm_ops.
> @@ -4645,6 +4666,10 @@ const struct vm_operations_struct hugetlb_vm_ops = {
>  	.close = hugetlb_vm_op_close,
>  	.may_split = hugetlb_vm_op_split,
>  	.pagesize = hugetlb_vm_op_pagesize,
> +#ifdef CONFIG_NUMA
> +	.set_policy = hugetlb_vm_op_set_policy,
> +	.get_policy = hugetlb_vm_op_get_policy,
> +#endif
>  };
>  
>  static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,

How is the current usage of 

/* Set numa allocation policy based on index */
hugetlb_set_vma_policy(&pseudo_vma, inode, index);

enforcing the policy with the current code? Also, if we have get_policy(),
can we remove the usage of the same in hugetlbfs_fallocate() after this
patch? With shared policy we should be able to fetch the policy via
get_vma_policy()?

A related question: does shm_pseudo_vma_init() require that mpolicy_lookup?
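
For reference, here is a rough userspace sketch of the index arithmetic used by the proposed hugetlb_vm_op_get_policy(). It is not kernel code: struct sketch_vma, sketch_policy_index() and SKETCH_PAGE_SHIFT are hypothetical names made up for illustration, and the 4 KiB base page size is an assumption. It only shows that the lookup key is the file offset in base-page units, so two mappings of the same file at different addresses resolve to the same key.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages; PAGE_SHIFT is arch-dependent in the kernel. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for the vm_area_struct fields the lookup uses. */
struct sketch_vma {
	uint64_t vm_start;	/* first mapped address */
	uint64_t vm_end;	/* one past the last mapped address */
	uint64_t vm_pgoff;	/* file offset of vm_start, in base pages */
};

/* Same arithmetic as the index computation in the quoted get_policy hook. */
static uint64_t sketch_policy_index(const struct sketch_vma *vma, uint64_t addr)
{
	/* The address must fall inside the mapping. */
	assert(addr >= vma->vm_start && addr < vma->vm_end);
	return ((addr - vma->vm_start) >> SKETCH_PAGE_SHIFT) + vma->vm_pgoff;
}
```

With one mapping starting at file offset 0 and another starting 2 MiB into the same file, the address 2 MiB into the first and the start address of the second both give the same index, which is what allows a policy stored per inode to be shared between them.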

-aneesh