From: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
To: Albert Huang <huangjie.albert@bytedance.com>, mike.kravetz@oracle.com
Cc: Jonathan Corbet <corbet@lwn.net>,
Muchun Song <songmuchun@bytedance.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH v2] mm: hugetlb: support for shared memory policy
Date: Wed, 19 Oct 2022 17:19:05 +0530
Message-ID: <e391aeec-08b6-12e4-42e1-e556860e49c5@linux.ibm.com>
In-Reply-To: <20221019092928.44146-1-huangjie.albert@bytedance.com>
On 10/19/22 2:59 PM, Albert Huang wrote:
> From: "huangjie.albert" <huangjie.albert@bytedance.com>
>
> Implement get_policy/set_policy for hugetlb_vm_ops to support shared
> policy. This ensures that the mempolicy of all processes sharing this
> huge page file is consistent.
>
> Consider a scenario where huge pages are shared: to limit the memory
> usage of a VM to node0, qemu's mempolicy is bound to node0. If another
> process (such as virtiofsd) shares memory with the VM and a page fault
> is triggered by virtiofsd, the allocated memory may come from node1,
> depending on virtiofsd. The memory prealloc provided by qemu avoids
> this issue, but it significantly increases the creation time of the VM
> (a few seconds, depending on memory size).
>
> After hooking up set_policy/get_policy in hugetlb_vm_ops, shared memory
> segments created by shmget() with the SHM_HUGETLB flag and mappings
> created by mmap(MAP_SHARED|MAP_HUGETLB) also support shared policy.
>
> v1->v2:
> 1. hugetlb shares the memory policy only when the vma has the VM_SHARED flag.
> 2. Update the documentation.
>
> Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
> ---
> .../admin-guide/mm/numa_memory_policy.rst | 20 +++++++++------
> mm/hugetlb.c | 25 +++++++++++++++++++
> 2 files changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
> index 5a6afecbb0d0..5672a6c2d2ef 100644
> --- a/Documentation/admin-guide/mm/numa_memory_policy.rst
> +++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
> @@ -133,14 +133,18 @@ Shared Policy
> the object share the policy, and all pages allocated for the
> shared object, by any task, will obey the shared policy.
>
> - As of 2.6.22, only shared memory segments, created by shmget() or
> - mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy. When shared
> - policy support was added to Linux, the associated data structures were
> - added to hugetlbfs shmem segments. At the time, hugetlbfs did not
> - support allocation at fault time--a.k.a lazy allocation--so hugetlbfs
> - shmem segments were never "hooked up" to the shared policy support.
> - Although hugetlbfs segments now support lazy allocation, their support
> - for shared policy has not been completed.
> + As of 2.6.22, only shared memory segments, created by shmget() without
> + SHM_HUGETLB flag or mmap(MAP_ANONYMOUS|MAP_SHARED) without MAP_HUGETLB
> + flag, support shared policy. When shared policy support was added to Linux,
> + the associated data structures were added to hugetlbfs shmem segments.
> + At the time, hugetlbfs did not support allocation at fault time--a.k.a
> + lazy allocation--so hugetlbfs shmem segments were never "hooked up" to
> + the shared policy support. Although hugetlbfs segments now support lazy
> + allocation, their support for shared policy has not been completed.
> +
> + after we hooked up hugetlb_vm_ops(set/get_policy):
> + both the shared memory segments created by shmget() with SHM_HUGETLB flag
> + and mmap(MAP_SHARED|MAP_HUGETLB), also support shared policy.
>
> As mentioned above in :ref:`VMA policies <vma_policy>` section,
> allocations of page cache pages for regular files mmap()ed
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 87d875e5e0a9..fc7038931832 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4632,6 +4632,27 @@ static vm_fault_t hugetlb_vm_op_fault(struct vm_fault *vmf)
>  	return 0;
>  }
>
> +#ifdef CONFIG_NUMA
> +int hugetlb_vm_op_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol)
> +{
> +	struct inode *inode = file_inode(vma->vm_file);
> +
> +	if (!(vma->vm_flags & VM_SHARED))
> +		return 0;
> +
> +	return mpol_set_shared_policy(&HUGETLBFS_I(inode)->policy, vma, mpol);
> +}
> +
> +struct mempolicy *hugetlb_vm_op_get_policy(struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct inode *inode = file_inode(vma->vm_file);
> +	pgoff_t index;
> +
> +	index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> +	return mpol_shared_policy_lookup(&HUGETLBFS_I(inode)->policy, index);
> +}
> +#endif
> +
> /*
> * When a new function is introduced to vm_operations_struct and added
> * to hugetlb_vm_ops, please consider adding the function to shm_vm_ops.
> @@ -4645,6 +4666,10 @@ const struct vm_operations_struct hugetlb_vm_ops = {
>  	.close = hugetlb_vm_op_close,
>  	.may_split = hugetlb_vm_op_split,
>  	.pagesize = hugetlb_vm_op_pagesize,
> +#ifdef CONFIG_NUMA
> +	.set_policy = hugetlb_vm_op_set_policy,
> +	.get_policy = hugetlb_vm_op_get_policy,
> +#endif
> };
>
> static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,

How does the current use of

	/* Set numa allocation policy based on index */
	hugetlb_set_vma_policy(&pseudo_vma, inode, index);

enforce the policy in the current code? Also, if we have get_policy(),
can we remove that call from hugetlbfs_fallocate() after this patch?
With shared policy, shouldn't we be able to fetch the policy via
get_vma_policy()?

A related question: does shm_pseudo_vma_init() require that mpolicy_lookup?
-aneesh

Thread overview: 17+ messages
2022-10-12 8:15 [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops Albert Huang
2022-10-12 19:45 ` Hugh Dickins
2022-10-14 16:56 ` Mike Kravetz
2022-10-17 3:35 ` [External] " 黄杰
2022-10-19 9:29 ` [PATCH v2] mm: hugetlb: support for shared memory policy Albert Huang
2022-10-19 11:49 ` Aneesh Kumar K V [this message]
2022-10-19 9:33 ` [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops 黄杰
2022-10-23 20:16 ` Hugh Dickins
2022-10-17 8:44 ` David Hildenbrand
2022-10-17 9:48 ` [External] " 黄杰
2022-10-17 11:33 ` David Hildenbrand
2022-10-17 11:46 ` 黄杰
2022-10-17 12:00 ` David Hildenbrand
2022-10-18 9:27 ` 黄杰
2022-10-18 9:35 ` David Hildenbrand
2022-10-17 17:59 ` [External] " Mike Kravetz
2022-10-18 9:24 ` 黄杰