From: Mina Almasry <almasrymina@google.com>
To: unlisted-recipients:;
Cc: Mina Almasry <almasrymina@google.com>,
	David Hildenbrand <david@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Paul E . McKenney" <paulmckrcu@fb.com>,
	Yu Zhao <yuzhao@google.com>, Jonathan Corbet <corbet@lwn.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Xu <peterx@redhat.com>,
	Ivan Teterevkov <ivan.teterevkov@nutanix.com>,
	Florian Schmidt <florian.schmidt@nutanix.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [PATCH v4] mm: Add PM_HUGE_THP_MAPPING to /proc/pid/pagemap
Date: Sun, 7 Nov 2021 15:57:54 -0800
Message-ID: <20211107235754.1395488-1-almasrymina@google.com>

Add PM_HUGE_THP_MAPPING to allow userspace to detect whether a given virt
address is currently mapped by a transparent huge page or not. Example
use case is a process requesting THPs from the kernel (via a huge tmpfs
mount for example), for a performance critical region of memory. The
userspace may want to query whether the kernel is actually backing this
memory by hugepages or not.

PM_HUGE_THP_MAPPING bit is set if the virt address is mapped at the PMD
level and the underlying page is a transparent huge page.

Tested manually by adding logging into transhuge-stress, and by
allocating THP and querying the PM_HUGE_THP_MAPPING flag at those
virtual addresses.

Signed-off-by: Mina Almasry <almasrymina@google.com>

Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Paul E. McKenney <paulmckrcu@fb.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: Florian Schmidt <florian.schmidt@nutanix.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org

---

Changes in v4:
- Removed unnecessary moving of flags variable declaration

Changes in v3:
- Renamed PM_THP to PM_HUGE_THP_MAPPING
- Fixed checks to set PM_HUGE_THP_MAPPING
- Added PM_HUGE_THP_MAPPING docs

---
 Documentation/admin-guide/mm/pagemap.rst      |  3 ++-
 fs/proc/task_mmu.c                            |  3 +++
 tools/testing/selftests/vm/transhuge-stress.c | 21 +++++++++++++++----
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index fdc19fbc10839..8a0f0064ff336 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -23,7 +23,8 @@ There are four components to pagemap:
     * Bit  56    page exclusively mapped (since 4.2)
     * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
       :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`)
-    * Bits 57-60 zero
+    * Bit  58    page is a huge (PMD size) THP mapping
+    * Bits 59-60 zero
     * Bit  61    page is file-page or shared-anon (since 3.5)
     * Bit  62    page swapped
     * Bit  63    page present
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ad667dbc96f5c..6f1403f83b310 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1302,6 +1302,7 @@ struct pagemapread {
 #define PM_SOFT_DIRTY		BIT_ULL(55)
 #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
 #define PM_UFFD_WP		BIT_ULL(57)
+#define PM_HUGE_THP_MAPPING	BIT_ULL(58)
 #define PM_FILE			BIT_ULL(61)
 #define PM_SWAP			BIT_ULL(62)
 #define PM_PRESENT		BIT_ULL(63)
@@ -1456,6 +1457,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 
 		if (page && page_mapcount(page) == 1)
 			flags |= PM_MMAP_EXCLUSIVE;
+		if (page && is_transparent_hugepage(page))
+			flags |= PM_HUGE_THP_MAPPING;
 
 		for (; addr != end; addr += PAGE_SIZE) {
 			pagemap_entry_t pme = make_pme(frame, flags);
diff --git a/tools/testing/selftests/vm/transhuge-stress.c b/tools/testing/selftests/vm/transhuge-stress.c
index fd7f1b4a96f94..7dce18981fff5 100644
--- a/tools/testing/selftests/vm/transhuge-stress.c
+++ b/tools/testing/selftests/vm/transhuge-stress.c
@@ -16,6 +16,12 @@
 #include <string.h>
 #include <sys/mman.h>
 
+/*
+ * We can use /proc/pid/pagemap to detect whether the kernel was able to find
+ * hugepages or no. This can be very noisy, so is disabled by default.
+ */
+#define NO_DETECT_HUGEPAGES
+
 #define PAGE_SHIFT 12
 #define HPAGE_SHIFT 21
 
@@ -23,6 +29,7 @@
 #define HPAGE_SIZE (1 << HPAGE_SHIFT)
 
 #define PAGEMAP_PRESENT(ent)	(((ent) & (1ull << 63)) != 0)
+#define PAGEMAP_THP(ent)	(((ent) & (1ull << 58)) != 0)
 #define PAGEMAP_PFN(ent)	((ent) & ((1ull << 55) - 1))
 
 int pagemap_fd;
@@ -47,10 +54,16 @@ int64_t allocate_transhuge(void *ptr)
 		  (uintptr_t)ptr >> (PAGE_SHIFT - 3)) != sizeof(ent))
 		err(2, "read pagemap");
 
-	if (PAGEMAP_PRESENT(ent[0]) && PAGEMAP_PRESENT(ent[1]) &&
-	    PAGEMAP_PFN(ent[0]) + 1 == PAGEMAP_PFN(ent[1]) &&
-	    !(PAGEMAP_PFN(ent[0]) & ((1 << (HPAGE_SHIFT - PAGE_SHIFT)) - 1)))
-		return PAGEMAP_PFN(ent[0]);
+	if (PAGEMAP_PRESENT(ent[0]) && PAGEMAP_PRESENT(ent[1])) {
+#ifndef NO_DETECT_HUGEPAGES
+		if (!PAGEMAP_THP(ent[0]))
+			fprintf(stderr, "WARNING: detected non THP page\n");
+#endif
+		if (PAGEMAP_PFN(ent[0]) + 1 == PAGEMAP_PFN(ent[1]) &&
+		    !(PAGEMAP_PFN(ent[0]) &
+		      ((1 << (HPAGE_SHIFT - PAGE_SHIFT)) - 1)))
+			return PAGEMAP_PFN(ent[0]);
+	}
 
 	return -1;
 }
-- 
2.34.0.rc0.344.g81b53c2807-goog
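
For illustration only, the userspace query described in the commit message could look roughly like the sketch below. It is not part of the patch. The helper names (pm_present, pm_huge_thp_mapping, pm_pfn, read_pagemap_entry) are hypothetical, and bit 58 carries the THP meaning only on a kernel that has this patch applied; on other kernels the bit may be zero or mean something else.

```c
#define _XOPEN_SOURCE 700	/* for pread() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Bit positions as used in this patch. Bit 58 is an assumption that only
 * holds on a kernel carrying PM_HUGE_THP_MAPPING.
 */
#define PM_PRESENT_BIT		63
#define PM_HUGE_THP_MAPPING_BIT	58
#define PM_PFN_MASK		((1ULL << 55) - 1)

/* Hypothetical helpers: decode individual fields of one pagemap entry. */
static inline int pm_present(uint64_t ent)
{
	return (int)((ent >> PM_PRESENT_BIT) & 1);
}

static inline int pm_huge_thp_mapping(uint64_t ent)
{
	return (int)((ent >> PM_HUGE_THP_MAPPING_BIT) & 1);
}

static inline uint64_t pm_pfn(uint64_t ent)
{
	return ent & PM_PFN_MASK;
}

/*
 * Read the pagemap entry covering addr in the calling process.
 * Returns 0 on success and -1 on any failure.
 */
static inline int read_pagemap_entry(const void *addr, uint64_t *ent)
{
	long page_size = sysconf(_SC_PAGESIZE);
	ssize_t n;
	off_t off;
	int fd;

	assert(ent != NULL);
	if (page_size <= 0)
		return -1;

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;

	/* One 64-bit entry per virtual page, indexed by virtual page number. */
	off = (off_t)((uintptr_t)addr / (uintptr_t)page_size) *
	      (off_t)sizeof(*ent);
	n = pread(fd, ent, sizeof(*ent), off);
	close(fd);

	return n == (ssize_t)sizeof(*ent) ? 0 : -1;
}
```

A caller would touch the memory first so that it is faulted in, call read_pagemap_entry() on an address inside the region, and then treat the page as PMD-mapped THP only if both pm_present() and pm_huge_thp_mapping() return 1 for that entry.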
Thread overview: 16+ messages
2021-11-07 23:57 Mina Almasry [this message]
2021-11-07 23:57 ` [PATCH v4] mm: Add PM_HUGE_THP_MAPPING to /proc/pid/pagemap Mina Almasry
2021-11-10  7:03 ` Peter Xu
2021-11-10  8:14 ` David Hildenbrand
2021-11-10  8:27 ` Peter Xu
2021-11-10  8:30 ` David Hildenbrand
2021-11-10  8:57 ` Peter Xu
2021-11-10 10:24 ` David Hildenbrand
2021-11-10 17:42 ` Mina Almasry
2021-11-12  7:41 ` Peter Xu
2021-11-10 17:50 ` Mina Almasry
2021-11-12  7:43 ` Peter Xu
2021-11-15 22:50 ` Mina Almasry
2021-11-16  1:59 ` Peter Xu
2021-11-17 19:50 ` Mina Almasry
2021-11-18  0:35 ` Peter Xu